Add CUR 2.0 cost analytics and billing protection #103

Merged
Alexanderamiri merged 1 commit into main from feat/cur-cost-analytics on Mar 26, 2026

Conversation

@Alexanderamiri
Member

Summary

  • CUR 2.0 export (daily Parquet with resource IDs) via aws_bcmdataexports_export — enables resource-level cost attribution ("this specific S3 bucket cost $X")
  • Athena workgroup + Glue crawler for querying CUR data from Lambdas and CLI
  • Billing protection: CloudWatch billing alarm ($200 threshold, ~4-6h delay) + account-level budget with 80%/100% notifications ($500/month)
  • Weekly cost report: new resource-level drilldown section (top 10 resources by cost), enriched LLM narrative with resource context
  • Daily spike check: per-spike resource breakdown showing top 5 resources for each spiking service
  • Shared athena.py: reusable Athena query utility with graceful degradation (reports still send if CUR data is unavailable)
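
To make the "graceful degradation" idea concrete, here is a minimal sketch of what a synchronous Athena helper could look like. It is not the athena.py from this PR: the function name `run_query`, its parameters, and the choice to return `None` on any failure are assumptions for illustration. The `client` argument is expected to behave like a boto3 Athena client (`start_query_execution`, `get_query_execution`, `get_query_results`).

```python
import time
from typing import Any, Dict, List, Optional


def run_query(client: Any, sql: str, database: str, workgroup: str,
              timeout_s: float = 30.0, poll_s: float = 1.0) -> Optional[List[Dict[str, str]]]:
    """Run an Athena query and wait for it; return rows, or None on any failure.

    Hypothetical helper: callers treat None as "CUR data unavailable" and
    send the report without the resource drilldown.
    """
    try:
        qid = client.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": database},
            WorkGroup=workgroup,
        )["QueryExecutionId"]

        deadline = time.monotonic() + timeout_s
        while True:
            state = client.get_query_execution(QueryExecutionId=qid)[
                "QueryExecution"]["Status"]["State"]
            if state == "SUCCEEDED":
                break
            if state in ("FAILED", "CANCELLED") or time.monotonic() > deadline:
                return None
            time.sleep(poll_s)

        # Only the first page of results is read, which is enough for top-N queries.
        rows = client.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
        if not rows:
            return []
        # For SELECT queries the first row carries the column names.
        header = [cell.get("VarCharValue", "") for cell in rows[0]["Data"]]
        return [
            dict(zip(header, (cell.get("VarCharValue", "") for cell in row["Data"])))
            for row in rows[1:]
        ]
    except Exception:
        # Missing table, missing permissions, scan limit exceeded: degrade, do not crash.
        return None
```

A caller would then branch on the result, for example `rows = run_query(...)` followed by `if rows: add_drilldown(rows)`, where `add_drilldown` is likewise a hypothetical name.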

New resources

| Resource | Purpose |
|----------|---------|
| S3 javabin-cur-553637109631 | CUR 2.0 Parquet data (lifecycle: IA 90d, expire 365d) |
| S3 javabin-athena-results-553637109631 | Athena query results (expire 7d) |
| aws_bcmdataexports_export | CUR 2.0 daily export with RESOURCES |
| Glue database javabin_cur | Catalog for CUR tables |
| Glue crawler javabin-cur-crawler | Daily 06:00 UTC schema discovery |
| Athena workgroup javabin-cost-analytics | 100MB scan limit |
| CloudWatch alarm javabin-billing-alarm | EstimatedCharges > $200 (us-east-1) |
| Budget javabin-account-monthly | $500/month, alerts at 80%/100% |
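
As a quick sanity check on the numbers in this table, the short sketch below recomputes the values that show up in the plan output (`bytes_scanned_cutoff_per_query`, the alarm `period`, and the dollar amounts behind the 80%/100% budget notifications). The variable and function names are illustrative only and do not come from the module.

```python
# 100MB scan limit, expressed in bytes as Athena expects it.
SCAN_LIMIT_BYTES = 100 * 1024 * 1024

# The billing alarm evaluates EstimatedCharges over a 6-hour period.
ALARM_PERIOD_S = 6 * 60 * 60

# Budget notifications fire when actual spend is GREATER_THAN a percentage of the limit.
BUDGET_USD = 500
THRESHOLDS_USD = {pct: BUDGET_USD * pct / 100 for pct in (80, 100)}


def crossed(actual_usd: float) -> list:
    """Return the percentage thresholds that a given month-to-date spend exceeds."""
    return [pct for pct, usd in THRESHOLDS_USD.items() if actual_usd > usd]


print(SCAN_LIMIT_BYTES)  # 104857600
print(ALARM_PERIOD_S)    # 21600
print(THRESHOLDS_USD)    # {80: 400.0, 100: 500.0}
print(crossed(450))      # [80]
```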

Notes

  • CUR data takes 24-48h for first delivery after apply
  • Glue crawler must run after first delivery before Athena queries work
  • All CUR queries gracefully degrade — existing reports work unchanged until data arrives
  • Billing alarm SNS topic is in us-east-1 (CloudWatch billing constraint). A Lambda forwarder to eu-central-1 Slack can be added later.
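
For the Lambda forwarder mentioned in the last note, the core could be as small as the sketch below. It is not part of this PR. It assumes the Lambda is subscribed to the us-east-1 SNS topic and receives standard CloudWatch alarm notifications; the function name `alarm_messages` is hypothetical, and the actual POST to Slack (webhook URL lookup, HTTP call) is deliberately left out.

```python
import json
from typing import Any, Dict, List


def alarm_messages(event: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn an SNS-delivered CloudWatch alarm event into Slack webhook payloads.

    Each SNS record carries the alarm as a JSON string in Sns.Message.
    """
    payloads = []
    for record in event.get("Records", []):
        alarm = json.loads(record["Sns"]["Message"])
        name = alarm.get("AlarmName", "unknown alarm")
        state = alarm.get("NewStateValue", "UNKNOWN")
        reason = alarm.get("NewStateReason", "")
        # Slack incoming webhooks accept a JSON body with a "text" field.
        payloads.append({"text": f"{name} is {state}: {reason}"})
    return payloads
```

Keeping the formatting in a pure function makes it testable without network access; the handler would only add the delivery step around it.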

Test plan

  • terraform fmt -recursive && terraform validate passes
  • Apply via CI pipeline (plan → review → apply)
  • After 24-48h: verify CUR Parquet files in s3://javabin-cur-553637109631/cur/
  • Verify Glue crawler populates table in javabin_cur database
  • Run manual Athena query in javabin-cost-analytics workgroup
  • Invoke javabin-cost-report Lambda, check Slack for resource drilldown section
  • Invoke javabin-daily-cost-check Lambda, verify resource context on spikes
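
For the manual Athena query step, a top-N query by resource could be built roughly as follows. This is a sketch under assumptions: the column names are taken from the `query_statement` in this PR's plan output and may not match the delivered CUR 2.0 schema, the timestamp type of `line_item_usage_start_date` is assumed, and `top_resources_sql` is a hypothetical helper name. The workgroup's 100MB scan limit still applies to whatever this produces.

```python
import re
from datetime import date

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def top_resources_sql(table: str, start: str, end: str, limit: int = 10) -> str:
    """Build a query for the most expensive resources in [start, end).

    Inputs are validated rather than escaped, since they end up inline in the SQL.
    """
    if not _IDENTIFIER.match(table):
        raise ValueError(f"invalid table name: {table!r}")
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    if end_d <= start_d:
        raise ValueError("end must be after start")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    return (
        "SELECT line_item_product_code, line_item_resource_id, "
        "SUM(line_item_unblended_cost) AS cost\n"
        f"FROM {table}\n"
        f"WHERE line_item_usage_start_date >= TIMESTAMP '{start_d.isoformat()} 00:00:00'\n"
        f"  AND line_item_usage_start_date < TIMESTAMP '{end_d.isoformat()} 00:00:00'\n"
        "  AND line_item_resource_id <> ''\n"
        "GROUP BY line_item_product_code, line_item_resource_id\n"
        "ORDER BY cost DESC\n"
        f"LIMIT {limit}"
    )


print(top_resources_sql("javabin_cur", "2026-03-19", "2026-03-26"))
```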

- New cost-analytics module: CUR 2.0 export (daily Parquet with resource IDs),
  Glue database + crawler, Athena workgroup, CloudWatch billing alarm ($200),
  account-level budget with 80%/100% notifications ($500)
- Shared athena.py utility for synchronous Athena queries with graceful degradation
- Weekly cost report: resource-level drilldown (top 10 resources), enriched LLM narrative
- Daily spike check: per-spike resource breakdown (top 5 resources per spiking service)
- Lambda IAM policies updated for Athena/Glue/S3 access
- Lambda timeouts increased (cost-report 60→120s, daily-check 60→90s)
@github-actions

Terraform Plan

🚧 Changes detected — Plan: 19 to add, 4 to change, 0 to destroy.

Plan output
Acquiring state lock. This may take a few moments...

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # module.cost_analytics.aws_athena_workgroup.cur will be created
  + resource "aws_athena_workgroup" "cur" {
      + arn           = (known after apply)
      + force_destroy = false
      + id            = (known after apply)
      + name          = "javabin-cost-analytics"
      + state         = "ENABLED"
      + tags_all      = {
          + "environment" = "production"
          + "managed-by"  = "terraform"
          + "repo"        = "javaBin/platform"
          + "service"     = "platform"
          + "team"        = "platform"
        }

      + configuration {
          + bytes_scanned_cutoff_per_query     = 104857600
          + enforce_workgroup_configuration    = true
          + publish_cloudwatch_metrics_enabled = true
          + requester_pays_enabled             = false

          + result_configuration {
              + output_location = (known after apply)

              + encryption_configuration {
                  + encryption_option = "SSE_S3"
                }
            }
        }
    }

  # module.cost_analytics.aws_bcmdataexports_export.cur will be created
  + resource "aws_bcmdataexports_export" "cur" {
      + id       = (known after apply)
      + tags_all = {
          + "environment" = "production"
          + "managed-by"  = "terraform"
          + "repo"        = "javaBin/platform"
          + "service"     = "platform"
          + "team"        = "platform"
        }

      + export {
          + export_arn = (known after apply)
          + name       = "javabin-cur"

          + data_query {
              + query_statement      = "SELECT identity_line_item_id, identity_time_interval, bill_invoice_id, bill_invoicing_entity, bill_billing_entity, bill_bill_type, bill_payer_account_id, bill_billing_period_start_date, bill_billing_period_end_date, line_item_usage_account_id, line_item_line_item_type, line_item_usage_start_date, line_item_usage_end_date, line_item_product_code, line_item_usage_type, line_item_operation, line_item_availability_zone, line_item_resource_id, line_item_usage_amount, line_item_normalization_factor, line_item_normalized_usage_amount, line_item_currency_code, line_item_unblended_rate, line_item_unblended_cost, line_item_blended_rate, line_item_blended_cost, line_item_line_item_description, product_product_name, product_region, pricing_unit, pricing_public_on_demand_cost, pricing_public_on_demand_rate, pricing_term, pricing_offering_class, resource_tags_user_team, resource_tags_user_service, resource_tags_user_environment, resource_tags_user_repo, resource_tags_user_managed_by FROM COST_AND_USAGE_REPORT"
              + table_configurations = {
                  + "COST_AND_USAGE_REPORT" = {
                      + "INCLUDE_MANUAL_DISCOUNT_COMPATIBILITY" = "FALSE"
                      + "INCLUDE_RESOURCES"                     = "TRUE"
                      + "INCLUDE_SPLIT_COST_ALLOCATION_DATA"    = "FALSE"
                      + "TIME_GRANULARITY"                      = "DAILY"
                    }
                }
            }

          + destination_configurations {
              + s3_destination {
                  + s3_bucket = (known after apply)
                  + s3_prefix = "cur"
                  + s3_region = "eu-central-1"

                  + s3_output_configurations {
                      + compression = "PARQUET"
                      + format      = "PARQUET"
                      + output_type = "CUSTOM"
                      + overwrite   = "OVERWRITE_REPORT"
                    }
                }
            }

          + refresh_cadence {
              + frequency = "SYNCHRONOUS"
            }
        }
    }

  # module.cost_analytics.aws_budgets_budget.account will be created
  + resource "aws_budgets_budget" "account" {
      + account_id        = (known after apply)
      + arn               = (known after apply)
      + budget_type       = "COST"
      + id                = (known after apply)
      + limit_amount      = "500"
      + limit_unit        = "USD"
      + name              = "javabin-account-monthly"
      + name_prefix       = (known after apply)
      + tags_all          = {
          + "environment" = "production"
          + "managed-by"  = "terraform"
          + "repo"        = "javaBin/platform"
          + "service"     = "platform"
          + "team"        = "platform"
        }
      + time_period_end   = "2087-06-15_00:00"
      + time_period_start = (known after apply)
      + time_unit         = "MONTHLY"

      + notification {
          + comparison_operator        = "GREATER_THAN"
          + notification_type          = "ACTUAL"
          + subscriber_email_addresses = []
          + subscriber_sns_topic_arns  = [
              + "arn:aws:sns:eu-central-1:553637109631:javabin-alerts",
            ]
          + threshold                  = 100
          + threshold_type             = "PERCENTAGE"
        }
      + notification {
          + comparison_operator        = "GREATER_THAN"
          + notification_type          = "ACTUAL"
          + subscriber_email_addresses = []
          + subscriber_sns_topic_arns  = [
              + "arn:aws:sns:eu-central-1:553637109631:javabin-alerts",
            ]
          + threshold                  = 80
          + threshold_type             = "PERCENTAGE"
        }
    }

  # module.cost_analytics.aws_cloudwatch_metric_alarm.billing will be created
  + resource "aws_cloudwatch_metric_alarm" "billing" {
      + actions_enabled                       = true
      + alarm_actions                         = (known after apply)
      + alarm_description                     = "Account estimated charges exceeded ${var.billing_alarm_threshold_usd}"
      + alarm_name                            = "javabin-billing-alarm"
      + arn                                   = (known after apply)
      + comparison_operator                   = "GreaterThanThreshold"
      + dimensions                            = {
          + "Currency" = "USD"
        }
      + evaluate_low_sample_count_percentiles = (known after apply)
      + evaluation_periods                    = 1
      + id                                    = (known after apply)
      + metric_name                           = "EstimatedCharges"
      + namespace                             = "AWS/Billing"
      + ok_actions                            = (known after apply)
      + period                                = 21600
      + statistic                             = "Maximum"
      + tags_all                              = {
          + "environment" = "production"
          + "managed-by"  = "terraform"
          + "repo"        = "javaBin/platform"
          + "service"     = "platform"
          + "team"        = "platform"
        }
      + threshold                             = 200
      + treat_missing_data                    = "missing"
    }

  # module.cost_analytics.aws_glue_catalog_database.cur will be created
  + resource "aws_glue_catalog_database" "cur" {
      + arn          = (known after apply)
      + catalog_id   = (known after apply)
      + id           = (known after apply)
      + location_uri = (known after apply)
      + name         = "javabin_cur"
      + tags_all     = {
          + "environment" = "production"
          + "managed-by"  = "terraform"
          + "repo"        = "javaBin/platform"
          + "service"     = "platform"
          + "team"        = "platform"
        }
    }

  # module.cost_analytics.aws_glue_crawler.cur will be created
  + resource "aws_glue_crawler" "cur" {
      + arn           = (known after apply)
      + configuration = jsonencode(
            {
              + Grouping = {
                  + TableGroupingPolicy = "CombineCompatibleSchemas"
                }
              + Version  = 1
            }
        )
      + database_name = "javabin_cur"
      + id            = (known after apply)
      + name          = "javabin-cur-crawler"
      + role          = (known after apply)
      + schedule      = "cron(0 6 * * ? *)"
      + tags_all      = {
          + "environment" = "production"
          + "managed-by"  = "terraform"
          + "repo"        = "javaBin/platform"
          + "service"     = "platform"
          + "team"        = "platform"
        }

      + s3_target {
          + path = (known after apply)
        }

      + schema_change_policy {
          + delete_behavior = "DELETE_FROM_DATABASE"
          + update_behavior = "UPDATE_IN_DATABASE"
        }
    }

  # module.cost_analytics.aws_iam_role.cur_crawler will be created
  + resource "aws_iam_role" "cur_crawler" {
      + arn                   = (known after apply)
      + assume_role_policy    = jsonencode(
            {
              + Statement = [
                  + {
                      + Action    = "sts:AssumeRole"
                      + Effect    = "Allow"
                      + Principal = {
                          + Service = "glue.amazonaws.com"
                        }
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + create_date           = (known after apply)
      + force_detach_policies = false
      + id                    = (known after apply)
      + managed_policy_arns   = (known after apply)
      + max_session_duration  = 3600
      + name                  = "javabin-cur-crawler"
      + name_prefix           = (known after apply)
      + path                  = "/"
      + tags_all              = {
          + "environment" = "production"
          + "managed-by"  = "terraform"
          + "repo"        = "javaBin/platform"
          + "service"     = "platform"
          + "team"        = "platform"
        }
      + unique_id             = (known after apply)
    }

  # module.cost_analytics.aws_iam_role_policy.cur_crawler_s3 will be created
  + resource "aws_iam_role_policy" "cur_crawler_s3" {
      + id          = (known after apply)
      + name        = "javabin-cur-crawler-s3"
      + name_prefix = (known after apply)
      + policy      = (known after apply)
      + role        = (known after apply)
    }

  # module.cost_analytics.aws_iam_role_policy_attachment.cur_crawler_glue will be created
  + resource "aws_iam_role_policy_attachment" "cur_crawler_glue" {
      + id         = (known after apply)
      + policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
      + role       = "javabin-cur-crawler"
    }

  # module.cost_analytics.aws_s3_bucket.athena_results will be created
  + resource "aws_s3_bucket" "athena_results" {
      + acceleration_status         = (known after apply)
      + acl                         = (known after apply)
      + arn                         = (known after apply)
      + bucket                      = "javabin-athena-results-553637109631"
      + bucket_domain_name          = (known after apply)
      + bucket_prefix               = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = false
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + object_lock_enabled         = (known after apply)
      + policy                      = (known after apply)
      + region                      = (known after apply)
      + request_payer               = (known after apply)
      + tags_all                    = {
          + "environment" = "production"
          + "managed-by"  = "terraform"
          + "repo"        = "javaBin/platform"
          + "service"     = "platform"
          + "team"        = "platform"
        }
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)
    }

  # module.cost_analytics.aws_s3_bucket.cur_data will be created
  + resource "aws_s3_bucket" "cur_data" {
      + acceleration_status         = (known after apply)
      + acl                         = (known after apply)
      + arn                         = (known after apply)
      + bucket                      = "javabin-cur-553637109631"
      + bucket_domain_name          = (known after apply)
      + bucket_prefix               = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = false
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + object_lock_enabled         = (known after apply)
      + policy                      = (known after apply)
      + region                      = (known after apply)
      + request_payer               = (known after apply)
      + tags_all                    = {
          + "environment" = "production"
          + "managed-by"  = "terraform"
          + "repo"        = "javaBin/platform"
          + "service"     = "platform"
          + "team"        = "platform"
        }
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)
    }

  # module.cost_analytics.aws_s3_bucket_lifecycle_configuration.athena_results will be created
  + resource "aws_s3_bucket_lifecycle_configuration" "athena_results" {
      + bucket                                 = (known after apply)
      + expected_bucket_owner                  = (known after apply)
      + id                                     = (known after apply)
      + transition_default_minimum_object_size = "all_storage_classes_128K"

      + rule {
          + id     = "expire-query-results"
          + status = "Enabled"

          + expiration {
              + days                         = 7
              + expired_object_delete_marker = false
            }

          + filter {
            }
        }
    }

  # module.cost_analytics.aws_s3_bucket_lifecycle_configuration.cur_data will be created
  + resource "aws_s3_bucket_lifecycle_configuration" "cur_data" {
      + bucket                                 = (known after apply)
      + expected_bucket_owner                  = (known after apply)
      + id                                     = (known after apply)
      + transition_default_minimum_object_size = "all_storage_classes_128K"

      + rule {
          + id     = "archive-and-expire"
          + status = "Enabled"

          + expiration {
              + days                         = 365
              + expired_object_delete_marker = false
            }

          + filter {
            }

          + transition {
              + days          = 90
              + storage_class = "STANDARD_IA"
            }
        }
    }

  # module.cost_analytics.aws_s3_bucket_policy.cur_data will be created
  + resource "aws_s3_bucket_policy" "cur_data" {
      + bucket = (known after apply)
      + id     = (known after apply)
      + policy = (known after apply)
    }

  # module.cost_analytics.aws_s3_bucket_public_access_block.athena_results will be created
  + resource "aws_s3_bucket_public_access_block" "athena_results" {
      + block_public_acls       = true
      + block_public_policy     = true
      + bucket                  = (known after apply)
      + id                      = (known after apply)
      + ignore_public_acls      = true
      + restrict_public_buckets = true
    }

  # module.cost_analytics.aws_s3_bucket_public_access_block.cur_data will be created
  + resource "aws_s3_bucket_public_access_block" "cur_data" {
      + block_public_acls       = true
      + block_public_policy     = true
      + bucket                  = (known after apply)
      + id                      = (known after apply)
      + ignore_public_acls      = true
      + restrict_public_buckets = true
    }

  # module.cost_analytics.aws_s3_bucket_server_side_encryption_configuration.athena_results will be created
  + resource "aws_s3_bucket_server_side_encryption_configuration" "athena_results" {
      + bucket = (known after apply)
      + id     = (known after apply)

      + rule {
          + apply_server_side_encryption_by_default {
              + sse_algorithm = "AES256"
            }
        }
    }

  # module.cost_analytics.aws_s3_bucket_server_side_encryption_configuration.cur_data will be created
  + resource "aws_s3_bucket_server_side_encryption_configuration" "cur_data" {
      + bucket = (known after apply)
      + id     = (known after apply)

      + rule {
          + apply_server_side_encryption_by_default {
              + sse_algorithm = "AES256"
            }
        }
    }

  # module.cost_analytics.aws_sns_topic.billing_alarm will be created
  + resource "aws_sns_topic" "billing_alarm" {
      + arn                         = (known after apply)
      + beginning_archive_time      = (known after apply)
      + content_based_deduplication = false
      + fifo_throughput_scope       = (known after apply)
      + fifo_topic                  = false
      + id                          = (known after apply)
      + name                        = "javabin-billing-alarm"
      + name_prefix                 = (known after apply)
      + owner                       = (known after apply)
      + policy                      = (known after apply)
      + signature_version           = (known after apply)
      + tags_all                    = {
          + "environment" = "production"
          + "managed-by"  = "terraform"
          + "repo"        = "javaBin/platform"
          + "service"     = "platform"
          + "team"        = "platform"
        }
      + tracing_config              = (known after apply)
    }

  # module.lambdas.aws_iam_role_policy.cost_report will be updated in-place
  ~ resource "aws_iam_role_policy" "cost_report" {
        id     = "javabin-cost-report:javabin-cost-report"
        name   = "javabin-cost-report"
      ~ policy = jsonencode(
            {
              - Statement = [
                  - {
                      - Action   = "ssm:GetParameter"
                      - Effect   = "Allow"
                      - Resource = "arn:aws:ssm:eu-central-1:553637109631:parameter/javabin/slack/*"
                      - Sid      = "SSMRead"
                    },
                  - {
                      - Action   = [
                          - "ce:GetCostAndUsage",
                        ]
                      - Effect   = "Allow"
                      - Resource = "*"
                      - Sid      = "CostExplorer"
                    },
                  - {
                      - Action   = [
                          - "bedrock:InvokeModel",
                          - "bedrock:Converse",
                        ]
                      - Effect   = "Allow"
                      - Resource = "arn:aws:bedrock:eu-central-1:553637109631:inference-profile/eu.anthropic.*"
                      - Sid      = "BedrockInferenceProfile"
                    },
                  - {
                      - Action    = [
                          - "bedrock:InvokeModel",
                          - "bedrock:Converse",
                        ]
                      - Condition = {
                          - StringLike = {
                              - "bedrock:InferenceProfileArn" = "arn:aws:bedrock:eu-central-1:553637109631:inference-profile/eu.anthropic.*"
                            }
                        }
                      - Effect    = "Allow"
                      - Resource  = "arn:aws:bedrock:eu-*::foundation-model/anthropic.*"
                      - Sid       = "BedrockFoundationModels"
                    },
                ]
              - Version   = "2012-10-17"
            }
        ) -> (known after apply)
        # (1 unchanged attribute hidden)
    }

  # module.lambdas.aws_iam_role_policy.daily_cost_check will be updated in-place
  ~ resource "aws_iam_role_policy" "daily_cost_check" {
        id     = "javabin-daily-cost-check:javabin-daily-cost-check"
        name   = "javabin-daily-cost-check"
      ~ policy = jsonencode(
            {
              - Statement = [
                  - {
                      - Action   = "ssm:GetParameter"
                      - Effect   = "Allow"
                      - Resource = "arn:aws:ssm:eu-central-1:553637109631:parameter/javabin/slack/*"
                      - Sid      = "SSMRead"
                    },
                  - {
                      - Action   = "ce:GetCostAndUsage"
                      - Effect   = "Allow"
                      - Resource = "*"
                      - Sid      = "CostExplorer"
                    },
                ]
              - Version   = "2012-10-17"
            }
        ) -> (known after apply)
        # (1 unchanged attribute hidden)
    }

  # module.lambdas.aws_lambda_function.cost_report will be updated in-place
  ~ resource "aws_lambda_function" "cost_report" {
        id                             = "javabin-cost-report"
      ~ last_modified                  = "2026-03-26T19:54:44.138+0000" -> (known after apply)
      ~ source_code_hash               = "esY+QEmRWxvAf+1NPbBdjCkq3eLoOeCDHEj67n8xG9Q=" -> "snN/E6iutC00t7GbuCV4zAizu7uDzNr/Zukx9Ta4fxw="
        tags                           = {}
      ~ timeout                        = 60 -> 120
        # (20 unchanged attributes hidden)

      ~ environment {
          ~ variables = {
              + "ATHENA_WORKGROUP"   = "javabin-cost-analytics"
              + "CUR_DATABASE"       = "javabin_cur"
              + "CUR_TABLE"          = "javabin_cur"
                # (2 unchanged elements hidden)
            }
        }

        # (3 unchanged blocks hidden)
    }

  # module.lambdas.aws_lambda_function.daily_cost_check will be updated in-place
  ~ resource "aws_lambda_function" "daily_cost_check" {
        id                             = "javabin-daily-cost-check"
      ~ last_modified                  = "2026-03-26T19:54:44.138+0000" -> (known after apply)
      ~ source_code_hash               = "BJb5rFoBUKg2rYSX/YVv30vdN5IwFS0S+RbcroWezXA=" -> "JJxy0PHHV4EJlt8QX2hACxRSBlHTNpMQJMTqeG/IvG4="
        tags                           = {}
      ~ timeout                        = 60 -> 90
        # (20 unchanged attributes hidden)

      ~ environment {
          ~ variables = {
              + "ATHENA_WORKGROUP"   = "javabin-cost-analytics"
              + "CUR_DATABASE"       = "javabin_cur"
              + "CUR_TABLE"          = "javabin_cur"
                # (1 unchanged element hidden)
            }
        }

        # (3 unchanged blocks hidden)
    }

Plan: 19 to add, 4 to change, 0 to destroy.

─────────────────────────────────────────────────────────────────────────────

Saved the plan to: tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "tfplan"

LLM Review

Risk: 🟢 LOW

Adding cost analytics infrastructure (Athena, Glue, S3, budgets) and updating Lambda functions with new cost reporting capabilities.

  • [routine] Lambda function updates: cost_report timeout increased 60→120s, daily_cost_check timeout increased 60→90s, and source code hashes updated. These are standard function updates with no breaking changes.
  • 💰 [cost] New billable resources being created: AWS Athena workgroup, Glue crawler, 2 S3 buckets, AWS Budgets, and CloudWatch billing alarm. Estimated monthly cost impact: ~$5-15 for Athena queries + S3 storage + Glue crawler runs.
  • [routine] S3 bucket security properly configured: public access blocks enabled on both buckets, server-side encryption (AES256) enabled, lifecycle policies set (7-day expiration for Athena results; 90-day transition to STANDARD_IA and 365-day expiration for CUR data).
  • [routine] IAM permissions expanded for cost_report and daily_cost_check Lambda functions to include Athena, Glue database, and CUR table access. Permissions are scoped appropriately with no overly broad wildcards.
  • [routine] Budget alert configured with $500 monthly limit and SNS notifications at 80% and 100% thresholds. No dangerous changes to existing infrastructure.

@Alexanderamiri Alexanderamiri merged commit 22f0f73 into main Mar 26, 2026
3 checks passed
@Alexanderamiri Alexanderamiri deleted the feat/cur-cost-analytics branch March 26, 2026 21:50
Alexanderamiri added a commit that referenced this pull request Mar 26, 2026
## Summary

Fixes two CI failures from the CUR 2.0 deployment (#103):

1. **CUR 2.0 query**: use `SELECT *` instead of explicit column names,
because CUR 2.0 column names differ from legacy CUR.
`table_configurations` already controls what's included.
2. **Crawler IAM role**: added `permissions_boundary`, because the org
boundary requires all roles to carry a permissions boundary.

## Test plan

- [ ] CI plan + apply succeeds
- [ ] CUR export created in us-east-1
- [ ] Glue crawler role created with boundary
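
A missing permissions boundary like the one fixed here could in principle be caught at plan time. The sketch below is an assumption about how such a check might look, not something this repository does: it reads the JSON form of a plan (as produced by `terraform show -json tfplan`) and lists IAM roles that are created or updated without a `permissions_boundary`. The function name `roles_missing_boundary` is hypothetical.

```python
from typing import Any, Dict, List


def roles_missing_boundary(plan: Dict[str, Any]) -> List[str]:
    """Return addresses of aws_iam_role resources planned without a permissions boundary."""
    missing = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_iam_role":
            continue
        change = rc.get("change", {})
        actions = change.get("actions", [])
        if "create" not in actions and "update" not in actions:
            continue
        after = change.get("after") or {}
        # Values only known after apply are reported in after_unknown instead of after.
        unknown = change.get("after_unknown") or {}
        if not after.get("permissions_boundary") and not unknown.get("permissions_boundary"):
            missing.append(rc["address"])
    return missing
```

In CI this could run between plan and apply, failing the job when the returned list is non-empty.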
Alexanderamiri added a commit that referenced this pull request May 9, 2026
Alexanderamiri added a commit that referenced this pull request May 9, 2026