Add CUR 2.0 cost analytics and billing protection #103
Merged
- New cost-analytics module: CUR 2.0 export (daily Parquet with resource IDs), Glue database + crawler, Athena workgroup, CloudWatch billing alarm ($200), account-level budget with 80%/100% notifications ($500)
- Shared `athena.py` utility for synchronous Athena queries with graceful degradation
- Weekly cost report: resource-level drilldown (top 10 resources), enriched LLM narrative
- Daily spike check: per-spike resource breakdown (top 5 resources per spiking service)
- Lambda IAM policies updated for Athena/Glue/S3 access
- Lambda timeouts increased (cost-report 60→120s, daily-check 60→90s)
Terraform Plan
🚧 Changes detected. Plan: 19 to add, 4 to change, 0 to destroy.

LLM Review
Risk: 🟢 LOW
Adding cost analytics infrastructure (Athena, Glue, S3, budgets) and updating Lambda functions with new cost reporting capabilities.
Alexanderamiri added a commit that referenced this pull request on Mar 26, 2026
## Summary
Fixes two CI failures from the CUR 2.0 deployment (#103):
1. **CUR 2.0 query**: `SELECT *` instead of explicit column names, since CUR 2.0 column names differ from legacy CUR. `table_configurations` already controls what's included.
2. **Crawler IAM role**: Added `permissions_boundary`, since the org boundary requires all roles to carry a boundary.
## Test plan
- [ ] CI plan + apply succeeds
- [ ] CUR export created in us-east-1
- [ ] Glue crawler role created with boundary
Alexanderamiri added a commit that referenced this pull request on May 9, 2026
## Summary
- **CUR 2.0 export** (daily Parquet with resource IDs) via
`aws_bcmdataexports_export` — enables resource-level cost attribution
("this specific S3 bucket cost $X")
- **Athena workgroup + Glue crawler** for querying CUR data from Lambdas
and CLI
- **Billing protection**: CloudWatch billing alarm ($200 threshold,
~4-6h delay) + account-level budget with 80%/100% notifications
($500/month)
- **Weekly cost report**: new resource-level drilldown section (top 10
resources by cost), enriched LLM narrative with resource context
- **Daily spike check**: per-spike resource breakdown showing top 5
resources for each spiking service
- **Shared `athena.py`**: reusable Athena query utility with graceful
degradation (reports still send if CUR data is unavailable)
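
The synchronous query with graceful degradation described in the last bullet could look roughly like the sketch below. It is not the PR's actual `athena.py`: the function name `run_query`, its parameters, and the "return `None` on any failure" convention are assumptions for illustration. The three client calls (`start_query_execution`, `get_query_execution`, `get_query_results`) are the standard boto3 Athena operations; the client is passed in so the function can be exercised without AWS access.

```python
import time


def run_query(client, sql, database, workgroup, timeout_s=60.0, poll_s=1.0):
    """Run an Athena query and wait for it; return rows as dicts, or None on failure.

    Hypothetical helper: `client` is expected to behave like boto3.client("athena").
    Only the first page of results is read, which is enough for top-N queries.
    """
    try:
        started = client.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": database},
            WorkGroup=workgroup,
        )
        query_id = started["QueryExecutionId"]

        # Poll until the query reaches a terminal state or the deadline passes.
        deadline = time.monotonic() + timeout_s
        while True:
            status = client.get_query_execution(QueryExecutionId=query_id)
            state = status["QueryExecution"]["Status"]["State"]
            if state == "SUCCEEDED":
                break
            if state in ("FAILED", "CANCELLED"):
                return None
            if time.monotonic() >= deadline:
                return None
            time.sleep(poll_s)

        result = client.get_query_results(QueryExecutionId=query_id)
        rows = result["ResultSet"]["Rows"]
        if not rows:
            return []
        # For SELECT queries the first row holds the column names.
        header = [cell.get("VarCharValue", "") for cell in rows[0]["Data"]]
        return [
            dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
            for row in rows[1:]
        ]
    except Exception:
        # Degrade gracefully: callers treat None as "CUR data unavailable".
        return None
```

A caller would pass `boto3.client("athena")` together with the `javabin_cur` database and the `javabin-cost-analytics` workgroup, and skip the resource section whenever the result is `None`.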
## New resources
| Resource | Purpose |
|----------|---------|
| S3 `javabin-cur-553637109631` | CUR 2.0 Parquet data (lifecycle: IA 90d, expire 365d) |
| S3 `javabin-athena-results-553637109631` | Athena query results (expire 7d) |
| `aws_bcmdataexports_export` | CUR 2.0 daily export with RESOURCES |
| Glue database `javabin_cur` | Catalog for CUR tables |
| Glue crawler `javabin-cur-crawler` | Daily 06:00 UTC schema discovery |
| Athena workgroup `javabin-cost-analytics` | 100 MB scan limit |
| CloudWatch alarm `javabin-billing-alarm` | EstimatedCharges > $200 (us-east-1) |
| Budget `javabin-account-monthly` | $500/month, alerts at 80%/100% |
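
To make the two billing guards in the table concrete, here is a small arithmetic sketch. The function names are hypothetical, and the strict greater-than comparison for the budget notifications is an assumption (the table only states it for the CloudWatch alarm).

```python
def crossed_budget_thresholds(spend_usd, budget_usd=500.0, percents=(80, 100)):
    """Return the budget notification percentages that the current spend exceeds.

    Assumes a strict greater-than comparison against budget * percent / 100.
    """
    return [p for p in percents if spend_usd > budget_usd * p / 100]


def billing_alarm_fires(estimated_charges_usd, threshold_usd=200.0):
    """Mirror the alarm condition from the table: EstimatedCharges > $200."""
    return estimated_charges_usd > threshold_usd
```

With the values from the table, the 80% notification corresponds to $400 and the 100% notification to $500, so the $200 alarm is the earliest of the three signals.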
## Notes
- CUR data takes 24-48h for first delivery after apply
- Glue crawler must run after first delivery before Athena queries work
- All CUR queries gracefully degrade — existing reports work unchanged
until data arrives
- Billing alarm SNS topic is in us-east-1 (CloudWatch billing
constraint). A Lambda forwarder to eu-central-1 Slack can be added
later.
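
The degradation behaviour in the notes above (reports work unchanged until CUR data arrives) can be sketched as follows. This is an illustration under assumptions, not the Lambda's real code: `build_report`, the section titles, and the `resource_id`/`cost` row keys are all hypothetical.

```python
def build_report(service_lines, resource_rows, top_n=10):
    """Assemble the report text; add the resource drilldown only when data exists.

    `resource_rows` is None (query failed or CUR not delivered yet), an empty
    list, or a list of dicts with hypothetical keys "resource_id" and "cost".
    """
    sections = ["*Cost by service*"]
    sections.extend(service_lines)

    # Without CUR data the report is sent exactly as before, minus the drilldown.
    if resource_rows:
        sections.append("*Top resources*")
        for row in resource_rows[:top_n]:
            sections.append(f"- {row['resource_id']}: ${float(row['cost']):.2f}")

    return "\n".join(sections)
```

Keeping the fallback in the report builder means the query helper only has to signal "no data", and every caller degrades the same way.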
## Test plan
- [ ] `terraform fmt -recursive && terraform validate` passes
- [ ] Apply via CI pipeline (plan → review → apply)
- [ ] After 24-48h: verify CUR Parquet files in
`s3://javabin-cur-553637109631/cur/`
- [ ] Verify Glue crawler populates table in `javabin_cur` database
- [ ] Run manual Athena query in `javabin-cost-analytics` workgroup
- [ ] Invoke `javabin-cost-report` Lambda, check Slack for resource
drilldown section
- [ ] Invoke `javabin-daily-cost-check` Lambda, verify resource context
on spikes
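
For the last test-plan item, the "resource context on spikes" amounts to a top-5 selection per spiking service. A minimal sketch, assuming the rows are already aggregated to one `(service, resource_id, cost)` tuple per resource; the function name and row shape are hypothetical:

```python
from collections import defaultdict


def top_resources_per_service(rows, spiking_services, n=5):
    """Return the n most expensive resources for each spiking service.

    `rows` is an iterable of (service, resource_id, cost) tuples with one row
    per resource; services that are not spiking are ignored.
    """
    by_service = defaultdict(list)
    for service, resource_id, cost in rows:
        if service in spiking_services:
            by_service[service].append((resource_id, cost))

    # Sort each service's resources by cost, highest first, and keep the top n.
    return {
        service: sorted(items, key=lambda item: item[1], reverse=True)[:n]
        for service, items in by_service.items()
    }
```

A spiking service with no matching rows is simply absent from the result, so the caller can post the spike alert without a breakdown.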
Alexanderamiri added a commit that referenced this pull request on May 9, 2026