Under which category would you file this issue?
Airflow Core
Apache Airflow version
3.2.1
What happened and how to reproduce it?
The Dashboard's Historical Metrics section displays running, succeeded, waiting, and failed task instances / Dag runs, each with a percentage bar. These percentages are calculated in the frontend by dividing each state's count by the sum of all state counts (the total).
The API endpoint "/ui/dashboard/historical_metrics_data" caps each state's count at STATE_COUNT_CAP = 1000 to bound the SQL query row count for performance reasons, so no state is ever reported with a count above 1000.
The bug: the total used to calculate the percentages is the sum of the capped API values. For example, if success has 2500 entries and failure has 7, the API returns 1000 for success, so the computed total is 1007 instead of 2507. With this wrong total, every displayed percentage is wrong.
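To make the arithmetic concrete, here is a minimal Python sketch of the calculation. It is an illustration only, not the actual frontend or API code: the `percentages` helper and the state names are assumptions made for this example, and only the cap value 1000 and the counts 2500 / 7 come from the report above.

```python
STATE_COUNT_CAP = 1000  # cap value named in this report


def percentages(counts):
    """Share of each state in percent, relative to the sum of all counts."""
    total = sum(counts.values())
    return {state: (100 * n / total if total else 0.0) for state, n in counts.items()}


# Hypothetical real counts, taken from the example in this report.
real = {"success": 2500, "failed": 7}

# What the endpoint would report once each state is capped.
capped = {state: min(n, STATE_COUNT_CAP) for state, n in real.items()}

real_pct = percentages(real)
capped_pct = percentages(capped)

for state in real:
    print(f"{state}: real {real_pct[state]:.2f}% vs. displayed {capped_pct[state]:.2f}%")
# success: real 99.72% vs. displayed 99.30%
# failed: real 0.28% vs. displayed 0.70%
```

In this example the failed share is overstated by a factor of roughly 2.5, and the error grows the further the real count exceeds the cap.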
What you think should happen instead?
The UI should show either the correct percentage or no percentage at all for capped values, but never a wrong percentage.
There are two possible ways to fix this:
- Return real counts from the API
- Hide percentages in the frontend when any state is capped
Returning the real counts from the API can be significantly slower than the current approach when there is a huge number of task instances. The per-state limit exists for a good reason: to ensure a fast response even on huge datasets. The downside of keeping the cap is that the frontend cannot compute correct percentages once any state exceeds it, because the true total is unknown.
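The second option (hiding percentages when any state is capped) could look roughly like the following Python sketch. This is not a proposed patch: `percentages_or_none` is a hypothetical helper, and the sketch assumes the frontend has no signal other than the returned count itself, so a count equal to the cap is treated as possibly capped.

```python
from typing import Optional

STATE_COUNT_CAP = 1000  # cap value named in this report


def percentages_or_none(
    counts: dict[str, int], cap: int = STATE_COUNT_CAP
) -> Optional[dict[str, float]]:
    """Return per-state percentages, or None if any count may have been capped."""
    # A count that reached the cap cannot be told apart from a larger real
    # count, so the total is unreliable and no percentage should be shown.
    if any(n >= cap for n in counts.values()):
        return None
    total = sum(counts.values())
    if total == 0:
        return {state: 0.0 for state in counts}
    return {state: 100 * n / total for state, n in counts.items()}


print(percentages_or_none({"success": 1000, "failed": 7}))  # None
print(percentages_or_none({"success": 3, "failed": 1}))  # {'success': 75.0, 'failed': 25.0}
```

One consequence of this design is that a state with exactly 1000 real entries would also lose its percentage. An explicit "capped" flag in the API response would avoid that ambiguity, at the cost of changing the response format.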
I would be happy to open a PR for this issue, but I would like to discuss these tradeoffs with the community first.
Best,
Stefan
Operating System
Ubuntu 24.04.4 LTS
Deployment
Virtualenv installation
Apache Airflow Provider(s)
No response
Versions of Apache Airflow Providers
No response
Official Helm Chart version
Not Applicable
Kubernetes Version
No response
Helm Chart configuration
No response
Docker Image customizations
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct