Skip to content

Usage Server repeatedly reprocesses historical usage and creates duplicate cloud_usage records for usage_type=1 #13399

@mwa-sudo

Description

@mwa-sudo

problem

We are experiencing what appears to be the same issue described in #13112, however in our environment it affects usage_type = 1 (default VM usage) instead of the usage type reported in the original issue.
The Usage Server repeatedly reprocesses historical usage records from a fixed timestamp and continuously inserts duplicate records into the cloud_usage table.

observed behaviour

The Usage Server repeatedly processes the same historical records and generates duplicate rows in cloud_usage.

  • start date repeatedly reprocessed: 2026-05-29T22:00:00+0200

relation to #13112

This issue appears very similar to #13112.

Differences:

Because the symptoms are effectively identical, it indicates that the issue still persists in newer versions.

evidence

Relevant Usage Server logs:

During the period where duplicate usage records are generated, the Usage Server repeatedly processes historical time ranges and emits SQL integrity constraint violations.
A log snippet is attached. VM and account names have been redacted from the logs for data protection purposes.

_usage.2026-05-28.log

A representative example is shown below:
2026-05-29 23:15:00,005 INFO Parsing usage records between [2026-05-29T20:00:00+0000] and [2026-05-29T20:59:59+0000] ERROR: Duplicate entry '570-2026-05-29 20:50:40' for key 'id' java.sql.SQLIntegrityConstraintViolationException WARN: Failed to create usage event id: 1703940 type: VOLUME.CREATE due to Entity already exists
The exception indicates that the Usage Server is attempting to insert records that already exist. This appears to coincide with the repeated processing of historical usage intervals and the continuous growth of duplicate usage records described above.

To quantify the issue, we wrote a Python script that hashes usage records and counts identical entries for a single virtual machine grouped by hour.
The expected result is exactly one record per hourly interval (count = 1).
This expectation is met until 2026-05-29 22:00, after which duplicate records begin appearing. The duplication follows a distinct pattern:

  • 2026-05-29 22:00 → 204 identical records
  • 2026-05-29 23:00 → 203 identical records
  • 2026-05-30 00:00 → 202 identical records
  • 2026-05-30 01:00 → 201 identical records

The pattern continues with each subsequent hourly bucket containing one fewer duplicate than the previous bucket.
This suggests that historical usage intervals are being repeatedly regenerated or reprocessed by the Usage Server. While the total number of records continues to increase over time, the number of duplicates associated with each successive hourly interval decreases by one.
Example output for a single VM:

{
"2026-05-29": {
"21": 1,
"22": 204,
"23": 203
},
"2026-05-30": {
"00": 202,
"01": 201,
"02": 200,
"03": 199
}
}

versions

OS: Ubuntu 24.04.4 LTS
Hypervisor: KVM, libvirt 10.0.0-2ubuntu8.13
Database: MariaDB 10.11.14-MariaDB-0ubuntu0.24.04.1-log Ubuntu 24.04
CloudStack: 4.22.1.0

The steps to reproduce the bug

The exact trigger is currently unknown. In our environment, the issue can be observed with the following setup:

  1. Enable the Usage Server in Apache CloudStack.
  2. Configure:
    • usage.stats.job.aggregation.range = 60
    • usage.stats.job.exec.time = 00:15
  3. Allow the Usage Server to run normally for an extended period.
  4. Observe the cloud_usage table and Usage Server logs.
  5. Duplicate usage records begin appearing and continue to accumulate over time.

What to do about it?

One observation that might be relevant, is a start-timestamp on 2026-05-29 at 20:00:14 (UTC). This is the only differing usage record. All other duplicates are exactly the same (within VM, day and hour) and their start-timestamps are all for second "00". Without knowing the actual contexts, it does look as if this one existing usage record for second "14" kind of congests the system now. Meaning the server did create an entry for second "14". At the same time it tried to create an entry for second "00". This failed due to the constraint-violation by having the same ID as the entry for second "14". This is observable in the logs.

Starting the following hour the system is indeed creating entries for all subsequent usage records including the one for the timestamp in question. But somehow maybe because of this one "failed" entry, the system does not recognize all subsequently created entries as valid/processed and recreates all of them again every hour.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions