Dropped metrics from ddmd and mgd #10552

@jmcarp

Description

Oximeter is currently dropping a subsample of metrics on the colo rack:

Image

Investigating further, these failures all come from two producer ports, 8001 and 4677:

Image

These metrics come from ddmd and mgd, and sure enough, there are gaps (note that these metrics should be sampled every 1s):

Image

I think this means that collecting from these producers often takes longer than 1s. I think we should either figure out why and fix it, figure out how oximeter can better tolerate collection latencies greater than the sampling interval, or raise the sampling interval.
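To quantify the gaps, one option is to scan the sample timestamps of an affected timeseries and flag every pair of consecutive samples spaced noticeably further apart than the 1s interval. The sketch below is a rough illustration only: `find_gaps`, its `tolerance` parameter, and the example timestamps are all hypothetical and are not part of oximeter or its query API.

```python
from typing import List, Tuple


def find_gaps(
    timestamps: List[float],
    interval: float,
    tolerance: float = 0.5,
) -> List[Tuple[float, float, int]]:
    """Return (start, end, missed) for each gap in a sampled series.

    A gap is any pair of consecutive samples spaced more than
    interval * (1 + tolerance) seconds apart. `missed` estimates how
    many samples should have landed between them.
    """
    gaps = []
    ordered = sorted(timestamps)
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur - prev
        if delta > interval * (1 + tolerance):
            missed = round(delta / interval) - 1
            gaps.append((prev, cur, missed))
    return gaps


# Made-up timestamps (seconds) for a series that should be sampled every 1s.
samples = [0.0, 1.0, 2.0, 5.0, 6.0, 8.0]
gaps = find_gaps(samples, interval=1.0)
for start, end, missed in gaps:
    print(f"gap from {start}s to {end}s, about {missed} sample(s) missing")
```

Running this over timestamps exported for the ddmd and mgd series would show whether the gaps cluster around particular times or are spread evenly, which may help separate a slow producer from a collector-side problem.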
