Batch-level error rates for early shutdown #211

@johnnygreco

Priority Level

Medium

Task Summary

Our error-rate calculation for early shutdown in the ConcurrentThreadExecutor attempts to compute the error rate more or less in real time. This can massively overestimate the error rate, particularly for jobs with high concurrency, because tasks can fail faster than they succeed, so failures are overrepresented among the tasks that have completed so far.
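To make the overestimate concrete, here is a minimal, deterministic sketch. It is not the ConcurrentThreadExecutor code; the function name, the fixed task durations, and the assumption that every task starts at the same time (full concurrency) are all hypothetical and chosen purely for illustration.

```python
def running_error_rates(outcomes, fail_time=1.0, success_time=10.0):
    """Return the error rate observed after each task completion.

    Hypothetical model: every task starts at t=0, failures finish after
    `fail_time` and successes after `success_time`.
    `outcomes` is a list of bools where True means the task failed.
    """
    # Order tasks by the time at which they complete.
    completions = sorted(
        (fail_time if failed else success_time, failed) for failed in outcomes
    )
    rates = []
    errors = 0
    for done, (_, failed) in enumerate(completions, start=1):
        errors += int(failed)
        rates.append(errors / done)
    return rates


# 100 tasks with a true error rate of 20%.
outcomes = [i % 5 == 0 for i in range(100)]
rates = running_error_rates(outcomes)

print(f"rate after first completion: {rates[0]:.2f}")
print(f"rate after 20 completions:   {rates[19]:.2f}")
print(f"rate after all completions:  {rates[-1]:.2f}")
```

In this toy model every failure lands before any success, so a real-time check sees only failures until the first success arrives, even though the true rate is 20%.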

This issue proposes calculating error rates at the batch level instead (i.e., outside of ConcurrentThreadExecutor).

A couple of benefits of this approach:

  • The batch-level error rate will be a much more stable measurement at a consistent scale across jobs.
  • This will allow the early-shutdown mechanism to be applied to generators that do not support concurrency.

One downside is that you always have to wait for at least one batch to complete. This seems acceptable given that the batch size is adjustable and (for cases where this all matters) will generally be much smaller than the target number of records.
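A rough sketch of what a batch-level check could look like is below. This is only an illustration of the idea, not a proposed implementation: `BatchResult`, `run_batches`, `run_batch`, and `max_error_rate` are hypothetical names, and the choice to compare each completed batch's own error rate against the threshold (rather than a cumulative rate) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class BatchResult:
    """Hypothetical summary of one completed batch."""

    num_succeeded: int
    num_failed: int

    @property
    def error_rate(self) -> float:
        total = self.num_succeeded + self.num_failed
        return self.num_failed / total if total else 0.0


def run_batches(
    batches: Iterable[list],
    run_batch: Callable[[list], BatchResult],
    max_error_rate: float = 0.5,
) -> List[BatchResult]:
    """Run batches in order and stop early once a batch exceeds the threshold.

    The error rate is only inspected after a batch has fully completed,
    so it does not depend on how `run_batch` schedules work internally
    (threads or a plain sequential loop).
    """
    results: List[BatchResult] = []
    for batch in batches:
        result = run_batch(batch)
        results.append(result)
        if result.error_rate > max_error_rate:
            # Early shutdown: skip the remaining batches.
            break
    return results
```

Because the check lives outside the executor, the same loop would work for a generator that processes records one at a time, at the cost of always finishing at least one batch before shutting down.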
