Situation
Some workloads (for example, long‑running benchmarks like SPECjbb) have expected runtimes that are close to or exceed common experiment timeout values.
When the experiment timeout is set too low (e.g., 2 hours), Virtual Client correctly enforces the timeout and terminates the workload before it completes.
This behavior is by design and VC is functioning correctly.
What makes this confusing
Although the workload is canceled intentionally:
- The experiment and step may still appear as Succeeded
- No workload metrics or logs are emitted
- There is no explicit signal explaining that the workload was terminated due to timeout
From a user and downstream analytics perspective, this appears as:
“Successful experiment with missing data”
This leads to confusion, debugging churn, and false assumptions about data loss or ingestion bugs.
Why this is not user error
While users must configure timeouts longer than the expected workload runtime, there is currently no system feedback indicating that:
- The workload was cut short
- The reason was an enforced experiment timeout
- The resulting data is incomplete by design
Without this signal, users cannot reliably distinguish:
- Misconfiguration (timeout too short)
- Real workload failures
- Telemetry ingestion issues
Proposed fix (observability improvement)
Virtual Client should emit explicit telemetry when a workload is terminated due to experiment timeout, for example:
“Workload execution was canceled because Virtual Client enforced the experiment timeout.”
This should be surfaced as:
- A clear log/trace event
- A structured telemetry field (e.g., terminationReason = ExperimentTimeout)
- Optionally, a reflection in step status or annotations
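A minimal sketch of the proposed emission, in Python rather than Virtual Client's actual codebase; the `terminationReason` field name and `ExperimentTimeout` value are the suggested convention from above, not an existing VC API:

```python
import json
import subprocess

def run_workload(command, timeout_seconds):
    """Run a workload process, emitting an explicit structured event if the
    experiment timeout forces termination (hypothetical telemetry shape)."""
    event = {"event": "WorkloadCompleted", "terminationReason": None}
    try:
        subprocess.run(command, timeout=timeout_seconds, check=True)
    except subprocess.TimeoutExpired:
        # The observability fix: record *why* the workload stopped, instead
        # of silently ending with a Succeeded step and no metrics.
        event["event"] = "WorkloadCanceled"
        event["terminationReason"] = "ExperimentTimeout"
        event["message"] = (
            "Workload execution was canceled because Virtual Client "
            "enforced the experiment timeout."
        )
    print(json.dumps(event))
    return event

# Example: a 10-second workload against a 1-second timeout gets canceled,
# and the cancellation reason is captured in the emitted event.
result = run_workload(["sleep", "10"], timeout_seconds=1)
```

With an event like this in the trace stream, dashboards can distinguish "canceled by timeout" from "succeeded with missing data" without guessing.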
Key point
This is not a correctness bug in workload execution.
It is an observability and diagnosability gap.
The fix is not changing VC behavior, but making the reason for cancellation explicit and visible so users and dashboards can interpret results correctly.
===
As noted in the screenshot below, SPECjbb has a 2-hour timeout configured by the user, and the scenario runs for close to that long, but no metrics are produced. We need logs indicating that VC killed the process.
According to the [SPECjbb user guide](https://www.spec.org/jbb2015/docs/userguide.pdf), a full run takes about 2 hours, so we are hitting exactly that edge case.
```kusto
Traces
| where ExperimentId == "a0d3fd60-9144-4fde-aa3b-4526bd754aa0"
| where ProfileName == "PERF-SPECJBB.json"
| where * contains "error" or SeverityLevel > 2
| count // returns 0: no error traces were emitted for this run
```
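Once a termination-reason field exists, downstream triage can separate the three ambiguous cases mechanically instead of by guesswork. A hypothetical sketch, assuming trace events are available as dicts carrying the proposed `terminationReason` field (not an existing schema):

```python
def classify_run(traces, metrics_count):
    """Classify a run's outcome using the proposed terminationReason field.

    traces: list of dicts, one per trace event.
    metrics_count: number of workload metrics ingested for the run.
    Field names here follow the proposal above, not an existing schema.
    """
    if any(t.get("terminationReason") == "ExperimentTimeout" for t in traces):
        return "misconfiguration: timeout shorter than workload runtime"
    if any(t.get("SeverityLevel", 0) > 2 for t in traces):
        return "workload failure"
    if metrics_count == 0:
        return "possible telemetry ingestion issue"
    return "success"

# The SPECjbb case above: zero metrics, zero error traces. Today that is
# indistinguishable from an ingestion bug; with the proposed event it is
# immediately classified as a timeout misconfiguration.
verdict = classify_run(
    [{"terminationReason": "ExperimentTimeout"}], metrics_count=0
)
```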