Add a metric to track job creation to pod creation time.#13
Add a metric to track job creation to pod creation time.#13
Conversation
|
@microsoft-github-policy-service agree |
| jobStateEntries: measurementutil.NewObjectTransitionTimes(jobLifecycleLatencyMeasurementName), | ||
| eventQueue: workqueue.New(), | ||
| podCreationTime: measurementutil.NewPodCreationEventTimes(), | ||
| eventTicker: time.NewTicker(time.Minute), |
There was a problem hiding this comment.
This means that the measurement will run very 1 minute to collect data right? I wonder if we should do second for more fine-grained result? Or allow users to customize the frequency with a variable?
There was a problem hiding this comment.
Correct me if I'm wrong, but the events are stored in ETCD for a while, every time we list, it will return all the events in its cache. As long as the interval is smaller than the life span the cache, it should not miss anything.
There was a problem hiding this comment.
Talked to Anson, I'm working on a new version that uses informer of pods instead of ListEvents.
| } | ||
| } | ||
|
|
||
| time.Sleep(2 * time.Second) |
There was a problem hiding this comment.
Why do we have to wait for 2 seconds in the test?
There was a problem hiding this comment.
The ticker is set to 1 second. It sleeps for 2 seconds so that at least 1 tick is done.
| } | ||
| gotP50 := getMetric(createToPodStart, "Perc50") | ||
| if gotP50 != 3*time.Minute { | ||
| t.Errorf("Expect create_to_pod_start Perc50 = 3 minutes, got %v", gotP50) |
There was a problem hiding this comment.
The majority of measurement reports time in second so you might want to consider second instead of minute to align with upstream
There was a problem hiding this comment.
"Second" is tested in Line 170, right? Are you asking for every test case to use seconds as unit?
| } | ||
|
|
||
| // ListEvents retrieves events for the object with the given name. | ||
| func ListEvents(c clientset.Interface, namespace string, name string, options ...*APICallOptions) (obj *apiv1.EventList, err error) { |
There was a problem hiding this comment.
can we change the order of these two functions, it will make resolve conflicts easier
| p.addEvent, | ||
| ) | ||
| go p.processEvents() | ||
| go measurementutil.RunEveryTick(p.eventTicker, p.getFuncToListJobEvents(c), p.stopCh) |
There was a problem hiding this comment.
or do we have to measure periodically? or it can be event driven
What type of PR is this?
/kind feature
What this PR does / why we need it:
Adds a metric to measure latency from job creation to pod creation in JobLifecycleLatency.
Which issue(s) this PR fixes:
n/a
Special notes for your reviewer:
Passed unit tests