Time inconsistencies heuristic for faulty measurements detection#147
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## main #147 +/- ##
==========================================
+ Coverage 82.77% 83.52% +0.74%
==========================================
Files 78 84 +6
Lines 4871 5068 +197
==========================================
+ Hits 4032 4233 +201
+ Misses 839 835 -4
hellais left a comment
This PR looks good. I left a comment on how to improve the threshold checks for measurements from time travelers.
WHERE
    measurement_start_time >= %(start_time)s AND
    measurement_start_time < %(end_time)s AND
    abs(dateDiff('second', parseDateTimeBestEffort(substring(measurement_uid, 1, 15)), measurement_start_time)) >= %(treshold)s
I think we should check measurements from the future separately from measurements from the past. Measurements from the future should never happen and are a sign of a probe with a faulty clock. Measurements from the past may be normal, since probes might be re-uploading measurements later.
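A minimal sketch of what separate checks could look like, written in plain Python rather than in the ClickHouse query. The function name, the two threshold parameters, and the `"future"`/`"past"` labels are hypothetical and not part of the PR. It assumes the timestamp parsed from the `measurement_uid` prefix is the reference time, and that a measurement is "from the future" when `measurement_start_time` is later than that reference.

```python
from datetime import datetime, timedelta
from typing import Optional


def classify_time_inconsistency(
    uid_time: datetime,
    measurement_start_time: datetime,
    future_threshold: timedelta,
    past_threshold: timedelta,
) -> Optional[str]:
    """Return "future", "past", or None (hypothetical helper, not from the PR)."""
    # Same sign convention as dateDiff('second', uid_time, measurement_start_time):
    # positive when measurement_start_time is later than the uid timestamp.
    delta = measurement_start_time - uid_time
    if delta >= future_threshold:
        # Assumed to indicate a faulty probe clock, so the threshold can be strict.
        return "future"
    if -delta >= past_threshold:
        # Late uploads may be legitimate, so this threshold can be more lenient.
        return "past"
    return None


uid_time = datetime(2024, 1, 1, 12, 0, 0)
one_hour = timedelta(hours=1)
one_week = timedelta(days=7)

ahead = classify_time_inconsistency(
    uid_time, uid_time + timedelta(hours=2), one_hour, one_week
)
slightly_behind = classify_time_inconsistency(
    uid_time, uid_time - timedelta(hours=2), one_hour, one_week
)
far_behind = classify_time_inconsistency(
    uid_time, uid_time - timedelta(days=10), one_hour, one_week
)
```

With two independent thresholds, a strict limit on future timestamps does not force the same limit onto late uploads. The threshold values used here are placeholders for illustration only.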
    "threshold": threshold,
}

values.append(("time_inconsistency", probe_cc, probe_asn, orjson.dumps(details).decode()))
To facilitate querying and analysis, we should maybe have two keys here, one for future and one for past measurements, so we can look at them separately.
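One way the two keys could be produced is sketched below. The helper name and the key names `time_inconsistency_future` and `time_inconsistency_past` are assumptions for illustration and are not taken from the PR. The sketch uses the standard-library `json` module so it runs without extra dependencies, whereas the PR itself serializes with `orjson`.

```python
import json
from typing import Any, Dict, Tuple


def build_time_inconsistency_row(
    direction: str,
    probe_cc: str,
    probe_asn: int,
    details: Dict[str, Any],
) -> Tuple[str, str, int, str]:
    """Build one row for the values list (hypothetical helper, not from the PR)."""
    if direction not in ("future", "past"):
        raise ValueError(f"unexpected direction: {direction!r}")
    # Assumed key names: one type per direction so each can be queried on its own.
    event_type = f"time_inconsistency_{direction}"
    return (event_type, probe_cc, probe_asn, json.dumps(details))


values = []
values.append(
    build_time_inconsistency_row("future", "IT", 1234, {"threshold": 3600})
)
values.append(
    build_time_inconsistency_row("past", "IT", 1234, {"threshold": 604800})
)
```

Rows built this way can then be filtered by the type column alone, without parsing the JSON details.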
        )
    )
    res = db.execute("SELECT COUNT() FROM faulty_measurements WHERE type = 'volume'")
    assert res == [(1,)], "There should be at the least one event"
Shouldn't the assert say "exactly one measurement"?
Implements an Airflow DAG that flags measurements with timestamp anomalies.
Closes #146