[agent] Add udev DB reader and FillFromCrawler for device map#239
[agent] Add udev DB reader and FillFromCrawler for device map#239fastrapier wants to merge 5 commits intofeat/udev-properties-devicemapfrom
Conversation
szhem
left a comment
There was a problem hiding this comment.
Code Review
Blockers: none. Two High-severity findings that should be addressed before merge; the rest can be follow-ups.
Summary
| Severity | Count |
|---|---|
| Critical | 0 |
| High | 2 |
| Medium | 4 |
| Low | 5 |
| Nit | 1 |
Positive Observations
- Clean mutex discipline: file I/O in
FillFromCrawlerhappens outside the lock; map swap is atomic undermu.Lock(). EnrichWithUdevDBdoes not mutate the inputenv— explicitly covered byTestEnrichWithUdevDB_DoesNotMutateOriginal.All()returns a shallow copy — aliasing race prevented and validated byTestAll_ReturnsCopy.- Sentinel
ErrUnknownActionfollows the Go guideline;errors.Isusage covered in tests. - Field-resolution priorities (
SCSI_IDENT_SERIAL → ID_SERIAL,ID_WWN_WITH_EXTENSION → ID_WWN,ID_MODEL_ENC → ID_MODEL) mirrorlsblkexactly — good behavioural compatibility with the existing tooling. unhexmangle+normalizeWhitespacecovered by honest tests, including malformed sequences (\xZZ,\x4).- Every branch of
HandleEventis exercised (add/change/remove/bind/unbind/move/online/offline/unknown + bad MAJOR/MINOR + missing fields) — exemplary coverage. TestDeviceMap_ConcurrentAccessnow collects goroutine errors instead of swallowing them (addresses prior review).
PR-level notes
- Tests (Nit):
TestFillFromCrawler_ReplacesAllchecks onlyLen()and absence of the old key — consider asserting that the new entries carry the expectedDevName/Modelso regressions in the fill path surface faster. - Tests (Nit):
TestHandleEvent_RemoveNonexistent_NoErrorcodifies silent remove-of-unknown. A one-line comment explaining this is intentional (kernel may emitremovefor devices the agent never saw) would help future readers.
| // DeviceMap is a thread-safe in-memory store of block device properties | ||
| // keyed by "major:minor". It is populated by netlink events (HandleEvent) | ||
| // and provides a consistent snapshot via All(). |
There was a problem hiding this comment.
Severity: Medium — Outdated godoc
After this PR the map is also populated by FillFromCrawler, but the doc still says it is only populated by netlink events. Readers of godoc will miss the bootstrap path.
Recommendation:
// DeviceMap is a thread-safe in-memory store of block device properties
// keyed by "major:minor". It is populated by FillFromCrawler at startup
// and by HandleEvent from netlink events thereafter. A consistent snapshot
// is available via All().There was a problem hiding this comment.
Fixed in cd4f80d. Godoc now mentions both FillFromCrawler and HandleEvent.
| // HandleEvent processes a single netlink uevent. The action string comes | ||
| // directly from the kernel (add, change, remove, bind, unbind, move, online, | ||
| // offline). The env map is the raw udev environment from the event. |
There was a problem hiding this comment.
Severity: Low — HandleEvent godoc does not mention ErrUnknownAction
Returning the sentinel is part of the contract — tests rely on errors.Is(err, ErrUnknownAction), but callers reading the godoc will not know they can differentiate this failure mode.
Recommendation: add one line:
// Returns ErrUnknownAction (wrapped) when the action is not one of the
// listed values; callers can check via errors.Is.
There was a problem hiding this comment.
Fixed in cd4f80d. Added ErrUnknownAction to HandleEvent godoc.
| // properties (MAJOR, MINOR, DEVNAME, DEVTYPE) without udev-enriched fields | ||
| // (serial, model, wwn). Devices that fail to parse are skipped and their | ||
| // errors are collected in the returned slice. | ||
| func (dm *DeviceMap) FillFromCrawler(devices []crawler.Device, udevDBPath string) []error { |
There was a problem hiding this comment.
Severity: Medium — udevDBPath as a method parameter, not a DeviceMap field
The path to the udev DB is an environment invariant, not a per-call detail. Every caller now has to know and thread DefaultRunUdevDataPath through. If a future change wants to enrich in HandleEvent as well, the signature has to change again.
Recommendation:
type DeviceMap struct {
mu sync.RWMutex
devices map[string]Properties
udevDBPath string
}
func NewDeviceMap(udevDBPath string) *DeviceMap {
return &DeviceMap{
devices: make(map[string]Properties),
udevDBPath: udevDBPath,
}
}
func (dm *DeviceMap) FillFromCrawler(devices []crawler.Device) []error { ... }Bonus: easier to mock in the PR 3 scanner tests — pass t.TempDir() into the constructor, no extra argument plumbing.
There was a problem hiding this comment.
Fixed in cd4f80d. udevDBPath is now a DeviceMap struct field, passed via NewDeviceMap(udevDBPath).
| // properties (MAJOR, MINOR, DEVNAME, DEVTYPE) without udev-enriched fields | ||
| // (serial, model, wwn). Devices that fail to parse are skipped and their | ||
| // errors are collected in the returned slice. | ||
| func (dm *DeviceMap) FillFromCrawler(devices []crawler.Device, udevDBPath string) []error { |
There was a problem hiding this comment.
Severity: Medium — FillFromCrawler does not accept a context.Context
The method performs synchronous file I/O on N devices with no timeout and no shutdown path. In PR 3 this runs on agent startup; if /run/udev/data/ is on slow storage or the pod is being terminated mid-start, the agent will hang.
Recommendation: prepare the signature now so PR 3 can wire a real context:
func (dm *DeviceMap) FillFromCrawler(ctx context.Context, devices []crawler.Device) []error {
for _, dev := range devices {
if err := ctx.Err(); err != nil {
return append(errs, err)
}
...
}
}There was a problem hiding this comment.
Fixed in cd4f80d. FillFromCrawler now accepts context.Context and checks ctx.Err() on each iteration.
| // errors are collected in the returned slice. | ||
| func (dm *DeviceMap) FillFromCrawler(devices []crawler.Device, udevDBPath string) []error { | ||
| newDevices := make(map[string]Properties, len(devices)) | ||
| var errs []error |
There was a problem hiding this comment.
Severity: Nit — Preallocate errs
Upper bound is known (at most 2*len(devices) entries — enrichment + parse). A pre-sized slice avoids a couple of growslice calls.
Recommendation:
errs := make([]error, 0, len(devices))There was a problem hiding this comment.
Fixed in cd4f80d. errs is now preallocated: make([]error, 0, len(devices)).
| defer dm.mu.Unlock() | ||
| dm.devices = newDevices | ||
|
|
||
| return errs |
There was a problem hiding this comment.
Severity: Low — Consider returning error (via errors.Join) instead of []error
Go 1.20+ errors.Join is the idiomatic way to aggregate multiple errors. The caller gets a single error that still supports errors.Is/errors.As, logs as one line, and integrates with existing error-propagation code. Today every caller will have to loop through []error and glue the strings together.
Recommendation:
func (dm *DeviceMap) FillFromCrawler(...) error {
...
return errors.Join(errs...)
}There was a problem hiding this comment.
Fixed in cd4f80d. FillFromCrawler now returns error via errors.Join(errs...).
| // ReadUdevDB reads /run/udev/data/b<major>:<minor> and returns the E: properties | ||
| // as a key-value map. Non-E: lines (N:, S:, I:, etc.) are ignored. | ||
| func ReadUdevDB(basePath string, major, minor int) (map[string]string, error) { | ||
| path := fmt.Sprintf("%s/b%d:%d", basePath, major, minor) |
There was a problem hiding this comment.
Severity: Medium — Use filepath.Join instead of fmt.Sprintf for path construction
Current form does not handle a trailing slash in basePath cleanly (/run/udev/data/ → /run/udev/data//b8:0). It works, but the idiomatic Go equivalent is safer and self-documents intent.
Recommendation:
path := filepath.Join(basePath, fmt.Sprintf("b%d:%d", major, minor))There was a problem hiding this comment.
Fixed in cd4f80d. Now uses filepath.Join(basePath, fmt.Sprintf("b%d:%d", major, minor)).
| } | ||
| props := make(map[string]string) | ||
| for _, line := range strings.Split(string(data), "\n") { | ||
| after, ok := strings.CutPrefix(strings.TrimSpace(line), "E:") |
There was a problem hiding this comment.
Severity: Low — TrimSpace of the whole line can introduce a leading space in the key
The udev DB format is strict (E:KEY=VALUE, no whitespace), so today this does not matter. But if the format ever grows a tolerant variant like E: FOO=BAR, strings.CutPrefix(strings.TrimSpace(line), "E:") yields " FOO=BAR", and the key stored in the map becomes " FOO" with a leading space.
Recommendation: keep the prefix check literal and trim key/value separately if needed:
after, ok := strings.CutPrefix(line, "E:")
if !ok {
continue
}
if idx := strings.IndexByte(after, '='); idx > 0 {
props[after[:idx]] = strings.TrimRight(after[idx+1:], "\r\n")
}There was a problem hiding this comment.
Fixed in cd4f80d. Removed TrimSpace, now CutPrefix(line, "E:") directly. Value trimmed with TrimRight(after[idx+1:], "\r\n"). Also switched to IndexByte.
| // serialFromUdevEnv follows the lsblk get_properties_by_udev() priority chain: | ||
| // SCSI_IDENT_SERIAL -> ID_SCSI_SERIAL -> ID_SERIAL_SHORT -> ID_SERIAL. | ||
| func serialFromUdevEnv(env map[string]string) string { | ||
| for _, key := range []string{"SCSI_IDENT_SERIAL", "ID_SCSI_SERIAL", "ID_SERIAL_SHORT", "ID_SERIAL"} { |
There was a problem hiding this comment.
Severity: Low — Priority list allocated on every call
[]string{"SCSI_IDENT_SERIAL", ...} is a new literal every time serialFromUdevEnv runs. For hot reconciliation loops this is avoidable GC pressure.
Recommendation: hoist to a package-level var:
var serialKeyPriority = []string{
"SCSI_IDENT_SERIAL",
"ID_SCSI_SERIAL",
"ID_SERIAL_SHORT",
"ID_SERIAL",
}
func serialFromUdevEnv(env map[string]string) string {
for _, key := range serialKeyPriority {
...
}
}There was a problem hiding this comment.
Fixed in cd4f80d. Hoisted to package-level var serialKeyPriority.
| return b.String() | ||
| } | ||
|
|
||
| func ensureDevPrefix(devName string) string { |
There was a problem hiding this comment.
Severity: Low — ensureDevPrefix has no defence against path traversal in DEVNAME
DEVNAME="../etc/passwd" becomes /dev/../etc/passwd. Today this value is only used as an identifier, so the risk is latent. But if a follow-up PR starts doing os.Open(props.DevName) or otherwise dereferences the path (for ioctl, size probing, etc.), this escapes /dev/ silently. The source of DEVNAME is trusted (kernel → udev), but defence in depth matters here.
Recommendation:
func ensureDevPrefix(devName string) string {
if devName == "" || strings.HasPrefix(devName, "/dev/") {
return devName
}
if strings.ContainsAny(devName, "/\x00") {
return "" // or return an error from ParseProperties
}
return "/dev/" + devName
}There was a problem hiding this comment.
Leaving as-is. DEVNAME comes from the kernel (kobject uevent → dev->disk_name), not from user input. The value is always a flat device name like sda, nvme0n1p1, dm-0 — no path components. We don't do any file I/O on props.DevName; it's used purely as an identifier string. Adding validation for a kernel-sourced field that can never contain traversal sequences would be dead code.
59f0aa7 to
d51649e
Compare
…tion sentinel Address code review feedback: - Remove stuttering: UdevProperties → Properties, ParseUdevProperties → ParseProperties - Add ErrUnknownAction sentinel error for programmatic error handling - Fix concurrency test to collect and verify errors instead of discarding them Signed-off-by: Daniil Studenikin <daniil.studenikin@flant.ru>
- Add doc-comments on all exported symbols (ErrUnknownAction, NewDeviceMap, Properties, ParseProperties) - Document HandleEvent add-like/remove-like policy in docstring - Move mutex acquisition after action validation to avoid locking on unknown actions - Add "handle event:" prefix to unknown action error for consistency - Switch MAJOR/MINOR parsing to strconv.ParseUint to reject negatives - Soften normalizeWhitespace comment (approximating, not matching) - Simplify unhexmangle via strconv.ParseUint, remove hexVal helper - Add concurrent read/write and same-key contention tests Signed-off-by: Daniil Studenikin <daniil.studenikin@flant.ru>
Add udevdb.go with ReadUdevDB and EnrichWithUdevDB for reading /run/udev/data and merging udev-enriched properties into crawler events (which only carry basic kernel uevent fields). Add FillFromCrawler to DeviceMap: atomically replaces the device map with sysfs crawler results, enriching each device from the udev database. Unparseable devices are skipped with errors returned to the caller. Signed-off-by: Daniil Studenikin <daniil.studenikin@flant.ru>
- Skip E: lines with empty key (idx > 0 instead of idx >= 0) - Add test for value containing '=' sign - Add test for invalid MINOR in EnrichWithUdevDB Signed-off-by: Daniil Studenikin <daniil.studenikin@flant.ru>
- Update DeviceMap/HandleEvent godoc (FillFromCrawler, ErrUnknownAction) - Move udevDBPath from method param to DeviceMap struct field - Add context.Context to FillFromCrawler for cancellation support - Preallocate errs slice - Ignore ErrNotExist from udev DB enrichment (normal at early boot) - Reject nil devices slice instead of silently wiping the map - Return errors.Join instead of []error - Use filepath.Join for path construction in ReadUdevDB - Fix TrimSpace/CutPrefix ordering and use IndexByte in ReadUdevDB - Hoist serialKeyPriority to package-level var Signed-off-by: Daniil Studenikin <daniil.studenikin@flant.ru>
cd4f80d to
a1f6a8a
Compare
Summary
PR 2 of 3 in the effort to replace
lsblkexec-based block device discovery.Adds initial device population support to
DeviceMap:ReadUdevDB— reads/run/udev/data/b<major>:<minor>and extractsE:properties. Crawler events from sysfs only carry basic kernel ueventfields (MAJOR, MINOR, DEVNAME, DEVTYPE); this function provides the
udev-enriched fields (serial, model, wwn).
EnrichWithUdevDB— merges udev DB properties into a raw event envmap. Event keys take priority over DB keys. On any failure returns the
original env with an explanatory error.
FillFromCrawler— atomically replaces the entireDeviceMapwithdevices from
crawler.Device(go-udev), enriching each with the udevdatabase. Unparseable devices are skipped; all errors are returned to
the caller.
How this will be used
PR 3 will call
FillFromCrawlerduring scanner startup (aftergo-udev/crawler.ExistingDevices) to populate the DeviceMap beforethe netlink monitor starts receiving events.
What is NOT in this PR
[]crawler.Device)Test plan
error cases, immutability), FillFromCrawler (replace semantics, skip
bad devices, nil input, udev DB enrichment)