Conversation
Register writes and FromString reads the splitters map with no synchronization, which is a data race if called concurrently. - chunker/registry.go: add splittersMu sync.RWMutex, Lock in Register - chunker/parse.go: RLock/RUnlock around map read in FromString
Register silently accepted empty names, names with dashes (unmatchable by FromString which splits on dash), nil constructors, and duplicate registrations that could overwrite built-ins. - chunker/registry.go: panic on empty name, dash in name, nil ctor, duplicate name (follows database/sql.Register convention) - chunker/registry_test.go: use unique names per subtest, add panic test cases, add comment explaining why not parallel
CtorFunc is an uncommon abbreviation. SplitterFunc follows Go naming conventions (like http.HandlerFunc) and clearly communicates purpose. - chunker/registry.go: rename type and all usages
FromString doc only listed built-in chunkers and didn't mention custom ones. Package doc didn't mention extensibility. SplitterFunc had no doc. - chunker/registry.go: add SplitterFunc doc, rewrite Register doc with init-time usage example and panic contract - chunker/parse.go: rewrite FromString doc to list built-in chunkers and mention Register - chunker/splitting.go: expand package doc to mention extensibility via Register and FromString
- chunker/parse_test.go: add TestParseDefault ("", "default"),
TestParseUnknown, reject "size-123-extra" in TestParseSize
The old FromString silently accepted "size-123-extra" by only reading the first segment after the dash. The refactored parseSizeString now validates the format and rejects extra parameters.
Built-in chunkers were written directly to the map, bypassing the name and duplicate checks in Register. Call Register instead so the same rules apply to built-ins and custom chunkers alike.
Replaces strings.Index + manual slicing with strings.Cut.
ctor was a leftover from the original CtorFunc type name. fn is the idiomatic Go convention for function-typed parameters.
lidel
left a comment
There was a problem hiding this comment.
Thanks for this, @alanshaw ❤️
I made a few cosmetic/docs tweaks:
- added
sync.RWMutexto protect the registry from concurrent access - added input validation in
Register(empty name, dash in name, nil fn, duplicates) - routed built-in chunkers through
Registerininit()so they get the same validation - renamed
CtorFunctoSplitterFuncbut its mostly idiomatic clarity / estetic choice (ctor is C++/java i think, in golang it feels off) - minor godoc and code cleanup
If this looks good to you (👍️?), we can merge it next week.
ps. Are you working on custom chunkers? :)
| panic("chunk: Register fn is nil") | ||
| } | ||
| if _, dup := splitters[name]; dup { | ||
| panic("chunk: Register called twice for chunker " + name) |
There was a problem hiding this comment.
Hmm, can we allow replacing a chunker? Maybe I have a faster implementation...
There was a problem hiding this comment.
Use custom name then, to avoid bugs in new code beign reported as bugs in default chunker in the same name.
And if you have a faster implementation, why not make upstream library better? :-)
Fine but I would like to replace an existing chunker 🙏
Yeah I partly didn't do that because they shouldn't need validation - they are the defaults and should be present and correct. Not opposed to it though.
Yep no problem. |
TBH it's not a deal breaker if you disagree - feel free to merge. |
|
Cool, let's keep as-is. Removes ambiguity -- easier to triage bugs and debug issues when we can clearly see a custom chunker was passed. |
Allows custom chunkers to be registered and accessed by their string name + optional params as per the default chunkers.