Skip to content

feat: custom chunker registry#1116

Merged
lidel merged 12 commits intoipfs:mainfrom
alanshaw:ash/feat/custom-chunker-registry
Mar 10, 2026
Merged

feat: custom chunker registry#1116
lidel merged 12 commits intoipfs:mainfrom
alanshaw:ash/feat/custom-chunker-registry

Conversation

@alanshaw
Copy link
Member

@alanshaw alanshaw commented Mar 6, 2026

Allows custom chunkers to be registered and accessed by their string name + optional params as per the default chunkers.

@alanshaw alanshaw requested a review from a team as a code owner March 6, 2026 12:55
@lidel lidel self-assigned this Mar 6, 2026
lidel added 10 commits March 6, 2026 18:00
Register writes and FromString reads the splitters map with no
synchronization, which is a data race if called concurrently.

- chunker/registry.go: add splittersMu sync.RWMutex, Lock in Register
- chunker/parse.go: RLock/RUnlock around map read in FromString
Register silently accepted empty names, names with dashes (unmatchable
by FromString which splits on dash), nil constructors, and duplicate
registrations that could overwrite built-ins.

- chunker/registry.go: panic on empty name, dash in name, nil ctor,
  duplicate name (follows database/sql.Register convention)
- chunker/registry_test.go: use unique names per subtest, add panic
  test cases, add comment explaining why not parallel
CtorFunc is an uncommon abbreviation. SplitterFunc follows Go naming
conventions (like http.HandlerFunc) and clearly communicates purpose.

- chunker/registry.go: rename type and all usages
FromString doc only listed built-in chunkers and didn't mention custom
ones. Package doc didn't mention extensibility. SplitterFunc had no doc.

- chunker/registry.go: add SplitterFunc doc, rewrite Register doc with
  init-time usage example and panic contract
- chunker/parse.go: rewrite FromString doc to list built-in chunkers
  and mention Register
- chunker/splitting.go: expand package doc to mention extensibility
  via Register and FromString
- chunker/parse_test.go: add TestParseDefault ("", "default"),
  TestParseUnknown, reject "size-123-extra" in TestParseSize
The old FromString silently accepted "size-123-extra" by only reading
the first segment after the dash. The refactored parseSizeString now
validates the format and rejects extra parameters.
Built-in chunkers were written directly to the map, bypassing the
name and duplicate checks in Register. Call Register instead so the
same rules apply to built-ins and custom chunkers alike.
Replaces strings.Index + manual slicing with strings.Cut.
ctor was a leftover from the original CtorFunc type name. fn is the
idiomatic Go convention for function-typed parameters.
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, @alanshaw ❤️

I made a few cosmetic/docs tweaks:

  • added sync.RWMutex to protect the registry from concurrent access
  • added input validation in Register (empty name, dash in name, nil fn, duplicates)
  • routed built-in chunkers through Register in init() so they get the same validation
  • renamed CtorFunc to SplitterFunc but its mostly idiomatic clarity / estetic choice (ctor is C++/java i think, in golang it feels off)
  • minor godoc and code cleanup

If this looks good to you (👍️?), we can merge it next week.

ps. Are you working on custom chunkers? :)

panic("chunk: Register fn is nil")
}
if _, dup := splitters[name]; dup {
panic("chunk: Register called twice for chunker " + name)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can we allow replacing a chunker? Maybe I have a faster implementation...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use custom name then, to avoid bugs in new code beign reported as bugs in default chunker in the same name.

And if you have a faster implementation, why not make upstream library better? :-)

@alanshaw
Copy link
Member Author

alanshaw commented Mar 9, 2026

  • added input validation in Register (empty name, dash in name, nil fn, duplicates)

Fine but I would like to replace an existing chunker 🙏

  • routed built-in chunkers through Register in init() so they get the same validation

Yeah I partly didn't do that because they shouldn't need validation - they are the defaults and should be present and correct. Not opposed to it though.

  • renamed CtorFunc to SplitterFunc but its mostly idiomatic clarity / estetic choice (ctor is C++/java i think, in golang it feels off)

Yep no problem.

@alanshaw
Copy link
Member Author

alanshaw commented Mar 9, 2026

  • added input validation in Register (empty name, dash in name, nil fn, duplicates)

Fine but I would like to replace an existing chunker 🙏

TBH it's not a deal breaker if you disagree - feel free to merge.

@lidel
Copy link
Member

lidel commented Mar 10, 2026

Cool, let's keep as-is.
It feels safer if its impossible to override defaults.

Removes ambiguity -- easier to triage bugs and debug issues when we can clearly see a custom chunker was passed.

@lidel lidel merged commit e8ef4a2 into ipfs:main Mar 10, 2026
11 checks passed
@alanshaw alanshaw deleted the ash/feat/custom-chunker-registry branch March 10, 2026 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants