@acul71 acul71 commented Dec 20, 2025

Summary

This PR brings py-cid to feature parity with go-cid by adding 13 features, grouped below by priority.

Features Added

Phase 1: Critical Features (P0)

1. JSON Marshaling (IPLD Format)

  • BaseCID.to_json_dict() - Convert CID to IPLD JSON format: {"/": "<cid-string>"}
  • BaseCID.from_json_dict() - Parse CID from IPLD JSON format
  • CIDJSONEncoder - Custom JSON encoder for CID objects
  • Full integration with Python's json module
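As an illustration of the IPLD JSON link form described above, here is a minimal standard-library sketch. `StubCID` and `StubCIDEncoder` are hypothetical stand-ins written for this example; they are not the py-cid classes, and the real `to_json_dict()` / `from_json_dict()` signatures may differ.

```python
import json


class StubCID:
    """Hypothetical stand-in for a CID object; it only stores the string form."""

    def __init__(self, text: str) -> None:
        self.text = text

    def to_json_dict(self) -> dict:
        # IPLD JSON link form: a single-key map whose key is "/".
        return {"/": self.text}

    @classmethod
    def from_json_dict(cls, data: dict) -> "StubCID":
        # Accept only the exact {"/": "<cid-string>"} shape.
        if not isinstance(data, dict) or set(data) != {"/"}:
            raise ValueError("expected a dict of the form {'/': '<cid-string>'}")
        if not isinstance(data["/"], str):
            raise ValueError("the '/' value must be a string")
        return cls(data["/"])


class StubCIDEncoder(json.JSONEncoder):
    """Encoder that emits StubCID values in the IPLD link form."""

    def default(self, o):
        if isinstance(o, StubCID):
            return o.to_json_dict()
        return super().default(o)


cid = StubCID("QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG")
encoded = json.dumps({"link": cid}, cls=StubCIDEncoder)
decoded = StubCID.from_json_dict(json.loads(encoded)["link"])
```

Passing the encoder through `cls=` is the standard `json` hook for custom types, which is presumably what "integration with Python's json module" refers to.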

2. Prefix Operations

  • Prefix class - Create and manage CID metadata (version, codec, multihash type/length)
  • Prefix.sum() - Hash data and create CID from prefix
  • Prefix.to_bytes() / Prefix.from_bytes() - Serialize/deserialize prefix
  • BaseCID.prefix() - Extract prefix from existing CID
  • Factory methods: Prefix.v0() and Prefix.v1()
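To make the prefix idea concrete, the sketch below serializes the four prefix fields as consecutive unsigned varints (the layout go-cid uses) and builds raw CIDv1 bytes from a prefix plus data. All function names are made up for this example, only sha2-256 is supported, and none of this is the py-cid `Prefix` API.

```python
import hashlib

DAG_PB = 0x70    # multicodec code for dag-pb
SHA2_256 = 0x12  # multihash code for sha2-256


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0):
    """Decode one varint starting at offset; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def prefix_to_bytes(version: int, codec: int, mh_type: int, mh_length: int) -> bytes:
    """Concatenate the four prefix fields as varints."""
    return b"".join(encode_varint(v) for v in (version, codec, mh_type, mh_length))


def prefix_from_bytes(data: bytes):
    """Inverse of prefix_to_bytes; returns a 4-tuple of integers."""
    fields = []
    offset = 0
    for _ in range(4):
        value, offset = decode_varint(data, offset)
        fields.append(value)
    return tuple(fields)


def sum_v1(codec: int, data: bytes) -> bytes:
    """Hash data with sha2-256 and return raw CIDv1 bytes for the given codec."""
    digest = hashlib.sha256(data).digest()
    multihash = encode_varint(SHA2_256) + encode_varint(len(digest)) + digest
    return encode_varint(1) + encode_varint(codec) + multihash


prefix_bytes = prefix_to_bytes(1, DAG_PB, SHA2_256, 32)
cid_bytes = sum_v1(DAG_PB, b"hello world")
```

For a sha2-256 CIDv1 the serialized prefix is exactly the first four bytes of the CID, which is why a prefix can be extracted from an existing CID without rehashing anything.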

Phase 2: High Priority Features (P1)

3. /ipfs/ Path Parsing

  • parse_ipfs_path() - Extract CID from /ipfs/ paths
  • Automatic extraction in from_string()
  • Supports formats: /ipfs/Qm..., https://ipfs.io/ipfs/Qm..., http://localhost:8080/ipfs/Qm...
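The path handling can be pictured with a small string-only sketch. `extract_cid_segment` is a hypothetical helper written for this example; the delimiter handling (cutting at `/`, `?`, or `#`) is an assumption, and the actual `parse_ipfs_path()` may validate or behave differently.

```python
def extract_cid_segment(text: str) -> str:
    """Return the path segment that follows the first "/ipfs/" marker.

    Input without the marker is returned unchanged, so plain CID strings
    pass straight through.
    """
    marker = "/ipfs/"
    index = text.find(marker)
    if index == -1:
        return text
    rest = text[index + len(marker):]
    # Cut at the first sub-path, query, or fragment delimiter.
    for delimiter in ("/", "?", "#"):
        rest = rest.split(delimiter, 1)[0]
    if not rest:
        raise ValueError("no CID found after /ipfs/")
    return rest


EXAMPLE = "QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG"
from_path = extract_cid_segment("/ipfs/" + EXAMPLE)
from_gateway = extract_cid_segment("https://ipfs.io/ipfs/" + EXAMPLE + "/readme")
from_local = extract_cid_segment("http://localhost:8080/ipfs/" + EXAMPLE + "?download=1")
```

The extracted segment would still need to go through normal CID parsing; this sketch only isolates the candidate string.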

4. Extract Encoding

  • extract_encoding() - Extract multibase encoding from CID string without fully parsing
  • Works with both CIDv0 and CIDv1
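A rough sketch of how the encoding can be read off the string alone: CIDv0 strings are 46 characters starting with `Qm` and are always base58btc, while CIDv1 strings carry a one-character multibase prefix. `guess_encoding` and the small prefix table are illustrative only; the real `extract_encoding()` likely supports more encodings and may return a different type.

```python
# A few well-known multibase prefix characters (not the full table).
MULTIBASE_PREFIXES = {
    "f": "base16",
    "b": "base32",
    "z": "base58btc",
    "m": "base64",
    "u": "base64url",
}


def guess_encoding(cid_text: str) -> str:
    """Return the multibase encoding name without decoding the CID itself."""
    if len(cid_text) < 2:
        raise ValueError("CID string is too short")
    # CIDv0 has no multibase prefix; it is always base58btc.
    if len(cid_text) == 46 and cid_text.startswith("Qm"):
        return "base58btc"
    try:
        return MULTIBASE_PREFIXES[cid_text[0]]
    except KeyError:
        raise ValueError(f"unknown multibase prefix {cid_text[0]!r}") from None


v0_encoding = guess_encoding("QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG")
v1_encoding = guess_encoding("bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi")
```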

5. Trailing Bytes Validation

  • from_bytes_strict() - Parse CID from bytes, validating no trailing bytes
  • Ensures all input bytes are consumed during parsing
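The trailing-bytes check amounts to confirming that the parsed length equals the input length. The sketch below works on raw bytes with hand-rolled varint decoding; `split_cid_strict` is a hypothetical name, it returns a plain tuple rather than a CID object, and it is not the py-cid implementation of `from_bytes_strict()`.

```python
def read_varint(data: bytes, offset: int):
    """Decode one unsigned varint at offset; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def split_cid_strict(data: bytes):
    """Return (version, codec, multihash_bytes); reject any trailing bytes."""
    # CIDv0 is a bare sha2-256 multihash: 0x12 0x20 plus a 32-byte digest.
    if len(data) >= 2 and data[0] == 0x12 and data[1] == 0x20:
        if len(data) != 34:
            raise ValueError("CIDv0 must be exactly 34 bytes")
        return 0, 0x70, data  # CIDv0 implies the dag-pb codec (0x70)

    version, offset = read_varint(data, 0)
    if version != 1:
        raise ValueError(f"unsupported CID version {version}")
    codec, offset = read_varint(data, offset)
    mh_start = offset
    _mh_code, offset = read_varint(data, offset)
    mh_length, offset = read_varint(data, offset)
    end = offset + mh_length
    if end > len(data):
        raise ValueError("truncated multihash digest")
    if end != len(data):
        raise ValueError(f"{len(data) - end} trailing byte(s) after CID")
    return version, codec, data[mh_start:end]


valid_v1 = b"\x01\x70\x12\x20" + bytes(32)
parsed = split_cid_strict(valid_v1)
```

A lenient parser would stop at `end` and ignore whatever follows; the strict variant treats leftover input as an error.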

Phase 3: Medium Priority Features (P2)

6. Builder Pattern

  • V0Builder - Fluent API for constructing CIDv0
  • V1Builder - Fluent API for constructing CIDv1
  • Builder.sum() - Hash data and create CID
  • Builder.with_codec() - Chain codec changes

7. Set Operations

  • CIDSet class - Manage collections of unique CIDs
  • Methods: add(), has(), remove(), visit(), for_each()
  • Full Python set interface support (__len__, __contains__, __iter__)
  • Requires __hash__ implementation on BaseCID (also added)
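The dependency on `__hash__` is worth spelling out: a Python class that defines `__eq__` without `__hash__` becomes unhashable, so it cannot be stored in a `set` or used as a `dict` key. The sketch below uses hypothetical `StubCID` and `SimpleCIDSet` classes; the `visit()` semantics (return `True` only when the item is newly added) mirror go-cid's `Set.Visit` and are assumed, not confirmed, for py-cid.

```python
from typing import Callable, Iterator


class StubCID:
    """Hypothetical stand-in: equality and hash are based on the raw bytes."""

    def __init__(self, raw: bytes) -> None:
        self.raw = raw

    def __eq__(self, other: object) -> bool:
        return isinstance(other, StubCID) and self.raw == other.raw

    def __hash__(self) -> int:
        # Needed because defining __eq__ alone sets __hash__ to None.
        return hash(self.raw)


class SimpleCIDSet:
    """Thin wrapper around a built-in set with go-cid style method names."""

    def __init__(self) -> None:
        self._items: set = set()

    def add(self, cid: StubCID) -> None:
        self._items.add(cid)

    def has(self, cid: StubCID) -> bool:
        return cid in self._items

    def remove(self, cid: StubCID) -> None:
        self._items.discard(cid)  # no error if the CID is absent

    def visit(self, cid: StubCID) -> bool:
        """Add cid and report whether it was new."""
        if cid in self._items:
            return False
        self._items.add(cid)
        return True

    def for_each(self, func: Callable[[StubCID], None]) -> None:
        # Iterate over a snapshot so func may modify the set.
        for cid in list(self._items):
            func(cid)

    def __len__(self) -> int:
        return len(self._items)

    def __contains__(self, cid: object) -> bool:
        return cid in self._items

    def __iter__(self) -> Iterator[StubCID]:
        return iter(self._items)


seen = SimpleCIDSet()
first_visit = seen.visit(StubCID(b"\x01\x70"))
second_visit = seen.visit(StubCID(b"\x01\x70"))  # equal bytes, so not new
```

`visit()` is handy when walking a DAG: it combines the membership test and the insert, so each node is processed once.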

8. Defined() Check

  • BaseCID.defined() - Check if CID is defined (not a zero-value/undefined CID)

9. Stream Parsing

  • from_reader() - Parse CID from reader/stream
  • Returns tuple of (bytes_read, CID)
  • Supports both CIDv0 and CIDv1
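Stream parsing differs from byte parsing in that the reader must consume exactly one CID and leave the rest of the stream untouched. Here is a minimal sketch over `io.BytesIO`; `read_cid_bytes` is a hypothetical helper that returns raw bytes instead of a CID object, and it assumes a reader whose `read(n)` returns fewer than `n` bytes only at end of stream.

```python
import io


def read_varint_from(reader):
    """Read one unsigned varint; return (value, raw_bytes_consumed)."""
    raw = bytearray()
    value = 0
    shift = 0
    while True:
        chunk = reader.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        raw += chunk
        value |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return value, bytes(raw)
        shift += 7


def read_exact(reader, count: int) -> bytes:
    """Read exactly count bytes or raise EOFError."""
    data = reader.read(count)
    if len(data) != count:
        raise EOFError("stream ended inside the multihash digest")
    return data


def read_cid_bytes(reader):
    """Consume one CID from reader; return (bytes_read, raw_cid_bytes)."""
    first, raw = read_varint_from(reader)
    if first == 0x12:
        # CIDv0: a bare sha2-256 multihash (0x12 0x20 + 32-byte digest).
        length, raw_length = read_varint_from(reader)
        if length != 0x20:
            raise ValueError("CIDv0 digest length must be 32")
        out = raw + raw_length + read_exact(reader, length)
        return len(out), out
    if first != 1:
        raise ValueError(f"unsupported CID version {first}")
    _codec, raw_codec = read_varint_from(reader)
    _mh_code, raw_mh_code = read_varint_from(reader)
    mh_length, raw_mh_length = read_varint_from(reader)
    digest = read_exact(reader, mh_length)
    out = raw + raw_codec + raw_mh_code + raw_mh_length + digest
    return len(out), out


cid_v1 = b"\x01\x70\x12\x20" + bytes(32)
stream = io.BytesIO(cid_v1 + b"rest of stream")
bytes_read, raw_cid = read_cid_bytes(stream)
remaining = stream.read()
```

Returning the byte count alongside the CID lets callers track their position in formats that store several CIDs back to back.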

Phase 4: Low Priority Features (P3)

10. MustParse()

  • must_parse() - Parse CID, always raising exception on error
  • Convenience function for strict parsing

11. Binary/Text Marshaling

  • BaseCID.to_bytes() - Serialize to bytes (alias for buffer)
  • BaseCID.to_text() - Serialize to text (UTF-8 encoded string)
  • BaseCID.from_text() - Deserialize from text

12. KeyString()

  • BaseCID.key_string() - Return binary representation as string for use as map keys

13. Loggable()

  • BaseCID.loggable() - Return dict for logging purposes

Implementation Details

New Modules

  • cid/prefix.py - Prefix class and operations
  • cid/builder.py - Builder pattern implementation
  • cid/set.py - CIDSet class

Modified Modules

  • cid/cid.py - Added multiple new methods and helper functions
  • cid/__init__.py - Updated exports
  • docs/api_reference.rst - Added documentation for all new features
  • docs/usage.rst - Added comprehensive usage examples
  • tests/test_cid.py - Added JSON marshaling tests
  • tests/test_new_features.py - New test file with 46 tests
  • tests/test_prefix.py - New test file for prefix operations

Testing

  • ✅ 168 tests total (122 existing + 46 new)
  • ✅ All tests passing
  • ✅ 81% code coverage
  • ✅ Comprehensive test coverage for all new features

Documentation

  • ✅ API reference updated with all new functions and classes
  • ✅ Usage examples added for all features
  • ✅ All code blocks verified and working
  • ✅ Documentation builds successfully

Breaking Changes

None - all new features are additive and backward compatible.

Checklist

  • All features implemented
  • All features tested
  • All features documented
  • Code passes make pr (linting, type checking, tests)
  • Documentation passes make docs
  • All code blocks verified and working
  • Newsfragment added

Closes #60

- Add JSON Marshaling (IPLD Format): to_json_dict(), from_json_dict(), CIDJSONEncoder
- Add Prefix Operations: Prefix class, prefix() method, sum() for creating CIDs
- Add /ipfs/ path parsing: parse_ipfs_path() and automatic extraction
- Add extract_encoding() to get multibase encoding without full parsing
- Add from_bytes_strict() for strict CID parsing with trailing bytes validation
- Add Builder Pattern: V0Builder and V1Builder for fluent CID construction
- Add Set Operations: CIDSet class with full Python set interface
- Add defined() check method for CID validation
- Add from_reader() for parsing CIDs from streams
- Add must_parse() convenience function
- Add Binary/Text Marshaling: to_bytes(), to_text(), from_text()
- Add key_string() for binary representation as string
- Add loggable() for logging purposes
- Add __hash__() to BaseCID for use in sets and dicts
- Add comprehensive tests (46 new tests, 168 total)
- Add complete documentation with working code examples
- Add newsfragment for issue #60

All features tested, documented, and passing make pr and make docs.


acul71 commented Dec 20, 2025

@seetadev @pacrob
This is ready to be reviewed and merged.
It brings py-cid to feature parity with go-cid.
A new py-cid release will be needed.

@seetadev seetadev merged commit 4ce1a14 into master Dec 20, 2025
24 checks passed
@seetadev

@acul71: Hi Luca. I appreciate the great effort 💯

This is fantastic. Indeed, this PR brings py-cid to parity with go-cid. This will be very helpful in multiple initiatives that use CIDs.

@pacrob pacrob deleted the feature/implement-missing-features branch December 20, 2025 22:54