Add Dict service to Go SDK #265

Open
sri-thorobid wants to merge 2 commits into modal-labs:main from ThoroBid-AI:sri-thorobid/go-dict-impl

@sri-thorobid

Note: This PR was developed with the assistance of Claude Code.

Summary

  • Implement full Dict API for the Go SDK: Put, Get, Update, Pop, Contains, Len, Keys, Values, Items, Clear, FromName, Delete
  • Use og-rek + cloudpickle hybrid serialization for byte-level compatibility with Python's cloudpickle.dumps(value, protocol=4)
  • Register DictService on Client (client.Dicts)

Serialization

Why this complexity?

Modal's Dict server matches keys by byte-equality of their serialized pickle representation. When Python writes d[42] = "hello", the key is stored as the exact bytes produced by cloudpickle.dumps(42, protocol=4). For Go to read that key back, it must produce the identical byte sequence — not just a semantically equivalent pickle, but byte-for-byte the same output.

This is non-trivial because Go's og-rek pickle library and Python's cloudpickle produce structurally different output for the same values:

| Difference | og-rek (Go) | cloudpickle (Python) |
| --- | --- | --- |
| Large integers (outside int32) | ASCII I opcode: I1234567890\n | LONG1 opcode: 0x8a + binary two's complement LE bytes |
| Bytes ([]byte) | Wraps in builtins.bytearray() constructor | Bare SHORT_BINBYTES (C) or BINBYTES (B) opcode |
| Strings/bytes framing | No MEMOIZE after string/bytes opcodes | MEMOIZE (0x94) after each string/bytes value |
| Frame header | No FRAME wrapper | FRAME (0x95) + 8-byte LE length prefix for payloads ≥ 4 bytes |

Without handling these differences, a Go client writing dict[42] = "value" would store the key under different bytes than Python expects, making the entry invisible to Python readers (and vice versa).

How it works

Keys and values are serialized using og-rek's pickle encoder, then post-processed by ogrekToCloudpickle to match cloudpickle's protocol 4 output:

  1. Convert ASCII I opcode → LONG1 for integers outside int32 range
  2. Unwrap bytearray() constructor → bare SHORT_BINBYTES/BINBYTES for []byte
  3. Inject MEMOIZE after string/bytes opcodes
  4. Wrap payload in FRAME header if ≥ 4 bytes
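
To make step 1 concrete, here is a minimal sketch of a LONG1 encoder for values that fit in an int64. It is not the PR's encodeLong1 (whose source is not shown here); the function name is reused only for illustration, and the approach (fixed 8-byte buffer, then trimming redundant sign bytes) is an assumption chosen for brevity. It is meant only for values outside the int32 range, since smaller values use the BININT opcodes instead.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeLong1 is an illustrative sketch (not the PR's implementation).
// It emits the pickle LONG1 opcode (0x8a), a 1-byte length, and the
// minimal little-endian two's complement encoding of v.
// Intended for values outside int32; Python encodes 0 as a zero-length
// LONG1, which this sketch does not reproduce.
func encodeLong1(v int64) []byte {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(v))

	// Drop trailing bytes that only repeat the sign bit.
	n := 8
	for n > 1 {
		last, prev := buf[n-1], buf[n-2]
		positivePad := last == 0x00 && prev&0x80 == 0
		negativePad := last == 0xff && prev&0x80 != 0
		if !positivePad && !negativePad {
			break
		}
		n--
	}

	out := []byte{0x8a, byte(n)}
	return append(out, buf[:n]...)
}

func main() {
	fmt.Printf("% x\n", encodeLong1(1<<31))       // 8a 05 00 00 00 80 00
	fmt.Printf("% x\n", encodeLong1(-2147483649)) // 8a 05 ff ff ff 7f ff
}
```

Trimming a fixed-width buffer sidesteps the empty-slice edge case that a big.Int-based encoder has to guard against for -1.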

Deserialization uses og-rek's decoder with the inverse transform (cloudpickleToOgRek).
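
Steps 3 and 4 can be sketched for the simplest case, a short string key. The helper names below (withProtoAndFrame, pickleShortString) are hypothetical and do not come from the PR; the sketch builds the bytes directly rather than post-processing og-rek output, and assumes the string is valid UTF-8 and at most 255 bytes long.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Pickle protocol 4 opcodes used in this sketch.
const (
	opProto           = 0x80
	opFrame           = 0x95
	opShortBinUnicode = 0x8c
	opMemoize         = 0x94
	opStop            = '.'
	frameSizeMin      = 4 // payloads shorter than this are not framed
)

// withProtoAndFrame (hypothetical helper) prepends the PROTO 4 header and,
// for payloads of at least frameSizeMin bytes, a FRAME opcode followed by
// the payload length as an 8-byte little-endian integer.
func withProtoAndFrame(payload []byte) []byte {
	out := []byte{opProto, 4}
	if len(payload) >= frameSizeMin {
		var size [8]byte
		binary.LittleEndian.PutUint64(size[:], uint64(len(payload)))
		out = append(out, opFrame)
		out = append(out, size[:]...)
	}
	return append(out, payload...)
}

// pickleShortString (hypothetical helper) encodes s as SHORT_BINUNICODE
// followed by MEMOIZE and STOP, then wraps the result in a frame.
func pickleShortString(s string) ([]byte, error) {
	if len(s) > 255 {
		return nil, fmt.Errorf("string of %d bytes needs BINUNICODE", len(s))
	}
	payload := []byte{opShortBinUnicode, byte(len(s))}
	payload = append(payload, s...)
	payload = append(payload, opMemoize, opStop)
	return withProtoAndFrame(payload), nil
}

func main() {
	b, err := pickleShortString("hello")
	if err != nil {
		panic(err)
	}
	// 80 04 95 09 00 00 00 00 00 00 00 8c 05 68 65 6c 6c 6f 94 2e
	fmt.Printf("% x\n", b)
}
```

The small integer 1 pickles to the 3-byte payload K 0x01 '.', which falls under the threshold and is therefore emitted without a FRAME header.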

Go key/value types

| Go type | Pickle encoding | Deserialized as |
| --- | --- | --- |
| nil | NONE | pickle.None{} |
| bool | NEWTRUE / NEWFALSE | bool |
| int, int8–int64 | BININT1 / BININT2 / BININT / LONG1 | int64 (≤ int32) or *big.Int |
| uint8–uint64 | BININT1 / BININT2 / BININT / LONG1 | int64 (≤ int32) or *big.Int |
| float32, float64 | BINFLOAT | float64 |
| string | SHORT_BINUNICODE / BINUNICODE | string |
| []byte | SHORT_BINBYTES / BINBYTES | pickle.Bytes (type Bytes string) |
| map[any]any | EMPTY_DICT + SETITEMS | map[interface{}]interface{} |
| []any | EMPTY_LIST + APPENDS | []interface{} |

Note that deserialized types differ from input types due to og-rek's type mapping (e.g. all ints widen to int64, None becomes pickle.None{} not nil, []byte becomes pickle.Bytes). Callers should use type assertions accordingly.
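
A caller might handle these result types with a type switch along the following lines. This is a sketch under assumptions: None and Bytes are local stand-ins that mirror the shape of og-rek's types so the example runs without the dependency, and describe is a hypothetical helper, not part of the SDK.

```go
package main

import (
	"fmt"
	"math/big"
)

// Local stand-ins for og-rek's None and Bytes types (assumption: real
// code would use the types from the og-rek package instead).
type None struct{}
type Bytes string

// describe (hypothetical helper) shows which Go type each deserialized
// value arrives as, following the table above.
func describe(v any) string {
	switch x := v.(type) {
	case None:
		return "none"
	case bool:
		return fmt.Sprintf("bool %t", x)
	case int64:
		return fmt.Sprintf("int64 %d", x)
	case *big.Int:
		return "big.Int " + x.String()
	case float64:
		return fmt.Sprintf("float64 %g", x)
	case string:
		return "string " + x
	case Bytes:
		return fmt.Sprintf("bytes % x", []byte(x))
	case map[any]any:
		return fmt.Sprintf("map with %d entries", len(x))
	case []any:
		return fmt.Sprintf("list with %d items", len(x))
	default:
		// For example a plain int, which the decoder never produces.
		return fmt.Sprintf("unexpected %T", x)
	}
}

func main() {
	values := []any{None{}, int64(42), big.NewInt(1 << 40), Bytes("ab"), 42}
	for _, v := range values {
		fmt.Println(describe(v))
	}
}
```

The last sample value, an untyped 42, lands in the default branch because an int written by Go comes back as int64, never as int.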

Files changed

| File | Description |
| --- | --- |
| modal-go/dict.go | Dict service + instance implementation (776 lines) |
| modal-go/dict_serialization_test.go | Golden byte tests, deserialization tests, round-trip tests, transform tests |
| modal-go/test/dict_test.go | Integration tests for all Dict operations |
| modal-go/client.go | Register Dicts field on Client |

Test plan

  • Unit tests: 25 golden byte cases matching Python's cloudpickle.dumps output
  • Unit tests: round-trip serialize → deserialize for all key types
  • Unit tests: ogrekToCloudpickle / cloudpickleToOgRek inverse transform
  • Integration tests: all Dict operations against live Modal backend (14 tests)
  • Cross-language: Go writes → Python reads (43/43 passed)
  • Cross-language: Python writes → Go reads (52/52 passed)
  • Go round-trip: Go writes → Go reads (56/56 passed)

Implement full Dict API (Put, Get, Update, Pop, Contains, Len, Keys,
Values, Items, Clear, FromName, Delete) using og-rek + cloudpickle
hybrid serialization for cross-language compatibility with Python.
@cursor

cursor bot commented Feb 17, 2026

PR Summary

Medium Risk
Adds a new storage API surface and relies on subtle byte-level pickle transformations for cross-language key compatibility; mistakes could cause hard-to-debug missing keys or incorrect lookups.

Overview
Adds a new client.Dicts service and Dict type to the Go SDK, enabling creation of ephemeral or named Dicts and supporting core operations like Put/Get/Update/Pop, membership/length queries, clearing, and key/value/item streaming iteration.

Implements specialized pickle serialization for Dict keys to match Python cloudpickle protocol 4 byte-for-byte (FRAME/MEMOIZE injection, LONG1 handling, and []byte bytearray unwrapping), while values use plain og-rek pickle; includes golden-byte and round-trip unit tests plus end-to-end tests covering the Dict API and delete/allow-missing behavior.

Written by Cursor Bugbot for commit 460afe3. This will update automatically on new commits. Configure here.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

- encodeLong1: guard against empty Bytes() for n=-1 (abs-1==0),
  which would panic on data[len(data)-1] index out of range
- cloudpickleToOgRek: only strip trailing 0x94 as MEMOIZE when the
  leading opcode is a string/bytes type, mirroring ogrekToCloudpickle.
  Previously, a BININT payload ending in 0x94 (e.g. int -1811939328)
  would have its data byte falsely stripped.
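
The second fix can be illustrated with a minimal sketch. The function below is hypothetical and much simpler than the PR's cloudpickleToOgRek: it assumes the payload holds a single top-level value with the PROTO/FRAME header and STOP opcode already removed, and it only shows why the leading opcode has to be checked before a trailing 0x94 is treated as MEMOIZE.

```go
package main

import "fmt"

// Pickle opcodes relevant to the guard.
const (
	opShortBinUnicode = 0x8c
	opBinUnicode      = 'X'
	opShortBinBytes   = 'C'
	opBinBytes        = 'B'
	opMemoize         = 0x94
)

// isStringOrBytesOp reports whether op starts a string or bytes value,
// the only values that carry a trailing MEMOIZE in this sketch.
func isStringOrBytesOp(op byte) bool {
	switch op {
	case opShortBinUnicode, opBinUnicode, opShortBinBytes, opBinBytes:
		return true
	}
	return false
}

// stripTrailingMemoize (hypothetical helper) removes a trailing MEMOIZE
// only when the payload starts with a string/bytes opcode. Without that
// check, a data byte that happens to equal 0x94 would be dropped.
func stripTrailingMemoize(payload []byte) []byte {
	if len(payload) < 2 || !isStringOrBytesOp(payload[0]) {
		return payload
	}
	if payload[len(payload)-1] != opMemoize {
		return payload
	}
	return payload[:len(payload)-1]
}

func main() {
	// BININT ('J') for -1811939328: its little-endian bytes end in 0x94.
	intPayload := []byte{'J', 0x00, 0x00, 0x00, 0x94}
	strPayload := []byte{opShortBinUnicode, 0x02, 'h', 'i', opMemoize}

	fmt.Printf("% x\n", stripTrailingMemoize(intPayload)) // 4a 00 00 00 94
	fmt.Printf("% x\n", stripTrailingMemoize(strPayload)) // 8c 02 68 69
}
```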
@mwaskom
Contributor

mwaskom commented Feb 18, 2026

Thanks @sri-thorobid. While we are planning to add support for Dict or some other key-value primitive to the Go and JS SDKs, there is a substantial amount of complexity around serialization stability that you've picked up on here. It's likely that we're going to approach this using CBOR serialization, as we do for Function payloads, which is built to be cross-language.

@sri-thorobid
Author

@mwaskom that's what I figured. I just raised this in case others need it for now for primitive keys; we are using it in our fork until CBOR compatibility lands. Feel free to close the PR, as it will still be searchable. Thanks for the view into the roadmap though! Excited for not having to manage a fork :-).
