C++ pybind11 extension for ASUN (Array-Schema Unified Notation).
It provides seven functions, with no manual schema strings required for encoding:
`encode`, `encodeTyped`, `encodePretty`, `encodePrettyTyped`, `decode`, `encodeBinary`, `decodeBinary`.
The wheel also ships `asun.pyi` and `py.typed`, so editors and static type checkers can understand the extension module without a separate stub package.
Standard JSON repeats every field name in every record. When you send structured data to an LLM, over an API, or across services, that repetition wastes tokens, bytes, and attention:

```json
[
  { "id": 1, "name": "Alice", "active": true },
  { "id": 2, "name": "Bob", "active": false },
  { "id": 3, "name": "Carol", "active": true }
]
```

ASUN declares the schema once and streams data as compact tuples:

```asun
[{id, name, active}]:
(1,Alice,true),
(2,Bob,false),
(3,Carol,true)
```
Fewer tokens, smaller payloads, clearer structure, and faster parsing than repeated-object JSON.
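As a rough illustration of where the savings come from, here is a hand-rolled sketch of the schema-once text layout (the `asun_text` helper is hypothetical, not the extension's encoder) compared byte-for-byte against `json.dumps`:

```python
import json

rows = [
    {"id": 1, "name": "Alice", "active": True},
    {"id": 2, "name": "Bob", "active": False},
    {"id": 3, "name": "Carol", "active": True},
]

def asun_text(rows):
    # Hand-rolled sketch: declare field names once in a header,
    # then emit each record as a bare tuple. Not the real encoder.
    fields = list(rows[0])
    header = "[{" + ",".join(fields) + "}]:"
    def fmt(v):
        return "true" if v is True else "false" if v is False else str(v)
    body = ",\n".join(
        "(" + ",".join(fmt(r[f]) for f in fields) + ")" for r in rows
    )
    return header + "\n" + body + "\n"

json_bytes = len(json.dumps(rows).encode())
asun_bytes = len(asun_text(rows).encode())
# Key names appear once in the ASUN header instead of once per record,
# so the payload shrinks as the row count grows.
```

The gap widens with more rows, since the per-record overhead in JSON is proportional to the total key-name length.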
| Tool | Version |
|---|---|
| g++ | ≥ 11 (C++17) |
| python3-dev | any (provides Python.h) |
| Python | ≥ 3.8 |
pybind11 2.13.6 headers are vendored in vendor/pybind11/ — no separate installation needed.
```bash
# Option A — shell script (auto-installs python3-dev via sudo if missing)
bash build.sh

# Option B — Makefile
make

# Option C — CMake
cmake -B build && cmake --build build
```

Python values map to ASUN types as follows:

| Python value | Inferred ASUN type |
|---|---|
| `bool` | `bool` |
| `int` | `int` |
| `float` | `float` |
| `str` | `str` |
| `None` | optional (e.g. `str?`, `int?`) |
Cross-row type merging for lists: when encoding a list, all rows are scanned to compute the final type:

- A field that is non-`None` in row 0 but `None` in some later row is promoted to optional (e.g. `str` → `str?`, `int` → `int?`).
- Type conflicts between non-`None` values (e.g. `int` in row 0, `str` in row 1) fall back to `str`.

This means `encodeTyped` is safe to use even when only some rows have `None` for a given field.
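The merging rule above can be sketched in pure Python (the `infer_type` and `merge_field_type` helpers are illustrative names; the real inference lives in the C++ extension):

```python
def infer_type(v):
    # Map a Python scalar to an ASUN type name.
    # bool must be checked before int: bool is a subclass of int.
    if isinstance(v, bool):
        return "bool"
    if isinstance(v, int):
        return "int"
    if isinstance(v, float):
        return "float"
    return "str"

def merge_field_type(rows, field):
    # Scan every row, not just row 0:
    # - exactly one non-None type seen -> use it
    # - conflicting non-None types     -> fall back to "str"
    # - any None value                 -> promote to optional ("?")
    seen = {infer_type(r[field]) for r in rows if r[field] is not None}
    base = seen.pop() if len(seen) == 1 else "str"
    optional = any(r[field] is None for r in rows)
    return base + "?" if optional else base
```

For example, `merge_field_type([{"tag": "hello"}, {"tag": None}], "tag")` yields `"str?"`, matching the `tag@str?` binding shown in the `encodeTyped` example below.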
```python
asun.encode({"id": 1, "name": "Alice"})
# → '{id,name}:\n(1,Alice)\n'

asun.encode([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
# → '[{id,name}]:\n(1,Alice),\n(2,Bob)\n'
```

Decode semantics without scalar hints: when decoded with `decode()`, terminal field values are returned as strings because the schema omits scalar type hints. Structural bindings such as `@{}` and `@[]` still remain in the schema. Use `encodeTyped` when you need a type-preserving round-trip.
Type is inferred from all rows (not just the first). A field that is `None` in any row is made optional:
```python
asun.encodeTyped({"id": 1, "name": "Alice", "active": True})
# → '{id@int,name@str,active@bool}:\n(1,Alice,true)\n'

# Optional field inferred from cross-row merging:
asun.encodeTyped([{"id": 1, "tag": "hello"}, {"id": 2, "tag": None}])
# → '[{id@int,tag@str?}]:\n(1,hello),\n(2,)\n'
```

```python
pretty = asun.encodePretty(rows)
```

```python
pretty = asun.encodePrettyTyped(rows)
```

`decode` handles both schemas with scalar hints and schemas without scalar hints embedded in the text:
```python
# schema with scalar hints → values restored as Python types
rec = asun.decode('{id@int, name@str}:\n(1,Alice)\n')   # {'id': 1, 'name': 'Alice'}
rows = asun.decode('[{id@int, name@str}]:\n(1,Alice),\n(2,Bob)\n')

# schema without scalar hints → scalar values returned as strings
rec2 = asun.decode('{id,name}:\n(1,Alice)\n')           # {'id': '1', 'name': 'Alice'}
```

Block comments are supported anywhere whitespace is allowed:
```python
rec = asun.decode('/* top */ {id@int,name@str}: /* row */ (1, /* name */ Alice)')
```

```python
data = asun.encodeBinary(rows)
```

A schema is required because the binary wire format carries no embedded type information:

```python
rows = asun.decodeBinary(data, "[{id@int, name@str}]")
```

asun-py includes inline typing support for the compiled extension:
```python
from asun import decode

rows = decode("[{id@int, name@str}]:(1,Alice),(2,Bob)")
```

Type checkers will infer `dict[str, Any] | list[dict[str, Any]]` for decode results and validate function signatures from the bundled `asun.pyi`.
Little-endian layout, identical to asun-rs and asun-go:
| Type | Bytes |
|---|---|
| `int` | 8 (i64 LE) |
| `uint` | 8 (u64 LE) |
| `float` | 8 (f64 LE) |
| `bool` | 1 |
| `str` | 4-byte length LE + UTF-8 bytes |
| `optional` | 1-byte tag (0=null, 1=present) + value |
| `slice` | 4-byte count LE + elements |
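The scalar and optional rows of that table can be sketched with Python's `struct` module (the `encode_value` helper is a hypothetical illustration of the layout, not the extension's C++ encoder):

```python
import struct

def encode_value(v, t):
    # Encode one value per the wire-format table above, little-endian.
    if t == "int":
        return struct.pack("<q", v)               # 8 bytes, i64 LE
    if t == "uint":
        return struct.pack("<Q", v)               # 8 bytes, u64 LE
    if t == "float":
        return struct.pack("<d", v)               # 8 bytes, f64 LE
    if t == "bool":
        return struct.pack("<?", v)               # 1 byte
    if t == "str":
        raw = v.encode("utf-8")
        return struct.pack("<I", len(raw)) + raw  # 4-byte length LE + UTF-8 bytes
    if t.endswith("?"):
        # optional: 1-byte tag (0=null, 1=present) + value
        return b"\x00" if v is None else b"\x01" + encode_value(v, t[:-1])
    raise ValueError(f"unknown type: {t}")
```

For example, `encode_value("Alice", "str")` produces a 4-byte little-endian length `5` followed by the UTF-8 bytes, and `encode_value(None, "str?")` is the single tag byte `\x00`. The `slice` row follows the same pattern: a 4-byte count, then each element encoded in turn.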
```bash
# after building:
python3 -m pytest tests/ -v
```

```python
import asun

users = [
    {"id": 1, "name": "Alice", "score": 9.5},
    {"id": 2, "name": "Bob", "score": 7.2},
]

# Schema is inferred automatically—no schema string needed
text = asun.encode(users)               # schema binding without scalar hints
textTyped = asun.encodeTyped(users)     # schema binding with scalar hints
pretty = asun.encodePrettyTyped(users)  # pretty + scalar hints
blob = asun.encodeBinary(users)         # binary (schema inferred internally)

assert asun.decode(textTyped) == users  # round-trip with scalar hints
assert asun.decode(pretty) == users
assert asun.decodeBinary(blob, "[{id@int, name@str, score@float}]") == users
```

Measured on this machine with:

```bash
bash build.sh
PYTHONPATH=. python3 examples/bench.py
```

Headline numbers:
- Flat 1,000-record dataset: ASUN text serialize `118.98ms` vs JSON `403.32ms`; deserialize `221.21ms` vs JSON `441.89ms`
- Flat 10,000-record dataset: ASUN text serialize `81.70ms` vs JSON `293.38ms`; deserialize `158.39ms` vs JSON `317.44ms`
- Size summary for 1,000 flat records: JSON `137,674 B`, ASUN text `57,761 B` (`58%` smaller), ASUN binary `74,454 B` (`46%` smaller vs JSON)
- Throughput summary on 1,000 records: ASUN text was `3.58x` faster than JSON for serialize and `2.01x` faster for deserialize
- Binary mode was even faster: `7.18x` faster than JSON on serialization and `4.16x` faster on deserialization in the benchmark summary