The architecture of cjsonx is designed around extreme performance and memory locality. It achieves parse speeds in excess of 1.0 GB/s by decoupling the parsing process into two distinct phases: SIMD Structural Scanning (Stage 1) and Recursive Descent Parsing (Stage 2).
This approach was pioneered by simdjson and popularized in C by yyjson. cjsonx builds upon these foundations by introducing the Eisel-Lemire float parsing algorithm into the lexical layer for incredibly fast numerical decoding.
In a conventional parser, the CPU spends much of its time checking characters one at a time (`if (c == ' ') continue;`). These data-dependent branches are hard to predict, so they cause frequent branch mispredictions and pipeline stalls.
cjsonx completely avoids this. Instead, it reads the JSON payload in 32-byte (AVX2) or 16-byte (NEON) chunks.
- Bitmask Generation: SIMD instructions are used to compare the chunk against structural characters (`{`, `}`, `[`, `]`, `:`, `,`, `"`).
- String Escaping: A secondary SIMD pass identifies the start and end of strings, ensuring that characters inside strings (like a colon `:` inside `"time: 12:00"`) are masked out and ignored.
- Tape Creation: The indices of all valid structural characters are compressed into a "Tape" (an array of integers).
By the end of Stage 1, we have a complete map of the JSON structure. Whitespace sets no bits in the structural bitmask, so it is skipped at the scanning rate of several gigabytes per second instead of being tested byte by byte.
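The three Stage 1 steps can be sketched in portable C. This is a minimal illustration, not cjsonx's actual code: the names `eq_mask`, `prefix_xor`, and `demo_build_tape` are hypothetical, a scalar loop stands in for the SIMD compare, and `__builtin_ctzll` assumes GCC or Clang. It handles a single block of up to 64 bytes and ignores backslash-escaped quotes, which a real scanner must handle along with state carried between blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i is set when buf[i] == c. A scalar stand-in for a SIMD
   byte compare followed by a movemask. */
static uint64_t eq_mask(const char *buf, size_t len, char c) {
    uint64_t m = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == c) m |= (uint64_t)1 << i;
    return m;
}

/* Bit i of the result is the XOR of bits 0..i of x. */
static uint64_t prefix_xor(uint64_t x) {
    x ^= x << 1;  x ^= x << 2;  x ^= x << 4;
    x ^= x << 8;  x ^= x << 16; x ^= x << 32;
    return x;
}

/* Writes the offsets of structural characters in buf[0..len) to tape
   and returns how many were written. One block of at most 64 bytes. */
static size_t demo_build_tape(const char *buf, size_t len, uint32_t *tape) {
    assert(len <= 64);

    /* Bitmask generation: one mask per structural character. */
    uint64_t quotes = eq_mask(buf, len, '"');
    uint64_t structural = eq_mask(buf, len, '{') | eq_mask(buf, len, '}')
                        | eq_mask(buf, len, '[') | eq_mask(buf, len, ']')
                        | eq_mask(buf, len, ':') | eq_mask(buf, len, ',');

    /* String masking: bits are set from each opening quote up to, but
       not including, its closing quote. Escaped quotes are not handled. */
    uint64_t in_string = prefix_xor(quotes);
    structural = (structural & ~in_string) | quotes;

    /* Tape creation: extract the index of each set bit, lowest first. */
    size_t n = 0;
    while (structural) {
        tape[n++] = (uint32_t)__builtin_ctzll(structural);
        structural &= structural - 1; /* clear the lowest set bit */
    }
    return n;
}
```

For the input `{"time: 12:00":1}` this produces the tape `0, 1, 13, 14, 16`: the two colons inside the string are masked out, and whitespace never appears because it sets no bits.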
With the Tape constructed, the parser no longer reads the JSON character by character. It jumps directly from one structural character to the next.
- Computed Gotos: On supported compilers (GCC/Clang), the central `switch` statement is replaced with a dispatch table (`goto *dispatch_table[c]`). This allows the CPU's branch predictor to track the flow of JSON states independently, significantly reducing branch mispredictions and boosting parse speed by 10-15%.
- Flat Arena Allocation: Instead of calling `malloc()` for every object, array, or string, `cjsonx` allocates a single block of memory (an Arena) upfront.
- 16-byte Nodes: Every JSON element (whether it's a number, a string reference, or an object) is represented by a highly compressed 16-byte structure (`cjsonx_node_t`).
  - 8 bytes for data (a `double`, a string length, or child indices).
  - 8 bytes for metadata (type, tape index).
- Cache Locality: Because nodes are stored sequentially in the Arena array, traversing the DOM tree is extremely cache-friendly, leading to massive speedups during querying (`cjsonx_get`).
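The pieces above can be combined in a small sketch: a 16-byte node, a flat arena, and a tape walk that dispatches through a label table on GCC/Clang and through a plain `switch` elsewhere. Everything here is an assumption made for illustration. The real `cjsonx_node_t` layout is not shown in this document, and `demo_node_t`, `demo_arena_t`, `arena_init`, `arena_push`, and `build_containers` are hypothetical names. The walk records only objects and arrays to keep the example short, and it rebuilds the table on every call, which a real parser would avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { DEMO_OBJECT = 1, DEMO_ARRAY = 2 };

/* Hypothetical 16-byte node: 8 bytes of data plus 8 bytes of metadata. */
typedef struct {
    union {
        double   num;      /* numeric value */
        uint64_t len;      /* string length */
        uint32_t child[2]; /* child indices */
    } data;
    uint32_t type;         /* node kind */
    uint32_t tape_index;   /* position in the Stage 1 tape */
} demo_node_t;

_Static_assert(sizeof(demo_node_t) == 16, "node must stay 16 bytes");

/* One upfront allocation; nodes are handed out sequentially. */
typedef struct { demo_node_t *nodes; size_t used, cap; } demo_arena_t;

static int arena_init(demo_arena_t *a, size_t cap) {
    a->nodes = malloc(cap * sizeof *a->nodes);
    a->used = 0;
    a->cap = a->nodes ? cap : 0;
    return a->nodes != NULL;
}

static demo_node_t *arena_push(demo_arena_t *a, uint32_t type, uint32_t idx) {
    if (a->used == a->cap) return NULL; /* arena full */
    demo_node_t *n = &a->nodes[a->used++];
    n->data.len = 0;
    n->type = type;
    n->tape_index = idx;
    return n;
}

/* Visits every tape entry and appends a node for each '{' and '['.
   Returns the number of nodes in the arena. */
static size_t build_containers(const char *json, const uint32_t *tape,
                               size_t count, demo_arena_t *a) {
    assert(json != NULL && tape != NULL);
    size_t i = 0;
#if defined(__GNUC__)
    /* Label table, rebuilt per call here for simplicity. */
    void *dispatch[256];
    for (int k = 0; k < 256; k++) dispatch[k] = &&do_skip;
    dispatch['{'] = &&do_object;
    dispatch['['] = &&do_array;
    /* Each handler ends in its own indirect jump. */
#define NEXT() do { if (i == count) goto done; \
                    goto *dispatch[(unsigned char)json[tape[i]]]; } while (0)
    NEXT();
do_object: arena_push(a, DEMO_OBJECT, (uint32_t)i); i++; NEXT();
do_array:  arena_push(a, DEMO_ARRAY, (uint32_t)i);  i++; NEXT();
do_skip:   i++; NEXT();
done:
#undef NEXT
#else
    for (; i < count; i++) {
        switch (json[tape[i]]) {
        case '{': arena_push(a, DEMO_OBJECT, (uint32_t)i); break;
        case '[': arena_push(a, DEMO_ARRAY, (uint32_t)i);  break;
        default: break;
        }
    }
#endif
    return a->used;
}
```

With the JSON text `{"a":[1,2]}` and the tape `0, 1, 3, 4, 5, 7, 9, 10`, the walk appends two nodes: an object at tape index 0 and an array at tape index 4. Because the nodes sit next to each other in one array, a later traversal touches consecutive cache lines.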
When parsing strings, cjsonx achieves zero-copy speeds wherever possible. If a string contains escape characters or control characters, it must be validated and potentially re-allocated.
To maximize throughput, cjsonx uses Single-Pass SIMD & SWAR techniques:
- SIMD (AVX2/NEON/WASM): It scans 16 to 32 bytes at a time, checking for escapes (`\`), non-ASCII characters, and raw control characters (`< 0x20`) together in a few vector instructions per chunk.
- SWAR (SIMD Within A Register): On platforms without vector instructions, it loads 8 bytes into a 64-bit register and uses bitwise arithmetic to validate the entire block at once, without looping character by character.
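The SWAR path can be illustrated with well-known bit tricks for finding a zero byte, or a byte below a threshold, in a 64-bit word. The sketch below is an assumption about how such a check might look, not cjsonx's code, and `block_needs_slow_path` is a hypothetical name. It answers one question per 8-byte block: does any byte need the slower validation path?

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL /* 0x01 in every byte lane */
#define HIGHS 0x8080808080808080ULL /* 0x80 in every byte lane */

/* Nonzero if and only if at least one byte lane of v is zero. */
static uint64_t has_zero(uint64_t v) {
    return (v - ONES) & ~v & HIGHS;
}

/* Returns 1 when the 8 bytes at p contain a backslash, a non-ASCII
   byte (>= 0x80), or a raw control character (< 0x20). Otherwise 0. */
static int block_needs_slow_path(const unsigned char *p) {
    assert(p != NULL);
    uint64_t v;
    memcpy(&v, p, 8); /* safe unaligned load */

    uint64_t non_ascii = v & HIGHS;
    /* Nonzero if and only if some lane is below 0x20. */
    uint64_t control   = (v - ONES * 0x20) & ~v & HIGHS;
    /* XOR turns every backslash lane into zero. */
    uint64_t backslash = has_zero(v ^ (ONES * '\\'));

    return (non_ascii | control | backslash) != 0;
}
```

A block such as `abcdefgh` returns 0 and can be referenced without copying, while a block containing `\`, a tab, or a UTF-8 multibyte sequence returns 1. The combined result is only reliable as a yes/no answer. Individual lanes above the first match can carry false positives from borrow propagation, so locating the exact byte needs an extra step.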
Many parsers convert decimal text to a 64-bit IEEE 754 double with `strtod` or with custom integer math. The former can be extremely slow, and hand-written conversions are sometimes inaccurate on complex edge cases.
cjsonx leverages the Eisel-Lemire Algorithm directly in the parsing pipeline:
- As the parser reads a number, it accumulates the mantissa and exponent using fast 64-bit integer arithmetic.
- It then performs a high-precision table lookup (using a precomputed table of powers of 10) and a 128-bit multiplication (`__uint128_t`, or emulated on 32-bit platforms).
- This guarantees that 99.9% of all floating-point numbers are resolved exactly and correctly in a single fast-path operation without any floating-point math overhead.
- If the number is an extreme edge case that the fast path cannot perfectly resolve, it gracefully falls back to the standard C library (`strtod`) to guarantee 100% correctness.
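The overall shape of this pipeline (integer accumulation, a fast path, then a `strtod` fallback) can be sketched as follows. Note the substitution: a full Eisel-Lemire implementation needs a large precomputed table of 128-bit powers of ten, so this sketch uses the simpler classic fast path (often attributed to Clinger) in its place. That path is exact only when the mantissa fits in 53 bits and the decimal exponent is within ±22, and it does use one floating-point multiply or divide. `demo_parse_double` is a hypothetical name, the input is assumed to be an already validated JSON number, and the code assumes `double` arithmetic is evaluated in IEEE 754 double precision.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Powers of ten that are exactly representable as a double. */
static const double POW10[23] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
    1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
    1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
};

static double demo_parse_double(const char *s) {
    assert(s != NULL);
    const char *p = s;
    int neg = (*p == '-');
    if (neg) p++;

    /* Accumulate up to 19 digits in a 64-bit integer; any further
       digit makes the result inexact and forces the fallback. */
    uint64_t mant = 0;
    int exp10 = 0, digits = 0, exact = 1;
    for (; *p >= '0' && *p <= '9'; p++) {
        if (digits < 19) { mant = mant * 10 + (uint64_t)(*p - '0'); digits++; }
        else exact = 0;
    }
    if (*p == '.') {
        for (p++; *p >= '0' && *p <= '9'; p++) {
            if (digits < 19) {
                mant = mant * 10 + (uint64_t)(*p - '0');
                digits++;
                exp10--; /* each fractional digit shifts the exponent */
            } else exact = 0;
        }
    }
    if (*p == 'e' || *p == 'E') {
        int eneg = 0, e = 0;
        p++;
        if (*p == '+' || *p == '-') { eneg = (*p == '-'); p++; }
        for (; *p >= '0' && *p <= '9'; p++)
            if (e < 10000) e = e * 10 + (*p - '0'); /* clamp huge exponents */
        exp10 += eneg ? -e : e;
    }

    /* Fast path: both operands are exact doubles, so a single
       correctly rounded multiply or divide gives the right answer. */
    if (exact && mant <= (1ULL << 53) && exp10 >= -22 && exp10 <= 22) {
        double d = (double)mant;
        d = exp10 < 0 ? d / POW10[-exp10] : d * POW10[exp10];
        return neg ? -d : d;
    }

    /* Slow path: defer to the C library for everything else. */
    return strtod(s, NULL);
}
```

Inputs such as `12.5`, `-1e3`, and `0.1` take the fast path, while `1e400` or a number with more than 19 significant digits goes to `strtod`. Eisel-Lemire widens the fast path to nearly all inputs by replacing the floating-point multiply with a 64-by-64-bit integer multiply against a table entry, but the control flow stays the same.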