C++ port of the canonical TypeScript implementation.
Status: complete. Full TS-canonical parity: all 40 functions, 15 type bit-flags, 3 mode constants
(M_KEYPRE/M_KEYPOST/M_VAL), SKIP/DELETE sentinels (pointer-identity), and the Injection state machine.
inject/transform/validate/select all dispatch through the canonical injector machinery: 11 transform commands, 6 validate checkers, 4 select operators.
Passes the full shared corpus. Run locally with make test from cpp/. Per-file pass counts are written to corpus-scoreboard.json after each run; the committed baseline lives at test-baseline.json.
For motivation, language-neutral concepts, and the cross-language
parity matrix, see the top-level README and
REPORT.md. For the in-depth guide (tutorial, recipes,
explanation), see DOCS.md.
In the monorepo:
cd cpp
make test # smoke + corpus driver (the default build target)
make smoke # just the smoke test
make corpus # just the corpus driver
make sanitize # build + run with ASan + UBSan
make check_leak # build + run under valgrind
The library is header-only across three files in src/:
- value.hpp: Value (a std::variant-based tagged type), the in-tree OrderedMap, Sentinel, type bit-flags, predicates.
- value_io.hpp: JSON parse/serialise via an in-tree, hand-written recursive-descent parser/printer (no third-party dependency).
- voxgig_struct.hpp: the main API (utilities, getpath/setpath/walk/merge/inject/transform/validate/select, plus all transform_*/validate_*/select_* injectors).
Namespace voxgig::structlib. Requires C++17 (for std::variant /
structured bindings). The library proper has no third-party
dependency — runtime values use the custom Value type and JSON text is
handled in-tree. nlohmann/json is used
only by the test harness (corpus loading), so make test expects its
header on the include path (e.g. /usr/include/nlohmann/json.hpp).
#include "voxgig_struct.hpp"
using namespace voxgig::structlib;

int main() {
  // Build a value and read a deep path.
  Value store = Value::parse(R"({"db":{"host":"localhost","port":5432}})");
  Value host = getpath_v(store, Value("db.host")); // "localhost"

  // Reshape by example.
  Value out = transform(
      Value::parse(R"({"user":{"first":"Ada"},"age":36})"),
      Value::parse(R"({"name":"`user.first`","years":"`age`"})"));
  // {"name":"Ada","years":36}
  return 0;
}

(Construct Values with Value::parse(json_text) or the typed
constructors; see DOCS.md and src/value.hpp for the full
set.)
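
To make the deep-path read in the example above more concrete, here is a minimal, self-contained sketch of dotted-path descent. It does not use the library: Node, get_by_path and make_demo_store are hypothetical names invented for this illustration, and the real getpath_v covers cases (lists, alternates, and so on) that this sketch leaves out.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a nested value: a leaf string plus named children.
struct Node {
    std::string leaf;
    std::map<std::string, std::shared_ptr<Node>> kids;
};

// Walk a dotted path such as "db.host" one key at a time.
// Returns nullptr as soon as a key is missing.
inline const Node* get_by_path(const Node& root, const std::string& path) {
    const Node* cur = &root;
    std::size_t start = 0;
    while (start <= path.size()) {
        const std::size_t dot = path.find('.', start);
        const std::size_t end = (dot == std::string::npos) ? path.size() : dot;
        const std::string key = path.substr(start, end - start);
        const auto it = cur->kids.find(key);
        if (it == cur->kids.end() || !it->second) return nullptr;
        cur = it->second.get();
        if (dot == std::string::npos) break;  // last segment consumed
        start = dot + 1;
    }
    return cur;
}

// Build a {"db":{"host":"localhost"}} shape to try the lookup on.
inline Node make_demo_store() {
    auto host = std::make_shared<Node>();
    host->leaf = "localhost";
    auto db = std::make_shared<Node>();
    db->kids["host"] = host;
    Node store;
    store.kids["db"] = db;
    return store;
}
```

With this sketch, get_by_path(make_demo_store(), "db.host") points at the node whose leaf is "localhost", while a path with a missing key such as "db.port" yields nullptr.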
Functions take const Value& arguments and return Value (or
std::vector<Value> for select). The full, example-by-example reference
is in DOCS.md; the canonical semantics for every function
are in the top-level reference.
Two C++-specific naming points (the names are otherwise the canonical ones):
- _v ("value-style") suffix on walk_v, merge_v, getpath_v, setpath_v: disambiguates the public value API from header-internal helpers of the same root name.
- typename_str instead of typename: typename is a reserved C++ keyword.
The parity check (../tools/check_parity.py) maps these back to the
canonical names, so the port reports full parity.
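
The rename mapping is small enough to write down as a lookup table. The sketch below is only an illustration in C++: canonical_name is a hypothetical helper, and the real check lives in ../tools/check_parity.py, whose implementation is not reproduced here.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: map a C++ port name back to its canonical name.
// Only the five renames listed above are covered; other names pass through.
inline std::string canonical_name(const std::string& cpp_name) {
    static const std::unordered_map<std::string, std::string> renames = {
        {"walk_v", "walk"},
        {"merge_v", "merge"},
        {"getpath_v", "getpath"},
        {"setpath_v", "setpath"},
        {"typename_str", "typename"},
    };
    const auto it = renames.find(cpp_name);
    return it == renames.end() ? cpp_name : it->second;
}
```

An explicit table is used rather than stripping a trailing _v so that unrelated names are never rewritten by accident.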
Runtime values are the in-tree Value type (a std::variant tagged
union) with an in-tree insertion-ordered OrderedMap. Nested maps and
lists are reference-stable — the property the canonical algorithm relies on
for walk/merge/inject/setpath.
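
As a rough picture of what such a representation can look like, the sketch below pairs a std::variant-based value with an insertion-ordered map. SketchValue and SketchMap are hypothetical names, not the library's definitions, and holding nested maps through std::shared_ptr is an assumption about how reference stability might be achieved; src/value.hpp is the authority on the real layout.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct SketchMap;  // defined below

// Hypothetical tagged value; the library's Value has more alternatives.
struct SketchValue {
    std::variant<std::monostate, bool, double, std::string,
                 std::shared_ptr<SketchMap>> data;
};

// Insertion-ordered map: entries stay in the order their keys were first set.
struct SketchMap {
    std::vector<std::pair<std::string, SketchValue>> items;

    // Overwrite in place if the key exists, otherwise append.
    void set(const std::string& key, SketchValue value) {
        for (auto& [k, v] : items) {
            if (k == key) {
                v = std::move(value);
                return;
            }
        }
        items.emplace_back(key, std::move(value));
    }

    // Linear lookup; returns nullptr when the key is absent.
    const SketchValue* find(const std::string& key) const {
        for (const auto& [k, v] : items) {
            if (k == key) return &v;
        }
        return nullptr;
    }
};
```

Because a nested map is held through a shared handle in this sketch, copying a SketchValue copies the handle rather than the map, so a later write through one copy is visible through the other. Re-setting an existing key keeps its original position, which is what keeps iteration order stable.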
value.hpp/value_io.hpp parse and serialise JSON in-tree; the library
links no JSON dependency. nlohmann/json appears only in the test driver.
The port follows the shared Group A/B rule (see
../UNDEF_SPEC.md): readers treat a stored null as
"no value"; value-processors preserve it. Value::undef() is the absent
sentinel used as the default alt.
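
The reader versus value-processor distinction can be sketched without the library. In the block below, Cell, Store, read_or and double_numbers are hypothetical names chosen for illustration; std::monostate plays the role that Value::undef() plays in the port, and the exact rules are the ones in ../UNDEF_SPEC.md, not this sketch.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <variant>

// Hypothetical cell type: std::monostate plays the "absent" role,
// std::nullptr_t stands for a stored JSON null.
using Cell = std::variant<std::monostate, std::nullptr_t, double, std::string>;
using Store = std::map<std::string, Cell>;

inline bool is_absent(const Cell& c) { return std::holds_alternative<std::monostate>(c); }
inline bool is_null(const Cell& c) { return std::holds_alternative<std::nullptr_t>(c); }

// Reader: a missing key and a stored null both fall back to alt,
// which defaults to the absent cell.
inline Cell read_or(const Store& s, const std::string& key, const Cell& alt = Cell{}) {
    const auto it = s.find(key);
    if (it == s.end() || is_null(it->second)) return alt;
    return it->second;
}

// Value-processor: doubles every number and leaves all other cells,
// including stored nulls, exactly as they were.
inline Store double_numbers(const Store& s) {
    Store out;
    for (const auto& [key, cell] : s) {
        if (const double* n = std::get_if<double>(&cell)) {
            out[key] = Cell(*n * 2.0);
        } else {
            out[key] = cell;
        }
    }
    return out;
}
```

Reading a key that holds null gives the same result as reading a key that does not exist, while passing the same store through double_numbers keeps the null entry in place.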
Uniform six-function regex API (see /design/REGEX_API.md). The C++ port
wraps <regex> (C++11), which defaults to the ECMAScript dialect.
| Function | Maps to |
|---|---|
| re_compile(pattern) | std::regex(pattern) (throws std::regex_error on bad pattern) |
| re_test(pattern, input) | std::regex_search → bool |
| re_find(pattern, input) | first match groups as std::vector<std::string> (empty if no match) |
| re_find_all(pattern, input) | std::vector<std::vector<std::string>> |
| re_replace(pattern, input, rep) | std::regex_replace(input, re, rep) |
| re_escape(s) | escape regex metacharacters |
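
The six wrappers in the table could be layered over <regex> roughly as follows. This is a sketch, not the port's source: the function names come from the table, but the namespace re_sketch, the choice of std::string for every parameter, and the exact metacharacter set in re_escape are assumptions.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

namespace re_sketch {

// Compile a pattern; std::regex throws std::regex_error on a bad pattern.
inline std::regex re_compile(const std::string& pattern) {
    return std::regex(pattern);  // ECMAScript dialect by default
}

// True if the pattern matches anywhere in the input.
inline bool re_test(const std::string& pattern, const std::string& input) {
    return std::regex_search(input, re_compile(pattern));
}

// Groups of the first match (index 0 is the whole match); empty if no match.
inline std::vector<std::string> re_find(const std::string& pattern,
                                        const std::string& input) {
    const std::regex re = re_compile(pattern);
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_search(input, m, re)) {
        for (const auto& sub : m) groups.push_back(sub.str());
    }
    return groups;
}

// Groups of every non-overlapping match, in input order.
inline std::vector<std::vector<std::string>> re_find_all(const std::string& pattern,
                                                         const std::string& input) {
    const std::regex re = re_compile(pattern);
    std::vector<std::vector<std::string>> all;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        std::vector<std::string> groups;
        for (const auto& sub : *it) groups.push_back(sub.str());
        all.push_back(std::move(groups));
    }
    return all;
}

// Replace every match of the pattern with rep.
inline std::string re_replace(const std::string& pattern, const std::string& input,
                              const std::string& rep) {
    return std::regex_replace(input, re_compile(pattern), rep);
}

// Backslash-escape characters that are special in ECMAScript patterns.
inline std::string re_escape(const std::string& s) {
    static const std::string meta = R"(\^$.|?*+()[]{})";
    std::string out;
    out.reserve(s.size() * 2);
    for (const char c : s) {
        if (meta.find(c) != std::string::npos) out.push_back('\\');
        out.push_back(c);
    }
    return out;
}

}  // namespace re_sketch
```

Each call recompiles the pattern, which keeps the sketch short; code that reuses one pattern many times would normally hold on to the std::regex returned by re_compile.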
Patterns must stay inside the RE2 subset documented in /design/REGEX.md.
std::regex defaults to ECMAScript syntax and accepts backreferences
and lookahead assertions, neither of which RE2 supports; patterns that use them will not be portable.
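
One way to catch non-portable patterns early is a small lint pass before compiling. The helper below is a hypothetical sketch (uses_non_re2_feature is not part of the port) that flags backreferences and lookaround groups. It is a heuristic scan, not a parser, and it does not check the rest of the subset described in /design/REGEX.md.

```cpp
#include <cstddef>
#include <string>

// Hypothetical lint helper: true if the pattern contains a backreference
// (\1..\9) or a lookaround group, constructs that RE2 rejects.
// Heuristic only; it does not validate the pattern.
inline bool uses_non_re2_feature(const std::string& p) {
    bool in_class = false;  // true while scanning inside [...]
    for (std::size_t i = 0; i < p.size(); ++i) {
        const char c = p[i];
        if (c == '\\') {
            // An escape consumes the next character.
            if (i + 1 < p.size()) {
                const char n = p[i + 1];
                if (!in_class && n >= '1' && n <= '9') return true;  // backreference
                ++i;
            }
            continue;
        }
        if (in_class) {
            if (c == ']') in_class = false;
            continue;
        }
        if (c == '[') {
            in_class = true;
            continue;
        }
        if (c == '(') {
            if (p.compare(i, 3, "(?=") == 0) return true;   // positive lookahead
            if (p.compare(i, 3, "(?!") == 0) return true;   // negative lookahead
            if (p.compare(i, 4, "(?<=") == 0) return true;  // positive lookbehind
            if (p.compare(i, 4, "(?<!") == 0) return true;  // negative lookbehind
        }
    }
    return false;
}
```

For example, "(a)\1" and "a(?=b)" are flagged, while "^[a-z]+\d{2}$" and a character class such as "[(?=]" are not.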
- libstdc++ <regex> has the worst-in-class catastrophic backtracking. The discovery panel measures ~1.2 s for ^(a+)+$ over 22 a's plus !. This is well-known and is the reason many production C++ projects avoid <regex> in favour of RE2 or PCRE2. Stay inside the RE2 subset and avoid nested quantifiers; even then, performance won't match the dedicated engines.
- Zero-width replace. re_replace("a*", "abc", "X") returns "XXbXcX", the ECMA convention shared by all PCRE/ECMA/.NET/Java/Onigmo engines plus the in-tree Thompson ports. Go (RE2) returns "XbXcX" instead; see /design/REGEX_PATHOLOGICAL.md.
See /design/REGEX_PATHOLOGICAL.md for the cross-port pathological-input panel.
cd cpp
make test # compile + run the corpus driver
make lint # clang-tidy + clang-format check
Tests live in tests/; the corpus driver reads the shared
fixtures from ../build/test/.