Use dolt JSON encoding #2639

Open
zachmu wants to merge 67 commits into main from zachmu/json-enc

@zachmu zachmu commented Apr 24, 2026

This gives us the storage, merge, and perf benefits for JSON in doltgres.

This also changes the behavior of ORDER BY on JSONB columns. Postgres stores JSON documents in a particular b-tree order so that it can perform its lookups efficiently. Dolt's method for efficient JSON retrieval is quite different, and has nothing to do with the order in which documents are stored in a primary index.
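
To make the ordering difference concrete, here is a rough Python sketch of the jsonb b-tree ordering as the PostgreSQL documentation describes it (Object > Array > Boolean > Number > String > null, with larger containers sorting higher). It is a simplified model, not code from Postgres or Doltgres: it ignores collation and the documented special case for an empty top-level array. `text_cmp` is an arbitrary stand-in for "some other ordering" and is not Dolt's actual order, which this PR does not spell out. All function names are hypothetical.

```python
import json
from functools import cmp_to_key

# Type precedence from the PostgreSQL docs for jsonb b-tree ordering:
# Object > Array > Boolean > Number > String > null
_TYPE_RANK = {type(None): 0, str: 1, int: 2, float: 2, bool: 3, list: 4, dict: 5}


def _cmp_scalar(a, b):
    """Three-way comparison returning -1, 0, or 1."""
    return (a > b) - (a < b)


def _storage_order(obj):
    # Assumption: keys sorted by (length, value) to model "shorter keys first".
    return sorted(obj.items(), key=lambda kv: (len(kv[0]), kv[0]))


def pg_like_cmp(a, b):
    """Simplified model of the documented jsonb ordering (not Doltgres code)."""
    ra, rb = _TYPE_RANK[type(a)], _TYPE_RANK[type(b)]
    if ra != rb:
        return _cmp_scalar(ra, rb)
    if isinstance(a, (list, dict)):
        # Containers with more elements or pairs sort higher.
        if len(a) != len(b):
            return _cmp_scalar(len(a), len(b))
        if isinstance(a, dict):
            # Compare key-1, value-1, key-2, ... in storage order.
            seq = []
            for (ka, va), (kb, vb) in zip(_storage_order(a), _storage_order(b)):
                seq.append((ka, kb))
                seq.append((va, vb))
        else:
            seq = list(zip(a, b))
        for x, y in seq:
            c = pg_like_cmp(x, y)
            if c:
                return c
        return 0
    return _cmp_scalar(a, b)


def text_cmp(a, b):
    # Arbitrary stand-in ordering (serialized text); NOT Dolt's real ordering.
    return _cmp_scalar(json.dumps(a, sort_keys=True), json.dumps(b, sort_keys=True))


def count_greater(docs, pivot, cmp):
    """Model of SELECT count(*) ... WHERE j > pivot under a given ordering."""
    return sum(1 for d in docs if cmp(d, pivot) > 0)


docs = [json.loads(s) for s in (
    '{"p":1}', '[1,2,3]', 'true', '42', '"abc"', 'null', '{"a":1,"b":2}', '{"z":0}',
)]
pivot = {"p": 1}

if __name__ == "__main__":
    print(sorted(docs, key=cmp_to_key(pg_like_cmp)))
    print(count_greater(docs, pivot, pg_like_cmp))
    print(count_greater(docs, pivot, text_cmp))
```

The two `count_greater` calls return different counts for the same data and the same pivot, which is the kind of difference to expect from range predicates such as `j > '{"p":1}'` when the underlying order is not the Postgres one.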

github-actions Bot commented Apr 24, 2026

| Benchmark | Main | PR | Change |
| --- | --- | --- | --- |
| covering_index_scan_postgres | 1929.12/s | 1936.30/s | +0.3% |
| groupby_scan_postgres | 125.87/s | 126.20/s | +0.2% |
| index_join_postgres | 649.78/s | 640.57/s | -1.5% |
| index_join_scan_postgres | 805.94/s | 813.18/s | +0.8% |
| index_scan_postgres | 23.01/s | 23.67/s | +2.8% |
| oltp_delete_insert_postgres | 741.94/s | 761.45/s | +2.6% |
| oltp_insert | 667.07/s | 685.02/s | +2.6% |
| oltp_point_select | 2990.73/s | 3065.69/s | +2.5% |
| oltp_read_only | 3075.31/s | 3104.53/s | +0.9% |
| oltp_read_write | 2011.09/s | ${\color{lightgreen}2282.78/s}$ | ${\color{lightgreen}+13.5\%}$ |
| oltp_update_index | 619.52/s | ${\color{lightgreen}717.09/s}$ | ${\color{lightgreen}+15.7\%}$ |
| oltp_update_non_index | 679.56/s | 745.15/s | +9.6% |
| oltp_write_only | 1600.80/s | 1624.32/s | +1.4% |
| select_random_points | 1881.35/s | 1874.71/s | -0.4% |
| select_random_ranges | 1103.03/s | 1124.35/s | +1.9% |
| table_scan_postgres | 21.99/s | 22.69/s | +3.1% |
| types_delete_insert_postgres | 680.48/s | 733.18/s | +7.7% |
| types_table_scan_postgres | 9.21/s | ${\color{red}7.75/s}$ | ${\color{red}-15.9\%}$ |

@zachmu zachmu requested a review from nicktobey April 24, 2026 23:30

github-actions Bot commented Apr 24, 2026

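| | Main | PR |
| --- | --- | --- |
| Total | 42090 | 42090 |
| Successful | 18250 | 18280 |
| Failures | 23840 | 23810 |
| Partial Successes<sup>1</sup> | 5385 | 5414 |

| | Main | PR |
| --- | --- | --- |
| Successful | 43.3595% | 43.4307% |
| Failures | 56.6405% | 56.5693% |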

${\color{red}Regressions (3)}$

json_encoding

QUERY:          SELECT '"\u0000"'::json;
RECEIVED ERROR: row sets differ:
    Postgres:
        {"�"}
    Doltgres:
        {"\"�\""}

jsonb

QUERY:          SELECT count(*) FROM testjsonb WHERE j > '{"p":1}';
RECEIVED ERROR: row sets differ:
    Postgres:
        {884}
    Doltgres:
        {894}
QUERY:          select '12345.0000000000000000000000000000000000000000000005'::jsonb::numeric;
RECEIVED ERROR: row sets differ:
    Postgres:
        {"12345.0000000000000000000000000000000000000000000005"}
    Doltgres:
        {"12345"}

${\color{lightgreen}Progressions (33)}$

json

QUERY: SELECT test_json -> 'x'
FROM test_json
WHERE json_type = 'scalar';
QUERY: SELECT test_json -> 'x'
FROM test_json
WHERE json_type = 'array';
QUERY: SELECT test_json -> 'x'
FROM test_json
WHERE json_type = 'object';
QUERY: SELECT test_json -> 2
FROM test_json
WHERE json_type = 'scalar';
QUERY: SELECT test_json -> 2
FROM test_json
WHERE json_type = 'object';
QUERY: select '{"a": [{"b": "c"}, {"b": "cc"}]}'::json -> 1;
QUERY: select '{"a": [{"b": "c"}, {"b": "cc"}]}'::json -> -1;
QUERY: select '{"a": [{"b": "c"}, {"b": "cc"}]}'::json -> 'z';
QUERY: select '{"a": [{"b": "c"}, {"b": "cc"}]}'::json -> '';
QUERY: select '[{"b": "c"}, {"b": "cc"}]'::json -> 3;
QUERY: select '[{"b": "c"}, {"b": "cc"}]'::json -> 'z';
QUERY: select '"foo"'::json -> 1;
QUERY: select '"foo"'::json -> 'z';
QUERY: select '{"a": {"b":{"c": "foo"}}}'::json #> array['a', null];
QUERY: select '{"a": {"b":{"c": "foo"}}}'::json #> array['a', ''];
QUERY: select '{"a": {"b":{"c": "foo"}}}'::json #> array['a','b','c','d'];
QUERY: select '{"a": {"b":{"c": "foo"}}}'::json #> array['a','z','c'];
QUERY: select '{"a": [{"b": "c"}, {"b": "cc"}]}'::json #> array['a','z','b'];
QUERY: select '[{"b": "c"}, {"b": "cc"}]'::json #> array['z','b'];
QUERY: select '"foo"'::json #> array['z'];
QUERY: select '42'::json #> array['f2'];
QUERY: select '42'::json #> array['0'];

json_encoding

QUERY: SELECT '"\u000g"'::json;
QUERY: SELECT '"\u000g"'::jsonb;
QUERY: SELECT jsonb '{ "a":  "dollar \\u0024 character" }' as not_an_escape;
QUERY: SELECT jsonb '{ "a":  "null \\u0000 escape" }' as not_an_escape;

jsonb

QUERY: SELECT count(distinct j) FROM testjsonb;
QUERY: SELECT '["a","b","c",[1,2],null]'::jsonb -> -6;
QUERY: SELECT '{"a":"b","c":[1,2,3]}'::jsonb #> '{c,3}';
QUERY: SELECT '{"a":"b","c":[1,2,3]}'::jsonb #> '{c,-1}';
QUERY: SELECT '{"a":"b","c":[1,2,3]}'::jsonb #> '{c,-3}';
QUERY: SELECT '{"a":"b","c":[1,2,3]}'::jsonb #> '{c,-4}';

subselect

QUERY: select count(*) from tenk1 t
where (exists(select 1 from tenk1 k where k.unique1 = t.unique2) or ten < 0);

Footnotes

  1. These are tests that we're marking as Successful, however they do not match the expected output in some way. This is due to small differences, such as different wording on the error messages, or the column names being incorrect while the data itself is correct.
