Skip to content

GH-41017: [C++] Preserve ordered flag in DictionaryBuilder#49797

Open
tinezivic wants to merge 1 commit intoapache:mainfrom
tinezivic:fix-dictionary-builder-ordered
Open

GH-41017: [C++] Preserve ordered flag in DictionaryBuilder#49797
tinezivic wants to merge 1 commit intoapache:mainfrom
tinezivic:fix-dictionary-builder-ordered

Conversation

@tinezivic
Copy link
Copy Markdown

@tinezivic tinezivic commented Apr 19, 2026

Rationale for this change

DictionaryBuilderBase::type() does not pass the ordered parameter to ::arrow::dictionary(), causing it to always default to false. This means that any DictionaryArray built through MakeBuilder() or MakeDictionaryBuilder() with an ordered DictionaryType will produce an array where type().ordered() == false.

This affects PyArrow users: pa.array(data, type=pa.dictionary(pa.int8(), pa.string(), ordered=True)) returns an array with ordered=False.

Reported in: #41017
Also caused: pandas-dev/pandas#58152

What changes are included in this PR?

  • Add bool ordered_ = false member and set_ordered() method to DictionaryBuilderBase (both the primary template and the NullType specialization)
  • Pass ordered_ to ::arrow::dictionary() in type() and FinishInternal()
  • Add bool ordered field to DictionaryBuilderCase in builder.cc
  • Propagate dict_type.ordered() through MakeBuilderImpl::Visit() and MakeDictionaryBuilder()

Are these changes tested?

Yes. Added 3 C++ tests in array_dict_test.cc:

  • MakeBuilderPreservesOrdered — verifies MakeBuilder with ordered dict type produces ordered array
  • MakeBuilderUnorderedByDefault — verifies unordered stays unordered
  • MakeDictionaryBuilderPreservesOrdered — verifies MakeDictionaryBuilder preserves ordered

Are there any user-facing changes?

No API changes. DictionaryBuilder now correctly preserves the ordered flag from the input DictionaryType, fixing the silent data loss.

AI Disclosure

This fix was developed with the assistance of GitHub Copilot (Claude Sonnet 4.6). The AI assisted with code generation, C++ template structure, and diff review. The contributor identified the root cause, understood the DictionaryBuilder internals, and verified the fix logic.

@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #41017 has been automatically assigned in GitHub to PR creator.

@tinezivic tinezivic force-pushed the fix-dictionary-builder-ordered branch from 7960ab8 to d3323b0 Compare April 19, 2026 06:11
@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #41017 has been automatically assigned in GitHub to PR creator.

Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note that this will probably also solve #49674 ...there is a unit test in that PR that would be good to include here if this approach seems reasonable.

I am wondering if there is some way to set the ordering of the values in a dictionary builder when using ordered = True, or otherwise the result will be an array of values probably ordered by where each value was first seen. I am not sure there is a realistic use case for building an ordered dictionary where the values aren't known in advance (but perhaps that is possible with the DictionaryBuilder).

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Apr 20, 2026
DictionaryBuilderBase::type() did not pass the ordered parameter
to ::arrow::dictionary(), causing it to always default to false.
This meant that building a DictionaryArray via MakeBuilder or
MakeDictionaryBuilder with an ordered DictionaryType would produce
an array with ordered=false.

Fix: Add ordered_ member and set_ordered() to DictionaryBuilderBase
(both the primary template and the NullType specialization). The
DictionaryBuilderCase in builder.cc now propagates the ordered flag
from the input DictionaryType to the builder after construction.

Generated-by: GitHub Copilot
@tinezivic tinezivic force-pushed the fix-dictionary-builder-ordered branch from d3323b0 to 03c4560 Compare April 20, 2026 16:05
@tinezivic
Copy link
Copy Markdown
Author

Thank you for the review and for pointing to #49798!

Regarding the test from #49798: I have added TestMakeEmptyArrayPreservesDictionaryOrdered to array_test.cc (just amended the commit). It verifies that MakeEmptyArray with an ordered=true DictionaryType produces an array whose type still has ordered()==true. Since MakeEmptyArray calls MakeBuilder internally, this test directly exercises the fix path and covers the scenario from #49674.

Regarding ordering of values: You are right to flag this. The ordered flag in Arrow DictionaryType is a semantic annotation for comparisons (analogous to pandas Categorical(ordered=True)), not a guarantee that dictionary values are physically sorted. A DictionaryBuilder that sees values in insertion order may produce a dictionary where the physical ordering of values does not match the logical ordering implied by ordered=true. This is an inherent characteristic of builder-based encoding: if the caller knows the complete dictionary in advance and wants a truly sorted encoding, they should construct a DictionaryArray directly rather than via a builder. Preserving the ordered flag here is still correct — it prevents silent data loss where a type annotation is simply dropped — but the responsibility for ensuring value ordering rests with the caller. I can add a note to this effect in the PR description if that would be helpful.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants