Overhaul parquet viewer: full encoding support + rich navigation #24

Open
nwoolmer wants to merge 9 commits into master from feature/parquet-viewer-overhaul
@nwoolmer nwoolmer commented Apr 18, 2026

Summary

Fixes the original motivating bug (VARCHAR columns encoded with DeltaLengthByteArray / DeltaByteArray rendered as empty cells because everything was decoded as PLAIN) and turns the parquet viewer into a real interactive TUI.

Decoder — full encoding coverage

decode_data_page now dispatches on every Encoding variant exposed by parquet2:

  • Plain, PlainDictionary, RleDictionary (existing)
  • DeltaLengthByteArray, DeltaByteArray, DeltaBinaryPacked (new)
  • Rle for BOOLEAN — handles V1 4-byte length prefix vs V2 (new)
  • BitPacked — falls through to PLAIN; only realistic use is BOOLEAN (new)
  • ByteStreamSplit for FLOAT/DOUBLE/INT32/INT64/FixedLenByteArray (new)
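
As an illustration of one of the new paths, here is a minimal sketch of ByteStreamSplit decoding for FLOAT values. It is not the PR's code: the function names are hypothetical, and it only assumes the layout from the Parquet format, where byte k of every value is stored contiguously in stream k. The encoder is included only so the decoder can be checked by round-trip.

```rust
/// Decode BYTE_STREAM_SPLIT data into f32 values (hypothetical helper).
/// For N values of 4 bytes each, the buffer holds 4 streams of N bytes;
/// stream k contains byte k (little-endian order) of every value.
fn decode_byte_stream_split_f32(data: &[u8]) -> Option<Vec<f32>> {
    const WIDTH: usize = 4;
    if data.len() % WIDTH != 0 {
        return None; // truncated or corrupt page
    }
    let n = data.len() / WIDTH;
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        let mut bytes = [0u8; WIDTH];
        for (k, b) in bytes.iter_mut().enumerate() {
            // Byte k of value i lives at offset k * n + i.
            *b = data[k * n + i];
        }
        out.push(f32::from_le_bytes(bytes));
    }
    Some(out)
}

/// Inverse transform, used here only for a round-trip check.
fn encode_byte_stream_split_f32(values: &[f32]) -> Vec<u8> {
    let n = values.len();
    let mut out = vec![0u8; n * 4];
    for (i, v) in values.iter().enumerate() {
        for (k, b) in v.to_le_bytes().iter().enumerate() {
            out[k * n + i] = *b;
        }
    }
    out
}
```

The same index arithmetic applies to DOUBLE/INT64 with a width of 8 and to FixedLenByteArray with the declared type length.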

Decoder correctness fixes

  • INT96 Julian-day math guarded with checked_mul/checked_add
  • Timestamp { is_adjusted_to_utc } honored with a trailing Z
  • Subsecond precision preserved (ms/µs/ns tails, trim trailing zeros)
  • New format_time_of_day for Time logical type (HH:MM:SS[.frac])
  • Repeated columns (max_rep_level > 0) emit an explicit marker instead of silently misaligned per-element output
  • Control characters sanitized at decode time (\n preserved for row-detail popup; \t/\r/other become space/? so the terminal can't reinterpret them and break column alignment)
  • Unicode display width used everywhere a column width is measured
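
The sanitization rule can be sketched as below. This is an assumption about the mapping, not the PR's actual function: it reads "\t/\r/other become space/?" as tab and carriage return becoming a space and every other control character becoming `?`. The name `sanitize_cell` is hypothetical.

```rust
/// Replace control characters so the terminal cannot reinterpret them.
/// Assumed mapping: '\n' is kept (the row-detail popup needs it),
/// '\t' and '\r' become a space, any other control character becomes '?'.
fn sanitize_cell(raw: &str) -> String {
    raw.chars()
        .map(|c| match c {
            '\n' => '\n',
            '\t' | '\r' => ' ',
            c if c.is_control() => '?',
            c => c,
        })
        .collect()
}
```

With this mapping an embedded escape sequence such as ESC `[0m` is shown as `?[0m` instead of being executed by the terminal.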

Table view — interactive TUI

  • Row + column cursor, row-number gutter with selection marker, horizontal scroll indicators (◄ / ►)
  • Cell, row, and column-info popups (Enter, Shift+Enter, i)
  • Regex search with wrap + per-cell match highlighting; works in both tree and table; * / # search current cell; accepting an empty / prompt clears the search
  • Row selection (Space / a, Ctrl+A, Ctrl+I) with selection-aware CSV/NDJSON export
  • Clipboard: y cell, Y row TSV, c column, j row JSON
  • Column ergonomics: freeze (f), hide/show (h/H), width (w/W/=/~), first/last (^/$), page (Shift+←/→), alignment (numeric right-aligns), thousands separators (,)
  • In-window sort (s); canonical row id preserved in gutter
  • Mouse click positions cursor; mouse wheel scrolls
  • Reload (R) preserves cursor/selection/widths/hidden/frozen/sort by column name so schema drift doesn't dislodge UI state
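
Preserving state by column name rather than by index could look roughly like this. The helper and its signature are hypothetical; the sketch only shows the idea of translating saved indices through names and dropping columns that no longer exist after the reload.

```rust
/// Map column indices saved before a reload onto the new schema by name.
/// Columns that disappeared from the schema are dropped from the result.
fn remap_by_name(old_names: &[&str], new_names: &[&str], saved: &[usize]) -> Vec<usize> {
    saved
        .iter()
        // Old index -> old column name.
        .filter_map(|&i| old_names.get(i))
        // Old column name -> position in the reloaded schema, if still present.
        .filter_map(|name| new_names.iter().position(|n| n == name))
        .collect()
}
```

For example, hidden columns `[0, 2]` in a schema `id, ts, price` map to `[2]` after reloading a file whose schema is `ts, qty, id`: `id` moved and `price` is gone.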

Tree view additions

  • Dictionary contents (lazy-loaded per column chunk)
  • INT96 / FixedLenByteArray statistics
  • Column byte offsets (data page, dict page, index page)
  • Pretty-printed JSON in cell popup when value is valid JSON

Sliding-window table buffer

  • TABLE_BUFFER_ROWS raised from 1k → 10k
  • load_table_window does page-level skip before decoding when the requested slice starts past a page boundary
  • Window recenters on the cursor on reload
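
The page-level skip can be pictured with the sketch below. It assumes the row count of each page is known from its header before the page body is decoded (true for non-repeated columns); the function name is hypothetical and this is not the body of load_table_window.

```rust
/// Given the row count of each page in a column chunk and the first row wanted,
/// return (index of the first page to decode, rows to discard inside that page).
/// Returns None when start_row lies past the end of the chunk.
fn first_page_for_row(page_rows: &[usize], start_row: usize) -> Option<(usize, usize)> {
    let mut first_row_of_page = 0usize;
    for (idx, &rows) in page_rows.iter().enumerate() {
        if start_row < first_row_of_page + rows {
            return Some((idx, start_row - first_row_of_page));
        }
        // The whole page precedes the window: skip it without decoding.
        first_row_of_page += rows;
    }
    None
}
```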

Exports

  • e current row group → CSV
  • E full file → CSV (row-group-at-a-time streaming writer)
  • J full file → NDJSON
  • Selection-aware: filename includes .selected when active, and row groups with no selected rows are skipped before decode
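
Skipping row groups before decode might be done along these lines. The sketch assumes the selection is a set of file-wide row ids and that an empty selection means "export everything"; both are assumptions, and the function name is hypothetical.

```rust
use std::collections::BTreeSet;

/// Return the indices of row groups that contain at least one selected row.
/// `group_row_counts[i]` is the number of rows in row group i.
fn row_groups_to_export(group_row_counts: &[u64], selected: &BTreeSet<u64>) -> Vec<usize> {
    let mut out = Vec::new();
    let mut start = 0u64;
    for (idx, &count) in group_row_counts.iter().enumerate() {
        // Assumption: no selection means every row group is exported.
        let wanted = selected.is_empty() || selected.range(start..start + count).next().is_some();
        if wanted {
            out.push(idx);
        }
        start += count;
    }
    out
}
```

Because the check uses only row-group metadata, a group with no selected rows is never read or decoded.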

Tests

997 total, up from 120 pre-change. Covers all new encoding decoders via parquet2's own encoders (round-trip), sanitization, Unicode-width math, alignment classifier, subsecond timestamp rendering, time-of-day, CSV/JSON quoting, regex search, thousands separators, hidden-col navigation, sort comparator, and mouse-click hit-testing.

Help dialog updated with every new key binding.

Test plan

  • Open a parquet file with DeltaLengthByteArray / DeltaByteArray VARCHAR columns — cells render populated, not empty
  • Open a file with BOOLEAN columns (RLE and BitPacked paths)
  • Open a file with ByteStreamSplit-encoded FLOAT/DOUBLE
  • INT96 and subsecond timestamp columns render with correct precision and Z suffix where applicable
  • Table view: cursor movement, search (/, *, #), row selection, clipboard ops, freeze/hide/width adjust, sort (s), reload (R) preserves state
  • Exports: e, E, J with and without selection; .selected appears in filename when active
  • Cell popup pretty-prints JSON when value is valid JSON
  • cargo test — all 997 pass
  • cargo clippy --all-targets -- -D warnings clean

🤖 Generated with Claude Code


Review follow-ups (commits 1fd2df17e99ec4)

Blockers fixed

  • cargo fmt --check clean (was 42 diffs)
  • cargo clippy --all-targets -- -D warnings clean (was 15 errors: useless .into_iter(), manual Iterator::find, clamp-like pattern, ?-operator, .is_multiple_of(), too-many-args → grouped into ColLayout / DataRowCtx structs)
  • 21 new hardcoded Color:: references routed through Theme via 7 new fields (parquet_gutter_fg, parquet_cursor_gutter_fg, parquet_cursor_row_bg, parquet_cursor_cell_bg, parquet_selected_row_fg, parquet_indicator_fg, parquet_popup_cont_fg); both far_manager and questdb_dark palettes populated
  • Popup frame now uses dh::render_dialog_frame() instead of a hand-rolled Block::default() + manual Rect centering + buf.cell_mut() clearing
  • TableSearch holds a TextInput (was a raw String), so the search prompt inherits undo/redo, selection and clipboard paste like every other dialog

Warnings fixed

  • Compiled regex cached on TableSearch keyed by query text; search_regex() call hoisted out of the per-row render loop (was recompiling ~N times per keystroke when search active)
  • Extracted compile_search_regex() helper — was duplicated 3× across search_jump, search_tree_jump, search_regex
  • Extracted pick_unused_export_path() helper — the 1000-attempt file-collision loop was duplicated 3× across CSV / full-CSV / NDJSON exports; magic 1000 is now EXPORT_ATTEMPT_LIMIT
  • map_parquet_key Ctrl branch no longer swallows unmatched Ctrl chords (Ctrl+C etc. can now fall through instead of returning Action::None)
  • Removed write-only TableSearch::last_match_row field (dead code)
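
For readers unfamiliar with the collision loop, a bounded path picker might look like the sketch below. Only the names pick_unused_export_path and EXPORT_ATTEMPT_LIMIT come from this PR; the signature, the `stem.N.ext` naming scheme, and the injected `exists` predicate (used so the logic can be tested without touching the filesystem) are assumptions.

```rust
use std::path::{Path, PathBuf};

const EXPORT_ATTEMPT_LIMIT: u32 = 1000;

/// Return the first candidate path that does not exist yet, or None once
/// EXPORT_ATTEMPT_LIMIT candidates have been tried.
fn pick_unused_export_path(
    dir: &Path,
    stem: &str,
    ext: &str,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    for n in 0..EXPORT_ATTEMPT_LIMIT {
        // First try "stem.ext", then "stem.1.ext", "stem.2.ext", ...
        let name = if n == 0 {
            format!("{stem}.{ext}")
        } else {
            format!("{stem}.{n}.{ext}")
        };
        let candidate = dir.join(name);
        if !exists(&candidate) {
            return Some(candidate);
        }
    }
    None
}
```

In production the predicate would be something like `|p| p.exists()`.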

Bug fixed during follow-up testing

  • Pressing / in Tree view opened the search input, but the hardcoded Tree hint bar never reflected the search state, so typed characters were captured into an invisible prompt. render_tree now shows the live /query prompt (and status messages) when search is active, matching Table-view behavior.

Verification

  • cargo fmt --check
  • cargo clippy --all-targets -- -D warnings
  • cargo test — all 997 pass ✓

nwoolmer and others added 9 commits April 18, 2026 01:08
Fix the original bug that motivated this: VARCHAR columns encoded with
DeltaLengthByteArray / DeltaByteArray rendered as empty cells in the
Data Preview (decoded as PLAIN). decode_data_page now dispatches on
every Encoding variant exposed by parquet2:

 - Plain, PlainDictionary, RleDictionary (existing paths)
 - DeltaLengthByteArray, DeltaByteArray, DeltaBinaryPacked (new)
 - Rle (BOOLEAN; handles V1 4-byte length prefix vs V2) (new)
 - BitPacked (falls through to PLAIN; only realistic use is BOOLEAN) (new)
 - ByteStreamSplit for FLOAT/DOUBLE/INT32/INT64/FixedLenByteArray (new)

Also correctness-level fixes in the decoder:

 - INT96 Julian-day math guarded with checked_mul/checked_add
 - Timestamp { is_adjusted_to_utc } honored with a trailing `Z`
 - Subsecond precision preserved (ms/µs/ns tails, trim trailing zeros)
 - New format_time_of_day for Time logical type (HH:MM:SS[.frac])
 - Repeated columns (max_rep_level > 0) emit an explicit marker
   instead of silently misaligned per-element output
 - Control characters sanitized at decode time (\n preserved for
   row-detail popup; \t/\r/other become space/? so terminal can't
   reinterpret them and break column alignment)
 - Unicode display width used everywhere a column width is measured
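
The HH:MM:SS[.frac] format with trimmed trailing zeros can be sketched as follows. The function name and the choice of nanoseconds since midnight as input are assumptions; the real format_time_of_day may take a different unit or signature.

```rust
/// Render nanoseconds since midnight as HH:MM:SS[.frac],
/// dropping trailing zeros from the fractional part.
fn time_of_day_from_nanos(nanos_since_midnight: u64) -> String {
    const NANOS_PER_SEC: u64 = 1_000_000_000;
    let total_secs = nanos_since_midnight / NANOS_PER_SEC;
    let frac = nanos_since_midnight % NANOS_PER_SEC;
    let (h, m, s) = (total_secs / 3600, (total_secs / 60) % 60, total_secs % 60);
    let mut out = format!("{h:02}:{m:02}:{s:02}");
    if frac != 0 {
        // Zero-pad to 9 digits first so 120 ms prints as ".12", not ".12" of a nanosecond.
        let digits = format!("{frac:09}");
        out.push('.');
        out.push_str(digits.trim_end_matches('0'));
    }
    out
}
```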

Table view gains a real interactive TUI:

 - Row + column cursor, row-number gutter with selection marker,
   horizontal scroll indicators (◄ / ►)
 - Cell, row, and column-info popups (`Enter`, `Shift+Enter`, `i`)
 - Regex search with wrap + per-cell match highlighting; works in
   both tree and table; `*` / `#` search current cell; `/` empty
   accept clears
 - Row selection (`Space` / `a`, `Ctrl+A`, `Ctrl+I`) with
   selection-aware CSV/NDJSON export
 - Clipboard (`y` cell, `Y` row TSV, `c` column, `j` row JSON)
 - Column ergonomics: freeze (`f`), hide/show (`h`/`H`), width
   (`w`/`W`/`=`/`~`), first/last (`^`/`$`), page (Shift+←/→),
   alignment (numeric right-aligns), thousands separators (`,`)
 - In-window sort (`s`); canonical row id preserved in gutter
 - Mouse click positions cursor; mouse wheel scrolls
 - Reload (`R`) preserves cursor/selection/widths/hidden/frozen/sort
   by column NAME so schema drift doesn't dislodge UI state
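
A thousands-separator formatter for integer cells could be as small as the sketch below. It is an illustration only (hypothetical name, integers only); how the viewer handles decimals or non-numeric cells is not shown here.

```rust
/// Format an integer with ',' between groups of three digits.
fn with_thousands(n: i64) -> String {
    // unsigned_abs avoids overflow on i64::MIN.
    let digits = n.unsigned_abs().to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3 + 1);
    if n < 0 {
        out.push('-');
    }
    for (i, ch) in digits.chars().enumerate() {
        // Insert a separator whenever the remaining digit count is a multiple of 3.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}
```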

Tree view additions:

 - Dictionary contents (lazy-loaded per column chunk)
 - INT96 / FixedLenByteArray statistics
 - Column byte offsets (data page, dict page, index page)
 - Pretty-printed JSON in cell popup when value is valid JSON

Sliding-window table buffer:

 - TABLE_BUFFER_ROWS raised from 1k → 10k
 - load_table_window skips pages (page-level skip before decoding)
   when the requested slice starts past a page boundary
 - Window recenters on the cursor on reload

Exports:

 - `e` current row group → CSV
 - `E` full file → CSV (row-group-at-a-time streaming writer)
 - `J` full file → NDJSON
 - Selection-aware: filename includes `.selected` when active, and
   row groups with no selected rows are skipped before decode

Tests: 997 total (up from 120 pre-change). Covers all new encoding
decoders via parquet2's own encoders (round-trip), sanitization,
Unicode-width math, alignment classifier, subsecond timestamp
rendering, time-of-day, CSV/JSON quoting, regex search, thousands
separators, hidden-col navigation, sort comparator, and mouse-click
hit-testing.

Help dialog updated with every new key binding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- fmt + clippy --all-targets clean (15 errors fixed: into_iter, manual find,
  clamp pattern, is_multiple_of, ?-operator, too-many-args → struct params)
- Route 21 new Color:: references through Theme (parquet_gutter_fg,
  parquet_cursor_{gutter_fg,row_bg,cell_bg}, parquet_selected_row_fg,
  parquet_indicator_fg, parquet_popup_cont_fg); both far_manager and
  questdb_dark palettes populated
- Popup frame now uses dh::render_dialog_frame instead of a hand-rolled
  Block + Rect centering + buf.cell_mut clearing
- TableSearch holds a TextInput (not a raw String), so the search prompt
  gets undo/redo, selection and clipboard paste like every other dialog
- Cache compiled regex on TableSearch keyed by query text, and hoist the
  single search_regex() call out of the per-row render loop
- Extract compile_search_regex() helper (was duplicated 3×) and
  pick_unused_export_path() helper (export-filename collision loop
  was duplicated 3×, with a magic 1000 that is now EXPORT_ATTEMPT_LIMIT)
- Stop swallowing unknown Ctrl chords in map_parquet_key: unmatched
  Ctrl+<x> now falls through instead of returning Action::None

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`TableSearch::last_match_row` was written in three places but never read anywhere. Dead since introduction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tree view had a hardcoded hint bar that never reflected search state, so
pressing `/` in Tree mode silently opened the search input without any
visible feedback — subsequent characters were captured but invisible.

Make the Tree hint bar behave like the Table hint: show the live `/query`
prompt while the search input is open, then the status message (for
"no match" / "wrapped"), then the static help text.
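
The priority order described above (live prompt, then status message, then static help) amounts to a small selection function. The name and parameters below are hypothetical and stand in for whatever state render_tree actually reads.

```rust
/// Pick the hint-bar text: an open search prompt wins over a status
/// message, which wins over the static help line.
fn tree_hint(search_input: Option<&str>, status: Option<&str>, help: &str) -> String {
    match (search_input, status) {
        (Some(query), _) => format!("/{query}"),
        (None, Some(msg)) => msg.to_string(),
        (None, None) => help.to_string(),
    }
}
```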

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Ctrl+G "Go to Line" prompt was the last ad-hoc dialog in the app: a
raw `Option<String>` backing field and a hand-rolled `Block::default` +
manual Rect + custom "> " prompt + fake `_` cursor glyph. It looked
nothing like the other dialogs and had no cursor movement, selection,
undo/redo or clipboard paste.

Migrate it to match everything else:
- `goto_line_input: Option<String>` → `Option<TextInput>` — callers read
  `.text` for the accepted value
- Render through `dh::render_dialog_frame` + `dh::render_text_input`, so
  the real terminal cursor is set (via `ui::set_cursor`) instead of a
  fake underscore glyph
- Key dispatch now also allows Left/Right/Home/End/Delete for cursor
  movement; input validation (decimal vs hex digits) stays at the insert
  site so paste can't smuggle in junk
- Remove `Theme::dialog_cursor_fg` — nothing referenced it after the
  fake-cursor glyph went away
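
Validating at the insert site means typed and pasted text go through the same filter. A minimal sketch, assuming invalid characters are dropped rather than the whole paste being rejected (the function name is hypothetical):

```rust
/// Append only the characters that are valid digits for the current mode.
/// Typed characters and pasted text both go through this one function.
fn insert_filtered(buf: &mut String, incoming: &str, is_hex: bool) {
    let radix = if is_hex { 16 } else { 10 };
    buf.extend(incoming.chars().filter(|c| c.is_digit(radix)));
}
```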

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous pass converted the goto-line prompt to TextInput + render_dialog_frame
but kept its minimal "> input" one-liner look. Bring it in line with the
F7 Search dialog: labelled row + input row + separator + centered
{ Go }/[ Cancel ] buttons, with Tab/Shift+Tab focus cycling.

- New `GotoLineDialogState { input, focused, is_hex }` mirrors
  `SearchDialogState` (is_hex captured at open time so rendering and key
  dispatch don't have to re-derive it from `AppMode`)
- New `GotoLineDialogField { Input, ButtonOk, ButtonCancel }` with
  next/prev focus cycling
- Key dispatch routes Tab/Shift+Tab → focus cycle, Left/Right on buttons
  → swap OK/Cancel, and only applies editing keys when the input is
  focused. Enter confirms from Input/OK, cancels from Cancel
- Rendering uses the same dh primitives as search_dialog.rs:
  render_line (label), render_text_input (input), render_separator,
  render_buttons
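
The focus cycle can be sketched as below. The enum and its variants are named in this commit; the method bodies are an assumption about the obvious wrap-around order.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GotoLineDialogField {
    Input,
    ButtonOk,
    ButtonCancel,
}

impl GotoLineDialogField {
    /// Tab: Input -> OK -> Cancel -> Input.
    fn next(self) -> Self {
        match self {
            Self::Input => Self::ButtonOk,
            Self::ButtonOk => Self::ButtonCancel,
            Self::ButtonCancel => Self::Input,
        }
    }

    /// Shift+Tab: the same cycle in reverse.
    fn prev(self) -> Self {
        match self {
            Self::Input => Self::ButtonCancel,
            Self::ButtonOk => Self::Input,
            Self::ButtonCancel => Self::ButtonOk,
        }
    }
}
```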

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two classes of lints that rustc 1.95's clippy flags but 1.90 didn't:

- `clippy::collapsible_match` — nested `if` inside a `match` arm was the
  idiomatic way in 1.90. In 1.95 it must be moved into a guard clause on
  the arm pattern. Four hits in `toggle_expand()` / `expand()` for
  `NodeId::RowGroupData` and `NodeId::ColumnDict`.
- `clippy::unnecessary_min_or_max` — `saturating_sub(1).max(0)` on a
  `usize` is provably a no-op since `usize` is never < 0. Drop the
  `.max(0)`.
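
Both fixes in miniature. The `NodeId` variants are named above, but their payloads, the `loaded` slice, and both function names are hypothetical and exist only to make the sketch compile.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeId {
    RowGroupData(usize),
    ColumnDict(usize),
    Other,
}

/// collapsible_match: the condition sits in a guard on the arm instead of
/// an `if` nested inside the arm body.
fn pending_load(node: NodeId, loaded: &[bool]) -> Option<usize> {
    match node {
        NodeId::RowGroupData(rg) if loaded.get(rg) == Some(&false) => Some(rg),
        NodeId::ColumnDict(col) if loaded.get(col) == Some(&false) => Some(col),
        _ => None,
    }
}

/// unnecessary_min_or_max: saturating_sub already stops at 0 on usize,
/// so a trailing `.max(0)` adds nothing.
fn prev_index(i: usize) -> usize {
    i.saturating_sub(1)
}
```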

Root cause of the local-vs-CI mismatch: local toolchain was pinned to
1.90 while CI uses `dtolnay/rust-toolchain@stable` (1.95). The local
toolchain now follows stable via `rustup override set stable`.

Also CLAUDE.md: drop `-D warnings` from the recommended local clippy
command so warnings surface without aborting on the first hit. The hard
rule that every warning still has to go before the PR lands (CI enforces
`-D warnings`) is called out explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`dialog_input_bg` is `selection` (Rgb(74, 50, 100), a muted dark purple);
`input_selection_bg` was Rgb(74, 56, 96), which differs by at most 6 per
channel (0, +6, -4) and reads as the identical color to the eye. The
selected range was therefore invisible inside any focused TextInput
(dialogs, search, goto-line).

Point `input_selection_bg` at `magenta` (the QuestDB brand color already
used for `highlight_bg`) — it contrasts clearly against the input bg and
the panel bg, and `text`/White stay readable on it. Set fg to White to
lock in max contrast instead of depending on `text`'s luminance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Match the F7 Search dialog's spacing: leave y=3 blank so the separator
sits at y=4 (one row below the input field) instead of flush against it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>