
Draft full data type treeviewitems #3311

Closed
James Robinson (jlrobins) wants to merge 47 commits into main from
draft_full_data_type_treeviewitems

Conversation


@jlrobins James Robinson (jlrobins) commented Feb 27, 2026

Draft PR for folks to try out expandable compound data types in the Flink relations view (real-world data: spotify-listening-data).

This is the in-progress clauding of the UI phase on top of the parser created in #3305

I have not finished fine-tuning this code yet. It does seem to do what I want, however.

- Create simplified FlinkType model for UI purposes
  - Replace complex hierarchy with simple, focused interfaces
  - Add FlinkTypeKind enum (SCALAR, ROW, MAP, ARRAY, MULTISET)
  - Use CompoundFlinkType for ROW and MAP types with members array

- Implement recursive descent parser with peek()/consume() operations
  - Parse all Flink type variants: scalars, parameterized, arrays, multisets
  - Support ROW types with backtick-quoted field names and optional comments
  - Support MAP types with exactly 2 members (key, value)
  - Minimal regex use - character-by-character parsing for clarity
  - Proper handling of nullability markers (NOT NULL, NULL)
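The recursive-descent shape described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual parser: the type alias and the naive bracket-slicing are hypothetical, and real input (trailing NOT NULL markers, nested commas) needs the full cursor-based parser.

```typescript
// Hypothetical sketch of recursive descent over container types.
// Leaf nodes stay scalar; ARRAY/MULTISET recurse into their element type.
type FlinkTypeSketch =
  | { kind: "SCALAR"; dataType: string }
  | { kind: "ARRAY" | "MULTISET"; members: FlinkTypeSketch[] };

function parseType(input: string): FlinkTypeSketch {
  if (input.startsWith("ARRAY<") && input.endsWith(">")) {
    // Recurse into the element type between the angle brackets.
    return { kind: "ARRAY", members: [parseType(input.slice(6, -1))] };
  }
  if (input.startsWith("MULTISET<") && input.endsWith(">")) {
    return { kind: "MULTISET", members: [parseType(input.slice(9, -1))] };
  }
  // Anything else is treated as a scalar leaf in this sketch.
  return { kind: "SCALAR", dataType: input };
}
```

For example, `parseType("ARRAY<MULTISET<INT>>")` yields nested container nodes whose leaf member is a scalar `INT`.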
Store the full element type structure in a `members` array instead of
flattening the information. This allows full type hierarchy preservation
for nested containers (ARRAY<ROW<>>, MULTISET<ARRAY<>>, etc.).

Changes:
- Move `members?: FlinkType[]` from CompoundFlinkType to base FlinkType
- Remove deprecated `areMembersNullable` field
- ARRAY/MULTISET now return CompoundFlinkType with single-element members
- Update type model docstrings to clarify ARRAY/MULTISET participation
- Enhance tests to verify leaf nodes are scalar (NOT compound) and that
  nested container types can be cleanly discriminated

Test coverage:
- ARRAY<ROW<>>, ARRAY<MAP<>>, ARRAY<MULTISET<>>
- MULTISET<ARRAY<>>, MULTISET<ROW<>>
- ARRAY<ARRAY<>> (doubly nested containers)
- All 62 parser tests passing

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
… unterminated comments

Addresses Copilot PR review comments:
- Add validation that a closing backtick exists when parsing backtick-quoted ROW field names;
  throws a clear error instead of silently consuming the wrong character
- Add validation in parseComment() to throw error if EOF reached without closing quote,
  preventing truncated comments

These changes improve error reporting for malformed input and prevent confusing downstream
errors from propagating.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Addresses two medium priority issues:

1. Fix O(n²) performance in consumeWhile() by using slice() instead of
   string concatenation in loop. Previously, concatenating characters one
   at a time caused quadratic time complexity for large inputs.

2. Add negative offset bounds checking in peekAt(). The bounds check
   now properly handles negative offsets by checking `idx < 0` in
   addition to the upper bound check.
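The two fixes above can be sketched side by side. These are illustrative free-function versions, not the PR's actual methods; the contrast between per-character concatenation and a single slice() is the point.

```typescript
// Quadratic shape: builds the result one concatenation at a time,
// so each += may copy the whole accumulated string.
function consumeWhileConcat(
  input: string, pos: number, pred: (c: string) => boolean,
): [string, number] {
  let out = "";
  while (pos < input.length && pred(input[pos])) {
    out += input[pos];
    pos++;
  }
  return [out, pos];
}

// Linear shape: find the end first, then take a single slice().
function consumeWhileSlice(
  input: string, pos: number, pred: (c: string) => boolean,
): [string, number] {
  const start = pos;
  while (pos < input.length && pred(input[pos])) pos++;
  return [input.slice(start, pos), pos];
}

// Bounds-checked peek that rejects negative offsets as well as overruns.
function peekAt(input: string, pos: number, offset: number): string | undefined {
  const idx = pos + offset;
  if (idx < 0 || idx >= input.length) return undefined;
  return input[idx];
}
```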

Also adds comprehensive test suite (17 tests) covering:
- Negative offset edge cases (before input start)
- consumeWhile() correctness and performance (including 10000-char stress test)
- Position advancement after consuming
- Multiple sequential consumeWhile() calls
- Edge cases (empty input, trimming behavior)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Addresses low priority feedback from code review:

1. Fix docstring typo in flinkTypes.ts: 'TIME STAMP WITH TIME ZONE' →
   'TIMESTAMP WITH TIME ZONE' for consistency with Flink SQL standards

2. Clean up tryConsume() lint rule disabling by replacing unused-var
   disabling for-of loop with simpler for-loop over keyword.length

3. Remove reference to non-existent REAL_WORLD_FULL_DATA_TYPE_EXAMPLES.md
   file in test describe block name (tests remain valid)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…e test coverage

- Updated JSDoc comment to accurately describe what the function validates (checks for non-empty members array, validates SCALAR+members is invalid)
- Added 7 new test cases covering all compound type kinds (ARRAY, ROW, MAP, MULTISET)
- Added tests for scalar types without members and edge cases (empty members array)
- Added test for nested compound type structures (ARRAY<ROW<>>)
- All 84 flinkTypeParser tests passing, 3055 total unit tests passing

This addresses Copilot comment ID 2861056430 by ensuring the documentation accurately reflects the implementation's actual behavior.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Fixes TypeScript compilation error TS2304: Cannot find name 'FlinkType'
in src/parsers/flinkTypeParser.test.ts due to using FlinkType in type
annotations for the new test cases.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…test -t

Add support for running multiple test suites in a single pass by:
- Detecting pipe-separated patterns (e.g., "pattern1|pattern2") and converting to regex OR syntax
- Detecting regex syntax (brackets, parentheses, wildcards, etc.)
- Using Mocha's grep() method for regex patterns (via GREP env var)
- Preserving FGREP behavior for simple substring matching (backward compatible)

Allows flexible test filtering:
  npx gulp test -t "parser|builder"          # Multiple test suites
  npx gulp test -t "flinkType.*test$"        # Regex patterns
  npx gulp test -t "async|integration"       # Various patterns
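The routing described above might look like this sketch. The GREP/FGREP split comes from the commit message; the regex-detection heuristic and function names are illustrative, not the actual Gulpfile code.

```typescript
// Route a -t pattern to regex (GREP) or substring (FGREP) filtering.
function looksLikeRegex(pattern: string): boolean {
  // Pipes, brackets, parens, wildcards, and anchors suggest a regex.
  return /[|\[\]().*+?^$]/.test(pattern);
}

function toTestFilter(pattern: string): { grep?: RegExp; fgrep?: string } {
  if (!looksLikeRegex(pattern)) {
    // Simple substring match: preserve the old FGREP behavior.
    return { fgrep: pattern };
  }
  try {
    return { grep: new RegExp(pattern) };
  } catch {
    // Invalid regex: fall back to substring matching rather than crash.
    return { fgrep: pattern };
  }
}
```

With this shape, `"parser|builder"` becomes a regex matching either suite, while a plain name like `"flinkType"` keeps the original fixed-string behavior.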

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Fix GREP type safety in testing.ts: convert env var to RegExp with error handling
- Update comment wording for FGREP to clarify fixed-string/substring behavior
- Remove duplicate old comment in test file (line 489)
- Update stale Playlist test comment that claimed parser limitations no longer exist
- All test comments now accurately reflect current parser capabilities

All 129 parser tests passing.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Remove unnecessary escape characters in regex pattern (Gulpfile.js:740)
- Remove unused catch parameter (src/testing.ts:41)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Remove logger.warn() calls added for debugging type parsing. The debugging
was helpful during development but is no longer needed now that the unified
type formatter is working correctly.

Also remove unused Logger import from flinkRelation.ts.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
FlinkTypeNode instances are ephemeral - they are recreated each time a node is
expanded in the TreeView. Using fixed IDs calculated from the type structure
caused VS Code to report duplicate IDs when the same node was expanded multiple
times or when multiple columns had similar nested structures.

Solution: Don't set item.id on FlinkTypeNode. This allows VS Code to use object
identity instead, which correctly identifies new instances as unique elements.

This fixes the issue where expanding ROW or MAP columns would fail with a
"duplicate ID" error, preventing further expansion of nested types.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…TISET

When ARRAY<ROW> or MULTISET<MAP> are expanded, we skip the intermediate
container node for better UX and show only the element's fields directly.
However, this created duplicate ID errors because the container context
was lost in the ID calculation.

Solution: Include ARRAY/MULTISET markers in the ID path so that the same
field names from different arrays get unique IDs:
- Before: the container segment was dropped, so 'id' fields under different array columns could collide
- After: artists:[element]:id and items:[element]:id (container context preserved, IDs unique)

The ID now tracks the full parent chain including ARRAY/MULTISET markers,
ensuring uniqueness even when intermediate nodes are skipped in the UI.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…tructures

Add a new test that validates ID uniqueness when multiple columns have
the same type structure (e.g., multiple ARRAY<ROW<id, name>> columns).

This test verifies:
1. All generated node IDs are unique within the tree view
2. IDs include the column context (artists vs metadata)
3. IDs include ARRAY markers to distinguish container instances
4. Different columns with identical structures have different IDs

This ensures that when expanding ARRAY/ROW columns with similar field names,
each field gets a globally unique ID in the tree view, preventing VS Code's
duplicate ID errors.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…ent nodes

When a column has type ARRAY<ROW> or MULTISET<MAP>, we skip the intermediate
container node in the UI for cleaner display. However, this caused ID collisions
because the element's children had no parent node to track the container context.

Solution: Create a synthetic (non-displayed) ARRAY/MULTISET node as the parentNode
for the element's children. This ensures the ID hierarchy includes the container
marker even though the container node doesn't appear as a tree item.

Now when expanding column "artists" with type ARRAY<ROW<id, name>>:
- Children get IDs like: "spotify-listening-data.artists:[element]:id"
- The synthetic ARRAY node establishes the ID path without appearing in the tree
- Multiple columns with same structure get unique IDs

This fixes the "Element with id X is already registered" error when expanding
top-level ARRAY/MULTISET columns.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Changes:
1. FlinkTypeNode.getDescription() now uses formatFlinkTypeForDisplay() instead of formatSqlType()
   - This includes array notation (e.g., "VARCHAR[]", "ROW[]") in tree item descriptions
   - Consistent with label display for better user clarity
   - Example: ARRAY<VARCHAR> description now shows "VARCHAR[]" instead of just "VARCHAR"

2. Added comprehensive tests for global ID uniqueness in flinkRelation.test.ts
   - Test: "generates globally unique IDs across multiple ARRAY<ROW> columns with same field names"
   - Verifies that IDs include column context and ARRAY markers even for identical structures
   - Proves synthetic parent node pattern maintains proper ID hierarchy

3. Added focused tests for ARRAY/MULTISET descriptions in flinkTypeNode.test.ts
   - Tests for ARRAY<scalar> descriptions with and without NOT NULL
   - Tests for MULTISET<scalar> descriptions
   - Tests for ARRAY<ROW> descriptions
   - Ensures array notation is properly displayed in tree item descriptions

Test Results:
- 3188 tests passing (up from 3184)
- All new tests pass
- No regressions

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Changes:
1. Improved error message in FlinkRelationColumn.getParsedType()
   - Now includes table name ('relationName'), column name, and data type
   - Provides clear context for debugging parse failures
   - Full stack trace captured via logError() utility
   - Example: "Failed to parse Flink type for column 'bad_column' in table 'test_table'"

2. Added tests for parse error handling in flinkRelation.test.ts
   - "returns null and logs error for unparseable type strings"
     * Verifies error is caught and null is returned
     * Verifies _parseError flag prevents re-attempting parse
   - "successfully parses valid type strings and caches result"
     * Verifies successful parsing works
     * Verifies results are cached (same object reference on second call)

Test Results:
- 3190 tests passing (up from 3188)
- Clear error logging provides debugging trail for parse failures
- Error messages include full context: table, column, and data type

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Simplified the icon implementation to use IconNames.PLACEHOLDER for all type kinds.
This provides a consistent, neutral icon for all TreeView items while we finalize
the icon design strategy.

Changes:
- Replaced ThemeIcon with IconNames.PLACEHOLDER in FlinkTypeNode.getIcon()
- Changed return type from ThemeIcon to string (IconNames)
- Added import for IconNames enum
- All 54 FlinkTypeNode tests pass

This is a temporary simplification pending future icon design work.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Implemented unified icon strategy that uses a single function for determining
icons for both table columns and nested type nodes.

Changes:
1. Added new IconNames entries:
   - FLINK_TYPE_ROW = "symbol-struct" (for ROW types)
   - FLINK_TYPE_ARRAY = "symbol-array" (for ARRAY and MULTISET types)

2. Created getIconForFlinkType() utility function in src/utils/flinkTypes.ts:
   - Centralizes icon logic for all Flink types
   - ROW types use symbol-struct
   - ARRAY/MULTISET types use symbol-array
   - All other types (scalars, MAP) use default column icon (symbol-constant)

3. Updated FlinkRelationColumn.getTreeItem():
   - Now uses getIconForFlinkType() to determine icon based on parsed type
   - Maintains column icon as fallback when parsing fails

4. Updated FlinkTypeNode.getIcon():
   - Delegates to same getIconForFlinkType() function
   - Ensures consistent icons across the entire type hierarchy

Benefits:
- Single source of truth for icon resolution
- Consistent icon experience for columns and nested types
- Special icon treatment for structurally important types (ROW, ARRAY)
- Unified codebase approach

Test Results:
- 3190 tests passing (no regressions)
- All FlinkTypeNode tests pass
- All FlinkRelationColumn tests pass

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Changed TreeView node ID scheme from mixed separators and container markers
to a cleaner, more semantic approach using only field names and '.' separators.

Changes:
1. FlinkTypeNode.id getter refactored:
   - Removed [element] and {element} markers for ARRAY/MULTISET containers
   - Use only field names for all path segments
   - Use '.' as the sole separator (instead of mixing ':' and '.')
   - New format: "table.columnName.fieldName.nestedFieldName..."
   - Example: "spotify-listening-data.track.artists.uri" (was: "spotify-listening-data.track:[element]:artists:uri")

2. Benefits of new scheme:
   - More semantic: uses actual field names instead of synthetic markers
   - Simpler format: single separator throughout
   - Clearer hierarchy: easy to parse and understand the path
   - Still globally unique: field names provide necessary disambiguation

3. Updated all related tests:
   - Tests for ID generation now expect '.' separators and field names
   - Removed assertions checking for [element]/{element} markers
   - Updated ID format expectations in all test cases

Test Results:
- 3190 tests passing (no regressions)
- All FlinkTypeNode tests pass with new ID scheme
- All FlinkRelationColumn tests pass
- Global ID uniqueness maintained across multiple ARRAY/MULTISET columns

Example IDs:
- Scalar field in ROW: "table.columnName.fieldName"
- Nested ROW: "table.columnName.outerField.innerField"
- ARRAY<ROW> element: "table.arrayColumnName.fieldName" (no [element] marker)
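The new scheme amounts to joining field names up the parentNode chain with '.'. A minimal sketch, with illustrative interface and function names (the actual getter lives on FlinkTypeNode):

```typescript
// Walk the parentNode chain and join segments with '.' to form the ID.
interface TypeNodeLike {
  fieldName: string;
  parentNode: TypeNodeLike | null;
}

function nodeId(tableId: string, node: TypeNodeLike): string {
  const segments: string[] = [];
  for (let cur: TypeNodeLike | null = node; cur !== null; cur = cur.parentNode) {
    segments.unshift(cur.fieldName); // prepend, so the root comes first
  }
  return [tableId, ...segments].join(".");
}
```

For a ROW field "uri" under "artists" inside column "track", this yields "spotify-listening-data.track.artists.uri", matching the example above.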

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
testFilter is never reassigned after initialization, so it should be
declared as const rather than let.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Changes:
- Created BaseFlinkType interface with common fields (kind, dataType, isFieldNullable, fieldName, comment)
- ScalarFlinkType extends BaseFlinkType with kind=SCALAR, no members field
- CompoundFlinkType extends BaseFlinkType with kind=ROW|ARRAY|MULTISET|MAP and required members
- FlinkType is now a union type: ScalarFlinkType | CompoundFlinkType
- Updated isCompoundFlinkType() type guard to simple kind check (no validation throws)

Benefits:
✅ TypeScript prevents accessing .members on scalar types (compile-time safety)
✅ Non-optional members field guarantees non-empty for compounds (via construction)
✅ Type structure self-documents invariants
✅ DRY: common fields defined once in BaseFlinkType
✅ Zero runtime overhead
✅ Extractable for use in other TypeScript codebases

Updated tests:
- Removed tests that expected isCompoundFlinkType() to throw on invalid states
- Added tests verifying type guard correctly discriminates all type kinds

All 92 parser tests passing, TypeScript strict mode satisfied.
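The discriminated union described in this commit might look like the sketch below. Field names are taken from the commit message; the exact definitions in the PR may differ.

```typescript
// Discriminated union: the `kind` field statically separates scalars
// (no members) from compounds (required members).
enum FlinkTypeKind {
  SCALAR = "SCALAR",
  ROW = "ROW",
  MAP = "MAP",
  ARRAY = "ARRAY",
  MULTISET = "MULTISET",
}

interface BaseFlinkType {
  kind: FlinkTypeKind;
  dataType: string;
  isFieldNullable: boolean;
  fieldName?: string;
  comment?: string;
}

interface ScalarFlinkType extends BaseFlinkType {
  kind: FlinkTypeKind.SCALAR;
  // No members field: accessing .members on a scalar is a compile error.
}

interface CompoundFlinkType extends BaseFlinkType {
  kind: Exclude<FlinkTypeKind, FlinkTypeKind.SCALAR>;
  members: FlinkType[]; // required, guaranteed present for compounds
}

type FlinkType = ScalarFlinkType | CompoundFlinkType;

// Simple kind check, no validation throws.
function isCompoundFlinkType(t: FlinkType): t is CompoundFlinkType {
  return t.kind !== FlinkTypeKind.SCALAR;
}
```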

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
… tests

- Use symbol-package icon for MULTISET types (distinct from ARRAY's symbol-array)
- symbol-package semantically represents grouped collections without ordering constraints
- Updated getIconForFlinkType() to return separate icons for ARRAY vs MULTISET
- Added 7 comprehensive unit tests for getIconForFlinkType() covering all type kinds
- Verified ARRAY and MULTISET return different icons (symbol-array vs symbol-package)
- Fixed unused FlinkType import in flinkTypeNode.test.ts
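After this commit the icon mapping resolves as sketched below. The codicon names come from the commit messages; the function shape is illustrative (the real code maps through the IconNames enum).

```typescript
// Map a Flink type kind to a codicon name, per the commits above.
type IconName = "symbol-struct" | "symbol-array" | "symbol-package" | "symbol-constant";

function getIconForFlinkTypeKind(
  kind: "SCALAR" | "ROW" | "MAP" | "ARRAY" | "MULTISET",
): IconName {
  switch (kind) {
    case "ROW":
      return "symbol-struct"; // structured record
    case "ARRAY":
      return "symbol-array"; // ordered collection
    case "MULTISET":
      return "symbol-package"; // unordered grouped collection
    default:
      return "symbol-constant"; // scalars and MAP use the default column icon
  }
}
```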

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Added 20 tests covering all type combinations with isExpandable() verification
- Tests prove children exist if and only if isExpandable() is true
- Organized by type categories:
  * Scalar types (non-expandable): INT, VARCHAR, DECIMAL
  * ROW types (expandable): simple, many fields, nested ROW
  * MAP types (expandable): multiple key/value combinations
  * ARRAY with scalar elements (non-expandable): INT, VARCHAR, DECIMAL
  * ARRAY with compound elements (expandable): ROW, MAP, nested ARRAY<ARRAY<ROW>>
  * MULTISET with scalar elements (non-expandable): INT, VARCHAR
  * MULTISET with compound elements (expandable): ROW, MAP
- Included existing ID uniqueness tests for ARRAY<ROW> and MULTISET<ROW>
- Helper function assertExpandableConsistency() ensures correctness across all cases

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
The depth property was never actually used:
- Not read by any methods (ID generation uses parentNode chain traversal)
- Not involved in isExpandable() logic (uses type kind instead)
- Only assigned in constructor and passed to children
- Vestigial from earlier implementation design

Removed:
- depth field declaration and JSDoc
- depth parameter from constructor and FlinkRelationColumn.getTypeChildren()
- All depth assertions from unit tests
- Updated test names to reflect removal of depth parameter

All 54 FlinkTypeNode tests pass.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…inkRelationColumn

Replace FlinkRelationColumn object reference with parentColumnId string:
- Changed FlinkTypeNode.parentColumn to parentColumnId: string | null
- Propagate parentColumnId (not parentColumn) through child creation
- Remove FlinkRelationColumn import from flinkTypeNode.ts
- Updated all tests to use parentColumnId instead of parentColumn

Benefits:
- Eliminates bidirectional coupling between FlinkRelationColumn and FlinkTypeNode
- Reduces memory footprint (string vs object reference)
- Clearer one-way dependency direction: Column → TypeNode
- TreeView item IDs remain identical (no behavior change)

Updated FlinkDatabaseViewProvider.getParent():
- Look up parent column by ID from relationsContainer when needed
- Maintains tree hierarchy without object references

All 54 FlinkTypeNode tests pass.
All 66 FlinkRelation tests pass.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Changes:
- getParsedType() now returns FlinkType (never null)
- On parse failure, returns synthetic SCALAR type with full dataType string
- Removed _parseError flag (no longer needed, always cache result)
- Simplified simpleDataType() getter - no null check or fallback needed
- Removed null checks in isExpandable() and getTypeChildren()
- Removed ternary in getTreeItem() icon selection

Rationale:
- If parsing fails in production (shouldn't happen), it's still visible in UI
  as the SCALAR type with the full unparseable string
- Eliminates defensive null checks from all callers
- Cleaner API: always returns a value, no special error state

Behavior:
- TreeView item IDs remain identical (no user-facing changes)
- Column expands/displays correctly whether type parses or not
- Parse errors still logged for debugging

All 137 related tests pass.
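The never-null accessor described above can be sketched like this. The class, cache field, and injected parser are illustrative stand-ins for FlinkRelationColumn's actual members.

```typescript
// Parse once, cache the result, and fall back to a synthetic SCALAR
// carrying the raw type string when parsing fails.
type ParsedType = { kind: string; dataType: string };

class ColumnSketch {
  private cachedType: ParsedType | undefined;

  constructor(
    private readonly dataType: string,
    private readonly parse: (s: string) => ParsedType,
  ) {}

  getParsedType(): ParsedType {
    if (this.cachedType === undefined) {
      try {
        this.cachedType = this.parse(this.dataType);
      } catch {
        // Parse failure: the UI still shows the full unparseable string.
        this.cachedType = { kind: "SCALAR", dataType: this.dataType };
      }
    }
    return this.cachedType; // always cached, never null
  }
}
```

Callers never see null, so the defensive checks in isExpandable() and getTreeItem() disappear.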

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Remove redundant compound type checks from getTypeChildren() methods
- Both methods now rely solely on isExpandable property
- Simplify isExpandable() final clause to directly return isCompoundFlinkType(members[0])
- Since ARRAY/MULTISET is the only remaining branch after ROW/MAP check,
  no need for kind check or ternary operator
- Add explicit type annotations to arrow function parameters for type safety
- Extract { kind, members } destructuring in getTypeChildren for clarity
- Replace all simpleDataType property usages with direct formatFlinkTypeForDisplay calls
- Remove simpleDataType getter entirely (eliminated redundant wrapper)

All 227 tests passing, TypeScript compilation succeeds.
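The simplified expandability rule reads as follows in sketch form (illustrative types; compounds are assumed to have non-empty members by construction):

```typescript
// ROW/MAP are always expandable; ARRAY/MULTISET only when their single
// element is itself compound.
type TSketch =
  | { kind: "SCALAR" }
  | { kind: "ROW" | "MAP" | "ARRAY" | "MULTISET"; members: TSketch[] };

function isCompound(t: TSketch): t is Extract<TSketch, { members: TSketch[] }> {
  return t.kind !== "SCALAR";
}

function isExpandable(t: TSketch): boolean {
  if (!isCompound(t)) return false;
  if (t.kind === "ROW" || t.kind === "MAP") return true;
  // Only ARRAY/MULTISET remain: expandable iff the element is compound.
  return isCompound(t.members[0]);
}
```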

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Remove isCompoundFlinkType() checks since isExpandable() already guards against scalars
- isExpandable returning true guarantees parsed type is compound
- Use type assertions (as CompoundFlinkType) instead of runtime checks
- Simplifies both FlinkRelationColumn.getTypeChildren() and FlinkTypeNode.getChildren()
- Update docstrings to state accurately that when isExpandable() returns true the parsed
  type is guaranteed compound, and when it returns false these methods return an empty array

All 227 tests passing, TypeScript compilation succeeds.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Add get iconName(): IconNames property to FlinkTypeNode
- Implementation mirrors logic from getIconForFlinkType utility function
- FlinkTypeNode now determines its own icon, parallel to how FlinkRelation does
- Update FlinkTypeNode.getTreeItem() to use this.iconName instead of utility function
- Move all getIconForFlinkType tests to FlinkTypeNode.iconName tests
- Remove getIconForFlinkType tests from flinkTypes.test.ts
- Remove unused getIconForFlinkType function import from flinkTypes.test.ts
- Keep getIconForFlinkType utility for use by FlinkRelationColumn

All 227 tests passing, TypeScript compilation succeeds.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Add get formattedType(): string property to FlinkTypeNode
- Returns formatted display string for the type using formatFlinkTypeForDisplay()
- Provides convenient accessor for display contexts within FlinkTypeNode
- FlinkRelationColumn continues using utility function directly

All 227 tests passing, TypeScript compilation succeeds.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…pe) method

- Change treeItemDescription from getter to private method
- Takes parsed FlinkType as parameter to avoid redundant getParsedType() calls
- Called from getTreeItem() which already parses the type for icon determination
- Reduces redundant parsing when building tree items
- Update all test references to call method with getParsedType()

All 227 tests passing, TypeScript compilation succeeds.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- tooltipLine() method was never called anywhere in the codebase
- Remove method and its test from flinkRelation.ts and flinkRelation.test.ts
- Reduces code surface

All 226 tests passing, TypeScript compilation succeeds.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Replace hardcoded IconNames.FLINK_FUNCTION with getIconForFlinkType(parsed)
- Tooltip header now shows appropriate icon based on column type (ROW, ARRAY, MULTISET, etc.)
- Remove TODO comment about needing column-specific icons - now implemented
- Parse type once in getToolTip() for consistent use

All 226 tests passing, TypeScript compilation succeeds.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Add static getIconForType(flinkType: FlinkType): IconNames to FlinkTypeNode
- iconName getter now delegates to static method
- FlinkRelationColumn.getTreeItem() uses static method directly
- Pass icon name to getToolTip() to avoid redundant determination
- Update getToolTip() signature to accept iconName parameter
- Remove throwaway FlinkTypeNode instantiation pattern

This approach keeps icon logic with FlinkTypeNode where it semantically belongs,
eliminates duplication, and provides efficient access for callers with a FlinkType
but no FlinkTypeNode instance. Reduces redundant parsing in getTreeItem().

All 226 tests passing, TypeScript compilation succeeds.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Base automatically changed from flink_simplified_datatype_parser to main March 3, 2026 15:49
@jlrobins James Robinson (jlrobins) deleted the draft_full_data_type_treeviewitems branch March 3, 2026 21:38