@fightBoxing

Purpose

Linked issue:

This PR enhances the test coverage for the Lance file format implementation in Paimon. The changes include:

  1. Fix compilation errors: Corrected the generic type parameters used with the FileWriter interface

  2. Comprehensive test coverage: Added 14 new test methods covering:

    • All supported numeric types (TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL)
    • All string and binary types (CHAR, VARCHAR, BINARY, VARBINARY, BYTES)
    • Time-related types (DATE, TIME, TIMESTAMP with various precisions)
    • Complex types (ARRAY, MULTISET, VARIANT)
    • Nested RowType structures
    • Unsupported types validation (MAP, TIMESTAMP_WITH_LOCAL_TIME_ZONE)
    • Configuration tests for batch size and memory settings
    • Projection scenarios for column pruning
    • Edge cases (empty RowType, single field types, mixed array types)
  3. Documentation improvements: Added descriptive comments for all test methods to improve code readability and maintainability

Tests

  • LanceFileFormatTest unit tests in paimon-lance/src/test/java/org/apache/paimon/format/lance/LanceFileFormatTest.java

The test methods and what each one verifies:

  • testCreateReaderFactory - Basic reader factory creation
  • testCreateWriterFactory - Basic writer factory creation
  • testValidateDataFields_UnsupportedType_Map - Validation rejects MAP type
  • testValidateDataFields_UnsupportedType_LocalZonedTimestamp - Validation rejects TIMESTAMP_WITH_LOCAL_TIME_ZONE
  • testValidateDataFields_SupportedTypes_Basic - Basic supported types validation
  • testValidateDataFields_AllNumericTypes - All numeric types validation
  • testValidateDataFields_AllStringTypes - All string/binary types validation
  • testValidateDataFields_TimeTypes - Time types with different precisions
  • testValidateDataFields_ComplexTypes - Arrays, multisets, and variants
  • testValidateDataFields_NestedRowType - Nested structures
  • testReaderFactory_WithProjectedTypes - Column pruning scenarios
  • testReaderFactory_BatchSizeConfiguration - Batch size configuration
  • testWriterFactory_BatchSizeConfiguration - Writer batch and memory configuration
  • testValidateDataFields_EmptyRowType - Edge case: empty RowType
  • testValidateDataFields_SingleFieldTypes - Single field type scenarios
  • testValidateDataFields_MixedArrayTypes - Mixed array element types
  • testValidateDataFields_VariantType - VARIANT type support
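
To illustrate the shape of the supported/unsupported type checks listed above, here is a minimal sketch. It does not use Paimon's classes: `TypeRoot` and `TypeValidator` are hypothetical stand-ins, and throwing `UnsupportedOperationException` is an assumption, since this PR does not state which exception the Lance format raises.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical validator standing in for the format's field validation.
final class TypeValidator {
    // The two types the PR description lists as unsupported.
    private static final Set<TypeRoot> UNSUPPORTED =
            EnumSet.of(TypeRoot.MAP, TypeRoot.TIMESTAMP_WITH_LOCAL_TIME_ZONE);

    // Throws on the first unsupported field; an empty field list passes.
    static void validate(List<TypeRoot> fields) {
        for (TypeRoot field : fields) {
            if (UNSUPPORTED.contains(field)) {
                // Assumed exception type, chosen for illustration only.
                throw new UnsupportedOperationException("Unsupported type: " + field);
            }
        }
    }

    // Convenience for tests: true if validation rejects the given fields.
    static boolean rejects(List<TypeRoot> fields) {
        try {
            validate(fields);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}

// Hypothetical, reduced set of type roots; not Paimon's own enum.
enum TypeRoot { INT, VARCHAR, ARRAY, MAP, TIMESTAMP_WITH_LOCAL_TIME_ZONE }
```

In this sketch, `TypeValidator.rejects(List.of(TypeRoot.INT, TypeRoot.MAP))` reports a rejection, while a list containing only `INT`, `VARCHAR`, and `ARRAY` passes, mirroring the split between the `UnsupportedType` and `SupportedTypes` tests.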

API and Format

No. This change only affects test code and does not modify any public APIs or storage formats.

Documentation

No. This is a test enhancement with no new features introduced.

JingsongLi and others added 30 commits September 24, 2025 16:22
JingsongLi and others added 29 commits October 30, 2025 17:44
1. LanceReader: Fix resource leak in constructor
   - Ensure rootAllocator and other resources are properly closed when an exception occurs
   - Add closeQuietly() method to safely close resources and log warnings
   - Catch RuntimeException so that more failure paths release resources
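
The constructor cleanup described in item 1 could look roughly like the sketch below. This is not the LanceReader code: `SketchReader`, `Tracked`, and the `openStream`/`readHeader` parameters are hypothetical, and `java.util.logging` is used only to keep the example free of dependencies.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical reader that owns two resources acquired in its constructor.
final class SketchReader implements AutoCloseable {
    private static final Logger LOG = Logger.getLogger(SketchReader.class.getName());

    private final AutoCloseable allocator;
    private final AutoCloseable stream;

    SketchReader(AutoCloseable allocator, Supplier<AutoCloseable> openStream, Runnable readHeader) {
        AutoCloseable opened = null;
        try {
            opened = openStream.get(); // may throw
            readHeader.run();          // may throw after the stream is open
        } catch (RuntimeException e) {
            // Release everything acquired so far, then rethrow the original failure.
            closeQuietly(opened);
            closeQuietly(allocator);
            throw e;
        }
        this.allocator = allocator;
        this.stream = opened;
    }

    // Closes a resource, logging instead of propagating any close failure.
    static void closeQuietly(AutoCloseable resource) {
        if (resource == null) {
            return;
        }
        try {
            resource.close();
        } catch (Exception e) {
            LOG.log(Level.WARNING, "Failed to close resource", e);
        }
    }

    @Override
    public void close() {
        closeQuietly(stream);
        closeQuietly(allocator);
    }
}

// Hypothetical resource that records whether it was closed.
final class Tracked implements AutoCloseable {
    boolean closed;

    @Override
    public void close() {
        closed = true;
    }
}
```

If `readHeader` throws, both the opened stream and the allocator are closed before the exception reaches the caller, so a failed construction leaves nothing open.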

2. LanceRecordsWriter: Improve performance monitoring
   - Replace System.currentTimeMillis() with System.nanoTime() for higher precision
   - Add Arrow conversion time tracking (arrowConvertCostNanos)
   - Add flush operation count tracking (flushCount)
   - Change log level from INFO to DEBUG to avoid excessive logs in production
   - Use parameterized logging format instead of string concatenation
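
As a rough illustration of the monitoring changes in item 2, the sketch below accumulates elapsed time with `System.nanoTime()` and counts flushes. The field names `arrowConvertCostNanos` and `flushCount` come from the commit message; the class `WriterMetrics`, the field `flushCostNanos`, and all method names are assumptions made for this example.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical metrics holder; not the LanceRecordsWriter implementation.
final class WriterMetrics {
    private long arrowConvertCostNanos;
    private long flushCostNanos; // assumed name, not taken from the PR
    private long flushCount;

    // Runs a conversion step and adds its elapsed time to the running total.
    void timeConversion(Runnable conversion) {
        long start = System.nanoTime();
        try {
            conversion.run();
        } finally {
            arrowConvertCostNanos += System.nanoTime() - start;
        }
    }

    // Runs a flush, records its elapsed time, and counts the attempt
    // even if the flush throws.
    void timeFlush(Runnable flush) {
        long start = System.nanoTime();
        try {
            flush.run();
        } finally {
            flushCostNanos += System.nanoTime() - start;
            flushCount++;
        }
    }

    long arrowConvertCostNanos() {
        return arrowConvertCostNanos;
    }

    long flushCount() {
        return flushCount;
    }

    // Nanoseconds are converted to milliseconds only when reporting.
    String summary() {
        return String.format(
                "flushCount=%d, arrowConvertMs=%d, flushMs=%d",
                flushCount,
                TimeUnit.NANOSECONDS.toMillis(arrowConvertCostNanos),
                TimeUnit.NANOSECONDS.toMillis(flushCostNanos));
    }
}
```

`System.nanoTime()` is meant for measuring elapsed intervals and is unaffected by wall-clock adjustments, which is why it suits this use better than `System.currentTimeMillis()`. With SLF4J, the values would typically be passed as arguments, for example `LOG.debug("flushCount={}", count)`, so the message is only formatted when DEBUG is enabled.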
@fightBoxing force-pushed the rocky-lance-test_1.3 branch from e82a5d6 to c815193 on January 4, 2026 15:49