feat: add custom Lance metrics to trace read-path scan performance#460

Open
summaryzb wants to merge 1 commit into lance-format:main from summaryzb:feat/read-path-custom-metrics

Conversation

@summaryzb (Contributor)

Summary

Adds custom metrics to the Lance Spark read path using Spark's DataSource V2 CustomMetric API, enabling per-task timing and counter instrumentation that surfaces on the Spark UI Scan node. Six metrics are tracked: fragments scanned, batches read, dataset open time, scanner creation time, batch load time, and a derived total scan time.

Motivation

Implements #459.

Before this PR:

[image]

After this PR:

[image]

Approach

The implementation uses Spark's CustomMetric / CustomTaskMetric API, which is the standard DataSource V2 mechanism for surfacing connector-specific metrics in the Spark UI.

Metric definitions (LanceCustomMetrics): Six CustomSumMetric inner classes define the metrics. Each has a public no-arg constructor (required by Spark's reflection-based instantiation). A static allMetrics() method returns all definitions for LanceScan.supportedCustomMetrics(). The CustomSumMetric base class handles aggregation across tasks automatically.
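The definition pattern above can be sketched in plain Java. This is a minimal, hypothetical mirror of Spark's `CustomMetric` / `CustomSumMetric` contract using local stand-in interfaces (the actual PR extends the real `org.apache.spark.sql.connector.metric` classes, and the metric name strings below are illustrative, not the PR's exact constants):

```java
import java.util.Arrays;

public class MetricSketch {
    // Stand-in for Spark's CustomMetric contract (assumption: real code
    // implements the Spark interface of the same shape).
    interface CustomMetric {
        String name();
        String description();
        String aggregateTaskMetrics(long[] taskMetrics);
    }

    // Stand-in for Spark's CustomSumMetric: aggregates task values by summing.
    static abstract class CustomSumMetric implements CustomMetric {
        @Override
        public String aggregateTaskMetrics(long[] taskMetrics) {
            return String.valueOf(Arrays.stream(taskMetrics).sum());
        }
    }

    // One of the six definitions; the public no-arg constructor is what
    // Spark's reflection-based instantiation requires.
    public static class FragmentsScannedMetric extends CustomSumMetric {
        public FragmentsScannedMetric() {}
        @Override public String name() { return "fragmentsScanned"; }
        @Override public String description() { return "number of Lance fragments scanned"; }
    }

    // Analogue of the static allMetrics() returned from
    // LanceScan.supportedCustomMetrics(); only one entry shown here.
    public static CustomMetric[] allMetrics() {
        return new CustomMetric[] { new FragmentsScannedMetric() /* , ...five more */ };
    }
}
```

Because `CustomSumMetric` owns the aggregation, each concrete metric only needs a name and description.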

Executor-side tracking (LanceReadMetricsTracker): A thread-confined accumulator that lives inside each PartitionReader. It collects per-phase nanosecond timings and counters via simple add*() methods. The currentMetricsValues() method returns a snapshot array of CustomTaskMetric instances -- Spark calls this after each next() invocation. The derived scanTimeNs metric is computed as datasetOpenTimeNs + scannerCreateTimeNs + batchLoadTimeNs.
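A hedged sketch of such a thread-confined accumulator (field and method names are assumptions, not the PR's exact API; the real tracker wraps its snapshot values as `CustomTaskMetric` instances rather than a `long[]`):

```java
// One instance lives inside each PartitionReader, confined to that
// reader's thread, so plain long fields need no synchronization.
public class ReadMetricsTracker {
    private long fragmentsScanned;
    private long batchesRead;
    private long datasetOpenTimeNs;
    private long scannerCreateTimeNs;
    private long batchLoadTimeNs;

    public void addFragmentsScanned(long n) { fragmentsScanned += n; }
    public void addBatchRead()              { batchesRead++; }
    public void addDatasetOpenTimeNs(long ns)   { datasetOpenTimeNs += ns; }
    public void addScannerCreateTimeNs(long ns) { scannerCreateTimeNs += ns; }
    public void addBatchLoadTimeNs(long ns)     { batchLoadTimeNs += ns; }

    // Derived metric: total scan time is the sum of the three phase timings.
    public long scanTimeNs() {
        return datasetOpenTimeNs + scannerCreateTimeNs + batchLoadTimeNs;
    }

    // Snapshot of current values, analogous to currentMetricsValues(),
    // which Spark polls after each next() invocation.
    public long[] currentValues() {
        return new long[] { fragmentsScanned, batchesRead,
                datasetOpenTimeNs, scannerCreateTimeNs,
                batchLoadTimeNs, scanTimeNs() };
    }
}
```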

Instrumentation points: Timing is captured at three boundaries in the scan lifecycle:

  1. LanceFragmentScanner.create() wraps Dataset.open() and fragment.newScan() with System.nanoTime() measurements, storing the durations as instance fields.
  2. LanceFragmentColumnarBatchScanner.loadNextBatch() measures each ArrowReader.loadNextBatch() call.
  3. LanceColumnarPartitionReader reads these timings from the scanner and feeds them into its LanceReadMetricsTracker.
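The `System.nanoTime()` bracketing used at these boundaries can be sketched generically (the helper name and field are illustrative; the PR inlines this pattern around `Dataset.open()`, `fragment.newScan()`, and `ArrowReader.loadNextBatch()`):

```java
import java.util.function.Supplier;

public class TimingSketch {
    // Stand-in for the per-instance duration fields the scanner stores.
    static long lastElapsedNs;

    // Wraps an arbitrary phase and records its elapsed wall time.
    static <T> T timed(Supplier<T> phase) {
        long start = System.nanoTime();
        T result = phase.get();
        lastElapsedNs = System.nanoTime() - start;
        return result;
    }
}
```

Monotonic `System.nanoTime()` is the right clock here: it measures elapsed intervals and is unaffected by wall-clock adjustments.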

The same pattern is applied to LanceCountStarPartitionReader (pushed-down COUNT(*)) and LanceRowPartitionReader (delegates to the columnar reader). All three reader types override currentMetricsValues() to report metrics to Spark.

Test Coverage

  • Metric count: allMetrics() returns exactly 6 metrics.
  • Name uniqueness: all metric names are distinct.
  • Name correctness: metric names match the string constants defined in LanceCustomMetrics.
  • Description presence and uniqueness: every metric has a non-empty, unique description.
  • Sum aggregation: CustomSumMetric.aggregateTaskMetrics() correctly sums values (including empty array).
  • Reflection instantiation: all six metric inner classes can be constructed via no-arg reflection (as Spark does at runtime).
  • Tracker initial state: all counters start at zero.
  • Tracker accumulation: repeated add*() calls accumulate correctly; derived scanTimeNs equals sum of three sub-timings.
  • Tracker bulk add: addFragmentsScanned() supports multi-fragment increments.
  • Tracker-definition consistency: task metric names match definition names (required for Spark aggregation).
  • Integration -- columnar read: a SELECT x, y query produces non-zero values for all six metrics, and scanTimeNs == datasetOpenTimeNs + scannerCreateTimeNs + batchLoadTimeNs after Spark aggregation.
  • Integration -- COUNT(*) with filter: the LanceCountStarPartitionReader path also produces non-zero values for all metrics.
  • Integration -- basic read/count completions: verifies that metrics instrumentation does not break normal query execution.
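Two of the unit checks above (name uniqueness and no-arg reflective construction) can be sketched as standalone helpers; the class and method names here are illustrative, not the PR's test code:

```java
import java.util.HashSet;
import java.util.Set;

public class MetricTestSketch {
    // Every metric name must be non-empty and distinct, since Spark
    // matches task metrics to definitions by name during aggregation.
    public static boolean namesUnique(String[] names) {
        Set<String> seen = new HashSet<>();
        for (String n : names) {
            if (n == null || n.isEmpty() || !seen.add(n)) return false;
        }
        return true;
    }

    // Mirrors what Spark does at runtime: no-arg reflective construction.
    // Returns null instead of throwing so callers can assert on the result.
    public static Object instantiateOrNull(Class<?> cls) {
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }
}
```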

Change-Id: I63dd17d7e8469c27a73251d7eca3ac373d279d7f
@github-actions github-actions Bot added the enhancement New feature or request label Apr 20, 2026
