feat: integrate MemWAL with deterministic sharding in daft-lance #23

Merged
rchowell merged 2 commits into daft-engine:main from beinan:feat/memwal-sharding on Jun 5, 2026

Contributor

@beinan beinan commented Jun 5, 2026

Summary

Integrates Lance's log-structured ingestion framework (MemWAL) with daft-lance to enable high-throughput parallel writes.

Key changes:

  • Adds use_mem_wal: bool = False and compact_after_write: bool = True flags to LanceDataSink.
  • Deterministically shards writes by assigning a unique MemWAL shard UUID (uuid.uuid4()) to each task / micropartition write. Because no two writers share a shard, concurrent writers (e.g. distributed Ray workers) can write without contending on the table manifest under optimistic concurrency control (OCC).
  • Implements post-write compaction during the finalization phase via compact_files_internal from daft_lance.lance_compaction so that freshly written MemWAL data is immediately visible.
  • Adds comprehensive testing in tests/io/lancedb/test_mem_wal_writes.py covering creation, appending, sharding, compaction flags, schema preservation, and COW fallbacks.
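
The flow described in the list above can be sketched in plain Python. This is a toy model only, not the daft-lance or Lance API: `ToyShardedSink`, `wal_shards`, `committed`, `write`, and `finalize` are hypothetical names invented for illustration, and only the two flag names (`use_mem_wal`, `compact_after_write`) and the use of `uuid.uuid4()` come from this PR. It shows each write getting its own shard ID and the buffered data becoming visible only once finalization compacts it.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyShardedSink:
    """Hypothetical stand-in for a sink exposing the two flags named in this PR."""

    use_mem_wal: bool = False
    compact_after_write: bool = True
    # shard id -> rows buffered in that shard (stands in for a MemWAL region)
    wal_shards: dict = field(default_factory=dict)
    # rows visible to readers (stands in for committed COW fragments)
    committed: list = field(default_factory=list)

    def write(self, rows):
        """Handle one task / micropartition write."""
        if not self.use_mem_wal:
            # Fallback path in this toy model: commit rows directly.
            self.committed.extend(rows)
            return None
        # Fresh shard per write, so no two writers ever target the same shard.
        shard_id = uuid.uuid4()
        self.wal_shards[shard_id] = list(rows)
        return shard_id

    def finalize(self):
        """Fold buffered shards into committed data when compaction is enabled."""
        if self.use_mem_wal and self.compact_after_write:
            for rows in self.wal_shards.values():
                self.committed.extend(rows)
            self.wal_shards.clear()
        return len(self.committed)


sink = ToyShardedSink(use_mem_wal=True)
shard_ids = [sink.write([i, i + 1]) for i in range(0, 6, 2)]  # three writes
visible_before = len(sink.committed)  # nothing committed until finalize()
visible_after = sink.finalize()       # all buffered rows are now committed
```

In this sketch, setting `compact_after_write=False` leaves the rows in `wal_shards` after `finalize()`, which mirrors why the PR compacts by default: without that step the freshly written data would not be visible to readers of the committed fragments.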

Test plan

  • Verified that all 19 new MemWAL-specific test cases in tests/io/lancedb/test_mem_wal_writes.py pass.
  • Verified that the full daft-lance test suite runs and passes cleanly with zero regressions.

🤖 Generated with Claude Code

Beinan Wang added 2 commits June 5, 2026 14:33
Integrates Lance's log-structured ingestion framework (MemWAL) with
daft-lance to enable high-throughput parallel writes. When enabled, each
write task shards records deterministically by writing to a unique
Memory Write-Ahead Log region. Post-write, a distributed compaction phase
commits WAL records to standard Copy-on-Write (COW) fragments.

Co-Authored-By: Beinan Wang <beinanwang@microsoft.com>

Addresses linting/styling issues reported by CI pre-commit checks:
- Removes unused WriteResult import in test_mem_wal_writes.py
- Re-formats multi-line arrays and parameters using black/ruff style guidelines
- Formats uv.lock metadata

Co-Authored-By: Beinan Wang <beinanwang@microsoft.com>
@beinan beinan marked this pull request as ready for review June 5, 2026 21:50
@rchowell rchowell merged commit 20b837e into daft-engine:main Jun 5, 2026
5 checks passed