Skip to content

refactor: unify load/ingest — load survives, ingest deprecated#184

Merged
aaltshuler merged 5 commits into
mainfrom
refactor/load-ingest-unification
Jun 11, 2026
Merged

refactor: unify load/ingest — load survives, ingest deprecated#184
aaltshuler merged 5 commits into
mainfrom
refactor/load-ingest-unification

Conversation

@aaltshuler

Copy link
Copy Markdown
Collaborator

Summary

load and ingest were one write path (ingest = branch-fork wrapper over load_as; both staged → publisher CAS since MR-793) wearing two accidentally-different surfaces. This PR unifies them around one explicit rule: fork-if-missing only when a base branch is explicitly given — without --from/from, a missing branch is an error, never an implicit fork.

Five commits, each green under cargo test --workspace --locked:

  1. docs(execution) — fix stale rows describing Overwrite as an inline-commit residual; it's been staged (stage_overwrite + commit_staged + publisher CAS) since MR-793, and a mid-load failure leaves Lance HEAD untouched in all three modes.
  2. feat(engine)load_as/load_file_as gain base: Option<&str> (Some(base) = fork-if-missing, None = branch must exist). LoadResult gains branch/base_branch/branch_created. The ingest* family becomes #[deprecated] shims preserving the historical contract exactly. The layered policy check is unchanged: Change always, BranchCreate additionally when a fork happens.
  3. refactor(loader)load_jsonl{,_file} helpers borrow &Omnigraph (the &mut was vestigial) and are documented as the active-branch convenience. (Planned as a deletion, but ~200 branch-semantics-sensitive call sites made keeping the one helper the lower-liability option; full removal can ride with the shim-removal release.)
  4. feat(server)!POST /ingest forks only when from is present. Previously a typo'd branch silently forked main and landed the data there; now it's a 404 with nothing created (regression-tested). BranchCreate authz only runs when a fork will happen. Handler calls load_as directly; OpenAPI regenerated.
  5. feat(cli)!load is the single data command: remote-capable (POSTs /ingest like other remote commands — previously load was the only data command forced to write Lance storage directly), --from <base> for fork-if-missing, --mode required. ingest stays as a deprecated alias (defaults --from main --mode merge) warning on stderr, matching the read/change deprecation convention. Docs updated in the same change.

Breaking changes

Surface Change Migration
CLI load --mode is now required (was silent overwrite — a destructive default) add --mode overwrite to keep old behavior
CLI load output gains base_branch/branch_created fields (additive) none
POST /ingest from absent + missing branch → 404 not_found, nothing created (was: implicit fork from main) pass "from": "main" explicitly to keep fork-if-missing
POST /ingest IngestOutput.base_branch is now nullable (echoes the request's from) treat as optional
Engine SDK load_as/load_file_as gained a base parameter; ingest{,_as,_file,_file_as} are #[deprecated] shims slated for removal pass None/Some(base); migrate ingest callers to load_as
Engine SDK loader::load_jsonl{,_file} take &Omnigraph (was &mut) none — &mut coerces

Deferred (removal release, 0.7/0.8)

Drop the ingest CLI command, the engine ingest* shims, and IngestResult; consolidated release notes land with the version bump per the maintenance contract.

Test plan

  • cargo test --workspace --locked green at every commit boundary
  • New coverage: engine fork/must-exist paths (in-source loader tests), server 404-and-creates-nothing + from-absent-loads-existing-branch, CLI load --from fork + missing-branch error + required---mode error + ingest stderr deprecation warning, remote load round-trip (in the #[ignore]-gated loopback suite, runs in CI)
  • OpenAPI regenerated; openapi.rs passes in strict mode
  • scripts/check-agents-md.sh clean on tracked files
  • Manual smoke: init → bare load (clap error) → load --branch nope (refused) → --from main (forked + loaded) → ingest (works + warns)

🤖 Generated with Claude Code

aaltshuler and others added 5 commits June 11, 2026 03:44
…commit

The LoadMode table still described Overwrite as an inline-commit-per-type
residual with a partial-truncation failure window. Since MR-793 Phase 2,
Overwrite goes through the same MutationStaging accumulator as Append/Merge,
staged as a Lance Operation::Overwrite transaction via stage_overwrite
(table_store.rs) and committed with commit_staged + publisher CAS — a
mid-load failure leaves Lance HEAD untouched in all three modes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
load_as/load_file_as gain a base: Option<&str> parameter: with Some(base) a
missing target branch is forked from base first (the former ingest
semantics); with None the target branch must exist — staging fails on an
unknown branch, so a typo'd name can never create one. LoadResult gains
branch/base_branch/branch_created metadata (additive).

The ingest family (ingest, ingest_as, ingest_file, ingest_file_as) becomes
#[deprecated] shims over load_as that preserve the historical contract
exactly (from: None still means fork from main; base recorded even when no
fork happened). IngestResult and to_ingest_tables stay for the shims and
the server until the removal release.

The layered policy check is unchanged: Change on the target branch always,
BranchCreate additionally when a fork actually happens (enforced inside
branch_create_from_as with the actor threaded through).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ir role

The free helpers needlessly demanded &mut Omnigraph (every load API takes
&self) and read as leftovers. Rather than rewriting their ~200 call sites
across the test suites — which would have to re-derive the active-branch
resolution at each site — keep the one convenience and make it honest:
borrow immutably (&mut callers coerce, no churn) and document it as the
active-branch shorthand over Omnigraph::load.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Branch creation becomes opt-in by presence of the request's 'from' field.
Previously the handler defaulted from to 'main' and always auto-created a
missing branch — a typo'd branch name silently forked main and landed the
data there, with the client none the wiser. Now a request without 'from'
against a missing branch returns 404 branch-not-found and creates nothing;
with 'from' set, fork-if-missing behaves as before. The BranchCreate
authority is only consulted when a fork will actually happen.

The handler calls the unified load_as directly (the deprecated ingest_as
shim is no longer used in the server). IngestOutput.base_branch becomes
nullable: it echoes the request's 'from' and is null when absent. OpenAPI
regenerated; the CLI's local ingest arm moves to load_file_as + the new
converter shape.

BREAKING CHANGE: clients that relied on implicit fork-from-main with 'from'
omitted must now pass from='main' explicitly. IngestOutput.base_branch is
now nullable.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
omnigraph load is now the single data-write command:
- works against remote graphs (POSTs the server's /ingest endpoint with the
  same bearer/actor resolution as other remote commands) — previously load
  was the only data command forced to open Lance storage directly
- --from <base> opts into fork-if-missing for --branch (the former ingest
  semantics); without --from a missing branch is an error, never a fork
- --mode is now required: overwrite is destructive, so there is no implicit
  default (the old silent default was overwrite)
- output gains base_branch/branch_created (and table sums on remote loads)

omnigraph ingest stays as a deprecated alias (defaults preserved: --from
main --mode merge) that prints a one-line warning to stderr, matching the
read/change deprecation convention; removal in a later release.

Docs updated in the same change: cli.md, cli-reference.md, policy.md,
audit.md, execution.md (unified load section), AGENTS.md quick-flow,
README.md.

BREAKING CHANGE: scripts running omnigraph load without --mode must now
pass it explicitly (previously defaulted to the destructive overwrite).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aaltshuler has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@aaltshuler aaltshuler merged commit 328bfef into main Jun 11, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant