refactor: unify load/ingest — load survives, ingest deprecated#184
Merged
Conversation
…commit The LoadMode table still described Overwrite as an inline-commit-per-type residual with a partial-truncation failure window. Since MR-793 Phase 2, Overwrite goes through the same MutationStaging accumulator as Append/Merge, staged as a Lance Operation::Overwrite transaction via stage_overwrite (table_store.rs) and committed with commit_staged + publisher CAS — a mid-load failure leaves Lance HEAD untouched in all three modes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
load_as/load_file_as gain a base: Option<&str> parameter: with Some(base) a missing target branch is forked from base first (the former ingest semantics); with None the target branch must exist — staging fails on an unknown branch, so a typo'd name can never create one. LoadResult gains branch/base_branch/branch_created metadata (additive). The ingest family (ingest, ingest_as, ingest_file, ingest_file_as) becomes #[deprecated] shims over load_as that preserve the historical contract exactly (from: None still means fork from main; base recorded even when no fork happened). IngestResult and to_ingest_tables stay for the shims and the server until the removal release. The layered policy check is unchanged: Change on the target branch always, BranchCreate additionally when a fork actually happens (enforced inside branch_create_from_as with the actor threaded through). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ir role The free helpers needlessly demanded &mut Omnigraph (every load API takes &self) and read as leftovers. Rather than rewriting their ~200 call sites across the test suites — which would have to re-derive the active-branch resolution at each site — keep the one convenience and make it honest: borrow immutably (&mut callers coerce, no churn) and document it as the active-branch shorthand over Omnigraph::load. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Branch creation becomes opt-in by presence of the request's 'from' field. Previously the handler defaulted from to 'main' and always auto-created a missing branch — a typo'd branch name silently forked main and landed the data there, with the client none the wiser. Now a request without 'from' against a missing branch returns 404 branch-not-found and creates nothing; with 'from' set, fork-if-missing behaves as before. The BranchCreate authority is only consulted when a fork will actually happen. The handler calls the unified load_as directly (the deprecated ingest_as shim is no longer used in the server). IngestOutput.base_branch becomes nullable: it echoes the request's 'from' and is null when absent. OpenAPI regenerated; the CLI's local ingest arm moves to load_file_as + the new converter shape. BREAKING CHANGE: clients that relied on implicit fork-from-main with 'from' omitted must now pass from='main' explicitly. IngestOutput.base_branch is now nullable. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
omnigraph load is now the single data-write command: - works against remote graphs (POSTs the server's /ingest endpoint with the same bearer/actor resolution as other remote commands) — previously load was the only data command forced to open Lance storage directly - --from <base> opts into fork-if-missing for --branch (the former ingest semantics); without --from a missing branch is an error, never a fork - --mode is now required: overwrite is destructive, so there is no implicit default (the old silent default was overwrite) - output gains base_branch/branch_created (and table sums on remote loads) omnigraph ingest stays as a deprecated alias (defaults preserved: --from main --mode merge) that prints a one-line warning to stderr, matching the read/change deprecation convention; removal in a later release. Docs updated in the same change: cli.md, cli-reference.md, policy.md, audit.md, execution.md (unified load section), AGENTS.md quick-flow, README.md. BREAKING CHANGE: scripts running omnigraph load without --mode must now pass it explicitly (previously defaulted to the destructive overwrite). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
aaltshuler has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
loadandingestwere one write path (ingest = branch-fork wrapper overload_as; both staged → publisher CAS since MR-793) wearing two accidentally-different surfaces. This PR unifies them around one explicit rule: fork-if-missing only when a base branch is explicitly given — without--from/from, a missing branch is an error, never an implicit fork.Five commits, each green under
cargo test --workspace --locked:stage_overwrite+commit_staged+ publisher CAS) since MR-793, and a mid-load failure leaves Lance HEAD untouched in all three modes.load_as/load_file_asgainbase: Option<&str>(Some(base)= fork-if-missing,None= branch must exist).LoadResultgainsbranch/base_branch/branch_created. Theingest*family becomes#[deprecated]shims preserving the historical contract exactly. The layered policy check is unchanged:Changealways,BranchCreateadditionally when a fork happens.load_jsonl{,_file}helpers borrow&Omnigraph(the&mutwas vestigial) and are documented as the active-branch convenience. (Planned as a deletion, but ~200 branch-semantics-sensitive call sites made keeping the one helper the lower-liability option; full removal can ride with the shim-removal release.)POST /ingestforks only whenfromis present. Previously a typo'dbranchsilently forkedmainand landed the data there; now it's a 404 with nothing created (regression-tested).BranchCreateauthz only runs when a fork will happen. Handler callsload_asdirectly; OpenAPI regenerated.loadis the single data command: remote-capable (POSTs/ingestlike other remote commands — previouslyloadwas the only data command forced to write Lance storage directly),--from <base>for fork-if-missing,--moderequired.ingeststays as a deprecated alias (defaults--from main --mode merge) warning on stderr, matching theread/changedeprecation convention. Docs updated in the same change.Breaking changes
load--modeis now required (was silentoverwrite— a destructive default)--mode overwriteto keep old behaviorloadbase_branch/branch_createdfields (additive)POST /ingestfromabsent + missing branch → 404not_found, nothing created (was: implicit fork frommain)"from": "main"explicitly to keep fork-if-missingPOST /ingestIngestOutput.base_branchis now nullable (echoes the request'sfrom)load_as/load_file_asgained abaseparameter;ingest{,_as,_file,_file_as}are#[deprecated]shims slated for removalNone/Some(base); migrate ingest callers toload_asloader::load_jsonl{,_file}take&Omnigraph(was&mut)&mutcoercesDeferred (removal release, 0.7/0.8)
Drop the
ingestCLI command, the engineingest*shims, andIngestResult; consolidated release notes land with the version bump per the maintenance contract.Test plan
cargo test --workspace --lockedgreen at every commit boundaryload --fromfork + missing-branch error + required---modeerror + ingest stderr deprecation warning, remoteloadround-trip (in the#[ignore]-gated loopback suite, runs in CI)openapi.rspasses in strict modescripts/check-agents-md.shclean on tracked filesload(clap error) →load --branch nope(refused) →--from main(forked + loaded) →ingest(works + warns)🤖 Generated with Claude Code