Skip to content

feat: writable external tables (INSERT/LOAD into stage files)#24889

Open
fengttt wants to merge 10 commits into
matrixorigin:mainfrom
fengttt:feature/writable-external-table
Open

feat: writable external tables (INSERT/LOAD into stage files)#24889
fengttt wants to merge 10 commits into
matrixorigin:mainfrom
fengttt:feature/writable-external-table

Conversation

@fengttt

@fengttt fengttt commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

What

Make external tables writable. An external table created with a new
write_file_pattern option accepts INSERT ... SELECT and LOAD, writing
CSV or JSONLine files into a stage:// destination. The pattern is a
strftime(3) template with two MO extensions:

  • %nNn random decimal digits
  • %U → a UUID

Each parallel write pipeline produces exactly one file. Reads, UPDATE/DELETE,
and read-only external tables are unchanged.

CREATE EXTERNAL TABLE t (a int, b varchar(20), c double)
INFILE{'filepath'='stage://s/part_*.csv', 'format'='csv',
       'write_file_pattern'='stage://s/part_%U.csv'}
FIELDS TERMINATED BY ',';

INSERT INTO t SELECT * FROM src;          -- writes one CSV file to the stage
LOAD DATA INFILE '...' INTO TABLE t ...;  -- same, from a source file

How

  • pkg/sql/colexec/externalwrite (new): strftime expander, ExternalWriter
    with a fileservice streaming sink (io.Pipe), CSV/JSONLine encoders. Encoders
    are const-vector aware and emit only the table's declared columns.
  • insert operator: third write mode ToExternal/insert_external alongside
    ToWriteS3/insert_table. Write config is carried on the Go InsertCtx
    (no proto change), built at compile time from TableDef.Createsql.
  • compile: compileInsert routes external-write nodes to one writer op per
    source scope (no S3 merge/shuffle) → one file per pipeline, parallel across CNs.
  • planner: minimal external insert plan (build_insert/build_load);
    op-aware checkTableType allows writable-external for insert;
    initInsertStmt Pkey nil-guard (external tables have no PK); the modern DML
    binder defers external targets to the legacy planner.
  • DDL: validates write_file_pattern (must be stage://, csv/jsonline only,
    pattern must parse) and accepts it in the read-side option validators.

Semantics / limitations

  • Output formats: CSV and JSONLine only; destination must be stage://.
  • Empty pipeline → no file (lazy file creation).
  • Writes are not transactional (files are finalized at pipeline close); a
    failed statement may leave partial files. Documented as a v1 limitation.
  • UPDATE/DELETE on external tables remain unsupported.

Tests

  • Unit: pkg/sql/colexec/externalwrite (expander, encoders), build_ddl validator.
  • BVT: test/distributed/cases/stage/writable_external_table.{sql,result}
    CSV/JSONLine insert + readback, multi-file accumulation, LOAD into external,
    and all error cases. Passes 37/37 locally. Existing stage.sql (external reads
    • DML-rejection checks) still passes 100%.

Design + as-built notes: docs/design/writable_external_table_impl.md.

🤖 Generated with Claude Code

Make external tables writable when created with a WRITE_FILE_PATTERN option
(a strftime template with MO extensions %nN = n random digits and %U = UUID,
resolving to a stage:// path). Such tables accept INSERT ... SELECT and LOAD,
writing CSV or JSONLine files into the stage; each parallel pipeline writes one
file. Reads, UPDATE/DELETE, and read-only external tables are unchanged.

- pkg/sql/colexec/externalwrite: strftime expander, ExternalWriter with a
  fileservice streaming sink, CSV/JSONLine encoders (const-vector aware, emit
  only the declared columns).
- insert operator: third write mode ToExternal/insert_external alongside
  ToWriteS3/insert_table; write config carried on the Go InsertCtx (no proto
  change), built in compile from TableDef.Createsql.
- compile: compileInsert routes external-write nodes to one writer op per
  source scope (no S3 merge/shuffle).
- planner: minimal external insert plan (build_insert/build_load); op-aware
  checkTableType allows writable-external for insert; initInsertStmt Pkey guard;
  modern DML binder defers external targets to the legacy path.
- DDL: validate WRITE_FILE_PATTERN (stage:// + csv/jsonline + parseable) and
  accept it in the read-side option validators.
- docs/design + BVT case test/distributed/cases/stage/writable_external_table.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… SHOW CREATE

Address iamlinjunhong's review on writable external tables:

1. Remote execution no longer loses external-write mode. pipeline.Insert
   gains to_external + external_stmt_unix_nano; the receiving CN rebuilds
   the writer config from the serialized TableDef's stored ExternParam via
   the new shared buildExternalInsertArg, so WRITE_FILE_PATTERN time
   directives expand against the same statement-start instant on every CN.
   The writer's session time zone is now resolved in insert.Prepare so a
   remotely rebuilt operator renders TIMESTAMPs in the session zone too.

   Testing on a 2-CN cluster exposed a second instance of the same bug:
   dupOperator dropped ToExternal when a scope is parallelized, so the
   duplicated inserts silently wrote rows into the disttae workspace and
   the transaction panicked at commit (UnionWindow index out of range in
   dumpInsertBatchLocked). Both paths are fixed and covered by unit tests
   (encode/decode round-trip, dup), plus a 100k-row parallel-insert BVT
   case that fails without the dup fix.

2. format='jsonline' with jsondata='array' is rejected at DDL time for
   writable external tables: the writer emits one object per line, which
   an 'array' table could not read back.

3. The CSV writer now defaults ENCLOSED BY to '"', matching the external
   reader (which always parses with '"' when the clause is absent), so
   strings containing the field terminator, quotes, newlines or
   backslashes round-trip. New BVT case covers all four.

4. SHOW CREATE TABLE now emits WRITE_FILE_PATTERN for external tables
   (both INFILE and s3option forms), so the output recreates a writable
   table instead of silently degrading to read-only.

All verified end to end on the local multi-CN docker cluster, including a
500k-row parallel insert producing 5 writer files that read back exactly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@fengttt

fengttt commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@iamlinjunhong Thanks for the thorough review — all four issues (plus the three you re-raised from Copilot, which were fixed in 1bb3a22) are now addressed in d15911b.

1. Remote execution loses external-write mode — Fixed via proto plumbing, as you suggested. pipeline.Insert now carries to_external plus external_stmt_unix_nano; the receiving CN rebuilds the full writer config from the serialized TableDef's stored ExternParam through a shared buildExternalInsertArg (used by both local compile and remote-run decode), so every CN expands WRITE_FILE_PATTERN time directives against the same statement-start instant. The writer's session time zone is resolved in insert.Prepare (the remote process carries session info), so remotely rebuilt operators render TIMESTAMPs consistently too. Covered by an encode/decode round-trip unit test.

While verifying this end to end on a 2-CN docker cluster with a 500k-row insert, I found a second instance of the same class of bug: dupOperator dropped ToExternal when a scope is parallelized, so on any multi-pipeline insert the duplicated instances silently degraded to engine-relation inserts — rows landed in the disttae workspace and the txn panicked at commit (UnionWindow index-out-of-range in dumpInsertBatchLocked). Fixed, unit-tested, and the BVT now includes a 100k-row parallel insert that fails without the fix. After both fixes the 500k-row insert writes 5 parallel files and reads back exactly.

2. JSONLine ignores jsondata='array' — Now rejected at DDL time in validateWriteFilePattern ("writable external table does not support jsondata 'array', use 'object'"), since the writer emits one object per line and an 'array' table could not read back its own output. Unit test + BVT error case added.

3. CSV default enclosure mismatch — The writer now defaults EnclosedBy to ". (The reader always parses with " when ENCLOSED BY is absent — and even when it's explicitly empty — so writing with the same default makes values containing the field terminator, quotes, newlines, or backslashes round-trip.) Added a BVT case inserting with,comma / with"quote / with\nnewline / with\\backslash through a table with no ENCLOSED BY clause and reading them back.

4. SHOW CREATE TABLE loses write_file_pattern — Now emitted for both the INFILE and s3option forms in build_show_util.go, and the BVT asserts the SHOW CREATE TABLE output.

Verification: make + make static-check clean; targeted unit tests in externalwrite, insert, plan, compile pass; the writable_external_table BVT passes 71/71 against a freshly built 2-CN docker cluster.

fengttt and others added 3 commits June 11, 2026 12:48
- TestShowCreateExternalWriteFilePattern: WRITE_FILE_PATTERN survives SHOW
  CREATE formatting in both the INFILE and URL s3option forms (the s3option
  form had no coverage), and read-only tables emit no such key.
- TestConstructExternalInsertStmtTime: the writer config takes its timestamp
  from defines.StartTS on the process context (falling back to the wall
  clock), which is what keeps WRITE_FILE_PATTERN time directives consistent
  across parallel pipelines.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A 4.4M-row flushed source crosses the multi-CN stats thresholds (>512
blocks), so the INSERT into the writable external table compiles to a
MULTICN plan; on a multi-CN cluster the source-scan scopes carry the
external-write insert to remote CNs through the pipeline protocol,
exercising the to_external encode/decode and remote writer rebuild end to
end. On a single CN the same case degenerates to the parallel-pipeline
insert and asserts identical results.

Verified on the local 2-CN docker cluster: during the run cn2's
mo_rpc_message_total{name="pipeline-server",type="receive"} advanced by 13
while the session ran on cn1, and the 4.4M rows read back exactly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Comment thread pkg/sql/plan/build_ddl.go Outdated

// validateWriteFilePattern validates the WRITE_FILE_PATTERN option that makes an
// external table writable. No-op for read-only external tables (option absent).
func validateWriteFilePattern(ctx context.Context, param *tree.ExternParam) error {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Parallel writers silently collide when WRITE_FILE_PATTERN has no uniqueness directive

validateWriteFilePattern checks only the stage:// prefix, the format, and a dry-run expansion — it never requires %U/%nN. But execution creates one writer per parallel pipeline (one per scope in compileInsert, more via dupOperator), all expanding the pattern against the same statement timestamp. ExpandFilePattern's own doc comment says "It is the caller's responsibility to make the pattern produce unique names across parallel writers" — and no caller enforces it.

Failure scenario: 'write_file_pattern'='stage://s/out-%Y%m%d.csv' (valid today) + a multi-pipeline INSERT/LOAD → every pipeline opens the same path. Both LocalFS.Write and S3FS.Write do a non-atomic exists-check-then-write, so the statement either fails with FileAlreadyExists after some files were already written, or in the race window last-writer-wins and the other pipelines' rows are silently lost while affected-rows reports success. Two sequential same-day INSERTs collide the same way.

Suggested fix: reject patterns without %U/%nN here at DDL time (or auto-append a uniqueness suffix at expansion when absent).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6c0c49bvalidateWriteFilePattern now rejects patterns without %U/%nN (PatternHasUniqueDirective, %%-escape aware). Unit tests + BVT error case added; all existing patterns already used %U.

// appendExternalInsertPlan appends a minimal INSERT node for a writable external
// table. The source (lastNodeId) has already been bound, cast to the table
// column types and projected by initInsertStmt, so we only attach the INSERT.
func appendExternalInsertPlan(builder *QueryBuilder, bindCtx *BindContext, objRef *ObjectRef, tableDef *TableDef, lastNodeId int32) error {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Minimal external-insert plan skips PreInsert — NOT NULL is unenforced and AUTO_INCREMENT columns write NULL

NOT NULL enforcement for INSERT lives only in the PreInsert operator (colexec.BatchDataNotNullCheck, preinsert.go:263 — its single non-test caller), and AUTO_INCREMENT filling (genAutoIncrCol) likewise only runs there. appendExternalInsertPlan attaches the INSERT node directly with no PreInsert, and the compile side constructs none, so:

  • CREATE EXTERNAL TABLE t (a int NOT NULL, ...) ... write_file_pattern=...; INSERT INTO t VALUES (NULL, ...) succeeds and writes \N instead of raising "Column 'a' cannot be null". Explicit NULLs and NULLs from a SELECT/LOAD source all pass through (omitted columns still fail at plan time via getDefaultExpr).
  • CREATE EXTERNAL TABLE accepts AUTO_INCREMENT (shared buildTableDefs, no external gate), and getDefaultExpr plans the column as a NULL literal expecting PreInsert to fill it — so INSERT INTO t(v) ... writes \N for the id column instead of generated values.

The LOAD path (build_load.go external branch) has the same gap. Suggested fix: reject AUTO_INCREMENT columns for writable external tables at DDL time, and run BatchDataNotNullCheck either via a PreInsert node here or inside insert_external.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6c0c49b — NOT NULL is enforced in insert_external via attrs-aligned ColDefs (covers INSERT and LOAD; const-null vectors handled), and AUTO_INCREMENT columns are rejected at DDL time for writable external tables (hidden fake-PK columns exempt — they're never written; found live when the check first rejected every PK-less table). BVT asserts both.

Comment thread pkg/sql/colexec/insert/types.go Outdated
insert.ctr.partitionS3Writers = nil
}
if insert.ctr.extWriter != nil {
insert.ctr.extWriter.Close(proc.Ctx)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Pipeline failure finalizes the partial file — orphan rows become visible, and a retried statement duplicates them

Reset and Free call extWriter.Close(proc.Ctx) unconditionally, ignoring the pipelineFailed/err parameters and discarding Close's error. externalWriter.CloseFileServiceWriter.Close does a clean pipe close (not CloseWithError), so the async fs.Write treats EOF as normal end-of-stream and durably persists whatever was buffered: LocalFS renames the temp file into place, S3 completes the PUT.

Failure scenario: INSERT INTO wext SELECT ... writes 2 of 5 batches, then the pipeline fails (expression error, lock conflict). Cleanup finalizes the partial file into the stage → every later SELECT on the table includes those orphan rows. Worse, for a retryable error (ErrTxnNeedRetry) the retry loop reruns the statement and writes a complete second file under a new %U expansion → the table now returns the 2 partial batches plus all 5 batches.

External writes can't be transactional, but the failure path should abort rather than finalize: plumb pipelineFailed/err into an abort that cancels the underlying write (e.g. CloseWithError / remove-on-error), and surface Close errors instead of dropping them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6c0c49b — added FileServiceWriter.Abort (closes the pipe with an error so the in-flight fileservice write discards its partial output) and an Abort on the ExternalWriter interface; Reset/Free now abort instead of Close — a non-nil writer there always means the stream didn't end cleanly, since insert_external nils it after a successful Close. Verified live: a 200k-row insert failing mid-stream on division-by-zero leaves zero files in the stage.

return []byte("true"), false, nil
}
return []byte("false"), false, nil
case types.T_bit:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] CSV: bit values are written as raw unquoted bytes — terminator bytes corrupt the record structure

csvValue returns the bit value's raw big-endian bytes with quote=false, so the bytes are written without enclosure or escaping. Any bit value whose byte representation contains the field terminator, line terminator, enclosure char, or backslash breaks the CSV framing.

Failure scenario: bit(8) value 44 (0x2C = ,) or 10 (\n) → the unescaped byte splits the row mid-record; reading the table back fails with a column-count mismatch or returns shifted values — the writer's own round-trip guarantee breaks. (String types are enclosed+escaped; bit is not, yet its payload is equally arbitrary bytes.)

Suggested fix: return quote=true for T_bit so the bytes get enclosed and escaped like other binary-ish values (the reader parses bit from the enclosed field's raw bytes, so enclosure is round-trip-safe).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6c0c49b — bit values are now enclosed+escaped like binary (quote=true). BVT round-trips 44 (','), 34 ('"'), 92 ('\'), 128 (high byte). One caveat discovered while testing: bytes that are pure whitespace (e.g. 10='\n') are written correctly but read back NULL because the external READER TrimSpaces non-string fields — a pre-existing read-side limitation noted in the test.

return json.RawMessage(val.String()), nil
case types.T_bool:
return vector.GetFixedAtNoTypeCheck[bool](vec, i), nil
case types.T_bit:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] JSONLine: bit bytes ≥ 0x80 are invalid UTF-8 and get silently replaced with U+FFFD by json.Marshal

jsonValue returns the bit value's raw big-endian bytes as a Go string. encoding/json coerces invalid UTF-8 to during marshaling, so any bit value with a byte ≥ 0x80 is corrupted.

Failure scenario: bit(8) value 128 (single byte 0x80) → marshaled as "�" (3 bytes); on read-back the external reader's getColData T_bit sees a 3-byte field and reconstructs 0xEFBFBD (or errors "data too long") instead of 128. Values ≤ 127 happen to survive, which makes the bug easy to miss in tests.

Suggested fix: emit bit as a JSON number is what the reader can't parse (that was the original bug) — so either base64-encode like T_binary does and teach the reader, or reject bit columns for jsonline writable tables at DDL time. Round-tripping arbitrary bytes through a JSON string is not salvageable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6c0c49b — went with rejection: bit columns are refused at DDL time for writable jsonline tables (CSV keeps them, enclosed+escaped), with an encoder error guarding the unreachable path. Round-tripping arbitrary bytes through a JSON string isn't salvageable without changing the reader.

// start time). It is the caller's responsibility to make the pattern produce
// unique names across parallel writers (use %U or %nN); the standard directives
// alone are not unique.
func ExpandFilePattern(pattern string, t time.Time) (string, error) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] Time directives expand in each CN's OS-local zone — one statement can scatter files across date partitions

Only the instant travels to remote CNs (ExternalStmtUnixNano); time.Unix(0, nanos) on the receiving CN carries its OS-local zone, and the sender's defines.StartTS is local time.Now() in its zone. strftimeDirective extracts t.Year()/Month()/Day()/Hour() from the time's own location with no .UTC() normalization anywhere in this file — so the proto comment's promise that "every CN expands WRITE_FILE_PATTERN time directives identically" only holds when all hosts share a time zone.

Failure scenario: CN1 at TZ=UTC+8, CN2 at TZ=UTC, pattern stage://s/dt=%Y-%m-%d/part-%U.csv, statement at 2026-06-12 00:10 CN1-local → local pipelines write under dt=2026-06-12, remote pipelines under dt=2026-06-11.

Suggested fix: normalize once — e.g. t = t.UTC() at the top of ExpandFilePattern (documenting that time directives are UTC), or render in a fixed zone carried in the config. UTC also makes the dry-run validation in validateWriteFilePattern consistent with runtime.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6c0c49bExpandFilePattern normalizes to UTC at the top (documented in the doc comment and design docs), so the same instant expands identically on every CN; unit test compares a UTC and UTC+8 rendering of the same instant.

Comment thread pkg/sql/colexec/externalwrite/encode.go Outdated
return nil, moerr.NewInternalErrorf(context.Background(), "external write (jsonline): %v", err)
}
buf.Write(jb)
buf.WriteByte('\n')

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] JSONLine writer hardcodes \n but the reader honors the table's LINES TERMINATED BY — self-incompatible round-trip

encodeJSONLine always emits \n as the record separator, ignoring cfg.LineTerminator (which buildExternalInsertArg populates from the table's LINES TERMINATED BY clause). The external reader's newCSVParserFromReader keeps the custom linesTerminatedBy for jsonline (the jsonline override only touches the field terminator/escape), so a table declared with LINES TERMINATED BY '\r\n' parses its own written file as one malformed line.

Failure scenario: format='jsonline' ... lines terminated by '\r\n' + INSERT 3 rows → file contains \n-separated objects; SELECT parses with \r\n and fails or returns the wrong row count.

Suggested fix: buf.Write(w.cfg.LineTerminator) here (CSV already honors it), or reject a non-\n LINES TERMINATED BY for jsonline writable tables at DDL time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6c0c49b — the JSONLine encoder now writes cfg.LineTerminator between records (exercised by a \r\n unit test). This came along with the map-free encoder rewrite from the perf finding.

Comment thread pkg/sql/compile/operator.go Outdated
// parallel pipelines / multiple CNs all resolve the same path even when the
// pattern contains time directives (e.g. %Y/%m/%d/%H). Fall back to time.Now()
// when the statement start is not carried on the context.
stmtAt := time.Now()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] defines.StartTS is only set by the MySQL frontend — internal-executor statements get a different time.Now() per scope

The only setter of defines.StartTS{} is the frontend's doComQuery; statements run through the internal SQL executor (sql_executor.go, used by ~20 components) never carry it, so this fallback runs once per source scope (compileInsert calls constructExternalInsert in the per-scope loop) and each writer gets a slightly different timestamp. A pattern with %H%M%S (or a midnight-straddling %d) can then expand to different paths within one statement — exactly what the statement-start timestamp was introduced to prevent.

Compile.startAt already carries the statement start on every construction path (frontend, internal executor receiveAt, retry) — threading c.startAt from compileInsert into constructExternalInsert removes the per-scope time.Now() entirely and fixes the internal-executor gap at the same time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6c0c49bcompileInsert resolves the timestamp once per statement via externalInsertStmtTime, preferring defines.StartTS, then Compile.startAt (populated on every construction path including the internal SQL executor), then the wall clock; all scopes share the single value.

Comment thread pkg/sql/plan/dml_context.go Outdated
// legacy planner (buildInsert/buildLoad), which supports INSERT/LOAD into
// writable external tables and rejects the rest with a precise error.
if tableDef.TableType == catalog.SystemExternalRel {
return moerr.NewUnsupportedDML(ctx.GetContext(), "external table")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] REPLACE INTO an external table now leaks the internal ErrUnsupportedDML sentinel instead of a user-facing error

This early return fires for every external table, and bindAndOptimizeReplaceQuery is the one DML entry point with no legacy fallback for ErrUnsupportedDML (insert/load/delete/update all catch it in build.go and replan). So REPLACE INTO <external table> now surfaces "unsupported DML: external table" — an internal fallback signal whose constructor even wraps the context with ContextWithNoReport — instead of the previous "invalid input: cannot insert/update/delete from external table". UPDATE/DELETE are fine (their fallbacks reach checkTableType, which still produces the old message).

Suggested fix: handle ErrUnsupportedDML in bindAndOptimizeReplaceQuery (mapping to the same InvalidInput message), or special-case REPLACE before this resolver.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6c0c49b — two parts: read-only external tables now produce the user-facing "cannot insert/update/delete from external table" directly in ResolveSingleTable (only writable ones emit the legacy-fallback sentinel, which insert/load/update/delete consume), and bindAndOptimizeReplaceQuery maps the remaining external-table sentinel to the same user-facing error instead of leaking it. BVT asserts the REPLACE message.


// addEscape escapes backslashes and (doubled) the enclosure character, matching
// pkg/frontend/export.go addEscapeToString.
func addEscape(s []byte, escape byte) []byte {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Perf] Encoder hot path: avoidable per-cell/per-row allocations

This encoder runs per cell on the bulk INSERT/LOAD path; three cheap wins:

  1. addEscape always does two bytes.ReplaceAll passes — each allocates a full copy even with zero matches (the common case), defeating the zero-copy vec.GetBytesAt. Fast-path it: if bytes.IndexByte(s, '\\') < 0 && (escape == 0 || bytes.IndexByte(s, escape) < 0) { return s }.
  2. encodeCSV/encodeJSONLine allocate a fresh unsized bytes.Buffer per batch and discard it after fw.Write (which fully consumes the data before returning). One reused buffer on externalWriter with Reset() per batch right-sizes itself after the first batch — an ~800KB batch currently pays ~14 grow/realloc/copy steps from 64B.
  3. encodeJSONLine builds a map[string]interface{} per row and reflect-json.Marshals it, re-sorting the same fixed key set every row and boxing every numeric cell. Precomputing the encoded key prefixes (,"name":) once and appending values via the existing per-type switch removes the map, the boxing, and the per-row key sort. (1M rows × 10 cols ≈ 13M avoidable allocations.)

Also micro: []byte("true")/[]byte("false")/[]byte("\\N") per cell can be package-level vars.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6c0c49b — all three: addEscape fast-paths the no-escape case (returns the input slice, asserted by a pointer-identity test), one reused buffer per writer replaces the per-batch allocation, and JSONLine now encodes directly (pre-encoded key prefixes, strconv into a scratch slice, json-compatible float formatting, json.Compact for JSON columns) with byte-identical output — existing exact-match tests pass unchanged. The constant \N/true/false slices are package vars now too.

P1 — silent data loss / constraint holes:
- DDL now requires a %U/%nN uniqueness directive in WRITE_FILE_PATTERN
  (PatternHasUniqueDirective): every parallel pipeline expands the pattern
  against the same statement timestamp, so a directive-free pattern made all
  writers open the identical path (FileAlreadyExists or last-writer-wins).
- NOT NULL is enforced in insert_external (the minimal plan runs no
  PreInsert); AUTO_INCREMENT columns are rejected at DDL time for writable
  external tables (hidden fake-PK columns exempt — they are never written).
- Pipeline failure now ABORTS the in-flight file (FileServiceWriter.Abort
  closes the pipe with an error so fileservice discards the partial object)
  instead of finalizing it; verified live: a failing 200k-row insert leaves
  zero files.

P2 — round-trip correctness:
- CSV writes bit values enclosed+escaped (raw bytes can contain the
  terminator/quote); jsonline writable tables reject bit columns at DDL time
  (bytes >= 0x80 cannot survive a JSON string) with an encoder guard behind.
- WRITE_FILE_PATTERN time directives render in UTC so local and remote CNs
  expand the same instant identically regardless of host OS time zones.
- encodeJSONLine honors the table's LINES TERMINATED BY (the reader parses
  jsonline with it; the writer hardcoded '\n').

P3 — error fidelity / timestamp plumbing:
- The statement timestamp is resolved once per statement in compileInsert,
  preferring defines.StartTS then Compile.startAt (set on every construction
  path, including the internal SQL executor) over per-scope time.Now().
- REPLACE INTO an external table reports the user-facing "cannot
  insert/update/delete from external table" again: read-only external tables
  error directly in ResolveSingleTable (only writable ones emit the
  legacy-fallback sentinel) and bindAndOptimizeReplaceQuery maps the
  remaining sentinel instead of leaking it.

Perf/reuse:
- Encoder hot path: addEscape fast-paths the no-escape case (no copy),
  one reused buffer per writer replaces a fresh bytes.Buffer per batch, and
  JSONLine encodes values directly (pre-encoded key prefixes, strconv into
  scratch, json-compat float formatting) instead of building a
  map[string]interface{} and reflect-marshaling per row — output stays
  byte-identical. cellIsNull -> vector.IsNull; UrlToStageDef+ToPath ->
  stageutil.UrlToPath.

BVT: bit tricky-byte round-trip (comma/quote/backslash/high-byte), NOT NULL
violation, REPLACE error, and DDL error cases for missing uniqueness
directive, AUTO_INCREMENT, and jsonline+bit; the jsonline wide table drops
its bit column. Note: bit bytes that are pure whitespace (e.g. 10) are
written correctly but read back NULL — the external reader TrimSpaces
non-string fields, a pre-existing read-side limitation.

All 105 BVT assertions pass against a freshly built 2-CN docker cluster;
make + static-check clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@aunjgr aunjgr left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All concerns from iamlinjunhong and the self-review addressed: remote ToExternal serialization, SHOW CREATE TABLE, CSV enclosure defaults, JSONLine jsondata, plus P1 fixes (uniqueness enforcement, NOT NULL, pipeline abort, bit encoding). Clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XXL Denotes a PR that changes 2000+ lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants