3 changes: 2 additions & 1 deletion cgo/.gitignore
@@ -1,3 +1,4 @@
*.o
*.o
*.a
*.so
__.SYMDEF*
37 changes: 37 additions & 0 deletions docs/design/writable_external_table.md
@@ -0,0 +1,37 @@
Users can already create external tables, but at the moment an external table is read-only.
This design makes external tables writable. Suppose a user has created an external table T; the user
can then write to the table with

```
INSERT INTO T SELECT * FROM ...
```

An insert into an external table should be handled the same way as an insert into a MatrixOne table.
The query is planned and optimized as usual, but when rows are inserted, instead of writing into a
MatrixOne table, the insert calls an API that adds the rows to the external table. `LOAD` should load
data into the external table using the same API. The API should be invoked in batches, so that
multiple rows are written per call.
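
The batching described above can be sketched as follows. This is a minimal illustration, not the interface used by the implementation: `BatchSink`, `batcher`, `csvSink`, `countingSink`, and `writeRows` are hypothetical names introduced only for this example, and rows are modeled as string slices for simplicity.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
)

// BatchSink is a hypothetical stand-in for the external insert API:
// it receives several rows per call. It must not retain the slice.
type BatchSink interface {
	WriteBatch(rows [][]string) error
}

// csvSink writes each batch as CSV records.
type csvSink struct{ w *csv.Writer }

func (s *csvSink) WriteBatch(rows [][]string) error {
	return s.w.WriteAll(rows) // WriteAll also flushes the writer
}

// countingSink records how often it was called; useful for checking batching.
type countingSink struct{ calls, rows int }

func (c *countingSink) WriteBatch(rows [][]string) error {
	c.calls++
	c.rows += len(rows)
	return nil
}

// batcher buffers rows and hands them to the sink size rows at a time.
type batcher struct {
	sink BatchSink
	size int
	buf  [][]string
}

func (b *batcher) Add(row []string) error {
	b.buf = append(b.buf, row)
	if len(b.buf) >= b.size {
		return b.Flush()
	}
	return nil
}

// Flush sends any buffered rows as one (possibly short) batch.
func (b *batcher) Flush() error {
	if len(b.buf) == 0 {
		return nil
	}
	err := b.sink.WriteBatch(b.buf)
	b.buf = b.buf[:0]
	return err
}

// writeRows pushes all rows through a batcher and flushes the remainder.
func writeRows(sink BatchSink, size int, rows [][]string) error {
	b := &batcher{sink: sink, size: size}
	for _, r := range rows {
		if err := b.Add(r); err != nil {
			return err
		}
	}
	return b.Flush()
}

func main() {
	rows := [][]string{{"1", "a"}, {"2", "b"}, {"3", "c"}}
	if err := writeRows(&csvSink{w: csv.NewWriter(os.Stdout)}, 2, rows); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
	}
}
```

With a batch size of 2, five rows reach the sink in three calls (2, 2, and 1 rows), so the final partial batch is not lost.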

When `INSERT` or `LOAD` handles a large amount of data, it should be able to run on multiple CNs in parallel.
Each CN simply calls the external insert API in parallel, and we assume the writers can write to the
external table without causing race conditions.

At this moment, we do not support `UPDATE` and `DELETE`; we will add these features later.

In the initial implementation, we will only support `INSERT` into CSV files and jsonline files. A writable
external table must have an additional config option `WRITE_FILE_PATTERN=strftime_string`, so that newly inserted
data is written to a new file (or to several new files if there are parallel writers, in which case each pipeline
should create only one file). The `strftime_string` can contain `%` formatting directives, as in strftime.
We will extend strftime with the following:
1. `%nN` is replaced by n random digits.
2. `%U` is replaced by a generated UUID.

The pattern must contain `%U` or `%nN` (enforced at CREATE time) so parallel
writers expand to distinct files; time directives are rendered in UTC.
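
A minimal sketch of how such a pattern could be expanded is shown below. It is an illustration under stated assumptions rather than the implementation: `expandPattern`, `randomDigits`, `newUUID`, and `timeDirectives` are hypothetical names, only a small subset of time directives (`%Y %m %d %H %M %S`) is handled, the stage path is made up, and the uniqueness check runs at expansion time here even though the design enforces it at CREATE time. Note that plain POSIX strftime defines `%U` as the week number; this sketch follows the redefinition above.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"math/big"
	"strings"
	"time"
)

// timeDirectives maps an assumed subset of strftime directives to Go layouts.
var timeDirectives = map[byte]string{
	'Y': "2006", 'm': "01", 'd': "02", 'H': "15", 'M': "04", 'S': "05",
}

var errNoUnique = errors.New("pattern must contain %U or %nN")

// randomDigits returns n random decimal digits.
func randomDigits(n int) (string, error) {
	var sb strings.Builder
	ten := big.NewInt(10)
	for i := 0; i < n; i++ {
		d, err := rand.Int(rand.Reader, ten)
		if err != nil {
			return "", err
		}
		sb.WriteString(d.String())
	}
	return sb.String(), nil
}

// newUUID builds a random (version 4) UUID string from 16 random bytes.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// expandPattern renders time directives in UTC for the given Unix time and
// replaces %nN and %U. It rejects patterns that contain neither of the two.
func expandPattern(pattern string, unixSeconds int64) (string, error) {
	now := time.Unix(unixSeconds, 0).UTC()
	var sb strings.Builder
	unique := false
	for i := 0; i < len(pattern); i++ {
		if pattern[i] != '%' {
			sb.WriteByte(pattern[i])
			continue
		}
		i++ // move past '%'
		n, start := 0, i
		for i < len(pattern) && pattern[i] >= '0' && pattern[i] <= '9' && n < 100 {
			n = n*10 + int(pattern[i]-'0')
			i++
		}
		if i >= len(pattern) {
			return "", errors.New("dangling % at end of pattern")
		}
		d := pattern[i]
		switch {
		case i > start: // a digit count was given, so only N may follow
			if d != 'N' || n == 0 {
				return "", fmt.Errorf("bad directive %q", pattern[start-1:i+1])
			}
			s, err := randomDigits(n)
			if err != nil {
				return "", err
			}
			sb.WriteString(s)
			unique = true
		case d == 'U':
			s, err := newUUID()
			if err != nil {
				return "", err
			}
			sb.WriteString(s)
			unique = true
		case d == '%':
			sb.WriteByte('%')
		default:
			layout, ok := timeDirectives[d]
			if !ok {
				return "", fmt.Errorf("unsupported directive %q", pattern[i-1:i+1])
			}
			sb.WriteString(now.Format(layout))
		}
	}
	if !unique {
		return "", errNoUnique
	}
	return sb.String(), nil
}

func main() {
	pattern := "stage://mystage/out/%Y/%m/%d/part-%U.csv" // hypothetical stage path
	out, err := expandPattern(pattern, time.Now().Unix())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Because every expansion draws fresh random digits or a fresh UUID, two parallel writers expanding the same pattern at the same instant still get different file names, which is why the pattern is required to contain one of the two directives.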

For CSV and jsonline files, the `strftime_string` should point to a valid, writable stage, `stage://...`.





433 changes: 433 additions & 0 deletions docs/design/writable_external_table_impl.md

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions pkg/fileservice/file_service_writer.go
@@ -94,3 +94,17 @@ func (w *FileServiceWriter) Close() error {
	w.Group = nil
	return err
}

// Abort terminates the write without finalizing the target file: the pipe is
// closed with cause, so the in-flight FileService.Write fails and discards its
// partial output (local: the temp file is removed; object stores: the put is
// not completed) instead of persisting a truncated file the way Close would.
func (w *FileServiceWriter) Abort(cause error) {
	_ = w.Writer.CloseWithError(cause)
	_ = w.Group.Wait()
	_ = w.Reader.Close()

	w.Reader = nil
	w.Writer = nil
	w.Group = nil
}
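
The difference between `Close` and `Abort` rests on how Go's `io.Pipe` propagates errors, which the following standalone sketch demonstrates. It assumes nothing about `FileService` itself: `consume` and `runWrite` are hypothetical helpers, and `consume` merely stands in for a reader-side write that would discard its output on a non-nil error.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

var errAborted = errors.New("write aborted")

// consume stands in for the reader side of the pipe. It drains the reader and
// returns the byte count plus any error other than a clean end of stream.
func consume(r io.Reader) (int64, error) {
	return io.Copy(io.Discard, r)
}

// runWrite sends one CSV line through a pipe, then either closes the writer
// normally or closes it with a cause, and reports what the reader side saw.
func runWrite(abort bool) (int64, error) {
	pr, pw := io.Pipe()
	type result struct {
		n   int64
		err error
	}
	done := make(chan result, 1)
	go func() {
		n, err := consume(pr)
		done <- result{n, err}
	}()

	if _, err := pw.Write([]byte("a,b\n")); err != nil {
		return 0, err
	}
	if abort {
		// The reader's next Read returns errAborted instead of io.EOF.
		_ = pw.CloseWithError(errAborted)
	} else {
		// The reader sees io.EOF, which io.Copy treats as success.
		_ = pw.Close()
	}
	res := <-done
	_ = pr.Close()
	return res.n, res.err
}

func main() {
	n, err := runWrite(false)
	fmt.Println("close:", n, "bytes, err =", err)
	n, err = runWrite(true)
	fmt.Println("abort:", n, "bytes, err =", err)
}
```

In both runs the reader receives the same bytes; only the terminating condition differs. A clean close looks like a complete stream, whereas closing with a cause lets the reader side tell a truncated stream apart and decline to finalize it.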
866 changes: 475 additions & 391 deletions pkg/pb/pipeline/pipeline.pb.go

Large diffs are not rendered by default.
