Add import_format to S3/GCS import data sources #21

Merged
juliavallina merged 2 commits into tinybirdco:main from filias:add-s3-gcs-import-format
Jun 29, 2026

@filias filias commented Jun 29, 2026

Contributor

What

S3 and GCS import data sources accept an optional import_format (csv/ndjson/parquet), emitted as IMPORT_FORMAT in the generated .datasource, and round-tripped by the datafile parser and the TypeScript-migration emitter.
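The emission behaviour described above could be sketched roughly as follows. This is an illustration only: `S3ImportConfig` and `generate_import_config` are hypothetical stand-ins for the SDK's `S3Config` and `_generate_import_config`, and every setting name other than `IMPORT_FORMAT` is an assumption about the datafile format rather than something shown in this PR.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# The three values named in the PR description.
ImportFormat = Literal["csv", "ndjson", "parquet"]


@dataclass
class S3ImportConfig:
    """Hypothetical stand-in for the SDK's S3Config."""

    connection: str
    bucket_uri: str
    from_timestamp: Optional[str] = None
    import_format: Optional[ImportFormat] = None


def generate_import_config(cfg: S3ImportConfig) -> str:
    """Render the import settings block of a .datasource file (sketch)."""
    lines = [
        f"IMPORT_CONNECTION_NAME {cfg.connection}",  # assumed setting name
        f"IMPORT_BUCKET_URI {cfg.bucket_uri}",  # assumed setting name
    ]
    # Optional fields are emitted only when set, so existing datasources
    # that never specify a format generate exactly the same output as before.
    if cfg.from_timestamp is not None:
        lines.append(f"IMPORT_FROM_TIMESTAMP {cfg.from_timestamp}")  # assumed
    if cfg.import_format is not None:
        lines.append(f"IMPORT_FORMAT {cfg.import_format}")
    return "\n".join(lines)
```

Treating `import_format` like the other optional fields keeps the change backward compatible: leaving it unset produces no `IMPORT_FORMAT` line at all.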

Why

The S3/GCS connector infers file format from the extension. Files whose extension doesn't imply the format — e.g. NDJSON delivered as .log (Fastly's S3 logging hard-codes the .log suffix) — fail import with Format not supported. IMPORT_FORMAT is already supported by the platform (tb datasource create --s3-format; the datafile import_format param); this exposes it in the SDK so generated datasources can set it.
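To make the failure mode concrete, here is a minimal sketch of extension-based inference with an explicit override. The connector's real inference code is not part of this PR, so the `resolve_format` function and the `EXTENSION_FORMATS` table are assumptions made for illustration; only the `Format not supported` message and the three format names come from the description above.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed extension table; the platform's real mapping may differ.
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".ndjson": "ndjson",
    ".parquet": "parquet",
}


def resolve_format(key: str, import_format: Optional[str] = None) -> str:
    """Pick the file format for an object key (hypothetical helper)."""
    # An explicit format always wins over the file extension.
    if import_format is not None:
        return import_format
    suffix = PurePosixPath(key).suffix.lower()
    try:
        return EXTENSION_FORMATS[suffix]
    except KeyError:
        # Mirrors the error quoted in the PR description.
        raise ValueError("Format not supported") from None
```

Under these assumptions, `resolve_format("logs/2026-06-29.log")` raises, while `resolve_format("logs/2026-06-29.log", "ndjson")` returns `"ndjson"`, which is the Fastly case described above.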

Changes

Mirrors the existing from_timestamp field end-to-end, for both S3 and GCS:

  • import_format on S3Config / GCSConfig (schema), emitted by _generate_import_config.
  • Migrate parity: parsed in parse_datasource (IMPORT_FORMAT), carried on the S3/GCS migrate models, emitted by emit_ts.
  • Tests: generation (set + unset), GCS generation, datafile parse, and a parse→emit round-trip.
  • CHANGELOG entry.
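The parse→emit round-trip listed above might look something like the sketch below. The names `parse_import_settings` and `emit_ts_options`, and the camelCase TypeScript keys, are hypothetical; the SDK's actual `parse_datasource` and `emit_ts` are not shown in this conversation and may be structured quite differently.

```python
import re
from typing import Dict

# Matches lines such as "IMPORT_FORMAT ndjson".
_SETTING = re.compile(r"^(IMPORT_[A-Z_]+)\s+(.+?)\s*$")

# Hypothetical mapping from datafile settings to TypeScript option names.
_TS_KEYS = {
    "IMPORT_FROM_TIMESTAMP": "fromTimestamp",
    "IMPORT_FORMAT": "importFormat",
}


def parse_import_settings(datafile: str) -> Dict[str, str]:
    """Collect IMPORT_* settings from datafile text (sketch)."""
    settings: Dict[str, str] = {}
    for line in datafile.splitlines():
        match = _SETTING.match(line.strip())
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def emit_ts_options(settings: Dict[str, str]) -> str:
    """Emit a TypeScript-style object literal for the optional fields."""
    parts = [
        f'{ts_key}: "{settings[name]}"'
        for name, ts_key in _TS_KEYS.items()
        if name in settings
    ]
    return "{ " + ", ".join(parts) + " }" if parts else "{}"
```

A round-trip test in this style would parse a datafile containing `IMPORT_FORMAT ndjson` and check that the emitted options still carry `ndjson`, which is the parity the bullet list describes.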

make check passes locally (ruff, format, mypy, pytest 133 passed, gitleaks).

filias and others added 2 commits June 29, 2026 09:32
S3 and GCS import data sources accept an optional import_format
(csv/ndjson/parquet), emitted as IMPORT_FORMAT in the generated
.datasource. This lets you ingest files whose extension does not imply
the format (for example NDJSON delivered as .log), which otherwise fail
with "Format not supported".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Mirror from_timestamp end-to-end: parse IMPORT_FORMAT, carry it on the
S3/GCS migrate models, emit it from emit_ts, and add parse + round-trip
tests. Keeps generator/parser/emitter parity for the new field.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@juliavallina juliavallina self-requested a review June 29, 2026 10:20

@juliavallina juliavallina left a comment

Contributor

This is great. Thanks for contributing. I'll merge it and release a new version (0.4.0).

@juliavallina juliavallina merged commit ac71b2a into tinybirdco:main Jun 29, 2026
5 checks passed

2 participants