80 changes: 77 additions & 3 deletions docs/competition_creation.md
Expand Up @@ -7,8 +7,9 @@ public competition-creation API endpoints (kagglesdk 0.1.31+):
- [`kaggle competitions create`](#kaggle-competitions-create)
- [`kaggle competitions pages create`](#kaggle-competitions-pages-create)
- [`kaggle competitions launch`](#kaggle-competitions-launch)
- [`kaggle competitions data update`](#kaggle-competitions-data-update)

All four commands require an authenticated session
All of these commands require an authenticated session
(`kaggle config set username/password` or an API token).

A typical end-to-end host workflow looks like:
Expand All @@ -27,11 +28,14 @@ kaggle competitions create -p ./my-comp
kaggle competitions pages create my-comp-slug --name description -f ./description.md --publish
kaggle competitions pages create my-comp-slug --name rules -f ./rules.md --publish

# 5. Launch the competition (now, or schedule a future UTC time).
# 5. Update the competition data (train.csv, test.csv, sample_submission.csv, ...).
kaggle competitions data update my-comp-slug -p ./data -m "Initial release"

# 6. Launch the competition (now, or schedule a future UTC time).
kaggle competitions launch my-comp-slug --at 2027-01-01T00:00:00Z
```

The four commands are independent — for example, you can call `pages create`
These commands are independent — for example, you can call `pages create`
on a competition that already exists, or use `launch` on a competition created
via the host wizard.

Expand Down Expand Up @@ -340,3 +344,73 @@ kaggle competitions launch my-comp --at 2027-01-01T00:00:00Z

A competition can only be launched once. Subsequent calls will be rejected by
the backend.

---

## `kaggle competitions data update`

Creates a new version of the data files for a competition you host. The
command uploads each file through the standard blob-upload pipeline, then
sends a single request that bundles the resulting upload tokens. Each update
**replaces the prior version's file set in full**: v1 has no per-file "keep
from previous" mode, so include every file you want in the new version.

**Usage:**

```bash
kaggle competitions data update <competition> -p <path> -m "<version notes>" \
[--rerun] [--include-hidden]
```

**Arguments:**

- `<competition>`: The competition slug.

**Options:**

- `-p, --path <path>` (required): Either a **directory** (walked recursively —
every file becomes an upload with its relative path preserved in the API's
`name` field, e.g. `train/images/img1.jpg`), or a **single archive file**
(e.g. a pre-packed `.zip` or `.tar`) uploaded as-is. Sub-directories are
always traversed; hidden entries (see `--include-hidden`) are the only files
skipped by default.
- `-m, --message "<notes>"` (required): Notes describing this version
(e.g. `"Added test set"`).
- `--rerun` (optional): Update the RERUN databundle — the private host-only
data swapped in during rerun scoring. Requires Kaggle admin access for now.
Without this flag, the update targets the PUBLIC databundle (what
participants download).
- `--include-hidden` (optional): Upload hidden files and traverse hidden
sub-directories (names starting with `.` — e.g. `.DS_Store`, `.git/`,
`.gitignore`). Skipped by default so you don't accidentally publish OS
metadata or version-control detritus.
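
The directory walk and hidden-entry skipping described in the options above
can be sketched in plain Python. This is a minimal sketch modelled on the
collection loop in this PR's `competition_data_update`; the helper name
`collect_uploads` is hypothetical and is not part of the `kaggle` package.

```python
import os
from typing import List, Tuple


def collect_uploads(path: str, include_hidden: bool = False) -> List[Tuple[str, str]]:
    """Return sorted (relative_name, full_path) pairs for a file or directory.

    Hypothetical helper for illustration only; it performs no uploads.
    """
    if os.path.isfile(path):
        # A single file (for example a pre-packed archive) keeps its basename.
        return [(os.path.basename(path), path)]

    uploads: List[Tuple[str, str]] = []
    for dirpath, dirnames, filenames in os.walk(path):
        if not include_hidden:
            # Prune in place so os.walk never descends into hidden directories.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            filenames = [n for n in filenames if not n.startswith(".")]
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Use forward slashes so names look the same on every platform.
            rel = os.path.relpath(full, path).replace(os.sep, "/")
            uploads.append((rel, full))
    uploads.sort()
    return uploads
```

Running a helper like this against your data directory before an update is a
cheap way to preview which names would be sent, and to notice dot-files that
would be skipped without `--include-hidden`.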

**Examples:**

```bash
# Update using a directory tree (recurses into sub-folders).
kaggle competitions data update my-comp -p ./data -m "Initial release"

# Update using a pre-packed archive as a single file (useful when you already
# need a zip for other purposes, or for directory-shaped file formats like
# Zarr).
kaggle competitions data update my-comp -p ./data.zip -m "Initial release"

# New version with a bug-fix.
kaggle competitions data update my-comp -p ./data -m "Fix label encoding in train.csv"

# Update the private rerun-scoring data.
kaggle competitions data update my-comp -p ./rerun-data \
-m "Held-out test set" --rerun
```

**A note on directory-shaped file formats:** some formats (Zarr, some
TensorFlow SavedModel layouts, etc.) are on-disk directories that are logically
a single unit. If you pass a directory containing such a format, the recursive
walk uploads each internal chunk as its own file — often what you want for
Zarr, since participants can then stream individual chunks. If you'd rather
keep the format as an opaque single upload, pre-pack it into a `.zip` or
`.tar` and pass that file to `-p` instead.
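
One way to do that pre-packing is Python's standard `shutil.make_archive`.
The following is a minimal sketch, assuming a hypothetical helper name
`pack_directory` and illustrative paths; it only builds the archive and does
not call Kaggle.

```python
import os
import shutil


def pack_directory(src_dir: str, archive_base: str) -> str:
    """Zip the contents of src_dir into archive_base + '.zip' and return its path.

    Hypothetical helper for illustration. Keep archive_base outside src_dir
    so the archive does not try to include itself.
    """
    if not os.path.isdir(src_dir):
        raise ValueError("Not a directory: " + src_dir)
    # root_dir makes the entries in the archive relative to src_dir.
    return shutil.make_archive(archive_base, "zip", root_dir=src_dir)
```

For example, `pack_directory("./volume.zarr", "./packed/volume")` would write
`./packed/volume.zip`, which can then be passed to `-p`. Because a single
archive is uploaded as-is, dot-files inside it (such as Zarr v2's `.zarray`
metadata) travel with it, whereas a directory walk skips them unless
`--include-hidden` is set.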

On success, the command prints the URL of the new data version. The API
response also carries the new `databundle_id` and `databundle_version_id`.
127 changes: 127 additions & 0 deletions src/kaggle/api/kaggle_api_extended.py
Expand Up @@ -101,6 +101,9 @@
    ApiCreateCompetitionPageRequest,
    ApiDeleteCompetitionPageRequest,
    ApiUpdateCompetitionPageRequest,
    ApiCreateCompetitionDataRequest,
    ApiCreateCompetitionDataResponse,
    ApiCompetitionDataFile,
    ApiCompetitionPage,
    ApiCreateCompetitionRequest,
    ApiCreateCompetitionResponse,
Expand Down Expand Up @@ -135,6 +138,7 @@
)
from kagglesdk.competitions.types.competition_enums import (
    CompetitionListTab,
    CompetitionDatabundleType,
    CompetitionPrivacy,
    HostSegment,
    CompetitionSortBy,
Expand Down Expand Up @@ -2598,6 +2602,129 @@ def competition_delete_page_cli(
        if self.competition_delete_page(competition_name, page_name, no_confirm=no_confirm):
            print(f'Page "{page_name}" deleted from competition "{competition_name}".')

    def competition_data_update(
        self,
        competition_name: str,
        path: str,
        version_notes: str,
        rerun: bool = False,
        quiet: bool = False,
        include_hidden: bool = False,
    ) -> ApiCreateCompetitionDataResponse:
        """Update (version) the data files for a competition you host.

        Uploads the files at ``path`` via the blob-upload pipeline and sends
        a CreateCompetitionData request bundling the resulting tokens. Each
        update replaces the prior version's file set in full.

        - If ``path`` is a single file (e.g. a pre-packed .zip or .tar), it is
          uploaded as-is; the file's basename becomes its entry name.
        - If ``path`` is a directory, it is walked recursively: every file
          becomes its own upload with the path relative to ``path`` preserved
          in the API's ``name`` field (e.g. ``train/images/img1.jpg``).
          Sub-directories are always traversed. Hidden entries (names starting
          with ``.``, including ``.DS_Store`` / ``.git`` / ``.gitignore``) are
          skipped by default; pass ``include_hidden=True`` to upload them too.

        Args:
            competition_name (str): The competition name (slug).
            path (str): Path to a directory or a single archive file.
            version_notes (str): Notes describing this version (required).
            rerun (bool): If True, update the RERUN databundle (private
                host-only data used during rerun scoring).
            quiet (bool): Suppress per-file upload progress lines.
            include_hidden (bool): If True, upload hidden files and traverse
                hidden sub-directories. Default False.

        Returns:
            ApiCreateCompetitionDataResponse: url, databundle_id,
                databundle_version_id of the new version.
        """
        if not version_notes or not version_notes.strip():
            raise ValueError("--message/-m version notes are required")
        if not os.path.exists(path):
            raise ValueError("Invalid path: " + path)

        # Collect (relative_name, full_path) tuples first so we can validate
        # and then upload deterministically.
        uploads: List[Tuple[str, str]] = []
        if os.path.isfile(path):
            uploads.append((os.path.basename(path), path))
        else:
            for dirpath, dirnames, filenames in os.walk(path):
                if not include_hidden:
                    # Prune hidden sub-directories in place so os.walk skips them.
                    dirnames[:] = [d for d in dirnames if not d.startswith(".")]
                    filenames = [n for n in filenames if not n.startswith(".")]
                for name in filenames:
                    full = os.path.join(dirpath, name)
                    rel = os.path.relpath(full, path).replace(os.sep, "/")
                    uploads.append((rel, full))
            uploads.sort()

        if not uploads:
            raise ValueError(f"No files found under {path} to upload")

        files: List[ApiCompetitionDataFile] = []
        # TODO: confirm with backend whether competition data should use
        # ApiBlobType.INBOX (used here as the closest catch-all) or whether a
        # dedicated COMPETITION_DATA blob type needs adding.
        with ResumableUploadContext() as upload_context:
            for rel_name, full_path in uploads:
                upload_file = self._upload_file(
                    rel_name, full_path, ApiBlobType.INBOX, upload_context, quiet, resources=None

Review comment (Contributor): @erdalsivri Just to confirm, reusing INBOX here won't cause any problems?

                )
                if upload_file is not None:
                    f = ApiCompetitionDataFile()
                    f.name = rel_name
                    f.token = upload_file.token
                    files.append(f)

        if not files:
            raise ValueError("All file uploads failed; nothing to update")

        with self.build_kaggle_client() as kaggle:
            request = ApiCreateCompetitionDataRequest()
            request.competition_name = competition_name
            request.version_notes = version_notes
            request.files = files
            if rerun:
                request.competition_databundle_type = CompetitionDatabundleType.COMPETITION_DATABUNDLE_TYPE_RERUN
            return kaggle.competitions.competition_api_client.create_competition_data(request)

    def competition_data_update_cli(
        self,
        competition=None,
        competition_opt=None,
        path=None,
        version_notes=None,
        rerun=False,
        quiet=False,
        include_hidden=False,
    ):
        """CLI wrapper for competition_data_update."""
        competition_name = competition or competition_opt
        if competition_name is None:
            competition_name = self.get_config_value(self.CONFIG_NAME_COMPETITION)
            if competition_name is not None and not quiet:
                print("Using competition: " + competition_name)
        if competition_name is None:
            raise ValueError("No competition specified")
        if not path:
            raise ValueError("-p/--path is required (folder or archive file)")
        if not version_notes:
            raise ValueError("-m/--message version notes are required")

        response = self.competition_data_update(
            competition_name=competition_name,
            path=path,
            version_notes=version_notes,
            rerun=rerun,
            quiet=quiet,
            include_hidden=include_hidden,
        )
        print(f'New data version created for "{competition_name}": {response.url}')

    def competition_launch(self, competition_name: str, future_time: Optional[datetime] = None) -> None:
        """Launch a competition you host, optionally at a future UTC time.

Expand Down
64 changes: 64 additions & 0 deletions src/kaggle/cli.py
Expand Up @@ -558,6 +558,66 @@ def parse_competitions(subparsers) -> None:
    parser_competitions_pages_delete._action_groups.append(parser_competitions_pages_delete_optional)
    parser_competitions_pages_delete.set_defaults(func=api.competition_delete_page_cli)

    # Competitions data (group: update)
    parser_competitions_data = subparsers_competitions.add_parser(
        "data",
        formatter_class=argparse.RawTextHelpFormatter,
        help=Help.command_competitions_data,
    )
    subparsers_competitions_data = parser_competitions_data.add_subparsers(title="commands", dest="command")
    subparsers_competitions_data.required = True
    subparsers_competitions_data.choices = Help.entity_data_choices

    # Competitions data update
    parser_competitions_data_update = subparsers_competitions_data.add_parser(
        "update",
        formatter_class=argparse.RawTextHelpFormatter,
        help=Help.command_competitions_data_update,
    )
    parser_competitions_data_update_optional = parser_competitions_data_update._action_groups.pop()
    parser_competitions_data_update_optional.add_argument(
        "competition", nargs="?", default=None, help=Help.param_competition
    )
    parser_competitions_data_update_optional.add_argument(
        "-c", "--competition", dest="competition_opt", required=False, help=argparse.SUPPRESS
    )
    parser_competitions_data_update_optional.add_argument(
        "-p",
        "--path",
        dest="path",
        required=True,
        help=(
            "Path to upload. May be either a directory (walked recursively; "
            "sub-directory paths are preserved in each file's name) or a "
            "single archive file (e.g. a pre-packed .zip / .tar), which is "
            "uploaded as-is."
        ),
    )
    parser_competitions_data_update_optional.add_argument(
        "-m",
        "--message",
        dest="version_notes",
        required=True,
        help='Notes describing this version (e.g. "Added test set").',
    )
    parser_competitions_data_update_optional.add_argument(
        "--rerun",
        dest="rerun",
        action="store_true",
        help="Update the RERUN databundle (private host-only data used during rerun scoring).",
    )
    parser_competitions_data_update_optional.add_argument(
        "--include-hidden",
        dest="include_hidden",
        action="store_true",
        help="Include hidden files and directories (names starting with '.'). Skipped by default.",
    )
    parser_competitions_data_update_optional.add_argument(
        "-q", "--quiet", dest="quiet", action="store_true", help=Help.param_quiet
    )
    parser_competitions_data_update._action_groups.append(parser_competitions_data_update_optional)
    parser_competitions_data_update.set_defaults(func=api.competition_data_update_cli)

    # Competitions launch (publish now, or schedule for a future UTC time)
    parser_competitions_launch = subparsers_competitions.add_parser(
        "launch", formatter_class=argparse.RawTextHelpFormatter, help=Help.command_competitions_launch
Expand Down Expand Up @@ -2117,6 +2177,7 @@ class Help(object):
"replay",
"logs",
"pages",
"data",
"launch",
"init",
"create",
Expand Down Expand Up @@ -2182,6 +2243,7 @@ class Help(object):
    forums_topics_choices = ["list", "show"]
    entity_topics_choices = ["list", "show"]
    entity_pages_choices = ["list", "create", "update", "delete"]
    entity_data_choices = ["update"]
    config_choices = ["view", "set", "unset"]
    auth_choices = ["login", "print-access-token", "revoke"]

Expand Down Expand Up @@ -2257,6 +2319,8 @@ class Help(object):
    command_competitions_pages_create = "Create a new page on a competition you host"
    command_competitions_pages_update = "Update fields on an existing competition page"
    command_competitions_pages_delete = "Delete a page from a competition you host"
    command_competitions_data = "Manage a competition's data files"
    command_competitions_data_update = "Update (version) the data files for a competition you host"
    command_competitions_launch = "Launch a competition you host, optionally at a future UTC time"
    command_competitions_init = "Initialize folder with a competition-metadata.json template"
    command_competitions_create = "Create a new competition from competition-metadata.json"
Expand Down