
fix(cli): honor --unzip for cached dataset downloads#1086

Merged
stevemessick merged 1 commit into
Kaggle:mainfrom
sridipbasu:fix/cached-unzip
Jul 1, 2026


@sridipbasu

Contributor

Summary

Fixes a bug where kaggle datasets download --unzip did not extract a dataset if the ZIP archive was already cached locally.

Root Cause

The extraction logic in dataset_download_files() was executed only when a new download occurred (downloaded == True).

If the local ZIP was already up to date, download_needed() returned False, causing the extraction block to be skipped entirely, even when the user explicitly requested --unzip.

Fix

Decouple the extraction step from the download step by allowing the unzip logic to execute whenever --unzip is requested, regardless of whether the ZIP was freshly downloaded or already cached.

This preserves the existing download and caching behavior while ensuring cached ZIP files can still be extracted.
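
For illustration only, here is a minimal sketch of the decoupled control flow described above. It is not the diff in this PR: `fetch_archive`, its parameters, and the `fetch` callback are hypothetical stand-ins, and the real `dataset_download_files()` may differ in detail (for example in what it does with the ZIP after extraction).

```python
import zipfile
from pathlib import Path
from typing import Callable


def fetch_archive(
    zip_path: Path,
    download_needed: Callable[[Path], bool],
    fetch: Callable[[Path], None],
    unzip: bool = False,
    force: bool = False,
) -> bool:
    """Download the archive if needed, then extract it when unzip is set.

    Returns True if a download happened. All names are hypothetical.
    """
    downloaded = False
    if force or download_needed(zip_path):
        fetch(zip_path)  # stand-in for the actual network download
        downloaded = True
    else:
        print("Skipping, found more recently modified local copy")

    # Extraction no longer depends on `downloaded`: a cached ZIP is
    # extracted too, as long as the caller asked for it and the file exists.
    if unzip and zip_path.is_file():
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(zip_path.parent)

    return downloaded
```

In the buggy version, the extraction block would sit inside the `if` branch that performs the download, so the cached path could never reach it.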

Reproduction

Before this change:

  1. Download a dataset:

    kaggle datasets download -d owner/dataset
  2. Run:

    kaggle datasets download -d owner/dataset --unzip

Observed

The CLI reports:

Skipping, found more recently modified local copy

The existing ZIP archive is not extracted.

Expected

The cached ZIP should be extracted without requiring a re-download.

Testing

Added regression tests covering:

  • Cached ZIP + --unzip extracts the existing archive without downloading again.
  • Fresh download + --unzip continues to work as before.
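
The shape of the cached-ZIP regression check can be sketched roughly as follows. This is an assumption about how such a test might look, not the test code added in this PR; `check_cached_unzip` and the `DownloadFn` signature are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Callable
from unittest import mock

# Hypothetical signature of the code under test:
# (zip_path, fetch, unzip) -> None
DownloadFn = Callable[[Path, Callable[[Path], None], bool], None]


def check_cached_unzip(download: DownloadFn) -> None:
    """Assert that a cached ZIP is extracted and no download is attempted."""
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate an archive that is already cached locally.
        zip_path = Path(tmp) / "dataset.zip"
        with zipfile.ZipFile(zip_path, "w") as archive:
            archive.writestr("data.csv", "a,b\n1,2\n")

        fetch = mock.Mock(name="fetch")  # stands in for the network call
        download(zip_path, fetch, True)

        # The cached path must not hit the network...
        fetch.assert_not_called()
        # ...but must still extract the existing archive.
        extracted = Path(tmp) / "data.csv"
        assert extracted.is_file()
        assert extracted.read_text() == "a,b\n1,2\n"
```

Passing the download function in as a parameter keeps the check independent of how the CLI wires its API client; a mock in place of the network call makes "without downloading again" an explicit assertion.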

Also manually verified the following scenarios:

  • Normal dataset download
  • Cached dataset + --unzip
  • Forced download + --unzip

Notes

This is a minimal, localized change that preserves the existing caching logic while ensuring the --unzip flag behaves consistently for cached dataset downloads.

@sridipbasu

Contributor Author

Supporting Evidence

I have attached a screenshot showing the reproduced behavior before the fix.

It demonstrates that when a dataset ZIP is already cached locally, running:

kaggle datasets download -d <dataset> --unzip

reports:

Skipping, found more recently modified local copy

but does not extract the existing ZIP archive. Using --force --unzip downloads the archive again and extracts it successfully.

Screenshot 2026-07-01 014056

@stevemessick

Contributor

/gcbrun

@stevemessick stevemessick merged commit bb1228c into Kaggle:main Jul 1, 2026
11 of 12 checks passed