Optimize MaxText unit and integration test suite runtime#3860

Open
shralex wants to merge 1 commit into main from shralex_test_2

Conversation

Collaborator

@shralex shralex commented May 9, 2026

This pull request optimizes the MaxText TPU/GPU CI test suites to substantially reduce total execution time. It eliminates redundant compilation, graph tracing, and setup latency without affecting functional verification or reducing test coverage.

Classes of Optimizations:

  1. Model Downscaling (Compilation): Scaled down model configurations (embedding dimensions, attention heads, layers) across unit and integration tests to minimize XLA compilation overhead.
  2. Sequence & Step Capping (Execution): Capped sequence lengths at 128 or 512 (down from 1024/8192) and reduced loop step counts across test assertions and runs.
  3. Cached Data & Model Pipelines (Initialization & Compilation): Replaced method-level dataset recreation with class-level lazy caching so each data pipeline is constructed once per suite.
  4. CPU-Only Input Pipeline Tests: Moved input pipeline tests to be cpu_only.
  5. Cached Base Model Results (Compilation & Execution): Added class-level caching of unquantized base-model results in QuantTest to eliminate redundant compilation and forward/backward pass execution across multiple quantization tests.
  6. Local Path & Tokenizer Redirection (I/O & Network):
  • Enforced synthetic datasets and redirected GCS asset paths to local mock directories to eliminate network/GCS latency.
  • Redirected remote Hugging Face tokenizers (google-t5/t5-large, deepseek-ai/DeepSeek-V3, and zephyr-7b-beta) in HF/Grain tests to local pre-saved asset directories (tokenizer.default and qwen3-tokenizer), removing external network request overhead entirely.
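The class-level lazy caching in items 3 and 5 can be sketched with stdlib unittest alone. This is a minimal illustration of the pattern, not the actual MaxText code: `GrainDataTest` and `_build_dataset` are hypothetical stand-ins for the real test classes and their expensive pipeline/model construction.

```python
import unittest


class GrainDataTest(unittest.TestCase):
    """Sketch: build the expensive object once per class, not once per test method."""

    _dataset = None  # cached at class level, shared by all test methods

    @classmethod
    def get_dataset(cls):
        # Lazily construct on first access; subsequent test methods reuse
        # the cached object, skipping the per-method setup latency.
        if cls._dataset is None:
            cls._dataset = cls._build_dataset()
        return cls._dataset

    @classmethod
    def _build_dataset(cls):
        # Placeholder for an expensive step (tokenizer load, pipeline
        # construction, or an unquantized baseline forward pass).
        return list(range(8))

    def test_first_element(self):
        self.assertEqual(self.get_dataset()[0], 0)

    def test_length(self):
        self.assertEqual(len(self.get_dataset()), 8)
```

A lazy classmethod is used rather than `setUpClass` so the cost is paid only if a test actually needs the dataset, and so subclasses that override `_build_dataset` each get their own cache.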

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov Bot commented May 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@shralex shralex force-pushed the shralex_test_2 branch 10 times, most recently from 71c696c to f69ef10 Compare May 9, 2026 17:34
@shralex shralex force-pushed the shralex_test_2 branch 17 times, most recently from 24a5669 to dc08cc1 Compare May 10, 2026 06:16

Labels

None yet

Projects

None yet


1 participant