Optimize MaxText unit and integration test suite runtime#3860

Open
shralex wants to merge 1 commit into main from shralex_test_2

Conversation

Collaborator

@shralex shralex commented May 9, 2026

This pull request optimizes the MaxText TPU/GPU CI test suites to substantially reduce total execution time. It eliminates redundant compilation, graph tracing, and setup latency without affecting functional verification or reducing test coverage.

Classes of Optimizations:

  1. Model Downscaling (Compilation): Scaled down model configurations (embedding dimensions, attention heads, layers) across unit and integration tests to minimize XLA compilation overhead.
  2. Sequence & Step Capping (Execution): Capped sequence lengths at 128 or 512 (down from 1024/8192) and reduced loop step counts across test assertions and runs.
  3. Cached Data & Model Pipelines (Initialization & Compilation): Replaced method-level dataset recreation with class-level lazy caching so each data pipeline is constructed once per suite.
  4. CPU-Only Input Pipeline Tests: Moved input pipeline tests to be cpu_only.
  5. Cached Base Model Results (Compilation & Execution): Added class-level caching of unquantized base-model results in QuantTest to eliminate redundant compilation and forward/backward pass execution across multiple quantization tests.
  6. Local Path & Tokenizer Redirection (I/O & Network):
  • Enforced synthetic datasets and redirected GCS asset paths to local mock directories to eliminate network/GCS latency.
  • Redirected remote Hugging Face tokenizers (google-t5/t5-large, deepseek-ai/DeepSeek-V3, and zephyr-7b-beta) in HF/Grain tests to local pre-saved asset directories (tokenizer.default and qwen3-tokenizer), removing external network request overhead entirely.
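The class-level lazy caching in items 3 and 5 can be sketched with stdlib unittest alone. This is a minimal illustration of the pattern, not the actual MaxText code: `GrainDataTest` and `_build_dataset` are hypothetical stand-ins for the real test classes and their expensive pipeline/model construction.

```python
import unittest


class GrainDataTest(unittest.TestCase):
    """Sketch: build the expensive object once per class, not once per test method."""

    _dataset = None  # cached at class level, shared by all test methods

    @classmethod
    def get_dataset(cls):
        # Lazily construct on first access; subsequent test methods reuse
        # the cached object, skipping the per-method setup latency.
        if cls._dataset is None:
            cls._dataset = cls._build_dataset()
        return cls._dataset

    @classmethod
    def _build_dataset(cls):
        # Placeholder for an expensive step (tokenizer load, pipeline
        # construction, or an unquantized baseline forward pass).
        return list(range(8))

    def test_first_element(self):
        self.assertEqual(self.get_dataset()[0], 0)

    def test_length(self):
        self.assertEqual(len(self.get_dataset()), 8)
```

A lazy classmethod is used rather than `setUpClass` so the cost is paid only if a test actually needs the dataset, and so subclasses that override `_build_dataset` each get their own cache.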

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov Bot commented May 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@shralex shralex force-pushed the shralex_test_2 branch 10 times, most recently from 71c696c to f69ef10 Compare May 9, 2026 17:34
@shralex shralex force-pushed the shralex_test_2 branch 17 times, most recently from 24a5669 to dc08cc1 Compare May 10, 2026 06:16

Labels

None yet

Projects

None yet


1 participant