Open
Conversation
…uplication - Remove dead test: delete tests/test_cleanup.py (imported removed madengine.tools.run_models). - Constants: add _get_env_or_creds_or_default() in core/constants.py and refactor NAS_NODES, MAD_AWS_S3, MAD_MINIO, PUBLIC_GITHUB_ROCM_KEY to use it; add tests/unit/test_constants.py. - Path helpers: add utils/path_utils.py with scripts_base_dir_from() and get_madengine_root(); use in container_runner and kubernetes; add tests/unit/test_path_utils.py. - Run-details: add utils/run_details.py with get_pipeline(), get_build_number(), and flatten_tags_in_place(); refactor container_runner and kubernetes to use them; add tests/unit/test_run_details.py. - Deployment: add apply_deployment_config() in config_loader and create_jinja_env() in base; refactor slurm and kubernetes init to use them; add TestApplyDeploymentConfig and tests/unit/test_deployment_base.py. All changes preserve behavior. Unit test suite (157+ tests) passes.
Unit tests (tests/unit/): - Remove duplicate get_rocm_path coverage from test_constants (kept in test_rocm_path). - Consolidate by topic: test_utils (path_utils + run_details), test_deployment (base + common), test_execution (container_runner_helpers + dockerfile_utils), test_orchestration (image_filtering + orchestrator_logic). - Parametrize and merge redundant cases (e.g. falsy inputs, env/default, launcher normalization, timeout behavior). - Drop duplicate test (e.g. gfx_compilation in dockerfile_utils) and trim edge-case bloat; keep important behavior. - Reduce from 16 to 12 unit test files; 203 tests passing. Integration tests (tests/integration/): - Fix 6 failures in multi-GPU-arch tests: use madengine.execution.dockerfile_utils instead of removed DockerBuilder private methods. - Merge error tests into test_errors.py (CLI + unified system, dedupe setup_logging and context serialization). - Merge batch manifest into test_orchestrator_workflows; merge multi_gpu_arch into test_platform_integration. - Reduce from 10 to 7 integration test files; 155 passed, 1 skipped.
- build_run_command(), assert_model_in_perf_csv(),
get_run_live_log_path(), get_timeout_seconds_from_log()
- DEFAULT_CLEAN_FILES for consistent cleanup lists
- Use DEFAULT_CLEAN_FILES (and + extras) in all e2e modules:
test_run_workflows, test_execution_features, test_data_workflows,
test_build_workflows, test_profiling_workflows, test_scripting_workflows
- Parametrize context tests: merge test_dockerfile_picked_on_detected_context_0/1
into test_dockerfile_picked_on_detected_context (ctx_value, expected_performance)
- Parametrize timeout-in-log tests: replace four separate tests with
test_timeout_value_in_log (tags, log_base_name, expected_seconds, extra_args)
- Leave timeout-fire and other e2e tests unchanged; 60 e2e tests still pass
…OCm/madengine into coketaste/refactor-dis-quality
…ults parses each node’s .out, aggregates (e.g. sum for samples_per_second), and writes one row. 2N×8G should report about 2× the 1N×8G throughput.
…t of result collected from each node
…y, to handle single result and mult results
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
1.1 Remove or replace tests/test_cleanup.py
2.1 Constants loading in core/constants.py
2.2 Deployment init and Jinja2 setup (SLURM vs K8s)
2.3 Run-details / perf row construction (K8s + container_runner)
2.4 scripts_base_dir and madengine root