Refactor to a unified v3 to v4 master migrator #284
ParticularlyPythonicBS wants to merge 5 commits into unstable from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. Use the following commands to manage reviews, or the checkboxes below for quick actions.
Walkthrough: Consolidates migration logic into a new master v3→v4 migrator, removes legacy per-version migration scripts, refactors the CLI to call the master API (single-file and directory batch modes), adds a parameterized batch runner, and updates tests and fixtures to exercise the new flows.
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI as Temoa CLI
participant Runner as run_all_v4_migrations.run_migrations
participant Master as master_migration
participant Old as v3 Source
participant Schema as v4 Schema
participant New as v4 Target
User->>CLI: migrate <input_path> --schema <schema> --output <out> [--type]
alt input is directory
CLI->>Runner: run_migrations(input_dir, migration_script, schema_path, ...)
Runner->>Master: invoke master_migration per file (type derived from suffix)
else single file
CLI->>Master: migrate_database(...) or migrate_sql_dump(...)
end
Master->>Schema: load v4 schema
Master->>Old: open/connect/read v3 source
Master->>New: create/populate v4 target
Note over Master: phases — operator-table transforms, mapped table copies, custom transforms, metadata/vacuum
Master-->>CLI: return status / summary
CLI->>User: print completion / errors
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
🚥 Pre-merge checks: ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@temoa/cli.py`:
- Around line 454-459: The directory migration path calls run_migrations(...)
but doesn't pass the CLI silent flag so scan/progress output remains printed;
update the CLI call to forward the silent flag (e.g., run_migrations(...,
dry_run=False, silent=silent)) and, if needed, add a corresponding silent
parameter to the run_migrations function signature and its internal logging so
it suppresses progress/scan prints when silent is true (or alternately make
run_migrations accept a logger parameter and respect its level).
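The flag-forwarding fix above amounts to threading one parameter a level down. A minimal sketch, assuming a simplified `run_migrations` signature (the real Temoa function takes more arguments, and `cli_migrate` is a hypothetical stand-in for the CLI handler):

```python
from pathlib import Path


def run_migrations(
    input_dir: Path, dry_run: bool = False, silent: bool = False
) -> list[str]:
    """Migrate every candidate file in input_dir, printing progress
    only when silent is False."""
    processed: list[str] = []
    for path in sorted(input_dir.glob('*.sqlite')):
        if not silent:
            print(f'Migrating {path.name}...')
        processed.append(path.name)
    return processed


def cli_migrate(input_dir: Path, silent: bool = False) -> list[str]:
    """CLI entry point: forward the user's silent flag instead of dropping it."""
    return run_migrations(input_dir, dry_run=False, silent=silent)
```

With this shape, `silent=True` at the CLI suppresses all per-file progress output without changing the batch-processing behavior.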
In `@temoa/utilities/master_migration.py`:
- Around line 375-388: The current migrate_database opens output_path and writes
the v4 schema before running execute_v3_to_v4_migration, risking a half-written
DB if migration fails; change migrate_database to create and open a temporary
database file (e.g., via tempfile.NamedTemporaryFile or
tempfile.TemporaryDirectory), apply the schema and run
execute_v3_to_v4_migration using con_old and con_new pointed at the temp file,
commit/vacuum and close both connections, then atomically replace the final
output_path with the temp file (os.replace) so no partially migrated DB is left
behind on failure.
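The write-to-temp-then-swap pattern described above can be sketched as follows. This is a simplified illustration, not the actual Temoa migrator: the schema/data work is replaced by a placeholder table, and the temp file is created next to the target so `os.replace()` never crosses a filesystem boundary:

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def migrate_database(source_path: Path, output_path: Path) -> None:
    """Migrate into a temp file, then move it into place atomically so a
    failed run never leaves a half-written database at output_path."""
    # Create the temp file next to the target so os.replace() stays on one
    # filesystem; a cross-device rename would raise OSError (EXDEV).
    fd, tmp = tempfile.mkstemp(suffix='.sqlite', dir=output_path.parent)
    os.close(fd)  # sqlite3 reopens the file by path
    try:
        con_old = sqlite3.connect(source_path)
        con_new = sqlite3.connect(tmp)
        try:
            # Placeholder for: apply the v4 schema, then copy/transform rows.
            con_new.execute('CREATE TABLE migrated (src TEXT)')
            con_new.execute('INSERT INTO migrated VALUES (?)', (str(source_path),))
            con_new.commit()
        finally:
            con_old.close()
            con_new.close()
        os.replace(tmp, output_path)  # atomic swap into the final name
    except Exception:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

On any failure the temp file is deleted and `output_path` is untouched, which is the property the review comment asks for.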
- Around line 240-242: The SQL string in the loop using con_old.execute with an
f-string and row[0]/row[1] is vulnerable to syntax errors when values contain
quotes; change the query to use parameterized SQL (pass the region and tech as
parameters to con_old.execute rather than interpolating into the string) in the
block where the vintage is selected (the for v in con_old.execute(...) loop) so
the DB driver safely escapes values and avoids broken SQL.
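The parameterization point can be shown in a few lines; the table and column names here are illustrative, not the real Temoa schema:

```python
import sqlite3

con_old = sqlite3.connect(':memory:')
con_old.execute('CREATE TABLE Efficiency (region TEXT, tech TEXT, vintage INTEGER)')
# A value containing an apostrophe would break naive f-string interpolation.
con_old.execute('INSERT INTO Efficiency VALUES (?, ?, ?)', ("R'1", 'T1', 2030))

region, tech = "R'1", 'T1'
# Parameterized query: the driver escapes the values, so quotes are safe.
vintages = [
    v
    for (v,) in con_old.execute(
        'SELECT vintage FROM Efficiency WHERE region = ? AND tech = ?',
        (region, tech),
    )
]
```

The equivalent f-string query would emit `WHERE region = 'R'1'`, which is a SQL syntax error.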
In `@temoa/utilities/run_all_v4_migrations.py`:
- Around line 39-50: run_migrations currently calls sys.exit on validation
failures which prevents callers (like temoa.cli) from handling errors; change
the three input checks in run_migrations to raise descriptive exceptions (e.g.,
FileNotFoundError or ValueError) instead of printing and calling sys.exit, and
update the summary/failure reporting logic at the end of the function to either
raise a consolidated exception on overall failure or return a structured result
object (status + error details) so the caller can decide how to handle
termination. Ensure error messages include the invalid Path and keep the
function signature (run_migrations) unchanged so temoa.cli can catch and handle
the exceptions or inspect the returned result.
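A minimal sketch of the raise-instead-of-exit shape, with a structured result the caller can inspect (the real `run_migrations` signature and return contract differ; this only illustrates the error-handling change):

```python
from pathlib import Path


def run_migrations(input_dir: Path, schema_path: Path, silent: bool = True) -> dict:
    """Validate inputs by raising, so a caller such as temoa.cli can catch
    the error instead of the process dying inside sys.exit()."""
    if not input_dir.is_dir():
        raise FileNotFoundError(f'Input directory does not exist: {input_dir}')
    if not schema_path.is_file():
        raise FileNotFoundError(f'Schema file does not exist: {schema_path}')
    failed: list[str] = []
    # ... per-file migration would append failures to `failed` here ...
    return {'status': 'ok' if not failed else 'failed', 'failed_files': failed}
```

The CLI can then decide whether a validation failure is fatal, while programmatic callers get an exception with the offending path in the message.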
- Around line 86-98: The migration subprocess invocation hardcodes 'python3' in
the migration_cmd list, which can bypass the active virtualenv or fail on
platforms without that binary; replace the literal with sys.executable so the
current Python interpreter is used. Update the construction of migration_cmd
(where migration_script, target_file, schema_path, temp_output_file and mig_type
are assembled) to use sys.executable, and ensure you import sys at top of the
module if not already present; leave the call to run_command(result =
run_command(migration_cmd, cwd=Path.cwd())) unchanged.
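The `sys.executable` substitution is a one-line change; a self-contained sketch:

```python
import subprocess
import sys

# sys.executable is the interpreter currently running, so subprocesses stay
# inside the active virtualenv; a literal 'python3' may resolve to a
# different environment or not exist at all (e.g. on Windows).
migration_cmd = [sys.executable, '-c', 'print("running in the same interpreter")']
result = subprocess.run(migration_cmd, capture_output=True, text=True, check=False)
```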
- Around line 77-79: The temp file descriptors returned by tempfile.mkstemp()
are being discarded and leaked when creating temp_output_file and
original_backup_file; capture the returned file descriptor and path (fd, path =
mkstemp(...)), call os.close(fd) immediately to close the open descriptor, then
wrap Path(path) for temp_output_file and original_backup_file so the files can
be reopened later (especially on Windows); keep mig_type logic unchanged.
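The fd-handling fix looks like this in isolation:

```python
import os
import tempfile
from pathlib import Path

# mkstemp() returns (fd, path). Discarding the fd leaks an open descriptor,
# and on Windows the open handle can block later reopening or deletion.
fd, raw = tempfile.mkstemp(suffix='.sqlite')
os.close(fd)  # close immediately; only the path is needed
temp_output_file = Path(raw)

# The file can now be reopened freely, e.g. by a migration subprocess.
temp_output_file.write_text('placeholder')
```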
In `@tests/test_cli.py`:
- Around line 175-189: The test test_cli_migrate_accepts_directory_input only
exercises the empty-directory fast path; create a migratable fixture inside
dummy_dir (e.g., add a small .sql or .sqlite file in the dummy_dir before
calling runner.invoke(app, args)) so the batch migration flow is exercised;
update any expectations if the presence of the file changes output (keep the
assertions that look for 'Batch migrating directory' and include the new file
name in the stdout check) and ensure the fixture file is written using Path
methods on dummy_dir so the test still cleans up correctly.
In `@tests/test_v4_migration.py`:
- Around line 17-24: The test currently only creates a v3.1 SQLite DB (db_v3_1
using SCHEMA_V3_1 and MOCK_DATA) so the direct v3→v4 migration path isn't
exercised; update the master-migration test to be parametrized to run against
both the existing v3.1 fixture and a real v3 fixture (e.g., add SCHEMA_V3 and
MOCK_DATA_V3 or a v3_fixture) by creating a second temporary DB (db_v3_real)
from the real v3 schema/data and invoking the same migration/assertion logic for
both fixtures; ensure the test harness (the parametrized test or loop)
references db_v3_1 and the new db_v3_real so the migrator code paths for v3→v4
are covered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b4213a37-2f78-4dcc-a0a2-096bef61a766
📒 Files selected for processing (5)
- temoa/cli.py
- temoa/utilities/master_migration.py
- temoa/utilities/run_all_v4_migrations.py
- tests/test_cli.py
- tests/test_v4_migration.py
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/test_cli.py (1)
175-189: ⚠️ Potential issue | 🟡 Minor: Directory migration test still misses the real batch-processing path.
This still exercises the empty-directory early-exit branch only, so it doesn’t validate per-file migration behavior in directory mode.
Suggested test adjustment

```diff
 def test_cli_migrate_accepts_directory_input(tmp_path: Path) -> None:
     """Test that the migrate command accepts a directory as input for batch processing."""
     dummy_dir = tmp_path / 'my_dummy_dir'
     dummy_dir.mkdir()
+    fixture_sql = dummy_dir / UTOPIA_SQL_FIXTURE.name
+    shutil.copy2(UTOPIA_SQL_FIXTURE, fixture_sql)
     args = ['migrate', str(dummy_dir)]
     result = runner.invoke(app, args)
     assert result.exit_code == 0
     # Normalize whitespace to handle platform-specific line breaks from rich.print()
     normalized_output = ' '.join(result.stdout.split())
     assert 'Batch migrating directory' in normalized_output
     # Check for the directory name in the original output (paths may be split across lines)
     assert 'my_dummy_dir' in result.stdout
-    assert 'No .sql, .sqlite, or .db files found' in result.stdout
+    assert fixture_sql.name in result.stdout
+    assert 'No .sql, .sqlite, or .db files found' not in result.stdout
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_cli.py` around lines 175 - 189, The test test_cli_migrate_accepts_directory_input currently only exercises the empty-directory early-exit path; modify it to create at least one database/sql file inside dummy_dir (e.g., my_dummy_dir/test.sql or test.db) with minimal content so the CLI runs the per-file batch-processing path, then invoke runner.invoke(app, args) and update assertions to check for per-file migration output (look for messages emitted by the migrate command such as file-specific "migrated" or "processing" text) instead of the "No .sql, .sqlite, or .db files found" message; locate this change around the dummy_dir creation and the runner.invoke(app, args) call in test_cli_migrate_accepts_directory_input.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_cli.py`:
- Around line 252-259: The test's path assertion is too weak because it only
checks tmp_path.name; update the assertion to assert the normalized full
tmp_path string (e.g., tmp_path.as_posix() or str(tmp_path)) appears in the
normalized output instead of tmp_path.name, using the same normalization
approach you applied to result.stdout (clean_output/normalized_output) so path
separators and whitespace are normalized consistently; locate the assertions
around clean_output / normalized_output and replace the tmp_path.name check with
a membership check for the normalized full tmp_path.
---
Duplicate comments:
In `@tests/test_cli.py`:
- Around line 175-189: The test test_cli_migrate_accepts_directory_input
currently only exercises the empty-directory early-exit path; modify it to
create at least one database/sql file inside dummy_dir (e.g.,
my_dummy_dir/test.sql or test.db) with minimal content so the CLI runs the
per-file batch-processing path, then invoke runner.invoke(app, args) and update
assertions to check for per-file migration output (look for messages emitted by
the migrate command such as file-specific "migrated" or "processing" text)
instead of the "No .sql, .sqlite, or .db files found" message; locate this
change around the dummy_dir creation and the runner.invoke(app, args) call in
test_cli_migrate_accepts_directory_input.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0d07b330-5194-4407-862b-3f4c970cdd09
📒 Files selected for processing (1)
tests/test_cli.py
```python
clean_output = re.sub(r'\s+', '', result.stdout)

assert 'SQL dump migration completed' in normalized_output
assert 'Warning: Input directory' in normalized_output
assert str(non_writable_mock_parent) in normalized_output
assert 'is not writable.' in normalized_output
assert 'Saving output to current directory:' in normalized_output
assert str(tmp_path) in normalized_output
assert 'mock_non_writable_input_parent' in clean_output
assert 'is not writable' in normalized_output
assert 'Saving output to current directory' in normalized_output
assert tmp_path.name in clean_output
```
Path assertion became too weak after normalization.
Line 259 now checks only tmp_path.name, which can match unrelated output. Assert the normalized full path instead.
Suggested assertion fix

```diff
- assert tmp_path.name in clean_output
+ assert str(tmp_path).replace(' ', '') in clean_output
```

📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
clean_output = re.sub(r'\s+', '', result.stdout)
assert 'SQL dump migration completed' in normalized_output
assert 'Warning: Input directory' in normalized_output
assert 'mock_non_writable_input_parent' in clean_output
assert 'is not writable' in normalized_output
assert 'Saving output to current directory' in normalized_output
assert str(tmp_path).replace(' ', '') in clean_output
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_cli.py` around lines 252 - 259, The test's path assertion is too
weak because it only checks tmp_path.name; update the assertion to assert the
normalized full tmp_path string (e.g., tmp_path.as_posix() or str(tmp_path))
appears in the normalized output instead of tmp_path.name, using the same
normalization approach you applied to result.stdout
(clean_output/normalized_output) so path separators and whitespace are
normalized consistently; locate the assertions around clean_output /
normalized_output and replace the tmp_path.name check with a membership check
for the normalized full tmp_path.
Actionable comments posted: 7
♻️ Duplicate comments (1)
tests/test_cli.py (1)
257-264: 🧹 Nitpick | 🔵 Trivial: Path assertion remains weak after normalization.
Line 264 checks only `tmp_path.name` in `clean_output`, which could match unrelated substrings. Consider asserting a more specific path pattern:

```diff
- assert tmp_path.name in clean_output
+ # Normalize path for comparison (handles Windows backslashes)
+ normalized_tmp_path = str(tmp_path).replace('\\', '/').replace(' ', '')
+ assert normalized_tmp_path in clean_output or tmp_path.name in clean_output
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_cli.py` around lines 257 - 264, The assertion using only tmp_path.name is too weak and may match unrelated substrings; update the test in tests/test_cli.py to assert a more specific path presence by checking the full temporary path (e.g., str(tmp_path) or tmp_path.as_posix()) or by asserting the basename with surrounding separators (e.g., path separator + tmp_path.name) in clean_output/normalized_output; locate the assertions around variables clean_output, normalized_output and result and replace the tmp_path.name check with a check for the full tmp_path string or a separator-anchored pattern to ensure the actual directory path is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@temoa/utilities/master_migration.py`:
- Around line 124-375: The function execute_v3_to_v4_migration is too large and
should be split into focused helper functions to reduce cyclomatic complexity:
extract the operator-added logic into _migrate_operator_tables(con_old,
con_new), the standard/copy logic into _migrate_standard_tables(con_old,
con_new), the LoanLifetimeTech logic into _migrate_loan_lifetime(con_old,
con_new), all time-related aggregates into _migrate_time_tables(con_old,
con_new), and CapacityFactorProcess into _migrate_capacity_factor(con_old,
con_new); each helper should encapsulate its current try/except and return the
number of rows it migrated (or accept and update a mutable total), and
execute_v3_to_v4_migration should call these helpers in sequence (keeping the
final metadata updates) and sum/print the total.
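The decomposition suggested above can be sketched as a thin orchestrator over per-phase helpers, each returning its row count. This is a toy illustration of the structure only; the helper bodies stand in for the real operator-table, standard-table, and time-table logic:

```python
import sqlite3


def _migrate_standard_tables(
    con_old: sqlite3.Connection, con_new: sqlite3.Connection
) -> int:
    """Copy a table whose shape is unchanged; return rows migrated."""
    rows = 0
    for (val,) in con_old.execute('SELECT val FROM Sample'):
        con_new.execute('INSERT INTO sample (val) VALUES (?)', (val,))
        rows += 1
    return rows


def execute_v3_to_v4_migration(
    con_old: sqlite3.Connection, con_new: sqlite3.Connection
) -> int:
    """Thin orchestrator: run each phase helper in order, sum row counts."""
    phases = (_migrate_standard_tables,)  # real code would list every phase
    total = sum(phase(con_old, con_new) for phase in phases)
    con_new.commit()
    return total
```

Each helper can keep its own try/except, so the orchestrator stays short enough to read at a glance.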
- Around line 111-114: The function creates an intermediate variable mapped_once
just to apply re.sub and lower(), so simplify by inlining: replace the
mapped_once assignments and return with a single return that applies
re.sub(r'__+', '_', ''.join(out)).lower() (or directly return re.sub(...,
to_snake_case(token)).lower() if appropriate) so you remove the unnecessary
mapped_once variable; references: mapped_once, to_snake_case, token, re.sub.
- Around line 408-428: The migrate_sql_dump function currently writes the SQL
dump directly to output_path which can leave a partial file on failure; change
it to write to a temporary file in the same directory (e.g., using
tempfile.NamedTemporaryFile or mkstemp with dir=output_path.parent and a unique
suffix), flush and fsync the temp file after writing the iterdump output from
con_new_in_memory, close it, then atomically replace output_path with
os.replace(temp_path, output_path); ensure the temp is removed on exceptions and
permissions/umask are respected so the final state matches the original
behavior.
- Around line 145-172: The migration builds new_cols and computes op_index =
new_cols.index('operator') but doesn't defensively assert the result; add a
defensive assertion after computing op_index to ensure the operator column is
present and within bounds (e.g., assert 0 <= op_index < len(new_cols), include
new_name in the message) so migrations for tables in OPERATOR_ADDED_TABLES fail
fast on unexpected schema changes rather than producing incorrect row mappings;
reference op_index, new_cols, new_name and OPERATOR_ADDED_TABLES when adding
this check.
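The defensive check is small; a sketch with illustrative column names (the real `new_cols` would come from inspecting the v4 schema):

```python
# Illustrative values standing in for the schema-derived column list.
new_name = 'limit_capacity'
new_cols = ['region', 'period', 'tech', 'operator', 'value', 'units']

op_index = new_cols.index('operator')  # already raises ValueError if absent
# Fail fast with context if a future schema change moves or drops the column.
assert 0 <= op_index < len(new_cols), (
    f"'operator' index {op_index} out of bounds for table {new_name}"
)
```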
In `@temoa/utilities/run_all_v4_migrations.py`:
- Around line 136-140: The raise RuntimeError when failed_files is truthy makes
the following else unnecessary; remove the else block and unindent the print so
that after the if failed_files: raise RuntimeError(...) branch, you check if not
silent: print('All files migrated successfully.') at the same indentation level.
Update the block that references failed_files and silent (the RuntimeError raise
and the print statement) accordingly.
In `@tests/test_v4_migration.py`:
- Around line 19-25: pytest.mark.parametrize currently passes the parameter
names as a single string; change its first argument to a tuple of parameter
names (e.g., ('schema_file', 'mock_data_file')) so pytest recognizes each
parameter correctly; update the call where pytest.mark.parametrize is used
around the test function referencing SCHEMA_V3, MOCK_DATA_V3, SCHEMA_V3_1, and
MOCK_DATA_V3_1 to use the tuple form.
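For reference, pytest also accepts the comma-separated string form for argnames, so the tuple form is a readability preference rather than a correctness fix. A sketch with hypothetical fixture constants standing in for the real schema/data paths:

```python
import pytest

# Hypothetical stand-ins for the real schema/data fixture constants.
SCHEMA_V3, MOCK_DATA_V3 = 'schema_v3.sql', 'migration_v3_mock.sql'
SCHEMA_V3_1, MOCK_DATA_V3_1 = 'schema_v3_1.sql', 'migration_v3_1_mock.sql'


@pytest.mark.parametrize(
    ('schema_file', 'mock_data_file'),  # tuple form: one entry per parameter
    [
        (SCHEMA_V3, MOCK_DATA_V3),
        (SCHEMA_V3_1, MOCK_DATA_V3_1),
    ],
)
def test_v4_migrations(schema_file: str, mock_data_file: str) -> None:
    assert schema_file.endswith('.sql')
    assert mock_data_file.endswith('.sql')
```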
In `@tests/testing_data/migration_v3_mock.sql`:
- Around line 1-30: Add a representative operator-constraint row so the
operator-injection and period-to-vintage branches in master_migration.py (the
OPERATOR_ADDED_TABLES flow) are exercised; e.g., insert a single test row into
MinCapacity or EmissionLimit in the mock SQL (use the same region 'R1', tech
'T1', a vintage/period value present in TimePeriod like 2030, and matching
commodity/units) so the migration code paths that handle operator-added tables
and period->vintage translation (operator-injection and period-to-vintage around
master_migration.py lines 135-172) are covered. Ensure the inserted column names
match the schema for the chosen table (MinCapacity or EmissionLimit) and use
values consistent with existing test rows (region 'R1', tech 'T1', vintage 2030)
so the migration will process it.
---
Duplicate comments:
In `@tests/test_cli.py`:
- Around line 257-264: The assertion using only tmp_path.name is too weak and
may match unrelated substrings; update the test in tests/test_cli.py to assert a
more specific path presence by checking the full temporary path (e.g.,
str(tmp_path) or tmp_path.as_posix()) or by asserting the basename with
surrounding separators (e.g., path separator + tmp_path.name) in
clean_output/normalized_output; locate the assertions around variables
clean_output, normalized_output and result and replace the tmp_path.name check
with a check for the full tmp_path string or a separator-anchored pattern to
ensure the actual directory path is present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 296aa71e-421b-45c9-bc17-8a56fd837bdf
📒 Files selected for processing (9)
- temoa/cli.py
- temoa/utilities/db_migration_v3_1_to_v4.py
- temoa/utilities/db_migration_v3_to_v3_1.py
- temoa/utilities/master_migration.py
- temoa/utilities/run_all_v4_migrations.py
- temoa/utilities/sql_migration_v3_1_to_v4.py
- tests/test_cli.py
- tests/test_v4_migration.py
- tests/testing_data/migration_v3_mock.sql
💤 Files with no reviewable changes (3)
- temoa/utilities/db_migration_v3_to_v3_1.py
- temoa/utilities/db_migration_v3_1_to_v4.py
- temoa/utilities/sql_migration_v3_1_to_v4.py
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
temoa/utilities/run_all_v4_migrations.py (1)
26-36: ⚠️ Potential issue | 🟡 Minor: `silent=True` still emits subprocess logs.
`run_migrations()` now looks like a reusable programmatic entry point, but `run_command()` still prints the command and captured stdout/stderr unconditionally. Silent callers will still get per-file noise.

🔇 Minimal fix

```diff
 def run_command(
-    cmd: list[str], cwd: Path | None = None, capture_output: bool = True
+    cmd: list[str],
+    cwd: Path | None = None,
+    capture_output: bool = True,
+    silent: bool = False,
 ) -> subprocess.CompletedProcess[str]:
     """Helper to run shell commands."""
-    print(f'Executing: {" ".join(cmd)}')
+    if not silent:
+        print(f'Executing: {" ".join(cmd)}')
     result = subprocess.run(cmd, cwd=cwd, capture_output=capture_output, text=True, check=False)
-    if result.returncode != 0 and capture_output:
+    if result.returncode != 0 and capture_output and not silent:
         print(f'COMMAND FAILED (exit code {result.returncode}):')
         print('STDOUT:\n', result.stdout)
         print('STDERR:\n', result.stderr)
     return result
@@
-    result = run_command(migration_cmd, cwd=Path.cwd())
+    result = run_command(migration_cmd, cwd=Path.cwd(), silent=silent)
```

Also applies to: 39-41
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@temoa/utilities/run_all_v4_migrations.py` around lines 26 - 36, run_command currently prints the executed command and captured stdout/stderr unconditionally which breaks callers that expect silence (e.g., run_migrations). Add a boolean parameter (e.g., silent: bool = False) to run_command and use it to suppress all prints: only print the "Executing: ..." line and the failure diagnostics when silent is False; keep existing behavior otherwise. Update callers (notably run_migrations) to pass silent=True when they want no output. Ensure function signature (run_command) and all call sites are updated accordingly to avoid regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@temoa/utilities/master_migration.py`:
- Around line 404-410: The migrate_database function currently calls
sqlite3.connect(source_path) which will silently create a new empty DB if
source_path is missing; add an explicit validation at the top of
migrate_database to verify source_path.exists() and source_path.is_file() (and
optionally that schema_path exists) and raise a clear FileNotFoundError or
ValueError if the input file is missing, before calling
sqlite3.connect(source_path), so a typo in --input fails fast instead of
producing an empty migration.
- Around line 405-407: migrate_database currently creates the temp sqlite file
in the system temp dir (tempfile.mkstemp) and then calls os.replace(temp_path,
output_path), which can fail with EXDEV when output_path is on a different
filesystem; update migrate_database to create the temp file in
output_path.parent (pass dir=output_path.parent to mkstemp or use
NamedTemporaryFile in that dir) so temp_path and output_path are on the same
filesystem, and wrap the os.replace(temp_path, output_path) call in the same
exception-handling/cleanup pattern used by migrate_sql_dump (catch exceptions,
remove temp_path on failure, and re-raise or log appropriately) to ensure atomic
replacement and proper cleanup of temp files.
In `@tests/test_v4_migration.py`:
- Around line 19-26: The test_v4_migrations parametrized fixture currently seeds
V3/V3.1 data but never asserts the operator-table migration path; update the
test to explicitly verify that _migrate_operator_tables() produced the expected
rows by asserting the migrated limit_capacity and limit_emission entries and
that each row contains the injected operator values (match against
SCHEMA_V3/SCHEMA_V3_1 mock data expectations); locate the assertions near the
existing checks for time-table/capacity-factor/metadata and add checks for
presence, correct values of limit_capacity.limit and limit_emission.limit (or
equivalent column names) and their associated operator identifiers.
---
Outside diff comments:
In `@temoa/utilities/run_all_v4_migrations.py`:
- Around line 26-36: run_command currently prints the executed command and
captured stdout/stderr unconditionally which breaks callers that expect silence
(e.g., run_migrations). Add a boolean parameter (e.g., silent: bool = False) to
run_command and use it to suppress all prints: only print the "Executing: ..."
line and the failure diagnostics when silent is False; keep existing behavior
otherwise. Update callers (notably run_migrations) to pass silent=True when they
want no output. Ensure function signature (run_command) and all call sites are
updated accordingly to avoid regressions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: be199f5f-00b8-4e73-80af-e41be7d373a9
📒 Files selected for processing (4)
- temoa/utilities/master_migration.py
- temoa/utilities/run_all_v4_migrations.py
- tests/test_v4_migration.py
- tests/testing_data/migration_v3_mock.sql
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@temoa/utilities/master_migration.py`:
- Around line 179-190: The set custom_handled_old_tables currently unions
OPERATOR_ADDED_TABLES.keys() but intentionally does not list v3.1 names like
LimitCapacity/LimitEmission so they fall through to _migrate_standard_tables()
because those v3.1 tables already include the operator column; add a short
clarifying comment next to the custom_handled_old_tables definition explaining
that v3.1 operator tables (e.g., LimitCapacity, LimitEmission) are intentionally
handled by the standard migration path and thus do not need to be listed in
OPERATOR_ADDED_TABLES.keys() to avoid confusion.
- Around line 430-439: The try block currently closes con_old and con_new before
os.replace(), then the finally block closes them again; remove the explicit
con_old.close() and con_new.close() calls from the try section so the finally
block is the single cleanup path; keep the existing exception handling that
removes temp_path on error and ensure the finally block still unconditionally
(or defensively check for None) closes con_old and con_new to guarantee proper
cleanup.
In `@tests/test_v4_migration.py`:
- Line 30: Rename the variable db_v3_1 in tests/test_v4_migration.py to a
neutral name (e.g., test_db or source_db) to reflect the parametrized test that
runs against both v3 and v3.1 schemas, and update every occurrence (assignments,
path uses, and any function calls or assertions referencing db_v3_1) to the new
name so references like db_v3_1 = tmp_path / 'test_db.sqlite' and subsequent
uses are consistently replaced.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7bb0f9f3-64fc-4d82-a190-8a5fdaff6374
📒 Files selected for processing (4)
- temoa/utilities/master_migration.py
- temoa/utilities/run_all_v4_migrations.py
- tests/test_v4_migration.py
- tests/testing_data/migration_v3_1_mock.sql
```python
custom_handled_old_tables = {
    'MetaData',
    'MetaDataReal',
    'TimeSeason',
    'time_season',
    'time_of_day',
    'time_season_sequential',
    'TimeSeasonSequential',
    'TimeSegmentFraction',
    'LoanLifetimeTech',
    'CapacityFactorProcess',
}.union(OPERATOR_ADDED_TABLES.keys())
```
🧹 Nitpick | 🔵 Trivial
Confirm v3.1 operator tables are intentionally handled via standard table migration.
The custom_handled_old_tables set excludes tables in OPERATOR_ADDED_TABLES.keys() (v3 names like MinCapacity, EmissionLimit). However, v3.1 table names (LimitCapacity, LimitEmission) are not excluded, meaning they will be processed by _migrate_standard_tables().
This appears correct—v3.1 tables already have the operator column, so they can be copied directly without transformation. However, adding a comment would clarify this intentional design:
```python
# Note: v3.1 tables (LimitCapacity, LimitEmission, etc.) are handled here
# since they already have the operator column and don't need transformation.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@temoa/utilities/master_migration.py` around lines 179 - 190, The set
custom_handled_old_tables currently unions OPERATOR_ADDED_TABLES.keys() but
intentionally does not list v3.1 names like LimitCapacity/LimitEmission so they
fall through to _migrate_standard_tables() because those v3.1 tables already
include the operator column; add a short clarifying comment next to the
custom_handled_old_tables definition explaining that v3.1 operator tables (e.g.,
LimitCapacity, LimitEmission) are intentionally handled by the standard
migration path and thus do not need to be listed in OPERATOR_ADDED_TABLES.keys()
to avoid confusion.
```python
        con_old.close()
        con_new.close()
        os.replace(temp_path, output_path)
    except Exception:
        if temp_path.exists():
            os.remove(temp_path)
        raise
    finally:
        con_old.close()
        con_new.close()
```
🧹 Nitpick | 🔵 Trivial
Potential double-close of database connections.
The finally block closes con_old and con_new unconditionally, but these connections are also closed in the try block (lines 430-431) before os.replace(). If migration succeeds, the connections are closed twice. While SQLite handles double-close gracefully, this is unnecessary.
💡 Suggested fix: Remove explicit close in try block

```diff
     con_new.commit()
     con_new.execute('VACUUM;')
     con_new.execute('PRAGMA foreign_keys = 1;')
-    con_old.close()
-    con_new.close()
     os.replace(temp_path, output_path)
 except Exception:
     if temp_path.exists():
         os.remove(temp_path)
     raise
 finally:
     con_old.close()
     con_new.close()
```

Let the finally block handle all cleanup.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@temoa/utilities/master_migration.py` around lines 430 - 439, The try block
currently closes con_old and con_new before os.replace(), then the finally block
closes them again; remove the explicit con_old.close() and con_new.close() calls
from the try section so the finally block is the single cleanup path; keep the
existing exception handling that removes temp_path on error and ensure the
finally block still unconditionally (or defensively check for None) closes
con_old and con_new to guarantee proper cleanup.
| """Test both SQL and SQLite master migrators.""" | ||
|
|
||
| # 1. Create SQLite DB | ||
| db_v3_1 = tmp_path / 'test_db.sqlite' |
🧹 Nitpick | 🔵 Trivial
Rename db_v3_1 to reflect parametrized usage.
The variable db_v3_1 is misleading since this test is now parametrized to run with both v3 and v3.1 schemas. Consider renaming to test_db or source_db for clarity.
💡 Suggested rename

```diff
- db_v3_1 = tmp_path / 'test_db.sqlite'
- with sqlite3.connect(db_v3_1) as conn:
+ source_db = tmp_path / 'test_db.sqlite'
+ with sqlite3.connect(source_db) as conn:
```

And update all subsequent references to db_v3_1 accordingly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_v4_migration.py` at line 30, Rename the variable db_v3_1 in
tests/test_v4_migration.py to a neutral name (e.g., test_db or source_db) to
reflect the parametrized test that runs against both v3 and v3.1 schemas, and
update every occurrence (assignments, path uses, and any function calls or
assertions referencing db_v3_1) to the new name so references like db_v3_1 =
tmp_path / 'test_db.sqlite' and subsequent uses are consistently replaced.
- Created master_migration.py, which streamlines direct structure translation from v3 and v3.1 to v4.
- Refactored run_all_v4_migrations.py to expose a reusable batch migration function.
- Updated cli.py to seamlessly accept directory migrations leveraging the new master hook.
- Fixed scope and schema edge cases, ensuring robust validation across the tests.
…s, and harden testing
…xpanded test coverage
Force-pushed 206d758 to a20dc7c
Consolidates and streamlines the database migration process.
Moves from a multi-step v3 -> v3.1 -> v4 migration to a master migrator script that directly manages structural transformations to the v4 schema, bypassing intermediate states.
Additionally, the command-line interface has been updated to support thread-safe batch directory migrations out of the box.
Closes #197
Summary by CodeRabbit
- New Features
- Refactor
- Bug Fixes
- Tests