Generic pre processing of data #163

Open
louispt1 wants to merge 2 commits into version-2 from generic-pre-processing

Conversation

@louispt1
Member

@louispt1 louispt1 commented Apr 22, 2026

Context

Decouples Excel formatting from the packer by introducing ExportDataCollection, a format-agnostic container for scenario export data. This makes it possible to export to multiple formats, such as Excel, Parquet, JSON and CSV.

Implemented changes

  • ExportDataCollection model: A Pydantic-based container that stores export data as pandas DataFrames without format-specific transformations
  • ExcelExporter class: Dedicated Excel export handler that encapsulates all Excel-specific formatting
  • Packer refactoring:
    • collect_export_data(): Collects all export data in format-agnostic structure
    • to_excel(): Delegates to the ExcelExporter
    • Main info restructured from scenarios-as-columns to scenarios-as-rows
  • Removed ScenarioExcelService in favour of the new approach
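To illustrate the "scenarios-as-columns to scenarios-as-rows" restructuring listed above, here is a minimal stdlib-only sketch. It uses plain dicts rather than pandas DataFrames, and the field names (`area_code`, `end_year`) and the helper `scenarios_as_rows` are hypothetical, chosen only for illustration; they are not taken from the pyetm code.

```python
# Hypothetical input: one entry per field, keyed by scenario (scenarios as columns).
by_column = {
    "area_code": {"scenario_1": "nl", "scenario_2": "de"},
    "end_year": {"scenario_1": 2050, "scenario_2": 2040},
}


def scenarios_as_rows(columns):
    """Pivot {field: {scenario: value}} into one row dict per scenario."""
    rows = {}
    for field_name, per_scenario in columns.items():
        for scenario, value in per_scenario.items():
            # Create the row on first sight of a scenario, then fill in the field.
            row = rows.setdefault(scenario, {"scenario": scenario})
            row[field_name] = value
    return list(rows.values())


rows = scenarios_as_rows(by_column)
for row in rows:
    print(row)
```

With one row per scenario, adding a scenario appends a record instead of widening the table, which tends to map more naturally onto row-oriented formats such as CSV or JSON records.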
Usage

Excel exports (unchanged):
packer.to_excel("output.xlsx", include_inputs=True)

New multi-format exports:

export_data = packer.collect_export_data(include_inputs=True)
export_data.main_info.to_csv("output.csv")
# Note: to_parquet needs pyarrow or another Parquet package to be installed.
# I did not include this in the pyetm dependencies.
export_data.inputs.to_parquet("inputs.parquet")

Or serialize to a dict for JSON:

import json

with open("data.json", "w") as f:
    json.dump(export_data.to_dict(), f)
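Because the Parquet export above depends on an optional package, a caller may want to check for one before calling `to_parquet`. The following is a small sketch of such a check using only the standard library; the helper name `available_parquet_engine` is hypothetical and is not part of pyetm. It looks for `pyarrow` and `fastparquet`, the two engines pandas accepts for Parquet output.

```python
from importlib.util import find_spec


def available_parquet_engine():
    """Return the name of an installed Parquet engine, or None if there is none."""
    for name in ("pyarrow", "fastparquet"):
        # find_spec returns None when a top-level package is not installed.
        if find_spec(name) is not None:
            return name
    return None


engine = available_parquet_engine()
if engine is None:
    print("No Parquet engine installed; install pyarrow or fastparquet first.")
else:
    print(f"Parquet export can use: {engine}")
    # With a collected export, one could then call, for example:
    # export_data.inputs.to_parquet("inputs.parquet", engine=engine)
```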

Related

Closes #133
Closes #130

Checklist

  • I have tested these changes
  • I have updated documentation as needed
  • I have tagged the relevant people for review

@louispt1 louispt1 requested a review from noracato April 22, 2026 06:46
@noracato
Member

What is now the conceptual difference between the packer and the exporter? I'm interested in your vision on the overall pyETM architecture with this addition!

@louispt1
Member Author

> What is now the conceptual difference between the packer and the exporter? I'm interested in your vision on the overall pyETM architecture with this addition!

I can see why you ask - maybe it's a bit over-designed. I see the packer as the orchestrator: it collects and packages the individual packs in a format-agnostic way. The Excel exporter only handles the formatting of Excel outputs, but in theory a parquet_exporter or json_exporter (etc.) could be added alongside it to extend the export formats.

I think the packer was doing a lot before, and it may not have been so easy to extend. Hopefully this approach makes it easier to export to alternative formats.
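As a rough illustration of the split described above (one step collects format-agnostic data, and separate exporters handle each format), here is a stdlib-only sketch. Everything in it is hypothetical: `ExportData`, `Exporter`, `JsonExporter`, `CsvExporter` and this stand-in `collect_export_data` are illustrative names rather than the pyetm classes, and plain lists of dicts stand in for pandas DataFrames.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ExportData:
    """Stand-in for a format-agnostic container: named tables of plain row dicts."""
    tables: dict = field(default_factory=dict)


class Exporter(Protocol):
    def export(self, data: ExportData) -> dict:
        """Return a mapping of file name to file content."""
        ...


class JsonExporter:
    def export(self, data: ExportData) -> dict:
        return {"data.json": json.dumps(data.tables, sort_keys=True)}


class CsvExporter:
    def export(self, data: ExportData) -> dict:
        files = {}
        for name, rows in data.tables.items():
            buffer = io.StringIO()
            fieldnames = list(rows[0]) if rows else []
            writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
            files[f"{name}.csv"] = buffer.getvalue()
        return files


def collect_export_data() -> ExportData:
    # The orchestrator's job in this sketch: gather data, apply no formatting.
    return ExportData(tables={"main_info": [{"scenario": "scenario_1", "end_year": 2050}]})


data = collect_export_data()
outputs = {}
for exporter in (JsonExporter(), CsvExporter()):
    outputs.update(exporter.export(data))
```

In this arrangement, supporting another format means adding one more class with an `export` method; the collection step does not change.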

Arguably the packer is still doing too much, or is still too architecturally complex. I think abstracting the 'from excel' methods out as well would mean you could have multiple importers and exporters that end up in the unified packer. However, perhaps that is out of scope for now.
