Refactor: Reduce Complexity of sklearn_serializer.py #48

@Gnpd

Summary

The file openmodels/serializers/sklearn/sklearn_serializer.py has grown large and complex. It mixes serialization logic, deserialization logic, type/dtype mapping, special-case handlers, and a large number of scikit-learn-specific workarounds, which makes the codebase harder to maintain, test, and extend.

Motivation

  • Maintainability: The current file is long and carries many responsibilities, making it difficult to navigate and update.
  • Testability: Isolating logic into smaller, focused modules or classes will make it easier to write targeted unit tests.
  • Extensibility: Reducing complexity will make it easier to add support for new estimators, kernels, or serialization features in the future.
  • Readability: A more modular structure will help new contributors understand and contribute to the codebase.

Suggested Refactoring Tasks

  • Split the file into smaller modules: For example, move loss serialization, kernel serialization, tree serialization, and special-case handlers into their own files or classes.
  • Group related helper functions: Consider grouping helpers (e.g., type/dtype mapping, attribute extraction) into utility modules.
  • Reduce duplication: Identify and refactor repeated patterns (e.g., recursive serialization/deserialization) into reusable functions.
  • Document module boundaries: Add docstrings and comments to clarify the responsibilities of each new module/class.
  • Add or improve tests: Ensure that the refactored code is covered by unit tests, especially for edge cases and custom estimator support.
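
One way the split could look, as a minimal sketch rather than the project's actual design: each specialized concern registers a handler for the types it owns, and a single recursive helper walks nested values so the recursion is not repeated per estimator. Every name below (`register_handler`, `serialize_value`, `RBFKernel`) is hypothetical and chosen for illustration; the stand-in class takes the place of a scikit-learn object, and in a real layout the handlers would sit in their own modules.

```python
from typing import Any, Callable, Dict, Type

# Hypothetical registry mapping a type to the function that serializes it.
_HANDLERS: Dict[Type, Callable[[Any], Any]] = {}


def register_handler(cls: Type) -> Callable:
    """Decorator that registers a serializer for instances of `cls`."""
    def decorator(func):
        _HANDLERS[cls] = func
        return func
    return decorator


def serialize_value(value: Any) -> Any:
    """Single recursive entry point shared by all handlers."""
    # Walk the MRO so a handler for a base class also covers subclasses.
    for cls in type(value).__mro__:
        handler = _HANDLERS.get(cls)
        if handler is not None:
            return handler(value)
    if isinstance(value, dict):
        return {str(k): serialize_value(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [serialize_value(v) for v in value]
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    raise TypeError(f"No handler registered for {type(value).__name__}")


class RBFKernel:
    """Stand-in for a kernel object; not the scikit-learn class."""
    def __init__(self, length_scale: float = 1.0):
        self.length_scale = length_scale


# In a split layout this handler would live in a dedicated kernels module.
@register_handler(RBFKernel)
def _serialize_rbf(kernel: RBFKernel) -> dict:
    return {"type": "RBFKernel", "params": {"length_scale": kernel.length_scale}}


result = serialize_value({"kernel": RBFKernel(2.0), "alphas": (0.1, 0.5)})
print(result)
```

With this shape, the main serializer only needs to import the handler modules and call `serialize_value`, and each handler can be unit-tested in isolation.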

Acceptance Criteria

  • The main sklearn_serializer.py file should be significantly shorter and focused on high-level orchestration.
  • Specialized logic (losses, kernels, trees, etc.) should be moved to dedicated modules or classes.
  • All existing tests should pass, and new tests should be added for any newly isolated logic.
  • The public API and behavior should remain unchanged.
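
To check the "behavior should remain unchanged" criterion, one option is a snapshot (characterization) test: record the serializer's output before the refactor and compare against it afterwards. The sketch below uses only the standard library; `check_against_snapshot`, `serialize_before`, and `serialize_after` are hypothetical stand-ins, not functions from openmodels.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def check_against_snapshot(serialize: Callable[[Any], Any], obj: Any, snapshot: Path) -> bool:
    """Compare serialize(obj) with a stored snapshot, creating it on first run.

    Returns True when the snapshot was just created or the output matches.
    """
    current = json.dumps(serialize(obj), sort_keys=True, indent=2)
    if not snapshot.exists():
        snapshot.write_text(current, encoding="utf-8")  # record pre-refactor output
        return True
    return snapshot.read_text(encoding="utf-8") == current


# Stand-ins for the serializer before and after a refactor (hypothetical).
def serialize_before(params: dict) -> dict:
    return {"type": "Demo", "params": dict(params)}


def serialize_after(params: dict) -> dict:
    # Restructured internally, but expected to produce the same output.
    return {"params": {k: v for k, v in params.items()}, "type": "Demo"}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.json"
    recorded = check_against_snapshot(serialize_before, {"alpha": 0.5}, path)
    unchanged = check_against_snapshot(serialize_after, {"alpha": 0.5}, path)

print(recorded, unchanged)
```

Sorting keys before comparison keeps the check insensitive to dictionary ordering, so only genuine changes in the serialized content cause a mismatch.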

Related file: openmodels/serializers/sklearn/sklearn_serializer.py

Metadata

Labels: enhancement (New feature or request)
Status: Ready
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
