[Refactor] faster get_leafs method for xbooster/lgb_constructor.py#13
[Refactor] faster get_leafs method for xbooster/lgb_constructor.py#13RektPunk wants to merge 1 commit intoxRiskLab:mainfrom
get_leafs method for xbooster/lgb_constructor.py#13Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideRefactors the margin-output branch of get_leafs in lgb_constructor to batch LightGBM per-tree predictions into a NumPy column stack before constructing the DataFrame, eliminating per-column assignment for better performance. Class diagram for updated get_leafs method in lgb_constructorclassDiagram
class LGBConstructor {
+model
+get_leafs(X, output_type, n_trees)
}
class LightGBMModel {
+predict(X, raw_score, start_iteration, num_iteration)
}
LGBConstructor --> LightGBMModel : uses
Flow diagram for optimized margin branch in get_leafsflowchart LR
A[start] --> B["Call get_leafs with X, output_type, n_trees"]
B --> C{output_type == margin}
C -->|no| D["Execute non margin logic (unchanged)"]
D --> Z[return df_leafs]
C -->|yes| E["Initialize tree_results as empty list"]
E --> F["Set i = 0"]
F --> G{ i < n_trees }
G -->|no| H["Stack tree_results with np.column_stack"]
H --> I["Construct df_leafs = DataFrame(stacked, index=X.index, columns=_colnames)"]
I --> Z[return df_leafs]
G -->|yes| J["res = model.predict(X, raw_score=True, start_iteration=i, num_iteration=1)"]
J --> K["Append res to tree_results"]
K --> L["i = i + 1"]
L --> G
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The new
DataFrameconstruction relies on_colnames, which doesn’t appear in this snippet; double-check that_colnamesis defined in this branch and that its length always matchesn_treesto avoid runtime errors or misaligned columns. - Using
index=X.indexchanges behavior compared to the previous implementation (which used the default RangeIndex); ensureXis guaranteed to be a pandas object with anindexattribute, or guard against numpy arrays to avoid attribute errors.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `DataFrame` construction relies on `_colnames`, which doesn’t appear in this snippet; double-check that `_colnames` is defined in this branch and that its length always matches `n_trees` to avoid runtime errors or misaligned columns.
- Using `index=X.index` changes behavior compared to the previous implementation (which used the default RangeIndex); ensure `X` is guaranteed to be a pandas object with an `index` attribute, or guard against numpy arrays to avoid attribute errors.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
Great work on these performance optimizations! All your changes from PRs #10, #11, #13, and #14 have been merged into the What's included:
Impact:
The RC branch ( Thank you for these excellent contributions! |
Description
This PR optimizes the
get_leafsmethod (specifically for output_type="margin") by replacing iterative DataFrame column assignment with the Collect-then-Construct pattern usingnp.column_stack.Key Changes
np.column_stack.Performance Benchmark
Measured using
%%timeiton the example ipynb:Summary by Sourcery
Enhancements:
np.column_stack-based DataFrame construction to speed up margin leaf retrieval.