[Refactor] make faster construct_scorecard method for xbooster/lgb_constructor.py#10
[Refactor] make faster construct_scorecard method for xbooster/lgb_constructor.py#10RektPunk wants to merge 2 commits intoxRiskLab:mainfrom
construct_scorecard method for xbooster/lgb_constructor.py#10Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideOptimizes construct_scorecard in xbooster/lgb_constructor.py by replacing per-tree looping and concatenation with a single vectorized Pandas/NumPy aggregation and updating the merge keys accordingly. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `xbooster/lgb_constructor.py:297-298` </location>
<code_context>
- # Aggregate indices, leafs, and counts of events and non-events
- df_binning_table = pd.concat([df_binning_table, binning_table], axis=0)
+ # Aggregate indices, leafs, and counts of events and non-events
+ tree_leaf_idx_long = pd.DataFrame(tree_leaf_idx).melt(var_name="Tree", value_name="Node")
+ tree_leaf_idx_long["label"] = np.tile(labels.values, tree_leaf_idx.shape[1])
+ binning_table = (
+ tree_leaf_idx_long.groupby(["Tree", "Node"])["label"]
</code_context>
<issue_to_address>
**issue (bug_risk):** This refactor changes label alignment semantics from index-based to purely positional, which may subtly alter behavior if `labels` has a non-trivial index.
The previous `pd.concat(..., axis=1)` + groupby relied on index-aware alignment between `labels` and `tree_leaf_idx`. Swapping to `labels.values` + `np.tile` assumes pure positional alignment and ignores the label index, so any non-default or filtered index on `labels` could yield silently incorrect event/non-event counts. If index alignment matters, either reset the index on `labels` before using `.values`, or construct the long DataFrame via a concat that preserves the index instead of manual tiling.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
construct_scorecard method for xbooster/lgb_constructor.pyconstruct_scorecard method for xbooster/lgb_constructor.py
|
Great work on these performance optimizations! All your changes from PRs #10, #11, #13, and #14 have been merged into the What's included:
Impact:
The RC branch ( Thank you for these excellent contributions! |
Description
This PR optimizes the
construct_scorecardmethod by replacing the iterative tree-by-tree processing with a vectorized approach using Pandas and NumPy.Previously, the code used a Python loop to iterate through each tree, performing individual grouping and concatenation. This created significant overhead, especially as the number of trees increased. The new implementation leverages
melt()and a singlegroupby()operation to process all trees simultaneously.Key Changes
Performance Benchmark
Measured using
%%timeiton the example ipynb:Summary by Sourcery
Optimize LightGBM scorecard construction by replacing per-tree looping with a vectorized aggregation over all trees and merging results directly on tree/node identifiers.
Enhancements: