Skip to content

Wavelet tree implementation#40

Open
NikolayMishukov wants to merge 5 commits intomainfrom
wavelet_tree
Open

Wavelet tree implementation#40
NikolayMishukov wants to merge 5 commits intomainfrom
wavelet_tree

Conversation

@NikolayMishukov
Copy link
Copy Markdown
Collaborator

No description provided.

Comment thread include/pixie/utils.h
Comment thread include/pixie/wavelet_tree.h
Comment thread include/pixie/wavelet_tree.h
@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented Apr 20, 2026

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 1
SUGGESTION 1
Issue Details (click to expand)

WARNING

File Line Issue
include/pixie/utils.h 66 alphabet_size == 0 produces an invalid uniform distribution range

SUGGESTION

File Line Issue
include/pixie/wavelet_tree.h 39 Avoid std::function in recursive build path to reduce overhead
Files Reviewed (3 files)
  • include/pixie/wavelet_tree.h - 1 issue
  • scripts/coverage_report.sh - 0 issues
  • src/tests/wavelet_tree_tests.cpp - 0 issues

Reviewed by gpt-5.2-codex-20260114 · 297,555 tokens

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 94.49541% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.05%. Comparing base (7db5cd7) to head (e4bd7a5).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
include/pixie/wavelet_tree.h 93.38% 7 Missing and 2 partials ⚠️
src/tests/wavelet_tree_tests.cpp 97.36% 0 Missing and 2 partials ⚠️
include/pixie/utils.h 83.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #40      +/-   ##
==========================================
+ Coverage   86.17%   87.05%   +0.88%     
==========================================
  Files          11       13       +2     
  Lines        2850     3068     +218     
  Branches      561      604      +43     
==========================================
+ Hits         2456     2671     +215     
+ Misses        257      254       -3     
- Partials      137      143       +6     
Flag Coverage Δ
gcov 87.05% <94.49%> (+0.88%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

size_t end,
std::span<uint64_t> data,
node_index_t parent,
const std::function<size_t(node_index_t)>& get_middle = [](auto) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUGGESTION: Avoid std::function in the hot build path

std::function introduces type-erasure overhead per recursive call. If you can make BuildNode a template on the callable (or pass a small function object by value), you can keep the same behavior while eliminating indirect calls and potential allocations during construction.


namespace pixie {

enum WaveletTreeBuildType { Standard, Huffman };
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to use enum class. In current setup it will conflict with other enums using the same values, with enum class it will not be an issue, you'd need to specify as WaveletTreeBuildType::Standard though. Also please use all uppercase for enums, i.e. STANDARD, HUFFMAN


enum WaveletTreeBuildType { Standard, Huffman };

class WaveletTree {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring description is required for class and public methods

std::vector<node_index_t> leaves_;
std::vector<size_t> permutation_, inverse_permutation_;

node_index_t BuildNode(
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snake_case for methods, i.e. build_node

Comment thread include/pixie/wavelet_tree.h
class WaveletTree {
using node_index_t = size_t;
struct WaveletNode {
static constexpr node_index_t kNil =
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use npos for consistency

node_index_t BuildNode(
size_t begin,
size_t end,
std::span<uint64_t> data,
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the node is not suppose to change data than it should have type std::span<const uint64_t>

Comment thread CMakeLists.txt
gtest_main
${PIXIE_DIAGNOSTICS_LIBS})

add_executable(wavelet_tree_tests
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add in the converage_report.sh and build-test workflow

}

nodes_.emplace_back(middle, std::move(bit_vector), data.size());
auto cut = std::stable_partition(
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check how SDSL constructs WT: it does it in two passes over the data without intermediate buffers. All bitvectors are concotenated (optional improvement)
https://app.kilo.ai/s/605b1323-63b1-430e-be01-69dda2718f71

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants