
fix: update to perch-hoplite 1.0.0 API (Deployment → Recording → Window)#871

Merged
max-mauermann merged 6 commits into birdnet-team:birdnet-lib from LimitlessGreen:fix/perch-hoplite-1.0-api-compat
Feb 26, 2026

Conversation


LimitlessGreen (Contributor) commented Feb 16, 2026

Summary

Migrate from the deprecated perch-hoplite API (EmbeddingSource model) to the new Deployment → Recording → Window data model introduced in perch-hoplite v1.0.0.

The previous code used EmbeddingSource, get_embedding_source(), insert_embedding(), and SQLiteUsearchDB (lowercase "s"), all of which have been removed or renamed in perch-hoplite 1.0.0.

Changes

Embedding pipeline (birdnet_analyzer/embeddings/core.py)

  • Rewrite to use insert_deployment() / insert_recording() / insert_window() instead of the removed insert_embedding() + EmbeddingSource
  • Add ghost segment filtering: birdnet pads shorter files in a batch up to max_n_segments, and not all padded segments are masked. The pipeline now additionally checks s_start >= input_durations[i] and clamps s_end = min(s_end, file_dur) to avoid inserting phantom windows
  • Use handle_duplicates="skip" on insert_window() for resume support
  • Fix create_csv_output() to use match_window_ids() + get_window() + get_recording()
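The ghost-segment filtering described above boils down to a small, self-contained helper. This is a minimal sketch, not the PR's actual code: the `(s_start, s_end)` pair list and the standalone `file_dur` argument are illustrative stand-ins for the per-file values birdnet reports via `input_durations`.

```python
def filter_ghost_segments(starts_ends, file_dur):
    """Drop padded ("ghost") windows and clamp the final real one.

    birdnet pads shorter files in a batch up to max_n_segments, so some
    (s_start, s_end) pairs lie entirely past the end of the audio and
    must not be inserted as windows.
    """
    windows = []
    for s_start, s_end in starts_ends:
        if s_start >= file_dur:       # phantom window produced by batch padding
            continue
        s_end = min(s_end, file_dur)  # clamp a partial final window to the file end
        windows.append((s_start, s_end))
    return windows
```

For a 7-second file segmented in 3-second steps, `filter_ghost_segments([(0, 3), (3, 6), (6, 9)], 7.0)` keeps the first two windows and clamps the third to `(6, 7.0)`.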

Model utilities (birdnet_analyzer/model_utils.py)

  • Replace removed model.encode_array() with model.encode_session() + session.run_arrays() (birdnet library API change)
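The session-based encoding pattern might look roughly like the sketch below. Everything here is hypothetical scaffolding: `StubSession`/`StubModel` stand in for the real birdnet model, and the exact signatures of `encode_session()` and `run_arrays()` are assumptions based only on the names mentioned in this PR (including the "result squeezing" noted in the review summary).

```python
import numpy as np

class StubSession:
    """Stand-in for the session object that encode_session() would return."""
    def run_arrays(self, audio_batch):
        # The real session would run the model; here we fabricate
        # per-chunk embeddings with a singleton middle axis.
        return np.zeros((len(audio_batch), 1, 8))

class StubModel:
    """Stand-in for the birdnet model exposing the new session API."""
    def encode_session(self):
        return StubSession()

def embed(model, audio_batch):
    # Old API (removed): model.encode_array(audio_batch)
    # New pattern: open a session, run arrays, squeeze the singleton axis.
    session = model.encode_session()
    result = session.run_arrays(audio_batch)
    return np.squeeze(result, axis=1)
```

With the stub, `embed(StubModel(), np.zeros((4, 48000)))` yields an array of shape `(4, 8)`.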

Search (birdnet_analyzer/search/utils.py, search/core.py)

  • Fix SQLiteUsearchDB → SQLiteUSearchDB casing (renamed in perch-hoplite 1.0)
  • Replace embedding_id with window_id in SearchResult
  • Replace removed get_embedding_source() with get_window() + get_recording()
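The two-step lookup that replaces `get_embedding_source()` can be sketched against a stub database. The field names (`recording_id`, `offset_s`, `filename`) are assumptions chosen for illustration; the real perch-hoplite record types may differ.

```python
from dataclasses import dataclass

@dataclass
class Window:        # stand-in for a perch-hoplite window record
    recording_id: int
    offset_s: float

@dataclass
class Recording:     # stand-in for a perch-hoplite recording record
    filename: str

class StubDB:
    """Minimal fake exposing the new-style lookups."""
    def __init__(self):
        self._recordings = {1: Recording("XC12345.wav")}
        self._windows = {10: Window(recording_id=1, offset_s=3.0)}
    def get_window(self, window_id):
        return self._windows[window_id]
    def get_recording(self, recording_id):
        return self._recordings[recording_id]

def resolve_source(db, window_id):
    # Old API (removed): db.get_embedding_source(embedding_id)
    # New pattern: window -> recording two-step lookup.
    window = db.get_window(window_id)
    recording = db.get_recording(window.recording_id)
    return recording.filename, window.offset_s
```

Here `resolve_source(StubDB(), 10)` resolves window 10 back to `("XC12345.wav", 3.0)`.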

GUI (birdnet_analyzer/gui/search.py, gui/embeddings.py)

  • Same get_window() + get_recording() migration
  • Fix SQLiteUSearchDB casing

Tests (tests/embeddings/test_embeddings.py)

  • Update mock for get_embeddings() to return a proper AcousticFileEncodingResult-like object with segment_duration_s, overlap_duration_s, n_inputs, embeddings, embeddings_masked, inputs, input_durations
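A minimal mock carrying exactly the attributes listed above can be built with `types.SimpleNamespace`. The attribute names come from this PR description; the shapes and values are purely illustrative.

```python
import numpy as np
from types import SimpleNamespace

def make_mock_encoding_result(n_segments=3, dim=8):
    """Mimic an AcousticFileEncodingResult-like object with the
    attributes the updated test expects (names from the PR)."""
    return SimpleNamespace(
        segment_duration_s=3.0,
        overlap_duration_s=0.0,
        n_inputs=1,
        embeddings=np.zeros((1, n_segments, dim)),
        embeddings_masked=np.ones((1, n_segments), dtype=bool),
        inputs=["recording.wav"],
        input_durations=[9.0],
    )
```

A test can then patch `get_embeddings()` to return `make_mock_encoding_result()` instead of the old raw-array mock.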

Testing

  • 373 tests pass, no regressions introduced (7 pre-existing failures in tests/analyze/test_analyze.py and tests/test_utils.py are unrelated to this PR and also fail on the base birdnet-lib branch)
  • tests/embeddings/test_embeddings.py passes with updated mock
  • Verified embedding creation with a real audio dataset (100 recordings → 495 windows)
  • Verified search and CSV export functionality

Copilot AI review requested due to automatic review settings February 16, 2026 17:56

Copilot AI left a comment

Pull request overview

This PR migrates BirdNET-Analyzer from the deprecated perch-hoplite EmbeddingSource API to the new v1.0.0 Deployment → Recording → Window data model. The migration includes API renames (e.g., SQLiteUsearchDB → SQLiteUSearchDB), replacement of removed methods (model.encode_array() → model.encode_session() + session.run_arrays()), and implementation of ghost segment filtering to prevent invalid padded segments from being inserted into the database.

Changes:

  • Migrated embedding pipeline to use new deployment/recording/window hierarchy with improved resume support via handle_duplicates="skip"
  • Fixed model utilities to use new birdnet library encoding session API
  • Updated search and GUI components to retrieve window and recording data using the new perch-hoplite 1.0 API

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/embeddings/test_embeddings.py | Updated mock to return a proper encoding result structure with the required attributes |
| birdnet_analyzer/search/utils.py | Fixed SQLiteUSearchDB casing and migrated from embedding_id to window_id |
| birdnet_analyzer/search/core.py | Updated to use get_window() and get_recording() instead of the removed get_embedding_source() |
| birdnet_analyzer/model_utils.py | Replaced encode_array() with encode_session() + run_arrays() and added result squeezing |
| birdnet_analyzer/gui/search.py | Updated GUI search to use the new window/recording API |
| birdnet_analyzer/gui/embeddings.py | Fixed SQLiteUSearchDB casing in database creation |
| birdnet_analyzer/embeddings/core.py | Comprehensive rewrite to use the deployment/recording/window model with ghost segment filtering |


LimitlessGreen and others added 4 commits February 17, 2026 14:59
Migrate from deprecated perch-hoplite API (EmbeddingSource model) to the
new Deployment → Recording → Window data model introduced in v1.0.0.

Changes:
- embeddings/core.py: Rewrite embedding pipeline to use
  insert_deployment/insert_recording/insert_window instead of
  insert_embedding+EmbeddingSource. Add ghost segment filtering
  for birdnet's padded AcousticFileEncodingResult. Use
  handle_duplicates="skip" for resume support.
- model_utils.py: Replace removed encode_array() with
  encode_session()+run_arrays() API.
- search/utils.py: Fix SQLiteUsearchDB → SQLiteUSearchDB casing,
  replace embedding_id with window_id in SearchResult.
- search/core.py: Use get_window()+get_recording() instead of
  removed get_embedding_source().
- gui/search.py: Same get_window()+get_recording() migration.
- gui/embeddings.py: Fix SQLiteUSearchDB casing.
- tests/embeddings/test_embeddings.py: Update mock to match new
  AcousticFileEncodingResult structure.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
LimitlessGreen force-pushed the fix/perch-hoplite-1.0-api-compat branch from 8e5d479 to 289c346 on February 17, 2026 14:01

LimitlessGreen commented Feb 17, 2026

I rebased it. Just note that this is for the current WIP birdnet-team:birdnet-lib (#867) branch. Since the refactoring to birdnetlib is ongoing, this should not interfere with it.

mschulist (Contributor) commented:

If possible, you will see significant performance improvements using insert_windows_batch instead of insert_window (about a 5x improvement). Perhaps you could do a single batch per file (which would also remove the redundant get_all_recordings calls)?
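The suggested batching can be illustrated with a stub database that counts write calls. `insert_windows_batch` is named in this comment as a real perch-hoplite method, but its signature here, and the counting stub, are assumptions for illustration only.

```python
class CountingDB:
    """Stub DB that counts write calls to contrast the two ingest styles."""
    def __init__(self):
        self.calls = 0
    def insert_window(self, window):
        self.calls += 1          # one round trip per window
    def insert_windows_batch(self, windows):
        self.calls += 1          # one round trip per file

def ingest_per_window(db, files):
    # Old style: one insert_window call per window.
    for windows in files:
        for w in windows:
            db.insert_window(w)
    return db.calls

def ingest_per_file(db, files):
    # Suggested style: one insert_windows_batch call per file.
    for windows in files:
        db.insert_windows_batch(windows)
    return db.calls
```

For two files with 2 and 1 windows, the per-window style issues 3 write calls while the per-file batch style issues 2; with realistic window counts per file the gap is far larger.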

In addition, you'll see a significant speedup if you use the USearch index to perform the ANN/KNN instead of the brute-force approach currently used. This might require changing the interface, because the search metric must be defined at DB creation; however, I imagine most people use the inner-product metric anyway...

LimitlessGreen (Author) commented:

@mschulist Thanks a lot for the pointers here, they were super helpful. I implemented both suggestions and ran benchmarks.

What I changed

  1. I switched embedding writes to insert_windows_batch (instead of per-window inserts).
  2. I added USearch ANN for score_function=dot when the DB metric is IP (with brute-force fallback otherwise).

Benchmark setup

  • 100 WAV files from a representative sample subset
  • Same machine and DB configuration for before/after comparison

Results

| Workload | Insert (before → after) | Insert speedup | Search dot (before → after) | Search speedup |
| --- | --- | --- | --- | --- |
| 30 segments/file (3 runs) | 20.84 s → 20.69 s | 1.01x | 0.1245 s → 0.0031 s | 40.37x |
| 60 segments/file (2 runs) | 85.19 s → 80.84 s | 1.05x | 0.1556 s → 0.0042 s | 36.86x |
| 90 segments/file (1 run) | 189.11 s → 185.40 s | 1.02x | 0.2414 s → 0.0056 s | 42.74x |

Note on score-function interfaces

ANN is metric-bound at DB/index level, while score function is currently selected per query.
So right now I use ANN for compatible combinations (dot + IP), and I keep brute-force fallback for other combinations (cosine/euclidean) to preserve correctness.
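The compatibility rule described above reduces to a small dispatch function. This is a sketch of the decision only: the string labels for score functions, metrics, and backends are hypothetical names, not identifiers from the actual codebase.

```python
def choose_search_backend(score_function, db_metric):
    """Use the USearch ANN index only when the per-query score function
    matches the metric the index was built with; otherwise fall back to
    exact brute-force search to preserve correctness."""
    if score_function == "dot" and db_metric == "ip":
        return "usearch_ann"
    return "brute_force"  # cosine/euclidean queries, or a metric mismatch
```

So a `dot` query against an inner-product index takes the fast ANN path, while `cosine` or `euclidean` queries always fall back to brute force.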

Why insert gain is modest (my current hypothesis)

insert_windows_batch removes some overhead (for example, fewer repeated recording lookups), but the dominant cost still seems to be the per-window DB/index writes and duplicate handling, so the ingest improvement is measurable but small in this setup.

I’m still thinking through the cleanest interface changes for cosine/euclidean (and whether to expose backend choice more explicitly), so I can make that behavior clearer and less surprising.

mschulist (Contributor) commented:

Yeah it is a bit unfortunate that there is so much overhead with insert_windows_batch when checking for duplicates... But at least the indexing is fast!

Josef-Haupt (Member) commented:

Looks good!

max-mauermann (Member) commented:

Looks good to me also.
Thanks for providing this!

max-mauermann merged commit 5cb7939 into birdnet-team:birdnet-lib on Feb 26, 2026