feat: add NIP-50 support #160
|
I've been working on testing this with @dskvr , have some feedback: My relay has ~20m events, so this is a good test of the indexing functionality. We ran into some troubles with indexing (it stalled out after ~8m events), so @dskvr added some additional improvements in I started indexing the db with this config: Indexing started running great, I came back this morning and my logs are getting spammed with:
Where the counter never increments. It just keeps sending this same log over and over. I tried It's possible this is a red herring log, where because of my My On the bright side, query performance is great. Querying I think this PR is on the right track here, just needs some tweaking on rebuilding the index on large datasets. Just my 2 sats. |
|
This is very impressive, thank you! Sounds like more testing is necessary, but yes this looks broadly like it's on the right track. |
|
@leesalminen Are you still running this branch, if so, any issues other than the debugging output bug? @hoytech The only issue I am aware of is that when interrupting an indexing operation, the debugging output is not correct when resuming. Also, there needs to be a method to destroy the index. |
@leesalminen Bugs you reported in |
Adds ISearchProvider interface, NoopSearchProvider stub, SearchProvider factory (makeSearchProvider()), and KindMatcher for kind-range filtering. RelayServer grows a unique_ptr<ISearchProvider> searchProvider field plus searchIndexerThread/searchIndexerRunning members. RelayWebsocket advertises NIP-50 in supported_nips when the provider is healthy. QueryScheduler gains the searchProvider field to pass through to DBQuery. Initialization order in cmd_relay.cpp ensures searchProvider is set before websocket threads start, avoiding any data race. Co-Authored-By: sandwich <dskvr@users.noreply.github.com>
Full LMDB-backed full-text search implementation. Tokenizer.h splits text into normalized lowercase tokens (2-48 chars). LmdbSearchProvider.h implements BM25 scoring over SearchIndex (DUPSORT inverted index with packed levId:48/tf:16 postings) and SearchDocMeta (per-doc len+kind metadata). Supports configurable candidate ranking strategies and recency boost. Uses lmdb::from_sv<uint64_t>/to_sv<uint64_t> throughout for alignment-safe LMDB value reads/writes (MERGE-03: eliminates all reinterpret_cast UB). Includes dup-detection guard to prevent re-indexing already-indexed events and batch-process logging. Co-Authored-By: sandwich <dskvr@users.noreply.github.com>
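The tokenization rule described in this commit (normalized lowercase tokens, 2-48 chars) might look roughly like the sketch below. It is not the contents of Tokenizer.h: the name `tokenizeSketch` is hypothetical, and the code assumes ASCII-only input split on non-alphanumeric bytes, whereas the real tokenizer may handle UTF-8 and punctuation differently.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split text into lowercase alphanumeric tokens and
// keep only those whose length is within [minLen, maxLen].
std::vector<std::string> tokenizeSketch(std::string_view text,
                                        std::size_t minLen = 2,
                                        std::size_t maxLen = 48) {
    std::vector<std::string> out;
    std::string cur;

    // Emit the current token if its length is in range, then reset it.
    auto flush = [&] {
        if (cur.size() >= minLen && cur.size() <= maxLen) out.push_back(cur);
        cur.clear();
    };

    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            cur.push_back(static_cast<char>(std::tolower(c)));
        } else {
            flush();  // any non-alphanumeric byte ends the token (assumption)
        }
    }
    flush();
    return out;
}
```

Under these assumptions, single-character words and runs longer than 48 characters never reach the inverted index.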
SearchRunner.h integrates ISearchProvider with DBQuery for NIP-50 query execution. RelayWriter passes searchProvider to writeEvents() and includes a search indexing loop after write commit. RelayCron registers search index cleanup hooks for event expiration. cmd_relay.cpp initializes the search provider via makeSearchProvider() before websocket threads start, starts/joins the searchIndexer background thread. Co-Authored-By: sandwich <dskvr@users.noreply.github.com>
Adds three search maintenance commands (auto-discovered by golpe):
- search-reindex: catch-up indexer with checkpoint support, manual levId override, and batch-progress logging
- search-set-state: manually set lastIndexedLevId and indexVersion in SearchState table
- search-index-stats: report index size (token count, doc count, table sizes)

cmd_delete.cpp gains search index cleanup to remove indexed events on manual deletion.

Co-Authored-By: sandwich <dskvr@users.noreply.github.com>
golpe.yaml: add SearchIndex (DUPSORT inverted index), SearchDocMeta
(per-doc BM25 metadata, MDB_INTEGERKEY), and SearchState (index progress
tracking) tables.
src/apps/relay/golpe.yaml: add 13 relay__search__* config keys covering
enabled flag, backend, indexedKinds, BM25 ranking weights, candidate
ranking strategy, and query limits. Preserves all upstream additions:
relay__auth__enabled, relay__auth__serviceUrl, relay__maxTagsPerFilter,
relay__filterValidation__* block.
strfry.conf: add search { } config block after filterValidation block.
DBQuery.h: integrate SearchRunner and ISearchProvider; constructor accepts
optional searchProvider parameter; process() dispatches to searchRunner
when filter has search field and provider is healthy.
ActiveMonitors.h: add hasSearch() guard to skip search-only filters in
the monitor fast path.
Co-Authored-By: sandwich <dskvr@users.noreply.github.com>
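The candidate ranking strategy and query limits mentioned in this commit can be illustrated with a small sketch. It assumes the over-fetch budget is `limit × overfetchFactor` bounded by `maxCandidateDocs`, and that the "terms-tf-recency" order sorts each component in descending order, as the PR description states. The struct `CandidateSketch` and both function names are hypothetical and are not taken from the PR's code.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical candidate record; field names are illustrative only.
struct CandidateSketch {
    uint64_t levId;
    uint32_t matchedTerms;  // number of distinct query terms matched
    uint32_t totalTf;       // aggregate term frequency across matched terms
    uint64_t createdAt;     // event timestamp, used as the recency signal
};

// Over-fetch budget: limit x overfetchFactor, bounded by maxCandidateDocs.
uint64_t candidateBudget(uint64_t limit, uint64_t overfetchFactor,
                         uint64_t maxCandidateDocs) {
    return std::min(limit * overfetchFactor, maxCandidateDocs);
}

// "terms-tf-recency" order: compare (matchedTerms, totalTf, createdAt)
// lexicographically, each component descending.
void rankTermsTfRecency(std::vector<CandidateSketch>& cands) {
    std::sort(cands.begin(), cands.end(),
              [](const CandidateSketch& a, const CandidateSketch& b) {
                  return std::tie(a.matchedTerms, a.totalTf, a.createdAt) >
                         std::tie(b.matchedTerms, b.totalTf, b.createdAt);
              });
}
```

With `limit = 100`, `overfetchFactor = 5` and `maxCandidateDocs = 1000`, the budget is 500 candidates; raising the limit to 500 hits the 1000 cap.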
filters.h: add std::optional<std::string> search field and search key parsing to NostrFilter constructor; add hasSearch() predicate. Preserves upstream's try-catch wrapping for filter parse errors (MERGE-05), and relay__maxTagsPerFilter config key replacing hardcoded limit. events.h/events.cpp: merge writeEvents() signature to accept both logLevel (uint64_t, replaces upstream's bool logDeletions) and ISearchProvider *searchProvider. Preserves upstream's a-tag deletion handling for kind-5 parameterized replaceable events (parseATag call, replaceDeletion index). Adds searchProvider->deleteEvent() in the deletion loop for search index consistency. RelayReqWorker.cpp: pass searchProvider to QueryScheduler so DBQuery receives it at construction time for search-aware query dispatch. Co-Authored-By: sandwich <dskvr@users.noreply.github.com>
Adds bench/scenarios/ (small.yml, medium.yml) for 10k and 1M event benchmark scenarios, and bench/scripts/ (prepare.sh, run.sh, report.py, sysinfo.sh) for reproducible benchmark runs with search enabled/disabled. bench/SCENARIOS.md documents the methodology. Co-Authored-By: sandwich <dskvr@users.noreply.github.com>
|
Correction to my prior comment: the squashed branch from earlier today silently regressed several pieces of upstream evolution (the rebase used a diff-apply approach that overwrote upstream's later changes in non-conflicting files). The PR has now been re-rebased using cherry-pick onto current Verification on the current branch:
Apologies for the noise; the diff against |
Testing report from production relay (~26M events, 52 GiB LMDB, 2 weeks uptime)

Tested the PR on a live nostr relay over ~2 weeks. Lots of good news, some real bugs, and (importantly) a clear workaround that gets writes flowing on prod again.

Build/env:
TL;DR
✅ What works well
🐛 Bug 1 —
|
| Date | Signal | Signature |
|---|---|---|
| May 7 | SIGABRT | free(): double free detected in tcache 2 |
| May 15 | SIGABRT | free(): double free detected in tcache 2 |
| May 17 | SIGSEGV | (no extra info) |
| May 21 | SIGABRT | free(): double free detected in tcache 2 |
The May 7 crash happened with relay.search.enabled = false too, so the bug isn't strictly gated by the flag — Search provider initialized: backend=lmdb enabled=0 healthy=1 shows the provider inits regardless of enabled, so shared code may still be reachable.
The SIGSEGV on May 17 (vs. the otherwise consistent SIGABRT) suggests broader memory-management issues, not a single specific double-free site. Cadence: roughly one crash every 2-3 days under steady prod traffic.
No core dumps captured — this host had ulimit -c 0 and no systemd-coredump. Happy to re-run with cores enabled if a backtrace would help.
🐛 Bug 3 — wildly inconsistent no-match query latency
Searching for unique tokens not in the index returned EOSE with 0 events but at very different speeds across 5 trials:
marker=klaudejw50ahbz → EOSE in 20,008 ms (0 results)
marker=klaudewlutbdvg → EOSE in 192 ms (0 results)
marker=klaudeefplccci → EOSE in 16,572 ms (0 results)
marker=klaude34rlwdk8 → EOSE in 191 ms (0 results)
For 6-char token prefixes that should miss the inverted index entirely, an alternating ~190 ms vs ~17–20 s pattern is surprising. Could be tokenizer behavior on specific prefixes, posting-list fanout, or a cold-cache effect. From an end-user perspective, 20s for zero results is rough.
🐛 Bug 4 — search indexer wedges the Writer thread after catch-up
This is the show-stopper for us on prod, but the cause is now well-isolated:
Symptom: with relay.search.enabled = true and the indexer caught up to head, the Writer thread goes silent shortly after restart:
- Right after restart: Writer logs a flurry of `Inserted event` lines for ~10–20 seconds.
- Then: Writer log lines drop to 0–3 per minute on a relay that should be ingesting many events per second.
- DB size stops growing.
- New publishes from external clients (tested from a separate machine to `wss://no.str.cr`) get no `OK` response, 30 s timeout. The events never appear in `[Writer]` logs.
- Reads/queries continue to work normally throughout.
Isolation:
- Initially suspected our `writevalidate.js` policy plugin (it has a known sin: silently `continue`-ing on JSON parse errors in its catch block, which would leave strfry waiting forever for a policy response that never comes). Disabled the plugin entirely and restarted → same wedge symptoms appeared in the same timeframe. Not the plugin.
- Then disabled search (`relay.search.enabled = false`) and restarted → writes flow normally: 281 `[Writer]` log lines in 100 s vs. 3 lines in 90 s with search enabled. External publish from a separate machine gets `OK accepted=true` in ~25 ms after WS open.
So the indexer thread, in steady state after catch-up completes, appears to be contending with the Writer thread for the LMDB write txn and the writer either deadlocks or is starved out. The mutual exclusivity of search vs. writes makes the PR unshippable for prod as-is, but the failure mode is well-defined and the workaround is trivial (just don't turn search on).
Misc observations / doc asks
- `relay.search.enabled = false` silently no-ops the `search` filter field instead of `CLOSED`-ing the REQ. With search disabled, `["REQ", id, {"search": "foo", "limit": 5}]` returned 5 random recent events as if no filter were present. Most relays would close the sub with an "unsupported" message; this behavior could confuse NIP-50 clients into thinking the relay supports search. Either advertise it correctly in NIP-11 or `CLOSED` the sub when search is off.
- `Search provider initialized: ... enabled=0 healthy=1` appears even when `relay.search.enabled = false`. Confirm whether the disabled-path is truly inert vs. just gates queries.
- Reindex throughput: ~6,500 events/sec offline (~1.6 MB/s). 66 min projected for a 26M-event DB. Worth calling out in the PR description so operators can plan downtime.
- Live catch-up throughput: ~400–500 LevIds/sec sustained, with ~60% match rate against our `indexedKinds`. Caught up 18M events in roughly 10–14 hours under live load.
- `bm25 { k1, b }` sub-block: existing configs predating the PR won't have it. golpe's parser tolerates the missing block (defaults used), but the migration notes should call this out: either "block is optional" or "recommended values: k1=1.2, b=0.75".
- `search_index_stats` hangs when the relay is running: command blocked indefinitely until SIGTERM. Worked fine when relay was stopped. Might be read-txn related.
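For readers weighing the `k1` and `b` values mentioned above, here is a minimal sketch of how those two parameters enter a per-term BM25 score. It uses the common Lucene-style IDF variant as an assumption; the PR's LmdbSearchProvider may compute IDF or length normalization differently, and the names `Bm25Params`, `bm25Idf` and `bm25Term` are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical parameter bundle; defaults match the values quoted above.
struct Bm25Params {
    double k1 = 1.2;   // term-frequency saturation
    double b  = 0.75;  // strength of document-length normalization
};

// Lucene-style IDF (assumed variant): always non-negative.
double bm25Idf(uint64_t totalDocs, uint64_t df) {
    return std::log(1.0 + (double(totalDocs) - double(df) + 0.5) / (double(df) + 0.5));
}

// Score contribution of one query term for one document.
double bm25Term(uint64_t tf, uint64_t docLen, double avgDocLen,
                uint64_t totalDocs, uint64_t df, Bm25Params p = {}) {
    if (tf == 0 || df == 0 || avgDocLen <= 0.0) return 0.0;
    // Longer-than-average documents get a larger denominator, hence a lower score.
    double norm = p.k1 * (1.0 - p.b + p.b * double(docLen) / avgDocLen);
    return bm25Idf(totalDocs, df) * (double(tf) * (p.k1 + 1.0)) / (double(tf) + norm);
}
```

In this formulation, raising `k1` lets repeated terms keep adding score for longer, while `b = 0` disables length normalization entirely.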
Happy to dig deeper on any of these — get a coredump for Bug 2, isolate the bad event for Bug 1, run more search latency experiments for Bug 3, or test specific theories for Bug 4. Just let me know what would be most useful.
Bug 4 fix: verified working on prod

Patched the build on our prod relay and Bug 4 (Writer self-deadlock) is resolved. Confirmed end-to-end with search enabled and live traffic:
The diff is small. Defers I'm not opening a PR for this, just sharing it as a starting point in case it's useful. Treat as a sketch, not as a clean upstreamable commit; you may want to handle

Diff against current PR HEAD (`c04ed93`)

diff --git a/src/events.h b/src/events.h
index 6757cd8..3d84124 100644
--- a/src/events.h
+++ b/src/events.h
@@ -87,7 +87,7 @@ struct EventToWrite {
};
-void writeEvents(lmdb::txn &txn, NegentropyFilterCache &neFilterCache, std::vector<EventToWrite> &evs, uint64_t logLevel = 1, class ISearchProvider *searchProvider = nullptr);
+void writeEvents(lmdb::txn &txn, NegentropyFilterCache &neFilterCache, std::vector<EventToWrite> &evs, uint64_t logLevel = 1, class ISearchProvider *searchProvider = nullptr, std::vector<uint64_t> *outLevIdsToRemoveFromSearch = nullptr);
bool deleteEventBasic(lmdb::txn &txn, uint64_t levId);
template <typename C>
diff --git a/src/events.cpp b/src/events.cpp
index 9c98c99..9a75971 100644
--- a/src/events.cpp
+++ b/src/events.cpp
@@ -249,7 +249,7 @@ static bool isEventABeforeEventB(const PackedEventView &a, const PackedEventView
return a.created_at() < b.created_at() || (a.created_at() == b.created_at() && a.id() > b.id());
}
-void writeEvents(lmdb::txn &txn, NegentropyFilterCache &neFilterCache, std::vector<EventToWrite> &evs, uint64_t logLevel, ISearchProvider *searchProvider) {
+void writeEvents(lmdb::txn &txn, NegentropyFilterCache &neFilterCache, std::vector<EventToWrite> &evs, uint64_t logLevel, ISearchProvider *searchProvider, std::vector<uint64_t> *outLevIdsToRemoveFromSearch) {
bool logDeletions = logLevel > 0;
std::sort(evs.begin(), evs.end(), [](auto &a, auto &b) {
auto aC = a.createdAt();
@@ -386,14 +386,14 @@ void writeEvents(lmdb::txn &txn, NegentropyFilterCache &neFilterCache, std::vect
updateNegentropy(PackedEventView(evToDel->buf), false);
deleteEventBasic(txn, levId);
- // Remove from search index
- if (searchProvider && searchProvider->healthy()) {
- try {
- searchProvider->deleteEvent(levId);
- } catch (std::exception &e) {
- // Don't fail deletions if search removal fails, just log
- LE << "Search delete failed for levId=" << levId << ": " << e.what();
- }
+ // Stash levId for search-index removal AFTER outer txn commits.
+ // Calling searchProvider->deleteEvent() here would open a second
+ // top-level LMDB write txn while we still hold this one, causing
+ // the Writer thread to self-deadlock on the LMDB writer mutex.
+ // The actual deleteEvent call happens in the caller (e.g. RelayWriter)
+ // after txn.commit(), mirroring the existing inline indexEvent pattern.
+ if (outLevIdsToRemoveFromSearch) {
+ outLevIdsToRemoveFromSearch->push_back(levId);
}
}
diff --git a/src/apps/relay/RelayWriter.cpp b/src/apps/relay/RelayWriter.cpp
index ba3e6d2..785605a 100644
--- a/src/apps/relay/RelayWriter.cpp
+++ b/src/apps/relay/RelayWriter.cpp
@@ -1,5 +1,7 @@
#include "RelayServer.h"
+#include <unordered_set>
+
#include "PluginEventSifter.h"
#include "PrometheusMetrics.h"
@@ -62,10 +64,15 @@ void RelayServer::runWriter(ThreadPool<MsgWriter>::Thread &thr) {
// Do write
+ // Collected inside writeEvents (under the outer write txn) and processed
+ // AFTER that txn commits, to avoid a self-deadlock: searchProvider->deleteEvent
+ // opens its own top-level write txn and LMDB only allows one per env/thread.
+ std::vector<uint64_t> deletedLevIdsForSearch;
+
try {
auto t0 = std::chrono::steady_clock::now();
auto txn = env.txn_rw();
- writeEvents(txn, neFilterCache, newEvents, 1, searchProvider.get());
+ writeEvents(txn, neFilterCache, newEvents, 1, searchProvider.get(), &deletedLevIdsForSearch);
txn.commit();
auto t1 = std::chrono::steady_clock::now();
auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
@@ -90,10 +97,18 @@ void RelayServer::runWriter(ThreadPool<MsgWriter>::Thread &thr) {
// Index events for search (NIP-50)
// Always index and run search when provider exists, regardless of healthy() status
- // healthy() only gates NIP-11 advertisement
+ // healthy() only gates NIP-11 advertisement.
+ // Both indexEvent and the deleteEvent loop below run AFTER txn.commit() because
+ // each one opens its own top-level LMDB write txn, which would self-deadlock
+ // if attempted inside the outer write txn.
if (searchProvider) {
+ // Skip indexing for events that were also marked for deletion in this same
+ // batch (e.g. inserted then immediately superseded by a NIP-09 delete or a
+ // replaceable/d-tag replacement later in the batch). Avoids wasted index+delete.
+ std::unordered_set<uint64_t> deletedSet(deletedLevIdsForSearch.begin(), deletedLevIdsForSearch.end());
+
for (auto &newEvent : newEvents) {
- if (newEvent.status == EventWriteStatus::Written) {
+ if (newEvent.status == EventWriteStatus::Written && !deletedSet.count(newEvent.levId)) {
PackedEventView packed(newEvent.packedStr);
try {
searchProvider->indexEvent(newEvent.levId, newEvent.jsonStr, packed.kind(), packed.created_at());
@@ -103,6 +118,17 @@ void RelayServer::runWriter(ThreadPool<MsgWriter>::Thread &thr) {
}
}
}
+
+ // Deferred search-index removal for events deleted/replaced in this batch.
+ if (searchProvider->healthy()) {
+ for (auto levId : deletedLevIdsForSearch) {
+ try {
+ searchProvider->deleteEvent(levId);
+ } catch (std::exception &e) {
+ LE << "Search delete failed for levId=" << levId << ": " << e.what();
+ }
+ }
+ }
}
// Log
|
Patched build: 21 h prod soak

Just over a day in on the patched build. No new crashes, no re-wedges, writes flowing under steady production load:
One observation worth flagging — not a regression from the patch, just something to keep an eye on: write |
|
@leesalminen 2.9s isn't acceptable. I'm going to see what is causing the added latency to the write path, index writes are supposed to be non-blocking. I'll see if I can trace it down and either fix or optimize the implementation. Many thanks for reporting back! |
…hProvider::query

- Move per-token df computation before getTotalDocs/getAvgDocLen calls
- Return empty results immediately when any token has df==0 (AND semantics)
- Eliminates O(N) corpus-stat cursor walks for no-match queries
- Add dfs.reserve() and idfs.reserve() for minor vector allocation improvement
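The early-exit described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the PR's query code: `DfTable` stands in for the inverted index's document-frequency lookup, and `collectDfsOrBail` is a hypothetical name.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for the per-token document-frequency lookup (assumption).
using DfTable = std::unordered_map<std::string, uint64_t>;

// Collect the df of every query token. With AND semantics, a single token
// with df == 0 means no document can match, so bail out before any
// corpus-wide statistics are computed.
std::optional<std::vector<uint64_t>> collectDfsOrBail(
        const std::vector<std::string>& tokens, const DfTable& dfTable) {
    std::vector<uint64_t> dfs;
    dfs.reserve(tokens.size());
    for (const auto& tok : tokens) {
        auto it = dfTable.find(tok);
        uint64_t df = (it == dfTable.end()) ? 0 : it->second;
        if (df == 0) return std::nullopt;  // caller returns empty results
        dfs.push_back(df);
    }
    return dfs;
}
```

A caller would only go on to fetch totals such as document count and average length when this returns a value, which is what keeps no-match queries cheap.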
…ce in search indexing

Bug 1 (cmd_search_reindex): held an LMDB read txn open while indexEventWithTxnHook opened a write txn on the same thread, violating LMDB's one-txn-per-thread rule (MDB_NOTLS not set). The decoded std::string_view was also used after the write txn could invalidate the underlying LMDB page. Fix: wrap the read phase in a nested scope that closes before indexEventWithTxnHook is called; copy decoded JSON view to std::string inside that scope.

Bug 2a (runCatchupIndexer): same use-after-txn-close pattern: rtxn.commit() was called while the decoded std::string_view was still live, freeing LMDB-managed pages for uncompressed payloads. Fix: same owned-string copy pattern, rtxn closes inside its scope.

Bug 2b (KindMatcher lazy init): mutable bool kindMatcherInitialized guarded by a bare if-check was a data race between the Writer thread and the catch-up indexer thread on concurrent first-use. Fix: replace with std::once_flag + std::call_once().
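The owned-string copy pattern named in the Bug 1 and Bug 2a fixes can be sketched without LMDB at all. In the snippet below, `FakeReadTxn` is a hypothetical stand-in for a read transaction whose views are only valid while it is alive; the real code uses lmdb read txns and decoded event payloads instead.

```cpp
#include <string>
#include <string_view>

// Stand-in for an LMDB read txn: views it hands out point into storage
// that is only valid while the txn object is alive (assumption).
struct FakeReadTxn {
    std::string page;
    std::string_view get() const { return page; }
};

std::string readEventJsonOwned() {
    std::string owned;
    {
        // Read phase in its own scope so the txn closes before any write.
        FakeReadTxn rtxn{"{\"content\":\"hello\"}"};
        std::string_view view = rtxn.get();
        // Copy while the txn (and the memory behind `view`) is still valid.
        owned.assign(view.data(), view.size());
    }  // rtxn is gone here; using `view` past this point would dangle
    return owned;  // safe to hand to a later write transaction
}
```

The copy costs one allocation per event, which is the price of being able to close the read transaction before the write transaction opens.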
|
Pulled

Verified fixed

Bug 3: no-match query latency ( Previously ranged 191ms → 20s+. Fast path is doing exactly what it should.

Bug 4: writer wedge on d-tag deletes ( The journal shows

Likely fixed (need full reindex to prove)

Bug 1: The I haven't re-run a full reindex yet to confirm (multi-hour on 26M events, and the tool only supports

Bug 2: intermittent crashes every 2–3 days: Was previously seeing SIGABRT/SIGSEGV restarts roughly every 2–3 days even without reindex activity. The

Other observations

Indexer is keeping up well in live mode: batches of 100–700 events processed in <20ms, 10s tick interval. Most events get

Publish → searchable latency: new event findable in search ~500ms after

Common-term query latency is mostly 200–310ms, with one outlier:
Parallel search: 20 concurrent queries, median eose 1.8s, max 2.7s — reasonable. No errors.
Otherwise: clean startup, 0 restarts since deploy, no signals/aborts in journal. Will let it soak and report any regressions. Nice work on the fixes — the diagnosis on Bug 1 was sharp. |
It should be capped to
@leesalminen Could you try tweaking the config values, primarily Notes:
|
I'll add a |
- Re-index from a specific levId without clearing the existing index
- Does not touch SearchState so normal resume continues to work
- Mutually exclusive with --restart
- Enables operators to validate fixes against known-bad ranges without a multi-hour full rebuild (PR hoytech#160 reviewer request)
|
@leesalminen Doesn't clear the index, doesn't touch Looking at the |
|
@leesalminen pause the perf work; I found something bigger. Probing the indexer: Minimal probe with production packing, levIds 1..16777216: So your Plan:
Hold off on a fresh reindex and on lowering |
Postings were packed as host-endian uint64 and stored in a plain MDB_DUPSORT DBI, so dup ordering used memcmp on little-endian bytes, which is not numeric. MDB_APPENDDUP returned MDB_KEYEXIST whenever a new levId's byte sequence was less than the prior stored entry (every 256/65536/16M boundary crossing), and lmdb::dbi_put silently swallows MDB_KEYEXIST. The caller didn't check the return value, so an unknown fraction of postings was silently dropped on every existing index. Empirical probe: 4/10 inserts dropped on monotonic levIds 1..16M.

This commit:
- packPosting/unpackPosting now use htobe64/be64toh so the stored bytes memcmp-compare in numeric order. APPENDDUP succeeds and postings are stored in levId-ascending order as the read and trim paths assume.
- kIndexVersion bumped 1 -> 2. The old format is incompatible with the new packing, so existing operators must run `search_reindex --restart` to rebuild.
- LmdbSearchProvider gains a lazy stale-detection guard (std::once_flag). If the on-disk indexVersion doesn't match kIndexVersion (or indexVersion==0 with leftover data from an interrupted pre-upgrade rebuild), staleIndex is set and indexEvent / indexEventWithTxnHook / deleteEvent / query / runCatchupIndexer / healthy() all short-circuit with a loud log instructing the operator to run search_reindex --restart.
- Both dbi_SearchIndex.put and dbi_SearchDocMeta.put callsites now check the return value and log via LE on unexpected MDB_KEYEXIST. With big-endian packing in place these should never fire; any log is a real bug signal.
- cmd_search_reindex gains an explicit stale guard: without --restart, on a stale or partially-rebuilt index, it errors out cleanly instead of silently no-op'ing through the provider's stale gate.
- cmd_search_reindex --restart now uses dbi.drop(txn, false) to empty SearchIndex and SearchDocMeta. The previous cursor-delete loop walked MDB_NEXT_NODUP and called cursor.del() with no flag, which only deleted one duplicate per key, leaving most DUPSORT data behind. This bug pre-dates the byte-order fix but was surfaced by the new stale-detection logic finding leftover data after a supposed clear.
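To make the byte-order argument concrete, here is a portable sketch of a levId:48/tf:16 posting packed big-endian, so that memcmp order equals numeric levId order. It writes the bytes by hand rather than calling htobe64/be64toh, and the names `PackedPosting`, `packPostingSketch`, `unpackPostingSketch` and `comparePacked` are hypothetical; the PR's packPosting/unpackPosting may differ in detail.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using PackedPosting = std::array<unsigned char, 8>;

// Pack levId into the upper 48 bits and tf into the lower 16 bits, then
// store the 64-bit value most-significant byte first (big-endian).
PackedPosting packPostingSketch(uint64_t levId, uint16_t tf) {
    uint64_t v = ((levId & ((1ULL << 48) - 1)) << 16) | tf;
    PackedPosting out{};
    for (int i = 0; i < 8; i++) {
        out[i] = static_cast<unsigned char>(v >> (56 - 8 * i));
    }
    return out;
}

// Inverse of packPostingSketch.
void unpackPostingSketch(const PackedPosting& p, uint64_t& levId, uint16_t& tf) {
    uint64_t v = 0;
    for (unsigned char b : p) v = (v << 8) | b;
    levId = v >> 16;
    tf = static_cast<uint16_t>(v & 0xFFFF);
}

// Byte-wise comparison, which is what a DUPSORT table without a custom
// comparator relies on.
int comparePacked(const PackedPosting& a, const PackedPosting& b) {
    return std::memcmp(a.data(), b.data(), a.size());
}
```

With this layout, the packed bytes for levId 255 sort before those for levId 256, so appending postings in ascending levId order never presents a "smaller" byte sequence at a boundary crossing.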
|
@leesalminen Schema fix is on Changes:
Upgrade path on your relay:
After the rebuild, please re-measure the |
Covers the property that MDB_DUPSORT depends on without a custom comparator: memcmp on packed posting bytes must match numeric order on the underlying levId. Catches the silent posting-drop regression just fixed in fba11d7.

Four test layers:
- Pack/unpack roundtrip across adversarial levIds (boundary values spanning every 8-bit byte: 0, 254, 255, 256, 65535, 65536, 16M-1, 16M, 2^47) crossed with tfs in {1, 2, 65535}.
- memcmp ordering: sgn(memcmp(pack(a), pack(b))) == sgn(a-b) over every pair from the same boundary set.
- Tripwire constant: packPosting(0x1234, 0xABCD) bytes must equal 00 00 00 00 12 34 AB CD. Any change to the pack formula is loud.
- Live MDB_DUPSORT integration in a temp env via mkdtemp: MDB_APPENDDUP-insert for all boundary levIds, assert zero MDB_KEYEXIST returns, walk FIRST_DUP -> NEXT_DUP, assert numeric iteration order.

Smoke-tested by temporarily reverting packPosting to host-endian: test produced 321 failures including 8 live MDB_APPENDDUP rejections matching the original empirically-discovered bug.

Build target follows the existing test/SubIdTests.cpp pattern: `make test-search-posting`.
… as exit 1

Adds test/searchReindexTest.pl covering three CLI-level regressions surfaced during PR hoytech#160 testing:

1. Stale-index detection: search_reindex without --restart must refuse to operate on a v1 stored index, exit non-zero, and instruct the operator to re-run with --restart. The test plants a v1 indexVersion via `search_set_state --index-version=1 --allow-lower` and asserts the error message and exit code.
2. --restart idempotency: running search_reindex --restart twice in a row must index the same number of events. The latent cursor-delete bug (only deleting one dup per key, fixed in fba11d7) would cause the second pass to differ. Catches future regressions in the clear path.
3. --from-levid does not touch SearchState: build a v2 index, plant a sentinel lastIndexedLevId via search_set_state, run search_reindex --from-levid=1, then verify the sentinel survives by re-running search_set_state and asserting on the "previous:" echo. Smoke-tested by toggling persistState=true in the partial branch -- the test catches the regression as expected.

Also converts cmd_search_reindex's error paths (search disabled, wrong backend, --restart/--from-levid mutex, stale index, in-progress mismatch, --from-levid<1) from `std::cerr; return` to `throw herr(...)`. main() catches std::exception, prints, and exits 1, so this is required for scripting/tests to detect failures. Matches the existing convention in cmd_search_set_state.

Test config added at test/cfgs/searchTest.conf (search enabled, indexedKinds = "*"). Mirrors test/writeTest.pl style: nostril-signed kind-1 events imported via `./strfry import`, no WebSocket harness.

Run with: perl test/searchReindexTest.pl
Overview
This PR implements NIP-50 (Search Capability) for strfry, enabling full-text search across Nostr events using BM25 ranking. The implementation includes:
Architecture
Core Components
Search Provider Interface (`src/search/SearchProvider.h`)

LMDB Search Backend (`src/search/LmdbSearchProvider.h`)

Background Indexer (in `LmdbSearchProvider::runCatchupIndexer()`)

`SearchState.lastIndexedLevId`

Search Runner (`src/search/SearchRunner.h`)

Database Schema
New LMDB tables (defined in `golpe.yaml`):

Configuration
Key settings in `strfry.conf` (`relay.search`):

    relay {
        search {
            enabled = true                        # Enable NIP-50 search
            backend = "lmdb"                      # or "noop"

            # Indexing/Query controls
            indexedKinds = "1, 30023"             # Kind pattern: numbers, ranges, '*', exclusions (-A-B)
            maxQueryTerms = 16                    # Max terms parsed from a query
            maxPostingsPerToken = 100000          # Cap per token (pruning/vacuum TBD)
            maxCandidateDocs = 1000               # Max candidate docs before scoring
            overfetchFactor = 5                   # Fetch limit × factor, bounded by maxCandidateDocs

            # Recency tie-breaker (optional)
            recencyBoostPercent = 0               # Integer percent (0–100); 1 = 1%

            # Candidate pre-scoring ranking
            candidateRankMode = "order"           # "order" | "weighted"
            candidateRanking = "terms-tf-recency" # When mode="order": see supported orders below
            rankWeightTerms = 100                 # When mode="weighted": weight for matched terms
            rankWeightTf = 50                     # When mode="weighted": weight for aggregate TF
            rankWeightRecency = 10                # When mode="weighted": weight for recency
        }
    }

Supported `candidateRanking` orders (desc for each component):
- `terms-tf-recency` (default)
- `terms-recency-tf`
- `tf-terms-recency`
- `tf-recency-terms`
- `recency-terms-tf`
- `recency-tf-terms`

Configuration Parameters
- `enabled`: Master switch for search functionality
- `backend`: Search provider implementation ("lmdb" or "noop")
- `indexedKinds`: Pattern of kinds to index (numbers/ranges/*/exclusions)
- `maxQueryTerms`: Maximum query terms parsed
- `maxPostingsPerToken`: Max postings per token key (upper bound during fetch; pruning TBD)
- `maxCandidateDocs`: Maximum candidates for scoring
- `overfetchFactor`: Candidate over-fetch before post-filtering
- `recencyBoostPercent`: Recency tie-breaker percent (0–100; 1 = 1%)
- `candidateRankMode`: `order` or `weighted`
- `candidateRanking`: Order used when mode=`order` (list above)
- `rankWeightTerms` / `rankWeightTf` / `rankWeightRecency`: Weights for mode=`weighted`

Usage
Enabling Search
Build strfry:

    make -j$(nproc)

Update `strfry.conf`:

    relay {
        search {
            enabled = true
            backend = "lmdb"
        }
    }

Start strfry:
Indexing behavior:

Search Queries
Clients can issue NIP-50 search queries using the `search` filter field:

    {
      "kinds": [1],
      "search": "bitcoin lightning network",
      "limit": 100
    }

Search features:
Monitoring
Background indexer logs:
Query metrics include search-specific timings when `relay.logging.dbScanPerf = true` (scan=Search).

Performance Characteristics
Indexing Performance
Query Performance
`maxCandidateDocs` and result set size

Tuning guidelines:
- `maxCandidateDocs` for faster queries with slightly lower recall
- `overfetchFactor` to improve recall for multi-token queries

Benchmark Suite
Put something together for benchmarks, but didn't finish. Will likely remove it before marking ready for review.
A comprehensive benchmark suite is included under `bench/`:

Running Benchmarks
Prepare a test database:
This generates cryptographically valid Nostr events using `nak` and ingests them into a fresh database.

Run the benchmark:

    bench/scripts/run.sh -s scenarios/small.yml --out bench/results/raw/small-$(date +%s)

Generate reports:
Benchmark Metrics
Testing
Manual Testing
Index a test database:
Issue search queries via WebSocket:
Verify results are returned in relevance order
Integration Points
- `DBQuery.h`: Search queries execute alongside traditional index scans
- `ActiveMonitors.h`: Search filters excluded from live subscription indexes (one-shot queries)
- `QueryScheduler.h`: Search provider injected into query execution path
- `cmd_relay.cpp`: Background indexer lifecycle management

Migration Notes
Existing Databases
For existing strfry installations:

    cd golpe && ./build.sh && cd .. && make

The indexer will automatically catch up on all existing events. Monitor logs for progress.

Rollback
To disable search without data loss:
`relay.search.enabled = false` in config

The search tables remain in the database but are not used. They can be manually removed using the `mdb` command-line tools if desired.

Known Limitations
`content` field of events (does not index tags or metadata)
`maxCandidateDocs` for optimal performance

Future Enhancements
Potential improvements for future iterations:
Related Issues
TODO List before eligible as candidate for merge