ruvector: dead external parallel-worker import (ruvector-onnx-embeddings-wasm/parallel) in ONNX embedder #531

@ruvnet

Summary

The ONNX embedder in the ruvector npm package references an external parallel-worker package — ruvector-onnx-embeddings-wasm/parallel — that is not installed and not declared in package.json. Both import sites resolve to catch blocks, so the external parallel path is effectively dead. A bundled, zero-dependency worker_threads pool now serves as the fallback (and is the real implementation), but the dead external references remain in the code and the dynamic import throws on every init.

Observed first in 0.2.25; still present in 0.2.27 (current npm/packages/ruvector/package.json).

Where

npm/packages/ruvector/src/core/onnx-embedder.ts

Site 1 — capability probe (detectParallelAvailable), line 125:

async function detectParallelAvailable(): Promise<boolean> {
  try {
    await dynamicImport('ruvector-onnx-embeddings-wasm/parallel'); // always throws — pkg not installed/declared
    parallelAvailable = true;
    return true;
  } catch {
    parallelAvailable = false;
    return false;
  }
}

Site 2 — init (tryInitParallel), line 156:

// 1) Optional external package (back-compat). Absent by default.
try {
  const parallelModule = await dynamicImport('ruvector-onnx-embeddings-wasm/parallel');
  const { ParallelEmbedder } = parallelModule;
  ...
} catch {
  // External package not installed — fall through to the bundled pool.
}

The bundled fallback (lines 173–206) loads onnx/bundled-parallel.mjs and is gated behind enableParallel === true.

Why it matters

  • The external import throws on every init that reaches it, even though it's caught. It's dead code that misleads readers into thinking an external multi-core path exists.
  • The dependency ruvector-onnx-embeddings-wasm/parallel was explicitly rejected in docs/adr/ADR-194-ruvector-onnx-embedder-api-and-throughput.md (§3 Alternatives — "rejected — not installed, not a declared dependency; a ~150-line zero-dep bundled pool replaces it"). The code does not reflect that decision.
  • detectParallelAvailable() reports availability based on the missing external package, not on the bundled pool that actually ships — so the capability signal is wrong.

Performance context

The single-thread WASM floor is ~192 ms per embed, i.e. ~5.2 embeds per second (eps); batching 32 inputs gives essentially no speedup (source: scripts/bench/onnx-bench-results.json). The bundled pool with 30 workers reaches ~72.8 eps, a ~14× speedup, with perfect cosine equivalence (minCos = 1.0). At the single-thread floor, a large-corpus index pass runs as a background job lasting tens of minutes. The bundled pool is what makes that path viable, so the dead external reference that shadows it is worth cleaning up.
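
For a rough sense of scale, the arithmetic behind those figures can be sketched as follows. The 192 ms and 72.8 eps values are the ones quoted above; the 10,000-chunk corpus size and the helper names are assumptions chosen only for illustration, not measurements from the benchmark file.

```typescript
// Figures quoted in this issue (from scripts/bench/onnx-bench-results.json).
const SINGLE_THREAD_MS_PER_EMBED = 192;
const POOL_EPS = 72.8;

// Assumption for illustration only: a corpus of 10,000 chunks.
const CORPUS_CHUNKS = 10000;

// Convert a per-embed latency in milliseconds into embeds per second.
function embedsPerSecond(msPerEmbed: number): number {
  return 1000 / msPerEmbed;
}

// Wall-clock minutes for one full pass over `chunks` items at `eps` embeds/s.
function passMinutes(chunks: number, eps: number): number {
  return chunks / eps / 60;
}

const singleEps = embedsPerSecond(SINGLE_THREAD_MS_PER_EMBED); // ~5.2 eps
const speedup = POOL_EPS / singleEps;                          // ~14x

console.log(`single-thread: ${singleEps.toFixed(1)} eps`);
console.log(`speedup with pool: ${speedup.toFixed(1)}x`);
console.log(`single-thread pass: ${passMinutes(CORPUS_CHUNKS, singleEps).toFixed(1)} min`); // ~32 min
console.log(`pooled pass: ${passMinutes(CORPUS_CHUNKS, POOL_EPS).toFixed(1)} min`);         // ~2.3 min
```

Under that assumed corpus size, the pass drops from roughly half an hour to a couple of minutes, which is the difference the bundled pool makes.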

Proposed fix

  1. Remove both dynamicImport('ruvector-onnx-embeddings-wasm/parallel') references (lines 125, 156) since the package is rejected by ADR-194 and will never resolve.
  2. Make detectParallelAvailable() reflect the bundled pool's availability (model bytes loaded + worker_threads usable), not the absent external package.
  3. Keep the bundled-pool path (lines 173–206) as the single parallel implementation.
  4. Add a brief note in the embedder doc / ADR-194 that the external package was removed from code, not just rejected on paper.
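
A minimal sketch of what step 2 could look like, assuming availability means "model bytes loaded and worker_threads usable" as stated above. The `BundledPoolProbe` shape, the `Importer` type, and the helper names are hypothetical and not taken from onnx-embedder.ts. The importer is passed in so the probe can reuse the file's existing `dynamicImport` helper, whose exact signature is assumed here.

```typescript
// Hypothetical probe inputs; names are illustrative, not from onnx-embedder.ts.
interface BundledPoolProbe {
  modelBytes: Uint8Array | null; // model bytes already loaded, or null if not
  workerThreadsUsable: boolean;  // worker_threads could be loaded in this runtime
}

// Pure decision: availability depends only on the bundled pool's prerequisites,
// never on the absent external package.
function bundledPoolAvailable(probe: BundledPoolProbe): boolean {
  return (
    probe.modelBytes !== null &&
    probe.modelBytes.byteLength > 0 &&
    probe.workerThreadsUsable
  );
}

// Assumed shape of a dynamic-import helper such as the file's dynamicImport.
type Importer = (specifier: string) => Promise<unknown>;

// Checks that node:worker_threads loads and exposes a Worker constructor.
async function canUseWorkerThreads(importer: Importer): Promise<boolean> {
  try {
    const mod = (await importer('node:worker_threads')) as { Worker?: unknown };
    return typeof mod.Worker === 'function';
  } catch {
    return false; // not running on Node, or worker_threads is unavailable
  }
}

// Replacement probe: no reference to 'ruvector-onnx-embeddings-wasm/parallel'.
async function detectParallelAvailable(
  importer: Importer,
  modelBytes: Uint8Array | null,
): Promise<boolean> {
  const workerThreadsUsable = await canUseWorkerThreads(importer);
  return bundledPoolAvailable({ modelBytes, workerThreadsUsable });
}
```

Keeping the decision in a pure function separates "can the pool run here" from "did the caller opt in", so the existing enableParallel === true gate can stay where it is in init.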
