
Novelty check always returns near_duplicate: FHE score exceeds cosine similarity range #74

@sunchuljung


Summary

The novelty check in _capture_single() classifies every new capture as near_duplicate, blocking all captures regardless of their actual semantic similarity.

Symptoms

  • capture MCP tool returns {"captured": false, "reason": "Near-duplicate"} for completely unrelated topics
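  • Example: a debugging/bugfix record (MCP server shutdown crash) is flagged as a near-duplicate of marketing strategy records
  • novelty.related shows similarity scores outside the valid cosine similarity range: 3.78 is above 1 and -1.516 is below -1, so neither can be a cosine similarity, let alone fit the [0, 1] range the thresholds assume: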
    "related": [
      {"similarity": 3.78},
      {"similarity": 0.578},
      {"similarity": -1.516}
    ]
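A quick way to confirm the symptom is to check each reported score against the range a cosine similarity can take, [-1, 1]. The snippet below is a minimal sketch; the helper name out_of_range is hypothetical and not part of the codebase, and the input is copied from the novelty.related output above.

```python
# Scores copied from the novelty.related output in this issue.
related = [{"similarity": 3.78}, {"similarity": 0.578}, {"similarity": -1.516}]


def out_of_range(scores, lo=-1.0, hi=1.0):
    """Hypothetical helper: return scores that cannot be a cosine similarity.

    Cosine similarity of any two non-zero real vectors lies in [-1, 1]
    (Cauchy-Schwarz), so anything outside that interval signals a scale
    mismatch rather than a genuine similarity.
    """
    return [s for s in scores if not (lo <= s <= hi)]


bad = out_of_range([r["similarity"] for r in related])
print(bad)  # [3.78, -1.516]
```

Only 0.578 passes the check; the other two scores are impossible as cosine values.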

Root Cause (Observed)

  1. server.py:1330 takes max(score) from Vault's DecryptScores response
  2. _classify_novelty() checks if max_sim >= 0.93 → near_duplicate
  3. Vault returns score = 3.78, so 3.78 >= 0.93 → always near_duplicate
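The three steps above can be reproduced in isolation. This is a simplified sketch, not the actual server.py or embedding.py code: only the max-over-scores step and the 0.93 threshold come from this issue, the lower novelty bands are omitted, and the "not_near_duplicate" label is a hypothetical placeholder.

```python
# Only the near-duplicate threshold is modelled here (value from embedding.py).
NOVELTY_THRESHOLD_NEAR_DUPLICATE = 0.93


def classify_near_duplicate(scores):
    """Hypothetical stand-in for steps 1-2: take the max decrypted score,
    then apply the near-duplicate gate."""
    max_sim = max(scores)  # step 1: max(score) from the DecryptScores response
    if max_sim >= NOVELTY_THRESHOLD_NEAR_DUPLICATE:  # step 2: gate
        return "near_duplicate"
    return "not_near_duplicate"  # placeholder for the lower bands


# Decrypted Vault scores reported in this issue: 3.78 trips the gate.
print(classify_near_duplicate([3.78, 0.578, -1.516]))  # near_duplicate

# Plaintext cosine similarities for the same texts: well below the gate.
print(classify_near_duplicate([0.28, 0.23, 0.30]))  # not_near_duplicate
```

Because a single out-of-range score dominates max(), any Vault score above 0.93 blocks the capture, whatever the underlying texts are.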

The novelty thresholds in agents/common/schemas/embedding.py were calibrated assuming cosine similarity in [0, 1]:

NOVELTY_THRESHOLD_NOVEL = 0.4
NOVELTY_THRESHOLD_RELATED = 0.7
NOVELTY_THRESHOLD_NEAR_DUPLICATE = 0.93

Verification

Plaintext cosine similarity (Qwen3-Embedding-0.6B, L2-normalized vectors) between the same texts:

| debug_shutdown vs | Cosine similarity |
|---|---|
| biz_marketing | 0.28 |
| biz_sales | 0.23 |
| biz_gaia | 0.30 |

All three scores fall below 0.4 (novel), confirming that the vectors are properly L2-normalized and that the texts are genuinely dissimilar. The FHE-decrypted scores from Vault do not match these plaintext cosine similarities.
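For reference, the plaintext comparison reduces to a dot product once both vectors are L2-normalized, which keeps the result within [-1, 1]. The sketch below uses small toy vectors, not real Qwen3-Embedding-0.6B outputs, and the function names are illustrative only.

```python
import math


def l2_normalize(v):
    """Scale v to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity: for unit vectors this is just the dot product,
    and Cauchy-Schwarz bounds it to [-1, 1]."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy 3-d vectors (illustrative only): 45 degrees apart.
print(round(cosine([1.0, 0.0, 0.0], [1.0, 1.0, 0.0]), 4))  # 0.7071
```

A correctly scaled FHE inner product over the same unit vectors should land in this same interval, so a decrypted value such as 3.78 points to a scaling or decoding difference somewhere in the scoring path rather than to real similarity.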

Impact

  • All captures are blocked once any record exists in the index
  • The engineering/bugfix domain has 0 records despite multiple capture attempts
  • Only records captured before the novelty check was introduced (2026-04-08) exist

References

  • Novelty check logic: mcp/server/server.py:1314-1348
  • Classification: agents/common/schemas/embedding.py:33-56
  • Vault decrypt: rune-admin/vault/vault_core.py:236-290
  • FHE scoring: mcp/adapter/envector_sdk.py:204-228 (Index.scoring()cipher.decrypt_score())

🤖 Generated with Claude Code
