feat: scheduled ingest with deduplication #97

Open
Shriiii01 wants to merge 108 commits into ekailabs:main from Shriiii01:feature/scheduled-ingest-dedup

@Shriiii01
Contributor

  • Pass deduplicate through /v1/ingest to store.ingest() and Memory.add()
  • Add conversation log (JSONL) on each /v1/chat/completions, no live ingest
  • Add scripts/ingest-from-log.mjs: checkpoint-based, rate-limited 1 req/s
  • Config: CONVERSATION_LOG_PATH, INGEST_CHECKPOINT_PATH, MEMORY_INGEST_URL
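
The checkpoint-based, rate-limited flow described above could look roughly like the sketch below. This is not the actual scripts/ingest-from-log.mjs: the checkpoint format (a count of log lines already ingested), the names `selectPending` and `ingestPending`, and the injected `send` and `sleep` callbacks are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not the real scripts/ingest-from-log.mjs.
// Assumes the checkpoint is the number of non-blank log lines already ingested.

// Parse the JSONL text and return only the entries after the checkpoint,
// each tagged with its 1-based line number.
function selectPending(logText, checkpoint) {
  const lines = logText.split("\n").filter((line) => line.trim() !== "");
  const pending = [];
  lines.forEach((line, index) => {
    if (index < checkpoint) return; // already ingested on a previous run
    try {
      pending.push({ lineNumber: index + 1, entry: JSON.parse(line) });
    } catch {
      // Malformed line: skip it; later entries keep their own line numbers.
    }
  });
  return pending;
}

// Send pending entries one at a time, pausing between requests (1 req/s).
// `send` and `sleep` are injected so the caller decides how to POST and wait.
// Returns the new checkpoint, which the caller is expected to persist.
async function ingestPending(logText, checkpoint, send, sleep) {
  let done = checkpoint;
  for (const { lineNumber, entry } of selectPending(logText, checkpoint)) {
    await send({ ...entry, deduplicate: true });
    done = lineNumber; // advance only after a successful send
    await sleep(1000);
  }
  return done;
}
```

Advancing the checkpoint only after each successful send means a failed run can resume from the last ingested line. The `deduplicate` flag should absorb any entry that is sent twice.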

Shriiii01 and others added 30 commits February 7, 2026 16:54
- Add OllamaProvider class with OpenAI-compatible API support
- Register Ollama in ProviderRegistry with model selection rules
- Add Ollama configuration to AppConfig (baseUrl, apiKey, enabled)
- Add Ollama to chat_completions_providers_v1.json catalog with 16 popular models
- Add ollama.yaml pricing file (free/local models)
- Update ProviderName type to include 'ollama'
- Add OLLAMA_BASE_URL and OLLAMA_API_KEY to .env.example

Ollama runs models locally and exposes an OpenAI-compatible API at
http://localhost:11434/v1 by default. Users can configure a custom
base URL via OLLAMA_BASE_URL environment variable.
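
A minimal sketch of how the base URL fallback might be resolved, assuming the environment is passed in as a plain object. The helper name `resolveOllamaBaseUrl` and the trailing-slash normalization are illustrative assumptions, not the gateway's actual AppConfig code.

```javascript
// Hypothetical helper; the real AppConfig wiring may differ.
const DEFAULT_OLLAMA_BASE_URL = "http://localhost:11434/v1";

function resolveOllamaBaseUrl(env) {
  const raw = (env.OLLAMA_BASE_URL || "").trim();
  // Strip trailing slashes so callers can append "/chat/completions" safely.
  return (raw || DEFAULT_OLLAMA_BASE_URL).replace(/\/+$/, "");
}
```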
- Added Ollama to responses_providers_v1.json catalog
- Created OllamaResponsesPassthrough class implementing Responses API
- Registered Ollama in responses-passthrough-registry.ts

Ollama supports the OpenResponses API specification at /v1/responses endpoint,
providing future-proof support as /chat/completions may be deprecated.
# Conflicts:
#	gateway/src/infrastructure/config/app-config.ts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ntegration

Add OpenRouter integration with unified launcher and Docker fixes
Changed 'wait -n' to 'wait' to keep the container running indefinitely
instead of exiting when the first service exits. This allows all
services (gateway, dashboard, memory, openrouter) to continue running.

Fixes issue where container would restart repeatedly with exit code 0.
- Document all 4 services and their ports (gateway, dashboard, memory, openrouter)
- Add Docker service control via ENABLE_* environment variables
- Clarify OpenRouter integration service runs on port 4010 (not 4006)
- Add Docker Compose section with service management instructions
- Document Docker restart behavior and service lifecycle
…ntegration

Fix Docker fullstack entrypoint and document service configuration
Changed video URL from hZC1Y_dWdhI to sLg9YmYtg64 in:
- README.md
- docs/ROFL_DEPLOYMENT.md

Uses GitBook-compatible embed syntax.
…ntegration

Update YouTube demo video URLs and GitBook embeds
The sectorColors/sectorDescriptions maps and MemorySectorSummary type
only had 3 sectors (episodic, semantic, procedural) but the memory
service also returns reflective memories, causing an undefined .includes()
crash in MemoryStrength.
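
The fix described above amounts to covering all four sectors and never handing `undefined` to downstream code. A minimal sketch follows; the color values, the fallback, and the `colorForSector` helper are assumptions for illustration, not the dashboard's actual code.

```javascript
// Illustrative values only; the dashboard's real colors are not shown in this PR.
const SECTORS = ["episodic", "semantic", "procedural", "reflective"];

const sectorColors = {
  episodic: "#6366f1",
  semantic: "#10b981",
  procedural: "#f59e0b",
  reflective: "#ec4899", // the sector that was missing from the original maps
};

const FALLBACK_COLOR = "#9ca3af";

// Return a neutral color for any sector the map does not know about,
// so callers can safely call string methods on the result.
function colorForSector(sector) {
  return Object.prototype.hasOwnProperty.call(sectorColors, sector)
    ? sectorColors[sector]
    : FALLBACK_COLOR;
}
```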
sm86 and others added 19 commits February 25, 2026 20:42
Agents can now set a relevancePrompt that gates ingest — an LLM checks
if incoming content matches the agent's scope before extraction/embedding.
Irrelevant content is rejected early with a reason. Adds GET/PUT single
agent endpoints and updates README with new API docs and flow diagram.
Replace silent INSERT OR IGNORE with a strict existence check that throws
agent_not_found. Only the default agent is auto-created at init via a
private upsertDefaultAgent(). Add routeError helper to router so all
catch blocks return 404 on agent_not_found instead of 500.
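
A routeError-style helper might map errors to statuses along these lines. The return shape (`status` plus `body`) is an assumption; the real helper presumably writes to the router's response object instead.

```javascript
// Hypothetical sketch of a routeError-style helper; the return shape is assumed.
function routeError(err) {
  const message = err instanceof Error ? err.message : String(err);
  if (message === "agent_not_found") {
    // A missing agent is a client-side problem, not a server failure.
    return { status: 404, body: { error: "agent_not_found" } };
  }
  return { status: 500, body: { error: message } };
}
```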
- New /agents page: agent cards with soul/relevance prompts, per-agent
  stats (users, episodic, semantic, procedural), edit/create/delete modals
- api.ts: add soulMd + relevancePrompt to getAgents() type; add
  createAgent() and updateAgent() calls
- layout.tsx: add global top nav with Memory Vault and Agents links
- memory/page.tsx: offset sticky header to top-11 to clear global nav
Dedup, relevance gate, and agents dashboard
Generate standalone package-lock.json files for memory and
integrations/openrouter, and switch all runtime stages from
npm install --omit=dev to npm ci --omit=dev. This eliminates
npm registry calls during Docker builds, fixing intermittent
403 rate-limit failures in CI.
Add sqlite-vec extension to @ekai/memory, replacing the 200-row
linear scan + JS cosine similarity with proper ANN indexing via
vec0 virtual tables with cosine distance metric.

- Add sqlite-vec dependency, load extension on construction
- Create vec0 virtual tables (memory_vec, procedural_vec,
  semantic_vec, reflective_vec) lazily on first embedding
- Insert into vec tables alongside main tables on write path
- Replace getCandidatesForSector 200-row scan with two-step
  KNN: ANN query on vec table, then filter via main table
- Replace findDuplicate linear scans with vec KNN queries
- Update scoreRowPBWM to accept precomputed similarity
- Make embedding optional on record types (query results
  no longer carry full embedding arrays)
- Stop selecting embedding in semantic graph traversal queries
- Clean up vec tables on delete operations
Embedding is always present on write and never read on query path
(similarity is precomputed by sqlite-vec). Graph traversal methods
now return Omit<SemanticMemoryRecord, 'embedding'> since they are
structural queries that don't select the embedding column.
Add sqlite-vec ANN vector search to @ekai/memory
The standalone memory/package-lock.json was stale after sqlite-vec was
added to package.json, causing npm ci to fail in the Docker build.
Regenerate memory lockfile to include sqlite-vec dependency
Lifecycle event logger that registers all 13 OpenClaw hooks via
api.registerHook() and appends JSONL entries with safe serialization.
Published to npm as @ekai/contexto.
Extract event storage from openclaw plugin into @ekai/store workspace
with EventWriter (normalization, safe serialization, per-session JSONL
files) and EventReader (session listing, reconstruction with tool call
pairing and userId attribution). Includes path-traversal protection,
chronological event ordering, and 48 tests.
- Fix durationMs: 0 being overwritten by computed value (falsy check → else-if)
- Sync runtime configSchema with manifest (declare dataDir property)
- Reorder root build: install first for clean-env safety, store before dependents
- Remove redundant double-resolve in reconstructSession
- Fix misleading test name for raw ID storage behavior
- Simplify rawAgentId/rawSessionId storage: remove dead !== check since
  sanitizeId always appends a hash suffix, just check if input is present
- Add _error optional field to StoreEvent and AppendInput interfaces to
  reflect the serialization-failure fallback that appears in JSONL output
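
The first fix in the list above (an explicit `durationMs: 0` being overwritten) is a common falsy-check pitfall. A minimal sketch of the corrected logic, assuming hypothetical `startedAt` and `endedAt` timestamp fields that are not taken from the actual store code:

```javascript
// Hypothetical normalizer; field names other than durationMs are assumptions.
function resolveDurationMs(input) {
  if (input.durationMs !== undefined && input.durationMs !== null) {
    // An explicit value wins, including 0. A check like `if (!input.durationMs)`
    // would treat 0 as missing and fall through to the computed value.
    return input.durationMs;
  } else if (input.startedAt !== undefined && input.endedAt !== undefined) {
    return input.endedAt - input.startedAt;
  }
  return undefined;
}
```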
Add OpenClaw plugin and JSONL event store
- Pass deduplicate through /v1/ingest to store.ingest() and Memory.add()
- Add conversation log (JSONL) on each /v1/chat/completions, no live ingest
- Add scripts/ingest-from-log.mjs: checkpoint-based, rate-limited 1 req/s
- Config: CONVERSATION_LOG_PATH, INGEST_CHECKPOINT_PATH, MEMORY_INGEST_URL
@Shriiii01
Contributor Author

Shriiii01 commented Mar 3, 2026

Fixes #90

Happy to adjust based on feedback.

sm86 and others added 4 commits March 6, 2026 02:02
- Resolve README.md: keep archived notice and project description
- Resolve README.md and .env.example
- Keep scheduled ingest env vars, drop gateway from merge result
Shriiii01 force-pushed the feature/scheduled-ingest-dedup branch 2 times, most recently from 99e109b to dd08087 on March 7, 2026 20:46
2 participants