Skip to content

feat: implement Jeen Skills Integration Framework (G3)#17

Open
stavp-star wants to merge 1 commit into
stav/agent-architecture-performencefrom
stav/enable-skills
Open

feat: implement Jeen Skills Integration Framework (G3)#17
stavp-star wants to merge 1 commit into
stav/agent-architecture-performencefrom
stav/enable-skills

Conversation

@stavp-star

@stavp-star stavp-star commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Description

This PR implements the Jeen Skills Integration Framework (G3), loading domain-specific knowledge (prompt fragments) from the Jeen API, caching them using Redis, and injecting them dynamically at runtime.

Key Changes

  1. JeenSkillClient (agent/src/agent/utils/jeen_client.py): Fetches skills from POST /admin/assets/skills/by-ids via HTTP with configurable API URL and API keys. Gracefully bypasses skill loading if JEEN_LLM_CORE_URL or JEEN_API_KEY are not set.
  2. SkillRegistry (agent/src/agent/utils/skill_registry.py): Manages caching with Redis and compiling skills into prompt strings. Respects the new SKILLS_HOT_RELOAD setting by bypassing the cache when enabled.
  3. Init Skills Node (agent/src/agent/nodes/init_skills.py): Centralizes the skills I/O at the start of the LangGraph flow, updating the state and injecting skills_loaded names into the Langfuse trace metadata.
  4. LangGraph Updates (agent/src/agent/graph.py): Added init_skills node into the execution path, keeping subsequent reasoning nodes pure and relying only on the state.
  5. Node Injections: Updated schema_explorer.py and query_builder.py to inject compiled skill prompt fragments.

Open Questions & Platform Blockers

🛑 Platform Blocks & Limitations:

  1. Skill Tags (TTS-G3-01): The Jeen llm-core API only supports retrieving skills by ID (by-ids / by-users). It does not support a by-tags endpoint or tag-based queries, nor does SkillResponseDto contain tags. As a result, the backend/MCP client must pass explicit IDs in the active_skills list to load.
  2. Skill Types (TTS-G3-01 / TTS-G3-02): Currently, Jeen's SkillResponseDto only provides systemPromptFragment and lacks type indicators (like prompt_fragment, tool, or few_shot). The agent treats all incoming skills as prompt fragments.

❓ Open Questions for Discussion:

  1. Mocking base skills: Should we mock base skills (e.g. general Trino/text2sql skills) locally in the agent for now, or wait for the backend to support listing/tags?
  2. Support for tool and few-shot types: When the backend adds support for non-prompt skills (tools and few-shots), will the schema of SkillResponseDto be updated with structured fields (e.g. skillType, toolDefinition, fewShotExamples) or will it require a custom JSON parsing format?

Summary by CodeRabbit

  • New Features

    • Satisfaction checks validate query results before finalization.
    • Schema enrichment features: semantic typing, join graph analysis, table summarization.
    • Dynamic skills system with hot-reload capability.
    • Strict and hybrid table scoping modes for access control.
    • Human-in-the-loop escalation path for complex queries.
    • Enhanced query refinement with error tracking and context accumulation.
  • Bug Fixes

    • Resilient fallback for data preview when external storage fails.
    • Configuration validation at application startup.
  • Chores

    • Simplified dependency installation and workspace configuration.
    • Comprehensive test coverage added.

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

Failed to post review comments

📝 Walkthrough

Walkthrough

This PR modernizes the build infrastructure by replacing SSH-based Git dependencies with local python-core-utils path sources, rebuilds the LangGraph agent topology with a hardened G2 control flow (HITL escalation node, satisfaction-check gate, validate_config entry point), introduces a Jeen-backed skill system with Redis caching, adds four schema enrichment phases, refactors backend Langfuse tracing from langfuse_context to direct client API, removes scoring.py, and adds a comprehensive test suite.

Changes

Agent G2: Skills, Schema Enrichment, Graph Topology & Node Hardening

Layer / File(s) Summary
AgentState, AgentSettings, and LLM factory
agent/src/agent/state.py, agent/src/agent/config.py, agent/src/agent/llm.py, agent/src/agent/utils/sql.py
AgentState gains 13 new optional fields (skills, error history, scoping mode, HITL escalation, satisfaction tracking). AgentSettings adds Redis URL, Langfuse keys, Jeen config, TABLE_SCOPING_MODE, schema enrichment toggles, satisfaction thresholds, and cache TTLs. get_llm factory replaces direct ChatOpenAI construction. clean_sql normalizes LLM-generated SQL.
JeenSkillClient, SkillRegistry, and init_skills_node
agent/src/agent/utils/jeen_client.py, agent/src/agent/utils/skill_registry.py, agent/src/agent/nodes/init_skills.py
JeenSkillClient fetches active skills from Jeen API with bearer auth. SkillRegistry wraps Redis mget caching with Jeen fallback and builds system prompt additions. init_skills_node loads skills at graph entry, emits Langfuse trace, returns loaded_skills: None on failure.
Schema enrichment phases
agent/src/agent/utils/schema_enrichment.py
Adds run_semantic_typing, run_join_graph (networkx/BFS), run_schema_summarization, and run_ambiguity_detection as feature-gated async LLM enrichment phases with Pydantic output models including ColumnCoverageOutput and SemanticAlignmentOutput reused by satisfaction checks.
schema_explorer_node major rewrite
agent/src/agent/nodes/schema_explorer.py
hybrid_search_tables gains scoping_mode parameter for strict vs hybrid allowlisting. get_table_profile adds Redis cache read/write with ESCA persistence. schema_explorer_node runs optional enrichment phases with a semaphore, injects skills, verifies hallucinated tables via Trino info schema, tracks retries, and returns structured state including escalation_reason.
get_esca_client utility and node refactoring
agent/src/agent/utils/esca.py, agent/src/agent/nodes/extractor.py, agent/src/agent/nodes/query_builder.py, agent/src/agent/nodes/refiner.py, agent/src/agent/nodes/finalizer.py
get_esca_client async context manager replaces direct EscaClient construction. TimeExtractor added. refiner_node accumulates error_history, runs Trino in asyncio.to_thread, captures esca_write_failed, returns inline_result_rows. finalizer_node builds inline preview from inline_result_rows when ESCA write failed. All nodes use get_llm(...) factory.
satisfaction_check_node
agent/src/agent/nodes/satisfaction_check.py
Four feature-flagged checks (execution error, row count bounds, column coverage via LLM, semantic alignment via LLM) run after refiner. Returns satisfaction_failures, satisfaction_fail_count, and escalation_reason when failures reach the configured maximum.
LangGraph G2 topology
agent/src/agent/graph.py
Adds validate_config_node (fail-fast strict mode), hitl_escalation_node (Langfuse tagging), and routing helpers route_schema_explorer, route_refiner, route_satisfaction, route_query_builder, route_rejection. Wires validate_config → init_skills → extractor → schema_explorer → query_builder → refiner → satisfaction_check → finalizer, with HITL escalation compiled with interrupt_before=["hitl_escalation"].
Agent API: chat router and MCP server
agent/src/agent/routers/chat.py, agent/src/agent/mcp_server.py
chat_endpoint replaces module-level Langfuse setup with get_langfuse_handler DI and adds schema_plan to completed responses. chat_with_agent adds active_skills parameter and state validation for resume paths.
Test suite
agent/tests/conftest.py, agent/tests/test_routing.py, agent/tests/test_resilience.py, agent/tests/test_isolation.py, agent/tests/test_cache_and_gates.py
conftest.py provides mock_llm (autouse), mock_langfuse (autouse), mock_redis, mock_trino, mock_esca fixtures. Tests cover routing/escalation logic, refiner error accumulation, ESCA fallback resilience, profile-fetch concurrency, satisfaction-check gating, CacheService SCAN eviction, and Langfuse handler isolation.

Infrastructure & Shared Core Modernization

Layer / File(s) Summary
SSH removal, local python-core-utils, compose wiring
agent/Dockerfile, backend/Dockerfile, docker-compose.yml, agent/pyproject.toml, backend/pyproject.toml, core/pyproject.toml, .gitignore
Both Dockerfiles drop openssh-client and SSH secret mounts in favor of git + uv sync over HTTPS, copying python-core-utils locally. All pyproject.toml files switch to local path sources. docker-compose activates backend seed/init command and removes deploy_key secret.
CacheService: Redis-backed shared cache
core/src/core/cache.py, core/src/core/__init__.py
New CacheService provides async get/set/invalidate with fallback semantics, JSON helpers, typed key builders, and profile invalidation via SCAN/pipeline. get_cache_service singleton factory exported from core package.
CoreSettings Langfuse fields, get_langfuse_handler, trino params, model cleanup
core/src/core/config.py, core/src/core/langfuse.py, core/src/core/trino.py, core/src/core/models/models.py
CoreSettings gains Langfuse credential fields. get_langfuse_handler added as a safe FastAPI dependency. execute_query_sync gains optional params argument. EvaluationHistoryMetricRead and EvalResultStatus removed from core models.

Backend Langfuse Refactor, Cleanup & Health Endpoints

Layer / File(s) Summary
Langfuse context → direct client API migration
backend/app/services/langfuse_client.py, backend/app/services/evaluator.py, backend/app/routers/evaluation.py, backend/app/routers/orchestration.py, backend/app/services/llm_judge.py, backend/pyproject.toml
langfuse_context usage removed across all files; replaced with langfuse_client.client.trace(...) calls guarded by trace ID availability. get_prompt_as_langchain removed. evaluate_with_llm decorated with @observe. Langfuse unpinned to >=2.0.0.
scoring.py removal and publish regression gate removal
backend/app/services/scoring.py, backend/app/routers/publish.py, backend/app/services/trino_client.py, backend/app/services/profiling_engine.py
scoring.py deleted (removes all scoring dataclasses and functions). publish.py removes _check_regression helper. profiling_engine.py imports execute_query_sync from core.
OPENAI_API_KEY validation, health endpoints, infra_init env-vars
backend/app/config.py, backend/app/main.py, backend/app/routers/health.py, backend/app/infra_init.py
Settings gains OPENAI_API_KEY. Startup raises RuntimeError when missing. New GET /health/judge and GET /health/esca endpoints added. infra_init.py replaces hardcoded Docker hostnames with os.environ lookups.
Import reordering across backend files
backend/app/routers/admin_approval.py, backend/app/routers/admin_auth.py, backend/app/routers/audit.py, backend/app/routers/enrichment.py, backend/app/routers/feedback.py, backend/app/routers/questions.py, backend/app/routers/scopes.py, backend/app/services/auth.py, backend/app/services/join_detection.py
Pure import reordering to place core imports before fastapi/sqlmodel imports; no logic changes.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ChatEndpoint as chat_endpoint
  participant Graph as LangGraph G2
  participant SchemaExplorer as schema_explorer_node
  participant Refiner as refiner_node
  participant SatisfactionCheck as satisfaction_check_node
  participant Finalizer as finalizer_node
  participant HITL as hitl_escalation_node

  Client->>ChatEndpoint: POST /chat (query, active_skills)
  ChatEndpoint->>Graph: ainvoke(state, config={callbacks})
  Graph->>Graph: validate_config_node
  Graph->>Graph: init_skills_node (Redis/Jeen)
  Graph->>SchemaExplorer: schema_explorer_node (scoping, enrichment, hallucination check)
  SchemaExplorer-->>HITL: hallucinated tables + retries exhausted
  SchemaExplorer->>Refiner: schema_plan ready
  Refiner->>SatisfactionCheck: SQL executed, inline_result_rows
  SatisfactionCheck-->>Finalizer: all checks pass
  SatisfactionCheck-->>Refiner: retry (fail_count < max)
  SatisfactionCheck-->>HITL: fail_count >= SATISFACTION_MAX_FAILURES
  HITL-->>Client: interrupt (escalation_reason)
  Finalizer->>ChatEndpoint: summary, sql_explanation, schema_plan
  ChatEndpoint->>Client: ChatResponse
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • StavPonte11/text2sql-onboarding#10: Modifies rejection_router_node and route_* routing logic in agent/src/agent/graph.py, directly overlapping with the G2 graph topology rebuilt in this PR.
  • StavPonte11/text2sql-onboarding#13: Modifies agent/src/agent/mcp_server.py's chat_with_agent function contract, the same entry point extended here with active_skills and state validation.

Poem

🐇 Hop, hop through the graph we go,
Skills from Jeen, Redis all aglow,
Satisfaction checks in a row,
HITL gate for when things go slow,
SSH keys? We let those go! 🗝️

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch stav/enable-skills

@elirazpevz elirazpevz changed the base branch from main to stav/agent-architecture-performence June 22, 2026 14:38
@elirazpevz

Copy link
Copy Markdown
Collaborator

A few things:

  • There is now a way to load skills, but I did not see a way for the LLM to actually request / remove skills? so it seems like it either is all loaded or nothing is loaded, both options not solving the actual problem.
  • We should consider using a deep agent instead of raw LLM if we want them to access skills - it has built in skill fetching
  • I prefer implementing this feature only if we are actually going to use it - so, we should think what skills we actually have and where we want to use them. this is important for implementation - for example, I'm not convinced jeen will be our only skills source, and I am not sure which parts of the graph will even use them. I want a more concrete plan for this feature and from the existing code it's not clear how this feature is actually going to work / be used
  • as mentioned, I prefer an abstraction for the skill source - we might want to add skills locally / from other sources and it would be best to be agnostic to it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants