feat: implement Jeen Skills Integration Framework (G3) by stavp-star · Pull Request #17 · StavPonte11/text2sql-onboarding

stavp-star · 2026-06-16T22:20:34Z

Description

This PR implements the Jeen Skills Integration Framework (G3), loading domain-specific knowledge (prompt fragments) from the Jeen API, caching them using Redis, and injecting them dynamically at runtime.

Key Changes

JeenSkillClient (agent/src/agent/utils/jeen_client.py): Fetches skills from POST /admin/assets/skills/by-ids via HTTP with configurable API URL and API keys. Gracefully bypasses skill loading if JEEN_LLM_CORE_URL or JEEN_API_KEY are not set.
SkillRegistry (agent/src/agent/utils/skill_registry.py): Manages caching with Redis and compiling skills into prompt strings. Respects the new SKILLS_HOT_RELOAD setting by bypassing the cache when enabled.
Init Skills Node (agent/src/agent/nodes/init_skills.py): Centralizes the skills I/O at the start of the LangGraph flow, updating the state and injecting skills_loaded names into the Langfuse trace metadata.
LangGraph Updates (agent/src/agent/graph.py): Added init_skills node into the execution path, keeping subsequent reasoning nodes pure and relying only on the state.
Node Injections: Updated schema_explorer.py and query_builder.py to inject compiled skill prompt fragments.

Open Questions & Platform Blockers

🛑 Platform Blocks & Limitations:

Skill Tags (TTS-G3-01): The Jeen llm-core API only supports retrieving skills by ID (by-ids / by-users). It does not support a by-tags endpoint or tag-based queries, nor does SkillResponseDto contain tags. As a result, the backend/MCP client must pass explicit IDs in the active_skills list to load.
Skill Types (TTS-G3-01 / TTS-G3-02): Currently, Jeen's SkillResponseDto only provides systemPromptFragment and lacks type indicators (like prompt_fragment, tool, or few_shot). The agent treats all incoming skills as prompt fragments.

❓ Open Questions for Discussion:

Mocking base skills: Should we mock base skills (e.g. general Trino/text2sql skills) locally in the agent for now, or wait for the backend to support listing/tags?
Support for tool and few-shot types: When the backend adds support for non-prompt skills (tools and few-shots), will the schema of SkillResponseDto be updated with structured fields (e.g. skillType, toolDefinition, fewShotExamples) or will it require a custom JSON parsing format?

Summary by CodeRabbit

New Features
- Satisfaction checks validate query results before finalization.
- Schema enrichment features: semantic typing, join graph analysis, table summarization.
- Dynamic skills system with hot-reload capability.
- Strict and hybrid table scoping modes for access control.
- Human-in-the-loop escalation path for complex queries.
- Enhanced query refinement with error tracking and context accumulation.
Bug Fixes
- Resilient fallback for data preview when external storage fails.
- Configuration validation at application startup.
Chores
- Simplified dependency installation and workspace configuration.
- Comprehensive test coverage added.

coderabbitai · 2026-06-16T22:20:46Z

Caution

Review failed

Failed to post review comments

📝 Walkthrough

Walkthrough

This PR modernizes the build infrastructure by replacing SSH-based Git dependencies with local python-core-utils path sources, rebuilds the LangGraph agent topology with a hardened G2 control flow (HITL escalation node, satisfaction-check gate, validate_config entry point), introduces a Jeen-backed skill system with Redis caching, adds four schema enrichment phases, refactors backend Langfuse tracing from langfuse_context to direct client API, removes scoring.py, and adds a comprehensive test suite.

Changes

Agent G2: Skills, Schema Enrichment, Graph Topology & Node Hardening

Layer / File(s)	Summary
AgentState, AgentSettings, and LLM factory `agent/src/agent/state.py`, `agent/src/agent/config.py`, `agent/src/agent/llm.py`, `agent/src/agent/utils/sql.py`	`AgentState` gains 13 new optional fields (skills, error history, scoping mode, HITL escalation, satisfaction tracking). `AgentSettings` adds Redis URL, Langfuse keys, Jeen config, `TABLE_SCOPING_MODE`, schema enrichment toggles, satisfaction thresholds, and cache TTLs. `get_llm` factory replaces direct `ChatOpenAI` construction. `clean_sql` normalizes LLM-generated SQL.
JeenSkillClient, SkillRegistry, and init_skills_node `agent/src/agent/utils/jeen_client.py`, `agent/src/agent/utils/skill_registry.py`, `agent/src/agent/nodes/init_skills.py`	`JeenSkillClient` fetches active skills from Jeen API with bearer auth. `SkillRegistry` wraps Redis mget caching with Jeen fallback and builds system prompt additions. `init_skills_node` loads skills at graph entry, emits Langfuse trace, returns `loaded_skills: None` on failure.
Schema enrichment phases `agent/src/agent/utils/schema_enrichment.py`	Adds `run_semantic_typing`, `run_join_graph` (networkx/BFS), `run_schema_summarization`, and `run_ambiguity_detection` as feature-gated async LLM enrichment phases with Pydantic output models including `ColumnCoverageOutput` and `SemanticAlignmentOutput` reused by satisfaction checks.
schema_explorer_node major rewrite `agent/src/agent/nodes/schema_explorer.py`	`hybrid_search_tables` gains `scoping_mode` parameter for strict vs hybrid allowlisting. `get_table_profile` adds Redis cache read/write with ESCA persistence. `schema_explorer_node` runs optional enrichment phases with a semaphore, injects skills, verifies hallucinated tables via Trino info schema, tracks retries, and returns structured state including `escalation_reason`.
get_esca_client utility and node refactoring `agent/src/agent/utils/esca.py`, `agent/src/agent/nodes/extractor.py`, `agent/src/agent/nodes/query_builder.py`, `agent/src/agent/nodes/refiner.py`, `agent/src/agent/nodes/finalizer.py`	`get_esca_client` async context manager replaces direct `EscaClient` construction. `TimeExtractor` added. `refiner_node` accumulates `error_history`, runs Trino in `asyncio.to_thread`, captures `esca_write_failed`, returns `inline_result_rows`. `finalizer_node` builds inline preview from `inline_result_rows` when ESCA write failed. All nodes use `get_llm(...)` factory.
satisfaction_check_node `agent/src/agent/nodes/satisfaction_check.py`	Four feature-flagged checks (execution error, row count bounds, column coverage via LLM, semantic alignment via LLM) run after refiner. Returns `satisfaction_failures`, `satisfaction_fail_count`, and `escalation_reason` when failures reach the configured maximum.
LangGraph G2 topology `agent/src/agent/graph.py`	Adds `validate_config_node` (fail-fast strict mode), `hitl_escalation_node` (Langfuse tagging), and routing helpers `route_schema_explorer`, `route_refiner`, `route_satisfaction`, `route_query_builder`, `route_rejection`. Wires validate_config → init_skills → extractor → schema_explorer → query_builder → refiner → satisfaction_check → finalizer, with HITL escalation compiled with `interrupt_before=["hitl_escalation"]`.
Agent API: chat router and MCP server `agent/src/agent/routers/chat.py`, `agent/src/agent/mcp_server.py`	`chat_endpoint` replaces module-level Langfuse setup with `get_langfuse_handler` DI and adds `schema_plan` to completed responses. `chat_with_agent` adds `active_skills` parameter and state validation for resume paths.
Test suite `agent/tests/conftest.py`, `agent/tests/test_routing.py`, `agent/tests/test_resilience.py`, `agent/tests/test_isolation.py`, `agent/tests/test_cache_and_gates.py`	`conftest.py` provides `mock_llm` (autouse), `mock_langfuse` (autouse), `mock_redis`, `mock_trino`, `mock_esca` fixtures. Tests cover routing/escalation logic, refiner error accumulation, ESCA fallback resilience, profile-fetch concurrency, satisfaction-check gating, CacheService SCAN eviction, and Langfuse handler isolation.

Infrastructure & Shared Core Modernization

Layer / File(s)	Summary
SSH removal, local python-core-utils, compose wiring `agent/Dockerfile`, `backend/Dockerfile`, `docker-compose.yml`, `agent/pyproject.toml`, `backend/pyproject.toml`, `core/pyproject.toml`, `.gitignore`	Both Dockerfiles drop `openssh-client` and SSH secret mounts in favor of `git` + `uv sync` over HTTPS, copying `python-core-utils` locally. All `pyproject.toml` files switch to local path sources. docker-compose activates backend seed/init command and removes `deploy_key` secret.
CacheService: Redis-backed shared cache `core/src/core/cache.py`, `core/src/core/__init__.py`	New `CacheService` provides async get/set/invalidate with fallback semantics, JSON helpers, typed key builders, and profile invalidation via SCAN/pipeline. `get_cache_service` singleton factory exported from `core` package.
CoreSettings Langfuse fields, get_langfuse_handler, trino params, model cleanup `core/src/core/config.py`, `core/src/core/langfuse.py`, `core/src/core/trino.py`, `core/src/core/models/models.py`	`CoreSettings` gains Langfuse credential fields. `get_langfuse_handler` added as a safe FastAPI dependency. `execute_query_sync` gains optional `params` argument. `EvaluationHistoryMetricRead` and `EvalResultStatus` removed from core models.

Backend Langfuse Refactor, Cleanup & Health Endpoints

Layer / File(s)	Summary
Langfuse context → direct client API migration `backend/app/services/langfuse_client.py`, `backend/app/services/evaluator.py`, `backend/app/routers/evaluation.py`, `backend/app/routers/orchestration.py`, `backend/app/services/llm_judge.py`, `backend/pyproject.toml`	`langfuse_context` usage removed across all files; replaced with `langfuse_client.client.trace(...)` calls guarded by trace ID availability. `get_prompt_as_langchain` removed. `evaluate_with_llm` decorated with `@observe`. Langfuse unpinned to `>=2.0.0`.
scoring.py removal and publish regression gate removal `backend/app/services/scoring.py`, `backend/app/routers/publish.py`, `backend/app/services/trino_client.py`, `backend/app/services/profiling_engine.py`	`scoring.py` deleted (removes all scoring dataclasses and functions). `publish.py` removes `_check_regression` helper. `profiling_engine.py` imports `execute_query_sync` from `core`.
OPENAI_API_KEY validation, health endpoints, infra_init env-vars `backend/app/config.py`, `backend/app/main.py`, `backend/app/routers/health.py`, `backend/app/infra_init.py`	`Settings` gains `OPENAI_API_KEY`. Startup raises `RuntimeError` when missing. New `GET /health/judge` and `GET /health/esca` endpoints added. `infra_init.py` replaces hardcoded Docker hostnames with `os.environ` lookups.
Import reordering across backend files `backend/app/routers/admin_approval.py`, `backend/app/routers/admin_auth.py`, `backend/app/routers/audit.py`, `backend/app/routers/enrichment.py`, `backend/app/routers/feedback.py`, `backend/app/routers/questions.py`, `backend/app/routers/scopes.py`, `backend/app/services/auth.py`, `backend/app/services/join_detection.py`	Pure import reordering to place `core` imports before `fastapi`/`sqlmodel` imports; no logic changes.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ChatEndpoint as chat_endpoint
  participant Graph as LangGraph G2
  participant SchemaExplorer as schema_explorer_node
  participant Refiner as refiner_node
  participant SatisfactionCheck as satisfaction_check_node
  participant Finalizer as finalizer_node
  participant HITL as hitl_escalation_node

  Client->>ChatEndpoint: POST /chat (query, active_skills)
  ChatEndpoint->>Graph: ainvoke(state, config={callbacks})
  Graph->>Graph: validate_config_node
  Graph->>Graph: init_skills_node (Redis/Jeen)
  Graph->>SchemaExplorer: schema_explorer_node (scoping, enrichment, hallucination check)
  SchemaExplorer-->>HITL: hallucinated tables + retries exhausted
  SchemaExplorer->>Refiner: schema_plan ready
  Refiner->>SatisfactionCheck: SQL executed, inline_result_rows
  SatisfactionCheck-->>Finalizer: all checks pass
  SatisfactionCheck-->>Refiner: retry (fail_count < max)
  SatisfactionCheck-->>HITL: fail_count >= SATISFACTION_MAX_FAILURES
  HITL-->>Client: interrupt (escalation_reason)
  Finalizer->>ChatEndpoint: summary, sql_explanation, schema_plan
  ChatEndpoint->>Client: ChatResponse

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

StavPonte11/text2sql-onboarding#10: Modifies rejection_router_node and route_* routing logic in agent/src/agent/graph.py, directly overlapping with the G2 graph topology rebuilt in this PR.
StavPonte11/text2sql-onboarding#13: Modifies agent/src/agent/mcp_server.py's chat_with_agent function contract, the same entry point extended here with active_skills and state validation.

Poem

🐇 Hop, hop through the graph we go,
Skills from Jeen, Redis all aglow,
Satisfaction checks in a row,
HITL gate for when things go slow,
SSH keys? We let those go! 🗝️

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch stav/enable-skills

elirazpevz · 2026-06-23T08:30:57Z

A few things:

There is now a way to load skills, but I did not see a way for the LLM to actually request / remove skills? so it seems like it either is all loaded or nothing is loaded, both options not solving the actual problem.
We should consider using a deep agent instead of raw LLM if we want them to access skills - it has built in skill fetching
I prefer implementing this feature only if we are actually going to use it - so, we should think what skills we actually have and where we want to use them. this is important for implementation - for example, I'm not convinced jeen will be our only skills source, and I am not sure which parts of the graph will even use them. I want a more concrete plan for this feature and from the existing code it's not clear how this feature is actually going to work / be used
as mentioned, I prefer an abstraction for the skill source - we might want to add skills locally / from other sources and it would be best to be agnostic to it

feat: implement Jeen Skills Integration Framework (G3)

3cb1ede

elirazpevz changed the base branch from main to stav/agent-architecture-performence June 22, 2026 14:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: implement Jeen Skills Integration Framework (G3)#17

feat: implement Jeen Skills Integration Framework (G3)#17
stavp-star wants to merge 1 commit into
stav/agent-architecture-performencefrom
stav/enable-skills

stavp-star commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

elirazpevz commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

stavp-star commented Jun 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Key Changes

Open Questions & Platform Blockers

🛑 Platform Blocks & Limitations:

❓ Open Questions for Discussion:

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

elirazpevz commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

stavp-star commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading