Update docs and some rag work by ossirytk · Pull Request #22 · ossirytk/light-chat

ossirytk · 2026-03-28T05:09:01Z

Fix docs/quality_gate.md:105 - incorrect baseline-capture command → uv run capture-conversation-baselines
Fix docs/future_work/COPILOT_COMPACT_REFERENCE.md - incorrect commands with extra subcommand arg → uv run capture-conversation-baselines
Fix docs/rag_management/03_MANAGE_COLLECTIONS.md:177-178 - dedupe --min-recall/--min-mrr bullet points
Fix scripts/quality_gate.py:89-106 - narrow broad except Exception to specific env-missing exceptions (OSError, ImportError, ModuleNotFoundError, RuntimeError)
Fix docs/rag_management/00_README.md:32,41 - case mismatch QUALITY_GATE.md → quality_gate.md
Fix docs/rag_management/LINTING.md:275 - case mismatch QUALITY_GATE.md → quality_gate.md

Copilot

Pull request overview

This PR consolidates and expands “quality” tooling across RAG + conversation evaluation by adding a unified quality-gate command/workflow, extending persona-drift telemetry export and calibration utilities, and updating docs/fixtures/message-examples accordingly.

Changes:

Add unified quality-gate CLI (and GitHub Actions workflow) covering message-example lint, retrieval fixtures (optional), and conversation fixtures (mock + baseline regression).
Extend conversation session exports with persona-drift config/anchor + persist per-turn drift trace; add drift calibration command.
Add retrieval fixture hard-gates (--min-recall/--min-mrr), trend rendering, and update RAG docs + message examples to the [USER]/[ASSISTANT] format.

Reviewed changes

Copilot reviewed 47 out of 47 changed files in this pull request and generated 7 comments.

File	Description
web_app.py	Adds `quality` block to saved session payload for downstream calibration/analysis.
tests/test_conversation_quality_eval.py	Adds tests for loading session drift turns and calibration report generation + fixture schema checks.
tests/test_context_manager.py	Simplifies history test string formatting.
tests/fixtures/conversation_fixtures_negative.json	Adds negative conversation fixture pack for leakage/OOC checks.
tests/fixtures/conversation_fixtures_hard.json	Adds hard conversation fixture pack for drift/style-break scenarios.
scripts/rag/push_rag_data.py	Formatting-only adjustments in option building.
scripts/rag/migrate_collection_embedding.py	Formatting-only adjustments for fingerprint keys/client construction.
scripts/rag/manage_collections_core_types.py	Extends fixture eval options with optional `min_recall`/`min_mrr`.
scripts/rag/manage_collections_core_evaluation.py	Adds CSV history loader helper and exports it.
scripts/rag/manage_collections_core.py	Formatting-only adjustment to exported name aggregation.
scripts/rag/manage_collections_commands_eval.py	Adds `--min-recall/--min-mrr` gating and a `show-retrieval-trends` command.
scripts/rag/manage_collections_commands_collections.py	Formatting-only adjustment to metadata filtering comprehension.
scripts/rag/lint_message_examples.py	Minor message string formatting + indentation fix in a violation append block.
scripts/rag/analyze_rag_text_enrichment.py	Formatting-only adjustment to category threshold boolean.
scripts/rag/analyze_rag_coverage.py	Formatting-only adjustments; adds blank line.
scripts/quality_gate.py	New unified quality-gate CLI orchestrating lint + retrieval + conversation checks.
scripts/conversation/evaluate_quality.py	Adds session-based persona drift calibration tooling and associated CLI command.
scripts/conversation/capture_baselines.py	New CLI to capture mock-mode baseline artifacts for conversation fixtures.
rag_data/shodan_message_examples.txt	Updates message-example formatting to `[USER]/[ASSISTANT]` style with blank-line separation.
rag_data/leonardo_da_vinci_message_examples.txt	Updates message-example formatting to `[USER]/[ASSISTANT]` style with blank-line separation.
rag_data/legacy/shodan_message_examples.txt	Adds legacy copy of the prior “User:/Persona:” formatted examples.
rag_data/legacy/leonardo_da_vinci_message_examples.txt	Adds legacy copy of the prior “User:/Persona:” formatted examples.
pyproject.toml	Registers `quality-gate` and `capture-conversation-baselines` as project scripts.
docs/rag_management/LINTING.md	Updates quality-gate integration instructions.
docs/rag_management/CATEGORY_CONFIG.md	Clarifies that category assignment is determined at analysis time, not push time.
docs/rag_management/05_WORKFLOWS_TROUBLESHOOTING.md	Updates recommended workflow and module-style invocation examples.
docs/rag_management/03_MANAGE_COLLECTIONS.md	Documents new gating options and retrieval trend display command.
docs/rag_management/02_PUSH_RAG_DATA.md	Updates module invocation and adds pre-push checklist/coverage note.
docs/rag_management/01_ANALYZE_RAG_TEXT.md	Updates examples to module-style invocation.
docs/rag_management/00_README.md	Adds module-invocation guidance and links to the quality gate doc.
docs/quality_gate.md	New end-to-end quality gate documentation.
docs/future_work/UI_REFINEMENTS.md	New UI backlog doc separated from general refinements.
docs/future_work/REFINEMENTS.md	Updates completed items and points UI work to the new UI backlog doc.
docs/future_work/COPILOT_COMPACT_REFERENCE.md	Updates implemented-state reference with new quality/conversation tooling.
docs/context_management/07_IMPLEMENTATION.md	Updates implementation notes to include multi-query, compression, and drift scoring.
docs/context_management/03_CONFIGURATION.md	Updates config docs for new keys and notes on defaults.
docs/context_management/02_HOW_IT_WORKS.md	Updates retrieval flow description for multi-query/compression/drift telemetry.
docs/context_management/01_QUICKSTART.md	Updates quickstart config snippets and naming.
docs/context_management/00_README.md	Updates feature bullets for new retrieval/drift capabilities.
docs/configs/01_CONFIG_V2.md	Documents `conversation_quality` and other new v2 config keys.
docs/configs/00_README.md	Notes config tracking behavior and lack of example config file.
docs/RAG_SCRIPTS_GUIDE.md	Updates preferred invocation and adds a canonical workflow section.
core/conversation_retrieval_orchestration_mixin.py	Formatting-only change to `mes_trace` construction when MES retrieval is disabled.
core/conversation_response_mixin.py	Persists persona drift trace in exported state; clears/imports trace.
core/conversation_manager.py	Initializes `persona_drift_trace` deque alongside drift history.
README.md	Updates RAG workflow docs to module invocation + adds optional quality gates before push.
.github/workflows/quality_gate.yml	Adds CI workflow running ruff, pytest, baseline capture, and quality gate (skip retrieval).
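The table says only that a `persona_drift_trace` deque is initialized next to the drift history, and that the trace is persisted in exported state and cleared/imported on load. Below is a minimal sketch of that round trip. It assumes a bounded deque and a JSON-friendly export. The class name, the `maxlen` default, the method names, and the entry fields are hypothetical and do not come from the project's code.

```python
from __future__ import annotations

from collections import deque
from typing import Any


class DriftTraceSketch:
    """Hypothetical stand-in for the per-turn drift state on the conversation manager."""

    def __init__(self, maxlen: int = 50) -> None:
        # A bounded deque drops the oldest entry automatically once maxlen is reached.
        self.persona_drift_trace: deque[dict[str, Any]] = deque(maxlen=maxlen)

    def record_turn(self, turn: int, score: float) -> None:
        # One entry per assistant turn; the field names are illustrative only.
        self.persona_drift_trace.append({"turn": turn, "drift_score": score})

    def export_state(self) -> dict[str, Any]:
        # deque is not JSON-serializable, so the export converts it to a plain list.
        return {"persona_drift_trace": list(self.persona_drift_trace)}

    def import_state(self, state: dict[str, Any]) -> None:
        # Clear first so an import never mixes stale and restored turns.
        self.persona_drift_trace.clear()
        self.persona_drift_trace.extend(state.get("persona_drift_trace", []))
```

Exporting a list rather than the deque itself keeps the saved session payload serializable. Re-extending into a deque with the same `maxlen` restores the same bounded window.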

Comments suppressed due to low confidence (1)

docs/rag_management/03_MANAGE_COLLECTIONS.md:404

show-retrieval-trends is documented earlier in this file already, and this later section repeats the same command/option text (and duplicates the section number). Please remove one of the sections (or replace the second with a short cross-link) to avoid documentation drift.

### 10) `show-retrieval-trends`

Display a compact trend table from a retrieval eval history CSV, showing Recall@k and MRR per run with deltas vs the previous row.
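The delta logic described here compares each run with the previous row, and it fits in a few lines of stdlib Python. This is a hypothetical illustration, not the project's CSV history loader. The `recall_at_k`/`mrr` column names and the `trend_rows` helper are assumptions, and the real `history.csv` headers may differ.

```python
from __future__ import annotations

import csv
import io
from typing import Any


def trend_rows(csv_text: str, last_n: int | None = None) -> list[dict[str, Any]]:
    """Annotate each history row with deltas against the previous row.

    Assumes hypothetical column names "recall_at_k" and "mrr".
    """
    annotated: list[dict[str, Any]] = []
    prev: tuple[float, float] | None = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        recall, mrr = float(row["recall_at_k"]), float(row["mrr"])
        annotated.append({
            **row,
            # The first row has nothing to compare against, so its deltas are None.
            "d_recall": None if prev is None else recall - prev[0],
            "d_mrr": None if prev is None else mrr - prev[1],
        })
        prev = (recall, mrr)
    # Slice after computing deltas so the first visible row still has a delta.
    if last_n is not None:
        annotated = annotated[-last_n:] if last_n > 0 else []
    return annotated
```

Computing deltas over the full history before applying a `--last-n`-style window is a design choice of this sketch. The oldest row shown then still reports its change against the run before it.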

```bash
uv run python -m scripts.rag.manage_collections show-retrieval-trends \
  --history-csv logs/retrieval_eval/history.csv
```

Limit to the most recent N rows:

```bash
uv run python -m scripts.rag.manage_collections show-retrieval-trends \
  --history-csv logs/retrieval_eval/history.csv \
  --last-n 10
```

Options:

- `--history-csv` (default: `logs/retrieval_eval/history.csv`)
- `--last-n` (default: show all rows)

The table prints columns: #, Date, Fixture, Mode, k, Recall@k, MRR, dRecall, dMRR.

Copilot · 2026-03-28T05:14:06Z

+Before using baseline comparison, capture canonical baselines in mock mode:
+
+```bash
+uv run python -m scripts.conversation.capture_baselines capture-conversation-baselines


This baseline-capture invocation is incorrect for the current scripts/conversation/capture_baselines.py module: python -m scripts.conversation.capture_baselines already runs the click command, so passing capture-conversation-baselines as an extra argument will cause a click usage error. Update the docs to either use uv run python -m scripts.conversation.capture_baselines or the [project.scripts] entrypoint uv run capture-conversation-baselines.

Suggested change

uv run python -m scripts.conversation.capture_baselines capture-conversation-baselines

uv run capture-conversation-baselines
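The usage error follows from how click dispatches arguments. The sketch below uses click's own test runner. It assumes `capture_baselines.py` defines a single `@click.command()` with no positional arguments and invokes it when run with `python -m`. The `capture`, `cli`, and `--output-dir` names are hypothetical.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option("--output-dir", default="baselines", show_default=True)
def capture(output_dir: str) -> None:
    """Stand-in for a single click command such as the baseline-capture entrypoint."""
    click.echo(f"capturing baselines into {output_dir}")


@click.group()
def cli() -> None:
    """Stand-in for a click group, the only shape that expects a subcommand name."""


@cli.command("capture-conversation-baselines")
def capture_sub() -> None:
    click.echo("capturing baselines via subcommand")


runner = CliRunner()
plain = runner.invoke(capture, [])                                   # runs the command
extra = runner.invoke(capture, ["capture-conversation-baselines"])  # rejected as an extra argument
grouped = runner.invoke(cli, ["capture-conversation-baselines"])    # valid only for a group
```

Exit code 2 is click's standard code for usage errors. Only a `@click.group()` treats the extra token as a subcommand name. For a single command, the bare `python -m` form and the `[project.scripts]` entrypoint are the two spellings that work.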

Copilot · 2026-03-28T05:14:07Z

 uv run ruff format .
+uv run python -m scripts.conversation.evaluate_quality evaluate-conversation-fixtures
+uv run python -m scripts.conversation.evaluate_quality calibrate-persona-drift
+uv run python -m scripts.conversation.capture_baselines capture-conversation-baselines


The operational command shown here is likely incorrect: uv run python -m scripts.conversation.capture_baselines already runs the click command, so adding capture-conversation-baselines as an extra argument will fail. Prefer either uv run python -m scripts.conversation.capture_baselines or uv run capture-conversation-baselines (from [project.scripts]).

Suggested change

uv run python -m scripts.conversation.capture_baselines capture-conversation-baselines

uv run python -m scripts.conversation.capture_baselines

Copilot · 2026-03-28T05:14:07Z

+- `--min-recall` (exit non-zero if Recall@k falls below this threshold)
+- `--min-mrr` (exit non-zero if MRR falls below this threshold)


The --min-recall / --min-mrr options are listed twice, which makes it unclear whether they differ. Please dedupe these bullet points (keep a single description per flag).

Suggested change

- `--min-recall` (exit non-zero if Recall@k falls below this threshold)

- `--min-mrr` (exit non-zero if MRR falls below this threshold)
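The two flags reduce to comparing aggregate metrics against optional floors. The sketch below makes two assumptions, and the project's own definitions may differ. Recall@k is taken to be the fraction of a case's relevant ids found in its top k. MRR is taken to be the mean reciprocal rank of the first relevant hit within the top k. `gate` and the case tuples are hypothetical.

```python
from __future__ import annotations


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the relevant ids that appear in the top-k results.
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str], k: int) -> float:
    # 1/rank of the first relevant hit inside the top k, or 0.0 if there is none.
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def gate(
    cases: list[tuple[list[str], set[str]]],
    k: int,
    min_recall: float | None = None,
    min_mrr: float | None = None,
) -> tuple[float, float, list[str]]:
    """Average both metrics over all cases and collect threshold violations."""
    if not cases:
        raise ValueError("no fixture cases to evaluate")
    recall = sum(recall_at_k(r, rel, k) for r, rel in cases) / len(cases)
    mrr = sum(reciprocal_rank(r, rel, k) for r, rel in cases) / len(cases)
    failures: list[str] = []
    # A threshold left as None disables that check, mirroring optional CLI flags.
    if min_recall is not None and recall < min_recall:
        failures.append(f"Recall@{k} {recall:.3f} < min_recall {min_recall:.3f}")
    if min_mrr is not None and mrr < min_mrr:
        failures.append(f"MRR {mrr:.3f} < min_mrr {min_mrr:.3f}")
    return recall, mrr, failures
```

A CLI wrapper would call `sys.exit(1)` whenever `failures` is non-empty. That gives the exit-non-zero behavior the flag descriptions promise, while both checks stay off by default.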

Copilot · 2026-03-28T05:14:07Z

+    try:
+        options = FixtureEvalOptions(
+            fixture_file=fixture_file,
+            k=None,
+            retrieval_mode="similarity",
+            persist_directory=None,
+            embedding_model=None,
+            embedding_device=None,
+            show_failures=False,
+            min_recall=min_recall,
+            min_mrr=min_mrr,
+        )
+        run = _execute_fixture_evaluation(options)
+    except click.ClickException as exc:
+        return "fail", str(exc.format_message())
+    except Exception as exc:
+        return "skip", f"Retrieval evaluation unavailable: {exc}"
+


The retrieval step treats any non-ClickException as "skip" (except Exception: return 'skip' ...). This can mask real bugs (e.g., parsing errors, regressions inside evaluation) and produce a false-green gate. Consider only skipping for known “environment missing” failure modes (e.g., Chroma/embedding initialization errors) and failing (or re-raising) for unexpected exceptions, possibly with a stack trace in the output.
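The reviewer distinguishes expected environment gaps from genuine bugs, and a stdlib-only stand-in can illustrate that split. The sketch assumes the evaluator signals gate failures with a dedicated exception; here that is a hypothetical `GateFailure` standing in for `click.ClickException`. It uses the same exception families the follow-up commit settled on. `run_retrieval_step` and `ENV_MISSING` are illustrative names.

```python
from __future__ import annotations

from collections.abc import Callable


class GateFailure(Exception):
    """Hypothetical stand-in for click.ClickException, keeping the sketch dependency-free."""


# Treated as "environment missing". ModuleNotFoundError subclasses ImportError,
# so listing ImportError already covers it.
ENV_MISSING: tuple[type[BaseException], ...] = (OSError, ImportError, RuntimeError)


def run_retrieval_step(evaluate: Callable[[], str]) -> tuple[str, str]:
    try:
        detail = evaluate()
    except GateFailure as exc:
        # Threshold or fixture failures reported by the evaluator fail the gate.
        return "fail", str(exc)
    except ENV_MISSING as exc:
        # Missing vector store, embedding model, or optional dependency: skip.
        return "skip", f"Retrieval evaluation unavailable: {exc}"
    # Anything else (KeyError, TypeError, ...) propagates and fails the run loudly.
    return "pass", detail
```

Even a narrowed tuple of this kind is wide. `RuntimeError` also matches `RecursionError` and `NotImplementedError`, and `OSError` matches `FileNotFoundError` for a mistyped fixture path. Those failures would still be reported as a skip rather than a fail.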

Copilot · 2026-03-28T05:14:07Z

 4. `04_OLD_PREPARE_RAG.md`
 5. `05_WORKFLOWS_TROUBLESHOOTING.md`

+See also `docs/QUALITY_GATE.md` for the unified quality gate reference.


docs/QUALITY_GATE.md is referenced here, but the file added in this PR is docs/quality_gate.md (different casing). On case-sensitive filesystems (Linux CI), this link will be broken; update the reference (or rename the file) so the path matches exactly.
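Each case-mismatch comment in this thread comes down to one check: does the referenced path exist with exactly this spelling? Below is a minimal stdlib sketch that could catch this class of bug before CI. It assumes references are written as backticked, repo-relative `docs/...md` paths. The regex and the helper names are hypothetical and not part of the PR.

```python
from __future__ import annotations

import os
import re
from pathlib import Path

# Assumed reference style: backticked repo-relative paths such as `docs/quality_gate.md`.
DOC_REF = re.compile(r"`(docs/[^`\s]+\.md)`")


def exists_exact_case(root: Path, rel: str) -> bool:
    """True only if every component of rel matches an on-disk name exactly.

    Path.exists() can succeed for a wrong-case path on case-insensitive
    filesystems (default macOS and Windows setups), so compare against the
    real directory listing instead.
    """
    current = root
    for part in Path(rel).parts:
        try:
            entries = os.listdir(current)
        except (FileNotFoundError, NotADirectoryError):
            return False
        if part not in entries:
            return False
        current = current / part
    return True


def broken_doc_refs(root: Path, text: str) -> list[str]:
    # Report every referenced doc path that does not match the tree exactly.
    return [ref for ref in DOC_REF.findall(text) if not exists_exact_case(root, ref)]
```

Comparing against `os.listdir` output makes the check behave the same on every platform. A developer on macOS or Windows then sees the same failure a Linux runner would.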

Copilot · 2026-03-28T05:14:07Z

+The unified quality gate (`docs/QUALITY_GATE.md`) runs RAG lint, retrieval fixtures, and
+conversation fixture evaluation in one command. See that document for the full guide.


This section links to docs/QUALITY_GATE.md, but the actual document in this PR is docs/quality_gate.md. Please align the casing so the link works on case-sensitive filesystems (Linux, and macOS volumes formatted as case-sensitive).

Copilot · 2026-03-28T05:14:08Z

 ```

-This runs linting, coverage scoring, and retrieval benchmarks together.
+The gate runs message-example linting, conversation fixture evaluation, and (optionally) retrieval fixture evaluation in sequence, printing a PASS/WARN/FAIL table and exiting non-zero if any step fails. See `docs/QUALITY_GATE.md` for full usage.


The doc points to docs/QUALITY_GATE.md, but the quality gate doc added in this PR is docs/quality_gate.md. This mismatch will break links on case-sensitive filesystems; update the reference or rename the target file to match.

Suggested change

The gate runs message-example linting, conversation fixture evaluation, and (optionally) retrieval fixture evaluation in sequence, printing a PASS/WARN/FAIL table and exiting non-zero if any step fails. See `docs/QUALITY_GATE.md` for full usage.

The gate runs message-example linting, conversation fixture evaluation, and (optionally) retrieval fixture evaluation in sequence, printing a PASS/WARN/FAIL table and exiting non-zero if any step fails. See `docs/quality_gate.md` for full usage.
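The sentence in this suggestion describes the gate's contract. Steps run in sequence, results land in a PASS/WARN/FAIL table, and any failing step makes the command exit non-zero. The sketch below covers only the reporting half. It assumes each step returns a `(name, status, detail)` tuple using lowercase statuses like the `"fail"`/`"skip"` values in the retrieval snippet. `render_gate`, `LABELS`, and the skip-to-WARN mapping are hypothetical.

```python
from __future__ import annotations

# Assumed mapping from per-step status to table label; treating "skip" as WARN
# is an illustrative choice, not confirmed by the PR.
LABELS = {"pass": "PASS", "warn": "WARN", "skip": "WARN", "fail": "FAIL"}


def render_gate(results: list[tuple[str, str, str]]) -> tuple[str, int]:
    """Render (step, status, detail) rows as a table and derive the exit code."""
    width = max([len("Step")] + [len(name) for name, _, _ in results])
    lines = [f"{'Step':<{width}}  Status  Detail"]
    for name, status, detail in results:
        lines.append(f"{name:<{width}}  {LABELS[status]:<6}  {detail}".rstrip())
    # Only FAIL makes the process exit non-zero; WARN rows are informational.
    exit_code = 1 if any(status == "fail" for _, status, _ in results) else 0
    return "\n".join(lines), exit_code
```

Keeping rendering and exit-code derivation in one pure function makes the gate's pass/fail decision easy to unit test without running any of the steps.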

ruff was only installed globally as a uv tool locally, but was not declared in pyproject.toml. GitHub Actions runs uv sync --dev which only installs declared dependencies, so ruff was absent on CI runners causing 'Failed to spawn: ruff / No such file or directory'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ruff EXE001 requires files with shebang lines to have the executable bit set. On Windows this bit is never set automatically, but the CI runner is Linux where ruff enforces it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
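From a Windows checkout, one way to satisfy EXE001 is `git update-index --chmod=+x <path>`, which records the bit in the Git index even though the local filesystem does not carry it. The commit does not say whether that or removing the shebang was the chosen fix. Below is a minimal sketch for spotting affected files on a POSIX machine. `shebang_without_exec_bit` is a hypothetical helper that approximates the condition behind ruff's rule rather than reproducing ruff's implementation.

```python
from __future__ import annotations

import os
from pathlib import Path


def shebang_without_exec_bit(paths: list[Path]) -> list[Path]:
    """Return files that start with '#!' but have no execute permission bit set."""
    flagged: list[Path] = []
    for path in paths:
        with path.open("rb") as handle:
            if handle.read(2) != b"#!":
                continue
        # 0o111 covers the user, group, and other execute bits.
        if not os.stat(path).st_mode & 0o111:
            flagged.append(path)
    return flagged
```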

ossirytk · 2026-03-28T05:31:59Z

@copilot apply changes based on the comments in this thread

Copilot · 2026-03-28T05:36:38Z

@copilot apply changes based on the comments in this thread

Applied all changes from the review thread in commit 1d958f2:

docs/quality_gate.md — fixed baseline-capture command to uv run capture-conversation-baselines
docs/future_work/COPILOT_COMPACT_REFERENCE.md — fixed the same incorrect command in both the prose (line 36) and the code block (line 227)
docs/rag_management/03_MANAGE_COLLECTIONS.md — removed duplicate --min-recall/--min-mrr bullet points
scripts/quality_gate.py — narrowed the broad except Exception to (OSError, ImportError, ModuleNotFoundError, RuntimeError) so only known environment-missing failures produce a skip; unexpected exceptions propagate normally
docs/rag_management/00_README.md and docs/rag_management/LINTING.md — fixed case-mismatch references QUALITY_GATE.md → quality_gate.md

Update docs and some rag work

6b2dfde

Copilot AI review requested due to automatic review settings March 28, 2026 05:09

Copilot started reviewing on behalf of ossirytk March 28, 2026 05:09

Copilot AI reviewed Mar 28, 2026

ossirytk and others added 2 commits March 28, 2026 07:15

Copilot started work on behalf of ossirytk March 28, 2026 05:32

fix: address PR review comments - docs corrections and narrow excepti…

1d958f2

…on handler Agent-Logs-Url: https://github.com/ossirytk/light-chat/sessions/1563bc6c-3872-436c-b454-40f2b5e71c59 Co-authored-by: ossirytk <136164501+ossirytk@users.noreply.github.com>

Copilot finished work on behalf of ossirytk March 28, 2026 05:37

ossirytk merged commit 1c876f0 into main Mar 28, 2026
2 checks passed

ossirytk deleted the vibing-rag-quality-work branch March 28, 2026 05:42

ossirytk merged 4 commits into main from vibing-rag-quality-work

ossirytk commented Mar 28, 2026 • edited by Copilot AI

3 participants