Add Simili Bot for semantic issue triage #283

Kavirubc · 2026-02-12T04:07:23Z

Summary

This PR adds semantic-based issue triage to entireio-cli using Simili Bot. The bot automatically:

Finds similar issues and comments with links
Detects duplicates and warns maintainers
Suggests labels based on issue content

Why This Matters

Managing issues across repositories is hard. Simili Bot uses semantic search (not just keywords) to find related issues, even if they use different words. This helps:

Reduce duplicate issues
Connect related discussions
Speed up triage with auto-labeling

What Changed

Technology Choices

Gemini API, used for two purposes:

Embeddings (gemini-embedding-001): converts issues into vector representations for semantic search
Response generation (gemini-2.0-flash-lite): analyzes issues for labeling, duplicate detection, and quality assessment. Gemini was chosen for its context window size and cost-effectiveness. (Note: the LLM model is hardcoded in v0.1.5 and will be configurable in v0.2.0.)

Qdrant — Open-source vector database. Easy to self-host and has no vendor lock-in. You can run it locally or use their cloud service.

Configuration Files

.github/simili.yaml — Bot configuration (Qdrant + Gemini settings)
.github/workflows/simili.yml — GitHub Action that runs on new issues
scripts/simili-backfill.sh — Optional script to process existing issues

Workflow Triggers

The bot runs on issues: [opened, reopened] only. We excluded edited to prevent comment spam (the bot would post a new triage report every time someone edits an issue).

Safety Analysis

I audited every step of the bot's pipeline to ensure no loops or unintended behavior:

Risk	Guard
Double indexing	Uses deterministic UUID (MD5 of org-repo-number), so re-indexing is an upsert
Transfer loops	Blocks transferring back to the source repository
Bot reacting to itself	Skips comments containing "Simili Triage Report"
Re-triage after transfer	Checks GitHub API for recent transfer events
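
As a rough illustration of the double-indexing guard in the table above, here is a minimal shell sketch of deriving a stable, UUID-shaped ID from an MD5 digest. The function name issue_point_id, the exact "org-repo-number" key layout, and the 8-4-4-4-12 grouping are assumptions made for illustration; Simili Bot's actual implementation may differ. The sketch also assumes md5sum (GNU coreutils) is available.

```shell
# Hypothetical helper: build a stable, UUID-shaped ID for one issue.
# The key layout "org-repo-number" is an assumption, not confirmed bot behavior.
issue_point_id() {
  local org="$1" repo="$2" number="$3"
  local hex
  # md5sum prints "<32 hex chars>  -"; keep only the digest.
  hex=$(printf '%s' "${org}-${repo}-${number}" | md5sum | cut -d' ' -f1)
  # Regroup the 32 hex characters as 8-4-4-4-12, the usual UUID layout.
  printf '%s\n' "$hex" \
    | sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/'
}

issue_point_id entireio cli 283
```

Because the same inputs always produce the same ID, writing an issue a second time overwrites the existing record instead of creating a new one.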

Setup Required

Before merging, add these secrets to the repository:

GEMINI_API_KEY: Google Gemini API key for embeddings and response generation
QDRANT_URL: Qdrant vector database URL
QDRANT_API_KEY: Qdrant API key

Then run manual indexing to populate the database:

simili index --repo entireio/entireio-cli --config .github/simili.yaml

Testing

After merge, create a test issue to verify:

Bot posts a triage comment with similar issues
Labels are suggested
If you create a duplicate, bot warns with [!WARNING]

Known Limitations

Duplicate auto-closing is NOT implemented — The bot only warns. Maintainers must manually close duplicates.
No re-index on edit — If someone edits an issue, the vector DB won't update until you run simili index again.

Both features are planned for v0.2.0 (early March 2026).

Background

I built Simili Bot while working with large open-source repositories to solve the problem of managing thousands of issues across multiple repos. To my knowledge, it is the only implementation that does semantic issue matching and cross-repo triage at this scale.

I'm happy to help with initial indexing and configuration for free. For ongoing support, visit github.com/sponsors/Kavirubc.

Docs: https://simili.mintlify.app
Source: https://github.com/similigh/simili-bot

Questions?

Happy to work with the team to adjust the configuration or answer any questions about how the bot works.

Copilot

Pull request overview

This PR adds Simili Bot, a semantic issue triage system that uses AI to automatically find similar issues, detect duplicates, and suggest labels. The integration uses Gemini API for embeddings and response generation, with Qdrant as the vector database. However, the PR contains critical bugs that prevent it from working correctly, including incorrect repository names throughout the configuration.

Changes:

Adds GitHub Actions workflow to run Simili Bot on new/reopened issues
Adds Simili Bot configuration with Qdrant and Gemini settings
Adds backfill script for processing existing open issues

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.

File	Description
`.github/workflows/simili.yml`	GitHub Actions workflow that triggers Simili Bot on issue creation/reopening
`.github/simili.yaml`	Configuration file for Qdrant vector database, Gemini API, and similarity thresholds
`scripts/simili-backfill.sh`	Bash script to process existing open issues through Simili Bot

Copilot · 2026-02-12T04:12:38Z

.github/workflows/simili.yml

+      issues: write
+      contents: read
+    steps:
+      - uses: actions/checkout@v4


The workflow uses actions/checkout@v4 while other workflows in this repository use actions/checkout@v6. For consistency with the codebase conventions and to ensure the latest features and security fixes, this should be updated to v6.

Copilot · 2026-02-12T04:12:38Z

.github/simili.yaml

+
+repositories:
+  - org: "entireio"
+    repo: "entireio-cli"


The repository name is specified as "entireio-cli" but based on the repository structure and other files (like scripts/install.sh which references "entireio/cli"), the correct repository name should be "cli". This mismatch will prevent the bot from correctly identifying and processing issues from this repository.

Suggested change

-    repo: "entireio-cli"
+    repo: "cli"

Copilot · 2026-02-12T04:12:39Z

.github/simili.yaml

+qdrant:
+  url: "${QDRANT_URL}"
+  api_key: "${QDRANT_API_KEY}"
+  collection: "entireio-cli-issues"


The Qdrant collection name is specified as "entireio-cli-issues", which doesn't align with the actual repository name "entireio/cli". While this may be intentional for the collection name, it's inconsistent with the repository naming convention used throughout the codebase. Consider using "entireio-cli" or another name that clearly matches the repository for easier maintenance and debugging.

Suggested change

-  collection: "entireio-cli-issues"
+  collection: "entireio-cli"

Copilot · 2026-02-12T04:12:39Z

.github/simili.yaml

+  - org: "entireio"
+    repo: "entireio-cli"


The PR description repeatedly refers to the repository as "entireio-cli" and "entireio/entireio-cli", but the actual repository is "entireio/cli" (as evidenced by go.mod, README.md, scripts/install.sh, and all other documentation). This naming discrepancy appears in the configuration files and will cause the integration to fail. The description should be updated to reflect the correct repository name, and all configuration values should use "cli" as the repository name.

Copilot · 2026-02-12T04:12:39Z

scripts/simili-backfill.sh

+  fi
+
+  # Cleanup
+  rm "/tmp/issue-$NUMBER.json"


The cleanup at line 44 never runs if the simili command on line 37 fails and the script exits under set -e. In that case the temporary file is not removed, leaving stale files in /tmp. Consider using a trap so that cleanup always happens on exit.
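
A minimal sketch of the trap approach suggested above. The process_issue wrapper and the placeholder JSON payload are hypothetical; in the real script the file would hold the issue data fetched with gh, and the command passed in would be the simili process call.

```shell
set -eu

# Hypothetical wrapper: run a command against a per-issue temp file and
# always delete that file, whether the command succeeds or fails.
process_issue() {
  local number="$1"
  shift
  (
    tmp=$(mktemp "${TMPDIR:-/tmp}/issue-${number}.XXXXXX")
    # The EXIT trap fires when this subshell ends, on success or on error.
    trap 'rm -f "$tmp"' EXIT
    # Placeholder payload; the real script would write the issue JSON here.
    printf '{"number": %s}\n' "$number" > "$tmp"
    # Run the caller's command with the temp file path as its last argument.
    "$@" "$tmp"
  )
}

# Example call: `cat` stands in for the real processing command.
process_issue 283 cat || echo "Failed to process #283"
```

Running the body in a subshell keeps the trap scoped to a single issue, so a failure on one issue cleans up its own file without affecting the rest of the loop.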

Copilot · 2026-02-12T04:12:40Z

.github/workflows/simili.yml

+      - uses: actions/checkout@v4
+
+      - name: Run Simili Bot
+        uses: similigh/simili-bot@v0.1.5


The workflow uses a third-party GitHub Action (similigh/simili-bot@v0.1.5) with access to repository secrets (GEMINI_API_KEY, QDRANT_URL, QDRANT_API_KEY) and write permissions to issues. For security best practices, consider:

Pinning to a specific commit SHA instead of a version tag to prevent supply chain attacks

Reviewing the action's source code at the specified version

Documenting the trust decision in the repository

This is especially important since the action has access to sensitive API keys that could be exfiltrated if the action is compromised.

Suggested change

-        uses: similigh/simili-bot@v0.1.5
+        # Pinned to a specific commit SHA for supply-chain security.
+        # Replace <COMMIT_SHA_FOR_V0_1_5> with the reviewed commit for v0.1.5.
+        uses: similigh/simili-bot@<COMMIT_SHA_FOR_V0_1_5>
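
One way to check the pinning recommendation mechanically is a small shell helper that lists every uses: reference in a workflow file that is not pinned to a 40-character commit SHA. This is only a sketch: the function name unpinned_actions is hypothetical, and it would also report local actions (paths starting with ./), which may be acceptable in practice.

```shell
# Hypothetical check: print `uses:` references that are not pinned to a
# full 40-character commit SHA. Takes the workflow file path as $1.
unpinned_actions() {
  grep -E '^[[:space:]]*(-[[:space:]]*)?uses:' "$1" \
    | sed -E 's/^[[:space:]]*(-[[:space:]]*)?uses:[[:space:]]*//' \
    | grep -Ev '@[0-9a-f]{40}([[:space:]]|$)' || true
}

# Example: unpinned_actions .github/workflows/simili.yml
```

Against the workflow in this PR it would be expected to report both actions/checkout@v4 and similigh/simili-bot@v0.1.5, since both use version tags.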

Copilot · 2026-02-12T04:12:40Z

scripts/simili-backfill.sh

+  if command -v simili &> /dev/null; then
+    simili process --issue "/tmp/issue-$NUMBER.json" --config "$CONFIG" || echo "    ⚠️  Failed to process #$NUMBER"
+  else
+    echo "    ⚠️  simili CLI not found. Install with: gh extension install similigh/simili-bot"


The installation instruction suggests installing via gh extension install similigh/simili-bot, but this appears to be a GitHub CLI extension installation command for a bot that's actually a GitHub Action. The error message is misleading - users should instead follow the Simili Bot documentation for proper CLI installation. If the simili CLI tool has a different installation method, this message should reflect that.

Suggested change

-    echo "    ⚠️  simili CLI not found. Install with: gh extension install similigh/simili-bot"
+    echo "    ⚠️  simili CLI not found. Please install the simili CLI before running this script (see the Simili documentation for installation instructions)."

Copilot · 2026-02-12T04:12:40Z

.github/workflows/simili.yml

+name: Simili Issue Triage
+
+on:
+  issues:
+    types: [opened, reopened]
+
+jobs:
+  triage:
+    runs-on: ubuntu-latest
+    permissions:
+      issues: write
+      contents: read
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Run Simili Bot
+        uses: similigh/simili-bot@v0.1.5
+        with:
+          config_path: .github/simili.yaml
+        env:
+          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
+          QDRANT_URL: ${{ secrets.QDRANT_URL }}
+          QDRANT_API_KEY: ${{ secrets.QDRANT_API_KEY }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}


The PR description mentions that manual indexing is required after merge ("run simili index --repo entireio/entireio-cli"), but there's no automation or documentation for keeping the index updated. The PR also states that editing issues won't update the vector DB until manual re-indexing. This creates an operational burden where:

The index will become stale over time as issues are edited

Someone must remember to periodically re-run indexing

There's no monitoring or alerting if indexing fails

Consider either:

Adding a scheduled workflow to periodically re-index (e.g., weekly)

Documenting the re-indexing procedure in a MAINTENANCE.md or similar file

Creating an issue to track the upcoming v0.2.0 feature that will handle this automatically

Without this, the integration may degrade in quality over time as the index becomes increasingly stale.

Kavirubc added 3 commits February 12, 2026 09:07

feat: add simili bot configuration

bde7fb0

feat: update simili config and workflow per safety analysis

1e93839

- Remove unimplemented duplicate_closing config fields
- Trigger only on opened/reopened (not edited) to prevent comment spam
- Use v0.1.5 with simplified single-repo config

feat: add optional backfill script for existing issues

3681312

- Uses gh CLI + simili process to apply triage to open issues
- Includes rate limiting (2s between issues)
- Requires simili CLI extension to be installed

Kavirubc requested a review from a team as a code owner February 12, 2026 04:07

Copilot AI review requested due to automatic review settings February 12, 2026 04:07

Copilot started reviewing on behalf of Kavirubc February 12, 2026 04:07

Merge branch 'main' into feature/simili-integration

f98ef88

Copilot AI reviewed Feb 12, 2026

Kavirubc and others added 2 commits February 12, 2026 09:51

Update scripts/simili-backfill.sh

eda70c6

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update scripts/simili-backfill.sh

08c1287

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
