Kavirubc commented Feb 12, 2026

Summary

This PR adds semantic-based issue triage to entireio-cli using Simili Bot. The bot automatically:

  • Finds similar issues and comments with links
  • Detects duplicates and warns maintainers
  • Suggests labels based on issue content

Why This Matters

Managing issues across repositories is hard. Simili Bot uses semantic search (not just keywords) to find related issues, even if they use different words. This helps:

  • Reduce duplicate issues
  • Connect related discussions
  • Speed up triage with auto-labeling

What Changed

Technology Choices

Gemini API — Used for two purposes:

  • Embeddings (gemini-embedding-001): converts issues into vector representations for semantic search
  • Response generation (gemini-2.0-flash-lite): analyzes issues for labeling, duplicate detection, and quality assessment. Gemini offers large context windows and is cost-effective. (Note: the LLM model is hardcoded in v0.1.5 and will be configurable in v0.2.0.)

Qdrant — Open-source vector database. Easy to self-host and has no vendor lock-in. You can run it locally or use their cloud service.

Configuration Files

  • .github/simili.yaml — Bot configuration (Qdrant + Gemini settings)
  • .github/workflows/simili.yml — GitHub Action that runs on new issues
  • scripts/simili-backfill.sh — Optional script to process existing issues

Workflow Triggers

The bot runs on issues: [opened, reopened] only. We excluded edited to prevent comment spam (the bot would post a new triage report every time someone edits an issue).

Safety Analysis

I audited every step of the bot's pipeline to ensure no loops or unintended behavior:

| Risk | Guard |
| --- | --- |
| Double indexing | Uses deterministic UUID (MD5 of org-repo-number), so re-indexing is an upsert |
| Transfer loops | Blocks transferring back to the source repository |
| Bot reacting to itself | Skips comments containing "Simili Triage Report" |
| Re-triage after transfer | Checks GitHub API for recent transfer events |
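
The double-indexing guard above can be illustrated with a short sketch. It assumes the ID is the MD5 digest of the string org-repo-number laid out as a UUID. The exact key format, the helper name `issue_point_id`, and the use of `md5sum` from GNU coreutils are assumptions for illustration, not Simili Bot's actual code.

```shell
#!/bin/sh
# Hypothetical helper: derive a stable UUID-shaped ID from org, repo, and issue number.
# Assumes GNU coreutils `md5sum`; the "org-repo-number" key comes from the table
# above, everything else here is illustrative.
issue_point_id() {
    key="$1-$2-$3"
    # md5sum prints "<32 hex chars>  -"; keep only the digest.
    hex=$(printf '%s' "$key" | md5sum | cut -d' ' -f1)
    # Insert dashes to get the 8-4-4-4-12 UUID layout.
    printf '%s\n' "$hex" | sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/'
}

# The same input always yields the same ID, so both lines print identical values.
issue_point_id entireio entireio-cli 42
issue_point_id entireio entireio-cli 42
```

Because the ID depends only on the issue's coordinates, writing the same issue twice targets the same point in the vector database and overwrites it rather than creating a second entry.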

Setup Required

Before merging, add these secrets to the repository:

  1. GEMINI_API_KEY: Google Gemini API key, used for embeddings and response generation
  2. QDRANT_URL: Qdrant vector database URL
  3. QDRANT_API_KEY: Qdrant API key

Then run manual indexing to populate the database:

simili index --repo entireio/entireio-cli --config .github/simili.yaml

Testing

After merge, create a test issue to verify:

  1. Bot posts a triage comment with similar issues
  2. Labels are suggested
  3. If you create a duplicate, bot warns with [!WARNING]

Known Limitations

  • Duplicate auto-closing is NOT implemented — The bot only warns. Maintainers must manually close duplicates.
  • No re-index on edit — If someone edits an issue, the vector DB won't update until you run simili index again.

Both features are planned for v0.2.0 (early March 2026).

Background

I built Simili Bot while working with large open-source repositories, to manage thousands of issues across multiple repos. As far as I know, it is the only implementation that does semantic issue matching and cross-repo triage at this scale.

I'm happy to help with initial indexing and configuration for free. For ongoing support, visit github.com/sponsors/Kavirubc.

Docs: https://simili.mintlify.app
Source: https://github.com/similigh/simili-bot

Questions?

Happy to work with the team to adjust the configuration or answer any questions about how the bot works.

- Remove unimplemented duplicate_closing config fields
- Trigger only on opened/reopened (not edited) to prevent comment spam
- Use v0.1.5 with simplified single-repo config
- Uses gh CLI + simili process to apply triage to open issues
- Includes rate limiting (2s between issues)
- Requires simili CLI extension to be installed
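
The backfill behaviour described in the last three bullets can be sketched roughly as follows. This is not the contents of scripts/simili-backfill.sh: the function name `backfill_each` and the `DELAY` variable are hypothetical, and the per-issue command is passed in by the caller.

```shell
#!/bin/sh
# Hypothetical sketch: run a command once per issue number read from stdin,
# pausing between items.
# Usage: backfill_each COMMAND [ARGS...] < list-of-issue-numbers
DELAY="${DELAY:-2}"   # seconds between issues, matching the 2s mentioned above

backfill_each() {
    while read -r number; do
        [ -n "$number" ] || continue
        # A failure on one issue is reported but does not stop the loop.
        # stdin is redirected so the command cannot consume the issue list.
        "$@" "$number" < /dev/null || echo "Failed to process #$number" >&2
        sleep "$DELAY"
    done
}

# Demo with a harmless command and no delay:
printf '1\n2\n' | DELAY=0 backfill_each echo "would process issue"
```

In a real run the list could come from something like `gh issue list --repo entireio/entireio-cli --state open --json number --jq '.[].number'`, piped into `backfill_each` with a wrapper that fetches each issue and calls `simili process`.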
@Kavirubc Kavirubc requested a review from a team as a code owner February 12, 2026 04:07
Copilot AI review requested due to automatic review settings February 12, 2026 04:07

Copilot AI left a comment

Pull request overview

This PR adds Simili Bot, a semantic issue triage system that uses AI to automatically find similar issues, detect duplicates, and suggest labels. The integration uses Gemini API for embeddings and response generation, with Qdrant as the vector database. However, the PR contains critical bugs that prevent it from working correctly, including incorrect repository names throughout the configuration.

Changes:

  • Adds GitHub Actions workflow to run Simili Bot on new/reopened issues
  • Adds Simili Bot configuration with Qdrant and Gemini settings
  • Adds backfill script for processing existing open issues

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| .github/workflows/simili.yml | GitHub Actions workflow that triggers Simili Bot on issue creation/reopening |
| .github/simili.yaml | Configuration file for Qdrant vector database, Gemini API, and similarity thresholds |
| scripts/simili-backfill.sh | Bash script to process existing open issues through Simili Bot |

      issues: write
      contents: read
    steps:
      - uses: actions/checkout@v4

Copilot AI Feb 12, 2026

The workflow uses actions/checkout@v4 while other workflows in this repository use actions/checkout@v6. For consistency with the codebase conventions and to ensure the latest features and security fixes, this should be updated to v6.

Copilot uses AI. Check for mistakes.

repositories:
  - org: "entireio"
    repo: "entireio-cli"

Copilot AI Feb 12, 2026

The repository name is specified as "entireio-cli" but based on the repository structure and other files (like scripts/install.sh which references "entireio/cli"), the correct repository name should be "cli". This mismatch will prevent the bot from correctly identifying and processing issues from this repository.

Suggested change
-    repo: "entireio-cli"
+    repo: "cli"

qdrant:
  url: "${QDRANT_URL}"
  api_key: "${QDRANT_API_KEY}"
  collection: "entireio-cli-issues"

Copilot AI Feb 12, 2026

The Qdrant collection name is specified as "entireio-cli-issues" which doesn't align with the actual repository name "entireio/cli". While this may be intentional for the collection name, it's inconsistent with the repository naming convention used throughout the codebase. Consider using "entireio-cli-issues" or a name that clearly matches the repository for easier maintenance and debugging.

Suggested change
-  collection: "entireio-cli-issues"
+  collection: "entireio-cli"

Comment on lines +19 to +20
  - org: "entireio"
    repo: "entireio-cli"

Copilot AI Feb 12, 2026

The PR description repeatedly refers to the repository as "entireio-cli" and "entireio/entireio-cli", but the actual repository is "entireio/cli" (as evidenced by go.mod, README.md, scripts/install.sh, and all other documentation). This naming discrepancy appears in the configuration files and will cause the integration to fail. The description should be updated to reflect the correct repository name, and all configuration values should use "cli" as the repository name.

fi

# Cleanup
rm "/tmp/issue-$NUMBER.json"

Copilot AI Feb 12, 2026

The cleanup operation at line 44 will fail if the simili command fails on line 37 and exits due to set -e. The temporary file will not be removed in error cases, leading to leftover files in /tmp. Consider using a trap to ensure cleanup happens on exit, or move the cleanup to a finally-like construct.
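
A minimal sketch of the trap-based cleanup suggested here, assuming the script creates one temporary file per issue. The wrapper name `process_issue` and the use of `mktemp` are hypothetical stand-ins; the script under review writes to /tmp/issue-$NUMBER.json and calls `simili process`.

```shell
#!/bin/sh
set -e

# Hypothetical wrapper around the per-issue step of a backfill script.
# Usage: process_issue NUMBER COMMAND [ARGS...]
# The temp file path is appended as the last argument to COMMAND.
# The body runs in a subshell, so the EXIT trap fires once per call.
process_issue() (
    number="$1"
    shift
    tmpfile=$(mktemp "${TMPDIR:-/tmp}/issue-$number.XXXXXX")
    # Remove the temp file when the subshell exits, on success or failure.
    trap 'rm -f "$tmpfile"' EXIT
    # Placeholder payload; a real script would write the issue JSON here.
    printf '{"number": %s}\n' "$number" > "$tmpfile"
    "$@" "$tmpfile"
)

# Demo with a harmless command in place of the real one:
process_issue 42 cat
```

With this shape, a call such as `process_issue "$NUMBER" simili process --config "$CONFIG" --issue` would leave no file behind even when the command fails, because the cleanup no longer depends on reaching a later line of the script.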

      - uses: actions/checkout@v4

      - name: Run Simili Bot
        uses: similigh/simili-bot@v0.1.5

Copilot AI Feb 12, 2026

The workflow uses a third-party GitHub Action (similigh/simili-bot@v0.1.5) with access to repository secrets (GEMINI_API_KEY, QDRANT_URL, QDRANT_API_KEY) and write permissions to issues. For security best practices, consider:

  1. Pinning to a specific commit SHA instead of a version tag to prevent supply chain attacks
  2. Reviewing the action's source code at the specified version
  3. Documenting the trust decision in the repository

This is especially important since the action has access to sensitive API keys that could be exfiltrated if the action is compromised.

Suggested change
-        uses: similigh/simili-bot@v0.1.5
+        # Pinned to a specific commit SHA for supply-chain security.
+        # Replace <COMMIT_SHA_FOR_V0_1_5> with the reviewed commit for v0.1.5.
+        uses: similigh/simili-bot@<COMMIT_SHA_FOR_V0_1_5>
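
One way to look up the commit SHA to pin, sketched under the assumption that the v0.1.5 tag is published on the public repository. `git ls-remote` is a standard Git command; the helper name `pick_commit_sha` is hypothetical. For an annotated tag, `ls-remote` also prints a peeled `^{}` entry, and that entry holds the commit SHA rather than the tag object's SHA.

```shell
#!/bin/sh
# Hypothetical helper: read `git ls-remote --tags` output on stdin and print the
# commit SHA for the given tag, preferring the peeled "^{}" entry when present.
pick_commit_sha() {
    tag="$1"
    awk -v ref="refs/tags/$tag" '
        $2 == ref "^{}" { peeled = $1 }
        $2 == ref       { direct = $1 }
        END {
            if (peeled != "") print peeled
            else if (direct != "") print direct
            else exit 1
        }'
}

# Usage (needs network access, so it is left commented out):
# git ls-remote --tags https://github.com/similigh/simili-bot "v0.1.5*" | pick_commit_sha v0.1.5
```

The printed SHA should still be checked against the reviewed source before it replaces the placeholder in the workflow.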

if command -v simili &> /dev/null; then
    simili process --issue "/tmp/issue-$NUMBER.json" --config "$CONFIG" || echo " ⚠️ Failed to process #$NUMBER"
else
    echo " ⚠️ simili CLI not found. Install with: gh extension install similigh/simili-bot"

Copilot AI Feb 12, 2026

The installation instruction suggests installing via gh extension install similigh/simili-bot, but this appears to be a GitHub CLI extension installation command for a bot that's actually a GitHub Action. The error message is misleading - users should instead follow the Simili Bot documentation for proper CLI installation. If the simili CLI tool has a different installation method, this message should reflect that.

Suggested change
-    echo " ⚠️ simili CLI not found. Install with: gh extension install similigh/simili-bot"
+    echo " ⚠️ simili CLI not found. Please install the simili CLI before running this script (see the Simili documentation for installation instructions)."

Comment on lines +1 to +24
name: Simili Issue Triage

on:
  issues:
    types: [opened, reopened]

jobs:
  triage:
    runs-on: ubuntu-latest
    permissions:
      issues: write
      contents: read
    steps:
      - uses: actions/checkout@v4

      - name: Run Simili Bot
        uses: similigh/simili-bot@v0.1.5
        with:
          config_path: .github/simili.yaml
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
          QDRANT_URL: ${{ secrets.QDRANT_URL }}
          QDRANT_API_KEY: ${{ secrets.QDRANT_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Copilot AI Feb 12, 2026

The PR description mentions that manual indexing is required after merge ("run simili index --repo entireio/entireio-cli"), but there's no automation or documentation for keeping the index updated. The PR also states that editing issues won't update the vector DB until manual re-indexing. This creates an operational burden where:

  1. The index will become stale over time as issues are edited
  2. Someone must remember to periodically re-run indexing
  3. There's no monitoring or alerting if indexing fails

Consider either:

  • Adding a scheduled workflow to periodically re-index (e.g., weekly)
  • Documenting the re-indexing procedure in a MAINTENANCE.md or similar file
  • Creating an issue to track the upcoming v0.2.0 feature that will handle this automatically

Without this, the integration may degrade in quality over time as the index becomes increasingly stale.

Kavirubc and others added 2 commits February 12, 2026 09:51
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>