Add Simili Bot for semantic issue triage #283
- Remove unimplemented duplicate_closing config fields
- Trigger only on opened/reopened (not edited) to prevent comment spam
- Use v0.1.5 with simplified single-repo config

- Uses gh CLI + simili process to apply triage to open issues
- Includes rate limiting (2s between issues)
- Requires simili CLI extension to be installed
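The rate-limited backfill described in the commit notes above could look roughly like the following. This is a minimal sketch, not the PR's actual script: `process_with_delay` and `triage_issue` are hypothetical helper names, and the JSON fields that `simili process --issue` expects are an assumption.

```shell
#!/usr/bin/env bash
# Sketch of a rate-limited backfill loop (hypothetical helpers, not the PR's script).
set -eu

# process_with_delay DELAY HANDLER NUMBER...
# Runs HANDLER once per issue number and pauses DELAY seconds after each call.
# A failing handler is reported on stderr but does not abort the loop.
process_with_delay() {
  local delay="$1" handler="$2"
  shift 2
  local number
  for number in "$@"; do
    if ! "$handler" "$number"; then
      echo "  Failed to process #$number" >&2
    fi
    sleep "$delay"
  done
}

# Example handler: export one issue with gh, then hand the file to simili.
# Assumption: the fields simili expects in the JSON file are not confirmed here.
triage_issue() {
  local number="$1" config="${CONFIG:-.github/simili.yaml}"
  local file status=0
  file="$(mktemp)"
  { gh issue view "$number" --json number,title,body > "$file" \
      && simili process --issue "$file" --config "$config"; } || status=$?
  rm -f "$file"
  return "$status"
}

# Possible invocation (2s between issues, as in the commit note):
# process_with_delay 2 triage_issue $(gh issue list --state open --json number --jq '.[].number')
```

Passing the handler as an argument keeps the pacing logic separate from the `gh` and `simili` calls, so the loop can be exercised with a stub before it touches real issues.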
Pull request overview
This PR adds Simili Bot, a semantic issue triage system that uses AI to automatically find similar issues, detect duplicates, and suggest labels. The integration uses Gemini API for embeddings and response generation, with Qdrant as the vector database. However, the PR contains critical bugs that prevent it from working correctly, including incorrect repository names throughout the configuration.
Changes:
- Adds GitHub Actions workflow to run Simili Bot on new/reopened issues
- Adds Simili Bot configuration with Qdrant and Gemini settings
- Adds backfill script for processing existing open issues
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| .github/workflows/simili.yml | GitHub Actions workflow that triggers Simili Bot on issue creation/reopening |
| .github/simili.yaml | Configuration file for Qdrant vector database, Gemini API, and similarity thresholds |
| scripts/simili-backfill.sh | Bash script to process existing open issues through Simili Bot |
        issues: write
        contents: read
    steps:
      - uses: actions/checkout@v4
Copilot
AI
Feb 12, 2026
The workflow uses actions/checkout@v4 while other workflows in this repository use actions/checkout@v6. For consistency with the codebase conventions and to ensure the latest features and security fixes, this should be updated to v6.
    repositories:
      - org: "entireio"
        repo: "entireio-cli"
Copilot
AI
Feb 12, 2026
The repository name is specified as "entireio-cli" but based on the repository structure and other files (like scripts/install.sh which references "entireio/cli"), the correct repository name should be "cli". This mismatch will prevent the bot from correctly identifying and processing issues from this repository.
Suggested change:
    -    repo: "entireio-cli"
    +    repo: "cli"
    qdrant:
      url: "${QDRANT_URL}"
      api_key: "${QDRANT_API_KEY}"
      collection: "entireio-cli-issues"
Copilot
AI
Feb 12, 2026
The Qdrant collection name is specified as "entireio-cli-issues", which does not align with the actual repository name "entireio/cli". While this may be intentional, it is inconsistent with the repository naming used throughout the codebase. Consider using "entireio-cli" or another name that clearly matches the repository, for easier maintenance and debugging.
Suggested change:
    -  collection: "entireio-cli-issues"
    +  collection: "entireio-cli"
      - org: "entireio"
        repo: "entireio-cli"
Copilot
AI
Feb 12, 2026
The PR description repeatedly refers to the repository as "entireio-cli" and "entireio/entireio-cli", but the actual repository is "entireio/cli" (as evidenced by go.mod, README.md, scripts/install.sh, and all other documentation). This naming discrepancy appears in the configuration files and will cause the integration to fail. The description should be updated to reflect the correct repository name, and all configuration values should use "cli" as the repository name.
      fi

      # Cleanup
      rm "/tmp/issue-$NUMBER.json"
Copilot
AI
Feb 12, 2026
The cleanup at line 44 will not run if the simili command on line 37 fails and the script exits under set -e, so the temporary file is left behind in /tmp. Consider using a trap to ensure cleanup happens on exit, or move the cleanup into a finally-like construct.
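One way to apply the trap suggestion is sketched below. It is only an illustration: the scratch directory, `cleanup`, and `issue_file` are hypothetical names introduced here, not taken from the PR's script.

```shell
#!/usr/bin/env bash
set -eu

# Use a private scratch directory instead of fixed /tmp/issue-N.json paths.
WORKDIR="$(mktemp -d)"

# Remove the scratch directory on every exit path, including exits caused by set -e.
cleanup() {
  rm -rf "$WORKDIR"
}
trap cleanup EXIT

# Print the scratch path for a given issue number (hypothetical helper).
issue_file() {
  printf '%s/issue-%s.json' "$WORKDIR" "$1"
}
```

With the trap in place, the per-issue `rm` at the end of the loop body becomes optional, since everything under `$WORKDIR` is removed when the script exits, whether it succeeds or fails.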
      - uses: actions/checkout@v4

      - name: Run Simili Bot
        uses: similigh/simili-bot@v0.1.5
Copilot
AI
Feb 12, 2026
The workflow uses a third-party GitHub Action (similigh/simili-bot@v0.1.5) with access to repository secrets (GEMINI_API_KEY, QDRANT_URL, QDRANT_API_KEY) and write permissions to issues. For security best practices, consider:
- Pinning to a specific commit SHA instead of a version tag to prevent supply chain attacks
- Reviewing the action's source code at the specified version
- Documenting the trust decision in the repository
This is especially important since the action has access to sensitive API keys that could be exfiltrated if the action is compromised.
Suggested change:
    -  uses: similigh/simili-bot@v0.1.5
    +  # Pinned to a specific commit SHA for supply-chain security.
    +  # Replace <COMMIT_SHA_FOR_V0_1_5> with the reviewed commit for v0.1.5.
    +  uses: similigh/simili-bot@<COMMIT_SHA_FOR_V0_1_5>
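To find the commit SHA behind a tag before pinning, something like the following could be used. It is a sketch that assumes `git` is installed and the repository is reachable; `resolve_tag_sha` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -eu

# resolve_tag_sha REPO TAG
# Prints the commit SHA that TAG points to in REPO (a URL or a local path).
# For an annotated tag, git ls-remote lists the tag object first and the
# peeled commit as "refs/tags/TAG^{}" after it, so the last matching line
# is the commit for both annotated and lightweight tags.
resolve_tag_sha() {
  local repo="$1" tag="$2"
  git ls-remote --tags "$repo" "refs/tags/$tag" "refs/tags/$tag^{}" \
    | tail -n 1 \
    | cut -f 1
}

# Possible invocation for the action discussed above:
# resolve_tag_sha https://github.com/similigh/simili-bot v0.1.5
```

The printed SHA still needs to be reviewed against the action's source before it replaces the placeholder in the workflow, since a tag can be moved after the fact.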
      if command -v simili &> /dev/null; then
        simili process --issue "/tmp/issue-$NUMBER.json" --config "$CONFIG" || echo " ⚠️ Failed to process #$NUMBER"
      else
        echo " ⚠️ simili CLI not found. Install with: gh extension install similigh/simili-bot"
Copilot
AI
Feb 12, 2026
The installation instruction suggests installing via gh extension install similigh/simili-bot, but this appears to be a GitHub CLI extension installation command for a bot that's actually a GitHub Action. The error message is misleading - users should instead follow the Simili Bot documentation for proper CLI installation. If the simili CLI tool has a different installation method, this message should reflect that.
Suggested change:
    -  echo " ⚠️ simili CLI not found. Install with: gh extension install similigh/simili-bot"
    +  echo " ⚠️ simili CLI not found. Please install the simili CLI before running this script (see the Simili documentation for installation instructions)."
    name: Simili Issue Triage

    on:
      issues:
        types: [opened, reopened]

    jobs:
      triage:
        runs-on: ubuntu-latest
        permissions:
          issues: write
          contents: read
        steps:
          - uses: actions/checkout@v4

          - name: Run Simili Bot
            uses: similigh/simili-bot@v0.1.5
            with:
              config_path: .github/simili.yaml
            env:
              GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
              QDRANT_URL: ${{ secrets.QDRANT_URL }}
              QDRANT_API_KEY: ${{ secrets.QDRANT_API_KEY }}
              GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Copilot
AI
Feb 12, 2026
The PR description mentions that manual indexing is required after merge ("run simili index --repo entireio/entireio-cli"), but there's no automation or documentation for keeping the index updated. The PR also states that editing issues won't update the vector DB until manual re-indexing. This creates an operational burden where:
- The index will become stale over time as issues are edited
- Someone must remember to periodically re-run indexing
- There's no monitoring or alerting if indexing fails
Consider either:
- Adding a scheduled workflow to periodically re-index (e.g., weekly)
- Documenting the re-indexing procedure in a MAINTENANCE.md or similar file
- Creating an issue to track the upcoming v0.2.0 feature that will handle this automatically
Without this, the integration may degrade in quality over time as the index becomes increasingly stale.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
This PR adds semantic-based issue triage to entireio-cli using Simili Bot. The bot automatically:
Why This Matters
Managing issues across repositories is hard. Simili Bot uses semantic search (not just keywords) to find related issues, even if they use different words. This helps:
What Changed
Technology Choices
Gemini API: Used for two purposes:
- Embedding model (gemini-embedding-001): Converts issues into vector representations for semantic search
- LLM (gemini-2.0-flash-lite): Analyzes issues for labeling, duplicate detection, and quality assessment. Gemini offers good context windows and cost-effectiveness. (Note: LLM model is hardcoded in v0.1.5, will be configurable in v0.2.0)
Qdrant: Open-source vector database. Easy to self-host and has no vendor lock-in. You can run it locally or use their cloud service.
Configuration Files
- .github/simili.yaml: Bot configuration (Qdrant + Gemini settings)
- .github/workflows/simili.yml: GitHub Action that runs on new issues
- scripts/simili-backfill.sh: Optional script to process existing issues
Workflow Triggers
The bot runs on issues: [opened, reopened] only. We excluded edited to prevent comment spam (the bot would post a new triage report every time someone edits an issue).
Safety Analysis
I audited every step of the bot's pipeline to ensure no loops or unintended behavior:
Setup Required
Before merging, add these secrets to the repository:
- GEMINI_API_KEY: Google Gemini API key for embeddings
- QDRANT_URL: Qdrant vector database URL
- QDRANT_API_KEY: Qdrant API key
Then run manual indexing to populate the database:
Testing
After merge, create a test issue to verify:
[!WARNING]
Known Limitations
simili index again.
Both features are planned for v0.2.0 (early March 2026).
Background
I built Simili Bot while working with large open-source repositories to solve the problem of managing thousands of issues across multiple repos. This is the only implementation that does semantic issue matching and cross-repo triage at this scale.
I'm happy to help with initial indexing and configuration for free. For ongoing support, visit github.com/sponsors/Kavirubc.
Docs: https://simili.mintlify.app
Source: https://github.com/similigh/simili-bot
Questions?
Happy to work with the team to adjust the configuration or answer any questions about how the bot works.