Simplify CLI: Accept URLs directly for indexing #12
🤖 Augment PR Summary
Summary: This PR simplifies the indexing CLI by allowing users to pass a repository/website URL directly and having the tool auto-detect the source type.
Changes:
Technical Notes: Self-hosted GitLab/Bitbucket URLs are supported via
src/core/url-parser.ts
Outdated
}

const owner = pathParts[0];
const repo = pathParts[1];
URLs pasted from clone endpoints often end with .git; currently repo/projectName/defaultIndexName will include the suffix, which likely breaks GitHub/GitLab/Bitbucket API paths and yields odd default index keys. Consider stripping a trailing .git from repo/project segments when parsing.
Other Locations: src/core/url-parser.ts:125, src/core/url-parser.ts:152
✅ Fixed! Added a stripGitSuffix() helper function that removes trailing .git from repo/project names. Applied to all three parsers (GitHub, GitLab, Bitbucket) and added comprehensive tests.
See commit 845aa3b.
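A minimal sketch of what such a helper could look like is given here for illustration. Only the name stripGitSuffix() comes from the reply above; the signature, the parseOwnerRepo() wrapper, and the error message are assumptions, and the code in commit 845aa3b may differ.

```typescript
// Hypothetical sketch; the helper in the PR may differ in signature and behavior.
function stripGitSuffix(name: string): string {
  // Remove exactly one trailing ".git", leaving other names untouched.
  return name.endsWith(".git") ? name.slice(0, -".git".length) : name;
}

// Illustrative (assumed) caller: split a GitHub-style path into owner and repo.
function parseOwnerRepo(url: string): { owner: string; repo: string } {
  const pathParts = new URL(url).pathname.split("/").filter(Boolean);
  if (pathParts.length < 2) {
    throw new Error(`Expected /owner/repo in URL: ${url}`);
  }
  return { owner: pathParts[0], repo: stripGitSuffix(pathParts[1]) };
}
```

With this sketch, a clone URL such as https://github.com/owner/repo.git and the plain https://github.com/owner/repo resolve to the same owner and repo.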
src/core/url-parser.ts
Outdated
}

// GitLab (gitlab.com or hostname contains "gitlab")
if (hostname === "gitlab.com" || hostname.includes("gitlab")) {
Using hostname.includes("gitlab") / hostname.includes("bitbucket") can misclassify unrelated websites whose domain happens to contain those substrings and then fail parsing instead of falling back to website. Consider a more conservative self-hosted detection strategy to reduce false positives.
✅ Fixed! Changed from hostname.includes() to hostname.startsWith() for self-hosted detection:
// Before (too broad)
hostname.includes("gitlab")
// After (conservative)
hostname.startsWith("gitlab.")
This prevents false positives like "notgitlab.com" or "mygitlabtools.com" from being misclassified. Added tests to verify the conservative behavior.
See commit 845aa3b.
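The conservative detection described above can be sketched as a small classifier. The function name classifyHost, the SourceType union, and the handling of github.com and bitbucket.org are assumptions made for illustration. Only the gitlab.com equality check and the startsWith("gitlab.") rule are taken from the discussion, with the Bitbucket rule assumed to mirror it.

```typescript
// Hypothetical sketch of conservative host classification; names are illustrative.
type SourceType = "github" | "gitlab" | "bitbucket" | "website";

function classifyHost(hostname: string): SourceType {
  const host = hostname.toLowerCase();
  if (host === "github.com") return "github";
  // Self-hosted instances are matched only when the hostname starts with "gitlab.".
  if (host === "gitlab.com" || host.startsWith("gitlab.")) return "gitlab";
  // Assumed to mirror the GitLab rule for self-hosted Bitbucket.
  if (host === "bitbucket.org" || host.startsWith("bitbucket.")) return "bitbucket";
  // Anything else falls back to the website source.
  return "website";
}
```

The trade-off is that a self-hosted instance on a hostname such as git.example.com would be treated as a website under this rule, which is the price of avoiding false positives.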
program.parse();
// Auto-detect URL mode: ctxc index <url> -> ctxc index url <url>
// This allows users to skip the 'url' subcommand when providing a URL directly
const indexIdx = process.argv.indexOf("index");
The URL auto-rewrite only triggers when the URL is the first argument after index, so ctxc index -i myidx https://… won’t be rewritten and will likely error. Is that limitation intentional, or should the rewrite scan forward for the first non-option arg?
This is intentional. The URL is treated like a positional argument (similar to how git clone <url> works), so it must come first after index.
The supported patterns are:
ctxc index https://github.com/owner/repo # URL first
ctxc index https://github.com/owner/repo -i myidx # Options after URL
Scanning forward for non-option args would add complexity and could lead to ambiguous parsing in edge cases. The current behavior is consistent with how most CLI tools handle positional arguments.
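To illustrate the positional rule described in this reply, here is a rough sketch of a pre-parse rewrite that only inspects the argument immediately after index. The function name rewriteIndexArgs and the http(s) prefix check are assumptions. The actual code in src/bin/index.ts is not shown in this conversation beyond the indexOf("index") line.

```typescript
// Hypothetical sketch: insert the "url" subcommand only when a URL directly follows "index".
function rewriteIndexArgs(argv: string[]): string[] {
  const indexIdx = argv.indexOf("index");
  if (indexIdx === -1) return argv;

  const next = argv[indexIdx + 1];
  // Assumed URL test: only http:// and https:// prefixes count as a URL.
  if (next === undefined || !/^https?:\/\//i.test(next)) return argv;

  // ctxc index <url> ...  ->  ctxc index url <url> ...
  return [...argv.slice(0, indexIdx + 1), "url", ...argv.slice(indexIdx + 1)];
}
```

Because only one position is checked, ctxc index -i myidx https://… is left untouched by this sketch and would be handled (or rejected) by the regular parser, which matches the limitation the reviewer asked about.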
- Parse GitHub URLs (owner/repo, tree/branch, commit/sha)
- Parse GitLab URLs (project path, subgroups, /-/tree/branch)
- Parse Bitbucket URLs (workspace/repo, src/branch, branch/name)
- Fallback to website source for unknown URLs
- Extract default index names from URLs
- Support self-hosted GitLab and Bitbucket instances
- Export parseSourceUrl and ParsedUrl from @augmentcode/context-connectors/core
- Add comprehensive unit tests (19 test cases)
Agent-Id: agent-8394bd07-7a81-41d0-ac95-1ca62623e6fb
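As an illustration of the GitHub case listed above (owner/repo, tree/branch, commit/sha), a parser could look roughly like the sketch below. The ParsedGitHubUrl shape and the function name parseGitHubUrl are hypothetical and are not the ParsedUrl type or parseSourceUrl function exported by the package. The sketch also takes only the first path segment after tree/ or commit/ as the ref, so branch names containing slashes are not handled.

```typescript
// Hypothetical sketch of GitHub URL parsing; not the package's actual ParsedUrl type.
interface ParsedGitHubUrl {
  owner: string;
  repo: string;
  ref?: string; // branch, tag, or commit SHA when present in the URL
}

function parseGitHubUrl(input: string): ParsedGitHubUrl {
  const parts = new URL(input).pathname.split("/").filter(Boolean);
  if (parts.length < 2) {
    throw new Error(`Expected /owner/repo in URL: ${input}`);
  }
  const [owner, repo, kind, firstRefSegment] = parts;

  // /owner/repo/tree/<branch> and /owner/repo/commit/<sha> both carry a ref.
  const hasRef = (kind === "tree" || kind === "commit") && firstRefSegment !== undefined;
  return hasRef ? { owner, repo, ref: firstRefSegment } : { owner, repo };
}
```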
Adds command that auto-detects source type (GitHub, GitLab, Bitbucket, or website) from the URL and creates the appropriate source.
Features:
- Parses URL using parseSourceUrl() to determine source type
- Supports --ref option to override URL-detected branch/tag
- Supports -i/--index option to override default index name
- Supports --store and --store-path options
- Default index name derived from repo/project name
- Graceful error handling for invalid URLs
Agent-Id: agent-c9423996-94bb-4ab3-8311-ca0cc822da14
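The override behavior listed above (command-line options win over values detected from the URL) can be sketched as follows. The interface names, field names, and resolveIndexSettings are hypothetical and only illustrate the precedence; they are not taken from src/bin/cmd-index.ts.

```typescript
// Hypothetical shapes for illustration; the real option and result types may differ.
interface UrlDefaults {
  ref?: string;             // branch/tag detected from the URL, if any
  defaultIndexName: string; // derived from the repo/project name
}

interface CliOptions {
  ref?: string;   // value of --ref, if given
  index?: string; // value of -i/--index, if given
}

function resolveIndexSettings(
  parsed: UrlDefaults,
  opts: CliOptions,
): { ref?: string; indexName: string } {
  return {
    // An explicit --ref overrides the URL-detected branch/tag.
    ref: opts.ref ?? parsed.ref,
    // An explicit -i/--index overrides the derived default name.
    indexName: opts.index ?? parsed.defaultIndexName,
  };
}
```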
Adds pre-parse argument rewriting that auto-detects when a URL is passed directly to 'ctxc index' and transparently inserts the 'url' subcommand.
Before: ctxc index url https://github.com/owner/repo
After: ctxc index https://github.com/owner/repo
Both syntaxes now work. Existing subcommands (github, gitlab, etc.) are unchanged and continue to work.
Agent-Id: agent-ce81a04d-72f2-4289-8eb7-c3074d7d8030
- Strip .git suffix from repo/project names (clone URLs now work)
- Conservative self-hosted detection (hostname.startsWith instead of includes)
- CLI: Reorder args so options before URL are handled correctly
Fixes:
1. URLs like https://github.com/owner/repo.git now parse correctly
2. notgitlab.com no longer incorrectly matches as GitLab
3. now works (options can be anywhere)
Added 8 new tests for edge cases.
Agent-Id: agent-ce81a04d-72f2-4289-8eb7-c3074d7d8030
00f2c42 to ad138f4
Remove argument reordering logic. URL must now appear immediately after 'index', consistent with how other subcommands work:
ctxc index https://github.com/owner/repo -i name ✓
ctxc index -i name https://github.com/owner/repo ✗ (error)
This is more predictable and matches CLI conventions.
884ab0a to 89e6268
Agent-Id: agent-f65941cf-2aac-4651-a905-32f3d8b9313d Linked-Note-Id: 26a5d7df-b154-45b3-9351-1698a06d4fd0
- Move examples from description to addHelpText() so they only appear in
Usage: ctxc index [options] [command]
Index a data source
Options:
  -h, --help  display help for command
Commands:
  github [options]  Index a GitHub repository
  gitlab [options]  Index a GitLab project
  bitbucket [options]  Index a Bitbucket repository
  website [options]  Crawl and index a website
  help [command]  display help for command
, not in the main menu
- Simplify usage line to instead of showing two separate usage patterns
Agent-Id: agent-8cab8bce-f29f-48f1-8fc1-86167ff2398b
Summary
Simplifies the ctxc index command to accept URLs directly, eliminating the need for verbose source-specific flags.
Before
After
Changes
1. URL Parser Module (src/core/url-parser.ts)
parseSourceUrl() function that auto-detects source type from URLs
/tree/main
2. CLI URL Mode (src/bin/cmd-index.ts)
url subcommand: ctxc index url <url>
-i)
--ref to override branch/tag
3. Direct URL Syntax (src/bin/index.ts)
ctxc index <url> without the url subcommand
url when a URL is detected
Backward Compatibility
✅ All existing subcommands work unchanged:
ctxc index github --owner augmentcode --repo context-connectors # Still works
Testing
Examples
Pull Request opened by Augment Code with guidance from the PR author