Extend CLI telemetry #1073

@patrikbraborec

Description

Context

We already have a solid telemetry foundation (Segment → Mixpanel, auto-tracking via ApifyCommand, opt-out support). However, the current events only capture command name, flags, OS/arch, runtime, install method, CLI version, and user identity.

To build 6 Mixpanel dashboards (CLI Health, AI Agent Adoption, Onboarding, Retention, Performance, Feature Adoption), we need to extend the existing telemetry with additional properties and new events.

Part 1: New properties on existing cli_command_* events

These are simple additions — if an error occurs, just log it. No complex classification needed yet (see Part 4 for that).

1. exit_code — command success/failure

  • Capture the exit code in the finally block of ApifyCommand._run()
  • This is the single most important missing metric — we currently can't measure success rate

2. duration_ms — command latency

  • Record Date.now() at start of _run(), compute delta in finally
  • Enables p50/p95 latency tracking per command
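
Items 1 and 2 could share one wrapper around the existing run logic. A minimal sketch, assuming a hypothetical `track` callback standing in for the real telemetry call, and assuming errors may carry an optional `exitCode` field:

```typescript
// Wraps a command body, recording exit_code and duration_ms in `finally`
// so both success and failure paths are measured.
async function runWithTelemetry(
    run: () => Promise<void>,
    track: (props: { exit_code: number; duration_ms: number }) => void,
): Promise<void> {
    const startedAt = Date.now();
    let exitCode = 0;
    try {
        await run();
    } catch (err) {
        // Prefer an explicit exit code carried by the error; default to 1.
        const maybeCode = (err as { exitCode?: unknown }).exitCode;
        exitCode = typeof maybeCode === 'number' ? maybeCode : 1;
        throw err;
    } finally {
        track({ exit_code: exitCode, duration_ms: Date.now() - startedAt });
    }
}
```

The real integration would live inside ApifyCommand._run() rather than a standalone wrapper.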

3. ai_agent — AI agent detection

  • Check environment variables to detect which AI agent is invoking the CLI (same approach Stripe CLI uses in production — see pkg/useragent/useragent.go)
  • Priority env vars to check:
    | Env var | Agent |
    | --- | --- |
    | CLAUDECODE | claude_code |
    | CURSOR_AGENT | cursor |
    | CLINE_ACTIVE | cline |
    | CODEX_SANDBOX / CODEX_THREAD_ID | codex_cli |
    | GEMINI_CLI | gemini_cli |
    | OPENCODE | open_code |
    | OPENCLAW_SHELL | openclaw |
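
The detection itself can be a first-match lookup over that table. A sketch (the priority order and the "any non-empty value counts" rule are assumptions):

```typescript
// Ordered [env var, agent name] pairs; first match wins.
const AI_AGENT_ENV_VARS: Array<[string, string]> = [
    ['CLAUDECODE', 'claude_code'],
    ['CURSOR_AGENT', 'cursor'],
    ['CLINE_ACTIVE', 'cline'],
    ['CODEX_SANDBOX', 'codex_cli'],
    ['CODEX_THREAD_ID', 'codex_cli'],
    ['GEMINI_CLI', 'gemini_cli'],
    ['OPENCODE', 'open_code'],
    ['OPENCLAW_SHELL', 'openclaw'],
];

// Returns the detected agent name, or null for direct (non-agent) usage.
function detectAiAgent(env: Record<string, string | undefined> = process.env): string | null {
    for (const [envVar, agent] of AI_AGENT_ENV_VARS) {
        if (env[envVar]) return agent;
    }
    return null;
}
```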

4. is_ci + ci_provider — CI environment detection

  • Check common CI env vars: CI, GITHUB_ACTIONS, GITLAB_CI, JENKINS_URL, CIRCLECI, BUILDKITE, TRAVIS, etc.
  • is_ci: boolean
  • ci_provider: string (github_actions / gitlab / jenkins / circle / buildkite / unknown)
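
A possible shape for the check, using provider-specific vars first and the generic CI var as the fallback (the exact provider list mirrors the bullets above; names are assumptions):

```typescript
// Provider-specific env vars checked before the generic CI flag.
const CI_PROVIDERS: Array<[string, string]> = [
    ['GITHUB_ACTIONS', 'github_actions'],
    ['GITLAB_CI', 'gitlab'],
    ['JENKINS_URL', 'jenkins'],
    ['CIRCLECI', 'circle'],
    ['BUILDKITE', 'buildkite'],
    ['TRAVIS', 'travis'],
];

function detectCi(env: Record<string, string | undefined> = process.env): {
    is_ci: boolean;
    ci_provider: string | null;
} {
    for (const [envVar, provider] of CI_PROVIDERS) {
        if (env[envVar]) return { is_ci: true, ci_provider: provider };
    }
    // Generic CI=true without a recognized provider.
    if (env.CI) return { is_ci: true, ci_provider: 'unknown' };
    return { is_ci: false, ci_provider: null };
}
```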

5. is_interactive — TTY detection

  • process.stdin.isTTY — distinguishes interactive terminal usage from piped/scripted usage
  • Combined with ai_agent and is_ci, this gives us a caller_type dimension (human / ai_agent / ci)
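
Deriving the caller_type bucket could look like this; the precedence (agent over CI over human) is an assumption, and is_interactive would still be sent as its own property:

```typescript
type CallerType = 'human' | 'ai_agent' | 'ci';

// Collapses the independent detection signals into one dashboard dimension.
function deriveCallerType(opts: {
    aiAgent: string | null; // from env-var detection (item 3)
    isCi: boolean;          // from CI detection (item 4)
}): CallerType {
    if (opts.aiAgent) return 'ai_agent';
    if (opts.isCi) return 'ci';
    return 'human';
}
```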

6. was_retried — retry detection

  • Detect if the same command is invoked again within ~10 seconds (compare command + timestamp in telemetry state file)
  • Helps measure whether error messages are actionable — especially for AI agents
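
The comparison against the state file could be as simple as this; the `LastInvocation` shape and the exact 10-second window are assumptions:

```typescript
interface LastInvocation {
    command: string;   // e.g. "actor call"
    timestamp: number; // epoch ms of the previous invocation
}

const RETRY_WINDOW_MS = 10_000;

// True when the same command re-runs within the retry window.
function wasRetried(
    current: { command: string; timestamp: number },
    last: LastInvocation | null,
): boolean {
    if (!last) return false;
    return (
        last.command === current.command &&
        current.timestamp - last.timestamp <= RETRY_WINDOW_MS
    );
}
```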

7. --user-agent flag — caller/skill identification

  • Add an optional --user-agent flag to all commands (also supported via APIFY_CLI_USER_AGENT env var)
  • Allows callers to self-identify, e.g. apify actor call xyz --user-agent "apify-agent-skills/ultimate-scraper-1.3.0"
  • Sent as a user_agent telemetry property alongside ai_agent
  • This gives us two tracking dimensions:
    • ai_agent (automatic via env vars) → which AI tool is running the CLI
    • user_agent (opt-in via flag) → which skill/plugin triggered the command
  • Important: before committing to it, we should double-check that --user-agent is the right name for this flag.

Why this is needed:

  • Skills currently track usage via hardcoded User-Agent headers in API calls (e.g. apify-agent-skills/apify-ultimate-scraper-1.3.0). As Skills switch from API to CLI, this monitoring breaks. The --user-agent flag replaces this.
  • DX Heroes plugins (Claude Code, Cursor, OpenCode, etc.) can hardcode it too (e.g. apify-plugin/claude-code-1.0.0)
  • Regular CLI users don't pass it → null → we know it's direct usage
  • The ai_agent env var detection (item 3) still works independently — so we get both dimensions automatically
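
Resolving the user_agent property from the two sources could work like this (flag wins over the env var; null means direct usage):

```typescript
// Resolves the user_agent telemetry property from the --user-agent flag
// or the APIFY_CLI_USER_AGENT env var; null signals direct CLI usage.
function resolveUserAgent(
    flagValue: string | undefined,
    env: Record<string, string | undefined> = process.env,
): string | null {
    return flagValue ?? env.APIFY_CLI_USER_AGENT ?? null;
}
```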

Part 2: New events

8. CLI Installed event

  • Fire on first run (when no telemetry state file exists yet, before creating it)
  • Properties: cli_version, os, arch, node_version, install_method, is_ci, ci_provider, ai_agent
  • Powers onboarding funnels: Install → first command → first successful run

9. Auth Event event

  • Fire on login and logout (src/commands/auth/login.ts, src/commands/auth/logout.ts)
  • Properties: action (login/logout), auth_method (token/browser), success (boolean), ai_agent, is_ci
  • Powers: Auth success rate chart, onboarding funnel (Install → Login → First Run)

10. API Request event

  • Fire on every Apify API call made by the CLI
  • Properties: endpoint_path (e.g. /v2/acts), method (GET/POST/etc.), status_code, duration_ms, request_id
  • Powers: API latency tracking, API error rate, slowest endpoints identification
  • Note: Strip any IDs from paths to avoid cardinality explosion (e.g. /v2/acts/{id}/runs not /v2/acts/abc123/runs)
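
One way to strip IDs is positional: replace any segment that directly follows a known resource-collection segment. The collection list below is illustrative, not exhaustive:

```typescript
// Segments after which the next path segment is a resource ID.
const COLLECTION_SEGMENTS = new Set([
    'acts', 'actor-tasks', 'runs', 'builds',
    'datasets', 'key-value-stores', 'request-queues',
    'schedules', 'webhooks',
]);

// '/v2/acts/abc123/runs' -> '/v2/acts/{id}/runs'
function normalizeEndpointPath(path: string): string {
    const segments = path.split('/');
    return segments
        .map((seg, i) => (i > 0 && COLLECTION_SEGMENTS.has(segments[i - 1]) ? '{id}' : seg))
        .join('/');
}
```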

Part 3: Opt-out improvement

11. DO_NOT_TRACK support

  • Currently only APIFY_CLI_DISABLE_TELEMETRY is supported
  • The Console Do Not Track standard (DO_NOT_TRACK=1) is becoming industry norm — GitHub CLI, Stripe CLI, and others respect it
  • Check DO_NOT_TRACK in addition to the existing env var
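
The combined opt-out check could be a single guard; treating both "1" and "true" as opted out is an assumption (the proposal itself specifies DO_NOT_TRACK=1):

```typescript
// Telemetry is disabled if either the existing Apify-specific env var
// or the cross-tool DO_NOT_TRACK convention is set.
function isTelemetryDisabled(env: Record<string, string | undefined> = process.env): boolean {
    if (env.APIFY_CLI_DISABLE_TELEMETRY) return true;
    return env.DO_NOT_TRACK === '1' || env.DO_NOT_TRACK === 'true';
}
```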

Part 4: Document analytics better

Currently, when we first initialize telemetry, we link users to https://docs.apify.com/cli/docs/telemetry, but that page does not appear in the docs navigation, so it is only reachable through the direct link.

1. Fix the page so it's always visible

Should just be a matter of adding it to the docs sidebar.

2. More in-depth documentation of what fields we send, and when

Especially now that we are planning to add new events, we should document each event separately: list every field and what it represents, and make it even more explicit that we do NOT track arguments or flag values (for flags, we only record whether they were used).

3. Potentially link to our analytics handling code

I mean might as well, we are open source :D

Part 5: Structured error handling (separate effort)

This is intentionally separate from Part 1. Part 1 just logs errors as-is. This part requires rethinking how errors are handled across the entire CLI codebase.

12. error_category + error_code — structured error classification

  • Classify errors into categories: auth, network, validation, config, runtime, unknown
  • Assign structured error codes (e.g. AUTH_TOKEN_EXPIRED, NETWORK_TIMEOUT, VALIDATION_MISSING_INPUT)
  • Powers the "Top errors" table and "Error category trend" reports
  • This requires a broader effort: audit all error paths in the CLI, introduce an error class/enum system, and ensure every thrown error carries a category and code
  • Can be implemented later once we have raw error data from Part 1 to understand what errors actually occur in practice
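
A sketch of what the error class/enum system could look like; `CliError` and the fallback code are hypothetical names, and the categories mirror the list above:

```typescript
type ErrorCategory = 'auth' | 'network' | 'validation' | 'config' | 'runtime' | 'unknown';

// Every deliberately thrown CLI error would carry a category and code.
class CliError extends Error {
    constructor(
        message: string,
        public readonly category: ErrorCategory,
        public readonly code: string,
    ) {
        super(message);
    }
}

// Maps any thrown value to the telemetry properties; unrecognized
// errors fall into the 'unknown' bucket that Part 1 already logs.
function classifyError(err: unknown): { error_category: ErrorCategory; error_code: string } {
    if (err instanceof CliError) {
        return { error_category: err.category, error_code: err.code };
    }
    return { error_category: 'unknown', error_code: 'UNKNOWN' };
}
```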

Additional tasks

  • Update docs with the info about what we are tracking

Metadata

Labels: t-tooling (Issues with this label are in the ownership of the tooling team.)