Extend CLI telemetry #1073

@patrikbraborec

Description

Context

We already have a solid telemetry foundation (Segment → Mixpanel, auto-tracking via ApifyCommand, opt-out support). However, the current events only capture command name, flags, OS/arch, runtime, install method, CLI version, and user identity.

To build 6 Mixpanel dashboards (CLI Health, AI Agent Adoption, Onboarding, Retention, Performance, Feature Adoption), we need to extend the existing telemetry with additional properties and new events.

Part 1: New properties on existing cli_command_* events

These are simple additions — if an error occurs, just log it. No complex classification needed yet (see Part 4 for that).

1. exit_code — command success/failure

  • Capture the exit code in the finally block of ApifyCommand._run()
  • This is the single most important missing metric — we currently can't measure success rate

2. duration_ms — command latency

  • Record Date.now() at start of _run(), compute delta in finally
  • Enables p50/p95 latency tracking per command
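
Items 1 and 2 could share one wrapper around the existing run logic. A minimal sketch, assuming a hypothetical `track` callback standing in for the real telemetry call, and assuming errors may carry an optional `exitCode` field:

```typescript
// Wraps a command body, recording exit_code and duration_ms in `finally`
// so both success and failure paths are measured.
async function runWithTelemetry(
    run: () => Promise<void>,
    track: (props: { exit_code: number; duration_ms: number }) => void,
): Promise<void> {
    const startedAt = Date.now();
    let exitCode = 0;
    try {
        await run();
    } catch (err) {
        // Prefer an explicit exit code carried by the error; default to 1.
        const maybeCode = (err as { exitCode?: unknown }).exitCode;
        exitCode = typeof maybeCode === 'number' ? maybeCode : 1;
        throw err;
    } finally {
        track({ exit_code: exitCode, duration_ms: Date.now() - startedAt });
    }
}
```

The real integration would live inside ApifyCommand._run() rather than a standalone wrapper.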

3. ai_agent — AI agent detection

  • Check environment variables to detect which AI agent is invoking the CLI (same approach Stripe CLI uses in production — see pkg/useragent/useragent.go)
  • Priority env vars to check:
    | Env var | Agent |
    | --- | --- |
    | CLAUDECODE | claude_code |
    | CURSOR_AGENT | cursor |
    | CLINE_ACTIVE | cline |
    | CODEX_SANDBOX / CODEX_THREAD_ID | codex_cli |
    | GEMINI_CLI | gemini_cli |
    | OPENCODE | open_code |
    | OPENCLAW_SHELL | openclaw |
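
The detection itself can be a first-match lookup over that table. A sketch (the priority order and the "any non-empty value counts" rule are assumptions):

```typescript
// Ordered [env var, agent name] pairs; first match wins.
const AI_AGENT_ENV_VARS: Array<[string, string]> = [
    ['CLAUDECODE', 'claude_code'],
    ['CURSOR_AGENT', 'cursor'],
    ['CLINE_ACTIVE', 'cline'],
    ['CODEX_SANDBOX', 'codex_cli'],
    ['CODEX_THREAD_ID', 'codex_cli'],
    ['GEMINI_CLI', 'gemini_cli'],
    ['OPENCODE', 'open_code'],
    ['OPENCLAW_SHELL', 'openclaw'],
];

// Returns the detected agent name, or null for direct (non-agent) usage.
function detectAiAgent(env: Record<string, string | undefined> = process.env): string | null {
    for (const [envVar, agent] of AI_AGENT_ENV_VARS) {
        if (env[envVar]) return agent;
    }
    return null;
}
```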

4. is_ci + ci_provider — CI environment detection

  • Check common CI env vars: CI, GITHUB_ACTIONS, GITLAB_CI, JENKINS_URL, CIRCLECI, BUILDKITE, TRAVIS, etc.
  • is_ci: boolean
  • ci_provider: string (github_actions / gitlab / jenkins / circle / buildkite / unknown)
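
A possible shape for the check, using provider-specific vars first and the generic CI var as the fallback (the exact provider list mirrors the bullets above; names are assumptions):

```typescript
// Provider-specific env vars checked before the generic CI flag.
const CI_PROVIDERS: Array<[string, string]> = [
    ['GITHUB_ACTIONS', 'github_actions'],
    ['GITLAB_CI', 'gitlab'],
    ['JENKINS_URL', 'jenkins'],
    ['CIRCLECI', 'circle'],
    ['BUILDKITE', 'buildkite'],
    ['TRAVIS', 'travis'],
];

function detectCi(env: Record<string, string | undefined> = process.env): {
    is_ci: boolean;
    ci_provider: string | null;
} {
    for (const [envVar, provider] of CI_PROVIDERS) {
        if (env[envVar]) return { is_ci: true, ci_provider: provider };
    }
    // Generic CI=true without a recognized provider.
    if (env.CI) return { is_ci: true, ci_provider: 'unknown' };
    return { is_ci: false, ci_provider: null };
}
```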

5. is_interactive — TTY detection

  • process.stdin.isTTY — distinguishes interactive terminal usage from piped/scripted usage
  • Combined with ai_agent and is_ci, this gives us a caller_type dimension (human / ai_agent / ci)
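
Deriving the caller_type bucket could look like this; the precedence (agent over CI over human) is an assumption, and is_interactive would still be sent as its own property:

```typescript
type CallerType = 'human' | 'ai_agent' | 'ci';

// Collapses the independent detection signals into one dashboard dimension.
function deriveCallerType(opts: {
    aiAgent: string | null; // from env-var detection (item 3)
    isCi: boolean;          // from CI detection (item 4)
}): CallerType {
    if (opts.aiAgent) return 'ai_agent';
    if (opts.isCi) return 'ci';
    return 'human';
}
```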

6. was_retried — retry detection

  • Detect if the same command is invoked again within ~10 seconds (compare command + timestamp in telemetry state file)
  • Helps measure whether error messages are actionable — especially for AI agents
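
The comparison against the state file could be as simple as this; the `LastInvocation` shape and the exact 10-second window are assumptions:

```typescript
interface LastInvocation {
    command: string;   // e.g. "actor call"
    timestamp: number; // epoch ms of the previous invocation
}

const RETRY_WINDOW_MS = 10_000;

// True when the same command re-runs within the retry window.
function wasRetried(
    current: { command: string; timestamp: number },
    last: LastInvocation | null,
): boolean {
    if (!last) return false;
    return (
        last.command === current.command &&
        current.timestamp - last.timestamp <= RETRY_WINDOW_MS
    );
}
```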

7. --user-agent flag — caller/skill identification

  • Add an optional --user-agent flag to all commands (also supported via APIFY_CLI_USER_AGENT env var)
  • Allows callers to self-identify, e.g. apify actor call xyz --user-agent "apify-agent-skills/ultimate-scraper-1.3.0"
  • Sent as a user_agent telemetry property alongside ai_agent
  • This gives us two tracking dimensions:
    • ai_agent (automatic via env vars) → which AI tool is running the CLI
    • user_agent (opt-in via flag) → which skill/plugin triggered the command
  • Important: before committing to it, we should double-check that --user-agent is the right name for this flag.

Why this is needed:

  • Skills currently track usage via hardcoded User-Agent headers in API calls (e.g. apify-agent-skills/apify-ultimate-scraper-1.3.0). As Skills switch from API to CLI, this monitoring breaks. The --user-agent flag replaces this.
  • DX Heroes plugins (Claude Code, Cursor, OpenCode, etc.) can hardcode it too (e.g. apify-plugin/claude-code-1.0.0)
  • Regular CLI users don't pass it → null → we know it's direct usage
  • The ai_agent env var detection (item 3) still works independently — so we get both dimensions automatically
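
Resolving the user_agent property from the two sources could work like this (flag wins over the env var; null means direct usage):

```typescript
// Resolves the user_agent telemetry property from the --user-agent flag
// or the APIFY_CLI_USER_AGENT env var; null signals direct CLI usage.
function resolveUserAgent(
    flagValue: string | undefined,
    env: Record<string, string | undefined> = process.env,
): string | null {
    return flagValue ?? env.APIFY_CLI_USER_AGENT ?? null;
}
```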

Part 2: New events

8. CLI Installed event

  • Fire on first run (when no telemetry state file exists yet, before creating it)
  • Properties: cli_version, os, arch, node_version, install_method, is_ci, ci_provider, ai_agent
  • Powers onboarding funnels: Install → first command → first successful run

9. Auth Event event

  • Fire on login and logout (src/commands/auth/login.ts, src/commands/auth/logout.ts)
  • Properties: action (login/logout), auth_method (token/browser), success (boolean), ai_agent, is_ci
  • Powers: Auth success rate chart, onboarding funnel (Install → Login → First Run)

10. API Request event

  • Fire on every Apify API call made by the CLI
  • Properties: endpoint_path (e.g. /v2/acts), method (GET/POST/etc.), status_code, duration_ms, request_id
  • Powers: API latency tracking, API error rate, slowest endpoints identification
  • Note: Strip any IDs from paths to avoid cardinality explosion (e.g. /v2/acts/{id}/runs not /v2/acts/abc123/runs)
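
One way to strip IDs is positional: replace any segment that directly follows a known resource-collection segment. The collection list below is illustrative, not exhaustive:

```typescript
// Segments after which the next path segment is a resource ID.
const COLLECTION_SEGMENTS = new Set([
    'acts', 'actor-tasks', 'runs', 'builds',
    'datasets', 'key-value-stores', 'request-queues',
    'schedules', 'webhooks',
]);

// '/v2/acts/abc123/runs' -> '/v2/acts/{id}/runs'
function normalizeEndpointPath(path: string): string {
    const segments = path.split('/');
    return segments
        .map((seg, i) => (i > 0 && COLLECTION_SEGMENTS.has(segments[i - 1]) ? '{id}' : seg))
        .join('/');
}
```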

Part 3: Opt-out improvement

11. DO_NOT_TRACK support

  • Currently only APIFY_CLI_DISABLE_TELEMETRY is supported
  • The Console Do Not Track standard (DO_NOT_TRACK=1) is becoming industry norm — GitHub CLI, Stripe CLI, and others respect it
  • Check DO_NOT_TRACK in addition to the existing env var
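
The combined opt-out check could be a single guard; treating both "1" and "true" as opted out is an assumption (the proposal itself specifies DO_NOT_TRACK=1):

```typescript
// Telemetry is disabled if either the existing Apify-specific env var
// or the cross-tool DO_NOT_TRACK convention is set.
function isTelemetryDisabled(env: Record<string, string | undefined> = process.env): boolean {
    if (env.APIFY_CLI_DISABLE_TELEMETRY) return true;
    return env.DO_NOT_TRACK === '1' || env.DO_NOT_TRACK === 'true';
}
```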

Part 4: Document analytics better

Currently, when we first initialize telemetry, we link users to https://docs.apify.com/cli/docs/telemetry, but that page does not appear in the docs navigation, so it is only reachable through the direct link.

1. Fix the page so it's always visible

Should just be a matter of adding it to the docs sidebar.

2. More in-depth documentation of what fields we send, and when

Especially now that we are planning to add new events, we should document each event separately: list every field and what it represents, and make it even more explicit that we do NOT track arguments or flag values (for flags, we only record whether they were used).

3. Potentially link to our analytics handling code

I mean might as well, we are open source :D

Part 5: Structured error handling (separate effort)

This is intentionally separate from Part 1. Part 1 just logs errors as-is. This part requires rethinking how errors are handled across the entire CLI codebase.

12. error_category + error_code — structured error classification

  • Classify errors into categories: auth, network, validation, config, runtime, unknown
  • Assign structured error codes (e.g. AUTH_TOKEN_EXPIRED, NETWORK_TIMEOUT, VALIDATION_MISSING_INPUT)
  • Powers the "Top errors" table and "Error category trend" reports
  • This requires a broader effort: audit all error paths in the CLI, introduce an error class/enum system, and ensure every thrown error carries a category and code
  • Can be implemented later once we have raw error data from Part 1 to understand what errors actually occur in practice
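
A sketch of what the error class/enum system could look like; `CliError` and the fallback code are hypothetical names, and the categories mirror the list above:

```typescript
type ErrorCategory = 'auth' | 'network' | 'validation' | 'config' | 'runtime' | 'unknown';

// Every deliberately thrown CLI error would carry a category and code.
class CliError extends Error {
    constructor(
        message: string,
        public readonly category: ErrorCategory,
        public readonly code: string,
    ) {
        super(message);
    }
}

// Maps any thrown value to the telemetry properties; unrecognized
// errors fall into the 'unknown' bucket that Part 1 already logs.
function classifyError(err: unknown): { error_category: ErrorCategory; error_code: string } {
    if (err instanceof CliError) {
        return { error_category: err.category, error_code: err.code };
    }
    return { error_category: 'unknown', error_code: 'UNKNOWN' };
}
```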

Additional tasks

  • Update docs with the info about what we are tracking

Metadata

Labels: t-tooling (Issues with this label are in the ownership of the tooling team.)