Skip to content
Closed
63 changes: 63 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,69 @@ Don't use `fmt.Print*` for operational messages (checkpoint saves, hook invocati

**Privacy**: Don't log user content (prompts, file contents, commit messages). Log only operational metadata (IDs, counts, paths, durations).

### Explain Command Export Mode

The `explain` command has been extended with export flags for structured output suitable for public sharing (e.g., portfolios, job applications).

**Usage:**
```bash
# Export with showcase redaction (privacy-safe)
entire explain -c <checkpoint-id> --export --showcase --format=json -o showcase.json

# Export as markdown for README
entire explain -c <checkpoint-id> --export --showcase --format=markdown -o SHOWCASE.md

# Export without showcase (entropy-based redaction only)
entire explain -c <checkpoint-id> --export --format=json -o export.json

# Raw transcript (existing functionality, JSONL output)
entire explain -c <checkpoint-id> --raw-transcript
```

**Export Flags:**
- `--export`: Enable structured export mode (JSON or Markdown)
- `--showcase`: Apply showcase redaction (use with --export)
- `--format`: Output format (json, markdown) - default: json
- `-o, --output`: Output file (default: stdout)

**Showcase Redaction:**

Showcase mode applies aggressive redaction for public sharing:
- **Entropy-based** (existing): API keys, tokens, secrets
- **Pattern-based**: Internal URLs, private IPs, DB connection strings, emails, JWTs, PEM keys, AWS ARNs
- **Structural**: File paths normalized to project-relative, project names from git remotes
- **Blocklist**: User-defined terms (company names, project codenames)

Configuration in `.entire/settings.json`:
```json
{
"strategy_options": {
"showcase": {
"redact_paths": true,
"redact_usernames": true,
"redact_project_info": true,
"allowed_paths": ["src/", "lib/"],
"allowed_domains": [],
"custom_blocklist": ["my-company", "project-codename"]
}
}
}
```

**Security Warning:**

ALWAYS review output before publishing - redaction is best-effort. Showcase mode cannot guarantee 100% removal of all sensitive data.

**Testing:**
- Unit tests: `mise run test` (tests in `redact/showcase_test.go`, `settings/settings_test.go`, `cli/explain_formatters_test.go`)
- Integration tests: `mise run test:integration` (tests in `integration_test/explain_export_test.go`)

**Key files:**
- `redact/showcase.go` - Showcase redaction implementation
- `settings/settings.go` - Settings helpers for showcase config
- `cli/explain.go` - Export flags and routing
- `cli/explain_formatters.go` - JSON and Markdown formatters

### Git Operations

We use github.com/go-git/go-git for most git operations, but with important exceptions:
Expand Down
177 changes: 177 additions & 0 deletions INSIGHTS_IMPLEMENTATION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,177 @@
# Insights Command Implementation

## Summary

Implemented the `entire insights` command to provide session analytics across time periods.

## Files Created

### Core Implementation (550 lines)
1. **`cmd/entire/cli/insights.go`** (200 lines)
- Command definition with flags (--period, --agent, --json, --export, --format, --output, --no-cache)
- Flag validation
- Routing to compute → format → output pipeline
- Integration with checkDisabledGuard()

2. **`cmd/entire/cli/insights_analytics.go`** (450 lines)
- `filterSessionQuality()`: Quality gate (exclude <2 messages, <1 min duration)
- `computeMetadataMetrics()`: Fast path using session/checkpoint metadata
- `enrichWithTranscriptData()`: Parallel transcript parsing with worker pool
- `extractToolUsage()`: Parse JSONL, count tool_use blocks
- `extractHourlyActivity()`: Parse timestamps, bucket by hour
- `chunkTranscript()`: Split transcripts >30K chars into 25K segments
- `aggregateFacets()`: Aggregate facets into insights report
- Data types: InsightsQuery, InsightsReport, AgentStat, ActivityPoint, etc.

3. **`cmd/entire/cli/insights_cache.go`** (150 lines)
- `SessionFacet`: Cached per-session analytics (tokens, tools, duration, messages)
- `InsightsCache`: Persistent cache in `.entire/insights-cache/<repo-hash>.json`
- `loadCache()`: Read cached facets from disk
- `saveCache()`: Persist updated facets
- Cache TTL: 30 days

4. **`cmd/entire/cli/insights_filters.go`** (50 lines)
- `applyPeriodFilter()`: Convert week/month/year to time ranges

5. **`cmd/entire/cli/insights_formatters.go`** (400 lines)
- `formatInsightsTerminal()`: Human-readable terminal output with tables and heatmap
- `formatInsightsJSON()`: Structured JSON export
- `formatInsightsMarkdown()`: Documentation-ready Markdown
- `formatInsightsHTML()`: Interactive HTML report with CSS styling
- Helper formatters: `formatDuration()`, `formatNumber()`, `renderHeatmap()`

### Tests (350 lines)
6. **`cmd/entire/cli/insights_test.go`** (200 lines)
- Unit tests for analytics computation
- Filter logic tests (period, agent)
- Formatter tests (duration, number, hash, barChar)
- All tests use `t.Parallel()` pattern

7. **`cmd/entire/cli/integration_test/insights_test.go`** (150 lines)
- End-to-end command execution tests
- Export format validation (JSON, Markdown, HTML)
- Flag validation tests
- Uses `RunForAllStrategies()` pattern

### Integration
8. **`cmd/entire/cli/root.go`** (1 line changed)
- Added `cmd.AddCommand(newInsightsCmd())` to register command

## Key Features

### Session Quality Filtering
- Excludes sessions with <2 user messages
- Excludes sessions <1 minute duration
- Filters out agent sub-task sessions

### Incremental Caching
- First run: Full analysis of all sessions
- Subsequent runs: Only analyze new sessions (max 50 per run)
- Cache stored in `.entire/insights-cache/<repo-hash>.json`
- 30-day TTL for cache entries
- `--no-cache` flag to force full re-analysis

### Metrics Provided
- **Summary**: Sessions, total time, tokens, estimated cost, files modified
- **Agent breakdown**: Sessions/tokens/hours per agent type
- **Top tools**: Most frequently used tools (top 5)
- **Peak hours**: 24-hour activity heatmap
- **Recent sessions**: Last 5 sessions with descriptions

### Output Formats
- **Terminal** (default): Formatted tables with Unicode box drawing
- **JSON**: Structured data for programmatic use
- **Markdown**: Documentation-ready format with tables
- **HTML**: Minimal light theme report with:
- Left sidebar navigation (Overview, Repositories)
- Time-based greeting header ("Evening, developer")
- Large monospace stat cards with tiny labels
- Scatter/bubble activity chart (hour-of-day on Y-axis, date on X-axis, amber dots)
- Coral/orange "Claude Code" badges
- GitHub-style tables with diff stats (+green / -red)
- Almost zero borders, lots of whitespace

### Performance
- **Without caching (first run)**:
- Week (14 sessions): <1s
- Month (60 sessions): 2-3s (max 50 parsed)
- Year (412 sessions): 8-10s (max 50 parsed per run)

- **With caching (subsequent runs)**:
- Week (5 new): <500ms
- Month (10 new): 1-2s
- Year (20 new): 3-5s

### Bounded Parallelism
- Worker pool: `runtime.NumCPU() / 2` goroutines
- Max 50 new sessions analyzed per run (Claude Code pattern)
- Transcript chunking for >30K chars

## Command Usage

```bash
# Default week view
entire insights

# Month view
entire insights --period month

# Year view
entire insights --period year

# Filter by agent
entire insights --agent claude-code

# JSON output
entire insights --json

# Export to file
entire insights --export --format json -o stats.json
entire insights --export --format markdown -o INSIGHTS.md
entire insights --export --format html -o report.html

# Force full re-analysis
entire insights --no-cache
```

## Testing

Run unit tests:
```bash
mise run test
```

Run integration tests:
```bash
mise run test:integration
```

Run full CI suite:
```bash
mise run test:ci
```

## Implementation Notes

1. **Follows existing patterns**: Command structure matches `explain.go`, formatters follow `explain_formatters.go`
2. **Reuses abstractions**: Uses `strategy.ListSessions()`, `checkpoint.GitStore`, `transcript.ParseFromBytes()`
3. **Privacy-first**: Local storage only, HTML reports stay on disk
4. **Quality gate**: Filters noise before aggregation
5. **Incremental updates**: Cache avoids re-analyzing unchanged sessions
6. **Bounded resources**: Max 50 sessions/run prevents resource exhaustion
7. **HTML design follows light theme guide**:
- White background, minimal color palette
- Large monospace numbers with tiny unit labels
- Scatter/bubble chart instead of heatmaps (hour on Y-axis, date on X-axis)
- Coral/orange badges for Claude Code
- GitHub-style tables with diff stats
- Left sidebar navigation
- Time-based greeting header
- Almost zero borders, generous whitespace

## Future Enhancements (v2)

Optional LLM-powered analysis with `--analyze` flag:
- Pattern recognition: goal categories, friction points, success patterns
- Actionable recommendations: skill suggestions, workflow improvements
- Qualitative insights from quantitative data
2 changes: 1 addition & 1 deletion cmd/entire/cli/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ func IsEnabled() (bool, error) {
// GetStrategy returns the configured strategy instance.
// Falls back to default if the configured strategy is not found.
//
//nolint:ireturn // Factory pattern requires returning the interface

func GetStrategy() strategy.Strategy {
s, err := settings.Load()
if err != nil {
Expand Down
Loading