feat(bandit): expand reviewer-pool exploration beyond configured pool #481

## Problem
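The L0 bandit collects reviewer reward data, but its exploration space is limited to the pool listed in the config. New models (the latest ones OpenRouter adds every week) are never tried automatically. Unless the user updates the config by hand, the bandit stays isolated.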

## Why this matters
To be a self-improving system, exploration has to extend to an external catalog. Only then do we get the effect of "automatically picking better models as time goes on."

## Proposed approach
### Catalog fetcher
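- Periodically fetch the model list from the OpenRouter `/models` API
- Filter: keep only models that can handle coding, English, and structured output (based on metadata hints)
- Try new models with a fixed % of the ε-greedy exploration budget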
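
As a rough illustration of the steps above (filter the catalog, then spend part of the exploration budget on new models), here is a minimal TypeScript sketch. The `CatalogModel` shape, the `tags` hint, the `"structured_outputs"` marker, and the `newModelShare` parameter are assumptions made for this example. They are not taken from the OpenRouter response schema or from the existing bandit code, and the HTTP fetch itself is left out.

```typescript
// Hypothetical shape of one catalog entry after fetching; field names are
// assumptions for this sketch, not a confirmed copy of the OpenRouter schema.
interface CatalogModel {
  id: string;
  supportedParameters: string[]; // assumed capability markers
  tags: string[]; // assumed metadata hints, e.g. "coding"
}

// An arm the bandit already tracks.
interface Arm {
  id: string;
  meanReward: number;
}

// Keep only unknown models that look usable as reviewers.
function filterCandidates(catalog: CatalogModel[], known: Set<string>): CatalogModel[] {
  return catalog.filter(
    (m) =>
      !known.has(m.id) &&
      m.supportedParameters.includes("structured_outputs") &&
      m.tags.includes("coding"),
  );
}

// Uniform random pick; rng must return a value in [0, 1).
function pickRandom<T>(items: T[], rng: () => number): T {
  const item = items[Math.floor(rng() * items.length)];
  if (item === undefined) throw new Error("cannot pick from an empty list");
  return item;
}

// ε-greedy selection where a share of the exploration steps goes to new models.
function pickReviewer(
  pool: Arm[],
  candidates: CatalogModel[],
  epsilon: number, // probability of exploring at all
  newModelShare: number, // fraction of exploration spent on catalog candidates
  rng: () => number = Math.random,
): string {
  if (pool.length === 0) throw new Error("pool must not be empty");
  if (rng() >= epsilon) {
    // Exploit: arm with the highest observed mean reward.
    return pool.reduce((best, a) => (a.meanReward > best.meanReward ? a : best)).id;
  }
  if (candidates.length > 0 && rng() < newModelShare) {
    return pickRandom(candidates, rng).id; // explore a new catalog model
  }
  return pickRandom(pool, rng).id; // explore within the existing pool
}
```

Passing `rng` as a parameter keeps the selection deterministic in tests; in production the default `Math.random` would be used.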

### Exploration budget
- Once a week (cron), the bandit replaces the single lowest-ranked member of the low-reward pool with a new candidate
- The base pool from the config is protected (respect the user's settings)
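
The weekly swap could look roughly like the sketch below. `PoolEntry`, `swapLowestPerformer`, and the zeroed starting statistics for the new entry are hypothetical names and choices for illustration; the real store in `bandit-store.ts` may track different fields. Scheduling (the cron trigger) is assumed to live in the caller.

```typescript
// Hypothetical pool entry; the actual bandit store may hold more state.
interface PoolEntry {
  id: string;
  meanReward: number;
  pulls: number;
}

// Replace the lowest-reward entry that is NOT part of the user's base pool.
// Returns the original array when nothing can be replaced.
function swapLowestPerformer(
  pool: PoolEntry[],
  basePool: ReadonlySet<string>,
  candidateId: string,
): PoolEntry[] {
  // Entries listed in the user's config are never eligible for replacement.
  const replaceable = pool.filter((e) => !basePool.has(e.id));
  const alreadyPresent = pool.some((e) => e.id === candidateId);
  if (replaceable.length === 0 || alreadyPresent) return pool;

  const worst = replaceable.reduce((lo, e) => (e.meanReward < lo.meanReward ? e : lo));

  // Assumption: a new candidate starts with no observations.
  const fresh: PoolEntry = { id: candidateId, meanReward: 0, pulls: 0 };
  return pool.map((e) => (e.id === worst.id ? fresh : e));
}
```

Returning a new array instead of mutating the input makes it easy to log or roll back a swap.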

### Safety
- A new model has no expected reward yet, so trial runs are used only on low-severity PRs
- 3 consecutive unparseable outputs → automatic blacklist
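
A minimal sketch of both safety rules follows. It assumes that "unparseable" means the raw output fails `JSON.parse`, and that PR severity is a three-level label; `TrialState`, `canTrial`, and `recordOutput` are hypothetical names, not existing functions in the codebase.

```typescript
type Severity = "low" | "medium" | "high"; // assumed severity levels

interface TrialState {
  consecutiveUnparseable: Map<string, number>;
  blacklist: Set<string>;
}

const MAX_CONSECUTIVE_UNPARSEABLE = 3;

function createTrialState(): TrialState {
  return { consecutiveUnparseable: new Map(), blacklist: new Set() };
}

// A trial model may only review low-severity PRs and must not be blacklisted.
function canTrial(state: TrialState, modelId: string, severity: Severity): boolean {
  return severity === "low" && !state.blacklist.has(modelId);
}

// Record one output; assumption: "unparseable" means JSON.parse throws.
function recordOutput(state: TrialState, modelId: string, rawOutput: string): void {
  let parseable = true;
  try {
    JSON.parse(rawOutput);
  } catch {
    parseable = false;
  }
  if (parseable) {
    state.consecutiveUnparseable.delete(modelId); // a good output resets the streak
    return;
  }
  const streak = (state.consecutiveUnparseable.get(modelId) ?? 0) + 1;
  state.consecutiveUnparseable.set(modelId, streak);
  if (streak >= MAX_CONSECUTIVE_UNPARSEABLE) state.blacklist.add(modelId);
}
```

The streak resets on any parseable output, so only three failures in a row trigger the blacklist.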

## Acceptance criteria
- [ ] OpenRouter catalog fetcher (`packages/shared/src/providers/openrouter-catalog.ts`)
- [ ] Bandit exploration budget logic
- [ ] Safety: a new model's output gets a separate weight in the initial rounds
- [ ] Docs: how exploration works (with an opt-out)
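
One possible reading of the "separate weight" criterion is a fixed reduced weight during a warm-up period. The sketch below assumes that interpretation; `warmupRounds`, `trialWeight`, and their default values are illustrative placeholders, not decided parameters.

```typescript
// Hypothetical vote from one reviewer model on one PR.
interface ReviewerVote {
  modelId: string;
  score: number; // reviewer's score for the PR
  pulls: number; // how many times this model has been used so far
}

// Models still in warm-up count with a reduced weight (assumed values).
function voteWeight(pulls: number, warmupRounds = 5, trialWeight = 0.25): number {
  return pulls < warmupRounds ? trialWeight : 1;
}

// Weighted average of reviewer scores; returns 0 when there are no votes.
function aggregateScore(votes: ReviewerVote[]): number {
  let weighted = 0;
  let total = 0;
  for (const v of votes) {
    const w = voteWeight(v.pulls);
    weighted += w * v.score;
    total += w;
  }
  return total === 0 ? 0 : weighted / total;
}
```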

## References
- `packages/core/src/l0/bandit-store.ts`
- `packages/core/src/l0/leaderboard.ts`
- Related to `#474` (learning effectiveness audit)
