diff --git a/uts/.claude/skills/write-test-spec.md b/uts/.claude/skills/write-test-spec.md index 0dac235e2..d5a05404d 100644 --- a/uts/.claude/skills/write-test-spec.md +++ b/uts/.claude/skills/write-test-spec.md @@ -22,6 +22,12 @@ This skill provides comprehensive guidance for writing portable test specificati - Provision apps via `POST /apps` with body from `ably-common/test-resources/test-app-setup.json` - Use `endpoint: "sandbox"` in ClientOptions +### Proxy Integration Tests (Ably Sandbox via Proxy) +- Run against Ably Sandbox through a programmable proxy (`uts/test/proxy/`) +- Proxy transparently forwards traffic but can inject faults via rules +- Use for testing fault behaviour: connection failures, token renewal under errors, heartbeat starvation, channel error injection +- See `uts/test/realtime/integration/helpers/proxy.md` for the full proxy infrastructure spec + ## Mock Infrastructure Patterns ### HTTP Mock Infrastructure @@ -229,6 +235,189 @@ mock_ws.active_connection.send_to_client(ProtocolMessage( mock_ws.active_connection.simulate_disconnect() ``` +## Proxy Integration Tests + +For detailed proxy infrastructure documentation, see `uts/test/realtime/integration/helpers/proxy.md`. + +### When to Use Proxy Tests + +| Test type | When to use | +|-----------|-------------| +| **Unit test** (mock HTTP/WebSocket) | Client-side logic, state machines, request formation, error parsing. Fast, deterministic. | +| **Direct sandbox integration** | Happy-path behaviour: connect, publish, subscribe. No fault injection needed. | +| **Proxy integration test** | Fault behaviour against real backend: connection failures, resume, heartbeat starvation, token renewal under network errors, channel error injection. | + +### Proxy Test Structure + +```markdown +# Feature Name Proxy Integration Tests + +Spec points: `RTN14a`, `RTN14b`, ... 
+ +## Test Type +Proxy integration test against Ably Sandbox endpoint + +## Proxy Infrastructure +See `uts/test/realtime/integration/helpers/proxy.md` for proxy infrastructure specification. + +## Corresponding Unit Tests +- `uts/test/realtime/unit/connection/connection_failures_test.md` — RTN15a, RTN15b + +## Sandbox Setup +[standard app provisioning — same as direct sandbox tests] + +--- + +## RTN14a - Test name + +| Spec | Requirement | +|------|-------------| +| RTN14a | ... | + +**Corresponding unit test:** `connection_open_failures_test.md` RTN14a + +Tests that [behaviour] when the proxy injects [fault]. + +### Setup + +```pseudo +session = create_proxy_session( + target: TargetConfig(realtimeHost: "sandbox-realtime.ably.io", restHost: "sandbox-rest.ably.io"), + port: allocated_port, + rules: [{ + "match": { ... }, + "action": { ... }, + "times": 1, + "comment": "description" + }] +) + +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false, + autoConnect: false +)) +``` + +### Test Steps + +```pseudo +state_changes = [] +client.connection.on(change => state_changes.append(change.current)) + +client.connect() +AWAIT_STATE client.connection.state == ConnectionState.failed + WITH timeout: 15 seconds +``` + +### Assertions + +```pseudo +ASSERT client.connection.state == ConnectionState.failed +ASSERT client.connection.errorReason.code == 40005 +``` +``` + +### Common Proxy Rule Patterns + +**Replace server response with error:** +```json +{ + "match": { "type": "ws_frame_to_client", "action": "CONNECTED" }, + "action": { + "type": "replace", + "message": { "action": 9, "error": { "code": 40005, "statusCode": 400, "message": "Error" } } + }, + "times": 1 +} +``` + +**Refuse connection (one-shot):** +```json +{ + "match": { "type": "ws_connect", "count": 1 }, + "action": { "type": "refuse_connection" }, + "times": 1 +} +``` + +**Suppress frame (cause timeout):** 
+```json +{ + "match": { "type": "ws_frame_to_server", "action": "ATTACH" }, + "action": { "type": "suppress" } +} +``` + +**Temporal trigger (timed fault injection):** +```json +{ + "match": { "type": "delay_after_ws_connect", "delayMs": 2000 }, + "action": { "type": "suppress_onwards" }, + "times": 1 +} +``` + +**Inject message to client:** +```json +{ + "match": { "type": "delay_after_ws_connect", "delayMs": 1000 }, + "action": { + "type": "inject_to_client_and_close", + "message": { "action": 6, "error": { "code": 40142, "statusCode": 401, "message": "Token expired" } } + }, + "times": 1 +} +``` + +**HTTP fault (return custom response):** +```json +{ + "match": { "type": "http_request", "pathContains": "/channels/" }, + "action": { + "type": "http_respond", + "status": 401, + "body": { "error": { "code": 40142, "statusCode": 401, "message": "Token expired" } } + }, + "times": 1 +} +``` + +### Proxy Test Conventions + +1. Each test references the spec point AND the corresponding unit test +2. Tests use `create_proxy_session()` with rules, then connect SDK through the proxy +3. Tests use `AWAIT_STATE` for state assertions and record state changes for sequence verification +4. Tests verify behaviour via SDK state AND proxy event log where useful +5. All tests use `useBinaryProtocol: false` (SDK doesn't implement msgpack) +6. All tests use `endpoint: "localhost"` which auto-disables fallback hosts (REC2c2) +7. Timeouts are generous (10-30s) since real network is involved +8. Each test file provisions a sandbox app in `BEFORE ALL TESTS` and cleans up in `AFTER ALL TESTS` +9. Each test creates its own proxy session and cleans it up after +10. Use imperative actions (`session.trigger_action()`) when you need to disconnect at a specific point in the test flow, rather than timing-based rules +11. 
Use `add_rules()` to add rules dynamically during a test (e.g., after channel attach succeeds, add a rule to suppress DETACH) + +### Proxy Event Log Assertions + +```pseudo +# Verify resume was attempted on reconnection +log = session.get_log() +ws_connects = log.filter(e => e.type == "ws_connect") +ASSERT ws_connects.length >= 2 +ASSERT ws_connects[1].queryParams["resume"] IS NOT null + +# Verify heartbeats=true in connection URL +ASSERT ws_connects[0].queryParams["heartbeats"] == "true" + +# Verify specific frames were sent +frames = log.filter(e => e.type == "ws_frame" AND e.direction == "client_to_server") +attach_frames = frames.filter(f => f.message.action == 10) # ATTACH = 10 +ASSERT attach_frames.length == 1 +``` + ## Spec Requirement Summaries **Every test must include a spec requirement summary immediately after the heading.** @@ -777,7 +966,17 @@ uts/test/ │ │ ├── connection_open_failures_test.md │ │ └── ... │ └── integration/ -│ └── (future Realtime integration tests) +│ ├── helpers/ +│ │ └── proxy.md # Proxy infrastructure spec +│ ├── proxy/ +│ │ ├── connection_open_failures.md # RTN14 tests via proxy +│ │ ├── connection_resume.md # RTN15 tests via proxy +│ │ ├── heartbeat.md # RTN23 tests via proxy +│ │ ├── channel_faults.md # RTL4, RTL5, RTL13, RTL14 via proxy +│ │ ├── rest_faults.md # RSC10, RSC15 via proxy +│ │ └── end_to_end.md # RTL6 publish + history via proxy +│ ├── connection_lifecycle_test.md # Direct sandbox tests +│ └── ... 
└── README.md ``` diff --git a/uts/completion-status.md b/uts/completion-status.md index 9909eb07b..43a27031c 100644 --- a/uts/completion-status.md +++ b/uts/completion-status.md @@ -386,6 +386,31 @@ This matrix lists all spec items from the [Ably features spec](../../specificati --- +### Proxy Integration Tests + +| Spec item | Description | UTS test spec | +|-----------|-------------|---------------| +| RTN14a | Fatal error during connection open → FAILED | Yes — `realtime/integration/proxy/connection_open_failures.md` | +| RTN14b | Token error during connection → renew and retry | Yes — `realtime/integration/proxy/connection_open_failures.md` | +| RTN14c | Connection timeout (no CONNECTED received) | Yes — `realtime/integration/proxy/connection_open_failures.md` | +| RTN14d | Retry after connection refused | Yes — `realtime/integration/proxy/connection_open_failures.md` | +| RTN14g | Connection-level ERROR during open → FAILED | Yes — `realtime/integration/proxy/connection_open_failures.md` | +| RTN15a | Unexpected disconnect triggers resume | Yes — `realtime/integration/proxy/connection_resume.md` | +| RTN15b/c6 | Resume preserves connectionId | Yes — `realtime/integration/proxy/connection_resume.md` | +| RTN15c7 | Failed resume gets new connectionId | Yes — `realtime/integration/proxy/connection_resume.md` | +| RTN15h1 | DISCONNECTED with token error, non-renewable → FAILED | Yes — `realtime/integration/proxy/connection_resume.md` | +| RTN15h3 | DISCONNECTED with non-token error → reconnect | Yes — `realtime/integration/proxy/connection_resume.md` | +| RTN23a | Heartbeat starvation causes disconnect | Yes — `realtime/integration/proxy/heartbeat.md` | +| RTN23a | heartbeats=true in connection URL | Yes — `realtime/integration/proxy/heartbeat.md` | +| RTL4f | Attach timeout (server doesn't respond) | Yes — `realtime/integration/proxy/channel_faults.md` | +| RTL4h | Server responds with ERROR to ATTACH | Yes — `realtime/integration/proxy/channel_faults.md` | +| 
RTL5f | Detach timeout (server doesn't respond) | Yes — `realtime/integration/proxy/channel_faults.md` | +| RTL13a | Server sends unsolicited DETACHED → reattach | Yes — `realtime/integration/proxy/channel_faults.md` | +| RTL14 | Server sends channel ERROR → FAILED | Yes — `realtime/integration/proxy/channel_faults.md` | +| RSC10 | Token renewal on HTTP 401 | Yes — `realtime/integration/proxy/rest_faults.md` | +| RSC15a | HTTP 503 error (no fallback) | Yes — `realtime/integration/proxy/rest_faults.md` | +| RTL6 | End-to-end publish and history | Yes — `realtime/integration/proxy/rest_faults.md` | + ## Summary | Area | Spec groups | With UTS spec | Coverage | diff --git a/uts/proxy/IMPLEMENTATION.md b/uts/proxy/IMPLEMENTATION.md new file mode 100644 index 000000000..02a266181 --- /dev/null +++ b/uts/proxy/IMPLEMENTATION.md @@ -0,0 +1,710 @@ +# Ably Test Proxy — Go Implementation Plan + +## Project structure + +``` +uts/test/proxy/ +├── PROPOSAL.md # API and design proposal +├── IMPLEMENTATION.md # This file +├── go.mod # module: ably.io/test-proxy +├── go.sum +├── main.go # Entry point, flag parsing, server startup +├── server.go # Control API HTTP server, routing +├── session.go # Session lifecycle (create, get, delete, timeout) +├── session_store.go # Thread-safe session storage +├── rule.go # Rule types, matching logic, action types +├── ws_proxy.go # WebSocket proxy (client↔server frame relay) +├── http_proxy.go # HTTP reverse proxy with rule interception +├── protocol.go # Ably protocol message parsing (JSON + msgpack) +├── log.go # Event log (per-session traffic capture) +├── action.go # Imperative action dispatch +├── listener.go # Per-session TCP listener management +└── proxy_test.go # Tests +``` + +## Design decisions + +1. **Port-per-session routing.** The SDK constructs URLs with standard paths (`/channels/...`, `/?key=...`). It cannot prepend a session path prefix. Therefore, each session gets its own TCP listener on a dedicated port. 
The test process assigns a port from a pool (e.g., 10000–11023, 1024 ports) and passes it in the session creation request. The proxy binds that port and maps all traffic on it to the session. The SDK connects to `ws://localhost:{sessionPort}/...` and `http://localhost:{sessionPort}/...` with normal paths. + +2. **No TLS between client and proxy.** The proxy serves plain HTTP/WS to the SDK. Upstream connections to the Ably sandbox use TLS (`wss://`, `https://`). The SDK is configured with `tls: false`. + +3. **Msgpack support.** The proxy decodes both JSON (text frames) and msgpack (binary frames) for rule matching. Go has good msgpack libraries. Both formats are decoded into the same `ProtocolMessage` struct for matching, then the original raw bytes are forwarded unchanged (the proxy never re-encodes). + +## Dependencies + +| Package | Purpose | +|---------|---------| +| `github.com/gorilla/websocket` | WebSocket client and server | +| `github.com/vmihailenco/msgpack/v5` | Msgpack decoding for binary protocol frames | +| `net/http` | Control API server | +| `net/http/httputil` | `ReverseProxy` for HTTP passthrough | +| `encoding/json` | JSON protocol message parsing, API request/response | +| `sync` | Mutex for session store and rule list | +| `net` | TCP listener management for per-session ports | + +## Phases + +### Phase 1: Skeleton and control API + +Build the control API HTTP server, session CRUD, per-session listener management, and health check. No proxying yet. 
+ +**Files:** `main.go`, `server.go`, `session.go`, `session_store.go`, `rule.go`, `log.go`, `listener.go` + +#### `main.go` + +- Parse `--control-port` flag (default 9100) +- Start the control API HTTP server on the control port +- Handle SIGINT/SIGTERM for graceful shutdown (close all session listeners) + +#### `server.go` + +Control API mux routing: + +| Method | Path | Handler | +|--------|------|---------| +| `GET` | `/health` | Return `{"ok": true}` | +| `POST` | `/sessions` | Create session | +| `GET` | `/sessions/{id}` | Get session metadata | +| `POST` | `/sessions/{id}/rules` | Add rules | +| `POST` | `/sessions/{id}/actions` | Trigger imperative action | +| `GET` | `/sessions/{id}/log` | Get event log | +| `DELETE` | `/sessions/{id}` | Teardown session | + +Use `net/http.ServeMux` (Go 1.22 has method-aware routing with `{id}` wildcards). No framework needed. + +#### `listener.go` + +Per-session TCP listener management: + +```go +// StartSessionListener binds to the given port, starts an HTTP server +// that routes all WS and HTTP traffic to the session's handlers. +// Returns an error immediately if the port cannot be bound. +func StartSessionListener(session *Session, port int) error + +// StopSessionListener closes the listener and shuts down the HTTP server. +func StopSessionListener(session *Session) +``` + +On session creation: +1. Caller provides `port` in the request body +2. Proxy calls `net.Listen("tcp", fmt.Sprintf(":%d", port))` +3. If listen fails (port in use, permission denied), return HTTP 409 with error message +4. Start `http.Serve` on the listener in a goroutine +5. The per-session HTTP server routes: + - WebSocket upgrade requests → `ws_proxy` handler + - All other HTTP requests → `http_proxy` handler + +On session deletion: +1. Close the TCP listener (stops accepting new connections) +2. Close all active WS connections +3. 
Shut down the per-session HTTP server + +#### `session.go` + +```go +type CreateSessionRequest struct { + Target TargetConfig `json:"target"` + Rules []Rule `json:"rules,omitempty"` + TimeoutMs int `json:"timeoutMs,omitempty"` // default 30000 + Port int `json:"port"` // required — caller-assigned +} + +type CreateSessionResponse struct { + SessionID string `json:"sessionId"` + Proxy ProxyConfig `json:"proxy"` +} + +type ProxyConfig struct { + Host string `json:"host"` // "localhost:{port}" + Port int `json:"port"` +} +``` + +Session creation flow: +1. Validate request (port is required, target has at least one host) +2. Generate session ID (random 8-char hex) +3. Create `Session` struct with rules, empty event log +4. Attempt to bind the requested port — fail fast with 409 if it can't +5. Start the per-session HTTP server +6. Start timeout timer (`time.AfterFunc`) +7. Store session in `SessionStore` +8. Return session ID and proxy host/port + +#### `session_store.go` + +Thread-safe `map[string]*Session` with `sync.RWMutex`. + +```go +type SessionStore struct { + sessions map[string]*Session + mu sync.RWMutex +} + +func (s *SessionStore) Create(session *Session) error +func (s *SessionStore) Get(id string) (*Session, bool) +func (s *SessionStore) Delete(id string) (*Session, bool) +func (s *SessionStore) All() []*Session +``` + +#### `rule.go` + +Rule, MatchConfig, ActionConfig structs with JSON tags. Matching logic is a method on Session: + +```go +// FindMatchingRule iterates rules in order and returns the first match. +// Returns nil if no rule matches (passthrough). 
+func (s *Session) FindMatchingRule(event MatchEvent) *Rule +``` + +`MatchEvent` is a tagged union representing the thing being matched: + +```go +type MatchEvent struct { + Type string // "ws_connect", "ws_frame_to_server", "ws_frame_to_client", "http_request" + Action string // protocol message action name (for frame matches) + Channel string // protocol message channel (for frame matches) + Method string // HTTP method + Path string // HTTP request path + QueryParams map[string]string // WS connection query params +} +``` + +#### `log.go` + +Append-only event log with mutex. + +```go +type EventLog struct { + events []Event + mu sync.Mutex +} + +func (l *EventLog) Append(event Event) +func (l *EventLog) Events() []Event // returns a copy +``` + +### Phase 2: WebSocket proxy — passthrough + +Implement transparent WebSocket proxying with no rules applied. + +**Files:** `ws_proxy.go`, `protocol.go` + +#### WebSocket proxy flow + +1. Client connects to `ws://localhost:{sessionPort}/?key=...&heartbeats=true&...` +2. Per-session HTTP server detects WebSocket upgrade, hands off to `WsProxyHandler` +3. Increment `session.WsConnectCount` +4. Log `ws_connect` event with URL and query params +5. Build upstream URL: `wss://{target.realtimeHost}/?key=...&heartbeats=true&...` + - Copy all query params from client request + - Scheme is always `wss` (TLS to upstream) +6. Dial upstream WebSocket +7. If dial fails, return error to client (502) +8. Accept the client WebSocket upgrade +9. Start two goroutines: + - **client→server relay**: `readFromClient()` → log → `writeToServer()` + - **server→client relay**: `readFromServer()` → log → `writeToClient()` +10. When either side closes or errors, close the other side +11. Log `ws_disconnect` event + +#### `protocol.go` + +Parse protocol messages for rule matching. Support both JSON and msgpack. + +```go +// ParseProtocolMessage attempts to decode a WebSocket frame into a ProtocolMessage. +// For text frames, parses as JSON. 
+// For binary frames, parses as msgpack. +// Returns the parsed message and nil error on success. +// On parse failure, returns a zero ProtocolMessage and error (frame is still forwarded). +func ParseProtocolMessage(data []byte, messageType int) (ProtocolMessage, error) +``` + +The `ProtocolMessage` struct: + +```go +type ProtocolMessage struct { + Action int // numeric action code (always normalized to int) + Channel string + Error *ErrorInfo +} +``` + +Action name↔number mapping (subset needed for matching): + +| Name | Number | +|------|--------| +| HEARTBEAT | 0 | +| ACK | 1 | +| NACK | 2 | +| CONNECT | 3 | +| CONNECTED | 4 | +| DISCONNECT | 5 | +| DISCONNECTED | 6 | +| CLOSE | 7 | +| CLOSED | 8 | +| ERROR | 9 | +| ATTACH | 10 | +| ATTACHED | 11 | +| DETACH | 12 | +| DETACHED | 13 | +| PRESENCE | 14 | +| MESSAGE | 15 | +| SYNC | 16 | +| AUTH | 17 | + +Rule matching accepts either name (`"ATTACH"`) or number (`10`) in the match config. Internally normalized to int. + +**Msgpack decoding:** + +Ably msgpack protocol messages are arrays where the first element is the action number. Use `github.com/vmihailenco/msgpack/v5` to decode into a `[]interface{}` and extract the action and channel fields by position. The field positions follow the Ably protocol: + +| Index | Field | +|-------|-------| +| 0 | action | +| 1 | channel | +| ... | (other fields — not needed for matching) | + +Alternatively, decode as a map if the server uses map encoding. Try map first, fall back to array. Log a warning if neither works but still forward the raw frame. + +#### Connection tracking + +```go +type WsConnection struct { + ClientConn *websocket.Conn + ServerConn *websocket.Conn + ConnNumber int // which connection attempt this is (1-based) + timers []*time.Timer // for delay_after_ws_connect cleanup + mu sync.Mutex +} +``` + +Session tracks `activeWsConn *WsConnection` (most recent). 
When a new WS connection arrives, any previous one should already be closed (the SDK doesn't multiplex WS connections). But track it as a list for safety. + +### Phase 3: WebSocket proxy — rule matching + +Apply rules to WebSocket frames and connection events. + +**Files:** `ws_proxy.go`, `rule.go` + +#### Rule evaluation points + +**On WS connection attempt** (before dialing upstream): +1. Build `MatchEvent{Type: "ws_connect", QueryParams: ...}` +2. Find matching rule +3. If rule action is `refuse_connection`: return HTTP 502 to client, don't dial upstream +4. If rule action is `accept_and_close`: accept WS upgrade, send close frame, don't dial upstream +5. Otherwise: proceed to dial upstream + +**On frame from client** (before forwarding to server): +1. Parse protocol message +2. Build `MatchEvent{Type: "ws_frame_to_server", Action: ..., Channel: ...}` +3. Check `session.suppressClientToServer` flag — if set, drop frame +4. Find matching rule +5. Execute action (suppress, delay, replace, etc.) +6. If no rule matched: forward frame + +**On frame from server** (before forwarding to client): +1. Parse protocol message +2. Build `MatchEvent{Type: "ws_frame_to_client", Action: ..., Channel: ...}` +3. Check `session.suppressServerToClient` flag — if set, drop frame +4. Find matching rule +5. Execute action +6. If no rule matched: forward frame + +#### Count tracking + +The `count` match field means "only match the Nth occurrence of this event type." Counters are per-session: + +- `session.wsConnectCount` — incremented on each WS connection attempt +- `session.wsFrameToServerCount` — incremented on each frame from client +- `session.wsFrameToClientCount` — incremented on each frame from server + +A rule with `count: 2` matches when the counter equals 2 at evaluation time. + +Optionally, counters can be scoped per-action (e.g., "the 2nd ATTACH frame"). 
The recommended implementation treats `count` as a per-rule occurrence counter (tracked in the rule's `matchCount` field): the rule counts how many times its match condition (type + action + channel) has been satisfied, and fires only when that count equals the specified value. This is intuitive: `{ "type": "ws_connect", "count": 2 }` means "the 2nd connection attempt that would otherwise match this rule."
+
+#### `times` handling
+
+```go
+func (s *Session) FireRule(rule *Rule) {
+	rule.fired++
+	if rule.Times > 0 && rule.fired >= rule.Times {
+		s.removeRule(rule)
+	}
+}
+```
+
+### Phase 4: HTTP proxy — passthrough and rules
+
+Implement HTTP reverse proxying for REST API calls.
+
+**Files:** `http_proxy.go`
+
+#### HTTP proxy flow
+
+1. Client sends HTTP request to `http://localhost:{sessionPort}/channels/test/messages`
+2. Per-session HTTP server routes non-WebSocket requests to `HttpProxyHandler`
+3. Increment `session.HttpReqCount`
+4. Log `http_request` event
+5. Build `MatchEvent{Type: "http_request", Method: ..., Path: ...}`
+6. Find matching rule
+7. Execute action:
+   - `passthrough` (or no match): forward to upstream
+   - `http_respond`: return specified response immediately
+   - `http_delay`: sleep then forward
+   - `http_drop`: hijack connection and close
+   - `http_replace_response`: forward, discard response, return specified response
+8. If forwarding: use `httputil.ReverseProxy` with upstream `https://{target.restHost}`
+9. Log `http_response` event
+
+#### Forwarding details
+
+- Copy all request headers, body, query params
+- Set `Host` header to target host
+- Scheme is `https` (TLS to upstream)
+- Response headers and body are copied back to client
+- Content-Type, status code, etc. are preserved
+
+#### HTTP count tracking
+
+`session.httpReqCount` increments on each request. 
Per-rule `count` matching works the same as for WS: per-rule occurrence counter. + +### Phase 5: Imperative actions + +Implement `POST /sessions/{id}/actions`. + +**Files:** `action.go` + +```go +type ActionRequest struct { + Type string `json:"type"` + Message json.RawMessage `json:"message,omitempty"` + CloseCode int `json:"closeCode,omitempty"` +} +``` + +Handler: +1. Parse request body +2. Find session +3. Find active WS connection(s) +4. Execute action on the connection: + - `disconnect`: `conn.ClientConn.UnderlyingConn().Close()` (raw TCP close) + - `close`: `conn.ClientConn.WriteMessage(websocket.CloseMessage, ...)` + - `inject_to_client`: `conn.ClientConn.WriteMessage(websocket.TextMessage, message)` + - `inject_to_client_and_close`: write message then close +5. Log the action as an event +6. Return 200 OK (or 404/409 on errors) + +### Phase 6: Temporal triggers + +Implement `delay_after_ws_connect` match type. + +**Files:** `ws_proxy.go` + +After upstream WS connection is established: + +1. Lock session mutex +2. Iterate rules looking for `delay_after_ws_connect` type +3. For each matching rule, schedule `time.AfterFunc`: + ```go + timer := time.AfterFunc(time.Duration(rule.Match.DelayMs)*time.Millisecond, func() { + executeAction(session, wsConn, rule.Action) + session.FireRule(rule) + }) + wsConn.timers = append(wsConn.timers, timer) + ``` +4. On WS connection close, cancel all pending timers: + ```go + for _, t := range wsConn.timers { + t.Stop() + } + ``` +5. On session delete, cancel all timers on all connections + +### Phase 7: Tests + +**Files:** `proxy_test.go` + +Test infrastructure: each test starts a local mock upstream server (HTTP + WS echo/scripted), creates the proxy, creates a session pointing at the mock upstream, and connects a client through the proxy. 
+ +```go +// Helper: start a mock upstream WS server that sends CONNECTED then echoes frames +func startMockUpstream(t *testing.T) (wsURL string, httpURL string, cleanup func()) + +// Helper: start the proxy control API +func startProxy(t *testing.T) (controlURL string, cleanup func()) + +// Helper: create a session and return proxy host:port +func createSession(t *testing.T, controlURL string, req CreateSessionRequest) CreateSessionResponse +``` + +#### Test cases + +**Control API:** +1. Health check returns 200 +2. Create session succeeds, returns valid port +3. Create session with port already in use returns 409 +4. Get session returns metadata +5. Delete session returns event log, frees port +6. Session auto-deleted after timeout +7. Add rules dynamically + +**WebSocket proxy:** +8. WS passthrough — frames forwarded both directions +9. WS connection refusal — first connection refused, second passes through +10. WS disconnect action — abrupt close mid-stream +11. WS frame suppression — client ATTACH suppressed, server never sees it +12. WS inject_to_client — proxy injects a frame, original also forwarded +13. WS inject_to_client_and_close — proxy injects then closes +14. WS frame replacement — original frame replaced with different one +15. WS suppress_onwards — all subsequent server frames dropped +16. WS count matching — rule only fires on Nth connection/frame +17. WS one-shot rule (times=1) — fires once then removed + +**HTTP proxy:** +18. HTTP passthrough — request forwarded, response returned +19. HTTP respond — fake 401 returned for first request, second passes through +20. HTTP delay — response delayed by specified duration +21. HTTP drop — connection dropped, no response +22. HTTP replace_response — upstream response discarded, fake one returned +23. HTTP count matching + +**Imperative actions:** +24. Disconnect via actions API +25. Inject message via actions API +26. Action on session with no active WS returns error + +**Temporal triggers:** +27. 
delay_after_ws_connect fires and disconnects +28. delay_after_ws_connect cancelled if connection closes first +29. delay_after_ws_connect cancelled on session delete + +**Event log:** +30. Log captures WS connect, frames, disconnect events +31. Log captures HTTP request/response events +32. Log records which rule matched (or null for passthrough) + +**Concurrent sessions:** +33. Two sessions on different ports with different rules don't interfere + +**Msgpack:** +34. Binary (msgpack) frames parsed and matched by action +35. Binary frames forwarded unchanged (no re-encoding) + +## Data types + +### Session + +```go +type Session struct { + ID string + Target TargetConfig + Port int + Rules []*Rule + EventLog *EventLog + TimeoutTimer *time.Timer + Listener net.Listener + Server *http.Server + + activeWsConns []*WsConnection + wsConnectCount int + httpReqCount int + + suppressServerToClient bool + suppressClientToServer bool + + mu sync.Mutex +} + +type TargetConfig struct { + RealtimeHost string `json:"realtimeHost"` + RestHost string `json:"restHost"` +} +``` + +### Rule + +```go +type Rule struct { + Match MatchConfig `json:"match"` + Action ActionConfig `json:"action"` + Times int `json:"times,omitempty"` // 0 = unlimited + Comment string `json:"comment,omitempty"` + + matchCount int // how many times the match condition was satisfied +} + +type MatchConfig struct { + Type string `json:"type"` + Count int `json:"count,omitempty"` + Action string `json:"action,omitempty"` + Channel string `json:"channel,omitempty"` + Method string `json:"method,omitempty"` + PathContains string `json:"pathContains,omitempty"` + QueryContains map[string]string `json:"queryContains,omitempty"` + DelayMs int `json:"delayMs,omitempty"` +} + +type ActionConfig struct { + Type string `json:"type"` + CloseCode int `json:"closeCode,omitempty"` + DelayMs int `json:"delayMs,omitempty"` + Message json.RawMessage `json:"message,omitempty"` + Status int `json:"status,omitempty"` + Body 
json.RawMessage `json:"body,omitempty"` + Headers map[string]string `json:"headers,omitempty"` +} +``` + +### Event log + +```go +type Event struct { + Timestamp time.Time `json:"timestamp"` + Type string `json:"type"` + Direction string `json:"direction,omitempty"` + URL string `json:"url,omitempty"` + QueryParams map[string]string `json:"queryParams,omitempty"` + Message json.RawMessage `json:"message,omitempty"` + Method string `json:"method,omitempty"` + Path string `json:"path,omitempty"` + Status int `json:"status,omitempty"` + Initiator string `json:"initiator,omitempty"` + CloseCode int `json:"closeCode,omitempty"` + RuleMatched *string `json:"ruleMatched"` + Headers map[string]string `json:"headers,omitempty"` +} + +type EventLog struct { + events []Event + mu sync.Mutex +} +``` + +### Protocol message (minimal parsing) + +```go +type ProtocolMessage struct { + Action int + Channel string + Error *ErrorInfo +} + +type ErrorInfo struct { + Code int `json:"code"` + StatusCode int `json:"statusCode"` + Message string `json:"message"` +} + +// Action name constants +const ( + ActionHeartbeat = 0 + ActionAck = 1 + ActionNack = 2 + ActionConnect = 3 + ActionConnected = 4 + ActionDisconnect = 5 + ActionDisconnected = 6 + ActionClose = 7 + ActionClosed = 8 + ActionError = 9 + ActionAttach = 10 + ActionAttached = 11 + ActionDetach = 12 + ActionDetached = 13 + ActionPresence = 14 + ActionMessage = 15 + ActionSync = 16 + ActionAuth = 17 +) + +// actionNames maps name strings to int for rule matching +var actionNames = map[string]int{ + "HEARTBEAT": 0, + "ACK": 1, + // ... +} +``` + +### WsConnection + +```go +type WsConnection struct { + ClientConn *websocket.Conn + ServerConn *websocket.Conn + ConnNumber int + timers []*time.Timer + closed bool + mu sync.Mutex +} +``` + +## Build and run + +```bash +cd uts/test/proxy +go mod init ably.io/test-proxy +go get github.com/gorilla/websocket +go get github.com/vmihailenco/msgpack/v5 +go build -o test-proxy . 
+
+# Run (control API on port 9100)
+./test-proxy --control-port 9100
+
+# Run tests
+go test ./... -v
+```
+
+## Integration with Dart test runner
+
+The Dart test harness will:
+
+1. Spawn the proxy process: `Process.start('test-proxy', ['--control-port', '9100'])`
+2. Wait for `GET http://localhost:9100/health` to return 200
+3. Maintain a port pool (e.g., 10000–11023)
+4. For each test (or test group):
+   a. Allocate a port from the pool
+   b. Create a session: `POST http://localhost:9100/sessions` with `{"port": 10042, "target": {...}, "rules": [...]}`
+   c. If 409 (port conflict), try another port
+   d. Configure the SDK:
+      ```dart
+      ClientOptions(
+        realtimeHost: 'localhost:10042',
+        restHost: 'localhost:10042',
+        tls: false,
+        key: sandboxKey,
+      )
+      ```
+   e. Run the test
+   f. Delete the session: `DELETE http://localhost:9100/sessions/{id}`
+   g. Return port to pool
+5. After all tests, kill the proxy process
+
+## Port pool design (Dart side)
+
+```dart
+class PortPool {
+  final Set<int> _available;
+  final Set<int> _inUse = {};
+
+  PortPool({int start = 10000, int count = 1024})
+      : _available = Set<int>.from(List.generate(count, (i) => start + i));
+
+  int allocate() {
+    if (_available.isEmpty) throw StateError('No ports available');
+    final port = _available.first;
+    _available.remove(port);
+    _inUse.add(port);
+    return port;
+  }
+
+  void release(int port) {
+    _inUse.remove(port);
+    _available.add(port);
+  }
+}
+```
diff --git a/uts/proxy/PROPOSAL.md b/uts/proxy/PROPOSAL.md
new file mode 100644
index 000000000..29f49ba17
--- /dev/null
+++ b/uts/proxy/PROPOSAL.md
@@ -0,0 +1,461 @@
+# Ably Test Proxy — Proposal
+
+## Overview
+
+A programmable HTTP/WebSocket proxy that sits between an Ably SDK under test and the real Ably sandbox backend. The proxy transparently forwards traffic by default, but can be configured with **rules** to inject faults — dropped connections, modified responses, injected protocol messages, delayed frames, etc. 
+
+This enables **integration tests for fault behaviour** that would otherwise require mocking. The proxy gives tests the realism of talking to the actual Ably sandbox while retaining the ability to simulate network and protocol faults.
+
+## Motivation
+
+The existing UTS unit tests use mock HTTP/WebSocket clients to test fault handling (connection failures, token expiry, heartbeat starvation, channel errors, etc.). These are valuable but have limitations:
+
+- They test against synthetic responses, not the real server protocol
+- They cannot verify that resume actually works end-to-end with a real server
+- They require the test to script every server response, including the "happy path" ones
+
+A proxy-based approach lets tests rely on the real sandbox for normal behaviour and only inject specific faults. This increases confidence that the SDK handles real-world failure modes correctly.
+
+## Architecture
+
+```
+                 ┌──────────────────────────────────────┐
+                 │  Ably Test Proxy (single process)    │
+                 │                                      │
+┌──────────┐     │   ┌──────────────────────┐           │      ┌────────────────┐
+│ SDK      │───WS───▶│ :10042 (session 1)   │───wss───────────▶│ Ably Sandbox   │
+│ under    │───HTTP─▶│ :10043 (session 2)   │───https─────────▶│ (real backend) │
+│ test     │◀───────▶│ ...                  │◀────────────────▶│                │
+└──────────┘     │   └──────────────────────┘           │      └────────────────┘
+                 │                                      │
+                 │   ┌────────────────────┐             │
+                 │   │ :9100 control API  │             │
+                 │   └────────────────────┘             │
+                 └──────────────────────────────────────┘
+                                  ▲
+                                  │ HTTP control API
+                      ┌───────────┴──────────┐
+                      │ Test process         │
+                      │ (creates sessions,   │
+                      │  assigns ports,      │
+                      │  adds rules,         │
+                      │  triggers actions)   │
+                      └──────────────────────┘
+```
+
+- **Single proxy process** serves multiple concurrent test sessions
+- **Control API** (HTTP on a dedicated port, e.g. `:9100`) manages sessions and rules
+- **Per-session ports** (assigned by the test process from a port pool) handle proxied WS and HTTP traffic. 
Each session binds its own TCP listener so the SDK can connect with standard URL paths. +- **No TLS between client and proxy.** The proxy serves plain HTTP/WS to the SDK. Upstream connections to the Ably sandbox use TLS (`wss://`, `https://`). +- **Default behaviour** is transparent passthrough to the real Ably sandbox +- **Protocol-aware for both JSON and msgpack.** The proxy decodes frames in both formats for rule matching. Raw bytes are forwarded unchanged (no re-encoding). + +## Control API + +Base URL: `http://localhost:{CONTROL_PORT}` + +### Create session + +The test process assigns a port from its port pool and passes it in the request. The proxy binds that port immediately — if the bind fails, the request fails with 409. + +``` +POST /sessions +Content-Type: application/json + +{ + "target": { + "realtimeHost": "sandbox-realtime.ably.io", + "restHost": "sandbox-rest.ably.io" + }, + "rules": [ ...rules... ], + "timeoutMs": 30000, + "port": 10042 +} + +Response 201: +{ + "sessionId": "abc123", + "proxy": { + "host": "localhost:10042", + "port": 10042 + } +} + +Response 409 (port unavailable): +{ + "error": "failed to bind port 10042: address already in use" +} +``` + +The SDK under test connects to the proxy port with standard URLs: +- WebSocket: `ws://localhost:10042/?key=...&heartbeats=true` +- REST: `http://localhost:10042/channels/test/messages` + +### Add rules dynamically + +``` +POST /sessions/{sessionId}/rules +Content-Type: application/json + +{ + "rules": [ ...additional rules... ], + "position": "append" // or "prepend" +} + +Response 200: +{ + "ruleCount": 5 +} +``` + +### Trigger an imperative action + +For cases where timed rules are awkward (e.g., "drop the connection NOW"): + +``` +POST /sessions/{sessionId}/actions +Content-Type: application/json + +{ "type": "disconnect" } + +Response 200: +{ "ok": true } +``` + +### Get captured traffic log + +``` +GET /sessions/{sessionId}/log +Response 200: +{ + "events": [ ...see event format below... 
] +} +``` + +### Teardown session + +``` +DELETE /sessions/{sessionId} +Response 200: +{ + "events": [ ...final captured traffic log... ] +} +``` + +Teardown closes all active connections, stops the per-session listener, and frees the port. + +### Health check + +``` +GET /health +Response 200: { "ok": true } +``` + +## Rule format + +Each rule has a **match** condition, an **action** to perform, and an optional **times** limit: + +```jsonc +{ + "match": { ... }, + "action": { ... }, + "times": 1, // optional: remove rule after N matches (default: unlimited) + "comment": "..." // optional: for readability +} +``` + +Rules are evaluated in order. The first matching rule wins. Unmatched traffic is passed through unchanged. + +### Match conditions + +#### WebSocket connection attempt + +```jsonc +{ "type": "ws_connect" } +{ "type": "ws_connect", "count": 2 } // only the 2nd connection attempt +{ "type": "ws_connect", "queryContains": { "resume": "*" } } // has resume param +``` + +#### WebSocket frame from client → server + +```jsonc +{ "type": "ws_frame_to_server" } +{ "type": "ws_frame_to_server", "action": "ATTACH" } +{ "type": "ws_frame_to_server", "action": "ATTACH", "channel": "my-channel" } +{ "type": "ws_frame_to_server", "action": "MESSAGE" } +``` + +#### WebSocket frame from server → client + +```jsonc +{ "type": "ws_frame_to_client" } +{ "type": "ws_frame_to_client", "action": "CONNECTED" } +{ "type": "ws_frame_to_client", "action": "ATTACHED", "channel": "my-channel" } +{ "type": "ws_frame_to_client", "action": "HEARTBEAT" } +``` + +#### HTTP request + +```jsonc +{ "type": "http_request" } +{ "type": "http_request", "method": "POST" } +{ "type": "http_request", "pathContains": "/channels/" } +{ "type": "http_request", "pathContains": "/keys/" } +``` + +#### Temporal trigger + +```jsonc +{ "type": "delay_after_ws_connect", "delayMs": 5000 } +``` + +Fires once, `delayMs` after the WebSocket connection is established. 
Used for timed fault injection (e.g., heartbeat starvation, timed disconnection). + +### Actions + +#### Passthrough (default) + +```jsonc +{ "type": "passthrough" } +``` + +Forward unchanged. + +#### Connection-level faults + +```jsonc +// Refuse the WebSocket connection at TCP level +{ "type": "refuse_connection" } + +// Accept WebSocket handshake but immediately close +{ "type": "accept_and_close", "closeCode": 1011 } + +// Disconnect abruptly (no close frame) +{ "type": "disconnect" } + +// Close cleanly with code +{ "type": "close", "closeCode": 1000 } +``` + +#### Frame manipulation + +```jsonc +// Suppress (swallow) the frame — don't forward it +{ "type": "suppress" } + +// Delay before forwarding +{ "type": "delay", "delayMs": 2000 } + +// Inject a frame to the client (as if from server), in addition to the matched frame +{ "type": "inject_to_client", "message": { "action": 6, ... } } + +// Inject a frame to the client then close the WebSocket +{ "type": "inject_to_client_and_close", "message": { "action": 6, ... }, "closeCode": 1000 } + +// Replace the frame with a different one +{ "type": "replace", "message": { "action": 4, ... } } + +// Suppress all subsequent frames in the same direction (for heartbeat starvation) +{ "type": "suppress_onwards" } +``` + +#### HTTP faults + +```jsonc +// Return a specific HTTP response instead of forwarding +{ "type": "http_respond", "status": 503, "body": { ... }, "headers": { ... } } + +// Delay the HTTP response +{ "type": "http_delay", "delayMs": 5000 } + +// Drop the HTTP connection (no response) +{ "type": "http_drop" } + +// Forward but replace the response +{ "type": "http_replace_response", "status": 401, "body": { ... 
} } +``` + +## Event log format + +All traffic through a session is recorded: + +```jsonc +{ + "events": [ + { + "timestamp": "2026-02-23T10:00:00.123Z", + "type": "ws_connect", + "url": "ws://...", + "queryParams": { "key": "...", "heartbeats": "true" } + }, + { + "timestamp": "2026-02-23T10:00:00.200Z", + "type": "ws_frame", + "direction": "server_to_client", + "message": { "action": 4, "connectionId": "...", ... }, + "ruleMatched": null + }, + { + "timestamp": "2026-02-23T10:00:01.500Z", + "type": "ws_frame", + "direction": "client_to_server", + "message": { "action": 15, "channel": "test", ... }, + "ruleMatched": "rule-2" + }, + { + "timestamp": "2026-02-23T10:00:02.000Z", + "type": "ws_disconnect", + "initiator": "proxy", + "closeCode": 1006 + }, + { + "timestamp": "2026-02-23T10:00:02.100Z", + "type": "http_request", + "direction": "client_to_server", + "method": "GET", + "path": "/channels/test/messages", + "headers": { ... } + }, + { + "timestamp": "2026-02-23T10:00:02.300Z", + "type": "http_response", + "direction": "server_to_client", + "status": 200, + "ruleMatched": null + } + ] +} +``` + +## Usage patterns + +### Pattern 1: Imperative disconnect (RTN15a equivalent) + +``` +# Create passthrough session on port 10042 +POST /sessions {"port": 10042, "target": SANDBOX} + +# Connect SDK: Realtime(realtimeHost: "localhost:10042", tls: false) +# Wait for CONNECTED + +# Trigger disconnect +POST /sessions/{id}/actions {"type": "disconnect"} + +# SDK reconnects through proxy (passthrough), resumes +# Wait for CONNECTED again + +# Verify from log +GET /sessions/{id}/log +→ expect two ws_connect events +→ expect second ws_connect has queryParams.resume +``` + +### Pattern 2: One-shot connection refusal (RTN14d equivalent) + +```json +{ + "port": 10042, + "target": {"realtimeHost": "sandbox-realtime.ably.io"}, + "rules": [{ + "match": {"type": "ws_connect", "count": 1}, + "action": {"type": "refuse_connection"}, + "times": 1 + }] +} +``` + +First connection attempt 
is refused. SDK retries. Second passes through to sandbox. + +### Pattern 3: Injected DISCONNECTED with token error (RTN15h1 equivalent) + +```json +{ + "port": 10042, + "target": {"realtimeHost": "sandbox-realtime.ably.io"}, + "rules": [{ + "match": {"type": "delay_after_ws_connect", "delayMs": 1000}, + "action": { + "type": "inject_to_client_and_close", + "message": { + "action": 6, + "error": {"code": 40142, "statusCode": 401, "message": "Token expired"} + } + }, + "times": 1 + }] +} +``` + +### Pattern 4: REST 401 for token renewal (RSA4b4 equivalent) + +```json +{ + "port": 10042, + "target": {"restHost": "sandbox-rest.ably.io"}, + "rules": [{ + "match": {"type": "http_request", "pathContains": "/channels/"}, + "action": { + "type": "http_respond", + "status": 401, + "body": {"error": {"code": 40142, "statusCode": 401, "message": "Token expired"}} + }, + "times": 1 + }] +} +``` + +First channel request gets fake 401. Client renews token, retries. Second request passes through to real sandbox. + +### Pattern 5: Heartbeat starvation (RTN23 equivalent) + +```json +{ + "port": 10042, + "target": {"realtimeHost": "sandbox-realtime.ably.io"}, + "rules": [{ + "match": {"type": "delay_after_ws_connect", "delayMs": 2000}, + "action": {"type": "suppress_onwards"}, + "times": 1 + }] +} +``` + +SDK connects, gets CONNECTED from real server. After 2s, proxy starts swallowing all server→client frames. Client heartbeat timer expires. Client disconnects and reconnects. + +### Pattern 6: Channel attach suppression (RTL4f timeout equivalent) + +```json +{ + "port": 10042, + "target": {"realtimeHost": "sandbox-realtime.ably.io"}, + "rules": [{ + "match": {"type": "ws_frame_to_server", "action": "ATTACH", "channel": "test"}, + "action": {"type": "suppress"}, + "times": 1 + }] +} +``` + +Client sends ATTACH, proxy swallows it. Server never sees it, never responds. Client's attach timeout fires. 
+ +## Scope and non-goals + +### In scope + +- WebSocket proxying with Ably protocol message awareness (JSON and msgpack) +- HTTP proxying for REST API calls +- Rule-based fault injection (connection, frame, and HTTP levels) +- Imperative actions (disconnect, close) +- Traffic capture and logging +- Concurrent sessions on separate ports for parallel tests + +### Not in scope + +- Fake timers / time advancement (integration tests use real time with short configured timeouts) +- Mock authUrl server (tests can spin up their own if needed) +- TLS between client and proxy (proxy serves plain HTTP/WS; TLS is used only upstream to sandbox) +- Modifying the SDK's internal state + +## Implementation + +The proxy is implemented in Go. See `IMPLEMENTATION.md` for the implementation plan. diff --git a/uts/proxy/action.go b/uts/proxy/action.go new file mode 100644 index 000000000..d66972bda --- /dev/null +++ b/uts/proxy/action.go @@ -0,0 +1,129 @@ +package main + +import ( + "encoding/json" + "fmt" + + "github.com/gorilla/websocket" +) + +// ExecuteImperativeAction executes an immediate action on the session's active WS connection(s). 
+func ExecuteImperativeAction(session *Session, req ActionRequest) error {
+	session.EventLog.Append(Event{
+		Type:      "action",
+		Initiator: "proxy",
+		Message:   mustMarshal(req),
+	})
+
+	switch req.Type {
+	case "disconnect":
+		return imperativeDisconnect(session)
+	case "close":
+		return imperativeClose(session, req.CloseCode)
+	case "inject_to_client":
+		return imperativeInjectToClient(session, req.Message, false)
+	case "inject_to_client_and_close":
+		return imperativeInjectToClient(session, req.Message, true)
+	default:
+		return fmt.Errorf("unknown action type: %s", req.Type)
+	}
+}
+
+func imperativeDisconnect(session *Session) error {
+	wc := session.GetActiveWsConn()
+	if wc == nil {
+		return fmt.Errorf("no active WebSocket connection")
+	}
+
+	session.EventLog.Append(Event{
+		Type:      "ws_disconnect",
+		Initiator: "proxy",
+	})
+
+	// Abrupt close — close the underlying TCP connections without sending a close frame
+	if wc.ClientConn != nil {
+		wc.ClientConn.UnderlyingConn().Close()
+	}
+	if wc.ServerConn != nil {
+		wc.ServerConn.UnderlyingConn().Close()
+	}
+
+	wc.MarkClosed()
+	return nil
+}
+
+func imperativeClose(session *Session, closeCode int) error {
+	wc := session.GetActiveWsConn()
+	if wc == nil {
+		return fmt.Errorf("no active WebSocket connection")
+	}
+
+	if closeCode <= 0 {
+		closeCode = websocket.CloseNormalClosure
+	}
+
+	session.EventLog.Append(Event{
+		Type:      "ws_disconnect",
+		Initiator: "proxy",
+		CloseCode: closeCode,
+	})
+
+	if wc.ClientConn != nil {
+		msg := websocket.FormatCloseMessage(closeCode, "")
+		wc.ClientConn.WriteMessage(websocket.CloseMessage, msg)
+		wc.ClientConn.UnderlyingConn().Close()
+	}
+	if wc.ServerConn != nil {
+		wc.ServerConn.UnderlyingConn().Close()
+	}
+
+	wc.MarkClosed()
+	return nil
+}
+
+func imperativeInjectToClient(session *Session, message json.RawMessage, andClose bool) error {
+	wc := session.GetActiveWsConn()
+	if wc == nil {
+		return fmt.Errorf("no active WebSocket connection")
+	}
+
+	if wc.ClientConn != nil {
+		if err := wc.ClientConn.WriteMessage(websocket.TextMessage, message); err != nil {
+			return fmt.Errorf("failed to inject message: %w", err)
+		}
+
+		session.EventLog.Append(Event{
+			Type:      "ws_frame",
+			Direction: "server_to_client",
+			Message:   message,
+			Initiator: "proxy",
+		})
+	}
+
+	if andClose {
+		if wc.ClientConn != nil {
+			msg := websocket.FormatCloseMessage(websocket.CloseNormalClosure, "")
+			wc.ClientConn.WriteMessage(websocket.CloseMessage, msg)
+			wc.ClientConn.UnderlyingConn().Close()
+		}
+		if wc.ServerConn != nil {
+			wc.ServerConn.UnderlyingConn().Close()
+		}
+		wc.MarkClosed()
+
+		session.EventLog.Append(Event{
+			Type:      "ws_disconnect",
+			Initiator: "proxy",
+		})
+	}
+
+	return nil
+}
+
+func mustMarshal(v interface{}) json.RawMessage {
+	b, err := json.Marshal(v)
+	if err != nil {
+		return json.RawMessage(`{}`)
+	}
+	return b
+}
diff --git a/uts/proxy/go.mod b/uts/proxy/go.mod
new file mode 100644
index 000000000..536c3d0e3
--- /dev/null
+++ b/uts/proxy/go.mod
@@ -0,0 +1,9 @@
+module ably.io/test-proxy
+
+go 1.22.3
+
+require (
+	github.com/gorilla/websocket v1.5.3
+	github.com/vmihailenco/msgpack/v5 v5.4.1
+	github.com/vmihailenco/tagparser/v2 v2.0.0 // indirect
+)
diff --git a/uts/proxy/go.sum b/uts/proxy/go.sum
new file mode 100644
index 000000000..ec57b9bf5
--- /dev/null
+++ b/uts/proxy/go.sum
@@ -0,0 +1,6 @@
+github.com/gorilla/websocket v1.5.3 h1:saDtZ6Pbx/0u+bgYQ3q96pZgCzfhKXGPqt7kZ72aNNg=
+github.com/gorilla/websocket v1.5.3/go.mod h1:YR8l580nyteQvAITg2hZ9XVh4b55+EU/adAjf1fMHhE=
+github.com/vmihailenco/msgpack/v5 v5.4.1 h1:cQriyiUvjTwOHg8QZaPihLWeRAAVoCpE00IUPn0Bjt8=
+github.com/vmihailenco/msgpack/v5 v5.4.1/go.mod h1:GaZTsDaehaPpQVyxrf5mtQlH+pc21PIudVV/E3rRQok=
+github.com/vmihailenco/tagparser/v2 v2.0.0 h1:y09buUbR+b5aycVFQs/g70pqKVZNBmxwAhO7/IwNM9g=
+github.com/vmihailenco/tagparser/v2 v2.0.0/go.mod 
h1:Wri+At7QHww0WTrCBeu4J6bNtoV6mEfg5OIWRZA9qds= diff --git a/uts/proxy/http_proxy.go b/uts/proxy/http_proxy.go new file mode 100644 index 000000000..f85db1c16 --- /dev/null +++ b/uts/proxy/http_proxy.go @@ -0,0 +1,205 @@ +package main + +import ( + "crypto/tls" + "encoding/json" + "fmt" + "io" + "log" + "net" + "net/http" + "net/http/httputil" + "net/url" + "time" +) + +// HandleHttpProxy handles an HTTP request from the SDK client, +// proxying it to the upstream Ably REST host. +func HandleHttpProxy(session *Session, w http.ResponseWriter, r *http.Request) { + reqCount := session.IncrementHttpReqCount() + _ = reqCount + + // Log request headers (subset for readability) + headers := make(map[string]string) + for _, key := range []string{"Authorization", "Content-Type", "Accept", "X-Ably-Version", "X-Ably-Lib"} { + if v := r.Header.Get(key); v != "" { + headers[key] = v + } + } + + // Log http_request event + session.EventLog.Append(Event{ + Type: "http_request", + Direction: "client_to_server", + Method: r.Method, + Path: r.URL.Path, + Headers: headers, + }) + + // Build match event + matchEvent := MatchEvent{ + Type: "http_request", + Action: -1, + Method: r.Method, + Path: r.URL.Path, + } + + rule, ruleIdx := session.FindMatchingRule(matchEvent) + + if rule != nil { + session.FireRule(rule) + + switch rule.Action.Type { + case "http_respond": + ruleLabel := LogRuleMatch(rule, ruleIdx) + respondWithRule(w, session, rule, ruleLabel) + return + + case "http_delay": + time.Sleep(time.Duration(rule.Action.DelayMs) * time.Millisecond) + // Fall through to proxy + + case "http_drop": + ruleLabel := LogRuleMatch(rule, ruleIdx) + session.EventLog.Append(Event{ + Type: "http_response", + Direction: "server_to_client", + Status: 0, + RuleMatched: ruleLabel, + }) + // Hijack the connection and close it without responding + hj, ok := w.(http.Hijacker) + if ok { + conn, _, err := hj.Hijack() + if err == nil { + conn.Close() + } + } + return + + case "http_replace_response": + 
// Forward to upstream, discard response, return specified response + proxyToUpstreamAndDiscard(session, r) + ruleLabel := LogRuleMatch(rule, ruleIdx) + respondWithRule(w, session, rule, ruleLabel) + return + + case "passthrough": + // Fall through to proxy + } + } + + // Proxy to upstream + if session.Target.RestHost == "" { + writeError(w, http.StatusBadGateway, "no REST host configured") + return + } + + scheme := "https" + if session.Target.Insecure { + scheme = "http" + } + upstreamURL := &url.URL{ + Scheme: scheme, + Host: session.Target.RestHost, + } + + transport := &http.Transport{ + DialContext: (&net.Dialer{ + Timeout: 10 * time.Second, + }).DialContext, + } + if !session.Target.Insecure { + transport.TLSClientConfig = &tls.Config{} + } + + proxy := &httputil.ReverseProxy{ + Director: func(req *http.Request) { + req.URL.Scheme = upstreamURL.Scheme + req.URL.Host = upstreamURL.Host + req.Host = upstreamURL.Host + }, + Transport: transport, + ModifyResponse: func(resp *http.Response) error { + ruleLabel := LogRuleMatch(rule, ruleIdx) + session.EventLog.Append(Event{ + Type: "http_response", + Direction: "server_to_client", + Status: resp.StatusCode, + RuleMatched: ruleLabel, + }) + return nil + }, + ErrorHandler: func(w http.ResponseWriter, r *http.Request, err error) { + log.Printf("session %s: HTTP proxy error: %v", session.ID, err) + writeError(w, http.StatusBadGateway, fmt.Sprintf("upstream error: %v", err)) + }, + } + + proxy.ServeHTTP(w, r) +} + +func respondWithRule(w http.ResponseWriter, session *Session, rule *Rule, ruleLabel *string) { + status := rule.Action.Status + if status <= 0 { + status = 200 + } + + // Set headers + for k, v := range rule.Action.Headers { + w.Header().Set(k, v) + } + + // Default content type + if w.Header().Get("Content-Type") == "" { + w.Header().Set("Content-Type", "application/json") + } + + w.WriteHeader(status) + + if len(rule.Action.Body) > 0 { + w.Write(rule.Action.Body) + } + + session.EventLog.Append(Event{ + 
Type: "http_response", + Direction: "server_to_client", + Status: status, + RuleMatched: ruleLabel, + }) +} + +func proxyToUpstreamAndDiscard(session *Session, r *http.Request) { + if session.Target.RestHost == "" { + return + } + + scheme := "https" + if session.Target.Insecure { + scheme = "http" + } + upstreamURL := fmt.Sprintf("%s://%s%s", scheme, session.Target.RestHost, r.URL.RequestURI()) + req, err := http.NewRequest(r.Method, upstreamURL, r.Body) + if err != nil { + return + } + req.Header = r.Header.Clone() + + client := &http.Client{Timeout: 10 * time.Second} + if !session.Target.Insecure { + client.Transport = &http.Transport{TLSClientConfig: &tls.Config{}} + } + resp, err := client.Do(req) + if err != nil { + return + } + io.ReadAll(resp.Body) + resp.Body.Close() +} + +// WriteJSONResponse writes a JSON response body with the given status and data. +func WriteJSONResponse(w http.ResponseWriter, status int, data interface{}) { + w.Header().Set("Content-Type", "application/json") + w.WriteHeader(status) + json.NewEncoder(w).Encode(data) +} diff --git a/uts/proxy/listener.go b/uts/proxy/listener.go new file mode 100644 index 000000000..fca7f843e --- /dev/null +++ b/uts/proxy/listener.go @@ -0,0 +1,71 @@ +package main + +import ( + "context" + "fmt" + "log" + "net" + "net/http" +) + +// StartSessionListener binds the given port and starts an HTTP server +// that routes WebSocket upgrades to WsProxyHandler and other HTTP requests +// to HttpProxyHandler. Returns an error if the port cannot be bound. 
+func StartSessionListener(session *Session, port int) error { + addr := fmt.Sprintf(":%d", port) + listener, err := net.Listen("tcp", addr) + if err != nil { + return fmt.Errorf("failed to bind port %d: %w", port, err) + } + + session.mu.Lock() + session.listener = listener + session.mu.Unlock() + + mux := http.NewServeMux() + mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) { + // Check if this is a WebSocket upgrade request + if isWebSocketUpgrade(r) { + HandleWsProxy(session, w, r) + return + } + // Otherwise treat as HTTP proxy + HandleHttpProxy(session, w, r) + }) + + server := &http.Server{Handler: mux} + + go func() { + if err := server.Serve(listener); err != nil && err != http.ErrServerClosed { + log.Printf("session %s listener on port %d closed: %v", session.ID, port, err) + } + }() + + // Store server reference so we can shut it down later + session.mu.Lock() + session.Server = server + session.mu.Unlock() + + return nil +} + +// StopSessionListener gracefully shuts down the per-session HTTP server and closes the listener. +func StopSessionListener(session *Session) { + session.mu.Lock() + server := session.Server + session.Server = nil + session.mu.Unlock() + + if server != nil { + server.Shutdown(context.Background()) + } +} + +func isWebSocketUpgrade(r *http.Request) bool { + for _, v := range r.Header["Upgrade"] { + if v == "websocket" { + return true + } + } + return false +} diff --git a/uts/proxy/log.go b/uts/proxy/log.go new file mode 100644 index 000000000..111cb2064 --- /dev/null +++ b/uts/proxy/log.go @@ -0,0 +1,54 @@ +package main + +import ( + "encoding/json" + "sync" + "time" +) + +// Event represents a single logged event in a session's traffic log. 
+type Event struct { + Timestamp time.Time `json:"timestamp"` + Type string `json:"type"` // ws_connect, ws_frame, ws_disconnect, http_request, http_response, action + Direction string `json:"direction,omitempty"` // client_to_server, server_to_client + URL string `json:"url,omitempty"` + QueryParams map[string]string `json:"queryParams,omitempty"` + Message json.RawMessage `json:"message,omitempty"` + Method string `json:"method,omitempty"` + Path string `json:"path,omitempty"` + Status int `json:"status,omitempty"` + Initiator string `json:"initiator,omitempty"` // client, server, proxy + CloseCode int `json:"closeCode,omitempty"` + RuleMatched *string `json:"ruleMatched"` + Headers map[string]string `json:"headers,omitempty"` +} + +// EventLog is an append-only, thread-safe event log. +type EventLog struct { + events []Event + mu sync.Mutex +} + +// NewEventLog creates a new empty event log. +func NewEventLog() *EventLog { + return &EventLog{} +} + +// Append adds an event to the log. +func (l *EventLog) Append(event Event) { + l.mu.Lock() + defer l.mu.Unlock() + if event.Timestamp.IsZero() { + event.Timestamp = time.Now().UTC() + } + l.events = append(l.events, event) +} + +// Events returns a copy of all events. 
+func (l *EventLog) Events() []Event { + l.mu.Lock() + defer l.mu.Unlock() + out := make([]Event, len(l.events)) + copy(out, l.events) + return out +} diff --git a/uts/proxy/main.go b/uts/proxy/main.go new file mode 100644 index 000000000..9899bbd6c --- /dev/null +++ b/uts/proxy/main.go @@ -0,0 +1,47 @@ +package main + +import ( + "flag" + "fmt" + "log" + "net/http" + "os" + "os/signal" + "syscall" +) + +func main() { + port := flag.Int("port", 9100, "control API port") + flag.Parse() + + store := NewSessionStore() + server := NewServer(store) + + addr := fmt.Sprintf(":%d", *port) + httpServer := &http.Server{ + Addr: addr, + Handler: server, + } + + // Graceful shutdown on signal + sigCh := make(chan os.Signal, 1) + signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM) + + go func() { + <-sigCh + log.Println("shutting down...") + + // Clean up all sessions + for _, session := range store.All() { + StopSessionListener(session) + session.Close() + } + + httpServer.Close() + }() + + log.Printf("Ably test proxy control API listening on %s", addr) + if err := httpServer.ListenAndServe(); err != nil && err != http.ErrServerClosed { + log.Fatalf("failed to start server: %v", err) + } +} diff --git a/uts/proxy/protocol.go b/uts/proxy/protocol.go new file mode 100644 index 000000000..24eaac388 --- /dev/null +++ b/uts/proxy/protocol.go @@ -0,0 +1,234 @@ +package main + +import ( + "encoding/json" + "fmt" + "strconv" + "strings" + + "github.com/gorilla/websocket" + "github.com/vmihailenco/msgpack/v5" +) + +// Ably protocol message action constants. 
+const ( + ActionHeartbeat = 0 + ActionAck = 1 + ActionNack = 2 + ActionConnect = 3 + ActionConnected = 4 + ActionDisconnect = 5 + ActionDisconnected = 6 + ActionClose = 7 + ActionClosed = 8 + ActionError = 9 + ActionAttach = 10 + ActionAttached = 11 + ActionDetach = 12 + ActionDetached = 13 + ActionPresence = 14 + ActionMessage = 15 + ActionSync = 16 + ActionAuth = 17 +) + +var actionNames = map[string]int{ + "HEARTBEAT": ActionHeartbeat, + "ACK": ActionAck, + "NACK": ActionNack, + "CONNECT": ActionConnect, + "CONNECTED": ActionConnected, + "DISCONNECT": ActionDisconnect, + "DISCONNECTED": ActionDisconnected, + "CLOSE": ActionClose, + "CLOSED": ActionClosed, + "ERROR": ActionError, + "ATTACH": ActionAttach, + "ATTACHED": ActionAttached, + "DETACH": ActionDetach, + "DETACHED": ActionDetached, + "PRESENCE": ActionPresence, + "MESSAGE": ActionMessage, + "SYNC": ActionSync, + "AUTH": ActionAuth, +} + +var actionNumbers = map[int]string{} + +func init() { + for name, num := range actionNames { + actionNumbers[num] = name + } +} + +// ActionFromString converts an action name (e.g. "ATTACH") or numeric string (e.g. "10") to an int. +// Returns -1 if the string is not recognized. +func ActionFromString(s string) int { + // Try as name first + if n, ok := actionNames[strings.ToUpper(s)]; ok { + return n + } + // Try as number + if n, err := strconv.Atoi(s); err == nil { + return n + } + return -1 +} + +// ActionName returns the name for an action number, or the number as a string. +func ActionName(action int) string { + if name, ok := actionNumbers[action]; ok { + return name + } + return strconv.Itoa(action) +} + +// ProtocolMessage is a minimal representation of an Ably protocol message, +// containing only the fields needed for rule matching. +type ProtocolMessage struct { + Action int + Channel string + Error *ErrorInfo +} + +// ErrorInfo is a minimal representation of an Ably error. 
+type ErrorInfo struct { + Code int `json:"code"` + StatusCode int `json:"statusCode"` + Message string `json:"message"` +} + +// ParseProtocolMessage decodes a WebSocket frame into a ProtocolMessage. +// For text frames (JSON) and binary frames (msgpack). +// Returns the parsed message. On failure, returns a message with Action=-1. +func ParseProtocolMessage(data []byte, messageType int) ProtocolMessage { + if messageType == websocket.TextMessage { + return parseJSON(data) + } + if messageType == websocket.BinaryMessage { + return parseMsgpack(data) + } + return ProtocolMessage{Action: -1} +} + +func parseJSON(data []byte) ProtocolMessage { + var raw map[string]json.RawMessage + if err := json.Unmarshal(data, &raw); err != nil { + return ProtocolMessage{Action: -1} + } + + pm := ProtocolMessage{Action: -1} + + if actionRaw, ok := raw["action"]; ok { + // Action can be int or string + var actionInt int + if err := json.Unmarshal(actionRaw, &actionInt); err == nil { + pm.Action = actionInt + } else { + var actionStr string + if err := json.Unmarshal(actionRaw, &actionStr); err == nil { + pm.Action = ActionFromString(actionStr) + } + } + } + + if channelRaw, ok := raw["channel"]; ok { + json.Unmarshal(channelRaw, &pm.Channel) + } + + if errorRaw, ok := raw["error"]; ok { + var ei ErrorInfo + if err := json.Unmarshal(errorRaw, &ei); err == nil { + pm.Error = &ei + } + } + + return pm +} + +func parseMsgpack(data []byte) ProtocolMessage { + // Ably msgpack can be either a map or an array. + // Try map first (the common wire format). + var rawMap map[string]interface{} + if err := msgpack.Unmarshal(data, &rawMap); err == nil { + return parseMsgpackMap(rawMap) + } + + // Fall back to array format. 
+ var rawArray []interface{} + if err := msgpack.Unmarshal(data, &rawArray); err == nil { + return parseMsgpackArray(rawArray) + } + + return ProtocolMessage{Action: -1} +} + +func parseMsgpackMap(m map[string]interface{}) ProtocolMessage { + pm := ProtocolMessage{Action: -1} + + if action, ok := m["action"]; ok { + pm.Action = toInt(action) + } + if channel, ok := m["channel"]; ok { + if s, ok := channel.(string); ok { + pm.Channel = s + } + } + if errObj, ok := m["error"]; ok { + if errMap, ok := errObj.(map[string]interface{}); ok { + pm.Error = &ErrorInfo{ + Code: toInt(errMap["code"]), + StatusCode: toInt(errMap["statusCode"]), + Message: fmt.Sprintf("%v", errMap["message"]), + } + } + } + + return pm +} + +func parseMsgpackArray(a []interface{}) ProtocolMessage { + pm := ProtocolMessage{Action: -1} + + if len(a) > 0 { + pm.Action = toInt(a[0]) + } + if len(a) > 1 { + if s, ok := a[1].(string); ok { + pm.Channel = s + } + } + + return pm +} + +func toInt(v interface{}) int { + switch n := v.(type) { + case int: + return n + case int8: + return int(n) + case int16: + return int(n) + case int32: + return int(n) + case int64: + return int(n) + case uint: + return int(n) + case uint8: + return int(n) + case uint16: + return int(n) + case uint32: + return int(n) + case uint64: + return int(n) + case float32: + return int(n) + case float64: + return int(n) + default: + return -1 + } +} diff --git a/uts/proxy/proxy_test.go b/uts/proxy/proxy_test.go new file mode 100644 index 000000000..385dd44a6 --- /dev/null +++ b/uts/proxy/proxy_test.go @@ -0,0 +1,1118 @@ +package main + +import ( + "bytes" + "encoding/json" + "fmt" + "io" + "net" + "net/http" + "net/http/httptest" + "strings" + "sync" + "testing" + "time" + + "github.com/gorilla/websocket" + "github.com/vmihailenco/msgpack/v5" +) + +// -- Test helpers -- + +// startControlServer starts the control API server on a random port, returns its URL and cleanup func. 
+func startControlServer(t *testing.T) (string, *SessionStore, func()) { + t.Helper() + store := NewSessionStore() + server := NewServer(store) + ts := httptest.NewServer(server) + return ts.URL, store, ts.Close +} + +// freePort returns an available TCP port. +func freePort(t *testing.T) int { + t.Helper() + l, err := net.Listen("tcp", ":0") + if err != nil { + t.Fatalf("failed to get free port: %v", err) + } + port := l.Addr().(*net.TCPAddr).Port + l.Close() + return port +} + +// createSession creates a session via the control API, returns the response. +func createSession(t *testing.T, controlURL string, req CreateSessionRequest) CreateSessionResponse { + t.Helper() + body, _ := json.Marshal(req) + resp, err := http.Post(controlURL+"/sessions", "application/json", bytes.NewReader(body)) + if err != nil { + t.Fatalf("failed to create session: %v", err) + } + defer resp.Body.Close() + + if resp.StatusCode != http.StatusCreated { + b, _ := io.ReadAll(resp.Body) + t.Fatalf("create session returned %d: %s", resp.StatusCode, string(b)) + } + + var result CreateSessionResponse + json.NewDecoder(resp.Body).Decode(&result) + return result +} + +// deleteSession deletes a session via the control API. +func deleteSession(t *testing.T, controlURL, sessionID string) map[string]interface{} { + t.Helper() + req, _ := http.NewRequest("DELETE", controlURL+"/sessions/"+sessionID, nil) + resp, err := http.DefaultClient.Do(req) + if err != nil { + t.Fatalf("failed to delete session: %v", err) + } + defer resp.Body.Close() + var result map[string]interface{} + json.NewDecoder(resp.Body).Decode(&result) + return result +} + +// getLog fetches the event log for a session. 
+func getLog(t *testing.T, controlURL, sessionID string) []Event { + t.Helper() + resp, err := http.Get(controlURL + "/sessions/" + sessionID + "/log") + if err != nil { + t.Fatalf("failed to get log: %v", err) + } + defer resp.Body.Close() + var result struct { + Events []Event `json:"events"` + } + json.NewDecoder(resp.Body).Decode(&result) + return result.Events +} + +// triggerAction sends an imperative action. +func triggerAction(t *testing.T, controlURL, sessionID string, action ActionRequest) { + t.Helper() + body, _ := json.Marshal(action) + resp, err := http.Post(controlURL+"/sessions/"+sessionID+"/actions", "application/json", bytes.NewReader(body)) + if err != nil { + t.Fatalf("failed to trigger action: %v", err) + } + defer resp.Body.Close() + if resp.StatusCode != http.StatusOK { + b, _ := io.ReadAll(resp.Body) + t.Fatalf("trigger action returned %d: %s", resp.StatusCode, string(b)) + } +} + +// addRules adds rules to a session dynamically. +func addRules(t *testing.T, controlURL, sessionID string, rules []json.RawMessage, position string) { + t.Helper() + rulesJSON, _ := json.Marshal(rules) + reqBody, _ := json.Marshal(map[string]interface{}{ + "rules": json.RawMessage(rulesJSON), + "position": position, + }) + resp, err := http.Post(controlURL+"/sessions/"+sessionID+"/rules", "application/json", bytes.NewReader(reqBody)) + if err != nil { + t.Fatalf("failed to add rules: %v", err) + } + defer resp.Body.Close() + if resp.StatusCode != http.StatusOK { + b, _ := io.ReadAll(resp.Body) + t.Fatalf("add rules returned %d: %s", resp.StatusCode, string(b)) + } +} + +// startMockWsServer starts a simple WS server that sends a CONNECTED message then echoes frames. 
+func startMockWsServer(t *testing.T) (string, func()) { + t.Helper() + upgrader := websocket.Upgrader{CheckOrigin: func(r *http.Request) bool { return true }} + + handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + conn, err := upgrader.Upgrade(w, r, nil) + if err != nil { + return + } + defer conn.Close() + + // Send CONNECTED + connected := map[string]interface{}{ + "action": ActionConnected, + "connectionId": "mock-conn-1", + "connectionDetails": map[string]interface{}{ + "connectionKey": "mock-key-1", + "maxIdleInterval": 15000, + "connectionStateTtl": 120000, + }, + } + connJSON, _ := json.Marshal(connected) + conn.WriteMessage(websocket.TextMessage, connJSON) + + // Echo loop + for { + msgType, data, err := conn.ReadMessage() + if err != nil { + return + } + conn.WriteMessage(msgType, data) + } + }) + + server := httptest.NewServer(handler) + + // Convert http URL to ws host + host := strings.TrimPrefix(server.URL, "http://") + return host, server.Close +} + +// startMockHttpServer starts a simple HTTP server that returns 200 with a JSON body. +func startMockHttpServer(t *testing.T) (string, func()) { + t.Helper() + handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.Header().Set("Content-Type", "application/json") + w.WriteHeader(http.StatusOK) + json.NewEncoder(w).Encode(map[string]string{"status": "ok", "path": r.URL.Path}) + }) + server := httptest.NewServer(handler) + host := strings.TrimPrefix(server.URL, "http://") + return host, server.Close +} + +// connectWs connects a WebSocket client to the given host. +func connectWs(t *testing.T, host string, queryParams ...string) *websocket.Conn { + t.Helper() + u := fmt.Sprintf("ws://%s/", host) + if len(queryParams) > 0 { + u += "?" 
+ strings.Join(queryParams, "&") + } + conn, _, err := websocket.DefaultDialer.Dial(u, nil) + if err != nil { + t.Fatalf("failed to connect WS to %s: %v", host, err) + } + return conn +} + +// readWsMessage reads a text message from a WS connection with timeout. +func readWsMessage(t *testing.T, conn *websocket.Conn, timeout time.Duration) map[string]interface{} { + t.Helper() + conn.SetReadDeadline(time.Now().Add(timeout)) + _, data, err := conn.ReadMessage() + if err != nil { + t.Fatalf("failed to read WS message: %v", err) + } + conn.SetReadDeadline(time.Time{}) + var msg map[string]interface{} + json.Unmarshal(data, &msg) + return msg +} + +// -- Tests -- + +func TestHealthCheck(t *testing.T) { + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + resp, err := http.Get(controlURL + "/health") + if err != nil { + t.Fatalf("health check failed: %v", err) + } + defer resp.Body.Close() + + if resp.StatusCode != 200 { + t.Fatalf("expected 200, got %d", resp.StatusCode) + } + + var result map[string]bool + json.NewDecoder(resp.Body).Decode(&result) + if !result["ok"] { + t.Fatal("expected ok: true") + } +} + +func TestSessionLifecycle(t *testing.T) { + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + port := freePort(t) + session := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RealtimeHost: "localhost:1234"}, + Port: port, + }) + + if session.SessionID == "" { + t.Fatal("expected session ID") + } + if session.Proxy.Port != port { + t.Fatalf("expected port %d, got %d", port, session.Proxy.Port) + } + + // Get session + resp, err := http.Get(controlURL + "/sessions/" + session.SessionID) + if err != nil { + t.Fatalf("get session failed: %v", err) + } + resp.Body.Close() + if resp.StatusCode != 200 { + t.Fatalf("expected 200, got %d", resp.StatusCode) + } + + // Delete session + result := deleteSession(t, controlURL, session.SessionID) + if result["events"] == nil { + t.Fatal("expected events in delete 
response") + } + + // Verify port is freed + l, err := net.Listen("tcp", fmt.Sprintf(":%d", port)) + if err != nil { + t.Fatalf("port %d should be free after session delete: %v", port, err) + } + l.Close() +} + +func TestSessionPortConflict(t *testing.T) { + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + port := freePort(t) + + // Create first session on the port + session := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RealtimeHost: "localhost:1234"}, + Port: port, + }) + + // Try to create second session on same port — should fail with 409 + body, _ := json.Marshal(CreateSessionRequest{ + Target: TargetConfig{RealtimeHost: "localhost:1234"}, + Port: port, + }) + resp, err := http.Post(controlURL+"/sessions", "application/json", bytes.NewReader(body)) + if err != nil { + t.Fatalf("failed to make request: %v", err) + } + defer resp.Body.Close() + + if resp.StatusCode != http.StatusConflict { + t.Fatalf("expected 409, got %d", resp.StatusCode) + } + + deleteSession(t, controlURL, session.SessionID) +} + +func TestWsPassthrough(t *testing.T) { + upstreamHost, upstreamCleanup := startMockWsServer(t) + defer upstreamCleanup() + + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + port := freePort(t) + session := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RealtimeHost: upstreamHost, Insecure: true}, + Port: port, + }) + defer deleteSession(t, controlURL, session.SessionID) + + // Connect through proxy + conn := connectWs(t, fmt.Sprintf("localhost:%d", port), "key=test.key:secret") + defer conn.Close() + + // Should receive CONNECTED from mock upstream + msg := readWsMessage(t, conn, 2*time.Second) + action, _ := msg["action"].(float64) + if int(action) != ActionConnected { + t.Fatalf("expected CONNECTED (4), got %v", msg["action"]) + } + + // Send a frame — should be echoed back + testMsg := map[string]interface{}{"action": float64(ActionMessage), "channel": "test"} + 
testJSON, _ := json.Marshal(testMsg) + conn.WriteMessage(websocket.TextMessage, testJSON) + + echo := readWsMessage(t, conn, 2*time.Second) + echoAction, _ := echo["action"].(float64) + if int(echoAction) != ActionMessage { + t.Fatalf("expected echo of MESSAGE, got %v", echo["action"]) + } + + // Check event log + events := getLog(t, controlURL, session.SessionID) + hasConnect := false + frameCount := 0 + for _, e := range events { + if e.Type == "ws_connect" { + hasConnect = true + } + if e.Type == "ws_frame" { + frameCount++ + } + } + if !hasConnect { + t.Fatal("expected ws_connect event in log") + } + if frameCount < 2 { + t.Fatalf("expected at least 2 ws_frame events, got %d", frameCount) + } +} + +func TestHttpPassthrough(t *testing.T) { + upstreamHost, upstreamCleanup := startMockHttpServer(t) + defer upstreamCleanup() + + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + port := freePort(t) + session := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RestHost: upstreamHost, Insecure: true}, + Port: port, + }) + defer deleteSession(t, controlURL, session.SessionID) + + // Make HTTP request through proxy + resp, err := http.Get(fmt.Sprintf("http://localhost:%d/channels/test/messages", port)) + if err != nil { + t.Fatalf("HTTP request failed: %v", err) + } + defer resp.Body.Close() + + if resp.StatusCode != 200 { + t.Fatalf("expected 200, got %d", resp.StatusCode) + } + + var body map[string]string + json.NewDecoder(resp.Body).Decode(&body) + if body["path"] != "/channels/test/messages" { + t.Fatalf("expected path /channels/test/messages, got %s", body["path"]) + } + + // Check event log + events := getLog(t, controlURL, session.SessionID) + hasRequest := false + hasResponse := false + for _, e := range events { + if e.Type == "http_request" { + hasRequest = true + } + if e.Type == "http_response" { + hasResponse = true + } + } + if !hasRequest { + t.Fatal("expected http_request event in log") + } + if !hasResponse 
{ + t.Fatal("expected http_response event in log") + } +} + +func TestWsConnectionRefusal(t *testing.T) { + upstreamHost, upstreamCleanup := startMockWsServer(t) + defer upstreamCleanup() + + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + port := freePort(t) + rulesJSON, _ := json.Marshal([]Rule{{ + Match: MatchConfig{Type: "ws_connect"}, + Action: ActionConfig{Type: "refuse_connection"}, + Times: 1, + }}) + session := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RealtimeHost: upstreamHost, Insecure: true}, + Port: port, + Rules: rulesJSON, + }) + defer deleteSession(t, controlURL, session.SessionID) + + // First connection should be refused + u := fmt.Sprintf("ws://localhost:%d/", port) + _, resp, err := websocket.DefaultDialer.Dial(u, nil) + if err == nil { + t.Fatal("expected connection to be refused") + } + if resp != nil && resp.StatusCode != http.StatusBadGateway { + t.Fatalf("expected 502, got %d", resp.StatusCode) + } + + // Second connection should succeed (rule was one-shot) + conn := connectWs(t, fmt.Sprintf("localhost:%d", port)) + defer conn.Close() + msg := readWsMessage(t, conn, 2*time.Second) + action, _ := msg["action"].(float64) + if int(action) != ActionConnected { + t.Fatalf("expected CONNECTED on second attempt, got %v", msg["action"]) + } +} + +func TestWsFrameSuppression(t *testing.T) { + upstreamHost, upstreamCleanup := startMockWsServer(t) + defer upstreamCleanup() + + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + port := freePort(t) + rulesJSON, _ := json.Marshal([]Rule{{ + Match: MatchConfig{Type: "ws_frame_to_server", Action: "MESSAGE"}, + Action: ActionConfig{Type: "suppress"}, + Times: 1, + }}) + session := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RealtimeHost: upstreamHost, Insecure: true}, + Port: port, + Rules: rulesJSON, + }) + defer deleteSession(t, controlURL, session.SessionID) + + conn := connectWs(t, 
fmt.Sprintf("localhost:%d", port)) + defer conn.Close() + + // Read CONNECTED + readWsMessage(t, conn, 2*time.Second) + + // Send MESSAGE — should be suppressed (not echoed) + msg1 := map[string]interface{}{"action": float64(ActionMessage), "channel": "test"} + msg1JSON, _ := json.Marshal(msg1) + conn.WriteMessage(websocket.TextMessage, msg1JSON) + + // Send another MESSAGE — should pass through (rule was one-shot) + msg2 := map[string]interface{}{"action": float64(ActionMessage), "channel": "test2"} + msg2JSON, _ := json.Marshal(msg2) + conn.WriteMessage(websocket.TextMessage, msg2JSON) + + // Should get echo of second message only + echo := readWsMessage(t, conn, 2*time.Second) + echoChannel, _ := echo["channel"].(string) + if echoChannel != "test2" { + t.Fatalf("expected echo of test2, got channel %s", echoChannel) + } +} + +func TestWsInjectAndClose(t *testing.T) { + upstreamHost, upstreamCleanup := startMockWsServer(t) + defer upstreamCleanup() + + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + port := freePort(t) + injectedMsg, _ := json.Marshal(map[string]interface{}{ + "action": ActionDisconnected, + "error": map[string]interface{}{"code": 40142, "statusCode": 401, "message": "Token expired"}, + }) + rulesJSON, _ := json.Marshal([]Rule{{ + Match: MatchConfig{Type: "ws_frame_to_client", Action: "CONNECTED"}, + Action: ActionConfig{Type: "inject_to_client_and_close", Message: injectedMsg}, + Times: 1, + }}) + session := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RealtimeHost: upstreamHost, Insecure: true}, + Port: port, + Rules: rulesJSON, + }) + defer deleteSession(t, controlURL, session.SessionID) + + conn := connectWs(t, fmt.Sprintf("localhost:%d", port)) + defer conn.Close() + + // Should receive the injected DISCONNECTED message + msg := readWsMessage(t, conn, 2*time.Second) + action, _ := msg["action"].(float64) + if int(action) != ActionDisconnected { + t.Fatalf("expected DISCONNECTED (6), got %v", 
msg["action"]) + } + + // Connection should be closed + conn.SetReadDeadline(time.Now().Add(1 * time.Second)) + _, _, err := conn.ReadMessage() + if err == nil { + t.Fatal("expected connection to be closed after inject_to_client_and_close") + } +} + +func TestWsImperativeDisconnect(t *testing.T) { + upstreamHost, upstreamCleanup := startMockWsServer(t) + defer upstreamCleanup() + + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + port := freePort(t) + session := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RealtimeHost: upstreamHost, Insecure: true}, + Port: port, + }) + defer deleteSession(t, controlURL, session.SessionID) + + conn := connectWs(t, fmt.Sprintf("localhost:%d", port)) + defer conn.Close() + + // Read CONNECTED + readWsMessage(t, conn, 2*time.Second) + + // Trigger disconnect via control API + triggerAction(t, controlURL, session.SessionID, ActionRequest{Type: "disconnect"}) + + // Connection should be closed + conn.SetReadDeadline(time.Now().Add(2 * time.Second)) + _, _, err := conn.ReadMessage() + if err == nil { + t.Fatal("expected connection to be closed after imperative disconnect") + } +} + +func TestHttpRespond(t *testing.T) { + upstreamHost, upstreamCleanup := startMockHttpServer(t) + defer upstreamCleanup() + + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + port := freePort(t) + rulesJSON, _ := json.Marshal([]Rule{{ + Match: MatchConfig{Type: "http_request", PathContains: "/channels/"}, + Action: ActionConfig{Type: "http_respond", Status: 401, Body: json.RawMessage(`{"error":{"code":40142,"message":"Token expired"}}`)}, + Times: 1, + }}) + session := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RestHost: upstreamHost, Insecure: true}, + Port: port, + Rules: rulesJSON, + }) + defer deleteSession(t, controlURL, session.SessionID) + + // First request to /channels/ should get fake 401 + resp, err := 
http.Get(fmt.Sprintf("http://localhost:%d/channels/test/messages", port)) + if err != nil { + t.Fatalf("request failed: %v", err) + } + resp.Body.Close() + if resp.StatusCode != 401 { + t.Fatalf("expected 401, got %d", resp.StatusCode) + } + + // Second request should pass through (rule was one-shot) + resp2, err := http.Get(fmt.Sprintf("http://localhost:%d/channels/test/messages", port)) + if err != nil { + t.Fatalf("request failed: %v", err) + } + resp2.Body.Close() + if resp2.StatusCode != 200 { + t.Fatalf("expected 200 on second request, got %d", resp2.StatusCode) + } +} + +func TestWsTemporalTrigger(t *testing.T) { + upstreamHost, upstreamCleanup := startMockWsServer(t) + defer upstreamCleanup() + + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + port := freePort(t) + rulesJSON, _ := json.Marshal([]Rule{{ + Match: MatchConfig{Type: "delay_after_ws_connect", DelayMs: 200}, + Action: ActionConfig{Type: "disconnect"}, + Times: 1, + }}) + session := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RealtimeHost: upstreamHost, Insecure: true}, + Port: port, + Rules: rulesJSON, + }) + defer deleteSession(t, controlURL, session.SessionID) + + conn := connectWs(t, fmt.Sprintf("localhost:%d", port)) + defer conn.Close() + + // Read CONNECTED + readWsMessage(t, conn, 2*time.Second) + + // Wait for temporal trigger to fire (200ms + margin) + conn.SetReadDeadline(time.Now().Add(2 * time.Second)) + _, _, err := conn.ReadMessage() + if err == nil { + t.Fatal("expected connection to be closed by temporal trigger") + } + + // Verify disconnect logged + time.Sleep(100 * time.Millisecond) + events := getLog(t, controlURL, session.SessionID) + hasDisconnect := false + for _, e := range events { + if e.Type == "ws_disconnect" && e.Initiator == "proxy" { + hasDisconnect = true + } + } + if !hasDisconnect { + t.Fatal("expected ws_disconnect event from proxy in log") + } +} + +func TestWsSuppressOnwards(t *testing.T) { + upstreamHost, 
upstreamCleanup := startMockWsServer(t) + defer upstreamCleanup() + + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + port := freePort(t) + rulesJSON, _ := json.Marshal([]Rule{{ + Match: MatchConfig{Type: "delay_after_ws_connect", DelayMs: 200}, + Action: ActionConfig{Type: "suppress_onwards"}, + Times: 1, + }}) + session := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RealtimeHost: upstreamHost, Insecure: true}, + Port: port, + Rules: rulesJSON, + }) + defer deleteSession(t, controlURL, session.SessionID) + + conn := connectWs(t, fmt.Sprintf("localhost:%d", port)) + defer conn.Close() + + // Read CONNECTED (arrives before suppress_onwards fires) + readWsMessage(t, conn, 2*time.Second) + + // Wait for suppress_onwards to take effect + time.Sleep(400 * time.Millisecond) + + // Send a MESSAGE — the echo from server should be suppressed + testMsg := map[string]interface{}{"action": float64(ActionMessage), "channel": "test"} + testJSON, _ := json.Marshal(testMsg) + conn.WriteMessage(websocket.TextMessage, testJSON) + + // Should NOT receive a response (server echoes but proxy suppresses) + conn.SetReadDeadline(time.Now().Add(500 * time.Millisecond)) + _, _, err := conn.ReadMessage() + if err == nil { + t.Fatal("expected no message after suppress_onwards") + } +} + +// Tests that exercise the rule matching logic without needing a real upstream. 
+ +func TestRuleMatching(t *testing.T) { + rule := &Rule{ + Match: MatchConfig{ + Type: "ws_frame_to_server", + Action: "ATTACH", + }, + Action: ActionConfig{Type: "suppress"}, + } + + // Should match + event := MatchEvent{Type: "ws_frame_to_server", Action: ActionAttach} + if !rule.Matches(event) { + t.Fatal("expected rule to match ATTACH frame") + } + + // Should not match different action + event2 := MatchEvent{Type: "ws_frame_to_server", Action: ActionDetach} + if rule.Matches(event2) { + t.Fatal("expected rule to NOT match DETACH frame") + } + + // Should not match different type + event3 := MatchEvent{Type: "ws_frame_to_client", Action: ActionAttach} + if rule.Matches(event3) { + t.Fatal("expected rule to NOT match client-direction frame") + } +} + +func TestRuleMatchingWithChannel(t *testing.T) { + rule := &Rule{ + Match: MatchConfig{ + Type: "ws_frame_to_server", + Action: "ATTACH", + Channel: "my-channel", + }, + Action: ActionConfig{Type: "suppress"}, + } + + event := MatchEvent{Type: "ws_frame_to_server", Action: ActionAttach, Channel: "my-channel"} + if !rule.Matches(event) { + t.Fatal("expected rule to match") + } + + event2 := MatchEvent{Type: "ws_frame_to_server", Action: ActionAttach, Channel: "other-channel"} + if rule.Matches(event2) { + t.Fatal("expected rule to NOT match different channel") + } +} + +func TestRuleMatchingWsConnect(t *testing.T) { + rule := &Rule{ + Match: MatchConfig{ + Type: "ws_connect", + QueryContains: map[string]string{"resume": "*"}, + }, + Action: ActionConfig{Type: "refuse_connection"}, + } + + // Should match when resume is present + event := MatchEvent{Type: "ws_connect", Action: -1, QueryParams: map[string]string{"resume": "key-1", "key": "abc"}} + if !rule.Matches(event) { + t.Fatal("expected rule to match when resume param present") + } + + // Should not match when resume is absent + event2 := MatchEvent{Type: "ws_connect", Action: -1, QueryParams: map[string]string{"key": "abc"}} + if rule.Matches(event2) { + 
t.Fatal("expected rule to NOT match when resume param absent") + } +} + +func TestRuleMatchingHttpRequest(t *testing.T) { + rule := &Rule{ + Match: MatchConfig{ + Type: "http_request", + Method: "POST", + PathContains: "/channels/", + }, + Action: ActionConfig{Type: "http_respond", Status: 401}, + } + + event := MatchEvent{Type: "http_request", Action: -1, Method: "POST", Path: "/channels/test/messages"} + if !rule.Matches(event) { + t.Fatal("expected rule to match") + } + + event2 := MatchEvent{Type: "http_request", Action: -1, Method: "GET", Path: "/channels/test/messages"} + if rule.Matches(event2) { + t.Fatal("expected rule to NOT match GET") + } + + event3 := MatchEvent{Type: "http_request", Action: -1, Method: "POST", Path: "/time"} + if rule.Matches(event3) { + t.Fatal("expected rule to NOT match /time") + } +} + +func TestSessionFindMatchingRuleWithCount(t *testing.T) { + session := &Session{ + Rules: []*Rule{ + { + Match: MatchConfig{Type: "ws_connect", Count: 2}, + Action: ActionConfig{Type: "refuse_connection"}, + }, + }, + EventLog: NewEventLog(), + } + + event := MatchEvent{Type: "ws_connect", Action: -1} + + // First attempt — count=1, rule wants count=2, should not fire + rule1, _ := session.FindMatchingRule(event) + if rule1 != nil { + t.Fatal("expected no match on first ws_connect") + } + + // Second attempt — count=2, should fire + rule2, _ := session.FindMatchingRule(event) + if rule2 == nil { + t.Fatal("expected match on second ws_connect") + } +} + +func TestRuleTimesLimit(t *testing.T) { + session := &Session{ + Rules: []*Rule{ + { + Match: MatchConfig{Type: "ws_frame_to_server", Action: "ATTACH"}, + Action: ActionConfig{Type: "suppress"}, + Times: 1, + }, + }, + EventLog: NewEventLog(), + } + + event := MatchEvent{Type: "ws_frame_to_server", Action: ActionAttach} + + // First match — should fire + rule, _ := session.FindMatchingRule(event) + if rule == nil { + t.Fatal("expected match") + } + session.FireRule(rule) + + // Rule should be 
removed (times=1, fired once) + if len(session.Rules) != 0 { + t.Fatalf("expected 0 rules, got %d", len(session.Rules)) + } + + // Second match — no rule + rule2, _ := session.FindMatchingRule(event) + if rule2 != nil { + t.Fatal("expected no match after rule exhausted") + } +} + +func TestProtocolParseJSON(t *testing.T) { + msg := `{"action":10,"channel":"test-channel"}` + pm := ParseProtocolMessage([]byte(msg), websocket.TextMessage) + if pm.Action != ActionAttach { + t.Fatalf("expected action %d, got %d", ActionAttach, pm.Action) + } + if pm.Channel != "test-channel" { + t.Fatalf("expected channel test-channel, got %s", pm.Channel) + } +} + +func TestProtocolParseJSONWithError(t *testing.T) { + msg := `{"action":9,"error":{"code":40142,"statusCode":401,"message":"Token expired"}}` + pm := ParseProtocolMessage([]byte(msg), websocket.TextMessage) + if pm.Action != ActionError { + t.Fatalf("expected action %d, got %d", ActionError, pm.Action) + } + if pm.Error == nil { + t.Fatal("expected error to be parsed") + } + if pm.Error.Code != 40142 { + t.Fatalf("expected error code 40142, got %d", pm.Error.Code) + } +} + +func TestProtocolParseMsgpack(t *testing.T) { + // Build a msgpack map: {"action": 10, "channel": "test"} + // Using raw msgpack encoding + raw := map[string]interface{}{ + "action": 10, + "channel": "test", + } + data := mustMarshalMsgpack(t, raw) + pm := ParseProtocolMessage(data, websocket.BinaryMessage) + if pm.Action != ActionAttach { + t.Fatalf("expected action %d, got %d", ActionAttach, pm.Action) + } + if pm.Channel != "test" { + t.Fatalf("expected channel test, got %s", pm.Channel) + } +} + +func TestActionFromString(t *testing.T) { + tests := []struct { + input string + want int + }{ + {"ATTACH", ActionAttach}, + {"attach", ActionAttach}, + {"CONNECTED", ActionConnected}, + {"10", ActionAttach}, + {"4", ActionConnected}, + {"unknown", -1}, + } + for _, tt := range tests { + got := ActionFromString(tt.input) + if got != tt.want { + 
t.Errorf("ActionFromString(%q) = %d, want %d", tt.input, got, tt.want) + } + } +} + +func TestEventLog(t *testing.T) { + log := NewEventLog() + log.Append(Event{Type: "ws_connect"}) + log.Append(Event{Type: "ws_frame", Direction: "client_to_server"}) + + events := log.Events() + if len(events) != 2 { + t.Fatalf("expected 2 events, got %d", len(events)) + } + if events[0].Type != "ws_connect" { + t.Fatalf("expected ws_connect, got %s", events[0].Type) + } + if events[1].Direction != "client_to_server" { + t.Fatalf("expected client_to_server, got %s", events[1].Direction) + } +} + +func TestEventLogConcurrency(t *testing.T) { + el := NewEventLog() + var wg sync.WaitGroup + for i := 0; i < 100; i++ { + wg.Add(1) + go func(i int) { + defer wg.Done() + el.Append(Event{Type: fmt.Sprintf("event-%d", i)}) + }(i) + } + wg.Wait() + + events := el.Events() + if len(events) != 100 { + t.Fatalf("expected 100 events, got %d", len(events)) + } +} + +func TestSessionTimeout(t *testing.T) { + controlURL, store, cleanup := startControlServer(t) + defer cleanup() + + port := freePort(t) + session := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RealtimeHost: "localhost:1234"}, + Port: port, + TimeoutMs: 200, // 200ms timeout + }) + + // Session should exist + _, ok := store.Get(session.SessionID) + if !ok { + t.Fatal("expected session to exist") + } + + // Wait for timeout + time.Sleep(500 * time.Millisecond) + + // Session should be cleaned up + _, ok = store.Get(session.SessionID) + if ok { + t.Fatal("expected session to be cleaned up after timeout") + } + + // Port should be free + l, err := net.Listen("tcp", fmt.Sprintf(":%d", port)) + if err != nil { + t.Fatalf("port should be free after timeout: %v", err) + } + l.Close() +} + +func TestAddRulesDynamically(t *testing.T) { + controlURL, store, cleanup := startControlServer(t) + defer cleanup() + + port := freePort(t) + session := createSession(t, controlURL, CreateSessionRequest{ + Target: 
TargetConfig{RealtimeHost: "localhost:1234"}, + Port: port, + }) + defer deleteSession(t, controlURL, session.SessionID) + + // Add a rule + ruleJSON, _ := json.Marshal(Rule{ + Match: MatchConfig{Type: "ws_connect"}, + Action: ActionConfig{Type: "refuse_connection"}, + Times: 1, + }) + addRules(t, controlURL, session.SessionID, []json.RawMessage{ruleJSON}, "append") + + // Verify rule count + sess, _ := store.Get(session.SessionID) + if sess.RuleCount() != 1 { + t.Fatalf("expected 1 rule, got %d", sess.RuleCount()) + } +} + +func TestMultipleRulesOrder(t *testing.T) { + session := &Session{ + Rules: []*Rule{ + { + Match: MatchConfig{Type: "ws_frame_to_server", Action: "ATTACH"}, + Action: ActionConfig{Type: "suppress"}, + Comment: "first", + }, + { + Match: MatchConfig{Type: "ws_frame_to_server"}, + Action: ActionConfig{Type: "passthrough"}, + Comment: "second", + }, + }, + EventLog: NewEventLog(), + } + + // ATTACH should match the first rule (more specific) + event := MatchEvent{Type: "ws_frame_to_server", Action: ActionAttach} + rule, _ := session.FindMatchingRule(event) + if rule == nil || rule.Comment != "first" { + t.Fatal("expected first rule to match") + } + + // MESSAGE should match the second rule (catch-all) + event2 := MatchEvent{Type: "ws_frame_to_server", Action: ActionMessage} + rule2, _ := session.FindMatchingRule(event2) + if rule2 == nil || rule2.Comment != "second" { + t.Fatal("expected second rule to match") + } +} + +func TestConcurrentSessions(t *testing.T) { + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + port1 := freePort(t) + port2 := freePort(t) + + session1 := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RealtimeHost: "localhost:1111"}, + Port: port1, + }) + defer deleteSession(t, controlURL, session1.SessionID) + + session2 := createSession(t, controlURL, CreateSessionRequest{ + Target: TargetConfig{RealtimeHost: "localhost:2222"}, + Port: port2, + }) + defer deleteSession(t, 
controlURL, session2.SessionID) + + if session1.SessionID == session2.SessionID { + t.Fatal("expected different session IDs") + } + if session1.Proxy.Port == session2.Proxy.Port { + t.Fatal("expected different ports") + } +} + +func TestSessionNotFound(t *testing.T) { + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + resp, _ := http.Get(controlURL + "/sessions/nonexistent") + if resp.StatusCode != http.StatusNotFound { + t.Fatalf("expected 404, got %d", resp.StatusCode) + } + resp.Body.Close() + + resp2, _ := http.Get(controlURL + "/sessions/nonexistent/log") + if resp2.StatusCode != http.StatusNotFound { + t.Fatalf("expected 404, got %d", resp2.StatusCode) + } + resp2.Body.Close() +} + +func TestCreateSessionValidation(t *testing.T) { + controlURL, _, cleanup := startControlServer(t) + defer cleanup() + + // Missing port + body, _ := json.Marshal(CreateSessionRequest{Target: TargetConfig{RealtimeHost: "localhost:1234"}}) + resp, _ := http.Post(controlURL+"/sessions", "application/json", bytes.NewReader(body)) + if resp.StatusCode != http.StatusBadRequest { + t.Fatalf("expected 400 for missing port, got %d", resp.StatusCode) + } + resp.Body.Close() + + // Missing target + body2, _ := json.Marshal(CreateSessionRequest{Port: 9999}) + resp2, _ := http.Post(controlURL+"/sessions", "application/json", bytes.NewReader(body2)) + if resp2.StatusCode != http.StatusBadRequest { + t.Fatalf("expected 400 for missing target, got %d", resp2.StatusCode) + } + resp2.Body.Close() +} + +// -- Msgpack test helper -- + +func mustMarshalMsgpack(t *testing.T, v interface{}) []byte { + t.Helper() + data, err := msgpack.Marshal(v) + if err != nil { + t.Fatalf("failed to marshal msgpack: %v", err) + } + return data +} diff --git a/uts/proxy/rule.go b/uts/proxy/rule.go new file mode 100644 index 000000000..53768ed74 --- /dev/null +++ b/uts/proxy/rule.go @@ -0,0 +1,110 @@ +package main + +import ( + "encoding/json" + "strings" +) + +// Rule represents a single proxy 
rule with match condition, action, and optional firing limit. +type Rule struct { + Match MatchConfig `json:"match"` + Action ActionConfig `json:"action"` + Times int `json:"times,omitempty"` // 0 = unlimited + Comment string `json:"comment,omitempty"` + + matchCount int // how many times the match condition was satisfied (for count matching) +} + +// MatchConfig describes when a rule fires. +type MatchConfig struct { + Type string `json:"type"` // ws_connect, ws_frame_to_server, ws_frame_to_client, http_request, delay_after_ws_connect + Count int `json:"count,omitempty"` // only match the Nth occurrence (1-based) + Action string `json:"action,omitempty"` // protocol message action name or number + Channel string `json:"channel,omitempty"` // channel name must equal this + Method string `json:"method,omitempty"` // HTTP method + PathContains string `json:"pathContains,omitempty"` // request path must contain this + QueryContains map[string]string `json:"queryContains,omitempty"` // query params that must be present ("*" = any value) + DelayMs int `json:"delayMs,omitempty"` // for delay_after_ws_connect +} + +// ActionConfig describes what happens when a rule fires. +type ActionConfig struct { + Type string `json:"type"` // passthrough, refuse_connection, accept_and_close, disconnect, close, suppress, delay, inject_to_client, inject_to_client_and_close, replace, suppress_onwards, http_respond, http_delay, http_drop, http_replace_response + CloseCode int `json:"closeCode,omitempty"` + DelayMs int `json:"delayMs,omitempty"` + Message json.RawMessage `json:"message,omitempty"` + Status int `json:"status,omitempty"` + Body json.RawMessage `json:"body,omitempty"` + Headers map[string]string `json:"headers,omitempty"` +} + +// MatchEvent is the context passed to rule matching. 
+type MatchEvent struct { + Type string // ws_connect, ws_frame_to_server, ws_frame_to_client, http_request + Action int // protocol message action (normalized to int), -1 if not applicable + ActionStr string // original action string for logging + Channel string // protocol message channel + Method string // HTTP method + Path string // HTTP request path + QueryParams map[string]string // WS connection query params +} + +// Matches checks whether this rule's match config matches the given event. +// It does NOT check the count — that is handled by FindMatchingRule. +func (r *Rule) Matches(event MatchEvent) bool { + m := r.Match + + if m.Type != event.Type { + return false + } + + // Action filter (for frame matches) + if m.Action != "" && event.Action >= 0 { + // Try matching by name or number + wantAction := ActionFromString(m.Action) + if wantAction < 0 { + return false // unknown action name + } + if wantAction != event.Action { + return false + } + } + + // Channel filter + if m.Channel != "" && m.Channel != event.Channel { + return false + } + + // HTTP method filter + if m.Method != "" && !strings.EqualFold(m.Method, event.Method) { + return false + } + + // HTTP path filter + if m.PathContains != "" && !strings.Contains(event.Path, m.PathContains) { + return false + } + + // Query param filter (for ws_connect) + if len(m.QueryContains) > 0 { + for k, v := range m.QueryContains { + actual, ok := event.QueryParams[k] + if !ok { + return false + } + if v != "*" && v != actual { + return false + } + } + } + + return true +} + +// ruleLabel returns a human-readable label for logging which rule matched. 
+func ruleLabel(rule *Rule, index int) string {
+	if rule.Comment != "" {
+		return rule.Comment
+	}
+	return "rule-" + string(rune('0'+index/10)) + string(rune('0'+index%10)) // zero-padded two-digit index, no extra imports
+}
diff --git a/uts/proxy/server.go b/uts/proxy/server.go
new file mode 100644
index 000000000..0849f6d24
--- /dev/null
+++ b/uts/proxy/server.go
@@ -0,0 +1,245 @@
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"io"
+	"log"
+	"net/http"
+	"strings"
+	"time"
+)
+
+// Server is the control API server.
+type Server struct {
+	store *SessionStore
+	mux   *http.ServeMux
+}
+
+// NewServer creates a new control API server. Routes use the Go 1.22+
+// method-and-wildcard ServeMux patterns.
+func NewServer(store *SessionStore) *Server {
+	s := &Server{store: store}
+	s.mux = http.NewServeMux()
+	s.mux.HandleFunc("GET /health", s.handleHealth)
+	s.mux.HandleFunc("POST /sessions", s.handleCreateSession)
+	s.mux.HandleFunc("GET /sessions/{id}", s.handleGetSession)
+	s.mux.HandleFunc("POST /sessions/{id}/rules", s.handleAddRules)
+	s.mux.HandleFunc("POST /sessions/{id}/actions", s.handleAction)
+	s.mux.HandleFunc("GET /sessions/{id}/log", s.handleGetLog)
+	s.mux.HandleFunc("DELETE /sessions/{id}", s.handleDeleteSession)
+	return s
+}
+
+// ServeHTTP implements http.Handler.
+func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) { + s.mux.ServeHTTP(w, r) +} + +func (s *Server) handleHealth(w http.ResponseWriter, r *http.Request) { + writeJSON(w, http.StatusOK, map[string]bool{"ok": true}) +} + +func (s *Server) handleCreateSession(w http.ResponseWriter, r *http.Request) { + body, err := io.ReadAll(r.Body) + if err != nil { + writeError(w, http.StatusBadRequest, "failed to read request body") + return + } + + var req CreateSessionRequest + if err := json.Unmarshal(body, &req); err != nil { + writeError(w, http.StatusBadRequest, "invalid JSON: "+err.Error()) + return + } + + if req.Port <= 0 { + writeError(w, http.StatusBadRequest, "port is required and must be positive") + return + } + + if req.Target.RealtimeHost == "" && req.Target.RestHost == "" { + writeError(w, http.StatusBadRequest, "target must have at least one of realtimeHost or restHost") + return + } + + timeoutMs := req.TimeoutMs + if timeoutMs <= 0 { + timeoutMs = 30000 + } + + // Parse rules + var rules []*Rule + if len(req.Rules) > 0 { + if err := json.Unmarshal(req.Rules, &rules); err != nil { + writeError(w, http.StatusBadRequest, "invalid rules: "+err.Error()) + return + } + } + + session := &Session{ + ID: GenerateID(), + Target: req.Target, + Port: req.Port, + Rules: rules, + EventLog: NewEventLog(), + timeoutMs: timeoutMs, + } + + // Attempt to bind the port + if err := StartSessionListener(session, req.Port); err != nil { + writeError(w, http.StatusConflict, err.Error()) + return + } + + // Set up auto-cleanup timer + session.timeoutTimer = time.AfterFunc(time.Duration(timeoutMs)*time.Millisecond, func() { + log.Printf("session %s timed out after %dms, cleaning up", session.ID, timeoutMs) + s.cleanupSession(session.ID) + }) + + s.store.Create(session) + + resp := CreateSessionResponse{ + SessionID: session.ID, + Proxy: ProxyConfig{ + Host: fmt.Sprintf("localhost:%d", req.Port), + Port: req.Port, + }, + } + + log.Printf("created session %s on port %d 
(timeout %dms, %d rules)", + session.ID, req.Port, timeoutMs, len(rules)) + + writeJSON(w, http.StatusCreated, resp) +} + +func (s *Server) handleGetSession(w http.ResponseWriter, r *http.Request) { + id := r.PathValue("id") + session, ok := s.store.Get(id) + if !ok { + writeError(w, http.StatusNotFound, "session not found") + return + } + + writeJSON(w, http.StatusOK, map[string]interface{}{ + "sessionId": session.ID, + "port": session.Port, + "target": session.Target, + "ruleCount": session.RuleCount(), + }) +} + +func (s *Server) handleAddRules(w http.ResponseWriter, r *http.Request) { + id := r.PathValue("id") + session, ok := s.store.Get(id) + if !ok { + writeError(w, http.StatusNotFound, "session not found") + return + } + + body, err := io.ReadAll(r.Body) + if err != nil { + writeError(w, http.StatusBadRequest, "failed to read request body") + return + } + + var req AddRulesRequest + if err := json.Unmarshal(body, &req); err != nil { + writeError(w, http.StatusBadRequest, "invalid JSON: "+err.Error()) + return + } + + var rules []*Rule + if err := json.Unmarshal(req.Rules, &rules); err != nil { + writeError(w, http.StatusBadRequest, "invalid rules: "+err.Error()) + return + } + + prepend := strings.EqualFold(req.Position, "prepend") + session.AddRules(rules, prepend) + session.ResetTimeout() + + writeJSON(w, http.StatusOK, map[string]int{"ruleCount": session.RuleCount()}) +} + +func (s *Server) handleAction(w http.ResponseWriter, r *http.Request) { + id := r.PathValue("id") + session, ok := s.store.Get(id) + if !ok { + writeError(w, http.StatusNotFound, "session not found") + return + } + + body, err := io.ReadAll(r.Body) + if err != nil { + writeError(w, http.StatusBadRequest, "failed to read request body") + return + } + + var req ActionRequest + if err := json.Unmarshal(body, &req); err != nil { + writeError(w, http.StatusBadRequest, "invalid JSON: "+err.Error()) + return + } + + session.ResetTimeout() + + if err := ExecuteImperativeAction(session, req); 
err != nil { + writeError(w, http.StatusConflict, err.Error()) + return + } + + writeJSON(w, http.StatusOK, map[string]bool{"ok": true}) +} + +func (s *Server) handleGetLog(w http.ResponseWriter, r *http.Request) { + id := r.PathValue("id") + session, ok := s.store.Get(id) + if !ok { + writeError(w, http.StatusNotFound, "session not found") + return + } + + writeJSON(w, http.StatusOK, map[string]interface{}{ + "events": session.EventLog.Events(), + }) +} + +func (s *Server) handleDeleteSession(w http.ResponseWriter, r *http.Request) { + id := r.PathValue("id") + events := s.cleanupSession(id) + if events == nil { + writeError(w, http.StatusNotFound, "session not found") + return + } + + writeJSON(w, http.StatusOK, map[string]interface{}{ + "events": events, + }) +} + +func (s *Server) cleanupSession(id string) []Event { + session, ok := s.store.Delete(id) + if !ok { + return nil + } + + log.Printf("cleaning up session %s on port %d", session.ID, session.Port) + + StopSessionListener(session) + session.Close() + + return session.EventLog.Events() +} + +// -- helpers -- + +func writeJSON(w http.ResponseWriter, status int, v interface{}) { + w.Header().Set("Content-Type", "application/json") + w.WriteHeader(status) + json.NewEncoder(w).Encode(v) +} + +func writeError(w http.ResponseWriter, status int, msg string) { + writeJSON(w, status, map[string]string{"error": msg}) +} diff --git a/uts/proxy/session.go b/uts/proxy/session.go new file mode 100644 index 000000000..8d554610b --- /dev/null +++ b/uts/proxy/session.go @@ -0,0 +1,286 @@ +package main + +import ( + "encoding/json" + "fmt" + "net" + "net/http" + "sync" + "time" +) + +// TargetConfig specifies the upstream Ably hosts. 
+type TargetConfig struct { + RealtimeHost string `json:"realtimeHost,omitempty"` + RestHost string `json:"restHost,omitempty"` + Insecure bool `json:"insecure,omitempty"` // if true, use ws:// and http:// upstream (for testing) +} + +// Session represents a single test session with its own port, rules, and state. +type Session struct { + ID string `json:"id"` + Target TargetConfig `json:"target"` + Port int `json:"port"` + Rules []*Rule `json:"-"` + EventLog *EventLog `json:"-"` + + listener net.Listener + Server *http.Server + timeoutTimer *time.Timer + timeoutMs int + + activeWsConns []*WsConnection + wsConnectCount int + httpReqCount int + + suppressServerToClient bool + suppressClientToServer bool + + mu sync.Mutex +} + +// WsConnection tracks an active proxied WebSocket connection. +type WsConnection struct { + ClientConn interface{} // *websocket.Conn — set during ws_proxy + ServerConn interface{} // *websocket.Conn — set during ws_proxy + ConnNumber int + timers []*time.Timer + closed bool + closeCh chan struct{} // closed when connection is torn down + mu sync.Mutex +} + +// NewWsConnection creates a new WsConnection. +func NewWsConnection(connNumber int) *WsConnection { + return &WsConnection{ + ConnNumber: connNumber, + closeCh: make(chan struct{}), + } +} + +// MarkClosed marks this connection as closed and signals the closeCh. +func (wc *WsConnection) MarkClosed() { + wc.mu.Lock() + defer wc.mu.Unlock() + if !wc.closed { + wc.closed = true + close(wc.closeCh) + } +} + +// IsClosed returns whether this connection has been closed. +func (wc *WsConnection) IsClosed() bool { + wc.mu.Lock() + defer wc.mu.Unlock() + return wc.closed +} + +// CancelTimers stops all pending timers for this connection. +func (wc *WsConnection) CancelTimers() { + wc.mu.Lock() + defer wc.mu.Unlock() + for _, t := range wc.timers { + t.Stop() + } + wc.timers = nil +} + +// AddTimer adds a timer to this connection's timer list. 
+func (wc *WsConnection) AddTimer(t *time.Timer) { + wc.mu.Lock() + defer wc.mu.Unlock() + wc.timers = append(wc.timers, t) +} + +// FindMatchingRule iterates rules in order and returns the first match. +// It handles count tracking: increments the rule's matchCount when the +// base condition matches, and only returns the rule if count is satisfied. +// Returns the rule and its index, or nil/-1 if no rule matches. +func (s *Session) FindMatchingRule(event MatchEvent) (*Rule, int) { + s.mu.Lock() + defer s.mu.Unlock() + + for i, rule := range s.Rules { + if !rule.Matches(event) { + continue + } + + // Base condition matches — increment match count + rule.matchCount++ + + // Check count constraint + if rule.Match.Count > 0 && rule.matchCount != rule.Match.Count { + continue + } + + return rule, i + } + return nil, -1 +} + +// FireRule records that a rule has fired and removes it if times is exhausted. +func (s *Session) FireRule(rule *Rule) { + s.mu.Lock() + defer s.mu.Unlock() + s.fireRuleLocked(rule) +} + +func (s *Session) fireRuleLocked(rule *Rule) { + if rule.Times > 0 { + rule.Times-- + if rule.Times <= 0 { + s.removeRuleLocked(rule) + } + } +} + +func (s *Session) removeRuleLocked(rule *Rule) { + for i, r := range s.Rules { + if r == rule { + s.Rules = append(s.Rules[:i], s.Rules[i+1:]...) + return + } + } +} + +// AddRules appends or prepends rules to the session. +func (s *Session) AddRules(rules []*Rule, prepend bool) { + s.mu.Lock() + defer s.mu.Unlock() + if prepend { + s.Rules = append(rules, s.Rules...) + } else { + s.Rules = append(s.Rules, rules...) + } +} + +// RuleCount returns the current number of rules. +func (s *Session) RuleCount() int { + s.mu.Lock() + defer s.mu.Unlock() + return len(s.Rules) +} + +// GetActiveWsConn returns the most recently added active WS connection, or nil. 
+func (s *Session) GetActiveWsConn() *WsConnection {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	for i := len(s.activeWsConns) - 1; i >= 0; i-- {
+		if !s.activeWsConns[i].IsClosed() {
+			return s.activeWsConns[i]
+		}
+	}
+	return nil
+}
+
+// AddWsConn registers a new WS connection and increments the connect count.
+func (s *Session) AddWsConn(wc *WsConnection) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.wsConnectCount++
+	wc.ConnNumber = s.wsConnectCount
+	s.activeWsConns = append(s.activeWsConns, wc)
+}
+
+// RemoveWsConn removes a WS connection from the active list.
+func (s *Session) RemoveWsConn(wc *WsConnection) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	for i, c := range s.activeWsConns {
+		if c == wc {
+			s.activeWsConns = append(s.activeWsConns[:i], s.activeWsConns[i+1:]...)
+			return
+		}
+	}
+}
+
+// IncrementHttpReqCount increments and returns the HTTP request count.
+func (s *Session) IncrementHttpReqCount() int {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.httpReqCount++
+	return s.httpReqCount
+}
+
+// ResetTimeout resets the session's auto-cleanup timer.
+func (s *Session) ResetTimeout() {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.timeoutTimer != nil && s.timeoutMs > 0 {
+		// Re-arm the existing timer so the cleanup callback installed at
+		// session creation is preserved. (Replacing the timer with a new
+		// AfterFunc that has an empty body would silently disable
+		// auto-cleanup after the first rule add or action.)
+		s.timeoutTimer.Reset(time.Duration(s.timeoutMs) * time.Millisecond)
+	}
+}
+
+// Close shuts down the session: closes all WS connections, cancels timers, closes listener.
+func (s *Session) Close() {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+
+	if s.timeoutTimer != nil {
+		s.timeoutTimer.Stop()
+		s.timeoutTimer = nil
+	}
+
+	for _, wc := range s.activeWsConns {
+		wc.CancelTimers()
+		wc.MarkClosed()
+	}
+	s.activeWsConns = nil
+
+	if s.listener != nil {
+		s.listener.Close()
+		s.listener = nil
+	}
+}
+
+// LogRuleMatch returns a string pointer for logging which rule matched, or nil.
+func LogRuleMatch(rule *Rule, index int) *string { + if rule == nil { + return nil + } + label := fmt.Sprintf("rule-%d", index) + if rule.Comment != "" { + label = rule.Comment + } + return &label +} + +// -- API request/response types -- + +// CreateSessionRequest is the JSON body for POST /sessions. +type CreateSessionRequest struct { + Target TargetConfig `json:"target"` + Rules json.RawMessage `json:"rules,omitempty"` + TimeoutMs int `json:"timeoutMs,omitempty"` + Port int `json:"port"` +} + +// CreateSessionResponse is the JSON response for POST /sessions. +type CreateSessionResponse struct { + SessionID string `json:"sessionId"` + Proxy ProxyConfig `json:"proxy"` +} + +// ProxyConfig describes how to connect to the session's proxy. +type ProxyConfig struct { + Host string `json:"host"` + Port int `json:"port"` +} + +// AddRulesRequest is the JSON body for POST /sessions/{id}/rules. +type AddRulesRequest struct { + Rules json.RawMessage `json:"rules"` + Position string `json:"position,omitempty"` // "append" (default) or "prepend" +} + +// ActionRequest is the JSON body for POST /sessions/{id}/actions. +type ActionRequest struct { + Type string `json:"type"` + Message json.RawMessage `json:"message,omitempty"` + CloseCode int `json:"closeCode,omitempty"` +} diff --git a/uts/proxy/session_store.go b/uts/proxy/session_store.go new file mode 100644 index 000000000..a2028ff27 --- /dev/null +++ b/uts/proxy/session_store.go @@ -0,0 +1,64 @@ +package main + +import ( + "crypto/rand" + "encoding/hex" + "sync" +) + +// SessionStore is a thread-safe store for sessions. +type SessionStore struct { + sessions map[string]*Session + mu sync.RWMutex +} + +// NewSessionStore creates a new empty session store. +func NewSessionStore() *SessionStore { + return &SessionStore{ + sessions: make(map[string]*Session), + } +} + +// Create adds a session to the store. 
+func (s *SessionStore) Create(session *Session) { + s.mu.Lock() + defer s.mu.Unlock() + s.sessions[session.ID] = session +} + +// Get returns a session by ID. +func (s *SessionStore) Get(id string) (*Session, bool) { + s.mu.RLock() + defer s.mu.RUnlock() + session, ok := s.sessions[id] + return session, ok +} + +// Delete removes a session by ID, returning it if found. +func (s *SessionStore) Delete(id string) (*Session, bool) { + s.mu.Lock() + defer s.mu.Unlock() + session, ok := s.sessions[id] + if ok { + delete(s.sessions, id) + } + return session, ok +} + +// All returns all sessions. +func (s *SessionStore) All() []*Session { + s.mu.RLock() + defer s.mu.RUnlock() + out := make([]*Session, 0, len(s.sessions)) + for _, session := range s.sessions { + out = append(out, session) + } + return out +} + +// GenerateID creates a random 8-character hex session ID. +func GenerateID() string { + b := make([]byte, 4) + rand.Read(b) + return hex.EncodeToString(b) +} diff --git a/uts/proxy/test-proxy b/uts/proxy/test-proxy new file mode 100755 index 000000000..a70f75f15 Binary files /dev/null and b/uts/proxy/test-proxy differ diff --git a/uts/proxy/ws_proxy.go b/uts/proxy/ws_proxy.go new file mode 100644 index 000000000..344a84898 --- /dev/null +++ b/uts/proxy/ws_proxy.go @@ -0,0 +1,500 @@ +package main + +import ( + "crypto/tls" + "encoding/json" + "fmt" + "log" + "net/http" + "net/url" + "sync" + "time" + + "github.com/gorilla/websocket" +) + +var wsUpgrader = websocket.Upgrader{ + CheckOrigin: func(r *http.Request) bool { return true }, +} + +// HandleWsProxy handles a WebSocket connection from the SDK client, +// proxying it to the upstream Ably realtime host. 
+func HandleWsProxy(session *Session, w http.ResponseWriter, r *http.Request) { + // Build query params map for logging and matching + queryParams := make(map[string]string) + for k, v := range r.URL.Query() { + if len(v) > 0 { + queryParams[k] = v[0] + } + } + + // Create WsConnection and register it + wc := NewWsConnection(0) + session.AddWsConn(wc) + defer func() { + wc.CancelTimers() + wc.MarkClosed() + session.RemoveWsConn(wc) + }() + + // Log ws_connect event + connectURL := fmt.Sprintf("ws://%s%s", r.Host, r.URL.String()) + session.EventLog.Append(Event{ + Type: "ws_connect", + URL: connectURL, + QueryParams: queryParams, + }) + + // Check rules for ws_connect match + matchEvent := MatchEvent{ + Type: "ws_connect", + Action: -1, + QueryParams: queryParams, + } + + rule, ruleIdx := session.FindMatchingRule(matchEvent) + if rule != nil { + session.FireRule(rule) + + switch rule.Action.Type { + case "refuse_connection": + session.EventLog.Append(Event{ + Type: "ws_disconnect", + Initiator: "proxy", + RuleMatched: LogRuleMatch(rule, ruleIdx), + }) + http.Error(w, "connection refused by proxy rule", http.StatusBadGateway) + return + + case "accept_and_close": + clientConn, err := wsUpgrader.Upgrade(w, r, nil) + if err != nil { + log.Printf("session %s: failed to upgrade WS for accept_and_close: %v", session.ID, err) + return + } + closeCode := rule.Action.CloseCode + if closeCode <= 0 { + closeCode = websocket.CloseNormalClosure + } + msg := websocket.FormatCloseMessage(closeCode, "") + clientConn.WriteMessage(websocket.CloseMessage, msg) + clientConn.Close() + session.EventLog.Append(Event{ + Type: "ws_disconnect", + Initiator: "proxy", + CloseCode: closeCode, + RuleMatched: LogRuleMatch(rule, ruleIdx), + }) + return + } + // For other action types on ws_connect, fall through to normal proxying + } + + // Build upstream URL + if session.Target.RealtimeHost == "" { + http.Error(w, "no realtime host configured", http.StatusBadGateway) + return + } + + scheme := 
"wss" + if session.Target.Insecure { + scheme = "ws" + } + upstreamURL := url.URL{ + Scheme: scheme, + Host: session.Target.RealtimeHost, + Path: r.URL.Path, + RawQuery: r.URL.RawQuery, + } + + // Dial upstream + dialer := websocket.Dialer{} + if !session.Target.Insecure { + dialer.TLSClientConfig = &tls.Config{} + } + serverConn, _, err := dialer.Dial(upstreamURL.String(), nil) + if err != nil { + log.Printf("session %s: failed to dial upstream %s: %v", session.ID, upstreamURL.String(), err) + http.Error(w, fmt.Sprintf("failed to connect to upstream: %v", err), http.StatusBadGateway) + return + } + defer serverConn.Close() + + // Accept client WebSocket upgrade + clientConn, err := wsUpgrader.Upgrade(w, r, nil) + if err != nil { + log.Printf("session %s: failed to upgrade WS: %v", session.ID, err) + return + } + defer clientConn.Close() + + // Store connections + wc.ClientConn = clientConn + wc.ServerConn = serverConn + + // Schedule temporal triggers + scheduleTemporalTriggers(session, wc) + + // Relay frames between client and server + var wg sync.WaitGroup + wg.Add(2) + + // server → client relay + go func() { + defer wg.Done() + relayFrames(session, wc, serverConn, clientConn, "server_to_client", "ws_frame_to_client") + }() + + // client → server relay + go func() { + defer wg.Done() + relayFrames(session, wc, clientConn, serverConn, "client_to_server", "ws_frame_to_server") + }() + + wg.Wait() + + // Log disconnect if not already logged + if !wc.IsClosed() { + session.EventLog.Append(Event{ + Type: "ws_disconnect", + Initiator: "client", + }) + } +} + +// relayFrames reads frames from src and writes to dst, applying rules. 
+func relayFrames(session *Session, wc *WsConnection, src, dst *websocket.Conn, direction, matchType string) { + for { + if wc.IsClosed() { + return + } + + msgType, data, err := src.ReadMessage() + if err != nil { + if !wc.IsClosed() { + initiator := "client" + if direction == "server_to_client" { + initiator = "server" + } + session.EventLog.Append(Event{ + Type: "ws_disconnect", + Initiator: initiator, + }) + wc.MarkClosed() + // Close the other side + dst.Close() + } + return + } + + // Check suppress_onwards flag + session.mu.Lock() + suppressed := false + if direction == "server_to_client" && session.suppressServerToClient { + suppressed = true + } else if direction == "client_to_server" && session.suppressClientToServer { + suppressed = true + } + session.mu.Unlock() + + if suppressed { + continue + } + + // Parse protocol message for rule matching and logging + pm := ParseProtocolMessage(data, msgType) + + // Log the frame (as JSON for readability, even if binary) + var logMsg json.RawMessage + if msgType == websocket.TextMessage { + logMsg = json.RawMessage(data) + } else { + // For binary frames, log the parsed summary + logMsg = mustMarshal(map[string]interface{}{ + "action": pm.Action, + "channel": pm.Channel, + "_binary": true, + }) + } + + // Build match event + matchEvent := MatchEvent{ + Type: matchType, + Action: pm.Action, + Channel: pm.Channel, + } + + // Find matching rule + rule, ruleIdx := session.FindMatchingRule(matchEvent) + + ruleLabel := LogRuleMatch(rule, ruleIdx) + + // Log the frame + session.EventLog.Append(Event{ + Type: "ws_frame", + Direction: direction, + Message: logMsg, + RuleMatched: ruleLabel, + }) + + if rule == nil { + // No rule matched — passthrough + if err := dst.WriteMessage(msgType, data); err != nil { + wc.MarkClosed() + return + } + continue + } + + // Execute rule action + session.FireRule(rule) + + switch rule.Action.Type { + case "passthrough": + if err := dst.WriteMessage(msgType, data); err != nil { + 
wc.MarkClosed() + return + } + + case "suppress": + // Don't forward + + case "delay": + time.Sleep(time.Duration(rule.Action.DelayMs) * time.Millisecond) + if err := dst.WriteMessage(msgType, data); err != nil { + wc.MarkClosed() + return + } + + case "inject_to_client": + // Send the injected message to client + if direction == "server_to_client" || direction == "client_to_server" { + clientConn := wc.ClientConn.(*websocket.Conn) + clientConn.WriteMessage(websocket.TextMessage, rule.Action.Message) + session.EventLog.Append(Event{ + Type: "ws_frame", + Direction: "server_to_client", + Message: rule.Action.Message, + Initiator: "proxy", + }) + } + // Also forward the original + if err := dst.WriteMessage(msgType, data); err != nil { + wc.MarkClosed() + return + } + + case "inject_to_client_and_close": + clientConn := wc.ClientConn.(*websocket.Conn) + clientConn.WriteMessage(websocket.TextMessage, rule.Action.Message) + session.EventLog.Append(Event{ + Type: "ws_frame", + Direction: "server_to_client", + Message: rule.Action.Message, + Initiator: "proxy", + }) + // Close + closeCode := rule.Action.CloseCode + if closeCode <= 0 { + closeCode = websocket.CloseNormalClosure + } + closeMsg := websocket.FormatCloseMessage(closeCode, "") + clientConn.WriteMessage(websocket.CloseMessage, closeMsg) + clientConn.UnderlyingConn().Close() + if serverConn, ok := wc.ServerConn.(*websocket.Conn); ok { + serverConn.UnderlyingConn().Close() + } + wc.MarkClosed() + session.EventLog.Append(Event{ + Type: "ws_disconnect", + Initiator: "proxy", + CloseCode: closeCode, + }) + return + + case "replace": + // Send replacement message instead of original + if err := dst.WriteMessage(websocket.TextMessage, rule.Action.Message); err != nil { + wc.MarkClosed() + return + } + + case "disconnect": + // Abrupt close + if clientConn, ok := wc.ClientConn.(*websocket.Conn); ok { + clientConn.UnderlyingConn().Close() + } + if serverConn, ok := wc.ServerConn.(*websocket.Conn); ok { + 
serverConn.UnderlyingConn().Close() + } + wc.MarkClosed() + session.EventLog.Append(Event{ + Type: "ws_disconnect", + Initiator: "proxy", + }) + return + + case "close": + closeCode := rule.Action.CloseCode + if closeCode <= 0 { + closeCode = websocket.CloseNormalClosure + } + if clientConn, ok := wc.ClientConn.(*websocket.Conn); ok { + closeMsg := websocket.FormatCloseMessage(closeCode, "") + clientConn.WriteMessage(websocket.CloseMessage, closeMsg) + clientConn.UnderlyingConn().Close() + } + if serverConn, ok := wc.ServerConn.(*websocket.Conn); ok { + serverConn.UnderlyingConn().Close() + } + wc.MarkClosed() + session.EventLog.Append(Event{ + Type: "ws_disconnect", + Initiator: "proxy", + CloseCode: closeCode, + }) + return + + case "suppress_onwards": + session.mu.Lock() + if direction == "server_to_client" { + session.suppressServerToClient = true + } else { + session.suppressClientToServer = true + } + session.mu.Unlock() + // Don't forward this frame either + + default: + // Unknown action — passthrough + if err := dst.WriteMessage(msgType, data); err != nil { + wc.MarkClosed() + return + } + } + } +} + +// scheduleTemporalTriggers sets up delay_after_ws_connect timers. +func scheduleTemporalTriggers(session *Session, wc *WsConnection) { + session.mu.Lock() + defer session.mu.Unlock() + + for _, rule := range session.Rules { + if rule.Match.Type != "delay_after_ws_connect" { + continue + } + + r := rule // capture for closure + delayMs := r.Match.DelayMs + if delayMs <= 0 { + delayMs = 0 + } + + timer := time.AfterFunc(time.Duration(delayMs)*time.Millisecond, func() { + if wc.IsClosed() { + return + } + log.Printf("session %s: temporal trigger fired (delay %dms): %s", session.ID, delayMs, r.Action.Type) + executeTemporalAction(session, wc, r) + session.FireRule(r) + }) + wc.AddTimer(timer) + } +} + +// executeTemporalAction executes an action from a temporal trigger. 
+func executeTemporalAction(session *Session, wc *WsConnection, rule *Rule) { + ruleLabel := rule.Comment + if ruleLabel == "" { + ruleLabel = "temporal-trigger" + } + + switch rule.Action.Type { + case "disconnect": + if clientConn, ok := wc.ClientConn.(*websocket.Conn); ok { + clientConn.UnderlyingConn().Close() + } + if serverConn, ok := wc.ServerConn.(*websocket.Conn); ok { + serverConn.UnderlyingConn().Close() + } + wc.MarkClosed() + session.EventLog.Append(Event{ + Type: "ws_disconnect", + Initiator: "proxy", + RuleMatched: &ruleLabel, + }) + + case "close": + closeCode := rule.Action.CloseCode + if closeCode <= 0 { + closeCode = websocket.CloseNormalClosure + } + if clientConn, ok := wc.ClientConn.(*websocket.Conn); ok { + closeMsg := websocket.FormatCloseMessage(closeCode, "") + clientConn.WriteMessage(websocket.CloseMessage, closeMsg) + clientConn.UnderlyingConn().Close() + } + if serverConn, ok := wc.ServerConn.(*websocket.Conn); ok { + serverConn.UnderlyingConn().Close() + } + wc.MarkClosed() + session.EventLog.Append(Event{ + Type: "ws_disconnect", + Initiator: "proxy", + CloseCode: closeCode, + RuleMatched: &ruleLabel, + }) + + case "inject_to_client": + if clientConn, ok := wc.ClientConn.(*websocket.Conn); ok { + clientConn.WriteMessage(websocket.TextMessage, rule.Action.Message) + session.EventLog.Append(Event{ + Type: "ws_frame", + Direction: "server_to_client", + Message: rule.Action.Message, + Initiator: "proxy", + RuleMatched: &ruleLabel, + }) + } + + case "inject_to_client_and_close": + if clientConn, ok := wc.ClientConn.(*websocket.Conn); ok { + clientConn.WriteMessage(websocket.TextMessage, rule.Action.Message) + session.EventLog.Append(Event{ + Type: "ws_frame", + Direction: "server_to_client", + Message: rule.Action.Message, + Initiator: "proxy", + RuleMatched: &ruleLabel, + }) + closeCode := rule.Action.CloseCode + if closeCode <= 0 { + closeCode = websocket.CloseNormalClosure + } + closeMsg := websocket.FormatCloseMessage(closeCode, "") + 
clientConn.WriteMessage(websocket.CloseMessage, closeMsg) + clientConn.UnderlyingConn().Close() + } + if serverConn, ok := wc.ServerConn.(*websocket.Conn); ok { + serverConn.UnderlyingConn().Close() + } + wc.MarkClosed() + session.EventLog.Append(Event{ + Type: "ws_disconnect", + Initiator: "proxy", + RuleMatched: &ruleLabel, + }) + + case "suppress_onwards": + session.mu.Lock() + session.suppressServerToClient = true + session.mu.Unlock() + session.EventLog.Append(Event{ + Type: "action", + Initiator: "proxy", + Message: mustMarshal(map[string]string{"type": "suppress_onwards", "direction": "server_to_client"}), + RuleMatched: &ruleLabel, + }) + } +} diff --git a/uts/realtime/integration/helpers/proxy.md b/uts/realtime/integration/helpers/proxy.md new file mode 100644 index 000000000..a157d7196 --- /dev/null +++ b/uts/realtime/integration/helpers/proxy.md @@ -0,0 +1,233 @@ +# Proxy Infrastructure for Integration Tests + +## Overview + +The Ably test proxy is a programmable HTTP/WebSocket proxy (Go, at `uts/test/proxy/`) that sits between the SDK under test and the Ably sandbox. It transparently forwards traffic by default, but can be configured with rules to inject faults — dropped connections, modified responses, injected protocol messages, delayed frames, etc. + +Proxy integration tests use this to verify fault-handling behaviour against the real Ably backend, providing higher confidence than unit tests with mocked transports. + +## When to Use Proxy Tests vs Unit Tests vs Direct Sandbox Tests + +| Test type | When to use | +|-----------|-------------| +| **Unit test** (mock HTTP/WebSocket) | Testing client-side logic, state machines, request formation, error parsing. Fast, deterministic. | +| **Direct sandbox integration** | Testing happy-path behaviour: connect, publish, subscribe, presence. No fault injection needed. 
| +| **Proxy integration test** | Testing fault behaviour against the real backend: connection failures, resume, heartbeat starvation, token renewal under network errors, channel error injection. | + +## Proxy Session Lifecycle + +```pseudo +# 1. Create a proxy session with rules +session = create_proxy_session( + endpoint: "sandbox", + port: allocated_port, + rules: [ ...rules... ] +) + +# 2. Connect SDK through proxy +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", # REC1b2: sets both restHost and realtimeHost + port: session.proxy_port, + tls: false, + useBinaryProtocol: false, # Required: the SDK under test doesn't implement msgpack + autoConnect: false + # Note: explicit hostname endpoint automatically disables fallback hosts (REC2c2) +)) + +# 3. Run test scenario +client.connect() +AWAIT_STATE client.connection.state == ConnectionState.connected + +# 4. (Optional) Add rules dynamically or trigger imperative actions +session.add_rules(new_rules, position: "prepend") +session.trigger_action({ type: "disconnect" }) + +# 5. (Optional) Verify proxy event log +log = session.get_log() +ASSERT log CONTAINS event WHERE type == "ws_connect" AND queryParams.resume IS NOT null + +# 6. Clean up +client.close() +session.close() +``` + +## Proxy Session Interface + +```pseudo +interface ProxySession: + session_id: String + proxy_host: String # Always "localhost" + proxy_port: Int # Assigned from port pool + + add_rules(rules: List, position?: "append"|"prepend") + trigger_action(action: ActionRequest) + get_log(): List + close() + +function create_proxy_session( + endpoint: String, # e.g. "sandbox" → resolves to sandbox-realtime.ably.io / sandbox-rest.ably.io + port: Int, + rules?: List, + timeoutMs?: Int # Session auto-cleanup timeout (default 30000) +): ProxySession +``` + +## Rule Format + +Each rule has a **match** condition, an **action** to perform, and an optional **times** limit: + +```json +{ + "match": { ... }, + "action": { ... 
}, + "times": 1, + "comment": "human-readable label" +} +``` + +Rules are evaluated in order. First matching rule wins. Unmatched traffic passes through unchanged. When `times` is specified, the rule auto-removes after that many firings. + +### Match Conditions + +```json +// WebSocket connection attempt +{ "type": "ws_connect" } +{ "type": "ws_connect", "count": 2 } +{ "type": "ws_connect", "queryContains": { "resume": "*" } } + +// WebSocket frame: server → client +{ "type": "ws_frame_to_client", "action": "CONNECTED" } +{ "type": "ws_frame_to_client", "action": "ATTACHED", "channel": "my-channel" } + +// WebSocket frame: client → server +{ "type": "ws_frame_to_server", "action": "ATTACH", "channel": "my-channel" } + +// HTTP request +{ "type": "http_request", "pathContains": "/channels/" } +{ "type": "http_request", "method": "POST", "pathContains": "/keys/" } + +// Temporal trigger (fires once after delay from WS connect) +{ "type": "delay_after_ws_connect", "delayMs": 5000 } +``` + +**`count`**: 1-based occurrence counter. `count: 2` matches only the 2nd occurrence. + +### Actions + +```json +// Passthrough (default for unmatched traffic) +{ "type": "passthrough" } + +// Connection-level +{ "type": "refuse_connection" } +{ "type": "accept_and_close", "closeCode": 1011 } +{ "type": "disconnect" } +{ "type": "close", "closeCode": 1000 } + +// Frame manipulation +{ "type": "suppress" } +{ "type": "delay", "delayMs": 2000 } +{ "type": "inject_to_client", "message": { "action": 6, ... } } +{ "type": "inject_to_client_and_close", "message": { "action": 6, ... }, "closeCode": 1000 } +{ "type": "replace", "message": { "action": 4, ... } } +{ "type": "suppress_onwards" } + +// HTTP +{ "type": "http_respond", "status": 401, "body": { ... } } +{ "type": "http_delay", "delayMs": 5000 } +{ "type": "http_drop" } +``` + +### Imperative Actions + +For cases where timed rules are awkward (e.g. 
"drop the connection NOW"): + +```json +{ "type": "disconnect" } +{ "type": "close", "closeCode": 1000 } +{ "type": "inject_to_client", "message": { ... } } +{ "type": "inject_to_client_and_close", "message": { ... } } +``` + +## Event Log + +The proxy records all traffic through a session. The log can be retrieved to verify test assertions. + +```json +{ + "events": [ + { "type": "ws_connect", "url": "ws://...", "queryParams": { "key": "..." } }, + { "type": "ws_frame", "direction": "server_to_client", "message": { "action": 4, ... } }, + { "type": "ws_disconnect", "initiator": "proxy", "closeCode": 1006 }, + { "type": "http_request", "method": "GET", "path": "/channels/test/messages" }, + { "type": "http_response", "status": 200 } + ] +} +``` + +### Common Log Assertions + +```pseudo +# Verify resume was attempted +log = session.get_log() +ws_connects = log.filter(e => e.type == "ws_connect") +ASSERT ws_connects.length >= 2 +ASSERT ws_connects[1].queryParams["resume"] IS NOT null + +# Verify a specific frame was sent +frames = log.filter(e => e.type == "ws_frame" AND e.direction == "client_to_server") +attach_frames = frames.filter(f => f.message.action == 10) # ATTACH +ASSERT attach_frames.length == 1 +``` + +## Protocol Message Action Numbers + +| Name | Number | Direction | +|------|--------|-----------| +| HEARTBEAT | 0 | Both | +| ACK | 1 | Server → Client | +| NACK | 2 | Server → Client | +| CONNECT | 3 | Client → Server | +| CONNECTED | 4 | Server → Client | +| DISCONNECT | 5 | Client → Server | +| DISCONNECTED | 6 | Server → Client | +| CLOSE | 7 | Client → Server | +| CLOSED | 8 | Server → Client | +| ERROR | 9 | Server → Client | +| ATTACH | 10 | Client → Server | +| ATTACHED | 11 | Server → Client | +| DETACH | 12 | Client → Server | +| DETACHED | 13 | Server → Client | +| PRESENCE | 14 | Both | +| MESSAGE | 15 | Both | +| SYNC | 16 | Server → Client | +| AUTH | 17 | Client → Server | + +## SDK ClientOptions for Proxy Tests + +All proxy integration 
tests should configure the SDK with: + +```pseudo +ClientOptions( + key: api_key, + endpoint: "localhost", # REC1b2: sets both restHost and realtimeHost to "localhost" + port: proxy_port, # The proxy session's assigned port + tls: false, # Proxy serves plain HTTP/WS; TLS only upstream + useBinaryProtocol: false, # Required: SDK doesn't implement msgpack + autoConnect: false # Explicit connect for test control + # fallbackHosts: not needed — endpoint="localhost" auto-disables fallbacks (REC2c2) +) +``` + +## Conventions for Proxy Integration Test Specs + +1. Each test references the spec point AND the corresponding unit test +2. Tests use `create_proxy_session()` with rules, then connect SDK through the proxy +3. Tests use `AWAIT_STATE` for state assertions and record state changes for sequence verification +4. Tests verify behaviour via SDK state AND proxy event log where useful +5. All tests use `useBinaryProtocol: false` (SDK doesn't implement msgpack) +6. All tests use `endpoint: "localhost"` which auto-disables fallback hosts (REC2c2) +7. Timeouts are generous (10-30s) since real network is involved +8. Each test file provisions a sandbox app in `BEFORE ALL TESTS` and cleans up in `AFTER ALL TESTS` +9. Each test creates its own proxy session and cleans it up after diff --git a/uts/realtime/integration/proxy/channel_faults.md b/uts/realtime/integration/proxy/channel_faults.md new file mode 100644 index 000000000..b25f908c2 --- /dev/null +++ b/uts/realtime/integration/proxy/channel_faults.md @@ -0,0 +1,553 @@ +# Channel Fault Proxy Integration Tests + +Spec points: `RTL4f`, `RTL4h`, `RTL5f`, `RTL13a`, `RTL14` + +## Test Type + +Proxy integration test against Ably Sandbox endpoint + +## Proxy Infrastructure + +See `uts/test/realtime/integration/helpers/proxy.md` for the full proxy infrastructure specification. 
+ +## Corresponding Unit Tests + +- `uts/test/realtime/unit/channels/channel_attach.md` -- RTL4f (attach timeout), RTL4h (server error on attach) +- `uts/test/realtime/unit/channels/channel_detach.md` -- RTL5f (detach timeout) +- `uts/test/realtime/unit/channels/channel_server_initiated_detach.md` -- RTL13a (unsolicited DETACHED triggers reattach) +- `uts/test/realtime/unit/channels/channel_error.md` -- RTL14 (channel ERROR transitions to FAILED) + +## Sandbox Setup + +Tests run against the Ably Sandbox via a programmable proxy. + +### App Provisioning + +```pseudo +BEFORE ALL TESTS: + # Provision test app + response = POST https://sandbox-rest.ably.io/apps + WITH body from ably-common/test-resources/test-app-setup.json + + app_config = parse_json(response.body) + api_key = app_config.keys[0].key_str + app_id = app_config.app_id + +AFTER ALL TESTS: + # Clean up test app + DELETE https://sandbox-rest.ably.io/apps/{app_id} + WITH Authorization: Basic {api_key} +``` + +### Common Cleanup + +```pseudo +AFTER EACH TEST: + IF client IS NOT null AND client.connection.state IN [connected, connecting, disconnected]: + client.connection.close() + AWAIT_STATE client.connection.state == ConnectionState.closed + WITH timeout: 10 seconds + IF session IS NOT null: + session.close() +``` + +--- + +## Test 13: RTL4f -- Attach timeout (server doesn't respond) + +| Spec | Requirement | +|------|-------------| +| RTL4f | If an ATTACHED ProtocolMessage is not received within realtimeRequestTimeout, the attach request should be treated as though it has failed and the channel should transition to the SUSPENDED state | + +Tests that when the proxy suppresses the client's ATTACH message so the server never sees it, the SDK's attach timer fires and the channel transitions to SUSPENDED. This verifies the same behaviour as the unit test but with a real Ably connection and real clock timing. 
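This test hinges on the proxy matching the client's ATTACH frame by direction, protocol action name, and channel, and swallowing only those frames. The matching semantics can be sketched as a minimal Python model; the function and dict shapes here are illustrative, not the proxy's actual API:

```python
# Protocol action names map to numeric action codes carried in the frame
# (see the action-number table in proxy.md).
ACTION_NUMBERS = {"ATTACH": 10, "ATTACHED": 11, "DETACH": 12}

def frame_matches(match: dict, direction: str, frame: dict) -> bool:
    """Illustrative model: a frame matches when every field in the match condition agrees."""
    expected_direction = {
        "ws_frame_to_server": "client_to_server",
        "ws_frame_to_client": "server_to_client",
    }.get(match["type"])
    if expected_direction != direction:
        return False
    if "action" in match and frame.get("action") != ACTION_NUMBERS[match["action"]]:
        return False
    # Channel is only constrained when the rule names one.
    if "channel" in match and frame.get("channel") != match["channel"]:
        return False
    return True

# The RTL4f rule: suppress ATTACH for this test's channel only.
rule_match = {"type": "ws_frame_to_server", "action": "ATTACH", "channel": "test-RTL4f-abc"}

assert frame_matches(rule_match, "client_to_server", {"action": 10, "channel": "test-RTL4f-abc"})
assert not frame_matches(rule_match, "client_to_server", {"action": 10, "channel": "other"})
assert not frame_matches(rule_match, "server_to_client", {"action": 11, "channel": "test-RTL4f-abc"})
```

The same matcher shape covers the `ws_frame_to_client` rules used elsewhere in this file; only the direction differs.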
+ +### Setup + +```pseudo +channel_name = "test-RTL4f-${random_id()}" + +# Create proxy session that suppresses ATTACH messages for our channel +session = create_proxy_session( + endpoint: "sandbox", + port: allocated_port, + rules: [{ + "match": { "type": "ws_frame_to_server", "action": "ATTACH", "channel": channel_name }, + "action": { "type": "suppress" }, + "comment": "RTL4f: Suppress ATTACH so server never responds" + }] +) + +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false, + autoConnect: false, + realtimeRequestTimeout: 3000 +)) + +channel = client.channels.get(channel_name) +``` + +### Test Steps + +```pseudo +# Record channel state changes for sequence verification +channel_state_changes = [] +channel.on((change) => { + channel_state_changes.append(change.current) +}) + +# Connect through proxy -- connection itself is not faulted +client.connect() +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15 seconds + +# Start attach -- proxy will suppress the ATTACH, so server never responds +attach_future = channel.attach() + +# Channel should enter ATTACHING immediately +AWAIT_STATE channel.state == ChannelState.attaching + WITH timeout: 5 seconds + +# Wait for the channel to transition to SUSPENDED after realtimeRequestTimeout +AWAIT_STATE channel.state == ChannelState.suspended + WITH timeout: 15 seconds + +# The attach() call should have failed with a timeout error +AWAIT attach_future FAILS WITH error +``` + +### Assertions + +```pseudo +# Channel transitioned to SUSPENDED +ASSERT channel.state == ChannelState.suspended + +# Error indicates timeout +ASSERT error IS NOT null + +# State sequence: ATTACHING -> SUSPENDED +ASSERT channel_state_changes CONTAINS_IN_ORDER [ + ChannelState.attaching, + ChannelState.suspended +] + +# Connection remains CONNECTED (attach timeout is channel-scoped) +ASSERT client.connection.state == 
ConnectionState.connected + +# Proxy log confirms the ATTACH was suppressed (never forwarded to server) +log = session.get_log() +attach_frames_to_server = log.filter(e => + e.type == "ws_frame" AND + e.direction == "client_to_server" AND + e.message.action == 10 AND + e.message.channel == channel_name +) +ASSERT attach_frames_to_server.length == 0 +``` + +--- + +## Test 14: RTL4h / RTL14 -- Server responds with ERROR to ATTACH + +| Spec | Requirement | +|------|-------------| +| RTL4h | If an ERROR ProtocolMessage is received for the channel during ATTACHING, the channel transitions to FAILED | +| RTL14 | If an ERROR ProtocolMessage is received for this channel, the channel should immediately transition to the FAILED state | + +Tests that when the proxy replaces the server's ATTACHED response with a channel-scoped ERROR, the SDK transitions the channel to FAILED with the injected error. The connection should remain CONNECTED. + +### Setup + +```pseudo +channel_name = "test-RTL4h-${random_id()}" + +# Create proxy session that replaces ATTACHED with channel ERROR +session = create_proxy_session( + endpoint: "sandbox", + port: allocated_port, + rules: [{ + "match": { "type": "ws_frame_to_client", "action": "ATTACHED", "channel": channel_name }, + "action": { + "type": "replace", + "message": { + "action": 9, + "channel": channel_name, + "error": { "code": 40160, "statusCode": 403, "message": "Not permitted" } + } + }, + "times": 1, + "comment": "RTL4h: Replace ATTACHED with channel ERROR" + }] +) + +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false, + autoConnect: false +)) + +channel = client.channels.get(channel_name) +``` + +### Test Steps + +```pseudo +# Record channel state changes for sequence verification +channel_state_changes = [] +channel.on((change) => { + channel_state_changes.append(change.current) +}) + +# Connect through proxy +client.connect() 
+AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15 seconds + +# Attach -- proxy replaces ATTACHED with ERROR +AWAIT channel.attach() FAILS WITH error + +# Channel should be in FAILED state +AWAIT_STATE channel.state == ChannelState.failed + WITH timeout: 10 seconds +``` + +### Assertions + +```pseudo +# Channel transitioned to FAILED +ASSERT channel.state == ChannelState.failed + +# Error reason matches the injected error +ASSERT channel.errorReason IS NOT null +ASSERT channel.errorReason.code == 40160 +ASSERT channel.errorReason.statusCode == 403 + +# The error returned from attach() matches +ASSERT error IS NOT null +ASSERT error.code == 40160 + +# State sequence: ATTACHING -> FAILED +ASSERT channel_state_changes CONTAINS_IN_ORDER [ + ChannelState.attaching, + ChannelState.failed +] + +# Connection remains CONNECTED (channel error does not affect connection) +ASSERT client.connection.state == ConnectionState.connected +``` + +--- + +## Test 15: RTL5f -- Detach timeout (server doesn't respond) + +| Spec | Requirement | +|------|-------------| +| RTL5f | If a DETACHED ProtocolMessage is not received within realtimeRequestTimeout, the detach request should be treated as though it has failed and the channel will return to its previous state | + +Tests that when the channel is attached normally and then the proxy suppresses the DETACH message, the SDK's detach timer fires and the channel reverts to ATTACHED. This requires a two-phase proxy configuration: first allow normal attach, then add a rule to suppress DETACH. 
+ +### Setup + +```pseudo +channel_name = "test-RTL5f-${random_id()}" + +# Phase 1: Create proxy session with NO fault rules (clean passthrough) +session = create_proxy_session( + endpoint: "sandbox", + port: allocated_port, + rules: [] +) + +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false, + autoConnect: false, + realtimeRequestTimeout: 3000 +)) + +channel = client.channels.get(channel_name) +``` + +### Test Steps + +```pseudo +# Record channel state changes for sequence verification +channel_state_changes = [] +channel.on((change) => { + channel_state_changes.append(change.current) +}) + +# Phase 1: Connect and attach normally through proxy +client.connect() +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15 seconds + +AWAIT channel.attach() +ASSERT channel.state == ChannelState.attached + +# Clear state change history from the attach phase +channel_state_changes.clear() + +# Phase 2: Add rule to suppress DETACH messages +session.add_rules([{ + "match": { "type": "ws_frame_to_server", "action": "DETACH", "channel": channel_name }, + "action": { "type": "suppress" }, + "comment": "RTL5f: Suppress DETACH so server never responds" +}], position: "prepend") + +# Phase 3: Try to detach -- proxy suppresses DETACH, so server never sends DETACHED +detach_future = channel.detach() + +# Channel should enter DETACHING +AWAIT_STATE channel.state == ChannelState.detaching + WITH timeout: 5 seconds + +# Wait for the channel to revert to ATTACHED after realtimeRequestTimeout +AWAIT_STATE channel.state == ChannelState.attached + WITH timeout: 15 seconds + +# The detach() call should have failed with a timeout error +AWAIT detach_future FAILS WITH error +``` + +### Assertions + +```pseudo +# Channel reverted to ATTACHED (previous state) +ASSERT channel.state == ChannelState.attached + +# Error indicates timeout +ASSERT error IS NOT null + +# 
State sequence: DETACHING -> ATTACHED (revert) +ASSERT channel_state_changes CONTAINS_IN_ORDER [ + ChannelState.detaching, + ChannelState.attached +] + +# Connection remains CONNECTED +ASSERT client.connection.state == ConnectionState.connected +``` + +--- + +## Test 16: RTL13a -- Server sends unsolicited DETACHED, channel re-attaches + +| Spec | Requirement | +|------|-------------| +| RTL13a | If the channel is ATTACHED and receives a server-initiated DETACHED, an immediate reattach attempt should be made by sending ATTACH, transitioning to ATTACHING with the error from the DETACHED message | + +Tests that when the proxy injects an unsolicited DETACHED message for an attached channel, the SDK automatically re-attaches. The proxy passes all normal traffic through, and the re-attach ATTACH/ATTACHED exchange completes against the real server. + +### Setup + +```pseudo +channel_name = "test-RTL13a-${random_id()}" + +# Create proxy session with clean passthrough (no fault rules initially) +session = create_proxy_session( + endpoint: "sandbox", + port: allocated_port, + rules: [] +) + +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false, + autoConnect: false +)) + +channel = client.channels.get(channel_name) +``` + +### Test Steps + +```pseudo +# Connect and attach normally through proxy +client.connect() +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15 seconds + +AWAIT channel.attach() +ASSERT channel.state == ChannelState.attached + +# Record channel state changes from this point +channel_state_changes = [] +channel.on((change) => { + channel_state_changes.append(change.current) +}) + +# Inject an unsolicited DETACHED message with error via imperative action +session.trigger_action({ + type: "inject_to_client", + message: { + "action": 13, + "channel": channel_name, + "error": { "code": 90198, "statusCode": 500, "message": "Channel 
detached by server" } + } +}) + +# Channel should transition ATTACHING (reattach) -> ATTACHED (reattach succeeds) +AWAIT_STATE channel.state == ChannelState.attached + WITH timeout: 15 seconds +``` + +### Assertions + +```pseudo +# Channel re-attached successfully +ASSERT channel.state == ChannelState.attached + +# State sequence: ATTACHING (with error from DETACHED) -> ATTACHED +ASSERT channel_state_changes CONTAINS_IN_ORDER [ + ChannelState.attaching, + ChannelState.attached +] + +# Connection remains CONNECTED throughout +ASSERT client.connection.state == ConnectionState.connected + +# Proxy log shows the re-attach ATTACH message from the client +log = session.get_log() +attach_frames = log.filter(e => + e.type == "ws_frame" AND + e.direction == "client_to_server" AND + e.message.action == 10 AND + e.message.channel == channel_name +) +# At least 2 ATTACH frames: initial attach + reattach after injected DETACHED +ASSERT attach_frames.length >= 2 +``` + +--- + +## Test 17: RTL14 -- Server sends channel ERROR, channel goes FAILED + +| Spec | Requirement | +|------|-------------| +| RTL14 | If an ERROR ProtocolMessage is received for this channel, the channel should immediately transition to the FAILED state, and the RealtimeChannel.errorReason should be set | + +Tests that when the proxy injects a channel-scoped ERROR message for an attached channel, the SDK transitions the channel to FAILED. The connection should remain CONNECTED because the error is channel-scoped, not connection-scoped. 
+ +### Setup + +```pseudo +channel_name = "test-RTL14-${random_id()}" + +# Create proxy session with clean passthrough +session = create_proxy_session( + endpoint: "sandbox", + port: allocated_port, + rules: [] +) + +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false, + autoConnect: false +)) + +channel = client.channels.get(channel_name) +``` + +### Test Steps + +```pseudo +# Connect and attach normally through proxy +client.connect() +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15 seconds + +AWAIT channel.attach() +ASSERT channel.state == ChannelState.attached + +# Record channel state changes from this point +channel_state_changes = [] +channel.on((change) => { + channel_state_changes.append(change.current) +}) + +# Inject a channel-scoped ERROR message via imperative action +session.trigger_action({ + type: "inject_to_client", + message: { + "action": 9, + "channel": channel_name, + "error": { "code": 40160, "statusCode": 403, "message": "Not permitted" } + } +}) + +# Channel should transition to FAILED +AWAIT_STATE channel.state == ChannelState.failed + WITH timeout: 10 seconds +``` + +### Assertions + +```pseudo +# Channel transitioned to FAILED +ASSERT channel.state == ChannelState.failed + +# errorReason is set from the injected ERROR +ASSERT channel.errorReason IS NOT null +ASSERT channel.errorReason.code == 40160 +ASSERT channel.errorReason.statusCode == 403 +ASSERT channel.errorReason.message CONTAINS "Not permitted" + +# State change event shows ATTACHED -> FAILED +ASSERT channel_state_changes CONTAINS_IN_ORDER [ + ChannelState.failed +] +ASSERT length(channel_state_changes) == 1 + +# Connection remains CONNECTED (channel-scoped ERROR does not close connection) +ASSERT client.connection.state == ConnectionState.connected +``` + +--- + +## Integration Test Notes + +### Why Proxy Tests vs Unit Tests + +These tests 
verify the same spec points as the unit tests, but provide higher confidence because: + +1. **Real WebSocket connections** -- the SDK's actual transport layer is exercised +2. **Real Ably protocol** -- the proxy modifies real server responses, not synthetic mocks +3. **Real timing** -- timeout behaviour (RTL4f, RTL5f) is tested with actual clocks, not fake timers +4. **Real server interaction** -- the reattach in RTL13a completes against the live sandbox, verifying the full round-trip + +### Timeout Handling + +All `AWAIT_STATE` calls use generous timeouts because real network traffic is involved: +- Connection to CONNECTED via proxy: 15 seconds +- Channel state transitions with real server: 15 seconds +- Timeout-based transitions (RTL4f, RTL5f): realtimeRequestTimeout + 12 seconds headroom +- Cleanup close: 10 seconds + +### Channel Names + +Each test uses a unique channel name with a random component (`${random_id()}`) to avoid interference between tests running in the same sandbox app. + +### Two-Phase Proxy Configuration + +Test 15 (RTL5f) uses a two-phase approach: +1. Start with clean passthrough rules to allow normal connection and attach +2. Dynamically add fault rules via `session.add_rules()` before the detach attempt + +This avoids needing separate proxy sessions for the attach and detach phases. diff --git a/uts/realtime/integration/proxy/connection_open_failures.md b/uts/realtime/integration/proxy/connection_open_failures.md new file mode 100644 index 000000000..5a924cb02 --- /dev/null +++ b/uts/realtime/integration/proxy/connection_open_failures.md @@ -0,0 +1,495 @@ +# Connection Opening Failures — Proxy Integration Tests + +Spec points: `RTN14a`, `RTN14b`, `RTN14c`, `RTN14d`, `RTN14g` + +## Test Type + +Proxy integration test against Ably Sandbox endpoint + +## Proxy Infrastructure + +See `uts/test/realtime/integration/helpers/proxy.md` for the full proxy infrastructure specification. 
+ +## Corresponding Unit Tests + +See `uts/test/realtime/unit/connection/connection_open_failures_test.md` for the corresponding unit tests that verify the same spec points with a mocked WebSocket. + +## Sandbox Setup + +Tests run against the Ably Sandbox via a programmable proxy. + +### App Provisioning + +```pseudo +BEFORE ALL TESTS: + # Provision test app + response = POST https://sandbox-rest.ably.io/apps + WITH body from ably-common/test-resources/test-app-setup.json + + app_config = parse_json(response.body) + api_key = app_config.keys[0].key_str + app_id = app_config.app_id + +AFTER ALL TESTS: + # Clean up test app + DELETE https://sandbox-rest.ably.io/apps/{app_id} + WITH Authorization: Basic {api_key} +``` + +### Common Cleanup + +```pseudo +AFTER EACH TEST: + IF client IS NOT null AND client.connection.state IN [connected, connecting, disconnected]: + client.connection.close() + AWAIT_STATE client.connection.state == ConnectionState.closed + WITH timeout: 10 seconds + IF session IS NOT null: + session.close() +``` + +--- + +## RTN14a — Fatal error during connection open causes FAILED + +| Spec | Requirement | +|------|-------------| +| RTN14a | If the connection attempt encounters a fatal error (non-token error), the connection transitions to FAILED | + +Tests that when the server responds with a fatal ERROR (non-token error code) during connection open, the SDK transitions to FAILED and sets errorReason. This verifies the same behaviour as the unit test but against the real Ably sandbox with fault injection. 
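The `replace` action with `"times": 1` means only the first CONNECTED is swapped for the injected ERROR; if the SDK retried the connection, a second CONNECTED would pass through untouched. That one-shot behaviour can be modelled as follows (hypothetical names, not the proxy's internals):

```python
def forward_to_client(rule: dict, frame: dict) -> dict:
    """Illustrative model: return the frame the client actually receives.
    While the rule's "times" budget lasts, a CONNECTED (action 4) from the
    server is replaced with the configured message; afterwards, passthrough."""
    if rule.get("times", 0) > 0 and frame.get("action") == 4:
        rule["times"] -= 1
        return rule["action"]["message"]
    return frame

rule = {
    "match": {"type": "ws_frame_to_client", "action": "CONNECTED"},
    "action": {"type": "replace",
               "message": {"action": 9, "error": {"code": 40005, "statusCode": 400,
                                                  "message": "Invalid key"}}},
    "times": 1,
}

first = forward_to_client(rule, {"action": 4, "connectionId": "abc"})
assert first["action"] == 9 and first["error"]["code"] == 40005  # client sees fatal ERROR

second = forward_to_client(rule, {"action": 4, "connectionId": "abc"})
assert second["action"] == 4  # rule exhausted: a real CONNECTED passes through
```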
+ +### Setup + +```pseudo +# Create proxy session that replaces the first CONNECTED with a fatal ERROR +session = create_proxy_session( + endpoint: "sandbox", + port: allocated_port, + rules: [{ + "match": { "type": "ws_frame_to_client", "action": "CONNECTED" }, + "action": { + "type": "replace", + "message": { + "action": 9, + "error": { "code": 40005, "statusCode": 400, "message": "Invalid key" } + } + }, + "times": 1, + "comment": "RTN14a: Replace CONNECTED with fatal ERROR" + }] +) + +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false, + autoConnect: false +)) +``` + +### Test Steps + +```pseudo +# Record state changes for sequence verification +state_changes = [] +client.connection.on((change) => { + state_changes.append(change.current) +}) + +# Start connection +client.connect() + +# Wait for FAILED state +AWAIT_STATE client.connection.state == ConnectionState.failed + WITH timeout: 15 seconds +``` + +### Assertions + +```pseudo +# Connection transitioned to FAILED +ASSERT client.connection.state == ConnectionState.failed + +# Error reason is set from the injected ERROR message +ASSERT client.connection.errorReason IS NOT null +ASSERT client.connection.errorReason.code == 40005 +ASSERT client.connection.errorReason.statusCode == 400 + +# State sequence includes CONNECTING -> FAILED +ASSERT state_changes CONTAINS_IN_ORDER [ + ConnectionState.connecting, + ConnectionState.failed +] + +# Connection ID/key not set (never received real CONNECTED) +ASSERT client.connection.id IS null +ASSERT client.connection.key IS null +``` + +--- + +## RTN14b — Token error during connection, SDK renews and reconnects + +| Spec | Requirement | +|------|-------------| +| RTN14b | If a token error (40140-40149) occurs during connection and the token is renewable, attempt to obtain a new token and retry | + +Tests that when the server responds with a token error during the first 
connection attempt, the SDK renews the token via authCallback and successfully connects on the second attempt. The proxy intercepts only the first CONNECTED, replacing it with a 40142 error; the second attempt passes through. + +### Setup + +```pseudo +# Track authCallback invocations +auth_callback_count = 0 + +# Create proxy session that injects token error on first CONNECTED only +session = create_proxy_session( + endpoint: "sandbox", + port: allocated_port, + rules: [{ + "match": { "type": "ws_frame_to_client", "action": "CONNECTED" }, + "action": { + "type": "replace", + "message": { + "action": 9, + "error": { "code": 40142, "statusCode": 401, "message": "Token expired" } + } + }, + "times": 1, + "comment": "RTN14b: Token error on first connect, renewal should succeed" + }] +) + +# Use token auth with authCallback so the SDK can renew +client = Realtime(options: ClientOptions( + authCallback: (params) => { + auth_callback_count++ + # Request a token from the sandbox using the API key + token_details = request_token_from_sandbox(api_key, params) + RETURN token_details + }, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false, + autoConnect: false +)) +``` + +### Test Steps + +```pseudo +# Record state changes for sequence verification +state_changes = [] +client.connection.on((change) => { + state_changes.append(change.current) +}) + +# Start connection +client.connect() + +# SDK should see token error, renew token, reconnect, and reach CONNECTED +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 30 seconds +``` + +### Assertions + +```pseudo +# Successfully connected after token renewal +ASSERT client.connection.state == ConnectionState.connected + +# Connection properties are set (from the real CONNECTED on second attempt) +ASSERT client.connection.id IS NOT null +ASSERT client.connection.key IS NOT null + +# authCallback was called at least twice (initial token + renewal) +ASSERT 
auth_callback_count >= 2 + +# State sequence shows the SDK went through CONNECTING, then back to CONNECTING after error, +# and finally reached CONNECTED +ASSERT state_changes CONTAINS_IN_ORDER [ + ConnectionState.connecting, + ConnectionState.connected +] + +# Proxy event log shows two WebSocket connections +log = session.get_log() +ws_connects = log.filter(e => e.type == "ws_connect") +ASSERT ws_connects.length >= 2 + +# No residual error reason on successful connection +ASSERT client.connection.errorReason IS null +``` + +--- + +## RTN14d — Retry after connection refused + +| Spec | Requirement | +|------|-------------| +| RTN14d | After a recoverable connection failure, the client transitions to DISCONNECTED and automatically retries after disconnectedRetryTimeout | + +Tests that when the first WebSocket connection is refused at the transport level, the SDK transitions to DISCONNECTED, waits for the retry timeout, and successfully connects on the second attempt. The proxy refuses the first connection and passes through the second. 
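The `"count": 1` match is a 1-based occurrence counter (assumed here to be scoped to the proxy session), so only the very first WebSocket attempt is refused and the retry after `disconnectedRetryTimeout` passes through. A minimal Python sketch of that counting, with illustrative names only:

```python
class ConnectGate:
    """Illustrative model of a one-shot ws_connect refusal rule."""

    def __init__(self, refuse_on_count: int):
        self.refuse_on_count = refuse_on_count
        self.attempts = 0  # 1-based occurrence counter across the session

    def on_ws_connect(self) -> str:
        self.attempts += 1
        if self.attempts == self.refuse_on_count:
            return "refuse_connection"
        return "passthrough"

gate = ConnectGate(refuse_on_count=1)
assert gate.on_ws_connect() == "refuse_connection"  # first attempt refused
assert gate.on_ws_connect() == "passthrough"        # retry reaches the sandbox
```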
+ +### Setup + +```pseudo +# Create proxy session that refuses the first WebSocket connection +session = create_proxy_session( + endpoint: "sandbox", + port: allocated_port, + rules: [{ + "match": { "type": "ws_connect", "count": 1 }, + "action": { "type": "refuse_connection" }, + "times": 1, + "comment": "RTN14d: Refuse first WebSocket connection" + }] +) + +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false, + autoConnect: false, + disconnectedRetryTimeout: 2000 +)) +``` + +### Test Steps + +```pseudo +# Record state changes for sequence verification +state_changes = [] +client.connection.on((change) => { + state_changes.append(change.current) +}) + +# Start connection +client.connect() + +# SDK should fail on first attempt, go DISCONNECTED, retry, then reach CONNECTED +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 30 seconds +``` + +### Assertions + +```pseudo +# Successfully connected after retry +ASSERT client.connection.state == ConnectionState.connected + +# Connection properties are set +ASSERT client.connection.id IS NOT null +ASSERT client.connection.key IS NOT null + +# State sequence shows CONNECTING -> DISCONNECTED -> CONNECTING -> CONNECTED +ASSERT state_changes CONTAINS_IN_ORDER [ + ConnectionState.connecting, + ConnectionState.disconnected, + ConnectionState.connecting, + ConnectionState.connected +] + +# Proxy event log shows two WebSocket connection attempts +log = session.get_log() +ws_connects = log.filter(e => e.type == "ws_connect") +ASSERT ws_connects.length >= 2 +``` + +--- + +## RTN14g — Connection-level ERROR during open causes FAILED + +| Spec | Requirement | +|------|-------------| +| RTN14g | If an ERROR ProtocolMessage with empty channel attribute is received, transition to FAILED state and set errorReason | + +Tests that when the server responds with a connection-level ERROR (no channel field) with a 
server error code during connection open, the SDK transitions to FAILED. This is functionally similar to RTN14a, but uses a 5xx server error code rather than a 4xx client error, confirming that any error code outside the token-error range 40140-40149 results in FAILED, whether it is a client or a server error.
+
+### Setup
+
+```pseudo
+# Create proxy session that replaces the first CONNECTED with a server ERROR
+session = create_proxy_session(
+  endpoint: "sandbox",
+  port: allocated_port,
+  rules: [{
+    "match": { "type": "ws_frame_to_client", "action": "CONNECTED" },
+    "action": {
+      "type": "replace",
+      "message": {
+        "action": 9,
+        "error": { "code": 50000, "statusCode": 500, "message": "Internal server error" }
+      }
+    },
+    "times": 1,
+    "comment": "RTN14g: Connection-level ERROR (server error) during open"
+  }]
+)
+
+client = Realtime(options: ClientOptions(
+  key: api_key,
+  endpoint: "localhost",
+  port: session.proxy_port,
+  tls: false,
+  useBinaryProtocol: false,
+  autoConnect: false
+))
+```
+
+### Test Steps
+
+```pseudo
+# Record state changes for sequence verification
+state_changes = []
+client.connection.on((change) => {
+  state_changes.append(change.current)
+})
+
+# Start connection
+client.connect()
+
+# Wait for FAILED state
+AWAIT_STATE client.connection.state == ConnectionState.failed
+  WITH timeout: 15 seconds
+```
+
+### Assertions
+
+```pseudo
+# Connection transitioned to FAILED
+ASSERT client.connection.state == ConnectionState.failed
+
+# Error reason is set from the injected ERROR message
+ASSERT client.connection.errorReason IS NOT null
+ASSERT client.connection.errorReason.code == 50000
+ASSERT client.connection.errorReason.statusCode == 500
+ASSERT client.connection.errorReason.message == "Internal server error"
+
+# State sequence includes CONNECTING -> FAILED
+ASSERT state_changes CONTAINS_IN_ORDER [
+  ConnectionState.connecting,
+  ConnectionState.failed
+]
+
+# Connection ID/key not set
+ASSERT client.connection.id IS null
+ASSERT client.connection.key IS null
+```
+
+---
+
+## RTN14c — 
Connection timeout (no CONNECTED received) + +| Spec | Requirement | +|------|-------------| +| RTN14c | A connection attempt fails if not connected within realtimeRequestTimeout | + +Tests that when the server accepts the WebSocket but never sends a CONNECTED message, the SDK times out and transitions to DISCONNECTED. The proxy suppresses the CONNECTED message from the server, forcing the SDK to rely on its timeout logic. + +### Setup + +```pseudo +# Create proxy session that suppresses all CONNECTED messages +session = create_proxy_session( + endpoint: "sandbox", + port: allocated_port, + rules: [{ + "match": { "type": "ws_frame_to_client", "action": "CONNECTED" }, + "action": { "type": "suppress" }, + "comment": "RTN14c: Suppress CONNECTED to force timeout" + }] +) + +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false, + autoConnect: false, + realtimeRequestTimeout: 3000 +)) +``` + +### Test Steps + +```pseudo +# Record state changes for sequence verification +state_changes = [] +client.connection.on((change) => { + state_changes.append(change.current) +}) + +# Start connection +client.connect() + +# SDK should time out waiting for CONNECTED and transition to DISCONNECTED +AWAIT_STATE client.connection.state == ConnectionState.disconnected + WITH timeout: 15 seconds +``` + +### Assertions + +```pseudo +# Connection timed out and transitioned to DISCONNECTED +ASSERT client.connection.state == ConnectionState.disconnected + +# Error reason indicates timeout +ASSERT client.connection.errorReason IS NOT null +ASSERT client.connection.errorReason.message CONTAINS "timeout" + OR client.connection.errorReason.code IN [50003, 80003] + +# State sequence includes CONNECTING -> DISCONNECTED +ASSERT state_changes CONTAINS_IN_ORDER [ + ConnectionState.connecting, + ConnectionState.disconnected +] + +# Connection ID/key not set (CONNECTED was never received) +ASSERT 
client.connection.id IS null +ASSERT client.connection.key IS null +``` + +--- + +## Integration Test Notes + +### Timeout Handling + +All `AWAIT_STATE` calls use generous timeouts because real network traffic is involved: +- Connection to CONNECTED via proxy: 30 seconds (allows for auth + transport + retry) +- Connection to FAILED/DISCONNECTED: 15 seconds (allows for proxy rule processing) +- Cleanup close: 10 seconds + +### Token Auth Helper + +The RTN14b test requires a helper to request tokens from the sandbox: + +```pseudo +function request_token_from_sandbox(api_key, token_params): + # Split API key into key name and secret + key_name = api_key.split(":")[0] + key_secret = api_key.split(":")[1] + + # Request a token from the sandbox REST API + response = POST https://sandbox-rest.ably.io/keys/{key_name}/requestToken + WITH Authorization: Basic base64(api_key) + WITH body: token_params OR {} + + RETURN parse_json(response.body) # TokenDetails +``` + +### Why Proxy Tests vs Unit Tests + +These tests verify the same spec points as the unit tests in `connection_open_failures_test.md`, but provide higher confidence because: + +1. **Real WebSocket connections** -- the SDK's actual transport layer is exercised +2. **Real Ably protocol** -- the proxy modifies real server responses, not synthetic mocks +3. **Real timing** -- timeout behaviour is tested with actual clocks, not fake timers +4. **Real token renewal** -- RTN14b exercises the full authCallback-to-reconnect flow against the sandbox diff --git a/uts/realtime/integration/proxy/connection_resume.md b/uts/realtime/integration/proxy/connection_resume.md new file mode 100644 index 000000000..c45806cfe --- /dev/null +++ b/uts/realtime/integration/proxy/connection_resume.md @@ -0,0 +1,586 @@ +# Connection Resume Proxy Integration Tests (RTN15) + +Spec points: `RTN15a`, `RTN15b`, `RTN15c6`, `RTN15c7`, `RTN15h1`, `RTN15h3` + +## Test Type + +Proxy integration test against Ably Sandbox endpoint. 
+ +Uses the programmable proxy (`uts/test/proxy/`) to inject transport-level faults while the SDK communicates with the real Ably backend. See `uts/test/realtime/integration/helpers/proxy.md` for proxy infrastructure details. + +Corresponding unit tests: `uts/test/realtime/unit/connection/connection_failures_test.md` + +## Sandbox Setup + +```pseudo +BEFORE ALL TESTS: + # Provision test app + response = POST https://sandbox-rest.ably.io/apps + WITH body from ably-common/test-resources/test-app-setup.json + + app_config = parse_json(response.body) + api_key = app_config.keys[0].key_str + app_id = app_config.app_id + +AFTER ALL TESTS: + # Clean up test app + DELETE https://sandbox-rest.ably.io/apps/{app_id} + WITH Authorization: Basic {api_key} +``` + +## Port Allocation + +Each test allocates a unique proxy port to avoid conflicts: + +```pseudo +BEFORE ALL TESTS: + port_base = allocate_port_range(count: 5) + # Tests use port_base + 0 through port_base + 4 +``` + +--- + +## Test 6: RTN15a - Unexpected disconnect triggers resume + +| Spec | Requirement | +|------|-------------| +| RTN15a | If transport is disconnected unexpectedly, attempt resume | + +Tests that an unexpected transport disconnect causes the SDK to reconnect and attempt a resume, verified via the proxy event log. + +**Unit test counterpart:** `connection_failures_test.md` > RTN15a + +### Setup + +**Proxy rules:** None (passthrough). The disconnect is triggered imperatively after the SDK connects. 
+ +```pseudo +session = create_proxy_session( + endpoint: "sandbox", + port: port_base + 0, + rules: [] +) +``` + +**SDK config:** + +```pseudo +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: port_base + 0, + tls: false, + useBinaryProtocol: false, + autoConnect: false +)) +``` + +### Test Steps + +```pseudo +# Connect through proxy +client.connect() +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15s + +# Record state changes from this point +state_changes = [] +client.connection.on((change) => { + state_changes.append(change.current) +}) + +# Trigger unexpected disconnect via proxy imperative action +session.trigger_action({ type: "disconnect" }) + +# SDK should reconnect and resume +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15s +``` + +### Assertions + +```pseudo +# State changes should include disconnected -> connecting -> connected +ASSERT state_changes CONTAINS_IN_ORDER [ + ConnectionState.disconnected, + ConnectionState.connecting, + ConnectionState.connected +] + +# Verify resume was attempted via proxy log +log = session.get_log() +ws_connects = log.filter(e => e.type == "ws_connect") +ASSERT ws_connects.length >= 2 + +# Second WebSocket connection should include resume query parameter +ASSERT ws_connects[1].queryParams["resume"] IS NOT null +``` + +### Cleanup + +```pseudo +client.connection.close() +AWAIT_STATE client.connection.state == ConnectionState.closed + WITH timeout: 10s +session.close() +``` + +--- + +## Test 7: RTN15b, RTN15c6 - Resume preserves connectionId + +| Spec | Requirement | +|------|-------------| +| RTN15b | Resume is attempted with connectionKey in `resume` query parameter | +| RTN15c6 | Successful resume indicated by same connectionId in CONNECTED response | + +Tests that after an unexpected disconnect and successful resume, the connection ID remains the same and the resume query parameter contains the connection 
key. + +**Unit test counterpart:** `connection_failures_test.md` > RTN15b, RTN15c6 + +### Setup + +**Proxy rules:** None (passthrough). Disconnect is triggered imperatively. + +```pseudo +session = create_proxy_session( + endpoint: "sandbox", + port: port_base + 1, + rules: [] +) +``` + +**SDK config:** + +```pseudo +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: port_base + 1, + tls: false, + useBinaryProtocol: false, + autoConnect: false +)) +``` + +### Test Steps + +```pseudo +# Connect through proxy +client.connect() +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15s + +# Record connection identity before disconnect +original_connection_id = client.connection.id +original_connection_key = client.connection.key +ASSERT original_connection_id IS NOT null +ASSERT original_connection_key IS NOT null + +# Trigger unexpected disconnect +session.trigger_action({ type: "disconnect" }) + +# Wait for SDK to resume +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15s +``` + +### Assertions + +```pseudo +# RTN15c6: Connection ID is preserved (successful resume) +ASSERT client.connection.id == original_connection_id + +# RTN15b: Second ws_connect URL includes resume={connectionKey} +log = session.get_log() +ws_connects = log.filter(e => e.type == "ws_connect") +ASSERT ws_connects.length >= 2 +ASSERT ws_connects[1].queryParams["resume"] == original_connection_key + +# No error reason on successful resume +ASSERT client.connection.errorReason IS null +``` + +### Cleanup + +```pseudo +client.connection.close() +AWAIT_STATE client.connection.state == ConnectionState.closed + WITH timeout: 10s +session.close() +``` + +--- + +## Test 8: RTN15c7 - Failed resume gets new connectionId + +| Spec | Requirement | +|------|-------------| +| RTN15c7 | If resume fails, server sends CONNECTED with new connectionId and error | + +Tests that when a resume fails (simulated by the 
proxy replacing the server's second CONNECTED response with one containing a different connectionId and error), the SDK accepts the new connection identity and exposes the error. + +**Unit test counterpart:** `connection_failures_test.md` > RTN15c7 + +### Setup + +**Proxy rules:** Replace the 2nd CONNECTED message (the resume response) with a crafted one that has a different connectionId and an error, simulating a failed resume. + +```pseudo +session = create_proxy_session( + endpoint: "sandbox", + port: port_base + 2, + rules: [ + { + "match": { "type": "ws_frame_to_client", "action": "CONNECTED", "count": 2 }, + "action": { + "type": "replace", + "message": { + "action": 4, + "connectionId": "proxy-injected-new-id", + "connectionKey": "proxy-injected-new-key", + "connectionDetails": { + "connectionKey": "proxy-injected-new-key", + "clientId": null, + "maxMessageSize": 65536, + "maxInboundRate": 250, + "maxOutboundRate": 100, + "maxFrameSize": 524288, + "serverId": "test-server", + "connectionStateTtl": 120000, + "maxIdleInterval": 15000 + }, + "error": { + "code": 80008, + "statusCode": 400, + "message": "Unable to recover connection" + } + } + }, + "times": 1, + "comment": "RTN15c7: Replace 2nd CONNECTED with failed resume (different connectionId + error 80008)" + } + ] +) +``` + +**SDK config:** + +```pseudo +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: port_base + 2, + tls: false, + useBinaryProtocol: false, + autoConnect: false +)) +``` + +### Test Steps + +```pseudo +# Connect through proxy — first CONNECTED passes through normally +client.connect() +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15s + +# Record original identity +original_connection_id = client.connection.id +ASSERT original_connection_id IS NOT null +ASSERT original_connection_id != "proxy-injected-new-id" + +# Trigger disconnect — SDK will attempt resume +session.trigger_action({ type: "disconnect" }) + +# SDK 
reconnects, but proxy replaces the CONNECTED response with a new connectionId +# SDK should still reach CONNECTED, but with the new identity +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15s +``` + +### Assertions + +```pseudo +# RTN15c7: Connection ID changed (resume failed, got new connection) +ASSERT client.connection.id == "proxy-injected-new-id" +ASSERT client.connection.id != original_connection_id + +# Connection key updated to the new one +ASSERT client.connection.key == "proxy-injected-new-key" + +# Error reason is set indicating why resume failed +ASSERT client.connection.errorReason IS NOT null +ASSERT client.connection.errorReason.code == 80008 + +# Connection is still CONNECTED (not FAILED — the server gave a new connection) +ASSERT client.connection.state == ConnectionState.connected + +# Verify resume was attempted in the proxy log +log = session.get_log() +ws_connects = log.filter(e => e.type == "ws_connect") +ASSERT ws_connects.length >= 2 +ASSERT ws_connects[1].queryParams["resume"] IS NOT null +``` + +### Cleanup + +```pseudo +client.connection.close() +AWAIT_STATE client.connection.state == ConnectionState.closed + WITH timeout: 10s +session.close() +``` + +--- + +## Test 9: RTN15h1 - DISCONNECTED with token error, non-renewable token -> FAILED + +| Spec | Requirement | +|------|-------------| +| RTN15h1 | If DISCONNECTED contains a token error and the token is not renewable, transition to FAILED | + +Tests that when the proxy injects a DISCONNECTED message with a token error (code 40142), and the SDK was configured with a non-renewable token (token string only, no key or authCallback), the SDK transitions to FAILED because it has no means to renew the token. + +**Unit test counterpart:** `connection_failures_test.md` > RTN15h1 + +### Setup + +**Proxy rules:** After the initial WebSocket connection is established, wait 1 second then inject a DISCONNECTED message with token error and close the connection. 
+ +```pseudo +session = create_proxy_session( + endpoint: "sandbox", + port: port_base + 3, + rules: [ + { + "match": { "type": "delay_after_ws_connect", "delayMs": 1000 }, + "action": { + "type": "inject_to_client_and_close", + "message": { + "action": 6, + "error": { + "code": 40142, + "statusCode": 401, + "message": "Token expired" + } + } + }, + "times": 1, + "comment": "RTN15h1: Inject DISCONNECTED with token error (40142) after 1s" + } + ] +) +``` + +**Token provisioning:** Obtain a real token from the sandbox so the initial connection succeeds, then use it without any renewal capability. + +```pseudo +# Provision a token via REST using the API key +rest = Rest(options: ClientOptions(key: api_key, endpoint: "sandbox")) +token_details = rest.auth.requestToken() +token_string = token_details.token +``` + +**SDK config:** Use the token string directly — no key, no authCallback. This makes the token non-renewable. + +```pseudo +client = Realtime(options: ClientOptions( + token: token_string, + endpoint: "localhost", + port: port_base + 3, + tls: false, + useBinaryProtocol: false, + autoConnect: false +)) +``` + +### Test Steps + +```pseudo +# Connect through proxy — initial connection succeeds with the real token +client.connect() +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15s + +# Record state changes +state_changes = [] +client.connection.on((change) => { + state_changes.append(change.current) +}) + +# After 1s the proxy injects DISCONNECTED with 40142 and closes the socket. +# The SDK has a non-renewable token, so it cannot renew -> FAILED. 
+AWAIT_STATE client.connection.state == ConnectionState.failed + WITH timeout: 15s +``` + +### Assertions + +```pseudo +# RTN15h1: Ended in FAILED state +ASSERT client.connection.state == ConnectionState.failed + +# Error reason reflects the token error +ASSERT client.connection.errorReason IS NOT null +ASSERT client.connection.errorReason.code == 40142 +ASSERT client.connection.errorReason.statusCode == 401 + +# State changes should show the transition to FAILED +# (may pass through DISCONNECTED briefly before FAILED) +ASSERT state_changes CONTAINS ConnectionState.failed +``` + +### Cleanup + +```pseudo +# No need to close — already in FAILED state +session.close() +``` + +--- + +## Test 10: RTN15h3 - DISCONNECTED with non-token error triggers reconnect + +| Spec | Requirement | +|------|-------------| +| RTN15h3 | If DISCONNECTED contains a non-token error, initiate immediate reconnect with resume | + +Tests that when the proxy injects a DISCONNECTED message with a non-token error (code 80003), the SDK reconnects and resumes rather than transitioning to FAILED. + +**Unit test counterpart:** `connection_failures_test.md` > RTN15h3 + +### Setup + +**Proxy rules:** After the initial WebSocket connection, wait 1 second then inject a DISCONNECTED message with a non-token error and close the connection. Only fire once — the reconnection attempt passes through cleanly. 
+ +```pseudo +session = create_proxy_session( + endpoint: "sandbox", + port: port_base + 4, + rules: [ + { + "match": { "type": "delay_after_ws_connect", "delayMs": 1000 }, + "action": { + "type": "inject_to_client_and_close", + "message": { + "action": 6, + "error": { + "code": 80003, + "statusCode": 500, + "message": "Service temporarily unavailable" + } + } + }, + "times": 1, + "comment": "RTN15h3: Inject DISCONNECTED with non-token error (80003) after 1s, once only" + } + ] +) +``` + +**SDK config:** + +```pseudo +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: port_base + 4, + tls: false, + useBinaryProtocol: false, + autoConnect: false +)) +``` + +### Test Steps + +```pseudo +# Connect through proxy +client.connect() +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15s + +# Record state changes +state_changes = [] +client.connection.on((change) => { + state_changes.append(change.current) +}) + +# After 1s the proxy injects DISCONNECTED with non-token error and closes. +# The rule fires once, so the reconnection attempt passes through to the real server. 
+ +# Wait for DISCONNECTED (from the injected message) +AWAIT_STATE client.connection.state == ConnectionState.disconnected + WITH timeout: 10s + +# SDK should automatically reconnect +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15s +``` + +### Assertions + +```pseudo +# RTN15h3: SDK reconnected successfully (not FAILED) +ASSERT client.connection.state == ConnectionState.connected + +# State changes should show: disconnected -> connecting -> connected +ASSERT state_changes CONTAINS_IN_ORDER [ + ConnectionState.disconnected, + ConnectionState.connecting, + ConnectionState.connected +] + +# Verify resume was attempted +log = session.get_log() +ws_connects = log.filter(e => e.type == "ws_connect") +ASSERT ws_connects.length >= 2 +ASSERT ws_connects[1].queryParams["resume"] IS NOT null + +# No error reason after successful reconnection +ASSERT client.connection.errorReason IS null +``` + +### Cleanup + +```pseudo +client.connection.close() +AWAIT_STATE client.connection.state == ConnectionState.closed + WITH timeout: 10s +session.close() +``` + +--- + +## Integration Test Notes + +### Timeout Handling + +All `AWAIT_STATE` calls use generous timeouts since real network traffic through the proxy is involved: +- Initial CONNECTED: 15 seconds (auth + transport setup through proxy) +- Reconnection CONNECTED: 15 seconds (allows for SDK retry logic + network round-trip) +- DISCONNECTED (injected): 10 seconds (1s proxy delay + processing) +- FAILED: 15 seconds (SDK may attempt intermediate steps) +- CLOSED (cleanup): 10 seconds + +### Error Handling + +If any test fails to reach an expected state: +- Log the connection `errorReason` +- Log all recorded `state_changes` +- Retrieve and log the proxy session event log via `session.get_log()` +- Fail with diagnostic information + +### Cleanup + +Always clean up both the SDK client and the proxy session: + +```pseudo +AFTER EACH TEST: + IF client IS NOT null AND client.connection.state IN 
[connected, connecting, disconnected]: + client.connection.close() + AWAIT_STATE client.connection.state == ConnectionState.closed + WITH timeout: 10s + IF session IS NOT null: + session.close() +``` diff --git a/uts/realtime/integration/proxy/heartbeat.md b/uts/realtime/integration/proxy/heartbeat.md new file mode 100644 index 000000000..4957cee43 --- /dev/null +++ b/uts/realtime/integration/proxy/heartbeat.md @@ -0,0 +1,174 @@ +# Realtime Heartbeat — Proxy Integration Tests + +Spec points: `RTN23a` + +## Test Type + +Proxy integration test against Ably Sandbox endpoint + +## Proxy Infrastructure + +See `uts/test/realtime/integration/helpers/proxy.md` for the full proxy infrastructure specification. + +## Related Unit Tests + +See `uts/test/realtime/unit/connection/heartbeat_test.md` for the corresponding unit tests that verify the same spec points with mocked WebSocket and fake timers. + +## Sandbox Setup + +Tests run against the Ably Sandbox via a programmable proxy. + +### App Provisioning + +```pseudo +BEFORE ALL TESTS: + # Provision test app + response = POST https://sandbox-rest.ably.io/apps + WITH body from ably-common/test-resources/test-app-setup.json + + app_config = parse_json(response.body) + api_key = app_config.keys[0].key_str + app_id = app_config.app_id + +AFTER ALL TESTS: + # Clean up test app + DELETE https://sandbox-rest.ably.io/apps/{app_id} + WITH Authorization: Basic {api_key} +``` + +### Common Cleanup + +```pseudo +AFTER EACH TEST: + IF client IS NOT null AND client.connection.state IN [connected, connecting, disconnected]: + client.connection.close() + AWAIT_STATE client.connection.state == ConnectionState.closed + WITH timeout: 10 seconds + IF session IS NOT null: + session.close() +``` + +--- + +## RTN23a — Heartbeat starvation causes disconnect and reconnect + +| Spec | Requirement | +|------|-------------| +| RTN23a | If no activity is received for `maxIdleInterval + realtimeRequestTimeout`, the transport should be disconnected | + 
+Tests that when the proxy suppresses all server-to-client frames after the initial CONNECTED handshake, the SDK's heartbeat idle timer fires and the client transitions through DISCONNECTED before reconnecting successfully. This exercises the real idle timer logic (no fake timers) against a live Ably connection. + +The server's CONNECTED message includes `connectionDetails.maxIdleInterval` (typically 15000ms). The SDK computes the heartbeat timeout as `maxIdleInterval + realtimeRequestTimeout`. With a shortened `realtimeRequestTimeout` of 5000ms, the total timeout is approximately 20s. The test uses a generous overall timeout of 45s. + +### Setup + +```pseudo +# Create proxy session that suppresses all server frames after initial CONNECTED settles +session = create_proxy_session( + endpoint: "sandbox", + port: allocated_port, + rules: [{ + "match": { "type": "delay_after_ws_connect", "delayMs": 2000 }, + "action": { "type": "suppress_onwards" }, + "times": 1, + "comment": "RTN23a: Suppress all server frames after 2s to starve heartbeats" + }] +) + +client = Realtime(options: ClientOptions( + key: api_key, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false, + autoConnect: false, + realtimeRequestTimeout: 5000 +)) +``` + +### Test Steps + +```pseudo +# Record state changes for sequence verification +state_changes = [] +client.connection.on((change) => { + state_changes.append(change.current) +}) + +# Start connection +client.connect() + +# SDK receives real CONNECTED from Ably (within the 2s before suppression starts) +AWAIT_STATE client.connection.state == ConnectionState.connected + WITH timeout: 15 seconds + +# Capture connection details from the first connection +first_connection_id = client.connection.id +first_connection_key = client.connection.key +ASSERT first_connection_id IS NOT null + +# Now all server frames are suppressed. 
The SDK's idle timer will fire after
+# maxIdleInterval + realtimeRequestTimeout (~15s + 5s = ~20s),
+# then the SDK transitions to DISCONNECTED and reconnects.
+# The suppress_onwards rule has times=1, so the second WS connection is unaffected.
+
+# Wait for the SDK to detect starvation and drop the transport
+AWAIT_STATE client.connection.state == ConnectionState.disconnected
+  WITH timeout: 30 seconds
+
+# Then wait for the automatic reconnection to complete
+AWAIT_STATE client.connection.state == ConnectionState.connected
+  WITH timeout: 45 seconds
+```
+
+### Assertions
+
+```pseudo
+# Connection is re-established (a successful resume preserves the original
+# connection id, so do not assert that it changed)
+ASSERT client.connection.state == ConnectionState.connected
+ASSERT client.connection.id IS NOT null
+ASSERT client.connection.key IS NOT null
+
+# State sequence shows: connecting -> connected -> disconnected -> connecting -> connected
+ASSERT state_changes CONTAINS_IN_ORDER [
+  ConnectionState.connecting,
+  ConnectionState.connected,
+  ConnectionState.disconnected,
+  ConnectionState.connecting,
+  ConnectionState.connected
+]
+
+# Proxy event log confirms two WebSocket connections
+log = session.get_log()
+ws_connects = log.filter(e => e.type == "ws_connect")
+ASSERT ws_connects.length >= 2
+
+# Second connection should include resume parameter (RTN15c)
+ASSERT ws_connects[1].queryParams["resume"] IS NOT null
+```
+
+---
+
+## Integration Test Notes
+
+### Timing Considerations
+
+The heartbeat starvation test (RTN23a) is inherently slow because the idle timer depends on `maxIdleInterval` from the server's CONNECTED message. The Ably sandbox typically sends `maxIdleInterval: 15000` (15 seconds). Combined with the shortened `realtimeRequestTimeout` of 5 seconds, the total idle timeout is approximately 20 seconds. This is unavoidable in an integration test that exercises real timers against a real backend.
+
+The unit tests in `heartbeat_test.md` use fake timers and short intervals for fast, deterministic testing of the same logic. 
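+
+The timeout arithmetic above can be made explicit with a short sketch. This is illustrative only: `connected_message` and `client_options` are hypothetical names standing in for the server's CONNECTED ProtocolMessage and the test's ClientOptions:
+
+```pseudo
+# Idle-timeout derivation for RTN23a (assumed sandbox defaults)
+max_idle_interval = connected_message.connectionDetails.maxIdleInterval  # typically 15000 ms
+realtime_request_timeout = client_options.realtimeRequestTimeout         # 5000 ms in this test
+idle_timeout = max_idle_interval + realtime_request_timeout              # ~20000 ms
+
+# The 45-second AWAIT timeout leaves ample headroom over the ~20s idle timeout
+ASSERT idle_timeout < 45000
+```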
+ +### `suppress_onwards` Semantics + +The `suppress_onwards` action suppresses all subsequent server-to-client frames on the current WebSocket connection. It is a temporal rule triggered by `delay_after_ws_connect`, which means: + +1. It fires once after the specified delay from the first WebSocket connect +2. With `times: 1`, it only applies to the first WS connection in the session +3. When the SDK reconnects with a new WebSocket connection, frames flow normally + +This is the key mechanism that allows the test to verify heartbeat starvation on the first connection while permitting successful reconnection. + +### Why Proxy Tests vs Unit Tests + +These tests complement the unit tests in `heartbeat_test.md`: + +1. **Real idle timer** -- the SDK's actual timer fires after real elapsed time, not fake timers +2. **Real `maxIdleInterval`** -- the value comes from the Ably sandbox's CONNECTED message, not a mock +3. **Real reconnection** -- the SDK reconnects through a real WebSocket to a real server +4. **Real `heartbeats=true` parameter** -- verified in the actual WebSocket URL captured by the proxy diff --git a/uts/realtime/integration/proxy/rest_faults.md b/uts/realtime/integration/proxy/rest_faults.md new file mode 100644 index 000000000..b9de4f1f6 --- /dev/null +++ b/uts/realtime/integration/proxy/rest_faults.md @@ -0,0 +1,364 @@ +# REST Fault Proxy Integration Tests + +Spec points: `RSC10`, `RSC15a`, `RTL6` + +## Test Type + +Proxy integration test against Ably Sandbox endpoint + +## Proxy Infrastructure + +See `uts/test/realtime/integration/helpers/proxy.md` for the full proxy infrastructure specification. 
+ +## Corresponding Unit Tests + +- `uts/test/rest/unit/auth/token_renewal.md` -- RSC10 (unit test verifies token renewal logic with mocked HTTP) +- `uts/test/rest/unit/fallback.md` -- RSC15a (unit test verifies fallback/error handling with mocked HTTP) +- `uts/test/realtime/unit/channels/channel_publish.md` -- RTL6 (unit test verifies publish request formation) + +## Sandbox Setup + +Tests run against the Ably Sandbox via a programmable proxy. + +### App Provisioning + +```pseudo +BEFORE ALL TESTS: + # Provision test app + response = POST https://sandbox-rest.ably.io/apps + WITH body from ably-common/test-resources/test-app-setup.json + + app_config = parse_json(response.body) + api_key = app_config.keys[0].key_str + app_id = app_config.app_id + +AFTER ALL TESTS: + # Clean up test app + DELETE https://sandbox-rest.ably.io/apps/{app_id} + WITH Authorization: Basic {api_key} +``` + +### Common Cleanup + +```pseudo +AFTER EACH TEST: + IF client IS NOT null: + # For Realtime clients, close the connection + IF client HAS connection AND client.connection.state IN [connected, connecting, disconnected]: + client.connection.close() + AWAIT_STATE client.connection.state == ConnectionState.closed + WITH timeout: 10 seconds + IF session IS NOT null: + session.close() +``` + +### Token Auth Helper + +```pseudo +function request_token_from_sandbox(api_key, token_params): + # Split API key into key name and secret + key_name = api_key.split(":")[0] + key_secret = api_key.split(":")[1] + + # Request a token from the sandbox REST API + response = POST https://sandbox-rest.ably.io/keys/{key_name}/requestToken + WITH Authorization: Basic base64(api_key) + WITH body: token_params OR {} + + RETURN parse_json(response.body) # TokenDetails +``` + +--- + +## Test 18: RSC10 -- Token renewal on HTTP 401 (40142) + +| Spec | Requirement | +|------|-------------| +| RSC10 | When a REST request receives a 401 with a token error (40140-40149), the SDK should renew the token and retry the 
request | + +Tests that when an authenticated REST request receives an HTTP 401 with error code 40142 (token expired), the SDK transparently renews the token via `authCallback` and retries the request. The proxy returns 401 on the first HTTP request to a channel endpoint, then passes through subsequent requests. + +### Setup + +```pseudo +# Track authCallback invocations +auth_callback_count = 0 + +# Create proxy session that returns 401 on the first channel request +session = create_proxy_session( + endpoint: "sandbox", + port: allocated_port, + rules: [{ + "match": { "type": "http_request", "pathContains": "/channels/" }, + "action": { + "type": "http_respond", + "status": 401, + "body": { "error": { "code": 40142, "statusCode": 401, "message": "Token expired" } } + }, + "times": 1, + "comment": "RSC10: Return 401 on first channel request, then passthrough" + }] +) + +# Use token auth with authCallback so the SDK can renew +client = Rest(options: ClientOptions( + authCallback: (params) => { + auth_callback_count++ + # Request a token from the sandbox using the API key + token_details = request_token_from_sandbox(api_key, params) + RETURN token_details + }, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false +)) + +channel_name = "test-RSC10-token-renewal-" + random_string() +channel = client.channels.get(channel_name) +``` + +### Test Steps + +```pseudo +# Publish a message -- first request gets 401, SDK renews token, retries +result = AWAIT channel.publish("test-event", "hello") + +# The publish should succeed (SDK transparently renewed and retried) +``` + +### Assertions + +```pseudo +# Publish completed successfully (no error thrown) +ASSERT result IS successful + +# authCallback was called at least twice (initial token + renewal after 401) +ASSERT auth_callback_count >= 2 + +# Proxy event log shows two HTTP requests to the channel endpoint +log = session.get_log() +http_requests = log.filter(e => e.type == 
"http_request" AND e.path CONTAINS "/channels/")
+ASSERT http_requests.length >= 2
+
+# First request was intercepted (got 401), second request passed through (got 2xx)
+http_responses = log.filter(e => e.type == "http_response")
+ASSERT http_responses[0].status == 401
+ASSERT http_responses[1].status IN [200, 201]
+```
+
+---
+
+## Test 19: RSC15a -- HTTP 503 error (no fallback through proxy)
+
+| Spec | Requirement |
+|------|-------------|
+| RSC15a | When the SDK receives an HTTP 5xx error and fallback hosts are disabled, it should return the error to the caller |
+| REC2c2 | Fallback hosts are automatically disabled when `endpoint` is set to an explicit hostname |
+
+Tests that when a REST request receives an HTTP 503 (Service Unavailable) and the client is configured with `endpoint: "localhost"` (which disables fallback hosts per REC2c2), the SDK returns the error to the caller without attempting fallback hosts.
+
+### Setup
+
+```pseudo
+# Create proxy session that returns 503 on the first channel request
+session = create_proxy_session(
+  endpoint: "sandbox",
+  port: allocated_port,
+  rules: [{
+    "match": { "type": "http_request", "pathContains": "/channels/" },
+    "action": {
+      "type": "http_respond",
+      "status": 503,
+      "body": { "error": { "code": 50300, "statusCode": 503, "message": "Service temporarily unavailable" } }
+    },
+    "times": 1,
+    "comment": "RSC15a: Return 503 on first channel request"
+  }]
+)
+
+# Use token auth via authCallback: Basic auth is prohibited over non-TLS (RSC18)
+client = Rest(options: ClientOptions(
+  authCallback: (params) => {
+    token_details = request_token_from_sandbox(api_key, params)
+    RETURN token_details
+  },
+  endpoint: "localhost",
+  port: session.proxy_port,
+  tls: false,
+  useBinaryProtocol: false
+))
+
+channel_name = "test-RSC15a-503-error-" + random_string()
+channel = client.channels.get(channel_name)
+```
+
+### Test Steps
+
+```pseudo
+# Try to publish a message -- should fail with 503 error
+AWAIT 
channel.publish("test-event", "hello") FAILS WITH error +``` + +### Assertions + +```pseudo +# The error propagates to the caller with the correct error code +ASSERT error.code == 50300 +ASSERT error.statusCode == 503 + +# Proxy event log shows only one HTTP request to the channel endpoint +# (no fallback attempts since endpoint="localhost" disables fallback hosts) +log = session.get_log() +http_requests = log.filter(e => e.type == "http_request" AND e.path CONTAINS "/channels/") +ASSERT http_requests.length == 1 +``` + +--- + +## Test 20: RTL6 -- End-to-end publish and history through proxy + +| Spec | Requirement | +|------|-------------| +| RTL6 | Messages published via a Realtime connection should be deliverable and retrievable | + +Tests that the proxy transparently forwards both WebSocket and HTTP traffic without interfering with normal operation. A Realtime client publishes a message through the proxy, and a REST client retrieves it via channel history, also through the proxy. This is a "golden path" test that validates the proxy infrastructure itself. 
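+
+The test steps below rely on a `poll_until` helper to wait for eventually-consistent history. The helper is assumed rather than defined by the proxy infrastructure spec; a minimal sketch:
+
+```pseudo
+# Assumed helper: repeatedly evaluate a condition until it holds
+# or the timeout elapses, failing the test on timeout
+function poll_until(condition, interval, timeout):
+  deadline = now() + timeout
+  WHILE now() < deadline:
+    IF AWAIT condition():
+      RETURN
+    sleep(interval)
+  FAIL "poll_until: condition not met within timeout"
+```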
+ +### Setup + +```pseudo +# Create proxy session with no rules (pure passthrough) +session = create_proxy_session( + endpoint: "sandbox", + port: allocated_port, + rules: [] +) + +# Create Realtime client through proxy for publishing +realtime_client = Realtime(options: ClientOptions( + authCallback: (params) => { + token_details = request_token_from_sandbox(api_key, params) + RETURN token_details + }, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false, + autoConnect: false +)) + +# Create REST client through proxy for history retrieval +rest_client = Rest(options: ClientOptions( + authCallback: (params) => { + token_details = request_token_from_sandbox(api_key, params) + RETURN token_details + }, + endpoint: "localhost", + port: session.proxy_port, + tls: false, + useBinaryProtocol: false +)) + +channel_name = "test-RTL6-publish-history-" + random_string() +realtime_channel = realtime_client.channels.get(channel_name) +rest_channel = rest_client.channels.get(channel_name) +``` + +### Test Steps + +```pseudo +# Connect Realtime client through proxy +realtime_client.connect() +AWAIT_STATE realtime_client.connection.state == ConnectionState.connected + WITH timeout: 15 seconds + +# Attach to the channel +AWAIT realtime_channel.attach() +AWAIT_STATE realtime_channel.state == ChannelState.attached + WITH timeout: 10 seconds + +# Publish a message via Realtime +AWAIT realtime_channel.publish("test-msg", "hello world") + +# Brief pause to allow the message to be persisted on the server +# (history is eventually consistent) +poll_until( + condition: () => { + history = AWAIT rest_channel.history() + RETURN history.items.length > 0 + }, + interval: 500ms, + timeout: 10 seconds +) + +# Retrieve channel history via REST +history = AWAIT rest_channel.history() +``` + +### Assertions + +```pseudo +# History contains the published message +ASSERT history.items.length >= 1 + +# Find the published message in history +published_msg = 
history.items.find(m => m.name == "test-msg")
+ASSERT published_msg IS NOT null
+ASSERT published_msg.data == "hello world"
+
+# Proxy event log shows both WebSocket and HTTP traffic
+log = session.get_log()
+
+# At least one WebSocket connection was made (Realtime client)
+ws_connects = log.filter(e => e.type == "ws_connect")
+ASSERT ws_connects.length >= 1
+
+# At least one HTTP request was made (the REST history call; token requests bypass the proxy)
+http_requests = log.filter(e => e.type == "http_request")
+ASSERT http_requests.length >= 1
+```
+
+### Cleanup
+
+```pseudo
+# Close the Realtime client
+realtime_client.connection.close()
+AWAIT_STATE realtime_client.connection.state == ConnectionState.closed
+  WITH timeout: 10 seconds
+```
+
+---
+
+## Integration Test Notes
+
+### Timeout Handling
+
+All `AWAIT_STATE` calls use generous timeouts because real network traffic is involved:
+- Connection to CONNECTED via proxy: 15 seconds (allows for auth + transport setup)
+- Channel attach: 10 seconds
+- History polling: 10 seconds (allows for eventual consistency)
+- Cleanup close: 10 seconds
+
+### Authentication Through Proxy
+
+All tests use `authCallback` with token auth rather than API key auth. This is required because:
+1. `tls: false` is needed for proxy tests (the proxy serves plain HTTP/WS, with TLS only upstream)
+2. RSC18 prohibits Basic auth over non-TLS connections
+3. `authCallback` makes tokens renewable, which is needed for RSC10 (token renewal test)
+
+The `authCallback` requests tokens directly from the sandbox REST API (bypassing the proxy) using the API key. Only the SDK's own HTTP/WebSocket traffic goes through the proxy.
+
+### Fallback Host Behaviour
+
+With `endpoint: "localhost"`, fallback hosts are automatically disabled (REC2c2). 
This means: +- RSC15a: The SDK will NOT attempt fallback hosts after a 5xx error +- The error propagates directly to the caller +- The proxy log will show only a single HTTP request (no fallback attempts) + +### Why Proxy Tests for REST Faults + +These tests verify behaviour that unit tests cover with mocked HTTP, but provide higher confidence because: +1. **Real HTTP connections** -- the SDK's actual HTTP client is exercised through the proxy +2. **Real token renewal** -- RSC10 exercises the full authCallback flow against the sandbox +3. **Real error responses** -- the proxy returns correctly-formatted HTTP error responses +4. **End-to-end verification** -- RTL6 confirms publish and history work through the proxy infrastructure
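+
+### Shared Auth Callback Helper
+
+Every test above configures the same token-auth callback inline. Specs may factor this into a shared helper; a minimal sketch (the `sandbox_auth_callback` name is an assumption, not part of the proxy spec):
+
+```pseudo
+# Assumed helper: builds an authCallback that fetches tokens
+# directly from the sandbox REST API, bypassing the proxy
+function sandbox_auth_callback(api_key):
+  RETURN (params) => {
+    RETURN request_token_from_sandbox(api_key, params)
+  }
+
+# Usage:
+# client = Rest(options: ClientOptions(
+#   authCallback: sandbox_auth_callback(api_key),
+#   endpoint: "localhost",
+#   port: session.proxy_port,
+#   tls: false
+# ))
+```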