Console auto-discovers workers via GetClusterWorkers RPC#1

Draft
EdsonPetry wants to merge 20 commits into edson.petry/console-dashboard from edson.petry/console-worker-discovery

Conversation

@EdsonPetry
Owner

Stacked on PR 363

@EdsonPetry EdsonPetry force-pushed the edson.petry/console-worker-discovery branch from 67f8391 to 1cd8d46 Compare March 13, 2026 19:08
jayshrivastava and others added 3 commits March 16, 2026 11:13
…rics (datafusion-contrib#363)

## Summary

- **Rewrite the console TUI** with a worker-centric dashboard featuring
a cluster overview and per-worker detail views, replacing the previous
global task view
- **Extend the observability service** to collect worker-level metrics
including CPU usage, RSS memory, output rows, unique query IDs, and task
counts via updated protobuf definitions
- **Add CPU/RSS sparklines and sortable columns** to the worker table,
with queries-in-flight metric in the cluster metrics panel
- **Add new examples**: a cluster spawner/killer example and concurrent
TPC-DS query runner with explain analyze support

## Changes

### Console TUI (`console/`)
- Split monolithic `ui.rs` into modular components: `cluster.rs`,
`worker.rs`, `header.rs`, `footer.rs`, `help.rs`
- New `input.rs` for keyboard event handling and `state.rs` for shared
UI state
- Worker table with sortable columns (Name, Stage, Tasks, Queries, CPU,
Memory)
- Sparkline charts for CPU and RSS per worker
- Cluster-level metrics: throughput, active/total workers, completed
queries, avg duration
- Help overlay (toggle with `?`)
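
As an illustration of the sortable worker table described above, here is a minimal sketch of column-based sorting. The `WorkerRow` struct, the `SortColumn` enum, and `sort_workers` are hypothetical names invented for this example and are not taken from the PR; the Stage column is left out for brevity.

```rust
/// Hypothetical row of the worker table; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct WorkerRow {
    pub name: String,
    pub tasks: u64,
    pub queries: u64,
    pub cpu_percent: f64,
    pub rss_bytes: u64,
}

/// Columns the table can be sorted by (a subset of those listed above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SortColumn {
    Name,
    Tasks,
    Queries,
    Cpu,
    Memory,
}

/// Sorts rows in place by the selected column.
pub fn sort_workers(rows: &mut [WorkerRow], column: SortColumn, descending: bool) {
    rows.sort_by(|a, b| {
        let ord = match column {
            SortColumn::Name => a.name.cmp(&b.name),
            SortColumn::Tasks => a.tasks.cmp(&b.tasks),
            SortColumn::Queries => a.queries.cmp(&b.queries),
            // total_cmp gives f64 a total order, so a NaN sample cannot panic the sort.
            SortColumn::Cpu => a.cpu_percent.total_cmp(&b.cpu_percent),
            SortColumn::Memory => a.rss_bytes.cmp(&b.rss_bytes),
        };
        let ord = if descending { ord.reverse() } else { ord };
        // Break ties by name (always ascending) so rows do not jump around between redraws.
        ord.then_with(|| a.name.cmp(&b.name))
    });
}
```

Tie-breaking on the worker name is a design choice of this sketch: when a TUI re-sorts on every refresh, rows with equal metric values would otherwise be ordered only by their previous position.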

### Observability (`src/observability/`)
- Extended protobuf schema with `cpu_usage`, `rss_bytes`, `output_rows`,
`query_ids`, `num_tasks` fields
- Updated service to generate and collect the new per-worker metrics
- Worker now reports richer metrics through the progress API
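
One way the per-worker fields above could feed cluster-level figures such as queries in flight is sketched below. The field names mirror the protobuf fields listed in this section, but their Rust types, the `ClusterSummary` struct, and the `summarize` function are assumptions made for illustration; the PR does not show how the console actually aggregates them.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory mirror of the per-worker metrics. Field names follow
/// the protobuf fields listed above; the types are assumptions.
#[derive(Debug, Clone, Default)]
pub struct WorkerMetrics {
    pub cpu_usage: f64,
    pub rss_bytes: u64,
    pub output_rows: u64,
    pub query_ids: Vec<String>,
    pub num_tasks: u64,
}

/// Hypothetical cluster-level roll-up.
#[derive(Debug, PartialEq, Eq)]
pub struct ClusterSummary {
    pub queries_in_flight: usize,
    pub total_tasks: u64,
    pub total_output_rows: u64,
}

/// Aggregates per-worker metrics into a cluster summary.
pub fn summarize(workers: &[WorkerMetrics]) -> ClusterSummary {
    // A query may run tasks on several workers, so count distinct IDs
    // across the whole cluster rather than summing per-worker counts.
    let unique: HashSet<&str> = workers
        .iter()
        .flat_map(|w| w.query_ids.iter().map(String::as_str))
        .collect();
    ClusterSummary {
        queries_in_flight: unique.len(),
        total_tasks: workers.iter().map(|w| w.num_tasks).sum(),
        total_output_rows: workers.iter().map(|w| w.output_rows).sum(),
    }
}
```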

### Flight service (`src/flight_service/`)
- Renamed the `observability_service()` generator method on `Worker` to
`with_observability_service()`. Users no longer need to wrap the
`ObservabilityServiceImpl` returned by the old generator in an
`ObservabilityServiceServer`.

old:
```rust
    let worker = Worker::default();
    let observability_service = worker.observability_service();

    Server::builder()
        .add_service(ObservabilityServiceServer::new(observability_service))
        .add_service(worker.into_flight_server())
        .serve(SocketAddr::new(IpAddr::V4(Ipv4Addr::LOCALHOST), args.port))
        .await?;
```

new:
```rust
    let worker = Worker::default();

    Server::builder()
        .add_service(worker.with_observability_service())
        .add_service(worker.into_flight_server())
        .serve(SocketAddr::new(IpAddr::V4(Ipv4Addr::LOCALHOST), args.port))
        .await?;
```



### Examples
- `cluster.rs`: spawns and manages a local cluster of workers with
observability
- `tpcds_runner.rs`: supports concurrent query execution and explain
analyze

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@EdsonPetry EdsonPetry force-pushed the edson.petry/console-worker-discovery branch 2 times, most recently from 3af59fc to 6a34c46 Compare March 17, 2026 16:21
EdsonPetry and others added 17 commits March 18, 2026 20:50
…spawn` (datafusion-contrib#379)

This change fixes the background system metrics collection thread, which
silently never ran and caused CPU and memory to always report as zero in
the console. Instead of a dedicated background thread, a dedicated tokio
task now handles system metrics collection on a worker's observability
service.

The console now supports two modes:
- Auto-discovery (default): connects to a seed worker, calls
`GetClusterWorkers` to find all cluster workers, and re-polls every 5s
for topology changes
- Manual mode (`--cluster-ports`): unchanged behavior for local dev

The new `--connect` flag specifies a seed URL; it defaults to
localhost:6789
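
To make the re-polling step concrete, here is a minimal sketch of how a console might compare the worker set it already knows against the set returned by a fresh discovery call. The `TopologyChange` struct and `diff_topology` function are hypothetical and not taken from this PR; workers are identified by plain URL strings, which is also an assumption.

```rust
use std::collections::BTreeSet;

/// Hypothetical result of comparing two discovery snapshots.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct TopologyChange {
    /// Workers present in the new snapshot but not in the old one.
    pub added: Vec<String>,
    /// Workers present in the old snapshot but missing from the new one.
    pub removed: Vec<String>,
}

/// Computes which worker URLs appeared or disappeared between two polls.
/// BTreeSet iterates in sorted order, so both lists come out sorted.
pub fn diff_topology(
    known: &BTreeSet<String>,
    discovered: &BTreeSet<String>,
) -> TopologyChange {
    TopologyChange {
        added: discovered.difference(known).cloned().collect(),
        removed: known.difference(discovered).cloned().collect(),
    }
}
```

With a diff like this, a caller could open connections only for `added` workers and drop state only for `removed` ones on each 5s poll, rather than rebuilding every connection.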
…gument

Remove DEFAULT_WORKER_PORT from the library and make the console port a
required positional argument instead of silently defaulting to 9001.
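
A required positional port with no fallback can be sketched with the standard library alone, as below. The PR does not show how the console parses its arguments, so `parse_port` is a hypothetical helper; it only illustrates failing loudly on a missing or invalid value instead of substituting a default.

```rust
/// Reads the first positional argument as a TCP port.
/// Returns an error rather than falling back to a default when it is absent.
pub fn parse_port<I>(mut args: I) -> Result<u16, String>
where
    I: Iterator<Item = String>,
{
    let raw = args
        .next()
        .ok_or_else(|| "missing required argument: <PORT>".to_string())?;
    // u16 parsing rejects non-numeric input and values above 65535.
    raw.parse::<u16>()
        .map_err(|e| format!("invalid port {raw:?}: {e}"))
}
```

A caller would typically pass `std::env::args().skip(1)` so that the program name is not mistaken for the port.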
@EdsonPetry EdsonPetry force-pushed the edson.petry/console-worker-discovery branch from 0ddbde1 to 8c87e4a Compare March 19, 2026 13:34
3 participants