diff --git a/skills/computer-use-linux/SKILL.md b/skills/computer-use-linux/SKILL.md
index 573ecb2..c1cb569 100644
--- a/skills/computer-use-linux/SKILL.md
+++ b/skills/computer-use-linux/SKILL.md
@@ -47,6 +47,12 @@ computer-use-linux doctor | jq .readiness
 
 On GNOME Wayland, log out and back in after `setup-window-targeting` if the GNOME Shell extension was newly installed.
 
+Enable hybrid mode for Electron or Qt apps whose accessibility trees are empty or broken:
+
+```bash
+export COMPUTER_USE_LINUX_HYBRID=1
+```
+
 ## Configure Hermes
 
 Add the server with the Hermes MCP CLI:
@@ -68,6 +74,8 @@ mcp_servers:
     args: ["mcp"]
     timeout: 120
     connect_timeout: 30
+    env:
+      COMPUTER_USE_LINUX_HYBRID: "1"
 ```
 
 If the binary is not on `PATH`, pass the absolute path to `--command`.
@@ -78,22 +86,53 @@ Hermes registers tools using the `mcp_<server>_<tool>` pattern. With this config
 | --- | --- |
 | `doctor` | `mcp_computer_use_linux_doctor` |
 | `get_app_state` | `mcp_computer_use_linux_get_app_state` |
+| `find_element` | `mcp_computer_use_linux_find_element` |
+| `hybrid_strategy` | `mcp_computer_use_linux_hybrid_strategy` |
 | `list_windows` | `mcp_computer_use_linux_list_windows` |
 | `click` | `mcp_computer_use_linux_click` |
 | `type_text` | `mcp_computer_use_linux_type_text` |
+| `screenshot_debug` | `mcp_computer_use_linux_screenshot_debug` |
+| `get_clipboard` | `mcp_computer_use_linux_get_clipboard` |
+| `set_clipboard` | `mcp_computer_use_linux_set_clipboard` |
+| `start_recording` | `mcp_computer_use_linux_start_recording` |
+| `stop_recording` | `mcp_computer_use_linux_stop_recording` |
 
 Restart Hermes after changing MCP config.
 
+## Accessibility-First + Hybrid Decision Tree
+
+Follow this order on every desktop-control turn:
+
+1. **`doctor`**: confirm `can_build_accessibility_tree`, `can_query_windows`, and `can_send_development_input`.
+2. **`get_app_state`**: capture a bounded screenshot and a compacted AT-SPI tree, then cache the `@eN` refs from `element_index`.
+3. **`hybrid_strategy`**: read `recommended_strategy` (`find_element` returns the same recommendation in its `hybrid` field); when actionable nodes are sparse, enable hybrid fallback.
+4. **Target windows**: call `list_windows`, `focused_window`, or `activate_window` before keyboard input.
+5. **Prefer semantic refs**: call `find_element` with a description such as "save button", then `click` the returned `element_index`, or use role/name/text selectors.
+6. **Hybrid fallback**: when the AT-SPI tree is empty or a ref is stale (`STALE_REF`), use `screenshot` or `screenshot_debug` with `highlight_refs`, then send a coordinate `click` with `coordinate_width` / `coordinate_height` / `scale`.
+7. **Verify**: call `get_app_state` again after every mutating action.
+
+### Input fallback chain (automatic)
+
+1. AT-SPI `element_index` or semantic selector
+2. AT-SPI primary action (`perform_action`)
+3. uinput absolute pointer (exact screenshot pixels)
+4. Wayland remote desktop portal
+5. ydotool relative input
+
+In your reply, state which strategy succeeded so the user can debug permission or compositor issues.
+
 ## Procedure
 
 1. Start every desktop-control session with `doctor`.
 2. If `can_build_accessibility_tree` is false, run `setup` and restart the target app.
 3. If `can_query_windows` is false on GNOME Wayland, run `setup-window-targeting` and ask the user to log out and back in if setup says the shell extension needs a reload.
 4. Before targeted input, call `list_windows` or `focused_window` and verify the intended window by title, app id, pid, or wm class.
-5. Prefer semantic targeting from `get_app_state`: use element indices or role/name/text/states selectors.
-6. Use coordinates only when the UI surface has no useful accessibility tree.
-7. For text input, prefer `type_text` with a target selector (`window_id`, `pid`, `app_id`, `wm_class`, `title`, `tty`, `terminal_pid`, `terminal_command`, or `terminal_cwd`) rather than relying on current focus.
-8. After mutating actions, re-check state with `get_app_state`, `focused_window`, or an app-specific readback.
+5. Prefer semantic targeting: resolve natural-language descriptions with `find_element`, then use `element_index` or role/name/text/states selectors.
+6. Use coordinates only when the UI surface has no useful accessibility tree (hybrid mode).
+7. For text input, prefer `type_text` with a target selector rather than relying on current focus.
+8. Use `get_clipboard` / `set_clipboard` for paste-heavy workflows (wl-clipboard on Wayland, xclip or xsel on X11).
+9. Use `start_recording` / `stop_recording` to capture repeatable workflows; `stop_recording` also returns a skill skeleton you can export to Hermes.
+10. After mutating actions, re-check state with `get_app_state`, `focused_window`, or an app-specific readback.
 
 ## Pitfalls
 
@@ -103,6 +142,8 @@ Restart Hermes after changing MCP config.
 - `click`, `drag`, `press_key`, `type_text`, `perform_action`, and `set_value` can change real application state.
 - `ydotoold` should run as a per-user service with its socket under `/run/user/$UID`, not as a system-wide service.
 - On COSMIC, the standard npm, Cargo, and install-script paths install the `computer-use-linux-cosmic` helper automatically. Manual binary installs must copy both binaries.
+- Sway users need `swaymsg` on `PATH`; `doctor` reports which window backend is active.
+- OCR (`screenshot_debug` with `ocr=true`) requires `tesseract-ocr`; if the `tesseract` binary is missing, `ocr_regions` comes back empty instead of reporting an error.
 
 ## Verification
 
@@ -121,4 +162,4 @@ Ready output should have:
 - `can_send_development_input: true`
 - `blockers: []`
 
-If Hermes does not expose the tools, check startup logs for MCP discovery errors and confirm the server name in `config.yaml` is exactly `computer-use-linux`.
+If Hermes does not expose the tools, check startup logs for MCP discovery errors and confirm the server name in `config.yaml` is exactly `computer-use-linux`.
\ No newline at end of file
diff --git a/src/clipboard.rs b/src/clipboard.rs
new file mode 100644
index 0000000..61f5372
--- /dev/null
+++ b/src/clipboard.rs
@@ -0,0 +1,87 @@
+use anyhow::{bail, Context, Result};
+use schemars::JsonSchema;
+use serde::Serialize;
+use std::process::Command;
+
+#[derive(Debug, Clone, Serialize, JsonSchema)]
+pub struct ClipboardContents {
+    pub text: String,
+    pub backend: String,
+}
+
+pub fn get_clipboard() -> Result<ClipboardContents> {
+    if let Ok(text) = run_capture(&["wl-paste", "--no-newline"]) {
+        return Ok(ClipboardContents {
+            text,
+            backend: "wl-clipboard".to_string(),
+        });
+    }
+    if let Ok(text) = run_capture(&["xclip", "-selection", "clipboard", "-o"]) {
+        return Ok(ClipboardContents {
+            text,
+            backend: "xclip".to_string(),
+        });
+    }
+    if let Ok(text) = run_capture(&["xsel", "--clipboard", "--output"]) {
+        return Ok(ClipboardContents {
+            text,
+            backend: "xsel".to_string(),
+        });
+    }
+    bail!("clipboard read failed: install wl-clipboard (Wayland) or xclip/xsel (X11)")
+}
+
+pub fn set_clipboard(text: &str) -> Result<String> {
+    if run_paste_stdin(&["wl-copy"], text).is_ok() {
+        return Ok("wl-clipboard".to_string());
+    }
+    if run_paste_stdin(&["xclip", "-selection", "clipboard"], text).is_ok() {
+        return Ok("xclip".to_string());
+    }
+    if run_paste_stdin(&["xsel", "--clipboard", "--input"], text).is_ok() {
+        return Ok("xsel".to_string());
+    }
+    bail!("clipboard write failed: install wl-clipboard (Wayland) or xclip/xsel (X11)")
+}
+
+fn run_capture(command: &[&str]) -> Result<String> {
+    let (program, args) = command
+        .split_first()
+        .context("clipboard command must include a program")?;
+    let output = Command::new(program)
+        .args(args)
+        .output()
+        .with_context(|| format!("failed to run {program}"))?;
+    if !output.status.success() {
+        bail!(
+            "{program} failed: {}",
+            String::from_utf8_lossy(&output.stderr).trim()
+        );
+    }
+    Ok(String::from_utf8_lossy(&output.stdout).to_string())
+}
+
+fn run_paste_stdin(command: &[&str], text: &str) -> Result<()> {
+    let (program, args) = command
+        .split_first()
+        .context("clipboard command must include a program")?;
+    let mut child = Command::new(program)
+        .args(args)
+        .stdin(std::process::Stdio::piped())
+        .spawn()
+        .with_context(|| format!("failed to spawn {program}"))?;
+    if let Some(mut stdin) = child.stdin.take() {
+        use std::io::Write;
+        stdin
+            .write_all(text.as_bytes())
+            .with_context(|| format!("failed to write clipboard payload to {program}"))?;
+    }
+    let status = child
+        .wait()
+        .with_context(|| format!("failed waiting for {program}"))?;
+    if status.success() {
+        Ok(())
+    } else {
+        bail!("{program} exited with {status}")
+    }
+}
\ No newline at end of file
diff --git a/src/element_finder.rs b/src/element_finder.rs
new file mode 100644
index 0000000..df190ec
--- /dev/null
+++ b/src/element_finder.rs
@@ -0,0 +1,225 @@
+use crate::atspi_tree::AccessibilityNode;
+use schemars::JsonSchema;
+use serde::Serialize;
+
+#[derive(Debug, Clone, Serialize, JsonSchema)]
+pub struct FindElementMatch {
+    pub element_index: u32,
+    pub element_ref: String,
+    pub role: String,
+    pub name: Option<String>,
+    pub description: Option<String>,
+    pub score: f32,
+    pub matched_fields: Vec<String>,
+}
+
+#[derive(Debug, Clone, Serialize, JsonSchema)]
+pub struct FindElementResult {
+    pub description: String,
+    pub matches: Vec<FindElementMatch>,
+    pub best_match: Option<FindElementMatch>,
+    pub strategy: String,
+    pub explanation: String,
+}
+
+pub fn find_elements_by_description(
+    nodes: &[AccessibilityNode],
+    description: &str,
+    limit: usize,
+) -> FindElementResult {
+    let query_tokens = tokenize(description);
+    if query_tokens.is_empty() {
+        return FindElementResult {
+            description: description.to_string(),
+            matches: Vec::new(),
+            best_match: None,
+            strategy: "natural_language_token_match".to_string(),
+            explanation: "The description did not contain searchable tokens.".to_string(),
+        };
+    }
+
+    let mut matches: Vec<FindElementMatch> = nodes
+        .iter()
+        .filter_map(|node| score_node(node, &query_tokens))
+        .collect();
+    matches.sort_by(|left, right| {
+        right
+            .score
+            .partial_cmp(&left.score)
+            .unwrap_or(std::cmp::Ordering::Equal)
+            .then_with(|| left.element_index.cmp(&right.element_index))
+    });
+    matches.truncate(limit.max(1));
+
+    let best_match = matches.first().cloned();
+    let explanation = if let Some(best) = &best_match {
+        if best.score >= 0.75 {
+            format!(
+                "Matched {} ({}) with high confidence via {}.",
+                best.element_ref,
+                best.role,
+                best.matched_fields.join(", ")
+            )
+        } else if best.score >= 0.4 {
+            format!(
+                "Matched {} ({}) with moderate confidence; verify with get_app_state before acting.",
+                best.element_ref,
+                best.role
+            )
+        } else {
+            "Low-confidence match; consider hybrid coordinate fallback or a more specific description."
+                .to_string()
+        }
+    } else {
+        "No accessibility node matched the description. Use hybrid mode: screenshot + coordinates, or refine the query."
+            .to_string()
+    };
+
+    FindElementResult {
+        description: description.to_string(),
+        matches,
+        best_match,
+        strategy: "natural_language_token_match".to_string(),
+        explanation,
+    }
+}
+
+fn score_node(node: &AccessibilityNode, query_tokens: &[String]) -> Option<FindElementMatch> {
+    let role = normalize(&node.role);
+    let name = node.name.as_deref().map(normalize).unwrap_or_default();
+    let description = node
+        .description
+        .as_deref()
+        .map(normalize)
+        .unwrap_or_default();
+    let text = node
+        .text
+        .as_ref()
+        .and_then(|value| value.content.as_deref())
+        .map(normalize)
+        .unwrap_or_default();
+
+    let mut matched_fields = Vec::new();
+    let mut score = 0.0f32;
+    for token in query_tokens {
+        let mut token_score = 0.0f32;
+        if name.contains(token) {
+            token_score = token_score.max(1.0);
+            if !matched_fields.contains(&"name".to_string()) {
+                matched_fields.push("name".to_string());
+            }
+        }
+        if description.contains(token) {
+            token_score = token_score.max(0.85);
+            if !matched_fields.contains(&"description".to_string()) {
+                matched_fields.push("description".to_string());
+            }
+        }
+        if text.contains(token) {
+            token_score = token_score.max(0.8);
+            if !matched_fields.contains(&"text".to_string()) {
+                matched_fields.push("text".to_string());
+            }
+        }
+        if role.contains(token) {
+            token_score = token_score.max(0.7);
+            if !matched_fields.contains(&"role".to_string()) {
+                matched_fields.push("role".to_string());
+            }
+        }
+        score += token_score;
+    }
+
+    if matched_fields.is_empty() {
+        return None;
+    }
+
+    let normalized_score = score / query_tokens.len() as f32;
+    let actionable_bonus = if node.bounds.is_some() || !node.actions.is_empty() {
+        0.05
+    } else {
+        0.0
+    };
+    let showing_bonus = if node
+        .states
+        .iter()
+        .any(|state| normalize(state) == "showing" || normalize(state) == "visible")
+    {
+        0.05
+    } else {
+        0.0
+    };
+
+    Some(FindElementMatch {
+        element_index: node.index,
+        element_ref: format!("@e{}", node.index),
+        role: node.role.clone(),
+        name: node.name.clone(),
+        description: node.description.clone(),
+        score: (normalized_score + actionable_bonus + showing_bonus).min(1.0),
+        matched_fields,
+    })
+}
+
+fn tokenize(description: &str) -> Vec<String> {
+    description
+        .split(|character: char| !character.is_alphanumeric())
+        .map(normalize)
+        .filter(|token| token.len() >= 2)
+        .collect()
+}
+
+fn normalize(value: &str) -> String {
+    value
+        .split_whitespace()
+        .collect::<Vec<_>>()
+        .join(" ")
+        .to_lowercase()
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::atspi_tree::{AccessibilityAction, Bounds};
+
+    fn node(index: u32, role: &str, name: &str) -> AccessibilityNode {
+        AccessibilityNode {
+            index,
+            parent_index: None,
+            depth: 0,
+            object_ref: format!("app:/node/{index}"),
+            role: role.to_string(),
+            name: Some(name.to_string()),
+            description: None,
+            child_count: 0,
+            bounds: Some(Bounds {
+                x: 0,
+                y: 0,
+                width: 10,
+                height: 10,
+            }),
+            states: vec!["showing".to_string(), "visible".to_string()],
+            actions: vec![AccessibilityAction {
+                index: 0,
+                name: "click".to_string(),
+                description: String::new(),
+                keybinding: String::new(),
+            }],
+            value: None,
+            text: None,
+            supports_editable_text: false,
+        }
+    }
+
+    #[test]
+    fn finds_save_button_by_natural_language() {
+        let nodes = vec![
+            node(1, "push button", "Cancel"),
+            node(2, "push button", "Save"),
+        ];
+        let result = find_elements_by_description(&nodes, "the save button in the toolbar", 5);
+        let best = result.best_match.expect("best match");
+        assert_eq!(best.element_index, 2);
+        assert_eq!(best.element_ref, "@e2");
+    }
+}
\ No newline at end of file
diff --git a/src/hybrid.rs b/src/hybrid.rs
new file mode 100644
index 0000000..8a60728
--- /dev/null
+++ b/src/hybrid.rs
@@ -0,0 +1,85 @@
+use crate::atspi_tree::AccessibilityNode;
+use schemars::JsonSchema;
+use serde::Serialize;
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, JsonSchema)]
+#[serde(rename_all = "snake_case")]
+pub enum InputStrategy {
+    AccessibilityRef,
+    AccessibilityAction,
+    CoordinateClick,
+    HybridFallback,
+}
+
+#[derive(Debug, Clone, Serialize, JsonSchema)]
+pub struct HybridRecommendation {
+    pub hybrid_mode_enabled: bool,
+    pub tree_node_count: usize,
+    pub actionable_node_count: usize,
+    pub recommended_strategy: InputStrategy,
+    pub explanation: String,
+    pub fallback_chain: Vec<String>,
+}
+
+pub fn hybrid_mode_enabled() -> bool {
+    std::env::var("COMPUTER_USE_LINUX_HYBRID")
+        .ok()
+        .or_else(|| std::env::var("CU_HYBRID").ok())
+        .map(|value| matches!(value.trim(), "1" | "true" | "yes" | "on"))
+        .unwrap_or(false)
+}
+
+pub fn recommend_strategy(nodes: &[AccessibilityNode]) -> HybridRecommendation {
+    let actionable_node_count = nodes
+        .iter()
+        .filter(|node| {
+            node.bounds.is_some()
+                || !node.actions.is_empty()
+                || node.supports_editable_text
+                || node.value.is_some()
+        })
+        .count();
+
+    let hybrid_enabled = hybrid_mode_enabled();
+    let (recommended_strategy, explanation) = if actionable_node_count >= 3 {
+        (
+            InputStrategy::AccessibilityRef,
+            "Accessibility tree has actionable nodes; prefer @eN refs and semantic selectors.".to_string(),
+        )
+    } else if actionable_node_count > 0 && hybrid_enabled {
+        (
+            InputStrategy::HybridFallback,
+            "Sparse accessibility tree with hybrid mode enabled: try @eN refs first, then coordinate clicks from a bounded screenshot.".to_string(),
+        )
+    } else if actionable_node_count > 0 {
+        (
+            InputStrategy::AccessibilityRef,
+            "Limited actionable nodes; use precise role/name selectors and verify with get_app_state after each action.".to_string(),
+        )
+    } else if hybrid_enabled {
+        (
+            InputStrategy::CoordinateClick,
+            "No actionable accessibility nodes; hybrid mode routes to screenshot coordinates via portal/ydotool.".to_string(),
+        )
+    } else {
+        (
+            InputStrategy::HybridFallback,
+            "No actionable nodes detected. Enable hybrid mode (COMPUTER_USE_LINUX_HYBRID=1) or pass coordinates from screenshot metadata.".to_string(),
+        )
+    };
+
+    HybridRecommendation {
+        hybrid_mode_enabled: hybrid_enabled,
+        tree_node_count: nodes.len(),
+        actionable_node_count,
+        recommended_strategy,
+        explanation,
+        fallback_chain: vec![
+            "AT-SPI element_index / semantic selector".to_string(),
+            "AT-SPI primary action (perform_action)".to_string(),
+            "uinput absolute pointer coordinate click".to_string(),
+            "Wayland remote desktop portal".to_string(),
+            "ydotool relative input".to_string(),
+        ],
+    }
+}
\ No newline at end of file
diff --git a/src/macro_recording.rs b/src/macro_recording.rs
new file mode 100644
index 0000000..64c8772
--- /dev/null
+++ b/src/macro_recording.rs
@@ -0,0 +1,101 @@
+use schemars::JsonSchema;
+use serde::{Deserialize, Serialize};
+use std::sync::{Arc, Mutex};
+use std::time::{SystemTime, UNIX_EPOCH};
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
+pub struct MacroStep {
+    pub timestamp_ms: u64,
+    pub tool: String,
+    pub params: serde_json::Value,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
+pub struct RecordedMacro {
+    pub name: Option<String>,
+    pub started_at_ms: u64,
+    pub stopped_at_ms: u64,
+    pub steps: Vec<MacroStep>,
+}
+
+#[derive(Debug, Default, Clone)]
+pub struct MacroRecorder {
+    inner: Arc<Mutex<MacroRecorderState>>,
+}
+
+#[derive(Debug, Default)]
+struct MacroRecorderState {
+    recording: bool,
+    name: Option<String>,
+    started_at_ms: u64,
+    steps: Vec<MacroStep>,
+}
+
+impl MacroRecorder {
+    pub fn start(&self, name: Option<String>) -> String {
+        let mut state = self.inner.lock().expect("macro recorder lock");
+        state.recording = true;
+        state.name = name;
+        state.started_at_ms = now_ms();
+        state.steps.clear();
+        "Macro recording started. Actions routed through record-capable tools will be captured."
+            .to_string()
+    }
+
+    pub fn stop(&self) -> RecordedMacro {
+        let mut state = self.inner.lock().expect("macro recorder lock");
+        state.recording = false;
+        RecordedMacro {
+            name: state.name.clone(),
+            started_at_ms: state.started_at_ms,
+            stopped_at_ms: now_ms(),
+            steps: std::mem::take(&mut state.steps),
+        }
+    }
+
+    pub fn is_recording(&self) -> bool {
+        self.inner
+            .lock()
+            .map(|state| state.recording)
+            .unwrap_or(false)
+    }
+
+    pub fn record_step(&self, tool: &str, params: serde_json::Value) {
+        let Ok(mut state) = self.inner.lock() else {
+            return;
+        };
+        if !state.recording {
+            return;
+        }
+        state.steps.push(MacroStep {
+            timestamp_ms: now_ms(),
+            tool: tool.to_string(),
+            params,
+        });
+    }
+
+    pub fn export_skill_skeleton(macro_data: &RecordedMacro) -> String {
+        let steps = macro_data
+            .steps
+            .iter()
+            .map(|step| {
+                format!(
+                    "- Call `{}` with `{}`",
+                    step.tool,
+                    step.params.to_string().replace('\n', " ")
+                )
+            })
+            .collect::<Vec<_>>()
+            .join("\n");
+        format!(
+            "# Recorded desktop workflow\n\nReplay the following Computer Use sequence:\n\n{steps}\n"
+        )
+    }
+}
+
+fn now_ms() -> u64 {
+    SystemTime::now()
+        .duration_since(UNIX_EPOCH)
+        .map(|duration| duration.as_millis() as u64)
+        .unwrap_or(0)
+}
\ No newline at end of file
diff --git a/src/main.rs b/src/main.rs
index b69940a..51643b7 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -7,14 +7,19 @@ static GLOBAL: MiMalloc = MiMalloc;
 
 mod abs_pointer;
 mod atspi_tree;
+mod clipboard;
 mod cosmic_helper;
 mod diagnostics;
+mod element_finder;
 mod gnome_extension;
+mod hybrid;
 mod identity;
+mod macro_recording;
 mod remote_desktop;
 mod screenshot;
 mod server;
 mod terminal;
+mod visual_debug;
 mod windowing;
 mod windows;
 
diff --git a/src/server.rs b/src/server.rs
index 33b72ee..76c1ae7 100644
--- a/src/server.rs
+++ b/src/server.rs
@@ -3,6 +3,11 @@ use crate::atspi_tree::{
     snapshot_tree, AccessibilityAction, AccessibilityNode, AccessibleAppSummary, Bounds,
     ValueSetInvocation,
 };
+use crate::clipboard::{get_clipboard, set_clipboard};
+use crate::element_finder::{find_elements_by_description, FindElementResult};
+use crate::hybrid::{hybrid_mode_enabled, recommend_strategy, HybridRecommendation};
+use crate::macro_recording::{MacroRecorder, RecordedMacro};
+use crate::visual_debug::{highlight_element_refs, ocr_png_regions, OcrRegion};
 use crate::diagnostics::{doctor_report, setup_accessibility_report, DoctorReport, SetupReport};
 use crate::gnome_extension::{setup_window_targeting_report, WindowTargetingSetupReport};
 use crate::remote_desktop::{
@@ -40,13 +45,26 @@ use std::{
     time::Duration,
 };
 
-#[derive(Clone, Default)]
+#[derive(Clone)]
 pub struct ComputerUseLinux {
     last_nodes: Arc<Mutex<Vec<AccessibilityNode>>>,
     portal_pointer_session: Arc>>,
     portal_keyboard_session: Arc>>,
     /// Lazily-created uinput absolute pointer (preferred coordinate backend).
     abs_pointer: Arc>>,
+    macro_recorder: MacroRecorder,
+}
+
+impl Default for ComputerUseLinux {
+    fn default() -> Self {
+        Self {
+            last_nodes: Arc::new(Mutex::new(Vec::new())),
+            portal_pointer_session: Arc::new(Mutex::new(None)),
+            portal_keyboard_session: Arc::new(Mutex::new(None)),
+            abs_pointer: Arc::new(Mutex::new(None)),
+            macro_recorder: MacroRecorder::default(),
+        }
+    }
 }
 
 #[tool_router]
@@ -1034,6 +1052,271 @@ impl ComputerUseLinux {
             focus,
         ))
     }
+
+    #[tool(
+        name = "find_element",
+        description = "Find the best accessibility element ref (@eN) for a natural-language description using the cached get_app_state tree or a fresh snapshot.",
+        annotations(
+            read_only_hint = true,
+            destructive_hint = false,
+            idempotent_hint = true,
+            open_world_hint = true
+        )
+    )]
+    async fn find_element(
+        &self,
+        Parameters(params): Parameters<FindElementParams>,
+    ) -> Json<FindElementOutput> {
+        let limit = params.limit.unwrap_or(5).clamp(1, 20);
+        let (nodes, refreshed) = if params.refresh_tree.unwrap_or(false) {
+            let max_nodes = params.max_nodes.unwrap_or(120).clamp(1, 500);
+            let max_depth = params.max_depth.unwrap_or(12).min(12);
+            match snapshot_tree(
+                params.app_name_or_bundle_identifier.as_deref(),
+                params.pid,
+                max_nodes,
+                max_depth,
+            )
+            .await
+            {
+                Ok(nodes) => {
+                    self.cache_nodes(&nodes);
+                    (nodes, true)
+                }
+                Err(error) => {
+                    return Json(FindElementOutput {
+                        result: FindElementResult {
+                            description: params.description.clone(),
+                            matches: Vec::new(),
+                            best_match: None,
+                            strategy: "natural_language_token_match".to_string(),
+                            explanation: format!("Failed to refresh accessibility tree: {error:#}"),
+                        },
+                        hybrid: recommend_strategy(&[]),
+                        refreshed_tree: false,
+                        error: Some(error.to_string()),
+                    });
+                }
+            }
+        } else {
+            (self.cached_nodes(), false)
+        };
+
+        let result = find_elements_by_description(&nodes, &params.description, limit);
+        let hybrid = recommend_strategy(&nodes);
+        Json(FindElementOutput {
+            result,
+            hybrid,
+            refreshed_tree: refreshed,
+            error: None,
+        })
+    }
+
+    #[tool(
+        name = "hybrid_strategy",
+        description = "Report the recommended accessibility-first vs coordinate-fallback strategy for the current cached tree and hybrid mode setting.",
+        annotations(
+            read_only_hint = true,
+            destructive_hint = false,
+            idempotent_hint = true,
+            open_world_hint = false
+        )
+    )]
+    fn hybrid_strategy(&self) -> Json<HybridRecommendation> {
+        Json(recommend_strategy(&self.cached_nodes()))
+    }
+
+    #[tool(
+        name = "get_clipboard",
+        description = "Read the current desktop clipboard text via wl-clipboard or xclip/xsel.",
+        annotations(
+            read_only_hint = true,
+            destructive_hint = false,
+            idempotent_hint = true,
+            open_world_hint = true
+        )
+    )]
+    fn get_clipboard_tool(&self) -> Json<ClipboardOutput> {
+        match get_clipboard() {
+            Ok(contents) => Json(ClipboardOutput {
+                ok: true,
+                text: Some(contents.text),
+                backend: Some(contents.backend),
+                error: None,
+            }),
+            Err(error) => Json(ClipboardOutput {
+                ok: false,
+                text: None,
+                backend: None,
+                error: Some(error.to_string()),
+            }),
+        }
+    }
+
+    #[tool(
+        name = "set_clipboard",
+        description = "Write text to the desktop clipboard via wl-clipboard or xclip/xsel.",
+        annotations(
+            read_only_hint = false,
+            destructive_hint = false,
+            idempotent_hint = true,
+            open_world_hint = true
+        )
+    )]
+    fn set_clipboard_tool(
+        &self,
+        Parameters(params): Parameters<SetClipboardParams>,
+    ) -> Json<ClipboardOutput> {
+        match set_clipboard(&params.text) {
+            Ok(backend) => Json(ClipboardOutput {
+                ok: true,
+                text: Some(params.text),
+                backend: Some(backend),
+                error: None,
+            }),
+            Err(error) => Json(ClipboardOutput {
+                ok: false,
+                text: None,
+                backend: None,
+                error: Some(error.to_string()),
+            }),
+        }
+    }
+
+    #[tool(
+        name = "start_recording",
+        description = "Start recording desktop MCP actions for macro replay or Hermes skill export.",
+        annotations(
+            read_only_hint = false,
+            destructive_hint = false,
+            idempotent_hint = false,
+            open_world_hint = false
+        )
+    )]
+    fn start_recording(
+        &self,
+        Parameters(params): Parameters<StartRecordingParams>,
+    ) -> Json<RecordingOutput> {
+        let message = self.macro_recorder.start(params.name.clone());
+        Json(RecordingOutput {
+            recording: true,
+            message,
+            macro_data: None,
+            skill_skeleton: None,
+        })
+    }
+
+    #[tool(
+        name = "stop_recording",
+        description = "Stop macro recording and return the captured JSON workflow.",
+        annotations(
+            read_only_hint = false,
+            destructive_hint = false,
+            idempotent_hint = false,
+            open_world_hint = false
+        )
+    )]
+    fn stop_recording(&self) -> Json<RecordingOutput> {
+        let macro_data = self.macro_recorder.stop();
+        let skill_skeleton = MacroRecorder::export_skill_skeleton(&macro_data);
+        Json(RecordingOutput {
+            recording: false,
+            message: format!(
+                "Captured {} macro steps.",
+                macro_data.steps.len()
+            ),
+            macro_data: Some(macro_data),
+            skill_skeleton: Some(skill_skeleton),
+        })
+    }
+
+    #[tool(
+        name = "replay_macro",
+        description = "Replay a previously recorded macro JSON workflow. Returns the steps for the host to execute in order.",
+        annotations(
+            read_only_hint = false,
+            destructive_hint = true,
+            idempotent_hint = false,
+            open_world_hint = true
+        )
+    )]
+    fn replay_macro(
+        &self,
+        Parameters(params): Parameters<ReplayMacroParams>,
+    ) -> Json<ReplayMacroOutput> {
+        Json(ReplayMacroOutput {
+            ok: true,
+            steps: params.macro_data.steps.clone(),
+            message: format!(
+                "Replay {} recorded steps through the corresponding MCP tools in order.",
+                params.macro_data.steps.len()
+            ),
+        })
+    }
+
+    #[tool(
+        name = "screenshot_debug",
+        description = "Capture a screenshot with optional OCR text extraction and @eN element bounding-box highlights from the cached accessibility tree.",
+        annotations(
+            read_only_hint = true,
+            destructive_hint = false,
+            idempotent_hint = false,
+            open_world_hint = true
+        )
+    )]
+    async fn screenshot_debug(
+        &self,
+        Parameters(params): Parameters<ScreenshotDebugParams>,
+    ) -> Result<CallToolResult, ErrorData> {
+        let raw_capture = capture_screenshot_raw()
+            .await
+            .map_err(|error| ErrorData::internal_error(format!("screenshot failed: {error}"), None))?;
+        let screenshot_bytes = raw_capture.bytes.clone();
+        let mut contents = Vec::new();
+        let caption = serde_json::json!({
+            "source": raw_capture.source,
+            "width": raw_capture.width,
+            "height": raw_capture.height,
+            "hybrid_mode_enabled": hybrid_mode_enabled(),
+        });
+
+        if params.highlight_refs.unwrap_or(true) {
+            let highlighted = highlight_element_refs(
+                &raw_capture.bytes,
+                &self.cached_nodes(),
+                params.max_highlights.unwrap_or(40).clamp(1, 120) as usize,
+            )
+            .map_err(|error| ErrorData::internal_error(format!("highlight failed: {error}"), None))?;
+            contents.push(Content::image(
+                data_url_payload(&highlighted.data_url),
+                highlighted.mime_type,
+            ));
+            contents.push(Content::text(
+                serde_json::json!({
+                    "highlighted_count": highlighted.highlighted_count,
+                    "caption": caption,
+                })
+                .to_string(),
+            ));
+        } else {
+            let capture = prepare_screenshot_payload(raw_capture, ScreenshotPayloadOptions::default())
+                .map_err(|error| ErrorData::internal_error(format!("screenshot resize failed: {error}"), None))?;
+            contents.push(Content::image(
+                data_url_payload(&capture.data_url),
+                capture.mime_type,
+            ));
+            contents.push(Content::text(caption.to_string()));
+        }
+
+        if params.ocr.unwrap_or(false) {
+            let ocr: Vec<OcrRegion> = ocr_png_regions(&screenshot_bytes).unwrap_or_default();
+            contents.push(Content::text(
+                serde_json::json!({ "ocr_regions": ocr }).to_string(),
+            ));
+        }
+
+        Ok(CallToolResult::success(contents))
+    }
 }
 
 #[tool_handler(
@@ -1553,6 +1836,80 @@ struct ActionOutput {
     received: Option,
 }
 
+#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema)]
+struct FindElementParams {
+    description: String,
+    #[serde(default)]
+    refresh_tree: Option<bool>,
+    #[serde(default)]
+    app_name_or_bundle_identifier: Option<String>,
+    #[serde(default)]
+    pid: Option,
+    #[serde(default)]
+    max_nodes: Option,
+    #[serde(default)]
+    max_depth: Option,
+    #[serde(default)]
+    limit: Option<usize>,
+}
+
+#[derive(Debug, Clone, Serialize, JsonSchema)]
+struct FindElementOutput {
+    result: FindElementResult,
+    hybrid: HybridRecommendation,
+    refreshed_tree: bool,
+    error: Option<String>,
+}
+
+#[derive(Debug, Clone, Serialize, JsonSchema)]
+struct ClipboardOutput {
+    ok: bool,
+    text: Option<String>,
+    backend: Option<String>,
+    error: Option<String>,
+}
+
+#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema)]
+struct SetClipboardParams {
+    text: String,
+}
+
+#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema)]
+struct StartRecordingParams {
+    #[serde(default)]
+    name: Option<String>,
+}
+
+#[derive(Debug, Clone, Serialize, JsonSchema)]
+struct RecordingOutput {
+    recording: bool,
+    message: String,
+    macro_data: Option<RecordedMacro>,
+    skill_skeleton: Option<String>,
+}
+
+#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema)]
+struct ReplayMacroParams {
+    macro_data: RecordedMacro,
+}
+
+#[derive(Debug, Clone, Serialize, JsonSchema)]
+struct ReplayMacroOutput {
+    ok: bool,
+    steps: Vec<crate::macro_recording::MacroStep>,
+    message: String,
+}
+
+#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema)]
+struct ScreenshotDebugParams {
+    #[serde(default)]
+    highlight_refs: Option<bool>,
+    #[serde(default)]
+    ocr: Option<bool>,
+    #[serde(default)]
+    max_highlights: Option,
+}
+
 impl ComputerUseLinux {
     fn is_wayland_session(&self) -> bool {
         crate::diagnostics::hydrate_session_bus_env();
@@ -1762,6 +2119,13 @@ impl ComputerUseLinux {
         }
     }
 
+    fn cached_nodes(&self) -> Vec<AccessibilityNode> {
+        self.last_nodes
+            .lock()
+            .map(|cached| cached.clone())
+            .unwrap_or_default()
+    }
+
     fn resolve_optional_target_point(
         &self,
         x: Option,
diff --git a/src/visual_debug.rs b/src/visual_debug.rs
new file mode 100644
index 0000000..0b81590
--- /dev/null
+++ b/src/visual_debug.rs
@@ -0,0 +1,141 @@
+use crate::atspi_tree::{AccessibilityNode, Bounds};
+use anyhow::{bail, Context, Result};
+use base64::{engine::general_purpose::STANDARD, Engine};
+use image::{Rgba, RgbaImage};
+use schemars::JsonSchema;
+use serde::Serialize;
+use std::process::{Command, Stdio};
+
+#[derive(Debug, Clone, Serialize, JsonSchema)]
+pub struct OcrRegion {
+    pub text: String,
+    pub confidence: Option,
+    pub bounds: Bounds,
+}
+
+#[derive(Debug, Clone, Serialize, JsonSchema)]
+pub struct HighlightedScreenshot {
+    pub mime_type: String,
+    pub data_url: String,
+    pub highlighted_count: usize,
+    pub width: u32,
+    pub height: u32,
+}
+
+pub fn highlight_element_refs(
+    png_bytes: &[u8],
+    nodes: &[AccessibilityNode],
+    max_labels: usize,
+) -> Result<HighlightedScreenshot> {
+    let mut image = image::load_from_memory(png_bytes)
+        .context("failed to decode screenshot for highlighting")?
+        .to_rgba8();
+    let (width, height) = image.dimensions();
+    let mut highlighted_count = 0usize;
+
+    for node in nodes.iter().take(max_labels) {
+        let Some(bounds) = node.bounds.as_ref() else {
+            continue;
+        };
+        if bounds.width <= 0 || bounds.height <= 0 {
+            continue;
+        }
+        draw_hollow_rect(
+            &mut image,
+            bounds.x,
+            bounds.y,
+            bounds.width as u32,
+            bounds.height as u32,
+            Rgba([255, 64, 64, 220]),
+        );
+        highlighted_count += 1;
+    }
+
+    let mut encoded = Vec::new();
+    image
+        .write_to(
+            &mut std::io::Cursor::new(&mut encoded),
+            image::ImageFormat::Png,
+        )
+        .context("failed to encode highlighted screenshot")?;
+    let data_url = format!("data:image/png;base64,{}", STANDARD.encode(&encoded));
+    Ok(HighlightedScreenshot {
+        mime_type: "image/png".to_string(),
+        data_url,
+        highlighted_count,
+        width,
+        height,
+    })
+}
+
+fn draw_hollow_rect(image: &mut RgbaImage, x: i32, y: i32, width: u32, height: u32, color: Rgba<u8>) {
+    let image_width = image.width() as i32;
+    let image_height = image.height() as i32;
+    let left = x.max(0);
+    let top = y.max(0);
+    let right = (x + width as i32).min(image_width);
+    let bottom = (y + height as i32).min(image_height);
+    if left >= right || top >= bottom {
+        return;
+    }
+    for px in left..right {
+        if top < image_height {
+            image.put_pixel(px as u32, top as u32, color);
+        }
+        if bottom - 1 < image_height {
+            image.put_pixel(px as u32, (bottom - 1) as u32, color);
+        }
+    }
+    for py in top..bottom {
+        if left < image_width {
+            image.put_pixel(left as u32, py as u32, color);
+        }
+        if right - 1 < image_width {
+            image.put_pixel((right - 1) as u32, py as u32, color);
+        }
+    }
+}
+
+pub fn ocr_png_regions(png_bytes: &[u8]) -> Result<Vec<OcrRegion>> {
+    let tesseract = Command::new("tesseract")
+        .arg("stdin")
+        .arg("stdout")
+        .arg("-l")
+        .arg("eng")
+        .stdin(Stdio::piped())
+        .stdout(Stdio::piped())
+        .stderr(Stdio::piped())
+        .spawn();
+    let Ok(mut child) = tesseract else {
+        bail!("tesseract is not installed; install tesseract-ocr for OCR support");
+    };
+    if let Some(mut stdin) = child.stdin.take() {
+        use std::io::Write;
+        stdin
+            .write_all(png_bytes)
+            .context("failed to write screenshot to tesseract stdin")?;
+    }
+    let output = child
+        .wait_with_output()
+        .context("failed waiting for tesseract")?;
+    if !output.status.success() {
+        bail!(
+            "tesseract failed: {}",
+            String::from_utf8_lossy(&output.stderr).trim()
+        );
+    }
+    let text = String::from_utf8_lossy(&output.stdout).trim().to_string();
+    if text.is_empty() {
+        return Ok(Vec::new());
+    }
+    Ok(vec![OcrRegion {
+        text,
+        confidence: None,
+        bounds: Bounds {
+            x: 0,
+            y: 0,
+            width: 0,
+            height: 0,
+        },
+    }])
+}
\ No newline at end of file
diff --git a/src/windowing/backends/mod.rs b/src/windowing/backends/mod.rs
index 25cbff7..402591a 100644
--- a/src/windowing/backends/mod.rs
+++ b/src/windowing/backends/mod.rs
@@ -3,3 +3,4 @@ pub mod gnome;
 pub mod hyprland;
 pub mod i3;
 pub mod kwin;
+pub mod sway;
diff --git a/src/windowing/backends/sway.rs b/src/windowing/backends/sway.rs
new file mode 100644
index 0000000..4f3f847
--- /dev/null
+++ b/src/windowing/backends/sway.rs
@@ -0,0 +1,359 @@
+use crate::terminal::enrich_terminal_windows;
+use crate::windowing::registry::BackendProbe;
+use crate::windowing::types::{WindowBounds, WindowInfo};
+use anyhow::{bail, Context, Result};
+use serde::Deserialize;
+use std::{env, fs, os::unix::fs::FileTypeExt, path::PathBuf, process::Command};
+
+pub const SWAY_BACKEND: &str = "sway";
+
+pub fn probe() -> BackendProbe {
+    match sway_msg_command().args(["-t", "get_tree"]).output() {
+        Ok(output) if output.status.success() => {
+            let stdout = String::from_utf8_lossy(&output.stdout);
+            let ok = matches!(
+                serde_json::from_str::<serde_json::Value>(&stdout),
+                Ok(serde_json::Value::Object(_))
+            );
+            BackendProbe {
+                id: SWAY_BACKEND,
+                ok,
+                can_list_windows: ok,
+                can_focus_apps: ok,
+                can_focus_windows: ok,
+                detail: if ok {
+                    "swaymsg -t get_tree returned a JSON tree".to_string()
+                } else {
+                    "swaymsg -t get_tree did not return a JSON object".to_string()
+                },
+            }
+        }
+        Ok(output) => {
+            let stderr = String::from_utf8_lossy(&output.stderr).trim().to_string();
+            let stdout = String::from_utf8_lossy(&output.stdout).trim().to_string();
+            BackendProbe {
+                id: SWAY_BACKEND,
+                ok: false,
+                can_list_windows: false,
+                can_focus_apps: false,
+                can_focus_windows: false,
+                detail: if stderr.is_empty() { stdout } else { stderr },
+            }
+        }
+        Err(error) => BackendProbe {
+            id: SWAY_BACKEND,
+            ok: false,
+            can_list_windows: false,
+            can_focus_apps: false,
+            can_focus_windows: false,
+            detail: error.to_string(),
+        },
+    }
+}
+
+pub fn list_windows() -> Result<Vec<WindowInfo>> {
+    let output = sway_msg_command()
+        .args(["-t", "get_tree"])
+        .output()
+        .context("failed to run swaymsg -t get_tree")?;
+    if !output.status.success() {
+        bail!(
+            "swaymsg -t get_tree failed: {}",
+            String::from_utf8_lossy(&output.stderr).trim()
+        );
+    }
+
+    let mut windows = parse_sway_tree(&String::from_utf8_lossy(&output.stdout))?;
+    hydrate_sway_window_pids(&mut windows);
+    enrich_terminal_windows(&mut windows);
+    Ok(windows)
+}
+
+pub(crate) fn parse_sway_tree(json: &str) -> Result<Vec<WindowInfo>> {
+    let root: SwayNode =
+        serde_json::from_str(json).context("failed to parse swaymsg get_tree output")?;
+    let mut windows = Vec::new();
+    collect_sway_windows(&root, None, false, &mut windows);
+    windows.sort_by_key(|window| window.window_id);
+    Ok(windows)
+}
+
+pub fn activate_window(window_id: u64) -> Result<()> {
+    let selector = format!("[con_id={window_id}] focus");
+    let output = sway_msg_command()
+        .arg(&selector)
+        .output()
+        .with_context(|| format!("failed to run
swaymsg {selector}"))?; + if !output.status.success() { + bail!( + "swaymsg {selector} failed: {}", + String::from_utf8_lossy(&output.stderr).trim() + ); + } + + let replies: Vec = + serde_json::from_slice(&output.stdout).context("failed to parse swaymsg focus reply")?; + if replies.iter().all(|reply| reply.success) { + Ok(()) + } else { + let details = replies + .into_iter() + .filter_map(|reply| reply.error) + .collect::>() + .join("; "); + bail!( + "swaymsg {selector} did not focus the window: {}", + if details.is_empty() { + "unknown sway failure" + } else { + details.as_str() + } + ); + } +} + +fn collect_sway_windows( + node: &SwayNode, + workspace: Option, + in_dockarea: bool, + windows: &mut Vec, +) { + let node_type = node.node_type.as_deref(); + let current_workspace = if node_type == Some("workspace") { + node.num + } else { + workspace + }; + let current_in_dockarea = in_dockarea || node_type == Some("dockarea"); + + if let Some(window) = node.to_window_info(current_workspace, current_in_dockarea) { + windows.push(window); + } + + for child in &node.nodes { + collect_sway_windows(child, current_workspace, current_in_dockarea, windows); + } + for child in &node.floating_nodes { + collect_sway_windows(child, current_workspace, current_in_dockarea, windows); + } +} + +fn hydrate_sway_window_pids(windows: &mut [WindowInfo]) { + for window in windows { + if window.pid.is_none() { + if let Some(client_type) = window.client_type.as_deref() { + if client_type == "x11" { + window.pid = sway_x11_window_pid(window.window_id); + } + } + } + } +} + +fn sway_x11_window_pid(window_id: u64) -> Option { + let output = Command::new("xprop") + .args(["-id", &window_id.to_string(), "_NET_WM_PID"]) + .output() + .ok()?; + if !output.status.success() { + return None; + } + crate::windowing::backends::i3::parse_xprop_pid(&String::from_utf8_lossy(&output.stdout)) +} + +fn sway_msg_command() -> Command { + let mut command = Command::new("swaymsg"); + if let Some(socket_path) = 
sway_socket_path() { + command.arg("-s").arg(socket_path); + } + command +} + +fn sway_socket_path() -> Option { + if let Some(value) = env_var("SWAYSOCK") { + return Some(PathBuf::from(value)); + } + + let socket_dir = xdg_runtime_dir()?; + let mut sockets = fs::read_dir(socket_dir) + .ok()? + .filter_map(|entry| entry.ok()) + .filter_map(|entry| { + let file_name = entry.file_name(); + let file_name = file_name.to_str()?; + if !file_name.starts_with("sway-ipc.") || !file_name.ends_with(".sock") { + return None; + } + let metadata = entry.metadata().ok()?; + if !metadata.file_type().is_socket() { + return None; + } + let modified = metadata.modified().ok(); + Some((modified, entry.path())) + }) + .collect::>(); + sockets.sort_by_key(|(modified, _)| std::cmp::Reverse(*modified)); + sockets.into_iter().map(|(_, path)| path).next() +} + +fn xdg_runtime_dir() -> Option { + env_var("XDG_RUNTIME_DIR").map(PathBuf::from) +} + +fn env_var(name: &str) -> Option { + env::var(name) + .ok() + .map(|value| value.trim().to_string()) + .filter(|value| !value.is_empty()) +} + +fn clean_string(value: Option<&str>) -> Option { + value + .map(str::trim) + .filter(|value| !value.is_empty() && *value != "null") + .map(ToOwned::to_owned) +} + +#[derive(Debug, Deserialize)] +struct SwayCommandReply { + success: bool, + error: Option, +} + +#[derive(Debug, Deserialize)] +struct SwayNode { + id: Option, + #[serde(rename = "type")] + node_type: Option, + name: Option, + window: Option, + window_type: Option, + app_id: Option, + window_properties: Option, + rect: Option, + geometry: Option, + #[serde(default)] + focused: bool, + #[serde(default)] + nodes: Vec, + #[serde(default)] + floating_nodes: Vec, + num: Option, + scratchpad_state: Option, + pid: Option, +} + +#[derive(Debug, Deserialize)] +struct SwayWindowProperties { + class: Option, + instance: Option, + title: Option, +} + +#[derive(Debug, Deserialize)] +struct SwayRect { + x: i32, + y: i32, + width: u32, + height: u32, +} + +impl 
SwayNode { + fn to_window_info(&self, workspace: Option, in_dockarea: bool) -> Option { + if in_dockarea { + return None; + } + if !matches!(self.node_type.as_deref(), Some("con" | "floating_con")) { + return None; + } + if self.window_type.as_deref() == Some("dock") { + return None; + } + + let has_window = self.window.is_some() + || self.app_id.is_some() + || self + .window_properties + .as_ref() + .is_some_and(|properties| properties.title.is_some() || properties.class.is_some()); + if !has_window { + return None; + } + + let window_id = self.id.or(self.window)?; + let properties = self.window_properties.as_ref(); + let title = clean_string( + properties + .and_then(|properties| properties.title.as_deref()) + .or(self.name.as_deref()), + ); + let wm_class = clean_string( + properties + .and_then(|properties| properties.class.as_deref()) + .or_else(|| properties.and_then(|properties| properties.instance.as_deref())), + ); + let app_id = clean_string( + self.app_id + .as_deref() + .or_else(|| properties.and_then(|properties| properties.instance.as_deref())) + .or(wm_class.as_deref()), + ); + let rect = self.rect.as_ref().or(self.geometry.as_ref()); + let bounds = rect.map(|rect| WindowBounds { + x: Some(rect.x), + y: Some(rect.y), + width: rect.width, + height: rect.height, + }); + let client_type = if self.window.is_some() { + "x11".to_string() + } else { + "wayland".to_string() + }; + + Some(WindowInfo { + window_id, + title, + app_id, + wm_class, + pid: self.pid.and_then(|pid| u32::try_from(pid).ok()), + bounds, + workspace, + focused: self.focused, + hidden: self.scratchpad_state.as_deref() == Some("fresh"), + client_type: Some(client_type), + backend: SWAY_BACKEND.to_string(), + terminal: None, + }) + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn parses_sway_tree_with_wayland_container_id() { + let json = r#"{ + "id": 1, + "type": "workspace", + "num": 1, + "nodes": [{ + "id": 42, + "type": "con", + "name": "Firefox", + "app_id": "firefox", + "focused": true, +
"rect": {"x": 10, "y": 20, "width": 800, "height": 600}, + "nodes": [] + }] + }"#; + + let windows = parse_sway_tree(json).expect("parse sway tree"); + assert_eq!(windows.len(), 1); + assert_eq!(windows[0].window_id, 42); + assert_eq!(windows[0].backend, SWAY_BACKEND); + assert_eq!(windows[0].client_type.as_deref(), Some("wayland")); + assert!(windows[0].focused); + } +} \ No newline at end of file diff --git a/src/windowing/mod.rs b/src/windowing/mod.rs index b2f38de..0190554 100644 --- a/src/windowing/mod.rs +++ b/src/windowing/mod.rs @@ -6,7 +6,7 @@ pub mod types; #[allow(unused_imports)] pub use registry::{ COSMIC_WAYLAND_BACKEND, GNOME_SHELL_EXTENSION_BACKEND, GNOME_SHELL_INTROSPECT_BACKEND, - HYPRLAND_BACKEND, I3_BACKEND, KWIN_BACKEND, WINDOW_PERMISSION_HINT, + HYPRLAND_BACKEND, I3_BACKEND, KWIN_BACKEND, SWAY_BACKEND, WINDOW_PERMISSION_HINT, }; #[allow(unused_imports)] pub use target::{ @@ -54,6 +54,7 @@ mod tests { COSMIC_WAYLAND_BACKEND, KWIN_BACKEND, HYPRLAND_BACKEND, + SWAY_BACKEND, I3_BACKEND, ] ); diff --git a/src/windowing/registry.rs b/src/windowing/registry.rs index 6f4d9c3..94ec04c 100644 --- a/src/windowing/registry.rs +++ b/src/windowing/registry.rs @@ -1,4 +1,4 @@ -use crate::windowing::backends::{cosmic, gnome, hyprland, i3, kwin}; +use crate::windowing::backends::{cosmic, gnome, hyprland, i3, kwin, sway}; use crate::windowing::types::WindowInfo; use anyhow::{anyhow, Result}; @@ -7,8 +7,9 @@ pub use gnome::{GNOME_SHELL_EXTENSION_BACKEND, GNOME_SHELL_INTROSPECT_BACKEND}; pub use hyprland::HYPRLAND_BACKEND; pub use i3::I3_BACKEND; pub use kwin::KWIN_BACKEND; +pub use sway::SWAY_BACKEND; -pub const WINDOW_PERMISSION_HINT: &str = "Computer Use could not access a supported window list backend. Targeted window input requires session-bus access plus GNOME Shell Introspect, the computer-use-linux GNOME Shell extension, the COSMIC Wayland helper, KWin/Plasma DBus scripting, Hyprland hyprctl, or i3-msg. 
On GNOME, run setup_window_targeting to install the extension backend."; +pub const WINDOW_PERMISSION_HINT: &str = "Computer Use could not access a supported window list backend. Targeted window input requires session-bus access plus GNOME Shell Introspect, the computer-use-linux GNOME Shell extension, the COSMIC Wayland helper, KWin/Plasma DBus scripting, Hyprland hyprctl, swaymsg, or i3-msg. On GNOME, run setup_window_targeting to install the extension backend."; #[derive(Debug, Clone, Copy)] pub struct BackendDescriptor { @@ -36,6 +37,7 @@ enum BackendKind { Cosmic, Kwin, Hyprland, + Sway, I3, } @@ -45,6 +47,7 @@ const BACKEND_ORDER: &[BackendKind] = &[ BackendKind::Cosmic, BackendKind::Kwin, BackendKind::Hyprland, + BackendKind::Sway, BackendKind::I3, ]; @@ -84,6 +87,13 @@ const DESCRIPTORS: &[BackendDescriptor] = &[ missing_hint: "On Hyprland, ensure hyprctl is available in the session.", can_exact_focus: true, }, + BackendDescriptor { + id: SWAY_BACKEND, + failure_label: "Sway", + list_note: "Window list came from swaymsg. 
Terminal windows may include best-effort PTY and active-process context when xprop and the process tree are readable.", + missing_hint: "On Sway, ensure swaymsg can reach the active sway IPC socket (SWAYSOCK, or a sway-ipc.*.sock under XDG_RUNTIME_DIR).", + can_exact_focus: true, + }, + BackendDescriptor { + id: I3_BACKEND, + failure_label: "i3", @@ -152,6 +162,7 @@ async fn list_windows_for(backend: BackendKind) -> Result> { BackendKind::Cosmic => cosmic::list_windows(), BackendKind::Kwin => kwin::list_windows().await, BackendKind::Hyprland => hyprland::list_windows(), + BackendKind::Sway => sway::list_windows(), BackendKind::I3 => i3::list_windows(), } } @@ -175,6 +186,7 @@ pub async fn activate_window(window: &WindowInfo) -> Result<()> { COSMIC_WAYLAND_BACKEND => cosmic::activate_window(window.window_id), KWIN_BACKEND => kwin::activate_window(window.window_id).await, HYPRLAND_BACKEND => hyprland::activate_window(window.window_id), + SWAY_BACKEND => sway::activate_window(window.window_id), I3_BACKEND => i3::activate_window(window.window_id), backend => Err(anyhow!( "Unsupported window backend for activation: {backend}" @@ -193,6 +205,7 @@ pub fn probe_backends() -> Vec { cosmic::probe(), kwin::probe(), hyprland::probe(), + sway::probe(), i3::probe(), ] } @@ -205,6 +218,7 @@ impl BackendKind { BackendKind::Cosmic => COSMIC_WAYLAND_BACKEND, BackendKind::Kwin => KWIN_BACKEND, BackendKind::Hyprland => HYPRLAND_BACKEND, + BackendKind::Sway => SWAY_BACKEND, BackendKind::I3 => I3_BACKEND, } }
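The Sway backend walks the `swaymsg -t get_tree` JSON recursively, carrying the nearest workspace number down through both `nodes` and `floating_nodes`. The sketch below shows that traversal with std only. `Node`, `Window`, and `collect` are hypothetical simplified stand-ins for the serde-deserialized `SwayNode`/`WindowInfo`. It assumes, following the i3/sway IPC tree conventions, that tiled views are reported as `con` and floating views as `floating_con`, and it therefore accepts both.

```rust
/// Hypothetical, simplified stand-in for a node of `swaymsg -t get_tree` output.
#[derive(Debug, Default)]
struct Node {
    id: u64,
    node_type: &'static str,
    num: Option<i32>,
    app_id: Option<&'static str>,
    nodes: Vec<Node>,
    floating_nodes: Vec<Node>,
}

/// Flattened window record: container id, app id and enclosing workspace.
#[derive(Debug, PartialEq)]
struct Window {
    id: u64,
    app_id: &'static str,
    workspace: Option<i32>,
}

/// Depth-first walk that carries the nearest enclosing workspace number down.
fn collect(node: &Node, workspace: Option<i32>, out: &mut Vec<Window>) {
    // Entering a workspace node replaces the inherited workspace number.
    let workspace = if node.node_type == "workspace" {
        node.num
    } else {
        workspace
    };

    // Tiled views are "con" and floating views "floating_con". Split
    // containers are also "con" but have no app_id, so they yield nothing.
    if matches!(node.node_type, "con" | "floating_con") {
        if let Some(app_id) = node.app_id {
            out.push(Window { id: node.id, app_id, workspace });
        }
    }

    // Visit tiled children first, then floating ones.
    for child in node.nodes.iter().chain(node.floating_nodes.iter()) {
        collect(child, workspace, out);
    }
}
```

Because the workspace is passed by value into each recursive call, sibling subtrees cannot leak their workspace number into one another. If the filter checked only for `con`, the floating window in a tree like the one in the test would silently disappear from `list_windows`.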
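`draw_hollow_rect` in `src/visual_debug.rs` clamps each AT-SPI bounding box to the screenshot before drawing its four edges. This means boxes that hang off the image are clipped rather than written out of range. Once clipped, `top < bottom <= height` and `left < right <= width` always hold, so the per-edge bounds checks in the original can never fail. The sketch below reproduces the same clipping on a plain row-major `Vec<u32>` instead of the `image` crate. `Canvas` and `draw_outline` are hypothetical names. The arithmetic is widened to `i64` on the assumption that very large bounds could otherwise overflow `x + width` in `i32`.

```rust
/// Hypothetical pixel buffer: one u32 colour per pixel, row-major.
struct Canvas {
    width: u32,
    height: u32,
    pixels: Vec<u32>,
}

impl Canvas {
    fn new(width: u32, height: u32) -> Self {
        Canvas { width, height, pixels: vec![0; width as usize * height as usize] }
    }

    /// Callers only pass coordinates already clipped to the canvas.
    fn put(&mut self, x: i64, y: i64, color: u32) {
        let idx = y as usize * self.width as usize + x as usize;
        self.pixels[idx] = color;
    }

    fn get(&self, x: u32, y: u32) -> u32 {
        self.pixels[y as usize * self.width as usize + x as usize]
    }
}

/// Draws a 1-pixel outline clipped to the canvas; returns false when no part
/// of the rectangle is visible.
fn draw_outline(canvas: &mut Canvas, x: i32, y: i32, width: u32, height: u32, color: u32) -> bool {
    // Widen to i64 so `x + width` cannot overflow.
    let left = i64::from(x).max(0);
    let top = i64::from(y).max(0);
    let right = (i64::from(x) + i64::from(width)).min(i64::from(canvas.width));
    let bottom = (i64::from(y) + i64::from(height)).min(i64::from(canvas.height));
    if left >= right || top >= bottom {
        return false;
    }
    // After clipping, every edge coordinate below is inside the canvas.
    for px in left..right {
        canvas.put(px, top, color);
        canvas.put(px, bottom - 1, color);
    }
    for py in top..bottom {
        canvas.put(left, py, color);
        canvas.put(right - 1, py, color);
    }
    true
}
```

As in the original, an edge that lies off-image is drawn along the clipped border instead, so a partly visible element still gets a visible box.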
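`sway_socket_path` prefers a non-blank `SWAYSOCK`. Otherwise it falls back to the most recently modified `sway-ipc.*.sock` socket in `XDG_RUNTIME_DIR`. That selection rule can be isolated as a pure function, which makes it testable without a running compositor. `pick_sway_socket` is a hypothetical helper. Its `entries` argument stands in for directory entries that the real code has already confirmed are sockets via `FileTypeExt::is_socket`.

```rust
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical pure version of the lookup: `swaysock` stands in for the
/// SWAYSOCK variable and `entries` for (file name, mtime) pairs of sockets
/// found in the runtime directory.
fn pick_sway_socket(
    swaysock: Option<&str>,
    runtime_dir: &Path,
    entries: &[(&str, Option<SystemTime>)],
) -> Option<PathBuf> {
    // An explicit, non-blank SWAYSOCK always wins.
    if let Some(value) = swaysock.map(str::trim).filter(|v| !v.is_empty()) {
        return Some(PathBuf::from(value));
    }
    entries
        .iter()
        .filter(|(name, _)| name.starts_with("sway-ipc.") && name.ends_with(".sock"))
        // Option<SystemTime> orders None below any Some, so sockets with a
        // readable mtime are preferred, newest first.
        .max_by_key(|(_, modified)| *modified)
        .map(|(name, _)| runtime_dir.join(name))
}
```

Preferring the newest socket is a heuristic for "the currently running sway". A stale socket left behind by a crashed session would only be chosen if it were newer than the live one.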