Control Android devices programmatically. Tap, swipe, type, launch apps, and automate UI interactions with structured, agent-friendly output.
🤖 Agents, start here: Getting Started Guide
What: A semantic layer over ADB. Simplifies and standardizes device interactions into consistent, predictable commands that agents can reliably parse and execute.
Why: Raw ADB outputs unstructured text. This tool returns structured JSON with consistent error handling, screen coordinates parsed from accessibility trees, and clear success/failure states. Screen dumps are compact and token-efficient (~50-200 tokens vs. thousands for raw XML), which suits LLM agents working within limited context windows.
How: Perception → Action loop. Get the UI state as structured data, reason about it, execute the next action. Repeat.
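
The loop described above could be driven from a script roughly as follows. This is a minimal sketch, not part of android-use: `run_cli`, `agent_loop`, and the `decide` callback are hypothetical names, and it assumes each command prints a single JSON object of the form `{success, data}` to stdout. The layout of `data` is not specified here, so it is passed to `decide` untouched.

```python
import json
import subprocess
from typing import Callable, Optional


def run_cli(args: list[str]) -> dict:
    """Run `android-use <args>` and parse stdout as JSON (assumed output format)."""
    proc = subprocess.run(["android-use", *args], capture_output=True, text=True)
    return json.loads(proc.stdout)


def agent_loop(
    decide: Callable[[dict], Optional[list[str]]],
    run: Callable[[list[str]], dict] = run_cli,
    max_steps: int = 10,
) -> int:
    """Perceive, decide, act; stop when `decide` returns None or a step fails.

    Returns the number of actions that completed successfully.
    """
    actions = 0
    for _ in range(max_steps):
        screen = run(["get-screen"])  # perception: fetch the current UI state
        if not screen.get("success"):
            break
        # Reasoning: the caller-supplied (hypothetical) callback picks the
        # next command, e.g. ["tap", "540", "960"], or None when finished.
        action = decide(screen.get("data", {}))
        if action is None:
            break
        result = run(action)  # action: execute the chosen command
        if not result.get("success"):
            break
        actions += 1
    return actions
```

Passing `run` as a parameter keeps the loop testable without a device attached: a stub that returns canned results can stand in for the real CLI.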
| Raw ADB | android-use |
|---|---|
| adb shell dumpsys window windows + parsing | android-use get-screen → structured JSON |
| adb shell input tap 540 960 | android-use tap 540 960 with validation |
| Exit code 0 or manual string checking | Typed results: {success: true, data: {...}} |
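
To illustrate what the typed result buys over string checking, here is a small sketch of unwrapping one result. It assumes only the envelope shape shown in the last table row (`success` plus `data`); `unwrap` and `CommandFailed` are hypothetical helper names, and no particular error field is assumed, so the whole payload is included in the exception message.

```python
import json


class CommandFailed(Exception):
    """Raised when output is not JSON or the envelope does not report success."""


def unwrap(stdout: str) -> dict:
    """Parse one JSON result and return its `data` payload.

    Assumes the envelope {"success": bool, "data": {...}}; anything else is
    treated as a failure and reported verbatim.
    """
    try:
        result = json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise CommandFailed(f"output was not JSON: {stdout!r}") from exc
    if not isinstance(result, dict) or result.get("success") is not True:
        raise CommandFailed(f"command reported failure: {result!r}")
    return result.get("data", {})
```

A caller then handles a single exception type instead of inspecting exit codes and scanning text output.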
curl -fsSL https://raw.githubusercontent.com/iurysza/android-use/main/install.sh | bash

Prerequisites: ADB installed, Android device with USB debugging enabled.
android-use check-device # List devices
android-use get-screen # Get UI with tap coordinates
android-use tap 540 960 # Tap at coordinates
android-use type-text "Hello" # Type text
android-use launch-app com.android.chrome # Launch app

- Agent Setup Guide - Complete setup and usage guide
- Examples - Tutorials and common patterns
- Changelog
License: MIT
