This repository provides an interface for ROS 2 navigation utilizing voice commands. The system processes natural language input via Google's Gemini 2.5 Flash model and visualizes the robot's state on a React-based dashboard.
The interaction pipeline consists of five primary components:
- Frontend: React application built with Vite, utilizing `MediaRecorder` for audio capture.
- Backend: FastAPI server that coordinates API requests.
- Speech processing: OpenAI Speech-to-Text transcribes the audio input.
- Command synthesis: Gemini 2.5 Flash interprets the transcript and outputs structured ROS 2 `Twist` messages.
- Execution: The `rosbridge_server` transmits the parsed commands to the ROS 2 Humble environment (Nav2, Cartographer, AMCL).
The system includes a regex-based fallback parser to ensure operational continuity during API service interruptions.
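As an illustration of what a regex-based fallback can look like, here is a minimal sketch. It is not the repository's parser: the function name `parse_command`, the patterns, and the default speeds are all assumptions made for this example. It maps a transcript to a `Twist`-shaped dictionary and returns `None` when nothing matches.

```python
import re
from typing import Optional

# Hypothetical default speeds; the project's real limits are not documented here.
LINEAR_SPEED = 0.2   # m/s
ANGULAR_SPEED = 0.5  # rad/s

_STOP = re.compile(r"\b(stop|halt)\b")
_TURN = re.compile(r"\b(?:rotate|turn|spin)\b.*?\b(left|right)\b")
_MOVE = re.compile(r"\b(forward|ahead|back(?:ward)?s?|reverse)\b")


def parse_command(text: str) -> Optional[dict]:
    """Map a transcript to a Twist-shaped dict, or None if no pattern matches."""
    text = text.lower()
    linear, angular = 0.0, 0.0
    if _STOP.search(text):
        pass  # all-zero velocities stop the robot
    elif (m := _TURN.search(text)):
        # Positive angular.z is counterclockwise, i.e. a left turn.
        angular = ANGULAR_SPEED if m.group(1) == "left" else -ANGULAR_SPEED
    elif (m := _MOVE.search(text)):
        forward = m.group(1) in ("forward", "ahead")
        linear = LINEAR_SPEED if forward else -LINEAR_SPEED
    else:
        return None
    return {
        "linear": {"x": linear, "y": 0.0, "z": 0.0},
        "angular": {"x": 0.0, "y": 0.0, "z": angular},
    }
```

Checking for stop words first is a deliberate choice in this sketch: if a transcript is ambiguous, halting is the safest interpretation.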
graph TD
User((User)) -->|Voice| Frontend[React Dashboard]
Frontend -->|Audio Blob| Backend[FastAPI Server]
Backend -->|STT| OpenAI[OpenAI Speech-to-Text]
OpenAI -->|Transcript| Backend
Backend -->|Gemini 2.5 Flash| AI[Google AI Studio]
AI -->|JSON Action| Backend
Backend -->|Response| Frontend
Frontend -->|WebSocket| Bridge[rosbridge_server]
Bridge -->|/cmd_vel| Robot[ROS 2 Robot/Sim]
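The final hop in the diagram uses the rosbridge JSON protocol, in which a client advertises a topic and then publishes messages to it. The sketch below only builds those JSON payloads. The helper names `advertise_msg` and `publish_twist_msg` are hypothetical, and in this project the dashboard, not Python, opens the WebSocket (rosbridge listens on port 9090 by default), so treat this as a description of the wire format rather than the project's code.

```python
import json


def advertise_msg(topic: str = "/cmd_vel") -> str:
    """Build the rosbridge 'advertise' operation for a Twist topic."""
    return json.dumps({
        "op": "advertise",
        "topic": topic,
        "type": "geometry_msgs/msg/Twist",
    })


def publish_twist_msg(linear_x: float, angular_z: float,
                      topic: str = "/cmd_vel") -> str:
    """Build the rosbridge 'publish' operation carrying one Twist message."""
    return json.dumps({
        "op": "publish",
        "topic": topic,
        "msg": {
            "linear": {"x": linear_x, "y": 0.0, "z": 0.0},
            "angular": {"x": 0.0, "y": 0.0, "z": angular_z},
        },
    })
```

Sending `advertise_msg()` once and then `publish_twist_msg(0.2, 0.0)` over the socket would ask the robot to drive forward at 0.2 m/s until another message changes the velocity.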
- Ubuntu 22.04 LTS (or WSL2)
- ROS 2 Humble Hawksbill (Desktop Install)
- Node.js (or Bun)
- Python 3.10+
- Repository mapping: clone the repository and enter its directory.

      git clone https://github.com/howdoiusekeyboard/ros2_navigation_project.git
      cd ros2_navigation_project

- Environment configuration: populate the backend environment file with the required API keys.

      cp backend/.env.example backend/.env
      # Add OPENAI_API_KEY and GEMINI_API_KEY to the .env file

- System initialization: run the provided bash script to start the simulation, backend server, and frontend dashboard concurrently.

      ./start_robot_dashboard.sh

- Interface access: the dashboard is served at http://localhost:5173. A Chromium-based browser is required for full `MediaRecorder` compatibility.
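If voice commands fail right after setup, a missing API key is a likely cause. The following sketch checks for the two keys named above. The helper `missing_keys` is hypothetical, and it inspects only the process environment (or a mapping passed in). How the backend actually loads `backend/.env` is not stated in this README.

```python
import os
from typing import Mapping

# Key names taken from the setup step above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required keys that are absent or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```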
- Verify connection state via the dashboard ("Connected to ROS 2").
- Initiate voice capture utilizing the interface microphone control.
- Issue spatial or directional commands (e.g., "rotate left 90 degrees", "proceed forward 2 meters", "halt").
- Alternatively, use the text input field for manual command insertion.
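A `Twist` message carries only velocities, so a command such as "proceed forward 2 meters" or "rotate left 90 degrees" has to be turned into a velocity held for some duration (or tracked with odometry). The sketch below shows the open-loop arithmetic only. The function names and the idea of fixed speeds are assumptions for illustration, and the project may handle distances differently, for example through Nav2 goals.

```python
import math


def motion_duration(amount: float, speed: float) -> float:
    """Seconds needed to cover `amount` (meters or radians) at a constant `speed`."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return abs(amount) / speed


def rotation_duration(degrees: float, angular_speed: float) -> float:
    """Seconds needed to rotate by `degrees` at `angular_speed` in rad/s."""
    return motion_duration(math.radians(degrees), angular_speed)
```

For example, covering 2 meters at 0.2 m/s takes about 10 seconds, and rotating 90 degrees at 0.5 rad/s takes about 3.14 seconds. Open-loop timing ignores acceleration and wheel slip, so the real displacement will differ somewhat.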
- `src/`: ROS 2 packages integrating Cartographer and Nav2 configurations.
- `backend/`: Python backend utilizing FastAPI.
- `project/`: React-based interactive dashboard.
- `scripts/`: Operational scripts for initialization and debugging.
- SETUP.md: Comprehensive environment preparation guide.
- RECOVERY.md: Guidelines for restoring the system from failure states.
This project is released under the MIT License. See the LICENSE file for the full terms.