A lightweight, system-tray-based application that provides real-time voice-to-text transcription using OpenAI's Whisper API. Hold F4, speak, and release to have your speech transcribed and inserted at your cursor position.
- 🎙️ Real-time voice recording with automatic silence detection
- 🔄 Instant transcription using OpenAI's Whisper API
- 🌐 Support for multiple languages (English and Portuguese)
- 🖥️ System tray integration for easy access
- ⌨️ Hotkey support (F4) for quick recording
- 📝 Direct text insertion at cursor position
- 🎯 Minimal CPU usage when idle
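As a rough illustration of what automatic silence detection can look like, the sketch below measures the RMS amplitude of each audio chunk and counts how many trailing chunks fall below a threshold. It assumes 16-bit little-endian mono PCM; the function names and the threshold of 500 are hypothetical and are not taken from this project's recorder.

```python
import math
import struct


def chunk_rms(chunk: bytes) -> float:
    """Root-mean-square amplitude of a chunk of 16-bit little-endian mono PCM."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def trailing_silence(chunks, threshold=500.0):
    """Count how many chunks at the end of a recording are below the threshold."""
    silent = 0
    for chunk in reversed(chunks):
        if chunk_rms(chunk) >= threshold:
            break  # reached the last audible chunk
        silent += 1
    return silent
```

A recorder could stop (or trim the clip) once `trailing_silence` exceeds some number of chunks; the right threshold depends on the microphone and input gain.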
- Python 3.7 or higher
- Linux with GTK3 support
- ALSA/PulseAudio for audio capture
- OpenAI API key
- Clone the repository:
  git clone https://github.com/yourusername/voice-to-text.git
  cd voice-to-text
- Create a .env file with your OpenAI API key:
  echo "OPENAI_API_KEY=your_api_key_here" > .env
- Run the installation script:
  sudo ./install.sh
The installation script will:
- Install required system dependencies
- Set up a Python virtual environment
- Install Python package dependencies
- Configure audio settings
- Create a desktop entry for easy access
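The .env file from the installation steps holds a single KEY=VALUE line. The sketch below shows one way such a file could be read using only the standard library. The helper names `parse_env` and `load_api_key` are hypothetical, and the project itself may load the key differently (for example through a dotenv package).

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(env_path: str = ".env") -> str:
    """Return OPENAI_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        path = Path(env_path)
        if path.is_file():
            key = parse_env(path.read_text()).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; see the installation steps")
    return key
```

Failing early with a clear message makes a missing or empty key easier to diagnose than a rejected API request later on.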
- Start the application:
  ./run.sh
  Or launch it from your applications menu.
- Look for the green icon in your system tray.
- To transcribe:
  - Hold F4
  - Speak clearly
  - Release F4
  - The transcribed text will appear at your cursor position
- Additional features:
  - Right-click the tray icon for options
  - Switch between languages from the tray menu
  - Click "About" for version information
  - Use "Exit" to close the application
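The hold-to-record flow above can be modelled as a small state machine. The following is a minimal sketch, not the project's actual keyboard handler: the `PushToTalk` class, the `"f4"` key name, and the two callbacks are all hypothetical, and a real handler would receive its events from whatever keyboard library the application uses.

```python
class PushToTalk:
    """Tracks hold-to-record state; callbacks are injected so the logic is testable."""

    def __init__(self, start_recording, stop_and_transcribe, hotkey="f4"):
        self._start = start_recording
        self._stop = stop_and_transcribe
        self._hotkey = hotkey
        self.recording = False

    def on_press(self, key: str) -> None:
        # Key auto-repeat can deliver many press events while the key is held,
        # so only the first press starts a recording.
        if key == self._hotkey and not self.recording:
            self.recording = True
            self._start()

    def on_release(self, key: str) -> None:
        # Releasing the hotkey ends the recording and triggers transcription.
        if key == self._hotkey and self.recording:
            self.recording = False
            self._stop()
```

Keeping the state logic separate from the keyboard backend means it can be exercised without a display server or microphone.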
The application can be configured by modifying src/voice_to_text/config.py:
- Audio settings (sample rate, channels, chunk size)
- Recording thresholds and durations
- Language preferences
- UI customization
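For orientation, a configuration module covering these areas might look roughly like the sketch below. Every name and default value here is an assumption made for illustration; check the actual config.py for the real settings before changing anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioConfig:
    sample_rate: int = 16000   # Hz (hypothetical default)
    channels: int = 1          # mono
    chunk_size: int = 1024     # frames read per chunk

    def chunk_duration(self) -> float:
        """Seconds of audio contained in one chunk."""
        return self.chunk_size / self.sample_rate


@dataclass(frozen=True)
class RecordingConfig:
    silence_threshold: float = 500.0   # amplitude below which audio counts as silence
    max_duration_s: float = 60.0       # hard cap on a single recording


# Language codes offered in the tray menu (the README lists English and Portuguese).
SUPPORTED_LANGUAGES = {"en": "English", "pt": "Portuguese"}
DEFAULT_LANGUAGE = "en"
```

With these illustrative defaults, one chunk holds 1024 / 16000 = 0.064 seconds of audio, which is the granularity at which silence thresholds and durations would be evaluated.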
The application is structured into several key components:
src/voice_to_text/
├── audio/
│ ├── recorder.py # Audio capture and processing
│ └── transcriber.py # Whisper API integration
├── ui/
│ ├── tray_icon.py # System tray interface
│ ├── text_output.py # Text insertion handling
│ └── keyboard_handler.py # Hotkey management
└── config.py # Application configuration
- No audio input detected
  - Check your microphone permissions
  - Verify ALSA/PulseAudio configuration
  - Run alsamixer to check input levels
- Transcription not appearing
  - Ensure your OpenAI API key is valid
  - Check internet connectivity
  - Verify cursor is in a text-editable area
- System tray icon not showing
  - Ensure GTK3 is properly installed
  - Check if your desktop environment supports system trays
Application logs are stored in voice_to_text.log. Enable debug logging by modifying the logging level in run.py.
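As a sketch of what changing the logging level involves, the snippet below configures a file logger with Python's standard logging module. The function name `configure_logging` and the logger name `voice_to_text` are assumptions; the code in run.py may be organized differently.

```python
import logging


def configure_logging(debug: bool = False,
                      log_file: str = "voice_to_text.log") -> logging.Logger:
    """Send log records to a file, at DEBUG level when requested."""
    logger = logging.getLogger("voice_to_text")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(log_file)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `configure_logging(debug=True)` corresponds to switching the level from `logging.INFO` to `logging.DEBUG`, which is usually the one-line change needed for more verbose logs.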
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper for the transcription API
- PyAudio for audio handling
- pystray for system tray integration
If you encounter any issues or have questions, please:
- Check the Troubleshooting section
- Look through existing Issues
- Create a new issue if needed
Made with ❤️ by Rodrigo Werneck @rodrigowf or rodrigowf.github.io in partnership with CursorIDE powered by Claude-3.5-sonnet