Voice-to-Text Transcription Tool

A lightweight, system-tray-based application that provides real-time voice-to-text transcription using OpenAI's Whisper API. Hold F4, speak, and release to have your speech transcribed and inserted at your cursor position.

Features

🎙️ Real-time voice recording with automatic silence detection
🔄 Instant transcription using OpenAI's Whisper API
🌐 Support for multiple languages (English and Portuguese)
🖥️ System tray integration for easy access
⌨️ Hotkey support (F4) for quick recording
📝 Direct text insertion at cursor position
🎯 Minimal CPU usage when idle

Requirements

- Python 3.7 or higher
- Linux with GTK3 support
- ALSA/PulseAudio for audio capture
- OpenAI API key

Installation

Clone the repository:

git clone https://github.com/yourusername/voice-to-text.git
cd voice-to-text

Create a .env file with your OpenAI API key:

echo "OPENAI_API_KEY=your_api_key_here" > .env

Run the installation script:
```
sudo ./install.sh
```

The installation script will:

- Install required system dependencies
- Set up a Python virtual environment
- Install Python package dependencies
- Configure audio settings
- Create a desktop entry for easy access
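
The .env file created during installation is a plain KEY=value file. As a rough illustration, the sketch below reads such a file using only the Python standard library. The helpers `load_env_file` and `require_api_key` are hypothetical names introduced here; the project's actual key-loading code is not shown in this README and may work differently.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an '=' separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(path=".env"):
    """Return the key from the environment or the .env file, else raise."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key or key == "your_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is missing or still set to the placeholder")
    return key
```

Failing early on a missing or placeholder key gives a clearer error than a rejected API request later on.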

Usage

Start the application:
```
./run.sh
```
Or launch it from your applications menu.
Look for the green icon in your system tray.
To transcribe:
- Hold F4
- Speak clearly
- Release F4
- The transcribed text will appear at your cursor position
Additional features:
- Right-click the tray icon for options
- Switch between languages from the tray menu
- Click "About" for version information
- Use "Exit" to close the application
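
The hold-and-release flow above can be sketched as a small controller. This is a minimal illustration under stated assumptions, not the project's implementation: the class `PushToTalk` and its four injected callables are hypothetical, and the real logic in keyboard_handler.py may be organized differently.

```python
class PushToTalk:
    """Hold-to-record controller; all collaborators are injected callables."""

    def __init__(self, start_recording, stop_recording, transcribe, insert_text):
        self._start = start_recording    # begins audio capture
        self._stop = stop_recording      # ends capture and returns audio bytes
        self._transcribe = transcribe    # audio bytes -> text
        self._insert = insert_text       # types text at the cursor
        self._recording = False

    def on_press(self):
        # Key auto-repeat can fire many press events while held; ignore repeats.
        if self._recording:
            return
        self._recording = True
        self._start()

    def on_release(self):
        # A release without a matching press is ignored.
        if not self._recording:
            return
        self._recording = False
        audio = self._stop()
        if not audio:
            return  # nothing captured, so nothing to transcribe
        text = self._transcribe(audio).strip()
        if text:
            self._insert(text)


if __name__ == "__main__":
    # Stub collaborators so the flow can be exercised without a microphone.
    typed = []
    ptt = PushToTalk(
        start_recording=lambda: None,
        stop_recording=lambda: b"fake-audio",
        transcribe=lambda audio: " hello world ",
        insert_text=typed.append,
    )
    ptt.on_press()
    ptt.on_press()   # simulated auto-repeat, ignored
    ptt.on_release()
    print(typed)
```

Injecting the recorder, transcriber, and text output as callables keeps the key handling testable without audio hardware or network access.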

Configuration

The application can be configured by modifying src/voice_to_text/config.py:

- Audio settings (sample rate, channels, chunk size)
- Recording thresholds and durations
- Language preferences
- UI customization
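
To make the audio settings and recording thresholds more concrete, here is a hedged sketch of how such values might be grouped and used for silence detection. Every name and default below (`AudioConfig`, `silence_threshold`, `silence_duration`, and so on) is an assumption made for illustration; the real contents of config.py are not shown in this README. The sketch also assumes mono, signed 16-bit little-endian PCM audio.

```python
import math
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioConfig:
    # All names and defaults here are illustrative assumptions.
    sample_rate: int = 16000          # samples per second
    channels: int = 1                 # mono
    chunk_size: int = 1024            # frames read per buffer
    silence_threshold: float = 500.0  # RMS level below which a chunk is silent
    silence_duration: float = 1.5     # seconds of silence before stopping

    @property
    def silent_chunks_to_stop(self) -> int:
        """Number of consecutive silent chunks that ends a recording."""
        chunks_per_second = self.sample_rate / self.chunk_size
        return math.ceil(self.silence_duration * chunks_per_second)


def chunk_rms(chunk: bytes) -> float:
    """Root-mean-square amplitude of signed 16-bit little-endian samples."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_silent(chunk: bytes, config: AudioConfig) -> bool:
    """True when the chunk's RMS level falls below the configured threshold."""
    return chunk_rms(chunk) < config.silence_threshold
```

A recorder could count consecutive chunks for which `is_silent` returns true and stop once the count reaches `silent_chunks_to_stop`. Raising the threshold makes detection more tolerant of background noise, at the cost of cutting off quiet speech.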

Architecture

The application is structured into several key components:

src/voice_to_text/
├── audio/
│   ├── recorder.py    # Audio capture and processing
│   └── transcriber.py # Whisper API integration
├── ui/
│   ├── tray_icon.py   # System tray interface
│   ├── text_output.py # Text insertion handling
│   └── keyboard_handler.py # Hotkey management
└── config.py          # Application configuration

Troubleshooting

Common Issues

No audio input detected
- Check your microphone permissions
- Verify ALSA/PulseAudio configuration
- Run alsamixer to check input levels
Transcription not appearing
- Ensure your OpenAI API key is valid
- Check internet connectivity
- Verify cursor is in a text-editable area
System tray icon not showing
- Ensure GTK3 is properly installed
- Check if your desktop environment supports system trays

Logs

Application logs are stored in voice_to_text.log. Enable debug logging by modifying the logging level in run.py.
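
For orientation, the sketch below shows one way file logging with an adjustable level can be set up with Python's standard logging module. It is an assumption-laden example, not the code in run.py: the function `configure_logging` and the `LOG_LEVEL` environment variable are hypothetical names introduced here.

```python
import logging
import os


def configure_logging(log_file="voice_to_text.log", level=None):
    """Attach a file handler to the root logger and set its level.

    The level comes from the argument, else from the hypothetical
    LOG_LEVEL environment variable, else defaults to INFO.
    """
    level_name = (level or os.environ.get("LOG_LEVEL", "INFO")).upper()
    numeric_level = getattr(logging, level_name, None)
    if not isinstance(numeric_level, int):
        raise ValueError(f"Unknown log level: {level_name}")

    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )

    root = logging.getLogger()
    root.setLevel(numeric_level)
    root.addHandler(handler)  # calling this twice would add a second handler
    return numeric_level
```

With a setup like this, switching to debug output means passing `"DEBUG"` as the level (or exporting `LOG_LEVEL=DEBUG`) instead of editing the format or handler code.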

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

- OpenAI Whisper for the transcription API
- PyAudio for audio handling
- pystray for system tray integration

Support

If you encounter any issues or have questions, please:

- Check the Troubleshooting section
- Look through existing Issues
- Create a new issue if needed

Made with ❤️ by Rodrigo Werneck (@rodrigowf or rodrigowf.github.io) in partnership with CursorIDE, powered by Claude-3.5-sonnet
