A professional voice recording and AI-powered interview assistance application designed for software development interview preparation.
- High-quality voice recording with configurable audio settings
- Real-time speech transcription using OpenAI Whisper
- AI-powered interview responses via Ollama or compatible APIs
- Production-ready architecture with proper error handling and logging
- Configurable settings via INI file
- Professional UI with dark theme and responsive design
- Thread-safe operations with proper resource management
- Comprehensive logging for debugging and monitoring
- Python 3.8 or higher
- Audio input device (microphone)
- 4GB+ RAM (8GB recommended for larger Whisper models)
- GPU with CUDA support (optional, for faster transcription)
- Ollama or compatible AI service running locally
- PortAudio (for audio recording)
```bash
# Save the main application file as voice_assistant.py

# Install required packages
pip install -r requirements.txt

# Optional: install CUDA support for faster transcription
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
```

Platform-specific audio dependencies:

```bash
# Windows: PortAudio is usually included with sounddevice.
# If you encounter issues, install the Visual C++ Build Tools.

# macOS: install PortAudio using Homebrew
brew install portaudio

# Linux (Debian/Ubuntu)
sudo apt-get update
sudo apt-get install portaudio19-dev python3-pyaudio
```

Set up the AI backend:

```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the required model
ollama pull llama3

# Start the Ollama service
ollama serve
```

The application uses a config.ini file for configuration. On first run, it creates a default configuration file that you can customize.
Audio settings:
- sample_rate: Audio sample rate (default: 44100)
- channels: Number of audio channels (default: 1)
- dtype: Audio data type (default: int16)

Whisper settings:
- model: Whisper model size (tiny, base, small, medium, large)
- device: Processing device (auto, cpu, cuda)

AI service settings:
- api_url: AI service endpoint
- model: AI model name
- timeout: Request timeout in seconds

Logging settings:
- level: Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- file: Log file path (optional)
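For reference, a complete config.ini might look like the following. This is an illustrative sketch: the section names and default values shown here are assumptions, so check the file the application generates on first run for the authoritative layout.

```ini
[audio]
sample_rate = 44100
channels = 1
dtype = int16

[whisper]
model = base
device = auto

[ai]
api_url = http://localhost:11434/api/generate
model = llama3
timeout = 30

[logging]
level = INFO
file = voice_assistant.log
```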
```bash
python voice_assistant.py
```

- Start Recording: Press SPACE or click the microphone button
- Stop Recording: Press SPACE again or click the stop button
- View Results: The popup window shows transcription and AI response
- Toggle Popup: Double-click the microphone button
- SPACE: Start/Stop recording
- Double-click mic button: Toggle popup window
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate   # Linux/macOS
# or
venv\Scripts\activate      # Windows

# Install dependencies
pip install -r requirements.txt

# Copy and customize configuration
cp config.ini.template config.ini
# Edit config.ini with your settings
```

Create a systemd service file:
```ini
[Unit]
Description=Voice Interview Assistant
After=network.target

[Service]
Type=simple
User=your_username
WorkingDirectory=/path/to/voice-assistant
ExecStart=/path/to/venv/bin/python voice_assistant.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

- Check logs in the configured log file
- Monitor system resources (CPU, memory, GPU)
- Set up log rotation for production environments
```bash
# Check available audio devices
python -c "import sounddevice as sd; print(sd.query_devices())"

# Test microphone (records one second of audio)
python -c "import sounddevice as sd; print('Recording...'); data = sd.rec(44100, samplerate=44100, channels=1); sd.wait(); print('Done')"
```

- Ensure sufficient RAM is available
- Try a smaller model (tiny, base) if memory is limited
- Check CUDA installation for GPU acceleration
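To confirm whether GPU acceleration will actually be used, a small helper along these lines can report the device Whisper would run on. This is a sketch; the application's own device-selection logic may differ.

```python
def transcription_device():
    """Return 'cuda' when a usable GPU is detected, otherwise 'cpu'.

    Degrades gracefully when torch is not installed at all.
    """
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

print("Whisper will run on:", transcription_device())
```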
- Verify Ollama is running: `curl http://localhost:11434/api/tags`
- Check firewall settings
- Verify model availability: `ollama list`
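The same health check can be scripted from Python using only the standard library and Ollama's documented `/api/tags` endpoint. This sketch returns the installed model names, or None when the service is unreachable:

```python
import json
import urllib.request
import urllib.error

def ollama_available(base_url="http://localhost:11434", timeout=3):
    """Return the list of installed model names, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError, ValueError):
        return None

models = ollama_available()
print("Ollama models:", models if models is not None else "service unreachable")
```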
- Ensure tkinter is installed (usually included with Python)
- Check display settings for popup window positioning
- Verify window manager compatibility
- Use GPU acceleration (CUDA)
- Choose appropriate Whisper model size
- Optimize audio settings
- Use smaller Whisper models (tiny, base)
- Reduce audio buffer sizes
- Close popup when not needed
- ConfigManager: Handles application configuration
- AudioManager: Manages audio recording with thread safety
- AIService: Handles Whisper transcription and AI API calls
- VoiceInterviewAssistant: Main application controller
- UI Components: Professional GUI with responsive design
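As an illustration of the first component, a minimal ConfigManager with the first-run default-writing behavior described earlier might look like this. The section names and defaults are assumptions for the sketch, not the application's actual code.

```python
import configparser
from pathlib import Path

# Illustrative defaults; the real application defines its own.
DEFAULTS = {
    "audio": {"sample_rate": "44100", "channels": "1", "dtype": "int16"},
    "whisper": {"model": "base", "device": "auto"},
}

class ConfigManager:
    """Load config.ini, writing the defaults out on first run."""

    def __init__(self, path="config.ini"):
        self.path = Path(path)
        self.parser = configparser.ConfigParser()
        self.parser.read_dict(DEFAULTS)
        if self.path.exists():
            self.parser.read(self.path)       # user values override defaults
        else:
            with self.path.open("w") as fh:   # first run: persist defaults
                self.parser.write(fh)

    def get(self, section, key):
        return self.parser.get(section, key)
```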
- All audio operations are thread-safe
- Proper resource cleanup on shutdown
- Graceful handling of interruptions
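The thread-safety pattern above can be sketched as a lock-protected buffer: the audio callback thread appends chunks while the UI thread drains them. This is illustrative, not the application's actual AudioManager.

```python
import threading

class AudioBuffer:
    """Accumulate audio chunks from the callback thread; drained by the UI thread."""

    def __init__(self):
        self._chunks = []
        self._lock = threading.Lock()

    def append(self, chunk):
        with self._lock:
            self._chunks.append(chunk)

    def drain(self):
        """Atomically take all buffered chunks, leaving the buffer empty."""
        with self._lock:
            chunks, self._chunks = self._chunks, []
            return chunks
```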
- Comprehensive exception handling
- Graceful degradation on errors
- User-friendly error messages
- Detailed logging for debugging
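One common way to combine detailed logging, graceful degradation, and a user-safe fallback is a small decorator. This is a sketch of the pattern, not the application's actual error-handling code.

```python
import functools
import logging

logger = logging.getLogger("voice_assistant")

def graceful(fallback=None, message="Operation failed"):
    """Log the full traceback, then return a fallback instead of crashing."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                logger.exception(message)
                return fallback
        return wrapper
    return decorate

@graceful(fallback="", message="Transcription failed")
def transcribe(path):
    # Hypothetical failing operation, for demonstration only.
    raise RuntimeError("demo error")
```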
- Audio data is processed locally (privacy-first)
- Temporary files are cleaned up automatically
- No sensitive data is stored permanently
- API calls use session management
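Automatic temp-file cleanup can be guaranteed with a context manager along these lines (a sketch; the application's actual mechanism may differ):

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temp_wav():
    """Yield a temporary .wav path and guarantee its removal afterwards."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # we only need the path; callers reopen it themselves
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```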
- Monitor log files for errors
- Check AI service availability
- Verify audio device connectivity
- Monitor system resources
- Regularly update dependencies
- Monitor Whisper model updates
- Check Ollama service updates
- Review and rotate log files
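Log rotation can be handled in-process with the standard library's RotatingFileHandler; the size limit and backup count below are arbitrary illustrative choices.

```python
import logging
from logging.handlers import RotatingFileHandler

def setup_logging(path="voice_assistant.log", level=logging.INFO):
    """Configure a size-based rotating log (5 MB per file, 3 backups kept)."""
    handler = RotatingFileHandler(path, maxBytes=5 * 1024 * 1024, backupCount=3)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger = logging.getLogger("voice_assistant")
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger
```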
When contributing to the production version:
- Follow Python PEP 8 style guidelines
- Add comprehensive error handling
- Include logging for debugging
- Write unit tests for new features
- Update configuration documentation
- Test on multiple platforms
This production-ready version includes enterprise-grade features and should be used according to your organization's software licensing policies.
For production deployment support:
- Check logs for detailed error information
- Verify all dependencies are correctly installed
- Test individual components (audio, transcription, AI service)
- Monitor system resources during operation
Production Notes: This version includes comprehensive error handling, logging, configuration management, and thread safety suitable for production environments. Always test thoroughly in your specific environment before deployment.