Kratosni is a sophisticated, end-to-end streaming conversational AI built as the final project for the #30DaysofVoiceAgents challenge by Murf AI. It features a complete voice-in, voice-out pipeline with a polished UI, client-side configuration, and special skills powered by function calling.
- Real-time Voice Conversation: Speak to the agent and receive a spoken response with low latency.
- Intelligent Persona: The agent responds with the witty and helpful personality of "Kratosni."
- Advanced Special Skills: Uses Google Gemini's function calling to:
  - 📈 Fetch live stock prices.
  - 💱 Perform real-time currency conversions.
- Polished UI: A professional, dark-themed interface with a persistent chat history display.
- Client-Side API Key Management: A settings modal allows users to securely enter their own API keys, which are stored in the browser's local storage.
- Natural Interruptions: The agent immediately stops speaking and starts listening the moment the user begins to speak, allowing for fluid conversation.
- End-to-End Streaming Pipeline:
  - Speech-to-Text: Live transcription via AssemblyAI.
  - LLM Logic: Contextual understanding and tool use by Google Gemini.
  - Text-to-Speech: High-quality, streaming voice output from Murf AI.
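
To illustrate how function-calling skills like the ones listed above are typically wired up, here is a minimal sketch of a tool dispatcher. It is not taken from the project's code: the names `get_stock_price`, `convert_currency`, `TOOLS`, and `dispatch_tool_call` are hypothetical, and both skills are stubs that make no network calls. In a real setup, the model would return a function name plus arguments, and the backend would route that request to the matching Python function.

```python
from typing import Any, Callable, Dict


def get_stock_price(symbol: str) -> Dict[str, Any]:
    """Hypothetical stub: a real version would query a stock price API."""
    return {"symbol": symbol.upper(), "price": None, "note": "stub, no network call"}


def convert_currency(amount: float, from_currency: str, to_currency: str) -> Dict[str, Any]:
    """Hypothetical stub: a real version would fetch the rate from a currency API."""
    rate = 1.0  # placeholder rate, NOT a real exchange rate
    return {
        "from": from_currency.upper(),
        "to": to_currency.upper(),
        "amount": amount,
        "rate": rate,
        "converted": amount * rate,
    }


# Registry mapping the tool names the model may request to local handlers.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get_stock_price": get_stock_price,
    "convert_currency": convert_currency,
}


def dispatch_tool_call(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Run the handler for a requested tool, returning an error dict on failure."""
    handler = TOOLS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**args)
    except TypeError as exc:
        # Raised when the supplied arguments do not match the handler signature.
        return {"error": f"bad arguments for {name}: {exc}"}
```

Returning an error dictionary instead of raising lets the result be passed back to the model, which can then apologize or ask the user to rephrase.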
The application uses a decoupled frontend-backend architecture:
- Frontend (Vanilla JS): A single-page application that handles all user interaction. It performs client-side audio conversion to PCM using the Web Audio API, manages the UI, and communicates with the backend via a single WebSocket connection.
- Backend (Python/FastAPI): An asynchronous API server that orchestrates the entire AI pipeline. It receives the PCM audio stream from the client and manages the real-time, bidirectional communication with all external AI services.
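
The PCM conversion mentioned above happens in the browser in JavaScript, but the arithmetic is easier to show in isolation. The sketch below is an illustration in Python, not the project's code: it assumes the audio arrives as float samples in the range [-1.0, 1.0] (the format the Web Audio API exposes) and that the target is 16-bit signed little-endian PCM. The bit depth and the function name `float_to_pcm16` are assumptions.

```python
import struct
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit signed little-endian PCM."""
    out = bytearray()
    for sample in samples:
        # Clamp out-of-range values so they cannot overflow the 16-bit range.
        sample = max(-1.0, min(1.0, sample))
        # Scale asymmetrically: -1.0 maps to -32768 and 1.0 maps to 32767.
        value = int(sample * 32768) if sample < 0 else int(sample * 32767)
        out += struct.pack("<h", value)
    return bytes(out)
```

Each input sample becomes two bytes, so a chunk of `n` samples produces `2 * n` bytes that can be sent over the WebSocket as a binary frame.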
- Backend: Python, FastAPI, Uvicorn, WebSockets
- Frontend: HTML5, CSS3, JavaScript (Web Audio API)
- Python Libraries: google-generativeai, assemblyai, websockets, requests
- Services: Murf AI, AssemblyAI, Google Gemini, Alpha Vantage (Stocks), ExchangeRate-API (Currency)
The easiest way to try Kratosni is to use the live version deployed on Render.
If you wish to run the project on your local machine, follow these instructions.
Prerequisites
- Python 3.8+
- An active internet connection
- Clone the repository:

      git clone https://github.com/topdev22/AI_voice_agent.git
      cd AI_voice_agent

- Create and activate a virtual environment:

      # Windows
      python -m venv venv
      .\venv\Scripts\activate

      # macOS / Linux
      python3 -m venv venv
      source venv/bin/activate

- Install dependencies:

      pip install -r requirements.txt

- Run the FastAPI server from the root directory:

      uvicorn main:app --reload

- Open your web browser and navigate to http://127.0.0.1:8000.
- Click the ⚙️ Settings icon in the UI to enter your API keys for all the required services. The keys will be saved in your browser.
- Click the microphone button to start a conversation.