A deterministic, multi-agent system for robust mathematical problem solving.
Math Mentor is an autonomous reasoning system designed to solve high-school and undergraduate-level mathematics problems with high reliability. Unlike general-purpose chat interfaces, it decouples semantic understanding from deterministic computation.
The system accepts multimodal inputs (text, image, audio) and employs a Human-in-the-Loop (HITL) workflow to handle ambiguity before it propagates to the solver.
- Separation of Concerns: LLMs are excellent at planning and translation but poor at arithmetic. This system uses Gemini 2.0 Flash solely for semantic understanding and code generation, while deterministic solvers handle computation.
- Reflexion: A unified Orchestrator manages a feedback loop where failures in solution verification trigger immediate introspection and strategy adjustment, rather than silent failure.
- Episodic Memory: The system persists successful solution patterns. When faced with a new problem, it retrieves semantically similar past successes to guide its current strategy (Self-Learning).
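To make the Episodic Memory idea concrete, here is a minimal sketch of similarity-based recall. It is not the project's implementation: the `EpisodicMemory` class, the `embed` and `cosine` helpers, and the stored strategies are all hypothetical, and a toy bag-of-words vector stands in for real embeddings and a FAISS index.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class EpisodicMemory:
    """Hypothetical store of (problem, strategy) pairs from verified solutions."""

    def __init__(self) -> None:
        self._episodes: list[tuple[Counter, str, str]] = []

    def remember(self, problem: str, strategy: str) -> None:
        """Persist a successful solution pattern."""
        self._episodes.append((embed(problem), problem, strategy))

    def recall(self, problem: str, k: int = 1) -> list[tuple[str, str]]:
        """Return the k most similar past problems with their strategies."""
        query = embed(problem)
        ranked = sorted(self._episodes, key=lambda e: cosine(query, e[0]), reverse=True)
        return [(p, s) for _, p, s in ranked[:k]]


memory = EpisodicMemory()
memory.remember("solve the quadratic equation x^2 - 5x + 6 = 0", "factor, then check roots")
memory.remember("probability of two heads in three coin tosses", "enumerate outcomes")

best = memory.recall("solve the quadratic equation x^2 + 2x - 8 = 0")
print(best[0][1])  # factor, then check roots
```

In a real deployment the linear scan would be replaced by a vector index, but the contract stays the same: store what worked, then retrieve the nearest past success before planning.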
The application is structured as a pipeline of specialized agents coordinated by a central Orchestrator.
- OCR Engine: Uses Google Gemini Vision (Program-of-Thought prompting) to transcribe mathematical images into structured text. Includes transparency handling and contrast optimization.
- ASR Engine: Uses Google Cloud Speech-to-Text v2 (Chirp 2) to transcribe spoken mathematical queries, converting natural speech into formal problem statements.
- Parser Agent: Normalizes raw input into a structured schema, identifying variables, constraints, and the quantity to be solved for. Flags ambiguous input for user clarification.
- Router Agent: Classifies intent (e.g., Algebra, Probability, Calculus) to select the optimal solving strategy and filter the knowledge base.
- Solver Agent: The core reasoning engine. It adopts a "Program-of-Thought" approach, generating SymPy code to solve problems deterministically. It integrates RAG to access mathematical laws and reference material.
- Verifier Agent: A "Judge" model that validates solutions by:
  - Numerical Substitution (plugging answers back into equations).
  - Conceptual Sanity Checks (validating units, domains, and bounds).
- Explainer (DeckGen + Solver): Rather than a redundant text summarizer, the system uses a specialized Visual Deck Generator. This component takes the Solver's logical trace and transforms it into a step-by-step visual explanation.
- Vector Store (FAISS): Stores embeddings of past interactions.
- Relational DB (SQLite): Logs full conversation history, user feedback, and verification states.
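The Solver and Verifier roles described above can be illustrated with a short SymPy sketch: one function solves an equation exactly, and another plugs each candidate back in and checks the residual numerically. The function names, the tolerance, and the example equation are assumptions made for illustration; the code the Solver Agent actually generates may look quite different.

```python
import sympy as sp


def solve_equation(lhs, rhs, symbol):
    """Solver step: solve lhs = rhs exactly for the given symbol."""
    return sp.solve(sp.Eq(lhs, rhs), symbol)


def verify_by_substitution(lhs, rhs, symbol, candidates, tol=1e-9):
    """Verifier step: substitute each candidate and check that lhs - rhs is ~0."""
    if not candidates:
        return False  # nothing to verify counts as a failure
    for value in candidates:
        residual = (lhs - rhs).subs(symbol, value)
        if abs(complex(sp.N(residual))) > tol:
            return False
    return True


x = sp.Symbol("x")
lhs, rhs = x**2 - 5 * x + 6, sp.Integer(0)

roots = solve_equation(lhs, rhs, x)
print(roots)                                        # [2, 3]
print(verify_by_substitution(lhs, rhs, x, roots))   # True
print(verify_by_substitution(lhs, rhs, x, [4]))     # False
```

Keeping the check independent of the solve step is the point: a wrong candidate is caught by arithmetic rather than by a second opinion from the model.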
flowchart TD
%% Nodes
User([User])
UI[Frontend Interface]
Orchestrator{Orchestrator}
DeckGen["Visual Explainer\n(Deck Generator)"]
subgraph Perception["Perception Layer"]
OCR[OCR Engine]
ASR[ASR Engine]
end
subgraph Agents["Cognitive Layer"]
Parser[Parser Agent]
Router[Router Agent]
Solver[Solver Agent]
Verifier[Verifier Agent]
end
subgraph Memory["Memory System"]
RAG[(Knowledge Base)]
History[(Episodic Memory)]
end
%% Edges
User <--> UI
UI -->|Image| OCR
UI -->|Audio| ASR
UI -->|Text| Orchestrator
OCR --> Orchestrator
ASR --> Orchestrator
%% Core Loop
Orchestrator --> Parser
Parser -->|Structured JSON| Orchestrator
Orchestrator --> Router
Router -->|Strategy| Orchestrator
Orchestrator -->|Problem + Context| Solver
Solver <-->|Retrieve Similar| History
Solver <-->|Retrieve Knowledge| RAG
Solver -->|Python Code| Verifier
Verifier -->|Substitution Check| Orchestrator
Orchestrator -.->|Reflexion Retry| Solver
Orchestrator -->|Verified Trace| DeckGen
DeckGen -->|Visual Deck| UI
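The core loop in the diagram (solve, verify, retry with feedback) can be sketched in plain Python as follows. This is a hedged illustration only: the `Orchestrator` and `Attempt` classes, the callable signatures, and the `max_retries` default are hypothetical, and toy functions stand in for the LLM-backed Solver and Verifier.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Attempt:
    answer: object
    verified: bool
    feedback: str = ""


@dataclass
class Orchestrator:
    """Hypothetical orchestrator that retries the solver with verifier feedback."""
    solve: Callable[[str, list[str]], object]          # (problem, feedback so far) -> answer
    verify: Callable[[str, object], tuple[bool, str]]  # (problem, answer) -> (ok, reason)
    max_retries: int = 2
    trace: list[Attempt] = field(default_factory=list)

    def run(self, problem: str) -> Attempt:
        feedback: list[str] = []
        for _ in range(self.max_retries + 1):
            answer = self.solve(problem, feedback)
            ok, reason = self.verify(problem, answer)
            attempt = Attempt(answer, ok, reason)
            self.trace.append(attempt)
            if ok:
                return attempt
            feedback.append(reason)  # surface the failure to the next attempt
        return self.trace[-1]  # still unverified; the caller can escalate to the user


# Toy stand-ins: the first attempt is wrong, the second one uses the feedback.
def toy_solve(problem, feedback):
    return 4 if feedback else 5


def toy_verify(problem, answer):
    ok = answer == 4
    return ok, "" if ok else "substitution check failed"


orchestrator = Orchestrator(toy_solve, toy_verify)
result = orchestrator.run("2 + 2")
print(result.answer, result.verified)  # 4 True
print(len(orchestrator.trace))         # 2
```

Recording every attempt in `trace` keeps failed tries visible, which matches the stated goal of avoiding silent failure.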
- Frontend: Streamlit (Python)
- LLM Orchestration: Google Gemini 2.0 Flash & Pro
- Symbolic Math: SymPy, NumPy
- Vector Search: FAISS
- Backend Framework: Python 3.11+
- Python 3.11 or higher
- Google Gemini API Key (required)
- Google Cloud credentials (optional, for audio transcription)
- Clone the repository

      git clone <repository_url>
      cd math-mentor
- Install dependencies. A virtual environment is recommended.

      # Using uv (recommended)
      uv sync

      # Or using pip
      pip install -r requirements.txt
- Configure Environment. Copy .env.example to .env and configure:

      cp .env.example .env

  Minimum configuration (text input only):

      GEMINI_API_KEY=your_gemini_api_key_here

  Full configuration (audio + vision):

      GEMINI_API_KEY=your_gemini_api_key_here
      GOOGLE_APPLICATION_CREDENTIALS=path/to/gcp-credentials.json
      GCP_PROJECT_ID=your_project_id
      STT_LOCATION=global
      STT_RECOGNIZER=_
- Run the Application

      # Using uv
      uv run streamlit run frontend/app.py

      # Or with venv activated
      streamlit run frontend/app.py
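Before launching, it can help to confirm which settings from the configuration step are present. The helper below is a hypothetical sketch, not part of the repository. It assumes that only `GOOGLE_APPLICATION_CREDENTIALS` and `GCP_PROJECT_ID` are strictly needed for audio, and treats `STT_LOCATION` and `STT_RECOGNIZER` as optional.

```python
import os
from typing import Mapping, Optional

REQUIRED = ("GEMINI_API_KEY",)
# Assumption: audio transcription needs these two variables; the STT_* settings
# are treated as optional here.
AUDIO = ("GOOGLE_APPLICATION_CREDENTIALS", "GCP_PROJECT_ID")


def missing_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, list[str]]:
    """Report which required and audio-related variables are unset or empty."""
    env = os.environ if env is None else env
    return {
        "required": [name for name in REQUIRED if not env.get(name)],
        "audio": [name for name in AUDIO if not env.get(name)],
    }


# Example with the minimum configuration from above (placeholder value).
report = missing_settings({"GEMINI_API_KEY": "your_gemini_api_key_here"})
print(report)
# {'required': [], 'audio': ['GOOGLE_APPLICATION_CREDENTIALS', 'GCP_PROJECT_ID']}
```

An empty `required` list means text input should work; an empty `audio` list suggests audio transcription is configured as well.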
- Select Input Mode: Use the sidebar to switch between Text, Image Upload, or Audio Recording.
- Submit Problem: Enter the math problem. The system will first parse and validate the input.
- Review Plan: The "Thinking Process" expander visualizes the real-time agent workflow (Parsing -> Retrieval -> Planning -> Verification).
- Verify & Explain: The final output includes the computed answer, a step-by-step derivation, and (where applicable) dynamic visual aids.
- Feedback: Use the Thumbs Up/Down and "Edit" buttons to provide feedback, which is stored to improve future performance.