Modernized AI-powered CSV analysis: chat with, summarize, and analyze your CSV files using OpenAI, Anthropic, or a local Ollama model. Built for Streamlit Cloud, local laptops, and a future API split.
This is the v2 rewrite of Safiullah-Rahu/CSV-AI. The product idea is unchanged; the architecture is modular, the AI stack is provider-agnostic, and the UI is a clean modern dashboard.
- 💬 Chat — schema- and sample-aware Q&A with token-streaming.
- 📝 Summarize — single-call structured overview (replaces the old map-reduce flow).
- 📊 Analyze — deterministic pandas stats + LLM analyst narrative side-by-side, with charts, missingness, and correlations.
- 🔌 Multi-provider — OpenAI, Anthropic Claude, or local Ollama.
- 🎛️ Modern UI — sidebar nav, `st.chat_message`, light/dark friendly, custom CSS polish.
- 🧱 Modular — clean `app/` package; no Streamlit imports in services, so a FastAPI layer can be added later.
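As a rough illustration of what "schema- and sample-aware" chat context could look like, the sketch below turns a CSV into a compact text block listing column names, an inferred kind, missing counts, and the first few rows. The function name `build_prompt_context` and the output layout are hypothetical, not taken from the project; it uses only the standard library so it runs without dependencies, whereas the project itself works with pandas.

```python
import csv
import io


def _is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def build_prompt_context(csv_text: str, sample_rows: int = 3) -> str:
    """Describe the columns and a few sample rows as plain text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []

    lines = [f"Rows: {len(rows)}", "Columns:"]
    for name in columns:
        # Empty strings and None (short rows) both count as missing.
        values = [r[name] for r in rows if r[name] not in ("", None)]
        kind = "numeric" if values and all(_is_number(v) for v in values) else "text"
        lines.append(f"- {name} ({kind}, {len(rows) - len(values)} missing)")

    lines.append(f"Sample (first {min(sample_rows, len(rows))} rows):")
    for row in rows[:sample_rows]:
        lines.append(", ".join(f"{k}={row[k]}" for k in columns))
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt_context("city,pop\nOslo,700000\nBergen,\n"))
```

Sending a summary like this instead of the whole file keeps the prompt size roughly constant as the CSV grows, which is presumably the motivation for replacing retrieval over CSV chunks.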
git clone https://github.com/Safiullah-Rahu/CSV-AI.git
cd CSV-AI
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env # then add your API keys
streamlit run streamlit_app.py

Open http://localhost:8501 and upload a CSV.
CSV-AI loads settings from (in order): environment variables → .env file → Streamlit secrets. See .env.example and .streamlit/secrets.toml.example.
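The lookup order can be pictured as a first-match-wins chain. The sketch below assumes "in order" means that an environment variable overrides the `.env` file, which in turn overrides Streamlit secrets; `resolve_setting` and its parameters are hypothetical names for illustration, and the project delegates this work to pydantic-settings rather than hand-written code.

```python
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    env: Mapping[str, str],
    dotenv: Mapping[str, str],
    secrets: Mapping[str, str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the first non-empty value for `name`, checking sources in priority order."""
    for source in (env, dotenv, secrets):
        value = source.get(name)
        if value:  # assumption: an empty string counts as "not set"
            return value
    return default


if __name__ == "__main__":
    # The environment wins over .env, which wins over secrets.
    print(resolve_setting(
        "DEFAULT_PROVIDER",
        env={},
        dotenv={"DEFAULT_PROVIDER": "anthropic"},
        secrets={"DEFAULT_PROVIDER": "openai"},
    ))
```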
| Variable | Default | Purpose |
|---|---|---|
| `OPENAI_API_KEY` | — | Required if using OpenAI. |
| `ANTHROPIC_API_KEY` | — | Required if using Anthropic. |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Local Ollama endpoint. |
| `DEFAULT_PROVIDER` | `openai` | One of `openai`, `anthropic`, `ollama`. |
| `DEFAULT_MODEL` | `gpt-4o-mini` | Used until the user picks one in the sidebar. |
| `DEFAULT_TEMPERATURE` | `0.2` | 0.0–1.5. |
| `DEFAULT_MAX_TOKENS` | `1024` | Response length cap. |
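To make the defaults and allowed ranges concrete, here is a minimal sketch of a validated settings object built from string values. The `Settings` class and `settings_from_mapping` helper are hypothetical stand-ins written with a plain dataclass; the real project uses pydantic-settings, and the "must be positive" check on the token cap is an assumption not stated in the table.

```python
from dataclasses import dataclass
from typing import Mapping

PROVIDERS = ("openai", "anthropic", "ollama")


@dataclass(frozen=True)
class Settings:
    """Defaults mirror the table; validation rules are a sketch."""
    default_provider: str = "openai"
    default_model: str = "gpt-4o-mini"
    default_temperature: float = 0.2
    default_max_tokens: int = 1024
    ollama_base_url: str = "http://localhost:11434"

    def __post_init__(self) -> None:
        if self.default_provider not in PROVIDERS:
            raise ValueError(f"DEFAULT_PROVIDER must be one of {PROVIDERS}")
        if not 0.0 <= self.default_temperature <= 1.5:
            raise ValueError("DEFAULT_TEMPERATURE must be within 0.0 to 1.5")
        if self.default_max_tokens <= 0:  # assumption: the cap must be positive
            raise ValueError("DEFAULT_MAX_TOKENS must be positive")


def settings_from_mapping(values: Mapping[str, str]) -> Settings:
    """Build Settings from string values, falling back to the defaults."""
    d = Settings()
    return Settings(
        default_provider=values.get("DEFAULT_PROVIDER", d.default_provider),
        default_model=values.get("DEFAULT_MODEL", d.default_model),
        default_temperature=float(values.get("DEFAULT_TEMPERATURE", d.default_temperature)),
        default_max_tokens=int(values.get("DEFAULT_MAX_TOKENS", d.default_max_tokens)),
        ollama_base_url=values.get("OLLAMA_BASE_URL", d.ollama_base_url),
    )


if __name__ == "__main__":
    print(settings_from_mapping({"DEFAULT_PROVIDER": "ollama", "DEFAULT_TEMPERATURE": "0.7"}))
```

Failing at startup on an out-of-range value gives a clearer error than a rejected request later in a chat session.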
app/
├── config/ # pydantic-settings (env + secrets)
├── llm/ # provider-agnostic LLM interface + OpenAI / Anthropic / Ollama
├── data/ # CSV loader, profiler, sampler, prompt-context builder
├── prompts/ # versioned system prompts
├── services/ # ChatService, SummaryService, AnalysisService (UI-free)
├── ui/ # Streamlit pages + components + theme + session state
└── utils/ # logging, errors, token counting
tests/ # pytest suite (loader, profiler, factory)
streamlit_app.py # Streamlit Cloud entry point
See ARCHITECTURE.md for the rationale behind each layer.
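The layering above can be sketched as a small provider interface, a factory, and a service that never imports Streamlit. Every name here (`LLMClient`, `EchoClient`, `create_client`, `ChatService.ask`) is hypothetical and chosen for illustration; the real interfaces live in `app/llm/` and `app/services/` and may look different. `EchoClient` is a stand-in that makes no network calls, so the example runs offline.

```python
from typing import Callable, Dict, Iterator, Protocol


class LLMClient(Protocol):
    """Minimal provider-agnostic interface: stream text chunks for a prompt."""

    def stream(self, system: str, user: str) -> Iterator[str]: ...


class EchoClient:
    """Stand-in provider that echoes the user prompt word by word."""

    def stream(self, system: str, user: str) -> Iterator[str]:
        for word in user.split():
            yield word + " "


# A real registry would map "openai", "anthropic", and "ollama" to SDK-backed clients.
_REGISTRY: Dict[str, Callable[[], LLMClient]] = {"echo": EchoClient}


def create_client(provider: str) -> LLMClient:
    """Look up and instantiate a client by provider name."""
    try:
        return _REGISTRY[provider]()
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None


class ChatService:
    """UI-free service: depends only on the LLMClient interface."""

    def __init__(self, client: LLMClient, system_prompt: str) -> None:
        self._client = client
        self._system_prompt = system_prompt

    def ask(self, question: str, context: str) -> Iterator[str]:
        user = f"{context}\n\nQuestion: {question}"
        return self._client.stream(self._system_prompt, user)


if __name__ == "__main__":
    service = ChatService(create_client("echo"), "You are a CSV analyst.")
    print("".join(service.ask("How many rows?", "Rows: 2")))
```

Because the service yields plain strings, a Streamlit page can render the chunks as they arrive, and a later HTTP layer could forward the same iterator without changing the service.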
- Streamlit Community Cloud — point it at `streamlit_app.py`, add keys to Secrets.
- Local — `streamlit run streamlit_app.py`.
- Docker — `docker compose up --build` (uses `.env`).
- Future API split — services are pure-Python; a FastAPI layer is a small adapter.
Full instructions in DEPLOYMENT.md.
pip install -r requirements-dev.txt
pytest # run tests
ruff check . # lint
black . # format

| | v1 | v2 |
|---|---|---|
| Architecture | 278-line `app.py` | modular `app/` package |
| LLM | OpenAI only, via LangChain | OpenAI · Anthropic · Ollama via thin native SDKs |
| Imports | `langchain.chat_models`, `langchain.embeddings` (deprecated) | current SDKs |
| Chat context | FAISS retrieval over CSV chunks | schema + smart sample (cheaper, more accurate) |
| Summarize | LangChain `load_summarize_chain(map_reduce)` | single structured prompt |
| Analyze | `create_pandas_dataframe_agent` only | deterministic pandas stats + LLM narrative |
| UI | one selectbox of "functionality" | sidebar nav + tabbed stats + theme polish |
| Config | `os.environ` inline | pydantic-settings |
| Tests | none | pytest suite |
| Docker | none | Dockerfile + compose |
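The "deterministic stats + LLM narrative" row can be read as: compute the numbers in code, then ask the model only to explain them. The sketch below shows that split under stated assumptions; `column_stats` and `narrative_prompt` are hypothetical names, and it uses the standard library in place of pandas so it runs without dependencies.

```python
from statistics import mean
from typing import Dict, List, Optional


def column_stats(values: List[Optional[float]]) -> Dict[str, float]:
    """Compute count, missing percentage, and basic stats for one numeric column."""
    present = [v for v in values if v is not None]
    missing_pct = round(100 * (len(values) - len(present)) / len(values), 1) if values else 0.0
    stats: Dict[str, float] = {"count": len(present), "missing_pct": missing_pct}
    if present:
        stats.update(mean=round(mean(present), 2), min=min(present), max=max(present))
    return stats


def narrative_prompt(stats: Dict[str, Dict[str, float]]) -> str:
    """Build a prompt that hands the model precomputed numbers only."""
    lines = ["Explain these precomputed statistics. Do not invent numbers."]
    for column, s in stats.items():
        lines.append(f"{column}: " + ", ".join(f"{k}={v}" for k, v in s.items()))
    return "\n".join(lines)


if __name__ == "__main__":
    stats = {"price": column_stats([1.0, None, 3.0, 4.0])}
    print(narrative_prompt(stats))
```

With this split the figures shown to the user come from ordinary arithmetic and are reproducible, while the model contributes only the wording.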
MIT — see LICENSE.