The model proposes, the code disposes.
The model holds the conversation. Tested code makes every decision that matters.
I build retrieval, agents, MCP servers, voice, and automation with that one rule at the center. The flagships ship deterministic test suites that need no API key, show captured output from actual runs, and come with an architecture diagram; every README is honest about the trade-offs. Twenty-one public projects, more than 200 deterministic tests, one rule.
The rule, running · Flagships · Retrieval and RAG · Agents and tools · Voice and automation · Evaluation and quality · Data engineering and extraction · Stack · How I work
Three scenes on rotation, all from captured runs: web-pilot's guardrails blocking a credential field and an off-site jump (examples/safety_demo.py), rag-chat abstaining below its measured BM25 floor, and doc-eval's gate failing a candidate that improved the headline number while losing a field. Each one is reproducible from its repo.
bedrock · rag-chat · web-pilot · voice-receptionist
Citations you can click, an honest "I don't know", and retrieval quality that is measured, not assumed.
source-of-truth
Centralizes scattered sources (markdown, FAQ, HTML, plaintext) into one queryable base that flags where two sources disagree, marks stale sources, and abstains honestly. Deterministic, no API key.
rag-chat
Chat-with-your-docs widget: grounded answers with clickable citations and a working chat UI. Key-free demo mode.
rag-quality
A RAG pipeline with its own eval harness (hit@k, MRR, recall@k): BM25, dense, and RRF hybrid retrieval, scored against a labeled corpus.
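As a rough illustration of the metrics and fusion named above, here is a minimal sketch rather than rag-quality's actual code. The function names, the `k=60` RRF constant, and the toy rankings are assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to a doc's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hit_at_k(ranked, relevant, k):
    """1.0 if any relevant doc appears in the top k, else 0.0."""
    return 1.0 if any(d in relevant for d in ranked[:k]) else 0.0


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant docs that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant doc; MRR is the mean of this over queries."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Toy rankings (assumed data): one from BM25, one from a dense retriever.
bm25 = ["a", "b", "c"]
dense = ["c", "a", "d"]
fused = rrf_fuse([bm25, dense])
```

Scoring `fused` and each single retriever against the same labeled set is what lets a README say whether hybrid retrieval actually helped on that corpus.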
mini-rag
A small, working RAG API on FastAPI, Chroma, and Claude. The clean baseline.
Agents that can only do what their tool layer allows.
web-pilot
Browser-use agent built on the house rule: the model proposes one action, code disposes. Closed action vocabulary, guardrails (domain allowlist, no credential or payment fields, step budget), full audit trace.
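The "model proposes, code disposes" step can be sketched roughly as below. This is an illustrative sketch, not web-pilot's implementation: the action names, the allowlisted domain, the blocked-field hints, and the step budget are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# All of these values are assumptions for the example.
ALLOWED_ACTIONS = {"click", "type", "navigate", "done"}
ALLOWED_DOMAINS = {"example.com"}
BLOCKED_FIELD_HINTS = ("password", "card", "cvv")
MAX_STEPS = 10


@dataclass(frozen=True)
class Action:
    kind: str
    target: str = ""  # URL for "navigate", selector or field name otherwise
    text: str = ""


def check(action, step):
    """Return (allowed, reason). The model only proposes; this function decides."""
    if step >= MAX_STEPS:
        return False, "step budget exhausted"
    if action.kind not in ALLOWED_ACTIONS:
        return False, "action outside the closed vocabulary"
    if action.kind == "navigate":
        host = urlparse(action.target).hostname or ""
        on_site = any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
        if not on_site:
            return False, "domain not on the allowlist"
    if action.kind == "type":
        field = action.target.lower()
        if any(hint in field for hint in BLOCKED_FIELD_HINTS):
            return False, "credential or payment field"
    return True, "ok"


# Every decision is appended to a trace, allowed or not.
trace = []
for step, proposed in enumerate([
    Action("click", "button#next"),
    Action("type", "input#password", "hunter2"),
    Action("navigate", "https://evil.test/login"),
]):
    allowed, reason = check(proposed, step)
    trace.append((step, proposed.kind, allowed, reason))
```

Keeping the check a pure function of the proposed action and the step count is what makes it testable without a browser or an API key.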
multi-agent-analyst
Planner, self-correcting SQL agent, BM25 retriever, and a verifier crew. Cites evidence or honestly refuses.
sql-agent
Plain English to SQL, with a self-correction loop, on a read-only connection.
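One way to sketch a self-correction loop on a read-only connection, assuming SQLite as the database. The `propose_sql` callback stands in for the model and is hypothetical, as are the function names and the attempt limit; sql-agent's real loop may differ.

```python
import sqlite3


def open_read_only(path):
    """Open a SQLite file read-only via a URI, so any write raises an error."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def answer(conn, question, propose_sql, max_attempts=3):
    """Ask for SQL, run it, and feed any database error back for a retry."""
    error = None
    for _ in range(max_attempts):
        sql = propose_sql(question, error)  # hypothetical model call
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Includes attempted writes, which the read-only connection rejects.
            error = str(exc)
    raise RuntimeError(f"no valid query after {max_attempts} attempts: {error}")
```

Because the connection itself is read-only, a proposed `DELETE` or `UPDATE` fails at the database layer and simply becomes another error message for the next attempt.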
mcp-listings
MCP server exposing a property-listings dataset to Claude as typed tools.
Conversation up front, a tested service underneath, and a human whenever confidence drops.
voice-receptionist
AI phone receptionist over Twilio: books against a live calendar, abstains on policy questions, logs every call.
retell-sms-handler
Makes a Retell voice agent's In-Call SMS fire deterministically through a Conversation Flow function node: consent gated, the destination resolved in code, every send logged. FastAPI and Twilio.
n8n-lead-triage
n8n routes, a tested HTTP service decides. Low confidence or any model failure routes to a human, never to silence.
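The routing rule above might look roughly like this. It is a sketch under assumptions: the route labels, the 0.8 confidence floor, and the `classify` callback (standing in for the model call) are hypothetical, not the service's real interface.

```python
KNOWN_ROUTES = {"sales", "support", "spam"}  # hypothetical labels
CONFIDENCE_FLOOR = 0.8                       # hypothetical threshold


def triage(lead, classify, floor=CONFIDENCE_FLOOR):
    """Decide a route in code; anything doubtful goes to a human."""
    try:
        # A malformed model response fails to unpack here and is treated
        # the same as any other model failure.
        label, confidence = classify(lead)
    except Exception as exc:
        return {"route": "human", "reason": f"model failure: {type(exc).__name__}"}
    if label not in KNOWN_ROUTES:
        return {"route": "human", "reason": f"unknown label: {label!r}"}
    if confidence < floor:
        return {"route": "human", "reason": f"low confidence: {confidence:.2f}"}
    return {"route": label, "reason": "ok"}
```

Every branch returns a route, so there is no path on which a lead is silently dropped.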
grounded-copy
Generates on-brand product copy from your catalog, then checks every claim back against it: an invented price, spec, discount, or material is flagged for review, so nothing fabricated ships. Deterministic, key-free core.
A regression gets blocked before it ships, not discovered after.
bedrock
A natural-language-to-SQL data agent with a stability harness: the harness runs each question K times against a defended answer key and flags the ones that flap, and a CI gate blocks a candidate when reliability regresses. Deterministic, key-free demo.
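A minimal sketch of the idea, not bedrock's actual harness: `ask` stands in for the agent, and the report shape, the default `k`, and the zero tolerance are assumptions for illustration.

```python
def stability_report(ask, answer_key, k=5):
    """Run each question k times and record pass rate and whether answers flap."""
    report = {}
    for question, expected in answer_key.items():
        answers = [ask(question) for _ in range(k)]
        passes = sum(answer == expected for answer in answers)
        report[question] = {
            "pass_rate": passes / k,
            "flaps": len(set(answers)) > 1,  # more than one distinct answer seen
        }
    return report


def gate(baseline, candidate, tolerance=0.0):
    """Return (passed, regressions): fail if any question's pass rate dropped."""
    regressions = [
        question
        for question, stats in baseline.items()
        if candidate.get(question, {"pass_rate": 0.0})["pass_rate"]
        < stats["pass_rate"] - tolerance
    ]
    return not regressions, regressions
```

In CI, a failing `gate` would translate to a non-zero exit code, which is what blocks the candidate.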
doc-eval
Field-level evaluation and a CI release gate for LLM document extraction.
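To illustrate why a field-level gate can fail a candidate whose headline number improved, here is a small sketch. It assumes exact-match scoring, that every gold record carries the same fields, and made-up field names; doc-eval's real scoring rules may differ.

```python
def field_accuracy(predictions, gold):
    """Exact-match accuracy per field, assuming all gold records share fields."""
    fields = list(gold[0].keys())
    total = len(gold)
    return {
        field: sum(p.get(field) == g[field] for p, g in zip(predictions, gold)) / total
        for field in fields
    }


def release_gate(baseline, candidate):
    """Fail when any single field regresses, whatever the average does."""
    lost = sorted(f for f in baseline if candidate.get(f, 0.0) < baseline[f])
    mean = lambda scores: sum(scores.values()) / len(scores)
    return {
        "passed": not lost,
        "lost_fields": lost,
        "headline_delta": mean(candidate) - mean(baseline),
    }


# Assumed per-field scores: the average rises while "vendor" gets worse.
baseline_scores = {"total": 0.5, "date": 0.5, "vendor": 1.0}
candidate_scores = {"total": 1.0, "date": 1.0, "vendor": 0.75}
result = release_gate(baseline_scores, candidate_scores)
```

Here `result["headline_delta"]` is positive, yet the gate fails and names `vendor` as the field that was lost.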
The layer underneath everything above: scrape, parse, validate, schedule.
Not a screenshot: live numbers from the Central Bank of Brazil, refreshed on weekdays by a scheduled pipeline in this repo, committed only when the data changes. The commit history is the proof.
record-refinery
Raw business records in, a clean deduplicated dataset plus a QA brief out. Email, phone, and URL validated in code, with an optional model-proposed canonical name for fuzzy dedup.
pdf-extract
PDFs into validated structured JSON, with pluggable OCR.
docs-to-markdown
Any documentation or marketing site into a clean Markdown corpus, ready for RAG.
ai-watcher
Watches RSS feeds and turns new articles into structured AI summaries with Claude.
bcb-data-pipeline
Daily ETL of Brazilian macro indicators from the central bank API, on pandas and GitHub Actions.
web-scraper
A Playwright scraper, SQLite storage, and a Streamlit dashboard, end to end.
Core: Python · FastAPI · Pydantic · pytest
Retrieval: Chroma · BM25 + dense embeddings · SQLite
Models: Anthropic Claude API · MCP (Model Context Protocol)
Extraction: Playwright · pandas · Tesseract OCR · AWS Textract (optional adapter)
Automation: n8n · Twilio · Retell · Streamlit
Full-stack web: TypeScript · JavaScript · React · Next.js · Vue · Node.js · Vercel · Supabase
Ops: Docker · GitHub Actions
| Grounded | Tested | Honest |
|---|---|---|
| Answers cite their sources and refuse when the evidence is not there. | Deterministic suites that run with no API key and no network; CI gates catch regressions. | Trade-offs go in the README: rag-quality ships the eval showing hybrid retrieval was not a free win on its corpus. |
"completed the work ahead of schedule and with accuracy. I would hire him again."
recent client review
Open to AI engineering, RAG, agents, and data work.
Based in Brazil, working with clients worldwide.