Skip to content
View vinimabreu's full-sized avatar
This profile updates itself on weekdays
This profile updates itself on weekdays
  • Brazil

Block or report vinimabreu

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Maximum 250 characters. Please don’t include any personal information such as legal names or email addresses. Markdown is supported. This note will only be visible to you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse
vinimabreu/README.md

Vinicius Pereira. AI systems that are grounded, tested, and honest.

The model proposes, the code disposes.
The model holds the conversation. Tested code makes every decision that matters.

I build retrieval, agents, MCP servers, voice, and automation with that one rule at the center. The flagships ship deterministic test suites that need no API key, show captured output from actual runs, and come with an architecture diagram; every README is honest about the trade-offs. Twenty-one public projects, more than 200 deterministic tests, one rule.

Python FastAPI Pydantic pytest Playwright pandas Tesseract OCR Claude MCP (Model Context Protocol) Chroma SQLite Docker n8n Twilio Retell Streamlit GitHub Actions TypeScript JavaScript React Next.js Vue.js Node.js Vercel Supabase

The rule, running · Flagships · Retrieval and RAG · Agents and tools · Voice and automation · Evaluation and quality · Data engineering and extraction · Stack · How I work

The house rule, running

Three captured scenes on rotation: web-pilot's guardrails block a password field and an off-site jump, rag-chat abstains honestly below its measured retrieval floor, and doc-eval's gate blocks a release whose headline number improved.

Three scenes on rotation, all from captured runs: web-pilot's guardrails blocking a credential field and an off-site jump (examples/safety_demo.py), rag-chat abstaining below its measured BM25 floor, and doc-eval's gate failing a candidate that improved the headline number while losing a field. Each one is reproducible from its repo.

Flagships

bedrock: a NL-to-SQL data agent that proves it answers the same right thing every time rag-chat: chat with your docs, grounded answers with clickable citations web-pilot: a browser-use agent with guardrails enforced in code voice-receptionist: an AI phone receptionist that books appointments against a live calendar

bedrock · rag-chat · web-pilot · voice-receptionist

Retrieval and RAG

Citations you can click, an honest "I don't know", and retrieval quality that is measured, not assumed.

source-of-truth
Centralizes scattered sources (markdown, FAQ, HTML, plaintext) into one queryable base that flags where two sources disagree, marks stale sources, and abstains honestly. Deterministic, no API key.

rag-chat
Chat-with-your-docs widget: grounded answers with clickable citations and a working chat UI. Key-free demo mode.

rag-quality
A RAG pipeline with its own eval harness (hit@k, MRR, recall@k): BM25, dense, and RRF hybrid retrieval, scored against a labeled corpus.

mini-rag
A small, working RAG API on FastAPI, Chroma, and Claude. The clean baseline.

Agents and tools

Agents that can only do what their tool layer allows.

web-pilot
Browser-use agent built on the house rule: the model proposes one action, code disposes. Closed action vocabulary, guardrails (domain allowlist, no credential or payment fields, step budget), full audit trace.

multi-agent-analyst
Planner, self-correcting SQL agent, BM25 retriever, and a verifier crew. Cites evidence or honestly refuses.

sql-agent
Plain English to SQL, with a self-correction loop, on a read-only connection.

mcp-listings
MCP server exposing a property-listings dataset to Claude as typed tools.

Voice and automation

Conversation up front, a tested service underneath, and a human whenever confidence drops.

voice-receptionist
AI phone receptionist over Twilio: books against a live calendar, abstains on policy questions, logs every call.

retell-sms-handler
Makes a Retell voice agent's In-Call SMS fire deterministically through a Conversation Flow function node: consent gated, the destination resolved in code, every send logged. FastAPI and Twilio.

n8n-lead-triage
n8n routes, a tested HTTP service decides. Low confidence or any model failure routes to a human, never to silence.

grounded-copy
Generates on-brand product copy from your catalog, then checks every claim back against it: an invented price, spec, discount or material is flagged for review, nothing fabricated ships. Deterministic, key-free core.

Evaluation and quality

A regression gets blocked before it ships, not discovered after.

bedrock
A natural-language-to-SQL data agent with a stability harness: it runs each question K times against a defended answer key, flags the ones that flap, and a CI gate blocks a candidate when reliability regresses. Deterministic, key-free demo.

doc-eval
Field-level evaluation and a CI release gate for LLM document extraction.

Data engineering and extraction

The layer underneath everything above: scrape, parse, validate, schedule.

Live Central Bank of Brazil indicators rendered by this repo's scheduled pipeline

Not a screenshot: live numbers from the Central Bank of Brazil, refreshed on weekdays by a scheduled pipeline in this repo, committed only when the data changes. The commit history is the proof.

record-refinery
Raw business records in, a clean deduplicated dataset plus a QA brief out. Email, phone, and url validated in code, with an optional model-proposed canonical name for fuzzy dedup.

pdf-extract
PDFs into validated structured JSON, with pluggable OCR.

docs-to-markdown
Any documentation or marketing site into a clean Markdown corpus, ready for RAG.

ai-watcher
Watches RSS feeds and turns new articles into structured AI summaries with Claude.

bcb-data-pipeline
Daily ETL of Brazilian macro indicators from the central bank API, on pandas and GitHub Actions.

web-scraper
A Playwright scraper, SQLite storage, and a Streamlit dashboard, end to end.

Stack

Core  Python · FastAPI · Pydantic · pytest
Retrieval  Chroma · BM25 + dense embeddings · SQLite
Models  Anthropic Claude API · MCP (Model Context Protocol)
Extraction  Playwright · pandas · Tesseract OCR · AWS Textract (optional adapter)
Automation  n8n · Twilio · Retell · Streamlit
Full-stack web  TypeScript · JavaScript · React · Next.js · Vue · Node.js · Vercel · Supabase
Ops  Docker · GitHub Actions

How I work

Grounded Tested Honest
Answers cite their sources and refuse when the evidence is not there. Deterministic suites that run with no API key and no network; CI gates catch regressions. Trade-offs go in the README: rag-quality ships the eval showing hybrid retrieval was not a free win on its corpus.

"completed the work ahead of schedule and with accuracy. I would hire him again."

recent client review

Open to AI engineering, RAG, agents, and data work.
Based in Brazil, working with clients worldwide.

Pinned Loading

  1. doc-eval doc-eval Public

    Field-level evaluation and release gate for LLM document extraction: exact golden set, per-field precision/recall, CI gate that blocks regressions

    Python

  2. multi-agent-analyst multi-agent-analyst Public

    A multi-agent crew that answers business questions no single source can: a planner routes work to a self-correcting SQL agent and a BM25 doc retriever, a verifier feeds objections back, and every a…

    Python

  3. n8n-lead-triage n8n-lead-triage Public

    AI lead triage in n8n: the workflow routes, a tested HTTP service decides, and low confidence or any model failure routes to a human instead of dropping the lead

    Python

  4. rag-chat rag-chat Public

    Chat-with-your-docs widget: grounded answers from your documents with clickable citations, an honest "I don't know", and a real chat UI. Pure-Python retrieval, runs key-free in demo mode.

    Python

  5. voice-receptionist voice-receptionist Public

    An AI phone receptionist for a multi-location business: books against a live agenda over Twilio, answers policy questions with abstention, hands off to humans, and logs every call. The model propos…

    Python

  6. web-pilot web-pilot Public

    A browser-use agent where the model proposes one action and code disposes: closed action vocabulary, guardrails (domain allowlist, no credential/payment fields, step budget), full audit trace. Runs…

    Python