Skip to content
View MateusRestier's full-sized avatar
๐ŸŽฏ
Focusing
๐ŸŽฏ
Focusing

Highlights

  • Pro

Block or report MateusRestier

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Maximum 250 characters. Please don't include any personal information such as legal names or email addresses. Markdown supported. This note will be visible to only you.
Report abuse

Contact GitHub support about this userโ€™s behavior. Learn more about reporting abuse.

Report abuse
MateusRestier/README.md

Typing SVG


๐Ÿง  About Me

I'm a Data Scientist based in Rio de Janeiro, Brazil ๐Ÿ‡ง๐Ÿ‡ท. Currently, I work at FGV IBRE, focusing on Generative AI, Machine Learning, and Data Engineering within the AWS ecosystem. My work involves building advanced NLP pipelines, developing classification models, and optimizing cloud data processes using Snowflake and Streamlit.

I hold a B.Sc. in Computer Science and have a strong background in automating business processes and BI from my previous experiences at Bagaggio and Enel.

Fun fact: Before diving into data, I was a professional e-sports player, a journey that sharpened my resilience, strategic thinking, and ability to perform under high pressure ๐ŸŽฎ.


๐Ÿ› ๏ธ Tech Stack

Languages & Frameworks

Python SQL FastAPI LangChain LangGraph Streamlit Scikit-Learn Pandas Selenium

Cloud & Data Engineering

AWS Snowflake DuckDB Dagster Terraform Docker GitHub Actions Power BI Git

Vector Stores & AI Infra

pgvector Qdrant Google Gemini Azure OpenAI Google Earth Engine


๐Ÿš€ Featured Projects

๐Ÿ”น Automated Economic Releases โ€” automated-economic-releases ๐Ÿ”’

Streamlit platform that automates press release generation for 15+ FGV IBRE economic indicators. Ingests Excel data via pandas/openpyxl, assembles few-shot prompts from historical bulletins, and calls Azure OpenAI (GPT-4o) to generate narratives. Final output is rendered as a formatted Word (.docx) via docxtpl (Jinja2) and logs downloads to Snowflake. Optionally enriches context via a RAG API before generation.

๐Ÿ”น Document RAG System โ€” document-vector-pipeline ยท document-rag-framework ๐Ÿ”’

End-to-end semantic search and RAG system for Portuguese-language journalism and economic articles. The pipeline fetches from MongoDB/AWS DocumentDB, cleans noisy text with spaCy (pt_core_news_sm) + NLTK, generates dense embeddings via sentence-transformers, and stores them in Qdrant with SHA-256 deduplication and checkpoint/resume support. A LangGraph-orchestrated framework then enriches economic analysis text by querying Qdrant across multiple collections, scoring chunk relevance with an LLM, and integrating context via Azure OpenAI (GPT-4o) โ€” with configurable quality thresholds and YAML prompt overrides. Built for AWS SageMaker deployment.

๐Ÿ”น COPOM RAG System โ€” copom-vector-pipeline ยท copom-rag-api ยท copom-streamlit ๐Ÿ”“

๐ŸŒ Live app โ€” End-to-end RAG system for Q&A over COPOM (Brazilian Monetary Policy Committee) documents. The pipeline downloads meeting minutes and communications from the BCB Open Data API, parses PDFs with pdfplumber, chunks with LangChain + tiktoken, embeds with Google Gemini, and stores 1536-dim vectors in PostgreSQL + pgvector (HNSW index). A FastAPI service retrieves chunks via cosine search, optionally reranks with LLM, and generates cited answers with Gemini. A Streamlit app delivers the Q&A interface, document filters, and an admin panel with live ingestion logs.

๐Ÿ”น Nightsight Analytics โ€” nightsight-analytics ๐Ÿ”’

Monthly economic activity monitor for all 5,570 Brazilian municipalities derived from VIIRS/DNB nighttime satellite imagery (NASA/NOAA). Batch-extracts radiance data via Google Earth Engine, applies Winsorization (p99), IHS transform, and temporal Z-score normalization per municipality, then loads into Snowflake. An interactive Streamlit + PyDeck WebGL dashboard renders choropleth maps, municipal time series, and regional rankings.

๐Ÿ”น Insight Invest โ€” insight-invest ๐Ÿ”“

Automated stock analysis system that scrapes fundamental indicators from Investidor10 via BeautifulSoup4, stores them in PostgreSQL, and trains scikit-learn Random Forest models for both price forecasting (regressor) and performance classification. A daily schedule-based orchestrator runs scraping, multi-day predictions, and recommendations automatically. Results are explored through a Dash/Plotly dashboard with Bootstrap theming, backed by Docker Compose.

๐Ÿ”น Sound DNA โ€” sound-dna ๐Ÿ”“

End-to-end music genre classification pipeline: ingests tracks from YouTube via yt-dlp, stores metadata in MongoDB, and extracts 369 audio features per track using librosa (MFCCs, Chroma, Spectral Contrast, ZCR, RMS, Tempogram). Trains XGBoost and Random Forest classifiers, and explores clusters via PCA โ†’ K-Means โ†’ t-SNE. A Streamlit app accepts YouTube links or file uploads and returns real-time genre predictions with waveform, mel-spectrogram, and MFCC visualizations.

๐Ÿ”น Scheduled Action Bot โ€” scheduled-action-bot ๐Ÿ”’

Human-in-the-loop browser automation system with a multi-layer scheduling architecture. A FastAPI app deployed on Render (Docker) stays alive via UptimeRobot + GitHub Actions keepalive. At scheduled times, cron-job.org triggers the API, which reads email confirmations via IMAP and โ€” upon approval โ€” dispatches a Selenium (Chrome headless) agent to execute the browser workflow with randomized timing delays. Action results are reported back via Resend API.

๐Ÿ”น Competitor Analysis โ€” competitor-analysis ๐Ÿ”’

Parallel web scraping and data intelligence system that monitors 6 Brazilian e-commerce competitors across 12 product categories, extracting 20+ attributes per product. Each retailer has a dedicated Selenium + BeautifulSoup4 scraper tailored to its DOM structure, with hybrid extraction (JSON-LD structured data + CSS selectors), dynamic scroll/pagination, and user-agent spoofing. A ThreadPoolExecutor runs all 6 scrapers concurrently, bulk-inserts results into SQL Server via SQLAlchemy + pyodbc (parameter-limit-aware batching), and passes the data through a 50+ rule normalization engine (colors, sizes, materials, attribute inference). Final output feeds a Power BI dashboard for pricing and product mix analysis.

๐Ÿ”น Football DataOps Lakehouse โ€” football-dataops-lakehouse ๐Ÿ”“

Local data lakehouse replicating AWS architecture (S3 โ†’ Athena โ†’ MWAA) entirely on open-source tooling. Uses Terraform to provision MinIO buckets, Dagster to orchestrate a medallion pipeline (raw โ†’ validated โ†’ trusted) over StatsBomb open data, Great Expectations for data quality gates (coordinates, IDs), and DuckDB for stateless Parquet queries. Full Docker Compose setup with GitHub Actions CI/CD.

๐Ÿ”น JoyBind โ€” joybind ๐Ÿ”“

Windows desktop app that maps any USB/Bluetooth gamepad to keyboard, mouse, and macro sequences โ€” no drivers required. Built with customtkinter (GUI), pygame (60 Hz joystick polling), and pyautogui/pynput (OS-level input simulation via Windows SendInput for Raw Input compatibility). Supports key combos, hold-while-pressed, timed macros, multi-step action sequences, analog stick โ†’ mouse, and JSON-based preset system. Distributed as a standalone .exe via PyInstaller.


๐ŸŽฏ Currently Focused On

  • ๐Ÿค– RAG & LLM Orchestration: Building LangGraph-based pipelines with multi-query generation, LLM relevance validation, and vector stores (pgvector, Qdrant) for production economic analysis systems.
  • ๐Ÿงฎ Vector Search & Embeddings: Designing production ETL pipelines for large text corpora โ€” from noisy ingestion and NLP preprocessing to dense embedding storage and semantic retrieval.
  • ๐Ÿง  Machine Learning: Applying classification, forecasting, and pattern recognition models across domains โ€” from financial time series to audio feature engineering.
  • โ˜๏ธ Cloud Architecture: Optimizing data pipelines and efficiency on AWS and Snowflake.

๐Ÿค Let's Connect!

Whether it's about AI, Cloud Engineering, Automation, or Retro Gaming, I'm always happy to chat! ๐Ÿ“ฌ LinkedIn | โœ‰๏ธ restier2001@gmail.com

Pinned Loading

  1. insight-invest insight-invest Public

    End-to-end automated stock analysis, forecasting, and recommendation system using web scraping, RandomForest models, PostgreSQL, and an interactive Dash/Plotly dashboard.

    Python

  2. joybind joybind Public

    JoyBind maps controller buttons to custom keyboard strokes and absolute screen coordinates. Built with Python to simplify and automate macro interactions in games.

    Python

  3. sound-dna sound-dna Public

    End-to-end pipeline for music genre classification: YouTube ingestion (yt-dlp), 369 audio features via DSP (librosa), ML models (XGBoost/Random Forest), and a Streamlit app with interactive spectraโ€ฆ

    Python

  4. copom-rag-api copom-rag-api Public

    RAG API for question answering over COPOM (Brazilian Monetary Policy Committee) documents. Retrieves relevant chunks via pgvector cosine search and generates answers with Google Gemini.

    Python

  5. copom-streamlit copom-streamlit Public

    Streamlit web app for the COPOM RAG system. Q&A interface over COPOM meeting minutes and policy communications, plus an admin panel to manage document ingestion. Consumes the copom-rag-api REST API.

    Python

  6. copom-vector-pipeline copom-vector-pipeline Public

    ETL pipeline that downloads COPOM meeting minutes and policy communications from the BCB Open Data API, parses PDFs, chunks and embeds text using Google Gemini, and stores vector embeddings in Postโ€ฆ

    Python