Organizations

@brainsfeed @re-sources-io

irfanalidv/README.md

Irfan Ali

Generative AI Engineer · LLMs · RAG · Agentic Pipelines

I build AI systems that work in production, not just demos. I have 7+ years of experience across data engineering, NLP, and LLM systems. I've shipped production AI at a Schneider Electric company, built the entire AI intelligence layer at a Hong Kong AI startup, and founded DataCortex IQ to deliver AI systems for clients globally.

LinkedIn · PyPI · Website · Email


What I actually work on

Most AI projects fail at the same places: retrieval breaks under real data, agents loop forever, pipelines that passed evals fail in production, and costs explode at scale. Those are the problems I solve.

My work sits at the intersection of LLM orchestration, agentic pipelines, and production data engineering — building the full system from ingestion to deployment, not just the model layer.

Recent systems I've shipped:

  • Voice AI platform with real-time STT/TTS, LLM reasoning, structured extraction, and post-call analytics (Reflecta)
  • Autonomous multi-channel lead intelligence system with agent-driven pipelines, multi-provider enrichment, and RAG-style web extraction (Kuration AI)
  • LLM-powered dealer assistant with domain fine-tuned GPT model, multi-chain LangChain pipeline, and load calculation logic (Luminous Power Technologies / Schneider Electric)

PyPI packages

Libraries I maintain for AI infrastructure, retrieval, and data systems — used by developers in production.

| Package | What it does |
| --- | --- |
| agentensemble | Multi-agent orchestration: ReAct, Swarm, Pipeline, Debate, and WorkflowGraph patterns with routing, planning, RAG, and cost tracking |
| ragfallback | Stops RAG from failing silently: query rewriting, retrieval confidence scoring, fallback strategies, retry logic |
| ragnav | Navigation-first RAG for long documents: routes queries to the right pages, follows cross-references, retrieves coherent evidence |
| scrapeflow-py | Production Playwright scraping: LLM extraction, hybrid selectors, session persistence, rate limiting, anti-detection |
| agentcare | Voice AI for healthcare: call intake, structured extraction, missing-data recovery, appointment orchestration, post-call analytics |
| askpandas | Natural language queries on CSV data using local LLMs: no API keys, no data leaves your machine |
| lingo-nlp-toolkit | Lightweight NLP toolkit bridging traditional pipelines and transformer-ready workflows |
| pyrochain | Agentic feature engineering: PyTorch + LangChain agents for multimodal feature extraction |
| toxic-comment-classifier | Deep learning toxicity detection: obscene language, threats, insults, and identity hate, with per-category scores |

→ All packages on PyPI
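
As a rough illustration of the retrieval-fallback idea listed for ragfallback above (confidence scoring, query rewriting, retry), here is a minimal, library-agnostic sketch. It does not use or reproduce the ragfallback API: `retrieve_with_fallback`, `toy_retrieve`, `expand_abbreviations`, and the 0.6 threshold are all hypothetical names and values chosen for this example.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (document text, confidence score in [0, 1])


def retrieve_with_fallback(
    query: str,
    retrieve: Callable[[str], List[Hit]],
    rewrites: List[Callable[[str], str]],
    min_confidence: float = 0.6,  # assumed threshold, tune per retriever
) -> Tuple[str, List[Hit], bool]:
    """Try the original query, then each rewrite, until the top hit is confident.

    Returns (query_used, hits_sorted_by_score, confident).
    """
    best: Tuple[str, List[Hit], float] = (query, [], -1.0)
    attempts = [lambda q: q] + list(rewrites)  # first attempt is the raw query
    for rewrite in attempts:
        candidate = rewrite(query)
        hits = sorted(retrieve(candidate), key=lambda h: h[1], reverse=True)
        top = hits[0][1] if hits else 0.0
        if top >= min_confidence:
            return candidate, hits, True
        if top > best[2]:
            best = (candidate, hits, top)
    # Nothing cleared the threshold: return the best attempt, flagged as
    # low confidence so the caller can refuse to answer instead of guessing.
    return best[0], best[1], False


# Toy corpus and retriever: the score is the fraction of query words
# that appear in the document.
DOCS = [
    "reset your password from the account settings page",
    "invoices are emailed on the first of each month",
]
ABBREVIATIONS = {"pwd": "password"}


def toy_retrieve(q: str) -> List[Hit]:
    q_words = set(q.lower().split())
    hits = []
    for doc in DOCS:
        overlap = len(q_words & set(doc.split())) / len(q_words) if q_words else 0.0
        hits.append((doc, overlap))
    return hits


def expand_abbreviations(q: str) -> str:
    return " ".join(ABBREVIATIONS.get(w, w) for w in q.lower().split())


if __name__ == "__main__":
    print(retrieve_with_fallback("pwd reset", toy_retrieve, [expand_abbreviations]))
```

The point of returning an explicit `confident` flag is that a low-confidence retrieval becomes visible to the caller rather than flowing silently into generation.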


Stack

LLMs          GPT-4o · Claude · Gemini · Mistral · Ollama (local)
Orchestration LangChain · LangGraph · custom agent frameworks
RAG           hybrid BM25 + embeddings · reranking · fallback strategies
Backend       Python · FastAPI · async pipelines · queue-driven systems
Scraping      Playwright · Selenium · Firecrawl · ZenRows
Databases     MongoDB · PostgreSQL · vector stores
Infra         Docker · Azure · Azure ML · Azure DevOps · GCP
Data          Pandas · ETL pipelines · structured extraction · NLP

Where I've worked

| Company | Role | What I built |
| --- | --- | --- |
| Kuration AI · Hong Kong | Founding AI Engineer | Entire AI intelligence layer: agent pipelines, multi-provider enrichment, RAG-style web extraction, LLM orchestration |
| Luminous Power Technologies · Schneider Electric | Senior Manager, Data & Analytics, R&D | LLM dealer assistant (fine-tuned GPT), R&D intelligence dashboard, GenAI data platform on Azure |
| Lynk · India | Data Analytics & Automation | Analytics pipelines, NLP-powered expert matchmaking, decision-ready data workflows |
| brainsfeed · Hong Kong | Head of Data & Analytics | Built Infosphere from scratch: NLP enrichment platform with 15+ attribute extraction and natural-language search |

Research

  • Multi-Aspect Temporal Topic Evolution with Neural-Symbolic Fusion and Information Extraction for Yelp Review Analysis. Indian Journal of Artificial Intelligence and Neural Networking (IJAINN), Oct 2025. DOI
  • Advanced Cross-Validation Framework for Mental Health AI: BERT and Neural Networks Achieve High Accuracy on MentalChat16K. IJAINN, Dec 2025. DOI

Currently

  • Pursuing M.Sc. Data Science & AI at IISER Tirupati (Institute of National Importance, Ministry of Education, Govt. of India) — GPA 8.0/10
  • Building Reflecta — continuity-first mental wellness platform with voice AI
  • Running DataCortex IQ — available for AI engineering contracts and consulting
  • Open to full-time Generative AI / Agentic AI Engineering roles (remote preferred)


Building at the intersection of LLMs, agentic systems, and production data engineering. India · Previously: Hong Kong · France · US

Pinned

  1. ragfallback

    ragfallback is a Python library that prevents silent RAG failures — chunk quality, retrieval fallback, adaptive querying, and answer evaluation in one package.

    Python 1

  2. AgentEnsemble

    AgentEnsemble is production-ready multi-agent orchestration for Python. ReAct, Swarm, Pipeline, Debate, Router, Planner, WorkflowGraph. Observability, cost tracking, human-in-loop. Structured out…

    Python 1

  3. AskPandas

    AI-powered data engineering and analytics assistant for querying CSV data using natural language—locally and intelligently

    Python 1

  4. scrapeflow-py

    Production-ready web scraping engine on Playwright. LLM extraction, hybrid selectors, session persistence, rate limiting, anti-detection.

    Python 1

  5. AgentCare

    AgentCare is a library-first Python framework for building voice AI workflows: call intake, extraction, missing-data recovery, appointment orchestration, confirmation messaging, and operations anal…

    Python 1

  6. RAGNav

    RAGNav provides hybrid RAG retrieval: BM25 + embeddings + document graph. Runs fully offline. SQuAD R@3: 0.956, zero API calls.

    Python 1