tavakohr/README.md

Hamid Tavakoli, MD, MSc

Physician · Clinical Data Scientist · Senior Statistical Programmer
Bridging healthcare and code — R, CDISC, OMOP CDM, and AI-driven pipelines for Real-World Evidence.

🌐 metahealthinfo.com · 💼 LinkedIn


What I build

I'm an MD who programs. I work at the seam between clinical trials, regulatory data standards, and applied AI — turning messy clinical source documents into submission-grade, reproducible R pipelines.

  • 🧬 Clinical R / pharmaverse — moving teams from SAS to admiral, cards, Tplyr, metacore, xportr.
  • 📐 CDISC standards — SDTM, ADaM, and the Analysis Results Standard (ARS) — annotated TLF shells → ARS JSON → ARD.
  • 🏥 OHDSI / OMOP CDM — phenotype curation, MIMIC-IV ETL, LLM-assisted concept-set benchmarking.
  • 🤖 Clinical AI — LLM pipelines for spec generation, metadata enrichment, and phenotype discovery.

📌 Featured work

| Repo | What it is |
| --- | --- |
| pharmaverse-tutorials | 46 interactive learnr tutorials (712 live exercises) for the SAS→pharmaverse transition, on real CDISCPILOT01 data. |
| arsbridge | R package: parse/validate/execute CDISC ARS specs into tidy ARD via {cards}, with multi-LLM metadata enrichment. |
| ars-learnr-tutorial | 7-chapter hands-on course: annotated TLF shells → ARM-TS JSON on pharmaverse datasets. |
| cards-tutorial | 10-chapter {cards}/{cardx} course: ARD, model tidying, ARS JSON mapping. |
| omop-phenotype-pipeline | Benchmarking LLM-assisted OMOP phenotype curation: concept- vs patient-level F1 on MIMIC-IV. |
| precise-X 🔒 (private) | Lead statistical programmer: built and validated the Cox PH + LASSO survival model predicting first severe COPD exacerbation within 5 years from UK primary-care records. Published in Thorax (2025). 📄 Read the paper |

🛠 Stack

R · pharmaverse (admiral, cards, Tplyr, metacore, xportr, teal) · SAS · SQL / PostgreSQL · Python · OMOP CDM · MIMIC-IV · CDISC SDTM / ADaM / ARS · Docker · LLM pipelines (Claude, Gemini)


Open to clinical-R, RWE, and health-AI collaboration. Reach out via any repo discussion or my site.

Pinned

  1. omop-phenotype-pipeline Public

    Benchmarking LLM-assisted OMOP phenotype curation: concept-level vs patient-level F1 on MIMIC-IV, with phenotype-aware methodology improvements.

    R 1

  2. pharmaverse-tutorials Public

    Interactive learnr tutorials for clinical R programmers transitioning from SAS to the pharmaverse ecosystem (dplyr, admiral, sdtm.oak, metacore, xportr, teal and more). Built on real CDISC CDISCPIL…

    R 1

  3. ars-learnr-tutorial Public

    Hands-on learnr tutorial covering CDISC Analysis Results Standard v1.0 — from annotated shells to ARM-TS JSON, with live exercises on real pharmaverse datasets.

    R 1

  4. arsbridge Public

    LLM-assisted conversion of annotated TLF shells + ADaM spec into CDISC ARS v1.0 JSON.

    R 1

  5. cards-tutorial Public

    Interactive learnr tutorial covering cards and cardx R packages, model tidying (broom/parameters), and CDISC ARS validation.

    R

  6. spark-nlp-workshop Public

    Forked from JohnSnowLabs/spark-nlp-workshop

    Public runnable examples of using John Snow Labs' NLP for Apache Spark.

    Jupyter Notebook