kbd0011/cdisc-pilot-replication

cdisc-pilot-replication

End-to-end CDISC SDTM → ADaM → TLF submission package replicating the FDA-reviewed 2007 CDISC Pilot Alzheimer's study (CDISCPILOT01). The ADaM datasets are built in R ({admiral} + pharmaverse), with ADSL also built in parallel in SAS 9.4. PROC COMPARE (R↔SAS equivalence) and Pinnacle 21 Community (ADaMIG 1.3) validation harnesses are wired into the build.

What this demonstrates

  • CDISC ADaM dataset generation - ADSL, ADAE, ADLB, ADTTE, ADQSADAS, built to ADaMIG 1.3 (Pinnacle 21 conformance run pending)
  • Dual implementation of ADSL (R + SAS) with a PROC COMPARE harness for R↔SAS numerical equivalence (criterion=1e-9)
  • Submission-quality TLFs - 4 tables, 2 listings, 2 figures ({rtables}, {tern}, {rlistings}, {survminer}, {ggplot2})
  • MMRM primary efficacy analysis with ICH E9(R1) estimand specification
  • define.xml v2.0 generated from variable-level metadata Excel specs
  • Reproducibility - renv lockfile, GitHub Actions CI, R pipeline end-to-end in ~20 seconds
  • Regulatory writing - mock Statistical Analysis Plan + Analysis Data Reviewer's Guide (PHUSE template)
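A numeric criterion such as 1e-9 is easier to reason about with a concrete check in hand. The Python sketch below is a hypothetical stand-in, not the repository's harness (which is `sas/compare_r_sas.sas`): it compares two records variable by variable and flags numeric values whose symmetric relative difference exceeds 1e-9, treating NaN as a missing value. The relative-difference formula is an assumption of this sketch; PROC COMPARE's actual rule depends on its `METHOD=` option.

```python
"""Hypothetical R-vs-SAS record comparison with a relative tolerance.

Assumption: approximates the spirit of PROC COMPARE with CRITERION=1e-9;
it is not SAS's implementation, and PROC COMPARE's exact rule depends on METHOD=.
"""
import math

CRITERION = 1e-9  # tolerance quoted in this README


def _is_number(value) -> bool:
    """True for int/float values, excluding bools."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def relative_diff(x: float, y: float) -> float:
    """Symmetric relative difference |y - x| / ((|x| + |y|) / 2); 0.0 when both are 0."""
    denom = (abs(x) + abs(y)) / 2
    return 0.0 if denom == 0 else abs(y - x) / denom


def compare_records(base: dict, comp: dict, criterion: float = CRITERION) -> list:
    """List mismatches between two records (variable name -> value)."""
    problems = []
    for var in sorted(base.keys() | comp.keys()):
        if var not in base or var not in comp:
            problems.append(f"{var}: present in only one dataset")
            continue
        x, y = base[var], comp[var]
        if _is_number(x) and _is_number(y):
            x_missing, y_missing = math.isnan(x), math.isnan(y)
            if x_missing or y_missing:
                # A missing numeric (NaN here) must be missing on both sides.
                if x_missing != y_missing:
                    problems.append(f"{var}: missing in one dataset only")
            elif relative_diff(x, y) > criterion:
                problems.append(f"{var}: {x!r} vs {y!r}")
        elif x != y:
            # Character (and mixed-type) values must match exactly.
            problems.append(f"{var}: {x!r} vs {y!r}")
    return problems


# Illustrative values only (not taken from the study data).
r_row = {"USUBJID": "SUBJ-001", "AGE": 63.0, "TRTDURD": 182.0}
sas_row = {"USUBJID": "SUBJ-001", "AGE": 63.0, "TRTDURD": 182.0 + 1e-12}
print(compare_records(r_row, sas_row))  # []
```

A relative rather than absolute tolerance keeps the check meaningful across variables of very different scale, such as ages and lab values.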

Repository contents

| Deliverable | Path | Format / status |
|---|---|---|
| 5 ADaM XPT v5 datasets | `data/xpt/*.xpt` | binary, committed |
| 4 submission tables | `outputs/tlfs/t_*.pdf` | PDF |
| 2 listings | `outputs/tlfs/l_*.pdf` | PDF |
| 2 figures | `outputs/tlfs/f_*.pdf` | PDF |
| Define-XML 2.0 | `outputs/define/define.xml` | XML |
| Statistical Analysis Plan | `docs/sap/sap.pdf` | PDF |
| Analysis Data Reviewer's Guide | `docs/adrg/adrg.pdf` | PDF |
| Pinnacle 21 validation harness | `python/pinnacle21_parser.py`, `outputs/validation/` | wired - run pending |
| PROC COMPARE harness | `sas/compare_r_sas.sas` | wired - run pending |

Pipeline architecture

```
pharmaversesdtm        cdisc-org/sdtm-adam-pilot-project
  (DM, EX, DS, AE, LB)         (QS.xpt, gitignored in data/raw/)
        │                             │
        └──────────────┬──────────────┘
                       ▼
              R/01_adsl.R - R/05_adqsadas.R           sas/adsl.sas
              (admiral pharmaverse build)             (SAS 9.4 parallel build)
                       │                                       │
                       ▼                                       ▼
              data/adam/*.rds                         data/xpt/sas_adsl.xpt
                       │                                       │
                       ▼                                       │
              R/06_export_xpt.R                                │
              (xportr labels/lengths/formats)                  │
                       │                                       │
                       ▼                                       │
              data/xpt/*.xpt ◄──── PROC COMPARE ◄──────────────┘
                       │           (criterion 1e-9)
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
   R/07-09        R/10_define_xml  Pinnacle 21
   TLF gen        define.xml v2.0  ADaMIG 1.3
        │              │              │
        ▼              ▼              ▼
   outputs/tlfs/  outputs/define/ outputs/validation/
                                       │
                                       ▼
                               python/pinnacle21_parser.py
                                       │
                                       ▼
                               README badge / JSON
```
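The architecture diagram is a directed acyclic graph, so whatever drives the build has to run steps in a topological order. As a hedged illustration only (the repository drives the build with `make`, and the step names below are invented labels), the edges can be encoded for Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+), which yields one valid order and raises `graphlib.CycleError` if a dependency loop is introduced. The diagram draws no input arrow into `sas/adsl.sas`, so the sketch leaves that step without dependencies.

```python
"""Hypothetical model of the pipeline diagram as a dependency graph.

Step names are labels invented for this sketch; the repository drives the
real build through `make`, not through this code.
"""
from graphlib import TopologicalSorter

# step -> set of steps it depends on, read off the architecture diagram
PIPELINE = {
    "sdtm_inputs": set(),                        # pharmaversesdtm + QS.xpt
    "r_adam_build": {"sdtm_inputs"},             # R/01_adsl.R - R/05_adqsadas.R
    "sas_adsl": set(),                           # sas/adsl.sas (no input drawn in the diagram)
    "export_xpt": {"r_adam_build"},              # R/06_export_xpt.R
    "proc_compare": {"export_xpt", "sas_adsl"},  # R XPT vs SAS XPT
    "tlfs": {"export_xpt"},                      # R/07-09
    "define_xml": {"export_xpt"},                # R/10_define_xml
    "pinnacle21": {"export_xpt"},                # Pinnacle 21 Community run
    "p21_parser": {"pinnacle21"},                # python/pinnacle21_parser.py
}


def build_order(graph: dict) -> list:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


print(build_order(PIPELINE))
```

The exact order printed is one of several valid ones; what is guaranteed is that every step appears after its inputs, e.g. `proc_compare` after both `export_xpt` and `sas_adsl`.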

Reproduce

Quickstart (clean checkout, ~30 minutes including first-time package install):

```sh
git clone https://github.com/kbd0011/cdisc-pilot-replication.git
cd cdisc-pilot-replication

# 1. Restore the exact R package versions pinned in renv.lock
Rscript -e 'renv::restore()'

# 2. Install the Python dependencies for the validators
pip install -r requirements.txt

# 3. Download external SDTM (qs.xpt, ~33 MB)
make fetch-raw

# 4. Run the whole pipeline
make all
```

make all runs three phases:

```sh
make pipeline   # R/00_setup → 10_define_xml (~20s, all tests green)
make validate   # Python validators + XPT round-trip + P21 parse
make docs       # Quarto renders SAP + ADRG to PDF
```

The SAS parallel build is run manually on SAS OnDemand for Academics - see sas/README.md.

Current pipeline metrics

| Dataset | Rows | Vars | XPT size |
|---|---:|---:|---:|
| ADSL | 306 | 39 | 254 KB |
| ADAE | 1,191 (1,122 treatment-emergent) | 31 | 1.3 MB |
| ADLB | 58,700 | 28 | 41 MB |
| ADTTE | 254 (92 events) | 17 | 154 KB |
| ADQSADAS | 12,241 | 26 | 8.2 MB |
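The Vars column sums to 39 + 31 + 28 + 17 + 26 = 141, the same variable count the Layout section gives for the Excel specs in `metadata/`. A small hypothetical cross-check like the one below (not part of the repository; figures copied from this README) could flag drift between specs and built datasets, and it derives two proportions from the table.

```python
"""Cross-check the metrics table against the metadata spec count (hypothetical helper).

Figures are copied from this README; the script is not part of the repository.
"""
# dataset -> (rows, variables), from the "Current pipeline metrics" table
METRICS = {
    "ADSL": (306, 39),
    "ADAE": (1191, 31),
    "ADLB": (58700, 28),
    "ADTTE": (254, 17),
    "ADQSADAS": (12241, 26),
}
SPEC_VARIABLES = 141  # variables listed for metadata/ in the Layout section

total_vars = sum(n_vars for _rows, n_vars in METRICS.values())
assert total_vars == SPEC_VARIABLES, f"specs list {SPEC_VARIABLES}, datasets have {total_vars}"

te_share = 1122 / 1191    # treatment-emergent share of ADAE records
event_share = 92 / 254    # share of ADTTE rows with an event
print(f"{total_vars} variables; treatment-emergent share {te_share:.1%}; "
      f"ADTTE event share {event_share:.1%}")
# 141 variables; treatment-emergent share 94.2%; ADTTE event share 36.2%
```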
| Validation | Status |
|---|---|
| `xpt::xpt_validate()` on all 5 | OK |
| XPT vs spec round-trip (`python/xpt_metadata_check.py`) | OK - all datasets pass |
| define.xml well-formedness (xmllint, `validate_define.py`) | OK |
| testthat suite | 745/745 expectations pass (90 tests across 5 ADaM datasets + XPT export) |
| Pinnacle 21 ADaMIG 1.3 | Pending P21 run - see `outputs/validation/README.md` |
| SAS PROC COMPARE | Pending ODA run - see `sas/README.md` |
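The define.xml row relies on xmllint and `validate_define.py`. As a stdlib-only sketch under stated assumptions (the ODM 1.3 default namespace that Define-XML 2.0 builds on; function names are hypothetical), the same well-formedness check can be done with `xml.etree.ElementTree`, plus a count of `ItemRef` elements per `ItemGroupDef`. For a define.xml generated from the same specs, those counts would be expected to match the Vars column of the metrics table. This is not schema validation against the Define-XML XSD.

```python
"""Hypothetical well-formedness + structure check for define.xml (stdlib only).

Assumption: the ODM 1.3 default namespace used by Define-XML 2.0. This parses
the file and counts ItemRefs per dataset; it does not validate against the XSD.
"""
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML (roughly what `xmllint --noout` checks)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def itemrefs_per_dataset(xml_text: str) -> dict:
    """Map each ItemGroupDef Name (dataset) to its number of ItemRef children (variables)."""
    root = ET.fromstring(xml_text)
    groups = root.iter(f"{{{ODM_NS}}}ItemGroupDef")
    return {g.get("Name"): len(g.findall(f"{{{ODM_NS}}}ItemRef")) for g in groups}


# Tiny illustrative fragment, not the repository's define.xml.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <Study OID="S.DEMO">
    <MetaDataVersion OID="MDV.DEMO" Name="Demo" def:DefineVersion="2.0.0">
      <ItemGroupDef OID="IG.ADSL" Name="ADSL">
        <ItemRef ItemOID="IT.ADSL.USUBJID" Mandatory="Yes"/>
        <ItemRef ItemOID="IT.ADSL.AGE" Mandatory="No"/>
      </ItemGroupDef>
    </MetaDataVersion>
  </Study>
</ODM>"""

print(is_well_formed(SAMPLE), itemrefs_per_dataset(SAMPLE))  # True {'ADSL': 2}
```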

Documentation

Layout

```
cdisc-pilot-replication/
├── R/                      pipeline scripts (00_setup → 10_define_xml + run_all)
├── sas/                    parallel ADSL + PROC COMPARE
├── python/                 define.xml / XPT / P21 validators
├── metadata/               variable-level Excel specs (141 vars across 5 datasets)
├── data/
│   ├── raw/                external SDTM (qs.xpt) - gitignored
│   ├── adam/               R-built ADaM .rds - gitignored
│   └── xpt/                XPT v5 submission deliverables
├── outputs/
│   ├── tlfs/               4 tables, 2 listings, 2 figures
│   ├── validation/         P21 + PROC COMPARE + JSON summaries
│   └── define/             define.xml + (user-supplied) define2-0-0.{xsd,xsl}
├── docs/
│   ├── sap/sap.pdf
│   └── adrg/adrg.pdf
└── tests/                  testthat (90 tests, 745 expectations)
```
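The Excel specs in `metadata/` drive the XPT export, and SAS transport (XPT) v5 imposes hard limits: variable names of at most 8 characters, variable labels of at most 40, and character values of at most 200 bytes. The sketch below is a hypothetical pre-export check on spec rows. The `SpecRow` fields are invented for illustration and do not reflect the actual spec columns, and `python/xpt_metadata_check.py` may work differently.

```python
"""Hypothetical pre-export check of spec rows against SAS transport (XPT) v5 limits.

Assumption: spec rows have already been read from the Excel specs into SpecRow
objects; the field names below are invented for this sketch.
"""
import re
from dataclasses import dataclass

XPT_V5_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")  # 1-8 chars, SAS-style name
MAX_LABEL = 40       # XPT v5 variable label limit
MAX_CHAR_LEN = 200   # XPT v5 character variable length limit


@dataclass
class SpecRow:
    dataset: str
    variable: str
    label: str
    data_type: str   # "text" or "float" (hypothetical coding)
    length: int


def xpt_v5_issues(rows) -> list:
    """Return one message per violated limit, in row order."""
    issues = []
    for row in rows:
        where = f"{row.dataset}.{row.variable}"
        if not XPT_V5_NAME.fullmatch(row.variable):
            issues.append(f"{where}: not a valid 1-8 character XPT v5 name")
        if len(row.label) > MAX_LABEL:
            issues.append(f"{where}: label longer than {MAX_LABEL} characters")
        if row.data_type == "text" and row.length > MAX_CHAR_LEN:
            issues.append(f"{where}: character length {row.length} exceeds {MAX_CHAR_LEN}")
    return issues


rows = [
    SpecRow("ADSL", "USUBJID", "Unique Subject Identifier", "text", 20),
    SpecRow("ADSL", "TRTDURATION", "Duration of Treatment (days)", "float", 8),
]
print(xpt_v5_issues(rows))  # ['ADSL.TRTDURATION: not a valid 1-8 character XPT v5 name']
```

Checking the specs before export means a violation surfaces as a readable message tied to a dataset and variable, rather than as a failure inside the XPT writer.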

License

MIT - see LICENSE.

About

End-to-end CDISC SDTM→ADaM→TLF replication of the FDA CDISC Pilot study - parallel R (admiral/pharmaverse) + SAS 9.4, define.xml v2.0, reproducible in ~20s behind a 745-expectation testthat suite.
