📄 parkinsons_spatial_transcriptomics_research_paper.pdf
Raw data directory containing the files downloaded from GEO: GSE253975.
Folder containing the pretrained CellPLM checkpoint weights: CellPLM Checkpoints (Dropbox).
Preprocessed AnnData object saved after running the preprocessing notebook (loading/merging counts, filtering, and CP10K + log-normalization).
Loads the raw Visium .txt.gz count matrices in GSE253975_data, inspects them, builds a combined AnnData with sample_id/batch metadata, filters zero-count genes, normalizes to CP10K + log1p, and saves the reusable adata_preprocessed.h5ad (with a small matrix preview at the end).
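As a rough illustration of the filtering and CP10K + log1p steps described above, here is a minimal NumPy sketch. The function name `filter_and_normalize` and the toy matrix are hypothetical and not taken from the notebook; if the notebook uses scanpy, the equivalent calls would presumably be `sc.pp.filter_genes`, `sc.pp.normalize_total(target_sum=1e4)`, and `sc.pp.log1p`, but that is an assumption.

```python
import numpy as np

def filter_and_normalize(counts, target_sum=1e4):
    """Drop zero-count genes, scale each spot to target_sum counts, then log1p.

    counts: (n_spots, n_genes) array of raw counts (hypothetical input layout).
    Returns the normalized matrix and a boolean mask of the genes that were kept.
    """
    counts = np.asarray(counts, dtype=float)
    keep = counts.sum(axis=0) > 0            # genes detected in at least one spot
    kept = counts[:, keep]
    totals = kept.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0                # guard against empty spots
    cp10k = kept / totals * target_sum       # counts per 10,000
    return np.log1p(cp10k), keep

# Toy example: 3 spots x 3 genes, the middle gene is never detected.
raw = np.array([[1, 0, 3],
                [0, 0, 5],
                [2, 0, 2]])
norm, keep = filter_and_normalize(raw)
```

After this step every spot sums to 10,000 on the un-logged scale, so spots with different sequencing depth become comparable.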
Starts from adata_preprocessed.h5ad, selects the top 2,000 highly variable genes (HVGs), scales, runs PCA, corrects batch effects with Harmony, builds a neighbor graph and UMAP, and clusters with K-Means. It then computes silhouette, compactness, and batch-mixing metrics to compare the PCA and Harmony embeddings.
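The exact batch-mixing metric used in the notebook is not spelled out here. One common choice, sketched below as an assumption, is the average fraction of each spot's k nearest neighbors that come from a different batch. The function name `batch_mixing_score` and the default `k=15` are hypothetical.

```python
import numpy as np

def batch_mixing_score(embedding, batches, k=15):
    """Mean fraction of k nearest neighbors that belong to a different batch.

    0.0 means batches are fully separated; higher values mean more mixing.
    This is an illustrative definition, not necessarily the notebook's.
    """
    X = np.asarray(embedding, dtype=float)
    batches = np.asarray(batches)
    sq = (X ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                     # exclude each point itself
    neighbors = np.argsort(dist, axis=1)[:, :k]
    other_batch = batches[neighbors] != batches[:, None]
    return float(other_batch.mean())

# Toy comparison: two well-separated batches vs. two overlapping batches.
rng = np.random.default_rng(0)
labels = np.array([0] * 10 + [1] * 10)
separated = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(100, 1, (10, 2))])
mixed = rng.normal(0, 1, (20, 2))
separated_score = batch_mixing_score(separated, labels, k=3)
mixed_score = batch_mixing_score(mixed, labels, k=3)
```

Computing the same score on the PCA and Harmony embeddings gives a direct before/after comparison: a higher score after Harmony suggests the batches are better integrated.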
Also loads adata_preprocessed.h5ad but replaces PCA with 512-dim CellPLM embeddings (CPU fallback). Uses those embeddings for neighbors, UMAP, K-Means clustering, and silhouette/compactness/batch-mixing metrics.
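Because the metrics only need an embedding matrix and cluster labels, the same code can score either the PCA/Harmony space or the 512-dim CellPLM embeddings. The sketch below shows one plausible definition of compactness (mean squared distance of each spot to its cluster centroid). This definition and the name `cluster_compactness` are assumptions, not confirmed by the notebook.

```python
import numpy as np

def cluster_compactness(embedding, labels):
    """Mean squared distance from each point to its own cluster centroid.

    Lower values mean tighter clusters. Works for any embedding width,
    e.g. PCA components or a 512-dim learned embedding.
    """
    X = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for lab in np.unique(labels):
        members = X[labels == lab]
        centroid = members.mean(axis=0)
        total += ((members - centroid) ** 2).sum()
    return total / len(X)

# Toy example with two tight clusters in 2D.
points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
clusters = np.array([0, 0, 1, 1])
compactness = cluster_compactness(points, clusters)

# The same call accepts a wider matrix, here a random stand-in for 512-dim embeddings.
stand_in = np.random.default_rng(0).normal(size=(6, 512))
stand_in_compactness = cluster_compactness(stand_in, np.array([0, 0, 0, 1, 1, 1]))
```

Note that raw compactness values are not comparable across embeddings of different dimensionality or scale, so they are best read alongside the silhouette score.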
- Install the required Python packages listed in `requirements.txt`.
- Make sure the `GSE253975_data/` directory and the `ckpt/` checkpoint folder are in place.
- Run `preprocessing.ipynb` to load the GEO files and generate the unified, normalized `adata_preprocessed.h5ad`.
- Run `parkinsons_analysis.ipynb` and `parkinsons_analysis_cellplm.ipynb` to perform their respective analyses and automatically generate figures in the `figures/` folder.
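Before running the notebooks, it can help to confirm that the expected inputs are present. The helper below is a small sketch, not part of the repository: the name `missing_inputs` is hypothetical, and it assumes the paths listed above sit directly under the project root.

```python
from pathlib import Path

# Inputs named in the steps above, assumed to live in the project root.
REQUIRED = ["requirements.txt", "GSE253975_data", "ckpt"]

def missing_inputs(root, required=REQUIRED):
    """Return the entries in `required` that do not exist under `root`."""
    root = Path(root)
    return [name for name in required if not (root / name).exists()]

# Report anything missing relative to the current working directory.
missing = missing_inputs(Path("."))
if missing:
    print("Missing before running the notebooks:", ", ".join(missing))
else:
    print("All required inputs found.")
```

An empty list means the data directory, checkpoint folder, and requirements file are all in place.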