# 💡 Poisson Process AutoDecoder (PPAD) for Signal Detection in Event Reporting Systems

A deep learning approach for unsupervised representation learning and anomaly detection in safety event data streams, suited to post-market surveillance (PMS) of adverse drug reactions (ADRs) and medical device incidents.

This project provides a Python implementation of the Poisson Process AutoDecoder (PPAD) model [https://arxiv.org/abs/2502.01627], adapted for Spontaneous Reporting System (SRS) data such as FAERS or VAERS. It includes modules for data simulation, data loading, and the core PPAD model, along with demonstration notebooks.
## 🎯 Project Overview

The Poisson Process AutoDecoder (PPAD) is a neural field method for modeling and encoding the non-homogeneous (time-varying) rates of Poisson processes. Because the arrival of safety reports over time is naturally modeled as a Poisson process, PPAD offers a principled way to analyze and compress this data.
This repository demonstrates how PPAD can turn complex event histories into actionable intelligence:

- **Encode** a product's entire reporting history into a fixed-length vector, a "risk fingerprint."
- **Detect** subtle or complex deviations (signals) in the event rate that simple count-based methods often miss.
- **Cluster** similar products by the shape of their underlying risk-over-time profiles.
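As background on the underlying process model: a non-homogeneous Poisson process can be simulated by thinning (the Lewis–Shedler algorithm). The sketch below is purely illustrative, with an assumed Gaussian-bump rate function; it is not the API of the repo's `simulation.py`.

```python
import numpy as np

def simulate_nhpp(rate_fn, t_max, lam_max, rng=None):
    """Simulate event times of a non-homogeneous Poisson process on [0, t_max]
    by thinning: draw candidate arrivals from a homogeneous process at rate
    lam_max (an upper bound on rate_fn), then keep each candidate at time t
    with probability rate_fn(t) / lam_max."""
    rng = np.random.default_rng(rng)
    t, events = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)  # next candidate arrival
        if t > t_max:
            break
        if rng.uniform() < rate_fn(t) / lam_max:
            events.append(t)
    return np.array(events)

# Illustrative rate with a mid-window surge (a crude "signal"); peak value is 5,
# so lam_max = 5 is a valid upper bound.
rate = lambda t: 1.0 + 4.0 * np.exp(-((t - 5.0) ** 2))
times = simulate_nhpp(rate, t_max=10.0, lam_max=5.0, rng=0)
```

Thinning is exact as long as `lam_max` truly bounds the rate function over the window; histories like `times` are the kind of input a PPAD-style model consumes.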
## 🧠 PPAD Methodology: Modeling the Continuous Rate

PPAD is an autodecoder trained in an unsupervised framework, so the model's output is optimized directly against the actual recorded event times:

- **Latent vector (z):** every event history is assigned a fixed-length, low-dimensional vector z, which serves as its compressed representation and risk fingerprint.
- **Neural field decoder (f):** a shared neural network f takes the latent vector z and a time input t and outputs the expected continuous Poisson rate for that product at that time:

  λ(t) = f(z, t)

- **Optimization objective:** the model minimizes the negative log-likelihood of the observed event times under the predicted rate function λ(t). Minimizing this loss across a large dataset makes the latent space Z group product histories with similar event-arrival patterns.
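To make the objective concrete, the negative log-likelihood of a Poisson process with rate λ(t) over a window [0, T] is NLL = ∫₀ᵀ λ(t) dt − Σᵢ log λ(tᵢ). Below is a minimal NumPy sketch of this quantity for an arbitrary rate function; the function name is illustrative, and the repo's `losses.py` implements its own version (PPAD additionally parameterizes λ as f(z, t) and backpropagates through both z and f).

```python
import numpy as np

def poisson_process_nll(rate_fn, event_times, t_max, n_quad=256):
    """Negative log-likelihood of observed event times under rate lambda(t):
    NLL = integral_0^T lambda(t) dt - sum_i log lambda(t_i).
    The integral is approximated by trapezoidal quadrature."""
    grid = np.linspace(0.0, t_max, n_quad)
    integral = np.trapz(rate_fn(grid), grid)
    log_term = np.sum(np.log(rate_fn(np.asarray(event_times)) + 1e-12))
    return integral - log_term

# Sanity check: for a constant rate lam on [0, T] with n events,
# NLL = lam * T - n * log(lam), minimized at the MLE lam = n / T.
nll = poisson_process_nll(lambda t: 2.0 * np.ones_like(t), [1.0, 2.0, 3.0], t_max=5.0)
# nll ≈ 2.0 * 5.0 - 3 * log(2.0) ≈ 7.92
```

The integral term penalizes predicting high rates where nothing happened, while the log term rewards high rates exactly at the observed event times.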
## 🚨 Signal Detection and Anomaly Analysis

PPAD's power lies in its ability to quantify how well new data fits the learned "normal" risk space.
### Anomaly Scoring for Batch Analysis

When a new batch of reports arrives (e.g., as part of a scheduled reporting cycle):

1. The core network weights f are frozen.
2. A new latent vector z_new is optimized to represent the new, updated event history.
3. The final loss value (negative log-likelihood) for that optimized z_new is the **anomaly score**.
**Signal trigger:** an unusually high anomaly score means the product's updated history is a poor fit for the patterns the model has learned, i.e., the rate function λ(t) has undergone a significant, uncharacteristic shift. This is what the included demonstration verifies: an injected surge of reports still produces a high loss after optimizing z_new, confirming a strong anomaly signal.
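The batch-scoring procedure can be sketched as follows. Here a simple two-parameter rate family stands in for the frozen neural decoder, and a coarse grid search stands in for gradient-based optimization of z_new; all names, the rate family, and the candidate grid are illustrative assumptions, not the repo's implementation.

```python
import numpy as np

def nll(rate_fn, times, t_max, n_quad=256):
    """Poisson-process NLL: integral of the rate minus sum of log-rates."""
    grid = np.linspace(0.0, t_max, n_quad)
    return np.trapz(rate_fn(grid), grid) - np.sum(np.log(rate_fn(np.asarray(times)) + 1e-12))

def anomaly_score(frozen_decoder, times, t_max, z_grid):
    """With decoder weights frozen, fit a new latent z to the updated
    history and return the best (lowest) NLL as the anomaly score."""
    scores = [nll(lambda t: frozen_decoder(z, t), times, t_max) for z in z_grid]
    best = int(np.argmin(scores))
    return scores[best], z_grid[best]

# Toy "frozen decoder": z = (base_rate, spike_height), bump centered at t = 5.
def decoder(z, t):
    base, spike = z
    return base + spike * np.exp(-((t - 5.0) ** 2))

z_grid = [(b, s) for b in (0.5, 1.0, 2.0) for s in (0.0, 2.0)]
steady = np.linspace(0.5, 9.5, 10)  # ~1 report per time unit, no surge
score, z_new = anomaly_score(decoder, steady, 10.0, z_grid)
```

A history the rate family explains well yields a low score (the steady history above is best fit by a constant rate of 1 with no spike); a surge the family cannot reproduce leaves the score high, which is the signal trigger.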
### Risk Space Clustering

The resulting latent vectors z for all products support deeper structural analysis:
- **Pattern-based classification:** analysts can visualize and cluster products by the shape of their reporting trends (e.g., all products with a sudden transient spike versus all products with a slow, logarithmic increase).
- **Identification of novelty:** a product whose z vector falls outside the established clusters in the latent space is flagged as a novel risk profile, helping to focus safety-investigation resources.
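A minimal sketch of the novelty check, assuming cluster centroids have already been estimated (e.g., by running k-means over the learned latent vectors); the function name, threshold, and all values here are illustrative.

```python
import numpy as np

def flag_novel(latents, centroids, threshold):
    """Flag products whose latent vector z sits far (Euclidean distance)
    from every established cluster centroid in the risk space."""
    d = np.linalg.norm(latents[:, None, :] - centroids[None, :, :], axis=-1)
    nearest = d.min(axis=1)  # distance to the closest centroid
    return nearest > threshold, nearest

# Toy latent space: two learned clusters plus one outlier product.
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
z = np.array([[0.1, -0.2], [4.8, 5.1], [9.0, -9.0]])
novel, dist = flag_novel(z, centroids, threshold=2.0)
# novel -> [False, False, True]
```

Products flagged `True` fall outside every known cluster and would be routed to a focused safety review.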
## 🛠️ Implementation Note: Operational Fit

PPAD fits naturally into systems that rely on scheduled batch processing (e.g., weekly, monthly, or quarterly reviews). While the method is highly sensitive to changes, optimizing a new latent vector z for each update has a nontrivial computational cost, so PPAD is best used for in-depth anomaly fingerprinting rather than instantaneous, millisecond-latency monitoring.
The project is organized into the following directories:
- `ppad_lib/`: a Python library containing the core logic.
  - `model.py`: the `PPADDecoder` model implementation.
  - `losses.py`: custom loss functions (Poisson NLL, TV penalty).
  - `simulation.py`: functions for generating synthetic SRS data.
  - `data_loader.py`: functions for loading and parsing data from CSV files.
  - `utils.py`: utility functions such as positional encoding.
- `notebooks/`: Jupyter notebooks demonstrating the library.
  - `demo.ipynb`: a basic demo of the core components.
  - `anomaly_detection_demo.ipynb`: an advanced, end-to-end workflow for training the model and detecting anomalies.
- `data/`: sample data files.
- `*.py` (root level): test scripts.
Follow these steps to set up and run the project.

First, clone the repository and install the required Python packages from `requirements.txt`:

```bash
# It is recommended to use a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
pip install -r requirements.txt
```

This project includes two main demonstrations.
To explore the basic components of the library, use the `demo.ipynb` notebook. First, launch Jupyter:

```bash
jupyter notebook
```

Then navigate to `notebooks/demo.ipynb` and run the cells.
To run the full, end-to-end anomaly detection workflow, execute the notebook runner script from the root directory:

```bash
python run_anomaly_demo.py
```

This script programmatically executes the `anomaly_detection_demo.ipynb` notebook, which trains the model, detects anomalies, and saves the following files to the `results/` directory:

- `anomaly_detection_summary.json`: the numerical results.
- `anomaly_detection_plot.png`: a plot visualizing the detected anomaly.
- `plot_interpretation.txt`: a text file explaining the results.
The `notebooks/anomaly_detection_demo.ipynb` notebook itself contains the full implementation and detailed explanations of each step.
To verify that the environment is set up correctly and the code is functional, run the included test scripts from the root directory:

```bash
# Run the core logic test
python run_demo_test.py

# Run the data loader unit tests
python test_data_loader.py

# Run the simulation unit tests
python test_simulation.py
```

A successful run should complete without errors.