Description
Implement:
- **Self‑Supervised Contrastive Pretraining**: Implement a built‑in contrastive learning stage inspired by ReConTab, in which an asymmetric autoencoder with regularization selects salient features and a contrastive loss distills robust, invariant embeddings (arXiv).
- **Masked Feature Prediction Pretraining**: Offer a masked‑attribute prediction task akin to TabTransformer’s masked language modelling: random features are masked and the model is trained to reconstruct them, contextualizing embeddings via intra‑row dependencies (ar5iv).
- **Tree‑Regularized Embedding Layer**: Provide a supervised, tree‑regularized embedding layer (both Tree‑to‑Vector and Tree‑to‑Token) that binarizes inputs via pretrained tree‑ensemble splits and generates embeddings capturing hierarchical, rule‑based structure (arXiv).
- **Random Fourier Feature Preprocessor**: Incorporate a Random Fourier Feature module that projects numeric inputs into a fixed high‑frequency basis (sin/cos of random projections), improving conditioning and convergence without adding learned parameters (arXiv).
- **Periodic and PLE Numeric Embeddings**: Add numeric embedding functions using periodic (sin/cos) expansions and piecewise linear encodings (PLE), which empirically close the gap between MLPs/Transformers and tree‑based baselines on tabular tasks (arXiv).
- **Entity Embeddings for Categorical Variables**: Support an entity embedding API that maps category indices to dense vectors, leveraging the classic approach that clusters similar categories in latent space and reduces overfitting for high‑cardinality features (arXiv).
- **Semantic Feature Enrichment via Pretrained Word Embeddings**: Enable optional semantic text embedding lookup for descriptive categorical fields, pulling in pretrained Word2Vec or GloVe vectors to infuse domain semantics into KDP’s categorical pipelines (MachineLearningMastery.com).
- **Multi‑Grained Categorical Embeddings**: Implement an end‑to‑end multi‑grained embedding layer that hierarchically encodes category subsets (e.g., via decision‑forest splits) to capture multi‑resolution feature granularity (ScienceDirect).
- **Library of Self‑Supervised Tabular SSL Tasks**: Bundle a suite of state‑of‑the‑art tabular self‑supervised objectives (SCARF, SAINT, SubTab, XTab, etc.) as configurable pretraining strategies directly within KDP (GitHub).
- **Multi‑Scale Fourier Feature Embedding**: Incorporate a multiscale FourierFeatureEmbedding layer supporting user‑specified sigma scales, enabling simultaneous capture of both low‑ and high‑frequency numeric patterns (mathlab.github.io).
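To make the masked‑attribute task concrete, here is a minimal numpy sketch of the corruption step: random entries are masked out and the model would be trained to reconstruct them. `mask_features` is a hypothetical helper, not an existing KDP API, and zero is just one possible mask token.

```python
import numpy as np

def mask_features(X, mask_prob=0.15, rng=None):
    """Randomly mask entries of X for masked-attribute prediction.

    Returns the corrupted matrix and the boolean mask; a model would be
    trained to reconstruct X at the masked positions. (Illustrative
    sketch, not an existing KDP API.)
    """
    if rng is None:
        rng = np.random.default_rng(0)
    mask = rng.random(X.shape) < mask_prob
    X_corrupt = np.where(mask, 0.0, X)  # 0.0 acts as the mask token here
    return X_corrupt, mask

X = np.arange(12, dtype=float).reshape(3, 4)
Xc, m = mask_features(X, mask_prob=0.5)
```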
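The Tree‑to‑Vector idea can be sketched as follows: each (feature, threshold) split harvested from a pretrained ensemble becomes one binary indicator, and the resulting 0/1 vector is what a downstream embedding would consume. The `splits` here are illustrative constants, not output from a real trained ensemble.

```python
import numpy as np

def tree_binarize(X, splits):
    """Binarize rows of X using (feature_index, threshold) split pairs.

    In the real layer the splits would be read off a pretrained tree
    ensemble; here they are hand-picked for illustration.
    """
    return np.array([[1.0 if row[f] > t else 0.0 for f, t in splits]
                     for row in X])

# Hypothetical splits from two shallow trees: feature 0 > 0.5, feature 1 > 2.0.
splits = [(0, 0.5), (1, 2.0)]
X = np.array([[0.2, 3.0],
              [0.9, 1.0]])
B = tree_binarize(X, splits)
```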
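The Random Fourier Feature preprocessor amounts to one frozen random projection followed by sin/cos, so it adds no trainable parameters. A minimal sketch (assuming the Gaussian‑kernel convention where the projection is scaled by 1/sigma):

```python
import numpy as np

def random_fourier_features(X, n_features=16, sigma=1.0, seed=0):
    """Map X through sin/cos of a fixed random projection.

    W is sampled once and frozen, so the transform has no learned
    parameters; sigma controls the kernel bandwidth (1/sigma scaling
    follows the classic Gaussian-kernel RFF convention).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], n_features))
    proj = X @ W
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

X = np.random.default_rng(1).normal(size=(5, 3))
Z = random_fourier_features(X, n_features=16)
```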
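Piecewise linear encoding can be shown in a few lines: given bin edges, each output slot saturates at 1 for bins below the value, interpolates linearly inside the active bin, and stays 0 above it. This is a sketch of the encoding scheme, not KDP's actual layer.

```python
import numpy as np

def ple_encode(x, edges):
    """Piecewise linear encoding of a 1-D numeric feature.

    One output slot per bin: slots below the value saturate at 1, the
    active bin interpolates linearly, slots above remain 0.
    """
    edges = np.asarray(edges, dtype=float)
    return np.clip((x[:, None] - edges[:-1]) / (edges[1:] - edges[:-1]),
                   0.0, 1.0)

enc = ple_encode(np.array([0.5, 1.5]), edges=[0.0, 1.0, 2.0])
```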
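Entity embeddings reduce to a trainable lookup table indexed by category id. The sketch below uses a frozen random table to show the API shape; in KDP this would be a Keras `Embedding` layer whose table is updated by backprop.

```python
import numpy as np

class EntityEmbedding:
    """Dense-vector lookup for category indices (illustrative sketch).

    A real implementation would be a trainable Keras Embedding layer;
    here the table is only randomly initialized.
    """
    def __init__(self, n_categories, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.normal(scale=0.05, size=(n_categories, dim))

    def __call__(self, idx):
        return self.table[np.asarray(idx)]

emb = EntityEmbedding(n_categories=10, dim=4)
vecs = emb([3, 3, 7])  # equal indices map to equal vectors
```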
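The semantic enrichment step is essentially a token‑wise vector lookup with pooling. The toy `VECTORS` dict below stands in for a real pretrained Word2Vec/GloVe table, and mean pooling is one reasonable choice among several.

```python
import numpy as np

# Toy stand-ins for pretrained Word2Vec/GloVe vectors (assumed lookup table).
VECTORS = {
    "red":  np.array([1.0, 0.0]),
    "wine": np.array([0.0, 1.0]),
}

def embed_category(value, vectors, dim=2):
    """Average pretrained word vectors over the tokens of a descriptive
    categorical value; tokens missing from the table are skipped."""
    tokens = value.lower().replace("_", " ").split()
    hits = [vectors[t] for t in tokens if t in vectors]
    return np.mean(hits, axis=0) if hits else np.zeros(dim)

v = embed_category("Red_Wine", VECTORS)
```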
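Of the bundled SSL objectives, SCARF has the simplest view generation: a random subset of entries is replaced by values drawn from the same feature's empirical marginal (that column in a random donor row), and two such views feed a contrastive loss. A sketch of just the corruption step:

```python
import numpy as np

def scarf_corrupt(X, corruption_rate=0.6, seed=0):
    """SCARF-style view generation (corruption step only).

    Masked entries are replaced by the same column's value from a
    randomly chosen donor row, i.e. a draw from that feature's
    empirical marginal distribution.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < corruption_rate
    donor_rows = rng.integers(0, X.shape[0], size=X.shape)
    X_marginal = np.take_along_axis(X, donor_rows, axis=0)
    return np.where(mask, X_marginal, X)

X = np.arange(20, dtype=float).reshape(5, 4)
view = scarf_corrupt(X)
```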
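For the multiscale variant, the layer keeps one frequency bank per user‑specified sigma and concatenates them, so a single embedding carries both low‑ and high‑frequency views of the input. This sketch follows the Fourier‑feature‑network convention where larger sigma means higher frequency; the function name is illustrative.

```python
import numpy as np

def multiscale_fourier_embedding(X, sigmas=(1.0, 10.0), n_per_scale=8, seed=0):
    """Concatenate sin/cos frequency banks, one bank per sigma scale.

    Larger sigma draws higher-frequency projections (Fourier-feature-
    network convention), so small and large scales together capture
    coarse and fine numeric structure.
    """
    rng = np.random.default_rng(seed)
    banks = []
    for sigma in sigmas:
        W = rng.normal(scale=sigma, size=(X.shape[1], n_per_scale))
        proj = 2.0 * np.pi * (X @ W)
        banks.extend([np.sin(proj), np.cos(proj)])
    return np.concatenate(banks, axis=1)

Z = multiscale_fourier_embedding(np.random.default_rng(2).normal(size=(4, 3)))
```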