From 31a5dfc1e02367e738aaa96f33c1cacded3be295 Mon Sep 17 00:00:00 2001
From: Meilame Tayebjee <114609737+meilame-tayebjee@users.noreply.github.com>
Date: Tue, 25 Nov 2025 23:12:57 +0100
Subject: [PATCH] Update README for repository deprecation notice
Updated README to indicate repository is no longer maintained and to direct users to the new project torchTextClassifiers.
---
README.md | 73 +++++++++++++++++++++++--------------------------------
1 file changed, 31 insertions(+), 42 deletions(-)
diff --git a/README.md b/README.md
index 23d7b31..67ba4b6 100644
--- a/README.md
+++ b/README.md
@@ -2,15 +2,21 @@
A flexible PyTorch implementation of FastText for text classification with support for categorical features.
+> **⚠️ This repository is no longer maintained.**
+>
+> It has evolved into a newer, actively maintained project, **torchTextClassifiers**, which aims to be a more general, unified framework and toolkit for text classification in PyTorch.
+>
+> 👉 Please use the updated version here: [https://github.com/InseeFrLab/torchTextClassifiers](https://github.com/InseeFrLab/torchTextClassifiers)
+
## Features
-- Supports text classification with FastText architecture
-- Handles both text and categorical features
-- N-gram tokenization
-- Flexible optimizer and scheduler options
-- GPU and CPU support
-- Model checkpointing and early stopping
-- Prediction and model explanation capabilities
+* Supports text classification with FastText architecture
+* Handles both text and categorical features
+* N-gram tokenization
+* Flexible optimizer and scheduler options
+* GPU and CPU support
+* Model checkpointing and early stopping
+* Prediction and model explanation capabilities
## Installation
@@ -20,19 +26,18 @@ pip install torchFastText
## Key Components
-- `build()`: Constructs the FastText model architecture
-- `train()`: Trains the model with built-in callbacks and logging
-- `predict()`: Generates class predictions
-- `predict_and_explain()`: Provides predictions with feature attributions
+* `build()`: Constructs the FastText model architecture
+* `train()`: Trains the model with built-in callbacks and logging
+* `predict()`: Generates class predictions
+* `predict_and_explain()`: Provides predictions with feature attributions
## Subpackages
-- `preprocess`: To preprocess text input, using `nltk` and `unidecode` libraries.
-- `explainability`: Simple methods to visualize feature attributions at word and letter levels, using `captum`library.
+* `preprocess`: preprocesses text input using the `nltk` and `unidecode` libraries.
+* `explainability`: simple methods to visualize feature attributions at the word and letter levels, using the `captum` library.
Run `pip install torchFastText[preprocess]` or `pip install torchFastText[explainability]` to download these optional dependencies.
-
## Quick Start
```python
@@ -63,40 +68,37 @@ model.train(
predictions = model.predict(test_data)
```
-where ```train_data``` is an array of size $(N,d)$, having the text in string format in the first column, the other columns containing tokenized categorical variables in `int` format.
+where `train_data` is an array of shape $(N, d)$ whose first column contains the text as strings and whose remaining columns contain the categorical variables encoded as `int`.
Please make sure `y_train` contains at least one occurrence of each possible label.
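For illustration, here is a minimal sketch of assembling `train_data` and `y_train` in this layout (the dataset and feature names are made up):

```python
import numpy as np

# Hypothetical toy dataset (made-up values). Column 0 holds the raw
# text; the remaining columns hold categorical variables that have
# already been encoded as ints.
texts = ["fresh organic apples", "cotton t-shirt", "wireless mouse"]
region = [0, 1, 1]     # first encoded categorical feature
channel = [2, 0, 1]    # second encoded categorical feature

# train_data has shape (N, d): text first, int-encoded features after.
train_data = np.array(list(zip(texts, region, channel)), dtype=object)
# Every possible label occurs at least once in y_train.
y_train = np.array([0, 1, 2])

assert train_data.shape == (3, 3)
```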
## Dependencies
-- PyTorch Lightning
-- NumPy
+* PyTorch Lightning
+* NumPy
## Categorical features
-If any, each categorical feature $i$ is associated to an embedding matrix of size (number of unique values, embedding dimension) where the latter is a hyperparameter (`categorical_embedding_dims`) - chosen by the user - that can take three types of values:
+If any, each categorical feature $i$ is associated with an embedding matrix of size (number of unique values, embedding dimension), where the embedding dimension is a user-chosen hyperparameter (`categorical_embedding_dims`) that can take three types of values:
-- `None`: same embedding dimension as the token embedding matrix. The categorical embeddings are then summed to the sentence-level embedding (which itself is an averaging of the token embeddings). See [Figure 1](#Default-architecture).
-- `int`: the categorical embeddings have all the same embedding dimensions, they are averaged and the resulting vector is concatenated to the sentence-level embedding (the last linear layer has an adapted input size). See [Figure 2](#avg-architecture).
-- `list`: the categorical embeddings have different embedding dimensions, all of them are concatenated without aggregation to the sentence-level embedding (the last linear layer has an adapted input size). See [Figure 3](#concat-architecture).
+* `None`: same embedding dimension as the token embedding matrix. The categorical embeddings are then added to the sentence-level embedding (itself an average of the token embeddings). See Figure 1.
+* `int`: all categorical embeddings share the same dimension; they are averaged and the resulting vector is concatenated to the sentence-level embedding (the last linear layer has an adapted input size). See Figure 2.
+* `list`: the categorical embeddings have different dimensions; all of them are concatenated, without aggregation, to the sentence-level embedding (the last linear layer has an adapted input size). See Figure 3.
Default is `None`.
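The three modes can be illustrated with a minimal NumPy sketch of the shapes involved (this is an illustration, not the library's actual implementation; all dimensions, matrices, and values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical categorical features with 4 and 6 unique values,
# combined with a sentence-level embedding of dimension token_dim.
token_dim = 8
sentence_emb = rng.normal(size=token_dim)   # average of token embeddings
cat_values = [1, 3]                         # one value per categorical feature

# `None`: each categorical embedding matrix uses token_dim;
# the embeddings are summed with the sentence embedding.
mats_none = [rng.normal(size=(4, token_dim)), rng.normal(size=(6, token_dim))]
out_none = sentence_emb + sum(m[v] for m, v in zip(mats_none, cat_values))
assert out_none.shape == (token_dim,)

# `int`: all categorical embeddings share one dimension; they are
# averaged and the average is concatenated to the sentence embedding.
cat_dim = 5
mats_int = [rng.normal(size=(4, cat_dim)), rng.normal(size=(6, cat_dim))]
avg = np.mean([m[v] for m, v in zip(mats_int, cat_values)], axis=0)
out_int = np.concatenate([sentence_emb, avg])
assert out_int.shape == (token_dim + cat_dim,)

# `list`: each feature has its own dimension; all embeddings are
# concatenated to the sentence embedding without aggregation.
dims = [3, 5]
mats_list = [rng.normal(size=(4, dims[0])), rng.normal(size=(6, dims[1]))]
out_list = np.concatenate([sentence_emb] + [m[v] for m, v in zip(mats_list, cat_values)])
assert out_list.shape == (token_dim + sum(dims),)
```

In each case the last linear layer's input size must match the resulting vector's length, which is why the `int` and `list` modes adapt it.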
-
-
+
*Figure 1: The 'sum' architecture*
-
-
+
*Figure 2: The 'average and concatenate' architecture*
-
-
+
*Figure 3: The 'concatenate all' architecture*
## Documentation
-For detailed usage and examples, please refer to the [example notebook](notebooks/example.ipynb). Use `pip install -r requirements.txt` after cloning the repository to install the necessary dependencies (some are specific to the notebook).
+For detailed usage and examples, please refer to the example notebook (`notebooks/example.ipynb`). Use `pip install -r requirements.txt` after cloning the repository to install the necessary dependencies (some are specific to the notebook).
## Contributing
@@ -106,21 +108,8 @@ Contributions are welcome! Please feel free to submit a Pull Request.
MIT
-
## References
Inspired by the original FastText paper [1] and implementation.
-[1] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, [*Bag of Tricks for Efficient Text Classification*](https://arxiv.org/abs/1607.01759)
-
-```
-@InProceedings{joulin2017bag,
- title={Bag of Tricks for Efficient Text Classification},
- author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
- booktitle={Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
- month={April},
- year={2017},
- publisher={Association for Computational Linguistics},
- pages={427--431},
-}
-```
+[1] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, [*Bag of Tricks for Efficient Text Classification*](https://arxiv.org/abs/1607.01759).