Spelling Corrector Experimental models

This is an experimental project for research on spelling correction models. We have evaluated several system configurations for spelling correction:

Language model (all based on the ELMO model):

TFKuznetsovELMO (best quality, but slow)
PyTorchELMO (good speed and good quality)
TransformerELMO (fast, but poor quality)
RusVectores ELMO (fast, but poor quality)
TFHubELMO (not a good solution)

Error model and candidate generator:

Levenshtein
Weighted Levenshtein
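
To illustrate the difference between the two error models, here is a minimal sketch of a weighted Levenshtein distance. It is not the implementation used in this repository: the function name `weighted_levenshtein`, the `keyboard_cost` function, and the `NEIGHBOUR_KEYS` weights are all hypothetical and chosen only for illustration. With the default unit costs it reduces to the plain Levenshtein distance.

```python
from typing import Callable


def weighted_levenshtein(
    source: str,
    target: str,
    substitution_cost: Callable[[str, str], float] = lambda a, b: 1.0,
    insertion_cost: float = 1.0,
    deletion_cost: float = 1.0,
) -> float:
    """Cost of the cheapest edit sequence turning `source` into `target`."""
    # previous[j] is the cost of turning the processed prefix of `source`
    # into target[:j]; initially the prefix is empty, so only insertions apply.
    previous = [j * insertion_cost for j in range(len(target) + 1)]
    for i, s_char in enumerate(source, start=1):
        # Turning source[:i] into an empty string takes i deletions.
        current = [i * deletion_cost]
        for j, t_char in enumerate(target, start=1):
            sub = 0.0 if s_char == t_char else substitution_cost(s_char, t_char)
            current.append(min(
                previous[j] + deletion_cost,     # delete s_char
                current[j - 1] + insertion_cost, # insert t_char
                previous[j - 1] + sub,           # substitute or match
            ))
        previous = current
    return previous[-1]


# Hypothetical weights: confusing neighbouring keys is cheaper than other typos.
NEIGHBOUR_KEYS = {("a", "s"), ("s", "a"), ("o", "p"), ("p", "o")}


def keyboard_cost(a: str, b: str) -> float:
    return 0.5 if (a, b) in NEIGHBOUR_KEYS else 1.0


plain = weighted_levenshtein("kitten", "sitting")
weighted = weighted_levenshtein("cat", "cst", substitution_cost=keyboard_cost)
```

In a candidate generator, a lower weighted distance would make a candidate a more plausible correction of the observed token; how the weights are obtained in this project is not described here.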

Reranker for selecting corrections:

Summator (selects the hypothesis with the maximal sum of language_model score and error_score)
Logistic Regression (the decision is made by a trainable regression algorithm that uses multiple features, including the language model score, the error score, and many others)
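
The Summator rule can be sketched in a few lines. This is only an illustration, not the repository's code: the `Hypothesis` type, the `summator_rerank` function, and the example scores are hypothetical, and the sketch assumes both scores are on a comparable scale (for example log-probabilities) where higher means better.

```python
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    text: str
    lm_score: float     # assumed: language model score, higher is better
    error_score: float  # assumed: error model score, higher is better


def summator_rerank(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Return the hypothesis with the maximal lm_score + error_score."""
    # max() raises ValueError on an empty list, so callers must pass
    # at least one hypothesis (e.g. the unchanged input sentence).
    return max(hypotheses, key=lambda h: h.lm_score + h.error_score)


# Made-up scores for illustration only.
hypotheses = [
    Hypothesis("hello world", lm_score=-4.0, error_score=-1.5),
    Hypothesis("hallo world", lm_score=-9.0, error_score=0.0),
    Hypothesis("hello word", lm_score=-6.0, error_score=-3.0),
]
best = summator_rerank(hypotheses)
```

The Logistic Regression reranker replaces this fixed sum with a trained model over the same scores plus additional features.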

## Root documentation

http://192.168.10.188:8081/index.php/Spelling_Corrector

Table with comparison of models on different datasets

https://docs.google.com/spreadsheets/d/1QzYF2O5z1nQR8gic8uFTx4TnB242EV3uNXPjl-xTppg/edit#gid=0

Basic usage

How to prepare a spelling corrector object with a specific implementation underneath

git clone https://github.com/acriptis/spelling_corrector_experiments

pip install -r requirements.txt

Then you can launch the tests:

python tests/test_spelling_correctors.py

Note that this requires about 30 GB of RAM and a lot of disk space, because it loads all models: the ELMO40inKuz model consumes about 4.5 GB of disk space and the TorchELMO40in model about 5 GB.

As-Server usage

TODO: how to run the spelling corrector as a server

DeepPavlov riseapi?
Django REST API service
Flask
TODO: wrap the solution in a Docker image?

How to train your own components

TODO

Language model

TODO

ReRanker

TODO

Spelling Corrector Candidates Generator

TODO describe how to configure

