A powerful web application for natural language text analysis and preprocessing.
The NLP Text Processor offers a comprehensive suite of tools for analyzing and transforming text data:
- Word Tokenization: Split text into individual words using NLTK, spaCy, TextBlob, or simple whitespace splitting.
- Sentence Tokenization: Break down text into sentences for granular analysis.
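Both tokenization modes map onto standard tokenizer calls. Below is a minimal, illustrative sketch using NLTK (the sample text and variable names are examples, not the exact helpers in app.py):

```python
from nltk.tokenize import sent_tokenize, word_tokenize  # requires NLTK's "punkt" data

text = "Don't worry. Tokenization is the first step!"

sentences = sent_tokenize(text)   # ["Don't worry.", "Tokenization is the first step!"]
words = word_tokenize(text)       # ['Do', "n't", 'worry', '.', 'Tokenization', ...]

# Simple whitespace splitting needs no model at all:
whitespace_tokens = text.split()  # ["Don't", 'worry.', 'Tokenization', 'is', ...]
```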
Clean and standardize your text with multiple operations (a combined sketch of these steps appears after the feature list):
- Lowercasing: Convert all text to lowercase.
- Contraction Correction: Expand contractions (e.g., "don't" -> "do not").
- Punctuation Removal: Strip all punctuation marks.
- Whitespace Cleanup: Remove multiple spaces and trim text.
- Spelling Correction: Automatically correct spelling errors.
- Emoji Conversion: Convert emojis to their text description or remove them.
- Stop Words Removal: Filter out common non-informative words.
- POS Tagging: Identify parts of speech (Nouns, Verbs, Adjectives, etc.).
- Stemming: Reduce words to their root form (e.g., "running" -> "run").
- Lemmatization: Convert words to their base dictionary form (e.g., "better" -> "good").
- Sentiment Analysis: Detect if text is Positive, Negative, or Neutral with polarity and subjectivity metrics.
- Word Cloud: Visualize the most frequent words in your text.
- Instant text metrics including sentence count, token count, and average tokens per sentence.
- Interactive progress tracking during processing.
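Taken together, these operations form a fairly standard preprocessing pipeline. The sketch below is illustrative only, built from the libraries named in this README (contractions, emoji, NLTK, TextBlob); the `normalize` helper is hypothetical and app.py may organize the steps differently:

```python
import string

import contractions                      # expands "don't" -> "do not"
import emoji                             # converts emojis to text descriptions
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize
from textblob import TextBlob

def normalize(text: str) -> list[str]:
    """Chain the cleanup steps listed above and return processed tokens."""
    text = contractions.fix(text)                    # contraction expansion
    text = emoji.demojize(text)                      # emojis -> ":name:"-style text
    text = text.lower()                              # lowercasing
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation removal
    text = " ".join(text.split())                    # whitespace cleanup
    stop = set(stopwords.words("english"))
    tokens = [t for t in word_tokenize(text) if t not in stop]        # stop word removal
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens]                  # lemmatization

print(normalize("I don't LOVE waiting...   it's sooo slow 😩"))
print(PorterStemmer().stem("running"))                           # stemming -> "run"
print(nltk.pos_tag(word_tokenize("The quick brown fox jumps")))  # POS tags, e.g. ('quick', 'JJ')

blob = TextBlob("The interface is clean and really easy to use.")
print(blob.sentiment.polarity, blob.sentiment.subjectivity)      # positive polarity for this sentence
```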
- Frontend: Streamlit
- NLP Libraries: NLTK, spaCy, TextBlob
- Utilities: emoji, contractions, demoji
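All of these are pip-installable; a requirements.txt along the lines below would cover the stack (version pins are omitted, and wordcloud is an assumption made here for the word-cloud feature rather than a dependency confirmed by this README):

```text
streamlit
nltk
spacy
textblob
emoji
contractions
demoji
wordcloud
```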
```text
nlp-text-processor/
├── app.py              # Core NLP logic and helper functions
├── gui.py              # Main Streamlit interface application
├── requirements.txt    # Project dependencies
└── README.md           # Project documentation
```
- Clone the repository

  ```bash
  git clone https://github.com/utachicodes/nlp_text_processor.git
  cd nlp-text-processor
  ```

- Create a virtual environment (Recommended)

  ```bash
  # Windows
  python -m venv venv
  venv\Scripts\activate

  # macOS/Linux
  python3 -m venv venv
  source venv/bin/activate
  ```
- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Download required NLP data. The app will attempt to download NLTK data automatically, but you may need to install the spaCy model manually if it fails:

  ```bash
  python -m spacy download en_core_web_sm
  ```
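  If the automatic NLTK download fails as well, the data can be fetched manually. The exact corpora app.py needs are not listed in this README; a typical set for tokenization, stop words, POS tagging, and lemmatization would be:

  ```python
  import nltk

  # Likely corpora/models for the features above (the exact list is an assumption):
  for pkg in ["punkt", "stopwords", "wordnet", "averaged_perceptron_tagger"]:
      nltk.download(pkg)
  ```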
- Run the application

  ```bash
  streamlit run gui.py
  ```
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the project
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.