A simple web tool for analyzing and preparing text using common NLP techniques.

utachicodes/nlp_text_processor

NLP Text Processor

A powerful web application for natural language text analysis and preprocessing.


Features

The NLP Text Processor provides the following tools for analyzing and transforming text data:

Tokenization

  • Word Tokenization: Split text into individual words using NLTK, spaCy, TextBlob, or simple whitespace splitting.
  • Sentence Tokenization: Break down text into sentences for granular analysis.
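
To make the difference between tokenization strategies concrete, here is a minimal standard-library sketch. It is not the code in app.py; the function names are hypothetical, and the regex rules are a simplified stand-in for what NLTK, spaCy, or TextBlob do internally.

```python
import re


def whitespace_tokenize(text: str) -> list[str]:
    """Split on runs of whitespace; punctuation stays attached to words."""
    return text.split()


def regex_word_tokenize(text: str) -> list[str]:
    """Return words (keeping internal apostrophes) and punctuation as separate tokens."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)


def naive_sentence_tokenize(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace.

    Assumption: no abbreviation handling, so "Dr. Smith" is split incorrectly.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


sample = "Don't panic. It works!"
print(whitespace_tokenize(sample))      # punctuation attached to words
print(regex_word_tokenize(sample))      # punctuation as separate tokens
print(naive_sentence_tokenize(sample))  # two sentences
```

Whitespace splitting is the fastest option but leaves "panic." and "works!" as tokens, which is usually why a library tokenizer is preferred for analysis.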

Text Normalization

Clean and standardize your text with multiple operations:

  • Lowercasing: Convert all text to lowercase.
  • Contraction Correction: Expand contractions (e.g., "don't" -> "do not").
  • Punctuation Removal: Strip all punctuation marks.
  • Whitespace Cleanup: Collapse repeated spaces and trim leading and trailing whitespace.
  • Spelling Correction: Automatically correct spelling errors.
  • Emoji Conversion: Convert emojis to their text description or remove them.
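
The order of these operations matters: contractions have to be expanded before punctuation is removed, or the apostrophe is lost first. The sketch below chains four of the steps using only the standard library. The `normalize` function and the small `CONTRACTIONS` table are illustrative assumptions, not the app's actual implementation, and spelling correction and emoji handling are left out because they need third-party data.

```python
import re
import string

# Tiny illustrative mapping (an assumption); a real contraction list is much longer.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "it's": "it is",
    "i'm": "i am",
}


def normalize(text: str) -> str:
    """Lowercase, expand contractions, strip punctuation, and clean up whitespace."""
    text = text.lower()
    # Expand contractions first; punctuation removal would delete the apostrophe.
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # Remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize("  I DON'T   know,  it's fine!!  "))
```

Note that this sketch only matches the ASCII apostrophe; typographic apostrophes would need to be mapped to `'` beforehand.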

Advanced Analysis

  • Stop Words Removal: Filter out common non-informative words.
  • POS Tagging: Identify parts of speech (Nouns, Verbs, Adjectives, etc.).
  • Stemming: Reduce words to their root form (e.g., "running" -> "run").
  • Lemmatization: Convert words to their base dictionary form (e.g., "better" -> "good").
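
As a rough illustration of stop-word removal and stemming, the following standard-library sketch uses a hand-picked stop-word set and a toy suffix stripper. Both are assumptions made for the example: real stop-word lists are far larger, and `toy_stem` is not the Porter algorithm or any other library stemmer. Lemmatization is not shown because it needs a dictionary (mapping "better" to "good" cannot be done by trimming suffixes).

```python
# Hand-picked for illustration only; library stop-word lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "and", "to"}


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens whose lowercase form is in STOP_WORDS."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def toy_stem(word: str) -> str:
    """Strip one common suffix. A toy rule set, NOT the Porter algorithm."""
    w = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip if at least three characters would remain.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            # Undo consonant doubling, e.g. "runn" -> "run".
            if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiouls":
                w = w[:-1]
            break
    return w


tokens = ["The", "cats", "are", "running", "in", "the", "garden"]
content = remove_stop_words(tokens)
print(content)
print([toy_stem(t) for t in content])
```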

Visualizations & Sentiment

  • Sentiment Analysis: Detect if text is Positive, Negative, or Neutral with polarity and subjectivity metrics.
  • Word Cloud: Visualize the most frequent words in your text.
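
A word cloud is driven by word frequencies, so the data behind it can be sketched with `collections.Counter`. The `top_words` helper below is hypothetical and only counts words; it does not render an image, and it is not taken from the app's code.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent lowercase words with their counts."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words).most_common(n)


print(top_words("the cat saw the dog and the cat ran", n=2))
```

In practice, stop words such as "the" would usually be filtered out before counting so that they do not dominate the cloud.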

Real-time Statistics

  • Instant text metrics including sentence count, token count, and average tokens per sentence.
  • Interactive progress tracking during processing.
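
The metrics listed above can be approximated in a few lines. This is a minimal sketch under stated assumptions: sentences are split on terminal punctuation followed by whitespace, and tokens are whitespace-separated. The app may count differently depending on the tokenizer selected, and `text_stats` is a hypothetical name.

```python
import re


def text_stats(text: str) -> dict[str, float]:
    """Compute sentence count, token count, and average tokens per sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokens = text.split()
    # Guard against division by zero on empty input.
    avg = len(tokens) / len(sentences) if sentences else 0.0
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        "avg_tokens_per_sentence": avg,
    }


print(text_stats("One two three. Four five!"))
```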

Tech Stack

  • Python
  • Streamlit
  • NLTK, spaCy, and TextBlob

Project Structure

nlp_text_processor/
├── app.py              # Core NLP logic and helper functions
├── gui.py              # Main Streamlit interface application
├── requirements.txt    # Project dependencies
└── README.md           # Project documentation

Installation & Setup

  1. Clone the repository

    git clone https://github.com/utachicodes/nlp_text_processor.git
    cd nlp_text_processor
  2. Create a virtual environment (Recommended)

    # Windows
    python -m venv venv
    venv\Scripts\activate
    
    # macOS/Linux
    python3 -m venv venv
    source venv/bin/activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Download required NLP data

    The app attempts to download the NLTK data automatically. If the spaCy model is still missing after that, install it manually:

    python -m spacy download en_core_web_sm
  5. Run the application

    streamlit run gui.py
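
If the app fails to start, it can help to confirm that the dependencies are importable from the active virtual environment. The script below is a small sketch for that purpose. The package names in `ASSUMED_PACKAGES` are an assumption based on the libraries named in this README; requirements.txt remains the authoritative list.

```python
from importlib.util import find_spec

# Assumed import names; check requirements.txt for the actual dependency list.
ASSUMED_PACKAGES = ["streamlit", "nltk", "spacy", "textblob", "en_core_web_sm"]


def check_installed(names: list[str]) -> dict[str, bool]:
    """Map each top-level module name to whether it can be found for import."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_installed(ASSUMED_PACKAGES).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Any entry reported as MISSING points back to step 3 or step 4.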

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.
