OpenLLM-1M

OpenLLM-1M is a locally deployable chatbot application powered by a Streaming Large Language Model (LLM). It supports processing and generating responses for inputs up to 1 million tokens, making it suitable for handling long-context conversations. This project leverages Gradio for the user interface and is designed to run efficiently on CPU, eliminating the need for GPU resources.

By default, it is integrated with the following models:

HuggingFaceTB/SmolLM2-135M-Instruct
HuggingFaceTB/SmolLM2-360M-Instruct

Additionally, OpenLLM-1M allows deployment of single or multiple models depending on your use case.

🚀 Features

Long-Context Support: Handles inputs and outputs up to 1 million tokens.
CPU-Friendly: Optimized to run on standard CPUs without requiring a GPU.
Interactive UI: Built with Gradio for a user-friendly web interface.
Streaming LLM Integration: Utilizes streaming techniques for efficient processing of large inputs.
Local Deployment: Fully functional when run locally, ensuring privacy and control.
Flexible Model Deployment: Supports deployment of one or multiple models simultaneously, depending on the requirements.
Default Model Integration: Pre-integrated with HuggingFaceTB/SmolLM2-135M-Instruct and HuggingFaceTB/SmolLM2-360M-Instruct models.
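
The streaming technique listed above comes from StreamingLLM, whose core idea is to keep the cache entries of the first few tokens (the "attention sinks") plus a sliding window of the most recent tokens, and to evict everything in between. The following is a minimal sketch of that eviction policy only, not this project's implementation: it tracks token positions rather than real key/value tensors, and the class name `SinkWindowCache` and its parameters `num_sinks` and `window` are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Deque, List


class SinkWindowCache:
    """Toy cache that keeps the first `num_sinks` entries and the last `window` entries.

    Illustrative only: a real streaming cache stores per-layer key/value
    tensors, while this sketch stores plain token positions.
    """

    def __init__(self, num_sinks: int = 4, window: int = 8) -> None:
        if num_sinks < 0 or window <= 0:
            raise ValueError("num_sinks must be >= 0 and window must be > 0")
        self.num_sinks = num_sinks
        self.sinks: List[int] = []
        # A bounded deque silently drops its oldest item when it is full.
        self.recent: Deque[int] = deque(maxlen=window)

    def append(self, token_position: int) -> None:
        """Add one token; the earliest tokens become permanent sinks."""
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(token_position)
        else:
            self.recent.append(token_position)

    def kept(self) -> List[int]:
        """Return the positions currently retained, sinks first."""
        return self.sinks + list(self.recent)

    def __len__(self) -> int:
        return len(self.sinks) + len(self.recent)


def stream_positions(n_tokens: int, num_sinks: int = 4, window: int = 8) -> List[int]:
    """Feed positions 0..n_tokens-1 through the cache and return what survives."""
    cache = SinkWindowCache(num_sinks=num_sinks, window=window)
    for position in range(n_tokens):
        cache.append(position)
    return cache.kept()
```

With `num_sinks=4` and `window=8`, the cache never holds more than 12 entries however many tokens are streamed, which is why memory stays bounded on very long inputs.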

🛠️ Installation

Prerequisites

Anaconda: For environment management.
Python 3.10: Ensures compatibility with dependencies.

Setup Steps

Clone the Repository:

git clone https://github.com/Rahulkumar010/OpenLLM-1M.git
cd OpenLLM-1M

Create and Activate a Conda Environment:

conda create -n openllm python=3.10
conda activate openllm

Install Required Packages:

pip install uv
uv pip install -r requirements_dev.txt

🧠 Running the Application

Terminal 1: Start the Model Server

cd FastChat
python model_server.py

Terminal 2: Launch the Gradio Interface

python app.py

After executing these commands, open your browser and navigate to the provided local URL to interact with the chatbot.
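
If the page does not load right away, the interface may still be starting. A small sketch for polling the local URL until it responds is shown below. The helper name `wait_for_url` is hypothetical, and the address `http://127.0.0.1:7860` is an assumption based on Gradio's usual default port; use whatever URL `app.py` actually prints.

```python
import time
import urllib.error
import urllib.request


def wait_for_url(url: str, attempts: int = 30, delay: float = 1.0) -> bool:
    """Return True once `url` answers an HTTP request, False if it never does."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status, so it is reachable.
            return True
        except OSError:
            # Covers URLError, refused connections, and timeouts: not up yet.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False


if __name__ == "__main__":
    # Assumed address; replace it with the URL printed by app.py.
    ready = wait_for_url("http://127.0.0.1:7860")
    print("Interface is reachable" if ready else "Interface did not respond")
```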

🔧 To Do

Dockerization: Containerize the application for easier deployment across different environments.
UI Enhancements: Add functionality to enable or disable streaming through the user interface.
Easier Model Integration: Simplify the process of integrating new models into the application. Adding a new model currently requires modifying the codebase; future updates aim to provide a more modular approach.

📚 References & Inspiration

FastChat: Core model server implementation.
Streaming LLM (MIT Han Lab): Streaming techniques for efficient LLM processing.
Qwen2.5: A reference model for long-context LMs.

📬 Contributions

This project is a personal exploration. However, contributions are welcome! Feel free to fork the repository, submit issues, or propose enhancements via pull requests.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions or feedback, please reach out to rahul01110100@gmail.com.
