An Android app that lets you chat with famous historical and fictional characters entirely on-device. No internet connection is required for inference. Characters answer questions grounded in their own knowledge base through a Retrieval-Augmented Generation (RAG) pipeline running 100% locally.
Available characters:
- Sherlock Holmes — World's greatest consulting detective
- Marcus Aurelius — Roman Emperor & Stoic philosopher
The project follows Clean Architecture with an MVI (Model-View-Intent) presentation layer.
```
app/
├── data/
│   ├── local/          # EmbeddingModel, BertTokenizer, VectorDatabase (JNI)
│   └── repository/     # ChatRepositoryImpl, RagRepositoryImpl
├── di/                 # Hilt modules (AppModule, RagModule)
├── domain/
│   ├── model/          # Character, ChatMessage, ModelStatus, Result
│   ├── repository/     # ChatRepository, RagRepository (interfaces)
│   └── usecase/        # BuildRagContextUseCase, SendMessageUseCase,
│                       # DownloadModelUseCase, ImportModelUseCase,
│                       # InitializeEngineUseCase
├── presentation/
│   ├── characterselection/
│   ├── chat/
│   └── navigation/
└── cpp/                # sqlite3 + sqlite-vec + JNI bridge (CMake)
```
| Tool | Version |
|---|---|
| Kotlin | 2.2.10 |
| Android Gradle Plugin (AGP) | 9.2.1 |
| KSP (Kotlin Symbol Processing) | 2.3.9 |
| CMake (NDK native build) | 3.22.1 |
| Min SDK | 24 |
| Target SDK | 36 |
| ABI filters | arm64-v8a, x86_64 |
| Library | Purpose |
|---|---|
| Jetpack Compose BOM 2026.02.01 | Declarative UI toolkit |
| Material3 | Design system & components |
| material-icons-extended | Extended icon set |
| Navigation Compose 2.8.9 | In-app navigation with back stack |
| Activity Compose 1.13.0 | `ComponentActivity` + Compose integration |
| Library | Purpose |
|---|---|
| Hilt 2.59 | Dependency injection (Dagger-based) |
| Hilt Navigation Compose 1.2.0 | `hiltViewModel()` scoped to nav destinations |
| Lifecycle ViewModel Compose 2.10.0 | `collectAsStateWithLifecycle`, `viewModelScope` |
| Kotlin Coroutines 1.10.1 | Async operations, `Flow` for streaming responses |
| Library | Purpose |
|---|---|
| LiteRT LM (`com.google.ai.edge.litertlm`) 0.12.0 | Google's on-device LLM runtime, built on LiteRT (formerly TensorFlow Lite) |
| Gemma 3 1B IT int4 (`.litertlm` model file) | The language model (~1 GB, 4-bit quantized) |
| Library | Purpose |
|---|---|
| ONNX Runtime Android 1.20.0 | Runs the embedding model on-device |
| all-MiniLM-L6-v2 (`.onnx`) | Sentence embedding model (384-dim vectors) |
| sqlite-vec (C extension, compiled via NDK) | KNN vector similarity search inside SQLite |
| SQLite (amalgamation, compiled via NDK) | Embedded relational + vector database |
The app opens on CharacterSelectionScreen. The user picks a character, which triggers CharacterSelectionIntent.SelectCharacter. The ViewModel emits a navigation event and the nav graph navigates to ChatScreen, passing the characterId as a route argument.
On arrival at ChatScreen, ChatIntent.Initialize(character) is dispatched. The ViewModel checks whether the model file exists in filesDir.
If the model is absent, ModelSetupScreen is shown with two options:
- Download: streams `gemma3-1b-it-int4.litertlm` from HuggingFace over HTTP, writing to a `.tmp` file and atomically renaming it on completion. Progress is reported via `ModelStatus.Transferring`.
- Import from storage: opens a file picker (`ActivityResultContracts.OpenDocument`). The chosen file is copied into `filesDir` with the same progress-reporting mechanism.
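Both paths use the same pattern: write to a temporary file, then rename it into place. The sketch below shows that pattern in plain Kotlin.

- `copyAtomicallyWithProgress` is a hypothetical helper, not the app's actual code.
- It assumes the caller already knows the total size, for example from the HTTP `Content-Length` header or the picked document's size.
- The fraction it reports is the kind of value a caller could wrap in `ModelStatus.Transferring`.

```kotlin
import java.io.File
import java.io.IOException
import java.io.InputStream

/**
 * Hypothetical helper: streams [input] into [target] through a sibling ".tmp" file and
 * renames it into place only after every byte has been written, so a half-written model
 * never appears under its final name.
 */
fun copyAtomicallyWithProgress(
    input: InputStream,
    target: File,
    totalBytes: Long,
    onProgress: (Float) -> Unit,
) {
    val tmp = File(target.parentFile, target.name + ".tmp")
    var written = 0L
    input.use { src ->
        tmp.outputStream().use { dst ->
            val buffer = ByteArray(DEFAULT_BUFFER_SIZE)
            while (true) {
                val n = src.read(buffer)
                if (n < 0) break
                dst.write(buffer, 0, n)
                written += n
                // Fraction in 0..1; skipped when the total size is unknown.
                if (totalBytes > 0) onProgress(written.toFloat() / totalBytes)
            }
        }
    }
    // On Linux/Android, File.renameTo within one filesystem is a rename(2) call, which is atomic.
    if (!tmp.renameTo(target)) {
        tmp.delete()
        throw IOException("Could not rename ${tmp.name} to ${target.name}")
    }
}
```

On Android, the `input` stream could come from either source:
- `HttpURLConnection.getInputStream()` for the download path.
- `contentResolver.openInputStream(uri)` for the imported document.

Either way, the copy should run off the main thread.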
Once the file exists, InitializeEngineUseCase runs automatically.
```
InitializeEngineUseCase
└─ ChatRepository.initializeEngine()
   ├─ SamplerConfig(topK=40, topP=0.9, temperature=0.4, seed=0)
   ├─ Engine(EngineConfig(modelPath, maxNumTokens=2048)).initialize()
   └─ Conversation = engine.createConversation(ConversationConfig(samplerConfig))
```
ModelStatus transitions: Absent → Initializing → Ready
The UI shows an indeterminate LinearProgressIndicator while initializing, and enables the input bar once Ready.
When the user hits Send, ChatViewModel.sendMessage() executes the following pipeline on the IO/Default dispatcher:
`BertTokenizer.encode(userText)`
- Normalizes text (NFD, lowercase, strip diacritics)
- Applies basic tokenization (whitespace + punctuation splitting, CJK character isolation)
- Applies WordPiece sub-word tokenization against `vocab.txt` (30,522 tokens)
- Prepends `[CLS]` (101) and appends `[SEP]` (102), truncates to max 256 tokens
- Produces `inputIds`, `attentionMask`, `tokenTypeIds` as `LongArray`
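The WordPiece step is a greedy longest-match-first search over the vocabulary. The following is a minimal sketch in plain Kotlin, not the contents of `BertTokenizer.kt`. It rests on three assumptions:
- The input word has already been normalized and split by the basic tokenizer.
- `vocab` holds the entries of `vocab.txt`.
- The 256-token limit includes `[CLS]` and `[SEP]`.

`wordPiece`, `frameForBert`, `BertSpecialTokens`, and `EncodedInput` are illustrative names.

```kotlin
/** Greedy longest-match-first WordPiece for one pre-tokenized, lowercased word. */
fun wordPiece(word: String, vocab: Set<String>, unk: String = "[UNK]", maxChars: Int = 100): List<String> {
    if (word.length > maxChars) return listOf(unk)
    val pieces = mutableListOf<String>()
    var start = 0
    while (start < word.length) {
        var end = word.length
        var match: String? = null
        // Shrink the candidate from the right until it is found in the vocabulary.
        while (start < end) {
            val sub = word.substring(start, end)
            val candidate = if (start > 0) "##$sub" else sub // continuation pieces carry "##"
            if (candidate in vocab) { match = candidate; break }
            end--
        }
        if (match == null) return listOf(unk) // nothing matched: the whole word becomes [UNK]
        pieces += match
        start = end
    }
    return pieces
}

object BertSpecialTokens {
    const val CLS = 101L
    const val SEP = 102L
}

class EncodedInput(val inputIds: LongArray, val attentionMask: LongArray, val tokenTypeIds: LongArray)

/** Wraps vocabulary ids in [CLS] ... [SEP], truncating so the total length is at most maxLen. */
fun frameForBert(wordPieceIds: List<Long>, maxLen: Int = 256): EncodedInput {
    require(maxLen >= 2) { "maxLen must leave room for [CLS] and [SEP]" }
    val body = wordPieceIds.take(maxLen - 2)
    val ids = (listOf(BertSpecialTokens.CLS) + body + BertSpecialTokens.SEP).toLongArray()
    return EncodedInput(
        inputIds = ids,
        attentionMask = LongArray(ids.size) { 1L }, // no padding, so every position is attended
        tokenTypeIds = LongArray(ids.size) { 0L },  // single-segment input
    )
}
```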
`EmbeddingModel.embed(userText)` → `FloatArray` (384 dimensions)
- Loads `all-MiniLM-L6-v2.onnx` from assets on first use (kept open as a singleton)
- Creates ONNX tensors from the tokenizer output
- Runs the ONNX session → `last_hidden_state` tensor `[1, seqLen, 384]`
- Mean pooling: averages the token embeddings of non-masked positions
- L2 normalization: normalizes the resulting vector to unit length
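Mean pooling and L2 normalization are simple array arithmetic. This sketch makes two assumptions:
- The `[1, seqLen, 384]` output has already been copied out of the ONNX Runtime tensor into a `seqLen × 384` array (`hidden`).
- The attention mask is the one produced by the tokenizer.

`meanPoolAndNormalize` is an illustrative name, not necessarily what `EmbeddingModel.kt` uses.

```kotlin
import kotlin.math.sqrt

/** Averages the token vectors where attentionMask == 1, then scales the result to unit length. */
fun meanPoolAndNormalize(hidden: Array<FloatArray>, attentionMask: LongArray): FloatArray {
    require(hidden.isNotEmpty() && hidden.size == attentionMask.size) { "shape mismatch" }
    val dim = hidden.first().size
    val pooled = FloatArray(dim)
    var count = 0
    for (t in hidden.indices) {
        if (attentionMask[t] == 0L) continue // padding positions do not contribute
        count++
        for (d in 0 until dim) pooled[d] += hidden[t][d]
    }
    if (count > 0) for (d in 0 until dim) pooled[d] /= count
    // L2 normalization: divide by the Euclidean norm.
    val norm = sqrt(pooled.fold(0f) { acc, v -> acc + v * v })
    if (norm > 0f) for (d in 0 until dim) pooled[d] /= norm
    return pooled
}
```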
VectorDatabase.search(embedding, characterId, topK=3)
On first call, characters_rag.db is copied from assets/ to filesDir (one-time operation).
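A one-time copy out of `assets/` can be guarded by a simple existence check. The sketch below is an assumption about how this could be done, not the code in `VectorDatabase.kt`.
- `ensureCopied` is a hypothetical helper that takes the destination file and a function that opens the source stream.
- It copies through a temporary file, so an interrupted first launch does not leave a truncated database under the final name.

```kotlin
import java.io.File
import java.io.IOException
import java.io.InputStream

/** Copies the stream returned by [open] into [dest] unless [dest] already exists. */
fun ensureCopied(dest: File, open: () -> InputStream): File {
    if (dest.exists()) return dest // already copied on an earlier launch
    val tmp = File(dest.parentFile, dest.name + ".tmp")
    open().use { input -> tmp.outputStream().use { output -> input.copyTo(output) } }
    if (!tmp.renameTo(dest)) throw IOException("Could not move ${tmp.name} into place")
    return dest
}
```

On Android this could be called as `ensureCopied(File(context.filesDir, "characters_rag.db")) { context.assets.open("characters_rag.db") }`.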
The search is executed via JNI in native C++:
```
// vector_search.cpp
sqlite3_open_v2(dbPath, ...)
sqlite3_vec_init(db, ...) // loads the sqlite-vec extension

SELECT content, character_id
FROM character_knowledge
WHERE embedding MATCH ? AND k = ?
ORDER BY distance
```

- `sqlite-vec` exposes a virtual KNN table that matches the query vector against the stored 384-dim embeddings using cosine/L2 distance.
- Results are filtered by `character_id` to return only knowledge from the selected character.
- Returns the top-3 most relevant text chunks.
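Conceptually, the query performs a nearest-neighbor ranking. The plain-Kotlin sketch below does the same ranking in memory to make the semantics concrete. It is not how `vector_search.cpp` is implemented, and `Chunk`, `l2Distance`, and `knnSearch` are hypothetical names.

The sketch filters by `characterId` before ranking, which guarantees up to `k` results from the selected character. If the filter were instead applied after a global top-`k` query, fewer than `k` chunks could survive.

```kotlin
import kotlin.math.sqrt

class Chunk(val content: String, val characterId: String, val embedding: FloatArray)

/** Euclidean (L2) distance between two vectors of equal length. */
fun l2Distance(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "dimension mismatch" }
    var sum = 0f
    for (i in a.indices) {
        val d = a[i] - b[i]
        sum += d * d
    }
    return sqrt(sum)
}

/** Brute-force KNN restricted to one character: filter, sort by distance, keep the k closest. */
fun knnSearch(query: FloatArray, chunks: List<Chunk>, characterId: String, k: Int = 3): List<Chunk> =
    chunks.asSequence()
        .filter { it.characterId == characterId }
        .sortedBy { l2Distance(query, it.embedding) }
        .take(k)
        .toList()
```

The embeddings are L2-normalized, so the squared L2 distance between two of them equals `2 - 2 * cos(theta)`. Ranking by L2 distance and ranking by cosine similarity therefore give the same order.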
```kotlin
// SendMessageUseCase
val augmentedPrompt =
    "Use the following context to answer the question.\n\n" +
    "Context:\n" +
    "[sherlock_holmes]: <chunk 1>\n\n" +
    "[sherlock_holmes]: <chunk 2>\n\n" +
    "[sherlock_holmes]: <chunk 3>\n\n" +
    "Question: <user message>"
```

```
ChatRepository.streamResponse(augmentedPrompt)
└─ conversation.sendMessageAsync(augmentedPrompt)   // LiteRT LM
   └─ Flow<Content.Text> → token-by-token emission
```
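The prompt template can be assembled from the retrieved chunks with a small pure function. `buildAugmentedPrompt` is a hypothetical reconstruction that reproduces the template shown for `SendMessageUseCase`. It assumes each retrieved chunk arrives as a `(characterId, content)` pair; the real use case may differ in detail.

```kotlin
/** Rebuilds the RAG prompt: instruction, character-tagged context chunks, then the question. */
fun buildAugmentedPrompt(chunks: List<Pair<String, String>>, question: String): String =
    buildString {
        append("Use the following context to answer the question.\n\n")
        append("Context:\n")
        for ((characterId, content) in chunks) {
            append("[").append(characterId).append("]: ").append(content).append("\n\n")
        }
        append("Question: ").append(question)
    }
```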
- `sendMessageAsync` returns a `Flow` of response objects; each carries incremental `Content.Text` tokens.
- The ViewModel collects this flow and appends each token to the last `ChatMessage` in the UI state, producing a live streaming effect.
- Generation is capped at 400 output tokens per response.
- The input bar is disabled (`isGenerating = true`) until the flow completes.
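Appending streamed tokens to the last message is a pure state update, which fits the MVI reducer style. The sketch assumes a simplified `ChatMessage(text, isUser)` and a `ChatState(messages, isGenerating)`; these field names are illustrative, not taken from the project.
- Each collected `Content.Text` token would be passed to `onToken`.
- `onComplete` would run when the flow finishes.

```kotlin
data class ChatMessage(val text: String, val isUser: Boolean)
data class ChatState(val messages: List<ChatMessage> = emptyList(), val isGenerating: Boolean = false)

/** Starts a turn: adds the user's message plus an empty assistant message to stream into. */
fun ChatState.onSend(userText: String): ChatState = copy(
    messages = messages + ChatMessage(userText, isUser = true) + ChatMessage("", isUser = false),
    isGenerating = true, // disables the input bar
)

/** Appends one streamed token to the last (assistant) message. */
fun ChatState.onToken(token: String): ChatState {
    val last = messages.lastOrNull() ?: return this
    return copy(messages = messages.dropLast(1) + last.copy(text = last.text + token))
}

/** Re-enables the input bar once the response flow completes. */
fun ChatState.onComplete(): ChatState = copy(isGenerating = false)
```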
```
ModelStatus.Absent
   │  (file download / import)
   ▼
ModelStatus.Transferring(progress, label)
   │  (copy complete)
   ▼
ModelStatus.Initializing
   │  (Engine.initialize() + createConversation())
   ▼
ModelStatus.Ready ──► chat loop (RAG + streaming)
   │
   └──► ModelStatus.Failure(message) on any error
```
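The states in the diagram map naturally onto a Kotlin sealed hierarchy. The version below is a sketch:
- The state names and the `progress`, `label`, and `message` fields come from the diagram.
- The `Float` and `String` field types are assumptions.
- The `inputEnabled` and `statusText` helpers are added for illustration only.

```kotlin
import kotlin.math.roundToInt

sealed interface ModelStatus {
    data object Absent : ModelStatus
    data class Transferring(val progress: Float, val label: String) : ModelStatus
    data object Initializing : ModelStatus
    data object Ready : ModelStatus
    data class Failure(val message: String) : ModelStatus
}

/** The input bar is usable only when the engine is ready and no response is streaming. */
fun ModelStatus.inputEnabled(isGenerating: Boolean): Boolean =
    this is ModelStatus.Ready && !isGenerating

/** Exhaustive `when`: adding a new state breaks compilation until it is handled here. */
fun ModelStatus.statusText(): String = when (this) {
    ModelStatus.Absent -> "Model not found"
    is ModelStatus.Transferring -> "$label ${(progress * 100).roundToInt()}%"
    ModelStatus.Initializing -> "Loading model"
    ModelStatus.Ready -> "Ready"
    is ModelStatus.Failure -> "Error: $message"
}
```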
characters_rag.db is a pre-built SQLite database shipped inside assets/. It contains a character_knowledge table with:
| Column | Type | Description |
|---|---|---|
| `content` | TEXT | A knowledge chunk (sentence or paragraph) |
| `character_id` | TEXT | `"sherlock_holmes"` or `"marcus_aurelius"` |
| `embedding` | BLOB | 384-float vector (little-endian IEEE 754) |
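Converting between a `FloatArray` and the BLOB layout described in the table takes a few lines with `java.nio.ByteBuffer`. The sketch is illustrative, and `floatsToBlob` and `blobToFloats` are hypothetical names. It only assumes the stated layout: 4 bytes per float, little-endian IEEE 754.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

/** Packs a vector into a little-endian float32 BLOB (4 bytes per element). */
fun floatsToBlob(v: FloatArray): ByteArray {
    val buf = ByteBuffer.allocate(v.size * 4).order(ByteOrder.LITTLE_ENDIAN)
    buf.asFloatBuffer().put(v) // the float view inherits the buffer's little-endian order
    return buf.array()
}

/** Unpacks a little-endian float32 BLOB back into a vector. */
fun blobToFloats(blob: ByteArray): FloatArray {
    require(blob.size % 4 == 0) { "BLOB length must be a multiple of 4" }
    val floats = ByteBuffer.wrap(blob).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer()
    return FloatArray(floats.remaining()).also { floats.get(it) }
}
```

For a 384-dimensional embedding this yields a 1,536-byte BLOB.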
The `sqlite-vec` virtual table extension enables fast KNN (k-nearest-neighbor) vector queries directly in SQLite, without any external vector store.
- Android Studio Meerkat or newer
- NDK installed (the build requires CMake 3.22+ to compile the native sqlite-vec layer)
- A device or emulator with an `arm64-v8a` or `x86_64` ABI
```bash
git clone <repo-url>
cd AIChatbot
./gradlew installDebug
```

On first launch the app will prompt you to either:
- Download automatically: requires ~1 GB of data (Wi-Fi recommended)
- Import manually: download `gemma3-1b-it-int4.litertlm` from HuggingFace (`litert-community/Gemma3-1B-IT`) and pick the file from device storage
| File | Role |
|---|---|
| `BertTokenizer.kt` | WordPiece tokenizer for all-MiniLM-L6-v2 |
| `EmbeddingModel.kt` | ONNX session wrapper, mean pool + L2 norm |
| `VectorDatabase.kt` | Asset DB copy + JNI bridge to sqlite-vec |
| `vector_search.cpp` | Native KNN query via sqlite3 + sqlite-vec |
| `ChatRepositoryImpl.kt` | LiteRT LM engine lifecycle + streaming |
| `RagRepositoryImpl.kt` | Orchestrates embed → search → context |
| `SendMessageUseCase.kt` | Builds the RAG-augmented prompt |
| `ChatViewModel.kt` | MVI state machine + token streaming |
| `AppNavGraph.kt` | Navigation graph (character selection → chat) |