Merged
11 changes: 11 additions & 0 deletions docs/edge_ai/chat_with_llm/1_chat_request.md
@@ -0,0 +1,11 @@
# Exercise 01: Simple Chat Request

During the workshop, we will be using Gemma 3 1B as our language model. The models are deployed using llama.cpp, which exposes an OpenAI-compatible API on port 8080.

We have defined the necessary structs to interact with the model API.

A chat request consists of the model name, an array of messages and optionally tools and response format.

A message consists of the role (user, assistant, system) and the content.
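As a rough illustration of the request shape described above, here is a minimal sketch that uses only the standard library. The names `Message`, `ChatRequest`, `escape_json` and `to_json` are hypothetical and are not the structs defined in the workshop code; the optional tools and response format fields are left out.

```rust
/// Hypothetical mirror of one chat message: a role plus its text content.
struct Message {
    role: String, // "system", "user" or "assistant"
    content: String,
}

/// Hypothetical mirror of a chat request, without the optional fields.
struct ChatRequest {
    model: String,
    messages: Vec<Message>,
}

/// Escape the characters that would break a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl ChatRequest {
    /// Serialize by hand so that the sketch needs no external crates.
    fn to_json(&self) -> String {
        let messages: Vec<String> = self
            .messages
            .iter()
            .map(|m| {
                format!(
                    r#"{{"role":"{}","content":"{}"}}"#,
                    escape_json(&m.role),
                    escape_json(&m.content)
                )
            })
            .collect();
        format!(
            r#"{{"model":"{}","messages":[{}]}}"#,
            escape_json(&self.model),
            messages.join(",")
        )
    }
}
```

The resulting string is the kind of JSON body that would be sent to the chat endpoint on port 8080. In a real application a serialization crate would normally replace the hand-written `to_json`.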

Complete TODO 1 to implement the chat interaction logic.
21 changes: 21 additions & 0 deletions docs/edge_ai/chat_with_llm/2_RAG.md
@@ -0,0 +1,21 @@
# Exercise 02: Retrieval-Augmented Generation (RAG)

In this section, we will implement a RAG system that combines the language model with a document retrieval system.

The embeddings model is also deployed using llama.cpp and exposes a slightly different API on port 8081.

A RAG system is implemented as follows:
1. Calculate embeddings for the documents in the knowledge base.
2. Store the document embeddings in a vector database (for simplicity, we will use an in-memory vector store).
3. Calculate the embedding of the user query.
4. Retrieve the most similar documents from the knowledge base by comparing the query embedding with the stored embeddings, using a metric such as cosine similarity.
5. Pass the retrieved documents as context to the language model and generate a response.
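
The storage and retrieval steps above can be sketched with the standard library alone. This is a minimal illustration, not the workshop solution: `VectorStore`, `add` and `top_k` are hypothetical names, and the embeddings are assumed to have already been computed by the embeddings model.

```rust
/// In-memory vector store: each entry pairs a document with its embedding.
struct VectorStore {
    entries: Vec<(String, Vec<f32>)>,
}

/// Cosine similarity between two vectors.
/// Returns 0.0 if the lengths differ or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

impl VectorStore {
    fn new() -> Self {
        VectorStore { entries: Vec::new() }
    }

    /// Store a document together with its precomputed embedding.
    fn add(&mut self, text: &str, embedding: Vec<f32>) {
        self.entries.push((text.to_string(), embedding));
    }

    /// Return the `k` documents most similar to the query embedding,
    /// sorted from most to least similar.
    fn top_k(&self, query: &[f32], k: usize) -> Vec<(&str, f32)> {
        let mut scored: Vec<(&str, f32)> = self
            .entries
            .iter()
            .map(|(text, emb)| (text.as_str(), cosine_similarity(query, emb)))
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}
```

A linear scan over all entries is enough for a handful of documents; a dedicated vector database becomes worthwhile once the knowledge base grows.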

Here are some example documents that you can add to the knowledge base and ask questions about:

1. The secret code to access the project is 'quantum_leap_42'.
2. Alice is the lead engineer for the new 'Orion' feature.
3. The project deadline has been moved to next Friday.


For this exercise, solve TODO 2 to implement the document retrieval logic.
33 changes: 33 additions & 0 deletions docs/edge_ai/chat_with_llm/3_structured_outputs.md
@@ -0,0 +1,33 @@
# Exercise 03: Structured Outputs

Structured outputs are a way to constrain the format of the model's responses so that they can be parsed by other systems. Information extraction is a common use case for structured outputs, where the model is asked to extract specific information from a given text.

Structured outputs are defined by a JSON Schema that describes the structure of the expected output.

The schema is passed in the API request in the `response_format` field. An example schema for extracting the city from a given text looks like this:

```json
{
  "type": "json_schema",
  "json_schema": {
    "name": "example_schema",
    "schema": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string"
        }
      }
    }
  }
}
```

In the background, llama.cpp parses this schema and creates a GBNF grammar that guides the model's response generation. More information is available in the [llama.cpp documentation](https://github.com/ggml-org/llama.cpp/tree/master/grammars).

Keep in mind that using structured outputs can degrade the performance of LLMs, as shown by [Tam et al.](https://arxiv.org/abs/2408.02442)

For this exercise, solve TODO 3 to extract the name, city and age of a user from a given text.

Here's an example prompt you can use to test your implementation:
```John is a 25-year-old software engineer living in New York.```
48 changes: 48 additions & 0 deletions docs/edge_ai/chat_with_llm/4_tool_calling.md
@@ -0,0 +1,48 @@
# Exercise 04: Tool Calling

LLMs are very good at generating text, but they are much less reliable at tasks that require exact answers, such as calculations. Try asking the model to calculate the sum of two numbers greater than 10,000, and you will see that it often makes mistakes.
These weaknesses can be mitigated by using tools, which are functions that the model can call to perform specific tasks.

Tool calling is a technique that builds on structured outputs: each tool is defined with a JSON Schema that describes its name and parameters. It allows the user to define functions that the language model can call and that are executed during the conversation.

A tool for calculating the sum of two numbers might look like this:

```json
[
  {
    "type": "function",
    "function": {
      "name": "add",
      "description": "Add two numbers.",
      "parameters": {
        "type": "object",
        "properties": {
          "num1": {
            "type": "integer",
            "description": "The first number."
          },
          "num2": {
            "type": "integer",
            "description": "The second number."
          }
        },
        "required": [
          "num1",
          "num2"
        ]
      }
    }
  }
]
```
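
Once the model requests a tool, the application has to run the matching function and return the result. A minimal sketch of that dispatch step for the `add` tool above could look like the following. `AddArgs` and `execute_tool` are hypothetical names, and the arguments are assumed to have already been parsed from the JSON that the model produced.

```rust
/// Arguments of the `add` tool, matching the `num1` and `num2`
/// properties declared in the tool schema.
struct AddArgs {
    num1: i64,
    num2: i64,
}

/// Hypothetical dispatcher: run the tool requested by the model and
/// return its result as text, or an error message if the call is invalid.
fn execute_tool(name: &str, args: &AddArgs) -> Result<String, String> {
    match name {
        // checked_add reports overflow instead of wrapping or panicking.
        "add" => args
            .num1
            .checked_add(args.num2)
            .map(|sum| sum.to_string())
            .ok_or_else(|| "integer overflow".to_string()),
        other => Err(format!("unknown tool: {other}")),
    }
}
```

Returning an error string for an unknown tool name, instead of panicking, lets the application report the problem back to the model and continue the conversation.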

In this exercise, solve TODO 4 to implement a tool that calculates mathematical operations (add, subtract, multiply, divide) between two numbers.


### 5. Extra
Congratulations, you implemented a basic agent! If you want to extend it, you can try these other options:
1. Replace the in-memory RAG implementation with a proper vector database (e.g. Qdrant).
2. Add more tools for the agent to use - e.g. a web search tool, a bash file finding tool, etc.
3. Try to extract data from other types of documents (e.g. logs) or use other data types of [JSON Schema](https://json-schema.org/understanding-json-schema/reference/type) (e.g. arrays, enums).
48 changes: 48 additions & 0 deletions docs/edge_ai/chat_with_llm/index.md
@@ -0,0 +1,48 @@
---
position: 3
---
# Chat With LLM
The Chat with LLM workshop will guide you through four essential techniques used for interacting with LLMs:
* Simple chat request
* RAG
* Structured outputs
* Tool calling

The application runs in the CLI and waits for a user prompt. The user then selects one of the available techniques for interacting with the LLM, and the model responds. The messages of the conversation are stored in memory, and the application keeps running until the user types "exit".
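
The loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `run_session` is a hypothetical name, the `respond` closure stands in for the actual call to the model, and the technique selection step is omitted.

```rust
use std::io::BufRead;

/// Read prompts line by line until the user types "exit", keeping the
/// conversation in memory as (role, content) pairs.
/// `respond` is a hypothetical stand-in for the call to the model.
fn run_session<R: BufRead>(
    input: R,
    respond: impl Fn(&str) -> String,
) -> Vec<(String, String)> {
    let mut history = Vec::new();
    for line in input.lines() {
        // Stop on a read error as well as on "exit".
        let line = match line {
            Ok(l) => l,
            Err(_) => break,
        };
        let prompt = line.trim();
        if prompt == "exit" {
            break;
        }
        if prompt.is_empty() {
            continue;
        }
        let reply = respond(prompt);
        history.push(("user".to_string(), prompt.to_string()));
        history.push(("assistant".to_string(), reply));
    }
    history
}
```

Taking any `BufRead` instead of reading `stdin` directly makes the loop easy to exercise with an in-memory buffer; in the CLI it would be called with a locked standard input.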

## Slides

<iframe src="/pdf/edgeai/2_llm_rustworkshop.pdf" loading="lazy" width="700" height="400">
Not able to display the slides
</iframe>

<a href="/pdf/edgeai/2_llm_rustworkshop.pdf" target="_blank">download the slides</a>.

## Quick Start

### Prerequisites
The following are already installed on the Raspberry Pi:
* [Cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html)
* [Llama.cpp](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#cpu-build)

### Deploying the models
```bash
llama-server --embeddings --hf-repo second-state/All-MiniLM-L6-v2-Embedding-GGUF --hf-file all-MiniLM-L6-v2-ggml-model-f16.gguf --port 8081 # embeddings model available on localhost:8081
llama-server --jinja --hf-repo MaziyarPanahi/gemma-3-1b-it-GGUF --hf-file gemma-3-1b-it.Q5_K_M.gguf # llm available on localhost:8080
```

## Repository

Please clone the repository.

```bash
git clone https://github.com/Wyliodrin/edge-ai-chat-with-llm.git
cd edge-ai-chat-with-llm
```

## Workshop
You will be working inside the `workshop.rs` file. The full implementation is available in the `full_demo.rs` file, in case you get stuck.
In order to run the workshop, execute:
```bash
RUST_LOG=info cargo run --bin workshop
```
119 changes: 119 additions & 0 deletions docs/edge_ai/face_authentication/1_image_processing.md
@@ -0,0 +1,119 @@
# Exercise 01. Image Processing and Normalization

## Overview

This exercise teaches you how to properly preprocess images for computer vision models, specifically focusing on ImageNet normalization. You'll implement the `image_with_std_mean` function that transforms raw images into model-ready tensors.

## Understanding Tensors and Image Processing

### What is a Tensor?

A **tensor** is a multi-dimensional array that serves as the fundamental data structure in machine learning. Think of it as:

- **1D tensor**: A vector (like `[1, 2, 3, 4]`)
- **2D tensor**: A matrix (like a spreadsheet with rows and columns)
- **3D tensor**: A cube of data (like our image with height × width × channels)
- **4D tensor**: A batch of 3D tensors (multiple images)

For images, we use **3D tensors** with dimensions:
- **Channels**: Color information (3 for RGB: Red, Green, Blue)
- **Height**: Number of pixel rows
- **Width**: Number of pixel columns

ConvNeXt expects tensors in **"channels-first"** format: `(channels, height, width)` rather than `(height, width, channels)`.
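
To make the difference between the two layouts concrete, here is a small standard-library sketch that reorders an interleaved RGB buffer into channels-first order. The function name `hwc_to_chw` is hypothetical; in the exercise itself the tensor library's own dimension reordering would normally do this work.

```rust
/// Reorder an interleaved RGB buffer laid out as (height, width, channels)
/// into channels-first order (channels, height, width).
fn hwc_to_chw(pixels: &[u8], height: usize, width: usize) -> Vec<u8> {
    const CHANNELS: usize = 3;
    assert_eq!(pixels.len(), height * width * CHANNELS);
    let mut out = vec![0u8; pixels.len()];
    for y in 0..height {
        for x in 0..width {
            for c in 0..CHANNELS {
                // Source: the channels of one pixel sit next to each other.
                let src = (y * width + x) * CHANNELS + c;
                // Destination: each channel forms its own contiguous plane.
                let dst = c * height * width + y * width + x;
                out[dst] = pixels[src];
            }
        }
    }
    out
}
```

For a 1×2 image with pixels `(1, 2, 3)` and `(4, 5, 6)`, the interleaved buffer `[1, 2, 3, 4, 5, 6]` becomes `[1, 4, 2, 5, 3, 6]`: all red values first, then green, then blue.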

### What is Normalization?

**Normalization** transforms data to have consistent statistical properties. For images, we perform two types:

1. **Scale Normalization**: Convert pixel values from `[0-255]` to `[0-1]` by dividing by 255
2. **Statistical Normalization**: Transform to have zero mean and unit variance using: `(value - mean) / standard_deviation`

### Why Use Mean and Standard Deviation?

The **ImageNet mean and standard deviation** values aren't arbitrary - they're computed from millions of natural images:

- **Mean `[0.485, 0.456, 0.406]`**: Average pixel values across Red, Green, Blue channels
- **Std `[0.229, 0.224, 0.225]`**: Standard deviation for each channel
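
As a worked example of the two normalization steps with these constants, the sketch below normalizes a single RGB pixel using plain arrays. The names `IMAGENET_MEAN`, `IMAGENET_STD` and `normalize_pixel` are chosen for illustration; the exercise applies the same arithmetic to a whole tensor.

```rust
const IMAGENET_MEAN: [f32; 3] = [0.485, 0.456, 0.406];
const IMAGENET_STD: [f32; 3] = [0.229, 0.224, 0.225];

/// Normalize one RGB pixel: scale each channel to [0, 1], then
/// subtract the channel mean and divide by the channel std.
fn normalize_pixel(rgb: [u8; 3], mean: &[f32; 3], std: &[f32; 3]) -> [f32; 3] {
    let mut out = [0.0f32; 3];
    for c in 0..3 {
        out[c] = (rgb[c] as f32 / 255.0 - mean[c]) / std[c];
    }
    out
}
```

A black pixel `[0, 0, 0]` gives about `-2.12` in the red channel, and a white pixel `[255, 255, 255]` gives `2.64` in the blue channel; these two values bound the range of every normalized image.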

**Why these specific values matter for ConvNeXt:**

1. **Distribution Matching**: ConvNeXt was trained on ImageNet data with these exact statistics. Using different values would be like speaking a different language to the model.

2. **Zero-Centered Data**: Subtracting the mean centers pixel values around zero, which helps neural networks learn faster and more stably.

3. **Unit Variance**: Dividing by standard deviation ensures all channels contribute equally to learning, preventing one color channel from dominating.

4. **Gradient Flow**: Normalized inputs lead to better gradient flow during training, preventing vanishing or exploding gradients.

## Why ImageNet Normalization is Critical for ConvNeXt

**ImageNet normalization is essential for four key reasons:**

1. **Neural Network Stability**: Raw pixel values (0-255) are too large and cause training instability. Normalizing to smaller ranges helps gradients flow properly during backpropagation.

2. **Pre-trained Model Compatibility**: ConvNeXt models are trained on ImageNet-normalized data. Using the same normalization ensures your input matches what the model expects - like using the same units of measurement.

3. **Feature Standardization**: Different color channels have different statistical distributions in natural images. Per-channel normalization gives equal importance to all color information.

4. **Mathematical Optimization**: The normalization formula `(pixel/255 - mean) / std` transforms arbitrary pixel values into a standardized range that neural networks can process efficiently.

**Without proper normalization, ConvNeXt will produce poor results** because the input distribution doesn't match its training data - imagine trying to use a thermometer calibrated in Celsius to read Fahrenheit temperatures!

## Your Task

Implement the `image_with_std_mean` function that:

1. **Resizes** the input image to the specified resolution using Triangle filtering
2. **Converts** to RGB8 format to ensure consistent color channels
3. **Creates** a tensor with shape `(3, height, width)` - channels first format
4. **Normalizes** pixel values from [0-255] to [0-1] range
5. **Applies** ImageNet standardization: `(pixel/255 - mean) / std`

## Implementation Steps

```rust
pub fn image_with_std_mean(
    img: &DynamicImage,
    res: usize,
    mean: &[f32; 3],
    std: &[f32; 3],
) -> Result<Tensor>
```

### Implementation Approach:

1. **Resize Image**: Use appropriate image resizing methods
2. **Convert Format**: Ensure consistent color channel format
3. **Extract Data**: Get raw pixel data from the image
4. **Create Tensor**: Build tensor with correct shape and dimensions
5. **Normalize**: Apply scaling and ImageNet standardization

### Key Operations Needed:
- Image resizing and format conversion
- Tensor creation from raw data
- Dimension reordering (channels-first format)
- Mathematical operations for normalization
- Broadcasting for per-channel operations

**Hint**: Check the CHEATSHEET.md for specific API calls and tensor operations.

## Testing

The test verifies that:
- Tensor values are in the expected normalized range (approximately [-2.5, 2.5])
- Values are actually normalized (not just zeros or ones)
- The transformation follows ImageNet standards

Run the test with:
```bash
cargo test
```

## Expected Output Format

- **Input**: DynamicImage of any size
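- **Output**: Tensor with shape `(3, res, res)`, for example `(3, 224, 224)` when `res` is 224, holding ImageNet-normalized values
- **Value Range**: Approximately [-2.12, 2.64] based on ImageNet constants: the minimum is `(0 - 0.485) / 0.229` (red channel) and the maximum is `(1 - 0.406) / 0.225` (blue channel)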

This preprocessing step is crucial for the face authentication pipeline, as it ensures images are in the exact format expected by the ConvNeXt model in the next exercise.