diff --git a/docs/edge_ai/chat_with_llm/1_chat_request.md b/docs/edge_ai/chat_with_llm/1_chat_request.md new file mode 100644 index 0000000..946bd2b --- /dev/null +++ b/docs/edge_ai/chat_with_llm/1_chat_request.md @@ -0,0 +1,11 @@ +# Exercise 01: Simple Chat Request + +During the workshop, we will be using Gemma 3 1B as our language model. The models are deployed using llama.cpp, which exposes an OpenAI-compatible API on port 8080. + +We have defined the necessary structs to interact with the model API. + +A chat request consists of the model name, an array of messages, and optionally tools and a response format. + +A message consists of the role (user, assistant, system) and the content. + +Complete TODO 1 to implement the chat interaction logic. diff --git a/docs/edge_ai/chat_with_llm/2_RAG.md b/docs/edge_ai/chat_with_llm/2_RAG.md new file mode 100644 index 0000000..9b19226 --- /dev/null +++ b/docs/edge_ai/chat_with_llm/2_RAG.md @@ -0,0 +1,21 @@ +# Exercise 02: Retrieval-Augmented Generation (RAG) + +In this section, we will implement a RAG system that combines the language model with a document retrieval system. + +The embeddings model is also deployed using llama.cpp and exposes a slightly different API on port 8081. + +A RAG system is implemented as follows: +1. Calculate embeddings for the documents inside the knowledge base. +2. Store the document embeddings in a vector database (for simplicity, we will use an in-memory vector store). +3. Calculate the embedding of the user query. +4. Get the most similar documents from the knowledge base using the query embedding, with a metric such as cosine similarity. +5. Pass the retrieved documents as context to the language model and generate a response. + +Here are some example documents that you can add to the database and then ask questions about: + +1. The secret code to access the project is 'quantum_leap_42'. +2. Alice is the lead engineer for the new 'Orion' feature. +3. The project deadline has been moved to next Friday. 
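The retrieval step (4) reduces to ranking stored documents by cosine similarity against the query embedding. A minimal, std-only sketch of that ranking (the names and types here are illustrative, not the workshop's actual structs):

```rust
/// Cosine similarity between two equal-length vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

/// Return the documents most similar to the query embedding, best first.
fn retrieve<'a>(store: &'a [(String, Vec<f32>)], query: &[f32], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(&str, f32)> = store
        .iter()
        .map(|(doc, emb)| (doc.as_str(), cosine_similarity(emb, query)))
        .collect();
    // Sort descending by similarity; NaN-safe via partial_cmp fallback.
    scored.sort_by(|x, y| y.1.partial_cmp(&x.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.into_iter().take(k).map(|(doc, _)| doc).collect()
}
```

The real exercise uses the embeddings returned by the llama.cpp server on port 8081 instead of hand-made vectors, but the ranking logic is the same.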
+ + +For this exercise, solve TODO 2 to implement the document retrieval logic. \ No newline at end of file diff --git a/docs/edge_ai/chat_with_llm/3_structured_outputs.md b/docs/edge_ai/chat_with_llm/3_structured_outputs.md new file mode 100644 index 0000000..96699c1 --- /dev/null +++ b/docs/edge_ai/chat_with_llm/3_structured_outputs.md @@ -0,0 +1,33 @@ +# Exercise 03: Structured Outputs + +Structured outputs are a way to format the model's responses so that they can be parsed by other systems. Information extraction is a common use case for structured outputs, where the model is asked to extract specific information from a given text. + +Structured outputs are defined by a JSON Schema that describes the structure of the expected output. + +The schema is passed in the API request in the `response_format` field. An example schema for extracting the city from a given text looks like this: + +```json +{ + "type": "json_schema", + "json_schema": { + "name": "example_schema", + "schema": { + "type": "object", + "properties": { + "city": { + "type": "string" + } + } + } + } +} +``` + +In the background, llama.cpp parses this schema and creates a GBNF grammar that guides the model's response generation. More information is available in the [llama.cpp documentation](https://github.com/ggml-org/llama.cpp/tree/master/grammars). + +Keep in mind that using structured outputs can degrade the performance of LLMs, as shown by [Tam et al.](https://arxiv.org/abs/2408.02442) + +For this exercise, solve TODO 3 in order to extract the name, city, and age of a user from a given text. 
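Such a schema could extend the city example above with two more properties (the schema name and field names are an assumption about what TODO 3 expects):

```json
{
  "type": "json_schema",
  "json_schema": {
    "name": "user_info",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "city": { "type": "string" },
        "age": { "type": "integer" }
      },
      "required": ["name", "city", "age"]
    }
  }
}
```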
+ +Here's an example prompt you can use to test your implementation: +```John is a 25-year-old software engineer living in New York.``` \ No newline at end of file diff --git a/docs/edge_ai/chat_with_llm/4_tool_calling.md b/docs/edge_ai/chat_with_llm/4_tool_calling.md new file mode 100644 index 0000000..9abe02f --- /dev/null +++ b/docs/edge_ai/chat_with_llm/4_tool_calling.md @@ -0,0 +1,48 @@ +# Exercise 04: Tool Calling + +LLMs are very good at generating text, but they are not very good at performing tasks that require letter-perfect accuracy, such as calculations. Try asking the model to calculate the sum of two numbers larger than 10000, and you will see that it often makes mistakes. +These weaknesses can be mitigated by using tools, which are functions that can be called by the model to perform specific tasks. + +Tool calling is a technique that builds on structured outputs. It allows the user to define functions that can be called by the language model and executed during the conversation. + +Under the hood, tool definitions are themselves described by a JSON Schema. + +A tool for calculating the sum of two numbers might look like this: + +```json +[ + { + "type": "function", + "function": { + "name": "add", + "description": "Add two numbers.", + "parameters": { + "type": "object", + "properties": { + "num1": { + "type": "integer", + "description": "The first number." + }, + "num2": { + "type": "integer", + "description": "The second number." + } + }, + "required": [ + "num1", + "num2" + ] + } + } + } +] +``` + +In this exercise, solve TODO 4 to implement a tool that calculates mathematical operations (add, subtract, multiply, divide) between two numbers. + + +### 5. Extra +Congratulations, you have implemented a basic agent! If you want to extend it, you can try these other options: +1. Replace the in-memory RAG implementation with a proper vector database (e.g. Qdrant). +2. Add more tools for the agent to use - e.g. 
a web search tool, a bash file-finding tool, etc. +3. Try to extract data from other types of documents (e.g. logs) or use other data types from [JSON Schema](https://json-schema.org/understanding-json-schema/reference/type) (e.g. arrays, enums). \ No newline at end of file diff --git a/docs/edge_ai/chat_with_llm/index.md b/docs/edge_ai/chat_with_llm/index.md new file mode 100644 index 0000000..3b7b1c3 --- /dev/null +++ b/docs/edge_ai/chat_with_llm/index.md @@ -0,0 +1,48 @@ +--- +position: 3 +--- +# Chat With LLM +The Chat with LLM workshop will guide you through four essential techniques used for interacting with LLMs: +* Simple chat request +* RAG +* Structured outputs +* Tool calling + +The application runs in the CLI and expects a user prompt. The user then selects one of the available techniques to interact with the LLM, and the model responds. The messages inside the conversation are stored in memory. The application will keep running until the user types "exit". + +## Slides + + + +Download the slides. + +## Quick Start + +### Prerequisites +The following are already installed on the Raspberry Pi: +* [Cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) +* [Llama.cpp](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#cpu-build) + +### Deploying the models +```bash +llama-server --embeddings --hf-repo second-state/All-MiniLM-L6-v2-Embedding-GGUF --hf-file all-MiniLM-L6-v2-ggml-model-f16.gguf --port 8081 # embeddings model available on localhost:8081 +llama-server --jinja --hf-repo MaziyarPanahi/gemma-3-1b-it-GGUF --hf-file gemma-3-1b-it.Q5_K_M.gguf # llm available on localhost:8080 +``` + +## Repository + +Please clone the repository. + +```bash +git clone https://github.com/Wyliodrin/edge-ai-chat-with-llm.git +cd edge-ai-chat-with-llm +``` + +## Workshop +You will be working inside the `workshop.rs` file. The full implementation is available in the `full_demo.rs` file, in case you get stuck. 
+In order to run the workshop, execute: +```bash +RUST_LOG=info cargo run --bin workshop +``` \ No newline at end of file diff --git a/docs/edge_ai/face_authentication/1_image_processing.md b/docs/edge_ai/face_authentication/1_image_processing.md new file mode 100644 index 0000000..a6ced24 --- /dev/null +++ b/docs/edge_ai/face_authentication/1_image_processing.md @@ -0,0 +1,119 @@ +# Exercise 01. Image Processing and Normalization + +## Overview + +This exercise teaches you how to properly preprocess images for computer vision models, specifically focusing on ImageNet normalization. You'll implement the `image_with_std_mean` function that transforms raw images into model-ready tensors. + +## Understanding Tensors and Image Processing + +### What is a Tensor? + +A **tensor** is a multi-dimensional array that serves as the fundamental data structure in machine learning. Think of it as: + +- **1D tensor**: A vector (like `[1, 2, 3, 4]`) +- **2D tensor**: A matrix (like a spreadsheet with rows and columns) +- **3D tensor**: A cube of data (like our image with height × width × channels) +- **4D tensor**: A batch of 3D tensors (multiple images) + +For images, we use **3D tensors** with dimensions: +- **Channels**: Color information (3 for RGB: Red, Green, Blue) +- **Height**: Number of pixel rows +- **Width**: Number of pixel columns + +ConvNeXt expects tensors in **"channels-first"** format: `(channels, height, width)` rather than `(height, width, channels)`. + +### What is Normalization? + +**Normalization** transforms data to have consistent statistical properties. For images, we perform two types: + +1. **Scale Normalization**: Convert pixel values from `[0-255]` to `[0-1]` by dividing by 255 +2. **Statistical Normalization**: Transform to have zero mean and unit variance using: `(value - mean) / standard_deviation` + +### Why Use Mean and Standard Deviation? 
+ +The **ImageNet mean and standard deviation** values aren't arbitrary - they're computed from millions of natural images: + +- **Mean `[0.485, 0.456, 0.406]`**: Average pixel values across Red, Green, Blue channels +- **Std `[0.229, 0.224, 0.225]`**: Standard deviation for each channel + +**Why these specific values matter for ConvNeXt:** + +1. **Distribution Matching**: ConvNeXt was trained on ImageNet data with these exact statistics. Using different values would be like speaking a different language to the model. + +2. **Zero-Centered Data**: Subtracting the mean centers pixel values around zero, which helps neural networks learn faster and more stably. + +3. **Unit Variance**: Dividing by standard deviation ensures all channels contribute equally to learning, preventing one color channel from dominating. + +4. **Gradient Flow**: Normalized inputs lead to better gradient flow during training, preventing vanishing or exploding gradients. + +## Why ImageNet Normalization is Critical for ConvNeXt + +**ImageNet normalization is essential for four key reasons:** + +1. **Neural Network Stability**: Raw pixel values (0-255) are too large and cause training instability. Normalizing to smaller ranges helps gradients flow properly during backpropagation. + +2. **Pre-trained Model Compatibility**: ConvNeXt models are trained on ImageNet-normalized data. Using the same normalization ensures your input matches what the model expects - like using the same units of measurement. + +3. **Feature Standardization**: Different color channels have different statistical distributions in natural images. Per-channel normalization gives equal importance to all color information. + +4. **Mathematical Optimization**: The normalization formula `(pixel/255 - mean) / std` transforms arbitrary pixel values into a standardized range that neural networks can process efficiently. 
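To make the formula concrete, here is the arithmetic for a few pixel values (plain Rust; the mean/std constants are the ImageNet values quoted above):

```rust
/// ImageNet normalization of one channel value: (pixel/255 - mean) / std.
fn normalize(pixel: u8, mean: f32, std: f32) -> f32 {
    (pixel as f32 / 255.0 - mean) / std
}
```

For the red channel, a mid-gray pixel of 128 maps to roughly 0.074, i.e. close to zero after centering. The extremes 0 (red channel) and 255 (blue channel) give about -2.12 and 2.64 respectively, which is where the output range quoted in this exercise comes from.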
+ +**Without proper normalization, ConvNeXt will produce poor results** because the input distribution doesn't match its training data - imagine trying to use a thermometer calibrated in Celsius to read Fahrenheit temperatures! + +## Your Task + +Implement the `image_with_std_mean` function that: + +1. **Resizes** the input image to the specified resolution using Triangle filtering +2. **Converts** to RGB8 format to ensure consistent color channels +3. **Creates** a tensor with shape `(3, height, width)` - channels first format +4. **Normalizes** pixel values from [0-255] to [0-1] range +5. **Applies** ImageNet standardization: `(pixel/255 - mean) / std` + +## Implementation Steps + +```rust +pub fn image_with_std_mean( + img: &DynamicImage, + res: usize, + mean: &[f32; 3], + std: &[f32; 3], +) -> Result<Tensor> +``` + +### Implementation Approach: + +1. **Resize Image**: Use appropriate image resizing methods +2. **Convert Format**: Ensure consistent color channel format +3. **Extract Data**: Get raw pixel data from the image +4. **Create Tensor**: Build tensor with correct shape and dimensions +5. **Normalize**: Apply scaling and ImageNet standardization + +### Key Operations Needed: +- Image resizing and format conversion +- Tensor creation from raw data +- Dimension reordering (channels-first format) +- Mathematical operations for normalization +- Broadcasting for per-channel operations + +**Hint**: Check the CHEATSHEET.md for specific API calls and tensor operations. 
+ +## Testing + +The test verifies that: +- Tensor values are in the expected normalized range (approximately [-2.5, 2.5]) +- Values are actually normalized (not just zeros or ones) +- The transformation follows ImageNet standards + +Run the test with: +```bash +cargo test +``` + +## Expected Output Format + +- **Input**: DynamicImage of any size +- **Output**: Tensor with shape `(3, 224, 224)` and ImageNet-normalized values +- **Value Range**: Approximately [-2.12, 2.64] based on ImageNet constants + +This preprocessing step is crucial for the face authentication pipeline, as it ensures images are in the exact format expected by the ConvNeXt model in the next exercise. diff --git a/docs/edge_ai/face_authentication/2_embeddings.md b/docs/edge_ai/face_authentication/2_embeddings.md new file mode 100644 index 0000000..1e50961 --- /dev/null +++ b/docs/edge_ai/face_authentication/2_embeddings.md @@ -0,0 +1,139 @@ +# Exercise 02. ConvNeXt Model and Embedding Generation + +## Overview + +This exercise teaches you how to load a pre-trained ConvNeXt model and use it to generate face embeddings. You'll implement two key functions: `build_model()` to load the model and `compute_embedding()` to generate feature vectors from facial images. + +## What is ConvNeXt? + +ConvNeXt (Convolution meets NeXt) is a modern convolutional neural network architecture that bridges the gap between traditional CNNs and Vision Transformers (ViTs). Introduced by Facebook AI Research in 2022, ConvNeXt modernizes the standard ResNet architecture by incorporating design choices inspired by Vision Transformers. 
### Key Features of ConvNeXt: +- **Pure Convolutional Architecture**: Uses only convolutions, no self-attention mechanisms +- **Modernized ResNet Design**: Incorporates macro and micro design choices from ViTs +- **Competitive Performance**: Achieves performance comparable to Swin Transformers +- **Efficiency**: Maintains the computational efficiency of traditional CNNs + +### ConvNeXt-Atto Variant: +We use **ConvNeXt-Atto**, an ultra-lightweight variant that provides excellent performance for face recognition tasks while being computationally efficient. + +## What are Face Embeddings? + +Embeddings are dense, low-dimensional vector representations that capture the essential characteristics of a face in numerical form. + +### Purpose of Face Embeddings: +1. **Dimensionality Reduction**: Convert 224×224×3 images (~150K pixels) to compact vectors (~320 dimensions) +2. **Feature Extraction**: Capture essential facial characteristics (eye shape, nose structure, etc.) +3. **Similarity Computation**: Enable mathematical comparison between different faces +4. **Efficient Storage**: Store compact representations instead of full images + +### Properties of Good Face Embeddings: +- **Discriminative**: Different people produce different embeddings +- **Robust**: Similar embeddings for the same person under different conditions +- **Compact**: Much smaller than original images +- **Comparable**: Can be compared using mathematical similarity metrics + +## Your Tasks + +### Task 1: Implement `build_model()` + +```rust +pub fn build_model() -> Result<Func<'static>> +``` + +This function should: +1. **Download Model**: Use Hugging Face Hub API to get "timm/convnext_atto.d2_in1k" +2. **Load Weights**: Load the SafeTensors model file +3. **Create Model**: Build ConvNeXt without the final classification layer +4. **Return Function**: Return a callable model function + +#### Why "Without Final Layer"? + +The original ConvNeXt model was trained for ImageNet classification (1000 classes). 
It has: +- **Feature Extraction Layers**: Extract meaningful patterns from images +- **Final Classification Layer**: Maps features to 1000 ImageNet class probabilities + +For face embeddings, we want: +- ✅ **Feature Extraction**: The rich feature representations (embeddings) +- ❌ **Classification**: We don't need ImageNet class predictions + +By removing the final layer, we get the raw feature vectors (embeddings) that capture facial characteristics, which we can then use for similarity comparison. In candle, this headless variant is built with `convnext::convnext_no_final_layer`. + + +#### Implementation Approach: +- Use Hugging Face Hub API for model download +- Load model weights with VarBuilder +- Create ConvNeXt architecture without classification head +- Return the model as a callable function + +**Hint**: Check the CHEATSHEET.md for HuggingFace API patterns and model loading. + +### Task 2: Implement `compute_embedding()` + +```rust +pub fn compute_embedding(model: &Func, image: &Tensor) -> Result<Tensor> +``` + +This function should: +1. **Handle Input Format**: Check if input is single image or batch +2. **Add Batch Dimension**: If needed, ensure proper tensor dimensions +3. **Forward Pass**: Run the image through the model +4. **Return Embeddings**: Return the feature vectors + +#### Implementation Approach: +- Check tensor dimensions to determine if batching is needed +- Ensure input tensor has the correct shape for the model +- Use the model's forward method to generate embeddings +- Return the resulting embedding tensor + +**Hint**: Models typically expect batch dimensions. Check the CHEATSHEET.md for tensor dimension handling. 
+ +## Technical Details + +### Model Architecture: +- **Input**: 224×224×3 RGB images (ImageNet normalized) +- **Output**: 320-dimensional embedding vectors +- **Weights**: Pre-trained on ImageNet dataset +- **Format**: SafeTensors for efficient loading + +### Tensor Shapes: +- **Single Image Input**: `[3, 224, 224]` → `[1, 3, 224, 224]` (add batch dim) +- **Batch Input**: `[N, 3, 224, 224]` → `[N, 3, 224, 224]` (keep as is) +- **Output**: `[N, 320]` where N is batch size + +### Key Dependencies: +- `hf_hub` - Download models from Hugging Face +- `candle_transformers::models::convnext` - ConvNeXt implementation +- `candle_nn::VarBuilder` - Load model weights + +## Testing + +The test verifies that: +- Model loads successfully from Hugging Face +- Embedding computation works with preprocessed images +- Output tensor has the correct batch dimension + +Run the test with: +```bash +cargo test +``` + +## Expected Behavior + +After successful implementation: +- `build_model()` downloads and loads the ConvNeXt-Atto model +- `compute_embedding()` processes images and returns 320-dimensional embeddings +- The model handles both single images and batches automatically + +## Next Steps + +After completing this exercise, you'll be ready to: +- Learn similarity computation between embeddings (Exercise 03) +- Understand how these embeddings enable face recognition +- Build storage systems for embedding databases (Exercise 04) + +## References + +- **ConvNeXt Paper**: [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) +- **Hugging Face Model**: [timm/convnext_atto.d2_in1k](https://huggingface.co/timm/convnext_atto.d2_in1k) +- **Candle ConvNeXt**: [GitHub Implementation](https://github.com/huggingface/candle/blob/main/candle-transformers/src/models/convnext.rs) diff --git a/docs/edge_ai/face_authentication/3_similarity.md b/docs/edge_ai/face_authentication/3_similarity.md new file mode 100644 index 0000000..9a3ffd0 --- /dev/null +++ 
b/docs/edge_ai/face_authentication/3_similarity.md @@ -0,0 +1,134 @@ +# Exercise 03: Cosine Similarity for Face Authentication + +## Overview + +This exercise teaches you how to compute cosine similarity between face embeddings - the core mathematical operation that enables face recognition. You'll implement L2 normalization and cosine similarity functions that determine whether two faces belong to the same person. + +## Why Cosine Similarity for Face Recognition? + +Cosine similarity is the gold standard for comparing face embeddings because it: + +- **Measures Direction, Not Magnitude**: Focuses on the "shape" of the embedding vector, not its size +- **Handles Lighting Variations**: Less sensitive to brightness changes that might scale embedding values +- **Provides Intuitive Scores**: Returns values between -1 and 1, where 1 means identical faces +- **Industry Standard**: Used by most production face recognition systems + +## Mathematical Foundation + +### L2 Normalization +**Formula**: `normalized_vector = vector / ||vector||₂` + +L2 normalization ensures all embeddings have unit length (magnitude = 1), which: +- **Standardizes Comparisons**: All vectors have the same magnitude +- **Improves Robustness**: Reduces sensitivity to lighting and scale variations +- **Enables Fair Comparison**: Focuses on directional relationships +- **Optimizes Similarity**: Makes cosine similarity equivalent to dot product + +### Cosine Similarity +**Formula**: `cosine_similarity = (A · B) / (||A|| × ||B||)` + +For normalized vectors, this simplifies to just the dot product: `A · B` + +**Key Properties**: +- **Range**: [-1, 1] where 1 = identical, 0 = orthogonal, -1 = opposite +- **Magnitude Invariant**: Only considers the angle between vectors +- **Symmetric**: similarity(A, B) = similarity(B, A) + +## Your Tasks + +### Task 1: Implement `normalize_l2()` + +```rust +fn normalize_l2(v: &Tensor) -> Result<Tensor> +``` + +This helper function should: +1. 
**Calculate L2 Norm**: Compute the magnitude of the vector +2. **Normalize**: Divide the vector by its norm to get unit length + +#### Implementation Approach: +- Use tensor operations to compute the L2 norm (square, sum, square root) +- Apply broadcasting division to normalize the vector +- Ensure dimensions are handled correctly for broadcasting + +**Hint**: Check the CHEATSHEET.md for L2 normalization building blocks. + +### Task 2: Implement `cosine_similarity()` + +```rust +pub fn cosine_similarity(emb_a: &Tensor, emb_b: &Tensor) -> Result<f32> +``` + +This function should: +1. **Normalize Both Embeddings**: Apply L2 normalization to both inputs +2. **Compute Dot Product**: Calculate the similarity using matrix operations +3. **Extract Scalar**: Convert the result tensor to a single f32 value + +#### Implementation Approach: +- Use your `normalize_l2` function on both input embeddings +- Perform matrix multiplication to compute the dot product +- Handle tensor dimensions and extract the final scalar value + +**Hint**: Check the CHEATSHEET.md for cosine similarity building blocks and tensor operations. 
+ +## Technical Details + +### Tensor Shapes: +- **Input Embeddings**: `[1, 320]` (batch size 1, 320 dimensions) +- **After Normalization**: `[1, 320]` (same shape, unit length) +- **After Matrix Multiply**: `[1, 1]` (scalar in tensor form) +- **Final Output**: `f32` scalar value + +### Key Candle Operations: +- `.sqr()` - Element-wise square +- `.sum_keepdim(1)` - Sum along dimension 1, keep the dimension +- `.sqrt()` - Element-wise square root +- `.broadcast_div()` - Element-wise division with broadcasting +- `.matmul()` - Matrix multiplication +- `.transpose(0, 1)` - Swap dimensions 0 and 1 +- `.squeeze()` - Remove dimensions of size 1 +- `.to_vec0::<f32>()` - Convert 0D tensor to scalar + +## Testing + +The test verifies that: +- Same person (brad1.png vs brad2.png) has higher similarity than different people +- The similarity computation works with real face embeddings +- Values are in the expected range + +Run the test with: +```bash +cargo test +``` + +## Understanding the Results + +### Typical Similarity Ranges: +- **Same Person**: 0.7 - 0.95 (high similarity) +- **Different People**: 0.2 - 0.6 (lower similarity) +- **Identical Images**: ~1.0 (perfect similarity) + +### Authentication Thresholds: +- **High Security**: 0.85+ (few false positives, some false negatives) +- **Balanced**: 0.75+ (good balance of security and usability) +- **High Accessibility**: 0.65+ (fewer false negatives, more false positives) + +## Real-World Considerations + +### Factors Affecting Similarity: +- **Lighting Conditions**: Dramatic lighting can reduce similarity +- **Facial Expressions**: Extreme expressions may lower scores +- **Image Quality**: Blurry or low-resolution images affect accuracy +- **Pose Variations**: Profile vs frontal views impact similarity + +## Next Steps + +After completing this exercise, you'll be ready to: +- Build storage systems for face embeddings (Exercise 04) +- Implement similarity search and retrieval (Exercise 05) + +## References + +- **Cosine 
Similarity**: [Wikipedia - Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) +- **Face Recognition Survey**: [Deep Face Recognition: A Survey](https://arxiv.org/abs/1804.06655) +- **L2 Normalization**: [Unit Vector Normalization](https://en.wikipedia.org/wiki/Unit_vector) diff --git a/docs/edge_ai/face_authentication/4_local_storage.md b/docs/edge_ai/face_authentication/4_local_storage.md new file mode 100644 index 0000000..3a98a76 --- /dev/null +++ b/docs/edge_ai/face_authentication/4_local_storage.md @@ -0,0 +1,216 @@ +# Exercise 04: Local File Storage for Face Embeddings + +## Overview + +This exercise teaches you how to build a persistent storage system for face embeddings using local JSON files. You'll implement a complete storage solution that can save, retrieve, and manage embedding records - essential for any face authentication system that needs to remember users between sessions. + +## Why Storage Matters + +Face authentication systems need persistent storage to: +- **Remember Users**: Store embeddings from registration for future login attempts +- **Enable Comparison**: Retrieve stored embeddings to compare against live captures +- **Manage Identities**: Track multiple embeddings per user for better accuracy +- **Persist Data**: Maintain user data across application restarts + +## Architecture Overview + +The storage system uses a trait-based design for flexibility: + +```rust +// Data structure for each stored embedding +pub struct EmbeddingRecord { + pub id: String, // Unique identifier + pub name: String, // User name + pub embedding: Vec<f32>, // Face embedding vector + pub created_at: chrono::DateTime<chrono::Utc>, // Timestamp + pub metadata: HashMap<String, String>, // Additional data +} + +// Storage interface (trait) +pub trait EmbeddingStorage { + fn store_embedding(&mut self, record: EmbeddingRecord) -> Result<()>; + fn get_embedding(&self, id: &str) -> Result<Option<EmbeddingRecord>>; + fn get_all_embeddings(&self) -> Result<Vec<EmbeddingRecord>>; + fn delete_embedding(&mut self, id: &str) -> Result<bool>; 
+} + +// Local file implementation +pub struct LocalFileStorage { + file_path: String, + data: Mutex<HashMap<String, EmbeddingRecord>>, +} +``` + +## Your Tasks + +### Task 1: Implement `LocalFileStorage::new()` + +```rust +pub fn new(file_path: String) -> Result<Self> +``` + +This constructor should: +1. **Create Storage Instance**: Initialize the struct with the file path +2. **Set Up In-Memory Cache**: Create a Mutex-protected HashMap for fast access +3. **Load Existing Data**: Call `load_data()` to read any existing records + +#### Implementation Approach: +- Initialize the struct fields with appropriate values +- Set up thread-safe data structures for concurrent access +- Load any existing data from the file system + +**Hint**: Check the CHEATSHEET.md for HashMap and Mutex patterns. + +### Task 2: Implement `load_data()` + +```rust +fn load_data(&self) -> Result<()> +``` + +This method should: +1. **Check File Existence**: Return early if file doesn't exist +2. **Handle Empty Files**: Deal gracefully with zero-byte files +3. **Parse JSON**: Deserialize the file content into a HashMap +4. **Update Cache**: Store the loaded data in the in-memory HashMap + +#### Implementation Approach: +- Use file system operations to check existence and size +- Handle JSON deserialization with proper error handling +- Update the in-memory cache with loaded data +- Use proper mutex locking for thread safety + +**Hint**: Check the CHEATSHEET.md for JSON deserialization patterns. + +### Task 3: Implement `save_data()` + +```rust +fn save_data(&self) -> Result<()> +``` + +This method should: +1. **Create Directory**: Ensure the parent directory exists +2. **Open File**: Create or truncate the target file +3. **Serialize Data**: Convert the HashMap to JSON format +4. 
**Write File**: Save the JSON to disk + +#### Implementation Approach: +- Handle directory creation for the file path +- Use appropriate file opening options for writing +- Serialize the in-memory data to JSON format +- Ensure thread-safe access to the data + +**Hint**: Check the CHEATSHEET.md for JSON serialization and file operations. + +### Task 4: Implement `EmbeddingStorage` Trait + +#### `store_embedding()` +```rust +fn store_embedding(&mut self, record: EmbeddingRecord) -> Result<()> +``` +- Add the record to in-memory storage and persist to disk + +#### `get_embedding()` +```rust +fn get_embedding(&self, id: &str) -> Result<Option<EmbeddingRecord>> +``` +- Search for and return a record by its ID + +#### `get_all_embeddings()` +```rust +fn get_all_embeddings(&self) -> Result<Vec<EmbeddingRecord>> +``` +- Return all stored embedding records + +#### `delete_embedding()` +```rust +fn delete_embedding(&mut self, id: &str) -> Result<bool> +``` +- Remove a record by ID and return whether deletion was successful + +**Implementation Approach**: Use HashMap operations with proper mutex locking and persistence. + +### Task 5: Implement `open_temp_storage()` + +```rust +pub fn open_temp_storage() -> Result<(Box<dyn EmbeddingStorage>, String)> +``` + +This helper function should: +1. **Generate Unique Path**: Create a temporary filename using UUID +2. **Create Storage**: Initialize a LocalFileStorage instance +3. **Return Boxed Trait**: Return as a trait object for flexibility + +#### Implementation Approach: +- Generate a unique filename using UUID +- Create a new storage instance with that path +- Return both the storage and path for cleanup + +**Hint**: Use `Uuid::new_v4()` for unique identifiers. + +## Technical Details + +### JSON File Format: +The storage saves data as a JSON object where keys are record IDs: +```json +{ + "uuid-1": { + "id": "uuid-1", + "name": "Alice", + "embedding": [0.1, 0.2, ...], + "created_at": "2024-01-01T12:00:00Z", + "metadata": {} + }, + "uuid-2": { ... 
} +} +``` + +### Concurrency Handling: +- Uses `Mutex` for thread-safe access to in-memory data +- Locks are held briefly during read/write operations +- File I/O is synchronized through the mutex + +### Error Handling: +- Gracefully handles missing files (starts with empty storage) +- Deals with corrupted JSON (logs warning, starts fresh) +- Proper error propagation using `Result` + +## Testing + +The provided tests verify: +- **Basic Storage**: Can store and retrieve records +- **Sorting**: Results are returned in correct order +- **Limits**: Respects k-parameter for top-k queries +- **Empty Storage**: Handles empty storage gracefully + +Run tests with: +```bash +cargo test +``` + +## File Management + +The storage system: +- **Auto-creates** directories as needed +- **Handles** missing files gracefully +- **Overwrites** files completely on each save (simple but safe) +- **Uses** pretty-printed JSON for human readability + +## Production Considerations + +This simple file-based approach works well for: +- **Development and Testing**: Easy to inspect and debug +- **Small Datasets**: Hundreds to thousands of embeddings +- **Single-User Applications**: No concurrent access needed + +For production systems, consider: +- **Database Storage**: PostgreSQL with pgvector extension +- **Vector Databases**: Qdrant, Pinecone, Weaviate +- **Concurrent Access**: Proper locking mechanisms +- **Backup Strategies**: Regular data backups + +## Next Steps + +After completing this exercise, you'll be ready to: +- Implement similarity search and retrieval (Exercise 05) + +This storage foundation is crucial for the face authentication system's ability to persist and retrieve user embeddings efficiently. 
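The in-memory half of Tasks 1 and 4 can be sketched with std types only (the real implementation also persists to JSON, and the record carries timestamps and metadata; fields are trimmed here for brevity):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Trimmed-down record for illustration purposes.
#[derive(Clone, Debug, PartialEq)]
struct EmbeddingRecord {
    id: String,
    name: String,
    embedding: Vec<f32>,
}

struct InMemoryStorage {
    // Mutex makes the cache safe to share between threads.
    data: Mutex<HashMap<String, EmbeddingRecord>>,
}

impl InMemoryStorage {
    fn new() -> Self {
        Self { data: Mutex::new(HashMap::new()) }
    }
    fn store_embedding(&self, record: EmbeddingRecord) {
        self.data.lock().unwrap().insert(record.id.clone(), record);
    }
    fn get_embedding(&self, id: &str) -> Option<EmbeddingRecord> {
        self.data.lock().unwrap().get(id).cloned()
    }
    fn delete_embedding(&self, id: &str) -> bool {
        self.data.lock().unwrap().remove(id).is_some()
    }
}
```

The file-backed version wraps the same HashMap operations with a `save_data()` call after each mutation.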
diff --git a/docs/edge_ai/face_authentication/5_retrieval.md b/docs/edge_ai/face_authentication/5_retrieval.md
new file mode 100644
index 0000000..69a679c
--- /dev/null
+++ b/docs/edge_ai/face_authentication/5_retrieval.md
@@ -0,0 +1,158 @@
+# Exercise 05: Vector Retrieval and Similarity Search
+
+## Overview
+
+This exercise teaches you how to implement vector similarity search - the core functionality that enables face recognition systems to find matching faces. You'll build a `top_k` function that searches through stored embeddings to find the most similar faces to a query.
+
+## What is Vector Similarity Search?
+
+Vector similarity search is the process of:
+1. **Taking a query vector** (e.g., embedding of a face to identify)
+2. **Comparing it against a database** of stored vectors (known face embeddings)
+3. **Ranking results by similarity** (most similar faces first)
+4. **Returning the top matches** (k most similar faces)
+
+This is exactly how face authentication works:
+- **Registration**: Store face embeddings in the database
+- **Login**: Capture new face, find most similar stored embedding
+- **Decision**: If similarity > threshold, grant access
+
+## Your Task
+
+Implement the `top_k` function that performs similarity search:
+
+```rust
+pub fn top_k(storage: &dyn EmbeddingStorage, query: &[f32], k: usize) -> Result<Vec<(EmbeddingRecord, f32)>>
+```
+
+### Algorithm Steps:
+
+1. **Retrieve All Embeddings**: Get all stored embeddings from storage
+2. **Calculate Similarities**: Compute cosine similarity between query and each stored embedding
+3. **Sort by Similarity**: Order results from highest to lowest similarity
+4. **Return Top-K**: Take only the k most similar results
+
+### Implementation Approach:
+
+The algorithm follows these conceptual steps:
+1. **Retrieve All Embeddings**: Get stored embeddings from storage
+2. **Calculate Similarities**: Compute similarity between query and each stored embedding
+3. **Sort Results**: Order by similarity (highest first)
+4.
**Return Top-K**: Take only the k most similar results + +### Key Operations Needed: +- Storage retrieval operations +- Vector similarity computation (cosine similarity) +- Sorting and ranking algorithms +- Result limiting and formatting + +**Hint**: You'll need a vector-based cosine similarity function. Check the CHEATSHEET.md for similarity computation building blocks. + +## Key Implementation Details + +### Edge Cases to Handle: +- **Empty Storage**: Return empty vector if no embeddings stored +- **k = 0**: Return empty vector +- **k > stored count**: Return all available embeddings +- **Division by Zero**: Handle zero-magnitude vectors gracefully + +### Performance Characteristics: +- **Time Complexity**: O(n × d + n log n) where n = number of stored embeddings, d = embedding dimension +- **Space Complexity**: O(n) for storing similarity scores +- **Scalability**: Linear scan works for thousands of embeddings, but not millions + +### Sorting Considerations: +- Use **descending order** (highest similarity first) +- Handle **NaN values** with `partial_cmp()` and `unwrap_or()` +- **Stable sort** ensures consistent ordering for equal similarities + +## Testing + +The tests verify that your implementation: + +1. **Returns Best Match First**: Most similar embedding appears first in results +2. **Sorts Correctly**: Results are in descending similarity order +3. **Respects K Limit**: Returns exactly k results (or fewer if less data available) +4. 
**Handles Empty Storage**: Works correctly with no stored embeddings
+
+Run tests with:
+```bash
+cargo test
+```
+
+## Example Usage
+
+```rust
+// Create storage and add some face embeddings
+let (mut storage, _path) = open_temp_storage()?;
+add_record(storage.as_mut(), "Alice", vec![1.0, 0.0, 0.0])?;
+add_record(storage.as_mut(), "Bob", vec![0.0, 1.0, 0.0])?;
+add_record(storage.as_mut(), "Charlie", vec![0.8, 0.2, 0.0])?;
+
+// Search for faces similar to query
+let query = vec![0.9, 0.1, 0.0]; // Similar to Alice and Charlie
+let results = top_k(storage.as_ref(), &query, 2)?;
+
+// Results will be:
+// 1. Alice (similarity ≈ 0.994)
+// 2. Charlie (similarity ≈ 0.991)
+```
+
+## Production Vector Databases
+
+While this exercise teaches the fundamentals, production systems use specialized vector databases for better performance:
+
+### Why Specialized Vector DBs?
+
+- **Approximate Nearest Neighbor (ANN)**: Sub-linear search time using indexing
+- **Horizontal Scaling**: Handle millions/billions of vectors
+- **Real-time Updates**: Add/remove vectors without rebuilding indices
+- **Advanced Filtering**: Combine similarity search with metadata filters
+- **Optimized Storage**: Compressed vectors and efficient memory usage
+
+### Recommended Options:
+
+#### Qdrant (Rust-Native) ⭐
+- **Homepage**: [https://qdrant.tech/](https://qdrant.tech/)
+- **Why Choose**: Written in Rust, excellent Rust client, HNSW indexing
+- **Use Case**: Perfect for Rust applications requiring high performance
+
+#### pgvector (PostgreSQL Extension)
+- **Homepage**: [https://github.com/pgvector/pgvector](https://github.com/pgvector/pgvector)
+- **Why Choose**: SQL-based, ACID transactions, familiar ecosystem
+- **Use Case**: When you already use PostgreSQL and want vector search
+
+#### Pinecone
+- **Homepage**: [https://www.pinecone.io/](https://www.pinecone.io/)
+- **Why Choose**: Fully managed, serverless, auto-scaling
+- **Use Case**: When you want zero infrastructure management
+
+##
Real-World Applications + +Vector similarity search powers: + +- **Face Recognition**: Find matching faces in databases +- **Recommendation Systems**: Find similar products/content +- **Semantic Search**: Find documents by meaning, not keywords +- **Duplicate Detection**: Identify similar images/documents +- **Anomaly Detection**: Find outliers in high-dimensional data + +## Performance Optimization + +For production systems: + +1. **Indexing**: Use HNSW, IVF, or LSH for faster search +2. **Quantization**: Reduce vector precision to save memory +3. **Caching**: Cache frequently accessed embeddings +4. **Batching**: Process multiple queries together +5. **Filtering**: Pre-filter by metadata before similarity search + +## Next Steps + +After completing this exercise, you'll understand: +- How face recognition systems find matching faces +- The trade-offs between accuracy and performance in vector search +- Why production systems need specialized vector databases +- How to implement and optimize similarity search algorithms + +This completes the face authentication workshop! You now have all the building blocks to create a complete face recognition system. diff --git a/docs/edge_ai/face_authentication/6_full_application.md b/docs/edge_ai/face_authentication/6_full_application.md new file mode 100644 index 0000000..efbd11a --- /dev/null +++ b/docs/edge_ai/face_authentication/6_full_application.md @@ -0,0 +1,390 @@ +# Exercise 06: Complete Face Authentication System + +This exercise integrates all components from exercises 1-5 into a working face authentication system. The application contains **TODO markers** where you'll implement the functionality you've learned. 
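Two of those components, cosine similarity (Exercise 03) and top-k retrieval (Exercise 05), are small enough to sketch in plain Rust as a reference while you work through the TODOs. The function names and the `(name, embedding)` in-memory database here are illustrative simplifications, not the exact TODO signatures in `src/login.rs`:

```rust
/// Cosine similarity between two equal-length vectors (Exercise 03).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let mag_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let mag_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if mag_a == 0.0 || mag_b == 0.0 {
        0.0 // guard against zero-magnitude vectors
    } else {
        dot / (mag_a * mag_b)
    }
}

/// Top-k similarity search over stored embeddings (Exercise 05):
/// score everything, sort descending, keep the first k.
fn top_k<'a>(db: &'a [(String, Vec<f32>)], query: &[f32], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = db
        .iter()
        .map(|(name, emb)| (name.as_str(), cosine_similarity(emb, query)))
        .collect();
    // Sort by similarity, highest first; partial_cmp handles f32,
    // with an Equal fallback so NaN cannot panic the sort.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.truncate(k); // also covers k = 0 and k > stored count
    scored
}
```

With the three vectors from Exercise 05's example, `top_k(&db, &[0.9, 0.1, 0.0], 2)` returns Alice first and Charlie second, both with similarity above 0.99.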
+ +## 🎯 Learning Objectives + +- Integrate image processing, embeddings, similarity computation, and storage +- Build a functional face authentication system +- Understand how components work together in a complete application + +## 📚 Prerequisites + +Before starting this exercise, you should have completed: + +1. **Exercise 01** - Image Processing & ImageNet Normalization +2. **Exercise 02** - ConvNeXt Model & Embedding Generation +3. **Exercise 03** - Cosine Similarity & Face Comparison +4. **Exercise 04** - Local File Storage for Embeddings +5. **Exercise 05** - Vector Retrieval & Similarity Search + +## 🔧 What You'll Implement + +This application contains **TODO sections** that map directly to your previous learning: + +### TODOs from Exercise 01 (Image Processing) +- Image preprocessing and normalization functions +- Converting images to model-ready tensors + +### TODOs from Exercise 02 (Embeddings) +- Model loading and initialization +- Face embedding computation from images + +### TODOs from Exercise 03 (Similarity) +- Cosine similarity calculation between embeddings +- L2 normalization for embedding comparison + +### TODOs from Exercise 04 (Storage) +- Local file storage implementation +- Embedding persistence and retrieval + +### TODOs from Exercise 05 (Retrieval) +- Similarity search across stored embeddings +- Top-k retrieval for face matching + +## 🚧 Implementation Status + +The application framework is provided with TODO markers where you need to implement functionality from exercises 1-5: + +- **Image processing** (Exercise 01): `image_with_std_mean` function +- **Model loading & embeddings** (Exercise 02): `build_model` and `compute_embeddings` functions +- **Similarity computation** (Exercise 03): `normalize_l2` and `cosine_similarity` functions +- **Storage system** (Exercise 04): All `LocalFileVectorStorage` methods + +**Implementation approach:** +1. Follow the TODO comments and reference your previous exercises +2. 
Implement incrementally and test each component +3. See how individual pieces work together in a complete application + +## 🚀 Application Features + +The completed system provides: +- Face embedding generation using ConvNeXt model +- Local file storage for embeddings in JSON format +- Real-time face registration from video stream +- User authentication via face comparison +- YAML-based configuration + +## 🔍 Finding and Completing TODOs + +### Step 1: Locate TODO Markers +Search for `TODO` comments throughout the codebase. These mark the exact locations where you need to apply your knowledge from exercises 1-5: + +```bash +# Find all TODOs in the project +grep -r "TODO" src/ +``` + +### Step 2: TODO Locations by Exercise + +**Exercise 01 - Image Processing:** +- `src/image_utils/imagenet.rs`: `image_with_std_mean` function + +**Exercise 02 - Embeddings:** +- `src/embeddings/utils.rs`: `build_model` and `compute_embeddings` functions + +**Exercise 03 - Similarity:** +- `src/login.rs`: `normalize_l2` and `cosine_similarity` functions + +**Exercise 04 - Storage:** +- `src/storage/local_file_vector_storage.rs`: All storage methods (`new`, `load_data`, `save_data`, `store_embedding`, `get_embedding`, `get_all_embeddings`, `delete_embedding`) + +**Exercise 05 - Retrieval (Optional Enhancement):** +- `src/login.rs`: Optional similarity search optimization concepts + +### Step 3: Implementation Order + +**Recommended implementation order:** +1. **Exercise 01**: Image processing (needed for camera input) +2. **Exercise 02**: Model loading and embeddings (core functionality) +3. **Exercise 04**: Storage system (needed to save/load embeddings) +4. **Exercise 03**: Similarity computation (needed for authentication) +5. 
**Exercise 05**: Optional enhancements (similarity search optimizations) + +### Step 4: Test Your Implementation +After completing each exercise's TODOs, test incrementally: + +```bash +# Test after each exercise implementation +cargo build + +# Run the full application once all TODOs are complete +cargo run +``` + +### Step 5: Integration Testing +Once all TODOs are implemented: +1. Start the camera server (see Prerequisites section) +2. Run `cargo run` +3. Test registration: `register` → enter username → look at camera +4. Test login: `login` → enter username → look at camera +5. Verify similarity scores and authentication results + +## Installation + +1. Clone the repository +2. Install dependencies: + ```bash + cargo build + ``` + +## Prerequisites + +### Camera Server Setup + +Before running the face authentication system, you need to start the camera server: + +1. **Navigate to camera server directory**: + ```bash + cd ../camera_server + ``` + +2. **Install Python dependencies**: + ```bash + pip install -r requirements.txt + ``` + +3. **Start the camera server**: + ```bash + python camera_stream_api.py + ``` + +4. **Verify camera stream**: Open http://localhost:8000/video_feed in your browser + +### System Requirements +- **Camera**: Webcam or external camera connected to your system +- **Python 3.7+**: For the camera server +- **Rust 1.70+**: For the main application + +## Configuration + +The system uses `config.yaml` for configuration: + +### Storage Configuration + +```yaml +storage: + type: "local_file" + local_file: + path: "embeddings.json" +``` + +## Usage + +### Running the Application + +```bash +cargo run +``` + +### Commands + +- `register` - Register a new user by capturing face embeddings +- `login` - Authenticate an existing user +- `quit` or `exit` - Exit the application + +**Note**: Commands are entered without the `/` prefix (e.g., type `register`, not `/register`) + +### Registration Process + +1. Run the `register` command +2. 
Enter a user name +3. Look at the camera while the system captures multiple face samples +4. The system will store embeddings in your configured storage + +### Authentication Process + +1. Run the `login` command +2. Enter your registered user name +3. Look at the camera for authentication +4. The system will compare your face with stored embeddings + +## Storage + +The system uses local file storage to store face embeddings in JSON format. This provides: + +- **Simplicity**: No external dependencies required +- **Reliability**: Works offline and is easy to backup +- **Transparency**: Human-readable JSON format for debugging + +## Configuration Options + +### Stream Configuration + +```yaml +stream: + url: "http://localhost:8000/video_feed" # Video stream URL + num_images: 5 # Number of samples to capture + interval_millis: 10 # Interval between samples + chunk_size: 8192 # Network chunk size +``` + +### Model Configuration + +```yaml +model: + name: "timm/convnext_atto.d2_in1k" # Model name + embedding_size: 768 # Embedding vector size +``` + +## File Structure + +``` +src/ +├── main.rs # Main application entry point +├── config.rs # Configuration management +├── register.rs # Face registration logic +├── login.rs # Face authentication logic (includes TODO for Ex 03 & 05) +├── storage/ # Storage implementations +│ ├── storage.rs # Storage module exports +│ ├── vector_storage.rs # Storage trait and types +│ └── local_file_vector_storage.rs # TODO: Local file storage implementation (Ex 04) +├── embeddings/ # Embedding computation +│ ├── embeddings.rs # Module exports +│ └── utils.rs # TODO: Model loading and embedding computation (Ex 02) +├── image_utils/ # Image processing utilities +│ ├── image_utils.rs # Module exports +│ └── imagenet.rs # TODO: ImageNet preprocessing (Ex 01) +├── camera/ # Camera integration +│ ├── camera.rs # Module exports +│ └── camera_interactions.rs # Camera capture and streaming logic +└── config.yaml # Configuration file +``` + +**Files 
marked with TODO contain implementations you need to complete based on exercises 1-5.** + +## Dependencies + +### Core Dependencies +- **candle-core/candle-nn**: Neural network framework for model inference +- **candle-transformers**: Pre-trained model implementations (ConvNeXt) +- **hf-hub**: Hugging Face Hub integration for model downloading +- **anyhow**: Error handling and propagation + +### Data & Configuration +- **serde/serde_yaml/serde_json**: Serialization for config and storage +- **uuid**: Unique identifier generation for embeddings +- **chrono**: Timestamp handling for embedding records + +### Camera & Streaming +- **reqwest**: HTTP client for video streaming +- **image**: Image processing and format handling +- **minifb**: Window management for live video display + +### Utilities +- **clap**: Command line argument parsing (for examples) +- **dotenv**: Environment variable loading +- **lazy_static**: Static configuration management + +## Troubleshooting + +### Video Stream Issues + +- Ensure the video stream URL is accessible +- Check network connectivity +- Verify the stream format is supported + +### Storage Issues + +- Ensure write permissions to the configured file path +- Check that the directory exists or can be created + +## 🚀 Extra Mile: Advanced Enhancements + +### Current Limitation +This implementation focuses on **face embeddings only** - it assumes input images already contain properly cropped and aligned faces. In real-world scenarios, you need **face detection** as a preprocessing step. + +### Enhancement Option 1: Complete Face Authentication Pipeline + +Integrate face detection to build a complete pipeline: + +1. **Add Face Detection**: Use [rustface](https://github.com/atomashpolskiy/rustface) - a Rust implementation of SeetaFace detection + ```toml + [dependencies] + rustface = "0.1" + ``` + +2. **Pipeline Flow**: + ``` + Raw Image → Face Detection → Face Cropping → Face Embeddings → Authentication + ``` + +3. 
**Benefits**: + - Handle images with multiple faces or no faces + - Automatic face cropping and alignment + - More robust real-world deployment + - Better user experience (users don't need to manually align faces) + +**Implementation Steps**: +- Add rustface dependency +- Implement face detection in image preprocessing +- Add face cropping and alignment +- Handle edge cases (no faces, multiple faces) + +### Enhancement Option 2: Production-Grade Vector Storage + +Replace JSON storage with [Qdrant](https://qdrant.tech/) vector database: + +1. **Add Qdrant Integration**: + ```toml + [dependencies] + qdrant-client = "1.0" + ``` + +2. **Benefits Over JSON Storage**: + - **Scalability**: Handle millions of face embeddings + - **Performance**: Optimized vector similarity search + - **Advanced Features**: Filtering, clustering, hybrid search + - **Production Ready**: Built for high-throughput applications + +3. **Implementation**: + - Replace `LocalFileVectorStorage` with `QdrantVectorStorage` + - Implement the same `VectorStorage` trait + - Add Qdrant configuration to `config.yaml` + - Use Qdrant's native similarity search instead of manual iteration + +**Why This Matters**: JSON storage works for learning but doesn't scale. Production face authentication systems need vector databases to handle thousands of users efficiently. + +### Choose Your Challenge +- **Option 1** for computer vision enthusiasts who want to understand the complete pipeline +- **Option 2** for backend developers interested in scalable storage solutions +- **Both** for a production-ready system! 
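To see why Option 2 is mostly plumbing, here is a minimal sketch of the trait boundary it relies on: both backends implement the same trait, and a factory keyed by the `storage.type` value from `config.yaml` decides which one to construct. The method set is a hypothetical reduction of the workshop's `VectorStorage` trait, and the Qdrant variant is a stub rather than real `qdrant-client` calls:

```rust
// Hypothetical, reduced trait; align the methods with the workshop's
// actual VectorStorage trait when implementing this for real.
trait VectorStorage {
    fn backend_name(&self) -> &'static str;
    fn store(&mut self, id: String, embedding: Vec<f32>) -> Result<(), String>;
}

struct LocalFileVectorStorage; // JSON file-backed (Exercise 04)
struct QdrantVectorStorage;    // would wrap a qdrant-client connection

impl VectorStorage for LocalFileVectorStorage {
    fn backend_name(&self) -> &'static str { "local_file" }
    fn store(&mut self, _id: String, _embedding: Vec<f32>) -> Result<(), String> {
        Ok(()) // real impl: insert into the HashMap and rewrite the JSON file
    }
}

impl VectorStorage for QdrantVectorStorage {
    fn backend_name(&self) -> &'static str { "qdrant" }
    fn store(&mut self, _id: String, _embedding: Vec<f32>) -> Result<(), String> {
        Ok(()) // real impl: upsert a point into a Qdrant collection
    }
}

// Factory keyed by the `storage.type` config value; the rest of the
// app only ever sees Box<dyn VectorStorage>.
fn open_storage(kind: &str) -> Result<Box<dyn VectorStorage>, String> {
    match kind {
        "local_file" => Ok(Box::new(LocalFileVectorStorage)),
        "qdrant" => Ok(Box::new(QdrantVectorStorage)),
        other => Err(format!("unknown storage type: {other}")),
    }
}
```

Because callers hold a `Box<dyn VectorStorage>`, swapping JSON storage for Qdrant is a config change plus one new trait impl, with no changes to `register` or `login` logic.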
+ +## 📈 Learning Progression + +This exercise integrates concepts from all previous exercises: +- **Exercise 01**: Image preprocessing for neural network input +- **Exercise 02**: Model loading and face embedding generation +- **Exercise 03**: Similarity computation for face matching +- **Exercise 04**: Data persistence for user storage +- **Exercise 05**: Efficient similarity search + +## 🎓 Key Learning Outcomes + +You now understand: +- Computer vision fundamentals and image preprocessing +- Deep learning model integration for feature extraction +- Vector mathematics for similarity computation +- Storage systems for embedding persistence +- System integration and production considerations + +## 🚀 Next Steps + +Continue your journey by: +- Exploring advanced embedding models (FaceNet, ArcFace) +- Scaling with vector databases (Qdrant, pgvector) +- Adding security features (liveness detection) +- Optimizing performance (GPU acceleration) +- Building production applications + +## Contributing + +1. Fork the repository +2. Create a feature branch +3. Make your changes +4. Add tests if applicable +5. Submit a pull request + +## License + +This project is licensed under the MIT License. diff --git a/docs/edge_ai/face_authentication/_category_.json b/docs/edge_ai/face_authentication/_category_.json new file mode 100644 index 0000000..67b7c21 --- /dev/null +++ b/docs/edge_ai/face_authentication/_category_.json @@ -0,0 +1,7 @@ +{ + "label": "Face Authentication", + "position": 2, + "link": { + "type": "generated-index" + } +} diff --git a/docs/edge_ai/face_authentication/cheatsheet.md b/docs/edge_ai/face_authentication/cheatsheet.md new file mode 100644 index 0000000..75a73b3 --- /dev/null +++ b/docs/edge_ai/face_authentication/cheatsheet.md @@ -0,0 +1,323 @@ +# Third-Party Libraries Cheatsheet for Face Auth Workshop + +This cheatsheet covers the essential parts of **candle_***, **image**, and **serde** libraries used across exercises 01-05. 
+
+---
+
+## 🔥 Candle Framework
+
+
+### Basic Tensor Operations
+
+#### Creating Tensors
+```rust
+// From vector with shape
+let data: Vec<u8> = image_data;
+let tensor = Tensor::from_vec(data, (height, width, channels), &Device::Cpu)?;
+
+// From array/slice
+let mean = [0.485, 0.456, 0.406];
+let mean_tensor = Tensor::new(&mean, &Device::Cpu)?;
+
+// Reshape tensor
+let reshaped = tensor.reshape((3, 1, 1))?;
+```
+
+#### Tensor Shape Manipulation
+```rust
+// Permute dimensions (e.g., HWC to CHW)
+let tensor = tensor.permute((2, 0, 1))?;
+
+// Add batch dimension
+let batched = tensor.unsqueeze(0)?;
+
+// Remove singleton dimensions
+let squeezed = tensor.squeeze(0)?.squeeze(0)?;
+```
+
+#### Data Type Conversions
+```rust
+// Convert to different data types
+let float_tensor = tensor.to_dtype(DType::F32)?;
+
+// Scale values (e.g., 0-255 to 0-1)
+let normalized = tensor.to_dtype(DType::F32)? / 255.0;
+```
+
+#### Mathematical Operations
+
+**Broadcasting Operations**: These automatically expand tensors to compatible shapes for element-wise operations.
+
+```rust
+// Broadcasting rules: smaller tensors are "stretched" to match larger ones
+// Example: (3, 224, 224) + (3, 1, 1) = (3, 224, 224)
+// The (3, 1, 1) tensor gets repeated across all 224x224 pixels
+
+let result = tensor1.broadcast_add(&tensor2)?; // Addition
+let result = tensor1.broadcast_sub(&tensor2)?; // Subtraction
+let result = tensor1.broadcast_mul(&tensor2)?; // Multiplication
+let result = tensor1.broadcast_div(&tensor2)?; // Division
+
+// Matrix multiplication (no broadcasting - strict dimension requirements)
+let result = tensor_a.matmul(&tensor_b)?;
+
+// Transpose swaps two dimensions
+let transposed = tensor.transpose(0, 1)?; // Swap dims 0 and 1
+```
+
+**How Broadcasting Works**:
+- Dimensions are aligned from the right (trailing dimensions first)
+- Missing dimensions are treated as size 1
+- Dimensions of size 1 are stretched to match the other tensor
+- Example: `(224,)` + `(3, 224, 224)` becomes `(1, 1, 224)` + `(3, 224, 224)` → `(3, 224, 224)`
+
+#### Reduction Operations
+
+**What `keepdim` means**: Maintains the original number of dimensions by keeping reduced dims as size 1.
+
+```rust
+// sum_keepdim example:
+// Input: (2, 3, 4) tensor
+// .sum(1)         → (2, 4)     # dimension 1 disappears
+// .sum_keepdim(1) → (2, 1, 4)  # dimension 1 becomes size 1
+
+let sum = tensor.sum_keepdim(1)?; // Sum along dim 1, keep dim structure
+
+// Element-wise operations
+let sqrt_tensor = tensor.sqrt()?; // √x for each element
+let squared = tensor.sqr()?; // x² for each element
+```
+
+**Why keepdim matters**: Preserves tensor shape for broadcasting operations. Without it, you can't broadcast the result back to the original tensor shape.
+
+#### Extracting Values
+```rust
+// Single scalar value
+let scalar: f32 = tensor.to_vec0()?;
+
+// 1D vector
+let values: Vec<f32> = tensor.to_vec1()?;
+
+// Flatten all dimensions and get vector
+let flattened: Vec<f32> = tensor.flatten_all()?.to_vec1()?;
+```
+
+### L2 Normalization (Essential for Embeddings)
+
+**What it does**: Scales vectors to unit length while preserving direction. Essential for cosine similarity.
+
+**Mathematical Formula**: `normalized_vector = vector / ||vector||₂`
+
+**Building Blocks**:
+```rust
+// Step by step operations you'll need:
+// 1. Square each element
+let squared = tensor.sqr()?;
+
+// 2. Sum the squares along the dimension (keeping dimensions for broadcasting)
+let sum_squared = squared.sum_keepdim(1)?;
+
+// 3. Take square root to get L2 norm
+let norm = sum_squared.sqrt()?;
+
+// 4. Divide original by norm (broadcasting)
+let normalized = tensor.broadcast_div(&norm)?;
+```
+
+**Why use it**: After L2 normalization, `||v||₂ = 1`, which means:
+- Cosine similarity becomes just a dot product
+- Removes magnitude bias - focuses only on direction
+- Essential for fair comparison of embeddings
+
+### Cosine Similarity Building Blocks
+
+**Mathematical Formula**: `cosine_similarity = (A · B) / (||A|| × ||B||)`
+
+**Key Operations**:
+```rust
+// Matrix multiplication for dot product
+let dot_product = tensor_a.matmul(&tensor_b.transpose(0, 1)?)?;
+
+// Transpose for proper matrix multiplication
+let transposed = tensor.transpose(0, 1)?;
+
+// Extract scalar from tensor
+let scalar_value = tensor.squeeze(0)?.squeeze(0)?.to_vec0::<f32>()?;
+
+// For Vec<f32> similarity (alternative approach):
+let dot: f32 = vec_a.iter().zip(vec_b.iter()).map(|(x, y)| x * y).sum();
+let mag_a: f32 = vec_a.iter().map(|x| x * x).sum::<f32>().sqrt();
+let mag_b: f32 = vec_b.iter().map(|x| x * x).sum::<f32>().sqrt();
+```
+
+### Model Loading & Usage
+
+**Core Concepts**: Loading pre-trained models and running inference.
+ +**Hugging Face Hub API Building Blocks**: +```rust +// Download model from Hugging Face Hub +let api = hf_hub::api::sync::Api::new()?; +let api = api.model("model-name-here".to_string()); +let model_file = api.get("model.safetensors")?; + +// Create VarBuilder from downloaded weights +let vb = unsafe { + VarBuilder::from_mmaped_safetensors(&[model_file], DType::F32, &device)? +}; + +// Load specific model architectures (examples): +// ConvNeXt: convnext::convnext_no_final_layer(&config, vb)? +// Other models have similar patterns +``` + +**Inference Building Blocks**: +```rust +// Handle batch dimensions +let batched_input = if input.dim(0)? == 3 { // Single image (C,H,W) + input.unsqueeze(0)? // Add batch: (1,C,H,W) +} else { + input.clone() // Already batched (N,C,H,W) +}; + +// Forward pass through model +let output = model.forward(&batched_input)?; + +// Common model interfaces: +// - Module::forward() for neural networks +// - Func::forward() for functional models +``` + +**Key Points**: +- `VarBuilder`: Loads pre-trained weights from `.safetensors` files +- `Module::forward()`: Standard interface for neural network inference +- **Batch Dimension**: Most models expect `(batch_size, channels, height, width)` +- **Device Management**: Ensure model and input tensors are on same device + +--- + +## 🖼️ Image Processing + +### Dependencies +```toml +[dependencies] +image = "0.25.6" +``` + +### Essential Imports +```rust +use image::{ImageReader, ImageFormat}; +``` + +### Loading and Processing Images +```rust +// Load image from file path +let img = image::ImageReader::open(path)? 
+    .decode()?;
+
+// Resize image (multiple resize methods)
+let img = img.resize_to_fill(
+    224, // width
+    224, // height
+    image::imageops::FilterType::Triangle, // filter type
+);
+
+// Convert to RGB8 format
+let img = img.to_rgb8();
+
+// Extract raw pixel data
+let data: Vec<u8> = img.into_raw(); // Returns Vec<u8> with RGB values
+```
+
+### Filter Types
+```rust
+// Available filter types for resizing
+image::imageops::FilterType::Triangle // Good general purpose
+image::imageops::FilterType::Lanczos3 // High quality
+image::imageops::FilterType::Nearest // Fastest, pixelated
+image::imageops::FilterType::CatmullRom // Sharp results
+```
+
+
+**Key Concept**: The `reshape((3, 1, 1))` creates tensors that broadcast across all pixels:
+- Original image: `(3, 224, 224)` - 3 channels, 224×224 pixels
+- Mean/Std: `(3, 1, 1)` - 3 values, broadcasted to each pixel
+- Result: Each of the 224×224 pixels gets normalized using its channel's specific mean/std
+
+---
+
+## 📦 Serde (Serialization/Deserialization)
+
+
+### Defining Serializable Structs
+```rust
+// Required derives for JSON serialization
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct YourStruct {
+    // Common field types:
+    pub id: String, // String fields
+    pub name: String,
+    pub data: Vec<f32>, // Vector fields
+    pub timestamp: chrono::DateTime<chrono::Utc>, // DateTime fields
+    pub metadata: HashMap<String, String>, // HashMap fields
+}
+```
+
+### JSON Serialization
+```rust
+// Serialize to JSON string (pretty printed)
+let json_string = serde_json::to_string_pretty(&records)?;
+
+// Serialize to JSON string (compact)
+let json_string = serde_json::to_string(&records)?;
+
+// Write to file
+std::fs::write("data.json", json_string)?;
+```
+
+### JSON Deserialization
+```rust
+// Read from file
+let content = std::fs::read_to_string("data.json")?;
+
+// Handle empty files
+if content.trim().is_empty() {
+    return Ok(Vec::new());
+}
+
+// Deserialize from JSON string
+let records: Vec<YourStruct> = serde_json::from_str(&content)?;
+```
+
+###
Working with DateTime +```rust +// Create current timestamp +let timestamp = chrono::Utc::now(); + +// DateTime automatically serializes to ISO 8601 string in JSON +``` + +### Working with UUIDs +```rust +// Generate new UUID +let id = uuid::Uuid::new_v4().to_string(); +``` + + +## 🚀 Performance Tips + +1. **Tensor Operations**: Use broadcast operations instead of loops when possible +2. **Memory Management**: Reuse tensors when possible, avoid unnecessary clones +3. **Model Loading**: Cache loaded models, don't reload for each inference +4. **Image Processing**: Consider batch processing multiple images at once +5. **Serialization**: Use `serde_json::to_string_pretty` for debugging, regular `to_string` for production + +--- + +## ⚠️ Common Pitfalls + +1. **Tensor Shapes**: Always check tensor dimensions before operations +2. **Data Types**: Be consistent with DType (F16 vs F32) +3. **Error Handling**: Use `?` operator and proper Result types +4. **Empty Files**: Always handle empty JSON files in deserialization +5. **Path Handling**: Use proper path validation for file operations diff --git a/docs/edge_ai/face_authentication/index.md b/docs/edge_ai/face_authentication/index.md new file mode 100644 index 0000000..cb9c577 --- /dev/null +++ b/docs/edge_ai/face_authentication/index.md @@ -0,0 +1,252 @@ +--- +sidebar_position: 0 +--- +# Overview +A face authentication system built with Rust. + +Face Authentication is a modular face authentication system consisting of three main components: + +1. **App** - A Rust-based face authentication engine that handles face embedding generation, storage, and user authentication +2. **Camera Server** - A Python FastAPI server that provides camera streaming capabilities with support for multiple camera sources +3. **Workshop** - Educational exercises for learning face recognition concepts and implementation techniques + +## Slides + + + +download the slides. 
+ +## Features + +- 🎯 **Real-time Face Authentication** - Fast face recognition using ConvNeXt models +- 📹 **Multiple Camera Sources** - Support for OpenCV, libcamera, and custom video streams +- 💾 **Local File Storage** - Simple JSON-based storage for face embeddings +- 🌐 **Web API** - RESTful camera streaming API with dynamic camera switching +- 🔧 **Easy Configuration** - YAML-based configuration for all components +- 🚀 **Cross-platform** - Works on Windows, Linux, and Raspberry Pi +- 📚 **Educational Workshop** - Step-by-step exercises for learning face recognition + +## Architecture + +``` +┌─────────────────┐ HTTP Stream ┌──────────────────┐ +│ Camera Server │ ◄─────────────── │ Face Auth App │ +│ (Python) │ │ (Rust) │ +│ │ │ │ +│ • FastAPI │ │ +│ • OpenCV │ │ • Embedding Gen │ +│ • libcamera │ │ • Authentication │ +└─────────────────┘ └──────────────────┘ + │ │ + ▼ ▼ +┌─────────────────┐ ┌──────────────────┐ +│ USB Camera │ │ Local Storage │ +│ Raspberry Pi │ │ • JSON Files │ +│ Webcam │ │ • Face Embeddings│ +└─────────────────┘ └──────────────────┘ +``` + +## Quick Start + +```bash +git clone https://github.com/Wyliodrin/edge-ai-face-auth.git +cd edge-ai-face-auth +cd workshop +cargo build +``` +### 1. Setup Camera Server + +```bash +cd ~/WORKSHOP/camera_server +source venv/bin/activate + +# Start the camera server +uvicorn camera_stream_api:app --host 0.0.0.0 --port 8000 +``` + +### 2. Setup Face Auth App + +```bash +cd app + +# Build the application +cargo build + +# Run the application +cargo run +``` + +### 3. Usage + +1. **Register a new user:** + - Run the app and type `register` + - Enter a username + - Look at the camera while the system captures face samples + +2. **Authenticate:** + - Type `login` + - Enter your username + - Look at the camera for authentication + +## Components + +### Face Auth App (Rust) + +The core authentication engine built with Rust for performance and safety. 
+ +**Key Features:** +- ConvNeXt-based face embedding generation +- Local file storage for face embeddings +- Real-time face capture and processing +- High-performance face matching algorithms + +**Dependencies:** +- `candle-core` & `candle-nn` - Neural network framework +- `reqwest` - HTTP client for video streaming +- `image` & `minifb` - Image processing and display + +**Configuration:** +```yaml +# config.yaml +storage: + type: "local_file" + local_file: + path: "embeddings.json" + +stream: + url: "http://localhost:8000/video_feed" + num_images: 5 + interval_millis: 10 + +model: + name: "timm/convnext_atto.d2_in1k" +``` + +### Camera Server (Python) + +A FastAPI-based streaming server that provides camera access with multiple source support. + +**Key Features:** +- FastAPI web server with real-time streaming +- OpenCV and libcamera support +- Dynamic camera source switching +- Comprehensive error handling and logging +- Health check and diagnostic endpoints + +**Dependencies:** +- `fastapi` & `uvicorn` - Web framework and server +- `opencv-python` - Computer vision library +- `picamera2` - Raspberry Pi camera support (optional) + +**API Endpoints:** +- `GET /` - Server status and camera info +- `GET /health` - Health check +- `GET /video_feed` - Video stream +- `GET /camera_info` - Detailed camera configuration +- `GET /switch_camera?source={opencv|libcamera}` - Switch camera source + +### Workshop + +A collection of progressive exercises designed to teach face recognition concepts and implementation. 
+
+**Exercises:**
+- **Exercise 01** - Image Processing - Loading and normalizing images for neural networks
+- **Exercise 02** - Embeddings - Computing face embeddings using ConvNeXt models
+- **Exercise 03** - Similarity - Implementing cosine similarity for face matching
+- **Exercise 04** - Storage - Building local file storage for face embeddings
+- **Exercise 05** - Retrieval - Implementing k-nearest neighbor search
+
+**Structure:**
+- Each exercise includes skeleton code with TODO comments
+- Solutions provided for reference and verification
+- Documentation and explanations
+- Progressive difficulty building core concepts
+
+## Running Tests
+```bash
+cd ex01_image_processing   # or any other exercise directory
+cargo test
+```
+
+### Storage
+
+Face embeddings are stored locally in JSON format (`embeddings.json` by default).
+
+**Benefits:**
+- Simple setup with no external dependencies
+- Works offline and is easy to backup
+- Human-readable format for debugging
+- Suitable for development, testing, and small-scale deployments
+
+
+### Project Structure
+
+```
+Face Auth/
+├── app/                     # Rust face authentication engine
+│   ├── src/
+│   │   ├── main.rs          # Application entry point
+│   │   ├── config.rs        # Configuration management
+│   │   ├── embeddings/      # Face embedding generation
+│   │   ├── storage/         # Storage implementations
+│   │   └── image_utils/     # Image processing utilities
+│   ├── config.yaml          # App configuration
+│   └── Cargo.toml           # Rust dependencies
+├── camera_server/           # Python camera streaming server
+│   ├── camera_stream_api.py # FastAPI application
+│   ├── requirements.txt     # Python dependencies
+│   └── config.env           # Environment configuration
+└── workshop/                # Workshop exercises
+    ├── ex01_image_processing/  # Exercise 1: Image loading and normalization
+    ├── ex02_embeddings/        # Exercise 2: Face embedding generation
+    ├── ex03_similarity/        # Exercise 3: Cosine similarity computation
+    ├── ex04_storage_local/     # Exercise 4: Local file storage implementation
+    ├── ex05_retrieval/         # Exercise 5: k-NN search and retrieval
+    └── 
solution/               # Reference solutions for all exercises
+```
+
+## Troubleshooting
+
+### Authentication Issues
+
+1. **Poor recognition accuracy:**
+   - Ensure good lighting conditions
+   - Capture multiple face samples during registration
+   - Keep the face centered and looking at the camera
+
+2. **Video stream errors:**
+   - Verify the camera server is running on the correct port
+   - Check network connectivity between components
+
+### Performance Optimization
+
+1. **Faster inference:**
+   - Use a release build: `cargo run --release`
+   - Adjust the capture interval in the config
+
+2. **Memory usage:**
+   - Limit the number of face samples during capture
+
+
+## License
+
+This project is licensed under the MIT License - see the individual component READMEs for details.
+
+## Acknowledgments
+
+- Uses the [Candle](https://github.com/huggingface/candle) framework for neural network inference
+- Camera streaming powered by [FastAPI](https://fastapi.tiangolo.com/)
+- Image processing with [OpenCV](https://opencv.org/) and the Rust [image](https://crates.io/crates/image) crate
+
+## Support
+
+For issues and questions:
+
+- Review component-specific READMEs in `app/` and `camera_server/`
+- Open an issue in the repository
+
+---
+
+**Face Auth** - Secure, fast, and reliable face authentication for modern applications.
diff --git a/docs/edge_ai/index.md b/docs/edge_ai/index.md
new file mode 100644
index 0000000..b728519
--- /dev/null
+++ b/docs/edge_ai/index.md
@@ -0,0 +1,18 @@
+---
+sidebar_position: 6
+---
+
+# Edge AI
+Welcome to the Edge AI Track of the Rust Workshop!
+
+## Slides
+
+
+
+Download the slides.
+
+## Software Prerequisites
+- [VSCode](https://code.visualstudio.com) or [Zed](https://zed.dev) with Remote Development
\ No newline at end of file
diff --git a/static/pdf/edgeai/0_introduction_rustworkshop.pdf b/static/pdf/edgeai/0_introduction_rustworkshop.pdf
new file mode 100644
index 0000000..3a7a557
Binary files /dev/null and b/static/pdf/edgeai/0_introduction_rustworkshop.pdf differ
diff --git a/static/pdf/edgeai/1_computer_vision_rustworkshop.pdf b/static/pdf/edgeai/1_computer_vision_rustworkshop.pdf
new file mode 100644
index 0000000..f68b0b2
Binary files /dev/null and b/static/pdf/edgeai/1_computer_vision_rustworkshop.pdf differ
diff --git a/static/pdf/edgeai/2_llm_rustworkshop.pdf b/static/pdf/edgeai/2_llm_rustworkshop.pdf
new file mode 100644
index 0000000..02ad4d0
Binary files /dev/null and b/static/pdf/edgeai/2_llm_rustworkshop.pdf differ