This program runs on Android devices through the Pydroid 3 IDE and uses the device's camera for real-time object detection, tracking, and re-identification. A laptop server performs the core object detection and tracking.
The application is a computer vision tool that turns an Android phone into a real-time object tracking device: it streams the camera feed to a laptop server for processing and receives the annotated video back for display. It captures video directly from the phone's camera, identifies objects in the video stream, assigns a unique ID to each object, and tracks each object consistently as it moves across frames.
The core of the application combines several models and algorithms. It uses YOLOv11 (specifically yolo11m.pt for server-side processing), a state-of-the-art object detection model, to find objects. For tracking, it employs a custom Re-Identification (ReID) tracker built on the OSNet deep learning model. This lets the system track objects by visual appearance as well as position, which makes tracking more robust in crowded scenes and during occlusions. The user interface on the phone is built with Kivy and displays the video feed and tracking data.
The system uses multithreading so that the UI stays smooth while heavy processing tasks (on the laptop server) run in the background.
- Real Time Video Streaming: Streams live video from an Android phone's camera to a laptop over a TCP socket.
- Real Time Object Detection: Utilizes the YOLOv11 model (yolo11m.pt) to detect a wide range of objects in real time.
- Advanced Re-Identification Tracking: Implements a custom tracker with the OSNet model to analyze and match the visual appearance of objects, ensuring persistent and accurate tracking even after an object is temporarily lost or occluded.
- Mobile First Design (Client): Specifically engineered to run on Android devices via the Pydroid 3 IDE, acting as a client for camera capture and display.
- Interactive Kivy GUI (Client): A clean user interface built with Kivy displays the live camera feed with graphical overlays for tracking information.
- Dynamic Visual Feedback: Each tracked object is highlighted with a uniquely colored bounding box, making it easy to follow individual targets.
- Efficient Multithreaded Architecture: The client uses separate threads for the UI and for camera capture with network communication, and the server uses one thread per client connection, to maximize throughput and prevent the UI from freezing.
- GPU Acceleration (Server): The YOLO and ReID models on the server leverage CUDA for accelerated inference.
The application's workflow is divided into several key stages, managed by a multithreaded architecture to ensure real-time performance.
The initial step is to locate objects in each frame. The laptop server uses a pretrained YOLOv11 model (yolo11m.pt). The detection is configured with a confidence threshold of 0.3. The model processes frames from the phone client, dynamically setting the image size based on the incoming frame resolution.
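The detection step above could be sketched roughly as follows. This is a minimal illustration, not the code in laptop_server.py: the helper names imgsz_for_frame and detect are hypothetical, and rounding the frame size up to a multiple of 32 is an assumption about how the image size might be derived from the incoming resolution. The model object is expected to be an Ultralytics YOLO instance, for example YOLO("yolo11m.pt").

```python
import math


def imgsz_for_frame(height, width, stride=32):
    """Round a frame's (height, width) up to the nearest multiple of the stride.

    Hypothetical helper: the stride of 32 is an assumption for this sketch.
    """
    return (math.ceil(height / stride) * stride,
            math.ceil(width / stride) * stride)


def detect(model, frame, conf=0.3):
    """Run one detection pass on a BGR image array of shape (H, W, 3).

    `model` is assumed to be an ultralytics.YOLO instance, created elsewhere
    with something like: model = YOLO("yolo11m.pt")
    """
    height, width = frame.shape[:2]
    results = model(frame, conf=conf,
                    imgsz=imgsz_for_frame(height, width), verbose=False)
    boxes = results[0].boxes
    # One tuple per detection: (x1, y1, x2, y2) box, confidence, class index.
    return [
        (tuple(xyxy), float(score), int(cls))
        for xyxy, score, cls in zip(boxes.xyxy.tolist(),
                                    boxes.conf.tolist(),
                                    boxes.cls.tolist())
    ]
```

The class index in each tuple can be turned into a readable name through the model's names mapping (model.names).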
Once objects are detected, they are passed to a custom Re-Identification (ReID) tracker. Unlike purely position-based methods (such as IoU tracking), this tracker also uses the visual appearance of each object, which lets it recover identities that position alone would lose.
- Feature Extraction: For each detected object (like a person), a cropped image of the object is passed to the ReIDFeatureExtractor. This component uses a pretrained OSNetAIN model (osnet_ain_x1_0_imagenet.pth) to generate a 512-dimensional feature vector, or "embedding," that numerically represents the object's unique appearance (colors, textures, shapes).
- Appearance Matching: To associate objects between frames, the tracker compares the feature vector of a new detection with the stored feature vectors of existing tracks. This is done using cosine distance, which measures the similarity between two vectors. A lower cosine distance means a higher visual similarity.
- State Management: The TrackState class maintains the state for each tracked object, including its unique ID, bounding box history, and a running average of its appearance features. This history helps to smooth out predictions and maintain a stable identity.
- Robust Association: The tracker combines both appearance similarity (cosine distance) and positional information (IoU - Intersection over Union) to make a final matching decision. This hybrid approach allows it to re-identify an object that has reappeared after a long occlusion, a common failure point for trackers that rely on position alone.
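The ideas in the list above (cosine distance, IoU, a running average of features, and a combined matching decision) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the tracker's actual code: the function names, the 0.7 appearance weight, the 0.9 momentum, the 0.5 cost cutoff, and the greedy assignment strategy are all hypothetical choices made for the example.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_cost(track_feat, track_box, det_feat, det_box, appearance_weight=0.7):
    """Blend appearance and position into one cost (lower is a better match)."""
    appearance = cosine_distance(track_feat, det_feat)
    position = 1.0 - iou(track_box, det_box)
    return appearance_weight * appearance + (1.0 - appearance_weight) * position


def update_feature(mean_feat, new_feat, momentum=0.9):
    """Exponential running average of a track's appearance embedding."""
    return [momentum * m + (1.0 - momentum) * n
            for m, n in zip(mean_feat, new_feat)]


def greedy_match(tracks, detections, max_cost=0.5):
    """Assign detections to tracks, cheapest pairs first.

    tracks: {track_id: (feature, box)}; detections: [(feature, box), ...].
    Returns {detection_index: track_id}; unmatched detections are left out.
    """
    pairs = sorted(
        (match_cost(t_feat, t_box, d_feat, d_box), det_idx, track_id)
        for track_id, (t_feat, t_box) in tracks.items()
        for det_idx, (d_feat, d_box) in enumerate(detections)
    )
    assigned, used_tracks = {}, set()
    for cost, det_idx, track_id in pairs:
        if cost > max_cost:
            break  # every remaining pair is even more expensive
        if det_idx not in assigned and track_id not in used_tracks:
            assigned[det_idx] = track_id
            used_tracks.add(track_id)
    return assigned
```

Because the appearance term does not depend on where the object is, a track can still be matched after an occlusion moves it far from its last box, as long as its embedding remains close. A production tracker would more likely use an optimal assignment method than this greedy loop.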
The program's architecture is designed for concurrency and responsiveness, split between the Android phone client and the laptop server.
- Phone Client (phone_client.py): The YOLOApp class runs on the main thread, handling all UI elements, rendering the final video frame, and displaying status text. A dedicated network_loop thread continuously attempts to connect to the server, captures frames from the camera, sends them, and receives processed frames back.
- Laptop Server (laptop_server.py): The server listens for incoming client connections. For each connected client, a new thread (handle_client) is spawned. This thread receives frames, performs YOLOv11 detection and ReID tracking, annotates the frames, and sends them back to the client.
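A thread-per-client server that exchanges frames over TCP needs some way to mark where one frame ends and the next begins. The sketch below assumes a 4-byte length prefix in front of each payload; that framing, and the helper names send_msg, recv_msg, recv_exact, and serve, are assumptions made for illustration and may differ from what the project's scripts do. The process argument stands in for the detect, track, and annotate step.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!I")  # 4-byte big-endian payload length (assumed framing)


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes the connection early."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_msg(sock, payload):
    """Send one length-prefixed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_msg(sock):
    """Receive one length-prefixed message and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def handle_client(conn, process):
    """Per-client loop: receive a frame, process it, send the result back."""
    with conn:
        try:
            while True:
                send_msg(conn, process(recv_msg(conn)))
        except ConnectionError:
            pass  # client disconnected; let the thread end


def serve(server_sock, process):
    """Accept clients until the listening socket is closed, one thread each."""
    while True:
        try:
            conn, _addr = server_sock.accept()
        except OSError:
            break  # listening socket was closed
        threading.Thread(target=handle_client, args=(conn, process),
                         daemon=True).start()
```

On the phone side, the same send_msg and recv_msg pair would run inside the background network thread, so the Kivy main thread only has to display the most recent processed frame.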
For the application to work, your project folder should be structured as follows. You will need to ensure model weights are accessible to the server.
MobileRealTimeYOLOObjectTracker/
├── laptop_server.py #Main Python application program for the server
├── phone_client.py #Main Python application program for the Android client
├── osnet_ain_x1_0_imagenet.pth #Re-Identification model (must be downloaded)
├── requirements.txt #List of Python dependencies for the server
├── bytetrack.yaml #Configuration file for the built-in ByteTrack tracker of YOLOv11
└── ultralytics_config/ #Directory for Ultralytics configuration (created by server)
Note: The yolo11m.pt model is automatically downloaded by Ultralytics when model = YOLO("yolo11m.pt") is first run on the server. The osnet_ain_x1_0_imagenet.pth model must be manually downloaded.
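Since the ReID weights are not downloaded automatically, a small pre-flight check before starting the server can catch a missing file early. The following is only a sketch: find_missing_weights is a hypothetical helper, not part of the project.

```python
from pathlib import Path

REID_WEIGHTS = "osnet_ain_x1_0_imagenet.pth"


def find_missing_weights(project_dir, required=(REID_WEIGHTS,)):
    """Return the names of required weight files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in required if not (root / name).is_file()]
```

Calling find_missing_weights(".") from the project folder returns an empty list when everything is in place, and otherwise lists the files that still need to be downloaded.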
Follow these steps to set up and run the project.
- Prerequisites:
  - Python 3.8+
  - CUDA enabled GPU (highly recommended for performance)
- Clone the repository:
    git clone https://github.com/WhiteMetagross/RealTimeYOLOObjectTracker
    cd RealTimeYOLOObjectTracker
- Install dependencies: Create a requirements.txt file (see example below) and install the dependencies:
    pip install -r requirements.txt
- Download Models:
  - YOLOv11 Model: The laptop_server.py program automatically downloads yolo11m.pt if not present. Ensure you have an active internet connection when running the server for the first time.
  - ReID Model: You must manually download the osnet_ain_x1_0_imagenet.pth model from the ReID model zoo and place it in the same directory as laptop_server.py.
- Install Pydroid 3: Download and install the Pydroid 3: IDE for Python 3 app from the Google Play Store.
- Grant Permissions: Grant Pydroid 3 storage and camera permissions when prompted.
- Transfer Project Files: Download the phone_client.py file and place it in a folder on your Android device's internal storage.
- Install Python Dependencies in Pydroid 3: Open Pydroid 3, then use its built-in pip installer to install the necessary libraries. Install each of the following packages one by one via the LIBRARIES or QUICK INSTALL tab:
  - kivy
  - opencv-python
  - numpy
  - Note: ultralytics and torch (PyTorch) are not needed on the phone client.
- Configure phone_client.py: Open phone_client.py in Pydroid 3's editor. Locate the host variable and set it to the actual IP address of your laptop server. Example: host = '123.456.78.90'
- Start the Laptop Server: On your laptop, navigate to the project directory and run:
    python laptop_server.py
  The server will print the IP address it's listening on (like Server listening on 123.456.78.90:9999). Ensure your laptop's firewall allows incoming connections on port 9999.
- Start the Phone Client: On your Android phone, open phone_client.py in Pydroid 3 and press the large yellow "Play" button at the bottom right to run the program. The application will attempt to connect to the server and, once connected, will display the live camera feed with object tracking overlays.
- Network Configuration: Ensure both your laptop and phone are on the same WiFi network for communication.
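Before launching the full client, it can help to confirm that the server's port is reachable from another device on the network. The snippet below is a minimal sketch using only the standard library; server_reachable is a hypothetical helper, and the port default of 9999 is taken from the server output described above.

```python
import socket


def server_reachable(host, port=9999, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A False result while the server is running usually points to a wrong IP address, a firewall rule, or the two devices being on different networks.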
(Image: terminal logs of the running laptop server.)
The application provides information overlaid on the video feed displayed on the phone.
- Bounding Boxes: Each detected object is enclosed in a rectangle. The color of the rectangle is unique to the object's track ID.
- Object Label: Above the bounding box, you will find a label with detailed information:
- ID:{id}: The unique tracking ID assigned by the ReID tracker.
- {class_name}: The class of the object detected by YOLOv11.
- {conf:.2f}: The detection confidence score from YOLO, ranging from 0.0 to 1.0.
- Status Label: A status line at the top of the phone screen provides connection status and activity messages (like "Connecting to laptop server...", "Streaming...").
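The overlay elements described above could be produced along these lines. This is a sketch under assumptions: the exact label layout (fields separated by single spaces) and the golden-ratio hue scheme for picking a color per track ID are illustrative choices, and format_label and color_for_id are hypothetical names.

```python
import colorsys


def color_for_id(track_id):
    """Map a track ID to a stable, visually distinct BGR color tuple."""
    # Stepping the hue by the golden ratio conjugate spreads consecutive IDs apart.
    hue = (track_id * 0.61803398875) % 1.0
    r, g, b = colorsys.hsv_to_rgb(hue, 0.85, 1.0)
    return (int(b * 255), int(g * 255), int(r * 255))  # OpenCV expects BGR order


def format_label(track_id, class_name, conf):
    """Build the text drawn above a bounding box."""
    return f"ID:{track_id} {class_name} {conf:.2f}"
```

Because the color depends only on the ID, an object keeps the same box color for as long as the tracker keeps its identity.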
(Image: phone screen recordings of the running phone client.)
- Connection Issues:
- Verify that both devices are on the same WiFi network.
- Check that the host IP address in phone_client.py exactly matches the laptop's IP address.
- Ensure no firewall is blocking port 9999 on the laptop.
- Check for "Connection refused" or "Connection lost" messages on the phone; this often indicates an incorrect IP/port or the server not running.
- Model Loading Errors:
- If the server fails to start with a FileNotFoundError, ensure the osnet_ain_x1_0_imagenet.pth file is in the correct directory.
- Frame Processing Errors:
- If the display on the phone freezes or shows errors, check the server's console for messages.
- Ensure adequate resources (especially the GPU memory) on the laptop for the YOLO and ReID models.
- Low Performance:
- For the server, a powerful GPU is crucial for real time performance.
- Adjust the JPEG encode_param quality (like 85) in phone_client.py to balance visual quality and bandwidth usage.
- Lower CAP_PROP_FRAME_WIDTH and CAP_PROP_FRAME_HEIGHT in phone_client.py to reduce the resolution of the streamed video. Smaller frames need less bandwidth and processing time, which can raise the achievable frame rate.
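The two tuning knobs above map onto standard OpenCV calls. The sketch below shows one way they might be used, together with a rough bandwidth estimate. The helper names encode_jpeg, set_capture_size, and stream_mbps are hypothetical, the default values are only examples, and the OpenCV import is done lazily so the arithmetic helper works without it.

```python
def encode_jpeg(frame, quality=85):
    """Compress a BGR frame to JPEG bytes (requires opencv-python)."""
    import cv2  # imported lazily; only needed when actually encoding

    encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), quality]
    ok, buffer = cv2.imencode(".jpg", frame, encode_param)
    if not ok:
        raise ValueError("JPEG encoding failed")
    return buffer.tobytes()


def set_capture_size(cap, width=640, height=480):
    """Ask an opened cv2.VideoCapture for a smaller frame size.

    The camera driver may ignore or adjust the request.
    """
    import cv2

    cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)


def stream_mbps(frame_bytes, fps):
    """Approximate one-way bandwidth in megabits per second."""
    return frame_bytes * 8 * fps / 1_000_000
```

As a worked example, frames that compress to about 50,000 bytes each and are sent at 20 frames per second need roughly 8 Mbit/s in each direction, so lowering either the JPEG quality or the resolution reduces the load on the WiFi link.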