TL;DR: A complete ROS2 vision system integrating SAM (Segment Anything) for segmentation, CLIP for classification, GraspNet for grasp detection, and scene understanding for robotic perception.
Camera Input → SAM Detection → CLIP Classification → GraspNet → Scene Understanding → Unified Output
- SAM detects and segments objects
- CLIP classifies detected regions
- GraspNet generates 6D grasp poses
- Scene Understanding analyzes spatial relationships
- Unified Pipeline coordinates all modules and outputs JSON
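Conceptually, each module's output feeds the next stage's input. The sketch below illustrates that data flow in plain Python; the function names and result fields are illustrative stand-ins, not the actual node APIs (the real modules communicate over ROS2 services and topics):

```python
# Illustrative sketch of the unified pipeline's data flow.
# All functions below are hypothetical stubs standing in for the ROS2 nodes.

def sam_detect(image):
    # SAM returns a list of segmented regions with bounding boxes and confidences.
    return [{"bbox": (100, 150, 250, 300), "confidence": 0.92}]

def clip_classify(image, bbox):
    # CLIP assigns a semantic label to the cropped region.
    return {"label": "red_cube", "score": 0.88}

def graspnet_grasp(depth, bbox):
    # GraspNet produces a 6D grasp pose (translation + orientation quaternion).
    return {"position": (0.41, -0.02, 0.13), "orientation": (0.0, 0.0, 0.0, 1.0)}

def run_pipeline(image, depth):
    # Chain the stages for every SAM detection and collect a unified result.
    results = []
    for det in sam_detect(image):
        cls = clip_classify(image, det["bbox"])
        grasp = graspnet_grasp(depth, det["bbox"])
        results.append({**det, **cls, "grasp": grasp})
    return {"detections": results}

out = run_pipeline(image=None, depth=None)
```

In the real system this orchestration is performed by the `unified_pipeline` node, which writes the aggregated result to a JSON file.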
Height: 73.66 cm (29 in)
cd ~/final_project_ws
source install/setup.bash
source vision_venv/bin/activate
colcon build

Option A - Single Launch Command (Recommended):
source install/setup.bash
ros2 launch ur_yt_sim final_project.launch.py mode:=real
The mode argument accepts mode:=sim for simulation and mode:=real for the real depth camera.
Option B - Manual Node Startup:
# Terminal 1: SAM Detector
ros2 run vision simple_sam_detector
# Terminal 2: CLIP Classifier
ros2 run vision clip_classifier
# Terminal 3: GraspNet Detector
ros2 run vision graspnet_detector
# Terminal 4: Scene Understanding
ros2 run vision scene_understanding
# Terminal 5: Pixel-to-Real Converter
ros2 run vision pixel_to_real_world_service
# Terminal 6: Unified Pipeline Orchestrator
ros2 run vision unified_pipeline

Once all nodes are up, run the complete pipeline:
ros2 service call /vision/run_pipeline std_srvs/srv/Trigger

To inspect the custom interfaces, rebuild and source the workspace:
cd ~/final_project_ws
colcon build
source install/setup.bash

ros2 interface show custom_interfaces/msg/SAMDetection
ros2 interface show custom_interfaces/msg/SAMDetections
ros2 interface show custom_interfaces/srv/PixelToReal
ros2 interface show custom_interfaces/srv/FindObject

To install the Python dependencies:
cd ~/final_project_ws/src/vision
pip install -r requirements.txt

# Terminal 1: Build
cd ~/final_project_ws
colcon build
source install/setup.bash
# Terminal 2: Launch Gazebo with UR5 robot
source vision_venv/bin/activate
source install/setup.bash
ros2 launch ur_yt_sim spawn_ur5_camera_gripper_moveit.launch.py

Note: Requires X11 forwarding in WSL for display: export DISPLAY=:0
This project integrates SAM, CLIP, GraspNet, and Scene Understanding into a complete ROS2-based vision perception pipeline. Each module communicates via ROS2 services and topics.
| Node | Role | Key Services/Topics |
|---|---|---|
| simple_sam_detector | Detects objects using SAM model | /vision/run_pipeline, /vision/detect_objects, /vision/sam_detections |
| clip_classifier | Classifies detected regions with CLIP | /vision/classify_all, /vision/classify_bb, /vision/find_object |
| graspnet_detector | Generates 6D grasp poses | /vision/detect_grasp, /vision/detect_grasp_bb |
| scene_understanding | Analyzes spatial relationships | /vision/understand_scene, /vision/scene_understanding |
| pixel_to_real_world_service | Converts pixel coordinates to 3D world coordinates | /pixel_to_real_world |
| unified_pipeline | Orchestrates complete vision pipeline | /vision/run_pipeline |
| find_object_service | High-level object search interface | /vision/find_object_service |
| find_object_grasp_service | Combined object search + grasp generation | /vision/find_object_grasp_service |
Node: unified_pipeline
Type: std_srvs/srv/Trigger
Description: Executes the complete vision pipeline (SAM → CLIP → GraspNet → Scene Understanding) and saves results to JSON
ros2 service call /vision/run_pipeline std_srvs/srv/Trigger

Node: simple_sam_detector
Type: std_srvs/srv/Trigger
Description: Detects all objects in current frame using SAM, returns bounding boxes and confidences
ros2 service call /vision/detect_objects std_srvs/srv/Trigger

Node: simple_sam_detector
Type: std_srvs/srv/Trigger
Description: Starts continuous SAM detection and publishes to /vision/sam_detections topic
ros2 service call /vision/run_pipeline std_srvs/srv/Trigger

Node: simple_sam_detector
Type: std_srvs/srv/Trigger
Description: Displays depth camera visualization for debugging
ros2 service call /vision/show_depth_image std_srvs/srv/Trigger

Node: clip_classifier
Type: std_srvs/srv/Trigger
Description: Classifies entire camera frame using CLIP model
ros2 service call /vision/classify_all std_srvs/srv/Trigger

Node: clip_classifier
Type: custom_interfaces/srv/ClassifyBBox
Description: Classifies specific region defined by bounding box
ros2 service call /vision/classify_bb custom_interfaces/srv/ClassifyBBox "{x1: 100, y1: 100, x2: 200, y2: 300}"

Node: clip_classifier
Type: custom_interfaces/srv/FindObject
Description: Searches for specific object by name in detected regions
ros2 service call /vision/find_object custom_interfaces/srv/FindObject "{object_name: 'red_cube'}"

Node: find_object_service
Type: custom_interfaces/srv/FindObject
Description: High-level object search with automatic detection + classification
ros2 service call /vision/find_object_service custom_interfaces/srv/FindObject "{object_name: 'drill'}"

Node: graspnet_detector
Type: std_srvs/srv/Trigger
Description: Generates 6D grasp poses for all detected objects
ros2 service call /vision/detect_grasp std_srvs/srv/Trigger

Node: graspnet_detector
Type: custom_interfaces/srv/DetectGraspBBox
Description: Generates grasp pose for specific region
ros2 service call /vision/detect_grasp_bb custom_interfaces/srv/DetectGraspBBox "{x1: 100, y1: 100, x2: 200, y2: 300}"

Node: find_object_grasp_service
Type: custom_interfaces/srv/FindObjectGrasp
Description: Combined service: find object + generate grasp pose
ros2 service call /vision/find_object_grasp_service custom_interfaces/srv/FindObjectGrasp "{object_name: 'wrench'}"

Node: scene_understanding
Type: std_srvs/srv/Trigger
Description: Analyzes spatial relationships between detected objects
ros2 service call /vision/understand_scene std_srvs/srv/Trigger

Node: pixel_to_real_world_service
Type: custom_interfaces/srv/PixelToReal
Description: Converts 2D pixel coordinates to 3D world coordinates (x,y,z) based on the UR Arm and depth camera position
ros2 service call /pixel_to_real_world custom_interfaces/srv/PixelToReal "{u: 320, v: 240}"

| Topic | Type | Description |
|---|---|---|
| /vision/sam_detections | custom_interfaces/msg/SAMDetections | Continuous SAM detection results |
| /vision/scene_understanding | custom_interfaces/msg/SceneUnderstanding | Scene graph with spatial relations |
| /camera/image_raw | sensor_msgs/Image | RGB camera feed |
| /camera/depth/image_raw | sensor_msgs/Image | Depth camera feed |
| /camera/camera_info | sensor_msgs/CameraInfo | Camera calibration parameters |
| Topic | Nodes Subscribing |
|---|---|
| /camera/image_raw | All vision nodes |
| /camera/depth/image_raw | simple_sam_detector, graspnet_detector, pixel_to_real_world_service |
| /vision/sam_detections | clip_classifier, graspnet_detector, scene_understanding |
| Node | Description | Command |
|---|---|---|
| simple_sam_detector | Object detection using SAM | ros2 run vision simple_sam_detector |
| clip_classifier | Image classification using CLIP | ros2 run vision clip_classifier |
| graspnet_detector | 6D grasp pose generation | ros2 run vision graspnet_detector |
| scene_understanding | Spatial relationship analysis | ros2 run vision scene_understanding |
| pixel_to_real_world_service | Pixel to 3D coordinate conversion | ros2 run vision pixel_to_real_world_service |
| unified_pipeline | Complete pipeline orchestration | ros2 run vision unified_pipeline |
| find_object_service | High-level object search | ros2 run vision find_object_service |
| find_object_grasp_service | Object search + grasp generation | ros2 run vision find_object_grasp_service |
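For scripting (e.g., a tmux launcher or a benchmark harness), the node and service commands above can be generated programmatically. A small sketch that only builds the command lines and does not execute them; the node names are copied from the table:

```python
# Build (but do not run) the `ros2 run` command for each vision node.
NODES = [
    "simple_sam_detector",
    "clip_classifier",
    "graspnet_detector",
    "scene_understanding",
    "pixel_to_real_world_service",
    "unified_pipeline",
    "find_object_service",
    "find_object_grasp_service",
]

def run_command(node):
    return ["ros2", "run", "vision", node]

def trigger_call(service):
    # All Trigger-type services in this project share the same call shape.
    return ["ros2", "service", "call", service, "std_srvs/srv/Trigger"]

commands = [run_command(n) for n in NODES]
pipeline_call = trigger_call("/vision/run_pipeline")
```

Each list can be handed to subprocess.Popen to spawn the node in its own process, mirroring the per-terminal startup shown in Option B above.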
# Start all nodes with launch file
ros2 launch vision unified_pipeline.launch.py
# Find object and get grasp pose in one call
ros2 service call /vision/find_object_grasp_service custom_interfaces/srv/FindObjectGrasp "{object_name: 'red_cube'}"

# Run full pipeline
ros2 service call /vision/run_pipeline std_srvs/srv/Trigger
# Results saved to: /home/group11/final_project_ws/src/vision/unified_pipeline_output.json

# Step 1: Detect all objects
ros2 service call /vision/detect_objects std_srvs/srv/Trigger
# Step 2: Classify specific region
ros2 service call /vision/classify_bb custom_interfaces/srv/ClassifyBBox "{x1: 100, y1: 150, x2: 250, y2: 300}"
# Step 3: Generate grasp for that region
ros2 service call /vision/detect_grasp_bb custom_interfaces/srv/DetectGraspBBox "{x1: 100, y1: 150, x2: 250, y2: 300}"

This section explains how to benchmark the vision pipeline using a series of 10 Gazebo simulation worlds.
There are 10 benchmark worlds located at:
~/final_project_ws/src/ur_yt_sim/worlds/test_world_x.world
where x ranges from 1 to 10, e.g.,
test_world_1.world
test_world_2.world
...
test_world_10.world
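To sweep all ten worlds in one session, the launch command can be generated per world. A sketch that only builds the command strings (the launch file and world_file argument are taken from this README; actually executing them, e.g. via subprocess, is left out):

```python
# Generate the benchmark launch command for each of the ten test worlds.
def launch_command(world_index):
    world = f"test_world_{world_index}.world"
    return (
        "ros2 launch ur_yt_sim spawn_ur5_camera_gripper_moveit.launch.py "
        f"world_file:={world}"
    )

commands = [launch_command(i) for i in range(1, 11)]
```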
These worlds contain different object arrangements that allow for testing the vision node under various visual conditions. Details of the objects in each world can be found in this Google Sheet
Open a terminal, then go to the final_project_ws Workspace by running
cd ~/final_project_ws/

To source the workspace, run:
source install/setup.bash

The main Gazebo simulation launch file accepts a launch argument:
world_file:=<name_of_world_file>
To benchmark a specific world, run:
ros2 launch ur_yt_sim spawn_ur5_camera_gripper_moveit.launch.py world_file:=test_world_1.world

Example: launch world 7
ros2 launch ur_yt_sim spawn_ur5_camera_gripper_moveit.launch.py world_file:=test_world_7.world

This will launch both the Gazebo simulation and all vision nodes.
Running the launch command without the world_file argument will launch the default world.
For more detailed information, see the docs/ directory:
- QUICK_START.md - Getting started guide
- API_REFERENCE.md - Complete API documentation
- UNIFIED_PIPELINE_SUMMARY.md - Unified pipeline details
- PIXEL_TO_REAL_WORLD_QUICK_REF.md - Coordinate transformation guide
- BENCHMARK_DASHBOARD.md - Benchmarking dashboard usage
Problem: Vision services are not available.
Solution: Ensure all nodes are running and the workspace is sourced correctly.
ros2 node list # Check running nodes
ros2 service list # Check available services

Problem: No camera images are being published.
Solution: Check that Gazebo is running and the camera plugin is loaded.
ros2 topic list # Should see /camera/image_raw and /camera/depth/image_raw
ros2 topic hz /camera/image_raw # Check publishing rate

Problem: Python import errors when running the vision nodes.
Solution: Ensure vision_venv is activated from the final_project_ws directory.
source ~/final_project_ws/vision_venv/bin/activate

License: Apache-2.0