Parallel boids flocking simulation in CUDA.
Author: Lu M.
Tested System:
- Windows 11 Home
- AMD Ryzen 7 5800HS @ 3.20GHz, 16GB RAM
- NVIDIA GeForce RTX 3060 Laptop GPU 6GB (Compute Capability 8.6)
Simulation captures: 1,000 Boids | 10,000 Boids | 50,000 Boids
With all three neighbor-search methods, increasing the number of boids decreases the frames per second (FPS). This follows directly from the computational complexity of a larger flock: each boid must check its surroundings for neighbors in order to apply the flocking rules, so as the flock grows, the number of potential interactions grows significantly. More threads to manage, more neighbors to consider per boid, and a greater overall computational load all combine to reduce FPS.
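To make the scaling concrete, here is a minimal sketch of a brute-force (naive) neighbor pass, assuming glm vectors and a one-thread-per-boid launch; the identifiers and the rule constant are illustrative, not the repo's actual names. Each of the N threads scans all N boids, so total per-frame work is O(N²), which is why FPS falls off sharply as the flock grows.

```cuda
#include <glm/glm.hpp>

__device__ const float kRule1Distance = 5.0f;  // cohesion radius (example value)

// Naive pass: one thread per boid, each scanning the entire flock.
__global__ void kernNaiveCohesion(int N, const glm::vec3* pos, glm::vec3* velChange) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;

    glm::vec3 perceivedCenter(0.0f);
    int neighbors = 0;
    for (int j = 0; j < N; ++j) {            // brute-force scan: N-1 distance checks
        if (j == i) continue;
        if (glm::distance(pos[i], pos[j]) < kRule1Distance) {
            perceivedCenter += pos[j];
            ++neighbors;
        }
    }
    if (neighbors > 0) {
        // Rule 1 (cohesion): steer toward the perceived center of mass
        velChange[i] = (perceivedCenter / (float)neighbors - pos[i]) * 0.01f;
    }
}
```

The grid-based methods below exist precisely to replace this full scan with a check of only a handful of nearby cells.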
Changing the block count and block size directly influences the performance of the GPU-based implementations. These parameters determine how the workload is distributed among the GPU's processing units.
- Block Count: Increasing the block count divides the work among more blocks, allowing for greater parallelism. However, if the number of blocks is too high, it can introduce overhead in managing these blocks, potentially hurting performance.
- Block Size: The block size affects how efficiently the GPU's streaming multiprocessors are utilized. An optimal block size fully occupies the SMs without exceeding their resource limits. A block size that is too small underutilizes the hardware, while one that is too large can lead to resource contention and decreased performance.
The coherent uniform grid implementation yielded an interesting performance outcome.
- Expected Outcome: It was anticipated that the coherent uniform grid would consistently outperform the naive and scattered methods, especially with a large number of boids. This is because the grid-based approach provides a more efficient way to locate neighbors, avoiding the need for a costly brute-force search.
- Actual Results: The results showed that for a small number of boids (fewer than 3,000), the coherent uniform grid had a slightly lower FPS. This is likely due to the per-frame overhead of building the grid, sorting the boids by cell, and reshuffling their position and velocity data. The naive and scattered approaches, while less efficient in principle, have simpler, lower-overhead implementations that work better at small flock sizes. However, as the number of boids scaled up (to over 50,000), the coherent uniform grid's efficiency became evident, performing several times better than the other methods. The naive approach, in particular, does not scale well because of its brute-force neighbor search.
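The per-frame reshuffle that distinguishes the coherent grid from the scattered one can be sketched as a simple gather kernel; the parameter names here are illustrative, assuming the boid indices have already been sorted by grid-cell index. This extra pass is the fixed cost that hurts small flocks, while the coalesced memory reads it enables are what pay off at large N.

```cuda
#include <glm/glm.hpp>

// After sorting boid indices by cell, gather each boid's data into
// cell-contiguous arrays so that neighbor reads hit adjacent memory.
__global__ void kernReshuffleCoherent(int N,
                                      const int* sortedBoidIndices,  // boid id occupying sorted slot i
                                      const glm::vec3* pos, const glm::vec3* vel,
                                      glm::vec3* posCoherent, glm::vec3* velCoherent) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    int src = sortedBoidIndices[i];
    posCoherent[i] = pos[src];   // gather: boids in the same cell now sit
    velCoherent[i] = vel[src];   // contiguously, so warp accesses coalesce
}
```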
Changing the cell width and the number of neighboring cells checked (8 vs. 27) has a modest but nuanced effect on performance.
- Cell Width: The optimal cell width is twice the maximum interaction radius of a boid. If the cell width is smaller than that, a boid's neighbors can spill into more surrounding cells, requiring more cell checks; if it is much larger, each cell contains many boids that lie outside the interaction radius, leading to unnecessary distance calculations. An optimal cell width minimizes both the number of cells to check and the number of wasted distance checks.
- Checking 27 vs. 8 Neighboring Cells: Checking all 27 cells (the 3×3×3 block formed by the current cell and its immediate neighbors) is typically slightly slower than checking 8 cells because it involves more memory accesses and comparisons. With the cell width set to twice the maximum interaction radius, a boid's entire neighborhood sphere is guaranteed to fit inside the 2×2×2 block of 8 cells nearest to it, so checking 27 cells is redundant: the additional 19 cells cannot contain any reachable neighbors.
- Clone the repository:
git clone https://github.com/lu-m-dev/CUDA-boids-flocking.git
- Navigate to the project directory:
cd CUDA-boids-flocking
- Build with CMake:
cmake -B build -S . -G "Visual Studio 17 2022"
- Open the solution in Visual Studio:
cd build
start cis5650_boids.sln



