
WebGPU Gaussian Splatting

3D Gaussian splatting renderer and viewer in WebGPU.

Author: Lu M.

Tested System:

  • Windows 11 Home
  • AMD Ryzen 7 5800HS @ 3.20GHz, 16GB RAM
  • NVIDIA GeForce RTX 3060 Laptop GPU 6GB (Compute Capability 8.6)
  • Brave 1.83.118 (Official Build) (64-bit), Chromium: 141.0.7390.108

Abstract

Gaussian splatting is a point-based rendering technique where each 3D point is rasterized as a screen-space Gaussian instead of a single pixel. Each splat carries a position, covariance, and color. When splats overlap, their contributions are composited to reconstruct a smooth, continuous appearance.
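The compositing step can be sketched in plain TypeScript. This is an illustrative model of back-to-front "over" blending at a single pixel, not the renderer's actual code; `SplatSample` and the function name are hypothetical.

```typescript
// Back-to-front "over" compositing of overlapping splat contributions at
// one pixel. `samples` must already be sorted far-to-near; each nearer
// splat is blended over the accumulated color: C = a*C_src + (1-a)*C_dst.
interface SplatSample {
  color: [number, number, number];
  alpha: number; // Gaussian-weighted opacity in [0, 1]
}

function compositeBackToFront(samples: SplatSample[]): [number, number, number] {
  let out: [number, number, number] = [0, 0, 0];
  for (const s of samples) {
    out = [
      s.alpha * s.color[0] + (1 - s.alpha) * out[0],
      s.alpha * s.color[1] + (1 - s.alpha) * out[1],
      s.alpha * s.color[2] + (1 - s.alpha) * out[2],
    ];
  }
  return out;
}
```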

Methods

This implementation follows three main stages: preprocess, sort, and render. Each stage is WebGPU-parallelized.

Preprocess

  • Load a 3D point cloud from a PLY file.
  • Transform points into view, clip, and NDC coordinates.
  • Compute per-point splat parameters: position, 3D covariance, and projected 2D covariance.
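The 3D-to-2D covariance step can be illustrated with the standard first-order (EWA splatting) approximation: the camera-space covariance is pushed through the Jacobian of the perspective projection. This is a CPU-side sketch with illustrative names, not the project's WGSL preprocess shader.

```typescript
// Row-major matrix product, used to form J * Sigma * J^T below.
function matMul(a: number[][], b: number[][]): number[][] {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0))
  );
}

function transpose(m: number[][]): number[][] {
  return m[0].map((_, j) => m.map(row => row[j]));
}

// cov3d: 3x3 covariance in camera space; (x, y, z): camera-space mean;
// fx, fy: focal lengths in pixels. Returns the 2x2 screen-space
// covariance Sigma' = J * Sigma * J^T, where J is the Jacobian of the
// perspective projection at the splat center.
function projectCovariance(
  cov3d: number[][],
  x: number, y: number, z: number,
  fx: number, fy: number
): number[][] {
  const J = [
    [fx / z, 0, -(fx * x) / (z * z)],
    [0, fy / z, -(fy * y) / (z * z)],
  ];
  return matMul(matMul(J, cov3d), transpose(J)); // 2x2
}
```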

Sort

  • Purpose: splats must be sorted back-to-front for correct alpha blending; sorting also improves memory coherence.
  • Implementation: a GPU radix sort.
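A radix sort operates on unsigned integer keys, so floating-point depths must first be remapped to u32 keys whose unsigned ordering matches the float ordering. The sketch below shows the standard bit trick (the helper name is illustrative, not the project's actual API); sorting ascending then gives near-to-far, and back-to-front traversal reads the sorted list in reverse.

```typescript
// Remap an IEEE-754 f32 depth to a u32 whose unsigned order matches the
// float order: negative floats get all bits flipped, non-negative floats
// get the sign bit set.
function floatToSortableUint(depth: number): number {
  const buf = new ArrayBuffer(4);
  const f32 = new Float32Array(buf);
  const u32 = new Uint32Array(buf);
  f32[0] = depth;
  const bits = u32[0];
  return (bits & 0x80000000) !== 0 ? ~bits >>> 0 : (bits | 0x80000000) >>> 0;
}
```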

Render

Two renderers are provided:

  • Point-cloud renderer: renders raw points at a fixed size with per-point color, using a minimal vertex/fragment pipeline.
  • Gaussian splat renderer: for each splat, a screen-space quad is rasterized and shaded with a precomputed Gaussian weight using the splat covariance and color. The fragment shader evaluates the color and opacity.
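The per-fragment Gaussian evaluation can be sketched as follows: the 2×2 screen-space covariance is inverted into a "conic" (a, b, c), and each fragment at offset d = (dx, dy) from the splat center gets alpha = opacity · exp(−½ (a·dx² + 2b·dx·dy + c·dy²)). This TypeScript mirror of the fragment math is illustrative; the actual WGSL shader differs in detail.

```typescript
// cov = [sxx, sxy, syy]: the symmetric 2x2 screen-space covariance.
// Returns the conic (inverse covariance) coefficients (a, b, c).
function conicFromCovariance(cov: [number, number, number]): [number, number, number] {
  const [sxx, sxy, syy] = cov;
  const det = sxx * syy - sxy * sxy;
  return [syy / det, -sxy / det, sxx / det];
}

// Gaussian-weighted alpha for a fragment at offset (dx, dy) from the
// splat center; `opacity` is the splat's base opacity.
function gaussianAlpha(
  conic: [number, number, number],
  dx: number, dy: number,
  opacity: number
): number {
  const [a, b, c] = conic;
  const power = -0.5 * (a * dx * dx + 2 * b * dx * dy + c * dy * dy);
  return power > 0 ? 0 : opacity * Math.exp(power);
}
```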

Tile-Based Depth Sorting

This variant aims to improve rendering performance on large point clouds by organizing Gaussian splats into screen-space tiles and sorting per tile, reducing memory bandwidth and improving cache locality.

Implementation: a compute-based pipeline that:

  1. For each Gaussian, determines which tiles it overlaps using its 2D projection and splat radius.
  2. Generates key-value pairs (tile_id | depth, gaussian_id) for each Gaussian-tile intersection using atomic operations to avoid write conflicts.
  3. Sorts these pairs globally using radix sort.
  4. Identifies start/end ranges for each tile via a second compute pass.
  5. During rendering, tiles can be rasterized in any order with per-tile depth ordering.
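The key layout in step 2 can be sketched as a 64-bit key with tile_id in the upper 32 bits and a sortable depth encoding in the lower 32, so that a single radix sort (step 3) groups entries by tile and orders them by depth within each tile. Function names are illustrative; the GPU version packs the same layout into pairs of u32s.

```typescript
// Pack one 64-bit sort key per Gaussian-tile intersection.
// depthBits is assumed to already be a sortable u32 depth encoding.
function makeTileDepthKey(tileId: number, depthBits: number): bigint {
  return (BigInt(tileId >>> 0) << 32n) | BigInt(depthBits >>> 0);
}

// Recover the tile id when identifying per-tile start/end ranges (step 4).
function tileOfKey(key: bigint): number {
  return Number(key >> 32n);
}
```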

Trade-offs: This approach reduces per-frame fragment shader overdraw in theory but adds preprocessing overhead. For scenes with significant depth complexity and overlapping splats per tile, this can improve performance. For scenes with sparse or well-distributed splats, the overhead may outweigh the benefits.

Visual comparison

Bonsai (272,965 points)

[Figure: bonsai, 272,965 points — point-cloud renderer (left) vs. Gaussian splatting (right)]

Observation: for the bonsai dataset (272,965 points), both renderers hit the monitor's 144 Hz refresh cap on the test laptop.

Bicycle (1,063,091 points)

[Figure: bicycle, 1,063,091 points — point-cloud renderer (left) vs. Gaussian splatting (right)]

Measured performance (personal laptop):

  • Bonsai (272,965 points): both renderers hit the display cap at 144 Hz.
  • Bicycle (1,063,091 points): point-cloud renderer ≈ 100 Hz; Gaussian splatting renderer ≈ 50 Hz; Gaussian splatting with tile-based depth sorting ≈ 40 Hz.

Discussion: the Gaussian splatting renderer rasterizes a screen-space quad per point and evaluates a Gaussian per-fragment. For dense clouds (the bicycle set) this produces significantly more fragment shader work and overdraw compared to rendering simple points. For the bonsai dataset the total fragment workload is low enough that both appear similarly fast.

The tile-based depth sorting variant produces no visible difference in rendered quality compared to the base Gaussian splatting renderer. However, it runs approximately 20% slower (≈ 40 Hz vs. ≈ 50 Hz on bicycle) despite the theoretical benefits of improved memory coherence and reduced overdraw. This regression occurs because:

  1. Preprocessing overhead dominates: the tile-based approach adds significant compute cost in key generation, atomic allocations, sorting the larger tile-pair dataset, and range identification. These stages are necessary every frame and account for substantial GPU time.
  2. Tile-sorting benefits don't materialize at this scale: while tile-based sorting theoretically improves cache locality, the actual savings in fragment shader evaluation are minimal. The 1M point cloud is already sparse enough per tile that depth coherence doesn't yield meaningful performance gains on modern GPUs with efficient caching.
  3. Atomic operation contention: multiple Gaussians writing to the same tile allocator creates atomic contention, serializing work that could otherwise be parallelized.
  4. Small tile sizes: to fit within WebGPU buffer limits (max_tile_pairs ≈ 16M), tile sizes must be modest, spreading splats across many tiles and reducing the per-tile depth-sorting benefit.

Note: these are preliminary numbers from a single machine. More comprehensive benchmarking is required (different GPUs, tile-based profiling, memory bandwidth counters, and varying splat sizes) to draw robust conclusions.

Bloopers

Incorrect depth and sort buffer

[Figure: bonsai blooper]
I implemented an incorrect depth calculation and misused the sort buffer. The output showed Gaussian quads flickering with black square artifacts that obscured the view.

Mistaken opacity

[Figures: bonsai and bicycle bloopers]
I got the alpha calculation wrong in the fragment shader, assigning a constant `1.0f` to every fragment. This made the air around the object opaque, tinted with the object's color.

Tile-based sorting buffer overflow

[Figure: tile-sorting blooper]
During tile-based depth sorting, I failed to allocate space for tile-pair entries correctly with atomic operations, causing buffer overwrites. Splats rendered as solid colored discs rather than smoothly blended Gaussians, due to corrupted splat data and invalid depth values in the tile-sorted output.

Build instructions

  1. Install Node.js

  2. Clone repository

    git clone https://github.com/lu-m-dev/WebGPU-gaussian-splatting.git
  3. Install dependencies

    cd WebGPU-gaussian-splatting
    npm install
  4. Launch dev server

    npm run dev

    Optional: build a static bundle to dist/

    npm run build

Credits
