3D Gaussian splatting renderer and viewer in WebGPU.
Author: Lu M.
Tested System:
- Windows 11 Home
- AMD Ryzen 7 5800HS @ 3.20GHz, 16GB RAM
- NVIDIA GeForce RTX 3060 Laptop GPU 6GB (Compute Capability 8.6)
- Brave 1.83.118 (Official Build) (64-bit), Chromium: 141.0.7390.108
Gaussian splatting is a point-based rendering technique where each 3D point is rasterized as a screen-space Gaussian instead of a single pixel. Each splat carries a position, covariance, color, and opacity. When splats overlap, their contributions are composited to reconstruct a smooth, continuous appearance.
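Concretely, in the standard 3DGS formulation a splat with screen-space center $\mu_i$, projected 2D covariance $\Sigma'_i$, color $c_i$, and opacity $o_i$ contributes

$$
\alpha_i(x) = o_i \exp\!\left(-\tfrac{1}{2}(x-\mu_i)^\top \Sigma_i'^{-1} (x-\mu_i)\right), \qquad
C(x) = \sum_{i=1}^{N} c_i\,\alpha_i(x) \prod_{j<i} \bigl(1-\alpha_j(x)\bigr),
$$

with splats indexed front-to-back along the viewing ray.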
This implementation follows three main stages, each parallelized on the GPU via WebGPU: preprocess, sort, and render.

Preprocess:
- Input: a 3D point cloud loaded from a PLY file.
- Transform points into view, clip, and NDC coordinates.
- Compute per-point splat parameters: screen-space position, 3D covariance, and projected 2D covariance (see the projection sketch after this list).
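A minimal WGSL sketch of the 2D-covariance step (the EWA-style projection $\Sigma' = J W \Sigma W^\top J^\top$); the function and parameter names here are illustrative, not the repository's actual code:

```wgsl
// Project a world-space 3D covariance to a screen-space 2D covariance.
// Assumptions: `view` is the rotational 3x3 part of the view matrix,
// `mean_view` the splat center in view space, `focal` in pixels.
fn project_covariance(mean_view: vec3f, cov3d: mat3x3f, view: mat3x3f, focal: vec2f) -> vec3f {
    let inv_z = 1.0 / mean_view.z;
    // Jacobian of the perspective projection, linearized at the splat center
    // (WGSL matrix constructors take column vectors).
    let J = mat3x3f(
        vec3f(focal.x * inv_z, 0.0, 0.0),
        vec3f(0.0, focal.y * inv_z, 0.0),
        vec3f(-focal.x * mean_view.x * inv_z * inv_z,
              -focal.y * mean_view.y * inv_z * inv_z,
              0.0)
    );
    let T = J * view;
    let cov2d = T * cov3d * transpose(T);
    // Only the symmetric upper-left 2x2 block matters: return (a, b, c)
    // of [[a, b], [b, c]].
    return vec3f(cov2d[0][0], cov2d[0][1], cov2d[1][1]);
}
```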
Sort:
- Purpose: sorting points back-to-front is required for correct alpha blending and improves memory coherence.
- Implementation: a GPU radix sort over depth keys (a key-construction sketch follows).
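As a sketch of how a float depth can feed a radix sort, the usual bit trick maps IEEE-754 floats to unsigned keys with the same ordering (the function name is illustrative):

```wgsl
// Map an f32 depth to a u32 that radix-sorts in the same order.
// Positive floats: set the sign bit; negative floats: flip all bits.
fn depth_to_sortable_bits(depth: f32) -> u32 {
    let bits = bitcast<u32>(depth);
    return select(~bits, bits | 0x80000000u, (bits & 0x80000000u) == 0u);
}
```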
Two renderers are provided:
- Point-cloud renderer: renders raw points with a simple size and per-point color. This uses a minimal vertex/fragment pipeline.
- Gaussian splat renderer: for each splat, a screen-space quad is rasterized; the fragment shader evaluates a Gaussian weight from the precomputed 2D covariance and outputs the splat's color scaled by opacity (see the fragment-shader sketch below).
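A minimal sketch of the per-fragment evaluation, assuming the vertex stage passes the splat color, the conic (inverse 2D covariance), and the pixel offset from the splat center; the location layout is illustrative:

```wgsl
@fragment
fn fs_main(
    @location(0) color: vec4f, // splat color (rgb) and opacity (a)
    @location(1) conic: vec3f, // inverse 2D covariance (a, b, c)
    @location(2) d: vec2f      // offset from splat center, in pixels
) -> @location(0) vec4f {
    // Gaussian falloff: exp(-0.5 * d^T * Sigma^-1 * d).
    let power = -0.5 * (conic.x * d.x * d.x + conic.z * d.y * d.y)
                - conic.y * d.x * d.y;
    if (power > 0.0) { discard; }
    let alpha = min(0.99, color.a * exp(power));
    if (alpha < 1.0 / 255.0) { discard; }
    // Premultiplied alpha for back-to-front "over" blending.
    return vec4f(color.rgb * alpha, alpha);
}
```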
Purpose: improve rendering performance on large point clouds by organizing Gaussian splats into screen-space tiles and sorting per tile, reducing memory bandwidth and improving cache locality.
Implementation: a compute-based pipeline (sketched after this list) that:
- For each Gaussian, determines which tiles it overlaps using its 2D projection and splat radius.
- Generates a key-value pair `(tile_id | depth, gaussian_id)` for each Gaussian-tile intersection, using atomic operations to avoid write conflicts.
- Sorts these pairs globally using radix sort.
- Identifies start/end ranges for each tile via a second compute pass.
- During rendering, each tile reads its own contiguous, depth-ordered slice of the sorted pairs, so tiles can be rasterized in any order.
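A sketch of the key-generation step; the buffer names, pair encoding, and `depth_bits` (e.g. the sortable bits from the earlier depth-key trick) are assumptions, not the repository's exact layout:

```wgsl
@group(0) @binding(0) var<storage, read_write> pair_count: atomic<u32>;
@group(0) @binding(1) var<storage, read_write> keys: array<vec2u>;  // (tile_id, depth bits)
@group(0) @binding(2) var<storage, read_write> values: array<u32>;  // gaussian_id

// Emit one (tile_id | depth, gaussian_id) pair per tile the splat's
// screen-space extent overlaps; atomicAdd reserves a unique output slot,
// so parallel invocations never write to the same index.
fn emit_pairs(gaussian_id: u32, tile_min: vec2u, tile_max: vec2u,
              depth_bits: u32, tiles_x: u32) {
    for (var ty = tile_min.y; ty <= tile_max.y; ty++) {
        for (var tx = tile_min.x; tx <= tile_max.x; tx++) {
            let slot = atomicAdd(&pair_count, 1u);
            keys[slot] = vec2u(ty * tiles_x + tx, depth_bits);
            values[slot] = gaussian_id;
        }
    }
}
```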
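And a sketch of the second pass, which finds each tile's contiguous range in the sorted pair list (again with illustrative names): a range starts wherever a pair's tile id differs from its predecessor's, and ends one past the last pair with that tile id.

```wgsl
@group(0) @binding(0) var<storage, read> sorted_keys: array<vec2u>;        // (tile_id, depth bits)
@group(0) @binding(1) var<storage, read_write> tile_ranges: array<vec2u>;  // (start, end) per tile

@compute @workgroup_size(256)
fn identify_ranges(@builtin(global_invocation_id) gid: vec3u) {
    let i = gid.x;
    let n = arrayLength(&sorted_keys);
    if (i >= n) { return; }
    let tile = sorted_keys[i].x;
    if (i == 0u || sorted_keys[i - 1u].x != tile) {
        tile_ranges[tile].x = i;        // first pair belonging to `tile`
    }
    if (i + 1u == n || sorted_keys[i + 1u].x != tile) {
        tile_ranges[tile].y = i + 1u;   // one past the last pair for `tile`
    }
}
```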
Trade-offs: This approach reduces per-frame fragment shader overdraw in theory but adds preprocessing overhead. For scenes with significant depth complexity and overlapping splats per tile, this can improve performance. For scenes with sparse or well-distributed splats, the overhead may outweigh the benefits.
| Point Cloud Renderer | Gaussian Splatting |
|---|---|
| ![]() | ![]() |
| 272,965 points — point renderer | 272,965 points — gaussian splatting |
Observation: for the bonsai dataset (272,965 points), both renderers saturate the display on the test laptop, hitting the monitor's 144 Hz cap.
| Point Cloud Renderer | Gaussian Splatting |
|---|---|
| ![]() | ![]() |
| 1,063,091 points — point renderer | 1,063,091 points — gaussian splatting |
Measured performance (personal laptop):
- Bonsai (272,965 points): both renderers hit the display cap at 144 Hz.
- Bicycle (1,063,091 points): point-cloud renderer ≈ 100 FPS; Gaussian splatting renderer ≈ 50 FPS; Gaussian splatting with tile-based depth sorting ≈ 40 FPS.
Discussion: the Gaussian splatting renderer rasterizes a screen-space quad per point and evaluates a Gaussian per-fragment. For dense clouds (the bicycle set) this produces significantly more fragment shader work and overdraw compared to rendering simple points. For the bonsai dataset the total fragment workload is low enough that both appear similarly fast.
The tile-based depth sorting variant produces no visible difference in rendered quality compared to the base Gaussian splatting renderer. However, it runs approximately 20% slower (40 FPS vs 50 FPS on bicycle) despite the theoretical benefits of improved memory coherence and reduced overdraw. This performance regression occurs because:
- Preprocessing overhead dominates: the tile-based approach adds significant compute cost in key generation, atomic allocations, sorting the larger tile-pair dataset, and range identification. These stages are necessary every frame and account for substantial GPU time.
- Tile-sorting benefits don't materialize at this scale: while tile-based sorting theoretically improves cache locality, the actual savings in fragment shader evaluation are minimal. The 1M point cloud is already sparse enough per tile that depth coherence doesn't yield meaningful performance gains on modern GPUs with efficient caching.
- Atomic operation contention: multiple Gaussians writing to the same tile allocator creates atomic contention, serializing work that could otherwise be parallelized.
- Small tile sizes: to fit within WebGPU buffer limits (max_tile_pairs ≈ 16M), tile sizes must be modest, spreading splats across many tiles and reducing the per-tile depth-sorting benefit.
Note: these are preliminary numbers from a single machine. More comprehensive benchmarking is required (different GPUs, tile-based profiling, memory bandwidth counters, and varying splat sizes) to draw robust conclusions.
I implemented an incorrect depth calculation and improper sort-buffer usage, which caused the rendered Gaussian quads to refresh with black square artifacts that obscured the view.

| Bonsai | Bicycle |
|---|---|
| ![]() | ![]() |
- Download Node.js

- Clone the repository

  ```sh
  git clone https://github.com/lu-m-dev/WebGPU-gaussian-splatting.git
  ```

- Install dependencies

  ```sh
  cd WebGPU-gaussian-splatting
  npm install
  ```

- Launch the dev server

  ```sh
  npm run dev
  ```

  Optionally, build a static site into `dist/`:

  ```sh
  npm run build
  ```
- Vite
- tweakpane
- stats.js
- wgpu-matrix
- Special Thanks to: Shrek Shao (Google WebGPU team) & Differential Gaussian Renderer







