Open
Labels: enhancement (New feature or request), v0.3 (Advanced: Triton backend, advanced ops)
Description
All neural network layers in src/pygpukit/tts/kokoro/layers.py use CPU fallback implementations via numpy instead of native GPU kernels.
Affected Layers
| Layer | Issue |
|---|---|
| Conv1d | im2col + numpy matmul |
| ConvTranspose1d | Scatter-add via Python loop |
| BertSelfAttention | numpy attention (no FlashAttention) |
| ResBlock1d | Uses Conv1d (CPU) |
| ISTFTNet | Overlap-add via Python loop |
| leaky_relu | numpy where() |
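The ConvTranspose1d and ISTFTNet rows describe the same underlying pattern: many short segments are accumulated into one output buffer at overlapping offsets. As a rough sketch of that pattern (not the code in layers.py), the overlap-add step can be written in NumPy without a Python loop using `np.add.at`. The function name `overlap_add`, the `(n_frames, frame_len)` layout, and the omission of window normalization are all assumptions made for illustration.

```python
import numpy as np


def overlap_add(frames: np.ndarray, hop: int) -> np.ndarray:
    """Sum frames of shape (n_frames, frame_len) into one signal.

    Hypothetical helper: frame t is added at offset t * hop.
    Window normalization, which a full ISTFT needs, is left out.
    """
    n_frames, frame_len = frames.shape
    out = np.zeros((n_frames - 1) * hop + frame_len, dtype=frames.dtype)
    # idx[t, n] is the output sample that frames[t, n] contributes to
    idx = np.arange(n_frames)[:, None] * hop + np.arange(frame_len)[None, :]
    # np.add.at accumulates correctly even when indices repeat
    np.add.at(out, idx, frames)
    return out
```

A native kernel would do the same accumulation with atomic adds or a gather formulation; a helper like this could serve as a CPU oracle in tests.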
Example: Conv1d Implementation
```python
def __call__(self, x: GPUArray) -> GPUArray:
    # Convert to numpy for im2col (can be optimized later)
    x_np = x.to_numpy()
    w_np = self.weight.to_numpy()
    # im2col: extract patches
    for i in range(self.kernel_size):
        for j in range(out_length):
            col[:, :, i, j] = x_np[:, :, j_strided + i_dilated]
    # Matmul
    out_np = np.einsum("bkl,ok->bol", col, w_reshaped)
    return from_numpy(out_np.astype(np.float32))
```
Impact
- Latency: Every layer incurs GPU->CPU->GPU transfer overhead
- Throughput: Python loops are orders of magnitude slower than CUDA
- Memory: Unnecessary copies between GPU and CPU memory
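To measure these costs, or to validate a native kernel once it exists, a loop-free CPU reference is useful. The sketch below builds the im2col patches from the Conv1d excerpt as a strided view with `sliding_window_view` instead of nested Python loops. It assumes PyTorch-style cross-correlation with no bias and no groups; the name `conv1d_reference` and its parameters are hypothetical and not part of pygpukit.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv1d_reference(x, weight, stride=1, dilation=1, padding=0):
    """Hypothetical CPU reference for a 1D convolution.

    x: (B, C_in, L), weight: (C_out, C_in, K) -> (B, C_out, L_out)
    """
    if padding:
        x = np.pad(x, ((0, 0), (0, 0), (padding, padding)))
    k = weight.shape[-1]
    span = dilation * (k - 1) + 1  # input samples covered by one output
    # (B, C_in, L - span + 1, span): a strided view, no data is copied
    windows = sliding_window_view(x, span, axis=-1)
    # keep every stride-th window and every dilation-th tap inside it
    patches = windows[:, :, ::stride, ::dilation]  # (B, C_in, L_out, K)
    out = np.einsum("bclk,ock->bol", patches, weight)
    return out.astype(np.float32)
```

This still runs on the CPU, so it does not remove the transfer overhead; its value is as a fast, easily checked baseline for comparing against a CUDA implementation.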
Note
This is a performance issue, not a functionality issue. The layers work correctly, but slowly.
However, see #179: the main TTS problem is that model._forward_simple() never calls these layers and generates a sine wave instead.
Required Work
- Implement native CUDA conv1d kernel
- Implement native CUDA transpose conv1d kernel
- Reuse the existing SDPA kernel for attention (causal masking is already supported; a bidirectional, non-causal mode still needs to be added)
- Implement LeakyReLU kernel
- Implement ISTFT kernel (or use cuFFT)
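For the attention and LeakyReLU items, small NumPy references can pin down the expected behavior before any kernel work starts. The sketch below is an assumption-laden illustration rather than pygpukit code: `sdpa_reference` and `leaky_relu_reference` are hypothetical names, the `causal` flag stands in for whatever switch the existing SDPA kernel exposes, and the default `negative_slope=0.01` is borrowed from common framework defaults, not from the Kokoro model config.

```python
import numpy as np


def leaky_relu_reference(x, negative_slope=0.01):
    """Elementwise LeakyReLU; the slope default is an assumption."""
    return np.where(x >= 0, x, negative_slope * x)


def sdpa_reference(q, k, v, causal=False):
    """Scaled dot-product attention over (..., seq, head_dim) arrays.

    causal=False is the bidirectional mode a BERT-style encoder needs;
    causal=True masks out keys that come after each query position.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = (q @ np.swapaxes(k, -1, -2)) * scale
    if causal:
        s_q, s_k = scores.shape[-2:]
        mask = np.tril(np.ones((s_q, s_k), dtype=bool))
        scores = np.where(mask, scores, -np.inf)
    # subtract the row maximum for a numerically stable softmax
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

The only difference between the two attention modes here is the mask, which suggests the bidirectional path may amount to skipping the masking step in the existing kernel, though that depends on how the kernel is written.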
Priority
Low - First fix #179 (make TTS functional), then optimize performance.
Related
- #179: bug(tts): Kokoro TTS outputs 440Hz sine wave instead of speech (blocking bug)
- #177: RFC: Image Generation Support (Stable Diffusion, Flux, DiT); image generation needs similar kernels