perf(tts): Kokoro layers use CPU fallback (numpy) instead of GPU kernels #182

@m96-chan

Description

All neural network layers in src/pygpukit/tts/kokoro/layers.py use CPU fallback implementations via numpy instead of native GPU kernels.

Affected Layers

| Layer | Issue |
| --- | --- |
| Conv1d | im2col + numpy matmul |
| ConvTranspose1d | Scatter-add via Python loop |
| BertSelfAttention | numpy attention (no FlashAttention) |
| ResBlock1d | Uses Conv1d (CPU) |
| ISTFTNet | Overlap-add via Python loop |
| leaky_relu | numpy where() |

Example: Conv1d Implementation

def __call__(self, x: GPUArray) -> GPUArray:
    # Convert to numpy for im2col (can be optimized later)
    x_np = x.to_numpy()
    w_np = self.weight.to_numpy()
    
    # im2col: extract patches
    for i in range(self.kernel_size):
        for j in range(out_length):
            col[:, :, i, j] = x_np[:, :, j_strided + i_dilated]
    
    # Matmul
    out_np = np.einsum("bkl,ok->bol", col, w_reshaped)
    
    return from_numpy(out_np.astype(np.float32))
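Until a native kernel exists, a loop-free numpy reference can serve as a correctness oracle for it. The sketch below is an assumption, not the code in layers.py: the name `conv1d_reference` and its parameters are hypothetical, it works on plain numpy arrays rather than GPUArray, and it ignores bias and groups.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv1d_reference(x, weight, stride=1, padding=0, dilation=1):
    """Loop-free 1-D cross-correlation (hypothetical reference helper).

    x:      (batch, in_channels, length)
    weight: (out_channels, in_channels, kernel_size)
    """
    if x.shape[1] != weight.shape[1]:
        raise ValueError("in_channels of x and weight must match")
    kernel_size = weight.shape[2]

    # Zero-pad the length axis on both sides.
    x_pad = np.pad(x, ((0, 0), (0, 0), (padding, padding)))

    # A dilated kernel spans this many input samples.
    span = dilation * (kernel_size - 1) + 1

    # All windows of width `span`: (batch, in_channels, n_windows, span).
    windows = sliding_window_view(x_pad, span, axis=-1)

    # Stride selects window start positions; dilation selects taps inside each window.
    patches = windows[:, :, ::stride, ::dilation]

    # Contract over input channels (c) and kernel taps (k).
    out = np.einsum("bclk,ock->bol", patches, weight)
    return out.astype(np.float32)
```

A future CUDA kernel could be compared against this function with `np.allclose` on random inputs covering several stride, padding, and dilation combinations.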

Impact

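  • Latency: Every layer call incurs a GPU->CPU->GPU round trip (to_numpy() on the inputs and weights, from_numpy() on the output)
  • Throughput: The Python loops (im2col, scatter-add, overlap-add) run serially on the CPU and are typically orders of magnitude slower than an equivalent CUDA kernel
  • Memory: Unnecessary copies between GPU and CPU memory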
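To make the Python-loop cost concrete, here is a minimal sketch of an overlap-add written both as a per-frame loop and as a single vectorized scatter-add. The ISTFTNet code is not shown in this issue, so both functions and their names are hypothetical illustrations of the pattern, not the actual implementation.

```python
import numpy as np


def overlap_add_loop(frames, hop):
    """Overlap-add with one Python iteration per frame (the slow pattern)."""
    n_frames, frame_len = frames.shape
    out = np.zeros(hop * (n_frames - 1) + frame_len, dtype=frames.dtype)
    for i in range(n_frames):
        start = i * hop
        out[start:start + frame_len] += frames[i]
    return out


def overlap_add_vectorized(frames, hop):
    """Same result with a single unbuffered scatter-add."""
    n_frames, frame_len = frames.shape
    out = np.zeros(hop * (n_frames - 1) + frame_len, dtype=frames.dtype)
    # Output index of every sample in every frame: (n_frames, frame_len).
    idx = np.arange(n_frames)[:, None] * hop + np.arange(frame_len)[None, :]
    # np.add.at accumulates correctly when indices repeat (overlapping frames).
    np.add.at(out, idx, frames)
    return out
```

The vectorized form still runs on the CPU, so it only removes the interpreter overhead. The scatter-add formulation is also the shape a GPU kernel would likely take, with one thread per sample and an atomic add into the output.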

Note

This is a performance issue, not a functionality issue. The layers work correctly, but slowly.

However, see #179 - the main TTS issue is that model._forward_simple() doesn't call these layers at all (generates sine wave instead).

Required Work

  1. Implement native CUDA conv1d kernel
  2. Implement native CUDA transpose conv1d kernel
  3. Use the existing SDPA kernel for attention (causal masking is already supported; a bidirectional, unmasked mode still needs to be added)
  4. Implement LeakyReLU kernel
  5. Implement ISTFT kernel (or use cuFFT)
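For item 3, the difference between causal and bidirectional attention comes down to whether a lower-triangular mask is applied before the softmax. The numpy sketch below illustrates that difference and could act as a reference when validating a bidirectional SDPA path. The function name, the `causal` flag, and the `(heads, seq, head_dim)` layout are assumptions made for illustration, not the existing pygpukit SDPA API.

```python
import numpy as np


def sdpa_reference(q, k, v, causal=False):
    """Scaled dot-product attention reference (hypothetical helper).

    q, k, v: (heads, seq_len, head_dim)
    causal=False gives bidirectional attention, as a BERT-style encoder needs.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = np.einsum("hqd,hkd->hqk", q, k) * scale

    if causal:
        # Each query position may only attend to itself and earlier keys.
        seq_q, seq_k = scores.shape[-2:]
        mask = np.tril(np.ones((seq_q, seq_k), dtype=bool))
        scores = np.where(mask, scores, -np.inf)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return np.einsum("hqk,hkd->hqd", weights, v)
```

With `causal=False` no mask is built at all, so the bidirectional mode is the simpler of the two paths.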

Priority

Low. Fix #179 first (make TTS functional), then optimize performance.

Related

Labels: enhancement (New feature or request), v0.3 (Advanced: Triton backend, advanced ops)
