Add comprehensive NumPy comparison tests for GPU kernels #186

@m96-chan

Summary

Current test coverage for GPU kernel operations is incomplete. Most operations lack explicit NumPy comparison tests to verify correctness.

Current Coverage

✅ Tested (vs NumPy)

| Category | Operations | Test File |
|---|---|---|
| Elementwise | add, mul | test_ops.py |
| Matmul | matmul, matmul_tiled | test_ops.py |
| TF32 | matmul (TF32 mode) | test_tf32_api.py |

❌ Missing NumPy Tests

| Category | Operations |
|---|---|
| Elementwise | sub, div, add_inplace, mul_inplace, copy_to |
| Unary | exp, log, relu |
| Reduction | sum, mean, max, softmax |
| NN | gelu, silu, layernorm, rmsnorm |
| Matmul | batched_matmul, transpose, linear_bias_gelu |
| FP8/NVF4 | matmul_fp8*, gemv_*, quantize_* (SM-dependent) |
| Tensor | concat_axis0, repeat_interleave_axis1, transpose_3d_021, cast_* |

Proposed Test Structure

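import numpy as np
import scipy.special

# `gp` is this project's GPU package; import it here as the existing tests in test_ops.py do.

class TestSubOperation: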
    def test_sub_basic(self):
        a_np = np.random.rand(1024).astype(np.float32)
        b_np = np.random.rand(1024).astype(np.float32)
        a, b = gp.from_numpy(a_np), gp.from_numpy(b_np)
        result = gp.sub(a, b).to_numpy()
        np.testing.assert_array_almost_equal(result, a_np - b_np)

class TestExpOperation:
    def test_exp_basic(self):
        x_np = np.random.rand(1024).astype(np.float32)
        x = gp.from_numpy(x_np)
        result = gp.exp(x).to_numpy()
        np.testing.assert_array_almost_equal(result, np.exp(x_np), decimal=5)

class TestSoftmaxOperation:
    def test_softmax_1d(self):
        x_np = np.random.rand(128).astype(np.float32)
        x = gp.from_numpy(x_np)
        result = gp.softmax(x).to_numpy()
        expected = scipy.special.softmax(x_np)
        np.testing.assert_array_almost_equal(result, expected, decimal=5)
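
The tests above compare with `decimal`, which NumPy documents as an absolute check of `abs(desired - actual) < 1.5 * 10**(-decimal)`. For outputs that span several orders of magnitude (for example `exp` on larger inputs), a relative tolerance may be easier to reason about. The following is a minimal sketch of a shared comparison helper built on `np.testing.assert_allclose`. The helper name `assert_matches_numpy` and the tolerance values are assumptions chosen for illustration, not settings taken from this project.

```python
import numpy as np

# Illustrative per-dtype tolerances. These numbers are assumptions and would
# need tuning against the real kernels.
TOLERANCES = {
    np.dtype(np.float32): dict(rtol=1e-5, atol=1e-6),
    np.dtype(np.float16): dict(rtol=1e-2, atol=1e-3),
}


def assert_matches_numpy(result, expected):
    """Compare a kernel result against a NumPy reference (hypothetical helper).

    The tolerance is looked up from the dtype of `result`, and both arrays are
    promoted to float64 before comparing so the check itself adds no rounding.
    """
    result = np.asarray(result)
    expected = np.asarray(expected)
    assert result.shape == expected.shape, "shape mismatch"
    tol = TOLERANCES[result.dtype]
    np.testing.assert_allclose(
        result.astype(np.float64), expected.astype(np.float64), **tol
    )


# Usage with a stand-in for the GPU result: a float32 computation checked
# against a float64 reference.
x = np.linspace(0.0, 10.0, 1024, dtype=np.float32)
assert_matches_numpy(np.exp(x), np.exp(x.astype(np.float64)))
```

In a real test, the first argument would be the array returned by `.to_numpy()` on the GPU result.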

Acceptance Criteria

  • All elementwise operations (sub, div, add_inplace, mul_inplace, copy_to) have NumPy tests
  • All unary operations (exp, log, relu) have NumPy tests
  • All reduction operations (sum, mean, max, softmax) have NumPy tests
  • NN operations (gelu, silu, layernorm, rmsnorm) have NumPy/SciPy reference tests
  • Tensor operations (concat_axis0, repeat_interleave_axis1, transpose_3d_021, cast_*) have NumPy tests
  • Tests use appropriate tolerances for floating-point comparison
  • SM-dependent operations (FP8/NVF4) use pytest.mark.skipif for hardware availability
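
The hardware gating in the last criterion could be sketched as follows. This assumes the compute capability can be read with `nvidia-smi --query-gpu=compute_cap`, which needs a reasonably recent driver. The project may well expose its own device query, in which case that would be the better source. The helper names (`parse_compute_cap`, `detect_sm_version`, `requires_sm90`, `requires_sm120`) are hypothetical, and the mapping of SM 90 to FP8 and SM 120 to NVF4 is an assumption.

```python
import shutil
import subprocess

import pytest


def parse_compute_cap(text):
    """Turn a string such as '9.0' or '12.0' into an integer SM version (90, 120)."""
    major, minor = text.strip().split(".")
    return int(major) * 10 + int(minor)


def detect_sm_version():
    """Return the SM version of the first visible GPU, or None if unknown."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
        return parse_compute_cap(out.splitlines()[0])
    except (subprocess.SubprocessError, OSError, ValueError, IndexError):
        return None


SM_VERSION = detect_sm_version()

# Assumed thresholds: FP8 kernels need SM >= 90, NVF4 kernels need SM >= 120.
requires_sm90 = pytest.mark.skipif(
    SM_VERSION is None or SM_VERSION < 90,
    reason="FP8 kernels assumed to require SM >= 90",
)
requires_sm120 = pytest.mark.skipif(
    SM_VERSION is None or SM_VERSION < 120,
    reason="NVF4 kernels assumed to require SM >= 120",
)


@requires_sm90
def test_matmul_fp8_matches_numpy():
    """Placeholder: compare the FP8 matmul output with a float32 NumPy matmul."""
```

Defining the markers once at module level keeps each test to a single decorator and puts the thresholds in one place.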

Notes

  • Use np.testing.assert_array_almost_equal with appropriate decimal parameter
  • For operations like softmax, use scipy.special.softmax as reference
  • For layernorm/rmsnorm, implement NumPy reference manually
  • FP8/NVF4 tests should skip on unsupported hardware (SM < 90 for FP8, SM < 120 for NVF4)
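
The manual references mentioned above might look like the sketch below. It rests on several assumptions that should be checked against the actual kernels: normalization runs over the last axis, layernorm uses the biased variance, `eps` defaults to `1e-5`, and GELU may be either the exact erf form or the tanh approximation. The function names are hypothetical. Each reference computes in float64 and casts back to the input dtype, so the reference itself contributes little rounding error.

```python
import math

import numpy as np

# Elementwise erf from the standard library, so SciPy is not needed here.
_erf = np.vectorize(math.erf, otypes=[float])


def layernorm_ref(x, gamma, beta, eps=1e-5):
    """LayerNorm over the last axis, using the biased variance (assumed)."""
    x64 = np.asarray(x, dtype=np.float64)
    mean = x64.mean(axis=-1, keepdims=True)
    var = x64.var(axis=-1, keepdims=True)
    y = (x64 - mean) / np.sqrt(var + eps) * gamma + beta
    return y.astype(np.asarray(x).dtype)


def rmsnorm_ref(x, gamma, eps=1e-5):
    """RMSNorm over the last axis: x / sqrt(mean(x**2) + eps) * gamma."""
    x64 = np.asarray(x, dtype=np.float64)
    rms = np.sqrt((x64 ** 2).mean(axis=-1, keepdims=True) + eps)
    return (x64 / rms * gamma).astype(np.asarray(x).dtype)


def silu_ref(x):
    """SiLU: x * sigmoid(x)."""
    x64 = np.asarray(x, dtype=np.float64)
    return (x64 / (1.0 + np.exp(-x64))).astype(np.asarray(x).dtype)


def gelu_ref(x, approximate=False):
    """GELU in the exact erf form, or the tanh approximation if requested."""
    x64 = np.asarray(x, dtype=np.float64)
    if approximate:
        inner = math.sqrt(2.0 / math.pi) * (x64 + 0.044715 * x64 ** 3)
        y = 0.5 * x64 * (1.0 + np.tanh(inner))
    else:
        y = 0.5 * x64 * (1.0 + _erf(x64 / math.sqrt(2.0)))
    return y.astype(np.asarray(x).dtype)
```

The two GELU variants differ by roughly `1e-4` near `x = 1`, so a test at `decimal=5` will fail if the reference uses a different variant than the kernel.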
