Add a benchmark to see how a complex kernel benefits from FKL methodology #240

@morousg

Description

We want to verify that implementing a complex, highly optimized kernel as a DPP really does allow the preceding and following kernels to be fused into it, making those kernels almost free and the overall pipeline much faster.

We found some interesting open-source kernels to experiment with here: https://github.com/karpathy/llm.c/blob/master/dev/cuda/attention_forward.cu
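
As a rough illustration of what such a benchmark would compare, here is a minimal CPU-side sketch in plain C++. It is only an analogy, not FKL code and not taken from llm.c: a row-wise softmax stands in for the "complex kernel", and the names `softmax_rows`, `run_unfused`, `run_fused`, `scale`, and `gain` are all hypothetical. The unfused path runs three passes with two intermediate buffers, while the fused path injects the previous and continuation operations as load/store functors, so the data is read once and written once.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a "complex kernel": a row-wise softmax.
// Load and Store are caller-supplied functors, so work that would otherwise
// run as separate passes before and after it can be fused into this one pass.
template <typename Load, typename Store>
void softmax_rows(std::size_t rows, std::size_t cols, Load load, Store store) {
    std::vector<float> row(cols);  // plays the role of registers / shared memory
    for (std::size_t r = 0; r < rows; ++r) {
        float max_v = -INFINITY;
        for (std::size_t c = 0; c < cols; ++c) {
            row[c] = load(r * cols + c);  // fused "previous" operation
            max_v = std::max(max_v, row[c]);
        }
        float sum = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            row[c] = std::exp(row[c] - max_v);
            sum += row[c];
        }
        for (std::size_t c = 0; c < cols; ++c) {
            store(r * cols + c, row[c] / sum);  // fused "continuation" operation
        }
    }
}

// Unfused baseline: three passes over memory and two intermediate buffers.
std::vector<float> run_unfused(const std::vector<float>& in, std::size_t rows,
                               std::size_t cols, float scale, float gain) {
    std::vector<float> scaled(in.size()), probs(in.size()), out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) scaled[i] = in[i] * scale;
    softmax_rows(rows, cols,
                 [&](std::size_t i) { return scaled[i]; },
                 [&](std::size_t i, float v) { probs[i] = v; });
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = probs[i] * gain;
    return out;
}

// Fused version: the same arithmetic, one pass, no intermediate buffers.
std::vector<float> run_fused(const std::vector<float>& in, std::size_t rows,
                             std::size_t cols, float scale, float gain) {
    std::vector<float> out(in.size());
    softmax_rows(rows, cols,
                 [&](std::size_t i) { return in[i] * scale; },
                 [&](std::size_t i, float v) { out[i] = v * gain; });
    return out;
}
```

The actual benchmark would time the GPU equivalents of `run_unfused` and `run_fused` on the same input and check that both produce matching results, so that any speedup can be attributed to the removed memory round trips.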

Labels

benchmark: code to evaluate performance metrics