A curated list of the best CUDA programming books
Updated May 19, 2026
Real-time stream editing pipeline powered by the FLUX.2-klein-4B model, optimized for consumer GPUs
A GPU-Accelerated First-Order LP Solver
The intelligent OptiScaler installer Linux gamers needed. Automates FSR4, XeSS & DLSS configuration with GPU-optimized profiles for RDNA3/4, Arc & RTX cards.
GVProf: A Value Profiler for GPU-based Clusters
Boost Valheim's FPS to forge a smoother Viking journey!
Fast waifu2x converter with GPU optimization
AI Infrastructure Performance Engineer Learning Track - GPU optimization, inference optimization, and cost reduction
Handwritten Flash Attention 2 CUDA kernel for Blackwell (SM120) with TMA, swizzle, double buffering & warp specialization
KeSSie: huge-context semantic recall for Large Language Models
The GPU Optimizer for ML Models enhances GPU performance for machine learning. It offers advanced scheduling, real-time monitoring, and efficient resource management through a user-friendly web interface and robust API, integrating big data technologies for seamless data processing and model optimization. @NVIDIA
Production-ready checklists and frameworks for deploying LLMs, GenAI models, and AI infrastructure. Covers vLLM, Kubernetes, GPU optimization, observability, compliance, and Day-0 to Day-2 operations.
First open-source real-time face filter app using MediaPipe FaceMesh for high-performance, GPU-accelerated effects.
Bilingual (English and Chinese) CUDA SGEMM optimization tutorial and reference implementation, from naive kernels to Tensor Core WMMA
AI Infrastructure Senior Engineer Learning Track - Advanced ML infrastructure and technical leadership
This is a short course covering GPU optimization techniques for LLM inference
Physics-based computation at scale — Hamiltonian dynamics, spectral theory, and statistical mechanics powering optimization, drug discovery, genomics, molecular proof, and agentic commerce.
Reproduces and optimizes common deep learning operators, with both CUDA and Triton implementations, for learning and reference
Claymore Dual Miner allows simultaneous mining of multiple cryptocurrencies, optimizing your mining profits while efficiently using GPU resources. ⛏️💰