Description
The DLight GEMV schedule rule crashes with KeyError: 'key is not in Map' when applied to any matrix-vector multiplication on an auto-detected CUDA target. The root cause is that gemv.py accesses target.attrs["max_shared_memory_per_block"] unconditionally, but the auto-detected CUDA target does not include this attribute.
Reproduction
import tvm
from tvm import relax
import tvm.relax.op as rop
import tvm.dlight
bb = relax.BlockBuilder()
a = relax.Var("a", relax.TensorStructInfo((128, 256), "float32"))
b = relax.Var("b", relax.TensorStructInfo((256, 1), "float32"))
with bb.function("main", [a, b]):
    with bb.dataflow():
        out = bb.emit(rop.matmul(a, b))
        gv = bb.emit_output(out)
    bb.emit_func_output(gv)
mod = bb.finalize()

pipeline = tvm.ir.transform.Sequential([
    relax.transform.LegalizeOps(),
    tvm.dlight.ApplyDefaultSchedule(
        tvm.dlight.gpu.GEMV(),
        tvm.dlight.gpu.Fallback(),
    ),
])

with tvm.target.Target("cuda"):
    mod = pipeline(mod)  # KeyError here
Error
File "tvm/dlight/gpu/gemv.py", line 161, in apply
and shared_mem_usage.value <= target.max_shared_memory_per_block
File "tvm/target/target.py", line 217, in max_shared_memory_per_block
return int(self.attrs["max_shared_memory_per_block"])
KeyError: 'key is not in Map'
Root cause
Target("cuda") auto-detects the GPU and produces:
cuda -keys=cuda,gpu -arch=sm_75 -max_num_threads=1024 -thread_warp_size=32
The attrs map contains only thread_warp_size, max_num_threads, and arch. The key max_shared_memory_per_block is not populated during auto-detection.
gemv.py:161 reaches this missing key without a guard, via the Target.max_shared_memory_per_block property shown in the traceback:
and shared_mem_usage.value <= int(target.attrs["max_shared_memory_per_block"])
Note: the same pattern exists on the main branch at python/tvm/s_tir/dlight/gpu/gemv.py:162.
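The failure mode can be illustrated without a GPU, using a plain dict in place of the target's attrs map (the real container is a TVM Map, whose C++ side produces the 'key is not in Map' message):

```python
# attrs of the auto-detected CUDA target, per the listing above
# (plain dict standing in for tvm's Map container)
attrs = {"arch": "sm_75", "max_num_threads": 1024, "thread_warp_size": 32}

try:
    # unguarded lookup, mirroring the access at gemv.py:161
    smem = int(attrs["max_shared_memory_per_block"])
except KeyError:
    smem = None

assert smem is None  # the key was never populated, so the lookup fails
```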
Expected behavior
The GEMV schedule rule should either:
- have the CUDA target auto-detection populate max_shared_memory_per_block (queryable via the CUDA API), or
- guard the access with a fallback default, e.g.:
  max_smem = target.attrs.get("max_shared_memory_per_block", 49152)
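A minimal sketch of the guarded variant, again using a plain dict in place of target.attrs (the helper name is illustrative, not TVM API; 49152 bytes = 48 KiB matches the fallback suggested above):

```python
# Conservative per-block shared-memory default, as proposed in the report
DEFAULT_MAX_SMEM = 49152

def max_shared_memory_per_block(attrs):
    # attrs stands in for target.attrs; .get() returns the fallback
    # instead of raising when the key was never populated
    return int(attrs.get("max_shared_memory_per_block", DEFAULT_MAX_SMEM))

# Auto-detected target attrs (missing the key) fall back to the default:
auto_attrs = {"arch": "sm_75", "max_num_threads": 1024, "thread_warp_size": 32}
print(max_shared_memory_per_block(auto_attrs))  # 49152

# A target that does carry the attribute keeps its own value:
full_attrs = dict(auto_attrs, max_shared_memory_per_block=65536)
print(max_shared_memory_per_block(full_attrs))  # 65536
```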
Environment
- TVM: v0.23.0 (also reproduced on main branch source)
- GPU: Tesla T4 (sm_75)
- OS: Ubuntu Linux
- Python: 3.11