🐛 Describe the bug
I'm moving some computations that use the bucketize operator from Kotlin to ExecuTorch.
The model exports (only after I add the op to the exception list in compile_config), but the app crashes when running the model.
It looks like the bucketize kernel is missing. I have already implemented the Tensor and Scalar portable kernels and can open a PR.
To reproduce:
import torch
from torch.export import export
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower, EdgeCompileConfig
from torch import nn


class Bucketizator(nn.Module):
    def forward(self, x, boundaries):
        return torch.bucketize(x, boundaries)


inputs = (torch.randn(2, 2, 2), torch.randn(5))
model = Bucketizator()
exported_program = export(model, inputs)
# Export only succeeds with bucketize added to the core ATen exception list.
eprogram = to_edge_transform_and_lower(
    exported_program,
    partitioner=[XnnpackPartitioner()],
    compile_config=EdgeCompileConfig(
        _core_aten_ops_exception_list=[torch.ops.aten.bucketize.Tensor]
    ),
).to_executorch()
with open("model.pte", "wb") as file:
    file.write(eprogram.buffer)
Running the exported model from Kotlin:
val module = Module.load(modelPath)
val x = Tensor.fromBlob(floatArrayOf(1.5f, 2.5f, 1.5f, 1.5f, 7f, 4.5f, 3.5f, 6f), longArrayOf(2, 2, 2))
val boundaries = Tensor.fromBlob(floatArrayOf(1f, 2f, 3f, 4f, 5f), longArrayOf(5))
val xValue1 = EValue.from(x)
val boundariesValue = EValue.from(boundaries)
val result = module.forward(xValue1, boundariesValue)[0].toTensor().dataAsLongArray
Error log:
kernel 'aten::bucketize.Tensor_out' not found.
dtype: 6 | dim order: [
0,
1,
2,
]
dtype: 6 | dim order: [
0,
]
dtype: 4 | dim order: [
0,
1,
2,
]
dtype: 4 | dim order: [
0,
1,
2,
]
Missing operator: [0] aten::bucketize.Tensor_out
There are 1 instructions don't have corresponding operator registered. See logs for details
ptr
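For reference, here is a minimal sketch of the values the Kotlin call above should return once a bucketize kernel is registered. It is not the proposed portable kernel, just a pure-Python stand-in that assumes the documented `torch.bucketize` semantics with the default `right=False` (`boundaries[i-1] < v <= boundaries[i]`). The helper name `bucketize_reference` is hypothetical, and it works on flat lists, so the (2, 2, 2) input is given in flattened order.

```python
from bisect import bisect_left, bisect_right


def bucketize_reference(values, boundaries, right=False):
    """Pure-Python stand-in for torch.bucketize on flat lists.

    Assumes `boundaries` is sorted ascending. With right=False each value v
    maps to the first index i where boundaries[i] >= v; with right=True it
    maps to the first index i where boundaries[i] > v.
    """
    search = bisect_right if right else bisect_left
    return [search(boundaries, v) for v in values]


# Same data as the Kotlin snippet, flattened from shape (2, 2, 2).
x = [1.5, 2.5, 1.5, 1.5, 7.0, 4.5, 3.5, 6.0]
boundaries = [1.0, 2.0, 3.0, 4.0, 5.0]

expected = bucketize_reference(x, boundaries)
print(expected)  # [1, 2, 1, 1, 5, 4, 3, 5]
```

Comparing `dataAsLongArray` against this list gives a quick check that a new kernel handles values below, between, and above the boundaries.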
Versions
Collecting environment information...
/home/username/miniforge3/envs/executorch/lib/python3.10/site-packages/torch/cuda/__init__.py:384: UserWarning: Found GPU0 NVIDIA GPU which is of compute capability (CC) 6.1.
The following list shows the CCs this version of PyTorch was built for and the hardware CCs it supports:
- 7.5 which supports hardware CC >=7.5,<8.0
- 8.0 which supports hardware CC >=8.0,<9.0 except {8.7}
- 8.6 which supports hardware CC >=8.6,<9.0 except {8.7}
- 9.0 which supports hardware CC >=9.0,<10.0
- 10.0 which supports hardware CC >=10.0,<11.0 except {10.1}
- 12.0 which supports hardware CC >=12.0,<13.0
Please follow the instructions at https://pytorch.org/get-started/locally/ to install a PyTorch release that supports one of these CUDA versions: 12.6
_warn_unsupported_code(d, device_cc, code_ccs)
/home/username/miniforge3/envs/executorch/lib/python3.10/site-packages/torch/cuda/__init__.py:502: UserWarning:
NVIDIA GPU with CUDA capability sm_61 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_75 sm_80 sm_86 sm_90 sm_100 sm_120.
If you want to use the NVIDIA GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
queued_call()
PyTorch version: 2.12.0+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A
OS: Ubuntu 26.04 LTS (x86_64)
GCC version: (Ubuntu 15.2.0-16ubuntu1) 15.2.0
Clang version: 21.1.8 (6ubuntu1)
CMake version: version 4.3.3
Libc version: glibc-2.43
Python version: 3.10.20 | packaged by conda-forge | (main, Mar 5 2026, 16:42:22) [GCC 14.3.0] (64-bit runtime)
Python platform: Linux-7.0.0-22-generic-x86_64-with-glibc2.43
Is CUDA available: True
CUDA runtime version: 12.4.131
CUDA_MODULE_LOADING set to:
GPU models and configuration: GPU 0: NVIDIA GPU
Nvidia driver version: 580.159.03
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD Processor
CPU family: 23
Model: 8
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 2
Frequency boost: enabled
CPU(s) scaling MHz: 72%
CPU max MHz: 3700,0000
CPU min MHz: 2200,0000
BogoMIPS: 7398,71
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
Virtualization: AMD-V
L1d cache: 256 KiB (8 instances)
L1i cache: 512 KiB (8 instances)
L2 cache: 4 MiB (8 instances)
L3 cache: 16 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Ghostwrite: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Old microcode: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT vulnerable
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Mitigation; IBPB before exit to userspace
Versions of relevant libraries:
[pip3] executorch==1.3.1
[pip3] flake8==6.1.0
[pip3] flake8-breakpoint==1.1.0
[pip3] flake8-bugbear==24.4.26
[pip3] flake8-comprehensions==3.14.0
[pip3] flake8-plugin-utils==1.3.3
[pip3] flake8-pyi==23.5.0
[pip3] mypy==1.14.1
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.2.6
[pip3] nvidia-cublas==13.1.1.3
[pip3] nvidia-cuda-cupti==13.0.85
[pip3] nvidia-cuda-nvrtc==13.0.88
[pip3] nvidia-cuda-runtime==13.0.96
[pip3] nvidia-cudnn-cu13==9.20.0.48
[pip3] nvidia-cufft==12.0.0.61
[pip3] nvidia-curand==10.4.0.35
[pip3] nvidia-cusolver==12.0.4.66
[pip3] nvidia-cusparse==12.6.3.3
[pip3] nvidia-cusparselt-cu13==0.8.1
[pip3] nvidia-nccl-cu13==2.29.7
[pip3] nvidia-nvjitlink==13.0.88
[pip3] nvidia-nvtx==13.0.85
[pip3] pytorch_tokenizers==1.3.0
[pip3] torch==2.12.0
[pip3] torchao==0.17.0+git02105d46c
[pip3] torchaudio==2.11.0+cpu
[pip3] torchdata==0.11.0+cpu
[pip3] torchsr==1.0.4
[pip3] torchtune==0.0.0
[pip3] torchvision==0.27.0+cpu
[pip3] triton==3.7.0
[conda] executorch 1.3.1 pypi_0 pypi
[conda] numpy 2.2.6 pypi_0 pypi
[conda] nvidia-cublas 13.1.1.3 pypi_0 pypi
[conda] nvidia-cuda-cupti 13.0.85 pypi_0 pypi
[conda] nvidia-cuda-nvrtc 13.0.88 pypi_0 pypi
[conda] nvidia-cuda-runtime 13.0.96 pypi_0 pypi
[conda] nvidia-cudnn-cu13 9.20.0.48 pypi_0 pypi
[conda] nvidia-cufft 12.0.0.61 pypi_0 pypi
[conda] nvidia-curand 10.4.0.35 pypi_0 pypi
[conda] nvidia-cusolver 12.0.4.66 pypi_0 pypi
[conda] nvidia-cusparse 12.6.3.3 pypi_0 pypi
[conda] nvidia-cusparselt-cu13 0.8.1 pypi_0 pypi
[conda] nvidia-nccl-cu13 2.29.7 pypi_0 pypi
[conda] nvidia-nvjitlink 13.0.88 pypi_0 pypi
[conda] nvidia-nvtx 13.0.85 pypi_0 pypi
[conda] pytorch-tokenizers 1.3.0 pypi_0 pypi
[conda] torch 2.12.0 pypi_0 pypi
[conda] torchao 0.17.0+git02105d46c pypi_0 pypi
[conda] torchaudio 2.11.0+cpu pypi_0 pypi
[conda] torchdata 0.11.0+cpu pypi_0 pypi
[conda] torchfix 0.6.0 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchtune 0.0.0 pypi_0 pypi
[conda] torchvision 0.27.0+cpu pypi_0 pypi
[conda] triton 3.7.0 pypi_0 pypi