[WIP] feat: add InfiniOps as optional kernel provider by chen2021673 · Pull Request #161 · InfiniTensor/InfiniTrain

chen2021673 · 2026-05-28T09:04:57Z

Summary

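Introduces InfiniOps as an optional kernel provider, enabled with USE_INFINIOPS=ON.
The plug-in point sits at Dispatcher::Call(): Dispatcher adds a whitelist hook in GetKernel. Keys that hit the whitelist are routed to the InfiniOps registry; keys that miss fall back to the default CUDA kernel.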

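The three higher-level operators linear, matmul, and outer no longer call GemmCuda directly; they now call Dispatcher::Instance().Call<void>({device.type(), "Gemm"}, ...). As a result, InfiniOps only needs to provide Gemm once to transparently cover all three operators, with no per-operator wrappers.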
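The routing described in the summary can be sketched as follows. This is a minimal illustration, not the PR's actual code: `ToyDispatcher`, `RegisterDefault`, `RegisterProvider`, `Whitelist`, and the toy `int(int, int)` kernel signature are all hypothetical stand-ins, and the choice to fall back when a whitelisted name has no provider kernel is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical stand-ins; the real InfiniTrain types are not shown on this page.
enum class DeviceType { kCPU, kCUDA };
using KernelKey = std::pair<DeviceType, std::string>;
using Kernel = std::function<int(int, int)>; // toy signature for illustration

class ToyDispatcher {
public:
    void RegisterDefault(const KernelKey &key, Kernel k) { default_kernels_[key] = std::move(k); }
    void RegisterProvider(const KernelKey &key, Kernel k) { provider_kernels_[key] = std::move(k); }
    void Whitelist(const std::string &name) { whitelist_.insert(name); }

    // Whitelisted names are looked up in the provider registry first.
    // Everything else (and, by assumption, a whitelisted name with no
    // provider kernel) falls back to the default kernel table.
    const Kernel &GetKernel(const KernelKey &key) const {
        if (whitelist_.count(key.second) > 0) {
            auto hit = provider_kernels_.find(key);
            if (hit != provider_kernels_.end()) {
                return hit->second;
            }
        }
        auto it = default_kernels_.find(key);
        if (it == default_kernels_.end()) {
            throw std::runtime_error("kernel not found: " + key.second);
        }
        return it->second;
    }

    int Call(const KernelKey &key, int a, int b) const { return GetKernel(key)(a, b); }

private:
    std::map<KernelKey, Kernel> default_kernels_;
    std::map<KernelKey, Kernel> provider_kernels_;
    std::unordered_set<std::string> whitelist_;
};
```

With both a default and a provider kernel registered under `{DeviceType::kCUDA, "Gemm"}`, `Call` uses the default kernel until `Whitelist("Gemm")` is invoked, after which the same call site is served by the provider kernel. Callers such as linear, matmul, and outer would not need to change.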

Changes

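Build: adds the USE_INFINIOPS CMake option; third_party/InfiniOps is added as a submodule; when the option is enabled, it is pulled in via add_subdirectory on demand and InfiniOps::infiniops is linked.
Dispatcher: GetKernel gains an InfiniOps lookup hook; behavior is unchanged for keys that are not on the whitelist.
Registry: adds InfiniOpsRegistry (a map separate from the main Dispatcher's), the REGISTER_INFINIOPS_KERNEL macro, and a global whitelist (currently Gemm and AddForward).
Adapter: adapter.{h,cc} provides ToOpsDataType / ToOpsDevice / ToOpsTensor to bridge types and tensors; the dtype/device lookup tables are inline const std::unordered_map.
Handle: the CUDA backend injects the stream through a handle factory, so the adapter is not hard-bound to the CUDA runtime headers.
InfiniOps operator wrappers: gemm.cc and elementwise.cc (AddForward).
Existing CUDA kernel rewrite: common/gemm.{cu,cuh} renames GemmCuda to Gemm and registers it via REGISTER_KERNEL; all GEMM calls in linear/matmul/outer go through Dispatcher.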
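For readers unfamiliar with macro-based kernel registration, the Registry item above can be pictured roughly like this. It is only a sketch under assumptions: the real InfiniOpsRegistry and REGISTER_INFINIOPS_KERNEL definitions are not shown on this page, so `ProviderRegistry`, `Registrar`, `Whitelist()`, `SKETCH_REGISTER_PROVIDER_KERNEL`, and the toy `float(float, float)` signature are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

namespace sketch {

using KernelFn = std::function<float(float, float)>; // toy signature, not the real one

// A registry kept separate from the main dispatcher table (hypothetical layout).
class ProviderRegistry {
public:
    static ProviderRegistry &Instance() {
        static ProviderRegistry registry; // constructed on first use
        return registry;
    }
    void Register(const std::string &name, KernelFn fn) { kernels_[name] = std::move(fn); }
    // Returns nullptr when no kernel is registered under this name.
    const KernelFn *Find(const std::string &name) const {
        auto it = kernels_.find(name);
        return it == kernels_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, KernelFn> kernels_;
};

// Global whitelist; the PR description lists Gemm and AddForward.
inline const std::unordered_set<std::string> &Whitelist() {
    static const std::unordered_set<std::string> names = {"Gemm", "AddForward"};
    return names;
}

// A file-scope object whose constructor performs the registration.
struct Registrar {
    Registrar(const char *name, KernelFn fn) { ProviderRegistry::Instance().Register(name, std::move(fn)); }
};

float AddForward(float a, float b) { return a + b; }

} // namespace sketch

// Expands to a static Registrar, so registration runs during static initialization.
#define SKETCH_REGISTER_PROVIDER_KERNEL(name, fn) \
    static const ::sketch::Registrar g_sketch_registrar_##name(#name, fn);

SKETCH_REGISTER_PROVIDER_KERNEL(AddForward, sketch::AddForward)
```

Because `Instance()` uses a function-local static, the registry exists before any `Registrar` touches it, regardless of the order in which translation units are initialized. In this sketch only AddForward is registered, so a lookup for Gemm returns `nullptr` even though Gemm is on the whitelist.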

Testing

In the current single-GPU tests, performance and precision are aligned.

Issue: InfiniTrain's native CUDA GEMM is fixed to CUBLAS_COMPUTE_32F in gemm.cu (line 63), whereas the InfiniOps NVIDIA GEMM originally used CUBLAS_COMPUTE_32F_FAST_TF32 for fp32. I manually modified the InfiniOps source here.
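To show why the two compute types can give different fp32 results, here is a small host-side emulation. TF32 keeps 10 of fp32's 23 mantissa bits, so inputs lose low-order bits before multiplication. The helper names (`TruncateToTf32`, `DotFp32`, `DotTf32Inputs`) are hypothetical, and truncation is an assumption made for simplicity; actual Tensor Core rounding may differ, and this code does not call cuBLAS.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Emulates TF32 input precision by zeroing the low 13 of the 23 fp32
// mantissa bits (TF32 keeps 10). Truncation is an assumption of this sketch.
inline float TruncateToTf32(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits &= 0xFFFFE000u; // keep sign, 8 exponent bits, top 10 mantissa bits
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// Plain fp32 dot product: full-precision inputs, fp32 accumulation.
inline float DotFp32(const float *a, const float *b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

// Same dot product with inputs reduced to TF32 precision, fp32 accumulation.
inline float DotTf32Inputs(const float *a, const float *b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += TruncateToTf32(a[i]) * TruncateToTf32(b[i]);
    }
    return acc;
}
```

For inputs such as `1 + 2^-12`, the fp32 path preserves the small term while the emulated TF32 path drops it, so the two dot products differ. That kind of gap is what makes a bitwise or tight-tolerance comparison between the two GEMM paths fail unless both use the same compute type.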

TODO: resolve the compile warnings.

Wire InfiniOps in as a pluggable kernel provider keyed at the GEMM level: Dispatcher consults a per-key whitelist hook and routes registered ops to InfiniOps, falling back to the default CUDA kernel otherwise. linear, matmul and outer now invoke Gemm via Dispatcher rather than calling the cuBLAS wrapper directly, so InfiniOps Gemm transparently covers all three.

kilinchange · 2026-05-29T07:32:42Z

Why does this need a separate registry of its own, rather than reusing InfiniTrain's existing registry?

kilinchange · 2026-05-29T07:36:24Z

The contents of this header are fine, but it should not be exposed as a public header under include. Put it in infini_train/src/kernels/common for now.

kilinchange · 2026-05-29T07:39:43Z

There should not be an extra branch for InfiniOps here. When we integrated the MetaX kernels earlier, this part did not need to change.

kilinchange · 2026-05-29T07:40:55Z

@@ -0,0 +1,25 @@
+#include "infini_train/include/core/kernel_provider/infiniops/adapter.h"


This isn't the matching header for this file, is it?

kilinchange · 2026-05-29T07:48:13Z

+
+} // namespace infini_train::kernel_provider::infiniops
+
+REGISTER_INFINIOPS_KERNEL(AddForward, infini_train::kernel_provider::infiniops::AddForward)


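If a separate registration mechanism was written for InfiniOps just to change the registration key, that doesn't seem necessary. Registering by platform is enough.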

kilinchange · 2026-05-29T07:52:34Z

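This part is a necessary, general GEMM interface abstraction and has nothing to do with InfiniOps. Consider submitting it as a separate PR to merge first.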

kilinchange · 2026-05-29T07:54:35Z

 // FIXME: Requires stride tracking in the Tensor class before this can be implemented
 // correctly. Currently always returns true as a placeholder. The contiguous guard in
-// elementwise.cu ensures non-contiguous tensors fall back to the broadcast path.
+// the elementwise provider ensures non-contiguous tensors fall back to the broadcast path.


This doesn't need to change, does it?

kilinchange · 2026-05-29T07:55:50Z

    std::shared_ptr<Tensor> Contiguous();
    // FIXME: Currently returns true unconditionally. Requires stride tracking in the Tensor
-    // class before this can be implemented correctly. The guard in elementwise.cu ensures
+    // class before this can be implemented correctly. The elementwise broadcast guard ensures


No need to change this, right?

chen2021673 force-pushed the infiniops_plug_in branch from 614baf6 to 91b309a Compare May 28, 2026 09:10

chen2021673 changed the title ~~feat: add InfiniOps as optional kernel provider~~ [WIP] feat: add InfiniOps as optional kernel provider May 28, 2026

chen2021673 force-pushed the infiniops_plug_in branch from 91b309a to c7e3c27 Compare May 28, 2026 09:23

chen2021673 force-pushed the infiniops_plug_in branch from c7e3c27 to 865b51c Compare May 29, 2026 06:56

kilinchange reviewed May 29, 2026

chen2021673 wants to merge 1 commit into master from infiniops_plug_in
