Skip to content

fix: Add AccPhase support for matmul and gemv ops#700

Open
jimmychou0 wants to merge 5 commits into
hw-native-sys:mainfrom
jimmychou0:tmp/accphase-api-pr
Open

fix: Add AccPhase support for matmul and gemv ops#700
jimmychou0 wants to merge 5 commits into
hw-native-sys:mainfrom
jimmychou0:tmp/accphase-api-pr

Conversation

@jimmychou0
Copy link
Copy Markdown

Summary

Add AccPhase support for matmul/gemv-related PTO ops and lower it to
the corresponding pto-isa template arguments.

Affected ops

  • pto.tmatmul
  • pto.tmatmul.acc
  • pto.tmatmul.bias
  • pto.tmatmul.mx
  • pto.tgemv
  • pto.tgemv.acc
  • pto.tgemv.bias
  • pto.tgemv.mx

Changes

  • add PTO_AccPhaseAttr
  • expose accPhase on the affected PTO ops with default Unspecified
  • preserve accPhase in PTOViewToMemref
  • lower accPhase to EmitC / generated C++ template arguments
  • add lit coverage for default / Partial / Final cases
  • make minimal verifier updates required by the mx validation cases

Validation

Validated in a clean worktree on dev-481211.

Passed FileCheck-based tests:

  • tmatmul_accphase_emitc.pto
  • tgemv_accphase_emitc.pto
  • tmatmul_mx_accphase_emitc.pto
  • tgemv_mx_accphase_emitc.pto

Verified generated forms include:

  • non-template default calls
  • pto::AccPhase::Partial
  • pto::AccPhase::Final

@reedhecre
Copy link
Copy Markdown

reedhecre commented May 26, 2026

Codex Review

该评论由 review 机器人自动更新。

  • PR: fix: Add AccPhase support for matmul and gemv ops #700 fix: Add AccPhase support for matmul and gemv ops
  • Author: jimmychou0
  • Base/Head: main / tmp/accphase-api-pr
  • Head SHA: 849a30193820
  • Trigger: PR 有新提交
  • Generated At: 2026-05-26T12:45:06Z
  • Previous Head SHA: 995c088ba894
  • Status: completed

Summary

PR #700的 AccPhase 支持存在覆盖缺口:mixed-precision 的 accumulate/bias matmul/gemv 变体仍然不会携带或下发该模板参数。

Findings

  1. P2 AccPhase support stops short of the MX accumulate/bias variants include/PTO/IR/PTOOps.td:883

The new feature is wired through tmatmul.bias, tmatmul.mx, tgemv.bias, and tgemv.mx, but the mixed-precision accumulate/bias forms are still outside that contract. tmatmul.mx.acc, tmatmul.mx.bias, tgemv.mx.acc, and tgemv.mx.bias do not expose an accPhase attribute in ODS, and their EmitC lowerings still hard-code empty template args (lib/PTO/Transforms/PTOToEmitC.cpp at the TGEMV_MX / TMATMUL_MX lowerings around lines 9768, 9788, 9848, and 9868). As a result, any K-loop scheduler that uses the MX accumulate/bias paths still has no way to mark partial vs final iterations, so those calls continue to lower as plain TMATMUL_MX(...) / TGEMV_MX(...) instead of the templated AccPhase form. That is a contract mismatch with the PR’s stated feature and can leave mixed-precision matmul/gemv final-iteration behavior incorrect or unsupported.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces precision mode attributes for various mathematical operations (such as division, exponentiation, logarithm, square root, etc.) and accumulation phase attributes for matrix multiplication and GEMV operations in the PTO dialect. It also updates the verification logic, the conversion patterns to EmitC, and the view-to-memref lowering passes to propagate these attributes correctly. The review feedback correctly identifies a critical issue in the EmitC conversion patterns for TGEMV_BIAS, TMATMUL_BIAS, and TMATMUL_MX. Because these underlying C++ functions are void and modify buffers in-place, generating an emitc::CallOpaqueOp with non-empty result types produces invalid C++ assignments (e.g., auto var = TGEMV_BIAS(...)) that will fail compilation. The suggested fixes correctly replace these patterns to use TypeRange{} and replace the MLIR operations with the destination buffer dst instead.

Comment thread lib/PTO/Transforms/PTOToEmitC.cpp Outdated
Comment thread lib/PTO/Transforms/PTOToEmitC.cpp Outdated
Comment thread lib/PTO/Transforms/PTOToEmitC.cpp Outdated
@HecreReed
Copy link
Copy Markdown
Collaborator

/review

@jimmychou0 jimmychou0 force-pushed the tmp/accphase-api-pr branch from 731563a to 3840b8e Compare May 26, 2026 03:22
@reedhecre
Copy link
Copy Markdown

Manual Codex Review

该评论由 /review 手动触发。

Summary

收到 /review,正在执行 Codex review。

Findings

Review in progress.

@reedhecre
Copy link
Copy Markdown

Manual Codex Review

该评论由 /review 手动触发。

Summary

PR #700 has a TableGen build blocker plus several accPhase integration gaps that currently break codegen and CI.

Findings

  1. P1 Duplicate `PTO_AccPhase` definitions break TableGen generation include/PTO/IR/PTOAttrs.td:534

PTO_AccPhaseEnum / PTO_AccPhaseAttr are now declared twice in the same TableGen file. This PR adds a second copy at lines 534-542, but the original definitions are still present at lines 570-578. TableGen records share a global namespace, so this redefinition blocks ODS/TableGen generation before the C++ build even starts.

  1. P1 EmitC lowering ignores `accPhase` on most newly-attributed matmul/gemv ops lib/PTO/Transforms/PTOToEmitC.cpp:4578

accPhase was added to pto.tgemv, pto.tgemv.acc, pto.tgemv.bias, pto.tgemv.mx, pto.tmatmul.bias, and pto.tmatmul.mx, but their EmitC lowerings still emit bare TGEMV* / TMATMUL* calls with empty template arguments. Only PTOTMatmulToTMATMUL and PTOTMatmulAccToTMATMULACC call buildAccPhaseTemplateArgs(). As a result, {accPhase = partial|final} is silently ignored for most of the newly-covered ops, so the advertised codegen contract is broken and the new *_accphase_emitc.pto tests cannot pass.

  1. P2 `PTOViewToMemref` drops `accPhase` on the new GEMV/matmul variants lib/PTO/Transforms/PTOViewToMemref.cpp:1939

PTOViewToMemrefPass recreates several of the newly-attributed ops without forwarding op.getAccPhaseAttr(). TMatmulBiasOp, TMatmulMxOp, TGemvOp, TGemvAccOp, TGemvBiasOp, and TGemvMxOp are all rebuilt with the default phase, so any non-default partial/final setting is lost as soon as this pass runs. That regresses the memref pipeline even after the EmitC side is fixed.

  1. P2 New A5 accPhase lit checks do not match the current lowering output test/lit/pto/tmatmul_accphase_emitc.pto:27

The new A5 FileCheck expectations use pto::AccPhase::Partial/Final, but buildAccPhaseTemplateArgs() still emits AccPhase::Partial/Final (which is also what the pre-existing A3 test checks). So even the tmatmul / tmatmul.acc cases that do honor accPhase will not match the new checks in tmatmul_accphase_emitc.pto and the sibling A5 accphase tests.

Comment thread include/PTO/IR/PTOAttrs.td Outdated
@jimmychou0 jimmychou0 force-pushed the tmp/accphase-api-pr branch from 995c088 to 849a301 Compare May 26, 2026 11:37
@jimmychou0
Copy link
Copy Markdown
Author

/run a3

@reedhecre
Copy link
Copy Markdown

已接收 /run a3,A3 板测器会处理这条请求。

页面会自动刷新,可以直接看当前阶段、排队情况和最近结果。

@reedhecre
Copy link
Copy Markdown

A3 板测失败

  • 触发方式:manual
  • 源码提交:d4969d5c7d12
  • 结果汇总:OK 217 / FAIL 2 / SKIP 1
  • 日志:/home/zhongxuan/ptoas-board-monitor/runtime/logs/20260526_203006_manual_pr700.log
  • 手动指令:/run a3
  • 触发人:jimmychou0
  • 触发评论:fix: Add AccPhase support for matmul and gemv ops #700 (comment)
  • 失败阶段:board-validation / exit=1

失败用例

  • syncall_binding (run, exit=1)
  • tprefetch_async_binding (run, exit=1)

@reedhecre
Copy link
Copy Markdown

A3 板测失败详情:PR #700

syncall_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507014 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260526_203006_manual_pr700/npu_validation/SyncAll/syncall_binding/main.cpp:84)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 2421703] 2026-05-26-21:13:15.984.727 (EZ9999):  The error from device(chipId:2, dieId:0), serial number is 138, there is an exception of aicore error, core id is 16, error code = 0, dump info: pc start: 0x124800000000, current: 0x124800000188, vec error info: 0, mte error info: 0x7a06000047, ifu error info: 0x212c200018880, ccu error info: 0x40a01900778000d8, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0, 0) errorStr: timeout or trap error. fixp_error0 info: 0x6000047, fixp_error1 info: 0x7a, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x25, [aicore timeout].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AICORE Kernel task happen error, retCode=0x25.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=4, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z22syncall_binding_kernelPii, fault kernel info ext=_Z22syncall_binding_kernelPii, program id=0, hash=3129332313788381512.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=aicore timeout[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507014[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-26 21:13:17] ERROR: testcase failed (exit 1): syncall_binding
tprefetch_async_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260526_203006_manual_pr700/npu_validation/TPrefetchAsync/tprefetch_async_binding/main.cpp:91)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 2653882] 2026-05-26-21:13:55.566.595 (EZ9999):  The error from device(chipId:2, dieId:0), serial number is 139, there is an exception of aivec error, core id is 9, error code = 0, dump info: pc start: 0x124800000000, current: 0x124800000160, vec error info: 0x1e000000a8, mte error info: 0xa50313108b, ifu error info: 0x200003f800000, ccu error info: 0x40a00e0000000052, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0x200000000000000, 0) errorStr: The MPU address access is invalid. fixp_error0 info: 0x313108b, fixp_error1 info: 0xa5, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=4, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z30tprefetch_async_binding_kernelPfPa, fault kernel info ext=_Z30tprefetch_async_binding_kernelPfPa, program id=0, hash=8435686547367685641.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-26 21:13:56] ERROR: testcase failed (exit 1): tprefetch_async_binding

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants