Port frontend tile fusion to EmitC mainline #704
Code Review
This pull request introduces a frontend tile-fusion optimization pipeline for the A5 EmitC mainline, adding passes for fusion planning, instruction scheduling, and last-use marking, along with supporting semantic analyses and C++ post-processing. The review feedback highlights a critical scheduling bug in PTOOpScheduling.cpp where moving only the placement operator breaks the contiguity of the fusion group. Additionally, improvements are suggested to translate a Chinese comment to English in PTOMarkLastUse.cpp, replace std::isdigit with llvm::isDigit in CppPostprocess.cpp to prevent potential undefined behavior, and simplify a redundant ArrayRef conversion in FusionAnalysis.cpp.
        !canMoveLaterAcross(placement, blockingOp))
      break;

    placement->moveAfter(blockingOp);
Moving only placement later via placement->moveAfter(blockingOp) leaves the previously scheduled members of the group behind, which breaks the contiguity of the fusion group. To maintain contiguity, all previously scheduled members of the group must be moved together with placement, or the scheduling logic should be revised to avoid breaking contiguity.
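The first option (moving the whole group together) can be sketched on a toy model. This is only an illustration under stated assumptions: the block is modeled as a `std::vector<std::string>` of op names rather than real MLIR operations, and `moveGroupAfter` and `isContiguous` are hypothetical helper names that do not appear in the PR.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy model (assumption): a block is an ordered list of unique op names.
// This is not the MLIR API; it only illustrates the ordering invariant.
using Block = std::vector<std::string>;

static bool contains(const std::vector<std::string> &v, const std::string &x) {
  return std::find(v.begin(), v.end(), x) != v.end();
}

// Hypothetical helper: place every op in `group` (previously scheduled members
// plus the placement op) immediately after `blockingOp`, keeping their
// relative order. Ops outside the group keep their relative order too.
Block moveGroupAfter(const Block &block, const std::vector<std::string> &group,
                     const std::string &blockingOp) {
  Block moved;
  for (const std::string &op : block)
    if (contains(group, op))
      moved.push_back(op);

  Block result;
  for (const std::string &op : block) {
    if (contains(group, op))
      continue;
    result.push_back(op);
    if (op == blockingOp)
      result.insert(result.end(), moved.begin(), moved.end());
  }
  return result;
}

// Returns true if the group members occupy consecutive positions in the block.
bool isContiguous(const Block &block, const std::vector<std::string> &group) {
  int first = -1, last = -1, count = 0;
  for (int i = 0; i < static_cast<int>(block.size()); ++i) {
    if (!contains(group, block[i]))
      continue;
    if (first < 0)
      first = i;
    last = i;
    ++count;
  }
  return count > 0 && last - first + 1 == count;
}
```

In a real pass, each previously scheduled member would also need its own legality check against `blockingOp` before being moved; the sketch leaves that out.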
      lastUseMask.push_back(0);
      continue;
    }
    // isSpanLocalLastUseCandidate checks a wider range than hasLaterUseAfterSpan
                     : encoded.slice(pos, next);
    if (token.empty())
      return false;
    if (!llvm::all_of(token, [](char c) { return std::isdigit(c); }))
Using std::isdigit with a char argument can lead to undefined behavior if the character is signed and has a negative value. It is safer and more idiomatic in LLVM/MLIR to use llvm::isDigit.
-    if (!llvm::all_of(token, [](char c) { return std::isdigit(c); }))
+    if (!llvm::all_of(token, [](char c) { return llvm::isDigit(c); }))
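The hazard can be shown without LLVM. The following is a minimal standard-library sketch, not the code in CppPostprocess.cpp: `isAsciiDigit` and `isAllDigits` are hypothetical names, and the range comparison stands in for `llvm::isDigit` on the assumption that it behaves the same way for ASCII input.

```cpp
#include <algorithm>
#include <string>

// Locale-independent ASCII digit test. Unlike std::isdigit, a plain range
// comparison is well defined for every char value, including negative ones
// on platforms where char is signed.
constexpr bool isAsciiDigit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical helper mirroring the token check in the snippet above:
// an empty token is rejected, and every character must be an ASCII digit.
bool isAllDigits(const std::string &token) {
  if (token.empty())
    return false;
  return std::all_of(token.begin(), token.end(), isAsciiDigit);
}
```

If `std::isdigit` has to be kept for some reason, the usual workaround is to cast the argument to `unsigned char` before the call.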
    if (info.vRow == ShapedType::kDynamic || info.vCol == ShapedType::kDynamic)
      info.unprovenReason = IterationDomainUnprovenReason::DynamicShape;

    for (Value value : ArrayRef<Value>(anchorValues).drop_front()) {
The explicit conversion to ArrayRef<Value> is redundant because anchorValues is already an ArrayRef<Value>. You can simplify this by calling drop_front() directly on anchorValues.
-    for (Value value : ArrayRef<Value>(anchorValues).drop_front()) {
+    for (Value value : anchorValues.drop_front()) {
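Codex Review
This comment is updated automatically by the review bot.
Summary
Found 3 issues: pure-op bridge scheduling never takes effect, non-contiguous fusion groups are not rejected, and FusionPlan plans unschedulable groups across side-effecting hard boundaries.
Findings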
Once a group has started,
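The third issue in the summary (groups planned across a side-effecting hard boundary) can be illustrated with a toy planner. This is a sketch under assumptions, not the PR's FusionPlan: `ToyOp`, its flags, and `planGroups` are hypothetical, and "hard boundary" is modeled as a simple boolean.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy op description (assumption): real passes would query op semantics.
struct ToyOp {
  std::string name;
  bool fusible;      // candidate member of a fusion group
  bool hardBoundary; // side-effecting op that nothing may be moved across
};

// Assign a group id to each op; -1 means "not in any group". A hard boundary
// closes the currently open group, so no group ever spans it. Non-fusible ops
// that are not boundaries leave the group open, since a scheduler may still
// be able to move members across them.
std::vector<int> planGroups(const std::vector<ToyOp> &ops) {
  std::vector<int> groupIds(ops.size(), -1);
  int nextGroup = 0;
  bool open = false;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (ops[i].hardBoundary) {
      if (open) {
        ++nextGroup;
        open = false;
      }
      continue;
    }
    if (ops[i].fusible) {
      groupIds[i] = nextGroup;
      open = true;
    }
  }
  return groupIds;
}
```

Splitting at planning time means the scheduler never receives a group it cannot make contiguous, which is simpler than detecting and rejecting such groups afterwards.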
Summary
Reintroduce frontend tile fusion on the current A5 EmitC mainline behind
--enable-op-fusion, but keep the implementation intentionally small: PTOViewToMemref pto.last_use directly on scheduled block-local spans
[[pto::last_use(... )]] CALLEE(...) pto.fusion_region / pto.yield lifecycle in the shared mainline
In other words, this PR keeps the user-visible goal of "frontend op scheduling
FusionRegion-based IR contract from the implementation.
What changed
Driver and pipeline
--enable-op-fusion on the current ptoas driver --pto-arch=a5 with --pto-level=level2|level3 FusionPlan OpScheduling PTOMarkLastUse PTOViewToMemref instead of failing compilation
Frontend fusion core
mainline:
FusionAnalysis FusionOpSemantics PTOFusionPlan PTOOpScheduling rather than wrapping them in a region op
last_use implementation PTOMarkLastUse as the place that computes pto.last_use pto.fusion.group_id / pto.fusion.order the span
last_use per tile operand slot, with the following rules: 0 EmitC
last_use output [[pto::last_use(... )]] CALLEE(...) PTOToEmitC CppPostprocess emitted operand order, which keeps the output tile slot at
0 in the final emitted attribute
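A per-slot last-use mask of the kind described above can be sketched on a toy span. Everything here is an assumption for illustration: `ToyCall`, `computeLastUseMask`, and the `liveAfterSpan` list are hypothetical, and the rules used (the output slot is always 0; an input slot is 1 only if the tile is not used later in the span or after it) are a guess at the general idea, not the PR's actual rule set.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy call inside a scheduled span (assumption): tiles are named by strings.
struct ToyCall {
  std::vector<std::string> operands; // tile operands in emitted order
  std::size_t outputSlot;            // slot that holds the output tile
};

// For the call at `index`, return one entry per operand slot: 1 if this call
// is the last use of that tile, otherwise 0. The output slot is always 0.
// `liveAfterSpan` lists tiles that are still used after the span ends.
std::vector<int> computeLastUseMask(const std::vector<ToyCall> &span,
                                    std::size_t index,
                                    const std::vector<std::string> &liveAfterSpan) {
  auto contains = [](const std::vector<std::string> &v, const std::string &x) {
    return std::find(v.begin(), v.end(), x) != v.end();
  };
  const ToyCall &call = span[index];
  std::vector<int> mask;
  for (std::size_t slot = 0; slot < call.operands.size(); ++slot) {
    const std::string &tile = call.operands[slot];
    // A tile is still needed if it is live after the span or read/written
    // by any later call inside the span.
    bool usedLater = contains(liveAfterSpan, tile);
    for (std::size_t j = index + 1; j < span.size() && !usedLater; ++j)
      usedLater = contains(span[j].operands, tile);
    mask.push_back((slot != call.outputSlot && !usedLater) ? 1 : 0);
  }
  return mask;
}
```

For a two-call span where the second call consumes the first call's output, the first call's mask marks only the inputs that the second call does not touch.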
Explicit non-goals / removed scope
pto.fusion_region pto.yield PTOFusionRegionGen PTOFlattenFusionRegion PTOViewToMemref, memory planning, reserved-buffer resolution, sync insertion, or tile-handle materialization
Why this shape
The original larger port bundled three concerns together:
last_use emission
For the current goal, only (1) and (3) are essential. This PR keeps the
useful part of the feature and localizes the extra complexity to
PTOMarkLastUse, instead of requiring multiple existing shared passes to understand and preserve a new region lifecycle.
Testing
Added focused tile-fusion coverage for:
treshape boundary treshape bridge last_use: [[pto::last_use(... )]] emission pto.fusion_region / pto.yield
Focused verification run:
llvm-lit -sv build/test/lit/tile_fusion