support vpto backend#699
Conversation
Introduce a three-pass pipeline that lowers PTO tile ops to vector-level
implementations via TileLang DSL templates:
- ExpandTileOp: invokes TileLang Python DSL to instantiate template
functions and replaces tile ops with func.call. SpecKey covers all
operands; tile_buf operands are passed through without bridging.
- PTOInlineLibCall: extended to recognize tilelang instance functions via
the attribute set by the DSL frontend.
- FoldTileBufIntrinsics: resolves pto.tile_buf_addr / tile_valid_rows /
tile_valid_cols, including dynamic valid-shape via pto.bind_tile chain
tracing.
- MemrefToTileBuf: recovers tile_buf types from memref + bind_tile
metadata after PlanMemory/InsertSync.
- PTOViewToMemref: insert pto.bind_tile anchors for tile_buf function
args so MemrefToTileBuf can recover them.
Adds new PTO ops (tile_buf_addr/tile_valid_rows/tile_valid_cols),
ptoas pipeline wiring, design docs, and unit tests.
Relocate FileCheck-based VPTO .pto tests from test/vpto to test/lit/vpto so they are discovered by the lit framework. Wrap auto-vecscope tests in a vector kernel submodule to match the current VPTO container form.
* add tcolargmax/min tileops lib implementation * fix tcolargmax/min implementations and tile_buf declaration * add init texp high precision implementation * add texp high precision implementation * remove exp high precision code in math.py(already in custom.py) * add empty lines in math.py * fix texp high precision test case's eps * fix texp test case * move exp hp from custom to exp_hp.py
* Rename VPTO MTE semantic ops Input: existing VPTO memory-transfer semantic ops and tests using dma/cube/acc_store-style names. Output: public semantic ops use mte_<src>_<dst> names, with ptr addrspace aliases for the new memory-space spelling. Key steps: update ODS/parser/lowering/expand/ptr-normalize paths; migrate lit, VPTO, and TileLang test inputs; refresh VPTO/ISA docs and legacy raw-op wrapper mappings. --------- Co-authored-by: mouliangyu <mouliangyu@huawei.com>
* Add OP for TMrgSort * Add OP for TMrgSort | fix review * Add constraints to tmov template for UB2UB ND2ND only Add constraint function to restrict template_tmov_basic to only support: - UB2UB: Both src and dst must be in Unified Buffer (memory_space="ub") - ND2ND: Both tiles must have N-dimensional layout (s_layout=NONE_BOX) Other scenarios (GM2UB, UB2GM, specialized layouts) require different implementation paths and are explicitly rejected by this constraint. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: caojian5 <caojian5@huawei.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* Add HP (HIGH_PRECISION) support for TDiv * Add HP for divs recip rowexpanddiv colexpanddiv * [fix] Remove redundant else branch in HIGH_PRECISION mode since MLIR validation guarantees f16/f32 only * [fix] Add missing HIGH_PRECISION test kernels for tdivs ST test HP test cases were defined in cases.py but missing from implementation files, causing FileNotFoundError when compare.py tried to read output.bin. Added 12 HP kernels across three synced layers: - tdivs.pto: HIGH_PRECISION kernels with precision_mode attribute - main.cpp: kCases[] entries and launch wrapper declarations - launch.cpp: Launch functions with correct IEEE 754 scalar values Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: caojian5 <caojian5@huawei.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request significantly expands the project's documentation by adding detailed ISA specifications, design documents for core operations like mad and acc_store, and a framework for TileLang ST validation. It also integrates the PTO-Gym submodule, updates build and Docker configurations for TileLang DSL support, and introduces a script for automated release document generation. The review feedback identifies several areas for improvement in the generation script, including more robust version string parsing, avoiding incorrect heading demotion within code blocks, replacing fragile string matching with structured markers, and ensuring anchor uniqueness to prevent broken links in merged documentation.
| notes[version] = "Release refresh" | ||
|
|
||
| def key_fn(item: str) -> tuple[int, ...]: | ||
| return tuple(int(part) for part in item.split(".")) |
There was a problem hiding this comment.
| new_level = min(6, len(hashes) + levels) | ||
| return f"{'#' * new_level} {heading}" | ||
|
|
||
| return re.sub(r"^(#{1,6})\s+(.*)$", replace, text, flags=re.MULTILINE) |
There was a problem hiding this comment.
The regex r"^(#{1,6})\s+(.*)$" used with re.MULTILINE matches any line starting with a hash followed by a space. This can incorrectly demote comments inside fenced code blocks (e.g., in Python or Bash snippets) if they are at the start of a line. It is recommended to use a more robust way to identify headings that are not part of code blocks, such as splitting the text by code block markers before applying the substitution.
| if "For detailed semantics, C-style pseudocode, and CCE mappings" in line: | ||
| continue | ||
| if "CCE correspondence" in line or "builtin mapping" in line.lower(): | ||
| continue |
There was a problem hiding this comment.
This logic for stripping unwanted lines relies on exact string matches for prose, which is fragile. If the source document (vpto-spec.md) is updated with slightly different wording, these lines will no longer be stripped. Consider using more stable markers or a structured approach to identify content to be removed.
| if anchor_suffix: | ||
| return f"]({anchor_suffix})" |
There was a problem hiding this comment.
When an intra-bundle link contains an anchor (e.g., [Foo](chapter.md#anchor)), it is rewritten to ](#anchor). This assumes that the anchor name is unique across all inlined chapters in the merged document. If multiple chapters use the same anchor names, these links may break or point to the wrong location. Consider prefixing internal anchors with the chapter name during the merge process to ensure uniqueness.
Codex Review该评论由 review 机器人自动更新。
SummaryPR #699 的 VPTO tile-op expansion 存在 3 个明确问题: Findings
这个 PR 给 |
2a4ccae to
db2bf1f
Compare
|
/run a3 |
|
已接收
页面会自动刷新,可以直接看当前阶段、排队情况和最近结果。 |
A3 板测失败
失败用例
|
A3 板测失败详情:PR #699syncall_binding
tprefetch_async_binding
|
Summary
This PR adds the VPTO backend to PTOAS. The new backend is selected with
--pto-backend=vpto; the default backend remainsemitc.Main additions:
tools/ptoas, including VPTO module normalization, TileLang template expansion, VPTO pointer normalization, wrapper-op expansion, LLVM lowering, host stub emission, and fatobj emission.ptoas.Impact on non-VPTO code paths
The default
ptoaspath still uses theemitcbackend. VPTO split/normalize, TileLang expand, VPTO pointer cleanup, VPTO LLVM emission, and fatobj emission are only entered through the VPTO backend branch or explicitly registered passes.This PR does change shared PTO IR and shared passes used by non-VPTO flows:
!pto.ptrnow carries a PTO memory space. The parser/printer accepts forms such as!pto.ptr<T, gm>and!pto.ptr<T, ub>, with GM as the default.getPTOAddressSpaceAttr(Type)now reads the memory space fromPtrType, andPTOViewToMemrefpreserves that memory space when lowering ptr types to memrefs.set_flag,wait_flag,get_buf, andrls_bufnow use custom parsers/printers.get_buf/rls_bufalso acceptpipeattrs in addition to the previous sync op type attrs.castptr,get_tensor_view_stride, andmem_barare added as shared IR surface.precision_modeattr:tcolexpanddiv,tdiv,tdivs,texp,tlog,trecip,trowexpanddiv,trsqrt, andtsqrt. The verifier requiresHIGH_PRECISIONto be used only with f16/f32 element types.castptrvalidates supported cast forms and memory-space consistency; A5tload/tstorevalidate vec layouts; shift-like tile ops require src0/src1/dst element types to match; scalar tile ops such astadds/tmuls/tsubsrequire valid rows to match on A5;tpart*ops validate source valid shapes against destination valid shapes;tselsvalidates row-major layout for mask/src/dst on A5.PTOViewToMemrefpreserving ptr memory spaces, insertingpto.bind_tileanchors for original tile_buf function arguments, loweringget_tensor_view_stride, and preserving attrs when rebuilding selected tile ops.PTOPlanMemoryhandles null alias inputs defensively.PTOToEmitCmaps the extendedReluPreModevaluesScalarRelu,VectorRelu, andPwl.Validation
Local validation run on this branch:
ninja -C build-ci-local ptoas ptobc installninja -C build-ci-local check-pto(416/416)OK=263 FAIL=0 SKIP=16)docker/test_ptoas_cli.sh312tests)281/281)85/85)docker/collect_ptoas_dist.shwas checked locally and failed because the script only collects LLVM libraries from the CI LLVM workspace layout; the local LLVM installation is under the actions-runner tool cache.