[Frontend] Carry gather/scatter offset as an explicit transfer descriptor (#282)#285
Open
YWHyuk wants to merge 2 commits into
Open
[Frontend] Carry gather/scatter offset as an explicit transfer descriptor (#282)#285YWHyuk wants to merge 2 commits into
YWHyuk wants to merge 2 commits into
Conversation
…ptor
Replace the affine.apply{indirect_access} symbol smuggle with an explicit
offset descriptor. convert_indirect_indexing returns the offset spad instead
of folding sympy.Symbol(out) into the index; emit_transfer carries it as a
togsim.transfer operand; decompose_transfer lifts that operand to a
memref.dma_start {indirect_offset = @spad_symbol} attribute (memref.dma_start
is a registered op and rejects an extra operand, but accepts the attribute);
lower_dma_to_gemmini reads the attribute and resolves the global for CONFIG4
(drops _find_indirect); build_skeleton adds the offset spad to the gather DMA
read_bufs so the offset-build -> gather dependency edge forms in the trace.
The index stays clean (base only). Validated on both paths: Spike functional
(computed-index gather + scatter allclose, pointwise/reduce regression 0) and
the C++ trace timing path end-to-end (allclose; gather togsim_dma read_bufs now
includes the offset spad).
Indirect addressing (scattered-DMA timing) in the new trace path is a separate
gap tracked in issue #284.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EEfyUDpkMLRYZ2NAMbb3jN
…ride, compute-loop multi-dim offset Three refinements on top of the explicit offset descriptor: 1. Detect indirect access by an explicit symbol set instead of an "tmp" substring match. indirect_indexing now records str(index_var) in self.indirect_symbols; a _has_indirect(expr) helper tests the index free_symbols against it; the former "tmp"-string sites (store, get_dma_info, convert entry/indirect_dims/stride) use it. Removes the now-dead _find_indirect from lower_dma_to_gemmini. 2. Single indirect dim: pass the raw index spad and let the MVIN apply the gather stride per position (CONFIG4 offset_stride) instead of a vector_load/muli/vector_store round-trip that pre-multiplied the stride. emit_transfer carries offset_stride; decompose copies the attribute; lower programs CONFIG4 with it (was hardcoded to 1). 3. Multi indirect dim (e.g. x[ix, iy]): the offset is a sum of strided indices, which a single CONFIG4 channel cannot do, so the sum stays in the kernel -- but build it in the compute loop (chunked by compute_vec_size, not a single tile-wide vector) and store it to a DEDICATED offset spad so an index that is live elsewhere (x[ix, iy] + ix) is not clobbered. push_step separates the offset-build loop from the gather that reads it. Adds multi-dim gather and index-reuse cases to test_indirect_access. Validated: indirect/scatter/embedding + the two new multi-dim cases pass; add/matmul/softmax regression 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01EEfyUDpkMLRYZ2NAMbb3jN
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #283 (base =
feature/codegen-ordered-steps).Makes the gather/scatter offset an explicit transfer descriptor instead of smuggling it through the DRAM-index
affine.apply{indirect_access}.convert_indirect_indexingreturns the offset spad (drops the+ sympy.Symbol(out)fold and the scalar-load/index_cast); the index stays clean (base only).emit_transfercarries it as atogsim.transferoperand.decompose_transferlifts that operand to amemref.dma_start {indirect_offset = @spad_symbol}attribute.memref.dma_startis a registered op and rejects an extra operand, but it accepts the attribute (verified) -- so we do not need to dropmemref.dma_start(the original concern in Indirect access: make offset an explicit togsim.transfer operand (blocked by memref.dma_start; needs direct togsim.transfer -> gemmini lowering) #282).lower_dma_to_gemminireads the attribute and resolves the global forCONFIG4(drops_find_indirect).build_skeletonadds the offset spad to the gather DMA'sread_bufs, so the offset-build -> gather dependency edge forms in the trace.Validation
allclose; add/matmul/reduce/softmax regression 0.allclose; the generatedtrace.cppgathertogsim_dmaread_bufsnow includes the offset spad.Follow-up
Indirect addressing (scattered-DMA timing) in the new trace path is still missing -- tracked in #284. Remaining cleanup: the
"tmp"-string indirect detection -> explicitindirect_indexingsymbol set.🤖 Generated with Claude Code