This patch folds nvgpu.device_async_copy ops on subviews into the original memref, eliminating the memref.subview ops.
Applied comment. Adding back an empty line after the includes.
mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir | ||
---|---|---|
556 | The test itself is not anchored on c0. After the subview, every read is on %gmem_memref_subview_2d[%c0, %c0], and %c0 is defined in the original IR for use there. The check is that, after folding, the subview indices (IDX_1, IDX_2, IDX_3) are used in the nvgpu.device_async_copy ops, i.e., %[[GMEM_MEMREF_3d]][%[[IDX_1]], %[[IDX_2]], %[[IDX_3]]]. %smem_memref_4d is not folded, so we can ignore it for now. |
mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir | ||
---|---|---|
556 | If the original indices are not zero, they need to be combined with the subview indices, right? Aren't we missing a test for this part? |
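
To make the question about non-zero indices concrete, here is a minimal Python model of the index arithmetic involved when an access on a subview is rewritten as an access on the source memref. It is only a sketch, not the C++ code in the patch: the function name `fold_subview_indices`, the `dropped_dims` parameter, and the sample offsets are hypothetical. It assumes each source index is `offset + index * stride`, and that a rank-reduced unit dimension contributes only its offset.

```python
def fold_subview_indices(offsets, strides, indices, dropped_dims=frozenset()):
    """Map indices on a subview result back to indices on the source memref.

    offsets/strides: one entry per source dimension (the subview operands).
    indices: one entry per dimension of the (possibly rank-reduced) subview.
    dropped_dims: source dimensions removed by rank reduction (hypothetical
    parameter, used here only to model a 3-D to 2-D subview).
    """
    folded = []
    remaining = iter(indices)
    for dim, (offset, stride) in enumerate(zip(offsets, strides)):
        if dim in dropped_dims:
            # Dropped unit dimension: no user index, only the offset remains.
            folded.append(offset)
        else:
            # Kept dimension: combine the user index with the subview offset.
            folded.append(offset + next(remaining) * stride)
    return folded


# Zero indices on the subview: the folded indices are just the subview offsets.
zero_case = fold_subview_indices([3, 5, 7], [1, 1, 1], [0, 0], dropped_dims={0})

# Non-zero indices on the subview: they are added to the subview offsets.
nonzero_case = fold_subview_indices([3, 5, 7], [1, 1, 1], [2, 4], dropped_dims={0})
```

In this model, reading the subview at `[%c0, %c0]` reproduces the subview offsets unchanged, which is what the existing test checks. A test with non-zero indices would additionally exercise the addition step.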
nit: I would leave this empty line