This change borrows the ideas from computeExpanded/CollapsedLayoutMap
and computes the dynamic strides at runtime for the memref descriptors.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp | ||
---|---|---|
1352 | calculated | |
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp | ||
1825–1836 ↗ | (On Diff #423738) | Can you factor this part of the CL out into a separate one? Along with the bufferization test case. |
1826–1827 ↗ | (On Diff #423738) | Good catch, there is definitely a problem here in the old implementation. The old implementation assumes that strides are sorted according to size. This is not always the case, however. E.g., you can use memref.transpose (which only touches the memref descriptor, not the data) to produce a new view on the data that is iterated over in a transposed fashion. The check for 1 only solves this issue partial. Instead, we should be using the minimum stride in the current reassociation group. (The old code always used the last one.) |
1828 ↗ | (On Diff #423738) | This is problematic because we would be failing silently at runtime with a wrong result. We cannot just return failure() unfortunately. We should do the same computation as described above, choosing the minimum stride in the reassociation group. We don't need a loop for this, just a sequence of min operations. |
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp | ||
---|---|---|
1826–1827 ↗ | (On Diff #423738) | The checking for 1 here is need for non transposed cases when the strides are in order. When the innermost dim is of size 1 the result stride needs to take the smallest stride on the src dim that's not size 1. This is the case for the test case collapse_shape_of_slice3 I added. I realize the input of collpase_shape for this case would not be contiguous but it can still be represented by the strided form and is needed for our testcase. I can't think of a good way to detect this at runtime unless inserting runtime assertion to assume dynamic dims are not size 1. Do you have any suggestions on that? |
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp | ||
---|---|---|
1826–1827 ↗ | (On Diff #423738) | We had a discussion and concluded that this handling of dims of size 1 is correct. More work is needed to handle cases where strides are not sorted. This can be done in separate change. |
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp | ||
---|---|---|
1825–1836 ↗ | (On Diff #423738) | Moved this part to a new patch https://reviews.llvm.org/D124137. |
Factored out the part fixing size 1 cases into https://reviews.llvm.org/D124137.
The remaining memref to llvm lowering part needs to be updated to
take into consideration where the dynamic size is 1 as well.
Calculate with the new logic at runtime. I got an error about dominance here https://gist.github.com/949dafb96801866cf067b5baba4ac614. It's complaining the last mem ref descriptor doesn't dominate the use tensor.collapse_shape but it seems to me it does dominate tensor.collapse_shape.
This looks good. Can you add another test case for collapse and expand, so that we have one for the static case (taking the stride directly from the src type) and one for the dynamic case (also checking that the blocks+branches are generated correctly).
mlir/include/mlir/Conversion/LLVMCommon/MemRefBuilder.h | ||
---|---|---|
82 | Returns | |
mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp | ||
1369 | calculated | |
1382 | Add a TODO that we should take the minimum stride in the reassociation group instead of just the first where the dimension is not 1. | |
1505 | Add a comment that there could be mixed static/dynamic strides. For simplicity, we recompute all strides if there is at least one dynamic stride. | |
mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir | ||
812 | Why did this test case change? This commit is just adding new functionality for a previously not handled case, so existing test cases should not change. |
Thanks for the review!
mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir | ||
---|---|---|
812 | yea, I should have kept the implementation for the identity layout case instead of just using the new logic. Changed back the test cases and added the identity layout logic. |
Returns