- Complete rewrite of the verifier.
- CollapseShapeOp verifier will be updated in a subsequent commit.
- Update and expand op documentation.
- Add a new builder that infers the result type based on the source type, result shape and reassociation indices. In essence, only the result layout map is inferred.
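To illustrate what the new builder infers (the concrete types below are an illustrative example, not taken from the patch): the source offset and strides are carried over, and only the strides of the expanded dimensions within each reassociation group are computed.

```mlir
// Hypothetical example: expanding the second dimension of a strided memref.
// Given the source, the result shape [10, 4, 5], and the reassociation
// [[0], [1, 2]], the builder infers the result strides [40, 5, 1].
%1 = memref.expand_shape %0 [[0], [1, 2]]
    : memref<10x20xf32, offset: 5, strides: [40, 1]>
    into memref<10x4x5xf32, offset: 5, strides: [40, 5, 1]>
```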
Event Timeline
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1817: I find the logic unwieldy, and I can't tell offhand whether it is correct or not.

```cpp
unsigned shapeIndex = resultShape.size() - 1;
// 1-1 mapping between srcStrides and reassociation packs.
// Each srcStride starts with the given value and gets expanded according to
// the proper entries in resultShape.
for (auto it : llvm::zip(llvm::reverse(reassociation), llvm::reverse(srcStrides))) {
  ReassociationIndices reassoc = std::get<0>(it);
  int64_t currentStrideToExpand = std::get<1>(it);
  // Note that a stride does not get expanded along the first entry of each
  // shape pack.
  // Example:
  //   strides = [10000, 100, 1],
  //   reassociations = [[0], [1, 2, 3], [4]],
  //   resultSizes = [2, 5, 4, 3, 2] = [[2], [5, 4, 3], [2]]
  //   For the purpose of stride calculation, the useful sizes are:
  //     [x, x, 4, 3, x] = [[x], [x, 4, 3], [x]].
  //   resultStrides = [10000, 1200, 300, 100, 1]
  for (unsigned idx = 0, e = reassoc.size(); idx < e; ++idx) {
    using namespace saturated_arith;
    reverseResultStrides.push_back(currentStrideToExpand);
    currentStrideToExpand = Wrapper(currentStrideToExpand) * resultShape[shapeIndex--];
  }
}
return makeStridedLinearLayoutMap(
    llvm::to_vector<8>(llvm::reverse(reverseResultStrides)), srcOffset,
    srcType.getContext());
```
1853: I forget how no-layout / identity layout are both represented; please double-check that you have both cases working.
1925: Typo.
1955: Please extract the code below into a helper function that you can also reuse in the builder.
update
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1817: The saturated_arith thing didn't help much because I also have to deal with dynamic dim sizes (not just strides), and there are different constants for dynamic strides and dynamic dim sizes. But good idea with the zip; that already made things much simpler. I also added an example. Is this easier to follow now?
1853: From my understanding, "no layout map specified" = "standard layout" = "isIdentity()". Both cases are covered by unit tests.
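As a quick illustration of that equivalence (my own example, not from the patch): a memref with no explicit layout gets the identity layout, so the two spellings below name the same type, and isIdentity() holds for it.

```mlir
// Equivalent spellings: an omitted layout defaults to the identity affine map.
memref<4x5xf32>
memref<4x5xf32, affine_map<(d0, d1) -> (d0, d1)>>
```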
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1806: This is where things become interesting. The ordering of strides might not be the same as the ordering of dimensions. Affine maps (unfortunately?) allow you to specify a non-inner dimension as the fastest varying (see the illustrative layout below). I am OK if the decision is that we aren't going to allow that case, but this patch seems to be making an implicit assumption about striding (unless I am mistaken), and I'd like it to be explicit.
1908: This should be logic that is common to collapse and expand for tensors and memrefs. Could we put this in a utility?
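To make the line-1806 concern concrete, here is one possible layout where stride order differs from dimension order (this particular type is my own illustration, not taken from the patch):

```mlir
// The outer dimension d0 is the fastest varying: strides [1, 2] for a 2x10
// memref, so element (i, j) lives at offset i + 2 * j.
memref<2x10xf32, offset: 0, strides: [1, 2]>
```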
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1806: You mean for the expanded dims, right? The strides of the other dims are already specified in the source memref. But for the expanded dims within a reassociation group, yes, I currently make that assumption. (I didn't think about this much, but when you don't specify any custom layout, this is what you get.) This commit adds a builder that infers the layout map; the function we are looking at here is doing that. The strides of the expanded dims have to be decided during bufferization, either as we do it here or in a different way. When it comes to the verifier, we could make it more permissive and also allow memref result types where the expanded strides have a different ordering. That would make the logic a bit more complex, though, because we could not reuse computeExpandedLayoutMap for both the builder and the verifier as easily anymore.
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1806: @mravishankar could you please elaborate a bit? The algorithm I described does not care about stride value ordering; i.e., we could (and should) replace the doc and test with e.g. [1, ?, 1000].
1817: Yes, the fact that dynamic dims use -1 as a sentinel is an unfortunate accident of history. Still, this is easily solvable in the saturated_arith case with explicit constructors for sizes and strides and a saturated bit.

```cpp
/// Helpers to write more idiomatic operations.
namespace saturated_arith {
struct Wrapper {
  static Wrapper stride(int64_t v) {
    if (ShapedType::isDynamicStrideOrOffset(v))
      // The value does not matter; use 0 for more defensive programming.
      return Wrapper{true, 0};
    return Wrapper{false, v};
  }
  static Wrapper offset(int64_t v) { ... }
  static Wrapper size(int64_t v) { ... }
  int64_t asOffset() { return saturated ? ShapedType::kDynamicStrideOrOffset : v; }
  int64_t asSize() { return saturated ? ShapedType::kDynamicSize : v; }
  int64_t asStride() { return saturated ? ShapedType::kDynamicStrideOrOffset : v; }
  Wrapper operator+(Wrapper a) {
    if (saturated || a.saturated)
      return Wrapper{true, 0};
    return Wrapper{false, a.v + v};
  }
  Wrapper operator*(Wrapper a) {
    if (saturated || a.saturated)
      return Wrapper{true, 0};
    return Wrapper{false, a.v * v};
  }
  bool saturated;
  int64_t v;
};
} // namespace saturated_arith
```

Confining the branchy logic to this helper greatly simplifies the logic and the reader's ability to follow the comments.
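A minimal usage sketch of how such a wrapper would slot into the stride computation (illustrative only; the variable names currentStride, resultShape, and idx are assumptions):

```cpp
// Multiply the running stride by the next result size; if either operand is
// dynamic, the result saturates to the dynamic-stride sentinel.
using namespace saturated_arith;
int64_t nextStride =
    (Wrapper::stride(currentStride) * Wrapper::size(resultShape[idx])).asStride();
```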
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1806: I think what Mahesh said is that there are multiple ways to compute the result strides. E.g., the middle dim of memref<20x10x5xf32, offset: 0, strides: [50, 5, 1]> could be expanded into:
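Presumably the two options were along these lines (an illustrative reconstruction; both describe the same underlying buffer):

```mlir
// Option 1: the inner dims of the expanded group are the fastest varying;
// this is what the builder currently produces.
memref<20x2x5x5xf32, offset: 0, strides: [50, 25, 5, 1]>
// Option 2: the other valid ordering, where the outer expanded dim is the
// fastest varying within the group.
memref<20x2x5x5xf32, offset: 0, strides: [50, 5, 10, 1]>
```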
Is this correct? I currently choose the first one. When calling the builder that infers the layout map (which we need for bufferization), we have to make a decision, and this decision is not currently documented. When it comes to the verifier, we could generalize it to allow both of the above options; at the moment only the first one is allowed.
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1817: Added a helper function computeNextStride that contains the branching logic and is reused during the CollapseShapeOp computation.
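A minimal sketch of what such a helper plausibly looks like (hypothetical; the actual computeNextStride in the patch may differ), assuming it only needs to multiply the running stride by the next size while propagating the dynamic sentinels:

```cpp
// Uses mlir::ShapedType's dynamic-size / dynamic-stride sentinels.
static int64_t computeNextStride(int64_t currentStride, int64_t nextSize) {
  if (ShapedType::isDynamicStrideOrOffset(currentStride) ||
      ShapedType::isDynamic(nextSize))
    return ShapedType::kDynamicStrideOrOffset;
  return currentStride * nextSize;
}
```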
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1806: I see. We can reduce it to just 10xf32 with stride = [1] and split that into 2x5xf32. The underlying data is: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. If we reshape to 2x5 with stride = [5, 1], an i, j iterator with 0 <= i < 2, 0 <= j < 5 accesses the data in the expected order: row 0 reads 0, 1, 2, 3, 4 and row 1 reads 5, 6, 7, 8, 9. However, if we reshape to 2x5 with stride = [2, 1], the same iterator reads 0, 1, 2, 3, 4 for row 0 and 2, 3, 4, 5, 6 for row 1. This is incorrect because only a subset of the data would be accessed and some elements would be accessed twice. So this internal permutation of strides is generally not correct, and one should just use memref.transpose, which does the right thing.
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1908: It can't be used for the tensor op verification yet because it's still in MemRefOps.cpp, but I extracted it into a separate function that will be used in the CollapseShapeOp verification in the next commit.
Removing my blocker because I was mistaken. Please add tests for non-row-major striding.
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1806: Actually, my bad. I missed that the code does not make assumptions about the stride layout ordering. For an expanded dimension, having the stride increase from right to left across the expanded dims is fine. I was looking for tests and didn't find any; please do add tests as Nicolas suggested.
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1841: OK, I see this is now the same logic as I had written down, so we can go ahead.
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1806: It's not clear to me yet what additional test I should add. Expansions into memrefs such as memref<20x2x5x5xf32, offset: 0, strides: [50, 5, 25, 1]> will correctly fail verification. Do we need to extend this?
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1806: I just did another experiment. This implementation only works if the inner dimensions are the fastest-varying ones when it comes to strides. Otherwise, we can get faulty stride computations like here:

```mlir
%r6 = memref.expand_shape %arg6 [[0], [1], [2, 3], [4]]
    : memref<20x2x10x5xf32, offset: 0, strides: [100, 10, 50, 1]>
    into memref<20x2x2x5x5xf32, offset: 0, strides: [100, 10, 250, 50, 1]>
```

This op passes the verifier. (All other result types will be rejected.) I think we should just limit the verifier/op to source memrefs where the strides come in the same order as the dims, and then extend the op if we actually need to support other layout maps.
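For context on where those result strides come from (my own arithmetic, not from the thread): the group [2, 3] expands the source dimension of size 10 and stride 50 into sizes [2, 5]; with the inner-dims-fastest rule, the innermost expanded dim keeps stride 50 and the outer one gets 5 * 50 = 250, yielding [100, 10, 250, 50, 1]. Note that 250 exceeds the outermost stride of 100.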
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1806: Added a check and expanded the op docs to disallow such memrefs. This can be extended in a follow-up revision if necessary.
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1806: I don't see any faulty stride computation, but your source memref has internal aliasing. The restriction seems artificial and based on incorrect assumptions; I addressed this in https://reviews.llvm.org/D122845.
1806: Ordering strides looks ad hoc and undesirable.
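To spell out the internal-aliasing point with the type from the earlier experiment (my own arithmetic): with strides [100, 10, 50, 1], the index tuples (1, 0, 0, 0) and (0, 0, 2, 0) both map to offset 100 (1 * 100 = 2 * 50), so distinct indices already address the same element in the source memref.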