Fold ExtractSliceOp when the source is a constant.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1261 | Drop trivial braces here. | |
1272 | Can you use longer names here? I and E aren't clear to me here. | |
1304 | Drop the llvm:: here. Also do we need the hard coded 6? | |
1313 | Please use m_Constant instead of hardcoding constant operations. | |
1317–1330 | Can you format here? | |
1354 | Drop trivial braces here. |
[mlir] Fold Arithmetic::ConstantOp and Tensor::ExtractSliceOp. │
│
Fold ExtractSliceOp when the source is a constant.
Actually, I was under the impression that you were handling the splat case first before moving on to the non-splat case. In reality, we probably do not want to do such a transformation for the non-splat case. Typically these constants are very large (several MBs). This would end up reading these values element by element in the compiler. That leads to a huge increase in compilation time. Still there is a value of doing that as a preprocessing step within the compiler to (a) reduce runtime costs, and (b) also reduce the binary size of the compiled model.
So one way this has been done in IREE is to extract out such computations as a pre-processing step, compile it, and run it using IREE, and use the result of the run as an input to the rest of the computation.
Long story short, we need to do such transformations only for either splat constants, or "small constants".
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1289 | Probably better to move this out of here and check much earlier. Then you can make the function just take the offsets, strides, and sizes directly as arguments. | |
1321 | I have a broader comment on the constants that need to be folded this way on the patch itself. | |
1326 | Nit: Instead of templating on the iterator type, can we template on the Attribute type directly , i.e. DenseIntElementsAttr/DenseFPElementsAttr? |
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1289 | This will make the code in foldConstant() lengthy so I would keep this as is. | |
1321 | This is done. Thank you for your thoughtful comments! | |
1326 | I tried to use the Attribute only, but there is a difficulty to get the underlying concrete element type solely from the attribute since it is an implementation detail. Let's keep it as is. |
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1274 | Related to the comment below, if this method took the offsets, size and strides directly, say static Attribute sliceDenseElementsAttr(Value constant, ArrayRef<int64_t> offsets, ArrayRef<int64_t> sizes, ArrayRef<int64_t> strides) { ... } then the logic here is agnostic to the fact that these came from a tensor.extract_slice. That separation of concerns seems cleaner to me. | |
1289 | See comment about. Doing this processing in the matchAndRewrite method will separate the actual slice handling of the constant and getting the values of offsets, sizes and strides from the extract_slice op. You could just check for the fold conditions and also retrieve the values directly in the ExtractSliceOp::fold(...) method makes it clear when the folding is done. | |
1323–1329 | You also need to have a size control here to avoid compilation time explosion. |
For some unknown reason, arg diff --update only pushed one commit instead of two.
Now the change for the single use case is correctly updated.
Control the size of constant folding. When the size > 1024 (hard-coded now),
the constant folding won't be applied.
Looks good overall. Just a nit, and maybe some more lit tests?
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1296 | Nit : s/Sice/Since | |
1298 | A better TODO is "If the size of the constant folded is to be controlled, move this out of folding and make it a separate pattern which can accept an option to control the size" | |
mlir/test/Dialect/Tensor/canonicalize.mlir | ||
394 ↗ | (On Diff #408654) | Sorry should have mentioned earlier. Might be worth adding more tests, like having a larger constant and taking an interior out of it, etc. |
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1279 | attr is guaranteed to be non-null here. Also, you can use SplatElementsAttr in place of DenseElementsAttr above and drop this call to isSplat. | |
1284 | Why go to the defining op instead of op.source().hasOneUse()? | |
1303 | nit: Spell out auto here. | |
1320 | ||
1322 | Spell out auto here. | |
1353 | Spell out auto here. |
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1284 | I'm not thrilled by this heuristic, I don't think it the right thing to handle this kind of thing in such an ad-hoc way here. In particular because if there are two uses for this constant and both of them can be folded but both of them are guarded the same way, then we don't fold anything... | |
1298 | Right, but a corollary of the TODO is also that we shouldn't have an ad-hoc threshold like this here, because why this operation and no other? How is the threshold suitable in general? | |
1321 | This is unusual for the folder to look |
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1284 | Thanks for the comment! Even though the pattern caught by the current code looks very limited, it is still incremental enhancement. I am going to add more patterns. I am also very interested in seeing what is the most common pattern regarding the constant folding for the real applications. |
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1284 | I commenting on the guard itself, I'm fine with adding the pattern, I object to the way it is guarded though: both by this "hasOneUse" and the arbitrary "kConstantFoldingMaxNumElements". |
I have added two more unit tests.
I will revisit the change to address Mehdi's comments. Thanks River, Mehdi, and Mahesh for the comments!
I see that you changed the condition to check for splat constants, but the same handling does not apply for splat constants. There is no need to take slices, etc (actually that might even be wrong, not sure)
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1284 | There is a difference between splat constants and non-splat constants. The reasoning for non-splat constants is, say you have a constant of type tensor<4x2xf32>. If there are multiple uses of this constant (potentially accessing different slices of the constant, or the whole constant), you will need to store the constant values in the binary somewhere. Having a new constant of smaller shape is just adding to binary size without any value. You might as well leave the tensor.extract_slice as is. Eventually it will just become a memref.subview followed by load. So when there are multiple uses it makes sense to leave the constant as is. if there is a single use, it means that the values outside of the slice are dead and you can just remove those values. It saves binary size. I agree with you about the kConstantFoldingMaxNumElements is fairly arbitrary. The same reasoning does not hold for splat constants though. So now that this change is targeting splat constants, i'd also suggest dropping both these heuristics. | |
1288 | IIUC, constants cannot have a dynamic shape. | |
1298 | Replied above with more context on this. | |
1321 | I am not sure I understand the comment here. | |
1330 | All of this is not necessary for a splat constant. There is only a single value. You just need to change the type of the constant as dictated by the offsets, sizes and strides. |
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1284 |
Right now the code is targeting only non-splat unless I am mis-reading? |
My bad. Saw the patch too late in the night for me and misread the conditions. Will sync with Okwan to figure out next steps.
Looks good to me.
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1227 | Nit: Combine this and next checks into one if (!sourceType.hasStaticShape() && !resultType.hasStaticShape()) return failure(); | |
1239 | I am not sure we really need this case? | |
mlir/test/lib/Dialect/Tensor/TestTensorTransforms.cpp | ||
67 ↗ | (On Diff #411432) | This will assert if result type is not static. In this case the constant verification enforces that the result of a constant is statically shaped. So maybe its fine, but FYI. |
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1227 | nit: the logic should be ||. Updated. | |
1239 | Are you referring to if (count == 0)? shape itself is used below. I will move it down. | |
mlir/test/lib/Dialect/Tensor/TestTensorTransforms.cpp | ||
67 ↗ | (On Diff #411432) | Yeah, the control function is just another control over whether the folding should happen or not. |
mlir/include/mlir/Dialect/Tensor/Transforms/Transforms.h | ||
---|---|---|
12 ↗ | (On Diff #411476) | Please use forward declaration instead of including headers: https://llvm.org/docs/CodingStandards.html#include-as-little-as-possible |
32 ↗ | (On Diff #411476) | Is it disabled -> Disable ... |
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
18–21 | Trim headers -- you don't need all of these. | |
1179 | Missing doc comment. | |
1181–1205 | Code comments are completely missing. | |
1262–1274 | Likewise. | |
1285 | Doc comment. | |
mlir/test/Dialect/Tensor/fold-constant-extractslice.mlir | ||
1 ↗ | (On Diff #411476) | Nit: extractslice -> extract-slice |
Thanks Uday for the comments. Please take another look.
mlir/include/mlir/Dialect/Tensor/Transforms/Transforms.h | ||
---|---|---|
12 ↗ | (On Diff #411476) | Unfortunately, ExtractSliceOp is used in the default value and it needs its concrete type, so it is not avoidable. |
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
1262–1274 | Please elaborate. I put a simple comment over newAttr. |
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp | ||
---|---|---|
1266 | All comments are to be terminated with a full stop. |
Try to keep the last line of the commit message with Differential Revision: https://reviews.llvm.org/D119605 ; not only will this close the revision, but also it'll allow folks in the future to find the discussion in the review from a git log.
(also the title of the revision / commit could have been updated since this evolved into not hooking up into the fold of the operations)
Reopening: The push to origin/main caused a build break due to a cyclic header inclusion between Tensor.h and Transforms.h. The issue didn't occur in my local machine, but it broke the testing system.
Trim headers -- you don't need all of these.