This is an archive of the discontinued LLVM Phabricator instance.

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1259	Drop trivial braces here.
1270	Can you use longer names here? I and E aren't clear to me here.
1302	Drop the llvm:: here. Also do we need the hard coded 6?
1311	Please use m_Constant instead of hardcoding constant operations.
1315–1328	Can you format here?
1351	Drop trivial braces here.

Harbormaster completed remote builds in B149146: Diff 408103.Feb 11 2022, 5:10 PM

Addressed the comments.

Harbormaster completed remote builds in B149165: Diff 408125.Feb 11 2022, 6:20 PM

[mlir] Fold Arithmetic::ConstantOp and Tensor::ExtractSliceOp.

Fold ExtractSliceOp when the source is a constant.

Harbormaster completed remote builds in B149170: Diff 408131.Feb 11 2022, 7:07 PM

apply clang-format

Harbormaster completed remote builds in B149220: Diff 408192.Feb 12 2022, 10:58 AM

Actually, I was under the impression that you were handling the splat case first before moving on to the non-splat case. In reality, we probably do not want to do such a transformation for the non-splat case. Typically these constants are very large (several MBs). This would end up reading these values element by element in the compiler, which leads to a huge increase in compilation time. There is still value in doing that as a preprocessing step within the compiler, to (a) reduce runtime costs and (b) reduce the binary size of the compiled model.
So one way this has been done in IREE is to extract out such computations as a pre-processing step, compile it, and run it using IREE, and use the result of the run as an input to the rest of the computation.
Long story short, we need to do such transformations only for either splat constants, or "small constants".
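
The policy summarized above (fold only splat constants or "small" constants) can be sketched as a single predicate. This is an illustrative sketch, not code from the patch: `ConstantInfo`, `isCheapToFold`, and the `maxElements` parameter are hypothetical names, and the threshold is deliberately left to the caller rather than fixed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of a constant, for illustration only.
struct ConstantInfo {
  bool isSplat;        // All elements share one value.
  int64_t numElements; // Total number of elements in the constant.
};

// Returns true when folding is expected to be cheap at compile time:
// either no per-element work is needed (splat), or the constant is small.
inline bool isCheapToFold(const ConstantInfo &info, int64_t maxElements) {
  assert(maxElements >= 0 && "threshold must be non-negative");
  return info.isSplat || info.numElements <= maxElements;
}
```

For example, `isCheapToFold(ConstantInfo{false, 4096}, 1024)` rejects a large non-splat constant, while any splat constant is accepted regardless of its size.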

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1287	Probably better to move this out of here and check much earlier. Then you can make the function just take the offsets, strides, and sizes directly as arguments.
1319	I have a broader comment on the constants that need to be folded this way on the patch itself. In addition to that, I'd consider doing this only if the constant op has a single use. If the whole constant is used anyway, then taking a slice of it will eventually just be doing a pointer offset. The value of this transformation is that if the constant has a single use, then you could just discard the unused values.
1324	Nit: Instead of templating on the iterator type, can we template on the `Attribute` type directly, i.e. `DenseIntElementsAttr`/`DenseFPElementsAttr`?

This revision now requires changes to proceed.Feb 12 2022, 10:20 PM

Addressed Mahesh's comment by folding the single use case.

Harbormaster completed remote builds in B149497: Diff 408586.Feb 14 2022, 1:07 PM

okkwon marked 2 inline comments as done.Feb 14 2022, 1:11 PM

okkwon added inline comments.

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1287	This will make the code in foldConstant() lengthy so I would keep this as is.
1319	This is done. Thank you for your thoughtful comments!
1324	I tried to use the Attribute only, but it is difficult to get the underlying concrete element type solely from the attribute, since that is an implementation detail. Let's keep it as is.

mravishankar requested changes to this revision.Feb 14 2022, 1:19 PM

mravishankar added inline comments.

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1243–1343	You also need to have a size control here to avoid compilation time explosion.
1243–1343	Related to the comment below, if this method took the offsets, sizes and strides directly, say `static Attribute sliceDenseElementsAttr(Value constant, ArrayRef<int64_t> offsets, ArrayRef<int64_t> sizes, ArrayRef<int64_t> strides) { ... }`, then the logic here is agnostic to the fact that these came from a `tensor.extract_slice`. That separation of concerns seems cleaner to me.
1287	See comment above. Doing this processing in the `matchAndRewrite` method will separate the actual slice handling of the constant from getting the values of offsets, sizes and strides from the `extract_slice` op. You could just check for the fold conditions and also retrieve the values directly in the `ExtractSliceOp::fold(...)` method, which makes it clear when the folding is done.
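
A minimal, MLIR-free sketch of the separation suggested above: a helper that takes only the offsets, sizes, and strides and knows nothing about `tensor.extract_slice`. `sliceRowMajor` and its use of `std::vector` are assumptions made for illustration; the patch itself works on `DenseElementsAttr` iterators. Sizes are treated as per-dimension element counts, as in `tensor.extract_slice`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: extracts a strided slice from a row-major buffer.
// `shape` is the static shape of the source; `offsets`, `sizes`, and
// `strides` each hold one entry per dimension.
template <typename T>
std::vector<T> sliceRowMajor(const std::vector<T> &values,
                             const std::vector<int64_t> &shape,
                             const std::vector<int64_t> &offsets,
                             const std::vector<int64_t> &sizes,
                             const std::vector<int64_t> &strides) {
  const std::size_t rank = shape.size();
  assert(offsets.size() == rank && sizes.size() == rank &&
         strides.size() == rank);

  // counts[d] is the number of source elements skipped by one step along
  // dimension d (the row-major linear stride of that dimension).
  std::vector<int64_t> counts(rank, 1);
  for (std::size_t d = rank; d-- > 1;)
    counts[d - 1] = counts[d] * shape[d];

  int64_t numResults = 1;
  for (int64_t size : sizes)
    numResults *= size;

  std::vector<T> result;
  result.reserve(static_cast<std::size_t>(numResults));
  std::vector<int64_t> index(rank, 0); // Position inside the slice.
  for (int64_t n = 0; n < numResults; ++n) {
    int64_t linear = 0;
    for (std::size_t d = 0; d < rank; ++d)
      linear += (offsets[d] + index[d] * strides[d]) * counts[d];
    result.push_back(values[static_cast<std::size_t>(linear)]);
    // Advance the slice position, innermost dimension fastest.
    for (std::size_t d = rank; d-- > 0;) {
      if (++index[d] < sizes[d])
        break;
      index[d] = 0;
    }
  }
  return result;
}
```

Because the helper only sees plain index arrays, the caller is responsible for rejecting dynamic offsets, sizes, or strides before calling it, which is the split of responsibilities proposed in the comment.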

This revision now requires changes to proceed.Feb 14 2022, 1:19 PM

For some unknown reason, arc diff --update only pushed one commit instead of two.
Now the change for the single use case is correctly updated.

Harbormaster completed remote builds in B149504: Diff 408592.Feb 14 2022, 1:23 PM

Control the size of constant folding. When the size > 1024 (hard-coded now),
the constant folding won't be applied.

Harbormaster completed remote builds in B149511: Diff 408602.Feb 14 2022, 1:47 PM

Can you re-upload with the full diff?

hoist various checks to fold() to clean up the code

Harbormaster completed remote builds in B149541: Diff 408639.Feb 14 2022, 3:11 PM

rriddle added inline comments.Feb 14 2022, 3:16 PM

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1244	When is the result not a ShapedType?
1248–1249
1248–1249

Updated to use cast to get ShapedType.

Harbormaster completed remote builds in B149544: Diff 408643.Feb 14 2022, 3:23 PM

reuse resultType

Harbormaster completed remote builds in B149545: Diff 408644.Feb 14 2022, 3:26 PM

okkwon marked 4 inline comments as done.Feb 14 2022, 3:28 PM

okkwon added inline comments.

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1243–1343	Hoisted the offsets, sizes, and strides calculation to `fold()`. Now the fold function directly calls `sliceElements()`.
1244	I will use cast. The type should always be ShapedType. Thanks!

okkwon updated this revision to Diff 408646.Feb 14 2022, 3:30 PM

okkwon marked an inline comment as done.

Resolve the patch error

Harbormaster completed remote builds in B149546: Diff 408646.Feb 14 2022, 3:30 PM

rebase onto main

Fix a missing change for IterTy and ElemTy

Harbormaster completed remote builds in B149550: Diff 408654.Feb 14 2022, 4:33 PM

Looks good overall. Just a nit, and maybe some more lit tests?

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1294	Nit : s/Sice/Since
1296	A better TODO is "If the size of the constant folded is to be controlled, move this out of folding and make it a separate pattern which can accept an option to control the size"
mlir/test/Dialect/Tensor/canonicalize.mlir
394	Sorry should have mentioned earlier. Might be worth adding more tests, like having a larger constant and taking an interior out of it, etc.

This revision now requires changes to proceed.Feb 14 2022, 4:46 PM

rriddle added inline comments.Feb 14 2022, 5:20 PM

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1277	attr is guaranteed to be non-null here. Also, you can use SplatElementsAttr in place of DenseElementsAttr above and drop this call to isSplat.
1282	Why go to the defining op instead of op.source().hasOneUse()?
1301	nit: Spell out auto here.
1318
1320	Spell out auto here.
1350	Spell out auto here.

mehdi_amini added inline comments.Feb 14 2022, 5:43 PM

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1282	I'm not thrilled by this heuristic; I don't think it is the right thing to handle this kind of thing in such an ad-hoc way here. In particular, if there are two uses for this constant and both of them can be folded, but both of them are guarded the same way, then we don't fold anything...
1296	Right, but a corollary of the TODO is also that we shouldn't have an ad-hoc threshold like this here, because why this operation and no other? How is the threshold suitable in general?
1319	This is unusual for the folder to look

okkwon added inline comments.Feb 14 2022, 6:55 PM

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1282	Thanks for the comment! Even though the pattern caught by the current code looks very limited, it is still an incremental enhancement. I am going to add more patterns. I am also very interested in seeing what the most common pattern is regarding constant folding in real applications.

mehdi_amini added inline comments.Feb 14 2022, 8:20 PM

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1282	I am commenting on the guard itself. I'm fine with adding the pattern; I object to the way it is guarded though: both by this "hasOneUse" and the arbitrary "kConstantFoldingMaxNumElements".

add two more unit tests

I have added two more unit tests.

I will revisit the change to address Mehdi's comments. Thanks River, Mehdi, and Mahesh for the comments!

Harbormaster completed remote builds in B149853: Diff 409085.Feb 15 2022, 5:40 PM

I see that you changed the condition to check for splat constants, but the same handling does not apply for splat constants. There is no need to take slices, etc (actually that might even be wrong, not sure)

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1282	There is a difference between splat constants and non-splat constants. The reasoning for non-splat constants is as follows. Say you have a constant of type `tensor<4x2xf32>`. If there are multiple uses of this constant (potentially accessing different slices of the constant, or the whole constant), you will need to store the constant values in the binary somewhere. Having a new constant of smaller shape just adds to binary size without any value. You might as well leave the `tensor.extract_slice` as is. Eventually it will just become a `memref.subview` followed by a load. So when there are multiple uses it makes sense to leave the constant as is. If there is a single use, it means that the values outside of the slice are dead and you can just remove those values. It saves binary size. I agree with you that `kConstantFoldingMaxNumElements` is fairly arbitrary. The same reasoning does not hold for splat constants though. So now that this change is targeting splat constants, I'd also suggest dropping both these heuristics.
1286	IIUC, constants cannot have a dynamic shape.
1296	Replied above with more context on this.
1319	I am not sure I understand the comment here.
1328	All of this is not necessary for a splat constant. There is only a single value. You just need to change the type of the constant as dictated by the offsets, sizes and strides.
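
To illustrate the point about splat constants, here is a small sketch under stated assumptions: `SplatTensor` and `sliceSplat` are hypothetical stand-ins (a single value plus a static shape), not MLIR types, and rank-reducing slices are ignored. Since every element is the same, the slice keeps the value and only takes a new shape from the sizes; offsets and strides matter only for bounds checking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a splat constant: one value and a static shape.
struct SplatTensor {
  float value;
  std::vector<int64_t> shape;
};

// Slices a splat without reading any elements. The result holds the same
// value; its shape is given by `sizes`.
inline SplatTensor sliceSplat(const SplatTensor &source,
                              const std::vector<int64_t> &offsets,
                              const std::vector<int64_t> &sizes,
                              const std::vector<int64_t> &strides) {
  const std::size_t rank = source.shape.size();
  assert(offsets.size() == rank && sizes.size() == rank &&
         strides.size() == rank);
  for (std::size_t d = 0; d < rank; ++d) {
    if (sizes[d] == 0)
      continue; // An empty dimension touches no source element.
    assert(offsets[d] >= 0 && strides[d] >= 1);
    // The last element touched along dimension d must be in bounds.
    assert(offsets[d] + (sizes[d] - 1) * strides[d] < source.shape[d]);
  }
  return SplatTensor{source.value, sizes};
}
```

The cost is independent of the number of elements, which is why neither the single-use guard nor a size threshold is needed for this case.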

This revision now requires changes to proceed.Feb 15 2022, 9:16 PM

mehdi_amini added inline comments.Feb 15 2022, 9:32 PM

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1282	> So now that this change is targeting splat constants
Right now the code is targeting only non-splat unless I am mis-reading?

Right. It does not handle the splat case yet.

My bad. Saw the patch too late in the night for me and misread the conditions. Will sync with Okwan to figure out next steps.

Add control function support

The code is updated with the control function approach. Please take a look.
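
A minimal sketch of what a control function approach can look like, assuming a callback that decides whether a given fold should happen. `FoldCandidate`, `ControlFoldFn`, and `ConstantSliceFolder` are hypothetical names for illustration and do not reproduce the signatures added by this patch.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical description of a candidate fold, for illustration only.
struct FoldCandidate {
  bool sourceHasOneUse;      // The constant feeds only this slice.
  int64_t resultNumElements; // Size of the folded constant.
};

// The policy is supplied by the user of the pattern.
using ControlFoldFn = std::function<bool(const FoldCandidate &)>;

// Pattern-like object: the policy is injected instead of being hard-coded
// as a use-count check or an element-count threshold.
class ConstantSliceFolder {
public:
  explicit ConstantSliceFolder(ControlFoldFn control)
      : control(std::move(control)) {}

  // In this sketch, an empty control function means "never fold".
  bool shouldFold(const FoldCandidate &candidate) const {
    return control && control(candidate);
  }

private:
  ControlFoldFn control;
};
```

A client that wants the earlier heuristics can pass a lambda such as `[](const FoldCandidate &c) { return c.sourceHasOneUse && c.resultNumElements <= 1024; }`, while another client can choose a different policy without touching the pattern.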

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1282	I used the control function approach to make the decision user-controllable.
1286	I believe this comment is outdated and the code shows a

Harbormaster completed remote builds in B151489: Diff 411432.Feb 25 2022, 9:30 AM

Looks good to me.

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1225	Nit: Combine this and next checks into one: `if (!sourceType.hasStaticShape() && !resultType.hasStaticShape()) return failure();`
1237	I am not sure we really need this case?
mlir/test/lib/Dialect/Tensor/TestTensorTransforms.cpp
67 ↗	(On Diff #411432)	This will assert if result type is not static. In this case the constant verification enforces that the result of a constant is statically shaped. So maybe it's fine, but FYI.

This revision is now accepted and ready to land.Feb 25 2022, 10:09 AM

Address Mahesh's comments

okkwon marked an inline comment as done.Feb 25 2022, 11:30 AM

okkwon added inline comments.

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1225	nit: the logic should be `||`. Updated.
1237	Are you referring to `if (count == 0)`? `shape` itself is used below. I will move it down.
mlir/test/lib/Dialect/Tensor/TestTensorTransforms.cpp
67 ↗	(On Diff #411432)	Yeah, the control function is just another control over whether the folding should happen or not. The sanity check for the folding itself is done in the folding function.
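
The `&&` versus `||` point from the 1225 thread can be checked with a tiny sketch. `ShapeInfo` and both function names are hypothetical; the two booleans stand in for the results of the `hasStaticShape()` calls. Folding needs both shapes to be static, so by De Morgan's law the bail-out condition is "either one is dynamic".

```cpp
// Hypothetical flags standing in for the two hasStaticShape() results.
struct ShapeInfo {
  bool sourceIsStatic;
  bool resultIsStatic;
};

// Correct guard: bail out if either shape is dynamic.
inline bool mustBailOut(const ShapeInfo &info) {
  return !info.sourceIsStatic || !info.resultIsStatic;
}

// The combined check as first suggested: it only bails out when both
// shapes are dynamic, so a static source with a dynamic result slips by.
inline bool mustBailOutWithAnd(const ShapeInfo &info) {
  return !info.sourceIsStatic && !info.resultIsStatic;
}
```

The two versions disagree exactly when one shape is static and the other is dynamic.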

Harbormaster completed remote builds in B151521: Diff 411476.Feb 25 2022, 11:46 AM

bondhugula requested changes to this revision.Feb 26 2022, 2:45 AM

bondhugula added a subscriber: bondhugula.

bondhugula added inline comments.

mlir/include/mlir/Dialect/Tensor/Transforms/Transforms.h
12 ↗	(On Diff #411476)	Please use forward declaration instead of including headers: https://llvm.org/docs/CodingStandards.html#include-as-little-as-possible
32 ↗	(On Diff #411476)	Is it disabled -> Disable ...
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
18–21	Trim headers -- you don't need all of these.
1177	Missing doc comment.
1179–1203	Code comments are completely missing.
1260–1272	Likewise.
1283	Doc comment.
mlir/test/Dialect/Tensor/fold-constant-extractslice.mlir
1 ↗	(On Diff #411476)	Nit: extractslice -> extract-slice

This revision now requires changes to proceed.Feb 26 2022, 2:45 AM

Thanks Uday for the comments. Please take another look.

mlir/include/mlir/Dialect/Tensor/Transforms/Transforms.h
12 ↗	(On Diff #411476)	Unfortunately, `ExtractSliceOp` is used in the default value and it needs its concrete type, so it is not avoidable.
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1260–1272	Please elaborate. I put a simple comment over `newAttr`.

Addressed Uday's comments.

Harbormaster completed remote builds in B151635: Diff 411642.Feb 26 2022, 2:10 PM

bondhugula removed a reviewer: bondhugula.Feb 27 2022, 9:20 AM

bondhugula added inline comments.

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
1264	All comments are to be terminated with a full stop.

This revision is now accepted and ready to land.Feb 27 2022, 9:20 AM

Merged to origin/main.

Try to keep the last line of the commit message with Differential Revision: https://reviews.llvm.org/D119605 ; not only will this close the revision, but also it'll allow folks in the future to find the discussion in the review from a git log.

(also the title of the revision / commit could have been updated since this evolved into not hooking up into the fold of the operations)

Reopening: The push to origin/main caused a build break due to a cyclic header inclusion between Tensor.h and Transforms.h. The issue didn't occur on my local machine, but it broke the testing system.

This revision is now accepted and ready to land.Feb 28 2022, 11:20 AM

Fix the bazel build error and rewrite the commit message.

Harbormaster completed remote builds in B151806: Diff 411883.Feb 28 2022, 2:07 PM

This revision was landed with ongoing or failed builds.Feb 28 2022, 3:09 PM

Closed by commit rG4c901bf44719: [mlir] Match Arithmetic::ConstantOp and Tensor::ExtractSliceOp. (authored by okkwon). · Explain Why

This revision was automatically updated to reflect the committed changes.

okkwon added a commit: rG4c901bf44719: [mlir] Match Arithmetic::ConstantOp and Tensor::ExtractSliceOp..

Revision Contents

Path	Size
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp	106 lines
mlir/test/Dialect/Tensor/canonicalize.mlir	13 lines

Diff 408651

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp

#include "mlir/Dialect/Arithmetic/Utils/Utils.h"
#include "mlir/Dialect/Complex/IR/Complex.h"
#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/Dialect/Utils/ReshapeOpsUtils.h"
#include "mlir/Dialect/Utils/StaticValueUtils.h"
#include "mlir/IR/BlockAndValueMapping.h"
#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinAttributeInterfaces.h"
+ #include "mlir/IR/BuiltinAttributes.h"
+ #include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/Matchers.h"
+ #include "mlir/IR/OpDefinition.h"

bondhugula (Done): Trim headers -- you don't need all of these.

#include "mlir/IR/PatternMatch.h"
#include "mlir/IR/TypeUtilities.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallBitVector.h"
using namespace mlir;
using namespace mlir::tensor;

[1,139 lines not shown]

    Value newSlice = rewriter.create<ExtractSliceOp>(
        sliceOp.static_sizes(), sliceOp.static_strides());
    rewriter.replaceOpWithNewOp<tensor::CastOp>(sliceOp, sliceOp.getType(),
                                                newSlice);
    return success();
  }
};
} // namespace
/// Return the canonical type of the result of an extract_slice op.

bondhugula (Done): Missing doc comment.

struct SliceReturnTypeCanonicalizer {
  RankedTensorType operator()(ExtractSliceOp op,
                              ArrayRef<OpFoldResult> mixedOffsets,
                              ArrayRef<OpFoldResult> mixedSizes,
                              ArrayRef<OpFoldResult> mixedStrides) {
    return getCanonicalSliceResultType(op.getType().getRank(),
                                       op.getSourceType(), mixedOffsets,
                                       mixedSizes, mixedStrides);
  }
};

/// A canonicalizer wrapper to replace ExtractSliceOps.
struct SliceCanonicalizer {
  void operator()(PatternRewriter &rewriter, ExtractSliceOp op,
                  ExtractSliceOp newOp) {
    Value replacement = newOp.getResult();
    if (replacement.getType() != op.getType())
      replacement = rewriter.create<tensor::CastOp>(op.getLoc(), op.getType(),
                                                    replacement);
    rewriter.replaceOp(op, replacement);
  }
};
void ExtractSliceOp::getCanonicalizationPatterns(RewritePatternSet &results,
                                                 MLIRContext *context) {
  results.add<

bondhugula (Done): Code comments are completely missing.

      OpWithOffsetSizesAndStridesConstantArgumentFolder<
          ExtractSliceOp, SliceReturnTypeCanonicalizer, SliceCanonicalizer>,
      ExtractSliceOpCastFolder>(context);
}

//
static LogicalResult
foldIdentityOffsetSizeAndStrideOpInterface(OffsetSizeAndStrideOpInterface op,
                                           ShapedType shapedType) {
  OpBuilder b(op.getContext());
  for (OpFoldResult ofr : op.getMixedOffsets())
    if (getConstantIntValue(ofr) != static_cast<int64_t>(0))
      return failure();
  // Rank-reducing noops only need to inspect the leading dimensions: llvm::zip
  // is appropriate.
  auto shape = shapedType.getShape();
  for (auto it : llvm::zip(op.getMixedSizes(), shape))
    if (getConstantIntValue(std::get<0>(it)) != std::get<1>(it))
      return failure();
  for (OpFoldResult ofr : op.getMixedStrides())
    if (getConstantIntValue(ofr) != static_cast<int64_t>(1))
      return failure();

mravishankar (Done): Nit: Combine this and next checks into one
    if (!sourceType.hasStaticShape() && !resultType.hasStaticShape()) return failure();

okkwon (Author, Done): nit: the logic should be `||`. Updated.

  return success();
}

/// If we have an ExtractSliceOp consuming an InsertSliceOp with the same slice,
/// we can return the InsertSliceOp's source directly.
// TODO: This only checks the immediate producer; extend to go up the
// insert/extract chain if the slices are disjoint.
static Value foldExtractAfterInsertSlice(ExtractSliceOp extractOp) {
  auto insertOp = extractOp.source().getDefiningOp<InsertSliceOp>();
  auto isSame = [](OpFoldResult a, OpFoldResult b) { return a == b; };
  if (insertOp && insertOp.source().getType() == extractOp.getType() &&

mravishankar (Done): I am not sure we really need this case?

okkwon (Author, Not Done): Are you referring to `if (count == 0)`? `shape` itself is used below. I will move it down.

      insertOp.isSameAs(extractOp, isSame))
    return insertOp.source();
  return {};
}

+ template <typename I, typename E>

rriddle (Done): When is the result not a ShapedType?

okkwon (Author, Done): I will use cast. The type should always be ShapedType. Thanks!

+ static void sliceElements(I values, ArrayRef<int64_t> counts,
+                           ArrayRef<int64_t> offsets, ArrayRef<int64_t> sizes,
+                           ArrayRef<int64_t> strides,
+                           llvm::SmallVectorImpl<E> *outValues) {
+   assert(offsets.size() == sizes.size());

rriddle (Done), suggested edit:
    elems.begin(), counts, offsets, sizes, strides, &outValues);
    - return DenseElementsAttr::get(op.result().getType().cast<ShapedType>(),
    - outValues);
    + return DenseElementsAttr::get(resultType, outValues);
    }
    if (auto elems = attr.dyn_cast<DenseFPElementsAttr>()) {

rriddle (Done), suggested edit:
    elems.begin(), counts, offsets, sizes, strides, &outValues);
    - return DenseElementsAttr::get(op.result().getType().cast<ShapedType>(),
    - outValues);
    + return DenseElementsAttr::get(resultType, outValues);
    }
    return {};

+   assert(offsets.size() == strides.size());
+   if (offsets.empty())
+     return;
+   int64_t offset = offsets.front();
+   int64_t size = sizes.front();
+   int64_t stride = strides.front();
+   if (offsets.size() == 1) {
+     for (int i = offset; i < size; i += stride)
+       outValues->push_back(*(values + i));

rriddle (Done): Drop trivial braces here.

+     return;
+   }
+   for (; offset < size; offset += stride) {

bondhugula (Not Done): All comments are to be terminated with a full stop.

+     auto begin = values + offset * counts.front();
+     sliceElements<I, E>(begin, counts.drop_front(), offsets.drop_front(),
+                         sizes.drop_front(), strides.drop_front(), outValues);
+   }

rriddle (Done): Can you use longer names here? I and E aren't clear to me here.

+ static Attribute foldConstant(ExtractSliceOp op) {
+   DenseElementsAttr attr;

bondhugula (Not Done): Likewise.

okkwon (Author, Done): Please elaborate. I put a simple comment over `newAttr`.

+   if (!matchPattern(op.source(), m_Constant(&attr)))
+     return {};
+   // TODO: Support the splat case.
+   if (!attr || attr.isSplat())
+     return {};

rriddle (Done): attr is guaranteed to be non-null here. Also, you can use SplatElementsAttr in place of DenseElementsAttr above and drop this call to isSplat.

+   // The case with multiple uses is not supported since it creates more
+   // constant data.
+   if (!op.source().getDefiningOp()->hasOneUse())
+     return {};

rriddle (Done): Why go to the defining op instead of op.source().hasOneUse()?

mehdi_amini (Done): I'm not thrilled by this heuristic; I don't think it is the right thing to handle this kind of thing in such an ad-hoc way here.
In particular, if there are two uses for this constant and both of them can be folded, but both of them are guarded the same way, then we don't fold anything...

okkwon (Author, Done): Thanks for the comment! Even though the pattern caught by the current code looks very limited, it is still an incremental enhancement. I am going to add more patterns. I am also very interested in seeing what the most common pattern is regarding constant folding in real applications.

mehdi_amini (Done): I am commenting on the guard itself. I'm fine with adding the pattern; I object to the way it is guarded though: both by this "hasOneUse" and the arbitrary "kConstantFoldingMaxNumElements".

mravishankar (Done): There is a difference between splat constants and non-splat constants. The reasoning for non-splat constants is as follows. Say you have a constant of type tensor<4x2xf32>. If there are multiple uses of this constant (potentially accessing different slices of the constant, or the whole constant), you will need to store the constant values in the binary somewhere. Having a new constant of smaller shape just adds to binary size without any value. You might as well leave the tensor.extract_slice as is. Eventually it will just become a memref.subview followed by a load. So when there are multiple uses it makes sense to leave the constant as is. If there is a single use, it means that the values outside of the slice are dead and you can just remove those values. It saves binary size. I agree with you that kConstantFoldingMaxNumElements is fairly arbitrary.
The same reasoning does not hold for splat constants though. So now that this change is targeting splat constants, I'd also suggest dropping both these heuristics.

mehdi_amini (Done):
> So now that this change is targeting splat constants
Right now the code is targeting only non-splat unless I am mis-reading?

okkwon (Author, Done): I used the control function approach to make the decision user-controllable.

bondhugula (Done): Doc comment.

+   // Dynamic result shape is not supported.
+   auto sourceType = op.source().getType().cast<ShapedType>();
+   if (!sourceType.hasStaticShape())

mravishankar (Done): IIUC, constants cannot have a dynamic shape.

okkwon (Author, Done): I believe this comment is outdated and the code shows a

+     return {};

mravishankar (Done): Probably better to move this out of here and check much earlier. Then you can make the function just take the offsets, strides, and sizes directly as arguments.

okkwon (Author, Done): This will make the code in foldConstant() lengthy so I would keep this as is.

mravishankar (Done): See comment above. Doing this processing in the matchAndRewrite method will separate the actual slice handling of the constant from getting the values of offsets, sizes and strides from the extract_slice op. You could just check for the fold conditions and also retrieve the values directly in the ExtractSliceOp::fold(...) method, which makes it clear when the folding is done.

+   auto resultType = op.result().getType().cast<ShapedType>();
+   if (!resultType.hasStaticShape())
+     return {};

// Control the size. Sice the way to get a new constant collects each element,

// it can have a bad impact on the compile time when the data size is big.

mravishankarUnsubmitted

Done

Nit : s/Sice/Since

mravishankar: Nit : s/Sice/Since

// TODO: create an option if a customization is needed.

constexpr int64_t kConstantFoldingMaxNumElements = 1024;

mravishankarUnsubmitted

Done

A better TODO is "If the size of the constant folded is to be controlled, move this out of folding and make it a separate pattern which can accept an option to control the size"

mravishankar: A better TODO is "If the size of the constant folded is to be controlled, move this out of…

mehdi_aminiUnsubmitted

Done

Right, but a corollary of the TODO is also that we shouldn't have an ad-hoc threshold like this here, because why this operation and no other? How is the threshold suitable in general?

mehdi_amini: Right, but a corollary of the TODO is also that we shouldn't have an ad-hoc threshold like this…

mravishankarUnsubmitted

Done

Replied above with more context on this.

mravishankar: Replied above with more context on this.

if (resultType.getNumElements() > kConstantFoldingMaxNumElements)

return {};

auto shape = sourceType.getShape();

int64_t count = sourceType.getNumElements();

rriddleUnsubmitted

Done

nit: Spell out auto here.

rriddle: nit: Spell out auto here.

if (count == 0)

rriddleUnsubmitted

Done

Drop the llvm:: here. Also do we need the hard coded 6?

rriddle: Drop the llvm:: here. Also do we need the hard coded 6?

return {};

// Check if there are any dynamic parts, which are not supported.

auto offsets = extractFromI64ArrayAttr(op.static_offsets());

if (llvm::is_contained(offsets, ShapedType::kDynamicStrideOrOffset))

return {};

auto sizes = extractFromI64ArrayAttr(op.static_sizes());

if (llvm::is_contained(sizes, ShapedType::kDynamicSize))

return {};

rriddle: Please use m_Constant instead of hardcoding constant operations.

auto strides = extractFromI64ArrayAttr(op.static_strides());

if (llvm::is_contained(strides, ShapedType::kDynamicStrideOrOffset))

return {};

// Compute the stride for each dimension.

SmallVector<int64_t, 6> counts;

counts.reserve(shape.size());

rriddle:

// Compute the stride for each dimension.

- SmallVector<int64_t, 6> counts;

+ SmallVector<int64_t> counts;

counts.reserve(shape.size());

for (auto v : shape) {

mravishankar: I have a broader comment, on the patch itself, about which constants should be folded this way. In addition to that, I'd consider doing this only if the constant op has a single use. If the whole constant is used anyway, then taking a slice of it will eventually just be a pointer offset. The value of this transformation is that if the constant has a single use, then you can just discard the unused values.

okkwon (author): This is done. Thank you for your thoughtful comments!

mehdi_amini: This is unusual for the folder to look

mravishankar: I am not sure I understand the comment here.

count = count / v;

rriddle: Spell out auto here.

counts.push_back(count);

}

if (auto elems = attr.dyn_cast<DenseIntElementsAttr>()) {

mravishankar: Nit: Instead of templating on the iterator type, can we template on the Attribute type directly, i.e. DenseIntElementsAttr/DenseFPElementsAttr?

okkwon (author): I tried to use only the Attribute, but it is difficult to get the underlying concrete element type from the attribute alone, since that is an implementation detail. Let's keep it as is.

SmallVector<APInt> outValues;

outValues.reserve(sourceType.getNumElements());

sliceElements<DenseElementsAttr::IntElementIterator, APInt>(

elems.begin(), counts, offsets, sizes, strides, &outValues);

rriddle: Can you format here?

mravishankar: All of this is not necessary for a splat constant. There is only a single value. You just need to change the type of the constant as dictated by the offsets, sizes and strides.

return DenseElementsAttr::get(resultType, outValues);

}

if (auto elems = attr.dyn_cast<DenseFPElementsAttr>()) {

SmallVector<APFloat> outValues;

outValues.reserve(sourceType.getNumElements());

sliceElements<DenseElementsAttr::FloatElementIterator, APFloat>(

elems.begin(), counts, offsets, sizes, strides, &outValues);

return DenseElementsAttr::get(resultType, outValues);

}

return {};

}
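
For readers following the `counts` computation and the per-element slicing above, here is a minimal standalone sketch of the same idea on a flat row-major buffer. It does not use the MLIR API: `computeRowMajorCounts`, `sliceFlatBufferImpl`, and `sliceFlatBuffer` are hypothetical names introduced only for illustration, and `std::vector` stands in for the attribute iterators. It is not the patch's actual `sliceElements` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major element strides: counts[d] is how many flat elements one step
// along dimension d skips. Mirrors the `counts` loop in the patch.
std::vector<int64_t> computeRowMajorCounts(const std::vector<int64_t> &shape) {
  int64_t count = 1;
  for (int64_t dim : shape)
    count *= dim;
  std::vector<int64_t> counts;
  counts.reserve(shape.size());
  for (int64_t dim : shape) {
    assert(dim > 0 && "empty tensors are assumed to be rejected earlier");
    count /= dim;
    counts.push_back(count);
  }
  return counts;
}

// Hypothetical helper: appends the sliced elements for dimensions
// [dim, rank) to `out`, starting from flat index `base`.
template <typename T>
void sliceFlatBufferImpl(const std::vector<T> &values,
                         const std::vector<int64_t> &counts,
                         const std::vector<int64_t> &offsets,
                         const std::vector<int64_t> &sizes,
                         const std::vector<int64_t> &strides, size_t dim,
                         int64_t base, std::vector<T> &out) {
  if (dim == counts.size()) {
    out.push_back(values[static_cast<size_t>(base)]);
    return;
  }
  for (int64_t i = 0; i < sizes[dim]; ++i) {
    // Position along this dimension in the source tensor.
    int64_t index = offsets[dim] + i * strides[dim];
    sliceFlatBufferImpl(values, counts, offsets, sizes, strides, dim + 1,
                        base + index * counts[dim], out);
  }
}

// Hypothetical entry point: slices a row-major buffer of the given shape.
template <typename T>
std::vector<T> sliceFlatBuffer(const std::vector<T> &values,
                               const std::vector<int64_t> &shape,
                               const std::vector<int64_t> &offsets,
                               const std::vector<int64_t> &sizes,
                               const std::vector<int64_t> &strides) {
  std::vector<T> out;
  sliceFlatBufferImpl(values, computeRowMajorCounts(shape), offsets, sizes,
                      strides, 0, 0, out);
  return out;
}
```

With a 4x4 buffer holding 0 through 15, offsets [1, 1], sizes [2, 2], and strides [1, 1] select the interior elements 5, 6, 9, 10. Every selected element is visited once, which is why the review asks for a size limit on non-splat constants.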

OpFoldResult ExtractSliceOp::fold(ArrayRef<Attribute>) {

mravishankar: You also need to have a size control here to avoid compilation time explosion.

mravishankar: Related to the comment below, if this method took the offsets, sizes and strides directly, say

static Attribute sliceDenseElementsAttr(Value constant, ArrayRef<int64_t> offsets, ArrayRef<int64_t> sizes, ArrayRef<int64_t> strides) {
  ...
}

then the logic here is agnostic to the fact that these came from a tensor.extract_slice. That separation of concerns seems cleaner to me.

okkwon (author): Hoisted the offsets, sizes, and strides calculation to fold(). Now the fold function directly calls sliceElements().

if (getSourceType() == getType() &&

succeeded(foldIdentityOffsetSizeAndStrideOpInterface(*this, getType())))

return this->source();

if (Value slice = foldExtractAfterInsertSlice(*this))

return slice;

if (auto slice = foldConstant(*this))

rriddle: Spell out auto here.

return slice;

rriddle: Drop trivial braces here.

return OpFoldResult();

}
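
The conditions the reviewers ask for before folding (splat or small constants only, and ideally a single use of the constant) can be summarized in a small standalone predicate. This is only a sketch under stated assumptions: `ConstantSourceInfo` and `shouldFoldSliceOfConstant` are hypothetical names, the struct stands in for queries on the real constant op, and whether the single-use requirement should also apply to splats is not settled in the review. The limit of 1024 result elements is the value used in the patch.

```cpp
#include <cstdint>

// Hypothetical summary of the constant that feeds the extract_slice.
struct ConstantSourceInfo {
  bool isSplat;      // every element has the same value
  bool hasSingleUse; // the slice is the only user of the constant
};

// Same limit as kConstantFoldingMaxNumElements in the patch.
constexpr int64_t kMaxFoldedElements = 1024;

// Returns true when folding the slice is expected to be cheap and useful.
bool shouldFoldSliceOfConstant(const ConstantSourceInfo &source,
                               int64_t resultNumElements) {
  // Splat: only the result type changes, no per-element copy is needed.
  // Assumption: splats are folded regardless of use count.
  if (source.isSplat)
    return true;
  // With multiple uses the full constant stays alive anyway, so slicing
  // it at compile time only duplicates data.
  if (!source.hasSingleUse)
    return false;
  // Non-splat: bound the number of elements copied one by one.
  return resultNumElements <= kMaxFoldedElements;
}
```

As mehdi_amini points out in the thread, a fixed threshold inside a folder is ad hoc; a separate pattern that takes the limit as an option would move this decision out of `fold()`.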

Value mlir::tensor::createCanonicalRankReducingExtractSliceOp(

OpBuilder &b, Location loc, Value tensor, RankedTensorType targetType) {

auto rankedTensorType = tensor.getType().cast<RankedTensorType>();

unsigned rank = rankedTensorType.getRank();

auto shape = rankedTensorType.getShape();

mlir/test/Dialect/Tensor/canonicalize.mlir

// CHECK: return %[[ARG0]] : tensor<4x6x16x32xi8>
func @trivial_slice(%arg0 : tensor<4x6x16x32xi8>) -> tensor<4x6x16x32xi8> {
%0 = tensor.extract_slice %arg0[0, 0, 0, 0] [4, 6, 16, 32] [1, 1, 1, 1] : tensor<4x6x16x32xi8> to tensor<4x6x16x32xi8>
return %0 : tensor<4x6x16x32xi8>
}

// -----

// CHECK-LABEL: func @slice_constant
// CHECK-NOT: tensor.extract_slice
// CHECK: %[[CONST:.+]] = arith.constant dense<1.000000e+01> : tensor<1x1xf32>
// CHECK: return %[[CONST]] : tensor<1x1xf32>
func @slice_constant(%arg0 : tensor<2x1xf32>) -> tensor<1x1xf32>

mravishankar: Sorry, should have mentioned earlier. Might be worth adding more tests, like having a larger constant and taking an interior out of it, etc.

{
%cst = arith.constant dense<[[10.0], [11.0]]> : tensor<2x1xf32>
%slice = tensor.extract_slice %cst[0, 0] [1, 1] [1, 1] : tensor<2x1xf32> to tensor<1x1xf32>
return %slice : tensor<1x1xf32>
}

// -----

// CHECK-LABEL: func @trivial_insert_slice
// CHECK-SAME: %[[ARG0:.[a-z0-9A-Z_]+]]: tensor<4x6x16x32xi8>
// CHECK-NOT: tensor.extract_slice
// CHECK: return %[[ARG0]] : tensor<4x6x16x32xi8>
func @trivial_insert_slice(%arg0 : tensor<4x6x16x32xi8>, %arg1 : tensor<4x6x16x32xi8>) -> tensor<4x6x16x32xi8> {
%0 = tensor.insert_slice %arg0 into %arg1[0, 0, 0, 0] [4, 6, 16, 32] [1, 1, 1, 1] : tensor<4x6x16x32xi8> into tensor<4x6x16x32xi8>
return %0 : tensor<4x6x16x32xi8>
}

[mlir] Fold Arithmetic::ConstantOp and Tensor::ExtractSliceOp. (Closed, Public)

Revision Contents

Diff 408651

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp

mlir/test/Dialect/Tensor/canonicalize.mlir
