This is an archive of the discontinued LLVM Phabricator instance.

Patterns flattening vector transfers to 1D
ClosedPublic

Authored by Benoit on Dec 2 2021, 1:18 PM.

Details

Summary

This is needed at the moment to get good codegen from 2d vector.transfer
ops that aim to compile to SIMD load/store instructions but that can
only do so if the whole 2d transfer shape is handled in one piece, in
particular taking advantage of the memref being contiguous rowmajor.

For instance, if the target architecture has 128bit SIMD then we would
expect that contiguous row-major transfers of <4x4xi8> map to one SIMD
load/store instruction each.

The current generic lowering of multi-dimensional vector.transfer ops
can't achieve that because it peels dimensions one by one, so a transfer
of <4x4xi8> becomes 4 transfers of <4xi8>.

The new patterns here are only enabled for now by
-test-vector-transfer-flatten-patterns.

Diff Detail

Event Timeline

Benoit created this revision.Dec 2 2021, 1:18 PM
Benoit requested review of this revision.Dec 2 2021, 1:18 PM
mravishankar added inline comments.Dec 2 2021, 2:35 PM
mlir/lib/Dialect/Vector/VectorTransforms.cpp
3422 ↗(On Diff #391443)

I think you can drop the dimsize != 1 condition. The strides[i] != productOfInnerMostSizes should still hold.

Cool, thanks much for tackling this, glad to see that it seems to get you to reasonable assembly (inferring from your description?)

mlir/lib/Dialect/Vector/VectorTransforms.cpp
3394 ↗(On Diff #391443)

Can we be more specific here: collapseContiguousMemRefCollapseTo1D ?

3398 ↗(On Diff #391443)

nit:trivial braces.

3408 ↗(On Diff #391443)

This should be sliced a bit differently and better reusing API around BuiltinTypes.h::450.

Basically, getStridesAndOffset is the future-proof way to get the offset and strides.
You want to check the last stride is 1. You can have this as a special helper bool isContiguousMostMinorDimension() or bool isStrideOneMostMinor(int dim) (whichever you find most natural and reusable).

Then you need to determine if the whole type is contiguous.
For this you should add a helper that:

  1. returns true empty or identity layout map (the weird trifecta I mentioned previously)
  2. then returns false if not fully static
  3. then perform a direct MemRefType comparison between a proper usage or getStridesAndOffset, makeStridedLinearLayoutMap, canonicalizeStridedLayout. There should be something similar already using similar logic that can be refactored.

Giving this helper a name and reusing it in multiple places will be a nice cleanup.

3411 ↗(On Diff #391443)

Nit: we avoid trivial braces in LLVM, here and below

3419 ↗(On Diff #391443)

nit: we use camelCase in LLVM, here and below.

3420 ↗(On Diff #391443)

nit dimSize != 1

3443 ↗(On Diff #391443)

nit: trivial braces here and below (lift the comment out the the condition)

3469 ↗(On Diff #391443)

Nice!

3539 ↗(On Diff #391443)

VectorTransforms.cpp is way too bloated and we should split it up (a bit like the standard dialect).
Please either start a new properly named Vector/XXXPatterns.cpp file to put these new patterns or find an existing one.

mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
23

You could use the form "memref<4x3x2x1xi8, offset: ?, strides: [6, 2, 1, 1]>"

Due to weird biases, it is currently implemented with an underlying affine_map but the informatio may be clearer like this.
Unfortunately it will still print with affine_map for now.

nicolasvasilache requested changes to this revision.Dec 3 2021, 7:47 AM
This revision now requires changes to proceed.Dec 3 2021, 7:47 AM
Benoit updated this revision to Diff 391804.Dec 3 2021, 8:14 PM

add rank reducing subview to drop unit dims

Thanks for the review comments! I'll apply them on monday. For now I've just updated this diff with the new idea to use rank reducing subviews to drop unit dims, (thanks @ThomasRaoux for the suggestion), that removes my need for https://reviews.llvm.org/D114821 so I can drop it now.

nicolasvasilache requested changes to this revision.Dec 6 2021, 1:03 AM
nicolasvasilache added inline comments.
mlir/lib/Dialect/Vector/VectorTransforms.cpp
3397 ↗(On Diff #391804)

Please don't add more patterns to VectorTransforms.cpp, we need to split them out into better isolated logical units.
Either add a new .cpp file at the same level with a proper name (see other XXXPatternXXX.cpp files) or put them
in an already existing such file, depending on what is most appropriate.

3398 ↗(On Diff #391804)

This should be its own pattern and return failure when it fails to apply.

3422 ↗(On Diff #391804)

You could run this through clang-format. I locally have this in my .bashrc

function git-format-add-and-amend(){
  echo "git add $1 && git show --name-only | egrep \"*.(\.cpp|\.h)\" | xargs -i clang-format --style=file -i {}; git add $1; git commit --amend"
  git add $1 && git show --name-only | egrep "*.(\.cpp|\.h)" | xargs -i clang-format --style=file -i {}; git add $1; git commit --amend
}
3425 ↗(On Diff #391804)

This should be its own pattern and return failure when it fails to apply.

3441 ↗(On Diff #391804)

static bool isStaticShapeAndContiguousRowMajor(MemRefType memrefType) {

if (!memrefType.hasStaticShape())
  return false;
int64_t offset;
SmallVector<int64_t> strides;
LogicalResult res = getStridesAndOffset(memrefType, strides, offset);
if (failed(res)) 
  return false;

// You may want to improve the APIs here to minimize the code below to something 
// that is expected to be reusable by others.
AffineExpr expr = makeCanonicalStridedLayoutExpr(
  memrefTyp.getSizes(), memrefType.getContext());
MemRefType canonicalMemRefType = MemRefType::get(
  memrefTyp.getSizes(), AffineMap::infer({expr}));

int64_t canonicalOffset;
SmallVector<int64_t> canonicalStrides;
LogicalResult res = getStridesAndOffset(
  canonicalMemRefType, canonicalStrides, canonicalOffset);
if (failed(res)) 
  llvm_unreachable("Unexpected stride extraction error");

for (auto it : llvm::zip(strides, canonicalStrides))
  if (std::get<0>(it) != std::get<1>(it))
    return false;

return true;

}

This revision now requires changes to proceed.Dec 6 2021, 1:03 AM
Benoit updated this revision to Diff 393458.Dec 10 2021, 5:55 AM

apply some review comments

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
214

you'll need comments on this function and everywhere below (ideally with a short IR example but not strictly necessary)

Benoit updated this revision to Diff 393479.Dec 10 2021, 7:12 AM

split into 2 patterns

Benoit added inline comments.Dec 10 2021, 7:32 AM
mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
23

Thanks for the tip. For consistency between the MLIR code and the CHECK's I will stick to the affine_map<...> form for now.

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
214

Nit, top-levle function and class comments take 3 slashes ///

215

nit:camelCase

227

use std::copy_if or one of the other stl transforms?

235

nice!

246

nit: camelCase Drop

251

This would only work for the static memref cases.
It feels like you should either early exit if the type is not fully static or use the OpFoldResult-based builders.

259

std::count_if or llvm::count_if IIRC

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
269

@gysity, does this relate to the example we just discussed?

Benoit updated this revision to Diff 393499.Dec 10 2021, 8:15 AM

more review comments

Benoit marked 6 inline comments as done.Dec 10 2021, 8:19 AM
Benoit added inline comments.
mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
227

I didn't find a simple way to do that, because of how we need to create reducedStrides, not just reducedShape. While it would be possible to let copy_if create reducedShape, I thought that it was more readable if both vectors were created in the same way. In particular, it makes it plain that they have the same length, and that their i-th elements correspond to the same i-th dim.

251

The caller already has such an early exit, so I put assertions here.

gysit added a subscriber: gysit.Dec 10 2021, 8:39 AM
gysit added inline comments.
mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
269

I think @Benoit `s revision addresses a similar topic but at a different level of the stack. I was looking into generating good transfer ops without switching between tensors and vectors in between and making sure rank-reduction works well. @Benoit optimizes the lowering of what we generate higher up in the stack by flattening the vectors, AFAIU. Looking forward to understand the performance implications but I could imagine this helps quite a bit with in combination with hoisting. In particular, for convolutions where we work on very high-dimensional vectors.

Thanks for pushing on this @Benoit !

I'd suggest slicing and dicing into smaller commits so we can better track if we ever need to bisect; some of the behavior is tricky to get right and the smaller the CLs + tests, the better we will be reviewing and revisiting in the future.
I think you can turn this into 4-5 commits that can then be more easily clicked.

Also, please don't forget about putting those outside of vectortransforms.cpp.

These are nice developments, sorry it is getting longer than what I think you initially signed for but OTOH you're doing things in the right and future-proof way, so I am grateful!

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
299

This is incorrect, the original transferReadOp may have some permutation (i.e. also be sure to insert a proper test).
You want some projection map and compose that with the transfer permutation.

@ThomasRaoux for off-EU-hours advice.

Ah nm, my apologies I see you have a permutation_map test above.
Could you just add a comment/TODO that would highlight this / provision for future work?

300

zeros is invalid as we discussed on discord.
You can't go around a project map and applying it to values here.

338

same as above re projection map and zeros / identity map.

343

This is generally useful and should go to BuiltinTypes.h with proper doc (/// prefix) plz.

353

This is generally useful and should go to BuiltinTypes.h with proper doc (/// prefix) plz.

367

This is generally useful and should go to BuiltinTypes.h with proper doc (/// prefix) plz.

Also, can we retire helper functions that were not good enough for your use case (if there are too many uses please ignore this last point).

381

This should be moved to a proper place in the memref dialect (maybe some utils file and maybe there is already something similar) ?

399

I'd use the name "contiguous" in the pattern name and def. in the description otherwise the doc by itself would have invalid assumptions.

432

same comment here re 0 and permutation map
this case may be a bit trickier though

442

same comments as the above pattern

477

same comments re 0 and map

mravishankar added inline comments.Dec 10 2021, 10:55 AM
mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
243

There might be a simpler way to do this. SubViewOp already has constructors to generate rank-reduced subviews. You can use the inferRankReducedSubview method [here[(https://github.com/llvm/llvm-project/blob/7f09aee0f6b4b00508d2cf86b0b1339c8d2ca2d1/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td#L1488) and get the result type and use that.

Benoit updated this revision to Diff 393792.Dec 12 2021, 7:17 PM

Apply Mahesh's suggestion to use inferRankReducedResultType.

Benoit marked 15 inline comments as done.Dec 12 2021, 7:29 PM
Benoit marked 2 inline comments as done.Dec 12 2021, 7:35 PM
Benoit updated this revision to Diff 393795.Dec 12 2021, 8:10 PM

More review comments.

Benoit marked 7 inline comments as done.Dec 12 2021, 8:11 PM
Benoit added a comment.EditedDec 12 2021, 8:25 PM

Hi Nicolas, thanks for the kind supportive words here regarding the usefulness of these patterns/helpers.

I think I've addressed the "make it correct" part of your comments, in particular, allZeroConstantIndexValues now checks that the indices of the transfer ops are really all zeros. And I've applied the other "localized" comments that you and Mahesh had (thanks again for those! in particular, Mahesh's comment allowed dropping ~20 lines of code by making dropUnitDims trivial).

I haven't yet addressed your comments about splitting this into multiple commits and about contributing these helpers to core headers:

  • Regarding splitting into multiple commits: note that these patterns are so far only enabled in by test-only flags in TestVectorTransforms.cpp, so they are not for now going to affect anything outside these tests. If I understand correctly, the place where granularity will affect how well people can bisect any issues, will be in how we eventually enable these patterns outside of these tests?
  • Regarding sharing helpers into core headers: I wonder if this would be best done anyway as a second step after this, and maybe even delay a little further to wait until a second use case arises from someone else, to get more context before blessing a particular helper into a core header? If you feel that you already have enough context to make this call now, that may be a sign that you or someone else with this experience, not I, should be making this move :-)
  • As you guessed, I am at this point looking for a time-economical way to wrap up this work :-)
This revision is now accepted and ready to land.Dec 13 2021, 11:50 AM

Sliced a first independent commit and landed as 0aea49a7308322e6987c7b45e4e0d7ab15609e78 as we discussed offline to help you land this.

Second part landed as aba437ceb2379f219935b98a10ca3c5081f0c8b7.

Note that I reduced the amount of tests as the combination did not bring much IMO, feel free to disagree and revive some of those.
Also, note that there are now 2 populate functions to get the behavior you wanted, both of the should be called in sequence.