Page MenuHomePhabricator

[mlir][VectorToGPU] First step to convert vector ops to GPU MMA ops
ClosedPublic

Authored by ThomasRaoux on May 21 2021, 6:25 PM.

Details

Summary

This is the first step to convert vector ops to MMA operations in order to target GPUs tensor core ops. This currently only support simple cases, transpose and element-wise operation will be added later.

Diff Detail

Event Timeline

ThomasRaoux created this revision.May 21 2021, 6:25 PM
ThomasRaoux requested review of this revision.May 21 2021, 6:25 PM
Herald added a project: Restricted Project. · View Herald Transcript
nicolasvasilache added inline comments.
mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
114

Nite: Analyze ?

118

You should be able to only walk the vector::ContractionOp and save some logic below.

122
auto hasVectorDest = [](Operation *op) {
  return op->getNumResults() == 0 || llvm::any_of(op->getResultTypes(), [](Type t) { return t.isa<VectorType>(); });
};

Also, do you want "any_of" or "all_of" behavior ?

I'd also lift it at the top of the function to avoid interleaving logic and simplify the reading.

132
if (llvm::any_of(dependentOp, [](){ return !supports...}))
  return;
148

Can a proper subset of the vector matmul lowering pattern be exposed and plugged here (maybe with some extra lambda) ?
This looks like a lot of code to duplicate.

208

We should already have a pattern that rewrites vector.transfer + vector.transpose into the proper memref.transpose + vector.transfer; can we reuse it?
@bkramer for visibility.

264

You need to check for static stride (and bail if not).
In the general case of dynamic stride, we will need some additional abstractions as the stride is currently kept opaque from std.

283

You need to check for static stride (and bail if not).
In the general case of dynamic stride, we will need some additional abstractions as the stride is currently kept opaque from std.

329

I would split anything related to scf.for + scf.yield in a separate CL and discuss there; in particular there are quite some rewrite and canonicalization patterns that may simplify some of this code.

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
1071 ↗(On Diff #347175)

nit: "are laid out" / "have the same layout"

bondhugula added inline comments.
mlir/include/mlir/Conversion/VectorToGPU/VectorToGPU.h
22

Pass to test convertion -> Convert

I guess this is not a test pass but the conversion itself.

Address review comments

ThomasRaoux marked 3 inline comments as done.
ThomasRaoux marked an inline comment as done.Jun 7 2021, 9:55 AM
ThomasRaoux added inline comments.
mlir/include/mlir/Conversion/VectorToGPU/VectorToGPU.h
22

Done, you're right, I was originally thinking about having this only for test but at this point it is really a conversion pass.

mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
148

I couldn't get to anything that makes sense, the matmul lowering is trying to put the contract in the form (k, m), (k, n), (m, n) while this code is transforming it to (m, k), (k, n), (m, n) so most of the logic is different the only thing that I could move into a common function was the dim binding but it tends to make the code more complicated as the matmul lowering also handles vector * mat.
Let me know if you have any suggestions to improve that.

208

I can't find this pattern but i'm not sure how this help for this case. Ideally I want a transfer with transpose that can directly be lowered to a mma.load op. How would the memref.transpose help this case?

264

Good point, I moved this into a helper function.

283

ditto

329

I removed all the code related to scf, I'll send a separate patch for it once this one lands.

Add missing comment.

clang-format

nicolasvasilache accepted this revision.Jun 11 2021, 5:30 AM

Very cool!

mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
50

nit: matches

116

nit: trivial braces here and below

148

That's sad, these are all so close.
I guess we are reaching the point where we want to cast some of these manipulations finding some permutation and just doing something custom with it.
This could reduce these explicit 8 case enumeration to just 3 cases + a few swaps.

Not for this CL obviously, thanks for trying!

208

I forgot part of the thinking but IIRC, the idea was that since the mma.load seems richer, it may be possible to fold these transposes into the vector.transfer indexing logic and propagate that to the mma ops; rather than perform acrual transposes.

Not clear whether this is really possible, I may look a little more into it in the future.

247

+1 I wonder whether this has a relation to my rambling about transpose and vector.transfer ?

This revision is now accepted and ready to land.Jun 11 2021, 5:30 AM

Rebase and address review feedback

ThomasRaoux marked 2 inline comments as done.Jun 11 2021, 7:46 AM

Thanks Nicolas!

mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
148

Correct, we can reduce the number of cases here and in matmul lowering, I'll try to do it in the next CL.

208

I'm a bit confused, the pattern does merge the transpose into the transfer_read indexing logic. I don't understand why we would want a memref.transpose. If there is a pattern already doing transpose+transfer_read -> transfer_read with affine map I can use it but I couldn't find any.

247

I don't think it is directly related. The transfer op should already have the transpose indexing merged at this point.

This revision was landed with ongoing or failed builds.Jun 11 2021, 7:53 AM
This revision was automatically updated to reflect the committed changes.
mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
208

You're right, I confirmed with @bkramer offline that this actually did not land, I just reviewed it a few months back.
It was the motivation for moving linalg.transpose to memref in this CL: https://reviews.llvm.org/D88651.

I imagine we will want to revive the vector.transfer + transpose -> vector.transfer + strided_memref ?
In Ben's case he wanted to remove the strided_memref but I imagine very similar code will be useful for our purpose too.

ThomasRaoux added inline comments.Jun 11 2021, 8:05 AM
mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
208

Sounds good, thanks for checking, I can help move those patterns to a more generic place when it makes sense