This is the first step to convert vector ops to MMA operations in order to target GPUs tensor core ops. This currently only support simple cases, transpose and element-wise operation will be added later.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp | ||
---|---|---|
114 | Nite: Analyze ? | |
118 | You should be able to only walk the vector::ContractionOp and save some logic below. | |
122 | auto hasVectorDest = [](Operation *op) { return op->getNumResults() == 0 || llvm::any_of(op->getResultTypes(), [](Type t) { return t.isa<VectorType>(); }); }; Also, do you want "any_of" or "all_of" behavior ? I'd also lift it at the top of the function to avoid interleaving logic and simplify the reading. | |
132 | if (llvm::any_of(dependentOp, [](){ return !supports...})) return; | |
148 | Can a proper subset of the vector matmul lowering pattern be exposed and plugged here (maybe with some extra lambda) ? | |
208 | We should already have a pattern that rewrites vector.transfer + vector.transpose into the proper memref.transpose + vector.transfer; can we reuse it? | |
264 | You need to check for static stride (and bail if not). | |
283 | You need to check for static stride (and bail if not). | |
329 | I would split anything related to scf.for + scf.yield in a separate CL and discuss there; in particular there are quite some rewrite and canonicalization patterns that may simplify some of this code. | |
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp | ||
1071 ↗ | (On Diff #347175) | nit: "are laid out" / "have the same layout" |
mlir/include/mlir/Conversion/VectorToGPU/VectorToGPU.h | ||
---|---|---|
22 | Pass to test convertion -> Convert I guess this is not a test pass but the conversion itself. |
mlir/include/mlir/Conversion/VectorToGPU/VectorToGPU.h | ||
---|---|---|
22 | Done, you're right, I was originally thinking about having this only for test but at this point it is really a conversion pass. | |
mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp | ||
148 | I couldn't get to anything that makes sense, the matmul lowering is trying to put the contract in the form (k, m), (k, n), (m, n) while this code is transforming it to (m, k), (k, n), (m, n) so most of the logic is different the only thing that I could move into a common function was the dim binding but it tends to make the code more complicated as the matmul lowering also handles vector * mat. | |
208 | I can't find this pattern but i'm not sure how this help for this case. Ideally I want a transfer with transpose that can directly be lowered to a mma.load op. How would the memref.transpose help this case? | |
264 | Good point, I moved this into a helper function. | |
283 | ditto | |
329 | I removed all the code related to scf, I'll send a separate patch for it once this one lands. |
Very cool!
mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp | ||
---|---|---|
51 | nit: matches | |
117 | nit: trivial braces here and below | |
148 | That's sad, these are all so close. Not for this CL obviously, thanks for trying! | |
208 | I forgot part of the thinking but IIRC, the idea was that since the mma.load seems richer, it may be possible to fold these transposes into the vector.transfer indexing logic and propagate that to the mma ops; rather than perform acrual transposes. Not clear whether this is really possible, I may look a little more into it in the future. | |
248 | +1 I wonder whether this has a relation to my rambling about transpose and vector.transfer ? |
Thanks Nicolas!
mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp | ||
---|---|---|
148 | Correct, we can reduce the number of cases here and in matmul lowering, I'll try to do it in the next CL. | |
208 | I'm a bit confused, the pattern does merge the transpose into the transfer_read indexing logic. I don't understand why we would want a memref.transpose. If there is a pattern already doing transpose+transfer_read -> transfer_read with affine map I can use it but I couldn't find any. | |
248 | I don't think it is directly related. The transfer op should already have the transpose indexing merged at this point. |
mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp | ||
---|---|---|
208 | You're right, I confirmed with @bkramer offline that this actually did not land, I just reviewed it a few months back. I imagine we will want to revive the vector.transfer + transpose -> vector.transfer + strided_memref ? |
mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp | ||
---|---|---|
208 | Sounds good, thanks for checking, I can help move those patterns to a more generic place when it makes sense |
Pass to test convertion -> Convert
I guess this is not a test pass but the conversion itself.