This is an archive of the discontinued LLVM Phabricator instance.

[mlir][rocdl] Adding vector to ROCDL dialect lowering
ClosedPublic

Authored by jerryyin on Jun 5 2020, 9:51 AM.

Details

Summary
  • Created the vector to ROCDL lowering pass
    • The lowering pass lowers vector transferOps to rocdl mubufOps
  • Added unit test and functional test

Diff Detail

Event Timeline

jerryyin created this revision.Jun 5 2020, 9:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 5 2020, 9:51 AM
jerryyin updated this revision to Diff 268857.Jun 5 2020, 9:57 AM

Drop irrelevant file

Harbormaster completed remote builds in B59282: Diff 268857.
jerryyin marked 2 inline comments as done.Jun 5 2020, 12:04 PM
jerryyin added inline comments.
mlir/include/mlir/Conversion/VectorToROCDL/VectorToROCDL.h
25

Will update comments

mlir/test/Conversion/VectorToROCDL/vector-to-rocdl.mlir
13

Will update to x2 test case.

whchung added a subscriber: whchung.Jun 5 2020, 1:06 PM
jerryyin updated this revision to Diff 268920.Jun 5 2020, 1:11 PM

Misc update to align with local branch

jerryyin updated this revision to Diff 268921.Jun 5 2020, 1:14 PM

Fix unit test typo

whchung added inline comments.Jun 5 2020, 1:33 PM
mlir/include/mlir/Conversion/Passes.td
300

LLVM -> ROCDL

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
33

is this class really used?

68

int1False doesn't seem meaningful here. According to AMD GCN ISA manual, what you want to represent here should be glc and slc bits?

91

could you specify the limitations of this patch?

95

could AffineMap::isMinorIdentity(xferOp.permutation_map()) be used here?

109

from tests provided, it seems vecWidth could only be 2 or 4?

112

nit: add "." at the end of the setence.

121

I guess the only meaningful address spaces for this conversion pattern are 0 (generic) and 1 (global) on AMD GPU. Could additional checks be placed here, and also tests to cover memrefs residing on address space 1?

126

DWORD?

188

is this line necessary?

mlir/test/mlir-rocm-runner/vector-transferops.mlir
12

this end-to-end test only checks loading a vector<2xf32>, is it possible to add one which checks vector<4xf32> as well?

whchung added inline comments.Jun 5 2020, 1:38 PM
mlir/test/Conversion/VectorToROCDL/vector-to-rocdl.mlir
35

could you add tests for vector.transfer_write?

jerryyin marked 3 inline comments as done.Jun 5 2020, 1:45 PM
whchung added inline comments.Jun 5 2020, 1:49 PM
mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
45

int32Zero should also have a meaningful name. Is it vindex?

185

is it really necessary to populate std->llvm conversion patterns?

Harbormaster completed remote builds in B59315: Diff 268921.
jerryyin marked 3 inline comments as done.Jun 5 2020, 3:39 PM
jerryyin added inline comments.
mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
33

Some very intricate details here. I got to understand this only after I have taken to compare between ods operand adaptor and the op itself. I'm giving an example based on the tablegen-ed .inc files:

  • On one hand, in ods operand adaptor, getODSOperands() is implemented with
odsOperands.begin() + offset

This routes to the "actual" operands with regards the lowering process. For example, a value with llvm.i64 type. The lowered instruction will be llvm.add(%0, %1):(!llvm.i64, !llvm.i64) -> !llvm.i64

  • On the other hand, in the member function provided by itself, getODSOperands() is implemented with
op.getOperation()->operand_begin() + offset

This routes to the operands from MLIR perspective. For example, a value with mlir's indextype. The lowered instruction will be: llvm.add(%0, %1):(!llvm.i64, index) -> !llvm.i64, yielding an error complaining that types does not match.

For the specific case, if I don't take the ods version of the operand, then getDataPtr either complains it is not valid type match (as example above), or complains that it is not valid memref (because it has already got lowered to llvm).

185

It is for the convenience of unit tests. createConvertVectorToROCDLPass() will only be invoked in unit test, and for the very minimal I need to use FuncOp and ReturnOp. In the actual test I also used AddOp. To my impression the ToLLVMConversion passes all depends on StdToLLVM pass in one way or another to get the (memref) type lowering done correctly.

188

Yes, it has to be there. This allows the mlir type to be lowered llvm type, which, from the perspective of rocdl is not in invalid form.

jerryyin marked 4 inline comments as done.Jun 5 2020, 3:46 PM
jerryyin added inline comments.
mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
45

Will update.

121

Sure.

mlir/test/Conversion/VectorToROCDL/vector-to-rocdl.mlir
35

Will duplicate the above three cases.

mlir/test/mlir-rocm-runner/vector-transferops.mlir
12

The vector width differences are being done mainly through ROCDL lowering path. But yes, the unit test should not assume based on existing implementations. Will do.

aartbik added inline comments.Jun 5 2020, 3:53 PM
mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
65

This looks like a copy-and-paste of Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
However, part of the comments were not included. Please fix.

68

At first glance, I thought this was also due to due to copy and paste, since ConvertToLLVMPattern is really a base class for converting to LLVM IR, but I noticed other GPU passes use it too.

Still, perhaps good to document somewhere this is really a lowering phase into a mix of ROCDL::xx and LLVM::yy dialects?

jerryyin marked 8 inline comments as done and an inline comment as not done.Jun 8 2020, 8:55 AM
jerryyin added inline comments.
mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
65

Yes that's where I started from. Comments updated.

68

Hmm, you are right that lowering to GPU dialect eventually lowers to LLVM dialect because either NVVM dialect(cuda) or ROCDL dialect(rocm) is a superset of LLVM dialect. This is implied in the directory structure that all those are placed in the same directory with the LLVM dialect. I don't have a clear clue where the best place is to put this documentation. Probably not a good idea to put in one conversion pass, but rather better inside of some common place of all ROCDL + NNVM + LLVMAVX512 dialects, which doesn't exist yet. Either way, I'd tend to make it stay as background knowledge as is for now.

91

Will do on top of the pass.

109

That's right. A few lines above there is a comment documenting the x1 issues.

121

I digged around and find no example of global memref. If global memref isn't a thing than we don't need to consider about address space 1. Let me know if I'm wrong.

126

Renamed to constConfigAttr to align with rest of naming convention.

jerryyin updated this revision to Diff 269260.Jun 8 2020, 8:55 AM
jerryyin marked an inline comment as not done.

Address review feedbacks

Could you please elaborate on your plans for this revision going forward?
@ThomasRaoux, @mravishankar and I are looking into directly using the vector dialect to target GPUs and this looks like this is going in a similar direction.

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
33

The adaptor gives you the illusion of having a TransferReadOp with all its accessors on the operands, which as you saw have already been lowered,
This hides the underlying implementation details of which of the operands is what, in the cases where the op would not have been valid with the lowered operands (i.e. it does not typecheck).

whchung added inline comments.Jun 8 2020, 11:45 AM
mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
121

To the very least, a check to make sure the address space of memrefs is 0 or 1 should be made, because vector transfers for memrefs on address space 3 or 5 really shouldn't go thru this conversion process yet.

Could you please elaborate on your plans for this revision going forward?
@ThomasRaoux, @mravishankar and I are looking into directly using the vector dialect to target GPUs and this looks like this is going in a similar direction.

Thanks for helping explain the adaptor change.

After this CL we might want to generalize the vector transferops lowering with >1d cases. There seems to be some work in scf->llvm conversion pass that does just that. I will have to follow the development a little more closely. But to my understanding that is not a priority for now. This CL is more of a hot fix than the start of new features. The background for the CL is that we realized masked load/store (in vector->llvm pass) results in very poor IR and ISA at AMD GPU target. Therefore comes the pass to fix it.

@whchung Please feel free to add if there's something in additional. Thanks.

jerryyin updated this revision to Diff 269332.Jun 8 2020, 1:14 PM

Adding addresspace check

jerryyin marked 13 inline comments as done.Jun 8 2020, 1:17 PM
jerryyin marked an inline comment as done.
whchung added inline comments.Jun 8 2020, 1:37 PM
mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
125

nit: move the check right after memRefType is created, and remove curly braces. Curly braces in some checks above could also be removed.

jerryyin updated this revision to Diff 269345.Jun 8 2020, 1:48 PM

Adjust instruction sequences, remove redundant curly braces

jerryyin marked an inline comment as done.Jun 8 2020, 1:49 PM
whchung accepted this revision.Jun 8 2020, 2:12 PM

LGTM. @aartbik / @nicolasvasilache would you mind give this patch another round of review?

This revision is now accepted and ready to land.Jun 8 2020, 2:12 PM
mehdi_amini added inline comments.Jun 8 2020, 8:23 PM
mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
116

Please fix the clang-tidy warning here.

jerryyin updated this revision to Diff 269532.Jun 9 2020, 6:53 AM

Fix clang-tidy warning by removing unecessary local varaible

jerryyin marked 2 inline comments as done.Jun 9 2020, 6:55 AM
jerryyin added inline comments.
mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
116

Thanks for your reminder. Done.

jerryyin marked an inline comment as done.Jun 9 2020, 6:56 AM
jerryyin updated this revision to Diff 269546.Jun 9 2020, 8:07 AM

Rebase and remove blank line at EOF

aartbik added a comment.EditedJun 9 2020, 2:39 PM

LGTM. @aartbik / @nicolasvasilache would you mind give this patch another round of review?

I have nothing further. I would wait for Nicolas' feedback, since he was interested in the direction that would yield the least overlap in efforts.

rriddle added inline comments.Jun 9 2020, 3:17 PM
mlir/include/mlir/Conversion/VectorToROCDL/VectorToROCDL.h
12

nit: I don't think this include is necessary.

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
33

nit: Static functions go in the top-level namespace, only use anonymous for things like classes.

https://llvm.org/docs/CodingStandards.html#anonymous-namespaces

40

This and the above seem unnecessary. Can you just use either OpTy::OperandAdaptor or OperandAdaptor<OpTy>

43

Missing static on each of these functions.

63

nit: Remove the above newline.

66

nit: ///

jerryyin marked 8 inline comments as done.Jun 10 2020, 7:32 AM
jerryyin added inline comments.
mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
33

Thanks for pointing out the coding standards page. It provides good justifications on use of namespace and static key word.

40

Thanks for pointing out, just realized that this is the recommended way to do it in the mlir documentation.

jerryyin updated this revision to Diff 269843.Jun 10 2020, 7:32 AM
jerryyin marked 2 inline comments as done.

Resolve review feedbacks

I'm pretty sure the build failure has nothing to do with this CL. I will wait till tomorrow to push it upstream if there's no further feedback to it. Thanks.

jerryyin closed this revision.EditedJun 11 2020, 7:37 AM