- Created the vector to ROCDL lowering pass
- The lowering pass lowers vector transfer ops to ROCDL mubuf ops
- Added unit test and functional test
LLVM -> ROCDL
is this class really used?
int1False doesn't seem meaningful here. According to the AMD GCN ISA manual, what you want to represent here should be the glc and slc bits?
could you specify the limitations of this patch?
could AffineMap::isMinorIdentity(xferOp.permutation_map()) be used here?
from tests provided, it seems vecWidth could only be 2 or 4?
nit: add "." at the end of the sentence.
I guess the only meaningful address spaces for this conversion pattern are 0 (generic) and 1 (global) on AMD GPUs. Could additional checks be placed here, and also tests to cover memrefs residing in address space 1?
is this line necessary?
this end-to-end test only checks loading a vector<2xf32>, is it possible to add one which checks vector<4xf32> as well?
could you add tests for vector.transfer_write?
Some very intricate details here. I only came to understand this after comparing the ODS operand adaptor with the op itself. Here is an example based on the tablegen-ed .inc files:
odsOperands.begin() + offset
This routes to the "actual" operands with regard to the lowering process. For example, a value with llvm.i64 type. The lowered instruction will be llvm.add(%0, %1) : (!llvm.i64, !llvm.i64) -> !llvm.i64
op.getOperation()->operand_begin() + offset
This routes to the operands from the MLIR perspective. For example, a value with MLIR's index type. The lowered instruction would be llvm.add(%0, %1) : (!llvm.i64, index) -> !llvm.i64, yielding an error complaining that the types do not match.
For this specific case, if I don't take the ODS version of the operand, then getDataPtr either complains that the types do not match (as in the example above), or complains that it is not a valid memref (because it has already been lowered to LLVM).
It is for the convenience of unit tests. createConvertVectorToROCDLPass() will only be invoked in unit tests, and at the very minimum I need to use FuncOp and ReturnOp. In the actual test I also used AddOp. My impression is that the ToLLVMConversion passes all depend on the StdToLLVM pass in one way or another to get the (memref) type lowering done correctly.
Yes, it has to be there. It allows the MLIR type to be lowered to the LLVM type, so that from the perspective of ROCDL it is not in an invalid form.
Will duplicate the above three cases.
The vector width differences are handled mainly through the ROCDL lowering path. But yes, the unit test should not make assumptions based on the existing implementation. Will do.
This looks like a copy-and-paste of Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
At first glance, I thought this was also due to copy and paste, since ConvertToLLVMPattern is really a base class for converting to LLVM IR, but I noticed other GPU passes use it too.
Still, perhaps it would be good to document somewhere that this is really a lowering phase into a mix of the ROCDL::xx and LLVM::yy dialects?
Yes that's where I started from. Comments updated.
Hmm, you are right that lowering to the GPU dialect eventually lowers to the LLVM dialect, because both the NVVM dialect (CUDA) and the ROCDL dialect (ROCm) are supersets of the LLVM dialect. This is implied by the directory structure: all of these are placed in the same directory as the LLVM dialect. I don't have a clear idea of the best place to put this documentation. It's probably not a good idea to put it in one conversion pass; better would be some place common to the ROCDL, NVVM, and LLVMAVX512 dialects, which doesn't exist yet. Either way, I'd tend to leave it as background knowledge as-is for now.
Will do on top of the pass.
That's right. A few lines above there is a comment documenting the x1 issues.
I dug around and found no example of a global memref. If global memrefs aren't a thing, then we don't need to consider address space 1. Let me know if I'm wrong.
Renamed to constConfigAttr to align with rest of naming convention.
Could you please elaborate on your plans for this revision going forward?
@ThomasRaoux, @mravishankar and I are looking into directly using the vector dialect to target GPUs, and this looks like it is going in a similar direction.
The adaptor gives you the illusion of having a TransferReadOp with all its accessors on the operands, which, as you saw, have already been lowered.
At the very least, a check should be made to ensure the address space of the memref is 0 or 1, because vector transfers for memrefs in address space 3 or 5 really shouldn't go through this conversion process yet.
Thanks for helping explain the adaptor change.
After this CL we might want to generalize the vector transfer ops lowering to handle >1-D cases. There seems to be some work in the scf->llvm conversion pass that does just that; I will have to follow that development a little more closely. But to my understanding that is not a priority for now. This CL is more of a hot fix than the start of new features. The background for the CL is that we realized masked load/store (in the vector->llvm pass) results in very poor IR and ISA on the AMD GPU target, hence this pass to fix it.
@whchung Please feel free to add if there's anything additional. Thanks.
nit: move the check right after memRefType is created, and remove curly braces. Curly braces in some checks above could also be removed.
LGTM. @aartbik / @nicolasvasilache would you mind giving this patch another round of review?
I have nothing further. I would wait for Nicolas' feedback, since he was interested in the direction that would yield the least overlap in efforts.
nit: I don't think this include is necessary.
nit: Static functions go in the top-level namespace, only use anonymous for things like classes.
This and the above seem unnecessary. Can you just use either OpTy::OperandAdaptor or OperandAdaptor&lt;OpTy&gt;?
Missing static on each of these functions.
nit: Remove the above newline.
I'm pretty sure the build failure has nothing to do with this CL. I will wait till tomorrow to push it upstream if there's no further feedback. Thanks.