This patch adds intrinsic definitions and SDNodes for predicated load/store/gather/scatter, based on the work done in D57504.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks Hussain for this patch!
For now I have a couple of minor comments, I will take a deeper look later.
Also, I don't quite understand yet how VP handles (or plans to handle) specifying alignment parameter. Does it need VPBuilder (not to be confused with VPlan VPBuilder!) for that? Would it make sense to add the Alignment parameter to VP load/store/scatter/gather intrinsics (much like their masked counterparts) for now, unless there is already a way to specify that?
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
1341 | Better to use DefaultAttrsIntrinsic? (Ditto for other load, gather and scatter) | |
1366 | minor style nit: A blank line above TODO. Perhaps reposition it to above the scope where it applies. |
We should use the align(N) attribute on pointers to convey the alignment. That's what we do in the LLVM for SX-Aurora stack (https://github.com/sx-aurora-dev/llvm-project/blob/683f41073689262a8a493cce7caba13bbccea56c/llvm/test/CodeGen/VE/Packed/vp_load_packed.ll#L80) . I'd say that the alignment parameter in the pre-existing masked.load/store simply predates that attribute.
And yes, setting alignment is something that the VPBuilder or what's-it-called-after-bikeshedding would take care of for you.
llvm/unittests/IR/VPIntrinsicTest.cpp | ||
---|---|---|
47–52 | There are no mask or vlen attributes in upstream. |
Can we split this up into a patch with only the intrinsics and a second patch with SelectionDAG?
llvm/include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
1366–1367 | Can we override clang-format here? The previous code had a more consistent structure. | |
llvm/include/llvm/IR/IntrinsicInst.h | ||
402 | It kind of feels like maybe there should be a VPMemIntrinsic and a VPStoreIntrinsic subclasses that contain these methods? Then you can use the more common isa, cast, dyn_cast and not have Optional return values. | |
llvm/lib/IR/IntrinsicInst.cpp | ||
299 | Doesn't the getValue() below already assert if there is no value? |
Just some minor comments. I think Craig's suggestion of mem subclasses sounds like it's worth investigating.
llvm/include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
1366–1367 | Alternatively, I suppose, it could be a switch statement (aside from the last two)? Presumably that'd be optimized well enough by compilers? | |
llvm/include/llvm/IR/IntrinsicInst.h | ||
409 | Inconsistent documentation style, e.g. return the and return The, and use of commas vs forward slashes. Also, is clang-tidy correct here? No uppercase letter on static methods? | |
llvm/include/llvm/IR/Intrinsics.td | ||
1346 | How some the loads are IntrReadMem but the stores aren't IntrWriteMem? | |
llvm/include/llvm/IR/VPIntrinsics.def | ||
91 | typo: cannonical -> canonical | |
llvm/lib/IR/IntrinsicInst.cpp | ||
305 | Purely a matter of style but this is slightly shorter and (I think) a little clearer: if (auto PtrParamOpt = GetMemoryPointerParamPos(getIntrinsicID())) return getArgOperand(PtrParamOpt.getValue()); return nullptr; |
llvm/include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
1366–1367 | It seems like this style is more idiomatic in the surrounding code, so leaving it as-is for now (with spacing fixed). | |
llvm/include/llvm/IR/IntrinsicInst.h | ||
402 | There will be separate classes for the corresponding SDNodes at least, in the followup patch. A bunch of the code there also assumes you can generically call these methods, so I think it would be good to leave it like this for now and revisit it in the followup patch to minimize things breaking. |
I had a thought about alignment. Do we have a plan on how to deal with unaligned accesses? The other non-VP masked intrinsics are scalarized by the ScalarizeMaskedMemIntrin pass and regular loads and stores are handled by -- I believe -- LegalizeDAG.
This doesn't apply for scalable vectors, where presumably the onus is on the author/compiler transform not to emit them in the first place. But then for that we have TTI methods like isLegalMaskedStore and friends.
Not yet
The other non-VP masked intrinsics are scalarized by the ScalarizeMaskedMemIntrin pass and regular loads and stores are handled by -- I believe -- LegalizeDAG.
Maybe vp can piggyback on ScalarizeMaskedMemIntrin for unaligned access expansion.
This doesn't apply for scalable vectors, where presumably the onus is on the author/compiler transform not to emit them in the first place. But then for that we have TTI methods like isLegalMaskedStore and friends.
ScalarizeMaskedMemIntrin could use llvm.experimental.vector.extract to do the same for scalable vectors...
It already will piggyback on ScalarizeMaskedMemIntrin for scalarization, so that should be straightforward to put into the followup VP-mem-expansion patch.
llvm/include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
1366–1367 | Was this idiomatic comment referring to my comment or the switch suggestion? If we don't do a switch I'd like to see it back the way it was originally so that each opcode is on one line. We can put clang-format off/on around it to suppress the clang format warning. | |
llvm/include/llvm/IR/IntrinsicInst.h | ||
409 | This should return unsigned to match the recent change to GetVectorLengthParamPos and GetMaskParamPos. | |
413 | unsigned | |
llvm/lib/IR/IntrinsicInst.cpp | ||
298 | unsigned |
llvm/include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
1366–1367 | Regarding the switch, but I've gone ahead and changed it to a switch statement. |
llvm/include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
1392 | Extra space after case here |
LGTM other than the last few nits.
llvm/lib/IR/IntrinsicInst.cpp | ||
---|---|---|
300 | Is the explicit this-> required? |
This needs to be rebased onto recent upstream.
In order to pass all unittest then, this patch needs an implementation in VPIntrinsic::getDeclarationForParams.
Just so I know for the VP reduction intrinsics, should we not be adding LangRef documentation at this stage too, given we're adding new intrinsics? How about support in the ExpandVectorPredication pass? Should that come now or can it wait?
Ideally, yes, you want to have the documentation with the intrinsics and their expansion in one patch.
If that grows too large, you split off the expansion for another patch.
For this patch, since we are kind of stalling on memory intrinsics to become available upstream, it'd be ok for me to just have the intrinsics and save everything else for followup patches. The intrinsics are documented in the reference implementation anyway.
Got you, thanks for the info. I'd definitely prefer to have it all done in one for my own patches but I'm not going to block this one or anything.
I rebased, fixed some failing unit tests, and extended VPIntrinsic::getDeclarationForParams to handle memory intrins. Not sure if I got arc to properly pick everything up.
Apart from a style nit, LGTM. Will run some local tests before accepting. Thx!
llvm/unittests/IR/VPIntrinsicTest.cpp | ||
---|---|---|
161–162 | No braces |
One more thing: title & summary do not reflect what's in the patch anymore. Please update those too.
My bad. I had overlooked that the VPIntrinsics.def implicitly instantiates the SDNodes. (I thought for a bit there were no SDNodes in this patch anymore). You're all good :)
Can we override clang-format here? The previous code had a more consistent structure.