Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This code is based on https://reviews.llvm.org/D75751, which in turn depends on https://reviews.llvm.org/D75672 and https://reviews.llvm.org/D75674.
I could split the patch in two patches, one for stN and one for ldN. The stN one would not have any dependencies on the aforementioned patches, but I'd rather not do the split because the code uses the common method: AArch64DAGToDAGISel::findAddrModeSVELoadStore.
Of course, I didn't marry this patch. :) If you think I'd better split it so I can submit the stN part without waiting for the dependencies, I'd happily do it.
Thank you!
Francesco
I have extracted the stores in a separate patch, as it does not rely on any of the dependencies that are specific to the loads. I will update this patch to removed the stores bits.
The stores' patch is at https://reviews.llvm.org/D77435
[NFC] Rebased on top of master, after the change for the stN went in (https://github.com/llvm/llvm-project/commit/897fdec586d9ad4c101738caa723bacdda15a769).
Hey @fpetrogalli @sdesmalen , any updates on this patch? This should be the last one to commit for structure load? Or are we using different approach?
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp | ||
---|---|---|
4837 | Why is this a template argument instead of a regular argument with default = 1? | |
4849–4850 | remove line? | |
4874 | How come that ST2..ST4 are not covered here? | |
llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll | ||
453 | nit: when testing these pairs, can you at least make sure to test the min/max values? | |
llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+reg-addr-mode.ll | ||
10 | nit: s/gp/pg/ (in both tests) |
Thank you for your review @sdesmalen!
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp | ||
---|---|---|
4837 | It is a template because I don't foresee NumVec becoming a runtime value. I have used this in other places, for example in setInfoSVEStN. I could set the default to be 1, but I'd rather call it out in the template invocation. Your comment made me think I should have also made sure that the values of NumVecs are architecturally valid ==> that's why I have added a static assert in the last version of the patch. | |
4874 | We cannot do for the structured load the same we do for the stores because address optimization is done on the intrinsics for structured stores: case Intrinsic::aarch64_sve_st2: { if (VT == MVT::nxv16i8) { SelectPredicatedStore</*Scale=*/0>(Node, 2, AArch64::ST2B, AArch64::ST2B_IMM); return; While for the loads the address optimization is done on the custom ISD nodes: case AArch64ISD::SVE_LD2_MERGE_ZERO: { if (VT == MVT::nxv16i8) { SelectPredicatedLoad</*Scale=*/0>(Node, 2, AArch64::LD2B_IMM, AArch64::LD2B); return; This means that the method getMemVTFromNode determines the memory type with the following generic code for the stores: if (isa<MemIntrinsicSDNode>(Root)) return cast<MemIntrinsicSDNode>(Root)->getMemoryVT(); We might do the same for the loads, but we would have to teach to AArch64TargetLowering::getTgtMemIntrinsic how to extract the memory type from the intrinsics call, like we do with the stores: switch (Intrinsic) { case Intrinsic::aarch64_sve_st2: return setInfoSVEStN<2>(Info, I); case Intrinsic::aarch64_sve_st3: return setInfoSVEStN<3>(Info, I); case Intrinsic::aarch64_sve_st4: return setInfoSVEStN<4>(Info, I); But that would probably requires changing all the code that converts the intrinsics of the loads into the SVE_LD*_MERGE_ZERO ISD nodes... If we want to do this cleanup maybe it is better to do it for the whole "structure store intrinsics/ISD nodes" in a separate patch? As far as I understand it, this would be quite an extensive change because we would have to make sure that the nodes of the structure loads are still using the intrinsic and not the custom ISD node when we enter this method, which is not what is going on now. I believe there is a reason for having converted the intrinsics nodes into custom ISD nodes before getting here in address optimization? | |
llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll | ||
453 | I can do that, just making sure you didn't miss the comment on the top of file that explains why I haven't add range checks for all cases: ; NOTE: invalid, upper and lower bound immediate values of the regimm ; addressing mode are checked only for the byte version of each ; instruction (`ld<N>b`), as the code for detecting the immediate is ; common to all instructions, and varies only for the number of ; elements of the structure store, which is <N> = 2, 3, 4. Let me know if you think this is not a sufficient explanation! |
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp | ||
---|---|---|
4837 | This means the compiler will have to generate two versions of getPackedVectorTypeFromPredicateType because you expect NumVec to be a constant value. It should instead just pass it as a regular parameter. One of the reasons that this file contains various templated functions is so that you can target them in ComplexPatterns through TableGen by specifying a specific variant of that function, e.g. SelectSVEShiftImm64<0, 64> for ComplexPattern SVEShiftImm64. | |
4874 | Thanks for explaining, I guess I'm happy keeping it as it is now. | |
llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll | ||
453 | I meant using #-32 and #28 instead of #16 and #-16 for ld4.nxv8i64 and nxv8f64 respectively. |
llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll | ||
---|---|---|
453 | Sorry, I don't understand why -32/28 would be better than -16/16. The values you are suggesting are tested to be lower and upper bound for ld4<T> in the byte case, ld4b, between lines 305-323. The range of the immediate does not depend on the scalar data type being loaded, but only on the number of vector loaded by the instruction, so I think we are goo in testing it only for ld<N>b. Or am I missing something? |
I have modified the testing of the reg+imm addressing mode to use upper and lower bound instead of strictly in-range values.
Thank you,
Francesco
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp | ||
---|---|---|
248–250 | Sorry, one more thing, can you make Scale a normal argument rather than a template argument? |
Address last feedback on template parameters.
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp | ||
---|---|---|
248–250 |
Sorry, one more thing, can you make Scale a normal argument rather than a template argument?
I see that D77435 also uses a template argument for the stores, rather than a regular argument. I guess you can fix that one separately in an NFC patch.