This is an archive of the discontinued LLVM Phabricator instance.

[llvm][CodeGen] Addressing modes for SVE ldN.
ClosedPublic

Authored by fpetrogalli on Apr 1 2020, 4:02 PM.


Event Timeline

fpetrogalli created this revision. Apr 1 2020, 4:02 PM
Herald added a project: Restricted Project. Apr 1 2020, 4:02 PM

This code is based on https://reviews.llvm.org/D75751, which in turn depends on https://reviews.llvm.org/D75672 and https://reviews.llvm.org/D75674.

I could split the patch in two patches, one for stN and one for ldN. The stN one would not have any dependencies on the aforementioned patches, but I'd rather not do the split because the code uses the common method: AArch64DAGToDAGISel::findAddrModeSVELoadStore.

Of course, I'm not married to this patch. :) If you think it would be better to split it so I can submit the stN part without waiting for the dependencies, I'd happily do it.

Thank you!

Francesco

fpetrogalli added a comment (edited). Apr 3 2020, 3:53 PM

I have extracted the stores into a separate patch, as they do not rely on any of the dependencies that are specific to the loads. I will update this patch to remove the stores bits.

The stores' patch is at https://reviews.llvm.org/D77435

fpetrogalli retitled this revision from [llvm][CodeGen] Addressing modes for SVE ldN/stN. to [llvm][CodeGen] Addressing modes for SVE ldN..

I have removed the stores as they are separately implemented in D77435.

[NFC] Rebased on top of master, after the change for the stN went in (https://github.com/llvm/llvm-project/commit/897fdec586d9ad4c101738caa723bacdda15a769).

Hey @fpetrogalli @sdesmalen, any updates on this patch? This should be the last one to commit for structured loads? Or are we using a different approach?

I have rebased on top of master and added the bfloat test cases.

sdesmalen added inline comments. Jul 20 2020, 9:52 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4808

Why is this a template argument instead of a regular argument with default = 1?

4820–4821

remove line?

4845

How come that ST2..ST4 are not covered here?

llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll
453

nit: when testing these pairs, can you at least make sure to test the min/max values?

llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+reg-addr-mode.ll
10

nit: s/gp/pg/ (in both tests)

fpetrogalli marked 7 inline comments as done.

Thank you for your review @sdesmalen!

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4808

It is a template because I don't foresee NumVec becoming a runtime value. I have used this in other places, for example in setInfoSVEStN. I could set the default to be 1, but I'd rather call it out in the template invocation.

Your comment made me realize that I should also make sure the values of NumVecs are architecturally valid, which is why I have added a static assert in the latest version of the patch.
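For illustration, here is a minimal, hypothetical sketch (the function name and signature are invented, not the actual LLVM code) of how a static assert can reject architecturally invalid NumVecs values at compile time, since NumVecs is a template parameter rather than a runtime value:

```cpp
#include <cassert>

// Hypothetical sketch: because NumVecs is a compile-time parameter, an
// architecturally invalid value (SVE structured loads exist only for
// N = 2, 3, 4) is rejected at compile time rather than at run time.
template <unsigned NumVecs>
unsigned selectPredicatedLoadSketch(unsigned BaseOpcode) {
  static_assert(NumVecs >= 2 && NumVecs <= 4,
                "SVE structured loads exist only for N = 2, 3, 4");
  // Stand-in for real selection logic: derive an opcode from the base.
  return BaseOpcode + NumVecs;
}
```

With this in place, an instantiation like selectPredicatedLoadSketch<5> fails to compile, which is the guarantee the static assert is meant to provide.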

4845

We cannot do the same for the structured loads as we do for the stores, because for the structured stores the address optimization is done on the intrinsics:

case Intrinsic::aarch64_sve_st2: {
  if (VT == MVT::nxv16i8) {
    SelectPredicatedStore</*Scale=*/0>(Node, 2, AArch64::ST2B,
                                       AArch64::ST2B_IMM);
    return;
  }
  ...

While for the loads the address optimization is done on the custom ISD nodes:

case AArch64ISD::SVE_LD2_MERGE_ZERO: {
  if (VT == MVT::nxv16i8) {
    SelectPredicatedLoad</*Scale=*/0>(Node, 2, AArch64::LD2B_IMM,
                                      AArch64::LD2B);
    return;
  }
  ...

This means that the method getMemVTFromNode determines the memory type with the following generic code for the stores:

if (isa<MemIntrinsicSDNode>(Root))
  return cast<MemIntrinsicSDNode>(Root)->getMemoryVT();

We might do the same for the loads, but we would have to teach AArch64TargetLowering::getTgtMemIntrinsic how to extract the memory type from the intrinsic call, like we do for the stores:

switch (Intrinsic) {
case Intrinsic::aarch64_sve_st2:
  return setInfoSVEStN<2>(Info, I);
case Intrinsic::aarch64_sve_st3:
  return setInfoSVEStN<3>(Info, I);
case Intrinsic::aarch64_sve_st4:
  return setInfoSVEStN<4>(Info, I);

But that would probably require changing all the code that converts the load intrinsics into the SVE_LD*_MERGE_ZERO ISD nodes...

If we want to do this cleanup, maybe it is better to do it for the whole set of structured load intrinsics/ISD nodes in a separate patch? As far as I understand it, this would be quite an extensive change, because we would have to make sure that the nodes of the structured loads still use the intrinsic and not the custom ISD node when we enter this method, which is not what happens now. I believe there is a reason for converting the intrinsic nodes into custom ISD nodes before getting here in address optimization?

llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll
453

I can do that; I'm just making sure you didn't miss the comment at the top of the file that explains why I haven't added range checks for all cases:

; NOTE: invalid, upper and lower bound immediate values of the regimm
; addressing mode are checked only for the byte version of each
; instruction (`ld<N>b`), as the code for detecting the immediate is
; common to all instructions, and varies only for the number of
; elements of the structure store, which is <N> = 2, 3, 4.

Let me know if you think this is not a sufficient explanation!

sdesmalen added inline comments. Jul 22 2020, 8:14 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4808

This means the compiler will have to generate two versions of getPackedVectorTypeFromPredicateType because you expect NumVec to be a constant value. You should instead just pass it as a regular parameter.

One of the reasons that this file contains various templated functions is so that you can target them in ComplexPatterns through TableGen by specifying a specific variant of that function, e.g. SelectSVEShiftImm64<0, 64> for ComplexPattern SVEShiftImm64.
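As an illustrative sketch of the trade-off being discussed (the names below are invented, not the actual LLVM functions): a template parameter forces one instantiation per distinct value, while a regular parameter with a default compiles to a single function:

```cpp
// Template version: sketchTemplated<1> and sketchTemplated<2> are two
// distinct functions in the object file, one per NumVec value used.
template <unsigned NumVec>
unsigned sketchTemplated(unsigned EltBits) {
  // Lanes per 128-bit vector granule, scaled by the structure size.
  return NumVec * 128 / EltBits;
}

// Regular-parameter version with default = 1: a single instantiation,
// with NumVec passed as an ordinary runtime value.
unsigned sketchRegular(unsigned EltBits, unsigned NumVec = 1) {
  return NumVec * 128 / EltBits;
}
```

Both forms compute the same result; the difference is purely in code size and in whether the value can be targeted from TableGen, which is why templates are reserved for the ComplexPattern entry points.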

4845

Thanks for explaining, I guess I'm happy keeping it as it is now.

llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll
453

I meant using #-32 and #28 instead of #16 and #-16 for ld4.nxv8i64 and nxv8f64 respectively.
(and similar for other tests in this file)
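For context, my reading of the bounds in question (consistent with the #-32/#28 values suggested above for ld4): the reg+imm immediate for ld<N> must be a multiple of N in the range [-8*N, 7*N], in vector-length units. A hypothetical sketch of such a check, not code from the patch:

```cpp
// Sketch of the ld<N> reg+imm legality rule as discussed in this
// thread: the immediate must be a multiple of NumVecs and lie in
// [-8 * NumVecs, 7 * NumVecs] (units of one vector length, MUL VL).
bool isValidLdNImm(int Imm, int NumVecs) {
  if (Imm % NumVecs != 0)
    return false;
  return Imm >= -8 * NumVecs && Imm <= 7 * NumVecs;
}
```

For ld2 this gives the range [-16, 14] and for ld4 the range [-32, 28], matching the min/max values requested in the review.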

I have addressed your feedback. Thank you @sdesmalen.

fpetrogalli marked 2 inline comments as done. Jul 23 2020, 3:06 PM
fpetrogalli added inline comments.
llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll
453

Sorry, I don't understand why -32/28 would be better than -16/16. The values you are suggesting are tested as the lower and upper bounds for ld4<T> in the byte case, ld4b, at lines 305-323.

The range of the immediate does not depend on the scalar data type being loaded, but only on the number of vectors loaded by the instruction, so I think we are good testing it only for ld<N>b.

Or am I missing something?

I have modified the testing of the reg+imm addressing mode to use the upper and lower bounds instead of strictly in-range values.

Thank you,

Francesco

sdesmalen added inline comments. Jul 27 2020, 7:20 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
248–249

Sorry, one more thing: can you make Scale a normal argument rather than a template argument?
I see that D77435 also uses a template argument for the stores, rather than a regular argument. I guess you can fix that one separately in an NFC patch.

fpetrogalli marked an inline comment as done.

Address last feedback on template parameters.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
248–249
fpetrogalli marked an inline comment as done. Jul 27 2020, 10:11 AM
fpetrogalli marked 3 inline comments as done.
sdesmalen accepted this revision. Jul 27 2020, 10:13 AM

LGTM, thanks!

This revision is now accepted and ready to land. Jul 27 2020, 10:13 AM
This revision was automatically updated to reflect the committed changes.