This is an archive of the discontinued LLVM Phabricator instance.

[llvm][aarch64] SVE addressing modes.
ClosedPublic

Authored by fpetrogalli on Feb 7 2020, 1:35 PM.

Details

Summary

Added register + immediate and register + register addressing modes for the following intrinsics:

  1. Masked load and stores:
    • Sign and zero extended load and truncated stores.
    • No extension or truncation.
  2. Masked non-temporal load and store.

Diff Detail

Event Timeline

fpetrogalli created this revision.Feb 7 2020, 1:35 PM
fpetrogalli retitled this revision from [WIP][llvm][aarch64] SVE addressing modes: register + immediate. to [llvm][aarch64] SVE addressing modes..
fpetrogalli edited the summary of this revision. (Show Details)

The patch now covers all the LLVM intrinsic listed in the summary:

  1. Masked load and stores:
    • Sign and zero extended load and truncated stores.
    • No extension or truncation.
  2. Non-temporal load and store.

Notice that this patch is based (and depends) on https://reviews.llvm.org/D73602.

Hi @fpetrogalli, thank you for working on this.

  • IMHO every test functions (e.g.:test_masked_ldst_sv2i8 ) should either test contiguous load or store (i.e. only one thing at a time). That will help triaging potential bugs in the future and also would be consistent with other test files in this folder.
  • [nit] AFAIK, + is never used in file names (e.g. sve-pred-non-temporal-ldst-addressing-mode-reg+reg.ll). I couldn't find any specific rule for that though.
  • [nit] The idea of %sv2i1 = type <vscale x 2 x i1> is very neat, but I'm slightly worried that that will complicate searching through test files (we are quite familiar with <vscale x 2 x i1>, not so much with %sv2i1, so you might miss cases when using grep).
  • Are the DAG Combine rules required for this patch? If so, could you add tests for them?
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4449

[nit] Wouldn't assert be more suitable?

4468

Base and OffImm are only only needed when building in Debug mode and running e.g. llc with -debug. Maybe move that inside LLVM_DEBUG?

4470

[nit] Match found is very generic (and you use it twice). Maybe sth more specific?

4483

Used only once (so no need for a dedicated variable)

4484

Not used.

4493

Hm, why? Maybe document this method like you did for SelectAddrModeIndexedSVE?

llvm/lib/Target/AArch64/SVEInstrFormats.td
7064

Inconsistent naming with the records that follow this one.

llvm/test/CodeGen/AArch64/sve-gep.ll
7 ↗(On Diff #244061)

Unrelated changes.

fpetrogalli marked 6 inline comments as done.Feb 12 2020, 8:14 AM

@andwar , thank you for the review.

I'll update the patch asap.

IMHO every test functions (e.g.:test_masked_ldst_sv2i8 ) should either test contiguous load or store (i.e. only one thing at a time). That will help triaging potential bugs in the future and also would be consistent with other test files in this folder.

I see your point, but the tests that use merge and store at the same time are using exactly the same addressing modes, it is not that they are using something different. So if something fails in the addressing mode of the load, it fails in the addressing mode of the store. Having them merged together saves quite some typing, and has no disadvantages in term of unit testing.

[nit] AFAIK, + is never used in file names (e.g. sve-pred-non-temporal-ldst-addressing-mode-reg+reg.ll). I couldn't find any specific rule for that though.

Fair point, I'll remove the plus.

[nit] The idea of %sv2i1 = type <vscale x 2 x i1> is very neat, but I'm slightly worried that that will complicate searching through test files (we are quite familiar with <vscale x 2 x i1>, not so much with %sv2i1, so you might miss cases when using grep).

Very good point. The type redefinition saved me a lot of typing, but now that the tests are there it is better o remove it.

Are the DAG Combine rules required for this patch? If so, could you add tests for them?

They are necessary for the reg+imm tests. They are needed to make sure that the ADD node of the base address is always in the format (ADD %BASE (VSCALE CONST)). I'll see if I can add some unit tests specifically for the combiner changes.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1371

This is to be removed.

1387

This will be removed too.

llvm/test/CodeGen/AArch64/sve-gep.ll
7 ↗(On Diff #244061)

No, I had to fix this because of the DAG combine changes.

fpetrogalli added inline comments.Feb 12 2020, 8:14 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4468

No, they are output parameters, passed by reference. They are not just for debug purpose.

4470

I will remove these debug messages, they are not very helpful.

fpetrogalli marked 10 inline comments as done.Feb 12 2020, 1:29 PM
fpetrogalli added inline comments.
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4493

Good catch, it is not needed.

llvm/lib/Target/AArch64/SVEInstrFormats.td
7064

Used the file convention (lower case, separated with _, starts with am).

fpetrogalli marked 2 inline comments as done.

Address code review from @andwar.

I have added the tests for the DAG Combine changes.

I have uploaded PDF renderings of the DAGS before and after the changes. I am not sure the tests for the combiner are explicit about the fact that the combiner is run on this pattern, but the graphs attached here show the effect of my changes on the code. I think that some of the patterns that are needed to show the changes in the output code are missing.

This comment was removed by fpetrogalli.

Cosmetic changes in the comments of the non-temporal reg+reg test.

IMHO every test functions (e.g.:test_masked_ldst_sv2i8 ) should either test contiguous load or store (i.e. only one thing at a time). That will help triaging potential bugs in the future and also would be consistent with other test files in this folder.

I see your point, but the tests that use merge and store at the same time are using exactly the same addressing modes, it is not that they are using something different. So if something fails in the addressing mode of the load, it fails in the addressing mode of the store. Having them merged together saves quite some typing, and has no disadvantages in term of unit testing.

Ah, I've realised that you split the files per addressing modes. As the name of the patch would suggest :-) OK, keep it as it is.

Thank you for adding DAG diagrams - they are very helpful!

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2330 ↗(On Diff #244273)

vscale * C1 and vscale * (C0 + C1)?

3260 ↗(On Diff #244273)

I don't quite understand the benefit of this transformation. The selection dag before and after are almost identical.

3601 ↗(On Diff #244273)

vscale * (C0 + C1)?

7730 ↗(On Diff #244273)

vscale * (C0 * C1)?

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4476

Missing doxstring.

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
12

; CHECK-NEXT? Here and in the following examples.

17

Could you format this line (and similar lines elsewhere)? E.g.: https://github.com/llvm/llvm-project/blob/318d0ede572080f18d0106dbc354e11c88329a84/llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll#L11

Makes it easier to parse for humans :) And will be consistent with other files too!

llvm/test/CodeGen/AArch64/sve-vscale-combine.ll
6 ↗(On Diff #244273)

vscale * C1? and vscale * (C0 + C1)?

9 ↗(On Diff #244273)

Perhaps ; CHECK-NOT: add?

25 ↗(On Diff #244273)
  • vscale * (C0 * C1)?
  • What's C0 and what is C1 in this example?
33 ↗(On Diff #244273)

[nit] Align with the following line (missing space)

38 ↗(On Diff #244273)

Is nounwind needed in these examples?

55 ↗(On Diff #244273)

Perhaps ; CHECK-NOT: sub?

77 ↗(On Diff #244273)

; CHECK-NOT: shl?

80 ↗(On Diff #244273)

Hm, since it's 6 here, shouldn't line 77 be ; CHECK-NEXT: rdvl x0, #6?

fpetrogalli marked 21 inline comments as done.Feb 13 2020, 11:21 AM
fpetrogalli added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3260 ↗(On Diff #244273)

It is needed to avoid the having to deal with SUB when computing the addressing mode.

3601 ↗(On Diff #244273)

Almost: vscale * (C0 * C1).

7730 ↗(On Diff #244273)

Almost: (vscale * (C0 << C1)).

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
17

I prefer keeping all parameters in one line.

The "consistency with other file" argument doesn't work, most of the tests for SVE uses the convention in this patch:

frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*)$" sve*.ll | grep -v "()" | wc -l
2076
frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*,$" sve*.ll | grep -v "()" | wc -l
1433

The result becomes even more unbalanced if you look at the totality of tests in the folder:

frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*)$" *.ll | grep -v "()" | wc -l
7260
frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*,$" *.ll | grep -v "()" | wc -l
1435

I run grep on master, @ a062a3ed7fd82c277812d80fb83dc6f05b939a84, _not_ on my dev branch :).

llvm/test/CodeGen/AArch64/sve-vscale-combine.ll
38 ↗(On Diff #244273)

Yes, to be able to use CHECK-NEXT after CHECK-LABEL

80 ↗(On Diff #244273)

At IR level, it is %shl = 2^6 * VSCALE

At Assembly level, the output of RDVL is 2^4 * VSCALE

Hence, to compute %shl we need to multiply RDVL output by 2^2 -> #4 is correct.

Does that make sense? I have actually added it as a comment, but I have modified the code so that it produces rdvl #1.

fpetrogalli marked 6 inline comments as done.Feb 13 2020, 11:23 AM

@andwar, thank you for you review.

I have updated the code accordingly.

Thank you!

fpetrogalli edited the summary of this revision. (Show Details)Feb 13 2020, 11:28 AM
fpetrogalli edited the summary of this revision. (Show Details)
fpetrogalli edited the summary of this revision. (Show Details)

I have added a CHECK-NOT: mul that was missing from the DAG combiner tests.

andwar added inline comments.Feb 17 2020, 11:13 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
703

DELETEME

4464

[nit] As per https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code, this could be simplified:

if ((MulImm % MemWidthBytes) != 0)
  return false;
 
signed Offset = MulImm / MemWidthBytes;
if ((Offset < Min) || (Offset > Max)) 
  return false;

Base = N.getOperand(0);
OffImm = CurDAG->getTargetConstant(Offset, SDLoc(N), MVT::i64);
return true;

This way we have fewer levels of indentation.

4465

This operation mixes sizes: unsigned vs int64_t. It would be safer to be either:

  • consistently explicit about the size of integers (int64_t and int32_t), or
  • consistently implicit about the size of integers (unsigned, unsigned long)
4497

[nit] As per https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code, this could be simplified as:

if (RHS.getOpcode() != ISD::SHL) 
  return false;

const SDValue SRHS = RHS.getOperand(1);
auto *C = dyn_cast<ConstantSDNode>(SRHS);
if (nullptr == C) 
  return false;

const uint64_t Shift = C->getZExtValue();
if (Shift == Scale) {
   Base = LHS;
   Offset = RHS.getOperand(0);
    return true;
}

This way you we have fewer levels of indentation.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1277

[nit] Indent consistently

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
17

I think that your comparison misses the context, and that's entirely my fault because I didn't make it clear. If you compare long lines only, the split is roughly 50/50.

I kindly asked for this update because these lines are wrapped by Phabricator (they don't fit on my screen and I don't see how to make Phabricator stop doing that). This makes reviewing on Phab a bit frustrating.

Btw, I think that awk would be more fitting here ;-)

#! /bin/evn bash

TEST_FILES=(
"llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll"
"llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll"
"llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll"
"llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll")

for test_file in "${TEST_FILES[@]}"; do
awk -F',' '{
  if ($1 ~ /call/) {
    # Counter number of character for indentation
    for (i = 1; i <= length($0); i++) {
      if (substr($0, i, 1) == "(") {
        lenght_col = i
        break
      }
    }

    # Split function call - 2 args
    if (NF == 2) {
      printf("%s,\n %*s %s\n", $1, lenght_col - 3, "", $2)
    }

    # Split function call - 3 args
    if (NF == 3) {
      printf("%s,\n %*s %s,\n %*s %s\n", $1, lenght_col - 3, "", $2, lenght_col - 3, "", $3)
    }

    # Split function call - 4 args
    if (NF == 4) {
      printf("%s,\n %*s %s,\n %*s %s,\n %*s %s\n", $1, lenght_col - 3, "", $2, lenght_col - 3, "", $3, lenght_col - 3, "", $4)
    }

  } else {
    # Not a call, not reformatting
    print $0
  }
}' ${test_file} > temp.ll

mv temp.ll ${test_file}

done
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll
392

FIXME

llvm/test/CodeGen/AArch64/sve-vscale-combine.ll
13 ↗(On Diff #244499)

Could C0 and C1 be different than 1?

28 ↗(On Diff #244499)

Did you mean C0 = 2? And multiplication by C0 seems to be missing - there's only multiplication by 32. Shouldn't the IR look like this:

%vscale = call i64 @llvm.vscale.i64()
%mul_by_16 = mul i64 %vscale, 16
%mul_by_2 = mul i64 %mul_by_16, 32
ret i64 %mul_by_2
6 ↗(On Diff #244273)

What about vscale * C1?

Cheers for updating this @fpetrogalli ! My most recent comments are mostly nits.

fpetrogalli marked 13 inline comments as done.

Thank you @adwar. I have addressed your comments.

Your awk script was very useful.

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
17

You win!

llvm/test/CodeGen/AArch64/sve-vscale-combine.ll
13 ↗(On Diff #244499)

Yes, but to get C0 and C1 different from 1 I'd have to use a mul instruction, which is already tested in the combine_mul_vscale_* tests.

I think it is enough to test with C0=C1=1.

28 ↗(On Diff #244499)

When targeting SVE, @llvm.vscale.*() returns the number of 16-byte chunks of and SVE register. If I multiply it by 32, it means it is returning the number of 4-bit (half-byte) elements in the SVE register. RDVL returns the number of 8-bit lanes in the register, hence the number of 4-bit lanes is given by RDVL Xn, %2.

I have updated the comment setting C1 = 32.

Thanks for creating this patch @fpetrogalli!

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2331 ↗(On Diff #245180)

Does it make sense to submit these DAGCombine changes + all related tests in a separate patch?
This patch is a bit of three patches in one at the moment: DAGCombines, reg+reg addressing modes, reg+imm addressing modes.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
224

nit: This one is missing a comment.

4467

Are you purposely using signed instead of int64_t here?

4468

nit: unnecessary brackets, can be if (Offset < Min || Offset > Max)

4500

nit: Why const?

fpetrogalli marked 8 inline comments as done.

I have extracted the DAGCombiner changes in a separate patch as
requested by @sdesmalen: https://reviews.llvm.org/D74782

fpetrogalli edited the summary of this revision. (Show Details)Feb 18 2020, 11:38 AM
fpetrogalli added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2331 ↗(On Diff #245180)

I have extracted the DAGCombiner changes here: https://reviews.llvm.org/D74782

I think that the code for the addressing mode is simple enough that it can stay in a single patch.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4467

Should be int64_t for consistency.

4500

I try to use const indiscriminately when I know that the variable is not supposed to change inside the scope. The same it is done in other places in this file (not everywhere). I didn't marry this, so if you want I can remove it. :)

Thank you @sdesmalen for your review!

Francesco

Thank you for all the updates @fpetrogalli ! LGTM

sdesmalen accepted this revision.Feb 21 2020, 9:58 AM

LGTM!

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4501
if (auto *C = dyn_cast<ConstantSDNode>(ShiftRHS)) {
  if (ShiftAmount == C->getZExtValue()) {
    ...
    return true;
  }
}

return false;
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
251

nit: did you chose a different alignment here on purpose?

This revision is now accepted and ready to land.Feb 21 2020, 9:58 AM
fpetrogalli marked 3 inline comments as done.Feb 21 2020, 11:34 AM
fpetrogalli added inline comments.
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4501

@sdesmalen, your version saves some lines, I'll apply it and then submit the patch. Thank you!

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
251

No - it is not on purpose. Do you want me to set everything to 1? I wouldn't bother, non of the code added in this patch care of the value of the alignment...

fpetrogalli marked 2 inline comments as done.Feb 21 2020, 11:59 AM
fpetrogalli added inline comments.
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
251

Well, it was easier to change everything to i32 1 instead of doing another round of questions! :)

Update code as requested by @sdesmalen.

Thank you!

Francesco

fpetrogalli marked an inline comment as done.Feb 21 2020, 12:01 PM
This revision was automatically updated to reflect the committed changes.