Page MenuHomePhabricator

[llvm][aarch64] SVE addressing modes.
ClosedPublic

Authored by fpetrogalli on Feb 7 2020, 1:35 PM.

Details

Summary

Added register + immediate and register + register addressing modes for the following intrinsics:

  1. Masked load and stores:
    • Sign and zero extended load and truncated stores.
    • No extension or truncation.
  2. Masked non-temporal load and store.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
fpetrogalli added inline comments.Feb 12 2020, 8:14 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4439

No, they are output parameters, passed by reference. They are not just for debug purpose.

4441

I will remove these debug messages, they are not very helpful.

fpetrogalli marked 10 inline comments as done.Feb 12 2020, 1:29 PM
fpetrogalli added inline comments.
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4464

Good catch, it is not needed.

llvm/lib/Target/AArch64/SVEInstrFormats.td
6982

Used the file convention (lower case, separated with _, starts with am).

fpetrogalli marked 2 inline comments as done.

Address code review from @andwar.

I have added the tests for the DAG Combine changes.

I have uploaded PDF renderings of the DAGS before and after the changes. I am not sure the tests for the combiner are explicit about the fact that the combiner is run on this pattern, but the graphs attached here show the effect of my changes on the code. I think that some of the patterns that are needed to show the changes in the output code are missing.

This comment was removed by fpetrogalli.

Cosmetic changes in the comments of the non-temporal reg+reg test.

IMHO every test functions (e.g.:test_masked_ldst_sv2i8 ) should either test contiguous load or store (i.e. only one thing at a time). That will help triaging potential bugs in the future and also would be consistent with other test files in this folder.

I see your point, but the tests that use merge and store at the same time are using exactly the same addressing modes, it is not that they are using something different. So if something fails in the addressing mode of the load, it fails in the addressing mode of the store. Having them merged together saves quite some typing, and has no disadvantages in term of unit testing.

Ah, I've realised that you split the files per addressing modes. As the name of the patch would suggest :-) OK, keep it as it is.

Thank you for adding DAG diagrams - they are very helpful!

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2330

vscale * C1 and vscale * (C0 + C1)?

3260

I don't quite understand the benefit of this transformation. The selection dag before and after are almost identical.

3601

vscale * (C0 + C1)?

7730

vscale * (C0 * C1)?

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4447

Missing doxstring.

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
12

; CHECK-NEXT? Here and in the following examples.

17

Could you format this line (and similar lines elsewhere)? E.g.: https://github.com/llvm/llvm-project/blob/318d0ede572080f18d0106dbc354e11c88329a84/llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll#L11

Makes it easier to parse for humans :) And will be consistent with other files too!

llvm/test/CodeGen/AArch64/sve-vscale-combine.ll
7

vscale * C1? and vscale * (C0 + C1)?

10

Perhaps ; CHECK-NOT: add?

26
  • vscale * (C0 * C1)?
  • What's C0 and what is C1 in this example?
34

[nit] Align with the following line (missing space)

39

Is nounwind needed in these examples?

56

Perhaps ; CHECK-NOT: sub?

78

; CHECK-NOT: shl?

81

Hm, since it's 6 here, shouldn't line 77 be ; CHECK-NEXT: rdvl x0, #6?

fpetrogalli marked 21 inline comments as done.Feb 13 2020, 11:21 AM
fpetrogalli added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3260

It is needed to avoid the having to deal with SUB when computing the addressing mode.

3601

Almost: vscale * (C0 * C1).

7730

Almost: (vscale * (C0 << C1)).

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
17

I prefer keeping all parameters in one line.

The "consistency with other file" argument doesn't work, most of the tests for SVE uses the convention in this patch:

frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*)$" sve*.ll | grep -v "()" | wc -l
2076
frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*,$" sve*.ll | grep -v "()" | wc -l
1433

The result becomes even more unbalanced if you look at the totality of tests in the folder:

frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*)$" *.ll | grep -v "()" | wc -l
7260
frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*,$" *.ll | grep -v "()" | wc -l
1435

I run grep on master, @ a062a3ed7fd82c277812d80fb83dc6f05b939a84, _not_ on my dev branch :).

llvm/test/CodeGen/AArch64/sve-vscale-combine.ll
39

Yes, to be able to use CHECK-NEXT after CHECK-LABEL

81

At IR level, it is %shl = 2^6 * VSCALE

At Assembly level, the output of RDVL is 2^4 * VSCALE

Hence, to compute %shl we need to multiply RDVL output by 2^2 -> #4 is correct.

Does that make sense? I have actually added it as a comment, but I have modified the code so that it produces rdvl #1.

fpetrogalli marked 6 inline comments as done.Feb 13 2020, 11:23 AM

@andwar, thank you for you review.

I have updated the code accordingly.

Thank you!

fpetrogalli edited the summary of this revision. (Show Details)Feb 13 2020, 11:28 AM
fpetrogalli edited the summary of this revision. (Show Details)
fpetrogalli edited the summary of this revision. (Show Details)

I have added a CHECK-NOT: mul that was missing from the DAG combiner tests.

andwar added inline comments.Feb 17 2020, 11:13 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
700

DELETEME

4435

[nit] As per https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code, this could be simplified:

if ((MulImm % MemWidthBytes) != 0)
  return false;
 
signed Offset = MulImm / MemWidthBytes;
if ((Offset < Min) || (Offset > Max)) 
  return false;

Base = N.getOperand(0);
OffImm = CurDAG->getTargetConstant(Offset, SDLoc(N), MVT::i64);
return true;

This way we have fewer levels of indentation.

4436

This operation mixes sizes: unsigned vs int64_t. It would be safer to be either:

  • consistently explicit about the size of integers (int64_t and int32_t), or
  • consistently implicit about the size of integers (unsigned, unsigned long)
4468

[nit] As per https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code, this could be simplified as:

if (RHS.getOpcode() != ISD::SHL) 
  return false;

const SDValue SRHS = RHS.getOperand(1);
auto *C = dyn_cast<ConstantSDNode>(SRHS);
if (nullptr == C) 
  return false;

const uint64_t Shift = C->getZExtValue();
if (Shift == Scale) {
   Base = LHS;
   Offset = RHS.getOperand(0);
    return true;
}

This way you we have fewer levels of indentation.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1211

[nit] Indent consistently

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
17

I think that your comparison misses the context, and that's entirely my fault because I didn't make it clear. If you compare long lines only, the split is roughly 50/50.

I kindly asked for this update because these lines are wrapped by Phabricator (they don't fit on my screen and I don't see how to make Phabricator stop doing that). This makes reviewing on Phab a bit frustrating.

Btw, I think that awk would be more fitting here ;-)

#! /bin/evn bash

TEST_FILES=(
"llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll"
"llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll"
"llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll"
"llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll")

for test_file in "${TEST_FILES[@]}"; do
awk -F',' '{
  if ($1 ~ /call/) {
    # Counter number of character for indentation
    for (i = 1; i <= length($0); i++) {
      if (substr($0, i, 1) == "(") {
        lenght_col = i
        break
      }
    }

    # Split function call - 2 args
    if (NF == 2) {
      printf("%s,\n %*s %s\n", $1, lenght_col - 3, "", $2)
    }

    # Split function call - 3 args
    if (NF == 3) {
      printf("%s,\n %*s %s,\n %*s %s\n", $1, lenght_col - 3, "", $2, lenght_col - 3, "", $3)
    }

    # Split function call - 4 args
    if (NF == 4) {
      printf("%s,\n %*s %s,\n %*s %s,\n %*s %s\n", $1, lenght_col - 3, "", $2, lenght_col - 3, "", $3, lenght_col - 3, "", $4)
    }

  } else {
    # Not a call, not reformatting
    print $0
  }
}' ${test_file} > temp.ll

mv temp.ll ${test_file}

done
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll
392

FIXME

llvm/test/CodeGen/AArch64/sve-vscale-combine.ll
7

What about vscale * C1?

14

Could C0 and C1 be different than 1?

29

Did you mean C0 = 2? And multiplication by C0 seems to be missing - there's only multiplication by 32. Shouldn't the IR look like this:

%vscale = call i64 @llvm.vscale.i64()
%mul_by_16 = mul i64 %vscale, 16
%mul_by_2 = mul i64 %mul_by_16, 32
ret i64 %mul_by_2

Cheers for updating this @fpetrogalli ! My most recent comments are mostly nits.

fpetrogalli marked 13 inline comments as done.

Thank you @adwar. I have addressed your comments.

Your awk script was very useful.

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
17

You win!

llvm/test/CodeGen/AArch64/sve-vscale-combine.ll
14

Yes, but to get C0 and C1 different from 1 I'd have to use a mul instruction, which is already tested in the combine_mul_vscale_* tests.

I think it is enough to test with C0=C1=1.

29

When targeting SVE, @llvm.vscale.*() returns the number of 16-byte chunks of and SVE register. If I multiply it by 32, it means it is returning the number of 4-bit (half-byte) elements in the SVE register. RDVL returns the number of 8-bit lanes in the register, hence the number of 4-bit lanes is given by RDVL Xn, %2.

I have updated the comment setting C1 = 32.

Thanks for creating this patch @fpetrogalli!

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2331

Does it make sense to submit these DAGCombine changes + all related tests in a separate patch?
This patch is a bit of three patches in one at the moment: DAGCombines, reg+reg addressing modes, reg+imm addressing modes.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
224

nit: This one is missing a comment.

4438

Are you purposely using signed instead of int64_t here?

4439

nit: unnecessary brackets, can be if (Offset < Min || Offset > Max)

4471

nit: Why const?

fpetrogalli marked 8 inline comments as done.

I have extracted the DAGCombiner changes in a separate patch as
requested by @sdesmalen: https://reviews.llvm.org/D74782

fpetrogalli edited the summary of this revision. (Show Details)Feb 18 2020, 11:38 AM
fpetrogalli added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2331

I have extracted the DAGCombiner changes here: https://reviews.llvm.org/D74782

I think that the code for the addressing mode is simple enough that it can stay in a single patch.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4438

Should be int64_t for consistency.

4471

I try to use const indiscriminately when I know that the variable is not supposed to change inside the scope. The same it is done in other places in this file (not everywhere). I didn't marry this, so if you want I can remove it. :)

Thank you @sdesmalen for your review!

Francesco

Thank you for all the updates @fpetrogalli ! LGTM

sdesmalen accepted this revision.Feb 21 2020, 9:58 AM

LGTM!

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4472
if (auto *C = dyn_cast<ConstantSDNode>(ShiftRHS)) {
  if (ShiftAmount == C->getZExtValue()) {
    ...
    return true;
  }
}

return false;
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
251

nit: did you chose a different alignment here on purpose?

This revision is now accepted and ready to land.Feb 21 2020, 9:58 AM
fpetrogalli marked 3 inline comments as done.Feb 21 2020, 11:34 AM
fpetrogalli added inline comments.
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4472

@sdesmalen, your version saves some lines, I'll apply it and then submit the patch. Thank you!

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
251

No - it is not on purpose. Do you want me to set everything to 1? I wouldn't bother, non of the code added in this patch care of the value of the alignment...

fpetrogalli marked 2 inline comments as done.Feb 21 2020, 11:59 AM
fpetrogalli added inline comments.
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
251

Well, it was easier to change everything to i32 1 instead of doing another round of questions! :)

Update code as requested by @sdesmalen.

Thank you!

Francesco

fpetrogalli marked an inline comment as done.Feb 21 2020, 12:01 PM
This revision was automatically updated to reflect the committed changes.