Page MenuHomePhabricator

[AArch64] Generate AND in place of CSEL for Table Based CTTZ lowering in -O3
ClosedPublic

Authored by rahular-rrlogic on Apr 14 2022, 4:45 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17706

I will do the capitalization and add support for i64

craig.topper added inline comments.Apr 14 2022, 8:46 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17704

There's a function called isNullConstant that can be used here.

The condition for performing this optimization was modified to checking if the RHS of SUBS is 0 and if one of the values for the CSEL is a CTTZ.

Support for i64 was added.

Tests were added.

rahular-rrlogic marked 3 inline comments as done.Apr 18 2022, 12:33 AM
djtodoro added inline comments.Apr 19 2022, 12:24 AM
llvm/test/CodeGen/AArch64/table-based-cttz.ll
2

Can we add a top-level comment here, describing what we are testing?

Updated test with description and made formatting changes.

rahular-rrlogic marked an inline comment as done.Apr 19 2022, 12:50 AM

Modified test to add details and made formatting changes

As far as I understand this should be testing for select (icmp eq X, 0), 0, cttz X and converting it to and(cttz X, #bw-1). If so there are more conditions that need to be checked.

What other conditions do you think should be added other than checking for 0 and cttz?

It looks like this checks for the select/CSEL, and the icmp/SUBS with a 0, but not the "eq" yet, and not the "0" in the select/csel. It also doesn't check that the X in the icmp and the X in the cttz are the same?

It would be worth adding test cases for the cases it should not fold.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17704

It should probably be something like this, if the value is a condition code.

AArch64CC CC = N->getConstantOperandVal(2);
if (CC == AArch64CC::EQ)

It sounds good to use it for the zero of the sub though.

17705–17706

Why is it hard-coding 31, as opposed to checking the size of the type? Why can't we get here with an i64?

llvm/test/CodeGen/AArch64/table-based-cttz.ll
28

Can you add an i64 case without the truncate down to an i32?

Removed hard coded constants and replaced that with bitwidth, added full support for i64 and added more conditions to match against the intended pattern. Modified test to include cases in which the optimization does not take place.

rahular-rrlogic marked an inline comment as done.Apr 22 2022, 12:20 PM

As far as I understand this should be testing for select (icmp eq X, 0), 0, cttz X and converting it to and(cttz X, #bw-1). If so there are more conditions that need to be checked.

What other conditions do you think should be added other than checking for 0 and cttz?

It looks like this checks for the select/CSEL, and the icmp/SUBS with a 0, but not the "eq" yet, and not the "0" in the select/csel. It also doesn't check that the X in the icmp and the X in the cttz are the same?

Is the last check really needed? Both icmp and cttz use the value from the same register in the IR itself.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17705–17706

I had misunderstood this part. Fixed now, thank you.

llvm/test/CodeGen/AArch64/table-based-cttz.ll
28

Added.

As far as I understand this should be testing for select (icmp eq X, 0), 0, cttz X and converting it to and(cttz X, #bw-1). If so there are more conditions that need to be checked.

What other conditions do you think should be added other than checking for 0 and cttz?

It looks like this checks for the select/CSEL, and the icmp/SUBS with a 0, but not the "eq" yet, and not the "0" in the select/csel. It also doesn't check that the X in the icmp and the X in the cttz are the same?

Is the last check really needed? Both icmp and cttz use the value from the same register in the IR itself.

Yes it's needed. You have to make you don't optimize select (icmp eq X, 0), 0, cttz Y if X and Y aren't the same.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17703

You can get the bitwidth a lot quicker with N1.getValueSizeInBits()

You should probably also handle select (icmp ne X, 0), (cttz X), 0.

rahular-rrlogic marked an inline comment as done.

Added condition to check if X in icmp and cttz are the same and added code to handle select (icmp ne X, 0), (cttz X), 0. Updated test for the same.

craig.topper added inline comments.Apr 25 2022, 9:21 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17709

else should be on the same line as the previous closing brace

17721

This line is longer than 80 columns.

17722

Space after if

craig.topper added inline comments.Apr 25 2022, 9:26 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17718

Technically, if you look through a truncate you need to know the truncate didn't drop any bits of the CTTZ result. But maybe AArch64ISD::SUBS and CSEL can only be created after type legalization so the only possible types are i32 and i64? I'm not an AArch64 expert so I don't know.

Can you pull the cttz code into a new function, so that it is a little more separate from the other code in performCSELCombine.

Running clang-format on the patch can also be good to remove all the formatting issues. There is a script to help in clang/tools/clang-format/clang-format-diff.py.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17718

Yeah I believe SUBS and CSEL will only be generated for legal types. Perhaps it is worth adding an assert just to be safe.

Created a new function to remove the folding itself from performCSELCombine(). Added an assert to check for legal types.

rahular-rrlogic marked 7 inline comments as done.Apr 29 2022, 7:12 AM
rahular-rrlogic added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17718

The type legalization does occur in this case which I checked via the debug output. I have added an assert anyway as @dmgreen suggested.

craig.topper added inline comments.Apr 29 2022, 10:30 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17700–17706

You can get rid of the IsEQOrNE variable by adding

else
  return SDValue();
17700–17706

Remove the IsEQOrNE using the suggestion above, then invert this to make an early out.

17715
if (SDValue Folded = foldCTTZ(N, DAG))
  return Folded;

Modified the conditions to add an early out in the case of a non match.

rahular-rrlogic marked 3 inline comments as done.May 1 2022, 10:39 PM

Can you make sure the diff is against current main branch, not against the last version of the patch.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17653–17693

foldCTTZ -> foldCSELofCTTZ

17663

LLVM tends to prefer early exists, to reduce the indent level.

Instead of doing

if (X) {
  if (Y) {
    DoThing()
  }
}

it is preferred to do

if (!X)
  return SDValue();
if (!Y)
  return SDValue();
DoThing()

In this case, the AND variable can be removed, for example.

Renamed the function that performs the folding of CSEL into AND for CTTZ. Added more early outs. Fixed diff.

rahular-rrlogic marked 2 inline comments as done.May 2 2022, 1:07 AM

Thanks for the updates, I think this is looking good now.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17657

I think AND is now unused.

llvm/test/CodeGen/AArch64/table-based-cttz.ll
1

Use -mtriple=aarch64. Otherwise I think this will run differently on non-aarch64 native machines.
Can you also run update_llc_test_checks on the file. It fills in all the check lines in a consistent way.
And maybe name the test file cttz-and.ll, or something like it. It isn't directly related with the table based cttz's.

Changed test file name and generated assertions using update_llc_test_checks.py.

rahular-rrlogic marked an inline comment as done.May 3 2022, 7:01 AM
craig.topper added inline comments.May 3 2022, 9:59 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17676

SUB.getValue(1).getOperand(1) can be shortened to SUBS.getOperand(1). The Result number in isn't used by SDValue::getOperand.

dmgreen accepted this revision.May 5 2022, 1:30 AM

Other than the point @craig.topper mentioned, this Looks OK to me.

This revision is now accepted and ready to land.May 5 2022, 1:30 AM

Removed the unneeded call to a getValue as pointed out.

Other than the point @craig.topper mentioned, this Looks OK to me.

I have marked this as a child revision of https://reviews.llvm.org/D113291 but this patch should work independently of that. Should I remove that as a parent revision? Please let me know what you think (@djtodoro too). Additionally, how should I go about committing this since I don't have commit access?

rahular-rrlogic marked an inline comment as done.May 5 2022, 10:33 PM

I can commit this for you if you don't have commit access yet. I would just need a "name <email@domain.com" for the attribution.

I can commit this for you if you don't have commit access yet. I would just need a "name <email@domain.com" for the attribution.

You can use "Rahul Anand R <rahul@rrlogic.co.in>" for the attribution.

This revision was landed with ongoing or failed builds.May 9 2022, 2:28 AM
This revision was automatically updated to reflect the committed changes.

This causes miscompilation in llvm::lowertypetests::BitSetBuilder::build() in llvm/lib/Transforms/IPO/LowerTypeTests.cpp.
LLVM-Unit::IPOTests catches this.
BSI.AlignLog2 is masked by "and #0x1F".

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17686

CTTZ may be ISD::TRUNCATE.

Oh yeah, I missed that testcase. I'll revert for now. Thanks for the report.

@chapuni @dmgreen How do I trigger this unit test failure? When I run check-llvm-unit all the tests pass for me. I don't understand what I am missing here.

@chapuni @dmgreen How do I trigger this unit test failure? When I run check-llvm-unit all the tests pass for me. I don't understand what I am missing here.

I did; Bootstrap stage2 (on aws c6g) on an internal builder at work.

$ ninja -C build/1 install
$ CC=/path/to/install/1/bin/clang CXX=/path/to/install/1/bin/clang++ cmake ... -DCMAKE_BUILD_TYPE=Release -B build/2
$ ninja -C build/2 IPOTests && path/to/IPOTests

Yeah I think the issue is that @cttztrunc should be doing and 0x3f, not and 0x1f.

We should really have caught that, but I was relying too much on a verification script that didn't catch it due to the way rbit is specified. Which reminds me that we should really be supporting ctlz too for the same transform, but perhaps that's best left to a followup.

Made bitwidth take its value depending on whether there is a truncation present

Thanks for the update. Have you tried a bootstrap to make sure it passes now?

Thanks for the update. Have you tried a bootstrap to make sure it passes now?

I never had any test failures even in the previous revision. How do I include all tests?

Thanks for the update. Have you tried a bootstrap to make sure it passes now?

I never had any test failures even in the previous revision. How do I include all tests?

Do you mean just ninja check-all? Those tests will only include the tests people have deemed worth adding in the past - they cannot cover all the possible cases and combinations of things that can come up. To run more thorough testing you need to start compiling code with the new compiler and making sure the results are correct. The issue that came up I think was when the compiler compiled itself - a bootstrap. We should make sure that doesn't still happen in the same way.

Thanks for the update. Have you tried a bootstrap to make sure it passes now?

I never had any test failures even in the previous revision. How do I include all tests?

Do you mean just ninja check-all? Those tests will only include the tests people have deemed worth adding in the past - they cannot cover all the possible cases and combinations of things that can come up. To run more thorough testing you need to start compiling code with the new compiler and making sure the results are correct. The issue that came up I think was when the compiler compiled itself - a bootstrap. We should make sure that doesn't still happen in the same way.

I tried a bootstrap and it seems to be building fine

Thanks for checking. Lets give this another go then. LGTM

This revision was landed with ongoing or failed builds.May 20 2022, 5:41 AM