This is an archive of the discontinued LLVM Phabricator instance.

Differential D123782

[AArch64] Generate AND in place of CSEL for Table Based CTTZ lowering in -O3
Closed, Public

Authored by rahular-rrlogic on Apr 14 2022, 4:45 AM.

Details

Reviewers

craig.topper
djtodoro
dmgreen
momchil.velikov
KyrBoh
xbolva00
spatel
fhahn
lebedev.ri

Commits

rG534ea8bca51d: [AArch64] Generate AND in place of CSEL for predicated CTTZ
rG7dcd0ea683ed: [AArch64] Generate AND in place of CSEL for predicated CTTZ

Summary

This patch implements a target-specific optimization on top of https://reviews.llvm.org/D113291 that replaces the cmp and csel with an and.

Related submission and comments: https://reviews.llvm.org/D120462

Original issue: https://github.com/llvm/llvm-project/issues/45779

Event Timeline

rahular-rrlogic created this revision. Apr 14 2022, 4:45 AM

Herald added a project: Restricted Project. Apr 14 2022, 4:45 AM

Herald added subscribers: StephenFan, hiraditya, kristof.beyls.

rahular-rrlogic requested review of this revision. Apr 14 2022, 4:45 AM

Herald added a project: Restricted Project. Apr 14 2022, 4:45 AM

Herald added a subscriber: llvm-commits.

rahular-rrlogic added a parent revision: D113291: [AggressiveInstCombine] Lower Table Based CTTZ. Apr 14 2022, 4:45 AM

rahular-rrlogic edited the summary of this revision.

Thanks for this! Please add a test.

Can you add some tests?

As far as I understand this should be testing for select (icmp eq X, 0), 0, cttz X and converting it to and(cttz X, #bw-1). If so there are more conditions that need to be checked.
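The equivalence behind this fold can be checked with a small standalone model. The sketch below is not LLVM code: `cttzViaRbitClz32`, `selectForm`, and `andForm` are hypothetical helpers, and the model assumes that the `rbit` + `clz` sequence yields the bit width (32) for a zero input. Under that assumption `32 & 31 == 0`, which is the value the select would have produced.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of AArch64 `rbit` + `clz` on a 32-bit register.
// Assumption: the sequence yields 32 when the input is zero.
constexpr uint32_t cttzViaRbitClz32(uint32_t X) {
  if (X == 0)
    return 32;
  uint32_t Count = 0;
  while ((X & 1u) == 0) {
    X >>= 1;
    ++Count;
  }
  return Count;
}

// Original pattern: select (icmp eq X, 0), 0, cttz X
constexpr uint32_t selectForm(uint32_t X) {
  return X == 0 ? 0 : cttzViaRbitClz32(X);
}

// Folded pattern: and (cttz X), #bw-1, with bw = 32
constexpr uint32_t andForm(uint32_t X) { return cttzViaRbitClz32(X) & 31u; }

// Compare both forms on zero and on values with every possible lowest set bit.
inline void checkFormsAgree() {
  assert(selectForm(0) == andForm(0));
  for (unsigned Bit = 0; Bit < 32; ++Bit) {
    uint32_t X = uint32_t{1} << Bit;
    assert(selectForm(X) == andForm(X));
    assert(selectForm(X | 0x80000000u) == andForm(X | 0x80000000u));
  }
}
```

Calling `checkFormsAgree()` exercises zero and one input per trailing-zero count; it is a spot check of the identity, not a proof for the DAG combine itself.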

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17704	As far as I can tell this is checking that the condition code is 0? It would be better to check that it is equal to AArch64CC::EQ.
17706	Variables in LLVM start with capital letters. We should make sure that i64 works too; it needs a different constant (there should be tests too).

Harbormaster completed remote builds in B159658: Diff 422815. Apr 14 2022, 5:28 AM

In D123782#3451501, @dmgreen wrote:

Can you add some tests?

As far as I understand this should be testing for select (icmp eq X, 0), 0, cttz X and converting it to and(cttz X, #bw-1). If so there are more conditions that need to be checked.

What other conditions do you think should be added other than checking for 0 and cttz?

In D123782#3451500, @djtodoro wrote:

Thanks for this! Please add a test.

I will add a test with the changes.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17704	No, this is a mistake. I was intending to check if the operands are 0 and cttz. I will change that. Is a check for the condition being EQ really required, though?

rahular-rrlogic added inline comments. Apr 14 2022, 7:53 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17706	I will do the capitalization and add support for i64

craig.topper added inline comments. Apr 14 2022, 8:46 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17704	There's a function called `isNullConstant` that can be used here.

The condition for performing this optimization was changed to check whether the RHS of SUBS is 0 and whether one of the values for the CSEL is a CTTZ.

Support for i64 was added.

Tests were added.

Harbormaster completed remote builds in B160012: Diff 423342. Apr 18 2022, 12:33 AM

rahular-rrlogic marked 3 inline comments as done. Apr 18 2022, 12:33 AM

djtodoro added inline comments. Apr 19 2022, 12:24 AM

llvm/test/CodeGen/AArch64/table-based-cttz.ll
2	Can we add a top-level comment here, describing what we are testing?

Updated test with description and made formatting changes.

Harbormaster completed remote builds in B160181: Diff 423547. Apr 19 2022, 12:49 AM

rahular-rrlogic marked an inline comment as done. Apr 19 2022, 12:50 AM

Modified test to add details and made formatting changes

Harbormaster completed remote builds in B160185: Diff 423552. Apr 19 2022, 1:36 AM

@momchil.velikov - Review request.

As far as I understand this should be testing for select (icmp eq X, 0), 0, cttz X and converting it to and(cttz X, #bw-1). If so there are more conditions that need to be checked.

What other conditions do you think should be added other than checking for 0 and cttz?

It looks like this checks for the select/CSEL, and the icmp/SUBS with a 0, but not the "eq" yet, and not the "0" in the select/csel. It also doesn't check that the X in the icmp and the X in the cttz are the same?

It would be worth adding test cases for the cases it should not fold.
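To make the missing checks concrete, here is a minimal standalone sketch (hypothetical helpers, not LLVM code) of three patterns that look similar but must not be folded to `and (cttz X), 31`. The model assumes `cttz32` returns 32 for a zero input, mirroring the `rbit` + `clz` lowering discussed in this review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 32-bit cttz that returns 32 for a zero input.
constexpr uint32_t cttz32(uint32_t V) {
  uint32_t Count = 0;
  while (Count < 32 && ((V >> Count) & 1u) == 0)
    ++Count;
  return Count;
}

// What the fold would emit: and (cttz X), 31.
constexpr uint32_t folded(uint32_t X) { return cttz32(X) & 31u; }

// select (icmp ne X, 0), 0, cttz X : wrong condition for the zero arm.
constexpr uint32_t wrongCondition(uint32_t X) {
  return X != 0 ? 0 : cttz32(X);
}

// select (icmp eq X, 10), 0, cttz X : compares against a non-zero constant.
constexpr uint32_t nonZeroCompare(uint32_t X) {
  return X == 10 ? 0 : cttz32(X);
}

// select (icmp eq X, 0), 10, cttz X : the selected constant is not zero.
constexpr uint32_t nonZeroArm(uint32_t X) { return X == 0 ? 10 : cttz32(X); }

// Each pattern disagrees with the folded form on at least one input.
inline void checkMustNotFold() {
  assert(wrongCondition(8) != folded(8));   // 0 vs 3
  assert(nonZeroCompare(10) != folded(10)); // 0 vs 1
  assert(nonZeroArm(0) != folded(0));       // 10 vs 0
}
```

Each assertion in `checkMustNotFold()` is a single witness input, which is enough to show that skipping the corresponding check would change results.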

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17704	It should probably be something like this, if the value is a condition code: `AArch64CC CC = N->getConstantOperandVal(2); if (CC == AArch64CC::EQ)`. Using `isNullConstant` for the zero of the sub sounds good, though.
17705–17706	Why is it hard-coding 31, as opposed to checking the size of the type? Why can't we get here with an i64?
llvm/test/CodeGen/AArch64/table-based-cttz.ll
28	Can you add an i64 case without the truncate down to an i32?

Removed hard coded constants and replaced that with bitwidth, added full support for i64 and added more conditions to match against the intended pattern. Modified test to include cases in which the optimization does not take place.

Harbormaster completed remote builds in B160924: Diff 424563. Apr 22 2022, 12:17 PM

In D123782#3467076, @dmgreen wrote:

As far as I understand this should be testing for select (icmp eq X, 0), 0, cttz X and converting it to and(cttz X, #bw-1). If so there are more conditions that need to be checked.

What other conditions do you think should be added other than checking for 0 and cttz?

It looks like this checks for the select/CSEL, and the icmp/SUBS with a 0, but not the "eq" yet, and not the "0" in the select/csel. It also doesn't check that the X in the icmp and the X in the cttz are the same?

Is the last check really needed? Both icmp and cttz use the value from the same register in the IR itself.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17705–17706	I had misunderstood this part. Fixed now, thank you.
llvm/test/CodeGen/AArch64/table-based-cttz.ll
28	Added.

In D123782#3468601, @rahular-rrlogic wrote:

In D123782#3467076, @dmgreen wrote:

As far as I understand this should be testing for select (icmp eq X, 0), 0, cttz X and converting it to and(cttz X, #bw-1). If so there are more conditions that need to be checked.

What other conditions do you think should be added other than checking for 0 and cttz?

It looks like this checks for the select/CSEL, and the icmp/SUBS with a 0, but not the "eq" yet, and not the "0" in the select/csel. It also doesn't check that the X in the icmp and the X in the cttz are the same?

Is the last check really needed? Both icmp and cttz use the value from the same register in the IR itself.

Yes, it's needed. You have to make sure you don't optimize select (icmp eq X, 0), 0, cttz Y if X and Y aren't the same.
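A concrete counterexample, written as a standalone sketch with hypothetical helpers rather than LLVM code: with X = 0 and Y = 8 the select yields 0, while `cttz(Y) & 31` yields 3. The model assumes a 32-bit `cttz32` that returns 32 for a zero input.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 32-bit cttz that returns 32 for a zero input.
constexpr uint32_t cttz32(uint32_t V) {
  uint32_t Count = 0;
  while (Count < 32 && ((V >> Count) & 1u) == 0)
    ++Count;
  return Count;
}

// select (icmp eq X, 0), 0, cttz Y, where X and Y are unrelated values.
constexpr uint32_t selectOnOtherValue(uint32_t X, uint32_t Y) {
  return X == 0 ? 0 : cttz32(Y);
}

// What a fold without the X == Y check would produce: and (cttz Y), 31.
constexpr uint32_t uncheckedFold(uint32_t Y) { return cttz32(Y) & 31u; }

inline void checkCounterexample() {
  // X = 0, Y = 8: the select takes the zero arm, the AND ignores X entirely.
  assert(selectOnOtherValue(0, 8) == 0);
  assert(uncheckedFold(8) == 3);
}
```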

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17703	You can get the bitwidth a lot quicker with N1.getValueSizeInBits()

You should probably also handle select (icmp ne X, 0), (cttz X), 0.

Added condition to check if X in icmp and cttz are the same and added code to handle select (icmp ne X, 0), (cttz X), 0. Updated test for the same.

Harbormaster completed remote builds in B161180: Diff 424924. Apr 25 2022, 8:57 AM

craig.topper added inline comments. Apr 25 2022, 9:21 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17709	`else` should be on the same line as the previous closing brace
17721	This line is longer than 80 columns.
17722	Space after `if`

craig.topper added inline comments. Apr 25 2022, 9:26 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17718	Technically, if you look through a truncate you need to know the truncate didn't drop any bits of the CTTZ result. But maybe AArch64ISD::SUBS and CSEL can only be created after type legalization so the only possible types are i32 and i64? I'm not an AArch64 expert so I don't know.

Can you pull the cttz code into a new function, so that it is a little more separate from the other code in performCSELCombine.

Running clang-format on the patch can also be good to remove all the formatting issues. There is a script to help in clang/tools/clang-format/clang-format-diff.py.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17718	Yeah I believe SUBS and CSEL will only be generated for legal types. Perhaps it is worth adding an assert just to be safe.

Created a new function to remove the folding itself from performCSELCombine(). Added an assert to check for legal types.

Harbormaster completed remote builds in B161976: Diff 426047. Apr 29 2022, 7:10 AM

rahular-rrlogic marked 7 inline comments as done. Apr 29 2022, 7:12 AM

rahular-rrlogic added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17718	The type legalization does occur in this case which I checked via the debug output. I have added an assert anyway as @dmgreen suggested.

craig.topper added inline comments. Apr 29 2022, 10:30 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17700–17706	You can get rid of the `IsEQOrNE` variable by adding `else return SDValue();`
17700–17706	Remove the IsEQOrNE using the suggestion above, then invert this to make an early out.
17715	if (SDValue Folded = foldCTTZ(N, DAG)) return Folded;

Modified the conditions to add an early out in the case of a non match.

rahular-rrlogic marked 3 inline comments as done. May 1 2022, 10:39 PM

Harbormaster completed remote builds in B162190: Diff 426334. May 1 2022, 10:39 PM

Can you make sure the diff is against the current main branch, not against the last version of the patch?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17653–17693	foldCTTZ -> foldCSELofCTTZ
17663	LLVM tends to prefer early exits, to reduce the indent level. Instead of doing `if (X) { if (Y) { DoThing() } }` it is preferred to do `if (!X) return SDValue(); if (!Y) return SDValue(); DoThing()`. In this case, the AND variable can be removed, for example.
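The two styles can be compared side by side in a small standalone sketch. Everything here is hypothetical and unrelated to the patch: `FoldResult` stands in for an SDValue-like result, with `Valid == false` playing the role of the empty `SDValue()` that signals "no fold".

```cpp
#include <cassert>

// Hypothetical stand-in for an SDValue-like result: Valid == false plays the
// role of the empty SDValue() that signals "no fold".
struct FoldResult {
  bool Valid;
  int Value;
};

// Nested style: every additional condition adds an indent level.
inline FoldResult halveNested(int V) {
  FoldResult R = {false, 0};
  if (V > 0) {
    if (V % 2 == 0) {
      R = {true, V / 2};
    }
  }
  return R;
}

// Early-exit style: each failed precondition returns immediately.
inline FoldResult halveEarlyExit(int V) {
  if (V <= 0)
    return {false, 0};
  if (V % 2 != 0)
    return {false, 0};
  return {true, V / 2};
}

// Both styles compute the same thing; only the control flow differs.
inline void checkSameBehaviour() {
  for (int V = -4; V <= 8; ++V) {
    FoldResult A = halveNested(V);
    FoldResult B = halveEarlyExit(V);
    assert(A.Valid == B.Valid && A.Value == B.Value);
  }
}
```

The early-exit version needs no temporary result variable, which is the same effect the reviewer points out for the unused AND variable.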

Renamed the function that performs the folding of CSEL into AND for CTTZ. Added more early outs. Fixed diff.

Harbormaster completed remote builds in B162205: Diff 426350. May 2 2022, 1:06 AM

rahular-rrlogic marked 2 inline comments as done. May 2 2022, 1:07 AM

Thanks for the updates, I think this is looking good now.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17657	I think AND is now unused.
llvm/test/CodeGen/AArch64/table-based-cttz.ll
1	Use -mtriple=aarch64. Otherwise I think this will run differently on non-aarch64 native machines. Can you also run update_llc_test_checks on the file? It fills in all the check lines in a consistent way. And maybe name the test file cttz-and.ll, or something like it. It isn't directly related to the table-based cttz's.

Changed test file name and generated assertions using update_llc_test_checks.py.

Harbormaster completed remote builds in B162450: Diff 426692. May 3 2022, 7:00 AM

rahular-rrlogic marked an inline comment as done. May 3 2022, 7:01 AM

craig.topper added inline comments. May 3 2022, 9:59 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17676	`SUBS.getValue(1).getOperand(1)` can be shortened to `SUBS.getOperand(1)`. The result number isn't used by `SDValue::getOperand`.

Other than the point @craig.topper mentioned, this looks OK to me.

This revision is now accepted and ready to land. May 5 2022, 1:30 AM

Removed the unneeded call to getValue, as pointed out.

Harbormaster completed remote builds in B163071: Diff 427530. May 5 2022, 10:27 PM

In D123782#3493238, @dmgreen wrote:

Other than the point @craig.topper mentioned, this looks OK to me.

I have marked this as a child revision of https://reviews.llvm.org/D113291 but this patch should work independently of that. Should I remove that as a parent revision? Please let me know what you think (@djtodoro too). Additionally, how should I go about committing this since I don't have commit access?

rahular-rrlogic marked an inline comment as done. May 5 2022, 10:33 PM

I can commit this for you if you don't have commit access yet. I would just need a "name <email@domain.com>" for the attribution.

rahular-rrlogic removed a parent revision: D113291: [AggressiveInstCombine] Lower Table Based CTTZ. May 8 2022, 9:06 PM

In D123782#3498708, @dmgreen wrote:

I can commit this for you if you don't have commit access yet. I would just need a "name <email@domain.com>" for the attribution.

You can use "Rahul Anand R <rahul@rrlogic.co.in>" for the attribution.

This revision was landed with ongoing or failed builds. May 9 2022, 2:28 AM

Closed by commit rG7dcd0ea683ed: [AArch64] Generate AND in place of CSEL for predicated CTTZ (authored by rahular-rrlogic, committed by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rG7dcd0ea683ed: [AArch64] Generate AND in place of CSEL for predicated CTTZ.

This causes miscompilation in llvm::lowertypetests::BitSetBuilder::build() in llvm/lib/Transforms/IPO/LowerTypeTests.cpp.
LLVM-Unit::IPOTests catches this.
BSI.AlignLog2 is masked by "and #0x1F".

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17686	CTTZ may be ISD::TRUNCATE.

Oh yeah, I missed that testcase. I'll revert for now. Thanks for the report.

dmgreen added a reverting change: rG442c351b2bb1: Revert "[AArch64] Generate AND in place of CSEL for predicated CTTZ". May 10 2022, 9:17 AM

@chapuni @dmgreen How do I trigger this unit test failure? When I run check-llvm-unit all the tests pass for me. I don't understand what I am missing here.

In D123782#3504135, @rahular-rrlogic wrote:

@chapuni @dmgreen How do I trigger this unit test failure? When I run check-llvm-unit all the tests pass for me. I don't understand what I am missing here.

I did; Bootstrap stage2 (on aws c6g) on an internal builder at work.

$ ninja -C build/1 install
$ CC=/path/to/install/1/bin/clang CXX=/path/to/install/1/bin/clang++ cmake ... -DCMAKE_BUILD_TYPE=Release -B build/2
$ ninja -C build/2 IPOTests && path/to/IPOTests

Yeah I think the issue is that @cttztrunc should be doing and 0x3f, not and 0x1f.
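The failure mode can be reproduced with a small standalone model of the `@cttztrunc` pattern. This is a sketch with hypothetical helper names, not LLVM code, and it assumes a 64-bit `cttz64` that returns 64 for a zero input. For an input whose lowest set bit is bit 32, the mask taken from the truncated i32 type drops the result to 0, while the mask taken from the i64 type of the cttz keeps it at 32.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit cttz that returns 64 for a zero input.
constexpr uint64_t cttz64(uint64_t V) {
  uint64_t Count = 0;
  while (Count < 64 && ((V >> Count) & 1u) == 0)
    ++Count;
  return Count;
}

// Source pattern: trunc (select (icmp eq X, 0), 0, cttz X) to i32
constexpr uint32_t reference(uint64_t X) {
  return static_cast<uint32_t>(X == 0 ? 0 : cttz64(X));
}

// Mask derived from the truncated i32 type (the reverted behaviour).
constexpr uint32_t maskFromTruncatedType(uint64_t X) {
  return static_cast<uint32_t>(cttz64(X)) & 0x1fu;
}

// Mask derived from the i64 type of the cttz itself.
constexpr uint32_t maskFromCttzType(uint64_t X) {
  return static_cast<uint32_t>(cttz64(X)) & 0x3fu;
}

inline void checkTruncatedMask() {
  const uint64_t X = uint64_t{1} << 32; // lowest set bit is bit 32
  assert(reference(X) == 32);
  assert(maskFromTruncatedType(X) == 0); // wrong: 32 & 0x1f
  assert(maskFromCttzType(X) == 32);     // right: 32 & 0x3f
  assert(maskFromCttzType(0) == reference(0)); // zero input still maps to 0
}
```

Any input with 32 or more trailing zeros triggers the mismatch, which is consistent with the report that `BSI.AlignLog2` was being masked by `and #0x1F`.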

We should really have caught that, but I was relying too much on a verification script that didn't catch it due to the way rbit is specified. Which reminds me that we should really be supporting ctlz too for the same transform, but perhaps that's best left to a followup.

Made bitwidth take its value depending on whether there is a truncation present

Harbormaster completed remote builds in B163892: Diff 428642. May 11 2022, 7:19 AM

Thanks for the update. Have you tried a bootstrap to make sure it passes now?

In D123782#3508832, @dmgreen wrote:

Thanks for the update. Have you tried a bootstrap to make sure it passes now?

I never had any test failures even in the previous revision. How do I include all tests?

In D123782#3514938, @rahular-rrlogic wrote:

In D123782#3508832, @dmgreen wrote:

Thanks for the update. Have you tried a bootstrap to make sure it passes now?

I never had any test failures even in the previous revision. How do I include all tests?

Do you mean just ninja check-all? Those tests will only include the tests people have deemed worth adding in the past - they cannot cover all the possible cases and combinations of things that can come up. To run more thorough testing you need to start compiling code with the new compiler and making sure the results are correct. The issue that came up I think was when the compiler compiled itself - a bootstrap. We should make sure that doesn't still happen in the same way.

In D123782#3515031, @dmgreen wrote:

In D123782#3514938, @rahular-rrlogic wrote:

In D123782#3508832, @dmgreen wrote:

Thanks for the update. Have you tried a bootstrap to make sure it passes now?

I never had any test failures even in the previous revision. How do I include all tests?

Do you mean just ninja check-all? Those tests will only include the tests people have deemed worth adding in the past - they cannot cover all the possible cases and combinations of things that can come up. To run more thorough testing you need to start compiling code with the new compiler and making sure the results are correct. The issue that came up I think was when the compiler compiled itself - a bootstrap. We should make sure that doesn't still happen in the same way.

I tried a bootstrap and it seems to be building fine

Thanks for checking. Let's give this another go then. LGTM

This revision was landed with ongoing or failed builds. May 20 2022, 5:41 AM

dmgreen added a commit: rG534ea8bca51d: [AArch64] Generate AND in place of CSEL for predicated CTTZ.

Revision Contents

Path                                               Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp    44 lines
llvm/test/CodeGen/AArch64/table-based-cttz.ll      139 lines

Diff 426350

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

(17,644 earlier lines not shown; the excerpt resumes inside performBRCONDCombine. Lines prefixed with + are added by this diff.)

   else
     BR = DAG.getNode(AArch64ISD::CBNZ, SDLoc(N), MVT::Other, Chain, LHS, Dest);

   // Do not add new nodes to DAG combiner worklist.
   DCI.CombineTo(N, BR, false);

   return SDValue();
 }

+static SDValue foldCSELofCTTZ(SDNode *N, SelectionDAG &DAG) {
+  unsigned CC = N->getConstantOperandVal(2);
+  SDValue SUBS = N->getOperand(3);
+  SDValue Zero, CTTZ, AND;
+
+  if (CC == AArch64CC::EQ && SUBS.getOpcode() == AArch64ISD::SUBS) {
+    Zero = N->getOperand(0);
+    CTTZ = N->getOperand(1);
+  } else if (CC == AArch64CC::NE && SUBS.getOpcode() == AArch64ISD::SUBS) {
+    Zero = N->getOperand(1);
+    CTTZ = N->getOperand(0);
+  } else
+    return SDValue();
+
+  if ((CTTZ.getOpcode() != ISD::CTTZ && CTTZ.getOpcode() != ISD::TRUNCATE) ||
+      (CTTZ.getOpcode() == ISD::TRUNCATE &&
+       CTTZ.getOperand(0).getOpcode() != ISD::CTTZ))
+    return SDValue();
+
+  assert((CTTZ.getValueType() == MVT::i32 || CTTZ.getValueType() == MVT::i64) &&
+         "Illegal type in CTTZ folding");
+
+  if (!isNullConstant(Zero) || !isNullConstant(SUBS.getValue(1).getOperand(1)))
+    return SDValue();
+
+  SDValue X = CTTZ.getOpcode() == ISD::TRUNCATE
+                  ? CTTZ.getOperand(0).getOperand(0)
+                  : CTTZ.getOperand(0);
+
+  if (X != SUBS.getOperand(0))
+    return SDValue();
+
+  unsigned BitWidth = CTTZ.getValueSizeInBits();
+  SDValue BitWidthMinusOne =
+      DAG.getConstant(BitWidth - 1, SDLoc(N), CTTZ.getValueType());
+  return DAG.getNode(ISD::AND, SDLoc(N), CTTZ.getValueType(), CTTZ,
+                     BitWidthMinusOne);
+}
+
 // Optimize CSEL instructions
 static SDValue performCSELCombine(SDNode *N,
                                   TargetLowering::DAGCombinerInfo &DCI,
                                   SelectionDAG &DAG) {
   // CSEL x, x, cc -> x
   if (N->getOperand(0) == N->getOperand(1))
     return N->getOperand(0);

+  // CSEL 0, cttz(X), eq(X, 0) -> AND cttz bitwidth-1
+  // CSEL cttz(X), 0, ne(X, 0) -> AND cttz bitwidth-1
+  if (SDValue Folded = foldCSELofCTTZ(N, DAG))
+    return Folded;
+
   return performCONDCombine(N, DCI, DAG, 2, 3);
 }

 static SDValue performSETCCCombine(SDNode *N, SelectionDAG &DAG) {
   assert(N->getOpcode() == ISD::SETCC && "Unexpected opcode!");
   SDValue LHS = N->getOperand(0);
   SDValue RHS = N->getOperand(1);
   ISD::CondCode Cond = cast<CondCodeSDNode>(N->getOperand(2))->get();
   SDLoc DL(N);
   EVT VT = N->getValueType(0);

   // setcc (csel 0, 1, cond, X), 1, ne ==> csel 0, 1, !cond, X
   if (Cond == ISD::SETNE && isOneConstant(RHS) &&
       LHS->getOpcode() == AArch64ISD::CSEL &&
       isNullConstant(LHS->getOperand(0)) && isOneConstant(LHS->getOperand(1)) &&
       LHS->hasOneUse()) {
     // Invert CSEL's condition.
     auto *OpCC = cast<ConstantSDNode>(LHS.getOperand(2));
     auto OldCond = static_cast<AArch64CC::CondCode>(OpCC->getZExtValue());
     auto NewCond = getInvertedCondCode(OldCond);

     // csel 0, 1, !cond, X
     SDValue CSEL =
         DAG.getNode(AArch64ISD::CSEL, DL, LHS.getValueType(), LHS.getOperand(0),
                     LHS.getOperand(1), DAG.getConstant(NewCond, DL, MVT::i32),

(3,419 further lines not shown)

llvm/test/CodeGen/AArch64/table-based-cttz.ll

This file was added.

; RUN: llc -march=aarch64 < %s | FileCheck %s

;; Check the transformation
;; CSEL 0, cttz, cc -> AND cttz numbits-1
;; for cttz in the case of i32 and i64 respectively

;; Cases for which the optimization takes place
define i32 @cttzi32(i32 %x) {
; CHECK: rbit w8, w0
; CHECK-NEXT: clz w8, w8
; CHECK-NEXT: and w0, w8, #0x1f
; CHECK-NEXT: ret
entry:
  %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
  %1 = icmp eq i32 %x, 0
  %2 = select i1 %1, i32 0, i32 %0
  ret i32 %2
}

define i64 @cttzi64(i64 %x) {
; CHECK: rbit x8, x0
; CHECK-NEXT: clz x8, x8
; CHECK-NEXT: and x0, x8, #0x3f
; CHECK-NEXT: ret
entry:
  %0 = call i64 @llvm.cttz.i64(i64 %x, i1 true)
  %1 = icmp eq i64 %x, 0
  %2 = select i1 %1, i64 0, i64 %0
  ret i64 %2
}

define i32 @cttzi32ne(i32 %x) {
; CHECK: rbit w8, w0
; CHECK-NEXT: clz w8, w8
; CHECK-NEXT: and w0, w8, #0x1f
; CHECK-NEXT: ret
entry:
  %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
  %1 = icmp ne i32 %x, 0
  %2 = select i1 %1, i32 %0, i32 0
  ret i32 %2
}

define i64 @cttzi64ne(i64 %x) {
; CHECK: rbit x8, x0
; CHECK-NEXT: clz x8, x8
; CHECK-NEXT: and x0, x8, #0x3f
; CHECK-NEXT: ret
entry:
  %0 = call i64 @llvm.cttz.i64(i64 %x, i1 true)
  %1 = icmp ne i64 %x, 0
  %2 = select i1 %1, i64 %0, i64 0
  ret i64 %2
}

define i32 @cttztrunc(i64 %x) {
; CHECK: rbit x8, x0
; CHECK-NEXT: clz x8, x8
; CHECK-NEXT: and w0, w8, #0x1f
; CHECK-NEXT: ret
entry:
  %0 = call i64 @llvm.cttz.i64(i64 %x, i1 true)
  %1 = icmp eq i64 %x, 0
  %2 = select i1 %1, i64 0, i64 %0
  %3 = trunc i64 %2 to i32
  ret i32 %3
}

;; Cases for which the optimization does not take place
define i32 @cttzne(i32 %x) {
; CHECK: rbit w8, w0
; CHECK-NEXT: cmp w0, #0
; CHECK-NEXT: clz w8, w8
; CHECK-NEXT: csel w0, wzr, w8, ne
; CHECK-NEXT: ret
entry:
  %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
  %1 = icmp ne i32 %x, 0
  %2 = select i1 %1, i32 0, i32 %0
  ret i32 %2
}

define i32 @cttzxnot0(i32 %x) {
; CHECK: rbit w8, w0
; CHECK-NEXT: cmp w0, #10
; CHECK-NEXT: clz w8, w8
; CHECK-NEXT: csel w0, wzr, w8, eq
; CHECK-NEXT: ret
entry:
  %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
  %1 = icmp eq i32 %x, 10
  %2 = select i1 %1, i32 0, i32 %0
  ret i32 %2
}

define i32 @cttzlhsnot0(i32 %x) {
; CHECK: rbit w9, w0
; CHECK-NEXT: mov w8, #10
; CHECK-NEXT: clz w9, w9
; CHECK-NEXT: cmp w0, #0
; CHECK-NEXT: csel w0, w8, w9, eq
; CHECK-NEXT: ret
entry:
  %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
  %1 = icmp eq i32 %x, 0
  %2 = select i1 %1, i32 10, i32 %0
  ret i32 %2
}

define i32 @notcttz(i32 %x) {
; CHECK: clz w8, w0
; CHECK-NEXT: cmp w0, #0
; CHECK-NEXT: csel w0, wzr, w8, eq
; CHECK-NEXT: ret
entry:
  %0 = call i32 @llvm.ctlz.i32(i32 %x, i1 true)
  %1 = icmp eq i32 %x, 0
  %2 = select i1 %1, i32 0, i32 %0
  ret i32 %2
}

define i32 @cttzlhsnotx(i32 %x, i32 %y) {
; CHECK: rbit w8, w0
; CHECK-NEXT: cmp w1, #0
; CHECK-NEXT: clz w8, w8
; CHECK-NEXT: csel w0, wzr, w8, eq
; CHECK-NEXT: ret
entry:
  %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
  %1 = icmp eq i32 %y, 0
  %2 = select i1 %1, i32 0, i32 %0
  ret i32 %2
}

declare i32 @llvm.cttz.i32(i32, i1)

declare i64 @llvm.cttz.i64(i64, i1)

declare i32 @llvm.ctlz.i32(i32, i1)