This is an archive of the discontinued LLVM Phabricator instance.

Differential D113291

[AggressiveInstCombine] Lower Table Based CTTZ
ClosedPublic

Authored by djtodoro on Nov 5 2021, 8:58 AM.

Details

Reviewers

craig.topper
spatel
lebedev.ri
fhahn
dmgreen
xbolva00

Commits

rGfec01ee3f524: [AggressiveInstCombine] Lower Table Based CTTZ

Summary

This patch teaches AggressiveInstCombine to recognize table-based ctz implementations and lower them to the cttz intrinsic.
This fixes [0].

[0] https://bugs.llvm.org/show_bug.cgi?id=46434

TODO: Get the data on SPEC benchmark.

Event Timeline

craig.topper added reviewers: spatel, lebedev.ri. Nov 12 2021, 11:58 AM

djtodoro marked 7 inline comments as done. Nov 15 2021, 4:02 AM

djtodoro added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
367	OK, sure.
447	Agree with this.
479	Yep.
519	Actually, we don't need it.

Refactor && update the tests
Address the comments

Harbormaster completed remote builds in B134216: Diff 387204. Nov 15 2021, 4:04 AM

xbolva00 added inline comments. Nov 15 2021, 4:18 AM

llvm/test/Transforms/AggressiveInstCombine/AARCH64/lower-table-based-ctz-basics.ll
1	AggressiveInstCombine/AARCH64 -> AggressiveInstCombine/AArch64

update the test dir name

Harbormaster completed remote builds in B134224: Diff 387215. Nov 15 2021, 4:52 AM

any other comment here? :)

TODO: Get the data on SPEC benchmark.

Did you manage to collect any perf data yet to motivate this change?

The tests already contain a few things that seem unrelated, it would be good to clean those things up.

llvm/test/Transforms/AggressiveInstCombine/AArch64/dereferencing-pointer.ll
28 ↗	(On Diff #387215)	That's out of date.
40 ↗	(On Diff #387215)	can the tests instead just take `i64 %b` or does it need to be a pointer?
50 ↗	(On Diff #387215)	is all the metadata needed? Same for the other tests
llvm/test/Transforms/AggressiveInstCombine/AArch64/non-argument-value.ll
31 ↗	(On Diff #387215)	not needed

djtodoro edited the summary of this revision. Nov 26 2021, 1:28 AM

djtodoro added a reviewer: fhahn.

In D113291#3153219, @fhahn wrote:

TODO: Get the data on SPEC benchmark.

Did you manage to collect any perf data yet to motivate this change?

Not yet, but I will share ASAP.

Thanks for your comments!

llvm/test/Transforms/AggressiveInstCombine/AArch64/dereferencing-pointer.ll
40 ↗	(On Diff #387215)	This test is meant to test the pointer type. There are other tests checking non-ptr type.
50 ↗	(On Diff #387215)	Yep, I'll reduce the tests in the next update.

-Clean up the tests

Harbormaster completed remote builds in B136159: Diff 389938. Nov 26 2021, 2:32 AM

AggressiveInstCombine is an extension of InstCombine. That is, it's a target-independent canonicalization pass. Therefore, we shouldn't use any target-specific cost model to enable the transform. Since we have a generic intrinsic for cttz, it's fine to create that for all targets as long as we can guarantee that a generic expansion of that intrinsic in the backend will not be worse than the original code.

But I'm not sure if we can make that guarantee? If not, this should be implemented as a late IR or codegen pass (as it was when first posted).

In D113291#3238355, @spatel wrote:

AggressiveInstCombine is an extension of InstCombine. That is, it's a target-independent canonicalization pass. Therefore, we shouldn't use any target-specific cost model to enable the transform. Since we have a generic intrinsic for cttz, it's fine to create that for all targets as long as we can guarantee that a generic expansion of that intrinsic in the backend will not be worse than the original code.

But I'm not sure if we can make that guarantee? If not, this should be implemented as a late IR or codegen pass (as it was when first posted).

Why is it ok to use DataLayout in InstCombine/AggressiveInstCombine, but not TTI? The cttz seems like it could enable other optimizations so I don't think we want it late. In particular, we should give the optimizer a chance to prove that the input isn't 0 to remove the select that gets generated after the cttz intrinsic. That could require computeKnownBits or CorrelatedValuePropagation. LoopIdiomRecognize queries TTI before generating cttz from loops.

In D113291#3238387, @craig.topper wrote:

In D113291#3238355, @spatel wrote:

AggressiveInstCombine is an extension of InstCombine. That is, it's a target-independent canonicalization pass. Therefore, we shouldn't use any target-specific cost model to enable the transform. Since we have a generic intrinsic for cttz, it's fine to create that for all targets as long as we can guarantee that a generic expansion of that intrinsic in the backend will not be worse than the original code.

But I'm not sure if we can make that guarantee? If not, this should be implemented as a late IR or codegen pass (as it was when first posted).

Why is it ok to use DataLayout in InstCombine/AggressiveInstCombine, but not TTI? The cttz seems like it could enable other optimizations so I don't think we want it late. In particular, we should give the optimizer a chance to prove that the input isn't 0 to remove the select that gets generated after the cttz intrinsic. That could require computeKnownBits or CorrelatedValuePropagation. LoopIdiomRecognize queries TTI before generating cttz from loops.

Yeah, it's fuzzy. I think we're only supposed to be using DataLayout to determine where creating an illegal type would obviously lead to worse codegen. But that's not much different than asking if some operation is legal on target X.
We've had several other requests for a cost-aware version of instcombine over the years, so maybe we should just use this as an opportunity to reframe/rename AggressiveInstCombine.

Is handling "0" by accessing index 0 for x = 0 not acceptable?

What is the reason for this patch being on hold?

We've had several other requests for a cost-aware version of instcombine over the years, so maybe we should just use this as an opportunity to reframe/rename AggressiveInstCombine.

It could help with doing things like https://reviews.llvm.org/D114964 earlier too, where the transforms are not always profitable and not reversible back to the original code, but would be beneficial to do earlier to get better cost modelling and vectorization.

Hello,

I have a small query regarding this patch. The patch emits following llvm assembly for ctz table -

-----Patch assembly-------
// %bb.0:

rbit w8, w0
cmp w0, #0
clz w8, w8
csel w0, wzr, w8, eq
ret

but in gcc, we have the following assembly being emitted -

------------GCC---------------------
f(unsigned int):

rbit    w0, w0
clz     w0, w0
and     w0, w0, 31
ret

Reference - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

[1] My question is - To solve the bug, do I have to generate assembly similar to GCC?

Please give me suggestions on moving forward in solving this bug.
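To make the difference between the two sequences concrete, here is a minimal C sketch. It assumes that the count-trailing-zeros primitive returns 32 for a zero input (as the AArch64 `rbit` + `clz` pair does) and that the source table maps x = 0 to 0. The function names are hypothetical and are not taken from the patch or from GCC.

```c
#include <stdint.h>

/* Models RBIT + CLZ: counts trailing zeros and returns 32 for x == 0
   (assumption stated in the text above). */
static unsigned cttz32_full(uint32_t x) {
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Shape of the patch's output: cttz followed by a select on zero. */
static unsigned ctz_with_select(uint32_t x) {
    return x == 0 ? 0u : cttz32_full(x);
}

/* Shape of GCC's output: the select is replaced by a mask.
   32 & 31 == 0, and every other result (0..31) is unchanged by the mask. */
static unsigned ctz_with_and(uint32_t x) {
    return cttz32_full(x) & 31u;
}
```

Under those assumptions both functions agree for every input, which is why the compare and conditional select can be replaced by a single `and`.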

In D113291#3274616, @gsocshubham wrote:
Hello,

I have a small query regarding this patch. The patch emits following llvm assembly for ctz table -

-----Patch assembly-------
// %bb.0:
rbit w8, w0
cmp w0, #0
clz w8, w8
csel w0, wzr, w8, eq
ret
but in gcc, we have the following assembly being emitted -

------------GCC---------------------
f(unsigned int):
rbit    w0, w0
clz     w0, w0
and     w0, w0, 31
ret
Reference - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

[1] My question is - To solve the bug, do I have to generate assembly similar to GCC?

Please give me suggestions on moving forward in solving this bug.

Hi,

Is this really a bug?

I have a small query regarding this patch. The patch emits following llvm assembly for ctz table -

-----Patch assembly-------
// %bb.0:
rbit w8, w0
cmp w0, #0
clz w8, w8
csel w0, wzr, w8, eq
ret
but in gcc, we have the following assembly being emitted -

------------GCC---------------------
f(unsigned int):
rbit    w0, w0
clz     w0, w0
and     w0, w0, 31
ret

That sounds like a backend optimization that could happen given we know the semantics of the AArch64 instruction.

In D113291#3275222, @dmgreen wrote:
I have a small query regarding this patch. The patch emits following llvm assembly for ctz table -

-----Patch assembly-------
// %bb.0:
rbit w8, w0
cmp w0, #0
clz w8, w8
csel w0, wzr, w8, eq
ret
but in gcc, we have the following assembly being emitted -

------------GCC---------------------
f(unsigned int):
rbit    w0, w0
clz     w0, w0
and     w0, w0, 31
ret
That sounds like a backend optimization that could happen given we know the semantics of the AArch64 instruction.

Yep, +1 for this. I'd leave this job to backends.

@craig.topper @spatel Do you think this can go as is?

(P.S. I don't have access to any AArch64 board currently, so I can't look into the SPEC numbers.)

I am wondering about the general direction.

Is it worth it? Are we on the way to becoming a compiler just for benchmarks?

Spending compile time just to optimize one very specific pattern from SPEC is a bad thing, imho.

Can you show us some other real world “hits”? I assume any sane project already uses builtin to compute this value efficiently.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
379	getZExtValue may assert on large ints

Can you please provide compile time data with @nikic's compile time tracker? This should be *very cheap* to be acceptable.

This revision now requires changes to proceed. Feb 1 2022, 2:17 AM

In D113291#3286784, @xbolva00 wrote:

I am wondering about the general direction.

Is it worth it? Are we on the way to becoming a compiler just for benchmarks?

Spending compile time just to optimize one very specific pattern from SPEC is a bad thing, imho.

Can you show us some other real world “hits”? I assume any sane project already uses builtin to compute this value efficiently.

The same algorithm is documented here https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLookup

It's also nearby in the code in primesieve from the recent tzcnt discussion. Around line 143 if you expand the context on Erat.hpp here https://github.com/kimwalisch/primesieve/pull/109/files Granted, that code knows when to use the tzcnt builtin instead of the table. I'm only mentioning it to show it is a known way to implement tzcnt that is used in more than just SPEC.
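For readers who have not seen the idiom, a minimal C sketch of the 32-bit multiply-and-lookup form follows. The constant and table are the widely published 32-bit De Bruijn pair. The function and array names are hypothetical, and this is only an illustration of the source pattern, not code from the patch.

```c
#include <stdint.h>

/* 32-entry lookup table paired with the De Bruijn constant 0x077CB531. */
static const unsigned char kDeBruijnTable32[32] = {
    0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
    31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9};

/* Counts trailing zeros of v; returns 0 for v == 0 (table entry 0). */
static unsigned ctz32_table(uint32_t v) {
    /* v & -v isolates the lowest set bit, so the multiply below acts as
       a left shift of the constant by the trailing-zero count. */
    uint32_t isolated = v & (0u - v);
    /* The top 5 bits of the product are a unique index per bit position. */
    return kDeBruijnTable32[(uint32_t)(isolated * 0x077CB531u) >> 27];
}
```

The pattern the transform has to recognize is therefore: isolate the low bit, multiply by a constant, shift right, and load from a constant table.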

gsocshubham mentioned this in D119010: [AggressiveInstCombine] Recognize table-based ctz implementation and enable it for AARCH64 at -O3. Feb 4 2022, 9:04 AM

In D113291#3286803, @xbolva00 wrote:

Can you please provide compile time data with @nikic's compile time tracker? This should be *very cheap* to be acceptable.

It is done: http://llvm-compile-time-tracker.com/compare.php?from=9920943ea201189f9b34918c7663d8a03d7e4676&to=666dd20021db313e4ead3e39ac4c8a12b9525521&stat=instructions

gsocshubham mentioned this in D120462: [AArch64InstrInfo.td] - Lowering fix for cttz intrinsic. Feb 24 2022, 1:36 AM

rahular-rrlogic added a child revision: D120462: [AArch64InstrInfo.td] - Lowering fix for cttz intrinsic. Apr 14 2022, 12:48 AM

rahular-rrlogic mentioned this in D123782: [AArch64] Generate AND in place of CSEL for Table Based CTTZ lowering in -O3. Apr 14 2022, 4:45 AM

rahular-rrlogic added a child revision: D123782: [AArch64] Generate AND in place of CSEL for Table Based CTTZ lowering in -O3.

@xbolva00 ping :)

Herald added a project: Restricted Project. May 5 2022, 1:58 AM

Herald added a subscriber: StephenFan.

How hard is it to add x86 support?

(Not blocking this)

This revision now requires review to proceed. May 5 2022, 2:04 AM

Thanks!

In D113291#3493276, @xbolva00 wrote:

How hard is it to add x86 support?

Is it even worth implementing it for x86?

In D113291#3495933, @djtodoro wrote:

Thanks!

In D113291#3493276, @xbolva00 wrote:

How hard is it to add x86 support?

Is it even worth implementing it for x86?

Yes. Intel CPUs from about 2013 have a TZCNT instruction.

OK, great! it will be on my TODO list!

But I think that x86 support doesn’t block this.

Ideally this transformation should just emit proper intrinsic without need for target hook.

What is the real problem?

rahular-rrlogic removed a child revision: D123782: [AArch64] Generate AND in place of CSEL for Table Based CTTZ lowering in -O3. May 8 2022, 9:06 PM

dmgreen mentioned this in D125755: [AggressiveInstcombine] Conditionally fold saturated fptosi to llvm.fptosi.sat. May 17 2022, 3:03 AM

Hello - We were having a discussion about a very similar patch in D125755. I think the outcome for this patch is that either:

1. We need to do this later (maybe in CodeGenPrepare).
2. We need to do this unconditionally without the call to TTI.preferCTTZLowering() and have the reverse transform later for targets that do not have a cheaper alternative.
3. We need to argue some more :)

There are more details about why in D125755. I would go for the first option if it doesn't lead to worse performance, as for the second I'm not sure when it would be profitable to transform back and emit the table. You may not want to do that for non-hot ctzs? It sounds like it may be difficult to get right, but maybe I'm overestimating it.

as for the second I'm not sure when it would be profitable to transform back and emit the table

You really just have to weigh it against the current default expansion on targets where ctlz/cttz aren't legal, which is popcount(v & -v). It should be a straightforward comparison, generally. If you have popcount, use it. If multiply is legal, use a table lookup. Otherwise... maybe stick with the popcount expansion? Probably any approach is expensive at that point.
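As a point of reference for that comparison, here is a minimal C sketch of a popcount-based count-trailing-zeros. It assumes the common formulation that counts the bits of `~v & (v - 1)`, a mask with exactly the trailing-zero positions set; `v & -v` on its own only isolates the lowest set bit, so it needs a further step (such as subtracting one) before counting. The helper names are hypothetical, and the loop-based `popcount32` merely stands in for a target's popcount instruction.

```c
#include <stdint.h>

/* Portable bit count standing in for a hardware popcount instruction. */
static unsigned popcount32(uint32_t v) {
    unsigned n = 0;
    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* cttz via popcount: ~v & (v - 1) has a 1 in every trailing-zero position
   of v and nowhere else. For v == 0 the mask is all ones, giving 32. */
static unsigned cttz32_via_popcount(uint32_t v) {
    return popcount32(~v & (v - 1u));
}
```

Unlike the table form, this version needs no memory access, so the trade-off depends on how cheap popcount and multiply are on the target.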

Compare the generated code for arm-eabi.

You may not want to do that for non-hot ctzs?

As opposed to what, calling into compiler-rt?

In D113291#3539840, @dmgreen wrote:

Hello - We were having a discussion about a very similar patch in D125755. I think the outcome for this patch is that either:

We need to do this later (maybe in CodeGenPrepare).

The initial version of patch was implemented within CodeGenPrepare. And I think it should not introduce any performance regression.

We need to do this unconditionally without the call to TTI.preferCTTZLowering() and have the reverse transform later for targets that do not have a cheaper alternative.

Hmm... I need to think about that.

We need to argue some more :)

Seems that I need to find some time to catch up on the conversation in D125755. :)

There are more details about why in D125755. I would go for the first option if it doesn't lead to worse performance, as for the second I'm not sure when it would be profitable to transform back and emit the table. You may not want to do that for non-hot ctzs?

I am not sure I get the question.

In D113291#3539965, @efriedma wrote:

as for the second I'm not sure when it would be profitable to transform back and emit the table

You really just have to weigh it against the current default expansion on targets where ctlz/cttz aren't legal, which is popcount(v & -v). It should be a straightforward comparison, generally. If you have popcount, use it. If multiply is legal, use a table lookup. Otherwise... maybe stick with the popcount expansion? Probably any approach is expensive at that point.

Compare the generated code for arm-eabi.

I guess we could measure something like that, but seems to me that it could introduce some performance regressions...

In D113291#3539965, @efriedma wrote:

as for the second I'm not sure when it would be profitable to transform back and emit the table

You really just have to weigh it against the current default expansion on targets where ctlz/cttz aren't legal, which is popcount(v & -v). It should be a straightforward comparison, generally. If you have popcount, use it. If multiply is legal, use a table lookup. Otherwise... maybe stick with the popcount expansion? Probably any approach is expensive at that point.

Compare the generated code for arm-eabi.

You may not want to do that for non-hot ctzs?

As opposed to what, calling into compiler-rt?

What I meant is that it can be difficult for the compiler to recognize _when_ a ctz is performance critical. If the size of the table is large (which I was possibly over-estimating in my mind), then you may not want to emit the table for every ctz in the program. Currently, the places where this is used have said, by the fact that they wrote it this way, that these ctzs are important. It just depends on whether converting to a table is always better, and whether the table is small enough to be reasonable in the common case. 32 x i8 doesn't sound too big compared to what I originally imagined, if that is all it needs.

In D113291#3540303, @dmgreen wrote:

In D113291#3539965, @efriedma wrote:

as for the second I'm not sure when it would be profitable to transform back and emit the table

You really just have to weigh it against the current default expansion on targets where ctlz/cttz aren't legal, which is popcount(v & -v). It should be a straightforward comparison, generally. If you have popcount, use it. If multiply is legal, use a table lookup. Otherwise... maybe stick with the popcount expansion? Probably any approach is expensive at that point.

Compare the generated code for arm-eabi.

You may not want to do that for non-hot ctzs?

As opposed to what, calling into compiler-rt?

What I meant is that it can be difficult for the compiler to recognize _when_ a ctz is performance critical. If the size of the table is large (which I was possibly over-estimating in my mind), then you may not want to emit the table for every ctz in the program. Currently, the places where this is used have said, by the fact that they wrote it this way, that these ctzs are important. It just depends on whether converting to a table is always better, and whether the table is small enough to be reasonable in the common case. 32 x i8 doesn't sound too big compared to what I originally imagined, if that is all it needs.

For the table lookup, is there an algorithm for creating the special constant that is used in the multiply? Or would we just hardcode known constants for common sizes.

In D113291#3539965, @efriedma wrote:

as for the second I'm not sure when it would be profitable to transform back and emit the table

You really just have to weigh it against the current default expansion on targets where ctlz/cttz aren't legal, which is popcount(v & -v). It should be a straightforward comparison, generally. If you have popcount, use it. If multiply is legal, use a table lookup. Otherwise... maybe stick with the popcount expansion? Probably any approach is expensive at that point.

Note, the popcount expansion already uses a multiply without checking if it is legal.

For the table lookup, is there an algorithm for creating the special constant that is used in the multiply? Or would we just hardcode known constants for common sizes.

https://en.wikipedia.org/wiki/De_Bruijn_sequence has a description of the algorithm. Probably we'd just hard-code constants, though; practically speaking, the only sizes we actually care about are 16, 32, and 64. (For anything that doesn't fit in a single register, we probably just want to split the cttz.)
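If the constants are hard-coded, the matching table can still be derived mechanically. The C sketch below assumes a 32-bit input and a 5-bit index (shift by 27); `build_cttz_table32` is a hypothetical helper, not an LLVM API. It fills the table from a candidate multiplier and reports failure when two bit positions collide, which is what happens when the multiplier is not a De Bruijn sequence.

```c
#include <stdint.h>
#include <string.h>

/* Fills table[32] so that table[(mul << i) >> 27] == i for every bit i.
   Returns 1 on success, 0 if two bit positions map to the same index
   (meaning mul cannot be used for a 32-entry lookup). */
static int build_cttz_table32(uint32_t mul, unsigned char table[32]) {
    unsigned char used[32];
    memset(used, 0, sizeof used);
    for (unsigned i = 0; i < 32; i++) {
        /* Same index computation as the lookup: top 5 bits of mul << i. */
        uint32_t idx = (uint32_t)(mul << i) >> 27;
        if (used[idx])
            return 0;
        used[idx] = 1;
        table[idx] = (unsigned char)i;
    }
    return 1;
}
```

With the well-known 32-bit constant 0x077CB531 this yields a table beginning 0, 1, 28, 2, and so on; a 64-bit variant would follow the same shape with a 6-bit index.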

Note, the popcount expansion already uses a multiply without checking if it is legal.

It's a relatively cheap multiply to expand on a target with a shifter, though.

Thanks a lot for the comments! Can someone please sum the things up that need to be done for this?

Implementing this in CodeGenPrepare doesn't imply that we should get rid of the call to TTI.preferCTTZLowering().
If we decide to go with the unconditional (target-independent) cttz lowering and get rid of the call to TTI.preferCTTZLowering(), what should be done?

Concretely, my preferred solution looks something like:

1. Perform the transform unconditionally in AggressiveInstCombine (so this patch without the preferCTTZLowering() bits).
2. Teach TargetLowering::expandCTTZ to emit a table lookup.

drop target dependent hooks

In D113291#3542941, @efriedma wrote:

Concretely, my preferred solution looks something like:

Perform the transform unconditionally in AggressiveInstCombine (so this patch without the preferCTTZLowering() bits).

The latest update implements this.

Teach TargetLowering::expandCTTZ to emit a table lookup.

Harbormaster completed remote builds in B168559: Diff 435145. Jun 8 2022, 7:51 AM

In D113291#3566581, @djtodoro wrote:

In D113291#3542941, @efriedma wrote:

Concretely, my preferred solution looks something like:

Perform the transform unconditionally in AggressiveInstCombine (so this patch without the preferCTTZLowering() bits).

The latest update implements this.

Teach TargetLowering::expandCTTZ to emit a table lookup.

@djtodoro - Will you be sending patch for (2) "Teach TargetLowering::expandCTTZ to emit a table lookup."?

In D113291#3569114, @gsocshubham wrote:

In D113291#3566581, @djtodoro wrote:

In D113291#3542941, @efriedma wrote:

Concretely, my preferred solution looks something like:

Perform the transform unconditionally in AggressiveInstCombine (so this patch without the preferCTTZLowering() bits).

The latest update implements this.

Teach TargetLowering::expandCTTZ to emit a table lookup.

@djtodoro - Will you be sending patch for (2) "Teach TargetLowering::expandCTTZ to emit a table lookup."?

Unfortunately, I don’t have time to do it right now. If you are interested, please go ahead with the implementation.

In D113291#3569225, @djtodoro wrote:

In D113291#3569114, @gsocshubham wrote:

In D113291#3566581, @djtodoro wrote:

In D113291#3542941, @efriedma wrote:

Concretely, my preferred solution looks something like:

Perform the transform unconditionally in AggressiveInstCombine (so this patch without the preferCTTZLowering() bits).

The latest update implements this.

Teach TargetLowering::expandCTTZ to emit a table lookup.

@djtodoro - Will you be sending patch for (2) "Teach TargetLowering::expandCTTZ to emit a table lookup."?

Unfortunately, I don’t have time to do it right now. If you are interested, please go ahead with the implementation.

@djtodoro - Sure. I am interested in doing it. Can you elaborate on (2) in a bit more detail?

In D113291#3569456, @gsocshubham wrote:

In D113291#3569225, @djtodoro wrote:

In D113291#3569114, @gsocshubham wrote:

In D113291#3566581, @djtodoro wrote:

In D113291#3542941, @efriedma wrote:

Concretely, my preferred solution looks something like:

Perform the transform unconditionally in AggressiveInstCombine (so this patch without the preferCTTZLowering() bits).

The latest update implements this.

Teach TargetLowering::expandCTTZ to emit a table lookup.

@djtodoro - Will you be sending patch for (2) "Teach TargetLowering::expandCTTZ to emit a table lookup."?

Unfortunately, I don’t have time to do it right now. If you are interested, please go ahead with the implementation.

@djtodoro - Sure. I am interested in doing it. Can you elaborate on (2) in a bit more detail?

Basically, this patch implements lowering of the table-based cttz implementation into @llvm.cttz unconditionally. For some targets it won't be that beneficial, so during TargetLowering::expandCTTZ we should emit a table lookup again. @efriedma may have something to add.

I don't think I have much further to say. Emitting a table lookup from TargetLowering::expandCTTZ should be straightforward, I think. See DAGCombiner::convertSelectOfFPConstantsToLoadOffset for an example of how to emit a constant table.

In D113291#3602808, @efriedma wrote:

I don't think I have much further to say. Emitting a table lookup from TargetLowering::expandCTTZ should be straightforward, I think. See DAGCombiner::convertSelectOfFPConstantsToLoadOffset for an example of how to emit a constant table.

Do we need to be careful with vector CTTZ which can also go through expand CTTZ?

gsocshubham mentioned this in D128911: Emit table lookup from TargetLowering::expandCTTZ(). Jun 30 2022, 7:17 AM

gsocshubham edited child revisions, added: D128911: Emit table lookup from TargetLowering::expandCTTZ(); removed: D120462: [AArch64InstrInfo.td] - Lowering fix for cttz intrinsic. Jun 30 2022, 7:20 AM

Is this patch ready to be merged? This is the parent patch of https://reviews.llvm.org/D128911. Does the child get merged before the parent?

Anyways, https://reviews.llvm.org/D128911 can be merged independently of this patch.

In D113291#3670999, @gsocshubham wrote:

Is this patch ready to be merged? This is the parent patch of https://reviews.llvm.org/D128911. Does the child get merged before the parent?

Anyways, https://reviews.llvm.org/D128911 can be merged independently of this patch.

D128911 needs to go in first, once that is done we can move forward with this one. It could do with a rebase and a clang-format.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
440	We will need to support opaque pointers nowadays.
479	It's probably better to just say "64bit targets" as opposed to a specific target.
490	I believe it's the top `Bitwidth - Log2(Bitwidth)` bits.

dmgreen added inline comments. Jul 23 2022, 4:47 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
482	Does the extend between the lshr and the mul ever happen? From what I can tell, the type of the VT should be based on the type of these operations.
576	The TTI's can be removed now (and if you rebase they may already be present, but still are not needed by the new code any more).
llvm/test/Transforms/AggressiveInstCombine/AArch64/lower-table-based-ctz-basics.ll
1 ↗	(On Diff #435145)	The tests can be moved out of the AArch64 directory, so long as they drop the AArch64 triple.

gsocshubham removed a child revision: D128911: Emit table lookup from TargetLowering::expandCTTZ(). Aug 4 2022, 5:07 AM

dmgreen mentioned this in rGab4fc87a9d96: [DAG] Emit table lookup from TargetLowering::expandCTTZ(). Aug 8 2022, 4:08 AM

@djtodoro Do you have any time to update this? Otherwise, do you mind if we take it over so we can update it and get it reviewed? Thanks.

I promise I will find some time to update this - it is coming next week.

support opaque pointers
remove leftovers (since this was aarch64 only)
move the tests in a non-target dir

Harbormaster completed remote builds in B181687: Diff 453206. Aug 17 2022, 12:20 AM

djtodoro marked 3 inline comments as done. Aug 17 2022, 1:08 AM

djtodoro added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
482	It does not happen in all the cases.

remove a duplicated test

Harbormaster completed remote builds in B181697: Diff 453228. Aug 17 2022, 2:10 AM

Why are some test files still specifying a triple in the RUN line?

It would be good to consolidate tests into fewer files if possible, with better names/comments to explain exactly what differences are being tested in the sequence of tests. There should also be negative tests (wrong table constants, wrong magic multiplier, wrong shift amount, etc), so we know that the transform is not firing on mismatches.

In D113291#3728796, @spatel wrote:

Why are some test files still specifying a triple in the RUN line?

Leftovers. Thanks.

It would be good to consolidate tests into fewer files if possible, with better names/comments to explain exactly what differences are being tested in the sequence of tests. There should also be negative tests (wrong table constants, wrong magic multiplier, wrong shift amount, etc), so we know that the transform is not firing on mismatches.

I will remove one redundant test. There are C producers as well as some top-level comments that explain what it should test (if the comment is needed). Furthermore, thanks for the suggestion for adding some negative tests - will add it.

adding negative tests
rename the tests
clean the target triple leftovers from tests

Harbormaster completed remote builds in B181937: Diff 453574. Aug 18 2022, 2:39 AM

dmgreen added inline comments. Aug 18 2022, 4:24 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
383	If the length of the table is larger than the InputBits, how are we sure that the matched elements will be the correct ones? Is this always guaranteed? I think I would have expected a check that `for each i=0..InputBits-1, Table[(Mul<<i)>>Shift] == i`. With a check that the index is in range. Are they always equivalent with the larger tables too?
442	Can this always just use the Load type?
454	I think User here could be GlobalVariable
458	GEPUser->getOperand(0) -> Global->getInitializer(). It is worth adding a test where the global is extern.
480	It might be better to switch this logic around - unsigned InputBits = X1->getType()->getScalarSizeInBits(); if (InputBits != 32 && InputBits != 64) return false;
482	Do you have a test case where the extend is between the shift and the mul?
483–484	Log2_32_Ceil -> Log2_Ceil if we know the InputBits is a power of 2. The -1 case is for a larger table with more elements but that can handle zero values?
llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-non-argument-value.ll
78 ↗	(On Diff #453574)	It is better to pass x as a parameter, although I'm not sure it matters much where x comes from for the rest of the pattern.
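The check proposed in the comment on line 383 above can be sketched in C as follows. This is only an illustration for a 32-bit input under the stated assumptions; `table_matches_cttz32` and `kCtz2Table` are hypothetical names, and the table is the 64-entry `ctz2` example quoted elsewhere in this review with `u` expanded to 0.

```c
#include <stddef.h>
#include <stdint.h>

/* The 64-entry table from the ctz2 example in this review (u == 0). */
static const short kCtz2Table[64] = {
    32, 0,  1,  12, 2,  6,  0,  13, 3,  0,  7,  0,  0,  0,  0,  14,
    10, 4,  0,  0,  8,  0,  0,  25, 0,  0,  0,  0,  0,  21, 27, 15,
    31, 11, 5,  0,  0,  0,  0,  0,  9,  0,  0,  24, 0,  0,  20, 26,
    30, 0,  0,  0,  0,  23, 0,  19, 29, 0,  22, 18, 28, 17, 16, 0};

/* Returns 1 if, for every bit position i in 0..31, the index computed by
   the multiply-and-shift is in range and the table holds i at that index.
   Assumes shift < 32. */
static int table_matches_cttz32(const short *table, size_t len,
                                uint32_t mul, unsigned shift) {
    for (unsigned i = 0; i < 32; i++) {
        uint32_t idx = (uint32_t)(mul << i) >> shift;
        if (idx >= len)
            return 0;
        if (table[idx] != (short)i)
            return 0;
    }
    return 1;
}
```

This form walks the 32 possible inputs rather than the table, so unused filler entries never influence the result.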

spatel added inline comments. Aug 18 2022, 7:25 AM

llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-basics.ll
123 ↗	(On Diff #453574)	This is identical to the previous test, so it is not adding value here. I realize that the source example it is intended to model is slightly different. If you want to verify that we end up with cttz from the IR produced by clang, then I'd recommend adding a file to test/Transforms/PhaseOrdering and "RUN -O3". When I create tests like that, I grab the unoptimized IR using "clang -S -o - -emit-llvm -Xclang -disable-llvm-optzns". Then reduce it by running it through "opt -mem2reg", so it's not completely full of junk IR.
llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-non-argument-value.ll
78 ↗	(On Diff #453574)	Right - as far as this patch is concerned, this is identical to the previous test, so it shouldn't be here. See my earlier comment about PhaseOrdering tests if we want more end-to-end coverage for `opt -O3`.

Thanks for the comments.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
383	Hmmm, can you please walk me through an example?
442	I think yes, nothing comes up to my mind that can break it.
480	Sounds good.
482	I was completely sure that I had a case for it, but I am not able to produce it actually -- so I deleted it for now.
483–484	Log2_32_Ceil -> Log2_Ceil if we know the InputBits is a power of 2.

Right... But you meant `Log2_64()`, right? It is a power of 2, since it is either 32 or 64, so no need to add any assert here.

The -1 case is for a larger table with more elements but that can handle zero values?

int ctz2(unsigned x)
{
#define u 0
  static short table[64] =
  {
    32, 0, 1, 12, 2, 6, u, 13, 3, u, 7, u, u, u, u, 14,
    10, 4, u, u, 8, u, u, 25, u, u, u, u, u, 21, 27, 15,
    31, 11, 5, u, u, u, u, u, 9, u, u, 24, u, u, 20, 26,
    30, u, u, u, u, 23, u, 19, 29, u, 22, 18, 28, 17, 16, u
  };
  x = (x & -x) * 0x0450FBAF;
  return table[x >> 26];
}
llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-basics.ll
123 ↗	(On Diff #453574)	This is identical to the previous test, so it is not adding value here.

Right, thanks!

I realize that the source example it intended to model is slightly different. If you want to verify that we end up with cttz from the IR produced by clang, then I'd recommend adding a file to test/Transforms/PhaseOrdering and "RUN -O3". When I create tests like that, I grab the unoptimized IR using "clang -S -o - -emit-llvm -Xclang -disable-llvm-optzns". Then reduce it by running it through "opt -mem2reg", so it's not completely full of junk IR.

I usually do create tests that way, but these may be a bit stale. I will add the `PhaseOrdering` test, thanks for the suggestion.
llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-non-argument-value.ll
78 ↗	(On Diff #453574)	Yes, agree.

removed more duplicated tests
added a llvm/test/Transforms/PhaseOrdering test
refactor the code a bit

Harbormaster completed remote builds in B182019: Diff 453684.Aug 18 2022, 10:10 AM

dmgreen added inline comments.Aug 19 2022, 2:01 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
383	Hmm. No, I'm not sure I can. I was thinking about the ctz2 case, and whether there could be cases where the u's have values that make them "match" even though those values are wrong. So the items in the table accessed by the DeBruijn constant would produce incorrect values, but there are still InputBits number of matches.
	;; int ctz2(unsigned x)
	;; {
	;; #define u 0
	;; static short table[64] =
	;; {
	;; 32, 0, 1, 12, 2, 6, u, 13, 3, u, 7, u, u, u, u, 14,
	;; 10, 4, u, u, 8, u, u, 25, u, u, u, u, u, 21, 27, 15,
	;; 31, 11, 5, u, u, u, u, u, 9, u, u, 24, u, u, 20, 26,
	;; 30, u, u, u, u, 23, u, 19, 29, u, 22, 18, 28, 17, 16, u
	;; };
	;; x = (x & -x) * 0x0450FBAF;
	;; return table[x >> 26];
	;; }
	But I don't think that is something that can come up. I was finding it hard to prove, but if the Mul is InputBits in length there are only at most InputBits separate elements that it can access. And multiple elements cannot map successfully back to the same i. I ran a sat solver overnight, and it is still going but hasn't found any counterexamples, which is a good sign. (It is able to find valid DeBruijn CTTZ tables given the chance). It might be worth adding a comment explaining why this correctly matches the table in all cases.
437	One thing I forgot to mention - llvm has a code style that is best explained as "just run clang-format on the patch". These returns are all in the wrong place, for example, and could do with a cleanup.
483–484	Ah, yeah - I meant Log2_32, but deleted the wrong part of the function name.
485–486	This is true by definition now.
llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-dereferencing-pointer.ll
23 ↗	(On Diff #453684)	I usually remove dso_local
llvm/test/Transforms/PhaseOrdering/lower-table-based-cttz.ll
22 ↗	(On Diff #453684)	Can remove the `; Function Attrs`
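
The forward formulation of the table check discussed for line 383 (each trailing-zero count i must index a slot that holds i) can be sketched as follows. This is an illustration only, not the code under review: `forwardCttzCheck`, `tableCttz32` and `kDeBruijn32` are made-up names, and plain integers stand in for `ConstantDataArray` and `APInt`. The 32-entry table and the multiplier 0x077CB531 are the ones quoted in the patch's source comment.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The 32-entry table from the patch's source comment (multiplier 0x077CB531).
static const unsigned char kDeBruijn32[32] = {
    0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
    31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9};

// Hypothetical helper: returns true if every trailing-zero count I in
// [0, InputBits) indexes a slot of Table that stores I.
// InputBits is assumed to be 32 or 64.
bool forwardCttzCheck(const std::vector<uint64_t> &Table, uint64_t Mul,
                      unsigned Shift, unsigned InputBits) {
  const uint64_t WidthMask =
      InputBits == 64 ? ~0ull : ((1ull << InputBits) - 1);
  for (unsigned I = 0; I < InputBits; ++I) {
    // For an input with I trailing zeros, (x & -x) is 1 << I, so the
    // multiply in the source pattern reduces to Mul << I, truncated.
    const uint64_t Index = ((Mul << I) & WidthMask) >> Shift;
    if (Index >= Table.size() || Table[Index] != I)
      return false;
  }
  return true;
}

// The runtime lookup that the pattern implements, for nonzero 32-bit X.
unsigned tableCttz32(uint32_t X) {
  return kDeBruijn32[((X & (0u - X)) * 0x077CB531u) >> 27];
}
```

Slots that the multiplier never reaches are not inspected by this formulation, which is why padding entries in a larger table do not affect its result.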

djtodoro marked an inline comment as done.Aug 21 2022, 3:06 AM

djtodoro added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
383	Yeah, good sign. I will try to make a reasonable comment. Thanks.
437	I've changed the style to `Google`, accidentally. Thanks.
llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-dereferencing-pointer.ll
23 ↗	(On Diff #453684)	me as well, leftover :/ thanks!

clang-format
clean up tests

Harbormaster completed remote builds in B182443: Diff 454291.Aug 21 2022, 5:01 AM

Thanks for the updates. I don't think I have anything other than what is below. Any other comments from anyone else?

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
383	A comment explaining this function would still be useful.
442	This can be changed to just the Load type then.
460–463	if (!GVTable || !GVTable->hasInitializer()) return false;
496–498	Remove this check, as it is always true as far as I can tell.

spatel added inline comments.Aug 24 2022, 1:28 PM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
366–367	There was a request to put a comment on this function, and I'll second that request. It's not clear why we are counting matches rather than just bailing out on the first mismatch. I think that's because you can construct/recognize a table with unaccessed/undefined elements?

djtodoro marked 4 inline comments as done.Aug 27 2022, 6:31 AM

djtodoro added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
366–367	Yes, that is the reason - we are iterating over the elements of the table, so there could be mismatches that we can ignore. A comment is coming.
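
A minimal sketch of the counting approach discussed here, under the assumption that plain integers can stand in for `ConstantDataArray` and `APInt`. The names `looksLikeCttzTable` and `deBruijn32Table` are hypothetical; the table and multiplier are the 32-bit ones quoted in the patch's source comment. It is meant to show why a mismatching slot is skipped rather than treated as a failure.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the reviewed helper. InputBits must be 32 or 64.
bool looksLikeCttzTable(const std::vector<uint64_t> &Table, uint64_t Mul,
                        unsigned Shift, unsigned InputBits) {
  const uint64_t Length = Table.size();
  if (Length < InputBits || Length > 2ull * InputBits)
    return false;

  // Keep only the low InputBits bits of the shifted multiplier.
  const uint64_t WidthMask =
      InputBits == 64 ? ~0ull : ((1ull << InputBits) - 1);

  unsigned Matched = 0;
  for (uint64_t I = 0; I < Length; ++I) {
    const uint64_t Element = Table[I];
    // Slot I counts as a hit if an input with Element trailing zeros
    // would index exactly this slot.
    if (Element < InputBits &&
        (((Mul << Element) & WidthMask) >> Shift) == I)
      ++Matched;
  }
  // Padding slots may hold anything, so count hits instead of bailing out
  // on the first mismatch.
  return Matched == InputBits;
}

// The 32-entry table from the patch's source comment (multiplier 0x077CB531).
std::vector<uint64_t> deBruijn32Table() {
  return {0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
          31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9};
}
```

Padding the table to 64 entries with zeros leaves the count at 32, because slots 32..63 are never the slot that a trailing-zero count of 0 maps to.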

addressing comments

Harbormaster completed remote builds in B183750: Diff 456115.Aug 27 2022, 7:51 AM

LGTM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
474–476	Could shorten this by using something like: if (!match(GEP->idx_begin()->get(), m_ZeroInt())) return false;

This revision is now accepted and ready to land.Aug 27 2022, 11:32 AM

craig.topper added inline comments.Aug 27 2022, 11:52 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
494	If we’re only handling 32 and 64, this comment should be 5..6

craig.topper added inline comments.Aug 27 2022, 12:11 PM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
376	I think this can be Mask = APInt::getBitsSetFrom(InputBits , Shift)

Thanks for the updates. LGTM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
494	I believe it is 7 because the table can be twice the size. Hence the -1 in the formula below. See the ctz2 test.
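
The "5..7" range can be checked with a small sketch. It assumes the shift test has the shape "Shift equals InputBits - log2(InputBits), or one less", which is how the "-1 in the formula" reads in this thread; the helper names are hypothetical and not taken from the patch.

```cpp
#include <cassert>

// Integer log2 of a power of two (32 or 64 in this discussion).
unsigned log2PowerOfTwo(unsigned V) {
  unsigned L = 0;
  while (V > 1) {
    V >>= 1;
    ++L;
  }
  return L;
}

// Hypothetical helper: true if Shift leaves log2(InputBits) index bits
// (table of InputBits entries) or one more (table of twice that size).
bool isPlausibleShift(unsigned InputBits, unsigned Shift) {
  const unsigned Exact = InputBits - log2PowerOfTwo(InputBits);
  return Shift == Exact || Shift == Exact - 1;
}

// Number of index bits extracted from the top of the product.
unsigned indexBits(unsigned InputBits, unsigned Shift) {
  return InputBits - Shift;
}
```

With 32-bit inputs the accepted shifts are 27 and 26 (5 and 6 index bits); with 64-bit inputs they are 58 and 57 (6 and 7 index bits), so the extracted field is 5 to 7 bits wide.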

Thanks for your comments.

addressing comments

Harbormaster completed remote builds in B183797: Diff 456178.Aug 28 2022, 2:27 AM

craig.topper added inline comments.Aug 28 2022, 11:35 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
375	extra space before comma. Looks like I mistyped it in my comment. Sorry.

djtodoro updated this revision to Diff 456267.Aug 29 2022, 12:20 AM

Harbormaster completed remote builds in B183865: Diff 456267.Aug 29 2022, 1:33 AM

Closed by commit rGfec01ee3f524: [AggressiveInstCombine] Lower Table Based CTTZ (authored by djtodoro).Sep 2 2022, 8:28 AM

This revision was automatically updated to reflect the committed changes.

djtodoro added a commit: rGfec01ee3f524: [AggressiveInstCombine] Lower Table Based CTTZ.

Hi, I'm seeing a heap-use-after-free in AggressiveInstCombine in a build shortly after this landed, within a function added here.

test/Transforms/AggressiveInstCombine/X86/sqrt.ll fails as follows under asan: https://gist.github.com/zygoloid/a270e65d32ab5b05504b3b0d5717f83b

Please can you fix or revert?

rsmith added a reverting change: rG053841c5624c: Revert "[AggressiveInstCombine] Lower Table Based CTTZ".Sep 2 2022, 4:19 PM

In D113291#3768076, @rsmith wrote:

test/Transforms/AggressiveInstCombine/X86/sqrt.ll fails as follows under asan: https://gist.github.com/zygoloid/a270e65d32ab5b05504b3b0d5717f83b

I've gone ahead and reverted for now to unblock things. Let me know if you're not able to reproduce the ASan failure and I can dig more into it.

Thanks a lot. I will check ASAP.

@rsmith recommitted with f879939157. Thanks!

In D113291#3777119, @djtodoro wrote:

@rsmith recommitted with f879939157. Thanks!

What was the bug, how was it fixed, and is there a new test to verify the fix? That should have been mentioned in the new commit message.

In D113291#3777195, @spatel wrote:

In D113291#3777119, @djtodoro wrote:

@rsmith recommitted with f879939157. Thanks!

What was the bug, how was it fixed, and is there a new test to verify the fix? That should have been mentioned in the new commit message.

You are right. I reverted the recommit, and I will recommit it with proper message, sorry I missed it. :/

Actually, the issue was that your patch D129167 introduced eraseFromParent and the tryToRecognizeTableBasedCttz would try to use the instruction (dyn_cast) after free. I just moved the tryToRecognizeTableBasedCttz above foldSqrt. I guess it does not need any additional test case.

In D113291#3777356, @djtodoro wrote:

Actually, the issue was that your patch D129167 introduced eraseFromParent and the tryToRecognizeTableBasedCttz would try to use the instruction (dyn_cast) after free. I just moved the tryToRecognizeTableBasedCttz above foldSqrt. I guess it does not need any additional test case.

Ah, I see. Please put a comment on that call then - foldSqrt (or any other erasing transform) needs to be accounted for (last in the loop for now), or we might hit that bug. And yes, looks like we don't need another test since the existing regression test was flagged by the asan bot.
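
The ordering constraint described above can be modeled with a toy sketch. Everything in it is hypothetical (`ToyInst`, `toyRecognizeCttz`, `toyFoldSqrt`, `runToyFolds`) and it uses a `std::list` rather than LLVM IR; it only illustrates why a transform that erases the current instruction has to run after every transform that still reads it.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Toy stand-in for an IR instruction.
struct ToyInst {
  std::string Opcode;
};

// Read-only matcher: safe on any instruction that is still alive.
bool toyRecognizeCttz(const ToyInst &I) { return I.Opcode == "load"; }

// Erasing transform: removes the instruction it folds and returns true.
bool toyFoldSqrt(std::list<ToyInst> &Block, std::list<ToyInst>::iterator It) {
  if (It->Opcode != "sqrt")
    return false;
  Block.erase(It); // 'It' dangles from here on
  return true;
}

// Visit every instruction once and count how many transforms fired.
unsigned runToyFolds(std::list<ToyInst> &Block) {
  unsigned Changes = 0;
  for (auto It = Block.begin(); It != Block.end();) {
    auto Next = std::next(It); // saved before anything can erase *It
    if (toyRecognizeCttz(*It))
      ++Changes;
    // Must stay last: after a successful fold, *It must not be touched.
    if (toyFoldSqrt(Block, It))
      ++Changes;
    It = Next;
  }
  return Changes;
}
```

Swapping the two calls in `runToyFolds` would make the matcher read an erased element, which is the shape of the failure reported under ASan.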

djtodoro mentioned this in rGdf868edee561: "Recommit "[AggressiveInstCombine] Lower Table Based CTTZ"".Sep 9 2022, 1:30 AM

libin049 added a subscriber: libin049.Mar 30 2023, 11:56 PM

Revision Contents

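Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	8 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	2 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	4 lines
llvm/include/llvm/CodeGen/TargetLowering.h	6 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	4 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h	1 line
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	4 lines
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp	182 lines
llvm/test/Transforms/AggressiveInstCombine/AARCH64/dereferencing-pointer.ll	60 lines
llvm/test/Transforms/AggressiveInstCombine/AARCH64/lit.local.cfg	2 lines
llvm/test/Transforms/AggressiveInstCombine/AARCH64/lower-table-based-ctz-basics.ll	300 lines
llvm/test/Transforms/AggressiveInstCombine/AARCH64/lower-table-based-ctz.ll	43 lines
llvm/test/Transforms/AggressiveInstCombine/AARCH64/non-argument-value.ll	111 lines
llvm/test/Transforms/AggressiveInstCombine/AARCH64/zero-element.ll	44 lines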

Diff 387204

llvm/include/llvm/Analysis/TargetTransformInfo.h

[581 lines hidden] public:
   /// Compared to the SW implementation, HW support is supposed to
   /// significantly boost the performance when the population is dense, and it
   /// may or may not degrade performance if the population is sparse. A HW
   /// support is considered as "Fast" if it can outperform, or is on a par
   /// with, SW implementation when the population is sparse; otherwise, it is
   /// considered as "Slow".
   enum PopcntSupportKind { PSK_Software, PSK_SlowHardware, PSK_FastHardware };

+  /// Return true if the target would benefit from recognizing a table-based
+  /// implementation of cttz.
+  bool preferCTTZLowering() const;
+
   /// Return true if the specified immediate is legal add immediate, that
   /// is the target has add instructions which can add a register with the
   /// immediate without having to materialize the immediate into a register.
   bool isLegalAddImmediate(int64_t Imm) const;

   /// Return true if the specified immediate is legal icmp immediate,
   /// that is the target has icmp instructions which can compare a register
   /// against the immediate without having to materialize the immediate into a
[884 lines hidden] public:
   simplifyDemandedUseBitsIntrinsic(InstCombiner &IC, IntrinsicInst &II,
                                    APInt DemandedMask, KnownBits &Known,
                                    bool &KnownBitsComputed) = 0;
   virtual Optional<Value *> simplifyDemandedVectorEltsIntrinsic(
       InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
       APInt &UndefElts2, APInt &UndefElts3,
       std::function<void(Instruction *, unsigned, APInt, APInt &)>
           SimplifyAndSetOp) = 0;
+  virtual bool preferCTTZLowering() = 0;
   virtual bool isLegalAddImmediate(int64_t Imm) = 0;
   virtual bool isLegalICmpImmediate(int64_t Imm) = 0;
   virtual bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV,
                                      int64_t BaseOffset, bool HasBaseReg,
                                      int64_t Scale, unsigned AddrSpace,
                                      Instruction *I) = 0;
   virtual bool isLSRCostLess(TargetTransformInfo::LSRCost &C1,
                              TargetTransformInfo::LSRCost &C2) = 0;
[334 lines hidden] Optional<Value *> simplifyDemandedVectorEltsIntrinsic(
       InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
       APInt &UndefElts2, APInt &UndefElts3,
       std::function<void(Instruction *, unsigned, APInt, APInt &)>
           SimplifyAndSetOp) override {
     return Impl.simplifyDemandedVectorEltsIntrinsic(
         IC, II, DemandedElts, UndefElts, UndefElts2, UndefElts3,
         SimplifyAndSetOp);
   }
+  bool preferCTTZLowering() override {
+    return Impl.preferCTTZLowering();
+  }
   bool isLegalAddImmediate(int64_t Imm) override {
     return Impl.isLegalAddImmediate(Imm);
   }
   bool isLegalICmpImmediate(int64_t Imm) override {
     return Impl.isLegalICmpImmediate(Imm);
   }
   bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
                              bool HasBaseReg, int64_t Scale, unsigned AddrSpace,
[557 lines hidden]

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

[187 lines hidden] public:

   void getUnrollingPreferences(Loop *, ScalarEvolution &,
                                TTI::UnrollingPreferences &,
                                OptimizationRemarkEmitter *) const {}

   void getPeelingPreferences(Loop *, ScalarEvolution &,
                              TTI::PeelingPreferences &) const {}

+  bool preferCTTZLowering() const { return false; }
+
   bool isLegalAddImmediate(int64_t Imm) const { return false; }

   bool isLegalICmpImmediate(int64_t Imm) const { return false; }

   bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
                              bool HasBaseReg, int64_t Scale, unsigned AddrSpace,
                              Instruction *I = nullptr) const {
     // Guess that only reg and reg+reg addressing is allowed. This heuristic is
[980 lines hidden]

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[282 lines hidden] unsigned getAssumedAddrSpace(const Value *V) const {
     return getTLI()->getTargetMachine().getAssumedAddrSpace(V);
   }

   Value *rewriteIntrinsicWithAddressSpace(IntrinsicInst *II, Value *OldV,
                                           Value *NewV) const {
     return nullptr;
   }

+  bool preferCTTZLowering() {
+    return getTLI()->preferCTTZLowering();
+  }
+
   bool isLegalAddImmediate(int64_t imm) {
     return getTLI()->isLegalAddImmediate(imm);
   }

   bool isLegalICmpImmediate(int64_t imm) {
     return getTLI()->isLegalICmpImmediate(imm);
   }

[1,934 lines hidden]

llvm/include/llvm/CodeGen/TargetLowering.h

[2,398 lines hidden] public:

   /// Return true if the specified immediate is legal icmp immediate, that is
   /// the target has icmp instructions which can compare a register against the
   /// immediate without having to materialize the immediate into a register.
   virtual bool isLegalICmpImmediate(int64_t) const {
     return true;
   }

+  /// Return true if the target would benefit from recognizing a table-based
+  /// implementation of cttz.
+  virtual bool preferCTTZLowering() const {
+    return false;
+  }
+
   /// Return true if the specified immediate is legal add immediate, that is the
   /// target has add instructions which can add a register with the immediate
   /// without having to materialize the immediate into a register.
   virtual bool isLegalAddImmediate(int64_t) const {
     return true;
   }

   /// Return true if the specified immediate is legal for the value input of a
[2,296 lines hidden]

llvm/lib/Analysis/TargetTransformInfo.cpp

[317 lines hidden] void TargetTransformInfo::getUnrollingPreferences(
   return TTIImpl->getUnrollingPreferences(L, SE, UP, ORE);
 }

 void TargetTransformInfo::getPeelingPreferences(Loop *L, ScalarEvolution &SE,
                                                 PeelingPreferences &PP) const {
   return TTIImpl->getPeelingPreferences(L, SE, PP);
 }

+bool TargetTransformInfo::preferCTTZLowering() const {
+  return TTIImpl->preferCTTZLowering();
+}
+
 bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const {
   return TTIImpl->isLegalAddImmediate(Imm);
 }

 bool TargetTransformInfo::isLegalICmpImmediate(int64_t Imm) const {
   return TTIImpl->isLegalICmpImmediate(Imm);
 }

[835 lines hidden]

llvm/lib/Target/AArch64/AArch64ISelLowering.h

[586 lines hidden] public:

   bool lowerInterleavedLoad(LoadInst *LI,
                             ArrayRef<ShuffleVectorInst *> Shuffles,
                             ArrayRef<unsigned> Indices,
                             unsigned Factor) const override;
   bool lowerInterleavedStore(StoreInst *SI, ShuffleVectorInst *SVI,
                              unsigned Factor) const override;

+  bool preferCTTZLowering() const override;
   bool isLegalAddImmediate(int64_t) const override;
   bool isLegalICmpImmediate(int64_t) const override;

   bool shouldConsiderGEPOffsetSplit() const override;

   EVT getOptimalMemOpType(const MemOp &Op,
                           const AttributeList &FuncAttributes) const override;

[528 lines hidden]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[12,167 lines hidden] if (CanUseFP && !IsSmallMemset && AlignmentIsAcceptable(MVT::f128, Align(16)))
     return LLT::scalar(128);
   if (Op.size() >= 8 && AlignmentIsAcceptable(MVT::i64, Align(8)))
     return LLT::scalar(64);
   if (Op.size() >= 4 && AlignmentIsAcceptable(MVT::i32, Align(4)))
     return LLT::scalar(32);
   return LLT();
 }

+bool AArch64TargetLowering::preferCTTZLowering() const {
+  return true;
+}
+
 // 12-bit optionally shifted immediates are legal for adds.
 bool AArch64TargetLowering::isLegalAddImmediate(int64_t Immed) const {
   if (Immed == std::numeric_limits<int64_t>::min()) {
     LLVM_DEBUG(dbgs() << "Illegal add imm " << Immed
                       << ": avoid UB for INT64_MIN\n");
     return false;
   }
   // Same encoding for add/sub, just flip the sign.
[6,729 lines hidden]

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

[15 lines hidden]
 #include "AggressiveInstCombineInternal.h"
 #include "llvm-c/Initialization.h"
 #include "llvm-c/Transforms/AggressiveInstCombine.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/BasicAliasAnalysis.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/LegacyPassManager.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/InitializePasses.h"
[325 lines hidden] if (match(MulOp0, m_And(m_c_Add(m_LShr(m_Value(ShiftOp0), m_SpecificInt(4)),
         }
       }
     }
   }

   return false;
 }

+static bool isCTTZTable(const ConstantDataArray &Table, uint64_t Mul,
+                        uint64_t Shift, uint64_t InputBits) {
craig.topper: Can we pass the Width in and not make this a template function? Maybe making use of APInt to manage the width if needed?
djtodoro: OK, sure.
spatel: There was a request to put a comment on this function, and I'll second that request. It's not clear why we are counting matches rather than just bailing out on the first mismatch. I think that's because you can construct/recognize a table with unaccessed/undefined elements?
djtodoro: Yes, that is the reason - we are iterating over the elements of the table, so there could be mismatches that we can ignore. A comment is coming.
+  unsigned Length = Table.getNumElements();
+  uint64_t IntWidth = InputBits / CHAR_BIT;
+  if (Length < InputBits || Length > InputBits * 2)
+    return false;
craig.topper: 8 should be `CHAR_BIT`, but if we can do this without using a template function I'd rather do that.
+
+  APInt Mask(InputBits, ((IntWidth << (InputBits - Shift)) - 1) << Shift);
+  unsigned Matched = 0;
+
craig.topper: extra space before comma. Looks like I mistyped it in my comment. Sorry.
+  for (unsigned i = 0; i < Length; i++) {
craig.topper: I think this can be Mask = APInt::getBitsSetFrom(InputBits , Shift)
+    uint64_t Element = Table.getElementAsInteger(i);
+    if (Element < InputBits &&
+        (((Mul << Element) & Mask.getZExtValue()) >> Shift) == i)
xbolva00: getZExtValue may assert large ints
+      Matched++;
+  }
+
+  return Matched == InputBits;
dmgreen: If the length of the table is larger than the InputBits, how are we sure that the matched elements will be the correct ones? Is this always guaranteed? I think I would have expected a check that `for each i=0..InputBits-1, Table[(Mul<<i)>>Shift] == i`. With a check that the index is in range. Are they always equivalent with the larger tables too?
djtodoro: Hmmm, can you please walk me through an example?
dmgreen: Hmm. No, I'm not sure I can. I was thinking about the ctz2 case, and whether there could be cases where the u's have values that make them "match" even though those values are wrong. So the items in the table accessed by the DeBruijn constant would produce incorrect values, but there are still InputBits number of matches.
    ;; int ctz2(unsigned x)
    ;; {
    ;; #define u 0
    ;; static short table[64] =
    ;; {
    ;; 32, 0, 1, 12, 2, 6, u, 13, 3, u, 7, u, u, u, u, 14,
    ;; 10, 4, u, u, 8, u, u, 25, u, u, u, u, u, 21, 27, 15,
    ;; 31, 11, 5, u, u, u, u, u, 9, u, u, 24, u, u, 20, 26,
    ;; 30, u, u, u, u, 23, u, 19, 29, u, 22, 18, 28, 17, 16, u
    ;; };
    ;; x = (x & -x) * 0x0450FBAF;
    ;; return table[x >> 26];
    ;; }
    But I don't think that is something that can come up. I was finding it hard to prove, but if the Mul is InputBits in length there are only at most InputBits separate elements that it can access. And multiple elements cannot map successfully back to the same i. I ran a sat solver overnight, and it is still going but hasn't found any counterexamples, which is a good sign. (It is able to find valid DeBruijn CTTZ tables given the chance). It might be worth adding a comment explaining why this correctly matches the table in all cases.
djtodoro: Yeah, good sign. I will try to make a reasonable comment. Thanks.
dmgreen: A comment explaining this function would still be useful.
+}

+// Try to recognize table-based ctz implementation.
+// E.g., an example in C (for more cases please see the llvm/tests):
+// int f(unsigned x) {
+//    static const char table[32] =
+//      {0, 1, 28, 2, 29, 14, 24, 3, 30,
+//       22, 20, 15, 25, 17, 4, 8, 31, 27,
+//       13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9};
+//    return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
+// }
+// this can be lowered to `cttz` instruction.
+// There is also a special case when the element is 0.
+//
+// Here are some examples of IR for AARCH64 target:
+//
+// CASE 1:
+// %sub = sub i32 0, %x
+// %and = and i32 %sub, %x
+// %mul = mul i32 %and, 125613361
+// %shr = lshr i32 %mul, 27
+// %idxprom = zext i32 %shr to i64
+// %arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @ctz1.table, i64 0, i64 %idxprom
+// %0 = load i8, i8* %arrayidx, align 1, !tbaa !8
+//
+// CASE 2:
+// %sub = sub i32 0, %x
+// %and = and i32 %sub, %x
+// %mul = mul i32 %and, 72416175
+// %shr = lshr i32 %mul, 26
+// %idxprom = zext i32 %shr to i64
+// %arrayidx = getelementptr inbounds [64 x i16], [64 x i16]* @ctz2.table, i64 0, i64 %idxprom
+// %0 = load i16, i16* %arrayidx, align 2, !tbaa !8
+//
+// CASE 3:
+// %sub = sub i32 0, %x
+// %and = and i32 %sub, %x
+// %mul = mul i32 %and, 81224991
+// %shr = lshr i32 %mul, 27
+// %idxprom = zext i32 %shr to i64
+// %arrayidx = getelementptr inbounds [32 x i32], [32 x i32]* @ctz3.table, i64 0, i64 %idxprom
+// %0 = load i32, i32* %arrayidx, align 4, !tbaa !8
+//
+// CASE 4:
+// %sub = sub i64 0, %x
+// %and = and i64 %sub, %x
+// %mul = mul i64 %and, 283881067100198605
+// %shr = lshr i64 %mul, 58
+// %arrayidx = getelementptr inbounds [64 x i8], [64 x i8]* @table, i64 0, i64 %shr
+// %0 = load i8, i8* %arrayidx, align 1, !tbaa !8
+//
+// All this can be lowered to @llvm.cttz.i32/64 intrinsic.
		static bool tryToRecognizeTableBasedCttz(Instruction &I) {
		LoadInst *LI = dyn_cast<LoadInst>(&I);
		dmgreenUnsubmitted Not Done Reply Inline Actions One think I forgot to mention - llvm has a code style that is best explained as "just run clang-format on the patch". These returns are all in the wrong place, for example, and could do with a cleanup. dmgreen: One think I forgot to mention - llvm has a code style that is best explained as "just run clang…
		djtodoroAuthorUnsubmitted Done Reply Inline Actions I've changed the style to `Google`, accidentally. Thanks. djtodoro: I've changed the style to `Google`, accidentally. Thanks.
		if (!LI)
		return false;

		dmgreenUnsubmitted Done Reply Inline Actions We will need to support opaque pointers nowadays. dmgreen: We will need to support opaque pointers nowadays.
		Type *ElType = LI->getPointerOperandType()->getPointerElementType();
		if (!ElType->isIntegerTy())
		dmgreenUnsubmitted Not Done Reply Inline Actions Can this always just use the Load type? dmgreen: Can this always just use the Load type?
		djtodoroAuthorUnsubmitted Done Reply Inline Actions I think yes, nothing comes up to my mind that can break it. djtodoro: I think yes, nothing comes up to my mind that can break it.
		dmgreenUnsubmitted Done Reply Inline Actions This can be changed to just the Load type then. dmgreen: This can be changed to just the Load type then.
		return false;

		GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(LI->getPointerOperand());
		if (!GEP || !GEP->isInBounds() || GEP->getNumIndices() != 2)
		return false;
		craig.topper (Not Done): The _or_null here feels overly paranoid. A load won't ever have a null pointer operand, will it? So dyn_cast should be OK. Same with the other dyn_cast_or_null. I don't think any of them should ever be null to start.
		djtodoro (Author, Done): Agree with this.

		Type *GEPPointeeType = GEP->getPointerOperandType()->getPointerElementType();
		if (!GEPPointeeType->isArrayTy())
		return false;

		uint64_t ArraySize = GEPPointeeType->getArrayNumElements();
		if (ArraySize != 32 && ArraySize != 64)
		dmgreen (Done): I think User here could be GlobalVariable.
		return false;

		User *GEPUser = dyn_cast<User>(GEP->getPointerOperand());
		if (!GEPUser)
		dmgreen (Not Done): GEPUser->getOperand(0) -> Global->getInitializer(). It is worth adding a test where the global is extern.
		return false;

		ConstantDataArray *ConstData =
		dyn_cast<ConstantDataArray>(GEPUser->getOperand(0));
		if (!ConstData)
		dmgreen (Done):
		if (!GVTable || !GVTable->hasInitializer())
		return false;
		return false;

		Value *Idx1 = GEP->idx_begin()->get();
		Constant *Zero = dyn_cast<Constant>(Idx1);
		if (!Zero || !Zero->isZeroValue())
		return false;

		Value *Idx2 = std::next(GEP->idx_begin())->get();

		bool ConstIsWide = !match(Idx2, m_ZExt(m_Value()));

		Value *X1;
		uint64_t MulConst, ShiftConst;
		spatel (Done): Could shorten this by using something like:
		if (!match(GEP->idx_begin()->get(), m_ZeroInt()))
		return false;
		// FIXME: AArch64 has i64 type for the GEP index, so this match will
		// probably fail for other targets.
		if (!match(Idx2,
		craig.topper (Not Done): Isn't it usually spelled AArch64 with 2 capital As?
		djtodoro (Author, Done): Yep.
		dmgreen (Not Done): It's probably better to just say "64bit targets" as opposed to a specific target.
		m_ZExtOrSelf(m_LShr(
		dmgreen (Not Done): It might be better to switch this logic around:
		unsigned InputBits = X1->getType()->getScalarSizeInBits();
		if (InputBits != 32 && InputBits != 64)
		return false;
		djtodoro (Author, Done): Sounds good.
		m_ZExtOrSelf(m_Mul(m_c_And(m_Neg(m_Value(X1)), m_Deferred(X1)),
		m_ConstantInt(MulConst))),
		dmgreen (Not Done): Does the extend between the lshr and the mul ever happen? From what I can tell, the type of the VT should be based on the type of these operations.
		djtodoro (Author, Done): It does not happen in all the cases.
		dmgreen (Not Done): Do you have a test case where the extend is between the shift and the mul?
		djtodoro (Author, Done): I was completely sure that I had a case for it, but I am not able to produce it actually, so I deleted it for now.
		m_ConstantInt(ShiftConst)))))
		return false;
		craig.topper (Not Done): I think you can use m_Deferred(X1) in place of m_Value(X2), but @lebedev.ri or @spatel would know better.
		dmgreen (Not Done): Log2_32_Ceil -> Log2_Ceil if we know the InputBits is a power of 2. The -1 case is for a larger table with more elements but that can handle zero values?
		djtodoro (Author, Done):
		> Log2_32_Ceil -> Log2_Ceil if we know the InputBits is a power of 2.
		Right... But you meant `Log2_64()`, right? It is a power of 2, since it is either 32 or 64, so no need to add any assert here.
		> The -1 case is for a larger table with more elements but that can handle zero values?
		int ctz2(unsigned x) {
		#define u 0
		static short table[64] = {
		32, 0, 1, 12, 2, 6, u, 13, 3, u, 7, u, u, u, u, 14,
		10, 4, u, u, 8, u, u, 25, u, u, u, u, u, 21, 27, 15,
		31, 11, 5, u, u, u, u, u, 9, u, u, 24, u, u, 20, 26,
		30, u, u, u, u, 23, u, 19, 29, u, 22, 18, 28, 17, 16, u };
		x = (x & -x) * 0x0450FBAF;
		return table[x >> 26];
		}
		dmgreen (Not Done): Ah, yeah. I meant Log2_32, but deleted the wrong part of the function name.

		unsigned InputBits = ConstIsWide ? 64 : 32;
		dmgreen (Not Done): This is true by definition now.

		// Shift should extract top 5..7 bits.
		if (ShiftConst < InputBits - 7 || ShiftConst > InputBits - 5)
		return false;
		dmgreen (Done): I believe it's the top `Bitwidth - Log2(Bitwidth)` bits.

		Type *XType = X1->getType();
		if (!XType->isIntegerTy(InputBits))
		return false;
		craig.topper (Not Done): If we're only handling 32 and 64, this comment should be 5..6
		dmgreen (Not Done): I believe it is 7 because the table can be twice the size. Hence the -1 in the formula below. See the ctz2 test.

		if (!isCTTZTable(*ConstData, MulConst, ShiftConst, InputBits))
		return false;

		dmgreen (Done): Remove this check, as it is always true as far as I can tell.
		auto ZeroTableElem = ConstData->getElementAsInteger(0);
		bool DefinedForZero = ZeroTableElem == InputBits;

		IRBuilder<> B(LI);
		ConstantInt *BoolConst = B.getInt1(!DefinedForZero);
		auto Cttz = B.CreateIntrinsic(Intrinsic::cttz, {XType}, {X1, BoolConst});
		Value *ZExtOrTrunc = nullptr;

		if (DefinedForZero) {
		ZExtOrTrunc = B.CreateZExtOrTrunc(Cttz, ElType);
		} else {
		// If the value in elem 0 isn't the same as InputBits, we still want to
		// produce the value from the table.
		auto Cmp = B.CreateICmpEQ(X1, ConstantInt::get(XType, 0));
		auto Select =
		B.CreateSelect(Cmp, ConstantInt::get(XType, ZeroTableElem), Cttz);
		craig.topper (Done): `B.getInt1(!DefinedForZero);`

		// NOTE: If the table[0] is 0, but the cttz(0) is defined by the Target
		// it should be handled as: `cttz(x) & (typeSize - 1)`.

		ZExtOrTrunc = B.CreateZExtOrTrunc(Select, ElType);
		craig.topper (Not Done): We don't need this ICmp and Select if DefinedForZero is true, right?
		djtodoro (Author, Done): Actually, we don't need it.
		}

		LI->replaceAllUsesWith(ZExtOrTrunc);

		return true;
		}
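
For reference, the source-level idiom this matcher targets can be sketched in plain C++. The sketch reuses the ctz2 reproducer from the tests (multiplier 0x0450FBAF, shift 26, a 64-entry table whose element 0 is 32, which is why a 32-bit input may use up to a 64-entry table). The names kCtz2Table, ctz2Table and referenceCttz32 are hypothetical and exist only for illustration; this is not code from the patch.

```cpp
#include <cstdint>

// Table copied from the ctz2 reproducer; unused slots are 0 (the `u` macro).
static const short kCtz2Table[64] = {
    32, 0,  1, 12, 2, 6,  0, 13, 3,  0, 7,  0,  0,  0,  0,  14,
    10, 4,  0, 0,  8, 0,  0, 25, 0,  0, 0,  0,  0,  21, 27, 15,
    31, 11, 5, 0,  0, 0,  0, 0,  9,  0, 0,  24, 0,  0,  20, 26,
    30, 0,  0, 0,  0, 23, 0, 19, 29, 0, 22, 18, 28, 17, 16, 0};

// Hypothetical name: the table-based idiom (isolate the lowest set bit,
// multiply by the magic constant, shift, then index the table).
int ctz2Table(uint32_t x) {
  uint32_t lowest = x & (0u - x);         // same as x & -x
  uint32_t hashed = lowest * 0x0450FBAFu; // wraps modulo 2^32
  return kCtz2Table[hashed >> 26];        // top 6 bits select the entry
}

// Hypothetical reference: bit-by-bit count that returns 32 for zero,
// i.e. the behaviour of llvm.cttz.i32 with is_zero_poison = false.
int referenceCttz32(uint32_t x) {
  if (x == 0)
    return 32;
  int n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}
```

Because a zero input hashes to index 0 and that slot holds 32, this particular table agrees with the reference for every input, which is the situation in which no extra compare and select is needed.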

/// This is the entry point for folds that could be implemented in regular		/// This is the entry point for folds that could be implemented in regular
/// InstCombine, but they are separated because they are not expected to		/// InstCombine, but they are separated because they are not expected to
/// occur frequently and/or have more than a constant-length pattern match.		/// occur frequently and/or have more than a constant-length pattern match.
static bool foldUnusualPatterns(Function &F, DominatorTree &DT) {		static bool foldUnusualPatterns(Function &F, DominatorTree &DT,
		TargetTransformInfo &TTI) {
bool MadeChange = false;		bool MadeChange = false;
for (BasicBlock &BB : F) {		for (BasicBlock &BB : F) {
// Ignore unreachable basic blocks.		// Ignore unreachable basic blocks.
if (!DT.isReachableFromEntry(&BB))		if (!DT.isReachableFromEntry(&BB))
continue;		continue;
// Do not delete instructions under here and invalidate the iterator.		// Do not delete instructions under here and invalidate the iterator.
// Walk the block backwards for efficiency. We're matching a chain of		// Walk the block backwards for efficiency. We're matching a chain of
// use->defs, so we're more likely to succeed by starting from the bottom.		// use->defs, so we're more likely to succeed by starting from the bottom.
// Also, we want to avoid matching partial patterns.		// Also, we want to avoid matching partial patterns.
// TODO: It would be more efficient if we removed dead instructions		// TODO: It would be more efficient if we removed dead instructions
// iteratively in this loop rather than waiting until the end.		// iteratively in this loop rather than waiting until the end.
for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {		for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {
MadeChange |= foldAnyOrAllBitsSet(I);		MadeChange |= foldAnyOrAllBitsSet(I);
MadeChange |= foldGuardedFunnelShift(I, DT);		MadeChange |= foldGuardedFunnelShift(I, DT);
MadeChange |= tryToRecognizePopCount(I);		MadeChange |= tryToRecognizePopCount(I);
		if (TTI.preferCTTZLowering())
		MadeChange |= tryToRecognizeTableBasedCttz(I);
}		}
}		}

// We're done with transforms, so remove dead instructions.		// We're done with transforms, so remove dead instructions.
if (MadeChange)		if (MadeChange)
for (BasicBlock &BB : F)		for (BasicBlock &BB : F)
SimplifyInstructionsInBlock(&BB);		SimplifyInstructionsInBlock(&BB);

return MadeChange;		return MadeChange;
}		}

/// This is the entry point for all transforms. Pass manager differences are		/// This is the entry point for all transforms. Pass manager differences are
/// handled in the callers of this function.		/// handled in the callers of this function.
static bool runImpl(Function &F, TargetLibraryInfo &TLI, DominatorTree &DT) {		static bool runImpl(Function &F, TargetLibraryInfo &TLI, DominatorTree &DT,
		TargetTransformInfo &TTI) {
bool MadeChange = false;		bool MadeChange = false;
const DataLayout &DL = F.getParent()->getDataLayout();		const DataLayout &DL = F.getParent()->getDataLayout();
TruncInstCombine TIC(TLI, DL, DT);		TruncInstCombine TIC(TLI, DL, DT);
MadeChange \|= TIC.run(F);		MadeChange \|= TIC.run(F);
MadeChange \|= foldUnusualPatterns(F, DT);		MadeChange \|= foldUnusualPatterns(F, DT, TTI);
return MadeChange;		return MadeChange;
}		}

void AggressiveInstCombinerLegacyPass::getAnalysisUsage(		void AggressiveInstCombinerLegacyPass::getAnalysisUsage(
AnalysisUsage &AU) const {		AnalysisUsage &AU) const {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
		AU.addRequired<TargetTransformInfoWrapperPass>();
		dmgreen (Done): The TTI's can be removed now (and if you rebase they may already be present, but still are not needed by the new code any more).
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
AU.addPreserved<AAResultsWrapperPass>();		AU.addPreserved<AAResultsWrapperPass>();
AU.addPreserved<BasicAAWrapperPass>();		AU.addPreserved<BasicAAWrapperPass>();
AU.addPreserved<DominatorTreeWrapperPass>();		AU.addPreserved<DominatorTreeWrapperPass>();
AU.addPreserved<GlobalsAAWrapperPass>();		AU.addPreserved<GlobalsAAWrapperPass>();
}		}

bool AggressiveInstCombinerLegacyPass::runOnFunction(Function &F) {		bool AggressiveInstCombinerLegacyPass::runOnFunction(Function &F) {
auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);		auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();		auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
return runImpl(F, TLI, DT);		auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
		return runImpl(F, TLI, DT, TTI);
}		}

PreservedAnalyses AggressiveInstCombinePass::run(Function &F,		PreservedAnalyses AggressiveInstCombinePass::run(Function &F,
FunctionAnalysisManager &AM) {		FunctionAnalysisManager &AM) {
auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);		auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);
auto &DT = AM.getResult<DominatorTreeAnalysis>(F);		auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
if (!runImpl(F, TLI, DT)) {		auto &TTI = AM.getResult<TargetIRAnalysis>(F);
		if (!runImpl(F, TLI, DT, TTI)) {
// No changes, all analyses are preserved.		// No changes, all analyses are preserved.
return PreservedAnalyses::all();		return PreservedAnalyses::all();
}		}
// Mark all the analyses that instcombine updates as preserved.		// Mark all the analyses that instcombine updates as preserved.
PreservedAnalyses PA;		PreservedAnalyses PA;
PA.preserveSet<CFGAnalyses>();		PA.preserveSet<CFGAnalyses>();
return PA;		return PA;
}		}

char AggressiveInstCombinerLegacyPass::ID = 0;		char AggressiveInstCombinerLegacyPass::ID = 0;
INITIALIZE_PASS_BEGIN(AggressiveInstCombinerLegacyPass,		INITIALIZE_PASS_BEGIN(AggressiveInstCombinerLegacyPass,
"aggressive-instcombine",		"aggressive-instcombine",
"Combine pattern based expressions", false, false)		"Combine pattern based expressions", false, false)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_END(AggressiveInstCombinerLegacyPass, "aggressive-instcombine",		INITIALIZE_PASS_END(AggressiveInstCombinerLegacyPass, "aggressive-instcombine",
"Combine pattern based expressions", false, false)		"Combine pattern based expressions", false, false)

// Initialization Routines		// Initialization Routines
void llvm::initializeAggressiveInstCombine(PassRegistry &Registry) {		void llvm::initializeAggressiveInstCombine(PassRegistry &Registry) {
initializeAggressiveInstCombinerLegacyPassPass(Registry);		initializeAggressiveInstCombinerLegacyPassPass(Registry);
}		}
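
The call to isCTTZTable in tryToRecognizeTableBasedCttz is what decides whether a constant array really encodes a count-trailing-zeros lookup. Its body is not shown in this part of the diff, so the following is only a sketch of one plausible check, assuming the table must map the hashed index of every single-bit input back to that bit's position. looksLikeCttzTable and shiftInRange are hypothetical names; the second helper simply restates the "top 5..7 bits" condition from the matcher.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: restates the shift-range condition
// `ShiftConst < InputBits - 7 || ShiftConst > InputBits - 5` as a predicate.
bool shiftInRange(unsigned shift, unsigned inputBits) {
  return shift >= inputBits - 7 && shift <= inputBits - 5;
}

// Hypothetical helper, not the patch's isCTTZTable: returns true when every
// single-bit input (1 << bit) hashes to a slot that stores `bit`.
bool looksLikeCttzTable(const std::vector<uint64_t> &table, uint64_t mul,
                        unsigned shift, unsigned inputBits) {
  if (inputBits != 32 && inputBits != 64)
    return false;
  if (shift >= inputBits)
    return false;
  // Emulate wrap-around multiplication at the input width.
  const uint64_t mask = inputBits == 64 ? ~0ULL : ((1ULL << inputBits) - 1);
  for (unsigned bit = 0; bit < inputBits; ++bit) {
    uint64_t isolated = 1ULL << bit;
    uint64_t index = ((isolated * mul) & mask) >> shift;
    if (index >= table.size() || table[index] != bit)
      return false;
  }
  return true;
}
```

Checking only the single-bit inputs is enough for this idiom, because `x & -x` always reduces the input to one set bit (or zero) before the multiply.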

llvm/test/Transforms/AggressiveInstCombine/AARCH64/dereferencing-pointer.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -aggressive-instcombine -mtriple aarch64-linux-gnu -S < %s | FileCheck %s

				;; static const unsigned long long magic = 0x03f08c5392f756cdULL;
				;;
				;; static const int table[64] = {
				;; 0, 1, 12, 2, 13, 22, 17, 3,
				;; 14, 33, 23, 36, 18, 58, 28, 4,
				;; 62, 15, 34, 26, 24, 48, 50, 37,
				;; 19, 55, 59, 52, 29, 44, 39, 5,
				;; 63, 11, 21, 16, 32, 35, 57, 27,
				;; 61, 25, 47, 49, 54, 51, 43, 38,
				;; 10, 20, 31, 56, 60, 46, 53, 42,
				;; 9, 30, 45, 41, 8, 40, 7, 6,
				;; };
				;;
				;; int ctz6 (unsigned long long * const b) {
				;; return table[(((*b) & -(*b)) * magic) >> 58];
				;; }

				; ModuleID = 'test.c'
				source_filename = "test.c"
				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-unknown-linux-gnu"

				@table = internal unnamed_addr constant [64 x i32] [i32 0, i32 1, i32 12, i32 2, i32 13, i32 22, i32 17, i32 3, i32 14, i32 33, i32 23, i32 36, i32 18, i32 58, i32 28, i32 4, i32 62, i32 15, i32 34, i32 26, i32 24, i32 48, i32 50, i32 37, i32 19, i32 55, i32 59, i32 52, i32 29, i32 44, i32 39, i32 5, i32 63, i32 11, i32 21, i32 16, i32 32, i32 35, i32 57, i32 27, i32 61, i32 25, i32 47, i32 49, i32 54, i32 51, i32 43, i32 38, i32 10, i32 20, i32 31, i32 56, i32 60, i32 46, i32 53, i32 42, i32 9, i32 30, i32 45, i32 41, i32 8, i32 40, i32 7, i32 6], align 4

				; Function Attrs: mustprogress nofree norecurse nosync nounwind readonly uwtable willreturn
				define dso_local i32 @ctz6(i64* nocapture readonly %b) {
				; CHECK-LABEL: @ctz6(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = load i64, i64* [[B:%.*]], align 8
				; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.cttz.i64(i64 [[TMP0]], i1 true)
				; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP0]], 0
				; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP2]], i64 0, i64 [[TMP1]]
				; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
				; CHECK-NEXT: ret i32 [[TMP4]]
				;
				entry:
				%0 = load i64, i64* %b, align 8
				%sub = sub i64 0, %0
				%and = and i64 %0, %sub
				%mul = mul i64 %and, 283881067100198605
				%shr = lshr i64 %mul, 58
				%arrayidx = getelementptr inbounds [64 x i32], [64 x i32]* @table, i64 0, i64 %shr
				%1 = load i32, i32* %arrayidx, align 4
				ret i32 %1
				}

				!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}
				!llvm.ident = !{!7}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{i32 1, !"branch-target-enforcement", i32 0}
				!2 = !{i32 1, !"sign-return-address", i32 0}
				!3 = !{i32 1, !"sign-return-address-all", i32 0}
				!4 = !{i32 1, !"sign-return-address-with-bkey", i32 0}
				!5 = !{i32 7, !"uwtable", i32 1}
				!6 = !{i32 7, !"frame-pointer", i32 1}
				!7 = !{!"clang version 14.0.0"}
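
The CHECK lines of this test say that the table lookup on a loaded value becomes `select(x == 0, 0, cttz(x))` truncated to i32. A small C++ sketch of that equivalence, using the table and magic constant from the reproducer at the top of the file; ctz6Table, ctz6Lowered and countTrailingZeros64 are hypothetical names, and the loop-based counter is only a portable stand-in for llvm.cttz.i64 on non-zero inputs.

```cpp
#include <cstdint>

static const uint64_t kMagic = 0x03f08c5392f756cdULL;

static const int kTable[64] = {
    0,  1,  12, 2,  13, 22, 17, 3,  14, 33, 23, 36, 18, 58, 28, 4,
    62, 15, 34, 26, 24, 48, 50, 37, 19, 55, 59, 52, 29, 44, 39, 5,
    63, 11, 21, 16, 32, 35, 57, 27, 61, 25, 47, 49, 54, 51, 43, 38,
    10, 20, 31, 56, 60, 46, 53, 42, 9,  30, 45, 41, 8,  40, 7,  6};

// Original idiom: load through the pointer, isolate the lowest set bit,
// multiply, shift, index.
int ctz6Table(const uint64_t *b) {
  uint64_t x = *b;
  return kTable[((x & (~x + 1)) * kMagic) >> 58];
}

// Stand-in for llvm.cttz.i64; only called with a non-zero argument.
unsigned countTrailingZeros64(uint64_t x) {
  unsigned n = 0;
  while ((x & 1) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Shape of the lowered IR: select(x == 0, table[0], cttz(x)), then trunc.
int ctz6Lowered(const uint64_t *b) {
  uint64_t x = *b;
  uint64_t r = (x == 0) ? 0 : countTrailingZeros64(x);
  return static_cast<int>(r);
}
```

The select is needed here because element 0 of the table is 0 rather than 64, so the table's answer for a zero input differs from what a zero-defined cttz would return.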

llvm/test/Transforms/AggressiveInstCombine/AARCH64/lit.local.cfg

This file was added.

				if not 'AArch64' in config.root.targets:
				    config.unsupported = True

llvm/test/Transforms/AggressiveInstCombine/AARCH64/lower-table-based-ctz-basics.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				xbolva00 (Not Done): AggressiveInstCombine/AARCH64 -> AggressiveInstCombine/AArch64
				; RUN: opt -aggressive-instcombine -mtriple aarch64-linux-gnu -S < %s | FileCheck %s

				;; C reproducers:
				;; int ctz1 (unsigned x)
				;; {
				;; static const char table[32] =
				;; {
				;; 0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
				;; 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
				;; };
				;; return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
				;; }

				;; int ctz2(unsigned x)
				;; {
				;; #define u 0
				;; static short table[64] =
				;; {
				;; 32, 0, 1, 12, 2, 6, u, 13, 3, u, 7, u, u, u, u, 14,
				;; 10, 4, u, u, 8, u, u, 25, u, u, u, u, u, 21, 27, 15,
				;; 31, 11, 5, u, u, u, u, u, 9, u, u, 24, u, u, 20, 26,
				;; 30, u, u, u, u, 23, u, 19, 29, u, 22, 18, 28, 17, 16, u
				;; };
				;; x = (x & -x) * 0x0450FBAF;
				;; return table[x >> 26];
				;; }

				;; int ctz3(unsigned x)
				;;{
				;; static int table[32] =
				;; {
				;; 0, 1, 2, 24, 3, 19, 6, 25, 22, 4, 20, 10, 16, 7, 12, 26,
				;; 31, 23, 18, 5, 21, 9, 15, 11, 30, 17, 8, 14, 29, 13, 28, 27
				;; };
				;; if (x == 0) return 32;
				;; x = (x & -x) * 0x04D7651F;
				;; return table[x >> 27];
				;; }

				;; static const unsigned long long magic = 0x03f08c5392f756cdULL;
				;;
				;; static const int table[64] = {
				;; 0, 1, 12, 2, 13, 22, 17, 3, 14, 33, 23, 36, 18, 58, 28, 4,
				;; 62, 15, 34, 26, 24, 48, 50, 37, 19, 55, 59, 52, 29, 44, 39, 5,
				;; 63, 11, 21, 16, 32, 35, 57, 27, 61, 25, 47, 49, 54, 51, 43, 38,
				;; 10, 20, 31, 56, 60, 46, 53, 42, 9, 30, 45, 41, 8, 40, 7, 6,
				;; };
				;;
				;; int ctz4 (unsigned long long b)
				;; {
				;; unsigned long long lsb = b & -b;
				;; return table[(lsb * magic) >> 58];
				;; }
				;;
				;; int ctz5(unsigned x)
				;; {
				;; static char table[32] =
				;; {
				;; 0, 1, 2, 24, 3, 19, 6, 25, 22, 4, 20, 10, 16, 7, 12, 26,
				;; 31, 23, 18, 5, 21, 9, 15, 11, 30, 17, 8, 14, 29, 13, 28, 27
				;; };
				;; x = (x & -x)*0x04D7651F;
				;; return table[x >> 27];
				;; }

				;; int indexes[] = {
				;; 63, 0, 58, 1, 59, 47, 53, 2,60, 39, 48, 27, 54, 33, 42, 3,
				;; 61, 51, 37, 40, 49, 18, 28, 20, 55, 30, 34, 11, 43, 14, 22, 4,
				;; 62, 57, 46, 52, 38, 26, 32, 41, 50, 36, 17, 19, 29, 10, 13, 21,
				;; 56, 45, 25, 31, 35, 16, 9, 12, 44, 24, 15, 8, 23, 7, 6, 5
				;; };
				;;
				;; int ctz6(unsigned long n)
				;; {
				;; return indexes[((n & (~n + 1)) * 0x07EDD5E59A4E28C2ull) >> 58];
				;; }
				;;
				;; int ctz7(unsigned x)
				;; {
				;; static const char table[32] = "\x00\x01\x1c\x02\x1d\x0e\x18\x03\x1e\x16\x14"
				;; "\x0f\x19\x11\x04\b\x1f\x1b\r\x17\x15\x13\x10\x07\x1a\f\x12\x06\v\x05\n\t";
				;; return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
				;; }
				;;
				;; int ctz8(unsigned v)
				;; {
				;; static const int table[] =
				;; {
				;; 31, 0, 1, 23, 2, 18, 5, 24, 21, 3, 19, 9, 15, 6, 11, 25,
				;; 30, 22, 17, 4, 20, 8, 14, 10, 29, 16, 7, 13, 28, 12, 27, 26
				;; };
				;; unsigned x =(-v & v);
				;; return table[(unsigned)(x * 0x9AECA3EU) >> 27];
				;; }

				; ModuleID = 'ctz.c'
				source_filename = "ctz.c"
				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-unknown-linux-gnu"

				@ctz7.table = internal unnamed_addr constant [32 x i8] c"\00\01\1C\02\1D\0E\18\03\1E\16\14\0F\19\11\04\08\1F\1B\0D\17\15\13\10\07\1A\0C\12\06\0B\05\0A\09", align 1

				define i32 @ctz1(i32 %x) {
				; CHECK-LABEL: @ctz1(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.cttz.i32(i32 [[X:%.*]], i1 true)
				; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[X]], 0
				; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 0, i32 [[TMP0]]
				; CHECK-NEXT: [[TMP3:%.*]] = trunc i32 [[TMP2]] to i8
				; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP3]] to i32
				craig.topper (Done): Can we put the tables adjacent to the function that uses them and use the update_test_checks.py script to generate the checks? I think that will make it easy to review each test case in isolation without scrolling around the file.
				; CHECK-NEXT: ret i32 [[CONV]]
				;
				entry:
				%sub = sub i32 0, %x
				%and = and i32 %sub, %x
				%mul = mul i32 %and, 125613361
				%shr = lshr i32 %mul, 27
				%idxprom = zext i32 %shr to i64
				%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @ctz7.table, i64 0, i64 %idxprom
				%0 = load i8, i8* %arrayidx, align 1
				%conv = zext i8 %0 to i32
				ret i32 %conv
				}

				define i32 @ctz7(i32 %x) {
				; CHECK-LABEL: @ctz7(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.cttz.i32(i32 [[X:%.*]], i1 true)
				; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[X]], 0
				; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 0, i32 [[TMP0]]
				; CHECK-NEXT: [[TMP3:%.*]] = trunc i32 [[TMP2]] to i8
				; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP3]] to i32
				craig.topper (Done): Drop the Function Attrs comments.
				; CHECK-NEXT: ret i32 [[CONV]]
				craig.topper (Done): Drop dso_local and local_unnamed_addr
				;
				entry:
				%sub = sub i32 0, %x
				%and = and i32 %sub, %x
				%mul = mul i32 %and, 125613361
				%shr = lshr i32 %mul, 27
				%idxprom = zext i32 %shr to i64
				%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @ctz7.table, i64 0, i64 %idxprom
				%0 = load i8, i8* %arrayidx, align 1
				%conv = zext i8 %0 to i32
				ret i32 %conv
				}

				@ctz2.table = internal unnamed_addr constant [64 x i16] [i16 32, i16 0, i16 1, i16 12, i16 2, i16 6, i16 0, i16 13, i16 3, i16 0, i16 7, i16 0, i16 0, i16 0, i16 0, i16 14, i16 10, i16 4, i16 0, i16 0, i16 8, i16 0, i16 0, i16 25, i16 0, i16 0, i16 0, i16 0, i16 0, i16 21, i16 27, i16 15, i16 31, i16 11, i16 5, i16 0, i16 0, i16 0, i16 0, i16 0, i16 9, i16 0, i16 0, i16 24, i16 0, i16 0, i16 20, i16 26, i16 30, i16 0, i16 0, i16 0, i16 0, i16 23, i16 0, i16 19, i16 29, i16 0, i16 22, i16 18, i16 28, i16 17, i16 16, i16 0], align 2

				define i32 @ctz2(i32 %x) {
				; CHECK-LABEL: @ctz2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.cttz.i32(i32 [[X:%.*]], i1 false)
				; CHECK-NEXT: [[TMP1:%.*]] = trunc i32 [[TMP0]] to i16
				; CHECK-NEXT: [[CONV:%.*]] = sext i16 [[TMP1]] to i32
				; CHECK-NEXT: ret i32 [[CONV]]
				;
				entry:
				%sub = sub i32 0, %x
				%and = and i32 %sub, %x
				%mul = mul i32 %and, 72416175
				%shr = lshr i32 %mul, 26
				%idxprom = zext i32 %shr to i64
				%arrayidx = getelementptr inbounds [64 x i16], [64 x i16]* @ctz2.table, i64 0, i64 %idxprom
				%0 = load i16, i16* %arrayidx, align 2
				%conv = sext i16 %0 to i32
				ret i32 %conv
				}

				@ctz3.table = internal unnamed_addr constant [32 x i32] [i32 0, i32 1, i32 2, i32 24, i32 3, i32 19, i32 6, i32 25, i32 22, i32 4, i32 20, i32 10, i32 16, i32 7, i32 12, i32 26, i32 31, i32 23, i32 18, i32 5, i32 21, i32 9, i32 15, i32 11, i32 30, i32 17, i32 8, i32 14, i32 29, i32 13, i32 28, i32 27], align 4

				define i32 @ctz3(i32 %x) {
				; CHECK-LABEL: @ctz3(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[X:%.*]], 0
				; CHECK-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[IF_END:%.*]]
				; CHECK: if.end:
				; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.cttz.i32(i32 [[X]], i1 true)
				; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[X]], 0
				; CHECK-NEXT: br label [[RETURN]]
				; CHECK: return:
				; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i32 [ [[TMP0]], [[IF_END]] ], [ 32, [[ENTRY:%.*]] ]
				; CHECK-NEXT: ret i32 [[RETVAL_0]]
				;
				entry:
				%cmp = icmp eq i32 %x, 0
				br i1 %cmp, label %return, label %if.end

				if.end: ; preds = %entry
				%sub = sub i32 0, %x
				%and = and i32 %sub, %x
				%mul = mul i32 %and, 81224991
				%shr = lshr i32 %mul, 27
				%idxprom = zext i32 %shr to i64
				%arrayidx = getelementptr inbounds [32 x i32], [32 x i32]* @ctz3.table, i64 0, i64 %idxprom
				%0 = load i32, i32* %arrayidx, align 4
				br label %return

				return: ; preds = %entry, %if.end
				%retval.0 = phi i32 [ %0, %if.end ], [ 32, %entry ]
				ret i32 %retval.0
				}

				@table = internal unnamed_addr constant [64 x i32] [i32 0, i32 1, i32 12, i32 2, i32 13, i32 22, i32 17, i32 3, i32 14, i32 33, i32 23, i32 36, i32 18, i32 58, i32 28, i32 4, i32 62, i32 15, i32 34, i32 26, i32 24, i32 48, i32 50, i32 37, i32 19, i32 55, i32 59, i32 52, i32 29, i32 44, i32 39, i32 5, i32 63, i32 11, i32 21, i32 16, i32 32, i32 35, i32 57, i32 27, i32 61, i32 25, i32 47, i32 49, i32 54, i32 51, i32 43, i32 38, i32 10, i32 20, i32 31, i32 56, i32 60, i32 46, i32 53, i32 42, i32 9, i32 30, i32 45, i32 41, i32 8, i32 40, i32 7, i32 6], align 4

				define i32 @ctz4(i64 %b) {
				; CHECK-LABEL: @ctz4(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.cttz.i64(i64 [[B:%.*]], i1 true)
				; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i64 [[B]], 0
				; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i64 0, i64 [[TMP0]]
				; CHECK-NEXT: [[TMP3:%.*]] = trunc i64 [[TMP2]] to i32
				; CHECK-NEXT: ret i32 [[TMP3]]
				;
				entry:
				%sub = sub i64 0, %b
				%and = and i64 %sub, %b
				%mul = mul i64 %and, 283881067100198605
				%shr = lshr i64 %mul, 58
				%arrayidx = getelementptr inbounds [64 x i32], [64 x i32]* @table, i64 0, i64 %shr
				%0 = load i32, i32* %arrayidx, align 4
				ret i32 %0
				}

				@ctz5.table = internal unnamed_addr constant [32 x i8] c"\00\01\02\18\03\13\06\19\16\04\14\0A\10\07\0C\1A\1F\17\12\05\15\09\0F\0B\1E\11\08\0E\1D\0D\1C\1B", align 1

				define i32 @ctz5(i32 %x) {
				; CHECK-LABEL: @ctz5(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.cttz.i32(i32 [[X:%.*]], i1 true)
				; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[X]], 0
				; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 0, i32 [[TMP0]]
				; CHECK-NEXT: [[TMP3:%.*]] = trunc i32 [[TMP2]] to i8
				craig.topper (Done): Are these attributes needed?
				; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP3]] to i32
				; CHECK-NEXT: ret i32 [[CONV]]
				;
				entry:
				%sub = sub i32 0, %x
				%and = and i32 %sub, %x
				%mul = mul i32 %and, 81224991
				%shr = lshr i32 %mul, 27
				%idxprom = zext i32 %shr to i64
				%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @ctz5.table, i64 0, i64 %idxprom
				%0 = load i8, i8* %arrayidx, align 1
				%conv = zext i8 %0 to i32
				ret i32 %conv
				}

				@ctz6.table = global [64 x i32] [i32 63, i32 0, i32 58, i32 1, i32 59, i32 47, i32 53, i32 2, i32 60, i32 39, i32 48, i32 27, i32 54, i32 33, i32 42, i32 3, i32 61, i32 51, i32 37, i32 40, i32 49, i32 18, i32 28, i32 20, i32 55, i32 30, i32 34, i32 11, i32 43, i32 14, i32 22, i32 4, i32 62, i32 57, i32 46, i32 52, i32 38, i32 26, i32 32, i32 41, i32 50, i32 36, i32 17, i32 19, i32 29, i32 10, i32 13, i32 21, i32 56, i32 45, i32 25, i32 31, i32 35, i32 16, i32 9, i32 12, i32 44, i32 24, i32 15, i32 8, i32 23, i32 7, i32 6, i32 5], align 4
				craig.topper (Done): Most of this metadata isn't needed.

				define i32 @ctz6(i64 %n) {
				; CHECK-LABEL: @ctz6(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.cttz.i64(i64 [[N:%.*]], i1 true)
				; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i64 [[N]], 0
				; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i64 63, i64 [[TMP0]]
				; CHECK-NEXT: [[TMP3:%.*]] = trunc i64 [[TMP2]] to i32
				; CHECK-NEXT: ret i32 [[TMP3]]
				;
				entry:
				%add = sub i64 0, %n
				%and = and i64 %add, %n
				%mul = mul i64 %and, 571347909858961602
				%shr = lshr i64 %mul, 58
				%arrayidx = getelementptr inbounds [64 x i32], [64 x i32]* @ctz6.table, i64 0, i64 %shr
				%0 = load i32, i32* %arrayidx, align 4
				ret i32 %0
				}

				@ctz8.table = internal unnamed_addr constant [32 x i32] [i32 31, i32 0, i32 1, i32 23, i32 2, i32 18, i32 5, i32 24, i32 21, i32 3, i32 19, i32 9, i32 15, i32 6, i32 11, i32 25, i32 30, i32 22, i32 17, i32 4, i32 20, i32 8, i32 14, i32 10, i32 29, i32 16, i32 7, i32 13, i32 28, i32 12, i32 27, i32 26], align 4

				define i32 @ctz8(i32 %v) {
				; CHECK-LABEL: @ctz8(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.cttz.i32(i32 [[V:%.*]], i1 true)
				; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[V]], 0
				; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 31, i32 [[TMP0]]
				; CHECK-NEXT: ret i32 [[TMP2]]
				;
				entry:
				%sub = sub i32 0, %v
				%and = and i32 %sub, %v
				%mul = mul i32 %and, 162449982
				%shr = lshr i32 %mul, 27
				%idxprom = zext i32 %shr to i64
				%arrayidx = getelementptr inbounds [32 x i32], [32 x i32]* @ctz8.table, i64 0, i64 %idxprom
				%0 = load i32, i32* %arrayidx, align 4
				ret i32 %0
				}

				!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}
				!llvm.ident = !{!7}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{i32 1, !"branch-target-enforcement", i32 0}
				!2 = !{i32 1, !"sign-return-address", i32 0}
				!3 = !{i32 1, !"sign-return-address-all", i32 0}
				!4 = !{i32 1, !"sign-return-address-with-bkey", i32 0}
				!5 = !{i32 7, !"uwtable", i32 1}
				!6 = !{i32 7, !"frame-pointer", i32 1}
				!7 = !{!"clang version 14.0.0"}

llvm/test/Transforms/AggressiveInstCombine/AARCH64/lower-table-based-ctz.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -aggressive-instcombine -mtriple aarch64-linux-gnu -S < %s | FileCheck %s

				; ModuleID = 'test.c'
				source_filename = "test.c"
				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-unknown-linux"

				@f.table = internal unnamed_addr constant [32 x i8] c"\00\01\1C\02\1D\0E\18\03\1E\16\14\0F\19\11\04\08\1F\1B\0D\17\15\13\10\07\1A\0C\12\06\0B\05\0A\09", align 1

				define i32 @f(i32 %x) {
				; CHECK-LABEL: @f(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.cttz.i32(i32 [[X:%.*]], i1 true)
				; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[X]], 0
				; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 0, i32 [[TMP0]]
				; CHECK-NEXT: [[TMP3:%.*]] = trunc i32 [[TMP2]] to i8
				; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP3]] to i32
				; CHECK-NEXT: ret i32 [[CONV]]
				;
				entry:
				%sub = sub i32 0, %x
				%and = and i32 %sub, %x
				%mul = mul i32 %and, 125613361
				%shr = lshr i32 %mul, 27
				%idxprom = zext i32 %shr to i64
				%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @f.table, i64 0, i64 %idxprom
				%0 = load i8, i8* %arrayidx, align 1
				%conv = zext i8 %0 to i32
				ret i32 %conv
				}

				!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}
				!llvm.ident = !{!7}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{i32 1, !"branch-target-enforcement", i32 0}
				!2 = !{i32 1, !"sign-return-address", i32 0}
				!3 = !{i32 1, !"sign-return-address-all", i32 0}
				!4 = !{i32 1, !"sign-return-address-with-bkey", i32 0}
				!5 = !{i32 7, !"uwtable", i32 1}
				!6 = !{i32 7, !"frame-pointer", i32 1}
				!7 = !{!"clang version 14.0.0"}

llvm/test/Transforms/AggressiveInstCombine/AARCH64/non-argument-value.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -aggressive-instcombine -mtriple aarch64-linux-gnu -S < %s | FileCheck %s

				;; C reproducers:
				;; #include "stdio.h"
				;; unsigned x;
				;;
				;; int globalVar ()
				;; {
				;; static const char table[32] =
				;; {
				;; 0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
				;; 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
				;; };
				;; return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
				;; }
				;;
				;; int localVar ()
				;; {
				;; unsigned x;
				;; scanf("%u", &x);
				;; static const char table[32] =
				;; {
				;; 0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
				;; 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
				;; };
				;; return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
				;; }

				; ModuleID = 'x_not_arg.c'
				source_filename = "x_not_arg.c"
				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-unknown-linux-gnu"

				@x = global i32 0, align 4
				@.str = private constant [3 x i8] c"%u\00", align 1
				@localVar.table = internal constant [32 x i8] c"\00\01\1C\02\1D\0E\18\03\1E\16\14\0F\19\11\04\08\1F\1B\0D\17\15\13\10\07\1A\0C\12\06\0B\05\0A\09", align 1

				define i32 @globalVar() {
				; CHECK-LABEL: @globalVar(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @x, align 4
				; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.cttz.i32(i32 [[TMP0]], i1 true)
				; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i32 [[TMP0]], 0
				; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP2]], i32 0, i32 [[TMP1]]
				; CHECK-NEXT: [[TMP4:%.*]] = trunc i32 [[TMP3]] to i8
				; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP4]] to i32
				; CHECK-NEXT: ret i32 [[CONV]]
				;
				entry:
				%0 = load i32, i32* @x, align 4
				%sub = sub i32 0, %0
				%and = and i32 %0, %sub
				%mul = mul i32 %and, 125613361
				%shr = lshr i32 %mul, 27
				%idxprom = zext i32 %shr to i64
				%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @localVar.table, i64 0, i64 %idxprom
				%1 = load i8, i8* %arrayidx, align 1
				%conv = zext i8 %1 to i32
				ret i32 %conv
				}

				define i32 @localVar() {
				; CHECK-LABEL: @localVar(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[X:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[X]] to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull [[TMP0]])
				; CHECK-NEXT: [[CALL:%.*]] = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* nonnull [[X]])
				; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[X]], align 4
				; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.cttz.i32(i32 [[TMP1]], i1 true)
				; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i32 [[TMP1]], 0
				; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], i32 0, i32 [[TMP2]]
				; CHECK-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8
				; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP5]] to i32
				; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull [[TMP0]])
				; CHECK-NEXT: ret i32 [[CONV]]
				;
				entry:
				%x = alloca i32, align 4
				%0 = bitcast i32* %x to i8*
				call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0)
				%call = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* nonnull %x)
				%1 = load i32, i32* %x, align 4
				%sub = sub i32 0, %1
				%and = and i32 %1, %sub
				%mul = mul i32 %and, 125613361
				%shr = lshr i32 %mul, 27
				%idxprom = zext i32 %shr to i64
				%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @localVar.table, i64 0, i64 %idxprom
				%2 = load i8, i8* %arrayidx, align 1
				%conv = zext i8 %2 to i32
				call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0)
				ret i32 %conv
				}

				declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture)
				declare noundef i32 @__isoc99_scanf(i8* nocapture noundef readonly, ...)
				declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture)

				!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}
				!llvm.ident = !{!7}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{i32 1, !"branch-target-enforcement", i32 0}
				!2 = !{i32 1, !"sign-return-address", i32 0}
				!3 = !{i32 1, !"sign-return-address-all", i32 0}
				!4 = !{i32 1, !"sign-return-address-with-bkey", i32 0}
				!5 = !{i32 7, !"uwtable", i32 1}
				!6 = !{i32 7, !"frame-pointer", i32 1}
				!7 = !{!"clang version 14.0.0"}

llvm/test/Transforms/AggressiveInstCombine/AARCH64/zero-element.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -aggressive-instcombine -mtriple aarch64-linux-gnu -S < %s | FileCheck %s

				; ModuleID = 'handle-zero-element.c'
				source_filename = "handle-zero-element.c"
				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-unknown-linux-gnu"

				@ctz1.table = internal constant [32 x i8] c"\00\01\1C\02\1D\0E\18\03\1E\16\14\0F\19\11\04\08\1F\1B\0D\17\15\13\10\07\1A\0C\12\06\0B\05\0A\09", align 1

				; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn
				define i32 @ctz1(i32 %x) {
				; CHECK-LABEL: @ctz1(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.cttz.i32(i32 [[X:%.*]], i1 true)
				; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[X]], 0
				; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 0, i32 [[TMP0]]
				; CHECK-NEXT: [[TMP3:%.*]] = trunc i32 [[TMP2]] to i8
				; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP3]] to i32
				; CHECK-NEXT: ret i32 [[CONV]]
				;
				entry:
				%sub = sub i32 0, %x
				%and = and i32 %sub, %x
				%mul = mul i32 %and, 125613361
				%shr = lshr i32 %mul, 27
				%idxprom = zext i32 %shr to i64
				%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @ctz1.table, i64 0, i64 %idxprom
				%0 = load i8, i8* %arrayidx, align 1
				%conv = zext i8 %0 to i32
				ret i32 %conv
				}

				!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}
				!llvm.ident = !{!7}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{i32 1, !"branch-target-enforcement", i32 0}
				!2 = !{i32 1, !"sign-return-address", i32 0}
				!3 = !{i32 1, !"sign-return-address-all", i32 0}
				!4 = !{i32 1, !"sign-return-address-with-bkey", i32 0}
				!5 = !{i32 7, !"uwtable", i32 1}
				!6 = !{i32 7, !"frame-pointer", i32 1}
				!7 = !{!"clang version 14.0.0"}
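
The CHECK lines in the tests above pair `llvm.cttz` (called with `i1 true`, so a zero input has no defined result) with a `select` that returns the table's element 0 when the input is zero. A minimal C sketch of why that is the expected lowering, using the table and multiplier from the C reproducer in non-argument-value.ll; the function names `ctz_table`, `ctz_lowered`, and `self_check` are hypothetical and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* De Bruijn lookup table, copied from the C reproducer in the tests. */
static const unsigned char table[32] = {
    0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
    31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9
};

/* Table-based count of trailing zeros: isolate the lowest set bit,
   multiply by the De Bruijn constant, use the top 5 bits as an index. */
static unsigned ctz_table(uint32_t x) {
    uint32_t lowest = x & (0u - x);
    return table[(uint32_t)(lowest * 0x077CB531u) >> 27];
}

/* Model of the lowered form (an assumption for illustration):
   x == 0 ? table[0] : cttz(x), with cttz done by a plain loop. */
static unsigned ctz_lowered(uint32_t x) {
    if (x == 0)
        return table[0]; /* mirrors the select on icmp eq x, 0 */
    unsigned n = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++n;
    }
    return n;
}

/* Compare both forms on zero and on every single-bit input. */
static void self_check(void) {
    assert(ctz_table(0) == ctz_lowered(0));
    for (unsigned bit = 0; bit < 32; ++bit) {
        uint32_t x = (uint32_t)1 << bit;
        assert(ctz_table(x) == bit);
        assert(ctz_table(x | 0x80000000u) == ctz_lowered(x | 0x80000000u));
    }
}
```

For a zero input the index expression evaluates to 0, so the table form returns `table[0]`. This is why the tests with this table expect `select ... i32 0`, while the `ctz6` and `ctz8` tests expect 63 and 31, the first elements of their tables.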
