[TargetLowering] Optimize expanded SRL/SHL fed into SETCC ne/eq 0
Needs Review · Public

Authored by fzhinkin on Oct 11 2021, 3:12 AM.

Details

Summary

During legalization, SHL/SRL nodes may be expanded into an expression
that rotates the low/high part of the original input and then ORs the
rotated part with the other part. If the result of this operation is
compared for equality/inequality with zero, the rotation can be
removed, as it does not affect the comparison result.

Bug report: https://bugs.llvm.org/show_bug.cgi?id=50197
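
To illustrate the idea, here is a minimal standalone sketch in plain C++ (not the patch itself; the function names are made up) for an i64 value legalized into two i32 halves and shifted left by 0 < C < 32:

#include <cstdint>

// Expansion of ((Hi:Lo) << C) != 0 compares the OR of both result halves.
bool expandedShlNeZero(uint32_t Lo, uint32_t Hi, unsigned C) {
  uint32_t ResLo = Lo << C;
  uint32_t ResHi = (Hi << C) | (Lo >> (32 - C));
  return (ResLo | ResHi) != 0;
}

// (Lo << C) | (Lo >> (32 - C)) is rotl(Lo, C), which is zero exactly when
// Lo is zero, so the rotation can be dropped from the comparison.
bool optimizedShlNeZero(uint32_t Lo, uint32_t Hi, unsigned C) {
  return (Lo | (Hi << C)) != 0;
}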

Diff Detail

Event Timeline

fzhinkin created this revision.Oct 11 2021, 3:12 AM
fzhinkin requested review of this revision.Oct 11 2021, 3:12 AM
Herald added a project: Restricted Project.Oct 11 2021, 3:12 AM
fzhinkin retitled this revision from [TargetLowering] Optimize expanded SRL/SHL feeded into SETCC ne/eq 0 to [TargetLowering] Optimize expanded SRL/SHL fed into SETCC ne/eq 0.Oct 11 2021, 3:15 AM
fzhinkin added reviewers: spatel, lebedev.ri, RKSimon.
craig.topper added inline comments.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
3411

Please use SDValue. LLVM is pretty conservative about the use of auto. https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable
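
For instance (a hypothetical one-liner in the style the standard asks for, not the patch's actual code):

SDValue Op = N0.getOperand(0); // rather than: auto Op = N0.getOperand(0);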

llvm/test/CodeGen/ARM/arm-icmp-shift-opt.ll
118 ↗(On Diff #378602)

"does not match" -> "do not match" since constants is plural.

fzhinkin updated this revision to Diff 378795.Oct 11 2021, 2:17 PM

Fixed typos, cleaned up the code.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
3411

Thanks for pointing that out; fixed.

please can you pre-commit these new tests to trunk with current codegen and then rebase to show the diff?

llvm/test/CodeGen/AArch64/arm64-icmp-shift-opt.ll
1 ↗(On Diff #378795)

rename icmp-shift-opt.ll?

llvm/test/CodeGen/ARM/arm-icmp-shift-opt.ll
1 ↗(On Diff #378795)

rename icmp-shift-opt.ll?

llvm/test/CodeGen/X86/icmp-shift-opt.ll
8

add nounwind to reduce cfi noise

fzhinkin updated this revision to Diff 381050.Oct 20 2021, 12:09 PM

Renamed test files, added nounwind attribute.

fzhinkin updated this revision to Diff 381086.Oct 20 2021, 1:38 PM

Fixed typos in ARM tests.

I don't have permission to commit changes, so I'd appreciate it if you could help with pre-committing the tests. Here's the patch adding the new tests, with checks generated against current trunk:

Done (sorry for the delay) - please can you rebase?

fzhinkin updated this revision to Diff 381553.Oct 22 2021, 8:12 AM

Rebased to current trunk

Thank you! Rebase done.

Also, thanks for adding i686 to the X86 test; it revealed that my optimization does not work when i128 is legalized to i32. I'll check whether that case can be easily supported.

Handling the trees generated during legalization of i128/i256/etc. to i32 is relatively simple, but in the i686 case some of these expressions are folded into funnel shifts before SimplifySetCC is applied to the setcc.
I see two options here (but I'm far from being an expert, so correct me if I'm wrong and there are simpler alternatives):

  1. support various instructions (funnel shifts, rotations, bit/byte swaps) in TargetLowering::optimizeSetCCOfExpandedShift in addition to SHL/SRL;
  2. support only SHL/SRL in TargetLowering::optimizeSetCCOfExpandedShift and apply it in DAGTypeLegalizer::IntegerExpandSetCCOperands right after setcc's operands expansion.

Personally, I'm leaning towards the second option, as it should be less fragile and easier to maintain.

Makes sense to try (2) first - although I expect at least partial support for (1) might end up being required - you are handling a pattern that is almost a funnel shift much of the time.

fzhinkin updated this revision to Diff 383852.Mon, Nov 1, 12:12 PM

Reimplemented expanded shift matching to handle funnel shifts.

Unfortunately, (2) doesn't work well, because nodes created during shift expansion may have several uses until type legalization finishes, so I gave up on it.

Instead, I added support for funnel shifts in TargetLowering::optimizeSetCCOfExpandedShift (I did not add rotations or bit/byte swaps, because such nodes should not be created while combining an expanded shift).

While the optimization now works fine for i686, there is an issue with AArch64: shifts expanded from types wider than i128 won't be optimized (see @opt_setcc_shl_ne_zero_i256), because on AArch64 funnel-shift-like patterns are combined into AArch64ISD::EXTR instead of FSHL/FSHR. I attempted to fix this by implementing (2), but the solution was fragile and didn't work in some cases.
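
For reference, the equivalence at play, as a small sketch with a made-up helper name: the expanded high word of a 64-bit shl matches the semantics of ISD::FSHL / llvm.fshl on 32-bit words, which is why the matcher also has to recognize funnel shift nodes.

#include <cstdint>

// fshl concatenates Hi:Lo and returns the top word after shifting left by C;
// for 0 < C < 32 this is exactly the expanded high word (Hi << C) | (Lo >> (32 - C)).
uint32_t fshl32(uint32_t Hi, uint32_t Lo, unsigned C) {
  C %= 32;
  return C == 0 ? Hi : (Hi << C) | (Lo >> (32 - C));
}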

I'm hoping D112443 will help with this

D112443 has been committed - please can you see if it helps?

Unfortunately it didn't affect AArch64ISD::EXTR combining.

X86 changes look good. Are the AArch64 changes actually regressions, or are they just changes?

Thanks!
There is no regression for AArch64: a better code sequence is generated for legalized i128 shifts, but there is no improvement for wider types (like i256).

A change that only triggers for some (common?) cases and causes no regressions is a better step forward
than a change that indiscriminately triggers on everything and causes widespread regressions, I would think :)

I agree. I don't think i256 and wider types are something you frequently see in real code.

RKSimon added inline comments.Wed, Nov 3, 3:14 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
3399

Lots of comment duplication - move the description to the header, and keep the code snippet here?

3440

const APInt &C

3476

(style) Break apart if-else chains when they return.

3486

const APInt &CVal =

3497

Why was i4096 of particular interest?

We have a default depth limit for this kind of thing: SelectionDAG::MaxRecursionDepth = 6. Also, we normally increment a depth up to that value instead of decrementing a height.
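
The usual shape of such a walk, as a sketch with a hypothetical function name (not the patch's code):

// Depth counts up from 0; give up once SelectionDAG::MaxRecursionDepth is
// reached instead of recursing arbitrarily deep into the OR tree.
static bool isOrTreeOfParts(SDValue V, unsigned Depth) {
  if (Depth >= SelectionDAG::MaxRecursionDepth)
    return false;
  if (V.getOpcode() != ISD::OR)
    return true; // reached a leaf part
  return isOrTreeOfParts(V.getOperand(0), Depth + 1) &&
         isOrTreeOfParts(V.getOperand(1), Depth + 1);
}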

3508

(style) Don't use auto

3517

Can't you avoid pushing results by just OR'ing the results once?

SDValue Reduction = Results[0];
for (size_t I = 1, E = Results.size(); I < E; ++I)
  Reduction = DAG.getNode(ISD::OR, DL, N0.getValueType(), Reduction, Results[I]);
fzhinkin added inline comments.Wed, Nov 3, 4:26 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
3497

Nothing really interesting about i4096; I just mentioned it to show that a max height of 8 is large enough for any type someone would use in actual code.

I'll change 8 to SelectionDAG::MaxRecursionDepth and switch from decrementing to incrementing. Thanks!

3517

I'm pushing the ORs back onto the results list to generate a balanced tree (and shorten the critical path). It pays off, at least for ARM:

; llc -O3 -mtriple=armv7 test.ll
define i1 @opt_setcc_shl_ne_zero_i128(i128 %a) nounwind {
   %shl = shl i128 %a, 17
   %cmp = icmp ne i128 %shl, 0
   ret i1 %cmp
}

Code generated by the current implementation:

opt_setcc_shl_ne_zero_i128:
@ %bb.0:
	orr	r2, r2, r3, lsl #17
	orr	r0, r1, r0
	orrs	r0, r0, r2
	movwne	r0, #1
	bx	lr

Code generated by an implementation that ORs in place:

opt_setcc_shl_ne_zero_i128:
@ %bb.0:
	orr	r0, r1, r0
	orr	r0, r0, r2
	orr	r0, r0, r3, lsl #17
	cmp	r0, #0
	movwne	r0, #1
	bx	lr
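
A sketch of that balanced reduction (variable names hypothetical, mirroring the snippet above): each new OR is appended back to the worklist, so values are combined pairwise and the final tree has logarithmic depth.

SmallVector<SDValue, 8> Worklist(Results.begin(), Results.end());
// Combine adjacent pairs; each freshly built OR lands at the end of the
// worklist and is itself combined in a later iteration.
for (unsigned I = 0; I + 1 < Worklist.size(); I += 2)
  Worklist.push_back(DAG.getNode(ISD::OR, DL, N0.getValueType(),
                                 Worklist[I], Worklist[I + 1]));
SDValue Reduction = Worklist.back();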
RKSimon added inline comments.Wed, Nov 3, 5:27 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
3517

OK - maybe mention that in the comment?

fzhinkin added inline comments.Wed, Nov 3, 7:31 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
3517

Updated the comment and fixed all other issues you've mentioned earlier.

RKSimon added inline comments.Wed, Nov 3, 11:54 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
3410

Use isNullOrNullSplat?

fzhinkin updated this revision to Diff 384556.Wed, Nov 3, 12:28 PM

Simplified expression in assertion.

RKSimon added inline comments.Sun, Nov 7, 5:25 AM
llvm/test/CodeGen/AArch64/icmp-shift-opt.ll
151

pre-commit?

fzhinkin added inline comments.Mon, Nov 8, 1:06 AM
llvm/test/CodeGen/AArch64/icmp-shift-opt.ll
151

I'd appreciate it if you could help me with committing it:

fzhinkin added inline comments.Mon, Nov 8, 6:55 AM
llvm/test/CodeGen/AArch64/icmp-shift-opt.ll
151

thanks!

RKSimon added inline comments.Mon, Nov 8, 8:06 AM
llvm/test/CodeGen/ARM/icmp-shift-opt.ll
150

sorry - I didn't add this one - I'll commit this shortly.

RKSimon added inline comments.Mon, Nov 8, 8:13 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
4153

This is doing something pretty similar; it's more limited to the 'concat' pattern, but handles the -1 'allbits' case as well.

fzhinkin added inline comments.Mon, Nov 8, 8:47 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
4153

Are you suggesting extracting and reusing the code common to both optimizations, or supporting -1 in optimizeSetCCOfExpandedShift?

llvm/test/CodeGen/ARM/icmp-shift-opt.ll
150

Thank you.

fzhinkin added inline comments.Wed, Nov 17, 12:32 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
4153

It does not make sense to support -1 in optimizeSetCCOfExpandedShift, because a shifted value (the method supports only logical shifts) is never equal to -1.

To me, the "concat" optimization and optimizeSetCCOfExpandedShift don't seem that similar: the "concat" optimization merges the OR arms in place, whereas optimizeSetCCOfExpandedShift eliminates shift pairs that are scattered across the tree, and the whole tree has to be scanned before the final OR reduction is applied.

RKSimon added inline comments.Mon, Nov 22, 3:30 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
4153

OK - what about the CmpZero case - is that redundant dead code now?

fzhinkin added inline comments.Mon, Nov 22, 10:11 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
4153

The CmpZero case is redundant only if Y's type was lowered to a pair of values of a narrower type (e.g. X and Y are i64 and the compilation target is armv7).

However, the concat optimization still works for the following cases, which are not supported in optimizeSetCCOfExpandedShift:

  • X and Y whose type is legal (X and Y are i64, x86_64 is the target);
  • X and Y are vectors.

I did not support vector types in optimizeSetCCOfExpandedShift because it seems impossible to get a DAG of the appropriate shape during vector legalization.