This is an archive of the discontinued LLVM Phabricator instance.


Differential D81728

[InstCombine] Add target-specific inst combining
Closed, Public

Authored by Flakebi on Jun 12 2020, 3:08 AM.

Details

Reviewers

nhaehnle
majnemer
spatel
lebedev.ri
lattner

Commits

rG2a6c871596ce: [InstCombine] Move target-specific inst combining

Summary

Targets can combine intrinsics in
TargetTransformInfo::instCombineIntrinsic.
This allows accessing target specific features and combining
instructions only if the target supports certain features.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Flakebi created this revision. Jun 12 2020, 3:08 AM

Herald added subscribers: llvm-commits, hiraditya. Jun 12 2020, 3:08 AM

lebedev.ri added a reviewer: spatel. Jun 12 2020, 3:32 AM

lebedev.ri added a subscriber: lebedev.ri.

lebedev.ri added inline comments.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3811	This opens dangerous floodgates: InstCombine would stop being a target-independent canonicalization pass.

Harbormaster failed remote builds in B60093: Diff 270348! Jun 12 2020, 4:17 AM

To add more context to this, the problem I am facing is that amdgpu image intrinsics are usually called with float arguments. However, on some subtargets/hardware generations it is possible to call them with half arguments.
If llvm is compiling for such a subtarget, it is beneficial to combine

%s32 = fpext half %s to float
call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(…, float %s32, …)

into

call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(…, half %s, …)

This combines instructions, so I think it belongs into the InstCombine pass. On the other hand, the f16 form of the intrinsics is not available on all targets, so this combination cannot be applied unconditionally but it needs to be gated depending on the target.
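A minimal sketch of the gating described above, written against toy types rather than LLVM's classes. `ToyValue`, `ToyCall`, `ToySubtarget`, and `foldFpextIntoImageSample` are hypothetical names invented for illustration; the patch itself routes such folds through TargetTransformInfo::instCombineIntrinsic.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Toy stand-ins for IR values; none of these are LLVM classes.
enum class ToyType { Half, Float };

struct ToyValue {
  ToyType Ty;
  const ToyValue *FpextSrc = nullptr; // non-null if this value is `fpext Src`
};

struct ToyCall {
  std::string Intrinsic;  // e.g. "image.sample.2d.v4f32.f32"
  const ToyValue *Coord;  // the coordinate operand being inspected
};

// Hypothetical subtarget query: may image intrinsics take half operands?
struct ToySubtarget {
  bool HasF16ImageArgs;
};

// Returns the rewritten call when the fold applies, std::nullopt otherwise.
std::optional<ToyCall> foldFpextIntoImageSample(const ToyCall &Call,
                                                const ToySubtarget &ST) {
  if (!ST.HasF16ImageArgs)
    return std::nullopt; // target gate: the f16 form does not exist here
  const ToyValue *Src = Call.Coord->FpextSrc;
  if (!Src || Src->Ty != ToyType::Half || Call.Coord->Ty != ToyType::Float)
    return std::nullopt; // operand is not `fpext half ... to float`
  return ToyCall{"image.sample.2d.v4f32.f16", Src};
}
```

The same input is left untouched on a subtarget without the feature, which is the property a target-independent combine could not express.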

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3811	That is the point of this change, to allow target-dependent combinations in TargetTransformInfo::instCombineIntrinsic. Imo, all the target specific intrinsic combinations in InstCombineCalls.cpp (x86, amdgpu, etc.) can be moved to their respective target. I don’t have a great overview of LLVM, so I might be wrong on this.

lebedev.ri added inline comments. Jun 12 2020, 6:23 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3811	"Imo, all the target specific intrinsic combinations in InstCombineCalls.cpp (x86, amdgpu, etc.) can be moved to their respective target." I agree with that, yes. The problem I'm seeing is that even having TTI in the pass "significantly" lowers the barrier of entry for then using TTI to guard some generic transforms in InstCombine.

In D81728#2089644, @Flakebi wrote:

This combines instructions, so I think it belongs into the InstCombine pass. On the other hand, the f16 form of the intrinsics is not available on all targets, so this combination cannot be applied unconditionally but it needs to be gated depending on the target.

The fact that this pass recognizes target-specific intrinsics at all is widely regarded as a mistake:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/102317.html

Target-specific transforms should look first at codegen combiners (SDAG or GlobalISel). If that's too late, consider a target-specific IR codegen pass (I think AMDGPU has a few examples of this already). If that's still too late, write a generic IR transform pass that accesses TTI?

In D81728#2089713, @spatel wrote:

In D81728#2089644, @Flakebi wrote:

This combines instructions, so I think it belongs into the InstCombine pass. On the other hand, the f16 form of the intrinsics is not available on all targets, so this combination cannot be applied unconditionally but it needs to be gated depending on the target.

The fact that this pass recognizes target-specific intrinsics at all is widely regarded as a mistake:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/102317.html

Target-specific transforms should look first at codegen combiners (SDAG or GlobalISel). If that's too late, consider a target-specific IR codegen pass (I think AMDGPU has a few examples of this already). If that's still too late, write a generic IR transform pass that accesses TTI?

The problem with all of these suggestions is that they're likely technically-inferior solutions compared to sitting inside of InstCombine's fixed-point iteration scheme. Honestly, I think that the way we should ensure that InstCombine does not start using TTI to define a canonical form for non-target-specific intrinsics is via documentation and code review. InstCombine has long had logic to deal with target-specific intrinsics (in InstCombineCalls.cpp), and refactoring things so that this logic can live in each backend seems like an improvement to me.
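To illustrate the fixed-point argument, here is a toy rewriting loop, not InstCombine's actual worklist algorithm. `Rule`, `replaceFirst`, and `runToFixedPoint` are hypothetical helpers, and expressions are modelled as plain strings purely for brevity.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// A rule rewrites the expression in place and reports whether it changed it.
using Rule = std::function<bool(std::string &)>;

// Replace the first occurrence of From with To; return true on a change.
bool replaceFirst(std::string &S, const std::string &From,
                  const std::string &To) {
  auto Pos = S.find(From);
  if (Pos == std::string::npos)
    return false;
  S.replace(Pos, From.size(), To);
  return true;
}

// Keep applying every rule until a full sweep changes nothing.
// Returns the number of rewrites performed.
int runToFixedPoint(std::string &Expr, const std::vector<Rule> &Rules) {
  int Rewrites = 0;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (const Rule &R : Rules)
      if (R(Expr)) {
        Changed = true;
        ++Rewrites;
      }
  }
  return Rewrites;
}
```

If a target rule that turns `tgt.flip(x)` into `neg(x)` sits in the same loop as a generic rule that cancels `neg(neg(x))`, the generic rule fires on the next sweep. With the target rule in a separate, later pass, the generic rule would never see the opportunity.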

nikic added a subscriber: nikic. Jun 13 2020, 4:11 AM

nikic added inline comments. Jun 13 2020, 5:49 AM

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
150	Actually implementing this would require us to export the `InstCombiner` class, which is part of `InstCombineInternal.h`. I don't think we would want to do this in its current form. This would require a larger refactoring to separate out the implementation and API portions of InstCombine.

foad added a subscriber: foad. Jun 16 2020, 6:34 AM

Summarizing the comments, the important points are:

- Everyone agrees on moving target-specific code out of Transforms/InstCombine into target-specific folders.
- Instruction combining keeps running inside the InstCombine pass, so the fixed-point iteration still works.

The majority of the target-specific code is intrinsic combining; the only other part is AMDGPU-specific code in InstCombineSimplifyDemanded.cpp:SimplifyDemandedVectorElts. Unless someone has an idea on how to implement this in a more generic way, I’ll keep it as in the current diff, only combining intrinsics in TargetTransformInfo::instCombineIntrinsic.

Actually implementing this would require us to export the InstCombiner class, which is part of InstCombineInternal.h. I don't think we would want to do this in its current form. This would require a larger refactoring to separate out the implementation and API portions of InstCombine.

Good point, I’ll try to add that here in the next week.

Moved most target specific InstCombine parts to their respective targets.
The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. Is there a place where code for these targets is shared?

The gist of these changes is in the following files:

llvm/include/llvm/Analysis/TargetTransformInfo.h
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
llvm/include/llvm/CodeGen/BasicTTIImpl.h
llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
llvm/lib/Analysis/TargetTransformInfo.cpp
llvm/lib/Transforms/InstCombine/InstCombineInternal.h
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

The rest of the changes move about 3000 lines out of InstCombine into the targets and slightly adjust them for the new interface; there should be no other changes in there.

Herald added a reviewer: lebedev.ri. Jun 24 2020, 8:52 AM

Herald added subscribers: kerbowa, dmgreen, jfb and 6 others.

As far as I know and I might be wrong, but TargetTransformInfo up til now has only provided information. It doesn't do any transforms itself. Is adding transforms to it the right thing to do?

Herald added a subscriber: wuzish. Jun 24 2020, 9:42 AM

Harbormaster failed remote builds in B61563: Diff 273054! Jun 24 2020, 10:48 AM

In D81728#2111901, @craig.topper wrote:

As far as I know and I might be wrong, but TargetTransformInfo up til now has only provided information. It doesn't do any transforms itself. Is adding transforms to it the right thing to do?

This isn't strictly true. I recently added rewriteIntrinsicWithAddressSpace for example

In D81728#2112158, @arsenm wrote:

In D81728#2111901, @craig.topper wrote:

As far as I know and I might be wrong, but TargetTransformInfo up til now has only provided information. It doesn't do any transforms itself. Is adding transforms to it the right thing to do?

This isn't strictly true. I recently added rewriteIntrinsicWithAddressSpace for example

I stand corrected then.

In D81728#2112483, @craig.topper wrote:

In D81728#2112158, @arsenm wrote:

In D81728#2111901, @craig.topper wrote:

As far as I know and I might be wrong, but TargetTransformInfo up til now has only provided information. It doesn't do any transforms itself. Is adding transforms to it the right thing to do?

This isn't strictly true. I recently added rewriteIntrinsicWithAddressSpace for example

I stand corrected then.

This may be the only example though. I may have introduced something conceptually new without realizing it. The current use also doesn't exactly make the change. It does introduce new instructions, but the pass is still responsible for doing the replacement/delete of the old value

In D81728#2112558, @arsenm wrote:

In D81728#2112483, @craig.topper wrote:

In D81728#2112158, @arsenm wrote:

In D81728#2111901, @craig.topper wrote:

As far as I know and I might be wrong, but TargetTransformInfo up til now has only provided information. It doesn't do any transforms itself. Is adding transforms to it the right thing to do?

This isn't strictly true. I recently added rewriteIntrinsicWithAddressSpace for example

I stand corrected then.

This may be the only example though. I may have introduced something conceptually new without realizing it. The current use also doesn't exactly make the change. It does introduce new instructions, but the pass is still responsible for doing the replacement/delete of the old value

I guess it also modifies the original instruction in place in some cases

Adjust failing clang test, TargetIRAnalysis is run earlier now

Herald added a project: Restricted Project. Jun 25 2020, 10:08 AM

Herald added subscribers: cfe-commits, dexonsmith, steven_wu.

Harbormaster failed remote builds in B61775: Diff 273425! Jun 25 2020, 10:15 AM

Rebased, so the automatic builds can run

Harbormaster failed remote builds in B61790: Diff 273458! Jun 25 2020, 12:29 PM

dexonsmith removed a subscriber: dexonsmith. Jun 25 2020, 2:06 PM

We've been handling target-specific intrinsics in InstCombine for a long time, and that's the place where they should naturally sit. This is a pretty clean refactoring in my opinion, I'm in favor. It's substantial enough as a change that it should probably receive a heads-up on llvm-dev, though.

I think an interface usable by InstructionSimplify would be helpful too, so I think that would be a separate thing from TTI

This combines instructions, so I think it belongs into the InstCombine pass. On the other hand, the f16 form of the intrinsics is not available on all targets, so this combination cannot be applied unconditionally but it needs to be gated depending on the target.

I don't think this is a great justification for doing anything here. You can always reverse the transform in isel on targets where it isn't supported; adding more IR patterns increases the potential for missed optimizations.

That said, I think moving the handling for target intrinsics into the target makes sense as a cleanup.

llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1444	Is there some way we can check that an intrinsic is actually target-specific, to discourage people from handling generic intrinsics in target-specific ways?

foad added a subscriber: bogner. Jun 30 2020, 1:14 AM

foad added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1444	That was the intent of @bogner's rG92a8c6112c6571112e8b622bfddc7e4d1685a6fe.

Rebased and call target-specific combining only for target-specific intrinsics as suggested.
Add Function::isTargetIntrinsic() for this purpose.
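A toy sketch of the routing this enables: only target-specific intrinsics reach the target hook, everything else stays on the generic path. The name-prefix test below is an assumption made for illustration and is not how `Function::isTargetIntrinsic()` is implemented; `ToyTargetPrefixes`, `toyIsTargetIntrinsic`, and `toyRouteIntrinsic` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Hypothetical list of target namespaces, chosen from targets touched here.
constexpr std::array<std::string_view, 4> ToyTargetPrefixes = {
    "llvm.amdgcn.", "llvm.arm.", "llvm.nvvm.", "llvm.x86."};

// Toy predicate: does the intrinsic name live in a target namespace?
bool toyIsTargetIntrinsic(std::string_view Name) {
  for (std::string_view Prefix : ToyTargetPrefixes)
    if (Name.substr(0, Prefix.size()) == Prefix)
      return true;
  return false;
}

enum class ToyRoute { TargetHook, GenericOnly };

// Generic intrinsics never reach the target hook, so a target cannot
// redefine their canonical form.
ToyRoute toyRouteIntrinsic(std::string_view Name) {
  return toyIsTargetIntrinsic(Name) ? ToyRoute::TargetHook
                                    : ToyRoute::GenericOnly;
}
```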

Harbormaster failed remote builds in B62312: Diff 274436! Jun 30 2020, 7:01 AM

This looks like a great direction, but please make sure to minimize public implementation details. We don't want the vast majority of instcombine to be visible outside of its library (it is hairy enough as it is :-)

llvm/include/llvm/Analysis/TargetTransformInfo.h
29	Can this be forward declared instead of #include'd?
llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
31	Please minimize #includes in general, thanks :)
47	I would really rather not make this be a public class - this is a very thick interface. Can this be cut down to something much smaller than the implementation details of InstCombine? If you're curious for a pattern that could be followed, the MLIR AsmParser is a reasonable example. The parser is spread across a bunch of classes in the lib/ directory: https://github.com/llvm/llvm-project/blob/master/mlir/lib/Parser/Parser.cpp But then there is a much smaller public API exposed through a header: https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/IR/OpImplementation.h#L229

This revision now requires changes to proceed. Jun 30 2020, 1:24 PM

nhaehnle added inline comments. Jul 1 2020, 7:08 AM

llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
47	I agree with the sentiment, but note @Flakebi has split up the `InstCombiner` class into `InstCombiner` and `InstCombinerImpl` classes, which addresses those concerns already as far as I'm concerned. Looking through the new `InstCombiner`, aside from methods that are core to the workings of InstCombine (modifying instructions while keeping track of the Worklist) and methods for accessing the analyses, what's left is:
- A bunch of static methods that should arguably just be global functions in a utils header somewhere.
- CreateOverflowTuple and CreateNonTerminatorUnreachable
Moving those methods feels sensible, but is likely to touch a lot of code, so I think it would be better to do it in a separate commit.

RKSimon added a subscriber: RKSimon. Jul 2 2020, 12:20 AM

Rebased and removed a few includes as suggested.
Make the TargetTransformInfo a private member of InstCombiner because it should not be used in general inst combines.
Move CreateOverflowTuple out of InstCombiner and make CreateNonTerminatorUnreachable static.

I would really rather not make this be a public class - this is a very thick interface. Can this be cut down to something much smaller than the implementation details of InstCombine?

I agree that keeping the public interface small is desirable and I tried to do that by splitting the class into InstCombiner – the internal, public interface – and InstCombinerImpl – the actual implementation of the pass.
As far as I understand it, LLVM_LIBRARY_VISIBILITY hides this class so it is not visible outside LLVM?

With this change, inst combining is split across several places: the general InstCombine and all the targets. They do similar things, with the difference that the inst combining part inside the targets only has access to the public InstCombiner interface.
As the target specific parts want to use the same helper methods, these helpers need to be in a public interface (public to the targets, not to LLVM users). The most prominent of these helpers is peekThroughBitcast.

Some of these helper functions are currently not used by targets, so they can be moved to a utils header if desired. In general, I think we want them to be shared, so that not every target has its own set of helpers.
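The split can be pictured with a toy interface/implementation pair. This is a sketch under assumptions, not the layout of the real classes: `ToyCombiner`, `ToyCombinerImpl`, `stripCastPrefix`, and `toyTargetHook` are hypothetical, and virtual dispatch is just one way to keep the implementation out of the header that targets include.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Narrow interface that target code is allowed to see.
class ToyCombiner {
public:
  virtual ~ToyCombiner() = default;

  // Core operation that targets need: queue a value for revisiting.
  virtual void addToWorklist(int ValueId) = 0;

  // Shared helper, so that each target does not grow its own copy.
  // Strips a leading "bitcast:" marker from a toy value name.
  static std::string stripCastPrefix(const std::string &Name) {
    const std::string Prefix = "bitcast:";
    return Name.compare(0, Prefix.size(), Prefix) == 0
               ? Name.substr(Prefix.size())
               : Name;
  }
};

// Implementation detail; in a real tree this would stay in a private header.
class ToyCombinerImpl final : public ToyCombiner {
public:
  void addToWorklist(int ValueId) override { Worklist.push_back(ValueId); }
  const std::vector<int> &worklist() const { return Worklist; }

private:
  std::vector<int> Worklist;
};

// Target-side code is written purely against the narrow interface.
void toyTargetHook(ToyCombiner &IC, int ValueId) { IC.addToWorklist(ValueId); }
```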

Harbormaster failed remote builds in B62975: Diff 275617! Jul 6 2020, 3:42 AM

sameerds added a subscriber: sameerds. Jul 6 2020, 10:26 PM

Rebased (no conflicts this time).

Friendly ping for review.

Harbormaster completed remote builds in B63722: Diff 276983. Jul 10 2020, 5:04 AM

nikic added inline comments. Jul 10 2020, 9:57 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
540	For all three functions, the calling convention seems rather non-idiomatic for InstCombine. Rather than having an `Instruction *` argument and bool result, is there any reason not to have an `Instruction *` return value, with nullptr indicating that the intrinsic couldn't be simplified?
542	`const APInt &DemandedMask`?
546	`const APInt &DemandedElts`?

Flakebi marked an inline comment as done. Jul 10 2020, 12:22 PM

Flakebi added inline comments.

llvm/include/llvm/Analysis/TargetTransformInfo.h
540	Yes, the function must have the option to return a nullptr and prevent `visitCallBase` from being called or other code from being executed after `instCombineIntrinsic`. So the caller must somehow be able to tell the difference between 'do nothing, just continue execution' and 'return this Instruction', where the `Instruction` can also be a nullptr. The return type could be an `optional<Instruction*>`. I’ll take a look at your other comments on Monday.

Please don't consider me a blocker on this patch, thank you for pushing on it!

Flakebi marked an inline comment as done. Jul 13 2020, 3:04 AM

Flakebi added inline comments.

llvm/include/llvm/Analysis/TargetTransformInfo.h
542	I tried to change it to `const APInt &DemandedMask`, but the x86 simplifyDemandedVectorEltsIntrinsic changes `DemandedMask`, so this function would have to copy it or take a non-const reference. Looking more into it, `SimplifyAndSetOp` takes `DemandedElts` by value too. An `APInt` consists of a `uint64_t` and an `unsigned`, so it should be 16 bytes in most cases. Only if the represented int is larger than 64 bits does it come with an allocation. I guess copying should be fine. If you think it should be a reference anyway, let me know and I’ll change it.
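A small sketch of why by-value passing is convenient when the callee narrows the mask. `ToyMask` is a hypothetical stand-in for an `APInt` of at most 64 bits, and `countDemandedLowHalf` is an invented helper, not a function from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Toy demanded-elements mask: one 64-bit word plus a bit width (<= 64).
struct ToyMask {
  std::uint64_t Bits;
  unsigned Width;
};

// Takes the mask by value and narrows its private copy to the low half,
// so the caller's mask is unaffected. Returns the number of set bits left.
unsigned countDemandedLowHalf(ToyMask Demanded) {
  Demanded.Bits &= (std::uint64_t{1} << (Demanded.Width / 2)) - 1;
  unsigned Count = 0;
  for (std::uint64_t B = Demanded.Bits; B != 0; B &= B - 1)
    ++Count; // clears the lowest set bit on each step
  return Count;
}
```

With a `const` reference the callee would need an explicit local copy before narrowing; with a non-const reference the caller's mask would be clobbered.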

Rebased and added some docs.

Is there anything left that needs to be done before this can be pushed?

foad added inline comments. Jul 17 2020, 4:46 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
544–547	Did you consider returning `std::pair<bool,Instruction*>`?

Harbormaster failed remote builds in B64652: Diff 278711! Jul 17 2020, 4:51 AM

Here you go.

Change return types of TargetTransformInfo::instCombineIntrinsic and others to Optional<Instruction *> and Optional<Value *>.
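The three outcomes discussed in the inline comments can be sketched with `std::optional` standing in for `llvm::Optional`. `ToyInst`, `ToyIntrinsic`, `toyTargetCombine`, and `toyVisitCall` are hypothetical names; only the three-way distinction itself comes from the discussion.

```cpp
#include <cassert>
#include <optional>

struct ToyInst {
  int Id;
};

enum class ToyIntrinsic { Unknown, FoldToOperand, HandledInPlace };

// Target hook with three distinguishable results:
//   std::nullopt     -> not handled, the caller continues as usual
//   engaged, nullptr -> handled, nothing to return, the caller must stop
//   engaged, non-null-> handled, this is the replacement instruction
std::optional<ToyInst *> toyTargetCombine(ToyIntrinsic Kind,
                                          ToyInst *Operand) {
  switch (Kind) {
  case ToyIntrinsic::FoldToOperand:
    return Operand;
  case ToyIntrinsic::HandledInPlace:
    return static_cast<ToyInst *>(nullptr);
  case ToyIntrinsic::Unknown:
    break;
  }
  return std::nullopt;
}

// Caller: only falls through to generic handling when the hook declined.
ToyInst *toyVisitCall(ToyIntrinsic Kind, ToyInst *Operand,
                      ToyInst *GenericResult) {
  if (std::optional<ToyInst *> V = toyTargetCombine(Kind, Operand))
    return *V; // may legitimately be nullptr
  return GenericResult;
}
```

A plain `Instruction *` return cannot separate the first two cases, and a `bool` plus out-parameter or a `std::pair<bool, Instruction *>` carries the same information with more ceremony.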

Harbormaster failed remote builds in B64664: Diff 278735! Jul 17 2020, 6:33 AM

dmgreen added inline comments. Jul 21 2020, 2:39 AM

llvm/test/CodeGen/Thumb2/mve-intrinsics/predicates.ll
2	Please use the same triple as llc for any test with "mve" in the title.

Rebased and fix triple for Thumb2 tests as suggested.

Thanks

Harbormaster failed remote builds in B65051: Diff 279463! Jul 21 2020, 4:16 AM

This has had a month of good review that has been addressed, I'd say it's good to go.

This revision is now accepted and ready to land. Jul 21 2020, 10:41 AM

Closed by commit rG2a6c871596ce: [InstCombine] Move target-specific inst combining (authored by sebastian-ne). Jul 22 2020, 7:00 AM

This revision was automatically updated to reflect the committed changes.

sebastian-ne mentioned this in rG2a6c871596ce: [InstCombine] Move target-specific inst combining.

I have a multi-stage, auto-git-bisecting bot that has identified this commit as the source of a regression on Fedora 32 (x86-64). This commit broke my first stage test (release, no asserts). Might a quick fix happen or do we need to revert this?

FAIL: Clang :: CodeGen/aarch64-bf16-ldst-intrinsics.c (7188 of 67650)
******************** TEST 'Clang :: CodeGen/aarch64-bf16-ldst-intrinsics.c' FAILED ********************
Script:
--
: 'RUN: at line 1';   /tmp/_update_lc/r/bin/clang -cc1 -internal-isystem /tmp/_update_lc/r/lib/clang/12.0.0/include -nostdsysteminc -triple aarch64-arm-none-eabi -target-feature +neon -target-feature +bf16   -O2 -emit-llvm /home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c -o - | /tmp/_update_lc/r/bin/FileCheck /home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c --check-prefixes=CHECK,CHECK64
: 'RUN: at line 3';   /tmp/_update_lc/r/bin/clang -cc1 -internal-isystem /tmp/_update_lc/r/lib/clang/12.0.0/include -nostdsysteminc -triple armv8.6a-arm-none-eabi -target-feature +neon -target-feature +bf16 -mfloat-abi hard   -O2 -emit-llvm /home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c -o - | /tmp/_update_lc/r/bin/FileCheck /home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c --check-prefixes=CHECK,CHECK32
--
Exit Code: 1

Command Output (stderr):
--
/home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c:14:13: error: CHECK32: expected string not found in input
// CHECK32: %1 = load <4 x bfloat>, <4 x bfloat>* %0, align 2
            ^
<stdin>:7:52: note: scanning from here
define arm_aapcs_vfpcc <4 x bfloat> @test_vld1_bf16(bfloat* readonly %ptr) local_unnamed_addr #0 {
                                                   ^
<stdin>:10:5: note: possible intended match here
 %vld1 = tail call <4 x bfloat> @llvm.arm.neon.vld1.v4bf16.p0i8(i8* %0, i32 2)
    ^
/home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c:23:13: error: CHECK32: expected string not found in input
// CHECK32: %1 = load <8 x bfloat>, <8 x bfloat>* %0, align 2
            ^
<stdin>:18:53: note: scanning from here
define arm_aapcs_vfpcc <8 x bfloat> @test_vld1q_bf16(bfloat* readonly %ptr) local_unnamed_addr #2 {
                                                    ^
<stdin>:21:5: note: possible intended match here
 %vld1 = tail call <8 x bfloat> @llvm.arm.neon.vld1.v8bf16.p0i8(i8* %0, i32 2)
    ^

Input file: <stdin>
Check file: /home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: ; ModuleID = '/home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c'
            2: source_filename = "/home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c"
            3: target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
            4: target triple = "armv8.6a-arm-none-eabi"
            5:
            6: ; Function Attrs: nounwind readonly
            7: define arm_aapcs_vfpcc <4 x bfloat> @test_vld1_bf16(bfloat* readonly %ptr) local_unnamed_addr #0 {
check:14'0                                                        X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            8: entry:
check:14'0     ~~~~~~
            9:  %0 = bitcast bfloat* %ptr to i8*
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           10:  %vld1 = tail call <4 x bfloat> @llvm.arm.neon.vld1.v4bf16.p0i8(i8* %0, i32 2)
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:14'1         ?                                                                          possible intended match
           11:  ret <4 x bfloat> %vld1
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~
           12: }
check:14'0     ~
           13:
check:14'0     ~
           14: ; Function Attrs: argmemonly nounwind readonly
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           15: declare <4 x bfloat> @llvm.arm.neon.vld1.v4bf16.p0i8(i8*, i32) #1
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           16:
check:14'0     ~
           17: ; Function Attrs: nounwind readonly
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           18: define arm_aapcs_vfpcc <8 x bfloat> @test_vld1q_bf16(bfloat* readonly %ptr) local_unnamed_addr #2 {
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:23'0                                                         X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
           19: entry:
check:23'0     ~~~~~~
           20:  %0 = bitcast bfloat* %ptr to i8*
check:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           21:  %vld1 = tail call <8 x bfloat> @llvm.arm.neon.vld1.v8bf16.p0i8(i8* %0, i32 2)
check:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:23'1         ?                                                                          possible intended match
           22:  ret <8 x bfloat> %vld1
check:23'0     ~~~~~~~~~~~~~~~~~~~~~~~
           23: }
check:23'0     ~
           24:
check:23'0     ~
           25: ; Function Attrs: argmemonly nounwind readonly
check:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           26: declare <8 x bfloat> @llvm.arm.neon.vld1.v8bf16.p0i8(i8*, i32) #1
check:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            .
            .
            .
>>>>>>

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
********************
Failed Tests (1):
  Clang :: CodeGen/aarch64-bf16-ldst-intrinsics.c


Testing Time: 71.60s
  Unsupported      : 10693
  Passed           : 56854
  Expectedly Failed:   102
  Failed           :     1

Thanks for the notification @davezarzycki, an auto-bisecting bot is cool!

This failure should be fixed in b99898c1e9c5d8bade1d898e84604d3241b0087c.

spatel mentioned this in D111500: [InstSimplify] Simplify intrinsic comparisons with domain knoweldge. Oct 11 2021, 1:39 PM

Allen added a subscriber: Allen. Oct 21 2022, 9:32 PM

Herald added a project: Restricted Project. Oct 21 2022, 9:32 PM

Herald added subscribers: nlopes, kosarev, mattd and 4 others.

Revision Contents

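Path	Size
clang/test/CodeGen/thinlto-distributed-newpm.ll	2 lines
llvm/include/llvm/Analysis/TargetTransformInfo.h	47 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	21 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	26 lines
llvm/include/llvm/Transforms/InstCombine/InstCombiner.h	510 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	24 lines
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp	958 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h	10 lines
llvm/lib/Target/AMDGPU/CMakeLists.txt	5 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.h	3 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp	139 lines
llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h	3 lines
llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp	258 lines
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h	3 lines
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp	165 lines
llvm/lib/Target/X86/CMakeLists.txt	1 line
llvm/lib/Target/X86/X86InstCombineIntrinsic.cpp	2063 lines
llvm/lib/Target/X86/X86TargetTransformInfo.h	15 lines
llvm/lib/Target/AMDGPU/InstCombineTables.td (listed together with llvm/lib/Transforms/InstCombine/)	11 lines
llvm/lib/Transforms/InstCombine/CMakeLists.txt	4 lines
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp	41 lines
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp	87 lines
llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp	16 lines
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp	2882 lines
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp	80 lines
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp	242 lines
llvm/lib/Transforms/InstCombine/InstCombineInternal.h	533 lines
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp	56 lines
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp	34 lines
llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp	3 lines
llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp	20 lines
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp	59 lines
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp	27 lines
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp	474 lines
llvm/lib/Transforms/InstCombine/InstCombineTables.td	(no size listed)
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp	23 lines
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp	90 lines
llvm/test/CodeGen/Thumb2/mve-intrinsics/predicates.ll	2 lines
llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc-multiple.ll	4 lines
llvm/test/CodeGen/Thumb2/mve-vpt-from-intrinsics.ll	3 lines
llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll	2 lines
llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll	2 lines
llvm/test/Transforms/InstCombine/AMDGPU/ldexp.ll	2 lines
llvm/test/Transforms/InstCombine/ARM/mve-v2i2v.ll	2 lines
llvm/test/Transforms/InstCombine/ARM/neon-intrinsics.ll	2 lines
llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll	4 lines
llvm/test/Transforms/InstCombine/X86/X86FsubCmpCombine.ll	2 lines
llvm/test/Transforms/InstCombine/X86/addcarry.ll	3 lines
llvm/test/Transforms/InstCombine/X86/clmulqdq.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-avx2.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-avx512.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-bmi-tbm.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-insertps.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-masked-memops.ll	3 lines
llvm/test/Transforms/InstCombine/X86/x86-movmsk.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-pack.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-pshufb.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-sse.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-sse2.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-sse41.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-sse4a.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-vec_demanded_elts.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-vector-shifts.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-vpermil.ll	2 lines
llvm/test/Transforms/InstCombine/X86/x86-xop.ll	2 lines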

Diff 273425

clang/test/CodeGen/thinlto-distributed-newpm.ll

	Show First 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
	; CHECK-O: Running analysis: PassInstrumentationAnalysis on main			; CHECK-O: Running analysis: PassInstrumentationAnalysis on main
	; CHECK-O: Running analysis: AssumptionAnalysis on main			; CHECK-O: Running analysis: AssumptionAnalysis on main
	; CHECK-O: Running pass: DeadArgumentEliminationPass			; CHECK-O: Running pass: DeadArgumentEliminationPass
	; CHECK-O: Running pass: ModuleToFunctionPassAdaptor<{{.}}PassManager<{{.}}Function>{{ ?}}>			; CHECK-O: Running pass: ModuleToFunctionPassAdaptor<{{.}}PassManager<{{.}}Function>{{ ?}}>
	; CHECK-O: Starting {{.*}}Function pass manager run.			; CHECK-O: Starting {{.*}}Function pass manager run.
	; CHECK-O: Running pass: InstCombinePass on main			; CHECK-O: Running pass: InstCombinePass on main
	; CHECK-O: Running analysis: TargetLibraryAnalysis on main			; CHECK-O: Running analysis: TargetLibraryAnalysis on main
	; CHECK-O: Running analysis: OptimizationRemarkEmitterAnalysis on main			; CHECK-O: Running analysis: OptimizationRemarkEmitterAnalysis on main
				; CHECK-O: Running analysis: TargetIRAnalysis on main
	; CHECK-O: Running analysis: AAManager on main			; CHECK-O: Running analysis: AAManager on main
	; CHECK-O: Running analysis: BasicAA on main			; CHECK-O: Running analysis: BasicAA on main
	; CHECK-O: Running analysis: ScopedNoAliasAA on main			; CHECK-O: Running analysis: ScopedNoAliasAA on main
	; CHECK-O: Running analysis: TypeBasedAA on main			; CHECK-O: Running analysis: TypeBasedAA on main
	; CHECK-O: Running analysis: OuterAnalysisManagerProxy			; CHECK-O: Running analysis: OuterAnalysisManagerProxy
	; CHECK-O: Running pass: SimplifyCFGPass on main			; CHECK-O: Running pass: SimplifyCFGPass on main
	; CHECK-O: Running analysis: TargetIRAnalysis on main
	; CHECK-O: Finished {{.*}}Function pass manager run.			; CHECK-O: Finished {{.*}}Function pass manager run.
	; CHECK-O: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA			; CHECK-O: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
	; CHECK-O: Running analysis: GlobalsAA			; CHECK-O: Running analysis: GlobalsAA
	; CHECK-O: Running analysis: CallGraphAnalysis			; CHECK-O: Running analysis: CallGraphAnalysis
	; CHECK-O: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis			; CHECK-O: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
	; CHECK-O: Running pass: ModuleToPostOrderCGSCCPassAdaptor<{{.}}DevirtSCCRepeatedPass<{{.}}PassManager<{{.*}}LazyCallGraph::SCC			; CHECK-O: Running pass: ModuleToPostOrderCGSCCPassAdaptor<{{.}}DevirtSCCRepeatedPass<{{.}}PassManager<{{.*}}LazyCallGraph::SCC
	; CHECK-O: Running analysis: InnerAnalysisManagerProxy			; CHECK-O: Running analysis: InnerAnalysisManagerProxy
	; CHECK-O: Running analysis: LazyCallGraphAnalysis			; CHECK-O: Running analysis: LazyCallGraphAnalysis

llvm/include/llvm/Analysis/TargetTransformInfo.h

#ifndef LLVM_ANALYSIS_TARGETTRANSFORMINFO_H		#ifndef LLVM_ANALYSIS_TARGETTRANSFORMINFO_H
#define LLVM_ANALYSIS_TARGETTRANSFORMINFO_H		#define LLVM_ANALYSIS_TARGETTRANSFORMINFO_H

#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/PassManager.h"		#include "llvm/IR/PassManager.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/AtomicOrdering.h"		#include "llvm/Support/AtomicOrdering.h"
#include "llvm/Support/DataTypes.h"		#include "llvm/Support/DataTypes.h"
		#include "llvm/Support/KnownBits.h"
		lattner: Can this be forward declared instead of #include'd?
#include <functional>		#include <functional>

namespace llvm {		namespace llvm {

namespace Intrinsic {		namespace Intrinsic {
typedef unsigned ID;		typedef unsigned ID;
}		}

class AssumptionCache;		class AssumptionCache;
class BlockFrequencyInfo;		class BlockFrequencyInfo;
class DominatorTree;		class DominatorTree;
class BranchInst;		class BranchInst;
class CallBase;		class CallBase;
class ExtractElementInst;		class ExtractElementInst;
class Function;		class Function;
class GlobalValue;		class GlobalValue;
		class InstCombiner;
class IntrinsicInst;		class IntrinsicInst;
class LoadInst;		class LoadInst;
class LoopAccessInfo;		class LoopAccessInfo;
class Loop;		class Loop;
class LoopInfo;		class LoopInfo;
class ProfileSummaryInfo;		class ProfileSummaryInfo;
class SCEV;		class SCEV;
class ScalarEvolution;		class ScalarEvolution;
[476 lines hidden]	bool preferPredicateOverEpilogue(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
AssumptionCache &AC, TargetLibraryInfo *TLI,		AssumptionCache &AC, TargetLibraryInfo *TLI,
DominatorTree *DT,		DominatorTree *DT,
const LoopAccessInfo *LAI) const;		const LoopAccessInfo *LAI) const;

/// Query the target whether lowering of the llvm.get.active.lane.mask		/// Query the target whether lowering of the llvm.get.active.lane.mask
/// intrinsic is supported.		/// intrinsic is supported.
bool emitGetActiveLaneMask() const;		bool emitGetActiveLaneMask() const;

		bool instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		Instruction **ResultI) const;
		nikic: For all three functions, the calling convention seems rather non-idiomatic for InstCombine. Rather than having an `Instruction **` argument and a bool result, is there any reason not to have an `Instruction *` return value, with nullptr indicating that the intrinsic couldn't be simplified?
		Flakebi (author): Yes, the function must have the option to return a nullptr and to prevent `visitCallBase` from being called, or other code from being executed, after `instCombineIntrinsic`. So the caller must somehow be able to tell the difference between 'do nothing, just continue execution' and 'return this Instruction', where the `Instruction *` can also be a nullptr. The return type could be an `optional<Instruction *>`. I'll take a look at your other comments on Monday.
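		The three outcomes discussed in this thread can be sketched with an optional pointer. This is only an illustration under stated assumptions: `Instruction` is a stand-in struct, `targetCombine` and `visitIntrinsic` are hypothetical names, and `std::optional` is used in place of whatever optional type the patch would actually pick.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Stand-in for llvm::Instruction; only here to keep the sketch self-contained.
struct Instruction {
  std::string Name;
};

// Hypothetical target hook with three distinguishable outcomes:
//   std::nullopt       -> target did not handle the intrinsic, caller continues
//   engaged, nullptr   -> target handled it, the visitor should return nullptr
//   engaged, non-null  -> target handled it and produced a replacement
std::optional<Instruction *> targetCombine(int IntrinsicID, Instruction &Repl) {
  switch (IntrinsicID) {
  case 1:
    return std::optional<Instruction *>(&Repl);
  case 2:
    return std::optional<Instruction *>(nullptr); // engaged, but null
  default:
    return std::nullopt;
  }
}

// Hypothetical caller showing how a visitor could dispatch on the result.
Instruction *visitIntrinsic(int IntrinsicID, Instruction &Repl,
                            Instruction &Generic) {
  if (std::optional<Instruction *> R = targetCombine(IntrinsicID, Repl))
    return *R;     // may be nullptr; the generic path is skipped either way
  return &Generic; // target declined, fall through to the generic path
}
```

		Compared with a `bool` result plus an `Instruction **` out-parameter, the optional keeps "not handled" and "handled, result is nullptr" apart in the return value itself.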
		bool simplifyDemandedUseBitsIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		APInt DemandedMask, KnownBits &Known,
		nikic: `const APInt &DemandedMask`?
		Flakebi (author): I tried to change it to `const APInt &DemandedMask`, but the x86 simplifyDemandedVectorEltsIntrinsic changes `DemandedMask`, so this function would have to copy it or take a non-const reference. Looking more into it, `SimplifyAndSetOp` takes `DemandedElts` by value too. An `APInt` consists of a `uint64_t` and an `unsigned`, so it should be 16 bytes in most cases. Only if the represented integer is wider than 64 bits does it come with an allocation. I guess copying should be fine. If you think it should be a reference anyway, let me know and I'll change it.
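		A minimal sketch of the by-value argument made above, assuming a simplified mask type: `SmallMask` and `countDemandedLow` are hypothetical and do not model the real `APInt` beyond "a 64-bit word plus a width". The callee narrows its own copy, so the caller's mask stays intact without an explicit copy at the call site.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a small APInt (at most 64 bits); not the LLVM class.
struct SmallMask {
  uint64_t Bits;
  unsigned Width;
};

// Takes the mask by value: the callee may narrow it freely without
// affecting the caller's copy.
unsigned countDemandedLow(SmallMask Demanded, unsigned KeepLow) {
  if (KeepLow < 64)
    Demanded.Bits &= (uint64_t(1) << KeepLow) - 1; // local modification only
  unsigned Count = 0;
  for (unsigned I = 0; I < Demanded.Width && I < 64; ++I)
    Count += (Demanded.Bits >> I) & 1;
  return Count;
}
```

		With a `const` reference parameter, the same callee would have to make that copy itself before modifying the mask.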
		bool &KnownBitsComputed,
		Value **ResultV) const;
		bool simplifyDemandedVectorEltsIntrinsic(
		InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
		nikic: `const APInt &DemandedElts`?
		APInt &UndefElts2, APInt &UndefElts3,
		foad: Did you consider returning `std::pair<bool,Instruction*>`?
		std::function<void(Instruction *, unsigned, APInt, APInt &)>
		SimplifyAndSetOp,
		Value **ResultV) const;

/// @}		/// @}

/// \name Scalar Target Information		/// \name Scalar Target Information
/// @{		/// @{

/// Flags indicating the kind of support for population count.		/// Flags indicating the kind of support for population count.
///		///
/// Compared to the SW implementation, HW support is supposed to		/// Compared to the SW implementation, HW support is supposed to
[744 lines hidden]	virtual bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE,
AssumptionCache &AC,		AssumptionCache &AC,
TargetLibraryInfo *LibInfo,		TargetLibraryInfo *LibInfo,
HardwareLoopInfo &HWLoopInfo) = 0;		HardwareLoopInfo &HWLoopInfo) = 0;
virtual bool		virtual bool
preferPredicateOverEpilogue(Loop *L, LoopInfo *LI, ScalarEvolution &SE,		preferPredicateOverEpilogue(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
AssumptionCache &AC, TargetLibraryInfo *TLI,		AssumptionCache &AC, TargetLibraryInfo *TLI,
DominatorTree *DT, const LoopAccessInfo *LAI) = 0;		DominatorTree *DT, const LoopAccessInfo *LAI) = 0;
virtual bool emitGetActiveLaneMask() = 0;		virtual bool emitGetActiveLaneMask() = 0;
		virtual bool instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		Instruction **ResultI) = 0;
		virtual bool simplifyDemandedUseBitsIntrinsic(
		InstCombiner &IC, IntrinsicInst &II, APInt DemandedMask, KnownBits &Known,
		bool &KnownBitsComputed, Value **ResultV) = 0;
		virtual bool simplifyDemandedVectorEltsIntrinsic(
		InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
		APInt &UndefElts2, APInt &UndefElts3,
		std::function<void(Instruction *, unsigned, APInt, APInt &)>
		SimplifyAndSetOp,
		Value **ResultV) = 0;
virtual bool isLegalAddImmediate(int64_t Imm) = 0;		virtual bool isLegalAddImmediate(int64_t Imm) = 0;
virtual bool isLegalICmpImmediate(int64_t Imm) = 0;		virtual bool isLegalICmpImmediate(int64_t Imm) = 0;
virtual bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV,		virtual bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV,
int64_t BaseOffset, bool HasBaseReg,		int64_t BaseOffset, bool HasBaseReg,
int64_t Scale, unsigned AddrSpace,		int64_t Scale, unsigned AddrSpace,
Instruction *I) = 0;		Instruction *I) = 0;
virtual bool isLSRCostLess(TargetTransformInfo::LSRCost &C1,		virtual bool isLSRCostLess(TargetTransformInfo::LSRCost &C1,
TargetTransformInfo::LSRCost &C2) = 0;		TargetTransformInfo::LSRCost &C2) = 0;
[271 lines hidden]	bool preferPredicateOverEpilogue(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
AssumptionCache &AC, TargetLibraryInfo *TLI,		AssumptionCache &AC, TargetLibraryInfo *TLI,
DominatorTree *DT,		DominatorTree *DT,
const LoopAccessInfo *LAI) override {		const LoopAccessInfo *LAI) override {
return Impl.preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI);		return Impl.preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI);
}		}
bool emitGetActiveLaneMask() override {		bool emitGetActiveLaneMask() override {
return Impl.emitGetActiveLaneMask();		return Impl.emitGetActiveLaneMask();
}		}
		bool instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		Instruction **ResultI) override {
		return Impl.instCombineIntrinsic(IC, II, ResultI);
		}
		bool simplifyDemandedUseBitsIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		APInt DemandedMask, KnownBits &Known,
		bool &KnownBitsComputed,
		Value **ResultV) override {
		return Impl.simplifyDemandedUseBitsIntrinsic(IC, II, DemandedMask, Known,
		KnownBitsComputed, ResultV);
		}
		bool simplifyDemandedVectorEltsIntrinsic(
		InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
		APInt &UndefElts2, APInt &UndefElts3,
		std::function<void(Instruction *, unsigned, APInt, APInt &)>
		SimplifyAndSetOp,
		Value **ResultV) override {
		return Impl.simplifyDemandedVectorEltsIntrinsic(
		IC, II, DemandedElts, UndefElts, UndefElts2, UndefElts3,
		SimplifyAndSetOp, ResultV);
		}
bool isLegalAddImmediate(int64_t Imm) override {		bool isLegalAddImmediate(int64_t Imm) override {
return Impl.isLegalAddImmediate(Imm);		return Impl.isLegalAddImmediate(Imm);
}		}
bool isLegalICmpImmediate(int64_t Imm) override {		bool isLegalICmpImmediate(int64_t Imm) override {
return Impl.isLegalICmpImmediate(Imm);		return Impl.isLegalICmpImmediate(Imm);
}		}
bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,		bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
bool HasBaseReg, int64_t Scale, unsigned AddrSpace,		bool HasBaseReg, int64_t Scale, unsigned AddrSpace,

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

[141 lines hidden]	bool preferPredicateOverEpilogue(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
const LoopAccessInfo *LAI) const {		const LoopAccessInfo *LAI) const {
return false;		return false;
}		}

bool emitGetActiveLaneMask() const {		bool emitGetActiveLaneMask() const {
return false;		return false;
}		}

		bool instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		nikic: Actually implementing this would require us to export the `InstCombiner` class, which is part of `InstCombineInternal.h`. I don't think we would want to do this in its current form. This would require a larger refactoring to separate out the implementation and API portions of InstCombine.
		Instruction **ResultI) const {
		return false;
		}

		bool simplifyDemandedUseBitsIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		APInt DemandedMask, KnownBits &Known,
		bool &KnownBitsComputed,
		Value **ResultV) const {
		return false;
		}

		bool simplifyDemandedVectorEltsIntrinsic(
		InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
		APInt &UndefElts2, APInt &UndefElts3,
		std::function<void(Instruction *, unsigned, APInt, APInt &)>
		SimplifyAndSetOp,
		Value **ResultV) const {
		return false;
		}

void getUnrollingPreferences(Loop *, ScalarEvolution &,		void getUnrollingPreferences(Loop *, ScalarEvolution &,
TTI::UnrollingPreferences &) {}		TTI::UnrollingPreferences &) {}

bool isLegalAddImmediate(int64_t Imm) { return false; }		bool isLegalAddImmediate(int64_t Imm) { return false; }

bool isLegalICmpImmediate(int64_t Imm) { return false; }		bool isLegalICmpImmediate(int64_t Imm) { return false; }

bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,		bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[464 lines hidden]	bool preferPredicateOverEpilogue(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
const LoopAccessInfo *LAI) {		const LoopAccessInfo *LAI) {
return BaseT::preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI);		return BaseT::preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI);
}		}

bool emitGetActiveLaneMask() {		bool emitGetActiveLaneMask() {
return BaseT::emitGetActiveLaneMask();		return BaseT::emitGetActiveLaneMask();
}		}

		bool instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		Instruction **ResultI) {
		return BaseT::instCombineIntrinsic(IC, II, ResultI);
		}

		bool simplifyDemandedUseBitsIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		APInt DemandedMask, KnownBits &Known,
		bool &KnownBitsComputed,
		Value **ResultV) {
		return BaseT::simplifyDemandedUseBitsIntrinsic(IC, II, DemandedMask, Known,
		KnownBitsComputed, ResultV);
		}

		bool simplifyDemandedVectorEltsIntrinsic(
		InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
		APInt &UndefElts2, APInt &UndefElts3,
		std::function<void(Instruction *, unsigned, APInt, APInt &)>
		SimplifyAndSetOp,
		Value **ResultV) {
		return BaseT::simplifyDemandedVectorEltsIntrinsic(
		IC, II, DemandedElts, UndefElts, UndefElts2, UndefElts3,
		SimplifyAndSetOp, ResultV);
		}

int getInstructionLatency(const Instruction *I) {		int getInstructionLatency(const Instruction *I) {
if (isa<LoadInst>(I))		if (isa<LoadInst>(I))
return getST()->getSchedModel().DefaultLoadLatency;		return getST()->getSchedModel().DefaultLoadLatency;

return BaseT::getInstructionLatency(I);		return BaseT::getInstructionLatency(I);
}		}

virtual Optional<unsigned>		virtual Optional<unsigned>
[1,136 lines hidden]	for (unsigned ISD : ISDs) {
LegalCost.push_back(LT.first * 1);		LegalCost.push_back(LT.first * 1);
} else if (!TLI->isOperationExpand(ISD, LT.second)) {		} else if (!TLI->isOperationExpand(ISD, LT.second)) {
// If the operation is custom lowered then assume		// If the operation is custom lowered then assume
// that the code is twice as expensive.		// that the code is twice as expensive.
CustomCost.push_back(LT.first * 2);		CustomCost.push_back(LT.first * 2);
}		}
}		}

auto MinLegalCostI = std::min_element(LegalCost.begin(), LegalCost.end());		auto *MinLegalCostI = std::min_element(LegalCost.begin(), LegalCost.end());
if (MinLegalCostI != LegalCost.end())		if (MinLegalCostI != LegalCost.end())
return *MinLegalCostI;		return *MinLegalCostI;

auto MinCustomCostI =		auto MinCustomCostI =
std::min_element(CustomCost.begin(), CustomCost.end());		std::min_element(CustomCost.begin(), CustomCost.end());
if (MinCustomCostI != CustomCost.end())		if (MinCustomCostI != CustomCost.end())
return *MinCustomCostI;		return *MinCustomCostI;


llvm/include/llvm/Transforms/InstCombine/InstCombiner.h

This file was added.

				//===- InstCombiner.h - InstCombine implementation -------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				///
				/// This file provides the interface for the instcombine pass implementation.
				/// The interface is used for generic transformations in this folder and
				/// target specific combinations in the targets.
				/// The visitor implementation is in \c InstCombinerImpl in
				/// \c InstCombineInternal.h.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_INSTCOMBINE_INSTCOMBINER_H
				#define LLVM_TRANSFORMS_INSTCOMBINE_INSTCOMBINER_H

				#include "llvm/Analysis/InstructionSimplify.h"
				#include "llvm/Analysis/TargetFolder.h"
				#include "llvm/Analysis/ValueTracking.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/PatternMatch.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/KnownBits.h"
				#include "llvm/Transforms/InstCombine/InstCombineWorklist.h"
				#include "llvm/Transforms/Utils/Local.h"
				#include <cassert>

				lattner: Please minimize #includes in general, thanks :)
				#define DEBUG_TYPE "instcombine"

				namespace llvm {

				class AAResults;
				class AssumptionCache;
				class ProfileSummaryInfo;
				class TargetLibraryInfo;
				class TargetTransformInfo;

				/// The core instruction combiner logic.
				///
				/// This class provides both the logic to recursively visit instructions and
				/// combine them.
				class LLVM_LIBRARY_VISIBILITY InstCombiner {
				public:
				lattner: I would really rather not make this be a public class - this is a very thick interface. Can this be cut down to something much smaller than the implementation details of InstCombine? If you're curious for a pattern that could be followed, the MLIR AsmParser is a reasonable example. The parser is spread across a bunch of classes in the lib/ directory: https://github.com/llvm/llvm-project/blob/master/mlir/lib/Parser/Parser.cpp But then there is a much smaller public API exposed through a header: https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/IR/OpImplementation.h#L229
				nhaehnle: I agree with the sentiment, but note @Flakebi has split up the `InstCombiner` class into `InstCombiner` and `InstCombinerImpl` classes, which addresses those concerns already as far as I'm concerned. Looking through the new `InstCombiner`, aside from methods that are core to the workings of InstCombine (modifying instructions while keeping track of the Worklist) and methods for accessing the analyses, what's left is:
				- A bunch of static methods that should arguably just be global functions in a utils header somewhere.
				- CreateOverflowTuple and CreateNonTerminatorUnreachable
				Moving those methods feels sensible, but is likely to touch a lot of code, so I think it would be better to do it in a separate commit.
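				The interface/implementation split mentioned in this thread can be sketched as follows. The names `Combiner`, `CombinerImpl`, and `targetHook` are hypothetical and the classes are heavily reduced; the only detail taken from the patch is that the public class keeps worklist helpers and leaves `eraseInstFromFunction` as a pure virtual for the pass-internal subclass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for llvm::Instruction.
struct Instruction {
  bool Erased = false;
};

// Public-facing interface: worklist helpers only, no visitor logic.
class Combiner {
public:
  virtual ~Combiner() = default;
  void addToWorklist(Instruction *I) { Worklist.push_back(I); }
  std::size_t worklistSize() const { return Worklist.size(); }
  // Implemented by the pass-internal subclass.
  virtual Instruction *eraseInstFromFunction(Instruction &I) = 0;

protected:
  std::vector<Instruction *> Worklist;
};

// Pass-internal implementation; in a real layout this would live in a
// private header that targets never include.
class CombinerImpl final : public Combiner {
public:
  Instruction *eraseInstFromFunction(Instruction &I) override {
    I.Erased = true; // the sketch only marks the instruction
    return nullptr;
  }
};

// A target hook is written against the interface type alone.
Instruction *targetHook(Combiner &IC, Instruction &I) {
  IC.addToWorklist(&I);
  return IC.eraseInstFromFunction(I);
}
```

				The hook compiles against `Combiner` only, so the visitor methods of the implementation class never become part of the exported surface.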
				/// Maximum size of array considered when transforming.
				uint64_t MaxArraySizeForCombine = 0;

				/// An IRBuilder that automatically inserts new instructions into the
				/// worklist.
				using BuilderTy = IRBuilder<TargetFolder, IRBuilderCallbackInserter>;
				BuilderTy &Builder;

				/// Only used to call target specific inst combining.
				TargetTransformInfo &TTI;

				protected:
				/// A worklist of the instructions that need to be simplified.
				InstCombineWorklist &Worklist;

				// Mode in which we are running the combiner.
				const bool MinimizeSize;

				AAResults *AA;

				// Required analyses.
				AssumptionCache &AC;
				TargetLibraryInfo &TLI;
				DominatorTree &DT;
				const DataLayout &DL;
				const SimplifyQuery SQ;
				OptimizationRemarkEmitter &ORE;
				BlockFrequencyInfo *BFI;
				ProfileSummaryInfo *PSI;

				// Optional analyses. When non-null, these can both be used to do better
				// combining and will be updated to reflect any changes.
				LoopInfo *LI;

				bool MadeIRChange = false;

				public:
				InstCombiner(InstCombineWorklist &Worklist, BuilderTy &Builder,
				bool MinimizeSize, AAResults *AA, AssumptionCache &AC,
				TargetLibraryInfo &TLI, TargetTransformInfo &TTI,
				DominatorTree &DT, OptimizationRemarkEmitter &ORE,
				BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI,
				const DataLayout &DL, LoopInfo *LI)
				: Builder(Builder), TTI(TTI), Worklist(Worklist),
				MinimizeSize(MinimizeSize), AA(AA), AC(AC), TLI(TLI), DT(DT), DL(DL),
				SQ(DL, &TLI, &DT, &AC), ORE(ORE), BFI(BFI), PSI(PSI), LI(LI) {}

				virtual ~InstCombiner() {}

				/// Return the source operand of a potentially bitcasted value while
				/// optionally checking if it has one use. If there is no bitcast or the one
				/// use check is not met, return the input value itself.
				static Value *peekThroughBitcast(Value *V, bool OneUseOnly = false) {
				if (auto *BitCast = dyn_cast<BitCastInst>(V))
				if (!OneUseOnly || BitCast->hasOneUse())
				return BitCast->getOperand(0);

				// V is not a bitcast or V has more than one use and OneUseOnly is true.
				return V;
				}

				/// Assign a complexity or rank value to LLVM Values. This is used to reduce
				/// the amount of pattern matching needed for compares and commutative
				/// instructions. For example, if we have:
				/// icmp ugt X, Constant
				/// or
				/// xor (add X, Constant), cast Z
				///
				/// We do not have to consider the commuted variants of these patterns because
				/// canonicalization based on complexity guarantees the above ordering.
				///
				/// This routine maps IR values to various complexity ranks:
				/// 0 -> undef
				/// 1 -> Constants
				/// 2 -> Other non-instructions
				/// 3 -> Arguments
				/// 4 -> Cast and (f)neg/not instructions
				/// 5 -> Other instructions
				static unsigned getComplexity(Value *V) {
				if (isa<Instruction>(V)) {
				if (isa<CastInst>(V) || match(V, m_Neg(PatternMatch::m_Value())) ||
				match(V, m_Not(PatternMatch::m_Value())) ||
				match(V, m_FNeg(PatternMatch::m_Value())))
				return 4;
				return 5;
				}
				if (isa<Argument>(V))
				return 3;
				return isa<Constant>(V) ? (isa<UndefValue>(V) ? 0 : 1) : 2;
				}

				/// Predicate canonicalization reduces the number of patterns that need to be
				/// matched by other transforms. For example, we may swap the operands of a
				/// conditional branch or select to create a compare with a canonical
				/// (inverted) predicate which is then more likely to be matched with other
				/// values.
				static bool isCanonicalPredicate(CmpInst::Predicate Pred) {
				switch (Pred) {
				case CmpInst::ICMP_NE:
				case CmpInst::ICMP_ULE:
				case CmpInst::ICMP_SLE:
				case CmpInst::ICMP_UGE:
				case CmpInst::ICMP_SGE:
				// TODO: There are 16 FCMP predicates. Should others be (not) canonical?
				case CmpInst::FCMP_ONE:
				case CmpInst::FCMP_OLE:
				case CmpInst::FCMP_OGE:
				return false;
				default:
				return true;
				}
				}

				/// Given an exploded icmp instruction, return true if the comparison only
				/// checks the sign bit. If it only checks the sign bit, set TrueIfSigned if
				/// the result of the comparison is true when the input value is signed.
				static bool isSignBitCheck(ICmpInst::Predicate Pred, const APInt &RHS,
				bool &TrueIfSigned) {
				switch (Pred) {
				case ICmpInst::ICMP_SLT: // True if LHS s< 0
				TrueIfSigned = true;
				return RHS.isNullValue();
				case ICmpInst::ICMP_SLE: // True if LHS s<= -1
				TrueIfSigned = true;
				return RHS.isAllOnesValue();
				case ICmpInst::ICMP_SGT: // True if LHS s> -1
				TrueIfSigned = false;
				return RHS.isAllOnesValue();
				case ICmpInst::ICMP_SGE: // True if LHS s>= 0
				TrueIfSigned = false;
				return RHS.isNullValue();
				case ICmpInst::ICMP_UGT:
				// True if LHS u> RHS and RHS == sign-bit-mask - 1
				TrueIfSigned = true;
				return RHS.isMaxSignedValue();
				case ICmpInst::ICMP_UGE:
				// True if LHS u>= RHS and RHS == sign-bit-mask (2^7, 2^15, 2^31, etc)
				TrueIfSigned = true;
				return RHS.isMinSignedValue();
				case ICmpInst::ICMP_ULT:
				// True if LHS u< RHS and RHS == sign-bit-mask (2^7, 2^15, 2^31, etc)
				TrueIfSigned = false;
				return RHS.isMinSignedValue();
				case ICmpInst::ICMP_ULE:
				// True if LHS u<= RHS and RHS == sign-bit-mask - 1
				TrueIfSigned = false;
				return RHS.isMaxSignedValue();
				default:
				return false;
				}
				}

				/// Add one to a Constant
				static Constant *AddOne(Constant *C) {
				return ConstantExpr::getAdd(C, ConstantInt::get(C->getType(), 1));
				}

				/// Subtract one from a Constant
				static Constant *SubOne(Constant *C) {
				return ConstantExpr::getSub(C, ConstantInt::get(C->getType(), 1));
				}

				llvm::Optional<std::pair<
				CmpInst::Predicate,
				Constant *>> static getFlippedStrictnessPredicateAndConstant(CmpInst::
				Predicate
				Pred,
				Constant *C);

				/// Return true if the specified value is free to invert (apply ~ to).
				/// This happens in cases where the ~ can be eliminated. If WillInvertAllUses
				/// is true, work under the assumption that the caller intends to remove all
				/// uses of V and only keep uses of ~V.
				///
				/// See also: canFreelyInvertAllUsersOf()
				static bool isFreeToInvert(Value *V, bool WillInvertAllUses) {
				// ~(~(X)) -> X.
				if (match(V, m_Not(PatternMatch::m_Value())))
				return true;

				// Constants can be considered to be not'ed values.
				if (match(V, PatternMatch::m_AnyIntegralConstant()))
				return true;

				// Compares can be inverted if all of their uses are being modified to use
				// the ~V.
				if (isa<CmpInst>(V))
				return WillInvertAllUses;

				// If `V` is of the form `A + Constant` then `-1 - V` can be folded into
				// `(-1 - Constant) - A` if we are willing to invert all of the uses.
				if (BinaryOperator *BO = dyn_cast<BinaryOperator>(V))
				if (BO->getOpcode() == Instruction::Add ||
				BO->getOpcode() == Instruction::Sub)
				if (isa<Constant>(BO->getOperand(0)) ||
				isa<Constant>(BO->getOperand(1)))
				return WillInvertAllUses;

				// Selects with invertible operands are freely invertible
				if (match(V,
				m_Select(PatternMatch::m_Value(), m_Not(PatternMatch::m_Value()),
				m_Not(PatternMatch::m_Value()))))
				return WillInvertAllUses;

				return false;
				}

				/// Given i1 V, can every user of V be freely adapted if V is changed to !V ?
				///
				/// See also: isFreeToInvert()
				static bool canFreelyInvertAllUsersOf(Value *V, Value *IgnoredUser) {
				// Look at every user of V.
				for (User *U : V->users()) {
				if (U == IgnoredUser)
				continue; // Don't consider this user.

				auto *I = cast<Instruction>(U);
				switch (I->getOpcode()) {
				case Instruction::Select:
				case Instruction::Br:
				break; // Free to invert by swapping true/false values/destinations.
				case Instruction::Xor: // Can invert 'xor' if it's a 'not', by ignoring
				// it.
				if (!match(I, m_Not(PatternMatch::m_Value())))
				return false; // Not a 'not'.
				break;
				default:
				return false; // Don't know, likely not freely invertible.
				}
				// So far all users were free to invert...
				}
				return true; // Can freely invert all users!
				}

				/// Some binary operators require special handling to avoid poison and
				/// undefined behavior. If a constant vector has undef elements, replace those
				/// undefs with identity constants if possible because those are always safe
				/// to execute. If no identity constant exists, replace undef with some other
				/// safe constant.
				static Constant *
				getSafeVectorConstantForBinop(BinaryOperator::BinaryOps Opcode, Constant *In,
				bool IsRHSConstant) {
				auto *InVTy = dyn_cast<VectorType>(In->getType());
				assert(InVTy && "Not expecting scalars here");

				Type *EltTy = InVTy->getElementType();
				auto *SafeC = ConstantExpr::getBinOpIdentity(Opcode, EltTy, IsRHSConstant);
				if (!SafeC) {
				// TODO: Should this be available as a constant utility function? It is
				// similar to getBinOpAbsorber().
				if (IsRHSConstant) {
				switch (Opcode) {
				case Instruction::SRem: // X % 1 = 0
				case Instruction::URem: // X %u 1 = 0
				SafeC = ConstantInt::get(EltTy, 1);
				break;
				case Instruction::FRem: // X % 1.0 (doesn't simplify, but it is safe)
				SafeC = ConstantFP::get(EltTy, 1.0);
				break;
				default:
				llvm_unreachable(
				"Only rem opcodes have no identity constant for RHS");
				}
				} else {
				switch (Opcode) {
				case Instruction::Shl: // 0 << X = 0
				case Instruction::LShr: // 0 >>u X = 0
				case Instruction::AShr: // 0 >> X = 0
				case Instruction::SDiv: // 0 / X = 0
				case Instruction::UDiv: // 0 /u X = 0
				case Instruction::SRem: // 0 % X = 0
				case Instruction::URem: // 0 %u X = 0
				case Instruction::Sub: // 0 - X (doesn't simplify, but it is safe)
				case Instruction::FSub: // 0.0 - X (doesn't simplify, but it is safe)
				case Instruction::FDiv: // 0.0 / X (doesn't simplify, but it is safe)
				case Instruction::FRem: // 0.0 % X = 0
				SafeC = Constant::getNullValue(EltTy);
				break;
				default:
				llvm_unreachable("Expected to find identity constant for opcode");
				}
				}
				}
				assert(SafeC && "Must have safe constant for binop");
				unsigned NumElts = InVTy->getNumElements();
				SmallVector<Constant *, 16> Out(NumElts);
				for (unsigned i = 0; i != NumElts; ++i) {
				Constant *C = In->getAggregateElement(i);
				Out[i] = isa<UndefValue>(C) ? SafeC : C;
				}
				return ConstantVector::get(Out);
				}

				void addToWorklist(Instruction *I) { Worklist.push(I); }

				AssumptionCache &getAssumptionCache() const { return AC; }
				TargetLibraryInfo &getTargetLibraryInfo() const { return TLI; }
				DominatorTree &getDominatorTree() const { return DT; }
				const DataLayout &getDataLayout() const { return DL; }
				const SimplifyQuery &getSimplifyQuery() const { return SQ; }
				OptimizationRemarkEmitter &getOptimizationRemarkEmitter() const {
				return ORE;
				}
				BlockFrequencyInfo *getBlockFrequencyInfo() const { return BFI; }
				ProfileSummaryInfo *getProfileSummaryInfo() const { return PSI; }
				LoopInfo *getLoopInfo() const { return LI; }

				/// Inserts an instruction \p New before instruction \p Old
				///
				/// Also adds the new instruction to the worklist and returns \p New so that
				/// it is suitable for use as the return from the visitation patterns.
				Instruction *InsertNewInstBefore(Instruction *New, Instruction &Old) {
				assert(New && !New->getParent() &&
				"New instruction already inserted into a basic block!");
				BasicBlock *BB = Old.getParent();
				BB->getInstList().insert(Old.getIterator(), New); // Insert inst
				Worklist.push(New);
				return New;
				}

				/// Same as InsertNewInstBefore, but also sets the debug loc.
				Instruction *InsertNewInstWith(Instruction *New, Instruction &Old) {
				New->setDebugLoc(Old.getDebugLoc());
				return InsertNewInstBefore(New, Old);
				}

				/// A combiner-aware RAUW-like routine.
				///
				/// This method is to be used when an instruction is found to be dead,
				/// replaceable with another preexisting expression. Here we add all uses of
				/// I to the worklist, replace all uses of I with the new value, then return
				/// I, so that the inst combiner will know that I was modified.
				Instruction *replaceInstUsesWith(Instruction &I, Value *V) {
				// If there are no uses to replace, then we return nullptr to indicate that
				// no changes were made to the program.
				if (I.use_empty())
				return nullptr;

				Worklist.pushUsersToWorkList(I); // Add all modified instrs to worklist.

				// If we are replacing the instruction with itself, this must be in a
				// segment of unreachable code, so just clobber the instruction.
				if (&I == V)
				V = UndefValue::get(I.getType());

				LLVM_DEBUG(dbgs() << "IC: Replacing " << I << "\n"
				<< " with " << *V << '\n');

				I.replaceAllUsesWith(V);
				return &I;
				}

				/// Replace operand of instruction and add old operand to the worklist.
				Instruction *replaceOperand(Instruction &I, unsigned OpNum, Value *V) {
				Worklist.addValue(I.getOperand(OpNum));
				I.setOperand(OpNum, V);
				return &I;
				}

				/// Replace use and add the previously used value to the worklist.
				void replaceUse(Use &U, Value *NewValue) {
				Worklist.addValue(U);
				U = NewValue;
				}

				/// Creates a result tuple for an overflow intrinsic \p II with a given
				/// \p Result and a constant \p Overflow value.
				Instruction *CreateOverflowTuple(IntrinsicInst *II, Value *Result,
				Constant *Overflow) {
				Constant *V[] = {UndefValue::get(Result->getType()), Overflow};
				StructType *ST = cast<StructType>(II->getType());
				Constant *Struct = ConstantStruct::get(ST, V);
				return InsertValueInst::Create(Struct, Result, 0);
				}

				/// Create and insert the idiom we use to indicate a block is unreachable
				/// without having to rewrite the CFG from within InstCombine.
				void CreateNonTerminatorUnreachable(Instruction *InsertAt) {
				auto &Ctx = InsertAt->getContext();
				new StoreInst(ConstantInt::getTrue(Ctx),
				UndefValue::get(Type::getInt1PtrTy(Ctx)), InsertAt);
				}

				/// Combiner aware instruction erasure.
				///
				/// When dealing with an instruction that has side effects or produces a void
				/// value, we can't rely on DCE to delete the instruction. Instead, visit
				/// methods should return the value returned by this function.
				virtual Instruction *eraseInstFromFunction(Instruction &I) = 0;

				void computeKnownBits(const Value *V, KnownBits &Known, unsigned Depth,
				const Instruction *CxtI) const {
				llvm::computeKnownBits(V, Known, DL, Depth, &AC, CxtI, &DT);
				}

				KnownBits computeKnownBits(const Value *V, unsigned Depth,
				const Instruction *CxtI) const {
				return llvm::computeKnownBits(V, DL, Depth, &AC, CxtI, &DT);
				}

				bool isKnownToBeAPowerOfTwo(const Value *V, bool OrZero = false,
				unsigned Depth = 0,
				const Instruction *CxtI = nullptr) {
				return llvm::isKnownToBeAPowerOfTwo(V, DL, OrZero, Depth, &AC, CxtI, &DT);
				}

				bool MaskedValueIsZero(const Value *V, const APInt &Mask, unsigned Depth = 0,
				const Instruction *CxtI = nullptr) const {
				return llvm::MaskedValueIsZero(V, Mask, DL, Depth, &AC, CxtI, &DT);
				}

				unsigned ComputeNumSignBits(const Value *Op, unsigned Depth = 0,
				const Instruction *CxtI = nullptr) const {
				return llvm::ComputeNumSignBits(Op, DL, Depth, &AC, CxtI, &DT);
				}

				OverflowResult computeOverflowForUnsignedMul(const Value *LHS,
				const Value *RHS,
				const Instruction *CxtI) const {
				return llvm::computeOverflowForUnsignedMul(LHS, RHS, DL, &AC, CxtI, &DT);
				}

				OverflowResult computeOverflowForSignedMul(const Value *LHS, const Value *RHS,
				const Instruction *CxtI) const {
				return llvm::computeOverflowForSignedMul(LHS, RHS, DL, &AC, CxtI, &DT);
				}

				OverflowResult computeOverflowForUnsignedAdd(const Value *LHS,
				const Value *RHS,
				const Instruction *CxtI) const {
				return llvm::computeOverflowForUnsignedAdd(LHS, RHS, DL, &AC, CxtI, &DT);
				}

				OverflowResult computeOverflowForSignedAdd(const Value *LHS, const Value *RHS,
				const Instruction *CxtI) const {
				return llvm::computeOverflowForSignedAdd(LHS, RHS, DL, &AC, CxtI, &DT);
				}

				OverflowResult computeOverflowForUnsignedSub(const Value *LHS,
				const Value *RHS,
				const Instruction *CxtI) const {
				return llvm::computeOverflowForUnsignedSub(LHS, RHS, DL, &AC, CxtI, &DT);
				}

				OverflowResult computeOverflowForSignedSub(const Value *LHS, const Value *RHS,
				const Instruction *CxtI) const {
				return llvm::computeOverflowForSignedSub(LHS, RHS, DL, &AC, CxtI, &DT);
				}

				virtual bool SimplifyDemandedBits(Instruction *I, unsigned OpNo,
				const APInt &DemandedMask, KnownBits &Known,
				unsigned Depth = 0) = 0;
				virtual Value *
				SimplifyDemandedVectorElts(Value *V, APInt DemandedElts, APInt &UndefElts,
				unsigned Depth = 0,
				bool AllowMultipleUsers = false) = 0;
				};

				} // namespace llvm

				#undef DEBUG_TYPE

				#endif

llvm/lib/Analysis/TargetTransformInfo.cpp

		bool TargetTransformInfo::preferPredicateOverEpilogue(
		const LoopAccessInfo *LAI) const {
		return TTIImpl->preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI);
		}

		bool TargetTransformInfo::emitGetActiveLaneMask() const {
		return TTIImpl->emitGetActiveLaneMask();
		}

		bool TargetTransformInfo::instCombineIntrinsic(InstCombiner &IC,
		IntrinsicInst &II,
		Instruction **ResultI) const {
		return TTIImpl->instCombineIntrinsic(IC, II, ResultI);
		}

		bool TargetTransformInfo::simplifyDemandedUseBitsIntrinsic(
		InstCombiner &IC, IntrinsicInst &II, APInt DemandedMask, KnownBits &Known,
		bool &KnownBitsComputed, Value **ResultV) const {
		return TTIImpl->simplifyDemandedUseBitsIntrinsic(IC, II, DemandedMask, Known,
		KnownBitsComputed, ResultV);
		}

		bool TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic(
		InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
		APInt &UndefElts2, APInt &UndefElts3,
		std::function<void(Instruction *, unsigned, APInt, APInt &)>
		SimplifyAndSetOp,
		Value **ResultV) const {
		return TTIImpl->simplifyDemandedVectorEltsIntrinsic(
		IC, II, DemandedElts, UndefElts, UndefElts2, UndefElts3, SimplifyAndSetOp,
		ResultV);
		}

		void TargetTransformInfo::getUnrollingPreferences(
		Loop *L, ScalarEvolution &SE, UnrollingPreferences &UP) const {
		return TTIImpl->getUnrollingPreferences(L, SE, UP);
		}

		bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const {
		return TTIImpl->isLegalAddImmediate(Imm);
		}
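
The three new forwarding functions share one calling convention: the hook returns true when the target handled the intrinsic and reports its result through an out-parameter, and the TargetTransformInfo wrapper simply delegates to TTIImpl. A minimal standalone sketch of that convention follows. Inst, TTIImplBase, ToyTargetImpl and TTI are hypothetical stand-ins invented for illustration, not the LLVM classes, and the default of returning false when no target hook exists is an assumption of the sketch.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for an IR instruction; not llvm::Instruction.
struct Inst {
  int Opcode;
};

// Hypothetical base implementation: by default no target-specific combine
// exists, so the hook reports "not handled" and leaves the result untouched.
struct TTIImplBase {
  virtual ~TTIImplBase() = default;
  virtual bool instCombine(Inst &, Inst **) const { return false; }
};

// Hypothetical target that only knows how to rewrite opcode 42.
struct ToyTargetImpl : TTIImplBase {
  bool instCombine(Inst &I, Inst **Result) const override {
    if (I.Opcode != 42)
      return false;  // Not ours: the generic combiner keeps going.
    I.Opcode = 0;    // Rewrite in place ...
    *Result = &I;    // ... and report which instruction changed.
    return true;
  }
};

// Thin wrapper that forwards to the implementation, as the functions in
// this hunk forward to TTIImpl.
class TTI {
  std::unique_ptr<TTIImplBase> Impl;

public:
  explicit TTI(std::unique_ptr<TTIImplBase> P) : Impl(std::move(P)) {}
  bool instCombine(Inst &I, Inst **Result) const {
    return Impl->instCombine(I, Result);
  }
};
```

A caller would check the boolean first and read the out-parameter only when it is true, which is how the switch in AMDGPUInstCombineIntrinsic.cpp uses *ResultI.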

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp

This file was added.

				//===- AMDGPUInstCombineIntrinsic.cpp - AMDGPU specific InstCombine pass --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// \file
				// This file implements the AMDGPU-specific InstCombine hooks of the target's
				// TargetTransformInfo implementation. It uses the target's detailed
				// information to fold and canonicalize AMDGPU intrinsics, while letting the
				// target independent InstCombine handle the rest.
				//
				//===----------------------------------------------------------------------===//

				#include "AMDGPUTargetTransformInfo.h"
				#include "llvm/Transforms/InstCombine/InstCombiner.h"

				using namespace llvm;

				#define DEBUG_TYPE "AMDGPUtti"

				namespace {

				struct AMDGPUImageDMaskIntrinsic {
				unsigned Intr;
				};

				#define GET_AMDGPUImageDMaskIntrinsicTable_IMPL
				#include "InstCombineTables.inc"

				} // end anonymous namespace

				// Constant fold llvm.amdgcn.fmed3 intrinsics for standard inputs.
				//
				// A single NaN input is folded to minnum, so we rely on that folding for
				// handling NaNs.
				static APFloat fmed3AMDGCN(const APFloat &Src0, const APFloat &Src1,
				const APFloat &Src2) {
				APFloat Max3 = maxnum(maxnum(Src0, Src1), Src2);

				APFloat::cmpResult Cmp0 = Max3.compare(Src0);
				assert(Cmp0 != APFloat::cmpUnordered && "nans handled separately");
				if (Cmp0 == APFloat::cmpEqual)
				return maxnum(Src1, Src2);

				APFloat::cmpResult Cmp1 = Max3.compare(Src1);
				assert(Cmp1 != APFloat::cmpUnordered && "nans handled separately");
				if (Cmp1 == APFloat::cmpEqual)
				return maxnum(Src0, Src2);

				return maxnum(Src0, Src1);
				}
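
The median-of-three trick in fmed3AMDGCN can be checked in isolation: find the overall maximum, drop the operand that equals it, and take the maximum of the remaining two. The sketch below mirrors that structure for plain doubles. The name median3 is hypothetical, std::fmax stands in for APFloat's maxnum, and, as in the original, NaN inputs are assumed to have been handled beforehand.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar counterpart of fmed3AMDGCN. Assumes no input is NaN.
double median3(double A, double B, double C) {
  double Max3 = std::fmax(std::fmax(A, B), C);

  // If A is the maximum, the median is the larger of the other two.
  if (Max3 == A)
    return std::fmax(B, C);

  // Likewise when B is the maximum.
  if (Max3 == B)
    return std::fmax(A, C);

  // Otherwise C is the maximum.
  return std::fmax(A, B);
}
```

Ties need no special case: when two operands share the maximum, removing one of them still leaves the other, which is then the median.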

				bool GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
				Instruction **ResultI) const {
				Intrinsic::ID IID = II.getIntrinsicID();
				switch (IID) {
				default:
				break;
				case Intrinsic::amdgcn_rcp: {
				Value *Src = II.getArgOperand(0);

				// TODO: Move to ConstantFolding/InstSimplify?
				if (isa<UndefValue>(Src)) {
				Type *Ty = II.getType();
				auto *QNaN = ConstantFP::get(Ty, APFloat::getQNaN(Ty->getFltSemantics()));
				*ResultI = IC.replaceInstUsesWith(II, QNaN);
				return true;
				}

				if (II.isStrictFP())
				break;

				if (const ConstantFP *C = dyn_cast<ConstantFP>(Src)) {
				const APFloat &ArgVal = C->getValueAPF();
				APFloat Val(ArgVal.getSemantics(), 1);
				Val.divide(ArgVal, APFloat::rmNearestTiesToEven);

				// This is more precise than the instruction may give.
				//
				// TODO: The instruction always flushes denormal results (except for f16),
				// should this also?
				*ResultI =
				IC.replaceInstUsesWith(II, ConstantFP::get(II.getContext(), Val));
				return true;
				}

				break;
				}
				case Intrinsic::amdgcn_rsq: {
				Value *Src = II.getArgOperand(0);

				// TODO: Move to ConstantFolding/InstSimplify?
				if (isa<UndefValue>(Src)) {
				Type *Ty = II.getType();
				auto *QNaN = ConstantFP::get(Ty, APFloat::getQNaN(Ty->getFltSemantics()));
				*ResultI = IC.replaceInstUsesWith(II, QNaN);
				return true;
				}

				break;
				}
				case Intrinsic::amdgcn_frexp_mant:
				case Intrinsic::amdgcn_frexp_exp: {
				Value *Src = II.getArgOperand(0);
				if (const ConstantFP *C = dyn_cast<ConstantFP>(Src)) {
				int Exp;
				APFloat Significand =
				frexp(C->getValueAPF(), Exp, APFloat::rmNearestTiesToEven);

				if (IID == Intrinsic::amdgcn_frexp_mant) {
				*ResultI = IC.replaceInstUsesWith(
				II, ConstantFP::get(II.getContext(), Significand));
				return true;
				}

				// Match instruction special case behavior.
				if (Exp == APFloat::IEK_NaN || Exp == APFloat::IEK_Inf)
				Exp = 0;

				*ResultI =
				IC.replaceInstUsesWith(II, ConstantInt::get(II.getType(), Exp));
				return true;
				}

				if (isa<UndefValue>(Src)) {
				*ResultI = IC.replaceInstUsesWith(II, UndefValue::get(II.getType()));
				return true;
				}

				break;
				}
				case Intrinsic::amdgcn_class: {
				enum {
				S_NAN = 1 << 0, // Signaling NaN
				Q_NAN = 1 << 1, // Quiet NaN
				N_INFINITY = 1 << 2, // Negative infinity
				N_NORMAL = 1 << 3, // Negative normal
				N_SUBNORMAL = 1 << 4, // Negative subnormal
				N_ZERO = 1 << 5, // Negative zero
				P_ZERO = 1 << 6, // Positive zero
				P_SUBNORMAL = 1 << 7, // Positive subnormal
				P_NORMAL = 1 << 8, // Positive normal
				P_INFINITY = 1 << 9 // Positive infinity
				};

				const uint32_t FullMask = S_NAN | Q_NAN | N_INFINITY | N_NORMAL |
				N_SUBNORMAL | N_ZERO | P_ZERO | P_SUBNORMAL |
				P_NORMAL | P_INFINITY;

				Value *Src0 = II.getArgOperand(0);
				Value *Src1 = II.getArgOperand(1);
				const ConstantInt *CMask = dyn_cast<ConstantInt>(Src1);
				if (!CMask) {
				if (isa<UndefValue>(Src0)) {
				*ResultI = IC.replaceInstUsesWith(II, UndefValue::get(II.getType()));
				return true;
				}

				if (isa<UndefValue>(Src1)) {
				*ResultI =
				IC.replaceInstUsesWith(II, ConstantInt::get(II.getType(), false));
				return true;
				}
				break;
				}

				uint32_t Mask = CMask->getZExtValue();

				// If all tests are made, it doesn't matter what the value is.
				if ((Mask & FullMask) == FullMask) {
				*ResultI =
				IC.replaceInstUsesWith(II, ConstantInt::get(II.getType(), true));
				return true;
				}

				if ((Mask & FullMask) == 0) {
				*ResultI =
				IC.replaceInstUsesWith(II, ConstantInt::get(II.getType(), false));
				return true;
				}

				if (Mask == (S_NAN | Q_NAN)) {
				// Equivalent of isnan. Replace with standard fcmp.
				Value *FCmp = IC.Builder.CreateFCmpUNO(Src0, Src0);
				FCmp->takeName(&II);
				*ResultI = IC.replaceInstUsesWith(II, FCmp);
				return true;
				}

				if (Mask == (N_ZERO | P_ZERO)) {
				// Equivalent of == 0.
				Value *FCmp =
				IC.Builder.CreateFCmpOEQ(Src0, ConstantFP::get(Src0->getType(), 0.0));

				FCmp->takeName(&II);
				*ResultI = IC.replaceInstUsesWith(II, FCmp);
				return true;
				}

				// fp_class (nnan x), qnan|snan|other -> fp_class (nnan x), other
				if (((Mask & S_NAN) || (Mask & Q_NAN)) &&
				isKnownNeverNaN(Src0, &IC.getTargetLibraryInfo())) {
				*ResultI = IC.replaceOperand(
				II, 1, ConstantInt::get(Src1->getType(), Mask & ~(S_NAN | Q_NAN)));
				return true;
				}

				const ConstantFP *CVal = dyn_cast<ConstantFP>(Src0);
				if (!CVal) {
				if (isa<UndefValue>(Src0)) {
				*ResultI = IC.replaceInstUsesWith(II, UndefValue::get(II.getType()));
				return true;
				}

				// Clamp mask to used bits
				if ((Mask & FullMask) != Mask) {
				CallInst *NewCall = IC.Builder.CreateCall(
				II.getCalledFunction(),
				{Src0, ConstantInt::get(Src1->getType(), Mask & FullMask)});

				NewCall->takeName(&II);
				*ResultI = IC.replaceInstUsesWith(II, NewCall);
				return true;
				}

				break;
				}

				const APFloat &Val = CVal->getValueAPF();

				bool Result =
				((Mask & S_NAN) && Val.isNaN() && Val.isSignaling()) ||
				((Mask & Q_NAN) && Val.isNaN() && !Val.isSignaling()) ||
				((Mask & N_INFINITY) && Val.isInfinity() && Val.isNegative()) ||
				((Mask & N_NORMAL) && Val.isNormal() && Val.isNegative()) ||
				((Mask & N_SUBNORMAL) && Val.isDenormal() && Val.isNegative()) ||
				((Mask & N_ZERO) && Val.isZero() && Val.isNegative()) ||
				((Mask & P_ZERO) && Val.isZero() && !Val.isNegative()) ||
				((Mask & P_SUBNORMAL) && Val.isDenormal() && !Val.isNegative()) ||
				((Mask & P_NORMAL) && Val.isNormal() && !Val.isNegative()) ||
				((Mask & P_INFINITY) && Val.isInfinity() && !Val.isNegative());

				*ResultI =
				IC.replaceInstUsesWith(II, ConstantInt::get(II.getType(), Result));
				return true;
				}
				case Intrinsic::amdgcn_cvt_pkrtz: {
				Value *Src0 = II.getArgOperand(0);
				Value *Src1 = II.getArgOperand(1);
				if (const ConstantFP *C0 = dyn_cast<ConstantFP>(Src0)) {
				if (const ConstantFP *C1 = dyn_cast<ConstantFP>(Src1)) {
				const fltSemantics &HalfSem =
				II.getType()->getScalarType()->getFltSemantics();
				bool LosesInfo;
				APFloat Val0 = C0->getValueAPF();
				APFloat Val1 = C1->getValueAPF();
				Val0.convert(HalfSem, APFloat::rmTowardZero, &LosesInfo);
				Val1.convert(HalfSem, APFloat::rmTowardZero, &LosesInfo);

				Constant *Folded =
				ConstantVector::get({ConstantFP::get(II.getContext(), Val0),
				ConstantFP::get(II.getContext(), Val1)});
				*ResultI = IC.replaceInstUsesWith(II, Folded);
				return true;
				}
				}

				if (isa<UndefValue>(Src0) && isa<UndefValue>(Src1)) {
				*ResultI = IC.replaceInstUsesWith(II, UndefValue::get(II.getType()));
				return true;
				}

				break;
				}
				case Intrinsic::amdgcn_cvt_pknorm_i16:
				case Intrinsic::amdgcn_cvt_pknorm_u16:
				case Intrinsic::amdgcn_cvt_pk_i16:
				case Intrinsic::amdgcn_cvt_pk_u16: {
				Value *Src0 = II.getArgOperand(0);
				Value *Src1 = II.getArgOperand(1);

				if (isa<UndefValue>(Src0) && isa<UndefValue>(Src1)) {
				*ResultI = IC.replaceInstUsesWith(II, UndefValue::get(II.getType()));
				return true;
				}

				break;
				}
				case Intrinsic::amdgcn_ubfe:
				case Intrinsic::amdgcn_sbfe: {
				// Decompose simple cases into standard shifts.
				Value *Src = II.getArgOperand(0);
				if (isa<UndefValue>(Src)) {
				*ResultI = IC.replaceInstUsesWith(II, Src);
				return true;
				}

				unsigned Width;
				Type *Ty = II.getType();
				unsigned IntSize = Ty->getIntegerBitWidth();

				ConstantInt *CWidth = dyn_cast<ConstantInt>(II.getArgOperand(2));
				if (CWidth) {
				Width = CWidth->getZExtValue();
				if ((Width & (IntSize - 1)) == 0) {
				*ResultI = IC.replaceInstUsesWith(II, ConstantInt::getNullValue(Ty));
				return true;
				}

				// Hardware ignores high bits, so remove those.
				if (Width >= IntSize) {
				*ResultI = IC.replaceOperand(
				II, 2, ConstantInt::get(CWidth->getType(), Width & (IntSize - 1)));
				return true;
				}
				}

				unsigned Offset;
				ConstantInt *COffset = dyn_cast<ConstantInt>(II.getArgOperand(1));
				if (COffset) {
				Offset = COffset->getZExtValue();
				if (Offset >= IntSize) {
				*ResultI = IC.replaceOperand(
				II, 1,
				ConstantInt::get(COffset->getType(), Offset & (IntSize - 1)));
				return true;
				}
				}

				bool Signed = IID == Intrinsic::amdgcn_sbfe;

				if (!CWidth || !COffset)
				break;

				// The case of Width == 0 is handled above, which makes this transformation
				// safe. If Width == 0, then the ashr and lshr instructions become poison
				// value since the shift amount would be equal to the bit size.
				assert(Width != 0);

				// TODO: This allows folding to undef when the hardware has specific
				// behavior?
				if (Offset + Width < IntSize) {
				Value *Shl = IC.Builder.CreateShl(Src, IntSize - Offset - Width);
				Value *RightShift = Signed ? IC.Builder.CreateAShr(Shl, IntSize - Width)
				: IC.Builder.CreateLShr(Shl, IntSize - Width);
				RightShift->takeName(&II);
				*ResultI = IC.replaceInstUsesWith(II, RightShift);
				return true;
				}

				Value *RightShift = Signed ? IC.Builder.CreateAShr(Src, Offset)
				: IC.Builder.CreateLShr(Src, Offset);

				RightShift->takeName(&II);
				*ResultI = IC.replaceInstUsesWith(II, RightShift);
				return true;
				}
				case Intrinsic::amdgcn_exp:
				case Intrinsic::amdgcn_exp_compr: {
				ConstantInt *En = cast<ConstantInt>(II.getArgOperand(1));
				unsigned EnBits = En->getZExtValue();
				if (EnBits == 0xf)
				break; // All inputs enabled.

				bool IsCompr = IID == Intrinsic::amdgcn_exp_compr;
				bool Changed = false;
				for (int I = 0; I < (IsCompr ? 2 : 4); ++I) {
				if ((!IsCompr && (EnBits & (1 << I)) == 0) ||
				(IsCompr && ((EnBits & (0x3 << (2 * I))) == 0))) {
				Value *Src = II.getArgOperand(I + 2);
				if (!isa<UndefValue>(Src)) {
				IC.replaceOperand(II, I + 2, UndefValue::get(Src->getType()));
				Changed = true;
				}
				}
				}

				if (Changed) {
				*ResultI = &II;
				return true;
				}

				break;
				}
				case Intrinsic::amdgcn_fmed3: {
				// Note this does not preserve proper sNaN behavior if IEEE-mode is enabled
				// for the shader.

				Value *Src0 = II.getArgOperand(0);
				Value *Src1 = II.getArgOperand(1);
				Value *Src2 = II.getArgOperand(2);

				// Checking for NaN before canonicalization provides better fidelity when
				// mapping other operations onto fmed3 since the order of operands is
				// unchanged.
				CallInst *NewCall = nullptr;
				if (match(Src0, PatternMatch::m_NaN()) || isa<UndefValue>(Src0)) {
				NewCall = IC.Builder.CreateMinNum(Src1, Src2);
				} else if (match(Src1, PatternMatch::m_NaN()) || isa<UndefValue>(Src1)) {
				NewCall = IC.Builder.CreateMinNum(Src0, Src2);
				} else if (match(Src2, PatternMatch::m_NaN()) || isa<UndefValue>(Src2)) {
				NewCall = IC.Builder.CreateMaxNum(Src0, Src1);
				}

				if (NewCall) {
				NewCall->copyFastMathFlags(&II);
				NewCall->takeName(&II);
				*ResultI = IC.replaceInstUsesWith(II, NewCall);
				return true;
				}

				bool Swap = false;
				// Canonicalize constants to RHS operands.
				//
				// fmed3(c0, x, c1) -> fmed3(x, c0, c1)
				if (isa<Constant>(Src0) && !isa<Constant>(Src1)) {
				std::swap(Src0, Src1);
				Swap = true;
				}

				if (isa<Constant>(Src1) && !isa<Constant>(Src2)) {
				std::swap(Src1, Src2);
				Swap = true;
				}

				if (isa<Constant>(Src0) && !isa<Constant>(Src1)) {
				std::swap(Src0, Src1);
				Swap = true;
				}

				if (Swap) {
				II.setArgOperand(0, Src0);
				II.setArgOperand(1, Src1);
				II.setArgOperand(2, Src2);
				*ResultI = &II;
				return true;
				}

				if (const ConstantFP *C0 = dyn_cast<ConstantFP>(Src0)) {
				if (const ConstantFP *C1 = dyn_cast<ConstantFP>(Src1)) {
				if (const ConstantFP *C2 = dyn_cast<ConstantFP>(Src2)) {
				APFloat Result = fmed3AMDGCN(C0->getValueAPF(), C1->getValueAPF(),
				C2->getValueAPF());
				*ResultI = IC.replaceInstUsesWith(
				II, ConstantFP::get(IC.Builder.getContext(), Result));
				return true;
				}
				}
				}

				break;
				}
				case Intrinsic::amdgcn_icmp:
				case Intrinsic::amdgcn_fcmp: {
				const ConstantInt *CC = cast<ConstantInt>(II.getArgOperand(2));
				// Guard against invalid arguments.
				int64_t CCVal = CC->getZExtValue();
				bool IsInteger = IID == Intrinsic::amdgcn_icmp;
				if ((IsInteger && (CCVal < CmpInst::FIRST_ICMP_PREDICATE ||
				CCVal > CmpInst::LAST_ICMP_PREDICATE)) ||
				(!IsInteger && (CCVal < CmpInst::FIRST_FCMP_PREDICATE ||
				CCVal > CmpInst::LAST_FCMP_PREDICATE)))
				break;

				Value *Src0 = II.getArgOperand(0);
				Value *Src1 = II.getArgOperand(1);

				if (auto *CSrc0 = dyn_cast<Constant>(Src0)) {
				if (auto *CSrc1 = dyn_cast<Constant>(Src1)) {
				Constant *CCmp = ConstantExpr::getCompare(CCVal, CSrc0, CSrc1);
				if (CCmp->isNullValue()) {
				*ResultI = IC.replaceInstUsesWith(
				II, ConstantExpr::getSExt(CCmp, II.getType()));
				return true;
				}

				// The result of V_ICMP/V_FCMP assembly instructions (which this
				// intrinsic exposes) is one bit per thread, masked with the EXEC
				// register (which contains the bitmask of live threads). So a
				// comparison that always returns true is the same as a read of the
				// EXEC register.
				Function *NewF = Intrinsic::getDeclaration(
				II.getModule(), Intrinsic::read_register, II.getType());
				Metadata *MDArgs[] = {MDString::get(II.getContext(), "exec")};
				MDNode *MD = MDNode::get(II.getContext(), MDArgs);
				Value *Args[] = {MetadataAsValue::get(II.getContext(), MD)};
				CallInst *NewCall = IC.Builder.CreateCall(NewF, Args);
				NewCall->addAttribute(AttributeList::FunctionIndex,
				Attribute::Convergent);
				NewCall->takeName(&II);
				*ResultI = IC.replaceInstUsesWith(II, NewCall);
				return true;
				}

				// Canonicalize constants to RHS.
				CmpInst::Predicate SwapPred =
				CmpInst::getSwappedPredicate(static_cast<CmpInst::Predicate>(CCVal));
				II.setArgOperand(0, Src1);
				II.setArgOperand(1, Src0);
				II.setArgOperand(
				2, ConstantInt::get(CC->getType(), static_cast<int>(SwapPred)));
				*ResultI = &II;
				return true;
				}

				if (CCVal != CmpInst::ICMP_EQ && CCVal != CmpInst::ICMP_NE)
				break;

				// Canonicalize compare eq with true value to compare != 0
				// llvm.amdgcn.icmp(zext (i1 x), 1, eq)
				// -> llvm.amdgcn.icmp(zext (i1 x), 0, ne)
				// llvm.amdgcn.icmp(sext (i1 x), -1, eq)
				// -> llvm.amdgcn.icmp(sext (i1 x), 0, ne)
				Value *ExtSrc;
				if (CCVal == CmpInst::ICMP_EQ &&
				((match(Src1, PatternMatch::m_One()) &&
				match(Src0, m_ZExt(PatternMatch::m_Value(ExtSrc)))) ||
				(match(Src1, PatternMatch::m_AllOnes()) &&
				match(Src0, m_SExt(PatternMatch::m_Value(ExtSrc))))) &&
				ExtSrc->getType()->isIntegerTy(1)) {
				IC.replaceOperand(II, 1, ConstantInt::getNullValue(Src1->getType()));
				IC.replaceOperand(II, 2,
				ConstantInt::get(CC->getType(), CmpInst::ICMP_NE));
				*ResultI = &II;
				return true;
				}

				CmpInst::Predicate SrcPred;
				Value *SrcLHS;
				Value *SrcRHS;

				// Fold compare eq/ne with 0 from a compare result as the predicate to the
				// intrinsic. The typical use is a wave vote function in the library, which
				// will be fed from a user code condition compared with 0. Fold in the
				// redundant compare.

				// llvm.amdgcn.icmp([sz]ext ([if]cmp pred a, b), 0, ne)
				// -> llvm.amdgcn.[if]cmp(a, b, pred)
				//
				// llvm.amdgcn.icmp([sz]ext ([if]cmp pred a, b), 0, eq)
				// -> llvm.amdgcn.[if]cmp(a, b, inv pred)
				if (match(Src1, PatternMatch::m_Zero()) &&
				match(Src0, PatternMatch::m_ZExtOrSExt(
				m_Cmp(SrcPred, PatternMatch::m_Value(SrcLHS),
				PatternMatch::m_Value(SrcRHS))))) {
				if (CCVal == CmpInst::ICMP_EQ)
				SrcPred = CmpInst::getInversePredicate(SrcPred);

				Intrinsic::ID NewIID = CmpInst::isFPPredicate(SrcPred)
				? Intrinsic::amdgcn_fcmp
				: Intrinsic::amdgcn_icmp;

				Type *Ty = SrcLHS->getType();
				if (auto *CmpType = dyn_cast<IntegerType>(Ty)) {
				// Promote to next legal integer type.
				unsigned Width = CmpType->getBitWidth();
				unsigned NewWidth = Width;

				// Don't do anything for i1 comparisons.
				if (Width == 1)
				break;

				if (Width <= 16)
				NewWidth = 16;
				else if (Width <= 32)
				NewWidth = 32;
				else if (Width <= 64)
				NewWidth = 64;
				else if (Width > 64)
				break; // Can't handle this.

				if (Width != NewWidth) {
				IntegerType *CmpTy = IC.Builder.getIntNTy(NewWidth);
				if (CmpInst::isSigned(SrcPred)) {
				SrcLHS = IC.Builder.CreateSExt(SrcLHS, CmpTy);
				SrcRHS = IC.Builder.CreateSExt(SrcRHS, CmpTy);
				} else {
				SrcLHS = IC.Builder.CreateZExt(SrcLHS, CmpTy);
				SrcRHS = IC.Builder.CreateZExt(SrcRHS, CmpTy);
				}
				}
				} else if (!Ty->isFloatTy() && !Ty->isDoubleTy() && !Ty->isHalfTy())
				break;

				Function *NewF = Intrinsic::getDeclaration(
				II.getModule(), NewIID, {II.getType(), SrcLHS->getType()});
				Value *Args[] = {SrcLHS, SrcRHS,
				ConstantInt::get(CC->getType(), SrcPred)};
				CallInst *NewCall = IC.Builder.CreateCall(NewF, Args);
				NewCall->takeName(&II);
				*ResultI = IC.replaceInstUsesWith(II, NewCall);
				return true;
				}

				break;
				}
				case Intrinsic::amdgcn_ballot: {
				if (auto *Src = dyn_cast<ConstantInt>(II.getArgOperand(0))) {
				if (Src->isZero()) {
				// amdgcn.ballot(i1 0) is zero.
				*ResultI =
				IC.replaceInstUsesWith(II, Constant::getNullValue(II.getType()));
				return true;
				}

				if (Src->isOne()) {
				// amdgcn.ballot(i1 1) is exec.
				const char *RegName = "exec";
				if (II.getType()->isIntegerTy(32))
				RegName = "exec_lo";
				else if (!II.getType()->isIntegerTy(64))
				break;

				Function *NewF = Intrinsic::getDeclaration(
				II.getModule(), Intrinsic::read_register, II.getType());
				Metadata *MDArgs[] = {MDString::get(II.getContext(), RegName)};
				MDNode *MD = MDNode::get(II.getContext(), MDArgs);
				Value *Args[] = {MetadataAsValue::get(II.getContext(), MD)};
				CallInst *NewCall = IC.Builder.CreateCall(NewF, Args);
				NewCall->addAttribute(AttributeList::FunctionIndex,
				Attribute::Convergent);
				NewCall->takeName(&II);
				*ResultI = IC.replaceInstUsesWith(II, NewCall);
				return true;
				}
				}
				break;
				}
				case Intrinsic::amdgcn_wqm_vote: {
				// wqm_vote is identity when the argument is constant.
				if (!isa<Constant>(II.getArgOperand(0)))
				break;

				*ResultI = IC.replaceInstUsesWith(II, II.getArgOperand(0));
				return true;
				}
				case Intrinsic::amdgcn_kill: {
				const ConstantInt *C = dyn_cast<ConstantInt>(II.getArgOperand(0));
				if (!C || !C->getZExtValue())
				break;

				// amdgcn.kill(i1 1) is a no-op
				*ResultI = IC.eraseInstFromFunction(II);
				return true;
				}
				case Intrinsic::amdgcn_update_dpp: {
				Value *Old = II.getArgOperand(0);

				auto *BC = cast<ConstantInt>(II.getArgOperand(5));
				auto *RM = cast<ConstantInt>(II.getArgOperand(3));
				auto *BM = cast<ConstantInt>(II.getArgOperand(4));
				if (BC->isZeroValue() || RM->getZExtValue() != 0xF ||
				BM->getZExtValue() != 0xF || isa<UndefValue>(Old))
				break;

				// If bound_ctrl = 1, row mask = bank mask = 0xf we can omit old value.
				*ResultI = IC.replaceOperand(II, 0, UndefValue::get(Old->getType()));
				return true;
				}
				case Intrinsic::amdgcn_permlane16:
				case Intrinsic::amdgcn_permlanex16: {
				// Discard vdst_in if it's not going to be read.
				Value *VDstIn = II.getArgOperand(0);
				if (isa<UndefValue>(VDstIn))
				break;

				ConstantInt *FetchInvalid = cast<ConstantInt>(II.getArgOperand(4));
				ConstantInt *BoundCtrl = cast<ConstantInt>(II.getArgOperand(5));
				if (!FetchInvalid->getZExtValue() && !BoundCtrl->getZExtValue())
				break;

				*ResultI = IC.replaceOperand(II, 0, UndefValue::get(VDstIn->getType()));
				return true;
				}
				case Intrinsic::amdgcn_readfirstlane:
				case Intrinsic::amdgcn_readlane: {
				// A constant value is trivially uniform.
				if (Constant *C = dyn_cast<Constant>(II.getArgOperand(0))) {
				*ResultI = IC.replaceInstUsesWith(II, C);
				return true;
				}

				// The rest of these may not be safe if the exec may not be the same between
				// the def and use.
				Value *Src = II.getArgOperand(0);
				Instruction *SrcInst = dyn_cast<Instruction>(Src);
				if (SrcInst && SrcInst->getParent() != II.getParent())
				break;

				// readfirstlane (readfirstlane x) -> readfirstlane x
				// readlane (readfirstlane x), y -> readfirstlane x
				if (match(Src,
				PatternMatch::m_Intrinsic<Intrinsic::amdgcn_readfirstlane>())) {
				*ResultI = IC.replaceInstUsesWith(II, Src);
				return true;
				}

				if (IID == Intrinsic::amdgcn_readfirstlane) {
				// readfirstlane (readlane x, y) -> readlane x, y
				if (match(Src, PatternMatch::m_Intrinsic<Intrinsic::amdgcn_readlane>())) {
				*ResultI = IC.replaceInstUsesWith(II, Src);
				return true;
				}
				} else {
				// readlane (readlane x, y), y -> readlane x, y
				if (match(Src, PatternMatch::m_Intrinsic<Intrinsic::amdgcn_readlane>(
				PatternMatch::m_Value(),
				PatternMatch::m_Specific(II.getArgOperand(1))))) {
				*ResultI = IC.replaceInstUsesWith(II, Src);
				return true;
				}
				}

				break;
				}
				case Intrinsic::amdgcn_ldexp: {
				// FIXME: This doesn't introduce new instructions and belongs in
				// InstructionSimplify.
				Type *Ty = II.getType();
				Value *Op0 = II.getArgOperand(0);
				Value *Op1 = II.getArgOperand(1);

				// Folding undef to qnan is safe regardless of the FP mode.
				if (isa<UndefValue>(Op0)) {
				auto *QNaN = ConstantFP::get(Ty, APFloat::getQNaN(Ty->getFltSemantics()));
				*ResultI = IC.replaceInstUsesWith(II, QNaN);
				return true;
				}

				const APFloat *C = nullptr;
				match(Op0, PatternMatch::m_APFloat(C));

				// FIXME: Should flush denorms depending on FP mode, but that's ignored
				// everywhere else.
				//
				// These cases should be safe, even with strictfp.
				// ldexp(0.0, x) -> 0.0
				// ldexp(-0.0, x) -> -0.0
				// ldexp(inf, x) -> inf
				// ldexp(-inf, x) -> -inf
				if (C && (C->isZero() || C->isInfinity())) {
				*ResultI = IC.replaceInstUsesWith(II, Op0);
				return true;
				}

				// With strictfp, be more careful about possibly needing to flush denormals
				// or not, and snan behavior depends on ieee_mode.
				if (II.isStrictFP())
				break;

				if (C && C->isNaN()) {
				// FIXME: We just need to make the nan quiet here, but that's unavailable
				// on APFloat, only IEEEfloat
				auto *Quieted =
				ConstantFP::get(Ty, scalbn(*C, 0, APFloat::rmNearestTiesToEven));
				*ResultI = IC.replaceInstUsesWith(II, Quieted);
				return true;
				}

				// ldexp(x, 0) -> x
				// ldexp(x, undef) -> x
				if (isa<UndefValue>(Op1) || match(Op1, PatternMatch::m_ZeroInt())) {
				*ResultI = IC.replaceInstUsesWith(II, Op0);
				return true;
				}

				break;
				}
				}
				return false;
				}
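
The amdgcn_ubfe / amdgcn_sbfe case above decomposes a bitfield extract into a left shift followed by a logical or arithmetic right shift. The standalone sketch below reproduces that arithmetic on 32-bit integers so it can be checked by hand. The helper names ubfe32 and sbfe32 are hypothetical, and the sketch assumes 0 < Width < 32 and Offset < 32, which is the state the combine reaches after it has clamped both operands. It also assumes that a right shift of a negative int32_t is arithmetic and that the unsigned-to-signed conversion wraps, which C++20 guarantees and earlier standards leave implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference for the unsigned extract.
// Preconditions (assumed): 0 < Width < 32 and Offset < 32.
uint32_t ubfe32(uint32_t Src, unsigned Offset, unsigned Width) {
  if (Offset + Width < 32) {
    // Push the field to the top of the word, then shift it back down so
    // that zeros fill the high bits.
    uint32_t Shl = Src << (32 - Offset - Width);
    return Shl >> (32 - Width);
  }
  // The field reaches the top bit: a single right shift is enough.
  return Src >> Offset;
}

// Hypothetical reference for the signed extract; same preconditions.
int32_t sbfe32(int32_t Src, unsigned Offset, unsigned Width) {
  if (Offset + Width < 32) {
    uint32_t Shl = static_cast<uint32_t>(Src) << (32 - Offset - Width);
    // Arithmetic shift replicates the field's top bit into the high bits.
    return static_cast<int32_t>(Shl) >> (32 - Width);
  }
  return Src >> Offset;
}
```

Both shift amounts stay in the range 1 to 31 under the stated preconditions, which is why the combine asserts Width != 0 before building the shifts.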

				/// Implement SimplifyDemandedVectorElts for amdgcn buffer and image intrinsics.
				///
				/// Note: This only supports non-TFE/LWE image intrinsic calls; those have
				/// struct returns.
				Value *simplifyAMDGCNMemoryIntrinsicDemanded(InstCombiner &IC,
				IntrinsicInst &II,
				APInt DemandedElts,
				int DMaskIdx = -1) {

				// FIXME: Allow v3i16/v3f16 in buffer intrinsics when the types are fully
				// supported.
				if (DMaskIdx < 0 && II.getType()->getScalarSizeInBits() != 32 &&
				DemandedElts.getActiveBits() == 3)
				return nullptr;

				auto *IIVTy = cast<VectorType>(II.getType());
				unsigned VWidth = IIVTy->getNumElements();
				if (VWidth == 1)
				return nullptr;

				IRBuilderBase::InsertPointGuard Guard(IC.Builder);
				IC.Builder.SetInsertPoint(&II);

				// Assume the arguments are unchanged and later override them, if needed.
				SmallVector<Value *, 16> Args(II.arg_begin(), II.arg_end());

				if (DMaskIdx < 0) {
				// Buffer case.

				const unsigned ActiveBits = DemandedElts.getActiveBits();
				const unsigned UnusedComponentsAtFront = DemandedElts.countTrailingZeros();

				// Start assuming the prefix of elements is demanded, but possibly clear
				// some other bits if there are trailing zeros (unused components at front)
				// and update offset.
				DemandedElts = (1 << ActiveBits) - 1;

				if (UnusedComponentsAtFront > 0) {
				static const unsigned InvalidOffsetIdx = 0xf;

				unsigned OffsetIdx;
				switch (II.getIntrinsicID()) {
				case Intrinsic::amdgcn_raw_buffer_load:
				OffsetIdx = 1;
				break;
				case Intrinsic::amdgcn_s_buffer_load:
				// If resulting type is vec3, there is no point in trimming the
				// load with updated offset, as the vec3 would most likely be widened to
				// vec4 anyway during lowering.
				if (ActiveBits == 4 && UnusedComponentsAtFront == 1)
				OffsetIdx = InvalidOffsetIdx;
				else
				OffsetIdx = 1;
				break;
				case Intrinsic::amdgcn_struct_buffer_load:
				OffsetIdx = 2;
				break;
				default:
				// TODO: handle tbuffer* intrinsics.
				OffsetIdx = InvalidOffsetIdx;
				break;
				}

				if (OffsetIdx != InvalidOffsetIdx) {
				// Clear demanded bits and update the offset.
				DemandedElts &= ~((1 << UnusedComponentsAtFront) - 1);
				auto *Offset = II.getArgOperand(OffsetIdx);
				unsigned SingleComponentSizeInBits =
				IC.getDataLayout().getTypeSizeInBits(II.getType()->getScalarType());
				unsigned OffsetAdd =
				UnusedComponentsAtFront * SingleComponentSizeInBits / 8;
				auto *OffsetAddVal = ConstantInt::get(Offset->getType(), OffsetAdd);
				Args[OffsetIdx] = IC.Builder.CreateAdd(Offset, OffsetAddVal);
				}
				}
				} else {
				// Image case.

				ConstantInt *DMask = cast<ConstantInt>(II.getArgOperand(DMaskIdx));
				unsigned DMaskVal = DMask->getZExtValue() & 0xf;

				// Mask off values that are undefined because the dmask doesn't cover them
				DemandedElts &= (1 << countPopulation(DMaskVal)) - 1;

				unsigned NewDMaskVal = 0;
				unsigned OrigLoadIdx = 0;
				for (unsigned SrcIdx = 0; SrcIdx < 4; ++SrcIdx) {
				const unsigned Bit = 1 << SrcIdx;
				if (!!(DMaskVal & Bit)) {
				if (!!DemandedElts[OrigLoadIdx])
				NewDMaskVal |= Bit;
				OrigLoadIdx++;
				}
				}

				if (DMaskVal != NewDMaskVal)
				Args[DMaskIdx] = ConstantInt::get(DMask->getType(), NewDMaskVal);
				}

				unsigned NewNumElts = DemandedElts.countPopulation();
				if (!NewNumElts)
				return UndefValue::get(II.getType());

				if (NewNumElts >= VWidth && DemandedElts.isMask()) {
				if (DMaskIdx >= 0)
				II.setArgOperand(DMaskIdx, Args[DMaskIdx]);
				return nullptr;
				}

				// Determine the overload types of the original intrinsic.
				auto IID = II.getIntrinsicID();
				SmallVector<Intrinsic::IITDescriptor, 16> Table;
				getIntrinsicInfoTableEntries(IID, Table);
				ArrayRef<Intrinsic::IITDescriptor> TableRef = Table;

				// Validate function argument and return types, extracting overloaded types
				// along the way.
				FunctionType *FTy = II.getCalledFunction()->getFunctionType();
				SmallVector<Type *, 6> OverloadTys;
				Intrinsic::matchIntrinsicSignature(FTy, TableRef, OverloadTys);

				Module *M = II.getParent()->getParent()->getParent();
				Type *EltTy = IIVTy->getElementType();
				Type *NewTy =
				(NewNumElts == 1) ? EltTy : FixedVectorType::get(EltTy, NewNumElts);

				OverloadTys[0] = NewTy;
				Function *NewIntrin = Intrinsic::getDeclaration(M, IID, OverloadTys);

				CallInst *NewCall = IC.Builder.CreateCall(NewIntrin, Args);
				NewCall->takeName(&II);
				NewCall->copyMetadata(II);

				if (NewNumElts == 1) {
				return IC.Builder.CreateInsertElement(UndefValue::get(II.getType()),
				NewCall,
				DemandedElts.countTrailingZeros());
				}

				SmallVector<int, 8> EltMask;
				unsigned NewLoadIdx = 0;
				for (unsigned OrigLoadIdx = 0; OrigLoadIdx < VWidth; ++OrigLoadIdx) {
				if (!!DemandedElts[OrigLoadIdx])
				EltMask.push_back(NewLoadIdx++);
				else
				EltMask.push_back(NewNumElts);
				}

				Value *Shuffle =
				IC.Builder.CreateShuffleVector(NewCall, UndefValue::get(NewTy), EltMask);

				return Shuffle;
				}
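
Note on the image case above: the dmask loop and the final shuffle mask can be modelled without any LLVM types. The following is a minimal standalone sketch, not code from the patch. `shrinkDMask` and `buildEltMask` are hypothetical names, and plain `uint32_t` bitmasks stand in for `APInt`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: keep only the dmask bits whose corresponding result
// element is demanded. Result element N is fed by the N-th set bit of the
// original dmask, which is what OrigLoadIdx tracks.
inline uint32_t shrinkDMask(uint32_t DMaskVal, uint32_t DemandedElts) {
  assert(DMaskVal <= 0xf && "dmask has at most four component bits");
  uint32_t NewDMaskVal = 0;
  unsigned OrigLoadIdx = 0;
  for (unsigned SrcIdx = 0; SrcIdx < 4; ++SrcIdx) {
    const uint32_t Bit = 1u << SrcIdx;
    if (DMaskVal & Bit) {
      if (DemandedElts & (1u << OrigLoadIdx))
        NewDMaskVal |= Bit;
      ++OrigLoadIdx;
    }
  }
  return NewDMaskVal;
}

// Hypothetical helper: build the shuffle mask that scatters the narrowed
// load back into a VWidth-wide vector. Demanded lanes take consecutive
// elements of the new load; the other lanes use index NewNumElts, which
// selects from the second (undef) shuffle operand.
inline std::vector<int> buildEltMask(uint32_t DemandedElts, unsigned VWidth) {
  assert(VWidth <= 32 && "model uses a 32-bit demanded mask");
  unsigned NewNumElts = 0;
  for (unsigned I = 0; I < VWidth; ++I)
    if (DemandedElts & (1u << I))
      ++NewNumElts;

  std::vector<int> EltMask;
  int NewLoadIdx = 0;
  for (unsigned I = 0; I < VWidth; ++I) {
    if (DemandedElts & (1u << I))
      EltMask.push_back(NewLoadIdx++);
    else
      EltMask.push_back(static_cast<int>(NewNumElts));
  }
  return EltMask;
}
```

For example, with a dmask of 0xB (components 0, 1 and 3) and only result elements 0 and 2 demanded, the model keeps components 0 and 3.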

				bool GCNTTIImpl::simplifyDemandedVectorEltsIntrinsic(
				InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
				APInt &UndefElts2, APInt &UndefElts3,
				std::function<void(Instruction *, unsigned, APInt, APInt &)>
				SimplifyAndSetOp,
				Value **ResultV) const {
				switch (II.getIntrinsicID()) {
				case Intrinsic::amdgcn_buffer_load:
				case Intrinsic::amdgcn_buffer_load_format:
				case Intrinsic::amdgcn_raw_buffer_load:
				case Intrinsic::amdgcn_raw_buffer_load_format:
				case Intrinsic::amdgcn_raw_tbuffer_load:
				case Intrinsic::amdgcn_s_buffer_load:
				case Intrinsic::amdgcn_struct_buffer_load:
				case Intrinsic::amdgcn_struct_buffer_load_format:
				case Intrinsic::amdgcn_struct_tbuffer_load:
				case Intrinsic::amdgcn_tbuffer_load:
				*ResultV = simplifyAMDGCNMemoryIntrinsicDemanded(IC, II, DemandedElts);
				return true;
				default: {
				if (getAMDGPUImageDMaskIntrinsic(II.getIntrinsicID())) {
				*ResultV = simplifyAMDGCNMemoryIntrinsicDemanded(IC, II, DemandedElts, 0);
				return true;
				}
				break;
				}
				}
				return false;
				}
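
Note on the buffer case of simplifyAMDGCNMemoryIntrinsicDemanded: the demanded-mask trimming and the byte offset adjustment reduce to a few lines of bit arithmetic. The sketch below is a standalone model under stated assumptions, not code from the patch. `trimBufferLoad` and `BufferTrim` are hypothetical names, a `uint32_t` stands in for `APInt`, the demanded mask is assumed non-zero, and `CanAdjustOffset` stands in for the `OffsetIdx != InvalidOffsetIdx` check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: the trimmed demanded mask plus the number of
// bytes to add to the load offset.
struct BufferTrim {
  uint32_t DemandedElts;
  unsigned OffsetAddBytes;
};

// Hypothetical helper modelling the buffer case. Assumes a non-zero mask of
// at most 16 elements so the shifts below stay well defined.
inline BufferTrim trimBufferLoad(uint32_t DemandedElts,
                                 unsigned ScalarSizeInBits,
                                 bool CanAdjustOffset) {
  assert(DemandedElts != 0 && DemandedElts <= 0xFFFFu);

  // Position of the highest demanded element, plus one.
  unsigned ActiveBits = 0;
  for (uint32_t V = DemandedElts; V != 0; V >>= 1)
    ++ActiveBits;

  // Number of undemanded elements before the first demanded one.
  unsigned UnusedComponentsAtFront = 0;
  while (!(DemandedElts & (1u << UnusedComponentsAtFront)))
    ++UnusedComponentsAtFront;

  // Start by demanding the whole prefix up to the last demanded element.
  BufferTrim Result{(1u << ActiveBits) - 1, 0};

  // If the offset operand can be rewritten, also drop the unused leading
  // elements and skip over them by advancing the offset.
  if (UnusedComponentsAtFront > 0 && CanAdjustOffset) {
    Result.DemandedElts &= ~((1u << UnusedComponentsAtFront) - 1);
    Result.OffsetAddBytes = UnusedComponentsAtFront * ScalarSizeInBits / 8;
  }
  return Result;
}
```

With 32-bit elements and only elements 2 and 3 demanded, the model drops the two leading elements and advances the offset by 8 bytes. When the offset cannot be adjusted, it falls back to demanding the full four-element prefix.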

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h

Show All 27 Lines
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/MC/SubtargetFeature.h"		#include "llvm/MC/SubtargetFeature.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include <cassert>		#include <cassert>

namespace llvm {		namespace llvm {

class AMDGPUTargetLowering;		class AMDGPUTargetLowering;
		class InstCombiner;
class Loop;		class Loop;
class ScalarEvolution;		class ScalarEvolution;
class Type;		class Type;
class Value;		class Value;

class AMDGPUTTIImpl final : public BasicTTIImplBase<AMDGPUTTIImpl> {		class AMDGPUTTIImpl final : public BasicTTIImplBase<AMDGPUTTIImpl> {
using BaseT = BasicTTIImplBase<AMDGPUTTIImpl>;		using BaseT = BasicTTIImplBase<AMDGPUTTIImpl>;
using TTI = TargetTransformInfo;		using TTI = TargetTransformInfo;
▲ Show 20 Lines • Show All 165 Lines • ▼ Show 20 Lines	unsigned getFlatAddressSpace() const {
return AMDGPUAS::FLAT_ADDRESS;		return AMDGPUAS::FLAT_ADDRESS;
}		}

bool collectFlatAddressOperands(SmallVectorImpl<int> &OpIndexes,		bool collectFlatAddressOperands(SmallVectorImpl<int> &OpIndexes,
Intrinsic::ID IID) const;		Intrinsic::ID IID) const;
Value *rewriteIntrinsicWithAddressSpace(IntrinsicInst *II, Value *OldV,		Value *rewriteIntrinsicWithAddressSpace(IntrinsicInst *II, Value *OldV,
Value *NewV) const;		Value *NewV) const;

		bool instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		Instruction **ResultI) const;
		bool simplifyDemandedVectorEltsIntrinsic(
		InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
		APInt &UndefElts2, APInt &UndefElts3,
		std::function<void(Instruction *, unsigned, APInt, APInt &)>
		SimplifyAndSetOp,
		Value **ResultV) const;

unsigned getVectorSplitCost() { return 0; }		unsigned getVectorSplitCost() { return 0; }

unsigned getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp, int Index,		unsigned getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp, int Index,
VectorType *SubTp);		VectorType *SubTp);

bool areInlineCompatible(const Function *Caller,		bool areInlineCompatible(const Function *Caller,
const Function *Callee) const;		const Function *Callee) const;

▲ Show 20 Lines • Show All 60 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/CMakeLists.txt

	Show All 28 Lines
	tablegen(LLVM R600GenDFAPacketizer.inc -gen-dfa-packetizer)			tablegen(LLVM R600GenDFAPacketizer.inc -gen-dfa-packetizer)
	tablegen(LLVM R600GenInstrInfo.inc -gen-instr-info)			tablegen(LLVM R600GenInstrInfo.inc -gen-instr-info)
	tablegen(LLVM R600GenMCCodeEmitter.inc -gen-emitter)			tablegen(LLVM R600GenMCCodeEmitter.inc -gen-emitter)
	tablegen(LLVM R600GenRegisterInfo.inc -gen-register-info)			tablegen(LLVM R600GenRegisterInfo.inc -gen-register-info)
	tablegen(LLVM R600GenSubtargetInfo.inc -gen-subtarget)			tablegen(LLVM R600GenSubtargetInfo.inc -gen-subtarget)

	add_public_tablegen_target(AMDGPUCommonTableGen)			add_public_tablegen_target(AMDGPUCommonTableGen)

				set(LLVM_TARGET_DEFINITIONS InstCombineTables.td)
				tablegen(LLVM InstCombineTables.inc -gen-searchable-tables)
				add_public_tablegen_target(InstCombineTableGen)

	add_llvm_target(AMDGPUCodeGen			add_llvm_target(AMDGPUCodeGen
	AMDGPUAliasAnalysis.cpp			AMDGPUAliasAnalysis.cpp
	AMDGPUAlwaysInlinePass.cpp			AMDGPUAlwaysInlinePass.cpp
	AMDGPUAnnotateKernelFeatures.cpp			AMDGPUAnnotateKernelFeatures.cpp
	AMDGPUAnnotateUniformValues.cpp			AMDGPUAnnotateUniformValues.cpp
	AMDGPUArgumentUsageInfo.cpp			AMDGPUArgumentUsageInfo.cpp
	AMDGPUAsmPrinter.cpp			AMDGPUAsmPrinter.cpp
	AMDGPUAtomicOptimizer.cpp			AMDGPUAtomicOptimizer.cpp
	AMDGPUCallLowering.cpp			AMDGPUCallLowering.cpp
	AMDGPUCodeGenPrepare.cpp			AMDGPUCodeGenPrepare.cpp
	AMDGPUExportClustering.cpp			AMDGPUExportClustering.cpp
	AMDGPUFixFunctionBitcasts.cpp			AMDGPUFixFunctionBitcasts.cpp
	AMDGPUFrameLowering.cpp			AMDGPUFrameLowering.cpp
	AMDGPUHSAMetadataStreamer.cpp			AMDGPUHSAMetadataStreamer.cpp
				AMDGPUInstCombineIntrinsic.cpp
	AMDGPUInstrInfo.cpp			AMDGPUInstrInfo.cpp
	AMDGPUInstructionSelector.cpp			AMDGPUInstructionSelector.cpp
	AMDGPUISelDAGToDAG.cpp			AMDGPUISelDAGToDAG.cpp
	AMDGPUISelLowering.cpp			AMDGPUISelLowering.cpp
	AMDGPUGlobalISelUtils.cpp			AMDGPUGlobalISelUtils.cpp
	AMDGPULegalizerInfo.cpp			AMDGPULegalizerInfo.cpp
	AMDGPULibCalls.cpp			AMDGPULibCalls.cpp
	AMDGPULibFunc.cpp			AMDGPULibFunc.cpp
	▲ Show 20 Lines • Show All 87 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/InstCombineTables.td

This file was moved from llvm/lib/Transforms/InstCombine/InstCombineTables.td.

	include "llvm/TableGen/SearchableTable.td"			include "llvm/TableGen/SearchableTable.td"
	include "llvm/IR/Intrinsics.td"			include "llvm/IR/Intrinsics.td"

	def AMDGPUImageDMaskIntrinsicTable : GenericTable {			def AMDGPUImageDMaskIntrinsicTable : GenericTable {
	let FilterClass = "AMDGPUImageDMaskIntrinsic";			let FilterClass = "AMDGPUImageDMaskIntrinsic";
	let Fields = ["Intr"];			let Fields = ["Intr"];

	let PrimaryKey = ["Intr"];			let PrimaryKey = ["Intr"];
	let PrimaryKeyName = "getAMDGPUImageDMaskIntrinsic";			let PrimaryKeyName = "getAMDGPUImageDMaskIntrinsic";
	let PrimaryKeyEarlyOut = 1;			let PrimaryKeyEarlyOut = 1;
	}			}

llvm/lib/Target/ARM/ARMTargetTransformInfo.h

Show First 20 Lines • Show All 97 Lines • ▼ Show 20 Lines	public:

/// Floating-point computation using ARMv8 AArch32 Advanced		/// Floating-point computation using ARMv8 AArch32 Advanced
/// SIMD instructions remains unchanged from ARMv7. Only AArch64 SIMD		/// SIMD instructions remains unchanged from ARMv7. Only AArch64 SIMD
/// and Arm MVE are IEEE-754 compliant.		/// and Arm MVE are IEEE-754 compliant.
bool isFPVectorizationPotentiallyUnsafe() {		bool isFPVectorizationPotentiallyUnsafe() {
return !ST->isTargetDarwin() && !ST->hasMVEFloatOps();		return !ST->isTargetDarwin() && !ST->hasMVEFloatOps();
}		}

		bool instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		Instruction **ResultI) const;

/// \name Scalar TTI Implementations		/// \name Scalar TTI Implementations
/// @{		/// @{

int getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx, const APInt &Imm,		int getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
Type *Ty);		Type *Ty);

using BaseT::getIntImmCost;		using BaseT::getIntImmCost;
int getIntImmCost(const APInt &Imm, Type *Ty, TTI::TargetCostKind CostKind);		int getIntImmCost(const APInt &Imm, Type *Ty, TTI::TargetCostKind CostKind);
▲ Show 20 Lines • Show All 157 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

	Show All 22 Lines
	#include "llvm/IR/IntrinsicInst.h"			#include "llvm/IR/IntrinsicInst.h"
	#include "llvm/IR/IntrinsicsARM.h"			#include "llvm/IR/IntrinsicsARM.h"
	#include "llvm/IR/PatternMatch.h"			#include "llvm/IR/PatternMatch.h"
	#include "llvm/IR/Type.h"			#include "llvm/IR/Type.h"
	#include "llvm/MC/SubtargetFeature.h"			#include "llvm/MC/SubtargetFeature.h"
	#include "llvm/Support/Casting.h"			#include "llvm/Support/Casting.h"
	#include "llvm/Support/MachineValueType.h"			#include "llvm/Support/MachineValueType.h"
	#include "llvm/Target/TargetMachine.h"			#include "llvm/Target/TargetMachine.h"
				#include "llvm/Transforms/InstCombine/InstCombiner.h"
	#include <algorithm>			#include <algorithm>
	#include <cassert>			#include <cassert>
	#include <cstdint>			#include <cstdint>
	#include <utility>			#include <utility>

	using namespace llvm;			using namespace llvm;

	#define DEBUG_TYPE "armtti"			#define DEBUG_TYPE "armtti"

	static cl::opt<bool> EnableMaskedLoadStores(			static cl::opt<bool> EnableMaskedLoadStores(
	"enable-arm-maskedldst", cl::Hidden, cl::init(true),			"enable-arm-maskedldst", cl::Hidden, cl::init(true),
	cl::desc("Enable the generation of masked loads and stores"));			cl::desc("Enable the generation of masked loads and stores"));

	static cl::opt<bool> DisableLowOverheadLoops(			static cl::opt<bool> DisableLowOverheadLoops(
	"disable-arm-loloops", cl::Hidden, cl::init(false),			"disable-arm-loloops", cl::Hidden, cl::init(false),
	cl::desc("Disable the generation of low-overhead loops"));			cl::desc("Disable the generation of low-overhead loops"));

	extern cl::opt<bool> DisableTailPredication;			extern cl::opt<bool> DisableTailPredication;

	extern cl::opt<bool> EnableMaskedGatherScatters;			extern cl::opt<bool> EnableMaskedGatherScatters;

				/// Convert a vector load intrinsic into a simple llvm load instruction.
				/// This is beneficial when the underlying object being addressed comes
				/// from a constant, since we get constant-folding for free.
				static Value *simplifyNeonVld1(const IntrinsicInst &II, unsigned MemAlign,
				InstCombiner::BuilderTy &Builder) {
				auto *IntrAlign = dyn_cast<ConstantInt>(II.getArgOperand(1));

				if (!IntrAlign)
				return nullptr;

				unsigned Alignment = IntrAlign->getLimitedValue() < MemAlign
				? MemAlign
				: IntrAlign->getLimitedValue();

				if (!isPowerOf2_32(Alignment))
				return nullptr;

				auto *BCastInst = Builder.CreateBitCast(II.getArgOperand(0),
				PointerType::get(II.getType(), 0));
				return Builder.CreateAlignedLoad(II.getType(), BCastInst, Align(Alignment));
				}
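
Note on simplifyNeonVld1: the alignment choice is the larger of the intrinsic's alignment argument and the alignment known for the pointer, and the rewrite is abandoned if that value is not a power of two. The following is a minimal standalone sketch of that decision only. `chooseVld1Alignment` is a hypothetical name, and returning 0 to mean "do not rewrite" is a convention of this sketch, not of the patch.

```cpp
#include <cassert>
#include <cstdint>

// True if V is a non-zero power of two.
inline bool isPowerOfTwo(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Hypothetical helper: pick the alignment for the replacement load.
// Returns 0 when the intrinsic should be left alone.
inline uint64_t chooseVld1Alignment(uint64_t IntrAlign, uint64_t MemAlign) {
  assert(MemAlign != 0 && "a known alignment is at least 1");
  const uint64_t Alignment = IntrAlign < MemAlign ? MemAlign : IntrAlign;
  if (!isPowerOfTwo(Alignment))
    return 0;
  return Alignment;
}
```

For instance, an intrinsic alignment of 4 combined with a known pointer alignment of 16 yields 16, while an intrinsic alignment of 3 with a known alignment of 1 yields no rewrite.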

	bool ARMTTIImpl::areInlineCompatible(const Function *Caller,			bool ARMTTIImpl::areInlineCompatible(const Function *Caller,
	const Function *Callee) const {			const Function *Callee) const {
	const TargetMachine &TM = getTLI()->getTargetMachine();			const TargetMachine &TM = getTLI()->getTargetMachine();
	const FeatureBitset &CallerBits =			const FeatureBitset &CallerBits =
	TM.getSubtargetImpl(*Caller)->getFeatureBits();			TM.getSubtargetImpl(*Caller)->getFeatureBits();
	const FeatureBitset &CalleeBits =			const FeatureBitset &CalleeBits =
	TM.getSubtargetImpl(*Callee)->getFeatureBits();			TM.getSubtargetImpl(*Callee)->getFeatureBits();

	Show All 16 Lines
	}			}

	bool ARMTTIImpl::shouldFavorPostInc() const {			bool ARMTTIImpl::shouldFavorPostInc() const {
	if (ST->hasMVEIntegerOps())			if (ST->hasMVEIntegerOps())
	return true;			return true;
	return false;			return false;
	}			}

				bool ARMTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
				Instruction **ResultI) const {
				Intrinsic::ID IID = II.getIntrinsicID();
				switch (IID) {
				default:
				break;
				case Intrinsic::arm_neon_vld1: {
				Align MemAlign =
				getKnownAlignment(II.getArgOperand(0), IC.getDataLayout(), &II,
				&IC.getAssumptionCache(), &IC.getDominatorTree());
				if (Value *V = simplifyNeonVld1(II, MemAlign.value(), IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				break;
				}

				case Intrinsic::arm_neon_vld2:
				case Intrinsic::arm_neon_vld3:
				case Intrinsic::arm_neon_vld4:
				case Intrinsic::arm_neon_vld2lane:
				case Intrinsic::arm_neon_vld3lane:
				case Intrinsic::arm_neon_vld4lane:
				case Intrinsic::arm_neon_vst1:
				case Intrinsic::arm_neon_vst2:
				case Intrinsic::arm_neon_vst3:
				case Intrinsic::arm_neon_vst4:
				case Intrinsic::arm_neon_vst2lane:
				case Intrinsic::arm_neon_vst3lane:
				case Intrinsic::arm_neon_vst4lane: {
				Align MemAlign =
				getKnownAlignment(II.getArgOperand(0), IC.getDataLayout(), &II,
				&IC.getAssumptionCache(), &IC.getDominatorTree());
				unsigned AlignArg = II.getNumArgOperands() - 1;
				ConstantInt *IntrAlign = dyn_cast<ConstantInt>(II.getArgOperand(AlignArg));
				if (IntrAlign && IntrAlign->getZExtValue() < MemAlign.value()) {
				*ResultI =
				IC.replaceOperand(II, AlignArg,
				ConstantInt::get(Type::getInt32Ty(II.getContext()),
				MemAlign.value(), false));
				return true;
				}
				break;
				}

				case Intrinsic::arm_mve_pred_i2v: {
				Value *Arg = II.getArgOperand(0);
				Value *ArgArg;
				if (match(Arg, PatternMatch::m_Intrinsic<Intrinsic::arm_mve_pred_v2i>(
				PatternMatch::m_Value(ArgArg))) &&
				II.getType() == ArgArg->getType()) {
				*ResultI = IC.replaceInstUsesWith(II, ArgArg);
				return true;
				}
				Constant *XorMask;
				if (match(Arg, m_Xor(PatternMatch::m_Intrinsic<Intrinsic::arm_mve_pred_v2i>(
				PatternMatch::m_Value(ArgArg)),
				PatternMatch::m_Constant(XorMask))) &&
				II.getType() == ArgArg->getType()) {
				if (auto *CI = dyn_cast<ConstantInt>(XorMask)) {
				if (CI->getValue().trunc(16).isAllOnesValue()) {
				auto TrueVector = IC.Builder.CreateVectorSplat(
				cast<VectorType>(II.getType())->getNumElements(),
				IC.Builder.getTrue());
				*ResultI =
				BinaryOperator::Create(Instruction::Xor, ArgArg, TrueVector);
				return true;
				}
				}
				}
				KnownBits ScalarKnown(32);
				if (IC.SimplifyDemandedBits(&II, 0, APInt::getLowBitsSet(32, 16),
				ScalarKnown, 0)) {
				*ResultI = &II;
				return true;
				}
				break;
				}
				case Intrinsic::arm_mve_pred_v2i: {
				Value *Arg = II.getArgOperand(0);
				Value *ArgArg;
				if (match(Arg, PatternMatch::m_Intrinsic<Intrinsic::arm_mve_pred_i2v>(
				PatternMatch::m_Value(ArgArg)))) {
				*ResultI = IC.replaceInstUsesWith(II, ArgArg);
				return true;
				}
				if (!II.getMetadata(LLVMContext::MD_range)) {
				Type *IntTy32 = Type::getInt32Ty(II.getContext());
				Metadata *M[] = {
				ConstantAsMetadata::get(ConstantInt::get(IntTy32, 0)),
				ConstantAsMetadata::get(ConstantInt::get(IntTy32, 0xFFFF))};
				II.setMetadata(LLVMContext::MD_range, MDNode::get(II.getContext(), M));
				*ResultI = &II;
				return true;
				}
				break;
				}
				case Intrinsic::arm_mve_vadc:
				case Intrinsic::arm_mve_vadc_predicated: {
				unsigned CarryOp =
				(II.getIntrinsicID() == Intrinsic::arm_mve_vadc_predicated) ? 3 : 2;
				assert(II.getArgOperand(CarryOp)->getType()->getScalarSizeInBits() == 32 &&
				"Bad type for intrinsic!");

				KnownBits CarryKnown(32);
				if (IC.SimplifyDemandedBits(&II, CarryOp, APInt::getOneBitSet(32, 29),
				CarryKnown)) {
				*ResultI = &II;
				return true;
				}
				break;
				}
				}
				return false;
				}
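
Note on the MVE cases above: arm_mve_pred_i2v demands only the low 16 bits of its scalar operand, the vadc carry operand demands only bit 29, and the xor fold fires when the low 16 bits of the constant are all ones. The sketch below models those three conditions with plain integers. It is an illustration under assumptions, not code from the patch, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors APInt::getLowBitsSet(32, 16) and APInt::getOneBitSet(32, 29).
constexpr uint32_t PredDemandedMask = 0xFFFFu;
constexpr uint32_t CarryDemandedMask = 1u << 29;

// Hypothetical helper: two operand values are interchangeable for a user
// that only looks at the bits in DemandedMask.
inline bool agreeOnDemandedBits(uint32_t A, uint32_t B,
                                uint32_t DemandedMask) {
  assert(DemandedMask != 0 && "a user demanding no bits ignores its operand");
  return ((A ^ B) & DemandedMask) == 0;
}

// Hypothetical helper: condition under which i2v(v2i(x) ^ C) is rewritten
// as x ^ splat(true), i.e. the low 16 bits of C are all ones.
inline bool xorMaskInvertsPredicate(uint32_t XorMask) {
  return (XorMask & 0xFFFFu) == 0xFFFFu;
}
```

Any change confined to the upper 16 bits of a predicate scalar, or to bits other than bit 29 of a carry word, is invisible to the intrinsic, which is what lets SimplifyDemandedBits simplify the operand.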

	int ARMTTIImpl::getIntImmCost(const APInt &Imm, Type *Ty,			int ARMTTIImpl::getIntImmCost(const APInt &Imm, Type *Ty,
	TTI::TargetCostKind CostKind) {			TTI::TargetCostKind CostKind) {
	assert(Ty->isIntegerTy());			assert(Ty->isIntegerTy());

	unsigned Bits = Ty->getPrimitiveSizeInBits();			unsigned Bits = Ty->getPrimitiveSizeInBits();
	if (Bits == 0 || Imm.getActiveBits() >= 64)			if (Bits == 0 || Imm.getActiveBits() >= 64)
	return 4;			return 4;

	▲ Show 20 Lines • Show All 1,408 Lines • Show Last 20 Lines

llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h

Show First 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	public:
bool hasBranchDivergence() { return true; }		bool hasBranchDivergence() { return true; }

bool isSourceOfDivergence(const Value *V);		bool isSourceOfDivergence(const Value *V);

unsigned getFlatAddressSpace() const {		unsigned getFlatAddressSpace() const {
return AddressSpace::ADDRESS_SPACE_GENERIC;		return AddressSpace::ADDRESS_SPACE_GENERIC;
}		}

		bool instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		Instruction **ResultI) const;

// Loads and stores can be vectorized if the alignment is at least as big as		// Loads and stores can be vectorized if the alignment is at least as big as
// the load/store we want to vectorize.		// the load/store we want to vectorize.
bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes,		bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes,
unsigned Alignment,		unsigned Alignment,
unsigned AddrSpace) const {		unsigned AddrSpace) const {
return Alignment >= ChainSizeInBytes;		return Alignment >= ChainSizeInBytes;
}		}
bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes,		bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes,
▲ Show 20 Lines • Show All 62 Lines • Show Last 20 Lines

llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp

Show First 20 Lines • Show All 105 Lines • ▼ Show 20 Lines	if (const Instruction *I = dyn_cast<Instruction>(V)) {
// inter-procedural analysis.		// inter-procedural analysis.
if (isa<CallInst>(I))		if (isa<CallInst>(I))
return true;		return true;
}		}

return false;		return false;
}		}

		// Convert NVVM intrinsics to target-generic LLVM code where possible.
		static Instruction *simplifyNvvmIntrinsic(IntrinsicInst *II, InstCombiner &IC) {
		// Each NVVM intrinsic we can simplify can be replaced with one of:
		//
		// * an LLVM intrinsic,
		// * an LLVM cast operation,
		// * an LLVM binary operation, or
		// * ad-hoc LLVM IR for the particular operation.

		// Some transformations are only valid when the module's
		// flush-denormals-to-zero (ftz) setting is true/false, whereas other
		// transformations are valid regardless of the module's ftz setting.
		enum FtzRequirementTy {
		FTZ_Any, // Any ftz setting is ok.
		FTZ_MustBeOn, // Transformation is valid only if ftz is on.
		FTZ_MustBeOff, // Transformation is valid only if ftz is off.
		};
		// Classes of NVVM intrinsics that can't be replaced one-to-one with a
		// target-generic intrinsic, cast op, or binary op but that we can nonetheless
		// simplify.
		enum SpecialCase {
		SPC_Reciprocal,
		};

		// SimplifyAction is a poor-man's variant (plus an additional flag) that
		// represents how to replace an NVVM intrinsic with target-generic LLVM IR.
		struct SimplifyAction {
		// Invariant: At most one of these Optionals has a value.
		Optional<Intrinsic::ID> IID;
		Optional<Instruction::CastOps> CastOp;
		Optional<Instruction::BinaryOps> BinaryOp;
		Optional<SpecialCase> Special;

		FtzRequirementTy FtzRequirement = FTZ_Any;

		SimplifyAction() = default;

		SimplifyAction(Intrinsic::ID IID, FtzRequirementTy FtzReq)
		: IID(IID), FtzRequirement(FtzReq) {}

		// Cast operations don't have anything to do with FTZ, so we skip that
		// argument.
		SimplifyAction(Instruction::CastOps CastOp) : CastOp(CastOp) {}

		SimplifyAction(Instruction::BinaryOps BinaryOp, FtzRequirementTy FtzReq)
		: BinaryOp(BinaryOp), FtzRequirement(FtzReq) {}

		SimplifyAction(SpecialCase Special, FtzRequirementTy FtzReq)
		: Special(Special), FtzRequirement(FtzReq) {}
		};

		// Try to generate a SimplifyAction describing how to replace our
		// IntrinsicInst with target-generic LLVM IR.
		const SimplifyAction Action = [II]() -> SimplifyAction {
		switch (II->getIntrinsicID()) {
		// NVVM intrinsics that map directly to LLVM intrinsics.
		case Intrinsic::nvvm_ceil_d:
		return {Intrinsic::ceil, FTZ_Any};
		case Intrinsic::nvvm_ceil_f:
		return {Intrinsic::ceil, FTZ_MustBeOff};
		case Intrinsic::nvvm_ceil_ftz_f:
		return {Intrinsic::ceil, FTZ_MustBeOn};
		case Intrinsic::nvvm_fabs_d:
		return {Intrinsic::fabs, FTZ_Any};
		case Intrinsic::nvvm_fabs_f:
		return {Intrinsic::fabs, FTZ_MustBeOff};
		case Intrinsic::nvvm_fabs_ftz_f:
		return {Intrinsic::fabs, FTZ_MustBeOn};
		case Intrinsic::nvvm_floor_d:
		return {Intrinsic::floor, FTZ_Any};
		case Intrinsic::nvvm_floor_f:
		return {Intrinsic::floor, FTZ_MustBeOff};
		case Intrinsic::nvvm_floor_ftz_f:
		return {Intrinsic::floor, FTZ_MustBeOn};
		case Intrinsic::nvvm_fma_rn_d:
		return {Intrinsic::fma, FTZ_Any};
		case Intrinsic::nvvm_fma_rn_f:
		return {Intrinsic::fma, FTZ_MustBeOff};
		case Intrinsic::nvvm_fma_rn_ftz_f:
		return {Intrinsic::fma, FTZ_MustBeOn};
		case Intrinsic::nvvm_fmax_d:
		return {Intrinsic::maxnum, FTZ_Any};
		case Intrinsic::nvvm_fmax_f:
		return {Intrinsic::maxnum, FTZ_MustBeOff};
		case Intrinsic::nvvm_fmax_ftz_f:
		return {Intrinsic::maxnum, FTZ_MustBeOn};
		case Intrinsic::nvvm_fmin_d:
		return {Intrinsic::minnum, FTZ_Any};
		case Intrinsic::nvvm_fmin_f:
		return {Intrinsic::minnum, FTZ_MustBeOff};
		case Intrinsic::nvvm_fmin_ftz_f:
		return {Intrinsic::minnum, FTZ_MustBeOn};
		case Intrinsic::nvvm_round_d:
		return {Intrinsic::round, FTZ_Any};
		case Intrinsic::nvvm_round_f:
		return {Intrinsic::round, FTZ_MustBeOff};
		case Intrinsic::nvvm_round_ftz_f:
		return {Intrinsic::round, FTZ_MustBeOn};
		case Intrinsic::nvvm_sqrt_rn_d:
		return {Intrinsic::sqrt, FTZ_Any};
		case Intrinsic::nvvm_sqrt_f:
		// nvvm_sqrt_f is a special case. For most intrinsics, foo_ftz_f is the
		// ftz version, and foo_f is the non-ftz version. But nvvm_sqrt_f adopts
		// the ftz-ness of the surrounding code. sqrt_rn_f and sqrt_rn_ftz_f are
		// the versions with explicit ftz-ness.
		return {Intrinsic::sqrt, FTZ_Any};
		case Intrinsic::nvvm_sqrt_rn_f:
		return {Intrinsic::sqrt, FTZ_MustBeOff};
		case Intrinsic::nvvm_sqrt_rn_ftz_f:
		return {Intrinsic::sqrt, FTZ_MustBeOn};
		case Intrinsic::nvvm_trunc_d:
		return {Intrinsic::trunc, FTZ_Any};
		case Intrinsic::nvvm_trunc_f:
		return {Intrinsic::trunc, FTZ_MustBeOff};
		case Intrinsic::nvvm_trunc_ftz_f:
		return {Intrinsic::trunc, FTZ_MustBeOn};

		// NVVM intrinsics that map to LLVM cast operations.
		//
		// Note that llvm's target-generic conversion operators correspond to the rz
		// (round to zero) versions of the nvvm conversion intrinsics, even though
		// most everything else here uses the rn (round to nearest even) nvvm ops.
		case Intrinsic::nvvm_d2i_rz:
		case Intrinsic::nvvm_f2i_rz:
		case Intrinsic::nvvm_d2ll_rz:
		case Intrinsic::nvvm_f2ll_rz:
		return {Instruction::FPToSI};
		case Intrinsic::nvvm_d2ui_rz:
		case Intrinsic::nvvm_f2ui_rz:
		case Intrinsic::nvvm_d2ull_rz:
		case Intrinsic::nvvm_f2ull_rz:
		return {Instruction::FPToUI};
		case Intrinsic::nvvm_i2d_rz:
		case Intrinsic::nvvm_i2f_rz:
		case Intrinsic::nvvm_ll2d_rz:
		case Intrinsic::nvvm_ll2f_rz:
		return {Instruction::SIToFP};
		case Intrinsic::nvvm_ui2d_rz:
		case Intrinsic::nvvm_ui2f_rz:
		case Intrinsic::nvvm_ull2d_rz:
		case Intrinsic::nvvm_ull2f_rz:
		return {Instruction::UIToFP};

		// NVVM intrinsics that map to LLVM binary ops.
		case Intrinsic::nvvm_add_rn_d:
		return {Instruction::FAdd, FTZ_Any};
		case Intrinsic::nvvm_add_rn_f:
		return {Instruction::FAdd, FTZ_MustBeOff};
		case Intrinsic::nvvm_add_rn_ftz_f:
		return {Instruction::FAdd, FTZ_MustBeOn};
		case Intrinsic::nvvm_mul_rn_d:
		return {Instruction::FMul, FTZ_Any};
		case Intrinsic::nvvm_mul_rn_f:
		return {Instruction::FMul, FTZ_MustBeOff};
		case Intrinsic::nvvm_mul_rn_ftz_f:
		return {Instruction::FMul, FTZ_MustBeOn};
		case Intrinsic::nvvm_div_rn_d:
		return {Instruction::FDiv, FTZ_Any};
		case Intrinsic::nvvm_div_rn_f:
		return {Instruction::FDiv, FTZ_MustBeOff};
		case Intrinsic::nvvm_div_rn_ftz_f:
		return {Instruction::FDiv, FTZ_MustBeOn};

		// The remainder of cases are NVVM intrinsics that map to LLVM idioms, but
		// need special handling.
		//
		// We seem to be missing intrinsics for rcp.approx.{ftz.}f32, which is just
		// as well.
		case Intrinsic::nvvm_rcp_rn_d:
		return {SPC_Reciprocal, FTZ_Any};
		case Intrinsic::nvvm_rcp_rn_f:
		return {SPC_Reciprocal, FTZ_MustBeOff};
		case Intrinsic::nvvm_rcp_rn_ftz_f:
		return {SPC_Reciprocal, FTZ_MustBeOn};

		// We do not currently simplify intrinsics that give an approximate
		// answer. These include:
		//
		// - nvvm_cos_approx_{f,ftz_f}
		// - nvvm_ex2_approx_{d,f,ftz_f}
		// - nvvm_lg2_approx_{d,f,ftz_f}
		// - nvvm_sin_approx_{f,ftz_f}
		// - nvvm_sqrt_approx_{f,ftz_f}
		// - nvvm_rsqrt_approx_{d,f,ftz_f}
		// - nvvm_div_approx_{ftz_d,ftz_f,f}
		// - nvvm_rcp_approx_ftz_d
		//
		// Ideally we'd encode them as e.g. "fast call @llvm.cos", where "fast"
		// means that fastmath is enabled in the intrinsic. Unfortunately only
		// binary operators (currently) have a fastmath bit in SelectionDAG, so
		// this information gets lost and we can't select on it.
		//
		// TODO: div and rcp are lowered to a binary op, so in theory we could
		// lower them to "fast fdiv".

		default:
		return {};
		}
		}();

		// If Action.FtzRequirement is not satisfied by the function's ftz state, we
		// can bail out now. (Notice that in the case that IID is not an NVVM
		// intrinsic, we don't have to look up the function attribute, as
		// FtzRequirement will be FTZ_Any.)
		if (Action.FtzRequirement != FTZ_Any) {
		StringRef Attr = II->getFunction()
		->getFnAttribute("denormal-fp-math-f32")
		.getValueAsString();
		DenormalMode Mode = parseDenormalFPAttribute(Attr);
		bool FtzEnabled = Mode.Output != DenormalMode::IEEE;

		if (FtzEnabled != (Action.FtzRequirement == FTZ_MustBeOn))
		return nullptr;
		}

		// Simplify to target-generic intrinsic.
		if (Action.IID) {
		SmallVector<Value *, 4> Args(II->arg_operands());
		// All the target-generic intrinsics currently of interest to us have one
		// type argument, equal to that of the nvvm intrinsic's argument.
		Type *Tys[] = {II->getArgOperand(0)->getType()};
		return CallInst::Create(
		Intrinsic::getDeclaration(II->getModule(), *Action.IID, Tys), Args);
		}

		// Simplify to target-generic binary op.
		if (Action.BinaryOp)
		return BinaryOperator::Create(*Action.BinaryOp, II->getArgOperand(0),
		II->getArgOperand(1), II->getName());

		// Simplify to target-generic cast op.
		if (Action.CastOp)
		return CastInst::Create(*Action.CastOp, II->getArgOperand(0), II->getType(),
		II->getName());

		// All that's left are the special cases.
		if (!Action.Special)
		return nullptr;

		switch (*Action.Special) {
		case SPC_Reciprocal:
		// Simplify reciprocal.
		return BinaryOperator::Create(
		Instruction::FDiv, ConstantFP::get(II->getArgOperand(0)->getType(), 1),
		II->getArgOperand(0), II->getName());
		}
		llvm_unreachable("All SpecialCase enumerators should be handled in switch.");
		}
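
Note on simplifyNvvmIntrinsic: the core of the function is a table lookup followed by a check of the action's ftz requirement against the function's ftz state. The sketch below models that with strings instead of intrinsic IDs. It is a standalone illustration, not code from the patch. `lookupAction`, `ftzRequirementSatisfied` and `simplifiedName` are hypothetical names, and only four of the table entries from the switch above are reproduced.

```cpp
#include <cassert>
#include <cstring>

enum FtzRequirementTy { FTZ_Any, FTZ_MustBeOn, FTZ_MustBeOff };

// Hypothetical stand-in for SimplifyAction: the name of the target-generic
// replacement (nullptr if there is none) plus its ftz requirement.
struct Action {
  const char *Replacement;
  FtzRequirementTy FtzRequirement;
};

struct Entry {
  const char *Name;
  Action Act;
};

// A few rows copied from the switch in simplifyNvvmIntrinsic.
inline Action lookupAction(const char *NvvmName) {
  assert(NvvmName && "intrinsic name must not be null");
  static const Entry Table[] = {
      {"nvvm_ceil_d", {"ceil", FTZ_Any}},
      {"nvvm_ceil_f", {"ceil", FTZ_MustBeOff}},
      {"nvvm_ceil_ftz_f", {"ceil", FTZ_MustBeOn}},
      {"nvvm_sqrt_f", {"sqrt", FTZ_Any}},
  };
  for (const Entry &E : Table)
    if (std::strcmp(E.Name, NvvmName) == 0)
      return E.Act;
  return {nullptr, FTZ_Any}; // default: no simplification known
}

// Same test as in the patch: FTZ_Any always passes, otherwise the ftz state
// must match the requirement exactly.
inline bool ftzRequirementSatisfied(FtzRequirementTy Req, bool FtzEnabled) {
  if (Req == FTZ_Any)
    return true;
  return FtzEnabled == (Req == FTZ_MustBeOn);
}

// Returns the replacement name, or nullptr if the intrinsic must be kept.
inline const char *simplifiedName(const char *NvvmName, bool FtzEnabled) {
  const Action A = lookupAction(NvvmName);
  if (!A.Replacement || !ftzRequirementSatisfied(A.FtzRequirement, FtzEnabled))
    return nullptr;
  return A.Replacement;
}
```

In this model, `nvvm_ceil_f` maps to `ceil` only when ftz is off, `nvvm_ceil_ftz_f` only when ftz is on, and the `_d` variant maps regardless of the ftz state.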

		bool NVPTXTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		Instruction **ResultI) const {
		if (Instruction *I = simplifyNvvmIntrinsic(&II, IC)) {
		*ResultI = I;
		return true;
		}
		return false;
		}

int NVPTXTTIImpl::getArithmeticInstrCost(		int NVPTXTTIImpl::getArithmeticInstrCost(
unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,		unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
TTI::OperandValueKind Opd1Info,		TTI::OperandValueKind Opd1Info,
TTI::OperandValueKind Opd2Info, TTI::OperandValueProperties Opd1PropInfo,		TTI::OperandValueKind Opd2Info, TTI::OperandValueProperties Opd1PropInfo,
TTI::OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args,		TTI::OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args,
const Instruction *CxtI) {		const Instruction *CxtI) {
// Legalize the type.		// Legalize the type.
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);
Show All 36 Lines

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h

Show All 35 Lines	class PPCTTIImpl : public BasicTTIImplBase<PPCTTIImpl> {
bool mightUseCTR(BasicBlock *BB, TargetLibraryInfo *LibInfo,		bool mightUseCTR(BasicBlock *BB, TargetLibraryInfo *LibInfo,
SmallPtrSetImpl<const Value *> &Visited);		SmallPtrSetImpl<const Value *> &Visited);

public:		public:
explicit PPCTTIImpl(const PPCTargetMachine *TM, const Function &F)		explicit PPCTTIImpl(const PPCTargetMachine *TM, const Function &F)
: BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl(F)),		: BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl(F)),
TLI(ST->getTargetLowering()) {}		TLI(ST->getTargetLowering()) {}

		bool instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		Instruction **ResultI) const;

/// \name Scalar TTI Implementations
/// @{

using BaseT::getIntImmCost;
int getIntImmCost(const APInt &Imm, Type *Ty,
TTI::TargetCostKind CostKind);

int getIntImmCostInst(unsigned Opcode, unsigned Idx, const APInt &Imm,

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

//===-- PPCTargetTransformInfo.cpp - PPC specific TTI ---------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "PPCTargetTransformInfo.h"
#include "llvm/Analysis/CodeMetrics.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/CodeGen/BasicTTIImpl.h"
#include "llvm/CodeGen/CostTable.h"
#include "llvm/CodeGen/TargetLowering.h"
#include "llvm/CodeGen/TargetSchedule.h"
		#include "llvm/IR/IntrinsicsPowerPC.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"
using namespace llvm;

#define DEBUG_TYPE "ppctti"

static cl::opt<bool> DisablePPCConstHoist("disable-ppc-constant-hoisting",
cl::desc("disable constant hoisting on PPC"), cl::init(false), cl::Hidden);

// This is currently only used for the data prefetch pass which is only enabled
Show All 28 Lines
PPCTTIImpl::getPopcntSupport(unsigned TyWidth) {
assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");
if (ST->hasPOPCNTD() != PPCSubtarget::POPCNTD_Unavailable && TyWidth <= 64)
return ST->hasPOPCNTD() == PPCSubtarget::POPCNTD_Slow ?
TTI::PSK_SlowHardware : TTI::PSK_FastHardware;
return TTI::PSK_Software;
}

		bool PPCTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		Instruction **ResultI) const {
		Intrinsic::ID IID = II.getIntrinsicID();
		switch (IID) {
		default:
		break;
		case Intrinsic::ppc_altivec_lvx:
		case Intrinsic::ppc_altivec_lvxl:
		// Turn PPC lvx -> load if the pointer is known aligned.
		if (getOrEnforceKnownAlignment(
		II.getArgOperand(0), Align(16), IC.getDataLayout(), &II,
		&IC.getAssumptionCache(), &IC.getDominatorTree()) >= 16) {
		Value *Ptr = IC.Builder.CreateBitCast(
		II.getArgOperand(0), PointerType::getUnqual(II.getType()));
		*ResultI = new LoadInst(II.getType(), Ptr, "", false, Align(16));
		return true;
		}
		break;
		case Intrinsic::ppc_vsx_lxvw4x:
		case Intrinsic::ppc_vsx_lxvd2x: {
		// Turn PPC VSX loads into normal loads.
		Value *Ptr = IC.Builder.CreateBitCast(II.getArgOperand(0),
		PointerType::getUnqual(II.getType()));
		*ResultI = new LoadInst(II.getType(), Ptr, Twine(""), false, Align(1));
		return true;
		}
		case Intrinsic::ppc_altivec_stvx:
		case Intrinsic::ppc_altivec_stvxl:
		// Turn stvx -> store if the pointer is known aligned.
		if (getOrEnforceKnownAlignment(
		II.getArgOperand(1), Align(16), IC.getDataLayout(), &II,
		&IC.getAssumptionCache(), &IC.getDominatorTree()) >= 16) {
		Type *OpPtrTy = PointerType::getUnqual(II.getArgOperand(0)->getType());
		Value *Ptr = IC.Builder.CreateBitCast(II.getArgOperand(1), OpPtrTy);
		*ResultI = new StoreInst(II.getArgOperand(0), Ptr, false, Align(16));
		return true;
		}
		break;
		case Intrinsic::ppc_vsx_stxvw4x:
		case Intrinsic::ppc_vsx_stxvd2x: {
		// Turn PPC VSX stores into normal stores.
		Type *OpPtrTy = PointerType::getUnqual(II.getArgOperand(0)->getType());
		Value *Ptr = IC.Builder.CreateBitCast(II.getArgOperand(1), OpPtrTy);
		*ResultI = new StoreInst(II.getArgOperand(0), Ptr, false, Align(1));
		return true;
		}
		case Intrinsic::ppc_qpx_qvlfs:
		// Turn PPC QPX qvlfs -> load if the pointer is known aligned.
		if (getOrEnforceKnownAlignment(
		II.getArgOperand(0), Align(16), IC.getDataLayout(), &II,
		&IC.getAssumptionCache(), &IC.getDominatorTree()) >= 16) {
		Type *VTy =
		VectorType::get(IC.Builder.getFloatTy(),
		cast<VectorType>(II.getType())->getElementCount());
		Value *Ptr = IC.Builder.CreateBitCast(II.getArgOperand(0),
		PointerType::getUnqual(VTy));
		Value *Load = IC.Builder.CreateLoad(VTy, Ptr);
		*ResultI = new FPExtInst(Load, II.getType());
		return true;
		}
		break;
		case Intrinsic::ppc_qpx_qvlfd:
		// Turn PPC QPX qvlfd -> load if the pointer is known aligned.
		if (getOrEnforceKnownAlignment(
		II.getArgOperand(0), Align(32), IC.getDataLayout(), &II,
		&IC.getAssumptionCache(), &IC.getDominatorTree()) >= 32) {
		Value *Ptr = IC.Builder.CreateBitCast(
		II.getArgOperand(0), PointerType::getUnqual(II.getType()));
		*ResultI = new LoadInst(II.getType(), Ptr, "", false, Align(32));
		return true;
		}
		break;
		case Intrinsic::ppc_qpx_qvstfs:
		// Turn PPC QPX qvstfs -> store if the pointer is known aligned.
		if (getOrEnforceKnownAlignment(
		II.getArgOperand(1), Align(16), IC.getDataLayout(), &II,
		&IC.getAssumptionCache(), &IC.getDominatorTree()) >= 16) {
		Type *VTy = VectorType::get(
		IC.Builder.getFloatTy(),
		cast<VectorType>(II.getArgOperand(0)->getType())->getElementCount());
		Value *TOp = IC.Builder.CreateFPTrunc(II.getArgOperand(0), VTy);
		Type *OpPtrTy = PointerType::getUnqual(VTy);
		Value *Ptr = IC.Builder.CreateBitCast(II.getArgOperand(1), OpPtrTy);
		*ResultI = new StoreInst(TOp, Ptr, false, Align(16));
		return true;
		}
		break;
		case Intrinsic::ppc_qpx_qvstfd:
		// Turn PPC QPX qvstfd -> store if the pointer is known aligned.
		if (getOrEnforceKnownAlignment(
		II.getArgOperand(1), Align(32), IC.getDataLayout(), &II,
		&IC.getAssumptionCache(), &IC.getDominatorTree()) >= 32) {
		Type *OpPtrTy = PointerType::getUnqual(II.getArgOperand(0)->getType());
		Value *Ptr = IC.Builder.CreateBitCast(II.getArgOperand(1), OpPtrTy);
		*ResultI = new StoreInst(II.getArgOperand(0), Ptr, false, Align(32));
		return true;
		}
		break;

		case Intrinsic::ppc_altivec_vperm:
		// Turn vperm(V1,V2,mask) -> shuffle(V1,V2,mask) if mask is a constant.
		// Note that ppc_altivec_vperm has a big-endian bias, so when creating
		// a vectorshuffle for little endian, we must undo the transformation
		// performed on vec_perm in altivec.h. That is, we must complement
		// the permutation mask with respect to 31 and reverse the order of
		// V1 and V2.
		if (Constant *Mask = dyn_cast<Constant>(II.getArgOperand(2))) {
		assert(cast<VectorType>(Mask->getType())->getNumElements() == 16 &&
		"Bad type for intrinsic!");

		// Check that all of the elements are integer constants or undefs.
		bool AllEltsOk = true;
		for (unsigned i = 0; i != 16; ++i) {
		Constant *Elt = Mask->getAggregateElement(i);
		if (!Elt || !(isa<ConstantInt>(Elt) || isa<UndefValue>(Elt))) {
		AllEltsOk = false;
		break;
		}
		}

		if (AllEltsOk) {
		// Cast the input vectors to byte vectors.
		Value *Op0 =
		IC.Builder.CreateBitCast(II.getArgOperand(0), Mask->getType());
		Value *Op1 =
		IC.Builder.CreateBitCast(II.getArgOperand(1), Mask->getType());
		Value *Result = UndefValue::get(Op0->getType());

		// Only extract each element once.
		Value *ExtractedElts[32];
		memset(ExtractedElts, 0, sizeof(ExtractedElts));

		for (unsigned i = 0; i != 16; ++i) {
		if (isa<UndefValue>(Mask->getAggregateElement(i)))
		continue;
		unsigned Idx =
		cast<ConstantInt>(Mask->getAggregateElement(i))->getZExtValue();
		Idx &= 31; // Match the hardware behavior.
		if (DL.isLittleEndian())
		Idx = 31 - Idx;

		if (!ExtractedElts[Idx]) {
		Value *Op0ToUse = (DL.isLittleEndian()) ? Op1 : Op0;
		Value *Op1ToUse = (DL.isLittleEndian()) ? Op0 : Op1;
		ExtractedElts[Idx] = IC.Builder.CreateExtractElement(
		Idx < 16 ? Op0ToUse : Op1ToUse, IC.Builder.getInt32(Idx & 15));
		}

		// Insert this value into the result vector.
		Result = IC.Builder.CreateInsertElement(Result, ExtractedElts[Idx],
		IC.Builder.getInt32(i));
		}
		*ResultI = CastInst::Create(Instruction::BitCast, Result, II.getType());
		return true;
		}
		}
		break;
		}
		return false;
		}
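
The endianness handling in the vperm fold above can be checked on plain byte arrays. The following is a minimal sketch, not LLVM code: `foldVPerm` and `Bytes16` are hypothetical names, undef mask elements are not modelled, and the function only mirrors the index arithmetic shown in the patch (mask with 31, complement with respect to 31 on little endian, swap the operands).

```cpp
#include <array>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Hypothetical scalar model of the constant-mask vperm fold.
Bytes16 foldVPerm(const Bytes16 &V1, const Bytes16 &V2, const Bytes16 &Mask,
                  bool IsLittleEndian) {
  // On little endian the two inputs swap roles, as in the patch.
  const Bytes16 &Op0ToUse = IsLittleEndian ? V2 : V1;
  const Bytes16 &Op1ToUse = IsLittleEndian ? V1 : V2;

  Bytes16 Result{};
  for (unsigned I = 0; I != 16; ++I) {
    unsigned Idx = Mask[I] & 31; // only the low five bits select a byte
    if (IsLittleEndian)
      Idx = 31 - Idx;            // complement with respect to 31
    Result[I] = Idx < 16 ? Op0ToUse[Idx] : Op1ToUse[Idx & 15];
  }
  return Result;
}
```

With an identity mask on big endian the result is V1 unchanged. On little endian a mask of 31 - i selects the bytes of V2 in order, which illustrates why both the complement and the operand swap are needed together.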

int PPCTTIImpl::getIntImmCost(const APInt &Imm, Type *Ty,
TTI::TargetCostKind CostKind) {
if (DisablePPCConstHoist)
return BaseT::getIntImmCost(Imm, Ty, CostKind);

assert(Ty->isIntegerTy());

unsigned BitSize = Ty->getPrimitiveSizeInBits();
▲ Show 20 Lines • Show All 770 Lines • ▼ Show 20 Lines	if (ST->hasP9Altivec()) {
if (Index == MfvsrwzIndex)
return 1;
}

// We need a vector extract (or mfvsrld). Assume vector operation cost.
// The cost of the load constant for a vector extract is disregarded
// (invariant, easily schedulable).
return vectorCostAdjustment(1, Opcode, Val, nullptr);

} else if (ST->hasDirectMove())
// Assume permute has standard cost.
// Assume move-to/move-from VSR have 2x standard cost.
return 3;
}

// Estimated cost of a load-hit-store delay. This was obtained
// experimentally as a minimum needed to prevent unprofitable

llvm/lib/Target/X86/CMakeLists.txt

set(sources
X86FrameLowering.cpp
X86InstructionSelector.cpp
X86ISelDAGToDAG.cpp
X86ISelLowering.cpp
X86IndirectBranchTracking.cpp
X86IndirectThunks.cpp
X86InterleavedAccess.cpp
X86InsertPrefetch.cpp
		X86InstCombineIntrinsic.cpp
X86InstrFMA3Info.cpp
X86InstrFoldTables.cpp
X86InstrInfo.cpp
X86EvexToVex.cpp
X86LegalizerInfo.cpp
X86LoadValueInjectionLoadHardening.cpp
X86LoadValueInjectionRetHardening.cpp
X86MCInstLower.cpp

llvm/lib/Target/X86/X86InstCombineIntrinsic.cpp

This file was added.

				//===-- X86InstCombineIntrinsic.cpp - X86 specific InstCombine pass -------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				/// This file implements a TargetTransformInfo analysis pass specific to the
				/// X86 target machine. It uses the target's detailed information to provide
				/// more precise answers to certain TTI queries, while letting the target
				/// independent and default TTI implementations handle the rest.
				///
				//===----------------------------------------------------------------------===//

				#include "X86TargetTransformInfo.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/IR/IntrinsicsX86.h"
				#include "llvm/Transforms/InstCombine/InstCombiner.h"

				using namespace llvm;

				#define DEBUG_TYPE "x86tti"

				/// Return a constant boolean vector that has true elements in all positions
				/// where the input constant data vector has an element with the sign bit set.
				static Constant *getNegativeIsTrueBoolVec(ConstantDataVector *V) {
				SmallVector<Constant *, 32> BoolVec;
				IntegerType *BoolTy = Type::getInt1Ty(V->getContext());
				for (unsigned I = 0, E = V->getNumElements(); I != E; ++I) {
				Constant *Elt = V->getElementAsConstant(I);
				assert((isa<ConstantInt>(Elt) || isa<ConstantFP>(Elt)) &&
				"Unexpected constant data vector element type");
				bool Sign = V->getElementType()->isIntegerTy()
				? cast<ConstantInt>(Elt)->isNegative()
				: cast<ConstantFP>(Elt)->isNegative();
				BoolVec.push_back(ConstantInt::get(BoolTy, Sign));
				}
				return ConstantVector::get(BoolVec);
				}
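
As a rough illustration of what getNegativeIsTrueBoolVec computes, the sketch below maps each element to "sign bit set" using standard containers only. The name `negativeIsTrue` is hypothetical, and the floating-point overload assumes the check is a pure sign-bit test, so negative zero counts as true.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Integer elements: the sign bit is set exactly when the value is negative.
std::vector<bool> negativeIsTrue(const std::vector<std::int32_t> &V) {
  std::vector<bool> BoolVec;
  for (std::int32_t Elt : V)
    BoolVec.push_back(Elt < 0);
  return BoolVec;
}

// Floating-point elements: test the sign bit itself, so -0.0f is "true".
std::vector<bool> negativeIsTrue(const std::vector<float> &V) {
  std::vector<bool> BoolVec;
  for (float Elt : V)
    BoolVec.push_back(std::signbit(Elt));
  return BoolVec;
}
```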

				// TODO: If the x86 backend knew how to convert a bool vector mask back to an
				// XMM register mask efficiently, we could transform all x86 masked intrinsics
				// to LLVM masked intrinsics and remove the x86 masked intrinsic defs.
				static Instruction *simplifyX86MaskedLoad(IntrinsicInst &II, InstCombiner &IC) {
				Value *Ptr = II.getOperand(0);
				Value *Mask = II.getOperand(1);
				Constant *ZeroVec = Constant::getNullValue(II.getType());

				// Special case a zero mask since that's not a ConstantDataVector.
				// This masked load instruction creates a zero vector.
				if (isa<ConstantAggregateZero>(Mask))
				return IC.replaceInstUsesWith(II, ZeroVec);

				auto *ConstMask = dyn_cast<ConstantDataVector>(Mask);
				if (!ConstMask)
				return nullptr;

				// The mask is constant. Convert this x86 intrinsic to the LLVM intrinsic
				// to allow target-independent optimizations.

				// First, cast the x86 intrinsic scalar pointer to a vector pointer to match
				// the LLVM intrinsic definition for the pointer argument.
				unsigned AddrSpace = cast<PointerType>(Ptr->getType())->getAddressSpace();
				PointerType *VecPtrTy = PointerType::get(II.getType(), AddrSpace);
				Value *PtrCast = IC.Builder.CreateBitCast(Ptr, VecPtrTy, "castvec");

				// Second, convert the x86 XMM integer vector mask to a vector of bools based
				// on each element's most significant bit (the sign bit).
				Constant *BoolMask = getNegativeIsTrueBoolVec(ConstMask);

				// The pass-through vector for an x86 masked load is a zero vector.
				CallInst *NewMaskedLoad =
				IC.Builder.CreateMaskedLoad(PtrCast, Align(1), BoolMask, ZeroVec);
				return IC.replaceInstUsesWith(II, NewMaskedLoad);
				}
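
The semantics that simplifyX86MaskedLoad relies on can be sketched in scalar form: a lane is read only when its mask element has the sign bit set, and every other lane yields zero (the zero pass-through vector). `maskedLoadModel` is a hypothetical helper for illustration, assuming 32-bit lanes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an x86 masked load with a zero pass-through.
std::vector<std::int32_t> maskedLoadModel(const std::int32_t *Ptr,
                                          const std::vector<std::int32_t> &Mask) {
  std::vector<std::int32_t> Result(Mask.size(), 0); // pass-through is zero
  for (std::size_t I = 0; I != Mask.size(); ++I)
    if (Mask[I] < 0)      // sign bit set: the lane is enabled
      Result[I] = Ptr[I]; // disabled lanes never touch memory
  return Result;
}
```

An all-zero mask therefore produces a zero vector without reading memory, which matches the ConstantAggregateZero special case at the top of the function.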

				// TODO: If the x86 backend knew how to convert a bool vector mask back to an
				// XMM register mask efficiently, we could transform all x86 masked intrinsics
				// to LLVM masked intrinsics and remove the x86 masked intrinsic defs.
				static bool simplifyX86MaskedStore(IntrinsicInst &II, InstCombiner &IC) {
				Value *Ptr = II.getOperand(0);
				Value *Mask = II.getOperand(1);
				Value *Vec = II.getOperand(2);

				// Special case a zero mask since that's not a ConstantDataVector:
				// this masked store instruction does nothing.
				if (isa<ConstantAggregateZero>(Mask)) {
				IC.eraseInstFromFunction(II);
				return true;
				}

				// The SSE2 version is too weird (eg, unaligned but non-temporal) to do
				// anything else at this level.
				if (II.getIntrinsicID() == Intrinsic::x86_sse2_maskmov_dqu)
				return false;

				auto *ConstMask = dyn_cast<ConstantDataVector>(Mask);
				if (!ConstMask)
				return false;

				// The mask is constant. Convert this x86 intrinsic to the LLVM intrinsic
				// to allow target-independent optimizations.

				// First, cast the x86 intrinsic scalar pointer to a vector pointer to match
				// the LLVM intrinsic definition for the pointer argument.
				unsigned AddrSpace = cast<PointerType>(Ptr->getType())->getAddressSpace();
				PointerType *VecPtrTy = PointerType::get(Vec->getType(), AddrSpace);
				Value *PtrCast = IC.Builder.CreateBitCast(Ptr, VecPtrTy, "castvec");

				// Second, convert the x86 XMM integer vector mask to a vector of bools based
				// on each element's most significant bit (the sign bit).
				Constant *BoolMask = getNegativeIsTrueBoolVec(ConstMask);

				IC.Builder.CreateMaskedStore(Vec, PtrCast, Align(1), BoolMask);

				// 'Replace uses' doesn't work for stores. Erase the original masked store.
				IC.eraseInstFromFunction(II);
				return true;
				}

				static Value *simplifyX86immShift(const IntrinsicInst &II,
				InstCombiner::BuilderTy &Builder) {
				bool LogicalShift = false;
				bool ShiftLeft = false;
				bool IsImm = false;

				switch (II.getIntrinsicID()) {
				default:
				llvm_unreachable("Unexpected intrinsic!");
				case Intrinsic::x86_sse2_psrai_d:
				case Intrinsic::x86_sse2_psrai_w:
				case Intrinsic::x86_avx2_psrai_d:
				case Intrinsic::x86_avx2_psrai_w:
				case Intrinsic::x86_avx512_psrai_q_128:
				case Intrinsic::x86_avx512_psrai_q_256:
				case Intrinsic::x86_avx512_psrai_d_512:
				case Intrinsic::x86_avx512_psrai_q_512:
				case Intrinsic::x86_avx512_psrai_w_512:
				IsImm = true;
				LLVM_FALLTHROUGH;
				case Intrinsic::x86_sse2_psra_d:
				case Intrinsic::x86_sse2_psra_w:
				case Intrinsic::x86_avx2_psra_d:
				case Intrinsic::x86_avx2_psra_w:
				case Intrinsic::x86_avx512_psra_q_128:
				case Intrinsic::x86_avx512_psra_q_256:
				case Intrinsic::x86_avx512_psra_d_512:
				case Intrinsic::x86_avx512_psra_q_512:
				case Intrinsic::x86_avx512_psra_w_512:
				LogicalShift = false;
				ShiftLeft = false;
				break;
				case Intrinsic::x86_sse2_psrli_d:
				case Intrinsic::x86_sse2_psrli_q:
				case Intrinsic::x86_sse2_psrli_w:
				case Intrinsic::x86_avx2_psrli_d:
				case Intrinsic::x86_avx2_psrli_q:
				case Intrinsic::x86_avx2_psrli_w:
				case Intrinsic::x86_avx512_psrli_d_512:
				case Intrinsic::x86_avx512_psrli_q_512:
				case Intrinsic::x86_avx512_psrli_w_512:
				IsImm = true;
				LLVM_FALLTHROUGH;
				case Intrinsic::x86_sse2_psrl_d:
				case Intrinsic::x86_sse2_psrl_q:
				case Intrinsic::x86_sse2_psrl_w:
				case Intrinsic::x86_avx2_psrl_d:
				case Intrinsic::x86_avx2_psrl_q:
				case Intrinsic::x86_avx2_psrl_w:
				case Intrinsic::x86_avx512_psrl_d_512:
				case Intrinsic::x86_avx512_psrl_q_512:
				case Intrinsic::x86_avx512_psrl_w_512:
				LogicalShift = true;
				ShiftLeft = false;
				break;
				case Intrinsic::x86_sse2_pslli_d:
				case Intrinsic::x86_sse2_pslli_q:
				case Intrinsic::x86_sse2_pslli_w:
				case Intrinsic::x86_avx2_pslli_d:
				case Intrinsic::x86_avx2_pslli_q:
				case Intrinsic::x86_avx2_pslli_w:
				case Intrinsic::x86_avx512_pslli_d_512:
				case Intrinsic::x86_avx512_pslli_q_512:
				case Intrinsic::x86_avx512_pslli_w_512:
				IsImm = true;
				LLVM_FALLTHROUGH;
				case Intrinsic::x86_sse2_psll_d:
				case Intrinsic::x86_sse2_psll_q:
				case Intrinsic::x86_sse2_psll_w:
				case Intrinsic::x86_avx2_psll_d:
				case Intrinsic::x86_avx2_psll_q:
				case Intrinsic::x86_avx2_psll_w:
				case Intrinsic::x86_avx512_psll_d_512:
				case Intrinsic::x86_avx512_psll_q_512:
				case Intrinsic::x86_avx512_psll_w_512:
				LogicalShift = true;
				ShiftLeft = true;
				break;
				}
				assert((LogicalShift || !ShiftLeft) && "Only logical shifts can shift left");

				auto Vec = II.getArgOperand(0);
				auto Amt = II.getArgOperand(1);
				auto VT = cast<VectorType>(Vec->getType());
				auto SVT = VT->getElementType();
				auto AmtVT = Amt->getType();
				unsigned VWidth = VT->getNumElements();
				unsigned BitWidth = SVT->getPrimitiveSizeInBits();

				// If the shift amount is guaranteed to be in-range we can replace it with a
				// generic shift. If it's guaranteed to be out of range, logical shifts combine
				// to zero and arithmetic shifts are clamped to (BitWidth - 1).
				if (IsImm) {
				assert(AmtVT->isIntegerTy(32) && "Unexpected shift-by-immediate type");
				KnownBits KnownAmtBits =
				llvm::computeKnownBits(Amt, II.getModule()->getDataLayout());
				if (KnownAmtBits.getMaxValue().ult(BitWidth)) {
				Amt = Builder.CreateZExtOrTrunc(Amt, SVT);
				Amt = Builder.CreateVectorSplat(VWidth, Amt);
				return (LogicalShift ? (ShiftLeft ? Builder.CreateShl(Vec, Amt)
				: Builder.CreateLShr(Vec, Amt))
				: Builder.CreateAShr(Vec, Amt));
				}
				if (KnownAmtBits.getMinValue().uge(BitWidth)) {
				if (LogicalShift)
				return ConstantAggregateZero::get(VT);
				Amt = ConstantInt::get(SVT, BitWidth - 1);
				return Builder.CreateAShr(Vec, Builder.CreateVectorSplat(VWidth, Amt));
				}
				} else {
				// Ensure the first element has an in-range value and the rest of the
				// elements in the bottom 64 bits are zero.
				assert(AmtVT->isVectorTy() && AmtVT->getPrimitiveSizeInBits() == 128 &&
				cast<VectorType>(AmtVT)->getElementType() == SVT &&
				"Unexpected shift-by-scalar type");
				unsigned NumAmtElts = cast<VectorType>(AmtVT)->getNumElements();
				APInt DemandedLower = APInt::getOneBitSet(NumAmtElts, 0);
				APInt DemandedUpper = APInt::getBitsSet(NumAmtElts, 1, NumAmtElts / 2);
				KnownBits KnownLowerBits = llvm::computeKnownBits(
				Amt, DemandedLower, II.getModule()->getDataLayout());
				KnownBits KnownUpperBits = llvm::computeKnownBits(
				Amt, DemandedUpper, II.getModule()->getDataLayout());
				if (KnownLowerBits.getMaxValue().ult(BitWidth) &&
				(DemandedUpper.isNullValue() || KnownUpperBits.isZero())) {
				SmallVector<int, 16> ZeroSplat(VWidth, 0);
				Amt = Builder.CreateShuffleVector(Amt, Amt, ZeroSplat);
				return (LogicalShift ? (ShiftLeft ? Builder.CreateShl(Vec, Amt)
				: Builder.CreateLShr(Vec, Amt))
				: Builder.CreateAShr(Vec, Amt));
				}
				}

				// Simplify if count is constant vector.
				auto CDV = dyn_cast<ConstantDataVector>(Amt);
				if (!CDV)
				return nullptr;

				// SSE2/AVX2 uses all the first 64-bits of the 128-bit vector
				// operand to compute the shift amount.
				assert(AmtVT->isVectorTy() && AmtVT->getPrimitiveSizeInBits() == 128 &&
				cast<VectorType>(AmtVT)->getElementType() == SVT &&
				"Unexpected shift-by-scalar type");

				// Concatenate the sub-elements to create the 64-bit value.
				APInt Count(64, 0);
				for (unsigned i = 0, NumSubElts = 64 / BitWidth; i != NumSubElts; ++i) {
				unsigned SubEltIdx = (NumSubElts - 1) - i;
				auto SubElt = cast<ConstantInt>(CDV->getElementAsConstant(SubEltIdx));
				Count <<= BitWidth;
				Count |= SubElt->getValue().zextOrTrunc(64);
				}

				// If shift-by-zero then just return the original value.
				if (Count.isNullValue())
				return Vec;

				// Handle cases when Shift >= BitWidth.
				if (Count.uge(BitWidth)) {
				// If LogicalShift - just return zero.
				if (LogicalShift)
				return ConstantAggregateZero::get(VT);

				// If ArithmeticShift - clamp Shift to (BitWidth - 1).
				Count = APInt(64, BitWidth - 1);
				}

				// Get a constant vector of the same type as the first operand.
				auto ShiftAmt = ConstantInt::get(SVT, Count.zextOrTrunc(BitWidth));
				auto ShiftVec = Builder.CreateVectorSplat(VWidth, ShiftAmt);

				if (ShiftLeft)
				return Builder.CreateShl(Vec, ShiftVec);

				if (LogicalShift)
				return Builder.CreateLShr(Vec, ShiftVec);

				return Builder.CreateAShr(Vec, ShiftVec);
				}
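
The constant-count path of simplifyX86immShift has two steps that are easy to get wrong: building the 64-bit count from the low sub-elements, and handling counts that are out of range. The sketch below models both for 16-bit elements. It is an illustration under stated assumptions, not the patch's code: `concatCount`, `shiftElt` and `ShiftKind` are hypothetical names, and the element width is fixed at 16.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class ShiftKind { LogicalLeft, LogicalRight, ArithmeticRight };

constexpr unsigned BitWidth = 16; // element width assumed for this sketch

// Concatenate the sub-elements covering the low 64 bits of the count operand.
// Element 0 ends up in the least significant position.
std::uint64_t concatCount(const std::vector<std::uint16_t> &AmtElts) {
  const unsigned NumSubElts = 64 / BitWidth;
  assert(AmtElts.size() >= NumSubElts && "need the low 64 bits of the operand");
  std::uint64_t Count = 0;
  for (unsigned I = 0; I != NumSubElts; ++I) {
    Count <<= BitWidth;
    Count |= AmtElts[(NumSubElts - 1) - I];
  }
  return Count;
}

// Shift one element. Out-of-range logical shifts give zero; out-of-range
// arithmetic shifts are clamped to BitWidth - 1 (a splat of the sign bit).
std::int16_t shiftElt(std::int16_t Elt, std::uint64_t Count, ShiftKind Kind) {
  if (Count >= BitWidth) {
    if (Kind != ShiftKind::ArithmeticRight)
      return 0;
    Count = BitWidth - 1;
  }
  const unsigned Amt = static_cast<unsigned>(Count);
  const std::uint16_t Bits = static_cast<std::uint16_t>(Elt);
  switch (Kind) {
  case ShiftKind::LogicalLeft:
    return static_cast<std::int16_t>(static_cast<std::uint16_t>(Bits << Amt));
  case ShiftKind::LogicalRight:
    return static_cast<std::int16_t>(Bits >> Amt);
  case ShiftKind::ArithmeticRight: {
    // Portable arithmetic shift: avoid right-shifting a negative value.
    const int Wide = Elt;
    return static_cast<std::int16_t>(Wide < 0 ? ~(~Wide >> Amt) : Wide >> Amt);
  }
  }
  return 0;
}
```

Note that a count vector whose element 0 is zero but whose element 1 is nonzero still produces a large 64-bit count, so a logical shift folds to zero in that case.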

				// Attempt to simplify AVX2 per-element shift intrinsics to a generic IR shift.
				// Unlike the generic IR shifts, the intrinsics have defined behaviour for out
				// of range shift amounts (logical - set to zero, arithmetic - splat sign bit).
				static Value *simplifyX86varShift(const IntrinsicInst &II,
				InstCombiner::BuilderTy &Builder) {
				bool LogicalShift = false;
				bool ShiftLeft = false;

				switch (II.getIntrinsicID()) {
				default:
				llvm_unreachable("Unexpected intrinsic!");
				case Intrinsic::x86_avx2_psrav_d:
				case Intrinsic::x86_avx2_psrav_d_256:
				case Intrinsic::x86_avx512_psrav_q_128:
				case Intrinsic::x86_avx512_psrav_q_256:
				case Intrinsic::x86_avx512_psrav_d_512:
				case Intrinsic::x86_avx512_psrav_q_512:
				case Intrinsic::x86_avx512_psrav_w_128:
				case Intrinsic::x86_avx512_psrav_w_256:
				case Intrinsic::x86_avx512_psrav_w_512:
				LogicalShift = false;
				ShiftLeft = false;
				break;
				case Intrinsic::x86_avx2_psrlv_d:
				case Intrinsic::x86_avx2_psrlv_d_256:
				case Intrinsic::x86_avx2_psrlv_q:
				case Intrinsic::x86_avx2_psrlv_q_256:
				case Intrinsic::x86_avx512_psrlv_d_512:
				case Intrinsic::x86_avx512_psrlv_q_512:
				case Intrinsic::x86_avx512_psrlv_w_128:
				case Intrinsic::x86_avx512_psrlv_w_256:
				case Intrinsic::x86_avx512_psrlv_w_512:
				LogicalShift = true;
				ShiftLeft = false;
				break;
				case Intrinsic::x86_avx2_psllv_d:
				case Intrinsic::x86_avx2_psllv_d_256:
				case Intrinsic::x86_avx2_psllv_q:
				case Intrinsic::x86_avx2_psllv_q_256:
				case Intrinsic::x86_avx512_psllv_d_512:
				case Intrinsic::x86_avx512_psllv_q_512:
				case Intrinsic::x86_avx512_psllv_w_128:
				case Intrinsic::x86_avx512_psllv_w_256:
				case Intrinsic::x86_avx512_psllv_w_512:
				LogicalShift = true;
				ShiftLeft = true;
				break;
				}
				assert((LogicalShift || !ShiftLeft) && "Only logical shifts can shift left");

				auto Vec = II.getArgOperand(0);
				auto Amt = II.getArgOperand(1);
				auto VT = cast<VectorType>(II.getType());
				auto SVT = VT->getElementType();
				int NumElts = VT->getNumElements();
				int BitWidth = SVT->getIntegerBitWidth();

				// If the shift amount is guaranteed to be in-range we can replace it with a
				// generic shift.
				APInt UpperBits =
				APInt::getHighBitsSet(BitWidth, BitWidth - Log2_32(BitWidth));
				if (llvm::MaskedValueIsZero(Amt, UpperBits,
				II.getModule()->getDataLayout())) {
				return (LogicalShift ? (ShiftLeft ? Builder.CreateShl(Vec, Amt)
				: Builder.CreateLShr(Vec, Amt))
				: Builder.CreateAShr(Vec, Amt));
				}

				// Simplify if all shift amounts are constant/undef.
				auto *CShift = dyn_cast<Constant>(Amt);
				if (!CShift)
				return nullptr;

				// Collect each element's shift amount.
				// We also collect special cases: UNDEF = -1, OUT-OF-RANGE = BitWidth.
				bool AnyOutOfRange = false;
				SmallVector<int, 8> ShiftAmts;
				for (int I = 0; I < NumElts; ++I) {
				auto *CElt = CShift->getAggregateElement(I);
				if (CElt && isa<UndefValue>(CElt)) {
				ShiftAmts.push_back(-1);
				continue;
				}

				auto *COp = dyn_cast_or_null<ConstantInt>(CElt);
				if (!COp)
				return nullptr;

				// Handle out of range shifts.
				// If LogicalShift - set to BitWidth (special case).
				// If ArithmeticShift - set to (BitWidth - 1) (sign splat).
				APInt ShiftVal = COp->getValue();
				if (ShiftVal.uge(BitWidth)) {
				AnyOutOfRange = LogicalShift;
				ShiftAmts.push_back(LogicalShift ? BitWidth : BitWidth - 1);
				continue;
				}

				ShiftAmts.push_back((int)ShiftVal.getZExtValue());
				}

				// If all elements out of range or UNDEF, return vector of zeros/undefs.
				// ArithmeticShift should only hit this if they are all UNDEF.
				auto OutOfRange = [&](int Idx) { return (Idx < 0) || (BitWidth <= Idx); };
				if (llvm::all_of(ShiftAmts, OutOfRange)) {
				SmallVector<Constant *, 8> ConstantVec;
				for (int Idx : ShiftAmts) {
				if (Idx < 0) {
				ConstantVec.push_back(UndefValue::get(SVT));
				} else {
				assert(LogicalShift && "Logical shift expected");
				ConstantVec.push_back(ConstantInt::getNullValue(SVT));
				}
				}
				return ConstantVector::get(ConstantVec);
				}

				// We can't handle only some out of range values with generic logical shifts.
				if (AnyOutOfRange)
				return nullptr;

				// Build the shift amount constant vector.
				SmallVector<Constant *, 8> ShiftVecAmts;
				for (int Idx : ShiftAmts) {
				if (Idx < 0)
				ShiftVecAmts.push_back(UndefValue::get(SVT));
				else
				ShiftVecAmts.push_back(ConstantInt::get(SVT, Idx));
				}
				auto ShiftVec = ConstantVector::get(ShiftVecAmts);

				if (ShiftLeft)
				return Builder.CreateShl(Vec, ShiftVec);

				if (LogicalShift)
				return Builder.CreateLShr(Vec, ShiftVec);

				return Builder.CreateAShr(Vec, ShiftVec);
				}
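
Two decisions in simplifyX86varShift can be sketched without LLVM types: the mask used to prove that a shift amount is in range, and the three-way outcome for constant amounts. In the sketch below `upperBitsMask`, `classify`, `Fold` and `Amt` are hypothetical names. `std::nullopt` stands in for an undef element, and the element width is assumed to be a power of two no larger than 64.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

using Amt = std::optional<std::uint64_t>; // std::nullopt models an undef element

enum class Fold { ConstantZeroOrUndef, GenericShift, None };

unsigned log2Floor(unsigned V) {
  unsigned L = 0;
  while (V >>= 1)
    ++L;
  return L;
}

// Bits [log2(BitWidth), BitWidth) set: if all of them are known zero in the
// amount, the amount is below BitWidth and a generic IR shift is safe.
std::uint64_t upperBitsMask(unsigned BitWidth) {
  assert(BitWidth >= 2 && BitWidth <= 64 && (BitWidth & (BitWidth - 1)) == 0);
  const std::uint64_t All =
      BitWidth == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << BitWidth) - 1;
  const std::uint64_t Low = (std::uint64_t{1} << log2Floor(BitWidth)) - 1;
  return All & ~Low;
}

// Outcome for a vector of constant per-element amounts.
Fold classify(const std::vector<Amt> &Amts, unsigned BitWidth,
              bool LogicalShift) {
  bool AnyOutOfRange = false;
  bool AllOutOfRangeOrUndef = true;
  for (const Amt &A : Amts) {
    if (!A)
      continue; // undef element
    if (*A >= BitWidth && LogicalShift) {
      AnyOutOfRange = true; // this lane must become zero
      continue;
    }
    // In range, or an arithmetic shift clamped to BitWidth - 1.
    AllOutOfRangeOrUndef = false;
  }
  if (AllOutOfRangeOrUndef)
    return Fold::ConstantZeroOrUndef;
  if (AnyOutOfRange)
    return Fold::None; // a generic logical shift cannot express the mix
  return Fold::GenericShift;
}
```

The mixed case is the interesting one: a logical shift with some in-range and some out-of-range lanes is left alone, whereas the same amounts on an arithmetic shift still fold because the out-of-range lanes are clamped.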

				static Value *simplifyX86pack(IntrinsicInst &II,
				InstCombiner::BuilderTy &Builder, bool IsSigned) {
				Value *Arg0 = II.getArgOperand(0);
				Value *Arg1 = II.getArgOperand(1);
				Type *ResTy = II.getType();

				// Fast all undef handling.
				if (isa<UndefValue>(Arg0) && isa<UndefValue>(Arg1))
				return UndefValue::get(ResTy);

				auto *ArgTy = cast<VectorType>(Arg0->getType());
				unsigned NumLanes = ResTy->getPrimitiveSizeInBits() / 128;
				unsigned NumSrcElts = ArgTy->getNumElements();
				assert(cast<VectorType>(ResTy)->getNumElements() == (2 * NumSrcElts) &&
				"Unexpected packing types");

				unsigned NumSrcEltsPerLane = NumSrcElts / NumLanes;
				unsigned DstScalarSizeInBits = ResTy->getScalarSizeInBits();
				unsigned SrcScalarSizeInBits = ArgTy->getScalarSizeInBits();
				assert(SrcScalarSizeInBits == (2 * DstScalarSizeInBits) &&
				"Unexpected packing types");

				// Constant folding.
				if (!isa<Constant>(Arg0) || !isa<Constant>(Arg1))
				return nullptr;

				// Clamp Values - signed/unsigned both use signed clamp values, but they
				// differ on the min/max values.
				APInt MinValue, MaxValue;
				if (IsSigned) {
				// PACKSS: Truncate signed value with signed saturation.
				// Source values less than dst minint are saturated to minint.
				// Source values greater than dst maxint are saturated to maxint.
				MinValue =
				APInt::getSignedMinValue(DstScalarSizeInBits).sext(SrcScalarSizeInBits);
				MaxValue =
				APInt::getSignedMaxValue(DstScalarSizeInBits).sext(SrcScalarSizeInBits);
				} else {
				// PACKUS: Truncate signed value with unsigned saturation.
				// Source values less than zero are saturated to zero.
				// Source values greater than dst maxuint are saturated to maxuint.
				MinValue = APInt::getNullValue(SrcScalarSizeInBits);
				MaxValue = APInt::getLowBitsSet(SrcScalarSizeInBits, DstScalarSizeInBits);
				}

				auto *MinC = Constant::getIntegerValue(ArgTy, MinValue);
				auto *MaxC = Constant::getIntegerValue(ArgTy, MaxValue);
				Arg0 = Builder.CreateSelect(Builder.CreateICmpSLT(Arg0, MinC), MinC, Arg0);
				Arg1 = Builder.CreateSelect(Builder.CreateICmpSLT(Arg1, MinC), MinC, Arg1);
				Arg0 = Builder.CreateSelect(Builder.CreateICmpSGT(Arg0, MaxC), MaxC, Arg0);
				Arg1 = Builder.CreateSelect(Builder.CreateICmpSGT(Arg1, MaxC), MaxC, Arg1);

				// Shuffle clamped args together at the lane level.
				SmallVector<int, 32> PackMask;
				for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
				for (unsigned Elt = 0; Elt != NumSrcEltsPerLane; ++Elt)
				PackMask.push_back(Elt + (Lane * NumSrcEltsPerLane));
				for (unsigned Elt = 0; Elt != NumSrcEltsPerLane; ++Elt)
				PackMask.push_back(Elt + (Lane * NumSrcEltsPerLane) + NumSrcElts);
				}
				auto *Shuffle = Builder.CreateShuffleVector(Arg0, Arg1, PackMask);

				// Truncate to dst size.
				return Builder.CreateTrunc(Shuffle, ResTy);
				}
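
A scalar model may make the clamp and lane-level shuffle above easier to follow. The sketch below is not part of the patch: `packSignedSat`, `packUnsignedSat` and `buildPackMask` are hypothetical names, the element widths are fixed to 16-bit sources and 8-bit results for illustration, and it assumes `NumSrcElts` equals `NumLanes * NumSrcEltsPerLane` (its definition lies outside this hunk).

```cpp
#include <cstdint>
#include <vector>

// PACKSS-style clamp: signed 16-bit source, signed 8-bit result.
int8_t packSignedSat(int16_t Src) {
  if (Src < INT8_MIN)
    return INT8_MIN;
  if (Src > INT8_MAX)
    return INT8_MAX;
  return static_cast<int8_t>(Src);
}

// PACKUS-style clamp: the source is still read as signed, but the
// result range is [0, 255].
uint8_t packUnsignedSat(int16_t Src) {
  if (Src < 0)
    return 0;
  if (Src > UINT8_MAX)
    return UINT8_MAX;
  return static_cast<uint8_t>(Src);
}

// Same loop shape as PackMask above. Assumption: NumSrcElts is
// NumLanes * NumSrcEltsPerLane.
std::vector<int> buildPackMask(unsigned NumLanes, unsigned NumSrcEltsPerLane) {
  const unsigned NumSrcElts = NumLanes * NumSrcEltsPerLane;
  std::vector<int> Mask;
  for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
    // First the elements of this 128-bit lane taken from Arg0 ...
    for (unsigned Elt = 0; Elt != NumSrcEltsPerLane; ++Elt)
      Mask.push_back(static_cast<int>(Elt + Lane * NumSrcEltsPerLane));
    // ... then the elements of the same lane taken from Arg1.
    for (unsigned Elt = 0; Elt != NumSrcEltsPerLane; ++Elt)
      Mask.push_back(
          static_cast<int>(Elt + Lane * NumSrcEltsPerLane + NumSrcElts));
  }
  return Mask;
}
```

With two lanes of four source elements each, the mask interleaves the two arguments per lane rather than concatenating them, which is why the fold builds the mask explicitly instead of using a plain concatenating shuffle.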

				static Value *simplifyX86movmsk(const IntrinsicInst &II,
				InstCombiner::BuilderTy &Builder) {
				Value *Arg = II.getArgOperand(0);
				Type *ResTy = II.getType();

				// movmsk(undef) -> zero as we must ensure the upper bits are zero.
				if (isa<UndefValue>(Arg))
				return Constant::getNullValue(ResTy);

				auto *ArgTy = dyn_cast<VectorType>(Arg->getType());
				// We can't easily peek through x86_mmx types.
				if (!ArgTy)
				return nullptr;

				// Expand MOVMSK to compare/bitcast/zext:
				// e.g. PMOVMSKB(v16i8 x):
				// %cmp = icmp slt <16 x i8> %x, zeroinitializer
				// %int = bitcast <16 x i1> %cmp to i16
				// %res = zext i16 %int to i32
				unsigned NumElts = ArgTy->getNumElements();
				Type *IntegerVecTy = VectorType::getInteger(ArgTy);
				Type *IntegerTy = Builder.getIntNTy(NumElts);

				Value *Res = Builder.CreateBitCast(Arg, IntegerVecTy);
				Res = Builder.CreateICmpSLT(Res, Constant::getNullValue(IntegerVecTy));
				Res = Builder.CreateBitCast(Res, IntegerTy);
				Res = Builder.CreateZExtOrTrunc(Res, ResTy);
				return Res;
				}
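
For readers unfamiliar with MOVMSK, a scalar model of the compare/bitcast/zext expansion above might look as follows. This is a sketch only: `movmskBytes` is a hypothetical name, it models just the 16-byte PMOVMSKB case, and it assumes the little-endian bit order in which vector element 0 becomes bit 0 of the bitcast result.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of PMOVMSKB on a 16-byte vector.
uint32_t movmskBytes(const std::array<int8_t, 16> &X) {
  uint32_t Mask = 0;
  for (std::size_t I = 0; I != X.size(); ++I)
    if (X[I] < 0)                // icmp slt %x, 0 tests the sign bit
      Mask |= uint32_t{1} << I;  // bitcast <16 x i1> to i16: element I is bit I
  return Mask;                   // the upper 16 bits stay zero (the zext)
}
```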

				static Value *simplifyX86addcarry(const IntrinsicInst &II,
				InstCombiner::BuilderTy &Builder) {
				Value *CarryIn = II.getArgOperand(0);
				Value *Op1 = II.getArgOperand(1);
				Value *Op2 = II.getArgOperand(2);
				Type *RetTy = II.getType();
				Type *OpTy = Op1->getType();
				assert(RetTy->getStructElementType(0)->isIntegerTy(8) &&
				RetTy->getStructElementType(1) == OpTy && OpTy == Op2->getType() &&
				"Unexpected types for x86 addcarry");

				// If carry-in is zero, this is just an unsigned add with overflow.
				if (match(CarryIn, PatternMatch::m_ZeroInt())) {
				Value *UAdd = Builder.CreateIntrinsic(Intrinsic::uadd_with_overflow, OpTy,
				{Op1, Op2});
				// The types have to be adjusted to match the x86 call types.
				Value *UAddResult = Builder.CreateExtractValue(UAdd, 0);
				Value *UAddOV = Builder.CreateZExt(Builder.CreateExtractValue(UAdd, 1),
				Builder.getInt8Ty());
				Value *Res = UndefValue::get(RetTy);
				Res = Builder.CreateInsertValue(Res, UAddOV, 0);
				return Builder.CreateInsertValue(Res, UAddResult, 1);
				}

				return nullptr;
				}

				static Value *simplifyX86insertps(const IntrinsicInst &II,
				InstCombiner::BuilderTy &Builder) {
				auto *CInt = dyn_cast<ConstantInt>(II.getArgOperand(2));
				if (!CInt)
				return nullptr;

				VectorType *VecTy = cast<VectorType>(II.getType());
				assert(VecTy->getNumElements() == 4 && "insertps with wrong vector type");

				// The immediate permute control byte looks like this:
				// [3:0] - zero mask for each 32-bit lane
				// [5:4] - select one 32-bit destination lane
				// [7:6] - select one 32-bit source lane

				uint8_t Imm = CInt->getZExtValue();
				uint8_t ZMask = Imm & 0xf;
				uint8_t DestLane = (Imm >> 4) & 0x3;
				uint8_t SourceLane = (Imm >> 6) & 0x3;

				ConstantAggregateZero *ZeroVector = ConstantAggregateZero::get(VecTy);

				// If all zero mask bits are set, this was just a weird way to
				// generate a zero vector.
				if (ZMask == 0xf)
				return ZeroVector;

				// Initialize by passing all of the first source bits through.
				int ShuffleMask[4] = {0, 1, 2, 3};

				// We may replace the second operand with the zero vector.
				Value *V1 = II.getArgOperand(1);

				if (ZMask) {
				// If the zero mask is being used with a single input or the zero mask
				// overrides the destination lane, this is a shuffle with the zero vector.
				if ((II.getArgOperand(0) == II.getArgOperand(1)) ||
				(ZMask & (1 << DestLane))) {
				V1 = ZeroVector;
				// We may still move 32-bits of the first source vector from one lane
				// to another.
				ShuffleMask[DestLane] = SourceLane;
				// The zero mask may override the previous insert operation.
				for (unsigned i = 0; i < 4; ++i)
				if ((ZMask >> i) & 0x1)
				ShuffleMask[i] = i + 4;
				} else {
				// TODO: Model this case as 2 shuffles or a 'logical and' plus shuffle?
				return nullptr;
				}
				} else {
				// Replace the selected destination lane with the selected source lane.
				ShuffleMask[DestLane] = SourceLane + 4;
				}

				return Builder.CreateShuffleVector(II.getArgOperand(0), V1, ShuffleMask);
				}
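
The immediate decoding above can be modelled without any LLVM types. The following is a minimal sketch under stated assumptions: `InsertPSFold` and `decodeInsertPS` are hypothetical names, and `SameOperands` stands in for the `II.getArgOperand(0) == II.getArgOperand(1)` check. Mask indices 0-3 select from the first operand and 4-7 from the second, as in `shufflevector`.

```cpp
#include <array>
#include <cstdint>

struct InsertPSFold {
  bool Folded = false;        // false: the code above returns nullptr
  bool AllZero = false;       // true: the result is the zero vector
  bool SecondIsZero = false;  // true: operand 1 is replaced by the zero vector
  std::array<int, 4> Mask = {{0, 1, 2, 3}};
};

InsertPSFold decodeInsertPS(uint8_t Imm, bool SameOperands) {
  const uint8_t ZMask = Imm & 0xf;             // [3:0] zero mask
  const uint8_t DestLane = (Imm >> 4) & 0x3;   // [5:4] destination lane
  const uint8_t SourceLane = (Imm >> 6) & 0x3; // [7:6] source lane

  InsertPSFold R;
  if (ZMask == 0xf) {
    R.Folded = R.AllZero = true;
    return R;
  }
  if (ZMask == 0) {
    // Plain insert: take SourceLane from the second operand.
    R.Mask[DestLane] = SourceLane + 4;
    R.Folded = true;
    return R;
  }
  // Zeroing is only modelled when the second operand can become the zero
  // vector; otherwise the fold gives up.
  if (!SameOperands && !(ZMask & (1 << DestLane)))
    return R;
  R.SecondIsZero = true;
  R.Mask[DestLane] = SourceLane;
  for (int I = 0; I < 4; ++I)
    if ((ZMask >> I) & 1)
      R.Mask[I] = I + 4; // any lane of the zero vector yields zero
  R.Folded = true;
  return R;
}
```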

				/// Attempt to simplify SSE4A EXTRQ/EXTRQI instructions using constant folding
				/// or conversion to a shuffle vector.
				static Value *simplifyX86extrq(IntrinsicInst &II, Value *Op0,
				ConstantInt *CILength, ConstantInt *CIIndex,
				InstCombiner::BuilderTy &Builder) {
				auto LowConstantHighUndef = [&](uint64_t Val) {
				Type *IntTy64 = Type::getInt64Ty(II.getContext());
				Constant *Args[] = {ConstantInt::get(IntTy64, Val),
				UndefValue::get(IntTy64)};
				return ConstantVector::get(Args);
				};

				// See if we're dealing with constant values.
				Constant *C0 = dyn_cast<Constant>(Op0);
				ConstantInt *CI0 =
				C0 ? dyn_cast_or_null<ConstantInt>(C0->getAggregateElement((unsigned)0))
				: nullptr;

				// Attempt to constant fold.
				if (CILength && CIIndex) {
				// From AMD documentation: "The bit index and field length are each six
				// bits in length other bits of the field are ignored."
				APInt APIndex = CIIndex->getValue().zextOrTrunc(6);
				APInt APLength = CILength->getValue().zextOrTrunc(6);

				unsigned Index = APIndex.getZExtValue();

				// From AMD documentation: "a value of zero in the field length is
				// defined as length of 64".
				unsigned Length = APLength == 0 ? 64 : APLength.getZExtValue();

				// From AMD documentation: "If the sum of the bit index + length field
				// is greater than 64, the results are undefined".
				unsigned End = Index + Length;

				// Note that both field index and field length are 8-bit quantities.
				// Since variables 'Index' and 'Length' are unsigned values
				// obtained from zero-extending field index and field length
				// respectively, their sum should never wrap around.
				if (End > 64)
				return UndefValue::get(II.getType());

				// If we are extracting whole bytes, we can convert this to a shuffle.
				// Lowering can recognize EXTRQI shuffle masks.
				if ((Length % 8) == 0 && (Index % 8) == 0) {
				// Convert bit indices to byte indices.
				Length /= 8;
				Index /= 8;

				Type *IntTy8 = Type::getInt8Ty(II.getContext());
				auto *ShufTy = FixedVectorType::get(IntTy8, 16);

				SmallVector<int, 16> ShuffleMask;
				for (int i = 0; i != (int)Length; ++i)
				ShuffleMask.push_back(i + Index);
				for (int i = Length; i != 8; ++i)
				ShuffleMask.push_back(i + 16);
				for (int i = 8; i != 16; ++i)
				ShuffleMask.push_back(-1);

				Value *SV = Builder.CreateShuffleVector(
				Builder.CreateBitCast(Op0, ShufTy),
				ConstantAggregateZero::get(ShufTy), ShuffleMask);
				return Builder.CreateBitCast(SV, II.getType());
				}

				// Constant Fold - shift Index'th bit to lowest position and mask off
				// Length bits.
				if (CI0) {
				APInt Elt = CI0->getValue();
				Elt.lshrInPlace(Index);
				Elt = Elt.zextOrTrunc(Length);
				return LowConstantHighUndef(Elt.getZExtValue());
				}

				// If we were an EXTRQ call, we'll save registers if we convert to EXTRQI.
				if (II.getIntrinsicID() == Intrinsic::x86_sse4a_extrq) {
				Value *Args[] = {Op0, CILength, CIIndex};
				Module *M = II.getModule();
				Function *F = Intrinsic::getDeclaration(M, Intrinsic::x86_sse4a_extrqi);
				return Builder.CreateCall(F, Args);
				}
				}

				// Constant Fold - extraction from zero is always {zero, undef}.
				if (CI0 && CI0->isZero())
				return LowConstantHighUndef(0);

				return nullptr;
				}

				/// Attempt to simplify SSE4A INSERTQ/INSERTQI instructions using constant
				/// folding or conversion to a shuffle vector.
				static Value *simplifyX86insertq(IntrinsicInst &II, Value *Op0, Value *Op1,
				APInt APLength, APInt APIndex,
				InstCombiner::BuilderTy &Builder) {
				// From AMD documentation: "The bit index and field length are each six bits
				// in length other bits of the field are ignored."
				APIndex = APIndex.zextOrTrunc(6);
				APLength = APLength.zextOrTrunc(6);

				// Attempt to constant fold.
				unsigned Index = APIndex.getZExtValue();

				// From AMD documentation: "a value of zero in the field length is
				// defined as length of 64".
				unsigned Length = APLength == 0 ? 64 : APLength.getZExtValue();

				// From AMD documentation: "If the sum of the bit index + length field
				// is greater than 64, the results are undefined".
				unsigned End = Index + Length;

				// Note that both field index and field length are 8-bit quantities.
				// Since variables 'Index' and 'Length' are unsigned values
				// obtained from zero-extending field index and field length
				// respectively, their sum should never wrap around.
				if (End > 64)
				return UndefValue::get(II.getType());

				// If we are inserting whole bytes, we can convert this to a shuffle.
				// Lowering can recognize INSERTQI shuffle masks.
				if ((Length % 8) == 0 && (Index % 8) == 0) {
				// Convert bit indices to byte indices.
				Length /= 8;
				Index /= 8;

				Type *IntTy8 = Type::getInt8Ty(II.getContext());
				auto *ShufTy = FixedVectorType::get(IntTy8, 16);

				SmallVector<int, 16> ShuffleMask;
				for (int i = 0; i != (int)Index; ++i)
				ShuffleMask.push_back(i);
				for (int i = 0; i != (int)Length; ++i)
				ShuffleMask.push_back(i + 16);
				for (int i = Index + Length; i != 8; ++i)
				ShuffleMask.push_back(i);
				for (int i = 8; i != 16; ++i)
				ShuffleMask.push_back(-1);

				Value *SV = Builder.CreateShuffleVector(Builder.CreateBitCast(Op0, ShufTy),
				Builder.CreateBitCast(Op1, ShufTy),
				ShuffleMask);
				return Builder.CreateBitCast(SV, II.getType());
				}

				// See if we're dealing with constant values.
				Constant *C0 = dyn_cast<Constant>(Op0);
				Constant *C1 = dyn_cast<Constant>(Op1);
				ConstantInt *CI00 =
				C0 ? dyn_cast_or_null<ConstantInt>(C0->getAggregateElement((unsigned)0))
				: nullptr;
				ConstantInt *CI10 =
				C1 ? dyn_cast_or_null<ConstantInt>(C1->getAggregateElement((unsigned)0))
				: nullptr;

				// Constant Fold - insert bottom Length bits starting at the Index'th bit.
				if (CI00 && CI10) {
				APInt V00 = CI00->getValue();
				APInt V10 = CI10->getValue();
				APInt Mask = APInt::getLowBitsSet(64, Length).shl(Index);
				V00 = V00 & ~Mask;
				V10 = V10.zextOrTrunc(Length).zextOrTrunc(64).shl(Index);
				APInt Val = V00 | V10;
				Type *IntTy64 = Type::getInt64Ty(II.getContext());
				Constant *Args[] = {ConstantInt::get(IntTy64, Val.getZExtValue()),
				UndefValue::get(IntTy64)};
				return ConstantVector::get(Args);
				}

				// If we were an INSERTQ call, we'll save demanded elements if we convert to
				// INSERTQI.
				if (II.getIntrinsicID() == Intrinsic::x86_sse4a_insertq) {
				Type *IntTy8 = Type::getInt8Ty(II.getContext());
				Constant *CILength = ConstantInt::get(IntTy8, Length, false);
				Constant *CIIndex = ConstantInt::get(IntTy8, Index, false);

				Value *Args[] = {Op0, Op1, CILength, CIIndex};
				Module *M = II.getModule();
				Function *F = Intrinsic::getDeclaration(M, Intrinsic::x86_sse4a_insertqi);
				return Builder.CreateCall(F, Args);
				}

				return nullptr;
				}
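
The constant-fold paths of the EXTRQ and INSERTQ helpers above reduce to ordinary 64-bit bit-field arithmetic. A rough scalar sketch, with hypothetical names (`BitField`, `decodeField`, `lowBits`, `foldExtrq`, `foldInsertq`) and plain `uint64_t` in place of `APInt`, could read:

```cpp
#include <cassert>
#include <cstdint>

struct BitField {
  unsigned Index;
  unsigned Length;
  bool Defined; // false when Index + Length > 64 (the fold returns undef)
};

// Only the low six bits of each operand are used; a length of zero
// means 64.
BitField decodeField(uint64_t LengthOp, uint64_t IndexOp) {
  unsigned Index = static_cast<unsigned>(IndexOp & 0x3f);
  unsigned Length = static_cast<unsigned>(LengthOp & 0x3f);
  if (Length == 0)
    Length = 64;
  return {Index, Length, Index + Length <= 64};
}

// Mask with the low Length bits set, avoiding a shift by 64.
static uint64_t lowBits(unsigned Length) {
  return Length >= 64 ? ~uint64_t{0} : (uint64_t{1} << Length) - 1;
}

// EXTRQ: shift the Index'th bit down to bit 0 and keep Length bits.
uint64_t foldExtrq(uint64_t Src, BitField F) {
  assert(F.Defined && "caller handles the undefined case");
  return (Src >> F.Index) & lowBits(F.Length);
}

// INSERTQ: replace Length bits of Dst, starting at Index, with the low
// Length bits of Src.
uint64_t foldInsertq(uint64_t Dst, uint64_t Src, BitField F) {
  assert(F.Defined && "caller handles the undefined case");
  const uint64_t Mask = lowBits(F.Length) << F.Index;
  return (Dst & ~Mask) | ((Src & lowBits(F.Length)) << F.Index);
}
```

Only the low 64-bit element is modelled here; the patch leaves the high element undefined in both folds.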

				/// Attempt to convert pshufb* to shufflevector if the mask is constant.
				static Value *simplifyX86pshufb(const IntrinsicInst &II,
				InstCombiner::BuilderTy &Builder) {
				Constant *V = dyn_cast<Constant>(II.getArgOperand(1));
				if (!V)
				return nullptr;

				auto *VecTy = cast<VectorType>(II.getType());
				unsigned NumElts = VecTy->getNumElements();
				assert((NumElts == 16 || NumElts == 32 || NumElts == 64) &&
				"Unexpected number of elements in shuffle mask!");

				// Construct a shuffle mask from constant integers or UNDEFs.
				int Indexes[64];

				// Each byte in the shuffle control mask forms an index to permute the
				// corresponding byte in the destination operand.
				for (unsigned I = 0; I < NumElts; ++I) {
				Constant *COp = V->getAggregateElement(I);
				if (!COp || (!isa<UndefValue>(COp) && !isa<ConstantInt>(COp)))
				return nullptr;

				if (isa<UndefValue>(COp)) {
				Indexes[I] = -1;
				continue;
				}

				int8_t Index = cast<ConstantInt>(COp)->getValue().getZExtValue();

				// If the most significant bit (bit[7]) of each byte of the shuffle
				// control mask is set, then zero is written in the result byte.
				// The zero vector is in the right-hand side of the resulting
				// shufflevector.

				// The value of each index for the high 128-bit lane is the least
				// significant 4 bits of the respective shuffle control byte.
				Index = ((Index < 0) ? NumElts : Index & 0x0F) + (I & 0xF0);
				Indexes[I] = Index;
				}

				auto V1 = II.getArgOperand(0);
				auto V2 = Constant::getNullValue(VecTy);
				return Builder.CreateShuffleVector(V1, V2, makeArrayRef(Indexes, NumElts));
				}
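
As an illustration of the index arithmetic in the PSHUFB fold above, here is a small standalone sketch. `pshufbToShuffleMask` is a hypothetical name, undef control bytes are left out for brevity, and the control vector length is taken as the element count (16, 32 or 64 in the patch).

```cpp
#include <cstdint>
#include <vector>

// Converts PSHUFB control bytes into shufflevector-style indices, where
// indices >= NumElts select from the second (all-zero) operand.
std::vector<int> pshufbToShuffleMask(const std::vector<uint8_t> &Control) {
  const int NumElts = static_cast<int>(Control.size());
  std::vector<int> Mask(Control.size());
  for (int I = 0; I != NumElts; ++I) {
    const uint8_t C = Control[I];
    // Bit 7 set: write zero, i.e. pick an element of the zero vector.
    // Otherwise only the low four bits select a byte within the lane.
    const int Index = (C & 0x80) ? NumElts : (C & 0x0F);
    // Each 128-bit lane shuffles independently, so add the lane base.
    Mask[I] = Index + (I & 0xF0);
  }
  return Mask;
}
```

For a 32-byte vector, a control byte of `0x05` in the upper lane therefore maps to index 21, not 5.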

				/// Attempt to convert vpermilvar* to shufflevector if the mask is constant.
				static Value *simplifyX86vpermilvar(const IntrinsicInst &II,
				InstCombiner::BuilderTy &Builder) {
				Constant *V = dyn_cast<Constant>(II.getArgOperand(1));
				if (!V)
				return nullptr;

				auto *VecTy = cast<VectorType>(II.getType());
				unsigned NumElts = VecTy->getNumElements();
				bool IsPD = VecTy->getScalarType()->isDoubleTy();
				unsigned NumLaneElts = IsPD ? 2 : 4;
				assert(NumElts == 16 || NumElts == 8 || NumElts == 4 || NumElts == 2);

				// Construct a shuffle mask from constant integers or UNDEFs.
				int Indexes[16];

				// The intrinsics only read one or two bits, clear the rest.
				for (unsigned I = 0; I < NumElts; ++I) {
				Constant *COp = V->getAggregateElement(I);
				if (!COp || (!isa<UndefValue>(COp) && !isa<ConstantInt>(COp)))
				return nullptr;

				if (isa<UndefValue>(COp)) {
				Indexes[I] = -1;
				continue;
				}

				APInt Index = cast<ConstantInt>(COp)->getValue();
				Index = Index.zextOrTrunc(32).getLoBits(2);

				// The PD variants use bit 1 to select per-lane element index, so
				// shift down to convert to generic shuffle mask index.
				if (IsPD)
				Index.lshrInPlace(1);

				// The _256 variants are a bit trickier since the mask bits always index
				// into the corresponding 128 half. In order to convert to a generic
				// shuffle, we have to make that explicit.
				Index += APInt(32, (I / NumLaneElts) * NumLaneElts);

				Indexes[I] = Index.getZExtValue();
				}

				auto V1 = II.getArgOperand(0);
				auto V2 = UndefValue::get(V1->getType());
				return Builder.CreateShuffleVector(V1, V2, makeArrayRef(Indexes, NumElts));
				}
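
The per-lane index adjustment in the VPERMILVAR fold above can be checked with a scalar sketch. `vpermilvarToShuffleMask` is a hypothetical name; undef elements are omitted, and `IsPD` plays the same role as in the patch (two elements per 128-bit lane for double, four for float).

```cpp
#include <cstdint>
#include <vector>

std::vector<int> vpermilvarToShuffleMask(const std::vector<uint64_t> &Control,
                                         bool IsPD) {
  const unsigned NumLaneElts = IsPD ? 2 : 4;
  std::vector<int> Mask;
  for (unsigned I = 0; I != Control.size(); ++I) {
    // Only the low two bits are ever read.
    unsigned Index = static_cast<unsigned>(Control[I] & 0x3);
    // The PD variants select with bit 1, so shift it down to bit 0.
    if (IsPD)
      Index >>= 1;
    // Indices are relative to the element's own 128-bit lane; make the
    // lane base explicit for a generic shuffle.
    Index += (I / NumLaneElts) * NumLaneElts;
    Mask.push_back(static_cast<int>(Index));
  }
  return Mask;
}
```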

				/// Attempt to convert vpermd/vpermps to shufflevector if the mask is constant.
				static Value *simplifyX86vpermv(const IntrinsicInst &II,
				InstCombiner::BuilderTy &Builder) {
				auto *V = dyn_cast<Constant>(II.getArgOperand(1));
				if (!V)
				return nullptr;

				auto *VecTy = cast<VectorType>(II.getType());
				unsigned Size = VecTy->getNumElements();
				assert((Size == 4 || Size == 8 || Size == 16 || Size == 32 || Size == 64) &&
				"Unexpected shuffle mask size");

				// Construct a shuffle mask from constant integers or UNDEFs.
				int Indexes[64];

				for (unsigned I = 0; I < Size; ++I) {
				Constant *COp = V->getAggregateElement(I);
				if (!COp || (!isa<UndefValue>(COp) && !isa<ConstantInt>(COp)))
				return nullptr;

				if (isa<UndefValue>(COp)) {
				Indexes[I] = -1;
				continue;
				}

				uint32_t Index = cast<ConstantInt>(COp)->getZExtValue();
				Index &= Size - 1;
				Indexes[I] = Index;
				}

				auto V1 = II.getArgOperand(0);
				auto V2 = UndefValue::get(VecTy);
				return Builder.CreateShuffleVector(V1, V2, makeArrayRef(Indexes, Size));
				}

				bool X86TTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
				Instruction **ResultI) const {
				auto SimplifyDemandedVectorEltsLow = [&IC](Value *Op, unsigned Width,
				unsigned DemandedWidth) {
				APInt UndefElts(Width, 0);
				APInt DemandedElts = APInt::getLowBitsSet(Width, DemandedWidth);
				return IC.SimplifyDemandedVectorElts(Op, DemandedElts, UndefElts);
				};

				Intrinsic::ID IID = II.getIntrinsicID();
				switch (IID) {
				case Intrinsic::x86_bmi_bextr_32:
				case Intrinsic::x86_bmi_bextr_64:
				case Intrinsic::x86_tbm_bextri_u32:
				case Intrinsic::x86_tbm_bextri_u64:
				// If the RHS is a constant we can try some simplifications.
				if (auto *C = dyn_cast<ConstantInt>(II.getArgOperand(1))) {
				uint64_t Shift = C->getZExtValue();
				uint64_t Length = (Shift >> 8) & 0xff;
				Shift &= 0xff;
				unsigned BitWidth = II.getType()->getIntegerBitWidth();
				// If the length is 0 or the shift is out of range, replace with zero.
				if (Length == 0 || Shift >= BitWidth) {
				*ResultI =
				IC.replaceInstUsesWith(II, ConstantInt::get(II.getType(), 0));
				return true;
				}
				// If the LHS is also a constant, we can completely constant fold this.
				if (auto *InC = dyn_cast<ConstantInt>(II.getArgOperand(0))) {
				uint64_t Result = InC->getZExtValue() >> Shift;
				if (Length > BitWidth)
				Length = BitWidth;
				Result &= maskTrailingOnes<uint64_t>(Length);
				*ResultI =
				IC.replaceInstUsesWith(II, ConstantInt::get(II.getType(), Result));
				return true;
				}
				// TODO should we turn this into 'and' if shift is 0? Or 'shl' if we
				// are only masking bits that a shift already cleared?
				}
				break;

				case Intrinsic::x86_bmi_bzhi_32:
				case Intrinsic::x86_bmi_bzhi_64:
				// If the RHS is a constant we can try some simplifications.
				if (auto *C = dyn_cast<ConstantInt>(II.getArgOperand(1))) {
				uint64_t Index = C->getZExtValue() & 0xff;
				unsigned BitWidth = II.getType()->getIntegerBitWidth();
				if (Index >= BitWidth) {
				*ResultI = IC.replaceInstUsesWith(II, II.getArgOperand(0));
				return true;
				}
				if (Index == 0) {
				*ResultI =
				IC.replaceInstUsesWith(II, ConstantInt::get(II.getType(), 0));
				return true;
				}
				// If the LHS is also a constant, we can completely constant fold this.
				if (auto *InC = dyn_cast<ConstantInt>(II.getArgOperand(0))) {
				uint64_t Result = InC->getZExtValue();
				Result &= maskTrailingOnes<uint64_t>(Index);
				*ResultI =
				IC.replaceInstUsesWith(II, ConstantInt::get(II.getType(), Result));
				return true;
				}
				// TODO should we convert this to an AND if the RHS is constant?
				}
				break;
				case Intrinsic::x86_bmi_pext_32:
				case Intrinsic::x86_bmi_pext_64:
				if (auto *MaskC = dyn_cast<ConstantInt>(II.getArgOperand(1))) {
				if (MaskC->isNullValue()) {
				*ResultI =
				IC.replaceInstUsesWith(II, ConstantInt::get(II.getType(), 0));
				return true;
				}
				if (MaskC->isAllOnesValue()) {
				*ResultI = IC.replaceInstUsesWith(II, II.getArgOperand(0));
				return true;
				}

				if (auto *SrcC = dyn_cast<ConstantInt>(II.getArgOperand(0))) {
				uint64_t Src = SrcC->getZExtValue();
				uint64_t Mask = MaskC->getZExtValue();
				uint64_t Result = 0;
				uint64_t BitToSet = 1;

				while (Mask) {
				// Isolate lowest set bit.
				uint64_t BitToTest = Mask & -Mask;
				if (BitToTest & Src)
				Result |= BitToSet;

				BitToSet <<= 1;
				// Clear lowest set bit.
				Mask &= Mask - 1;
				}

				*ResultI =
				IC.replaceInstUsesWith(II, ConstantInt::get(II.getType(), Result));
				return true;
				}
				}
				break;
				case Intrinsic::x86_bmi_pdep_32:
				case Intrinsic::x86_bmi_pdep_64:
				if (auto *MaskC = dyn_cast<ConstantInt>(II.getArgOperand(1))) {
				if (MaskC->isNullValue()) {
				*ResultI =
				IC.replaceInstUsesWith(II, ConstantInt::get(II.getType(), 0));
				return true;
				}
				if (MaskC->isAllOnesValue()) {
				*ResultI = IC.replaceInstUsesWith(II, II.getArgOperand(0));
				return true;
				}

				if (auto *SrcC = dyn_cast<ConstantInt>(II.getArgOperand(0))) {
				uint64_t Src = SrcC->getZExtValue();
				uint64_t Mask = MaskC->getZExtValue();
				uint64_t Result = 0;
				uint64_t BitToTest = 1;

				while (Mask) {
				// Isolate lowest set bit.
				uint64_t BitToSet = Mask & -Mask;
				if (BitToTest & Src)
				Result |= BitToSet;

				BitToTest <<= 1;
				// Clear lowest set bit.
				Mask &= Mask - 1;
				}

				*ResultI =
				IC.replaceInstUsesWith(II, ConstantInt::get(II.getType(), Result));
				return true;
				}
				}
				break;

				case Intrinsic::x86_sse_cvtss2si:
				case Intrinsic::x86_sse_cvtss2si64:
				case Intrinsic::x86_sse_cvttss2si:
				case Intrinsic::x86_sse_cvttss2si64:
				case Intrinsic::x86_sse2_cvtsd2si:
				case Intrinsic::x86_sse2_cvtsd2si64:
				case Intrinsic::x86_sse2_cvttsd2si:
				case Intrinsic::x86_sse2_cvttsd2si64:
				case Intrinsic::x86_avx512_vcvtss2si32:
				case Intrinsic::x86_avx512_vcvtss2si64:
				case Intrinsic::x86_avx512_vcvtss2usi32:
				case Intrinsic::x86_avx512_vcvtss2usi64:
				case Intrinsic::x86_avx512_vcvtsd2si32:
				case Intrinsic::x86_avx512_vcvtsd2si64:
				case Intrinsic::x86_avx512_vcvtsd2usi32:
				case Intrinsic::x86_avx512_vcvtsd2usi64:
				case Intrinsic::x86_avx512_cvttss2si:
				case Intrinsic::x86_avx512_cvttss2si64:
				case Intrinsic::x86_avx512_cvttss2usi:
				case Intrinsic::x86_avx512_cvttss2usi64:
				case Intrinsic::x86_avx512_cvttsd2si:
				case Intrinsic::x86_avx512_cvttsd2si64:
				case Intrinsic::x86_avx512_cvttsd2usi:
				case Intrinsic::x86_avx512_cvttsd2usi64: {
				// These intrinsics only demand the 0th element of their input vectors. If
				// we can simplify the input based on that, do so now.
				Value *Arg = II.getArgOperand(0);
				unsigned VWidth = cast<VectorType>(Arg->getType())->getNumElements();
				if (Value *V = SimplifyDemandedVectorEltsLow(Arg, VWidth, 1)) {
				*ResultI = IC.replaceOperand(II, 0, V);
				return true;
				}
				break;
				}

				case Intrinsic::x86_mmx_pmovmskb:
				case Intrinsic::x86_sse_movmsk_ps:
				case Intrinsic::x86_sse2_movmsk_pd:
				case Intrinsic::x86_sse2_pmovmskb_128:
				case Intrinsic::x86_avx_movmsk_pd_256:
				case Intrinsic::x86_avx_movmsk_ps_256:
				case Intrinsic::x86_avx2_pmovmskb:
				if (Value *V = simplifyX86movmsk(II, IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				break;

				case Intrinsic::x86_sse_comieq_ss:
				case Intrinsic::x86_sse_comige_ss:
				case Intrinsic::x86_sse_comigt_ss:
				case Intrinsic::x86_sse_comile_ss:
				case Intrinsic::x86_sse_comilt_ss:
				case Intrinsic::x86_sse_comineq_ss:
				case Intrinsic::x86_sse_ucomieq_ss:
				case Intrinsic::x86_sse_ucomige_ss:
				case Intrinsic::x86_sse_ucomigt_ss:
				case Intrinsic::x86_sse_ucomile_ss:
				case Intrinsic::x86_sse_ucomilt_ss:
				case Intrinsic::x86_sse_ucomineq_ss:
				case Intrinsic::x86_sse2_comieq_sd:
				case Intrinsic::x86_sse2_comige_sd:
				case Intrinsic::x86_sse2_comigt_sd:
				case Intrinsic::x86_sse2_comile_sd:
				case Intrinsic::x86_sse2_comilt_sd:
				case Intrinsic::x86_sse2_comineq_sd:
				case Intrinsic::x86_sse2_ucomieq_sd:
				case Intrinsic::x86_sse2_ucomige_sd:
				case Intrinsic::x86_sse2_ucomigt_sd:
				case Intrinsic::x86_sse2_ucomile_sd:
				case Intrinsic::x86_sse2_ucomilt_sd:
				case Intrinsic::x86_sse2_ucomineq_sd:
				case Intrinsic::x86_avx512_vcomi_ss:
				case Intrinsic::x86_avx512_vcomi_sd:
				case Intrinsic::x86_avx512_mask_cmp_ss:
				case Intrinsic::x86_avx512_mask_cmp_sd: {
				// These intrinsics only demand the 0th element of their input vectors. If
				// we can simplify the input based on that, do so now.
				bool MadeChange = false;
				Value *Arg0 = II.getArgOperand(0);
				Value *Arg1 = II.getArgOperand(1);
				unsigned VWidth = cast<VectorType>(Arg0->getType())->getNumElements();
				if (Value *V = SimplifyDemandedVectorEltsLow(Arg0, VWidth, 1)) {
				IC.replaceOperand(II, 0, V);
				MadeChange = true;
				}
				if (Value *V = SimplifyDemandedVectorEltsLow(Arg1, VWidth, 1)) {
				IC.replaceOperand(II, 1, V);
				MadeChange = true;
				}
				if (MadeChange) {
				*ResultI = &II;
				return true;
				}
				break;
				}
				case Intrinsic::x86_avx512_cmp_pd_128:
				case Intrinsic::x86_avx512_cmp_pd_256:
				case Intrinsic::x86_avx512_cmp_pd_512:
				case Intrinsic::x86_avx512_cmp_ps_128:
				case Intrinsic::x86_avx512_cmp_ps_256:
				case Intrinsic::x86_avx512_cmp_ps_512: {
				// Folding cmp(sub(a,b),0) -> cmp(a,b) and cmp(0,sub(a,b)) -> cmp(b,a)
				Value *Arg0 = II.getArgOperand(0);
				Value *Arg1 = II.getArgOperand(1);
				bool Arg0IsZero = match(Arg0, PatternMatch::m_PosZeroFP());
				if (Arg0IsZero)
				std::swap(Arg0, Arg1);
				Value *A, *B;
				// This fold requires only the NINF (no +/- inf) flag, since inf minus
				// inf is nan.
				// NSZ(No Signed Zeros) is not needed because zeros of any sign are
				// equal for both compares.
				// NNAN is not needed because nans compare the same for both compares.
				// The compare intrinsic uses the above assumptions and therefore
				// doesn't require additional flags.
				if ((match(Arg0,
				PatternMatch::m_OneUse(PatternMatch::m_FSub(
				PatternMatch::m_Value(A), PatternMatch::m_Value(B)))) &&
				match(Arg1, PatternMatch::m_PosZeroFP()) && isa<Instruction>(Arg0) &&
				cast<Instruction>(Arg0)->getFastMathFlags().noInfs())) {
				if (Arg0IsZero)
				std::swap(A, B);
				IC.replaceOperand(II, 0, A);
				IC.replaceOperand(II, 1, B);
				*ResultI = &II;
				return true;
				}
				break;
				}

				case Intrinsic::x86_avx512_add_ps_512:
				case Intrinsic::x86_avx512_div_ps_512:
				case Intrinsic::x86_avx512_mul_ps_512:
				case Intrinsic::x86_avx512_sub_ps_512:
				case Intrinsic::x86_avx512_add_pd_512:
				case Intrinsic::x86_avx512_div_pd_512:
				case Intrinsic::x86_avx512_mul_pd_512:
				case Intrinsic::x86_avx512_sub_pd_512:
				// If the rounding mode is CUR_DIRECTION(4) we can turn these into regular
				// IR operations.
				if (auto *R = dyn_cast<ConstantInt>(II.getArgOperand(2))) {
				if (R->getValue() == 4) {
				Value *Arg0 = II.getArgOperand(0);
				Value *Arg1 = II.getArgOperand(1);

				Value *V;
				switch (IID) {
				default:
				llvm_unreachable("Case stmts out of sync!");
				case Intrinsic::x86_avx512_add_ps_512:
				case Intrinsic::x86_avx512_add_pd_512:
				V = IC.Builder.CreateFAdd(Arg0, Arg1);
				break;
				case Intrinsic::x86_avx512_sub_ps_512:
				case Intrinsic::x86_avx512_sub_pd_512:
				V = IC.Builder.CreateFSub(Arg0, Arg1);
				break;
				case Intrinsic::x86_avx512_mul_ps_512:
				case Intrinsic::x86_avx512_mul_pd_512:
				V = IC.Builder.CreateFMul(Arg0, Arg1);
				break;
				case Intrinsic::x86_avx512_div_ps_512:
				case Intrinsic::x86_avx512_div_pd_512:
				V = IC.Builder.CreateFDiv(Arg0, Arg1);
				break;
				}

				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				}
				break;

				case Intrinsic::x86_avx512_mask_add_ss_round:
				case Intrinsic::x86_avx512_mask_div_ss_round:
				case Intrinsic::x86_avx512_mask_mul_ss_round:
				case Intrinsic::x86_avx512_mask_sub_ss_round:
				case Intrinsic::x86_avx512_mask_add_sd_round:
				case Intrinsic::x86_avx512_mask_div_sd_round:
				case Intrinsic::x86_avx512_mask_mul_sd_round:
				case Intrinsic::x86_avx512_mask_sub_sd_round:
				// If the rounding mode is CUR_DIRECTION(4) we can turn these into regular
				// IR operations.
				if (auto *R = dyn_cast<ConstantInt>(II.getArgOperand(4))) {
				if (R->getValue() == 4) {
				// Extract the element as scalars.
				Value *Arg0 = II.getArgOperand(0);
				Value *Arg1 = II.getArgOperand(1);
				Value *LHS = IC.Builder.CreateExtractElement(Arg0, (uint64_t)0);
				Value *RHS = IC.Builder.CreateExtractElement(Arg1, (uint64_t)0);

				Value *V;
				switch (IID) {
				default:
				llvm_unreachable("Case stmts out of sync!");
				case Intrinsic::x86_avx512_mask_add_ss_round:
				case Intrinsic::x86_avx512_mask_add_sd_round:
				V = IC.Builder.CreateFAdd(LHS, RHS);
				break;
				case Intrinsic::x86_avx512_mask_sub_ss_round:
				case Intrinsic::x86_avx512_mask_sub_sd_round:
				V = IC.Builder.CreateFSub(LHS, RHS);
				break;
				case Intrinsic::x86_avx512_mask_mul_ss_round:
				case Intrinsic::x86_avx512_mask_mul_sd_round:
				V = IC.Builder.CreateFMul(LHS, RHS);
				break;
				case Intrinsic::x86_avx512_mask_div_ss_round:
				case Intrinsic::x86_avx512_mask_div_sd_round:
				V = IC.Builder.CreateFDiv(LHS, RHS);
				break;
				}

				// Handle the masking aspect of the intrinsic.
				Value *Mask = II.getArgOperand(3);
				auto *C = dyn_cast<ConstantInt>(Mask);
				// We don't need a select if we know the mask bit is a 1.
				if (!C || !C->getValue()[0]) {
				// Cast the mask to an i1 vector and then extract the lowest element.
				auto *MaskTy = FixedVectorType::get(
				IC.Builder.getInt1Ty(),
				cast<IntegerType>(Mask->getType())->getBitWidth());
				Mask = IC.Builder.CreateBitCast(Mask, MaskTy);
				Mask = IC.Builder.CreateExtractElement(Mask, (uint64_t)0);
				// Extract the lowest element from the passthru operand.
				Value *Passthru =
				IC.Builder.CreateExtractElement(II.getArgOperand(2), (uint64_t)0);
				V = IC.Builder.CreateSelect(Mask, V, Passthru);
				}

				// Insert the result back into the original argument 0.
				V = IC.Builder.CreateInsertElement(Arg0, V, (uint64_t)0);

				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				}
				break;

				// Constant fold ashr( <A x Bi>, Ci ).
				// Constant fold lshr( <A x Bi>, Ci ).
				// Constant fold shl( <A x Bi>, Ci ).
				case Intrinsic::x86_sse2_psrai_d:
				case Intrinsic::x86_sse2_psrai_w:
				case Intrinsic::x86_avx2_psrai_d:
				case Intrinsic::x86_avx2_psrai_w:
				case Intrinsic::x86_avx512_psrai_q_128:
				case Intrinsic::x86_avx512_psrai_q_256:
				case Intrinsic::x86_avx512_psrai_d_512:
				case Intrinsic::x86_avx512_psrai_q_512:
				case Intrinsic::x86_avx512_psrai_w_512:
				case Intrinsic::x86_sse2_psrli_d:
				case Intrinsic::x86_sse2_psrli_q:
				case Intrinsic::x86_sse2_psrli_w:
				case Intrinsic::x86_avx2_psrli_d:
				case Intrinsic::x86_avx2_psrli_q:
				case Intrinsic::x86_avx2_psrli_w:
				case Intrinsic::x86_avx512_psrli_d_512:
				case Intrinsic::x86_avx512_psrli_q_512:
				case Intrinsic::x86_avx512_psrli_w_512:
				case Intrinsic::x86_sse2_pslli_d:
				case Intrinsic::x86_sse2_pslli_q:
				case Intrinsic::x86_sse2_pslli_w:
				case Intrinsic::x86_avx2_pslli_d:
				case Intrinsic::x86_avx2_pslli_q:
				case Intrinsic::x86_avx2_pslli_w:
				case Intrinsic::x86_avx512_pslli_d_512:
				case Intrinsic::x86_avx512_pslli_q_512:
				case Intrinsic::x86_avx512_pslli_w_512:
				if (Value *V = simplifyX86immShift(II, IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				break;

				case Intrinsic::x86_sse2_psra_d:
				case Intrinsic::x86_sse2_psra_w:
				case Intrinsic::x86_avx2_psra_d:
				case Intrinsic::x86_avx2_psra_w:
				case Intrinsic::x86_avx512_psra_q_128:
				case Intrinsic::x86_avx512_psra_q_256:
				case Intrinsic::x86_avx512_psra_d_512:
				case Intrinsic::x86_avx512_psra_q_512:
				case Intrinsic::x86_avx512_psra_w_512:
				case Intrinsic::x86_sse2_psrl_d:
				case Intrinsic::x86_sse2_psrl_q:
				case Intrinsic::x86_sse2_psrl_w:
				case Intrinsic::x86_avx2_psrl_d:
				case Intrinsic::x86_avx2_psrl_q:
				case Intrinsic::x86_avx2_psrl_w:
				case Intrinsic::x86_avx512_psrl_d_512:
				case Intrinsic::x86_avx512_psrl_q_512:
				case Intrinsic::x86_avx512_psrl_w_512:
				case Intrinsic::x86_sse2_psll_d:
				case Intrinsic::x86_sse2_psll_q:
				case Intrinsic::x86_sse2_psll_w:
				case Intrinsic::x86_avx2_psll_d:
				case Intrinsic::x86_avx2_psll_q:
				case Intrinsic::x86_avx2_psll_w:
				case Intrinsic::x86_avx512_psll_d_512:
				case Intrinsic::x86_avx512_psll_q_512:
				case Intrinsic::x86_avx512_psll_w_512: {
				if (Value *V = simplifyX86immShift(II, IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}

				// SSE2/AVX2 uses only the first 64-bits of the 128-bit vector
				// operand to compute the shift amount.
				Value *Arg1 = II.getArgOperand(1);
				assert(Arg1->getType()->getPrimitiveSizeInBits() == 128 &&
				"Unexpected packed shift size");
				unsigned VWidth = cast<VectorType>(Arg1->getType())->getNumElements();

				if (Value *V = SimplifyDemandedVectorEltsLow(Arg1, VWidth, VWidth / 2)) {
				*ResultI = IC.replaceOperand(II, 1, V);
				return true;
				}
				break;
				}

				case Intrinsic::x86_avx2_psllv_d:
				case Intrinsic::x86_avx2_psllv_d_256:
				case Intrinsic::x86_avx2_psllv_q:
				case Intrinsic::x86_avx2_psllv_q_256:
				case Intrinsic::x86_avx512_psllv_d_512:
				case Intrinsic::x86_avx512_psllv_q_512:
				case Intrinsic::x86_avx512_psllv_w_128:
				case Intrinsic::x86_avx512_psllv_w_256:
				case Intrinsic::x86_avx512_psllv_w_512:
				case Intrinsic::x86_avx2_psrav_d:
				case Intrinsic::x86_avx2_psrav_d_256:
				case Intrinsic::x86_avx512_psrav_q_128:
				case Intrinsic::x86_avx512_psrav_q_256:
				case Intrinsic::x86_avx512_psrav_d_512:
				case Intrinsic::x86_avx512_psrav_q_512:
				case Intrinsic::x86_avx512_psrav_w_128:
				case Intrinsic::x86_avx512_psrav_w_256:
				case Intrinsic::x86_avx512_psrav_w_512:
				case Intrinsic::x86_avx2_psrlv_d:
				case Intrinsic::x86_avx2_psrlv_d_256:
				case Intrinsic::x86_avx2_psrlv_q:
				case Intrinsic::x86_avx2_psrlv_q_256:
				case Intrinsic::x86_avx512_psrlv_d_512:
				case Intrinsic::x86_avx512_psrlv_q_512:
				case Intrinsic::x86_avx512_psrlv_w_128:
				case Intrinsic::x86_avx512_psrlv_w_256:
				case Intrinsic::x86_avx512_psrlv_w_512:
				if (Value *V = simplifyX86varShift(II, IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				break;

				case Intrinsic::x86_sse2_packssdw_128:
				case Intrinsic::x86_sse2_packsswb_128:
				case Intrinsic::x86_avx2_packssdw:
				case Intrinsic::x86_avx2_packsswb:
				case Intrinsic::x86_avx512_packssdw_512:
				case Intrinsic::x86_avx512_packsswb_512:
				if (Value *V = simplifyX86pack(II, IC.Builder, true)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				break;

				case Intrinsic::x86_sse2_packuswb_128:
				case Intrinsic::x86_sse41_packusdw:
				case Intrinsic::x86_avx2_packusdw:
				case Intrinsic::x86_avx2_packuswb:
				case Intrinsic::x86_avx512_packusdw_512:
				case Intrinsic::x86_avx512_packuswb_512:
				if (Value *V = simplifyX86pack(II, IC.Builder, false)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				break;

				case Intrinsic::x86_pclmulqdq:
				case Intrinsic::x86_pclmulqdq_256:
				case Intrinsic::x86_pclmulqdq_512: {
				if (auto *C = dyn_cast<ConstantInt>(II.getArgOperand(2))) {
				unsigned Imm = C->getZExtValue();

				bool MadeChange = false;
				Value *Arg0 = II.getArgOperand(0);
				Value *Arg1 = II.getArgOperand(1);
				unsigned VWidth = cast<VectorType>(Arg0->getType())->getNumElements();

				APInt UndefElts1(VWidth, 0);
				APInt DemandedElts1 =
				APInt::getSplat(VWidth, APInt(2, (Imm & 0x01) ? 2 : 1));
				if (Value *V =
				IC.SimplifyDemandedVectorElts(Arg0, DemandedElts1, UndefElts1)) {
				IC.replaceOperand(II, 0, V);
				MadeChange = true;
				}

				APInt UndefElts2(VWidth, 0);
				APInt DemandedElts2 =
				APInt::getSplat(VWidth, APInt(2, (Imm & 0x10) ? 2 : 1));
				if (Value *V =
				IC.SimplifyDemandedVectorElts(Arg1, DemandedElts2, UndefElts2)) {
				IC.replaceOperand(II, 1, V);
				MadeChange = true;
				}

				// If either input elements are undef, the result is zero.
				if (DemandedElts1.isSubsetOf(UndefElts1) ||
				DemandedElts2.isSubsetOf(UndefElts2)) {
				*ResultI = IC.replaceInstUsesWith(
				II, ConstantAggregateZero::get(II.getType()));
				return true;
				}

				if (MadeChange) {
				*ResultI = &II;
				return true;
				}
				}
				break;
				}

				case Intrinsic::x86_sse41_insertps:
				if (Value *V = simplifyX86insertps(II, IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				break;

				case Intrinsic::x86_sse4a_extrq: {
				Value *Op0 = II.getArgOperand(0);
				Value *Op1 = II.getArgOperand(1);
				unsigned VWidth0 = cast<VectorType>(Op0->getType())->getNumElements();
				unsigned VWidth1 = cast<VectorType>(Op1->getType())->getNumElements();
				assert(Op0->getType()->getPrimitiveSizeInBits() == 128 &&
				Op1->getType()->getPrimitiveSizeInBits() == 128 && VWidth0 == 2 &&
				VWidth1 == 16 && "Unexpected operand sizes");

				// See if we're dealing with constant values.
				Constant *C1 = dyn_cast<Constant>(Op1);
				ConstantInt *CILength =
				C1 ? dyn_cast_or_null<ConstantInt>(C1->getAggregateElement((unsigned)0))
				: nullptr;
				ConstantInt *CIIndex =
				C1 ? dyn_cast_or_null<ConstantInt>(C1->getAggregateElement((unsigned)1))
				: nullptr;

				// Attempt to simplify to a constant, shuffle vector or EXTRQI call.
				if (Value *V = simplifyX86extrq(II, Op0, CILength, CIIndex, IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}

				// EXTRQ only uses the lowest 64-bits of the first 128-bit vector
				// operands and the lowest 16-bits of the second.
				bool MadeChange = false;
				if (Value *V = SimplifyDemandedVectorEltsLow(Op0, VWidth0, 1)) {
				IC.replaceOperand(II, 0, V);
				MadeChange = true;
				}
				if (Value *V = SimplifyDemandedVectorEltsLow(Op1, VWidth1, 2)) {
				IC.replaceOperand(II, 1, V);
				MadeChange = true;
				}
				if (MadeChange) {
				*ResultI = &II;
				return true;
				}
				break;
				}

				case Intrinsic::x86_sse4a_extrqi: {
				// EXTRQI: Extract Length bits starting from Index. Zero pad the remaining
				// bits of the lower 64-bits. The upper 64-bits are undefined.
				Value *Op0 = II.getArgOperand(0);
				unsigned VWidth = cast<VectorType>(Op0->getType())->getNumElements();
				assert(Op0->getType()->getPrimitiveSizeInBits() == 128 && VWidth == 2 &&
				"Unexpected operand size");

				// See if we're dealing with constant values.
				ConstantInt *CILength = dyn_cast<ConstantInt>(II.getArgOperand(1));
				ConstantInt *CIIndex = dyn_cast<ConstantInt>(II.getArgOperand(2));

				// Attempt to simplify to a constant or shuffle vector.
				if (Value *V = simplifyX86extrq(II, Op0, CILength, CIIndex, IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}

				// EXTRQI only uses the lowest 64-bits of the first 128-bit vector
				// operand.
				if (Value *V = SimplifyDemandedVectorEltsLow(Op0, VWidth, 1)) {
				*ResultI = IC.replaceOperand(II, 0, V);
				return true;
				}
				break;
				}

				case Intrinsic::x86_sse4a_insertq: {
				Value *Op0 = II.getArgOperand(0);
				Value *Op1 = II.getArgOperand(1);
				unsigned VWidth = cast<VectorType>(Op0->getType())->getNumElements();
				assert(Op0->getType()->getPrimitiveSizeInBits() == 128 &&
				Op1->getType()->getPrimitiveSizeInBits() == 128 && VWidth == 2 &&
				cast<VectorType>(Op1->getType())->getNumElements() == 2 &&
				"Unexpected operand size");

				// See if we're dealing with constant values.
				Constant *C1 = dyn_cast<Constant>(Op1);
				ConstantInt *CI11 =
				C1 ? dyn_cast_or_null<ConstantInt>(C1->getAggregateElement((unsigned)1))
				: nullptr;

				// Attempt to simplify to a constant, shuffle vector or INSERTQI call.
				if (CI11) {
				const APInt &V11 = CI11->getValue();
				APInt Len = V11.zextOrTrunc(6);
				APInt Idx = V11.lshr(8).zextOrTrunc(6);
				if (Value *V = simplifyX86insertq(II, Op0, Op1, Len, Idx, IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				}

				// INSERTQ only uses the lowest 64-bits of the first 128-bit vector
				// operand.
				if (Value *V = SimplifyDemandedVectorEltsLow(Op0, VWidth, 1)) {
				*ResultI = IC.replaceOperand(II, 0, V);
				return true;
				}
				break;
				}

				case Intrinsic::x86_sse4a_insertqi: {
				// INSERTQI: Extract lowest Length bits from lower half of second source and
				// insert over first source starting at Index bit. The upper 64-bits are
				// undefined.
				Value *Op0 = II.getArgOperand(0);
				Value *Op1 = II.getArgOperand(1);
				unsigned VWidth0 = cast<VectorType>(Op0->getType())->getNumElements();
				unsigned VWidth1 = cast<VectorType>(Op1->getType())->getNumElements();
				assert(Op0->getType()->getPrimitiveSizeInBits() == 128 &&
				Op1->getType()->getPrimitiveSizeInBits() == 128 && VWidth0 == 2 &&
				VWidth1 == 2 && "Unexpected operand sizes");

				// See if we're dealing with constant values.
				ConstantInt *CILength = dyn_cast<ConstantInt>(II.getArgOperand(2));
				ConstantInt *CIIndex = dyn_cast<ConstantInt>(II.getArgOperand(3));

				// Attempt to simplify to a constant or shuffle vector.
				if (CILength && CIIndex) {
				APInt Len = CILength->getValue().zextOrTrunc(6);
				APInt Idx = CIIndex->getValue().zextOrTrunc(6);
				if (Value *V = simplifyX86insertq(II, Op0, Op1, Len, Idx, IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				}

				// INSERTQI only uses the lowest 64-bits of the first two 128-bit vector
				// operands.
				bool MadeChange = false;
				if (Value *V = SimplifyDemandedVectorEltsLow(Op0, VWidth0, 1)) {
				IC.replaceOperand(II, 0, V);
				MadeChange = true;
				}
				if (Value *V = SimplifyDemandedVectorEltsLow(Op1, VWidth1, 1)) {
				IC.replaceOperand(II, 1, V);
				MadeChange = true;
				}
				if (MadeChange) {
				*ResultI = &II;
				return true;
				}
				break;
				}

				case Intrinsic::x86_sse41_pblendvb:
				case Intrinsic::x86_sse41_blendvps:
				case Intrinsic::x86_sse41_blendvpd:
				case Intrinsic::x86_avx_blendv_ps_256:
				case Intrinsic::x86_avx_blendv_pd_256:
				case Intrinsic::x86_avx2_pblendvb: {
				// fold (blend A, A, Mask) -> A
				Value *Op0 = II.getArgOperand(0);
				Value *Op1 = II.getArgOperand(1);
				Value *Mask = II.getArgOperand(2);
				if (Op0 == Op1) {
				*ResultI = IC.replaceInstUsesWith(II, Op0);
				return true;
				}

				// Zero Mask - select 1st argument.
				if (isa<ConstantAggregateZero>(Mask)) {
				*ResultI = IC.replaceInstUsesWith(II, Op0);
				return true;
				}

				// Constant Mask - select 1st/2nd argument lane based on top bit of mask.
				if (auto *ConstantMask = dyn_cast<ConstantDataVector>(Mask)) {
				Constant *NewSelector = getNegativeIsTrueBoolVec(ConstantMask);
				*ResultI = SelectInst::Create(NewSelector, Op1, Op0, "blendv");
				return true;
				}

				// Convert to a vector select if we can bypass casts and find a boolean
				// vector condition value.
				Value *BoolVec;
				Mask = InstCombiner::peekThroughBitcast(Mask);
				if (match(Mask, PatternMatch::m_SExt(PatternMatch::m_Value(BoolVec))) &&
				BoolVec->getType()->isVectorTy() &&
				BoolVec->getType()->getScalarSizeInBits() == 1) {
				assert(Mask->getType()->getPrimitiveSizeInBits() ==
				II.getType()->getPrimitiveSizeInBits() &&
				"Not expecting mask and operands with different sizes");

				unsigned NumMaskElts =
				cast<VectorType>(Mask->getType())->getNumElements();
				unsigned NumOperandElts =
				cast<VectorType>(II.getType())->getNumElements();
				if (NumMaskElts == NumOperandElts) {
				*ResultI = SelectInst::Create(BoolVec, Op1, Op0);
				return true;
				}

				// If the mask has less elements than the operands, each mask bit maps to
				// multiple elements of the operands. Bitcast back and forth.
				if (NumMaskElts < NumOperandElts) {
				Value *CastOp0 = IC.Builder.CreateBitCast(Op0, Mask->getType());
				Value *CastOp1 = IC.Builder.CreateBitCast(Op1, Mask->getType());
				Value *Sel = IC.Builder.CreateSelect(BoolVec, CastOp1, CastOp0);
				*ResultI = new BitCastInst(Sel, II.getType());
				return true;
				}
				}

				break;
				}

				case Intrinsic::x86_ssse3_pshuf_b_128:
				case Intrinsic::x86_avx2_pshuf_b:
				case Intrinsic::x86_avx512_pshuf_b_512:
				if (Value *V = simplifyX86pshufb(II, IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				break;

				case Intrinsic::x86_avx_vpermilvar_ps:
				case Intrinsic::x86_avx_vpermilvar_ps_256:
				case Intrinsic::x86_avx512_vpermilvar_ps_512:
				case Intrinsic::x86_avx_vpermilvar_pd:
				case Intrinsic::x86_avx_vpermilvar_pd_256:
				case Intrinsic::x86_avx512_vpermilvar_pd_512:
				if (Value *V = simplifyX86vpermilvar(II, IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				break;

				case Intrinsic::x86_avx2_permd:
				case Intrinsic::x86_avx2_permps:
				case Intrinsic::x86_avx512_permvar_df_256:
				case Intrinsic::x86_avx512_permvar_df_512:
				case Intrinsic::x86_avx512_permvar_di_256:
				case Intrinsic::x86_avx512_permvar_di_512:
				case Intrinsic::x86_avx512_permvar_hi_128:
				case Intrinsic::x86_avx512_permvar_hi_256:
				case Intrinsic::x86_avx512_permvar_hi_512:
				case Intrinsic::x86_avx512_permvar_qi_128:
				case Intrinsic::x86_avx512_permvar_qi_256:
				case Intrinsic::x86_avx512_permvar_qi_512:
				case Intrinsic::x86_avx512_permvar_sf_512:
				case Intrinsic::x86_avx512_permvar_si_512:
				if (Value *V = simplifyX86vpermv(II, IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				break;

				case Intrinsic::x86_avx_maskload_ps:
				case Intrinsic::x86_avx_maskload_pd:
				case Intrinsic::x86_avx_maskload_ps_256:
				case Intrinsic::x86_avx_maskload_pd_256:
				case Intrinsic::x86_avx2_maskload_d:
				case Intrinsic::x86_avx2_maskload_q:
				case Intrinsic::x86_avx2_maskload_d_256:
				case Intrinsic::x86_avx2_maskload_q_256:
				if (Instruction *I = simplifyX86MaskedLoad(II, IC)) {
				*ResultI = I;
				return true;
				}
				break;

				case Intrinsic::x86_sse2_maskmov_dqu:
				case Intrinsic::x86_avx_maskstore_ps:
				case Intrinsic::x86_avx_maskstore_pd:
				case Intrinsic::x86_avx_maskstore_ps_256:
				case Intrinsic::x86_avx_maskstore_pd_256:
				case Intrinsic::x86_avx2_maskstore_d:
				case Intrinsic::x86_avx2_maskstore_q:
				case Intrinsic::x86_avx2_maskstore_d_256:
				case Intrinsic::x86_avx2_maskstore_q_256:
				if (simplifyX86MaskedStore(II, IC)) {
				*ResultI = nullptr;
				return true;
				}
				break;

				case Intrinsic::x86_addcarry_32:
				case Intrinsic::x86_addcarry_64:
				if (Value *V = simplifyX86addcarry(II, IC.Builder)) {
				*ResultI = IC.replaceInstUsesWith(II, V);
				return true;
				}
				break;

				default:
				break;
				}
				return false;
				}
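The PCLMULQDQ case above builds each operand's demanded-element mask by splatting a 2-bit pattern across the vector width: `01` when the immediate selects the low i64 of every 128-bit pair, `10` when it selects the high one. The following is only a standalone sketch of that mask arithmetic, not the LLVM code itself. It assumes the mask fits in 64 bits and replaces `APInt` with `uint64_t`; the names `splat2` and `pclmulDemandedElts` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for APInt::getSplat(VWidth, APInt(2, Pattern)):
// repeat a 2-bit pattern across the low VWidth bits of a 64-bit mask.
// Assumes VWidth is even and at most 64.
static uint64_t splat2(unsigned VWidth, unsigned Pattern) {
  uint64_t Mask = 0;
  for (unsigned I = 0; I < VWidth; I += 2)
    Mask |= static_cast<uint64_t>(Pattern & 3u) << I;
  return Mask;
}

// Demanded i64 elements of one PCLMULQDQ source operand.
// SelectHigh corresponds to (Imm & 0x01) for operand 0 and to
// (Imm & 0x10) for operand 1 in the case above.
uint64_t pclmulDemandedElts(unsigned VWidth, bool SelectHigh) {
  return splat2(VWidth, SelectHigh ? 2u : 1u);
}
```

For a 128-bit operand (`VWidth == 2`) this yields a mask with exactly one bit set, so the other i64 element is free to be simplified away.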

				bool X86TTIImpl::simplifyDemandedUseBitsIntrinsic(
				InstCombiner &IC, IntrinsicInst &II, APInt DemandedMask, KnownBits &Known,
				bool &KnownBitsComputed, Value **ResultV) const {
				switch (II.getIntrinsicID()) {
				default:
				break;
				case Intrinsic::x86_mmx_pmovmskb:
				case Intrinsic::x86_sse_movmsk_ps:
				case Intrinsic::x86_sse2_movmsk_pd:
				case Intrinsic::x86_sse2_pmovmskb_128:
				case Intrinsic::x86_avx_movmsk_ps_256:
				case Intrinsic::x86_avx_movmsk_pd_256:
				case Intrinsic::x86_avx2_pmovmskb: {
				// MOVMSK copies the vector elements' sign bits to the low bits
				// and zeros the high bits.
				unsigned ArgWidth;
				if (II.getIntrinsicID() == Intrinsic::x86_mmx_pmovmskb) {
				ArgWidth = 8; // Arg is x86_mmx, but treated as <8 x i8>.
				} else {
				auto Arg = II.getArgOperand(0);
				auto ArgType = cast<VectorType>(Arg->getType());
				ArgWidth = ArgType->getNumElements();
				}

				// If we don't need any of low bits then return zero,
				// we know that DemandedMask is non-zero already.
				APInt DemandedElts = DemandedMask.zextOrTrunc(ArgWidth);
				Type *VTy = II.getType();
				if (DemandedElts.isNullValue()) {
				*ResultV = ConstantInt::getNullValue(VTy);
				return true;
				}

				// We know that the upper bits are set to zero.
				Known.Zero.setBitsFrom(ArgWidth);
				KnownBitsComputed = true;
				break;
				}
				case Intrinsic::x86_sse42_crc32_64_64:
				Known.Zero.setBitsFrom(32);
				KnownBitsComputed = true;
				break;
				}
				return false;
				}
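The MOVMSK handling above can be read as two facts about the i32 result: if none of the low `ArgWidth` bits are demanded the call folds to zero, and otherwise every bit from `ArgWidth` upward is known zero. A minimal sketch of that reasoning, assuming a 32-bit result and plain integers in place of `APInt`/`KnownBits`; `MovmskResult` and `analyzeMovmsk` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct MovmskResult {
  bool FoldsToZero;   // true when no demanded bit lies in the low ArgWidth bits
  uint32_t KnownZero; // bits of the 32-bit result that are always zero
};

// Assumes a 32-bit result and 0 < ArgWidth <= 32.
// Note: the code above returns the zero constant early when FoldsToZero
// holds; this sketch reports both facts together for simplicity.
MovmskResult analyzeMovmsk(uint32_t DemandedMask, unsigned ArgWidth) {
  uint32_t LowMask = ArgWidth >= 32 ? 0xFFFFFFFFu : ((1u << ArgWidth) - 1u);
  MovmskResult R;
  R.FoldsToZero = (DemandedMask & LowMask) == 0;
  R.KnownZero = ~LowMask;
  return R;
}
```

For example, an 8-element source with only the upper 24 result bits demanded folds to zero, because MOVMSK never sets those bits.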

				bool X86TTIImpl::simplifyDemandedVectorEltsIntrinsic(
				InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
				APInt &UndefElts2, APInt &UndefElts3,
				std::function<void(Instruction *, unsigned, APInt, APInt &)>
				simplifyAndSetOp,
				Value **ResultV) const {
				unsigned VWidth = cast<FixedVectorType>(II.getType())->getNumElements();
				switch (II.getIntrinsicID()) {
				default:
				break;
				case Intrinsic::x86_xop_vfrcz_ss:
				case Intrinsic::x86_xop_vfrcz_sd:
				// The instructions for these intrinsics are speced to zero upper bits not
				// pass them through like other scalar intrinsics. So we shouldn't just
				// use Arg0 if DemandedElts[0] is clear like we do for other intrinsics.
				// Instead we should return a zero vector.
				if (!DemandedElts[0]) {
				IC.addToWorklist(&II);
				*ResultV = ConstantAggregateZero::get(II.getType());
				return true;
				}

				// Only the lower element is used.
				DemandedElts = 1;
				simplifyAndSetOp(&II, 0, DemandedElts, UndefElts);

				// Only the lower element is undefined. The high elements are zero.
				UndefElts = UndefElts[0];
				break;

				// Unary scalar-as-vector operations that work column-wise.
				case Intrinsic::x86_sse_rcp_ss:
				case Intrinsic::x86_sse_rsqrt_ss:
				simplifyAndSetOp(&II, 0, DemandedElts, UndefElts);

				// If lowest element of a scalar op isn't used then use Arg0.
				if (!DemandedElts[0]) {
				IC.addToWorklist(&II);
				*ResultV = II.getArgOperand(0);
				return true;
				}
				// TODO: If only low elt lower SQRT to FSQRT (with rounding/exceptions
				// checks).
				break;

				// Binary scalar-as-vector operations that work column-wise. The high
				// elements come from operand 0. The low element is a function of both
				// operands.
				case Intrinsic::x86_sse_min_ss:
				case Intrinsic::x86_sse_max_ss:
				case Intrinsic::x86_sse_cmp_ss:
				case Intrinsic::x86_sse2_min_sd:
				case Intrinsic::x86_sse2_max_sd:
				case Intrinsic::x86_sse2_cmp_sd: {
				simplifyAndSetOp(&II, 0, DemandedElts, UndefElts);

				// If lowest element of a scalar op isn't used then use Arg0.
				if (!DemandedElts[0]) {
				IC.addToWorklist(&II);
				*ResultV = II.getArgOperand(0);
				return true;
				}

				// Only lower element is used for operand 1.
				DemandedElts = 1;
				simplifyAndSetOp(&II, 1, DemandedElts, UndefElts2);

				// Lower element is undefined if both lower elements are undefined.
				// Consider things like undef&0. The result is known zero, not undef.
				if (!UndefElts2[0])
				UndefElts.clearBit(0);

				break;
				}

				// Binary scalar-as-vector operations that work column-wise. The high
				// elements come from operand 0 and the low element comes from operand 1.
				case Intrinsic::x86_sse41_round_ss:
				case Intrinsic::x86_sse41_round_sd: {
				// Don't use the low element of operand 0.
				APInt DemandedElts2 = DemandedElts;
				DemandedElts2.clearBit(0);
				simplifyAndSetOp(&II, 0, DemandedElts2, UndefElts);

				// If lowest element of a scalar op isn't used then use Arg0.
				if (!DemandedElts[0]) {
				IC.addToWorklist(&II);
				*ResultV = II.getArgOperand(0);
				return true;
				}

				// Only lower element is used for operand 1.
				DemandedElts = 1;
				simplifyAndSetOp(&II, 1, DemandedElts, UndefElts2);

				// Take the high undef elements from operand 0 and take the lower element
				// from operand 1.
				UndefElts.clearBit(0);
				UndefElts |= UndefElts2[0];
				break;
				}

				// Three input scalar-as-vector operations that work column-wise. The high
				// elements come from operand 0 and the low element is a function of all
				// three inputs.
				case Intrinsic::x86_avx512_mask_add_ss_round:
				case Intrinsic::x86_avx512_mask_div_ss_round:
				case Intrinsic::x86_avx512_mask_mul_ss_round:
				case Intrinsic::x86_avx512_mask_sub_ss_round:
				case Intrinsic::x86_avx512_mask_max_ss_round:
				case Intrinsic::x86_avx512_mask_min_ss_round:
				case Intrinsic::x86_avx512_mask_add_sd_round:
				case Intrinsic::x86_avx512_mask_div_sd_round:
				case Intrinsic::x86_avx512_mask_mul_sd_round:
				case Intrinsic::x86_avx512_mask_sub_sd_round:
				case Intrinsic::x86_avx512_mask_max_sd_round:
				case Intrinsic::x86_avx512_mask_min_sd_round:
				simplifyAndSetOp(&II, 0, DemandedElts, UndefElts);

				// If lowest element of a scalar op isn't used then use Arg0.
				if (!DemandedElts[0]) {
				IC.addToWorklist(&II);
				*ResultV = II.getArgOperand(0);
				return true;
				}

				// Only lower element is used for operand 1 and 2.
				DemandedElts = 1;
				simplifyAndSetOp(&II, 1, DemandedElts, UndefElts2);
				simplifyAndSetOp(&II, 2, DemandedElts, UndefElts3);

				// Lower element is undefined if all three lower elements are undefined.
				// Consider things like undef&0. The result is known zero, not undef.
				if (!UndefElts2[0] || !UndefElts3[0])
				UndefElts.clearBit(0);

				break;

				case Intrinsic::x86_sse2_packssdw_128:
				case Intrinsic::x86_sse2_packsswb_128:
				case Intrinsic::x86_sse2_packuswb_128:
				case Intrinsic::x86_sse41_packusdw:
				case Intrinsic::x86_avx2_packssdw:
				case Intrinsic::x86_avx2_packsswb:
				case Intrinsic::x86_avx2_packusdw:
				case Intrinsic::x86_avx2_packuswb:
				case Intrinsic::x86_avx512_packssdw_512:
				case Intrinsic::x86_avx512_packsswb_512:
				case Intrinsic::x86_avx512_packusdw_512:
				case Intrinsic::x86_avx512_packuswb_512: {
				auto *Ty0 = II.getArgOperand(0)->getType();
				unsigned InnerVWidth = cast<VectorType>(Ty0)->getNumElements();
				assert(VWidth == (InnerVWidth * 2) && "Unexpected input size");

				unsigned NumLanes = Ty0->getPrimitiveSizeInBits() / 128;
				unsigned VWidthPerLane = VWidth / NumLanes;
				unsigned InnerVWidthPerLane = InnerVWidth / NumLanes;

				// Per lane, pack the elements of the first input and then the second.
				// e.g.
				// v8i16 PACK(v4i32 X, v4i32 Y) - (X[0..3],Y[0..3])
				// v32i8 PACK(v16i16 X, v16i16 Y) - (X[0..7],Y[0..7]),(X[8..15],Y[8..15])
				for (int OpNum = 0; OpNum != 2; ++OpNum) {
				APInt OpDemandedElts(InnerVWidth, 0);
				for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
				unsigned LaneIdx = Lane * VWidthPerLane;
				for (unsigned Elt = 0; Elt != InnerVWidthPerLane; ++Elt) {
				unsigned Idx = LaneIdx + Elt + InnerVWidthPerLane * OpNum;
				if (DemandedElts[Idx])
				OpDemandedElts.setBit((Lane * InnerVWidthPerLane) + Elt);
				}
				}

				// Demand elements from the operand.
				APInt OpUndefElts(InnerVWidth, 0);
				simplifyAndSetOp(&II, OpNum, OpDemandedElts, OpUndefElts);

				// Pack the operand's UNDEF elements, one lane at a time.
				OpUndefElts = OpUndefElts.zext(VWidth);
				for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
				APInt LaneElts = OpUndefElts.lshr(InnerVWidthPerLane * Lane);
				LaneElts = LaneElts.getLoBits(InnerVWidthPerLane);
				LaneElts <<= InnerVWidthPerLane * (2 * Lane + OpNum);
				UndefElts |= LaneElts;
				}
				}
				break;
				}

				// PSHUFB
				case Intrinsic::x86_ssse3_pshuf_b_128:
				case Intrinsic::x86_avx2_pshuf_b:
				case Intrinsic::x86_avx512_pshuf_b_512:
				// PERMILVAR
				case Intrinsic::x86_avx_vpermilvar_ps:
				case Intrinsic::x86_avx_vpermilvar_ps_256:
				case Intrinsic::x86_avx512_vpermilvar_ps_512:
				case Intrinsic::x86_avx_vpermilvar_pd:
				case Intrinsic::x86_avx_vpermilvar_pd_256:
				case Intrinsic::x86_avx512_vpermilvar_pd_512:
				// PERMV
				case Intrinsic::x86_avx2_permd:
				case Intrinsic::x86_avx2_permps: {
				simplifyAndSetOp(&II, 1, DemandedElts, UndefElts);
				break;
				}

				// SSE4A instructions leave the upper 64-bits of the 128-bit result
				// in an undefined state.
				case Intrinsic::x86_sse4a_extrq:
				case Intrinsic::x86_sse4a_extrqi:
				case Intrinsic::x86_sse4a_insertq:
				case Intrinsic::x86_sse4a_insertqi:
				UndefElts.setHighBits(VWidth / 2);
				break;
				}
				return false;
				}
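The PACK case above maps demanded result elements back to each source operand one 128-bit lane at a time: within a lane, the first half of the result comes from operand 0 and the second half from operand 1. The sketch below isolates just that index mapping. It is an illustration under stated assumptions, not the LLVM implementation: masks are `uint64_t` instead of `APInt`, `VWidth` is at most 64, and `packOperandDemandedElts` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Map demanded elements of a PACK result onto one source operand.
// Assumes VWidth <= 64, NumLanes divides VWidth / 2, and OpNum is 0 or 1.
uint64_t packOperandDemandedElts(uint64_t DemandedElts, unsigned VWidth,
                                 unsigned NumLanes, unsigned OpNum) {
  unsigned InnerVWidth = VWidth / 2; // element count of each source operand
  unsigned VWidthPerLane = VWidth / NumLanes;
  unsigned InnerVWidthPerLane = InnerVWidth / NumLanes;
  uint64_t OpDemanded = 0;
  for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
    unsigned LaneIdx = Lane * VWidthPerLane;
    for (unsigned Elt = 0; Elt != InnerVWidthPerLane; ++Elt) {
      // Result index holding this operand's element within the lane.
      unsigned Idx = LaneIdx + Elt + InnerVWidthPerLane * OpNum;
      if ((DemandedElts >> Idx) & 1u)
        OpDemanded |= uint64_t{1} << (Lane * InnerVWidthPerLane + Elt);
    }
  }
  return OpDemanded;
}
```

With the `v32i8 PACK(v16i16 X, v16i16 Y)` layout from the comment above, result element 16 is `X[8]`, so demanding only bit 16 of the result demands only bit 8 of operand 0.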

llvm/lib/Target/X86/X86TargetTransformInfo.h

(16 lines not shown)
#define LLVM_LIB_TARGET_X86_X86TARGETTRANSFORMINFO_H		#define LLVM_LIB_TARGET_X86_X86TARGETTRANSFORMINFO_H

#include "X86TargetMachine.h"		#include "X86TargetMachine.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/CodeGen/BasicTTIImpl.h"		#include "llvm/CodeGen/BasicTTIImpl.h"

namespace llvm {		namespace llvm {

		class InstCombiner;

class X86TTIImpl : public BasicTTIImplBase<X86TTIImpl> {		class X86TTIImpl : public BasicTTIImplBase<X86TTIImpl> {
typedef BasicTTIImplBase<X86TTIImpl> BaseT;		typedef BasicTTIImplBase<X86TTIImpl> BaseT;
typedef TargetTransformInfo TTI;		typedef TargetTransformInfo TTI;
friend BaseT;		friend BaseT;

const X86Subtarget *ST;		const X86Subtarget *ST;
const X86TargetLowering *TLI;		const X86TargetLowering *TLI;

(113 lines not shown)	int getMaskedMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,
TTI::TargetCostKind CostKind = TTI::TCK_SizeAndLatency);		TTI::TargetCostKind CostKind = TTI::TCK_SizeAndLatency);
int getGatherScatterOpCost(unsigned Opcode, Type *DataTy, const Value *Ptr,		int getGatherScatterOpCost(unsigned Opcode, Type *DataTy, const Value *Ptr,
bool VariableMask, unsigned Alignment,		bool VariableMask, unsigned Alignment,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I);		const Instruction *I);
int getAddressComputationCost(Type *PtrTy, ScalarEvolution *SE,		int getAddressComputationCost(Type *PtrTy, ScalarEvolution *SE,
const SCEV *Ptr);		const SCEV *Ptr);

		bool instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		Instruction **ResultI) const;
		bool simplifyDemandedUseBitsIntrinsic(InstCombiner &IC, IntrinsicInst &II,
		APInt DemandedMask, KnownBits &Known,
		bool &KnownBitsComputed,
		Value **ResultV) const;
		bool simplifyDemandedVectorEltsIntrinsic(
		InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
		APInt &UndefElts2, APInt &UndefElts3,
		std::function<void(Instruction *, unsigned, APInt, APInt &)>
		SimplifyAndSetOp,
		Value **ResultV) const;

unsigned getAtomicMemIntrinsicMaxElementSize() const;		unsigned getAtomicMemIntrinsicMaxElementSize() const;

int getTypeBasedIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,		int getTypeBasedIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);
int getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,		int getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

int getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,		int getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
(80 lines not shown)

llvm/lib/Transforms/InstCombine/CMakeLists.txt

	set(LLVM_TARGET_DEFINITIONS InstCombineTables.td)
	tablegen(LLVM InstCombineTables.inc -gen-searchable-tables)
	add_public_tablegen_target(InstCombineTableGen)

	add_llvm_component_library(LLVMInstCombine			add_llvm_component_library(LLVMInstCombine
	InstructionCombining.cpp			InstructionCombining.cpp
	InstCombineAddSub.cpp			InstCombineAddSub.cpp
	InstCombineAtomicRMW.cpp			InstCombineAtomicRMW.cpp
	InstCombineAndOrXor.cpp			InstCombineAndOrXor.cpp
	InstCombineCalls.cpp			InstCombineCalls.cpp
	InstCombineCasts.cpp			InstCombineCasts.cpp
	InstCombineCompares.cpp			InstCombineCompares.cpp
	(16 lines not shown)

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

(23 lines not shown)
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/Support/AlignOf.h"		#include "llvm/Support/AlignOf.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"
#include <cassert>		#include <cassert>
#include <utility>		#include <utility>

using namespace llvm;		using namespace llvm;
using namespace PatternMatch;		using namespace PatternMatch;

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

(815 lines not shown)	if (match(Op0, m_OneUse(m_ZExt(m_NUWAdd(m_Value(X), m_Constant(NarrowC)))))) {
Constant *NewC = ConstantExpr::getAdd(WideC, Op1C);		Constant *NewC = ConstantExpr::getAdd(WideC, Op1C);
Value *WideX = Builder.CreateZExt(X, Ty);		Value *WideX = Builder.CreateZExt(X, Ty);
return BinaryOperator::CreateAdd(WideX, NewC);		return BinaryOperator::CreateAdd(WideX, NewC);
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::foldAddWithConstant(BinaryOperator &Add) {		Instruction *InstCombinerImpl::foldAddWithConstant(BinaryOperator &Add) {
Value *Op0 = Add.getOperand(0), *Op1 = Add.getOperand(1);		Value *Op0 = Add.getOperand(0), *Op1 = Add.getOperand(1);
Constant *Op1C;		Constant *Op1C;
if (!match(Op1, m_Constant(Op1C)))		if (!match(Op1, m_Constant(Op1C)))
return nullptr;		return nullptr;

if (Instruction *NV = foldBinOpIntoSelectOrPhi(Add))		if (Instruction *NV = foldBinOpIntoSelectOrPhi(Add))
return NV;		return NV;

(9 lines not shown)	Instruction *InstCombinerImpl::foldAddWithConstant(BinaryOperator &Add) {
// add (sub X, Y), -1 --> add (not Y), X		// add (sub X, Y), -1 --> add (not Y), X
if (match(Op0, m_OneUse(m_Sub(m_Value(X), m_Value(Y)))) &&		if (match(Op0, m_OneUse(m_Sub(m_Value(X), m_Value(Y)))) &&
match(Op1, m_AllOnes()))		match(Op1, m_AllOnes()))
return BinaryOperator::CreateAdd(Builder.CreateNot(Y), X);		return BinaryOperator::CreateAdd(Builder.CreateNot(Y), X);

// zext(bool) + C -> bool ? C + 1 : C		// zext(bool) + C -> bool ? C + 1 : C
if (match(Op0, m_ZExt(m_Value(X))) &&		if (match(Op0, m_ZExt(m_Value(X))) &&
X->getType()->getScalarSizeInBits() == 1)		X->getType()->getScalarSizeInBits() == 1)
return SelectInst::Create(X, AddOne(Op1C), Op1);		return SelectInst::Create(X, InstCombiner::AddOne(Op1C), Op1);
// sext(bool) + C -> bool ? C - 1 : C		// sext(bool) + C -> bool ? C - 1 : C
if (match(Op0, m_SExt(m_Value(X))) &&		if (match(Op0, m_SExt(m_Value(X))) &&
X->getType()->getScalarSizeInBits() == 1)		X->getType()->getScalarSizeInBits() == 1)
return SelectInst::Create(X, SubOne(Op1C), Op1);		return SelectInst::Create(X, InstCombiner::SubOne(Op1C), Op1);

// ~X + C --> (C-1) - X		// ~X + C --> (C-1) - X
if (match(Op0, m_Not(m_Value(X))))		if (match(Op0, m_Not(m_Value(X))))
return BinaryOperator::CreateSub(SubOne(Op1C), X);		return BinaryOperator::CreateSub(InstCombiner::SubOne(Op1C), X);

const APInt *C;		const APInt *C;
if (!match(Op1, m_APInt(C)))		if (!match(Op1, m_APInt(C)))
return nullptr;		return nullptr;

// (X | C2) + C --> (X | C2) ^ C2 iff (C2 == -C)		// (X | C2) + C --> (X | C2) ^ C2 iff (C2 == -C)
const APInt *C2;		const APInt *C2;
if (match(Op0, m_Or(m_Value(), m_APInt(C2))) && *C2 == -*C)		if (match(Op0, m_Or(m_Value(), m_APInt(C2))) && *C2 == -*C)
(110 lines not shown)	if (IsSigned)
(void)C0.smul_ov(C1, overflow);		(void)C0.smul_ov(C1, overflow);
else		else
(void)C0.umul_ov(C1, overflow);		(void)C0.umul_ov(C1, overflow);
return overflow;		return overflow;
}		}

// Simplifies X % C0 + (( X / C0 ) % C1) * C0 to X % (C0 * C1), where (C0 * C1)		// Simplifies X % C0 + (( X / C0 ) % C1) * C0 to X % (C0 * C1), where (C0 * C1)
// does not overflow.		// does not overflow.
Value *InstCombiner::SimplifyAddWithRemainder(BinaryOperator &I) {		Value *InstCombinerImpl::SimplifyAddWithRemainder(BinaryOperator &I) {
Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);		Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);
Value *X, *MulOpV;		Value *X, *MulOpV;
APInt C0, MulOpC;		APInt C0, MulOpC;
bool IsSigned;		bool IsSigned;
// Match I = X % C0 + MulOpV * C0		// Match I = X % C0 + MulOpV * C0
if (((MatchRem(LHS, X, C0, IsSigned) && MatchMul(RHS, MulOpV, MulOpC)) ||		if (((MatchRem(LHS, X, C0, IsSigned) && MatchMul(RHS, MulOpV, MulOpC)) ||
(MatchRem(RHS, X, C0, IsSigned) && MatchMul(LHS, MulOpV, MulOpC))) &&		(MatchRem(RHS, X, C0, IsSigned) && MatchMul(LHS, MulOpV, MulOpC))) &&
C0 == MulOpC) {		C0 == MulOpC) {
[59 lines not shown]	static Instruction *foldToUnsignedSaturatedAdd(BinaryOperator &I) {
const APInt *C, *NotC;		const APInt *C, *NotC;
if (match(&I, m_Add(m_UMin(m_Value(X), m_APInt(NotC)), m_APInt(C))) &&		if (match(&I, m_Add(m_UMin(m_Value(X), m_APInt(NotC)), m_APInt(C))) &&
*C == ~*NotC)		*C == ~*NotC)
return CallInst::Create(getUAddSat(), { X, ConstantInt::get(Ty, *C) });		return CallInst::Create(getUAddSat(), { X, ConstantInt::get(Ty, *C) });

return nullptr;		return nullptr;
}		}

Instruction *		Instruction *InstCombinerImpl::
InstCombiner::canonicalizeCondSignextOfHighBitExtractToSignextHighBitExtract(		canonicalizeCondSignextOfHighBitExtractToSignextHighBitExtract(
BinaryOperator &I) {		BinaryOperator &I) {
assert((I.getOpcode() == Instruction::Add ||		assert((I.getOpcode() == Instruction::Add ||
I.getOpcode() == Instruction::Or ||		I.getOpcode() == Instruction::Or ||
I.getOpcode() == Instruction::Sub) &&		I.getOpcode() == Instruction::Sub) &&
"Expecting add/or/sub instruction");		"Expecting add/or/sub instruction");

// We have a subtraction/addition between a (potentially truncated) logical		// We have a subtraction/addition between a (potentially truncated) logical
// right-shift of X and a "select".		// right-shift of X and a "select".
Value *X, *Select;		Value *X, *Select;
[82 lines not shown]	Instruction *InstCombinerImpl::
NewAShr->copyIRFlags(Extract); // Preserve `exact`-ness.		NewAShr->copyIRFlags(Extract); // Preserve `exact`-ness.
if (!HadTrunc)		if (!HadTrunc)
return NewAShr;		return NewAShr;

Builder.Insert(NewAShr);		Builder.Insert(NewAShr);
return TruncInst::CreateTruncOrBitCast(NewAShr, I.getType());		return TruncInst::CreateTruncOrBitCast(NewAShr, I.getType());
}		}

Instruction *InstCombiner::visitAdd(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitAdd(BinaryOperator &I) {
if (Value *V = SimplifyAddInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifyAddInst(I.getOperand(0), I.getOperand(1),
I.hasNoSignedWrap(), I.hasNoUnsignedWrap(),		I.hasNoSignedWrap(), I.hasNoUnsignedWrap(),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (SimplifyAssociativeOrCommutative(I))		if (SimplifyAssociativeOrCommutative(I))
return &I;		return &I;

[271 lines not shown]	static Instruction *factorizeFAddFSub(BinaryOperator &I,
const APFloat *C;		const APFloat *C;
if (match(XY, m_APFloat(C)) && !C->isNormal())		if (match(XY, m_APFloat(C)) && !C->isNormal())
return nullptr;		return nullptr;

return IsFMul ? BinaryOperator::CreateFMulFMF(XY, Z, &I)		return IsFMul ? BinaryOperator::CreateFMulFMF(XY, Z, &I)
: BinaryOperator::CreateFDivFMF(XY, Z, &I);		: BinaryOperator::CreateFDivFMF(XY, Z, &I);
}		}

Instruction *InstCombiner::visitFAdd(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitFAdd(BinaryOperator &I) {
if (Value *V = SimplifyFAddInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifyFAddInst(I.getOperand(0), I.getOperand(1),
I.getFastMathFlags(),		I.getFastMathFlags(),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (SimplifyAssociativeOrCommutative(I))		if (SimplifyAssociativeOrCommutative(I))
return &I;		return &I;

[97 lines not shown]	Instruction *InstCombinerImpl::visitFAdd(BinaryOperator &I) {
}		}

return nullptr;		return nullptr;
}		}

/// Optimize pointer differences into the same array into a size. Consider:		/// Optimize pointer differences into the same array into a size. Consider:
/// &A[10] - &A[0]: we should compile this to "10". LHS/RHS are the pointer		/// &A[10] - &A[0]: we should compile this to "10". LHS/RHS are the pointer
/// operands to the ptrtoint instructions for the LHS/RHS of the subtract.		/// operands to the ptrtoint instructions for the LHS/RHS of the subtract.
Value *InstCombiner::OptimizePointerDifference(Value *LHS, Value *RHS,		Value *InstCombinerImpl::OptimizePointerDifference(Value *LHS, Value *RHS,
Type *Ty, bool IsNUW) {		Type *Ty, bool IsNUW) {
// If LHS is a gep based on RHS or RHS is a gep based on LHS, we can optimize		// If LHS is a gep based on RHS or RHS is a gep based on LHS, we can optimize
// this.		// this.
bool Swapped = false;		bool Swapped = false;
GEPOperator *GEP1 = nullptr, *GEP2 = nullptr;		GEPOperator *GEP1 = nullptr, *GEP2 = nullptr;

// For now we require one side to be the base pointer "A" or a constant		// For now we require one side to be the base pointer "A" or a constant
// GEP derived from it.		// GEP derived from it.
if (GEPOperator *LHSGEP = dyn_cast<GEPOperator>(LHS)) {		if (GEPOperator *LHSGEP = dyn_cast<GEPOperator>(LHS)) {
[74 lines not shown]	Value *InstCombinerImpl::OptimizePointerDifference(Value *LHS, Value *RHS,

// If we have p - gep(p, ...) then we have to negate the result.		// If we have p - gep(p, ...) then we have to negate the result.
if (Swapped)		if (Swapped)
Result = Builder.CreateNeg(Result, "diff.neg");		Result = Builder.CreateNeg(Result, "diff.neg");

return Builder.CreateIntCast(Result, Ty, true);		return Builder.CreateIntCast(Result, Ty, true);
}		}

Instruction *InstCombiner::visitSub(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitSub(BinaryOperator &I) {
if (Value *V = SimplifySubInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifySubInst(I.getOperand(0), I.getOperand(1),
I.hasNoSignedWrap(), I.hasNoUnsignedWrap(),		I.hasNoSignedWrap(), I.hasNoUnsignedWrap(),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
return X;		return X;

[97 lines not shown]	Value *Rdx = Builder.CreateIntrinsic(
Intrinsic::experimental_vector_reduce_add, {Sub->getType()}, {Sub});		Intrinsic::experimental_vector_reduce_add, {Sub->getType()}, {Sub});
return replaceInstUsesWith(I, Rdx);		return replaceInstUsesWith(I, Rdx);
}		}

if (Constant *C = dyn_cast<Constant>(Op0)) {		if (Constant *C = dyn_cast<Constant>(Op0)) {
Value *X;		Value *X;
if (match(Op1, m_ZExt(m_Value(X))) && X->getType()->isIntOrIntVectorTy(1))		if (match(Op1, m_ZExt(m_Value(X))) && X->getType()->isIntOrIntVectorTy(1))
// C - (zext bool) --> bool ? C - 1 : C		// C - (zext bool) --> bool ? C - 1 : C
return SelectInst::Create(X, SubOne(C), C);		return SelectInst::Create(X, InstCombiner::SubOne(C), C);
if (match(Op1, m_SExt(m_Value(X))) && X->getType()->isIntOrIntVectorTy(1))		if (match(Op1, m_SExt(m_Value(X))) && X->getType()->isIntOrIntVectorTy(1))
// C - (sext bool) --> bool ? C + 1 : C		// C - (sext bool) --> bool ? C + 1 : C
return SelectInst::Create(X, AddOne(C), C);		return SelectInst::Create(X, InstCombiner::AddOne(C), C);

// C - ~X == X + (1+C)		// C - ~X == X + (1+C)
if (match(Op1, m_Not(m_Value(X))))		if (match(Op1, m_Not(m_Value(X))))
return BinaryOperator::CreateAdd(X, AddOne(C));		return BinaryOperator::CreateAdd(X, InstCombiner::AddOne(C));

// Try to fold constant sub into select arguments.		// Try to fold constant sub into select arguments.
if (SelectInst *SI = dyn_cast<SelectInst>(Op1))		if (SelectInst *SI = dyn_cast<SelectInst>(Op1))
if (Instruction *R = FoldOpIntoSelect(I, SI))		if (Instruction *R = FoldOpIntoSelect(I, SI))
return R;		return R;

// Try to fold constant sub into PHI values.		// Try to fold constant sub into PHI values.
if (PHINode *PN = dyn_cast<PHINode>(Op1))		if (PHINode *PN = dyn_cast<PHINode>(Op1))
[264 lines not shown]	if (match(FNeg, m_OneUse(m_FMul(m_Value(X), m_Value(Y)))))
return BinaryOperator::CreateFMulFMF(Builder.CreateFNegFMF(X, &I), Y, &I);		return BinaryOperator::CreateFMulFMF(Builder.CreateFNegFMF(X, &I), Y, &I);

if (match(FNeg, m_OneUse(m_FDiv(m_Value(X), m_Value(Y)))))		if (match(FNeg, m_OneUse(m_FDiv(m_Value(X), m_Value(Y)))))
return BinaryOperator::CreateFDivFMF(Builder.CreateFNegFMF(X, &I), Y, &I);		return BinaryOperator::CreateFDivFMF(Builder.CreateFNegFMF(X, &I), Y, &I);

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitFNeg(UnaryOperator &I) {		Instruction *InstCombinerImpl::visitFNeg(UnaryOperator &I) {
Value *Op = I.getOperand(0);		Value *Op = I.getOperand(0);

if (Value *V = SimplifyFNegInst(Op, I.getFastMathFlags(),		if (Value *V = SimplifyFNegInst(Op, I.getFastMathFlags(),
SQ.getWithInstruction(&I)))		getSimplifyQuery().getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (Instruction *X = foldFNegIntoConstant(I))		if (Instruction *X = foldFNegIntoConstant(I))
return X;		return X;

Value *X, *Y;		Value *X, *Y;

// If we can ignore the sign of zeros: -(X - Y) --> (Y - X)		// If we can ignore the sign of zeros: -(X - Y) --> (Y - X)
if (I.hasNoSignedZeros() &&		if (I.hasNoSignedZeros() &&
match(Op, m_OneUse(m_FSub(m_Value(X), m_Value(Y)))))		match(Op, m_OneUse(m_FSub(m_Value(X), m_Value(Y)))))
return BinaryOperator::CreateFSubFMF(Y, X, &I);		return BinaryOperator::CreateFSubFMF(Y, X, &I);

if (Instruction *R = hoistFNegAboveFMulFDiv(I, Builder))		if (Instruction *R = hoistFNegAboveFMulFDiv(I, Builder))
return R;		return R;

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitFSub(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitFSub(BinaryOperator &I) {
if (Value *V = SimplifyFSubInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifyFSubInst(I.getOperand(0), I.getOperand(1),
I.getFastMathFlags(),		I.getFastMathFlags(),
SQ.getWithInstruction(&I)))		getSimplifyQuery().getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
return X;		return X;

// Subtraction from -0.0 is the canonical form of fneg.		// Subtraction from -0.0 is the canonical form of fneg.
// fsub -0.0, X ==> fneg X		// fsub -0.0, X ==> fneg X
// fsub nsz 0.0, X ==> fneg nsz X		// fsub nsz 0.0, X ==> fneg nsz X
[150 lines not shown]
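
Note on the integer folds quoted in the comments above (`~X + C --> (C-1) - X`, `C - ~X == X + (1+C)`, and `X % C0 + ((X / C0) % C1) * C0 --> X % (C0 * C1)`): the following is a minimal standalone sketch, not part of the patch, that checks these identities on plain wrapping `uint32_t` arithmetic. The function names are hypothetical and chosen only for illustration; the code does not use any LLVM API, and it covers only the unsigned case of the remainder fold.

```cpp
#include <cassert>
#include <cstdint>

// ~X + C == (C - 1) - X in wrapping (mod 2^32) arithmetic,
// because ~X == -X - 1.
bool notPlusConstHolds(uint32_t X, uint32_t C) {
  return (~X + C) == ((C - 1u) - X);
}

// C - ~X == X + (1 + C), again because ~X == -X - 1.
bool constMinusNotHolds(uint32_t X, uint32_t C) {
  return (C - ~X) == (X + (1u + C));
}

// X % C0 + ((X / C0) % C1) * C0 == X % (C0 * C1), unsigned case only.
// The identity is only claimed when C0 * C1 does not overflow, so an
// overflowing product is reported as "fold not applicable" (true).
bool addWithRemainderHolds(uint32_t X, uint32_t C0, uint32_t C1) {
  assert(C0 != 0 && C1 != 0 && "divisors must be nonzero");
  uint64_t Prod = static_cast<uint64_t>(C0) * C1;
  if (Prod > UINT32_MAX)
    return true;
  return X % C0 + ((X / C0) % C1) * C0 == X % static_cast<uint32_t>(Prod);
}

// Runs all three checks over a small set of edge-case values.
bool checkIdentities() {
  const uint32_t Samples[] = {0u,     1u,          2u,          7u,         255u,
                              65536u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t X : Samples)
    for (uint32_t C : Samples) {
      if (!notPlusConstHolds(X, C) || !constMinusNotHolds(X, C))
        return false;
      if (C != 0 && !addWithRemainderHolds(X, C, 3u))
        return false;
    }
  return true;
}
```

Calling `checkIdentities()` exercises each identity on boundary values such as `0`, `0x80000000` and `0xFFFFFFFF`. The signed variant of the remainder fold, which the patch selects via `IsSigned`, is deliberately left out of this sketch.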

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp

//===- InstCombineAndOrXor.cpp --------------------------------------------===//		//===- InstCombineAndOrXor.cpp --------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements the visitAnd, visitOr, and visitXor functions.		// This file implements the visitAnd, visitOr, and visitXor functions.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "InstCombineInternal.h"		#include "InstCombineInternal.h"
#include "llvm/Analysis/CmpInstAnalysis.h"		#include "llvm/Analysis/CmpInstAnalysis.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/IR/ConstantRange.h"		#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"
		#include "llvm/Transforms/Utils/Local.h"

using namespace llvm;		using namespace llvm;
using namespace PatternMatch;		using namespace PatternMatch;

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

/// Similar to getICmpCode but for FCmpInst. This encodes a fcmp predicate into		/// Similar to getICmpCode but for FCmpInst. This encodes a fcmp predicate into
/// a four bit mask.		/// a four bit mask.
static unsigned getFCmpCode(FCmpInst::Predicate CC) {		static unsigned getFCmpCode(FCmpInst::Predicate CC) {
[81 lines not shown]	static Value *SimplifyBSwap(BinaryOperator &I,
Value *BinOp = Builder.CreateBinOp(I.getOpcode(), NewLHS, NewRHS);		Value *BinOp = Builder.CreateBinOp(I.getOpcode(), NewLHS, NewRHS);
Function *F = Intrinsic::getDeclaration(I.getModule(), Intrinsic::bswap,		Function *F = Intrinsic::getDeclaration(I.getModule(), Intrinsic::bswap,
I.getType());		I.getType());
return Builder.CreateCall(F, BinOp);		return Builder.CreateCall(F, BinOp);
}		}

/// This handles expressions of the form ((val OP C1) & C2). Where		/// This handles expressions of the form ((val OP C1) & C2). Where
/// the Op parameter is 'OP', OpRHS is 'C1', and AndRHS is 'C2'.		/// the Op parameter is 'OP', OpRHS is 'C1', and AndRHS is 'C2'.
Instruction *InstCombiner::OptAndOp(BinaryOperator *Op,		Instruction *InstCombinerImpl::OptAndOp(BinaryOperator *Op, ConstantInt *OpRHS,
ConstantInt *OpRHS,
ConstantInt *AndRHS,		ConstantInt *AndRHS,
BinaryOperator &TheAnd) {		BinaryOperator &TheAnd) {
Value *X = Op->getOperand(0);		Value *X = Op->getOperand(0);

switch (Op->getOpcode()) {		switch (Op->getOpcode()) {
default: break;		default: break;
case Instruction::Add:		case Instruction::Add:
if (Op->hasOneUse()) {		if (Op->hasOneUse()) {
// Adding a one to a single bit bit-field should be turned into an XOR		// Adding a one to a single bit bit-field should be turned into an XOR
// of the bit. First thing to check is to see if this AND is with a		// of the bit. First thing to check is to see if this AND is with a
[27 lines not shown]	case Instruction::Add:
break;		break;
}		}
return nullptr;		return nullptr;
}		}

/// Emit a computation of: (V >= Lo && V < Hi) if Inside is true, otherwise		/// Emit a computation of: (V >= Lo && V < Hi) if Inside is true, otherwise
/// (V < Lo || V >= Hi). This method expects that Lo < Hi. IsSigned indicates		/// (V < Lo || V >= Hi). This method expects that Lo < Hi. IsSigned indicates
/// whether to treat V, Lo, and Hi as signed or not.		/// whether to treat V, Lo, and Hi as signed or not.
Value *InstCombiner::insertRangeTest(Value *V, const APInt &Lo, const APInt &Hi,		Value *InstCombinerImpl::insertRangeTest(Value *V, const APInt &Lo,
bool isSigned, bool Inside) {		const APInt &Hi, bool isSigned,
		bool Inside) {
assert((isSigned ? Lo.slt(Hi) : Lo.ult(Hi)) &&		assert((isSigned ? Lo.slt(Hi) : Lo.ult(Hi)) &&
"Lo is not < Hi in range emission code!");		"Lo is not < Hi in range emission code!");

Type *Ty = V->getType();		Type *Ty = V->getType();

// V >= Min && V < Hi --> V < Hi		// V >= Min && V < Hi --> V < Hi
// V < Min || V >= Hi --> V >= Hi		// V < Min || V >= Hi --> V >= Hi
ICmpInst::Predicate Pred = Inside ? ICmpInst::ICMP_ULT : ICmpInst::ICMP_UGE;		ICmpInst::Predicate Pred = Inside ? ICmpInst::ICMP_ULT : ICmpInst::ICMP_UGE;
[258 lines not shown]	getMaskedTypeForICmpPair(Value *&A, Value *&B, Value *&C,
unsigned RightType = getMaskedICmpType(A, D, E, PredR);		unsigned RightType = getMaskedICmpType(A, D, E, PredR);
return Optional<std::pair<unsigned, unsigned>>(std::make_pair(LeftType, RightType));		return Optional<std::pair<unsigned, unsigned>>(std::make_pair(LeftType, RightType));
}		}

/// Try to fold (icmp(A & B) ==/!= C) &/| (icmp(A & D) ==/!= E) into a single		/// Try to fold (icmp(A & B) ==/!= C) &/| (icmp(A & D) ==/!= E) into a single
/// (icmp(A & X) ==/!= Y), where the left-hand side is of type Mask_NotAllZeros		/// (icmp(A & X) ==/!= Y), where the left-hand side is of type Mask_NotAllZeros
/// and the right hand side is of type BMask_Mixed. For example,		/// and the right hand side is of type BMask_Mixed. For example,
/// (icmp (A & 12) != 0) & (icmp (A & 15) == 8) -> (icmp (A & 15) == 8).		/// (icmp (A & 12) != 0) & (icmp (A & 15) == 8) -> (icmp (A & 15) == 8).
static Value * foldLogOpOfMaskedICmps_NotAllZeros_BMask_Mixed(		static Value *foldLogOpOfMaskedICmps_NotAllZeros_BMask_Mixed(
ICmpInst *LHS, ICmpInst *RHS, bool IsAnd,		ICmpInst *LHS, ICmpInst *RHS, bool IsAnd, Value *A, Value *B, Value *C,
Value *A, Value *B, Value *C, Value *D, Value *E,		Value *D, Value *E, ICmpInst::Predicate PredL, ICmpInst::Predicate PredR,
ICmpInst::Predicate PredL, ICmpInst::Predicate PredR,		InstCombiner::BuilderTy &Builder) {
llvm::InstCombiner::BuilderTy &Builder) {
// We are given the canonical form:		// We are given the canonical form:
// (icmp ne (A & B), 0) & (icmp eq (A & D), E).		// (icmp ne (A & B), 0) & (icmp eq (A & D), E).
// where D & E == E.		// where D & E == E.
//		//
// If IsAnd is false, we get it in negated form:		// If IsAnd is false, we get it in negated form:
// (icmp eq (A & B), 0) | (icmp ne (A & D), E) ->		// (icmp eq (A & B), 0) | (icmp ne (A & D), E) ->
// !((icmp ne (A & B), 0) & (icmp eq (A & D), E)).		// !((icmp ne (A & B), 0) & (icmp eq (A & D), E)).
//		//
[110 lines not shown]	static Value *foldLogOpOfMaskedICmps_NotAllZeros_BMask_Mixed(
// (icmp ne (A & 6), 0) & (icmp eq (A & 15), 8) -> false.		// (icmp ne (A & 6), 0) & (icmp eq (A & 15), 8) -> false.
return ConstantInt::get(LHS->getType(), !IsAnd);		return ConstantInt::get(LHS->getType(), !IsAnd);
}		}

/// Try to fold (icmp(A & B) ==/!= 0) &/| (icmp(A & D) ==/!= E) into a single		/// Try to fold (icmp(A & B) ==/!= 0) &/| (icmp(A & D) ==/!= E) into a single
/// (icmp(A & X) ==/!= Y), where the left-hand side and the right hand side		/// (icmp(A & X) ==/!= Y), where the left-hand side and the right hand side
/// aren't of the common mask pattern type.		/// aren't of the common mask pattern type.
static Value *foldLogOpOfMaskedICmpsAsymmetric(		static Value *foldLogOpOfMaskedICmpsAsymmetric(
ICmpInst *LHS, ICmpInst *RHS, bool IsAnd,		ICmpInst *LHS, ICmpInst *RHS, bool IsAnd, Value *A, Value *B, Value *C,
Value *A, Value *B, Value *C, Value *D, Value *E,		Value *D, Value *E, ICmpInst::Predicate PredL, ICmpInst::Predicate PredR,
ICmpInst::Predicate PredL, ICmpInst::Predicate PredR,		unsigned LHSMask, unsigned RHSMask, InstCombiner::BuilderTy &Builder) {
unsigned LHSMask, unsigned RHSMask,
llvm::InstCombiner::BuilderTy &Builder) {
assert(ICmpInst::isEquality(PredL) && ICmpInst::isEquality(PredR) &&		assert(ICmpInst::isEquality(PredL) && ICmpInst::isEquality(PredR) &&
"Expected equality predicates for masked type of icmps.");		"Expected equality predicates for masked type of icmps.");
// Handle Mask_NotAllZeros-BMask_Mixed cases.		// Handle Mask_NotAllZeros-BMask_Mixed cases.
// (icmp ne/eq (A & B), C) &/| (icmp eq/ne (A & D), E), or		// (icmp ne/eq (A & B), C) &/| (icmp eq/ne (A & D), E), or
// (icmp eq/ne (A & B), C) &/| (icmp ne/eq (A & D), E)		// (icmp eq/ne (A & B), C) &/| (icmp ne/eq (A & D), E)
// which gets swapped to		// which gets swapped to
// (icmp ne/eq (A & D), E) &/| (icmp eq/ne (A & B), C).		// (icmp ne/eq (A & D), E) &/| (icmp eq/ne (A & B), C).
if (!IsAnd) {		if (!IsAnd) {
[14 lines not shown]	if ((LHSMask & Mask_NotAllZeros) && (RHSMask & BMask_Mixed)) {
}		}
}		}
return nullptr;		return nullptr;
}		}

/// Try to fold (icmp(A & B) ==/!= C) &/| (icmp(A & D) ==/!= E)		/// Try to fold (icmp(A & B) ==/!= C) &/| (icmp(A & D) ==/!= E)
/// into a single (icmp(A & X) ==/!= Y).		/// into a single (icmp(A & X) ==/!= Y).
static Value *foldLogOpOfMaskedICmps(ICmpInst *LHS, ICmpInst *RHS, bool IsAnd,		static Value *foldLogOpOfMaskedICmps(ICmpInst *LHS, ICmpInst *RHS, bool IsAnd,
llvm::InstCombiner::BuilderTy &Builder) {		InstCombiner::BuilderTy &Builder) {
Value *A = nullptr, *B = nullptr, *C = nullptr, *D = nullptr, *E = nullptr;		Value *A = nullptr, *B = nullptr, *C = nullptr, *D = nullptr, *E = nullptr;
ICmpInst::Predicate PredL = LHS->getPredicate(), PredR = RHS->getPredicate();		ICmpInst::Predicate PredL = LHS->getPredicate(), PredR = RHS->getPredicate();
Optional<std::pair<unsigned, unsigned>> MaskPair =		Optional<std::pair<unsigned, unsigned>> MaskPair =
getMaskedTypeForICmpPair(A, B, C, D, E, LHS, RHS, PredL, PredR);		getMaskedTypeForICmpPair(A, B, C, D, E, LHS, RHS, PredL, PredR);
if (!MaskPair)		if (!MaskPair)
return nullptr;		return nullptr;
assert(ICmpInst::isEquality(PredL) && ICmpInst::isEquality(PredR) &&		assert(ICmpInst::isEquality(PredL) && ICmpInst::isEquality(PredR) &&
"Expected equality predicates for masked type of icmps.");		"Expected equality predicates for masked type of icmps.");
[128 lines not shown]	static Value *foldLogOpOfMaskedICmps(ICmpInst *LHS, ICmpInst *RHS, bool IsAnd,

return nullptr;		return nullptr;
}		}

/// Try to fold a signed range checked with lower bound 0 to an unsigned icmp.		/// Try to fold a signed range checked with lower bound 0 to an unsigned icmp.
/// Example: (icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n		/// Example: (icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n
/// If \p Inverted is true then the check is for the inverted range, e.g.		/// If \p Inverted is true then the check is for the inverted range, e.g.
/// (icmp slt x, 0) | (icmp sgt x, n) --> icmp ugt x, n		/// (icmp slt x, 0) | (icmp sgt x, n) --> icmp ugt x, n
Value *InstCombiner::simplifyRangeCheck(ICmpInst *Cmp0, ICmpInst *Cmp1,		Value *InstCombinerImpl::simplifyRangeCheck(ICmpInst *Cmp0, ICmpInst *Cmp1,
bool Inverted) {		bool Inverted) {
// Check the lower range comparison, e.g. x >= 0		// Check the lower range comparison, e.g. x >= 0
// InstCombine already ensured that if there is a constant it's on the RHS.		// InstCombine already ensured that if there is a constant it's on the RHS.
ConstantInt *RangeStart = dyn_cast<ConstantInt>(Cmp0->getOperand(1));		ConstantInt *RangeStart = dyn_cast<ConstantInt>(Cmp0->getOperand(1));
if (!RangeStart)		if (!RangeStart)
return nullptr;		return nullptr;

ICmpInst::Predicate Pred0 = (Inverted ? Cmp0->getInversePredicate() :		ICmpInst::Predicate Pred0 = (Inverted ? Cmp0->getInversePredicate() :
Cmp0->getPredicate());		Cmp0->getPredicate());
[90 lines not shown]	if (C1 == C2 - 1) {
return Builder.CreateICmp(NewPred, Add, ConstantInt::get(X->getType(), 1));		return Builder.CreateICmp(NewPred, Add, ConstantInt::get(X->getType(), 1));
}		}

return nullptr;		return nullptr;
}		}

// Fold (iszero(A & K1) | iszero(A & K2)) -> (A & (K1 | K2)) != (K1 | K2)		// Fold (iszero(A & K1) | iszero(A & K2)) -> (A & (K1 | K2)) != (K1 | K2)
// Fold (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 | K2)) == (K1 | K2)		// Fold (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 | K2)) == (K1 | K2)
Value *InstCombiner::foldAndOrOfICmpsOfAndWithPow2(ICmpInst *LHS, ICmpInst *RHS,		Value *InstCombinerImpl::foldAndOrOfICmpsOfAndWithPow2(ICmpInst *LHS,
		ICmpInst *RHS,
BinaryOperator &Logic) {		BinaryOperator &Logic) {
bool JoinedByAnd = Logic.getOpcode() == Instruction::And;		bool JoinedByAnd = Logic.getOpcode() == Instruction::And;
assert((JoinedByAnd || Logic.getOpcode() == Instruction::Or) &&		assert((JoinedByAnd || Logic.getOpcode() == Instruction::Or) &&
"Wrong opcode");		"Wrong opcode");
ICmpInst::Predicate Pred = LHS->getPredicate();		ICmpInst::Predicate Pred = LHS->getPredicate();
if (Pred != RHS->getPredicate())		if (Pred != RHS->getPredicate())
return nullptr;		return nullptr;
if (JoinedByAnd && Pred != ICmpInst::ICMP_NE)		if (JoinedByAnd && Pred != ICmpInst::ICMP_NE)
return nullptr;		return nullptr;
[309 lines not shown]	if (!SubstituteCmp) {
if (!Cmp1->hasOneUse())		if (!Cmp1->hasOneUse())
return nullptr;		return nullptr;
SubstituteCmp = Builder.CreateICmp(Pred1, Y, C);		SubstituteCmp = Builder.CreateICmp(Pred1, Y, C);
}		}
return Builder.CreateBinOp(Logic.getOpcode(), Cmp0, SubstituteCmp);		return Builder.CreateBinOp(Logic.getOpcode(), Cmp0, SubstituteCmp);
}		}

/// Fold (icmp)&(icmp) if possible.		/// Fold (icmp)&(icmp) if possible.
Value *InstCombiner::foldAndOfICmps(ICmpInst *LHS, ICmpInst *RHS,		Value *InstCombinerImpl::foldAndOfICmps(ICmpInst *LHS, ICmpInst *RHS,
BinaryOperator &And) {		BinaryOperator &And) {
const SimplifyQuery Q = SQ.getWithInstruction(&And);		const SimplifyQuery Q = SQ.getWithInstruction(&And);

// Fold (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 | K2)) == (K1 | K2)		// Fold (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 | K2)) == (K1 | K2)
// if K1 and K2 are a one-bit mask.		// if K1 and K2 are a one-bit mask.
if (Value *V = foldAndOrOfICmpsOfAndWithPow2(LHS, RHS, And))		if (Value *V = foldAndOrOfICmpsOfAndWithPow2(LHS, RHS, And))
return V;		return V;

ICmpInst::Predicate PredL = LHS->getPredicate(), PredR = RHS->getPredicate();		ICmpInst::Predicate PredL = LHS->getPredicate(), PredR = RHS->getPredicate();
[202 lines not shown]	case ICmpInst::ICMP_SLT: // (X s> 13 & X s< 15) -> (X-14) u< 1
true);		true);
}		}
break;		break;
}		}

return nullptr;		return nullptr;
}		}

Value *InstCombiner::foldLogicOfFCmps(FCmpInst *LHS, FCmpInst *RHS, bool IsAnd) {		Value *InstCombinerImpl::foldLogicOfFCmps(FCmpInst *LHS, FCmpInst *RHS,
		bool IsAnd) {
Value *LHS0 = LHS->getOperand(0), *LHS1 = LHS->getOperand(1);		Value *LHS0 = LHS->getOperand(0), *LHS1 = LHS->getOperand(1);
Value *RHS0 = RHS->getOperand(0), *RHS1 = RHS->getOperand(1);		Value *RHS0 = RHS->getOperand(0), *RHS1 = RHS->getOperand(1);
FCmpInst::Predicate PredL = LHS->getPredicate(), PredR = RHS->getPredicate();		FCmpInst::Predicate PredL = LHS->getPredicate(), PredR = RHS->getPredicate();

if (LHS0 == RHS1 && RHS0 == LHS1) {		if (LHS0 == RHS1 && RHS0 == LHS1) {
// Swap RHS operands to match LHS.		// Swap RHS operands to match LHS.
PredR = FCmpInst::getSwappedPredicate(PredR);		PredR = FCmpInst::getSwappedPredicate(PredR);
std::swap(RHS0, RHS1);		std::swap(RHS0, RHS1);
[93 lines not shown]	assert((Opcode == Instruction::And || Opcode == Instruction::Or) &&
"Trying to match De Morgan's Laws with something other than and/or");		"Trying to match De Morgan's Laws with something other than and/or");

// Flip the logic operation.		// Flip the logic operation.
Opcode = (Opcode == Instruction::And) ? Instruction::Or : Instruction::And;		Opcode = (Opcode == Instruction::And) ? Instruction::Or : Instruction::And;

Value *A, *B;		Value *A, *B;
if (match(I.getOperand(0), m_OneUse(m_Not(m_Value(A)))) &&		if (match(I.getOperand(0), m_OneUse(m_Not(m_Value(A)))) &&
match(I.getOperand(1), m_OneUse(m_Not(m_Value(B)))) &&		match(I.getOperand(1), m_OneUse(m_Not(m_Value(B)))) &&
!isFreeToInvert(A, A->hasOneUse()) &&		!InstCombiner::isFreeToInvert(A, A->hasOneUse()) &&
!isFreeToInvert(B, B->hasOneUse())) {		!InstCombiner::isFreeToInvert(B, B->hasOneUse())) {
Value *AndOr = Builder.CreateBinOp(Opcode, A, B, I.getName() + ".demorgan");		Value *AndOr = Builder.CreateBinOp(Opcode, A, B, I.getName() + ".demorgan");
return BinaryOperator::CreateNot(AndOr);		return BinaryOperator::CreateNot(AndOr);
}		}

return nullptr;		return nullptr;
}		}

bool InstCombiner::shouldOptimizeCast(CastInst *CI) {		bool InstCombinerImpl::shouldOptimizeCast(CastInst *CI) {
Value *CastSrc = CI->getOperand(0);		Value *CastSrc = CI->getOperand(0);

// Noop casts and casts of constants should be eliminated trivially.		// Noop casts and casts of constants should be eliminated trivially.
if (CI->getSrcTy() == CI->getDestTy() || isa<Constant>(CastSrc))		if (CI->getSrcTy() == CI->getDestTy() || isa<Constant>(CastSrc))
return false;		return false;

// If this cast is paired with another cast that can be eliminated, we prefer		// If this cast is paired with another cast that can be eliminated, we prefer
// to have it eliminated.		// to have it eliminated.
[39 lines not shown]	if (SextTruncC == C) {
return new SExtInst(NewOp, DestTy);		return new SExtInst(NewOp, DestTy);
}		}
}		}

return nullptr;		return nullptr;
}		}

/// Fold {and,or,xor} (cast X), Y.		/// Fold {and,or,xor} (cast X), Y.
Instruction *InstCombiner::foldCastedBitwiseLogic(BinaryOperator &I) {		Instruction *InstCombinerImpl::foldCastedBitwiseLogic(BinaryOperator &I) {
auto LogicOpc = I.getOpcode();		auto LogicOpc = I.getOpcode();
assert(I.isBitwiseLogicOp() && "Unexpected opcode for bitwise logic folding");		assert(I.isBitwiseLogicOp() && "Unexpected opcode for bitwise logic folding");

Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);		Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
CastInst *Cast0 = dyn_cast<CastInst>(Op0);		CastInst *Cast0 = dyn_cast<CastInst>(Op0);
if (!Cast0)		if (!Cast0)
return nullptr;		return nullptr;

[129 lines not shown]	static bool canNarrowShiftAmt(Constant *C, unsigned BitWidth) {
}		}

// The constant is a constant expression or unknown.		// The constant is a constant expression or unknown.
return false;		return false;
}		}

/// Try to use narrower ops (sink zext ops) for an 'and' with binop operand and		/// Try to use narrower ops (sink zext ops) for an 'and' with binop operand and
/// a common zext operand: and (binop (zext X), C), (zext X).		/// a common zext operand: and (binop (zext X), C), (zext X).
Instruction *InstCombiner::narrowMaskedBinOp(BinaryOperator &And) {		Instruction *InstCombinerImpl::narrowMaskedBinOp(BinaryOperator &And) {
// This transform could also apply to {or, and, xor}, but there are better		// This transform could also apply to {or, and, xor}, but there are better
// folds for those cases, so we don't expect those patterns here. AShr is not		// folds for those cases, so we don't expect those patterns here. AShr is not
// handled because it should always be transformed to LShr in this sequence.		// handled because it should always be transformed to LShr in this sequence.
// The subtract transform is different because it has a constant on the left.		// The subtract transform is different because it has a constant on the left.
// Add/mul commute the constant to RHS; sub with constant RHS becomes add.		// Add/mul commute the constant to RHS; sub with constant RHS becomes add.
Value *Op0 = And.getOperand(0), *Op1 = And.getOperand(1);		Value *Op0 = And.getOperand(0), *Op1 = And.getOperand(1);
Constant *C;		Constant *C;
if (!match(Op0, m_OneUse(m_Add(m_Specific(Op1), m_Constant(C)))) &&		if (!match(Op0, m_OneUse(m_Add(m_Specific(Op1), m_Constant(C)))) &&
[25 lines not shown]	Instruction *InstCombinerImpl::narrowMaskedBinOp(BinaryOperator &And) {
Value *NewBO = Opc == Instruction::Sub ? Builder.CreateBinOp(Opc, NewC, X)		Value *NewBO = Opc == Instruction::Sub ? Builder.CreateBinOp(Opc, NewC, X)
: Builder.CreateBinOp(Opc, X, NewC);		: Builder.CreateBinOp(Opc, X, NewC);
return new ZExtInst(Builder.CreateAnd(NewBO, X), Ty);		return new ZExtInst(Builder.CreateAnd(NewBO, X), Ty);
}		}

// FIXME: We use commutative matchers (m_c_*) for some, but not all, matches		// FIXME: We use commutative matchers (m_c_*) for some, but not all, matches
// here. We should standardize that construct where it is needed or choose some		// here. We should standardize that construct where it is needed or choose some
// other way to ensure that commutated variants of patterns are not missed.		// other way to ensure that commutated variants of patterns are not missed.
Instruction *InstCombiner::visitAnd(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitAnd(BinaryOperator &I) {
if (Value *V = SimplifyAndInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifyAndInst(I.getOperand(0), I.getOperand(1),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (SimplifyAssociativeOrCommutative(I))		if (SimplifyAssociativeOrCommutative(I))
return &I;		return &I;

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
▲ Show 20 Lines • Show All 249 Lines • ▼ Show 20 Lines	if (match(&I, m_c_And(m_OneUse(m_AShr(m_NSWSub(m_Value(Y), m_Value(X)),
Value *NewICmpInst = Builder.CreateICmpSGT(X, Y);		Value *NewICmpInst = Builder.CreateICmpSGT(X, Y);
return SelectInst::Create(NewICmpInst, X, ConstantInt::getNullValue(Ty));		return SelectInst::Create(NewICmpInst, X, ConstantInt::getNullValue(Ty));
}		}
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::matchBSwap(BinaryOperator &Or) {		Instruction *InstCombinerImpl::matchBSwap(BinaryOperator &Or) {
assert(Or.getOpcode() == Instruction::Or && "bswap requires an 'or'");		assert(Or.getOpcode() == Instruction::Or && "bswap requires an 'or'");
Value *Op0 = Or.getOperand(0), *Op1 = Or.getOperand(1);		Value *Op0 = Or.getOperand(0), *Op1 = Or.getOperand(1);

// Look through zero extends.		// Look through zero extends.
if (Instruction *Ext = dyn_cast<ZExtInst>(Op0))		if (Instruction *Ext = dyn_cast<ZExtInst>(Op0))
Op0 = Ext->getOperand(0);		Op0 = Ext->getOperand(0);

if (Instruction *Ext = dyn_cast<ZExtInst>(Op1))		if (Instruction *Ext = dyn_cast<ZExtInst>(Op1))
▲ Show 20 Lines • Show All 166 Lines • ▼ Show 20 Lines	if (!((match(EltC1, m_Zero()) && match(EltC2, m_AllOnes())) ||
return false;		return false;
}		}
return true;		return true;
}		}

/// We have an expression of the form (A & C) | (B & D). If A is a scalar or		/// We have an expression of the form (A & C) | (B & D). If A is a scalar or
/// vector composed of all-zeros or all-ones values and is the bitwise 'not' of		/// vector composed of all-zeros or all-ones values and is the bitwise 'not' of
/// B, it can be used as the condition operand of a select instruction.		/// B, it can be used as the condition operand of a select instruction.
Value *InstCombiner::getSelectCondition(Value *A, Value *B) {		Value *InstCombinerImpl::getSelectCondition(Value *A, Value *B) {
// Step 1: We may have peeked through bitcasts in the caller.		// Step 1: We may have peeked through bitcasts in the caller.
// Exit immediately if we don't have (vector) integer types.		// Exit immediately if we don't have (vector) integer types.
Type *Ty = A->getType();		Type *Ty = A->getType();
if (!Ty->isIntOrIntVectorTy() || !B->getType()->isIntOrIntVectorTy())		if (!Ty->isIntOrIntVectorTy() || !B->getType()->isIntOrIntVectorTy())
return nullptr;		return nullptr;

// Step 2: We need 0 or all-1's bitmasks.		// Step 2: We need 0 or all-1's bitmasks.
if (ComputeNumSignBits(A) != Ty->getScalarSizeInBits())		if (ComputeNumSignBits(A) != Ty->getScalarSizeInBits())
Show All 40 Lines	if (match(A, (m_Xor(m_SExt(m_Value(Cond)), m_Constant(AConst)))) &&
AConst = ConstantExpr::getTrunc(AConst, CmpInst::makeCmpResultType(Ty));		AConst = ConstantExpr::getTrunc(AConst, CmpInst::makeCmpResultType(Ty));
return Builder.CreateXor(Cond, AConst);		return Builder.CreateXor(Cond, AConst);
}		}
return nullptr;		return nullptr;
}		}

/// We have an expression of the form (A & C) | (B & D). Try to simplify this		/// We have an expression of the form (A & C) | (B & D). Try to simplify this
/// to "A' ? C : D", where A' is a boolean or vector of booleans.		/// to "A' ? C : D", where A' is a boolean or vector of booleans.
Value *InstCombiner::matchSelectFromAndOr(Value *A, Value *C, Value *B,		Value *InstCombinerImpl::matchSelectFromAndOr(Value *A, Value *C, Value *B,
Value *D) {		Value *D) {
// The potential condition of the select may be bitcasted. In that case, look		// The potential condition of the select may be bitcasted. In that case, look
// through its bitcast and the corresponding bitcast of the 'not' condition.		// through its bitcast and the corresponding bitcast of the 'not' condition.
Type *OrigType = A->getType();		Type *OrigType = A->getType();
A = peekThroughBitcast(A, true);		A = peekThroughBitcast(A, true);
B = peekThroughBitcast(B, true);		B = peekThroughBitcast(B, true);
if (Value *Cond = getSelectCondition(A, B)) {		if (Value *Cond = getSelectCondition(A, B)) {
// ((bc Cond) & C) | ((bc ~Cond) & D) --> bc (select Cond, (bc C), (bc D))		// ((bc Cond) & C) | ((bc ~Cond) & D) --> bc (select Cond, (bc C), (bc D))
// The bitcasts will either all exist or all not exist. The builder will		// The bitcasts will either all exist or all not exist. The builder will
// not create unnecessary casts if the types already match.		// not create unnecessary casts if the types already match.
Value *BitcastC = Builder.CreateBitCast(C, A->getType());		Value *BitcastC = Builder.CreateBitCast(C, A->getType());
Value *BitcastD = Builder.CreateBitCast(D, A->getType());		Value *BitcastD = Builder.CreateBitCast(D, A->getType());
Value *Select = Builder.CreateSelect(Cond, BitcastC, BitcastD);		Value *Select = Builder.CreateSelect(Cond, BitcastC, BitcastD);
return Builder.CreateBitCast(Select, OrigType);		return Builder.CreateBitCast(Select, OrigType);
}		}

return nullptr;		return nullptr;
}		}
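
The select fold above relies on a bitwise identity: when the mask is all-zeros or all-ones, `(Mask & C) | (~Mask & D)` yields either `C` or `D` in full. A minimal scalar sketch of that identity follows. The names `maskedBlend`, `selectForm` and `blendMatchesSelect` are hypothetical illustrations and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of the and/or form. The fold only applies when Mask is
// known to be 0 or all-ones (the "Step 2" check in getSelectCondition).
inline uint32_t maskedBlend(uint32_t Mask, uint32_t C, uint32_t D) {
  assert((Mask == 0u || Mask == ~0u) && "mask must be 0 or all-ones");
  return (Mask & C) | (~Mask & D);
}

// The select form the and/or pattern is rewritten into.
inline uint32_t selectForm(bool Cond, uint32_t C, uint32_t D) {
  return Cond ? C : D;
}

// Hypothetical helper: checks both mask values for one pair of inputs.
inline bool blendMatchesSelect(uint32_t C, uint32_t D) {
  return maskedBlend(~0u, C, D) == selectForm(true, C, D) &&
         maskedBlend(0u, C, D) == selectForm(false, C, D);
}
```

This models one scalar lane only; the patch itself works on IR values and also looks through bitcasts, which the sketch does not attempt.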

/// Fold (icmp)|(icmp) if possible.		/// Fold (icmp)|(icmp) if possible.
Value *InstCombiner::foldOrOfICmps(ICmpInst *LHS, ICmpInst *RHS,		Value *InstCombinerImpl::foldOrOfICmps(ICmpInst *LHS, ICmpInst *RHS,
BinaryOperator &Or) {		BinaryOperator &Or) {
const SimplifyQuery Q = SQ.getWithInstruction(&Or);		const SimplifyQuery Q = SQ.getWithInstruction(&Or);

// Fold (iszero(A & K1) | iszero(A & K2)) -> (A & (K1 | K2)) != (K1 | K2)		// Fold (iszero(A & K1) | iszero(A & K2)) -> (A & (K1 | K2)) != (K1 | K2)
// if K1 and K2 are a one-bit mask.		// if K1 and K2 are a one-bit mask.
if (Value *V = foldAndOrOfICmpsOfAndWithPow2(LHS, RHS, Or))		if (Value *V = foldAndOrOfICmpsOfAndWithPow2(LHS, RHS, Or))
return V;		return V;

ICmpInst::Predicate PredL = LHS->getPredicate(), PredR = RHS->getPredicate();		ICmpInst::Predicate PredL = LHS->getPredicate(), PredR = RHS->getPredicate();
▲ Show 20 Lines • Show All 248 Lines • ▼ Show 20 Lines	case ICmpInst::ICMP_SLT:
break;		break;
}		}
return nullptr;		return nullptr;
}		}
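
The one-bit-mask fold named in the comment at the top of foldOrOfICmps can be checked on plain integers. The sketch below assumes K1 and K2 are single-bit masks, as the comment requires. The function names are hypothetical and only model the arithmetic, not the IR rewrite.

```cpp
#include <cstdint>

// Original form: at least one of the two tested bits is clear.
inline bool orOfZeroTests(uint32_t A, uint32_t K1, uint32_t K2) {
  return (A & K1) == 0 || (A & K2) == 0;
}

// Folded form: the two bits are not both set.
inline bool foldedForm(uint32_t A, uint32_t K1, uint32_t K2) {
  const uint32_t K = K1 | K2;
  return (A & K) != K;
}

// Hypothetical exhaustive check over 8-bit inputs and all pairs of
// single-bit masks within the low byte.
inline bool formsAgreeOnByteInputs() {
  for (unsigned I = 0; I < 8; ++I)
    for (unsigned J = 0; J < 8; ++J)
      for (uint32_t A = 0; A < 256; ++A)
        if (orOfZeroTests(A, 1u << I, 1u << J) !=
            foldedForm(A, 1u << I, 1u << J))
          return false;
  return true;
}
```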

// FIXME: We use commutative matchers (m_c_*) for some, but not all, matches		// FIXME: We use commutative matchers (m_c_*) for some, but not all, matches
// here. We should standardize that construct where it is needed or choose some		// here. We should standardize that construct where it is needed or choose some
// other way to ensure that commutated variants of patterns are not missed.		// other way to ensure that commutated variants of patterns are not missed.
Instruction *InstCombiner::visitOr(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitOr(BinaryOperator &I) {
if (Value *V = SimplifyOrInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifyOrInst(I.getOperand(0), I.getOperand(1),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (SimplifyAssociativeOrCommutative(I))		if (SimplifyAssociativeOrCommutative(I))
return &I;		return &I;

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
▲ Show 20 Lines • Show All 352 Lines • ▼ Show 20 Lines	if ((match(Op0, m_Or(m_Value(A), m_Value(B))) &&
match(Op1, m_Not(m_c_And(m_Specific(A), m_Specific(B))))) ||		match(Op1, m_Not(m_c_And(m_Specific(A), m_Specific(B))))) ||
(match(Op0, m_And(m_Value(A), m_Value(B))) &&		(match(Op0, m_And(m_Value(A), m_Value(B))) &&
match(Op1, m_Not(m_c_Or(m_Specific(A), m_Specific(B))))))		match(Op1, m_Not(m_c_Or(m_Specific(A), m_Specific(B))))))
return BinaryOperator::CreateNot(Builder.CreateXor(A, B));		return BinaryOperator::CreateNot(Builder.CreateXor(A, B));

return nullptr;		return nullptr;
}		}

Value *InstCombiner::foldXorOfICmps(ICmpInst *LHS, ICmpInst *RHS,		Value *InstCombinerImpl::foldXorOfICmps(ICmpInst *LHS, ICmpInst *RHS,
BinaryOperator &I) {		BinaryOperator &I) {
assert(I.getOpcode() == Instruction::Xor && I.getOperand(0) == LHS &&		assert(I.getOpcode() == Instruction::Xor && I.getOperand(0) == LHS &&
I.getOperand(1) == RHS && "Should be 'xor' with these operands");		I.getOperand(1) == RHS && "Should be 'xor' with these operands");

if (predicatesFoldable(LHS->getPredicate(), RHS->getPredicate())) {		if (predicatesFoldable(LHS->getPredicate(), RHS->getPredicate())) {
if (LHS->getOperand(0) == RHS->getOperand(1) &&		if (LHS->getOperand(0) == RHS->getOperand(1) &&
LHS->getOperand(1) == RHS->getOperand(0))		LHS->getOperand(1) == RHS->getOperand(0))
LHS->swapOperands();		LHS->swapOperands();
if (LHS->getOperand(0) == RHS->getOperand(0) &&		if (LHS->getOperand(0) == RHS->getOperand(0) &&
▲ Show 20 Lines • Show All 141 Lines • ▼ Show 20 Lines	static Instruction *sinkNotIntoXor(BinaryOperator &I,
InstCombiner::BuilderTy &Builder) {		InstCombiner::BuilderTy &Builder) {
Value *X, *Y;		Value *X, *Y;
// FIXME: one-use check is not needed in general, but currently we are unable		// FIXME: one-use check is not needed in general, but currently we are unable
// to fold 'not' into 'icmp', if that 'icmp' has multiple uses. (D35182)		// to fold 'not' into 'icmp', if that 'icmp' has multiple uses. (D35182)
if (!match(&I, m_Not(m_OneUse(m_Xor(m_Value(X), m_Value(Y))))))		if (!match(&I, m_Not(m_OneUse(m_Xor(m_Value(X), m_Value(Y))))))
return nullptr;		return nullptr;

// We only want to do the transform if it is free to do.		// We only want to do the transform if it is free to do.
if (isFreeToInvert(X, X->hasOneUse())) {		if (InstCombiner::isFreeToInvert(X, X->hasOneUse())) {
// Ok, good.		// Ok, good.
} else if (isFreeToInvert(Y, Y->hasOneUse())) {		} else if (InstCombiner::isFreeToInvert(Y, Y->hasOneUse())) {
std::swap(X, Y);		std::swap(X, Y);
} else		} else
return nullptr;		return nullptr;

Value *NotX = Builder.CreateNot(X, X->getName() + ".not");		Value *NotX = Builder.CreateNot(X, X->getName() + ".not");
return BinaryOperator::CreateXor(NotX, Y, I.getName() + ".demorgan");		return BinaryOperator::CreateXor(NotX, Y, I.getName() + ".demorgan");
}		}
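
sinkNotIntoXor rewrites `~(X ^ Y)` as `(~X) ^ Y`, swapping the operands first when only Y is free to invert. The underlying identity can be sketched on plain integers as below. `notOfXor`, `xorOfNot` and `sinkNotHolds` are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Form matched by the transform: not applied to the xor result.
inline uint32_t notOfXor(uint32_t X, uint32_t Y) { return ~(X ^ Y); }

// Form produced by the transform: not sunk into the first operand.
inline uint32_t xorOfNot(uint32_t X, uint32_t Y) { return (~X) ^ Y; }

// Because xor is commutative, sinking the not into either operand gives
// the same value; this is why the code may swap X and Y first.
inline bool sinkNotHolds(uint32_t X, uint32_t Y) {
  return notOfXor(X, Y) == xorOfNot(X, Y) &&
         notOfXor(X, Y) == xorOfNot(Y, X);
}
```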

// FIXME: We use commutative matchers (m_c_*) for some, but not all, matches		// FIXME: We use commutative matchers (m_c_*) for some, but not all, matches
// here. We should standardize that construct where it is needed or choose some		// here. We should standardize that construct where it is needed or choose some
// other way to ensure that commutated variants of patterns are not missed.		// other way to ensure that commutated variants of patterns are not missed.
Instruction *InstCombiner::visitXor(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitXor(BinaryOperator &I) {
if (Value *V = SimplifyXorInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifyXorInst(I.getOperand(0), I.getOperand(1),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (SimplifyAssociativeOrCommutative(I))		if (SimplifyAssociativeOrCommutative(I))
return &I;		return &I;

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))

llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp

//===- InstCombineAtomicRMW.cpp -------------------------------------------===//		//===- InstCombineAtomicRMW.cpp -------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements the visit functions for atomic rmw instructions.		// This file implements the visit functions for atomic rmw instructions.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "InstCombineInternal.h"		#include "InstCombineInternal.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"

using namespace llvm;		using namespace llvm;

namespace {		namespace {
/// Return true if and only if the given instruction does not modify the memory		/// Return true if and only if the given instruction does not modify the memory
/// location referenced. Note that an idempotent atomicrmw may still have		/// location referenced. Note that an idempotent atomicrmw may still have
/// ordering effects on nearby instructions, or be volatile.		/// ordering effects on nearby instructions, or be volatile.
/// TODO: Common w/ the version in AtomicExpandPass, and change the term used.		/// TODO: Common w/ the version in AtomicExpandPass, and change the term used.
/// Idempotent is confusing in this context.		/// Idempotent is confusing in this context.
bool isIdempotentRMW(AtomicRMWInst& RMWI) {		bool isIdempotentRMW(AtomicRMWInst& RMWI) {
if (auto CF = dyn_cast<ConstantFP>(RMWI.getValOperand()))		if (auto CF = dyn_cast<ConstantFP>(RMWI.getValOperand()))
switch(RMWI.getOperation()) {		switch(RMWI.getOperation()) {
case AtomicRMWInst::FAdd: // -0.0		case AtomicRMWInst::FAdd: // -0.0
return CF->isZero() && CF->isNegative();		return CF->isZero() && CF->isNegative();
case AtomicRMWInst::FSub: // +0.0		case AtomicRMWInst::FSub: // +0.0
return CF->isZero() && !CF->isNegative();		return CF->isZero() && !CF->isNegative();
default:		default:
return false;		return false;
};		};

auto C = dyn_cast<ConstantInt>(RMWI.getValOperand());		auto C = dyn_cast<ConstantInt>(RMWI.getValOperand());
if(!C)		if(!C)
return false;		return false;

switch(RMWI.getOperation()) {		switch(RMWI.getOperation()) {
case AtomicRMWInst::Add:		case AtomicRMWInst::Add:
case AtomicRMWInst::Sub:		case AtomicRMWInst::Sub:
case AtomicRMWInst::Or:		case AtomicRMWInst::Or:
▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	bool isSaturating(AtomicRMWInst& RMWI) {
case AtomicRMWInst::UMin:		case AtomicRMWInst::UMin:
return C->isMinValue(false);		return C->isMinValue(false);
case AtomicRMWInst::UMax:		case AtomicRMWInst::UMax:
return C->isMaxValue(false);		return C->isMaxValue(false);
};		};
}		}
}		}

Instruction *InstCombiner::visitAtomicRMWInst(AtomicRMWInst &RMWI) {		Instruction *InstCombinerImpl::visitAtomicRMWInst(AtomicRMWInst &RMWI) {

// Volatile RMWs perform a load and a store, we cannot replace this by just a		// Volatile RMWs perform a load and a store, we cannot replace this by just a
// load or just a store. We chose not to canonicalize out of general paranoia		// load or just a store. We chose not to canonicalize out of general paranoia
// about user expectations around volatile.		// about user expectations around volatile.
if (RMWI.isVolatile())		if (RMWI.isVolatile())
return nullptr;		return nullptr;

// Any atomicrmw op which produces a known result in memory can be		// Any atomicrmw op which produces a known result in memory can be
// replaced w/an atomicrmw xchg.		// replaced w/an atomicrmw xchg.
if (isSaturating(RMWI) &&		if (isSaturating(RMWI) &&
RMWI.getOperation() != AtomicRMWInst::Xchg) {		RMWI.getOperation() != AtomicRMWInst::Xchg) {
RMWI.setOperation(AtomicRMWInst::Xchg);		RMWI.setOperation(AtomicRMWInst::Xchg);
return &RMWI;		return &RMWI;
}		}

AtomicOrdering Ordering = RMWI.getOrdering();		AtomicOrdering Ordering = RMWI.getOrdering();
assert(Ordering != AtomicOrdering::NotAtomic &&		assert(Ordering != AtomicOrdering::NotAtomic &&
Ordering != AtomicOrdering::Unordered &&		Ordering != AtomicOrdering::Unordered &&
"AtomicRMWs don't make sense with Unordered or NotAtomic");		"AtomicRMWs don't make sense with Unordered or NotAtomic");

// Any atomicrmw xchg with no uses can be converted to an atomic store if the		// Any atomicrmw xchg with no uses can be converted to an atomic store if the
// ordering is compatible.		// ordering is compatible.
if (RMWI.getOperation() == AtomicRMWInst::Xchg &&		if (RMWI.getOperation() == AtomicRMWInst::Xchg &&
RMWI.use_empty()) {		RMWI.use_empty()) {
if (Ordering != AtomicOrdering::Release &&		if (Ordering != AtomicOrdering::Release &&
Ordering != AtomicOrdering::Monotonic)		Ordering != AtomicOrdering::Monotonic)
return nullptr;		return nullptr;
auto *SI = new StoreInst(RMWI.getValOperand(),		auto *SI = new StoreInst(RMWI.getValOperand(),
RMWI.getPointerOperand(), &RMWI);		RMWI.getPointerOperand(), &RMWI);
SI->setAtomic(Ordering, RMWI.getSyncScopeID());		SI->setAtomic(Ordering, RMWI.getSyncScopeID());
SI->setAlignment(DL.getABITypeAlign(RMWI.getType()));		SI->setAlignment(DL.getABITypeAlign(RMWI.getType()));
return eraseInstFromFunction(RMWI);		return eraseInstFromFunction(RMWI);
}		}

if (!isIdempotentRMW(RMWI))		if (!isIdempotentRMW(RMWI))
return nullptr;		return nullptr;

// We chose to canonicalize all idempotent operations to a single		// We chose to canonicalize all idempotent operations to a single
// operation code and constant. This makes it easier for the rest of the		// operation code and constant. This makes it easier for the rest of the
// optimizer to match easily. The choices of or w/0 and fadd w/-0.0 are		// optimizer to match easily. The choices of or w/0 and fadd w/-0.0 are
// arbitrary.		// arbitrary.
if (RMWI.getType()->isIntegerTy() &&		if (RMWI.getType()->isIntegerTy() &&
RMWI.getOperation() != AtomicRMWInst::Or) {		RMWI.getOperation() != AtomicRMWInst::Or) {
RMWI.setOperation(AtomicRMWInst::Or);		RMWI.setOperation(AtomicRMWInst::Or);
return replaceOperand(RMWI, 1, ConstantInt::get(RMWI.getType(), 0));		return replaceOperand(RMWI, 1, ConstantInt::get(RMWI.getType(), 0));
} else if (RMWI.getType()->isFloatingPointTy() &&		} else if (RMWI.getType()->isFloatingPointTy() &&
RMWI.getOperation() != AtomicRMWInst::FAdd) {		RMWI.getOperation() != AtomicRMWInst::FAdd) {
RMWI.setOperation(AtomicRMWInst::FAdd);		RMWI.setOperation(AtomicRMWInst::FAdd);
return replaceOperand(RMWI, 1, ConstantFP::getNegativeZero(RMWI.getType()));		return replaceOperand(RMWI, 1, ConstantFP::getNegativeZero(RMWI.getType()));
}		}

// Check if the required ordering is compatible with an atomic load.		// Check if the required ordering is compatible with an atomic load.
if (Ordering != AtomicOrdering::Acquire &&		if (Ordering != AtomicOrdering::Acquire &&
Ordering != AtomicOrdering::Monotonic)		Ordering != AtomicOrdering::Monotonic)
return nullptr;		return nullptr;

LoadInst *Load = new LoadInst(RMWI.getType(), RMWI.getPointerOperand(), "",		LoadInst *Load = new LoadInst(RMWI.getType(), RMWI.getPointerOperand(), "",
false, DL.getABITypeAlign(RMWI.getType()),		false, DL.getABITypeAlign(RMWI.getType()),
Ordering, RMWI.getSyncScopeID());		Ordering, RMWI.getSyncScopeID());
return Load;		return Load;
}		}
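
The two cases visitAtomicRMWInst handles can be modelled outside of LLVM as a rough sketch. An idempotent operation such as `or` with 0 leaves memory unchanged and behaves like a load. A saturating operation such as an unsigned max with the maximum value always leaves the same value in memory, so it behaves like an exchange. The helper names below are hypothetical, and the unsigned-max part is a plain-integer model because `std::atomic` has no unsigned-max operation before C++26.

```cpp
#include <atomic>
#include <cstdint>

// Idempotent case: 'or' with 0 writes back the value it read, so the
// observable result matches an atomic load of the same location.
inline bool orZeroActsLikeLoad(uint32_t Initial) {
  std::atomic<uint32_t> Cell{Initial};
  uint32_t Old = Cell.fetch_or(0u, std::memory_order_acquire);
  return Old == Initial && Cell.load(std::memory_order_relaxed) == Initial;
}

// Plain-integer model of the value an unsigned-max RMW stores.
inline uint32_t umaxStored(uint32_t Old, uint32_t Operand) {
  return Old > Operand ? Old : Operand;
}

// Saturating case: with an all-ones operand the stored value does not
// depend on the old contents, which is what allows the xchg rewrite.
inline bool umaxWithAllOnesIsExchange(uint32_t Old) {
  return umaxStored(Old, UINT32_MAX) == UINT32_MAX;
}
```

The sketch does not model the ordering restrictions: in the code above, the load rewrite is only done for Acquire or Monotonic orderings, and the store rewrite only for Release or Monotonic.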

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/Analysis/AssumeBundleQueries.h"		#include "llvm/Analysis/AssumeBundleQueries.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/Loads.h"		#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/MemoryBuiltins.h"		#include "llvm/Analysis/MemoryBuiltins.h"
		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Analysis/VectorUtils.h"		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalVariable.h"		#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsX86.h"
#include "llvm/IR/IntrinsicsARM.h"
#include "llvm/IR/IntrinsicsAArch64.h"		#include "llvm/IR/IntrinsicsAArch64.h"
#include "llvm/IR/IntrinsicsHexagon.h"
#include "llvm/IR/IntrinsicsNVPTX.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"		#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/IntrinsicsPowerPC.h"		#include "llvm/IR/IntrinsicsARM.h"
		#include "llvm/IR/IntrinsicsHexagon.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Statepoint.h"		#include "llvm/IR/Statepoint.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/User.h"		#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include "llvm/Support/AtomicOrdering.h"		#include "llvm/Support/AtomicOrdering.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/InstCombine/InstCombineWorklist.h"		#include "llvm/Transforms/InstCombine/InstCombineWorklist.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/SimplifyLibCalls.h"		#include "llvm/Transforms/Utils/SimplifyLibCalls.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <cstring>		#include <cstring>
#include <utility>		#include <utility>
#include <vector>		#include <vector>
static Type *getPromotedType(Type *Ty) {		static Type *getPromotedType(Type *Ty) {
if (IntegerType* ITy = dyn_cast<IntegerType>(Ty)) {		if (IntegerType* ITy = dyn_cast<IntegerType>(Ty)) {
if (ITy->getBitWidth() < 32)		if (ITy->getBitWidth() < 32)
return Type::getInt32Ty(Ty->getContext());		return Type::getInt32Ty(Ty->getContext());
}		}
return Ty;		return Ty;
}		}

/// Return a constant boolean vector that has true elements in all positions		Instruction *InstCombinerImpl::SimplifyAnyMemTransfer(AnyMemTransferInst *MI) {
/// where the input constant data vector has an element with the sign bit set.
static Constant *getNegativeIsTrueBoolVec(ConstantDataVector *V) {
SmallVector<Constant *, 32> BoolVec;
IntegerType *BoolTy = Type::getInt1Ty(V->getContext());
for (unsigned I = 0, E = V->getNumElements(); I != E; ++I) {
Constant *Elt = V->getElementAsConstant(I);
assert((isa<ConstantInt>(Elt) || isa<ConstantFP>(Elt)) &&
"Unexpected constant data vector element type");
bool Sign = V->getElementType()->isIntegerTy()
? cast<ConstantInt>(Elt)->isNegative()
: cast<ConstantFP>(Elt)->isNegative();
BoolVec.push_back(ConstantInt::get(BoolTy, Sign));
}
return ConstantVector::get(BoolVec);
}

Instruction *InstCombiner::SimplifyAnyMemTransfer(AnyMemTransferInst *MI) {
Align DstAlign = getKnownAlignment(MI->getRawDest(), DL, MI, &AC, &DT);		Align DstAlign = getKnownAlignment(MI->getRawDest(), DL, MI, &AC, &DT);
MaybeAlign CopyDstAlign = MI->getDestAlign();		MaybeAlign CopyDstAlign = MI->getDestAlign();
if (!CopyDstAlign || *CopyDstAlign < DstAlign) {		if (!CopyDstAlign || *CopyDstAlign < DstAlign) {
MI->setDestAlignment(DstAlign);		MI->setDestAlignment(DstAlign);
return MI;		return MI;
}		}

Align SrcAlign = getKnownAlignment(MI->getRawSource(), DL, MI, &AC, &DT);		Align SrcAlign = getKnownAlignment(MI->getRawSource(), DL, MI, &AC, &DT);
▲ Show 20 Lines • Show All 98 Lines • ▼ Show 20 Lines	if (isa<AtomicMemTransferInst>(MI)) {
S->setOrdering(AtomicOrdering::Unordered);		S->setOrdering(AtomicOrdering::Unordered);
}		}

// Set the size of the copy to 0, it will be deleted on the next iteration.		// Set the size of the copy to 0, it will be deleted on the next iteration.
MI->setLength(Constant::getNullValue(MemOpLength->getType()));		MI->setLength(Constant::getNullValue(MemOpLength->getType()));
return MI;		return MI;
}		}

Instruction *InstCombiner::SimplifyAnyMemSet(AnyMemSetInst *MI) {		Instruction *InstCombinerImpl::SimplifyAnyMemSet(AnyMemSetInst *MI) {
const Align KnownAlignment =		const Align KnownAlignment =
getKnownAlignment(MI->getDest(), DL, MI, &AC, &DT);		getKnownAlignment(MI->getDest(), DL, MI, &AC, &DT);
MaybeAlign MemSetAlign = MI->getDestAlign();		MaybeAlign MemSetAlign = MI->getDestAlign();
if (!MemSetAlign || *MemSetAlign < KnownAlignment) {		if (!MemSetAlign || *MemSetAlign < KnownAlignment) {
MI->setDestAlignment(KnownAlignment);		MI->setDestAlignment(KnownAlignment);
return MI;		return MI;
}		}

▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	if (Len <= 8 && isPowerOf2_32((uint32_t)Len)) {
// Set the size of the copy to 0, it will be deleted on the next iteration.		// Set the size of the copy to 0, it will be deleted on the next iteration.
MI->setLength(Constant::getNullValue(LenC->getType()));		MI->setLength(Constant::getNullValue(LenC->getType()));
return MI;		return MI;
}		}

return nullptr;		return nullptr;
}		}

static Value *simplifyX86immShift(const IntrinsicInst &II,
InstCombiner::BuilderTy &Builder) {
bool LogicalShift = false;
bool ShiftLeft = false;
bool IsImm = false;

switch (II.getIntrinsicID()) {
default: llvm_unreachable("Unexpected intrinsic!");
case Intrinsic::x86_sse2_psrai_d:
case Intrinsic::x86_sse2_psrai_w:
case Intrinsic::x86_avx2_psrai_d:
case Intrinsic::x86_avx2_psrai_w:
case Intrinsic::x86_avx512_psrai_q_128:
case Intrinsic::x86_avx512_psrai_q_256:
case Intrinsic::x86_avx512_psrai_d_512:
case Intrinsic::x86_avx512_psrai_q_512:
case Intrinsic::x86_avx512_psrai_w_512:
IsImm = true;
LLVM_FALLTHROUGH;
case Intrinsic::x86_sse2_psra_d:
case Intrinsic::x86_sse2_psra_w:
case Intrinsic::x86_avx2_psra_d:
case Intrinsic::x86_avx2_psra_w:
case Intrinsic::x86_avx512_psra_q_128:
case Intrinsic::x86_avx512_psra_q_256:
case Intrinsic::x86_avx512_psra_d_512:
case Intrinsic::x86_avx512_psra_q_512:
case Intrinsic::x86_avx512_psra_w_512:
LogicalShift = false;
ShiftLeft = false;
break;
case Intrinsic::x86_sse2_psrli_d:
case Intrinsic::x86_sse2_psrli_q:
case Intrinsic::x86_sse2_psrli_w:
case Intrinsic::x86_avx2_psrli_d:
case Intrinsic::x86_avx2_psrli_q:
case Intrinsic::x86_avx2_psrli_w:
case Intrinsic::x86_avx512_psrli_d_512:
case Intrinsic::x86_avx512_psrli_q_512:
case Intrinsic::x86_avx512_psrli_w_512:
IsImm = true;
LLVM_FALLTHROUGH;
case Intrinsic::x86_sse2_psrl_d:
case Intrinsic::x86_sse2_psrl_q:
case Intrinsic::x86_sse2_psrl_w:
case Intrinsic::x86_avx2_psrl_d:
case Intrinsic::x86_avx2_psrl_q:
case Intrinsic::x86_avx2_psrl_w:
case Intrinsic::x86_avx512_psrl_d_512:
case Intrinsic::x86_avx512_psrl_q_512:
case Intrinsic::x86_avx512_psrl_w_512:
LogicalShift = true;
ShiftLeft = false;
break;
case Intrinsic::x86_sse2_pslli_d:
case Intrinsic::x86_sse2_pslli_q:
case Intrinsic::x86_sse2_pslli_w:
case Intrinsic::x86_avx2_pslli_d:
case Intrinsic::x86_avx2_pslli_q:
case Intrinsic::x86_avx2_pslli_w:
case Intrinsic::x86_avx512_pslli_d_512:
case Intrinsic::x86_avx512_pslli_q_512:
case Intrinsic::x86_avx512_pslli_w_512:
IsImm = true;
LLVM_FALLTHROUGH;
case Intrinsic::x86_sse2_psll_d:
case Intrinsic::x86_sse2_psll_q:
case Intrinsic::x86_sse2_psll_w:
case Intrinsic::x86_avx2_psll_d:
case Intrinsic::x86_avx2_psll_q:
case Intrinsic::x86_avx2_psll_w:
case Intrinsic::x86_avx512_psll_d_512:
case Intrinsic::x86_avx512_psll_q_512:
case Intrinsic::x86_avx512_psll_w_512:
LogicalShift = true;
ShiftLeft = true;
break;
}
assert((LogicalShift || !ShiftLeft) && "Only logical shifts can shift left");

auto Vec = II.getArgOperand(0);
auto Amt = II.getArgOperand(1);
auto VT = cast<VectorType>(Vec->getType());
auto SVT = VT->getElementType();
auto AmtVT = Amt->getType();
unsigned VWidth = VT->getNumElements();
unsigned BitWidth = SVT->getPrimitiveSizeInBits();

// If the shift amount is guaranteed to be in-range we can replace it with a
// generic shift. If it's guaranteed to be out of range, logical shifts combine to
// zero and arithmetic shifts are clamped to (BitWidth - 1).
if (IsImm) {
assert(AmtVT ->isIntegerTy(32) &&
"Unexpected shift-by-immediate type");
KnownBits KnownAmtBits =
llvm::computeKnownBits(Amt, II.getModule()->getDataLayout());
if (KnownAmtBits.getMaxValue().ult(BitWidth)) {
Amt = Builder.CreateZExtOrTrunc(Amt, SVT);
Amt = Builder.CreateVectorSplat(VWidth, Amt);
return (LogicalShift ? (ShiftLeft ? Builder.CreateShl(Vec, Amt)
: Builder.CreateLShr(Vec, Amt))
: Builder.CreateAShr(Vec, Amt));
}
if (KnownAmtBits.getMinValue().uge(BitWidth)) {
if (LogicalShift)
return ConstantAggregateZero::get(VT);
Amt = ConstantInt::get(SVT, BitWidth - 1);
return Builder.CreateAShr(Vec, Builder.CreateVectorSplat(VWidth, Amt));
}
} else {
// Ensure the first element has an in-range value and the rest of the
// elements in the bottom 64 bits are zero.
assert(AmtVT->isVectorTy() && AmtVT->getPrimitiveSizeInBits() == 128 &&
cast<VectorType>(AmtVT)->getElementType() == SVT &&
"Unexpected shift-by-scalar type");
unsigned NumAmtElts = cast<VectorType>(AmtVT)->getNumElements();
APInt DemandedLower = APInt::getOneBitSet(NumAmtElts, 0);
APInt DemandedUpper = APInt::getBitsSet(NumAmtElts, 1, NumAmtElts / 2);
KnownBits KnownLowerBits = llvm::computeKnownBits(
Amt, DemandedLower, II.getModule()->getDataLayout());
KnownBits KnownUpperBits = llvm::computeKnownBits(
Amt, DemandedUpper, II.getModule()->getDataLayout());
if (KnownLowerBits.getMaxValue().ult(BitWidth) &&
(DemandedUpper.isNullValue() || KnownUpperBits.isZero())) {
SmallVector<int, 16> ZeroSplat(VWidth, 0);
Amt = Builder.CreateShuffleVector(Amt, Amt, ZeroSplat);
return (LogicalShift ? (ShiftLeft ? Builder.CreateShl(Vec, Amt)
: Builder.CreateLShr(Vec, Amt))
: Builder.CreateAShr(Vec, Amt));
}
}

// Simplify if count is constant vector.
auto CDV = dyn_cast<ConstantDataVector>(Amt);
if (!CDV)
return nullptr;

// SSE2/AVX2 uses all the first 64-bits of the 128-bit vector
// operand to compute the shift amount.
assert(AmtVT->isVectorTy() && AmtVT->getPrimitiveSizeInBits() == 128 &&
cast<VectorType>(AmtVT)->getElementType() == SVT &&
"Unexpected shift-by-scalar type");

// Concatenate the sub-elements to create the 64-bit value.
APInt Count(64, 0);
for (unsigned i = 0, NumSubElts = 64 / BitWidth; i != NumSubElts; ++i) {
unsigned SubEltIdx = (NumSubElts - 1) - i;
auto SubElt = cast<ConstantInt>(CDV->getElementAsConstant(SubEltIdx));
Count <<= BitWidth;
Count |= SubElt->getValue().zextOrTrunc(64);
}

// If shift-by-zero then just return the original value.
if (Count.isNullValue())
return Vec;

// Handle cases when Shift >= BitWidth.
if (Count.uge(BitWidth)) {
// If LogicalShift - just return zero.
if (LogicalShift)
return ConstantAggregateZero::get(VT);

// If ArithmeticShift - clamp Shift to (BitWidth - 1).
Count = APInt(64, BitWidth - 1);
}

// Get a constant vector of the same type as the first operand.
auto ShiftAmt = ConstantInt::get(SVT, Count.zextOrTrunc(BitWidth));
auto ShiftVec = Builder.CreateVectorSplat(VWidth, ShiftAmt);

if (ShiftLeft)
return Builder.CreateShl(Vec, ShiftVec);

if (LogicalShift)
return Builder.CreateLShr(Vec, ShiftVec);

return Builder.CreateAShr(Vec, ShiftVec);
}
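The constant-count fold above can be illustrated outside of LLVM. The following is a minimal standalone sketch, assuming plain `uint64_t` values in place of `APInt`; `concatLow64`, `foldShiftCount`, and `FoldedShift` are hypothetical names introduced only for this illustration and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the patch: concatenates the sub-elements
// that make up the low 64 bits of the shift-amount vector. Element 0 ends up
// in the least significant bits, mirroring the SubEltIdx loop in the patch.
uint64_t concatLow64(const std::vector<uint64_t> &AmtElts, unsigned BitWidth) {
  assert((BitWidth == 16 || BitWidth == 32 || BitWidth == 64) &&
         "this model only covers 16/32/64-bit elements");
  const unsigned NumSubElts = 64 / BitWidth;
  assert(AmtElts.size() >= NumSubElts && "need the full low 64 bits");
  const uint64_t EltMask = BitWidth == 64 ? ~0ULL : ((1ULL << BitWidth) - 1);
  uint64_t Count = 0;
  for (unsigned i = 0; i != NumSubElts; ++i) {
    const unsigned SubEltIdx = (NumSubElts - 1) - i;
    // Shifting a uint64_t by 64 is undefined in C++, so special-case it.
    Count = BitWidth == 64 ? 0 : (Count << BitWidth);
    Count |= AmtElts[SubEltIdx] & EltMask;
  }
  return Count;
}

// Outcome of folding a constant count: either the whole result is zero, or a
// generic shift by Amount (always < BitWidth) can be used.
struct FoldedShift {
  bool ResultIsZero;
  unsigned Amount;
};

FoldedShift foldShiftCount(uint64_t Count, unsigned BitWidth,
                           bool LogicalShift) {
  if (Count >= BitWidth) {
    if (LogicalShift)
      return {true, 0};            // Logical shifts produce zero.
    return {false, BitWidth - 1};  // Arithmetic shifts splat the sign bit.
  }
  return {false, static_cast<unsigned>(Count)};
}
```

Note that a nonzero value in the second 16-bit element already pushes the 64-bit count out of range for a 16-bit shift, which is why the fold has to look at all of the low 64 bits rather than element 0 alone.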

// Attempt to simplify AVX2 per-element shift intrinsics to a generic IR shift.
// Unlike the generic IR shifts, the intrinsics have defined behaviour for out
// of range shift amounts (logical - set to zero, arithmetic - splat sign bit).
static Value *simplifyX86varShift(const IntrinsicInst &II,
InstCombiner::BuilderTy &Builder) {
bool LogicalShift = false;
bool ShiftLeft = false;

switch (II.getIntrinsicID()) {
default: llvm_unreachable("Unexpected intrinsic!");
case Intrinsic::x86_avx2_psrav_d:
case Intrinsic::x86_avx2_psrav_d_256:
case Intrinsic::x86_avx512_psrav_q_128:
case Intrinsic::x86_avx512_psrav_q_256:
case Intrinsic::x86_avx512_psrav_d_512:
case Intrinsic::x86_avx512_psrav_q_512:
case Intrinsic::x86_avx512_psrav_w_128:
case Intrinsic::x86_avx512_psrav_w_256:
case Intrinsic::x86_avx512_psrav_w_512:
LogicalShift = false;
ShiftLeft = false;
break;
case Intrinsic::x86_avx2_psrlv_d:
case Intrinsic::x86_avx2_psrlv_d_256:
case Intrinsic::x86_avx2_psrlv_q:
case Intrinsic::x86_avx2_psrlv_q_256:
case Intrinsic::x86_avx512_psrlv_d_512:
case Intrinsic::x86_avx512_psrlv_q_512:
case Intrinsic::x86_avx512_psrlv_w_128:
case Intrinsic::x86_avx512_psrlv_w_256:
case Intrinsic::x86_avx512_psrlv_w_512:
LogicalShift = true;
ShiftLeft = false;
break;
case Intrinsic::x86_avx2_psllv_d:
case Intrinsic::x86_avx2_psllv_d_256:
case Intrinsic::x86_avx2_psllv_q:
case Intrinsic::x86_avx2_psllv_q_256:
case Intrinsic::x86_avx512_psllv_d_512:
case Intrinsic::x86_avx512_psllv_q_512:
case Intrinsic::x86_avx512_psllv_w_128:
case Intrinsic::x86_avx512_psllv_w_256:
case Intrinsic::x86_avx512_psllv_w_512:
LogicalShift = true;
ShiftLeft = true;
break;
}
assert((LogicalShift || !ShiftLeft) && "Only logical shifts can shift left");

auto Vec = II.getArgOperand(0);
auto Amt = II.getArgOperand(1);
auto VT = cast<VectorType>(II.getType());
auto SVT = VT->getElementType();
int NumElts = VT->getNumElements();
int BitWidth = SVT->getIntegerBitWidth();

// If the shift amount is guaranteed to be in-range we can replace it with a
// generic shift.
APInt UpperBits =
APInt::getHighBitsSet(BitWidth, BitWidth - Log2_32(BitWidth));
if (llvm::MaskedValueIsZero(Amt, UpperBits,
II.getModule()->getDataLayout())) {
return (LogicalShift ? (ShiftLeft ? Builder.CreateShl(Vec, Amt)
: Builder.CreateLShr(Vec, Amt))
: Builder.CreateAShr(Vec, Amt));
}

// Simplify if all shift amounts are constant/undef.
auto *CShift = dyn_cast<Constant>(Amt);
if (!CShift)
return nullptr;

// Collect each element's shift amount.
// We also collect special cases: UNDEF = -1, OUT-OF-RANGE = BitWidth.
bool AnyOutOfRange = false;
SmallVector<int, 8> ShiftAmts;
for (int I = 0; I < NumElts; ++I) {
auto *CElt = CShift->getAggregateElement(I);
if (CElt && isa<UndefValue>(CElt)) {
ShiftAmts.push_back(-1);
continue;
}

auto *COp = dyn_cast_or_null<ConstantInt>(CElt);
if (!COp)
return nullptr;

// Handle out of range shifts.
// If LogicalShift - set to BitWidth (special case).
// If ArithmeticShift - set to (BitWidth - 1) (sign splat).
APInt ShiftVal = COp->getValue();
if (ShiftVal.uge(BitWidth)) {
AnyOutOfRange = LogicalShift;
ShiftAmts.push_back(LogicalShift ? BitWidth : BitWidth - 1);
continue;
}

ShiftAmts.push_back((int)ShiftVal.getZExtValue());
}

// If all elements out of range or UNDEF, return vector of zeros/undefs.
// ArithmeticShift should only hit this if they are all UNDEF.
auto OutOfRange = [&](int Idx) { return (Idx < 0) || (BitWidth <= Idx); };
if (llvm::all_of(ShiftAmts, OutOfRange)) {
SmallVector<Constant *, 8> ConstantVec;
for (int Idx : ShiftAmts) {
if (Idx < 0) {
ConstantVec.push_back(UndefValue::get(SVT));
} else {
assert(LogicalShift && "Logical shift expected");
ConstantVec.push_back(ConstantInt::getNullValue(SVT));
}
}
return ConstantVector::get(ConstantVec);
}

// We can't handle only some out of range values with generic logical shifts.
if (AnyOutOfRange)
return nullptr;

// Build the shift amount constant vector.
SmallVector<Constant *, 8> ShiftVecAmts;
for (int Idx : ShiftAmts) {
if (Idx < 0)
ShiftVecAmts.push_back(UndefValue::get(SVT));
else
ShiftVecAmts.push_back(ConstantInt::get(SVT, Idx));
}
auto ShiftVec = ConstantVector::get(ShiftVecAmts);

if (ShiftLeft)
return Builder.CreateShl(Vec, ShiftVec);

if (LogicalShift)
return Builder.CreateLShr(Vec, ShiftVec);

return Builder.CreateAShr(Vec, ShiftVec);
}
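The out-of-range behaviour described in the comment before `simplifyX86varShift` (logical shifts give zero, arithmetic shifts splat the sign bit) can be modelled per lane. This is a rough reference sketch for a single 32-bit lane, written with plain integers rather than LLVM IR; the function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-lane reference models, not part of the patch.

// Variable logical shift left: any amount >= 32 yields zero.
uint32_t psllvLane(uint32_t V, uint32_t Amt) {
  return Amt >= 32 ? 0 : (V << Amt);
}

// Variable logical shift right: any amount >= 32 yields zero.
uint32_t psrlvLane(uint32_t V, uint32_t Amt) {
  return Amt >= 32 ? 0 : (V >> Amt);
}

// Variable arithmetic shift right, operating on the raw bit pattern.
// Out-of-range amounts are clamped to 31, which splats the sign bit.
uint32_t psravLane(uint32_t V, uint32_t Amt) {
  const uint32_t A = Amt >= 32 ? 31 : Amt;
  const bool Negative = (V >> 31) != 0;
  // For negative inputs, shift the complement and complement back so that
  // ones are shifted in from the top without relying on signed shifts.
  return Negative ? ~(~V >> A) : (V >> A);
}

// The in-range test in the patch checks that the upper
// (BitWidth - Log2(BitWidth)) bits of the amount are known zero. For 32-bit
// lanes that is every bit except the low five.
uint32_t upperBitsMask32() { return ~uint32_t(31); }
```

A generic IR shift is only defined for amounts below the bit width, so the model makes it visible why a mix of in-range and out-of-range constant amounts cannot be expressed as one generic logical shift, while arithmetic shifts can always be clamped.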

static Value *simplifyX86pack(IntrinsicInst &II,
InstCombiner::BuilderTy &Builder, bool IsSigned) {
Value *Arg0 = II.getArgOperand(0);
Value *Arg1 = II.getArgOperand(1);
Type *ResTy = II.getType();

// Fast all undef handling.
if (isa<UndefValue>(Arg0) && isa<UndefValue>(Arg1))
return UndefValue::get(ResTy);

auto *ArgTy = cast<VectorType>(Arg0->getType());
unsigned NumLanes = ResTy->getPrimitiveSizeInBits() / 128;
unsigned NumSrcElts = ArgTy->getNumElements();
assert(cast<VectorType>(ResTy)->getNumElements() == (2 * NumSrcElts) &&
"Unexpected packing types");

unsigned NumSrcEltsPerLane = NumSrcElts / NumLanes;
unsigned DstScalarSizeInBits = ResTy->getScalarSizeInBits();
unsigned SrcScalarSizeInBits = ArgTy->getScalarSizeInBits();
assert(SrcScalarSizeInBits == (2 * DstScalarSizeInBits) &&
"Unexpected packing types");

// Constant folding.
if (!isa<Constant>(Arg0) || !isa<Constant>(Arg1))
return nullptr;

// Clamp Values - signed/unsigned both use signed clamp values, but they
// differ on the min/max values.
APInt MinValue, MaxValue;
if (IsSigned) {
// PACKSS: Truncate signed value with signed saturation.
// Source values less than dst minint are saturated to minint.
// Source values greater than dst maxint are saturated to maxint.
MinValue =
APInt::getSignedMinValue(DstScalarSizeInBits).sext(SrcScalarSizeInBits);
MaxValue =
APInt::getSignedMaxValue(DstScalarSizeInBits).sext(SrcScalarSizeInBits);
} else {
// PACKUS: Truncate signed value with unsigned saturation.
// Source values less than zero are saturated to zero.
// Source values greater than dst maxuint are saturated to maxuint.
MinValue = APInt::getNullValue(SrcScalarSizeInBits);
MaxValue = APInt::getLowBitsSet(SrcScalarSizeInBits, DstScalarSizeInBits);
}

auto *MinC = Constant::getIntegerValue(ArgTy, MinValue);
auto *MaxC = Constant::getIntegerValue(ArgTy, MaxValue);
Arg0 = Builder.CreateSelect(Builder.CreateICmpSLT(Arg0, MinC), MinC, Arg0);
Arg1 = Builder.CreateSelect(Builder.CreateICmpSLT(Arg1, MinC), MinC, Arg1);
Arg0 = Builder.CreateSelect(Builder.CreateICmpSGT(Arg0, MaxC), MaxC, Arg0);
Arg1 = Builder.CreateSelect(Builder.CreateICmpSGT(Arg1, MaxC), MaxC, Arg1);

// Shuffle clamped args together at the lane level.
SmallVector<int, 32> PackMask;
for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
for (unsigned Elt = 0; Elt != NumSrcEltsPerLane; ++Elt)
PackMask.push_back(Elt + (Lane * NumSrcEltsPerLane));
for (unsigned Elt = 0; Elt != NumSrcEltsPerLane; ++Elt)
PackMask.push_back(Elt + (Lane * NumSrcEltsPerLane) + NumSrcElts);
}
auto *Shuffle = Builder.CreateShuffleVector(Arg0, Arg1, PackMask);

// Truncate to dst size.
return Builder.CreateTrunc(Shuffle, ResTy);
}
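The lane-level interleaving in `simplifyX86pack` is easier to follow with concrete numbers. The sketch below rebuilds the same `PackMask` with standard containers and models the two saturation modes for a 16-bit to 8-bit pack; `buildPackMask`, `packssLane`, and `packusLane` are hypothetical names used only here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring the PackMask loop in the patch: within each
// 128-bit lane, the lane's elements from Arg0 come first, followed by the
// lane's elements from Arg1 (offset by NumSrcElts in shuffle index space).
std::vector<int> buildPackMask(unsigned NumLanes, unsigned NumSrcElts) {
  assert(NumLanes != 0 && NumSrcElts % NumLanes == 0);
  const unsigned NumSrcEltsPerLane = NumSrcElts / NumLanes;
  std::vector<int> PackMask;
  for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
    for (unsigned Elt = 0; Elt != NumSrcEltsPerLane; ++Elt)
      PackMask.push_back(Elt + (Lane * NumSrcEltsPerLane));
    for (unsigned Elt = 0; Elt != NumSrcEltsPerLane; ++Elt)
      PackMask.push_back(Elt + (Lane * NumSrcEltsPerLane) + NumSrcElts);
  }
  return PackMask;
}

// PACKSS-style signed saturation of one i16 element to i8.
int8_t packssLane(int16_t V) {
  return static_cast<int8_t>(std::min(127, std::max(-128, int(V))));
}

// PACKUS-style unsigned saturation of one (signed) i16 element to u8.
uint8_t packusLane(int16_t V) {
  return static_cast<uint8_t>(std::min(255, std::max(0, int(V))));
}
```

For a 256-bit pack of two `<16 x i16>` inputs (two lanes), the mask starts with indices 0..7 from the first argument, then 16..23 from the second, and only then moves on to the upper lane.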

static Value *simplifyX86movmsk(const IntrinsicInst &II,
InstCombiner::BuilderTy &Builder) {
Value *Arg = II.getArgOperand(0);
Type *ResTy = II.getType();

// movmsk(undef) -> zero as we must ensure the upper bits are zero.
if (isa<UndefValue>(Arg))
return Constant::getNullValue(ResTy);

auto *ArgTy = dyn_cast<VectorType>(Arg->getType());
// We can't easily peek through x86_mmx types.
if (!ArgTy)
return nullptr;

// Expand MOVMSK to compare/bitcast/zext:
// e.g. PMOVMSKB(v16i8 x):
// %cmp = icmp slt <16 x i8> %x, zeroinitializer
// %int = bitcast <16 x i1> %cmp to i16
// %res = zext i16 %int to i32
unsigned NumElts = ArgTy->getNumElements();
Type *IntegerVecTy = VectorType::getInteger(ArgTy);
Type *IntegerTy = Builder.getIntNTy(NumElts);

Value *Res = Builder.CreateBitCast(Arg, IntegerVecTy);
Res = Builder.CreateICmpSLT(Res, Constant::getNullValue(IntegerVecTy));
Res = Builder.CreateBitCast(Res, IntegerTy);
Res = Builder.CreateZExtOrTrunc(Res, ResTy);
return Res;
}
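The compare/bitcast/zext expansion of MOVMSK amounts to collecting one sign bit per element. A small standalone model, assuming byte elements as in the PMOVMSKB example from the comment (`movmskModel` is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model, not part of the patch: bit i of the result is
// set when element i is negative, i.e. when its sign bit is set. This matches
// "icmp slt x, 0" followed by a bitcast of the <N x i1> result to an integer.
uint32_t movmskModel(const std::vector<int8_t> &X) {
  assert(X.size() <= 32 && "result is modelled as a 32-bit integer");
  uint32_t Res = 0;
  for (std::size_t I = 0; I != X.size(); ++I)
    if (X[I] < 0)
      Res |= (1u << I);
  return Res; // Upper bits stay zero, like the zext in the expansion.
}
```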

static Value *simplifyX86addcarry(const IntrinsicInst &II,
InstCombiner::BuilderTy &Builder) {
Value *CarryIn = II.getArgOperand(0);
Value *Op1 = II.getArgOperand(1);
Value *Op2 = II.getArgOperand(2);
Type *RetTy = II.getType();
Type *OpTy = Op1->getType();
assert(RetTy->getStructElementType(0)->isIntegerTy(8) &&
RetTy->getStructElementType(1) == OpTy && OpTy == Op2->getType() &&
"Unexpected types for x86 addcarry");

// If carry-in is zero, this is just an unsigned add with overflow.
if (match(CarryIn, m_ZeroInt())) {
Value *UAdd = Builder.CreateIntrinsic(Intrinsic::uadd_with_overflow, OpTy,
{ Op1, Op2 });
// The types have to be adjusted to match the x86 call types.
Value *UAddResult = Builder.CreateExtractValue(UAdd, 0);
Value *UAddOV = Builder.CreateZExt(Builder.CreateExtractValue(UAdd, 1),
Builder.getInt8Ty());
Value *Res = UndefValue::get(RetTy);
Res = Builder.CreateInsertValue(Res, UAddOV, 0);
return Builder.CreateInsertValue(Res, UAddResult, 1);
}

return nullptr;
}
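The zero carry-in fold relies on add-with-carry and unsigned add-with-overflow agreeing whenever no carry comes in. A hedged sketch with 32-bit operands, using plain integers instead of the intrinsics (`addcarry32`, `uaddWithOverflow32`, and `AddCarryResult` are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>

// Field order follows the x86 intrinsic's result struct as asserted in the
// patch: an i8 carry-out first, then the sum.
struct AddCarryResult {
  uint8_t CarryOut;
  uint32_t Sum;
};

// Hypothetical model of a 32-bit add-with-carry: any nonzero carry-in is
// treated as a carry of one.
AddCarryResult addcarry32(uint8_t CarryIn, uint32_t A, uint32_t B) {
  const uint64_t Wide = uint64_t(A) + uint64_t(B) + (CarryIn != 0 ? 1 : 0);
  return {static_cast<uint8_t>(Wide >> 32), static_cast<uint32_t>(Wide)};
}

// Hypothetical model of an unsigned add with overflow, with the overflow bit
// widened to 8 bits to match the x86 result type.
AddCarryResult uaddWithOverflow32(uint32_t A, uint32_t B) {
  const uint32_t Sum = A + B; // Wraps modulo 2^32.
  return {static_cast<uint8_t>(Sum < A), Sum};
}
```

With a carry-in of zero the two models return identical pairs; with a nonzero carry-in they can differ, which is why the patch returns `nullptr` in that case.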

static Value *simplifyX86insertps(const IntrinsicInst &II,
InstCombiner::BuilderTy &Builder) {
auto *CInt = dyn_cast<ConstantInt>(II.getArgOperand(2));
if (!CInt)
return nullptr;

VectorType *VecTy = cast<VectorType>(II.getType());
assert(VecTy->getNumElements() == 4 && "insertps with wrong vector type");

// The immediate permute control byte looks like this:
// [3:0] - zero mask for each 32-bit lane
// [5:4] - select one 32-bit destination lane
// [7:6] - select one 32-bit source lane

uint8_t Imm = CInt->getZExtValue();
uint8_t ZMask = Imm & 0xf;
uint8_t DestLane = (Imm >> 4) & 0x3;
uint8_t SourceLane = (Imm >> 6) & 0x3;

ConstantAggregateZero *ZeroVector = ConstantAggregateZero::get(VecTy);

// If all zero mask bits are set, this was just a weird way to
// generate a zero vector.
if (ZMask == 0xf)
return ZeroVector;

// Initialize by passing all of the first source bits through.
int ShuffleMask[4] = {0, 1, 2, 3};

// We may replace the second operand with the zero vector.
Value *V1 = II.getArgOperand(1);

if (ZMask) {
// If the zero mask is being used with a single input or the zero mask
// overrides the destination lane, this is a shuffle with the zero vector.
if ((II.getArgOperand(0) == II.getArgOperand(1)) ||
(ZMask & (1 << DestLane))) {
V1 = ZeroVector;
// We may still move 32-bits of the first source vector from one lane
// to another.
ShuffleMask[DestLane] = SourceLane;
// The zero mask may override the previous insert operation.
for (unsigned i = 0; i < 4; ++i)
if ((ZMask >> i) & 0x1)
ShuffleMask[i] = i + 4;
} else {
// TODO: Model this case as 2 shuffles or a 'logical and' plus shuffle?
return nullptr;
}
} else {
// Replace the selected destination lane with the selected source lane.
ShuffleMask[DestLane] = SourceLane + 4;
}

return Builder.CreateShuffleVector(II.getArgOperand(0), V1, ShuffleMask);
}
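The INSERTPS immediate decoding can be tried out in isolation. The following sketch reproduces the control flow of `simplifyX86insertps` on plain integers, returning the shuffle mask instead of building IR; `InsertPSFold` and `decodeInsertPS` are hypothetical names, and the `SameOperands` flag stands in for the operand identity check in the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical result record, not part of the patch.
struct InsertPSFold {
  bool Folded = true;           // false models the "return nullptr" path
  bool AllZero = false;         // true models returning the zero vector
  bool SecondIsZeroVec = false; // second shuffle operand replaced by zero
  std::array<int, 4> Mask = {{0, 1, 2, 3}};
};

InsertPSFold decodeInsertPS(uint8_t Imm, bool SameOperands) {
  const uint8_t ZMask = Imm & 0xf;            // [3:0] zero mask
  const uint8_t DestLane = (Imm >> 4) & 0x3;  // [5:4] destination lane
  const uint8_t SourceLane = (Imm >> 6) & 0x3; // [7:6] source lane

  InsertPSFold R;
  if (ZMask == 0xf) {
    R.AllZero = true;
    return R;
  }
  if (ZMask) {
    if (SameOperands || (ZMask & (1 << DestLane))) {
      R.SecondIsZeroVec = true;
      // The insert may still move a lane within the first source.
      R.Mask[DestLane] = SourceLane;
      // Zeroed lanes select from the zero vector (indices 4..7).
      for (int I = 0; I < 4; ++I)
        if ((ZMask >> I) & 0x1)
          R.Mask[I] = I + 4;
    } else {
      R.Folded = false;
    }
  } else {
    // Plain insert: take SourceLane from the second operand.
    R.Mask[DestLane] = SourceLane + 4;
  }
  return R;
}
```

For example, an immediate of `0x10` (source lane 0, destination lane 1, no zeroing) decodes to the mask `{0, 4, 2, 3}`.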

/// Attempt to simplify SSE4A EXTRQ/EXTRQI instructions using constant folding
/// or conversion to a shuffle vector.
static Value *simplifyX86extrq(IntrinsicInst &II, Value *Op0,
ConstantInt *CILength, ConstantInt *CIIndex,
InstCombiner::BuilderTy &Builder) {
auto LowConstantHighUndef = [&](uint64_t Val) {
Type *IntTy64 = Type::getInt64Ty(II.getContext());
Constant *Args[] = {ConstantInt::get(IntTy64, Val),
UndefValue::get(IntTy64)};
return ConstantVector::get(Args);
};

// See if we're dealing with constant values.
Constant *C0 = dyn_cast<Constant>(Op0);
ConstantInt *CI0 =
C0 ? dyn_cast_or_null<ConstantInt>(C0->getAggregateElement((unsigned)0))
: nullptr;

// Attempt to constant fold.
if (CILength && CIIndex) {
// From AMD documentation: "The bit index and field length are each six
// bits in length other bits of the field are ignored."
APInt APIndex = CIIndex->getValue().zextOrTrunc(6);
APInt APLength = CILength->getValue().zextOrTrunc(6);

unsigned Index = APIndex.getZExtValue();

// From AMD documentation: "a value of zero in the field length is
// defined as length of 64".
unsigned Length = APLength == 0 ? 64 : APLength.getZExtValue();

// From AMD documentation: "If the sum of the bit index + length field
// is greater than 64, the results are undefined".
unsigned End = Index + Length;

// Note that both field index and field length are 8-bit quantities.
// Since variables 'Index' and 'Length' are unsigned values
// obtained from zero-extending field index and field length
// respectively, their sum should never wrap around.
if (End > 64)
return UndefValue::get(II.getType());

// If we are extracting whole bytes, we can convert this to a shuffle.
// Lowering can recognize EXTRQI shuffle masks.
if ((Length % 8) == 0 && (Index % 8) == 0) {
// Convert bit indices to byte indices.
Length /= 8;
Index /= 8;

Type *IntTy8 = Type::getInt8Ty(II.getContext());
auto *ShufTy = FixedVectorType::get(IntTy8, 16);

SmallVector<int, 16> ShuffleMask;
for (int i = 0; i != (int)Length; ++i)
ShuffleMask.push_back(i + Index);
for (int i = Length; i != 8; ++i)
ShuffleMask.push_back(i + 16);
for (int i = 8; i != 16; ++i)
ShuffleMask.push_back(-1);

Value *SV = Builder.CreateShuffleVector(
Builder.CreateBitCast(Op0, ShufTy),
ConstantAggregateZero::get(ShufTy), ShuffleMask);
return Builder.CreateBitCast(SV, II.getType());
}

// Constant Fold - shift Index'th bit to lowest position and mask off
// Length bits.
if (CI0) {
APInt Elt = CI0->getValue();
Elt.lshrInPlace(Index);
Elt = Elt.zextOrTrunc(Length);
return LowConstantHighUndef(Elt.getZExtValue());
}

// If we were an EXTRQ call, we'll save registers if we convert to EXTRQI.
if (II.getIntrinsicID() == Intrinsic::x86_sse4a_extrq) {
Value *Args[] = {Op0, CILength, CIIndex};
Module *M = II.getModule();
Function *F = Intrinsic::getDeclaration(M, Intrinsic::x86_sse4a_extrqi);
return Builder.CreateCall(F, Args);
}
}

// Constant Fold - extraction from zero is always {zero, undef}.
if (CI0 && CI0->isZero())
return LowConstantHighUndef(0);

return nullptr;
}
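The EXTRQ constant fold is a bit-field extract with the field rules quoted from the AMD documentation. Here is a minimal sketch on a `uint64_t`, assuming only the low 64-bit element matters as in the patch; `extrqFold` is a hypothetical name, and a `false` return models the "results are undefined" case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not part of the patch. Returns false when the result
// is undefined (index + length > 64); otherwise writes the extracted field
// to Out.
bool extrqFold(uint64_t Src, unsigned LengthField, unsigned IndexField,
               uint64_t &Out) {
  // Only the low six bits of each field are used.
  const unsigned Index = IndexField & 0x3f;
  unsigned Length = LengthField & 0x3f;
  // A length field of zero means a length of 64.
  if (Length == 0)
    Length = 64;
  if (Index + Length > 64)
    return false;
  // Shift the Index'th bit down to bit 0, then keep Length bits.
  Out = Src >> Index; // Index is at most 63, so the shift is well defined.
  if (Length < 64)
    Out &= (1ULL << Length) - 1;
  return true;
}
```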

/// Attempt to simplify SSE4A INSERTQ/INSERTQI instructions using constant
/// folding or conversion to a shuffle vector.
static Value *simplifyX86insertq(IntrinsicInst &II, Value *Op0, Value *Op1,
APInt APLength, APInt APIndex,
InstCombiner::BuilderTy &Builder) {
// From AMD documentation: "The bit index and field length are each six bits
// in length other bits of the field are ignored."
APIndex = APIndex.zextOrTrunc(6);
APLength = APLength.zextOrTrunc(6);

// Attempt to constant fold.
unsigned Index = APIndex.getZExtValue();

// From AMD documentation: "a value of zero in the field length is
// defined as length of 64".
unsigned Length = APLength == 0 ? 64 : APLength.getZExtValue();

// From AMD documentation: "If the sum of the bit index + length field
// is greater than 64, the results are undefined".
unsigned End = Index + Length;

// Note that both field index and field length are 8-bit quantities.
// Since variables 'Index' and 'Length' are unsigned values
// obtained from zero-extending field index and field length
// respectively, their sum should never wrap around.
if (End > 64)
return UndefValue::get(II.getType());

// If we are inserting whole bytes, we can convert this to a shuffle.
// Lowering can recognize INSERTQI shuffle masks.
if ((Length % 8) == 0 && (Index % 8) == 0) {
// Convert bit indices to byte indices.
Length /= 8;
Index /= 8;

Type *IntTy8 = Type::getInt8Ty(II.getContext());
auto *ShufTy = FixedVectorType::get(IntTy8, 16);

SmallVector<int, 16> ShuffleMask;
for (int i = 0; i != (int)Index; ++i)
ShuffleMask.push_back(i);
for (int i = 0; i != (int)Length; ++i)
ShuffleMask.push_back(i + 16);
for (int i = Index + Length; i != 8; ++i)
ShuffleMask.push_back(i);
for (int i = 8; i != 16; ++i)
ShuffleMask.push_back(-1);

Value *SV = Builder.CreateShuffleVector(Builder.CreateBitCast(Op0, ShufTy),
Builder.CreateBitCast(Op1, ShufTy),
ShuffleMask);
return Builder.CreateBitCast(SV, II.getType());
}

// See if we're dealing with constant values.
Constant *C0 = dyn_cast<Constant>(Op0);
Constant *C1 = dyn_cast<Constant>(Op1);
ConstantInt *CI00 =
C0 ? dyn_cast_or_null<ConstantInt>(C0->getAggregateElement((unsigned)0))
: nullptr;
ConstantInt *CI10 =
C1 ? dyn_cast_or_null<ConstantInt>(C1->getAggregateElement((unsigned)0))
: nullptr;

// Constant Fold - insert bottom Length bits starting at the Index'th bit.
if (CI00 && CI10) {
APInt V00 = CI00->getValue();
APInt V10 = CI10->getValue();
APInt Mask = APInt::getLowBitsSet(64, Length).shl(Index);
V00 = V00 & ~Mask;
V10 = V10.zextOrTrunc(Length).zextOrTrunc(64).shl(Index);
APInt Val = V00 | V10;
Type *IntTy64 = Type::getInt64Ty(II.getContext());
Constant *Args[] = {ConstantInt::get(IntTy64, Val.getZExtValue()),
UndefValue::get(IntTy64)};
return ConstantVector::get(Args);
}

// If we were an INSERTQ call, we'll save demanded elements if we convert to
// INSERTQI.
if (II.getIntrinsicID() == Intrinsic::x86_sse4a_insertq) {
Type *IntTy8 = Type::getInt8Ty(II.getContext());
Constant *CILength = ConstantInt::get(IntTy8, Length, false);
Constant *CIIndex = ConstantInt::get(IntTy8, Index, false);

Value *Args[] = {Op0, Op1, CILength, CIIndex};
Module *M = II.getModule();
Function *F = Intrinsic::getDeclaration(M, Intrinsic::x86_sse4a_insertqi);
return Builder.CreateCall(F, Args);
}

return nullptr;
}
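The INSERTQ constant fold clears a field in the destination and ORs in the low bits of the source. A standalone sketch on `uint64_t` values follows, under the same assumptions as the patch (only the low 64-bit elements participate); `insertqFold` is a hypothetical name, and `false` models the undefined case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not part of the patch. Inserts the low Length bits of
// Src into Dst starting at bit Index. Returns false when index + length > 64.
bool insertqFold(uint64_t Dst, uint64_t Src, unsigned LengthField,
                 unsigned IndexField, uint64_t &Out) {
  const unsigned Index = IndexField & 0x3f;
  unsigned Length = LengthField & 0x3f;
  if (Length == 0)
    Length = 64; // A zero length field means 64 bits.
  if (Index + Length > 64)
    return false;
  // Low Length bits set; avoid the undefined 64-bit shift when Length == 64.
  const uint64_t Low = Length == 64 ? ~0ULL : ((1ULL << Length) - 1);
  const uint64_t Mask = Low << Index;
  Out = (Dst & ~Mask) | ((Src & Low) << Index);
  return true;
}
```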

/// Attempt to convert pshufb* to shufflevector if the mask is constant.
static Value *simplifyX86pshufb(const IntrinsicInst &II,
InstCombiner::BuilderTy &Builder) {
Constant *V = dyn_cast<Constant>(II.getArgOperand(1));
if (!V)
return nullptr;

auto *VecTy = cast<VectorType>(II.getType());
unsigned NumElts = VecTy->getNumElements();
assert((NumElts == 16 || NumElts == 32 || NumElts == 64) &&
"Unexpected number of elements in shuffle mask!");

// Construct a shuffle mask from constant integers or UNDEFs.
int Indexes[64];

// Each byte in the shuffle control mask forms an index to permute the
// corresponding byte in the destination operand.
for (unsigned I = 0; I < NumElts; ++I) {
Constant *COp = V->getAggregateElement(I);
if (!COp || (!isa<UndefValue>(COp) && !isa<ConstantInt>(COp)))
return nullptr;

if (isa<UndefValue>(COp)) {
Indexes[I] = -1;
continue;
}

int8_t Index = cast<ConstantInt>(COp)->getValue().getZExtValue();

// If the most significant bit (bit[7]) of each byte of the shuffle
// control mask is set, then zero is written in the result byte.
// The zero vector is in the right-hand side of the resulting
// shufflevector.

// The value of each index for the high 128-bit lane is the least
// significant 4 bits of the respective shuffle control byte.
Index = ((Index < 0) ? NumElts : Index & 0x0F) + (I & 0xF0);
Indexes[I] = Index;
}

auto V1 = II.getArgOperand(0);
auto V2 = Constant::getNullValue(VecTy);
return Builder.CreateShuffleVector(V1, V2, makeArrayRef(Indexes, NumElts));
}
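The PSHUFB control-byte conversion can be checked with a short model. In this sketch the control vector is a `std::vector<int>` where `-1` stands for an undef element and any other value is a control byte in the range 0..255; `pshufbToShuffleMask` is a hypothetical name. Shuffle indices at or above the element count select from the all-zero second operand, as in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the mask construction, not part of the patch.
std::vector<int> pshufbToShuffleMask(const std::vector<int> &Control) {
  const std::size_t NumElts = Control.size();
  assert((NumElts == 16 || NumElts == 32 || NumElts == 64) &&
         "Unexpected number of elements in shuffle mask!");
  std::vector<int> Indexes(NumElts);
  for (std::size_t I = 0; I != NumElts; ++I) {
    if (Control[I] < 0) { // Undef control byte.
      Indexes[I] = -1;
      continue;
    }
    const uint8_t Byte = static_cast<uint8_t>(Control[I]);
    // Bit 7 set: the result byte is zero, so pick from the zero vector.
    // Otherwise the low four bits index within the current 128-bit lane.
    const int Base = (Byte & 0x80) ? static_cast<int>(NumElts) : (Byte & 0x0F);
    // (I & 0xF0) is the offset of the 128-bit lane that element I lives in.
    Indexes[I] = Base + static_cast<int>(I & 0xF0);
  }
  return Indexes;
}
```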

/// Attempt to convert vpermilvar* to shufflevector if the mask is constant.
static Value *simplifyX86vpermilvar(const IntrinsicInst &II,
InstCombiner::BuilderTy &Builder) {
Constant *V = dyn_cast<Constant>(II.getArgOperand(1));
if (!V)
return nullptr;

auto *VecTy = cast<VectorType>(II.getType());
unsigned NumElts = VecTy->getNumElements();
bool IsPD = VecTy->getScalarType()->isDoubleTy();
unsigned NumLaneElts = IsPD ? 2 : 4;
assert(NumElts == 16 || NumElts == 8 || NumElts == 4 || NumElts == 2);

// Construct a shuffle mask from constant integers or UNDEFs.
int Indexes[16];

// The intrinsics only read one or two bits, clear the rest.
for (unsigned I = 0; I < NumElts; ++I) {
Constant *COp = V->getAggregateElement(I);
if (!COp || (!isa<UndefValue>(COp) && !isa<ConstantInt>(COp)))
return nullptr;

if (isa<UndefValue>(COp)) {
Indexes[I] = -1;
continue;
}

APInt Index = cast<ConstantInt>(COp)->getValue();
Index = Index.zextOrTrunc(32).getLoBits(2);

// The PD variants use bit 1 to select the per-lane element index, so
// shift down to convert to a generic shuffle mask index.
if (IsPD)
Index.lshrInPlace(1);

// The wider variants are a bit trickier since the mask bits always index
// into the corresponding 128-bit lane. In order to convert to a generic
// shuffle, we have to make that explicit.
Index += APInt(32, (I / NumLaneElts) * NumLaneElts);

Indexes[I] = Index.getZExtValue();
}

auto V1 = II.getArgOperand(0);
auto V2 = UndefValue::get(V1->getType());
return Builder.CreateShuffleVector(V1, V2, makeArrayRef(Indexes, NumElts));
}
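The VPERMILVAR index arithmetic (two control bits, a shift for the PD forms, and a per-lane base) can be modelled with plain integers. In this sketch `-1` in the control vector stands for undef and `vpermilvarToShuffleMask` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the mask construction, not part of the patch.
std::vector<int> vpermilvarToShuffleMask(const std::vector<int> &Control,
                                         bool IsPD) {
  const std::size_t NumElts = Control.size();
  const unsigned NumLaneElts = IsPD ? 2 : 4;
  std::vector<int> Indexes(NumElts);
  for (std::size_t I = 0; I != NumElts; ++I) {
    if (Control[I] < 0) { // Undef control element.
      Indexes[I] = -1;
      continue;
    }
    // Only the low two bits are read.
    unsigned Index = static_cast<unsigned>(Control[I]) & 0x3;
    // The PD forms select with bit 1, so shift it down to bit 0.
    if (IsPD)
      Index >>= 1;
    // Make the "index within my own 128-bit lane" rule explicit.
    Index += static_cast<unsigned>(I / NumLaneElts) * NumLaneElts;
    Indexes[I] = static_cast<int>(Index);
  }
  return Indexes;
}
```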

/// Attempt to convert vpermd/vpermps to shufflevector if the mask is constant.
static Value *simplifyX86vpermv(const IntrinsicInst &II,
InstCombiner::BuilderTy &Builder) {
auto *V = dyn_cast<Constant>(II.getArgOperand(1));
if (!V)
return nullptr;

auto *VecTy = cast<VectorType>(II.getType());
unsigned Size = VecTy->getNumElements();
assert((Size == 4 || Size == 8 || Size == 16 || Size == 32 || Size == 64) &&
"Unexpected shuffle mask size");

// Construct a shuffle mask from constant integers or UNDEFs.
int Indexes[64];

for (unsigned I = 0; I < Size; ++I) {
Constant *COp = V->getAggregateElement(I);
if (!COp || (!isa<UndefValue>(COp) && !isa<ConstantInt>(COp)))
return nullptr;

if (isa<UndefValue>(COp)) {
Indexes[I] = -1;
continue;
}

uint32_t Index = cast<ConstantInt>(COp)->getZExtValue();
Index &= Size - 1;
Indexes[I] = Index;
}

auto V1 = II.getArgOperand(0);
auto V2 = UndefValue::get(VecTy);
return Builder.CreateShuffleVector(V1, V2, makeArrayRef(Indexes, Size));
}
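For the cross-lane VPERM* forms the conversion is simpler: each constant index is reduced modulo the (power-of-two) element count. A brief sketch, with `-1` standing for undef and `vpermvToShuffleMask` as a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the mask construction, not part of the patch.
std::vector<int> vpermvToShuffleMask(const std::vector<int> &Control) {
  const std::size_t Size = Control.size();
  assert(Size != 0 && (Size & (Size - 1)) == 0 &&
         "element count must be a power of two");
  std::vector<int> Indexes(Size);
  for (std::size_t I = 0; I != Size; ++I) {
    if (Control[I] < 0) { // Undef control element.
      Indexes[I] = -1;
      continue;
    }
    // Bits above log2(Size) are ignored, matching "Index &= Size - 1".
    const uint32_t Index =
        static_cast<uint32_t>(Control[I]) & static_cast<uint32_t>(Size - 1);
    Indexes[I] = static_cast<int>(Index);
  }
  return Indexes;
}
```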

// TODO, Obvious Missing Transforms:		// TODO, Obvious Missing Transforms:
// * Narrow width by halves excluding zero/undef lanes		// * Narrow width by halves excluding zero/undef lanes
Value *InstCombiner::simplifyMaskedLoad(IntrinsicInst &II) {		Value *InstCombinerImpl::simplifyMaskedLoad(IntrinsicInst &II) {
Value *LoadPtr = II.getArgOperand(0);		Value *LoadPtr = II.getArgOperand(0);
const Align Alignment =		const Align Alignment =
cast<ConstantInt>(II.getArgOperand(1))->getAlignValue();		cast<ConstantInt>(II.getArgOperand(1))->getAlignValue();

// If the mask is all ones or undefs, this is a plain vector load of the 1st		// If the mask is all ones or undefs, this is a plain vector load of the 1st
// argument.		// argument.
if (maskIsAllOneOrUndef(II.getArgOperand(2)))		if (maskIsAllOneOrUndef(II.getArgOperand(2)))
return Builder.CreateAlignedLoad(II.getType(), LoadPtr, Alignment,		return Builder.CreateAlignedLoad(II.getType(), LoadPtr, Alignment,
// [10 lines not shown in this view, inside InstCombinerImpl::simplifyMaskedLoad]
}		}

return nullptr;		return nullptr;
}		}

// TODO, Obvious Missing Transforms:		// TODO, Obvious Missing Transforms:
// * Single constant active lane -> store		// * Single constant active lane -> store
// * Narrow width by halves excluding zero/undef lanes		// * Narrow width by halves excluding zero/undef lanes
Instruction *InstCombiner::simplifyMaskedStore(IntrinsicInst &II) {		Instruction *InstCombinerImpl::simplifyMaskedStore(IntrinsicInst &II) {
auto *ConstMask = dyn_cast<Constant>(II.getArgOperand(3));		auto *ConstMask = dyn_cast<Constant>(II.getArgOperand(3));
if (!ConstMask)		if (!ConstMask)
return nullptr;		return nullptr;

// If the mask is all zeros, this instruction does nothing.		// If the mask is all zeros, this instruction does nothing.
if (ConstMask->isNullValue())		if (ConstMask->isNullValue())
return eraseInstFromFunction(II);		return eraseInstFromFunction(II);

// [16 lines not shown in this view]

// TODO, Obvious Missing Transforms:		// TODO, Obvious Missing Transforms:
// * Single constant active lane load -> load		// * Single constant active lane load -> load
// * Dereferenceable address & few lanes -> scalarize speculative load/selects		// * Dereferenceable address & few lanes -> scalarize speculative load/selects
// * Adjacent vector addresses -> masked.load		// * Adjacent vector addresses -> masked.load
// * Narrow width by halves excluding zero/undef lanes		// * Narrow width by halves excluding zero/undef lanes
// * Vector splat address w/known mask -> scalar load		// * Vector splat address w/known mask -> scalar load
// * Vector incrementing address -> vector masked load		// * Vector incrementing address -> vector masked load
Instruction *InstCombiner::simplifyMaskedGather(IntrinsicInst &II) {		Instruction *InstCombinerImpl::simplifyMaskedGather(IntrinsicInst &II) {
return nullptr;		return nullptr;
}		}

// TODO, Obvious Missing Transforms:		// TODO, Obvious Missing Transforms:
// * Single constant active lane -> store		// * Single constant active lane -> store
// * Adjacent vector addresses -> masked.store		// * Adjacent vector addresses -> masked.store
// * Narrow store width by halves excluding zero/undef lanes		// * Narrow store width by halves excluding zero/undef lanes
// * Vector splat address w/known mask -> scalar store		// * Vector splat address w/known mask -> scalar store
// * Vector incrementing address -> vector masked store		// * Vector incrementing address -> vector masked store
Instruction *InstCombiner::simplifyMaskedScatter(IntrinsicInst &II) {		Instruction *InstCombinerImpl::simplifyMaskedScatter(IntrinsicInst &II) {
auto *ConstMask = dyn_cast<Constant>(II.getArgOperand(3));		auto *ConstMask = dyn_cast<Constant>(II.getArgOperand(3));
if (!ConstMask)		if (!ConstMask)
return nullptr;		return nullptr;

// If the mask is all zeros, a scatter does nothing.		// If the mask is all zeros, a scatter does nothing.
if (ConstMask->isNullValue())		if (ConstMask->isNullValue())
return eraseInstFromFunction(II);		return eraseInstFromFunction(II);

// [14 lines not shown in this view]
/// like:		/// like:
/// launder(launder(%x)) -> launder(%x) (the result is not the argument)		/// launder(launder(%x)) -> launder(%x) (the result is not the argument)
/// launder(strip(%x)) -> launder(%x)		/// launder(strip(%x)) -> launder(%x)
/// strip(strip(%x)) -> strip(%x) (the result is not the argument)		/// strip(strip(%x)) -> strip(%x) (the result is not the argument)
/// strip(launder(%x)) -> strip(%x)		/// strip(launder(%x)) -> strip(%x)
/// This is legal because it preserves the most recent information about		/// This is legal because it preserves the most recent information about
/// the presence or absence of invariant.group.		/// the presence or absence of invariant.group.
static Instruction *simplifyInvariantGroupIntrinsic(IntrinsicInst &II,		static Instruction *simplifyInvariantGroupIntrinsic(IntrinsicInst &II,
InstCombiner &IC) {		InstCombinerImpl &IC) {
auto *Arg = II.getArgOperand(0);		auto *Arg = II.getArgOperand(0);
auto *StrippedArg = Arg->stripPointerCasts();		auto *StrippedArg = Arg->stripPointerCasts();
auto *StrippedInvariantGroupsArg = Arg->stripPointerCastsAndInvariantGroups();		auto *StrippedInvariantGroupsArg = Arg->stripPointerCastsAndInvariantGroups();
if (StrippedArg == StrippedInvariantGroupsArg)		if (StrippedArg == StrippedInvariantGroupsArg)
return nullptr; // No launders/strips to remove.		return nullptr; // No launders/strips to remove.

Value *Result = nullptr;		Value *Result = nullptr;

if (II.getIntrinsicID() == Intrinsic::launder_invariant_group)		if (II.getIntrinsicID() == Intrinsic::launder_invariant_group)
Result = IC.Builder.CreateLaunderInvariantGroup(StrippedInvariantGroupsArg);		Result = IC.Builder.CreateLaunderInvariantGroup(StrippedInvariantGroupsArg);
else if (II.getIntrinsicID() == Intrinsic::strip_invariant_group)		else if (II.getIntrinsicID() == Intrinsic::strip_invariant_group)
Result = IC.Builder.CreateStripInvariantGroup(StrippedInvariantGroupsArg);		Result = IC.Builder.CreateStripInvariantGroup(StrippedInvariantGroupsArg);
else		else
llvm_unreachable(		llvm_unreachable(
"simplifyInvariantGroupIntrinsic only handles launder and strip");		"simplifyInvariantGroupIntrinsic only handles launder and strip");
if (Result->getType()->getPointerAddressSpace() !=		if (Result->getType()->getPointerAddressSpace() !=
II.getType()->getPointerAddressSpace())		II.getType()->getPointerAddressSpace())
Result = IC.Builder.CreateAddrSpaceCast(Result, II.getType());		Result = IC.Builder.CreateAddrSpaceCast(Result, II.getType());
if (Result->getType() != II.getType())		if (Result->getType() != II.getType())
Result = IC.Builder.CreateBitCast(Result, II.getType());		Result = IC.Builder.CreateBitCast(Result, II.getType());

return cast<Instruction>(Result);		return cast<Instruction>(Result);
}		}

static Instruction *foldCttzCtlz(IntrinsicInst &II, InstCombiner &IC) {		static Instruction *foldCttzCtlz(IntrinsicInst &II, InstCombinerImpl &IC) {
assert((II.getIntrinsicID() == Intrinsic::cttz ||		assert((II.getIntrinsicID() == Intrinsic::cttz ||
II.getIntrinsicID() == Intrinsic::ctlz) &&		II.getIntrinsicID() == Intrinsic::ctlz) &&
"Expected cttz or ctlz intrinsic");		"Expected cttz or ctlz intrinsic");
bool IsTZ = II.getIntrinsicID() == Intrinsic::cttz;		bool IsTZ = II.getIntrinsicID() == Intrinsic::cttz;
Value *Op0 = II.getArgOperand(0);		Value *Op0 = II.getArgOperand(0);
Value *X;		Value *X;
// ctlz(bitreverse(x)) -> cttz(x)		// ctlz(bitreverse(x)) -> cttz(x)
// cttz(bitreverse(x)) -> ctlz(x)		// cttz(bitreverse(x)) -> ctlz(x)
// [53 lines not shown in this view, inside: if (IT && IT->getBitWidth() != 1 && !II.getMetadata(LLVMContext::MD_range)) {]
II.setMetadata(LLVMContext::MD_range,		II.setMetadata(LLVMContext::MD_range,
MDNode::get(II.getContext(), LowAndHigh));		MDNode::get(II.getContext(), LowAndHigh));
return &II;		return &II;
}		}

return nullptr;		return nullptr;
}		}

static Instruction *foldCtpop(IntrinsicInst &II, InstCombiner &IC) {		static Instruction *foldCtpop(IntrinsicInst &II, InstCombinerImpl &IC) {
assert(II.getIntrinsicID() == Intrinsic::ctpop &&		assert(II.getIntrinsicID() == Intrinsic::ctpop &&
"Expected ctpop intrinsic");		"Expected ctpop intrinsic");
Type *Ty = II.getType();		Type *Ty = II.getType();
unsigned BitWidth = Ty->getScalarSizeInBits();		unsigned BitWidth = Ty->getScalarSizeInBits();
Value *Op0 = II.getArgOperand(0);		Value *Op0 = II.getArgOperand(0);
Value *X;		Value *X;

// ctpop(bitreverse(x)) -> ctpop(x)		// ctpop(bitreverse(x)) -> ctpop(x)
// [38 lines not shown in this view, inside: if (IT->getBitWidth() != 1 && !II.getMetadata(LLVMContext::MD_range)) {]
II.setMetadata(LLVMContext::MD_range,		II.setMetadata(LLVMContext::MD_range,
MDNode::get(II.getContext(), LowAndHigh));		MDNode::get(II.getContext(), LowAndHigh));
return &II;		return &II;
}		}

return nullptr;		return nullptr;
}		}

// TODO: If the x86 backend knew how to convert a bool vector mask back to an
// XMM register mask efficiently, we could transform all x86 masked intrinsics
// to LLVM masked intrinsics and remove the x86 masked intrinsic defs.
static Instruction *simplifyX86MaskedLoad(IntrinsicInst &II, InstCombiner &IC) {
Value *Ptr = II.getOperand(0);
Value *Mask = II.getOperand(1);
Constant *ZeroVec = Constant::getNullValue(II.getType());

// Special case a zero mask since that's not a ConstantDataVector.
// This masked load instruction creates a zero vector.
if (isa<ConstantAggregateZero>(Mask))
return IC.replaceInstUsesWith(II, ZeroVec);

auto *ConstMask = dyn_cast<ConstantDataVector>(Mask);
if (!ConstMask)
return nullptr;

// The mask is constant. Convert this x86 intrinsic to the LLVM intrinsic
// to allow target-independent optimizations.

// First, cast the x86 intrinsic scalar pointer to a vector pointer to match
// the LLVM intrinsic definition for the pointer argument.
unsigned AddrSpace = cast<PointerType>(Ptr->getType())->getAddressSpace();
PointerType *VecPtrTy = PointerType::get(II.getType(), AddrSpace);
Value *PtrCast = IC.Builder.CreateBitCast(Ptr, VecPtrTy, "castvec");

// Second, convert the x86 XMM integer vector mask to a vector of bools based
// on each element's most significant bit (the sign bit).
Constant *BoolMask = getNegativeIsTrueBoolVec(ConstMask);

// The pass-through vector for an x86 masked load is a zero vector.
CallInst *NewMaskedLoad =
IC.Builder.CreateMaskedLoad(PtrCast, Align(1), BoolMask, ZeroVec);
return IC.replaceInstUsesWith(II, NewMaskedLoad);
}

// TODO: If the x86 backend knew how to convert a bool vector mask back to an
// XMM register mask efficiently, we could transform all x86 masked intrinsics
// to LLVM masked intrinsics and remove the x86 masked intrinsic defs.
static bool simplifyX86MaskedStore(IntrinsicInst &II, InstCombiner &IC) {
Value *Ptr = II.getOperand(0);
Value *Mask = II.getOperand(1);
Value *Vec = II.getOperand(2);

// Special case a zero mask since that's not a ConstantDataVector:
// this masked store instruction does nothing.
if (isa<ConstantAggregateZero>(Mask)) {
IC.eraseInstFromFunction(II);
return true;
}

// The SSE2 version is too weird (eg, unaligned but non-temporal) to do
// anything else at this level.
if (II.getIntrinsicID() == Intrinsic::x86_sse2_maskmov_dqu)
return false;

auto *ConstMask = dyn_cast<ConstantDataVector>(Mask);
if (!ConstMask)
return false;

// The mask is constant. Convert this x86 intrinsic to the LLVM intrinsic
// to allow target-independent optimizations.

// First, cast the x86 intrinsic scalar pointer to a vector pointer to match
// the LLVM intrinsic definition for the pointer argument.
unsigned AddrSpace = cast<PointerType>(Ptr->getType())->getAddressSpace();
PointerType *VecPtrTy = PointerType::get(Vec->getType(), AddrSpace);
Value *PtrCast = IC.Builder.CreateBitCast(Ptr, VecPtrTy, "castvec");

// Second, convert the x86 XMM integer vector mask to a vector of bools based
// on each element's most significant bit (the sign bit).
Constant *BoolMask = getNegativeIsTrueBoolVec(ConstMask);

IC.Builder.CreateMaskedStore(Vec, PtrCast, Align(1), BoolMask);

// 'Replace uses' doesn't work for stores. Erase the original masked store.
IC.eraseInstFromFunction(II);
return true;
}
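
Both masked-memop folds above depend on turning an x86 integer vector mask into a vector of bools by looking only at each element's sign bit. The helper getNegativeIsTrueBoolVec is not shown in this part of the diff, so the following is only a standalone sketch of that idea on plain integers. The name signBitsToBoolMask and the use of std::vector are assumptions made for illustration, not LLVM API.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of LLVM): an element selects its lane
// when its most significant bit is set, i.e. when it is negative as a
// signed integer. All other bits of the element are ignored.
std::vector<bool> signBitsToBoolMask(const std::vector<int32_t> &Mask) {
  std::vector<bool> Result;
  Result.reserve(Mask.size());
  for (int32_t Elt : Mask)
    Result.push_back(Elt < 0);
  return Result;
}
```

With this reading, a mask element of 1 does not enable its lane, while -1 or INT32_MIN does.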

// Constant fold llvm.amdgcn.fmed3 intrinsics for standard inputs.
//
// A single NaN input is folded to minnum, so we rely on that folding for
// handling NaNs.
static APFloat fmed3AMDGCN(const APFloat &Src0, const APFloat &Src1,
const APFloat &Src2) {
APFloat Max3 = maxnum(maxnum(Src0, Src1), Src2);

APFloat::cmpResult Cmp0 = Max3.compare(Src0);
assert(Cmp0 != APFloat::cmpUnordered && "nans handled separately");
if (Cmp0 == APFloat::cmpEqual)
return maxnum(Src1, Src2);

APFloat::cmpResult Cmp1 = Max3.compare(Src1);
assert(Cmp1 != APFloat::cmpUnordered && "nans handled separately");
if (Cmp1 == APFloat::cmpEqual)
return maxnum(Src0, Src2);

return maxnum(Src0, Src1);
}
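
The median-of-three selection in fmed3AMDGCN can be tried out on ordinary doubles. The sketch below follows the same structure (take the maximum of all three, then return the larger of the two inputs that are not that maximum) and, like the original, assumes no input is NaN. The name med3 is hypothetical and std::max stands in for APFloat's maxnum.

```cpp
#include <algorithm>

// Hypothetical stand-in for fmed3AMDGCN on plain doubles.
// Precondition: none of the inputs is NaN.
double med3(double Src0, double Src1, double Src2) {
  double Max3 = std::max(std::max(Src0, Src1), Src2);

  // If Src0 is the maximum, the median is the larger of the other two.
  if (Max3 == Src0)
    return std::max(Src1, Src2);

  // Likewise when Src1 is the maximum.
  if (Max3 == Src1)
    return std::max(Src0, Src2);

  // Otherwise Src2 is the maximum.
  return std::max(Src0, Src1);
}
```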

/// Convert a table lookup to shufflevector if the mask is constant.		/// Convert a table lookup to shufflevector if the mask is constant.
/// This could benefit tbl1 if the mask is { 7,6,5,4,3,2,1,0 }, in		/// This could benefit tbl1 if the mask is { 7,6,5,4,3,2,1,0 }, in
/// which case we could lower the shufflevector with rev64 instructions		/// which case we could lower the shufflevector with rev64 instructions
/// as it's actually a byte reverse.		/// as it's actually a byte reverse.
static Value *simplifyNeonTbl1(const IntrinsicInst &II,		static Value *simplifyNeonTbl1(const IntrinsicInst &II,
InstCombiner::BuilderTy &Builder) {		InstCombiner::BuilderTy &Builder) {
// Bail out if the mask is not a constant.		// Bail out if the mask is not a constant.
auto *C = dyn_cast<Constant>(II.getArgOperand(1));		auto *C = dyn_cast<Constant>(II.getArgOperand(1));
Show All 22 Lines	if ((unsigned)Indexes[I] >= NumElts)
return nullptr;		return nullptr;
}		}

auto *V1 = II.getArgOperand(0);		auto *V1 = II.getArgOperand(0);
auto *V2 = Constant::getNullValue(V1->getType());		auto *V2 = Constant::getNullValue(V1->getType());
return Builder.CreateShuffleVector(V1, V2, makeArrayRef(Indexes));		return Builder.CreateShuffleVector(V1, V2, makeArrayRef(Indexes));
}		}

/// Convert a vector load intrinsic into a simple llvm load instruction.
/// This is beneficial when the underlying object being addressed comes
/// from a constant, since we get constant-folding for free.
static Value *simplifyNeonVld1(const IntrinsicInst &II,
unsigned MemAlign,
InstCombiner::BuilderTy &Builder) {
auto *IntrAlign = dyn_cast<ConstantInt>(II.getArgOperand(1));

if (!IntrAlign)
return nullptr;

unsigned Alignment = IntrAlign->getLimitedValue() < MemAlign ?
MemAlign : IntrAlign->getLimitedValue();

if (!isPowerOf2_32(Alignment))
return nullptr;

auto *BCastInst = Builder.CreateBitCast(II.getArgOperand(0),
PointerType::get(II.getType(), 0));
return Builder.CreateAlignedLoad(II.getType(), BCastInst, Align(Alignment));
}
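
The alignment choice in simplifyNeonVld1 takes the larger of the intrinsic's alignment operand and the known memory alignment, and gives up unless the result is a power of two. A minimal standalone sketch of that decision follows. The name pickLoadAlignment and the convention of returning 0 for "do not fold" are assumptions for illustration only.

```cpp
#include <cstdint>

// Hypothetical helper: returns the alignment to use for the plain load,
// or 0 when the fold should be skipped because the chosen value is not
// a power of two (0 itself is not a power of two).
uint32_t pickLoadAlignment(uint32_t IntrAlign, uint32_t MemAlign) {
  uint32_t Alignment = IntrAlign < MemAlign ? MemAlign : IntrAlign;
  bool IsPowerOf2 = Alignment != 0 && (Alignment & (Alignment - 1)) == 0;
  return IsPowerOf2 ? Alignment : 0;
}
```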

// Returns true iff the 2 intrinsics have the same operands, limiting the		// Returns true iff the 2 intrinsics have the same operands, limiting the
// comparison to the first NumOperands.		// comparison to the first NumOperands.
static bool haveSameOperands(const IntrinsicInst &I, const IntrinsicInst &E,		static bool haveSameOperands(const IntrinsicInst &I, const IntrinsicInst &E,
unsigned NumOperands) {		unsigned NumOperands) {
assert(I.getNumArgOperands() >= NumOperands && "Not enough operands");		assert(I.getNumArgOperands() >= NumOperands && "Not enough operands");
assert(E.getNumArgOperands() >= NumOperands && "Not enough operands");		assert(E.getNumArgOperands() >= NumOperands && "Not enough operands");
for (unsigned i = 0; i < NumOperands; i++)		for (unsigned i = 0; i < NumOperands; i++)
if (I.getArgOperand(i) != E.getArgOperand(i))		if (I.getArgOperand(i) != E.getArgOperand(i))
return false;		return false;
return true;		return true;
}		}

// Remove trivially empty start/end intrinsic ranges, i.e. a start		// Remove trivially empty start/end intrinsic ranges, i.e. a start
// immediately followed by an end (ignoring debuginfo or other		// immediately followed by an end (ignoring debuginfo or other
// start/end intrinsics in between). As this handles only the most trivial		// start/end intrinsics in between). As this handles only the most trivial
// cases, tracking the nesting level is not needed:		// cases, tracking the nesting level is not needed:
//		//
// call @llvm.foo.start(i1 0)		// call @llvm.foo.start(i1 0)
// call @llvm.foo.start(i1 0) ; This one won't be skipped: it will be removed		// call @llvm.foo.start(i1 0) ; This one won't be skipped: it will be removed
// call @llvm.foo.end(i1 0)		// call @llvm.foo.end(i1 0)
// call @llvm.foo.end(i1 0) ; &I		// call @llvm.foo.end(i1 0) ; &I
static bool removeTriviallyEmptyRange(		static bool
IntrinsicInst &EndI, InstCombiner &IC,		removeTriviallyEmptyRange(IntrinsicInst &EndI, InstCombinerImpl &IC,
std::function<bool(const IntrinsicInst &)> IsStart) {		std::function<bool(const IntrinsicInst &)> IsStart) {
// We start from the end intrinsic and scan backwards, so that InstCombine		// We start from the end intrinsic and scan backwards, so that InstCombine
// has already processed (and potentially removed) all the instructions		// has already processed (and potentially removed) all the instructions
// before the end intrinsic.		// before the end intrinsic.
BasicBlock::reverse_iterator BI(EndI), BE(EndI.getParent()->rend());		BasicBlock::reverse_iterator BI(EndI), BE(EndI.getParent()->rend());
for (; BI != BE; ++BI) {		for (; BI != BE; ++BI) {
if (auto *I = dyn_cast<IntrinsicInst>(&*BI)) {		if (auto *I = dyn_cast<IntrinsicInst>(&*BI)) {
if (isa<DbgInfoIntrinsic>(I) ||		if (isa<DbgInfoIntrinsic>(I) ||
I->getIntrinsicID() == EndI.getIntrinsicID())		I->getIntrinsicID() == EndI.getIntrinsicID())
Show All 9 Lines	if (auto *I = dyn_cast<IntrinsicInst>(&*BI)) {
}		}
}		}
break;		break;
}		}

return false;		return false;
}		}

// Convert NVVM intrinsics to target-generic LLVM code where possible.		Instruction *InstCombinerImpl::visitVAEndInst(VAEndInst &I) {
static Instruction *SimplifyNVVMIntrinsic(IntrinsicInst *II, InstCombiner &IC) {
// Each NVVM intrinsic we can simplify can be replaced with one of:
//
// * an LLVM intrinsic,
// * an LLVM cast operation,
// * an LLVM binary operation, or
// * ad-hoc LLVM IR for the particular operation.

// Some transformations are only valid when the module's
// flush-denormals-to-zero (ftz) setting is true/false, whereas other
// transformations are valid regardless of the module's ftz setting.
enum FtzRequirementTy {
FTZ_Any, // Any ftz setting is ok.
FTZ_MustBeOn, // Transformation is valid only if ftz is on.
FTZ_MustBeOff, // Transformation is valid only if ftz is off.
};
// Classes of NVVM intrinsics that can't be replaced one-to-one with a
// target-generic intrinsic, cast op, or binary op but that we can nonetheless
// simplify.
enum SpecialCase {
SPC_Reciprocal,
};

// SimplifyAction is a poor-man's variant (plus an additional flag) that
// represents how to replace an NVVM intrinsic with target-generic LLVM IR.
struct SimplifyAction {
// Invariant: At most one of these Optionals has a value.
Optional<Intrinsic::ID> IID;
Optional<Instruction::CastOps> CastOp;
Optional<Instruction::BinaryOps> BinaryOp;
Optional<SpecialCase> Special;

FtzRequirementTy FtzRequirement = FTZ_Any;

SimplifyAction() = default;

SimplifyAction(Intrinsic::ID IID, FtzRequirementTy FtzReq)
: IID(IID), FtzRequirement(FtzReq) {}

// Cast operations don't have anything to do with FTZ, so we skip that
// argument.
SimplifyAction(Instruction::CastOps CastOp) : CastOp(CastOp) {}

SimplifyAction(Instruction::BinaryOps BinaryOp, FtzRequirementTy FtzReq)
: BinaryOp(BinaryOp), FtzRequirement(FtzReq) {}

SimplifyAction(SpecialCase Special, FtzRequirementTy FtzReq)
: Special(Special), FtzRequirement(FtzReq) {}
};

// Try to generate a SimplifyAction describing how to replace our
// IntrinsicInstr with target-generic LLVM IR.
const SimplifyAction Action = [II]() -> SimplifyAction {
switch (II->getIntrinsicID()) {
// NVVM intrinsics that map directly to LLVM intrinsics.
case Intrinsic::nvvm_ceil_d:
return {Intrinsic::ceil, FTZ_Any};
case Intrinsic::nvvm_ceil_f:
return {Intrinsic::ceil, FTZ_MustBeOff};
case Intrinsic::nvvm_ceil_ftz_f:
return {Intrinsic::ceil, FTZ_MustBeOn};
case Intrinsic::nvvm_fabs_d:
return {Intrinsic::fabs, FTZ_Any};
case Intrinsic::nvvm_fabs_f:
return {Intrinsic::fabs, FTZ_MustBeOff};
case Intrinsic::nvvm_fabs_ftz_f:
return {Intrinsic::fabs, FTZ_MustBeOn};
case Intrinsic::nvvm_floor_d:
return {Intrinsic::floor, FTZ_Any};
case Intrinsic::nvvm_floor_f:
return {Intrinsic::floor, FTZ_MustBeOff};
case Intrinsic::nvvm_floor_ftz_f:
return {Intrinsic::floor, FTZ_MustBeOn};
case Intrinsic::nvvm_fma_rn_d:
return {Intrinsic::fma, FTZ_Any};
case Intrinsic::nvvm_fma_rn_f:
return {Intrinsic::fma, FTZ_MustBeOff};
case Intrinsic::nvvm_fma_rn_ftz_f:
return {Intrinsic::fma, FTZ_MustBeOn};
case Intrinsic::nvvm_fmax_d:
return {Intrinsic::maxnum, FTZ_Any};
case Intrinsic::nvvm_fmax_f:
return {Intrinsic::maxnum, FTZ_MustBeOff};
case Intrinsic::nvvm_fmax_ftz_f:
return {Intrinsic::maxnum, FTZ_MustBeOn};
case Intrinsic::nvvm_fmin_d:
return {Intrinsic::minnum, FTZ_Any};
case Intrinsic::nvvm_fmin_f:
return {Intrinsic::minnum, FTZ_MustBeOff};
case Intrinsic::nvvm_fmin_ftz_f:
return {Intrinsic::minnum, FTZ_MustBeOn};
case Intrinsic::nvvm_round_d:
return {Intrinsic::round, FTZ_Any};
case Intrinsic::nvvm_round_f:
return {Intrinsic::round, FTZ_MustBeOff};
case Intrinsic::nvvm_round_ftz_f:
return {Intrinsic::round, FTZ_MustBeOn};
case Intrinsic::nvvm_sqrt_rn_d:
return {Intrinsic::sqrt, FTZ_Any};
case Intrinsic::nvvm_sqrt_f:
// nvvm_sqrt_f is a special case. For most intrinsics, foo_ftz_f is the
// ftz version, and foo_f is the non-ftz version. But nvvm_sqrt_f adopts
// the ftz-ness of the surrounding code. sqrt_rn_f and sqrt_rn_ftz_f are
// the versions with explicit ftz-ness.
return {Intrinsic::sqrt, FTZ_Any};
case Intrinsic::nvvm_sqrt_rn_f:
return {Intrinsic::sqrt, FTZ_MustBeOff};
case Intrinsic::nvvm_sqrt_rn_ftz_f:
return {Intrinsic::sqrt, FTZ_MustBeOn};
case Intrinsic::nvvm_trunc_d:
return {Intrinsic::trunc, FTZ_Any};
case Intrinsic::nvvm_trunc_f:
return {Intrinsic::trunc, FTZ_MustBeOff};
case Intrinsic::nvvm_trunc_ftz_f:
return {Intrinsic::trunc, FTZ_MustBeOn};

// NVVM intrinsics that map to LLVM cast operations.
//
// Note that llvm's target-generic conversion operators correspond to the rz
// (round to zero) versions of the nvvm conversion intrinsics, even though
// most everything else here uses the rn (round to nearest even) nvvm ops.
case Intrinsic::nvvm_d2i_rz:
case Intrinsic::nvvm_f2i_rz:
case Intrinsic::nvvm_d2ll_rz:
case Intrinsic::nvvm_f2ll_rz:
return {Instruction::FPToSI};
case Intrinsic::nvvm_d2ui_rz:
case Intrinsic::nvvm_f2ui_rz:
case Intrinsic::nvvm_d2ull_rz:
case Intrinsic::nvvm_f2ull_rz:
return {Instruction::FPToUI};
case Intrinsic::nvvm_i2d_rz:
case Intrinsic::nvvm_i2f_rz:
case Intrinsic::nvvm_ll2d_rz:
case Intrinsic::nvvm_ll2f_rz:
return {Instruction::SIToFP};
case Intrinsic::nvvm_ui2d_rz:
case Intrinsic::nvvm_ui2f_rz:
case Intrinsic::nvvm_ull2d_rz:
case Intrinsic::nvvm_ull2f_rz:
return {Instruction::UIToFP};

// NVVM intrinsics that map to LLVM binary ops.
case Intrinsic::nvvm_add_rn_d:
return {Instruction::FAdd, FTZ_Any};
case Intrinsic::nvvm_add_rn_f:
return {Instruction::FAdd, FTZ_MustBeOff};
case Intrinsic::nvvm_add_rn_ftz_f:
return {Instruction::FAdd, FTZ_MustBeOn};
case Intrinsic::nvvm_mul_rn_d:
return {Instruction::FMul, FTZ_Any};
case Intrinsic::nvvm_mul_rn_f:
return {Instruction::FMul, FTZ_MustBeOff};
case Intrinsic::nvvm_mul_rn_ftz_f:
return {Instruction::FMul, FTZ_MustBeOn};
case Intrinsic::nvvm_div_rn_d:
return {Instruction::FDiv, FTZ_Any};
case Intrinsic::nvvm_div_rn_f:
return {Instruction::FDiv, FTZ_MustBeOff};
case Intrinsic::nvvm_div_rn_ftz_f:
return {Instruction::FDiv, FTZ_MustBeOn};

// The remainder of cases are NVVM intrinsics that map to LLVM idioms, but
// need special handling.
//
// We seem to be missing intrinsics for rcp.approx.{ftz.}f32, which is just
// as well.
case Intrinsic::nvvm_rcp_rn_d:
return {SPC_Reciprocal, FTZ_Any};
case Intrinsic::nvvm_rcp_rn_f:
return {SPC_Reciprocal, FTZ_MustBeOff};
case Intrinsic::nvvm_rcp_rn_ftz_f:
return {SPC_Reciprocal, FTZ_MustBeOn};

// We do not currently simplify intrinsics that give an approximate answer.
// These include:
//
// - nvvm_cos_approx_{f,ftz_f}
// - nvvm_ex2_approx_{d,f,ftz_f}
// - nvvm_lg2_approx_{d,f,ftz_f}
// - nvvm_sin_approx_{f,ftz_f}
// - nvvm_sqrt_approx_{f,ftz_f}
// - nvvm_rsqrt_approx_{d,f,ftz_f}
// - nvvm_div_approx_{ftz_d,ftz_f,f}
// - nvvm_rcp_approx_ftz_d
//
// Ideally we'd encode them as e.g. "fast call @llvm.cos", where "fast"
// means that fastmath is enabled in the intrinsic. Unfortunately only
// binary operators (currently) have a fastmath bit in SelectionDAG, so this
// information gets lost and we can't select on it.
//
// TODO: div and rcp are lowered to a binary op, so these we could in theory
// lower them to "fast fdiv".

default:
return {};
}
}();

// If Action.FtzRequirementTy is not satisfied by the module's ftz state, we
// can bail out now. (Notice that in the case that IID is not an NVVM
// intrinsic, we don't have to look up any module metadata, as
// FtzRequirementTy will be FTZ_Any.)
if (Action.FtzRequirement != FTZ_Any) {
StringRef Attr = II->getFunction()
->getFnAttribute("denormal-fp-math-f32")
.getValueAsString();
DenormalMode Mode = parseDenormalFPAttribute(Attr);
bool FtzEnabled = Mode.Output != DenormalMode::IEEE;

if (FtzEnabled != (Action.FtzRequirement == FTZ_MustBeOn))
return nullptr;
}

// Simplify to target-generic intrinsic.
if (Action.IID) {
SmallVector<Value *, 4> Args(II->arg_operands());
// All the target-generic intrinsics currently of interest to us have one
// type argument, equal to that of the nvvm intrinsic's argument.
Type *Tys[] = {II->getArgOperand(0)->getType()};
return CallInst::Create(
Intrinsic::getDeclaration(II->getModule(), *Action.IID, Tys), Args);
}

// Simplify to target-generic binary op.
if (Action.BinaryOp)
return BinaryOperator::Create(*Action.BinaryOp, II->getArgOperand(0),
II->getArgOperand(1), II->getName());

// Simplify to target-generic cast op.
if (Action.CastOp)
return CastInst::Create(*Action.CastOp, II->getArgOperand(0), II->getType(),
II->getName());

// All that's left are the special cases.
if (!Action.Special)
return nullptr;

switch (*Action.Special) {
case SPC_Reciprocal:
// Simplify reciprocal.
return BinaryOperator::Create(
Instruction::FDiv, ConstantFP::get(II->getArgOperand(0)->getType(), 1),
II->getArgOperand(0), II->getName());
}
llvm_unreachable("All SpecialCase enumerators should be handled in switch.");
}
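
The flush-denormals-to-zero gate in SimplifyNVVMIntrinsic reduces to one comparison: a transformation tagged FTZ_Any always passes, and otherwise the function's ftz state must match what the intrinsic variant requires. The sketch below isolates that check. The function name ftzRequirementSatisfied is hypothetical, and the boolean FtzEnabled stands in for the result of parsing the "denormal-fp-math-f32" attribute.

```cpp
// Mirrors the FtzRequirementTy enum from the code above.
enum FtzRequirementTy {
  FTZ_Any,       // Any ftz setting is ok.
  FTZ_MustBeOn,  // Transformation is valid only if ftz is on.
  FTZ_MustBeOff, // Transformation is valid only if ftz is off.
};

// Hypothetical helper: true when the simplification may proceed.
bool ftzRequirementSatisfied(FtzRequirementTy Req, bool FtzEnabled) {
  if (Req == FTZ_Any)
    return true;
  return FtzEnabled == (Req == FTZ_MustBeOn);
}
```

Under this rule nvvm_ceil_ftz_f (FTZ_MustBeOn) is only rewritten to llvm.ceil in a function where ftz is enabled, while nvvm_ceil_d (FTZ_Any) is rewritten regardless.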

Instruction *InstCombiner::visitVAEndInst(VAEndInst &I) {
removeTriviallyEmptyRange(I, *this, [](const IntrinsicInst &I) {		removeTriviallyEmptyRange(I, *this, [](const IntrinsicInst &I) {
return I.getIntrinsicID() == Intrinsic::vastart ||		return I.getIntrinsicID() == Intrinsic::vastart ||
I.getIntrinsicID() == Intrinsic::vacopy;		I.getIntrinsicID() == Intrinsic::vacopy;
});		});
return nullptr;		return nullptr;
}		}

static Instruction *canonicalizeConstantArg0ToArg1(CallInst &Call) {		static Instruction *canonicalizeConstantArg0ToArg1(CallInst &Call) {
assert(Call.getNumArgOperands() > 1 && "Need at least 2 args to swap");		assert(Call.getNumArgOperands() > 1 && "Need at least 2 args to swap");
Value *Arg0 = Call.getArgOperand(0), *Arg1 = Call.getArgOperand(1);		Value *Arg0 = Call.getArgOperand(0), *Arg1 = Call.getArgOperand(1);
if (isa<Constant>(Arg0) && !isa<Constant>(Arg1)) {		if (isa<Constant>(Arg0) && !isa<Constant>(Arg1)) {
Call.setArgOperand(0, Arg1);		Call.setArgOperand(0, Arg1);
Call.setArgOperand(1, Arg0);		Call.setArgOperand(1, Arg0);
return &Call;		return &Call;
}		}
return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::foldIntrinsicWithOverflowCommon(IntrinsicInst *II) {		Instruction *
		InstCombinerImpl::foldIntrinsicWithOverflowCommon(IntrinsicInst *II) {
WithOverflowInst *WO = cast<WithOverflowInst>(II);		WithOverflowInst *WO = cast<WithOverflowInst>(II);
Value *OperationResult = nullptr;		Value *OperationResult = nullptr;
Constant *OverflowResult = nullptr;		Constant *OverflowResult = nullptr;
if (OptimizeOverflowCheck(WO->getBinaryOp(), WO->isSigned(), WO->getLHS(),		if (OptimizeOverflowCheck(WO->getBinaryOp(), WO->isSigned(), WO->getLHS(),
WO->getRHS(), *WO, OperationResult, OverflowResult))		WO->getRHS(), *WO, OperationResult, OverflowResult))
return CreateOverflowTuple(WO, OperationResult, OverflowResult);		return CreateOverflowTuple(WO, OperationResult, OverflowResult);
return nullptr;		return nullptr;
}		}

/// CallInst simplification. This mostly only handles folding of intrinsic		/// CallInst simplification. This mostly only handles folding of intrinsic
/// instructions. For normal calls, it allows visitCallBase to do the heavy		/// instructions. For normal calls, it allows visitCallBase to do the heavy
/// lifting.		/// lifting.
Instruction *InstCombiner::visitCallInst(CallInst &CI) {		Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
// Don't try to simplify calls without uses. It will not do anything useful,		// Don't try to simplify calls without uses. It will not do anything useful,
// but will result in the following folds being skipped.		// but will result in the following folds being skipped.
if (!CI.use_empty())		if (!CI.use_empty())
if (Value *V = SimplifyCall(&CI, SQ.getWithInstruction(&CI)))		if (Value *V = SimplifyCall(&CI, SQ.getWithInstruction(&CI)))
return replaceInstUsesWith(CI, V);		return replaceInstUsesWith(CI, V);

if (isFreeCall(&CI, &TLI))		if (isFreeCall(&CI, &TLI))
return visitFree(CI);		return visitFree(CI);
▲ Show 20 Lines • Show All 89 Lines • ▼ Show 20 Lines	if (auto *IIFVTy = dyn_cast<FixedVectorType>(II->getType())) {
APInt AllOnesEltMask(APInt::getAllOnesValue(VWidth));		APInt AllOnesEltMask(APInt::getAllOnesValue(VWidth));
if (Value *V = SimplifyDemandedVectorElts(II, AllOnesEltMask, UndefElts)) {		if (Value *V = SimplifyDemandedVectorElts(II, AllOnesEltMask, UndefElts)) {
if (V != II)		if (V != II)
return replaceInstUsesWith(*II, V);		return replaceInstUsesWith(*II, V);
return II;		return II;
}		}
}		}

if (Instruction *I = SimplifyNVVMIntrinsic(II, *this))
return I;

auto SimplifyDemandedVectorEltsLow = [this](Value *Op, unsigned Width,
unsigned DemandedWidth) {
APInt UndefElts(Width, 0);
APInt DemandedElts = APInt::getLowBitsSet(Width, DemandedWidth);
return SimplifyDemandedVectorElts(Op, DemandedElts, UndefElts);
};

Intrinsic::ID IID = II->getIntrinsicID();		Intrinsic::ID IID = II->getIntrinsicID();
switch (IID) {		switch (IID) {
default: break;
case Intrinsic::objectsize:		case Intrinsic::objectsize:
if (Value *V = lowerObjectSizeCall(II, DL, &TLI, /*MustSucceed=*/false))		if (Value *V = lowerObjectSizeCall(II, DL, &TLI, /*MustSucceed=*/false))
return replaceInstUsesWith(CI, V);		return replaceInstUsesWith(CI, V);
return nullptr;		return nullptr;
case Intrinsic::bswap: {		case Intrinsic::bswap: {
Value *IIOperand = II->getArgOperand(0);		Value *IIOperand = II->getArgOperand(0);
Value *X = nullptr;		Value *X = nullptr;

▲ Show 20 Lines • Show All 475 Lines • ▼ Show 20 Lines	if (match(II->getArgOperand(0), m_OneUse(m_FNeg(m_Value(X))))) {
// sin(-x) --> -sin(x)		// sin(-x) --> -sin(x)
Value *NewSin = Builder.CreateUnaryIntrinsic(Intrinsic::sin, X, II);		Value *NewSin = Builder.CreateUnaryIntrinsic(Intrinsic::sin, X, II);
Instruction *FNeg = UnaryOperator::CreateFNeg(NewSin);		Instruction *FNeg = UnaryOperator::CreateFNeg(NewSin);
FNeg->copyFastMathFlags(II);		FNeg->copyFastMathFlags(II);
return FNeg;		return FNeg;
}		}
break;		break;
}		}
case Intrinsic::ppc_altivec_lvx:
case Intrinsic::ppc_altivec_lvxl:
// Turn PPC lvx -> load if the pointer is known aligned.
if (getOrEnforceKnownAlignment(II->getArgOperand(0), Align(16), DL, II, &AC,
&DT) >= 16) {
Value *Ptr = Builder.CreateBitCast(II->getArgOperand(0),
PointerType::getUnqual(II->getType()));
return new LoadInst(II->getType(), Ptr, "", false, Align(16));
}
break;
case Intrinsic::ppc_vsx_lxvw4x:
case Intrinsic::ppc_vsx_lxvd2x: {
// Turn PPC VSX loads into normal loads.
Value *Ptr = Builder.CreateBitCast(II->getArgOperand(0),
PointerType::getUnqual(II->getType()));
return new LoadInst(II->getType(), Ptr, Twine(""), false, Align(1));
}
case Intrinsic::ppc_altivec_stvx:
case Intrinsic::ppc_altivec_stvxl:
// Turn stvx -> store if the pointer is known aligned.
if (getOrEnforceKnownAlignment(II->getArgOperand(1), Align(16), DL, II, &AC,
&DT) >= 16) {
Type *OpPtrTy =
PointerType::getUnqual(II->getArgOperand(0)->getType());
Value *Ptr = Builder.CreateBitCast(II->getArgOperand(1), OpPtrTy);
return new StoreInst(II->getArgOperand(0), Ptr, false, Align(16));
}
break;
case Intrinsic::ppc_vsx_stxvw4x:
case Intrinsic::ppc_vsx_stxvd2x: {
// Turn PPC VSX stores into normal stores.
Type *OpPtrTy = PointerType::getUnqual(II->getArgOperand(0)->getType());
Value *Ptr = Builder.CreateBitCast(II->getArgOperand(1), OpPtrTy);
return new StoreInst(II->getArgOperand(0), Ptr, false, Align(1));
}
case Intrinsic::ppc_qpx_qvlfs:
// Turn PPC QPX qvlfs -> load if the pointer is known aligned.
if (getOrEnforceKnownAlignment(II->getArgOperand(0), Align(16), DL, II, &AC,
&DT) >= 16) {
Type *VTy =
VectorType::get(Builder.getFloatTy(),
cast<VectorType>(II->getType())->getElementCount());
Value *Ptr = Builder.CreateBitCast(II->getArgOperand(0),
PointerType::getUnqual(VTy));
Value *Load = Builder.CreateLoad(VTy, Ptr);
return new FPExtInst(Load, II->getType());
}
break;
case Intrinsic::ppc_qpx_qvlfd:
// Turn PPC QPX qvlfd -> load if the pointer is known aligned.
if (getOrEnforceKnownAlignment(II->getArgOperand(0), Align(32), DL, II, &AC,
&DT) >= 32) {
Value *Ptr = Builder.CreateBitCast(II->getArgOperand(0),
PointerType::getUnqual(II->getType()));
return new LoadInst(II->getType(), Ptr, "", false, Align(32));
}
break;
case Intrinsic::ppc_qpx_qvstfs:
// Turn PPC QPX qvstfs -> store if the pointer is known aligned.
if (getOrEnforceKnownAlignment(II->getArgOperand(1), Align(16), DL, II, &AC,
&DT) >= 16) {
Type *VTy = VectorType::get(
Builder.getFloatTy(),
cast<VectorType>(II->getArgOperand(0)->getType())->getElementCount());
Value *TOp = Builder.CreateFPTrunc(II->getArgOperand(0), VTy);
Type *OpPtrTy = PointerType::getUnqual(VTy);
Value *Ptr = Builder.CreateBitCast(II->getArgOperand(1), OpPtrTy);
return new StoreInst(TOp, Ptr, false, Align(16));
}
break;
case Intrinsic::ppc_qpx_qvstfd:
// Turn PPC QPX qvstfd -> store if the pointer is known aligned.
if (getOrEnforceKnownAlignment(II->getArgOperand(1), Align(32), DL, II, &AC,
&DT) >= 32) {
Type *OpPtrTy =
PointerType::getUnqual(II->getArgOperand(0)->getType());
Value *Ptr = Builder.CreateBitCast(II->getArgOperand(1), OpPtrTy);
return new StoreInst(II->getArgOperand(0), Ptr, false, Align(32));
}
break;

case Intrinsic::x86_bmi_bextr_32:
case Intrinsic::x86_bmi_bextr_64:
case Intrinsic::x86_tbm_bextri_u32:
case Intrinsic::x86_tbm_bextri_u64:
// If the RHS is a constant we can try some simplifications.
if (auto *C = dyn_cast<ConstantInt>(II->getArgOperand(1))) {
uint64_t Shift = C->getZExtValue();
uint64_t Length = (Shift >> 8) & 0xff;
Shift &= 0xff;
unsigned BitWidth = II->getType()->getIntegerBitWidth();
// If the length is 0 or the shift is out of range, replace with zero.
if (Length == 0 || Shift >= BitWidth)
return replaceInstUsesWith(CI, ConstantInt::get(II->getType(), 0));
// If the LHS is also a constant, we can completely constant fold this.
if (auto *InC = dyn_cast<ConstantInt>(II->getArgOperand(0))) {
uint64_t Result = InC->getZExtValue() >> Shift;
if (Length > BitWidth)
Length = BitWidth;
Result &= maskTrailingOnes<uint64_t>(Length);
return replaceInstUsesWith(CI, ConstantInt::get(II->getType(), Result));
}
// TODO should we turn this into 'and' if shift is 0? Or 'shl' if we
// are only masking bits that a shift already cleared?
}
break;
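
The BEXTR constant fold above can be checked in isolation. The control operand packs the start bit in bits 0..7 and the field length in bits 8..15, and the result is the extracted field, or zero when the length is 0 or the start is out of range. The sketch below reproduces those steps on plain integers. The name foldBextr is hypothetical, and Src is assumed to be already zero-extended to 64 bits, as getZExtValue() would return it.

```cpp
#include <cstdint>

// Hypothetical standalone version of the bextr constant fold.
// BitWidth is the width of the intrinsic's integer type (32 or 64).
uint64_t foldBextr(uint64_t Src, uint64_t Control, unsigned BitWidth) {
  uint64_t Shift = Control & 0xff;
  uint64_t Length = (Control >> 8) & 0xff;

  // A zero-length field or an out-of-range start bit yields zero.
  if (Length == 0 || Shift >= BitWidth)
    return 0;

  uint64_t Result = Src >> Shift;
  if (Length > BitWidth)
    Length = BitWidth;

  // Keep the low Length bits; avoid shifting a 64-bit value by 64.
  uint64_t Mask = Length >= 64 ? ~0ULL : ((1ULL << Length) - 1);
  return Result & Mask;
}
```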

case Intrinsic::x86_bmi_bzhi_32:
case Intrinsic::x86_bmi_bzhi_64:
// If the RHS is a constant we can try some simplifications.
if (auto *C = dyn_cast<ConstantInt>(II->getArgOperand(1))) {
uint64_t Index = C->getZExtValue() & 0xff;
unsigned BitWidth = II->getType()->getIntegerBitWidth();
if (Index >= BitWidth)
return replaceInstUsesWith(CI, II->getArgOperand(0));
if (Index == 0)
return replaceInstUsesWith(CI, ConstantInt::get(II->getType(), 0));
// If the LHS is also a constant, we can completely constant fold this.
if (auto *InC = dyn_cast<ConstantInt>(II->getArgOperand(0))) {
uint64_t Result = InC->getZExtValue();
Result &= maskTrailingOnes<uint64_t>(Index);
return replaceInstUsesWith(CI, ConstantInt::get(II->getType(), Result));
}
// TODO should we convert this to an AND if the RHS is constant?
}
break;
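
The BZHI fold follows the same pattern: only the low 8 bits of the index operand are used, an index at or beyond the bit width leaves the source unchanged, and an index of 0 clears everything. A minimal standalone sketch, with the hypothetical name foldBzhi and Src assumed to be zero-extended to 64 bits:

```cpp
#include <cstdint>

// Hypothetical standalone version of the bzhi constant fold.
// BitWidth is the width of the intrinsic's integer type (32 or 64).
uint64_t foldBzhi(uint64_t Src, uint64_t IndexOperand, unsigned BitWidth) {
  uint64_t Index = IndexOperand & 0xff;
  if (Index >= BitWidth)
    return Src; // Nothing is cleared.
  if (Index == 0)
    return 0; // Everything is cleared.
  // Here 0 < Index < BitWidth <= 64, so the shift is well defined.
  return Src & ((1ULL << Index) - 1);
}
```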
case Intrinsic::x86_bmi_pext_32:
case Intrinsic::x86_bmi_pext_64:
if (auto *MaskC = dyn_cast<ConstantInt>(II->getArgOperand(1))) {
if (MaskC->isNullValue())
return replaceInstUsesWith(CI, ConstantInt::get(II->getType(), 0));
if (MaskC->isAllOnesValue())
return replaceInstUsesWith(CI, II->getArgOperand(0));

if (auto *SrcC = dyn_cast<ConstantInt>(II->getArgOperand(0))) {
uint64_t Src = SrcC->getZExtValue();
uint64_t Mask = MaskC->getZExtValue();
uint64_t Result = 0;
uint64_t BitToSet = 1;

while (Mask) {
// Isolate lowest set bit.
uint64_t BitToTest = Mask & -Mask;
if (BitToTest & Src)
Result |= BitToSet;

BitToSet <<= 1;
// Clear lowest set bit.
Mask &= Mask - 1;
}

return replaceInstUsesWith(CI, ConstantInt::get(II->getType(), Result));
}
}
break;
case Intrinsic::x86_bmi_pdep_32:
case Intrinsic::x86_bmi_pdep_64:
if (auto *MaskC = dyn_cast<ConstantInt>(II->getArgOperand(1))) {
if (MaskC->isNullValue())
return replaceInstUsesWith(CI, ConstantInt::get(II->getType(), 0));
if (MaskC->isAllOnesValue())
return replaceInstUsesWith(CI, II->getArgOperand(0));

if (auto *SrcC = dyn_cast<ConstantInt>(II->getArgOperand(0))) {
uint64_t Src = SrcC->getZExtValue();
uint64_t Mask = MaskC->getZExtValue();
uint64_t Result = 0;
uint64_t BitToTest = 1;

while (Mask) {
// Isolate lowest set bit.
uint64_t BitToSet = Mask & -Mask;
if (BitToTest & Src)
Result |= BitToSet;

BitToTest <<= 1;
// Clear lowest set bit;
Mask &= Mask - 1;
}

return replaceInstUsesWith(CI, ConstantInt::get(II->getType(), Result));
}
}
break;
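
The two loops above are software emulations of PEXT and PDEP, and they are mirror images of each other: PEXT walks the mask's set bits and packs the corresponding source bits into the low end of the result, while PDEP takes consecutive low source bits and scatters them to the mask's set-bit positions. The sketch below lifts both loops into standalone functions. The names pext and pdep are chosen here for illustration, and `~Mask + 1` is used in place of `-Mask` only to avoid compiler warnings about negating an unsigned value.

```cpp
#include <cstdint>

// Gather the bits of Src selected by Mask into the low bits of the result.
uint64_t pext(uint64_t Src, uint64_t Mask) {
  uint64_t Result = 0;
  uint64_t BitToSet = 1;
  while (Mask) {
    uint64_t BitToTest = Mask & (~Mask + 1); // Isolate lowest set bit.
    if (BitToTest & Src)
      Result |= BitToSet;
    BitToSet <<= 1;
    Mask &= Mask - 1; // Clear lowest set bit.
  }
  return Result;
}

// Scatter the low bits of Src to the positions of the set bits in Mask.
uint64_t pdep(uint64_t Src, uint64_t Mask) {
  uint64_t Result = 0;
  uint64_t BitToTest = 1;
  while (Mask) {
    uint64_t BitToSet = Mask & (~Mask + 1); // Isolate lowest set bit.
    if (BitToTest & Src)
      Result |= BitToSet;
    BitToTest <<= 1;
    Mask &= Mask - 1; // Clear lowest set bit.
  }
  return Result;
}
```

For any mask, extracting what was just deposited gives back the low bits of the source that the mask had room for, which makes a convenient sanity check for the pair.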

case Intrinsic::x86_sse_cvtss2si:
case Intrinsic::x86_sse_cvtss2si64:
case Intrinsic::x86_sse_cvttss2si:
case Intrinsic::x86_sse_cvttss2si64:
case Intrinsic::x86_sse2_cvtsd2si:
case Intrinsic::x86_sse2_cvtsd2si64:
case Intrinsic::x86_sse2_cvttsd2si:
case Intrinsic::x86_sse2_cvttsd2si64:
case Intrinsic::x86_avx512_vcvtss2si32:
case Intrinsic::x86_avx512_vcvtss2si64:
case Intrinsic::x86_avx512_vcvtss2usi32:
case Intrinsic::x86_avx512_vcvtss2usi64:
case Intrinsic::x86_avx512_vcvtsd2si32:
case Intrinsic::x86_avx512_vcvtsd2si64:
case Intrinsic::x86_avx512_vcvtsd2usi32:
case Intrinsic::x86_avx512_vcvtsd2usi64:
case Intrinsic::x86_avx512_cvttss2si:
case Intrinsic::x86_avx512_cvttss2si64:
case Intrinsic::x86_avx512_cvttss2usi:
case Intrinsic::x86_avx512_cvttss2usi64:
case Intrinsic::x86_avx512_cvttsd2si:
case Intrinsic::x86_avx512_cvttsd2si64:
case Intrinsic::x86_avx512_cvttsd2usi:
case Intrinsic::x86_avx512_cvttsd2usi64: {
// These intrinsics only demand the 0th element of their input vectors. If
// we can simplify the input based on that, do so now.
Value *Arg = II->getArgOperand(0);
unsigned VWidth = cast<VectorType>(Arg->getType())->getNumElements();
if (Value *V = SimplifyDemandedVectorEltsLow(Arg, VWidth, 1))
return replaceOperand(*II, 0, V);
break;
}

case Intrinsic::x86_mmx_pmovmskb:
case Intrinsic::x86_sse_movmsk_ps:
case Intrinsic::x86_sse2_movmsk_pd:
case Intrinsic::x86_sse2_pmovmskb_128:
case Intrinsic::x86_avx_movmsk_pd_256:
case Intrinsic::x86_avx_movmsk_ps_256:
case Intrinsic::x86_avx2_pmovmskb:
if (Value *V = simplifyX86movmsk(*II, Builder))
return replaceInstUsesWith(*II, V);
break;

case Intrinsic::x86_sse_comieq_ss:
case Intrinsic::x86_sse_comige_ss:
case Intrinsic::x86_sse_comigt_ss:
case Intrinsic::x86_sse_comile_ss:
case Intrinsic::x86_sse_comilt_ss:
case Intrinsic::x86_sse_comineq_ss:
case Intrinsic::x86_sse_ucomieq_ss:
case Intrinsic::x86_sse_ucomige_ss:
case Intrinsic::x86_sse_ucomigt_ss:
case Intrinsic::x86_sse_ucomile_ss:
case Intrinsic::x86_sse_ucomilt_ss:
case Intrinsic::x86_sse_ucomineq_ss:
case Intrinsic::x86_sse2_comieq_sd:
case Intrinsic::x86_sse2_comige_sd:
case Intrinsic::x86_sse2_comigt_sd:
case Intrinsic::x86_sse2_comile_sd:
case Intrinsic::x86_sse2_comilt_sd:
case Intrinsic::x86_sse2_comineq_sd:
case Intrinsic::x86_sse2_ucomieq_sd:
case Intrinsic::x86_sse2_ucomige_sd:
case Intrinsic::x86_sse2_ucomigt_sd:
case Intrinsic::x86_sse2_ucomile_sd:
case Intrinsic::x86_sse2_ucomilt_sd:
case Intrinsic::x86_sse2_ucomineq_sd:
case Intrinsic::x86_avx512_vcomi_ss:
case Intrinsic::x86_avx512_vcomi_sd:
case Intrinsic::x86_avx512_mask_cmp_ss:
case Intrinsic::x86_avx512_mask_cmp_sd: {
// These intrinsics only demand the 0th element of their input vectors. If
// we can simplify the input based on that, do so now.
bool MadeChange = false;
Value *Arg0 = II->getArgOperand(0);
Value *Arg1 = II->getArgOperand(1);
unsigned VWidth = cast<VectorType>(Arg0->getType())->getNumElements();
if (Value *V = SimplifyDemandedVectorEltsLow(Arg0, VWidth, 1)) {
replaceOperand(*II, 0, V);
MadeChange = true;
}
if (Value *V = SimplifyDemandedVectorEltsLow(Arg1, VWidth, 1)) {
replaceOperand(*II, 1, V);
MadeChange = true;
}
if (MadeChange)
return II;
break;
}
case Intrinsic::x86_avx512_cmp_pd_128:
case Intrinsic::x86_avx512_cmp_pd_256:
case Intrinsic::x86_avx512_cmp_pd_512:
case Intrinsic::x86_avx512_cmp_ps_128:
case Intrinsic::x86_avx512_cmp_ps_256:
case Intrinsic::x86_avx512_cmp_ps_512: {
// Folding cmp(sub(a,b),0) -> cmp(a,b) and cmp(0,sub(a,b)) -> cmp(b,a)
Value *Arg0 = II->getArgOperand(0);
Value *Arg1 = II->getArgOperand(1);
bool Arg0IsZero = match(Arg0, m_PosZeroFP());
if (Arg0IsZero)
std::swap(Arg0, Arg1);
Value *A, *B;
// This fold requires only the NINF(not +/- inf) since inf minus
// inf is nan.
// NSZ(No Signed Zeros) is not needed because zeros of any sign are
// equal for both compares.
// NNAN is not needed because nans compare the same for both compares.
// The compare intrinsic uses the above assumptions and therefore
// doesn't require additional flags.
if ((match(Arg0, m_OneUse(m_FSub(m_Value(A), m_Value(B)))) &&
match(Arg1, m_PosZeroFP()) && isa<Instruction>(Arg0) &&
cast<Instruction>(Arg0)->getFastMathFlags().noInfs())) {
if (Arg0IsZero)
std::swap(A, B);
replaceOperand(*II, 0, A);
replaceOperand(*II, 1, B);
return II;
}
break;
}
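The comment above says the fold needs only `ninf` because inf minus inf is NaN. A minimal standalone sketch of that corner case, using plain `double` arithmetic instead of the AVX-512 intrinsics and assuming default IEEE semantics (no `-ffast-math`); the helper names `equalViaSub` and `equalDirect` are hypothetical:

```cpp
#include <cassert>
#include <limits>

// Compare through a subtraction, as in cmp(sub(a, b), 0).
bool equalViaSub(double A, double B) { return (A - B) == 0.0; }

// Compare the operands directly, as in cmp(a, b).
bool equalDirect(double A, double B) { return A == B; }
```

For finite inputs the two helpers agree, but for `A == B == +inf` the subtraction yields NaN and `equalViaSub` returns false while `equalDirect` returns true, which is why the fold is guarded by `noInfs()`.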

case Intrinsic::x86_avx512_add_ps_512:
case Intrinsic::x86_avx512_div_ps_512:
case Intrinsic::x86_avx512_mul_ps_512:
case Intrinsic::x86_avx512_sub_ps_512:
case Intrinsic::x86_avx512_add_pd_512:
case Intrinsic::x86_avx512_div_pd_512:
case Intrinsic::x86_avx512_mul_pd_512:
case Intrinsic::x86_avx512_sub_pd_512:
// If the rounding mode is CUR_DIRECTION(4) we can turn these into regular
// IR operations.
if (auto *R = dyn_cast<ConstantInt>(II->getArgOperand(2))) {
if (R->getValue() == 4) {
Value *Arg0 = II->getArgOperand(0);
Value *Arg1 = II->getArgOperand(1);

Value *V;
switch (IID) {
default: llvm_unreachable("Case stmts out of sync!");
case Intrinsic::x86_avx512_add_ps_512:
case Intrinsic::x86_avx512_add_pd_512:
V = Builder.CreateFAdd(Arg0, Arg1);
break;
case Intrinsic::x86_avx512_sub_ps_512:
case Intrinsic::x86_avx512_sub_pd_512:
V = Builder.CreateFSub(Arg0, Arg1);
break;
case Intrinsic::x86_avx512_mul_ps_512:
case Intrinsic::x86_avx512_mul_pd_512:
V = Builder.CreateFMul(Arg0, Arg1);
break;
case Intrinsic::x86_avx512_div_ps_512:
case Intrinsic::x86_avx512_div_pd_512:
V = Builder.CreateFDiv(Arg0, Arg1);
break;
}

return replaceInstUsesWith(*II, V);
}
}
break;

case Intrinsic::x86_avx512_mask_add_ss_round:
case Intrinsic::x86_avx512_mask_div_ss_round:
case Intrinsic::x86_avx512_mask_mul_ss_round:
case Intrinsic::x86_avx512_mask_sub_ss_round:
case Intrinsic::x86_avx512_mask_add_sd_round:
case Intrinsic::x86_avx512_mask_div_sd_round:
case Intrinsic::x86_avx512_mask_mul_sd_round:
case Intrinsic::x86_avx512_mask_sub_sd_round:
// If the rounding mode is CUR_DIRECTION(4) we can turn these into regular
// IR operations.
if (auto *R = dyn_cast<ConstantInt>(II->getArgOperand(4))) {
if (R->getValue() == 4) {
// Extract the element as scalars.
Value *Arg0 = II->getArgOperand(0);
Value *Arg1 = II->getArgOperand(1);
Value *LHS = Builder.CreateExtractElement(Arg0, (uint64_t)0);
Value *RHS = Builder.CreateExtractElement(Arg1, (uint64_t)0);

Value *V;
switch (IID) {
default: llvm_unreachable("Case stmts out of sync!");
case Intrinsic::x86_avx512_mask_add_ss_round:
case Intrinsic::x86_avx512_mask_add_sd_round:
V = Builder.CreateFAdd(LHS, RHS);
break;
case Intrinsic::x86_avx512_mask_sub_ss_round:
case Intrinsic::x86_avx512_mask_sub_sd_round:
V = Builder.CreateFSub(LHS, RHS);
break;
case Intrinsic::x86_avx512_mask_mul_ss_round:
case Intrinsic::x86_avx512_mask_mul_sd_round:
V = Builder.CreateFMul(LHS, RHS);
break;
case Intrinsic::x86_avx512_mask_div_ss_round:
case Intrinsic::x86_avx512_mask_div_sd_round:
V = Builder.CreateFDiv(LHS, RHS);
break;
}

// Handle the masking aspect of the intrinsic.
Value *Mask = II->getArgOperand(3);
auto *C = dyn_cast<ConstantInt>(Mask);
// We don't need a select if we know the mask bit is a 1.
if (!C || !C->getValue()[0]) {
// Cast the mask to an i1 vector and then extract the lowest element.
auto *MaskTy = FixedVectorType::get(
Builder.getInt1Ty(),
cast<IntegerType>(Mask->getType())->getBitWidth());
Mask = Builder.CreateBitCast(Mask, MaskTy);
Mask = Builder.CreateExtractElement(Mask, (uint64_t)0);
// Extract the lowest element from the passthru operand.
Value *Passthru = Builder.CreateExtractElement(II->getArgOperand(2),
(uint64_t)0);
V = Builder.CreateSelect(Mask, V, Passthru);
}

// Insert the result back into the original argument 0.
V = Builder.CreateInsertElement(Arg0, V, (uint64_t)0);

return replaceInstUsesWith(*II, V);
}
}
break;
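The masked scalar case above lowers to extract, scalar op, select on mask bit 0, and insert back into argument 0. A rough scalar model of the resulting value for the `add_ss` variant, assuming a 4 x float vector; `maskedAddSS` is a hypothetical name used only for illustration:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of the IR produced for mask_add_ss_round with
// rounding mode CUR_DIRECTION: only lane 0 is computed, and lanes 1..3
// are taken from the first argument.
std::array<float, 4> maskedAddSS(const std::array<float, 4> &A,
                                 const std::array<float, 4> &B,
                                 const std::array<float, 4> &Passthru,
                                 std::uint8_t Mask) {
  std::array<float, 4> Result = A;  // insertelement into the original arg 0
  // select(mask bit 0, A[0] + B[0], Passthru[0])
  Result[0] = (Mask & 1) ? A[0] + B[0] : Passthru[0];
  return Result;
}
```

When the mask is a constant with bit 0 set, the select disappears, which corresponds to the `!C || !C->getValue()[0]` check in the code above.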

// Constant fold ashr( <A x Bi>, Ci ).
// Constant fold lshr( <A x Bi>, Ci ).
// Constant fold shl( <A x Bi>, Ci ).
case Intrinsic::x86_sse2_psrai_d:
case Intrinsic::x86_sse2_psrai_w:
case Intrinsic::x86_avx2_psrai_d:
case Intrinsic::x86_avx2_psrai_w:
case Intrinsic::x86_avx512_psrai_q_128:
case Intrinsic::x86_avx512_psrai_q_256:
case Intrinsic::x86_avx512_psrai_d_512:
case Intrinsic::x86_avx512_psrai_q_512:
case Intrinsic::x86_avx512_psrai_w_512:
case Intrinsic::x86_sse2_psrli_d:
case Intrinsic::x86_sse2_psrli_q:
case Intrinsic::x86_sse2_psrli_w:
case Intrinsic::x86_avx2_psrli_d:
case Intrinsic::x86_avx2_psrli_q:
case Intrinsic::x86_avx2_psrli_w:
case Intrinsic::x86_avx512_psrli_d_512:
case Intrinsic::x86_avx512_psrli_q_512:
case Intrinsic::x86_avx512_psrli_w_512:
case Intrinsic::x86_sse2_pslli_d:
case Intrinsic::x86_sse2_pslli_q:
case Intrinsic::x86_sse2_pslli_w:
case Intrinsic::x86_avx2_pslli_d:
case Intrinsic::x86_avx2_pslli_q:
case Intrinsic::x86_avx2_pslli_w:
case Intrinsic::x86_avx512_pslli_d_512:
case Intrinsic::x86_avx512_pslli_q_512:
case Intrinsic::x86_avx512_pslli_w_512:
if (Value *V = simplifyX86immShift(*II, Builder))
return replaceInstUsesWith(*II, V);
break;

case Intrinsic::x86_sse2_psra_d:
case Intrinsic::x86_sse2_psra_w:
case Intrinsic::x86_avx2_psra_d:
case Intrinsic::x86_avx2_psra_w:
case Intrinsic::x86_avx512_psra_q_128:
case Intrinsic::x86_avx512_psra_q_256:
case Intrinsic::x86_avx512_psra_d_512:
case Intrinsic::x86_avx512_psra_q_512:
case Intrinsic::x86_avx512_psra_w_512:
case Intrinsic::x86_sse2_psrl_d:
case Intrinsic::x86_sse2_psrl_q:
case Intrinsic::x86_sse2_psrl_w:
case Intrinsic::x86_avx2_psrl_d:
case Intrinsic::x86_avx2_psrl_q:
case Intrinsic::x86_avx2_psrl_w:
case Intrinsic::x86_avx512_psrl_d_512:
case Intrinsic::x86_avx512_psrl_q_512:
case Intrinsic::x86_avx512_psrl_w_512:
case Intrinsic::x86_sse2_psll_d:
case Intrinsic::x86_sse2_psll_q:
case Intrinsic::x86_sse2_psll_w:
case Intrinsic::x86_avx2_psll_d:
case Intrinsic::x86_avx2_psll_q:
case Intrinsic::x86_avx2_psll_w:
case Intrinsic::x86_avx512_psll_d_512:
case Intrinsic::x86_avx512_psll_q_512:
case Intrinsic::x86_avx512_psll_w_512: {
if (Value *V = simplifyX86immShift(*II, Builder))
return replaceInstUsesWith(*II, V);

// SSE2/AVX2 uses only the first 64-bits of the 128-bit vector
// operand to compute the shift amount.
Value *Arg1 = II->getArgOperand(1);
assert(Arg1->getType()->getPrimitiveSizeInBits() == 128 &&
"Unexpected packed shift size");
unsigned VWidth = cast<VectorType>(Arg1->getType())->getNumElements();

if (Value *V = SimplifyDemandedVectorEltsLow(Arg1, VWidth, VWidth / 2))
return replaceOperand(*II, 1, V);
break;
}

case Intrinsic::x86_avx2_psllv_d:
case Intrinsic::x86_avx2_psllv_d_256:
case Intrinsic::x86_avx2_psllv_q:
case Intrinsic::x86_avx2_psllv_q_256:
case Intrinsic::x86_avx512_psllv_d_512:
case Intrinsic::x86_avx512_psllv_q_512:
case Intrinsic::x86_avx512_psllv_w_128:
case Intrinsic::x86_avx512_psllv_w_256:
case Intrinsic::x86_avx512_psllv_w_512:
case Intrinsic::x86_avx2_psrav_d:
case Intrinsic::x86_avx2_psrav_d_256:
case Intrinsic::x86_avx512_psrav_q_128:
case Intrinsic::x86_avx512_psrav_q_256:
case Intrinsic::x86_avx512_psrav_d_512:
case Intrinsic::x86_avx512_psrav_q_512:
case Intrinsic::x86_avx512_psrav_w_128:
case Intrinsic::x86_avx512_psrav_w_256:
case Intrinsic::x86_avx512_psrav_w_512:
case Intrinsic::x86_avx2_psrlv_d:
case Intrinsic::x86_avx2_psrlv_d_256:
case Intrinsic::x86_avx2_psrlv_q:
case Intrinsic::x86_avx2_psrlv_q_256:
case Intrinsic::x86_avx512_psrlv_d_512:
case Intrinsic::x86_avx512_psrlv_q_512:
case Intrinsic::x86_avx512_psrlv_w_128:
case Intrinsic::x86_avx512_psrlv_w_256:
case Intrinsic::x86_avx512_psrlv_w_512:
if (Value *V = simplifyX86varShift(*II, Builder))
return replaceInstUsesWith(*II, V);
break;

case Intrinsic::x86_sse2_packssdw_128:
case Intrinsic::x86_sse2_packsswb_128:
case Intrinsic::x86_avx2_packssdw:
case Intrinsic::x86_avx2_packsswb:
case Intrinsic::x86_avx512_packssdw_512:
case Intrinsic::x86_avx512_packsswb_512:
if (Value *V = simplifyX86pack(*II, Builder, true))
return replaceInstUsesWith(*II, V);
break;

case Intrinsic::x86_sse2_packuswb_128:
case Intrinsic::x86_sse41_packusdw:
case Intrinsic::x86_avx2_packusdw:
case Intrinsic::x86_avx2_packuswb:
case Intrinsic::x86_avx512_packusdw_512:
case Intrinsic::x86_avx512_packuswb_512:
if (Value *V = simplifyX86pack(*II, Builder, false))
return replaceInstUsesWith(*II, V);
break;

case Intrinsic::x86_pclmulqdq:
case Intrinsic::x86_pclmulqdq_256:
case Intrinsic::x86_pclmulqdq_512: {
if (auto *C = dyn_cast<ConstantInt>(II->getArgOperand(2))) {
unsigned Imm = C->getZExtValue();

bool MadeChange = false;
Value *Arg0 = II->getArgOperand(0);
Value *Arg1 = II->getArgOperand(1);
unsigned VWidth = cast<VectorType>(Arg0->getType())->getNumElements();

APInt UndefElts1(VWidth, 0);
APInt DemandedElts1 = APInt::getSplat(VWidth,
APInt(2, (Imm & 0x01) ? 2 : 1));
if (Value *V = SimplifyDemandedVectorElts(Arg0, DemandedElts1,
UndefElts1)) {
replaceOperand(*II, 0, V);
MadeChange = true;
}

APInt UndefElts2(VWidth, 0);
APInt DemandedElts2 = APInt::getSplat(VWidth,
APInt(2, (Imm & 0x10) ? 2 : 1));
if (Value *V = SimplifyDemandedVectorElts(Arg1, DemandedElts2,
UndefElts2)) {
replaceOperand(*II, 1, V);
MadeChange = true;
}

// If the demanded elements of either input are undef, the result is zero.
if (DemandedElts1.isSubsetOf(UndefElts1) ||
DemandedElts2.isSubsetOf(UndefElts2))
return replaceInstUsesWith(*II,
ConstantAggregateZero::get(II->getType()));

if (MadeChange)
return II;
}
break;
}
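The PCLMULQDQ case builds its demanded-element masks by splatting a 2-bit pattern across the vector: `01` when the immediate selects the low quadword of each 128-bit lane, `10` when it selects the high one. A small sketch of that computation with a plain integer in place of `APInt`; `demandedEltsMask` is a hypothetical helper, and the 32-element limit is an assumption of this sketch only:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for APInt::getSplat(VWidth, APInt(2, SelectHigh ? 2 : 1)).
// Bit I of the result is set when i64 element I of the operand is demanded.
std::uint32_t demandedEltsMask(unsigned VWidth, bool SelectHigh) {
  assert(VWidth % 2 == 0 && VWidth <= 32 && "expects whole 128-bit lanes");
  std::uint32_t Mask = 0;
  for (unsigned I = 0; I < VWidth; I += 2)
    Mask |= (SelectHigh ? 2u : 1u) << I;  // one 2-bit pattern per lane
  return Mask;
}
```

For operand 0 the selector is `Imm & 0x01` and for operand 1 it is `Imm & 0x10`, so `demandedEltsMask(8, (Imm & 0x10) != 0)` would model `DemandedElts2` for the 512-bit variant.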

case Intrinsic::x86_sse41_insertps:
if (Value *V = simplifyX86insertps(*II, Builder))
return replaceInstUsesWith(*II, V);
break;

case Intrinsic::x86_sse4a_extrq: {
Value *Op0 = II->getArgOperand(0);
Value *Op1 = II->getArgOperand(1);
unsigned VWidth0 = cast<VectorType>(Op0->getType())->getNumElements();
unsigned VWidth1 = cast<VectorType>(Op1->getType())->getNumElements();
assert(Op0->getType()->getPrimitiveSizeInBits() == 128 &&
Op1->getType()->getPrimitiveSizeInBits() == 128 && VWidth0 == 2 &&
VWidth1 == 16 && "Unexpected operand sizes");

// See if we're dealing with constant values.
Constant *C1 = dyn_cast<Constant>(Op1);
ConstantInt *CILength =
C1 ? dyn_cast_or_null<ConstantInt>(C1->getAggregateElement((unsigned)0))
: nullptr;
ConstantInt *CIIndex =
C1 ? dyn_cast_or_null<ConstantInt>(C1->getAggregateElement((unsigned)1))
: nullptr;

// Attempt to simplify to a constant, shuffle vector or EXTRQI call.
if (Value *V = simplifyX86extrq(*II, Op0, CILength, CIIndex, Builder))
return replaceInstUsesWith(*II, V);

// EXTRQ only uses the lowest 64-bits of the first 128-bit vector
// operands and the lowest 16-bits of the second.
bool MadeChange = false;
if (Value *V = SimplifyDemandedVectorEltsLow(Op0, VWidth0, 1)) {
replaceOperand(*II, 0, V);
MadeChange = true;
}
if (Value *V = SimplifyDemandedVectorEltsLow(Op1, VWidth1, 2)) {
replaceOperand(*II, 1, V);
MadeChange = true;
}
if (MadeChange)
return II;
break;
}

case Intrinsic::x86_sse4a_extrqi: {
// EXTRQI: Extract Length bits starting from Index. Zero pad the remaining
// bits of the lower 64-bits. The upper 64-bits are undefined.
Value *Op0 = II->getArgOperand(0);
unsigned VWidth = cast<VectorType>(Op0->getType())->getNumElements();
assert(Op0->getType()->getPrimitiveSizeInBits() == 128 && VWidth == 2 &&
"Unexpected operand size");

// See if we're dealing with constant values.
ConstantInt *CILength = dyn_cast<ConstantInt>(II->getArgOperand(1));
ConstantInt *CIIndex = dyn_cast<ConstantInt>(II->getArgOperand(2));

// Attempt to simplify to a constant or shuffle vector.
if (Value *V = simplifyX86extrq(*II, Op0, CILength, CIIndex, Builder))
return replaceInstUsesWith(*II, V);

// EXTRQI only uses the lowest 64-bits of the first 128-bit vector
// operand.
if (Value *V = SimplifyDemandedVectorEltsLow(Op0, VWidth, 1))
return replaceOperand(*II, 0, V);
break;
}

case Intrinsic::x86_sse4a_insertq: {
Value *Op0 = II->getArgOperand(0);
Value *Op1 = II->getArgOperand(1);
unsigned VWidth = cast<VectorType>(Op0->getType())->getNumElements();
assert(Op0->getType()->getPrimitiveSizeInBits() == 128 &&
Op1->getType()->getPrimitiveSizeInBits() == 128 && VWidth == 2 &&
cast<VectorType>(Op1->getType())->getNumElements() == 2 &&
"Unexpected operand size");

// See if we're dealing with constant values.
Constant *C1 = dyn_cast<Constant>(Op1);
ConstantInt *CI11 =
C1 ? dyn_cast_or_null<ConstantInt>(C1->getAggregateElement((unsigned)1))
: nullptr;

// Attempt to simplify to a constant, shuffle vector or INSERTQI call.
if (CI11) {
const APInt &V11 = CI11->getValue();
APInt Len = V11.zextOrTrunc(6);
APInt Idx = V11.lshr(8).zextOrTrunc(6);
if (Value *V = simplifyX86insertq(*II, Op0, Op1, Len, Idx, Builder))
return replaceInstUsesWith(*II, V);
}

// INSERTQ only uses the lowest 64-bits of the first 128-bit vector
// operand.
if (Value *V = SimplifyDemandedVectorEltsLow(Op0, VWidth, 1))
return replaceOperand(*II, 0, V);
break;
}

case Intrinsic::x86_sse4a_insertqi: {
// INSERTQI: Extract lowest Length bits from lower half of second source and
// insert over first source starting at Index bit. The upper 64-bits are
// undefined.
Value *Op0 = II->getArgOperand(0);
Value *Op1 = II->getArgOperand(1);
unsigned VWidth0 = cast<VectorType>(Op0->getType())->getNumElements();
unsigned VWidth1 = cast<VectorType>(Op1->getType())->getNumElements();
assert(Op0->getType()->getPrimitiveSizeInBits() == 128 &&
Op1->getType()->getPrimitiveSizeInBits() == 128 && VWidth0 == 2 &&
VWidth1 == 2 && "Unexpected operand sizes");

// See if we're dealing with constant values.
ConstantInt *CILength = dyn_cast<ConstantInt>(II->getArgOperand(2));
ConstantInt *CIIndex = dyn_cast<ConstantInt>(II->getArgOperand(3));

// Attempt to simplify to a constant or shuffle vector.
if (CILength && CIIndex) {
APInt Len = CILength->getValue().zextOrTrunc(6);
APInt Idx = CIIndex->getValue().zextOrTrunc(6);
if (Value *V = simplifyX86insertq(*II, Op0, Op1, Len, Idx, Builder))
return replaceInstUsesWith(*II, V);
}

// INSERTQI only uses the lowest 64-bits of the first two 128-bit vector
// operands.
bool MadeChange = false;
if (Value *V = SimplifyDemandedVectorEltsLow(Op0, VWidth0, 1)) {
replaceOperand(*II, 0, V);
MadeChange = true;
}
if (Value *V = SimplifyDemandedVectorEltsLow(Op1, VWidth1, 1)) {
replaceOperand(*II, 1, V);
MadeChange = true;
}
if (MadeChange)
return II;
break;
}

case Intrinsic::x86_sse41_pblendvb:
case Intrinsic::x86_sse41_blendvps:
case Intrinsic::x86_sse41_blendvpd:
case Intrinsic::x86_avx_blendv_ps_256:
case Intrinsic::x86_avx_blendv_pd_256:
case Intrinsic::x86_avx2_pblendvb: {
// fold (blend A, A, Mask) -> A
Value *Op0 = II->getArgOperand(0);
Value *Op1 = II->getArgOperand(1);
Value *Mask = II->getArgOperand(2);
if (Op0 == Op1)
return replaceInstUsesWith(CI, Op0);

// Zero Mask - select 1st argument.
if (isa<ConstantAggregateZero>(Mask))
return replaceInstUsesWith(CI, Op0);

// Constant Mask - select 1st/2nd argument lane based on top bit of mask.
if (auto *ConstantMask = dyn_cast<ConstantDataVector>(Mask)) {
Constant *NewSelector = getNegativeIsTrueBoolVec(ConstantMask);
return SelectInst::Create(NewSelector, Op1, Op0, "blendv");
}

// Convert to a vector select if we can bypass casts and find a boolean
// vector condition value.
Value *BoolVec;
Mask = peekThroughBitcast(Mask);
if (match(Mask, m_SExt(m_Value(BoolVec))) &&
BoolVec->getType()->isVectorTy() &&
BoolVec->getType()->getScalarSizeInBits() == 1) {
assert(Mask->getType()->getPrimitiveSizeInBits() ==
II->getType()->getPrimitiveSizeInBits() &&
"Not expecting mask and operands with different sizes");

unsigned NumMaskElts =
cast<VectorType>(Mask->getType())->getNumElements();
unsigned NumOperandElts =
cast<VectorType>(II->getType())->getNumElements();
if (NumMaskElts == NumOperandElts)
return SelectInst::Create(BoolVec, Op1, Op0);

// If the mask has fewer elements than the operands, each mask bit maps to
// multiple elements of the operands. Bitcast back and forth.
if (NumMaskElts < NumOperandElts) {
Value *CastOp0 = Builder.CreateBitCast(Op0, Mask->getType());
Value *CastOp1 = Builder.CreateBitCast(Op1, Mask->getType());
Value *Sel = Builder.CreateSelect(BoolVec, CastOp1, CastOp0);
return new BitCastInst(Sel, II->getType());
}
}

break;
}

case Intrinsic::x86_ssse3_pshuf_b_128:
case Intrinsic::x86_avx2_pshuf_b:
case Intrinsic::x86_avx512_pshuf_b_512:
if (Value *V = simplifyX86pshufb(*II, Builder))
return replaceInstUsesWith(*II, V);
break;

case Intrinsic::x86_avx_vpermilvar_ps:
case Intrinsic::x86_avx_vpermilvar_ps_256:
case Intrinsic::x86_avx512_vpermilvar_ps_512:
case Intrinsic::x86_avx_vpermilvar_pd:
case Intrinsic::x86_avx_vpermilvar_pd_256:
case Intrinsic::x86_avx512_vpermilvar_pd_512:
if (Value *V = simplifyX86vpermilvar(*II, Builder))
return replaceInstUsesWith(*II, V);
break;

case Intrinsic::x86_avx2_permd:
case Intrinsic::x86_avx2_permps:
case Intrinsic::x86_avx512_permvar_df_256:
case Intrinsic::x86_avx512_permvar_df_512:
case Intrinsic::x86_avx512_permvar_di_256:
case Intrinsic::x86_avx512_permvar_di_512:
case Intrinsic::x86_avx512_permvar_hi_128:
case Intrinsic::x86_avx512_permvar_hi_256:
case Intrinsic::x86_avx512_permvar_hi_512:
case Intrinsic::x86_avx512_permvar_qi_128:
case Intrinsic::x86_avx512_permvar_qi_256:
case Intrinsic::x86_avx512_permvar_qi_512:
case Intrinsic::x86_avx512_permvar_sf_512:
case Intrinsic::x86_avx512_permvar_si_512:
if (Value *V = simplifyX86vpermv(*II, Builder))
return replaceInstUsesWith(*II, V);
break;

case Intrinsic::x86_avx_maskload_ps:
case Intrinsic::x86_avx_maskload_pd:
case Intrinsic::x86_avx_maskload_ps_256:
case Intrinsic::x86_avx_maskload_pd_256:
case Intrinsic::x86_avx2_maskload_d:
case Intrinsic::x86_avx2_maskload_q:
case Intrinsic::x86_avx2_maskload_d_256:
case Intrinsic::x86_avx2_maskload_q_256:
if (Instruction *I = simplifyX86MaskedLoad(*II, *this))
return I;
break;

case Intrinsic::x86_sse2_maskmov_dqu:
case Intrinsic::x86_avx_maskstore_ps:
case Intrinsic::x86_avx_maskstore_pd:
case Intrinsic::x86_avx_maskstore_ps_256:
case Intrinsic::x86_avx_maskstore_pd_256:
case Intrinsic::x86_avx2_maskstore_d:
case Intrinsic::x86_avx2_maskstore_q:
case Intrinsic::x86_avx2_maskstore_d_256:
case Intrinsic::x86_avx2_maskstore_q_256:
if (simplifyX86MaskedStore(*II, *this))
return nullptr;
break;

case Intrinsic::x86_addcarry_32:
case Intrinsic::x86_addcarry_64:
if (Value *V = simplifyX86addcarry(*II, Builder))
return replaceInstUsesWith(*II, V);
break;

case Intrinsic::ppc_altivec_vperm:
// Turn vperm(V1,V2,mask) -> shuffle(V1,V2,mask) if mask is a constant.
// Note that ppc_altivec_vperm has a big-endian bias, so when creating
// a vectorshuffle for little endian, we must undo the transformation
// performed on vec_perm in altivec.h. That is, we must complement
// the permutation mask with respect to 31 and reverse the order of
// V1 and V2.
if (Constant *Mask = dyn_cast<Constant>(II->getArgOperand(2))) {
assert(cast<VectorType>(Mask->getType())->getNumElements() == 16 &&
"Bad type for intrinsic!");

// Check that all of the elements are integer constants or undefs.
bool AllEltsOk = true;
for (unsigned i = 0; i != 16; ++i) {
Constant *Elt = Mask->getAggregateElement(i);
if (!Elt || !(isa<ConstantInt>(Elt) || isa<UndefValue>(Elt))) {
AllEltsOk = false;
break;
}
}

if (AllEltsOk) {
// Cast the input vectors to byte vectors.
Value *Op0 = Builder.CreateBitCast(II->getArgOperand(0),
Mask->getType());
Value *Op1 = Builder.CreateBitCast(II->getArgOperand(1),
Mask->getType());
Value *Result = UndefValue::get(Op0->getType());

// Only extract each element once.
Value *ExtractedElts[32];
memset(ExtractedElts, 0, sizeof(ExtractedElts));

for (unsigned i = 0; i != 16; ++i) {
if (isa<UndefValue>(Mask->getAggregateElement(i)))
continue;
unsigned Idx =
cast<ConstantInt>(Mask->getAggregateElement(i))->getZExtValue();
Idx &= 31; // Match the hardware behavior.
if (DL.isLittleEndian())
Idx = 31 - Idx;

if (!ExtractedElts[Idx]) {
Value *Op0ToUse = (DL.isLittleEndian()) ? Op1 : Op0;
Value *Op1ToUse = (DL.isLittleEndian()) ? Op0 : Op1;
ExtractedElts[Idx] =
Builder.CreateExtractElement(Idx < 16 ? Op0ToUse : Op1ToUse,
Builder.getInt32(Idx&15));
}

// Insert this value into the result vector.
Result = Builder.CreateInsertElement(Result, ExtractedElts[Idx],
Builder.getInt32(i));
}
return CastInst::Create(Instruction::BitCast, Result, CI.getType());
}
}
break;
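The vperm lowering above maps each constant mask byte to a source operand and a byte index, complementing the index with respect to 31 and swapping the operands on little-endian targets. A standalone sketch of just that index arithmetic; `vpermSource` is a hypothetical helper, with operand 0 standing for `V1` and operand 1 for `V2`:

```cpp
#include <cassert>
#include <utility>

// Returns {source operand (0 = V1, 1 = V2), byte index within that operand}
// for a single mask element, following the arithmetic in the case above.
std::pair<unsigned, unsigned> vpermSource(unsigned MaskElt, bool LittleEndian) {
  unsigned Idx = MaskElt & 31;  // hardware only looks at the low 5 bits
  if (LittleEndian)
    Idx = 31 - Idx;             // complement with respect to 31
  unsigned Half = Idx < 16 ? 0 : 1;
  // On little endian the operands are swapped, so the first half reads V2.
  unsigned Operand = LittleEndian ? 1 - Half : Half;
  return {Operand, Idx & 15};
}
```

For example, on a big-endian target mask byte 17 reads byte 1 of `V2`, while on a little-endian target mask byte 0 reads byte 15 of `V1`.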

case Intrinsic::arm_neon_vld1: {
Align MemAlign = getKnownAlignment(II->getArgOperand(0), DL, II, &AC, &DT);
if (Value *V = simplifyNeonVld1(*II, MemAlign.value(), Builder))
return replaceInstUsesWith(*II, V);
break;
}

case Intrinsic::arm_neon_vld2:
case Intrinsic::arm_neon_vld3:
case Intrinsic::arm_neon_vld4:
case Intrinsic::arm_neon_vld2lane:
case Intrinsic::arm_neon_vld3lane:
case Intrinsic::arm_neon_vld4lane:
case Intrinsic::arm_neon_vst1:
case Intrinsic::arm_neon_vst2:
case Intrinsic::arm_neon_vst3:
case Intrinsic::arm_neon_vst4:
case Intrinsic::arm_neon_vst2lane:
case Intrinsic::arm_neon_vst3lane:
case Intrinsic::arm_neon_vst4lane: {
Align MemAlign = getKnownAlignment(II->getArgOperand(0), DL, II, &AC, &DT);
unsigned AlignArg = II->getNumArgOperands() - 1;
ConstantInt *IntrAlign = dyn_cast<ConstantInt>(II->getArgOperand(AlignArg));
if (IntrAlign && IntrAlign->getZExtValue() < MemAlign.value())
return replaceOperand(*II, AlignArg,
ConstantInt::get(Type::getInt32Ty(II->getContext()),
MemAlign.value(), false));
break;
}

case Intrinsic::arm_neon_vtbl1:
case Intrinsic::aarch64_neon_tbl1:
if (Value *V = simplifyNeonTbl1(*II, Builder))
return replaceInstUsesWith(*II, V);
break;

case Intrinsic::arm_neon_vmulls:
// [... 46 lines not shown in this diff view ...]
case Intrinsic::aarch64_crypto_aese: {
if (match(KeyArg, m_ZeroInt()) &&
match(DataArg, m_Xor(m_Value(Data), m_Value(Key)))) {
replaceOperand(*II, 0, Data);
replaceOperand(*II, 1, Key);
return II;
}
break;
}
case Intrinsic::arm_mve_pred_i2v: {
Value *Arg = II->getArgOperand(0);
Value *ArgArg;
if (match(Arg, m_Intrinsic<Intrinsic::arm_mve_pred_v2i>(m_Value(ArgArg))) &&
II->getType() == ArgArg->getType())
return replaceInstUsesWith(*II, ArgArg);
Constant *XorMask;
if (match(Arg,
m_Xor(m_Intrinsic<Intrinsic::arm_mve_pred_v2i>(m_Value(ArgArg)),
m_Constant(XorMask))) &&
II->getType() == ArgArg->getType()) {
if (auto *CI = dyn_cast<ConstantInt>(XorMask)) {
if (CI->getValue().trunc(16).isAllOnesValue()) {
auto TrueVector = Builder.CreateVectorSplat(
cast<VectorType>(II->getType())->getNumElements(),
Builder.getTrue());
return BinaryOperator::Create(Instruction::Xor, ArgArg, TrueVector);
}
}
}
KnownBits ScalarKnown(32);
if (SimplifyDemandedBits(II, 0, APInt::getLowBitsSet(32, 16),
ScalarKnown, 0))
return II;
break;
}
case Intrinsic::arm_mve_pred_v2i: {
Value *Arg = II->getArgOperand(0);
Value *ArgArg;
if (match(Arg, m_Intrinsic<Intrinsic::arm_mve_pred_i2v>(m_Value(ArgArg))))
return replaceInstUsesWith(*II, ArgArg);
if (!II->getMetadata(LLVMContext::MD_range)) {
Type *IntTy32 = Type::getInt32Ty(II->getContext());
Metadata *M[] = {
ConstantAsMetadata::get(ConstantInt::get(IntTy32, 0)),
ConstantAsMetadata::get(ConstantInt::get(IntTy32, 0xFFFF))
};
II->setMetadata(LLVMContext::MD_range, MDNode::get(II->getContext(), M));
return II;
}
break;
}
case Intrinsic::arm_mve_vadc:
case Intrinsic::arm_mve_vadc_predicated: {
unsigned CarryOp =
(II->getIntrinsicID() == Intrinsic::arm_mve_vadc_predicated) ? 3 : 2;
assert(II->getArgOperand(CarryOp)->getType()->getScalarSizeInBits() == 32 &&
"Bad type for intrinsic!");

KnownBits CarryKnown(32);
if (SimplifyDemandedBits(II, CarryOp, APInt::getOneBitSet(32, 29),
CarryKnown))
return II;
break;
}
case Intrinsic::amdgcn_rcp: {
Value *Src = II->getArgOperand(0);

// TODO: Move to ConstantFolding/InstSimplify?
if (isa<UndefValue>(Src)) {
Type *Ty = II->getType();
auto *QNaN = ConstantFP::get(Ty, APFloat::getQNaN(Ty->getFltSemantics()));
return replaceInstUsesWith(CI, QNaN);
}

if (II->isStrictFP())
break;

if (const ConstantFP *C = dyn_cast<ConstantFP>(Src)) {
const APFloat &ArgVal = C->getValueAPF();
APFloat Val(ArgVal.getSemantics(), 1);
Val.divide(ArgVal, APFloat::rmNearestTiesToEven);

// This is more precise than the instruction may give.
//
// TODO: The instruction always flushes denormal results (except for f16),
// should this also?
return replaceInstUsesWith(CI, ConstantFP::get(II->getContext(), Val));
}

break;
}
case Intrinsic::amdgcn_rsq: {
Value *Src = II->getArgOperand(0);

// TODO: Move to ConstantFolding/InstSimplify?
if (isa<UndefValue>(Src)) {
Type *Ty = II->getType();
auto *QNaN = ConstantFP::get(Ty, APFloat::getQNaN(Ty->getFltSemantics()));
return replaceInstUsesWith(CI, QNaN);
}

break;
}
case Intrinsic::amdgcn_frexp_mant:
case Intrinsic::amdgcn_frexp_exp: {
Value *Src = II->getArgOperand(0);
if (const ConstantFP *C = dyn_cast<ConstantFP>(Src)) {
int Exp;
APFloat Significand = frexp(C->getValueAPF(), Exp,
APFloat::rmNearestTiesToEven);

if (IID == Intrinsic::amdgcn_frexp_mant) {
return replaceInstUsesWith(CI, ConstantFP::get(II->getContext(),
Significand));
}

// Match instruction special case behavior.
if (Exp == APFloat::IEK_NaN || Exp == APFloat::IEK_Inf)
Exp = 0;

return replaceInstUsesWith(CI, ConstantInt::get(II->getType(), Exp));
}

if (isa<UndefValue>(Src))
return replaceInstUsesWith(CI, UndefValue::get(II->getType()));

break;
}
case Intrinsic::amdgcn_class: {
enum {
S_NAN = 1 << 0, // Signaling NaN
Q_NAN = 1 << 1, // Quiet NaN
N_INFINITY = 1 << 2, // Negative infinity
N_NORMAL = 1 << 3, // Negative normal
N_SUBNORMAL = 1 << 4, // Negative subnormal
N_ZERO = 1 << 5, // Negative zero
P_ZERO = 1 << 6, // Positive zero
P_SUBNORMAL = 1 << 7, // Positive subnormal
P_NORMAL = 1 << 8, // Positive normal
P_INFINITY = 1 << 9 // Positive infinity
};

const uint32_t FullMask = S_NAN | Q_NAN | N_INFINITY | N_NORMAL |
N_SUBNORMAL | N_ZERO | P_ZERO | P_SUBNORMAL | P_NORMAL | P_INFINITY;

Value *Src0 = II->getArgOperand(0);
Value *Src1 = II->getArgOperand(1);
const ConstantInt *CMask = dyn_cast<ConstantInt>(Src1);
if (!CMask) {
if (isa<UndefValue>(Src0))
return replaceInstUsesWith(*II, UndefValue::get(II->getType()));

if (isa<UndefValue>(Src1))
return replaceInstUsesWith(*II, ConstantInt::get(II->getType(), false));
break;
}

uint32_t Mask = CMask->getZExtValue();

// If all tests are made, it doesn't matter what the value is.
if ((Mask & FullMask) == FullMask)
return replaceInstUsesWith(*II, ConstantInt::get(II->getType(), true));

if ((Mask & FullMask) == 0)
return replaceInstUsesWith(*II, ConstantInt::get(II->getType(), false));

if (Mask == (S_NAN | Q_NAN)) {
// Equivalent of isnan. Replace with standard fcmp.
Value *FCmp = Builder.CreateFCmpUNO(Src0, Src0);
FCmp->takeName(II);
return replaceInstUsesWith(*II, FCmp);
}

if (Mask == (N_ZERO | P_ZERO)) {
// Equivalent of == 0.
Value *FCmp = Builder.CreateFCmpOEQ(
Src0, ConstantFP::get(Src0->getType(), 0.0));

FCmp->takeName(II);
return replaceInstUsesWith(*II, FCmp);
}

// fp_class (nnan x), qnan|snan|other -> fp_class (nnan x), other
if (((Mask & S_NAN) || (Mask & Q_NAN)) && isKnownNeverNaN(Src0, &TLI))
return replaceOperand(*II, 1, ConstantInt::get(Src1->getType(),
Mask & ~(S_NAN | Q_NAN)));

const ConstantFP *CVal = dyn_cast<ConstantFP>(Src0);
if (!CVal) {
if (isa<UndefValue>(Src0))
return replaceInstUsesWith(*II, UndefValue::get(II->getType()));

// Clamp mask to used bits
if ((Mask & FullMask) != Mask) {
CallInst *NewCall = Builder.CreateCall(II->getCalledFunction(),
{ Src0, ConstantInt::get(Src1->getType(), Mask & FullMask) }
);

NewCall->takeName(II);
return replaceInstUsesWith(*II, NewCall);
}

break;
}

const APFloat &Val = CVal->getValueAPF();

bool Result =
((Mask & S_NAN) && Val.isNaN() && Val.isSignaling()) ||
((Mask & Q_NAN) && Val.isNaN() && !Val.isSignaling()) ||
((Mask & N_INFINITY) && Val.isInfinity() && Val.isNegative()) ||
((Mask & N_NORMAL) && Val.isNormal() && Val.isNegative()) ||
((Mask & N_SUBNORMAL) && Val.isDenormal() && Val.isNegative()) ||
((Mask & N_ZERO) && Val.isZero() && Val.isNegative()) ||
((Mask & P_ZERO) && Val.isZero() && !Val.isNegative()) ||
((Mask & P_SUBNORMAL) && Val.isDenormal() && !Val.isNegative()) ||
((Mask & P_NORMAL) && Val.isNormal() && !Val.isNegative()) ||
((Mask & P_INFINITY) && Val.isInfinity() && !Val.isNegative());

return replaceInstUsesWith(*II, ConstantInt::get(II->getType(), Result));
}
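The constant-folding branch of the `amdgcn_class` case tests a value against a 10-bit class mask. A standalone sketch of the same classification on a raw IEEE binary32 bit pattern, with no LLVM dependency; the names `classBit` and `classTest` are hypothetical, and the sketch assumes the usual encoding in which the top mantissa bit marks a quiet NaN:

```cpp
#include <cassert>
#include <cstdint>

enum : std::uint32_t {
  S_NAN = 1u << 0, Q_NAN = 1u << 1, N_INFINITY = 1u << 2, N_NORMAL = 1u << 3,
  N_SUBNORMAL = 1u << 4, N_ZERO = 1u << 5, P_ZERO = 1u << 6,
  P_SUBNORMAL = 1u << 7, P_NORMAL = 1u << 8, P_INFINITY = 1u << 9
};

// Returns the single class bit that describes a binary32 bit pattern.
std::uint32_t classBit(std::uint32_t Bits) {
  bool Neg = (Bits >> 31) != 0;
  std::uint32_t Exp = (Bits >> 23) & 0xFF;
  std::uint32_t Mant = Bits & 0x7FFFFF;
  if (Exp == 0xFF) {
    if (Mant != 0)
      return (Mant & 0x400000) ? Q_NAN : S_NAN;  // quiet bit set or clear
    return Neg ? N_INFINITY : P_INFINITY;
  }
  if (Exp == 0) {
    if (Mant == 0)
      return Neg ? N_ZERO : P_ZERO;
    return Neg ? N_SUBNORMAL : P_SUBNORMAL;
  }
  return Neg ? N_NORMAL : P_NORMAL;
}

// True when the value falls into any class selected by Mask.
bool classTest(std::uint32_t Bits, std::uint32_t Mask) {
  return (classBit(Bits) & Mask) != 0;
}
```

Because every value belongs to exactly one class, a mask with all ten bits set is always true and a mask with none of them set is always false, which matches the two early returns on `FullMask` above.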
case Intrinsic::amdgcn_cvt_pkrtz: {
Value *Src0 = II->getArgOperand(0);
Value *Src1 = II->getArgOperand(1);
if (const ConstantFP *C0 = dyn_cast<ConstantFP>(Src0)) {
if (const ConstantFP *C1 = dyn_cast<ConstantFP>(Src1)) {
const fltSemantics &HalfSem
= II->getType()->getScalarType()->getFltSemantics();
bool LosesInfo;
APFloat Val0 = C0->getValueAPF();
APFloat Val1 = C1->getValueAPF();
Val0.convert(HalfSem, APFloat::rmTowardZero, &LosesInfo);
Val1.convert(HalfSem, APFloat::rmTowardZero, &LosesInfo);

Constant *Folded = ConstantVector::get({
ConstantFP::get(II->getContext(), Val0),
ConstantFP::get(II->getContext(), Val1) });
return replaceInstUsesWith(*II, Folded);
}
}

if (isa<UndefValue>(Src0) && isa<UndefValue>(Src1))
return replaceInstUsesWith(*II, UndefValue::get(II->getType()));

break;
}
case Intrinsic::amdgcn_cvt_pknorm_i16:
case Intrinsic::amdgcn_cvt_pknorm_u16:
case Intrinsic::amdgcn_cvt_pk_i16:
case Intrinsic::amdgcn_cvt_pk_u16: {
Value *Src0 = II->getArgOperand(0);
Value *Src1 = II->getArgOperand(1);

if (isa<UndefValue>(Src0) && isa<UndefValue>(Src1))
return replaceInstUsesWith(*II, UndefValue::get(II->getType()));

break;
}
case Intrinsic::amdgcn_ubfe:
case Intrinsic::amdgcn_sbfe: {
// Decompose simple cases into standard shifts.
Value *Src = II->getArgOperand(0);
if (isa<UndefValue>(Src))
return replaceInstUsesWith(*II, Src);

unsigned Width;
Type *Ty = II->getType();
unsigned IntSize = Ty->getIntegerBitWidth();

ConstantInt *CWidth = dyn_cast<ConstantInt>(II->getArgOperand(2));
if (CWidth) {
Width = CWidth->getZExtValue();
if ((Width & (IntSize - 1)) == 0)
return replaceInstUsesWith(*II, ConstantInt::getNullValue(Ty));

// Hardware ignores high bits, so remove those.
if (Width >= IntSize)
return replaceOperand(*II, 2, ConstantInt::get(CWidth->getType(),
Width & (IntSize - 1)));
}

unsigned Offset;
ConstantInt *COffset = dyn_cast<ConstantInt>(II->getArgOperand(1));
if (COffset) {
Offset = COffset->getZExtValue();
if (Offset >= IntSize)
return replaceOperand(*II, 1, ConstantInt::get(COffset->getType(),
Offset & (IntSize - 1)));
}

bool Signed = IID == Intrinsic::amdgcn_sbfe;

if (!CWidth || !COffset)
break;

// The case of Width == 0 is handled above, which makes this transformation
// safe. If Width == 0, then the ashr and lshr instructions become poison
// value since the shift amount would be equal to the bit size.
assert(Width != 0);

// TODO: This allows folding to undef when the hardware has specific
// behavior?
if (Offset + Width < IntSize) {
Value *Shl = Builder.CreateShl(Src, IntSize - Offset - Width);
Value *RightShift = Signed ? Builder.CreateAShr(Shl, IntSize - Width)
: Builder.CreateLShr(Shl, IntSize - Width);
RightShift->takeName(II);
return replaceInstUsesWith(*II, RightShift);
}

Value *RightShift = Signed ? Builder.CreateAShr(Src, Offset)
: Builder.CreateLShr(Src, Offset);

RightShift->takeName(II);
return replaceInstUsesWith(*II, RightShift);
}
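The `ubfe`/`sbfe` case decomposes a bitfield extract with constant offset and width into a left shift followed by a logical or arithmetic right shift. A minimal sketch of the same arithmetic on 32-bit integers; `ubfeViaShifts` and `sbfeViaShifts` are hypothetical names, and the signed variant assumes an arithmetic right shift for negative values (guaranteed from C++20, and the behavior of mainstream compilers before that):

```cpp
#include <cassert>
#include <cstdint>

// Unsigned bitfield extract: Width bits starting at bit Offset.
std::uint32_t ubfeViaShifts(std::uint32_t Src, unsigned Offset, unsigned Width) {
  assert(Width != 0 && Width < 32 && Offset < 32);
  if (Offset + Width < 32) {
    std::uint32_t Shl = Src << (32 - Offset - Width);  // drop the high bits
    return Shl >> (32 - Width);                        // lshr
  }
  return Src >> Offset;
}

// Signed bitfield extract: same shifts, but the right shift sign-extends.
std::int32_t sbfeViaShifts(std::int32_t Src, unsigned Offset, unsigned Width) {
  assert(Width != 0 && Width < 32 && Offset < 32);
  if (Offset + Width < 32) {
    std::uint32_t Shl = static_cast<std::uint32_t>(Src)
                        << (32 - Offset - Width);
    return static_cast<std::int32_t>(Shl) >> (32 - Width);  // ashr
  }
  return Src >> Offset;
}
```

The `Width != 0` precondition mirrors the assert in the original code: with a zero width the right shift amount would equal the bit size, which is not a valid shift.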
case Intrinsic::amdgcn_exp:
case Intrinsic::amdgcn_exp_compr: {
ConstantInt *En = cast<ConstantInt>(II->getArgOperand(1));
unsigned EnBits = En->getZExtValue();
if (EnBits == 0xf)
break; // All inputs enabled.

bool IsCompr = IID == Intrinsic::amdgcn_exp_compr;
bool Changed = false;
for (int I = 0; I < (IsCompr ? 2 : 4); ++I) {
if ((!IsCompr && (EnBits & (1 << I)) == 0) ||
(IsCompr && ((EnBits & (0x3 << (2 * I))) == 0))) {
Value *Src = II->getArgOperand(I + 2);
if (!isa<UndefValue>(Src)) {
replaceOperand(*II, I + 2, UndefValue::get(Src->getType()));
Changed = true;
}
}
}

if (Changed)
return II;

break;
}
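The export case replaces every source operand whose enable bits are clear with undef. The sketch below isolates the enable-bit test: one bit per source for `amdgcn_exp`, two bits per source for `amdgcn_exp_compr`. `unusedExportSources` is a hypothetical helper that returns a bitmask instead of rewriting operands:

```cpp
#include <cassert>

// Bit I of the result is set when exported source I is disabled by EnBits,
// i.e. when operand I + 2 of the intrinsic could be replaced with undef.
unsigned unusedExportSources(unsigned EnBits, bool IsCompr) {
  unsigned Unused = 0;
  for (int I = 0; I < (IsCompr ? 2 : 4); ++I) {
    bool Disabled = IsCompr ? (EnBits & (0x3u << (2 * I))) == 0
                            : (EnBits & (1u << I)) == 0;
    if (Disabled)
      Unused |= 1u << I;
  }
  return Unused;
}
```

With `EnBits == 0xf` nothing is unused in either mode, which is the early `break` taken at the top of the case.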
case Intrinsic::amdgcn_fmed3: {
// Note this does not preserve proper sNaN behavior if IEEE-mode is enabled
// for the shader.

Value *Src0 = II->getArgOperand(0);
Value *Src1 = II->getArgOperand(1);
Value *Src2 = II->getArgOperand(2);

// Checking for NaN before canonicalization provides better fidelity when
// mapping other operations onto fmed3 since the order of operands is
// unchanged.
CallInst *NewCall = nullptr;
if (match(Src0, m_NaN()) || isa<UndefValue>(Src0)) {
NewCall = Builder.CreateMinNum(Src1, Src2);
} else if (match(Src1, m_NaN()) || isa<UndefValue>(Src1)) {
NewCall = Builder.CreateMinNum(Src0, Src2);
} else if (match(Src2, m_NaN()) || isa<UndefValue>(Src2)) {
NewCall = Builder.CreateMaxNum(Src0, Src1);
}

if (NewCall) {
NewCall->copyFastMathFlags(II);
NewCall->takeName(II);
return replaceInstUsesWith(*II, NewCall);
}

bool Swap = false;
// Canonicalize constants to RHS operands.
//
// fmed3(c0, x, c1) -> fmed3(x, c0, c1)
if (isa<Constant>(Src0) && !isa<Constant>(Src1)) {
std::swap(Src0, Src1);
Swap = true;
}

if (isa<Constant>(Src1) && !isa<Constant>(Src2)) {
std::swap(Src1, Src2);
Swap = true;
}

if (isa<Constant>(Src0) && !isa<Constant>(Src1)) {
std::swap(Src0, Src1);
Swap = true;
}

if (Swap) {
II->setArgOperand(0, Src0);
II->setArgOperand(1, Src1);
II->setArgOperand(2, Src2);
return II;
}

if (const ConstantFP *C0 = dyn_cast<ConstantFP>(Src0)) {
if (const ConstantFP *C1 = dyn_cast<ConstantFP>(Src1)) {
if (const ConstantFP *C2 = dyn_cast<ConstantFP>(Src2)) {
APFloat Result = fmed3AMDGCN(C0->getValueAPF(), C1->getValueAPF(),
C2->getValueAPF());
return replaceInstUsesWith(*II,
ConstantFP::get(Builder.getContext(), Result));
}
}
}

break;
}
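
The constant fold at the end of this case calls `fmed3AMDGCN`, whose body is not shown in this hunk. Purely as an illustration of what a median-of-three constant fold computes, the sketch below uses a hypothetical `median3` on `double`. It is not the LLVM helper, and it assumes none of the inputs is NaN and ignores signed-zero ordering.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in: median of three non-NaN values.
// Find the largest value, then return the larger of the other two.
inline double median3(double A, double B, double C) {
  const double Max3 = std::fmax(std::fmax(A, B), C);
  if (Max3 == A)
    return std::fmax(B, C);
  if (Max3 == B)
    return std::fmax(A, C);
  return std::fmax(A, B);
}

// The result does not depend on operand order, which is why the
// canonicalization above may freely move constants to the right.
inline void checkMedian3() {
  assert(median3(1.0, 2.0, 3.0) == 2.0);
  assert(median3(3.0, 1.0, 2.0) == 2.0);
  assert(median3(2.0, 3.0, 1.0) == 2.0);
}
```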
case Intrinsic::amdgcn_icmp:
case Intrinsic::amdgcn_fcmp: {
const ConstantInt *CC = cast<ConstantInt>(II->getArgOperand(2));
// Guard against invalid arguments.
int64_t CCVal = CC->getZExtValue();
bool IsInteger = IID == Intrinsic::amdgcn_icmp;
if ((IsInteger && (CCVal < CmpInst::FIRST_ICMP_PREDICATE ||
CCVal > CmpInst::LAST_ICMP_PREDICATE)) ||
(!IsInteger && (CCVal < CmpInst::FIRST_FCMP_PREDICATE ||
CCVal > CmpInst::LAST_FCMP_PREDICATE)))
break;

Value *Src0 = II->getArgOperand(0);
Value *Src1 = II->getArgOperand(1);

if (auto *CSrc0 = dyn_cast<Constant>(Src0)) {
if (auto *CSrc1 = dyn_cast<Constant>(Src1)) {
Constant *CCmp = ConstantExpr::getCompare(CCVal, CSrc0, CSrc1);
if (CCmp->isNullValue()) {
return replaceInstUsesWith(
*II, ConstantExpr::getSExt(CCmp, II->getType()));
}

// The result of V_ICMP/V_FCMP assembly instructions (which this
// intrinsic exposes) is one bit per thread, masked with the EXEC
// register (which contains the bitmask of live threads). So a
// comparison that always returns true is the same as a read of the
// EXEC register.
Function *NewF = Intrinsic::getDeclaration(
II->getModule(), Intrinsic::read_register, II->getType());
Metadata *MDArgs[] = {MDString::get(II->getContext(), "exec")};
MDNode *MD = MDNode::get(II->getContext(), MDArgs);
Value *Args[] = {MetadataAsValue::get(II->getContext(), MD)};
CallInst *NewCall = Builder.CreateCall(NewF, Args);
NewCall->addAttribute(AttributeList::FunctionIndex,
Attribute::Convergent);
NewCall->takeName(II);
return replaceInstUsesWith(*II, NewCall);
}

// Canonicalize constants to RHS.
CmpInst::Predicate SwapPred
= CmpInst::getSwappedPredicate(static_cast<CmpInst::Predicate>(CCVal));
II->setArgOperand(0, Src1);
II->setArgOperand(1, Src0);
II->setArgOperand(2, ConstantInt::get(CC->getType(),
static_cast<int>(SwapPred)));
return II;
}

if (CCVal != CmpInst::ICMP_EQ && CCVal != CmpInst::ICMP_NE)
break;

// Canonicalize compare eq with true value to compare != 0
// llvm.amdgcn.icmp(zext (i1 x), 1, eq)
// -> llvm.amdgcn.icmp(zext (i1 x), 0, ne)
// llvm.amdgcn.icmp(sext (i1 x), -1, eq)
// -> llvm.amdgcn.icmp(sext (i1 x), 0, ne)
Value *ExtSrc;
if (CCVal == CmpInst::ICMP_EQ &&
((match(Src1, m_One()) && match(Src0, m_ZExt(m_Value(ExtSrc)))) ||
(match(Src1, m_AllOnes()) && match(Src0, m_SExt(m_Value(ExtSrc))))) &&
ExtSrc->getType()->isIntegerTy(1)) {
replaceOperand(*II, 1, ConstantInt::getNullValue(Src1->getType()));
replaceOperand(*II, 2, ConstantInt::get(CC->getType(), CmpInst::ICMP_NE));
return II;
}

CmpInst::Predicate SrcPred;
Value *SrcLHS;
Value *SrcRHS;

// Fold compare eq/ne with 0 from a compare result as the predicate to the
// intrinsic. The typical use is a wave vote function in the library, which
// will be fed from a user code condition compared with 0. Fold in the
// redundant compare.

// llvm.amdgcn.icmp([sz]ext ([if]cmp pred a, b), 0, ne)
// -> llvm.amdgcn.[if]cmp(a, b, pred)
//
// llvm.amdgcn.icmp([sz]ext ([if]cmp pred a, b), 0, eq)
// -> llvm.amdgcn.[if]cmp(a, b, inv pred)
if (match(Src1, m_Zero()) &&
match(Src0,
m_ZExtOrSExt(m_Cmp(SrcPred, m_Value(SrcLHS), m_Value(SrcRHS))))) {
if (CCVal == CmpInst::ICMP_EQ)
SrcPred = CmpInst::getInversePredicate(SrcPred);

Intrinsic::ID NewIID = CmpInst::isFPPredicate(SrcPred) ?
Intrinsic::amdgcn_fcmp : Intrinsic::amdgcn_icmp;

Type *Ty = SrcLHS->getType();
if (auto *CmpType = dyn_cast<IntegerType>(Ty)) {
// Promote to next legal integer type.
unsigned Width = CmpType->getBitWidth();
unsigned NewWidth = Width;

// Don't do anything for i1 comparisons.
if (Width == 1)
break;

if (Width <= 16)
NewWidth = 16;
else if (Width <= 32)
NewWidth = 32;
else if (Width <= 64)
NewWidth = 64;
else if (Width > 64)
break; // Can't handle this.

if (Width != NewWidth) {
IntegerType *CmpTy = Builder.getIntNTy(NewWidth);
if (CmpInst::isSigned(SrcPred)) {
SrcLHS = Builder.CreateSExt(SrcLHS, CmpTy);
SrcRHS = Builder.CreateSExt(SrcRHS, CmpTy);
} else {
SrcLHS = Builder.CreateZExt(SrcLHS, CmpTy);
SrcRHS = Builder.CreateZExt(SrcRHS, CmpTy);
}
}
} else if (!Ty->isFloatTy() && !Ty->isDoubleTy() && !Ty->isHalfTy())
break;

Function *NewF =
Intrinsic::getDeclaration(II->getModule(), NewIID,
{ II->getType(),
SrcLHS->getType() });
Value *Args[] = { SrcLHS, SrcRHS,
ConstantInt::get(CC->getType(), SrcPred) };
CallInst *NewCall = Builder.CreateCall(NewF, Args);
NewCall->takeName(II);
return replaceInstUsesWith(*II, NewCall);
}

break;
}
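
The integer width promotion in the compare fold above reduces to a small lookup. The sketch below restates it outside of LLVM; `promotedCmpWidth` and `checkPromotedCmpWidth` are hypothetical names, and a return value of 0 is an assumption of this sketch standing for "the fold gives up" (the `break` paths for `i1` and for widths above 64).

```cpp
#include <cassert>

// Returns the width the compared operands are extended to, or 0 when the
// fold does not apply (i1 comparisons and anything wider than 64 bits).
inline unsigned promotedCmpWidth(unsigned Width) {
  if (Width == 1 || Width > 64)
    return 0;
  if (Width <= 16)
    return 16;
  if (Width <= 32)
    return 32;
  return 64;
}

// Widths that are already 16, 32, or 64 map to themselves, so no
// extension is inserted for them.
inline void checkPromotedCmpWidth() {
  assert(promotedCmpWidth(16) == 16);
  assert(promotedCmpWidth(32) == 32);
  assert(promotedCmpWidth(64) == 64);
}
```

When the width does change, the case above picks sign extension for signed predicates and zero extension otherwise, so the comparison result is preserved in the wider type.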
case Intrinsic::amdgcn_ballot: {
if (auto *Src = dyn_cast<ConstantInt>(II->getArgOperand(0))) {
if (Src->isZero()) {
// amdgcn.ballot(i1 0) is zero.
return replaceInstUsesWith(*II, Constant::getNullValue(II->getType()));
}

if (Src->isOne()) {
// amdgcn.ballot(i1 1) is exec.
const char *RegName = "exec";
if (II->getType()->isIntegerTy(32))
RegName = "exec_lo";
else if (!II->getType()->isIntegerTy(64))
break;

Function *NewF = Intrinsic::getDeclaration(
II->getModule(), Intrinsic::read_register, II->getType());
Metadata *MDArgs[] = {MDString::get(II->getContext(), RegName)};
MDNode *MD = MDNode::get(II->getContext(), MDArgs);
Value *Args[] = {MetadataAsValue::get(II->getContext(), MD)};
CallInst *NewCall = Builder.CreateCall(NewF, Args);
NewCall->addAttribute(AttributeList::FunctionIndex,
Attribute::Convergent);
NewCall->takeName(II);
return replaceInstUsesWith(*II, NewCall);
}
}
break;
}
case Intrinsic::amdgcn_wqm_vote: {
// wqm_vote is identity when the argument is constant.
if (!isa<Constant>(II->getArgOperand(0)))
break;

return replaceInstUsesWith(*II, II->getArgOperand(0));
}
case Intrinsic::amdgcn_kill: {
const ConstantInt *C = dyn_cast<ConstantInt>(II->getArgOperand(0));
if (!C || !C->getZExtValue())
break;

// amdgcn.kill(i1 1) is a no-op
return eraseInstFromFunction(CI);
}
case Intrinsic::amdgcn_update_dpp: {
Value *Old = II->getArgOperand(0);

auto BC = cast<ConstantInt>(II->getArgOperand(5));
auto RM = cast<ConstantInt>(II->getArgOperand(3));
auto BM = cast<ConstantInt>(II->getArgOperand(4));
if (BC->isZeroValue() ||
RM->getZExtValue() != 0xF ||
BM->getZExtValue() != 0xF ||
isa<UndefValue>(Old))
break;

// If bound_ctrl = 1, row mask = bank mask = 0xf we can omit old value.
return replaceOperand(*II, 0, UndefValue::get(Old->getType()));
}
case Intrinsic::amdgcn_permlane16:
case Intrinsic::amdgcn_permlanex16: {
// Discard vdst_in if it's not going to be read.
Value *VDstIn = II->getArgOperand(0);
if (isa<UndefValue>(VDstIn))
break;

ConstantInt *FetchInvalid = cast<ConstantInt>(II->getArgOperand(4));
ConstantInt *BoundCtrl = cast<ConstantInt>(II->getArgOperand(5));
if (!FetchInvalid->getZExtValue() && !BoundCtrl->getZExtValue())
break;

return replaceOperand(*II, 0, UndefValue::get(VDstIn->getType()));
}
case Intrinsic::amdgcn_readfirstlane:
case Intrinsic::amdgcn_readlane: {
// A constant value is trivially uniform.
if (Constant *C = dyn_cast<Constant>(II->getArgOperand(0)))
return replaceInstUsesWith(*II, C);

// The rest of these may not be safe if the exec may not be the same between
// the def and use.
Value *Src = II->getArgOperand(0);
Instruction *SrcInst = dyn_cast<Instruction>(Src);
if (SrcInst && SrcInst->getParent() != II->getParent())
break;

// readfirstlane (readfirstlane x) -> readfirstlane x
// readlane (readfirstlane x), y -> readfirstlane x
if (match(Src, m_Intrinsic<Intrinsic::amdgcn_readfirstlane>()))
return replaceInstUsesWith(*II, Src);

if (IID == Intrinsic::amdgcn_readfirstlane) {
// readfirstlane (readlane x, y) -> readlane x, y
if (match(Src, m_Intrinsic<Intrinsic::amdgcn_readlane>()))
return replaceInstUsesWith(*II, Src);
} else {
// readlane (readlane x, y), y -> readlane x, y
if (match(Src, m_Intrinsic<Intrinsic::amdgcn_readlane>(
m_Value(), m_Specific(II->getArgOperand(1)))))
return replaceInstUsesWith(*II, Src);
}

break;
}
case Intrinsic::amdgcn_ldexp: {
// FIXME: This doesn't introduce new instructions and belongs in
// InstructionSimplify.
Type *Ty = II->getType();
Value *Op0 = II->getArgOperand(0);
Value *Op1 = II->getArgOperand(1);

// Folding undef to qnan is safe regardless of the FP mode.
if (isa<UndefValue>(Op0)) {
auto *QNaN = ConstantFP::get(Ty, APFloat::getQNaN(Ty->getFltSemantics()));
return replaceInstUsesWith(*II, QNaN);
}

const APFloat *C = nullptr;
match(Op0, m_APFloat(C));

// FIXME: Should flush denorms depending on FP mode, but that's ignored
// everywhere else.
//
// These cases should be safe, even with strictfp.
// ldexp(0.0, x) -> 0.0
// ldexp(-0.0, x) -> -0.0
// ldexp(inf, x) -> inf
// ldexp(-inf, x) -> -inf
if (C && (C->isZero() || C->isInfinity()))
return replaceInstUsesWith(*II, Op0);

// With strictfp, be more careful about possibly needing to flush denormals
// or not, and snan behavior depends on ieee_mode.
if (II->isStrictFP())
break;

if (C && C->isNaN()) {
// FIXME: We just need to make the nan quiet here, but that's unavailable
// on APFloat, only IEEEfloat
auto *Quieted = ConstantFP::get(
Ty, scalbn(*C, 0, APFloat::rmNearestTiesToEven));
return replaceInstUsesWith(*II, Quieted);
}

// ldexp(x, 0) -> x
// ldexp(x, undef) -> x
if (isa<UndefValue>(Op1) || match(Op1, m_ZeroInt()))
return replaceInstUsesWith(*II, Op0);

break;
}
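
The identities listed in the ldexp comments above can be spot-checked with the standard library. This is only a sketch: `std::ldexp` on `double` stands in for `llvm.amdgcn.ldexp`, which assumes the two agree on these particular inputs, and `checkLdexpIdentities` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Spot-checks the folds: zeros and infinities pass through unchanged
// (including their sign), and an exponent of 0 returns the input.
inline void checkLdexpIdentities() {
  const double Inf = std::numeric_limits<double>::infinity();
  const int Exps[] = {-100, -1, 0, 1, 100};
  for (int Exp : Exps) {
    assert(std::ldexp(0.0, Exp) == 0.0 && !std::signbit(std::ldexp(0.0, Exp)));
    assert(std::ldexp(-0.0, Exp) == 0.0 && std::signbit(std::ldexp(-0.0, Exp)));
    assert(std::ldexp(Inf, Exp) == Inf);
    assert(std::ldexp(-Inf, Exp) == -Inf);
  }
  assert(std::ldexp(1.5, 0) == 1.5);
}
```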
case Intrinsic::hexagon_V6_vandvrt:		case Intrinsic::hexagon_V6_vandvrt:
case Intrinsic::hexagon_V6_vandvrt_128B: {		case Intrinsic::hexagon_V6_vandvrt_128B: {
// Simplify Q -> V -> Q conversion.		// Simplify Q -> V -> Q conversion.
if (auto Op0 = dyn_cast<IntrinsicInst>(II->getArgOperand(0))) {		if (auto Op0 = dyn_cast<IntrinsicInst>(II->getArgOperand(0))) {
Intrinsic::ID ID0 = Op0->getIntrinsicID();		Intrinsic::ID ID0 = Op0->getIntrinsicID();
if (ID0 != Intrinsic::hexagon_V6_vandqrt &&		if (ID0 != Intrinsic::hexagon_V6_vandqrt &&
ID0 != Intrinsic::hexagon_V6_vandqrt_128B)		ID0 != Intrinsic::hexagon_V6_vandqrt_128B)
break;		break;
[214 lines hidden]	if (match(NextInst,
}		}
replaceOperand(*II, 0, Builder.CreateAnd(CurrCond, NextCond));		replaceOperand(*II, 0, Builder.CreateAnd(CurrCond, NextCond));
}		}
eraseInstFromFunction(*NextInst);		eraseInstFromFunction(*NextInst);
return II;		return II;
}		}
break;		break;
}		}
		default: {
		Instruction *V = nullptr;
		if (TTI.instCombineIntrinsic(this, II, &V))
		return V;
		break;
		}
}		}
return visitCallBase(*II);		return visitCallBase(*II);
}		}

// Fence instruction simplification		// Fence instruction simplification
Instruction *InstCombiner::visitFenceInst(FenceInst &FI) {		Instruction *InstCombinerImpl::visitFenceInst(FenceInst &FI) {
// Remove identical consecutive fences.		// Remove identical consecutive fences.
Instruction *Next = FI.getNextNonDebugInstruction();		Instruction *Next = FI.getNextNonDebugInstruction();
if (auto *NFI = dyn_cast<FenceInst>(Next))		if (auto *NFI = dyn_cast<FenceInst>(Next))
if (FI.isIdenticalTo(NFI))		if (FI.isIdenticalTo(NFI))
return eraseInstFromFunction(FI);		return eraseInstFromFunction(FI);
return nullptr;		return nullptr;
}		}

// InvokeInst simplification		// InvokeInst simplification
Instruction *InstCombiner::visitInvokeInst(InvokeInst &II) {		Instruction *InstCombinerImpl::visitInvokeInst(InvokeInst &II) {
return visitCallBase(II);		return visitCallBase(II);
}		}

// CallBrInst simplification		// CallBrInst simplification
Instruction *InstCombiner::visitCallBrInst(CallBrInst &CBI) {		Instruction *InstCombinerImpl::visitCallBrInst(CallBrInst &CBI) {
return visitCallBase(CBI);		return visitCallBase(CBI);
}		}

/// If this cast does not affect the value passed through the varargs area, we		/// If this cast does not affect the value passed through the varargs area, we
/// can eliminate the use of the cast.		/// can eliminate the use of the cast.
static bool isSafeToEliminateVarargsCast(const CallBase &Call,		static bool isSafeToEliminateVarargsCast(const CallBase &Call,
const DataLayout &DL,		const DataLayout &DL,
const CastInst *const CI,		const CastInst *const CI,
[23 lines hidden]	Type *DstTy = Call.isByValArgument(ix)
: cast<PointerType>(CI->getType())->getElementType();		: cast<PointerType>(CI->getType())->getElementType();
if (!SrcTy->isSized() || !DstTy->isSized())		if (!SrcTy->isSized() || !DstTy->isSized())
return false;		return false;
if (DL.getTypeAllocSize(SrcTy) != DL.getTypeAllocSize(DstTy))		if (DL.getTypeAllocSize(SrcTy) != DL.getTypeAllocSize(DstTy))
return false;		return false;
return true;		return true;
}		}

Instruction *InstCombiner::tryOptimizeCall(CallInst *CI) {		Instruction *InstCombinerImpl::tryOptimizeCall(CallInst *CI) {
if (!CI->getCalledFunction()) return nullptr;		if (!CI->getCalledFunction()) return nullptr;

auto InstCombineRAUW = [this](Instruction *From, Value *With) {		auto InstCombineRAUW = [this](Instruction *From, Value *With) {
replaceInstUsesWith(*From, With);		replaceInstUsesWith(*From, With);
};		};
auto InstCombineErase = [this](Instruction *I) {		auto InstCombineErase = [this](Instruction *I) {
eraseInstFromFunction(*I);		eraseInstFromFunction(*I);
};		};
[140 lines hidden]	if (Len) {
AttributeList::ReturnIndex,		AttributeList::ReturnIndex,
Attribute::getWithDereferenceableOrNullBytes(		Attribute::getWithDereferenceableOrNullBytes(
Call.getContext(), std::min(Len, Op1C->getZExtValue() + 1)));		Call.getContext(), std::min(Len, Op1C->getZExtValue() + 1)));
}		}
}		}
}		}

/// Improvements for call, callbr and invoke instructions.		/// Improvements for call, callbr and invoke instructions.
Instruction *InstCombiner::visitCallBase(CallBase &Call) {		Instruction *InstCombinerImpl::visitCallBase(CallBase &Call) {
if (isAllocationFn(&Call, &TLI))		if (isAllocationFn(&Call, &TLI))
annotateAnyAllocSite(Call, &TLI);		annotateAnyAllocSite(Call, &TLI);

bool Changed = false;		bool Changed = false;

// Mark any parameters that are known to be non-null with the nonnull		// Mark any parameters that are known to be non-null with the nonnull
// attribute. This is helpful for inlining calls to functions with null		// attribute. This is helpful for inlining calls to functions with null
// checks on their arguments.		// checks on their arguments.
[134 lines hidden]	Instruction *InstCombinerImpl::visitCallBase(CallBase &Call) {
if (isAllocLikeFn(&Call, &TLI))		if (isAllocLikeFn(&Call, &TLI))
return visitAllocSite(Call);		return visitAllocSite(Call);

return Changed ? &Call : nullptr;		return Changed ? &Call : nullptr;
}		}

/// If the callee is a constexpr cast of a function, attempt to move the cast to		/// If the callee is a constexpr cast of a function, attempt to move the cast to
/// the arguments of the call/callbr/invoke.		/// the arguments of the call/callbr/invoke.
bool InstCombiner::transformConstExprCastCall(CallBase &Call) {		bool InstCombinerImpl::transformConstExprCastCall(CallBase &Call) {
auto *Callee =		auto *Callee =
dyn_cast<Function>(Call.getCalledOperand()->stripPointerCasts());		dyn_cast<Function>(Call.getCalledOperand()->stripPointerCasts());
if (!Callee)		if (!Callee)
return false;		return false;

// If this is a call to a thunk function, don't remove the cast. Thunks are		// If this is a call to a thunk function, don't remove the cast. Thunks are
// used to transparently forward all incoming parameters and outgoing return		// used to transparently forward all incoming parameters and outgoing return
// values, so it's important to leave the cast in place.		// values, so it's important to leave the cast in place.
[267 lines hidden]	bool InstCombinerImpl::transformConstExprCastCall(CallBase &Call) {

eraseInstFromFunction(*Caller);		eraseInstFromFunction(*Caller);
return true;		return true;
}		}

/// Turn a call to a function created by init_trampoline / adjust_trampoline		/// Turn a call to a function created by init_trampoline / adjust_trampoline
/// intrinsic pair into a direct call to the underlying function.		/// intrinsic pair into a direct call to the underlying function.
Instruction *		Instruction *
InstCombiner::transformCallThroughTrampoline(CallBase &Call,		InstCombinerImpl::transformCallThroughTrampoline(CallBase &Call,
IntrinsicInst &Tramp) {		IntrinsicInst &Tramp) {
Value *Callee = Call.getCalledOperand();		Value *Callee = Call.getCalledOperand();
Type *CalleeTy = Callee->getType();		Type *CalleeTy = Callee->getType();
FunctionType *FTy = Call.getFunctionType();		FunctionType *FTy = Call.getFunctionType();
AttributeList Attrs = Call.getAttributes();		AttributeList Attrs = Call.getAttributes();

// If the call already has the 'nest' attribute somewhere then give up -		// If the call already has the 'nest' attribute somewhere then give up -
// otherwise 'nest' would occur twice after splicing in the chain.		// otherwise 'nest' would occur twice after splicing in the chain.
if (Attrs.hasAttrSomewhere(Attribute::Nest))		if (Attrs.hasAttrSomewhere(Attribute::Nest))
[137 lines hidden]

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp

//===- InstCombineCasts.cpp -----------------------------------------------===//		//===- InstCombineCasts.cpp -----------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements the visit functions for cast operations.		// This file implements the visit functions for cast operations.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "InstCombineInternal.h"		#include "InstCombineInternal.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"
#include <numeric>		#include <numeric>
using namespace llvm;		using namespace llvm;
using namespace PatternMatch;		using namespace PatternMatch;

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

/// Analyze 'Val', seeing if it is a simple linear expression.		/// Analyze 'Val', seeing if it is a simple linear expression.
/// If so, decompose it, returning some value X, such that Val is		/// If so, decompose it, returning some value X, such that Val is
[47 lines hidden]	static Value *decomposeSimpleLinearExpr(Value *Val, unsigned &Scale,
// Otherwise, we can't look past this.		// Otherwise, we can't look past this.
Scale = 1;		Scale = 1;
Offset = 0;		Offset = 0;
return Val;		return Val;
}		}

/// If we find a cast of an allocation instruction, try to eliminate the cast by		/// If we find a cast of an allocation instruction, try to eliminate the cast by
/// moving the type information into the alloc.		/// moving the type information into the alloc.
Instruction *InstCombiner::PromoteCastOfAllocation(BitCastInst &CI,		Instruction *InstCombinerImpl::PromoteCastOfAllocation(BitCastInst &CI,
AllocaInst &AI) {		AllocaInst &AI) {
PointerType *PTy = cast<PointerType>(CI.getType());		PointerType *PTy = cast<PointerType>(CI.getType());

IRBuilderBase::InsertPointGuard Guard(Builder);		IRBuilderBase::InsertPointGuard Guard(Builder);
Builder.SetInsertPoint(&AI);		Builder.SetInsertPoint(&AI);

// Get the type really allocated and the type casted to.		// Get the type really allocated and the type casted to.
Type *AllocElTy = AI.getAllocatedType();		Type *AllocElTy = AI.getAllocatedType();
Type *CastElTy = PTy->getElementType();		Type *CastElTy = PTy->getElementType();
[61 lines hidden]	if (!AI.hasOneUse()) {
replaceInstUsesWith(AI, NewCast);		replaceInstUsesWith(AI, NewCast);
eraseInstFromFunction(AI);		eraseInstFromFunction(AI);
}		}
return replaceInstUsesWith(CI, New);		return replaceInstUsesWith(CI, New);
}		}

/// Given an expression that CanEvaluateTruncated or CanEvaluateSExtd returns		/// Given an expression that CanEvaluateTruncated or CanEvaluateSExtd returns
/// true for, actually insert the code to evaluate the expression.		/// true for, actually insert the code to evaluate the expression.
Value *InstCombiner::EvaluateInDifferentType(Value *V, Type *Ty,		Value *InstCombinerImpl::EvaluateInDifferentType(Value *V, Type *Ty,
bool isSigned) {		bool isSigned) {
if (Constant *C = dyn_cast<Constant>(V)) {		if (Constant *C = dyn_cast<Constant>(V)) {
C = ConstantExpr::getIntegerCast(C, Ty, isSigned /*Sext or ZExt*/);		C = ConstantExpr::getIntegerCast(C, Ty, isSigned /*Sext or ZExt*/);
// If we got a constantexpr back, try to simplify it with DL info.		// If we got a constantexpr back, try to simplify it with DL info.
return ConstantFoldConstant(C, DL, &TLI);		return ConstantFoldConstant(C, DL, &TLI);
}		}

// Otherwise, it must be an instruction.		// Otherwise, it must be an instruction.
Instruction *I = cast<Instruction>(V);		Instruction *I = cast<Instruction>(V);
[51 lines hidden]	default:
// TODO: Can handle more cases here.		// TODO: Can handle more cases here.
llvm_unreachable("Unreachable!");		llvm_unreachable("Unreachable!");
}		}

Res->takeName(I);		Res->takeName(I);
return InsertNewInstWith(Res, *I);		return InsertNewInstWith(Res, *I);
}		}

Instruction::CastOps InstCombiner::isEliminableCastPair(const CastInst *CI1,		Instruction::CastOps
		InstCombinerImpl::isEliminableCastPair(const CastInst *CI1,
const CastInst *CI2) {		const CastInst *CI2) {
Type *SrcTy = CI1->getSrcTy();		Type *SrcTy = CI1->getSrcTy();
Type *MidTy = CI1->getDestTy();		Type *MidTy = CI1->getDestTy();
Type *DstTy = CI2->getDestTy();		Type *DstTy = CI2->getDestTy();

Instruction::CastOps firstOp = CI1->getOpcode();		Instruction::CastOps firstOp = CI1->getOpcode();
Instruction::CastOps secondOp = CI2->getOpcode();		Instruction::CastOps secondOp = CI2->getOpcode();
Type *SrcIntPtrTy =		Type *SrcIntPtrTy =
SrcTy->isPtrOrPtrVectorTy() ? DL.getIntPtrType(SrcTy) : nullptr;		SrcTy->isPtrOrPtrVectorTy() ? DL.getIntPtrType(SrcTy) : nullptr;
[10 lines hidden]	InstCombinerImpl::isEliminableCastPair(const CastInst *CI1,
if ((Res == Instruction::IntToPtr && SrcTy != DstIntPtrTy) ||		if ((Res == Instruction::IntToPtr && SrcTy != DstIntPtrTy) ||
(Res == Instruction::PtrToInt && DstTy != SrcIntPtrTy))		(Res == Instruction::PtrToInt && DstTy != SrcIntPtrTy))
Res = 0;		Res = 0;

return Instruction::CastOps(Res);		return Instruction::CastOps(Res);
}		}

/// Implement the transforms common to all CastInst visitors.		/// Implement the transforms common to all CastInst visitors.
Instruction *InstCombiner::commonCastTransforms(CastInst &CI) {		Instruction *InstCombinerImpl::commonCastTransforms(CastInst &CI) {
Value *Src = CI.getOperand(0);		Value *Src = CI.getOperand(0);

// Try to eliminate a cast of a cast.		// Try to eliminate a cast of a cast.
if (auto *CSrc = dyn_cast<CastInst>(Src)) { // A->B->C cast		if (auto *CSrc = dyn_cast<CastInst>(Src)) { // A->B->C cast
if (Instruction::CastOps NewOpc = isEliminableCastPair(CSrc, &CI)) {		if (Instruction::CastOps NewOpc = isEliminableCastPair(CSrc, &CI)) {
// The first cast (CSrc) is eliminable so we need to fix up or replace		// The first cast (CSrc) is eliminable so we need to fix up or replace
// the second cast (CI). CSrc will then have a good chance of being dead.		// the second cast (CI). CSrc will then have a good chance of being dead.
auto *Ty = CI.getType();		auto *Ty = CI.getType();
[68 lines hidden]
///		///
/// Ty will always be a type smaller than V. We should return true if trunc(V)		/// Ty will always be a type smaller than V. We should return true if trunc(V)
/// can be computed by computing V in the smaller type. If V is an instruction,		/// can be computed by computing V in the smaller type. If V is an instruction,
/// then trunc(inst(x,y)) can be computed as inst(trunc(x),trunc(y)), which only		/// then trunc(inst(x,y)) can be computed as inst(trunc(x),trunc(y)), which only
/// makes sense if x and y can be efficiently truncated.		/// makes sense if x and y can be efficiently truncated.
///		///
/// This function works on both vectors and scalars.		/// This function works on both vectors and scalars.
///		///
static bool canEvaluateTruncated(Value *V, Type *Ty, InstCombiner &IC,		static bool canEvaluateTruncated(Value *V, Type *Ty, InstCombinerImpl &IC,
Instruction *CxtI) {		Instruction *CxtI) {
if (canAlwaysEvaluateInType(V, Ty))		if (canAlwaysEvaluateInType(V, Ty))
return true;		return true;
if (canNotEvaluateInType(V, Ty))		if (canNotEvaluateInType(V, Ty))
return false;		return false;

auto *I = cast<Instruction>(V);		auto *I = cast<Instruction>(V);
Type *OrigTy = V->getType();		Type *OrigTy = V->getType();
[98 lines hidden]
}		}

/// Given a vector that is bitcast to an integer, optionally logically		/// Given a vector that is bitcast to an integer, optionally logically
/// right-shifted, and truncated, convert it to an extractelement.		/// right-shifted, and truncated, convert it to an extractelement.
/// Example (big endian):		/// Example (big endian):
/// trunc (lshr (bitcast <4 x i32> %X to i128), 32) to i32		/// trunc (lshr (bitcast <4 x i32> %X to i128), 32) to i32
/// --->		/// --->
/// extractelement <4 x i32> %X, 1		/// extractelement <4 x i32> %X, 1
static Instruction *foldVecTruncToExtElt(TruncInst &Trunc, InstCombiner &IC) {		static Instruction *foldVecTruncToExtElt(TruncInst &Trunc,
		InstCombinerImpl &IC) {
Value *TruncOp = Trunc.getOperand(0);		Value *TruncOp = Trunc.getOperand(0);
Type *DestType = Trunc.getType();		Type *DestType = Trunc.getType();
if (!TruncOp->hasOneUse() || !isa<IntegerType>(DestType))		if (!TruncOp->hasOneUse() || !isa<IntegerType>(DestType))
return nullptr;		return nullptr;

Value *VecInput = nullptr;		Value *VecInput = nullptr;
ConstantInt *ShiftVal = nullptr;		ConstantInt *ShiftVal = nullptr;
if (!match(TruncOp, m_CombineOr(m_BitCast(m_Value(VecInput)),		if (!match(TruncOp, m_CombineOr(m_BitCast(m_Value(VecInput)),
[22 lines hidden]	static Instruction *foldVecTruncToExtElt(TruncInst &Trunc,
if (IC.getDataLayout().isBigEndian())		if (IC.getDataLayout().isBigEndian())
Elt = NumVecElts - 1 - Elt;		Elt = NumVecElts - 1 - Elt;

return ExtractElementInst::Create(VecInput, IC.Builder.getInt32(Elt));		return ExtractElementInst::Create(VecInput, IC.Builder.getInt32(Elt));
}		}

/// Rotate left/right may occur in a wider type than necessary because of type		/// Rotate left/right may occur in a wider type than necessary because of type
/// promotion rules. Try to narrow the inputs and convert to funnel shift.		/// promotion rules. Try to narrow the inputs and convert to funnel shift.
Instruction *InstCombiner::narrowRotate(TruncInst &Trunc) {		Instruction *InstCombinerImpl::narrowRotate(TruncInst &Trunc) {
assert((isa<VectorType>(Trunc.getSrcTy()) ||		assert((isa<VectorType>(Trunc.getSrcTy()) ||
shouldChangeType(Trunc.getSrcTy(), Trunc.getType())) &&		shouldChangeType(Trunc.getSrcTy(), Trunc.getType())) &&
"Don't narrow to an illegal scalar type");		"Don't narrow to an illegal scalar type");

// Bail out on strange types. It is possible to handle some of these patterns		// Bail out on strange types. It is possible to handle some of these patterns
// even with non-power-of-2 sizes, but it is not a likely scenario.		// even with non-power-of-2 sizes, but it is not a likely scenario.
Type *DestTy = Trunc.getType();		Type *DestTy = Trunc.getType();
unsigned NarrowWidth = DestTy->getScalarSizeInBits();		unsigned NarrowWidth = DestTy->getScalarSizeInBits();
[67 lines hidden]	Instruction *InstCombinerImpl::narrowRotate(TruncInst &Trunc) {
Intrinsic::ID IID = IsFshl ? Intrinsic::fshl : Intrinsic::fshr;		Intrinsic::ID IID = IsFshl ? Intrinsic::fshl : Intrinsic::fshr;
Function *F = Intrinsic::getDeclaration(Trunc.getModule(), IID, DestTy);		Function *F = Intrinsic::getDeclaration(Trunc.getModule(), IID, DestTy);
return IntrinsicInst::Create(F, { X, X, NarrowShAmt });		return IntrinsicInst::Create(F, { X, X, NarrowShAmt });
}		}

/// Try to narrow the width of math or bitwise logic instructions by pulling a		/// Try to narrow the width of math or bitwise logic instructions by pulling a
/// truncate ahead of binary operators.		/// truncate ahead of binary operators.
/// TODO: Transforms for truncated shifts should be moved into here.		/// TODO: Transforms for truncated shifts should be moved into here.
Instruction *InstCombiner::narrowBinOp(TruncInst &Trunc) {		Instruction *InstCombinerImpl::narrowBinOp(TruncInst &Trunc) {
Type *SrcTy = Trunc.getSrcTy();		Type *SrcTy = Trunc.getSrcTy();
Type *DestTy = Trunc.getType();		Type *DestTy = Trunc.getType();
if (!isa<VectorType>(SrcTy) && !shouldChangeType(SrcTy, DestTy))		if (!isa<VectorType>(SrcTy) && !shouldChangeType(SrcTy, DestTy))
return nullptr;		return nullptr;

BinaryOperator *BinOp;		BinaryOperator *BinOp;
if (!match(Trunc.getOperand(0), m_OneUse(m_BinOp(BinOp))))		if (!match(Trunc.getOperand(0), m_OneUse(m_BinOp(BinOp))))
return nullptr;		return nullptr;
[88 lines hidden]	if (isa<UndefValue>(VecOp)) {
UndefValue *NarrowUndef = UndefValue::get(DestTy);		UndefValue *NarrowUndef = UndefValue::get(DestTy);
Value *NarrowOp = Builder.CreateCast(Opcode, ScalarOp, DestScalarTy);		Value *NarrowOp = Builder.CreateCast(Opcode, ScalarOp, DestScalarTy);
return InsertElementInst::Create(NarrowUndef, NarrowOp, Index);		return InsertElementInst::Create(NarrowUndef, NarrowOp, Index);
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitTrunc(TruncInst &Trunc) {		Instruction *InstCombinerImpl::visitTrunc(TruncInst &Trunc) {
if (Instruction *Result = commonCastTransforms(Trunc))		if (Instruction *Result = commonCastTransforms(Trunc))
return Result;		return Result;

Value *Src = Trunc.getOperand(0);		Value *Src = Trunc.getOperand(0);
Type *DestTy = Trunc.getType(), *SrcTy = Src->getType();		Type *DestTy = Trunc.getType(), *SrcTy = Src->getType();
unsigned DestWidth = DestTy->getScalarSizeInBits();		unsigned DestWidth = DestTy->getScalarSizeInBits();
unsigned SrcWidth = SrcTy->getScalarSizeInBits();		unsigned SrcWidth = SrcTy->getScalarSizeInBits();
ConstantInt *Cst;		ConstantInt *Cst;
[... 168 lines not shown ...]	if (SrcWidth % DestWidth == 0) {
Value *BitCast = Builder.CreateBitCast(VecOp, BitCastTo);		Value *BitCast = Builder.CreateBitCast(VecOp, BitCastTo);
return ExtractElementInst::Create(BitCast, Builder.getInt32(NewIdx));		return ExtractElementInst::Create(BitCast, Builder.getInt32(NewIdx));
}		}
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::transformZExtICmp(ICmpInst *Cmp, ZExtInst &Zext,		Instruction *InstCombinerImpl::transformZExtICmp(ICmpInst *Cmp, ZExtInst &Zext,
bool DoTransform) {		bool DoTransform) {
// If we are just checking for a icmp eq of a single bit and zext'ing it		// If we are just checking for a icmp eq of a single bit and zext'ing it
// to an integer, then shift the bit to the appropriate place and then		// to an integer, then shift the bit to the appropriate place and then
// cast to integer to avoid the comparison.		// cast to integer to avoid the comparison.
const APInt *Op1CV;		const APInt *Op1CV;
if (match(Cmp->getOperand(1), m_APInt(Op1CV))) {		if (match(Cmp->getOperand(1), m_APInt(Op1CV))) {

// zext (x <s 0) to i32 --> x>>u31 true if signbit set.		// zext (x <s 0) to i32 --> x>>u31 true if signbit set.
// zext (x >s -1) to i32 --> (x>>u31)^1 true if signbit clear.		// zext (x >s -1) to i32 --> (x>>u31)^1 true if signbit clear.
[... 119 lines not shown ...]
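
The two sign-bit rewrites named in the comments above (`zext (x <s 0)` to `x >>u 31`, and `zext (x >s -1)` to `(x >>u 31) ^ 1`) can be checked on plain 32-bit integers. The following is a minimal standalone sketch and is not part of this patch; all four helper names are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of these exist in LLVM.

// Models "zext (x <s 0) to i32": a signed compare widened to 0 or 1.
uint32_t zextIsNegative(int32_t X) { return X < 0 ? 1u : 0u; }

// Models "x >>u 31": a logical shift that moves the sign bit into bit 0.
uint32_t signBitViaShift(int32_t X) { return static_cast<uint32_t>(X) >> 31; }

// Models "zext (x >s -1) to i32": 1 when the sign bit is clear.
uint32_t zextIsNonNegative(int32_t X) { return X > -1 ? 1u : 0u; }

// Models "(x >>u 31) ^ 1": the inverted sign bit.
uint32_t invertedSignBitViaShift(int32_t X) {
  return (static_cast<uint32_t>(X) >> 31) ^ 1u;
}
```

For every `int32_t` value the first pair of functions should agree, and so should the second pair, which is why the compare can be dropped in favor of a shift.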
///		///
/// CanEvaluateZExtd for the 'lshr' will return true, and BitsToClear will be		/// CanEvaluateZExtd for the 'lshr' will return true, and BitsToClear will be
/// set to 8 to indicate that the promoted value needs to have bits 24-31		/// set to 8 to indicate that the promoted value needs to have bits 24-31
/// cleared in addition to bits 32-63. Since an 'and' will be generated to		/// cleared in addition to bits 32-63. Since an 'and' will be generated to
/// clear the top bits anyway, doing this has no extra cost.		/// clear the top bits anyway, doing this has no extra cost.
///		///
/// This function works on both vectors and scalars.		/// This function works on both vectors and scalars.
static bool canEvaluateZExtd(Value *V, Type *Ty, unsigned &BitsToClear,		static bool canEvaluateZExtd(Value *V, Type *Ty, unsigned &BitsToClear,
InstCombiner &IC, Instruction *CxtI) {		InstCombinerImpl &IC, Instruction *CxtI) {
BitsToClear = 0;		BitsToClear = 0;
if (canAlwaysEvaluateInType(V, Ty))		if (canAlwaysEvaluateInType(V, Ty))
return true;		return true;
if (canNotEvaluateInType(V, Ty))		if (canNotEvaluateInType(V, Ty))
return false;		return false;

auto *I = cast<Instruction>(V);		auto *I = cast<Instruction>(V);
unsigned Tmp;		unsigned Tmp;
[... 88 lines not shown ...]	case Instruction::PHI: {
return true;		return true;
}		}
default:		default:
// TODO: Can handle more cases here.		// TODO: Can handle more cases here.
return false;		return false;
}		}
}		}

Instruction *InstCombiner::visitZExt(ZExtInst &CI) {		Instruction *InstCombinerImpl::visitZExt(ZExtInst &CI) {
// If this zero extend is only used by a truncate, let the truncate be		// If this zero extend is only used by a truncate, let the truncate be
// eliminated before we try to optimize this zext.		// eliminated before we try to optimize this zext.
if (CI.hasOneUse() && isa<TruncInst>(CI.user_back()))		if (CI.hasOneUse() && isa<TruncInst>(CI.user_back()))
return nullptr;		return nullptr;

// If one of the common conversion will work, do it.		// If one of the common conversion will work, do it.
if (Instruction *Result = commonCastTransforms(CI))		if (Instruction *Result = commonCastTransforms(CI))
return Result;		return Result;
[... 121 lines not shown ...]	if (SrcI && match(SrcI, m_OneUse(m_Xor(m_Value(And), m_Constant(C)))) &&
Constant *ZC = ConstantExpr::getZExt(C, CI.getType());		Constant *ZC = ConstantExpr::getZExt(C, CI.getType());
return BinaryOperator::CreateXor(Builder.CreateAnd(X, ZC), ZC);		return BinaryOperator::CreateXor(Builder.CreateAnd(X, ZC), ZC);
}		}

return nullptr;		return nullptr;
}		}

/// Transform (sext icmp) to bitwise / integer operations to eliminate the icmp.		/// Transform (sext icmp) to bitwise / integer operations to eliminate the icmp.
Instruction *InstCombiner::transformSExtICmp(ICmpInst *ICI, Instruction &CI) {		Instruction *InstCombinerImpl::transformSExtICmp(ICmpInst *ICI,
		Instruction &CI) {
Value *Op0 = ICI->getOperand(0), *Op1 = ICI->getOperand(1);		Value *Op0 = ICI->getOperand(0), *Op1 = ICI->getOperand(1);
ICmpInst::Predicate Pred = ICI->getPredicate();		ICmpInst::Predicate Pred = ICI->getPredicate();

// Don't bother if Op1 isn't of vector or integer type.		// Don't bother if Op1 isn't of vector or integer type.
if (!Op1->getType()->isIntOrIntVectorTy())		if (!Op1->getType()->isIntOrIntVectorTy())
return nullptr;		return nullptr;

if ((Pred == ICmpInst::ICMP_SLT && match(Op1, m_ZeroInt())) ||		if ((Pred == ICmpInst::ICMP_SLT && match(Op1, m_ZeroInt())) ||
[... 119 lines not shown ...]	static bool canEvaluateSExtd(Value *V, Type *Ty) {
default:		default:
// TODO: Can handle more cases here.		// TODO: Can handle more cases here.
break;		break;
}		}

return false;		return false;
}		}

Instruction *InstCombiner::visitSExt(SExtInst &CI) {		Instruction *InstCombinerImpl::visitSExt(SExtInst &CI) {
// If this sign extend is only used by a truncate, let the truncate be		// If this sign extend is only used by a truncate, let the truncate be
// eliminated before we try to optimize this sext.		// eliminated before we try to optimize this sext.
if (CI.hasOneUse() && isa<TruncInst>(CI.user_back()))		if (CI.hasOneUse() && isa<TruncInst>(CI.user_back()))
return nullptr;		return nullptr;

if (Instruction *I = commonCastTransforms(CI))		if (Instruction *I = commonCastTransforms(CI))
return I;		return I;

[... 69 lines not shown ...]	if (match(Src, m_AShr(m_Shl(m_Trunc(m_Value(A)), m_ConstantInt(BA)),
Constant *ShAmtV = ConstantInt::get(CI.getType(), ShAmt);		Constant *ShAmtV = ConstantInt::get(CI.getType(), ShAmt);
A = Builder.CreateShl(A, ShAmtV, CI.getName());		A = Builder.CreateShl(A, ShAmtV, CI.getName());
return BinaryOperator::CreateAShr(A, ShAmtV);		return BinaryOperator::CreateAShr(A, ShAmtV);
}		}

return nullptr;		return nullptr;
}		}


/// Return a Constant* for the specified floating-point constant if it fits		/// Return a Constant* for the specified floating-point constant if it fits
/// in the specified FP type without changing its value.		/// in the specified FP type without changing its value.
static bool fitsInFPType(ConstantFP *CFP, const fltSemantics &Sem) {		static bool fitsInFPType(ConstantFP *CFP, const fltSemantics &Sem) {
bool losesInfo;		bool losesInfo;
APFloat F = CFP->getValueAPF();		APFloat F = CFP->getValueAPF();
(void)F.convert(Sem, APFloat::rmNearestTiesToEven, &losesInfo);		(void)F.convert(Sem, APFloat::rmNearestTiesToEven, &losesInfo);
return !losesInfo;		return !losesInfo;
}		}
[... 102 lines not shown ...]	static bool isKnownExactCastIntToFP(CastInst &I) {
}		}

// TODO:		// TODO:
// Try harder to find if the source integer type has less significant bits.		// Try harder to find if the source integer type has less significant bits.
// For example, compute number of sign bits or compute low bit mask.		// For example, compute number of sign bits or compute low bit mask.
return false;		return false;
}		}

Instruction *InstCombiner::visitFPTrunc(FPTruncInst &FPT) {		Instruction *InstCombinerImpl::visitFPTrunc(FPTruncInst &FPT) {
if (Instruction *I = commonCastTransforms(FPT))		if (Instruction *I = commonCastTransforms(FPT))
return I;		return I;

// If we have fptrunc(OpI (fpextend x), (fpextend y)), we would like to		// If we have fptrunc(OpI (fpextend x), (fpextend y)), we would like to
// simplify this expression to avoid one or more of the trunc/extend		// simplify this expression to avoid one or more of the trunc/extend
// operations if we can do so without changing the numerical results.		// operations if we can do so without changing the numerical results.
//		//
// The exact manner in which the widths of the operands interact to limit		// The exact manner in which the widths of the operands interact to limit
[... 167 lines not shown ...]	if (isa<SIToFPInst>(Src) || isa<UIToFPInst>(Src)) {
auto *FPCast = cast<CastInst>(Src);		auto *FPCast = cast<CastInst>(Src);
if (isKnownExactCastIntToFP(*FPCast))		if (isKnownExactCastIntToFP(*FPCast))
return CastInst::Create(FPCast->getOpcode(), FPCast->getOperand(0), Ty);		return CastInst::Create(FPCast->getOpcode(), FPCast->getOperand(0), Ty);
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitFPExt(CastInst &FPExt) {		Instruction *InstCombinerImpl::visitFPExt(CastInst &FPExt) {
// If the source operand is a cast from integer to FP and known exact, then		// If the source operand is a cast from integer to FP and known exact, then
// cast the integer operand directly to the destination type.		// cast the integer operand directly to the destination type.
Type *Ty = FPExt.getType();		Type *Ty = FPExt.getType();
Value *Src = FPExt.getOperand(0);		Value *Src = FPExt.getOperand(0);
if (isa<SIToFPInst>(Src) || isa<UIToFPInst>(Src)) {		if (isa<SIToFPInst>(Src) || isa<UIToFPInst>(Src)) {
auto *FPCast = cast<CastInst>(Src);		auto *FPCast = cast<CastInst>(Src);
if (isKnownExactCastIntToFP(*FPCast))		if (isKnownExactCastIntToFP(*FPCast))
return CastInst::Create(FPCast->getOpcode(), FPCast->getOperand(0), Ty);		return CastInst::Create(FPCast->getOpcode(), FPCast->getOperand(0), Ty);
}		}

return commonCastTransforms(FPExt);		return commonCastTransforms(FPExt);
}		}

/// fpto{s/u}i({u/s}itofp(X)) --> X or zext(X) or sext(X) or trunc(X)		/// fpto{s/u}i({u/s}itofp(X)) --> X or zext(X) or sext(X) or trunc(X)
/// This is safe if the intermediate type has enough bits in its mantissa to		/// This is safe if the intermediate type has enough bits in its mantissa to
/// accurately represent all values of X. For example, this won't work with		/// accurately represent all values of X. For example, this won't work with
/// i64 -> float -> i64.		/// i64 -> float -> i64.
Instruction *InstCombiner::foldItoFPtoI(CastInst &FI) {		Instruction *InstCombinerImpl::foldItoFPtoI(CastInst &FI) {
if (!isa<UIToFPInst>(FI.getOperand(0)) && !isa<SIToFPInst>(FI.getOperand(0)))		if (!isa<UIToFPInst>(FI.getOperand(0)) && !isa<SIToFPInst>(FI.getOperand(0)))
return nullptr;		return nullptr;

auto *OpI = cast<CastInst>(FI.getOperand(0));		auto *OpI = cast<CastInst>(FI.getOperand(0));
Value *X = OpI->getOperand(0);		Value *X = OpI->getOperand(0);
Type *XType = X->getType();		Type *XType = X->getType();
Type *DestType = FI.getType();		Type *DestType = FI.getType();
bool IsOutputSigned = isa<FPToSIInst>(FI);		bool IsOutputSigned = isa<FPToSIInst>(FI);
[... 23 lines not shown ...]	Instruction *InstCombinerImpl::foldItoFPtoI(CastInst &FI) {
}		}
if (DestType->getScalarSizeInBits() < XType->getScalarSizeInBits())		if (DestType->getScalarSizeInBits() < XType->getScalarSizeInBits())
return new TruncInst(X, DestType);		return new TruncInst(X, DestType);

assert(XType == DestType && "Unexpected types for int to FP to int casts");		assert(XType == DestType && "Unexpected types for int to FP to int casts");
return replaceInstUsesWith(FI, X);		return replaceInstUsesWith(FI, X);
}		}
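
The mantissa condition stated in the `foldItoFPtoI` comment (the fold is safe only when the intermediate FP type represents every value of X exactly, so `i64 -> float -> i64` is excluded) can be observed outside LLVM. This is a minimal standalone sketch and not part of this patch: the helper names are hypothetical, and it assumes IEEE-754 `float` with a 24-bit significand and `double` with a 53-bit significand.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of these exist in LLVM.

// Returns true if X survives the round trip i64 -> float -> i64 unchanged.
bool roundTripsThroughFloat(int64_t X) {
  float F = static_cast<float>(X);
  // Converting a float outside the int64_t range back to an integer is
  // undefined behavior, so reject those values before the cast.
  if (F >= 9223372036854775808.0f || F < -9223372036854775808.0f)
    return false;
  return static_cast<int64_t>(F) == X;
}

// Returns true if X survives the round trip i32 -> double -> i32 unchanged.
// Under the stated assumption every int32_t fits in a double's significand,
// so this is expected to hold for all inputs.
bool roundTripsThroughDouble(int32_t X) {
  double D = static_cast<double>(X);
  return static_cast<int32_t>(D) == X;
}
```

Under these assumptions, 2^24 round-trips through `float` but 2^24 + 1 does not, which is the kind of value loss that makes the fold unsafe for wide integers and narrow FP types.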

Instruction *InstCombiner::visitFPToUI(FPToUIInst &FI) {		Instruction *InstCombinerImpl::visitFPToUI(FPToUIInst &FI) {
if (Instruction *I = foldItoFPtoI(FI))		if (Instruction *I = foldItoFPtoI(FI))
return I;		return I;

return commonCastTransforms(FI);		return commonCastTransforms(FI);
}		}

Instruction *InstCombiner::visitFPToSI(FPToSIInst &FI) {		Instruction *InstCombinerImpl::visitFPToSI(FPToSIInst &FI) {
if (Instruction *I = foldItoFPtoI(FI))		if (Instruction *I = foldItoFPtoI(FI))
return I;		return I;

return commonCastTransforms(FI);		return commonCastTransforms(FI);
}		}

Instruction *InstCombiner::visitUIToFP(CastInst &CI) {		Instruction *InstCombinerImpl::visitUIToFP(CastInst &CI) {
return commonCastTransforms(CI);		return commonCastTransforms(CI);
}		}

Instruction *InstCombiner::visitSIToFP(CastInst &CI) {		Instruction *InstCombinerImpl::visitSIToFP(CastInst &CI) {
return commonCastTransforms(CI);		return commonCastTransforms(CI);
}		}

Instruction *InstCombiner::visitIntToPtr(IntToPtrInst &CI) {		Instruction *InstCombinerImpl::visitIntToPtr(IntToPtrInst &CI) {
// If the source integer type is not the intptr_t type for this target, do a		// If the source integer type is not the intptr_t type for this target, do a
// trunc or zext to the intptr_t type, then inttoptr of it. This allows the		// trunc or zext to the intptr_t type, then inttoptr of it. This allows the
// cast to be exposed to other transforms.		// cast to be exposed to other transforms.
unsigned AS = CI.getAddressSpace();		unsigned AS = CI.getAddressSpace();
if (CI.getOperand(0)->getType()->getScalarSizeInBits() !=		if (CI.getOperand(0)->getType()->getScalarSizeInBits() !=
DL.getPointerSizeInBits(AS)) {		DL.getPointerSizeInBits(AS)) {
Type *Ty = DL.getIntPtrType(CI.getContext(), AS);		Type *Ty = DL.getIntPtrType(CI.getContext(), AS);
// Handle vectors of pointers.		// Handle vectors of pointers.
if (auto *CIVTy = dyn_cast<VectorType>(CI.getType()))		if (auto *CIVTy = dyn_cast<VectorType>(CI.getType()))
Ty = VectorType::get(Ty, CIVTy->getElementCount());		Ty = VectorType::get(Ty, CIVTy->getElementCount());

Value *P = Builder.CreateZExtOrTrunc(CI.getOperand(0), Ty);		Value *P = Builder.CreateZExtOrTrunc(CI.getOperand(0), Ty);
return new IntToPtrInst(P, CI.getType());		return new IntToPtrInst(P, CI.getType());
}		}

if (Instruction *I = commonCastTransforms(CI))		if (Instruction *I = commonCastTransforms(CI))
return I;		return I;

return nullptr;		return nullptr;
}		}

/// Implement the transforms for cast of pointer (bitcast/ptrtoint)		/// Implement the transforms for cast of pointer (bitcast/ptrtoint)
Instruction *InstCombiner::commonPointerCastTransforms(CastInst &CI) {		Instruction *InstCombinerImpl::commonPointerCastTransforms(CastInst &CI) {
Value *Src = CI.getOperand(0);		Value *Src = CI.getOperand(0);

if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Src)) {		if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Src)) {
// If casting the result of a getelementptr instruction with no offset, turn		// If casting the result of a getelementptr instruction with no offset, turn
// this into a cast of the original pointer!		// this into a cast of the original pointer!
if (GEP->hasAllZeroIndices() &&		if (GEP->hasAllZeroIndices() &&
// If CI is an addrspacecast and GEP changes the pointer type, merging		// If CI is an addrspacecast and GEP changes the pointer type, merging
// GEP into CI would undo canonicalizing addrspacecast with different		// GEP into CI would undo canonicalizing addrspacecast with different
// pointer types, causing infinite loops.		// pointer types, causing infinite loops.
(!isa<AddrSpaceCastInst>(CI) ||		(!isa<AddrSpaceCastInst>(CI) ||
GEP->getType() == GEP->getPointerOperandType())) {		GEP->getType() == GEP->getPointerOperandType())) {
// Changing the cast operand is usually not a good idea but it is safe		// Changing the cast operand is usually not a good idea but it is safe
// here because the pointer operand is being replaced with another		// here because the pointer operand is being replaced with another
// pointer operand so the opcode doesn't need to change.		// pointer operand so the opcode doesn't need to change.
return replaceOperand(CI, 0, GEP->getOperand(0));		return replaceOperand(CI, 0, GEP->getOperand(0));
}		}
}		}

return commonCastTransforms(CI);		return commonCastTransforms(CI);
}		}

Instruction *InstCombiner::visitPtrToInt(PtrToIntInst &CI) {		Instruction *InstCombinerImpl::visitPtrToInt(PtrToIntInst &CI) {
// If the destination integer type is not the intptr_t type for this target,		// If the destination integer type is not the intptr_t type for this target,
// do a ptrtoint to intptr_t then do a trunc or zext. This allows the cast		// do a ptrtoint to intptr_t then do a trunc or zext. This allows the cast
// to be exposed to other transforms.		// to be exposed to other transforms.

Type *Ty = CI.getType();		Type *Ty = CI.getType();
unsigned AS = CI.getPointerAddressSpace();		unsigned AS = CI.getPointerAddressSpace();

if (Ty->getScalarSizeInBits() == DL.getPointerSizeInBits(AS))		if (Ty->getScalarSizeInBits() == DL.getPointerSizeInBits(AS))
[... 21 lines not shown ...]
/// the least significant bits for little endian. A trunc/zext of an integer		/// the least significant bits for little endian. A trunc/zext of an integer
/// impacts the big end of the integer. Thus, we need to add/remove elements at		/// impacts the big end of the integer. Thus, we need to add/remove elements at
/// the front of the vector for big endian targets, and the back of the vector		/// the front of the vector for big endian targets, and the back of the vector
/// for little endian targets.		/// for little endian targets.
///		///
/// Try to replace it with a shuffle (and vector/vector bitcast) if possible.		/// Try to replace it with a shuffle (and vector/vector bitcast) if possible.
///		///
/// The source and destination vector types may have different element types.		/// The source and destination vector types may have different element types.
static Instruction *optimizeVectorResizeWithIntegerBitCasts(Value *InVal,		static Instruction *
VectorType *DestTy,		optimizeVectorResizeWithIntegerBitCasts(Value *InVal, VectorType *DestTy,
InstCombiner &IC) {		InstCombinerImpl &IC) {
// We can only do this optimization if the output is a multiple of the input		// We can only do this optimization if the output is a multiple of the input
// element size, or the input is a multiple of the output element size.		// element size, or the input is a multiple of the output element size.
// Convert the input type to have the same element type as the output.		// Convert the input type to have the same element type as the output.
VectorType *SrcTy = cast<VectorType>(InVal->getType());		VectorType *SrcTy = cast<VectorType>(InVal->getType());

if (SrcTy->getElementType() != DestTy->getElementType()) {		if (SrcTy->getElementType() != DestTy->getElementType()) {
// The input types don't need to be identical, but for now they must be the		// The input types don't need to be identical, but for now they must be the
// same size. There is no specific reason we couldn't handle things like		// same size. There is no specific reason we couldn't handle things like
[... 183 lines not shown ...]
/// %tmp31 = bitcast float %inc5 to i32		/// %tmp31 = bitcast float %inc5 to i32
/// %tmp32 = zext i32 %tmp31 to i64		/// %tmp32 = zext i32 %tmp31 to i64
/// %tmp33 = shl i64 %tmp32, 32		/// %tmp33 = shl i64 %tmp32, 32
/// %ins35 = or i64 %tmp33, %tmp38		/// %ins35 = or i64 %tmp33, %tmp38
/// %tmp43 = bitcast i64 %ins35 to <2 x float>		/// %tmp43 = bitcast i64 %ins35 to <2 x float>
///		///
/// Into two insertelements that do "buildvector{%inc, %inc5}".		/// Into two insertelements that do "buildvector{%inc, %inc5}".
static Value *optimizeIntegerToVectorInsertions(BitCastInst &CI,		static Value *optimizeIntegerToVectorInsertions(BitCastInst &CI,
InstCombiner &IC) {		InstCombinerImpl &IC) {
VectorType *DestVecTy = cast<VectorType>(CI.getType());		VectorType *DestVecTy = cast<VectorType>(CI.getType());
Value *IntInput = CI.getOperand(0);		Value *IntInput = CI.getOperand(0);

SmallVector<Value*, 8> Elements(DestVecTy->getNumElements());		SmallVector<Value*, 8> Elements(DestVecTy->getNumElements());
if (!collectInsertionElements(IntInput, 0, Elements,		if (!collectInsertionElements(IntInput, 0, Elements,
DestVecTy->getElementType(),		DestVecTy->getElementType(),
IC.getDataLayout().isBigEndian()))		IC.getDataLayout().isBigEndian()))
return nullptr;		return nullptr;
[... 12 lines not shown ...]	static Value *optimizeIntegerToVectorInsertions(BitCastInst &CI,
return Result;		return Result;
}		}

/// Canonicalize scalar bitcasts of extracted elements into a bitcast of the		/// Canonicalize scalar bitcasts of extracted elements into a bitcast of the
/// vector followed by extract element. The backend tends to handle bitcasts of		/// vector followed by extract element. The backend tends to handle bitcasts of
/// vectors better than bitcasts of scalars because vector registers are		/// vectors better than bitcasts of scalars because vector registers are
/// usually not type-specific like scalar integer or scalar floating-point.		/// usually not type-specific like scalar integer or scalar floating-point.
static Instruction *canonicalizeBitCastExtElt(BitCastInst &BitCast,		static Instruction *canonicalizeBitCastExtElt(BitCastInst &BitCast,
InstCombiner &IC) {		InstCombinerImpl &IC) {
// TODO: Create and use a pattern matcher for ExtractElementInst.		// TODO: Create and use a pattern matcher for ExtractElementInst.
auto *ExtElt = dyn_cast<ExtractElementInst>(BitCast.getOperand(0));		auto *ExtElt = dyn_cast<ExtractElementInst>(BitCast.getOperand(0));
if (!ExtElt || !ExtElt->hasOneUse())		if (!ExtElt || !ExtElt->hasOneUse())
return nullptr;		return nullptr;

// The bitcast must be to a vectorizable type, otherwise we can't make a new		// The bitcast must be to a vectorizable type, otherwise we can't make a new
// type to extract from.		// type to extract from.
Type *DestType = BitCast.getType();		Type *DestType = BitCast.getType();
[... 109 lines not shown ...]
/// This function handles following case		/// This function handles following case
///		///
/// A -> B cast		/// A -> B cast
/// PHI		/// PHI
/// B -> A cast		/// B -> A cast
///		///
/// All the related PHI nodes can be replaced by new PHI nodes with type A.		/// All the related PHI nodes can be replaced by new PHI nodes with type A.
/// The uses of \p CI can be changed to the new PHI node corresponding to \p PN.		/// The uses of \p CI can be changed to the new PHI node corresponding to \p PN.
Instruction *InstCombiner::optimizeBitCastFromPhi(CastInst &CI, PHINode *PN) {		Instruction *InstCombinerImpl::optimizeBitCastFromPhi(CastInst &CI,
		PHINode *PN) {
// BitCast used by Store can be handled in InstCombineLoadStoreAlloca.cpp.		// BitCast used by Store can be handled in InstCombineLoadStoreAlloca.cpp.
if (hasStoreUsersOnly(CI))		if (hasStoreUsersOnly(CI))
return nullptr;		return nullptr;

Value *Src = CI.getOperand(0);		Value *Src = CI.getOperand(0);
Type *SrcTy = Src->getType(); // Type B		Type *SrcTy = Src->getType(); // Type B
Type *DestTy = CI.getType(); // Type A		Type *DestTy = CI.getType(); // Type A

[... 147 lines not shown ...]	for (auto It = OldPN->user_begin(), End = OldPN->user_end(); It != End; ) {
llvm_unreachable("all uses should be handled");		llvm_unreachable("all uses should be handled");
}		}
}		}
}		}

return RetVal;		return RetVal;
}		}

Instruction *InstCombiner::visitBitCast(BitCastInst &CI) {		Instruction *InstCombinerImpl::visitBitCast(BitCastInst &CI) {
// If the operands are integer typed then apply the integer transforms,		// If the operands are integer typed then apply the integer transforms,
// otherwise just apply the common ones.		// otherwise just apply the common ones.
Value *Src = CI.getOperand(0);		Value *Src = CI.getOperand(0);
Type *SrcTy = Src->getType();		Type *SrcTy = Src->getType();
Type *DestTy = CI.getType();		Type *DestTy = CI.getType();

// Get rid of casts from one type to the same type. These are useless and can		// Get rid of casts from one type to the same type. These are useless and can
// be replaced by the operand.		// be replaced by the operand.
[... 165 lines not shown ...]	Instruction *InstCombinerImpl::visitBitCast(BitCastInst &CI) {
if (Instruction *I = foldBitCastSelect(CI, Builder))		if (Instruction *I = foldBitCastSelect(CI, Builder))
return I;		return I;

if (SrcTy->isPointerTy())		if (SrcTy->isPointerTy())
return commonPointerCastTransforms(CI);		return commonPointerCastTransforms(CI);
return commonCastTransforms(CI);		return commonCastTransforms(CI);
}		}

Instruction *InstCombiner::visitAddrSpaceCast(AddrSpaceCastInst &CI) {		Instruction *InstCombinerImpl::visitAddrSpaceCast(AddrSpaceCastInst &CI) {
// If the destination pointer element type is not the same as the source's		// If the destination pointer element type is not the same as the source's
// first do a bitcast to the destination type, and then the addrspacecast.		// first do a bitcast to the destination type, and then the addrspacecast.
// This allows the cast to be exposed to other transforms.		// This allows the cast to be exposed to other transforms.
Value *Src = CI.getOperand(0);		Value *Src = CI.getOperand(0);
PointerType *SrcTy = cast<PointerType>(Src->getType()->getScalarType());		PointerType *SrcTy = cast<PointerType>(Src->getType()->getScalarType());
PointerType *DestTy = cast<PointerType>(CI.getType()->getScalarType());		PointerType *DestTy = cast<PointerType>(CI.getType()->getScalarType());

Type *DestElemTy = DestTy->getElementType();		Type *DestElemTy = DestTy->getElementType();
[... 14 lines not shown ...]

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp

[... 18 lines not shown ...]
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/ConstantRange.h"		#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/GetElementPtrTypeIterator.h"		#include "llvm/IR/GetElementPtrTypeIterator.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"

using namespace llvm;		using namespace llvm;
using namespace PatternMatch;		using namespace PatternMatch;

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

// How many times is a select replaced by one of its operands?		// How many times is a select replaced by one of its operands?
STATISTIC(NumSel, "Number of select opts");		STATISTIC(NumSel, "Number of select opts");
[... 102 lines not shown ...]
/// This is called when we see this pattern:		/// This is called when we see this pattern:
/// cmp pred (load (gep GV, ...)), cmpcst		/// cmp pred (load (gep GV, ...)), cmpcst
/// where GV is a global variable with a constant initializer. Try to simplify		/// where GV is a global variable with a constant initializer. Try to simplify
/// this into some simple computation that does not need the load. For example		/// this into some simple computation that does not need the load. For example
/// we can optimize "icmp eq (load (gep "foo", 0, i)), 0" into "icmp eq i, 3".		/// we can optimize "icmp eq (load (gep "foo", 0, i)), 0" into "icmp eq i, 3".
///		///
/// If AndCst is non-null, then the loaded value is masked with that constant		/// If AndCst is non-null, then the loaded value is masked with that constant
/// before doing the comparison. This handles cases like "A[i]&4 == 0".		/// before doing the comparison. This handles cases like "A[i]&4 == 0".
Instruction *InstCombiner::foldCmpLoadFromIndexedGlobal(GetElementPtrInst *GEP,		Instruction *
GlobalVariable *GV,		InstCombinerImpl::foldCmpLoadFromIndexedGlobal(GetElementPtrInst *GEP,
CmpInst &ICI,		GlobalVariable *GV, CmpInst &ICI,
ConstantInt *AndCst) {		ConstantInt *AndCst) {
Constant *Init = GV->getInitializer();		Constant *Init = GV->getInitializer();
if (!isa<ConstantArray>(Init) && !isa<ConstantDataArray>(Init))		if (!isa<ConstantArray>(Init) && !isa<ConstantDataArray>(Init))
return nullptr;		return nullptr;

uint64_t ArrayElementCount = Init->getType()->getArrayNumElements();		uint64_t ArrayElementCount = Init->getType()->getArrayNumElements();
// Don't blow up on huge arrays.		// Don't blow up on huge arrays.
if (ArrayElementCount > MaxArraySizeForCombine)		if (ArrayElementCount > MaxArraySizeForCombine)
return nullptr;		return nullptr;
[... 260 lines not shown ...]
/// "icmp ne i, 0". Note that, in general, indices can be complex, and scales		/// "icmp ne i, 0". Note that, in general, indices can be complex, and scales
/// are involved. The above expression would also be legal to codegen as		/// are involved. The above expression would also be legal to codegen as
/// "icmp ne (i*4), 0" (assuming A is a pointer to i32).		/// "icmp ne (i*4), 0" (assuming A is a pointer to i32).
/// This latter form is less amenable to optimization though, and we are allowed		/// This latter form is less amenable to optimization though, and we are allowed
/// to generate the first by knowing that pointer arithmetic doesn't overflow.		/// to generate the first by knowing that pointer arithmetic doesn't overflow.
///		///
/// If we can't emit an optimized form for this expression, this returns null.		/// If we can't emit an optimized form for this expression, this returns null.
///		///
static Value *evaluateGEPOffsetExpression(User *GEP, InstCombiner &IC,		static Value *evaluateGEPOffsetExpression(User *GEP, InstCombinerImpl &IC,
const DataLayout &DL) {		const DataLayout &DL) {
gep_type_iterator GTI = gep_type_begin(GEP);		gep_type_iterator GTI = gep_type_begin(GEP);

// Check to see if this gep only has a single variable index. If so, and if		// Check to see if this gep only has a single variable index. If so, and if
// any constant indices are a multiple of its scale, then we can compute this		// any constant indices are a multiple of its scale, then we can compute this
// in terms of the scale of the variable index. For example, if the GEP		// in terms of the scale of the variable index. For example, if the GEP
// implies an offset of "12 + i*4", then we can codegen this as "3 + i",		// implies an offset of "12 + i*4", then we can codegen this as "3 + i",
// because the expression will cross zero at the same point.		// because the expression will cross zero at the same point.
[... 402 lines not shown ...]	static Instruction *transformToIndexedCompare(GEPOperator *GEPLHS, Value *RHS,
// GEP having PtrBase as the pointer base, and has returned in NewRHS the		// GEP having PtrBase as the pointer base, and has returned in NewRHS the
// offset. Since Index is the offset of LHS to the base pointer, we will now		// offset. Since Index is the offset of LHS to the base pointer, we will now
// compare the offsets instead of comparing the pointers.		// compare the offsets instead of comparing the pointers.
return new ICmpInst(ICmpInst::getSignedPredicate(Cond), Index, NewRHS);		return new ICmpInst(ICmpInst::getSignedPredicate(Cond), Index, NewRHS);
}		}

/// Fold comparisons between a GEP instruction and something else. At this point		/// Fold comparisons between a GEP instruction and something else. At this point
/// we know that the GEP is on the LHS of the comparison.		/// we know that the GEP is on the LHS of the comparison.
Instruction *InstCombiner::foldGEPICmp(GEPOperator *GEPLHS, Value *RHS,		Instruction *InstCombinerImpl::foldGEPICmp(GEPOperator *GEPLHS, Value *RHS,
ICmpInst::Predicate Cond,		ICmpInst::Predicate Cond,
Instruction &I) {		Instruction &I) {
// Don't transform signed compares of GEPs into index compares. Even if the		// Don't transform signed compares of GEPs into index compares. Even if the
// GEP is inbounds, the final add of the base pointer can have signed overflow		// GEP is inbounds, the final add of the base pointer can have signed overflow
// and would change the result of the icmp.		// and would change the result of the icmp.
// e.g. "&foo[0] <s &foo[1]" can't be folded to "true" because "foo" could be		// e.g. "&foo[0] <s &foo[1]" can't be folded to "true" because "foo" could be
// the maximum signed value for the pointer type.		// the maximum signed value for the pointer type.
if (ICmpInst::isSigned(Cond))		if (ICmpInst::isSigned(Cond))
return nullptr;		return nullptr;

[161 lines not shown]	if (GEPLHS->isInBounds() && ICmpInst::isEquality(Cond) &&
}		}
}		}

// Try convert this to an indexed compare by looking through PHIs/casts as a		// Try convert this to an indexed compare by looking through PHIs/casts as a
// last resort.		// last resort.
return transformToIndexedCompare(GEPLHS, RHS, Cond, DL);		return transformToIndexedCompare(GEPLHS, RHS, Cond, DL);
}		}

Instruction *InstCombiner::foldAllocaCmp(ICmpInst &ICI,		Instruction *InstCombinerImpl::foldAllocaCmp(ICmpInst &ICI,
const AllocaInst *Alloca,		const AllocaInst *Alloca,
const Value *Other) {		const Value *Other) {
assert(ICI.isEquality() && "Cannot fold non-equality comparison.");		assert(ICI.isEquality() && "Cannot fold non-equality comparison.");

// It would be tempting to fold away comparisons between allocas and any		// It would be tempting to fold away comparisons between allocas and any
// pointer not based on that alloca (e.g. an argument). However, even		// pointer not based on that alloca (e.g. an argument). However, even
// though such pointers cannot alias, they can still compare equal.		// though such pointers cannot alias, they can still compare equal.
//		//
// But LLVM doesn't specify where allocas get their memory, so if the alloca		// But LLVM doesn't specify where allocas get their memory, so if the alloca
// doesn't escape we can argue that it's impossible to guess its value, and we		// doesn't escape we can argue that it's impossible to guess its value, and we
[59 lines not shown]	Instruction *InstCombinerImpl::foldAllocaCmp(ICmpInst &ICI,

Type *CmpTy = CmpInst::makeCmpResultType(Other->getType());		Type *CmpTy = CmpInst::makeCmpResultType(Other->getType());
return replaceInstUsesWith(		return replaceInstUsesWith(
ICI,		ICI,
ConstantInt::get(CmpTy, !CmpInst::isTrueWhenEqual(ICI.getPredicate())));		ConstantInt::get(CmpTy, !CmpInst::isTrueWhenEqual(ICI.getPredicate())));
}		}

/// Fold "icmp pred (X+C), X".		/// Fold "icmp pred (X+C), X".
Instruction *InstCombiner::foldICmpAddOpConst(Value *X, const APInt &C,		Instruction *InstCombinerImpl::foldICmpAddOpConst(Value *X, const APInt &C,
ICmpInst::Predicate Pred) {		ICmpInst::Predicate Pred) {
// From this point on, we know that (X+C <= X) --> (X+C < X) because C != 0,		// From this point on, we know that (X+C <= X) --> (X+C < X) because C != 0,
// so the values can never be equal. Similarly for all other "or equals"		// so the values can never be equal. Similarly for all other "or equals"
// operators.		// operators.
assert(!!C && "C should not be zero!");		assert(!!C && "C should not be zero!");

// (X+1) <u X --> X >u (MAXUINT-1) --> X == 255		// (X+1) <u X --> X >u (MAXUINT-1) --> X == 255
// (X+2) <u X --> X >u (MAXUINT-2) --> X > 253		// (X+2) <u X --> X >u (MAXUINT-2) --> X > 253
// (X+MAXUINT) <u X --> X >u (MAXUINT-MAXUINT) --> X != 0		// (X+MAXUINT) <u X --> X >u (MAXUINT-MAXUINT) --> X != 0
[32 lines not shown]	Instruction *InstCombinerImpl::foldICmpAddOpConst(Value *X, const APInt &C,
assert(Pred == ICmpInst::ICMP_SGT || Pred == ICmpInst::ICMP_SGE);		assert(Pred == ICmpInst::ICMP_SGT || Pred == ICmpInst::ICMP_SGE);
return new ICmpInst(ICmpInst::ICMP_SLT, X,		return new ICmpInst(ICmpInst::ICMP_SLT, X,
ConstantInt::get(X->getType(), SMax - (C - 1)));		ConstantInt::get(X->getType(), SMax - (C - 1)));
}		}
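
The unsigned case described in the comments of foldICmpAddOpConst, (X+C) <u X --> X >u (MAXUINT-C), can be sanity-checked exhaustively at 8 bits. The following is a standalone sketch and not part of this patch; the function names are hypothetical and it uses only plain C++ integer arithmetic rather than any LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Original form: icmp ult (X + C), X, using wrapping 8-bit arithmetic.
bool addWrapsBelowX(uint8_t X, uint8_t C) {
  assert(C != 0 && "C should not be zero!");
  return static_cast<uint8_t>(X + C) < X;
}

// Folded form: icmp ugt X, (MAXUINT - C).
bool foldedAddOpConst(uint8_t X, uint8_t C) {
  assert(C != 0 && "C should not be zero!");
  return X > static_cast<uint8_t>(UINT8_MAX - C);
}

// Returns true if both forms agree for every X and every nonzero C.
bool addOpConstFoldHoldsForI8() {
  for (unsigned C = 1; C <= UINT8_MAX; ++C)
    for (unsigned X = 0; X <= UINT8_MAX; ++X) {
      const uint8_t X8 = static_cast<uint8_t>(X);
      const uint8_t C8 = static_cast<uint8_t>(C);
      if (addWrapsBelowX(X8, C8) != foldedAddOpConst(X8, C8))
        return false;
    }
  return true;
}
```

Calling addOpConstFoldHoldsForI8() from a test driver exercises all 255 * 256 combinations; the signed variants handled later in the function are not covered by this sketch.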

/// Handle "(icmp eq/ne (ashr/lshr AP2, A), AP1)" ->		/// Handle "(icmp eq/ne (ashr/lshr AP2, A), AP1)" ->
/// (icmp eq/ne A, Log2(AP2/AP1)) ->		/// (icmp eq/ne A, Log2(AP2/AP1)) ->
/// (icmp eq/ne A, Log2(AP2) - Log2(AP1)).		/// (icmp eq/ne A, Log2(AP2) - Log2(AP1)).
Instruction *InstCombiner::foldICmpShrConstConst(ICmpInst &I, Value *A,		Instruction *InstCombinerImpl::foldICmpShrConstConst(ICmpInst &I, Value *A,
const APInt &AP1,		const APInt &AP1,
const APInt &AP2) {		const APInt &AP2) {
assert(I.isEquality() && "Cannot fold icmp gt/lt");		assert(I.isEquality() && "Cannot fold icmp gt/lt");

auto getICmp = [&I](CmpInst::Predicate Pred, Value *LHS, Value *RHS) {		auto getICmp = [&I](CmpInst::Predicate Pred, Value *LHS, Value *RHS) {
if (I.getPredicate() == I.ICMP_NE)		if (I.getPredicate() == I.ICMP_NE)
Pred = CmpInst::getInversePredicate(Pred);		Pred = CmpInst::getInversePredicate(Pred);
return new ICmpInst(Pred, LHS, RHS);		return new ICmpInst(Pred, LHS, RHS);
};		};

[40 lines not shown]	Instruction *InstCombinerImpl::foldICmpShrConstConst(ICmpInst &I, Value *A,
// Shifting const2 will never be equal to const1.		// Shifting const2 will never be equal to const1.
// FIXME: This should always be handled by InstSimplify?		// FIXME: This should always be handled by InstSimplify?
auto *TorF = ConstantInt::get(I.getType(), I.getPredicate() == I.ICMP_NE);		auto *TorF = ConstantInt::get(I.getType(), I.getPredicate() == I.ICMP_NE);
return replaceInstUsesWith(I, TorF);		return replaceInstUsesWith(I, TorF);
}		}

/// Handle "(icmp eq/ne (shl AP2, A), AP1)" ->		/// Handle "(icmp eq/ne (shl AP2, A), AP1)" ->
/// (icmp eq/ne A, TrailingZeros(AP1) - TrailingZeros(AP2)).		/// (icmp eq/ne A, TrailingZeros(AP1) - TrailingZeros(AP2)).
Instruction *InstCombiner::foldICmpShlConstConst(ICmpInst &I, Value *A,		Instruction *InstCombinerImpl::foldICmpShlConstConst(ICmpInst &I, Value *A,
const APInt &AP1,		const APInt &AP1,
const APInt &AP2) {		const APInt &AP2) {
assert(I.isEquality() && "Cannot fold icmp gt/lt");		assert(I.isEquality() && "Cannot fold icmp gt/lt");

auto getICmp = [&I](CmpInst::Predicate Pred, Value *LHS, Value *RHS) {		auto getICmp = [&I](CmpInst::Predicate Pred, Value *LHS, Value *RHS) {
if (I.getPredicate() == I.ICMP_NE)		if (I.getPredicate() == I.ICMP_NE)
Pred = CmpInst::getInversePredicate(Pred);		Pred = CmpInst::getInversePredicate(Pred);
return new ICmpInst(Pred, LHS, RHS);		return new ICmpInst(Pred, LHS, RHS);
};		};

[27 lines not shown]
/// I = icmp ugt (add (add A, B), CI2), CI1		/// I = icmp ugt (add (add A, B), CI2), CI1
/// If this is of the form:		/// If this is of the form:
/// sum = a + b		/// sum = a + b
/// if (sum+128 >u 255)		/// if (sum+128 >u 255)
/// Then replace it with llvm.sadd.with.overflow.i8.		/// Then replace it with llvm.sadd.with.overflow.i8.
///		///
static Instruction *processUGT_ADDCST_ADD(ICmpInst &I, Value *A, Value *B,		static Instruction *processUGT_ADDCST_ADD(ICmpInst &I, Value *A, Value *B,
ConstantInt *CI2, ConstantInt *CI1,		ConstantInt *CI2, ConstantInt *CI1,
InstCombiner &IC) {		InstCombinerImpl &IC) {
// The transformation we're trying to do here is to transform this into an		// The transformation we're trying to do here is to transform this into an
// llvm.sadd.with.overflow. To do this, we have to replace the original add		// llvm.sadd.with.overflow. To do this, we have to replace the original add
// with a narrower add, and discard the add-with-constant that is part of the		// with a narrower add, and discard the add-with-constant that is part of the
// range check (if we can't eliminate it, this isn't profitable).		// range check (if we can't eliminate it, this isn't profitable).

// In order to eliminate the add-with-constant, the compare can be its only		// In order to eliminate the add-with-constant, the compare can be its only
// use.		// use.
Instruction *AddWithCst = cast<Instruction>(I.getOperand(0));		Instruction *AddWithCst = cast<Instruction>(I.getOperand(0));
[69 lines not shown]	static Instruction *processUGT_ADDCST_ADD(ICmpInst &I, Value *A, Value *B,
// The original icmp gets replaced with the overflow value.		// The original icmp gets replaced with the overflow value.
return ExtractValueInst::Create(Call, 1, "sadd.overflow");		return ExtractValueInst::Create(Call, 1, "sadd.overflow");
}		}

/// If we have:		/// If we have:
/// icmp eq/ne (urem/srem %x, %y), 0		/// icmp eq/ne (urem/srem %x, %y), 0
/// iff %y is a power-of-two, we can replace this with a bit test:		/// iff %y is a power-of-two, we can replace this with a bit test:
/// icmp eq/ne (and %x, (add %y, -1)), 0		/// icmp eq/ne (and %x, (add %y, -1)), 0
Instruction *InstCombiner::foldIRemByPowerOfTwoToBitTest(ICmpInst &I) {		Instruction *InstCombinerImpl::foldIRemByPowerOfTwoToBitTest(ICmpInst &I) {
// This fold is only valid for equality predicates.		// This fold is only valid for equality predicates.
if (!I.isEquality())		if (!I.isEquality())
return nullptr;		return nullptr;
ICmpInst::Predicate Pred;		ICmpInst::Predicate Pred;
Value *X, *Y, *Zero;		Value *X, *Y, *Zero;
if (!match(&I, m_ICmp(Pred, m_OneUse(m_IRem(m_Value(X), m_Value(Y))),		if (!match(&I, m_ICmp(Pred, m_OneUse(m_IRem(m_Value(X), m_Value(Y))),
m_CombineAnd(m_Zero(), m_Value(Zero)))))		m_CombineAnd(m_Zero(), m_Value(Zero)))))
return nullptr;		return nullptr;
if (!isKnownToBeAPowerOfTwo(Y, /*OrZero*/ true, 0, &I))		if (!isKnownToBeAPowerOfTwo(Y, /*OrZero*/ true, 0, &I))
return nullptr;		return nullptr;
// This may increase instruction count, we don't enforce that Y is a constant.		// This may increase instruction count, we don't enforce that Y is a constant.
Value *Mask = Builder.CreateAdd(Y, Constant::getAllOnesValue(Y->getType()));		Value *Mask = Builder.CreateAdd(Y, Constant::getAllOnesValue(Y->getType()));
Value *Masked = Builder.CreateAnd(X, Mask);		Value *Masked = Builder.CreateAnd(X, Mask);
return ICmpInst::Create(Instruction::ICmp, Pred, Masked, Zero);		return ICmpInst::Create(Instruction::ICmp, Pred, Masked, Zero);
}		}
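
The bit-test equivalence that foldIRemByPowerOfTwoToBitTest relies on can be checked exhaustively for the unsigned 8-bit case. This is a standalone sketch with hypothetical helper names, not code from the patch. It only covers nonzero powers of two, whereas the fold itself also accepts a possibly-zero Y (for which the original urem would already be undefined).

```cpp
#include <cassert>
#include <cstdint>

bool isNonZeroPowerOfTwo(uint8_t Y) { return Y != 0 && (Y & (Y - 1)) == 0; }

// Original form: icmp eq (urem X, Y), 0.
bool uremIsZero(uint8_t X, uint8_t Y) {
  assert(isNonZeroPowerOfTwo(Y));
  return X % Y == 0;
}

// Folded form: icmp eq (and X, (add Y, -1)), 0.
bool maskedIsZero(uint8_t X, uint8_t Y) {
  assert(isNonZeroPowerOfTwo(Y));
  const uint8_t Mask = static_cast<uint8_t>(Y + 0xFF); // add Y, -1 in 8 bits
  return (X & Mask) == 0;
}

// Returns true if both forms agree for every X and every power-of-two Y.
bool uremFoldHoldsForI8() {
  for (unsigned Y = 1; Y <= 128; Y <<= 1)
    for (unsigned X = 0; X <= UINT8_MAX; ++X) {
      const uint8_t X8 = static_cast<uint8_t>(X);
      const uint8_t Y8 = static_cast<uint8_t>(Y);
      if (uremIsZero(X8, Y8) != maskedIsZero(X8, Y8))
        return false;
    }
  return true;
}
```

The ne predicate follows by negating both sides, which is why the fold can reuse Pred unchanged.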

/// Fold equality-comparison between zero and any (maybe truncated) right-shift		/// Fold equality-comparison between zero and any (maybe truncated) right-shift
/// by one-less-than-bitwidth into a sign test on the original value.		/// by one-less-than-bitwidth into a sign test on the original value.
Instruction *InstCombiner::foldSignBitTest(ICmpInst &I) {		Instruction *InstCombinerImpl::foldSignBitTest(ICmpInst &I) {
Instruction *Val;		Instruction *Val;
ICmpInst::Predicate Pred;		ICmpInst::Predicate Pred;
if (!I.isEquality() || !match(&I, m_ICmp(Pred, m_Instruction(Val), m_Zero())))		if (!I.isEquality() || !match(&I, m_ICmp(Pred, m_Instruction(Val), m_Zero())))
return nullptr;		return nullptr;

Value *X;		Value *X;
Type *XTy;		Type *XTy;

[14 lines not shown]	Instruction *InstCombinerImpl::foldSignBitTest(ICmpInst &I) {

return ICmpInst::Create(Instruction::ICmp,		return ICmpInst::Create(Instruction::ICmp,
Pred == ICmpInst::ICMP_EQ ? ICmpInst::ICMP_SGE		Pred == ICmpInst::ICMP_EQ ? ICmpInst::ICMP_SGE
: ICmpInst::ICMP_SLT,		: ICmpInst::ICMP_SLT,
X, ConstantInt::getNullValue(XTy));		X, ConstantInt::getNullValue(XTy));
}		}

// Handle icmp pred X, 0		// Handle icmp pred X, 0
Instruction *InstCombiner::foldICmpWithZero(ICmpInst &Cmp) {		Instruction *InstCombinerImpl::foldICmpWithZero(ICmpInst &Cmp) {
CmpInst::Predicate Pred = Cmp.getPredicate();		CmpInst::Predicate Pred = Cmp.getPredicate();
if (!match(Cmp.getOperand(1), m_Zero()))		if (!match(Cmp.getOperand(1), m_Zero()))
return nullptr;		return nullptr;

// (icmp sgt smin(PosA, B) 0) -> (icmp sgt B 0)		// (icmp sgt smin(PosA, B) 0) -> (icmp sgt B 0)
if (Pred == ICmpInst::ICMP_SGT) {		if (Pred == ICmpInst::ICMP_SGT) {
Value *A, *B;		Value *A, *B;
SelectPatternResult SPR = matchSelectPattern(Cmp.getOperand(0), A, B);		SelectPatternResult SPR = matchSelectPattern(Cmp.getOperand(0), A, B);
[24 lines not shown]	Instruction *InstCombinerImpl::foldICmpWithZero(ICmpInst &Cmp) {
return nullptr;		return nullptr;
}		}

/// Fold icmp Pred X, C.		/// Fold icmp Pred X, C.
/// TODO: This code structure does not make sense. The saturating add fold		/// TODO: This code structure does not make sense. The saturating add fold
/// should be moved to some other helper and extended as noted below (it is also		/// should be moved to some other helper and extended as noted below (it is also
/// possible that code has been made unnecessary - do we canonicalize IR to		/// possible that code has been made unnecessary - do we canonicalize IR to
/// overflow/saturating intrinsics or not?).		/// overflow/saturating intrinsics or not?).
Instruction *InstCombiner::foldICmpWithConstant(ICmpInst &Cmp) {		Instruction *InstCombinerImpl::foldICmpWithConstant(ICmpInst &Cmp) {
// Match the following pattern, which is a common idiom when writing		// Match the following pattern, which is a common idiom when writing
// overflow-safe integer arithmetic functions. The source performs an addition		// overflow-safe integer arithmetic functions. The source performs an addition
// in wider type and explicitly checks for overflow using comparisons against		// in wider type and explicitly checks for overflow using comparisons against
// INT_MIN and INT_MAX. Simplify by using the sadd_with_overflow intrinsic.		// INT_MIN and INT_MAX. Simplify by using the sadd_with_overflow intrinsic.
//		//
// TODO: This could probably be generalized to handle other overflow-safe		// TODO: This could probably be generalized to handle other overflow-safe
// operations if we worked out the formulas to compute the appropriate magic		// operations if we worked out the formulas to compute the appropriate magic
// constants.		// constants.
[29 lines not shown]	if (all_of(Phi->operands(), [](Value *V) { return isa<Constant>(V); })) {
NewPhi->takeName(&Cmp);		NewPhi->takeName(&Cmp);
return replaceInstUsesWith(Cmp, NewPhi);		return replaceInstUsesWith(Cmp, NewPhi);
}		}

return nullptr;		return nullptr;
}		}

/// Canonicalize icmp instructions based on dominating conditions.		/// Canonicalize icmp instructions based on dominating conditions.
Instruction *InstCombiner::foldICmpWithDominatingICmp(ICmpInst &Cmp) {		Instruction *InstCombinerImpl::foldICmpWithDominatingICmp(ICmpInst &Cmp) {
// This is a cheap/incomplete check for dominance - just match a single		// This is a cheap/incomplete check for dominance - just match a single
// predecessor with a conditional branch.		// predecessor with a conditional branch.
BasicBlock *CmpBB = Cmp.getParent();		BasicBlock *CmpBB = Cmp.getParent();
BasicBlock *DomBB = CmpBB->getSinglePredecessor();		BasicBlock *DomBB = CmpBB->getSinglePredecessor();
if (!DomBB)		if (!DomBB)
return nullptr;		return nullptr;

Value *DomCond;		Value *DomCond;
[53 lines not shown]	if (match(DomCond, m_ICmp(DomPred, m_Specific(X), m_APInt(DomC))) &&
if (const APInt *NeC = Difference.getSingleElement())		if (const APInt *NeC = Difference.getSingleElement())
return new ICmpInst(ICmpInst::ICMP_NE, X, Builder.getInt(*NeC));		return new ICmpInst(ICmpInst::ICMP_NE, X, Builder.getInt(*NeC));
}		}

return nullptr;		return nullptr;
}		}

/// Fold icmp (trunc X, Y), C.		/// Fold icmp (trunc X, Y), C.
Instruction *InstCombiner::foldICmpTruncConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpTruncConstant(ICmpInst &Cmp,
TruncInst *Trunc,		TruncInst *Trunc,
const APInt &C) {		const APInt &C) {
ICmpInst::Predicate Pred = Cmp.getPredicate();		ICmpInst::Predicate Pred = Cmp.getPredicate();
Value *X = Trunc->getOperand(0);		Value *X = Trunc->getOperand(0);
if (C.isOneValue() && C.getBitWidth() > 1) {		if (C.isOneValue() && C.getBitWidth() > 1) {
// icmp slt trunc(signum(V)) 1 --> icmp slt V, 1		// icmp slt trunc(signum(V)) 1 --> icmp slt V, 1
Value *V = nullptr;		Value *V = nullptr;
if (Pred == ICmpInst::ICMP_SLT && match(X, m_Signum(m_Value(V))))		if (Pred == ICmpInst::ICMP_SLT && match(X, m_Signum(m_Value(V))))
return new ICmpInst(ICmpInst::ICMP_SLT, V,		return new ICmpInst(ICmpInst::ICMP_SLT, V,
ConstantInt::get(V->getType(), 1));		ConstantInt::get(V->getType(), 1));
[14 lines not shown]	if ((Known.Zero | Known.One).countLeadingOnes() >= SrcBits - DstBits) {
return new ICmpInst(Pred, X, ConstantInt::get(X->getType(), NewRHS));		return new ICmpInst(Pred, X, ConstantInt::get(X->getType(), NewRHS));
}		}
}		}

return nullptr;		return nullptr;
}		}

/// Fold icmp (xor X, Y), C.		/// Fold icmp (xor X, Y), C.
Instruction *InstCombiner::foldICmpXorConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpXorConstant(ICmpInst &Cmp,
BinaryOperator *Xor,		BinaryOperator *Xor,
const APInt &C) {		const APInt &C) {
Value *X = Xor->getOperand(0);		Value *X = Xor->getOperand(0);
Value *Y = Xor->getOperand(1);		Value *Y = Xor->getOperand(1);
const APInt *XorC;		const APInt *XorC;
if (!match(Y, m_APInt(XorC)))		if (!match(Y, m_APInt(XorC)))
return nullptr;		return nullptr;

// If this is a comparison that tests the signbit (X < 0) or (x > -1),		// If this is a comparison that tests the signbit (X < 0) or (x > -1),
// fold the xor.		// fold the xor.
[50 lines not shown]	if (Pred == ICmpInst::ICMP_ULT) {
if (*XorC == C && (-C).isPowerOf2())		if (*XorC == C && (-C).isPowerOf2())
return new ICmpInst(ICmpInst::ICMP_UGT, X,		return new ICmpInst(ICmpInst::ICMP_UGT, X,
ConstantInt::get(X->getType(), ~C));		ConstantInt::get(X->getType(), ~C));
}		}
return nullptr;		return nullptr;
}		}

/// Fold icmp (and (sh X, Y), C2), C1.		/// Fold icmp (and (sh X, Y), C2), C1.
Instruction *InstCombiner::foldICmpAndShift(ICmpInst &Cmp, BinaryOperator *And,		Instruction *InstCombinerImpl::foldICmpAndShift(ICmpInst &Cmp,
const APInt &C1, const APInt &C2) {		BinaryOperator *And,
		const APInt &C1,
		const APInt &C2) {
BinaryOperator *Shift = dyn_cast<BinaryOperator>(And->getOperand(0));		BinaryOperator *Shift = dyn_cast<BinaryOperator>(And->getOperand(0));
if (!Shift || !Shift->isShift())		if (!Shift || !Shift->isShift())
return nullptr;		return nullptr;

// If this is: (X >> C3) & C2 != C1 (where any shift and any compare could		// If this is: (X >> C3) & C2 != C1 (where any shift and any compare could
// exist), turn it into (X & (C2 << C3)) != (C1 << C3). This happens a LOT in		// exist), turn it into (X & (C2 << C3)) != (C1 << C3). This happens a LOT in
// code produced by the clang front-end, for bitfield access.		// code produced by the clang front-end, for bitfield access.
// This seemingly simple opportunity to fold away a shift turns out to be		// This seemingly simple opportunity to fold away a shift turns out to be
[66 lines not shown]	if (Shift->hasOneUse() && C1.isNullValue() && Cmp.isEquality() &&
Value *NewAnd = Builder.CreateAnd(Shift->getOperand(0), NewShift);		Value *NewAnd = Builder.CreateAnd(Shift->getOperand(0), NewShift);
return replaceOperand(Cmp, 0, NewAnd);		return replaceOperand(Cmp, 0, NewAnd);
}		}

return nullptr;		return nullptr;
}		}

/// Fold icmp (and X, C2), C1.		/// Fold icmp (and X, C2), C1.
Instruction *InstCombiner::foldICmpAndConstConst(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpAndConstConst(ICmpInst &Cmp,
BinaryOperator *And,		BinaryOperator *And,
const APInt &C1) {		const APInt &C1) {
bool isICMP_NE = Cmp.getPredicate() == ICmpInst::ICMP_NE;		bool isICMP_NE = Cmp.getPredicate() == ICmpInst::ICMP_NE;

// For vectors: icmp ne (and X, 1), 0 --> trunc X to N x i1		// For vectors: icmp ne (and X, 1), 0 --> trunc X to N x i1
// TODO: We canonicalize to the longer form for scalars because we have		// TODO: We canonicalize to the longer form for scalars because we have
// better analysis/folds for icmp, and codegen may be better with icmp.		// better analysis/folds for icmp, and codegen may be better with icmp.
if (isICMP_NE && Cmp.getType()->isVectorTy() && C1.isNullValue() &&		if (isICMP_NE && Cmp.getType()->isVectorTy() && C1.isNullValue() &&
match(And->getOperand(1), m_One()))		match(And->getOperand(1), m_One()))
return new TruncInst(And->getOperand(0), Cmp.getType());		return new TruncInst(And->getOperand(0), Cmp.getType());
[89 lines not shown]	if (match(Or, m_Or(m_Value(LShr), m_Value(A))) &&
}		}
}		}
}		}

return nullptr;		return nullptr;
}		}

/// Fold icmp (and X, Y), C.		/// Fold icmp (and X, Y), C.
Instruction *InstCombiner::foldICmpAndConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpAndConstant(ICmpInst &Cmp,
BinaryOperator *And,		BinaryOperator *And,
const APInt &C) {		const APInt &C) {
if (Instruction *I = foldICmpAndConstConst(Cmp, And, C))		if (Instruction *I = foldICmpAndConstConst(Cmp, And, C))
return I;		return I;

// TODO: These all require that Y is constant too, so refactor with the above.		// TODO: These all require that Y is constant too, so refactor with the above.

// Try to optimize things like "A[i] & 42 == 0" to index computations.		// Try to optimize things like "A[i] & 42 == 0" to index computations.
Value *X = And->getOperand(0);		Value *X = And->getOperand(0);
Value *Y = And->getOperand(1);		Value *Y = And->getOperand(1);
[35 lines not shown]	if (ExactLogBase2 != -1 && DL.isLegalInteger(ExactLogBase2 + 1)) {
return new ICmpInst(NewPred, Trunc, Constant::getNullValue(NTy));		return new ICmpInst(NewPred, Trunc, Constant::getNullValue(NTy));
}		}
}		}

return nullptr;		return nullptr;
}		}

/// Fold icmp (or X, Y), C.		/// Fold icmp (or X, Y), C.
Instruction *InstCombiner::foldICmpOrConstant(ICmpInst &Cmp, BinaryOperator *Or,		Instruction *InstCombinerImpl::foldICmpOrConstant(ICmpInst &Cmp,
		BinaryOperator *Or,
const APInt &C) {		const APInt &C) {
ICmpInst::Predicate Pred = Cmp.getPredicate();		ICmpInst::Predicate Pred = Cmp.getPredicate();
if (C.isOneValue()) {		if (C.isOneValue()) {
// icmp slt signum(V) 1 --> icmp slt V, 1		// icmp slt signum(V) 1 --> icmp slt V, 1
Value *V = nullptr;		Value *V = nullptr;
if (Pred == ICmpInst::ICMP_SLT && match(Or, m_Signum(m_Value(V))))		if (Pred == ICmpInst::ICMP_SLT && match(Or, m_Signum(m_Value(V))))
return new ICmpInst(ICmpInst::ICMP_SLT, V,		return new ICmpInst(ICmpInst::ICMP_SLT, V,
ConstantInt::get(V->getType(), 1));		ConstantInt::get(V->getType(), 1));
}		}
[47 lines not shown]	if (match(OrOp0, m_OneUse(m_Xor(m_Value(X1), m_Value(X2)))) &&
auto BOpc = Pred == CmpInst::ICMP_EQ ? Instruction::And : Instruction::Or;		auto BOpc = Pred == CmpInst::ICMP_EQ ? Instruction::And : Instruction::Or;
return BinaryOperator::Create(BOpc, Cmp12, Cmp34);		return BinaryOperator::Create(BOpc, Cmp12, Cmp34);
}		}

return nullptr;		return nullptr;
}		}

/// Fold icmp (mul X, Y), C.		/// Fold icmp (mul X, Y), C.
Instruction *InstCombiner::foldICmpMulConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpMulConstant(ICmpInst &Cmp,
BinaryOperator *Mul,		BinaryOperator *Mul,
const APInt &C) {		const APInt &C) {
const APInt *MulC;		const APInt *MulC;
if (!match(Mul->getOperand(1), m_APInt(MulC)))		if (!match(Mul->getOperand(1), m_APInt(MulC)))
return nullptr;		return nullptr;

// If this is a test of the sign bit and the multiply is sign-preserving with		// If this is a test of the sign bit and the multiply is sign-preserving with
// a constant operand, use the multiply LHS operand instead.		// a constant operand, use the multiply LHS operand instead.
ICmpInst::Predicate Pred = Cmp.getPredicate();		ICmpInst::Predicate Pred = Cmp.getPredicate();
if (isSignTest(Pred, C) && Mul->hasNoSignedWrap()) {		if (isSignTest(Pred, C) && Mul->hasNoSignedWrap()) {
[64 lines not shown]	static Instruction *foldICmpShlOne(ICmpInst &Cmp, Instruction *Shl,
} else if (Cmp.isEquality() && CIsPowerOf2) {		} else if (Cmp.isEquality() && CIsPowerOf2) {
return new ICmpInst(Pred, Y, ConstantInt::get(ShiftType, C.logBase2()));		return new ICmpInst(Pred, Y, ConstantInt::get(ShiftType, C.logBase2()));
}		}

return nullptr;		return nullptr;
}		}

/// Fold icmp (shl X, Y), C.		/// Fold icmp (shl X, Y), C.
Instruction *InstCombiner::foldICmpShlConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpShlConstant(ICmpInst &Cmp,
BinaryOperator *Shl,		BinaryOperator *Shl,
const APInt &C) {		const APInt &C) {
const APInt *ShiftVal;		const APInt *ShiftVal;
if (Cmp.isEquality() && match(Shl->getOperand(0), m_APInt(ShiftVal)))		if (Cmp.isEquality() && match(Shl->getOperand(0), m_APInt(ShiftVal)))
return foldICmpShlConstConst(Cmp, Shl->getOperand(1), C, *ShiftVal);		return foldICmpShlConstConst(Cmp, Shl->getOperand(1), C, *ShiftVal);

const APInt *ShiftAmt;		const APInt *ShiftAmt;
if (!match(Shl->getOperand(1), m_APInt(ShiftAmt)))		if (!match(Shl->getOperand(1), m_APInt(ShiftAmt)))
return foldICmpShlOne(Cmp, Shl, C);		return foldICmpShlOne(Cmp, Shl, C);

[121 lines not shown]	Constant *NewC =
ConstantInt::get(TruncTy, C.ashr(*ShiftAmt).trunc(TypeBits - Amt));		ConstantInt::get(TruncTy, C.ashr(*ShiftAmt).trunc(TypeBits - Amt));
return new ICmpInst(Pred, Builder.CreateTrunc(X, TruncTy), NewC);		return new ICmpInst(Pred, Builder.CreateTrunc(X, TruncTy), NewC);
}		}

return nullptr;		return nullptr;
}		}

/// Fold icmp ({al}shr X, Y), C.		/// Fold icmp ({al}shr X, Y), C.
Instruction *InstCombiner::foldICmpShrConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpShrConstant(ICmpInst &Cmp,
BinaryOperator *Shr,		BinaryOperator *Shr,
const APInt &C) {		const APInt &C) {
// An exact shr only shifts out zero bits, so:		// An exact shr only shifts out zero bits, so:
// icmp eq/ne (shr X, Y), 0 --> icmp eq/ne X, 0		// icmp eq/ne (shr X, Y), 0 --> icmp eq/ne X, 0
Value *X = Shr->getOperand(0);		Value *X = Shr->getOperand(0);
CmpInst::Predicate Pred = Cmp.getPredicate();		CmpInst::Predicate Pred = Cmp.getPredicate();
if (Cmp.isEquality() && Shr->isExact() && Shr->hasOneUse() &&		if (Cmp.isEquality() && Shr->isExact() && Shr->hasOneUse() &&
C.isNullValue())		C.isNullValue())
return new ICmpInst(Pred, X, Cmp.getOperand(1));		return new ICmpInst(Pred, X, Cmp.getOperand(1));

[74 lines not shown]	if (Shr->hasOneUse()) {
Constant *Mask = ConstantInt::get(ShrTy, Val);		Constant *Mask = ConstantInt::get(ShrTy, Val);
Value *And = Builder.CreateAnd(X, Mask, Shr->getName() + ".mask");		Value *And = Builder.CreateAnd(X, Mask, Shr->getName() + ".mask");
return new ICmpInst(Pred, And, ConstantInt::get(ShrTy, C << ShAmtVal));		return new ICmpInst(Pred, And, ConstantInt::get(ShrTy, C << ShAmtVal));
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::foldICmpSRemConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpSRemConstant(ICmpInst &Cmp,
BinaryOperator *SRem,		BinaryOperator *SRem,
const APInt &C) {		const APInt &C) {
// Match an 'is positive' or 'is negative' comparison of remainder by a		// Match an 'is positive' or 'is negative' comparison of remainder by a
// constant power-of-2 value:		// constant power-of-2 value:
// (X % pow2C) sgt/slt 0		// (X % pow2C) sgt/slt 0
const ICmpInst::Predicate Pred = Cmp.getPredicate();		const ICmpInst::Predicate Pred = Cmp.getPredicate();
if (Pred != ICmpInst::ICMP_SGT && Pred != ICmpInst::ICMP_SLT)		if (Pred != ICmpInst::ICMP_SGT && Pred != ICmpInst::ICMP_SLT)
return nullptr;		return nullptr;

// TODO: The one-use check is standard because we do not typically want to		// TODO: The one-use check is standard because we do not typically want to
[20 lines not shown]	Instruction *InstCombinerImpl::foldICmpSRemConstant(ICmpInst &Cmp,

// For 'is negative?' check that the sign-bit is set and at least 1 masked		// For 'is negative?' check that the sign-bit is set and at least 1 masked
// bit is set. Example:		// bit is set. Example:
// (i16 X % 4) s< 0 --> (X & 32771) u> 32768		// (i16 X % 4) s< 0 --> (X & 32771) u> 32768
return new ICmpInst(ICmpInst::ICMP_UGT, And, ConstantInt::get(Ty, SignMask));		return new ICmpInst(ICmpInst::ICMP_UGT, And, ConstantInt::get(Ty, SignMask));
}		}
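
The 'is negative?' example in the comment, (i16 X % 4) s< 0 --> (X & 32771) u> 32768, can be checked exhaustively over all 16-bit values. The sketch below is standalone and uses hypothetical function names. The lines that compute the mask are elided in this view, so the generalization to other divisors (mask = sign bit OR (Pow2 - 1), which gives 32771 for a divisor of 4) is an assumption read off the example, not a quote of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (i16 X % Pow2) s< 0, with C++ truncating remainder.
bool sremIsNegative(int16_t X, int16_t Pow2) {
  assert(Pow2 > 0 && (Pow2 & (Pow2 - 1)) == 0);
  return X % Pow2 < 0;
}

// Folded form: (X & (SignMask | (Pow2 - 1))) u> SignMask.
bool maskedAboveSignBit(int16_t X, int16_t Pow2) {
  assert(Pow2 > 0 && (Pow2 & (Pow2 - 1)) == 0);
  const uint16_t SignMask = 0x8000;
  const uint16_t Mask = static_cast<uint16_t>(SignMask | (Pow2 - 1));
  return (static_cast<uint16_t>(X) & Mask) > SignMask;
}

// Returns true if both forms agree for every X and every positive
// power-of-two divisor representable in i16.
bool sremFoldHoldsForI16() {
  for (int P = 1; P <= 16384; P <<= 1)
    for (int V = INT16_MIN; V <= INT16_MAX; ++V) {
      const int16_t X = static_cast<int16_t>(V);
      const int16_t Pow2 = static_cast<int16_t>(P);
      if (sremIsNegative(X, Pow2) != maskedAboveSignBit(X, Pow2))
        return false;
    }
  return true;
}
```

The unsigned comparison succeeds only when the sign bit is set and at least one low bit is also set, which matches a negative value that is not a multiple of the divisor.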

/// Fold icmp (udiv X, Y), C.		/// Fold icmp (udiv X, Y), C.
Instruction *InstCombiner::foldICmpUDivConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpUDivConstant(ICmpInst &Cmp,
BinaryOperator *UDiv,		BinaryOperator *UDiv,
const APInt &C) {		const APInt &C) {
const APInt *C2;		const APInt *C2;
if (!match(UDiv->getOperand(0), m_APInt(C2)))		if (!match(UDiv->getOperand(0), m_APInt(C2)))
return nullptr;		return nullptr;

assert(*C2 != 0 && "udiv 0, X should have been simplified already.");		assert(*C2 != 0 && "udiv 0, X should have been simplified already.");

// (icmp ugt (udiv C2, Y), C) -> (icmp ule Y, C2/(C+1))		// (icmp ugt (udiv C2, Y), C) -> (icmp ule Y, C2/(C+1))
Value *Y = UDiv->getOperand(1);		Value *Y = UDiv->getOperand(1);
[10 lines not shown]	if (Cmp.getPredicate() == ICmpInst::ICMP_ULT) {
return new ICmpInst(ICmpInst::ICMP_UGT, Y,		return new ICmpInst(ICmpInst::ICMP_UGT, Y,
ConstantInt::get(Y->getType(), C2->udiv(C)));		ConstantInt::get(Y->getType(), C2->udiv(C)));
}		}

return nullptr;		return nullptr;
}		}
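
Both udiv folds visible in foldICmpUDivConstant, (icmp ugt (udiv C2, Y), C) -> (icmp ule Y, C2/(C+1)) and the ult counterpart that produces (icmp ugt Y, C2/C), can be checked exhaustively at 8 bits. This is a standalone sketch with hypothetical names. It assumes Y is nonzero, skips C == MAXUINT for the ugt case and C == 0 for the ult case; how the patch itself handles those edge values is in the elided lines and is not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Original form: icmp ugt (udiv C2, Y), C.
bool udivAbove(uint8_t C2, uint8_t Y, uint8_t C) {
  assert(Y != 0);
  return C2 / Y > C;
}

// Folded form: icmp ule Y, C2 / (C + 1).
bool foldedUdivAbove(uint8_t C2, uint8_t Y, uint8_t C) {
  assert(C != UINT8_MAX);
  return Y <= C2 / (C + 1);
}

// Original form: icmp ult (udiv C2, Y), C.
bool udivBelow(uint8_t C2, uint8_t Y, uint8_t C) {
  assert(Y != 0);
  return C2 / Y < C;
}

// Folded form: icmp ugt Y, C2 / C.
bool foldedUdivBelow(uint8_t C2, uint8_t Y, uint8_t C) {
  assert(C != 0);
  return Y > C2 / C;
}

// Returns true if each folded form matches its original for all inputs
// within the assumed preconditions.
bool udivFoldsHoldForI8() {
  for (unsigned A = 1; A <= UINT8_MAX; ++A)
    for (unsigned B = 1; B <= UINT8_MAX; ++B)
      for (unsigned K = 0; K <= UINT8_MAX; ++K) {
        const uint8_t C2 = static_cast<uint8_t>(A);
        const uint8_t Y = static_cast<uint8_t>(B);
        const uint8_t C = static_cast<uint8_t>(K);
        if (K != UINT8_MAX && udivAbove(C2, Y, C) != foldedUdivAbove(C2, Y, C))
          return false;
        if (K != 0 && udivBelow(C2, Y, C) != foldedUdivBelow(C2, Y, C))
          return false;
      }
  return true;
}
```

Both folds replace a division by the variable Y with a division of two constants, so the new right-hand side can be computed at compile time.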

/// Fold icmp ({su}div X, Y), C.		/// Fold icmp ({su}div X, Y), C.
Instruction *InstCombiner::foldICmpDivConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpDivConstant(ICmpInst &Cmp,
BinaryOperator *Div,		BinaryOperator *Div,
const APInt &C) {		const APInt &C) {
// Fold: icmp pred ([us]div X, C2), C -> range test		// Fold: icmp pred ([us]div X, C2), C -> range test
// Fold this div into the comparison, producing a range check.		// Fold this div into the comparison, producing a range check.
// Determine, based on the divide type, what the range is being		// Determine, based on the divide type, what the range is being
// checked. If there is an overflow on the low or high side, remember		// checked. If there is an overflow on the low or high side, remember
// it, otherwise compute the range [low, hi) bounding the new value.		// it, otherwise compute the range [low, hi) bounding the new value.
// See: InsertRangeTest above for the kinds of replacements possible.		// See: InsertRangeTest above for the kinds of replacements possible.
const APInt *C2;		const APInt *C2;
if (!match(Div->getOperand(1), m_APInt(C2)))		if (!match(Div->getOperand(1), m_APInt(C2)))
[151 lines not shown]	case ICmpInst::ICMP_SGT:
return new ICmpInst(ICmpInst::ICMP_SGE, X,		return new ICmpInst(ICmpInst::ICMP_SGE, X,
ConstantInt::get(Div->getType(), HiBound));		ConstantInt::get(Div->getType(), HiBound));
}		}

return nullptr;		return nullptr;
}		}

/// Fold icmp (sub X, Y), C.		/// Fold icmp (sub X, Y), C.
Instruction *InstCombiner::foldICmpSubConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpSubConstant(ICmpInst &Cmp,
BinaryOperator *Sub,		BinaryOperator *Sub,
const APInt &C) {		const APInt &C) {
Value *X = Sub->getOperand(0), *Y = Sub->getOperand(1);		Value *X = Sub->getOperand(0), *Y = Sub->getOperand(1);
ICmpInst::Predicate Pred = Cmp.getPredicate();		ICmpInst::Predicate Pred = Cmp.getPredicate();
const APInt *C2;		const APInt *C2;
APInt SubResult;		APInt SubResult;

// icmp eq/ne (sub C, Y), C -> icmp eq/ne Y, 0		// icmp eq/ne (sub C, Y), C -> icmp eq/ne Y, 0
if (match(X, m_APInt(C2)) && *C2 == C && Cmp.isEquality())		if (match(X, m_APInt(C2)) && *C2 == C && Cmp.isEquality())
return new ICmpInst(Cmp.getPredicate(), Y,		return new ICmpInst(Cmp.getPredicate(), Y,
[43 lines not shown]	Instruction *InstCombinerImpl::foldICmpSubConstant(ICmpInst &Cmp,
// iff C2 & C == C and C + 1 is a power of 2		// iff C2 & C == C and C + 1 is a power of 2
if (Pred == ICmpInst::ICMP_UGT && (C + 1).isPowerOf2() && (*C2 & C) == C)		if (Pred == ICmpInst::ICMP_UGT && (C + 1).isPowerOf2() && (*C2 & C) == C)
return new ICmpInst(ICmpInst::ICMP_NE, Builder.CreateOr(Y, C), X);		return new ICmpInst(ICmpInst::ICMP_NE, Builder.CreateOr(Y, C), X);

return nullptr;		return nullptr;
}		}

/// Fold icmp (add X, Y), C.		/// Fold icmp (add X, Y), C.
Instruction *InstCombiner::foldICmpAddConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpAddConstant(ICmpInst &Cmp,
BinaryOperator *Add,		BinaryOperator *Add,
const APInt &C) {		const APInt &C) {
Value *Y = Add->getOperand(1);		Value *Y = Add->getOperand(1);
const APInt *C2;		const APInt *C2;
if (Cmp.isEquality() || !match(Y, m_APInt(C2)))		if (Cmp.isEquality() || !match(Y, m_APInt(C2)))
return nullptr;		return nullptr;

// Fold icmp pred (add X, C2), C.		// Fold icmp pred (add X, C2), C.
Value *X = Add->getOperand(0);		Value *X = Add->getOperand(0);
Type *Ty = Add->getType();		Type *Ty = Add->getType();
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	Instruction *InstCombinerImpl::foldICmpAddConstant(ICmpInst &Cmp,
// C2+1 is a power of 2		// C2+1 is a power of 2
if (Pred == ICmpInst::ICMP_UGT && (C + 1).isPowerOf2() && (*C2 & C) == 0)		if (Pred == ICmpInst::ICMP_UGT && (C + 1).isPowerOf2() && (*C2 & C) == 0)
return new ICmpInst(ICmpInst::ICMP_NE, Builder.CreateAnd(X, ~C),		return new ICmpInst(ICmpInst::ICMP_NE, Builder.CreateAnd(X, ~C),
ConstantExpr::getNeg(cast<Constant>(Y)));		ConstantExpr::getNeg(cast<Constant>(Y)));

return nullptr;		return nullptr;
}		}

bool InstCombiner::matchThreeWayIntCompare(SelectInst *SI, Value *&LHS,		bool InstCombinerImpl::matchThreeWayIntCompare(SelectInst *SI, Value *&LHS,
Value *&RHS, ConstantInt *&Less,		Value *&RHS, ConstantInt *&Less,
ConstantInt *&Equal,		ConstantInt *&Equal,
ConstantInt *&Greater) {		ConstantInt *&Greater) {
// TODO: Generalize this to work with other comparison idioms or ensure		// TODO: Generalize this to work with other comparison idioms or ensure
// they get canonicalized into this form.		// they get canonicalized into this form.

// select i1 (a == b),		// select i1 (a == b),
// i32 Equal,		// i32 Equal,
// i32 (select i1 (a < b), i32 Less, i32 Greater)		// i32 (select i1 (a < b), i32 Less, i32 Greater)
// where Equal, Less and Greater are placeholders for any three constants.		// where Equal, Less and Greater are placeholders for any three constants.
ICmpInst::Predicate PredA;		ICmpInst::Predicate PredA;
Show All 20 Lines	if (LHS2 != LHS) {
PredB = ICmpInst::getSwappedPredicate(PredB);		PredB = ICmpInst::getSwappedPredicate(PredB);
}		}
if (LHS2 != LHS)		if (LHS2 != LHS)
return false;		return false;
// We also need to canonicalize 'RHS'.		// We also need to canonicalize 'RHS'.
if (PredB == ICmpInst::ICMP_SGT && isa<Constant>(RHS2)) {		if (PredB == ICmpInst::ICMP_SGT && isa<Constant>(RHS2)) {
// x sgt C-1 <--> x sge C <--> not(x slt C)		// x sgt C-1 <--> x sge C <--> not(x slt C)
auto FlippedStrictness =		auto FlippedStrictness =
getFlippedStrictnessPredicateAndConstant(PredB, cast<Constant>(RHS2));		InstCombiner::getFlippedStrictnessPredicateAndConstant(
		PredB, cast<Constant>(RHS2));
if (!FlippedStrictness)		if (!FlippedStrictness)
return false;		return false;
assert(FlippedStrictness->first == ICmpInst::ICMP_SGE && "Sanity check");		assert(FlippedStrictness->first == ICmpInst::ICMP_SGE && "Sanity check");
RHS2 = FlippedStrictness->second;		RHS2 = FlippedStrictness->second;
// And kind-of perform the result swap.		// And kind-of perform the result swap.
std::swap(Less, Greater);		std::swap(Less, Greater);
PredB = ICmpInst::ICMP_SLT;		PredB = ICmpInst::ICMP_SLT;
}		}
return PredB == ICmpInst::ICMP_SLT && RHS == RHS2;		return PredB == ICmpInst::ICMP_SLT && RHS == RHS2;
}		}

Instruction *InstCombiner::foldICmpSelectConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpSelectConstant(ICmpInst &Cmp,
SelectInst *Select,		SelectInst *Select,
ConstantInt *C) {		ConstantInt *C) {

assert(C && "Cmp RHS should be a constant int!");		assert(C && "Cmp RHS should be a constant int!");
// If we're testing a constant value against the result of a three way		// If we're testing a constant value against the result of a three way
// comparison, the result can be expressed directly in terms of the		// comparison, the result can be expressed directly in terms of the
// original values being compared. Note: We could possibly be more		// original values being compared. Note: We could possibly be more
// aggressive here and remove the hasOneUse test. The original select is		// aggressive here and remove the hasOneUse test. The original select is
// really likely to simplify or sink when we remove a test of the result.		// really likely to simplify or sink when we remove a test of the result.
Value *OrigLHS, *OrigRHS;		Value *OrigLHS, *OrigRHS;
▲ Show 20 Lines • Show All 81 Lines • ▼ Show 20 Lines	if (Bitcast->getSrcTy()->getScalarSizeInBits() ==

// If this is a sign-bit test of a bitcast of a casted FP value, eliminate		// If this is a sign-bit test of a bitcast of a casted FP value, eliminate
// the FP extend/truncate because that cast does not change the sign-bit.		// the FP extend/truncate because that cast does not change the sign-bit.
// This is true for all standard IEEE-754 types and the X86 80-bit type.		// This is true for all standard IEEE-754 types and the X86 80-bit type.
// The sign-bit is always the most significant bit in those types.		// The sign-bit is always the most significant bit in those types.
const APInt *C;		const APInt *C;
bool TrueIfSigned;		bool TrueIfSigned;
if (match(Op1, m_APInt(C)) && Bitcast->hasOneUse() &&		if (match(Op1, m_APInt(C)) && Bitcast->hasOneUse() &&
isSignBitCheck(Pred, *C, TrueIfSigned)) {		InstCombiner::isSignBitCheck(Pred, *C, TrueIfSigned)) {
if (match(BCSrcOp, m_FPExt(m_Value(X))) ||		if (match(BCSrcOp, m_FPExt(m_Value(X))) ||
match(BCSrcOp, m_FPTrunc(m_Value(X)))) {		match(BCSrcOp, m_FPTrunc(m_Value(X)))) {
// (bitcast (fpext/fptrunc X)) to iX) < 0 --> (bitcast X to iY) < 0		// (bitcast (fpext/fptrunc X)) to iX) < 0 --> (bitcast X to iY) < 0
// (bitcast (fpext/fptrunc X)) to iX) > -1 --> (bitcast X to iY) > -1		// (bitcast (fpext/fptrunc X)) to iX) > -1 --> (bitcast X to iY) > -1
Type *XType = X->getType();		Type *XType = X->getType();

// We can't currently handle Power style floating point operations here.		// We can't currently handle Power style floating point operations here.
if (!(XType->isPPC_FP128Ty() || BCSrcOp->getType()->isPPC_FP128Ty())) {		if (!(XType->isPPC_FP128Ty() || BCSrcOp->getType()->isPPC_FP128Ty())) {
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	if (is_splat(Mask)) {
}		}
}		}
}		}
return nullptr;		return nullptr;
}		}

/// Try to fold integer comparisons with a constant operand: icmp Pred X, C		/// Try to fold integer comparisons with a constant operand: icmp Pred X, C
/// where X is some kind of instruction.		/// where X is some kind of instruction.
Instruction *InstCombiner::foldICmpInstWithConstant(ICmpInst &Cmp) {		Instruction *InstCombinerImpl::foldICmpInstWithConstant(ICmpInst &Cmp) {
const APInt *C;		const APInt *C;
if (!match(Cmp.getOperand(1), m_APInt(C)))		if (!match(Cmp.getOperand(1), m_APInt(C)))
return nullptr;		return nullptr;

if (auto *BO = dyn_cast<BinaryOperator>(Cmp.getOperand(0))) {		if (auto *BO = dyn_cast<BinaryOperator>(Cmp.getOperand(0))) {
switch (BO->getOpcode()) {		switch (BO->getOpcode()) {
case Instruction::Xor:		case Instruction::Xor:
if (Instruction *I = foldICmpXorConstant(Cmp, BO, *C))		if (Instruction *I = foldICmpXorConstant(Cmp, BO, *C))
▲ Show 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	if (auto *II = dyn_cast<IntrinsicInst>(Cmp.getOperand(0)))
if (Instruction *I = foldICmpIntrinsicWithConstant(Cmp, II, *C))		if (Instruction *I = foldICmpIntrinsicWithConstant(Cmp, II, *C))
return I;		return I;

return nullptr;		return nullptr;
}		}

/// Fold an icmp equality instruction with binary operator LHS and constant RHS:		/// Fold an icmp equality instruction with binary operator LHS and constant RHS:
/// icmp eq/ne BO, C.		/// icmp eq/ne BO, C.
Instruction *InstCombiner::foldICmpBinOpEqualityWithConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpBinOpEqualityWithConstant(
BinaryOperator *BO,		ICmpInst &Cmp, BinaryOperator *BO, const APInt &C) {
const APInt &C) {
// TODO: Some of these folds could work with arbitrary constants, but this		// TODO: Some of these folds could work with arbitrary constants, but this
// function is limited to scalar and vector splat constants.		// function is limited to scalar and vector splat constants.
if (!Cmp.isEquality())		if (!Cmp.isEquality())
return nullptr;		return nullptr;

ICmpInst::Predicate Pred = Cmp.getPredicate();		ICmpInst::Predicate Pred = Cmp.getPredicate();
bool isICMP_NE = Pred == ICmpInst::ICMP_NE;		bool isICMP_NE = Pred == ICmpInst::ICMP_NE;
Constant *RHS = cast<Constant>(Cmp.getOperand(1));		Constant *RHS = cast<Constant>(Cmp.getOperand(1));
▲ Show 20 Lines • Show All 98 Lines • ▼ Show 20 Lines	case Instruction::UDiv:
break;		break;
default:		default:
break;		break;
}		}
return nullptr;		return nullptr;
}		}

/// Fold an equality icmp with LLVM intrinsic and constant operand.		/// Fold an equality icmp with LLVM intrinsic and constant operand.
Instruction *InstCombiner::foldICmpEqIntrinsicWithConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpEqIntrinsicWithConstant(
IntrinsicInst *II,		ICmpInst &Cmp, IntrinsicInst *II, const APInt &C) {
const APInt &C) {
Type *Ty = II->getType();		Type *Ty = II->getType();
unsigned BitWidth = C.getBitWidth();		unsigned BitWidth = C.getBitWidth();
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
case Intrinsic::bswap:		case Intrinsic::bswap:
// bswap(A) == C -> A == bswap(C)		// bswap(A) == C -> A == bswap(C)
return new ICmpInst(Cmp.getPredicate(), II->getArgOperand(0),		return new ICmpInst(Cmp.getPredicate(), II->getArgOperand(0),
ConstantInt::get(Ty, C.byteSwap()));		ConstantInt::get(Ty, C.byteSwap()));

▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	Instruction *InstCombinerImpl::foldICmpEqIntrinsicWithConstant(
default:		default:
break;		break;
}		}

return nullptr;		return nullptr;
}		}
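
The `bswap` case rests on byte swapping being its own inverse, so applying it to both sides of an equality preserves the result. A minimal sketch on 16-bit values follows. `bswap16` is a hypothetical stand-in for the intrinsic and for `APInt::byteSwap()`, and the set of constants is an arbitrary sample rather than an exhaustive sweep.

```cpp
#include <cassert>
#include <cstdint>

// Portable 16-bit byte swap; stands in for llvm.bswap.i16.
static uint16_t bswap16(uint16_t v) {
  return static_cast<uint16_t>((v << 8) | (v >> 8));
}

// For every 16-bit A and a sample of constants C, checks that
// bswap(A) == C agrees with A == bswap(C). Returns the mismatch count.
int countBswapFoldMismatches() {
  const uint16_t samples[] = {0x0000, 0x0001, 0x00FF, 0xFF00,
                              0x1234, 0x3412, 0xFFFF};
  int mismatches = 0;
  for (int a = 0; a < 65536; ++a) {
    uint16_t A = static_cast<uint16_t>(a);
    for (uint16_t C : samples) {
      bool original = bswap16(A) == C;
      bool folded = A == bswap16(C);
      if (original != folded)
        ++mismatches;
    }
    // Also use the one constant that makes the original compare true.
    uint16_t hit = bswap16(A);
    if (A != bswap16(hit))
      ++mismatches;
  }
  return mismatches;
}
```

The fold trades a runtime byte swap for a swap of the constant that is computed once at compile time.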

/// Fold an icmp with LLVM intrinsic and constant operand: icmp Pred II, C.		/// Fold an icmp with LLVM intrinsic and constant operand: icmp Pred II, C.
Instruction *InstCombiner::foldICmpIntrinsicWithConstant(ICmpInst &Cmp,		Instruction *InstCombinerImpl::foldICmpIntrinsicWithConstant(ICmpInst &Cmp,
IntrinsicInst *II,		IntrinsicInst *II,
const APInt &C) {		const APInt &C) {
if (Cmp.isEquality())		if (Cmp.isEquality())
return foldICmpEqIntrinsicWithConstant(Cmp, II, C);		return foldICmpEqIntrinsicWithConstant(Cmp, II, C);

Type *Ty = II->getType();		Type *Ty = II->getType();
unsigned BitWidth = C.getBitWidth();		unsigned BitWidth = C.getBitWidth();
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
case Intrinsic::ctlz: {		case Intrinsic::ctlz: {
// ctlz(0bXXXXXXXX) > 3 -> 0bXXXXXXXX < 0b00010000		// ctlz(0bXXXXXXXX) > 3 -> 0bXXXXXXXX < 0b00010000
Show All 40 Lines	Instruction *InstCombinerImpl::foldICmpIntrinsicWithConstant(ICmpInst &Cmp,
default:		default:
break;		break;
}		}

return nullptr;		return nullptr;
}		}
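
The `ctlz` comment above gives one 8-bit instance: `ctlz(x) > 3` holds exactly when `x < 0b00010000`. The sketch below checks that instance. It also checks a generalisation to any threshold `k` in `[0, 7]`, which is an assumption here because the corresponding code is elided in this view. `ctlz8` is a hypothetical helper that returns 8 for a zero input.

```cpp
#include <cassert>
#include <cstdint>

// Counts leading zero bits of an 8-bit value; returns 8 when v == 0.
static int ctlz8(uint8_t v) {
  int n = 0;
  for (int bit = 7; bit >= 0 && ((v >> bit) & 1) == 0; --bit)
    ++n;
  return n;
}

// Checks ctlz(x) > k  <=>  x u< (1 << (7 - k)) for all 8-bit x and all
// k in [0, 7]. Returns the number of disagreements.
int countCtlzFoldMismatches() {
  int mismatches = 0;
  for (int k = 0; k <= 7; ++k) {
    unsigned limit = 1u << (7 - k);
    for (unsigned x = 0; x < 256; ++x) {
      bool original = ctlz8(static_cast<uint8_t>(x)) > k;
      bool folded = x < limit;
      if (original != folded)
        ++mismatches;
    }
  }
  return mismatches;
}
```

More than `k` leading zeros means the top `k + 1` bits are clear, which is the same as the value being below `2^(7 - k)`.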

/// Handle icmp with constant (but not simple integer constant) RHS.		/// Handle icmp with constant (but not simple integer constant) RHS.
Instruction *InstCombiner::foldICmpInstWithConstantNotInt(ICmpInst &I) {		Instruction *InstCombinerImpl::foldICmpInstWithConstantNotInt(ICmpInst &I) {
Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);		Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
Constant *RHSC = dyn_cast<Constant>(Op1);		Constant *RHSC = dyn_cast<Constant>(Op1);
Instruction *LHSI = dyn_cast<Instruction>(Op0);		Instruction *LHSI = dyn_cast<Instruction>(Op0);
if (!RHSC || !LHSI)		if (!RHSC || !LHSI)
return nullptr;		return nullptr;

switch (LHSI->getOpcode()) {		switch (LHSI->getOpcode()) {
case Instruction::GetElementPtr:		case Instruction::GetElementPtr:
▲ Show 20 Lines • Show All 429 Lines • ▼ Show 20 Lines

/// Fold		/// Fold
/// (-1 u/ x) u< y		/// (-1 u/ x) u< y
/// ((x * y) u/ x) != y		/// ((x * y) u/ x) != y
/// to		/// to
/// @llvm.umul.with.overflow(x, y) plus extraction of overflow bit		/// @llvm.umul.with.overflow(x, y) plus extraction of overflow bit
/// Note that the comparison is commutative, while inverted (u>=, ==) predicate		/// Note that the comparison is commutative, while inverted (u>=, ==) predicate
/// will mean that we are looking for the opposite answer.		/// will mean that we are looking for the opposite answer.
Value *InstCombiner::foldUnsignedMultiplicationOverflowCheck(ICmpInst &I) {		Value *InstCombinerImpl::foldUnsignedMultiplicationOverflowCheck(ICmpInst &I) {
ICmpInst::Predicate Pred;		ICmpInst::Predicate Pred;
Value *X, *Y;		Value *X, *Y;
Instruction *Mul;		Instruction *Mul;
bool NeedNegation;		bool NeedNegation;
// Look for: (-1 u/ x) u</u>= y		// Look for: (-1 u/ x) u</u>= y
if (!I.isEquality() &&		if (!I.isEquality() &&
match(&I, m_c_ICmp(Pred, m_OneUse(m_UDiv(m_AllOnes(), m_Value(X))),		match(&I, m_c_ICmp(Pred, m_OneUse(m_UDiv(m_AllOnes(), m_Value(X))),
m_Value(Y)))) {		m_Value(Y)))) {
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	Value *InstCombinerImpl::foldUnsignedMultiplicationOverflowCheck(ICmpInst &I) {

return Res;		return Res;
}		}
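
Both idioms named in the doc comment of `foldUnsignedMultiplicationOverflowCheck` are equivalent to the overflow bit of an unsigned multiply. The sketch below models them on 8-bit values and compares each against a widened reference product. The function names are hypothetical, and `x == 0` is skipped because the idioms divide by `x`.

```cpp
#include <cassert>
#include <cstdint>

// Reference: does the 8-bit unsigned product x * y overflow?
static bool umulOverflows(uint8_t x, uint8_t y) {
  return static_cast<unsigned>(x) * y > 0xFFu;
}

// Checks both source idioms against the reference for all x != 0 and all y.
// Returns the number of disagreements.
int countOverflowIdiomMismatches() {
  int mismatches = 0;
  for (int x = 1; x < 256; ++x) { // x == 0 would divide by zero
    for (int y = 0; y < 256; ++y) {
      bool ref = umulOverflows(static_cast<uint8_t>(x),
                               static_cast<uint8_t>(y));
      // Idiom 1: (-1 u/ x) u< y, where -1 is the all-ones value 0xFF.
      bool idiomDiv = (0xFF / x) < y;
      // Idiom 2: ((x * y) u/ x) != y, with the product wrapped to 8 bits.
      uint8_t wrapped = static_cast<uint8_t>(x * y);
      bool idiomMul = (wrapped / x) != y;
      if (idiomDiv != ref || idiomMul != ref)
        ++mismatches;
    }
  }
  return mismatches;
}
```

The inverted predicates mentioned in the comment (`u>=`, `==`) would simply negate each idiom, which is why the function tracks a `NeedNegation` flag.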

/// Try to fold icmp (binop), X or icmp X, (binop).		/// Try to fold icmp (binop), X or icmp X, (binop).
/// TODO: A large part of this logic is duplicated in InstSimplify's		/// TODO: A large part of this logic is duplicated in InstSimplify's
/// simplifyICmpWithBinOp(). We should be able to share that and avoid the code		/// simplifyICmpWithBinOp(). We should be able to share that and avoid the code
/// duplication.		/// duplication.
Instruction *InstCombiner::foldICmpBinOp(ICmpInst &I, const SimplifyQuery &SQ) {		Instruction *InstCombinerImpl::foldICmpBinOp(ICmpInst &I,
		const SimplifyQuery &SQ) {
const SimplifyQuery Q = SQ.getWithInstruction(&I);		const SimplifyQuery Q = SQ.getWithInstruction(&I);
Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);		Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);

// Special logic for binary operators.		// Special logic for binary operators.
BinaryOperator *BO0 = dyn_cast<BinaryOperator>(Op0);		BinaryOperator *BO0 = dyn_cast<BinaryOperator>(Op0);
BinaryOperator *BO1 = dyn_cast<BinaryOperator>(Op1);		BinaryOperator *BO1 = dyn_cast<BinaryOperator>(Op1);
if (!BO0 && !BO1)		if (!BO0 && !BO1)
return nullptr;		return nullptr;
▲ Show 20 Lines • Show All 437 Lines • ▼ Show 20 Lines	if (match(Op0, m_c_UMax(m_Specific(X), m_Value(Y)))) {
// umax(X, Y) u>= X --> true		// umax(X, Y) u>= X --> true
// umax(X, Y) u< X --> false		// umax(X, Y) u< X --> false
return nullptr;		return nullptr;
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::foldICmpEquality(ICmpInst &I) {		Instruction *InstCombinerImpl::foldICmpEquality(ICmpInst &I) {
if (!I.isEquality())		if (!I.isEquality())
return nullptr;		return nullptr;

Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);		Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
const CmpInst::Predicate Pred = I.getPredicate();		const CmpInst::Predicate Pred = I.getPredicate();
Value *A, *B, *C, *D;		Value *A, *B, *C, *D;
if (match(Op0, m_Xor(m_Value(A), m_Value(B)))) {		if (match(Op0, m_Xor(m_Value(A), m_Value(B)))) {
if (A == Op1 || B == Op1) { // (A^B) == A -> B == 0		if (A == Op1 || B == Op1) { // (A^B) == A -> B == 0
▲ Show 20 Lines • Show All 251 Lines • ▼ Show 20 Lines	static Instruction *foldICmpWithZextOrSext(ICmpInst &ICmp,

// Is source op negative?		// Is source op negative?
// icmp ugt (sext X), C --> icmp slt X, 0		// icmp ugt (sext X), C --> icmp slt X, 0
assert(ICmp.getPredicate() == ICmpInst::ICMP_UGT && "ICmp should be folded!");		assert(ICmp.getPredicate() == ICmpInst::ICMP_UGT && "ICmp should be folded!");
return new ICmpInst(CmpInst::ICMP_SLT, X, Constant::getNullValue(SrcTy));		return new ICmpInst(CmpInst::ICMP_SLT, X, Constant::getNullValue(SrcTy));
}		}

/// Handle icmp (cast x), (cast or constant).		/// Handle icmp (cast x), (cast or constant).
Instruction *InstCombiner::foldICmpWithCastOp(ICmpInst &ICmp) {		Instruction *InstCombinerImpl::foldICmpWithCastOp(ICmpInst &ICmp) {
auto *CastOp0 = dyn_cast<CastInst>(ICmp.getOperand(0));		auto *CastOp0 = dyn_cast<CastInst>(ICmp.getOperand(0));
if (!CastOp0)		if (!CastOp0)
return nullptr;		return nullptr;
if (!isa<Constant>(ICmp.getOperand(1)) && !isa<CastInst>(ICmp.getOperand(1)))		if (!isa<Constant>(ICmp.getOperand(1)) && !isa<CastInst>(ICmp.getOperand(1)))
return nullptr;		return nullptr;

Value *Op0Src = CastOp0->getOperand(0);		Value *Op0Src = CastOp0->getOperand(0);
Type *SrcTy = CastOp0->getSrcTy();		Type *SrcTy = CastOp0->getSrcTy();
Show All 38 Lines	switch (BinaryOp) {
case Instruction::Add:		case Instruction::Add:
case Instruction::Sub:		case Instruction::Sub:
return match(RHS, m_Zero());		return match(RHS, m_Zero());
case Instruction::Mul:		case Instruction::Mul:
return match(RHS, m_One());		return match(RHS, m_One());
}		}
}		}

OverflowResult InstCombiner::computeOverflow(		OverflowResult
Instruction::BinaryOps BinaryOp, bool IsSigned,		InstCombinerImpl::computeOverflow(Instruction::BinaryOps BinaryOp,
Value *LHS, Value *RHS, Instruction *CxtI) const {		bool IsSigned, Value *LHS, Value *RHS,
		Instruction *CxtI) const {
switch (BinaryOp) {		switch (BinaryOp) {
default:		default:
llvm_unreachable("Unsupported binary op");		llvm_unreachable("Unsupported binary op");
case Instruction::Add:		case Instruction::Add:
if (IsSigned)		if (IsSigned)
return computeOverflowForSignedAdd(LHS, RHS, CxtI);		return computeOverflowForSignedAdd(LHS, RHS, CxtI);
else		else
return computeOverflowForUnsignedAdd(LHS, RHS, CxtI);		return computeOverflowForUnsignedAdd(LHS, RHS, CxtI);
case Instruction::Sub:		case Instruction::Sub:
if (IsSigned)		if (IsSigned)
return computeOverflowForSignedSub(LHS, RHS, CxtI);		return computeOverflowForSignedSub(LHS, RHS, CxtI);
else		else
return computeOverflowForUnsignedSub(LHS, RHS, CxtI);		return computeOverflowForUnsignedSub(LHS, RHS, CxtI);
case Instruction::Mul:		case Instruction::Mul:
if (IsSigned)		if (IsSigned)
return computeOverflowForSignedMul(LHS, RHS, CxtI);		return computeOverflowForSignedMul(LHS, RHS, CxtI);
else		else
return computeOverflowForUnsignedMul(LHS, RHS, CxtI);		return computeOverflowForUnsignedMul(LHS, RHS, CxtI);
}		}
}		}

bool InstCombiner::OptimizeOverflowCheck(		bool InstCombinerImpl::OptimizeOverflowCheck(Instruction::BinaryOps BinaryOp,
Instruction::BinaryOps BinaryOp, bool IsSigned, Value *LHS, Value *RHS,		bool IsSigned, Value *LHS,
Instruction &OrigI, Value *&Result, Constant *&Overflow) {		Value *RHS, Instruction &OrigI,
		Value *&Result,
		Constant *&Overflow) {
if (OrigI.isCommutative() && isa<Constant>(LHS) && !isa<Constant>(RHS))		if (OrigI.isCommutative() && isa<Constant>(LHS) && !isa<Constant>(RHS))
std::swap(LHS, RHS);		std::swap(LHS, RHS);

// If the overflow check was an add followed by a compare, the insertion point		// If the overflow check was an add followed by a compare, the insertion point
// may be pointing to the compare. We want to insert the new instructions		// may be pointing to the compare. We want to insert the new instructions
// before the add in case there are uses of the add between the add and the		// before the add in case there are uses of the add between the add and the
// compare.		// compare.
Builder.SetInsertPoint(&OrigI);		Builder.SetInsertPoint(&OrigI);
Show All 39 Lines
///		///
/// \param I Compare instruction.		/// \param I Compare instruction.
/// \param MulVal Result of 'mult' instruction. It is one of the arguments of		/// \param MulVal Result of 'mult' instruction. It is one of the arguments of
/// the compare instruction. Must be of integer type.		/// the compare instruction. Must be of integer type.
/// \param OtherVal The other argument of compare instruction.		/// \param OtherVal The other argument of compare instruction.
/// \returns Instruction which must replace the compare instruction, NULL if no		/// \returns Instruction which must replace the compare instruction, NULL if no
/// replacement required.		/// replacement required.
static Instruction *processUMulZExtIdiom(ICmpInst &I, Value *MulVal,		static Instruction *processUMulZExtIdiom(ICmpInst &I, Value *MulVal,
Value *OtherVal, InstCombiner &IC) {		Value *OtherVal,
		InstCombinerImpl &IC) {
// Don't bother doing this transformation for pointers, don't do it for		// Don't bother doing this transformation for pointers, don't do it for
// vectors.		// vectors.
if (!isa<IntegerType>(MulVal->getType()))		if (!isa<IntegerType>(MulVal->getType()))
return nullptr;		return nullptr;

assert(I.getOperand(0) == MulVal || I.getOperand(1) == MulVal);		assert(I.getOperand(0) == MulVal || I.getOperand(1) == MulVal);
assert(I.getOperand(0) == OtherVal || I.getOperand(1) == OtherVal);		assert(I.getOperand(0) == OtherVal || I.getOperand(1) == OtherVal);
auto *MulInstr = dyn_cast<Instruction>(MulVal);		auto *MulInstr = dyn_cast<Instruction>(MulVal);
▲ Show 20 Lines • Show All 142 Lines • ▼ Show 20 Lines	static Instruction processUMulZExtIdiom(ICmpInst &I, Value MulVal,
Value *MulA = A, *MulB = B;		Value *MulA = A, *MulB = B;
if (WidthA < MulWidth)		if (WidthA < MulWidth)
MulA = Builder.CreateZExt(A, MulType);		MulA = Builder.CreateZExt(A, MulType);
if (WidthB < MulWidth)		if (WidthB < MulWidth)
MulB = Builder.CreateZExt(B, MulType);		MulB = Builder.CreateZExt(B, MulType);
Function *F = Intrinsic::getDeclaration(		Function *F = Intrinsic::getDeclaration(
I.getModule(), Intrinsic::umul_with_overflow, MulType);		I.getModule(), Intrinsic::umul_with_overflow, MulType);
CallInst *Call = Builder.CreateCall(F, {MulA, MulB}, "umul");		CallInst *Call = Builder.CreateCall(F, {MulA, MulB}, "umul");
IC.Worklist.push(MulInstr);		IC.addToWorklist(MulInstr);

// If there are uses of mul result other than the comparison, we know that		// If there are uses of mul result other than the comparison, we know that
// they are truncation or binary AND. Change them to use result of		// they are truncation or binary AND. Change them to use result of
// mul.with.overflow and adjust properly mask/size.		// mul.with.overflow and adjust properly mask/size.
if (MulVal->hasNUsesOrMore(2)) {		if (MulVal->hasNUsesOrMore(2)) {
Value *Mul = Builder.CreateExtractValue(Call, 0, "umul.value");		Value *Mul = Builder.CreateExtractValue(Call, 0, "umul.value");
for (auto UI = MulVal->user_begin(), UE = MulVal->user_end(); UI != UE;) {		for (auto UI = MulVal->user_begin(), UE = MulVal->user_end(); UI != UE;) {
User *U = *UI++;		User *U = *UI++;
Show All 10 Lines	for (auto UI = MulVal->user_begin(), UE = MulVal->user_end(); UI != UE;) {
ConstantInt *CI = cast<ConstantInt>(BO->getOperand(1));		ConstantInt *CI = cast<ConstantInt>(BO->getOperand(1));
APInt ShortMask = CI->getValue().trunc(MulWidth);		APInt ShortMask = CI->getValue().trunc(MulWidth);
Value *ShortAnd = Builder.CreateAnd(Mul, ShortMask);		Value *ShortAnd = Builder.CreateAnd(Mul, ShortMask);
Value *Zext = Builder.CreateZExt(ShortAnd, BO->getType());		Value *Zext = Builder.CreateZExt(ShortAnd, BO->getType());
IC.replaceInstUsesWith(*BO, Zext);		IC.replaceInstUsesWith(*BO, Zext);
} else {		} else {
llvm_unreachable("Unexpected Binary operation");		llvm_unreachable("Unexpected Binary operation");
}		}
IC.Worklist.push(cast<Instruction>(U));		IC.addToWorklist(cast<Instruction>(U));
}		}
}		}
if (isa<Instruction>(OtherVal))		if (isa<Instruction>(OtherVal))
IC.Worklist.push(cast<Instruction>(OtherVal));		IC.addToWorklist(cast<Instruction>(OtherVal));

// The original icmp gets replaced with the overflow value, maybe inverted		// The original icmp gets replaced with the overflow value, maybe inverted
// depending on predicate.		// depending on predicate.
bool Inverse = false;		bool Inverse = false;
switch (I.getPredicate()) {		switch (I.getPredicate()) {
case ICmpInst::ICMP_NE:		case ICmpInst::ICMP_NE:
break;		break;
case ICmpInst::ICMP_EQ:		case ICmpInst::ICMP_EQ:
Show All 28 Lines
static APInt getDemandedBitsLHSMask(ICmpInst &I, unsigned BitWidth) {		static APInt getDemandedBitsLHSMask(ICmpInst &I, unsigned BitWidth) {
const APInt *RHS;		const APInt *RHS;
if (!match(I.getOperand(1), m_APInt(RHS)))		if (!match(I.getOperand(1), m_APInt(RHS)))
return APInt::getAllOnesValue(BitWidth);		return APInt::getAllOnesValue(BitWidth);

// If this is a normal comparison, it demands all bits. If it is a sign bit		// If this is a normal comparison, it demands all bits. If it is a sign bit
// comparison, it only demands the sign bit.		// comparison, it only demands the sign bit.
bool UnusedBit;		bool UnusedBit;
if (isSignBitCheck(I.getPredicate(), *RHS, UnusedBit))		if (InstCombiner::isSignBitCheck(I.getPredicate(), *RHS, UnusedBit))
return APInt::getSignMask(BitWidth);		return APInt::getSignMask(BitWidth);

switch (I.getPredicate()) {		switch (I.getPredicate()) {
// For a UGT comparison, we don't care about any bits that		// For a UGT comparison, we don't care about any bits that
// correspond to the trailing ones of the comparand. The value of these		// correspond to the trailing ones of the comparand. The value of these
// bits doesn't impact the outcome of the comparison, because any value		// bits doesn't impact the outcome of the comparison, because any value
// greater than the RHS must differ in a bit higher than these due to carry.		// greater than the RHS must differ in a bit higher than these due to carry.
case ICmpInst::ICMP_UGT:		case ICmpInst::ICMP_UGT:
Show All 40 Lines
///		///
/// \param DI Definition		/// \param DI Definition
/// \param UI Use		/// \param UI Use
/// \param DB Block that must dominate all uses of \p DI outside		/// \param DB Block that must dominate all uses of \p DI outside
/// the parent block		/// the parent block
/// \return true when \p UI is the only use of \p DI in the parent block		/// \return true when \p UI is the only use of \p DI in the parent block
/// and all other uses of \p DI are in blocks dominated by \p DB.		/// and all other uses of \p DI are in blocks dominated by \p DB.
///		///
bool InstCombiner::dominatesAllUses(const Instruction *DI,		bool InstCombinerImpl::dominatesAllUses(const Instruction *DI,
const Instruction *UI,		const Instruction *UI,
const BasicBlock *DB) const {		const BasicBlock *DB) const {
assert(DI && UI && "Instruction not defined\n");		assert(DI && UI && "Instruction not defined\n");
// Ignore incomplete definitions.		// Ignore incomplete definitions.
if (!DI->getParent())		if (!DI->getParent())
return false;		return false;
// DI and UI must be in the same block.		// DI and UI must be in the same block.
if (DI->getParent() != UI->getParent())		if (DI->getParent() != UI->getParent())
return false;		return false;
// Protect from self-referencing blocks.		// Protect from self-referencing blocks.
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
/// Similar when the first operand of the select is a constant or/and		/// Similar when the first operand of the select is a constant or/and
/// the compare is for not equal rather than equal.		/// the compare is for not equal rather than equal.
///		///
/// NOTE: The function is only called when the select and compare constants		/// NOTE: The function is only called when the select and compare constants
/// are equal, the optimization can work only for EQ predicates. This is not a		/// are equal, the optimization can work only for EQ predicates. This is not a
/// major restriction since a NE compare should be 'normalized' to an equal		/// major restriction since a NE compare should be 'normalized' to an equal
/// compare, which usually happens in the combiner and test case		/// compare, which usually happens in the combiner and test case
/// select-cmp-br.ll checks for it.		/// select-cmp-br.ll checks for it.
bool InstCombiner::replacedSelectWithOperand(SelectInst *SI,		bool InstCombinerImpl::replacedSelectWithOperand(SelectInst *SI,
const ICmpInst *Icmp,		const ICmpInst *Icmp,
const unsigned SIOpd) {		const unsigned SIOpd) {
assert((SIOpd == 1 || SIOpd == 2) && "Invalid select operand!");		assert((SIOpd == 1 || SIOpd == 2) && "Invalid select operand!");
if (isChainSelectCmpBranch(SI) && Icmp->getPredicate() == ICmpInst::ICMP_EQ) {		if (isChainSelectCmpBranch(SI) && Icmp->getPredicate() == ICmpInst::ICMP_EQ) {
BasicBlock *Succ = SI->getParent()->getTerminator()->getSuccessor(1);		BasicBlock *Succ = SI->getParent()->getTerminator()->getSuccessor(1);
// The check for the single predecessor is not the best that can be		// The check for the single predecessor is not the best that can be
// done. But it protects efficiently against cases like when SI's		// done. But it protects efficiently against cases like when SI's
// home block has two successors, Succ and Succ1, and Succ1 predecessor		// home block has two successors, Succ and Succ1, and Succ1 predecessor
// of Succ. Then SI can't be replaced by SIOpd because the use that gets		// of Succ. Then SI can't be replaced by SIOpd because the use that gets
// replaced can be reached on either path. So the uniqueness check		// replaced can be reached on either path. So the uniqueness check
Show All 9 Lines	if (Succ->getSinglePredecessor() && dominatesAllUses(SI, Icmp, Succ)) {
return true;		return true;
}		}
}		}
return false;		return false;
}		}

/// Try to fold the comparison based on range information we can get by checking		/// Try to fold the comparison based on range information we can get by checking
/// whether bits are known to be zero or one in the inputs.		/// whether bits are known to be zero or one in the inputs.
Instruction *InstCombiner::foldICmpUsingKnownBits(ICmpInst &I) {		Instruction *InstCombinerImpl::foldICmpUsingKnownBits(ICmpInst &I) {
Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);		Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
Type *Ty = Op0->getType();		Type *Ty = Op0->getType();
ICmpInst::Predicate Pred = I.getPredicate();		ICmpInst::Predicate Pred = I.getPredicate();

// Get scalar or pointer size.		// Get scalar or pointer size.
unsigned BitWidth = Ty->isIntOrIntVectorTy()		unsigned BitWidth = Ty->isIntOrIntVectorTy()
? Ty->getScalarSizeInBits()		? Ty->getScalarSizeInBits()
: DL.getPointerTypeSizeInBits(Ty->getScalarType());		: DL.getPointerTypeSizeInBits(Ty->getScalarType());
▲ Show 20 Lines • Show All 210 Lines • ▼ Show 20 Lines	if (I.isSigned() &&
((Op0Known.Zero.isNegative() && Op1Known.Zero.isNegative()) ||		((Op0Known.Zero.isNegative() && Op1Known.Zero.isNegative()) ||
(Op0Known.One.isNegative() && Op1Known.One.isNegative())))		(Op0Known.One.isNegative() && Op1Known.One.isNegative())))
return new ICmpInst(I.getUnsignedPredicate(), Op0, Op1);		return new ICmpInst(I.getUnsignedPredicate(), Op0, Op1);

return nullptr;		return nullptr;
}		}

llvm::Optional<std::pair<CmpInst::Predicate, Constant *>>		llvm::Optional<std::pair<CmpInst::Predicate, Constant *>>
llvm::getFlippedStrictnessPredicateAndConstant(CmpInst::Predicate Pred,		InstCombiner::getFlippedStrictnessPredicateAndConstant(CmpInst::Predicate Pred,
Constant *C) {		Constant *C) {
assert(ICmpInst::isRelational(Pred) && ICmpInst::isIntPredicate(Pred) &&		assert(ICmpInst::isRelational(Pred) && ICmpInst::isIntPredicate(Pred) &&
"Only for relational integer predicates.");		"Only for relational integer predicates.");

Type *Type = C->getType();		Type *Type = C->getType();
bool IsSigned = ICmpInst::isSigned(Pred);		bool IsSigned = ICmpInst::isSigned(Pred);

CmpInst::Predicate UnsignedPred = ICmpInst::getUnsignedPredicate(Pred);		CmpInst::Predicate UnsignedPred = ICmpInst::getUnsignedPredicate(Pred);
bool WillIncrement =		bool WillIncrement =
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines
}		}

/// If we have an icmp le or icmp ge instruction with a constant operand, turn		/// If we have an icmp le or icmp ge instruction with a constant operand, turn
/// it into the appropriate icmp lt or icmp gt instruction. This transform		/// it into the appropriate icmp lt or icmp gt instruction. This transform
/// allows them to be folded in visitICmpInst.		/// allows them to be folded in visitICmpInst.
static ICmpInst *canonicalizeCmpWithConstant(ICmpInst &I) {		static ICmpInst *canonicalizeCmpWithConstant(ICmpInst &I) {
ICmpInst::Predicate Pred = I.getPredicate();		ICmpInst::Predicate Pred = I.getPredicate();
if (ICmpInst::isEquality(Pred) || !ICmpInst::isIntPredicate(Pred) ||		if (ICmpInst::isEquality(Pred) || !ICmpInst::isIntPredicate(Pred) ||
isCanonicalPredicate(Pred))		InstCombiner::isCanonicalPredicate(Pred))
return nullptr;		return nullptr;

Value *Op0 = I.getOperand(0);		Value *Op0 = I.getOperand(0);
Value *Op1 = I.getOperand(1);		Value *Op1 = I.getOperand(1);
auto *Op1C = dyn_cast<Constant>(Op1);		auto *Op1C = dyn_cast<Constant>(Op1);
if (!Op1C)		if (!Op1C)
return nullptr;		return nullptr;

auto FlippedStrictness = getFlippedStrictnessPredicateAndConstant(Pred, Op1C);		auto FlippedStrictness =
		InstCombiner::getFlippedStrictnessPredicateAndConstant(Pred, Op1C);
if (!FlippedStrictness)		if (!FlippedStrictness)
return nullptr;		return nullptr;

return new ICmpInst(FlippedStrictness->first, Op0, FlippedStrictness->second);		return new ICmpInst(FlippedStrictness->first, Op0, FlippedStrictness->second);
}		}

/// Integer compare with boolean values can always be turned into bitwise ops.		/// Integer compare with boolean values can always be turned into bitwise ops.
static Instruction *canonicalizeICmpBool(ICmpInst &I,		static Instruction *canonicalizeICmpBool(ICmpInst &I,
▲ Show 20 Lines • Show All 187 Lines • ▼ Show 20 Lines	else if (match(Op1, UAddOvResultPat) &&
// A > extract(uadd.with.overflow(A, B), 0)		// A > extract(uadd.with.overflow(A, B), 0)
UAddOv = cast<ExtractValueInst>(Op1)->getAggregateOperand();		UAddOv = cast<ExtractValueInst>(Op1)->getAggregateOperand();
else		else
return nullptr;		return nullptr;

return ExtractValueInst::Create(UAddOv, 1);		return ExtractValueInst::Create(UAddOv, 1);
}		}

Instruction *InstCombiner::visitICmpInst(ICmpInst &I) {		Instruction *InstCombinerImpl::visitICmpInst(ICmpInst &I) {
bool Changed = false;		bool Changed = false;
const SimplifyQuery Q = SQ.getWithInstruction(&I);		const SimplifyQuery Q = SQ.getWithInstruction(&I);
Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);		Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
unsigned Op0Cplxity = getComplexity(Op0);		unsigned Op0Cplxity = getComplexity(Op0);
unsigned Op1Cplxity = getComplexity(Op1);		unsigned Op1Cplxity = getComplexity(Op1);

/// Orders the operands of the compare so that they are listed from most		/// Orders the operands of the compare so that they are listed from most
/// complex to least complex. This puts constants before unary operators,		/// complex to least complex. This puts constants before unary operators,
▲ Show 20 Lines • Show All 218 Lines • ▼ Show 20 Lines	Instruction *InstCombinerImpl::visitICmpInst(ICmpInst &I) {
if (I.getType()->isVectorTy())		if (I.getType()->isVectorTy())
if (Instruction *Res = foldVectorCmp(I, Builder))		if (Instruction *Res = foldVectorCmp(I, Builder))
return Res;		return Res;

return Changed ? &I : nullptr;		return Changed ? &I : nullptr;
}		}

/// Fold fcmp ([us]itofp x, cst) if possible.		/// Fold fcmp ([us]itofp x, cst) if possible.
Instruction *InstCombiner::foldFCmpIntToFPConst(FCmpInst &I, Instruction *LHSI,		Instruction *InstCombinerImpl::foldFCmpIntToFPConst(FCmpInst &I,
		Instruction *LHSI,
Constant *RHSC) {		Constant *RHSC) {
if (!isa<ConstantFP>(RHSC)) return nullptr;		if (!isa<ConstantFP>(RHSC)) return nullptr;
const APFloat &RHS = cast<ConstantFP>(RHSC)->getValueAPF();		const APFloat &RHS = cast<ConstantFP>(RHSC)->getValueAPF();

// Get the width of the mantissa. We don't want to hack on conversions that		// Get the width of the mantissa. We don't want to hack on conversions that
// might lose information from the integer, e.g. "i64 -> float"		// might lose information from the integer, e.g. "i64 -> float"
int MantissaWidth = LHSI->getType()->getFPMantissaWidth();		int MantissaWidth = LHSI->getType()->getFPMantissaWidth();
if (MantissaWidth == -1) return nullptr; // Unknown.		if (MantissaWidth == -1) return nullptr; // Unknown.

▲ Show 20 Lines • Show All 268 Lines • ▼ Show 20 Lines	static Instruction *foldFCmpReciprocalAndZero(FCmpInst &I, Instruction *LHSI,
// Get swapped predicate if necessary.		// Get swapped predicate if necessary.
if (C->isNegative())		if (C->isNegative())
Pred = I.getSwappedPredicate();		Pred = I.getSwappedPredicate();

return new FCmpInst(Pred, LHSI->getOperand(1), RHSC, "", &I);		return new FCmpInst(Pred, LHSI->getOperand(1), RHSC, "", &I);
}		}

/// Optimize fabs(X) compared with zero.		/// Optimize fabs(X) compared with zero.
static Instruction *foldFabsWithFcmpZero(FCmpInst &I, InstCombiner &IC) {		static Instruction *foldFabsWithFcmpZero(FCmpInst &I, InstCombinerImpl &IC) {
Value *X;		Value *X;
if (!match(I.getOperand(0), m_Intrinsic<Intrinsic::fabs>(m_Value(X))) ||		if (!match(I.getOperand(0), m_Intrinsic<Intrinsic::fabs>(m_Value(X))) ||
!match(I.getOperand(1), m_PosZeroFP()))		!match(I.getOperand(1), m_PosZeroFP()))
return nullptr;		return nullptr;

auto replacePredAndOp0 = [&IC](FCmpInst *I, FCmpInst::Predicate P, Value *X) {		auto replacePredAndOp0 = [&IC](FCmpInst *I, FCmpInst::Predicate P, Value *X) {
I->setPredicate(P);		I->setPredicate(P);
return IC.replaceOperand(*I, 0, X);		return IC.replaceOperand(*I, 0, X);
▲ Show 20 Lines • Show All 45 Lines • ▼ Show 20 Lines	case FCmpInst::FCMP_UNO:
// !isnan(fabs(X) --> !isnan(X)		// !isnan(fabs(X) --> !isnan(X)
return replacePredAndOp0(&I, I.getPredicate(), X);		return replacePredAndOp0(&I, I.getPredicate(), X);

default:		default:
return nullptr;		return nullptr;
}		}
}		}

Instruction *InstCombiner::visitFCmpInst(FCmpInst &I) {		Instruction *InstCombinerImpl::visitFCmpInst(FCmpInst &I) {
bool Changed = false;		bool Changed = false;

/// Orders the operands of the compare so that they are listed from most		/// Orders the operands of the compare so that they are listed from most
/// complex to least complex. This puts constants before unary operators,		/// complex to least complex. This puts constants before unary operators,
/// before binary operators.		/// before binary operators.
if (getComplexity(I.getOperand(0)) < getComplexity(I.getOperand(1))) {		if (getComplexity(I.getOperand(0)) < getComplexity(I.getOperand(1))) {
I.swapOperands();		I.swapOperands();
Changed = true;		Changed = true;
▲ Show 20 Lines • Show All 146 Lines • Show Last 20 Lines
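
Note on the strictness flip used by canonicalizeCmpWithConstant above: the comment says an icmp le/ge with a constant is turned into the matching icmp lt/gt. The following is a minimal stand-alone sketch of that idea for the signed "<=" case on 8-bit values. It is not LLVM's implementation; flipSleToSlt and flippedFormAgreesEverywhere are hypothetical names introduced only for illustration, and the model ignores vectors and undef elements.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical model: rewrite "x s<= C" as "x s< C + 1" for 8-bit values.
// Returns false when no strict form exists because C + 1 would wrap.
bool flipSleToSlt(int8_t C, int8_t &Flipped) {
  if (C == std::numeric_limits<int8_t>::max())
    return false;
  Flipped = static_cast<int8_t>(C + 1);
  return true;
}

// Exhaustively checks that the flipped compare agrees with the original
// compare for every (x, C) pair where the flip is possible.
bool flippedFormAgreesEverywhere() {
  for (int C = -128; C <= 127; ++C) {
    int8_t Flipped = 0;
    if (!flipSleToSlt(static_cast<int8_t>(C), Flipped))
      continue;
    for (int X = -128; X <= 127; ++X)
      if ((X <= C) != (X < Flipped))
        return false;
  }
  return true;
}

// Convenience wrapper so the check can be run from any caller.
void selfCheck() { assert(flippedFormAgreesEverywhere()); }
```

The one case the sketch refuses to flip is C equal to the maximum signed value, where C + 1 would wrap around and the strict compare would no longer be equivalent.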

llvm/lib/Transforms/InstCombine/InstCombineInternal.h

Show All 9 Lines
///		///
/// This file provides internal interfaces used to implement the InstCombine.		/// This file provides internal interfaces used to implement the InstCombine.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TRANSFORMS_INSTCOMBINE_INSTCOMBINEINTERNAL_H		#ifndef LLVM_LIB_TRANSFORMS_INSTCOMBINE_INSTCOMBINEINTERNAL_H
#define LLVM_LIB_TRANSFORMS_INSTCOMBINE_INSTCOMBINEINTERNAL_H		#define LLVM_LIB_TRANSFORMS_INSTCOMBINE_INSTCOMBINEINTERNAL_H

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/TargetFolder.h"		#include "llvm/Analysis/TargetFolder.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstVisitor.h"		#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Use.h"
#include "llvm/IR/Value.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/InstCombine/InstCombineWorklist.h"		#include "llvm/Transforms/InstCombine/InstCombineWorklist.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include <cassert>		#include <cassert>
#include <cstdint>

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

using namespace llvm::PatternMatch;		using namespace llvm::PatternMatch;

		// As a default, let's assume that we want to be aggressive,
		// and attempt to traverse with no limits in attempt to sink negation.
		static constexpr unsigned NegatorDefaultMaxDepth = ~0U;

		// Let's guesstimate that most often we will end up visiting/producing
		// fairly small number of new instructions.
		static constexpr unsigned NegatorMaxNodesSSO = 16;

namespace llvm {		namespace llvm {

class AAResults;		class AAResults;
class APInt;		class APInt;
class AssumptionCache;		class AssumptionCache;
class BlockFrequencyInfo;		class BlockFrequencyInfo;
class DataLayout;		class DataLayout;
class DominatorTree;		class DominatorTree;
class GEPOperator;		class GEPOperator;
class GlobalVariable;		class GlobalVariable;
class LoopInfo;		class LoopInfo;
class OptimizationRemarkEmitter;		class OptimizationRemarkEmitter;
class ProfileSummaryInfo;		class ProfileSummaryInfo;
class TargetLibraryInfo;		class TargetLibraryInfo;
class User;		class User;

/// Assign a complexity or rank value to LLVM Values. This is used to reduce		class LLVM_LIBRARY_VISIBILITY InstCombinerImpl final
/// the amount of pattern matching needed for compares and commutative		: public InstCombiner,
/// instructions. For example, if we have:		public InstVisitor<InstCombinerImpl, Instruction *> {
/// icmp ugt X, Constant
/// or
/// xor (add X, Constant), cast Z
///
/// We do not have to consider the commuted variants of these patterns because
/// canonicalization based on complexity guarantees the above ordering.
///
/// This routine maps IR values to various complexity ranks:
/// 0 -> undef
/// 1 -> Constants
/// 2 -> Other non-instructions
/// 3 -> Arguments
/// 4 -> Cast and (f)neg/not instructions
/// 5 -> Other instructions
static inline unsigned getComplexity(Value *V) {
if (isa<Instruction>(V)) {
if (isa<CastInst>(V) || match(V, m_Neg(m_Value())) ||
match(V, m_Not(m_Value())) || match(V, m_FNeg(m_Value())))
return 4;
return 5;
}
if (isa<Argument>(V))
return 3;
return isa<Constant>(V) ? (isa<UndefValue>(V) ? 0 : 1) : 2;
}

/// Predicate canonicalization reduces the number of patterns that need to be
/// matched by other transforms. For example, we may swap the operands of a
/// conditional branch or select to create a compare with a canonical (inverted)
/// predicate which is then more likely to be matched with other values.
static inline bool isCanonicalPredicate(CmpInst::Predicate Pred) {
switch (Pred) {
case CmpInst::ICMP_NE:
case CmpInst::ICMP_ULE:
case CmpInst::ICMP_SLE:
case CmpInst::ICMP_UGE:
case CmpInst::ICMP_SGE:
// TODO: There are 16 FCMP predicates. Should others be (not) canonical?
case CmpInst::FCMP_ONE:
case CmpInst::FCMP_OLE:
case CmpInst::FCMP_OGE:
return false;
default:
return true;
}
}

/// Given an exploded icmp instruction, return true if the comparison only
/// checks the sign bit. If it only checks the sign bit, set TrueIfSigned if the
/// result of the comparison is true when the input value is signed.
inline bool isSignBitCheck(ICmpInst::Predicate Pred, const APInt &RHS,
bool &TrueIfSigned) {
switch (Pred) {
case ICmpInst::ICMP_SLT: // True if LHS s< 0
TrueIfSigned = true;
return RHS.isNullValue();
case ICmpInst::ICMP_SLE: // True if LHS s<= -1
TrueIfSigned = true;
return RHS.isAllOnesValue();
case ICmpInst::ICMP_SGT: // True if LHS s> -1
TrueIfSigned = false;
return RHS.isAllOnesValue();
case ICmpInst::ICMP_SGE: // True if LHS s>= 0
TrueIfSigned = false;
return RHS.isNullValue();
case ICmpInst::ICMP_UGT:
// True if LHS u> RHS and RHS == sign-bit-mask - 1
TrueIfSigned = true;
return RHS.isMaxSignedValue();
case ICmpInst::ICMP_UGE:
// True if LHS u>= RHS and RHS == sign-bit-mask (2^7, 2^15, 2^31, etc)
TrueIfSigned = true;
return RHS.isMinSignedValue();
case ICmpInst::ICMP_ULT:
// True if LHS u< RHS and RHS == sign-bit-mask (2^7, 2^15, 2^31, etc)
TrueIfSigned = false;
return RHS.isMinSignedValue();
case ICmpInst::ICMP_ULE:
// True if LHS u<= RHS and RHS == sign-bit-mask - 1
TrueIfSigned = false;
return RHS.isMaxSignedValue();
default:
return false;
}
}

llvm::Optional<std::pair<CmpInst::Predicate, Constant *>>
getFlippedStrictnessPredicateAndConstant(CmpInst::Predicate Pred, Constant *C);

/// Return the source operand of a potentially bitcasted value while optionally
/// checking if it has one use. If there is no bitcast or the one use check is
/// not met, return the input value itself.
static inline Value *peekThroughBitcast(Value *V, bool OneUseOnly = false) {
if (auto *BitCast = dyn_cast<BitCastInst>(V))
if (!OneUseOnly || BitCast->hasOneUse())
return BitCast->getOperand(0);

// V is not a bitcast or V has more than one use and OneUseOnly is true.
return V;
}

/// Add one to a Constant
static inline Constant *AddOne(Constant *C) {
return ConstantExpr::getAdd(C, ConstantInt::get(C->getType(), 1));
}

/// Subtract one from a Constant
static inline Constant *SubOne(Constant *C) {
return ConstantExpr::getSub(C, ConstantInt::get(C->getType(), 1));
}

/// Return true if the specified value is free to invert (apply ~ to).
/// This happens in cases where the ~ can be eliminated. If WillInvertAllUses
/// is true, work under the assumption that the caller intends to remove all
/// uses of V and only keep uses of ~V.
///
/// See also: canFreelyInvertAllUsersOf()
static inline bool isFreeToInvert(Value *V, bool WillInvertAllUses) {
// ~(~(X)) -> X.
if (match(V, m_Not(m_Value())))
return true;

// Constants can be considered to be not'ed values.
if (match(V, m_AnyIntegralConstant()))
return true;

// Compares can be inverted if all of their uses are being modified to use the
// ~V.
if (isa<CmpInst>(V))
return WillInvertAllUses;

// If `V` is of the form `A + Constant` then `-1 - V` can be folded into `(-1
// - Constant) - A` if we are willing to invert all of the uses.
if (BinaryOperator *BO = dyn_cast<BinaryOperator>(V))
if (BO->getOpcode() == Instruction::Add ||
BO->getOpcode() == Instruction::Sub)
if (isa<Constant>(BO->getOperand(0)) || isa<Constant>(BO->getOperand(1)))
return WillInvertAllUses;

// Selects with invertible operands are freely invertible
if (match(V, m_Select(m_Value(), m_Not(m_Value()), m_Not(m_Value()))))
return WillInvertAllUses;

return false;
}

/// Given i1 V, can every user of V be freely adapted if V is changed to !V ?
///
/// See also: isFreeToInvert()
static inline bool canFreelyInvertAllUsersOf(Value *V, Value *IgnoredUser) {
// Look at every user of V.
for (User *U : V->users()) {
if (U == IgnoredUser)
continue; // Don't consider this user.

auto *I = cast<Instruction>(U);
switch (I->getOpcode()) {
case Instruction::Select:
case Instruction::Br:
break; // Free to invert by swapping true/false values/destinations.
case Instruction::Xor: // Can invert 'xor' if it's a 'not', by ignoring it.
if (!match(I, m_Not(m_Value())))
return false; // Not a 'not'.
break;
default:
return false; // Don't know, likely not freely invertible.
}
// So far all users were free to invert...
}
return true; // Can freely invert all users!
}

/// Some binary operators require special handling to avoid poison and undefined
/// behavior. If a constant vector has undef elements, replace those undefs with
/// identity constants if possible because those are always safe to execute.
/// If no identity constant exists, replace undef with some other safe constant.
static inline Constant *getSafeVectorConstantForBinop(
BinaryOperator::BinaryOps Opcode, Constant *In, bool IsRHSConstant) {
auto *InVTy = dyn_cast<VectorType>(In->getType());
assert(InVTy && "Not expecting scalars here");

Type *EltTy = InVTy->getElementType();
auto *SafeC = ConstantExpr::getBinOpIdentity(Opcode, EltTy, IsRHSConstant);
if (!SafeC) {
// TODO: Should this be available as a constant utility function? It is
// similar to getBinOpAbsorber().
if (IsRHSConstant) {
switch (Opcode) {
case Instruction::SRem: // X % 1 = 0
case Instruction::URem: // X %u 1 = 0
SafeC = ConstantInt::get(EltTy, 1);
break;
case Instruction::FRem: // X % 1.0 (doesn't simplify, but it is safe)
SafeC = ConstantFP::get(EltTy, 1.0);
break;
default:
llvm_unreachable("Only rem opcodes have no identity constant for RHS");
}
} else {
switch (Opcode) {
case Instruction::Shl: // 0 << X = 0
case Instruction::LShr: // 0 >>u X = 0
case Instruction::AShr: // 0 >> X = 0
case Instruction::SDiv: // 0 / X = 0
case Instruction::UDiv: // 0 /u X = 0
case Instruction::SRem: // 0 % X = 0
case Instruction::URem: // 0 %u X = 0
case Instruction::Sub: // 0 - X (doesn't simplify, but it is safe)
case Instruction::FSub: // 0.0 - X (doesn't simplify, but it is safe)
case Instruction::FDiv: // 0.0 / X (doesn't simplify, but it is safe)
case Instruction::FRem: // 0.0 % X = 0
SafeC = Constant::getNullValue(EltTy);
break;
default:
llvm_unreachable("Expected to find identity constant for opcode");
}
}
}
assert(SafeC && "Must have safe constant for binop");
unsigned NumElts = InVTy->getNumElements();
SmallVector<Constant *, 16> Out(NumElts);
for (unsigned i = 0; i != NumElts; ++i) {
Constant *C = In->getAggregateElement(i);
Out[i] = isa<UndefValue>(C) ? SafeC : C;
}
return ConstantVector::get(Out);
}

/// The core instruction combiner logic.
///
/// This class provides both the logic to recursively visit instructions and
/// combine them.
class LLVM_LIBRARY_VISIBILITY InstCombiner
: public InstVisitor<InstCombiner, Instruction *> {
// FIXME: These members shouldn't be public.
public:		public:
/// A worklist of the instructions that need to be simplified.		InstCombinerImpl(InstCombineWorklist &Worklist, BuilderTy &Builder,
InstCombineWorklist &Worklist;		bool MinimizeSize, AAResults *AA, AssumptionCache &AC,
		TargetLibraryInfo &TLI, TargetTransformInfo &TTI,
		DominatorTree &DT, OptimizationRemarkEmitter &ORE,
		BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI,
		const DataLayout &DL, LoopInfo *LI)
		: InstCombiner(Worklist, Builder, MinimizeSize, AA, AC, TLI, TTI, DT, ORE,
		BFI, PSI, DL, LI) {}

/// An IRBuilder that automatically inserts new instructions into the		virtual ~InstCombinerImpl() {}
/// worklist.
using BuilderTy = IRBuilder<TargetFolder, IRBuilderCallbackInserter>;
BuilderTy &Builder;

private:
// Mode in which we are running the combiner.
const bool MinimizeSize;

AAResults *AA;

// Required analyses.
AssumptionCache &AC;
TargetLibraryInfo &TLI;
DominatorTree &DT;
const DataLayout &DL;
const SimplifyQuery SQ;
OptimizationRemarkEmitter &ORE;
BlockFrequencyInfo *BFI;
ProfileSummaryInfo *PSI;

// Optional analyses. When non-null, these can both be used to do better
// combining and will be updated to reflect any changes.
LoopInfo *LI;

bool MadeIRChange = false;

public:
InstCombiner(InstCombineWorklist &Worklist, BuilderTy &Builder,
bool MinimizeSize, AAResults *AA,
AssumptionCache &AC, TargetLibraryInfo &TLI, DominatorTree &DT,
OptimizationRemarkEmitter &ORE, BlockFrequencyInfo *BFI,
ProfileSummaryInfo *PSI, const DataLayout &DL, LoopInfo *LI)
: Worklist(Worklist), Builder(Builder), MinimizeSize(MinimizeSize),
AA(AA), AC(AC), TLI(TLI), DT(DT),
DL(DL), SQ(DL, &TLI, &DT, &AC), ORE(ORE), BFI(BFI), PSI(PSI), LI(LI) {}

/// Run the combiner over the entire worklist until it is empty.		/// Run the combiner over the entire worklist until it is empty.
///		///
/// \returns true if the IR is changed.		/// \returns true if the IR is changed.
bool run();		bool run();

AssumptionCache &getAssumptionCache() const { return AC; }

const DataLayout &getDataLayout() const { return DL; }

DominatorTree &getDominatorTree() const { return DT; }

LoopInfo *getLoopInfo() const { return LI; }

TargetLibraryInfo &getTargetLibraryInfo() const { return TLI; }

// Visitation implementation - Implement instruction combining for different		// Visitation implementation - Implement instruction combining for different
// instruction types. The semantics are as follows:		// instruction types. The semantics are as follows:
// Return Value:		// Return Value:
// null - No change was made		// null - No change was made
// I - Change was made, I is still valid, I may be dead though		// I - Change was made, I is still valid, I may be dead though
// otherwise - Change was made, replace I with returned instruction		// otherwise - Change was made, replace I with returned instruction
//		//
Instruction *visitFNeg(UnaryOperator &I);		Instruction *visitFNeg(UnaryOperator &I);
▲ Show 20 Lines • Show All 94 Lines • ▼ Show 20 Lines	public:
/// Try to replace instruction \p I with value \p V which are pointers		/// Try to replace instruction \p I with value \p V which are pointers
/// in different address space.		/// in different address space.
/// \return true if successful.		/// \return true if successful.
bool replacePointer(Instruction &I, Value *V);		bool replacePointer(Instruction &I, Value *V);

LoadInst *combineLoadToNewType(LoadInst &LI, Type *NewTy,		LoadInst *combineLoadToNewType(LoadInst &LI, Type *NewTy,
const Twine &Suffix = "");		const Twine &Suffix = "");

		virtual Instruction *eraseInstFromFunction(Instruction &I) override {
		LLVM_DEBUG(dbgs() << "IC: ERASE " << I << '\n');
		assert(I.use_empty() && "Cannot erase instruction that is used!");
		salvageDebugInfo(I);

		// Make sure that we reprocess all operands now that we reduced their
		// use counts.
		for (Use &Operand : I.operands())
		if (auto *Inst = dyn_cast<Instruction>(Operand))
		Worklist.add(Inst);

		Worklist.remove(&I);
		I.eraseFromParent();
		MadeIRChange = true;
		return nullptr; // Don't do anything with FI
		}

private:		private:
bool shouldChangeType(unsigned FromBitWidth, unsigned ToBitWidth) const;		bool shouldChangeType(unsigned FromBitWidth, unsigned ToBitWidth) const;
bool shouldChangeType(Type *From, Type *To) const;		bool shouldChangeType(Type *From, Type *To) const;
Value *dyn_castNegVal(Value *V) const;		Value *dyn_castNegVal(Value *V) const;
Type *FindElementAtOffset(PointerType *PtrTy, int64_t Offset,		Type *FindElementAtOffset(PointerType *PtrTy, int64_t Offset,
SmallVectorImpl<Value *> &NewIndices);		SmallVectorImpl<Value *> &NewIndices);

/// Classify whether a cast is worth optimizing.		/// Classify whether a cast is worth optimizing.
▲ Show 20 Lines • Show All 151 Lines • ▼ Show 20 Lines	private:

Value *foldAndOrOfICmpsOfAndWithPow2(ICmpInst *LHS, ICmpInst *RHS,		Value *foldAndOrOfICmpsOfAndWithPow2(ICmpInst *LHS, ICmpInst *RHS,
BinaryOperator &Logic);		BinaryOperator &Logic);
Value *matchSelectFromAndOr(Value *A, Value *B, Value *C, Value *D);		Value *matchSelectFromAndOr(Value *A, Value *B, Value *C, Value *D);
Value *getSelectCondition(Value *A, Value *B);		Value *getSelectCondition(Value *A, Value *B);

Instruction *foldIntrinsicWithOverflowCommon(IntrinsicInst *II);		Instruction *foldIntrinsicWithOverflowCommon(IntrinsicInst *II);

public:
/// Inserts an instruction \p New before instruction \p Old
///
/// Also adds the new instruction to the worklist and returns \p New so that
/// it is suitable for use as the return from the visitation patterns.
Instruction *InsertNewInstBefore(Instruction *New, Instruction &Old) {
assert(New && !New->getParent() &&
"New instruction already inserted into a basic block!");
BasicBlock *BB = Old.getParent();
BB->getInstList().insert(Old.getIterator(), New); // Insert inst
Worklist.push(New);
return New;
}

/// Same as InsertNewInstBefore, but also sets the debug loc.
Instruction *InsertNewInstWith(Instruction *New, Instruction &Old) {
New->setDebugLoc(Old.getDebugLoc());
return InsertNewInstBefore(New, Old);
}

/// A combiner-aware RAUW-like routine.
///
/// This method is to be used when an instruction is found to be dead,
/// replaceable with another preexisting expression. Here we add all uses of
/// I to the worklist, replace all uses of I with the new value, then return
/// I, so that the inst combiner will know that I was modified.
Instruction *replaceInstUsesWith(Instruction &I, Value *V) {
// If there are no uses to replace, then we return nullptr to indicate that
// no changes were made to the program.
if (I.use_empty()) return nullptr;

Worklist.pushUsersToWorkList(I); // Add all modified instrs to worklist.

// If we are replacing the instruction with itself, this must be in a
// segment of unreachable code, so just clobber the instruction.
if (&I == V)
V = UndefValue::get(I.getType());

LLVM_DEBUG(dbgs() << "IC: Replacing " << I << "\n"
<< " with " << *V << '\n');

I.replaceAllUsesWith(V);
return &I;
}

/// Replace operand of instruction and add old operand to the worklist.
Instruction *replaceOperand(Instruction &I, unsigned OpNum, Value *V) {
Worklist.addValue(I.getOperand(OpNum));
I.setOperand(OpNum, V);
return &I;
}

/// Replace use and add the previously used value to the worklist.
void replaceUse(Use &U, Value *NewValue) {
Worklist.addValue(U);
U = NewValue;
}

/// Creates a result tuple for an overflow intrinsic \p II with a given
/// \p Result and a constant \p Overflow value.
Instruction *CreateOverflowTuple(IntrinsicInst *II, Value *Result,
Constant *Overflow) {
Constant *V[] = {UndefValue::get(Result->getType()), Overflow};
StructType *ST = cast<StructType>(II->getType());
Constant *Struct = ConstantStruct::get(ST, V);
return InsertValueInst::Create(Struct, Result, 0);
}

/// Create and insert the idiom we use to indicate a block is unreachable
/// without having to rewrite the CFG from within InstCombine.
void CreateNonTerminatorUnreachable(Instruction *InsertAt) {
auto &Ctx = InsertAt->getContext();
new StoreInst(ConstantInt::getTrue(Ctx),
UndefValue::get(Type::getInt1PtrTy(Ctx)),
InsertAt);
}


/// Combiner aware instruction erasure.
///
/// When dealing with an instruction that has side effects or produces a void
/// value, we can't rely on DCE to delete the instruction. Instead, visit
/// methods should return the value returned by this function.
Instruction *eraseInstFromFunction(Instruction &I) {
LLVM_DEBUG(dbgs() << "IC: ERASE " << I << '\n');
assert(I.use_empty() && "Cannot erase instruction that is used!");
salvageDebugInfo(I);

// Make sure that we reprocess all operands now that we reduced their
// use counts.
for (Use &Operand : I.operands())
if (auto *Inst = dyn_cast<Instruction>(Operand))
Worklist.add(Inst);

Worklist.remove(&I);
I.eraseFromParent();
MadeIRChange = true;
return nullptr; // Don't do anything with FI
}

void computeKnownBits(const Value *V, KnownBits &Known,
unsigned Depth, const Instruction *CxtI) const {
llvm::computeKnownBits(V, Known, DL, Depth, &AC, CxtI, &DT);
}

KnownBits computeKnownBits(const Value *V, unsigned Depth,
const Instruction *CxtI) const {
return llvm::computeKnownBits(V, DL, Depth, &AC, CxtI, &DT);
}

bool isKnownToBeAPowerOfTwo(const Value *V, bool OrZero = false,
unsigned Depth = 0,
const Instruction *CxtI = nullptr) {
return llvm::isKnownToBeAPowerOfTwo(V, DL, OrZero, Depth, &AC, CxtI, &DT);
}

bool MaskedValueIsZero(const Value *V, const APInt &Mask, unsigned Depth = 0,
const Instruction *CxtI = nullptr) const {
return llvm::MaskedValueIsZero(V, Mask, DL, Depth, &AC, CxtI, &DT);
}

unsigned ComputeNumSignBits(const Value *Op, unsigned Depth = 0,
const Instruction *CxtI = nullptr) const {
return llvm::ComputeNumSignBits(Op, DL, Depth, &AC, CxtI, &DT);
}

OverflowResult computeOverflowForUnsignedMul(const Value *LHS,
const Value *RHS,
const Instruction *CxtI) const {
return llvm::computeOverflowForUnsignedMul(LHS, RHS, DL, &AC, CxtI, &DT);
}

OverflowResult computeOverflowForSignedMul(const Value *LHS,
const Value *RHS,
const Instruction *CxtI) const {
return llvm::computeOverflowForSignedMul(LHS, RHS, DL, &AC, CxtI, &DT);
}

OverflowResult computeOverflowForUnsignedAdd(const Value *LHS,
const Value *RHS,
const Instruction *CxtI) const {
return llvm::computeOverflowForUnsignedAdd(LHS, RHS, DL, &AC, CxtI, &DT);
}

OverflowResult computeOverflowForSignedAdd(const Value *LHS,
const Value *RHS,
const Instruction *CxtI) const {
return llvm::computeOverflowForSignedAdd(LHS, RHS, DL, &AC, CxtI, &DT);
}

OverflowResult computeOverflowForUnsignedSub(const Value *LHS,
const Value *RHS,
const Instruction *CxtI) const {
return llvm::computeOverflowForUnsignedSub(LHS, RHS, DL, &AC, CxtI, &DT);
}

OverflowResult computeOverflowForSignedSub(const Value *LHS, const Value *RHS,
const Instruction *CxtI) const {
return llvm::computeOverflowForSignedSub(LHS, RHS, DL, &AC, CxtI, &DT);
}

OverflowResult computeOverflow(		OverflowResult computeOverflow(
Instruction::BinaryOps BinaryOp, bool IsSigned,		Instruction::BinaryOps BinaryOp, bool IsSigned,
Value *LHS, Value *RHS, Instruction *CxtI) const;		Value *LHS, Value *RHS, Instruction *CxtI) const;

/// Maximum size of array considered when transforming.
uint64_t MaxArraySizeForCombine = 0;

private:
/// Performs a few simplifications for operators which are associative		/// Performs a few simplifications for operators which are associative
/// or commutative.		/// or commutative.
bool SimplifyAssociativeOrCommutative(BinaryOperator &I);		bool SimplifyAssociativeOrCommutative(BinaryOperator &I);

/// Tries to simplify binary operations which some other binary		/// Tries to simplify binary operations which some other binary
/// operation distributes over.		/// operation distributes over.
///		///
/// It does this by either by factorizing out common terms (eg "(A*B)+(A*C)"		/// It does this by either by factorizing out common terms (eg "(A*B)+(A*C)"
Show All 29 Lines	bool matchThreeWayIntCompare(SelectInst *SI, Value *&LHS, Value *&RHS,
ConstantInt *&Greater);		ConstantInt *&Greater);

/// Attempts to replace V with a simpler value based on the demanded		/// Attempts to replace V with a simpler value based on the demanded
/// bits.		/// bits.
Value *SimplifyDemandedUseBits(Value *V, APInt DemandedMask, KnownBits &Known,		Value *SimplifyDemandedUseBits(Value *V, APInt DemandedMask, KnownBits &Known,
unsigned Depth, Instruction *CxtI);		unsigned Depth, Instruction *CxtI);
bool SimplifyDemandedBits(Instruction *I, unsigned Op,		bool SimplifyDemandedBits(Instruction *I, unsigned Op,
const APInt &DemandedMask, KnownBits &Known,		const APInt &DemandedMask, KnownBits &Known,
unsigned Depth = 0);		unsigned Depth = 0) override;

/// Helper routine of SimplifyDemandedUseBits. It computes KnownZero/KnownOne		/// Helper routine of SimplifyDemandedUseBits. It computes KnownZero/KnownOne
/// bits. It also tries to handle simplifications that can be done based on		/// bits. It also tries to handle simplifications that can be done based on
/// DemandedMask, but without modifying the Instruction.		/// DemandedMask, but without modifying the Instruction.
Value *SimplifyMultipleUseDemandedBits(Instruction *I,		Value *SimplifyMultipleUseDemandedBits(Instruction *I,
const APInt &DemandedMask,		const APInt &DemandedMask,
KnownBits &Known,		KnownBits &Known,
unsigned Depth, Instruction *CxtI);		unsigned Depth, Instruction *CxtI);

/// Helper routine of SimplifyDemandedUseBits. It tries to simplify demanded		/// Helper routine of SimplifyDemandedUseBits. It tries to simplify demanded
/// bit for "r1 = shr x, c1; r2 = shl r1, c2" instruction sequence.		/// bit for "r1 = shr x, c1; r2 = shl r1, c2" instruction sequence.
Value *simplifyShrShlDemandedBits(		Value *simplifyShrShlDemandedBits(
Instruction *Shr, const APInt &ShrOp1, Instruction *Shl,		Instruction *Shr, const APInt &ShrOp1, Instruction *Shl,
const APInt &ShlOp1, const APInt &DemandedMask, KnownBits &Known);		const APInt &ShlOp1, const APInt &DemandedMask, KnownBits &Known);

/// Tries to simplify operands to an integer instruction based on its		/// Tries to simplify operands to an integer instruction based on its
/// demanded bits.		/// demanded bits.
bool SimplifyDemandedInstructionBits(Instruction &Inst);		bool SimplifyDemandedInstructionBits(Instruction &Inst);

Value *simplifyAMDGCNMemoryIntrinsicDemanded(IntrinsicInst *II,		virtual Value *
APInt DemandedElts,		SimplifyDemandedVectorElts(Value *V, APInt DemandedElts, APInt &UndefElts,
int DmaskIdx = -1);		unsigned Depth = 0,
		bool AllowMultipleUsers = false) override;
Value *SimplifyDemandedVectorElts(Value *V, APInt DemandedElts,
APInt &UndefElts, unsigned Depth = 0,
bool AllowMultipleUsers = false);

/// Canonicalize the position of binops relative to shufflevector.		/// Canonicalize the position of binops relative to shufflevector.
Instruction *foldVectorBinop(BinaryOperator &Inst);		Instruction *foldVectorBinop(BinaryOperator &Inst);
Instruction *foldVectorSelect(SelectInst &Sel);		Instruction *foldVectorSelect(SelectInst &Sel);

/// Given a binary operator, cast instruction, or select which has a PHI node		/// Given a binary operator, cast instruction, or select which has a PHI node
/// as operand #0, see if we can fold the instruction into the PHI (which is		/// as operand #0, see if we can fold the instruction into the PHI (which is
/// only possible if all operands to the PHI are constants).		/// only possible if all operands to the PHI are constants).
▲ Show 20 Lines • Show All 123 Lines • ▼ Show 20 Lines	private:
Value *EvaluateInDifferentType(Value *V, Type *Ty, bool isSigned);		Value *EvaluateInDifferentType(Value *V, Type *Ty, bool isSigned);

/// Returns a value X such that Val = X * Scale, or null if none.		/// Returns a value X such that Val = X * Scale, or null if none.
///		///
/// If the multiplication is known not to overflow then NoSignedWrap is set.		/// If the multiplication is known not to overflow then NoSignedWrap is set.
Value *Descale(Value *Val, APInt Scale, bool &NoSignedWrap);		Value *Descale(Value *Val, APInt Scale, bool &NoSignedWrap);
};		};
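
The `simplifyShrShlDemandedBits` comment above describes a "shift right, then shift left" pair. The idea can be sketched on plain 32-bit integers. This is only an illustration, not LLVM's implementation: the function names are hypothetical, it assumes both shifts are logical and use the same amount `C`, and the real helper likely covers more cases.

```cpp
#include <cstdint>

// Hypothetical model of "r1 = shr x, c; r2 = shl r1, c" on 32-bit values.
// Precondition: C < 32 (larger shift amounts are undefined in C++).
std::uint32_t shrThenShl(std::uint32_t X, unsigned C) {
  return (X >> C) << C;
}

// The pair only clears the low C bits of X. If none of those bits are
// demanded by the users, r2 and x are indistinguishable to them.
// Precondition: C < 32.
bool shiftPairIsRedundant(unsigned C, std::uint32_t DemandedMask) {
  std::uint32_t ClearedBits = (std::uint32_t{1} << C) - 1;
  return (DemandedMask & ClearedBits) == 0;
}
```

When `shiftPairIsRedundant` holds, `shrThenShl(X, C) & DemandedMask` equals `X & DemandedMask` for every `X`, which is the property a demanded-bits simplification would rely on.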

namespace {

// As a default, let's assume that we want to be aggressive,
// and attempt to traverse with no limits in attempt to sink negation.
static constexpr unsigned NegatorDefaultMaxDepth = ~0U;

// Let's guesstimate that most often we will end up visiting/producing
// fairly small number of new instructions.
static constexpr unsigned NegatorMaxNodesSSO = 16;

} // namespace

class Negator final {		class Negator final {
/// Top-to-bottom, def-to-use negated instruction tree we produced.		/// Top-to-bottom, def-to-use negated instruction tree we produced.
SmallVector<Instruction *, NegatorMaxNodesSSO> NewInstructions;		SmallVector<Instruction *, NegatorMaxNodesSSO> NewInstructions;

using BuilderTy = IRBuilder<TargetFolder, IRBuilderCallbackInserter>;		using BuilderTy = IRBuilder<TargetFolder, IRBuilderCallbackInserter>;
BuilderTy Builder;		BuilderTy Builder;

const DataLayout &DL;		const DataLayout &DL;
Show All 27 Lines	#endif
Negator(Negator &&) = delete;		Negator(Negator &&) = delete;
Negator &operator=(const Negator &) = delete;		Negator &operator=(const Negator &) = delete;
Negator &operator=(Negator &&) = delete;		Negator &operator=(Negator &&) = delete;

public:		public:
/// Attempt to negate \p Root. Retuns nullptr if negation can't be performed,		/// Attempt to negate \p Root. Retuns nullptr if negation can't be performed,
/// otherwise returns negated value.		/// otherwise returns negated value.
LLVM_NODISCARD static Value *Negate(bool LHSIsZero, Value *Root,		LLVM_NODISCARD static Value *Negate(bool LHSIsZero, Value *Root,
InstCombiner &IC);		InstCombinerImpl &IC);
};		};

} // end namespace llvm		} // end namespace llvm

#undef DEBUG_TYPE		#undef DEBUG_TYPE

#endif // LLVM_LIB_TRANSFORMS_INSTCOMBINE_INSTCOMBINEINTERNAL_H		#endif // LLVM_LIB_TRANSFORMS_INSTCOMBINE_INSTCOMBINEINTERNAL_H
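
The header changes above add `override` to several members and switch internal helpers to `InstCombinerImpl &`, which suggests a split between a public interface and a pass-internal implementation. A minimal sketch of that shape follows. All class and function names are hypothetical stand-ins, and nothing here is the actual LLVM API.

```cpp
// Hypothetical public interface that target-specific code is allowed to see.
class Combiner {
public:
  virtual ~Combiner() = default;

  // Hook implemented by the pass; targets call back into it.
  virtual bool simplifyDemandedBits(unsigned &Value, unsigned DemandedMask) = 0;

  // Static utility usable without an instance, in the spirit of the
  // InstCombiner::peekThroughBitcast calls in this diff.
  static unsigned lowBitsMask(unsigned NumBits) {
    return NumBits >= 32 ? ~0u : ((1u << NumBits) - 1);
  }
};

// Hypothetical pass-internal implementation of the interface.
class CombinerImpl final : public Combiner {
public:
  bool simplifyDemandedBits(unsigned &Value, unsigned DemandedMask) override {
    unsigned Masked = Value & DemandedMask;
    bool Changed = Masked != Value;
    Value = Masked;
    return Changed;
  }
};

// Hypothetical target hook: it depends only on the abstract interface.
bool targetCombine(Combiner &IC, unsigned &V) {
  return IC.simplifyDemandedBits(V, Combiner::lowBitsMask(8));
}
```

Code outside the pass depends only on `Combiner`, while file-local helpers inside the pass can take `CombinerImpl &` and reach everything else.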

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

Show All 17 Lines
#include "llvm/Analysis/Loads.h"		#include "llvm/Analysis/Loads.h"
#include "llvm/IR/ConstantRange.h"		#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DebugInfoMetadata.h"		#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/MDBuilder.h"		#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;		using namespace llvm;
using namespace PatternMatch;		using namespace PatternMatch;

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

STATISTIC(NumDeadStore, "Number of dead stores eliminated");		STATISTIC(NumDeadStore, "Number of dead stores eliminated");
▲ Show 20 Lines • Show All 127 Lines • ▼ Show 20 Lines	if (AI->isArrayAllocation())
return false;		return false;
uint64_t AllocaSize = DL.getTypeStoreSize(AI->getAllocatedType());		uint64_t AllocaSize = DL.getTypeStoreSize(AI->getAllocatedType());
if (!AllocaSize)		if (!AllocaSize)
return false;		return false;
return isDereferenceableAndAlignedPointer(V, Align(AI->getAlignment()),		return isDereferenceableAndAlignedPointer(V, Align(AI->getAlignment()),
APInt(64, AllocaSize), DL);		APInt(64, AllocaSize), DL);
}		}

static Instruction *simplifyAllocaArraySize(InstCombiner &IC, AllocaInst &AI) {		static Instruction *simplifyAllocaArraySize(InstCombinerImpl &IC,
		AllocaInst &AI) {
// Check for array size of 1 (scalar allocation).		// Check for array size of 1 (scalar allocation).
if (!AI.isArrayAllocation()) {		if (!AI.isArrayAllocation()) {
// i32 1 is the canonical array size for scalar allocations.		// i32 1 is the canonical array size for scalar allocations.
if (AI.getArraySize()->getType()->isIntegerTy(32))		if (AI.getArraySize()->getType()->isIntegerTy(32))
return nullptr;		return nullptr;

// Canonicalize it.		// Canonicalize it.
return IC.replaceOperand(AI, 0, IC.Builder.getInt32(1));		return IC.replaceOperand(AI, 0, IC.Builder.getInt32(1));
▲ Show 20 Lines • Show All 51 Lines • ▼ Show 20 Lines
//		//
// This class chases down uses of the old pointer until reaching the load		// This class chases down uses of the old pointer until reaching the load
// instructions, then replaces the old pointer in the load instructions with		// instructions, then replaces the old pointer in the load instructions with
// the new pointer. If during the chasing it sees bitcast or GEP, it will		// the new pointer. If during the chasing it sees bitcast or GEP, it will
// create new bitcast or GEP with the new pointer and use them in the load		// create new bitcast or GEP with the new pointer and use them in the load
// instruction.		// instruction.
class PointerReplacer {		class PointerReplacer {
public:		public:
PointerReplacer(InstCombiner &IC) : IC(IC) {}		PointerReplacer(InstCombinerImpl &IC) : IC(IC) {}
void replacePointer(Instruction &I, Value *V);		void replacePointer(Instruction &I, Value *V);

private:		private:
void findLoadAndReplace(Instruction &I);		void findLoadAndReplace(Instruction &I);
void replace(Instruction *I);		void replace(Instruction *I);
Value *getReplacement(Value *I);		Value *getReplacement(Value *I);

SmallVector<Instruction *, 4> Path;		SmallVector<Instruction *, 4> Path;
MapVector<Value *, Value *> WorkMap;		MapVector<Value *, Value *> WorkMap;
InstCombiner &IC;		InstCombinerImpl &IC;
};		};
} // end anonymous namespace		} // end anonymous namespace

void PointerReplacer::findLoadAndReplace(Instruction &I) {		void PointerReplacer::findLoadAndReplace(Instruction &I) {
for (auto U : I.users()) {		for (auto U : I.users()) {
auto *Inst = dyn_cast<Instruction>(&*U);		auto *Inst = dyn_cast<Instruction>(&*U);
if (!Inst)		if (!Inst)
return;		return;
▲ Show 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	#ifndef NDEBUG
auto *NT = cast<PointerType>(V->getType());		auto *NT = cast<PointerType>(V->getType());
assert(PT != NT && PT->getElementType() == NT->getElementType() &&		assert(PT != NT && PT->getElementType() == NT->getElementType() &&
"Invalid usage");		"Invalid usage");
#endif		#endif
WorkMap[&I] = V;		WorkMap[&I] = V;
findLoadAndReplace(I);		findLoadAndReplace(I);
}		}

Instruction *InstCombiner::visitAllocaInst(AllocaInst &AI) {		Instruction *InstCombinerImpl::visitAllocaInst(AllocaInst &AI) {
if (auto *I = simplifyAllocaArraySize(*this, AI))		if (auto *I = simplifyAllocaArraySize(*this, AI))
return I;		return I;

if (AI.getAllocatedType()->isSized()) {		if (AI.getAllocatedType()->isSized()) {
// Move all alloca's of zero byte objects to the entry block and merge them		// Move all alloca's of zero byte objects to the entry block and merge them
// together. Note that we only do this for alloca's, because malloc should		// together. Note that we only do this for alloca's, because malloc should
// allocate and return a unique pointer, even for a zero byte allocation.		// allocate and return a unique pointer, even for a zero byte allocation.
if (DL.getTypeAllocSize(AI.getAllocatedType()).getKnownMinSize() == 0) {		if (DL.getTypeAllocSize(AI.getAllocatedType()).getKnownMinSize() == 0) {
▲ Show 20 Lines • Show All 81 Lines • ▼ Show 20 Lines
/// Helper to combine a load to a new type.		/// Helper to combine a load to a new type.
///		///
/// This just does the work of combining a load to a new type. It handles		/// This just does the work of combining a load to a new type. It handles
/// metadata, etc., and returns the new instruction. The \c NewTy should be the		/// metadata, etc., and returns the new instruction. The \c NewTy should be the
/// loaded value type. This will convert it to a pointer, cast the operand to		/// loaded value type. This will convert it to a pointer, cast the operand to
/// that pointer type, load it, etc.		/// that pointer type, load it, etc.
///		///
/// Note that this will create all of the instructions with whatever insert		/// Note that this will create all of the instructions with whatever insert
/// point the \c InstCombiner currently is using.		/// point the \c InstCombinerImpl currently is using.
LoadInst *InstCombiner::combineLoadToNewType(LoadInst &LI, Type *NewTy,		LoadInst *InstCombinerImpl::combineLoadToNewType(LoadInst &LI, Type *NewTy,
const Twine &Suffix) {		const Twine &Suffix) {
assert((!LI.isAtomic() || isSupportedAtomicType(NewTy)) &&		assert((!LI.isAtomic() || isSupportedAtomicType(NewTy)) &&
"can't fold an atomic load to requested type");		"can't fold an atomic load to requested type");

Value *Ptr = LI.getPointerOperand();		Value *Ptr = LI.getPointerOperand();
unsigned AS = LI.getPointerAddressSpace();		unsigned AS = LI.getPointerAddressSpace();
Value *NewPtr = nullptr;		Value *NewPtr = nullptr;
if (!(match(Ptr, m_BitCast(m_Value(NewPtr))) &&		if (!(match(Ptr, m_BitCast(m_Value(NewPtr))) &&
NewPtr->getType()->getPointerElementType() == NewTy &&		NewPtr->getType()->getPointerElementType() == NewTy &&
NewPtr->getType()->getPointerAddressSpace() == AS))		NewPtr->getType()->getPointerAddressSpace() == AS))
NewPtr = Builder.CreateBitCast(Ptr, NewTy->getPointerTo(AS));		NewPtr = Builder.CreateBitCast(Ptr, NewTy->getPointerTo(AS));

LoadInst *NewLoad = Builder.CreateAlignedLoad(		LoadInst *NewLoad = Builder.CreateAlignedLoad(
NewTy, NewPtr, LI.getAlign(), LI.isVolatile(), LI.getName() + Suffix);		NewTy, NewPtr, LI.getAlign(), LI.isVolatile(), LI.getName() + Suffix);
NewLoad->setAtomic(LI.getOrdering(), LI.getSyncScopeID());		NewLoad->setAtomic(LI.getOrdering(), LI.getSyncScopeID());
copyMetadataForLoad(*NewLoad, LI);		copyMetadataForLoad(*NewLoad, LI);
return NewLoad;		return NewLoad;
}		}

/// Combine a store to a new type.		/// Combine a store to a new type.
///		///
/// Returns the newly created store instruction.		/// Returns the newly created store instruction.
static StoreInst *combineStoreToNewValue(InstCombiner &IC, StoreInst &SI, Value *V) {		static StoreInst *combineStoreToNewValue(InstCombinerImpl &IC, StoreInst &SI,
		Value *V) {
assert((!SI.isAtomic() || isSupportedAtomicType(V->getType())) &&		assert((!SI.isAtomic() || isSupportedAtomicType(V->getType())) &&
"can't fold an atomic store of requested type");		"can't fold an atomic store of requested type");

Value *Ptr = SI.getPointerOperand();		Value *Ptr = SI.getPointerOperand();
unsigned AS = SI.getPointerAddressSpace();		unsigned AS = SI.getPointerAddressSpace();
SmallVector<std::pair<unsigned, MDNode *>, 8> MD;		SmallVector<std::pair<unsigned, MDNode *>, 8> MD;
SI.getAllMetadata(MD);		SI.getAllMetadata(MD);

Show All 40 Lines	static StoreInst *combineStoreToNewValue(InstCombinerImpl &IC, StoreInst &SI,
return NewStore;		return NewStore;
}		}

/// Returns true if instruction represent minmax pattern like:		/// Returns true if instruction represent minmax pattern like:
/// select ((cmp load V1, load V2), V1, V2).		/// select ((cmp load V1, load V2), V1, V2).
static bool isMinMaxWithLoads(Value *V, Type *&LoadTy) {		static bool isMinMaxWithLoads(Value *V, Type *&LoadTy) {
assert(V->getType()->isPointerTy() && "Expected pointer type.");		assert(V->getType()->isPointerTy() && "Expected pointer type.");
// Ignore possible ty* to ixx* bitcast.		// Ignore possible ty* to ixx* bitcast.
V = peekThroughBitcast(V);		V = InstCombiner::peekThroughBitcast(V);
// Check that select is select ((cmp load V1, load V2), V1, V2) - minmax		// Check that select is select ((cmp load V1, load V2), V1, V2) - minmax
// pattern.		// pattern.
CmpInst::Predicate Pred;		CmpInst::Predicate Pred;
Instruction *L1;		Instruction *L1;
Instruction *L2;		Instruction *L2;
Value *LHS;		Value *LHS;
Value *RHS;		Value *RHS;
if (!match(V, m_Select(m_Cmp(Pred, m_Instruction(L1), m_Instruction(L2)),		if (!match(V, m_Select(m_Cmp(Pred, m_Instruction(L1), m_Instruction(L2)),
Show All 18 Lines
/// loads as that would introduce a semantic change. This combine is expected to		/// loads as that would introduce a semantic change. This combine is expected to
/// be a semantic no-op which just allows loads to more closely model the types		/// be a semantic no-op which just allows loads to more closely model the types
/// of their consuming operations.		/// of their consuming operations.
///		///
/// Currently, we also refuse to change the precise type used for an atomic load		/// Currently, we also refuse to change the precise type used for an atomic load
/// or a volatile load. This is debatable, and might be reasonable to change		/// or a volatile load. This is debatable, and might be reasonable to change
/// later. However, it is risky in case some backend or other part of LLVM is		/// later. However, it is risky in case some backend or other part of LLVM is
/// relying on the exact type loaded to select appropriate atomic operations.		/// relying on the exact type loaded to select appropriate atomic operations.
static Instruction *combineLoadToOperationType(InstCombiner &IC, LoadInst &LI) {		static Instruction *combineLoadToOperationType(InstCombinerImpl &IC,
		LoadInst &LI) {
// FIXME: We could probably with some care handle both volatile and ordered		// FIXME: We could probably with some care handle both volatile and ordered
// atomic loads here but it isn't clear that this is important.		// atomic loads here but it isn't clear that this is important.
if (!LI.isUnordered())		if (!LI.isUnordered())
return nullptr;		return nullptr;

if (LI.use_empty())		if (LI.use_empty())
return nullptr;		return nullptr;

Show All 9 Lines	static Instruction *combineLoadToOperationType(InstCombinerImpl &IC,
// is sized and has a size exactly the same as its store size and the store		// is sized and has a size exactly the same as its store size and the store
// size is a legal integer type.		// size is a legal integer type.
// Do not perform canonicalization if minmax pattern is found (to avoid		// Do not perform canonicalization if minmax pattern is found (to avoid
// infinite loop).		// infinite loop).
Type *Dummy;		Type *Dummy;
if (!Ty->isIntegerTy() && Ty->isSized() && !isa<ScalableVectorType>(Ty) &&		if (!Ty->isIntegerTy() && Ty->isSized() && !isa<ScalableVectorType>(Ty) &&
DL.isLegalInteger(DL.getTypeStoreSizeInBits(Ty)) &&		DL.isLegalInteger(DL.getTypeStoreSizeInBits(Ty)) &&
DL.typeSizeEqualsStoreSize(Ty) && !DL.isNonIntegralPointerType(Ty) &&		DL.typeSizeEqualsStoreSize(Ty) && !DL.isNonIntegralPointerType(Ty) &&
!isMinMaxWithLoads(		!isMinMaxWithLoads(InstCombiner::peekThroughBitcast(
peekThroughBitcast(LI.getPointerOperand(), /*OneUseOnly=*/true),		LI.getPointerOperand(), /*OneUseOnly=*/true),
Dummy)) {		Dummy)) {
if (all_of(LI.users(), [&LI](User *U) {		if (all_of(LI.users(), [&LI](User *U) {
auto *SI = dyn_cast<StoreInst>(U);		auto *SI = dyn_cast<StoreInst>(U);
return SI && SI->getPointerOperand() != &LI &&		return SI && SI->getPointerOperand() != &LI &&
!SI->getPointerOperand()->isSwiftError();		!SI->getPointerOperand()->isSwiftError();
})) {		})) {
LoadInst *NewLoad = IC.combineLoadToNewType(		LoadInst *NewLoad = IC.combineLoadToNewType(
LI, Type::getIntNTy(LI.getContext(), DL.getTypeStoreSizeInBits(Ty)));		LI, Type::getIntNTy(LI.getContext(), DL.getTypeStoreSizeInBits(Ty)));
// Replace all the stores with stores of the newly loaded value.		// Replace all the stores with stores of the newly loaded value.
Show All 23 Lines	if (auto* CI = dyn_cast<CastInst>(LI.user_back()))
return &LI;		return &LI;
}		}

// FIXME: We should also canonicalize loads of vectors when their elements are		// FIXME: We should also canonicalize loads of vectors when their elements are
// cast to other types.		// cast to other types.
return nullptr;		return nullptr;
}		}

static Instruction *unpackLoadToAggregate(InstCombiner &IC, LoadInst &LI) {		static Instruction *unpackLoadToAggregate(InstCombinerImpl &IC, LoadInst &LI) {
// FIXME: We could probably with some care handle both volatile and atomic		// FIXME: We could probably with some care handle both volatile and atomic
// stores here but it isn't clear that this is important.		// stores here but it isn't clear that this is important.
if (!LI.isSimple())		if (!LI.isSimple())
return nullptr;		return nullptr;

Type *T = LI.getType();		Type *T = LI.getType();
if (!T->isAggregateType())		if (!T->isAggregateType())
return nullptr;		return nullptr;
▲ Show 20 Lines • Show All 182 Lines • ▼ Show 20 Lines
// %arrayidx = getelementptr inbounds [1 x i32]* @f.a, i64 0, i64 %x		// %arrayidx = getelementptr inbounds [1 x i32]* @f.a, i64 0, i64 %x
// ... = load i32* %arrayidx, align 4		// ... = load i32* %arrayidx, align 4
// Then we know that we can replace %x in the GEP with i64 0.		// Then we know that we can replace %x in the GEP with i64 0.
//		//
// FIXME: We could fold any GEP index to zero that would cause UB if it were		// FIXME: We could fold any GEP index to zero that would cause UB if it were
// not zero. Currently, we only handle the first such index. Also, we could		// not zero. Currently, we only handle the first such index. Also, we could
// also search through non-zero constant indices if we kept track of the		// also search through non-zero constant indices if we kept track of the
// offsets those indices implied.		// offsets those indices implied.
static bool canReplaceGEPIdxWithZero(InstCombiner &IC, GetElementPtrInst *GEPI,		static bool canReplaceGEPIdxWithZero(InstCombinerImpl &IC,
Instruction *MemI, unsigned &Idx) {		GetElementPtrInst *GEPI, Instruction *MemI,
		unsigned &Idx) {
if (GEPI->getNumOperands() < 2)		if (GEPI->getNumOperands() < 2)
return false;		return false;

// Find the first non-zero index of a GEP. If all indices are zero, return		// Find the first non-zero index of a GEP. If all indices are zero, return
// one past the last index.		// one past the last index.
auto FirstNZIdx = [](const GetElementPtrInst *GEPI) {		auto FirstNZIdx = [](const GetElementPtrInst *GEPI) {
unsigned I = 1;		unsigned I = 1;
for (unsigned IE = GEPI->getNumOperands(); I != IE; ++I) {		for (unsigned IE = GEPI->getNumOperands(); I != IE; ++I) {
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	static bool canReplaceGEPIdxWithZero(InstCombinerImpl &IC,
return isObjectSizeLessThanOrEq(GEPI->getOperand(0), TyAllocSize, DL) &&		return isObjectSizeLessThanOrEq(GEPI->getOperand(0), TyAllocSize, DL) &&
IsAllNonNegative();		IsAllNonNegative();
}		}

// If we're indexing into an object with a variable index for the memory		// If we're indexing into an object with a variable index for the memory
// access, but the object has only one element, we can assume that the index		// access, but the object has only one element, we can assume that the index
// will always be zero. If we replace the GEP, return it.		// will always be zero. If we replace the GEP, return it.
template <typename T>		template <typename T>
static Instruction *replaceGEPIdxWithZero(InstCombiner &IC, Value *Ptr,		static Instruction *replaceGEPIdxWithZero(InstCombinerImpl &IC, Value *Ptr,
T &MemI) {		T &MemI) {
if (GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(Ptr)) {		if (GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(Ptr)) {
unsigned Idx;		unsigned Idx;
if (canReplaceGEPIdxWithZero(IC, GEPI, &MemI, Idx)) {		if (canReplaceGEPIdxWithZero(IC, GEPI, &MemI, Idx)) {
Instruction *NewGEPI = GEPI->clone();		Instruction *NewGEPI = GEPI->clone();
NewGEPI->setOperand(Idx,		NewGEPI->setOperand(Idx,
ConstantInt::get(GEPI->getOperand(Idx)->getType(), 0));		ConstantInt::get(GEPI->getOperand(Idx)->getType(), 0));
NewGEPI->insertBefore(GEPI);		NewGEPI->insertBefore(GEPI);
Show All 25 Lines	static bool canSimplifyNullLoadOrGEP(LoadInst &LI, Value *Op) {
}		}
if (isa<UndefValue>(Op) ||		if (isa<UndefValue>(Op) ||
(isa<ConstantPointerNull>(Op) &&		(isa<ConstantPointerNull>(Op) &&
!NullPointerIsDefined(LI.getFunction(), LI.getPointerAddressSpace())))		!NullPointerIsDefined(LI.getFunction(), LI.getPointerAddressSpace())))
return true;		return true;
return false;		return false;
}		}

Instruction *InstCombiner::visitLoadInst(LoadInst &LI) {		Instruction *InstCombinerImpl::visitLoadInst(LoadInst &LI) {
Value *Op = LI.getOperand(0);		Value *Op = LI.getOperand(0);

// Try to canonicalize the loaded type.		// Try to canonicalize the loaded type.
if (Instruction *Res = combineLoadToOperationType(*this, LI))		if (Instruction *Res = combineLoadToOperationType(*this, LI))
return Res;		return Res;

// Attempt to improve the alignment.		// Attempt to improve the alignment.
Align KnownAlign = getOrEnforceKnownAlignment(		Align KnownAlign = getOrEnforceKnownAlignment(
▲ Show 20 Lines • Show All 100 Lines • ▼ Show 20 Lines
/// %E0 = extractelement <2 x double> %U, i32 0		/// %E0 = extractelement <2 x double> %U, i32 0
/// %V0 = insertvalue [2 x double] undef, double %E0, 0		/// %V0 = insertvalue [2 x double] undef, double %E0, 0
/// %E1 = extractelement <2 x double> %U, i32 1		/// %E1 = extractelement <2 x double> %U, i32 1
/// %V1 = insertvalue [2 x double] %V0, double %E1, 1		/// %V1 = insertvalue [2 x double] %V0, double %E1, 1
///		///
/// and the layout of a <2 x double> is isomorphic to a [2 x double],		/// and the layout of a <2 x double> is isomorphic to a [2 x double],
/// then %V1 can be safely approximated by a conceptual "bitcast" of %U.		/// then %V1 can be safely approximated by a conceptual "bitcast" of %U.
/// Note that %U may contain non-undef values where %V1 has undef.		/// Note that %U may contain non-undef values where %V1 has undef.
static Value *likeBitCastFromVector(InstCombiner &IC, Value *V) {		static Value *likeBitCastFromVector(InstCombinerImpl &IC, Value *V) {
Value *U = nullptr;		Value *U = nullptr;
while (auto *IV = dyn_cast<InsertValueInst>(V)) {		while (auto *IV = dyn_cast<InsertValueInst>(V)) {
auto *E = dyn_cast<ExtractElementInst>(IV->getInsertedValueOperand());		auto *E = dyn_cast<ExtractElementInst>(IV->getInsertedValueOperand());
if (!E)		if (!E)
return nullptr;		return nullptr;
auto *W = E->getVectorOperand();		auto *W = E->getVectorOperand();
if (!U)		if (!U)
U = W;		U = W;
▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines
/// volatile store. This is debatable, and might be reasonable to change later.		/// volatile store. This is debatable, and might be reasonable to change later.
/// However, it is risky in case some backend or other part of LLVM is relying		/// However, it is risky in case some backend or other part of LLVM is relying
/// on the exact type stored to select appropriate atomic operations.		/// on the exact type stored to select appropriate atomic operations.
///		///
/// \returns true if the store was successfully combined away. This indicates		/// \returns true if the store was successfully combined away. This indicates
/// the caller must erase the store instruction. We have to let the caller erase		/// the caller must erase the store instruction. We have to let the caller erase
/// the store instruction as otherwise there is no way to signal whether it was		/// the store instruction as otherwise there is no way to signal whether it was
/// combined or not: IC.EraseInstFromFunction returns a null pointer.		/// combined or not: IC.EraseInstFromFunction returns a null pointer.
static bool combineStoreToValueType(InstCombiner &IC, StoreInst &SI) {		static bool combineStoreToValueType(InstCombinerImpl &IC, StoreInst &SI) {
// FIXME: We could probably with some care handle both volatile and ordered		// FIXME: We could probably with some care handle both volatile and ordered
// atomic stores here but it isn't clear that this is important.		// atomic stores here but it isn't clear that this is important.
if (!SI.isUnordered())		if (!SI.isUnordered())
return false;		return false;

// swifterror values can't be bitcasted.		// swifterror values can't be bitcasted.
if (SI.getPointerOperand()->isSwiftError())		if (SI.getPointerOperand()->isSwiftError())
return false;		return false;
Show All 15 Lines	if (!SI.isAtomic() || isSupportedAtomicType(U->getType())) {
return true;		return true;
}		}

// FIXME: We should also canonicalize stores of vectors when their elements		// FIXME: We should also canonicalize stores of vectors when their elements
// are cast to other types.		// are cast to other types.
return false;		return false;
}		}

static bool unpackStoreToAggregate(InstCombiner &IC, StoreInst &SI) {		static bool unpackStoreToAggregate(InstCombinerImpl &IC, StoreInst &SI) {
// FIXME: We could probably with some care handle both volatile and atomic		// FIXME: We could probably with some care handle both volatile and atomic
// stores here but it isn't clear that this is important.		// stores here but it isn't clear that this is important.
if (!SI.isSimple())		if (!SI.isSimple())
return false;		return false;

Value *V = SI.getValueOperand();		Value *V = SI.getValueOperand();
Type *T = V->getType();		Type *T = V->getType();

▲ Show 20 Lines • Show All 123 Lines • ▼ Show 20 Lines	static bool equivalentAddressValues(Value *A, Value *B) {

// Otherwise they may not be equivalent.		// Otherwise they may not be equivalent.
return false;		return false;
}		}

/// Converts store (bitcast (load (bitcast (select ...)))) to		/// Converts store (bitcast (load (bitcast (select ...)))) to
/// store (load (select ...)), where select is minmax:		/// store (load (select ...)), where select is minmax:
/// select ((cmp load V1, load V2), V1, V2).		/// select ((cmp load V1, load V2), V1, V2).
static bool removeBitcastsFromLoadStoreOnMinMax(InstCombiner &IC,		static bool removeBitcastsFromLoadStoreOnMinMax(InstCombinerImpl &IC,
StoreInst &SI) {		StoreInst &SI) {
// bitcast?		// bitcast?
if (!match(SI.getPointerOperand(), m_BitCast(m_Value())))		if (!match(SI.getPointerOperand(), m_BitCast(m_Value())))
return false;		return false;
// load? integer?		// load? integer?
Value *LoadAddr;		Value *LoadAddr;
if (!match(SI.getValueOperand(), m_Load(m_BitCast(m_Value(LoadAddr)))))		if (!match(SI.getValueOperand(), m_Load(m_BitCast(m_Value(LoadAddr)))))
return false;		return false;
Show All 13 Lines	static bool removeBitcastsFromLoadStoreOnMinMax(InstCombinerImpl &IC,
const auto &DL = IC.getDataLayout();		const auto &DL = IC.getDataLayout();
if (DL.getTypeStoreSizeInBits(LI->getType()) !=		if (DL.getTypeStoreSizeInBits(LI->getType()) !=
DL.getTypeStoreSizeInBits(CmpLoadTy))		DL.getTypeStoreSizeInBits(CmpLoadTy))
return false;		return false;

if (!all_of(LI->users(), [LI, LoadAddr](User *U) {		if (!all_of(LI->users(), [LI, LoadAddr](User *U) {
auto *SI = dyn_cast<StoreInst>(U);		auto *SI = dyn_cast<StoreInst>(U);
return SI && SI->getPointerOperand() != LI &&		return SI && SI->getPointerOperand() != LI &&
peekThroughBitcast(SI->getPointerOperand()) != LoadAddr &&		InstCombiner::peekThroughBitcast(SI->getPointerOperand()) !=
		LoadAddr &&
!SI->getPointerOperand()->isSwiftError();		!SI->getPointerOperand()->isSwiftError();
}))		}))
return false;		return false;

IC.Builder.SetInsertPoint(LI);		IC.Builder.SetInsertPoint(LI);
LoadInst *NewLI = IC.combineLoadToNewType(*LI, CmpLoadTy);		LoadInst *NewLI = IC.combineLoadToNewType(*LI, CmpLoadTy);
// Replace all the stores with stores of the newly loaded value.		// Replace all the stores with stores of the newly loaded value.
for (auto *UI : LI->users()) {		for (auto *UI : LI->users()) {
auto *USI = cast<StoreInst>(UI);		auto *USI = cast<StoreInst>(UI);
IC.Builder.SetInsertPoint(USI);		IC.Builder.SetInsertPoint(USI);
combineStoreToNewValue(IC, *USI, NewLI);		combineStoreToNewValue(IC, *USI, NewLI);
}		}
IC.replaceInstUsesWith(*LI, UndefValue::get(LI->getType()));		IC.replaceInstUsesWith(*LI, UndefValue::get(LI->getType()));
IC.eraseInstFromFunction(*LI);		IC.eraseInstFromFunction(*LI);
return true;		return true;
}		}
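
The function above strips an integer round trip around a min/max select of two pointers. A rough source-level analogy, under the assumption that `float` is 32 bits wide, is shown below. The function names are hypothetical and the sketch only models the pattern; it does not use any LLVM API.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// select ((cmp load V1, load V2), V1, V2): pick the pointer to the smaller value.
const float *selectMinPtr(const float *V1, const float *V2) {
  return (*V1 < *V2) ? V1 : V2;
}

// Before: the selected value is copied through its integer bit pattern,
// which corresponds to the bitcasts around the load and the store.
void copyMinViaBits(const float *V1, const float *V2, float *Out) {
  std::uint32_t Bits;
  std::memcpy(&Bits, selectMinPtr(V1, V2), sizeof(Bits));
  std::memcpy(Out, &Bits, sizeof(Bits));
}

// After: load and store use the compared type directly, no casts needed.
void copyMinDirect(const float *V1, const float *V2, float *Out) {
  *Out = *selectMinPtr(V1, V2);
}
```

Both functions leave the same value in `Out` for ordinary inputs, which is why dropping the casts is expected to be a semantic no-op.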

Instruction *InstCombiner::visitStoreInst(StoreInst &SI) {		Instruction *InstCombinerImpl::visitStoreInst(StoreInst &SI) {
Value *Val = SI.getOperand(0);		Value *Val = SI.getOperand(0);
Value *Ptr = SI.getOperand(1);		Value *Ptr = SI.getOperand(1);

// Try to canonicalize the stored type.		// Try to canonicalize the stored type.
if (combineStoreToValueType(*this, SI))		if (combineStoreToValueType(*this, SI))
return eraseInstFromFunction(SI);		return eraseInstFromFunction(SI);

// Attempt to improve the alignment.		// Attempt to improve the alignment.
▲ Show 20 Lines • Show All 115 Lines • ▼ Show 20 Lines	Instruction *InstCombinerImpl::visitStoreInst(StoreInst &SI) {
return nullptr;		return nullptr;
}		}

/// Try to transform:		/// Try to transform:
/// if () { P = v1; } else { P = v2 }		/// if () { P = v1; } else { P = v2 }
/// or:		/// or:
/// P = v1; if () { P = v2; }		/// P = v1; if () { P = v2; }
/// into a phi node with a store in the successor.		/// into a phi node with a store in the successor.
bool InstCombiner::mergeStoreIntoSuccessor(StoreInst &SI) {		bool InstCombinerImpl::mergeStoreIntoSuccessor(StoreInst &SI) {
assert(SI.isUnordered() &&		assert(SI.isUnordered() &&
"This code has not been audited for volatile or ordered store case.");		"This code has not been audited for volatile or ordered store case.");

// Check if the successor block has exactly 2 incoming edges.		// Check if the successor block has exactly 2 incoming edges.
BasicBlock *StoreBB = SI.getParent();		BasicBlock *StoreBB = SI.getParent();
BasicBlock *DestBB = StoreBB->getTerminator()->getSuccessor(0);		BasicBlock *DestBB = StoreBB->getTerminator()->getSuccessor(0);
if (!DestBB->hasNPredecessors(2))		if (!DestBB->hasNPredecessors(2))
return false;		return false;
▲ Show 20 Lines • Show All 104 Lines • Show Last 20 Lines
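
The doc comment on `mergeStoreIntoSuccessor` describes turning per-branch stores into a single store of a merged value. A source-level sketch of the three shapes involved is below. The function names are hypothetical, and the conditional expression merely stands in for the phi node that the pass would create.

```cpp
// Shape 1: if () { P = v1; } else { P = v2; }
void storeInBranches(bool Cond, int V1, int V2, int &P) {
  if (Cond)
    P = V1;
  else
    P = V2;
}

// Shape 2: P = v1; if () { P = v2; }
void storeThenOverwrite(bool Cond, int V1, int V2, int &P) {
  P = V1;
  if (Cond)
    P = V2;
}

// Result: one store in the successor. IfTrue/IfFalse are the values that
// arrive along each incoming edge; Merged plays the role of the phi node.
void storeInSuccessor(bool Cond, int IfTrue, int IfFalse, int &P) {
  int Merged = Cond ? IfTrue : IfFalse;
  P = Merged;
}
```

Shape 1 corresponds to `storeInSuccessor(Cond, V1, V2, P)`, and shape 2 to `storeInSuccessor(Cond, V2, V1, P)`.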

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp

Show All 26 Lines
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
#include "llvm/Transforms/InstCombine/InstCombineWorklist.h"		#include "llvm/Transforms/InstCombine/InstCombineWorklist.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"
#include "llvm/Transforms/Utils/BuildLibCalls.h"		#include "llvm/Transforms/Utils/BuildLibCalls.h"
#include <cassert>		#include <cassert>
#include <cstddef>		#include <cstddef>
#include <cstdint>		#include <cstdint>
#include <utility>		#include <utility>

using namespace llvm;		using namespace llvm;
using namespace PatternMatch;		using namespace PatternMatch;

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

/// The specific integer value is used in a context where it is known to be		/// The specific integer value is used in a context where it is known to be
/// non-zero. If this allows us to simplify the computation, do so and return		/// non-zero. If this allows us to simplify the computation, do so and return
/// the new operand, otherwise return null.		/// the new operand, otherwise return null.
static Value *simplifyValueKnownNonZero(Value *V, InstCombiner &IC,		static Value *simplifyValueKnownNonZero(Value *V, InstCombinerImpl &IC,
Instruction &CxtI) {		Instruction &CxtI) {
// If V has multiple uses, then we would have to do more analysis to determine		// If V has multiple uses, then we would have to do more analysis to determine
// if this is safe. For example, the use could be in dynamically unreached		// if this is safe. For example, the use could be in dynamically unreached
// code.		// code.
if (!V->hasOneUse()) return nullptr;		if (!V->hasOneUse()) return nullptr;

bool MadeChange = false;		bool MadeChange = false;

▲ Show 20 Lines • Show All 108 Lines • ▼ Show 20 Lines	if (match(&I, m_c_FMul(m_OneUse(m_Select(m_Value(Cond), m_SpecificFP(-1.0),
IRBuilder<>::FastMathFlagGuard FMFGuard(Builder);		IRBuilder<>::FastMathFlagGuard FMFGuard(Builder);
Builder.setFastMathFlags(I.getFastMathFlags());		Builder.setFastMathFlags(I.getFastMathFlags());
return Builder.CreateSelect(Cond, Builder.CreateFNeg(OtherOp), OtherOp);		return Builder.CreateSelect(Cond, Builder.CreateFNeg(OtherOp), OtherOp);
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitMul(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitMul(BinaryOperator &I) {
if (Value *V = SimplifyMulInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifyMulInst(I.getOperand(0), I.getOperand(1),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (SimplifyAssociativeOrCommutative(I))		if (SimplifyAssociativeOrCommutative(I))
return &I;		return &I;

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
▲ Show 20 Lines • Show All 235 Lines • ▼ Show 20 Lines	static Instruction *foldFPSignBitOps(BinaryOperator &I) {
// fabs(X) * fabs(X) -> X * X		// fabs(X) * fabs(X) -> X * X
// fabs(X) / fabs(X) -> X / X		// fabs(X) / fabs(X) -> X / X
if (Op0 == Op1 && match(Op0, m_Intrinsic<Intrinsic::fabs>(m_Value(X))))		if (Op0 == Op1 && match(Op0, m_Intrinsic<Intrinsic::fabs>(m_Value(X))))
return BinaryOperator::CreateWithCopiedFlags(Opcode, X, X, &I);		return BinaryOperator::CreateWithCopiedFlags(Opcode, X, X, &I);

return nullptr;		return nullptr;
}		}
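The `fabs(X) * fabs(X) -> X * X` comment above can be checked numerically outside of LLVM. The following is a minimal standalone sketch, not code from this patch; `mulOfFabs`, `mulDirect`, `sameResult` and `checkFold` are hypothetical names introduced only for illustration. It relies on the sign cancelling when a value is multiplied by itself.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the expression before the fold, fabs(x) * fabs(x).
double mulOfFabs(double x) { return std::fabs(x) * std::fabs(x); }

// Hypothetical helper: the expression after the fold, x * x.
double mulDirect(double x) { return x * x; }

// Compares two results, treating any NaN as equal to any other NaN.
bool sameResult(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::isnan(a) && std::isnan(b);
  return a == b;
}

// Asserts that both forms agree for one input.
void checkFold(double x) { assert(sameResult(mulOfFabs(x), mulDirect(x))); }
```

Calling `checkFold` on negative values, zeros and NaN exercises the cases where `fabs` actually changes its operand.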

Instruction *InstCombiner::visitFMul(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitFMul(BinaryOperator &I) {
if (Value *V = SimplifyFMulInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifyFMulInst(I.getOperand(0), I.getOperand(1),
I.getFastMathFlags(),		I.getFastMathFlags(),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (SimplifyAssociativeOrCommutative(I))		if (SimplifyAssociativeOrCommutative(I))
return &I;		return &I;

▲ Show 20 Lines • Show All 163 Lines • ▼ Show 20 Lines	Instruction *InstCombinerImpl::visitFMul(BinaryOperator &I) {
}		}

return nullptr;		return nullptr;
}		}

/// Fold a divide or remainder with a select instruction divisor when one of the		/// Fold a divide or remainder with a select instruction divisor when one of the
/// select operands is zero. In that case, we can use the other select operand		/// select operands is zero. In that case, we can use the other select operand
/// because div/rem by zero is undefined.		/// because div/rem by zero is undefined.
bool InstCombiner::simplifyDivRemOfSelectWithZeroOp(BinaryOperator &I) {		bool InstCombinerImpl::simplifyDivRemOfSelectWithZeroOp(BinaryOperator &I) {
SelectInst *SI = dyn_cast<SelectInst>(I.getOperand(1));		SelectInst *SI = dyn_cast<SelectInst>(I.getOperand(1));
if (!SI)		if (!SI)
return false;		return false;

int NonNullOperand;		int NonNullOperand;
if (match(SI->getTrueValue(), m_Zero()))		if (match(SI->getTrueValue(), m_Zero()))
// div/rem X, (Cond ? 0 : Y) -> div/rem X, Y		// div/rem X, (Cond ? 0 : Y) -> div/rem X, Y
NonNullOperand = 2;		NonNullOperand = 2;
▲ Show 20 Lines • Show All 84 Lines • ▼ Show 20 Lines	static bool isMultiple(const APInt &C1, const APInt &C2, APInt &Quotient,

return Remainder.isMinValue();		return Remainder.isMinValue();
}		}

/// This function implements the transforms common to both integer division		/// This function implements the transforms common to both integer division
/// instructions (udiv and sdiv). It is called by the visitors to those integer		/// instructions (udiv and sdiv). It is called by the visitors to those integer
/// division instructions.		/// division instructions.
/// Common integer divide transforms		/// Common integer divide transforms
Instruction *InstCombiner::commonIDivTransforms(BinaryOperator &I) {		Instruction *InstCombinerImpl::commonIDivTransforms(BinaryOperator &I) {
Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);		Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
bool IsSigned = I.getOpcode() == Instruction::SDiv;		bool IsSigned = I.getOpcode() == Instruction::SDiv;
Type *Ty = I.getType();		Type *Ty = I.getType();

// The RHS is known non-zero.		// The RHS is known non-zero.
if (Value *V = simplifyValueKnownNonZero(I.getOperand(1), *this, I))		if (Value *V = simplifyValueKnownNonZero(I.getOperand(1), *this, I))
return replaceOperand(I, 1, V);		return replaceOperand(I, 1, V);

▲ Show 20 Lines • Show All 119 Lines • ▼ Show 20 Lines
}		}

static const unsigned MaxDepth = 6;		static const unsigned MaxDepth = 6;

namespace {		namespace {

using FoldUDivOperandCb = Instruction *(*)(Value *Op0, Value *Op1,		using FoldUDivOperandCb = Instruction *(*)(Value *Op0, Value *Op1,
const BinaryOperator &I,		const BinaryOperator &I,
InstCombiner &IC);		InstCombinerImpl &IC);

/// Used to maintain state for visitUDivOperand().		/// Used to maintain state for visitUDivOperand().
struct UDivFoldAction {		struct UDivFoldAction {
/// Informs visitUDiv() how to fold this operand. This can be zero if this		/// Informs visitUDiv() how to fold this operand. This can be zero if this
/// action joins two actions together.		/// action joins two actions together.
FoldUDivOperandCb FoldAction;		FoldUDivOperandCb FoldAction;

/// Which operand to fold.		/// Which operand to fold.
Show All 12 Lines	struct UDivFoldAction {
UDivFoldAction(FoldUDivOperandCb FA, Value *InputOperand, size_t SLHS)		UDivFoldAction(FoldUDivOperandCb FA, Value *InputOperand, size_t SLHS)
: FoldAction(FA), OperandToFold(InputOperand), SelectLHSIdx(SLHS) {}		: FoldAction(FA), OperandToFold(InputOperand), SelectLHSIdx(SLHS) {}
};		};

} // end anonymous namespace		} // end anonymous namespace

// X udiv 2^C -> X >> C		// X udiv 2^C -> X >> C
static Instruction *foldUDivPow2Cst(Value *Op0, Value *Op1,		static Instruction *foldUDivPow2Cst(Value *Op0, Value *Op1,
const BinaryOperator &I, InstCombiner &IC) {		const BinaryOperator &I,
		InstCombinerImpl &IC) {
Constant *C1 = getLogBase2(Op0->getType(), cast<Constant>(Op1));		Constant *C1 = getLogBase2(Op0->getType(), cast<Constant>(Op1));
if (!C1)		if (!C1)
llvm_unreachable("Failed to constant fold udiv -> logbase2");		llvm_unreachable("Failed to constant fold udiv -> logbase2");
BinaryOperator *LShr = BinaryOperator::CreateLShr(Op0, C1);		BinaryOperator *LShr = BinaryOperator::CreateLShr(Op0, C1);
if (I.isExact())		if (I.isExact())
LShr->setIsExact();		LShr->setIsExact();
return LShr;		return LShr;
}		}
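The `X udiv 2^C -> X >> C` rewrite above rests on a simple arithmetic identity. Here is a minimal standalone sketch of that identity on plain 32-bit integers. It is not the LLVM implementation; `logBase2` and `udivPow2AsShift` are hypothetical helpers written for this illustration, and the divisor is assumed to be a power of two, as the fold requires.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns C such that divisor == 2^C.
// Precondition (assumed, mirroring the fold): divisor is a power of two.
unsigned logBase2(uint32_t divisor) {
  assert(divisor != 0 && (divisor & (divisor - 1)) == 0 &&
         "divisor must be a power of two");
  unsigned c = 0;
  while ((divisor >> c) != 1u)
    ++c;
  return c;
}

// Computes x / divisor as a logical right shift by log2(divisor).
uint32_t udivPow2AsShift(uint32_t x, uint32_t divisor) {
  return x >> logBase2(divisor);
}
```

When the original division is exact, the bits shifted out are all zero, which is why the function above copies the `exact` flag onto the new `lshr`.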

// X udiv (C1 << N), where C1 is "1<<C2" --> X >> (N+C2)		// X udiv (C1 << N), where C1 is "1<<C2" --> X >> (N+C2)
// X udiv (zext (C1 << N)), where C1 is "1<<C2" --> X >> (N+C2)		// X udiv (zext (C1 << N)), where C1 is "1<<C2" --> X >> (N+C2)
static Instruction *foldUDivShl(Value *Op0, Value *Op1, const BinaryOperator &I,		static Instruction *foldUDivShl(Value *Op0, Value *Op1, const BinaryOperator &I,
InstCombiner &IC) {		InstCombinerImpl &IC) {
Value *ShiftLeft;		Value *ShiftLeft;
if (!match(Op1, m_ZExt(m_Value(ShiftLeft))))		if (!match(Op1, m_ZExt(m_Value(ShiftLeft))))
ShiftLeft = Op1;		ShiftLeft = Op1;

Constant *CI;		Constant *CI;
Value *N;		Value *N;
if (!match(ShiftLeft, m_Shl(m_Constant(CI), m_Value(N))))		if (!match(ShiftLeft, m_Shl(m_Constant(CI), m_Value(N))))
llvm_unreachable("match should never fail here!");		llvm_unreachable("match should never fail here!");
▲ Show 20 Lines • Show All 77 Lines • ▼ Show 20 Lines	if ((match(N, m_OneUse(m_ZExt(m_Value(X)))) && match(D, m_Constant(C))) ||
Value *NarrowOp = isa<Constant>(D) ? Builder.CreateBinOp(Opcode, X, TruncC)		Value *NarrowOp = isa<Constant>(D) ? Builder.CreateBinOp(Opcode, X, TruncC)
: Builder.CreateBinOp(Opcode, TruncC, X);		: Builder.CreateBinOp(Opcode, TruncC, X);
return new ZExtInst(NarrowOp, Ty);		return new ZExtInst(NarrowOp, Ty);
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitUDiv(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitUDiv(BinaryOperator &I) {
if (Value *V = SimplifyUDivInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifyUDivInst(I.getOperand(0), I.getOperand(1),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
return X;		return X;

// Handle the integer div common cases		// Handle the integer div common cases
▲ Show 20 Lines • Show All 77 Lines • ▼ Show 20 Lines	for (unsigned i = 0, e = UDivActions.size(); i != e; ++i) {
UDivActions[i].FoldResult = Inst;		UDivActions[i].FoldResult = Inst;
} else		} else
return Inst;		return Inst;
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitSDiv(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitSDiv(BinaryOperator &I) {
if (Value *V = SimplifySDivInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifySDivInst(I.getOperand(0), I.getOperand(1),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
return X;		return X;

// Handle the integer div common cases		// Handle the integer div common cases
▲ Show 20 Lines • Show All 137 Lines • ▼ Show 20 Lines	static Instruction *foldFDivConstantDividend(BinaryOperator &I) {
// TODO: Use Intrinsic::canonicalize or let function attributes tell us that		// TODO: Use Intrinsic::canonicalize or let function attributes tell us that
// denorms are flushed?		// denorms are flushed?
if (!NewC || !NewC->isNormalFP())		if (!NewC || !NewC->isNormalFP())
return nullptr;		return nullptr;

return BinaryOperator::CreateFDivFMF(NewC, X, &I);		return BinaryOperator::CreateFDivFMF(NewC, X, &I);
}		}

Instruction *InstCombiner::visitFDiv(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitFDiv(BinaryOperator &I) {
if (Value *V = SimplifyFDivInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifyFDivInst(I.getOperand(0), I.getOperand(1),
I.getFastMathFlags(),		I.getFastMathFlags(),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
return X;		return X;

▲ Show 20 Lines • Show All 90 Lines • ▼ Show 20 Lines	Instruction *InstCombinerImpl::visitFDiv(BinaryOperator &I) {
}		}
return nullptr;		return nullptr;
}		}

/// This function implements the transforms common to both integer remainder		/// This function implements the transforms common to both integer remainder
/// instructions (urem and srem). It is called by the visitors to those integer		/// instructions (urem and srem). It is called by the visitors to those integer
/// remainder instructions.		/// remainder instructions.
/// Common integer remainder transforms		/// Common integer remainder transforms
Instruction *InstCombiner::commonIRemTransforms(BinaryOperator &I) {		Instruction *InstCombinerImpl::commonIRemTransforms(BinaryOperator &I) {
Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);		Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);

// The RHS is known non-zero.		// The RHS is known non-zero.
if (Value *V = simplifyValueKnownNonZero(I.getOperand(1), *this, I))		if (Value *V = simplifyValueKnownNonZero(I.getOperand(1), *this, I))
return replaceOperand(I, 1, V);		return replaceOperand(I, 1, V);

// Handle cases involving: rem X, (select Cond, Y, Z)		// Handle cases involving: rem X, (select Cond, Y, Z)
if (simplifyDivRemOfSelectWithZeroOp(I))		if (simplifyDivRemOfSelectWithZeroOp(I))
Show All 21 Lines	if (Instruction *Op0I = dyn_cast<Instruction>(Op0)) {
if (SimplifyDemandedInstructionBits(I))		if (SimplifyDemandedInstructionBits(I))
return &I;		return &I;
}		}
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitURem(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitURem(BinaryOperator &I) {
if (Value *V = SimplifyURemInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifyURemInst(I.getOperand(0), I.getOperand(1),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
return X;		return X;

if (Instruction *common = commonIRemTransforms(I))		if (Instruction *common = commonIRemTransforms(I))
Show All 34 Lines	Instruction *InstCombinerImpl::visitURem(BinaryOperator &I) {
if (match(Op1, m_SExt(m_Value(X))) && X->getType()->isIntOrIntVectorTy(1)) {		if (match(Op1, m_SExt(m_Value(X))) && X->getType()->isIntOrIntVectorTy(1)) {
Value *Cmp = Builder.CreateICmpEQ(Op0, ConstantInt::getAllOnesValue(Ty));		Value *Cmp = Builder.CreateICmpEQ(Op0, ConstantInt::getAllOnesValue(Ty));
return SelectInst::Create(Cmp, ConstantInt::getNullValue(Ty), Op0);		return SelectInst::Create(Cmp, ConstantInt::getNullValue(Ty), Op0);
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitSRem(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitSRem(BinaryOperator &I) {
if (Value *V = SimplifySRemInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifySRemInst(I.getOperand(0), I.getOperand(1),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
return X;		return X;

// Handle the integer rem common cases		// Handle the integer rem common cases
if (Instruction *Common = commonIRemTransforms(I))		if (Instruction *Common = commonIRemTransforms(I))
return Common;		return Common;

Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);		Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
{		{
const APInt *Y;		const APInt *Y;
// X % -Y -> X % Y		// X % -Y -> X % Y
if (match(Op1, m_Negative(Y)) && !Y->isMinSignedValue())		if (match(Op1, m_Negative(Y)) && !Y->isMinSignedValue())
return replaceOperand(I, 1, ConstantInt::get(I.getType(), -*Y));		return replaceOperand(I, 1, ConstantInt::get(I.getType(), -*Y));
}		}

// -X srem Y --> -(X srem Y)		// -X srem Y --> -(X srem Y)
Value *X, *Y;		Value *X, *Y;
if (match(&I, m_SRem(m_OneUse(m_NSWSub(m_Zero(), m_Value(X))), m_Value(Y))))		if (match(&I, m_SRem(m_OneUse(m_NSWSub(m_Zero(), m_Value(X))), m_Value(Y))))
return BinaryOperator::CreateNSWNeg(Builder.CreateSRem(X, Y));		return BinaryOperator::CreateNSWNeg(Builder.CreateSRem(X, Y));

// If the sign bits of both operands are zero (i.e. we can prove they are		// If the sign bits of both operands are zero (i.e. we can prove they are
// unsigned inputs), turn this into a urem.		// unsigned inputs), turn this into a urem.
APInt Mask(APInt::getSignMask(I.getType()->getScalarSizeInBits()));		APInt Mask(APInt::getSignMask(I.getType()->getScalarSizeInBits()));
if (MaskedValueIsZero(Op1, Mask, 0, &I) &&		if (MaskedValueIsZero(Op1, Mask, 0, &I) &&
MaskedValueIsZero(Op0, Mask, 0, &I)) {		MaskedValueIsZero(Op0, Mask, 0, &I)) {
// X srem Y -> X urem Y, iff X and Y don't have sign bit set		// X srem Y -> X urem Y, iff X and Y don't have sign bit set
return BinaryOperator::CreateURem(Op0, Op1, I.getName());		return BinaryOperator::CreateURem(Op0, Op1, I.getName());
Show All 32 Lines	if (hasNegative && !hasMissing) {
if (NewRHSV != C) // Don't loop on -MININT		if (NewRHSV != C) // Don't loop on -MININT
return replaceOperand(I, 1, NewRHSV);		return replaceOperand(I, 1, NewRHSV);
}		}
}		}

return nullptr;		return nullptr;
}		}
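The signed-remainder folds in `visitSRem` above (`X % -Y -> X % Y`, `-X srem Y --> -(X srem Y)`, and `srem -> urem` when both sign bits are clear) can be illustrated on plain C++ integers, whose `%` also truncates toward zero. This is a hedged standalone sketch, not LLVM code; all three function names are hypothetical, and the preconditions in the asserts are assumptions chosen to stay clear of C++ undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>

// X % -Y == X % Y: the remainder takes the sign of the dividend, so the sign
// of the divisor does not matter. Y == INT32_MIN is excluded because -Y is
// not representable, matching the isMinSignedValue() guard in the fold.
bool negatedDivisorAgrees(int32_t x, int32_t y) {
  assert(y != 0 && y != INT32_MIN && x != INT32_MIN);
  return x % y == x % -y;
}

// (-X) % Y == -(X % Y), assuming the negation of X does not overflow
// (the fold requires an nsw subtraction from zero).
bool negatedDividendAgrees(int32_t x, int32_t y) {
  assert(y != 0 && x != INT32_MIN);
  return (-x) % y == -(x % y);
}

// With both sign bits clear, signed and unsigned remainder coincide.
bool sremMatchesUrem(int32_t x, int32_t y) {
  assert(x >= 0 && y > 0);
  return static_cast<uint32_t>(x % y) ==
         static_cast<uint32_t>(x) % static_cast<uint32_t>(y);
}
```

Each function returns true for every input that satisfies its asserted preconditions, which is the property the corresponding fold depends on.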

Instruction *InstCombiner::visitFRem(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitFRem(BinaryOperator &I) {
if (Value *V = SimplifyFRemInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifyFRemInst(I.getOperand(0), I.getOperand(1),
I.getFastMathFlags(),		I.getFastMathFlags(),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
return X;		return X;

return nullptr;		return nullptr;
}		}

llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp

Show All 36 Lines
#include "llvm/IR/User.h"		#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include "llvm/Support/DebugCounter.h"		#include "llvm/Support/DebugCounter.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"
#include <functional>		#include <functional>
#include <tuple>		#include <tuple>
#include <type_traits>		#include <type_traits>
#include <utility>		#include <utility>

namespace llvm {		namespace llvm {
class AssumptionCache;		class AssumptionCache;
class DataLayout;		class DataLayout;
▲ Show 20 Lines • Show All 372 Lines • ▼ Show 20 Lines	if (!Negated) {
llvm::for_each(llvm::reverse(NewInstructions),		llvm::for_each(llvm::reverse(NewInstructions),
[&](Instruction *I) { I->eraseFromParent(); });		[&](Instruction *I) { I->eraseFromParent(); });
return llvm::None;		return llvm::None;
}		}
return std::make_pair(ArrayRef<Instruction *>(NewInstructions), Negated);		return std::make_pair(ArrayRef<Instruction *>(NewInstructions), Negated);
}		}

LLVM_NODISCARD Value *Negator::Negate(bool LHSIsZero, Value *Root,		LLVM_NODISCARD Value *Negator::Negate(bool LHSIsZero, Value *Root,
InstCombiner &IC) {		InstCombinerImpl &IC) {
++NegatorTotalNegationsAttempted;		++NegatorTotalNegationsAttempted;
LLVM_DEBUG(dbgs() << "Negator: attempting to sink negation into " << *Root		LLVM_DEBUG(dbgs() << "Negator: attempting to sink negation into " << *Root
<< "\n");		<< "\n");

if (!NegatorEnabled || !DebugCounter::shouldExecute(NegatorCounter))		if (!NegatorEnabled || !DebugCounter::shouldExecute(NegatorCounter))
return nullptr;		return nullptr;

Negator N(Root->getContext(), IC.getDataLayout(), IC.getAssumptionCache(),		Negator N(Root->getContext(), IC.getDataLayout(), IC.getAssumptionCache(),
Show All 33 Lines

llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp

Show All 11 Lines

#include "InstCombineInternal.h"		#include "InstCombineInternal.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;		using namespace llvm;
using namespace llvm::PatternMatch;		using namespace llvm::PatternMatch;

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

static cl::opt<unsigned>		static cl::opt<unsigned>
MaxNumPhis("instcombine-max-num-phis", cl::init(512),		MaxNumPhis("instcombine-max-num-phis", cl::init(512),
cl::desc("Maximum number phis to handle in intptr/ptrint folding"));		cl::desc("Maximum number phis to handle in intptr/ptrint folding"));

/// The PHI arguments will be folded into a single operation with a PHI node		/// The PHI arguments will be folded into a single operation with a PHI node
/// as input. The debug location of the single operation will be the merged		/// as input. The debug location of the single operation will be the merged
/// locations of the original PHI node arguments.		/// locations of the original PHI node arguments.
void InstCombiner::PHIArgMergedDebugLoc(Instruction *Inst, PHINode &PN) {		void InstCombinerImpl::PHIArgMergedDebugLoc(Instruction *Inst, PHINode &PN) {
auto *FirstInst = cast<Instruction>(PN.getIncomingValue(0));		auto *FirstInst = cast<Instruction>(PN.getIncomingValue(0));
Inst->setDebugLoc(FirstInst->getDebugLoc());		Inst->setDebugLoc(FirstInst->getDebugLoc());
// We do not expect a CallInst here, otherwise, N-way merging of DebugLoc		// We do not expect a CallInst here, otherwise, N-way merging of DebugLoc
// will be inefficient.		// will be inefficient.
assert(!isa<CallInst>(Inst));		assert(!isa<CallInst>(Inst));

for (unsigned i = 1; i != PN.getNumIncomingValues(); ++i) {		for (unsigned i = 1; i != PN.getNumIncomingValues(); ++i) {
auto *I = cast<Instruction>(PN.getIncomingValue(i));		auto *I = cast<Instruction>(PN.getIncomingValue(i));
▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines
// br label %bb2		// br label %bb2
// bb2:		// bb2:
// ptr_val = PHI([ptr_init, %bb1], [ptr_val_inc, %bb2]		// ptr_val = PHI([ptr_init, %bb1], [ptr_val_inc, %bb2]
// ...		// ...
// use(ptr_val)		// use(ptr_val)
// ptr_val_inc = ...		// ptr_val_inc = ...
// ...		// ...
//		//
Instruction *InstCombiner::FoldIntegerTypedPHI(PHINode &PN) {		Instruction *InstCombinerImpl::FoldIntegerTypedPHI(PHINode &PN) {
if (!PN.getType()->isIntegerTy())		if (!PN.getType()->isIntegerTy())
return nullptr;		return nullptr;
if (!PN.hasOneUse())		if (!PN.hasOneUse())
return nullptr;		return nullptr;

auto *IntToPtr = dyn_cast<IntToPtrInst>(PN.user_back());		auto *IntToPtr = dyn_cast<IntToPtrInst>(PN.user_back());
if (!IntToPtr)		if (!IntToPtr)
return nullptr;		return nullptr;
▲ Show 20 Lines • Show All 182 Lines • ▼ Show 20 Lines	#endif

// The PtrToCast + IntToPtr will be simplified later		// The PtrToCast + IntToPtr will be simplified later
return CastInst::CreateBitOrPointerCast(NewPtrPHI,		return CastInst::CreateBitOrPointerCast(NewPtrPHI,
IntToPtr->getOperand(0)->getType());		IntToPtr->getOperand(0)->getType());
}		}

/// If we have something like phi [add (a,b), add(a,c)] and if a/b/c and the		/// If we have something like phi [add (a,b), add(a,c)] and if a/b/c and the
/// adds all have a single use, turn this into a phi and a single binop.		/// adds all have a single use, turn this into a phi and a single binop.
Instruction *InstCombiner::FoldPHIArgBinOpIntoPHI(PHINode &PN) {		Instruction *InstCombinerImpl::FoldPHIArgBinOpIntoPHI(PHINode &PN) {
Instruction *FirstInst = cast<Instruction>(PN.getIncomingValue(0));		Instruction *FirstInst = cast<Instruction>(PN.getIncomingValue(0));
assert(isa<BinaryOperator>(FirstInst) || isa<CmpInst>(FirstInst));		assert(isa<BinaryOperator>(FirstInst) || isa<CmpInst>(FirstInst));
unsigned Opc = FirstInst->getOpcode();		unsigned Opc = FirstInst->getOpcode();
Value *LHSVal = FirstInst->getOperand(0);		Value *LHSVal = FirstInst->getOperand(0);
Value *RHSVal = FirstInst->getOperand(1);		Value *RHSVal = FirstInst->getOperand(1);

Type *LHSType = LHSVal->getType();		Type *LHSType = LHSVal->getType();
Type *RHSType = RHSVal->getType();		Type *RHSType = RHSVal->getType();
▲ Show 20 Lines • Show All 76 Lines • ▼ Show 20 Lines	Instruction *InstCombinerImpl::FoldPHIArgBinOpIntoPHI(PHINode &PN) {

for (unsigned i = 1, e = PN.getNumIncomingValues(); i != e; ++i)		for (unsigned i = 1, e = PN.getNumIncomingValues(); i != e; ++i)
NewBinOp->andIRFlags(PN.getIncomingValue(i));		NewBinOp->andIRFlags(PN.getIncomingValue(i));

PHIArgMergedDebugLoc(NewBinOp, PN);		PHIArgMergedDebugLoc(NewBinOp, PN);
return NewBinOp;		return NewBinOp;
}		}
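The doc comment on `FoldPHIArgBinOpIntoPHI` describes turning `phi [add(a,b), add(a,c)]` into one phi and one binop. As a rough analogy in ordinary C++ (not LLVM IR, and not code from this patch), a branch stands in for the two predecessors and a conditional expression stands in for the phi. `beforeFold`, `afterFold` and `foldPreservesValue` are hypothetical names.

```cpp
#include <cassert>

// Before the fold: each "predecessor" performs its own add, and the join
// point picks one of the two sums (the phi of two adds).
int beforeFold(bool fromFirstPred, int a, int b, int c) {
  int sum;
  if (fromFirstPred)
    sum = a + b;
  else
    sum = a + c;
  return sum;
}

// After the fold: the join point picks the operand that differs (the new
// phi), and a single add of the shared operand follows it.
int afterFold(bool fromFirstPred, int a, int b, int c) {
  int rhs = fromFirstPred ? b : c;
  return a + rhs;
}

// Both shapes compute the same value for either incoming edge.
bool foldPreservesValue(bool fromFirstPred, int a, int b, int c) {
  return beforeFold(fromFirstPred, a, b, c) ==
         afterFold(fromFirstPred, a, b, c);
}
```

The real transform additionally intersects the IR flags of the incoming instructions via `andIRFlags` and merges their debug locations, which this analogy does not model.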

Instruction *InstCombiner::FoldPHIArgGEPIntoPHI(PHINode &PN) {		Instruction *InstCombinerImpl::FoldPHIArgGEPIntoPHI(PHINode &PN) {
GetElementPtrInst *FirstInst =cast<GetElementPtrInst>(PN.getIncomingValue(0));		GetElementPtrInst *FirstInst =cast<GetElementPtrInst>(PN.getIncomingValue(0));

SmallVector<Value*, 16> FixedOperands(FirstInst->op_begin(),		SmallVector<Value*, 16> FixedOperands(FirstInst->op_begin(),
FirstInst->op_end());		FirstInst->op_end());
// This is true if all GEP bases are allocas and if all indices into them are		// This is true if all GEP bases are allocas and if all indices into them are
// constants.		// constants.
bool AllBasePointersAreAllocas = true;		bool AllBasePointersAreAllocas = true;

▲ Show 20 Lines • Show All 92 Lines • ▼ Show 20 Lines	Instruction *InstCombinerImpl::FoldPHIArgGEPIntoPHI(PHINode &PN) {
GetElementPtrInst *NewGEP =		GetElementPtrInst *NewGEP =
GetElementPtrInst::Create(FirstInst->getSourceElementType(), Base,		GetElementPtrInst::Create(FirstInst->getSourceElementType(), Base,
makeArrayRef(FixedOperands).slice(1));		makeArrayRef(FixedOperands).slice(1));
if (AllInBounds) NewGEP->setIsInBounds();		if (AllInBounds) NewGEP->setIsInBounds();
PHIArgMergedDebugLoc(NewGEP, PN);		PHIArgMergedDebugLoc(NewGEP, PN);
return NewGEP;		return NewGEP;
}		}


/// Return true if we know that it is safe to sink the load out of the block		/// Return true if we know that it is safe to sink the load out of the block
/// that defines it. This means that it must be obvious the value of the load is		/// that defines it. This means that it must be obvious the value of the load is
/// not changed from the point of the load to the end of the block it is in.		/// not changed from the point of the load to the end of the block it is in.
///		///
/// Finally, it is safe, but not profitable, to sink a load targeting a		/// Finally, it is safe, but not profitable, to sink a load targeting a
/// non-address-taken alloca. Doing so will cause us to not promote the alloca		/// non-address-taken alloca. Doing so will cause us to not promote the alloca
/// to a register.		/// to a register.
static bool isSafeAndProfitableToSinkLoad(LoadInst *L) {		static bool isSafeAndProfitableToSinkLoad(LoadInst *L) {
Show All 29 Lines	static bool isSafeAndProfitableToSinkLoad(LoadInst *L) {
if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(L->getOperand(0)))		if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(L->getOperand(0)))
if (AllocaInst *AI = dyn_cast<AllocaInst>(GEP->getOperand(0)))		if (AllocaInst *AI = dyn_cast<AllocaInst>(GEP->getOperand(0)))
if (AI->isStaticAlloca() && GEP->hasAllConstantIndices())		if (AI->isStaticAlloca() && GEP->hasAllConstantIndices())
return false;		return false;

return true;		return true;
}		}

Instruction *InstCombiner::FoldPHIArgLoadIntoPHI(PHINode &PN) {		Instruction *InstCombinerImpl::FoldPHIArgLoadIntoPHI(PHINode &PN) {
LoadInst *FirstLI = cast<LoadInst>(PN.getIncomingValue(0));		LoadInst *FirstLI = cast<LoadInst>(PN.getIncomingValue(0));

// FIXME: This is overconservative; this transform is allowed in some cases		// FIXME: This is overconservative; this transform is allowed in some cases
// for atomic operations.		// for atomic operations.
if (FirstLI->isAtomic())		if (FirstLI->isAtomic())
return nullptr;		return nullptr;

// When processing loads, we need to propagate two bits of information to the		// When processing loads, we need to propagate two bits of information to the
▲ Show 20 Lines • Show All 97 Lines • ▼ Show 20 Lines	Instruction *InstCombinerImpl::FoldPHIArgLoadIntoPHI(PHINode &PN) {

PHIArgMergedDebugLoc(NewLI, PN);		PHIArgMergedDebugLoc(NewLI, PN);
return NewLI;		return NewLI;
}		}

/// TODO: This function could handle other cast types, but then it might		/// TODO: This function could handle other cast types, but then it might
/// require special-casing a cast from the 'i1' type. See the comment in		/// require special-casing a cast from the 'i1' type. See the comment in
/// FoldPHIArgOpIntoPHI() about pessimizing illegal integer types.		/// FoldPHIArgOpIntoPHI() about pessimizing illegal integer types.
Instruction *InstCombiner::FoldPHIArgZextsIntoPHI(PHINode &Phi) {		Instruction *InstCombinerImpl::FoldPHIArgZextsIntoPHI(PHINode &Phi) {
// We cannot create a new instruction after the PHI if the terminator is an		// We cannot create a new instruction after the PHI if the terminator is an
// EHPad because there is no valid insertion point.		// EHPad because there is no valid insertion point.
if (Instruction *TI = Phi.getParent()->getTerminator())		if (Instruction *TI = Phi.getParent()->getTerminator())
if (TI->isEHPad())		if (TI->isEHPad())
return nullptr;		return nullptr;

// Early exit for the common case of a phi with two operands. These are		// Early exit for the common case of a phi with two operands. These are
// handled elsewhere. See the comment below where we check the count of zexts		// handled elsewhere. See the comment below where we check the count of zexts
▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	Instruction *InstCombinerImpl::FoldPHIArgZextsIntoPHI(PHINode &Phi) {

InsertNewInstBefore(NewPhi, Phi);		InsertNewInstBefore(NewPhi, Phi);
return CastInst::CreateZExtOrBitCast(NewPhi, Phi.getType());		return CastInst::CreateZExtOrBitCast(NewPhi, Phi.getType());
}		}

/// If all operands to a PHI node are the same "unary" operator and they all are		/// If all operands to a PHI node are the same "unary" operator and they all are
/// only used by the PHI, PHI together their inputs, and do the operation once,		/// only used by the PHI, PHI together their inputs, and do the operation once,
/// to the result of the PHI.		/// to the result of the PHI.
Instruction *InstCombiner::FoldPHIArgOpIntoPHI(PHINode &PN) {		Instruction *InstCombinerImpl::FoldPHIArgOpIntoPHI(PHINode &PN) {
// We cannot create a new instruction after the PHI if the terminator is an		// We cannot create a new instruction after the PHI if the terminator is an
// EHPad because there is no valid insertion point.		// EHPad because there is no valid insertion point.
if (Instruction *TI = PN.getParent()->getTerminator())		if (Instruction *TI = PN.getParent()->getTerminator())
if (TI->isEHPad())		if (TI->isEHPad())
return nullptr;		return nullptr;

Instruction *FirstInst = cast<Instruction>(PN.getIncomingValue(0));		Instruction *FirstInst = cast<Instruction>(PN.getIncomingValue(0));

▲ Show 20 Lines • Show All 210 Lines • ▼ Show 20 Lines
/// This is an integer PHI and we know that it has an illegal type: see if it is		/// This is an integer PHI and we know that it has an illegal type: see if it is
/// only used by trunc or trunc(lshr) operations. If so, we split the PHI into		/// only used by trunc or trunc(lshr) operations. If so, we split the PHI into
/// the various pieces being extracted. This sort of thing is introduced when		/// the various pieces being extracted. This sort of thing is introduced when
/// SROA promotes an aggregate to large integer values.		/// SROA promotes an aggregate to large integer values.
///		///
/// TODO: The user of the trunc may be an bitcast to float/double/vector or an		/// TODO: The user of the trunc may be an bitcast to float/double/vector or an
/// inttoptr. We should produce new PHIs in the right type.		/// inttoptr. We should produce new PHIs in the right type.
///		///
Instruction *InstCombiner::SliceUpIllegalIntegerPHI(PHINode &FirstPhi) {		Instruction *InstCombinerImpl::SliceUpIllegalIntegerPHI(PHINode &FirstPhi) {
// PHIUsers - Keep track of all of the truncated values extracted from a set		// PHIUsers - Keep track of all of the truncated values extracted from a set
// of PHIs, along with their offset. These are the things we want to rewrite.		// of PHIs, along with their offset. These are the things we want to rewrite.
SmallVector<PHIUsageRecord, 16> PHIUsers;		SmallVector<PHIUsageRecord, 16> PHIUsers;

// PHIs are often mutually cyclic, so we keep track of a whole set of PHI		// PHIs are often mutually cyclic, so we keep track of a whole set of PHI
// nodes which are extracted from. PHIsToSlice is a set we use to avoid		// nodes which are extracted from. PHIsToSlice is a set we use to avoid
// revisiting PHIs, PHIsInspected is a ordered list of PHIs that we need to		// revisiting PHIs, PHIsInspected is a ordered list of PHIs that we need to
// check the uses of (to ensure they are all extracts).		// check the uses of (to ensure they are all extracts).
▲ Show 20 Lines • Show All 159 Lines • ▼ Show 20 Lines	Instruction *InstCombinerImpl::SliceUpIllegalIntegerPHI(PHINode &FirstPhi) {
Value *Undef = UndefValue::get(FirstPhi.getType());		Value *Undef = UndefValue::get(FirstPhi.getType());
for (unsigned i = 1, e = PHIsToSlice.size(); i != e; ++i)		for (unsigned i = 1, e = PHIsToSlice.size(); i != e; ++i)
replaceInstUsesWith(*PHIsToSlice[i], Undef);		replaceInstUsesWith(*PHIsToSlice[i], Undef);
return replaceInstUsesWith(FirstPhi, Undef);		return replaceInstUsesWith(FirstPhi, Undef);
}		}

// PHINode simplification		// PHINode simplification
//		//
Instruction *InstCombiner::visitPHINode(PHINode &PN) {		Instruction *InstCombinerImpl::visitPHINode(PHINode &PN) {
if (Value *V = SimplifyInstruction(&PN, SQ.getWithInstruction(&PN)))		if (Value *V = SimplifyInstruction(&PN, SQ.getWithInstruction(&PN)))
return replaceInstUsesWith(PN, V);		return replaceInstUsesWith(PN, V);

if (Instruction *Result = FoldPHIArgZextsIntoPHI(PN))		if (Instruction *Result = FoldPHIArgZextsIntoPHI(PN))
return Result;		return Result;

// If all PHI operands are the same operation, pull them through the PHI,		// If all PHI operands are the same operation, pull them through the PHI,
// reducing code size.		// reducing code size.
▲ Show 20 Lines • Show All 138 Lines • Show Last 20 Lines

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

Show All 32 Lines
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/User.h"		#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
#include "llvm/Transforms/InstCombine/InstCombineWorklist.h"		#include "llvm/Transforms/InstCombine/InstCombineWorklist.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"
#include <cassert>		#include <cassert>
#include <utility>		#include <utility>

using namespace llvm;		using namespace llvm;
using namespace PatternMatch;		using namespace PatternMatch;

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

static Value *createMinMax(InstCombiner::BuilderTy &Builder,		static Value *createMinMax(InstCombiner::BuilderTy &Builder,
SelectPatternFlavor SPF, Value *A, Value *B) {		SelectPatternFlavor SPF, Value *A, Value *B) {
CmpInst::Predicate Pred = getMinMaxPred(SPF);		CmpInst::Predicate Pred = getMinMaxPred(SPF);
assert(CmpInst::isIntPredicate(Pred) && "Expected integer predicate");		assert(CmpInst::isIntPredicate(Pred) && "Expected integer predicate");
return Builder.CreateSelect(Builder.CreateICmp(Pred, A, B), A, B);		return Builder.CreateSelect(Builder.CreateICmp(Pred, A, B), A, B);
}		}

/// Replace a select operand based on an equality comparison with the identity		/// Replace a select operand based on an equality comparison with the identity
/// constant of a binop.		/// constant of a binop.
static Instruction *foldSelectBinOpIdentity(SelectInst &Sel,		static Instruction *foldSelectBinOpIdentity(SelectInst &Sel,
const TargetLibraryInfo &TLI,		const TargetLibraryInfo &TLI,
InstCombiner &IC) {		InstCombinerImpl &IC) {
// The select condition must be an equality compare with a constant operand.		// The select condition must be an equality compare with a constant operand.
Value *X;		Value *X;
Constant *C;		Constant *C;
CmpInst::Predicate Pred;		CmpInst::Predicate Pred;
if (!match(Sel.getCondition(), m_Cmp(Pred, m_Value(X), m_Constant(C))))		if (!match(Sel.getCondition(), m_Cmp(Pred, m_Value(X), m_Constant(C))))
return nullptr;		return nullptr;

bool IsEq;		bool IsEq;
▲ Show 20 Lines • Show All 205 Lines • ▼ Show 20 Lines	static APInt getSelectFoldableConstant(BinaryOperator *I) {
case Instruction::And:		case Instruction::And:
return APInt::getAllOnesValue(I->getType()->getScalarSizeInBits());		return APInt::getAllOnesValue(I->getType()->getScalarSizeInBits());
case Instruction::Mul:		case Instruction::Mul:
return APInt(I->getType()->getScalarSizeInBits(), 1);		return APInt(I->getType()->getScalarSizeInBits(), 1);
}		}
}		}

/// We have (select c, TI, FI), and we know that TI and FI have the same opcode.		/// We have (select c, TI, FI), and we know that TI and FI have the same opcode.
Instruction *InstCombiner::foldSelectOpOp(SelectInst &SI, Instruction *TI,		Instruction *InstCombinerImpl::foldSelectOpOp(SelectInst &SI, Instruction *TI,
Instruction *FI) {		Instruction *FI) {
// Don't break up min/max patterns. The hasOneUse checks below prevent that		// Don't break up min/max patterns. The hasOneUse checks below prevent that
// for most cases, but vector min/max with bitcasts can be transformed. If the		// for most cases, but vector min/max with bitcasts can be transformed. If the
// one-use restrictions are eased for other patterns, we still don't want to		// one-use restrictions are eased for other patterns, we still don't want to
// obfuscate min/max.		// obfuscate min/max.
if ((match(&SI, m_SMin(m_Value(), m_Value())) ||		if ((match(&SI, m_SMin(m_Value(), m_Value())) ||
match(&SI, m_SMax(m_Value(), m_Value())) ||		match(&SI, m_SMax(m_Value(), m_Value())) ||
match(&SI, m_UMin(m_Value(), m_Value())) ||		match(&SI, m_UMin(m_Value(), m_Value())) ||
match(&SI, m_UMax(m_Value(), m_Value()))))		match(&SI, m_UMax(m_Value(), m_Value()))))
▲ Show 20 Lines • Show All 121 Lines • ▼ Show 20 Lines	static bool isSelect01(const APInt &C1I, const APInt &C2I) {
if (!C1I.isNullValue() && !C2I.isNullValue()) // One side must be zero.		if (!C1I.isNullValue() && !C2I.isNullValue()) // One side must be zero.
return false;		return false;
return C1I.isOneValue() || C1I.isAllOnesValue() ||		return C1I.isOneValue() || C1I.isAllOnesValue() ||
C2I.isOneValue() || C2I.isAllOnesValue();		C2I.isOneValue() || C2I.isAllOnesValue();
}		}

/// Try to fold the select into one of the operands to allow further		/// Try to fold the select into one of the operands to allow further
/// optimization.		/// optimization.
Instruction *InstCombiner::foldSelectIntoOp(SelectInst &SI, Value *TrueVal,		Instruction *InstCombinerImpl::foldSelectIntoOp(SelectInst &SI, Value *TrueVal,
Value *FalseVal) {		Value *FalseVal) {
// See the comment above GetSelectFoldableOperands for a description of the		// See the comment above GetSelectFoldableOperands for a description of the
// transformation we are doing here.		// transformation we are doing here.
if (auto *TVI = dyn_cast<BinaryOperator>(TrueVal)) {		if (auto *TVI = dyn_cast<BinaryOperator>(TrueVal)) {
if (TVI->hasOneUse() && !isa<Constant>(FalseVal)) {		if (TVI->hasOneUse() && !isa<Constant>(FalseVal)) {
if (unsigned SFO = getSelectFoldableOperands(TVI)) {		if (unsigned SFO = getSelectFoldableOperands(TVI)) {
unsigned OpToFold = 0;		unsigned OpToFold = 0;
if ((SFO & 1) && FalseVal == TVI->getOperand(0)) {		if ((SFO & 1) && FalseVal == TVI->getOperand(0)) {
OpToFold = 1;		OpToFold = 1;
▲ Show 20 Lines • Show All 588 Lines • ▼ Show 20 Lines
}		}

/// If this is an integer min/max (icmp + select) with a constant operand,		/// If this is an integer min/max (icmp + select) with a constant operand,
/// create the canonical icmp for the min/max operation and canonicalize the		/// create the canonical icmp for the min/max operation and canonicalize the
/// constant to the 'false' operand of the select:		/// constant to the 'false' operand of the select:
/// select (icmp Pred X, C1), C2, X --> select (icmp Pred' X, C2), X, C2		/// select (icmp Pred X, C1), C2, X --> select (icmp Pred' X, C2), X, C2
/// Note: if C1 != C2, this will change the icmp constant to the existing		/// Note: if C1 != C2, this will change the icmp constant to the existing
/// constant operand of the select.		/// constant operand of the select.
static Instruction *		static Instruction *canonicalizeMinMaxWithConstant(SelectInst &Sel,
canonicalizeMinMaxWithConstant(SelectInst &Sel, ICmpInst &Cmp,		ICmpInst &Cmp,
InstCombiner &IC) {		InstCombinerImpl &IC) {
if (!Cmp.hasOneUse() || !isa<Constant>(Cmp.getOperand(1)))		if (!Cmp.hasOneUse() || !isa<Constant>(Cmp.getOperand(1)))
return nullptr;		return nullptr;

// Canonicalize the compare predicate based on whether we have min or max.		// Canonicalize the compare predicate based on whether we have min or max.
Value *LHS, *RHS;		Value *LHS, *RHS;
SelectPatternResult SPR = matchSelectPattern(&Sel, LHS, RHS);		SelectPatternResult SPR = matchSelectPattern(&Sel, LHS, RHS);
if (!SelectPatternResult::isMinOrMax(SPR.Flavor))		if (!SelectPatternResult::isMinOrMax(SPR.Flavor))
return nullptr;		return nullptr;
Show All 27 Lines

/// There are many select variants for each of ABS/NABS.		/// There are many select variants for each of ABS/NABS.
/// In matchSelectPattern(), there are different compare constants, compare		/// In matchSelectPattern(), there are different compare constants, compare
/// predicates/operands and select operands.		/// predicates/operands and select operands.
/// In isKnownNegation(), there are different formats of negated operands.		/// In isKnownNegation(), there are different formats of negated operands.
/// Canonicalize all these variants to 1 pattern.		/// Canonicalize all these variants to 1 pattern.
/// This makes CSE more likely.		/// This makes CSE more likely.
static Instruction *canonicalizeAbsNabs(SelectInst &Sel, ICmpInst &Cmp,		static Instruction *canonicalizeAbsNabs(SelectInst &Sel, ICmpInst &Cmp,
InstCombiner &IC) {		InstCombinerImpl &IC) {
if (!Cmp.hasOneUse() || !isa<Constant>(Cmp.getOperand(1)))		if (!Cmp.hasOneUse() || !isa<Constant>(Cmp.getOperand(1)))
return nullptr;		return nullptr;

// Choose a sign-bit check for the compare (likely simpler for codegen).		// Choose a sign-bit check for the compare (likely simpler for codegen).
// ABS: (X <s 0) ? -X : X		// ABS: (X <s 0) ? -X : X
// NABS: (X <s 0) ? X : -X		// NABS: (X <s 0) ? X : -X
Value *LHS, *RHS;		Value *LHS, *RHS;
SelectPatternFlavor SPF = matchSelectPattern(&Sel, LHS, RHS).Flavor;		SelectPatternFlavor SPF = matchSelectPattern(&Sel, LHS, RHS).Flavor;
▲ Show 20 Lines • Show All 166 Lines • ▼ Show 20 Lines	static Instruction *canonicalizeClampLike(SelectInst &Sel0, ICmpInst &Cmp0,
case ICmpInst::Predicate::ICMP_UGT:		case ICmpInst::Predicate::ICMP_UGT:
// We want to canonicalize it to 'ult', so we'll need to increment C0,		// We want to canonicalize it to 'ult', so we'll need to increment C0,
// which again means it must not have any all-ones elements.		// which again means it must not have any all-ones elements.
if (!match(C0,		if (!match(C0,
m_SpecificInt_ICMP(ICmpInst::Predicate::ICMP_NE,		m_SpecificInt_ICMP(ICmpInst::Predicate::ICMP_NE,
APInt::getAllOnesValue(		APInt::getAllOnesValue(
C0->getType()->getScalarSizeInBits()))))		C0->getType()->getScalarSizeInBits()))))
return nullptr; // Can't do, have all-ones element[s].		return nullptr; // Can't do, have all-ones element[s].
C0 = AddOne(C0);		C0 = InstCombiner::AddOne(C0);
std::swap(X, Sel1);		std::swap(X, Sel1);
break;		break;
case ICmpInst::Predicate::ICMP_UGE:		case ICmpInst::Predicate::ICMP_UGE:
// The only way we'd get this predicate if this `icmp` has extra uses,		// The only way we'd get this predicate if this `icmp` has extra uses,
// but then we won't be able to do this fold.		// but then we won't be able to do this fold.
return nullptr;		return nullptr;
default:		default:
return nullptr; // Unknown predicate.		return nullptr; // Unknown predicate.
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	static Instruction *canonicalizeClampLike(SelectInst &Sel0, ICmpInst &Cmp0,
case ICmpInst::Predicate::ICMP_SGT:		case ICmpInst::Predicate::ICMP_SGT:
// We want to canonicalize it to 'slt', so we'll need to increment C2,		// We want to canonicalize it to 'slt', so we'll need to increment C2,
// which again means it must not have any signed max elements.		// which again means it must not have any signed max elements.
if (!match(C2,		if (!match(C2,
m_SpecificInt_ICMP(ICmpInst::Predicate::ICMP_NE,		m_SpecificInt_ICMP(ICmpInst::Predicate::ICMP_NE,
APInt::getSignedMaxValue(		APInt::getSignedMaxValue(
C2->getType()->getScalarSizeInBits()))))		C2->getType()->getScalarSizeInBits()))))
return nullptr; // Can't do, have signed max element[s].		return nullptr; // Can't do, have signed max element[s].
C2 = AddOne(C2);		C2 = InstCombiner::AddOne(C2);
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
case ICmpInst::Predicate::ICMP_SGE:		case ICmpInst::Predicate::ICMP_SGE:
// Also non-canonical, but here we don't need to change C2,		// Also non-canonical, but here we don't need to change C2,
// so we don't have any restrictions on C2, so we can just handle it.		// so we don't have any restrictions on C2, so we can just handle it.
std::swap(ReplacementLow, ReplacementHigh);		std::swap(ReplacementLow, ReplacementHigh);
break;		break;
default:		default:
return nullptr; // Unknown predicate.		return nullptr; // Unknown predicate.
Show All 30 Lines
// %r = select i1 %cmp, i32 %y, i32 C1		// %r = select i1 %cmp, i32 %y, i32 C1
// Where C0 != C1 and %x may be different from %y, see if the constant that we		// Where C0 != C1 and %x may be different from %y, see if the constant that we
// will have if we flip the strictness of the predicate (i.e. without changing		// will have if we flip the strictness of the predicate (i.e. without changing
// the result) is identical to the C1 in select. If it matches we can change		// the result) is identical to the C1 in select. If it matches we can change
// original comparison to one with swapped predicate, reuse the constant,		// original comparison to one with swapped predicate, reuse the constant,
// and swap the hands of select.		// and swap the hands of select.
static Instruction *		static Instruction *
tryToReuseConstantFromSelectInComparison(SelectInst &Sel, ICmpInst &Cmp,		tryToReuseConstantFromSelectInComparison(SelectInst &Sel, ICmpInst &Cmp,
InstCombiner &IC) {		InstCombinerImpl &IC) {
ICmpInst::Predicate Pred;		ICmpInst::Predicate Pred;
Value *X;		Value *X;
Constant *C0;		Constant *C0;
if (!match(&Cmp, m_OneUse(m_ICmp(		if (!match(&Cmp, m_OneUse(m_ICmp(
Pred, m_Value(X),		Pred, m_Value(X),
m_CombineAnd(m_AnyIntegralConstant(), m_Constant(C0))))))		m_CombineAnd(m_AnyIntegralConstant(), m_Constant(C0))))))
return nullptr;		return nullptr;

// If comparison predicate is non-relational, we won't be able to do anything.		// If comparison predicate is non-relational, we won't be able to do anything.
if (ICmpInst::isEquality(Pred))		if (ICmpInst::isEquality(Pred))
return nullptr;		return nullptr;

// If comparison predicate is non-canonical, then we certainly won't be able		// If comparison predicate is non-canonical, then we certainly won't be able
// to make it canonical; canonicalizeCmpWithConstant() already tried.		// to make it canonical; canonicalizeCmpWithConstant() already tried.
if (!isCanonicalPredicate(Pred))		if (!InstCombiner::isCanonicalPredicate(Pred))
return nullptr;		return nullptr;

// If the [input] type of comparison and select type are different, lets abort		// If the [input] type of comparison and select type are different, lets abort
// for now. We could try to compare constants with trunc/[zs]ext though.		// for now. We could try to compare constants with trunc/[zs]ext though.
if (C0->getType() != Sel.getType())		if (C0->getType() != Sel.getType())
return nullptr;		return nullptr;

// FIXME: are there any magic icmp predicate+constant pairs we must not touch?		// FIXME: are there any magic icmp predicate+constant pairs we must not touch?
Show All 11 Lines	auto MatchesSelectValue = [SelVal0, SelVal1](Constant *C) {
return C->isElementWiseEqual(SelVal0) || C->isElementWiseEqual(SelVal1);		return C->isElementWiseEqual(SelVal0) || C->isElementWiseEqual(SelVal1);
};		};

// If C0 already matches true/false value of select, we are done.		// If C0 already matches true/false value of select, we are done.
if (MatchesSelectValue(C0))		if (MatchesSelectValue(C0))
return nullptr;		return nullptr;

// Check the constant we'd have with flipped-strictness predicate.		// Check the constant we'd have with flipped-strictness predicate.
auto FlippedStrictness = getFlippedStrictnessPredicateAndConstant(Pred, C0);		auto FlippedStrictness =
		InstCombiner::getFlippedStrictnessPredicateAndConstant(Pred, C0);
if (!FlippedStrictness)		if (!FlippedStrictness)
return nullptr;		return nullptr;

// If said constant doesn't match either, then there is no hope,		// If said constant doesn't match either, then there is no hope,
if (!MatchesSelectValue(FlippedStrictness->second))		if (!MatchesSelectValue(FlippedStrictness->second))
return nullptr;		return nullptr;

// It matched! Lets insert the new comparison just before select.		// It matched! Lets insert the new comparison just before select.
InstCombiner::BuilderTy::InsertPointGuard Guard(IC.Builder);		InstCombiner::BuilderTy::InsertPointGuard Guard(IC.Builder);
IC.Builder.SetInsertPoint(&Sel);		IC.Builder.SetInsertPoint(&Sel);

Pred = ICmpInst::getSwappedPredicate(Pred); // Yes, swapped.		Pred = ICmpInst::getSwappedPredicate(Pred); // Yes, swapped.
Value *NewCmp = IC.Builder.CreateICmp(Pred, X, FlippedStrictness->second,		Value *NewCmp = IC.Builder.CreateICmp(Pred, X, FlippedStrictness->second,
Cmp.getName() + ".inv");		Cmp.getName() + ".inv");
IC.replaceOperand(Sel, 0, NewCmp);		IC.replaceOperand(Sel, 0, NewCmp);
Sel.swapValues();		Sel.swapValues();
Sel.swapProfMetadata();		Sel.swapProfMetadata();

return &Sel;		return &Sel;
}		}
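The comment block above `tryToReuseConstantFromSelectInComparison` describes flipping the strictness of a predicate so that the compare can reuse a constant already present in the select. The sketch below illustrates that idea on plain 32-bit unsigned integers. It is a standalone illustration, not LLVM code: `Pred`, `evalPred`, `flipStrictness`, `selectBefore`, and `selectAfter` are hypothetical names, and only the unsigned predicates are modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the unsigned relational icmp predicates.
enum class Pred { ULT, ULE, UGT, UGE };

// Evaluate "x pred c" for an unsigned 32-bit value.
inline bool evalPred(Pred p, uint32_t x, uint32_t c) {
  switch (p) {
  case Pred::ULT: return x < c;
  case Pred::ULE: return x <= c;
  case Pred::UGT: return x > c;
  case Pred::UGE: return x >= c;
  }
  return false;
}

// Flip the strictness of the predicate and adjust the constant so that the
// comparison result is unchanged. Returns false when the adjusted constant
// would wrap around, in which case no equivalent form exists.
inline bool flipStrictness(Pred &p, uint32_t &c) {
  switch (p) {
  case Pred::ULT:
    if (c == 0) return false;
    p = Pred::ULE; --c; return true;
  case Pred::ULE:
    if (c == UINT32_MAX) return false;
    p = Pred::ULT; ++c; return true;
  case Pred::UGT:
    if (c == UINT32_MAX) return false;
    p = Pred::UGE; ++c; return true;
  case Pred::UGE:
    if (c == 0) return false;
    p = Pred::UGT; --c; return true;
  }
  return false;
}

// Before: (x <u c0) ? y : c1
inline uint32_t selectBefore(uint32_t x, uint32_t y, uint32_t c0, uint32_t c1) {
  return x < c0 ? y : c1;
}

// After, assuming c1 == c0 - 1: the predicate is swapped to >u, the compare
// reuses c1, and the select arms are exchanged: (x >u c1) ? c1 : y
inline uint32_t selectAfter(uint32_t x, uint32_t y, uint32_t c1) {
  return x > c1 ? c1 : y;
}
```

With `c0 = 5` and `c1 = 4`, `x <u 5` is the same condition as `x <=u 4`, which is the negation of `x >u 4`. That is why swapping the predicate and exchanging the select arms preserves the result while leaving only one distinct constant.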

/// Visit a SelectInst that has an ICmpInst as its first operand.		/// Visit a SelectInst that has an ICmpInst as its first operand.
Instruction *InstCombiner::foldSelectInstWithICmp(SelectInst &SI,		Instruction *InstCombinerImpl::foldSelectInstWithICmp(SelectInst &SI,
ICmpInst *ICI) {		ICmpInst *ICI) {
if (Value *V = foldSelectValueEquivalence(SI, *ICI, SQ))		if (Value *V = foldSelectValueEquivalence(SI, *ICI, SQ))
return replaceInstUsesWith(SI, V);		return replaceInstUsesWith(SI, V);

if (Instruction *NewSel = canonicalizeMinMaxWithConstant(SI, *ICI, *this))		if (Instruction *NewSel = canonicalizeMinMaxWithConstant(SI, *ICI, *this))
return NewSel;		return NewSel;

if (Instruction *NewAbs = canonicalizeAbsNabs(SI, *ICI, *this))		if (Instruction *NewAbs = canonicalizeAbsNabs(SI, *ICI, *this))
return NewAbs;		return NewAbs;
▲ Show 20 Lines • Show All 135 Lines • ▼ Show 20 Lines	static bool canSelectOperandBeMappingIntoPredBlock(const Value *V,

// Otherwise we have a 'hard' case and we can't tell without doing more		// Otherwise we have a 'hard' case and we can't tell without doing more
// detailed dominator based analysis, punt.		// detailed dominator based analysis, punt.
return false;		return false;
}		}

/// We have an SPF (e.g. a min or max) of an SPF of the form:		/// We have an SPF (e.g. a min or max) of an SPF of the form:
/// SPF2(SPF1(A, B), C)		/// SPF2(SPF1(A, B), C)
Instruction *InstCombiner::foldSPFofSPF(Instruction *Inner,		Instruction *InstCombinerImpl::foldSPFofSPF(Instruction *Inner,
SelectPatternFlavor SPF1,		SelectPatternFlavor SPF1, Value *A,
Value *A, Value *B,		Value *B, Instruction &Outer,
Instruction &Outer,		SelectPatternFlavor SPF2,
SelectPatternFlavor SPF2, Value *C) {		Value *C) {
if (Outer.getType() != Inner->getType())		if (Outer.getType() != Inner->getType())
return nullptr;		return nullptr;

if (C == A || C == B) {		if (C == A || C == B) {
// MAX(MAX(A, B), B) -> MAX(A, B)		// MAX(MAX(A, B), B) -> MAX(A, B)
// MIN(MIN(a, b), a) -> MIN(a, b)		// MIN(MIN(a, b), a) -> MIN(a, b)
// TODO: This could be done in instsimplify.		// TODO: This could be done in instsimplify.
if (SPF1 == SPF2 && SelectPatternResult::isMinOrMax(SPF1))		if (SPF1 == SPF2 && SelectPatternResult::isMinOrMax(SPF1))
▲ Show 20 Lines • Show All 300 Lines • ▼ Show 20 Lines	foldOverflowingAddSubSelect(SelectInst &SI, InstCombiner::BuilderTy &Builder) {
else		else
return nullptr;		return nullptr;

Function *F =		Function *F =
Intrinsic::getDeclaration(SI.getModule(), NewIntrinsicID, SI.getType());		Intrinsic::getDeclaration(SI.getModule(), NewIntrinsicID, SI.getType());
return CallInst::Create(F, {X, Y});		return CallInst::Create(F, {X, Y});
}		}

Instruction *InstCombiner::foldSelectExtConst(SelectInst &Sel) {		Instruction *InstCombinerImpl::foldSelectExtConst(SelectInst &Sel) {
Constant *C;		Constant *C;
if (!match(Sel.getTrueValue(), m_Constant(C)) &&		if (!match(Sel.getTrueValue(), m_Constant(C)) &&
!match(Sel.getFalseValue(), m_Constant(C)))		!match(Sel.getFalseValue(), m_Constant(C)))
return nullptr;		return nullptr;

Instruction *ExtInst;		Instruction *ExtInst;
if (!match(Sel.getTrueValue(), m_Instruction(ExtInst)) &&		if (!match(Sel.getTrueValue(), m_Instruction(ExtInst)) &&
!match(Sel.getFalseValue(), m_Instruction(ExtInst)))		!match(Sel.getFalseValue(), m_Instruction(ExtInst)))
▲ Show 20 Lines • Show All 84 Lines • ▼ Show 20 Lines	static Instruction *canonicalizeSelectToShuffle(SelectInst &SI) {

return new ShuffleVectorInst(SI.getTrueValue(), SI.getFalseValue(), Mask);		return new ShuffleVectorInst(SI.getTrueValue(), SI.getFalseValue(), Mask);
}		}

/// If we have a select of vectors with a scalar condition, try to convert that		/// If we have a select of vectors with a scalar condition, try to convert that
/// to a vector select by splatting the condition. A splat may get folded with		/// to a vector select by splatting the condition. A splat may get folded with
/// other operations in IR and having all operands of a select be vector types		/// other operations in IR and having all operands of a select be vector types
/// is likely better for vector codegen.		/// is likely better for vector codegen.
static Instruction *canonicalizeScalarSelectOfVecs(		static Instruction *canonicalizeScalarSelectOfVecs(SelectInst &Sel,
SelectInst &Sel, InstCombiner &IC) {		InstCombinerImpl &IC) {
auto *Ty = dyn_cast<VectorType>(Sel.getType());		auto *Ty = dyn_cast<VectorType>(Sel.getType());
if (!Ty)		if (!Ty)
return nullptr;		return nullptr;

// We can replace a single-use extract with constant index.		// We can replace a single-use extract with constant index.
Value *Cond = Sel.getCondition();		Value *Cond = Sel.getCondition();
if (!match(Cond, m_OneUse(m_ExtractElt(m_Value(), m_ConstantInt()))))		if (!match(Cond, m_OneUse(m_ExtractElt(m_Value(), m_ConstantInt()))))
return nullptr;		return nullptr;
▲ Show 20 Lines • Show All 153 Lines • ▼ Show 20 Lines	if (!Overflow) {
ConstantInt::get(X->getType(), *C1));		ConstantInt::get(X->getType(), *C1));
}		}
}		}

return nullptr;		return nullptr;
}		}

/// Match a sadd_sat or ssub_sat which is using min/max to clamp the value.		/// Match a sadd_sat or ssub_sat which is using min/max to clamp the value.
Instruction *InstCombiner::matchSAddSubSat(SelectInst &MinMax1) {		Instruction *InstCombinerImpl::matchSAddSubSat(SelectInst &MinMax1) {
Type *Ty = MinMax1.getType();		Type *Ty = MinMax1.getType();

// We are looking for a tree of:		// We are looking for a tree of:
// max(INT_MIN, min(INT_MAX, add(sext(A), sext(B))))		// max(INT_MIN, min(INT_MAX, add(sext(A), sext(B))))
// Where the min and max could be reversed		// Where the min and max could be reversed
Instruction *MinMax2;		Instruction *MinMax2;
BinaryOperator *AddSub;		BinaryOperator *AddSub;
const APInt *MinValue, *MaxValue;		const APInt *MinValue, *MaxValue;
▲ Show 20 Lines • Show All 179 Lines • ▼ Show 20 Lines	static Instruction *foldSelectToCopysign(SelectInst &Sel,

assert(TC != FC && "Expected equal select arms to simplify");		assert(TC != FC && "Expected equal select arms to simplify");

Value *X;		Value *X;
const APInt *C;		const APInt *C;
bool IsTrueIfSignSet;		bool IsTrueIfSignSet;
ICmpInst::Predicate Pred;		ICmpInst::Predicate Pred;
if (!match(Cond, m_OneUse(m_ICmp(Pred, m_BitCast(m_Value(X)), m_APInt(C)))) ||		if (!match(Cond, m_OneUse(m_ICmp(Pred, m_BitCast(m_Value(X)), m_APInt(C)))) ||
!isSignBitCheck(Pred, *C, IsTrueIfSignSet) || X->getType() != SelType)		!InstCombiner::isSignBitCheck(Pred, *C, IsTrueIfSignSet) ||
		X->getType() != SelType)
return nullptr;		return nullptr;

// If needed, negate the value that will be the sign argument of the copysign:		// If needed, negate the value that will be the sign argument of the copysign:
// (bitcast X) < 0 ? -TC : TC --> copysign(TC, X)		// (bitcast X) < 0 ? -TC : TC --> copysign(TC, X)
// (bitcast X) < 0 ? TC : -TC --> copysign(TC, -X)		// (bitcast X) < 0 ? TC : -TC --> copysign(TC, -X)
// (bitcast X) >= 0 ? -TC : TC --> copysign(TC, -X)		// (bitcast X) >= 0 ? -TC : TC --> copysign(TC, -X)
// (bitcast X) >= 0 ? TC : -TC --> copysign(TC, X)		// (bitcast X) >= 0 ? TC : -TC --> copysign(TC, X)
if (IsTrueIfSignSet ^ TC->isNegative())		if (IsTrueIfSignSet ^ TC->isNegative())
X = Builder.CreateFNegFMF(X, &Sel);		X = Builder.CreateFNegFMF(X, &Sel);

// Canonicalize the magnitude argument as the positive constant since we do		// Canonicalize the magnitude argument as the positive constant since we do
// not care about its sign.		// not care about its sign.
Value *MagArg = TC->isNegative() ? FVal : TVal;		Value *MagArg = TC->isNegative() ? FVal : TVal;
Function *F = Intrinsic::getDeclaration(Sel.getModule(), Intrinsic::copysign,		Function *F = Intrinsic::getDeclaration(Sel.getModule(), Intrinsic::copysign,
Sel.getType());		Sel.getType());
Instruction *CopySign = IntrinsicInst::Create(F, { MagArg, X });		Instruction *CopySign = IntrinsicInst::Create(F, { MagArg, X });
CopySign->setFastMathFlags(Sel.getFastMathFlags());		CopySign->setFastMathFlags(Sel.getFastMathFlags());
return CopySign;		return CopySign;
}		}
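The four-line comment in `foldSelectToCopysign` lists which select shapes map to `copysign(TC, X)` and which to `copysign(TC, -X)`. A minimal standalone sketch of the first two rows follows, assuming `TC` is the positive magnitude and using `std::signbit(x)` as a stand-in for the `(bitcast X) < 0` sign-bit check. The function names are hypothetical and none of this is LLVM API.

```cpp
#include <cassert>
#include <cmath>

// (bitcast X) < 0 ? -TC : TC, with tc assumed positive.
// std::signbit models the integer sign-bit test on the float's bits.
inline float selectNegOnSign(float x, float tc) {
  return std::signbit(x) ? -tc : tc;
}

// Claimed equivalent: copysign(TC, X).
inline float copysignDirect(float x, float tc) {
  return std::copysign(tc, x);
}

// (bitcast X) < 0 ? TC : -TC, with tc assumed positive.
inline float selectPosOnSign(float x, float tc) {
  return std::signbit(x) ? tc : -tc;
}

// Claimed equivalent: copysign(TC, -X). Negating x flips its sign bit,
// including for zero, so -0.0f and 0.0f are distinguished correctly.
inline float copysignNegated(float x, float tc) {
  return std::copysign(tc, -x);
}
```

The remaining two rows of the comment (the `>= 0` forms) are the same pair with the roles exchanged, which matches the `IsTrueIfSignSet ^ TC->isNegative()` test that decides whether `X` is negated.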

Instruction *InstCombiner::foldVectorSelect(SelectInst &Sel) {		Instruction *InstCombinerImpl::foldVectorSelect(SelectInst &Sel) {
auto *VecTy = dyn_cast<FixedVectorType>(Sel.getType());		auto *VecTy = dyn_cast<FixedVectorType>(Sel.getType());
if (!VecTy)		if (!VecTy)
return nullptr;		return nullptr;

unsigned NumElts = VecTy->getNumElements();		unsigned NumElts = VecTy->getNumElements();
APInt UndefElts(NumElts, 0);		APInt UndefElts(NumElts, 0);
APInt AllOnesEltMask(APInt::getAllOnesValue(NumElts));		APInt AllOnesEltMask(APInt::getAllOnesValue(NumElts));
if (Value *V = SimplifyDemandedVectorElts(&Sel, AllOnesEltMask, UndefElts)) {		if (Value *V = SimplifyDemandedVectorElts(&Sel, AllOnesEltMask, UndefElts)) {
▲ Show 20 Lines • Show All 94 Lines • ▼ Show 20 Lines	static Instruction *foldSelectToPhi(SelectInst &Sel, const DominatorTree &DT,
Builder.SetInsertPoint(&*BB->begin());		Builder.SetInsertPoint(&*BB->begin());
auto *PN = Builder.CreatePHI(Sel.getType(), Inputs.size());		auto *PN = Builder.CreatePHI(Sel.getType(), Inputs.size());
for (auto *Pred : predecessors(BB))		for (auto *Pred : predecessors(BB))
PN->addIncoming(Inputs[Pred], Pred);		PN->addIncoming(Inputs[Pred], Pred);
PN->takeName(&Sel);		PN->takeName(&Sel);
return PN;		return PN;
}		}

Instruction *InstCombiner::visitSelectInst(SelectInst &SI) {		Instruction *InstCombinerImpl::visitSelectInst(SelectInst &SI) {
Value *CondVal = SI.getCondition();		Value *CondVal = SI.getCondition();
Value *TrueVal = SI.getTrueValue();		Value *TrueVal = SI.getTrueValue();
Value *FalseVal = SI.getFalseValue();		Value *FalseVal = SI.getFalseValue();
Type *SelType = SI.getType();		Type *SelType = SI.getType();

// FIXME: Remove this workaround when freeze related patches are done.		// FIXME: Remove this workaround when freeze related patches are done.
// For select with undef operand which feeds into an equality comparison,		// For select with undef operand which feeds into an equality comparison,
// don't simplify it so loop unswitch can know the equality comparison		// don't simplify it so loop unswitch can know the equality comparison
▲ Show 20 Lines • Show All 457 Lines • Show Last 20 Lines

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp

Show All 9 Lines
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "InstCombineInternal.h"		#include "InstCombineInternal.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"
using namespace llvm;		using namespace llvm;
using namespace PatternMatch;		using namespace PatternMatch;

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

// Given pattern:		// Given pattern:
// (x shiftopcode Q) shiftopcode K		// (x shiftopcode Q) shiftopcode K
// we should rewrite it as		// we should rewrite it as
// x shiftopcode (Q+K) iff (Q+K) u< bitwidth(x) and		// x shiftopcode (Q+K) iff (Q+K) u< bitwidth(x) and
//		//
// This is valid for any shift, but they must be identical, and we must be		// This is valid for any shift, but they must be identical, and we must be
// careful in case we have (zext(Q)+zext(K)) and look past extensions,		// careful in case we have (zext(Q)+zext(K)) and look past extensions,
// (Q+K) must not overflow or else (Q+K) u< bitwidth(x) is bogus.		// (Q+K) must not overflow or else (Q+K) u< bitwidth(x) is bogus.
//		//
// AnalyzeForSignBitExtraction indicates that we will only analyze whether this		// AnalyzeForSignBitExtraction indicates that we will only analyze whether this
// pattern has any 2 right-shifts that sum to 1 less than original bit width.		// pattern has any 2 right-shifts that sum to 1 less than original bit width.
Value *InstCombiner::reassociateShiftAmtsOfTwoSameDirectionShifts(		Value *InstCombinerImpl::reassociateShiftAmtsOfTwoSameDirectionShifts(
BinaryOperator *Sh0, const SimplifyQuery &SQ,		BinaryOperator *Sh0, const SimplifyQuery &SQ,
bool AnalyzeForSignBitExtraction) {		bool AnalyzeForSignBitExtraction) {
// Look for a shift of some instruction, ignore zext of shift amount if any.		// Look for a shift of some instruction, ignore zext of shift amount if any.
Instruction *Sh0Op0;		Instruction *Sh0Op0;
Value *ShAmt0;		Value *ShAmt0;
if (!match(Sh0,		if (!match(Sh0,
m_Shift(m_Instruction(Sh0Op0), m_ZExtOrSelf(m_Value(ShAmt0)))))		m_Shift(m_Instruction(Sh0Op0), m_ZExtOrSelf(m_Value(ShAmt0)))))
return nullptr;		return nullptr;
[... 312 lines not shown ...]	static Instruction *foldShiftOfShiftedLogic(BinaryOperator &I,

// shift (logic (shift X, C0), Y), C1 -> logic (shift X, C0+C1), (shift Y, C1)		// shift (logic (shift X, C0), Y), C1 -> logic (shift X, C0+C1), (shift Y, C1)
Constant *ShiftSumC = ConstantInt::get(Ty, *C0 + *C1);		Constant *ShiftSumC = ConstantInt::get(Ty, *C0 + *C1);
Value *NewShift1 = Builder.CreateBinOp(ShiftOpcode, X, ShiftSumC);		Value *NewShift1 = Builder.CreateBinOp(ShiftOpcode, X, ShiftSumC);
Value *NewShift2 = Builder.CreateBinOp(ShiftOpcode, Y, I.getOperand(1));		Value *NewShift2 = Builder.CreateBinOp(ShiftOpcode, Y, I.getOperand(1));
return BinaryOperator::Create(LogicInst->getOpcode(), NewShift1, NewShift2);		return BinaryOperator::Create(LogicInst->getOpcode(), NewShift1, NewShift2);
}		}

Instruction *InstCombiner::commonShiftTransforms(BinaryOperator &I) {		Instruction *InstCombinerImpl::commonShiftTransforms(BinaryOperator &I) {
Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);		Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
assert(Op0->getType() == Op1->getType());		assert(Op0->getType() == Op1->getType());

// If the shift amount is a one-use `sext`, we can demote it to `zext`.		// If the shift amount is a one-use `sext`, we can demote it to `zext`.
Value *Y;		Value *Y;
if (match(Op1, m_OneUse(m_SExt(m_Value(Y))))) {		if (match(Op1, m_OneUse(m_SExt(m_Value(Y))))) {
Value *NewExt = Builder.CreateZExt(Y, I.getType(), Op1->getName());		Value *NewExt = Builder.CreateZExt(Y, I.getType(), Op1->getName());
return BinaryOperator::Create(I.getOpcode(), Op0, NewExt);		return BinaryOperator::Create(I.getOpcode(), Op0, NewExt);
[... 43 lines not shown ...]	if (Instruction *Logic = foldShiftOfShiftedLogic(I, Builder))
return Logic;		return Logic;

return nullptr;		return nullptr;
}		}

/// Return true if we can simplify two logical (either left or right) shifts		/// Return true if we can simplify two logical (either left or right) shifts
/// that have constant shift amounts: OuterShift (InnerShift X, C1), C2.		/// that have constant shift amounts: OuterShift (InnerShift X, C1), C2.
static bool canEvaluateShiftedShift(unsigned OuterShAmt, bool IsOuterShl,		static bool canEvaluateShiftedShift(unsigned OuterShAmt, bool IsOuterShl,
Instruction *InnerShift, InstCombiner &IC,		Instruction *InnerShift,
Instruction *CxtI) {		InstCombinerImpl &IC, Instruction *CxtI) {
assert(InnerShift->isLogicalShift() && "Unexpected instruction type");		assert(InnerShift->isLogicalShift() && "Unexpected instruction type");

// We need constant scalar or constant splat shifts.		// We need constant scalar or constant splat shifts.
const APInt *InnerShiftConst;		const APInt *InnerShiftConst;
if (!match(InnerShift->getOperand(1), m_APInt(InnerShiftConst)))		if (!match(InnerShift->getOperand(1), m_APInt(InnerShiftConst)))
return false;		return false;

// Two logical shifts in the same direction:		// Two logical shifts in the same direction:
[... 34 lines not shown ...]
/// used to eliminate extraneous shifting from things like:		/// used to eliminate extraneous shifting from things like:
/// %C = shl i128 %A, 64		/// %C = shl i128 %A, 64
/// %D = shl i128 %B, 96		/// %D = shl i128 %B, 96
/// %E = or i128 %C, %D		/// %E = or i128 %C, %D
/// %F = lshr i128 %E, 64		/// %F = lshr i128 %E, 64
/// where the client will ask if E can be computed shifted right by 64-bits. If		/// where the client will ask if E can be computed shifted right by 64-bits. If
/// this succeeds, getShiftedValue() will be called to produce the value.		/// this succeeds, getShiftedValue() will be called to produce the value.
static bool canEvaluateShifted(Value *V, unsigned NumBits, bool IsLeftShift,		static bool canEvaluateShifted(Value *V, unsigned NumBits, bool IsLeftShift,
InstCombiner &IC, Instruction *CxtI) {		InstCombinerImpl &IC, Instruction *CxtI) {
// We can always evaluate constants shifted.		// We can always evaluate constants shifted.
if (isa<Constant>(V))		if (isa<Constant>(V))
return true;		return true;

Instruction *I = dyn_cast<Instruction>(V);		Instruction *I = dyn_cast<Instruction>(V);
if (!I) return false;		if (!I) return false;

// If this is the opposite shift, we can directly reuse the input of the shift		// If this is the opposite shift, we can directly reuse the input of the shift
[... 119 lines not shown ...]	static Value *foldShiftedShift(BinaryOperator *InnerShift, unsigned OuterShAmt,
// lshr (shl X, C1), C2 --> shl X, C1 - C2		// lshr (shl X, C1), C2 --> shl X, C1 - C2
// shl (lshr X, C1), C2 --> lshr X, C1 - C2		// shl (lshr X, C1), C2 --> lshr X, C1 - C2
return NewInnerShift(InnerShAmt - OuterShAmt);		return NewInnerShift(InnerShAmt - OuterShAmt);
}		}

/// When canEvaluateShifted() returns true for an expression, this function		/// When canEvaluateShifted() returns true for an expression, this function
/// inserts the new computation that produces the shifted value.		/// inserts the new computation that produces the shifted value.
static Value *getShiftedValue(Value *V, unsigned NumBits, bool isLeftShift,		static Value *getShiftedValue(Value *V, unsigned NumBits, bool isLeftShift,
InstCombiner &IC, const DataLayout &DL) {		InstCombinerImpl &IC, const DataLayout &DL) {
// We can always evaluate constants shifted.		// We can always evaluate constants shifted.
if (Constant *C = dyn_cast<Constant>(V)) {		if (Constant *C = dyn_cast<Constant>(V)) {
if (isLeftShift)		if (isLeftShift)
return IC.Builder.CreateShl(C, NumBits);		return IC.Builder.CreateShl(C, NumBits);
else		else
return IC.Builder.CreateLShr(C, NumBits);		return IC.Builder.CreateLShr(C, NumBits);
}		}

Instruction *I = cast<Instruction>(V);		Instruction *I = cast<Instruction>(V);
IC.Worklist.push(I);		IC.addToWorklist(I);

switch (I->getOpcode()) {		switch (I->getOpcode()) {
default: llvm_unreachable("Inconsistency with CanEvaluateShifted");		default: llvm_unreachable("Inconsistency with CanEvaluateShifted");
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor:		case Instruction::Xor:
// Bitwise operators can all arbitrarily be arbitrarily evaluated shifted.		// Bitwise operators can all arbitrarily be arbitrarily evaluated shifted.
I->setOperand(		I->setOperand(
[... 37 lines not shown ...]	case Instruction::Add:
return Shift.getOpcode() == Instruction::Shl;		return Shift.getOpcode() == Instruction::Shl;
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor:		case Instruction::Xor:
case Instruction::And:		case Instruction::And:
return true;		return true;
}		}
}		}

Instruction *InstCombiner::FoldShiftByConstant(Value *Op0, Constant *Op1,		Instruction *InstCombinerImpl::FoldShiftByConstant(Value *Op0, Constant *Op1,
BinaryOperator &I) {		BinaryOperator &I) {
bool isLeftShift = I.getOpcode() == Instruction::Shl;		bool isLeftShift = I.getOpcode() == Instruction::Shl;

const APInt *Op1C;		const APInt *Op1C;
if (!match(Op1, m_APInt(Op1C)))		if (!match(Op1, m_APInt(Op1C)))
return nullptr;		return nullptr;

// See if we can propagate this shift into the input, this covers the trivial		// See if we can propagate this shift into the input, this covers the trivial
// cast of lshr(shl(x,c1),c2) as well as other more complex cases.		// cast of lshr(shl(x,c1),c2) as well as other more complex cases.
[... 225 lines not shown ...]	if (match(Op0, m_Select(m_Value(Cond), m_Value(TrueVal),
return SelectInst::Create(Cond, NewShift, NewOp);		return SelectInst::Create(Cond, NewShift, NewOp);
}		}
}		}
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitShl(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitShl(BinaryOperator &I) {
const SimplifyQuery Q = SQ.getWithInstruction(&I);		const SimplifyQuery Q = SQ.getWithInstruction(&I);

if (Value *V = SimplifyShlInst(I.getOperand(0), I.getOperand(1),		if (Value *V = SimplifyShlInst(I.getOperand(0), I.getOperand(1),
I.hasNoSignedWrap(), I.hasNoUnsignedWrap(), Q))		I.hasNoSignedWrap(), I.hasNoUnsignedWrap(), Q))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
return X;		return X;
[... 105 lines not shown ...]	Instruction *InstCombinerImpl::visitShl(BinaryOperator &I) {
if (match(Op0, m_One()) &&		if (match(Op0, m_One()) &&
match(Op1, m_Sub(m_SpecificInt(BitWidth - 1), m_Value(X))))		match(Op1, m_Sub(m_SpecificInt(BitWidth - 1), m_Value(X))))
return BinaryOperator::CreateLShr(		return BinaryOperator::CreateLShr(
ConstantInt::get(Ty, APInt::getSignMask(BitWidth)), X);		ConstantInt::get(Ty, APInt::getSignMask(BitWidth)), X);

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitLShr(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitLShr(BinaryOperator &I) {
if (Value *V = SimplifyLShrInst(I.getOperand(0), I.getOperand(1), I.isExact(),		if (Value *V = SimplifyLShrInst(I.getOperand(0), I.getOperand(1), I.isExact(),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
return X;		return X;

if (Instruction *R = commonShiftTransforms(I))		if (Instruction *R = commonShiftTransforms(I))
[... 113 lines not shown ...]	if (match(Op0, m_OneUse(m_Shl(m_Value(X), m_Specific(Op1))))) {
Value *Mask = Builder.CreateLShr(AllOnes, Op1);		Value *Mask = Builder.CreateLShr(AllOnes, Op1);
return BinaryOperator::CreateAnd(Mask, X);		return BinaryOperator::CreateAnd(Mask, X);
}		}

return nullptr;		return nullptr;
}		}

Instruction *		Instruction *
InstCombiner::foldVariableSignZeroExtensionOfVariableHighBitExtract(		InstCombinerImpl::foldVariableSignZeroExtensionOfVariableHighBitExtract(
BinaryOperator &OldAShr) {		BinaryOperator &OldAShr) {
assert(OldAShr.getOpcode() == Instruction::AShr &&		assert(OldAShr.getOpcode() == Instruction::AShr &&
"Must be called with arithmetic right-shift instruction only.");		"Must be called with arithmetic right-shift instruction only.");

// Check that constant C is a splat of the element-wise bitwidth of V.		// Check that constant C is a splat of the element-wise bitwidth of V.
auto BitWidthSplat = [](Constant *C, Value *V) {		auto BitWidthSplat = [](Constant *C, Value *V) {
return match(		return match(
C, m_SpecificInt_ICMP(ICmpInst::Predicate::ICMP_EQ,		C, m_SpecificInt_ICMP(ICmpInst::Predicate::ICMP_EQ,
[... 51 lines not shown ...]	InstCombinerImpl::foldVariableSignZeroExtensionOfVariableHighBitExtract(
NewAShr->copyIRFlags(HighBitExtract); // We can preserve 'exact'-ness.		NewAShr->copyIRFlags(HighBitExtract); // We can preserve 'exact'-ness.
if (!HadTrunc)		if (!HadTrunc)
return NewAShr;		return NewAShr;

Builder.Insert(NewAShr);		Builder.Insert(NewAShr);
return TruncInst::CreateTruncOrBitCast(NewAShr, OldAShr.getType());		return TruncInst::CreateTruncOrBitCast(NewAShr, OldAShr.getType());
}		}

Instruction *InstCombiner::visitAShr(BinaryOperator &I) {		Instruction *InstCombinerImpl::visitAShr(BinaryOperator &I) {
if (Value *V = SimplifyAShrInst(I.getOperand(0), I.getOperand(1), I.isExact(),		if (Value *V = SimplifyAShrInst(I.getOperand(0), I.getOperand(1), I.isExact(),
SQ.getWithInstruction(&I)))		SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

if (Instruction *X = foldVectorBinop(I))		if (Instruction *X = foldVectorBinop(I))
return X;		return X;

if (Instruction *R = commonShiftTransforms(I))		if (Instruction *R = commonShiftTransforms(I))
[... 74 lines not shown ...]
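
For readers skimming the shift changes above, the precondition described in the comment on reassociateShiftAmtsOfTwoSameDirectionShifts can be sketched on plain 32-bit integers. This is only an illustration: the helper names below are hypothetical and do not appear in the patch, and the sketch covers logical right shifts only. The point it shows is that the sum Q+K must be formed without wrapping before it is compared against the bit width.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the patch: true when two same-direction
// shift amounts Q and K can be merged into a single shift by Q + K.
constexpr bool canMergeShiftAmounts(std::uint32_t Q, std::uint32_t K,
                                    std::uint32_t BitWidth) {
  // Add in 64 bits so the sum cannot wrap before the range check.
  return std::uint64_t(Q) + std::uint64_t(K) < BitWidth;
}

// Reference form: two logical right shifts applied one after the other.
constexpr std::uint32_t lshrTwice(std::uint32_t X, std::uint32_t Q,
                                  std::uint32_t K) {
  return (X >> Q) >> K;
}

// Merged form: only meaningful when canMergeShiftAmounts(Q, K, 32) holds.
constexpr std::uint32_t lshrMerged(std::uint32_t X, std::uint32_t Q,
                                   std::uint32_t K) {
  return X >> (Q + K);
}

// Compile-time checks of the claims made above.
static_assert(canMergeShiftAmounts(5, 7, 32), "5 + 7 is a valid i32 shift");
static_assert(lshrTwice(0xDEADBEEFu, 5, 7) == lshrMerged(0xDEADBEEFu, 5, 7),
              "merged shift matches the two-step shift");
static_assert(!canMergeShiftAmounts(20, 12, 32), "32 is out of range");
static_assert(!canMergeShiftAmounts(0xFFFFFFFFu, 1, 32),
              "a wrapping sum must be rejected");
```

The last assertion is the case the source comment warns about: if the addition were done in 32 bits, the sum would wrap to 0 and wrongly pass the range check.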

llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp

//===- InstCombineSimplifyDemanded.cpp ------------------------------------===//		//===- InstCombineSimplifyDemanded.cpp ------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file contains logic for simplifying instructions based on information		// This file contains logic for simplifying instructions based on information
// about how they are used.		// about how they are used.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "InstCombineInternal.h"		#include "InstCombineInternal.h"
		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/IntrinsicsX86.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"

using namespace llvm;		using namespace llvm;
using namespace llvm::PatternMatch;		using namespace llvm::PatternMatch;

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

namespace {

struct AMDGPUImageDMaskIntrinsic {
unsigned Intr;
};

#define GET_AMDGPUImageDMaskIntrinsicTable_IMPL
#include "InstCombineTables.inc"

} // end anonymous namespace

/// Check to see if the specified operand of the specified instruction is a		/// Check to see if the specified operand of the specified instruction is a
/// constant integer. If so, check to see if there are any bits set in the		/// constant integer. If so, check to see if there are any bits set in the
/// constant that are not demanded. If so, shrink the constant and return true.		/// constant that are not demanded. If so, shrink the constant and return true.
static bool ShrinkDemandedConstant(Instruction *I, unsigned OpNo,		static bool ShrinkDemandedConstant(Instruction *I, unsigned OpNo,
const APInt &Demanded) {		const APInt &Demanded) {
assert(I && "No instruction?");		assert(I && "No instruction?");
assert(OpNo < I->getNumOperands() && "Operand index too large");		assert(OpNo < I->getNumOperands() && "Operand index too large");

[... 12 lines not shown ...]	static bool ShrinkDemandedConstant(Instruction *I, unsigned OpNo,

return true;		return true;
}		}



/// Inst is an integer instruction that SimplifyDemandedBits knows about. See if		/// Inst is an integer instruction that SimplifyDemandedBits knows about. See if
/// the instruction has any properties that allow us to simplify its operands.		/// the instruction has any properties that allow us to simplify its operands.
bool InstCombiner::SimplifyDemandedInstructionBits(Instruction &Inst) {		bool InstCombinerImpl::SimplifyDemandedInstructionBits(Instruction &Inst) {
unsigned BitWidth = Inst.getType()->getScalarSizeInBits();		unsigned BitWidth = Inst.getType()->getScalarSizeInBits();
KnownBits Known(BitWidth);		KnownBits Known(BitWidth);
APInt DemandedMask(APInt::getAllOnesValue(BitWidth));		APInt DemandedMask(APInt::getAllOnesValue(BitWidth));

Value *V = SimplifyDemandedUseBits(&Inst, DemandedMask, Known,		Value *V = SimplifyDemandedUseBits(&Inst, DemandedMask, Known,
0, &Inst);		0, &Inst);
if (!V) return false;		if (!V) return false;
if (V == &Inst) return true;		if (V == &Inst) return true;
replaceInstUsesWith(Inst, V);		replaceInstUsesWith(Inst, V);
return true;		return true;
}		}

/// This form of SimplifyDemandedBits simplifies the specified instruction		/// This form of SimplifyDemandedBits simplifies the specified instruction
/// operand if possible, updating it in place. It returns true if it made any		/// operand if possible, updating it in place. It returns true if it made any
/// change and false otherwise.		/// change and false otherwise.
bool InstCombiner::SimplifyDemandedBits(Instruction *I, unsigned OpNo,		bool InstCombinerImpl::SimplifyDemandedBits(Instruction *I, unsigned OpNo,
const APInt &DemandedMask,		const APInt &DemandedMask,
KnownBits &Known,		KnownBits &Known, unsigned Depth) {
unsigned Depth) {
Use &U = I->getOperandUse(OpNo);		Use &U = I->getOperandUse(OpNo);
Value *NewVal = SimplifyDemandedUseBits(U.get(), DemandedMask, Known,		Value *NewVal = SimplifyDemandedUseBits(U.get(), DemandedMask, Known,
Depth, I);		Depth, I);
if (!NewVal) return false;		if (!NewVal) return false;
if (Instruction* OpInst = dyn_cast<Instruction>(U))		if (Instruction* OpInst = dyn_cast<Instruction>(U))
salvageDebugInfo(*OpInst);		salvageDebugInfo(*OpInst);

replaceUse(U, NewVal);		replaceUse(U, NewVal);
return true;		return true;
}		}


/// This function attempts to replace V with a simpler value based on the		/// This function attempts to replace V with a simpler value based on the
/// demanded bits. When this function is called, it is known that only the bits		/// demanded bits. When this function is called, it is known that only the bits
/// set in DemandedMask of the result of V are ever used downstream.		/// set in DemandedMask of the result of V are ever used downstream.
/// Consequently, depending on the mask and V, it may be possible to replace V		/// Consequently, depending on the mask and V, it may be possible to replace V
/// with a constant or one of its operands. In such cases, this function does		/// with a constant or one of its operands. In such cases, this function does
/// the replacement and returns true. In all other cases, it returns false after		/// the replacement and returns true. In all other cases, it returns false after
/// analyzing the expression and setting KnownOne and known to be one in the		/// analyzing the expression and setting KnownOne and known to be one in the
/// expression. Known.Zero contains all the bits that are known to be zero in		/// expression. Known.Zero contains all the bits that are known to be zero in
/// the expression. These are provided to potentially allow the caller (which		/// the expression. These are provided to potentially allow the caller (which
/// might recursively be SimplifyDemandedBits itself) to simplify the		/// might recursively be SimplifyDemandedBits itself) to simplify the
/// expression.		/// expression.
/// Known.One and Known.Zero always follow the invariant that:		/// Known.One and Known.Zero always follow the invariant that:
/// Known.One & Known.Zero == 0.		/// Known.One & Known.Zero == 0.
/// That is, a bit can't be both 1 and 0. Note that the bits in Known.One and		/// That is, a bit can't be both 1 and 0. Note that the bits in Known.One and
/// Known.Zero may only be accurate for those bits set in DemandedMask. Note		/// Known.Zero may only be accurate for those bits set in DemandedMask. Note
/// also that the bitwidth of V, DemandedMask, Known.Zero and Known.One must all		/// also that the bitwidth of V, DemandedMask, Known.Zero and Known.One must all
/// be the same.		/// be the same.
///		///
/// This returns null if it did not change anything and it permits no		/// This returns null if it did not change anything and it permits no
/// simplification. This returns V itself if it did some simplification of V's		/// simplification. This returns V itself if it did some simplification of V's
/// operands based on the information about what bits are demanded. This returns		/// operands based on the information about what bits are demanded. This returns
/// some other non-null value if it found out that V is equal to another value		/// some other non-null value if it found out that V is equal to another value
/// in the context where the specified bits are demanded, but not for all users.		/// in the context where the specified bits are demanded, but not for all users.
Value *InstCombiner::SimplifyDemandedUseBits(Value *V, APInt DemandedMask,		Value *InstCombinerImpl::SimplifyDemandedUseBits(Value *V, APInt DemandedMask,
KnownBits &Known, unsigned Depth,		KnownBits &Known,
		unsigned Depth,
Instruction *CxtI) {		Instruction *CxtI) {
assert(V != nullptr && "Null pointer of Value???");		assert(V != nullptr && "Null pointer of Value???");
assert(Depth <= 6 && "Limit Search Depth");		assert(Depth <= 6 && "Limit Search Depth");
uint32_t BitWidth = DemandedMask.getBitWidth();		uint32_t BitWidth = DemandedMask.getBitWidth();
Type *VTy = V->getType();		Type *VTy = V->getType();
assert(		assert(
(!VTy->isIntOrIntVectorTy() || VTy->getScalarSizeInBits() == BitWidth) &&		(!VTy->isIntOrIntVectorTy() || VTy->getScalarSizeInBits() == BitWidth) &&
Known.getBitWidth() == BitWidth &&		Known.getBitWidth() == BitWidth &&
"Value *V, DemandedMask and Known must have same BitWidth");		"Value *V, DemandedMask and Known must have same BitWidth");
[... 591 lines not shown ...]	case Instruction::URem: {
unsigned Leaders = Known2.countMinLeadingZeros();		unsigned Leaders = Known2.countMinLeadingZeros();
Known.Zero = APInt::getHighBitsSet(BitWidth, Leaders) & DemandedMask;		Known.Zero = APInt::getHighBitsSet(BitWidth, Leaders) & DemandedMask;
break;		break;
}		}
case Instruction::Call: {		case Instruction::Call: {
bool KnownBitsComputed = false;		bool KnownBitsComputed = false;
if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {		if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
default: break;
case Intrinsic::bswap: {		case Intrinsic::bswap: {
// If the only bits demanded come from one byte of the bswap result,		// If the only bits demanded come from one byte of the bswap result,
// just shift the input byte into position to eliminate the bswap.		// just shift the input byte into position to eliminate the bswap.
unsigned NLZ = DemandedMask.countLeadingZeros();		unsigned NLZ = DemandedMask.countLeadingZeros();
unsigned NTZ = DemandedMask.countTrailingZeros();		unsigned NTZ = DemandedMask.countTrailingZeros();

// Round NTZ down to the next byte. If we have 11 trailing zeros, then		// Round NTZ down to the next byte. If we have 11 trailing zeros, then
// we need all the bits down to bit 8. Likewise, round NLZ. If we		// we need all the bits down to bit 8. Likewise, round NLZ. If we
[... 39 lines not shown ...]	if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {

Known.Zero = LHSKnown.Zero.shl(ShiftAmt) |		Known.Zero = LHSKnown.Zero.shl(ShiftAmt) |
RHSKnown.Zero.lshr(BitWidth - ShiftAmt);		RHSKnown.Zero.lshr(BitWidth - ShiftAmt);
Known.One = LHSKnown.One.shl(ShiftAmt) |		Known.One = LHSKnown.One.shl(ShiftAmt) |
RHSKnown.One.lshr(BitWidth - ShiftAmt);		RHSKnown.One.lshr(BitWidth - ShiftAmt);
KnownBitsComputed = true;		KnownBitsComputed = true;
break;		break;
}		}
case Intrinsic::x86_mmx_pmovmskb:		default: {
case Intrinsic::x86_sse_movmsk_ps:		Value *V = nullptr;
case Intrinsic::x86_sse2_movmsk_pd:		if (TTI.simplifyDemandedUseBitsIntrinsic(this, II, DemandedMask,
case Intrinsic::x86_sse2_pmovmskb_128:		Known, KnownBitsComputed, &V))
case Intrinsic::x86_avx_movmsk_ps_256:		return V;
case Intrinsic::x86_avx_movmsk_pd_256:
case Intrinsic::x86_avx2_pmovmskb: {
// MOVMSK copies the vector elements' sign bits to the low bits
// and zeros the high bits.
unsigned ArgWidth;
if (II->getIntrinsicID() == Intrinsic::x86_mmx_pmovmskb) {
ArgWidth = 8; // Arg is x86_mmx, but treated as <8 x i8>.
} else {
auto Arg = II->getArgOperand(0);
auto ArgType = cast<VectorType>(Arg->getType());
ArgWidth = ArgType->getNumElements();
}

// If we don't need any of low bits then return zero,
// we know that DemandedMask is non-zero already.
APInt DemandedElts = DemandedMask.zextOrTrunc(ArgWidth);
if (DemandedElts.isNullValue())
return ConstantInt::getNullValue(VTy);

// We know that the upper bits are set to zero.
Known.Zero.setBitsFrom(ArgWidth);
KnownBitsComputed = true;
break;		break;
}		}
case Intrinsic::x86_sse42_crc32_64_64:
Known.Zero.setBitsFrom(32);
KnownBitsComputed = true;
break;
}		}
}		}

if (!KnownBitsComputed)		if (!KnownBitsComputed)
computeKnownBits(V, Known, Depth, CxtI);		computeKnownBits(V, Known, Depth, CxtI);
break;		break;
}		}
}		}

// If the client is only demanding bits that we know, return the known		// If the client is only demanding bits that we know, return the known
// constant.		// constant.
if (DemandedMask.isSubsetOf(Known.Zero|Known.One))		if (DemandedMask.isSubsetOf(Known.Zero|Known.One))
return Constant::getIntegerValue(VTy, Known.One);		return Constant::getIntegerValue(VTy, Known.One);
return nullptr;		return nullptr;
}		}
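
The final check in SimplifyDemandedUseBits (every demanded bit is already known, so the value can be replaced by a constant) can be illustrated outside LLVM. The sketch below uses a simplified 32-bit stand-in for KnownBits; the struct and function names are hypothetical and are not LLVM API.

```cpp
#include <cstdint>

// Simplified stand-in for llvm::KnownBits on 32-bit values (illustrative only).
struct KnownBits32 {
  std::uint32_t Zero; // bits known to be 0
  std::uint32_t One;  // bits known to be 1
};

// Mirrors the idea of DemandedMask.isSubsetOf(Known.Zero | Known.One):
// true when no demanded bit is left unknown.
constexpr bool allDemandedBitsKnown(std::uint32_t Demanded, KnownBits32 Known) {
  return (Demanded & ~(Known.Zero | Known.One)) == 0;
}

// Example value: V = (X & 0x0F) | 0x30.
// Bits 0..3 are unknown, bits 4..5 are known one, bits 6..31 are known zero.
constexpr std::uint32_t exampleValue(std::uint32_t X) {
  return (X & 0x0Fu) | 0x30u;
}
constexpr KnownBits32 ExampleKnown = {0xFFFFFFC0u, 0x00000030u};

// Demanding only bits 4..7: all of them are known, so V can be replaced by
// the constant Known.One for this user.
static_assert(allDemandedBitsKnown(0xF0u, ExampleKnown), "");
static_assert((exampleValue(0x12345678u) & 0xF0u) == (ExampleKnown.One & 0xF0u),
              "");

// Demanding the low byte includes unknown bits 0..3, so no constant applies.
static_assert(!allDemandedBitsKnown(0xFFu, ExampleKnown), "");
```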

/// Helper routine of SimplifyDemandedUseBits. It computes Known		/// Helper routine of SimplifyDemandedUseBits. It computes Known
/// bits. It also tries to handle simplifications that can be done based on		/// bits. It also tries to handle simplifications that can be done based on
/// DemandedMask, but without modifying the Instruction.		/// DemandedMask, but without modifying the Instruction.
Value *InstCombiner::SimplifyMultipleUseDemandedBits(Instruction *I,		Value *InstCombinerImpl::SimplifyMultipleUseDemandedBits(
const APInt &DemandedMask,		Instruction *I, const APInt &DemandedMask, KnownBits &Known, unsigned Depth,
KnownBits &Known,
unsigned Depth,
Instruction *CxtI) {		Instruction *CxtI) {
unsigned BitWidth = DemandedMask.getBitWidth();		unsigned BitWidth = DemandedMask.getBitWidth();
Type *ITy = I->getType();		Type *ITy = I->getType();

KnownBits LHSKnown(BitWidth);		KnownBits LHSKnown(BitWidth);
KnownBits RHSKnown(BitWidth);		KnownBits RHSKnown(BitWidth);

// Despite the fact that we can't simplify this instruction in all User's		// Despite the fact that we can't simplify this instruction in all User's
// context, we can at least compute the known bits, and we can		// context, we can at least compute the known bits, and we can
[... 83 lines not shown ...]	if (DemandedMask.isSubsetOf(Known.Zero|Known.One))
return Constant::getIntegerValue(ITy, Known.One);		return Constant::getIntegerValue(ITy, Known.One);

break;		break;
}		}

return nullptr;		return nullptr;
}		}


/// Helper routine of SimplifyDemandedUseBits. It tries to simplify		/// Helper routine of SimplifyDemandedUseBits. It tries to simplify
/// "E1 = (X lsr C1) << C2", where the C1 and C2 are constant, into		/// "E1 = (X lsr C1) << C2", where the C1 and C2 are constant, into
/// "E2 = X << (C2 - C1)" or "E2 = X >> (C1 - C2)", depending on the sign		/// "E2 = X << (C2 - C1)" or "E2 = X >> (C1 - C2)", depending on the sign
/// of "C2-C1".		/// of "C2-C1".
///		///
/// Suppose E1 and E2 are generally different in bits S={bm, bm+1,		/// Suppose E1 and E2 are generally different in bits S={bm, bm+1,
/// ..., bn}, without considering the specific value X is holding.		/// ..., bn}, without considering the specific value X is holding.
/// This transformation is legal iff one of following conditions is hold:		/// This transformation is legal iff one of following conditions is hold:
/// 1) All the bit in S are 0, in this case E1 == E2.		/// 1) All the bit in S are 0, in this case E1 == E2.
/// 2) We don't care those bits in S, per the input DemandedMask.		/// 2) We don't care those bits in S, per the input DemandedMask.
/// 3) Combination of 1) and 2). Some bits in S are 0, and we don't care the		/// 3) Combination of 1) and 2). Some bits in S are 0, and we don't care the
/// rest bits.		/// rest bits.
///		///
/// Currently we only test condition 2).		/// Currently we only test condition 2).
///		///
/// As with SimplifyDemandedUseBits, it returns NULL if the simplification was		/// As with SimplifyDemandedUseBits, it returns NULL if the simplification was
/// not successful.		/// not successful.
Value *		Value *InstCombinerImpl::simplifyShrShlDemandedBits(
InstCombiner::simplifyShrShlDemandedBits(Instruction *Shr, const APInt &ShrOp1,		Instruction *Shr, const APInt &ShrOp1, Instruction *Shl,
Instruction *Shl, const APInt &ShlOp1,		const APInt &ShlOp1, const APInt &DemandedMask, KnownBits &Known) {
const APInt &DemandedMask,
KnownBits &Known) {
if (!ShlOp1 || !ShrOp1)		if (!ShlOp1 || !ShrOp1)
return nullptr; // No-op.		return nullptr; // No-op.

Value *VarX = Shr->getOperand(0);		Value *VarX = Shr->getOperand(0);
Type *Ty = VarX->getType();		Type *Ty = VarX->getType();
unsigned BitWidth = Ty->getScalarSizeInBits();		unsigned BitWidth = Ty->getScalarSizeInBits();
if (ShlOp1.uge(BitWidth) || ShrOp1.uge(BitWidth))		if (ShlOp1.uge(BitWidth) || ShrOp1.uge(BitWidth))
return nullptr; // Undef.		return nullptr; // Undef.
[... 43 lines not shown ...]	if ((BitMask1 & DemandedMask) == (BitMask2 & DemandedMask)) {
}		}

return InsertNewInstWith(New, *Shl);		return InsertNewInstWith(New, *Shl);
}		}

return nullptr;		return nullptr;
}		}
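
Condition 2) from the comment on simplifyShrShlDemandedBits can be tried out on 32-bit integers. The sketch below is a simplified illustration for the logical-shift case only, with hypothetical names. It compares the set of result bits that can be non-zero for "(X lsr C1) << C2" against that for the single-shift form, restricted to the demanded bits, in the spirit of the BitMask1/BitMask2 comparison visible in the diff context.

```cpp
#include <cstdint>

constexpr std::uint32_t AllOnes = 0xFFFFFFFFu;

// Bits that may be non-zero in E1 = (X lshr C1) << C2.
constexpr std::uint32_t maskShrShl(std::uint32_t C1, std::uint32_t C2) {
  return (AllOnes >> C1) << C2;
}

// Bits that may be non-zero in E2 = X << (C2 - C1) or X lshr (C1 - C2).
constexpr std::uint32_t maskSingleShift(std::uint32_t C1, std::uint32_t C2) {
  return C2 >= C1 ? AllOnes << (C2 - C1) : AllOnes >> (C1 - C2);
}

// Hypothetical check, assuming 0 < C1 < 32 and 0 < C2 < 32: the fold is
// acceptable when both forms agree on every demanded bit.
constexpr bool canFoldLShrShl(std::uint32_t C1, std::uint32_t C2,
                              std::uint32_t Demanded) {
  return (maskShrShl(C1, C2) & Demanded) == (maskSingleShift(C1, C2) & Demanded);
}

// With C1 = 4 and C2 = 8 the two forms differ only in bits 4..7.
static_assert(canFoldLShrShl(4, 8, 0xFFFFFF00u), "bits 4..7 are not demanded");
static_assert(!canFoldLShrShl(4, 8, 0x00000FF0u), "bits 4..7 are demanded");

// Spot check on a concrete value: both forms agree on the demanded bits.
static_assert((((0x12345678u >> 4) << 8) & 0xFFFFFF00u) ==
                  ((0x12345678u << 4) & 0xFFFFFF00u),
              "");
```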

/// Implement SimplifyDemandedVectorElts for amdgcn buffer and image intrinsics.
///
/// Note: This only supports non-TFE/LWE image intrinsic calls; those have
/// struct returns.
Value *InstCombiner::simplifyAMDGCNMemoryIntrinsicDemanded(IntrinsicInst *II,
APInt DemandedElts,
int DMaskIdx) {

// FIXME: Allow v3i16/v3f16 in buffer intrinsics when the types are fully supported.
if (DMaskIdx < 0 &&
II->getType()->getScalarSizeInBits() != 32 &&
DemandedElts.getActiveBits() == 3)
return nullptr;

auto *IIVTy = cast<VectorType>(II->getType());
unsigned VWidth = IIVTy->getNumElements();
if (VWidth == 1)
return nullptr;

IRBuilderBase::InsertPointGuard Guard(Builder);
Builder.SetInsertPoint(II);

// Assume the arguments are unchanged and later override them, if needed.
SmallVector<Value *, 16> Args(II->arg_begin(), II->arg_end());

if (DMaskIdx < 0) {
// Buffer case.

const unsigned ActiveBits = DemandedElts.getActiveBits();
const unsigned UnusedComponentsAtFront = DemandedElts.countTrailingZeros();

// Start assuming the prefix of elements is demanded, but possibly clear
// some other bits if there are trailing zeros (unused components at front)
// and update offset.
DemandedElts = (1 << ActiveBits) - 1;

if (UnusedComponentsAtFront > 0) {
static const unsigned InvalidOffsetIdx = 0xf;

unsigned OffsetIdx;
switch (II->getIntrinsicID()) {
case Intrinsic::amdgcn_raw_buffer_load:
OffsetIdx = 1;
break;
case Intrinsic::amdgcn_s_buffer_load:
// If resulting type is vec3, there is no point in trimming the
// load with updated offset, as the vec3 would most likely be widened to
// vec4 anyway during lowering.
if (ActiveBits == 4 && UnusedComponentsAtFront == 1)
OffsetIdx = InvalidOffsetIdx;
else
OffsetIdx = 1;
break;
case Intrinsic::amdgcn_struct_buffer_load:
OffsetIdx = 2;
break;
default:
// TODO: handle tbuffer* intrinsics.
OffsetIdx = InvalidOffsetIdx;
break;
}

if (OffsetIdx != InvalidOffsetIdx) {
// Clear demanded bits and update the offset.
DemandedElts &= ~((1 << UnusedComponentsAtFront) - 1);
auto *Offset = II->getArgOperand(OffsetIdx);
unsigned SingleComponentSizeInBits =
getDataLayout().getTypeSizeInBits(II->getType()->getScalarType());
unsigned OffsetAdd =
UnusedComponentsAtFront * SingleComponentSizeInBits / 8;
auto *OffsetAddVal = ConstantInt::get(Offset->getType(), OffsetAdd);
Args[OffsetIdx] = Builder.CreateAdd(Offset, OffsetAddVal);
}
}
} else {
// Image case.

ConstantInt *DMask = cast<ConstantInt>(II->getArgOperand(DMaskIdx));
unsigned DMaskVal = DMask->getZExtValue() & 0xf;

// Mask off values that are undefined because the dmask doesn't cover them
DemandedElts &= (1 << countPopulation(DMaskVal)) - 1;

unsigned NewDMaskVal = 0;
unsigned OrigLoadIdx = 0;
for (unsigned SrcIdx = 0; SrcIdx < 4; ++SrcIdx) {
const unsigned Bit = 1 << SrcIdx;
if (!!(DMaskVal & Bit)) {
if (!!DemandedElts[OrigLoadIdx])
NewDMaskVal |= Bit;
OrigLoadIdx++;
}
}

if (DMaskVal != NewDMaskVal)
Args[DMaskIdx] = ConstantInt::get(DMask->getType(), NewDMaskVal);
}

unsigned NewNumElts = DemandedElts.countPopulation();
if (!NewNumElts)
return UndefValue::get(II->getType());

if (NewNumElts >= VWidth && DemandedElts.isMask()) {
if (DMaskIdx >= 0)
II->setArgOperand(DMaskIdx, Args[DMaskIdx]);
return nullptr;
}

// Determine the overload types of the original intrinsic.
auto IID = II->getIntrinsicID();
SmallVector<Intrinsic::IITDescriptor, 16> Table;
getIntrinsicInfoTableEntries(IID, Table);
ArrayRef<Intrinsic::IITDescriptor> TableRef = Table;

// Validate function argument and return types, extracting overloaded types
// along the way.
FunctionType *FTy = II->getCalledFunction()->getFunctionType();
SmallVector<Type *, 6> OverloadTys;
Intrinsic::matchIntrinsicSignature(FTy, TableRef, OverloadTys);

Module *M = II->getParent()->getParent()->getParent();
Type *EltTy = IIVTy->getElementType();
Type *NewTy =
(NewNumElts == 1) ? EltTy : FixedVectorType::get(EltTy, NewNumElts);

OverloadTys[0] = NewTy;
Function *NewIntrin = Intrinsic::getDeclaration(M, IID, OverloadTys);

CallInst *NewCall = Builder.CreateCall(NewIntrin, Args);
NewCall->takeName(II);
NewCall->copyMetadata(*II);

if (NewNumElts == 1) {
return Builder.CreateInsertElement(UndefValue::get(II->getType()), NewCall,
DemandedElts.countTrailingZeros());
}

SmallVector<int, 8> EltMask;
unsigned NewLoadIdx = 0;
for (unsigned OrigLoadIdx = 0; OrigLoadIdx < VWidth; ++OrigLoadIdx) {
if (!!DemandedElts[OrigLoadIdx])
EltMask.push_back(NewLoadIdx++);
else
EltMask.push_back(NewNumElts);
}

Value *Shuffle =
Builder.CreateShuffleVector(NewCall, UndefValue::get(NewTy), EltMask);

return Shuffle;
}
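Annotation: the image case of the removed function above narrows the dmask to the channels that are still demanded, then re-expands the narrower result with a shuffle. Below is a minimal standalone sketch of that bookkeeping, assuming plain integers in place of APInt and at most four result elements. `NarrowedImageLoad` and `narrowImageLoad` are hypothetical names for illustration and are not part of LLVM.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

// Hypothetical result record for this sketch (not an LLVM type).
struct NarrowedImageLoad {
  unsigned NewDMask;            // dmask with undemanded channels cleared
  unsigned NewNumElts;          // elements returned by the narrowed load
  std::vector<int> ShuffleMask; // re-expands the narrow result to VWidth lanes
};

// DMask:    4-bit channel enable mask of the original intrinsic.
// Demanded: bit i is set if element i of the original result is used.
// VWidth:   element count of the original result vector.
inline NarrowedImageLoad narrowImageLoad(unsigned DMask, unsigned Demanded,
                                         unsigned VWidth) {
  assert(VWidth >= 1 && VWidth <= 4 && "sketch models up to 4 elements");
  DMask &= 0xf;

  // Elements past the number of enabled channels are undefined anyway.
  unsigned Loaded = static_cast<unsigned>(std::bitset<4>(DMask).count());
  Demanded &= (1u << Loaded) - 1;

  NarrowedImageLoad R{0, 0, {}};
  unsigned OrigLoadIdx = 0; // index into the original result vector
  for (unsigned SrcIdx = 0; SrcIdx < 4; ++SrcIdx) {
    const unsigned Bit = 1u << SrcIdx;
    if (DMask & Bit) {
      if (Demanded & (1u << OrigLoadIdx))
        R.NewDMask |= Bit; // keep only channels whose element is demanded
      ++OrigLoadIdx;
    }
  }

  R.NewNumElts = static_cast<unsigned>(std::bitset<4>(Demanded).count());

  // Demanded lanes take consecutive elements of the narrow result. Other
  // lanes use index NewNumElts, i.e. lane 0 of the second (undef) operand.
  int NewLoadIdx = 0;
  for (unsigned I = 0; I < VWidth; ++I)
    R.ShuffleMask.push_back(((Demanded >> I) & 1)
                                ? NewLoadIdx++
                                : static_cast<int>(R.NewNumElts));
  return R;
}
```

For example, `narrowImageLoad(0xf, 0x5, 4)` keeps channels 0 and 2. Note that the original code uses an insertelement instead of a shuffle when only one element remains; the sketch does not model that special case.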

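Annotation: the buffer case of the removed function first assumes a demanded prefix, then (when the intrinsic has a usable offset operand) drops unused leading components and advances the offset by their size in bytes. The sketch below models only that arithmetic with plain integers. `TrimmedBufferLoad`, `trimBufferLoad`, and the `CanAdjustOffset` flag are hypothetical; the flag stands in for the `OffsetIdx != InvalidOffsetIdx` check, and the sketch assumes a non-empty demanded set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result record for this sketch (not an LLVM type).
struct TrimmedBufferLoad {
  uint32_t Demanded;  // contiguous run of elements that is still loaded
  unsigned OffsetAdd; // bytes to add to the offset operand
};

// Demanded:        bit i is set if element i of the result is used.
// EltSizeInBits:   size of one scalar component.
// CanAdjustOffset: whether the intrinsic has an offset operand to update.
inline TrimmedBufferLoad trimBufferLoad(uint32_t Demanded,
                                        unsigned EltSizeInBits,
                                        bool CanAdjustOffset) {
  assert(Demanded != 0 && Demanded <= 0xffff && "sketch assumption");

  // Position of the highest demanded element, plus one.
  unsigned ActiveBits = 0;
  for (uint32_t V = Demanded; V != 0; V >>= 1)
    ++ActiveBits;

  // Number of undemanded elements at the front of the vector.
  unsigned UnusedFront = 0;
  while (((Demanded >> UnusedFront) & 1) == 0)
    ++UnusedFront;

  // Start by demanding the whole prefix up to the highest used element.
  TrimmedBufferLoad R{(1u << ActiveBits) - 1, 0};

  if (UnusedFront > 0 && CanAdjustOffset) {
    // Skip the unused leading components and move the offset past them.
    R.Demanded &= ~((1u << UnusedFront) - 1);
    R.OffsetAdd = UnusedFront * EltSizeInBits / 8;
  }
  return R;
}
```

With 32-bit components and elements 1 and 2 demanded, the sketch keeps those two elements and adds 4 bytes to the offset. When the offset cannot be adjusted, the whole prefix stays demanded and the offset is unchanged.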
/// The specified value produces a vector with any number of elements.		/// The specified value produces a vector with any number of elements.
/// This method analyzes which elements of the operand are undef and returns		/// This method analyzes which elements of the operand are undef and returns
/// that information in UndefElts.		/// that information in UndefElts.
///		///
/// DemandedElts contains the set of elements that are actually used by the		/// DemandedElts contains the set of elements that are actually used by the
/// caller, and by default (AllowMultipleUsers equals false) the value is		/// caller, and by default (AllowMultipleUsers equals false) the value is
/// simplified only if it has a single caller. If AllowMultipleUsers is set		/// simplified only if it has a single caller. If AllowMultipleUsers is set
/// to true, DemandedElts refers to the union of sets of elements that are		/// to true, DemandedElts refers to the union of sets of elements that are
/// used by all callers.		/// used by all callers.
///		///
/// If the information about demanded elements can be used to simplify the		/// If the information about demanded elements can be used to simplify the
/// operation, the operation is simplified, then the resultant value is		/// operation, the operation is simplified, then the resultant value is
/// returned. This returns null if no change was made.		/// returned. This returns null if no change was made.
Value *InstCombiner::SimplifyDemandedVectorElts(Value *V, APInt DemandedElts,		Value *InstCombinerImpl::SimplifyDemandedVectorElts(Value *V,
		APInt DemandedElts,
APInt &UndefElts,		APInt &UndefElts,
unsigned Depth,		unsigned Depth,
bool AllowMultipleUsers) {		bool AllowMultipleUsers) {
// Cannot analyze scalable type. The number of vector elements is not a		// Cannot analyze scalable type. The number of vector elements is not a
// compile-time constant.		// compile-time constant.
if (isa<ScalableVectorType>(V->getType()))		if (isa<ScalableVectorType>(V->getType()))
return nullptr;		return nullptr;

unsigned VWidth = cast<FixedVectorType>(V->getType())->getNumElements();		unsigned VWidth = cast<FixedVectorType>(V->getType())->getNumElements();
APInt EltMask(APInt::getAllOnesValue(VWidth));		APInt EltMask(APInt::getAllOnesValue(VWidth));
assert((DemandedElts & ~EltMask) == 0 && "Invalid DemandedElts!");		assert((DemandedElts & ~EltMask) == 0 && "Invalid DemandedElts!");
[90 lines not shown]	auto mayIndexStructType = [](GetElementPtrInst &GEP) {
for (auto I = gep_type_begin(GEP), E = gep_type_end(GEP);		for (auto I = gep_type_begin(GEP), E = gep_type_end(GEP);
I != E; I++)		I != E; I++)
if (I.isStruct())		if (I.isStruct())
return true;;		return true;;
return false;		return false;
};		};
if (mayIndexStructType(cast<GetElementPtrInst>(*I)))		if (mayIndexStructType(cast<GetElementPtrInst>(*I)))
break;		break;

// Conservatively track the demanded elements back through any vector		// Conservatively track the demanded elements back through any vector
// operands we may have. We know there must be at least one, or we		// operands we may have. We know there must be at least one, or we
// wouldn't have a vector result to get here. Note that we intentionally		// wouldn't have a vector result to get here. Note that we intentionally
// merge the undef bits here since gepping with either an undef base or		// merge the undef bits here since gepping with either an undef base or
// index results in undef.		// index results in undef.
for (unsigned i = 0; i < I->getNumOperands(); i++) {		for (unsigned i = 0; i < I->getNumOperands(); i++) {
if (isa<UndefValue>(I->getOperand(i))) {		if (isa<UndefValue>(I->getOperand(i))) {
// If the entire vector is undefined, just return this info.		// If the entire vector is undefined, just return this info.
UndefElts = EltMask;		UndefElts = EltMask;
return nullptr;		return nullptr;
}		}
if (I->getOperand(i)->getType()->isVectorTy()) {		if (I->getOperand(i)->getType()->isVectorTy()) {
APInt UndefEltsOp(VWidth, 0);		APInt UndefEltsOp(VWidth, 0);
[307 lines not shown]	case Intrinsic::masked_load: {
if (CElt->isNullValue())		if (CElt->isNullValue())
DemandedPtrs.clearBit(i);		DemandedPtrs.clearBit(i);
else if (CElt->isAllOnesValue())		else if (CElt->isAllOnesValue())
DemandedPassThrough.clearBit(i);		DemandedPassThrough.clearBit(i);
}		}
if (II->getIntrinsicID() == Intrinsic::masked_gather)		if (II->getIntrinsicID() == Intrinsic::masked_gather)
simplifyAndSetOp(II, 0, DemandedPtrs, UndefElts2);		simplifyAndSetOp(II, 0, DemandedPtrs, UndefElts2);
simplifyAndSetOp(II, 3, DemandedPassThrough, UndefElts3);		simplifyAndSetOp(II, 3, DemandedPassThrough, UndefElts3);

// Output elements are undefined if the element from both sources are.		// Output elements are undefined if the element from both sources are.
// TODO: can strengthen via mask as well.		// TODO: can strengthen via mask as well.
UndefElts = UndefElts2 & UndefElts3;		UndefElts = UndefElts2 & UndefElts3;
break;		break;
}		}
case Intrinsic::x86_xop_vfrcz_ss:
case Intrinsic::x86_xop_vfrcz_sd:
// The instructions for these intrinsics are speced to zero upper bits not
// pass them through like other scalar intrinsics. So we shouldn't just
// use Arg0 if DemandedElts[0] is clear like we do for other intrinsics.
// Instead we should return a zero vector.
if (!DemandedElts[0]) {
Worklist.push(II);
return ConstantAggregateZero::get(II->getType());
}

// Only the lower element is used.
DemandedElts = 1;
simplifyAndSetOp(II, 0, DemandedElts, UndefElts);

// Only the lower element is undefined. The high elements are zero.
UndefElts = UndefElts[0];
break;

// Unary scalar-as-vector operations that work column-wise.
case Intrinsic::x86_sse_rcp_ss:
case Intrinsic::x86_sse_rsqrt_ss:
simplifyAndSetOp(II, 0, DemandedElts, UndefElts);

// If lowest element of a scalar op isn't used then use Arg0.
if (!DemandedElts[0]) {
Worklist.push(II);
return II->getArgOperand(0);
}
// TODO: If only low elt lower SQRT to FSQRT (with rounding/exceptions
// checks).
break;

// Binary scalar-as-vector operations that work column-wise. The high
// elements come from operand 0. The low element is a function of both
// operands.
case Intrinsic::x86_sse_min_ss:
case Intrinsic::x86_sse_max_ss:
case Intrinsic::x86_sse_cmp_ss:
case Intrinsic::x86_sse2_min_sd:
case Intrinsic::x86_sse2_max_sd:
case Intrinsic::x86_sse2_cmp_sd: {
simplifyAndSetOp(II, 0, DemandedElts, UndefElts);

// If lowest element of a scalar op isn't used then use Arg0.
if (!DemandedElts[0]) {
Worklist.push(II);
return II->getArgOperand(0);
}

// Only lower element is used for operand 1.
DemandedElts = 1;
simplifyAndSetOp(II, 1, DemandedElts, UndefElts2);

// Lower element is undefined if both lower elements are undefined.
// Consider things like undef&0. The result is known zero, not undef.
if (!UndefElts2[0])
UndefElts.clearBit(0);

break;
}

// Binary scalar-as-vector operations that work column-wise. The high
// elements come from operand 0 and the low element comes from operand 1.
case Intrinsic::x86_sse41_round_ss:
case Intrinsic::x86_sse41_round_sd: {
// Don't use the low element of operand 0.
APInt DemandedElts2 = DemandedElts;
DemandedElts2.clearBit(0);
simplifyAndSetOp(II, 0, DemandedElts2, UndefElts);

// If lowest element of a scalar op isn't used then use Arg0.
if (!DemandedElts[0]) {
Worklist.push(II);
return II->getArgOperand(0);
}

// Only lower element is used for operand 1.
DemandedElts = 1;
simplifyAndSetOp(II, 1, DemandedElts, UndefElts2);

// Take the high undef elements from operand 0 and take the lower element
// from operand 1.
UndefElts.clearBit(0);
UndefElts |= UndefElts2[0];
break;
}

// Three input scalar-as-vector operations that work column-wise. The high
// elements come from operand 0 and the low element is a function of all
// three inputs.
case Intrinsic::x86_avx512_mask_add_ss_round:
case Intrinsic::x86_avx512_mask_div_ss_round:
case Intrinsic::x86_avx512_mask_mul_ss_round:
case Intrinsic::x86_avx512_mask_sub_ss_round:
case Intrinsic::x86_avx512_mask_max_ss_round:
case Intrinsic::x86_avx512_mask_min_ss_round:
case Intrinsic::x86_avx512_mask_add_sd_round:
case Intrinsic::x86_avx512_mask_div_sd_round:
case Intrinsic::x86_avx512_mask_mul_sd_round:
case Intrinsic::x86_avx512_mask_sub_sd_round:
case Intrinsic::x86_avx512_mask_max_sd_round:
case Intrinsic::x86_avx512_mask_min_sd_round:
simplifyAndSetOp(II, 0, DemandedElts, UndefElts);

// If lowest element of a scalar op isn't used then use Arg0.
if (!DemandedElts[0]) {
Worklist.push(II);
return II->getArgOperand(0);
}

// Only lower element is used for operand 1 and 2.
DemandedElts = 1;
simplifyAndSetOp(II, 1, DemandedElts, UndefElts2);
simplifyAndSetOp(II, 2, DemandedElts, UndefElts3);

// Lower element is undefined if all three lower elements are undefined.
// Consider things like undef&0. The result is known zero, not undef.
if (!UndefElts2[0] || !UndefElts3[0])
UndefElts.clearBit(0);

break;

case Intrinsic::x86_sse2_packssdw_128:
case Intrinsic::x86_sse2_packsswb_128:
case Intrinsic::x86_sse2_packuswb_128:
case Intrinsic::x86_sse41_packusdw:
case Intrinsic::x86_avx2_packssdw:
case Intrinsic::x86_avx2_packsswb:
case Intrinsic::x86_avx2_packusdw:
case Intrinsic::x86_avx2_packuswb:
case Intrinsic::x86_avx512_packssdw_512:
case Intrinsic::x86_avx512_packsswb_512:
case Intrinsic::x86_avx512_packusdw_512:
case Intrinsic::x86_avx512_packuswb_512: {
auto *Ty0 = II->getArgOperand(0)->getType();
unsigned InnerVWidth = cast<VectorType>(Ty0)->getNumElements();
assert(VWidth == (InnerVWidth * 2) && "Unexpected input size");

unsigned NumLanes = Ty0->getPrimitiveSizeInBits() / 128;
unsigned VWidthPerLane = VWidth / NumLanes;
unsigned InnerVWidthPerLane = InnerVWidth / NumLanes;

// Per lane, pack the elements of the first input and then the second.
// e.g.
// v8i16 PACK(v4i32 X, v4i32 Y) - (X[0..3],Y[0..3])
// v32i8 PACK(v16i16 X, v16i16 Y) - (X[0..7],Y[0..7]),(X[8..15],Y[8..15])
for (int OpNum = 0; OpNum != 2; ++OpNum) {
APInt OpDemandedElts(InnerVWidth, 0);
for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
unsigned LaneIdx = Lane * VWidthPerLane;
for (unsigned Elt = 0; Elt != InnerVWidthPerLane; ++Elt) {
unsigned Idx = LaneIdx + Elt + InnerVWidthPerLane * OpNum;
if (DemandedElts[Idx])
OpDemandedElts.setBit((Lane * InnerVWidthPerLane) + Elt);
}
}

// Demand elements from the operand.
APInt OpUndefElts(InnerVWidth, 0);
simplifyAndSetOp(II, OpNum, OpDemandedElts, OpUndefElts);

// Pack the operand's UNDEF elements, one lane at a time.
OpUndefElts = OpUndefElts.zext(VWidth);
for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
APInt LaneElts = OpUndefElts.lshr(InnerVWidthPerLane * Lane);
LaneElts = LaneElts.getLoBits(InnerVWidthPerLane);
LaneElts <<= InnerVWidthPerLane * (2 * Lane + OpNum);
UndefElts |= LaneElts;
}
}
break;
}

// PSHUFB
case Intrinsic::x86_ssse3_pshuf_b_128:
case Intrinsic::x86_avx2_pshuf_b:
case Intrinsic::x86_avx512_pshuf_b_512:
// PERMILVAR
case Intrinsic::x86_avx_vpermilvar_ps:
case Intrinsic::x86_avx_vpermilvar_ps_256:
case Intrinsic::x86_avx512_vpermilvar_ps_512:
case Intrinsic::x86_avx_vpermilvar_pd:
case Intrinsic::x86_avx_vpermilvar_pd_256:
case Intrinsic::x86_avx512_vpermilvar_pd_512:
// PERMV
case Intrinsic::x86_avx2_permd:
case Intrinsic::x86_avx2_permps: {
simplifyAndSetOp(II, 1, DemandedElts, UndefElts);
break;
}

// SSE4A instructions leave the upper 64-bits of the 128-bit result
// in an undefined state.
case Intrinsic::x86_sse4a_extrq:
case Intrinsic::x86_sse4a_extrqi:
case Intrinsic::x86_sse4a_insertq:
case Intrinsic::x86_sse4a_insertqi:
UndefElts.setHighBits(VWidth / 2);
break;
case Intrinsic::amdgcn_buffer_load:
case Intrinsic::amdgcn_buffer_load_format:
case Intrinsic::amdgcn_raw_buffer_load:
case Intrinsic::amdgcn_raw_buffer_load_format:
case Intrinsic::amdgcn_raw_tbuffer_load:
case Intrinsic::amdgcn_s_buffer_load:
case Intrinsic::amdgcn_struct_buffer_load:
case Intrinsic::amdgcn_struct_buffer_load_format:
case Intrinsic::amdgcn_struct_tbuffer_load:
case Intrinsic::amdgcn_tbuffer_load:
return simplifyAMDGCNMemoryIntrinsicDemanded(II, DemandedElts);
default: {		default: {
if (getAMDGPUImageDMaskIntrinsic(II->getIntrinsicID()))		Value *V = nullptr;
return simplifyAMDGCNMemoryIntrinsicDemanded(II, DemandedElts, 0);		if (TTI.simplifyDemandedVectorEltsIntrinsic(
		this, II, DemandedElts, UndefElts, UndefElts2, UndefElts3,
		simplifyAndSetOp, &V))
		efriedma (inline comment): Is there some way we can check that an intrinsic is actually target-specific, to discourage people from handling generic intrinsics in target-specific ways?
		foad (inline comment): That was the intent of @bogner's rG92a8c6112c6571112e8b622bfddc7e4d1685a6fe.
		return V;
break;		break;
}		}
} // switch on IntrinsicID		} // switch on IntrinsicID
break;		break;
} // case Call		} // case Call
} // switch on Opcode		} // switch on Opcode

// TODO: We bail completely on integer div/rem and shifts because they have		// TODO: We bail completely on integer div/rem and shifts because they have
[24 lines not shown]
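Annotation: the PACK cases in the removed x86 code above map demanded result elements back onto each operand one 128-bit lane at a time. Here is a minimal standalone sketch of that index mapping, assuming a 64-bit integer in place of APInt. `packOperandDemanded` is a hypothetical helper name and is not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Returns the demanded-element mask for operand OpNum (0 or 1) of a PACK.
// Demanded: bit i is set if element i of the packed result is used.
// VWidth:   element count of the result (twice that of each operand).
// NumLanes: number of 128-bit lanes in the vectors.
inline uint64_t packOperandDemanded(uint64_t Demanded, unsigned VWidth,
                                    unsigned NumLanes, unsigned OpNum) {
  assert(OpNum < 2 && NumLanes != 0 && VWidth <= 64 &&
         VWidth % (2 * NumLanes) == 0 && "sketch assumption");

  const unsigned InnerVWidth = VWidth / 2;
  const unsigned VWidthPerLane = VWidth / NumLanes;
  const unsigned InnerVWidthPerLane = InnerVWidth / NumLanes;

  uint64_t OpDemanded = 0;
  for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
    const unsigned LaneIdx = Lane * VWidthPerLane;
    for (unsigned Elt = 0; Elt != InnerVWidthPerLane; ++Elt) {
      // Within a result lane, operand 0 fills the low half and operand 1
      // fills the high half.
      const unsigned Idx = LaneIdx + Elt + InnerVWidthPerLane * OpNum;
      if ((Demanded >> Idx) & 1)
        OpDemanded |= uint64_t(1) << (Lane * InnerVWidthPerLane + Elt);
    }
  }
  return OpDemanded;
}
```

For a single-lane `v8i16 PACK(v4i32 X, v4i32 Y)`, demanding result elements 0, 4 and 5 demands X[0] from operand 0 and Y[0..1] from operand 1.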

llvm/lib/Transforms/InstCombine/InstCombineTables.td

This file was moved to llvm/lib/Target/AMDGPU/InstCombineTables.td.

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

[29 lines not shown]
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/User.h"		#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Transforms/InstCombine/InstCombineWorklist.h"		#include "llvm/Transforms/InstCombine/InstCombineWorklist.h"
		#include "llvm/Transforms/InstCombine/InstCombiner.h"
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <iterator>		#include <iterator>
#include <utility>		#include <utility>

using namespace llvm;		using namespace llvm;
using namespace PatternMatch;		using namespace PatternMatch;

[34 lines not shown]	if (cheapToScalarize(V0, IsConstantExtractIndex) ||
return true;		return true;

return false;		return false;
}		}

// If we have a PHI node with a vector type that is only used to feed		// If we have a PHI node with a vector type that is only used to feed
// itself and be an operand of extractelement at a constant location,		// itself and be an operand of extractelement at a constant location,
// try to replace the PHI of the vector type with a PHI of a scalar type.		// try to replace the PHI of the vector type with a PHI of a scalar type.
Instruction *InstCombiner::scalarizePHI(ExtractElementInst &EI, PHINode *PN) {		Instruction *InstCombinerImpl::scalarizePHI(ExtractElementInst &EI,
		PHINode *PN) {
SmallVector<Instruction *, 2> Extracts;		SmallVector<Instruction *, 2> Extracts;
// The users we want the PHI to have are:		// The users we want the PHI to have are:
// 1) The EI ExtractElement (we already know this)		// 1) The EI ExtractElement (we already know this)
// 2) Possibly more ExtractElements with the same index.		// 2) Possibly more ExtractElements with the same index.
// 3) Another operand, which will feed back into the PHI.		// 3) Another operand, which will feed back into the PHI.
Instruction *PHIUser = nullptr;		Instruction *PHIUser = nullptr;
for (auto U : PN->users()) {		for (auto U : PN->users()) {
if (ExtractElementInst *EU = dyn_cast<ExtractElementInst>(U)) {		if (ExtractElementInst *EU = dyn_cast<ExtractElementInst>(U)) {
[219 lines not shown]	for (const Use &U : V->uses()) {

if (UnionUsedElts.isAllOnesValue())		if (UnionUsedElts.isAllOnesValue())
break;		break;
}		}

return UnionUsedElts;		return UnionUsedElts;
}		}
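Annotation: only the tail of this helper is visible in the diff. It accumulates the elements used across all users and stops early once every element is demanded. A minimal sketch of that accumulation follows, assuming the per-user demanded masks have already been computed and fit in 32 bits. `unionDemandedElts` is a hypothetical name and is not the LLVM function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// PerUser: one demanded-element bitmask per user of the vector value.
// VWidth:  number of elements in the vector.
inline uint32_t unionDemandedElts(const std::vector<uint32_t> &PerUser,
                                  unsigned VWidth) {
  assert(VWidth >= 1 && VWidth <= 32 && "sketch assumption");
  const uint32_t AllOnes = (VWidth == 32) ? ~0u : (1u << VWidth) - 1;

  uint32_t UnionUsedElts = 0;
  for (uint32_t Used : PerUser) {
    UnionUsedElts |= Used & AllOnes;
    // Once every element is demanded, further users cannot add anything.
    if (UnionUsedElts == AllOnes)
      break;
  }
  return UnionUsedElts;
}
```

This union is what `DemandedElts` refers to when `AllowMultipleUsers` is true, as described in the doc comment of `SimplifyDemandedVectorElts`.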

Instruction *InstCombiner::visitExtractElementInst(ExtractElementInst &EI) {		Instruction *InstCombinerImpl::visitExtractElementInst(ExtractElementInst &EI) {
Value *SrcVec = EI.getVectorOperand();		Value *SrcVec = EI.getVectorOperand();
Value *Index = EI.getIndexOperand();		Value *Index = EI.getIndexOperand();
if (Value *V = SimplifyExtractElementInst(SrcVec, Index,		if (Value *V = SimplifyExtractElementInst(SrcVec, Index,
SQ.getWithInstruction(&EI)))		SQ.getWithInstruction(&EI)))
return replaceInstUsesWith(EI, V);		return replaceInstUsesWith(EI, V);

// If extracting a specified index from the vector, see if we can recursively		// If extracting a specified index from the vector, see if we can recursively
// find a previously computed scalar that was inserted into the vector.		// find a previously computed scalar that was inserted into the vector.
[193 lines not shown]	static bool collectSingleShuffleElements(Value *V, Value *LHS, Value *RHS,
return false;		return false;
}		}

/// If we have insertion into a vector that is wider than the vector that we		/// If we have insertion into a vector that is wider than the vector that we
/// are extracting from, try to widen the source vector to allow a single		/// are extracting from, try to widen the source vector to allow a single
/// shufflevector to replace one or more insert/extract pairs.		/// shufflevector to replace one or more insert/extract pairs.
static void replaceExtractElements(InsertElementInst *InsElt,		static void replaceExtractElements(InsertElementInst *InsElt,
ExtractElementInst *ExtElt,		ExtractElementInst *ExtElt,
InstCombiner &IC) {		InstCombinerImpl &IC) {
VectorType *InsVecType = InsElt->getType();		VectorType *InsVecType = InsElt->getType();
VectorType *ExtVecType = ExtElt->getVectorOperandType();		VectorType *ExtVecType = ExtElt->getVectorOperandType();
unsigned NumInsElts = InsVecType->getNumElements();		unsigned NumInsElts = InsVecType->getNumElements();
unsigned NumExtElts = ExtVecType->getNumElements();		unsigned NumExtElts = ExtVecType->getNumElements();

// The inserted-to vector must be wider than the extracted-from vector.		// The inserted-to vector must be wider than the extracted-from vector.
if (InsVecType->getElementType() != ExtVecType->getElementType() ||		if (InsVecType->getElementType() != ExtVecType->getElementType() ||
NumExtElts >= NumInsElts)		NumExtElts >= NumInsElts)
[66 lines not shown]
/// parameter as required.		/// parameter as required.
///		///
/// Note: we intentionally don't try to fold earlier shuffles since they have		/// Note: we intentionally don't try to fold earlier shuffles since they have
/// often been chosen carefully to be efficiently implementable on the target.		/// often been chosen carefully to be efficiently implementable on the target.
using ShuffleOps = std::pair<Value *, Value *>;		using ShuffleOps = std::pair<Value *, Value *>;

static ShuffleOps collectShuffleElements(Value *V, SmallVectorImpl<int> &Mask,		static ShuffleOps collectShuffleElements(Value *V, SmallVectorImpl<int> &Mask,
Value *PermittedRHS,		Value *PermittedRHS,
InstCombiner &IC) {		InstCombinerImpl &IC) {
assert(V->getType()->isVectorTy() && "Invalid shuffle!");		assert(V->getType()->isVectorTy() && "Invalid shuffle!");
unsigned NumElts = cast<FixedVectorType>(V->getType())->getNumElements();		unsigned NumElts = cast<FixedVectorType>(V->getType())->getNumElements();

if (isa<UndefValue>(V)) {		if (isa<UndefValue>(V)) {
Mask.assign(NumElts, -1);		Mask.assign(NumElts, -1);
return std::make_pair(		return std::make_pair(
PermittedRHS ? UndefValue::get(PermittedRHS->getType()) : V, nullptr);		PermittedRHS ? UndefValue::get(PermittedRHS->getType()) : V, nullptr);
}		}
[68 lines not shown]

/// Try to find redundant insertvalue instructions, like the following ones:		/// Try to find redundant insertvalue instructions, like the following ones:
/// %0 = insertvalue { i8, i32 } undef, i8 %x, 0		/// %0 = insertvalue { i8, i32 } undef, i8 %x, 0
/// %1 = insertvalue { i8, i32 } %0, i8 %y, 0		/// %1 = insertvalue { i8, i32 } %0, i8 %y, 0
/// Here the second instruction inserts values at the same indices, as the		/// Here the second instruction inserts values at the same indices, as the
/// first one, making the first one redundant.		/// first one, making the first one redundant.
/// It should be transformed to:		/// It should be transformed to:
/// %0 = insertvalue { i8, i32 } undef, i8 %y, 0		/// %0 = insertvalue { i8, i32 } undef, i8 %y, 0
Instruction *InstCombiner::visitInsertValueInst(InsertValueInst &I) {		Instruction *InstCombinerImpl::visitInsertValueInst(InsertValueInst &I) {
bool IsRedundant = false;		bool IsRedundant = false;
ArrayRef<unsigned int> FirstIndices = I.getIndices();		ArrayRef<unsigned int> FirstIndices = I.getIndices();

// If there is a chain of insertvalue instructions (each of them except the		// If there is a chain of insertvalue instructions (each of them except the
// last one has only one use and it's another insertvalue insn from this		// last one has only one use and it's another insertvalue insn from this
// chain), check if any of the 'children' uses the same indices as the first		// chain), check if any of the 'children' uses the same indices as the first
// instruction. In this case, the first one is redundant.		// instruction. In this case, the first one is redundant.
Value *V = &I;		Value *V = &I;
[325 lines not shown]	if (auto *Shuf = dyn_cast<ShuffleVectorInst>(InsElt.getOperand(0))) {
// Create new operands for a shuffle that includes the constant of the		// Create new operands for a shuffle that includes the constant of the
// original insertelt.		// original insertelt.
return new ShuffleVectorInst(IEI->getOperand(0),		return new ShuffleVectorInst(IEI->getOperand(0),
ConstantVector::get(Values), Mask);		ConstantVector::get(Values), Mask);
}		}
return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitInsertElementInst(InsertElementInst &IE) {		Instruction *InstCombinerImpl::visitInsertElementInst(InsertElementInst &IE) {
Value *VecOp = IE.getOperand(0);		Value *VecOp = IE.getOperand(0);
Value *ScalarOp = IE.getOperand(1);		Value *ScalarOp = IE.getOperand(1);
Value *IdxOp = IE.getOperand(2);		Value *IdxOp = IE.getOperand(2);

if (auto *V = SimplifyInsertElementInst(		if (auto *V = SimplifyInsertElementInst(
VecOp, ScalarOp, IdxOp, SQ.getWithInstruction(&IE)))		VecOp, ScalarOp, IdxOp, SQ.getWithInstruction(&IE)))
return replaceInstUsesWith(IE, V);		return replaceInstUsesWith(IE, V);

[467 lines not shown]	static Instruction *foldSelectShuffleWith1Binop(ShuffleVectorInst &Shuf) {
ArrayRef<int> Mask = Shuf.getShuffleMask();		ArrayRef<int> Mask = Shuf.getShuffleMask();
Constant *NewC = Op0IsBinop ? ConstantExpr::getShuffleVector(C, IdC, Mask) :		Constant *NewC = Op0IsBinop ? ConstantExpr::getShuffleVector(C, IdC, Mask) :
ConstantExpr::getShuffleVector(IdC, C, Mask);		ConstantExpr::getShuffleVector(IdC, C, Mask);

bool MightCreatePoisonOrUB =		bool MightCreatePoisonOrUB =
is_contained(Mask, UndefMaskElem) &&		is_contained(Mask, UndefMaskElem) &&
(Instruction::isIntDivRem(BOpcode) || Instruction::isShift(BOpcode));		(Instruction::isIntDivRem(BOpcode) || Instruction::isShift(BOpcode));
if (MightCreatePoisonOrUB)		if (MightCreatePoisonOrUB)
NewC = getSafeVectorConstantForBinop(BOpcode, NewC, true);		NewC = InstCombiner::getSafeVectorConstantForBinop(BOpcode, NewC, true);

// shuf (bop X, C), X, M --> bop X, C'		// shuf (bop X, C), X, M --> bop X, C'
// shuf X, (bop X, C), M --> bop X, C'		// shuf X, (bop X, C), M --> bop X, C'
Value *X = Op0IsBinop ? Op1 : Op0;		Value *X = Op0IsBinop ? Op1 : Op0;
Instruction *NewBO = BinaryOperator::Create(BOpcode, X, NewC);		Instruction *NewBO = BinaryOperator::Create(BOpcode, X, NewC);
NewBO->copyIRFlags(BO);		NewBO->copyIRFlags(BO);

// An undef shuffle mask element may propagate as an undef constant element in		// An undef shuffle mask element may propagate as an undef constant element in
[110 lines not shown]	static Instruction *foldSelectShuffle(ShuffleVectorInst &Shuf,

// We are moving a binop after a shuffle. When a shuffle has an undefined		// We are moving a binop after a shuffle. When a shuffle has an undefined
// mask element, the result is undefined, but it is not poison or undefined		// mask element, the result is undefined, but it is not poison or undefined
// behavior. That is not necessarily true for div/rem/shift.		// behavior. That is not necessarily true for div/rem/shift.
bool MightCreatePoisonOrUB =		bool MightCreatePoisonOrUB =
is_contained(Mask, UndefMaskElem) &&		is_contained(Mask, UndefMaskElem) &&
(Instruction::isIntDivRem(BOpc) || Instruction::isShift(BOpc));		(Instruction::isIntDivRem(BOpc) || Instruction::isShift(BOpc));
if (MightCreatePoisonOrUB)		if (MightCreatePoisonOrUB)
NewC = getSafeVectorConstantForBinop(BOpc, NewC, ConstantsAreOp1);		NewC = InstCombiner::getSafeVectorConstantForBinop(BOpc, NewC,
		ConstantsAreOp1);

Value *V;		Value *V;
if (X == Y) {		if (X == Y) {
// Remove a binop and the shuffle by rearranging the constant:		// Remove a binop and the shuffle by rearranging the constant:
// shuffle (op V, C0), (op V, C1), M --> op V, C'		// shuffle (op V, C0), (op V, C1), M --> op V, C'
// shuffle (op C0, V), (op C1, V), M --> op C', V		// shuffle (op C0, V), (op C1, V), M --> op C', V
V = X;		V = X;
} else {		} else {
[154 lines not shown]	for (unsigned i = 0; i != NumElts; ++i) {
NewMask[i] = ExtractMaskElt == UndefMaskElem ? ExtractMaskElt : MaskElt;		NewMask[i] = ExtractMaskElt == UndefMaskElem ? ExtractMaskElt : MaskElt;
}		}
return new ShuffleVectorInst(X, Y, NewMask);		return new ShuffleVectorInst(X, Y, NewMask);
}		}

/// Try to replace a shuffle with an insertelement or try to replace a shuffle		/// Try to replace a shuffle with an insertelement or try to replace a shuffle
/// operand with the operand of an insertelement.		/// operand with the operand of an insertelement.
static Instruction *foldShuffleWithInsert(ShuffleVectorInst &Shuf,		static Instruction *foldShuffleWithInsert(ShuffleVectorInst &Shuf,
InstCombiner &IC) {		InstCombinerImpl &IC) {
Value *V0 = Shuf.getOperand(0), *V1 = Shuf.getOperand(1);		Value *V0 = Shuf.getOperand(0), *V1 = Shuf.getOperand(1);
SmallVector<int, 16> Mask;		SmallVector<int, 16> Mask;
Shuf.getShuffleMask(Mask);		Shuf.getShuffleMask(Mask);

// The shuffle must not change vector sizes.		// The shuffle must not change vector sizes.
// TODO: This restriction could be removed if the insert has only one use		// TODO: This restriction could be removed if the insert has only one use
// (because the transform would require a new length-changing shuffle).		// (because the transform would require a new length-changing shuffle).
int NumElts = Mask.size();		int NumElts = Mask.size();
[134 lines not shown]	for (int i = 0, e = Mask.size(); i != e; ++i) {
} else {		} else {
assert(Mask[i] < (WideElts + NarrowElts) && "Unexpected shuffle mask");		assert(Mask[i] < (WideElts + NarrowElts) && "Unexpected shuffle mask");
NewMask[i] = Mask[i] - (WideElts - NarrowElts);		NewMask[i] = Mask[i] - (WideElts - NarrowElts);
}		}
}		}
return new ShuffleVectorInst(X, Y, NewMask);		return new ShuffleVectorInst(X, Y, NewMask);
}		}

Instruction *InstCombiner::visitShuffleVectorInst(ShuffleVectorInst &SVI) {		Instruction *InstCombinerImpl::visitShuffleVectorInst(ShuffleVectorInst &SVI) {
Value *LHS = SVI.getOperand(0);		Value *LHS = SVI.getOperand(0);
Value *RHS = SVI.getOperand(1);		Value *RHS = SVI.getOperand(1);
SimplifyQuery ShufQuery = SQ.getWithInstruction(&SVI);		SimplifyQuery ShufQuery = SQ.getWithInstruction(&SVI);
if (auto *V = SimplifyShuffleVectorInst(LHS, RHS, SVI.getShuffleMask(),		if (auto *V = SimplifyShuffleVectorInst(LHS, RHS, SVI.getShuffleMask(),
SVI.getType(), ShufQuery))		SVI.getType(), ShufQuery))
return replaceInstUsesWith(SVI, V);		return replaceInstUsesWith(SVI, V);

// shuffle x, x, mask --> shuffle x, undef, mask'		// shuffle x, x, mask --> shuffle x, undef, mask'
[363 lines not shown]

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

[53 lines not shown]
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/LazyBlockFrequencyInfo.h"		#include "llvm/Analysis/LazyBlockFrequencyInfo.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/MemoryBuiltins.h"		#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"		#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/Analysis/TargetFolder.h"		#include "llvm/Analysis/TargetFolder.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Analysis/VectorUtils.h"		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
[80 lines not shown]
// increases variable availability at the cost of accuracy. Variables that		// increases variable availability at the cost of accuracy. Variables that
// cannot be promoted by mem2reg or SROA will be described as living in memory		// cannot be promoted by mem2reg or SROA will be described as living in memory
// for their entire lifetime. However, passes like DSE and instcombine can		// for their entire lifetime. However, passes like DSE and instcombine can
// delete stores to the alloca, leading to misleading and inaccurate debug		// delete stores to the alloca, leading to misleading and inaccurate debug
// information. This flag can be removed when those passes are fixed.		// information. This flag can be removed when those passes are fixed.
static cl::opt<unsigned> ShouldLowerDbgDeclare("instcombine-lower-dbg-declare",		static cl::opt<unsigned> ShouldLowerDbgDeclare("instcombine-lower-dbg-declare",
cl::Hidden, cl::init(true));		cl::Hidden, cl::init(true));

Value *InstCombiner::EmitGEPOffset(User *GEP) {		Value *InstCombinerImpl::EmitGEPOffset(User *GEP) {
return llvm::EmitGEPOffset(&Builder, DL, GEP);		return llvm::EmitGEPOffset(&Builder, DL, GEP);
}		}

/// Return true if it is desirable to convert an integer computation from a		/// Return true if it is desirable to convert an integer computation from a
/// given bit width to a new bit width.		/// given bit width to a new bit width.
/// We don't want to convert from a legal to an illegal type or from a smaller		/// We don't want to convert from a legal to an illegal type or from a smaller
/// to a larger illegal type. A width of '1' is always treated as a legal type		/// to a larger illegal type. A width of '1' is always treated as a legal type
/// because i1 is a fundamental type in IR, and there are many specialized		/// because i1 is a fundamental type in IR, and there are many specialized
/// optimizations for i1 types. Widths of 8, 16 or 32 are equally treated as		/// optimizations for i1 types. Widths of 8, 16 or 32 are equally treated as
/// legal to convert to, in order to open up more combining opportunities.		/// legal to convert to, in order to open up more combining opportunities.
/// NOTE: this treats i8, i16 and i32 specially, due to them being so common		/// NOTE: this treats i8, i16 and i32 specially, due to them being so common
/// from frontend languages.		/// from frontend languages.
bool InstCombiner::shouldChangeType(unsigned FromWidth,		bool InstCombinerImpl::shouldChangeType(unsigned FromWidth,
unsigned ToWidth) const {		unsigned ToWidth) const {
bool FromLegal = FromWidth == 1 || DL.isLegalInteger(FromWidth);		bool FromLegal = FromWidth == 1 || DL.isLegalInteger(FromWidth);
bool ToLegal = ToWidth == 1 || DL.isLegalInteger(ToWidth);		bool ToLegal = ToWidth == 1 || DL.isLegalInteger(ToWidth);

// Convert to widths of 8, 16 or 32 even if they are not legal types. Only		// Convert to widths of 8, 16 or 32 even if they are not legal types. Only
// shrink types, to prevent infinite loops.		// shrink types, to prevent infinite loops.
if (ToWidth < FromWidth && (ToWidth == 8 || ToWidth == 16 || ToWidth == 32))		if (ToWidth < FromWidth && (ToWidth == 8 || ToWidth == 16 || ToWidth == 32))
return true;		return true;

Show All 10 Lines	bool InstCombinerImpl::shouldChangeType(unsigned FromWidth,
return true;		return true;
}		}
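
The width rule documented above can be tried out in isolation. The following is a minimal standalone sketch, not the code from this patch: `isLegalWidth` and the caller-supplied set of legal widths are hypothetical stand-ins for `DL.isLegalInteger`, and the two checks after the 8/16/32 shortcut are inferred from the doc comment, because the diff view elides that part of the function.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for DataLayout::isLegalInteger: a width is "legal"
// if it appears in the caller-supplied set.
static bool isLegalWidth(unsigned Width, const std::set<unsigned> &Legal) {
  return Legal.count(Width) != 0;
}

// Sketch of the width-based decision described in the doc comment.
bool shouldChangeTypeSketch(unsigned FromWidth, unsigned ToWidth,
                            const std::set<unsigned> &Legal) {
  assert(FromWidth > 0 && ToWidth > 0 && "widths must be non-zero");
  bool FromLegal = FromWidth == 1 || isLegalWidth(FromWidth, Legal);
  bool ToLegal = ToWidth == 1 || isLegalWidth(ToWidth, Legal);

  // Shrinking to 8, 16 or 32 bits is always accepted.
  if (ToWidth < FromWidth && (ToWidth == 8 || ToWidth == 16 || ToWidth == 32))
    return true;

  // Assumed from the doc comment: never go from a legal to an illegal type.
  if (FromLegal && !ToLegal)
    return false;

  // Assumed from the doc comment: never grow one illegal type into another.
  if (!FromLegal && !ToLegal && ToWidth > FromWidth)
    return false;

  return true;
}
```

With legal widths {32, 64}, this sketch accepts i64 to i16 and i160 to i64, and rejects i32 to i24 and i64 to i160.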

/// Return true if it is desirable to convert a computation from 'From' to 'To'.		/// Return true if it is desirable to convert a computation from 'From' to 'To'.
/// We don't want to convert from a legal to an illegal type or from a smaller		/// We don't want to convert from a legal to an illegal type or from a smaller
/// to a larger illegal type. i1 is always treated as a legal type because it is		/// to a larger illegal type. i1 is always treated as a legal type because it is
/// a fundamental type in IR, and there are many specialized optimizations for		/// a fundamental type in IR, and there are many specialized optimizations for
/// i1 types.		/// i1 types.
bool InstCombiner::shouldChangeType(Type *From, Type *To) const {		bool InstCombinerImpl::shouldChangeType(Type *From, Type *To) const {
// TODO: This could be extended to allow vectors. Datalayout changes might be		// TODO: This could be extended to allow vectors. Datalayout changes might be
// needed to properly support that.		// needed to properly support that.
if (!From->isIntegerTy() || !To->isIntegerTy())		if (!From->isIntegerTy() || !To->isIntegerTy())
return false;		return false;

unsigned FromWidth = From->getPrimitiveSizeInBits();		unsigned FromWidth = From->getPrimitiveSizeInBits();
unsigned ToWidth = To->getPrimitiveSizeInBits();		unsigned ToWidth = To->getPrimitiveSizeInBits();
return shouldChangeType(FromWidth, ToWidth);		return shouldChangeType(FromWidth, ToWidth);
▲ Show 20 Lines • Show All 51 Lines • ▼ Show 20 Lines	static void ClearSubclassDataAfterReassociation(BinaryOperator &I) {
I.clearSubclassOptionalData();		I.clearSubclassOptionalData();
I.setFastMathFlags(FMF);		I.setFastMathFlags(FMF);
}		}

/// Combine constant operands of associative operations either before or after a		/// Combine constant operands of associative operations either before or after a
/// cast to eliminate one of the associative operations:		/// cast to eliminate one of the associative operations:
/// (op (cast (op X, C2)), C1) --> (cast (op X, op (C1, C2)))		/// (op (cast (op X, C2)), C1) --> (cast (op X, op (C1, C2)))
/// (op (cast (op X, C2)), C1) --> (op (cast X), op (C1, C2))		/// (op (cast (op X, C2)), C1) --> (op (cast X), op (C1, C2))
static bool simplifyAssocCastAssoc(BinaryOperator *BinOp1, InstCombiner &IC) {		static bool simplifyAssocCastAssoc(BinaryOperator *BinOp1,
		InstCombinerImpl &IC) {
auto *Cast = dyn_cast<CastInst>(BinOp1->getOperand(0));		auto *Cast = dyn_cast<CastInst>(BinOp1->getOperand(0));
if (!Cast || !Cast->hasOneUse())		if (!Cast || !Cast->hasOneUse())
return false;		return false;

// TODO: Enhance logic for other casts and remove this check.		// TODO: Enhance logic for other casts and remove this check.
auto CastOpcode = Cast->getOpcode();		auto CastOpcode = Cast->getOpcode();
if (CastOpcode != Instruction::ZExt)		if (CastOpcode != Instruction::ZExt)
return false;		return false;
▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines
/// 3. Transform: "A op (B op C)" ==> "(A op B) op C" if "A op B" simplifies.		/// 3. Transform: "A op (B op C)" ==> "(A op B) op C" if "A op B" simplifies.
///		///
/// Associative and commutative operators:		/// Associative and commutative operators:
///		///
/// 4. Transform: "(A op B) op C" ==> "(C op A) op B" if "C op A" simplifies.		/// 4. Transform: "(A op B) op C" ==> "(C op A) op B" if "C op A" simplifies.
/// 5. Transform: "A op (B op C)" ==> "B op (C op A)" if "C op A" simplifies.		/// 5. Transform: "A op (B op C)" ==> "B op (C op A)" if "C op A" simplifies.
/// 6. Transform: "(A op C1) op (B op C2)" ==> "(A op B) op (C1 op C2)"		/// 6. Transform: "(A op C1) op (B op C2)" ==> "(A op B) op (C1 op C2)"
/// if C1 and C2 are constants.		/// if C1 and C2 are constants.
bool InstCombiner::SimplifyAssociativeOrCommutative(BinaryOperator &I) {		bool InstCombinerImpl::SimplifyAssociativeOrCommutative(BinaryOperator &I) {
Instruction::BinaryOps Opcode = I.getOpcode();		Instruction::BinaryOps Opcode = I.getOpcode();
bool Changed = false;		bool Changed = false;

do {		do {
// Order operands such that they are listed from right (least complex) to		// Order operands such that they are listed from right (least complex) to
// left (most complex). This puts constants before unary operators before		// left (most complex). This puts constants before unary operators before
// binary operators.		// binary operators.
if (I.isCommutative() && getComplexity(I.getOperand(0)) <		if (I.isCommutative() && getComplexity(I.getOperand(0)) <
▲ Show 20 Lines • Show All 211 Lines • ▼ Show 20 Lines	if (TopOpcode == Instruction::Add || TopOpcode == Instruction::Sub) {
}		}
// TODO: We can add other conversions e.g. shr => div etc.		// TODO: We can add other conversions e.g. shr => div etc.
}		}
return Op->getOpcode();		return Op->getOpcode();
}		}

/// This tries to simplify binary operations by factorizing out common terms		/// This tries to simplify binary operations by factorizing out common terms
/// (e. g. "(A*B)+(A*C)" -> "A*(B+C)").		/// (e. g. "(A*B)+(A*C)" -> "A*(B+C)").
Value *InstCombiner::tryFactorization(BinaryOperator &I,		Value *InstCombinerImpl::tryFactorization(BinaryOperator &I,
Instruction::BinaryOps InnerOpcode,		Instruction::BinaryOps InnerOpcode,
Value *A, Value *B, Value *C, Value *D) {		Value *A, Value *B, Value *C,
		Value *D) {
assert(A && B && C && D && "All values must be provided");		assert(A && B && C && D && "All values must be provided");

Value *V = nullptr;		Value *V = nullptr;
Value *SimplifiedInst = nullptr;		Value *SimplifiedInst = nullptr;
Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);		Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);
Instruction::BinaryOps TopLevelOpcode = I.getOpcode();		Instruction::BinaryOps TopLevelOpcode = I.getOpcode();

// Does "X op' Y" always equal "Y op' X"?		// Does "X op' Y" always equal "Y op' X"?
▲ Show 20 Lines • Show All 86 Lines • ▼ Show 20 Lines	Value *InstCombinerImpl::tryFactorization(BinaryOperator &I,
return SimplifiedInst;		return SimplifiedInst;
}		}
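
The factorization named in the comment below, "(A*B)+(A*C)" -> "A*(B+C)", relies on multiplication distributing over addition. As a minimal standalone sketch (the function names are hypothetical and this is not LLVM code), the identity can be checked on fixed-width unsigned integers, whose arithmetic wraps modulo 2^32:

```cpp
#include <cassert>
#include <cstdint>

// Expanded form: two multiplies and one add.
uint32_t expanded(uint32_t A, uint32_t B, uint32_t C) { return A * B + A * C; }

// Factored form: one add and one multiply.
uint32_t factored(uint32_t A, uint32_t B, uint32_t C) { return A * (B + C); }

// Both forms agree for every input, including inputs that wrap around,
// because unsigned arithmetic is carried out modulo 2^32.
void checkFactorization(uint32_t A, uint32_t B, uint32_t C) {
  assert(expanded(A, B, C) == factored(A, B, C));
}
```

The factored form needs one multiplication fewer, which is why it is the preferred direction when the common term can be found.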

/// This tries to simplify binary operations which some other binary operation		/// This tries to simplify binary operations which some other binary operation
/// distributes over either by factorizing out common terms		/// distributes over either by factorizing out common terms
/// (eg "(A*B)+(A*C)" -> "A*(B+C)") or expanding out if this results in		/// (eg "(A*B)+(A*C)" -> "A*(B+C)") or expanding out if this results in
/// simplifications (eg: "A & (B | C) -> (A&B) | (A&C)" if this is a win).		/// simplifications (eg: "A & (B | C) -> (A&B) | (A&C)" if this is a win).
/// Returns the simplified value, or null if it didn't simplify.		/// Returns the simplified value, or null if it didn't simplify.
Value *InstCombiner::SimplifyUsingDistributiveLaws(BinaryOperator &I) {		Value *InstCombinerImpl::SimplifyUsingDistributiveLaws(BinaryOperator &I) {
Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);		Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);
BinaryOperator *Op0 = dyn_cast<BinaryOperator>(LHS);		BinaryOperator *Op0 = dyn_cast<BinaryOperator>(LHS);
BinaryOperator *Op1 = dyn_cast<BinaryOperator>(RHS);		BinaryOperator *Op1 = dyn_cast<BinaryOperator>(RHS);
Instruction::BinaryOps TopLevelOpcode = I.getOpcode();		Instruction::BinaryOps TopLevelOpcode = I.getOpcode();

{		{
// Factorization.		// Factorization.
Value *A, *B, *C, *D;		Value *A, *B, *C, *D;
▲ Show 20 Lines • Show All 97 Lines • ▼ Show 20 Lines	if (R && R == ConstantExpr::getBinOpIdentity(InnerOpcode, R->getType())) {
A->takeName(&I);		A->takeName(&I);
return A;		return A;
}		}
}		}

return SimplifySelectsFeedingBinaryOp(I, LHS, RHS);		return SimplifySelectsFeedingBinaryOp(I, LHS, RHS);
}		}

Value *InstCombiner::SimplifySelectsFeedingBinaryOp(BinaryOperator &I,		Value *InstCombinerImpl::SimplifySelectsFeedingBinaryOp(BinaryOperator &I,
Value *LHS, Value *RHS) {		Value *LHS,
		Value *RHS) {
Value *A, *B, *C, *D, *E, *F;		Value *A, *B, *C, *D, *E, *F;
bool LHSIsSelect = match(LHS, m_Select(m_Value(A), m_Value(B), m_Value(C)));		bool LHSIsSelect = match(LHS, m_Select(m_Value(A), m_Value(B), m_Value(C)));
bool RHSIsSelect = match(RHS, m_Select(m_Value(D), m_Value(E), m_Value(F)));		bool RHSIsSelect = match(RHS, m_Select(m_Value(D), m_Value(E), m_Value(F)));
if (!LHSIsSelect && !RHSIsSelect)		if (!LHSIsSelect && !RHSIsSelect)
return nullptr;		return nullptr;

FastMathFlags FMF;		FastMathFlags FMF;
BuilderTy::FastMathFlagGuard Guard(Builder);		BuilderTy::FastMathFlagGuard Guard(Builder);
Show All 35 Lines	Value *InstCombinerImpl::SimplifySelectsFeedingBinaryOp(BinaryOperator &I,

Value *SI = Builder.CreateSelect(Cond, True, False);		Value *SI = Builder.CreateSelect(Cond, True, False);
SI->takeName(&I);		SI->takeName(&I);
return SI;		return SI;
}		}

/// Given a 'sub' instruction, return the RHS of the instruction if the LHS is a		/// Given a 'sub' instruction, return the RHS of the instruction if the LHS is a
/// constant zero (which is the 'negate' form).		/// constant zero (which is the 'negate' form).
Value *InstCombiner::dyn_castNegVal(Value *V) const {		Value *InstCombinerImpl::dyn_castNegVal(Value *V) const {
Value *NegV;		Value *NegV;
if (match(V, m_Neg(m_Value(NegV))))		if (match(V, m_Neg(m_Value(NegV))))
return NegV;		return NegV;

// Constants can be considered to be negated values if they can be folded.		// Constants can be considered to be negated values if they can be folded.
if (ConstantInt *C = dyn_cast<ConstantInt>(V))		if (ConstantInt *C = dyn_cast<ConstantInt>(V))
return ConstantExpr::getNeg(C);		return ConstantExpr::getNeg(C);

▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	static Value *foldOperationIntoSelectOperand(Instruction &I, Value *SO,
Value *RI = Builder.CreateBinOp(BO->getOpcode(), Op0, Op1,		Value *RI = Builder.CreateBinOp(BO->getOpcode(), Op0, Op1,
SO->getName() + ".op");		SO->getName() + ".op");
auto *FPInst = dyn_cast<Instruction>(RI);		auto *FPInst = dyn_cast<Instruction>(RI);
if (FPInst && isa<FPMathOperator>(FPInst))		if (FPInst && isa<FPMathOperator>(FPInst))
FPInst->copyFastMathFlags(BO);		FPInst->copyFastMathFlags(BO);
return RI;		return RI;
}		}

Instruction *InstCombiner::FoldOpIntoSelect(Instruction &Op, SelectInst *SI) {		Instruction *InstCombinerImpl::FoldOpIntoSelect(Instruction &Op,
		SelectInst *SI) {
// Don't modify shared select instructions.		// Don't modify shared select instructions.
if (!SI->hasOneUse())		if (!SI->hasOneUse())
return nullptr;		return nullptr;

Value *TV = SI->getTrueValue();		Value *TV = SI->getTrueValue();
Value *FV = SI->getFalseValue();		Value *FV = SI->getFalseValue();
if (!(isa<Constant>(TV) || isa<Constant>(FV)))		if (!(isa<Constant>(TV) || isa<Constant>(FV)))
return nullptr;		return nullptr;
▲ Show 20 Lines • Show All 78 Lines • ▼ Show 20 Lines	static Value *foldOperationIntoPhiValue(BinaryOperator *I, Value *InV,

Value *RI = Builder.CreateBinOp(I->getOpcode(), Op0, Op1, "phitmp");		Value *RI = Builder.CreateBinOp(I->getOpcode(), Op0, Op1, "phitmp");
auto *FPInst = dyn_cast<Instruction>(RI);		auto *FPInst = dyn_cast<Instruction>(RI);
if (FPInst && isa<FPMathOperator>(FPInst))		if (FPInst && isa<FPMathOperator>(FPInst))
FPInst->copyFastMathFlags(I);		FPInst->copyFastMathFlags(I);
return RI;		return RI;
}		}

Instruction *InstCombiner::foldOpIntoPhi(Instruction &I, PHINode *PN) {		Instruction *InstCombinerImpl::foldOpIntoPhi(Instruction &I, PHINode *PN) {
unsigned NumPHIValues = PN->getNumIncomingValues();		unsigned NumPHIValues = PN->getNumIncomingValues();
if (NumPHIValues == 0)		if (NumPHIValues == 0)
return nullptr;		return nullptr;

// We normally only transform phis with a single use. However, if a PHI has		// We normally only transform phis with a single use. However, if a PHI has
// multiple uses and they are all the same operation, we can fold all of the		// multiple uses and they are all the same operation, we can fold all of the
// uses into the PHI.		// uses into the PHI.
if (!PN->hasOneUse()) {		if (!PN->hasOneUse()) {
▲ Show 20 Lines • Show All 125 Lines • ▼ Show 20 Lines	for (auto UI = PN->user_begin(), E = PN->user_end(); UI != E;) {
Instruction *User = cast<Instruction>(*UI++);		Instruction *User = cast<Instruction>(*UI++);
if (User == &I) continue;		if (User == &I) continue;
replaceInstUsesWith(*User, NewPN);		replaceInstUsesWith(*User, NewPN);
eraseInstFromFunction(*User);		eraseInstFromFunction(*User);
}		}
return replaceInstUsesWith(I, NewPN);		return replaceInstUsesWith(I, NewPN);
}		}

Instruction *InstCombiner::foldBinOpIntoSelectOrPhi(BinaryOperator &I) {		Instruction *InstCombinerImpl::foldBinOpIntoSelectOrPhi(BinaryOperator &I) {
if (!isa<Constant>(I.getOperand(1)))		if (!isa<Constant>(I.getOperand(1)))
return nullptr;		return nullptr;

if (auto *Sel = dyn_cast<SelectInst>(I.getOperand(0))) {		if (auto *Sel = dyn_cast<SelectInst>(I.getOperand(0))) {
if (Instruction *NewSel = FoldOpIntoSelect(I, Sel))		if (Instruction *NewSel = FoldOpIntoSelect(I, Sel))
return NewSel;		return NewSel;
} else if (auto *PN = dyn_cast<PHINode>(I.getOperand(0))) {		} else if (auto *PN = dyn_cast<PHINode>(I.getOperand(0))) {
if (Instruction *NewPhi = foldOpIntoPhi(I, PN))		if (Instruction *NewPhi = foldOpIntoPhi(I, PN))
return NewPhi;		return NewPhi;
}		}
return nullptr;		return nullptr;
}		}

/// Given a pointer type and a constant offset, determine whether or not there		/// Given a pointer type and a constant offset, determine whether or not there
/// is a sequence of GEP indices into the pointed type that will land us at the		/// is a sequence of GEP indices into the pointed type that will land us at the
/// specified offset. If so, fill them into NewIndices and return the resultant		/// specified offset. If so, fill them into NewIndices and return the resultant
/// element type, otherwise return null.		/// element type, otherwise return null.
Type *InstCombiner::FindElementAtOffset(PointerType *PtrTy, int64_t Offset,		Type *
		InstCombinerImpl::FindElementAtOffset(PointerType *PtrTy, int64_t Offset,
SmallVectorImpl<Value *> &NewIndices) {		SmallVectorImpl<Value *> &NewIndices) {
Type *Ty = PtrTy->getElementType();		Type *Ty = PtrTy->getElementType();
if (!Ty->isSized())		if (!Ty->isSized())
return nullptr;		return nullptr;

// Start with the index over the outer type. Note that the type size		// Start with the index over the outer type. Note that the type size
// might be zero (even if the offset isn't zero) if the indexed type		// might be zero (even if the offset isn't zero) if the indexed type
// is something like [0 x {int, int}]		// is something like [0 x {int, int}]
Type *IndexTy = DL.getIndexType(PtrTy);		Type *IndexTy = DL.getIndexType(PtrTy);
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	static bool shouldMergeGEPs(GEPOperator &GEP, GEPOperator &Src) {
if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() &&		if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() &&
!Src.hasOneUse())		!Src.hasOneUse())
return false;		return false;
return true;		return true;
}		}

/// Return a value X such that Val = X * Scale, or null if none.		/// Return a value X such that Val = X * Scale, or null if none.
/// If the multiplication is known not to overflow, then NoSignedWrap is set.		/// If the multiplication is known not to overflow, then NoSignedWrap is set.
Value *InstCombiner::Descale(Value *Val, APInt Scale, bool &NoSignedWrap) {		Value *InstCombinerImpl::Descale(Value *Val, APInt Scale, bool &NoSignedWrap) {
assert(isa<IntegerType>(Val->getType()) && "Can only descale integers!");		assert(isa<IntegerType>(Val->getType()) && "Can only descale integers!");
assert(cast<IntegerType>(Val->getType())->getBitWidth() ==		assert(cast<IntegerType>(Val->getType())->getBitWidth() ==
Scale.getBitWidth() && "Scale not compatible with value!");		Scale.getBitWidth() && "Scale not compatible with value!");

// If Val is zero or Scale is one then Val = Val * Scale.		// If Val is zero or Scale is one then Val = Val * Scale.
if (match(Val, m_Zero()) || Scale == 1) {		if (match(Val, m_Zero()) || Scale == 1) {
NoSignedWrap = true;		NoSignedWrap = true;
return Val;		return Val;
▲ Show 20 Lines • Show All 223 Lines • ▼ Show 20 Lines	if (Ancestor == Val)
return Val;		return Val;

// Move up one level in the expression.		// Move up one level in the expression.
assert(Ancestor->hasOneUse() && "Drilled down when more than one use!");		assert(Ancestor->hasOneUse() && "Drilled down when more than one use!");
Ancestor = Ancestor->user_back();		Ancestor = Ancestor->user_back();
} while (true);		} while (true);
}		}

Instruction *InstCombiner::foldVectorBinop(BinaryOperator &Inst) {		Instruction *InstCombinerImpl::foldVectorBinop(BinaryOperator &Inst) {
// FIXME: some of this is likely fine for scalable vectors		// FIXME: some of this is likely fine for scalable vectors
if (!isa<FixedVectorType>(Inst.getType()))		if (!isa<FixedVectorType>(Inst.getType()))
return nullptr;		return nullptr;

BinaryOperator::BinaryOps Opcode = Inst.getOpcode();		BinaryOperator::BinaryOps Opcode = Inst.getOpcode();
Value *LHS = Inst.getOperand(0), *RHS = Inst.getOperand(1);		Value *LHS = Inst.getOperand(0), *RHS = Inst.getOperand(1);
assert(cast<VectorType>(LHS->getType())->getElementCount() ==		assert(cast<VectorType>(LHS->getType())->getElementCount() ==
cast<VectorType>(Inst.getType())->getElementCount());		cast<VectorType>(Inst.getType())->getElementCount());
▲ Show 20 Lines • Show All 205 Lines • ▼ Show 20 Lines	Instruction *InstCombinerImpl::foldVectorBinop(BinaryOperator &Inst) {
}		}

return nullptr;		return nullptr;
}		}

/// Try to narrow the width of a binop if at least 1 operand is an extend of		/// Try to narrow the width of a binop if at least 1 operand is an extend of
/// of a value. This requires a potentially expensive known bits check to make		/// of a value. This requires a potentially expensive known bits check to make
/// sure the narrow op does not overflow.		/// sure the narrow op does not overflow.
Instruction *InstCombiner::narrowMathIfNoOverflow(BinaryOperator &BO) {		Instruction *InstCombinerImpl::narrowMathIfNoOverflow(BinaryOperator &BO) {
// We need at least one extended operand.		// We need at least one extended operand.
Value *Op0 = BO.getOperand(0), *Op1 = BO.getOperand(1);		Value *Op0 = BO.getOperand(0), *Op1 = BO.getOperand(1);

// If this is a sub, we swap the operands since we always want an extension		// If this is a sub, we swap the operands since we always want an extension
// on the RHS. The LHS can be an extension or a constant.		// on the RHS. The LHS can be an extension or a constant.
if (BO.getOpcode() == Instruction::Sub)		if (BO.getOpcode() == Instruction::Sub)
std::swap(Op0, Op1);		std::swap(Op0, Op1);

▲ Show 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	static Instruction *foldSelectGEP(GetElementPtrInst &GEP,
bool IsInBounds = GEP.isInBounds();		bool IsInBounds = GEP.isInBounds();
Value *NewTrueC = IsInBounds ? Builder.CreateInBoundsGEP(TrueC, IndexC)		Value *NewTrueC = IsInBounds ? Builder.CreateInBoundsGEP(TrueC, IndexC)
: Builder.CreateGEP(TrueC, IndexC);		: Builder.CreateGEP(TrueC, IndexC);
Value *NewFalseC = IsInBounds ? Builder.CreateInBoundsGEP(FalseC, IndexC)		Value *NewFalseC = IsInBounds ? Builder.CreateInBoundsGEP(FalseC, IndexC)
: Builder.CreateGEP(FalseC, IndexC);		: Builder.CreateGEP(FalseC, IndexC);
return SelectInst::Create(Cond, NewTrueC, NewFalseC, "", nullptr, Sel);		return SelectInst::Create(Cond, NewTrueC, NewFalseC, "", nullptr, Sel);
}		}

Instruction *InstCombiner::visitGetElementPtrInst(GetElementPtrInst &GEP) {		Instruction *InstCombinerImpl::visitGetElementPtrInst(GetElementPtrInst &GEP) {
SmallVector<Value*, 8> Ops(GEP.op_begin(), GEP.op_end());		SmallVector<Value*, 8> Ops(GEP.op_begin(), GEP.op_end());
Type *GEPType = GEP.getType();		Type *GEPType = GEP.getType();
Type *GEPEltType = GEP.getSourceElementType();		Type *GEPEltType = GEP.getSourceElementType();
bool IsGEPSrcEleScalable = isa<ScalableVectorType>(GEPEltType);		bool IsGEPSrcEleScalable = isa<ScalableVectorType>(GEPEltType);
if (Value *V = SimplifyGEPInst(GEPEltType, Ops, SQ.getWithInstruction(&GEP)))		if (Value *V = SimplifyGEPInst(GEPEltType, Ops, SQ.getWithInstruction(&GEP)))
return replaceInstUsesWith(GEP, V);		return replaceInstUsesWith(GEP, V);

// For vector geps, use the generic demanded vector support.		// For vector geps, use the generic demanded vector support.
▲ Show 20 Lines • Show All 734 Lines • ▼ Show 20 Lines	for (User *U : PI->users()) {
}		}
}		}
llvm_unreachable("missing a return?");		llvm_unreachable("missing a return?");
}		}
} while (!Worklist.empty());		} while (!Worklist.empty());
return true;		return true;
}		}

Instruction *InstCombiner::visitAllocSite(Instruction &MI) {		Instruction *InstCombinerImpl::visitAllocSite(Instruction &MI) {
// If we have a malloc call which is only used in any amount of comparisons to		// If we have a malloc call which is only used in any amount of comparisons to
// null and free calls, delete the calls and replace the comparisons with true		// null and free calls, delete the calls and replace the comparisons with true
// or false as appropriate.		// or false as appropriate.

// This is based on the principle that we can substitute our own allocation		// This is based on the principle that we can substitute our own allocation
// function (which will never return null) rather than knowledge of the		// function (which will never return null) rather than knowledge of the
// specific function being called. In some sense this can change the permitted		// specific function being called. In some sense this can change the permitted
// outputs of a program (when we convert a malloc to an alloca, the fact that		// outputs of a program (when we convert a malloc to an alloca, the fact that
▲ Show 20 Lines • Show All 142 Lines • ▼ Show 20 Lines	if (&Instr == FreeInstrBBTerminator)
break;		break;
Instr.moveBefore(TI);		Instr.moveBefore(TI);
}		}
assert(FreeInstrBB->size() == 1 &&		assert(FreeInstrBB->size() == 1 &&
"Only the branch instruction should remain");		"Only the branch instruction should remain");
return &FI;		return &FI;
}		}

Instruction *InstCombiner::visitFree(CallInst &FI) {		Instruction *InstCombinerImpl::visitFree(CallInst &FI) {
Value *Op = FI.getArgOperand(0);		Value *Op = FI.getArgOperand(0);

// free undef -> unreachable.		// free undef -> unreachable.
if (isa<UndefValue>(Op)) {		if (isa<UndefValue>(Op)) {
// Leave a marker since we can't modify the CFG here.		// Leave a marker since we can't modify the CFG here.
CreateNonTerminatorUnreachable(&FI);		CreateNonTerminatorUnreachable(&FI);
return eraseInstFromFunction(FI);		return eraseInstFromFunction(FI);
}		}
Show All 24 Lines
}		}

static bool isMustTailCall(Value *V) {		static bool isMustTailCall(Value *V) {
if (auto *CI = dyn_cast<CallInst>(V))		if (auto *CI = dyn_cast<CallInst>(V))
return CI->isMustTailCall();		return CI->isMustTailCall();
return false;		return false;
}		}

Instruction *InstCombiner::visitReturnInst(ReturnInst &RI) {		Instruction *InstCombinerImpl::visitReturnInst(ReturnInst &RI) {
if (RI.getNumOperands() == 0) // ret void		if (RI.getNumOperands() == 0) // ret void
return nullptr;		return nullptr;

Value *ResultOp = RI.getOperand(0);		Value *ResultOp = RI.getOperand(0);
Type *VTy = ResultOp->getType();		Type *VTy = ResultOp->getType();
if (!VTy->isIntegerTy() || isa<Constant>(ResultOp))		if (!VTy->isIntegerTy() || isa<Constant>(ResultOp))
return nullptr;		return nullptr;

// Don't replace result of musttail calls.		// Don't replace result of musttail calls.
if (isMustTailCall(ResultOp))		if (isMustTailCall(ResultOp))
return nullptr;		return nullptr;

// There might be assume intrinsics dominating this return that completely		// There might be assume intrinsics dominating this return that completely
// determine the value. If so, constant fold it.		// determine the value. If so, constant fold it.
KnownBits Known = computeKnownBits(ResultOp, 0, &RI);		KnownBits Known = computeKnownBits(ResultOp, 0, &RI);
if (Known.isConstant())		if (Known.isConstant())
return replaceOperand(RI, 0,		return replaceOperand(RI, 0,
Constant::getIntegerValue(VTy, Known.getConstant()));		Constant::getIntegerValue(VTy, Known.getConstant()));

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitBranchInst(BranchInst &BI) {		Instruction *InstCombinerImpl::visitBranchInst(BranchInst &BI) {
// Nothing to do about unconditional branches.		// Nothing to do about unconditional branches.
if (BI.isUnconditional())		if (BI.isUnconditional())
return nullptr;		return nullptr;

// Change br (not X), label True, label False to: br X, label False, True		// Change br (not X), label True, label False to: br X, label False, True
Value *X = nullptr;		Value *X = nullptr;
if (match(&BI, m_Br(m_Not(m_Value(X)), m_BasicBlock(), m_BasicBlock())) &&		if (match(&BI, m_Br(m_Not(m_Value(X)), m_BasicBlock(), m_BasicBlock())) &&
!isa<Constant>(X)) {		!isa<Constant>(X)) {
Show All 20 Lines	if (match(&BI, m_Br(m_OneUse(m_Cmp(Pred, m_Value(), m_Value())),
BI.swapSuccessors();		BI.swapSuccessors();
Worklist.push(Cond);		Worklist.push(Cond);
return &BI;		return &BI;
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitSwitchInst(SwitchInst &SI) {		Instruction *InstCombinerImpl::visitSwitchInst(SwitchInst &SI) {
Value *Cond = SI.getCondition();		Value *Cond = SI.getCondition();
Value *Op0;		Value *Op0;
ConstantInt *AddRHS;		ConstantInt *AddRHS;
if (match(Cond, m_Add(m_Value(Op0), m_ConstantInt(AddRHS)))) {		if (match(Cond, m_Add(m_Value(Op0), m_ConstantInt(AddRHS)))) {
// Change 'switch (X+4) case 1:' into 'switch (X) case -3'.		// Change 'switch (X+4) case 1:' into 'switch (X) case -3'.
for (auto Case : SI.cases()) {		for (auto Case : SI.cases()) {
Constant *NewCase = ConstantExpr::getSub(Case.getCaseValue(), AddRHS);		Constant *NewCase = ConstantExpr::getSub(Case.getCaseValue(), AddRHS);
assert(isa<ConstantInt>(NewCase) &&		assert(isa<ConstantInt>(NewCase) &&
Show All 14 Lines	LeadingKnownZeros = std::min(
LeadingKnownZeros, C.getCaseValue()->getValue().countLeadingZeros());		LeadingKnownZeros, C.getCaseValue()->getValue().countLeadingZeros());
LeadingKnownOnes = std::min(		LeadingKnownOnes = std::min(
LeadingKnownOnes, C.getCaseValue()->getValue().countLeadingOnes());		LeadingKnownOnes, C.getCaseValue()->getValue().countLeadingOnes());
}		}

unsigned NewWidth = Known.getBitWidth() - std::max(LeadingKnownZeros, LeadingKnownOnes);		unsigned NewWidth = Known.getBitWidth() - std::max(LeadingKnownZeros, LeadingKnownOnes);

// Shrink the condition operand if the new type is smaller than the old type.		// Shrink the condition operand if the new type is smaller than the old type.
// But do not shrink to a non-standard type, because backend can't generate		// But do not shrink to a non-standard type, because backend can't generate
// good code for that yet.		// good code for that yet.
// TODO: We can make it aggressive again after fixing PR39569.		// TODO: We can make it aggressive again after fixing PR39569.
if (NewWidth > 0 && NewWidth < Known.getBitWidth() &&		if (NewWidth > 0 && NewWidth < Known.getBitWidth() &&
shouldChangeType(Known.getBitWidth(), NewWidth)) {		shouldChangeType(Known.getBitWidth(), NewWidth)) {
IntegerType *Ty = IntegerType::get(SI.getContext(), NewWidth);		IntegerType *Ty = IntegerType::get(SI.getContext(), NewWidth);
Builder.SetInsertPoint(&SI);		Builder.SetInsertPoint(&SI);
Value *NewCond = Builder.CreateTrunc(Cond, Ty, "trunc");		Value *NewCond = Builder.CreateTrunc(Cond, Ty, "trunc");

for (auto Case : SI.cases()) {		for (auto Case : SI.cases()) {
APInt TruncatedCase = Case.getCaseValue()->getValue().trunc(NewWidth);		APInt TruncatedCase = Case.getCaseValue()->getValue().trunc(NewWidth);
Case.setValue(ConstantInt::get(SI.getContext(), TruncatedCase));		Case.setValue(ConstantInt::get(SI.getContext(), TruncatedCase));
}		}
return replaceOperand(SI, 0, NewCond);		return replaceOperand(SI, 0, NewCond);
}		}

return nullptr;		return nullptr;
}		}
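
The two switch rewrites above (rebasing the case values when the condition is `X + C`, and shrinking the condition to fewer bits) can be illustrated with plain integers. This is a minimal standalone sketch on 32-bit values, not the patch's code: all function names are hypothetical, and the leading-zero and leading-one counts for the condition are passed in as parameters, on the assumption that the elided part of the function derives them from known bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// switch (X + AddRHS) case V:  ==>  switch (X) case V - AddRHS:
// The subtraction wraps, matching fixed-width integer semantics.
std::vector<uint32_t> rebaseCases(const std::vector<uint32_t> &Cases,
                                  uint32_t AddRHS) {
  std::vector<uint32_t> Out;
  for (uint32_t V : Cases)
    Out.push_back(V - AddRHS);
  return Out;
}

static unsigned countLeadingZeros32(uint32_t V) {
  unsigned N = 0;
  for (uint32_t Bit = 0x80000000u; Bit != 0 && (V & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}

static unsigned countLeadingOnes32(uint32_t V) {
  return countLeadingZeros32(~V);
}

// Bits that are the same leading zero (or leading one) in the condition and
// in every case value carry no information, so the switch needs only
// 32 - max(zeros, ones) bits.
unsigned narrowedWidth(unsigned CondLeadingZeros, unsigned CondLeadingOnes,
                       const std::vector<uint32_t> &Cases) {
  assert(CondLeadingZeros <= 32 && CondLeadingOnes <= 32);
  unsigned LeadingZeros = CondLeadingZeros;
  unsigned LeadingOnes = CondLeadingOnes;
  for (uint32_t V : Cases) {
    LeadingZeros = std::min(LeadingZeros, countLeadingZeros32(V));
    LeadingOnes = std::min(LeadingOnes, countLeadingOnes32(V));
  }
  return 32 - std::max(LeadingZeros, LeadingOnes);
}
```

In the real function the computed width is only used when `shouldChangeType` accepts it, so a result such as 8 would be applied while a non-standard width would be left alone.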

Instruction *InstCombiner::visitExtractValueInst(ExtractValueInst &EV) {		Instruction *InstCombinerImpl::visitExtractValueInst(ExtractValueInst &EV) {
Value *Agg = EV.getAggregateOperand();		Value *Agg = EV.getAggregateOperand();

if (!EV.hasIndices())		if (!EV.hasIndices())
return replaceInstUsesWith(EV, Agg);		return replaceInstUsesWith(EV, Agg);

if (Value *V = SimplifyExtractValueInst(Agg, EV.getIndices(),		if (Value *V = SimplifyExtractValueInst(Agg, EV.getIndices(),
SQ.getWithInstruction(&EV)))		SQ.getWithInstruction(&EV)))
return replaceInstUsesWith(EV, V);		return replaceInstUsesWith(EV, V);
▲ Show 20 Lines • Show All 144 Lines • ▼ Show 20 Lines

static bool shorter_filter(const Value *LHS, const Value *RHS) {		static bool shorter_filter(const Value *LHS, const Value *RHS) {
return		return
cast<ArrayType>(LHS->getType())->getNumElements()		cast<ArrayType>(LHS->getType())->getNumElements()
<		<
cast<ArrayType>(RHS->getType())->getNumElements();		cast<ArrayType>(RHS->getType())->getNumElements();
}		}

Instruction *InstCombiner::visitLandingPadInst(LandingPadInst &LI) {		Instruction *InstCombinerImpl::visitLandingPadInst(LandingPadInst &LI) {
// The logic here should be correct for any real-world personality function.		// The logic here should be correct for any real-world personality function.
// However if that turns out not to be true, the offending logic can always		// However if that turns out not to be true, the offending logic can always
// be conditioned on the personality function, like the catch-all logic is.		// be conditioned on the personality function, like the catch-all logic is.
EHPersonality Personality =		EHPersonality Personality =
classifyEHPersonality(LI.getParent()->getParent()->getPersonalityFn());		classifyEHPersonality(LI.getParent()->getParent()->getPersonalityFn());

// Simplify the list of clauses, eg by removing repeated catch clauses		// Simplify the list of clauses, eg by removing repeated catch clauses
// (these are often created by inlining).		// (these are often created by inlining).
[292 lines not shown; context: "if (LI.isCleanup() != CleanupFlag) {"]
assert(!CleanupFlag && "Adding a cleanup, not removing one?!");		assert(!CleanupFlag && "Adding a cleanup, not removing one?!");
LI.setCleanup(CleanupFlag);		LI.setCleanup(CleanupFlag);
return &LI;		return &LI;
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitFreeze(FreezeInst &I) {		Instruction *InstCombinerImpl::visitFreeze(FreezeInst &I) {
Value *Op0 = I.getOperand(0);		Value *Op0 = I.getOperand(0);

if (Value *V = SimplifyFreezeInst(Op0, SQ.getWithInstruction(&I)))		if (Value *V = SimplifyFreezeInst(Op0, SQ.getWithInstruction(&I)))
return replaceInstUsesWith(I, V);		return replaceInstUsesWith(I, V);

return nullptr;		return nullptr;
}		}

[94 lines not shown; context: "for (auto &DIIClone : DIIClones) {"]
DIIClone->insertBefore(&*InsertPos);		DIIClone->insertBefore(&*InsertPos);
LLVM_DEBUG(dbgs() << "SINK: " << *DIIClone << '\n');		LLVM_DEBUG(dbgs() << "SINK: " << *DIIClone << '\n');
}		}
}		}

return true;		return true;
}		}

bool InstCombiner::run() {		bool InstCombinerImpl::run() {
while (!Worklist.isEmpty()) {		while (!Worklist.isEmpty()) {
// Walk deferred instructions in reverse order, and push them to the		// Walk deferred instructions in reverse order, and push them to the
// worklist, which means they'll end up popped from the worklist in-order.		// worklist, which means they'll end up popped from the worklist in-order.
while (Instruction *I = Worklist.popDeferred()) {		while (Instruction *I = Worklist.popDeferred()) {
// Check to see if we can DCE the instruction. We do this already here to		// Check to see if we can DCE the instruction. We do this already here to
// reduce the number of uses and thus allow other folds to trigger.		// reduce the number of uses and thus allow other folds to trigger.
// Note that eraseInstFromFunction() may push additional instructions on		// Note that eraseInstFromFunction() may push additional instructions on
// the deferred worklist, so this will DCE whole instruction chains.		// the deferred worklist, so this will DCE whole instruction chains.
[266 lines not shown; context: "for (Instruction *Inst : reverse(InstrsForInstCombineWorklist)) {"]
ICWorklist.push(Inst);		ICWorklist.push(Inst);
}		}

return MadeIRChange;		return MadeIRChange;
}		}

static bool combineInstructionsOverFunction(		static bool combineInstructionsOverFunction(
Function &F, InstCombineWorklist &Worklist, AliasAnalysis *AA,		Function &F, InstCombineWorklist &Worklist, AliasAnalysis *AA,
AssumptionCache &AC, TargetLibraryInfo &TLI, DominatorTree &DT,		AssumptionCache &AC, TargetLibraryInfo &TLI, TargetTransformInfo &TTI,
OptimizationRemarkEmitter &ORE, BlockFrequencyInfo *BFI,		DominatorTree &DT, OptimizationRemarkEmitter &ORE, BlockFrequencyInfo *BFI,
ProfileSummaryInfo *PSI, unsigned MaxIterations, LoopInfo *LI) {		ProfileSummaryInfo *PSI, unsigned MaxIterations, LoopInfo *LI) {
auto &DL = F.getParent()->getDataLayout();		auto &DL = F.getParent()->getDataLayout();
MaxIterations = std::min(MaxIterations, LimitMaxIterations.getValue());		MaxIterations = std::min(MaxIterations, LimitMaxIterations.getValue());

/// Builder - This is an IRBuilder that automatically inserts new		/// Builder - This is an IRBuilder that automatically inserts new
/// instructions into the worklist when they are created.		/// instructions into the worklist when they are created.
IRBuilder<TargetFolder, IRBuilderCallbackInserter> Builder(		IRBuilder<TargetFolder, IRBuilderCallbackInserter> Builder(
F.getContext(), TargetFolder(DL),		F.getContext(), TargetFolder(DL),
[27 lines not shown; context: "if (Iteration > MaxIterations) {"]
break;		break;
}		}

LLVM_DEBUG(dbgs() << "\n\nINSTCOMBINE ITERATION #" << Iteration << " on "		LLVM_DEBUG(dbgs() << "\n\nINSTCOMBINE ITERATION #" << Iteration << " on "
<< F.getName() << "\n");		<< F.getName() << "\n");

MadeIRChange |= prepareICWorklistFromFunction(F, DL, &TLI, Worklist);		MadeIRChange |= prepareICWorklistFromFunction(F, DL, &TLI, Worklist);

InstCombiner IC(Worklist, Builder, F.hasMinSize(), AA,		InstCombinerImpl IC(Worklist, Builder, F.hasMinSize(), AA, AC, TLI, TTI, DT,
AC, TLI, DT, ORE, BFI, PSI, DL, LI);		ORE, BFI, PSI, DL, LI);
IC.MaxArraySizeForCombine = MaxArraySize;		IC.MaxArraySizeForCombine = MaxArraySize;

if (!IC.run())		if (!IC.run())
break;		break;

MadeIRChange = true;		MadeIRChange = true;
}		}

return MadeIRChange;		return MadeIRChange;
}		}

InstCombinePass::InstCombinePass() : MaxIterations(LimitMaxIterations) {}		InstCombinePass::InstCombinePass() : MaxIterations(LimitMaxIterations) {}

InstCombinePass::InstCombinePass(unsigned MaxIterations)		InstCombinePass::InstCombinePass(unsigned MaxIterations)
: MaxIterations(MaxIterations) {}		: MaxIterations(MaxIterations) {}

PreservedAnalyses InstCombinePass::run(Function &F,		PreservedAnalyses InstCombinePass::run(Function &F,
FunctionAnalysisManager &AM) {		FunctionAnalysisManager &AM) {
auto &AC = AM.getResult<AssumptionAnalysis>(F);		auto &AC = AM.getResult<AssumptionAnalysis>(F);
auto &DT = AM.getResult<DominatorTreeAnalysis>(F);		auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);		auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);
auto &ORE = AM.getResult<OptimizationRemarkEmitterAnalysis>(F);		auto &ORE = AM.getResult<OptimizationRemarkEmitterAnalysis>(F);
		auto &TTI = AM.getResult<TargetIRAnalysis>(F);

auto *LI = AM.getCachedResult<LoopAnalysis>(F);		auto *LI = AM.getCachedResult<LoopAnalysis>(F);

auto *AA = &AM.getResult<AAManager>(F);		auto *AA = &AM.getResult<AAManager>(F);
auto &MAMProxy = AM.getResult<ModuleAnalysisManagerFunctionProxy>(F);		auto &MAMProxy = AM.getResult<ModuleAnalysisManagerFunctionProxy>(F);
ProfileSummaryInfo *PSI =		ProfileSummaryInfo *PSI =
MAMProxy.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());		MAMProxy.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());
auto *BFI = (PSI && PSI->hasProfileSummary()) ?		auto *BFI = (PSI && PSI->hasProfileSummary()) ?
&AM.getResult<BlockFrequencyAnalysis>(F) : nullptr;		&AM.getResult<BlockFrequencyAnalysis>(F) : nullptr;

if (!combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, DT, ORE, BFI,		if (!combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, TTI, DT, ORE,
PSI, MaxIterations, LI))		BFI, PSI, MaxIterations, LI))
// No changes, all analyses are preserved.		// No changes, all analyses are preserved.
return PreservedAnalyses::all();		return PreservedAnalyses::all();

// Mark all the analyses that instcombine updates as preserved.		// Mark all the analyses that instcombine updates as preserved.
PreservedAnalyses PA;		PreservedAnalyses PA;
PA.preserveSet<CFGAnalyses>();		PA.preserveSet<CFGAnalyses>();
PA.preserve<AAManager>();		PA.preserve<AAManager>();
PA.preserve<BasicAA>();		PA.preserve<BasicAA>();
PA.preserve<GlobalsAA>();		PA.preserve<GlobalsAA>();
return PA;		return PA;
}		}

void InstructionCombiningPass::getAnalysisUsage(AnalysisUsage &AU) const {		void InstructionCombiningPass::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
		AU.addRequired<TargetTransformInfoWrapperPass>();
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequired<OptimizationRemarkEmitterWrapperPass>();		AU.addRequired<OptimizationRemarkEmitterWrapperPass>();
AU.addPreserved<DominatorTreeWrapperPass>();		AU.addPreserved<DominatorTreeWrapperPass>();
AU.addPreserved<AAResultsWrapperPass>();		AU.addPreserved<AAResultsWrapperPass>();
AU.addPreserved<BasicAAWrapperPass>();		AU.addPreserved<BasicAAWrapperPass>();
AU.addPreserved<GlobalsAAWrapperPass>();		AU.addPreserved<GlobalsAAWrapperPass>();
AU.addRequired<ProfileSummaryInfoWrapperPass>();		AU.addRequired<ProfileSummaryInfoWrapperPass>();
LazyBlockFrequencyInfoPass::getLazyBFIAnalysisUsage(AU);		LazyBlockFrequencyInfoPass::getLazyBFIAnalysisUsage(AU);
}		}

bool InstructionCombiningPass::runOnFunction(Function &F) {		bool InstructionCombiningPass::runOnFunction(Function &F) {
if (skipFunction(F))		if (skipFunction(F))
return false;		return false;

// Required analyses.		// Required analyses.
auto AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();		auto AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);		auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
		auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
lebedev.ri (inline comment): This opens the dangerous floodgates of instcombine not being a target-independent canonicalization pass.
Flakebi (author, inline comment): That is the point of this change, to allow target-dependent combinations in TargetTransformInfo::instCombineIntrinsic. Imo, all the target specific intrinsic combinations in InstCombineCalls.cpp (x86, amdgpu, etc.) can be moved to their respective target. I don't have a great overview of LLVM, so I might be wrong on this.
lebedev.ri (inline comment):
> Imo, all the target specific intrinsic combinations in InstCombineCalls.cpp (x86, amdgpu, etc.) can be moved to their respective target.
I agree with that, yes. The problem I'm seeing is that even having TTI in the pass "significantly" lowers the barrier of entry for then using TTI to guard some generic transforms in instcombine.
auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();		auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
auto &ORE = getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE();		auto &ORE = getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE();

// Optional analyses.		// Optional analyses.
auto *LIWP = getAnalysisIfAvailable<LoopInfoWrapperPass>();		auto *LIWP = getAnalysisIfAvailable<LoopInfoWrapperPass>();
auto *LI = LIWP ? &LIWP->getLoopInfo() : nullptr;		auto *LI = LIWP ? &LIWP->getLoopInfo() : nullptr;
ProfileSummaryInfo *PSI =		ProfileSummaryInfo *PSI =
&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();		&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
BlockFrequencyInfo *BFI =		BlockFrequencyInfo *BFI =
(PSI && PSI->hasProfileSummary()) ?		(PSI && PSI->hasProfileSummary()) ?
&getAnalysis<LazyBlockFrequencyInfoPass>().getBFI() :		&getAnalysis<LazyBlockFrequencyInfoPass>().getBFI() :
nullptr;		nullptr;

return combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, DT, ORE, BFI,		return combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, TTI, DT, ORE,
PSI, MaxIterations, LI);		BFI, PSI, MaxIterations, LI);
}		}

char InstructionCombiningPass::ID = 0;		char InstructionCombiningPass::ID = 0;

InstructionCombiningPass::InstructionCombiningPass()		InstructionCombiningPass::InstructionCombiningPass()
: FunctionPass(ID), MaxIterations(InstCombineDefaultMaxIterations) {		: FunctionPass(ID), MaxIterations(InstCombineDefaultMaxIterations) {
initializeInstructionCombiningPassPass(*PassRegistry::getPassRegistry());		initializeInstructionCombiningPassPass(*PassRegistry::getPassRegistry());
}		}

InstructionCombiningPass::InstructionCombiningPass(unsigned MaxIterations)		InstructionCombiningPass::InstructionCombiningPass(unsigned MaxIterations)
: FunctionPass(ID), MaxIterations(MaxIterations) {		: FunctionPass(ID), MaxIterations(MaxIterations) {
initializeInstructionCombiningPassPass(*PassRegistry::getPassRegistry());		initializeInstructionCombiningPassPass(*PassRegistry::getPassRegistry());
}		}

INITIALIZE_PASS_BEGIN(InstructionCombiningPass, "instcombine",		INITIALIZE_PASS_BEGIN(InstructionCombiningPass, "instcombine",
"Combine redundant instructions", false, false)		"Combine redundant instructions", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(OptimizationRemarkEmitterWrapperPass)		INITIALIZE_PASS_DEPENDENCY(OptimizationRemarkEmitterWrapperPass)
INITIALIZE_PASS_DEPENDENCY(LazyBlockFrequencyInfoPass)		INITIALIZE_PASS_DEPENDENCY(LazyBlockFrequencyInfoPass)
INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)
INITIALIZE_PASS_END(InstructionCombiningPass, "instcombine",		INITIALIZE_PASS_END(InstructionCombiningPass, "instcombine",
"Combine redundant instructions", false, false)		"Combine redundant instructions", false, false)
[21 lines not shown]
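The inline discussion above is about where target-specific intrinsic folds should live. As a rough standalone illustration (not LLVM code, and not the patch's implementation), the sketch below models a generic combiner that delegates to a per-target hook. `IntrinsicCall`, `TargetHooks`, `ToyX86Hooks`, `visitIntrinsic` and the `toy.*` intrinsic names are all hypothetical; only the hook name `instCombineIntrinsic` is taken from the review comment, and its signature here is an assumption. The toy fold loosely mirrors the zero-carry-in case checked in addcarry.ll.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for an intrinsic call; not an LLVM class.
struct IntrinsicCall {
  std::string Name; // illustrative intrinsic name
  int CarryIn;      // illustrative constant operand
};

// Hypothetical target hook interface. The method name is borrowed from the
// review discussion; the signature is an assumption made for this sketch.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  // Default: the target has no fold for this call.
  virtual std::optional<IntrinsicCall>
  instCombineIntrinsic(const IntrinsicCall &) const {
    return std::nullopt;
  }
};

// Toy x86-like target: rewrites an add-with-carry whose carry-in is zero
// into a generic overflow-checked add.
struct ToyX86Hooks : TargetHooks {
  std::optional<IntrinsicCall>
  instCombineIntrinsic(const IntrinsicCall &II) const override {
    if (II.Name == "toy.x86.addcarry" && II.CarryIn == 0)
      return IntrinsicCall{"toy.uadd.with.overflow", 0};
    return std::nullopt;
  }
};

// Generic combiner entry point: it never inspects target intrinsic names
// itself; it asks the hook and otherwise leaves the call unchanged.
IntrinsicCall visitIntrinsic(const IntrinsicCall &II, const TargetHooks &TTI) {
  if (std::optional<IntrinsicCall> Folded = TTI.instCombineIntrinsic(II))
    return *Folded;
  return II;
}
```

In this model the generic code stays target-independent only by convention: nothing prevents `visitIntrinsic` from consulting `TTI` for other decisions, which is the concern lebedev.ri raises.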

llvm/test/CodeGen/Thumb2/mve-intrinsics/predicates.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: opt -instcombine %s | llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -verify-machineinstrs -o - | FileCheck %s			; RUN: opt -instcombine -mtriple=arm %s | llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -verify-machineinstrs -o - | FileCheck %s
dmgreen (inline comment): Please use the same triple as llc for any test with "mve" in the title.

	declare <16 x i1> @llvm.arm.mve.vctp8(i32)			declare <16 x i1> @llvm.arm.mve.vctp8(i32)
	declare <8 x i1> @llvm.arm.mve.vctp16(i32)			declare <8 x i1> @llvm.arm.mve.vctp16(i32)
	declare <4 x i1> @llvm.arm.mve.vctp32(i32)			declare <4 x i1> @llvm.arm.mve.vctp32(i32)
	declare <4 x i1> @llvm.arm.mve.vctp64(i32)			declare <4 x i1> @llvm.arm.mve.vctp64(i32)

	declare i32 @llvm.arm.mve.pred.v2i.v4i1(<4 x i1>)			declare i32 @llvm.arm.mve.pred.v2i.v4i1(<4 x i1>)
	declare i32 @llvm.arm.mve.pred.v2i.v8i1(<8 x i1>)			declare i32 @llvm.arm.mve.pred.v2i.v8i1(<8 x i1>)
[210 lines not shown]

llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc-multiple.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: opt -instcombine -S %s | FileCheck --check-prefix=IR %s			; RUN: opt -instcombine -mtriple=arm -S %s | FileCheck --check-prefix=IR %s
	; RUN: opt -instcombine %s | llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -verify-machineinstrs -O3 -o - | FileCheck --check-prefix=ASM %s			; RUN: opt -instcombine -mtriple=arm %s | llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -verify-machineinstrs -O3 -o - | FileCheck --check-prefix=ASM %s

	%struct.foo = type { [2 x <4 x i32>] }			%struct.foo = type { [2 x <4 x i32>] }

	define arm_aapcs_vfpcc i32 @test_vadciq_multiple(%struct.foo %a, %struct.foo %b, i32 %carry) {			define arm_aapcs_vfpcc i32 @test_vadciq_multiple(%struct.foo %a, %struct.foo %b, i32 %carry) {
	entry:			entry:
	%a.0 = extractvalue %struct.foo %a, 0, 0			%a.0 = extractvalue %struct.foo %a, 0, 0
	%a.1 = extractvalue %struct.foo %a, 0, 1			%a.1 = extractvalue %struct.foo %a, 0, 1
	%b.0 = extractvalue %struct.foo %b, 0, 0			%b.0 = extractvalue %struct.foo %b, 0, 0
[77 lines not shown]

llvm/test/CodeGen/Thumb2/mve-vpt-from-intrinsics.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: opt -instcombine %s | llc -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve --verify-machineinstrs -o - | FileCheck %s		; RUN: opt -instcombine -mtriple=arm %s | llc -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve --verify-machineinstrs -o - | FileCheck %s

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"		target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

define arm_aapcs_vfpcc <8 x i16> @test_vpt_block(<8 x i16> %v_inactive, <8 x i16> %v1, <8 x i16> %v2, <8 x i16> %v3) {		define arm_aapcs_vfpcc <8 x i16> @test_vpt_block(<8 x i16> %v_inactive, <8 x i16> %v1, <8 x i16> %v2, <8 x i16> %v3) {
; CHECK-LABEL: test_vpt_block:		; CHECK-LABEL: test_vpt_block:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vpt.i16 eq, q1, q2		; CHECK-NEXT: vpt.i16 eq, q1, q2
; CHECK-NEXT: vaddt.i16 q0, q3, q2		; CHECK-NEXT: vaddt.i16 q0, q3, q2
[26 lines not shown; context: "entry:"]
%6 = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> %w, <8 x i16> %x, <8 x i1> %5, <8 x i16> %v)		%6 = call <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16> %w, <8 x i16> %x, <8 x i1> %5, <8 x i16> %v)
ret <8 x i16> %6		ret <8 x i16> %6
}		}

declare i32 @llvm.arm.mve.pred.v2i.v8i1(<8 x i1>)		declare i32 @llvm.arm.mve.pred.v2i.v8i1(<8 x i1>)
declare <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32)		declare <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32)
declare <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16>, <8 x i16>, <8 x i1>, <8 x i16>)		declare <8 x i16> @llvm.arm.mve.add.predicated.v8i16.v8i1(<8 x i16>, <8 x i16>, <8 x i1>, <8 x i16>)
declare <8 x i1> @llvm.arm.mve.vctp16(i32)		declare <8 x i1> @llvm.arm.mve.vctp16(i32)

llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll

	; RUN: opt -S -instcombine %s | FileCheck %s			; RUN: opt -S -instcombine -mtriple=amdgcn-amd-amdhsa %s | FileCheck %s

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.buffer.load			; llvm.amdgcn.buffer.load
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	; CHECK-LABEL: @buffer_load_f32(			; CHECK-LABEL: @buffer_load_f32(
	; CHECK-NEXT: %data = call float @llvm.amdgcn.buffer.load.f32(<4 x i32> %rsrc, i32 %idx, i32 %ofs, i1 false, i1 false)			; CHECK-NEXT: %data = call float @llvm.amdgcn.buffer.load.f32(<4 x i32> %rsrc, i32 %idx, i32 %ofs, i1 false, i1 false)
	; CHECK-NEXT: ret float %data			; CHECK-NEXT: ret float %data
[3,763 lines not shown]

llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -instcombine -S < %s | FileCheck %s			; RUN: opt -mtriple=amdgcn-amd-amdhsa -instcombine -S < %s | FileCheck %s

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.rcp			; llvm.amdgcn.rcp
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare float @llvm.amdgcn.rcp.f32(float) nounwind readnone			declare float @llvm.amdgcn.rcp.f32(float) nounwind readnone
	declare double @llvm.amdgcn.rcp.f64(double) nounwind readnone			declare double @llvm.amdgcn.rcp.f64(double) nounwind readnone

[2,794 lines not shown]

llvm/test/Transforms/InstCombine/AMDGPU/ldexp.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -mtriple=amdgcn-amd-amdhsa -instcombine -S | FileCheck %s

	define float @ldexp_f32_undef_undef() {			define float @ldexp_f32_undef_undef() {
	; CHECK-LABEL: @ldexp_f32_undef_undef(			; CHECK-LABEL: @ldexp_f32_undef_undef(
	; CHECK-NEXT: ret float 0x7FF8000000000000			; CHECK-NEXT: ret float 0x7FF8000000000000
	;			;
	%call = call float @llvm.amdgcn.ldexp.f32(float undef, i32 undef)			%call = call float @llvm.amdgcn.ldexp.f32(float undef, i32 undef)
	ret float %call			ret float %call
	}			}
[332 lines not shown]

llvm/test/Transforms/InstCombine/ARM/mve-v2i2v.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -instcombine -S -o - %s | FileCheck %s			; RUN: opt -instcombine -S -mtriple=arm -o - %s | FileCheck %s

	target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"			target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

	declare i32 @llvm.arm.mve.pred.v2i.v4i1(<4 x i1>)			declare i32 @llvm.arm.mve.pred.v2i.v4i1(<4 x i1>)
	declare i32 @llvm.arm.mve.pred.v2i.v8i1(<8 x i1>)			declare i32 @llvm.arm.mve.pred.v2i.v8i1(<8 x i1>)
	declare i32 @llvm.arm.mve.pred.v2i.v16i1(<16 x i1>)			declare i32 @llvm.arm.mve.pred.v2i.v16i1(<16 x i1>)

	declare <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32)			declare <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32)
[320 lines not shown]

llvm/test/Transforms/InstCombine/ARM/neon-intrinsics.ll

	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=arm -S | FileCheck %s

	; The alignment arguments for NEON load/store intrinsics can be increased			; The alignment arguments for NEON load/store intrinsics can be increased
	; by instcombine. Check for this.			; by instcombine. Check for this.

	; CHECK: vld4.v2i32.p0i8({{.*}}, i32 32)			; CHECK: vld4.v2i32.p0i8({{.*}}, i32 32)
	; CHECK: vst4.p0i8.v2i32({{.*}}, i32 16)			; CHECK: vst4.p0i8.v2i32({{.*}}, i32 16)

	@x = common global [8 x i32] zeroinitializer, align 32			@x = common global [8 x i32] zeroinitializer, align 32
[16 lines not shown]

llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll

	; Check that nvvm intrinsics get simplified to target-generic intrinsics where			; Check that nvvm intrinsics get simplified to target-generic intrinsics where
	; possible.			; possible.
	;			;
	; We run this test twice; once with ftz on, and again with ftz off. Behold the			; We run this test twice; once with ftz on, and again with ftz off. Behold the
	; hackery:			; hackery:

	; RUN: cat %s > %t.ftz			; RUN: cat %s > %t.ftz
	; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "preserve-sign" }' >> %t.ftz			; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "preserve-sign" }' >> %t.ftz
	; RUN: opt < %t.ftz -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=FTZ			; RUN: opt < %t.ftz -instcombine -mtriple=nvptx64-nvidia-cuda -S | FileCheck %s --check-prefix=CHECK --check-prefix=FTZ

	; RUN: cat %s > %t.noftz			; RUN: cat %s > %t.noftz
	; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "ieee" }' >> %t.noftz			; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "ieee" }' >> %t.noftz
	; RUN: opt < %t.noftz -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=NOFTZ			; RUN: opt < %t.noftz -instcombine -mtriple=nvptx64-nvidia-cuda -S | FileCheck %s --check-prefix=CHECK --check-prefix=NOFTZ

	; We handle nvvm intrinsics with ftz variants as follows:			; We handle nvvm intrinsics with ftz variants as follows:
	; - If the module is in ftz mode, the ftz variant is transformed into the			; - If the module is in ftz mode, the ftz variant is transformed into the
	; regular llvm intrinsic, and the non-ftz variant is left alone.			; regular llvm intrinsic, and the non-ftz variant is left alone.
	; - If the module is not in ftz mode, it's the reverse: Only the non-ftz			; - If the module is not in ftz mode, it's the reverse: Only the non-ftz
	; variant is transformed, and the ftz variant is left alone.			; variant is transformed, and the ftz variant is left alone.

	; Check NVVM intrinsics that map directly to LLVM target-generic intrinsics.			; Check NVVM intrinsics that map directly to LLVM target-generic intrinsics.
[450 lines not shown]
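The ftz rule described in the test's header comment reduces to one comparison: for an nvvm intrinsic that has both ftz and non-ftz variants, a call is simplified only when its variant matches the module's mode. A minimal sketch of that rule follows; `shouldSimplifyNvvmIntrinsic` and its parameters are hypothetical names for illustration and do not come from the patch.

```cpp
#include <cassert>

// Hypothetical helper modelling the rule from the test comment:
// - module in ftz mode:     only the ftz variant is transformed
// - module not in ftz mode: only the non-ftz variant is transformed
bool shouldSimplifyNvvmIntrinsic(bool IsFtzVariant, bool ModuleInFtzMode) {
  return IsFtzVariant == ModuleInFtzMode;
}
```

This covers only intrinsics that come in ftz and non-ftz pairs; intrinsics without such variants are outside the rule quoted above.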

llvm/test/Transforms/InstCombine/X86/X86FsubCmpCombine.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s

	; The test checks the folding of cmp(sub(a,b),0) into cmp(a,b).			; The test checks the folding of cmp(sub(a,b),0) into cmp(a,b).

	define i8 @sub_compare_foldingPD128_safe(<2 x double> %a, <2 x double> %b){			define i8 @sub_compare_foldingPD128_safe(<2 x double> %a, <2 x double> %b){
	; CHECK-LABEL: @sub_compare_foldingPD128_safe(			; CHECK-LABEL: @sub_compare_foldingPD128_safe(
	; CHECK-NEXT: [[SUB_SAFE:%.*]] = fsub <2 x double> [[A:%.*]], [[B:%.*]]			; CHECK-NEXT: [[SUB_SAFE:%.*]] = fsub <2 x double> [[A:%.*]], [[B:%.*]]
	; CHECK-NEXT: [[T0:%.*]] = call <2 x i1> @llvm.x86.avx512.cmp.pd.128(<2 x double> [[SUB_SAFE]], <2 x double> zeroinitializer, i32 5)			; CHECK-NEXT: [[T0:%.*]] = call <2 x i1> @llvm.x86.avx512.cmp.pd.128(<2 x double> [[SUB_SAFE]], <2 x double> zeroinitializer, i32 5)
	; CHECK-NEXT: [[T1:%.*]] = shufflevector <2 x i1> [[T0]], <2 x i1> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 2, i32 3, i32 2, i32 3>			; CHECK-NEXT: [[T1:%.*]] = shufflevector <2 x i1> [[T0]], <2 x i1> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 2, i32 3, i32 2, i32 3>
[200 lines not shown]

llvm/test/Transforms/InstCombine/X86/addcarry.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s

	declare { i8, i32 } @llvm.x86.addcarry.32(i8, i32, i32)			declare { i8, i32 } @llvm.x86.addcarry.32(i8, i32, i32)
	declare { i8, i64 } @llvm.x86.addcarry.64(i8, i64, i64)			declare { i8, i64 } @llvm.x86.addcarry.64(i8, i64, i64)

	define i32 @no_carryin_i32(i32 %x, i32 %y, i8* %p) {			define i32 @no_carryin_i32(i32 %x, i32 %y, i8* %p) {
	; CHECK-LABEL: @no_carryin_i32(			; CHECK-LABEL: @no_carryin_i32(
	; CHECK-NEXT: [[TMP1:%.*]] = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 [[X:%.*]], i32 [[Y:%.*]])			; CHECK-NEXT: [[TMP1:%.*]] = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 [[X:%.*]], i32 [[Y:%.*]])
	; CHECK-NEXT: [[TMP2:%.*]] = extractvalue { i32, i1 } [[TMP1]], 0			; CHECK-NEXT: [[TMP2:%.*]] = extractvalue { i32, i1 } [[TMP1]], 0
[19 lines not shown]
	; CHECK-NEXT: ret i64 [[TMP2]]			; CHECK-NEXT: ret i64 [[TMP2]]
	;			;
	%s = call { i8, i64 } @llvm.x86.addcarry.64(i8 0, i64 %x, i64 %y)			%s = call { i8, i64 } @llvm.x86.addcarry.64(i8 0, i64 %x, i64 %y)
	%ov = extractvalue { i8, i64 } %s, 0			%ov = extractvalue { i8, i64 } %s, 0
	store i8 %ov, i8* %p			store i8 %ov, i8* %p
	%r = extractvalue { i8, i64 } %s, 1			%r = extractvalue { i8, i64 } %s, 1
	ret i64 %r			ret i64 %r
	}			}

llvm/test/Transforms/InstCombine/X86/clmulqdq.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s

	declare <2 x i64> @llvm.x86.pclmulqdq(<2 x i64>, <2 x i64>, i8)			declare <2 x i64> @llvm.x86.pclmulqdq(<2 x i64>, <2 x i64>, i8)
	declare <4 x i64> @llvm.x86.pclmulqdq.256(<4 x i64>, <4 x i64>, i8)			declare <4 x i64> @llvm.x86.pclmulqdq.256(<4 x i64>, <4 x i64>, i8)
	declare <8 x i64> @llvm.x86.pclmulqdq.512(<8 x i64>, <8 x i64>, i8)			declare <8 x i64> @llvm.x86.pclmulqdq.512(<8 x i64>, <8 x i64>, i8)

	define <2 x i64> @test_demanded_elts_pclmulqdq_0(<2 x i64> %a0, <2 x i64> %a1) {			define <2 x i64> @test_demanded_elts_pclmulqdq_0(<2 x i64> %a0, <2 x i64> %a1) {
	; CHECK-LABEL: @test_demanded_elts_pclmulqdq_0(			; CHECK-LABEL: @test_demanded_elts_pclmulqdq_0(
	; CHECK-NEXT: [[TMP1:%.*]] = call <2 x i64> @llvm.x86.pclmulqdq(<2 x i64> [[A0:%.*]], <2 x i64> [[A1:%.*]], i8 0)			; CHECK-NEXT: [[TMP1:%.*]] = call <2 x i64> @llvm.x86.pclmulqdq(<2 x i64> [[A0:%.*]], <2 x i64> [[A1:%.*]], i8 0)
[256 lines not shown]

llvm/test/Transforms/InstCombine/X86/x86-avx2.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; Verify that instcombine is able to fold identity shuffles.			; Verify that instcombine is able to fold identity shuffles.

	define <8 x i32> @identity_test_vpermd(<8 x i32> %a0) {			define <8 x i32> @identity_test_vpermd(<8 x i32> %a0) {
	; CHECK-LABEL: @identity_test_vpermd(			; CHECK-LABEL: @identity_test_vpermd(
	; CHECK-NEXT: ret <8 x i32> [[A0:%.*]]			; CHECK-NEXT: ret <8 x i32> [[A0:%.*]]
	;			;
[100 lines not shown]

llvm/test/Transforms/InstCombine/X86/x86-avx512.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	declare <4 x float> @llvm.x86.avx512.mask.add.ss.round(<4 x float>, <4 x float>, <4 x float>, i8, i32)			declare <4 x float> @llvm.x86.avx512.mask.add.ss.round(<4 x float>, <4 x float>, <4 x float>, i8, i32)

	define <4 x float> @test_add_ss(<4 x float> %a, <4 x float> %b) {			define <4 x float> @test_add_ss(<4 x float> %a, <4 x float> %b) {
	; CHECK-LABEL: @test_add_ss(			; CHECK-LABEL: @test_add_ss(
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[A:%.*]], i64 0			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[A:%.*]], i64 0
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B:%.*]], i64 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B:%.*]], i64 0

llvm/test/Transforms/InstCombine/X86/x86-bmi-tbm.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s

	declare i32 @llvm.x86.tbm.bextri.u32(i32, i32) nounwind readnone			declare i32 @llvm.x86.tbm.bextri.u32(i32, i32) nounwind readnone
	declare i64 @llvm.x86.tbm.bextri.u64(i64, i64) nounwind readnone			declare i64 @llvm.x86.tbm.bextri.u64(i64, i64) nounwind readnone
	declare i32 @llvm.x86.bmi.bextr.32(i32, i32) nounwind readnone			declare i32 @llvm.x86.bmi.bextr.32(i32, i32) nounwind readnone
	declare i64 @llvm.x86.bmi.bextr.64(i64, i64) nounwind readnone			declare i64 @llvm.x86.bmi.bextr.64(i64, i64) nounwind readnone
	declare i32 @llvm.x86.bmi.bzhi.32(i32, i32) nounwind readnone			declare i32 @llvm.x86.bmi.bzhi.32(i32, i32) nounwind readnone
	declare i64 @llvm.x86.bmi.bzhi.64(i64, i64) nounwind readnone			declare i64 @llvm.x86.bmi.bzhi.64(i64, i64) nounwind readnone
	declare i32 @llvm.x86.bmi.pext.32(i32, i32) nounwind readnone			declare i32 @llvm.x86.bmi.pext.32(i32, i32) nounwind readnone

llvm/test/Transforms/InstCombine/X86/x86-insertps.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s

	declare <4 x float> @llvm.x86.sse41.insertps(<4 x float>, <4 x float>, i8) nounwind readnone			declare <4 x float> @llvm.x86.sse41.insertps(<4 x float>, <4 x float>, i8) nounwind readnone

	; If all zero mask bits are set, return a zero regardless of the other control bits.			; If all zero mask bits are set, return a zero regardless of the other control bits.

	define <4 x float> @insertps_0x0f(<4 x float> %v1, <4 x float> %v2) {			define <4 x float> @insertps_0x0f(<4 x float> %v1, <4 x float> %v2) {
	; CHECK-LABEL: @insertps_0x0f(			; CHECK-LABEL: @insertps_0x0f(
	; CHECK-NEXT: ret <4 x float> zeroinitializer			; CHECK-NEXT: ret <4 x float> zeroinitializer

llvm/test/Transforms/InstCombine/X86/x86-masked-memops.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s

	;; MASKED LOADS			;; MASKED LOADS

	; If the mask isn't constant, do nothing.			; If the mask isn't constant, do nothing.

	define <4 x float> @mload(i8* %f, <4 x i32> %mask) {			define <4 x float> @mload(i8* %f, <4 x i32> %mask) {
	; CHECK-LABEL: @mload(			; CHECK-LABEL: @mload(
	; CHECK-NEXT: [[LD:%.*]] = tail call <4 x float> @llvm.x86.avx.maskload.ps(i8* [[F:%.*]], <4 x i32> [[MASK:%.*]])			; CHECK-NEXT: [[LD:%.*]] = tail call <4 x float> @llvm.x86.avx.maskload.ps(i8* [[F:%.*]], <4 x i32> [[MASK:%.*]])
	declare void @llvm.x86.avx.maskstore.pd.256(i8*, <4 x i64>, <4 x double>)			declare void @llvm.x86.avx.maskstore.pd.256(i8*, <4 x i64>, <4 x double>)

	declare void @llvm.x86.avx2.maskstore.d(i8*, <4 x i32>, <4 x i32>)			declare void @llvm.x86.avx2.maskstore.d(i8*, <4 x i32>, <4 x i32>)
	declare void @llvm.x86.avx2.maskstore.q(i8*, <2 x i64>, <2 x i64>)			declare void @llvm.x86.avx2.maskstore.q(i8*, <2 x i64>, <2 x i64>)
	declare void @llvm.x86.avx2.maskstore.d.256(i8*, <8 x i32>, <8 x i32>)			declare void @llvm.x86.avx2.maskstore.d.256(i8*, <8 x i32>, <8 x i32>)
	declare void @llvm.x86.avx2.maskstore.q.256(i8*, <4 x i64>, <4 x i64>)			declare void @llvm.x86.avx2.maskstore.q.256(i8*, <4 x i64>, <4 x i64>)

	declare void @llvm.x86.sse2.maskmov.dqu(<16 x i8>, <16 x i8>, i8*)			declare void @llvm.x86.sse2.maskmov.dqu(<16 x i8>, <16 x i8>, i8*)

llvm/test/Transforms/InstCombine/X86/x86-movmsk.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	;			;
	; DemandedBits - MOVMSK zeros the upper bits of the result.			; DemandedBits - MOVMSK zeros the upper bits of the result.
	;			;

	define i32 @test_upper_x86_mmx_pmovmskb(x86_mmx %a0) {			define i32 @test_upper_x86_mmx_pmovmskb(x86_mmx %a0) {

llvm/test/Transforms/InstCombine/X86/x86-pack.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s

	;			;
	; UNDEF Elts			; UNDEF Elts
	;			;

	define <8 x i16> @undef_packssdw_128() {			define <8 x i16> @undef_packssdw_128() {
	; CHECK-LABEL: @undef_packssdw_128(			; CHECK-LABEL: @undef_packssdw_128(
	; CHECK-NEXT: ret <8 x i16> undef			; CHECK-NEXT: ret <8 x i16> undef

llvm/test/Transforms/InstCombine/X86/x86-pshufb.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s

	; Verify that instcombine is able to fold identity shuffles.			; Verify that instcombine is able to fold identity shuffles.

	define <16 x i8> @identity_test(<16 x i8> %InVec) {			define <16 x i8> @identity_test(<16 x i8> %InVec) {
	; CHECK-LABEL: @identity_test(			; CHECK-LABEL: @identity_test(
	; CHECK-NEXT: ret <16 x i8> [[INVEC:%.*]]			; CHECK-NEXT: ret <16 x i8> [[INVEC:%.*]]
	;			;
	%1 = tail call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %InVec, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)			%1 = tail call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %InVec, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)

llvm/test/Transforms/InstCombine/X86/x86-sse.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define float @test_rcp_ss_0(float %a) {			define float @test_rcp_ss_0(float %a) {
	; CHECK-LABEL: @test_rcp_ss_0(			; CHECK-LABEL: @test_rcp_ss_0(
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> undef, float [[A:%.*]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> undef, float [[A:%.*]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = tail call <4 x float> @llvm.x86.sse.rcp.ss(<4 x float> [[TMP1]])			; CHECK-NEXT: [[TMP2:%.*]] = tail call <4 x float> @llvm.x86.sse.rcp.ss(<4 x float> [[TMP1]])
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
	; CHECK-NEXT: ret float [[TMP3]]			; CHECK-NEXT: ret float [[TMP3]]

llvm/test/Transforms/InstCombine/X86/x86-sse2.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define double @test_sqrt_sd_0(double %a) {			define double @test_sqrt_sd_0(double %a) {
	; CHECK-LABEL: @test_sqrt_sd_0(			; CHECK-LABEL: @test_sqrt_sd_0(
	; CHECK-NEXT: [[TMP1:%.*]] = call double @llvm.sqrt.f64(double [[A:%.*]])			; CHECK-NEXT: [[TMP1:%.*]] = call double @llvm.sqrt.f64(double [[A:%.*]])
	; CHECK-NEXT: ret double [[TMP1]]			; CHECK-NEXT: ret double [[TMP1]]
	;			;
	%1 = insertelement <2 x double> undef, double %a, i32 0			%1 = insertelement <2 x double> undef, double %a, i32 0

llvm/test/Transforms/InstCombine/X86/x86-sse41.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define <2 x double> @test_round_sd(<2 x double> %a, <2 x double> %b) {			define <2 x double> @test_round_sd(<2 x double> %a, <2 x double> %b) {
	; CHECK-LABEL: @test_round_sd(			; CHECK-LABEL: @test_round_sd(
	; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> [[A:%.*]], <2 x double> [[B:%.*]], i32 10)			; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> [[A:%.*]], <2 x double> [[B:%.*]], i32 10)
	; CHECK-NEXT: ret <2 x double> [[TMP1]]			; CHECK-NEXT: ret <2 x double> [[TMP1]]
	;			;
	%1 = insertelement <2 x double> %a, double 1.000000e+00, i32 0			%1 = insertelement <2 x double> %a, double 1.000000e+00, i32 0

llvm/test/Transforms/InstCombine/X86/x86-sse4a.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s

	;			;
	; EXTRQ			; EXTRQ
	;			;

	define <2 x i64> @test_extrq_call(<2 x i64> %x, <16 x i8> %y) {			define <2 x i64> @test_extrq_call(<2 x i64> %x, <16 x i8> %y) {
	; CHECK-LABEL: @test_extrq_call(			; CHECK-LABEL: @test_extrq_call(
	; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x i64> @llvm.x86.sse4a.extrq(<2 x i64> [[X:%.*]], <16 x i8> [[Y:%.*]]) #1			; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x i64> @llvm.x86.sse4a.extrq(<2 x i64> [[X:%.*]], <16 x i8> [[Y:%.*]]) #1

llvm/test/Transforms/InstCombine/X86/x86-vec_demanded_elts.ll

	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define i16 @test1(float %f) {			define i16 @test1(float %f) {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: [[TMP281:%.*]] = fadd float %f, -1.000000e+00			; CHECK-NEXT: [[TMP281:%.*]] = fadd float %f, -1.000000e+00
	; CHECK-NEXT: [[TMP373:%.*]] = fmul float [[TMP281]], 5.000000e-01			; CHECK-NEXT: [[TMP373:%.*]] = fmul float [[TMP281]], 5.000000e-01
	; CHECK-NEXT: [[TMP374:%.*]] = insertelement <4 x float> undef, float [[TMP373]], i32 0			; CHECK-NEXT: [[TMP374:%.*]] = insertelement <4 x float> undef, float [[TMP373]], i32 0
	; CHECK-NEXT: [[TMP48:%.*]] = tail call <4 x float> @llvm.x86.sse.min.ss(<4 x float> [[TMP374]], <4 x float> <float 6.553500e+04, float undef, float undef, float undef>)			; CHECK-NEXT: [[TMP48:%.*]] = tail call <4 x float> @llvm.x86.sse.min.ss(<4 x float> [[TMP374]], <4 x float> <float 6.553500e+04, float undef, float undef, float undef>)

llvm/test/Transforms/InstCombine/X86/x86-vector-shifts.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	;			;
	; ASHR - Immediate			; ASHR - Immediate
	;			;

	define <8 x i16> @sse2_psrai_w_0(<8 x i16> %v) {			define <8 x i16> @sse2_psrai_w_0(<8 x i16> %v) {
	; CHECK-LABEL: @sse2_psrai_w_0(			; CHECK-LABEL: @sse2_psrai_w_0(

llvm/test/Transforms/InstCombine/X86/x86-vpermil.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; Verify that instcombine is able to fold identity shuffles.			; Verify that instcombine is able to fold identity shuffles.

	define <4 x float> @identity_test_vpermilvar_ps(<4 x float> %v) {			define <4 x float> @identity_test_vpermilvar_ps(<4 x float> %v) {
	; CHECK-LABEL: @identity_test_vpermilvar_ps(			; CHECK-LABEL: @identity_test_vpermilvar_ps(
	; CHECK-NEXT: ret <4 x float> [[V:%.*]]			; CHECK-NEXT: ret <4 x float> [[V:%.*]]
	;			;

llvm/test/Transforms/InstCombine/X86/x86-xop.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s

	define <2 x double> @test_vfrcz_sd(<2 x double> %a) {			define <2 x double> @test_vfrcz_sd(<2 x double> %a) {
	; CHECK-LABEL: @test_vfrcz_sd(			; CHECK-LABEL: @test_vfrcz_sd(
	; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.xop.vfrcz.sd(<2 x double> [[A:%.*]])			; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.xop.vfrcz.sd(<2 x double> [[A:%.*]])
	; CHECK-NEXT: ret <2 x double> [[TMP1]]			; CHECK-NEXT: ret <2 x double> [[TMP1]]
	;			;
	%1 = insertelement <2 x double> %a, double 1.000000e+00, i32 1			%1 = insertelement <2 x double> %a, double 1.000000e+00, i32 1
	%2 = tail call <2 x double> @llvm.x86.xop.vfrcz.sd(<2 x double> %1)			%2 = tail call <2 x double> @llvm.x86.xop.vfrcz.sd(<2 x double> %1)

[InstCombine] Add target-specific inst combining (Closed, Public)

Revision Contents

Diff 273425
