This is an archive of the discontinued LLVM Phabricator instance.

Differential D81728

[InstCombine] Add target-specific inst combining
ClosedPublic

Authored by Flakebi on Jun 12 2020, 3:08 AM.

Details

Reviewers

nhaehnle
majnemer
spatel
lebedev.ri
lattner

Commits

rG2a6c871596ce: [InstCombine] Move target-specific inst combining

Summary

Targets can combine intrinsics in
TargetTransformInfo::instCombineIntrinsic.
This allows accessing target specific features and combining
instructions only if the target supports certain features.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
110 ms	Clang.CodeGen::Unknown Unit Message ("")
240 ms	Clang.CodeGen::Unknown Unit Message ("")

Event Timeline

Flakebi created this revision. · Jun 12 2020, 3:08 AM

Herald added subscribers: llvm-commits, hiraditya. · Jun 12 2020, 3:08 AM

lebedev.ri added a reviewer: spatel. · Jun 12 2020, 3:32 AM

lebedev.ri added a subscriber: lebedev.ri.

lebedev.ri added inline comments.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3781	This opens the floodgates to InstCombine no longer being a target-independent canonicalization pass, which is dangerous.

Harbormaster failed remote builds in B60093: Diff 270348! · Jun 12 2020, 4:17 AM

To add more context to this, the problem I am facing is that amdgpu image intrinsics are usually called with float arguments. However, on some subtargets/hardware generations it is possible to call them with half arguments.
If llvm is compiling for such a subtarget, it is beneficial to combine

%s32 = fpext half %s to float
call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(…, float %s32, …)

into

call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(…, half %s, …)

This combines instructions, so I think it belongs into the InstCombine pass. On the other hand, the f16 form of the intrinsics is not available on all targets, so this combination cannot be applied unconditionally but it needs to be gated depending on the target.
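
To make the gating concrete, here is a minimal, self-contained sketch. It deliberately avoids the LLVM API: `Subtarget`, `SupportsHalfCoords`, `ImageSampleCall` and `narrowCoord` are hypothetical stand-ins invented for illustration, and the real combine would of course operate on IR instructions rather than on strings.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins; none of these are LLVM types.
struct Subtarget {
  bool SupportsHalfCoords; // assumed feature bit: image ops accept half args
};

struct ImageSampleCall {
  std::string CoordType;    // "float" or "half"
  bool CoordIsExtendedHalf; // true if the float coordinate is `fpext half`
};

// Returns the narrowed call if the fold applies, std::nullopt otherwise.
std::optional<ImageSampleCall> narrowCoord(const Subtarget &ST,
                                           const ImageSampleCall &Call) {
  assert((Call.CoordType == "float" || Call.CoordType == "half") &&
         "sketch only models float and half coordinates");
  // Gate on the target: without the feature the f16 form does not exist.
  if (!ST.SupportsHalfCoords)
    return std::nullopt;
  // Only fold when the float operand is really an extended half.
  if (Call.CoordType != "float" || !Call.CoordIsExtendedHalf)
    return std::nullopt;
  return ImageSampleCall{"half", false};
}
```

The point of the sketch is only that the same pattern match yields a rewrite on one subtarget and nothing on another, which is why the decision needs target information.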

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3781	That is the point of this change, to allow target-dependent combinations in TargetTransformInfo::instCombineIntrinsic. Imo, all the target specific intrinsic combinations in InstCombineCalls.cpp (x86, amdgpu, etc.) can be moved to their respective target. I don’t have a great overview of LLVM, so I might be wrong on this.

lebedev.ri added inline comments. · Jun 12 2020, 6:23 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3781	"Imo, all the target specific intrinsic combinations in InstCombineCalls.cpp (x86, amdgpu, etc.) can be moved to their respective target." I agree with that, yes. The problem I'm seeing is that even having TTI in the pass "significantly" lowers the barrier to entry for then using TTI to guard some generic transforms in InstCombine.

In D81728#2089644, @Flakebi wrote:

This combines instructions, so I think it belongs into the InstCombine pass. On the other hand, the f16 form of the intrinsics is not available on all targets, so this combination cannot be applied unconditionally but it needs to be gated depending on the target.

The fact that this pass recognizes target-specific intrinsics at all is widely regarded as a mistake:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/102317.html

Target-specific transforms should look first at codegen combiners (SDAG or GlobalISel). If that's too late, consider a target-specific IR codegen pass (I think AMDGPU has a few examples of this already). If that's still too late, write a generic IR transform pass that accesses TTI?

In D81728#2089713, @spatel wrote:

The fact that this pass recognizes target-specific intrinsics at all is widely regarded as a mistake:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/102317.html

Target-specific transforms should look first at codegen combiners (SDAG or GlobalISel). If that's too late, consider a target-specific IR codegen pass (I think AMDGPU has a few examples of this already). If that's still too late, write a generic IR transform pass that accesses TTI?

The problem with all of these suggestions is that they're likely technically-inferior solutions compared to sitting inside of InstCombine's fixed-point iteration scheme. Honestly, I think that the way we should ensure that InstCombine does not start using TTI to define a canonical form for non-target-specific intrinsics is via documentation and code review. InstCombine has long had logic to deal with target-specific intrinsics (in InstCombineCalls.cpp), and refactoring things so that this logic can live in each backend seems like an improvement to me.

nikic added a subscriber: nikic. · Jun 13 2020, 4:11 AM

nikic added inline comments. · Jun 13 2020, 5:49 AM

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
149	Actually implementing this would require us to export the `InstCombiner` class, which is part of `InstCombineInternal.h`. I don't think we would want to do this in its current form. This would require a larger refactoring to separate out the implementation and API portions of InstCombine.

foad added a subscriber: foad. · Jun 16 2020, 6:34 AM

Summarizing the comments, the important points are:

- Everyone agrees on moving target specific stuff out of Transforms/InstCombine into target specific folders.
- Keep running the instruction combining in the InstCombine pass, so the fixed-point iteration works.

The majority of target specific code is intrinsic combining, there is only one more amdgpu specific part in InstCombineSimplifyDemanded.cpp:SimplifyDemandedVectorElts. Unless someone has an idea on how to implement this in a more generic way, I’ll keep it like in the current diff, only combining intrinsics in TargetTransformInfo::instCombineIntrinsic.

Actually implementing this would require us to export the InstCombiner class, which is part of InstCombineInternal.h. I don't think we would want to do this in its current form. This would require a larger refactoring to separate out the implementation and API portions of InstCombine.

Good point, I’ll try to add that here in the next week.

Moved most target specific InstCombine parts to their respective targets.
The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. Is there a place where code for these targets is shared?

The gist of these changes is in the following files:

llvm/include/llvm/Analysis/TargetTransformInfo.h
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
llvm/include/llvm/CodeGen/BasicTTIImpl.h
llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
llvm/lib/Analysis/TargetTransformInfo.cpp
llvm/lib/Transforms/InstCombine/InstCombineInternal.h
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

The rest of the changes move about 3000 lines out of InstCombine into the targets and slightly adjust them for the new interface; there should be no other changes in there.

Herald added a reviewer: lebedev.ri. · Jun 24 2020, 8:52 AM

Herald added subscribers: kerbowa, dmgreen, jfb and 6 others.

As far as I know and I might be wrong, but TargetTransformInfo up til now has only provided information. It doesn't do any transforms itself. Is adding transforms to it the right thing to do?

Herald added a subscriber: wuzish. · Jun 24 2020, 9:42 AM

Harbormaster failed remote builds in B61563: Diff 273054! · Jun 24 2020, 10:48 AM

In D81728#2111901, @craig.topper wrote:

As far as I know and I might be wrong, but TargetTransformInfo up til now has only provided information. It doesn't do any transforms itself. Is adding transforms to it the right thing to do?

This isn't strictly true. I recently added rewriteIntrinsicWithAddressSpace for example

In D81728#2112158, @arsenm wrote:

This isn't strictly true. I recently added rewriteIntrinsicWithAddressSpace for example

I stand corrected then.

In D81728#2112483, @craig.topper wrote:

I stand corrected then.

This may be the only example though. I may have introduced something conceptually new without realizing it. The current use also doesn't exactly make the change. It does introduce new instructions, but the pass is still responsible for doing the replacement/delete of the old value

In D81728#2112558, @arsenm wrote:

This may be the only example though. I may have introduced something conceptually new without realizing it. The current use also doesn't exactly make the change. It does introduce new instructions, but the pass is still responsible for doing the replacement/delete of the old value

I guess it also modifies the original instruction in place in some cases

Adjusted the failing clang test; TargetIRAnalysis is now run earlier.

Herald added a project: Restricted Project. · Jun 25 2020, 10:08 AM

Herald added subscribers: cfe-commits, dexonsmith, steven_wu.

Harbormaster failed remote builds in B61775: Diff 273425! · Jun 25 2020, 10:15 AM

Rebased, so the automatic builds can run

Harbormaster failed remote builds in B61790: Diff 273458! · Jun 25 2020, 12:29 PM

dexonsmith removed a subscriber: dexonsmith. · Jun 25 2020, 2:06 PM

We've been handling target-specific intrinsics in InstCombine for a long time, and that's the place where they should naturally sit. This is a pretty clean refactoring in my opinion, I'm in favor. It's substantial enough as a change that it should probably receive a heads-up on llvm-dev, though.

I think an interface usable by InstructionSimplify would be helpful too; that would probably be a separate thing from TTI.

This combines instructions, so I think it belongs into the InstCombine pass. On the other hand, the f16 form of the intrinsics is not available on all targets, so this combination cannot be applied unconditionally but it needs to be gated depending on the target.

I don't think this is a great justification for doing anything here. You can always reverse the transform in isel on targets where it isn't supported; adding more IR patterns increases the potential for missed optimizations.

That said, I think moving the handling for target intrinsics into the target makes sense as a cleanup.

llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1444 ↗	(On Diff #273458)	Is there some way we can check that an intrinsic is actually target-specific, to discourage people from handling generic intrinsics in target-specific ways?

foad added a subscriber: bogner. · Jun 30 2020, 1:14 AM

foad added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1444 ↗	(On Diff #273458)	That was the intent of @bogner's rG92a8c6112c6571112e8b622bfddc7e4d1685a6fe.

Rebased; target-specific combining is now called only for target-specific intrinsics, as suggested.
Added Function::isTargetIntrinsic() for this purpose.
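
The dispatch rule can be sketched in a few lines. This is a toy model, not the LLVM implementation: it assumes, purely for illustration, that an intrinsic counts as target-specific when its numeric ID lies above a boundary, and `LastGenericID`, `isTargetIntrinsicID`, `targetHook` and `combineIntrinsic` are all hypothetical names.

```cpp
#include <cassert>
#include <optional>

using IntrinsicID = unsigned;

// Hypothetical boundary: IDs up to this value are "generic" in this sketch.
constexpr IntrinsicID LastGenericID = 100;

constexpr bool isTargetIntrinsicID(IntrinsicID ID) {
  return ID > LastGenericID;
}

// Hypothetical target hook. It pretends every intrinsic it sees can be
// simplified and returns the ID as a stand-in for the replacement.
std::optional<unsigned> targetHook(IntrinsicID ID) {
  assert(isTargetIntrinsicID(ID) && "hook must only see target intrinsics");
  return ID;
}

// The pass-side gate: generic intrinsics never reach the target hook.
std::optional<unsigned> combineIntrinsic(IntrinsicID ID) {
  if (!isTargetIntrinsicID(ID))
    return std::nullopt;
  return targetHook(ID);
}
```

Putting the check on the pass side rather than in each target means a target cannot accidentally define its own canonical form for a generic intrinsic.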

Harbormaster failed remote builds in B62312: Diff 274436! · Jun 30 2020, 7:01 AM

This looks like a great direction, but please make sure to minimize public implementation details. We don't want the vast majority of instcombine to be visible outside of its library (it is hairy enough as it is :-)

llvm/include/llvm/Analysis/TargetTransformInfo.h
29	Can this be forward declared instead of #include'd?
llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
30 ↗	(On Diff #274436)	Please minimize #includes in general, thanks :)
46 ↗	(On Diff #274436)	I would really rather not make this be a public class - this is a very thick interface. Can this be cut down to something much smaller than the implementation details of InstCombine? If you're curious for a pattern that could be followed, the MLIR AsmParser is a reasonable example. The parser is spread across a bunch of classes in the lib/ directory: https://github.com/llvm/llvm-project/blob/master/mlir/lib/Parser/Parser.cpp But then there is a much smaller public API exposed through a header: https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/IR/OpImplementation.h#L229

This revision now requires changes to proceed. · Jun 30 2020, 1:24 PM

nhaehnle added inline comments. · Jul 1 2020, 7:08 AM

llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
46 ↗	(On Diff #274436)	I agree with the sentiment, but note @Flakebi has split up the `InstCombiner` class into `InstCombiner` and `InstCombinerImpl` classes, which addresses those concerns already as far as I'm concerned. Looking through the new `InstCombiner`, aside from methods that are core to the workings of InstCombine (modifying instructions while keeping track of the Worklist) and methods for accessing the analyses, what's left is:
- A bunch of static methods that should arguably just be global functions in a utils header somewhere.
- CreateOverflowTuple and CreateNonTerminatorUnreachable.
Moving those methods feels sensible, but is likely to touch a lot of code, so I think it would be better to do it in a separate commit.

RKSimon added a subscriber: RKSimon. · Jul 2 2020, 12:20 AM

Rebased and removed a few includes as suggested.
Made the TargetTransformInfo a private member of InstCombiner because it should not be used in general inst combines.
Moved CreateOverflowTuple out of InstCombiner and made CreateNonTerminatorUnreachable static.

I would really rather not make this be a public class - this is a very thick interface. Can this be cut down to something much smaller than the implementation details of InstCombine?

I agree that keeping the public interface small is desirable, and I tried to do that by splitting the class into InstCombiner (the internal, public interface) and InstCombinerImpl (the actual implementation of the pass).
As far as I understand it, LLVM_LIBRARY_VISIBILITY hides this class so it is not visible outside LLVM?

With this change, inst combining is split across several places: the general InstCombine and all the targets. They do similar things, with the difference that the inst combining part inside the targets only has access to the public InstCombiner interface.
As the target specific parts want to use the same helper methods, these helpers need to be in a public interface (public to the targets, not to LLVM users). The most prominent of these helpers is peekThroughBitcast.

Some of these helper functions are currently not used by targets, so they can be moved to a utils header if desired. In general, I think we want them to be shared, so that not every target has its own set of helpers.
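
The interface/implementation split discussed here can be illustrated with a small, self-contained sketch. It is not the actual class layout from the patch: `Combiner`, `CombinerImpl`, `addToWorklist`, `worklistSize` and `targetHook` are hypothetical names, and instructions are modelled as plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Narrow interface that target code is allowed to see (the role the thread
// assigns to InstCombiner). Hypothetical; instructions are just ints here.
class Combiner {
public:
  virtual ~Combiner() = default;

  // Shared helper: available to the pass and to every target.
  void addToWorklist(int InstId) { Worklist.push_back(InstId); }
  std::size_t worklistSize() const { return Worklist.size(); }

  // Implemented by the pass; returns true if anything was processed.
  virtual bool run() = 0;

protected:
  std::vector<int> Worklist;
};

// Pass-internal implementation (the role of InstCombinerImpl). Targets never
// name this type, so its details can change freely.
class CombinerImpl final : public Combiner {
public:
  bool run() override {
    bool Changed = !Worklist.empty();
    Worklist.clear();
    return Changed;
  }
};

// Target-side code depends only on the narrow interface.
void targetHook(Combiner &IC, int InstId) {
  assert(InstId >= 0 && "sketch uses non-negative ids");
  IC.addToWorklist(InstId);
}
```

Anything a target needs has to be lifted into the base class, which is the trade-off raised above: shared helpers stay in one place, but the public surface grows with every helper that is exposed.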

Harbormaster failed remote builds in B62975: Diff 275617! · Jul 6 2020, 3:42 AM

sameerds added a subscriber: sameerds. · Jul 6 2020, 10:26 PM

Rebased (no conflicts this time).

Friendly ping for review.

Harbormaster completed remote builds in B63722: Diff 276983. · Jul 10 2020, 5:04 AM

nikic added inline comments. · Jul 10 2020, 9:57 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
537	For all three functions, the calling convention seems rather non-idiomatic for InstCombine. Rather than having an `Instruction *` argument and bool result, is there any reason not to have an `Instruction *` return value, with nullptr indicating that the intrinsic couldn't be simplified?
539	`const APInt &DemandedMask`?
543	`const APInt &DemandedElts`?

Flakebi marked an inline comment as done. · Jul 10 2020, 12:22 PM

Flakebi added inline comments.

llvm/include/llvm/Analysis/TargetTransformInfo.h
537	Yes, the function must have the option to return a nullptr and prevent `visitCallBase` from being called or other code from being executed after `instCombineIntrinsic`. So the caller must somehow be able to tell the difference between 'do nothing, just continue execution' and 'return this Instruction', where the `Instruction` can also be a nullptr. The return type could be an `optional<Instruction*>`. I’ll take a look at your other comments on Monday.
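
The three outcomes described in this comment can be sketched with `std::optional` standing in for LLVM's `Optional`. Everything below is a hypothetical model: `Instruction` is an empty stand-in, and `targetCombine`, `visitIntrinsic` and the `Outcome` enum exist only to make the control flow visible.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in; this is not llvm::Instruction.
struct Instruction {
  int Id;
};

enum class Outcome { ContinueGeneric, StopWithNull, StopWithInst };

// Hypothetical target hook with the tri-state return discussed above.
std::optional<Instruction *> targetCombine(int Kind, Instruction *Replacement) {
  assert(Replacement && "the sketch always passes a candidate replacement");
  switch (Kind) {
  case 0:
    return std::nullopt; // not handled: caller continues as usual
  case 1:
    return static_cast<Instruction *>(nullptr); // handled: stop, return null
  default:
    return Replacement; // handled: stop, return this instruction
  }
}

// Caller side: an engaged optional ends the visit, even if it holds nullptr.
Outcome visitIntrinsic(int Kind, Instruction *Replacement) {
  if (std::optional<Instruction *> V = targetCombine(Kind, Replacement))
    return *V ? Outcome::StopWithInst : Outcome::StopWithNull;
  return Outcome::ContinueGeneric; // generic handling would run here
}
```

A plain `Instruction *` return cannot express the middle case, because nullptr would be ambiguous between "not handled" and "handled, nothing to return".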

Please don't consider me a blocker on this patch, thank you for pushing on it!

Flakebi marked an inline comment as done. · Jul 13 2020, 3:04 AM

Flakebi added inline comments.

llvm/include/llvm/Analysis/TargetTransformInfo.h
539	I tried to change it to `const APInt &DemandedMask`, but the x86 simplifyDemandedVectorEltsIntrinsic changes `DemandedMask`, so this function would have to copy it or take a non-const reference. Looking more into it, `SimplifyAndSetOp` takes `DemandedElts` by value too. An `APInt` consists of a `uint64_t` and an `unsigned`, so it should be 16 bytes in most cases. It only comes with an allocation if the represented integer is larger than 64 bits. I guess copying should be fine. If you think it should be a reference anyway, let me know and I’ll change it.
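
The by-value versus const-reference trade-off can be shown with a toy type. `Mask` below is a hypothetical stand-in for a mask of at most 64 bits, not `llvm::APInt`, and both functions are invented for illustration. On common 64-bit ABIs such a struct occupies 16 bytes, which matches the size estimate above, but that is platform-dependent.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a mask of at most 64 bits; not llvm::APInt.
struct Mask {
  std::uint64_t Bits;
  unsigned Width;
};

// By-value parameter: the callee may narrow its own copy freely.
std::uint64_t simplifyByValue(Mask Demanded) {
  assert(Demanded.Width <= 64 && "sketch only models inline storage");
  Demanded.Bits &= 0x0F; // local change, invisible to the caller
  return Demanded.Bits;
}

// const-reference parameter: a callee that wants to modify the mask has to
// make the copy itself, so nothing is saved in that case.
std::uint64_t simplifyByConstRef(const Mask &Demanded) {
  assert(Demanded.Width <= 64 && "sketch only models inline storage");
  Mask Local = Demanded;
  Local.Bits &= 0x0F;
  return Local.Bits;
}
```

In both variants the caller's mask is left untouched; the difference is only where the copy happens.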

Rebased and added some docs.

Is there anything left that needs to be done before this can be pushed?

foad added inline comments. · Jul 17 2020, 4:46 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
541–544	Did you consider returning `std::pair<bool,Instruction*>`?

Harbormaster failed remote builds in B64652: Diff 278711! · Jul 17 2020, 4:51 AM

Here you go.

Change return types of TargetTransformInfo::instCombineIntrinsic and others to Optional<Instruction *> and Optional<Value *>.

Harbormaster failed remote builds in B64664: Diff 278735! · Jul 17 2020, 6:33 AM

dmgreen added inline comments. · Jul 21 2020, 2:39 AM

llvm/test/CodeGen/Thumb2/mve-intrinsics/predicates.ll
2 ↗	(On Diff #278735)	Please use the same triple as llc for any test with "mve" in the title.

Rebased and fixed the triple for the Thumb2 tests as suggested.

Thanks

Harbormaster failed remote builds in B65051: Diff 279463! · Jul 21 2020, 4:16 AM

This has had a month of good review that has been addressed, I'd say it's good to go.

This revision is now accepted and ready to land. · Jul 21 2020, 10:41 AM

Closed by commit rG2a6c871596ce: [InstCombine] Move target-specific inst combining (authored by sebastian-ne). · Jul 22 2020, 7:00 AM

This revision was automatically updated to reflect the committed changes.

sebastian-ne mentioned this in rG2a6c871596ce: [InstCombine] Move target-specific inst combining.

I have a multi-stage, auto-git-bisecting bot that has identified this commit as the source of a regression on Fedora 32 (x86-64). This commit broke my first-stage test (release, no asserts). Might a quick fix happen, or do we need to revert this?

FAIL: Clang :: CodeGen/aarch64-bf16-ldst-intrinsics.c (7188 of 67650)
******************** TEST 'Clang :: CodeGen/aarch64-bf16-ldst-intrinsics.c' FAILED ********************
Script:
--
: 'RUN: at line 1';   /tmp/_update_lc/r/bin/clang -cc1 -internal-isystem /tmp/_update_lc/r/lib/clang/12.0.0/include -nostdsysteminc -triple aarch64-arm-none-eabi -target-feature +neon -target-feature +bf16   -O2 -emit-llvm /home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c -o - | /tmp/_update_lc/r/bin/FileCheck /home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c --check-prefixes=CHECK,CHECK64
: 'RUN: at line 3';   /tmp/_update_lc/r/bin/clang -cc1 -internal-isystem /tmp/_update_lc/r/lib/clang/12.0.0/include -nostdsysteminc -triple armv8.6a-arm-none-eabi -target-feature +neon -target-feature +bf16 -mfloat-abi hard   -O2 -emit-llvm /home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c -o - | /tmp/_update_lc/r/bin/FileCheck /home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c --check-prefixes=CHECK,CHECK32
--
Exit Code: 1

Command Output (stderr):
--
/home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c:14:13: error: CHECK32: expected string not found in input
// CHECK32: %1 = load <4 x bfloat>, <4 x bfloat>* %0, align 2
            ^
<stdin>:7:52: note: scanning from here
define arm_aapcs_vfpcc <4 x bfloat> @test_vld1_bf16(bfloat* readonly %ptr) local_unnamed_addr #0 {
                                                   ^
<stdin>:10:5: note: possible intended match here
 %vld1 = tail call <4 x bfloat> @llvm.arm.neon.vld1.v4bf16.p0i8(i8* %0, i32 2)
    ^
/home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c:23:13: error: CHECK32: expected string not found in input
// CHECK32: %1 = load <8 x bfloat>, <8 x bfloat>* %0, align 2
            ^
<stdin>:18:53: note: scanning from here
define arm_aapcs_vfpcc <8 x bfloat> @test_vld1q_bf16(bfloat* readonly %ptr) local_unnamed_addr #2 {
                                                    ^
<stdin>:21:5: note: possible intended match here
 %vld1 = tail call <8 x bfloat> @llvm.arm.neon.vld1.v8bf16.p0i8(i8* %0, i32 2)
    ^

Input file: <stdin>
Check file: /home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: ; ModuleID = '/home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c'
            2: source_filename = "/home/dave/ro_s/lp/clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c"
            3: target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
            4: target triple = "armv8.6a-arm-none-eabi"
            5:
            6: ; Function Attrs: nounwind readonly
            7: define arm_aapcs_vfpcc <4 x bfloat> @test_vld1_bf16(bfloat* readonly %ptr) local_unnamed_addr #0 {
check:14'0                                                        X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            8: entry:
check:14'0     ~~~~~~
            9:  %0 = bitcast bfloat* %ptr to i8*
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           10:  %vld1 = tail call <4 x bfloat> @llvm.arm.neon.vld1.v4bf16.p0i8(i8* %0, i32 2)
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:14'1         ?                                                                          possible intended match
           11:  ret <4 x bfloat> %vld1
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~
           12: }
check:14'0     ~
           13:
check:14'0     ~
           14: ; Function Attrs: argmemonly nounwind readonly
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           15: declare <4 x bfloat> @llvm.arm.neon.vld1.v4bf16.p0i8(i8*, i32) #1
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           16:
check:14'0     ~
           17: ; Function Attrs: nounwind readonly
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           18: define arm_aapcs_vfpcc <8 x bfloat> @test_vld1q_bf16(bfloat* readonly %ptr) local_unnamed_addr #2 {
check:14'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:23'0                                                         X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
           19: entry:
check:23'0     ~~~~~~
           20:  %0 = bitcast bfloat* %ptr to i8*
check:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           21:  %vld1 = tail call <8 x bfloat> @llvm.arm.neon.vld1.v8bf16.p0i8(i8* %0, i32 2)
check:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:23'1         ?                                                                          possible intended match
           22:  ret <8 x bfloat> %vld1
check:23'0     ~~~~~~~~~~~~~~~~~~~~~~~
           23: }
check:23'0     ~
           24:
check:23'0     ~
           25: ; Function Attrs: argmemonly nounwind readonly
check:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           26: declare <8 x bfloat> @llvm.arm.neon.vld1.v8bf16.p0i8(i8*, i32) #1
check:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            .
            .
            .
>>>>>>

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
********************
Failed Tests (1):
  Clang :: CodeGen/aarch64-bf16-ldst-intrinsics.c


Testing Time: 71.60s
  Unsupported      : 10693
  Passed           : 56854
  Expectedly Failed:   102
  Failed           :     1

Thanks for the notification @davezarzycki, an auto-bisecting bot is cool!

This failure should be fixed in b99898c1e9c5d8bade1d898e84604d3241b0087c.

spatel mentioned this in D111500: [InstSimplify] Simplify intrinsic comparisons with domain knoweldge. · Oct 11 2021, 1:39 PM

Allen added a subscriber: Allen. · Oct 21 2022, 9:32 PM

Herald added a project: Restricted Project. · Oct 21 2022, 9:32 PM

Herald added subscribers: nlopes, kosarev, mattd and 4 others.

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	7 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	4 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	4 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	5 lines
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp	12 lines
llvm/lib/Transforms/InstCombine/InstCombineInternal.h	16 lines
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp	21 lines

Diff 270348

llvm/include/llvm/Analysis/TargetTransformInfo.h

[20 lines not shown]
#ifndef LLVM_ANALYSIS_TARGETTRANSFORMINFO_H		#ifndef LLVM_ANALYSIS_TARGETTRANSFORMINFO_H
#define LLVM_ANALYSIS_TARGETTRANSFORMINFO_H		#define LLVM_ANALYSIS_TARGETTRANSFORMINFO_H

#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/PassManager.h"		#include "llvm/IR/PassManager.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/AtomicOrdering.h"		#include "llvm/Support/AtomicOrdering.h"
#include "llvm/Support/DataTypes.h"		#include "llvm/Support/DataTypes.h"
#include <functional>		#include <functional>

namespace llvm {		namespace llvm {

namespace Intrinsic {		namespace Intrinsic {
typedef unsigned ID;		typedef unsigned ID;
}		}

class AssumptionCache;		class AssumptionCache;
class BlockFrequencyInfo;		class BlockFrequencyInfo;
class DominatorTree;		class DominatorTree;
class BranchInst;		class BranchInst;
class CallBase;		class CallBase;
class Function;		class Function;
class GlobalValue;		class GlobalValue;
		class InstCombiner;
class IntrinsicInst;		class IntrinsicInst;
class LoadInst;		class LoadInst;
class LoopAccessInfo;		class LoopAccessInfo;
class Loop;		class Loop;
class LoopInfo;		class LoopInfo;
class ProfileSummaryInfo;		class ProfileSummaryInfo;
class SCEV;		class SCEV;
class ScalarEvolution;		class ScalarEvolution;
[475 lines not shown]	bool preferPredicateOverEpilogue(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
DominatorTree *DT,		DominatorTree *DT,
const LoopAccessInfo *LAI) const;		const LoopAccessInfo *LAI) const;

/// Query the target whether lowering of the llvm.get.active.lane.mask		/// Query the target whether lowering of the llvm.get.active.lane.mask
/// intrinsic is supported and if emitting it is desired for this loop.		/// intrinsic is supported and if emitting it is desired for this loop.
bool emitGetActiveLaneMask(Loop *L, LoopInfo *LI, ScalarEvolution &SE,		bool emitGetActiveLaneMask(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
bool TailFolded) const;		bool TailFolded) const;

		Value *instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const;

/// @}		/// @}

/// \name Scalar Target Information		/// \name Scalar Target Information
/// @{		/// @{

/// Flags indicating the kind of support for population count.		/// Flags indicating the kind of support for population count.
///		///
/// Compared to the SW implementation, HW support is supposed to		/// Compared to the SW implementation, HW support is supposed to
/// significantly boost the performance when the population is dense, and it		/// significantly boost the performance when the population is dense, and it
/// may or may not degrade performance if the population is sparse. A HW		/// may or may not degrade performance if the population is sparse. A HW
/// support is considered as "Fast" if it can outperform, or is on a par		/// support is considered as "Fast" if it can outperform, or is on a par
/// with, SW implementation when the population is sparse; otherwise, it is		/// with, SW implementation when the population is sparse; otherwise, it is
/// considered as "Slow".		/// considered as "Slow".
enum PopcntSupportKind { PSK_Software, PSK_SlowHardware, PSK_FastHardware };		enum PopcntSupportKind { PSK_Software, PSK_SlowHardware, PSK_FastHardware };

[706 lines not shown]	virtual bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE,
TargetLibraryInfo *LibInfo,		TargetLibraryInfo *LibInfo,
HardwareLoopInfo &HWLoopInfo) = 0;		HardwareLoopInfo &HWLoopInfo) = 0;
virtual bool		virtual bool
preferPredicateOverEpilogue(Loop *L, LoopInfo *LI, ScalarEvolution &SE,		preferPredicateOverEpilogue(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
AssumptionCache &AC, TargetLibraryInfo *TLI,		AssumptionCache &AC, TargetLibraryInfo *TLI,
DominatorTree *DT, const LoopAccessInfo *LAI) = 0;		DominatorTree *DT, const LoopAccessInfo *LAI) = 0;
virtual bool emitGetActiveLaneMask(Loop *L, LoopInfo *LI, ScalarEvolution &SE,		virtual bool emitGetActiveLaneMask(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
bool TailFolded) = 0;		bool TailFolded) = 0;
		virtual Value *instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) = 0;
virtual bool isLegalAddImmediate(int64_t Imm) = 0;		virtual bool isLegalAddImmediate(int64_t Imm) = 0;
virtual bool isLegalICmpImmediate(int64_t Imm) = 0;		virtual bool isLegalICmpImmediate(int64_t Imm) = 0;
virtual bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV,		virtual bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV,
int64_t BaseOffset, bool HasBaseReg,		int64_t BaseOffset, bool HasBaseReg,
int64_t Scale, unsigned AddrSpace,		int64_t Scale, unsigned AddrSpace,
Instruction *I) = 0;		Instruction *I) = 0;
virtual bool isLSRCostLess(TargetTransformInfo::LSRCost &C1,		virtual bool isLSRCostLess(TargetTransformInfo::LSRCost &C1,
TargetTransformInfo::LSRCost &C2) = 0;		TargetTransformInfo::LSRCost &C2) = 0;
[267 lines not shown]	bool preferPredicateOverEpilogue(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
DominatorTree *DT,		DominatorTree *DT,
const LoopAccessInfo *LAI) override {		const LoopAccessInfo *LAI) override {
return Impl.preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI);		return Impl.preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI);
}		}
bool emitGetActiveLaneMask(Loop *L, LoopInfo *LI, ScalarEvolution &SE,		bool emitGetActiveLaneMask(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
bool TailFolded) override {		bool TailFolded) override {
return Impl.emitGetActiveLaneMask(L, LI, SE, TailFolded);		return Impl.emitGetActiveLaneMask(L, LI, SE, TailFolded);
}		}
		Value *instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) override {
		return Impl.instCombineIntrinsic(IC, II);
		}
bool isLegalAddImmediate(int64_t Imm) override {		bool isLegalAddImmediate(int64_t Imm) override {
return Impl.isLegalAddImmediate(Imm);		return Impl.isLegalAddImmediate(Imm);
}		}
bool isLegalICmpImmediate(int64_t Imm) override {		bool isLegalICmpImmediate(int64_t Imm) override {
return Impl.isLegalICmpImmediate(Imm);		return Impl.isLegalICmpImmediate(Imm);
}		}
bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,		bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
bool HasBaseReg, int64_t Scale, unsigned AddrSpace,		bool HasBaseReg, int64_t Scale, unsigned AddrSpace,
[502 lines not shown]
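
The header changes above thread the new hook through the type-erased TargetTransformInfo wrapper: a pure virtual declaration in the concept class, a forwarding override in the model, and a default in the implementation base that returns nullptr. The following is a minimal, self-contained sketch of that forwarding pattern, not the LLVM code. `Value`, `IntrinsicInst` and `InstCombiner` are empty stand-ins for the real classes, and `TransformInfo`, `DefaultImpl`, `ToyTargetImpl` and the intrinsic ID 7 are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-ins for the LLVM types (assumptions, not the real classes).
struct Value { int Id; };
struct IntrinsicInst { int IntrinsicID; };
struct InstCombiner {};

// Default implementation layer: no target-specific combine is available.
struct DefaultImpl {
  Value *instCombineIntrinsic(InstCombiner &, IntrinsicInst &) const {
    return nullptr;
  }
};

// Hypothetical target: folds the made-up intrinsic ID 7 to a fixed value.
struct ToyTargetImpl {
  mutable Value Folded{42};
  Value *instCombineIntrinsic(InstCombiner &, IntrinsicInst &II) const {
    return II.IntrinsicID == 7 ? &Folded : nullptr;
  }
};

// Type-erased wrapper: callers see one class, targets plug in via Model<T>.
class TransformInfo {
  struct Concept {
    virtual ~Concept() = default;
    virtual Value *instCombineIntrinsic(InstCombiner &IC,
                                        IntrinsicInst &II) = 0;
  };

  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    Value *instCombineIntrinsic(InstCombiner &IC,
                                IntrinsicInst &II) override {
      return Impl.instCombineIntrinsic(IC, II);
    }
  };

  std::unique_ptr<Concept> TTIImpl;

public:
  template <typename T>
  explicit TransformInfo(T Impl)
      : TTIImpl(std::make_unique<Model<T>>(std::move(Impl))) {}

  // Public entry point: forwards to whichever implementation was installed.
  Value *instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
    assert(TTIImpl && "wrapper must hold an implementation");
    return TTIImpl->instCombineIntrinsic(IC, II);
  }
};
```

In this sketch a caller constructs `TransformInfo` from either implementation and treats a null result as "no target-specific combine applied", which matches the default shown in the TargetTransformInfoImpl.h hunk.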

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

[140 lines not shown]	bool preferPredicateOverEpilogue(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
return false;		return false;
}		}

bool emitGetActiveLaneMask(Loop *L, LoopInfo *LI, ScalarEvolution &SE,		bool emitGetActiveLaneMask(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
bool TailFold) const {		bool TailFold) const {
return false;		return false;
}		}

		Value *instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
		nikic: Actually implementing this would require us to export the `InstCombiner` class, which is part of `InstCombineInternal.h`. I don't think we would want to do this in its current form. This would require a larger refactoring to separate out the implementation and API portions of InstCombine.
		return nullptr;
		}

void getUnrollingPreferences(Loop *, ScalarEvolution &,		void getUnrollingPreferences(Loop *, ScalarEvolution &,
TTI::UnrollingPreferences &) {}		TTI::UnrollingPreferences &) {}

bool isLegalAddImmediate(int64_t Imm) { return false; }		bool isLegalAddImmediate(int64_t Imm) { return false; }

bool isLegalICmpImmediate(int64_t Imm) { return false; }		bool isLegalICmpImmediate(int64_t Imm) { return false; }

bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,		bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
[830 lines not shown]

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[461 lines not shown]	bool preferPredicateOverEpilogue(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
return BaseT::preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI);		return BaseT::preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI);
}		}

bool emitGetActiveLaneMask(Loop *L, LoopInfo *LI, ScalarEvolution &SE,		bool emitGetActiveLaneMask(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
bool TailFold) {		bool TailFold) {
return BaseT::emitGetActiveLaneMask(L, LI, SE, TailFold);		return BaseT::emitGetActiveLaneMask(L, LI, SE, TailFold);
}		}

		Value *instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) {
		return BaseT::instCombineIntrinsic(IC, II);
		}

int getInstructionLatency(const Instruction *I) {		int getInstructionLatency(const Instruction *I) {
if (isa<LoadInst>(I))		if (isa<LoadInst>(I))
return getST()->getSchedModel().DefaultLoadLatency;		return getST()->getSchedModel().DefaultLoadLatency;

return BaseT::getInstructionLatency(I);		return BaseT::getInstructionLatency(I);
}		}

virtual Optional<unsigned>		virtual Optional<unsigned>
[1,382 lines not shown]

llvm/lib/Analysis/TargetTransformInfo.cpp

[312 lines not shown]	bool TargetTransformInfo::preferPredicateOverEpilogue(
return TTIImpl->preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI);		return TTIImpl->preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI);
}		}

bool TargetTransformInfo::emitGetActiveLaneMask(Loop *L, LoopInfo *LI,		bool TargetTransformInfo::emitGetActiveLaneMask(Loop *L, LoopInfo *LI,
ScalarEvolution &SE, bool TailFolded) const {		ScalarEvolution &SE, bool TailFolded) const {
return TTIImpl->emitGetActiveLaneMask(L, LI, SE, TailFolded);		return TTIImpl->emitGetActiveLaneMask(L, LI, SE, TailFolded);
}		}

		Value *TargetTransformInfo::instCombineIntrinsic(InstCombiner &IC,
		IntrinsicInst &II) const {
		return TTIImpl->instCombineIntrinsic(IC, II);
		}

void TargetTransformInfo::getUnrollingPreferences(		void TargetTransformInfo::getUnrollingPreferences(
Loop *L, ScalarEvolution &SE, UnrollingPreferences &UP) const {		Loop *L, ScalarEvolution &SE, UnrollingPreferences &UP) const {
return TTIImpl->getUnrollingPreferences(L, SE, UP);		return TTIImpl->getUnrollingPreferences(L, SE, UP);
}		}

bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const {		bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const {
return TTIImpl->isLegalAddImmediate(Imm);		return TTIImpl->isLegalAddImmediate(Imm);
}		}
[1,092 lines not shown]

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

[21 lines not shown]
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/Analysis/AssumeBundleQueries.h"		#include "llvm/Analysis/AssumeBundleQueries.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/Loads.h"		#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/MemoryBuiltins.h"		#include "llvm/Analysis/MemoryBuiltins.h"
		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Analysis/VectorUtils.h"		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalVariable.h"		#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsX86.h"
#include "llvm/IR/IntrinsicsARM.h"
#include "llvm/IR/IntrinsicsAArch64.h"		#include "llvm/IR/IntrinsicsAArch64.h"
		#include "llvm/IR/IntrinsicsAMDGPU.h"
		#include "llvm/IR/IntrinsicsARM.h"
#include "llvm/IR/IntrinsicsHexagon.h"		#include "llvm/IR/IntrinsicsHexagon.h"
#include "llvm/IR/IntrinsicsNVPTX.h"		#include "llvm/IR/IntrinsicsNVPTX.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/IntrinsicsPowerPC.h"		#include "llvm/IR/IntrinsicsPowerPC.h"
		#include "llvm/IR/IntrinsicsX86.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Statepoint.h"		#include "llvm/IR/Statepoint.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/User.h"		#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
[1,899 lines not shown]	auto SimplifyDemandedVectorEltsLow = [this](Value *Op, unsigned Width,
unsigned DemandedWidth) {		unsigned DemandedWidth) {
APInt UndefElts(Width, 0);		APInt UndefElts(Width, 0);
APInt DemandedElts = APInt::getLowBitsSet(Width, DemandedWidth);		APInt DemandedElts = APInt::getLowBitsSet(Width, DemandedWidth);
return SimplifyDemandedVectorElts(Op, DemandedElts, UndefElts);		return SimplifyDemandedVectorElts(Op, DemandedElts, UndefElts);
};		};

Intrinsic::ID IID = II->getIntrinsicID();		Intrinsic::ID IID = II->getIntrinsicID();
switch (IID) {		switch (IID) {
default: break;
case Intrinsic::objectsize:		case Intrinsic::objectsize:
if (Value *V = lowerObjectSizeCall(II, DL, &TLI, /*MustSucceed=*/false))		if (Value *V = lowerObjectSizeCall(II, DL, &TLI, /*MustSucceed=*/false))
return replaceInstUsesWith(CI, V);		return replaceInstUsesWith(CI, V);
return nullptr;		return nullptr;
case Intrinsic::bswap: {		case Intrinsic::bswap: {
Value *IIOperand = II->getArgOperand(0);		Value *IIOperand = II->getArgOperand(0);
Value *X = nullptr;		Value *X = nullptr;

[2,369 lines not shown]	if (match(NextInst,
}		}
replaceOperand(*II, 0, Builder.CreateAnd(CurrCond, NextCond));		replaceOperand(*II, 0, Builder.CreateAnd(CurrCond, NextCond));
}		}
eraseInstFromFunction(*NextInst);		eraseInstFromFunction(*NextInst);
return II;		return II;
}		}
break;		break;
}		}
		default: {
		if (Value *V = TTI.instCombineIntrinsic(*this, *II))
		return replaceInstUsesWith(*II, V);
		}
}		}
return visitCallBase(*II);		return visitCallBase(*II);
}		}

// Fence instruction simplification		// Fence instruction simplification
Instruction *InstCombiner::visitFenceInst(FenceInst &FI) {		Instruction *InstCombiner::visitFenceInst(FenceInst &FI) {
// Remove identical consecutive fences.		// Remove identical consecutive fences.
Instruction *Next = FI.getNextNonDebugInstruction();		Instruction *Next = FI.getNextNonDebugInstruction();
[791 lines not shown]
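
The hunk above moves `default:` to the end of the intrinsic switch so that any intrinsic without a generic case is first offered to the target hook, and the visitor falls through to the generic call handling only when the hook returns null. The sketch below illustrates that control flow in isolation. It is not the InstCombine code: `combineIntrinsic`, `TargetHook`, the `IntrinsicID` enumerators and the `GenericResult` parameter are hypothetical, and the final `return nullptr` merely stands in for `visitCallBase(*II)`.

```cpp
#include <cassert>
#include <functional>

// Stand-ins for LLVM types and intrinsic IDs (assumptions for illustration).
enum class IntrinsicID { Bswap, TargetA, TargetB };
struct Value { int Id; };
struct IntrinsicInst { IntrinsicID ID; };

// Hypothetical target hook: returns a replacement value, or nullptr when the
// target has no combine for this intrinsic.
using TargetHook = std::function<Value *(IntrinsicInst &)>;

// Generic cases are handled in the switch; everything else is offered to the
// target hook in the default case.
Value *combineIntrinsic(IntrinsicInst &II, Value &GenericResult,
                        const TargetHook &Hook) {
  assert(Hook && "a target hook must be provided");
  switch (II.ID) {
  case IntrinsicID::Bswap:
    // Target-independent combine: the hook is never consulted.
    return &GenericResult;
  default:
    if (Value *V = Hook(II))
      return V; // The target supplied a replacement.
    break;
  }
  // Stands in for the fallback to generic call handling.
  return nullptr;
}
```

The point of the ordering is that the target is consulted only for intrinsics the generic switch does not already handle.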

llvm/lib/Transforms/InstCombine/InstCombineInternal.h

[314 lines not shown]	private:
// Mode in which we are running the combiner.		// Mode in which we are running the combiner.
const bool MinimizeSize;		const bool MinimizeSize;

AliasAnalysis *AA;		AliasAnalysis *AA;

// Required analyses.		// Required analyses.
AssumptionCache &AC;		AssumptionCache &AC;
TargetLibraryInfo &TLI;		TargetLibraryInfo &TLI;
		TargetTransformInfo &TTI;
DominatorTree &DT;		DominatorTree &DT;
const DataLayout &DL;		const DataLayout &DL;
const SimplifyQuery SQ;		const SimplifyQuery SQ;
OptimizationRemarkEmitter &ORE;		OptimizationRemarkEmitter &ORE;
BlockFrequencyInfo *BFI;		BlockFrequencyInfo *BFI;
ProfileSummaryInfo *PSI;		ProfileSummaryInfo *PSI;

// Optional analyses. When non-null, these can both be used to do better		// Optional analyses. When non-null, these can both be used to do better
// combining and will be updated to reflect any changes.		// combining and will be updated to reflect any changes.
LoopInfo *LI;		LoopInfo *LI;

bool MadeIRChange = false;		bool MadeIRChange = false;

public:		public:
InstCombiner(InstCombineWorklist &Worklist, BuilderTy &Builder,		InstCombiner(InstCombineWorklist &Worklist, BuilderTy &Builder,
bool MinimizeSize, AliasAnalysis *AA,		bool MinimizeSize, AliasAnalysis *AA, AssumptionCache &AC,
AssumptionCache &AC, TargetLibraryInfo &TLI, DominatorTree &DT,		TargetLibraryInfo &TLI, TargetTransformInfo &TTI,
OptimizationRemarkEmitter &ORE, BlockFrequencyInfo *BFI,		DominatorTree &DT, OptimizationRemarkEmitter &ORE,
ProfileSummaryInfo *PSI, const DataLayout &DL, LoopInfo *LI)		BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI,
		const DataLayout &DL, LoopInfo *LI)
: Worklist(Worklist), Builder(Builder), MinimizeSize(MinimizeSize),		: Worklist(Worklist), Builder(Builder), MinimizeSize(MinimizeSize),
AA(AA), AC(AC), TLI(TLI), DT(DT),		AA(AA), AC(AC), TLI(TLI), TTI(TTI), DT(DT), DL(DL),
DL(DL), SQ(DL, &TLI, &DT, &AC), ORE(ORE), BFI(BFI), PSI(PSI), LI(LI) {}		SQ(DL, &TLI, &DT, &AC), ORE(ORE), BFI(BFI), PSI(PSI), LI(LI) {}

/// Run the combiner over the entire worklist until it is empty.		/// Run the combiner over the entire worklist until it is empty.
///		///
/// \returns true if the IR is changed.		/// \returns true if the IR is changed.
bool run();		bool run();

AssumptionCache &getAssumptionCache() const { return AC; }		AssumptionCache &getAssumptionCache() const { return AC; }

const DataLayout &getDataLayout() const { return DL; }		const DataLayout &getDataLayout() const { return DL; }

DominatorTree &getDominatorTree() const { return DT; }		DominatorTree &getDominatorTree() const { return DT; }

LoopInfo *getLoopInfo() const { return LI; }		LoopInfo *getLoopInfo() const { return LI; }

TargetLibraryInfo &getTargetLibraryInfo() const { return TLI; }		TargetLibraryInfo &getTargetLibraryInfo() const { return TLI; }

		TargetTransformInfo &getTargetTransformInfo() const { return TTI; }

// Visitation implementation - Implement instruction combining for different		// Visitation implementation - Implement instruction combining for different
// instruction types. The semantics are as follows:		// instruction types. The semantics are as follows:
// Return Value:		// Return Value:
// null - No change was made		// null - No change was made
// I - Change was made, I is still valid, I may be dead though		// I - Change was made, I is still valid, I may be dead though
// otherwise - Change was made, replace I with returned instruction		// otherwise - Change was made, replace I with returned instruction
//		//
Instruction *visitFNeg(UnaryOperator &I);		Instruction *visitFNeg(UnaryOperator &I);
[708 lines not shown]

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

[53 lines not shown]
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/LazyBlockFrequencyInfo.h"		#include "llvm/Analysis/LazyBlockFrequencyInfo.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/MemoryBuiltins.h"		#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"		#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/Analysis/TargetFolder.h"		#include "llvm/Analysis/TargetFolder.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Analysis/VectorUtils.h"		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
[3,584 lines not shown]	for (Instruction *Inst : reverse(InstrsForInstCombineWorklist)) {
ICWorklist.push(Inst);		ICWorklist.push(Inst);
}		}

return MadeIRChange;		return MadeIRChange;
}		}

static bool combineInstructionsOverFunction(		static bool combineInstructionsOverFunction(
Function &F, InstCombineWorklist &Worklist, AliasAnalysis *AA,		Function &F, InstCombineWorklist &Worklist, AliasAnalysis *AA,
AssumptionCache &AC, TargetLibraryInfo &TLI, DominatorTree &DT,		AssumptionCache &AC, TargetLibraryInfo &TLI, TargetTransformInfo &TTI,
OptimizationRemarkEmitter &ORE, BlockFrequencyInfo *BFI,		DominatorTree &DT, OptimizationRemarkEmitter &ORE, BlockFrequencyInfo *BFI,
ProfileSummaryInfo *PSI, unsigned MaxIterations, LoopInfo *LI) {		ProfileSummaryInfo *PSI, unsigned MaxIterations, LoopInfo *LI) {
auto &DL = F.getParent()->getDataLayout();		auto &DL = F.getParent()->getDataLayout();
MaxIterations = std::min(MaxIterations, LimitMaxIterations.getValue());		MaxIterations = std::min(MaxIterations, LimitMaxIterations.getValue());

/// Builder - This is an IRBuilder that automatically inserts new		/// Builder - This is an IRBuilder that automatically inserts new
/// instructions into the worklist when they are created.		/// instructions into the worklist when they are created.
IRBuilder<TargetFolder, IRBuilderCallbackInserter> Builder(		IRBuilder<TargetFolder, IRBuilderCallbackInserter> Builder(
F.getContext(), TargetFolder(DL),		F.getContext(), TargetFolder(DL),
[27 lines not shown]	if (Iteration > MaxIterations) {
break;		break;
}		}

LLVM_DEBUG(dbgs() << "\n\nINSTCOMBINE ITERATION #" << Iteration << " on "		LLVM_DEBUG(dbgs() << "\n\nINSTCOMBINE ITERATION #" << Iteration << " on "
<< F.getName() << "\n");		<< F.getName() << "\n");

MadeIRChange |= prepareICWorklistFromFunction(F, DL, &TLI, Worklist);		MadeIRChange |= prepareICWorklistFromFunction(F, DL, &TLI, Worklist);

InstCombiner IC(Worklist, Builder, F.hasMinSize(), AA,		InstCombiner IC(Worklist, Builder, F.hasMinSize(), AA, AC, TLI, TTI, DT,
AC, TLI, DT, ORE, BFI, PSI, DL, LI);		ORE, BFI, PSI, DL, LI);
IC.MaxArraySizeForCombine = MaxArraySize;		IC.MaxArraySizeForCombine = MaxArraySize;

if (!IC.run())		if (!IC.run())
break;		break;

MadeIRChange = true;		MadeIRChange = true;
}		}

return MadeIRChange;		return MadeIRChange;
}		}

InstCombinePass::InstCombinePass() : MaxIterations(LimitMaxIterations) {}		InstCombinePass::InstCombinePass() : MaxIterations(LimitMaxIterations) {}

InstCombinePass::InstCombinePass(unsigned MaxIterations)		InstCombinePass::InstCombinePass(unsigned MaxIterations)
: MaxIterations(MaxIterations) {}		: MaxIterations(MaxIterations) {}

PreservedAnalyses InstCombinePass::run(Function &F,		PreservedAnalyses InstCombinePass::run(Function &F,
FunctionAnalysisManager &AM) {		FunctionAnalysisManager &AM) {
auto &AC = AM.getResult<AssumptionAnalysis>(F);		auto &AC = AM.getResult<AssumptionAnalysis>(F);
auto &DT = AM.getResult<DominatorTreeAnalysis>(F);		auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);		auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);
auto &ORE = AM.getResult<OptimizationRemarkEmitterAnalysis>(F);		auto &ORE = AM.getResult<OptimizationRemarkEmitterAnalysis>(F);
		auto &TTI = AM.getResult<TargetIRAnalysis>(F);

auto *LI = AM.getCachedResult<LoopAnalysis>(F);		auto *LI = AM.getCachedResult<LoopAnalysis>(F);

auto *AA = &AM.getResult<AAManager>(F);		auto *AA = &AM.getResult<AAManager>(F);
auto &MAMProxy = AM.getResult<ModuleAnalysisManagerFunctionProxy>(F);		auto &MAMProxy = AM.getResult<ModuleAnalysisManagerFunctionProxy>(F);
ProfileSummaryInfo *PSI =		ProfileSummaryInfo *PSI =
MAMProxy.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());		MAMProxy.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());
auto *BFI = (PSI && PSI->hasProfileSummary()) ?		auto *BFI = (PSI && PSI->hasProfileSummary()) ?
&AM.getResult<BlockFrequencyAnalysis>(F) : nullptr;		&AM.getResult<BlockFrequencyAnalysis>(F) : nullptr;

if (!combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, DT, ORE, BFI,		if (!combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, TTI, DT, ORE,
PSI, MaxIterations, LI))		BFI, PSI, MaxIterations, LI))
// No changes, all analyses are preserved.		// No changes, all analyses are preserved.
return PreservedAnalyses::all();		return PreservedAnalyses::all();

// Mark all the analyses that instcombine updates as preserved.		// Mark all the analyses that instcombine updates as preserved.
PreservedAnalyses PA;		PreservedAnalyses PA;
PA.preserveSet<CFGAnalyses>();		PA.preserveSet<CFGAnalyses>();
PA.preserve<AAManager>();		PA.preserve<AAManager>();
PA.preserve<BasicAA>();		PA.preserve<BasicAA>();
PA.preserve<GlobalsAA>();		PA.preserve<GlobalsAA>();
return PA;		return PA;
}		}

void InstructionCombiningPass::getAnalysisUsage(AnalysisUsage &AU) const {		void InstructionCombiningPass::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
		AU.addRequired<TargetTransformInfoWrapperPass>();
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequired<OptimizationRemarkEmitterWrapperPass>();		AU.addRequired<OptimizationRemarkEmitterWrapperPass>();
AU.addPreserved<DominatorTreeWrapperPass>();		AU.addPreserved<DominatorTreeWrapperPass>();
AU.addPreserved<AAResultsWrapperPass>();		AU.addPreserved<AAResultsWrapperPass>();
AU.addPreserved<BasicAAWrapperPass>();		AU.addPreserved<BasicAAWrapperPass>();
AU.addPreserved<GlobalsAAWrapperPass>();		AU.addPreserved<GlobalsAAWrapperPass>();
AU.addRequired<ProfileSummaryInfoWrapperPass>();		AU.addRequired<ProfileSummaryInfoWrapperPass>();
LazyBlockFrequencyInfoPass::getLazyBFIAnalysisUsage(AU);		LazyBlockFrequencyInfoPass::getLazyBFIAnalysisUsage(AU);
}		}

bool InstructionCombiningPass::runOnFunction(Function &F) {		bool InstructionCombiningPass::runOnFunction(Function &F) {
if (skipFunction(F))		if (skipFunction(F))
return false;		return false;

// Required analyses.		// Required analyses.
auto AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();		auto AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);		auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
		auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
		lebedev.ri: This opens dangerous floodgates: instcombine would no longer be a target-independent canonicalization pass.
		Flakebi (author): That is the point of this change, to allow target-dependent combinations in TargetTransformInfo::instCombineIntrinsic. Imo, all the target-specific intrinsic combinations in InstCombineCalls.cpp (x86, amdgpu, etc.) can be moved to their respective target. I don’t have a great overview of LLVM, so I might be wrong on this.
		lebedev.ri: "Imo, all the target specific intrinsic combinations in InstCombineCalls.cpp (x86, amdgpu, etc.) can be moved to their respective target." I agree with that, yes. The problem I'm seeing is that even having TTI in the pass "significantly" lowers the barrier to entry for then using TTI to guard some generic transforms in instcombine.
auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();		auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
auto &ORE = getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE();		auto &ORE = getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE();

// Optional analyses.		// Optional analyses.
auto *LIWP = getAnalysisIfAvailable<LoopInfoWrapperPass>();		auto *LIWP = getAnalysisIfAvailable<LoopInfoWrapperPass>();
auto *LI = LIWP ? &LIWP->getLoopInfo() : nullptr;		auto *LI = LIWP ? &LIWP->getLoopInfo() : nullptr;
ProfileSummaryInfo *PSI =		ProfileSummaryInfo *PSI =
&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();		&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
BlockFrequencyInfo *BFI =		BlockFrequencyInfo *BFI =
(PSI && PSI->hasProfileSummary()) ?		(PSI && PSI->hasProfileSummary()) ?
&getAnalysis<LazyBlockFrequencyInfoPass>().getBFI() :		&getAnalysis<LazyBlockFrequencyInfoPass>().getBFI() :
nullptr;		nullptr;

return combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, DT, ORE, BFI,		return combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, TTI, DT, ORE,
PSI, MaxIterations, LI);		BFI, PSI, MaxIterations, LI);
}		}

char InstructionCombiningPass::ID = 0;		char InstructionCombiningPass::ID = 0;

InstructionCombiningPass::InstructionCombiningPass()		InstructionCombiningPass::InstructionCombiningPass()
: FunctionPass(ID), MaxIterations(InstCombineDefaultMaxIterations) {		: FunctionPass(ID), MaxIterations(InstCombineDefaultMaxIterations) {
initializeInstructionCombiningPassPass(*PassRegistry::getPassRegistry());		initializeInstructionCombiningPassPass(*PassRegistry::getPassRegistry());
}		}

InstructionCombiningPass::InstructionCombiningPass(unsigned MaxIterations)		InstructionCombiningPass::InstructionCombiningPass(unsigned MaxIterations)
: FunctionPass(ID), MaxIterations(MaxIterations) {		: FunctionPass(ID), MaxIterations(MaxIterations) {
initializeInstructionCombiningPassPass(*PassRegistry::getPassRegistry());		initializeInstructionCombiningPassPass(*PassRegistry::getPassRegistry());
}		}

INITIALIZE_PASS_BEGIN(InstructionCombiningPass, "instcombine",		INITIALIZE_PASS_BEGIN(InstructionCombiningPass, "instcombine",
"Combine redundant instructions", false, false)		"Combine redundant instructions", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(OptimizationRemarkEmitterWrapperPass)		INITIALIZE_PASS_DEPENDENCY(OptimizationRemarkEmitterWrapperPass)
INITIALIZE_PASS_DEPENDENCY(LazyBlockFrequencyInfoPass)		INITIALIZE_PASS_DEPENDENCY(LazyBlockFrequencyInfoPass)
INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)
INITIALIZE_PASS_END(InstructionCombiningPass, "instcombine",		INITIALIZE_PASS_END(InstructionCombiningPass, "instcombine",
"Combine redundant instructions", false, false)		"Combine redundant instructions", false, false)
[21 lines not shown]