This is an archive of the discontinued LLVM Phabricator instance.

Differential D154579

[InstCombine] Only perform one iteration
ClosedPublic

Authored by nikic on Jul 6 2023, 1:40 AM.

Details

Reviewers

goldstein.w.n
aeubanks
fhahn
efriedma
RKSimon

Commits

rG41895843b591: [InstCombine] Only perform one iteration

Summary

InstCombine is a worklist-driven algorithm, which works roughly as follows:

All instructions are initially pushed to the worklist. The initial order is (roughly) in program order / RPO.
All newly inserted instructions get added to the worklist.
When an instruction is folded, its users get added back to the worklist.
When the use-count of an instruction decreases, it gets added back to the worklist.
...plus a bunch of other heuristics on when we should revisit instructions.
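The worklist scheme above can be sketched on a toy IR. This is only an illustration under stated assumptions: `Inst` and `runWorklist` are hypothetical names, the only "fold" is constant-folding an add, and none of this is InstCombine's actual data model or API.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Toy instruction (hypothetical): a constant, or an add of two earlier
// instructions referenced by index.
struct Inst {
  bool IsConst;
  int Value; // meaningful only if IsConst
  int Lhs;   // operand index, meaningful only if !IsConst
  int Rhs;   // operand index, meaningful only if !IsConst
};

// Runs one worklist-driven pass of constant folding over Prog and returns the
// number of folds performed.
int runWorklist(std::vector<Inst> &Prog) {
  const int N = static_cast<int>(Prog.size());

  // Users[i] lists the instructions that use instruction i as an operand.
  std::vector<std::vector<int>> Users(Prog.size());
  for (int I = 0; I < N; ++I) {
    if (Prog[I].IsConst)
      continue;
    assert(Prog[I].Lhs >= 0 && Prog[I].Lhs < I && "operand must precede user");
    assert(Prog[I].Rhs >= 0 && Prog[I].Rhs < I && "operand must precede user");
    Users[Prog[I].Lhs].push_back(I);
    Users[Prog[I].Rhs].push_back(I);
  }

  // All instructions are initially pushed to the worklist in program order.
  std::deque<int> Worklist;
  std::vector<bool> Queued(Prog.size(), true);
  for (int I = 0; I < N; ++I)
    Worklist.push_back(I);

  int NumFolds = 0;
  while (!Worklist.empty()) {
    const int I = Worklist.front();
    Worklist.pop_front();
    Queued[I] = false;
    if (Prog[I].IsConst)
      continue;
    const Inst &L = Prog[Prog[I].Lhs];
    const Inst &R = Prog[Prog[I].Rhs];
    if (!L.IsConst || !R.IsConst)
      continue;
    // Fold: add(const, const) becomes a constant.
    Prog[I] = Inst{true, L.Value + R.Value, -1, -1};
    ++NumFolds;
    // When an instruction is folded, its users go back on the worklist
    // unless they are already pending.
    for (int U : Users[I]) {
      if (!Queued[U]) {
        Queued[U] = true;
        Worklist.push_back(U);
      }
    }
  }
  return NumFolds;
}
```

In this toy, operands always precede their users, so users are still pending when an operand folds and the re-queue step never fires. It is kept to mirror the rule described in the summary, which matters once folds can affect instructions that were already visited.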

On top of the worklist algorithm, InstCombine layers an additional fix-point iteration: If any fold was performed in the previous iteration, then InstCombine will re-populate the worklist from scratch and fold the entire function again. This continues until a fix-point is reached.
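A minimal sketch of such an outer fix-point loop, assuming a callback that runs one full worklist pass and reports whether it changed anything. The name `runToFixpoint` is hypothetical and not taken from LLVM.

```cpp
#include <cassert>
#include <functional>

// Calls RunOnce repeatedly until it reports "no change" or MaxIterations is
// reached. Returns the number of iterations that were executed.
unsigned runToFixpoint(const std::function<bool()> &RunOnce,
                       unsigned MaxIterations) {
  assert(RunOnce && "callback must be set");
  assert(MaxIterations > 0 && "need at least one iteration");
  unsigned Iteration = 0;
  while (Iteration < MaxIterations) {
    ++Iteration;
    // If nothing was folded in this iteration, the fix-point is confirmed.
    if (!RunOnce())
      break;
  }
  return Iteration;
}
```

With a large iteration limit, a function that needs a single round of folding still costs two iterations here: one that folds and one that only confirms nothing is left. Capping `MaxIterations` at 1 removes the confirming iteration.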

In the vast majority of cases, InstCombine will reach a fix-point within a single iteration. However, a second iteration is still performed to verify that this is indeed the fix-point. We can see this in the statistics for llvm-test-suite:

"instcombine.NumOneIteration": 411380,
"instcombine.NumTwoIterations": 117921,
"instcombine.NumThreeIterations": 236,
"instcombine.NumFourOrMoreIterations": 2,

The way to read these numbers is that in 411380 cases, InstCombine performs no folds. In 117921 cases it performs a fold and reaches the fix-point within one iteration (the second iteration verifies the fixpoint). In the remaining 238 cases, more than one iteration is needed to reach the fixpoint.

In other words, only in 0.04% of cases are additional iterations needed to reach a fixpoint. Conversely, in 22.3% of cases InstCombine performs a completely useless extra iteration to verify the fix point.
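The percentages follow directly from the four counters. The sketch below redoes the arithmetic; the struct and function names are hypothetical, and only the four counts come from the statistics quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Counts of InstCombine runs, bucketed by how many iterations they took.
struct IterationStats {
  std::uint64_t One;
  std::uint64_t Two;
  std::uint64_t Three;
  std::uint64_t FourOrMore;
  std::uint64_t total() const { return One + Two + Three + FourOrMore; }
};

// Share of runs (in percent) that needed more than one folding iteration,
// i.e. three or more iterations in total.
double percentNeedingExtraIterations(const IterationStats &S) {
  assert(S.total() > 0 && "need at least one run");
  return 100.0 * static_cast<double>(S.Three + S.FourOrMore) /
         static_cast<double>(S.total());
}

// Share of runs (in percent) whose second iteration only verified the
// fix-point.
double percentVerifyOnly(const IterationStats &S) {
  assert(S.total() > 0 && "need at least one run");
  return 100.0 * static_cast<double>(S.Two) / static_cast<double>(S.total());
}
```

For `IterationStats{411380, 117921, 236, 2}` the total is 529539 runs, which gives about 0.045% for the first function and about 22.27% for the second, consistent with the rounded figures in the summary.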

This patch proposes to remove the fixpoint iteration from InstCombine, and to always perform only a single iteration. This results in a major compile-time improvement (http://llvm-compile-time-tracker.com/compare.php?from=b7e38ff22326d7bcbd01f080dc91f47be25e703e&to=40936c7e9324ce41819483f2c02f5bbcefa292a0&stat=instructions%3Au): we get a 4-5% compile-time reduction at negligible codegen impact. (These numbers include D75362, which is a non-trivial regression when taken by itself. Most of the size-text changes are also due to that patch, not this one.)

This explicitly accepts that we will not reach a fixpoint in all cases. However, this is mitigated by two factors: First, the data suggests that this happens very rarely in practice. Second, InstCombine runs many times during the optimization pipeline (8 times even without LTO), so there are many chances to recover such cases.

In order to prevent accidental optimization regressions in the future, this implements a default-enabled verify-fixpoint option, which will make sure that the fix point has indeed been reached after a single iteration. This means that tests where this is not the case need to be explicitly annotated. The actual optimization pipeline will disable this option, as failure to reach the fix point is expected to happen there (in rare cases, as described above).
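One way to picture the verification is as a single pass plus an optional check pass. This is a sketch only: `runSingleIteration` is a hypothetical name, and raising an exception is an assumption made for the example, not a description of how the patch reports the failure.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Runs one pass via RunOnce. If VerifyFixpoint is set, runs the pass a second
// time purely as a check and throws if that second pass still changes
// something. Returns whether the first pass made any change.
bool runSingleIteration(const std::function<bool()> &RunOnce,
                        bool VerifyFixpoint) {
  assert(RunOnce && "callback must be set");
  const bool Changed = RunOnce();
  if (VerifyFixpoint && RunOnce())
    throw std::runtime_error("did not reach a fixpoint after one iteration");
  return Changed;
}
```

In test mode (`VerifyFixpoint == true`) a function that needs two rounds of folding fails loudly, so the test has to opt out explicitly. In pipeline mode the check pass is skipped, which is where the compile-time saving comes from.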

Depends on D75362.

Diff Detail

Event Timeline

nikic created this revision. Jul 6 2023, 1:40 AM

Herald added a project: Restricted Project. Jul 6 2023, 1:40 AM

Herald added subscribers: StephenFan, wenlei.

nikic requested review of this revision. Jul 6 2023, 1:40 AM

Herald added a subscriber: llvm-commits.

nikic edited the summary of this revision. Jul 6 2023, 1:45 AM

Left some comments on test diffs. I don't think any of the remaining cases are particularly problematic, though the phi and freeze cases are something that may be worth fixing.

llvm/test/Analysis/ValueTracking/numsignbits-from-assume.ll
51	This is related to backwards-propagation of assumes: Assumes can affect guaranteed-to-transfer instructions in a limited window before the assume. We may fail to fold such cases in one iteration if we first need to fold instructions to bring the assume into a recognized form. Here the assume is only recognized by AC after ule is converted to ult, at which point the add before has already been visited. I don't think this issue matters in practice.
llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll
31	This is caused by details of how we canonicalize phi operand order. This is easy to fix, it just has annoying test fallout.
llvm/test/Transforms/InstCombine/pr55228.ll
11	This happens because the initializer of the global is not fully folded. This is not a problem when run in a real optimization pipeline, because GlobalOpt will handle such cases earlier.
llvm/test/Transforms/InstCombine/shift.ll
1718	I didn't bother looking into this, because it's a fuzzer test case.
llvm/test/Transforms/PGOProfile/chr.ll
1936	At the time we process this freeze, j.fr hasn't been introduced yet, so we would have to introduce two freeze instructions. We could fix this by allowing the creation of more than one freeze when pushing upward. Especially for icmps that is probably beneficial.
llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll
119	This is the same backwards reasoning assume issue mentioned above.

Harbormaster completed remote builds in B243400: Diff 537620. Jul 6 2023, 3:17 AM

This seems worth pursuing, but I don't know very much about how IC worklists are managed/sorted. Your summary implies there are a number of workarounds in there; would these benefit from being cleaned up before/after this change?

Regarding your verify-fixpoint proposal - would it mean we don't have much "no-verify-fixpoint" test coverage apart from phase-ordering or similar tests?

In D154579#4477044, @RKSimon wrote:

This seems worth pursuing, but I don't know very much about how IC worklists are managed/sorted. Your summary implies there are a number of workarounds in there; would these benefit from being cleaned up before/after this change?

Normally, InstCombine worklist management is handled implicitly, using a combination of IRBuilder callbacks and standard helpers like replaceInstUsesWith(). Things work automatically as long as folds just replace one sequence of instructions with another. However, for folds that do non-local changes (e.g. looping over users and doing extra replacements there), it may be necessary to perform manual worklist management. I've been working on adding that manual worklist management in all the places that were missing it over the last few weeks, and I'm not aware of any remaining issues. (The most common case is folds leaving behind dead instructions without queuing them for DCE.)

Regarding your verify-fixpoint proposal - would it mean we don't have much "no-verify-fixpoint" test coverage apart from phase-ordering or similar tests?

Right. Ideally we would always verify the fixpoint for tests (so that an explicit opt-out is required for cases that don't reach the fixpoint), and not verify it outside tests. Verifying it for -passes=instcombine but not -passes='default<O3>' would be the heuristic for that.

the instcombine<no-verify-fixpoint> approach sgtm

re:

All instructions are initially pushed to the worklist. The initial order is (roughly) in program order / RPO.
All newly inserted instructions get added to the worklist.
When an instruction is folded, its users get added back to the worklist.
When the use-count of an instruction decreases, it gets added back to the worklist.
...plus a bunch of other heuristics on when we should revisit instructions.

What does it look like if instead of decreasing iteration count, we change re-insertion
logic based on iteration?
I.e:
iteration 1 -> do everything
iteration 2+ -> only re-add newly created insn or insn that are now single-use.

In D154579#4477942, @goldstein.w.n wrote:

All instructions are initially pushed to the worklist. The initial order is (roughly) in program order / RPO.
All newly inserted instructions get added to the worklist.
When an instruction is folded, its users get added back to the worklist.
When the use-count of an instruction decreases, it gets added back to the worklist.
...plus a bunch of other heuristics on when we should revisit instructions.

What does it look like if instead of decreasing iteration count, we change re-insertion
logic based on iteration?
I.e:
iteration 1 -> do everything
iteration 2+ -> only re-add newly created insn or insn that are now single-use.

I don't think I understand your suggestion here. This sparse reprocessing is what the worklist is for -- and we do want to perform the reprocessing as part of the same iteration, not a later one, to make sure that folds working on later instructions see already folded operands, even if arriving at them requires multiple folds. If we delayed all reprocessing until a second iteration, folds would see operands after a single round of folding was applied to them, rather than in their final form.

nikic mentioned this in rG70aca7b12220: [InstCombine] Explicitly track dead edges. Jul 27 2023, 7:41 AM

Implement fix-point verification.

Herald added a subscriber: hiraditya. Jul 27 2023, 7:45 AM

Move stat update.

Harbormaster completed remote builds in B248575: Diff 544777. Jul 27 2023, 10:49 AM

aeubanks added inline comments. Jul 27 2023, 10:51 AM

llvm/lib/Passes/PassBuilderPipelines.cpp
369 ↗	(On Diff #544777)	imo the `InstCombinePass` constructor should default to `no-verify-fixpoint`, but `parseInstCombineOptions` should by default set `verify-fixpoint`, since we typically call the `InstCombinePass` constructor from pass pipelines

Move default to parseInstCombineOptions().

nikic marked an inline comment as done. Jul 28 2023, 1:58 AM

nikic added inline comments.

llvm/lib/Passes/PassBuilderPipelines.cpp
369 ↗	(On Diff #544777)	Good point. In fact, I missed some InstCombinePass() uses in BackendUtil in the previous patch. Doing this in option parsing makes sure all C++ uses of InstCombinePass don't get fixpoint verification.
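The split discussed here (the constructor defaults to no verification, the textual option parser defaults to verification) can be sketched as follows. `ToyOptions` and `parseToyOptions` are hypothetical stand-ins, not the real `InstCombineOptions` or `parseInstCombineOptions`; only the option spellings are taken from this review.

```cpp
#include <cassert>
#include <string>

// Hypothetical options struct. A default-constructed value, as used by C++
// pipeline code, does NOT verify the fix-point.
struct ToyOptions {
  bool UseLoopInfo = false;
  bool VerifyFixpoint = false;
  unsigned MaxIterations = 1;
};

// Parses a ';'-separated parameter string such as
// "no-verify-fixpoint;max-iterations=42". Textual pipelines verify the
// fix-point unless they opt out. Returns false on an unknown parameter and
// leaves Out untouched in that case.
bool parseToyOptions(const std::string &Params, ToyOptions &Out) {
  ToyOptions Opts;
  Opts.VerifyFixpoint = true; // parser-level default differs from the struct
  const std::string MaxIterPrefix = "max-iterations=";
  std::string::size_type Pos = 0;
  while (Pos < Params.size()) {
    std::string::size_type End = Params.find(';', Pos);
    if (End == std::string::npos)
      End = Params.size();
    std::string Name = Params.substr(Pos, End - Pos);
    Pos = End + 1;

    // A leading "no-" negates a boolean parameter.
    const bool Enable = Name.compare(0, 3, "no-") != 0;
    if (!Enable)
      Name = Name.substr(3);

    if (Name == "use-loop-info") {
      Opts.UseLoopInfo = Enable;
    } else if (Name == "verify-fixpoint") {
      Opts.VerifyFixpoint = Enable;
    } else if (Enable && Name.compare(0, MaxIterPrefix.size(), MaxIterPrefix) == 0) {
      const std::string Digits = Name.substr(MaxIterPrefix.size());
      if (Digits.empty() ||
          Digits.find_first_not_of("0123456789") != std::string::npos)
        return false;
      Opts.MaxIterations = static_cast<unsigned>(std::stoul(Digits));
    } else {
      return false;
    }
  }
  Out = Opts;
  return true;
}
```

Putting the default in the parser means every place that constructs the options directly in C++ gets the cheap behaviour without having to remember to turn verification off, while a bare `instcombine` in a textual pipeline still verifies.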

nikic marked an inline comment as done. Jul 28 2023, 1:59 AM

nikic added inline comments.

llvm/lib/Passes/PassRegistry.def
328 ↗	(On Diff #545068)	This didn't get removed when FUNCTION_PASS_WITH_PARAMS was added below.

Harbormaster completed remote builds in B248785: Diff 545068. Jul 28 2023, 2:52 AM

are you still looking into some of the remaining cases, or is this in a state you want to land now?

In D154579#4543098, @aeubanks wrote:

are you still looking into some of the remaining cases, or is this in a state you want to land now?

This is ready to land as far as I'm concerned.

lgtm

llvm/test/Transforms/InstCombine/shift.ll
5

This revision is now accepted and ready to land. Jul 28 2023, 10:40 AM

aeubanks mentioned this in D75362: [InstCombine] Process blocks in RPO. Jul 28 2023, 10:44 AM

nikic mentioned this in rGad7f02010f32: [InstCombine] Process blocks in RPO. Jul 30 2023, 9:39 AM

This revision was landed with ongoing or failed builds. Jul 31 2023, 1:57 AM

Closed by commit rG41895843b591: [InstCombine] Only perform one iteration (authored by nikic).

This revision was automatically updated to reflect the committed changes.

nikic added a commit: rG41895843b591: [InstCombine] Only perform one iteration.

bjope added a subscriber: bjope. Jul 31 2023, 6:20 AM

bjope added inline comments.

llvm/lib/Passes/PassRegistry.def
511 ↗	(On Diff #545541)	We should add "no-verify-fixpoint;verify-fixpoint;" here, right?

bjope added inline comments. Jul 31 2023, 6:23 AM

llvm/lib/Passes/PassRegistry.def
511 ↗	(On Diff #545541)	Also noticed that `instcombine<verify-fixpoint>` will be tricky to use in fuzzy testing with random pipelines. So I think we will avoid that.

bjope added inline comments. Jul 31 2023, 12:47 PM

llvm/lib/Passes/PassRegistry.def
511 ↗	(On Diff #545541)	I solved this in https://reviews.llvm.org/rG5fbee1c6e300eee9ce9d18275bf8a6de0a22ba59

nikic added inline comments. Jul 31 2023, 12:50 PM

llvm/lib/Passes/PassRegistry.def
511 ↗	(On Diff #545541)	Thank you! And yes, for fuzzing purposes, `instcombine<no-verify-fixpoint>` should be used.

Revision Contents

Path	Size
llvm/include/llvm/Transforms/InstCombine/InstCombine.h	2 lines
llvm/test/Analysis/ValueTracking/numsignbits-from-assume.ll	2 lines
llvm/test/Other/new-pm-print-pipeline.ll	2 lines
llvm/test/Other/print-debug-counter.ll	2 lines
llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll	12 lines
llvm/test/Transforms/InstCombine/pr55228.ll	2 lines
llvm/test/Transforms/InstCombine/shift.ll	2 lines
llvm/test/Transforms/PGOProfile/chr.ll	12 lines
llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll	12 lines

Diff 537620

llvm/include/llvm/Transforms/InstCombine/InstCombine.h

	[19 lines not shown]
	#include "llvm/IR/PassManager.h"			#include "llvm/IR/PassManager.h"
	#include "llvm/Pass.h"			#include "llvm/Pass.h"

	#define DEBUG_TYPE "instcombine"			#define DEBUG_TYPE "instcombine"
	#include "llvm/Transforms/Utils/InstructionWorklist.h"			#include "llvm/Transforms/Utils/InstructionWorklist.h"

	namespace llvm {			namespace llvm {

	static constexpr unsigned InstCombineDefaultMaxIterations = 1000;			static constexpr unsigned InstCombineDefaultMaxIterations = 1;

	struct InstCombineOptions {			struct InstCombineOptions {
	bool UseLoopInfo = false;			bool UseLoopInfo = false;
	unsigned MaxIterations = InstCombineDefaultMaxIterations;			unsigned MaxIterations = InstCombineDefaultMaxIterations;

	InstCombineOptions() = default;			InstCombineOptions() = default;

	InstCombineOptions &setUseLoopInfo(bool Value) {			InstCombineOptions &setUseLoopInfo(bool Value) {
	[57 lines not shown]

llvm/test/Analysis/ValueTracking/numsignbits-from-assume.ll

[42 lines not shown]	;
%cond = icmp ule i32 %sub, 42		%cond = icmp ule i32 %sub, 42
call void @llvm.assume(i1 %cond)		call void @llvm.assume(i1 %cond)
%sh = shl i32 %sub, 3		%sh = shl i32 %sub, 3
ret i32 %sh		ret i32 %sh
}		}

define i32 @computeNumSignBits_sub2(i32 %in) {		define i32 @computeNumSignBits_sub2(i32 %in) {
; CHECK-LABEL: @computeNumSignBits_sub2(		; CHECK-LABEL: @computeNumSignBits_sub2(
; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[IN:%.*]], -1		; CHECK-NEXT: [[SUB:%.*]] = add i32 [[IN:%.*]], -1
		nikic (author, inline comment, done): This is related to backwards-propagation of assumes: Assumes can affect guaranteed-to-transfer instructions in a limited window before the assume. We may fail to fold such cases in one iteration if we first need to fold instructions to bring the assume into a recognized form. Here the assume is only recognized by AC after ule is converted to ult, at which point the add before has already been visited. I don't think this issue matters in practice.
; CHECK-NEXT: [[COND:%.*]] = icmp ult i32 [[SUB]], 43		; CHECK-NEXT: [[COND:%.*]] = icmp ult i32 [[SUB]], 43
; CHECK-NEXT: call void @llvm.assume(i1 [[COND]])		; CHECK-NEXT: call void @llvm.assume(i1 [[COND]])
; CHECK-NEXT: [[SH:%.*]] = shl nuw nsw i32 [[SUB]], 3		; CHECK-NEXT: [[SH:%.*]] = shl nuw nsw i32 [[SUB]], 3
; CHECK-NEXT: ret i32 [[SH]]		; CHECK-NEXT: ret i32 [[SH]]
;		;
%sub = sub i32 %in, 1		%sub = sub i32 %in, 1
%cond = icmp ule i32 %sub, 42		%cond = icmp ule i32 %sub, 42
call void @llvm.assume(i1 %cond)		call void @llvm.assume(i1 %cond)
[50 lines not shown]

llvm/test/Other/new-pm-print-pipeline.ll

	[90 lines not shown]
	; RUN: opt -disable-output -passes='default<O3>' < %s			; RUN: opt -disable-output -passes='default<O3>' < %s

	;; Test SeparateConstOffsetFromGEPPass option.			;; Test SeparateConstOffsetFromGEPPass option.
	; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='separate-const-offset-from-gep<lower-gep>' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-27			; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='separate-const-offset-from-gep<lower-gep>' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-27
	; CHECK-27: function(separate-const-offset-from-gep<lower-gep>)			; CHECK-27: function(separate-const-offset-from-gep<lower-gep>)

	;; Test InstCombine options - the first pass checks default settings, and the second checks customized options.			;; Test InstCombine options - the first pass checks default settings, and the second checks customized options.
	; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='function(instcombine,instcombine<use-loop-info;max-iterations=42>)' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-28			; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='function(instcombine,instcombine<use-loop-info;max-iterations=42>)' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-28
	; CHECK-28: function(instcombine<max-iterations=1000;no-use-loop-info>,instcombine<max-iterations=42;use-loop-info>)			; CHECK-28: function(instcombine<max-iterations=1;no-use-loop-info>,instcombine<max-iterations=42;use-loop-info>)

	;; Test function-attrs			;; Test function-attrs
	; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='cgscc(function-attrs<skip-non-recursive>)' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-29			; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='cgscc(function-attrs<skip-non-recursive>)' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-29
	; CHECK-29: cgscc(function-attrs<skip-non-recursive>)			; CHECK-29: cgscc(function-attrs<skip-non-recursive>)

	;; Test cgscc -> function adaptor			;; Test cgscc -> function adaptor
	; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='cgscc(function<eager-inv;no-rerun>(no-op-function))' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-30			; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='cgscc(function<eager-inv;no-rerun>(no-op-function))' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-30
	; CHECK-30: cgscc(function<eager-inv;no-rerun>(no-op-function))			; CHECK-30: cgscc(function<eager-inv;no-rerun>(no-op-function))
	[15 lines not shown]

llvm/test/Other/print-debug-counter.ll

	; REQUIRES: asserts			; REQUIRES: asserts

	; RUN: opt -S -debug-counter=early-cse-skip=1,early-cse-count=1 -passes=early-cse,newgvn,instcombine -earlycse-debug-hash \			; RUN: opt -S -debug-counter=early-cse-skip=1,early-cse-count=1 -passes=early-cse,newgvn,instcombine -earlycse-debug-hash \
	; RUN: -debug-counter=newgvn-vn-skip=1,newgvn-vn-count=2 \			; RUN: -debug-counter=newgvn-vn-skip=1,newgvn-vn-count=2 \
	; RUN: -print-debug-counter < %s 2>&1 | FileCheck %s			; RUN: -print-debug-counter < %s 2>&1 | FileCheck %s
	;; Test debug counter prints correct info in right order.			;; Test debug counter prints correct info in right order.
	; CHECK-LABEL: Counters and values:			; CHECK-LABEL: Counters and values:
	; CHECK: early-cse			; CHECK: early-cse
	; CHECK-SAME: {4,1,1}			; CHECK-SAME: {4,1,1}
	; CHECK: instcombine-visit			; CHECK: instcombine-visit
	; CHECK-SAME: {13,0,-1}			; CHECK-SAME: {12,0,-1}
	; CHECK: newgvn-vn			; CHECK: newgvn-vn
	; CHECK-SAME: {9,1,2}			; CHECK-SAME: {9,1,2}
	define i32 @f1(i32 %a, i32 %b) {			define i32 @f1(i32 %a, i32 %b) {
	bb:			bb:
	%add1 = add i32 %a, %b			%add1 = add i32 %a, %b
	%add2 = add i32 %a, %b			%add2 = add i32 %a, %b
	%add3 = add i32 %a, %b			%add3 = add i32 %a, %b
	%add4 = add i32 %a, %b			%add4 = add i32 %a, %b
	[13 lines not shown]

llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll

[22 lines not shown]
; CHECK-NEXT: [[I3:%.*]] = icmp eq i32 [[I2]], 0		; CHECK-NEXT: [[I3:%.*]] = icmp eq i32 [[I2]], 0
; CHECK-NEXT: [[I6:%.*]] = load i64, ptr @var_5, align 8		; CHECK-NEXT: [[I6:%.*]] = load i64, ptr @var_5, align 8
; CHECK-NEXT: [[I5:%.*]] = sext i16 [[I4]] to i64		; CHECK-NEXT: [[I5:%.*]] = sext i16 [[I4]] to i64
; CHECK-NEXT: [[I7:%.*]] = select i1 [[I3]], i64 [[I6]], i64 [[I5]]		; CHECK-NEXT: [[I7:%.*]] = select i1 [[I3]], i64 [[I6]], i64 [[I5]]
; CHECK-NEXT: [[I11:%.*]] = trunc i64 [[I7]] to i32		; CHECK-NEXT: [[I11:%.*]] = trunc i64 [[I7]] to i32
; CHECK-NEXT: br label [[BB12]]		; CHECK-NEXT: br label [[BB12]]
; CHECK: bb12:		; CHECK: bb12:
; CHECK-NEXT: [[STOREMERGE1:%.*]] = phi i32 [ [[I11]], [[BB10]] ], [ 1, [[BB9]] ]		; CHECK-NEXT: [[STOREMERGE1:%.*]] = phi i32 [ [[I11]], [[BB10]] ], [ 1, [[BB9]] ]
		; CHECK-NEXT: [[STOREMERGE:%.*]] = phi i32 [ 1, [[BB9]] ], [ [[I11]], [[BB10]] ]
		nikic (author, inline comment, done): This is caused by details of how we canonicalize phi operand order. This is easy to fix, it just has annoying test fallout.
; CHECK-NEXT: store i32 [[STOREMERGE1]], ptr @arr_2, align 4		; CHECK-NEXT: store i32 [[STOREMERGE1]], ptr @arr_2, align 4
; CHECK-NEXT: store i16 [[I4]], ptr @arr_4, align 2		; CHECK-NEXT: store i16 [[I4]], ptr @arr_4, align 2
; CHECK-NEXT: [[I8:%.*]] = sext i16 [[I4]] to i32		; CHECK-NEXT: [[I8:%.*]] = sext i16 [[I4]] to i32
; CHECK-NEXT: store i32 [[I8]], ptr @arr_3, align 16		; CHECK-NEXT: store i32 [[I8]], ptr @arr_3, align 16
; CHECK-NEXT: store i32 [[STOREMERGE1]], ptr getelementptr inbounds ([0 x i32], ptr @arr_2, i64 0, i64 1), align 4		; CHECK-NEXT: store i32 [[STOREMERGE]], ptr getelementptr inbounds ([0 x i32], ptr @arr_2, i64 0, i64 1), align 4
; CHECK-NEXT: store i16 [[I4]], ptr getelementptr inbounds ([0 x i16], ptr @arr_4, i64 0, i64 1), align 2		; CHECK-NEXT: store i16 [[I4]], ptr getelementptr inbounds ([0 x i16], ptr @arr_4, i64 0, i64 1), align 2
; CHECK-NEXT: store i32 [[I8]], ptr getelementptr inbounds ([8 x i32], ptr @arr_3, i64 0, i64 1), align 4		; CHECK-NEXT: store i32 [[I8]], ptr getelementptr inbounds ([8 x i32], ptr @arr_3, i64 0, i64 1), align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
bb:		bb:
%i = load i8, ptr @var_7, align 1		%i = load i8, ptr @var_7, align 1
%i1 = icmp eq i8 %i, -1		%i1 = icmp eq i8 %i, -1
%i2 = load i32, ptr @var_1, align 4		%i2 = load i32, ptr @var_1, align 4
[226 lines not shown]	BB1:
store ptr %b, ptr %alloca		store ptr %b, ptr %alloca
br label %sink		br label %sink
sink:		sink:
%val = load i64, ptr %alloca		%val = load i64, ptr %alloca
ret i64 %val		ret i64 %val
}		}

define ptr @inttoptr_merge(i1 %cond, i64 %a, ptr %b) {		define ptr @inttoptr_merge(i1 %cond, i64 %a, ptr %b) {
; CHECK-LABEL: define ptr @inttoptr_merge		; CHECK-LABEL: @inttoptr_merge(
; CHECK-SAME: (i1 [[COND:%.*]], i64 [[A:%.*]], ptr [[B:%.*]]) {
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 [[COND]], label [[BB0:%.*]], label [[BB1:%.*]]		; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB0:%.*]], label [[BB1:%.*]]
; CHECK: BB0:		; CHECK: BB0:
; CHECK-NEXT: [[TMP0:%.*]] = inttoptr i64 [[A]] to ptr		; CHECK-NEXT: [[TMP0:%.*]] = inttoptr i64 [[A:%.*]] to ptr
; CHECK-NEXT: br label [[SINK:%.*]]		; CHECK-NEXT: br label [[SINK:%.*]]
; CHECK: BB1:		; CHECK: BB1:
; CHECK-NEXT: br label [[SINK]]		; CHECK-NEXT: br label [[SINK]]
; CHECK: sink:		; CHECK: sink:
; CHECK-NEXT: [[STOREMERGE:%.*]] = phi ptr [ [[B]], [[BB1]] ], [ [[TMP0]], [[BB0]] ]		; CHECK-NEXT: [[STOREMERGE:%.*]] = phi ptr [ [[B:%.*]], [[BB1]] ], [ [[TMP0]], [[BB0]] ]
; CHECK-NEXT: ret ptr [[STOREMERGE]]		; CHECK-NEXT: ret ptr [[STOREMERGE]]
;		;
entry:		entry:
%alloca = alloca ptr		%alloca = alloca ptr
br i1 %cond, label %BB0, label %BB1		br i1 %cond, label %BB0, label %BB1
BB0:		BB0:
store i64 %a, ptr %alloca, align 8		store i64 %a, ptr %alloca, align 8
br label %sink		br label %sink
BB1:		BB1:
store ptr %b, ptr %alloca, align 8		store ptr %b, ptr %alloca, align 8
br label %sink		br label %sink
sink:		sink:
%val = load ptr, ptr %alloca		%val = load ptr, ptr %alloca
ret ptr %val		ret ptr %val
}		}

llvm/test/Transforms/InstCombine/pr55228.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=instcombine < %s | FileCheck %s			; RUN: opt -S -passes=instcombine < %s | FileCheck %s

	target datalayout = "p:8:8"			target datalayout = "p:8:8"

	@g = external global i8			@g = external global i8
	@c = constant ptr getelementptr inbounds (i8, ptr @g, i64 1)			@c = constant ptr getelementptr inbounds (i8, ptr @g, i64 1)

	define i1 @test(ptr %p) {			define i1 @test(ptr %p) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq ptr [[P:%.*]], getelementptr inbounds (i8, ptr @g, i8 1)			; CHECK-NEXT: [[CMP:%.*]] = icmp eq ptr [[P:%.*]], getelementptr inbounds (i8, ptr @g, i64 1)
				nikic (author, inline comment, done): This happens because the initializer of the global is not fully folded. This is not a problem when run in a real optimization pipeline, because GlobalOpt will handle such cases earlier.
	; CHECK-NEXT: ret i1 [[CMP]]			; CHECK-NEXT: ret i1 [[CMP]]
	;			;
	%alloca = alloca ptr			%alloca = alloca ptr
	call void @llvm.memcpy.p0.p0.i32(ptr %alloca, ptr @c, i32 0, i1 false)			call void @llvm.memcpy.p0.p0.i32(ptr %alloca, ptr @c, i32 0, i1 false)
	%load = load ptr, ptr %alloca			%load = load ptr, ptr %alloca
	%cmp = icmp eq ptr %p, %load			%cmp = icmp eq ptr %p, %load
	ret i1 %cmp			ret i1 %cmp
	}			}

	declare void @llvm.memcpy.p0.p0.i32(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i32, i1 immarg)			declare void @llvm.memcpy.p0.p0.i32(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i32, i1 immarg)

llvm/test/Transforms/InstCombine/shift.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py

; RUN: opt < %s -passes=instcombine -S | FileCheck %s

declare void @use(i64)

declare void @use_i32(i32)

aeubanks (inline comment, not done):

; The fuzzer-generated @ashr_out_of_range test case does not reach a fixpoint,

- ; because a logical and it not relaxed to a bitwise and in one iteration.

+ ; because a logical and is not relaxed to a bitwise and in one iteration.

declare void @use(i64)

declare i32 @llvm.cttz.i32(i32, i1 immarg)

declare <2 x i8> @llvm.cttz.v2i8(<2 x i8>, i1 immarg)

define <4 x i32> @lshr_non_splat_vector(<4 x i32> %A) {

; CHECK-LABEL: @lshr_non_splat_vector(

; CHECK-NEXT: [[B:%.*]] = lshr <4 x i32> [[A:%.*]], <i32 32, i32 1, i32 2, i32 3>

; CHECK-NEXT: ret <4 x i32> [[B]]

[1,696 lines not shown]

; CHECK-LABEL: @ashr_out_of_range(

; CHECK-NEXT: [[L:%.*]] = load i177, ptr [[A:%.*]], align 4

; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i177 [[L]], -1

; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i64 -1, i64 -2

; CHECK-NEXT: [[G11:%.*]] = getelementptr i177, ptr [[A]], i64 [[TMP2]]

; CHECK-NEXT: [[L7:%.*]] = load i177, ptr [[G11]], align 4

; CHECK-NEXT: [[L7_FROZEN:%.*]] = freeze i177 [[L7]]

; CHECK-NEXT: [[C171:%.*]] = icmp slt i177 [[L7_FROZEN]], 0

- ; CHECK-NEXT: [[C17:%.*]] = and i1 [[TMP1]], [[C171]]

+ ; CHECK-NEXT: [[C17:%.*]] = select i1 [[TMP1]], i1 [[C171]], i1 false

nikic (author, inline comment, done): I didn't bother looking into this, because it's a fuzzer test case.

; CHECK-NEXT: [[TMP3:%.*]] = sext i1 [[C17]] to i64

; CHECK-NEXT: [[G62:%.*]] = getelementptr i177, ptr [[G11]], i64 [[TMP3]]

; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i177 [[L7_FROZEN]], -1

; CHECK-NEXT: [[B28:%.*]] = select i1 [[TMP4]], i177 0, i177 [[L7_FROZEN]]

; CHECK-NEXT: store i177 [[B28]], ptr [[G62]], align 4

; CHECK-NEXT: ret void

;

%L = load i177, ptr %A

[382 lines not shown]

llvm/test/Transforms/PGOProfile/chr.ll

	[1,926 lines not shown]
	; foo();			; foo();
	; }			; }
	; return 45;			; return 45;
	define i32 @test_chr_21(i64 %i, i64 %k, i64 %j) !prof !14 {			define i32 @test_chr_21(i64 %i, i64 %k, i64 %j) !prof !14 {
	; CHECK-LABEL: @test_chr_21(			; CHECK-LABEL: @test_chr_21(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[J_FR:%.*]] = freeze i64 [[J:%.*]]			; CHECK-NEXT: [[J_FR:%.*]] = freeze i64 [[J:%.*]]
	; CHECK-NEXT: [[I_FR:%.*]] = freeze i64 [[I:%.*]]			; CHECK-NEXT: [[I_FR:%.*]] = freeze i64 [[I:%.*]]
	; CHECK-NEXT: [[K_FR:%.*]] = freeze i64 [[K:%.*]]			; CHECK-NEXT: [[CMP0:%.*]] = icmp ne i64 [[J_FR]], [[K:%.*]]
	; CHECK-NEXT: [[CMP0:%.*]] = icmp ne i64 [[J_FR]], [[K_FR]]			; CHECK-NEXT: [[TMP0:%.*]] = freeze i1 [[CMP0]]
				nikic (author, inline comment, done): At the time we process this freeze, j.fr hasn't been introduced yet, so we would have to introduce two freeze instructions. We could fix this by allowing the creation of more than one freeze when pushing upward. Especially for icmps that is probably beneficial.
	; CHECK-NEXT: [[CMP3:%.*]] = icmp ne i64 [[I_FR]], [[J_FR]]			; CHECK-NEXT: [[CMP3:%.*]] = icmp ne i64 [[I_FR]], [[J_FR]]
	; CHECK-NEXT: [[CMP_I:%.*]] = icmp ne i64 [[I_FR]], 86			; CHECK-NEXT: [[CMP_I:%.*]] = icmp ne i64 [[I_FR]], 86
	; CHECK-NEXT: [[TMP0:%.*]] = and i1 [[CMP0]], [[CMP3]]			; CHECK-NEXT: [[TMP1:%.*]] = and i1 [[TMP0]], [[CMP3]]
	; CHECK-NEXT: [[TMP1:%.*]] = and i1 [[TMP0]], [[CMP_I]]			; CHECK-NEXT: [[TMP2:%.*]] = and i1 [[TMP1]], [[CMP_I]]
	; CHECK-NEXT: br i1 [[TMP1]], label [[BB1:%.*]], label [[ENTRY_SPLIT_NONCHR:%.*]], !prof [[PROF15]]			; CHECK-NEXT: br i1 [[TMP2]], label [[BB1:%.*]], label [[ENTRY_SPLIT_NONCHR:%.*]], !prof [[PROF15]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[CMP2:%.*]] = icmp ne i64 [[I_FR]], 2			; CHECK-NEXT: [[CMP2:%.*]] = icmp ne i64 [[I_FR]], 2
	; CHECK-NEXT: switch i64 [[I_FR]], label [[BB2:%.*]] [			; CHECK-NEXT: switch i64 [[I_FR]], label [[BB2:%.*]] [
	; CHECK-NEXT: i64 2, label [[BB3_NONCHR2:%.*]]			; CHECK-NEXT: i64 2, label [[BB3_NONCHR2:%.*]]
	; CHECK-NEXT: i64 86, label [[BB2_NONCHR1:%.*]]			; CHECK-NEXT: i64 86, label [[BB2_NONCHR1:%.*]]
	; CHECK-NEXT: ], !prof [[PROF19:![0-9]+]]			; CHECK-NEXT: ], !prof [[PROF19:![0-9]+]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: call void @foo()			; CHECK-NEXT: call void @foo()
	; CHECK-NEXT: call void @foo()			; CHECK-NEXT: call void @foo()
	; CHECK-NEXT: br label [[BB7:%.*]]			; CHECK-NEXT: br label [[BB7:%.*]]
	; CHECK: bb2.nonchr1:			; CHECK: bb2.nonchr1:
	; CHECK-NEXT: call void @foo()			; CHECK-NEXT: call void @foo()
	; CHECK-NEXT: br label [[BB3_NONCHR2]]			; CHECK-NEXT: br label [[BB3_NONCHR2]]
	; CHECK: bb3.nonchr2:			; CHECK: bb3.nonchr2:
	; CHECK-NEXT: br i1 [[CMP_I]], label [[BB4_NONCHR3:%.*]], label [[BB7]], !prof [[PROF18]]			; CHECK-NEXT: br i1 [[CMP_I]], label [[BB4_NONCHR3:%.*]], label [[BB7]], !prof [[PROF18]]
	; CHECK: bb4.nonchr3:			; CHECK: bb4.nonchr3:
	; CHECK-NEXT: call void @foo()			; CHECK-NEXT: call void @foo()
	; CHECK-NEXT: br label [[BB7]]			; CHECK-NEXT: br label [[BB7]]
	; CHECK: bb7:			; CHECK: bb7:
	; CHECK-NEXT: call void @foo()			; CHECK-NEXT: call void @foo()
	; CHECK-NEXT: call void @foo()			; CHECK-NEXT: call void @foo()
	; CHECK-NEXT: br label [[BB10:%.*]]			; CHECK-NEXT: br label [[BB10:%.*]]
	; CHECK: entry.split.nonchr:			; CHECK: entry.split.nonchr:
	; CHECK-NEXT: br i1 [[CMP0]], label [[BB1_NONCHR:%.*]], label [[BB10]], !prof [[PROF18]]			; CHECK-NEXT: br i1 [[TMP0]], label [[BB1_NONCHR:%.*]], label [[BB10]], !prof [[PROF18]]
	; CHECK: bb1.nonchr:			; CHECK: bb1.nonchr:
	; CHECK-NEXT: [[CMP2_NONCHR:%.*]] = icmp eq i64 [[I_FR]], 2			; CHECK-NEXT: [[CMP2_NONCHR:%.*]] = icmp eq i64 [[I_FR]], 2
	; CHECK-NEXT: br i1 [[CMP2_NONCHR]], label [[BB3_NONCHR:%.*]], label [[BB2_NONCHR:%.*]], !prof [[PROF16]]			; CHECK-NEXT: br i1 [[CMP2_NONCHR]], label [[BB3_NONCHR:%.*]], label [[BB2_NONCHR:%.*]], !prof [[PROF16]]
	; CHECK: bb3.nonchr:			; CHECK: bb3.nonchr:
	; CHECK-NEXT: [[CMP_I_NONCHR:%.*]] = icmp eq i64 [[I_FR]], 86			; CHECK-NEXT: [[CMP_I_NONCHR:%.*]] = icmp eq i64 [[I_FR]], 86
	; CHECK-NEXT: br i1 [[CMP_I_NONCHR]], label [[BB6_NONCHR:%.*]], label [[BB4_NONCHR:%.*]], !prof [[PROF16]]			; CHECK-NEXT: br i1 [[CMP_I_NONCHR]], label [[BB6_NONCHR:%.*]], label [[BB4_NONCHR:%.*]], !prof [[PROF16]]
	; CHECK: bb6.nonchr:			; CHECK: bb6.nonchr:
	; CHECK-NEXT: [[CMP3_NONCHR:%.*]] = icmp eq i64 [[J_FR]], [[I_FR]]			; CHECK-NEXT: [[CMP3_NONCHR:%.*]] = icmp eq i64 [[J_FR]], [[I_FR]]

llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll

	; CHECK: for.cond1.for.cond.cleanup3_crit_edge.us:			; CHECK: for.cond1.for.cond.cleanup3_crit_edge.us:
	; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[CONV6]], 15			; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[CONV6]], 15
	; CHECK-NEXT: [[TMP6:%.*]] = icmp ult i32 [[I]], 210			; CHECK-NEXT: [[TMP6:%.*]] = icmp ult i32 [[I]], 210
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP6]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP6]])
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP5]]			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP5]]
	; CHECK-NEXT: br label [[FOR_BODY4_US_1:%.*]]			; CHECK-NEXT: br label [[FOR_BODY4_US_1:%.*]]
	; CHECK: for.body4.us.1:			; CHECK: for.body4.us.1:
	; CHECK-NEXT: [[K_011_US_1:%.*]] = phi i32 [ 0, [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US]] ], [ [[INC_US_1:%.*]], [[FOR_BODY4_US_1]] ]			; CHECK-NEXT: [[K_011_US_1:%.*]] = phi i32 [ 0, [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US]] ], [ [[INC_US_1:%.*]], [[FOR_BODY4_US_1]] ]
	; CHECK-NEXT: [[NARROW:%.*]] = add nuw nsw i32 [[K_011_US_1]], 15			; CHECK-NEXT: [[CONV_US_1:%.*]] = zext i32 [[K_011_US_1]] to i64
	; CHECK-NEXT: [[TMP8:%.*]] = zext i32 [[NARROW]] to i64			; CHECK-NEXT: [[TMP8:%.*]] = add nuw nsw i64 [[CONV_US_1]], 15
				nikic (author): This is the same backwards reasoning assume issue mentioned above.
	; CHECK-NEXT: [[TMP9:%.*]] = icmp ult i32 [[K_011_US_1]], 210			; CHECK-NEXT: [[TMP9:%.*]] = icmp ult i32 [[K_011_US_1]], 210
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP9]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP9]])
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds <225 x double>, ptr [[A]], i64 0, i64 [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds <225 x double>, ptr [[A]], i64 0, i64 [[TMP8]]
	; CHECK-NEXT: [[MATRIXEXT_US_1:%.*]] = load double, ptr [[TMP10]], align 8			; CHECK-NEXT: [[MATRIXEXT_US_1:%.*]] = load double, ptr [[TMP10]], align 8
	; CHECK-NEXT: [[MATRIXEXT8_US_1:%.*]] = load double, ptr [[TMP7]], align 8			; CHECK-NEXT: [[MATRIXEXT8_US_1:%.*]] = load double, ptr [[TMP7]], align 8
	; CHECK-NEXT: [[MUL_US_1:%.*]] = fmul double [[MATRIXEXT_US_1]], [[MATRIXEXT8_US_1]]			; CHECK-NEXT: [[MUL_US_1:%.*]] = fmul double [[MATRIXEXT_US_1]], [[MATRIXEXT8_US_1]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP8]]			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP8]]
	; CHECK-NEXT: [[MATRIXEXT11_US_1:%.*]] = load double, ptr [[TMP11]], align 8			; CHECK-NEXT: [[MATRIXEXT11_US_1:%.*]] = load double, ptr [[TMP11]], align 8
	; CHECK-NEXT: [[SUB_US_1:%.*]] = fsub double [[MATRIXEXT11_US_1]], [[MUL_US_1]]			; CHECK-NEXT: [[SUB_US_1:%.*]] = fsub double [[MATRIXEXT11_US_1]], [[MUL_US_1]]
	; CHECK-NEXT: store double [[SUB_US_1]], ptr [[TMP11]], align 8			; CHECK-NEXT: store double [[SUB_US_1]], ptr [[TMP11]], align 8
	; CHECK-NEXT: [[INC_US_1]] = add nuw nsw i32 [[K_011_US_1]], 1			; CHECK-NEXT: [[INC_US_1]] = add nuw nsw i32 [[K_011_US_1]], 1
	; CHECK-NEXT: [[CMP2_US_1:%.*]] = icmp ult i32 [[INC_US_1]], [[I]]			; CHECK-NEXT: [[CMP2_US_1:%.*]] = icmp ult i32 [[INC_US_1]], [[I]]
	; CHECK-NEXT: br i1 [[CMP2_US_1]], label [[FOR_BODY4_US_1]], label [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_1:%.*]]			; CHECK-NEXT: br i1 [[CMP2_US_1]], label [[FOR_BODY4_US_1]], label [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_1:%.*]]
	; CHECK: for.cond1.for.cond.cleanup3_crit_edge.us.1:			; CHECK: for.cond1.for.cond.cleanup3_crit_edge.us.1:
	; CHECK-NEXT: [[TMP12:%.*]] = add nuw nsw i64 [[CONV6]], 30			; CHECK-NEXT: [[TMP12:%.*]] = add nuw nsw i64 [[CONV6]], 30
	; CHECK-NEXT: [[TMP13:%.*]] = icmp ult i32 [[I]], 195			; CHECK-NEXT: [[TMP13:%.*]] = icmp ult i32 [[I]], 195
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP13]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP13]])
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP12]]			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP12]]
	; CHECK-NEXT: br label [[FOR_BODY4_US_2:%.*]]			; CHECK-NEXT: br label [[FOR_BODY4_US_2:%.*]]
	; CHECK: for.body4.us.2:			; CHECK: for.body4.us.2:
	; CHECK-NEXT: [[K_011_US_2:%.*]] = phi i32 [ 0, [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_1]] ], [ [[INC_US_2:%.*]], [[FOR_BODY4_US_2]] ]			; CHECK-NEXT: [[K_011_US_2:%.*]] = phi i32 [ 0, [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_1]] ], [ [[INC_US_2:%.*]], [[FOR_BODY4_US_2]] ]
	; CHECK-NEXT: [[NARROW14:%.*]] = add nuw nsw i32 [[K_011_US_2]], 30			; CHECK-NEXT: [[CONV_US_2:%.*]] = zext i32 [[K_011_US_2]] to i64
	; CHECK-NEXT: [[TMP15:%.*]] = zext i32 [[NARROW14]] to i64			; CHECK-NEXT: [[TMP15:%.*]] = add nuw nsw i64 [[CONV_US_2]], 30
	; CHECK-NEXT: [[TMP16:%.*]] = icmp ult i32 [[K_011_US_2]], 195			; CHECK-NEXT: [[TMP16:%.*]] = icmp ult i32 [[K_011_US_2]], 195
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP16]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP16]])
	; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds <225 x double>, ptr [[A]], i64 0, i64 [[TMP15]]			; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds <225 x double>, ptr [[A]], i64 0, i64 [[TMP15]]
	; CHECK-NEXT: [[MATRIXEXT_US_2:%.*]] = load double, ptr [[TMP17]], align 8			; CHECK-NEXT: [[MATRIXEXT_US_2:%.*]] = load double, ptr [[TMP17]], align 8
	; CHECK-NEXT: [[MATRIXEXT8_US_2:%.*]] = load double, ptr [[TMP14]], align 8			; CHECK-NEXT: [[MATRIXEXT8_US_2:%.*]] = load double, ptr [[TMP14]], align 8
	; CHECK-NEXT: [[MUL_US_2:%.*]] = fmul double [[MATRIXEXT_US_2]], [[MATRIXEXT8_US_2]]			; CHECK-NEXT: [[MUL_US_2:%.*]] = fmul double [[MATRIXEXT_US_2]], [[MATRIXEXT8_US_2]]
	; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP15]]			; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP15]]
	; CHECK-NEXT: [[MATRIXEXT11_US_2:%.*]] = load double, ptr [[TMP18]], align 8			; CHECK-NEXT: [[MATRIXEXT11_US_2:%.*]] = load double, ptr [[TMP18]], align 8
	; CHECK-NEXT: [[SUB_US_2:%.*]] = fsub double [[MATRIXEXT11_US_2]], [[MUL_US_2]]			; CHECK-NEXT: [[SUB_US_2:%.*]] = fsub double [[MATRIXEXT11_US_2]], [[MUL_US_2]]
	; CHECK-NEXT: store double [[SUB_US_2]], ptr [[TMP18]], align 8			; CHECK-NEXT: store double [[SUB_US_2]], ptr [[TMP18]], align 8
	; CHECK-NEXT: [[INC_US_2]] = add nuw nsw i32 [[K_011_US_2]], 1			; CHECK-NEXT: [[INC_US_2]] = add nuw nsw i32 [[K_011_US_2]], 1
	; CHECK-NEXT: [[CMP2_US_2:%.*]] = icmp ult i32 [[INC_US_2]], [[I]]			; CHECK-NEXT: [[CMP2_US_2:%.*]] = icmp ult i32 [[INC_US_2]], [[I]]
	; CHECK-NEXT: br i1 [[CMP2_US_2]], label [[FOR_BODY4_US_2]], label [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_2:%.*]]			; CHECK-NEXT: br i1 [[CMP2_US_2]], label [[FOR_BODY4_US_2]], label [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_2:%.*]]
	; CHECK: for.cond1.for.cond.cleanup3_crit_edge.us.2:			; CHECK: for.cond1.for.cond.cleanup3_crit_edge.us.2:
	; CHECK-NEXT: [[TMP19:%.*]] = add nuw nsw i64 [[CONV6]], 45			; CHECK-NEXT: [[TMP19:%.*]] = add nuw nsw i64 [[CONV6]], 45
	; CHECK-NEXT: [[TMP20:%.*]] = icmp ult i32 [[I]], 180			; CHECK-NEXT: [[TMP20:%.*]] = icmp ult i32 [[I]], 180
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP20]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP20]])
	; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP19]]			; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP19]]
	; CHECK-NEXT: br label [[FOR_BODY4_US_3:%.*]]			; CHECK-NEXT: br label [[FOR_BODY4_US_3:%.*]]
	; CHECK: for.body4.us.3:			; CHECK: for.body4.us.3:
	; CHECK-NEXT: [[K_011_US_3:%.*]] = phi i32 [ 0, [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_2]] ], [ [[INC_US_3:%.*]], [[FOR_BODY4_US_3]] ]			; CHECK-NEXT: [[K_011_US_3:%.*]] = phi i32 [ 0, [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_2]] ], [ [[INC_US_3:%.*]], [[FOR_BODY4_US_3]] ]
	; CHECK-NEXT: [[NARROW15:%.*]] = add nuw nsw i32 [[K_011_US_3]], 45			; CHECK-NEXT: [[CONV_US_3:%.*]] = zext i32 [[K_011_US_3]] to i64
	; CHECK-NEXT: [[TMP22:%.*]] = zext i32 [[NARROW15]] to i64			; CHECK-NEXT: [[TMP22:%.*]] = add nuw nsw i64 [[CONV_US_3]], 45
	; CHECK-NEXT: [[TMP23:%.*]] = icmp ult i32 [[K_011_US_3]], 180			; CHECK-NEXT: [[TMP23:%.*]] = icmp ult i32 [[K_011_US_3]], 180
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP23]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP23]])
	; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds <225 x double>, ptr [[A]], i64 0, i64 [[TMP22]]			; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds <225 x double>, ptr [[A]], i64 0, i64 [[TMP22]]
	; CHECK-NEXT: [[MATRIXEXT_US_3:%.*]] = load double, ptr [[TMP24]], align 8			; CHECK-NEXT: [[MATRIXEXT_US_3:%.*]] = load double, ptr [[TMP24]], align 8
	; CHECK-NEXT: [[MATRIXEXT8_US_3:%.*]] = load double, ptr [[TMP21]], align 8			; CHECK-NEXT: [[MATRIXEXT8_US_3:%.*]] = load double, ptr [[TMP21]], align 8
	; CHECK-NEXT: [[MUL_US_3:%.*]] = fmul double [[MATRIXEXT_US_3]], [[MATRIXEXT8_US_3]]			; CHECK-NEXT: [[MUL_US_3:%.*]] = fmul double [[MATRIXEXT_US_3]], [[MATRIXEXT8_US_3]]
	; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP22]]			; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP22]]
	; CHECK-NEXT: [[MATRIXEXT11_US_3:%.*]] = load double, ptr [[TMP25]], align 8			; CHECK-NEXT: [[MATRIXEXT11_US_3:%.*]] = load double, ptr [[TMP25]], align 8
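
Note for readers comparing the two columns of this hunk: one side computes the index as a 32-bit add followed by a zext ([[NARROW]] then [[TMP8]]), the other as a zext followed by a 64-bit add ([[CONV_US_1]] then [[TMP8]]). The two forms only agree when the 32-bit add does not wrap, which is what the `icmp ult` + `llvm.assume` pairs in each block bound. The following is a minimal Python sketch of that arithmetic, not LLVM code; the helper names are hypothetical, and the constant/bound pairs are copied from the CHECK lines above.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def narrow_then_zext(k: int, c: int) -> int:
    """Model of: add i32 k, c (wrapping at 32 bits), then zext to i64."""
    return (k + c) & MASK32


def zext_then_add(k: int, c: int) -> int:
    """Model of: zext i32 k to i64, then add i64 c."""
    return ((k & MASK32) + c) & MASK64


def forms_agree(k: int, c: int) -> bool:
    return narrow_then_zext(k, c) == zext_then_add(k, c)


# (constant added, upper bound from the assume) as seen in the three blocks.
PAIRS = [(15, 210), (30, 195), (45, 180)]

for c, bound in PAIRS:
    # Within the assumed range the two forms are interchangeable.
    assert all(forms_agree(k, c) for k in range(bound))

# Without a range bound the forms differ once the 32-bit add wraps.
assert not forms_agree(MASK32, 15)
```

The sketch only shows that both columns compute the same index under the stated bounds; which form InstCombine picks depends on when the range information becomes visible to the fold, which is the point of the inline comment.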

Revision Contents

Diff 537620

llvm/include/llvm/Transforms/InstCombine/InstCombine.h

llvm/test/Analysis/ValueTracking/numsignbits-from-assume.ll

llvm/test/Other/new-pm-print-pipeline.ll

llvm/test/Other/print-debug-counter.ll

llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll

llvm/test/Transforms/InstCombine/pr55228.ll

llvm/test/Transforms/InstCombine/shift.ll

llvm/test/Transforms/PGOProfile/chr.ll

llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll
