This is an archive of the discontinued LLVM Phabricator instance.

Differential D154579

[InstCombine] Only perform one iteration
ClosedPublic

Authored by nikic on Jul 6 2023, 1:40 AM.

Details

Reviewers

goldstein.w.n
aeubanks
fhahn
efriedma
RKSimon

Commits

rG41895843b591: [InstCombine] Only perform one iteration

Summary

InstCombine is a worklist-driven algorithm, which works roughly as follows:

All instructions are initially pushed to the worklist. The initial order is (roughly) in program order / RPO.
All newly inserted instructions get added to the worklist.
When an instruction is folded, its users get added back to the worklist.
When the use-count of an instruction decreases, it gets added back to the worklist.
...plus a bunch of other heuristics on when we should revisit instructions.

On top of the worklist algorithm, InstCombine layers an additional fix-point iteration: If any fold was performed in the previous iteration, then InstCombine will re-populate the worklist from scratch and fold the entire function again. This continues until a fix-point is reached.
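The interaction between the worklist and the outer fix-point loop can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not InstCombine's actual code: `Operand`, `Inst`, `runOneIteration` and `runToFixpoint` are hypothetical names, the only "fold" is constant-folding an add, and the only revisit rule modelled is "users of a folded instruction get added back to the worklist".

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy operand: a constant, a reference to another instruction, or an unknown
// function argument that can never be folded. (Hypothetical model types.)
struct Operand {
  enum Kind { Const, InstRef, Arg } K;
  long Value; // constant value for Const, instruction index for InstRef
};

// Toy instruction: an add of two operands.
struct Inst {
  Operand LHS, RHS;
  bool Folded = false; // set once the add has been replaced by a constant
  long Result = 0;
};

static Operand constant(long V) { return Operand{Operand::Const, V}; }
static Operand inst(std::size_t I) { return Operand{Operand::InstRef, static_cast<long>(I)}; }
static Operand arg() { return Operand{Operand::Arg, 0}; }
static Inst add(Operand L, Operand R) { Inst I; I.LHS = L; I.RHS = R; return I; }

// Resolve an operand to a constant, if it is one or refers to a folded add.
static bool getConstant(const std::vector<Inst> &F, const Operand &Op, long &Out) {
  switch (Op.K) {
  case Operand::Const:
    Out = Op.Value;
    return true;
  case Operand::InstRef: {
    assert(Op.Value >= 0 && static_cast<std::size_t>(Op.Value) < F.size());
    const Inst &Def = F[static_cast<std::size_t>(Op.Value)];
    if (Def.Folded)
      Out = Def.Result;
    return Def.Folded;
  }
  case Operand::Arg:
    return false;
  }
  return false;
}

static bool uses(const Inst &User, std::size_t Def) {
  auto Refers = [Def](const Operand &Op) {
    return Op.K == Operand::InstRef && static_cast<std::size_t>(Op.Value) == Def;
  };
  return Refers(User.LHS) || Refers(User.RHS);
}

// One worklist-driven iteration. Returns true if any fold was performed.
static bool runOneIteration(std::vector<Inst> &F) {
  std::deque<std::size_t> Worklist;
  for (std::size_t I = 0; I < F.size(); ++I)
    Worklist.push_back(I); // initially, all instructions in order
  bool Changed = false;
  while (!Worklist.empty()) {
    std::size_t I = Worklist.front();
    Worklist.pop_front();
    long L, R;
    if (F[I].Folded || !getConstant(F, F[I].LHS, L) || !getConstant(F, F[I].RHS, R))
      continue;
    F[I].Folded = true;
    F[I].Result = L + R;
    Changed = true;
    // Users of the folded instruction are revisited in the same iteration.
    for (std::size_t U = 0; U < F.size(); ++U)
      if (!F[U].Folded && uses(F[U], I))
        Worklist.push_back(U);
  }
  return Changed;
}

// Outer fix-point loop: re-run from scratch until an iteration performs no
// fold or the iteration limit is hit. Returns the number of iterations run.
static unsigned runToFixpoint(std::vector<Inst> &F, unsigned MaxIterations) {
  unsigned Iterations = 0;
  while (Iterations < MaxIterations) {
    ++Iterations;
    if (!runOneIteration(F))
      break;
  }
  return Iterations;
}
```

In this model, a chain such as `add 1, 2` followed by `add %0, 3` is folded completely by the first call to `runOneIteration`. With a large limit, `runToFixpoint` still runs a second iteration whose only effect is to confirm that nothing is left to fold.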

In the vast majority of cases, InstCombine will reach a fix-point within a single iteration. However, a second iteration is performed to verify that this is indeed the fix-point. We can see this in the statistics for llvm-test-suite:

"instcombine.NumOneIteration": 411380,
"instcombine.NumTwoIterations": 117921,
"instcombine.NumThreeIterations": 236,
"instcombine.NumFourOrMoreIterations": 2,

The way to read these numbers is that in 411380 cases, InstCombine performs no folds. In 117921 cases it performs a fold and reaches the fix-point within one iteration (the second iteration verifies the fixpoint). In the remaining 238 cases, more than one iteration is needed to reach the fixpoint.

In other words, only in 0.04% of cases are additional iterations needed to reach a fixpoint. Conversely, in 22.3% of cases InstCombine performs a completely useless extra iteration to verify the fix point.
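For reference, the two percentages follow directly from the four counters quoted above. The helper below is just a sketch of that arithmetic; the struct and function names are made up for illustration and do not correspond to anything in LLVM.

```cpp
#include <cassert>
#include <cstdint>

// The four statistics counters quoted in the summary (hypothetical container).
struct IterationStats {
  std::uint64_t One, Two, Three, FourOrMore;
};

static std::uint64_t totalRuns(const IterationStats &S) {
  return S.One + S.Two + S.Three + S.FourOrMore;
}

// Percentage of runs that needed three or more iterations, i.e. where the
// fix-point was not reached within the first iteration.
static double percentNeedingExtraIterations(const IterationStats &S) {
  assert(totalRuns(S) != 0 && "no runs recorded");
  return 100.0 * static_cast<double>(S.Three + S.FourOrMore) /
         static_cast<double>(totalRuns(S));
}

// Percentage of runs with exactly two iterations, where the second iteration
// only verified the fix-point reached by the first.
static double percentVerifyOnly(const IterationStats &S) {
  assert(totalRuns(S) != 0 && "no runs recorded");
  return 100.0 * static_cast<double>(S.Two) / static_cast<double>(totalRuns(S));
}
```

With the llvm-test-suite numbers, the total is 411380 + 117921 + 236 + 2 = 529539 runs, so 238 / 529539 is roughly 0.04% and 117921 / 529539 is roughly 22.3%.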

This patch proposes to remove the fixpoint iteration from InstCombine, and to always only perform a single iteration. This results in a major compile-time improvement, a 4-5% compile-time reduction at negligible codegen impact: http://llvm-compile-time-tracker.com/compare.php?from=b7e38ff22326d7bcbd01f080dc91f47be25e703e&to=40936c7e9324ce41819483f2c02f5bbcefa292a0&stat=instructions%3Au

(These numbers include D75362, which is a non-trivial regression when taken by itself. Most of the size-text changes are also due to that patch, not this one.)

This explicitly accepts that we will not reach a fixpoint in all cases. However, this is mitigated by two factors: First, the data suggests that this happens very rarely in practice. Second, InstCombine runs many times during the optimization pipeline (8 times even without LTO), so there are many chances to recover such cases.

In order to prevent accidental optimization regressions in the future, this implements a default-enabled verify-fixpoint option, which will make sure that the fix point has indeed been reached after a single iteration. This means that tests where this is not the case need to be explicitly annotated. The actual optimization pipeline will disable this option, as failure to reach the fix point is expected to happen there (in rare cases, as described above).
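One way to picture the intended behaviour of the option is the following standalone sketch. It is an assumption-laden illustration, not the code from this patch: `FixpointStatus` and `runWithBudget` are hypothetical names, and the callback stands in for one worklist iteration that reports whether it changed anything.

```cpp
#include <cassert>
#include <functional>

// Outcome of a run in this toy model (hypothetical enum).
enum class FixpointStatus { Reached, NotVerified, VerificationFailed };

// RunIteration performs one worklist iteration and returns true if it changed
// anything. At most MaxIterations "real" iterations are performed.
static FixpointStatus runWithBudget(const std::function<bool()> &RunIteration,
                                    unsigned MaxIterations,
                                    bool VerifyFixpoint) {
  assert(MaxIterations > 0 && "need at least one iteration");
  for (unsigned I = 0; I < MaxIterations; ++I)
    if (!RunIteration())
      return FixpointStatus::Reached; // nothing left to fold
  // The budget is used up and the last iteration still made changes.
  if (!VerifyFixpoint)
    return FixpointStatus::NotVerified; // optimization pipeline: just stop
  // Tests: one extra iteration that must not find anything left to fold.
  return RunIteration() ? FixpointStatus::VerificationFailed
                        : FixpointStatus::Reached;
}
```

In this model a test run corresponds to `VerifyFixpoint = true`, where `VerificationFailed` would be reported as an error, while the optimization pipeline corresponds to `VerifyFixpoint = false` and never pays for the extra iteration.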

Depends on D75362.

Diff Detail

Event Timeline

nikic created this revision. Jul 6 2023, 1:40 AM

nikic requested review of this revision. Jul 6 2023, 1:40 AM

nikic edited the summary of this revision. Jul 6 2023, 1:45 AM

Left some comments on test diffs. I don't think any of the remaining cases are particularly problematic, though the phi and freeze cases are something that may be worth fixing.

llvm/test/Analysis/ValueTracking/numsignbits-from-assume.ll
54	This is related to backwards-propagation of assumes: Assumes can affect guaranteed-to-transfer instructions in a limited window before the assume. We may fail to fold such cases in one iteration if we first need to fold instructions to bring the assume into a recognized form. Here the assume is only recognized by AC after ule is converted to ult, at which point the add before has already been visited. I don't think this issue matters in practice.
llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll
34	This is caused by details of how we canonicalize phi operand order. This is easy to fix, it just has annoying test fallout.
llvm/test/Transforms/InstCombine/pr55228.ll
14	This happens because the initializer of the global is not fully folded. This is not a problem when run in a real optimization pipeline, because GlobalOpt will handle such cases earlier.
llvm/test/Transforms/InstCombine/shift.ll
1725	I didn't bother looking into this, because it's a fuzzer test case.
llvm/test/Transforms/PGOProfile/chr.ll
1939	At the time we process this freeze, j.fr hasn't been introduced yet, so we would have to introduce two freeze instructions. We could fix this by allowing the creation of more than one freeze when pushing upward. Especially for icmps that is probably beneficial.
llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll
119	This is the same backwards reasoning assume issue mentioned above.

This seems worth pursuing, but I don't know very much about how IC worklists are managed/sorted. Your summary implies there are a number of workarounds in there; would these benefit from being cleaned up before/after this change?

Regarding your verify-fixpoint proposal - would it mean we don't have much "no-verify-fixpoint" test coverage apart from phase-ordering or similar tests?

In D154579#4477044, @RKSimon wrote:

This seems worth pursuing, but I don't know very much about how IC worklists are managed/sorted. Your summary implies there are a number of workarounds in there; would these benefit from being cleaned up before/after this change?

Normally, InstCombine worklist management is handled implicitly, using a combination of IRBuilder callbacks and standard helpers like replaceInstUsesWith(). Things work automatically as long as folds just replace one sequence of instructions with another. However, for folds that do non-local changes (e.g. looping over users and doing extra replacements there), it may be necessary to perform manual worklist management. I've been working on adding that manual worklist management in all the places that were missing it over the last few weeks, and I'm not aware of any remaining issues. (The most common case is folds leaving behind dead instructions without queuing them for DCE.)

Regarding your verify-fixpoint proposal - would it mean we don't have much "no-verify-fixpoint" test coverage apart from phase-ordering or similar tests?

Right. Ideally we would always verify the fixpoint for tests (so that an explicit opt-out is required for cases that don't reach the fixpoint), and not verify it outside tests. Verifying it for -passes=instcombine but not -passes='default<O3>' would be the heuristic for that.

the instcombine<no-verify-fixpoint> approach sgtm

re:

All instructions are initially pushed to the worklist. The initial order is (roughly) in program order / RPO.
All newly inserted instructions get added to the worklist.
When an instruction is folded, its users get added back to the worklist.
When the use-count of an instruction decreases, it gets added back to the worklist.
...plus a bunch of other heuristics on when we should revisit instructions.

What does it look like if instead of decreasing iteration count, we change re-insertion
logic based on iteration?
I.e:
iteration 1 -> do everything
iteration 2+ -> only re-add newly created insn or insn that are now single-use.

In D154579#4477942, @goldstein.w.n wrote:

All instructions are initially pushed to the worklist. The initial order is (roughly) in program order / RPO.
All newly inserted instructions get added to the worklist.
When an instruction is folded, its users get added back to the worklist.
When the use-count of an instruction decreases, it gets added back to the worklist.
...plus a bunch of other heuristics on when we should revisit instructions.

What does it look like if instead of decreasing iteration count, we change re-insertion
logic based on iteration?
I.e:
iteration 1 -> do everything
iteration 2+ -> only re-add newly created insn or insn that are now single-use.

I don't think I understand your suggestion here. This sparse reprocessing is what the worklist is for -- and we do want to perform the reprocessing as part of the same iteration, not a later one, to make sure that folds working on later instructions see already folded operands, even if arriving at them requires multiple folds. If we delayed all reprocessing until a second iteration, folds would see operands after a single round of folding was applied to them, rather than in their final form.

nikic mentioned this in rG70aca7b12220: [InstCombine] Explicitly track dead edges. Jul 27 2023, 7:41 AM

Implement fix-point verification.

Move stat update.

aeubanks added inline comments. Jul 27 2023, 10:51 AM

llvm/lib/Passes/PassBuilderPipelines.cpp
369	imo the `InstCombinePass` constructor should default to `no-verify-fixpoint`, but `parseInstCombineOptions` should by default set `verify-fixpoint`, since we typically call the `InstCombinePass` constructor from pass pipelines

Move default to parseInstCombineOptions().

nikic marked an inline comment as done. Jul 28 2023, 1:58 AM

nikic added inline comments.

llvm/lib/Passes/PassBuilderPipelines.cpp
369	Good point. In fact, I missed some InstCombinePass() uses in BackendUtil in the previous patch. Doing this in option parsing makes sure all C++ uses of InstCombinePass don't get fixpoint verification.
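The split of defaults discussed here can be sketched in isolation as follows. This is a self-contained approximation for illustration only: `Options` and `parseOptions` are hypothetical stand-ins for `InstCombineOptions` and `parseInstCombineOptions`, `std::string_view` replaces LLVM's `StringRef`, the `max-iterations=` parameter is left out, and the committed code may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Stand-in for the options struct. The C++ default is *not* to verify the
// fixpoint, so pipelines that construct the pass directly skip the check.
struct Options {
  bool UseLoopInfo = false;
  bool VerifyFixpoint = false;
  unsigned MaxIterations = 1;
};

// Parse a ';'-separated parameter string such as "no-verify-fixpoint".
// Returns std::nullopt for an unknown parameter.
static std::optional<Options> parseOptions(std::string_view Params) {
  Options Result;
  // Textual pipelines (e.g. -passes=instcombine in tests) verify the fixpoint
  // unless "no-verify-fixpoint" is given explicitly.
  Result.VerifyFixpoint = true;
  while (!Params.empty()) {
    // Split off the next parameter name at the first ';'.
    std::size_t Sep = Params.find(';');
    std::string_view Name = Params.substr(0, Sep);
    Params = Sep == std::string_view::npos ? std::string_view()
                                           : Params.substr(Sep + 1);

    // A leading "no-" disables the named option.
    bool Enable = true;
    if (Name.substr(0, 3) == "no-") {
      Enable = false;
      Name.remove_prefix(3);
    }

    if (Name == "use-loop-info")
      Result.UseLoopInfo = Enable;
    else if (Name == "verify-fixpoint")
      Result.VerifyFixpoint = Enable;
    else
      return std::nullopt;
  }
  return Result;
}
```

With this arrangement, a default-constructed `Options` (the C++ path) does not request verification, while an empty parameter string (the textual path) does.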

nikic marked an inline comment as done. Jul 28 2023, 1:59 AM

nikic added inline comments.

llvm/lib/Passes/PassRegistry.def
328 ↗	(On Diff #545068)	This didn't get removed when FUNCTION_PASS_WITH_PARAMS was added below.

are you still looking into some of the remaining cases, or is this in a state you want to land now?

In D154579#4543098, @aeubanks wrote:

are you still looking into some of the remaining cases, or is this in a state you want to land now?

This is ready to land as far as I'm concerned.

lgtm

This revision is now accepted and ready to land. Jul 28 2023, 10:40 AM

aeubanks mentioned this in D75362: [InstCombine] Process blocks in RPO. Jul 28 2023, 10:44 AM

nikic mentioned this in rGad7f02010f32: [InstCombine] Process blocks in RPO. Jul 30 2023, 9:39 AM

This revision was landed with ongoing or failed builds. Jul 31 2023, 1:57 AM

Closed by commit rG41895843b591: [InstCombine] Only perform one iteration (authored by nikic).

This revision was automatically updated to reflect the committed changes.

nikic added a commit: rG41895843b591: [InstCombine] Only perform one iteration.

bjope added a subscriber: bjope. Jul 31 2023, 6:20 AM

bjope added inline comments.

llvm/lib/Passes/PassRegistry.def
511 ↗	(On Diff #545541)	We should add "no-verify-fixpoint;verify-fixpoint;" here, right?

bjope added inline comments. Jul 31 2023, 6:23 AM

llvm/lib/Passes/PassRegistry.def
511 ↗	(On Diff #545541)	Also noticed that `instcombine<verify-fixpoint>` will be tricky to use in fuzz testing with random pipelines. So I think we will avoid that.

bjope added inline comments. Jul 31 2023, 12:47 PM

llvm/lib/Passes/PassRegistry.def
511 ↗	(On Diff #545541)	I solved this in https://reviews.llvm.org/rG5fbee1c6e300eee9ce9d18275bf8a6de0a22ba59

nikic added inline comments. Jul 31 2023, 12:50 PM

llvm/lib/Passes/PassRegistry.def
511 ↗	(On Diff #545541)	Thank you! And yes, for fuzzing purposes, `instcombine<no-verify-fixpoint>` should be used.

Revision Contents

Path	Size
llvm/include/llvm/Transforms/InstCombine/InstCombine.h	9 lines
llvm/lib/Passes/PassBuilder.cpp	2 lines
llvm/lib/Passes/PassBuilderPipelines.cpp	42 lines
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp	37 lines
llvm/test/Analysis/ValueTracking/numsignbits-from-assume.ll	7 lines
llvm/test/Other/new-pm-print-pipeline.ll	4 lines
llvm/test/Transforms/InstCombine/constant-fold-iteration.ll	5 lines
llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll	17 lines
llvm/test/Transforms/InstCombine/pr55228.ll	7 lines
llvm/test/Transforms/InstCombine/shift.ll	7 lines
llvm/test/Transforms/PGOProfile/chr.ll	17 lines
llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll	12 lines

Diff 544774

llvm/include/llvm/Transforms/InstCombine/InstCombine.h

	[... 19 lines not shown ...]
	#include "llvm/IR/PassManager.h"			#include "llvm/IR/PassManager.h"
	#include "llvm/Pass.h"			#include "llvm/Pass.h"

	#define DEBUG_TYPE "instcombine"			#define DEBUG_TYPE "instcombine"
	#include "llvm/Transforms/Utils/InstructionWorklist.h"			#include "llvm/Transforms/Utils/InstructionWorklist.h"

	namespace llvm {			namespace llvm {

	static constexpr unsigned InstCombineDefaultMaxIterations = 1000;			static constexpr unsigned InstCombineDefaultMaxIterations = 1;

	struct InstCombineOptions {			struct InstCombineOptions {
	bool UseLoopInfo = false;			bool UseLoopInfo = false;
				// Verify that a fix point has been reached after MaxIterations.
				bool VerifyFixpoint = true;
	unsigned MaxIterations = InstCombineDefaultMaxIterations;			unsigned MaxIterations = InstCombineDefaultMaxIterations;

	InstCombineOptions() = default;			InstCombineOptions() = default;

	InstCombineOptions &setUseLoopInfo(bool Value) {			InstCombineOptions &setUseLoopInfo(bool Value) {
	UseLoopInfo = Value;			UseLoopInfo = Value;
	return *this;			return *this;
	}			}

				InstCombineOptions &setVerifyFixpoint(bool Value) {
				VerifyFixpoint = Value;
				return *this;
				}

	InstCombineOptions &setMaxIterations(unsigned Value) {			InstCombineOptions &setMaxIterations(unsigned Value) {
	MaxIterations = Value;			MaxIterations = Value;
	return *this;			return *this;
	}			}
	};			};

	class InstCombinePass : public PassInfoMixin<InstCombinePass> {			class InstCombinePass : public PassInfoMixin<InstCombinePass> {
	private:			private:
	[... 45 lines not shown ...]

llvm/lib/Passes/PassBuilder.cpp

[... 846 lines not shown ...]	Expected<InstCombineOptions> parseInstCombineOptions(StringRef Params) {
InstCombineOptions Result;		InstCombineOptions Result;
while (!Params.empty()) {		while (!Params.empty()) {
StringRef ParamName;		StringRef ParamName;
std::tie(ParamName, Params) = Params.split(';');		std::tie(ParamName, Params) = Params.split(';');

bool Enable = !ParamName.consume_front("no-");		bool Enable = !ParamName.consume_front("no-");
if (ParamName == "use-loop-info") {		if (ParamName == "use-loop-info") {
Result.setUseLoopInfo(Enable);		Result.setUseLoopInfo(Enable);
		} else if (ParamName == "verify-fixpoint") {
		Result.setVerifyFixpoint(Enable);
} else if (Enable && ParamName.consume_front("max-iterations=")) {		} else if (Enable && ParamName.consume_front("max-iterations=")) {
APInt MaxIterations;		APInt MaxIterations;
if (ParamName.getAsInteger(0, MaxIterations))		if (ParamName.getAsInteger(0, MaxIterations))
return make_error<StringError>(		return make_error<StringError>(
formatv("invalid argument to InstCombine pass max-iterations "		formatv("invalid argument to InstCombine pass max-iterations "
"parameter: '{0}' ",		"parameter: '{0}' ",
ParamName).str(),		ParamName).str(),
inconvertibleErrorCode());		inconvertibleErrorCode());
[... 1,225 lines not shown ...]

llvm/lib/Passes/PassBuilderPipelines.cpp

[... 360 lines not shown ...]
}		}

// Helper to check if the current compilation phase is preparing for LTO		// Helper to check if the current compilation phase is preparing for LTO
static bool isLTOPreLink(ThinOrFullLTOPhase Phase) {		static bool isLTOPreLink(ThinOrFullLTOPhase Phase) {
return Phase == ThinOrFullLTOPhase::ThinLTOPreLink ||		return Phase == ThinOrFullLTOPhase::ThinLTOPreLink ||
Phase == ThinOrFullLTOPhase::FullLTOPreLink;		Phase == ThinOrFullLTOPhase::FullLTOPreLink;
}		}

		static InstCombinePass createInstCombinePass() {
		// InstCombine passes in the optimization pipeline should not verify that
		// a fixpoint has been reached.
		return InstCombinePass(InstCombineOptions().setVerifyFixpoint(false));
		}

// TODO: Investigate the cost/benefit of tail call elimination on debugging.		// TODO: Investigate the cost/benefit of tail call elimination on debugging.
FunctionPassManager		FunctionPassManager
PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,		PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
ThinOrFullLTOPhase Phase) {		ThinOrFullLTOPhase Phase) {

FunctionPassManager FPM;		FunctionPassManager FPM;

if (AreStatisticsEnabled())		if (AreStatisticsEnabled())
FPM.addPass(CountVisitsPass());		FPM.addPass(CountVisitsPass());

// Form SSA out of local memory accesses after breaking apart aggregates into		// Form SSA out of local memory accesses after breaking apart aggregates into
// scalars.		// scalars.
FPM.addPass(SROAPass(SROAOptions::ModifyCFG));		FPM.addPass(SROAPass(SROAOptions::ModifyCFG));

// Catch trivial redundancies		// Catch trivial redundancies
FPM.addPass(EarlyCSEPass(true /* Enable mem-ssa. */));		FPM.addPass(EarlyCSEPass(true /* Enable mem-ssa. */));

// Hoisting of scalars and load expressions.		// Hoisting of scalars and load expressions.
FPM.addPass(		FPM.addPass(
SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));		SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
FPM.addPass(InstCombinePass());		FPM.addPass(createInstCombinePass());

FPM.addPass(LibCallsShrinkWrapPass());		FPM.addPass(LibCallsShrinkWrapPass());

invokePeepholeEPCallbacks(FPM, Level);		invokePeepholeEPCallbacks(FPM, Level);

FPM.addPass(		FPM.addPass(
SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));		SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));

[... 60 lines not shown ...]	PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,

invokeLoopOptimizerEndEPCallbacks(LPM2, Level);		invokeLoopOptimizerEndEPCallbacks(LPM2, Level);

FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),		FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
/*UseMemorySSA=*/true,		/*UseMemorySSA=*/true,
/*UseBlockFrequencyInfo=*/true));		/*UseBlockFrequencyInfo=*/true));
FPM.addPass(		FPM.addPass(
SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));		SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
FPM.addPass(InstCombinePass());		FPM.addPass(createInstCombinePass());
// The loop passes in LPM2 (LoopFullUnrollPass) do not preserve MemorySSA.		// The loop passes in LPM2 (LoopFullUnrollPass) do not preserve MemorySSA.
// All loop passes must preserve it, in order to be able to use it.		// All loop passes must preserve it, in order to be able to use it.
FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),		FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
/*UseMemorySSA=*/false,		/*UseMemorySSA=*/false,
/*UseBlockFrequencyInfo=*/false));		/*UseBlockFrequencyInfo=*/false));

// Delete small array after loop unroll.		// Delete small array after loop unroll.
FPM.addPass(SROAPass(SROAOptions::ModifyCFG));		FPM.addPass(SROAPass(SROAOptions::ModifyCFG));

// Specially optimize memory movement as it doesn't look like dataflow in SSA.		// Specially optimize memory movement as it doesn't look like dataflow in SSA.
FPM.addPass(MemCpyOptPass());		FPM.addPass(MemCpyOptPass());

// Sparse conditional constant propagation.		// Sparse conditional constant propagation.
// FIXME: It isn't clear why we do this after loop passes rather than		// FIXME: It isn't clear why we do this after loop passes rather than
// before...		// before...
FPM.addPass(SCCPPass());		FPM.addPass(SCCPPass());

// Delete dead bit computations (instcombine runs after to fold away the dead		// Delete dead bit computations (instcombine runs after to fold away the dead
// computations, and then ADCE will run later to exploit any new DCE		// computations, and then ADCE will run later to exploit any new DCE
// opportunities that creates).		// opportunities that creates).
FPM.addPass(BDCEPass());		FPM.addPass(BDCEPass());

// Run instcombine after redundancy and dead bit elimination to exploit		// Run instcombine after redundancy and dead bit elimination to exploit
// opportunities opened up by them.		// opportunities opened up by them.
FPM.addPass(InstCombinePass());		FPM.addPass(createInstCombinePass());
invokePeepholeEPCallbacks(FPM, Level);		invokePeepholeEPCallbacks(FPM, Level);

FPM.addPass(CoroElidePass());		FPM.addPass(CoroElidePass());

invokeScalarOptimizerLateEPCallbacks(FPM, Level);		invokeScalarOptimizerLateEPCallbacks(FPM, Level);

// Finally, do an expensive DCE pass to catch all the dead code exposed by		// Finally, do an expensive DCE pass to catch all the dead code exposed by
// the simplifications and basic cleanup after all the simplifications.		// the simplifications and basic cleanup after all the simplifications.
// TODO: Investigate if this is too expensive.		// TODO: Investigate if this is too expensive.
FPM.addPass(ADCEPass());		FPM.addPass(ADCEPass());
FPM.addPass(		FPM.addPass(
SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));		SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
FPM.addPass(InstCombinePass());		FPM.addPass(createInstCombinePass());
invokePeepholeEPCallbacks(FPM, Level);		invokePeepholeEPCallbacks(FPM, Level);

return FPM;		return FPM;
}		}

FunctionPassManager		FunctionPassManager
PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,		PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
ThinOrFullLTOPhase Phase) {		ThinOrFullLTOPhase Phase) {
[... 33 lines not shown ...]	PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
FPM.addPass(SpeculativeExecutionPass(/* OnlyIfDivergentTarget =*/true));		FPM.addPass(SpeculativeExecutionPass(/* OnlyIfDivergentTarget =*/true));

// Optimize based on known information about branches, and cleanup afterward.		// Optimize based on known information about branches, and cleanup afterward.
FPM.addPass(JumpThreadingPass());		FPM.addPass(JumpThreadingPass());
FPM.addPass(CorrelatedValuePropagationPass());		FPM.addPass(CorrelatedValuePropagationPass());

FPM.addPass(		FPM.addPass(
SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));		SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
FPM.addPass(InstCombinePass());		FPM.addPass(createInstCombinePass());
FPM.addPass(AggressiveInstCombinePass());		FPM.addPass(AggressiveInstCombinePass());

if (EnableConstraintElimination)		if (EnableConstraintElimination)
FPM.addPass(ConstraintEliminationPass());		FPM.addPass(ConstraintEliminationPass());

if (!Level.isOptimizingForSize())		if (!Level.isOptimizingForSize())
FPM.addPass(LibCallsShrinkWrapPass());		FPM.addPass(LibCallsShrinkWrapPass());

[... 74 lines not shown ...]	PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,

invokeLoopOptimizerEndEPCallbacks(LPM2, Level);		invokeLoopOptimizerEndEPCallbacks(LPM2, Level);

FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),		FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
/*UseMemorySSA=*/true,		/*UseMemorySSA=*/true,
/*UseBlockFrequencyInfo=*/true));		/*UseBlockFrequencyInfo=*/true));
FPM.addPass(		FPM.addPass(
SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));		SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
FPM.addPass(InstCombinePass());		FPM.addPass(createInstCombinePass());
// The loop passes in LPM2 (LoopIdiomRecognizePass, IndVarSimplifyPass,		// The loop passes in LPM2 (LoopIdiomRecognizePass, IndVarSimplifyPass,
// LoopDeletionPass and LoopFullUnrollPass) do not preserve MemorySSA.		// LoopDeletionPass and LoopFullUnrollPass) do not preserve MemorySSA.
// All loop passes must preserve it, in order to be able to use it.		// All loop passes must preserve it, in order to be able to use it.
FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),		FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
/*UseMemorySSA=*/false,		/*UseMemorySSA=*/false,
/*UseBlockFrequencyInfo=*/false));		/*UseBlockFrequencyInfo=*/false));

// Delete small array after loop unroll.		// Delete small array after loop unroll.
[... 17 lines not shown ...]	PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,

// Delete dead bit computations (instcombine runs after to fold away the dead		// Delete dead bit computations (instcombine runs after to fold away the dead
// computations, and then ADCE will run later to exploit any new DCE		// computations, and then ADCE will run later to exploit any new DCE
// opportunities that creates).		// opportunities that creates).
FPM.addPass(BDCEPass());		FPM.addPass(BDCEPass());

// Run instcombine after redundancy and dead bit elimination to exploit		// Run instcombine after redundancy and dead bit elimination to exploit
// opportunities opened up by them.		// opportunities opened up by them.
FPM.addPass(InstCombinePass());		FPM.addPass(createInstCombinePass());
invokePeepholeEPCallbacks(FPM, Level);		invokePeepholeEPCallbacks(FPM, Level);

// Re-consider control flow based optimizations after redundancy elimination,		// Re-consider control flow based optimizations after redundancy elimination,
// redo DCE, etc.		// redo DCE, etc.
if (EnableDFAJumpThreading && Level.getSizeLevel() == 0)		if (EnableDFAJumpThreading && Level.getSizeLevel() == 0)
FPM.addPass(DFAJumpThreadingPass());		FPM.addPass(DFAJumpThreadingPass());

FPM.addPass(JumpThreadingPass());		FPM.addPass(JumpThreadingPass());
[... 18 lines not shown ...]	PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
FPM.addPass(CoroElidePass());		FPM.addPass(CoroElidePass());

invokeScalarOptimizerLateEPCallbacks(FPM, Level);		invokeScalarOptimizerLateEPCallbacks(FPM, Level);

FPM.addPass(SimplifyCFGPass(SimplifyCFGOptions()		FPM.addPass(SimplifyCFGPass(SimplifyCFGOptions()
.convertSwitchRangeToICmp(true)		.convertSwitchRangeToICmp(true)
.hoistCommonInsts(true)		.hoistCommonInsts(true)
.sinkCommonInsts(true)));		.sinkCommonInsts(true)));
FPM.addPass(InstCombinePass());		FPM.addPass(createInstCombinePass());
invokePeepholeEPCallbacks(FPM, Level);		invokePeepholeEPCallbacks(FPM, Level);

return FPM;		return FPM;
}		}

void PassBuilder::addRequiredLTOPreLinkPasses(ModulePassManager &MPM) {		void PassBuilder::addRequiredLTOPreLinkPasses(ModulePassManager &MPM) {
MPM.addPass(CanonicalizeAliasesPass());		MPM.addPass(CanonicalizeAliasesPass());
MPM.addPass(NameAnonGlobalPass());		MPM.addPass(NameAnonGlobalPass());
[... 21 lines not shown ...]	ModuleInlinerWrapperPass MIWP(
InlineContext{LTOPhase, InlinePass::EarlyInliner});		InlineContext{LTOPhase, InlinePass::EarlyInliner});
CGSCCPassManager &CGPipeline = MIWP.getPM();		CGSCCPassManager &CGPipeline = MIWP.getPM();

FunctionPassManager FPM;		FunctionPassManager FPM;
FPM.addPass(SROAPass(SROAOptions::ModifyCFG));		FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
FPM.addPass(EarlyCSEPass()); // Catch trivial redundancies.		FPM.addPass(EarlyCSEPass()); // Catch trivial redundancies.
FPM.addPass(SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(		FPM.addPass(SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(
true))); // Merge & remove basic blocks.		true))); // Merge & remove basic blocks.
FPM.addPass(InstCombinePass()); // Combine silly sequences.		FPM.addPass(createInstCombinePass()); // Combine silly sequences.
invokePeepholeEPCallbacks(FPM, Level);		invokePeepholeEPCallbacks(FPM, Level);

CGPipeline.addPass(createCGSCCToFunctionPassAdaptor(		CGPipeline.addPass(createCGSCCToFunctionPassAdaptor(
std::move(FPM), PTO.EagerlyInvalidateAnalyses));		std::move(FPM), PTO.EagerlyInvalidateAnalyses));

MPM.addPass(std::move(MIWP));		MPM.addPass(std::move(MIWP));

// Delete anything that is now dead to make sure that we don't instrument		// Delete anything that is now dead to make sure that we don't instrument
[... 315 lines not shown ...]	PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
// Optimize globals to try and fold them into constants.		// Optimize globals to try and fold them into constants.
MPM.addPass(GlobalOptPass());		MPM.addPass(GlobalOptPass());

// Create a small function pass pipeline to cleanup after all the global		// Create a small function pass pipeline to cleanup after all the global
// optimizations.		// optimizations.
FunctionPassManager GlobalCleanupPM;		FunctionPassManager GlobalCleanupPM;
// FIXME: Should this instead by a run of SROA?		// FIXME: Should this instead by a run of SROA?
GlobalCleanupPM.addPass(PromotePass());		GlobalCleanupPM.addPass(PromotePass());
GlobalCleanupPM.addPass(InstCombinePass());		GlobalCleanupPM.addPass(createInstCombinePass());
invokePeepholeEPCallbacks(GlobalCleanupPM, Level);		invokePeepholeEPCallbacks(GlobalCleanupPM, Level);
GlobalCleanupPM.addPass(		GlobalCleanupPM.addPass(
SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));		SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
MPM.addPass(createModuleToFunctionPassAdaptor(std::move(GlobalCleanupPM),		MPM.addPass(createModuleToFunctionPassAdaptor(std::move(GlobalCleanupPM),
PTO.EagerlyInvalidateAnalyses));		PTO.EagerlyInvalidateAnalyses));

// Add all the requested passes for instrumentation PGO, if requested.		// Add all the requested passes for instrumentation PGO, if requested.
if (PGOOpt && Phase != ThinOrFullLTOPhase::ThinLTOPostLink &&		if (PGOOpt && Phase != ThinOrFullLTOPhase::ThinLTOPostLink &&
[... 67 lines not shown ...]	void PassBuilder::addVectorPasses(OptimizationLevel Level,
}		}

if (!IsFullLTO) {		if (!IsFullLTO) {
// Eliminate loads by forwarding stores from the previous iteration to loads		// Eliminate loads by forwarding stores from the previous iteration to loads
// of the current iteration.		// of the current iteration.
FPM.addPass(LoopLoadEliminationPass());		FPM.addPass(LoopLoadEliminationPass());
}		}
// Cleanup after the loop optimization passes.		// Cleanup after the loop optimization passes.
FPM.addPass(InstCombinePass());		FPM.addPass(createInstCombinePass());

if (Level.getSpeedupLevel() > 1 && ExtraVectorizerPasses) {		if (Level.getSpeedupLevel() > 1 && ExtraVectorizerPasses) {
ExtraVectorPassManager ExtraPasses;		ExtraVectorPassManager ExtraPasses;
// At higher optimization levels, try to clean up any runtime overlap and		// At higher optimization levels, try to clean up any runtime overlap and
// alignment checks inserted by the vectorizer. We want to track correlated		// alignment checks inserted by the vectorizer. We want to track correlated
// runtime checks for two inner loops in the same outer loop, fold any		// runtime checks for two inner loops in the same outer loop, fold any
// common computations, hoist loop-invariant aspects out of any outer loop,		// common computations, hoist loop-invariant aspects out of any outer loop,
// and unswitch the runtime checks if possible. Once hoisted, we may have		// and unswitch the runtime checks if possible. Once hoisted, we may have
// dead (or speculatable) control flows or more combining opportunities.		// dead (or speculatable) control flows or more combining opportunities.
ExtraPasses.addPass(EarlyCSEPass());		ExtraPasses.addPass(EarlyCSEPass());
ExtraPasses.addPass(CorrelatedValuePropagationPass());		ExtraPasses.addPass(CorrelatedValuePropagationPass());
ExtraPasses.addPass(InstCombinePass());		ExtraPasses.addPass(createInstCombinePass());
LoopPassManager LPM;		LoopPassManager LPM;
LPM.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,		LPM.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
/*AllowSpeculation=*/true));		/*AllowSpeculation=*/true));
LPM.addPass(SimpleLoopUnswitchPass(/* NonTrivial */ Level ==		LPM.addPass(SimpleLoopUnswitchPass(/* NonTrivial */ Level ==
OptimizationLevel::O3));		OptimizationLevel::O3));
ExtraPasses.addPass(		ExtraPasses.addPass(
createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/true,		createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/true,
/*UseBlockFrequencyInfo=*/true));		/*UseBlockFrequencyInfo=*/true));
ExtraPasses.addPass(		ExtraPasses.addPass(
SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));		SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
ExtraPasses.addPass(InstCombinePass());		ExtraPasses.addPass(createInstCombinePass());
FPM.addPass(std::move(ExtraPasses));		FPM.addPass(std::move(ExtraPasses));
}		}

// Now that we've formed fast to execute loop structures, we do further		// Now that we've formed fast to execute loop structures, we do further
// optimizations. These are run afterward as they might block doing complex		// optimizations. These are run afterward as they might block doing complex
// analyses and transforms such as what are needed for loop vectorization.		// analyses and transforms such as what are needed for loop vectorization.

// Cleanup after loop vectorization, etc. Simplification passes like CVP and		// Cleanup after loop vectorization, etc. Simplification passes like CVP and
// GVN, loop transforms, and others have already run, so it's now better to		// GVN, loop transforms, and others have already run, so it's now better to
// convert to more optimized IR using more aggressive simplify CFG options.		// convert to more optimized IR using more aggressive simplify CFG options.
// The extra sinking transform can create larger basic blocks, so do this		// The extra sinking transform can create larger basic blocks, so do this
// before SLP vectorization.		// before SLP vectorization.
FPM.addPass(SimplifyCFGPass(SimplifyCFGOptions()		FPM.addPass(SimplifyCFGPass(SimplifyCFGOptions()
.forwardSwitchCondToPhi(true)		.forwardSwitchCondToPhi(true)
.convertSwitchRangeToICmp(true)		.convertSwitchRangeToICmp(true)
.convertSwitchToLookupTable(true)		.convertSwitchToLookupTable(true)
.needCanonicalLoops(false)		.needCanonicalLoops(false)
.hoistCommonInsts(true)		.hoistCommonInsts(true)
.sinkCommonInsts(true)));		.sinkCommonInsts(true)));

if (IsFullLTO) {		if (IsFullLTO) {
FPM.addPass(SCCPPass());		FPM.addPass(SCCPPass());
FPM.addPass(InstCombinePass());		FPM.addPass(createInstCombinePass());
FPM.addPass(BDCEPass());		FPM.addPass(BDCEPass());
}		}

// Optimize parallel scalar instruction chains into SIMD instructions.		// Optimize parallel scalar instruction chains into SIMD instructions.
if (PTO.SLPVectorization) {		if (PTO.SLPVectorization) {
FPM.addPass(SLPVectorizerPass());		FPM.addPass(SLPVectorizerPass());
if (Level.getSpeedupLevel() > 1 && ExtraVectorizerPasses) {		if (Level.getSpeedupLevel() > 1 && ExtraVectorizerPasses) {
FPM.addPass(EarlyCSEPass());		FPM.addPass(EarlyCSEPass());
}		}
}		}
// Enhance/cleanup vector code.		// Enhance/cleanup vector code.
FPM.addPass(VectorCombinePass());		FPM.addPass(VectorCombinePass());

if (!IsFullLTO) {		if (!IsFullLTO) {
FPM.addPass(InstCombinePass());		FPM.addPass(createInstCombinePass());
// Unroll small loops to hide loop backedge latency and saturate any		// Unroll small loops to hide loop backedge latency and saturate any
// parallel execution resources of an out-of-order processor. We also then		// parallel execution resources of an out-of-order processor. We also then
// need to clean up redundancies and loop invariant code.		// need to clean up redundancies and loop invariant code.
// FIXME: It would be really good to use a loop-integrated instruction		// FIXME: It would be really good to use a loop-integrated instruction
// combiner for cleanup here so that the unrolling and LICM can be pipelined		// combiner for cleanup here so that the unrolling and LICM can be pipelined
// across the loop nests.		// across the loop nests.
// We do UnrollAndJam in a separate LPM to ensure it happens before unroll		// We do UnrollAndJam in a separate LPM to ensure it happens before unroll
if (EnableUnrollAndJam && PTO.LoopUnrolling) {		if (EnableUnrollAndJam && PTO.LoopUnrolling) {
FPM.addPass(createFunctionToLoopPassAdaptor(		FPM.addPass(createFunctionToLoopPassAdaptor(
LoopUnrollAndJamPass(Level.getSpeedupLevel())));		LoopUnrollAndJamPass(Level.getSpeedupLevel())));
}		}
FPM.addPass(LoopUnrollPass(LoopUnrollOptions(		FPM.addPass(LoopUnrollPass(LoopUnrollOptions(
Level.getSpeedupLevel(), /*OnlyWhenForced=*/!PTO.LoopUnrolling,		Level.getSpeedupLevel(), /*OnlyWhenForced=*/!PTO.LoopUnrolling,
PTO.ForgetAllSCEVInLoopUnroll)));		PTO.ForgetAllSCEVInLoopUnroll)));
FPM.addPass(WarnMissedTransformationsPass());		FPM.addPass(WarnMissedTransformationsPass());
// Now that we are done with loop unrolling, be it either by LoopVectorizer,		// Now that we are done with loop unrolling, be it either by LoopVectorizer,
// or LoopUnroll passes, some variable-offset GEP's into alloca's could have		// or LoopUnroll passes, some variable-offset GEP's into alloca's could have
// become constant-offset, thus enabling SROA and alloca promotion. Do so.		// become constant-offset, thus enabling SROA and alloca promotion. Do so.
// NOTE: we are very late in the pipeline, and we don't have any LICM		// NOTE: we are very late in the pipeline, and we don't have any LICM
// or SimplifyCFG passes scheduled after us, that would cleanup		// or SimplifyCFG passes scheduled after us, that would cleanup
// the CFG mess this may created if allowed to modify CFG, so forbid that.		// the CFG mess this may created if allowed to modify CFG, so forbid that.
FPM.addPass(SROAPass(SROAOptions::PreserveCFG));		FPM.addPass(SROAPass(SROAOptions::PreserveCFG));
}		}

FPM.addPass(InstCombinePass());		FPM.addPass(createInstCombinePass());

// This is needed for two reasons:		// This is needed for two reasons:
// 1. It works around problems that instcombine introduces, such as sinking		// 1. It works around problems that instcombine introduces, such as sinking
// expensive FP divides into loops containing multiplications using the		// expensive FP divides into loops containing multiplications using the
// divide result.		// divide result.
// 2. It helps to clean up some loop-invariant code created by the loop		// 2. It helps to clean up some loop-invariant code created by the loop
// unroll pass when IsFullLTO=false.		// unroll pass when IsFullLTO=false.
FPM.addPass(createFunctionToLoopPassAdaptor(		FPM.addPass(createFunctionToLoopPassAdaptor(
[... 461 lines not shown ...]	PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
// Remove unused arguments from functions.		// Remove unused arguments from functions.
MPM.addPass(DeadArgumentEliminationPass());		MPM.addPass(DeadArgumentEliminationPass());

// Reduce the code after globalopt and ipsccp. Both can open up significant		// Reduce the code after globalopt and ipsccp. Both can open up significant
// simplification opportunities, and both can propagate functions through		// simplification opportunities, and both can propagate functions through
// function pointers. When this happens, we often have to resolve varargs		// function pointers. When this happens, we often have to resolve varargs
// calls, etc, so let instcombine do this.		// calls, etc, so let instcombine do this.
FunctionPassManager PeepholeFPM;		FunctionPassManager PeepholeFPM;
PeepholeFPM.addPass(InstCombinePass());		PeepholeFPM.addPass(createInstCombinePass());
if (Level.getSpeedupLevel() > 1)		if (Level.getSpeedupLevel() > 1)
PeepholeFPM.addPass(AggressiveInstCombinePass());		PeepholeFPM.addPass(AggressiveInstCombinePass());
invokePeepholeEPCallbacks(PeepholeFPM, Level);		invokePeepholeEPCallbacks(PeepholeFPM, Level);

MPM.addPass(createModuleToFunctionPassAdaptor(std::move(PeepholeFPM),		MPM.addPass(createModuleToFunctionPassAdaptor(std::move(PeepholeFPM),
PTO.EagerlyInvalidateAnalyses));		PTO.EagerlyInvalidateAnalyses));

// Note: historically, the PruneEH pass was run first to deduce nounwind and		// Note: historically, the PruneEH pass was run first to deduce nounwind and
[... 29 lines not shown ...]	PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
MPM.addPass(GlobalDCEPass(/*InLTOPostLink=*/true));		MPM.addPass(GlobalDCEPass(/*InLTOPostLink=*/true));

// If we didn't decide to inline a function, check to see if we can		// If we didn't decide to inline a function, check to see if we can
// transform it to pass arguments by value instead of by reference.		// transform it to pass arguments by value instead of by reference.
MPM.addPass(createModuleToPostOrderCGSCCPassAdaptor(ArgumentPromotionPass()));		MPM.addPass(createModuleToPostOrderCGSCCPassAdaptor(ArgumentPromotionPass()));

FunctionPassManager FPM;		FunctionPassManager FPM;
// The IPO Passes may leave cruft around. Clean up after them.		// The IPO Passes may leave cruft around. Clean up after them.
FPM.addPass(InstCombinePass());		FPM.addPass(createInstCombinePass());
invokePeepholeEPCallbacks(FPM, Level);		invokePeepholeEPCallbacks(FPM, Level);

if (EnableConstraintElimination)		if (EnableConstraintElimination)
FPM.addPass(ConstraintEliminationPass());		FPM.addPass(ConstraintEliminationPass());

FPM.addPass(JumpThreadingPass());		FPM.addPass(JumpThreadingPass());

// Do a post inline PGO instrumentation and use pass. This is a context		// Do a post inline PGO instrumentation and use pass. This is a context
[... 266 lines not shown ...]

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

[... 140 lines not shown ...]
static cl::opt<bool>		static cl::opt<bool>
EnableCodeSinking("instcombine-code-sinking", cl::desc("Enable code sinking"),		EnableCodeSinking("instcombine-code-sinking", cl::desc("Enable code sinking"),
cl::init(true));		cl::init(true));

static cl::opt<unsigned> MaxSinkNumUsers(		static cl::opt<unsigned> MaxSinkNumUsers(
"instcombine-max-sink-users", cl::init(32),		"instcombine-max-sink-users", cl::init(32),
cl::desc("Maximum number of undroppable users for instruction sinking"));		cl::desc("Maximum number of undroppable users for instruction sinking"));

		// FIXME: Remove this option, it has been superseded by verify-fixpoint.
		// Only keeping it for now to avoid unnecessary test churn in this patch.
static cl::opt<unsigned> InfiniteLoopDetectionThreshold(		static cl::opt<unsigned> InfiniteLoopDetectionThreshold(
"instcombine-infinite-loop-threshold",		"instcombine-infinite-loop-threshold",
cl::desc("Number of instruction combining iterations considered an "		cl::desc("Number of instruction combining iterations considered an "
"infinite loop"),		"infinite loop"),
cl::init(InstCombineDefaultInfiniteLoopThreshold), cl::Hidden);		cl::init(InstCombineDefaultInfiniteLoopThreshold), cl::Hidden);

static cl::opt<unsigned>		static cl::opt<unsigned>
MaxArraySize("instcombine-maxarray-size", cl::init(1024),		MaxArraySize("instcombine-maxarray-size", cl::init(1024),
[... 4,087 lines not shown ...]	prepareICWorklistFromFunction(Function &F, const DataLayout &DL,

return MadeIRChange;		return MadeIRChange;
}		}

static bool combineInstructionsOverFunction(		static bool combineInstructionsOverFunction(
Function &F, InstructionWorklist &Worklist, AliasAnalysis *AA,		Function &F, InstructionWorklist &Worklist, AliasAnalysis *AA,
AssumptionCache &AC, TargetLibraryInfo &TLI, TargetTransformInfo &TTI,		AssumptionCache &AC, TargetLibraryInfo &TLI, TargetTransformInfo &TTI,
DominatorTree &DT, OptimizationRemarkEmitter &ORE, BlockFrequencyInfo *BFI,		DominatorTree &DT, OptimizationRemarkEmitter &ORE, BlockFrequencyInfo *BFI,
ProfileSummaryInfo *PSI, unsigned MaxIterations, LoopInfo *LI) {		ProfileSummaryInfo *PSI, unsigned MaxIterations, bool VerifyFixpoint,
		LoopInfo *LI) {
auto &DL = F.getParent()->getDataLayout();		auto &DL = F.getParent()->getDataLayout();

/// Builder - This is an IRBuilder that automatically inserts new		/// Builder - This is an IRBuilder that automatically inserts new
/// instructions into the worklist when they are created.		/// instructions into the worklist when they are created.
IRBuilder<TargetFolder, IRBuilderCallbackInserter> Builder(		IRBuilder<TargetFolder, IRBuilderCallbackInserter> Builder(
F.getContext(), TargetFolder(DL),		F.getContext(), TargetFolder(DL),
IRBuilderCallbackInserter([&Worklist, &AC](Instruction *I) {		IRBuilderCallbackInserter([&Worklist, &AC](Instruction *I) {
Worklist.add(I);		Worklist.add(I);
if (auto *Assume = dyn_cast<AssumeInst>(I))		if (auto *Assume = dyn_cast<AssumeInst>(I))
AC.registerAssumption(Assume);		AC.registerAssumption(Assume);
}));		}));

ReversePostOrderTraversal<BasicBlock *> RPOT(&F.front());		ReversePostOrderTraversal<BasicBlock *> RPOT(&F.front());

// Lower dbg.declare intrinsics otherwise their value may be clobbered		// Lower dbg.declare intrinsics otherwise their value may be clobbered
// by instcombiner.		// by instcombiner.
bool MadeIRChange = false;		bool MadeIRChange = false;
if (ShouldLowerDbgDeclare)		if (ShouldLowerDbgDeclare)
MadeIRChange = LowerDbgDeclare(F);		MadeIRChange = LowerDbgDeclare(F);

// Iterate while there is work to do.		// Iterate while there is work to do.
unsigned Iteration = 0;		unsigned Iteration = 0;
while (true) {		while (true) {
		bool MadeChangeInThisIteration = false;
++NumWorklistIterations;		++NumWorklistIterations;
++Iteration;		++Iteration;

if (Iteration > InfiniteLoopDetectionThreshold) {		if (Iteration > MaxIterations && !VerifyFixpoint) {
report_fatal_error(
"Instruction Combining seems stuck in an infinite loop after " +
Twine(InfiniteLoopDetectionThreshold) + " iterations.");
}

if (Iteration > MaxIterations) {
LLVM_DEBUG(dbgs() << "\n\n[IC] Iteration limit #" << MaxIterations		LLVM_DEBUG(dbgs() << "\n\n[IC] Iteration limit #" << MaxIterations
<< " on " << F.getName()		<< " on " << F.getName()
<< " reached; stopping before reaching a fixpoint\n");		<< " reached; stopping without verifying fixpoint\n");
break;		break;
}		}

LLVM_DEBUG(dbgs() << "\n\nINSTCOMBINE ITERATION #" << Iteration << " on "		LLVM_DEBUG(dbgs() << "\n\nINSTCOMBINE ITERATION #" << Iteration << " on "
<< F.getName() << "\n");		<< F.getName() << "\n");

MadeIRChange |= prepareICWorklistFromFunction(F, DL, &TLI, Worklist, RPOT);		MadeChangeInThisIteration |=
		prepareICWorklistFromFunction(F, DL, &TLI, Worklist, RPOT);

InstCombinerImpl IC(Worklist, Builder, F.hasMinSize(), AA, AC, TLI, TTI, DT,		InstCombinerImpl IC(Worklist, Builder, F.hasMinSize(), AA, AC, TLI, TTI, DT,
ORE, BFI, PSI, DL, LI);		ORE, BFI, PSI, DL, LI);
IC.MaxArraySizeForCombine = MaxArraySize;		IC.MaxArraySizeForCombine = MaxArraySize;
		MadeChangeInThisIteration |= IC.run();
if (!IC.run())		if (!MadeChangeInThisIteration)
break;		break;

MadeIRChange = true;		MadeIRChange = true;
		if (Iteration > MaxIterations) {
		report_fatal_error(
		"Instruction Combining did not reach a fixpoint after " +
		Twine(MaxIterations) + " iterations");
		}
}		}

if (Iteration == 1)		if (Iteration == 1)
++NumOneIteration;		++NumOneIteration;
else if (Iteration == 2)		else if (Iteration == 2)
++NumTwoIterations;		++NumTwoIterations;
else if (Iteration == 3)		else if (Iteration == 3)
++NumThreeIterations;		++NumThreeIterations;
else		else
++NumFourOrMoreIterations;		++NumFourOrMoreIterations;

return MadeIRChange;		return MadeIRChange;
}		}
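
The control flow of the new loop on the right-hand side can be sketched in isolation. This is a minimal, self-contained model and not the code from the patch: `IterationFn`, `FixpointResult` and `runIterations` are hypothetical names, the callback stands in for worklist population plus `IC.run()`, and an exception replaces `report_fatal_error` purely so the behaviour can be checked.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for one InstCombine iteration (worklist population
// plus IC.run()); returns true if that iteration changed anything.
using IterationFn = std::function<bool(unsigned)>;

struct FixpointResult {
  bool MadeChange;     // True if any iteration reported a change.
  unsigned Iterations; // Value of the iteration counter when the loop exited.
};

FixpointResult runIterations(const IterationFn &RunIteration,
                             unsigned MaxIterations, bool VerifyFixpoint) {
  bool MadeChange = false;
  unsigned Iteration = 0;
  while (true) {
    ++Iteration;

    // Without verification, stop as soon as the iteration budget is spent.
    if (Iteration > MaxIterations && !VerifyFixpoint)
      break;

    // An iteration that changes nothing means a fixpoint was reached.
    if (!RunIteration(Iteration))
      break;

    MadeChange = true;

    // With verification, a change made beyond the budget is a hard error.
    // (The patch calls report_fatal_error; an exception is used here only
    // to keep the sketch testable.)
    if (Iteration > MaxIterations)
      throw std::runtime_error("did not reach a fixpoint after " +
                               std::to_string(MaxIterations) + " iterations");
  }
  return {MadeChange, Iteration};
}
```

Under these assumptions, with `MaxIterations = 1` and `VerifyFixpoint = true`, a function that stops changing after the first iteration exits normally during the second (verification) iteration, while one that still changes in the second iteration triggers the error. With `VerifyFixpoint = false` the second iteration is never run.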

InstCombinePass::InstCombinePass(InstCombineOptions Opts) : Options(Opts) {}		InstCombinePass::InstCombinePass(InstCombineOptions Opts) : Options(Opts) {}

void InstCombinePass::printPipeline(		void InstCombinePass::printPipeline(
raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {		raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
static_cast<PassInfoMixin<InstCombinePass> *>(this)->printPipeline(		static_cast<PassInfoMixin<InstCombinePass> *>(this)->printPipeline(
OS, MapClassName2PassName);		OS, MapClassName2PassName);
OS << '<';		OS << '<';
OS << "max-iterations=" << Options.MaxIterations << ";";		OS << "max-iterations=" << Options.MaxIterations << ";";
OS << (Options.UseLoopInfo ? "" : "no-") << "use-loop-info";		OS << (Options.UseLoopInfo ? "" : "no-") << "use-loop-info;";
		OS << (Options.VerifyFixpoint ? "" : "no-") << "verify-fixpoint";
OS << '>';		OS << '>';
}		}
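
To make the printed option string concrete, here is a small sketch that reproduces only the formatting logic of the right-hand `printPipeline`. `SketchOptions` and `formatInstCombineOptions` are hypothetical names, not LLVM APIs, and the defaults are taken from the CHECK-28 line of `new-pm-print-pipeline.ll` in this diff rather than from `InstCombine.h`.

```cpp
#include <sstream>
#include <string>

// Hypothetical mirror of the three options printed by the pass. Defaults
// follow the expected output for a plain `instcombine` in the updated test.
struct SketchOptions {
  unsigned MaxIterations = 1;
  bool UseLoopInfo = false;
  bool VerifyFixpoint = true;
};

// Builds the "<...>" suffix in the same order as the patched printPipeline:
// max-iterations, then use-loop-info, then verify-fixpoint.
std::string formatInstCombineOptions(const SketchOptions &Opts) {
  std::ostringstream OS;
  OS << '<';
  OS << "max-iterations=" << Opts.MaxIterations << ";";
  OS << (Opts.UseLoopInfo ? "" : "no-") << "use-loop-info;";
  OS << (Opts.VerifyFixpoint ? "" : "no-") << "verify-fixpoint";
  OS << '>';
  return OS.str();
}
```

The separator moved into the `use-loop-info;` literal because `verify-fixpoint` is now the last option printed, so no trailing `;` appears before `>`.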

PreservedAnalyses InstCombinePass::run(Function &F,		PreservedAnalyses InstCombinePass::run(Function &F,
FunctionAnalysisManager &AM) {		FunctionAnalysisManager &AM) {
auto &AC = AM.getResult<AssumptionAnalysis>(F);		auto &AC = AM.getResult<AssumptionAnalysis>(F);
auto &DT = AM.getResult<DominatorTreeAnalysis>(F);		auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);		auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);
[... 9 lines not shown ...]	PreservedAnalyses InstCombinePass::run(Function &F,
auto *AA = &AM.getResult<AAManager>(F);		auto *AA = &AM.getResult<AAManager>(F);
auto &MAMProxy = AM.getResult<ModuleAnalysisManagerFunctionProxy>(F);		auto &MAMProxy = AM.getResult<ModuleAnalysisManagerFunctionProxy>(F);
ProfileSummaryInfo *PSI =		ProfileSummaryInfo *PSI =
MAMProxy.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());		MAMProxy.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());
auto *BFI = (PSI && PSI->hasProfileSummary()) ?		auto *BFI = (PSI && PSI->hasProfileSummary()) ?
&AM.getResult<BlockFrequencyAnalysis>(F) : nullptr;		&AM.getResult<BlockFrequencyAnalysis>(F) : nullptr;

if (!combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, TTI, DT, ORE,		if (!combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, TTI, DT, ORE,
BFI, PSI, Options.MaxIterations, LI))		BFI, PSI, Options.MaxIterations,
		Options.VerifyFixpoint, LI))
// No changes, all analyses are preserved.		// No changes, all analyses are preserved.
return PreservedAnalyses::all();		return PreservedAnalyses::all();

// Mark all the analyses that instcombine updates as preserved.		// Mark all the analyses that instcombine updates as preserved.
PreservedAnalyses PA;		PreservedAnalyses PA;
PA.preserveSet<CFGAnalyses>();		PA.preserveSet<CFGAnalyses>();
return PA;		return PA;
}		}
[... 33 lines not shown ...]	ProfileSummaryInfo *PSI =
&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();		&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
BlockFrequencyInfo *BFI =		BlockFrequencyInfo *BFI =
(PSI && PSI->hasProfileSummary()) ?		(PSI && PSI->hasProfileSummary()) ?
&getAnalysis<LazyBlockFrequencyInfoPass>().getBFI() :		&getAnalysis<LazyBlockFrequencyInfoPass>().getBFI() :
nullptr;		nullptr;

return combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, TTI, DT, ORE,		return combineInstructionsOverFunction(F, Worklist, AA, AC, TLI, TTI, DT, ORE,
BFI, PSI,		BFI, PSI,
InstCombineDefaultMaxIterations, LI);		InstCombineDefaultMaxIterations,
		/*VerifyFixpoint*/ false, LI);
}		}

char InstructionCombiningPass::ID = 0;		char InstructionCombiningPass::ID = 0;

InstructionCombiningPass::InstructionCombiningPass() : FunctionPass(ID) {		InstructionCombiningPass::InstructionCombiningPass() : FunctionPass(ID) {
initializeInstructionCombiningPassPass(*PassRegistry::getPassRegistry());		initializeInstructionCombiningPassPass(*PassRegistry::getPassRegistry());
}		}

[... 22 lines not shown ...]

llvm/test/Analysis/ValueTracking/numsignbits-from-assume.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -passes=instcombine -S | FileCheck %s		; RUN: opt < %s -passes='instcombine<no-verify-fixpoint>' -S | FileCheck %s

		; FIXME: This does not currently reach a fix point, because an assume can only
		; be propagated backwards after its argument has been simplified.

define i32 @computeNumSignBits_add1(i32 %in) {		define i32 @computeNumSignBits_add1(i32 %in) {
; CHECK-LABEL: @computeNumSignBits_add1(		; CHECK-LABEL: @computeNumSignBits_add1(
; CHECK-NEXT: [[ADD:%.*]] = add i32 [[IN:%.*]], 1		; CHECK-NEXT: [[ADD:%.*]] = add i32 [[IN:%.*]], 1
; CHECK-NEXT: [[COND:%.*]] = icmp ult i32 [[ADD]], 43		; CHECK-NEXT: [[COND:%.*]] = icmp ult i32 [[ADD]], 43
; CHECK-NEXT: call void @llvm.assume(i1 [[COND]])		; CHECK-NEXT: call void @llvm.assume(i1 [[COND]])
; CHECK-NEXT: [[SH:%.*]] = shl nuw nsw i32 [[ADD]], 3		; CHECK-NEXT: [[SH:%.*]] = shl nuw nsw i32 [[ADD]], 3
; CHECK-NEXT: ret i32 [[SH]]		; CHECK-NEXT: ret i32 [[SH]]
[... 32 lines not shown ...]	;
%cond = icmp ule i32 %sub, 42		%cond = icmp ule i32 %sub, 42
call void @llvm.assume(i1 %cond)		call void @llvm.assume(i1 %cond)
%sh = shl i32 %sub, 3		%sh = shl i32 %sub, 3
ret i32 %sh		ret i32 %sh
}		}

define i32 @computeNumSignBits_sub2(i32 %in) {		define i32 @computeNumSignBits_sub2(i32 %in) {
; CHECK-LABEL: @computeNumSignBits_sub2(		; CHECK-LABEL: @computeNumSignBits_sub2(
; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[IN:%.*]], -1		; CHECK-NEXT: [[SUB:%.*]] = add i32 [[IN:%.*]], -1
		nikic (Author), inline comment: This is related to backwards-propagation of assumes: Assumes can affect guaranteed-to-transfer instructions in a limited window before the assume. We may fail to fold such cases in one iteration if we first need to fold instructions to bring the assume into a recognized form. Here the assume is only recognized by AC after ule is converted to ult, at which point the add before has already been visited. I don't think this issue matters in practice.
; CHECK-NEXT: [[COND:%.*]] = icmp ult i32 [[SUB]], 43		; CHECK-NEXT: [[COND:%.*]] = icmp ult i32 [[SUB]], 43
; CHECK-NEXT: call void @llvm.assume(i1 [[COND]])		; CHECK-NEXT: call void @llvm.assume(i1 [[COND]])
; CHECK-NEXT: [[SH:%.*]] = shl nuw nsw i32 [[SUB]], 3		; CHECK-NEXT: [[SH:%.*]] = shl nuw nsw i32 [[SUB]], 3
; CHECK-NEXT: ret i32 [[SH]]		; CHECK-NEXT: ret i32 [[SH]]
;		;
%sub = sub i32 %in, 1		%sub = sub i32 %in, 1
%cond = icmp ule i32 %sub, 42		%cond = icmp ule i32 %sub, 42
call void @llvm.assume(i1 %cond)		call void @llvm.assume(i1 %cond)
[... 50 lines not shown ...]

llvm/test/Other/new-pm-print-pipeline.ll

	[... 89 lines not shown ...]
	; RUN: opt -disable-output -passes='default<O2>' < %s			; RUN: opt -disable-output -passes='default<O2>' < %s
	; RUN: opt -disable-output -passes='default<O3>' < %s			; RUN: opt -disable-output -passes='default<O3>' < %s

	;; Test SeparateConstOffsetFromGEPPass option.			;; Test SeparateConstOffsetFromGEPPass option.
	; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='separate-const-offset-from-gep<lower-gep>' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-27			; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='separate-const-offset-from-gep<lower-gep>' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-27
	; CHECK-27: function(separate-const-offset-from-gep<lower-gep>)			; CHECK-27: function(separate-const-offset-from-gep<lower-gep>)

	;; Test InstCombine options - the first pass checks default settings, and the second checks customized options.			;; Test InstCombine options - the first pass checks default settings, and the second checks customized options.
	; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='function(instcombine,instcombine<use-loop-info;max-iterations=42>)' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-28			; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='function(instcombine,instcombine<use-loop-info;no-verify-fixpoint;max-iterations=42>)' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-28
	; CHECK-28: function(instcombine<max-iterations=1000;no-use-loop-info>,instcombine<max-iterations=42;use-loop-info>)			; CHECK-28: function(instcombine<max-iterations=1;no-use-loop-info;verify-fixpoint>,instcombine<max-iterations=42;use-loop-info;no-verify-fixpoint>)

	;; Test function-attrs			;; Test function-attrs
	; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='cgscc(function-attrs<skip-non-recursive>)' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-29			; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='cgscc(function-attrs<skip-non-recursive>)' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-29
	; CHECK-29: cgscc(function-attrs<skip-non-recursive>)			; CHECK-29: cgscc(function-attrs<skip-non-recursive>)

	;; Test cgscc -> function adaptor			;; Test cgscc -> function adaptor
	; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='cgscc(function<eager-inv;no-rerun>(no-op-function))' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-30			; RUN: opt -disable-output -disable-verify -print-pipeline-passes -passes='cgscc(function<eager-inv;no-rerun>(no-op-function))' < %s | FileCheck %s --match-full-lines --check-prefixes=CHECK-30
	; CHECK-30: cgscc(function<eager-inv;no-rerun>(no-op-function))			; CHECK-30: cgscc(function<eager-inv;no-rerun>(no-op-function))
	[... 15 lines not shown ...]

llvm/test/Transforms/InstCombine/constant-fold-iteration.ll

	; RUN: opt < %s -passes=instcombine -S -debug 2>&1 | FileCheck %s			; RUN: opt < %s -passes='instcombine<no-verify-fixpoint>' -S -debug 2>&1 | FileCheck %s
	; REQUIRES: asserts			; REQUIRES: asserts
	target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"			target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"

				; This test disables fixpoint verification, because that would cause a second
				; iteration for verification.

	define i32 @a() nounwind readnone {			define i32 @a() nounwind readnone {
	entry:			entry:
	ret i32 zext (i1 icmp eq (i32 0, i32 ptrtoint (ptr @a to i32)) to i32)			ret i32 zext (i1 icmp eq (i32 0, i32 ptrtoint (ptr @a to i32)) to i32)
	}			}
	; CHECK: INSTCOMBINE ITERATION #1			; CHECK: INSTCOMBINE ITERATION #1
	; CHECK-NOT: INSTCOMBINE ITERATION #2			; CHECK-NOT: INSTCOMBINE ITERATION #2

llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -passes=instcombine -instcombine-infinite-loop-threshold=3 -S | FileCheck %s		; RUN: opt < %s -passes='instcombine<no-verify-fixpoint>' -S | FileCheck %s

		; FIXME: This currently doesn't reach a fix point, because we don't
		; canonicalize the operand order of newly added phi nodes.

@var_7 = external global i8, align 1		@var_7 = external global i8, align 1
@var_1 = external global i32, align 4		@var_1 = external global i32, align 4
@var_0 = external global i16, align 2		@var_0 = external global i16, align 2
@var_5 = external global i64, align 8		@var_5 = external global i64, align 8
@arr_2 = external global [0 x i32], align 4		@arr_2 = external global [0 x i32], align 4
@arr_4 = external global [0 x i16], align 2		@arr_4 = external global [0 x i16], align 2
@arr_3 = external global [8 x i32], align 16		@arr_3 = external global [8 x i32], align 16
[... 12 lines not shown ...]
; CHECK-NEXT: [[I3:%.*]] = icmp eq i32 [[I2]], 0		; CHECK-NEXT: [[I3:%.*]] = icmp eq i32 [[I2]], 0
; CHECK-NEXT: [[I6:%.*]] = load i64, ptr @var_5, align 8		; CHECK-NEXT: [[I6:%.*]] = load i64, ptr @var_5, align 8
; CHECK-NEXT: [[I5:%.*]] = sext i16 [[I4]] to i64		; CHECK-NEXT: [[I5:%.*]] = sext i16 [[I4]] to i64
; CHECK-NEXT: [[I7:%.*]] = select i1 [[I3]], i64 [[I6]], i64 [[I5]]		; CHECK-NEXT: [[I7:%.*]] = select i1 [[I3]], i64 [[I6]], i64 [[I5]]
; CHECK-NEXT: [[I11:%.*]] = trunc i64 [[I7]] to i32		; CHECK-NEXT: [[I11:%.*]] = trunc i64 [[I7]] to i32
; CHECK-NEXT: br label [[BB12]]		; CHECK-NEXT: br label [[BB12]]
; CHECK: bb12:		; CHECK: bb12:
; CHECK-NEXT: [[STOREMERGE1:%.*]] = phi i32 [ [[I11]], [[BB10]] ], [ 1, [[BB9]] ]		; CHECK-NEXT: [[STOREMERGE1:%.*]] = phi i32 [ [[I11]], [[BB10]] ], [ 1, [[BB9]] ]
		; CHECK-NEXT: [[STOREMERGE:%.*]] = phi i32 [ 1, [[BB9]] ], [ [[I11]], [[BB10]] ]
		nikic (Author), inline comment: This is caused by details of how we canonicalize phi operand order. This is easy to fix, it just has annoying test fallout.
; CHECK-NEXT: store i32 [[STOREMERGE1]], ptr @arr_2, align 4		; CHECK-NEXT: store i32 [[STOREMERGE1]], ptr @arr_2, align 4
; CHECK-NEXT: store i16 [[I4]], ptr @arr_4, align 2		; CHECK-NEXT: store i16 [[I4]], ptr @arr_4, align 2
; CHECK-NEXT: [[I8:%.*]] = sext i16 [[I4]] to i32		; CHECK-NEXT: [[I8:%.*]] = sext i16 [[I4]] to i32
; CHECK-NEXT: store i32 [[I8]], ptr @arr_3, align 16		; CHECK-NEXT: store i32 [[I8]], ptr @arr_3, align 16
; CHECK-NEXT: store i32 [[STOREMERGE1]], ptr getelementptr inbounds ([0 x i32], ptr @arr_2, i64 0, i64 1), align 4		; CHECK-NEXT: store i32 [[STOREMERGE]], ptr getelementptr inbounds ([0 x i32], ptr @arr_2, i64 0, i64 1), align 4
; CHECK-NEXT: store i16 [[I4]], ptr getelementptr inbounds ([0 x i16], ptr @arr_4, i64 0, i64 1), align 2		; CHECK-NEXT: store i16 [[I4]], ptr getelementptr inbounds ([0 x i16], ptr @arr_4, i64 0, i64 1), align 2
; CHECK-NEXT: store i32 [[I8]], ptr getelementptr inbounds ([8 x i32], ptr @arr_3, i64 0, i64 1), align 4		; CHECK-NEXT: store i32 [[I8]], ptr getelementptr inbounds ([8 x i32], ptr @arr_3, i64 0, i64 1), align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
bb:		bb:
%i = load i8, ptr @var_7, align 1		%i = load i8, ptr @var_7, align 1
%i1 = icmp eq i8 %i, -1		%i1 = icmp eq i8 %i, -1
%i2 = load i32, ptr @var_1, align 4		%i2 = load i32, ptr @var_1, align 4
[226 lines not shown]
BB1:
store ptr %b, ptr %alloca		store ptr %b, ptr %alloca
br label %sink		br label %sink
sink:		sink:
%val = load i64, ptr %alloca		%val = load i64, ptr %alloca
ret i64 %val		ret i64 %val
}		}

define ptr @inttoptr_merge(i1 %cond, i64 %a, ptr %b) {		define ptr @inttoptr_merge(i1 %cond, i64 %a, ptr %b) {
; CHECK-LABEL: define ptr @inttoptr_merge		; CHECK-LABEL: @inttoptr_merge(
; CHECK-SAME: (i1 [[COND:%.*]], i64 [[A:%.*]], ptr [[B:%.*]]) {
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 [[COND]], label [[BB0:%.*]], label [[BB1:%.*]]		; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB0:%.*]], label [[BB1:%.*]]
; CHECK: BB0:		; CHECK: BB0:
; CHECK-NEXT: [[TMP0:%.*]] = inttoptr i64 [[A]] to ptr		; CHECK-NEXT: [[TMP0:%.*]] = inttoptr i64 [[A:%.*]] to ptr
; CHECK-NEXT: br label [[SINK:%.*]]		; CHECK-NEXT: br label [[SINK:%.*]]
; CHECK: BB1:		; CHECK: BB1:
; CHECK-NEXT: br label [[SINK]]		; CHECK-NEXT: br label [[SINK]]
; CHECK: sink:		; CHECK: sink:
; CHECK-NEXT: [[STOREMERGE:%.*]] = phi ptr [ [[B]], [[BB1]] ], [ [[TMP0]], [[BB0]] ]		; CHECK-NEXT: [[STOREMERGE:%.*]] = phi ptr [ [[B:%.*]], [[BB1]] ], [ [[TMP0]], [[BB0]] ]
; CHECK-NEXT: ret ptr [[STOREMERGE]]		; CHECK-NEXT: ret ptr [[STOREMERGE]]
;		;
entry:		entry:
%alloca = alloca ptr		%alloca = alloca ptr
br i1 %cond, label %BB0, label %BB1		br i1 %cond, label %BB0, label %BB1
BB0:		BB0:
store i64 %a, ptr %alloca, align 8		store i64 %a, ptr %alloca, align 8
br label %sink		br label %sink
BB1:		BB1:
store ptr %b, ptr %alloca, align 8		store ptr %b, ptr %alloca, align 8
br label %sink		br label %sink
sink:		sink:
%val = load ptr, ptr %alloca		%val = load ptr, ptr %alloca
ret ptr %val		ret ptr %val
}		}

llvm/test/Transforms/InstCombine/pr55228.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=instcombine < %s | FileCheck %s			; RUN: opt -S -passes='instcombine<no-verify-fixpoint>' < %s | FileCheck %s

				; This does not reach a fixpoint, because the global initializer is not in
				; folded form. This will not happen if preceded by a GlobalOpt run.

	target datalayout = "p:8:8"			target datalayout = "p:8:8"

	@g = external global i8			@g = external global i8
	@c = constant ptr getelementptr inbounds (i8, ptr @g, i64 1)			@c = constant ptr getelementptr inbounds (i8, ptr @g, i64 1)

	define i1 @test(ptr %p) {			define i1 @test(ptr %p) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq ptr [[P:%.*]], getelementptr inbounds (i8, ptr @g, i8 1)			; CHECK-NEXT: [[CMP:%.*]] = icmp eq ptr [[P:%.*]], getelementptr inbounds (i8, ptr @g, i64 1)
				nikic (author, inline comment, marked Done): This happens because the initializer of the global is not fully folded. This is not a problem when run in a real optimization pipeline, because GlobalOpt will handle such cases earlier.
	; CHECK-NEXT: ret i1 [[CMP]]			; CHECK-NEXT: ret i1 [[CMP]]
	;			;
	%alloca = alloca ptr			%alloca = alloca ptr
	call void @llvm.memcpy.p0.p0.i32(ptr %alloca, ptr @c, i32 0, i1 false)			call void @llvm.memcpy.p0.p0.i32(ptr %alloca, ptr @c, i32 0, i1 false)
	%load = load ptr, ptr %alloca			%load = load ptr, ptr %alloca
	%cmp = icmp eq ptr %p, %load			%cmp = icmp eq ptr %p, %load
	ret i1 %cmp			ret i1 %cmp
	}			}

	declare void @llvm.memcpy.p0.p0.i32(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i32, i1 immarg)			declare void @llvm.memcpy.p0.p0.i32(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i32, i1 immarg)
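
The `no-verify-fixpoint` option in the RUN line above opts this test out of the check that a single InstCombine iteration already reached a fixpoint. As a rough illustration only, here is a toy Python model of that idea. It is not LLVM's implementation: the instruction encoding, the function names, and the `requeue_users` switch (which simulates a missed worklist revisit) are all hypothetical.

```python
def run_iteration(insts, order, requeue_users=True):
    """Run one worklist pass over `insts`; return True if any fold happened."""
    worklist = list(reversed(order))  # pop() takes from the end
    changed = False
    while worklist:
        name = worklist.pop()
        op = insts[name]
        if op[0] != "add":
            continue
        lhs, rhs = insts[op[1]], insts[op[2]]
        # The only "fold" in this toy: add of two constants becomes a constant.
        if lhs[0] == "const" and rhs[0] == "const":
            insts[name] = ("const", lhs[1] + rhs[1])
            changed = True
            if requeue_users:
                # Revisit users of the folded instruction.
                worklist.extend(user for user, inst in insts.items()
                                if inst[0] == "add" and name in inst[1:])
    return changed


def instcombine(insts, order, verify_fixpoint=True, requeue_users=True):
    """Perform one iteration; optionally verify that it was a fixpoint."""
    run_iteration(insts, order, requeue_users)
    # Verification: a second pass must not find anything left to fold.
    if verify_fixpoint and run_iteration(insts, order, requeue_users):
        raise RuntimeError("did not reach a fixpoint after 1 iteration")
    return insts


def make_example():
    # d uses c, and the worklist order ["d", "c"] visits d before c is folded.
    return {
        "a": ("const", 1),
        "b": ("const", 2),
        "c": ("add", "a", "b"),
        "d": ("add", "c", "a"),
    }
```

With `requeue_users=True`, `instcombine(make_example(), ["d", "c"])` folds both `c` and `d` in one iteration and the verification pass finds nothing to do. With `requeue_users=False`, the first iteration leaves `d` unfolded, so verification raises unless `verify_fixpoint=False` is passed, which is the role `no-verify-fixpoint` plays for the tests in this diff.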

llvm/test/Transforms/InstCombine/shift.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py

; RUN: opt < %s -passes=instcombine -S | FileCheck %s

; RUN: opt < %s -passes='instcombine<no-verify-fixpoint>' -S | FileCheck %s

; The fuzzer-generated @ashr_out_of_range test case does not reach a fixpoint,

; because a logical and it not relaxed to a bitwise and in one iteration.

aeubanks (inline comment, marked Not Done), suggested edit:

- ; because a logical and it not relaxed to a bitwise and in one iteration.

+ ; because a logical and is not relaxed to a bitwise and in one iteration.

declare void @use(i64)

declare void @use_i32(i32)

declare i32 @llvm.cttz.i32(i32, i1 immarg)

declare <2 x i8> @llvm.cttz.v2i8(<2 x i8>, i1 immarg)

define <4 x i32> @lshr_non_splat_vector(<4 x i32> %A) {

[1,703 lines not shown]

; CHECK-LABEL: @ashr_out_of_range(

; CHECK-NEXT: [[L:%.*]] = load i177, ptr [[A:%.*]], align 4

; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i177 [[L]], -1

; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i64 -1, i64 -2

; CHECK-NEXT: [[G11:%.*]] = getelementptr i177, ptr [[A]], i64 [[TMP2]]

; CHECK-NEXT: [[L7:%.*]] = load i177, ptr [[G11]], align 4

; CHECK-NEXT: [[L7_FROZEN:%.*]] = freeze i177 [[L7]]

; CHECK-NEXT: [[C171:%.*]] = icmp slt i177 [[L7_FROZEN]], 0

; CHECK-NEXT: [[C17:%.*]] = and i1 [[TMP1]], [[C171]]

; CHECK-NEXT: [[C17:%.*]] = select i1 [[TMP1]], i1 [[C171]], i1 false

nikic (author, inline comment, marked Done): I didn't bother looking into this, because it's a fuzzer test case.

; CHECK-NEXT: [[TMP3:%.*]] = sext i1 [[C17]] to i64

; CHECK-NEXT: [[G62:%.*]] = getelementptr i177, ptr [[G11]], i64 [[TMP3]]

; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i177 [[L7_FROZEN]], -1

; CHECK-NEXT: [[B28:%.*]] = select i1 [[TMP4]], i177 0, i177 [[L7_FROZEN]]

; CHECK-NEXT: store i177 [[B28]], ptr [[G62]], align 4

; CHECK-NEXT: ret void

;

%L = load i177, ptr %A

[395 lines not shown]

llvm/test/Transforms/PGOProfile/chr.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes='require<profile-summary>,function(chr,instcombine,simplifycfg)' -S | FileCheck %s			; RUN: opt < %s -passes='require<profile-summary>,function(chr,instcombine<no-verify-fixpoint>,simplifycfg)' -S | FileCheck %s

				; FIXME: This does not currently reach a fix point, because we don't make use
				; of a freeze that is pushed up the instruction chain later.

	declare void @foo()			declare void @foo()
	declare void @bar()			declare void @bar()

	; Simple case.			; Simple case.
	; Roughly,			; Roughly,
	; t0 = *i			; t0 = *i
	; if ((t0 & 1) != 0) // Likely true			; if ((t0 & 1) != 0) // Likely true
	[1,916 lines not shown]
	; foo();			; foo();
	; }			; }
	; return 45;			; return 45;
	define i32 @test_chr_21(i64 %i, i64 %k, i64 %j) !prof !14 {			define i32 @test_chr_21(i64 %i, i64 %k, i64 %j) !prof !14 {
	; CHECK-LABEL: @test_chr_21(			; CHECK-LABEL: @test_chr_21(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[J_FR:%.*]] = freeze i64 [[J:%.*]]			; CHECK-NEXT: [[J_FR:%.*]] = freeze i64 [[J:%.*]]
	; CHECK-NEXT: [[I_FR:%.*]] = freeze i64 [[I:%.*]]			; CHECK-NEXT: [[I_FR:%.*]] = freeze i64 [[I:%.*]]
	; CHECK-NEXT: [[K_FR:%.*]] = freeze i64 [[K:%.*]]			; CHECK-NEXT: [[CMP0:%.*]] = icmp ne i64 [[J_FR]], [[K:%.*]]
	; CHECK-NEXT: [[CMP0:%.*]] = icmp ne i64 [[J_FR]], [[K_FR]]			; CHECK-NEXT: [[TMP0:%.*]] = freeze i1 [[CMP0]]
				nikic (author, inline comment, marked Done): At the time we process this freeze, j.fr hasn't been introduced yet, so we would have to introduce two freeze instructions. We could fix this by allowing the creation of more than one freeze when pushing upward. Especially for icmps that is probably beneficial.
	; CHECK-NEXT: [[CMP3:%.*]] = icmp ne i64 [[I_FR]], [[J_FR]]			; CHECK-NEXT: [[CMP3:%.*]] = icmp ne i64 [[I_FR]], [[J_FR]]
	; CHECK-NEXT: [[CMP_I:%.*]] = icmp ne i64 [[I_FR]], 86			; CHECK-NEXT: [[CMP_I:%.*]] = icmp ne i64 [[I_FR]], 86
	; CHECK-NEXT: [[TMP0:%.*]] = and i1 [[CMP0]], [[CMP3]]			; CHECK-NEXT: [[TMP1:%.*]] = and i1 [[TMP0]], [[CMP3]]
	; CHECK-NEXT: [[TMP1:%.*]] = and i1 [[TMP0]], [[CMP_I]]			; CHECK-NEXT: [[TMP2:%.*]] = and i1 [[TMP1]], [[CMP_I]]
	; CHECK-NEXT: br i1 [[TMP1]], label [[BB1:%.*]], label [[ENTRY_SPLIT_NONCHR:%.*]], !prof [[PROF15]]			; CHECK-NEXT: br i1 [[TMP2]], label [[BB1:%.*]], label [[ENTRY_SPLIT_NONCHR:%.*]], !prof [[PROF15]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[CMP2:%.*]] = icmp ne i64 [[I_FR]], 2			; CHECK-NEXT: [[CMP2:%.*]] = icmp ne i64 [[I_FR]], 2
	; CHECK-NEXT: switch i64 [[I_FR]], label [[BB2:%.*]] [			; CHECK-NEXT: switch i64 [[I_FR]], label [[BB2:%.*]] [
	; CHECK-NEXT: i64 2, label [[BB3_NONCHR2:%.*]]			; CHECK-NEXT: i64 2, label [[BB3_NONCHR2:%.*]]
	; CHECK-NEXT: i64 86, label [[BB2_NONCHR1:%.*]]			; CHECK-NEXT: i64 86, label [[BB2_NONCHR1:%.*]]
	; CHECK-NEXT: ], !prof [[PROF19:![0-9]+]]			; CHECK-NEXT: ], !prof [[PROF19:![0-9]+]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: call void @foo()			; CHECK-NEXT: call void @foo()
	; CHECK-NEXT: call void @foo()			; CHECK-NEXT: call void @foo()
	; CHECK-NEXT: br label [[BB7:%.*]]			; CHECK-NEXT: br label [[BB7:%.*]]
	; CHECK: bb2.nonchr1:			; CHECK: bb2.nonchr1:
	; CHECK-NEXT: call void @foo()			; CHECK-NEXT: call void @foo()
	; CHECK-NEXT: br label [[BB3_NONCHR2]]			; CHECK-NEXT: br label [[BB3_NONCHR2]]
	; CHECK: bb3.nonchr2:			; CHECK: bb3.nonchr2:
	; CHECK-NEXT: br i1 [[CMP_I]], label [[BB4_NONCHR3:%.*]], label [[BB7]], !prof [[PROF18]]			; CHECK-NEXT: br i1 [[CMP_I]], label [[BB4_NONCHR3:%.*]], label [[BB7]], !prof [[PROF18]]
	; CHECK: bb4.nonchr3:			; CHECK: bb4.nonchr3:
	; CHECK-NEXT: call void @foo()			; CHECK-NEXT: call void @foo()
	; CHECK-NEXT: br label [[BB7]]			; CHECK-NEXT: br label [[BB7]]
	; CHECK: bb7:			; CHECK: bb7:
	; CHECK-NEXT: call void @foo()			; CHECK-NEXT: call void @foo()
	; CHECK-NEXT: call void @foo()			; CHECK-NEXT: call void @foo()
	; CHECK-NEXT: br label [[BB10:%.*]]			; CHECK-NEXT: br label [[BB10:%.*]]
	; CHECK: entry.split.nonchr:			; CHECK: entry.split.nonchr:
	; CHECK-NEXT: br i1 [[CMP0]], label [[BB1_NONCHR:%.*]], label [[BB10]], !prof [[PROF18]]			; CHECK-NEXT: br i1 [[TMP0]], label [[BB1_NONCHR:%.*]], label [[BB10]], !prof [[PROF18]]
	; CHECK: bb1.nonchr:			; CHECK: bb1.nonchr:
	; CHECK-NEXT: [[CMP2_NONCHR:%.*]] = icmp eq i64 [[I_FR]], 2			; CHECK-NEXT: [[CMP2_NONCHR:%.*]] = icmp eq i64 [[I_FR]], 2
	; CHECK-NEXT: br i1 [[CMP2_NONCHR]], label [[BB3_NONCHR:%.*]], label [[BB2_NONCHR:%.*]], !prof [[PROF16]]			; CHECK-NEXT: br i1 [[CMP2_NONCHR]], label [[BB3_NONCHR:%.*]], label [[BB2_NONCHR:%.*]], !prof [[PROF16]]
	; CHECK: bb3.nonchr:			; CHECK: bb3.nonchr:
	; CHECK-NEXT: [[CMP_I_NONCHR:%.*]] = icmp eq i64 [[I_FR]], 86			; CHECK-NEXT: [[CMP_I_NONCHR:%.*]] = icmp eq i64 [[I_FR]], 86
	; CHECK-NEXT: br i1 [[CMP_I_NONCHR]], label [[BB6_NONCHR:%.*]], label [[BB4_NONCHR:%.*]], !prof [[PROF16]]			; CHECK-NEXT: br i1 [[CMP_I_NONCHR]], label [[BB6_NONCHR:%.*]], label [[BB4_NONCHR:%.*]], !prof [[PROF16]]
	; CHECK: bb6.nonchr:			; CHECK: bb6.nonchr:
	; CHECK-NEXT: [[CMP3_NONCHR:%.*]] = icmp eq i64 [[J_FR]], [[I_FR]]			; CHECK-NEXT: [[CMP3_NONCHR:%.*]] = icmp eq i64 [[J_FR]], [[I_FR]]
	[731 lines not shown]

llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll

	[109 lines not shown]
	; CHECK: for.cond1.for.cond.cleanup3_crit_edge.us:			; CHECK: for.cond1.for.cond.cleanup3_crit_edge.us:
	; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[CONV6]], 15			; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[CONV6]], 15
	; CHECK-NEXT: [[TMP6:%.*]] = icmp ult i32 [[I]], 210			; CHECK-NEXT: [[TMP6:%.*]] = icmp ult i32 [[I]], 210
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP6]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP6]])
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP5]]			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP5]]
	; CHECK-NEXT: br label [[FOR_BODY4_US_1:%.*]]			; CHECK-NEXT: br label [[FOR_BODY4_US_1:%.*]]
	; CHECK: for.body4.us.1:			; CHECK: for.body4.us.1:
	; CHECK-NEXT: [[K_011_US_1:%.*]] = phi i32 [ 0, [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US]] ], [ [[INC_US_1:%.*]], [[FOR_BODY4_US_1]] ]			; CHECK-NEXT: [[K_011_US_1:%.*]] = phi i32 [ 0, [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US]] ], [ [[INC_US_1:%.*]], [[FOR_BODY4_US_1]] ]
	; CHECK-NEXT: [[NARROW:%.*]] = add nuw nsw i32 [[K_011_US_1]], 15			; CHECK-NEXT: [[CONV_US_1:%.*]] = zext i32 [[K_011_US_1]] to i64
	; CHECK-NEXT: [[TMP8:%.*]] = zext i32 [[NARROW]] to i64			; CHECK-NEXT: [[TMP8:%.*]] = add nuw nsw i64 [[CONV_US_1]], 15
				nikic (author, inline comment, marked Done): This is the same backwards reasoning assume issue mentioned above.
	; CHECK-NEXT: [[TMP9:%.*]] = icmp ult i32 [[K_011_US_1]], 210			; CHECK-NEXT: [[TMP9:%.*]] = icmp ult i32 [[K_011_US_1]], 210
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP9]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP9]])
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds <225 x double>, ptr [[A]], i64 0, i64 [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds <225 x double>, ptr [[A]], i64 0, i64 [[TMP8]]
	; CHECK-NEXT: [[MATRIXEXT_US_1:%.*]] = load double, ptr [[TMP10]], align 8			; CHECK-NEXT: [[MATRIXEXT_US_1:%.*]] = load double, ptr [[TMP10]], align 8
	; CHECK-NEXT: [[MATRIXEXT8_US_1:%.*]] = load double, ptr [[TMP7]], align 8			; CHECK-NEXT: [[MATRIXEXT8_US_1:%.*]] = load double, ptr [[TMP7]], align 8
	; CHECK-NEXT: [[MUL_US_1:%.*]] = fmul double [[MATRIXEXT_US_1]], [[MATRIXEXT8_US_1]]			; CHECK-NEXT: [[MUL_US_1:%.*]] = fmul double [[MATRIXEXT_US_1]], [[MATRIXEXT8_US_1]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP8]]			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP8]]
	; CHECK-NEXT: [[MATRIXEXT11_US_1:%.*]] = load double, ptr [[TMP11]], align 8			; CHECK-NEXT: [[MATRIXEXT11_US_1:%.*]] = load double, ptr [[TMP11]], align 8
	; CHECK-NEXT: [[SUB_US_1:%.*]] = fsub double [[MATRIXEXT11_US_1]], [[MUL_US_1]]			; CHECK-NEXT: [[SUB_US_1:%.*]] = fsub double [[MATRIXEXT11_US_1]], [[MUL_US_1]]
	; CHECK-NEXT: store double [[SUB_US_1]], ptr [[TMP11]], align 8			; CHECK-NEXT: store double [[SUB_US_1]], ptr [[TMP11]], align 8
	; CHECK-NEXT: [[INC_US_1]] = add nuw nsw i32 [[K_011_US_1]], 1			; CHECK-NEXT: [[INC_US_1]] = add nuw nsw i32 [[K_011_US_1]], 1
	; CHECK-NEXT: [[CMP2_US_1:%.*]] = icmp ult i32 [[INC_US_1]], [[I]]			; CHECK-NEXT: [[CMP2_US_1:%.*]] = icmp ult i32 [[INC_US_1]], [[I]]
	; CHECK-NEXT: br i1 [[CMP2_US_1]], label [[FOR_BODY4_US_1]], label [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_1:%.*]]			; CHECK-NEXT: br i1 [[CMP2_US_1]], label [[FOR_BODY4_US_1]], label [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_1:%.*]]
	; CHECK: for.cond1.for.cond.cleanup3_crit_edge.us.1:			; CHECK: for.cond1.for.cond.cleanup3_crit_edge.us.1:
	; CHECK-NEXT: [[TMP12:%.*]] = add nuw nsw i64 [[CONV6]], 30			; CHECK-NEXT: [[TMP12:%.*]] = add nuw nsw i64 [[CONV6]], 30
	; CHECK-NEXT: [[TMP13:%.*]] = icmp ult i32 [[I]], 195			; CHECK-NEXT: [[TMP13:%.*]] = icmp ult i32 [[I]], 195
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP13]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP13]])
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP12]]			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP12]]
	; CHECK-NEXT: br label [[FOR_BODY4_US_2:%.*]]			; CHECK-NEXT: br label [[FOR_BODY4_US_2:%.*]]
	; CHECK: for.body4.us.2:			; CHECK: for.body4.us.2:
	; CHECK-NEXT: [[K_011_US_2:%.*]] = phi i32 [ 0, [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_1]] ], [ [[INC_US_2:%.*]], [[FOR_BODY4_US_2]] ]			; CHECK-NEXT: [[K_011_US_2:%.*]] = phi i32 [ 0, [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_1]] ], [ [[INC_US_2:%.*]], [[FOR_BODY4_US_2]] ]
	; CHECK-NEXT: [[NARROW14:%.*]] = add nuw nsw i32 [[K_011_US_2]], 30			; CHECK-NEXT: [[CONV_US_2:%.*]] = zext i32 [[K_011_US_2]] to i64
	; CHECK-NEXT: [[TMP15:%.*]] = zext i32 [[NARROW14]] to i64			; CHECK-NEXT: [[TMP15:%.*]] = add nuw nsw i64 [[CONV_US_2]], 30
	; CHECK-NEXT: [[TMP16:%.*]] = icmp ult i32 [[K_011_US_2]], 195			; CHECK-NEXT: [[TMP16:%.*]] = icmp ult i32 [[K_011_US_2]], 195
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP16]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP16]])
	; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds <225 x double>, ptr [[A]], i64 0, i64 [[TMP15]]			; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds <225 x double>, ptr [[A]], i64 0, i64 [[TMP15]]
	; CHECK-NEXT: [[MATRIXEXT_US_2:%.*]] = load double, ptr [[TMP17]], align 8			; CHECK-NEXT: [[MATRIXEXT_US_2:%.*]] = load double, ptr [[TMP17]], align 8
	; CHECK-NEXT: [[MATRIXEXT8_US_2:%.*]] = load double, ptr [[TMP14]], align 8			; CHECK-NEXT: [[MATRIXEXT8_US_2:%.*]] = load double, ptr [[TMP14]], align 8
	; CHECK-NEXT: [[MUL_US_2:%.*]] = fmul double [[MATRIXEXT_US_2]], [[MATRIXEXT8_US_2]]			; CHECK-NEXT: [[MUL_US_2:%.*]] = fmul double [[MATRIXEXT_US_2]], [[MATRIXEXT8_US_2]]
	; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP15]]			; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP15]]
	; CHECK-NEXT: [[MATRIXEXT11_US_2:%.*]] = load double, ptr [[TMP18]], align 8			; CHECK-NEXT: [[MATRIXEXT11_US_2:%.*]] = load double, ptr [[TMP18]], align 8
	; CHECK-NEXT: [[SUB_US_2:%.*]] = fsub double [[MATRIXEXT11_US_2]], [[MUL_US_2]]			; CHECK-NEXT: [[SUB_US_2:%.*]] = fsub double [[MATRIXEXT11_US_2]], [[MUL_US_2]]
	; CHECK-NEXT: store double [[SUB_US_2]], ptr [[TMP18]], align 8			; CHECK-NEXT: store double [[SUB_US_2]], ptr [[TMP18]], align 8
	; CHECK-NEXT: [[INC_US_2]] = add nuw nsw i32 [[K_011_US_2]], 1			; CHECK-NEXT: [[INC_US_2]] = add nuw nsw i32 [[K_011_US_2]], 1
	; CHECK-NEXT: [[CMP2_US_2:%.*]] = icmp ult i32 [[INC_US_2]], [[I]]			; CHECK-NEXT: [[CMP2_US_2:%.*]] = icmp ult i32 [[INC_US_2]], [[I]]
	; CHECK-NEXT: br i1 [[CMP2_US_2]], label [[FOR_BODY4_US_2]], label [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_2:%.*]]			; CHECK-NEXT: br i1 [[CMP2_US_2]], label [[FOR_BODY4_US_2]], label [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_2:%.*]]
	; CHECK: for.cond1.for.cond.cleanup3_crit_edge.us.2:			; CHECK: for.cond1.for.cond.cleanup3_crit_edge.us.2:
	; CHECK-NEXT: [[TMP19:%.*]] = add nuw nsw i64 [[CONV6]], 45			; CHECK-NEXT: [[TMP19:%.*]] = add nuw nsw i64 [[CONV6]], 45
	; CHECK-NEXT: [[TMP20:%.*]] = icmp ult i32 [[I]], 180			; CHECK-NEXT: [[TMP20:%.*]] = icmp ult i32 [[I]], 180
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP20]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP20]])
	; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP19]]			; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP19]]
	; CHECK-NEXT: br label [[FOR_BODY4_US_3:%.*]]			; CHECK-NEXT: br label [[FOR_BODY4_US_3:%.*]]
	; CHECK: for.body4.us.3:			; CHECK: for.body4.us.3:
	; CHECK-NEXT: [[K_011_US_3:%.*]] = phi i32 [ 0, [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_2]] ], [ [[INC_US_3:%.*]], [[FOR_BODY4_US_3]] ]			; CHECK-NEXT: [[K_011_US_3:%.*]] = phi i32 [ 0, [[FOR_COND1_FOR_COND_CLEANUP3_CRIT_EDGE_US_2]] ], [ [[INC_US_3:%.*]], [[FOR_BODY4_US_3]] ]
	; CHECK-NEXT: [[NARROW15:%.*]] = add nuw nsw i32 [[K_011_US_3]], 45			; CHECK-NEXT: [[CONV_US_3:%.*]] = zext i32 [[K_011_US_3]] to i64
	; CHECK-NEXT: [[TMP22:%.*]] = zext i32 [[NARROW15]] to i64			; CHECK-NEXT: [[TMP22:%.*]] = add nuw nsw i64 [[CONV_US_3]], 45
	; CHECK-NEXT: [[TMP23:%.*]] = icmp ult i32 [[K_011_US_3]], 180			; CHECK-NEXT: [[TMP23:%.*]] = icmp ult i32 [[K_011_US_3]], 180
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP23]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP23]])
	; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds <225 x double>, ptr [[A]], i64 0, i64 [[TMP22]]			; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds <225 x double>, ptr [[A]], i64 0, i64 [[TMP22]]
	; CHECK-NEXT: [[MATRIXEXT_US_3:%.*]] = load double, ptr [[TMP24]], align 8			; CHECK-NEXT: [[MATRIXEXT_US_3:%.*]] = load double, ptr [[TMP24]], align 8
	; CHECK-NEXT: [[MATRIXEXT8_US_3:%.*]] = load double, ptr [[TMP21]], align 8			; CHECK-NEXT: [[MATRIXEXT8_US_3:%.*]] = load double, ptr [[TMP21]], align 8
	; CHECK-NEXT: [[MUL_US_3:%.*]] = fmul double [[MATRIXEXT_US_3]], [[MATRIXEXT8_US_3]]			; CHECK-NEXT: [[MUL_US_3:%.*]] = fmul double [[MATRIXEXT_US_3]], [[MATRIXEXT8_US_3]]
	; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP22]]			; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds <225 x double>, ptr [[B]], i64 0, i64 [[TMP22]]
	; CHECK-NEXT: [[MATRIXEXT11_US_3:%.*]] = load double, ptr [[TMP25]], align 8			; CHECK-NEXT: [[MATRIXEXT11_US_3:%.*]] = load double, ptr [[TMP25]], align 8
	[147 lines not shown]


Diff 544774
