This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] new IR transform pass for partial vector ops
ClosedPublic

Authored by spatel on Jan 27 2020, 7:49 AM.

Details

Reviewers

nemanjai
craig.topper
hfinkel
efriedma
lebedev.ri
RKSimon

Commits

rGa17f03bd9393: [VectorCombine] new IR transform pass for partial vector ops

Summary

We have several bug reports that could be characterized as "reducing scalarization", and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector pass that runs before/after other vectorization passes.

There are 4 alternative options that I can think of to deal with this kind of problem (and we've seen various attempts at all of these), but they all have flaws:

1. InstCombine - can't happen without TTI, but we don't want target-specific folds there.
2. SDAG - too late to assist other vectorization passes; TLI is not equipped for this kind of cost query; limited to a single basic block.
3. CGP - too late to assist other vectorization passes; would need to re-implement basic cleanups like CSE/instcombine.
4. SLP - doesn't fit with existing transforms; limited to a single basic block.

This initial patch/transform is based on existing code in AggressiveInstCombine: we walk backwards through the function looking for a pattern match. But we diverge from that cost-independent IR canonicalization pass by using TTI to decide if the vector alternative is profitable.
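
The cost-gated decision described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCosts`, `preferVectorCompare`, and every cost value are hypothetical names and numbers made up for illustration, and the accounting shown is one plausible scheme, not necessarily the one in the patch. In the real pass the costs would come from TargetTransformInfo queries.

```cpp
#include <cassert>

// Hypothetical per-target costs; a real pass would ask TTI for these.
struct ToyCosts {
  int Extract;   // cost of one extractelement
  int ScalarCmp; // cost of a scalar compare
  int VectorCmp; // cost of a vector compare
};

// Returns true if "compare the vectors, then extract one lane" costs no more
// than "extract two lanes, then compare the scalars".
// ExtraUses is the number of original extracts (0..2) that have other users
// and therefore would not be removed by the transform.
bool preferVectorCompare(const ToyCosts &C, int ExtraUses) {
  assert(ExtraUses >= 0 && ExtraUses <= 2 && "at most two extracts");
  int ScalarCost = 2 * C.Extract + C.ScalarCmp;
  int VectorCost = C.VectorCmp + C.Extract + ExtraUses * C.Extract;
  return VectorCost <= ScalarCost;
}
```

With unit costs for everything and no extra uses, the vector form wins (2 vs. 3). If both extracts have other users, or the vector compare is expensive on the target, the scalar form is kept.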

We probably have at least 10 similar bug reports/patterns (binops, constants, inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements. It's possible that we could iterate on a worklist to fix-point like InstCombine does, but it's safer to start with the most basic case and evolve from there, so I didn't try to do anything fancy here.
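
To make the single-walk versus fix-point distinction concrete, here is a minimal sketch. It is an assumption-laden toy: `ToyInst`, `runOnce`, and `runToFixpoint` are hypothetical names, the "instructions" are plain integers, and the driver simply repeats whole walks until nothing changes, which is simpler than a real worklist such as the one InstCombine uses.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy "instruction": just an integer payload.
using ToyInst = int;

// One backwards walk over the block; returns true if any fold fired.
bool runOnce(std::vector<ToyInst> &Block,
             const std::function<bool(ToyInst &)> &TryFold) {
  assert(TryFold && "fold callback must be set");
  bool Changed = false;
  for (auto It = Block.rbegin(); It != Block.rend(); ++It)
    Changed |= TryFold(*It);
  return Changed;
}

// Repeat whole walks until no fold fires, with an iteration cap as a safety
// net. Returns the number of walks that changed something.
unsigned runToFixpoint(std::vector<ToyInst> &Block,
                       const std::function<bool(ToyInst &)> &TryFold,
                       unsigned MaxIters = 100) {
  unsigned Iters = 0;
  while (Iters < MaxIters && runOnce(Block, TryFold))
    ++Iters;
  return Iters;
}
```

A single `runOnce` call corresponds to the simple approach taken in this patch; `runToFixpoint` shows the more aggressive alternative that is deferred here.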

Diff Detail

Event Timeline

spatel created this revision. · Jan 27 2020, 7:49 AM

Herald added a project: Restricted Project. · Jan 27 2020, 7:50 AM

Herald added subscribers: dexonsmith, steven_wu, hiraditya and 3 others.

Thanks for working on this @spatel - I'm interested to see how far we can take cost driven combines (without conflicting with canonicalizations in other passes).

You mention that you want this to begin as a relatively safe set of combines, but it's probably necessary to get some idea of where you see this going and the range of optimizations it could handle (for instance - are memory ops being considered......)?

In D73480#1844833, @RKSimon wrote:

Thanks for working on this @spatel - I'm interested to see how far we can take cost driven combines (without conflicting with canonicalizations in other passes).

You mention that you want this to begin as a relatively safe set of combines, but it's probably necessary to get some idea of where you see this going and the range of optimizations it could handle (for instance - are memory ops being considered......)?

Yes, I think we could handle some load/store patterns. Note that there is already a LoadStoreVectorizer pass for IR, but it's currently only used for GPU targets as a pre-codegen cleanup. I haven't looked in there to see if it can be extended to solve the problems we've seen.

Here are some examples of potential vector/scalar enhancements that we've been collectively chasing (for many years now...):

Vector load combining: http://llvm.org/PR16739
Vector load combining: http://llvm.org/PR21780
Vector load combining: http://llvm.org/PR39473
Vector store combining: http://llvm.org/PR41892
Insert/extract -> shuffle combining: http://llvm.org/PR34724
Compare insert/extract: http://llvm.org/PR39665
Compare insert/extract: http://llvm.org/PR43745 (Failed to get SLP to do this so far.)
Binop insert/extract: http://llvm.org/PR42633 (I suggested a pass like this proposal in this bug report.)
This is going in the opposite direction, but here's a questionable (because it might interfere with GVN) scalarization proposal for InstCombine: D71828
Similarly, I proposed a scalarization for binops in InstCombine that's hard to justify without TTI: D50992
Shuffle fold that was too dangerous for InstCombine, but should be fine with TTI: D31509

Anybody else have any comments?

spatel mentioned this in D73703: [InstCombine] reassociate splatted vector ops. · Jan 30 2020, 6:31 AM

I think the general idea of a pass that uses vector cost models to do peephole optimizations on vectors makes sense. A couple general concerns:

If we're running it in the middle of the pipeline, we need to be careful the transforms don't conflict with instcombine.
The vectorizer's cost model is optimized for throughput; I'm a little worried we'll run into issues with latency if we depend too much on the cost model for scalar code.

In D73480#1850467, @efriedma wrote:

I think the general idea of a pass that uses vector cost models to do peephole optimizations on vectors makes sense. A couple general concerns:

If we're running it in the middle of the pipeline, we need to be careful the transforms don't conflict with instcombine.

AFAIK, the closest we've come to overlapping/conflicting with vector<->scalar transforms is D50992. I'd be happy to abandon that in favor of a cost-aware transform here. Similarly, we could move some questionable shuffle transforms and/or vector demanded elements analysis out of InstCombine to live here. That would likely have a side benefit of making the optimizer slightly faster overall since we don't need to run this pass as often as InstCombine.

The vectorizer's cost model is optimized for throughput; I'm a little worried we'll run into issues with latency if we depend too much on the cost model for scalar code.

We do have: TargetTransformInfo::getInstructionLatency(const Instruction *I)
...but I don't see it used/overridden anywhere in tree currently.

This looks good to me overall.
I'm not sure about pass ordering, but the pass itself seems sound to me.

lebedev.ri added inline comments. · Jan 31 2020, 2:35 PM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
56	Word "alternative" usually means "something other than what currently is". Using it here seems misleading to me - currently we do have scalar comparison :)

spatel mentioned this in rGe78fb556c552: [InstCombine] reassociate splatted vector ops. · Feb 3 2020, 6:26 AM

spatel marked an inline comment as done. · Feb 4 2020, 12:39 PM

spatel added inline comments.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
56	Good point - the vector code is the alternative that we will compare to.

Patch updated:
Improved code comment.

I think this is ready to land - does anyone have any more comments?

rscottmanley added a subscriber: rscottmanley. · Feb 6 2020, 7:33 PM

LGTM - thanks for working on this!

This revision is now accepted and ready to land. · Feb 7 2020, 10:08 AM

Closed by commit rGa17f03bd9393: [VectorCombine] new IR transform pass for partial vector ops (authored by spatel). · Feb 9 2020, 7:42 AM

This revision was automatically updated to reflect the committed changes.

craig.topper added inline comments. · Feb 10 2020, 1:10 PM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
35	Is this counter used?

spatel marked an inline comment as done. · Feb 10 2020, 1:53 PM

spatel added inline comments.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
35	Oops - no. Copy/pasted from existing code. Looks like other passes do something like: if (!DebugCounter::shouldExecute(VecCombineCounter)) continue; I've never used that myself, so I could either add that code or remove the counter. Let me know if there's a preference. Not sure how we test it?

craig.topper added inline comments. · Feb 10 2020, 2:12 PM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
35	Probably fine to drop it. I added the one to instcombine and I've only used it a couple times.

spatel marked an inline comment as done. · Feb 11 2020, 6:58 AM

spatel added inline comments.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
35	Removed: rGa2a0f9a43a71
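
For readers unfamiliar with the gating idiom discussed in this thread, the following is a toy sketch of how a "should this transform run?" counter can work. It is not LLVM's `DebugCounter` class: `ToyDebugCounter`, `runGated`, and the skip/count semantics shown here are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a debug counter (hypothetical; not LLVM's DebugCounter).
class ToyDebugCounter {
  int64_t Seen = 0; // how many times shouldExecute() has been called
  int64_t Skip;     // number of initial calls to reject
  int64_t Count;    // number of calls to accept after skipping; -1 = no limit

public:
  ToyDebugCounter(int64_t Skip, int64_t Count) : Skip(Skip), Count(Count) {
    assert(Skip >= 0 && "skip must be non-negative");
  }

  // Accepts calls whose index falls in [Skip, Skip + Count).
  bool shouldExecute() {
    int64_t Idx = Seen++;
    if (Idx < Skip)
      return false;
    return Count < 0 || Idx < Skip + Count;
  }
};

// Visit each candidate and apply the (imaginary) transform unless the gate
// rejects it. Returns how many candidates were transformed.
int runGated(int NumCandidates, ToyDebugCounter &Gate) {
  int Applied = 0;
  for (int I = 0; I < NumCandidates; ++I) {
    if (!Gate.shouldExecute())
      continue; // same shape as the check quoted in the thread above
    ++Applied;
  }
  return Applied;
}
```

Such a gate is mainly useful for bisecting which individual transform introduces a miscompile, which may explain why it is rarely needed in a small pass.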

liuz added a subscriber: liuz. · Feb 19 2020, 6:20 PM

lebedev.ri mentioned this in D75908: [SCEV] isHighCostExpansionHelper(): use correct TTI hooks. · Mar 10 2020, 5:04 AM

lebedev.ri mentioned this in rG8737dc2d32e6: [SCEV] isHighCostExpansionHelper(): use correct TTI hooks. · Mar 12 2020, 2:20 AM

xbolva00 added a subscriber: xbolva00. · Mar 29 2020, 8:15 AM

xbolva00 added inline comments.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
97	Skip debug insn? DbgInfoIntrinsic?

xbolva00 added inline comments. · Mar 29 2020, 8:19 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
97	And skip “cold” blocks?

spatel marked 3 inline comments as done. · Mar 29 2020, 11:26 AM

spatel added inline comments.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
97	I don't think there's enough going on in this pass yet to make this measurable, but: rGfc3cc8a4b074
97	Is there a pass that we can view as a template for this?

xbolva00 added inline comments. · Mar 29 2020, 5:38 PM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
97	https://llvm.org/doxygen/classllvm_1_1ProfileSummaryInfo.html isColdBlock/isFunctionEntryCold are probably the helpers we need.

spatel marked an inline comment as done. · Mar 30 2020, 9:25 AM

spatel added inline comments.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
97	Thinking about this one a bit more, it's not clear to me that we want to predicate on hot/cold. This pass can reduce code size, so the transforms are still potentially beneficial. Also, I don't see that kind of restriction on any other combiner passes (or other IR passes in general?), so maybe raise this on llvm-dev so that we have a consistent implementation across different passes?

dongAxis1944 added a subscriber: dongAxis1944. · Jan 19 2021, 1:56 AM

Herald added subscribers: wenlei, nikic. · Jan 19 2021, 1:56 AM

Revision Contents

Path                                                              Size
llvm/include/llvm/InitializePasses.h                              1 line
llvm/include/llvm/LinkAllPasses.h                                 1 line
llvm/include/llvm/Transforms/Vectorize.h                          6 lines
llvm/include/llvm/Transforms/Vectorize/VectorCombine.h            30 lines
llvm/lib/Passes/PassBuilder.cpp                                   6 lines
llvm/lib/Passes/PassRegistry.def                                  1 line
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp                    8 lines
llvm/lib/Transforms/Vectorize/CMakeLists.txt                      1 line
llvm/lib/Transforms/Vectorize/VectorCombine.cpp                   160 lines
llvm/lib/Transforms/Vectorize/Vectorize.cpp                       4 lines
llvm/test/Other/new-pm-defaults.ll                                4 lines
llvm/test/Other/new-pm-thinlto-defaults.ll                        4 lines
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll           4 lines
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll     4 lines
llvm/test/Other/opt-O2-pipeline.ll                                4 lines
llvm/test/Other/opt-O3-pipeline.ll                                4 lines
llvm/test/Other/opt-Os-pipeline.ll                                4 lines
llvm/test/Transforms/VectorCombine/X86/extract-cmp.ll             87 lines
llvm/test/Transforms/VectorCombine/X86/lit.local.cfg              2 lines

Diff 240569

llvm/include/llvm/InitializePasses.h

[...]
 void initializeThreadSanitizerLegacyPassPass(PassRegistry&);
 void initializeTwoAddressInstructionPassPass(PassRegistry&);
 void initializeTypeBasedAAWrapperPassPass(PassRegistry&);
 void initializeTypePromotionPass(PassRegistry&);
 void initializeUnifyFunctionExitNodesPass(PassRegistry&);
 void initializeUnpackMachineBundlesPass(PassRegistry&);
 void initializeUnreachableBlockElimLegacyPassPass(PassRegistry&);
 void initializeUnreachableMachineBlockElimPass(PassRegistry&);
+void initializeVectorCombineLegacyPassPass(PassRegistry&);
 void initializeVerifierLegacyPassPass(PassRegistry&);
 void initializeVirtRegMapPass(PassRegistry&);
 void initializeVirtRegRewriterPass(PassRegistry&);
 void initializeWarnMissedTransformationsLegacyPass(PassRegistry &);
 void initializeWasmEHPreparePass(PassRegistry&);
 void initializeWholeProgramDevirtPass(PassRegistry&);
 void initializeWinEHPreparePass(PassRegistry&);
 void initializeWriteBitcodePassPass(PassRegistry&);
 void initializeWriteThinLTOBitcodePass(PassRegistry&);
 void initializeXRayInstrumentationPass(PassRegistry&);

 } // end namespace llvm

 #endif // LLVM_INITIALIZEPASSES_H

llvm/include/llvm/LinkAllPasses.h

[...] ForcePassLinking() {
 (void) llvm::createLintPass();
 (void) llvm::createSinkingPass();
 (void) llvm::createLowerAtomicPass();
 (void) llvm::createCorrelatedValuePropagationPass();
 (void) llvm::createMemDepPrinter();
 (void) llvm::createLoopVectorizePass();
 (void) llvm::createSLPVectorizerPass();
 (void) llvm::createLoadStoreVectorizerPass();
+(void) llvm::createVectorCombinePass();
 (void) llvm::createPartiallyInlineLibCallsPass();
 (void) llvm::createScalarizerPass();
 (void) llvm::createSeparateConstOffsetFromGEPPass();
 (void) llvm::createSpeculativeExecutionPass();
 (void) llvm::createSpeculativeExecutionIfHasBranchDivergencePass();
 (void) llvm::createRewriteSymbolsPass();
 (void) llvm::createStraightLineStrengthReducePass();
 (void) llvm::createMemDerefPrinter();
[...]

llvm/include/llvm/Transforms/Vectorize.h

[...]

 //===----------------------------------------------------------------------===//
 //
 // LoadStoreVectorizer - Create vector loads and stores, but leave scalar
 // operations.
 //
 Pass *createLoadStoreVectorizerPass();

+//===----------------------------------------------------------------------===//
+//
+// Optimize partial vector operations using target cost models.
+//
+Pass *createVectorCombinePass();
+
 } // End llvm namespace

 #endif

llvm/include/llvm/Transforms/Vectorize/VectorCombine.h

This file was added.

				//===-------- VectorCombine.h - Optimize partial vector operations --------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass optimizes scalar/vector interactions using target cost models. The
				// transforms implemented here may not fit in traditional loop-based or SLP
				// vectorization passes.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_VECTOR_VECTORCOMBINE_H
				#define LLVM_TRANSFORMS_VECTOR_VECTORCOMBINE_H

				#include "llvm/IR/PassManager.h"

				namespace llvm {

				/// Optimize scalar/vector interactions in IR using target cost models.
				struct VectorCombinePass : public PassInfoMixin<VectorCombinePass> {
				public:
				PreservedAnalyses run(Function &F, FunctionAnalysisManager &);
				};

				}
				#endif // LLVM_TRANSFORMS_VECTOR_VECTORCOMBINE_H

llvm/lib/Passes/PassBuilder.cpp

[...]
 #include "llvm/Transforms/Utils/LoopSimplify.h"
 #include "llvm/Transforms/Utils/LowerInvoke.h"
 #include "llvm/Transforms/Utils/Mem2Reg.h"
 #include "llvm/Transforms/Utils/NameAnonGlobals.h"
 #include "llvm/Transforms/Utils/SymbolRewriter.h"
 #include "llvm/Transforms/Vectorize/LoadStoreVectorizer.h"
 #include "llvm/Transforms/Vectorize/LoopVectorize.h"
 #include "llvm/Transforms/Vectorize/SLPVectorizer.h"
+#include "llvm/Transforms/Vectorize/VectorCombine.h"

 using namespace llvm;

 static cl::opt<unsigned> MaxDevirtIterations("pm-max-devirt-iterations",
 cl::ReallyHidden, cl::init(4));
 static cl::opt<bool>
 RunPartialInlining("enable-npm-partial-inlining", cl::init(false),
 cl::Hidden, cl::ZeroOrMore,
[...] ModulePassManager PassBuilder::buildModuleOptimizationPipeline(
 OptimizePM.addPass(LoopVectorizePass(
 LoopVectorizeOptions(!PTO.LoopInterleaving, !PTO.LoopVectorization)));

 // Eliminate loads by forwarding stores from the previous iteration to loads
 // of the current iteration.
 OptimizePM.addPass(LoopLoadEliminationPass());

 // Cleanup after the loop optimization passes.
+OptimizePM.addPass(VectorCombinePass());
 OptimizePM.addPass(InstCombinePass());

 // Now that we've formed fast to execute loop structures, we do further
 // optimizations. These are run afterward as they might block doing complex
 // analyses and transforms such as what are needed for loop vectorization.

 // Cleanup after loop vectorization, etc. Simplification passes like CVP and
 // GVN, loop transforms, and others have already run, so it's now better to
 // convert to more optimized IR using more aggressive simplify CFG options.
 // The extra sinking transform can create larger basic blocks, so do this
 // before SLP vectorization.
 OptimizePM.addPass(SimplifyCFGPass(SimplifyCFGOptions().
 forwardSwitchCondToPhi(true).
 convertSwitchToLookupTable(true).
 needCanonicalLoops(false).
 sinkCommonInsts(true)));

 // Optimize parallel scalar instruction chains into SIMD instructions.
-if (PTO.SLPVectorization)
+if (PTO.SLPVectorization) {
 OptimizePM.addPass(SLPVectorizerPass());
+OptimizePM.addPass(VectorCombinePass());
+}

 OptimizePM.addPass(InstCombinePass());

 // Unroll small loops to hide loop backedge latency and saturate any parallel
 // execution resources of an out-of-order processor. We also then need to
 // clean up redundancies and loop invariant code.
 // FIXME: It would be really good to use a loop-integrated instruction
 // combiner for cleanup here so that the unrolling and LICM can be pipelined
[...]

llvm/lib/Passes/PassRegistry.def

[...]
 FUNCTION_PASS("sink", SinkingPass())
 FUNCTION_PASS("slp-vectorizer", SLPVectorizerPass())
 FUNCTION_PASS("speculative-execution", SpeculativeExecutionPass())
 FUNCTION_PASS("spec-phis", SpeculateAroundPHIsPass())
 FUNCTION_PASS("sroa", SROA())
 FUNCTION_PASS("tailcallelim", TailCallElimPass())
 FUNCTION_PASS("unreachableblockelim", UnreachableBlockElimPass())
 FUNCTION_PASS("unroll-and-jam", LoopUnrollAndJamPass())
+FUNCTION_PASS("vector-combine", VectorCombinePass())
 FUNCTION_PASS("verify", VerifierPass())
 FUNCTION_PASS("verify<domtree>", DominatorTreeVerifierPass())
 FUNCTION_PASS("verify<loops>", LoopVerifierPass())
 FUNCTION_PASS("verify<memoryssa>", MemorySSAVerifierPass())
 FUNCTION_PASS("verify<regions>", RegionInfoVerifierPass())
 FUNCTION_PASS("verify<safepoint-ir>", SafepointIRVerifierPass())
 FUNCTION_PASS("verify<scalar-evolution>", ScalarEvolutionVerifierPass())
 FUNCTION_PASS("view-cfg", CFGViewerPass())
[...]

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

[...]
 #include "llvm/Transforms/Scalar/InstSimplifyPass.h"
 #include "llvm/Transforms/Scalar/LICM.h"
 #include "llvm/Transforms/Scalar/LoopUnrollPass.h"
 #include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"
 #include "llvm/Transforms/Utils.h"
 #include "llvm/Transforms/Vectorize.h"
 #include "llvm/Transforms/Vectorize/LoopVectorize.h"
 #include "llvm/Transforms/Vectorize/SLPVectorizer.h"
+#include "llvm/Transforms/Vectorize/VectorCombine.h"

 using namespace llvm;

 static cl::opt<bool>
 RunPartialInlining("enable-partial-inlining", cl::init(false), cl::Hidden,
 cl::ZeroOrMore, cl::desc("Run Partial inlinining pass"));

 static cl::opt<bool>
[...] void PassManagerBuilder::populateModulePassManager(
 // of the current iteration.
 MPM.add(createLoopLoadEliminationPass());

 // FIXME: Because of #pragma vectorize enable, the passes below are always
 // inserted in the pipeline, even when the vectorizer doesn't run (ex. when
 // on -O1 and no #pragma is found). Would be good to have these two passes
 // as function calls, so that we can only pass them when the vectorizer
 // changed the code.
+MPM.add(createVectorCombinePass());
 addInstructionCombiningPass(MPM);
 if (OptLevel > 1 && ExtraVectorizerPasses) {
 // At higher optimization levels, try to clean up any runtime overlap and
 // alignment checks inserted by the vectorizer. We want to track correllated
 // runtime checks for two inner loops in the same outer loop, fold any
 // common computations, hoist loop-invariant aspects out of any outer loop,
 // and unswitch the runtime checks if possible. Once hoisted, we may have
 // dead (or speculatable) control flows or more combining opportunities.
[...] void PassManagerBuilder::populateModulePassManager(
 // GVN, loop transforms, and others have already run, so it's now better to
 // convert to more optimized IR using more aggressive simplify CFG options.
 // The extra sinking transform can create larger basic blocks, so do this
 // before SLP vectorization.
 MPM.add(createCFGSimplificationPass(1, true, true, false, true));

 if (SLPVectorize) {
 MPM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.
+MPM.add(createVectorCombinePass());
 if (OptLevel > 1 && ExtraVectorizerPasses) {
 MPM.add(createEarlyCSEPass());
 }
 }

 addExtensionsToPM(EP_Peephole, MPM);
 addInstructionCombiningPass(MPM);

[...] void PassManagerBuilder::addLTOOptimizationPasses(legacy::PassManagerBase &PM) {
 PM.add(createLoopUnrollPass(OptLevel, DisableUnrollLoops,
 ForgetAllSCEVInLoopUnroll));

 PM.add(createWarnMissedTransformationsPass());

 // Now that we've optimized loops (in particular loop induction variables),
 // we may have exposed more scalar opportunities. Run parts of the scalar
 // optimizer again at this point.
+PM.add(createVectorCombinePass());
 addInstructionCombiningPass(PM); // Initial cleanup
 PM.add(createCFGSimplificationPass()); // if-convert
 PM.add(createSCCPPass()); // Propagate exposed constants
 addInstructionCombiningPass(PM); // Clean up again
 PM.add(createBitTrackingDCEPass());

 // More scalar chains could be vectorized due to more alias information
-if (SLPVectorize)
+if (SLPVectorize) {
 PM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.
+PM.add(createVectorCombinePass()); // Clean up partial vectorization.
+}

 // After vectorization, assume intrinsics may tell us more about pointer
 // alignments.
 PM.add(createAlignmentFromAssumptionsPass());

 // Cleanup and simplify the code after the scalar optimizations.
 addInstructionCombiningPass(PM);
 addExtensionsToPM(EP_Peephole, PM);
[...]

llvm/lib/Transforms/Vectorize/CMakeLists.txt

 add_llvm_component_library(LLVMVectorize
 LoadStoreVectorizer.cpp
 LoopVectorizationLegality.cpp
 LoopVectorize.cpp
 SLPVectorizer.cpp
 Vectorize.cpp
+VectorCombine.cpp
 VPlan.cpp
 VPlanHCFGBuilder.cpp
 VPlanPredicator.cpp
 VPlanSLP.cpp
 VPlanTransforms.cpp
 VPlanVerifier.cpp

 ADDITIONAL_HEADER_DIRS
 ${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms

 DEPENDS
 intrinsics_gen
 )

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

This file was added.

				//===------- VectorCombine.cpp - Optimize partial vector operations -------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass optimizes scalar/vector interactions using target cost models. The
				// transforms implemented here may not fit in traditional loop-based or SLP
				// vectorization passes.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/Vectorize/VectorCombine.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/Analysis/GlobalsModRef.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/PatternMatch.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/DebugCounter.h"
				#include "llvm/Transforms/Vectorize.h"
				#include "llvm/Transforms/Utils/Local.h"

				using namespace llvm;
				using namespace llvm::PatternMatch;

				#define DEBUG_TYPE "vector-combine"
				STATISTIC(NumVecCmp, "Number of vector compares formed");
				DEBUG_COUNTER(VecCombineCounter, "vector-combine-transform",
				"Controls transformations in vector-combine pass");
				craig.topper: Is this counter used?
				spatel: Oops - no. Copy/pasted from existing code. Looks like other passes do something like: if (!DebugCounter::shouldExecute(VecCombineCounter)) continue; I've never used that myself, so I could either add that code or remove the counter. Let me know if there's a preference. Not sure how we test it?
				craig.topper: Probably fine to drop it. I added the one to instcombine and I've only used it a couple times.
				spatel: Removed: rGa2a0f9a43a71

				static bool foldExtractCmp(Instruction &I, const TargetTransformInfo &TTI) {
				// Match a cmp with extracted vector operands.
				CmpInst::Predicate Pred;
				Instruction *Ext0, *Ext1;
				if (!match(&I, m_Cmp(Pred, m_Instruction(Ext0), m_Instruction(Ext1))))
				return false;

				Value *V0, *V1;
				ConstantInt *C;
				if (!match(Ext0, m_ExtractElement(m_Value(V0), m_ConstantInt(C))) ||
				!match(Ext1, m_ExtractElement(m_Value(V1), m_Specific(C))) ||
				V0->getType() != V1->getType())
				return false;

				Type *ScalarTy = Ext0->getType();
				Type *VecTy = V0->getType();
				bool IsFP = ScalarTy->isFloatingPointTy();
				unsigned CmpOpcode = IsFP ? Instruction::FCmp : Instruction::ICmp;

				// Check if the scalar alternative is cheaper. Extra uses of the extracts mean
				lebedev.ri: Word "alternative" usually means "something other than what currently is". Using it here seems misleading to me - currently we do have scalar comparison :)
				spatel: Good point - the vector code is the alternative that we will compare to.
				// that we include those costs in the vector total because those instructions
				// will not be eliminated.
				// ((2 * extract) + scalar cmp) < (vector cmp + extract) ?
				int ExtractCost = TTI.getVectorInstrCost(Instruction::ExtractElement,
				VecTy, C->getZExtValue());
				int ScalarCmpCost = TTI.getOperationCost(CmpOpcode, ScalarTy);
				int VecCmpCost = TTI.getOperationCost(CmpOpcode, VecTy);

				int ScalarCost = 2 * ExtractCost + ScalarCmpCost;
				int VecCost = VecCmpCost + ExtractCost +
				!Ext0->hasOneUse() * ExtractCost +
				!Ext1->hasOneUse() * ExtractCost;
				if (ScalarCost < VecCost)
				return false;

				// cmp Pred (extelt V0, C), (extelt V1, C) --> extelt (cmp Pred V0, V1), C
				++NumVecCmp;
				IRBuilder<> Builder(&I);
				Value *VecCmp = IsFP ? Builder.CreateFCmp(Pred, V0, V1)
				: Builder.CreateICmp(Pred, V0, V1);
				Value *Ext = Builder.CreateExtractElement(VecCmp, C);
				I.replaceAllUsesWith(Ext);
				return true;
				}

				/// This is the entry point for all transforms. Pass manager differences are
				/// handled in the callers of this function.
				static bool runImpl(Function &F, const TargetTransformInfo &TTI,
				const DominatorTree &DT) {
				bool MadeChange = false;
				for (BasicBlock &BB : F) {
				// Ignore unreachable basic blocks.
				if (!DT.isReachableFromEntry(&BB))
				continue;
				// Do not delete instructions under here and invalidate the iterator.
				// Walk the block backwards for efficiency. We're matching a chain of
				// use->defs, so we're more likely to succeed by starting from the bottom.
				// TODO: It could be more efficient to remove dead instructions
				// iteratively in this loop rather than waiting until the end.
				for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {
				MadeChange |= foldExtractCmp(I, TTI);
				xbolva00: Skip debug insn? DbgInfoIntrinsic?
				xbolva00: And skip “cold” blocks?
				spatel: Is there a pass that we can view as a template for this?
				spatel: I don't think there's enough going on in this pass yet to make this measurable, but: rGfc3cc8a4b074
				xbolva00: https://llvm.org/doxygen/classllvm_1_1ProfileSummaryInfo.html IsColdBlock/IsFunctionEntryCold are probably the helpers we need.
				spatel: Thinking about this one a bit more, it's not clear to me that we want to predicate on hot/cold. This pass can reduce code size, so the transforms are still potentially beneficial. Also, I don't see that kind of restriction on any other combiner passes (or other IR passes in general?), so raise this on llvm-dev, so we have a consistent implementation across different passes?
				// TODO: More transforms go here.
				}
				}

				// We're done with transforms, so remove dead instructions.
				if (MadeChange)
				for (BasicBlock &BB : F)
				SimplifyInstructionsInBlock(&BB);

				return MadeChange;
				}

				// Pass manager boilerplate below here.

				namespace {
				class VectorCombineLegacyPass : public FunctionPass {
				public:
				static char ID;
				VectorCombineLegacyPass() : FunctionPass(ID) {
				initializeVectorCombineLegacyPassPass(*PassRegistry::getPassRegistry());
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<DominatorTreeWrapperPass>();
				AU.addRequired<TargetTransformInfoWrapperPass>();
				AU.setPreservesCFG();
				AU.addPreserved<DominatorTreeWrapperPass>();
				AU.addPreserved<GlobalsAAWrapperPass>();
				FunctionPass::getAnalysisUsage(AU);
				}

				bool runOnFunction(Function &F) override {
				if (skipFunction(F))
				return false;
				auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
				auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
				return runImpl(F, TTI, DT);
				}
				};
				} // namespace

				char VectorCombineLegacyPass::ID = 0;
				INITIALIZE_PASS_BEGIN(VectorCombineLegacyPass, "vector-combine",
				"Optimize scalar/vector ops", false,
				false)
				INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
				INITIALIZE_PASS_END(VectorCombineLegacyPass, "vector-combine",
				"Optimize scalar/vector ops", false, false)
				Pass *llvm::createVectorCombinePass() {
				return new VectorCombineLegacyPass();
				}

				PreservedAnalyses VectorCombinePass::run(Function &F,
				FunctionAnalysisManager &FAM) {
				TargetTransformInfo &TTI = FAM.getResult<TargetIRAnalysis>(F);
				DominatorTree &DT = FAM.getResult<DominatorTreeAnalysis>(F);
				if (!runImpl(F, TTI, DT))
				return PreservedAnalyses::all();
				PreservedAnalyses PA;
				PA.preserveSet<CFGAnalyses>();
				PA.preserve<GlobalsAA>();
				return PA;
				}
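
Note on the cost check in foldExtractCmp above: the profitability comparison can be sketched outside of LLVM as plain integer arithmetic. This is only an illustration under stated assumptions, not the patch's code. The Costs struct, its field names, and preferVectorCmp are hypothetical stand-ins for the values that the patch queries from TargetTransformInfo.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-target numbers that the patch obtains
// from TargetTransformInfo. The field names are illustrative only.
struct Costs {
  int Extract;   // cost of one extractelement
  int ScalarCmp; // cost of a scalar compare
  int VectorCmp; // cost of a vector compare
};

// Returns true when the vector form would be chosen. This mirrors the
// patch's `if (ScalarCost < VecCost) return false;`, so a tie goes to the
// vector form.
bool preferVectorCmp(const Costs &C, bool Ext0HasOneUse, bool Ext1HasOneUse) {
  // Scalar form: two extracts feeding one scalar compare.
  int ScalarCost = 2 * C.Extract + C.ScalarCmp;

  // Vector form: one vector compare followed by one extract.
  int VecCost = C.VectorCmp + C.Extract;

  // An extract with other users is not eliminated by the transform, so its
  // cost is charged to the vector form as well.
  if (!Ext0HasOneUse)
    VecCost += C.Extract;
  if (!Ext1HasOneUse)
    VecCost += C.Extract;

  return !(ScalarCost < VecCost);
}
```

With all three costs equal to 1 and both extracts single-use, the scalar total is 3 and the vector total is 2, so the vector form is preferred. If both extracts have other users, the vector total rises to 4 and the scalar form is kept.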

llvm/lib/Transforms/Vectorize/Vectorize.cpp

	#include "llvm-c/Initialization.h"			#include "llvm-c/Initialization.h"
	#include "llvm-c/Transforms/Vectorize.h"			#include "llvm-c/Transforms/Vectorize.h"
	#include "llvm/Analysis/Passes.h"			#include "llvm/Analysis/Passes.h"
	#include "llvm/IR/LegacyPassManager.h"			#include "llvm/IR/LegacyPassManager.h"
	#include "llvm/InitializePasses.h"			#include "llvm/InitializePasses.h"

	using namespace llvm;			using namespace llvm;

	/// initializeVectorizationPasses - Initialize all passes linked into the			/// Initialize all passes linked into the Vectorization library.
	/// Vectorization library.
	void llvm::initializeVectorization(PassRegistry &Registry) {			void llvm::initializeVectorization(PassRegistry &Registry) {
	initializeLoopVectorizePass(Registry);			initializeLoopVectorizePass(Registry);
	initializeSLPVectorizerPass(Registry);			initializeSLPVectorizerPass(Registry);
	initializeLoadStoreVectorizerLegacyPassPass(Registry);			initializeLoadStoreVectorizerLegacyPassPass(Registry);
				initializeVectorCombineLegacyPassPass(Registry);
	}			}

	void LLVMInitializeVectorization(LLVMPassRegistryRef R) {			void LLVMInitializeVectorization(LLVMPassRegistryRef R) {
	initializeVectorization(*unwrap(R));			initializeVectorization(*unwrap(R));
	}			}

	void LLVMAddLoopVectorizePass(LLVMPassManagerRef PM) {			void LLVMAddLoopVectorizePass(LLVMPassManagerRef PM) {
	unwrap(PM)->add(createLoopVectorizePass());			unwrap(PM)->add(createLoopVectorizePass());
	}			}

	void LLVMAddSLPVectorizePass(LLVMPassManagerRef PM) {			void LLVMAddSLPVectorizePass(LLVMPassManagerRef PM) {
	unwrap(PM)->add(createSLPVectorizerPass());			unwrap(PM)->add(createSLPVectorizerPass());
	}			}

llvm/test/Other/new-pm-defaults.ll

	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Finished llvm::Function pass manager run.			; CHECK-O-NEXT: Finished llvm::Function pass manager run.
	; CHECK-O-NEXT: Running pass: LoopDistributePass			; CHECK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-O-NEXT: Running pass: LoopVectorizePass
	; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis			; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis
	; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis			; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis
	; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
				; CHECK-O-NEXT: Running pass: VectorCombinePass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-O2-NEXT: Running pass: SLPVectorizerPass
	; CHECK-O3-NEXT: Running pass: SLPVectorizerPass			; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
	; CHECK-Os-NEXT: Running pass: SLPVectorizerPass			; CHECK-Os-NEXT: Running pass: SLPVectorizerPass
				; CHECK-O2-NEXT: Running pass: VectorCombinePass
				; CHECK-O3-NEXT: Running pass: VectorCombinePass
				; CHECK-Os-NEXT: Running pass: VectorCombinePass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: LoopUnrollPass			; CHECK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass			; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis			; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass			; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
	; CHECK-O-NEXT: Starting llvm::Function pass manager run.			; CHECK-O-NEXT: Starting llvm::Function pass manager run.
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass

llvm/test/Other/new-pm-thinlto-defaults.ll

	; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass			; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass
	; CHECK-POSTLINK-O-NEXT: Finished llvm::Function pass manager run			; CHECK-POSTLINK-O-NEXT: Finished llvm::Function pass manager run
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopDistributePass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopVectorizePass
	; CHECK-POSTLINK-O-NEXT: Running analysis: BlockFrequencyAnalysis			; CHECK-POSTLINK-O-NEXT: Running analysis: BlockFrequencyAnalysis
	; CHECK-POSTLINK-O-NEXT: Running analysis: BranchProbabilityAnalysis			; CHECK-POSTLINK-O-NEXT: Running analysis: BranchProbabilityAnalysis
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-POSTLINK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-POSTLINK-O-NEXT: Running analysis: LoopAccessAnalysis
				; CHECK-POSTLINK-O-NEXT: Running pass: VectorCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass			; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-POSTLINK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-POSTLINK-O2-NEXT: Running pass: SLPVectorizerPass
	; CHECK-POSTLINK-O3-NEXT: Running pass: SLPVectorizerPass			; CHECK-POSTLINK-O3-NEXT: Running pass: SLPVectorizerPass
	; CHECK-POSTLINK-Os-NEXT: Running pass: SLPVectorizerPass			; CHECK-POSTLINK-Os-NEXT: Running pass: SLPVectorizerPass
				; CHECK-POSTLINK-O2-NEXT: Running pass: VectorCombinePass
				; CHECK-POSTLINK-O3-NEXT: Running pass: VectorCombinePass
				; CHECK-POSTLINK-Os-NEXT: Running pass: VectorCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass			; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopUnrollPass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-POSTLINK-O-NEXT: Running pass: WarnMissedTransformationsPass			; CHECK-POSTLINK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass			; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis			; CHECK-POSTLINK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-POSTLINK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass			; CHECK-POSTLINK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
	; CHECK-POSTLINK-O-NEXT: Starting llvm::Function pass manager run			; CHECK-POSTLINK-O-NEXT: Starting llvm::Function pass manager run
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass

llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll

	; CHECK-O-NEXT: Starting {{.*}}Function pass manager run			; CHECK-O-NEXT: Starting {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Finished {{.*}}Function pass manager run			; CHECK-O-NEXT: Finished {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopDistributePass			; CHECK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-O-NEXT: Running pass: LoopVectorizePass
	; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
				; CHECK-O-NEXT: Running pass: VectorCombinePass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-O2-NEXT: Running pass: SLPVectorizerPass
	; CHECK-O3-NEXT: Running pass: SLPVectorizerPass			; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
	; CHECK-Os-NEXT: Running pass: SLPVectorizerPass			; CHECK-Os-NEXT: Running pass: SLPVectorizerPass
				; CHECK-O2-NEXT: Running pass: VectorCombinePass
				; CHECK-O3-NEXT: Running pass: VectorCombinePass
				; CHECK-Os-NEXT: Running pass: VectorCombinePass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: LoopUnrollPass			; CHECK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass			; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis			; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass			; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
	; CHECK-O-NEXT: Starting {{.*}}Function pass manager run			; CHECK-O-NEXT: Starting {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass

llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll

	; CHECK-O-NEXT: Starting {{.*}}Function pass manager run			; CHECK-O-NEXT: Starting {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Finished {{.*}}Function pass manager run			; CHECK-O-NEXT: Finished {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopDistributePass			; CHECK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-O-NEXT: Running pass: LoopVectorizePass
	; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
				; CHECK-O-NEXT: Running pass: VectorCombinePass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-O2-NEXT: Running pass: SLPVectorizerPass
	; CHECK-O3-NEXT: Running pass: SLPVectorizerPass			; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
	; CHECK-Os-NEXT: Running pass: SLPVectorizerPass			; CHECK-Os-NEXT: Running pass: SLPVectorizerPass
				; CHECK-O2-NEXT: Running pass: VectorCombinePass
				; CHECK-O3-NEXT: Running pass: VectorCombinePass
				; CHECK-Os-NEXT: Running pass: VectorCombinePass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: LoopUnrollPass			; CHECK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass			; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis			; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass			; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
	; CHECK-O-NEXT: Starting {{.*}}Function pass manager run			; CHECK-O-NEXT: Starting {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass

llvm/test/Other/opt-O2-pipeline.ll

	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Loop Load Elimination			; CHECK-NEXT: Loop Load Elimination
				; CHECK-NEXT: Optimize scalar/vector ops
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: SLP Vectorizer			; CHECK-NEXT: SLP Vectorizer
				; CHECK-NEXT: Optimize scalar/vector ops
				; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
				; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops

llvm/test/Other/opt-O3-pipeline.ll

	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Loop Load Elimination			; CHECK-NEXT: Loop Load Elimination
				; CHECK-NEXT: Optimize scalar/vector ops
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: SLP Vectorizer			; CHECK-NEXT: SLP Vectorizer
				; CHECK-NEXT: Optimize scalar/vector ops
				; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
				; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops

llvm/test/Other/opt-Os-pipeline.ll

	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Loop Load Elimination			; CHECK-NEXT: Loop Load Elimination
				; CHECK-NEXT: Optimize scalar/vector ops
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: SLP Vectorizer			; CHECK-NEXT: SLP Vectorizer
				; CHECK-NEXT: Optimize scalar/vector ops
				; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
				; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops

llvm/test/Transforms/VectorCombine/X86/extract-cmp.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -vector-combine -S -mtriple=x86_64-- | FileCheck %s

				define i1 @cmp_v4i32(<4 x float> %arg, <4 x float> %arg1) {
				; CHECK-LABEL: @cmp_v4i32(
				; CHECK-NEXT: bb:
				; CHECK-NEXT: [[T:%.*]] = bitcast <4 x float> [[ARG:%.*]] to <4 x i32>
				; CHECK-NEXT: [[T3:%.*]] = bitcast <4 x float> [[ARG1:%.*]] to <4 x i32>
				; CHECK-NEXT: [[TMP0:%.*]] = icmp eq <4 x i32> [[T]], [[T3]]
				; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i1> [[TMP0]], i32 0
				; CHECK-NEXT: br i1 [[TMP1]], label [[BB6:%.*]], label [[BB18:%.*]]
				; CHECK: bb6:
				; CHECK-NEXT: [[TMP2:%.*]] = icmp eq <4 x i32> [[T]], [[T3]]
				; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i1> [[TMP2]], i32 1
				; CHECK-NEXT: br i1 [[TMP3]], label [[BB10:%.*]], label [[BB18]]
				; CHECK: bb10:
				; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[T]], [[T3]]
				; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i1> [[TMP4]], i32 2
				; CHECK-NEXT: br i1 [[TMP5]], label [[BB14:%.*]], label [[BB18]]
				; CHECK: bb14:
				; CHECK-NEXT: [[TMP6:%.*]] = icmp eq <4 x i32> [[T]], [[T3]]
				; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i1> [[TMP6]], i32 3
				; CHECK-NEXT: br label [[BB18]]
				; CHECK: bb18:
				; CHECK-NEXT: [[T19:%.*]] = phi i1 [ false, [[BB10]] ], [ false, [[BB6]] ], [ false, [[BB:%.*]] ], [ [[TMP7]], [[BB14]] ]
				; CHECK-NEXT: ret i1 [[T19]]
				;
				bb:
				%t = bitcast <4 x float> %arg to <4 x i32>
				%t2 = extractelement <4 x i32> %t, i32 0
				%t3 = bitcast <4 x float> %arg1 to <4 x i32>
				%t4 = extractelement <4 x i32> %t3, i32 0
				%t5 = icmp eq i32 %t2, %t4
				br i1 %t5, label %bb6, label %bb18

				bb6:
				%t7 = extractelement <4 x i32> %t, i32 1
				%t8 = extractelement <4 x i32> %t3, i32 1
				%t9 = icmp eq i32 %t7, %t8
				br i1 %t9, label %bb10, label %bb18

				bb10:
				%t11 = extractelement <4 x i32> %t, i32 2
				%t12 = extractelement <4 x i32> %t3, i32 2
				%t13 = icmp eq i32 %t11, %t12
				br i1 %t13, label %bb14, label %bb18

				bb14:
				%t15 = extractelement <4 x i32> %t, i32 3
				%t16 = extractelement <4 x i32> %t3, i32 3
				%t17 = icmp eq i32 %t15, %t16
				br label %bb18

				bb18:
				%t19 = phi i1 [ false, %bb10 ], [ false, %bb6 ], [ false, %bb ], [ %t17, %bb14 ]
				ret i1 %t19
				}

				define i32 @cmp_v2f64(<2 x double> %x, <2 x double> %y, <2 x double> %z) {
				; CHECK-LABEL: @cmp_v2f64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = fcmp oeq <2 x double> [[X:%.*]], [[Y:%.*]]
				; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i1> [[TMP0]], i32 1
				; CHECK-NEXT: br i1 [[TMP1]], label [[T:%.*]], label [[F:%.*]]
				; CHECK: t:
				; CHECK-NEXT: [[TMP2:%.*]] = fcmp ogt <2 x double> [[Y]], [[Z:%.*]]
				; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i1> [[TMP2]], i32 1
				; CHECK-NEXT: [[E:%.*]] = select i1 [[TMP3]], i32 42, i32 99
				; CHECK-NEXT: ret i32 [[E]]
				; CHECK: f:
				; CHECK-NEXT: ret i32 0
				;
				entry:
				%x1 = extractelement <2 x double> %x, i32 1
				%y1 = extractelement <2 x double> %y, i32 1
				%cmp1 = fcmp oeq double %x1, %y1
				br i1 %cmp1, label %t, label %f

				t:
				%z1 = extractelement <2 x double> %z, i32 1
				%cmp2 = fcmp ogt double %y1, %z1
				%e = select i1 %cmp2, i32 42, i32 99
				ret i32 %e

				f:
				ret i32 0
				}
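
The CHECK lines in this test expect each "extractelement, extractelement, compare" sequence to become "vector compare, extractelement". As a rough illustration of why the two forms produce the same value, here is a minimal Python sketch that models vectors as lists. The function names `scalar_form` and `vector_form` are hypothetical and are not part of the patch; the sketch says nothing about profitability, which the pass decides with TTI cost queries.

```python
import operator

def scalar_form(x, y, index, pred):
    """Model of the input IR: extract one lane from each vector, then compare."""
    return pred(x[index], y[index])

def vector_form(x, y, index, pred):
    """Model of the expected output IR: compare every lane, then extract one result."""
    lanes = [pred(a, b) for a, b in zip(x, y)]
    return lanes[index]

# Illustrative lane values only; they do not come from the test file.
x = [1, 2, 3, 4]
y = [1, 0, 3, 9]

# Pair up the results of both forms for each lane index.
results = [
    (scalar_form(x, y, i, operator.eq), vector_form(x, y, i, operator.eq))
    for i in range(len(x))
]
```

Both forms agree for every lane index, so the choice between them comes down to cost on the target rather than correctness.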

llvm/test/Transforms/VectorCombine/X86/lit.local.cfg

This file was added.

				if not 'X86' in config.root.targets:
				    config.unsupported = True


Revision Contents

Diff 240569

llvm/include/llvm/InitializePasses.h

llvm/include/llvm/LinkAllPasses.h

llvm/include/llvm/Transforms/Vectorize.h

llvm/include/llvm/Transforms/Vectorize/VectorCombine.h

llvm/lib/Passes/PassBuilder.cpp

llvm/lib/Passes/PassRegistry.def

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

llvm/lib/Transforms/Vectorize/CMakeLists.txt

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

llvm/lib/Transforms/Vectorize/Vectorize.cpp

llvm/test/Other/new-pm-defaults.ll

llvm/test/Other/new-pm-thinlto-defaults.ll

llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll

llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll

llvm/test/Other/opt-O2-pipeline.ll

llvm/test/Other/opt-O3-pipeline.ll

llvm/test/Other/opt-Os-pipeline.ll

llvm/test/Transforms/VectorCombine/X86/extract-cmp.ll

llvm/test/Transforms/VectorCombine/X86/lit.local.cfg
