This is an archive of the discontinued LLVM Phabricator instance.

[Passes][VectorCombine] enable early run generally and try load folds
ClosedPublic

Authored by spatel on Nov 19 2022, 7:47 AM.


Details

Reviewers

fhahn
nikic
lebedev.ri
RKSimon

Commits

rG163bb6d64e5f: [Passes][VectorCombine] enable early run generally and try load folds

Summary

An early run of VectorCombine was added with D102496 specifically to deal with unnecessary vector ops produced by the C matrix extension. This patch proposes trying those folds in general and adds a pair of load folds to the menu.

The load transform will partly solve (see PhaseOrdering diffs) a longstanding vectorization perf bug by removing redundant loads via GVN:
issue #17113

The main reason for not enabling the extra pass generally in the initial patch was compile-time cost. The cost of VectorCombine was significantly (surprisingly) improved with:
87debdadaf18
https://llvm-compile-time-tracker.com/compare.php?from=ffe05b8f57d97bc4340f791cb386c8d00e0739f2&to=87debdadaf18f8a5c7e5d563889e10731dc3554d&stat=instructions:u

...so the extra run is going to cost very little now - the total cost of the 2 runs should be less than the 1 run before that micro-optimization:
https://llvm-compile-time-tracker.com/compare.php?from=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&to=2c4b68eab5ae969811f422714e0eba44c5f7eefb&stat=instructions:u

It may be possible to reduce the cost slightly more with a few more early exits like that, but based on timing experiments the difference is probably in the noise.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. Nov 19 2022, 7:47 AM

Herald added a project: Restricted Project. Nov 19 2022, 7:47 AM

Herald added subscribers: ormris, wenlei, steven_wu and 2 others.

spatel requested review of this revision. Nov 19 2022, 7:47 AM

Herald added subscribers: llvm-commits, pcwang-thead.

Harbormaster completed remote builds in B198620: Diff 476680. Nov 19 2022, 7:48 AM

lebedev.ri added inline comments. Nov 19 2022, 7:54 AM

llvm/lib/Passes/PassBuilderPipelines.cpp
618–621	What does "reduced" here mean? "obscured"?

spatel added inline comments. Nov 19 2022, 8:15 AM

llvm/lib/Passes/PassBuilderPipelines.cpp

618–621

No, that was supposed to mean "enable more folds".
In the motivating example from #17113, we have:

%2 = load float, ptr %0, align 16
%3 = insertelement <4 x float> undef, float %2, i64 0
%4 = getelementptr inbounds [4 x float], ptr %0, i64 0, i64 1
%5 = load float, ptr %4, align 4

VectorCombine can widen the first load (with legality/profitability constraints):

%2 = load <4 x float>, ptr %0, align 16
%3 = shufflevector <4 x float> %2, <4 x float> poison, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
%4 = getelementptr inbounds [4 x float], ptr %0, i64 0, i64 1
%5 = load float, ptr %4, align 4

And GVN then replaces the redundant 2nd load:

%2 = load <4 x float>, ptr %0, align 16
%3 = shufflevector <4 x float> %2, <4 x float> poison, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
%4 = getelementptr inbounds [4 x float], ptr %0, i64 0, i64 1
%5 = bitcast <4 x float> %2 to i128
%6 = lshr i128 %5, 32
%7 = trunc i128 %6 to i32
%8 = bitcast i32 %7 to float

And then InstCombine manages to remove all of those extra instructions.
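To make the GVN-generated sequence above easier to follow: on a little-endian target, the bitcast/lshr/trunc/bitcast chain simply reads lane 1 of the already-loaded vector. The following is a minimal standalone sketch of that equivalence, not LLVM code. The helper name `laneViaShift` is hypothetical, and the sketch assumes a little-endian host and the GCC/Clang `unsigned __int128` extension.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Assumption: GCC/Clang extension type standing in for LLVM's i128.
__extension__ typedef unsigned __int128 U128;

// Hypothetical helper (not an LLVM API). It mirrors, on a little-endian host:
//   %5 = bitcast <4 x float> %2 to i128
//   %6 = lshr i128 %5, 32 * Lane
//   %7 = trunc i128 %6 to i32
//   %8 = bitcast i32 %7 to float
// Lane must be in the range [0, 3].
float laneViaShift(const std::array<float, 4> &Vec, unsigned Lane) {
  static_assert(sizeof(std::array<float, 4>) == sizeof(U128),
                "vector must be exactly 128 bits");
  U128 Wide;
  std::memcpy(&Wide, Vec.data(), sizeof(Wide));     // bitcast to i128
  U128 Shifted = Wide >> (32u * Lane);              // lshr
  std::uint32_t Narrow =
      static_cast<std::uint32_t>(Shifted);          // trunc to i32
  float Result;
  std::memcpy(&Result, &Narrow, sizeof(Result));    // bitcast to float
  return Result;
}
```

With `Lane == 1` this corresponds to the shift amount of 32 shown in the IR, so the result is bit-identical to the second scalar load that GVN replaced.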

lebedev.ri added inline comments. Nov 19 2022, 8:23 AM

llvm/lib/Passes/PassBuilderPipelines.cpp
618–621	I would suggest something closer to

// Try vectorization/scalarization transforms
// that are both improvements themselves,
// and can allow further GVN and InstCombine folds.

then, maybe, optionally.

Patch updated:
Improved the code comment in PassBuilder so it is clearer why VectorCombine is run (and to anchor its position ahead of GVN/InstCombine).

Harbormaster completed remote builds in B198621: Diff 476681. Nov 19 2022, 8:31 AM

Sounds reasonable to me.

LGTM - any other comments?

lebedev.ri accepted this revision. Nov 21 2022, 7:36 AM

This revision is now accepted and ready to land. Nov 21 2022, 7:36 AM

This revision was landed with ongoing or failed builds. Nov 21 2022, 10:58 AM

Closed by commit rG163bb6d64e5f: [Passes][VectorCombine] enable early run generally and try load folds (authored by spatel).

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rG163bb6d64e5f: [Passes][VectorCombine] enable early run generally and try load folds.

Revision Contents

Path	Size
llvm/lib/Passes/PassBuilderPipelines.cpp	7 lines
llvm/lib/Transforms/Vectorize/VectorCombine.cpp	4 lines
llvm/test/Other/new-pm-defaults.ll	2 lines
llvm/test/Other/new-pm-thinlto-defaults.ll	1 line
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll	1 line
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll	1 line
llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll	1 line
llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll	1 line
llvm/test/Transforms/PhaseOrdering/X86/vec-load-combine.ll	15 lines

Diff 476952

llvm/lib/Passes/PassBuilderPipelines.cpp

PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
// All loop passes must preserve it, in order to be able to use it.		// All loop passes must preserve it, in order to be able to use it.
FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),		FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
/*UseMemorySSA=*/false,		/*UseMemorySSA=*/false,
/*UseBlockFrequencyInfo=*/false));		/*UseBlockFrequencyInfo=*/false));

// Delete small array after loop unroll.		// Delete small array after loop unroll.
FPM.addPass(SROAPass());		FPM.addPass(SROAPass());

// The matrix extension can introduce large vector operations early, which can		// Try vectorization/scalarization transforms that are both improvements
// benefit from running vector-combine early on.		// themselves and can allow further folds with GVN and InstCombine.
if (EnableMatrix)
FPM.addPass(VectorCombinePass(/*TryEarlyFoldsOnly=*/true));		FPM.addPass(VectorCombinePass(/*TryEarlyFoldsOnly=*/true));

// Eliminate redundancies.		// Eliminate redundancies.
FPM.addPass(MergedLoadStoreMotionPass());		FPM.addPass(MergedLoadStoreMotionPass());
if (RunNewGVN)		if (RunNewGVN)
FPM.addPass(NewGVNPass());		FPM.addPass(NewGVNPass());
else		else
FPM.addPass(GVNPass());		FPM.addPass(GVNPass());

// Sparse conditional constant propagation.		// Sparse conditional constant propagation.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

bool VectorCombine::run() {
if (!TTI.getNumberOfRegisters(TTI.getRegisterClassForType(/*Vector*/ true)))		if (!TTI.getNumberOfRegisters(TTI.getRegisterClassForType(/*Vector*/ true)))
return false;		return false;

bool MadeChange = false;		bool MadeChange = false;
auto FoldInst = [this, &MadeChange](Instruction &I) {		auto FoldInst = [this, &MadeChange](Instruction &I) {
Builder.SetInsertPoint(&I);		Builder.SetInsertPoint(&I);
if (!TryEarlyFoldsOnly) {		if (!TryEarlyFoldsOnly) {
if (isa<FixedVectorType>(I.getType())) {		if (isa<FixedVectorType>(I.getType())) {
MadeChange |= vectorizeLoadInsert(I);
MadeChange |= widenSubvectorLoad(I);
MadeChange |= foldInsExtFNeg(I);		MadeChange |= foldInsExtFNeg(I);
MadeChange |= foldBitcastShuf(I);		MadeChange |= foldBitcastShuf(I);
MadeChange |= foldShuffleOfBinops(I);		MadeChange |= foldShuffleOfBinops(I);
MadeChange |= foldSelectShuffle(I);		MadeChange |= foldSelectShuffle(I);
} else {		} else {
MadeChange |= foldExtractExtract(I);		MadeChange |= foldExtractExtract(I);
MadeChange |= foldExtractedCmps(I);		MadeChange |= foldExtractedCmps(I);
MadeChange |= foldShuffleFromReductions(I);		MadeChange |= foldShuffleFromReductions(I);
}		}
}		}
if (isa<FixedVectorType>(I.getType())) {		if (isa<FixedVectorType>(I.getType())) {
		MadeChange |= vectorizeLoadInsert(I);
		MadeChange |= widenSubvectorLoad(I);
MadeChange |= scalarizeBinopOrCmp(I);		MadeChange |= scalarizeBinopOrCmp(I);
MadeChange |= scalarizeLoadExtract(I);		MadeChange |= scalarizeLoadExtract(I);
}		}
MadeChange |= foldSingleElementStore(I);		MadeChange |= foldSingleElementStore(I);
};		};
for (BasicBlock &BB : F) {		for (BasicBlock &BB : F) {
// Ignore unreachable basic blocks.		// Ignore unreachable basic blocks.
if (!DT.isReachableFromEntry(&BB))		if (!DT.isReachableFromEntry(&BB))
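As a reading aid for the hunk above, here is a small standalone sketch of which folds the post-patch `FoldInst` lambda attempts, depending on `TryEarlyFoldsOnly` and on whether the instruction has a fixed vector type. It only models the control flow visible in the right-hand column of the diff. The function `foldsAttempted` and its string-based output are hypothetical and are not part of VectorCombine.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical model (not LLVM code): returns the names of the folds that the
// post-patch FoldInst lambda would try, in order, for one instruction.
std::vector<std::string> foldsAttempted(bool TryEarlyFoldsOnly,
                                        bool IsFixedVectorTy) {
  std::vector<std::string> Folds;
  if (!TryEarlyFoldsOnly) {
    // These folds are skipped in the early run.
    if (IsFixedVectorTy) {
      for (const char *Name : {"foldInsExtFNeg", "foldBitcastShuf",
                               "foldShuffleOfBinops", "foldSelectShuffle"})
        Folds.push_back(Name);
    } else {
      for (const char *Name : {"foldExtractExtract", "foldExtractedCmps",
                               "foldShuffleFromReductions"})
        Folds.push_back(Name);
    }
  }
  if (IsFixedVectorTy) {
    // After the patch, the two load folds sit here, so the early run tries
    // them as well.
    for (const char *Name : {"vectorizeLoadInsert", "widenSubvectorLoad",
                             "scalarizeBinopOrCmp", "scalarizeLoadExtract"})
      Folds.push_back(Name);
  }
  Folds.push_back("foldSingleElementStore");
  return Folds;
}
```

For example, `foldsAttempted(true, true)` lists the two load folds first, which is the change that lets the early run feed GVN.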

llvm/test/Other/new-pm-defaults.ll

	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass			; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass
	; CHECK-O-NEXT: Running pass: IndVarSimplifyPass			; CHECK-O-NEXT: Running pass: IndVarSimplifyPass
	; CHECK-EP-LOOP-LATE-NEXT: Running pass: NoOpLoopPass			; CHECK-EP-LOOP-LATE-NEXT: Running pass: NoOpLoopPass
	; CHECK-O-NEXT: Running pass: LoopDeletionPass			; CHECK-O-NEXT: Running pass: LoopDeletionPass
	; CHECK-O-NEXT: Running pass: LoopFullUnrollPass			; CHECK-O-NEXT: Running pass: LoopFullUnrollPass
	; CHECK-EP-LOOP-END-NEXT: Running pass: NoOpLoopPass			; CHECK-EP-LOOP-END-NEXT: Running pass: NoOpLoopPass
	; CHECK-O-NEXT: Running pass: SROAPass on foo			; CHECK-O-NEXT: Running pass: SROAPass on foo
	; CHECK-MATRIX: Running pass: VectorCombinePass			; CHECK-O23SZ-NEXT: Running pass: VectorCombinePass
	; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass			; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass
	; CHECK-O23SZ-NEXT: Running pass: GVNPass			; CHECK-O23SZ-NEXT: Running pass: GVNPass
	; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis			; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis
	; CHECK-O23SZ-NEXT: Running analysis: PhiValuesAnalysis			; CHECK-O23SZ-NEXT: Running analysis: PhiValuesAnalysis
	; CHECK-O1-NEXT: Running pass: MemCpyOptPass			; CHECK-O1-NEXT: Running pass: MemCpyOptPass
	; CHECK-O-NEXT: Running pass: SCCPPass			; CHECK-O-NEXT: Running pass: SCCPPass
	; CHECK-O-NEXT: Running pass: BDCEPass			; CHECK-O-NEXT: Running pass: BDCEPass
	; CHECK-O-NEXT: Running analysis: DemandedBitsAnalysis			; CHECK-O-NEXT: Running analysis: DemandedBitsAnalysis

llvm/test/Other/new-pm-thinlto-defaults.ll

	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass			; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass
	; CHECK-O-NEXT: Running pass: IndVarSimplifyPass			; CHECK-O-NEXT: Running pass: IndVarSimplifyPass
	; CHECK-O-NEXT: Running pass: LoopDeletionPass			; CHECK-O-NEXT: Running pass: LoopDeletionPass
	; CHECK-O-NEXT: Running pass: LoopFullUnrollPass			; CHECK-O-NEXT: Running pass: LoopFullUnrollPass
	; CHECK-O-NEXT: Running pass: SROAPass on foo			; CHECK-O-NEXT: Running pass: SROAPass on foo
				; CHECK-O23SZ-NEXT: Running pass: VectorCombinePass
	; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass			; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass
	; CHECK-O23SZ-NEXT: Running pass: GVNPass			; CHECK-O23SZ-NEXT: Running pass: GVNPass
	; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis			; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis
	; CHECK-O23SZ-NEXT: Running analysis: PhiValuesAnalysis			; CHECK-O23SZ-NEXT: Running analysis: PhiValuesAnalysis
	; CHECK-O1-NEXT: Running pass: MemCpyOptPass			; CHECK-O1-NEXT: Running pass: MemCpyOptPass
	; CHECK-O-NEXT: Running pass: SCCPPass			; CHECK-O-NEXT: Running pass: SCCPPass
	; CHECK-O-NEXT: Running pass: BDCEPass			; CHECK-O-NEXT: Running pass: BDCEPass
	; CHECK-O-NEXT: Running analysis: DemandedBitsAnalysis			; CHECK-O-NEXT: Running analysis: DemandedBitsAnalysis

llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll

	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass			; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass
	; CHECK-O-NEXT: Running pass: IndVarSimplifyPass			; CHECK-O-NEXT: Running pass: IndVarSimplifyPass
	; CHECK-O-NEXT: Running pass: LoopDeletionPass			; CHECK-O-NEXT: Running pass: LoopDeletionPass
	; CHECK-O-NEXT: Running pass: LoopFullUnrollPass			; CHECK-O-NEXT: Running pass: LoopFullUnrollPass
	; CHECK-O-NEXT: Running pass: SROAPass on foo			; CHECK-O-NEXT: Running pass: SROAPass on foo
				; CHECK-O23SZ-NEXT: Running pass: VectorCombinePass
	; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass			; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass
	; CHECK-O23SZ-NEXT: Running pass: GVNPass			; CHECK-O23SZ-NEXT: Running pass: GVNPass
	; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis			; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis
	; CHECK-O23SZ-NEXT: Running analysis: PhiValuesAnalysis			; CHECK-O23SZ-NEXT: Running analysis: PhiValuesAnalysis
	; CHECK-O1-NEXT: Running pass: MemCpyOptPass			; CHECK-O1-NEXT: Running pass: MemCpyOptPass
	; CHECK-O-NEXT: Running pass: SCCPPass			; CHECK-O-NEXT: Running pass: SCCPPass
	; CHECK-O-NEXT: Running pass: BDCEPass			; CHECK-O-NEXT: Running pass: BDCEPass
	; CHECK-O-NEXT: Running analysis: DemandedBitsAnalysis			; CHECK-O-NEXT: Running analysis: DemandedBitsAnalysis

llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll

	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass			; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass
	; CHECK-O-NEXT: Running pass: IndVarSimplifyPass			; CHECK-O-NEXT: Running pass: IndVarSimplifyPass
	; CHECK-O-NEXT: Running pass: LoopDeletionPass			; CHECK-O-NEXT: Running pass: LoopDeletionPass
	; CHECK-O-NEXT: Running pass: LoopFullUnrollPass			; CHECK-O-NEXT: Running pass: LoopFullUnrollPass
	; CHECK-O-NEXT: Running pass: SROAPass on foo			; CHECK-O-NEXT: Running pass: SROAPass on foo
				; CHECK-O23SZ-NEXT: Running pass: VectorCombinePass
	; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass			; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass
	; CHECK-O23SZ-NEXT: Running pass: GVNPass			; CHECK-O23SZ-NEXT: Running pass: GVNPass
	; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis			; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis
	; CHECK-O23SZ-NEXT: Running analysis: PhiValuesAnalysis			; CHECK-O23SZ-NEXT: Running analysis: PhiValuesAnalysis
	; CHECK-O1-NEXT: Running pass: MemCpyOptPass			; CHECK-O1-NEXT: Running pass: MemCpyOptPass
	; CHECK-O-NEXT: Running pass: SCCPPass			; CHECK-O-NEXT: Running pass: SCCPPass
	; CHECK-O-NEXT: Running pass: BDCEPass			; CHECK-O-NEXT: Running pass: BDCEPass
	; CHECK-O-NEXT: Running analysis: DemandedBitsAnalysis			; CHECK-O-NEXT: Running analysis: DemandedBitsAnalysis

llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll

	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass			; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass
	; CHECK-O-NEXT: Running pass: IndVarSimplifyPass			; CHECK-O-NEXT: Running pass: IndVarSimplifyPass
	; CHECK-O-NEXT: Running pass: LoopDeletionPass			; CHECK-O-NEXT: Running pass: LoopDeletionPass
	; CHECK-O-NEXT: Running pass: LoopFullUnrollPass			; CHECK-O-NEXT: Running pass: LoopFullUnrollPass
	; CHECK-O-NEXT: Running pass: SROAPass on foo			; CHECK-O-NEXT: Running pass: SROAPass on foo
				; CHECK-O23SZ-NEXT: Running pass: VectorCombinePass
	; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass			; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass
	; CHECK-O23SZ-NEXT: Running pass: GVNPass			; CHECK-O23SZ-NEXT: Running pass: GVNPass
	; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis			; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis
	; CHECK-O23SZ-NEXT: Running analysis: PhiValuesAnalysis			; CHECK-O23SZ-NEXT: Running analysis: PhiValuesAnalysis
	; CHECK-O1-NEXT: Running pass: MemCpyOptPass			; CHECK-O1-NEXT: Running pass: MemCpyOptPass
	; CHECK-O-NEXT: Running pass: SCCPPass			; CHECK-O-NEXT: Running pass: SCCPPass
	; CHECK-O-NEXT: Running pass: BDCEPass			; CHECK-O-NEXT: Running pass: BDCEPass
	; CHECK-O-NEXT: Running analysis: DemandedBitsAnalysis			; CHECK-O-NEXT: Running analysis: DemandedBitsAnalysis

llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll

	; CHECK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass			; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass
	; CHECK-O-NEXT: Running pass: IndVarSimplifyPass			; CHECK-O-NEXT: Running pass: IndVarSimplifyPass
	; CHECK-O-NEXT: Running pass: LoopDeletionPass			; CHECK-O-NEXT: Running pass: LoopDeletionPass
	; CHECK-O-NEXT: Running pass: SROAPass on foo			; CHECK-O-NEXT: Running pass: SROAPass on foo
				; CHECK-O23SZ-NEXT: Running pass: VectorCombinePass
	; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass			; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass
	; CHECK-O23SZ-NEXT: Running pass: GVNPass			; CHECK-O23SZ-NEXT: Running pass: GVNPass
	; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis			; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis
	; CHECK-O23SZ-NEXT: Running analysis: PhiValuesAnalysis			; CHECK-O23SZ-NEXT: Running analysis: PhiValuesAnalysis
	; CHECK-O1-NEXT: Running pass: MemCpyOptPass			; CHECK-O1-NEXT: Running pass: MemCpyOptPass
	; CHECK-O-NEXT: Running pass: SCCPPass			; CHECK-O-NEXT: Running pass: SCCPPass
	; CHECK-O-NEXT: Running pass: BDCEPass			; CHECK-O-NEXT: Running pass: BDCEPass
	; CHECK-O-NEXT: Running analysis: DemandedBitsAnalysis			; CHECK-O-NEXT: Running analysis: DemandedBitsAnalysis

llvm/test/Transforms/PhaseOrdering/X86/vec-load-combine.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes="default<O3>" -S < %s | FileCheck %s --check-prefix=SSE			; RUN: opt -passes="default<O3>" -S < %s | FileCheck %s --check-prefix=SSE
	; RUN: opt -passes="default<O3>" -S -mattr=avx < %s | FileCheck %s --check-prefix=AVX			; RUN: opt -passes="default<O3>" -S -mattr=avx < %s | FileCheck %s --check-prefix=AVX

	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64--"			target triple = "x86_64--"

	%union.ElementWiseAccess = type { <4 x float> }			%union.ElementWiseAccess = type { <4 x float> }

	$getAt = comdat any			$getAt = comdat any

	define dso_local noundef <4 x float> @ConvertVectors_ByRef(ptr noundef nonnull align 16 dereferenceable(16) %0) #0 {			define dso_local noundef <4 x float> @ConvertVectors_ByRef(ptr noundef nonnull align 16 dereferenceable(16) %0) #0 {
	; SSE-LABEL: @ConvertVectors_ByRef(			; SSE-LABEL: @ConvertVectors_ByRef(
	; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[TMP0:%.*]], align 16			; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[TMP0:%.*]], align 16
	; SSE-NEXT: [[TMP3:%.*]] = getelementptr inbounds [4 x float], ptr [[TMP0]], i64 0, i64 1			; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 2>
	; SSE-NEXT: [[TMP4:%.*]] = load <2 x float>, ptr [[TMP3]], align 4			; SSE-NEXT: ret <4 x float> [[TMP3]]
	; SSE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP6:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>
	; SSE-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 5>
	; SSE-NEXT: ret <4 x float> [[TMP7]]
	;			;
	; AVX-LABEL: @ConvertVectors_ByRef(			; AVX-LABEL: @ConvertVectors_ByRef(
	; AVX-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[TMP0:%.*]], align 16			; AVX-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[TMP0:%.*]], align 16
	; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds [4 x float], ptr [[TMP0]], i64 0, i64 2			; AVX-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 2>
	; AVX-NEXT: [[TMP4:%.*]] = load float, ptr [[TMP3]], align 8			; AVX-NEXT: ret <4 x float> [[TMP3]]
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP2]], float [[TMP4]], i64 2
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP5]], float [[TMP4]], i64 3
	; AVX-NEXT: ret <4 x float> [[TMP6]]
	;			;
	%2 = alloca ptr, align 8			%2 = alloca ptr, align 8
	%3 = alloca <4 x float>, align 16			%3 = alloca <4 x float>, align 16
	store ptr %0, ptr %2, align 8			store ptr %0, ptr %2, align 8
	%4 = load ptr, ptr %2, align 8			%4 = load ptr, ptr %2, align 8
	%5 = call noundef nonnull align 16 dereferenceable(16) ptr @castToElementWiseAccess_ByRef(ptr noundef nonnull align 16 dereferenceable(16) %4)			%5 = call noundef nonnull align 16 dereferenceable(16) ptr @castToElementWiseAccess_ByRef(ptr noundef nonnull align 16 dereferenceable(16) %4)
	%6 = call noundef float @getAt(ptr noundef nonnull align 16 dereferenceable(16) %5, i32 noundef 0)			%6 = call noundef float @getAt(ptr noundef nonnull align 16 dereferenceable(16) %5, i32 noundef 0)
	%7 = insertelement <4 x float> undef, float %6, i32 0			%7 = insertelement <4 x float> undef, float %6, i32 0
