Details

Reviewers

chandlerc
hfinkel
asbirlea
mehdi_amini
vitalybuka

Summary

Fix full unrolling with new pass manager.

The last time we looked at this we couldn't come up with a reason to
change it. However, with a pragma for full loop unrolling we bypass every
other loop unroll and then still fail to fully unroll the loop that the
pragma is set on.

Move the OnlyWhenForced out of the check and into the initialization
of the full unroll pass in the new pass manager. This doesn't show up
with the old pass manager.
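
A minimal sketch of the change described above, written as a standalone model rather than the actual PassBuilder code. `TuningOptions`, `FullUnrollSetup`, `configureBefore`, `configureAfter`, and `wouldFullyUnroll` are hypothetical names used only for illustration, and the ThinLTO/sample PGO condition is left out. The model assumes that a full unroll pass created with OnlyWhenForced set still unrolls loops that carry a forced full unroll request.

```cpp
// Hypothetical stand-in for the PTO.LoopUnrolling tuning knob.
struct TuningOptions {
  bool LoopUnrolling;
};

// Hypothetical record of how the full unroll pass ends up configured.
struct FullUnrollSetup {
  bool PassAdded;      // Was the pass added to the pipeline at all?
  bool OnlyWhenForced; // Does it only act on loops with a forced request?
};

// Before the patch (modelled): the flag gates whether the pass is added,
// so disabling unrolling removes the pass that would honor the pragma.
constexpr FullUnrollSetup configureBefore(TuningOptions PTO) {
  return PTO.LoopUnrolling ? FullUnrollSetup{true, false}
                           : FullUnrollSetup{false, false};
}

// After the patch (modelled): the pass is always added and the flag is
// folded into its OnlyWhenForced argument instead.
constexpr FullUnrollSetup configureAfter(TuningOptions PTO) {
  return FullUnrollSetup{true, !PTO.LoopUnrolling};
}

// Simplified decision: a loop is fully unrolled if the pass exists and
// either the loop is forced by a pragma or heuristics are allowed and agree.
constexpr bool wouldFullyUnroll(FullUnrollSetup S, bool HasFullUnrollPragma,
                                bool ProfitableByHeuristic) {
  return S.PassAdded &&
         (HasFullUnrollPragma || (!S.OnlyWhenForced && ProfitableByHeuristic));
}

// With unrolling disabled, the pragma loop is only handled by the new setup.
static_assert(!wouldFullyUnroll(configureBefore({false}), true, false),
              "old setup drops the forced loop");
static_assert(wouldFullyUnroll(configureAfter({false}), true, false),
              "new setup honors the forced loop");
```

In this model, disabling unrolling still suppresses heuristic unrolling after the change; only loops with an explicit full unroll request are affected.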

Add a new option to opt so that we can turn off loop unrolling
manually since this is a difference between clang and opt.

Tested with check-clang and check-llvm.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

echristo created this revision. Dec 18 2019, 8:14 PM

Herald added projects: Restricted Project, Restricted Project. Dec 18 2019, 8:14 PM

Herald added subscribers: llvm-commits, cfe-commits, hiraditya, mcrosier.

Formatting and parens changes.

Unit tests: pass. 61011 tests passed, 0 failed and 728 were skipped.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster completed remote builds in B42771: Diff 234655. Dec 18 2019, 8:48 PM

Harbormaster completed remote builds in B42772: Diff 234656.

Ping ping goes the trolley.

MaskRay added a subscriber: MaskRay. Dec 26 2019, 3:34 PM

MaskRay added inline comments.

clang/test/Misc/loop-opt-setup.c
13	The test seems to pass without the code change (llvm/lib/Passes/PassBuilder.cpp) below.

mehdi_amini accepted this revision. Dec 26 2019, 3:36 PM

This revision is now accepted and ready to land. Dec 26 2019, 3:36 PM

mehdi_amini added inline comments. Dec 26 2019, 3:37 PM

llvm/lib/Passes/PassBuilder.cpp
514	I would document that the normal unroll pass does not honor the forced unroll, it is surprising to me.

Can we add an LLVM test w/ the metadata so that we have an entirely LLVM test flow that ensures the pass builder DTRT?

(I still would include the Clang side test which is also very useful to test integrating Clang w/ different flows through the pass manager.)

Fixed the clang test.

Tried to get something that I could reduce down and duplicate with just opt but it's been... difficult. Even the small clang testcase in isolation won't duplicate via something like:

clang -O0 -fexperimental-new-pass-manager foo.cc -S -o - -emit-llvm -mllvm -disable-llvm-optzns | opt -passes='default<O1>' -transform-warning -pass-remarks-missed=transform-warning -S -o -

Might be holding it wrong, but this isn't very discoverable if so :)

Fix comments around full unroller.

echristo marked an inline comment as done. Mar 13 2020, 9:17 AM

Harbormaster failed remote builds in B49151: Diff 250229! Mar 13 2020, 10:12 AM

Harbormaster failed remote builds in B49153: Diff 250232!

Can this be landed?

Add a testcase with opt and command line option so we can enable it.

Herald added a subscriber: zzheng. May 4 2020, 2:06 PM

Harbormaster completed remote builds in B55700: Diff 261928. May 4 2020, 3:05 PM

echristo marked an inline comment as done. May 4 2020, 3:49 PM

Wooot about finally having a test case! (Sorry for nit picking it a bit ....)

llvm/test/Transforms/LoopUnroll/FullUnroll.ll
4–6	Make sure it is in the correct function at least, and maybe after the label for the loop header?
12–17	Are both functions needed?
19–20	Nit, but minimize and clean this function up a touch? At the least, removing all the target features seems valuable, and I'd give things stable names instead of numbered values.

Update and reduce testcase a bit.

OK, ready again :)

llvm/test/Transforms/LoopUnroll/FullUnroll.ll
4–6	Got it. There's not a lot of function left at the end:
define void @walrus() local_unnamed_addr #0 {
entry:
  br label %for.cond.preheader
for.cond.preheader: ; preds = %for.cond.preheader, %entry
  br label %for.cond.preheader
}
12–17	I thought so, but apparently not :)
19–20	Got it. Most things have stable names, only a few temporaries have numbered names.

Harbormaster failed remote builds in B55973: Diff 262458! May 6 2020, 2:11 PM

chandlerc added inline comments. May 12 2020, 11:56 PM

clang/test/Misc/loop-opt-setup.c
12	This is dead now that you have different prefixes...
32–33	But for which function? I'd rework these to be more modern `CHECK-LABEL` and affirmative checks for the unrolling.
llvm/test/Transforms/LoopUnroll/FullUnroll.ll
4–6	Then switch to generated `CHECK` lines? The checks we have here will pass easily after incorrect edits that cause this test to not actually check what it intends to. I'd either check something that constructively shows unrolling or generate exact checks (if it's small enough to be genuinely stable).
19–20	Sure, but even those will make future edits to this really frustrating with irrelevant diff, etc. And this test case can still be minimized. As an example, the function is currently attributed as optnone. =/ I think you can just run this through the metarenamer pass?

Done. New diff incoming.

clang/test/Misc/loop-opt-setup.c
12	So it is. I'd actually missed this comment and saw it when fixing up other things. Thanks :)

Update and restructure some test cases.

Cool, thanks and LGTM!

Harbormaster completed remote builds in B58521: Diff 267424. May 29 2020, 9:14 PM

Committed a while back.

I'm looking at enabling the -enable-npm-optnone flag and FullUnroll.ll fails. I understand that loop unrolling should be forced when some metadata is present, but the FullUnroll.ll test seems to check for a lot more than that. It checks for (roughly) two unconditional branches and no conditional branches. Running the test under -debug-pass-manager and --print-after-all, it looks like InstCombine does a lot of work removing instructions, and SimplifyCFG is ultimately what removes all the conditional branches. But InstCombine and SimplifyCFG shouldn't run on an optnone function, right?
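
A minimal sketch of the question above, assuming a simplified model in which the pass manager skips every pass not marked as required when the function is optnone. `PassModel` and `passesThatRun` are hypothetical names for illustration, not LLVM's API, and marking the full unroll pass as required only mirrors the title of D86005.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of a pass: its name and whether it must run
// even on functions marked optnone.
struct PassModel {
  std::string Name;
  bool Required;
};

// Returns the names of the passes that would run on a function under the
// assumed rule: on an optnone function only required passes run.
std::vector<std::string> passesThatRun(const std::vector<PassModel> &Pipeline,
                                       bool FunctionIsOptNone) {
  std::vector<std::string> Ran;
  for (const PassModel &P : Pipeline)
    if (!FunctionIsOptNone || P.Required)
      Ran.push_back(P.Name);
  return Ran;
}
```

Under this model, a test whose function is optnone cannot rely on InstCombine or SimplifyCFG cleaning up after the unroller, which is consistent with the failure described above.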

aeubanks mentioned this in D86005: [NewPM][LoopFullUnroll] Make LoopFullUnrollPass required. Aug 24 2020, 1:47 PM

aeubanks mentioned this in D86485: [test] Fix FullUnroll.ll. Aug 24 2020, 1:47 PM

aeubanks mentioned this in rGf2f0474c93ee: [test] Fix FullUnroll.ll. Sep 17 2020, 4:00 PM

Diff 261928

clang/test/Misc/loop-opt-setup.c

	// RUN: %clang -O1 -fexperimental-new-pass-manager -fno-unroll-loops -S -o - %s -emit-llvm | FileCheck %s			// RUN: %clang -O1 -fexperimental-new-pass-manager -fno-unroll-loops -S -o - %s -emit-llvm | FileCheck %s -check-prefix=CHECK-NEWPM
	// RUN: %clang -O1 -fno-experimental-new-pass-manager -fno-unroll-loops -S -o - %s -emit-llvm | FileCheck %s			// RUN: %clang -O1 -fno-experimental-new-pass-manager -fno-unroll-loops -S -o - %s -emit-llvm | FileCheck %s -check-prefix=CHECK-OLDPM
	extern int a[16];			extern int a[16];
	int b = 0;			int b = 0;
	int foo(void) {			int foo(void) {
	#pragma unroll			#pragma unroll
	for (int i = 0; i < 16; ++i)			for (int i = 0; i < 16; ++i)
	a[i] = b += 2;			a[i] = b += 2;
	return b;			return b;
	}			}
				// Check br i1 to make sure that the loop is fully unrolled
	// CHECK-NOT: br i1			// CHECK-NOT: br i1
				chandlerc (Done): This is dead now that you have different prefixes...
				echristo (Author, Done): So it is. I'd actually missed this comment and saw it when fixing up other things. Thanks :)

				MaskRay (Done): The test seems to pass without the code change (llvm/lib/Passes/PassBuilder.cpp) below.
				inline void Helper() {
				const int *nodes[5];
				int num_active = 5;

				while (num_active) {
				#pragma clang loop unroll(full)
				for (int i = 0; i < 5; ++i) {
				if (nodes[i]) {
				--num_active;
				}
				}
				}
				}

				void Run() {
				Helper();
				}

				// Check br i1 to make sure the loop is gone, there will still be a label branch for the infinite loop.
				// CHECK-NEWPM-NOT: br i1
				chandlerc (Not Done): But for which function? I'd rework these to be more modern `CHECK-LABEL` and affirmative checks for the unrolling.

				// The old pass manager doesn't remove the loop so check for 5 load i32*.
				// CHECK-OLDPM: Helper
				// CHECK-OLDPM: load i32*
				// CHECK-OLDPM: load i32*
				// CHECK-OLDPM: load i32*
				// CHECK-OLDPM: load i32*
				// CHECK-OLDPM: load i32*

llvm/lib/Passes/PassBuilder.cpp

PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
LPM2.addPass(LoopIdiomRecognizePass());		LPM2.addPass(LoopIdiomRecognizePass());

for (auto &C : LateLoopOptimizationsEPCallbacks)		for (auto &C : LateLoopOptimizationsEPCallbacks)
C(LPM2, Level);		C(LPM2, Level);

LPM2.addPass(LoopDeletionPass());		LPM2.addPass(LoopDeletionPass());
// Do not enable unrolling in PreLinkThinLTO phase during sample PGO		// Do not enable unrolling in PreLinkThinLTO phase during sample PGO
// because it changes IR to makes profile annotation in back compile		// because it changes IR to makes profile annotation in back compile
// inaccurate.		// inaccurate. The normal unroller doesn't pay attention to forced full unroll
if ((Phase != ThinLTOPhase::PreLink || !PGOOpt ||		// attributes so we need to make sure and allow the full unroll pass to pay
PGOOpt->Action != PGOOptions::SampleUse) &&		// attention to it.
PTO.LoopUnrolling)		if (Phase != ThinLTOPhase::PreLink || !PGOOpt ||
		PGOOpt->Action != PGOOptions::SampleUse)
LPM2.addPass(LoopFullUnrollPass(Level.getSpeedupLevel(),		LPM2.addPass(LoopFullUnrollPass(Level.getSpeedupLevel(),
/*OnlyWhenForced=*/false,		/* OnlyWhenForced= */ !PTO.LoopUnrolling,
PTO.ForgetAllSCEVInLoopUnroll));		PTO.ForgetAllSCEVInLoopUnroll));

		mehdi_amini (Done): I would document that the normal unroll pass does not honor the forced unroll, it is surprising to me.
for (auto &C : LoopOptimizerEndEPCallbacks)		for (auto &C : LoopOptimizerEndEPCallbacks)
C(LPM2, Level);		C(LPM2, Level);

// We provide the opt remark emitter pass for LICM to use. We only need to do		// We provide the opt remark emitter pass for LICM to use. We only need to do
// this once as it is immutable.		// this once as it is immutable.
FPM.addPass(RequireAnalysisPass<OptimizationRemarkEmitterAnalysis, Function>());		FPM.addPass(RequireAnalysisPass<OptimizationRemarkEmitterAnalysis, Function>());
FPM.addPass(createFunctionToLoopPassAdaptor(		FPM.addPass(createFunctionToLoopPassAdaptor(
std::move(LPM1), EnableMSSALoopDependency, DebugLogging));		std::move(LPM1), EnableMSSALoopDependency, DebugLogging));

llvm/test/Transforms/LoopUnroll/FullUnroll.ll

This file was added.

				; RUN: opt -passes='default<O1>' -disable-verify --mtriple x86_64-pc-linux-gnu -new-pm-disable-loop-unrolling=true \
				; RUN: -S -o - %s | FileCheck %s

				; We don't end up deleting the loop, but we remove everything inside of it so checking for any
				; reasonable instruction from the original loop will work.
				; CHECK-NOT: br i1
				chandlerc (Done): Make sure it is in the correct function at least, and maybe after the label for the loop header?
				echristo (Author, Done): Got it. There's not a lot of function left at the end: define void @walrus() local_unnamed_addr #0 { entry: br label %for.cond.preheader for.cond.preheader: ; preds = %for.cond.preheader, %entry br label %for.cond.preheader }
				chandlerc (Done): Then switch to generated `CHECK` lines? The checks we have here will pass easily after incorrect edits that cause this test to not actually check what it intends to. I'd either check something that constructively shows unrolling or generate exact checks (if it's small enough to be genuinely stable).
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				$_Z6Helperv = comdat any

				; Function Attrs: noinline optnone uwtable
				define dso_local void @_Z3Runv() #0 {
				entry:
				call void @_Z6Helperv()
				ret void
				}
				chandlerc (Done): Are both functions needed?
				echristo (Author, Done): I thought so, but apparently not :)

				; Function Attrs: noinline nounwind optnone uwtable
				define linkonce_odr dso_local void @_Z6Helperv() #1 comdat {
				chandlerc (Done): Nit, but minimize and clean this function up a touch? At the least, removing all the target features seems valuable, and I'd give things stable names instead of numbered values.
				echristo (Author, Done): Got it. Most things have stable names, only a few temporaries have numbered names.
				chandlerc (Done): Sure, but even those will make future edits to this really frustrating with irrelevant diff, etc. And this test case can still be minimized. As an example, the function is currently attributed as optnone. =/ I think you can just run this through the metarenamer pass?
				entry:
				%nodes = alloca [5 x i32*], align 16
				%num_active = alloca i32, align 4
				%i = alloca i32, align 4
				store i32 5, i32* %num_active, align 4
				br label %while.cond

				while.cond: ; preds = %for.end, %entry
				%0 = load i32, i32* %num_active, align 4
				%tobool = icmp ne i32 %0, 0
				br i1 %tobool, label %while.body, label %while.end

				while.body: ; preds = %while.cond
				store i32 0, i32* %i, align 4
				br label %for.cond

				for.cond: ; preds = %for.inc, %while.body
				%1 = load i32, i32* %i, align 4
				%cmp = icmp slt i32 %1, 5
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%2 = load i32, i32* %i, align 4
				%idxprom = sext i32 %2 to i64
				%arrayidx = getelementptr inbounds [5 x i32*], [5 x i32*]* %nodes, i64 0, i64 %idxprom
				%3 = load i32*, i32** %arrayidx, align 8
				%tobool1 = icmp ne i32* %3, null
				br i1 %tobool1, label %if.then, label %if.end

				if.then: ; preds = %for.body
				%4 = load i32, i32* %num_active, align 4
				%dec = add nsw i32 %4, -1
				store i32 %dec, i32* %num_active, align 4
				br label %if.end

				if.end: ; preds = %if.then, %for.body
				br label %for.inc

				for.inc: ; preds = %if.end
				%5 = load i32, i32* %i, align 4
				%inc = add nsw i32 %5, 1
				store i32 %inc, i32* %i, align 4
				br label %for.cond, !llvm.loop !2

				for.end: ; preds = %for.cond
				br label %while.cond

				while.end: ; preds = %while.cond
				ret void
				}

				attributes #0 = { noinline optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { noinline nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 11.0.0 (git@github.com:llvm/llvm-project.git 3ccd454c102b069d2230a18cfe16b84a5f005fc8)"}
				!2 = distinct !{!2, !3}
				!3 = !{!"llvm.loop.unroll.full"}

llvm/tools/opt/NewPMDriver.cpp

cl::desc("A textual description of the function pass pipeline inserted at "
"the PipelineStart extension point into default pipelines"),		"the PipelineStart extension point into default pipelines"),
cl::Hidden);		cl::Hidden);
static cl::opt<std::string> OptimizerLastEPPipeline(		static cl::opt<std::string> OptimizerLastEPPipeline(
"passes-ep-optimizer-last",		"passes-ep-optimizer-last",
cl::desc("A textual description of the function pass pipeline inserted at "		cl::desc("A textual description of the function pass pipeline inserted at "
"the OptimizerLast extension point into default pipelines"),		"the OptimizerLast extension point into default pipelines"),
cl::Hidden);		cl::Hidden);

		// Individual pipeline tuning options.
		static cl::opt<bool> DisableLoopUnrolling(
		"new-pm-disable-loop-unrolling",
		cl::desc("Disable loop unrolling in all relevant passes"), cl::init(false));

extern cl::opt<PGOKind> PGOKindFlag;		extern cl::opt<PGOKind> PGOKindFlag;
extern cl::opt<std::string> ProfileFile;		extern cl::opt<std::string> ProfileFile;
extern cl::opt<CSPGOKind> CSPGOKindFlag;		extern cl::opt<CSPGOKind> CSPGOKindFlag;
extern cl::opt<std::string> CSProfileGenFile;		extern cl::opt<std::string> CSProfileGenFile;

static cl::opt<std::string>		static cl::opt<std::string>
ProfileRemappingFile("profile-remapping-file",		ProfileRemappingFile("profile-remapping-file",
cl::desc("Path to the profile remapping file."),		cl::desc("Path to the profile remapping file."),
if (CSPGOKindFlag == CSInstrGen) {
P->CSAction = PGOOptions::CSIRUse;		P->CSAction = PGOOptions::CSIRUse;
}		}
}		}
PassInstrumentationCallbacks PIC;		PassInstrumentationCallbacks PIC;
StandardInstrumentations SI;		StandardInstrumentations SI;
SI.registerCallbacks(PIC);		SI.registerCallbacks(PIC);

PipelineTuningOptions PTO;		PipelineTuningOptions PTO;
		// LoopUnrolling defaults on to true and DisableLoopUnrolling is initialized
		// to false above so we shouldn't necessarily need to check whether or not the
		// option has been enabled.
		PTO.LoopUnrolling = !DisableLoopUnrolling;
PTO.Coroutines = Coroutines;		PTO.Coroutines = Coroutines;
PassBuilder PB(TM, PTO, P, &PIC);		PassBuilder PB(TM, PTO, P, &PIC);
registerEPCallbacks(PB, VerifyEachPass, DebugPM);		registerEPCallbacks(PB, VerifyEachPass, DebugPM);

// Load requested pass plugins and let them register pass builder callbacks		// Load requested pass plugins and let them register pass builder callbacks
for (auto &PluginFN : PassPlugins) {		for (auto &PluginFN : PassPlugins) {
auto PassPlugin = PassPlugin::Load(PluginFN);		auto PassPlugin = PassPlugin::Load(PluginFN);
if (!PassPlugin) {		if (!PassPlugin) {

This is an archive of the discontinued LLVM Phabricator instance.

Fix full loop unrolling initialization in new pass manager
ClosedPublic
