This is an archive of the discontinued LLVM Phabricator instance.

Differential D105516

[clang][PassManager] Add -falways-mem2reg CC1 flag to run mem2reg at -O0
Accepted, Public

Authored by jrtc27 on Jul 6 2021, 3:34 PM.


Details

Reviewers

chandlerc
rjmccall
rsmith
efriedma

Summary

Standard -O0 IR and assembly can be hard to follow as, without mem2reg,
there are loads and stores to the stack everywhere that clutter things
up and make it hard to see where the actual interesting instructions are
(such as when trying to debug a crash in unoptimised code from the
disassembly). It is therefore useful to be able to force mem2reg to be
run even at -O0 to clean up a lot of those stack loads and stores. There
are also Clang CodeGen tests in the tree that explicitly run mem2reg on
the output in order to make the CHECK lines more readable, which
requires manually passing -disable-O0-optnone and piping to opt; having
a flag that supports this also makes those less clunky.

Whilst optimisation for speed's sake is not the primary purpose of this
patch, it does provide an easy and significant reduction in code size, as
you might expect: compiling Clang itself with the option enabled gives a
~12% decrease in code size on macOS/arm64, which would likely also
noticeably reduce the running time of the test suite compared with a
plain Debug build. On GNU/Linux/amd64 the decrease is less pronounced,
at about 4%, likely because many x86-64 instructions can take a memory
operand directly and so avoid the separate load or store that a
load-store architecture requires.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time     Test
220 ms   x64 windows > Flang.Driver::debug-parsing-log.f90
150 ms   x64 windows > Flang.Driver::debug-provenance.f90
20 ms    x64 windows > Flang.Evaluate::folding01.f90
0 ms     x64 windows > Flang.Evaluate::folding02.f90
0 ms     x64 windows > Flang.Evaluate::folding03.f90
(481 failed in total)

Event Timeline

jrtc27 created this revision. (Jul 6 2021, 3:34 PM)

Herald added subscribers: ormris, dexonsmith, dang and 4 others. (Jul 6 2021, 3:34 PM)

jrtc27 requested review of this revision. (Jul 6 2021, 3:34 PM)

Herald added projects: Restricted Project, Restricted Project. (Jul 6 2021, 3:34 PM)

Herald added subscribers: llvm-commits, cfe-commits.

Harbormaster completed remote builds in B112695: Diff 356828. (Jul 6 2021, 5:04 PM)

arichardson added a subscriber: arichardson. (Jul 7 2021, 9:27 AM)

Ping

I think it would be better to focus on making -O1 more usable for this sort of purpose, rather than introduce -O0.5. I mean, there's a lot of wiggle-room between -O0 and -O2, but I don't think it makes sense to add a driver option that promises to run exactly one optimization.

This is not meant to be an -O0.5, this is meant to be an -Oepsilon. I don't want optimised code, I just want code that I can actually disassemble and understand without having to trawl through a mess of stack spills and loads. This is for debugging really basic bugs (either compiler or bad C/C++ input) that turn up even at -O0 and that you don't want optimisations for because it's just going to complicate things, or even make the bugs disappear.

This is also so that the myriad of %clang_cc1 -disable-O0-optnone | opt -S -mem2reg seen in clang/test can become %clang_cc1 -falways-mem2reg, as the current way to write those tests is really clunky.

The part I'm most uncomfortable with is sticking "mem2reg" in a public, documented driver option. I don't want to promise that the mem2reg pass will exist forever. We should be focused on making sure the options we add are stable, and compose effectively, not just being convenient for some specific use.

I'd be less concerned if it were just a -cc1 option; if it's for our internal use, and we can throw it away if we come up with a better solution, this seems okay.

In D105516#2889411, @efriedma wrote:

> The part I'm most uncomfortable with is sticking "mem2reg" in a public, documented driver option. I don't want to promise that the mem2reg pass will exist forever. We should be focused on making sure the options we add are stable, and compose effectively, not just being convenient for some specific use.
>
> I'd be less concerned if it were just a -cc1 option; if it's for our internal use, and we can throw it away if we come up with a better solution, this seems okay.

I'd be ok with having it just be a -cc1 option (I didn't even actually add a driver test for the non-cc1 form...). I also thought about doing something like -falways-regalloc to not tie it to the pass name, but names like that are misleading since machine register allocation does still happen, just not on things that it doesn't know could be promoted from memory to registers.

Matt added a subscriber: Matt. (Jul 20 2021, 6:43 AM)

I agree with Eli: we should decide what the goals are here and then use those goals to decide if we can identify a desirable permanent feature and, if so, what the appropriate name for that feature is.

It sounds like your goal is to get readable assembly that still corresponds fairly literally to your original code, in the sense that the readability of -O0 assembly is often undermined by the sheer amount of code and all the extra, unnecessary work it seems to do. However, I've found that a lot of the extra -O0 code is not actually from loads and stores to local variables, it's from the -O0 instruction selection and register allocation, which often combine to do very silly things. Have you looked into whether you get more readable code by just running normal -O0 IR through the non-O0 codegen pipeline? Because the problem with doing just mem2reg is that that's already a fairly major non-literal change to the code, and at some point it's tricky to say what exactly should be part of this new pipeline; whereas still emitting exactly what the abstract machine says to do, just with less nonsense from fast-isel, is a lot easier to define.

lkail added a subscriber: lkail. (Jul 23 2021, 3:37 AM)

Now only a CC1 option

Seems fine.

This revision is now accepted and ready to land. (Jul 27 2021, 4:09 PM)

I like this since the flag significantly improves the readability of update_cc_test_checks.py-generated Clang tests without having to use the -disable-O0-optnone | opt trick. Not sure what the best flag name is, but as long as it's a CC1 flag it shouldn't really matter.

In D105516#2892512, @rjmccall wrote:

> I agree with Eli: we should decide what the goals are here and then use those goals to decide if we can identify a desirable permanent feature and, if so, what the appropriate name for that feature is.
>
> It sounds like your goal is to get readable assembly that still corresponds fairly literally to your original code, in the sense that the readability of -O0 assembly is often undermined by the sheer amount of code and all the extra, unnecessary work it seems to do. However, I've found that a lot of the extra -O0 code is not actually from loads and stores to local variables, it's from the -O0 instruction selection and register allocation, which often combine to do very silly things. Have you looked into whether you get more readable code by just running normal -O0 IR through the non-O0 codegen pipeline? Because the problem with doing just mem2reg is that that's already a fairly major non-literal change to the code, and at some point it's tricky to say what exactly should be part of this new pipeline; whereas still emitting exactly what the abstract machine says to do, just with less nonsense from fast-isel, is a lot easier to define.

Well, I'm dealing with RISC-V which doesn't have FastISel. And no, running -O0 IR through a different pipeline is never going to work well without mem2reg (and you can't mem2reg -O0 IR without -disable-O0-optnone), so you need at least this to work well. Whether or not there's a useful load of additional stuff you could do, maybe, but I do think this patch in and of itself makes a huge difference, and the limited scope of it can be beneficial.

jrtc27 retitled this revision from [clang][PassManager] Add -falways-mem2reg to run mem2reg at -O0 to [clang][PassManager] Add -falways-mem2reg CC1 flag to run mem2reg at -O0. (Jul 27 2021, 4:11 PM)

jrtc27 edited the summary of this revision.

Harbormaster completed remote builds in B116565: Diff 362214. (Jul 27 2021, 6:02 PM)

@rjmccall Ping?

Revision Contents

Path                                                     Size
clang/include/clang/Basic/CodeGenOptions.def             1 line
clang/include/clang/Driver/Options.td                    4 lines
clang/lib/CodeGen/BackendUtil.cpp                        2 lines
clang/lib/CodeGen/CodeGenModule.cpp                      2 lines
clang/test/CodeGen/falways-mem2reg.c                     33 lines
llvm/include/llvm/Passes/PassBuilder.h                   4 lines
llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h    1 line
llvm/lib/Passes/PassBuilder.cpp                          4 lines
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp           6 lines

Diff 362214

clang/include/clang/Basic/CodeGenOptions.def

@@ ... @@
 CODEGENOPT(TimePasses , 1, 0) ///< Set when -ftime-report or -ftime-report= is enabled.
 CODEGENOPT(TimePassesPerRun , 1, 0) ///< Set when -ftime-report=per-pass-run is enabled.
 CODEGENOPT(TimeTrace , 1, 0) ///< Set when -ftime-trace is enabled.
 VALUE_CODEGENOPT(TimeTraceGranularity, 32, 500) ///< Minimum time granularity (in microseconds),
                                                 ///< traced by time profiler
 CODEGENOPT(UnrollLoops , 1, 0) ///< Control whether loops are unrolled.
 CODEGENOPT(RerollLoops , 1, 0) ///< Control whether loops are rerolled.
 CODEGENOPT(NoUseJumpTables , 1, 0) ///< Set when -fno-jump-tables is enabled.
+CODEGENOPT(AlwaysMem2Reg , 1, 0) ///< Set when -falways-mem2reg is enabled.
 CODEGENOPT(UnwindTables , 1, 0) ///< Emit unwind tables.
 CODEGENOPT(VectorizeLoop , 1, 0) ///< Run loop vectorizer.
 CODEGENOPT(VectorizeSLP , 1, 0) ///< Run SLP vectorizer.
 CODEGENOPT(ProfileSampleAccurate, 1, 0) ///< Sample profile is accurate.

 /// Treat loops as finite: language, always, never.
 ENUM_CODEGENOPT(FiniteLoops, FiniteLoopsKind, 2, FiniteLoopsKind::Language)

clang/include/clang/Driver/Options.td


@@ ... @@ def mrelocation_model : Separate<["-"], "mrelocation-model">,
   NormalizedValuesScope<"llvm::Reloc">,
   NormalizedValues<["Static", "PIC_", "ROPI", "RWPI", "ROPI_RWPI", "DynamicNoPIC"]>,
   MarshallingInfoEnum<CodeGenOpts<"RelocationModel">, "PIC_">;
 def fno_math_builtin : Flag<["-"], "fno-math-builtin">,
   HelpText<"Disable implicit builtin knowledge of math functions">,
   MarshallingInfoFlag<LangOpts<"NoMathBuiltin">>;
 def fuse_ctor_homing: Flag<["-"], "fuse-ctor-homing">,
   HelpText<"Use constructor homing if we are using limited debug info already">;
+defm falways_mem2reg : BoolFOption<"always-mem2reg",
+  CodeGenOpts<"AlwaysMem2Reg">, DefaultFalse,
+  PosFlag<SetTrue, [], "Always run mem2reg regardless of optimisation level">,
+  NegFlag<SetFalse, [], "Run mem2reg based on optimisation level">>;
 }

 def disable_llvm_verifier : Flag<["-"], "disable-llvm-verifier">,
   HelpText<"Don't run the LLVM IR verifier pass">,
   MarshallingInfoNegativeFlag<CodeGenOpts<"VerifyModule">>;
 def disable_llvm_passes : Flag<["-"], "disable-llvm-passes">,
   HelpText<"Use together with -emit-llvm to get pristine LLVM IR from the "
            "frontend by not running any LLVM passes at all">,
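The defm above declares a -falways-mem2reg / -fno-always-mem2reg pair that defaults to off, and the new falways-mem2reg.c test passes both spellings. A minimal sketch of how such a pair is resolved, assuming the usual "last one wins" behaviour of Clang's boolean options (as provided by llvm::opt::ArgList::hasFlag); resolveBoolFlag is a hypothetical helper, not the TableGen-generated marshalling code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: resolve a "-f<name>" / "-fno-<name>" pair by letting
// the last occurrence on the command line win, falling back to a default.
bool resolveBoolFlag(const std::vector<std::string>& args,
                     const std::string& name, bool defaultValue) {
  const std::string pos = "-f" + name;    // e.g. -falways-mem2reg
  const std::string neg = "-fno-" + name; // e.g. -fno-always-mem2reg
  bool value = defaultValue;              // DefaultFalse in the defm above
  for (const std::string& arg : args) {
    if (arg == pos)
      value = true;
    else if (arg == neg)
      value = false;
  }
  return value;
}
```

For example, "-falways-mem2reg -fno-always-mem2reg" leaves the option off, while the reverse order turns it on.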

clang/lib/CodeGen/BackendUtil.cpp

@@ ... @@ void EmitAssemblyHelper::CreatePasses(legacy::PassManager &MPM,
   PMBuilder.SizeLevel = CodeGenOpts.OptimizeSize;
   PMBuilder.SLPVectorize = CodeGenOpts.VectorizeSLP;
   PMBuilder.LoopVectorize = CodeGenOpts.VectorizeLoop;
   // Only enable CGProfilePass when using integrated assembler, since
   // non-integrated assemblers don't recognize .cgprofile section.
   PMBuilder.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;

   PMBuilder.DisableUnrollLoops = !CodeGenOpts.UnrollLoops;
+  PMBuilder.AlwaysMem2Reg = CodeGenOpts.AlwaysMem2Reg;
   // Loop interleaving in the loop vectorizer has historically been set to be
   // enabled when loop unrolling is enabled.
   PMBuilder.LoopsInterleaved = CodeGenOpts.UnrollLoops;
   PMBuilder.MergeFunctions = CodeGenOpts.MergeFunctions;
   PMBuilder.PrepareForThinLTO = CodeGenOpts.PrepareForThinLTO;
   PMBuilder.PrepareForLTO = CodeGenOpts.PrepareForLTO;
   PMBuilder.RerollLoops = CodeGenOpts.RerollLoops;

@@ ... @@ void EmitAssemblyHelper::EmitAssemblyWithNewPassManager(

   PipelineTuningOptions PTO;
   PTO.LoopUnrolling = CodeGenOpts.UnrollLoops;
   // For historical reasons, loop interleaving is set to mirror setting for loop
   // unrolling.
   PTO.LoopInterleaving = CodeGenOpts.UnrollLoops;
   PTO.LoopVectorization = CodeGenOpts.VectorizeLoop;
   PTO.SLPVectorization = CodeGenOpts.VectorizeSLP;
+  PTO.AlwaysMem2Reg = CodeGenOpts.AlwaysMem2Reg;
   PTO.MergeFunctions = CodeGenOpts.MergeFunctions;
   // Only enable CGProfilePass when using integrated assembler, since
   // non-integrated assemblers don't recognize .cgprofile section.
   PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
   PTO.Coroutines = LangOpts.Coroutines;

   LoopAnalysisManager LAM;
   FunctionAnalysisManager FAM;

clang/lib/CodeGen/CodeGenModule.cpp

@@ ... @@ if (!D) {
     F->addAttributes(llvm::AttributeList::FunctionIndex, B);
     return;
   }

   // Track whether we need to add the optnone LLVM attribute,
   // starting with the default for this optimization level.
   bool ShouldAddOptNone =
       !CodeGenOpts.DisableO0ImplyOptNone && CodeGenOpts.OptimizationLevel == 0;
+  // -falways-mem2reg implies at least a minimal amount of optimisation.
+  ShouldAddOptNone &= !CodeGenOpts.AlwaysMem2Reg;
   // We can't add optnone in the following cases, it won't pass the verifier.
   ShouldAddOptNone &= !D->hasAttr<MinSizeAttr>();
   ShouldAddOptNone &= !D->hasAttr<AlwaysInlineAttr>();

   // Add optnone, but do so only if the function isn't always_inline.
   if ((ShouldAddOptNone || D->hasAttr<OptimizeNoneAttr>()) &&
       !F->hasFnAttribute(llvm::Attribute::AlwaysInline)) {
     B.addAttribute(llvm::Attribute::OptimizeNone);
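The effect of the CodeGenModule.cpp change can be modelled as a pure boolean function. In this sketch OptNoneInputs and shouldAddOptNoneAttr are hypothetical: they flatten the CodeGenOptions fields and the declaration/IR attribute queries used in that hunk into plain bools. It shows that -falways-mem2reg suppresses the implicit -O0 optnone (which, as noted in the discussion, otherwise prevents mem2reg from touching -O0 IR), while an explicit __attribute__((optnone)) still produces optnone.

```cpp
#include <cassert>

// Hypothetical flattening of the inputs used in the CodeGenModule.cpp hunk.
struct OptNoneInputs {
  unsigned OptimizationLevel;  // CodeGenOpts.OptimizationLevel
  bool DisableO0ImplyOptNone;  // CodeGenOpts.DisableO0ImplyOptNone
  bool AlwaysMem2Reg;          // CodeGenOpts.AlwaysMem2Reg (new in this patch)
  bool DeclHasMinSize;         // D->hasAttr<MinSizeAttr>()
  bool DeclHasAlwaysInline;    // D->hasAttr<AlwaysInlineAttr>()
  bool DeclHasOptimizeNone;    // D->hasAttr<OptimizeNoneAttr>()
  bool IRHasAlwaysInline;      // F->hasFnAttribute(llvm::Attribute::AlwaysInline)
};

// Same boolean logic as the patched code: returns true when the
// OptimizeNone attribute would be added to the function.
bool shouldAddOptNoneAttr(const OptNoneInputs& in) {
  bool ShouldAddOptNone =
      !in.DisableO0ImplyOptNone && in.OptimizationLevel == 0;
  ShouldAddOptNone = ShouldAddOptNone && !in.AlwaysMem2Reg;
  ShouldAddOptNone = ShouldAddOptNone && !in.DeclHasMinSize;
  ShouldAddOptNone = ShouldAddOptNone && !in.DeclHasAlwaysInline;
  return (ShouldAddOptNone || in.DeclHasOptimizeNone) && !in.IRHasAlwaysInline;
}
```

With everything else default, plain -O0 yields optnone, -O0 with -falways-mem2reg does not, and a function explicitly marked optnone keeps it either way.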

clang/test/CodeGen/falways-mem2reg.c

This file was added.

// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
// RUN: %clang_cc1 -triple riscv64 -emit-llvm -o - -flegacy-pass-manager -O0 %s \
// RUN:   | FileCheck --check-prefix=O0-NO-MEM2REG %s
// RUN: %clang_cc1 -triple riscv64 -emit-llvm -o - -fno-legacy-pass-manager -O0 %s \
// RUN:   | FileCheck --check-prefix=O0-NO-MEM2REG %s
// RUN: %clang_cc1 -triple riscv64 -emit-llvm -o - -flegacy-pass-manager -O0 -fno-always-mem2reg %s \
// RUN:   | FileCheck --check-prefix=O0-NO-MEM2REG %s
// RUN: %clang_cc1 -triple riscv64 -emit-llvm -o - -fno-legacy-pass-manager -O0 -fno-always-mem2reg %s \
// RUN:   | FileCheck --check-prefix=O0-NO-MEM2REG %s
// RUN: %clang_cc1 -triple riscv64 -emit-llvm -o - -flegacy-pass-manager -O0 -falways-mem2reg %s \
// RUN:   | FileCheck --check-prefix=O0-MEM2REG %s
// RUN: %clang_cc1 -triple riscv64 -emit-llvm -o - -fno-legacy-pass-manager -O0 -falways-mem2reg %s \
// RUN:   | FileCheck --check-prefix=O0-MEM2REG %s

// O0-NO-MEM2REG-LABEL: @add(
// O0-NO-MEM2REG-NEXT:  entry:
// O0-NO-MEM2REG-NEXT:    [[A_ADDR:%.*]] = alloca i32, align 4
// O0-NO-MEM2REG-NEXT:    [[B_ADDR:%.*]] = alloca i32, align 4
// O0-NO-MEM2REG-NEXT:    store i32 [[A:%.*]], i32* [[A_ADDR]], align 4
// O0-NO-MEM2REG-NEXT:    store i32 [[B:%.*]], i32* [[B_ADDR]], align 4
// O0-NO-MEM2REG-NEXT:    [[TMP0:%.*]] = load i32, i32* [[A_ADDR]], align 4
// O0-NO-MEM2REG-NEXT:    [[TMP1:%.*]] = load i32, i32* [[B_ADDR]], align 4
// O0-NO-MEM2REG-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP0]], [[TMP1]]
// O0-NO-MEM2REG-NEXT:    ret i32 [[ADD]]
//
// O0-MEM2REG-LABEL: @add(
// O0-MEM2REG-NEXT:  entry:
// O0-MEM2REG-NEXT:    [[ADD:%.*]] = add nsw i32 [[A:%.*]], [[B:%.*]]
// O0-MEM2REG-NEXT:    ret i32 [[ADD]]
//
int add(int a, int b) {
  return a + b;
}
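To make concrete what the O0-MEM2REG checks expect, here is a deliberately tiny model of mem2reg applied to @add. Everything in it (Op, Inst, promoteSingleBlock) is hypothetical and unrelated to LLVM's PromotePass API. It only handles a single straight-line block whose allocas are used solely by loads and stores, so no phi nodes are needed; the real pass also handles control flow by inserting phi nodes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical mini-IR for illustration only; this is not LLVM's API.
enum class Op { Alloca, Store, Load, Add, Ret };

struct Inst {
  Op op;
  std::string result;             // value defined here ("" if none)
  std::vector<std::string> args;  // Store: {value, slot}; Load: {slot};
                                  // Add: {lhs, rhs}; Ret: {value}
};

// Single-block mem2reg: every load is replaced by the value most recently
// stored to its slot, and the allocas, stores and loads are deleted.
// Assumes every slot is an alloca whose address is only used by loads and
// stores in this one block.
std::vector<Inst> promoteSingleBlock(const std::vector<Inst>& block) {
  std::map<std::string, std::string> current;  // slot -> value last stored
  std::map<std::string, std::string> replaced; // load result -> its value
  auto resolve = [&replaced](const std::string& v) {
    auto it = replaced.find(v);
    return it == replaced.end() ? v : it->second;
  };
  std::vector<Inst> out;
  for (const Inst& inst : block) {
    switch (inst.op) {
    case Op::Alloca:
      current[inst.result] = "undef";  // a load before any store reads undef
      break;
    case Op::Store:
      current.at(inst.args[1]) = resolve(inst.args[0]);
      break;
    case Op::Load:
      replaced[inst.result] = current.at(inst.args[0]);
      break;
    default: {  // keep the instruction, rewriting operands that were loads
      Inst kept = inst;
      for (std::string& arg : kept.args)
        arg = resolve(arg);
      out.push_back(kept);
      break;
    }
    }
  }
  return out;
}
```

Fed the eight-instruction O0-NO-MEM2REG body of @add (two allocas, two stores, two loads, the add and the ret), this leaves only the add of %a and %b followed by the ret, which is the shape the O0-MEM2REG checks describe.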

llvm/include/llvm/Passes/PassBuilder.h

@@ ... @@ public:

   /// Tuning option to enable/disable loop unrolling. Its default value is true.
   bool LoopUnrolling;

   /// Tuning option to forget all SCEV loops in LoopUnroll. Its default value
   /// is that of the flag: `-forget-scev-loop-unroll`.
   bool ForgetAllSCEVInLoopUnroll;

+  /// Tuning option to always run mem2reg regardless of the optimisation level.
+  /// Its default value is false.
+  bool AlwaysMem2Reg;
+
   /// Tuning option to enable/disable coroutine intrinsic lowering. Its default
   /// value is false. Frontends such as Clang may enable this conditionally. For
   /// example, Clang enables this option if the flags `-std=c++2a` or above, or
   /// `-fcoroutines-ts`, have been specified.
   bool Coroutines;

   /// Tuning option to cap the number of calls to retrive clobbering accesses in
   /// MemorySSA, in LICM.

llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h

@@ ... @@ public:

   /// The module summary index to use for importing information to the
   /// thin LTO backends, for example for the CFI and devirtualization type
   /// tests.
   const ModuleSummaryIndex *ImportSummary = nullptr;

   bool DisableTailCalls;
   bool DisableUnrollLoops;
+  bool AlwaysMem2Reg;
   bool CallGraphProfile;
   bool SLPVectorize;
   bool LoopVectorize;
   bool LoopsInterleaved;
   bool RerollLoops;
   bool NewGVN;
   bool DisableGVNLoadPRE;
   bool ForgetAllSCEVInLoopUnroll;

llvm/lib/Passes/PassBuilder.cpp

@@ ... @@ static cl::opt<bool> EnableO3NonTrivialUnswitching(
     cl::ZeroOrMore, cl::desc("Enable non-trivial loop unswitching for -O3"));

 PipelineTuningOptions::PipelineTuningOptions() {
   LoopInterleaving = true;
   LoopVectorization = true;
   SLPVectorization = false;
   LoopUnrolling = true;
   ForgetAllSCEVInLoopUnroll = ForgetSCEVInLoopUnroll;
+  AlwaysMem2Reg = false;
   Coroutines = false;
   LicmMssaOptCap = SetLicmMssaOptCap;
   LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap;
   CallGraphProfile = true;
   MergeFunctions = false;
 }

 namespace llvm {
@@ ... @@ ModulePassManager PassBuilder::buildO0DefaultPipeline(OptimizationLevel Level,
   // which is just that always inlining occurs. Further, disable generating
   // lifetime intrinsics to avoid enabling further optimizations during
   // code generation.
   // However, we need to insert lifetime intrinsics to avoid invalid access
   // caused by multithreaded coroutines.
   MPM.addPass(AlwaysInlinerPass(
       /*InsertLifetimeIntrinsics=*/PTO.Coroutines));

+  if (PTO.AlwaysMem2Reg)
+    MPM.addPass(createModuleToFunctionPassAdaptor(PromotePass()));
+
   if (PTO.MergeFunctions)
     MPM.addPass(MergeFunctionsPass());

   if (EnableMatrix)
     MPM.addPass(
         createModuleToFunctionPassAdaptor(LowerMatrixIntrinsicsPass(true)));

   if (!CGSCCOptimizerLateEPCallbacks.empty()) {

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

@@ ... @@
 } // namespace llvm

 PassManagerBuilder::PassManagerBuilder() {
   OptLevel = 2;
   SizeLevel = 0;
   LibraryInfo = nullptr;
   Inliner = nullptr;
   DisableUnrollLoops = false;
+  AlwaysMem2Reg = false;
   SLPVectorize = false;
   LoopVectorize = true;
   LoopsInterleaved = true;
   RerollLoops = RunLoopRerolling;
   NewGVN = RunNewGVN;
   LicmMssaOptCap = SetLicmMssaOptCap;
   LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap;
   DisableGVNLoadPRE = false;
@@ ... @@ if (!PGOSampleUse.empty()) {
     if (!(FlattenedProfileUsed && PerformThinLTO))
       MPM.add(createSampleProfileLoaderPass(PGOSampleUse));
   }

   // Allow forcing function attributes as a debugging and tuning aid.
   MPM.add(createForceFunctionAttrsLegacyPass());

   // If all optimizations are disabled, just run the always-inline pass and,
-  // if enabled, the function merging pass.
+  // if enabled, the mem2reg and function merging passes.
   if (OptLevel == 0) {
+    if (AlwaysMem2Reg)
+      MPM.add(createPromoteMemoryToRegisterPass());
+
     addPGOInstrPasses(MPM);
     if (Inliner) {
       MPM.add(Inliner);
       Inliner = nullptr;
     }

     // FIXME: The BarrierNoopPass is a HACK! The inliner pass above implicitly
     // creates a CGSCC pass manager, but we don't want to add extensions into
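The PassBuilder.cpp and PassManagerBuilder.cpp hunks place mem2reg differently relative to inlining: the new pass manager adds PromotePass after AlwaysInlinerPass, while the legacy builder adds it before addPGOInstrPasses and the inliner. This sketch models only the -O0 passes visible in those two hunks as ordered lists of names. The functions and the "pgo-instr" label are hypothetical, and the real pipelines contain more passes. One plausible consequence, not verified here, is that under the legacy pass manager allocas brought in by inlined always_inline callees would not be promoted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the -O0 passes visible in the PassBuilder.cpp hunk
// (new pass manager); names stand in for the real pass objects.
std::vector<std::string> newPMO0Passes(bool alwaysMem2Reg, bool mergeFunctions) {
  std::vector<std::string> passes{"always-inline"};  // AlwaysInlinerPass
  if (alwaysMem2Reg)
    passes.push_back("mem2reg");                     // PromotePass
  if (mergeFunctions)
    passes.push_back("mergefunc");                   // MergeFunctionsPass
  return passes;
}

// Hypothetical model of the OptLevel == 0 branch in the PassManagerBuilder.cpp
// hunk (legacy pass manager), including the pass added just before it.
std::vector<std::string> legacyO0Passes(bool alwaysMem2Reg, bool hasInliner) {
  std::vector<std::string> passes{"forceattrs"};     // ForceFunctionAttrs
  if (alwaysMem2Reg)
    passes.push_back("mem2reg");                     // PromoteMemoryToRegister
  passes.push_back("pgo-instr");  // placeholder for addPGOInstrPasses(MPM)
  if (hasInliner)
    passes.push_back("inliner");                     // PMBuilder.Inliner
  return passes;
}
```

With -falways-mem2reg enabled, mem2reg sits at index 1 in both lists, but it follows inlining only in the new pass manager's ordering.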
