This is an archive of the discontinued LLVM Phabricator instance.

[RFC] [Coroutines] Split Ramp Function
Abandoned · Public

Authored by ChuanqiXu on Mar 22 2021, 3:57 AM.

Details

Reviewers

lxfind
junparser
rjmccall
bruno

Summary

When a coroutine's initial_suspend is not suspend_always, the ramp function can become very large. A large ramp function is hard to inline, which in turn blocks the Coro-elide optimization. Splitting the ramp function would also normalize its form: every ramp function would look the same and perform only some simple initialization.

This patch splits the ramp function by inserting a coro.suspend before initial_suspend and a call to the resume function just after coro.end. The inserted coro.suspend keeps the ramp function small, and the inserted resume call keeps the control flow the same.
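
The effect of this transformation can be modeled without any compiler support. The sketch below is a hand-written state machine, not output of the patch: `Frame`, `resume`, `rampUnsplit`, and `rampSplit` are hypothetical names chosen for illustration. It assumes an initial_suspend that is always ready, and it suggests that a ramp which only initializes the frame and then calls the resume part runs the same steps in the same order as a ramp that contains the start of the body.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a coroutine frame; not an LLVM data structure.
struct Frame {
  int index = 0;                  // suspend point to continue from
  std::vector<std::string> trace; // order in which steps were executed
};

// Resume part, shared by both variants: continues from the saved index.
void resume(Frame &f) {
  switch (f.index) {
  case 0: // entered through the inserted (fake) suspend point
    f.trace.push_back("initial_suspend: ready, no suspension");
    f.trace.push_back("body: part 1");
    f.index = 1; // first real suspend point
    return;
  case 1:
    f.trace.push_back("body: part 2");
    f.index = 2;
    return;
  default:
    return; // coroutine already finished
  }
}

// Unsplit ramp: everything up to the first real suspend point is part of
// the ramp itself, so the ramp grows with the coroutine body.
Frame rampUnsplit() {
  Frame f;
  f.trace.push_back("ramp: allocate frame, init promise and gro");
  f.trace.push_back("initial_suspend: ready, no suspension");
  f.trace.push_back("body: part 1");
  f.index = 1;
  return f;
}

// Split ramp: only initialization stays in the ramp. The inserted suspend
// leaves index 0 in the frame, and the inserted call to resume restores
// the original control flow.
Frame rampSplit() {
  Frame f;
  f.trace.push_back("ramp: allocate frame, init promise and gro");
  f.index = 0; // models the coro.suspend inserted before initial_suspend
  resume(f);   // models the resume call inserted after coro.end
  return f;
}
```

In this model `rampSplit` stays the same size no matter how much code sits before the first real suspend point, which is the property the patch is after for inlining.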

But there are still some issues:

It seems we shouldn't do this for coroutines whose initial_suspend is suspend_always. But where should this check go: in the front end or in the middle end?
If the ramp function returns the gro through an argument, the call to resume at the end of the ramp is a tail call, which is nice. But if the gro has to be returned directly, how much does the inserted call cost?
If we compile a coroutine program with -g, the debug information for each coroutine is copied three times. The final binary then becomes very large, which is a real problem for CI/CD. I had imagined that this patch could help with that. However, the debug information is copied before the compiler attempts the split, so the size of a binary compiled with -g does not shrink with this patch.
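
For the first issue, one conceivable front-end criterion is purely type-based: skip the split when the type returned by initial_suspend is exactly suspend_always. The sketch below models that criterion as a type trait rather than as Clang code. `sketch::suspend_always`, `sketch::suspend_never`, the two promise types, and `initial_suspend_is_always` are stand-ins invented for this example; a real check would have to inspect the AST or the IR instead.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace sketch {

// Stand-ins for the standard awaitable types (assumption: only the type
// identity matters for this check, so they carry no members).
struct suspend_always {};
struct suspend_never {};

// True when Promise::initial_suspend() returns exactly suspend_always.
template <typename Promise>
struct initial_suspend_is_always
    : std::is_same<
          typename std::decay<
              decltype(std::declval<Promise &>().initial_suspend())>::type,
          suspend_always> {};

// Lazily started coroutine: the ramp is already small, nothing to split.
struct LazyPromise {
  suspend_always initial_suspend() { return {}; }
};

// Eagerly started coroutine: a candidate for splitting the ramp.
struct EagerPromise {
  suspend_never initial_suspend() { return {}; }
};

} // namespace sketch

static_assert(sketch::initial_suspend_is_always<sketch::LazyPromise>::value,
              "lazy coroutines would be skipped");
static_assert(!sketch::initial_suspend_is_always<sketch::EagerPromise>::value,
              "eager coroutines would be split");
```

A type-based test like this cannot see through a custom awaiter that always suspends, which is one argument for doing the check in the middle end instead.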

This patch isn't intended to be committed. I just want to ask for your opinions.

Event Timeline

ChuanqiXu created this revision. Mar 22 2021, 3:57 AM

Herald added subscribers: jansvoboda11, dexonsmith, dang. Mar 22 2021, 3:57 AM

ChuanqiXu requested review of this revision. Mar 22 2021, 3:57 AM

Harbormaster completed remote builds in B94960: Diff 332238. Mar 22 2021, 4:40 AM

Since D100415 also makes some structural changes to coroutines, we may only be able to review this patch after that one is accepted.

ChuanqiXu abandoned this revision. Jul 13 2021, 4:21 AM

Revision Contents

Path                                                Size
clang/include/clang/Basic/LangOptions.def           1 line
clang/include/clang/Driver/Options.td               2 lines
clang/lib/CodeGen/CGCoroutine.cpp                   36 lines
clang/lib/Driver/ToolChains/Clang.cpp               6 lines
clang/lib/Frontend/CompilerInvocation.cpp           1 line
clang/test/CodeGenCoroutines/coro-split-ramp.cpp    56 lines

Diff 332238

clang/include/clang/Basic/LangOptions.def

@@ ... @@
 LANGOPT(RTTI , 1, 1, "run-time type information")
 LANGOPT(RTTIData , 1, 1, "emit run-time type information data")
 LANGOPT(MSBitfields , 1, 0, "Microsoft-compatible structure layout")
 LANGOPT(Freestanding, 1, 0, "freestanding implementation")
 LANGOPT(NoBuiltin , 1, 0, "disable builtin functions")
 LANGOPT(NoMathBuiltin , 1, 0, "disable math builtin functions")
 LANGOPT(GNUAsm , 1, 1, "GNU-style inline assembly")
 LANGOPT(Coroutines , 1, 0, "C++20 coroutines")
+LANGOPT(SplitCoroRamp , 1, 0, "Split Ramp function for C++20 coroutines")
 LANGOPT(DllExportInlines , 1, 1, "dllexported classes dllexport inline methods")
 LANGOPT(RelaxedTemplateTemplateArgs, 1, 0, "C++17 relaxed matching of template template arguments")

 LANGOPT(DoubleSquareBracketAttributes, 1, 0, "'[[]]' attributes extension for all language standard modes")

 COMPATIBLE_LANGOPT(RecoveryAST, 1, 1, "Preserve expressions in AST when encountering errors")
 COMPATIBLE_LANGOPT(RecoveryASTType, 1, 1, "Preserve the type in recovery expressions")

clang/include/clang/Driver/Options.td

@@ ... @@ defm autolink : BoolFOption<"autolink",
   PosFlag<SetTrue>>;

 // C++ Coroutines TS
 defm coroutines_ts : BoolFOption<"coroutines-ts",
   LangOpts<"Coroutines">, Default<cpp20.KeyPath>,
   PosFlag<SetTrue, [CC1Option], "Enable support for the C++ Coroutines TS">,
   NegFlag<SetFalse>>;

+defm split_coroutine_ramp_function : OptInFFlag<"split-coroutine-ramp", "Enable split ramp function for C++ Coroutine function.">;
+
 def fembed_bitcode_EQ : Joined<["-"], "fembed-bitcode=">,
   Group<f_Group>, Flags<[NoXarchOption, CC1Option, CC1AsOption]>, MetaVarName<"<option>">,
   HelpText<"Embed LLVM bitcode (option: off, all, bitcode, marker)">,
   Values<"off,all,bitcode,marker">, NormalizedValuesScope<"CodeGenOptions">,
   NormalizedValues<["Embed_Off", "Embed_All", "Embed_Bitcode", "Embed_Marker"]>,
   MarshallingInfoEnum<CodeGenOpts<"EmbedBitcode">, "Embed_Off">;
 def fembed_bitcode : Flag<["-"], "fembed-bitcode">, Group<f_Group>,
   Alias<fembed_bitcode_EQ>, AliasArgs<["all"]>,

clang/lib/CodeGen/CGCoroutine.cpp

@@ ... @@
 // See llvm's docs/Coroutines.rst for more details.
 //
 namespace {
 struct LValueOrRValue {
   LValue LV;
   RValue RV;
 };
 }
+static void emitFakeSuspendExpression(CodeGenFunction &CGF, CGCoroData &Coro) {
+  SmallString<32> Prefix{"coro.fake.suspend"};
+  BasicBlock *ReadyBlock = CGF.createBasicBlock(Prefix + Twine(".ready"));
+  BasicBlock *SuspendBlock = CGF.createBasicBlock(Prefix + Twine(".suspend"));
+  BasicBlock *CleanupBlock = CGF.createBasicBlock(Prefix + Twine(".cleanup"));
+  CGF.EmitBlock(SuspendBlock);
+
+  auto &Builder = CGF.Builder;
+
+  llvm::Function *CoroSuspend =
+      CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_suspend);
+  auto *SuspendResult = Builder.CreateCall(
+      CoroSuspend, {llvm::ConstantTokenNone::get(CGF.getLLVMContext()),
+                    Builder.getInt1(false)});
+
+  // Create a switch capturing three possible continuations.
+  auto *Switch = Builder.CreateSwitch(SuspendResult, Coro.SuspendBB, 2);
+  Switch->addCase(Builder.getInt8(0), ReadyBlock);
+  Switch->addCase(Builder.getInt8(1), CleanupBlock);
+
+  CGF.EmitBlock(CleanupBlock);
+  CGF.EmitBlock(ReadyBlock);
+}
 static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Coro,
                                             CoroutineSuspendExpr const &S,
                                             AwaitKind Kind, AggValueSlot aggSlot,
                                             bool ignoreResult, bool forLValue) {
   auto *E = S.getCommonExpr();

   auto Binder =
       CodeGenFunction::OpaqueValueMappingData::bind(CGF, S.getOpaqueValue(), E);
@@ ... @@ CurCoro.Data->CleanupJD = getJumpDestInCurrentScope(RetBB);

     // Now we have the promise, initialize the GRO
     GroManager.EmitGroInit();

     EHStack.pushCleanup<CallCoroEnd>(EHCleanup);

     CurCoro.Data->CurrentAwaitKind = AwaitKind::Init;
     CurCoro.Data->ExceptionHandler = S.getExceptionHandler();

+    // Emit fake suspend in the front of initial suspend to reduce the size for
+    // ramp function in which case the initial suspend isn't always suspend.
+    if (getLangOpts().SplitCoroRamp)
+      emitFakeSuspendExpression(*this, *CurCoro.Data);
+
     EmitStmt(S.getInitSuspendStmt());
     CurCoro.Data->FinalJD = getJumpDestInCurrentScope(FinalBB);

     CurCoro.Data->CurrentAwaitKind = AwaitKind::Normal;

     if (CurCoro.Data->ExceptionHandler) {
       // If we generated IR to record whether an exception was thrown from
       // 'await_resume', then use that IR to determine whether the coroutine
@@ ... @@ CurCoro.Data->CleanupJD = getJumpDestInCurrentScope(RetBB);
     }
   }

   EmitBlock(RetBB);
   // Emit coro.end before getReturnStmt (and parameter destructors), since
   // resume and destroy parts of the coroutine should not include them.
   llvm::Function *CoroEnd = CGM.getIntrinsic(llvm::Intrinsic::coro_end);
   Builder.CreateCall(CoroEnd, {NullPtr, Builder.getFalse()});
+  if (getLangOpts().SplitCoroRamp) {
+    // Emit Call to resume function to keep program behavior
+    // if we try to split ramp function t
+    llvm::Function *CallToResume =
+        CGM.getIntrinsic(llvm::Intrinsic::coro_resume);
+    Builder.CreateCall(CallToResume, {CurCoro.Data->CoroBegin});
+  }

   if (Stmt *Ret = S.getReturnStmt())
     EmitStmt(Ret);
 }

 // Emit coroutine intrinsic and patch up arguments of the token type.
 RValue CodeGenFunction::EmitCoroutineIntrinsic(const CallExpr *E,
                                                unsigned int IID) {

clang/lib/Driver/ToolChains/Clang.cpp

@@ ... @@ if (TC.IsEncodeExtendedBlockSignatureDefault())
     CmdArgs.push_back("-fencode-extended-block-signature");

   if (Args.hasFlag(options::OPT_fcoroutines_ts, options::OPT_fno_coroutines_ts,
                    false) &&
       types::isCXX(InputType)) {
     CmdArgs.push_back("-fcoroutines-ts");
   }

+  if (Args.hasFlag(options::OPT_fsplit_coroutine_ramp_function,
+                   options::OPT_fno_split_coroutine_ramp_function, false) &&
+      types::isCXX(InputType)) {
+    CmdArgs.push_back("-fsplit-coroutine-ramp");
+  }
+
   Args.AddLastArg(CmdArgs, options::OPT_fdouble_square_bracket_attributes,
                   options::OPT_fno_double_square_bracket_attributes);

   // -faccess-control is default.
   if (Args.hasFlag(options::OPT_fno_access_control,
                    options::OPT_faccess_control, false))
     CmdArgs.push_back("-fno-access-control");

clang/lib/Frontend/CompilerInvocation.cpp

@@ ... @@ #undef LANG_OPTION_WITH_MARSHALLING
   // For z/OS, trigraphs are enabled by default (without regard to the above).
   Opts.Trigraphs =
       (!Opts.GNUMode && !Opts.MSVCCompat && !Opts.CPlusPlus17) || T.isOSzOS();
   Opts.Trigraphs =
       Args.hasFlag(OPT_ftrigraphs, OPT_fno_trigraphs, Opts.Trigraphs);

   Opts.Blocks = Args.hasArg(OPT_fblocks) || (Opts.OpenCL
     && Opts.OpenCLVersion == 200);
+  Opts.SplitCoroRamp = Args.hasArg(OPT_fsplit_coroutine_ramp_function);

   Opts.ConvergentFunctions = Opts.OpenCL || (Opts.CUDA && Opts.CUDAIsDevice) ||
     Opts.SYCLIsDevice ||
     Args.hasArg(OPT_fconvergent_functions);

   Opts.NoBuiltin = Args.hasArg(OPT_fno_builtin) || Opts.Freestanding;
   if (!Opts.NoBuiltin)
     getAllNoBuiltinFuncValues(Args, Opts.NoBuiltinFuncs);

clang/test/CodeGenCoroutines/coro-split-ramp.cpp

This file was added.

+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -fcoroutines-ts \
+// RUN: -fexperimental-new-pass-manager -O0 %s -fsplit-coroutine-ramp -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -fcoroutines-ts \
+// RUN: -O0 %s -fsplit-coroutine-ramp -o - | FileCheck %s
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -fcoroutines-ts \
+// RUN: -fexperimental-new-pass-manager -O0 %s -o - | FileCheck %s --check-prefix=CHECK-NOSPLIT
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -fcoroutines-ts \
+// RUN: -O0 %s -o - | FileCheck %s --check-prefix=CHECK-NOSPLIT
+namespace std {
+namespace experimental {
+
+struct handle {};
+
+struct awaitable {
+  bool await_ready() noexcept { return true; }
+  inline void await_suspend(handle) noexcept {}
+  bool await_resume() noexcept { return true; }
+};
+
+template <typename T>
+struct coroutine_handle {
+  static handle from_address(void *address) noexcept { return {}; }
+};
+
+template <typename T = void>
+struct coroutine_traits {
+  struct promise_type {
+    awaitable initial_suspend() { return {}; }
+    awaitable final_suspend() noexcept { return {}; }
+    void return_void() {}
+    T get_return_object() { return T(); }
+    void unhandled_exception() {}
+  };
+};
+} // namespace experimental
+} // namespace std
+
+// The number of indexes would add one if we try to split ramp function
+// CHECK: %_Z3foov.Frame = {{.*}}i2
+// CHECK-LABEL: @_Z3foov
+// CHECK-LABEL: AfterCoroEnd:
+// CHECK: %[[CASTING:.*]] = bitcast
+// CHECK: %[[GEP:.*]] = getelementptr inbounds { i8*, i8* }, { i8*, i8* }* %[[CASTING]], i32 0, i32 0
+// CHECK: %[[LOAD:.*]] = load i8*, i8** %[[GEP]], align
+// CHECK: %[[FUNC_POINTER:.*]] = bitcast i8* %[[LOAD]] to void (i8*)*
+// CHECK: call fastcc void %[[FUNC_POINTER]]
+// CHECK-NEXT: ret void
+
+// CHECK-NOSPLIT: %_Z3foov.Frame = {{.*}}i1
+// CHECK-NOSPLIT-LABEL: @_Z3foov
+// CHECK-NOSPLIT-LABEL: AfterCoroEnd:
+// CHECK-NOSPLIT-NOT: call fastcc void %
+// CHECK-NOSPLIT-NEXT: ret void
+
+void foo() { co_return; }