This is an archive of the discontinued LLVM Phabricator instance.


Differential D88861

GC-parseable element atomic memcpy/memmove
ClosedPublic

Authored by apilipenko on Oct 5 2020, 4:39 PM.


Details

Reviewers

reames
skatkov
fedor.sergeev
yrouban
jdoerfert
DaniilSuchkov
dantrushin

Commits

rG6ec2c5e402a7: GC-parseable element atomic memcpy/memmove

Summary

This change introduces a GC parseable lowering for element atomic memcpy/memmove intrinsics. This way the runtime can provide an implementation which can take a safepoint during the copy operation. See the "GC-parseable element atomic memcpy/memmove" thread on llvm-dev for the background and details:
https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ

This lowering involves three things:

- The call is wrapped into a statepoint.
- The call is lowered to a different symbol, __llvm_{memcpy|memmove}_element_unordered_atomic_safepoint_<element_size>.
- The arguments for the call are adjusted so as to make the base pointers available in the copy function.
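The symbol selection and the argument adjustment can be sketched in plain C++. This is a minimal model under stated assumptions, not the RewriteStatepointsForGC code: pointers are modeled as plain integers, and `CopyKind`, `DerivedPtr`, `safepointSymbol` and `adjustArgs` are hypothetical names. The accepted element sizes (1, 2, 4, 8, 16) follow the switch in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the call rewrite: plain integers stand in for
// llvm::Value pointers, so this is not the RewriteStatepointsForGC code.
enum class CopyKind { Memcpy, Memmove };

// Returns __llvm_{memcpy|memmove}_element_unordered_atomic_safepoint_<N>.
std::string safepointSymbol(CopyKind Kind, uint64_t ElementSize) {
  switch (ElementSize) {
  case 1: case 2: case 4: case 8: case 16:
    break;
  default:
    // The patch treats any other element size as unreachable.
    throw std::invalid_argument("unexpected element size");
  }
  const std::string Op = Kind == CopyKind::Memcpy ? "memcpy" : "memmove";
  return "__llvm_" + Op + "_element_unordered_atomic_safepoint_" +
         std::to_string(ElementSize);
}

// A derived pointer together with the base of the object it points into.
struct DerivedPtr {
  uint64_t Base;    // start of the GC-managed object
  uint64_t Derived; // address actually passed to the intrinsic
};

// (dest, src, len) => (dest_base, dest_offset, src_base, src_offset, len)
std::vector<uint64_t> adjustArgs(DerivedPtr Dest, DerivedPtr Src,
                                 uint64_t Len) {
  return {Dest.Base, Dest.Derived - Dest.Base, Src.Base,
          Src.Derived - Src.Base, Len};
}
```

For example, a destination 0x10 bytes into an object at 0x1000 becomes the pair (0x1000, 0x10), so the callee can recompute the derived address from the (possibly relocated) base.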

To be consistent with other calls, element atomic memcpy/memmove intrinsics are treated as non-leaf by default, so the GC parseable lowering is generated by default. The old GC leaf lowering is generated only if the call is explicitly marked with the "gc-leaf-function" attribute.

This default choice, though, introduces a minor transition issue. In some systems, e.g. ours, GC safepoints are coupled with the deoptimization mechanism (this is controlled by the cl::opt rs4gc-allow-statepoint-with-no-deopt-info). In this case we can't have a statepoint without deopt information. Normally it's up to the frontend to make sure that non-leaf calls also have proper deopt state. But element atomic memcpy/memmove intrinsic calls might be generated by the optimizer, which is not aware of this coupling. If statepoints without deopt info are not allowed and we see a non-leaf memcpy/memmove without deopt state, we treat it as a leaf copy and don't produce a statepoint.
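The resulting decision for an element atomic memcpy/memmove call can be summarized as a small predicate. This is a hypothetical sketch: `CopyCallInfo` and `needsStatepoint` are illustrative names, and the `AllowNoDeopt` flag stands in for the rs4gc-allow-statepoint-with-no-deopt-info option.

```cpp
#include <cassert>

// Hypothetical summary of a memcpy/memmove call site.
struct CopyCallInfo {
  bool HasGCLeafAttr;  // call or callee marked "gc-leaf-function"
  bool HasDeoptBundle; // call carries a "deopt" operand bundle
};

// AllowNoDeopt models rs4gc-allow-statepoint-with-no-deopt-info.
bool needsStatepoint(const CopyCallInfo &Call, bool AllowNoDeopt) {
  if (Call.HasGCLeafAttr)
    return false; // explicit leaf: old GC leaf lowering
  if (!AllowNoDeopt && !Call.HasDeoptBundle)
    return false; // deopt state required but missing: treat as a leaf copy
  return true;    // GC parseable lowering: wrap the call in a statepoint
}
```

The four non-leaf combinations correspond to the expectations in unordered-atomic-memcpy-no-deopt.ll: without a deopt bundle, a statepoint is produced only when statepoints without deopt info are allowed.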

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

apilipenko created this revision. Oct 5 2020, 4:39 PM

Herald added a reviewer: jdoerfert. Oct 5 2020, 4:39 PM

Herald added a project: Restricted Project.

Herald added subscribers: dantrushin, jfb, hiraditya.

apilipenko requested review of this revision. Oct 5 2020, 4:39 PM

apilipenko mentioned this in D87954: GC-parseable element atomic memcpy/memmove, single patch.

Hi Artur, just to state this explicitly:

But element atomic memcpy/memmove intrinsic calls might be generated by the optimizer, which is not aware of this coupling. If statepoints without deopt info are not allowed and we see a non-leaf memcpy/memmove without deopt state, we treat it as a leaf copy and don't produce a statepoint

If a long-running loop of loads and stores is converted to an atomic memcpy (through loop idiom recognition, for example) and we don't produce a statepoint, the memcpy will not be GC parseable, which is erroneous. So it is the responsibility of such runtimes to also introduce some mechanism that prevents long-running loops from being converted to memcpys (for example, having safepoint requests within the loop, which prevents the conversion: https://llvm.org/docs/Statepoints.html#id27).

Currently, if a loop contains a safepoint poll, it is not converted into a memcpy/memmove. This is because the safepoint poll has read semantics, which prevents LoopIdiomRecognize from performing the transform. In theory, we could have a transform which recognizes loops with safepoints and converts them to non-leaf memcpy/memmove calls. It would be up to this new transform to figure out the legality and the interactions with the runtime requirements.

Also, note that a memcpy/memmove without the "gc-leaf-function" attribute is not required to have a safepoint. It's lowered in a way which *may* have a safepoint. This is why it's correct to lower it to a GC leaf representation and choose not to have a safepoint.
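To illustrate why the base pointers are passed and why a GC parseable copy is only *allowed* to safepoint, here is a hypothetical runtime-side sketch. `copyWithSafepoints` and the `SafepointPoll` hook are invented for illustration and do not correspond to any real runtime API; the poll simulates a moving GC by updating the base pointers, and `std::memcpy` stands in for element-wise unordered atomic stores, so element_size atomicity is not actually enforced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>

// Hypothetical runtime hook: a safepoint may run a moving GC, which is
// modeled here by updating the base pointers passed by reference.
using SafepointPoll =
    std::function<void(uint8_t *&DestBase, uint8_t *&SrcBase)>;

// Sketch of a GC parseable copy with element size 1. It copies in chunks and
// may poll between chunks; after each poll the derived addresses are
// recomputed from base + offset because the objects may have moved.
// std::memcpy stands in for element-wise unordered atomic stores.
void copyWithSafepoints(uint8_t *DestBase, size_t DestOffset,
                        uint8_t *SrcBase, size_t SrcOffset, size_t Length,
                        size_t ChunkSize, const SafepointPoll &Poll) {
  assert(ChunkSize > 0 && "chunk size must be positive");
  size_t Done = 0;
  while (Done < Length) {
    size_t N = std::min(ChunkSize, Length - Done);
    std::memcpy(DestBase + DestOffset + Done, SrcBase + SrcOffset + Done, N);
    Done += N;
    if (Done < Length)
      Poll(DestBase, SrcBase); // bases may be updated by a relocation
  }
}
```

A copy that fits into a single chunk never polls, which matches the point that a short copy operation may be performed without taking a safepoint.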

Added a note to the docs that a GC parseable copy operation is not required to take a safepoint.

apilipenko added a reviewer: DaniilSuchkov. Oct 16 2020, 9:58 AM

apilipenko added a reviewer: dantrushin. Oct 20 2020, 9:05 PM

skatkov added inline comments. Oct 21 2020, 3:47 AM

llvm/docs/LangRef.rst
20257	See see?
20336	The same link as for memcpy?
llvm/docs/Statepoints.rst
836	"This makes it is possible"?
llvm/lib/Transforms/Utils/Local.cpp
2679	I'm a bit confused here. RS4GC considers memcpy/memmove as GC leaf if there is no deopt bundle. Here we do not check for a deopt bundle. Is it ok?

Address review comments.

apilipenko added inline comments. Oct 21 2020, 11:29 AM

llvm/docs/LangRef.rst
20257	Fixed.
20336	Fixed.
llvm/docs/Statepoints.rst
836	Fixed.

apilipenko added inline comments. Oct 21 2020, 11:42 AM

llvm/lib/Transforms/Utils/Local.cpp
2679	callsGCLeafFunction returns true if the call is guaranteed to never safepoint. If a memcpy/memmove call is marked as gc-leaf, it is guaranteed to be a leaf call, i.e. it will never take a safepoint. Otherwise the call may take a safepoint, but doesn't have to. This is reflected in the documentation: "Note that a GC parseable copy operation is not required to take a safepoint. For example, a short copy operation may be performed without taking a safepoint." Interactions with deopt bundles are an implementation detail of RS4GC. For a memcpy/memmove call which may safepoint (i.e. doesn't have the gc-leaf attribute), RS4GC is allowed to generate a leaf call which will never safepoint. If the runtime requires deopt information to be associated with every safepoint (rs4gc-allow-statepoint-with-no-deopt-info=false) and RS4GC cannot satisfy this requirement for a memcpy/memmove call, it will generate a leaf call.

ok, this looks good to me. Please wait 1-2 days before landing to give a last call to others.

This revision is now accepted and ready to land. Oct 21 2020, 9:12 PM

Closed by commit rG6ec2c5e402a7: GC-parseable element atomic memcpy/memmove (authored by apilipenko). Oct 23 2020, 2:06 PM

This revision was automatically updated to reflect the committed changes.

apilipenko added a commit: rG6ec2c5e402a7: GC-parseable element atomic memcpy/memmove.

yrouban mentioned this in D100445: [RS4GC] Introduce intrinsics to get base ptr and offset. Apr 13 2021, 10:10 PM

Revision Contents

- llvm/docs/LangRef.rst (8 lines)
- llvm/docs/Statepoints.rst (44 lines)
- llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp (118 lines)
- llvm/lib/Transforms/Utils/Local.cpp (7 lines)
- llvm/test/Transforms/RewriteStatepointsForGC/unordered-atomic-memcpy-no-deopt.ll (52 lines)
- llvm/test/Transforms/RewriteStatepointsForGC/unordered-atomic-memcpy.ll (199 lines)

Diff 300404

llvm/docs/LangRef.rst


	...
	'``llvm.vscale``' Intrinsic			'``llvm.vscale``' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Syntax:			Syntax:
	"""""""			"""""""

	::			::

	declare i32 llvm.vscale.i32()			declare i32 llvm.vscale.i32()
	declare i64 llvm.vscale.i64()			declare i64 llvm.vscale.i64()

	Overview:			Overview:
	"""""""""			"""""""""

	The ``llvm.vscale`` intrinsic returns the value for ``vscale`` in scalable			The ``llvm.vscale`` intrinsic returns the value for ``vscale`` in scalable
	vectors such as ``<vscale x 16 x i8>``.			vectors such as ``<vscale x 16 x i8>``.

	...
	``element_size`` must be a compile-time constant positive power of two no greater than			``element_size`` must be a compile-time constant positive power of two no greater than
	target-specific atomic access size limit.			target-specific atomic access size limit.

	For each of the input pointers ``align`` parameter attribute must be specified. It			For each of the input pointers ``align`` parameter attribute must be specified. It
	must be a power of two no less than the ``element_size``. Caller guarantees that			must be a power of two no less than the ``element_size``. Caller guarantees that
	both the source and destination pointers are aligned to that boundary.			both the source and destination pointers are aligned to that boundary.

	Semantics:			Semantics:
	""""""""""			""""""""""

	The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of			The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of
	memory from the source location to the destination location. These locations are not			memory from the source location to the destination location. These locations are not
	allowed to overlap. The memory copy is performed as a sequence of load/store operations			allowed to overlap. The memory copy is performed as a sequence of load/store operations
	where each access is guaranteed to be a multiple of ``element_size`` bytes wide and			where each access is guaranteed to be a multiple of ``element_size`` bytes wide and
	aligned at an ``element_size`` boundary.			aligned at an ``element_size`` boundary.

	The order of the copy is unspecified. The same value may be read from the source			The order of the copy is unspecified. The same value may be read from the source
	buffer many times, but only one write is issued to the destination buffer per			buffer many times, but only one write is issued to the destination buffer per
	element. It is well defined to have concurrent reads and writes to both source and			element. It is well defined to have concurrent reads and writes to both source and
	destination provided those reads and writes are unordered atomic when specified.			destination provided those reads and writes are unordered atomic when specified.

	This intrinsic does not provide any additional ordering guarantees over those			This intrinsic does not provide any additional ordering guarantees over those
	provided by a set of unordered loads from the source location and stores to the			provided by a set of unordered loads from the source location and stores to the
	destination.			destination.

	Lowering:			Lowering:
	"""""""""			"""""""""

	In the most general case call to the '``llvm.memcpy.element.unordered.atomic.*``' is			In the most general case call to the '``llvm.memcpy.element.unordered.atomic.*``' is
	lowered to a call to the symbol ``__llvm_memcpy_element_unordered_atomic_*``. Where '*'			lowered to a call to the symbol ``__llvm_memcpy_element_unordered_atomic_*``. Where '*'
	is replaced with an actual element size.			is replaced with an actual element size. See :ref:`RewriteStatepointsForGC intrinsic
				lowering <RewriteStatepointsForGC_intrinsic_lowering>` for details on GC specific
				lowering.

	Optimizer is allowed to inline memory copy when it's profitable to do so.			Optimizer is allowed to inline memory copy when it's profitable to do so.

	'``llvm.memmove.element.unordered.atomic``' Intrinsic			'``llvm.memmove.element.unordered.atomic``' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Syntax:			Syntax:
	"""""""			"""""""
	...
	destination.			destination.

	Lowering:			Lowering:
	"""""""""			"""""""""

	In the most general case call to the			In the most general case call to the
	'``llvm.memmove.element.unordered.atomic.*``' is lowered to a call to the symbol			'``llvm.memmove.element.unordered.atomic.*``' is lowered to a call to the symbol
	``__llvm_memmove_element_unordered_atomic_*``. Where '*' is replaced with an			``__llvm_memmove_element_unordered_atomic_*``. Where '*' is replaced with an
	actual element size.			actual element size. See :ref:`RewriteStatepointsForGC intrinsic lowering
				<RewriteStatepointsForGC_intrinsic_lowering>` for details on GC specific
				lowering.

	The optimizer is allowed to inline the memory copy when it's profitable to do so.			The optimizer is allowed to inline the memory copy when it's profitable to do so.

	.. _int_memset_element_unordered_atomic:			.. _int_memset_element_unordered_atomic:

	'``llvm.memset.element.unordered.atomic``' Intrinsic			'``llvm.memset.element.unordered.atomic``' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	...

llvm/docs/Statepoints.rst

	...
	``"statepoint-id"`` and ``"statepoint-num-patch-bytes"`` attributes			``"statepoint-id"`` and ``"statepoint-num-patch-bytes"`` attributes
	are not propagated to the ``gc.statepoint`` call or invoke if they			are not propagated to the ``gc.statepoint`` call or invoke if they
	could be successfully parsed.			could be successfully parsed.

	In practice, RewriteStatepointsForGC should be run much later in the pass			In practice, RewriteStatepointsForGC should be run much later in the pass
	pipeline, after most optimization is already done. This helps to improve			pipeline, after most optimization is already done. This helps to improve
	the quality of the generated code when compiled with garbage collection support.			the quality of the generated code when compiled with garbage collection support.

				.. _RewriteStatepointsForGC_intrinsic_lowering:

				RewriteStatepointsForGC intrinsic lowering
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				As a part of lowering to the explicit model of relocations
				RewriteStatepointsForGC performs GC specific lowering for
				'``llvm.memcpy.element.unordered.atomic.*``',
				'``llvm.memmove.element.unordered.atomic.*``' intrinsics.

				There are two possible lowerings for these copy operations: GC leaf lowering
				and GC parseable lowering. If a call is explicitly marked with
				"gc-leaf-function" attribute the call is lowered to a GC leaf call to
				'``__llvm_memcpy_element_unordered_atomic_*``' or
				'``__llvm_memmove_element_unordered_atomic_*``' symbol. Such a call can not
				take a safepoint. Otherwise, the call is made GC parseable by wrapping the
				call into a statepoint. This makes it possible to take a safepoint during
				copy operation. Note that a GC parseable copy operation is not required to
				take a safepoint. For example, a short copy operation may be performed without
				taking a safepoint.

				GC parseable calls to '``llvm.memcpy.element.unordered.atomic.*``',
				'``llvm.memmove.element.unordered.atomic.*``' intrinsics are lowered to calls
				to '``__llvm_memcpy_element_unordered_atomic_safepoint_*``',
				'``__llvm_memmove_element_unordered_atomic_safepoint_*``' symbols respectively.
				This way the runtime can provide implementations of copy operations with and
				without safepoints.

				GC parseable lowering also involves adjusting the arguments for the call.
				Memcpy and memmove intrinsics take derived pointers as source and destination
				arguments. If a copy operation takes a safepoint it might need to relocate the
				underlying source and destination objects. This requires the corresponding base
				pointers to be available in the copy operation. In order to make the base
				pointers available RewriteStatepointsForGC replaces derived pointers with base
				pointer and offset pairs. For example:

				.. code-block:: llvm

				  declare void @__llvm_memcpy_element_unordered_atomic_safepoint_1(
				    i8 addrspace(1)* %dest_base, i64 %dest_offset,
				    i8 addrspace(1)* %src_base, i64 %src_offset,
				    i64 %length)


	.. _PlaceSafepoints:			.. _PlaceSafepoints:

	PlaceSafepoints			PlaceSafepoints
	^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^

	The pass PlaceSafepoints inserts safepoint polls sufficient to ensure running			The pass PlaceSafepoints inserts safepoint polls sufficient to ensure running
	code checks for a safepoint request on a timely manner. This pass is expected			code checks for a safepoint request on a timely manner. This pass is expected
	to be run before RewriteStatepointsForGC and thus does not produce full			to be run before RewriteStatepointsForGC and thus does not produce full
	...

llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp

...
if (IID == Intrinsic::experimental_deoptimize) {
// calls to @llvm.experimental.deoptimize with different argument types in		// calls to @llvm.experimental.deoptimize with different argument types in
// the same module. This is fine -- we assume the frontend knew what it		// the same module. This is fine -- we assume the frontend knew what it
// was doing when generating this kind of IR.		// was doing when generating this kind of IR.
CallTarget = F->getParent()		CallTarget = F->getParent()
->getOrInsertFunction("__llvm_deoptimize", FTy)		->getOrInsertFunction("__llvm_deoptimize", FTy)
.getCallee();		.getCallee();

IsDeoptimize = true;		IsDeoptimize = true;
		} else if (IID == Intrinsic::memcpy_element_unordered_atomic ||
		IID == Intrinsic::memmove_element_unordered_atomic) {
		// Unordered atomic memcpy and memmove intrinsics which are not explicitly
		// marked as "gc-leaf-function" should be lowered in a GC parseable way.
		// Specifically, these calls should be lowered to the
		// __llvm_{memcpy|memmove}_element_unordered_atomic_safepoint symbols.
		// Similarly to __llvm_deoptimize we want to resolve this now, since the
		// verifier does not allow taking the address of an intrinsic function.
		//
		// Moreover we need to shuffle the arguments for the call in order to
		// accommodate GC. The underlying source and destination objects might be
		// relocated during copy operation should the GC occur. To relocate the
		// derived source and destination pointers the implementation of the
		// intrinsic should know the corresponding base pointers.
		//
		// To make the base pointers available pass them explicitly as arguments:
		// memcpy(dest_derived, source_derived, ...) =>
		// memcpy(dest_base, dest_offset, source_base, source_offset, ...)
		auto &Context = Call->getContext();
		auto &DL = Call->getModule()->getDataLayout();
		auto GetBaseAndOffset = [&](Value *Derived) {
		assert(Result.PointerToBase.count(Derived));
		unsigned AddressSpace = Derived->getType()->getPointerAddressSpace();
		unsigned IntPtrSize = DL.getPointerSizeInBits(AddressSpace);
		Value *Base = Result.PointerToBase.find(Derived)->second;
		Value *Base_int = Builder.CreatePtrToInt(
		Base, Type::getIntNTy(Context, IntPtrSize));
		Value *Derived_int = Builder.CreatePtrToInt(
		Derived, Type::getIntNTy(Context, IntPtrSize));
		return std::make_pair(Base, Builder.CreateSub(Derived_int, Base_int));
		};

		auto *Dest = CallArgs[0];
		Value *DestBase, *DestOffset;
		std::tie(DestBase, DestOffset) = GetBaseAndOffset(Dest);

		auto *Source = CallArgs[1];
		Value *SourceBase, *SourceOffset;
		std::tie(SourceBase, SourceOffset) = GetBaseAndOffset(Source);

		auto *LengthInBytes = CallArgs[2];
		auto *ElementSizeCI = cast<ConstantInt>(CallArgs[3]);

		CallArgs.clear();
		CallArgs.push_back(DestBase);
		CallArgs.push_back(DestOffset);
		CallArgs.push_back(SourceBase);
		CallArgs.push_back(SourceOffset);
		CallArgs.push_back(LengthInBytes);

		SmallVector<Type *, 8> DomainTy;
		for (Value *Arg : CallArgs)
		DomainTy.push_back(Arg->getType());
		auto *FTy = FunctionType::get(Type::getVoidTy(F->getContext()), DomainTy,
		/* isVarArg = */ false);

		auto GetFunctionName = [](Intrinsic::ID IID, ConstantInt *ElementSizeCI) {
		uint64_t ElementSize = ElementSizeCI->getZExtValue();
		if (IID == Intrinsic::memcpy_element_unordered_atomic) {
		switch (ElementSize) {
		case 1:
		return "__llvm_memcpy_element_unordered_atomic_safepoint_1";
		case 2:
		return "__llvm_memcpy_element_unordered_atomic_safepoint_2";
		case 4:
		return "__llvm_memcpy_element_unordered_atomic_safepoint_4";
		case 8:
		return "__llvm_memcpy_element_unordered_atomic_safepoint_8";
		case 16:
		return "__llvm_memcpy_element_unordered_atomic_safepoint_16";
		default:
		llvm_unreachable("unexpected element size!");
		}
		}
		assert(IID == Intrinsic::memmove_element_unordered_atomic);
		switch (ElementSize) {
		case 1:
		return "__llvm_memmove_element_unordered_atomic_safepoint_1";
		case 2:
		return "__llvm_memmove_element_unordered_atomic_safepoint_2";
		case 4:
		return "__llvm_memmove_element_unordered_atomic_safepoint_4";
		case 8:
		return "__llvm_memmove_element_unordered_atomic_safepoint_8";
		case 16:
		return "__llvm_memmove_element_unordered_atomic_safepoint_16";
		default:
		llvm_unreachable("unexpected element size!");
		}
		};

		CallTarget =
		F->getParent()
		->getOrInsertFunction(GetFunctionName(IID, ElementSizeCI), FTy)
		.getCallee();
}		}
}		}

// Create the statepoint given all the arguments		// Create the statepoint given all the arguments
GCStatepointInst *Token = nullptr;		GCStatepointInst *Token = nullptr;
if (auto *CI = dyn_cast<CallInst>(Call)) {		if (auto *CI = dyn_cast<CallInst>(Call)) {
CallInst *SPCall = Builder.CreateGCStatepointCall(		CallInst *SPCall = Builder.CreateGCStatepointCall(
StatepointID, NumPatchBytes, CallTarget, Flags, CallArgs,		StatepointID, NumPatchBytes, CallTarget, Flags, CallArgs,
...
bool RewriteStatepointsForGC::runOnFunction(Function &F, DominatorTree &DT,		bool RewriteStatepointsForGC::runOnFunction(Function &F, DominatorTree &DT,
TargetTransformInfo &TTI,		TargetTransformInfo &TTI,
const TargetLibraryInfo &TLI) {		const TargetLibraryInfo &TLI) {
assert(!F.isDeclaration() && !F.empty() &&		assert(!F.isDeclaration() && !F.empty() &&
"need function body to rewrite statepoints in");		"need function body to rewrite statepoints in");
assert(shouldRewriteStatepointsIn(F) && "mismatch in rewrite decision");		assert(shouldRewriteStatepointsIn(F) && "mismatch in rewrite decision");

auto NeedsRewrite = [&TLI](Instruction &I) {		auto NeedsRewrite = [&TLI](Instruction &I) {
if (const auto *Call = dyn_cast<CallBase>(&I))		if (const auto *Call = dyn_cast<CallBase>(&I)) {
return !callsGCLeafFunction(Call, TLI) && !isa<GCStatepointInst>(Call);		if (isa<GCStatepointInst>(Call))
		return false;
		if (callsGCLeafFunction(Call, TLI))
		return false;

		// Normally it's up to the frontend to make sure that non-leaf calls also
		// have proper deopt state if it is required. We make an exception for
		// element atomic memcpy/memmove intrinsics here. Unlike other intrinsics
		// these are non-leaf by default. They might be generated by the optimizer
		// which doesn't know how to produce a proper deopt state. So if we see a
		// non-leaf memcpy/memmove without deopt state just treat it as a leaf
		// copy and don't produce a statepoint.
		if (!AllowStatepointWithNoDeoptInfo &&
		!Call->getOperandBundle(LLVMContext::OB_deopt)) {
		assert((isa<AtomicMemCpyInst>(Call) || isa<AtomicMemMoveInst>(Call)) &&
		"Don't expect any other calls here!");
		return false;
		}
		return true;
		}
return false;		return false;
};		};

// Delete any unreachable statepoints so that we don't have unrewritten		// Delete any unreachable statepoints so that we don't have unrewritten
// statepoints surviving this pass. This makes testing easier and the		// statepoints surviving this pass. This makes testing easier and the
// resulting IR less confusing to human readers.		// resulting IR less confusing to human readers.
DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy);		DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy);
bool MadeChange = removeUnreachableBlocks(F, &DTU);		bool MadeChange = removeUnreachableBlocks(F, &DTU);
...

llvm/lib/Transforms/Utils/Local.cpp

...
bool llvm::callsGCLeafFunction(const CallBase *Call,
const TargetLibraryInfo &TLI) {		const TargetLibraryInfo &TLI) {
// Check if the function is specifically marked as a gc leaf function.		// Check if the function is specifically marked as a gc leaf function.
if (Call->hasFnAttr("gc-leaf-function"))		if (Call->hasFnAttr("gc-leaf-function"))
return true;		return true;
if (const Function *F = Call->getCalledFunction()) {		if (const Function *F = Call->getCalledFunction()) {
if (F->hasFnAttribute("gc-leaf-function"))		if (F->hasFnAttribute("gc-leaf-function"))
return true;		return true;

if (auto IID = F->getIntrinsicID())		if (auto IID = F->getIntrinsicID()) {
// Most LLVM intrinsics do not take safepoints.		// Most LLVM intrinsics do not take safepoints.
return IID != Intrinsic::experimental_gc_statepoint &&		return IID != Intrinsic::experimental_gc_statepoint &&
IID != Intrinsic::experimental_deoptimize;		IID != Intrinsic::experimental_deoptimize &&
		IID != Intrinsic::memcpy_element_unordered_atomic &&
		IID != Intrinsic::memmove_element_unordered_atomic;
		}
}		}

// Lib calls can be materialized by some passes, and won't be		// Lib calls can be materialized by some passes, and won't be
// marked as 'gc-leaf-function.' All available Libcalls are		// marked as 'gc-leaf-function.' All available Libcalls are
// GC-leaf.		// GC-leaf.
LibFunc LF;		LibFunc LF;
if (TLI.getLibFunc(*Call, LF)) {		if (TLI.getLibFunc(*Call, LF)) {
return TLI.has(LF);		return TLI.has(LF);
...

llvm/test/Transforms/RewriteStatepointsForGC/unordered-atomic-memcpy-no-deopt.ll

This file was added.

				; RUN: opt -passes=rewrite-statepoints-for-gc -rs4gc-allow-statepoint-with-no-deopt-info=0 -S < %s | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-REQUIRE-DEOPT
				; RUN: opt -passes=rewrite-statepoints-for-gc -rs4gc-allow-statepoint-with-no-deopt-info=1 -S < %s | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-NO-REQUIRE-DEOPT

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.11.0"

				declare void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)*, i8 addrspace(1)*, i32, i32 immarg)
				declare void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)*, i8 addrspace(1)*, i32, i32 immarg)

				define void @test_memcpy_no_deopt(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: @test_memcpy_no_deopt
				; CHECK-REQUIRE-DEOPT-NOT: @llvm.experimental.gc.statepoint
				; CHECK-NO-REQUIRE-DEOPT: @llvm.experimental.gc.statepoint
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 1)
				ret void
				}

				define void @test_memmove_no_deopt(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: @test_memmove_no_deopt
				; CHECK-REQUIRE-DEOPT-NOT: @llvm.experimental.gc.statepoint
				; CHECK-NO-REQUIRE-DEOPT: @llvm.experimental.gc.statepoint
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 1)
				ret void
				}

				define void @test_memcpy_with_deopt(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: @test_memcpy_with_deopt
				; CHECK-REQUIRE-DEOPT: @llvm.experimental.gc.statepoint
				; CHECK-NO-REQUIRE-DEOPT: @llvm.experimental.gc.statepoint
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 1) [ "deopt"(i32 0) ]
				ret void
				}

				define void @test_memmove_with_deopt(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: @test_memmove_with_deopt
				; CHECK-REQUIRE-DEOPT: @llvm.experimental.gc.statepoint
				; CHECK-NO-REQUIRE-DEOPT: @llvm.experimental.gc.statepoint
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 1) [ "deopt"(i32 0) ]
				ret void
				}

llvm/test/Transforms/RewriteStatepointsForGC/unordered-atomic-memcpy.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature
				; Use instcombine to cleanup offset computation.
				; RUN: opt -passes=rewrite-statepoints-for-gc,instcombine -S < %s | FileCheck %s

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128-p1:64:64"
				target triple = "x86_64-apple-macosx10.11.0"

				declare void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)*, i8 addrspace(1)*, i32, i32 immarg)
				declare void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)*, i8 addrspace(1)*, i32, i32 immarg)

				define void @test_memcpy_gc_leaf_function(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: define {{[^@]+}}@test_memcpy_gc_leaf_function
				; CHECK-SAME: (i8 addrspace(1)* [[SRC:%.*]], i64 [[SRC_OFFSET:%.*]], i8 addrspace(1)* [[DEST:%.*]], i64 [[DEST_OFFSET:%.*]], i32 [[LEN:%.*]]) gc "statepoint-example" {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[SRC_DERIVED:%.*]] = getelementptr inbounds i8, i8 addrspace(1)* [[SRC]], i64 [[SRC_OFFSET]]
				; CHECK-NEXT: [[DEST_DERIVED:%.*]] = getelementptr inbounds i8, i8 addrspace(1)* [[DEST]], i64 [[DEST_OFFSET]]
				; CHECK-NEXT: call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 [[SRC_DERIVED]], i8 addrspace(1)* align 16 [[DEST_DERIVED]], i32 [[LEN]], i32 1) [[ATTR2:#.*]]
				; CHECK-NEXT: call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 [[SRC_DERIVED]], i8 addrspace(1)* align 16 [[DEST_DERIVED]], i32 [[LEN]], i32 2) [[ATTR2]]
				; CHECK-NEXT: call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 [[SRC_DERIVED]], i8 addrspace(1)* align 16 [[DEST_DERIVED]], i32 [[LEN]], i32 4) [[ATTR2]]
				; CHECK-NEXT: call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 [[SRC_DERIVED]], i8 addrspace(1)* align 16 [[DEST_DERIVED]], i32 [[LEN]], i32 8) [[ATTR2]]
				; CHECK-NEXT: call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 [[SRC_DERIVED]], i8 addrspace(1)* align 16 [[DEST_DERIVED]], i32 [[LEN]], i32 16) [[ATTR2]]
				; CHECK-NEXT: ret void
				;
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset

				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 1) "gc-leaf-function"
				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 2) "gc-leaf-function"
				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 4) "gc-leaf-function"
				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 8) "gc-leaf-function"
				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 16) "gc-leaf-function"
				ret void
				}

				define void @test_memcpy_element_atomic_1(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: define {{[^@]+}}@test_memcpy_element_atomic_1
				; CHECK-SAME: (i8 addrspace(1)* [[SRC:%.*]], i64 [[SRC_OFFSET:%.*]], i8 addrspace(1)* [[DEST:%.*]], i64 [[DEST_OFFSET:%.*]], i32 [[LEN:%.*]]) gc "statepoint-example" {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[STATEPOINT_TOKEN:%.*]] = call token (i64, i32, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i8i64p1i8i64i32f(i64 2882400000, i32 0, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)* nonnull @__llvm_memcpy_element_unordered_atomic_safepoint_1, i32 5, i32 0, i8 addrspace(1)* [[SRC]], i64 [[SRC_OFFSET]], i8 addrspace(1)* [[DEST]], i64 [[DEST_OFFSET]], i32 [[LEN]], i32 0, i32 0) [ "gc-live"() ]
				; CHECK-NEXT: ret void
				;
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 1)
				ret void
				}

				define void @test_memcpy_element_atomic_2(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: define {{[^@]+}}@test_memcpy_element_atomic_2
				; CHECK-SAME: (i8 addrspace(1)* [[SRC:%.*]], i64 [[SRC_OFFSET:%.*]], i8 addrspace(1)* [[DEST:%.*]], i64 [[DEST_OFFSET:%.*]], i32 [[LEN:%.*]]) gc "statepoint-example" {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[STATEPOINT_TOKEN:%.*]] = call token (i64, i32, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i8i64p1i8i64i32f(i64 2882400000, i32 0, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)* nonnull @__llvm_memcpy_element_unordered_atomic_safepoint_2, i32 5, i32 0, i8 addrspace(1)* [[SRC]], i64 [[SRC_OFFSET]], i8 addrspace(1)* [[DEST]], i64 [[DEST_OFFSET]], i32 [[LEN]], i32 0, i32 0) [ "gc-live"() ]
				; CHECK-NEXT: ret void
				;
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 2)
				ret void
				}

				define void @test_memcpy_element_atomic_4(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: define {{[^@]+}}@test_memcpy_element_atomic_4
				; CHECK-SAME: (i8 addrspace(1)* [[SRC:%.*]], i64 [[SRC_OFFSET:%.*]], i8 addrspace(1)* [[DEST:%.*]], i64 [[DEST_OFFSET:%.*]], i32 [[LEN:%.*]]) gc "statepoint-example" {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[STATEPOINT_TOKEN:%.*]] = call token (i64, i32, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i8i64p1i8i64i32f(i64 2882400000, i32 0, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)* nonnull @__llvm_memcpy_element_unordered_atomic_safepoint_4, i32 5, i32 0, i8 addrspace(1)* [[SRC]], i64 [[SRC_OFFSET]], i8 addrspace(1)* [[DEST]], i64 [[DEST_OFFSET]], i32 [[LEN]], i32 0, i32 0) [ "gc-live"() ]
				; CHECK-NEXT: ret void
				;
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 4)
				ret void
				}

				define void @test_memcpy_element_atomic_8(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: define {{[^@]+}}@test_memcpy_element_atomic_8
				; CHECK-SAME: (i8 addrspace(1)* [[SRC:%.*]], i64 [[SRC_OFFSET:%.*]], i8 addrspace(1)* [[DEST:%.*]], i64 [[DEST_OFFSET:%.*]], i32 [[LEN:%.*]]) gc "statepoint-example" {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[STATEPOINT_TOKEN:%.*]] = call token (i64, i32, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i8i64p1i8i64i32f(i64 2882400000, i32 0, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)* nonnull @__llvm_memcpy_element_unordered_atomic_safepoint_8, i32 5, i32 0, i8 addrspace(1)* [[SRC]], i64 [[SRC_OFFSET]], i8 addrspace(1)* [[DEST]], i64 [[DEST_OFFSET]], i32 [[LEN]], i32 0, i32 0) [ "gc-live"() ]
				; CHECK-NEXT: ret void
				;
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 8)
				ret void
				}

				define void @test_memcpy_element_atomic_16(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: define {{[^@]+}}@test_memcpy_element_atomic_16
				; CHECK-SAME: (i8 addrspace(1)* [[SRC:%.*]], i64 [[SRC_OFFSET:%.*]], i8 addrspace(1)* [[DEST:%.*]], i64 [[DEST_OFFSET:%.*]], i32 [[LEN:%.*]]) gc "statepoint-example" {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[STATEPOINT_TOKEN:%.*]] = call token (i64, i32, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i8i64p1i8i64i32f(i64 2882400000, i32 0, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)* nonnull @__llvm_memcpy_element_unordered_atomic_safepoint_16, i32 5, i32 0, i8 addrspace(1)* [[SRC]], i64 [[SRC_OFFSET]], i8 addrspace(1)* [[DEST]], i64 [[DEST_OFFSET]], i32 [[LEN]], i32 0, i32 0) [ "gc-live"() ]
				; CHECK-NEXT: ret void
				;
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 16)
				ret void
				}

				define void @test_memmove_gc_leaf_function(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: define {{[^@]+}}@test_memmove_gc_leaf_function
				; CHECK-SAME: (i8 addrspace(1)* [[SRC:%.*]], i64 [[SRC_OFFSET:%.*]], i8 addrspace(1)* [[DEST:%.*]], i64 [[DEST_OFFSET:%.*]], i32 [[LEN:%.*]]) gc "statepoint-example" {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[SRC_DERIVED:%.*]] = getelementptr inbounds i8, i8 addrspace(1)* [[SRC]], i64 [[SRC_OFFSET]]
				; CHECK-NEXT: [[DEST_DERIVED:%.*]] = getelementptr inbounds i8, i8 addrspace(1)* [[DEST]], i64 [[DEST_OFFSET]]
				; CHECK-NEXT: call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 [[SRC_DERIVED]], i8 addrspace(1)* align 16 [[DEST_DERIVED]], i32 [[LEN]], i32 1) [[ATTR2]]
				; CHECK-NEXT: call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 [[SRC_DERIVED]], i8 addrspace(1)* align 16 [[DEST_DERIVED]], i32 [[LEN]], i32 2) [[ATTR2]]
				; CHECK-NEXT: call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 [[SRC_DERIVED]], i8 addrspace(1)* align 16 [[DEST_DERIVED]], i32 [[LEN]], i32 4) [[ATTR2]]
				; CHECK-NEXT: call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 [[SRC_DERIVED]], i8 addrspace(1)* align 16 [[DEST_DERIVED]], i32 [[LEN]], i32 8) [[ATTR2]]
				; CHECK-NEXT: call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 [[SRC_DERIVED]], i8 addrspace(1)* align 16 [[DEST_DERIVED]], i32 [[LEN]], i32 16) [[ATTR2]]
				; CHECK-NEXT: ret void
				;
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset

				call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 1) "gc-leaf-function"
				call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 2) "gc-leaf-function"
				call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 4) "gc-leaf-function"
				call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 8) "gc-leaf-function"
				call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 16) "gc-leaf-function"
				ret void
				}

				define void @test_memmove_element_atomic_1(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: define {{[^@]+}}@test_memmove_element_atomic_1
				; CHECK-SAME: (i8 addrspace(1)* [[SRC:%.*]], i64 [[SRC_OFFSET:%.*]], i8 addrspace(1)* [[DEST:%.*]], i64 [[DEST_OFFSET:%.*]], i32 [[LEN:%.*]]) gc "statepoint-example" {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[STATEPOINT_TOKEN:%.*]] = call token (i64, i32, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i8i64p1i8i64i32f(i64 2882400000, i32 0, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)* nonnull @__llvm_memmove_element_unordered_atomic_safepoint_1, i32 5, i32 0, i8 addrspace(1)* [[SRC]], i64 [[SRC_OFFSET]], i8 addrspace(1)* [[DEST]], i64 [[DEST_OFFSET]], i32 [[LEN]], i32 0, i32 0) [ "gc-live"() ]
				; CHECK-NEXT: ret void
				;
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 1)
				ret void
				}

				define void @test_memmove_element_atomic_2(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: define {{[^@]+}}@test_memmove_element_atomic_2
				; CHECK-SAME: (i8 addrspace(1)* [[SRC:%.*]], i64 [[SRC_OFFSET:%.*]], i8 addrspace(1)* [[DEST:%.*]], i64 [[DEST_OFFSET:%.*]], i32 [[LEN:%.*]]) gc "statepoint-example" {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[STATEPOINT_TOKEN:%.*]] = call token (i64, i32, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i8i64p1i8i64i32f(i64 2882400000, i32 0, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)* nonnull @__llvm_memmove_element_unordered_atomic_safepoint_2, i32 5, i32 0, i8 addrspace(1)* [[SRC]], i64 [[SRC_OFFSET]], i8 addrspace(1)* [[DEST]], i64 [[DEST_OFFSET]], i32 [[LEN]], i32 0, i32 0) [ "gc-live"() ]
				; CHECK-NEXT: ret void
				;
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 2)
				ret void
				}

				define void @test_memmove_element_atomic_4(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: define {{[^@]+}}@test_memmove_element_atomic_4
				; CHECK-SAME: (i8 addrspace(1)* [[SRC:%.*]], i64 [[SRC_OFFSET:%.*]], i8 addrspace(1)* [[DEST:%.*]], i64 [[DEST_OFFSET:%.*]], i32 [[LEN:%.*]]) gc "statepoint-example" {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[STATEPOINT_TOKEN:%.*]] = call token (i64, i32, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i8i64p1i8i64i32f(i64 2882400000, i32 0, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)* nonnull @__llvm_memmove_element_unordered_atomic_safepoint_4, i32 5, i32 0, i8 addrspace(1)* [[SRC]], i64 [[SRC_OFFSET]], i8 addrspace(1)* [[DEST]], i64 [[DEST_OFFSET]], i32 [[LEN]], i32 0, i32 0) [ "gc-live"() ]
				; CHECK-NEXT: ret void
				;
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 4)
				ret void
				}

				define void @test_memmove_element_atomic_8(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: define {{[^@]+}}@test_memmove_element_atomic_8
				; CHECK-SAME: (i8 addrspace(1)* [[SRC:%.*]], i64 [[SRC_OFFSET:%.*]], i8 addrspace(1)* [[DEST:%.*]], i64 [[DEST_OFFSET:%.*]], i32 [[LEN:%.*]]) gc "statepoint-example" {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[STATEPOINT_TOKEN:%.*]] = call token (i64, i32, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i8i64p1i8i64i32f(i64 2882400000, i32 0, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)* nonnull @__llvm_memmove_element_unordered_atomic_safepoint_8, i32 5, i32 0, i8 addrspace(1)* [[SRC]], i64 [[SRC_OFFSET]], i8 addrspace(1)* [[DEST]], i64 [[DEST_OFFSET]], i32 [[LEN]], i32 0, i32 0) [ "gc-live"() ]
				; CHECK-NEXT: ret void
				;
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 8)
				ret void
				}

				define void @test_memmove_element_atomic_16(i8 addrspace(1)* %src, i64 %src_offset, i8 addrspace(1)* %dest, i64 %dest_offset, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: define {{[^@]+}}@test_memmove_element_atomic_16
				; CHECK-SAME: (i8 addrspace(1)* [[SRC:%.*]], i64 [[SRC_OFFSET:%.*]], i8 addrspace(1)* [[DEST:%.*]], i64 [[DEST_OFFSET:%.*]], i32 [[LEN:%.*]]) gc "statepoint-example" {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[STATEPOINT_TOKEN:%.*]] = call token (i64, i32, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i8i64p1i8i64i32f(i64 2882400000, i32 0, void (i8 addrspace(1)*, i64, i8 addrspace(1)*, i64, i32)* nonnull @__llvm_memmove_element_unordered_atomic_safepoint_16, i32 5, i32 0, i8 addrspace(1)* [[SRC]], i64 [[SRC_OFFSET]], i8 addrspace(1)* [[DEST]], i64 [[DEST_OFFSET]], i32 [[LEN]], i32 0, i32 0) [ "gc-live"() ]
				; CHECK-NEXT: ret void
				;
				entry:
				%src_derived = getelementptr inbounds i8, i8 addrspace(1)* %src, i64 %src_offset
				%dest_derived = getelementptr inbounds i8, i8 addrspace(1)* %dest, i64 %dest_offset
				call void @llvm.memmove.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %src_derived, i8 addrspace(1)* align 16 %dest_derived, i32 %len, i32 16)
				ret void
				}