This is an archive of the discontinued LLVM Phabricator instance.

Introduce element-wise atomic memcpy and memmove intrinsics
ClosedPublic

Authored by igor-laevsky on Nov 25 2016, 7:59 AM.

Details

Summary

LLVM can do various optimizations around the standard library memcpy and memmove
intrinsics. In particular, we have the LoopIdiomRecognize and MemCpyOptimizer passes,
which do a number of interesting transformations.

Some languages, such as Java, require all of their memory accesses to be unordered
atomics. This means that we can't directly use the above-mentioned passes when
compiling for such languages.

This change is a step towards supporting these optimizations for languages with
atomicity constraints. It adds special versions of the memcpy and memmove intrinsics
which are specified to perform a predictable set of atomic memory accesses. They are
lowered to calls to library functions, and it is the target's responsibility to
implement them.

Follow-up changes will extend LoopIdiomRecognize and other passes to support the
newly added intrinsics.
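
For illustration, a call to the memcpy flavor looks roughly like this (a sketch
only; the exact intrinsic name and signature are defined by the LangRef changes
in this diff and were refined during review, and %dest, %src and %len are
placeholder values):

  ; Copy %len elements of 4 bytes each; every element is read and written
  ; atomically.
  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* %dest, i8* %src,
                                                  i64 %len, i32 4)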

Diff Detail

Repository
rL LLVM

Event Timeline

igor-laevsky retitled this revision from to Introduce element-wise atomic memcpy and memmove intrinsics.
igor-laevsky updated this object.
igor-laevsky added a subscriber: llvm-commits.
anna added a subscriber: anna. Nov 30 2016, 1:05 PM
anna added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4932 ↗(On Diff #79305)

Do we need a switch here? The main switch on the intrinsic is only triggered for memcpy and memmove (line 4901). Maybe we can have it as a ternary with just memcpy_element_atomic and memmove_element_atomic?

lib/IR/Verifier.cpp
3963 ↗(On Diff #79305)

Don't we need verification that the source and dest are of the same addrspace?

reames added a subscriber: reames. Nov 30 2016, 6:48 PM
reames added inline comments.
docs/LangRef.rst
12684 ↗(On Diff #79305)

It might be good to specify the alignment of source and destination separately.

12704 ↗(On Diff #79305)

Hm, the wording isn't quite right. As written, we could have six bytes to copy with an element size of 2 and be legally allowed to copy two three-byte chunks.

Memory copy is performed as a sequence of memory accesses where each access is an even multiple of element size and aligned at an element size boundary. (e.g. each element is accessed atomicly in source and destination buffer)

Also, add:
The order of the copy is unspecified. The same value may be read from the source buffer many times, but only one write is issued to the destination buffer per element. It is well defined to have concurrent reads and writes to both source and destination provided those reads and writes are at least unordered atomic.

This intrinsic does not provide ordering with respect to other memory accesses in adjacent code. It can be used to implement a fence-like construct.
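
(In other words -- as an illustrative sketch rather than wording from the
patch, with %src.elt and %dest.elt as hypothetical pointers to one element --
a copy with element size 4 behaves like a sequence of per-element unordered
atomic accesses:

  %v = load atomic i32, i32* %src.elt unordered, align 4
  store atomic i32 %v, i32* %dest.elt unordered, align 4

repeated once per element, in an unspecified order.)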

12711 ↗(On Diff #79305)

If we're just going to ignore the volatile argument, should we have it?

Also mention that the optimizer may inline the copy if profitable.

12714 ↗(On Diff #79305)

Not sure we really need memmove. I might start with just atomic_memcpy and see where that gets us.

include/llvm/IR/Intrinsics.td
766 ↗(On Diff #79305)

The first argument is the return type, right? Shouldn't that be void?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4906 ↗(On Diff #79305)

We'll want to inline small copies here eventually, but this is a fine starting place.

4913 ↗(On Diff #79305)

If this can be written as:

  Args.push_back(Entry(Ty, Node))

please do; the stateful code is slightly confusing.

test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll
6 ↗(On Diff #79305)

Check the argument passing as well.

igor-laevsky marked 5 inline comments as done.

Thanks for the comments! Please take a look at the updated diff.

docs/LangRef.rst
12684 ↗(On Diff #79305)

Why do you think this will be useful? It will definitely add some complexity to the possible transforms with these intrinsics.

12704 ↗(On Diff #79305)

Right. Thanks for catching this!

Memory copy is performed as a sequence of memory accesses where each access is an even multiple of element size

Not sure why it should be an even multiple? I think it's fine to copy any number of elements in a single operation; the only important thing is to never copy them partially.

This intrinsic does not provide ordering with respect to other memory accesses in adjacent code. It can be used to implement a fence-like construct.

Since there is no ordering, why can it be used as a fence-like construct?

12711 ↗(On Diff #79305)

The plan is to use volatile to disable all optimizations involving this intrinsic, much in the same way as the original memcpy does.

12714 ↗(On Diff #79305)

My thought was that since it looks very much like memcpy it would be easy to add them both together. However, you're right, and for the sake of simplifying the review process I removed memmove from this change.

include/llvm/IR/Intrinsics.td
766 ↗(On Diff #79305)

I'm not sure if there is any semantic difference between specifying an empty list as a return value and specifying a list with a single element which is llvm_void_ty. However, I see that all other intrinsic definitions describe a void return type as an empty list, so it's better to be in sync with the rest of the code.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4913 ↗(On Diff #79305)

Unfortunately TargetLowering::ArgListEntry doesn't have a suitable constructor. It's probably worth adding one, but in a different change.

4932 ↗(On Diff #79305)

Right, we don't need it. I completely removed memmove from the change, so now there is no conditional at all.

lib/IR/Verifier.cpp
3963 ↗(On Diff #79305)

I didn't add this limitation to the intrinsic definition. Since address space semantics are target-defined, we might be able to execute a memory copy from one address space to another. I think it should be the frontend's job to ensure that this is a valid operation.

reames added inline comments. Dec 1 2016, 12:26 PM
docs/LangRef.rst
12684 ↗(On Diff #79305)

I don't have a strong view here. I just vaguely remember someone wanting to do that to the memcpy intrinsic.

12704 ↗(On Diff #79305)

w.r.t. "even multiple" you're right. The even can be dropped. The key word is "multiple".

w.r.t. the fence instruction, that's spell check dropping a word. It should have been "It can NOT be used..."

12711 ↗(On Diff #79305)

Ok.

include/llvm/IR/Intrinsics.td
766 ↗(On Diff #79305)

I think I just misread the code. I thought you had three llvm_anyptr_ty arguments. I now see the third is an llvm_anyint_ty.

Specify alignment for each of the input pointers via parameter attribute instead of an explicit call argument.
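
(Illustratively, with hypothetical alignments of 4, a call would then look
like:

  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %dest,
                                                  i8* align 4 %src,
                                                  i64 %len, i32 4)

rather than carrying the alignment as an extra integer argument.)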

sanjoy requested changes to this revision. Dec 16 2016, 4:36 PM
sanjoy added a reviewer: sanjoy.
sanjoy added a subscriber: sanjoy.

I have some comments / questions on the specification

docs/LangRef.rst
12644 ↗(On Diff #79895)

Add a couple of "-"s to underline the whole heading.

12646 ↗(On Diff #80740)

I'd change the style a little bit here --

"to the standard library memory intrinsics except that they perform"

12667 ↗(On Diff #80740)

Why not just have the i64 variant?

12679 ↗(On Diff #80740)

s/atomicly/atomically/

Also, perhaps you meant "i.e." and not "e.g."?

12689 ↗(On Diff #80740)

*less than a target-specific atomic access size limit

12697 ↗(On Diff #80740)

Given that you're explicitly not passing this parameter to `__llvm_memcpy_element_atomic`, perhaps this isn't needed?

12710 ↗(On Diff #80740)

"The same value may be read from the source buffer many times" >> why do you care about this?

12710 ↗(On Diff #80740)

"The order of the copy is unspecified" -- I'd be a bit more explicit -- "The order in which the `num_elements` elements are copied is unspecified"

12716 ↗(On Diff #80740)

I'd expect this to at least participate in reordering constraints due to other memory operations. That is, I'd expect reordering

  %val = acquire_load
  atomic_memcpy(...)

to

  atomic_memcpy(...)
  %val = acquire_load

to be illegal, even if the memory locations involved in the memcpy are disjoint from the location in the load.

This revision now requires changes to proceed. Dec 16 2016, 4:36 PM
sanjoy added inline comments. Dec 16 2016, 4:40 PM
docs/LangRef.rst
12697 ↗(On Diff #80740)

I see you've already replied to this before -- it would be used as a switch to disable all optimizations.

However, what is the use case for this? I'd imagine the motivation behind emitting this intrinsic would be to enable memcpy-related optimizations and better codegen. If a user does not want optimizations, perhaps they can just emit a loop with volatile loads and stores? Or are you trying to cater to the case where a user wants good codegen (i.e. they want to call into an optimized runtime function) but does not want memcpy optimizations? In that case, why wouldn't they call `__llvm_memcpy_element_atomic` directly?

reames added inline comments. Dec 21 2016, 5:48 PM
docs/LangRef.rst
12697 ↗(On Diff #80740)

In general, we want to implement two families of optimizations:

  • target function recognition to intrinsic matching
  • lowering of intrinsics to IR for small sizes

We need the volatile flag to know that the latter isn't legal.

This looks like it's basically ready to go in; we're down to minor wordsmithing at this point.

I'd really prefer not to sign off on this myself. Igor and I work together and I arguably have a conflict of interest here. Can I get another reviewer to speak up and LGTM the patch? If no one else has taken a look in a couple of days, I will LGTM to let us move forward, but I'd really prefer not to do that.

docs/LangRef.rst
12667 ↗(On Diff #80740)

Yeah, I'd just drop the i32 version. It's redundant. I think this means that argument can just be an i64 rather than an anyint as well.

12710 ↗(On Diff #80740)

The ability to read multiple times lets you restart after inspecting the value. Not sure why we'd want this, but let's leave the possibility available.

(Maybe we can fast-path zero initialization if all the source values are zero or something?)

12716 ↗(On Diff #80740)

Hm, good point. This clearly needs rewording. Possibly:
This intrinsic does not provide any additional ordering guarantees over those provided by a set of unordered loads from the source location and stores to the destination.

igor-laevsky edited edge metadata.
igor-laevsky marked 5 inline comments as done.
igor-laevsky added inline comments.
docs/LangRef.rst
12679 ↗(On Diff #80740)

"E.g" is correct here. It is meant to describe one of the possible access patterns we can get. For example we can instead copy elements pair by pair if length is even. I replaced "e.g" with "for example" to make it more clear.

12710 ↗(On Diff #80740)

I have a feeling that the new phrase implies that more (or fewer) elements might be copied. Why do you think it's important to rephrase the current statement?

12716 ↗(On Diff #80740)

Right. The idea here is that this intrinsic behaves like an unordered memory operation. I've updated it with Philip's wording.

Have you considered, instead of specifying a single __llvm_memcpy_element_atomic library function, specifying a set (__llvm_memcpy_element_atomic_1, __llvm_memcpy_element_atomic_2, __llvm_memcpy_element_atomic_4, __llvm_memcpy_element_atomic_8, __llvm_memcpy_element_atomic_16)? It should be a bit faster, and if someone screws up, you'll get a link error rather than a runtime error.
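
(For illustration, the runtime would then provide one entry point per element
size; these prototypes are hypothetical and left to whatever runtime implements
them, with the length passed as a number of elements:

  declare void @__llvm_memcpy_element_atomic_4(i8*, i8*, i64) ; dest, src, num_elements
  declare void @__llvm_memcpy_element_atomic_8(i8*, i8*, i64) ; dest, src, num_elements

and the intrinsic's element size argument would select which one gets called.)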

It's kind of hard for me to judge how useful this is because clang doesn't use unordered loads/stores... but I can definitely see this being nice to have.

In general, we want to implement two families of optimizations:

  • target function recognition to intrinsic matching
  • lowering of intrinsics to IR for small sizes

We need the volatile flag to know that the later isn't legal.

I don't follow; what exactly is "volatile" supposed to mean here? memcpy only has a volatile bit to match C semantics for volatile structs; since you're not dealing with that legacy, you should call your bit something different to reflect how you actually expect it to be used (maybe noinline, if that's what you're after?).

igor-laevsky edited edge metadata.

Hi Eli, thanks for the comments!

I updated the change according to your suggestion to use different library functions for different element sizes. I also removed the 'isvolatile' flag. My primary motivation for it was to match the memcpy semantics, but I see now that it's not necessary.

efriedma accepted this revision. Dec 27 2016, 10:35 AM
efriedma added a reviewer: efriedma.

LGTM with one minor tweak.

Please send an email to llvmdev with a short summary of the feature and a link to this review before you merge this; there might be other people interested in this who don't read llvm-commits.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4929 ↗(On Diff #82509)

This should probably be a report_fatal_error rather than an assertion, given that it can be triggered with valid IR.

This revision was automatically updated to reflect the committed changes.

LGTM with one minor tweak.

Please send an email to llvmdev with a short summary of the feature and a link to this review before you merge this; there might be other people interested in this who don't read llvm-commits.

Thanks! I already sent such an email some time ago: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107018.html