This is an archive of the discontinued LLVM Phabricator instance.

Add element atomic memset intrinsic
ClosedPublic

Authored by dneilson on Jun 30 2017, 7:39 AM.

Details

Reviewers

eli.friedman
reames
mkazantsev
skatkov

Commits

rG965613ef1b07: Add element atomic memset intrinsic
rL307854: Add element atomic memset intrinsic

Summary

Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memset intrinsic. This intrinsic is essentially memset with the implementation requirement that all stores used for the assignment are done with unordered-atomic stores of a given element size.

Diff Detail

Repository: rL LLVM

Event Timeline

dneilson created this revision. Jun 30 2017, 7:39 AM

dneilson added a child revision: D34883: Check for MemIntrinsic::isElementAtomic() in InstCombine.

Codewise, this looks fine and clearly implements the semantics you document. However, I'm wondering if those are the most useful semantics. Once we answer the design question below, if we still want to go in this direction, the patch LGTM.

This can be used to represent any pattern memset currently can. In particular, it does handle larger elements with repeating bytewise values. It does not fully cover the Arrays.fill case, though, because a value such as "i32 15" can't be represented. I could see an argument in favour of handling that case entirely separately or all of it in a single family of intrinsics. What do you think?

In D34885#796833, @reames wrote:

Codewise, this looks fine and clearly implements the semantics you document. However, I'm wondering if those are the most useful semantics. Once we answer the design question below, if we still want to go in this direction, the patch LGTM.

This can be used to represent any pattern memset currently can. In particular, it does handle larger elements with repeating bytewise values. It does not fully cover the Arrays.fill case, though, because a value such as "i32 15" can't be represented. I could see an argument in favour of handling that case entirely separately or all of it in a single family of intrinsics. What do you think?

I'd had the same thought... that case feels more like a special case of memset_pattern to me, as it matches the semantics of that function exactly. So, I think that case would be better handled as a separate thing -- memset_pattern is a lib function instead of an intrinsic right now.

I'm also a little skeptical this is a good idea; the "obvious" intrinsic to provide is one where the size of the value is equal to the atomic element size. The only reason to prefer this version is to slightly reduce the amount of work required to port existing transforms, and that doesn't seem compelling.

In D34885#797072, @efriedma wrote:

I'm also a little skeptical this is a good idea; the "obvious" intrinsic to provide is one where the size of the value is equal to the atomic element size. The only reason to prefer this version is to slightly reduce the amount of work required to port existing transforms, and that doesn't seem compelling.

I don't like the idea of having an element atomic memset intrinsic that doesn't follow the semantics of memset -- memset specifically sets every byte of a block of memory to the same value. It does not have super wide usefulness; it's basically just useful for initialization-type stuff like zeroing out blocks of memory. An element atomic version of memset, defined as in this patch, allows us to use memset in IR sourced from, say, Java in the same ways that it would be used in IR sourced from C or Fortran. I don't think that the limited applicability of memset should be a strike against it.

If we want to be able to turn code like "for (int i=0; i<N; i++) a[i] = <some 32-bit integer constant>;" into an intrinsic call that can be optimized & reasoned about, then I think that's a different intrinsic from memset. In fact, it's memset_pattern; but, memset_pattern currently only exists as a libcall in LLVM.
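
To make the distinction concrete: a byte-wise memset can only reproduce a wider value whose bytes are all identical. The following C++ sketch checks that property for a 32-bit constant. The helper name `isByteSplat32` is made up for illustration and is not part of LLVM.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns true when every byte of `v`
// is identical, i.e. when a byte-wise memset using that byte reproduces the
// 32-bit value exactly.
bool isByteSplat32(std::uint32_t v) {
  const std::uint32_t low = v & 0xFFu;
  // Multiplying by 0x01010101 replicates the low byte into all four bytes.
  return v == low * 0x01010101u;
}
```

Under this check, 0 and 0x0F0F0F0F qualify, while a value such as 15 does not, which is why that fill pattern would need something like memset_pattern rather than memset.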

Philip/Eli -- what are your thoughts? Should we try proposing adding memset_pattern as an LLVM intrinsic?

There's a lot of history going into this.

The memset API isn't really very good; memset is enshrined as an intrinsic mostly because it's available as part of the C library (and therefore well-optimized for most targets). This saves us a bunch of work, and it's good enough for the most common case of initializing memory with an all-zero pattern.

The problem with adding a memset_pattern intrinsic is that we can't really support it well on all targets, at least not without a bunch of implementation work. If we're going to emit a call, we need an implementation somewhere, which means we need to add one to compiler-rt, and probably write target-specific code to optimize it. And even if we did have a good implementation for a bunch of targets in compiler-rt, people trying to use clang with libgcc or a freestanding target would complain about not having that implementation available. So we have the current solution, where we don't have an intrinsic, and depend on vectorization/unrolling to generate reasonably fast code for initialization which can't be lowered to memset.

If you're writing a new API which isn't tied to the history of memset, the same interface doesn't really make sense; the only possible advantage is sharing code in the optimizer. But maybe that's what we're stuck with; the potential advantages aren't important compared to the burden of implementing/maintaining an intrinsic with a different API.

On a side-note, it's sort of a problem for the other unordered-atomic intrinsics that we currently don't have an implementation available in compiler-rt... but we can let it slide for now because most languages don't use unordered atomics anyway.

Daniel and I ended up talking about this one a decent amount offline with one of our other non-LLVM compiler folks. Our conclusion was that the more generic fill pattern isn't really any more common in Java than it is in C/C++. The overwhelmingly most important pattern we want to catch is just zero-initialization of a block of memory (i.e. atomic bzero), which is covered by the current proposal.

My take here is that we should move forward with the current design. If we decide we want to generalize at some point we can, but there's no reason to block the addition of the element-wise atomic code on that generalization decision. It looks like we're probably not going to want to generalize this in the near future, given the implementation work Eli described and the low perceived profitability.

Eli's point about not having a compiler-rt implementation for these is sound and something I hadn't previously considered. We should at minimum file a bug for that. :) It might be reasonable to provide a simple implementation just to make sure the whole system works end to end as well.
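
As a rough illustration of what such a simple implementation might look like, here is a sketch for the 4-byte element size. It rests on several assumptions and is not compiler-rt code: the function name is hypothetical, it relies on the GCC/Clang `__atomic_store_n` builtin, and it uses relaxed stores on the assumption that they are at least as strong as LLVM's unordered ordering. The argument order (destination, byte value, length in bytes) follows the libcall arguments built in SelectionDAGBuilder.cpp in this patch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reference routine; the name is illustrative and is not the
// real runtime symbol.
// Preconditions (taken from the documented intrinsic): `dest` is 4-byte
// aligned and `len` is a multiple of 4.
void sketchMemsetElementAtomic4(void *dest, std::uint8_t value,
                                std::size_t len) {
  // Replicate the byte into all four bytes of one element.
  const std::uint32_t splat = 0x01010101u * value;
  auto *elems = static_cast<std::uint32_t *>(dest);
  for (std::size_t i = 0; i < len / 4; ++i) {
    // Exactly one store per element. Relaxed ordering is an assumption here:
    // C++ has no direct equivalent of LLVM's unordered ordering.
    __atomic_store_n(&elems[i], splat, __ATOMIC_RELAXED);
  }
}
```

Calling it on a `std::uint32_t` array with a length of 8 bytes would overwrite the first two elements and leave the rest untouched. The other element sizes would follow the same shape with a different element type and splat constant.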

Unless anyone objects to this, I intend to LGTM this and move forward with the original approach. Any concerns?

In D34885#802663, @reames wrote:

Unless anyone objects to this, I intend to LGTM this and move forward with the original approach. Any concerns?

LGTM

Daniel, please do file the compiler-rt bug for the issue mentioned by Eli.

This revision is now accepted and ready to land. Jul 11 2017, 2:26 PM

rebase
add entries to WebAssembly runtime lib lists due to change in RTLIB::Libcall enum.

Herald added subscribers: aheejin, jgravelle-google, sbc100 and 2 others. Jul 12 2017, 2:27 PM

Closed by commit rL307854: Add element atomic memset intrinsic (authored by dneilson). Jul 12 2017, 2:57 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: Size
llvm/trunk/docs/LangRef.rst: 75 lines
llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h: 12 lines
llvm/trunk/include/llvm/IR/IntrinsicInst.h: 80 lines
llvm/trunk/include/llvm/IR/Intrinsics.td: 4 lines
llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp: 39 lines
llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp: 27 lines
llvm/trunk/lib/IR/Verifier.cpp: 33 lines
llvm/trunk/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp: 11 lines
llvm/trunk/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll: 61 lines
llvm/trunk/test/Verifier/element-wise-atomic-memory-intrinsics.ll: 18 lines

Diff 106321

llvm/trunk/docs/LangRef.rst

 """"""""""

 The '``llvm.memmove.*``' intrinsics copy a block of memory from the
 source location to the destination location, which may overlap. It
 copies "len" bytes of memory over. If the argument is known to be
 aligned to some boundary, this can be specified as the fourth argument,
 otherwise it should be set to 0 or 1 (both meaning no alignment).

+.. _int_memset:
+
 '``llvm.memset.*``' Intrinsics
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Syntax:
 """""""

 This is an overloaded intrinsic. You can use llvm.memset on any integer
 bit width and for different address spaces. However, not all targets
[...]
 """""""""

 In the most general case call to the
 '``llvm.memmove.element.unordered.atomic.*``' is lowered to a call to the symbol
 ``__llvm_memmove_element_unordered_atomic_*``. Where '*' is replaced with an
 actual element size.

 The optimizer is allowed to inline the memory copy when it's profitable to do so.

+.. _int_memset_element_unordered_atomic:
+
+'``llvm.memset.element.unordered.atomic``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.memset.element.unordered.atomic`` on
+any integer bit width and for different address spaces. Not all targets
+support all bit widths however.
+
+::
+
+      declare void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* <dest>,
+                                                                  i8 <value>,
+                                                                  i32 <len>,
+                                                                  i32 <element_size>)
+      declare void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* <dest>,
+                                                                  i8 <value>,
+                                                                  i64 <len>,
+                                                                  i32 <element_size>)
+
+Overview:
+"""""""""
+
+The '``llvm.memset.element.unordered.atomic.*``' intrinsic is a specialization of the
+'``llvm.memset.*``' intrinsic. It differs in that the ``dest`` is treated as an array
+with elements that are exactly ``element_size`` bytes, and the assignment to that array
+uses a sequence of :ref:`unordered atomic <ordering>` store operations
+that are a positive integer multiple of the ``element_size`` in size.
+
+Arguments:
+""""""""""
+
+The first three arguments are the same as they are in the :ref:`@llvm.memset <int_memset>`
+intrinsic, with the added constraint that ``len`` is required to be a positive integer
+multiple of the ``element_size``. If ``len`` is not a positive integer multiple of
+``element_size``, then the behaviour of the intrinsic is undefined.
+
+``element_size`` must be a compile-time constant positive power of two no greater than
+target-specific atomic access size limit.
+
+The ``dest`` input pointer must have the ``align`` parameter attribute specified. It
+must be a power of two no less than the ``element_size``. Caller guarantees that
+the destination pointer is aligned to that boundary.
+
+Semantics:
+""""""""""
+
+The '``llvm.memset.element.unordered.atomic.*``' intrinsic sets the ``len`` bytes of
+memory starting at the destination location to the given ``value``. The memory is
+set with a sequence of store operations where each access is guaranteed to be a
+multiple of ``element_size`` bytes wide and aligned at an ``element_size`` boundary.
+
+The order of the assignment is unspecified. Only one write is issued to the
+destination buffer per element. It is well defined to have concurrent reads and
+writes to the destination provided those reads and writes are unordered atomic
+when specified.
+
+This intrinsic does not provide any additional ordering guarantees over those
+provided by a set of unordered stores to the destination.
+
+Lowering:
+"""""""""
+
+In the most general case call to the '``llvm.memset.element.unordered.atomic.*``' is
+lowered to a call to the symbol ``__llvm_memset_element_unordered_atomic_*``. Where '*'
+is replaced with an actual element size.
+
+The optimizer is allowed to inline the memory assignment when it's profitable to do so.
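
The lowering rule documented above can be pictured as a mapping from element size to symbol name. The C++ sketch below is only an illustration of that rule: the function name `memsetElementAtomicSymbol` is hypothetical, and the set of supported sizes (1, 2, 4, 8, 16) is taken from the `RTLIB::Libcall` entries added in this patch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not an LLVM API): returns the libcall symbol for a
// supported element size, or an empty string for sizes that have no libcall,
// which corresponds to the UNKNOWN_LIBCALL case in the patch.
std::string memsetElementAtomicSymbol(std::uint64_t elementSize) {
  switch (elementSize) {
  case 1:
  case 2:
  case 4:
  case 8:
  case 16:
    return "__llvm_memset_element_unordered_atomic_" +
           std::to_string(elementSize);
  default:
    return std::string();
  }
}
```

For an element size of 4 this yields `__llvm_memset_element_unordered_atomic_4`, matching the names registered in TargetLoweringBase.cpp by this patch.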

llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h

[...] enum Libcall {
     MEMCPY_ELEMENT_UNORDERED_ATOMIC_16,

     MEMMOVE_ELEMENT_UNORDERED_ATOMIC_1,
     MEMMOVE_ELEMENT_UNORDERED_ATOMIC_2,
     MEMMOVE_ELEMENT_UNORDERED_ATOMIC_4,
     MEMMOVE_ELEMENT_UNORDERED_ATOMIC_8,
     MEMMOVE_ELEMENT_UNORDERED_ATOMIC_16,

+    MEMSET_ELEMENT_UNORDERED_ATOMIC_1,
+    MEMSET_ELEMENT_UNORDERED_ATOMIC_2,
+    MEMSET_ELEMENT_UNORDERED_ATOMIC_4,
+    MEMSET_ELEMENT_UNORDERED_ATOMIC_8,
+    MEMSET_ELEMENT_UNORDERED_ATOMIC_16,
+
     // EXCEPTION HANDLING
     UNWIND_RESUME,

     // Note: there's two sets of atomics libcalls; see
     // <http://llvm.org/docs/Atomics.html> for more info on the
     // difference between them.

     // Atomic '__sync_*' libcalls.
[...] namespace RTLIB {
   /// MEMCPY_ELEMENT_UNORDERED_ATOMIC_* value for the given element size or
   /// UNKNOW_LIBCALL if there is none.
   Libcall getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize);

   /// getMEMMOVE_ELEMENT_UNORDERED_ATOMIC - Return
   /// MEMMOVE_ELEMENT_UNORDERED_ATOMIC_* value for the given element size or
   /// UNKNOW_LIBCALL if there is none.
   Libcall getMEMMOVE_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize);

+  /// getMEMSET_ELEMENT_UNORDERED_ATOMIC - Return
+  /// MEMSET_ELEMENT_UNORDERED_ATOMIC_* value for the given element size or
+  /// UNKNOW_LIBCALL if there is none.
+  Libcall getMEMSET_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize);
+
 }
 }

 #endif

llvm/trunk/include/llvm/IR/IntrinsicInst.h

[...] public:
     static inline bool classof(const IntrinsicInst *I) {
       return I->getIntrinsicID() == Intrinsic::memmove_element_unordered_atomic;
     }
     static inline bool classof(const Value *V) {
       return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
     }
   };

+  /// This class represents atomic memset intrinsic
+  /// TODO: Integrate this class into MemIntrinsic hierarchy; for now this is
+  /// C&P of all methods from that hierarchy
+  class ElementUnorderedAtomicMemSetInst : public IntrinsicInst {
+  private:
+    enum { ARG_DEST = 0, ARG_VALUE = 1, ARG_LENGTH = 2, ARG_ELEMENTSIZE = 3 };
+
+  public:
+    Value *getRawDest() const {
+      return const_cast<Value *>(getArgOperand(ARG_DEST));
+    }
+    const Use &getRawDestUse() const { return getArgOperandUse(ARG_DEST); }
+    Use &getRawDestUse() { return getArgOperandUse(ARG_DEST); }
+
+    Value *getValue() const { return const_cast<Value *>(getArgOperand(ARG_VALUE)); }
+    const Use &getValueUse() const { return getArgOperandUse(ARG_VALUE); }
+    Use &getValueUse() { return getArgOperandUse(ARG_VALUE); }
+
+    Value *getLength() const {
+      return const_cast<Value *>(getArgOperand(ARG_LENGTH));
+    }
+    const Use &getLengthUse() const { return getArgOperandUse(ARG_LENGTH); }
+    Use &getLengthUse() { return getArgOperandUse(ARG_LENGTH); }
+
+    bool isVolatile() const { return false; }
+
+    Value *getRawElementSizeInBytes() const {
+      return const_cast<Value *>(getArgOperand(ARG_ELEMENTSIZE));
+    }
+
+    ConstantInt *getElementSizeInBytesCst() const {
+      return cast<ConstantInt>(getRawElementSizeInBytes());
+    }
+
+    uint32_t getElementSizeInBytes() const {
+      return getElementSizeInBytesCst()->getZExtValue();
+    }
+
+    /// This is just like getRawDest, but it strips off any cast
+    /// instructions that feed it, giving the original input. The returned
+    /// value is guaranteed to be a pointer.
+    Value *getDest() const { return getRawDest()->stripPointerCasts(); }
+
+    unsigned getDestAddressSpace() const {
+      return cast<PointerType>(getRawDest()->getType())->getAddressSpace();
+    }
+
+    /// Set the specified arguments of the instruction.
+    void setDest(Value *Ptr) {
+      assert(getRawDest()->getType() == Ptr->getType() &&
+             "setDest called with pointer of wrong type!");
+      setArgOperand(ARG_DEST, Ptr);
+    }
+
+    void setValue(Value *Val) {
+      assert(getValue()->getType() == Val->getType() &&
+             "setValue called with value of wrong type!");
+      setArgOperand(ARG_VALUE, Val);
+    }
+
+    void setLength(Value *L) {
+      assert(getLength()->getType() == L->getType() &&
+             "setLength called with value of wrong type!");
+      setArgOperand(ARG_LENGTH, L);
+    }
+
+    void setElementSizeInBytes(Constant *V) {
+      assert(V->getType() == Type::getInt8Ty(getContext()) &&
+             "setElementSizeInBytes called with value of wrong type!");
+      setArgOperand(ARG_ELEMENTSIZE, V);
+    }
+
+    static inline bool classof(const IntrinsicInst *I) {
+      return I->getIntrinsicID() == Intrinsic::memset_element_unordered_atomic;
+    }
+    static inline bool classof(const Value *V) {
+      return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
+    }
+  };
+
   /// This is the common base class for memset/memcpy/memmove.
   class MemIntrinsic : public IntrinsicInst {
   public:
     Value *getRawDest() const { return const_cast<Value *>(getArgOperand(0)); }
     const Use &getRawDestUse() const { return getArgOperandUse(0); }
     Use &getRawDestUse() { return getArgOperandUse(0); }

     Value *getLength() const { return const_cast<Value *>(getArgOperand(2)); }

llvm/trunk/include/llvm/IR/Intrinsics.td

[...] : Intrinsic<[],
               [
                 llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty, llvm_i32_ty
               ],
               [
                 IntrArgMemOnly, NoCapture<0>, NoCapture<1>, WriteOnly<0>,
                 ReadOnly<1>
               ]>;

+// @llvm.memset.element.unordered.atomic.*(dest, value, length, elementsize)
+def int_memset_element_unordered_atomic
+    : Intrinsic<[], [ llvm_anyptr_ty, llvm_i8_ty, llvm_anyint_ty, llvm_i32_ty ],
+                [ IntrArgMemOnly, NoCapture<0>, WriteOnly<0> ]>;

 //===------------------------ Reduction Intrinsics ------------------------===//
 //
 def int_experimental_vector_reduce_fadd : Intrinsic<[llvm_anyfloat_ty],
                                                     [llvm_anyfloat_ty,
                                                      llvm_anyvector_ty],
                                                     [IntrNoMem]>;
 def int_experimental_vector_reduce_fmul : Intrinsic<[llvm_anyfloat_ty],

llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

[...] CLI.setDebugLoc(sdl).setChain(getRoot()).setLibCallee(
         DAG.getExternalSymbol(TLI.getLibcallName(LibraryCall),
                               TLI.getPointerTy(DAG.getDataLayout())),
         std::move(Args));

     std::pair<SDValue, SDValue> CallResult = TLI.LowerCallTo(CLI);
     DAG.setRoot(CallResult.second);
     return nullptr;
   }
+  case Intrinsic::memset_element_unordered_atomic: {
+    auto &MI = cast<ElementUnorderedAtomicMemSetInst>(I);
+    SDValue Dst = getValue(MI.getRawDest());
+    SDValue Val = getValue(MI.getValue());
+    SDValue Length = getValue(MI.getLength());
+
+    // Emit a library call.
+    TargetLowering::ArgListTy Args;
+    TargetLowering::ArgListEntry Entry;
+    Entry.Ty = DAG.getDataLayout().getIntPtrType(*DAG.getContext());
+    Entry.Node = Dst;
+    Args.push_back(Entry);
+
+    Entry.Ty = Type::getInt8Ty(*DAG.getContext());
+    Entry.Node = Val;
+    Args.push_back(Entry);
+
+    Entry.Ty = MI.getLength()->getType();
+    Entry.Node = Length;
+    Args.push_back(Entry);
+
+    uint64_t ElementSizeConstant = MI.getElementSizeInBytes();
+    RTLIB::Libcall LibraryCall =
+        RTLIB::getMEMSET_ELEMENT_UNORDERED_ATOMIC(ElementSizeConstant);
+    if (LibraryCall == RTLIB::UNKNOWN_LIBCALL)
+      report_fatal_error("Unsupported element size");
+
+    TargetLowering::CallLoweringInfo CLI(DAG);
+    CLI.setDebugLoc(sdl).setChain(getRoot()).setLibCallee(
+        TLI.getLibcallCallingConv(LibraryCall),
+        Type::getVoidTy(*DAG.getContext()),
+        DAG.getExternalSymbol(TLI.getLibcallName(LibraryCall),
+                              TLI.getPointerTy(DAG.getDataLayout())),
+        std::move(Args));
+
+    std::pair<SDValue, SDValue> CallResult = TLI.LowerCallTo(CLI);
+    DAG.setRoot(CallResult.second);
+    return nullptr;
+  }
   case Intrinsic::dbg_declare: {
     const DbgDeclareInst &DI = cast<DbgDeclareInst>(I);
     DILocalVariable *Variable = DI.getVariable();
     DIExpression *Expression = DI.getExpression();
     const Value *Address = DI.getAddress();
     assert(Variable && "Missing variable");
     if (!Address) {
       DEBUG(dbgs() << "Dropping debug info for " << DI << "\n");

llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp

[...] static void InitLibcallNames(const char **Names, const Triple &TT) {
   Names[RTLIB::MEMMOVE_ELEMENT_UNORDERED_ATOMIC_2] =
       "__llvm_memmove_element_unordered_atomic_2";
   Names[RTLIB::MEMMOVE_ELEMENT_UNORDERED_ATOMIC_4] =
       "__llvm_memmove_element_unordered_atomic_4";
   Names[RTLIB::MEMMOVE_ELEMENT_UNORDERED_ATOMIC_8] =
       "__llvm_memmove_element_unordered_atomic_8";
   Names[RTLIB::MEMMOVE_ELEMENT_UNORDERED_ATOMIC_16] =
       "__llvm_memmove_element_unordered_atomic_16";
+  Names[RTLIB::MEMSET_ELEMENT_UNORDERED_ATOMIC_1] =
+      "__llvm_memset_element_unordered_atomic_1";
+  Names[RTLIB::MEMSET_ELEMENT_UNORDERED_ATOMIC_2] =
+      "__llvm_memset_element_unordered_atomic_2";
+  Names[RTLIB::MEMSET_ELEMENT_UNORDERED_ATOMIC_4] =
+      "__llvm_memset_element_unordered_atomic_4";
+  Names[RTLIB::MEMSET_ELEMENT_UNORDERED_ATOMIC_8] =
+      "__llvm_memset_element_unordered_atomic_8";
+  Names[RTLIB::MEMSET_ELEMENT_UNORDERED_ATOMIC_16] =
+      "__llvm_memset_element_unordered_atomic_16";
   Names[RTLIB::UNWIND_RESUME] = "_Unwind_Resume";
   Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_1] = "__sync_val_compare_and_swap_1";
   Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_2] = "__sync_val_compare_and_swap_2";
   Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_4] = "__sync_val_compare_and_swap_4";
   Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_8] = "__sync_val_compare_and_swap_8";
   Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_16] = "__sync_val_compare_and_swap_16";
   Names[RTLIB::SYNC_LOCK_TEST_AND_SET_1] = "__sync_lock_test_and_set_1";
   Names[RTLIB::SYNC_LOCK_TEST_AND_SET_2] = "__sync_lock_test_and_set_2";
[...] case 8:
     return MEMMOVE_ELEMENT_UNORDERED_ATOMIC_8;
   case 16:
     return MEMMOVE_ELEMENT_UNORDERED_ATOMIC_16;
   default:
     return UNKNOWN_LIBCALL;
   }
 }

+RTLIB::Libcall RTLIB::getMEMSET_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize) {
+  switch (ElementSize) {
+  case 1:
+    return MEMSET_ELEMENT_UNORDERED_ATOMIC_1;
+  case 2:
+    return MEMSET_ELEMENT_UNORDERED_ATOMIC_2;
+  case 4:
+    return MEMSET_ELEMENT_UNORDERED_ATOMIC_4;
+  case 8:
+    return MEMSET_ELEMENT_UNORDERED_ATOMIC_8;
+  case 16:
+    return MEMSET_ELEMENT_UNORDERED_ATOMIC_16;
+  default:
+    return UNKNOWN_LIBCALL;
+  }
+}
+
 /// InitCmpLibcallCCs - Set default comparison libcall CC.
 ///
 static void InitCmpLibcallCCs(ISD::CondCode *CCs) {
   memset(CCs, ISD::SETCC_INVALID, sizeof(ISD::CondCode)*RTLIB::UNKNOWN_LIBCALL);
   CCs[RTLIB::OEQ_F32] = ISD::SETEQ;
   CCs[RTLIB::OEQ_F64] = ISD::SETEQ;
   CCs[RTLIB::OEQ_F128] = ISD::SETEQ;
   CCs[RTLIB::OEQ_PPCF128] = ISD::SETEQ;

llvm/trunk/lib/IR/Verifier.cpp

[...] auto IsValidAlignment = [&](uint64_t Alignment) {
       return isPowerOf2_64(Alignment) && ElementSizeVal.ule(Alignment);
     };
     uint64_t DstAlignment = CS.getParamAlignment(0),
              SrcAlignment = CS.getParamAlignment(1);
     Assert(IsValidAlignment(DstAlignment),
            "incorrect alignment of the destination argument", CS);
     Assert(IsValidAlignment(SrcAlignment),
            "incorrect alignment of the source argument", CS);
+    break;
+  }
+  case Intrinsic::memset_element_unordered_atomic: {
+    auto *MI = cast<ElementUnorderedAtomicMemSetInst>(CS.getInstruction());
+
+    ConstantInt *ElementSizeCI =
+        dyn_cast<ConstantInt>(MI->getRawElementSizeInBytes());
+    Assert(ElementSizeCI,
+           "element size of the element-wise unordered atomic memory "
+           "intrinsic must be a constant int",
+           CS);
+    const APInt &ElementSizeVal = ElementSizeCI->getValue();
+    Assert(ElementSizeVal.isPowerOf2(),
+           "element size of the element-wise atomic memory intrinsic "
+           "must be a power of 2",
+           CS);
+
+    if (auto *LengthCI = dyn_cast<ConstantInt>(MI->getLength())) {
+      uint64_t Length = LengthCI->getZExtValue();
+      uint64_t ElementSize = MI->getElementSizeInBytes();
+      Assert((Length % ElementSize) == 0,
+             "constant length must be a multiple of the element size in the "
+             "element-wise atomic memory intrinsic",
+             CS);
+    }
+
+    auto IsValidAlignment = [&](uint64_t Alignment) {
+      return isPowerOf2_64(Alignment) && ElementSizeVal.ule(Alignment);
+    };
+    uint64_t DstAlignment = CS.getParamAlignment(0);
+    Assert(IsValidAlignment(DstAlignment),
+           "incorrect alignment of the destination argument", CS);
     break;
   }
   case Intrinsic::gcroot:
   case Intrinsic::gcwrite:
   case Intrinsic::gcread:
     if (ID == Intrinsic::gcroot) {
       AllocaInst *AI =
           dyn_cast<AllocaInst>(CS.getArgOperand(0)->stripPointerCasts());
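
The verifier checks added above can be restated as a standalone predicate, which may be easier to read outside the `Assert` macros. This is a sketch with hypothetical names (`isPowerOfTwo`, `isValidElementAtomicMemset`) and plain integers in place of `APInt`/`ConstantInt`. It assumes the length is a known constant, whereas the verifier only checks the length when it is one, and it treats a missing `align` attribute as alignment 0.

```cpp
#include <cstdint>

// Hypothetical helper: true for 1, 2, 4, 8, ... and false for 0.
bool isPowerOfTwo(std::uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Hypothetical restatement of the three verifier conditions in this patch:
//  1. the element size is a power of two,
//  2. the (constant) length is a multiple of the element size,
//  3. the destination alignment is a power of two no less than the
//     element size.
bool isValidElementAtomicMemset(std::uint64_t len, std::uint64_t elementSize,
                                std::uint64_t destAlign) {
  if (!isPowerOfTwo(elementSize))
    return false;
  if (len % elementSize != 0)
    return false;
  return isPowerOfTwo(destAlign) && elementSize <= destAlign;
}
```

For example, a 16-byte fill with 4-byte elements and 4-byte alignment passes, while a 10-byte fill with the same element size fails the length check. Like the verifier code, this sketch accepts a length of zero.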

llvm/trunk/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp

	Show First 20 Lines • Show All 399 Lines • ▼ Show 20 Lines
	/* MEMCPY_ELEMENT_UNORDERED_ATOMIC_8 */ unsupported,			/* MEMCPY_ELEMENT_UNORDERED_ATOMIC_8 */ unsupported,
	/* MEMCPY_ELEMENT_UNORDERED_ATOMIC_16 */ unsupported,			/* MEMCPY_ELEMENT_UNORDERED_ATOMIC_16 */ unsupported,
	/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_1 */ unsupported,			/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_1 */ unsupported,
	/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_2 */ unsupported,			/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_2 */ unsupported,
	/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_4 */ unsupported,			/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_4 */ unsupported,
	/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_8 */ unsupported,			/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_8 */ unsupported,
	/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_16 */ unsupported,			/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_16 */ unsupported,

				/* MEMSET_ELEMENT_UNORDERED_ATOMIC_1 */ unsupported,
				/* MEMSET_ELEMENT_UNORDERED_ATOMIC_2 */ unsupported,
				/* MEMSET_ELEMENT_UNORDERED_ATOMIC_4 */ unsupported,
				/* MEMSET_ELEMENT_UNORDERED_ATOMIC_8 */ unsupported,
				/* MEMSET_ELEMENT_UNORDERED_ATOMIC_16 */ unsupported,

	// EXCEPTION HANDLING
	/* UNWIND_RESUME */ unsupported,

	// Note: there's two sets of atomics libcalls; see
	// <http://llvm.org/docs/Atomics.html> for more info on the
	// difference between them.

	// Atomic '__sync_*' libcalls.
	[... 433 lines not shown ...]
	/* MEMCPY_ELEMENT_UNORDERED_ATOMIC_4 */ nullptr,
	/* MEMCPY_ELEMENT_UNORDERED_ATOMIC_8 */ nullptr,
	/* MEMCPY_ELEMENT_UNORDERED_ATOMIC_16 */ nullptr,
	/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_1 */ nullptr,
	/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_2 */ nullptr,
	/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_4 */ nullptr,
	/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_8 */ nullptr,
	/* MEMMOVE_ELEMENT_UNORDERED_ATOMIC_16 */ nullptr,
				/* MEMSET_ELEMENT_UNORDERED_ATOMIC_1 */ nullptr,
				/* MEMSET_ELEMENT_UNORDERED_ATOMIC_2 */ nullptr,
				/* MEMSET_ELEMENT_UNORDERED_ATOMIC_4 */ nullptr,
				/* MEMSET_ELEMENT_UNORDERED_ATOMIC_8 */ nullptr,
				/* MEMSET_ELEMENT_UNORDERED_ATOMIC_16 */ nullptr,
	/* UNWIND_RESUME */ "_Unwind_Resume",
	/* SYNC_VAL_COMPARE_AND_SWAP_1 */ "__sync_val_compare_and_swap_1",
	/* SYNC_VAL_COMPARE_AND_SWAP_2 */ "__sync_val_compare_and_swap_2",
	/* SYNC_VAL_COMPARE_AND_SWAP_4 */ "__sync_val_compare_and_swap_4",
	/* SYNC_VAL_COMPARE_AND_SWAP_8 */ "__sync_val_compare_and_swap_8",
	/* SYNC_VAL_COMPARE_AND_SWAP_16 */ "__sync_val_compare_and_swap_16",
	/* SYNC_LOCK_TEST_AND_SET_1 */ "__sync_lock_test_and_set_1",
	/* SYNC_LOCK_TEST_AND_SET_2 */ "__sync_lock_test_and_set_2",
	[... 448 lines not shown ...]

llvm/trunk/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll

[... 118 lines not shown ...]
define void @test_memmove_args(i8** %Storage) {
; 2nd arg (%rsi)
; CHECK-DAG: movq 8(%rdi), %rsi
; 3rd arg (%edx) -- length
; CHECK-DAG: movl $4, %edx
; CHECK: __llvm_memmove_element_unordered_atomic_4
call void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %Dst, i8* align 4 %Src, i32 4, i32 4)
ret void
}

		define i8* @test_memset1(i8* %P, i8 %V) {
		; CHECK: test_memset1
		call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* align 4 %P, i8 %V, i32 1, i32 1)
		ret i8* %P
		; 3rd arg (%edx) -- length
		; CHECK-DAG: movl $1, %edx
		; CHECK: __llvm_memset_element_unordered_atomic_1
		}

		define i8* @test_memset2(i8* %P, i8 %V) {
		; CHECK: test_memset2
		call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* align 4 %P, i8 %V, i32 2, i32 2)
		ret i8* %P
		; 3rd arg (%edx) -- length
		; CHECK-DAG: movl $2, %edx
		; CHECK: __llvm_memset_element_unordered_atomic_2
		}

		define i8* @test_memset4(i8* %P, i8 %V) {
		; CHECK: test_memset4
		call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* align 4 %P, i8 %V, i32 4, i32 4)
		ret i8* %P
		; 3rd arg (%edx) -- length
		; CHECK-DAG: movl $4, %edx
		; CHECK: __llvm_memset_element_unordered_atomic_4
		}

		define i8* @test_memset8(i8* %P, i8 %V) {
		; CHECK: test_memset8
		call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* align 8 %P, i8 %V, i32 8, i32 8)
		ret i8* %P
		; 3rd arg (%edx) -- length
		; CHECK-DAG: movl $8, %edx
		; CHECK: __llvm_memset_element_unordered_atomic_8
		}

		define i8* @test_memset16(i8* %P, i8 %V) {
		; CHECK: test_memset16
		call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* align 16 %P, i8 %V, i32 16, i32 16)
		ret i8* %P
		; 3rd arg (%edx) -- length
		; CHECK-DAG: movl $16, %edx
		; CHECK: __llvm_memset_element_unordered_atomic_16
		}

		define void @test_memset_args(i8** %Storage, i8* %V) {
		; CHECK: test_memset_args
		%Dst = load i8*, i8** %Storage
		%Val = load i8, i8* %V

		; 1st arg (%rdi)
		; CHECK-DAG: movq (%rdi), %rdi
		; 2nd arg (%rsi)
		; CHECK-DAG: movzbl (%rsi), %esi
		; 3rd arg (%edx) -- length
		; CHECK-DAG: movl $4, %edx
		; CHECK: __llvm_memset_element_unordered_atomic_4
		call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* align 4 %Dst, i8 %Val, i32 4, i32 4)
		ret void
		}

declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind
declare void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind
		declare void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* nocapture, i8, i32, i32) nounwind
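
The CodeGen tests above only check that the intrinsic lowers to a call to `__llvm_memset_element_unordered_atomic_N`; the body of that runtime function is not part of this diff. As a rough illustration of the semantics described in the summary (byte value replicated across each element, one atomic store per element), a minimal C++ sketch might look like the following. The name `memset_element_atomic` is hypothetical, the sketch assumes the GCC/Clang `__atomic_store_n` builtin, and it uses relaxed ordering, which is at least as strong as LLVM's unordered. It covers element sizes 1, 2, 4 and 8 only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not LLVM's runtime implementation. Fills `length`
// bytes at `dst` with the byte `value`, using one atomic store per element
// of unsigned integer type T.
template <typename T>
void memset_element_atomic(void *dst, std::uint8_t value, std::size_t length) {
  static_assert(sizeof(T) <= 8, "sketch covers element sizes 1, 2, 4 and 8");
  assert(length % sizeof(T) == 0 &&
         "length must be a multiple of the element size");
  assert(reinterpret_cast<std::uintptr_t>(dst) % sizeof(T) == 0 &&
         "destination must be aligned to the element size");

  // Replicate the byte across the whole element, as memset does.
  T pattern;
  std::memset(&pattern, value, sizeof(T));

  T *p = static_cast<T *>(dst);
  for (std::size_t i = 0; i < length / sizeof(T); ++i) {
    // Assumption: relaxed is used here because C++ has no "unordered".
    __atomic_store_n(&p[i], pattern, __ATOMIC_RELAXED);
  }
}
```

For example, `memset_element_atomic<std::uint32_t>(buf, 0xAB, 8)` would write two 4-byte elements, each holding `0xABABABAB`, and would leave the rest of `buf` untouched.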

llvm/trunk/test/Verifier/element-wise-atomic-memory-intrinsics.ll

[... 40 lines not shown ...]
define void @test_memmove(i8* %P, i8* %Q, i32 %A, i32 %E) {
call void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* %Q, i32 1, i32 1)
; CHECK: incorrect alignment of the source argument
call void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 1 %Q, i32 4, i32 4)

ret void
}
declare void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind

		define void @test_memset(i8* %P, i8 %V, i32 %A, i32 %E) {
		; CHECK: element size of the element-wise unordered atomic memory intrinsic must be a constant int
		call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* align 4 %P, i8 %V, i32 1, i32 %E)
		; CHECK: element size of the element-wise atomic memory intrinsic must be a power of 2
		call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* align 4 %P, i8 %V, i32 1, i32 3)

		; CHECK: constant length must be a multiple of the element size in the element-wise atomic memory intrinsic
		call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* align 4 %P, i8 %V, i32 7, i32 4)

		; CHECK: incorrect alignment of the destination argument
		call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* %P, i8 %V, i32 1, i32 1)
		; CHECK: incorrect alignment of the destination argument
		call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* align 1 %P, i8 %V, i32 4, i32 4)

		ret void
		}
		declare void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* nocapture, i8, i32, i32) nounwind

; CHECK: input module is broken!
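
The rules exercised by `@test_memset` can be summarised in a few lines of C++. This is a sketch inferred from the CHECK lines above, not the code in Verifier.cpp: the function name `check_memset_element_atomic` is hypothetical, an alignment of 0 is assumed to stand for a missing `align` attribute, and the "element size must be a constant int" rule is not modelled because the sketch only takes constants.

```cpp
#include <cstdint>
#include <string>

// Hypothetical checker, not LLVM's Verifier code. Returns the diagnostic
// text for the first violated rule, or an empty string if none is violated.
std::string check_memset_element_atomic(std::uint64_t dst_align,
                                        std::uint64_t length,
                                        std::uint64_t element_size) {
  auto is_pow2 = [](std::uint64_t x) { return x != 0 && (x & (x - 1)) == 0; };

  if (!is_pow2(element_size))
    return "element size of the element-wise atomic memory intrinsic must be "
           "a power of 2";
  if (length % element_size != 0)
    return "constant length must be a multiple of the element size in the "
           "element-wise atomic memory intrinsic";
  // Assumption: the destination needs an explicit alignment that is at
  // least the element size.
  if (!is_pow2(dst_align) || dst_align < element_size)
    return "incorrect alignment of the destination argument";
  return "";
}
```

Under these assumptions, the four constant-element-size calls in `@test_memset` map to `(4, 1, 3)`, `(4, 7, 4)`, `(0, 1, 1)` and `(1, 4, 4)`, and each returns the diagnostic that the corresponding CHECK line expects.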

Revision Contents

Diff 106321

llvm/trunk/docs/LangRef.rst

llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h

llvm/trunk/include/llvm/IR/IntrinsicInst.h

llvm/trunk/include/llvm/IR/Intrinsics.td

llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp

llvm/trunk/lib/IR/Verifier.cpp

llvm/trunk/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp

llvm/trunk/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll

llvm/trunk/test/Verifier/element-wise-atomic-memory-intrinsics.ll
