Index: docs/LangRef.rst
===================================================================
--- docs/LangRef.rst
+++ docs/LangRef.rst
@@ -13553,62 +13553,70 @@
 These intrinsics are similar to the standard library memory intrinsics except
 that they perform memory transfer as a sequence of atomic memory accesses.

-.. _int_memcpy_element_atomic:
+.. _int_memcpy_element_unordered_atomic:

-'``llvm.memcpy.element.atomic``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.memcpy.element.unordered.atomic``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Syntax:
 """""""

-This is an overloaded intrinsic. You can use ``llvm.memcpy.element.atomic`` on
+This is an overloaded intrinsic. You can use ``llvm.memcpy.element.unordered.atomic`` on
 any integer bit width and for different address spaces. Not all targets
 support all bit widths however.

 ::

-      declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* <dest>, i8* <src>,
-                                                          i64 <num_elements>, i32 <element_size>)
+      declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* <dest>, i8* <src>, i32 <len>,
+                                                                       i32 <alignment>, i1 <isvolatile>,
+                                                                       i1 <dest_unordered>, i1 <src_unordered>,
+                                                                       i8 <element_size>)
+      declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* <dest>, i8* <src>, i64 <len>,
+                                                                       i32 <alignment>, i1 <isvolatile>,
+                                                                       i1 <dest_unordered>, i1 <src_unordered>,
+                                                                       i8 <element_size>)

 Overview:
 """""""""

-The '``llvm.memcpy.element.atomic.*``' intrinsic performs copy of a block of
-memory from the source location to the destination location as a sequence of
-unordered atomic memory accesses where each access is a multiple of
-``element_size`` bytes wide and aligned at an element size boundary. For example
-each element is accessed atomically in source and destination buffers.
+The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic is a specialization of the '``llvm.memcpy.*``'
+intrinsic. It differs in that ``dest`` and ``src`` are treated as arrays whose elements are
+exactly ``element_size`` bytes, and the copy between the buffers is done with
+:ref:`unordered atomic <ordering>` load/store operations that are a positive integer multiple
+of the ``element_size`` in size.

 Arguments:
 """"""""""

-The first argument is a pointer to the destination, the second is a
-pointer to the source. The third argument is an integer argument
-specifying the number of elements to copy, the fourth argument is size of
-the single element in bytes.
+The first five arguments are the same as in the :ref:`@llvm.memcpy <int_memcpy>` intrinsic,
+with the added constraint that ``len`` must be a positive integer multiple of the ``element_size``.

-``element_size`` should be a power of two, greater than zero and less than
-a target-specific atomic access size limit.
+``dest_unordered`` is ``true`` if and only if stores to the destination buffer must be unordered
+atomic stores.

-For each of the input pointers ``align`` parameter attribute must be specified.
-It must be a power of two and greater than or equal to the ``element_size``.
-Caller guarantees that both the source and destination pointers are aligned to
-that boundary.
+``src_unordered`` is ``true`` if and only if loads from the source buffer must be unordered atomic
+loads.
+
+``element_size`` must be a compile-time constant positive power of two no greater than a
+target-specific atomic access size limit.
+
+For each of the input pointers, an ``align`` parameter attribute must be specified. It must be a
+power of two and greater than or equal to the ``element_size``. The caller guarantees that both the
+source and destination pointers are aligned to that boundary.

 Semantics:
 """"""""""

-The '``llvm.memcpy.element.atomic.*``' intrinsic copies
-'``num_elements`` * ``element_size``' bytes of memory from the source location to
-the destination location. These locations are not allowed to overlap. Memory copy
-is performed as a sequence of unordered atomic memory accesses where each access
-is guaranteed to be a multiple of ``element_size`` bytes wide and aligned at an
-element size boundary.
+The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of memory from
+the source location to the destination location. These locations are not allowed to overlap.
+The memory copy is performed as a sequence of load/store operations where each access is
+guaranteed to be a multiple of ``element_size`` bytes wide and aligned at an ``element_size``
+boundary.

 The order of the copy is unspecified. The same value may be read from the source
 buffer many times, but only one write is issued to the destination buffer per
-element. It is well defined to have concurrent reads and writes to both source
-and destination provided those reads and writes are at least unordered atomic.
+element. It is well defined to have concurrent reads and writes to both source and destination
+provided those reads and writes are unordered atomic when so specified.

 This intrinsic does not provide any additional ordering guarantees over those
 provided by a set of unordered loads from the source location and stores to the
@@ -13617,8 +13625,8 @@
 Lowering:
 """""""""

-In the most general case call to the '``llvm.memcpy.element.atomic.*``' is lowered
-to a call to the symbol ``__llvm_memcpy_element_atomic_*``. Where '*' is replaced
+In the most general case, a call to the '``llvm.memcpy.element.unordered.atomic.*``' intrinsic is
+lowered to a call to the symbol ``__llvm_memcpy_element_unordered_atomic_*``, where '*' is replaced
 with an actual element size.

-Optimizer is allowed to inline memory copy when it's profitable to do so.
+The optimizer is allowed to inline the memory copy when it's profitable to do so.
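As a companion to the LangRef text above, the following is a minimal sketch of how a frontend or
pass might emit the new eight-argument intrinsic with IRBuilder. Only the intrinsic name and
operand list come from this patch; the helper name, the chosen constants, and the exact attribute
handling are illustrative assumptions. ::

    // Illustrative only: emit @llvm.memcpy.element.unordered.atomic for a copy
    // of Len bytes using 4-byte elements, with both sides unordered-atomic.
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Intrinsics.h"
    #include "llvm/IR/Module.h"
    using namespace llvm;

    static CallInst *emitElementUnorderedAtomicMemCpy(IRBuilder<> &B, Module &M,
                                                      Value *Dest, Value *Src,
                                                      Value *Len) {
      // The intrinsic is overloaded on the two pointer types and the length type.
      Type *Tys[] = {Dest->getType(), Src->getType(), Len->getType()};
      Value *Fn = Intrinsic::getDeclaration(
          &M, Intrinsic::memcpy_element_unordered_atomic, Tys);
      Value *Args[] = {Dest,
                       Src,
                       Len,
                       B.getInt32(4),    // alignment
                       B.getInt1(false), // isvolatile
                       B.getInt1(true),  // dest_unordered
                       B.getInt1(true),  // src_unordered
                       B.getInt8(4)};    // element_size
      // Per the Arguments section, the caller must also attach `align`
      // parameter attributes to the dest and src pointer operands.
      return B.CreateCall(Fn, Args);
    }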
Index: include/llvm/CodeGen/RuntimeLibcalls.h
===================================================================
--- include/llvm/CodeGen/RuntimeLibcalls.h
+++ include/llvm/CodeGen/RuntimeLibcalls.h
@@ -333,12 +333,12 @@
     MEMSET,
     MEMMOVE,

-    // ELEMENT-WISE ATOMIC MEMORY
-    MEMCPY_ELEMENT_ATOMIC_1,
-    MEMCPY_ELEMENT_ATOMIC_2,
-    MEMCPY_ELEMENT_ATOMIC_4,
-    MEMCPY_ELEMENT_ATOMIC_8,
-    MEMCPY_ELEMENT_ATOMIC_16,
+    // ELEMENT-WISE UNORDERED-ATOMIC MEMORY of different element sizes
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_1,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_2,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_4,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_8,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_16,

     // EXCEPTION HANDLING
     UNWIND_RESUME,
@@ -511,9 +511,9 @@
   /// UNKNOWN_LIBCALL if there is none.
   Libcall getSYNC(unsigned Opc, MVT VT);

-  /// getMEMCPY_ELEMENT_ATOMIC - Return MEMCPY_ELEMENT_ATOMIC_* value for the
+  /// getMEMCPY_ELEMENT_UNORDERED_ATOMIC - Return MEMCPY_ELEMENT_UNORDERED_ATOMIC_* value for the
   /// given element size or UNKNOW_LIBCALL if there is none.
-  Libcall getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize);
+  Libcall getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize);
 }
 }
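A brief usage sketch for the renamed libcall accessor declared above. The wrapper function, the
header paths, and the availability of a ``TargetLowering`` reference are assumptions for
illustration; only ``RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC`` and the symbol names come from
this patch. ::

    #include "llvm/CodeGen/RuntimeLibcalls.h"
    #include "llvm/Support/ErrorHandling.h"
    #include "llvm/Target/TargetLowering.h"
    using namespace llvm;

    // Map a constant element size to the runtime symbol the lowering will call.
    static const char *getElementUnorderedAtomicMemcpySymbol(
        const TargetLowering &TLI, uint64_t ElementSize) {
      RTLIB::Libcall LC = RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(ElementSize);
      if (LC == RTLIB::UNKNOWN_LIBCALL)
        report_fatal_error("Unsupported element size");
      // For ElementSize == 4 this is "__llvm_memcpy_element_unordered_atomic_4".
      return TLI.getLibcallName(LC);
    }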
Index: include/llvm/IR/IntrinsicInst.h
===================================================================
--- include/llvm/IR/IntrinsicInst.h
+++ include/llvm/IR/IntrinsicInst.h
@@ -192,25 +192,122 @@
   };

   /// This class represents atomic memcpy intrinsic
-  /// TODO: Integrate this class into MemIntrinsic hierarchy.
-  class ElementAtomicMemCpyInst : public IntrinsicInst {
+  /// TODO: Integrate this class into MemIntrinsic hierarchy; for now this is
+  /// C&P of all methods from that hierarchy
+  class ElementUnorderedAtomicMemCpyInst : public IntrinsicInst {
+  private:
+    constexpr static int ARG_DEST = 0;
+    constexpr static int ARG_SRC = 1;
+    constexpr static int ARG_LENGTH = 2;
+    constexpr static int ARG_ALIGN = 3;
+    constexpr static int ARG_VOLATILE = 4;
+    constexpr static int ARG_DEST_UNORDERED = 5;
+    constexpr static int ARG_SRC_UNORDERED = 6;
+    constexpr static int ARG_ELEMENTSIZE = 7;
+
   public:
-    Value *getRawDest() const { return getArgOperand(0); }
-    Value *getRawSource() const { return getArgOperand(1); }
+    Value *getRawDest() const {
+      return const_cast<Value *>(getArgOperand(ARG_DEST));
+    }
+    const Use &getRawDestUse() const { return getArgOperandUse(ARG_DEST); }
+    Use &getRawDestUse() { return getArgOperandUse(ARG_DEST); }
+
+    /// Return the arguments to the instruction.
+    Value *getRawSource() const {
+      return const_cast<Value *>(getArgOperand(ARG_SRC));
+    }
+    const Use &getRawSourceUse() const { return getArgOperandUse(ARG_SRC); }
+    Use &getRawSourceUse() { return getArgOperandUse(ARG_SRC); }
+
+    Value *getLength() const {
+      return const_cast<Value *>(getArgOperand(ARG_LENGTH));
+    }
+    const Use &getLengthUse() const { return getArgOperandUse(ARG_LENGTH); }
+    Use &getLengthUse() { return getArgOperandUse(ARG_LENGTH); }
+
+    ConstantInt *getAlignmentCst() const {
+      return cast<ConstantInt>(const_cast<Value *>(getArgOperand(ARG_ALIGN)));
+    }

-    Value *getNumElements() const { return getArgOperand(2); }
-    void setNumElements(Value *V) { setArgOperand(2, V); }
+    unsigned getAlignment() const { return getAlignmentCst()->getZExtValue(); }
+
+    Type *getAlignmentType() const {
+      return getArgOperand(ARG_ALIGN)->getType();
+    }

-    uint64_t getSrcAlignment() const { return getParamAlignment(0); }
-    uint64_t getDstAlignment() const { return getParamAlignment(1); }
+    ConstantInt *getVolatileCst() const {
+      return cast<ConstantInt>(
+          const_cast<Value *>(getArgOperand(ARG_VOLATILE)));
+    }

-    uint64_t getElementSizeInBytes() const {
-      Value *Arg = getArgOperand(3);
+    bool isVolatile() const { return !getVolatileCst()->isZero(); }
+
+    uint8_t getDestUnordered() const {
+      Value *Arg = getArgOperand(ARG_DEST_UNORDERED);
+      return uint8_t(cast<ConstantInt>(Arg)->getZExtValue());
+    }
+
+    uint8_t getSrcUnordered() const {
+      Value *Arg = getArgOperand(ARG_SRC_UNORDERED);
+      return uint8_t(cast<ConstantInt>(Arg)->getZExtValue());
+    }
+
+    uint8_t getElementSizeInBytes() const {
+      Value *Arg = getArgOperand(ARG_ELEMENTSIZE);
       return cast<ConstantInt>(Arg)->getZExtValue();
     }

+    /// This is just like getRawDest, but it strips off any cast
+    /// instructions that feed it, giving the original input. The returned
+    /// value is guaranteed to be a pointer.
+    Value *getDest() const { return getRawDest()->stripPointerCasts(); }
+
+    /// This is just like getRawSource, but it strips off any cast
+    /// instructions that feed it, giving the original input. The returned
+    /// value is guaranteed to be a pointer.
+    Value *getSource() const { return getRawSource()->stripPointerCasts(); }
+
+    unsigned getDestAddressSpace() const {
+      return cast<PointerType>(getRawDest()->getType())->getAddressSpace();
+    }
+
+    unsigned getSourceAddressSpace() const {
+      return cast<PointerType>(getRawSource()->getType())->getAddressSpace();
+    }
+
+    /// Set the specified arguments of the instruction.
+    void setDest(Value *Ptr) {
+      assert(getRawDest()->getType() == Ptr->getType() &&
+             "setDest called with pointer of wrong type!");
+      setArgOperand(ARG_DEST, Ptr);
+    }
+
+    void setSource(Value *Ptr) {
+      assert(getRawSource()->getType() == Ptr->getType() &&
+             "setSource called with pointer of wrong type!");
+      setArgOperand(ARG_SRC, Ptr);
+    }
+
+    void setLength(Value *L) {
+      assert(getLength()->getType() == L->getType() &&
+             "setLength called with value of wrong type!");
+      setArgOperand(ARG_LENGTH, L);
+    }
+
+    void setAlignment(Constant *A) { setArgOperand(ARG_ALIGN, A); }
+
+    void setVolatile(Constant *V) { setArgOperand(ARG_VOLATILE, V); }
+
+    void setDestUnordered(Constant *V) { setArgOperand(ARG_DEST_UNORDERED, V); }
+
+    void setSrcUnordered(Constant *V) { setArgOperand(ARG_SRC_UNORDERED, V); }
+
+    void setElementSizeInBytes(Constant *V) {
+      setArgOperand(ARG_ELEMENTSIZE, V);
+    }
+
     static inline bool classof(const IntrinsicInst *I) {
-      return I->getIntrinsicID() == Intrinsic::memcpy_element_atomic;
+      return I->getIntrinsicID() == Intrinsic::memcpy_element_unordered_atomic;
     }
     static inline bool classof(const Value *V) {
       return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
Index: include/llvm/IR/Intrinsics.td
===================================================================
--- include/llvm/IR/Intrinsics.td
+++ include/llvm/IR/Intrinsics.td
@@ -806,11 +806,18 @@
 //===------ Memory intrinsics with element-wise atomicity guarantees ------===//
 //
-def int_memcpy_element_atomic : Intrinsic<[],
-                                          [llvm_anyptr_ty, llvm_anyptr_ty,
-                                           llvm_i64_ty, llvm_i32_ty],
-                                          [IntrArgMemOnly, NoCapture<0>, NoCapture<1>,
-                                           WriteOnly<0>, ReadOnly<1>]>;
+// llvm.memcpy.element.unordered.atomic(dest, src, length, alignment, volatile,
+//                                      dest_unordered, src_unordered, elementsize)
+def int_memcpy_element_unordered_atomic
+    : Intrinsic<[],
+                [
+                  llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty, llvm_i32_ty,
+                  llvm_i1_ty, llvm_i1_ty, llvm_i1_ty, llvm_i8_ty
+                ],
+                [
+                  IntrArgMemOnly, NoCapture<0>, NoCapture<1>, WriteOnly<0>,
+                  ReadOnly<1>
+                ]>;

 //===------------------------ Reduction Intrinsics ------------------------===//
 //
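For readers of the accessor class above, here is a small usage sketch. The surrounding pass
boilerplate and the decision made in the branch are assumptions; only the
``ElementUnorderedAtomicMemCpyInst`` API itself comes from this patch. ::

    #include "llvm/IR/IntrinsicInst.h"
    using namespace llvm;

    // Inspect an instruction through the typed accessors instead of raw
    // getArgOperand() indices.
    static void inspectElementAtomicMemCpy(Instruction &I) {
      if (auto *AMI = dyn_cast<ElementUnorderedAtomicMemCpyInst>(&I)) {
        Value *Dest = AMI->getDest();       // destination, pointer casts stripped
        Value *Src = AMI->getSource();      // source, pointer casts stripped
        Value *Len = AMI->getLength();      // byte count, may be non-constant
        uint8_t ElemSize = AMI->getElementSizeInBytes();
        if (AMI->getDestUnordered() || AMI->getSrcUnordered()) {
          // Any rewrite must keep accesses a multiple of ElemSize and
          // element-aligned, per the LangRef text above.
        }
        (void)Dest; (void)Src; (void)Len; (void)ElemSize;
      }
    }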
Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -4867,11 +4867,15 @@
     updateDAGForMaybeTailCall(MM);
     return nullptr;
   }
-  case Intrinsic::memcpy_element_atomic: {
+  case Intrinsic::memcpy_element_unordered_atomic: {
     SDValue Dst = getValue(I.getArgOperand(0));
     SDValue Src = getValue(I.getArgOperand(1));
-    SDValue NumElements = getValue(I.getArgOperand(2));
-    SDValue ElementSize = getValue(I.getArgOperand(3));
+    SDValue Length = getValue(I.getArgOperand(2));
+    SDValue Alignment = getValue(I.getArgOperand(3));
+    // Note: arg 4 is isvolatile, which is unused for this intrinsic
+    SDValue DestUnordered = getValue(I.getArgOperand(5));
+    SDValue SrcUnordered = getValue(I.getArgOperand(6));
+    // SDValue ElementSize = getValue(I.getArgOperand(7));

     // Emit a library call.
     TargetLowering::ArgListTy Args;
@@ -4884,17 +4888,25 @@
     Args.push_back(Entry);

     Entry.Ty = I.getArgOperand(2)->getType();
-    Entry.Node = NumElements;
+    Entry.Node = Length;
     Args.push_back(Entry);

     Entry.Ty = Type::getInt32Ty(*DAG.getContext());
-    Entry.Node = ElementSize;
+    Entry.Node = Alignment;
+    Args.push_back(Entry);
+
+    Entry.Ty = Type::getInt1Ty(*DAG.getContext());
+    Entry.Node = DestUnordered;
+    Args.push_back(Entry);
+
+    Entry.Ty = Type::getInt1Ty(*DAG.getContext());
+    Entry.Node = SrcUnordered;
     Args.push_back(Entry);

     uint64_t ElementSizeConstant =
-        cast<ConstantInt>(I.getArgOperand(3))->getZExtValue();
+        cast<ConstantInt>(I.getArgOperand(7))->getZExtValue();
     RTLIB::Libcall LibraryCall =
-        RTLIB::getMEMCPY_ELEMENT_ATOMIC(ElementSizeConstant);
+        RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(ElementSizeConstant);
     if (LibraryCall == RTLIB::UNKNOWN_LIBCALL)
       report_fatal_error("Unsupported element size");
Index: lib/CodeGen/TargetLoweringBase.cpp
===================================================================
--- lib/CodeGen/TargetLoweringBase.cpp
+++ lib/CodeGen/TargetLoweringBase.cpp
@@ -374,11 +374,11 @@
   Names[RTLIB::MEMCPY] = "memcpy";
   Names[RTLIB::MEMMOVE] = "memmove";
   Names[RTLIB::MEMSET] = "memset";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_1] = "__llvm_memcpy_element_atomic_1";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_2] = "__llvm_memcpy_element_atomic_2";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_4] = "__llvm_memcpy_element_atomic_4";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_8] = "__llvm_memcpy_element_atomic_8";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_16] = "__llvm_memcpy_element_atomic_16";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_1] = "__llvm_memcpy_element_unordered_atomic_1";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_2] = "__llvm_memcpy_element_unordered_atomic_2";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_4] = "__llvm_memcpy_element_unordered_atomic_4";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_8] = "__llvm_memcpy_element_unordered_atomic_8";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_16] = "__llvm_memcpy_element_unordered_atomic_16";
   Names[RTLIB::UNWIND_RESUME] = "_Unwind_Resume";
   Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_1] = "__sync_val_compare_and_swap_1";
   Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_2] = "__sync_val_compare_and_swap_2";
@@ -781,22 +781,21 @@
   return UNKNOWN_LIBCALL;
 }

-RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize) {
+RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize) {
   switch (ElementSize) {
   case 1:
-    return MEMCPY_ELEMENT_ATOMIC_1;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_1;
   case 2:
-    return MEMCPY_ELEMENT_ATOMIC_2;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_2;
   case 4:
-    return MEMCPY_ELEMENT_ATOMIC_4;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_4;
   case 8:
-    return MEMCPY_ELEMENT_ATOMIC_8;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_8;
   case 16:
-    return MEMCPY_ELEMENT_ATOMIC_16;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_16;
   default:
     return UNKNOWN_LIBCALL;
   }
-
 }

 /// InitCmpLibcallCCs - Set default comparison libcall CC.
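The patch does not spell out the prototype of the runtime routine; the sketch below infers it from
the argument list built in SelectionDAGBuilder above (dest, src, len, alignment, dest_unordered,
src_unordered, with the element size baked into the symbol name). The parameter types and the use
of ``std::memory_order_relaxed`` as a stand-in for LLVM's "unordered" ordering are assumptions;
a real runtime implementation may differ. ::

    // Illustrative runtime stub for element size 4: copy Len bytes one
    // 4-byte element per access, per the Semantics section above.
    #include <atomic>
    #include <cstdint>

    extern "C" void __llvm_memcpy_element_unordered_atomic_4(
        uint8_t *Dest, uint8_t *Src, uint32_t Len, uint32_t Align,
        bool DestUnordered, bool SrcUnordered) {
      // Len is guaranteed to be a positive multiple of the element size (4),
      // and both pointers are at least 4-byte aligned.
      auto *D = reinterpret_cast<std::atomic<uint32_t> *>(Dest);
      auto *S = reinterpret_cast<std::atomic<uint32_t> *>(Src);
      for (uint32_t I = 0; I != Len / 4; ++I) {
        uint32_t V = S[I].load(std::memory_order_relaxed);
        D[I].store(V, std::memory_order_relaxed);
      }
      (void)Align; (void)DestUnordered; (void)SrcUnordered;
    }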
Index: lib/IR/Verifier.cpp
===================================================================
--- lib/IR/Verifier.cpp
+++ lib/IR/Verifier.cpp
@@ -3987,10 +3987,40 @@
            CS);
     break;
   }
-  case Intrinsic::memcpy_element_atomic: {
-    ConstantInt *ElementSizeCI = dyn_cast<ConstantInt>(CS.getArgOperand(3));
-    Assert(ElementSizeCI, "element size of the element-wise atomic memory "
-                          "intrinsic must be a constant int",
+  case Intrinsic::memcpy_element_unordered_atomic: {
+    ConstantInt *AlignCI = dyn_cast<ConstantInt>(CS.getArgOperand(3));
+    Assert(AlignCI,
+           "alignment argument of element-wise unordered atomic memory "
+           "intrinsics must be a constant int",
+           CS);
+    const APInt &AlignVal = AlignCI->getValue();
+    Assert(AlignCI->isZero() || AlignVal.isPowerOf2(),
+           "alignment argument of element-wise unordered atomic memory "
+           "intrinsics must be a power of 2",
+           CS);
+
+    ConstantInt *DestUnorderedCI = dyn_cast<ConstantInt>(CS.getArgOperand(5));
+    Assert(DestUnorderedCI,
+           "dest_unordered of the element-wise unordered atomic memory "
+           "intrinsic must be a constant int",
+           CS);
+
+    ConstantInt *SrcUnorderedCI = dyn_cast<ConstantInt>(CS.getArgOperand(6));
+    Assert(SrcUnorderedCI,
+           "src_unordered of the element-wise unordered atomic memory "
+           "intrinsic must be a constant int",
+           CS);
+
+    // Cannot have both unordered flags being false.
+    Assert(!(DestUnorderedCI->isZero() && SrcUnorderedCI->isZero()),
+           "dest_unordered and src_unordered cannot both be zero on the "
+           "element-wise unordered atomic memory intrinsic",
+           CS);
+
+    ConstantInt *ElementSizeCI = dyn_cast<ConstantInt>(CS.getArgOperand(7));
+    Assert(ElementSizeCI,
+           "element size of the element-wise unordered atomic memory "
+           "intrinsic must be a constant int",
            CS);
     const APInt &ElementSizeVal = ElementSizeCI->getValue();
     Assert(ElementSizeVal.isPowerOf2(),
@@ -4001,16 +4031,12 @@
     auto IsValidAlignment = [&](uint64_t Alignment) {
       return isPowerOf2_64(Alignment) && ElementSizeVal.ule(Alignment);
     };
-
     uint64_t DstAlignment = CS.getParamAlignment(0),
              SrcAlignment = CS.getParamAlignment(1);

     Assert(IsValidAlignment(DstAlignment),
-           "incorrect alignment of the destination argument",
-           CS);
+           "incorrect alignment of the destination argument", CS);
     Assert(IsValidAlignment(SrcAlignment),
-           "incorrect alignment of the source argument",
-           CS);
+           "incorrect alignment of the source argument", CS);
     break;
   }
   case Intrinsic::gcroot:
Index: lib/Transforms/InstCombine/InstCombineCalls.cpp
===================================================================
--- lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -94,6 +94,7 @@
   return ConstantVector::get(BoolVec);
 }

+/* -- temp removal to aid staging
 Instruction *
 InstCombiner::SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI) {
   // Try to unfold this intrinsic into sequence of explicit atomic loads and
@@ -165,6 +166,7 @@
   AMI->setNumElements(Constant::getNullValue(NumElementsCI->getType()));
   return AMI;
 }
+*/

 Instruction *InstCombiner::SimplifyMemTransfer(MemIntrinsic *MI) {
   unsigned DstAlign = getKnownAlignment(MI->getArgOperand(0), DL, MI, &AC, &DT);
@@ -1892,6 +1894,7 @@
     if (Changed) return II;
   }

+  /* -- temp removal to simplify staging
   if (auto *AMI = dyn_cast<ElementAtomicMemCpyInst>(II)) {
     if (Constant *C = dyn_cast<Constant>(AMI->getNumElements()))
       if (C->isNullValue())
@@ -1900,7 +1903,8 @@
     if (Instruction *I = SimplifyElementAtomicMemCpy(AMI))
       return I;
   }
-
+  */
+
   if (Instruction *I = SimplifyNVVMIntrinsic(II, *this))
     return I;
Index: lib/Transforms/InstCombine/InstCombineInternal.h
===================================================================
--- lib/Transforms/InstCombine/InstCombineInternal.h
+++ lib/Transforms/InstCombine/InstCombineInternal.h
@@ -687,7 +687,7 @@
   Instruction *MatchBSwap(BinaryOperator &I);
   bool SimplifyStoreAtEndOfBlock(StoreInst &SI);

-  Instruction *SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI);
+  // Instruction *SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI); -- temp removal to aid staging
   Instruction *SimplifyMemTransfer(MemIntrinsic *MI);
   Instruction *SimplifyMemSet(MemSetInst *MI);

Index: test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll
===================================================================
--- test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll
+++ test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll
@@ -2,47 +2,77 @@
 define i8* @test_memcpy1(i8* %P, i8* %Q) {
   ; CHECK: test_memcpy
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 1, i32 1)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 1, i1 1, i8 1)
   ret i8* %P
+  ; 3rd arg (%edx) -- size
   ; CHECK-DAG: movl $1, %edx
-  ; CHECK-DAG: movl $1, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_1
+  ; 4th arg (%ecx) -- align
+  ; CHECK-DAG: movl $4, %ecx
+  ; 5th arg (%r8) -- dest_unordered
+  ; CHECK-DAG: movl $1, %r8d
+  ; 6th arg (%r9) -- src_unordered
+  ; CHECK-DAG: movl $1, %r9d
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_1
 }

 define i8* @test_memcpy2(i8* %P, i8* %Q) {
   ; CHECK: test_memcpy2
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 2, i32 2)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 2, i32 4, i1 0, i1 1, i1 1, i8 2)
   ret i8* %P
+  ; 3rd arg (%edx) -- size
   ; CHECK-DAG: movl $2, %edx
-  ; CHECK-DAG: movl $2, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_2
+  ; 4th arg (%ecx) -- align
+  ; CHECK-DAG: movl $4, %ecx
+  ; 5th arg (%r8) -- dest_unordered
+  ; CHECK-DAG: movl $1, %r8d
+  ; 6th arg (%r9) -- src_unordered
+  ; CHECK-DAG: movl $1, %r9d
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_2
 }

 define i8* @test_memcpy4(i8* %P, i8* %Q) {
   ; CHECK: test_memcpy4
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 4, i32 4)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 4, i32 4, i1 0, i1 1, i1 1, i8 4)
   ret i8* %P
+  ; 3rd arg (%edx) -- size
   ; CHECK-DAG: movl $4, %edx
+  ; 4th arg (%ecx) -- align
   ; CHECK-DAG: movl $4, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_4
+  ; 5th arg (%r8) -- dest_unordered
+  ; CHECK-DAG: movl $1, %r8d
+  ; 6th arg (%r9) -- src_unordered
+  ; CHECK-DAG: movl $1, %r9d
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_4
 }

 define i8* @test_memcpy8(i8* %P, i8* %Q) {
   ; CHECK: test_memcpy8
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 8 %P, i8* align 8 %Q, i64 8, i32 8)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %P, i8* align 8 %Q, i32 8, i32 8, i1 0, i1 1, i1 1, i8 8)
   ret i8* %P
+  ; 3rd arg (%edx) -- size
   ; CHECK-DAG: movl $8, %edx
+  ; 4th arg (%ecx) -- align
   ; CHECK-DAG: movl $8, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_8
+  ; 5th arg (%r8) -- dest_unordered
+  ; CHECK-DAG: movl $1, %r8d
+  ; 6th arg (%r9) -- src_unordered
+  ; CHECK-DAG: movl $1, %r9d
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_8
 }

 define i8* @test_memcpy16(i8* %P, i8* %Q) {
   ; CHECK: test_memcpy16
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %P, i8* align 16 %Q, i64 16, i32 16)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %P, i8* align 16 %Q, i32 16, i32 16, i1 0, i1 1, i1 1, i8 16)
   ret i8* %P
+  ; 3rd arg (%edx) -- size
   ; CHECK-DAG: movl $16, %edx
+  ; 4th arg (%ecx) -- align
   ; CHECK-DAG: movl $16, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_16
+  ; 5th arg (%r8) -- dest_unordered
+  ; CHECK-DAG: movl $1, %r8d
+  ; 6th arg (%r9) -- src_unordered
+  ; CHECK-DAG: movl $1, %r9d
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_16
 }

 define void @test_memcpy_args(i8** %Storage) {
@@ -51,18 +81,21 @@
   %Src.addr = getelementptr i8*, i8** %Storage, i64 1
   %Src = load i8*, i8** %Src.addr

-  ; First argument
+  ; 1st arg (%rdi)
   ; CHECK-DAG: movq (%rdi), [[REG1:%r.+]]
   ; CHECK-DAG: movq [[REG1]], %rdi
-  ; Second argument
+  ; 2nd arg (%rsi)
   ; CHECK-DAG: movq 8(%rdi), %rsi
-  ; Third argument
+  ; 3rd arg (%edx) -- size
   ; CHECK-DAG: movl $4, %edx
-  ; Fourth argument
+  ; 4th arg (%ecx) -- align
   ; CHECK-DAG: movl $4, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_4
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 4, i32 4)
-  ret void
+  ; 5th arg (%r8) -- dest_unordered
+  ; CHECK-DAG: movl $1, %r8d
+  ; 6th arg (%r9) -- src_unordered
+  ; CHECK-DAG: movl $1, %r9d
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_4
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %Dst, i8* align 4 %Src, i32 4, i32 4, i1 0, i1 1, i1 1, i8 4) ret void
 }

-declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind
+declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1, i1, i1, i8) nounwind
Index: test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll
===================================================================
--- test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll
+++ test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll
@@ -1,4 +1,6 @@
 ; RUN: opt -instcombine -unfold-element-atomic-memcpy-max-elements=8 -S < %s | FileCheck %s
+; Temporarily an expected failure until inst combine is updated in the next patch
+; XFAIL: *
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

 ; Test basic unfolding
Index: test/Verifier/element-wise-atomic-memory-intrinsics.ll
===================================================================
--- test/Verifier/element-wise-atomic-memory-intrinsics.ll
+++ test/Verifier/element-wise-atomic-memory-intrinsics.ll
@@ -1,17 +1,38 @@
 ; RUN: not opt -verify < %s 2>&1 | FileCheck %s

-define void @test_memcpy(i8* %P, i8* %Q) {
+define void @test_memcpy(i8* %P, i8* %Q, i32 %A, i8 %E, i1 %V) {
+  ; CHECK: alignment argument of element-wise unordered atomic memory intrinsics must be a constant int
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 %A, i1 0, i1 1, i1 1, i8 1)
+
+  ; CHECK: alignment argument of element-wise unordered atomic memory intrinsics must be a power of 2
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 5, i1 0, i1 1, i1 1, i8 1)
+
+  ; CHECK: dest_unordered of the element-wise unordered atomic memory intrinsic must be a constant int
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 %V, i1 1, i8 1)
+
+  ; CHECK: src_unordered of the element-wise unordered atomic memory intrinsic must be a constant int
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 1, i1 %V, i8 1)
+
+  ; CHECK: dest_unordered and src_unordered cannot both be zero on the element-wise unordered atomic memory intrinsic
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 0, i1 0, i8 1)
+
+  ; CHECK: element size of the element-wise unordered atomic memory intrinsic must be a constant int
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 1, i1 1, i8 %E)

   ; CHECK: element size of the element-wise atomic memory intrinsic must be a power of 2
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 2 %Q, i64 4, i32 3)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 1, i1 1, i8 3)

   ; CHECK: incorrect alignment of the destination argument
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 4 %Q, i64 4, i32 4)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 1, i1 1, i8 1)
+  ; CHECK: incorrect alignment of the destination argument
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 1 %P, i8* align 4 %Q, i32 4, i32 4, i1 0, i1 1, i1 1, i8 4)

   ; CHECK: incorrect alignment of the source argument
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 2 %Q, i64 4, i32 4)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* %Q, i32 1, i32 4, i1 0, i1 1, i1 1, i8 1)
+  ; CHECK: incorrect alignment of the source argument
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 1 %Q, i32 4, i32 4, i1 0, i1 1, i1 1, i8 4)
+
   ret void
 }

-declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind
-
+declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1, i1, i1, i8) nounwind
 ; CHECK: input module is broken!