Index: docs/LangRef.rst =================================================================== --- docs/LangRef.rst +++ docs/LangRef.rst @@ -14002,62 +14002,68 @@ These intrinsics are similar to the standard library memory intrinsics except that they perform memory transfer as a sequence of atomic memory accesses. -.. _int_memcpy_element_atomic: +.. _int_memcpy_element_unordered_atomic: -'``llvm.memcpy.element.atomic``' Intrinsic -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +'``llvm.memcpy.element.unordered.atomic``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Syntax: """"""" -This is an overloaded intrinsic. You can use ``llvm.memcpy.element.atomic`` on +This is an overloaded intrinsic. You can use ``llvm.memcpy.element.unordered.atomic`` on any integer bit width and for different address spaces. Not all targets support all bit widths however. :: - declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* , i8* , - i64 , i32 ) + declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* , + i8* , + i32 , + i1 , + i8 ) + declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* , + i8* , + i64 , + i1 , + i8 ) Overview: """"""""" -The '``llvm.memcpy.element.atomic.*``' intrinsic performs copy of a block of -memory from the source location to the destination location as a sequence of -unordered atomic memory accesses where each access is a multiple of -``element_size`` bytes wide and aligned at an element size boundary. For example -each element is accessed atomically in source and destination buffers. +The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic is a specialization of the +'``llvm.memcpy.*``' intrinsic. It differs in that the ``dest`` and ``src`` are treated +as arrays with elements that are exactly ``element_size`` bytes, and the copy between +buffers uses a sequence of :ref:`unordered atomic ` load/store operations +that are a positive integer multiple of the ``element_size`` in size. Arguments: """""""""" -The first argument is a pointer to the destination, the second is a -pointer to the source. The third argument is an integer argument -specifying the number of elements to copy, the fourth argument is size of -the single element in bytes. +The first four arguments are the same as they are in the :ref:`@llvm.memcpy ` +intrinsic, with the added constraint that ``len`` is required to be a positive integer +multiple of the ``element_size``. If ``len`` is not a positive integer multiple of +``element_size``, then the behaviour of the intrinsic is undefined. -``element_size`` should be a power of two, greater than zero and less than -a target-specific atomic access size limit. +``element_size`` must be a compile-time constant positive power of two no greater than +target-specific atomic access size limit. -For each of the input pointers ``align`` parameter attribute must be specified. -It must be a power of two and greater than or equal to the ``element_size``. -Caller guarantees that both the source and destination pointers are aligned to -that boundary. +For each of the input pointers ``align`` parameter attribute must be specified. It +must be a power of two and greater than or equal to the ``element_size``. Caller +guarantees that both the source and destination pointers are aligned to that boundary. Semantics: """""""""" -The '``llvm.memcpy.element.atomic.*``' intrinsic copies -'``num_elements`` * ``element_size``' bytes of memory from the source location to -the destination location. These locations are not allowed to overlap. Memory copy -is performed as a sequence of unordered atomic memory accesses where each access -is guaranteed to be a multiple of ``element_size`` bytes wide and aligned at an -element size boundary. +The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of +memory from the source location to the destination location. These locations are not +allowed to overlap. The memory copy is performed as a sequence of load/store operations +where each access is guaranteed to be a multiple of ``element_size`` bytes wide and +aligned at an ``element_size`` boundary. The order of the copy is unspecified. The same value may be read from the source buffer many times, but only one write is issued to the destination buffer per -element. It is well defined to have concurrent reads and writes to both source -and destination provided those reads and writes are at least unordered atomic. +element. It is well defined to have concurrent reads and writes to both source and +destination provided those reads and writes are unordered atomic when specified. This intrinsic does not provide any additional ordering guarantees over those provided by a set of unordered loads from the source location and stores to the @@ -14066,8 +14072,8 @@ Lowering: """"""""" -In the most general case call to the '``llvm.memcpy.element.atomic.*``' is lowered -to a call to the symbol ``__llvm_memcpy_element_atomic_*``. Where '*' is replaced -with an actual element size. +In the most general case call to the '``llvm.memcpy.element.unordered.atomic.*``' is +lowered to a call to the symbol ``__llvm_memcpy_element_unordered_atomic_*``. Where '*' +is replaced with an actual element size. -Optimizer is allowed to inline memory copy when it's profitable to do so. +The optimizer is allowed to inline the memory copy when it's profitable to do so. Index: include/llvm/CodeGen/RuntimeLibcalls.h =================================================================== --- include/llvm/CodeGen/RuntimeLibcalls.h +++ include/llvm/CodeGen/RuntimeLibcalls.h @@ -333,12 +333,12 @@ MEMSET, MEMMOVE, - // ELEMENT-WISE ATOMIC MEMORY - MEMCPY_ELEMENT_ATOMIC_1, - MEMCPY_ELEMENT_ATOMIC_2, - MEMCPY_ELEMENT_ATOMIC_4, - MEMCPY_ELEMENT_ATOMIC_8, - MEMCPY_ELEMENT_ATOMIC_16, + // ELEMENT-WISE UNORDERED-ATOMIC MEMORY of different element sizes + MEMCPY_ELEMENT_UNORDERED_ATOMIC_1, + MEMCPY_ELEMENT_UNORDERED_ATOMIC_2, + MEMCPY_ELEMENT_UNORDERED_ATOMIC_4, + MEMCPY_ELEMENT_UNORDERED_ATOMIC_8, + MEMCPY_ELEMENT_UNORDERED_ATOMIC_16, // EXCEPTION HANDLING UNWIND_RESUME, @@ -511,9 +511,10 @@ /// UNKNOWN_LIBCALL if there is none. Libcall getSYNC(unsigned Opc, MVT VT); - /// getMEMCPY_ELEMENT_ATOMIC - Return MEMCPY_ELEMENT_ATOMIC_* value for the - /// given element size or UNKNOW_LIBCALL if there is none. - Libcall getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize); + /// getMEMCPY_ELEMENT_UNORDERED_ATOMIC - Return + /// MEMCPY_ELEMENT_UNORDERED_ATOMIC_* value for the given element size or + /// UNKNOW_LIBCALL if there is none. + Libcall getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize); } } Index: include/llvm/IR/IntrinsicInst.h =================================================================== --- include/llvm/IR/IntrinsicInst.h +++ include/llvm/IR/IntrinsicInst.h @@ -205,25 +205,108 @@ }; /// This class represents atomic memcpy intrinsic - /// TODO: Integrate this class into MemIntrinsic hierarchy. - class ElementAtomicMemCpyInst : public IntrinsicInst { + /// TODO: Integrate this class into MemIntrinsic hierarchy; for now this is + /// C&P of all methods from that hierarchy + class ElementUnorderedAtomicMemCpyInst : public IntrinsicInst { + private: + enum { + ARG_DEST = 0, + ARG_SOURCE = 1, + ARG_LENGTH = 2, + ARG_VOLATILE = 3, + ARG_ELEMENTSIZE = 4 + }; + public: - Value *getRawDest() const { return getArgOperand(0); } - Value *getRawSource() const { return getArgOperand(1); } + Value *getRawDest() const { + return const_cast(getArgOperand(ARG_DEST)); + } + const Use &getRawDestUse() const { return getArgOperandUse(ARG_DEST); } + Use &getRawDestUse() { return getArgOperandUse(ARG_DEST); } + + /// Return the arguments to the instruction. + Value *getRawSource() const { + return const_cast(getArgOperand(ARG_SOURCE)); + } + const Use &getRawSourceUse() const { return getArgOperandUse(ARG_SOURCE); } + Use &getRawSourceUse() { return getArgOperandUse(ARG_SOURCE); } + + Value *getLength() const { + return const_cast(getArgOperand(ARG_LENGTH)); + } + const Use &getLengthUse() const { return getArgOperandUse(ARG_LENGTH); } + Use &getLengthUse() { return getArgOperandUse(ARG_LENGTH); } - Value *getNumElements() const { return getArgOperand(2); } - void setNumElements(Value *V) { setArgOperand(2, V); } + ConstantInt *getVolatileCst() const { + return cast( + const_cast(getArgOperand(ARG_VOLATILE))); + } + + bool isVolatile() const { return !getVolatileCst()->isZero(); } + + Value *getRawElementSizeInBytes() const { + return const_cast(getArgOperand(ARG_ELEMENTSIZE)); + } - uint64_t getSrcAlignment() const { return getParamAlignment(0); } - uint64_t getDstAlignment() const { return getParamAlignment(1); } + ConstantInt *getElementSizeInBytesCst() const { + return cast(getRawElementSizeInBytes()); + } + + uint8_t getElementSizeInBytes() const { + return getElementSizeInBytesCst()->getZExtValue(); + } + + /// This is just like getRawDest, but it strips off any cast + /// instructions that feed it, giving the original input. The returned + /// value is guaranteed to be a pointer. + Value *getDest() const { return getRawDest()->stripPointerCasts(); } + + /// This is just like getRawSource, but it strips off any cast + /// instructions that feed it, giving the original input. The returned + /// value is guaranteed to be a pointer. + Value *getSource() const { return getRawSource()->stripPointerCasts(); } + + unsigned getDestAddressSpace() const { + return cast(getRawDest()->getType())->getAddressSpace(); + } + + unsigned getSourceAddressSpace() const { + return cast(getRawSource()->getType())->getAddressSpace(); + } + + /// Set the specified arguments of the instruction. + void setDest(Value *Ptr) { + assert(getRawDest()->getType() == Ptr->getType() && + "setDest called with pointer of wrong type!"); + setArgOperand(ARG_DEST, Ptr); + } + + void setSource(Value *Ptr) { + assert(getRawSource()->getType() == Ptr->getType() && + "setSource called with pointer of wrong type!"); + setArgOperand(ARG_SOURCE, Ptr); + } + + void setLength(Value *L) { + assert(getLength()->getType() == L->getType() && + "setLength called with value of wrong type!"); + setArgOperand(ARG_LENGTH, L); + } + + void setVolatile(Constant *V) { + assert(V->getType() == Type::getInt1Ty(getContext()) && + "setVolatile called with value of wrong type!"); + setArgOperand(ARG_VOLATILE, V); + } - uint64_t getElementSizeInBytes() const { - Value *Arg = getArgOperand(3); - return cast(Arg)->getZExtValue(); + void setElementSizeInBytes(Constant *V) { + assert(V->getType() == Type::getInt8Ty(getContext()) && + "setElementSizeInBytes called with value of wrong type!"); + setArgOperand(ARG_ELEMENTSIZE, V); } static inline bool classof(const IntrinsicInst *I) { - return I->getIntrinsicID() == Intrinsic::memcpy_element_atomic; + return I->getIntrinsicID() == Intrinsic::memcpy_element_unordered_atomic; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); Index: include/llvm/IR/Intrinsics.td =================================================================== --- include/llvm/IR/Intrinsics.td +++ include/llvm/IR/Intrinsics.td @@ -862,11 +862,17 @@ //===------ Memory intrinsics with element-wise atomicity guarantees ------===// // -def int_memcpy_element_atomic : Intrinsic<[], - [llvm_anyptr_ty, llvm_anyptr_ty, - llvm_i64_ty, llvm_i32_ty], - [IntrArgMemOnly, NoCapture<0>, NoCapture<1>, - WriteOnly<0>, ReadOnly<1>]>; +//llvm.memcpy.element.unordered.atomic(dest, src, length, volatile, elementsize) +def int_memcpy_element_unordered_atomic + : Intrinsic<[], + [ + llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty, + llvm_i1_ty, llvm_i8_ty + ], + [ + IntrArgMemOnly, NoCapture<0>, NoCapture<1>, WriteOnly<0>, + ReadOnly<1> + ]>; //===------------------------ Reduction Intrinsics ------------------------===// // Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp =================================================================== --- lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -4858,11 +4858,12 @@ updateDAGForMaybeTailCall(MM); return nullptr; } - case Intrinsic::memcpy_element_atomic: { - SDValue Dst = getValue(I.getArgOperand(0)); - SDValue Src = getValue(I.getArgOperand(1)); - SDValue NumElements = getValue(I.getArgOperand(2)); - SDValue ElementSize = getValue(I.getArgOperand(3)); + case Intrinsic::memcpy_element_unordered_atomic: { + const ElementUnorderedAtomicMemCpyInst &MI = + cast(I); + SDValue Dst = getValue(MI.getRawDest()); + SDValue Src = getValue(MI.getRawSource()); + SDValue Length = getValue(MI.getLength()); // Emit a library call. TargetLowering::ArgListTy Args; @@ -4874,18 +4875,13 @@ Entry.Node = Src; Args.push_back(Entry); - Entry.Ty = I.getArgOperand(2)->getType(); - Entry.Node = NumElements; + Entry.Ty = MI.getLength()->getType(); + Entry.Node = Length; Args.push_back(Entry); - Entry.Ty = Type::getInt32Ty(*DAG.getContext()); - Entry.Node = ElementSize; - Args.push_back(Entry); - - uint64_t ElementSizeConstant = - cast(I.getArgOperand(3))->getZExtValue(); + uint64_t ElementSizeConstant = MI.getElementSizeInBytes(); RTLIB::Libcall LibraryCall = - RTLIB::getMEMCPY_ELEMENT_ATOMIC(ElementSizeConstant); + RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(ElementSizeConstant); if (LibraryCall == RTLIB::UNKNOWN_LIBCALL) report_fatal_error("Unsupported element size"); Index: lib/CodeGen/TargetLoweringBase.cpp =================================================================== --- lib/CodeGen/TargetLoweringBase.cpp +++ lib/CodeGen/TargetLoweringBase.cpp @@ -374,11 +374,16 @@ Names[RTLIB::MEMCPY] = "memcpy"; Names[RTLIB::MEMMOVE] = "memmove"; Names[RTLIB::MEMSET] = "memset"; - Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_1] = "__llvm_memcpy_element_atomic_1"; - Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_2] = "__llvm_memcpy_element_atomic_2"; - Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_4] = "__llvm_memcpy_element_atomic_4"; - Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_8] = "__llvm_memcpy_element_atomic_8"; - Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_16] = "__llvm_memcpy_element_atomic_16"; + Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_1] = + "__llvm_memcpy_element_unordered_atomic_1"; + Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_2] = + "__llvm_memcpy_element_unordered_atomic_2"; + Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_4] = + "__llvm_memcpy_element_unordered_atomic_4"; + Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_8] = + "__llvm_memcpy_element_unordered_atomic_8"; + Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_16] = + "__llvm_memcpy_element_unordered_atomic_16"; Names[RTLIB::UNWIND_RESUME] = "_Unwind_Resume"; Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_1] = "__sync_val_compare_and_swap_1"; Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_2] = "__sync_val_compare_and_swap_2"; @@ -781,22 +786,21 @@ return UNKNOWN_LIBCALL; } -RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize) { +RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize) { switch (ElementSize) { case 1: - return MEMCPY_ELEMENT_ATOMIC_1; + return MEMCPY_ELEMENT_UNORDERED_ATOMIC_1; case 2: - return MEMCPY_ELEMENT_ATOMIC_2; + return MEMCPY_ELEMENT_UNORDERED_ATOMIC_2; case 4: - return MEMCPY_ELEMENT_ATOMIC_4; + return MEMCPY_ELEMENT_UNORDERED_ATOMIC_4; case 8: - return MEMCPY_ELEMENT_ATOMIC_8; + return MEMCPY_ELEMENT_UNORDERED_ATOMIC_8; case 16: - return MEMCPY_ELEMENT_ATOMIC_16; + return MEMCPY_ELEMENT_UNORDERED_ATOMIC_16; default: return UNKNOWN_LIBCALL; } - } /// InitCmpLibcallCCs - Set default comparison libcall CC. Index: lib/IR/Verifier.cpp =================================================================== --- lib/IR/Verifier.cpp +++ lib/IR/Verifier.cpp @@ -4004,10 +4004,16 @@ CS); break; } - case Intrinsic::memcpy_element_atomic: { - ConstantInt *ElementSizeCI = dyn_cast(CS.getArgOperand(3)); - Assert(ElementSizeCI, "element size of the element-wise atomic memory " - "intrinsic must be a constant int", + case Intrinsic::memcpy_element_unordered_atomic: { + const ElementUnorderedAtomicMemCpyInst *MI = + cast(CS.getInstruction()); + ; + + ConstantInt *ElementSizeCI = + dyn_cast(MI->getRawElementSizeInBytes()); + Assert(ElementSizeCI, + "element size of the element-wise unordered atomic memory " + "intrinsic must be a constant int", CS); const APInt &ElementSizeVal = ElementSizeCI->getValue(); Assert(ElementSizeVal.isPowerOf2(), @@ -4018,16 +4024,12 @@ auto IsValidAlignment = [&](uint64_t Alignment) { return isPowerOf2_64(Alignment) && ElementSizeVal.ule(Alignment); }; - uint64_t DstAlignment = CS.getParamAlignment(0), SrcAlignment = CS.getParamAlignment(1); - Assert(IsValidAlignment(DstAlignment), - "incorrect alignment of the destination argument", - CS); + "incorrect alignment of the destination argument", CS); Assert(IsValidAlignment(SrcAlignment), - "incorrect alignment of the source argument", - CS); + "incorrect alignment of the source argument", CS); break; } case Intrinsic::gcroot: Index: lib/Transforms/InstCombine/InstCombineCalls.cpp =================================================================== --- lib/Transforms/InstCombine/InstCombineCalls.cpp +++ lib/Transforms/InstCombine/InstCombineCalls.cpp @@ -94,23 +94,25 @@ return ConstantVector::get(BoolVec); } -Instruction * -InstCombiner::SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI) { +Instruction *InstCombiner::SimplifyElementUnorderedAtomicMemCpy( + ElementUnorderedAtomicMemCpyInst *AMI) { // Try to unfold this intrinsic into sequence of explicit atomic loads and // stores. // First check that number of elements is compile time constant. - auto *NumElementsCI = dyn_cast(AMI->getNumElements()); - if (!NumElementsCI) + auto *LengthCI = dyn_cast(AMI->getLength()); + if (!LengthCI) return nullptr; // Check that there are not too many elements. - uint64_t NumElements = NumElementsCI->getZExtValue(); + uint64_t LengthInBytes = LengthCI->getZExtValue(); + uint8_t ElementSizeInBytes = AMI->getElementSizeInBytes(); + uint64_t NumElements = LengthInBytes / ElementSizeInBytes; if (NumElements >= UnfoldElementAtomicMemcpyMaxElements) return nullptr; // Don't unfold into illegal integers - uint64_t ElementSizeInBytes = AMI->getElementSizeInBytes() * 8; - if (!getDataLayout().isLegalInteger(ElementSizeInBytes)) + uint64_t ElementSizeInBits = ElementSizeInBytes * 8; + if (!getDataLayout().isLegalInteger(ElementSizeInBits)) return nullptr; // Cast source and destination to the correct type. Intrinsic input arguments @@ -120,8 +122,9 @@ // other InstCombine rules which will cover trivial cases anyway. Value *Src = AMI->getRawSource(); Value *Dst = AMI->getRawDest(); - Type *ElementPointerType = Type::getIntNPtrTy( - AMI->getContext(), ElementSizeInBytes, Src->getType()->getPointerAddressSpace()); + Type *ElementPointerType = + Type::getIntNPtrTy(AMI->getContext(), ElementSizeInBits, + Src->getType()->getPointerAddressSpace()); Value *SrcCasted = Builder->CreatePointerCast(Src, ElementPointerType, "memcpy_unfold.src_casted"); @@ -148,21 +151,20 @@ // aligned. // TODO: We can infer better alignment but there is no evidence that this // will matter. - Load->setAlignment(i == 0 ? AMI->getSrcAlignment() - : AMI->getElementSizeInBytes()); + Load->setAlignment(i == 0 ? AMI->getParamAlignment(1) : ElementSizeInBytes); Load->setDebugLoc(AMI->getDebugLoc()); // Store loaded value via unordered atomic store. StoreInst *Store = Builder->CreateStore(Load, DstElementAddr); Store->setOrdering(AtomicOrdering::Unordered); - Store->setAlignment(i == 0 ? AMI->getDstAlignment() - : AMI->getElementSizeInBytes()); + Store->setAlignment(i == 0 ? AMI->getParamAlignment(0) + : ElementSizeInBytes); Store->setDebugLoc(AMI->getDebugLoc()); } // Set the number of elements of the copy to 0, it will be deleted on the // next iteration. - AMI->setNumElements(Constant::getNullValue(NumElementsCI->getType())); + AMI->setLength(Constant::getNullValue(LengthCI->getType())); return AMI; } @@ -1890,12 +1892,12 @@ if (Changed) return II; } - if (auto *AMI = dyn_cast(II)) { - if (Constant *C = dyn_cast(AMI->getNumElements())) + if (auto *AMI = dyn_cast(II)) { + if (Constant *C = dyn_cast(AMI->getLength())) if (C->isNullValue()) return eraseInstFromFunction(*AMI); - if (Instruction *I = SimplifyElementAtomicMemCpy(AMI)) + if (Instruction *I = SimplifyElementUnorderedAtomicMemCpy(AMI)) return I; } Index: lib/Transforms/InstCombine/InstCombineInternal.h =================================================================== --- lib/Transforms/InstCombine/InstCombineInternal.h +++ lib/Transforms/InstCombine/InstCombineInternal.h @@ -724,7 +724,8 @@ Instruction *MatchBSwap(BinaryOperator &I); bool SimplifyStoreAtEndOfBlock(StoreInst &SI); - Instruction *SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI); + Instruction * + SimplifyElementUnorderedAtomicMemCpy(ElementUnorderedAtomicMemCpyInst *AMI); Instruction *SimplifyMemTransfer(MemIntrinsic *MI); Instruction *SimplifyMemSet(MemSetInst *MI); Index: test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll =================================================================== --- test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll +++ test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll @@ -2,47 +2,47 @@ define i8* @test_memcpy1(i8* %P, i8* %Q) { ; CHECK: test_memcpy - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 1, i32 1) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i1 0, i8 1) ret i8* %P + ; 3rd arg (%edx) -- length ; CHECK-DAG: movl $1, %edx - ; CHECK-DAG: movl $1, %ecx - ; CHECK: __llvm_memcpy_element_atomic_1 + ; CHECK: __llvm_memcpy_element_unordered_atomic_1 } define i8* @test_memcpy2(i8* %P, i8* %Q) { ; CHECK: test_memcpy2 - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 2, i32 2) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 2, i1 0, i8 2) ret i8* %P + ; 3rd arg (%edx) -- length ; CHECK-DAG: movl $2, %edx - ; CHECK-DAG: movl $2, %ecx - ; CHECK: __llvm_memcpy_element_atomic_2 + ; CHECK: __llvm_memcpy_element_unordered_atomic_2 } define i8* @test_memcpy4(i8* %P, i8* %Q) { ; CHECK: test_memcpy4 - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 4, i32 4) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 4, i1 0, i8 4) ret i8* %P + ; 3rd arg (%edx) -- length ; CHECK-DAG: movl $4, %edx - ; CHECK-DAG: movl $4, %ecx - ; CHECK: __llvm_memcpy_element_atomic_4 + ; CHECK: __llvm_memcpy_element_unordered_atomic_4 } define i8* @test_memcpy8(i8* %P, i8* %Q) { ; CHECK: test_memcpy8 - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 8 %P, i8* align 8 %Q, i64 8, i32 8) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %P, i8* align 8 %Q, i32 8, i1 0, i8 8) ret i8* %P + ; 3rd arg (%edx) -- length ; CHECK-DAG: movl $8, %edx - ; CHECK-DAG: movl $8, %ecx - ; CHECK: __llvm_memcpy_element_atomic_8 + ; CHECK: __llvm_memcpy_element_unordered_atomic_8 } define i8* @test_memcpy16(i8* %P, i8* %Q) { ; CHECK: test_memcpy16 - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %P, i8* align 16 %Q, i64 16, i32 16) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %P, i8* align 16 %Q, i32 16, i1 0, i8 16) ret i8* %P + ; 3rd arg (%edx) -- length ; CHECK-DAG: movl $16, %edx - ; CHECK-DAG: movl $16, %ecx - ; CHECK: __llvm_memcpy_element_atomic_16 + ; CHECK: __llvm_memcpy_element_unordered_atomic_16 } define void @test_memcpy_args(i8** %Storage) { @@ -51,18 +51,15 @@ %Src.addr = getelementptr i8*, i8** %Storage, i64 1 %Src = load i8*, i8** %Src.addr - ; First argument + ; 1st arg (%rdi) ; CHECK-DAG: movq (%rdi), [[REG1:%r.+]] ; CHECK-DAG: movq [[REG1]], %rdi - ; Second argument + ; 2nd arg (%rsi) ; CHECK-DAG: movq 8(%rdi), %rsi - ; Third argument + ; 3rd arg (%edx) -- length ; CHECK-DAG: movl $4, %edx - ; Fourth argument - ; CHECK-DAG: movl $4, %ecx - ; CHECK: __llvm_memcpy_element_atomic_4 - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 4, i32 4) - ret void + ; CHECK: __llvm_memcpy_element_unordered_atomic_4 + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %Dst, i8* align 4 %Src, i32 4, i1 0, i8 4) ret void } -declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind +declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i1, i8) nounwind Index: test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll =================================================================== --- test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll +++ test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll @@ -1,10 +1,11 @@ ; RUN: opt -instcombine -unfold-element-atomic-memcpy-max-elements=8 -S < %s | FileCheck %s +; Temporarily an expected failure until inst combine is updated in the next patch target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" -; Test basic unfolding -define void @test1(i8* %Src, i8* %Dst) { -; CHECK-LABEL: test1 -; CHECK-NOT: llvm.memcpy.element.atomic +; Test basic unfolding -- unordered load & store +define void @test1a(i8* %Src, i8* %Dst) { +; CHECK-LABEL: test1a +; CHECK-NOT: llvm.memcpy.element.unordered.atomic ; CHECK-DAG: %memcpy_unfold.src_casted = bitcast i8* %Src to i32* ; CHECK-DAG: %memcpy_unfold.dst_casted = bitcast i8* %Dst to i32* @@ -21,7 +22,7 @@ ; CHECK-DAG: [[VAL4:%[^\s]+]] = load atomic i32, i32* %{{[^\s]+}} unordered, align 4 ; CHECK-DAG: store atomic i32 [[VAL4]], i32* %{{[^\s]+}} unordered, align 4 entry: - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 8 %Src, i64 4, i32 4) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %Dst, i8* align 4 %Src, i32 16, i1 false, i8 4) ret void } @@ -31,9 +32,9 @@ ; CHECK-NOT: load ; CHECK-NOT: store -; CHECK: llvm.memcpy.element.atomic +; CHECK: llvm.memcpy.element.unordered.atomic entry: - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 1000, i32 4) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %Dst, i8* align 4 %Src, i32 256, i1 false, i8 4) ret void } @@ -43,16 +44,16 @@ ; CHECK-NOT: load ; CHECK-NOT: store -; CHECK: llvm.memcpy.element.atomic +; CHECK: llvm.memcpy.element.unordered.atomic entry: - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 4, i32 64) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 64, i1 false, i8 64) ret void } ; Test that we will eliminate redundant bitcasts define void @test4(i64* %Src, i64* %Dst) { ; CHECK-LABEL: test4 -; CHECK-NOT: llvm.memcpy.element.atomic +; CHECK-NOT: llvm.memcpy.element.unordered.atomic ; CHECK-NOT: bitcast @@ -76,17 +77,18 @@ entry: %Src.casted = bitcast i64* %Src to i8* %Dst.casted = bitcast i64* %Dst to i8* - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %Dst.casted, i8* align 16 %Src.casted, i64 4, i32 8) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %Dst.casted, i8* align 16 %Src.casted, i32 32, i1 false, i8 8) ret void } +; Test that 0-length unordered atomic memcpy gets removed. define void @test5(i8* %Src, i8* %Dst) { ; CHECK-LABEL: test5 -; CHECK-NOT: llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 0, i32 64) +; CHECK-NOT: llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 0, i1 false, i8 8) entry: - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 0, i32 64) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 0, i1 false, i8 8) ret void } -declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) +declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i1, i8) nounwind Index: test/Verifier/element-wise-atomic-memory-intrinsics.ll =================================================================== --- test/Verifier/element-wise-atomic-memory-intrinsics.ll +++ test/Verifier/element-wise-atomic-memory-intrinsics.ll @@ -1,17 +1,22 @@ ; RUN: not opt -verify < %s 2>&1 | FileCheck %s -define void @test_memcpy(i8* %P, i8* %Q) { +define void @test_memcpy(i8* %P, i8* %Q, i32 %A, i8 %E, i1 %V) { + ; CHECK: element size of the element-wise unordered atomic memory intrinsic must be a constant int + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i1 0, i8 %E) ; CHECK: element size of the element-wise atomic memory intrinsic must be a power of 2 - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 2 %Q, i64 4, i32 3) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i1 0, i8 3) ; CHECK: incorrect alignment of the destination argument - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 4 %Q, i64 4, i32 4) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* %P, i8* align 4 %Q, i32 1, i1 0, i8 1) + ; CHECK: incorrect alignment of the destination argument + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 1 %P, i8* align 4 %Q, i32 4, i1 0, i8 4) ; CHECK: incorrect alignment of the source argument - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 2 %Q, i64 4, i32 4) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* %Q, i32 1, i1 0, i8 1) + ; CHECK: incorrect alignment of the source argument + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 1 %Q, i32 4, i1 0, i8 4) ret void } -declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind - +declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i1, i8) nounwind ; CHECK: input module is broken!