Index: llvm/trunk/docs/LangRef.rst
===================================================================
--- llvm/trunk/docs/LangRef.rst
+++ llvm/trunk/docs/LangRef.rst
@@ -14068,62 +14068,66 @@
 These intrinsics are similar to the standard library memory intrinsics except
 that they perform memory transfer as a sequence of atomic memory accesses.
 
-.. _int_memcpy_element_atomic:
+.. _int_memcpy_element_unordered_atomic:
 
-'``llvm.memcpy.element.atomic``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.memcpy.element.unordered.atomic``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Syntax:
 """""""
 
-This is an overloaded intrinsic. You can use ``llvm.memcpy.element.atomic`` on
+This is an overloaded intrinsic. You can use ``llvm.memcpy.element.unordered.atomic`` on
 any integer bit width and for different address spaces. Not all targets
 support all bit widths however.
 
 ::
 
-      declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* <dest>, i8* <src>,
-                                                          i64 <num_elements>, i32 <element_size>)
+      declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* <dest>,
+                                                                       i8* <src>,
+                                                                       i32 <len>,
+                                                                       i32 <element_size>)
+      declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* <dest>,
+                                                                       i8* <src>,
+                                                                       i64 <len>,
+                                                                       i32 <element_size>)
 
 Overview:
 """""""""
 
-The '``llvm.memcpy.element.atomic.*``' intrinsic performs copy of a block of
-memory from the source location to the destination location as a sequence of
-unordered atomic memory accesses where each access is a multiple of
-``element_size`` bytes wide and aligned at an element size boundary. For example
-each element is accessed atomically in source and destination buffers.
+The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic is a specialization of the
+'``llvm.memcpy.*``' intrinsic. It differs in that the ``dest`` and ``src`` are treated
+as arrays with elements that are exactly ``element_size`` bytes, and the copy between
+buffers uses a sequence of :ref:`unordered atomic <ordering>` load/store operations
+that are a positive integer multiple of the ``element_size`` in size.
 
 Arguments:
 """"""""""
 
-The first argument is a pointer to the destination, the second is a
-pointer to the source. The third argument is an integer argument
-specifying the number of elements to copy, the fourth argument is size of
-the single element in bytes.
+The first three arguments are the same as they are in the :ref:`@llvm.memcpy <int_memcpy>`
+intrinsic, with the added constraint that ``len`` is required to be a positive integer
+multiple of the ``element_size``. If ``len`` is not a positive integer multiple of
+``element_size``, then the behaviour of the intrinsic is undefined.
 
-``element_size`` should be a power of two, greater than zero and less than
-a target-specific atomic access size limit.
+``element_size`` must be a compile-time constant positive power of two no greater than
+a target-specific atomic access size limit.
 
-For each of the input pointers ``align`` parameter attribute must be specified.
-It must be a power of two and greater than or equal to the ``element_size``.
-Caller guarantees that both the source and destination pointers are aligned to
-that boundary.
+For each of the input pointers, the ``align`` parameter attribute must be specified.
+It must be a power of two no less than the ``element_size``. The caller guarantees
+that both the source and destination pointers are aligned to that boundary.
 
 Semantics:
 """"""""""
 
-The '``llvm.memcpy.element.atomic.*``' intrinsic copies
-'``num_elements`` * ``element_size``' bytes of memory from the source location to
-the destination location. These locations are not allowed to overlap. Memory copy
-is performed as a sequence of unordered atomic memory accesses where each access
-is guaranteed to be a multiple of ``element_size`` bytes wide and aligned at an
-element size boundary.
+The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of
+memory from the source location to the destination location. These locations are not
+allowed to overlap. The memory copy is performed as a sequence of load/store operations
+where each access is guaranteed to be a multiple of ``element_size`` bytes wide and
+aligned at an ``element_size`` boundary.
 
 The order of the copy is unspecified. The same value may be read from the source
 buffer many times, but only one write is issued to the destination buffer per
-element. It is well defined to have concurrent reads and writes to both source
-and destination provided those reads and writes are at least unordered atomic.
+element. It is well defined to have concurrent reads and writes to both source and
+destination provided those reads and writes are unordered atomic when specified.
 
 This intrinsic does not provide any additional ordering guarantees over those
 provided by a set of unordered loads from the source location and stores to the
@@ -14132,8 +14136,8 @@
 Lowering:
 """""""""
 
-In the most general case call to the '``llvm.memcpy.element.atomic.*``' is lowered
-to a call to the symbol ``__llvm_memcpy_element_atomic_*``. Where '*' is replaced
-with an actual element size.
+In the most general case, a call to '``llvm.memcpy.element.unordered.atomic.*``' is
+lowered to a call to the symbol ``__llvm_memcpy_element_unordered_atomic_*``, where
+'*' is replaced with the actual element size.
 
-Optimizer is allowed to inline memory copy when it's profitable to do so.
+The optimizer is allowed to inline the memory copy when it's profitable to do so.
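For illustration only (this snippet is not part of the patch; the value names are hypothetical), a well-formed call with the new signature copies ``len`` bytes as unordered-atomic elements, with ``len`` (64) a multiple of the element size (4) and both pointers carrying an ``align`` attribute no smaller than the element size:

::

      declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32)

      ; copies 64 bytes as sixteen unordered-atomic 4-byte accesses
      call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %dst, i8* align 4 %src,
                                                                    i32 64, i32 4)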
Index: llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h
===================================================================
--- llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h
+++ llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h
@@ -333,12 +333,12 @@
     MEMSET,
     MEMMOVE,
 
-    // ELEMENT-WISE ATOMIC MEMORY
-    MEMCPY_ELEMENT_ATOMIC_1,
-    MEMCPY_ELEMENT_ATOMIC_2,
-    MEMCPY_ELEMENT_ATOMIC_4,
-    MEMCPY_ELEMENT_ATOMIC_8,
-    MEMCPY_ELEMENT_ATOMIC_16,
+    // ELEMENT-WISE UNORDERED-ATOMIC MEMORY of different element sizes
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_1,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_2,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_4,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_8,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_16,
 
     // EXCEPTION HANDLING
     UNWIND_RESUME,
@@ -511,9 +511,10 @@
   /// UNKNOWN_LIBCALL if there is none.
   Libcall getSYNC(unsigned Opc, MVT VT);
 
-  /// getMEMCPY_ELEMENT_ATOMIC - Return MEMCPY_ELEMENT_ATOMIC_* value for the
-  /// given element size or UNKNOW_LIBCALL if there is none.
-  Libcall getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize);
+  /// getMEMCPY_ELEMENT_UNORDERED_ATOMIC - Return
+  /// MEMCPY_ELEMENT_UNORDERED_ATOMIC_* value for the given element size or
+  /// UNKNOWN_LIBCALL if there is none.
+  Libcall getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize);
 }
 }
Index: llvm/trunk/include/llvm/IR/IRBuilder.h
===================================================================
--- llvm/trunk/include/llvm/IR/IRBuilder.h
+++ llvm/trunk/include/llvm/IR/IRBuilder.h
@@ -435,27 +435,25 @@
                         MDNode *ScopeTag = nullptr,
                         MDNode *NoAliasTag = nullptr);
 
-  /// \brief Create and insert an atomic memcpy between the specified
-  /// pointers.
+ /// \brief Create and insert an element unordered-atomic memcpy between the + /// specified pointers. /// /// If the pointers aren't i8*, they will be converted. If a TBAA tag is /// specified, it will be added to the instruction. Likewise with alias.scope /// and noalias tags. - CallInst *CreateElementAtomicMemCpy( - Value *Dst, Value *Src, uint64_t NumElements, uint32_t ElementSize, + CallInst *CreateElementUnorderedAtomicMemCpy( + Value *Dst, Value *Src, uint64_t Size, uint32_t ElementSize, MDNode *TBAATag = nullptr, MDNode *TBAAStructTag = nullptr, MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr) { - return CreateElementAtomicMemCpy(Dst, Src, getInt64(NumElements), - ElementSize, TBAATag, TBAAStructTag, - ScopeTag, NoAliasTag); + return CreateElementUnorderedAtomicMemCpy( + Dst, Src, getInt64(Size), ElementSize, TBAATag, TBAAStructTag, ScopeTag, + NoAliasTag); } - CallInst *CreateElementAtomicMemCpy(Value *Dst, Value *Src, - Value *NumElements, uint32_t ElementSize, - MDNode *TBAATag = nullptr, - MDNode *TBAAStructTag = nullptr, - MDNode *ScopeTag = nullptr, - MDNode *NoAliasTag = nullptr); + CallInst *CreateElementUnorderedAtomicMemCpy( + Value *Dst, Value *Src, Value *Size, uint32_t ElementSize, + MDNode *TBAATag = nullptr, MDNode *TBAAStructTag = nullptr, + MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr); /// \brief Create and insert a memmove between the specified /// pointers. Index: llvm/trunk/include/llvm/IR/IntrinsicInst.h =================================================================== --- llvm/trunk/include/llvm/IR/IntrinsicInst.h +++ llvm/trunk/include/llvm/IR/IntrinsicInst.h @@ -205,25 +205,91 @@ }; /// This class represents atomic memcpy intrinsic - /// TODO: Integrate this class into MemIntrinsic hierarchy. - class ElementAtomicMemCpyInst : public IntrinsicInst { + /// TODO: Integrate this class into MemIntrinsic hierarchy; for now this is + /// C&P of all methods from that hierarchy + class ElementUnorderedAtomicMemCpyInst : public IntrinsicInst { + private: + enum { ARG_DEST = 0, ARG_SOURCE = 1, ARG_LENGTH = 2, ARG_ELEMENTSIZE = 3 }; + public: - Value *getRawDest() const { return getArgOperand(0); } - Value *getRawSource() const { return getArgOperand(1); } + Value *getRawDest() const { + return const_cast(getArgOperand(ARG_DEST)); + } + const Use &getRawDestUse() const { return getArgOperandUse(ARG_DEST); } + Use &getRawDestUse() { return getArgOperandUse(ARG_DEST); } + + /// Return the arguments to the instruction. + Value *getRawSource() const { + return const_cast(getArgOperand(ARG_SOURCE)); + } + const Use &getRawSourceUse() const { return getArgOperandUse(ARG_SOURCE); } + Use &getRawSourceUse() { return getArgOperandUse(ARG_SOURCE); } + + Value *getLength() const { + return const_cast(getArgOperand(ARG_LENGTH)); + } + const Use &getLengthUse() const { return getArgOperandUse(ARG_LENGTH); } + Use &getLengthUse() { return getArgOperandUse(ARG_LENGTH); } + + bool isVolatile() const { return false; } + + Value *getRawElementSizeInBytes() const { + return const_cast(getArgOperand(ARG_ELEMENTSIZE)); + } + + ConstantInt *getElementSizeInBytesCst() const { + return cast(getRawElementSizeInBytes()); + } + + uint32_t getElementSizeInBytes() const { + return getElementSizeInBytesCst()->getZExtValue(); + } + + /// This is just like getRawDest, but it strips off any cast + /// instructions that feed it, giving the original input. The returned + /// value is guaranteed to be a pointer. 
+ Value *getDest() const { return getRawDest()->stripPointerCasts(); } + + /// This is just like getRawSource, but it strips off any cast + /// instructions that feed it, giving the original input. The returned + /// value is guaranteed to be a pointer. + Value *getSource() const { return getRawSource()->stripPointerCasts(); } + + unsigned getDestAddressSpace() const { + return cast(getRawDest()->getType())->getAddressSpace(); + } - Value *getNumElements() const { return getArgOperand(2); } - void setNumElements(Value *V) { setArgOperand(2, V); } + unsigned getSourceAddressSpace() const { + return cast(getRawSource()->getType())->getAddressSpace(); + } - uint64_t getSrcAlignment() const { return getParamAlignment(0); } - uint64_t getDstAlignment() const { return getParamAlignment(1); } + /// Set the specified arguments of the instruction. + void setDest(Value *Ptr) { + assert(getRawDest()->getType() == Ptr->getType() && + "setDest called with pointer of wrong type!"); + setArgOperand(ARG_DEST, Ptr); + } + + void setSource(Value *Ptr) { + assert(getRawSource()->getType() == Ptr->getType() && + "setSource called with pointer of wrong type!"); + setArgOperand(ARG_SOURCE, Ptr); + } + + void setLength(Value *L) { + assert(getLength()->getType() == L->getType() && + "setLength called with value of wrong type!"); + setArgOperand(ARG_LENGTH, L); + } - uint64_t getElementSizeInBytes() const { - Value *Arg = getArgOperand(3); - return cast(Arg)->getZExtValue(); + void setElementSizeInBytes(Constant *V) { + assert(V->getType() == Type::getInt8Ty(getContext()) && + "setElementSizeInBytes called with value of wrong type!"); + setArgOperand(ARG_ELEMENTSIZE, V); } static inline bool classof(const IntrinsicInst *I) { - return I->getIntrinsicID() == Intrinsic::memcpy_element_atomic; + return I->getIntrinsicID() == Intrinsic::memcpy_element_unordered_atomic; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); Index: llvm/trunk/include/llvm/IR/Intrinsics.td =================================================================== --- llvm/trunk/include/llvm/IR/Intrinsics.td +++ llvm/trunk/include/llvm/IR/Intrinsics.td @@ -862,11 +862,16 @@ //===------ Memory intrinsics with element-wise atomicity guarantees ------===// // -def int_memcpy_element_atomic : Intrinsic<[], - [llvm_anyptr_ty, llvm_anyptr_ty, - llvm_i64_ty, llvm_i32_ty], - [IntrArgMemOnly, NoCapture<0>, NoCapture<1>, - WriteOnly<0>, ReadOnly<1>]>; +// @llvm.memcpy.element.unordered.atomic.*(dest, src, length, elementsize) +def int_memcpy_element_unordered_atomic + : Intrinsic<[], + [ + llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty, llvm_i32_ty + ], + [ + IntrArgMemOnly, NoCapture<0>, NoCapture<1>, WriteOnly<0>, + ReadOnly<1> + ]>; //===------------------------ Reduction Intrinsics ------------------------===// // Index: llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp =================================================================== --- llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -4943,11 +4943,12 @@ updateDAGForMaybeTailCall(MM); return nullptr; } - case Intrinsic::memcpy_element_atomic: { - SDValue Dst = getValue(I.getArgOperand(0)); - SDValue Src = getValue(I.getArgOperand(1)); - SDValue NumElements = getValue(I.getArgOperand(2)); - SDValue ElementSize = getValue(I.getArgOperand(3)); + case Intrinsic::memcpy_element_unordered_atomic: { + const ElementUnorderedAtomicMemCpyInst &MI = + cast(I); + SDValue Dst = 
getValue(MI.getRawDest()); + SDValue Src = getValue(MI.getRawSource()); + SDValue Length = getValue(MI.getLength()); // Emit a library call. TargetLowering::ArgListTy Args; @@ -4959,18 +4960,13 @@ Entry.Node = Src; Args.push_back(Entry); - Entry.Ty = I.getArgOperand(2)->getType(); - Entry.Node = NumElements; + Entry.Ty = MI.getLength()->getType(); + Entry.Node = Length; Args.push_back(Entry); - Entry.Ty = Type::getInt32Ty(*DAG.getContext()); - Entry.Node = ElementSize; - Args.push_back(Entry); - - uint64_t ElementSizeConstant = - cast(I.getArgOperand(3))->getZExtValue(); + uint64_t ElementSizeConstant = MI.getElementSizeInBytes(); RTLIB::Libcall LibraryCall = - RTLIB::getMEMCPY_ELEMENT_ATOMIC(ElementSizeConstant); + RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(ElementSizeConstant); if (LibraryCall == RTLIB::UNKNOWN_LIBCALL) report_fatal_error("Unsupported element size"); Index: llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp =================================================================== --- llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp +++ llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp @@ -374,11 +374,16 @@ Names[RTLIB::MEMCPY] = "memcpy"; Names[RTLIB::MEMMOVE] = "memmove"; Names[RTLIB::MEMSET] = "memset"; - Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_1] = "__llvm_memcpy_element_atomic_1"; - Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_2] = "__llvm_memcpy_element_atomic_2"; - Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_4] = "__llvm_memcpy_element_atomic_4"; - Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_8] = "__llvm_memcpy_element_atomic_8"; - Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_16] = "__llvm_memcpy_element_atomic_16"; + Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_1] = + "__llvm_memcpy_element_unordered_atomic_1"; + Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_2] = + "__llvm_memcpy_element_unordered_atomic_2"; + Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_4] = + "__llvm_memcpy_element_unordered_atomic_4"; + Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_8] = + "__llvm_memcpy_element_unordered_atomic_8"; + Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_16] = + "__llvm_memcpy_element_unordered_atomic_16"; Names[RTLIB::UNWIND_RESUME] = "_Unwind_Resume"; Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_1] = "__sync_val_compare_and_swap_1"; Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_2] = "__sync_val_compare_and_swap_2"; @@ -781,22 +786,21 @@ return UNKNOWN_LIBCALL; } -RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize) { +RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize) { switch (ElementSize) { case 1: - return MEMCPY_ELEMENT_ATOMIC_1; + return MEMCPY_ELEMENT_UNORDERED_ATOMIC_1; case 2: - return MEMCPY_ELEMENT_ATOMIC_2; + return MEMCPY_ELEMENT_UNORDERED_ATOMIC_2; case 4: - return MEMCPY_ELEMENT_ATOMIC_4; + return MEMCPY_ELEMENT_UNORDERED_ATOMIC_4; case 8: - return MEMCPY_ELEMENT_ATOMIC_8; + return MEMCPY_ELEMENT_UNORDERED_ATOMIC_8; case 16: - return MEMCPY_ELEMENT_ATOMIC_16; + return MEMCPY_ELEMENT_UNORDERED_ATOMIC_16; default: return UNKNOWN_LIBCALL; } - } /// InitCmpLibcallCCs - Set default comparison libcall CC. 
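For reference, the SelectionDAGBuilder change above passes only the destination, source, and length to the libcall; the element size no longer travels as an argument and instead selects the symbol via ``getMEMCPY_ELEMENT_UNORDERED_ATOMIC``. Expressed as IR purely for illustration (the lowering actually happens in the SelectionDAG, and the declaration of the runtime helper below is a sketch, not something the patch provides):

::

      ; before lowering: element size is the constant i32 4
      call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q,
                                                                    i32 %Len, i32 4)

      ; conceptual result: the element-size operand is dropped; it is encoded in the symbol name
      declare void @__llvm_memcpy_element_unordered_atomic_4(i8*, i8*, i32)
      call void @__llvm_memcpy_element_unordered_atomic_4(i8* %P, i8* %Q, i32 %Len)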
Index: llvm/trunk/lib/IR/IRBuilder.cpp =================================================================== --- llvm/trunk/lib/IR/IRBuilder.cpp +++ llvm/trunk/lib/IR/IRBuilder.cpp @@ -134,18 +134,17 @@ return CI; } -CallInst *IRBuilderBase::CreateElementAtomicMemCpy( - Value *Dst, Value *Src, Value *NumElements, uint32_t ElementSize, - MDNode *TBAATag, MDNode *TBAAStructTag, MDNode *ScopeTag, - MDNode *NoAliasTag) { +CallInst *IRBuilderBase::CreateElementUnorderedAtomicMemCpy( + Value *Dst, Value *Src, Value *Size, uint32_t ElementSize, MDNode *TBAATag, + MDNode *TBAAStructTag, MDNode *ScopeTag, MDNode *NoAliasTag) { Dst = getCastedInt8PtrValue(Dst); Src = getCastedInt8PtrValue(Src); - Value *Ops[] = {Dst, Src, NumElements, getInt32(ElementSize)}; - Type *Tys[] = {Dst->getType(), Src->getType()}; + Value *Ops[] = {Dst, Src, Size, getInt32(ElementSize)}; + Type *Tys[] = {Dst->getType(), Src->getType(), Size->getType()}; Module *M = BB->getParent()->getParent(); - Value *TheFn = - Intrinsic::getDeclaration(M, Intrinsic::memcpy_element_atomic, Tys); + Value *TheFn = Intrinsic::getDeclaration( + M, Intrinsic::memcpy_element_unordered_atomic, Tys); CallInst *CI = createCallHelper(TheFn, Ops, this); Index: llvm/trunk/lib/IR/Verifier.cpp =================================================================== --- llvm/trunk/lib/IR/Verifier.cpp +++ llvm/trunk/lib/IR/Verifier.cpp @@ -4012,10 +4012,16 @@ CS); break; } - case Intrinsic::memcpy_element_atomic: { - ConstantInt *ElementSizeCI = dyn_cast(CS.getArgOperand(3)); - Assert(ElementSizeCI, "element size of the element-wise atomic memory " - "intrinsic must be a constant int", + case Intrinsic::memcpy_element_unordered_atomic: { + const ElementUnorderedAtomicMemCpyInst *MI = + cast(CS.getInstruction()); + ; + + ConstantInt *ElementSizeCI = + dyn_cast(MI->getRawElementSizeInBytes()); + Assert(ElementSizeCI, + "element size of the element-wise unordered atomic memory " + "intrinsic must be a constant int", CS); const APInt &ElementSizeVal = ElementSizeCI->getValue(); Assert(ElementSizeVal.isPowerOf2(), @@ -4023,19 +4029,24 @@ "must be a power of 2", CS); + if (auto *LengthCI = dyn_cast(MI->getLength())) { + uint64_t Length = LengthCI->getZExtValue(); + uint64_t ElementSize = MI->getElementSizeInBytes(); + Assert((Length % ElementSize) == 0, + "constant length must be a multiple of the element size in the " + "element-wise atomic memory intrinsic", + CS); + } + auto IsValidAlignment = [&](uint64_t Alignment) { return isPowerOf2_64(Alignment) && ElementSizeVal.ule(Alignment); }; - uint64_t DstAlignment = CS.getParamAlignment(0), SrcAlignment = CS.getParamAlignment(1); - Assert(IsValidAlignment(DstAlignment), - "incorrect alignment of the destination argument", - CS); + "incorrect alignment of the destination argument", CS); Assert(IsValidAlignment(SrcAlignment), - "incorrect alignment of the source argument", - CS); + "incorrect alignment of the source argument", CS); break; } case Intrinsic::gcroot: Index: llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp =================================================================== --- llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp +++ llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp @@ -94,75 +94,80 @@ return ConstantVector::get(BoolVec); } -Instruction * -InstCombiner::SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI) { +Instruction *InstCombiner::SimplifyElementUnorderedAtomicMemCpy( + ElementUnorderedAtomicMemCpyInst *AMI) { // Try to unfold this intrinsic into sequence 
of explicit atomic loads and // stores. // First check that number of elements is compile time constant. - auto *NumElementsCI = dyn_cast(AMI->getNumElements()); - if (!NumElementsCI) + auto *LengthCI = dyn_cast(AMI->getLength()); + if (!LengthCI) return nullptr; // Check that there are not too many elements. - uint64_t NumElements = NumElementsCI->getZExtValue(); + uint64_t LengthInBytes = LengthCI->getZExtValue(); + uint32_t ElementSizeInBytes = AMI->getElementSizeInBytes(); + uint64_t NumElements = LengthInBytes / ElementSizeInBytes; if (NumElements >= UnfoldElementAtomicMemcpyMaxElements) return nullptr; - // Don't unfold into illegal integers - uint64_t ElementSizeInBytes = AMI->getElementSizeInBytes() * 8; - if (!getDataLayout().isLegalInteger(ElementSizeInBytes)) - return nullptr; + // Only expand if there are elements to copy. + if (NumElements > 0) { + // Don't unfold into illegal integers + uint64_t ElementSizeInBits = ElementSizeInBytes * 8; + if (!getDataLayout().isLegalInteger(ElementSizeInBits)) + return nullptr; - // Cast source and destination to the correct type. Intrinsic input arguments - // are usually represented as i8*. - // Often operands will be explicitly casted to i8* and we can just strip - // those casts instead of inserting new ones. However it's easier to rely on - // other InstCombine rules which will cover trivial cases anyway. - Value *Src = AMI->getRawSource(); - Value *Dst = AMI->getRawDest(); - Type *ElementPointerType = Type::getIntNPtrTy( - AMI->getContext(), ElementSizeInBytes, Src->getType()->getPointerAddressSpace()); - - Value *SrcCasted = Builder->CreatePointerCast(Src, ElementPointerType, - "memcpy_unfold.src_casted"); - Value *DstCasted = Builder->CreatePointerCast(Dst, ElementPointerType, - "memcpy_unfold.dst_casted"); - - for (uint64_t i = 0; i < NumElements; ++i) { - // Get current element addresses - ConstantInt *ElementIdxCI = - ConstantInt::get(AMI->getContext(), APInt(64, i)); - Value *SrcElementAddr = - Builder->CreateGEP(SrcCasted, ElementIdxCI, "memcpy_unfold.src_addr"); - Value *DstElementAddr = - Builder->CreateGEP(DstCasted, ElementIdxCI, "memcpy_unfold.dst_addr"); - - // Load from the source. Transfer alignment information and mark load as - // unordered atomic. - LoadInst *Load = Builder->CreateLoad(SrcElementAddr, "memcpy_unfold.val"); - Load->setOrdering(AtomicOrdering::Unordered); - // We know alignment of the first element. It is also guaranteed by the - // verifier that element size is less or equal than first element alignment - // and both of this values are powers of two. - // This means that all subsequent accesses are at least element size - // aligned. - // TODO: We can infer better alignment but there is no evidence that this - // will matter. - Load->setAlignment(i == 0 ? AMI->getSrcAlignment() - : AMI->getElementSizeInBytes()); - Load->setDebugLoc(AMI->getDebugLoc()); - - // Store loaded value via unordered atomic store. - StoreInst *Store = Builder->CreateStore(Load, DstElementAddr); - Store->setOrdering(AtomicOrdering::Unordered); - Store->setAlignment(i == 0 ? AMI->getDstAlignment() - : AMI->getElementSizeInBytes()); - Store->setDebugLoc(AMI->getDebugLoc()); + // Cast source and destination to the correct type. Intrinsic input + // arguments are usually represented as i8*. Often operands will be + // explicitly casted to i8* and we can just strip those casts instead of + // inserting new ones. However it's easier to rely on other InstCombine + // rules which will cover trivial cases anyway. 
+ Value *Src = AMI->getRawSource(); + Value *Dst = AMI->getRawDest(); + Type *ElementPointerType = + Type::getIntNPtrTy(AMI->getContext(), ElementSizeInBits, + Src->getType()->getPointerAddressSpace()); + + Value *SrcCasted = Builder->CreatePointerCast(Src, ElementPointerType, + "memcpy_unfold.src_casted"); + Value *DstCasted = Builder->CreatePointerCast(Dst, ElementPointerType, + "memcpy_unfold.dst_casted"); + + for (uint64_t i = 0; i < NumElements; ++i) { + // Get current element addresses + ConstantInt *ElementIdxCI = + ConstantInt::get(AMI->getContext(), APInt(64, i)); + Value *SrcElementAddr = + Builder->CreateGEP(SrcCasted, ElementIdxCI, "memcpy_unfold.src_addr"); + Value *DstElementAddr = + Builder->CreateGEP(DstCasted, ElementIdxCI, "memcpy_unfold.dst_addr"); + + // Load from the source. Transfer alignment information and mark load as + // unordered atomic. + LoadInst *Load = Builder->CreateLoad(SrcElementAddr, "memcpy_unfold.val"); + Load->setOrdering(AtomicOrdering::Unordered); + // We know alignment of the first element. It is also guaranteed by the + // verifier that element size is less or equal than first element + // alignment and both of this values are powers of two. This means that + // all subsequent accesses are at least element size aligned. + // TODO: We can infer better alignment but there is no evidence that this + // will matter. + Load->setAlignment(i == 0 ? AMI->getParamAlignment(1) + : ElementSizeInBytes); + Load->setDebugLoc(AMI->getDebugLoc()); + + // Store loaded value via unordered atomic store. + StoreInst *Store = Builder->CreateStore(Load, DstElementAddr); + Store->setOrdering(AtomicOrdering::Unordered); + Store->setAlignment(i == 0 ? AMI->getParamAlignment(0) + : ElementSizeInBytes); + Store->setDebugLoc(AMI->getDebugLoc()); + } } // Set the number of elements of the copy to 0, it will be deleted on the // next iteration. 
- AMI->setNumElements(Constant::getNullValue(NumElementsCI->getType())); + AMI->setLength(Constant::getNullValue(LengthCI->getType())); return AMI; } @@ -1888,12 +1893,12 @@ if (Changed) return II; } - if (auto *AMI = dyn_cast(II)) { - if (Constant *C = dyn_cast(AMI->getNumElements())) + if (auto *AMI = dyn_cast(II)) { + if (Constant *C = dyn_cast(AMI->getLength())) if (C->isNullValue()) return eraseInstFromFunction(*AMI); - if (Instruction *I = SimplifyElementAtomicMemCpy(AMI)) + if (Instruction *I = SimplifyElementUnorderedAtomicMemCpy(AMI)) return I; } Index: llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h =================================================================== --- llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h +++ llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h @@ -726,7 +726,8 @@ Instruction *MatchBSwap(BinaryOperator &I); bool SimplifyStoreAtEndOfBlock(StoreInst &SI); - Instruction *SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI); + Instruction * + SimplifyElementUnorderedAtomicMemCpy(ElementUnorderedAtomicMemCpyInst *AMI); Instruction *SimplifyMemTransfer(MemIntrinsic *MI); Instruction *SimplifyMemSet(MemSetInst *MI); Index: llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp =================================================================== --- llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp +++ llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp @@ -983,21 +983,21 @@ const SCEV *NumBytesS = SE->getAddExpr(BECount, SE->getOne(IntPtrTy), SCEV::FlagNUW); + if (StoreSize != 1) + NumBytesS = SE->getMulExpr(NumBytesS, SE->getConstant(IntPtrTy, StoreSize), + SCEV::FlagNUW); + + Value *NumBytes = + Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator()); + unsigned Align = std::min(SI->getAlignment(), LI->getAlignment()); CallInst *NewCall = nullptr; // Check whether to generate an unordered atomic memcpy: // If the load or store are atomic, then they must neccessarily be unordered // by previous checks. - if (!SI->isAtomic() && !LI->isAtomic()) { - if (StoreSize != 1) - NumBytesS = SE->getMulExpr( - NumBytesS, SE->getConstant(IntPtrTy, StoreSize), SCEV::FlagNUW); - - Value *NumBytes = - Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator()); - + if (!SI->isAtomic() && !LI->isAtomic()) NewCall = Builder.CreateMemCpy(StoreBasePtr, LoadBasePtr, NumBytes, Align); - } else { + else { // We cannot allow unaligned ops for unordered load/store, so reject // anything where the alignment isn't at least the element size. if (Align < StoreSize) @@ -1010,11 +1010,9 @@ if (StoreSize > TTI->getAtomicMemIntrinsicMaxElementSize()) return false; - Value *NumElements = - Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator()); + NewCall = Builder.CreateElementUnorderedAtomicMemCpy( + StoreBasePtr, LoadBasePtr, NumBytes, StoreSize); - NewCall = Builder.CreateElementAtomicMemCpy(StoreBasePtr, LoadBasePtr, - NumElements, StoreSize); // Propagate alignment info onto the pointer args. Note that unordered // atomic loads/stores are *required* by the spec to have an alignment // but non-atomic loads/stores may not. 
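For reference, a simplified sketch (names are illustrative) of the kind of loop LoopIdiomRecognize now converts, and the call it emits. Because the new intrinsic takes a byte length rather than an element count, a 4-byte element turns the trip count ``%Size`` into ``%Size`` shifted left by 2, which is what the updated tests below check:

::

      for.body:                                        ; copies %Size i32 elements
        %i = phi i64 [ 0, %bb.nph ], [ %i.next, %for.body ]
        %s = getelementptr i32, i32* %Base, i64 %i
        %d = getelementptr i32, i32* %Dest, i64 %i
        %v = load atomic i32, i32* %s unordered, align 4
        store atomic i32 %v, i32* %d unordered, align 4
        %i.next = add i64 %i, 1
        %exitcond = icmp eq i64 %i.next, %Size
        br i1 %exitcond, label %for.end, label %for.body

      ; becomes (i8* bitcasts of %Dest/%Base omitted):
        %len = shl i64 %Size, 2
        call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 4 %Dest1, i8* align 4 %Base1,
                                                                      i64 %len, i32 4)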
Index: llvm/trunk/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll =================================================================== --- llvm/trunk/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll +++ llvm/trunk/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll @@ -2,47 +2,47 @@ define i8* @test_memcpy1(i8* %P, i8* %Q) { ; CHECK: test_memcpy - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 1, i32 1) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 1) ret i8* %P + ; 3rd arg (%edx) -- length ; CHECK-DAG: movl $1, %edx - ; CHECK-DAG: movl $1, %ecx - ; CHECK: __llvm_memcpy_element_atomic_1 + ; CHECK: __llvm_memcpy_element_unordered_atomic_1 } define i8* @test_memcpy2(i8* %P, i8* %Q) { ; CHECK: test_memcpy2 - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 2, i32 2) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 2, i32 2) ret i8* %P + ; 3rd arg (%edx) -- length ; CHECK-DAG: movl $2, %edx - ; CHECK-DAG: movl $2, %ecx - ; CHECK: __llvm_memcpy_element_atomic_2 + ; CHECK: __llvm_memcpy_element_unordered_atomic_2 } define i8* @test_memcpy4(i8* %P, i8* %Q) { ; CHECK: test_memcpy4 - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 4, i32 4) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 4, i32 4) ret i8* %P + ; 3rd arg (%edx) -- length ; CHECK-DAG: movl $4, %edx - ; CHECK-DAG: movl $4, %ecx - ; CHECK: __llvm_memcpy_element_atomic_4 + ; CHECK: __llvm_memcpy_element_unordered_atomic_4 } define i8* @test_memcpy8(i8* %P, i8* %Q) { ; CHECK: test_memcpy8 - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 8 %P, i8* align 8 %Q, i64 8, i32 8) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %P, i8* align 8 %Q, i32 8, i32 8) ret i8* %P + ; 3rd arg (%edx) -- length ; CHECK-DAG: movl $8, %edx - ; CHECK-DAG: movl $8, %ecx - ; CHECK: __llvm_memcpy_element_atomic_8 + ; CHECK: __llvm_memcpy_element_unordered_atomic_8 } define i8* @test_memcpy16(i8* %P, i8* %Q) { ; CHECK: test_memcpy16 - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %P, i8* align 16 %Q, i64 16, i32 16) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %P, i8* align 16 %Q, i32 16, i32 16) ret i8* %P + ; 3rd arg (%edx) -- length ; CHECK-DAG: movl $16, %edx - ; CHECK-DAG: movl $16, %ecx - ; CHECK: __llvm_memcpy_element_atomic_16 + ; CHECK: __llvm_memcpy_element_unordered_atomic_16 } define void @test_memcpy_args(i8** %Storage) { @@ -51,18 +51,15 @@ %Src.addr = getelementptr i8*, i8** %Storage, i64 1 %Src = load i8*, i8** %Src.addr - ; First argument + ; 1st arg (%rdi) ; CHECK-DAG: movq (%rdi), [[REG1:%r.+]] ; CHECK-DAG: movq [[REG1]], %rdi - ; Second argument + ; 2nd arg (%rsi) ; CHECK-DAG: movq 8(%rdi), %rsi - ; Third argument + ; 3rd arg (%edx) -- length ; CHECK-DAG: movl $4, %edx - ; Fourth argument - ; CHECK-DAG: movl $4, %ecx - ; CHECK: __llvm_memcpy_element_atomic_4 - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 4, i32 4) - ret void + ; CHECK: __llvm_memcpy_element_unordered_atomic_4 + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %Dst, i8* align 4 %Src, i32 4, i32 4) ret void } -declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind +declare void 
@llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind Index: llvm/trunk/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll =================================================================== --- llvm/trunk/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll +++ llvm/trunk/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll @@ -1,10 +1,11 @@ ; RUN: opt -instcombine -unfold-element-atomic-memcpy-max-elements=8 -S < %s | FileCheck %s +; Temporarily an expected failure until inst combine is updated in the next patch target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" -; Test basic unfolding -define void @test1(i8* %Src, i8* %Dst) { -; CHECK-LABEL: test1 -; CHECK-NOT: llvm.memcpy.element.atomic +; Test basic unfolding -- unordered load & store +define void @test1a(i8* %Src, i8* %Dst) { +; CHECK-LABEL: test1a +; CHECK-NOT: llvm.memcpy.element.unordered.atomic ; CHECK-DAG: %memcpy_unfold.src_casted = bitcast i8* %Src to i32* ; CHECK-DAG: %memcpy_unfold.dst_casted = bitcast i8* %Dst to i32* @@ -21,7 +22,7 @@ ; CHECK-DAG: [[VAL4:%[^\s]+]] = load atomic i32, i32* %{{[^\s]+}} unordered, align 4 ; CHECK-DAG: store atomic i32 [[VAL4]], i32* %{{[^\s]+}} unordered, align 4 entry: - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 8 %Src, i64 4, i32 4) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %Dst, i8* align 4 %Src, i32 16, i32 4) ret void } @@ -31,9 +32,9 @@ ; CHECK-NOT: load ; CHECK-NOT: store -; CHECK: llvm.memcpy.element.atomic +; CHECK: llvm.memcpy.element.unordered.atomic entry: - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 1000, i32 4) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %Dst, i8* align 4 %Src, i32 256, i32 4) ret void } @@ -43,16 +44,16 @@ ; CHECK-NOT: load ; CHECK-NOT: store -; CHECK: llvm.memcpy.element.atomic +; CHECK: llvm.memcpy.element.unordered.atomic entry: - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 4, i32 64) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 64, i32 64) ret void } ; Test that we will eliminate redundant bitcasts define void @test4(i64* %Src, i64* %Dst) { ; CHECK-LABEL: test4 -; CHECK-NOT: llvm.memcpy.element.atomic +; CHECK-NOT: llvm.memcpy.element.unordered.atomic ; CHECK-NOT: bitcast @@ -76,17 +77,18 @@ entry: %Src.casted = bitcast i64* %Src to i8* %Dst.casted = bitcast i64* %Dst to i8* - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %Dst.casted, i8* align 16 %Src.casted, i64 4, i32 8) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %Dst.casted, i8* align 16 %Src.casted, i32 32, i32 8) ret void } +; Test that 0-length unordered atomic memcpy gets removed. 
define void @test5(i8* %Src, i8* %Dst) { ; CHECK-LABEL: test5 -; CHECK-NOT: llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 0, i32 64) +; CHECK-NOT: llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 0, i32 8) entry: - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 0, i32 64) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 0, i32 8) ret void } -declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) +declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind Index: llvm/trunk/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll =================================================================== --- llvm/trunk/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll +++ llvm/trunk/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll @@ -5,7 +5,7 @@ ;; memcpy.atomic formation (atomic load & store) define void @test1(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test1( -; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1) +; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1) ; CHECK-NOT: store ; CHECK: ret void bb.nph: @@ -30,7 +30,7 @@ ;; memcpy.atomic formation (atomic store, normal load) define void @test2(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test2( -; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1) +; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1) ; CHECK-NOT: store ; CHECK: ret void bb.nph: @@ -55,7 +55,7 @@ ;; memcpy.atomic formation rejection (atomic store, normal load w/ no align) define void @test2b(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test2b( -; CHECK-NOT: call void @llvm.memcpy.element.atomic +; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic ; CHECK: store ; CHECK: ret void bb.nph: @@ -80,7 +80,7 @@ ;; memcpy.atomic formation rejection (atomic store, normal load w/ bad align) define void @test2c(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test2c( -; CHECK-NOT: call void @llvm.memcpy.element.atomic +; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic ; CHECK: store ; CHECK: ret void bb.nph: @@ -105,7 +105,7 @@ ;; memcpy.atomic formation rejection (atomic store w/ bad align, normal load) define void @test2d(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test2d( -; CHECK-NOT: call void @llvm.memcpy.element.atomic +; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic ; CHECK: store ; CHECK: ret void bb.nph: @@ -131,7 +131,7 @@ ;; memcpy.atomic formation (normal store, atomic load) define void @test3(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test3( -; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1) +; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1) ; CHECK-NOT: store ; CHECK: ret void bb.nph: @@ -156,7 +156,7 @@ ;; memcpy.atomic formation rejection (normal store w/ no align, atomic load) define void @test3b(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test3b( -; CHECK-NOT: call void @llvm.memcpy.element.atomic +; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic ; CHECK: store ; 
CHECK: ret void bb.nph: @@ -181,7 +181,7 @@ ;; memcpy.atomic formation rejection (normal store, atomic load w/ bad align) define void @test3c(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test3c( -; CHECK-NOT: call void @llvm.memcpy.element.atomic +; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic ; CHECK: store ; CHECK: ret void bb.nph: @@ -206,7 +206,7 @@ ;; memcpy.atomic formation rejection (normal store w/ bad align, atomic load) define void @test3d(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test3d( -; CHECK-NOT: call void @llvm.memcpy.element.atomic +; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic ; CHECK: store ; CHECK: ret void bb.nph: @@ -232,7 +232,7 @@ ;; memcpy.atomic formation rejection (atomic load, ordered-atomic store) define void @test4(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test4( -; CHECK-NOT: call void @llvm.memcpy.element.atomic +; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic ; CHECK: store ; CHECK: ret void bb.nph: @@ -257,7 +257,7 @@ ;; memcpy.atomic formation rejection (ordered-atomic load, unordered-atomic store) define void @test5(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test5( -; CHECK-NOT: call void @llvm.memcpy.element.atomic +; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic ; CHECK: store ; CHECK: ret void bb.nph: @@ -282,7 +282,8 @@ ;; memcpy.atomic formation (atomic load & store) -- element size 2 define void @test6(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test6( -; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %Dest{{[0-9]*}}, i8* align 2 %Base{{[0-9]*}}, i64 %Size, i32 2) +; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 1 +; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 2 %Dest{{[0-9]*}}, i8* align 2 %Base{{[0-9]*}}, i64 [[Sz]], i32 2) ; CHECK-NOT: store ; CHECK: ret void bb.nph: @@ -307,7 +308,8 @@ ;; memcpy.atomic formation (atomic load & store) -- element size 4 define void @test7(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test7( -; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dest{{[0-9]*}}, i8* align 4 %Base{{[0-9]*}}, i64 %Size, i32 4) +; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 2 +; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 4 %Dest{{[0-9]*}}, i8* align 4 %Base{{[0-9]*}}, i64 [[Sz]], i32 4) ; CHECK-NOT: store ; CHECK: ret void bb.nph: @@ -332,7 +334,8 @@ ;; memcpy.atomic formation (atomic load & store) -- element size 8 define void @test8(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test8( -; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 8 %Dest{{[0-9]*}}, i8* align 8 %Base{{[0-9]*}}, i64 %Size, i32 8) +; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 3 +; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 8 %Dest{{[0-9]*}}, i8* align 8 %Base{{[0-9]*}}, i64 [[Sz]], i32 8) ; CHECK-NOT: store ; CHECK: ret void bb.nph: @@ -357,7 +360,8 @@ ;; memcpy.atomic formation rejection (atomic load & store) -- element size 16 define void @test9(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test9( -; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %Dest{{[0-9]*}}, i8* align 16 %Base{{[0-9]*}}, i64 %Size, i32 16) +; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 4 +; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 16 %Dest{{[0-9]*}}, i8* align 16 %Base{{[0-9]*}}, i64 [[Sz]], i32 16) ; CHECK-NOT: store ; CHECK: ret void bb.nph: @@ -382,7 +386,7 @@ ;; memcpy.atomic formation rejection (atomic load & store) -- element size 32 define void 
@test10(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test10( -; CHECK-NOT: call void @llvm.memcpy.element.atomic +; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic ; CHECK: store ; CHECK: ret void bb.nph: Index: llvm/trunk/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll =================================================================== --- llvm/trunk/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll +++ llvm/trunk/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll @@ -5,7 +5,7 @@ ;; Will not create call due to a max element size of 0 define void @test1(i64 %Size) nounwind ssp { ; CHECK-LABEL: @test1( -; CHECK-NOT: call void @llvm.memcpy.element.atomic +; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic ; CHECK: store ; CHECK: ret void bb.nph: Index: llvm/trunk/test/Verifier/element-wise-atomic-memory-intrinsics.ll =================================================================== --- llvm/trunk/test/Verifier/element-wise-atomic-memory-intrinsics.ll +++ llvm/trunk/test/Verifier/element-wise-atomic-memory-intrinsics.ll @@ -1,17 +1,25 @@ ; RUN: not opt -verify < %s 2>&1 | FileCheck %s -define void @test_memcpy(i8* %P, i8* %Q) { +define void @test_memcpy(i8* %P, i8* %Q, i32 %A, i32 %E) { + ; CHECK: element size of the element-wise unordered atomic memory intrinsic must be a constant int + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 %E) ; CHECK: element size of the element-wise atomic memory intrinsic must be a power of 2 - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 2 %Q, i64 4, i32 3) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 3) + ; CHECK: constant length must be a multiple of the element size in the element-wise atomic memory intrinsic + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 7, i32 4) + + ; CHECK: incorrect alignment of the destination argument + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* %P, i8* align 4 %Q, i32 1, i32 1) ; CHECK: incorrect alignment of the destination argument - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 4 %Q, i64 4, i32 4) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 1 %P, i8* align 4 %Q, i32 4, i32 4) ; CHECK: incorrect alignment of the source argument - call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 2 %Q, i64 4, i32 4) + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* %Q, i32 1, i32 1) + ; CHECK: incorrect alignment of the source argument + call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 1 %Q, i32 4, i32 4) ret void } -declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind - +declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind ; CHECK: input module is broken!