This is an archive of the discontinued LLVM Phabricator instance.

Differential D33240

[Atomics] Rename and change prototype for atomic memcpy intrinsic
ClosedPublic

Authored by dneilson on May 16 2017, 7:34 AM.

Details

Reviewers

reames
sanjoy
efriedma

Commits

rG3faabbbe85d5: [Atomics] Rename and change prototype for atomic memcpy intrinsic
rL305558: [Atomics] Rename and change prototype for atomic memcpy intrinsic

Summary

Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

This change alters the prototype of the atomic memcpy intrinsic. The prototype is changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic, to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy that is unordered atomic.

Diff Detail

Repository: rL LLVM

Event Timeline

dneilson created this revision. May 16 2017, 7:34 AM

Note that this is the first of a series of patches that are being developed for the unordered atomic memcpy. Minimally, the plan is to push the following changes one at a time to minimize risk and impact on others:
i. Change intrinsic name, prototype (to match memcpy closely), & documentation.
ii. Add code to loop idiom to recognize the element unordered atomic memcpy.
iii. Add code to instcombine & selection dag builder to lower the intrinsic.
iv. Add an isunordered() to the MemIntrinsic introspection class (returning false for all existing intrinsics), and add calls to it in all passes where it's relevant.
v. Add intrinsic into the introspection hierarchy & complete support for new intrinsic in passes.

dneilson added a reviewer: efriedma. May 16 2017, 9:33 AM

Reviewing only the LangRef changes for the moment. Let's iterate on those until we're happy and then I can go looking for code issues.

docs/LangRef.rst
13582 ↗	(On Diff #99144)	Your revised text is missing key aspects of the old text. You need to preserve the "as a sequence of intrinsic. It differs in that the ``dest`` and ``src`` are treated as arrays with elements that are ``element_size`` bytes wide and aligned at an element size boundary." wording from the original, because this is semantically important.
13592 ↗	(On Diff #99144)	Er, huh? What do these major values mean? And why do we need anything other than an i1 boolean?
13595 ↗	(On Diff #99144)	This sentence is important and shouldn't be dropped.
13607 ↗	(On Diff #99144)	Ah, the answer to my question above. I think it would be cleaner to have two i1 params instead of encoding the bitmask. Does that complicate anything for you?
13614 ↗	(On Diff #99144)	This is slightly wrong. You don't need the writes to be unordered atomic if the src/dest doesn't need it, but you do still want to allow concurrent reads and writes. I think you want something along the lines of: "It is well defined to have concurrent reads and writes to both source and destination provided those reads and writes are unordered atomic when specified."

This revision now requires changes to proceed. May 16 2017, 11:18 AM

dneilson added a child revision: D33243: [Atomics][LoopIdiom] Recognize unordered atomic memcpy. May 16 2017, 11:20 AM

Do we get any practical benefit from separately specifying whether the source and destination require unordered operations?

Why are you adding an alignment parameter? The alignment is already specified with attributes. (There was a plan at one point to change memcpy to specify alignment like this; IIRC it got committed, then reverted? I don't recall what happened after that.)

Why do we want to specify the length in bytes, as opposed to the number of elements to copy? Any implementation is inevitably just going to divide a length in bytes by the element size.

I'm not really sure why you're messing with the signature of the intrinsic in the first place; we went through most of this design space when it was initially proposed.

In D33240#756419, @efriedma wrote:

Do we get any practical benefit from separately specifying whether the source and destination require unordered operations?

I don't know enough about the possible source languages to know with 100% certainty that it's not possible to mix, say, unordered loads with ordered stores. For example, Java only requires the unordered ops for shared data (i.e. stuff on the heap). It's conceivable that a memcpy is desired to copy, say, from the stack to the heap; only the heap stores would need to be unordered in this case -- playing devil's advocate, it would not be wrong (just unnecessary) to use unordered loads of the stack data here as well. Erring on the side of flexible/generic here.

Why are you adding an alignment parameter? The alignment is already specified with attributes. (There was a plan at one point to change memcpy to specify alignment like this; IIRC it got committed, then reverted? I don't recall what happened after that.)

Compatibility with the existing memcpy intrinsic. I have no problem removing the alignment parameter if that's the long-term goal/vision. My only concern is that I don't know whether that difference will lead me into trouble when I get to the point of adding the unordered atomic intrinsic to the MemTransferInst introspection class hierarchy.

Why do we want to specify the length in bytes, as opposed to the number of elements to copy? Any implementation is inevitably just going to divide a length in bytes by the element size.

Compatibility with memcpy semantics. The 'length' parameter of a MemTransferInst intrinsic is semantically understood by transforms/analysis to be the size of the transfer in bytes. Having an intrinsic in that hierarchy that specifies a non-bytes value for that parameter strikes me as a recipe for bugs.

I'm not really sure why you're messing with the signature of the intrinsic in the first place; we went through most of this design space when it was initially proposed.

It's being revisited with a long-term eye towards merging the unordered atomic semantics into the existing memcpy intrinsic. By making the proposed/new intrinsic's definition much closer to that of the existing llvm.memcpy it should be much easier to do that eventual merging of the two without having to revisit the semantic understanding of the unordered intrinsic in every analysis/transform.

I'll make the suggested changes to the LangRef.

docs/LangRef.rst
13607 ↗	(On Diff #99144)	Easy enough to do if it's desired. Only reason that I didn't do that originally is that I'm already adding two parameters on top of memcpy for unordered memcpy; seemed like a good way to prevent that from becoming three additional parameters.

Addressed suggested changes to the LangRef doc for the intrinsic.
Split out the is unordered parameter into two separate parameters -- dest_unordered & src_unordered.

dneilson marked 5 inline comments as done. May 16 2017, 2:39 PM

I'm going to let Daniel and Eli debate the general direction before reviewing further. I want to make sure Eli is on board with the general direction before we invest lots of effort in cleaning up the code.

skatkov added a subscriber: skatkov. May 17 2017, 3:05 AM

skatkov added inline comments.

include/llvm/IR/IntrinsicInst.h
199 ↗	(On Diff #99202)	why not enum?
200 ↗	(On Diff #99202)	ARG_SOURCE? For consistency with other names.
205 ↗	(On Diff #99202)	ARG_SOURCE_UNORDERED? For consistency with other names.
245 ↗	(On Diff #99202)	bool isDestUnordered?
250 ↗	(On Diff #99202)	bool isSrcUnordered?
297 ↗	(On Diff #99202)	Don't you want to check (or assert) the constraints for the alignment here?
303 ↗	(On Diff #99202)	setSourceUnordered? For consistency with other names.
305 ↗	(On Diff #99202)	Don't you want to check (or assert) the constraints for the element size here?
include/llvm/IR/Intrinsics.td
810 ↗	(On Diff #99202)	you have two unordered arguments.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4871 ↗	(On Diff #99202)	You introduced the special getters, can you use them here?
lib/IR/Verifier.cpp
3997 ↗	(On Diff #99202)	is zero alignment allowed? At least it is strange that the 0 is a power of 2 :) if it is allowed, please update text "or zero"

I don't know enough about the possible source languages to know with 100% certainty that it's not possible to mix, say, unordered loads with ordered stores.

It's almost certainly possible to mix them... even if the source language doesn't allow mixing them, we could transform unordered operations to non-atomic operations if we can prove the memory isn't accessed from another thread. (I don't think we actually do this transform at the moment, but LICM has a similar sort of check.)

The question is whether there's actually any optimization that would actually check the unordered bit. Maybe there is? (See my comments on __llvm_memcpy_element_unordered_atomic_*.)

Compatibility with the existing memcpy intrinsic. I have no problem removing the alignment parameter if that's the long-term goal/vision. My only concern is that I don't know whether that difference will lead me into trouble when I get to the point of adding the unordered atomic intrinsic to the MemTransferInst introspection class hierarchy.

Yes, this is the direction we want to go in. You should be able to hide the difference in the implementation of MemIntrinsic, I think. If it does cause problems, we can revisit.

Compatibility with memcpy semantics. The 'length' parameter of a MemTransferInst intrinsic is semantically understood by transforms/analysis to be the size of the transfer in bytes. Having an intrinsic in that hierarchy that specifies a non-bytes value for that parameter strikes me as a recipe for bugs.

Okay. It's a little ugly, but I guess it isn't that terrible.

docs/LangRef.rst
13592 ↗	(On Diff #99202)	"must" is kind of confusing in this context. Probably need to explicitly say "if len is not a multiple of element_size, the behavior is undefined", or something like that, since we can't actually tell until runtime.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4891 ↗	(On Diff #99202)	I'm not sure changing the signature of __llvm_memcpy_element_unordered_atomic_* like this makes sense. I guess changing from the number of elements to the length in bytes is fine. I'm not sure why you want to pass the alignment to the function; the implementation can easily compute the alignment itself based on the pointers passed in. Not sure what the implementation is going to do with the DestUnordered and SrcUnordered parameters. Maybe if the source/dest is non-atomic, it could use unaligned load/store operations? If we do need DestUnordered and SrcUnordered, it probably makes sense to merge them to save an instruction in the caller.

dneilson marked 8 inline comments as done. May 17 2017, 1:24 PM

dneilson added inline comments.

docs/LangRef.rst
13582 ↗	(On Diff #99144)	I'm not seeing the difference here. "Sequence" doesn't imply any sort of ordering. So, to me the old and new are semantically equivalent -- there is a copy happening, and it's being done with unordered atomic load/stores.
13592 ↗	(On Diff #99202)	Fair. I'll make that change.
include/llvm/IR/IntrinsicInst.h
199 ↗	(On Diff #99202)	No particularly good reason. Just playing around.
200 ↗	(On Diff #99202)	Fair.
297 ↗	(On Diff #99202)	I figured that's handled by the verifier.
305 ↗	(On Diff #99202)	Handled by verifier.
include/llvm/IR/Intrinsics.td
810 ↗	(On Diff #99202)	Good catch! I updated the prototype, but neglected the comment.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4891 ↗	(On Diff #99202)	Fair. I have no strong preference with the library function taking size in bytes vs. number of elements. The only benefit that I can see to passing size in bytes instead of number of elements here is the possibility for runtime checks being implemented in a debug version of the library -- one that verifies that the length is a multiple of element size. Good point about alignment. Not the smartest move on my part. I think that you're right about passing dest_unordered & src_unordered to the library -- it might be unnecessary. The lib function could just be implemented assuming that both source & dest require unordered atomic ops; there shouldn't be any harm in it, since unordered just means that we can't break up an element into partial loads/stores, and we wouldn't want to do that in a high performance library anyways. I'll change the lib prototype to match memcpy exactly. _llvm_memcpy_unordered_atomic(i8* noalias dest, i8* noalias src, uint64 length)
lib/IR/Verifier.cpp
3997 ↗	(On Diff #99202)	I would think not, but this is the same check as exists for memcpy. So, for compatibility I think it should be allowed unless we can definitively say that it's not allowed for memcpy.

Addressing suggestions.

dneilson added a subscriber: llvm-commits. May 18 2017, 1:00 PM

anna added a subscriber: anna. May 18 2017, 2:39 PM

anna added inline comments.

docs/LangRef.rst
13604 ↗	(On Diff #99472)	Nit: if and only if stores to the destination buffer are

skatkov added inline comments. May 18 2017, 9:08 PM

include/llvm/IR/IntrinsicInst.h
297 ↗	(On Diff #99202)	OK, but to me one of the primary goals of a setter is to check incoming args.
lib/IR/Verifier.cpp
3991 ↗	(On Diff #99472)	Use getters?

Addressing some comments -- use of getters, adding assertions to setters, and some minor wording changes to LangRef.

skatkov added inline comments. May 21 2017, 8:07 PM

lib/IR/Verifier.cpp
3992 ↗	(On Diff #99618)	Extra semicolon

dneilson mentioned this in D33243: [Atomics][LoopIdiom] Recognize unordered atomic memcpy. May 25 2017, 9:39 AM

dneilson removed a child revision: D33243: [Atomics][LoopIdiom] Recognize unordered atomic memcpy.

Another iteration on the intrinsic prototype. I've removed the align, and dest/src_unordered arguments.
- Having align both as arg attributes and as an arg could cause challenges if we need to resolve a difference, and it is the desired future direction for intrinsics.
- Upon further thought, and digging into where passes would have to be made aware of this intrinsic -- I'm no longer convinced about the value of the separate dest_unordered/src_unordered args.
  - It seems sufficient to have the semantics of the intrinsic being that all loads/stores are unordered atomic; we can still "promote" idioms that mix, say, unordered loads with simple stores.
  - Any library implementation will just use unordered atomic loads & stores throughout, anyways.
  - There will be a side-effect of promoting simple ops to unordered-atomic if we recognize a loop idiom, and then later lower it into loads/stores. The tradeoff is that it should be easier to work with the intrinsic in passes.
  - The only value that I can see in having the separate dest_unordered/src_unordered args is that in lowering passes that change the intrinsic into explicit loads/stores we wouldn't "promote" a simple op into an unordered-atomic op.

Added in updating the InstCombine lowering of the intrinsic so that the change doesn't lose functionality; even temporarily.
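The semantics settled on in this update (the length is given in bytes, and every load and store is an element-sized atomic access) can be sketched in standalone C++. This is only an illustration, not the InstCombine or library lowering from this patch: `copyElementsAtomic` is a hypothetical name, and `std::memory_order_relaxed` stands in for LLVM's unordered ordering, which C++ does not expose and which is slightly weaker.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Illustrative only: copies LenInBytes bytes as whole elements of type T.
// Each element is loaded once and stored once as a single atomic access, so
// no element is ever split into partial loads or stores.
// memory_order_relaxed is an assumed stand-in for LLVM's "unordered".
template <typename T>
bool copyElementsAtomic(std::atomic<T> *Dest, const std::atomic<T> *Src,
                        std::size_t LenInBytes) {
  // The intrinsic leaves this case undefined; the sketch simply rejects it.
  if (LenInBytes % sizeof(T) != 0)
    return false;
  const std::size_t NumElements = LenInBytes / sizeof(T);
  for (std::size_t I = 0; I != NumElements; ++I)
    Dest[I].store(Src[I].load(std::memory_order_relaxed),
                  std::memory_order_relaxed);
  return true;
}
```

A zero length falls through the loop and copies nothing, which matches the idea that such calls can simply be removed.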

efriedma added inline comments. May 26 2017, 11:41 AM

docs/LangRef.rst
14022 ↗	(On Diff #100415)	Hmm... I didn't mention isvolatile earlier? See https://reviews.llvm.org/D27133?id=79305#629884 for original discussion of it. At the very least, we need a better description of what it means.

skatkov added inline comments. May 28 2017, 10:49 PM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4878 ↗	(On Diff #100415)	MI.getLength() == Length
lib/Transforms/InstCombine/InstCombineCalls.cpp
109 ↗	(On Diff #100415)	assert LengthInBytes % ElementSizeInBytes == 0 and LengthInBytes > 0?

Did a scan through the code, didn't spot anything major. Once we settle the last few design/specification questions, this looks basically ready to go in.

docs/LangRef.rst
14022 ↗	(On Diff #100415)	I think we can just remove this. The original motivation was essentially future proofing, and I don't think it's worth keeping the complexity for now. We can change our minds later if it turns out we actually need this.
lib/Transforms/InstCombine/InstCombineCalls.cpp
109 ↗	(On Diff #100415)	Agreed. Also, length might actually be zero. We should remove such calls.
1896 ↗	(On Diff #100415)	Hm, I might sink this into the helper function. Optional, and can be submitted separately without further review.

dneilson added inline comments. May 31 2017, 12:29 PM

docs/LangRef.rst
14022 ↗	(On Diff #100415)	Agreed. I'll remove it.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4878 ↗	(On Diff #100415)	Not quite. Doesn't seem to be a straightforward way to go from an SDValue to a Type*, so I don't think this sort of replacement can be made.
lib/Transforms/InstCombine/InstCombineCalls.cpp
109 ↗	(On Diff #100415)	Re: The assert. I think that it would be better to check this in the verifier. In the LangRef we've said that it's undefined behaviour if length isn't a multiple of element size, so I think it's okay to blindly do this divide here and add a check to the verifier. Re: Zero length; see the in-line comment below.
1896 ↗	(On Diff #100415)	This is following the same pattern/code-flow as the normal memcpy/memmove/memset handlers just above this. i.e. Check for a null length -- if there is one, then remove the call, else call the simplify method for the intrinsic. I'm inclined to stick to this pattern to make the later merging of the introspection classes cleaner.

Remove volatile arg from intrinsic.
Add check to verifier to ensure that constant length is a multiple of element size & add corresponding test.
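The statically checkable constraints discussed in this review can be sketched as a small standalone predicate. This is not the code added to lib/IR/Verifier.cpp; the function names are hypothetical, and `MaxAtomicSize` is an assumed stand-in for the target-specific atomic access size limit mentioned in the LangRef text.

```cpp
#include <cstdint>

// Hypothetical helper: true when V is a non-zero power of two.
inline bool isPowerOfTwo(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Hypothetical check mirroring the constraints from the review:
//  - element_size is a power of two no greater than MaxAtomicSize
//    (assumed default of 16; the real limit is target-specific),
//  - a constant length is a multiple of element_size,
//  - both pointer alignments are powers of two no less than element_size.
inline bool isValidElementAtomicMemCpy(uint64_t LenInBytes,
                                       uint32_t ElementSize,
                                       uint64_t DestAlign, uint64_t SrcAlign,
                                       uint32_t MaxAtomicSize = 16) {
  if (!isPowerOfTwo(ElementSize) || ElementSize > MaxAtomicSize)
    return false;
  if (LenInBytes % ElementSize != 0)
    return false;
  auto AlignOk = [&](uint64_t A) {
    return isPowerOfTwo(A) && A >= ElementSize;
  };
  return AlignOk(DestAlign) && AlignOk(SrcAlign);
}
```

Only a constant length can be checked this way; a length known only at run time that is not a multiple of the element size remains undefined behaviour. The sketch accepts a zero length, on the assumption that such calls are removed separately.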

LGTM w/one comment addressed before submission.

lib/Transforms/InstCombine/InstCombineCalls.cpp
114 ↗	(On Diff #100922)	Where did this check come from and why is it needed? It looks like an attempt to handle a length which isn't an even interval of element size, but the verifier should reject that?

This revision is now accepted and ready to land. Jun 5 2017, 7:07 PM

dneilson added inline comments. Jun 6 2017, 6:29 AM

lib/Transforms/InstCombine/InstCombineCalls.cpp
114 ↗	(On Diff #100922)	Just me being extra cautious. The verifier checks for the case where constant length is not a multiple of element size, and the zero length case is handled elsewhere. However, I'm not sure that the verifier runs after every single pass. So, I figure there's no harm in handling the corner case.

Loop idiom patch was dropped, so update loop idiom recognition as well.

Herald added a subscriber: mzolotukhin. Jun 6 2017, 1:03 PM

dneilson added inline comments. Jun 7 2017, 7:57 AM

include/llvm/IR/IntrinsicInst.h
210 ↗	(On Diff #101605)	I'm inclined to change this name to 'EUAMemcpyInst' to cut down on the length of its name. Any objections?

rebase

Closed by commit rL305558: [Atomics] Rename and change prototype for atomic memcpy intrinsic (authored by dneilson). Jun 16 2017, 7:44 AM

This revision was automatically updated to reflect the committed changes.

jfb mentioned this in D79279: Add overloaded versions of builtin mem* functions. Aug 4 2020, 5:51 PM

Revision Contents

Path	Size
llvm/trunk/docs/LangRef.rst	70 lines
llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h	19 lines
llvm/trunk/include/llvm/IR/IRBuilder.h	24 lines
llvm/trunk/include/llvm/IR/IntrinsicInst.h	90 lines
llvm/trunk/include/llvm/IR/Intrinsics.td	15 lines
llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	24 lines
llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp	28 lines
llvm/trunk/lib/IR/IRBuilder.cpp	15 lines
llvm/trunk/lib/IR/Verifier.cpp	31 lines
llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp	121 lines
llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h	3 lines
llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp	24 lines
llvm/trunk/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll	45 lines
llvm/trunk/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll	30 lines
llvm/trunk/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll	36 lines
llvm/trunk/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll	2 lines
llvm/trunk/test/Verifier/element-wise-atomic-memory-intrinsics.ll	20 lines

Diff 102827

llvm/trunk/docs/LangRef.rst

[14,062 lines not shown]
	are described in :doc:`StackMaps`.			are described in :doc:`StackMaps`.

	Element Wise Atomic Memory Intrinsics			Element Wise Atomic Memory Intrinsics
	-------------------------------------			-------------------------------------

	These intrinsics are similar to the standard library memory intrinsics except			These intrinsics are similar to the standard library memory intrinsics except
	that they perform memory transfer as a sequence of atomic memory accesses.			that they perform memory transfer as a sequence of atomic memory accesses.

	.. _int_memcpy_element_atomic:			.. _int_memcpy_element_unordered_atomic:

	'``llvm.memcpy.element.atomic``' Intrinsic			'``llvm.memcpy.element.unordered.atomic``' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Syntax:			Syntax:
	"""""""			"""""""

	This is an overloaded intrinsic. You can use ``llvm.memcpy.element.atomic`` on			This is an overloaded intrinsic. You can use ``llvm.memcpy.element.unordered.atomic`` on
	any integer bit width and for different address spaces. Not all targets			any integer bit width and for different address spaces. Not all targets
	support all bit widths however.			support all bit widths however.

	::			::

	declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* <dest>, i8* <src>,			declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* <dest>,
	i64 <num_elements>, i32 <element_size>)			i8* <src>,
				i32 <len>,
				i32 <element_size>)
				declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* <dest>,
				i8* <src>,
				i64 <len>,
				i32 <element_size>)

	Overview:			Overview:
	"""""""""			"""""""""

	The '``llvm.memcpy.element.atomic.*``' intrinsic performs copy of a block of			The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic is a specialization of the
	memory from the source location to the destination location as a sequence of			'``llvm.memcpy.*``' intrinsic. It differs in that the ``dest`` and ``src`` are treated
	unordered atomic memory accesses where each access is a multiple of			as arrays with elements that are exactly ``element_size`` bytes, and the copy between
	``element_size`` bytes wide and aligned at an element size boundary. For example			buffers uses a sequence of :ref:`unordered atomic <ordering>` load/store operations
	each element is accessed atomically in source and destination buffers.			that are a positive integer multiple of the ``element_size`` in size.

	Arguments:			Arguments:
	""""""""""			""""""""""

	The first argument is a pointer to the destination, the second is a			The first three arguments are the same as they are in the :ref:`@llvm.memcpy <int_memcpy>`
	pointer to the source. The third argument is an integer argument			intrinsic, with the added constraint that ``len`` is required to be a positive integer
	specifying the number of elements to copy, the fourth argument is size of			multiple of the ``element_size``. If ``len`` is not a positive integer multiple of
	the single element in bytes.			``element_size``, then the behaviour of the intrinsic is undefined.

	``element_size`` should be a power of two, greater than zero and less than			``element_size`` must be a compile-time constant positive power of two no greater than
	a target-specific atomic access size limit.			target-specific atomic access size limit.

	For each of the input pointers ``align`` parameter attribute must be specified.			For each of the input pointers ``align`` parameter attribute must be specified. It
	It must be a power of two and greater than or equal to the ``element_size``.			must be a power of two no less than the ``element_size``. Caller guarantees that
	Caller guarantees that both the source and destination pointers are aligned to			both the source and destination pointers are aligned to that boundary.
	that boundary.

	Semantics:			Semantics:
	""""""""""			""""""""""

	The '``llvm.memcpy.element.atomic.*``' intrinsic copies			The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of
	'``num_elements`` * ``element_size``' bytes of memory from the source location to			memory from the source location to the destination location. These locations are not
	the destination location. These locations are not allowed to overlap. Memory copy			allowed to overlap. The memory copy is performed as a sequence of load/store operations
	is performed as a sequence of unordered atomic memory accesses where each access			where each access is guaranteed to be a multiple of ``element_size`` bytes wide and
	is guaranteed to be a multiple of ``element_size`` bytes wide and aligned at an			aligned at an ``element_size`` boundary.
	element size boundary.

	The order of the copy is unspecified. The same value may be read from the source			The order of the copy is unspecified. The same value may be read from the source
	buffer many times, but only one write is issued to the destination buffer per			buffer many times, but only one write is issued to the destination buffer per
	element. It is well defined to have concurrent reads and writes to both source			element. It is well defined to have concurrent reads and writes to both source and
	and destination provided those reads and writes are at least unordered atomic.			destination provided those reads and writes are unordered atomic when specified.

	This intrinsic does not provide any additional ordering guarantees over those			This intrinsic does not provide any additional ordering guarantees over those
	provided by a set of unordered loads from the source location and stores to the			provided by a set of unordered loads from the source location and stores to the
	destination.			destination.

	Lowering:			Lowering:
	"""""""""			"""""""""

	In the most general case call to the '``llvm.memcpy.element.atomic.*``' is lowered			In the most general case call to the '``llvm.memcpy.element.unordered.atomic.*``' is
	to a call to the symbol ``__llvm_memcpy_element_atomic_*``. Where '*' is replaced			lowered to a call to the symbol ``__llvm_memcpy_element_unordered_atomic_*``. Where '*'
	with an actual element size.			is replaced with an actual element size.

	Optimizer is allowed to inline memory copy when it's profitable to do so.			The optimizer is allowed to inline the memory copy when it's profitable to do so.
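The lowering rule above, where the element size is appended to the `__llvm_memcpy_element_unordered_atomic_` prefix, can be sketched as follows. This is a hedged illustration rather than the TargetLoweringBase code: the helper name is hypothetical, and the supported sizes (1, 2, 4, 8, 16) are taken from the MEMCPY_ELEMENT_UNORDERED_ATOMIC_* entries in this revision's RuntimeLibcalls.h.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: builds the runtime symbol name for a given element
// size, or returns an empty string when no libcall exists for that size.
inline std::string getElementUnorderedAtomicMemCpyName(uint64_t ElementSize) {
  switch (ElementSize) {
  case 1:
  case 2:
  case 4:
  case 8:
  case 16:
    return "__llvm_memcpy_element_unordered_atomic_" +
           std::to_string(ElementSize);
  default:
    return std::string(); // analogous to an unknown libcall
  }
}
```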

llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h

[327 lines not shown]	enum Libcall {
O_F128,		O_F128,
O_PPCF128,		O_PPCF128,

// MEMORY		// MEMORY
MEMCPY,		MEMCPY,
MEMSET,		MEMSET,
MEMMOVE,		MEMMOVE,

// ELEMENT-WISE ATOMIC MEMORY		// ELEMENT-WISE UNORDERED-ATOMIC MEMORY of different element sizes
MEMCPY_ELEMENT_ATOMIC_1,		MEMCPY_ELEMENT_UNORDERED_ATOMIC_1,
MEMCPY_ELEMENT_ATOMIC_2,		MEMCPY_ELEMENT_UNORDERED_ATOMIC_2,
MEMCPY_ELEMENT_ATOMIC_4,		MEMCPY_ELEMENT_UNORDERED_ATOMIC_4,
MEMCPY_ELEMENT_ATOMIC_8,		MEMCPY_ELEMENT_UNORDERED_ATOMIC_8,
MEMCPY_ELEMENT_ATOMIC_16,		MEMCPY_ELEMENT_UNORDERED_ATOMIC_16,

// EXCEPTION HANDLING		// EXCEPTION HANDLING
UNWIND_RESUME,		UNWIND_RESUME,

// Note: there's two sets of atomics libcalls; see		// Note: there's two sets of atomics libcalls; see
// <http://llvm.org/docs/Atomics.html> for more info on the		// <http://llvm.org/docs/Atomics.html> for more info on the
// difference between them.		// difference between them.

[156 lines not shown]	namespace RTLIB {
/// getUINTTOFP - Return the UINTTOFP_*_* value for the given types, or		/// getUINTTOFP - Return the UINTTOFP_*_* value for the given types, or
/// UNKNOWN_LIBCALL if there is none.		/// UNKNOWN_LIBCALL if there is none.
Libcall getUINTTOFP(EVT OpVT, EVT RetVT);		Libcall getUINTTOFP(EVT OpVT, EVT RetVT);

/// Return the SYNC_FETCH_AND_* value for the given opcode and type, or		/// Return the SYNC_FETCH_AND_* value for the given opcode and type, or
/// UNKNOWN_LIBCALL if there is none.		/// UNKNOWN_LIBCALL if there is none.
Libcall getSYNC(unsigned Opc, MVT VT);		Libcall getSYNC(unsigned Opc, MVT VT);

/// getMEMCPY_ELEMENT_ATOMIC - Return MEMCPY_ELEMENT_ATOMIC_* value for the		/// getMEMCPY_ELEMENT_UNORDERED_ATOMIC - Return
/// given element size or UNKNOW_LIBCALL if there is none.		/// MEMCPY_ELEMENT_UNORDERED_ATOMIC_* value for the given element size or
Libcall getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize);		/// UNKNOW_LIBCALL if there is none.
		Libcall getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize);
}		}
}		}

#endif		#endif

llvm/trunk/include/llvm/IR/IRBuilder.h

[429 lines not shown]	public:
}		}

CallInst *CreateMemCpy(Value *Dst, Value *Src, Value *Size, unsigned Align,		CallInst *CreateMemCpy(Value *Dst, Value *Src, Value *Size, unsigned Align,
bool isVolatile = false, MDNode *TBAATag = nullptr,		bool isVolatile = false, MDNode *TBAATag = nullptr,
MDNode *TBAAStructTag = nullptr,		MDNode *TBAAStructTag = nullptr,
MDNode *ScopeTag = nullptr,		MDNode *ScopeTag = nullptr,
MDNode *NoAliasTag = nullptr);		MDNode *NoAliasTag = nullptr);

/// \brief Create and insert an atomic memcpy between the specified		/// \brief Create and insert an element unordered-atomic memcpy between the
/// pointers.		/// specified pointers.
///		///
/// If the pointers aren't i8*, they will be converted. If a TBAA tag is		/// If the pointers aren't i8*, they will be converted. If a TBAA tag is
/// specified, it will be added to the instruction. Likewise with alias.scope		/// specified, it will be added to the instruction. Likewise with alias.scope
/// and noalias tags.		/// and noalias tags.
CallInst *CreateElementAtomicMemCpy(		CallInst *CreateElementUnorderedAtomicMemCpy(
Value *Dst, Value *Src, uint64_t NumElements, uint32_t ElementSize,		Value *Dst, Value *Src, uint64_t Size, uint32_t ElementSize,
MDNode *TBAATag = nullptr, MDNode *TBAAStructTag = nullptr,		MDNode *TBAATag = nullptr, MDNode *TBAAStructTag = nullptr,
MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr) {		MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr) {
return CreateElementAtomicMemCpy(Dst, Src, getInt64(NumElements),		return CreateElementUnorderedAtomicMemCpy(
ElementSize, TBAATag, TBAAStructTag,		Dst, Src, getInt64(Size), ElementSize, TBAATag, TBAAStructTag, ScopeTag,
ScopeTag, NoAliasTag);		NoAliasTag);
}		}

CallInst *CreateElementAtomicMemCpy(Value *Dst, Value *Src,		CallInst *CreateElementUnorderedAtomicMemCpy(
Value *NumElements, uint32_t ElementSize,		Value *Dst, Value *Src, Value *Size, uint32_t ElementSize,
MDNode *TBAATag = nullptr,		MDNode *TBAATag = nullptr, MDNode *TBAAStructTag = nullptr,
MDNode *TBAAStructTag = nullptr,		MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr);
MDNode *ScopeTag = nullptr,
MDNode *NoAliasTag = nullptr);

/// \brief Create and insert a memmove between the specified		/// \brief Create and insert a memmove between the specified
/// pointers.		/// pointers.
///		///
/// If the pointers aren't i8*, they will be converted. If a TBAA tag is		/// If the pointers aren't i8*, they will be converted. If a TBAA tag is
/// specified, it will be added to the instruction. Likewise with alias.scope		/// specified, it will be added to the instruction. Likewise with alias.scope
/// and noalias tags.		/// and noalias tags.
CallInst *CreateMemMove(Value *Dst, Value *Src, uint64_t Size, unsigned Align,		CallInst *CreateMemMove(Value *Dst, Value *Src, uint64_t Size, unsigned Align,

llvm/trunk/include/llvm/IR/IntrinsicInst.h

Show First 20 Lines • Show All 199 Lines • ▼ Show 20 Lines	static inline bool classof(const IntrinsicInst *I) {
}		}
}		}
static inline bool classof(const Value *V) {		static inline bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}
};		};

/// This class represents atomic memcpy intrinsic		/// This class represents atomic memcpy intrinsic
/// TODO: Integrate this class into MemIntrinsic hierarchy.		/// TODO: Integrate this class into MemIntrinsic hierarchy; for now this is
class ElementAtomicMemCpyInst : public IntrinsicInst {		/// C&P of all methods from that hierarchy
		class ElementUnorderedAtomicMemCpyInst : public IntrinsicInst {
		private:
		enum { ARG_DEST = 0, ARG_SOURCE = 1, ARG_LENGTH = 2, ARG_ELEMENTSIZE = 3 };

public:		public:
Value *getRawDest() const { return getArgOperand(0); }		Value *getRawDest() const {
Value *getRawSource() const { return getArgOperand(1); }		return const_cast<Value *>(getArgOperand(ARG_DEST));
		}
		const Use &getRawDestUse() const { return getArgOperandUse(ARG_DEST); }
		Use &getRawDestUse() { return getArgOperandUse(ARG_DEST); }

		/// Return the arguments to the instruction.
		Value *getRawSource() const {
		return const_cast<Value *>(getArgOperand(ARG_SOURCE));
		}
		const Use &getRawSourceUse() const { return getArgOperandUse(ARG_SOURCE); }
		Use &getRawSourceUse() { return getArgOperandUse(ARG_SOURCE); }

		Value *getLength() const {
		return const_cast<Value *>(getArgOperand(ARG_LENGTH));
		}
		const Use &getLengthUse() const { return getArgOperandUse(ARG_LENGTH); }
		Use &getLengthUse() { return getArgOperandUse(ARG_LENGTH); }

		bool isVolatile() const { return false; }

		Value *getRawElementSizeInBytes() const {
		return const_cast<Value *>(getArgOperand(ARG_ELEMENTSIZE));
		}

		ConstantInt *getElementSizeInBytesCst() const {
		return cast<ConstantInt>(getRawElementSizeInBytes());
		}

		uint32_t getElementSizeInBytes() const {
		return getElementSizeInBytesCst()->getZExtValue();
		}

		/// This is just like getRawDest, but it strips off any cast
		/// instructions that feed it, giving the original input. The returned
		/// value is guaranteed to be a pointer.
		Value *getDest() const { return getRawDest()->stripPointerCasts(); }

		/// This is just like getRawSource, but it strips off any cast
		/// instructions that feed it, giving the original input. The returned
		/// value is guaranteed to be a pointer.
		Value *getSource() const { return getRawSource()->stripPointerCasts(); }

		unsigned getDestAddressSpace() const {
		return cast<PointerType>(getRawDest()->getType())->getAddressSpace();
		}

Value *getNumElements() const { return getArgOperand(2); }		unsigned getSourceAddressSpace() const {
void setNumElements(Value *V) { setArgOperand(2, V); }		return cast<PointerType>(getRawSource()->getType())->getAddressSpace();
		}

uint64_t getSrcAlignment() const { return getParamAlignment(0); }		/// Set the specified arguments of the instruction.
uint64_t getDstAlignment() const { return getParamAlignment(1); }		void setDest(Value *Ptr) {
		assert(getRawDest()->getType() == Ptr->getType() &&
		"setDest called with pointer of wrong type!");
		setArgOperand(ARG_DEST, Ptr);
		}

		void setSource(Value *Ptr) {
		assert(getRawSource()->getType() == Ptr->getType() &&
		"setSource called with pointer of wrong type!");
		setArgOperand(ARG_SOURCE, Ptr);
		}

		void setLength(Value *L) {
		assert(getLength()->getType() == L->getType() &&
		"setLength called with value of wrong type!");
		setArgOperand(ARG_LENGTH, L);
		}

uint64_t getElementSizeInBytes() const {		void setElementSizeInBytes(Constant *V) {
Value *Arg = getArgOperand(3);		assert(V->getType() == Type::getInt8Ty(getContext()) &&
return cast<ConstantInt>(Arg)->getZExtValue();		"setElementSizeInBytes called with value of wrong type!");
		setArgOperand(ARG_ELEMENTSIZE, V);
}		}

static inline bool classof(const IntrinsicInst *I) {		static inline bool classof(const IntrinsicInst *I) {
return I->getIntrinsicID() == Intrinsic::memcpy_element_atomic;		return I->getIntrinsicID() == Intrinsic::memcpy_element_unordered_atomic;
}		}
static inline bool classof(const Value *V) {		static inline bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}
};		};

/// This is the common base class for memset/memcpy/memmove.		/// This is the common base class for memset/memcpy/memmove.
class MemIntrinsic : public IntrinsicInst {		class MemIntrinsic : public IntrinsicInst {

llvm/trunk/include/llvm/IR/Intrinsics.td

	Show First 20 Lines • Show All 856 Lines • ▼ Show 20 Lines
	// Takes a pointer to a string and the length of the string.			// Takes a pointer to a string and the length of the string.
	def int_xray_customevent : Intrinsic<[], [llvm_ptr_ty, llvm_i32_ty],			def int_xray_customevent : Intrinsic<[], [llvm_ptr_ty, llvm_i32_ty],
	[NoCapture<0>, ReadOnly<0>, IntrWriteMem]>;			[NoCapture<0>, ReadOnly<0>, IntrWriteMem]>;
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	//===------ Memory intrinsics with element-wise atomicity guarantees ------===//			//===------ Memory intrinsics with element-wise atomicity guarantees ------===//
	//			//

	def int_memcpy_element_atomic : Intrinsic<[],			// @llvm.memcpy.element.unordered.atomic.*(dest, src, length, elementsize)
	[llvm_anyptr_ty, llvm_anyptr_ty,			def int_memcpy_element_unordered_atomic
	llvm_i64_ty, llvm_i32_ty],			: Intrinsic<[],
	[IntrArgMemOnly, NoCapture<0>, NoCapture<1>,			[
	WriteOnly<0>, ReadOnly<1>]>;			llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty, llvm_i32_ty
				],
				[
				IntrArgMemOnly, NoCapture<0>, NoCapture<1>, WriteOnly<0>,
				ReadOnly<1>
				]>;

	//===------------------------ Reduction Intrinsics ------------------------===//			//===------------------------ Reduction Intrinsics ------------------------===//
	//			//
	def int_experimental_vector_reduce_fadd : Intrinsic<[llvm_anyfloat_ty],			def int_experimental_vector_reduce_fadd : Intrinsic<[llvm_anyfloat_ty],
	[llvm_anyfloat_ty,			[llvm_anyfloat_ty,
	llvm_anyvector_ty],			llvm_anyvector_ty],
	[IntrNoMem]>;			[IntrNoMem]>;
	def int_experimental_vector_reduce_fmul : Intrinsic<[llvm_anyfloat_ty],			def int_experimental_vector_reduce_fmul : Intrinsic<[llvm_anyfloat_ty],

llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp


Show First 20 Lines • Show All 4,937 Lines • ▼ Show 20 Lines	case Intrinsic::memmove: {
bool isVol = cast<ConstantInt>(I.getArgOperand(4))->getZExtValue();		bool isVol = cast<ConstantInt>(I.getArgOperand(4))->getZExtValue();
bool isTC = I.isTailCall() && isInTailCallPosition(&I, DAG.getTarget());		bool isTC = I.isTailCall() && isInTailCallPosition(&I, DAG.getTarget());
SDValue MM = DAG.getMemmove(getRoot(), sdl, Op1, Op2, Op3, Align, isVol,		SDValue MM = DAG.getMemmove(getRoot(), sdl, Op1, Op2, Op3, Align, isVol,
isTC, MachinePointerInfo(I.getArgOperand(0)),		isTC, MachinePointerInfo(I.getArgOperand(0)),
MachinePointerInfo(I.getArgOperand(1)));		MachinePointerInfo(I.getArgOperand(1)));
updateDAGForMaybeTailCall(MM);		updateDAGForMaybeTailCall(MM);
return nullptr;		return nullptr;
}		}
case Intrinsic::memcpy_element_atomic: {		case Intrinsic::memcpy_element_unordered_atomic: {
SDValue Dst = getValue(I.getArgOperand(0));		const ElementUnorderedAtomicMemCpyInst &MI =
SDValue Src = getValue(I.getArgOperand(1));		cast<ElementUnorderedAtomicMemCpyInst>(I);
SDValue NumElements = getValue(I.getArgOperand(2));		SDValue Dst = getValue(MI.getRawDest());
SDValue ElementSize = getValue(I.getArgOperand(3));		SDValue Src = getValue(MI.getRawSource());
		SDValue Length = getValue(MI.getLength());

// Emit a library call.		// Emit a library call.
TargetLowering::ArgListTy Args;		TargetLowering::ArgListTy Args;
TargetLowering::ArgListEntry Entry;		TargetLowering::ArgListEntry Entry;
Entry.Ty = DAG.getDataLayout().getIntPtrType(*DAG.getContext());		Entry.Ty = DAG.getDataLayout().getIntPtrType(*DAG.getContext());
Entry.Node = Dst;		Entry.Node = Dst;
Args.push_back(Entry);		Args.push_back(Entry);

Entry.Node = Src;		Entry.Node = Src;
Args.push_back(Entry);		Args.push_back(Entry);

Entry.Ty = I.getArgOperand(2)->getType();		Entry.Ty = MI.getLength()->getType();
Entry.Node = NumElements;		Entry.Node = Length;
Args.push_back(Entry);

Entry.Ty = Type::getInt32Ty(*DAG.getContext());
Entry.Node = ElementSize;
Args.push_back(Entry);		Args.push_back(Entry);

uint64_t ElementSizeConstant =		uint64_t ElementSizeConstant = MI.getElementSizeInBytes();
cast<ConstantInt>(I.getArgOperand(3))->getZExtValue();
RTLIB::Libcall LibraryCall =		RTLIB::Libcall LibraryCall =
RTLIB::getMEMCPY_ELEMENT_ATOMIC(ElementSizeConstant);		RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(ElementSizeConstant);
if (LibraryCall == RTLIB::UNKNOWN_LIBCALL)		if (LibraryCall == RTLIB::UNKNOWN_LIBCALL)
report_fatal_error("Unsupported element size");		report_fatal_error("Unsupported element size");

TargetLowering::CallLoweringInfo CLI(DAG);		TargetLowering::CallLoweringInfo CLI(DAG);
CLI.setDebugLoc(sdl).setChain(getRoot()).setLibCallee(		CLI.setDebugLoc(sdl).setChain(getRoot()).setLibCallee(
TLI.getLibcallCallingConv(LibraryCall),		TLI.getLibcallCallingConv(LibraryCall),
Type::getVoidTy(*DAG.getContext()),		Type::getVoidTy(*DAG.getContext()),
DAG.getExternalSymbol(TLI.getLibcallName(LibraryCall),		DAG.getExternalSymbol(TLI.getLibcallName(LibraryCall),

llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp

Show First 20 Lines • Show All 368 Lines • ▼ Show 20 Lines	static void InitLibcallNames(const char **Names, const Triple &TT) {
Names[RTLIB::UO_PPCF128] = "__gcc_qunord";		Names[RTLIB::UO_PPCF128] = "__gcc_qunord";
Names[RTLIB::O_F32] = "__unordsf2";		Names[RTLIB::O_F32] = "__unordsf2";
Names[RTLIB::O_F64] = "__unorddf2";		Names[RTLIB::O_F64] = "__unorddf2";
Names[RTLIB::O_F128] = "__unordtf2";		Names[RTLIB::O_F128] = "__unordtf2";
Names[RTLIB::O_PPCF128] = "__gcc_qunord";		Names[RTLIB::O_PPCF128] = "__gcc_qunord";
Names[RTLIB::MEMCPY] = "memcpy";		Names[RTLIB::MEMCPY] = "memcpy";
Names[RTLIB::MEMMOVE] = "memmove";		Names[RTLIB::MEMMOVE] = "memmove";
Names[RTLIB::MEMSET] = "memset";		Names[RTLIB::MEMSET] = "memset";
Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_1] = "__llvm_memcpy_element_atomic_1";		Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_1] =
Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_2] = "__llvm_memcpy_element_atomic_2";		"__llvm_memcpy_element_unordered_atomic_1";
Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_4] = "__llvm_memcpy_element_atomic_4";		Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_2] =
Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_8] = "__llvm_memcpy_element_atomic_8";		"__llvm_memcpy_element_unordered_atomic_2";
Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_16] = "__llvm_memcpy_element_atomic_16";		Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_4] =
		"__llvm_memcpy_element_unordered_atomic_4";
		Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_8] =
		"__llvm_memcpy_element_unordered_atomic_8";
		Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_16] =
		"__llvm_memcpy_element_unordered_atomic_16";
Names[RTLIB::UNWIND_RESUME] = "_Unwind_Resume";		Names[RTLIB::UNWIND_RESUME] = "_Unwind_Resume";
Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_1] = "__sync_val_compare_and_swap_1";		Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_1] = "__sync_val_compare_and_swap_1";
Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_2] = "__sync_val_compare_and_swap_2";		Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_2] = "__sync_val_compare_and_swap_2";
Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_4] = "__sync_val_compare_and_swap_4";		Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_4] = "__sync_val_compare_and_swap_4";
Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_8] = "__sync_val_compare_and_swap_8";		Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_8] = "__sync_val_compare_and_swap_8";
Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_16] = "__sync_val_compare_and_swap_16";		Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_16] = "__sync_val_compare_and_swap_16";
Names[RTLIB::SYNC_LOCK_TEST_AND_SET_1] = "__sync_lock_test_and_set_1";		Names[RTLIB::SYNC_LOCK_TEST_AND_SET_1] = "__sync_lock_test_and_set_1";
Names[RTLIB::SYNC_LOCK_TEST_AND_SET_2] = "__sync_lock_test_and_set_2";		Names[RTLIB::SYNC_LOCK_TEST_AND_SET_2] = "__sync_lock_test_and_set_2";
▲ Show 20 Lines • Show All 386 Lines • ▼ Show 20 Lines	switch (Opc) {
OP_TO_LIBCALL(ISD::ATOMIC_LOAD_UMIN, SYNC_FETCH_AND_UMIN)		OP_TO_LIBCALL(ISD::ATOMIC_LOAD_UMIN, SYNC_FETCH_AND_UMIN)
}		}

#undef OP_TO_LIBCALL		#undef OP_TO_LIBCALL

return UNKNOWN_LIBCALL;		return UNKNOWN_LIBCALL;
}		}

RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize) {		RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize) {
switch (ElementSize) {		switch (ElementSize) {
case 1:		case 1:
return MEMCPY_ELEMENT_ATOMIC_1;		return MEMCPY_ELEMENT_UNORDERED_ATOMIC_1;
case 2:		case 2:
return MEMCPY_ELEMENT_ATOMIC_2;		return MEMCPY_ELEMENT_UNORDERED_ATOMIC_2;
case 4:		case 4:
return MEMCPY_ELEMENT_ATOMIC_4;		return MEMCPY_ELEMENT_UNORDERED_ATOMIC_4;
case 8:		case 8:
return MEMCPY_ELEMENT_ATOMIC_8;		return MEMCPY_ELEMENT_UNORDERED_ATOMIC_8;
case 16:		case 16:
return MEMCPY_ELEMENT_ATOMIC_16;		return MEMCPY_ELEMENT_UNORDERED_ATOMIC_16;
default:		default:
return UNKNOWN_LIBCALL;		return UNKNOWN_LIBCALL;
}		}

}		}

/// InitCmpLibcallCCs - Set default comparison libcall CC.		/// InitCmpLibcallCCs - Set default comparison libcall CC.
///		///
static void InitCmpLibcallCCs(ISD::CondCode *CCs) {		static void InitCmpLibcallCCs(ISD::CondCode *CCs) {
memset(CCs, ISD::SETCC_INVALID, sizeof(ISD::CondCode)*RTLIB::UNKNOWN_LIBCALL);		memset(CCs, ISD::SETCC_INVALID, sizeof(ISD::CondCode)*RTLIB::UNKNOWN_LIBCALL);
CCs[RTLIB::OEQ_F32] = ISD::SETEQ;		CCs[RTLIB::OEQ_F32] = ISD::SETEQ;
CCs[RTLIB::OEQ_F64] = ISD::SETEQ;		CCs[RTLIB::OEQ_F64] = ISD::SETEQ;
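As an illustration of the renamed libcall table above, the following standalone sketch maps an element size to the runtime symbol name registered by this patch. The helper name `unorderedAtomicMemcpyLibcallName` is hypothetical and not part of LLVM; the symbol names and the supported sizes (1, 2, 4, 8, 16) are taken from the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not an LLVM API): returns the runtime symbol that the
// patch registers for a given element size, or an empty string when no
// libcall exists for that size (the diff returns UNKNOWN_LIBCALL there).
std::string unorderedAtomicMemcpyLibcallName(uint64_t ElementSize) {
  switch (ElementSize) {
  case 1:
  case 2:
  case 4:
  case 8:
  case 16:
    return "__llvm_memcpy_element_unordered_atomic_" +
           std::to_string(ElementSize);
  default:
    return std::string();
  }
}
```

An empty result corresponds to the case where SelectionDAGBuilder reports "Unsupported element size".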

llvm/trunk/lib/IR/IRBuilder.cpp

Show First 20 Lines • Show All 128 Lines • ▼ Show 20 Lines	if (ScopeTag)
CI->setMetadata(LLVMContext::MD_alias_scope, ScopeTag);		CI->setMetadata(LLVMContext::MD_alias_scope, ScopeTag);

if (NoAliasTag)		if (NoAliasTag)
CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);		CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);

return CI;		return CI;
}		}

CallInst *IRBuilderBase::CreateElementAtomicMemCpy(		CallInst *IRBuilderBase::CreateElementUnorderedAtomicMemCpy(
Value *Dst, Value *Src, Value *NumElements, uint32_t ElementSize,		Value *Dst, Value *Src, Value *Size, uint32_t ElementSize, MDNode *TBAATag,
MDNode *TBAATag, MDNode *TBAAStructTag, MDNode *ScopeTag,		MDNode *TBAAStructTag, MDNode *ScopeTag, MDNode *NoAliasTag) {
MDNode *NoAliasTag) {
Dst = getCastedInt8PtrValue(Dst);		Dst = getCastedInt8PtrValue(Dst);
Src = getCastedInt8PtrValue(Src);		Src = getCastedInt8PtrValue(Src);

Value *Ops[] = {Dst, Src, NumElements, getInt32(ElementSize)};		Value *Ops[] = {Dst, Src, Size, getInt32(ElementSize)};
Type *Tys[] = {Dst->getType(), Src->getType()};		Type *Tys[] = {Dst->getType(), Src->getType(), Size->getType()};
Module *M = BB->getParent()->getParent();		Module *M = BB->getParent()->getParent();
Value *TheFn =		Value *TheFn = Intrinsic::getDeclaration(
Intrinsic::getDeclaration(M, Intrinsic::memcpy_element_atomic, Tys);		M, Intrinsic::memcpy_element_unordered_atomic, Tys);

CallInst *CI = createCallHelper(TheFn, Ops, this);		CallInst *CI = createCallHelper(TheFn, Ops, this);

// Set the TBAA info if present.		// Set the TBAA info if present.
if (TBAATag)		if (TBAATag)
CI->setMetadata(LLVMContext::MD_tbaa, TBAATag);		CI->setMetadata(LLVMContext::MD_tbaa, TBAATag);

// Set the TBAA Struct info if present.		// Set the TBAA Struct info if present.

llvm/trunk/lib/IR/Verifier.cpp

Show First 20 Lines • Show All 4,006 Lines • ▼ Show 20 Lines	case Intrinsic::memset: {
const APInt &AlignVal = AlignCI->getValue();		const APInt &AlignVal = AlignCI->getValue();
Assert(AlignCI->isZero() || AlignVal.isPowerOf2(),		Assert(AlignCI->isZero() || AlignVal.isPowerOf2(),
"alignment argument of memory intrinsics must be a power of 2", CS);		"alignment argument of memory intrinsics must be a power of 2", CS);
Assert(isa<ConstantInt>(CS.getArgOperand(4)),		Assert(isa<ConstantInt>(CS.getArgOperand(4)),
"isvolatile argument of memory intrinsics must be a constant int",		"isvolatile argument of memory intrinsics must be a constant int",
CS);		CS);
break;		break;
}		}
case Intrinsic::memcpy_element_atomic: {		case Intrinsic::memcpy_element_unordered_atomic: {
ConstantInt *ElementSizeCI = dyn_cast<ConstantInt>(CS.getArgOperand(3));		const ElementUnorderedAtomicMemCpyInst *MI =
Assert(ElementSizeCI, "element size of the element-wise atomic memory "		cast<ElementUnorderedAtomicMemCpyInst>(CS.getInstruction());
		;

		ConstantInt *ElementSizeCI =
		dyn_cast<ConstantInt>(MI->getRawElementSizeInBytes());
		Assert(ElementSizeCI,
		"element size of the element-wise unordered atomic memory "
"intrinsic must be a constant int",		"intrinsic must be a constant int",
CS);		CS);
const APInt &ElementSizeVal = ElementSizeCI->getValue();		const APInt &ElementSizeVal = ElementSizeCI->getValue();
Assert(ElementSizeVal.isPowerOf2(),		Assert(ElementSizeVal.isPowerOf2(),
"element size of the element-wise atomic memory intrinsic "		"element size of the element-wise atomic memory intrinsic "
"must be a power of 2",		"must be a power of 2",
CS);		CS);

		if (auto *LengthCI = dyn_cast<ConstantInt>(MI->getLength())) {
		uint64_t Length = LengthCI->getZExtValue();
		uint64_t ElementSize = MI->getElementSizeInBytes();
		Assert((Length % ElementSize) == 0,
		"constant length must be a multiple of the element size in the "
		"element-wise atomic memory intrinsic",
		CS);
		}

auto IsValidAlignment = [&](uint64_t Alignment) {		auto IsValidAlignment = [&](uint64_t Alignment) {
return isPowerOf2_64(Alignment) && ElementSizeVal.ule(Alignment);		return isPowerOf2_64(Alignment) && ElementSizeVal.ule(Alignment);
};		};

uint64_t DstAlignment = CS.getParamAlignment(0),		uint64_t DstAlignment = CS.getParamAlignment(0),
SrcAlignment = CS.getParamAlignment(1);		SrcAlignment = CS.getParamAlignment(1);

Assert(IsValidAlignment(DstAlignment),		Assert(IsValidAlignment(DstAlignment),
"incorrect alignment of the destination argument",		"incorrect alignment of the destination argument", CS);
CS);
Assert(IsValidAlignment(SrcAlignment),		Assert(IsValidAlignment(SrcAlignment),
"incorrect alignment of the source argument",		"incorrect alignment of the source argument", CS);
CS);
break;		break;
}		}
case Intrinsic::gcroot:		case Intrinsic::gcroot:
case Intrinsic::gcwrite:		case Intrinsic::gcwrite:
case Intrinsic::gcread:		case Intrinsic::gcread:
if (ID == Intrinsic::gcroot) {		if (ID == Intrinsic::gcroot) {
AllocaInst *AI =		AllocaInst *AI =
dyn_cast<AllocaInst>(CS.getArgOperand(0)->stripPointerCasts());		dyn_cast<AllocaInst>(CS.getArgOperand(0)->stripPointerCasts());
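The verifier rules in the hunk above can be restated outside of LLVM as a small standalone check. This is only a sketch: the `CopyParams` struct and `checkUnorderedAtomicMemcpy` function are hypothetical names introduced for illustration, and the messages are shortened versions of the ones in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the intrinsic's operands and parameter attributes.
struct CopyParams {
  uint64_t ElementSize;  // constant element size in bytes
  bool HasConstLength;   // true when the length operand is a constant
  uint64_t Length;       // length in bytes (only read if HasConstLength)
  uint64_t DstAlign;     // alignment attribute on the destination pointer
  uint64_t SrcAlign;     // alignment attribute on the source pointer
};

static bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Returns an empty string when all checks pass, otherwise the first failure.
std::string checkUnorderedAtomicMemcpy(const CopyParams &P) {
  if (!isPowerOf2(P.ElementSize))
    return "element size must be a power of 2";
  // New in this patch: a constant length must be a whole number of elements.
  if (P.HasConstLength && (P.Length % P.ElementSize) != 0)
    return "constant length must be a multiple of the element size";
  auto IsValidAlignment = [&](uint64_t A) {
    return isPowerOf2(A) && P.ElementSize <= A;
  };
  if (!IsValidAlignment(P.DstAlign))
    return "incorrect alignment of the destination argument";
  if (!IsValidAlignment(P.SrcAlign))
    return "incorrect alignment of the source argument";
  return std::string();
}
```

Note that a non-constant length is not checked at all, which matches the `dyn_cast<ConstantInt>` guard in the diff.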

llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp

Show First 20 Lines • Show All 88 Lines • ▼ Show 20 Lines	for (unsigned I = 0, E = V->getNumElements(); I != E; ++I) {
bool Sign = V->getElementType()->isIntegerTy()		bool Sign = V->getElementType()->isIntegerTy()
? cast<ConstantInt>(Elt)->isNegative()		? cast<ConstantInt>(Elt)->isNegative()
: cast<ConstantFP>(Elt)->isNegative();		: cast<ConstantFP>(Elt)->isNegative();
BoolVec.push_back(ConstantInt::get(BoolTy, Sign));		BoolVec.push_back(ConstantInt::get(BoolTy, Sign));
}		}
return ConstantVector::get(BoolVec);		return ConstantVector::get(BoolVec);
}		}

Instruction *		Instruction *InstCombiner::SimplifyElementUnorderedAtomicMemCpy(
InstCombiner::SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI) {		ElementUnorderedAtomicMemCpyInst *AMI) {
// Try to unfold this intrinsic into sequence of explicit atomic loads and		// Try to unfold this intrinsic into sequence of explicit atomic loads and
// stores.		// stores.
// First check that number of elements is compile time constant.		// First check that number of elements is compile time constant.
auto *NumElementsCI = dyn_cast<ConstantInt>(AMI->getNumElements());		auto *LengthCI = dyn_cast<ConstantInt>(AMI->getLength());
if (!NumElementsCI)		if (!LengthCI)
return nullptr;		return nullptr;

// Check that there are not too many elements.		// Check that there are not too many elements.
uint64_t NumElements = NumElementsCI->getZExtValue();		uint64_t LengthInBytes = LengthCI->getZExtValue();
		uint32_t ElementSizeInBytes = AMI->getElementSizeInBytes();
		uint64_t NumElements = LengthInBytes / ElementSizeInBytes;
if (NumElements >= UnfoldElementAtomicMemcpyMaxElements)		if (NumElements >= UnfoldElementAtomicMemcpyMaxElements)
return nullptr;		return nullptr;

		// Only expand if there are elements to copy.
		if (NumElements > 0) {
// Don't unfold into illegal integers		// Don't unfold into illegal integers
uint64_t ElementSizeInBytes = AMI->getElementSizeInBytes() * 8;		uint64_t ElementSizeInBits = ElementSizeInBytes * 8;
if (!getDataLayout().isLegalInteger(ElementSizeInBytes))		if (!getDataLayout().isLegalInteger(ElementSizeInBits))
return nullptr;		return nullptr;

// Cast source and destination to the correct type. Intrinsic input arguments		// Cast source and destination to the correct type. Intrinsic input
// are usually represented as i8*.		// arguments are usually represented as i8*. Often operands will be
// Often operands will be explicitly casted to i8* and we can just strip		// explicitly casted to i8* and we can just strip those casts instead of
// those casts instead of inserting new ones. However it's easier to rely on		// inserting new ones. However it's easier to rely on other InstCombine
// other InstCombine rules which will cover trivial cases anyway.		// rules which will cover trivial cases anyway.
Value *Src = AMI->getRawSource();		Value *Src = AMI->getRawSource();
Value *Dst = AMI->getRawDest();		Value *Dst = AMI->getRawDest();
Type *ElementPointerType = Type::getIntNPtrTy(		Type *ElementPointerType =
AMI->getContext(), ElementSizeInBytes, Src->getType()->getPointerAddressSpace());		Type::getIntNPtrTy(AMI->getContext(), ElementSizeInBits,
		Src->getType()->getPointerAddressSpace());

Value *SrcCasted = Builder->CreatePointerCast(Src, ElementPointerType,		Value *SrcCasted = Builder->CreatePointerCast(Src, ElementPointerType,
"memcpy_unfold.src_casted");		"memcpy_unfold.src_casted");
Value *DstCasted = Builder->CreatePointerCast(Dst, ElementPointerType,		Value *DstCasted = Builder->CreatePointerCast(Dst, ElementPointerType,
"memcpy_unfold.dst_casted");		"memcpy_unfold.dst_casted");

for (uint64_t i = 0; i < NumElements; ++i) {		for (uint64_t i = 0; i < NumElements; ++i) {
// Get current element addresses		// Get current element addresses
ConstantInt *ElementIdxCI =		ConstantInt *ElementIdxCI =
ConstantInt::get(AMI->getContext(), APInt(64, i));		ConstantInt::get(AMI->getContext(), APInt(64, i));
Value *SrcElementAddr =		Value *SrcElementAddr =
Builder->CreateGEP(SrcCasted, ElementIdxCI, "memcpy_unfold.src_addr");		Builder->CreateGEP(SrcCasted, ElementIdxCI, "memcpy_unfold.src_addr");
Value *DstElementAddr =		Value *DstElementAddr =
Builder->CreateGEP(DstCasted, ElementIdxCI, "memcpy_unfold.dst_addr");		Builder->CreateGEP(DstCasted, ElementIdxCI, "memcpy_unfold.dst_addr");

// Load from the source. Transfer alignment information and mark load as		// Load from the source. Transfer alignment information and mark load as
// unordered atomic.		// unordered atomic.
LoadInst *Load = Builder->CreateLoad(SrcElementAddr, "memcpy_unfold.val");		LoadInst *Load = Builder->CreateLoad(SrcElementAddr, "memcpy_unfold.val");
Load->setOrdering(AtomicOrdering::Unordered);		Load->setOrdering(AtomicOrdering::Unordered);
// We know alignment of the first element. It is also guaranteed by the		// We know alignment of the first element. It is also guaranteed by the
// verifier that element size is less or equal than first element alignment		// verifier that element size is less or equal than first element
// and both of this values are powers of two.		// alignment and both of this values are powers of two. This means that
// This means that all subsequent accesses are at least element size		// all subsequent accesses are at least element size aligned.
// aligned.
// TODO: We can infer better alignment but there is no evidence that this		// TODO: We can infer better alignment but there is no evidence that this
// will matter.		// will matter.
Load->setAlignment(i == 0 ? AMI->getSrcAlignment()		Load->setAlignment(i == 0 ? AMI->getParamAlignment(1)
: AMI->getElementSizeInBytes());		: ElementSizeInBytes);
Load->setDebugLoc(AMI->getDebugLoc());		Load->setDebugLoc(AMI->getDebugLoc());

// Store loaded value via unordered atomic store.		// Store loaded value via unordered atomic store.
StoreInst *Store = Builder->CreateStore(Load, DstElementAddr);		StoreInst *Store = Builder->CreateStore(Load, DstElementAddr);
Store->setOrdering(AtomicOrdering::Unordered);		Store->setOrdering(AtomicOrdering::Unordered);
Store->setAlignment(i == 0 ? AMI->getDstAlignment()		Store->setAlignment(i == 0 ? AMI->getParamAlignment(0)
: AMI->getElementSizeInBytes());		: ElementSizeInBytes);
Store->setDebugLoc(AMI->getDebugLoc());		Store->setDebugLoc(AMI->getDebugLoc());
}		}
		}

// Set the number of elements of the copy to 0, it will be deleted on the		// Set the number of elements of the copy to 0, it will be deleted on the
// next iteration.		// next iteration.
AMI->setNumElements(Constant::getNullValue(NumElementsCI->getType()));		AMI->setLength(Constant::getNullValue(LengthCI->getType()));
return AMI;		return AMI;
}		}

Instruction *InstCombiner::SimplifyMemTransfer(MemIntrinsic *MI) {		Instruction *InstCombiner::SimplifyMemTransfer(MemIntrinsic *MI) {
unsigned DstAlign = getKnownAlignment(MI->getArgOperand(0), DL, MI, &AC, &DT);		unsigned DstAlign = getKnownAlignment(MI->getArgOperand(0), DL, MI, &AC, &DT);
unsigned SrcAlign = getKnownAlignment(MI->getArgOperand(1), DL, MI, &AC, &DT);		unsigned SrcAlign = getKnownAlignment(MI->getArgOperand(1), DL, MI, &AC, &DT);
unsigned MinAlign = std::min(DstAlign, SrcAlign);		unsigned MinAlign = std::min(DstAlign, SrcAlign);
unsigned CopyAlign = MI->getAlignment();		unsigned CopyAlign = MI->getAlignment();
▲ Show 20 Lines • Show All 1,709 Lines • ▼ Show 20 Lines	if (MemIntrinsic *MI = dyn_cast<MemIntrinsic>(II)) {
} else if (MemSetInst *MSI = dyn_cast<MemSetInst>(MI)) {		} else if (MemSetInst *MSI = dyn_cast<MemSetInst>(MI)) {
if (Instruction *I = SimplifyMemSet(MSI))		if (Instruction *I = SimplifyMemSet(MSI))
return I;		return I;
}		}

if (Changed) return II;		if (Changed) return II;
}		}

if (auto *AMI = dyn_cast<ElementAtomicMemCpyInst>(II)) {		if (auto *AMI = dyn_cast<ElementUnorderedAtomicMemCpyInst>(II)) {
if (Constant *C = dyn_cast<Constant>(AMI->getNumElements()))		if (Constant *C = dyn_cast<Constant>(AMI->getLength()))
if (C->isNullValue())		if (C->isNullValue())
return eraseInstFromFunction(*AMI);		return eraseInstFromFunction(*AMI);

if (Instruction *I = SimplifyElementAtomicMemCpy(AMI))		if (Instruction *I = SimplifyElementUnorderedAtomicMemCpy(AMI))
return I;		return I;
}		}

if (Instruction *I = SimplifyNVVMIntrinsic(II, *this))		if (Instruction *I = SimplifyNVVMIntrinsic(II, *this))
return I;		return I;

auto SimplifyDemandedVectorEltsLow = [this](Value *Op, unsigned Width,		auto SimplifyDemandedVectorEltsLow = [this](Value *Op, unsigned Width,
unsigned DemandedWidth) {		unsigned DemandedWidth) {
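Because the third operand is now a length in bytes rather than an element count, SimplifyElementUnorderedAtomicMemCpy first derives the element count before comparing it against the unfold limit. A minimal sketch of that arithmetic follows; the function name `elementsToUnfold` and the `MaxElements` parameter are hypothetical, with `MaxElements` standing in for UnfoldElementAtomicMemcpyMaxElements, whose value is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: computes the element count for a constant-length copy
// and reports whether the copy is small enough to be unfolded into explicit
// unordered atomic loads and stores.
bool elementsToUnfold(uint64_t LengthInBytes, uint32_t ElementSizeInBytes,
                      uint64_t MaxElements, uint64_t &NumElements) {
  // The verifier guarantees a power-of-2 element size; guard anyway so the
  // sketch never divides by zero.
  if (ElementSizeInBytes == 0) {
    NumElements = 0;
    return false;
  }
  NumElements = LengthInBytes / ElementSizeInBytes;
  // Mirrors the diff: bail out when NumElements >= the limit.
  return NumElements < MaxElements;
}
```

For example, a 32-byte copy with 4-byte elements yields 8 elements, where the old prototype would have carried the value 8 directly in the operand.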

llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h

Show First 20 Lines • Show All 720 Lines • ▼ Show 20 Lines	Instruction *OptAndOp(BinaryOperator *Op, ConstantInt *OpRHS,
ConstantInt *AndRHS, BinaryOperator &TheAnd);		ConstantInt *AndRHS, BinaryOperator &TheAnd);

Value *insertRangeTest(Value *V, const APInt &Lo, const APInt &Hi,		Value *insertRangeTest(Value *V, const APInt &Lo, const APInt &Hi,
bool isSigned, bool Inside);		bool isSigned, bool Inside);
Instruction *PromoteCastOfAllocation(BitCastInst &CI, AllocaInst &AI);		Instruction *PromoteCastOfAllocation(BitCastInst &CI, AllocaInst &AI);
Instruction *MatchBSwap(BinaryOperator &I);		Instruction *MatchBSwap(BinaryOperator &I);
bool SimplifyStoreAtEndOfBlock(StoreInst &SI);		bool SimplifyStoreAtEndOfBlock(StoreInst &SI);

Instruction *SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI);		Instruction *
		SimplifyElementUnorderedAtomicMemCpy(ElementUnorderedAtomicMemCpyInst *AMI);
Instruction *SimplifyMemTransfer(MemIntrinsic *MI);		Instruction *SimplifyMemTransfer(MemIntrinsic *MI);
Instruction *SimplifyMemSet(MemSetInst *MI);		Instruction *SimplifyMemSet(MemSetInst *MI);

Value *EvaluateInDifferentType(Value *V, Type *Ty, bool isSigned);		Value *EvaluateInDifferentType(Value *V, Type *Ty, bool isSigned);

/// \brief Returns a value X such that Val = X * Scale, or null if none.		/// \brief Returns a value X such that Val = X * Scale, or null if none.
///		///
/// If the multiplication is known not to overflow then NoSignedWrap is set.		/// If the multiplication is known not to overflow then NoSignedWrap is set.
Value *Descale(Value *Val, APInt Scale, bool &NoSignedWrap);		Value *Descale(Value *Val, APInt Scale, bool &NoSignedWrap);
};		};

} // end namespace llvm.		} // end namespace llvm.

#undef DEBUG_TYPE		#undef DEBUG_TYPE

#endif		#endif

llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

Show First 20 Lines • Show All 977 Lines • ▼ Show 20 Lines	bool LoopIdiomRecognize::processLoopStoreOfLoopLoad(StoreInst *SI,

// The # stored bytes is (BECount+1)*Size. Expand the trip count out to		// The # stored bytes is (BECount+1)*Size. Expand the trip count out to
// pointer size if it isn't already.		// pointer size if it isn't already.
BECount = SE->getTruncateOrZeroExtend(BECount, IntPtrTy);		BECount = SE->getTruncateOrZeroExtend(BECount, IntPtrTy);

const SCEV *NumBytesS =		const SCEV *NumBytesS =
SE->getAddExpr(BECount, SE->getOne(IntPtrTy), SCEV::FlagNUW);		SE->getAddExpr(BECount, SE->getOne(IntPtrTy), SCEV::FlagNUW);

unsigned Align = std::min(SI->getAlignment(), LI->getAlignment());
CallInst *NewCall = nullptr;
// Check whether to generate an unordered atomic memcpy:
// If the load or store are atomic, then they must neccessarily be unordered
// by previous checks.
if (!SI->isAtomic() && !LI->isAtomic()) {
if (StoreSize != 1)		if (StoreSize != 1)
NumBytesS = SE->getMulExpr(		NumBytesS = SE->getMulExpr(NumBytesS, SE->getConstant(IntPtrTy, StoreSize),
NumBytesS, SE->getConstant(IntPtrTy, StoreSize), SCEV::FlagNUW);		SCEV::FlagNUW);

Value *NumBytes =		Value *NumBytes =
Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());		Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());

		unsigned Align = std::min(SI->getAlignment(), LI->getAlignment());
		CallInst *NewCall = nullptr;
		// Check whether to generate an unordered atomic memcpy:
		// If the load or store are atomic, then they must neccessarily be unordered
		// by previous checks.
		if (!SI->isAtomic() && !LI->isAtomic())
NewCall = Builder.CreateMemCpy(StoreBasePtr, LoadBasePtr, NumBytes, Align);		NewCall = Builder.CreateMemCpy(StoreBasePtr, LoadBasePtr, NumBytes, Align);
} else {		else {
// We cannot allow unaligned ops for unordered load/store, so reject		// We cannot allow unaligned ops for unordered load/store, so reject
// anything where the alignment isn't at least the element size.		// anything where the alignment isn't at least the element size.
if (Align < StoreSize)		if (Align < StoreSize)
return false;		return false;

// If the element.atomic memcpy is not lowered into explicit		// If the element.atomic memcpy is not lowered into explicit
// loads/stores later, then it will be lowered into an element-size		// loads/stores later, then it will be lowered into an element-size
// specific lib call. If the lib call doesn't exist for our store size, then		// specific lib call. If the lib call doesn't exist for our store size, then
// we shouldn't generate the memcpy.		// we shouldn't generate the memcpy.
if (StoreSize > TTI->getAtomicMemIntrinsicMaxElementSize())		if (StoreSize > TTI->getAtomicMemIntrinsicMaxElementSize())
return false;		return false;

Value *NumElements =		NewCall = Builder.CreateElementUnorderedAtomicMemCpy(
Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());		StoreBasePtr, LoadBasePtr, NumBytes, StoreSize);

NewCall = Builder.CreateElementAtomicMemCpy(StoreBasePtr, LoadBasePtr,
NumElements, StoreSize);
// Propagate alignment info onto the pointer args. Note that unordered		// Propagate alignment info onto the pointer args. Note that unordered
// atomic loads/stores are required by the spec to have an alignment		// atomic loads/stores are required by the spec to have an alignment
// but non-atomic loads/stores may not.		// but non-atomic loads/stores may not.
NewCall->addParamAttr(0, Attribute::getWithAlignment(NewCall->getContext(),		NewCall->addParamAttr(0, Attribute::getWithAlignment(NewCall->getContext(),
SI->getAlignment()));		SI->getAlignment()));
NewCall->addParamAttr(1, Attribute::getWithAlignment(NewCall->getContext(),		NewCall->addParamAttr(1, Attribute::getWithAlignment(NewCall->getContext(),
LI->getAlignment()));		LI->getAlignment()));
}		}
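
The decision made in the hunk above can be summarized in a small standalone sketch. It is not LLVM code: `CopyLoop`, `CopyKind`, `classifyCopyLoop` and `copyLengthInBytes` are hypothetical names introduced only for illustration, and `MaxElementSize` stands in for whatever `TTI->getAtomicMemIntrinsicMaxElementSize()` returns on the target. The point it tries to show is that, after this change, both paths pass a length in bytes, (BECount+1)*StoreSize, where the old atomic path passed an element count.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical summary of one load/store copy loop; not an LLVM type.
struct CopyLoop {
  bool StoreIsAtomic;
  bool LoadIsAtomic;
  unsigned StoreAlign; // bytes; 0 means no alignment was specified
  unsigned LoadAlign;  // bytes; 0 means no alignment was specified
  unsigned StoreSize;  // bytes copied per iteration (the element size)
};

enum class CopyKind { PlainMemCpy, ElementUnorderedAtomicMemCpy, Rejected };

// Mirrors the order of the checks in the hunk: a fully non-atomic loop
// becomes a plain memcpy; otherwise the smaller of the two alignments must
// cover the element size, and the element size must not exceed the largest
// size the target is assumed to support.
constexpr CopyKind classifyCopyLoop(const CopyLoop &L,
                                    unsigned MaxElementSize) {
  if (!L.StoreIsAtomic && !L.LoadIsAtomic)
    return CopyKind::PlainMemCpy;
  if (std::min(L.StoreAlign, L.LoadAlign) < L.StoreSize)
    return CopyKind::Rejected;
  if (L.StoreSize > MaxElementSize)
    return CopyKind::Rejected;
  return CopyKind::ElementUnorderedAtomicMemCpy;
}

// The number of stored bytes is (BECount + 1) * StoreSize, where BECount is
// the backedge-taken count of the loop.
constexpr std::uint64_t copyLengthInBytes(std::uint64_t BECount,
                                          unsigned StoreSize) {
  return (BECount + 1) * StoreSize;
}
```

For example, a loop that copies 2-byte elements with both accesses unordered atomic and aligned to 2 classifies as `ElementUnorderedAtomicMemCpy`, and with a backedge-taken count of 9 the length passed is 20 bytes rather than 10 elements.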

llvm/trunk/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu \| FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu \| FileCheck %s

	define i8* @test_memcpy1(i8* %P, i8* %Q) {			define i8* @test_memcpy1(i8* %P, i8* %Q) {
	; CHECK: test_memcpy			; CHECK: test_memcpy
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 1, i32 1)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 1)
	ret i8* %P			ret i8* %P
				; 3rd arg (%edx) -- length
	; CHECK-DAG: movl $1, %edx			; CHECK-DAG: movl $1, %edx
	; CHECK-DAG: movl $1, %ecx			; CHECK: __llvm_memcpy_element_unordered_atomic_1
	; CHECK: __llvm_memcpy_element_atomic_1
	}			}

	define i8* @test_memcpy2(i8* %P, i8* %Q) {			define i8* @test_memcpy2(i8* %P, i8* %Q) {
	; CHECK: test_memcpy2			; CHECK: test_memcpy2
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 2, i32 2)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 2, i32 2)
	ret i8* %P			ret i8* %P
				; 3rd arg (%edx) -- length
	; CHECK-DAG: movl $2, %edx			; CHECK-DAG: movl $2, %edx
	; CHECK-DAG: movl $2, %ecx			; CHECK: __llvm_memcpy_element_unordered_atomic_2
	; CHECK: __llvm_memcpy_element_atomic_2
	}			}

	define i8* @test_memcpy4(i8* %P, i8* %Q) {			define i8* @test_memcpy4(i8* %P, i8* %Q) {
	; CHECK: test_memcpy4			; CHECK: test_memcpy4
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 4, i32 4)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 4, i32 4)
	ret i8* %P			ret i8* %P
				; 3rd arg (%edx) -- length
	; CHECK-DAG: movl $4, %edx			; CHECK-DAG: movl $4, %edx
	; CHECK-DAG: movl $4, %ecx			; CHECK: __llvm_memcpy_element_unordered_atomic_4
	; CHECK: __llvm_memcpy_element_atomic_4
	}			}

	define i8* @test_memcpy8(i8* %P, i8* %Q) {			define i8* @test_memcpy8(i8* %P, i8* %Q) {
	; CHECK: test_memcpy8			; CHECK: test_memcpy8
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 8 %P, i8* align 8 %Q, i64 8, i32 8)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %P, i8* align 8 %Q, i32 8, i32 8)
	ret i8* %P			ret i8* %P
				; 3rd arg (%edx) -- length
	; CHECK-DAG: movl $8, %edx			; CHECK-DAG: movl $8, %edx
	; CHECK-DAG: movl $8, %ecx			; CHECK: __llvm_memcpy_element_unordered_atomic_8
	; CHECK: __llvm_memcpy_element_atomic_8
	}			}

	define i8* @test_memcpy16(i8* %P, i8* %Q) {			define i8* @test_memcpy16(i8* %P, i8* %Q) {
	; CHECK: test_memcpy16			; CHECK: test_memcpy16
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %P, i8* align 16 %Q, i64 16, i32 16)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %P, i8* align 16 %Q, i32 16, i32 16)
	ret i8* %P			ret i8* %P
				; 3rd arg (%edx) -- length
	; CHECK-DAG: movl $16, %edx			; CHECK-DAG: movl $16, %edx
	; CHECK-DAG: movl $16, %ecx			; CHECK: __llvm_memcpy_element_unordered_atomic_16
	; CHECK: __llvm_memcpy_element_atomic_16
	}			}

	define void @test_memcpy_args(i8** %Storage) {			define void @test_memcpy_args(i8** %Storage) {
	; CHECK: test_memcpy_args			; CHECK: test_memcpy_args
	%Dst = load i8*, i8** %Storage			%Dst = load i8*, i8** %Storage
	%Src.addr = getelementptr i8*, i8** %Storage, i64 1			%Src.addr = getelementptr i8*, i8** %Storage, i64 1
	%Src = load i8*, i8** %Src.addr			%Src = load i8*, i8** %Src.addr

	; First argument			; 1st arg (%rdi)
	; CHECK-DAG: movq (%rdi), [[REG1:%r.+]]			; CHECK-DAG: movq (%rdi), [[REG1:%r.+]]
	; CHECK-DAG: movq [[REG1]], %rdi			; CHECK-DAG: movq [[REG1]], %rdi
	; Second argument			; 2nd arg (%rsi)
	; CHECK-DAG: movq 8(%rdi), %rsi			; CHECK-DAG: movq 8(%rdi), %rsi
	; Third argument			; 3rd arg (%edx) -- length
	; CHECK-DAG: movl $4, %edx			; CHECK-DAG: movl $4, %edx
	; Fourth argument			; CHECK: __llvm_memcpy_element_unordered_atomic_4
	; CHECK-DAG: movl $4, %ecx			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %Dst, i8* align 4 %Src, i32 4, i32 4) ret void
	; CHECK: __llvm_memcpy_element_atomic_4
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 4, i32 4)
	ret void
	}			}

	declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind			declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind
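
The CHECK lines in this test suggest how the element size selects the runtime routine: sizes 1, 2, 4, 8 and 16 each map to `__llvm_memcpy_element_unordered_atomic_<size>`, and the call now takes three arguments (destination, source, length in bytes), so the element size is no longer loaded into `%ecx`. A minimal sketch of that mapping follows. The function name `unorderedAtomicMemCpyLibcall` is hypothetical, and returning `nullptr` for other sizes is an assumption made here for illustration, not something this test exercises.

```cpp
// Maps an element size in bytes to the libcall symbol name that the test
// above expects to be called. Only the five sizes covered by the test are
// listed; any other size yields nullptr (an assumption of this sketch).
constexpr const char *unorderedAtomicMemCpyLibcall(unsigned ElementSize) {
  switch (ElementSize) {
  case 1:
    return "__llvm_memcpy_element_unordered_atomic_1";
  case 2:
    return "__llvm_memcpy_element_unordered_atomic_2";
  case 4:
    return "__llvm_memcpy_element_unordered_atomic_4";
  case 8:
    return "__llvm_memcpy_element_unordered_atomic_8";
  case 16:
    return "__llvm_memcpy_element_unordered_atomic_16";
  default:
    return nullptr;
  }
}
```

Because the element size is encoded in the symbol name, the callee does not need it as a separate argument, which matches the removal of the fourth-argument checks on the right-hand side of the diff.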

llvm/trunk/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll

	; RUN: opt -instcombine -unfold-element-atomic-memcpy-max-elements=8 -S < %s \| FileCheck %s			; RUN: opt -instcombine -unfold-element-atomic-memcpy-max-elements=8 -S < %s \| FileCheck %s
				; Temporarily an expected failure until inst combine is updated in the next patch
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; Test basic unfolding			; Test basic unfolding -- unordered load & store
	define void @test1(i8* %Src, i8* %Dst) {			define void @test1a(i8* %Src, i8* %Dst) {
	; CHECK-LABEL: test1			; CHECK-LABEL: test1a
	; CHECK-NOT: llvm.memcpy.element.atomic			; CHECK-NOT: llvm.memcpy.element.unordered.atomic

	; CHECK-DAG: %memcpy_unfold.src_casted = bitcast i8* %Src to i32*			; CHECK-DAG: %memcpy_unfold.src_casted = bitcast i8* %Src to i32*
	; CHECK-DAG: %memcpy_unfold.dst_casted = bitcast i8* %Dst to i32*			; CHECK-DAG: %memcpy_unfold.dst_casted = bitcast i8* %Dst to i32*

	; CHECK-DAG: [[VAL1:%[^\s]+]] = load atomic i32, i32* %memcpy_unfold.src_casted unordered, align 4			; CHECK-DAG: [[VAL1:%[^\s]+]] = load atomic i32, i32* %memcpy_unfold.src_casted unordered, align 4
	; CHECK-DAG: store atomic i32 [[VAL1]], i32* %memcpy_unfold.dst_casted unordered, align 8			; CHECK-DAG: store atomic i32 [[VAL1]], i32* %memcpy_unfold.dst_casted unordered, align 8

	; CHECK-DAG: [[VAL2:%[^\s]+]] = load atomic i32, i32* %{{[^\s]+}} unordered, align 4			; CHECK-DAG: [[VAL2:%[^\s]+]] = load atomic i32, i32* %{{[^\s]+}} unordered, align 4
	; CHECK-DAG: store atomic i32 [[VAL2]], i32* %{{[^\s]+}} unordered, align 4			; CHECK-DAG: store atomic i32 [[VAL2]], i32* %{{[^\s]+}} unordered, align 4

	; CHECK-DAG: [[VAL3:%[^\s]+]] = load atomic i32, i32* %{{[^\s]+}} unordered, align 4			; CHECK-DAG: [[VAL3:%[^\s]+]] = load atomic i32, i32* %{{[^\s]+}} unordered, align 4
	; CHECK-DAG: store atomic i32 [[VAL3]], i32* %{{[^\s]+}} unordered, align 4			; CHECK-DAG: store atomic i32 [[VAL3]], i32* %{{[^\s]+}} unordered, align 4

	; CHECK-DAG: [[VAL4:%[^\s]+]] = load atomic i32, i32* %{{[^\s]+}} unordered, align 4			; CHECK-DAG: [[VAL4:%[^\s]+]] = load atomic i32, i32* %{{[^\s]+}} unordered, align 4
	; CHECK-DAG: store atomic i32 [[VAL4]], i32* %{{[^\s]+}} unordered, align 4			; CHECK-DAG: store atomic i32 [[VAL4]], i32* %{{[^\s]+}} unordered, align 4
	entry:			entry:
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 8 %Src, i64 4, i32 4)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %Dst, i8* align 4 %Src, i32 16, i32 4)
	ret void			ret void
	}			}

	; Test that we don't unfold too much			; Test that we don't unfold too much
	define void @test2(i8* %Src, i8* %Dst) {			define void @test2(i8* %Src, i8* %Dst) {
	; CHECK-LABEL: test2			; CHECK-LABEL: test2

	; CHECK-NOT: load			; CHECK-NOT: load
	; CHECK-NOT: store			; CHECK-NOT: store
	; CHECK: llvm.memcpy.element.atomic			; CHECK: llvm.memcpy.element.unordered.atomic
	entry:			entry:
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 1000, i32 4)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %Dst, i8* align 4 %Src, i32 256, i32 4)
	ret void			ret void
	}			}

	; Test that we will not unfold into non native integers			; Test that we will not unfold into non native integers
	define void @test3(i8* %Src, i8* %Dst) {			define void @test3(i8* %Src, i8* %Dst) {
	; CHECK-LABEL: test3			; CHECK-LABEL: test3

	; CHECK-NOT: load			; CHECK-NOT: load
	; CHECK-NOT: store			; CHECK-NOT: store
	; CHECK: llvm.memcpy.element.atomic			; CHECK: llvm.memcpy.element.unordered.atomic
	entry:			entry:
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 4, i32 64)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 64, i32 64)
	ret void			ret void
	}			}

	; Test that we will eliminate redundant bitcasts			; Test that we will eliminate redundant bitcasts
	define void @test4(i64* %Src, i64* %Dst) {			define void @test4(i64* %Src, i64* %Dst) {
	; CHECK-LABEL: test4			; CHECK-LABEL: test4
	; CHECK-NOT: llvm.memcpy.element.atomic			; CHECK-NOT: llvm.memcpy.element.unordered.atomic

	; CHECK-NOT: bitcast			; CHECK-NOT: bitcast

	; CHECK-DAG: [[VAL1:%[^\s]+]] = load atomic i64, i64* %Src unordered, align 16			; CHECK-DAG: [[VAL1:%[^\s]+]] = load atomic i64, i64* %Src unordered, align 16
	; CHECK-DAG: store atomic i64 [[VAL1]], i64* %Dst unordered, align 16			; CHECK-DAG: store atomic i64 [[VAL1]], i64* %Dst unordered, align 16

	; CHECK-DAG: [[SRC_ADDR2:%[^ ]+]] = getelementptr i64, i64* %Src, i64 1			; CHECK-DAG: [[SRC_ADDR2:%[^ ]+]] = getelementptr i64, i64* %Src, i64 1
	; CHECK-DAG: [[DST_ADDR2:%[^ ]+]] = getelementptr i64, i64* %Dst, i64 1			; CHECK-DAG: [[DST_ADDR2:%[^ ]+]] = getelementptr i64, i64* %Dst, i64 1
	; CHECK-DAG: [[VAL2:%[^\s]+]] = load atomic i64, i64* [[SRC_ADDR2]] unordered, align 8			; CHECK-DAG: [[VAL2:%[^\s]+]] = load atomic i64, i64* [[SRC_ADDR2]] unordered, align 8
	; CHECK-DAG: store atomic i64 [[VAL2]], i64* [[DST_ADDR2]] unordered, align 8			; CHECK-DAG: store atomic i64 [[VAL2]], i64* [[DST_ADDR2]] unordered, align 8

	; CHECK-DAG: [[SRC_ADDR3:%[^ ]+]] = getelementptr i64, i64* %Src, i64 2			; CHECK-DAG: [[SRC_ADDR3:%[^ ]+]] = getelementptr i64, i64* %Src, i64 2
	; CHECK-DAG: [[DST_ADDR3:%[^ ]+]] = getelementptr i64, i64* %Dst, i64 2			; CHECK-DAG: [[DST_ADDR3:%[^ ]+]] = getelementptr i64, i64* %Dst, i64 2
	; CHECK-DAG: [[VAL3:%[^ ]+]] = load atomic i64, i64* [[SRC_ADDR3]] unordered, align 8			; CHECK-DAG: [[VAL3:%[^ ]+]] = load atomic i64, i64* [[SRC_ADDR3]] unordered, align 8
	; CHECK-DAG: store atomic i64 [[VAL3]], i64* [[DST_ADDR3]] unordered, align 8			; CHECK-DAG: store atomic i64 [[VAL3]], i64* [[DST_ADDR3]] unordered, align 8

	; CHECK-DAG: [[SRC_ADDR4:%[^ ]+]] = getelementptr i64, i64* %Src, i64 3			; CHECK-DAG: [[SRC_ADDR4:%[^ ]+]] = getelementptr i64, i64* %Src, i64 3
	; CHECK-DAG: [[DST_ADDR4:%[^ ]+]] = getelementptr i64, i64* %Dst, i64 3			; CHECK-DAG: [[DST_ADDR4:%[^ ]+]] = getelementptr i64, i64* %Dst, i64 3
	; CHECK-DAG: [[VAL4:%[^ ]+]] = load atomic i64, i64* [[SRC_ADDR4]] unordered, align 8			; CHECK-DAG: [[VAL4:%[^ ]+]] = load atomic i64, i64* [[SRC_ADDR4]] unordered, align 8
	; CHECK-DAG: store atomic i64 [[VAL4]], i64* [[DST_ADDR4]] unordered, align 8			; CHECK-DAG: store atomic i64 [[VAL4]], i64* [[DST_ADDR4]] unordered, align 8
	entry:			entry:
	%Src.casted = bitcast i64* %Src to i8*			%Src.casted = bitcast i64* %Src to i8*
	%Dst.casted = bitcast i64* %Dst to i8*			%Dst.casted = bitcast i64* %Dst to i8*
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %Dst.casted, i8* align 16 %Src.casted, i64 4, i32 8)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %Dst.casted, i8* align 16 %Src.casted, i32 32, i32 8)
	ret void			ret void
	}			}

				; Test that 0-length unordered atomic memcpy gets removed.
	define void @test5(i8* %Src, i8* %Dst) {			define void @test5(i8* %Src, i8* %Dst) {
	; CHECK-LABEL: test5			; CHECK-LABEL: test5

	; CHECK-NOT: llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 0, i32 64)			; CHECK-NOT: llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 0, i32 8)
	entry:			entry:
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 0, i32 64)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 0, i32 8)
	ret void			ret void
	}			}

	declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32)			declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind
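
The expectations in this test can be read as a three-way decision on the new byte-length argument. The sketch below restates them as standalone code. It is an interpretation of the CHECK lines, not the InstCombine implementation (the test itself notes that InstCombine is only updated in the next patch); `UnfoldAction`, `isNativeIntegerBits` and `classifyUnfold` are hypothetical names, the native widths are taken from the `n8:16:32:64` part of the datalayout string, and the handling of a length that is not a multiple of the element size is an assumption.

```cpp
#include <cstdint>

enum class UnfoldAction { Erase, Unfold, Keep };

// Integer widths listed as native in the test's datalayout (n8:16:32:64).
constexpr bool isNativeIntegerBits(unsigned Bits) {
  return Bits == 8 || Bits == 16 || Bits == 32 || Bits == 64;
}

// LengthInBytes is the third intrinsic argument after this change (bytes,
// not elements). MaxElements corresponds to the value passed via
// -unfold-element-atomic-memcpy-max-elements in the RUN line.
constexpr UnfoldAction classifyUnfold(std::uint64_t LengthInBytes,
                                      unsigned ElementSize,
                                      std::uint64_t MaxElements) {
  if (LengthInBytes == 0)
    return UnfoldAction::Erase; // test5: zero-length copy is removed
  if (ElementSize == 0 || LengthInBytes % ElementSize != 0)
    return UnfoldAction::Keep;  // assumption: leave malformed lengths alone
  if (!isNativeIntegerBits(ElementSize * 8))
    return UnfoldAction::Keep;  // test3: 64-byte elements are not native
  return LengthInBytes / ElementSize <= MaxElements
             ? UnfoldAction::Unfold // test1a, test4: 4 load/store pairs
             : UnfoldAction::Keep;  // test2: 64 elements exceed the limit
}
```

With a limit of 8 elements, the lengths used in the updated calls give: 16 bytes of 4-byte elements unfolds, 256 bytes of 4-byte elements is kept, 64 bytes of 64-byte elements is kept, and 32 bytes of 8-byte elements unfolds.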

llvm/trunk/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll

	; RUN: opt -basicaa -loop-idiom < %s -S \| FileCheck %s			; RUN: opt -basicaa -loop-idiom < %s -S \| FileCheck %s
	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	;; memcpy.atomic formation (atomic load & store)			;; memcpy.atomic formation (atomic load & store)
	define void @test1(i64 %Size) nounwind ssp {			define void @test1(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)			; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
	; CHECK-NOT: store			; CHECK-NOT: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i8, i32 10000			%Base = alloca i8, i32 10000
	%Dest = alloca i8, i32 10000			%Dest = alloca i8, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]			%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
	%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar			%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar
	%DestI = getelementptr i8, i8* %Dest, i64 %indvar			%DestI = getelementptr i8, i8* %Dest, i64 %indvar
	%V = load atomic i8, i8* %I.0.014 unordered, align 1			%V = load atomic i8, i8* %I.0.014 unordered, align 1
	store atomic i8 %V, i8* %DestI unordered, align 1			store atomic i8 %V, i8* %DestI unordered, align 1
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	%exitcond = icmp eq i64 %indvar.next, %Size			%exitcond = icmp eq i64 %indvar.next, %Size
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	;; memcpy.atomic formation (atomic store, normal load)			;; memcpy.atomic formation (atomic store, normal load)
	define void @test2(i64 %Size) nounwind ssp {			define void @test2(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test2(			; CHECK-LABEL: @test2(
	; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)			; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
	; CHECK-NOT: store			; CHECK-NOT: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i8, i32 10000			%Base = alloca i8, i32 10000
	%Dest = alloca i8, i32 10000			%Dest = alloca i8, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]			%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
	%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar			%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar
	%DestI = getelementptr i8, i8* %Dest, i64 %indvar			%DestI = getelementptr i8, i8* %Dest, i64 %indvar
	%V = load i8, i8* %I.0.014, align 1			%V = load i8, i8* %I.0.014, align 1
	store atomic i8 %V, i8* %DestI unordered, align 1			store atomic i8 %V, i8* %DestI unordered, align 1
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	%exitcond = icmp eq i64 %indvar.next, %Size			%exitcond = icmp eq i64 %indvar.next, %Size
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	;; memcpy.atomic formation rejection (atomic store, normal load w/ no align)			;; memcpy.atomic formation rejection (atomic store, normal load w/ no align)
	define void @test2b(i64 %Size) nounwind ssp {			define void @test2b(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test2b(			; CHECK-LABEL: @test2b(
	; CHECK-NOT: call void @llvm.memcpy.element.atomic			; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
	; CHECK: store			; CHECK: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i8, i32 10000			%Base = alloca i8, i32 10000
	%Dest = alloca i8, i32 10000			%Dest = alloca i8, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]			%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
	%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar			%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar
	%DestI = getelementptr i8, i8* %Dest, i64 %indvar			%DestI = getelementptr i8, i8* %Dest, i64 %indvar
	%V = load i8, i8* %I.0.014			%V = load i8, i8* %I.0.014
	store atomic i8 %V, i8* %DestI unordered, align 1			store atomic i8 %V, i8* %DestI unordered, align 1
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	%exitcond = icmp eq i64 %indvar.next, %Size			%exitcond = icmp eq i64 %indvar.next, %Size
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	;; memcpy.atomic formation rejection (atomic store, normal load w/ bad align)			;; memcpy.atomic formation rejection (atomic store, normal load w/ bad align)
	define void @test2c(i64 %Size) nounwind ssp {			define void @test2c(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test2c(			; CHECK-LABEL: @test2c(
	; CHECK-NOT: call void @llvm.memcpy.element.atomic			; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
	; CHECK: store			; CHECK: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i32, i32 10000			%Base = alloca i32, i32 10000
	%Dest = alloca i32, i32 10000			%Dest = alloca i32, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]			%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
	%I.0.014 = getelementptr i32, i32* %Base, i64 %indvar			%I.0.014 = getelementptr i32, i32* %Base, i64 %indvar
	%DestI = getelementptr i32, i32* %Dest, i64 %indvar			%DestI = getelementptr i32, i32* %Dest, i64 %indvar
	%V = load i32, i32* %I.0.014, align 2			%V = load i32, i32* %I.0.014, align 2
	store atomic i32 %V, i32* %DestI unordered, align 4			store atomic i32 %V, i32* %DestI unordered, align 4
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	%exitcond = icmp eq i64 %indvar.next, %Size			%exitcond = icmp eq i64 %indvar.next, %Size
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	;; memcpy.atomic formation rejection (atomic store w/ bad align, normal load)			;; memcpy.atomic formation rejection (atomic store w/ bad align, normal load)
	define void @test2d(i64 %Size) nounwind ssp {			define void @test2d(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test2d(			; CHECK-LABEL: @test2d(
	; CHECK-NOT: call void @llvm.memcpy.element.atomic			; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
	; CHECK: store			; CHECK: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i32, i32 10000			%Base = alloca i32, i32 10000
	%Dest = alloca i32, i32 10000			%Dest = alloca i32, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	; (9 unchanged lines collapsed in the diff view)
	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}


	;; memcpy.atomic formation (normal store, atomic load)			;; memcpy.atomic formation (normal store, atomic load)
	define void @test3(i64 %Size) nounwind ssp {			define void @test3(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test3(			; CHECK-LABEL: @test3(
	; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)			; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
	; CHECK-NOT: store			; CHECK-NOT: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i8, i32 10000			%Base = alloca i8, i32 10000
	%Dest = alloca i8, i32 10000			%Dest = alloca i8, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]			%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
	%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar			%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar
	%DestI = getelementptr i8, i8* %Dest, i64 %indvar			%DestI = getelementptr i8, i8* %Dest, i64 %indvar
	%V = load atomic i8, i8* %I.0.014 unordered, align 1			%V = load atomic i8, i8* %I.0.014 unordered, align 1
	store i8 %V, i8* %DestI, align 1			store i8 %V, i8* %DestI, align 1
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	%exitcond = icmp eq i64 %indvar.next, %Size			%exitcond = icmp eq i64 %indvar.next, %Size
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	;; memcpy.atomic formation rejection (normal store w/ no align, atomic load)			;; memcpy.atomic formation rejection (normal store w/ no align, atomic load)
	define void @test3b(i64 %Size) nounwind ssp {			define void @test3b(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test3b(			; CHECK-LABEL: @test3b(
	; CHECK-NOT: call void @llvm.memcpy.element.atomic			; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
	; CHECK: store			; CHECK: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i8, i32 10000			%Base = alloca i8, i32 10000
	%Dest = alloca i8, i32 10000			%Dest = alloca i8, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]			%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
	%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar			%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar
	%DestI = getelementptr i8, i8* %Dest, i64 %indvar			%DestI = getelementptr i8, i8* %Dest, i64 %indvar
	%V = load atomic i8, i8* %I.0.014 unordered, align 1			%V = load atomic i8, i8* %I.0.014 unordered, align 1
	store i8 %V, i8* %DestI			store i8 %V, i8* %DestI
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	%exitcond = icmp eq i64 %indvar.next, %Size			%exitcond = icmp eq i64 %indvar.next, %Size
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	;; memcpy.atomic formation rejection (normal store, atomic load w/ bad align)			;; memcpy.atomic formation rejection (normal store, atomic load w/ bad align)
	define void @test3c(i64 %Size) nounwind ssp {			define void @test3c(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test3c(			; CHECK-LABEL: @test3c(
	; CHECK-NOT: call void @llvm.memcpy.element.atomic			; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
	; CHECK: store			; CHECK: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i32, i32 10000			%Base = alloca i32, i32 10000
	%Dest = alloca i32, i32 10000			%Dest = alloca i32, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]			%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
	%I.0.014 = getelementptr i32, i32* %Base, i64 %indvar			%I.0.014 = getelementptr i32, i32* %Base, i64 %indvar
	%DestI = getelementptr i32, i32* %Dest, i64 %indvar			%DestI = getelementptr i32, i32* %Dest, i64 %indvar
	%V = load atomic i32, i32* %I.0.014 unordered, align 2			%V = load atomic i32, i32* %I.0.014 unordered, align 2
	store i32 %V, i32* %DestI, align 4			store i32 %V, i32* %DestI, align 4
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	%exitcond = icmp eq i64 %indvar.next, %Size			%exitcond = icmp eq i64 %indvar.next, %Size
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	;; memcpy.atomic formation rejection (normal store w/ bad align, atomic load)			;; memcpy.atomic formation rejection (normal store w/ bad align, atomic load)
	define void @test3d(i64 %Size) nounwind ssp {			define void @test3d(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test3d(			; CHECK-LABEL: @test3d(
	; CHECK-NOT: call void @llvm.memcpy.element.atomic			; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
	; CHECK: store			; CHECK: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i32, i32 10000			%Base = alloca i32, i32 10000
	%Dest = alloca i32, i32 10000			%Dest = alloca i32, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	; (9 unchanged lines collapsed in the diff view)
	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}


	;; memcpy.atomic formation rejection (atomic load, ordered-atomic store)			;; memcpy.atomic formation rejection (atomic load, ordered-atomic store)
	define void @test4(i64 %Size) nounwind ssp {			define void @test4(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test4(			; CHECK-LABEL: @test4(
	; CHECK-NOT: call void @llvm.memcpy.element.atomic			; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
	; CHECK: store			; CHECK: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i8, i32 10000			%Base = alloca i8, i32 10000
	%Dest = alloca i8, i32 10000			%Dest = alloca i8, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]			%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
	%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar			%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar
	%DestI = getelementptr i8, i8* %Dest, i64 %indvar			%DestI = getelementptr i8, i8* %Dest, i64 %indvar
	%V = load atomic i8, i8* %I.0.014 unordered, align 1			%V = load atomic i8, i8* %I.0.014 unordered, align 1
	store atomic i8 %V, i8* %DestI monotonic, align 1			store atomic i8 %V, i8* %DestI monotonic, align 1
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	%exitcond = icmp eq i64 %indvar.next, %Size			%exitcond = icmp eq i64 %indvar.next, %Size
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	;; memcpy.atomic formation rejection (ordered-atomic load, unordered-atomic store)			;; memcpy.atomic formation rejection (ordered-atomic load, unordered-atomic store)
	define void @test5(i64 %Size) nounwind ssp {			define void @test5(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test5(			; CHECK-LABEL: @test5(
	; CHECK-NOT: call void @llvm.memcpy.element.atomic			; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
	; CHECK: store			; CHECK: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i8, i32 10000			%Base = alloca i8, i32 10000
	%Dest = alloca i8, i32 10000			%Dest = alloca i8, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]			%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
	%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar			%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar
	%DestI = getelementptr i8, i8* %Dest, i64 %indvar			%DestI = getelementptr i8, i8* %Dest, i64 %indvar
	%V = load atomic i8, i8* %I.0.014 monotonic, align 1			%V = load atomic i8, i8* %I.0.014 monotonic, align 1
	store atomic i8 %V, i8* %DestI unordered, align 1			store atomic i8 %V, i8* %DestI unordered, align 1
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	%exitcond = icmp eq i64 %indvar.next, %Size			%exitcond = icmp eq i64 %indvar.next, %Size
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	;; memcpy.atomic formation (atomic load & store) -- element size 2			;; memcpy.atomic formation (atomic load & store) -- element size 2
	define void @test6(i64 %Size) nounwind ssp {			define void @test6(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test6(			; CHECK-LABEL: @test6(
	; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %Dest{{[0-9]*}}, i8* align 2 %Base{{[0-9]*}}, i64 %Size, i32 2)			; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 1
				; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 2 %Dest{{[0-9]*}}, i8* align 2 %Base{{[0-9]*}}, i64 [[Sz]], i32 2)
	; CHECK-NOT: store			; CHECK-NOT: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i16, i32 10000			%Base = alloca i16, i32 10000
	%Dest = alloca i16, i32 10000			%Dest = alloca i16, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]			%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
	%I.0.014 = getelementptr i16, i16* %Base, i64 %indvar			%I.0.014 = getelementptr i16, i16* %Base, i64 %indvar
	%DestI = getelementptr i16, i16* %Dest, i64 %indvar			%DestI = getelementptr i16, i16* %Dest, i64 %indvar
	%V = load atomic i16, i16* %I.0.014 unordered, align 2			%V = load atomic i16, i16* %I.0.014 unordered, align 2
	store atomic i16 %V, i16* %DestI unordered, align 2			store atomic i16 %V, i16* %DestI unordered, align 2
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	%exitcond = icmp eq i64 %indvar.next, %Size			%exitcond = icmp eq i64 %indvar.next, %Size
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	;; memcpy.atomic formation (atomic load & store) -- element size 4			;; memcpy.atomic formation (atomic load & store) -- element size 4
	define void @test7(i64 %Size) nounwind ssp {			define void @test7(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test7(			; CHECK-LABEL: @test7(
	; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dest{{[0-9]*}}, i8* align 4 %Base{{[0-9]*}}, i64 %Size, i32 4)			; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 2
				; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 4 %Dest{{[0-9]*}}, i8* align 4 %Base{{[0-9]*}}, i64 [[Sz]], i32 4)
	; CHECK-NOT: store			; CHECK-NOT: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i32, i32 10000			%Base = alloca i32, i32 10000
	%Dest = alloca i32, i32 10000			%Dest = alloca i32, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]			%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
	%I.0.014 = getelementptr i32, i32* %Base, i64 %indvar			%I.0.014 = getelementptr i32, i32* %Base, i64 %indvar
	%DestI = getelementptr i32, i32* %Dest, i64 %indvar			%DestI = getelementptr i32, i32* %Dest, i64 %indvar
	%V = load atomic i32, i32* %I.0.014 unordered, align 4			%V = load atomic i32, i32* %I.0.014 unordered, align 4
	store atomic i32 %V, i32* %DestI unordered, align 4			store atomic i32 %V, i32* %DestI unordered, align 4
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	%exitcond = icmp eq i64 %indvar.next, %Size			%exitcond = icmp eq i64 %indvar.next, %Size
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	;; memcpy.atomic formation (atomic load & store) -- element size 8			;; memcpy.atomic formation (atomic load & store) -- element size 8
	define void @test8(i64 %Size) nounwind ssp {			define void @test8(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test8(			; CHECK-LABEL: @test8(
	; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 8 %Dest{{[0-9]*}}, i8* align 8 %Base{{[0-9]*}}, i64 %Size, i32 8)			; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 3
				; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 8 %Dest{{[0-9]*}}, i8* align 8 %Base{{[0-9]*}}, i64 [[Sz]], i32 8)
	; CHECK-NOT: store			; CHECK-NOT: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i64, i32 10000			%Base = alloca i64, i32 10000
	%Dest = alloca i64, i32 10000			%Dest = alloca i64, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]			%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
	%I.0.014 = getelementptr i64, i64* %Base, i64 %indvar			%I.0.014 = getelementptr i64, i64* %Base, i64 %indvar
	%DestI = getelementptr i64, i64* %Dest, i64 %indvar			%DestI = getelementptr i64, i64* %Dest, i64 %indvar
	%V = load atomic i64, i64* %I.0.014 unordered, align 8			%V = load atomic i64, i64* %I.0.014 unordered, align 8
	store atomic i64 %V, i64* %DestI unordered, align 8			store atomic i64 %V, i64* %DestI unordered, align 8
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	%exitcond = icmp eq i64 %indvar.next, %Size			%exitcond = icmp eq i64 %indvar.next, %Size
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	;; memcpy.atomic formation (atomic load & store) -- element size 16			;; memcpy.atomic formation (atomic load & store) -- element size 16
	define void @test9(i64 %Size) nounwind ssp {			define void @test9(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test9(			; CHECK-LABEL: @test9(
	; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %Dest{{[0-9]*}}, i8* align 16 %Base{{[0-9]*}}, i64 %Size, i32 16)			; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 4
				; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 16 %Dest{{[0-9]*}}, i8* align 16 %Base{{[0-9]*}}, i64 [[Sz]], i32 16)
	; CHECK-NOT: store			; CHECK-NOT: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i128, i32 10000			%Base = alloca i128, i32 10000
	%Dest = alloca i128, i32 10000			%Dest = alloca i128, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]			%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
	%I.0.014 = getelementptr i128, i128* %Base, i64 %indvar			%I.0.014 = getelementptr i128, i128* %Base, i64 %indvar
	%DestI = getelementptr i128, i128* %Dest, i64 %indvar			%DestI = getelementptr i128, i128* %Dest, i64 %indvar
	%V = load atomic i128, i128* %I.0.014 unordered, align 16			%V = load atomic i128, i128* %I.0.014 unordered, align 16
	store atomic i128 %V, i128* %DestI unordered, align 16			store atomic i128 %V, i128* %DestI unordered, align 16
	%indvar.next = add i64 %indvar, 1			%indvar.next = add i64 %indvar, 1
	%exitcond = icmp eq i64 %indvar.next, %Size			%exitcond = icmp eq i64 %indvar.next, %Size
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	;; memcpy.atomic formation rejection (atomic load & store) -- element size 32			;; memcpy.atomic formation rejection (atomic load & store) -- element size 32
	define void @test10(i64 %Size) nounwind ssp {			define void @test10(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test10(			; CHECK-LABEL: @test10(
	; CHECK-NOT: call void @llvm.memcpy.element.atomic			; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
	; CHECK: store			; CHECK: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i256, i32 10000			%Base = alloca i256, i32 10000
	%Dest = alloca i256, i32 10000			%Dest = alloca i256, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	(remaining 59 lines not shown)
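
The updated CHECK lines in test6 through test9 expect a `shl` of `%Size` by 1, 2, 3, and 4 ahead of the call, which suggests the length argument is now a byte count rather than an element count. A minimal C++ sketch of that arithmetic follows; the helper name `lengthInBytes` is hypothetical and is not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM API: converts a loop trip count
// (number of elements) into a byte length for a power-of-two element size.
// It mirrors the `shl i64 %Size, log2(ElementSize)` seen in the CHECK lines.
inline uint64_t lengthInBytes(uint64_t NumElements, uint32_t ElementSize) {
  assert(ElementSize != 0 && (ElementSize & (ElementSize - 1)) == 0 &&
         "element size must be a power of 2");
  unsigned Shift = 0;
  while ((1u << Shift) < ElementSize)
    ++Shift; // Shift ends up as log2(ElementSize)
  return NumElements << Shift;
}
```

For example, a loop copying `%Size` elements of type `i32` would pass `lengthInBytes(Size, 4)`, that is `Size << 2`, as the length operand.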

llvm/trunk/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll

	; RUN: opt -basicaa -loop-idiom < %s -S | FileCheck %s			; RUN: opt -basicaa -loop-idiom < %s -S | FileCheck %s
	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"

	;; memcpy.atomic formation (atomic load & store) -- element size 2			;; memcpy.atomic formation (atomic load & store) -- element size 2
	;; Will not create call due to a max element size of 0			;; Will not create call due to a max element size of 0
	define void @test1(i64 %Size) nounwind ssp {			define void @test1(i64 %Size) nounwind ssp {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NOT: call void @llvm.memcpy.element.atomic			; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
	; CHECK: store			; CHECK: store
	; CHECK: ret void			; CHECK: ret void
	bb.nph:			bb.nph:
	%Base = alloca i16, i32 10000			%Base = alloca i16, i32 10000
	%Dest = alloca i16, i32 10000			%Dest = alloca i16, i32 10000
	br label %for.body			br label %for.body

	for.body: ; preds = %bb.nph, %for.body			for.body: ; preds = %bb.nph, %for.body
	(remaining 12 lines not shown)

llvm/trunk/test/Verifier/element-wise-atomic-memory-intrinsics.ll

	; RUN: not opt -verify < %s 2>&1 | FileCheck %s			; RUN: not opt -verify < %s 2>&1 | FileCheck %s

	define void @test_memcpy(i8* %P, i8* %Q) {			define void @test_memcpy(i8* %P, i8* %Q, i32 %A, i32 %E) {
				; CHECK: element size of the element-wise unordered atomic memory intrinsic must be a constant int
				call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 %E)
	; CHECK: element size of the element-wise atomic memory intrinsic must be a power of 2			; CHECK: element size of the element-wise atomic memory intrinsic must be a power of 2
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 2 %Q, i64 4, i32 3)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 3)

				; CHECK: constant length must be a multiple of the element size in the element-wise atomic memory intrinsic
				call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 7, i32 4)

				; CHECK: incorrect alignment of the destination argument
				call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* %P, i8* align 4 %Q, i32 1, i32 1)
	; CHECK: incorrect alignment of the destination argument			; CHECK: incorrect alignment of the destination argument
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 4 %Q, i64 4, i32 4)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 1 %P, i8* align 4 %Q, i32 4, i32 4)

	; CHECK: incorrect alignment of the source argument			; CHECK: incorrect alignment of the source argument
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 2 %Q, i64 4, i32 4)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* %Q, i32 1, i32 1)
				; CHECK: incorrect alignment of the source argument
				call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 1 %Q, i32 4, i32 4)

	ret void			ret void
	}			}
	declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind			declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind

	; CHECK: input module is broken!			; CHECK: input module is broken!
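
The verifier test above exercises several rejection cases. The following C++ sketch restates them as a standalone function. It is an illustration inferred from the test cases only, not the code in Verifier.cpp: the function name `checkElementAtomicMemcpy`, the shortened messages, and the rule "alignment must be a power of two and at least the element size" are assumptions. An alignment of 0 models a pointer argument with no `align` attribute. The "element size must be a constant int" case is not modelled, since the sketch only takes constants.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

static bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Hypothetical checker: returns an empty string when the arguments pass,
// otherwise a short description of the first failing rule.
std::string checkElementAtomicMemcpy(uint64_t DstAlign, uint64_t SrcAlign,
                                     uint64_t Length, uint64_t ElementSize) {
  if (!isPowerOf2(ElementSize))
    return "element size must be a power of 2";
  if (Length % ElementSize != 0)
    return "constant length must be a multiple of the element size";
  // Assumed rule: alignment is a power of two and covers a whole element.
  if (!isPowerOf2(DstAlign) || DstAlign < ElementSize)
    return "incorrect alignment of the destination argument";
  if (!isPowerOf2(SrcAlign) || SrcAlign < ElementSize)
    return "incorrect alignment of the source argument";
  return "";
}
```

Each rejected call in the test maps to one argument tuple; for instance, the call with length 7 and element size 4 corresponds to `checkElementAtomicMemcpy(4, 4, 7, 4)`.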

Revision Contents

Diff 102827