This is an archive of the discontinued LLVM Phabricator instance.


Differential D33240

[Atomics] Rename and change prototype for atomic memcpy intrinsic
ClosedPublic

Authored by dneilson on May 16 2017, 7:34 AM.


Details

Reviewers

reames
sanjoy
efriedma

Commits

rG3faabbbe85d5: [Atomics] Rename and change prototype for atomic memcpy intrinsic
rL305558: [Atomics] Rename and change prototype for atomic memcpy intrinsic

Summary

Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

This change alters the prototype of the atomic memcpy intrinsic. The prototype is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy that is unordered atomic.

Diff Detail

Event Timeline

dneilson created this revision. May 16 2017, 7:34 AM

Note that this is the first of a series of patches that are being developed for the unordered atomic memcpy. Minimally, the plan is to push the following changes one at a time to minimize risk and impact on others:
i. Change intrinsic name, prototype (to match memcpy closely), & documentation.
ii. Add code to loop idiom to recognize the element unordered atomic memcpy.
iii. Add code to instcombine & selection dag builder to lower the intrinsic.
iv. Add an isunordered() to the MemIntrinsic introspection class (returning false for all existing intrinsics), and add calls to it in all passes where it's relevant.
v. Add intrinsic into the introspection hierarchy & complete support for new intrinsic in passes.

dneilson added a reviewer: efriedma. May 16 2017, 9:33 AM

Reviewing only the LangRef changes for the moment. Let's iterate on those until we're happy and then I can go looking for code issues.

docs/LangRef.rst
13582	Your revised text is missing key aspects of the old text. You need to preserve the "as a sequence of unordered atomic memory accesses where each access is a multiple of ``element_size`` bytes wide and aligned at an element size boundary." wording from the original, because this is semantically important.
13592	Er, huh? What do these major values mean? And why do we need anything other than an i1 boolean?
13595	This sentence is important and shouldn't be dropped.
13607	Ah, the answer to my question above. I think it would be cleaner to have two i1 params instead of encoding the bitmask. Does that complicate anything for you?
13614	This is slightly wrong. You don't need the writes to be unordered atomic if the src/dest doesn't need it, but you do still want to allow concurrent reads and writes. I think you want something along the lines of: "It is well defined to have concurrent reads and writes to both source and destination provided those reads and writes are unordered atomic when specified."

This revision now requires changes to proceed. May 16 2017, 11:18 AM

dneilson added a child revision: D33243: [Atomics][LoopIdiom] Recognize unordered atomic memcpy. May 16 2017, 11:20 AM

Do we get any practical benefit from separately specifying whether the source and destination require unordered operations?

Why are you adding an alignment parameter? The alignment is already specified with attributes. (There was a plan at one point to change memcpy to specify alignment like this; IIRC it got committed, then reverted? I don't recall what happened after that.)

Why do we want to specify the length in bytes, as opposed to the number of elements to copy? Any implementation is inevitably just going to divide a length in bytes by the element size.

I'm not really sure why you're messing with the signature of the intrinsic in the first place; we went through most of this design space when it was initially proposed.

In D33240#756419, @efriedma wrote:

Do we get any practical benefit from separately specifying whether the source and destination require unordered operations?

I don't know enough about the possible source languages to know with 100% certainty that it's not possible to mix, say, unordered loads with ordered stores. For example, Java only requires the unordered ops for shared data (i.e. stuff on the heap). It's conceivable that a memcpy is desired to copy, say, from the stack to the heap; only the heap stores would need to be unordered in this case -- playing devil's advocate, it would not be wrong (just unnecessary) to use unordered loads of the stack data here as well. Erring on the side of flexible/generic here.

Why are you adding an alignment parameter? The alignment is already specified with attributes. (There was a plan at one point to change memcpy to specify alignment like this; IIRC it got committed, then reverted? I don't recall what happened after that.)

Compatibility with the existing memcpy intrinsic. I have no problem removing the alignment parameter if that's the long-term goal/vision. My only concern is that I don't know whether that difference will lead me into trouble when I get to the point of adding the unordered atomic intrinsic to the MemTransferInst introspection class hierarchy.

Why do we want to specify the length in bytes, as opposed to the number of elements to copy? Any implementation is inevitably just going to divide a length in bytes by the element size.

Compatibility with memcpy semantics. The 'length' parameter of a MemTransferInst intrinsic is semantically understood by transforms/analysis to be the size of the transfer in bytes. Having an intrinsic in that hierarchy that specifies a non-bytes value for that parameter strikes me as a recipe for bugs.
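The byte-length convention discussed here can be illustrated with a small sketch. It assumes a hypothetical helper, elementCount, that is not part of the patch; it only shows the division an implementation would perform and the case where the length is not a whole number of elements.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the patch: converts a transfer length in
// bytes into an element count. Returns false (leaving Count untouched) when
// the length is not a whole number of elements.
bool elementCount(uint64_t LengthInBytes, uint32_t ElementSizeInBytes,
                  uint64_t &Count) {
  assert(ElementSizeInBytes != 0 && "element size must be positive");
  if (LengthInBytes % ElementSizeInBytes != 0)
    return false;
  Count = LengthInBytes / ElementSizeInBytes;
  return true;
}
```

A caller holding a byte length of 16 and an element size of 4 would get a count of 4; a byte length of 10 with the same element size would be rejected.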

I'm not really sure why you're messing with the signature of the intrinsic in the first place; we went through most of this design space when it was initially proposed.

It's being revisited with a long-term eye towards merging the unordered atomic semantics into the existing memcpy intrinsic. By making the proposed/new intrinsic's definition much closer to that of the existing llvm.memcpy it should be much easier to do that eventual merging of the two without having to revisit the semantic understanding of the unordered intrinsic in every analysis/transform.

I'll make the suggested changes to the LangRef.

docs/LangRef.rst
13607	Easy enough to do if it's desired. Only reason that I didn't do that originally is that I'm already adding two parameters on top of memcpy for unordered memcpy; seemed like a good way to prevent that from becoming three additional parameters.

Addressed suggested changes to the LangRef doc for the intrinsic.
Split out the is unordered parameter into two separate parameters -- dest_unordered & src_unordered.

dneilson marked 5 inline comments as done. May 16 2017, 2:39 PM

I'm going to let Daniel and Eli debate the general direction before reviewing further. I want to make sure Eli is on board with the general direction before we invest lots of effort in cleaning up the code.

skatkov added a subscriber: skatkov. May 17 2017, 3:05 AM

skatkov added inline comments.

include/llvm/IR/IntrinsicInst.h
199	why not enum?
200	ARG_SOURCE? For consistency with other names.
205	ARG_SOURCE_UNORDERED? For consistency with other names.
245	bool isDestUnordered?
250	bool isSrcUnordered?
297	Don't you want to check (or assert) the constraints for the alignment here?
303	setSourceUnordered? For consistency with other names.
305	Don't you want to check (or assert) the constraints for the element size here?
include/llvm/IR/Intrinsics.td
810	you have two unordered arguments.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4871	You introduced the special getters, can you use them here?
lib/IR/Verifier.cpp
3997	is zero alignment allowed? At least it is strange that the 0 is a power of 2 :) if it is allowed, please update text "or zero"
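The remark about zero passing a power-of-two test matches how the common bit-trick check behaves. The sketch below uses two hypothetical helpers to show the difference; whether the verifier code at that line uses exactly this form is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Classic bit trick: true for every power of two, but also true for zero,
// because 0 & (0 - 1) is 0.
constexpr bool isPowerOfTwoOrZero(uint64_t X) { return (X & (X - 1)) == 0; }

// Stricter variant that rejects zero explicitly.
constexpr bool isPowerOfTwo(uint64_t X) {
  return X != 0 && (X & (X - 1)) == 0;
}
```

If zero alignment is meant to stay legal, the first form is the one to keep, and the diagnostic text would then need the "or zero" wording suggested in the comment.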

I don't know enough about the possible source languages to know with 100% certainty that it's not possible to mix, say, unordered loads with ordered stores.

It's almost certainly possible to mix them... even if the source language doesn't allow mixing them, we could transform unordered operations to non-atomic operations if we can prove the memory isn't accessed from another thread. (I don't think we actually do this transform at the moment, but LICM has a similar sort of check.)

The question is whether there's actually any optimization that would actually check the unordered bit. Maybe there is? (See my comments on __llvm_memcpy_element_unordered_atomic_*.)

Compatibility with the existing memcpy intrinsic. I have no problem removing the alignment parameter if that's the long-term goal/vision. My only concern is that I don't know whether that difference will lead me into trouble when I get to the point of adding the unordered atomic intrinsic to the MemTransferInst introspection class hierarchy.

Yes, this is the direction we want to go in. You should be able to hide the difference in the implementation of MemIntrinsic, I think. If it does cause problems, we can revisit.

Compatibility with memcpy semantics. The 'length' parameter of a MemTransferInst intrinsic is semantically understood by transforms/analysis to be the size of the transfer in bytes. Having an intrinsic in that hierarchy that specifies a non-bytes value for that parameter strikes me as a recipe for bugs.

Okay. It's a little ugly, but I guess it isn't that terrible.

docs/LangRef.rst
13592	"must" is kind of confusing in this context. Probably need to explicitly say "if len is not a multiple of element_size, the behavior is undefined", or something like that, since we can't actually tell until runtime.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4891	I'm not sure changing the signature of __llvm_memcpy_element_unordered_atomic_* like this makes sense. I guess changing from the number of elements to the length in bytes is fine. I'm not sure why you want to pass the alignment to the function; the implementation can easily compute the alignment itself based on the pointers passed in. Not sure what the implementation is going to do with the DestUnordered and SrcUnordered parameters. Maybe if the source/dest is non-atomic, it could use unaligned load/store operations? If we do need DestUnordered and SrcUnordered, it probably makes sense to merge them to save an instruction in the caller.

dneilson marked 8 inline comments as done. May 17 2017, 1:24 PM

dneilson added inline comments.

docs/LangRef.rst
13582	I'm not seeing the difference here. "Sequence" doesn't imply any sort of ordering. So, to me the old and new are semantically equivalent -- there is a copy happening, and it's being done with unordered atomic load/stores.
13592	Fair. I'll make that change.
include/llvm/IR/IntrinsicInst.h
199	No particularly good reason. Just playing around.
200	Fair.
297	I figured that's handled by the verifier.
305	Handled by verifier.
include/llvm/IR/Intrinsics.td
810	Good catch! I updated the prototype, but neglected the comment.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4891	Fair. I have no strong preference with the library function taking size in bytes vs. number of elements. The only benefit that I can see to passing size in bytes instead of number of elements here is the possibility for runtime checks being implemented in a debug version of the library -- one that verifies that the length is a multiple of element size. Good point about alignment. Not the smartest move on my part. I think that you're right about passing dest_unordered & src_unordered to the library -- it might be unnecessary. The lib function could just be implemented assuming that both source & dest require unordered atomic ops; there shouldn't be any harm in it, since unordered just means that we can't break up an element into partial loads/stores, and we wouldn't want to do that in a high performance library anyways. I'll change the lib prototype to match memcpy exactly. _llvm_memcpy_unordered_atomic(i8* noalias dest, i8* noalias src, uint64 length)
lib/IR/Verifier.cpp
3997	I would think not, but this is the same check as exists for memcpy. So, for compatibility I think it should be allowed unless we can definitively say that it's not allowed for memcpy.
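The idea that a library routine can simply treat both buffers as needing element-wise atomic accesses can be sketched as follows. The function name and signature are hypothetical, and the GCC/Clang __atomic builtins with relaxed ordering are used as a stand-in, since C++ has no direct equivalent of LLVM's weaker unordered ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a runtime routine; name and signature are
// illustrative only. Copies LengthInBytes bytes as whole elements of type T.
// Relaxed ordering is an approximation: it is at least as strong as LLVM's
// "unordered" ordering.
template <typename T>
void copyElementsRelaxed(void *Dest, const void *Src, uint64_t LengthInBytes) {
  assert(LengthInBytes % sizeof(T) == 0 &&
         "length must be a multiple of the element size");
  T *D = static_cast<T *>(Dest);
  const T *S = static_cast<const T *>(Src);
  for (uint64_t I = 0, N = LengthInBytes / sizeof(T); I != N; ++I) {
    // One atomic load and one atomic store per element, so no element is
    // ever split into partial accesses.
    T V = __atomic_load_n(&S[I], __ATOMIC_RELAXED);
    __atomic_store_n(&D[I], V, __ATOMIC_RELAXED);
  }
}
```

Instantiating the template once per supported element size would give one routine per size, which is one way to read the per-size libcall names that appear elsewhere in this revision.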

Addressing suggestions.

dneilson added a subscriber: llvm-commits. May 18 2017, 1:00 PM

anna added a subscriber: anna. May 18 2017, 2:39 PM

anna added inline comments.

docs/LangRef.rst
13594–13595	Nit: if and only if stores to the destination buffer are

skatkov added inline comments. May 18 2017, 9:08 PM

include/llvm/IR/IntrinsicInst.h
297	ok, but to me one of the primary goals of a setter is to check incoming args.
lib/IR/Verifier.cpp
3991	Use getters?

Addressing some comments -- use of getters, adding assertions to setters, and some minor wording changes to LangRef.
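A setter that asserts on its argument, as requested in the review, might look roughly like the sketch below. The class is a hypothetical, self-contained stand-in and does not derive from IntrinsicInst; the constraint shown is the positive power-of-two requirement on the element size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in class; it only illustrates a setter that validates
// its argument and is not the class from the patch.
class ElementSizeHolder {
  uint32_t ElementSizeInBytes = 1;

public:
  uint32_t getElementSizeInBytes() const { return ElementSizeInBytes; }

  void setElementSizeInBytes(uint32_t Size) {
    // Reject zero and any value that is not a power of two.
    assert(Size != 0 && (Size & (Size - 1)) == 0 &&
           "element size must be a positive power of two");
    ElementSizeInBytes = Size;
  }
};
```

The assertion catches misuse at the call site in debug builds, while the verifier remains the place where malformed IR is rejected.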

skatkov added inline comments. May 21 2017, 8:07 PM

lib/IR/Verifier.cpp
3992	Extra semicolon

dneilson mentioned this in D33243: [Atomics][LoopIdiom] Recognize unordered atomic memcpy. May 25 2017, 9:39 AM

dneilson removed a child revision: D33243: [Atomics][LoopIdiom] Recognize unordered atomic memcpy.

Another iteration on the intrinsic prototype. I've removed the align, and dest/src_unordered arguments.
- Having align both as arg attributes and as an arg could cause challenges if we need to resolve a difference, and specifying alignment through arg attributes is the desired future direction for intrinsics.
- Upon further thought, and digging into where passes would have to be made aware of this intrinsic -- I'm no longer convinced about the value of the separate dest_unordered/src_unordered args.
  - It seems sufficient to have the semantics of the intrinsic being that all loads/stores are unordered atomic; we can still "promote" idioms that mix, say, unordered loads with simple stores.
  - Any library implementation will just use unordered atomic loads & stores throughout, anyways.
  - There will be a side-effect of promoting simple ops to unordered-atomic if we recognize a loop idiom, and then later lower it into loads/stores. The tradeoff is that it should be easier to work with the intrinsic in passes.
  - The only value that I can see in having the separate dest_unordered/src_unordered args is that in lowering passes that change the intrinsic into explicit loads/stores we wouldn't "promote" a simple op into an unordered-atomic op.

Added in updating the InstCombine lowering of the intrinsic so that the change doesn't lose functionality; even temporarily.

efriedma added inline comments. May 26 2017, 11:41 AM

docs/LangRef.rst
13573	Hmm... I didn't mention isvolatile earlier? See https://reviews.llvm.org/D27133?id=79305#629884 for original discussion of it. At the very least, we need a better description of what it means.

skatkov added inline comments. May 28 2017, 10:49 PM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4891	MI.getLength() == Length
lib/Transforms/InstCombine/InstCombineCalls.cpp
110	assert LengthInBytes % ElementSizeInBytes == 0 and LengthInBytes > 0?

Did a scan through the code, didn't spot anything major. Once we settle the last few design/specification questions, this looks basically ready to go in.

docs/LangRef.rst
13573	I think we can just remove this. The original motivation was essentially future proofing, and I don't think it's worth keeping the complexity for now. We can change our minds later if it turns out we actually need this.
lib/Transforms/InstCombine/InstCombineCalls.cpp
110	Agreed. Also, length might actually be zero. We should remove such calls.
1900	Hm, I might sink this into the helper function. Optional, and can be submitted separately without further review.

dneilson added inline comments. May 31 2017, 12:29 PM

docs/LangRef.rst
13573	Agreed. I'll remove it.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4891	Not quite. Doesn't seem to be a straightforward way to go from an SDValue to a Type*, so I don't think this sort of replacement can be made.
lib/Transforms/InstCombine/InstCombineCalls.cpp
110	Re: The assert. I think that it would be better to check this in the verifier. In the LangRef we've said that it's undefined behaviour if length isn't a multiple of element size, so I think it's okay to blindly do this divide here and add a check to the verifier. Re: Zero length; see the in-line comment below.
1900	This is following the same pattern/code-flow as the normal memcpy/memmove/memset handlers just above this. i.e. Check for a null length -- if there is one, then remove the call, else call the simplify method for the intrinsic. I'm inclined to stick to this pattern to make the later merging of the introspection classes cleaner.

Remove volatile arg from intrinsic.
Add check to verifier to ensure that constant length is a multiple of element size & add corresponding test.
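The verifier check described here applies only when the length is a compile-time constant. A minimal sketch of that logic, using a hypothetical function and made-up diagnostic strings rather than the actual Verifier.cpp code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical verifier-style check. Returns nullptr when nothing is wrong,
// otherwise a diagnostic string (the wording is invented for this sketch).
const char *checkElementAtomicMemCpy(uint64_t ElementSize,
                                     bool LengthIsConstant, uint64_t Length) {
  if (ElementSize == 0 || (ElementSize & (ElementSize - 1)) != 0)
    return "element size must be a positive power of two";
  // A non-constant length cannot be checked statically, so it is accepted
  // here; only a known constant length is tested against the element size.
  if (LengthIsConstant && Length % ElementSize != 0)
    return "constant length must be a multiple of the element size";
  return nullptr;
}
```

A call with a non-constant length passes this check even if the run-time value turns out to be a bad multiple, which is why the LangRef wording about undefined behaviour is still needed.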

LGTM w/one comment addressed before submission.

lib/Transforms/InstCombine/InstCombineCalls.cpp
116	Where did this check come from and why is it needed? It looks like an attempt to handle a length which isn't an even interval of element size, but the verifier should reject that?

This revision is now accepted and ready to land. Jun 5 2017, 7:07 PM

dneilson added inline comments. Jun 6 2017, 6:29 AM

lib/Transforms/InstCombine/InstCombineCalls.cpp
116	Just me being extra cautious. The verifier checks for the case where constant length is not a multiple of element size, and the zero length case is handled elsewhere. However, I'm not sure that the verifier runs after every single pass. So, I figure there's no harm in handling the corner case.

Loop idiom patch was dropped, so update loop idiom recognition as well.

Herald added a subscriber: mzolotukhin. Jun 6 2017, 1:03 PM

dneilson added inline comments. Jun 7 2017, 7:57 AM

include/llvm/IR/IntrinsicInst.h
197	I'm inclined to change this name to 'EUAMemcpyInst' to cut down on the length of its name. Any objections?

rebase

Closed by commit rL305558: [Atomics] Rename and change prototype for atomic memcpy intrinsic (authored by dneilson). Jun 16 2017, 7:44 AM

This revision was automatically updated to reflect the committed changes.

jfb mentioned this in D79279: Add overloaded versions of builtin mem* functions. Aug 4 2020, 5:51 PM

Revision Contents

Path	Size
docs/LangRef.rst	72 lines
include/llvm/CodeGen/RuntimeLibcalls.h	16 lines
include/llvm/IR/IntrinsicInst.h	119 lines
include/llvm/IR/Intrinsics.td	17 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	26 lines
lib/CodeGen/TargetLoweringBase.cpp	23 lines
lib/IR/Verifier.cpp	46 lines
lib/Transforms/InstCombine/InstCombineCalls.cpp	6 lines
lib/Transforms/InstCombine/InstCombineInternal.h	2 lines
test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll	73 lines
test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll	2 lines
test/Verifier/element-wise-atomic-memory-intrinsics.ll	33 lines

Diff 99202

docs/LangRef.rst

	are described in :doc:`StackMaps`.			are described in :doc:`StackMaps`.

	Element Wise Atomic Memory Intrinsics			Element Wise Atomic Memory Intrinsics
	-------------------------------------			-------------------------------------

	These intrinsics are similar to the standard library memory intrinsics except			These intrinsics are similar to the standard library memory intrinsics except
	that they perform memory transfer as a sequence of atomic memory accesses.			that they perform memory transfer as a sequence of atomic memory accesses.

	.. _int_memcpy_element_atomic:			.. _int_memcpy_element_unordered_atomic:

	'``llvm.memcpy.element.atomic``' Intrinsic			'``llvm.memcpy.element.unordered.atomic``' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Syntax:			Syntax:
	"""""""			"""""""

	This is an overloaded intrinsic. You can use ``llvm.memcpy.element.atomic`` on			This is an overloaded intrinsic. You can use ``llvm.memcpy.element.unordered.atomic`` on
	any integer bit width and for different address spaces. Not all targets			any integer bit width and for different address spaces. Not all targets
	support all bit widths however.			support all bit widths however.

	::			::

	declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* <dest>, i8* <src>,			declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* <dest>, i8* <src>, i32 <len>,
	i64 <num_elements>, i32 <element_size>)			i32 <align>, i1 <isvolatile>,
				i1 <dest_unordered>, i1 <src_unordered>,
				i8 <element_size>)
				efriedma: Hmm... I didn't mention isvolatile earlier? See https://reviews.llvm.org/D27133?id=79305#629884 for original discussion of it. At the very least, we need a better description of what it means.
				reames: I think we can just remove this. The original motivation was essentially future proofing, and I don't think it's worth keeping the complexity for now. We can change our minds later if it turns out we actually need this.
				dneilson: Agreed. I'll remove it.
				declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* <dest>, i8* <src>, i64 <len>,
				i32 <align>, i1 <isvolatile>,
				i1 <dest_unordered>, i1 <src_unordered>,
				i8 <element_size>)

	Overview:			Overview:
	"""""""""			"""""""""

	The '``llvm.memcpy.element.atomic.*``' intrinsic performs copy of a block of			The '``llvm.memcpy.element.unordered.atomic.``' intrinsic is a specialization of the '``llvm.memcpy.``'
				reames: Your revised text is missing key aspects of the old text. You need to preserve the "as a sequence of unordered atomic memory accesses where each access is a multiple of ``element_size`` bytes wide and aligned at an element size boundary." wording from the original, because this is semantically important.
				dneilson: I'm not seeing the difference here. "Sequence" doesn't imply any sort of ordering. So, to me the old and new are semantically equivalent -- there is a copy happening, and it's being done with unordered atomic load/stores.
	memory from the source location to the destination location as a sequence of			intrinsic. It differs in that the ``dest`` and ``src`` are treated as arrays with elements that are
	unordered atomic memory accesses where each access is a multiple of			exactly ``element_size`` bytes, and the copy between buffers is done in a way that uses
	``element_size`` bytes wide and aligned at an element size boundary. For example			:ref:`unordered atomic <ordering>` load/store operations that are a positive integer multiple
	each element is accessed atomically in source and destination buffers.			of the ``element_size`` in size.

	Arguments:			Arguments:
	""""""""""			""""""""""

	The first argument is a pointer to the destination, the second is a			The first five arguments are the same as they are in the :ref:`@llvm.memcpy <int_memcpy>` intrinsic,
	pointer to the source. The third argument is an integer argument			with the added constraint that ``len`` must be a positive integer multiple of the ``element_size``.
				reames: Er, huh? What do these major values mean? And why do we need anything other than an i1 boolean?
				efriedma: "must" is kind of confusing in this context. Probably need to explicitly say "if len is not a multiple of element_size, the behavior is undefined", or something like that, since we can't actually tell until runtime.
				dneilson: Fair. I'll make that change.
	specifying the number of elements to copy, the fourth argument is size of
	the single element in bytes.

	``element_size`` should be a power of two, greater than zero and less than			``dest_unordered`` is ``true`` if and only if stores to the destination buffer must be unordered
	a target-specific atomic access size limit.			atomic stores.
				anna: Nit: if and only if stores to the destination buffer are

	For each of the input pointers ``align`` parameter attribute must be specified.			``src_unordered`` is ``true`` if and only if loads from the source buffer must be unordered atomic
	It must be a power of two and greater than or equal to the ``element_size``.			loads.
	Caller guarantees that both the source and destination pointers are aligned to
	reames: This sentence is important and shouldn't be dropped.
	that boundary.			``element_size`` must be a compile-time constant positive power of two no greater than target-specific
				atomic access size limit.

				For each of the input pointers ``align`` parameter attribute must be specified. It must be a power of
				two and greater than or equal to the ``element_size``. Caller guarantees that both the source and
				destination pointers are aligned to that boundary.

	Semantics:			Semantics:
				reames: Ah, the answer to my question above. I think it would be cleaner to have two i1 params instead of encoding the bitmask. Does that complicate anything for you?
				dneilson: Easy enough to do if it's desired. Only reason that I didn't do that originally is that I'm already adding two parameters on top of memcpy for unordered memcpy; seemed like a good way to prevent that from becoming three additional parameters.
	""""""""""			""""""""""

	The '``llvm.memcpy.element.atomic.*``' intrinsic copies			The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of memory from
	'``num_elements`` * ``element_size``' bytes of memory from the source location to			the source location to the destination location. These locations are not allowed to overlap.
	the destination location. These locations are not allowed to overlap. Memory copy			The memory copy is performed as a sequence of load/store operations where each access is
	is performed as a sequence of unordered atomic memory accesses where each access			guaranteed to be a multiple of ``element_size`` bytes wide and aligned at an ``element_size``
	is guaranteed to be a multiple of ``element_size`` bytes wide and aligned at an			boundary.
				reames: This is slightly wrong. You don't need the writes to be unordered atomic if the src/dest doesn't need it, but you do still want to allow concurrent reads and writes. I think you want something along the lines of: "It is well defined to have concurrent reads and writes to both source and destination provided those reads and writes are unordered atomic when specified."
	element size boundary.

	The order of the copy is unspecified. The same value may be read from the source			The order of the copy is unspecified. The same value may be read from the source
	buffer many times, but only one write is issued to the destination buffer per			buffer many times, but only one write is issued to the destination buffer per
	element. It is well defined to have concurrent reads and writes to both source			element. It is well defined to have concurrent reads and writes to both source and destination
	and destination provided those reads and writes are at least unordered atomic.			provided those reads and writes are unordered atomic when specified.

	This intrinsic does not provide any additional ordering guarantees over those			This intrinsic does not provide any additional ordering guarantees over those
	provided by a set of unordered loads from the source location and stores to the			provided by a set of unordered loads from the source location and stores to the
	destination.			destination.

	Lowering:			Lowering:
	"""""""""			"""""""""

	In the most general case call to the '``llvm.memcpy.element.atomic.*``' is lowered			In the most general case call to the '``llvm.memcpy.element.unordered.atomic.*``' is lowered
	to a call to the symbol ``__llvm_memcpy_element_atomic_``. Where '' is replaced			to a call to the symbol ``__llvm_memcpy_element_unordered_atomic_``. Where '' is replaced
	with an actual element size.			with an actual element size.

	Optimizer is allowed to inline memory copy when it's profitable to do so.			The optimizer is allowed to inline the memory copy when it's profitable to do so.

include/llvm/CodeGen/RuntimeLibcalls.h

[...]
enum Libcall {
O_F128,		O_F128,
O_PPCF128,		O_PPCF128,

// MEMORY		// MEMORY
MEMCPY,		MEMCPY,
MEMSET,		MEMSET,
MEMMOVE,		MEMMOVE,

// ELEMENT-WISE ATOMIC MEMORY		// ELEMENT-WISE UNORDERED-ATOMIC MEMORY of different element sizes
MEMCPY_ELEMENT_ATOMIC_1,		MEMCPY_ELEMENT_UNORDERED_ATOMIC_1,
MEMCPY_ELEMENT_ATOMIC_2,		MEMCPY_ELEMENT_UNORDERED_ATOMIC_2,
MEMCPY_ELEMENT_ATOMIC_4,		MEMCPY_ELEMENT_UNORDERED_ATOMIC_4,
MEMCPY_ELEMENT_ATOMIC_8,		MEMCPY_ELEMENT_UNORDERED_ATOMIC_8,
MEMCPY_ELEMENT_ATOMIC_16,		MEMCPY_ELEMENT_UNORDERED_ATOMIC_16,

// EXCEPTION HANDLING		// EXCEPTION HANDLING
UNWIND_RESUME,		UNWIND_RESUME,

// Note: there's two sets of atomics libcalls; see		// Note: there's two sets of atomics libcalls; see
// <http://llvm.org/docs/Atomics.html> for more info on the		// <http://llvm.org/docs/Atomics.html> for more info on the
// difference between them.		// difference between them.

[...]
namespace RTLIB {
/// getUINTTOFP - Return the UINTTOFP__ value for the given types, or		/// getUINTTOFP - Return the UINTTOFP__ value for the given types, or
/// UNKNOWN_LIBCALL if there is none.		/// UNKNOWN_LIBCALL if there is none.
Libcall getUINTTOFP(EVT OpVT, EVT RetVT);		Libcall getUINTTOFP(EVT OpVT, EVT RetVT);

/// Return the SYNC_FETCH_AND_* value for the given opcode and type, or		/// Return the SYNC_FETCH_AND_* value for the given opcode and type, or
/// UNKNOWN_LIBCALL if there is none.		/// UNKNOWN_LIBCALL if there is none.
Libcall getSYNC(unsigned Opc, MVT VT);		Libcall getSYNC(unsigned Opc, MVT VT);

/// getMEMCPY_ELEMENT_ATOMIC - Return MEMCPY_ELEMENT_ATOMIC_* value for the		/// getMEMCPY_ELEMENT_UNORDERED_ATOMIC - Return MEMCPY_ELEMENT_UNORDERED_ATOMIC_* value for the
/// given element size or UNKNOWN_LIBCALL if there is none.		/// given element size or UNKNOWN_LIBCALL if there is none.
Libcall getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize);		Libcall getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize);
}		}
}		}

#endif		#endif

include/llvm/IR/IntrinsicInst.h

Show First 20 Lines • Show All 186 Lines • ▼ Show 20 Lines	static inline bool classof(const IntrinsicInst *I) {
}		}
}		}
static inline bool classof(const Value *V) {		static inline bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}
};		};

/// This class represents atomic memcpy intrinsic		/// This class represents atomic memcpy intrinsic
/// TODO: Integrate this class into MemIntrinsic hierarchy.		/// TODO: Integrate this class into MemIntrinsic hierarchy; for now this is
class ElementAtomicMemCpyInst : public IntrinsicInst {		/// C&P of all methods from that hierarchy
		class ElementUnorderedAtomicMemCpyInst : public IntrinsicInst {
		dneilson (Author): I'm inclined to change this name to 'EUAMemcpyInst' to cut down on the length of its name. Any objections?
		private:
		constexpr static int ARG_DEST = 0;
		skatkov: Why not enum?
		dneilson (Author): No particularly good reason. Just playing around.
		constexpr static int ARG_SRC = 1;
		skatkov: ARG_SOURCE? For consistency with other names.
		dneilson (Author): Fair.
		constexpr static int ARG_LENGTH = 2;
		constexpr static int ARG_ALIGN = 3;
		constexpr static int ARG_VOLATILE = 4;
		constexpr static int ARG_DEST_UNORDERED = 5;
		constexpr static int ARG_SRC_UNORDERED = 6;
		skatkov: ARG_SOURCE_UNORDERED? For consistency with other names.
		constexpr static int ARG_ELEMENTSIZE = 7;

public:		public:
Value *getRawDest() const { return getArgOperand(0); }		Value *getRawDest() const {
Value *getRawSource() const { return getArgOperand(1); }		return const_cast<Value *>(getArgOperand(ARG_DEST));
		}
		const Use &getRawDestUse() const { return getArgOperandUse(ARG_DEST); }
		Use &getRawDestUse() { return getArgOperandUse(ARG_DEST); }

		/// Return the arguments to the instruction.
		Value *getRawSource() const {
		return const_cast<Value *>(getArgOperand(ARG_SRC));
		}
		const Use &getRawSourceUse() const { return getArgOperandUse(ARG_SRC); }
		Use &getRawSourceUse() { return getArgOperandUse(ARG_SRC); }

		Value *getLength() const {
		return const_cast<Value *>(getArgOperand(ARG_LENGTH));
		}
		const Use &getLengthUse() const { return getArgOperandUse(ARG_LENGTH); }
		Use &getLengthUse() { return getArgOperandUse(ARG_LENGTH); }

		ConstantInt *getAlignmentCst() const {
		return cast<ConstantInt>(const_cast<Value *>(getArgOperand(ARG_ALIGN)));
		}

Value *getNumElements() const { return getArgOperand(2); }		unsigned getAlignment() const { return getAlignmentCst()->getZExtValue(); }
void setNumElements(Value *V) { setArgOperand(2, V); }
		Type *getAlignmentType() const {
		return getArgOperand(ARG_ALIGN)->getType();
		}

uint64_t getSrcAlignment() const { return getParamAlignment(0); }		ConstantInt *getVolatileCst() const {
uint64_t getDstAlignment() const { return getParamAlignment(1); }		return cast<ConstantInt>(
		const_cast<Value *>(getArgOperand(ARG_VOLATILE)));
		}

uint64_t getElementSizeInBytes() const {		bool isVolatile() const { return !getVolatileCst()->isZero(); }
Value *Arg = getArgOperand(3);
		uint8_t getDestUnordered() const {
		skatkov: bool isDestUnordered?
		Value *Arg = getArgOperand(ARG_DEST_UNORDERED);
		return uint8_t(cast<ConstantInt>(Arg)->getZExtValue());
		}

		uint8_t getSrcUnordered() const {
		skatkov: bool isSrcUnordered?
		Value *Arg = getArgOperand(ARG_SRC_UNORDERED);
		return uint8_t(cast<ConstantInt>(Arg)->getZExtValue());
		}

		uint8_t getElementSizeInBytes() const {
		Value *Arg = getArgOperand(ARG_ELEMENTSIZE);
return cast<ConstantInt>(Arg)->getZExtValue();		return cast<ConstantInt>(Arg)->getZExtValue();
}		}

		/// This is just like getRawDest, but it strips off any cast
		/// instructions that feed it, giving the original input. The returned
		/// value is guaranteed to be a pointer.
		Value *getDest() const { return getRawDest()->stripPointerCasts(); }

		/// This is just like getRawSource, but it strips off any cast
		/// instructions that feed it, giving the original input. The returned
		/// value is guaranteed to be a pointer.
		Value *getSource() const { return getRawSource()->stripPointerCasts(); }

		unsigned getDestAddressSpace() const {
		return cast<PointerType>(getRawDest()->getType())->getAddressSpace();
		}

		unsigned getSourceAddressSpace() const {
		return cast<PointerType>(getRawSource()->getType())->getAddressSpace();
		}

		/// Set the specified arguments of the instruction.
		void setDest(Value *Ptr) {
		assert(getRawDest()->getType() == Ptr->getType() &&
		"setDest called with pointer of wrong type!");
		setArgOperand(ARG_DEST, Ptr);
		}

		void setSource(Value *Ptr) {
		assert(getRawSource()->getType() == Ptr->getType() &&
		"setSource called with pointer of wrong type!");
		setArgOperand(ARG_SRC, Ptr);
		}

		void setLength(Value *L) {
		assert(getLength()->getType() == L->getType() &&
		"setLength called with value of wrong type!");
		setArgOperand(ARG_LENGTH, L);
		}

		void setAlignment(Constant *A) { setArgOperand(ARG_ALIGN, A); }
		skatkov: Don't you want to check (or assert) the constraints for the alignment here?
		dneilson (Author): I figured that's handled by the verifier.
		skatkov: OK, but to me one of the primary goals of a setter is to check its incoming args.

		void setVolatile(Constant *V) { setArgOperand(ARG_VOLATILE, V); }

		void setDestUnordered(Constant *V) { setArgOperand(ARG_DEST_UNORDERED, V); }

		void setSrcUnordered(Constant *V) { setArgOperand(ARG_SRC_UNORDERED, V); }
		skatkov: setSourceUnordered? For consistency with other names.

		void setElementSizeInBytes(Constant *V) {
		skatkov: Don't you want to check (or assert) the constraints for the element size here?
		dneilson (Author): Handled by verifier.
		setArgOperand(ARG_ELEMENTSIZE, V);
		}

static inline bool classof(const IntrinsicInst *I) {		static inline bool classof(const IntrinsicInst *I) {
return I->getIntrinsicID() == Intrinsic::memcpy_element_atomic;		return I->getIntrinsicID() == Intrinsic::memcpy_element_unordered_atomic;
}		}
static inline bool classof(const Value *V) {		static inline bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}
};		};

/// This is the common base class for memset/memcpy/memmove.		/// This is the common base class for memset/memcpy/memmove.
class MemIntrinsic : public IntrinsicInst {		class MemIntrinsic : public IntrinsicInst {
▲ Show 20 Lines • Show All 270 Lines • Show Last 20 Lines
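
The review comments on this class ask for two things: an enum for the operand positions instead of `constexpr static int`, and setters that assert their constraints rather than deferring to the verifier. A minimal standalone sketch of that shape follows. It is an illustration only: `MockElementUnorderedAtomicMemCpy` and `isPowerOfTwoOrZero` are hypothetical names, the operands are modelled as plain integers rather than `llvm::Value *`, and the constraints asserted are the ones the verifier in this revision checks.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the intrinsic wrapper; this is not an LLVM class.
class MockElementUnorderedAtomicMemCpy {
  // Operand positions, following the prototype proposed in this revision.
  enum Arg : unsigned {
    ARG_DEST = 0,
    ARG_SOURCE = 1,
    ARG_LENGTH = 2,
    ARG_ALIGN = 3,
    ARG_VOLATILE = 4,
    ARG_DEST_UNORDERED = 5,
    ARG_SOURCE_UNORDERED = 6,
    ARG_ELEMENTSIZE = 7,
    NUM_ARGS
  };

  // Operands modelled as integers purely for illustration.
  std::array<uint64_t, NUM_ARGS> Ops{};

public:
  // True for 0 and for any power of two (0 - 1 wraps, so 0 & ~0 == 0).
  static bool isPowerOfTwoOrZero(uint64_t V) { return (V & (V - 1)) == 0; }

  // The setter asserts the same alignment rule the verifier applies.
  void setAlignment(uint64_t A) {
    assert(isPowerOfTwoOrZero(A) && "alignment must be zero or a power of 2");
    Ops[ARG_ALIGN] = A;
  }
  uint64_t getAlignment() const { return Ops[ARG_ALIGN]; }

  // Element size must be a non-zero power of two.
  void setElementSizeInBytes(uint64_t S) {
    assert(S != 0 && isPowerOfTwoOrZero(S) &&
           "element size must be a power of 2");
    Ops[ARG_ELEMENTSIZE] = S;
  }
  uint64_t getElementSizeInBytes() const { return Ops[ARG_ELEMENTSIZE]; }

  // Boolean-style accessors, as suggested in review.
  void setDestUnordered(bool B) { Ops[ARG_DEST_UNORDERED] = B; }
  bool isDestUnordered() const { return Ops[ARG_DEST_UNORDERED] != 0; }
  void setSourceUnordered(bool B) { Ops[ARG_SOURCE_UNORDERED] = B; }
  bool isSourceUnordered() const { return Ops[ARG_SOURCE_UNORDERED] != 0; }
};
```

With assertions in the setters, a bad value is caught at the call site in a debug build instead of at the next verifier run.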

include/llvm/IR/Intrinsics.td

	Show First 20 Lines • Show All 800 Lines • ▼ Show 20 Lines
	// Takes a pointer to a string and the length of the string.			// Takes a pointer to a string and the length of the string.
	def int_xray_customevent : Intrinsic<[], [llvm_ptr_ty, llvm_i32_ty],			def int_xray_customevent : Intrinsic<[], [llvm_ptr_ty, llvm_i32_ty],
	[NoCapture<0>, ReadOnly<0>, IntrWriteMem]>;			[NoCapture<0>, ReadOnly<0>, IntrWriteMem]>;
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	//===------ Memory intrinsics with element-wise atomicity guarantees ------===//			//===------ Memory intrinsics with element-wise atomicity guarantees ------===//
	//			//

	def int_memcpy_element_atomic : Intrinsic<[],			// llvm.memcpy.element.unordered.atomic(dest, src, length, alignment, volatile,
	[llvm_anyptr_ty, llvm_anyptr_ty,			// isunordered, elementsize)
				skatkov: You have two unordered arguments.
				dneilson (Author): Good catch! I updated the prototype, but neglected the comment.
	llvm_i64_ty, llvm_i32_ty],			def int_memcpy_element_unordered_atomic
	[IntrArgMemOnly, NoCapture<0>, NoCapture<1>,			: Intrinsic<[],
	WriteOnly<0>, ReadOnly<1>]>;			[
				llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty, llvm_i32_ty,
				llvm_i1_ty, llvm_i1_ty, llvm_i1_ty, llvm_i8_ty
				],
				[
				IntrArgMemOnly, NoCapture<0>, NoCapture<1>, WriteOnly<0>,
				ReadOnly<1>
				]>;

	//===------------------------ Reduction Intrinsics ------------------------===//			//===------------------------ Reduction Intrinsics ------------------------===//
	//			//
	def int_experimental_vector_reduce_fadd : Intrinsic<[llvm_anyfloat_ty],			def int_experimental_vector_reduce_fadd : Intrinsic<[llvm_anyfloat_ty],
	[llvm_anyfloat_ty,			[llvm_anyfloat_ty,
	llvm_anyvector_ty],			llvm_anyvector_ty],
	[IntrNoMem]>;			[IntrNoMem]>;
	def int_experimental_vector_reduce_fmul : Intrinsic<[llvm_anyfloat_ty],			def int_experimental_vector_reduce_fmul : Intrinsic<[llvm_anyfloat_ty],
	▲ Show 20 Lines • Show All 57 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp


Show First 20 Lines • Show All 4,861 Lines • ▼ Show 20 Lines	case Intrinsic::memmove: {
bool isVol = cast<ConstantInt>(I.getArgOperand(4))->getZExtValue();		bool isVol = cast<ConstantInt>(I.getArgOperand(4))->getZExtValue();
bool isTC = I.isTailCall() && isInTailCallPosition(&I, DAG.getTarget());		bool isTC = I.isTailCall() && isInTailCallPosition(&I, DAG.getTarget());
SDValue MM = DAG.getMemmove(getRoot(), sdl, Op1, Op2, Op3, Align, isVol,		SDValue MM = DAG.getMemmove(getRoot(), sdl, Op1, Op2, Op3, Align, isVol,
isTC, MachinePointerInfo(I.getArgOperand(0)),		isTC, MachinePointerInfo(I.getArgOperand(0)),
MachinePointerInfo(I.getArgOperand(1)));		MachinePointerInfo(I.getArgOperand(1)));
updateDAGForMaybeTailCall(MM);		updateDAGForMaybeTailCall(MM);
return nullptr;		return nullptr;
}		}
case Intrinsic::memcpy_element_atomic: {		case Intrinsic::memcpy_element_unordered_atomic: {
SDValue Dst = getValue(I.getArgOperand(0));		SDValue Dst = getValue(I.getArgOperand(0));
		skatkov: You introduced the special getters, can you use them here?
SDValue Src = getValue(I.getArgOperand(1));		SDValue Src = getValue(I.getArgOperand(1));
SDValue NumElements = getValue(I.getArgOperand(2));		SDValue Length = getValue(I.getArgOperand(2));
SDValue ElementSize = getValue(I.getArgOperand(3));		SDValue Alignment = getValue(I.getArgOperand(3));
		// Note: arg4 is isvolatile, which is unused for this intrinsic
		SDValue DestUnordered = getValue(I.getArgOperand(5));
		SDValue SrcUnordered = getValue(I.getArgOperand(6));
		// SDValue ElementSize = getValue(I.getArgOperand(7));

// Emit a library call.		// Emit a library call.
TargetLowering::ArgListTy Args;		TargetLowering::ArgListTy Args;
TargetLowering::ArgListEntry Entry;		TargetLowering::ArgListEntry Entry;
Entry.Ty = DAG.getDataLayout().getIntPtrType(*DAG.getContext());		Entry.Ty = DAG.getDataLayout().getIntPtrType(*DAG.getContext());
Entry.Node = Dst;		Entry.Node = Dst;
Args.push_back(Entry);		Args.push_back(Entry);

Entry.Node = Src;		Entry.Node = Src;
Args.push_back(Entry);		Args.push_back(Entry);

Entry.Ty = I.getArgOperand(2)->getType();		Entry.Ty = I.getArgOperand(2)->getType();
Entry.Node = NumElements;		Entry.Node = Length;
		efriedma: I'm not sure changing the signature of __llvm_memcpy_element_unordered_atomic_* like this makes sense. I guess changing from the number of elements to the length in bytes is fine. I'm not sure why you want to pass the alignment to the function; the implementation can easily compute the alignment itself based on the pointers passed in. Not sure what the implementation is going to do with the DestUnordered and SrcUnordered parameters. Maybe if the source/dest is non-atomic, it could use unaligned load/store operations? If we do need DestUnordered and SrcUnordered, it probably makes sense to merge them to save an instruction in the caller.
		dneilson (Author): Fair. I have no strong preference with the library function taking size in bytes vs. number of elements. The only benefit that I can see to passing size in bytes instead of number of elements here is the possibility for runtime checks being implemented in a debug version of the library -- one that verifies that the length is a multiple of element size. Good point about alignment. Not the smartest move on my part. I think that you're right about passing dest_unordered & src_unordered to the library -- it might be unnecessary. The lib function could just be implemented assuming that both source & dest require unordered atomic ops; there shouldn't be any harm in it, since unordered just means that we can't break up an element into partial loads/stores, and we wouldn't want to do that in a high performance library anyways. I'll change the lib prototype to match memcpy exactly. _llvm_memcpy_unordered_atomic(i8* noalias dest, i8* noalias src, uint64 length)
		skatkov: MI.getLength() == Length
		dneilson (Author): Not quite. Doesn't seem to be a straightforward way to go from an SDValue to a Type, so I don't think this sort of replacement can be made.
Args.push_back(Entry);		Args.push_back(Entry);

Entry.Ty = Type::getInt32Ty(*DAG.getContext());		Entry.Ty = Type::getInt32Ty(*DAG.getContext());
Entry.Node = ElementSize;		Entry.Node = Alignment;
		Args.push_back(Entry);

		Entry.Ty = Type::getInt1Ty(*DAG.getContext());
		Entry.Node = DestUnordered;
		Args.push_back(Entry);

		Entry.Ty = Type::getInt1Ty(*DAG.getContext());
		Entry.Node = SrcUnordered;
Args.push_back(Entry);		Args.push_back(Entry);

uint64_t ElementSizeConstant =		uint64_t ElementSizeConstant =
cast<ConstantInt>(I.getArgOperand(3))->getZExtValue();		cast<ConstantInt>(I.getArgOperand(7))->getZExtValue();
RTLIB::Libcall LibraryCall =		RTLIB::Libcall LibraryCall =
RTLIB::getMEMCPY_ELEMENT_ATOMIC(ElementSizeConstant);		RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(ElementSizeConstant);
if (LibraryCall == RTLIB::UNKNOWN_LIBCALL)		if (LibraryCall == RTLIB::UNKNOWN_LIBCALL)
report_fatal_error("Unsupported element size");		report_fatal_error("Unsupported element size");

TargetLowering::CallLoweringInfo CLI(DAG);		TargetLowering::CallLoweringInfo CLI(DAG);
CLI.setDebugLoc(sdl).setChain(getRoot()).setLibCallee(		CLI.setDebugLoc(sdl).setChain(getRoot()).setLibCallee(
TLI.getLibcallCallingConv(LibraryCall),		TLI.getLibcallCallingConv(LibraryCall),
Type::getVoidTy(*DAG.getContext()),		Type::getVoidTy(*DAG.getContext()),
DAG.getExternalSymbol(TLI.getLibcallName(LibraryCall),		DAG.getExternalSymbol(TLI.getLibcallName(LibraryCall),
▲ Show 20 Lines • Show All 4,673 Lines • Show Last 20 Lines
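
The thread above settles on a library routine that takes only destination, source and length in bytes, and treats both sides as needing unordered atomic accesses, meaning that no element may be split into partial loads or stores. The sketch below illustrates that idea for one element type. It is not the runtime routine from this revision: `copyElementsRelaxed` is a hypothetical name, it assumes the GCC/Clang `__atomic_load_n` and `__atomic_store_n` builtins, and it uses relaxed ordering, which is the closest C++ ordering and is stronger than LLVM's `unordered`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of LLVM or compiler-rt.
// Copies LengthInBytes bytes from Src to Dest as whole elements of type ElemT,
// so every element is moved with a single load and a single store.
// Assumes GCC or Clang (for the __atomic builtins), non-overlapping buffers,
// and pointers aligned for ElemT.
template <typename ElemT>
void copyElementsRelaxed(ElemT *Dest, const ElemT *Src,
                         std::size_t LengthInBytes) {
  // The debug-build check mentioned in the thread: length must be a multiple
  // of the element size.
  assert(LengthInBytes % sizeof(ElemT) == 0 &&
         "length must be a multiple of the element size");

  const std::size_t NumElements = LengthInBytes / sizeof(ElemT);
  for (std::size_t I = 0; I != NumElements; ++I) {
    // One indivisible load and one indivisible store per element.
    ElemT V = __atomic_load_n(&Src[I], __ATOMIC_RELAXED);
    __atomic_store_n(&Dest[I], V, __ATOMIC_RELAXED);
  }
}
```

Taking the length in bytes, as proposed, is what makes the multiple-of-element-size check possible inside the routine.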

lib/CodeGen/TargetLoweringBase.cpp

Show First 20 Lines • Show All 368 Lines • ▼ Show 20 Lines	static void InitLibcallNames(const char **Names, const Triple &TT) {
Names[RTLIB::UO_PPCF128] = "__gcc_qunord";		Names[RTLIB::UO_PPCF128] = "__gcc_qunord";
Names[RTLIB::O_F32] = "__unordsf2";		Names[RTLIB::O_F32] = "__unordsf2";
Names[RTLIB::O_F64] = "__unorddf2";		Names[RTLIB::O_F64] = "__unorddf2";
Names[RTLIB::O_F128] = "__unordtf2";		Names[RTLIB::O_F128] = "__unordtf2";
Names[RTLIB::O_PPCF128] = "__gcc_qunord";		Names[RTLIB::O_PPCF128] = "__gcc_qunord";
Names[RTLIB::MEMCPY] = "memcpy";		Names[RTLIB::MEMCPY] = "memcpy";
Names[RTLIB::MEMMOVE] = "memmove";		Names[RTLIB::MEMMOVE] = "memmove";
Names[RTLIB::MEMSET] = "memset";		Names[RTLIB::MEMSET] = "memset";
Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_1] = "__llvm_memcpy_element_atomic_1";		Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_1] = "__llvm_memcpy_element_unordered_atomic_1";
Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_2] = "__llvm_memcpy_element_atomic_2";		Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_2] = "__llvm_memcpy_element_unordered_atomic_2";
Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_4] = "__llvm_memcpy_element_atomic_4";		Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_4] = "__llvm_memcpy_element_unordered_atomic_4";
Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_8] = "__llvm_memcpy_element_atomic_8";		Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_8] = "__llvm_memcpy_element_unordered_atomic_8";
Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_16] = "__llvm_memcpy_element_atomic_16";		Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_16] = "__llvm_memcpy_element_unordered_atomic_16";
Names[RTLIB::UNWIND_RESUME] = "_Unwind_Resume";		Names[RTLIB::UNWIND_RESUME] = "_Unwind_Resume";
Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_1] = "__sync_val_compare_and_swap_1";		Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_1] = "__sync_val_compare_and_swap_1";
Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_2] = "__sync_val_compare_and_swap_2";		Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_2] = "__sync_val_compare_and_swap_2";
Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_4] = "__sync_val_compare_and_swap_4";		Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_4] = "__sync_val_compare_and_swap_4";
Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_8] = "__sync_val_compare_and_swap_8";		Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_8] = "__sync_val_compare_and_swap_8";
Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_16] = "__sync_val_compare_and_swap_16";		Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_16] = "__sync_val_compare_and_swap_16";
Names[RTLIB::SYNC_LOCK_TEST_AND_SET_1] = "__sync_lock_test_and_set_1";		Names[RTLIB::SYNC_LOCK_TEST_AND_SET_1] = "__sync_lock_test_and_set_1";
Names[RTLIB::SYNC_LOCK_TEST_AND_SET_2] = "__sync_lock_test_and_set_2";		Names[RTLIB::SYNC_LOCK_TEST_AND_SET_2] = "__sync_lock_test_and_set_2";
▲ Show 20 Lines • Show All 386 Lines • ▼ Show 20 Lines	switch (Opc) {
OP_TO_LIBCALL(ISD::ATOMIC_LOAD_UMIN, SYNC_FETCH_AND_UMIN)		OP_TO_LIBCALL(ISD::ATOMIC_LOAD_UMIN, SYNC_FETCH_AND_UMIN)
}		}

#undef OP_TO_LIBCALL		#undef OP_TO_LIBCALL

return UNKNOWN_LIBCALL;		return UNKNOWN_LIBCALL;
}		}

RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize) {		RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize) {
switch (ElementSize) {		switch (ElementSize) {
case 1:		case 1:
return MEMCPY_ELEMENT_ATOMIC_1;		return MEMCPY_ELEMENT_UNORDERED_ATOMIC_1;
case 2:		case 2:
return MEMCPY_ELEMENT_ATOMIC_2;		return MEMCPY_ELEMENT_UNORDERED_ATOMIC_2;
case 4:		case 4:
return MEMCPY_ELEMENT_ATOMIC_4;		return MEMCPY_ELEMENT_UNORDERED_ATOMIC_4;
case 8:		case 8:
return MEMCPY_ELEMENT_ATOMIC_8;		return MEMCPY_ELEMENT_UNORDERED_ATOMIC_8;
case 16:		case 16:
return MEMCPY_ELEMENT_ATOMIC_16;		return MEMCPY_ELEMENT_UNORDERED_ATOMIC_16;
default:		default:
return UNKNOWN_LIBCALL;		return UNKNOWN_LIBCALL;
}		}

}		}

/// InitCmpLibcallCCs - Set default comparison libcall CC.		/// InitCmpLibcallCCs - Set default comparison libcall CC.
///		///
static void InitCmpLibcallCCs(ISD::CondCode *CCs) {		static void InitCmpLibcallCCs(ISD::CondCode *CCs) {
memset(CCs, ISD::SETCC_INVALID, sizeof(ISD::CondCode)*RTLIB::UNKNOWN_LIBCALL);		memset(CCs, ISD::SETCC_INVALID, sizeof(ISD::CondCode)*RTLIB::UNKNOWN_LIBCALL);
CCs[RTLIB::OEQ_F32] = ISD::SETEQ;		CCs[RTLIB::OEQ_F32] = ISD::SETEQ;
CCs[RTLIB::OEQ_F64] = ISD::SETEQ;		CCs[RTLIB::OEQ_F64] = ISD::SETEQ;
▲ Show 20 Lines • Show All 1,312 Lines • Show Last 20 Lines

lib/IR/Verifier.cpp

Show First 20 Lines • Show All 3,981 Lines • ▼ Show 20 Lines	case Intrinsic::memset: {
const APInt &AlignVal = AlignCI->getValue();		const APInt &AlignVal = AlignCI->getValue();
Assert(AlignCI->isZero() || AlignVal.isPowerOf2(),		Assert(AlignCI->isZero() || AlignVal.isPowerOf2(),
"alignment argument of memory intrinsics must be a power of 2", CS);		"alignment argument of memory intrinsics must be a power of 2", CS);
Assert(isa<ConstantInt>(CS.getArgOperand(4)),		Assert(isa<ConstantInt>(CS.getArgOperand(4)),
"isvolatile argument of memory intrinsics must be a constant int",		"isvolatile argument of memory intrinsics must be a constant int",
CS);		CS);
break;		break;
}		}
case Intrinsic::memcpy_element_atomic: {		case Intrinsic::memcpy_element_unordered_atomic: {
ConstantInt *ElementSizeCI = dyn_cast<ConstantInt>(CS.getArgOperand(3));		ConstantInt *AlignCI = dyn_cast<ConstantInt>(CS.getArgOperand(3));
		skatkov: Use getters?
Assert(ElementSizeCI, "element size of the element-wise atomic memory "		Assert(AlignCI,
		skatkov: Extra semicolon
		"alignment argument of element-wise unordered atomic memory "
		"intrinsics must be a constant int",
		CS);
		const APInt &AlignVal = AlignCI->getValue();
		Assert(AlignCI->isZero() || AlignVal.isPowerOf2(),
		skatkov: Is zero alignment allowed? At least it is strange that 0 counts as a power of 2 :) If it is allowed, please add "or zero" to the message text.
		dneilson (Author): I would think not, but this is the same check as exists for memcpy. So, for compatibility I think it should be allowed unless we can definitively say that it's not allowed for memcpy.
		"alignment argument of element-wise unordered atomic memory "
		"intrinsics must be a power of 2",
		CS);

		ConstantInt *DestUnorderedCI = dyn_cast<ConstantInt>(CS.getArgOperand(5));
		Assert(DestUnorderedCI,
		"dest_unordered of the element-wise unordered atomic memory "
		"intrinsic must be a constant int",
		CS);

		ConstantInt *SrcUnorderedCI = dyn_cast<ConstantInt>(CS.getArgOperand(6));
		Assert(SrcUnorderedCI,
		"src_unordered of the element-wise unordered atomic memory "
		"intrinsic must be a constant int",
		CS);

		// Cannot have both unordered flags being false.
		Assert(!(DestUnorderedCI->isZero() && SrcUnorderedCI->isZero()),
		"dest_unordered and src_unordered cannot both be zero on the "
		"element-wise unordered atomic memory intrinsic",
		CS);

		ConstantInt *ElementSizeCI = dyn_cast<ConstantInt>(CS.getArgOperand(7));
		Assert(ElementSizeCI,
		"element size of the element-wise unordered atomic memory "
"intrinsic must be a constant int",		"intrinsic must be a constant int",
CS);		CS);
const APInt &ElementSizeVal = ElementSizeCI->getValue();		const APInt &ElementSizeVal = ElementSizeCI->getValue();
Assert(ElementSizeVal.isPowerOf2(),		Assert(ElementSizeVal.isPowerOf2(),
"element size of the element-wise atomic memory intrinsic "		"element size of the element-wise atomic memory intrinsic "
"must be a power of 2",		"must be a power of 2",
CS);		CS);

auto IsValidAlignment = [&](uint64_t Alignment) {		auto IsValidAlignment = [&](uint64_t Alignment) {
return isPowerOf2_64(Alignment) && ElementSizeVal.ule(Alignment);		return isPowerOf2_64(Alignment) && ElementSizeVal.ule(Alignment);
};		};

uint64_t DstAlignment = CS.getParamAlignment(0),		uint64_t DstAlignment = CS.getParamAlignment(0),
SrcAlignment = CS.getParamAlignment(1);		SrcAlignment = CS.getParamAlignment(1);

Assert(IsValidAlignment(DstAlignment),		Assert(IsValidAlignment(DstAlignment),
"incorrect alignment of the destination argument",		"incorrect alignment of the destination argument", CS);
CS);
Assert(IsValidAlignment(SrcAlignment),		Assert(IsValidAlignment(SrcAlignment),
"incorrect alignment of the source argument",		"incorrect alignment of the source argument", CS);
CS);
break;		break;
}		}
case Intrinsic::gcroot:		case Intrinsic::gcroot:
case Intrinsic::gcwrite:		case Intrinsic::gcwrite:
case Intrinsic::gcread:		case Intrinsic::gcread:
if (ID == Intrinsic::gcroot) {		if (ID == Intrinsic::gcroot) {
AllocaInst *AI =		AllocaInst *AI =
dyn_cast<AllocaInst>(CS.getArgOperand(0)->stripPointerCasts());		dyn_cast<AllocaInst>(CS.getArgOperand(0)->stripPointerCasts());
▲ Show 20 Lines • Show All 891 Lines • Show Last 20 Lines
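
The checks added to the verifier for the new prototype can be summarised in a small standalone function. This is a sketch, not the verifier code: `UnorderedAtomicMemCpyArgs` and `verifyUnorderedAtomicMemCpy` are hypothetical names, the operands are assumed to be constants already, and the parameter-attribute alignment checks on the two pointers are left out.

```cpp
#include <cstdint>
#include <string>

// Hypothetical container for the constant operands; not an LLVM type.
struct UnorderedAtomicMemCpyArgs {
  uint64_t Alignment;    // operand 3
  bool DestUnordered;    // operand 5
  bool SrcUnordered;     // operand 6
  uint64_t ElementSize;  // operand 7
};

// True only for non-zero powers of two.
static bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Returns an empty string when the operands pass, otherwise a diagnostic.
std::string verifyUnorderedAtomicMemCpy(const UnorderedAtomicMemCpyArgs &A) {
  // Zero is accepted for compatibility with the existing memcpy check.
  if (A.Alignment != 0 && !isPowerOf2(A.Alignment))
    return "alignment argument must be zero or a power of 2";

  // At least one side of the copy has to be unordered atomic.
  if (!A.DestUnordered && !A.SrcUnordered)
    return "dest_unordered and src_unordered cannot both be zero";

  if (!isPowerOf2(A.ElementSize))
    return "element size must be a power of 2";

  return "";
}
```

Spelling the zero case out in the diagnostic addresses the review remark that the current message says "power of 2" while also accepting 0.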

lib/Transforms/InstCombine/InstCombineCalls.cpp

Show First 20 Lines • Show All 88 Lines • ▼ Show 20 Lines	for (unsigned I = 0, E = V->getNumElements(); I != E; ++I) {
bool Sign = V->getElementType()->isIntegerTy()		bool Sign = V->getElementType()->isIntegerTy()
? cast<ConstantInt>(Elt)->isNegative()		? cast<ConstantInt>(Elt)->isNegative()
: cast<ConstantFP>(Elt)->isNegative();		: cast<ConstantFP>(Elt)->isNegative();
BoolVec.push_back(ConstantInt::get(BoolTy, Sign));		BoolVec.push_back(ConstantInt::get(BoolTy, Sign));
}		}
return ConstantVector::get(BoolVec);		return ConstantVector::get(BoolVec);
}		}

		/* -- temp removal to aid staging
Instruction *		Instruction *
InstCombiner::SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI) {		InstCombiner::SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI) {
// Try to unfold this intrinsic into sequence of explicit atomic loads and		// Try to unfold this intrinsic into sequence of explicit atomic loads and
// stores.		// stores.
// First check that number of elements is compile time constant.		// First check that number of elements is compile time constant.
auto *NumElementsCI = dyn_cast<ConstantInt>(AMI->getNumElements());		auto *NumElementsCI = dyn_cast<ConstantInt>(AMI->getNumElements());
if (!NumElementsCI)		if (!NumElementsCI)
return nullptr;		return nullptr;

// Check that there are not too many elements.		// Check that there are not too many elements.
uint64_t NumElements = NumElementsCI->getZExtValue();		uint64_t NumElements = NumElementsCI->getZExtValue();
if (NumElements >= UnfoldElementAtomicMemcpyMaxElements)		if (NumElements >= UnfoldElementAtomicMemcpyMaxElements)
return nullptr;		return nullptr;
		skatkov: assert LengthInBytes % ElementSizeInBytes == 0 and LengthInBytes > 0?
		reames: Agreed. Also, length might actually be zero. We should remove such calls.
		dneilson (Author): Re: The assert. I think that it would be better to check this in the verifier. In the LangRef we've said that it's undefined behaviour if length isn't a multiple of element size, so I think it's okay to blindly do this divide here and add a check to the verifier. Re: Zero length; see the in-line comment below.

// Don't unfold into illegal integers		// Don't unfold into illegal integers
uint64_t ElementSizeInBytes = AMI->getElementSizeInBytes() * 8;		uint64_t ElementSizeInBytes = AMI->getElementSizeInBytes() * 8;
if (!getDataLayout().isLegalInteger(ElementSizeInBytes))		if (!getDataLayout().isLegalInteger(ElementSizeInBytes))
return nullptr;		return nullptr;

		reames: Where did this check come from and why is it needed? It looks like an attempt to handle a length which isn't an even interval of element size, but the verifier should reject that?
		dneilson (Author): Just me being extra cautious. The verifier checks for the case where constant length is not a multiple of element size, and the zero length case is handled elsewhere. However, I'm not sure that the verifier runs after every single pass. So, I figure there's no harm in handling the corner case.
// Cast source and destination to the correct type. Intrinsic input arguments		// Cast source and destination to the correct type. Intrinsic input arguments
// are usually represented as i8*.		// are usually represented as i8*.
// Often operands will be explicitly casted to i8* and we can just strip		// Often operands will be explicitly casted to i8* and we can just strip
// those casts instead of inserting new ones. However it's easier to rely on		// those casts instead of inserting new ones. However it's easier to rely on
// other InstCombine rules which will cover trivial cases anyway.		// other InstCombine rules which will cover trivial cases anyway.
Value *Src = AMI->getRawSource();		Value *Src = AMI->getRawSource();
Value *Dst = AMI->getRawDest();		Value *Dst = AMI->getRawDest();
Type *ElementPointerType = Type::getIntNPtrTy(		Type *ElementPointerType = Type::getIntNPtrTy(
Show All 36 Lines	for (uint64_t i = 0; i < NumElements; ++i) {
Store->setDebugLoc(AMI->getDebugLoc());		Store->setDebugLoc(AMI->getDebugLoc());
}		}

// Set the number of elements of the copy to 0, it will be deleted on the		// Set the number of elements of the copy to 0, it will be deleted on the
// next iteration.		// next iteration.
AMI->setNumElements(Constant::getNullValue(NumElementsCI->getType()));		AMI->setNumElements(Constant::getNullValue(NumElementsCI->getType()));
return AMI;		return AMI;
}		}
		*/

Instruction *InstCombiner::SimplifyMemTransfer(MemIntrinsic *MI) {		Instruction *InstCombiner::SimplifyMemTransfer(MemIntrinsic *MI) {
unsigned DstAlign = getKnownAlignment(MI->getArgOperand(0), DL, MI, &AC, &DT);		unsigned DstAlign = getKnownAlignment(MI->getArgOperand(0), DL, MI, &AC, &DT);
unsigned SrcAlign = getKnownAlignment(MI->getArgOperand(1), DL, MI, &AC, &DT);		unsigned SrcAlign = getKnownAlignment(MI->getArgOperand(1), DL, MI, &AC, &DT);
unsigned MinAlign = std::min(DstAlign, SrcAlign);		unsigned MinAlign = std::min(DstAlign, SrcAlign);
unsigned CopyAlign = MI->getAlignment();		unsigned CopyAlign = MI->getAlignment();

if (CopyAlign < MinAlign) {		if (CopyAlign < MinAlign) {
▲ Show 20 Lines • Show All 1,711 Lines • ▼ Show 20 Lines	if (MemIntrinsic *MI = dyn_cast<MemIntrinsic>(II)) {
} else if (MemSetInst *MSI = dyn_cast<MemSetInst>(MI)) {		} else if (MemSetInst *MSI = dyn_cast<MemSetInst>(MI)) {
if (Instruction *I = SimplifyMemSet(MSI))		if (Instruction *I = SimplifyMemSet(MSI))
return I;		return I;
}		}

if (Changed) return II;		if (Changed) return II;
}		}

		/* -- temp removal to simplify staging
if (auto *AMI = dyn_cast<ElementAtomicMemCpyInst>(II)) {		if (auto *AMI = dyn_cast<ElementAtomicMemCpyInst>(II)) {
if (Constant *C = dyn_cast<Constant>(AMI->getNumElements()))		if (Constant *C = dyn_cast<Constant>(AMI->getNumElements()))
if (C->isNullValue())		if (C->isNullValue())
		reames: Hm, I might sink this into the helper function. Optional, and can be submitted separately without further review.
		dneilson (author): This is following the same pattern/code-flow as the normal memcpy/memmove/memset handlers just above this, i.e. check for a null length; if there is one, then remove the call, else call the simplify method for the intrinsic. I'm inclined to stick to this pattern to make the later merging of the introspection classes cleaner.
return eraseInstFromFunction(*AMI);		return eraseInstFromFunction(*AMI);

if (Instruction *I = SimplifyElementAtomicMemCpy(AMI))		if (Instruction *I = SimplifyElementAtomicMemCpy(AMI))
return I;		return I;
}		}
		*/

if (Instruction *I = SimplifyNVVMIntrinsic(II, *this))		if (Instruction *I = SimplifyNVVMIntrinsic(II, *this))
return I;		return I;

auto SimplifyDemandedVectorEltsLow = [this](Value *Op, unsigned Width,		auto SimplifyDemandedVectorEltsLow = [this](Value *Op, unsigned Width,
unsigned DemandedWidth) {		unsigned DemandedWidth) {
APInt UndefElts(Width, 0);		APInt UndefElts(Width, 0);
APInt DemandedElts = APInt::getLowBitsSet(Width, DemandedWidth);		APInt DemandedElts = APInt::getLowBitsSet(Width, DemandedWidth);
return SimplifyDemandedVectorElts(Op, DemandedElts, UndefElts);		return SimplifyDemandedVectorElts(Op, DemandedElts, UndefElts);
[2,463 lines not shown]

lib/Transforms/InstCombine/InstCombineInternal.h

[681 lines not shown]	Instruction *OptAndOp(BinaryOperator *Op, ConstantInt *OpRHS,
ConstantInt *AndRHS, BinaryOperator &TheAnd);		ConstantInt *AndRHS, BinaryOperator &TheAnd);

Value *insertRangeTest(Value *V, const APInt &Lo, const APInt &Hi,		Value *insertRangeTest(Value *V, const APInt &Lo, const APInt &Hi,
bool isSigned, bool Inside);		bool isSigned, bool Inside);
Instruction *PromoteCastOfAllocation(BitCastInst &CI, AllocaInst &AI);		Instruction *PromoteCastOfAllocation(BitCastInst &CI, AllocaInst &AI);
Instruction *MatchBSwap(BinaryOperator &I);		Instruction *MatchBSwap(BinaryOperator &I);
bool SimplifyStoreAtEndOfBlock(StoreInst &SI);		bool SimplifyStoreAtEndOfBlock(StoreInst &SI);

Instruction *SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI);		// Instruction *SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI); -- temp removal to aid staging
Instruction *SimplifyMemTransfer(MemIntrinsic *MI);		Instruction *SimplifyMemTransfer(MemIntrinsic *MI);
Instruction *SimplifyMemSet(MemSetInst *MI);		Instruction *SimplifyMemSet(MemSetInst *MI);

Value *EvaluateInDifferentType(Value *V, Type *Ty, bool isSigned);		Value *EvaluateInDifferentType(Value *V, Type *Ty, bool isSigned);

/// \brief Returns a value X such that Val = X * Scale, or null if none.		/// \brief Returns a value X such that Val = X * Scale, or null if none.
///		///
/// If the multiplication is known not to overflow then NoSignedWrap is set.		/// If the multiplication is known not to overflow then NoSignedWrap is set.
Value *Descale(Value *Val, APInt Scale, bool &NoSignedWrap);		Value *Descale(Value *Val, APInt Scale, bool &NoSignedWrap);
};		};

} // end namespace llvm.		} // end namespace llvm.

#undef DEBUG_TYPE		#undef DEBUG_TYPE

#endif		#endif

test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu | FileCheck %s

	define i8* @test_memcpy1(i8* %P, i8* %Q) {			define i8* @test_memcpy1(i8* %P, i8* %Q) {
	; CHECK: test_memcpy			; CHECK: test_memcpy
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 1, i32 1)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 1, i1 1, i8 1)
	ret i8* %P			ret i8* %P
				; 3rd arg (%edx) -- size
	; CHECK-DAG: movl $1, %edx			; CHECK-DAG: movl $1, %edx
	; CHECK-DAG: movl $1, %ecx			; 4th arg (%ecx) -- align
	; CHECK: __llvm_memcpy_element_atomic_1			; CHECK-DAG: movl $4, %ecx
				; 5th arg (%r8) -- dest_unordered
				; CHECK-DAG: movl $1, %r8d
				; 6th arg (%r9) -- src_unordered
				; CHECK-DAG: movl $1, %r9d
				; CHECK: __llvm_memcpy_element_unordered_atomic_1
	}			}

	define i8* @test_memcpy2(i8* %P, i8* %Q) {			define i8* @test_memcpy2(i8* %P, i8* %Q) {
	; CHECK: test_memcpy2			; CHECK: test_memcpy2
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 2, i32 2)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 2, i32 4, i1 0, i1 1, i1 1, i8 2)
	ret i8* %P			ret i8* %P
				; 3rd arg (%edx) -- size
	; CHECK-DAG: movl $2, %edx			; CHECK-DAG: movl $2, %edx
	; CHECK-DAG: movl $2, %ecx			; 4th arg (%ecx) -- align
	; CHECK: __llvm_memcpy_element_atomic_2			; CHECK-DAG: movl $4, %ecx
				; 5th arg (%r8) -- dest_unordered
				; CHECK-DAG: movl $1, %r8d
				; 6th arg (%r9) -- src_unordered
				; CHECK-DAG: movl $1, %r9d
				; CHECK: __llvm_memcpy_element_unordered_atomic_2
	}			}

	define i8* @test_memcpy4(i8* %P, i8* %Q) {			define i8* @test_memcpy4(i8* %P, i8* %Q) {
	; CHECK: test_memcpy4			; CHECK: test_memcpy4
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 4, i32 4)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 4, i32 4, i1 0, i1 1, i1 1, i8 4)
	ret i8* %P			ret i8* %P
				; 3rd arg (%edx) -- size
	; CHECK-DAG: movl $4, %edx			; CHECK-DAG: movl $4, %edx
				; 4th arg (%ecx) -- align
	; CHECK-DAG: movl $4, %ecx			; CHECK-DAG: movl $4, %ecx
	; CHECK: __llvm_memcpy_element_atomic_4			; 5th arg (%r8) -- dest_unordered
				; CHECK-DAG: movl $1, %r8d
				; 6th arg (%r9) -- src_unordered
				; CHECK-DAG: movl $1, %r9d
				; CHECK: __llvm_memcpy_element_unordered_atomic_4
	}			}

	define i8* @test_memcpy8(i8* %P, i8* %Q) {			define i8* @test_memcpy8(i8* %P, i8* %Q) {
	; CHECK: test_memcpy8			; CHECK: test_memcpy8
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 8 %P, i8* align 8 %Q, i64 8, i32 8)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %P, i8* align 8 %Q, i32 8, i32 8, i1 0, i1 1, i1 1, i8 8)
	ret i8* %P			ret i8* %P
				; 3rd arg (%edx) -- size
	; CHECK-DAG: movl $8, %edx			; CHECK-DAG: movl $8, %edx
				; 4th arg (%ecx) -- align
	; CHECK-DAG: movl $8, %ecx			; CHECK-DAG: movl $8, %ecx
	; CHECK: __llvm_memcpy_element_atomic_8			; 5th arg (%r8) -- dest_unordered
				; CHECK-DAG: movl $1, %r8d
				; 6th arg (%r9) -- src_unordered
				; CHECK-DAG: movl $1, %r9d
				; CHECK: __llvm_memcpy_element_unordered_atomic_8
	}			}

	define i8* @test_memcpy16(i8* %P, i8* %Q) {			define i8* @test_memcpy16(i8* %P, i8* %Q) {
	; CHECK: test_memcpy16			; CHECK: test_memcpy16
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %P, i8* align 16 %Q, i64 16, i32 16)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %P, i8* align 16 %Q, i32 16, i32 16, i1 0, i1 1, i1 1, i8 16)
	ret i8* %P			ret i8* %P
				; 3rd arg (%edx) -- size
	; CHECK-DAG: movl $16, %edx			; CHECK-DAG: movl $16, %edx
				; 4th arg (%ecx) -- align
	; CHECK-DAG: movl $16, %ecx			; CHECK-DAG: movl $16, %ecx
	; CHECK: __llvm_memcpy_element_atomic_16			; 5th arg (%r8) -- dest_unordered
				; CHECK-DAG: movl $1, %r8d
				; 6th arg (%r9) -- src_unordered
				; CHECK-DAG: movl $1, %r9d
				; CHECK: __llvm_memcpy_element_unordered_atomic_16
	}			}

	define void @test_memcpy_args(i8** %Storage) {			define void @test_memcpy_args(i8** %Storage) {
	; CHECK: test_memcpy_args			; CHECK: test_memcpy_args
	%Dst = load i8*, i8** %Storage			%Dst = load i8*, i8** %Storage
	%Src.addr = getelementptr i8*, i8** %Storage, i64 1			%Src.addr = getelementptr i8*, i8** %Storage, i64 1
	%Src = load i8*, i8** %Src.addr			%Src = load i8*, i8** %Src.addr

	; First argument			; 1st arg (%rdi)
	; CHECK-DAG: movq (%rdi), [[REG1:%r.+]]			; CHECK-DAG: movq (%rdi), [[REG1:%r.+]]
	; CHECK-DAG: movq [[REG1]], %rdi			; CHECK-DAG: movq [[REG1]], %rdi
	; Second argument			; 2nd arg (%rsi)
	; CHECK-DAG: movq 8(%rdi), %rsi			; CHECK-DAG: movq 8(%rdi), %rsi
	; Third argument			; 3rd arg (%edx) -- size
	; CHECK-DAG: movl $4, %edx			; CHECK-DAG: movl $4, %edx
	; Fourth argument			; 4th arg (%ecx) -- align
	; CHECK-DAG: movl $4, %ecx			; CHECK-DAG: movl $4, %ecx
	; CHECK: __llvm_memcpy_element_atomic_4			; 5th arg (%r8) -- dest_unordered
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 4, i32 4)			; CHECK-DAG: movl $1, %r8d
	ret void			; 6th arg (%r9) -- src_unordered
				; CHECK-DAG: movl $1, %r9d
				; CHECK: __llvm_memcpy_element_unordered_atomic_4
				call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %Dst, i8* align 4 %Src, i32 4, i32 4, i1 0, i1 1, i1 1, i8 4)
				ret void
	}			}

	declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind			declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1, i1, i1, i8) nounwind
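
The CHECK lines in this test expect each call to lower to a runtime symbol whose suffix is the element size (1, 2, 4, 8, or 16). A minimal C++ sketch of that naming scheme follows. `libcallNameForElementSize` is a hypothetical helper written for illustration, not LLVM's API, and the assumption that only these five sizes have a libcall is taken from the sizes exercised in this test.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of LLVM): returns the runtime symbol the
// test above expects for a given element size, or an empty string when the
// size is not one of the sizes exercised by the test.
std::string libcallNameForElementSize(uint64_t ElementSize) {
  switch (ElementSize) {
  case 1:
  case 2:
  case 4:
  case 8:
  case 16:
    return "__llvm_memcpy_element_unordered_atomic_" +
           std::to_string(ElementSize);
  default:
    return std::string(); // assumption: no libcall for other sizes
  }
}

// Usage check mirroring the symbol names in the CHECK lines.
inline void checkLibcallNames() {
  assert(libcallNameForElementSize(1) ==
         "__llvm_memcpy_element_unordered_atomic_1");
  assert(libcallNameForElementSize(16) ==
         "__llvm_memcpy_element_unordered_atomic_16");
  assert(libcallNameForElementSize(3).empty());
}
```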

test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll

	; RUN: opt -instcombine -unfold-element-atomic-memcpy-max-elements=8 -S < %s | FileCheck %s			; RUN: opt -instcombine -unfold-element-atomic-memcpy-max-elements=8 -S < %s | FileCheck %s
				; Temporarily an expected failure until inst combine is updated in the next patch
				; XFAIL: *
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; Test basic unfolding			; Test basic unfolding
	define void @test1(i8* %Src, i8* %Dst) {			define void @test1(i8* %Src, i8* %Dst) {
	; CHECK-LABEL: test1			; CHECK-LABEL: test1
	; CHECK-NOT: llvm.memcpy.element.atomic			; CHECK-NOT: llvm.memcpy.element.atomic

	; CHECK-DAG: %memcpy_unfold.src_casted = bitcast i8* %Src to i32*			; CHECK-DAG: %memcpy_unfold.src_casted = bitcast i8* %Src to i32*
	[83 lines not shown]

test/Verifier/element-wise-atomic-memory-intrinsics.ll

	; RUN: not opt -verify < %s 2>&1 | FileCheck %s			; RUN: not opt -verify < %s 2>&1 | FileCheck %s

	define void @test_memcpy(i8* %P, i8* %Q) {			define void @test_memcpy(i8* %P, i8* %Q, i32 %A, i8 %E, i1 %V) {
				; CHECK: alignment argument of element-wise unordered atomic memory intrinsics must be a constant int
				call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 %A, i1 0, i1 1, i1 1, i8 1)

				; CHECK: alignment argument of element-wise unordered atomic memory intrinsics must be a power of 2
				call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 5, i1 0, i1 1, i1 1, i8 1)

				; CHECK: dest_unordered of the element-wise unordered atomic memory intrinsic must be a constant int
				call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 %V, i1 1, i8 1)

				; CHECK: src_unordered of the element-wise unordered atomic memory intrinsic must be a constant int
				call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 1, i1 %V, i8 1)

				; CHECK: dest_unordered and src_unordered cannot both be zero on the element-wise unordered atomic memory intrinsic
				call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 0, i1 0, i8 1)

				; CHECK: element size of the element-wise unordered atomic memory intrinsic must be a constant int
				call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 1, i1 1, i8 %E)
	; CHECK: element size of the element-wise atomic memory intrinsic must be a power of 2			; CHECK: element size of the element-wise atomic memory intrinsic must be a power of 2
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 2 %Q, i64 4, i32 3)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 1, i1 1, i8 3)

	; CHECK: incorrect alignment of the destination argument			; CHECK: incorrect alignment of the destination argument
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 4 %Q, i64 4, i32 4)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* %P, i8* align 4 %Q, i32 1, i32 4, i1 0, i1 1, i1 1, i8 1)
				; CHECK: incorrect alignment of the destination argument
				call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 1 %P, i8* align 4 %Q, i32 4, i32 4, i1 0, i1 1, i1 1, i8 4)

	; CHECK: incorrect alignment of the source argument			; CHECK: incorrect alignment of the source argument
	call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 2 %Q, i64 4, i32 4)			call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* %Q, i32 1, i32 4, i1 0, i1 1, i1 1, i8 1)
				; CHECK: incorrect alignment of the source argument
				call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 1 %Q, i32 4, i32 4, i1 0, i1 1, i1 1, i8 4)


	ret void			ret void
	}			}
	declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind			declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1, i1, i1, i8) nounwind

	; CHECK: input module is broken!			; CHECK: input module is broken!
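
The constraints this test exercises can be summarized as a small standalone check. This is only a sketch under stated assumptions: `AtomicMemCpyArgs` and `checkAtomicMemCpyArgs` are hypothetical names, the messages are shortened, the "must be a constant int" checks are not modeled, and the pointer-alignment rule (a power of two that is at least the element size) is inferred from the failing cases above rather than taken from Verifier.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical plain-data view of the constant arguments of one call.
struct AtomicMemCpyArgs {
  uint64_t DestPtrAlign; // `align` attribute on the dest pointer, 0 if absent
  uint64_t SrcPtrAlign;  // `align` attribute on the src pointer, 0 if absent
  uint64_t Alignment;    // alignment argument
  bool DestUnordered;    // dest_unordered flag
  bool SrcUnordered;     // src_unordered flag
  uint64_t ElementSize;  // element size argument
};

static bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Returns an empty string when all modeled constraints hold, otherwise a
// short description of the first violated one.
std::string checkAtomicMemCpyArgs(const AtomicMemCpyArgs &A) {
  if (!isPowerOf2(A.Alignment))
    return "alignment argument must be a power of 2";
  if (!A.DestUnordered && !A.SrcUnordered)
    return "dest_unordered and src_unordered cannot both be zero";
  if (!isPowerOf2(A.ElementSize))
    return "element size must be a power of 2";
  // Assumed rule, inferred from the test cases: the pointer's align
  // attribute must be a power of two and at least the element size.
  if (!isPowerOf2(A.DestPtrAlign) || A.DestPtrAlign < A.ElementSize)
    return "incorrect alignment of the destination argument";
  if (!isPowerOf2(A.SrcPtrAlign) || A.SrcPtrAlign < A.ElementSize)
    return "incorrect alignment of the source argument";
  return std::string();
}
```

Under these assumptions, the call with `i32 5` as the alignment argument fails the first check, and the call with `i8 3` as the element size fails the third.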

Diff 99202
