This is an archive of the discontinued LLVM Phabricator instance.

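Paths

Table of Contents

llvm/trunk/docs/LangRef.rst
llvm/trunk/include/llvm/Analysis/LoopInfo.h
llvm/trunk/include/llvm/Analysis/LoopInfoImpl.h
llvm/trunk/include/llvm/Analysis/VectorUtils.h
llvm/trunk/include/llvm/IR/LLVMContext.h
llvm/trunk/include/llvm/Transforms/Utils/LoopUtils.h
llvm/trunk/lib/Analysis/LoopInfo.cpp
llvm/trunk/lib/Analysis/VectorUtils.cpp
llvm/trunk/lib/IR/LLVMContext.cpp
llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
llvm/trunk/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
llvm/trunk/lib/Transforms/InstCombine/InstCombinePHI.cpp
llvm/trunk/lib/Transforms/Scalar/GVNHoist.cpp
llvm/trunk/lib/Transforms/Scalar/LoopVersioningLICM.cpp
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
llvm/trunk/lib/Transforms/Scalar/SROA.cpp
llvm/trunk/lib/Transforms/Scalar/Scalarizer.cpp
llvm/trunk/lib/Transforms/Utils/InlineFunction.cpp
llvm/trunk/lib/Transforms/Utils/Local.cpp
llvm/trunk/lib/Transforms/Utils/LoopUtils.cpp
llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp
llvm/trunk/test/Analysis/LoopInfo/annotated-parallel-complex.ll
llvm/trunk/test/Analysis/LoopInfo/annotated-parallel-simple.ll
llvm/trunk/test/ThinLTO/X86/lazyload_metadata.ll
llvm/trunk/test/Transforms/Inline/parallel-loop-md-callee.ll
llvm/trunk/test/Transforms/Inline/parallel-loop-md-merge.ll
llvm/trunk/test/Transforms/Inline/parallel-loop-md.ll
llvm/trunk/test/Transforms/InstCombine/intersect-accessgroup.ll
llvm/trunk/test/Transforms/InstCombine/loadstore-metadata.ll
llvm/trunk/test/Transforms/InstCombine/mem-par-metadata-memcpy.ll
llvm/trunk/test/Transforms/LoopVectorize/X86/force-ifcvt.ll
llvm/trunk/test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll
llvm/trunk/test/Transforms/LoopVectorize/X86/parallel-loops.ll
llvm/trunk/test/Transforms/LoopVectorize/X86/pr34438.ll
llvm/trunk/test/Transforms/LoopVectorize/X86/vect.omp.force.ll
llvm/trunk/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll
llvm/trunk/test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll
llvm/trunk/test/Transforms/SROA/mem-par-metadata-sroa.ll
llvm/trunk/test/Transforms/Scalarizer/basic.ll
llvm/trunk/test/Transforms/SimplifyCFG/combine-parallel-mem-md.ll
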
Differential D52116

Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
Closed, Public

Authored by Meinersbur on Sep 14 2018, 12:04 PM.

Details

Reviewers

hfinkel
pekka
paul.redmond
reames
hsaito
pekka.jaaskelainen
jdoerfert

Commits

rG978ba61536c2: Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
rL349725: Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.

Summary

The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. A LoopID, unfortunately, is not a loop identifier. It is neither unique (there is even a regression test that assigns the same LoopID to multiple loops; this can also happen when passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is added to or removed from a LoopID's MDNode, the loop also receives a new LoopID; this happens, e.g., when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even if only to mark that a loop should not be processed again, as llvm.loop.isvectorized does for both the versioned and the unversioned loop), the parallel access information is lost for any subsequent pass.
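
The persistence problem can be illustrated with a small toy model. This is only a sketch, not LLVM's implementation: `Node`, `withAttribute`, `Access` and `isParallelOldStyle` are hypothetical names, and node identity is modeled by pointer identity. Because nodes are immutable, adding an attribute produces a new node, and accesses that still reference the old node no longer match.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for an immutable metadata node; identity is the object address.
struct Node {
  std::vector<std::string> Attrs;
};
using NodeRef = std::shared_ptr<const Node>;

// "Modifying" an immutable node means building a new node with a new identity.
NodeRef withAttribute(const NodeRef &Old, const std::string &Attr) {
  auto Copy = std::make_shared<Node>(*Old);
  Copy->Attrs.push_back(Attr);
  return Copy;
}

// Toy memory access annotated in the old style: it names the loop it is
// parallel to by pointing at that loop's ID node.
struct Access {
  NodeRef ParallelToLoop;
};

// Old-style check: every access must reference the loop's *current* ID node.
bool isParallelOldStyle(const NodeRef &LoopID,
                        const std::vector<Access> &Body) {
  for (const Access &A : Body)
    if (A.ParallelToLoop != LoopID)
      return false;
  return true;
}
```

In this model, a loop whose accesses all point at its LoopID is parallel, but as soon as `withAttribute` is applied to the LoopID (standing in for a pass adding a marker such as llvm.loop.isvectorized), the check against the new node fails even though the accesses themselves did not change.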

This patch unlinks LoopIDs and parallel accesses. The llvm.mem.parallel_loop_access metadata on instructions is replaced by llvm.access.group metadata. llvm.access.group points to a distinct MDNode with no operands (so operands never need to be added or removed), called an "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses listing all the access groups that are parallel (no dependencies carried by this loop).

This intentionally avoids any kind of "ID". Loops that are cloned or have their attributes modified retain the llvm.loop.parallel_accesses attribute. Access instructions that are cloned point to the same access group. It is not necessary for each access to have its own "ID" MDNode; memory access instructions with the same behavior can be grouped together.
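
As a rough illustration of the new scheme, the following toy model shows the direction of the references being reversed: the loop metadata lists access groups, and accesses point at groups rather than at the loop. All names (`AccessGroup`, `LoopMD`, `Access`, `loopIsParallel`) are hypothetical and only model the idea; for simplicity each access carries at most one group here.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Toy access group: a distinct node with no content; only identity matters.
struct AccessGroup {};
using GroupRef = std::shared_ptr<const AccessGroup>;

// Toy loop metadata: arbitrary attributes plus the parallel access groups.
struct LoopMD {
  std::vector<std::string> Attrs;
  std::set<GroupRef> ParallelAccesses;
};

// Toy memory access, optionally tagged with one access group.
struct Access {
  GroupRef Group; // null if the access carries no access group
};

// The loop is parallel if every access belongs to a group listed by the loop.
bool loopIsParallel(const LoopMD &L, const std::vector<Access> &Body) {
  for (const Access &A : Body)
    if (!A.Group || L.ParallelAccesses.count(A.Group) == 0)
      return false;
  return true;
}
```

Copying a `LoopMD` and appending an attribute to the copy keeps `ParallelAccesses` intact, so the cloned or modified loop is still recognized as parallel, while an access without a listed group makes the check fail conservatively.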

The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but should be considered deprecated.

Possible extensions/follow-up patches:

AutoUpgrade llvm.mem.parallel_loop_access to llvm.access.group such that we can remove its handling in the passes.

Diff Detail

Repository: rL LLVM

Event Timeline

Meinersbur created this revision. Sep 14 2018, 12:04 PM

Herald added subscribers: dexonsmith, steven_wu, dmgreen and 2 others. Sep 14 2018, 12:04 PM

Meinersbur mentioned this in D52117: Generate llvm.loop.parallel_accesses instead of llvm.mem.parallel_loop_access metadata. Sep 14 2018, 12:04 PM

Meinersbur added a child revision: D52117: Generate llvm.loop.parallel_accesses instead of llvm.mem.parallel_loop_access metadata.

Meinersbur mentioned this in D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. Sep 28 2018, 4:21 AM

Thanks for making this MD more robust! It is essential for the vectorization performance of pocl. Sorry for my slow response time.

Related to OpenCL, there's usually a 3D work-item loop, all levels of which are parallel. I didn't spot a test that shows multiple loop levels and thus multiple parallel access groups attached to an instruction. Moreover, the inliner MD transfer would indeed be much improved if it replicated the parallel access info from all loop hierarchy levels downwards. The comment mentioning that you focus on the inner loop is true, but if there is (or will be) a loop interchange optimization pass that utilizes the parallel loop info, then any loop level might potentially be moved to become the inner loop.

This is again very useful for OpenCL work-item loops as it allows optimizing memory access patterns via loop interchange, thus essentially performing outer loop vectorization when it's the best way to get performance.

This revision is now accepted and ready to land. Oct 3 2018, 12:47 AM

Thank you for your feedback. If you don't mind, before I commit this, I will prepare a patch for clang that generates this new kind of metadata (since at the moment nothing generates it) and give others some time for feedback.

Dropping one of the llvm.access.group annotations instead of merging them is a trade-off I had to make. It only affects inlining situations where a loop contains a call to a function that itself contains a loop, with both loops being marked as parallel. I can suggest a patch later.

I actually already created a review for the Clang part which is D52117, but uploaded the wrong diff. Corrected. I'll wait for both being accepted before committing.

Rebase

Harbormaster completed remote builds in B23448: Diff 168252. Oct 4 2018, 3:12 AM

dexonsmith removed a subscriber: dexonsmith. Oct 4 2018, 4:15 PM

@hfinkel ping

In D52116#1283910, @Meinersbur wrote:

@hfinkel ping

I'm basically happy with this, but we shouldn't add things to the LangRef without an RFC. I don't recall seeing one.

lib/Transforms/Scalar/LoopVersioningLICM.cpp
631 ↗	(On Diff #168252)	I don't understand what this FIXME is saying. What needs to be fixed?
lib/Transforms/Utils/InlineFunction.cpp
808 ↗	(On Diff #168252)	Is the problem with "updating all uses of one of the access groups" that we don't have a way to efficiently enumerate them? Would we need to scan the functions for branches and collect all of the loop-id metadata that's relevant first? It would be nice not to lose this information.

I am going to prepare an RFC.

lib/Transforms/Scalar/LoopVersioningLICM.cpp
631 ↗	(On Diff #168252)	The line below adds `llvm.mem.parallel_loop_access` to a LoopID, but that metadata is expected as an annotation on instructions that access memory (with the loop it is parallel to as its parameter). There is no code that looks for `llvm.mem.parallel_loop_access` in LoopID metadata. Hence, adding the property has no effect.
lib/Transforms/Utils/InlineFunction.cpp
808 ↗	(On Diff #168252)	Either search (and update) all LoopIDs that reference the access group or create a new 'meta-access-group' as outlined in the FIXME. Of course it would be nice to not lose information, but as for any analyses there is a trade-off between accuracy and computational complexity. E.g. it would be nice if alias-analysis would be control-flow-sensitive. At the moment the only passes making use of `Loop::isAnnotatedParallel()` are LoopVectorize, LoopVersioningLICM, LoopDistribute and LoopLoadElimination, all of which only process innermost loops. That is, there would be no effect of keeping more information. There's also the FIXME still in the code such that the issue is not forgotten. Do you still want me to implement one of the solutions?

hfinkel added inline comments. Dec 4 2018, 8:03 AM

lib/Transforms/Scalar/LoopVersioningLICM.cpp
631 ↗	(On Diff #168252)	OIC, okay. Thanks.
lib/Transforms/Utils/InlineFunction.cpp
808 ↗	(On Diff #168252)	"Of course it would be nice to not lose information, but as for any analyses there is a trade-off between accuracy and computational complexity."
Clearly I know this ;)
"At the moment the only passes making use of Loop::isAnnotatedParallel() are LoopVectorize, LoopVersioningLICM, LoopDistribute and LoopLoadElimination, all of which only process innermost loops. That is, there would be no effect of keeping more information."
Two things: First, this might be true today, but very soon won't be (vectorization will soon handle outer loops, and we are developing other loop-nest transformations), and if we do the right thing now, we won't need to go back and fix this later. Second, while it's true that full loop unrolling generally happens before inlining, we could have a loop that, at the point of inlining, is an inner loop, but is later fully unrolled such that before vectorization (etc.) the loop here might become, once again, the inner loop.
"Do you still want me to implement one of the solutions?"
Yes.

Allow multiple access groups per instruction, i.e. an instruction can be in multiple access groups. This allows a simple 'union' operation when inlining into another function. A memory access is considered parallel when at least one of its access groups is listed in llvm.loop.parallel_accesses. This is prioritized over the 'intersect' case for combining instructions, which would be the dual. We only do best-effort here.
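
The union and intersect operations described in this update can be sketched with plain sets. This is a toy model under stated assumptions, not the patch's code: access groups are represented by integers, and `uniteGroups`, `intersectGroups` and `isParallelAccess` are hypothetical helper names. The reading that an inlined access takes the union of its own groups and those of the call site is an interpretation of the comment above.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

using GroupId = int; // toy stand-in for a distinct access-group node
using GroupSet = std::set<GroupId>;

// Inlining (assumed reading): the inlined access keeps its own groups and
// additionally joins the groups attached to the call site.
GroupSet uniteGroups(const GroupSet &A, const GroupSet &B) {
  GroupSet R = A;
  R.insert(B.begin(), B.end());
  return R;
}

// Combining two accesses into one: only groups common to both remain valid.
GroupSet intersectGroups(const GroupSet &A, const GroupSet &B) {
  GroupSet R;
  std::set_intersection(A.begin(), A.end(), B.begin(), B.end(),
                        std::inserter(R, R.begin()));
  return R;
}

// An access is parallel for a loop if at least one of its groups is listed
// in that loop's parallel-access set.
bool isParallelAccess(const GroupSet &AccessGroups,
                      const GroupSet &LoopParallel) {
  return std::any_of(AccessGroups.begin(), AccessGroups.end(),
                     [&](GroupId G) { return LoopParallel.count(G) != 0; });
}
```

With the "at least one group matches" rule, the union keeps an inlined access parallel for both the callee's loop and the caller's loop, whereas the intersection may come out empty and thereby drop the parallel property, which is the conservative direction.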

Harbormaster completed remote builds in B25850: Diff 177322. Dec 7 2018, 2:36 PM

Meinersbur edited the summary of this revision. Dec 7 2018, 2:42 PM

Meinersbur added a reviewer: jdoerfert.

Rebase to trunk

Harbormaster completed remote builds in B25995: Diff 178092. Dec 13 2018, 10:25 AM

Reinsert parts of LangRef.rst that went missing after merge.

lib/Transforms/Utils/LoopUtils.cpp
190–219 ↗	(On Diff #178092)	This has moved to LoopInfo.cpp. LoopUtils.cpp belongs to libTransform, but not all users (such as `Loop::isAnnotatedParallel()`) have a dependency on libTransform.

Harbormaster completed remote builds in B25996: Diff 178093. Dec 13 2018, 10:36 AM

hfinkel added inline comments. Dec 18 2018, 5:32 PM

docs/LangRef.rst
5472 ↗	(On Diff #178093)	update -> updating
5494 ↗	(On Diff #178093)	of -> if
include/llvm/Analysis/VectorUtils.h
120 ↗	(On Diff #178093)	access group lists -> access-group lists
126 ↗	(On Diff #178093)	access group list -> access-group list
lib/Analysis/LoopInfo.cpp
328 ↗	(On Diff #178093)	Repeating this set of asserts seems unfortunate. Also, they have no comment. Maybe make a function?
assert(AccGroup->isDistinct() && "Access group metadata nodes must be distinct");
assert(AccGroup->getNumOperands() == 0 && "Access group metadata nodes must have zero operands");
You also repeat these asserts in lib/Analysis/VectorUtils.cpp below.
lib/Analysis/VectorUtils.cpp
474 ↗	(On Diff #178093)	Indentation here is odd.
477 ↗	(On Diff #178093)	Indentation here looks odd too.

Address @hfinkel's review
clang-format

Harbormaster completed remote builds in B26151: Diff 178943. Dec 19 2018, 12:29 PM

Aside from the requested renaming (see below), this LGTM.

include/llvm/Analysis/LoopInfo.h
1016 ↗	(On Diff #178943)	I'd prefer to name this isValidAsAccessGroup() (because the function does not actually determine whether the MD node is an access group, but rather, whether it might be valid to use it as one).

Rename: isAccessGroup -> isValidAsAccessGroup

Harbormaster completed remote builds in B26161: Diff 178979. Dec 19 2018, 3:32 PM

Closed by commit rL349725: Introduce llvm.loop.parallel_accesses and llvm.access.group metadata. (authored by Meinersbur). Dec 19 2018, 9:01 PM

This revision was automatically updated to reflect the committed changes.

Meinersbur marked an inline comment as done.

Meinersbur mentioned this in rC349823: [CodeGen] Generate llvm.loop.parallel_accesses instead of llvm.mem.. Dec 20 2018, 1:28 PM

Meinersbur mentioned this in rL349823: [CodeGen] Generate llvm.loop.parallel_accesses instead of llvm.mem..

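Revision Contents

Path	Size
llvm/trunk/docs/LangRef.rst	130 lines
llvm/trunk/include/llvm/Analysis/LoopInfo.h	26 lines
llvm/trunk/include/llvm/Analysis/LoopInfoImpl.h	5 lines
llvm/trunk/include/llvm/Analysis/VectorUtils.h	20 lines
llvm/trunk/include/llvm/IR/LLVMContext.h	1 line
llvm/trunk/include/llvm/Transforms/Utils/LoopUtils.h	2 lines
llvm/trunk/lib/Analysis/LoopInfo.cpp	72 lines
llvm/trunk/lib/Analysis/VectorUtils.cpp	95 lines
llvm/trunk/lib/IR/LLVMContext.cpp	1 line
llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp	5 lines
llvm/trunk/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp	1 line
llvm/trunk/lib/Transforms/InstCombine/InstCombinePHI.cpp	1 line
llvm/trunk/lib/Transforms/Scalar/GVNHoist.cpp	2 lines
llvm/trunk/lib/Transforms/Scalar/LoopVersioningLICM.cpp	2 lines
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp	3 lines
llvm/trunk/lib/Transforms/Scalar/SROA.cpp	12 lines
llvm/trunk/lib/Transforms/Scalar/Scalarizer.cpp	3 lines
llvm/trunk/lib/Transforms/Utils/InlineFunction.cpp	26 lines
llvm/trunk/lib/Transforms/Utils/Local.cpp	11 lines
llvm/trunk/lib/Transforms/Utils/LoopUtils.cpp	49 lines
llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp	3 lines
llvm/trunk/test/Analysis/LoopInfo/annotated-parallel-complex.ll	91 lines
llvm/trunk/test/Analysis/LoopInfo/annotated-parallel-simple.ll	37 lines
llvm/trunk/test/ThinLTO/X86/lazyload_metadata.ll	4 lines
llvm/trunk/test/Transforms/Inline/parallel-loop-md-callee.ll	56 lines
llvm/trunk/test/Transforms/Inline/parallel-loop-md-merge.ll	78 lines
llvm/trunk/test/Transforms/Inline/parallel-loop-md.ll	18 lines
llvm/trunk/test/Transforms/InstCombine/intersect-accessgroup.ll	113 lines
llvm/trunk/test/Transforms/InstCombine/loadstore-metadata.ll	17 lines
llvm/trunk/test/Transforms/InstCombine/mem-par-metadata-memcpy.ll	11 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/force-ifcvt.ll	11 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll	13 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/parallel-loops.ll	34 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/pr34438.ll	9 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/vect.omp.force.ll	14 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll	46 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll	9 lines
llvm/trunk/test/Transforms/SROA/mem-par-metadata-sroa.ll	33 lines
llvm/trunk/test/Transforms/Scalarizer/basic.ll	25 lines
llvm/trunk/test/Transforms/SimplifyCFG/combine-parallel-mem-md.ll	21 lines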

Diff 179014

llvm/trunk/docs/LangRef.rst

[... 5,134 lines not shown ...]
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Metadata prefixed with ``llvm.loop.vectorize`` or ``llvm.loop.interleave`` are
 used to control per-loop vectorization and interleaving parameters such as
 vectorization width and interleave count. These metadata should be used in
 conjunction with ``llvm.loop`` loop identification metadata. The
 ``llvm.loop.vectorize`` and ``llvm.loop.interleave`` metadata are only
 optimization hints and the optimizer will only interleave and vectorize loops if
-it believes it is safe to do so. The ``llvm.mem.parallel_loop_access`` metadata
+it believes it is safe to do so. The ``llvm.loop.parallel_accesses`` metadata
 which contains information about loop-carried memory dependencies can be helpful
 in determining the safety of these transformations.

 '``llvm.loop.interleave.count``' Metadata
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 This metadata suggests an interleave count to the loop interleaver.
 The first operand is the string ``llvm.loop.interleave.count`` and the
[... 286 lines not shown ...]

 '``llvm.loop.distribute.followup_all``' Metadata
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Thes attributes in this metdata is added to all followup loops of the
 loop distribution pass. See
 :ref:`Transformation Metadata <transformation-metadata>` for details.

-'``llvm.mem``'
-^^^^^^^^^^^^^^^
-
-Metadata types used to annotate memory accesses with information helpful
-for optimizations are prefixed with ``llvm.mem``.
-
-'``llvm.mem.parallel_loop_access``' Metadata
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The ``llvm.mem.parallel_loop_access`` metadata refers to a loop identifier,
-or metadata containing a list of loop identifiers for nested loops.
-The metadata is attached to memory accessing instructions and denotes that
-no loop carried memory dependence exist between it and other instructions denoted
-with the same loop identifier. The metadata on memory reads also implies that
-if conversion (i.e. speculative execution within a loop iteration) is safe.
-
-Precisely, given two instructions ``m1`` and ``m2`` that both have the
-``llvm.mem.parallel_loop_access`` metadata, with ``L1`` and ``L2`` being the
-set of loops associated with that metadata, respectively, then there is no loop
-carried dependence between ``m1`` and ``m2`` for loops in both ``L1`` and
-``L2``.
-
-As a special case, if all memory accessing instructions in a loop have
-``llvm.mem.parallel_loop_access`` metadata that refers to that loop, then the
-loop has no loop carried memory dependences and is considered to be a parallel
-loop.
-
-Note that if not all memory access instructions have such metadata referring to
-the loop, then the loop is considered not being trivially parallel. Additional
+'``llvm.access.group``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``llvm.access.group`` metadata can be attached to any instruction that
+potentially accesses memory. It can point to a single distinct metadata
+node, which we call access group. This node represents all memory access
+instructions referring to it via ``llvm.access.group``. When an
+instruction belongs to multiple access groups, it can also point to a
+list of accesses groups, illustrated by the following example.
+
+.. code-block:: llvm
+
+   %val = load i32, i32* %arrayidx, !llvm.access.group !0
+   ...
+   !0 = !{!1, !2}
+   !1 = distinct !{}
+   !2 = distinct !{}
+
+It is illegal for the list node to be empty since it might be confused
+with an access group.
+
+The access group metadata node must be 'distinct' to avoid collapsing
+multiple access groups by content. A access group metadata node must
+always be empty which can be used to distinguish an access group
+metadata node from a list of access groups. Being empty avoids the
+situation that the content must be updated which, because metadata is
+immutable by design, would required finding and updating all references
+to the access group node.
+
+The access group can be used to refer to a memory access instruction
+without pointing to it directly (which is not possible in global
+metadata). Currently, the only metadata making use of it is
+``llvm.loop.parallel_accesses``.
+
+'``llvm.loop.parallel_accesses``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``llvm.loop.parallel_accesses`` metadata refers to one or more
+access group metadata nodes (see ``llvm.access.group``). It denotes that
+no loop-carried memory dependence exist between it and other instructions
+in the loop with this metadata.
+
+Let ``m1`` and ``m2`` be two instructions that both have the
+``llvm.access.group`` metadata to the access group ``g1``, respectively
+``g2`` (which might be identical). If a loop contains both access groups
+in its ``llvm.loop.parallel_accesses`` metadata, then the compiler can
+assume that there is no dependency between ``m1`` and ``m2`` carried by
+this loop. Instructions that belong to multiple access groups are
+considered having this property if at least one of the access groups
+matches the ``llvm.loop.parallel_accesses`` list.
+
+If all memory-accessing instructions in a loop have
+``llvm.loop.parallel_accesses`` metadata that refers to that loop, then the
+loop has no loop carried memory dependences and is considered to be a
+parallel loop.
+
+Note that if not all memory access instructions belong to an access
+group referred to by ``llvm.loop.parallel_accesses``, then the loop must
+not be considered trivially parallel. Additional
 memory dependence analysis is required to make that determination. As a fail
 safe mechanism, this causes loops that were originally parallel to be considered
 sequential (if optimization passes that are unaware of the parallel semantics
 insert new memory instructions into the loop body).

 Example of a loop that is considered parallel due to its correct use of
-both ``llvm.loop`` and ``llvm.mem.parallel_loop_access``
-metadata types that refer to the same loop identifier metadata.
+both ``llvm.access.group`` and ``llvm.loop.parallel_accesses``
+metadata types.

 .. code-block:: llvm

    for.body:
      ...
-     %val0 = load i32, i32* %arrayidx, !llvm.mem.parallel_loop_access !0
+     %val0 = load i32, i32* %arrayidx, !llvm.access.group !1
      ...
-     store i32 %val0, i32* %arrayidx1, !llvm.mem.parallel_loop_access !0
+     store i32 %val0, i32* %arrayidx1, !llvm.access.group !1
      ...
      br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

    for.end:
    ...
-   !0 = !{!0}
+   !0 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !1}}
+   !1 = distinct !{}

-It is also possible to have nested parallel loops. In that case the
-memory accesses refer to a list of loop identifier metadata nodes instead of
-the loop identifier metadata node directly:
+It is also possible to have nested parallel loops:

 .. code-block:: llvm

    outer.for.body:
      ...
-     %val1 = load i32, i32* %arrayidx3, !llvm.mem.parallel_loop_access !2
+     %val1 = load i32, i32* %arrayidx3, !llvm.access.group !4
      ...
      br label %inner.for.body

    inner.for.body:
      ...
-     %val0 = load i32, i32* %arrayidx1, !llvm.mem.parallel_loop_access !0
+     %val0 = load i32, i32* %arrayidx1, !llvm.access.group !3
      ...
-     store i32 %val0, i32* %arrayidx2, !llvm.mem.parallel_loop_access !0
+     store i32 %val0, i32* %arrayidx2, !llvm.access.group !3
      ...
      br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop !1

    inner.for.end:
      ...
-     store i32 %val1, i32* %arrayidx4, !llvm.mem.parallel_loop_access !2
+     store i32 %val1, i32* %arrayidx4, !llvm.access.group !4
      ...
      br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2

    outer.for.end: ; preds = %for.body
    ...
-   !0 = !{!1, !2} ; a list of loop identifiers
-   !1 = !{!1} ; an identifier for the inner loop
-   !2 = !{!2} ; an identifier for the outer loop
+   !1 = distinct !{!1, !{!"llvm.loop.parallel_accesses", !3}} ; metadata for the inner loop
+   !2 = distinct !{!2, !{!"llvm.loop.parallel_accesses", !3, !4}} ; metadata for the outer loop
+   !3 = distinct !{} ; access group for instructions in the inner loop (which are implicitly contained in outer loop as well)
+   !4 = distinct !{} ; access group for instructions in the outer, but not the inner loop

 '``irr_loop``' Metadata
 ^^^^^^^^^^^^^^^^^^^^^^^

 ``irr_loop`` metadata may be attached to the terminator instruction of a basic
 block that's an irreducible loop header (note that an irreducible loop has more
 than once header basic blocks.) If ``irr_loop`` metadata is attached to the
 terminator instruction of a basic block that is not really an irreducible loop
[... 1,399 lines not shown ...]

 The '``fneg``' instruction returns the negation of its operand.

 Arguments:
 """"""""""

 The argument to the '``fneg``' instruction must be a
 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
 floating-point values.

 Semantics:
 """"""""""

 The value produced is a copy of the operand with its sign bit flipped.
 This instruction can also take any number of :ref:`fast-math
 flags <fastmath>`, which are optimization hints to enable otherwise
 unsafe floating-point optimizations:
[... 7,999 lines not shown ...] ::
       declare <type>
       @llvm.experimental.constrained.maxnum(<type> <op1>, <type> <op2>
                                             metadata <rounding mode>,
                                             metadata <exception behavior>)

 Overview:
 """""""""

 The '``llvm.experimental.constrained.maxnum``' intrinsic returns the maximum
 of the two arguments.

 Arguments:
 """"""""""

 The first two arguments and the return value are floating-point numbers
 of the same type.

 The third and forth arguments specify the rounding mode and exception
 behavior as described above.

 Semantics:
 """"""""""

[... 51 lines not shown ...] ::
       declare <type>
       @llvm.experimental.constrained.ceil(<type> <op1>,
                                           metadata <rounding mode>,
                                           metadata <exception behavior>)

 Overview:
 """""""""

 The '``llvm.experimental.constrained.ceil``' intrinsic returns the ceiling of the
 first operand.

 Arguments:
 """"""""""

 The first argument and the return value are floating-point numbers of the same
 type.

[... 19 lines not shown ...] ::
       declare <type>
       @llvm.experimental.constrained.floor(<type> <op1>,
                                            metadata <rounding mode>,
                                            metadata <exception behavior>)

 Overview:
 """""""""

 The '``llvm.experimental.constrained.floor``' intrinsic returns the floor of the
 first operand.

 Arguments:
 """"""""""

 The first argument and the return value are floating-point numbers of the same
 type.

 The second and third arguments specify the rounding mode and exception
 behavior as described above. The rounding mode is currently unused for this
 intrinsic.

 Semantics:
 """"""""""

 This function returns the same values as the libm ``floor`` functions
 would and handles error conditions in the same way.


 '``llvm.experimental.constrained.round``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Syntax:
 """""""

 ::

       declare <type>
       @llvm.experimental.constrained.round(<type> <op1>,
                                            metadata <rounding mode>,
                                            metadata <exception behavior>)

 Overview:
 """""""""

 The '``llvm.experimental.constrained.round``' intrinsic returns the first
 operand rounded to the nearest integer.

 Arguments:
 """"""""""

 The first argument and the return value are floating-point numbers of the same
 type.

[... 19 lines not shown ...] ::
       declare <type>
       @llvm.experimental.constrained.trunc(<type> <op1>,
                                            metadata <truncing mode>,
                                            metadata <exception behavior>)

 Overview:
 """""""""

 The '``llvm.experimental.constrained.trunc``' intrinsic returns the first
 operand rounded to the nearest integer not larger in magnitude than the
 operand.

 Arguments:
 """"""""""

 The first argument and the return value are floating-point numbers of the same
 type.

[... 1,382 lines not shown ...]

llvm/trunk/include/llvm/Analysis/LoopInfo.h

Show First 20 Lines • Show All 402 Lines • ▼ Show 20 Lines	#endif
}		}

/// Verify loop structure		/// Verify loop structure
void verifyLoop() const;		void verifyLoop() const;

/// Verify loop structure of this loop and all nested loops.		/// Verify loop structure of this loop and all nested loops.
void verifyLoopNest(DenseSet<const LoopT > Loops) const;		void verifyLoopNest(DenseSet<const LoopT > Loops) const;

		/// Returns true if the loop is annotated parallel.
		///
		/// Derived classes can override this method using static template
		/// polymorphism.
		bool isAnnotatedParallel() const { return false; }

/// Print loop with all the BBs inside it.		/// Print loop with all the BBs inside it.
void print(raw_ostream &OS, unsigned Depth = 0, bool Verbose = false) const;		void print(raw_ostream &OS, unsigned Depth = 0, bool Verbose = false) const;

protected:		protected:
friend class LoopInfoBase<BlockT, LoopT>;		friend class LoopInfoBase<BlockT, LoopT>;

/// This creates an empty loop.		/// This creates an empty loop.
LoopBase() : ParentLoop(nullptr) {}		LoopBase() : ParentLoop(nullptr) {}
▲ Show 20 Lines • Show All 565 Lines • ▼ Show 20 Lines	public:
void print(raw_ostream &O, const Module *M = nullptr) const override;		void print(raw_ostream &O, const Module *M = nullptr) const override;

void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;
};		};

/// Function to print a loop's contents as LLVM's text IR assembly.		/// Function to print a loop's contents as LLVM's text IR assembly.
void printLoop(Loop &L, raw_ostream &OS, const std::string &Banner = "");		void printLoop(Loop &L, raw_ostream &OS, const std::string &Banner = "");

		/// Find and return the loop attribute node for the attribute @p Name in
		/// @p LoopID. Return nullptr if there is no such attribute.
		MDNode *findOptionMDForLoopID(MDNode *LoopID, StringRef Name);

		/// Find string metadata for a loop.
		///
		/// Returns the MDNode where the first operand is the metadata's name. The
		/// following operands are the metadata's values. If no metadata with @p Name is
		/// found, return nullptr.
		MDNode *findOptionMDForLoop(const Loop *TheLoop, StringRef Name);

		/// Return whether an MDNode might represent an access group.
		///
		/// Access group metadata nodes have to be distinct and empty. Being
		/// always-empty ensures that it never needs to be changed (which -- because
		/// MDNodes are designed immutable -- would require creating a new MDNode). Note
		/// that this is not a sufficient condition: not every distinct and empty MDNode
		/// represents an access group.
		bool isValidAsAccessGroup(MDNode *AccGroup);

} // End llvm namespace		} // End llvm namespace

#endif		#endif
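
The lookup contract documented for findOptionMDForLoopID above (return the property node whose first operand is the requested name, or nullptr) can be illustrated without any LLVM headers. The following is only a rough model: `LoopProperty`, `LoopIDModel` and `findOption` are hypothetical names introduced for this sketch, and the self-referencing first operand of a real LoopID is left out.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a loop property node such as
// !{!"llvm.loop.parallel_accesses", !1, !2}: a name followed by values.
struct LoopProperty {
  std::string Name;
  std::vector<int> Values; // stand-ins for the remaining operands
};

// Hypothetical stand-in for a LoopID node. The self-reference that a real
// LoopID carries in operand 0 is omitted from this model.
using LoopIDModel = std::vector<LoopProperty>;

// Return the first property called Name, or nullptr if the loop has no
// LoopID or no property with that name.
const LoopProperty *findOption(const LoopIDModel *LoopID,
                               const std::string &Name) {
  if (!LoopID)
    return nullptr; // No loop metadata node, no loop properties.
  for (const LoopProperty &P : *LoopID)
    if (P.Name == Name)
      return &P;
  return nullptr; // Loop property not found.
}
```

In this model the caller would read the access groups of a loop from `Values` of the returned property, which mirrors how the patch skips the name operand before iterating.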

llvm/trunk/include/llvm/Analysis/LoopInfoImpl.h

Show First 20 Lines • Show All 386 Lines • ▼ Show 20 Lines	void LoopBase<BlockT, LoopT>::verifyLoopNest(
// Verify the subloops.		// Verify the subloops.
for (iterator I = begin(), E = end(); I != E; ++I)		for (iterator I = begin(), E = end(); I != E; ++I)
(*I)->verifyLoopNest(Loops);		(*I)->verifyLoopNest(Loops);
}		}

template <class BlockT, class LoopT>		template <class BlockT, class LoopT>
void LoopBase<BlockT, LoopT>::print(raw_ostream &OS, unsigned Depth,		void LoopBase<BlockT, LoopT>::print(raw_ostream &OS, unsigned Depth,
bool Verbose) const {		bool Verbose) const {
OS.indent(Depth * 2) << "Loop at depth " << getLoopDepth() << " containing: ";		OS.indent(Depth * 2);
		if (static_cast<const LoopT *>(this)->isAnnotatedParallel())
		OS << "Parallel ";
		OS << "Loop at depth " << getLoopDepth() << " containing: ";

BlockT *H = getHeader();		BlockT *H = getHeader();
for (unsigned i = 0; i < getBlocks().size(); ++i) {		for (unsigned i = 0; i < getBlocks().size(); ++i) {
BlockT *BB = getBlocks()[i];		BlockT *BB = getBlocks()[i];
if (!Verbose) {		if (!Verbose) {
if (i)		if (i)
OS << ",";		OS << ",";
BB->printAsOperand(OS, false);		BB->printAsOperand(OS, false);
▲ Show 20 Lines • Show All 354 Lines • Show Last 20 Lines
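
The print() change above relies on static template polymorphism: the base class casts `this` to the derived loop type, so a derived isAnnotatedParallel() is picked up without a virtual call. A minimal standalone sketch of that pattern follows; `LoopBaseModel`, `PlainLoop`, `ParallelLoop` and `describe` are hypothetical names for illustration and are not part of the patch.

```cpp
#include <sstream>
#include <string>

// Hypothetical CRTP base, loosely modelled on LoopBase<BlockT, LoopT>.
template <class LoopT> struct LoopBaseModel {
  // Default used when the derived class does not provide its own version.
  bool isAnnotatedParallel() const { return false; }

  // Dispatch through the derived type, as the patched print() does.
  std::string describe() const {
    std::ostringstream OS;
    if (static_cast<const LoopT *>(this)->isAnnotatedParallel())
      OS << "Parallel ";
    OS << "Loop";
    return OS.str();
  }
};

// Keeps the base-class default.
struct PlainLoop : LoopBaseModel<PlainLoop> {};

// Hides the base-class method with its own implementation.
struct ParallelLoop : LoopBaseModel<ParallelLoop> {
  bool isAnnotatedParallel() const { return true; }
};
```

Calling `describe()` on a `ParallelLoop` takes the derived method because the cast happens before name lookup; calling it on a `PlainLoop` falls back to the base default.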

llvm/trunk/include/llvm/Analysis/VectorUtils.h

	///			///
	/// If the optional TargetTransformInfo is provided, this function tries harder			/// If the optional TargetTransformInfo is provided, this function tries harder
	/// to do less work by only looking at illegal types.			/// to do less work by only looking at illegal types.
	MapVector<Instruction*, uint64_t>			MapVector<Instruction*, uint64_t>
	computeMinimumValueSizes(ArrayRef<BasicBlock*> Blocks,			computeMinimumValueSizes(ArrayRef<BasicBlock*> Blocks,
	DemandedBits &DB,			DemandedBits &DB,
	const TargetTransformInfo *TTI=nullptr);			const TargetTransformInfo *TTI=nullptr);

				/// Compute the union of two access-group lists.
				///
				/// If the list contains just one access group, it is returned directly. If the
				/// list is empty, returns nullptr.
				MDNode *uniteAccessGroups(MDNode *AccGroups1, MDNode *AccGroups2);

				/// Compute the access-group list of access groups that @p Inst1 and @p Inst2
				/// are both in. If either instruction does not access memory at all, it is
				/// considered to be in every list.
				///
				/// If the list contains just one access group, it is returned directly. If the
				/// list is empty, returns nullptr.
				MDNode *intersectAccessGroups(const Instruction *Inst1,
				const Instruction *Inst2);

	/// Specifically, let Kinds = [MD_tbaa, MD_alias_scope, MD_noalias, MD_fpmath,			/// Specifically, let Kinds = [MD_tbaa, MD_alias_scope, MD_noalias, MD_fpmath,
	/// MD_nontemporal]. For K in Kinds, we get the MDNode for K from each of the			/// MD_nontemporal, MD_access_group].
				/// For K in Kinds, we get the MDNode for K from each of the
	/// elements of VL, compute their "intersection" (i.e., the most generic			/// elements of VL, compute their "intersection" (i.e., the most generic
	/// metadata value that covers all of the individual values), and set I's			/// metadata value that covers all of the individual values), and set I's
	/// metadata for M equal to the intersection value.			/// metadata for M equal to the intersection value.
	///			///
	/// This function always sets a (possibly null) value for each K in Kinds.			/// This function always sets a (possibly null) value for each K in Kinds.
	Instruction *propagateMetadata(Instruction *I, ArrayRef<Value *> VL);			Instruction *propagateMetadata(Instruction *I, ArrayRef<Value *> VL);

	/// Create a mask that filters the members of an interleave group where there			/// Create a mask that filters the members of an interleave group where there
	/// are gaps.			/// are gaps.
	///			///
	/// For example, the mask for \p Group with interleave-factor 3			/// For example, the mask for \p Group with interleave-factor 3
	/// and \p VF 4, that has only its first member present is:			/// and \p VF 4, that has only its first member present is:
	///			///
	/// <1,0,0,1,0,0,1,0,0,1,0,0>			/// <1,0,0,1,0,0,1,0,0,1,0,0>
	///			///
	/// Note: The result is a mask of 0's and 1's, as opposed to the other			/// Note: The result is a mask of 0's and 1's, as opposed to the other
	/// create[*]Mask() utilities which create a shuffle mask (mask that			/// create[*]Mask() utilities which create a shuffle mask (mask that
	/// consists of indices).			/// consists of indices).
	Constant *createBitMaskForGaps(IRBuilder<> &Builder, unsigned VF,			Constant *createBitMaskForGaps(IRBuilder<> &Builder, unsigned VF,
	const InterleaveGroup<Instruction> &Group);			const InterleaveGroup<Instruction> &Group);

	/// Create a mask with replicated elements.			/// Create a mask with replicated elements.
	///			///
	/// This function creates a shuffle mask for replicating each of the \p VF			/// This function creates a shuffle mask for replicating each of the \p VF
	/// elements in a vector \p ReplicationFactor times. It can be used to			/// elements in a vector \p ReplicationFactor times. It can be used to
	/// transform a mask of \p VF elements into a mask of			/// transform a mask of \p VF elements into a mask of
	/// \p VF * \p ReplicationFactor elements used by a predicated			/// \p VF * \p ReplicationFactor elements used by a predicated
	/// interleaved-group of loads/stores whose Interleaved-factor ==			/// interleaved-group of loads/stores whose Interleaved-factor ==
	/// \p ReplicationFactor.			/// \p ReplicationFactor.
	///			///
	/// For example, the mask for \p ReplicationFactor=3 and \p VF=4 is:			/// For example, the mask for \p ReplicationFactor=3 and \p VF=4 is:
	///			///

llvm/trunk/include/llvm/IR/LLVMContext.h

Show First 20 Lines • Show All 96 Lines • ▼ Show 20 Lines	enum : unsigned {
MD_align = 17, // "align"		MD_align = 17, // "align"
MD_loop = 18, // "llvm.loop"		MD_loop = 18, // "llvm.loop"
MD_type = 19, // "type"		MD_type = 19, // "type"
MD_section_prefix = 20, // "section_prefix"		MD_section_prefix = 20, // "section_prefix"
MD_absolute_symbol = 21, // "absolute_symbol"		MD_absolute_symbol = 21, // "absolute_symbol"
MD_associated = 22, // "associated"		MD_associated = 22, // "associated"
MD_callees = 23, // "callees"		MD_callees = 23, // "callees"
MD_irr_loop = 24, // "irr_loop"		MD_irr_loop = 24, // "irr_loop"
		MD_access_group = 25, // "llvm.access.group"
};		};

/// Known operand bundle tag IDs, which always have the same value. All		/// Known operand bundle tag IDs, which always have the same value. All
/// operand bundle tags that LLVM has special knowledge of are listed here.		/// operand bundle tags that LLVM has special knowledge of are listed here.
/// Additionally, this scheme allows LLVM to efficiently check for specific		/// Additionally, this scheme allows LLVM to efficiently check for specific
/// operand bundle tags without comparing strings.		/// operand bundle tags without comparing strings.
enum : unsigned {		enum : unsigned {
OB_deopt = 0, // "deopt"		OB_deopt = 0, // "deopt"

llvm/trunk/include/llvm/Transforms/Utils/LoopUtils.h

	/// Returns the instructions that use values defined in the loop.			/// Returns the instructions that use values defined in the loop.
	SmallVector<Instruction *, 8> findDefsUsedOutsideOfLoop(Loop *L);			SmallVector<Instruction *, 8> findDefsUsedOutsideOfLoop(Loop *L);

	/// Find string metadata for loop			/// Find string metadata for loop
	///			///
	/// If it has a value (e.g. {"llvm.distribute", 1}) return the value as an			/// If it has a value (e.g. {"llvm.distribute", 1}) return the value as an
	/// operand or null otherwise. If the string metadata is not found return			/// operand or null otherwise. If the string metadata is not found return
	/// Optional's not-a-value.			/// Optional's not-a-value.
	Optional<const MDOperand *> findStringMetadataForLoop(Loop *TheLoop,			Optional<const MDOperand *> findStringMetadataForLoop(const Loop *TheLoop,
	StringRef Name);			StringRef Name);

	/// Find named metadata for a loop with an integer value.			/// Find named metadata for a loop with an integer value.
	llvm::Optional<int> getOptionalIntLoopAttribute(Loop *TheLoop, StringRef Name);			llvm::Optional<int> getOptionalIntLoopAttribute(Loop *TheLoop, StringRef Name);

	/// Create a new loop identifier for a loop created from a loop transformation.			/// Create a new loop identifier for a loop created from a loop transformation.
	///			///
	/// @param OrigLoopID The loop ID of the loop before the transformation.			/// @param OrigLoopID The loop ID of the loop before the transformation.

llvm/trunk/lib/Analysis/LoopInfo.cpp

}		}

bool Loop::isAnnotatedParallel() const {		bool Loop::isAnnotatedParallel() const {
MDNode *DesiredLoopIdMetadata = getLoopID();		MDNode *DesiredLoopIdMetadata = getLoopID();

if (!DesiredLoopIdMetadata)		if (!DesiredLoopIdMetadata)
return false;		return false;

		MDNode *ParallelAccesses =
		findOptionMDForLoop(this, "llvm.loop.parallel_accesses");
		SmallPtrSet<MDNode *, 4>
		ParallelAccessGroups; // For scalable 'contains' check.
		if (ParallelAccesses) {
		for (const MDOperand &MD : drop_begin(ParallelAccesses->operands(), 1)) {
		MDNode *AccGroup = cast<MDNode>(MD.get());
		assert(isValidAsAccessGroup(AccGroup) &&
		"List item must be an access group");
		ParallelAccessGroups.insert(AccGroup);
		}
		}

// The loop branch contains the parallel loop metadata. In order to ensure		// The loop branch contains the parallel loop metadata. In order to ensure
// that any parallel-loop-unaware optimization pass hasn't added loop-carried		// that any parallel-loop-unaware optimization pass hasn't added loop-carried
// dependencies (thus converted the loop back to a sequential loop), check		// dependencies (thus converted the loop back to a sequential loop), check
// that all the memory instructions in the loop contain parallelism metadata		// that all the memory instructions in the loop belong to an access group that
// that point to the same unique "loop id metadata" the loop branch does.		// is parallel to this loop.
for (BasicBlock *BB : this->blocks()) {		for (BasicBlock *BB : this->blocks()) {
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
if (!I.mayReadOrWriteMemory())		if (!I.mayReadOrWriteMemory())
continue;		continue;

		if (MDNode *AccessGroup = I.getMetadata(LLVMContext::MD_access_group)) {
		auto ContainsAccessGroup = [&ParallelAccessGroups](MDNode *AG) -> bool {
		if (AG->getNumOperands() == 0) {
		assert(isValidAsAccessGroup(AG) && "Item must be an access group");
		return ParallelAccessGroups.count(AG);
		}

		for (const MDOperand &AccessListItem : AG->operands()) {
		MDNode *AccGroup = cast<MDNode>(AccessListItem.get());
		assert(isValidAsAccessGroup(AccGroup) &&
		"List item must be an access group");
		if (ParallelAccessGroups.count(AccGroup))
		return true;
		}
		return false;
		};

		if (ContainsAccessGroup(AccessGroup))
		continue;
		}

// The memory instruction can refer to the loop identifier metadata		// The memory instruction can refer to the loop identifier metadata
// directly or indirectly through another list metadata (in case of		// directly or indirectly through another list metadata (in case of
// nested parallel loops). The loop identifier metadata refers to		// nested parallel loops). The loop identifier metadata refers to
// itself so we can check both cases with the same routine.		// itself so we can check both cases with the same routine.
MDNode *LoopIdMD =		MDNode *LoopIdMD =
I.getMetadata(LLVMContext::MD_mem_parallel_loop_access);		I.getMetadata(LLVMContext::MD_mem_parallel_loop_access);

if (!LoopIdMD)		if (!LoopIdMD)
▲ Show 20 Lines • Show All 374 Lines • ▼ Show 20 Lines	if (!ExitBlocks.empty()) {
for (auto *Block : ExitBlocks)		for (auto *Block : ExitBlocks)
if (Block)		if (Block)
Block->print(OS);		Block->print(OS);
else		else
OS << "Printing <null> block";		OS << "Printing <null> block";
}		}
}		}

		MDNode *llvm::findOptionMDForLoopID(MDNode *LoopID, StringRef Name) {
		// No loop metadata node, no loop properties.
		if (!LoopID)
		return nullptr;

		// First operand should refer to the metadata node itself, for legacy reasons.
		assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
		assert(LoopID->getOperand(0) == LoopID && "invalid loop id");

		// Iterate over the metadata node operands and look for MDString metadata.
		for (unsigned i = 1, e = LoopID->getNumOperands(); i < e; ++i) {
		MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i));
		if (!MD || MD->getNumOperands() < 1)
		continue;
		MDString *S = dyn_cast<MDString>(MD->getOperand(0));
		if (!S)
		continue;
		// Return the operand node if MDString holds expected metadata.
		if (Name.equals(S->getString()))
		return MD;
		}

		// Loop property not found.
		return nullptr;
		}

		MDNode *llvm::findOptionMDForLoop(const Loop *TheLoop, StringRef Name) {
		return findOptionMDForLoopID(TheLoop->getLoopID(), Name);
		}

		bool llvm::isValidAsAccessGroup(MDNode *Node) {
		return Node->getNumOperands() == 0 && Node->isDistinct();
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// LoopInfo implementation		// LoopInfo implementation
//		//

char LoopInfoWrapperPass::ID = 0;		char LoopInfoWrapperPass::ID = 0;
INITIALIZE_PASS_BEGIN(LoopInfoWrapperPass, "loops", "Natural Loop Information",		INITIALIZE_PASS_BEGIN(LoopInfoWrapperPass, "loops", "Natural Loop Information",
true, true)		true, true)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
▲ Show 20 Lines • Show All 52 Lines • Show Last 20 Lines
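
The new part of Loop::isAnnotatedParallel() above boils down to one rule: every memory-accessing instruction in the loop must belong to at least one access group that the loop lists under llvm.loop.parallel_accesses. The sketch below models only that rule with standard containers. The names `AccessGroup`, `MemAccess` and `isAnnotatedParallelModel` are hypothetical, integer IDs stand in for distinct empty MDNodes, and both the early return for a missing LoopID and the fallback to the legacy llvm.mem.parallel_loop_access check are deliberately left out.

```cpp
#include <set>
#include <vector>

// Hypothetical ID standing in for a distinct, empty access-group MDNode.
using AccessGroup = int;

// Hypothetical model of one memory-accessing instruction: the access groups
// named by its llvm.access.group metadata (empty if it carries none).
struct MemAccess {
  std::vector<AccessGroup> Groups;
};

// Return true if every access is in at least one of the loop's parallel
// access groups. Legacy metadata is not considered in this model.
bool isAnnotatedParallelModel(const std::set<AccessGroup> &ParallelGroups,
                              const std::vector<MemAccess> &Accesses) {
  for (const MemAccess &A : Accesses) {
    bool InParallelGroup = false;
    for (AccessGroup G : A.Groups) {
      if (ParallelGroups.count(G)) {
        InParallelGroup = true;
        break;
      }
    }
    // A single access outside every parallel group makes the loop
    // sequential as far as this annotation is concerned.
    if (!InParallelGroup)
      return false;
  }
  return true;
}
```

This also shows why an access added by a pass that is unaware of the metadata is handled conservatively: it has no groups, so the loop stops being reported as parallel.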

llvm/trunk/lib/Analysis/VectorUtils.cpp

Show First 20 Lines • Show All 458 Lines • ▼ Show 20 Lines	for (auto MI = ECs.member_begin(I), ME = ECs.member_end(); MI != ME; ++MI) {
if (MinBW < Ty->getScalarSizeInBits())		if (MinBW < Ty->getScalarSizeInBits())
MinBWs[cast<Instruction>(*MI)] = MinBW;		MinBWs[cast<Instruction>(*MI)] = MinBW;
}		}
}		}

return MinBWs;		return MinBWs;
}		}

		/// Add all access groups in @p AccGroups to @p List.
		template <typename ListT>
		static void addToAccessGroupList(ListT &List, MDNode *AccGroups) {
		// Interpret an access group as a list containing itself.
		if (AccGroups->getNumOperands() == 0) {
		assert(isValidAsAccessGroup(AccGroups) && "Node must be an access group");
		List.insert(AccGroups);
		return;
		}

		for (auto &AccGroupListOp : AccGroups->operands()) {
		auto *Item = cast<MDNode>(AccGroupListOp.get());
		assert(isValidAsAccessGroup(Item) && "List item must be an access group");
		List.insert(Item);
		}
		};

		MDNode *llvm::uniteAccessGroups(MDNode *AccGroups1, MDNode *AccGroups2) {
		if (!AccGroups1)
		return AccGroups2;
		if (!AccGroups2)
		return AccGroups1;
		if (AccGroups1 == AccGroups2)
		return AccGroups1;

		SmallSetVector<Metadata *, 4> Union;
		addToAccessGroupList(Union, AccGroups1);
		addToAccessGroupList(Union, AccGroups2);

		if (Union.size() == 0)
		return nullptr;
		if (Union.size() == 1)
		return cast<MDNode>(Union.front());

		LLVMContext &Ctx = AccGroups1->getContext();
		return MDNode::get(Ctx, Union.getArrayRef());
		}

		MDNode *llvm::intersectAccessGroups(const Instruction *Inst1,
		const Instruction *Inst2) {
		bool MayAccessMem1 = Inst1->mayReadOrWriteMemory();
		bool MayAccessMem2 = Inst2->mayReadOrWriteMemory();

		if (!MayAccessMem1 && !MayAccessMem2)
		return nullptr;
		if (!MayAccessMem1)
		return Inst2->getMetadata(LLVMContext::MD_access_group);
		if (!MayAccessMem2)
		return Inst1->getMetadata(LLVMContext::MD_access_group);

		MDNode *MD1 = Inst1->getMetadata(LLVMContext::MD_access_group);
		MDNode *MD2 = Inst2->getMetadata(LLVMContext::MD_access_group);
		if (!MD1 || !MD2)
		return nullptr;
		if (MD1 == MD2)
		return MD1;

		// Use set for scalable 'contains' check.
		SmallPtrSet<Metadata *, 4> AccGroupSet2;
		addToAccessGroupList(AccGroupSet2, MD2);

		SmallVector<Metadata *, 4> Intersection;
		if (MD1->getNumOperands() == 0) {
		assert(isValidAsAccessGroup(MD1) && "Node must be an access group");
		if (AccGroupSet2.count(MD1))
		Intersection.push_back(MD1);
		} else {
		for (const MDOperand &Node : MD1->operands()) {
		auto *Item = cast<MDNode>(Node.get());
		assert(isValidAsAccessGroup(Item) && "List item must be an access group");
		if (AccGroupSet2.count(Item))
		Intersection.push_back(Item);
		}
		}

		if (Intersection.size() == 0)
		return nullptr;
		if (Intersection.size() == 1)
		return cast<MDNode>(Intersection.front());

		LLVMContext &Ctx = Inst1->getContext();
		return MDNode::get(Ctx, Intersection);
		}

/// \returns \p I after propagating metadata from \p VL.		/// \returns \p I after propagating metadata from \p VL.
Instruction *llvm::propagateMetadata(Instruction *Inst, ArrayRef<Value *> VL) {		Instruction *llvm::propagateMetadata(Instruction *Inst, ArrayRef<Value *> VL) {
Instruction *I0 = cast<Instruction>(VL[0]);		Instruction *I0 = cast<Instruction>(VL[0]);
SmallVector<std::pair<unsigned, MDNode *>, 4> Metadata;		SmallVector<std::pair<unsigned, MDNode *>, 4> Metadata;
I0->getAllMetadataOtherThanDebugLoc(Metadata);		I0->getAllMetadataOtherThanDebugLoc(Metadata);

for (auto Kind :		for (auto Kind : {LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
{LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias, LLVMContext::MD_fpmath,		LLVMContext::MD_noalias, LLVMContext::MD_fpmath,
LLVMContext::MD_nontemporal, LLVMContext::MD_invariant_load}) {		LLVMContext::MD_nontemporal, LLVMContext::MD_invariant_load,
		LLVMContext::MD_access_group}) {
MDNode *MD = I0->getMetadata(Kind);		MDNode *MD = I0->getMetadata(Kind);

for (int J = 1, E = VL.size(); MD && J != E; ++J) {		for (int J = 1, E = VL.size(); MD && J != E; ++J) {
const Instruction *IJ = cast<Instruction>(VL[J]);		const Instruction *IJ = cast<Instruction>(VL[J]);
MDNode *IMD = IJ->getMetadata(Kind);		MDNode *IMD = IJ->getMetadata(Kind);
switch (Kind) {		switch (Kind) {
case LLVMContext::MD_tbaa:		case LLVMContext::MD_tbaa:
MD = MDNode::getMostGenericTBAA(MD, IMD);		MD = MDNode::getMostGenericTBAA(MD, IMD);
break;		break;
case LLVMContext::MD_alias_scope:		case LLVMContext::MD_alias_scope:
MD = MDNode::getMostGenericAliasScope(MD, IMD);		MD = MDNode::getMostGenericAliasScope(MD, IMD);
break;		break;
case LLVMContext::MD_fpmath:		case LLVMContext::MD_fpmath:
MD = MDNode::getMostGenericFPMath(MD, IMD);		MD = MDNode::getMostGenericFPMath(MD, IMD);
break;		break;
case LLVMContext::MD_noalias:		case LLVMContext::MD_noalias:
case LLVMContext::MD_nontemporal:		case LLVMContext::MD_nontemporal:
case LLVMContext::MD_invariant_load:		case LLVMContext::MD_invariant_load:
MD = MDNode::intersect(MD, IMD);		MD = MDNode::intersect(MD, IMD);
break;		break;
		case LLVMContext::MD_access_group:
		MD = intersectAccessGroups(Inst, IJ);
		break;
default:		default:
llvm_unreachable("unhandled metadata");		llvm_unreachable("unhandled metadata");
}		}
}		}

Inst->setMetadata(Kind, MD);		Inst->setMetadata(Kind, MD);
}		}

▲ Show 20 Lines • Show All 478 Lines • Show Last 20 Lines
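
The set semantics of uniteAccessGroups and intersectAccessGroups above can be modelled on plain lists. This is a sketch under stated assumptions, not the patch's implementation: `AccessGroup`, `GroupList`, `uniteGroups` and `intersectGroups` are hypothetical names, an empty list stands in for "no metadata" (nullptr), and the special cases for a single remaining group and for instructions that do not access memory are not modelled.

```cpp
#include <algorithm>
#include <initializer_list>
#include <vector>

// Hypothetical ID standing in for a distinct, empty access-group MDNode.
using AccessGroup = int;
// Hypothetical model of an access-group list; empty means "no metadata".
using GroupList = std::vector<AccessGroup>;

// Union that keeps first-seen order and drops duplicates, similar in spirit
// to collecting both lists into a SmallSetVector.
GroupList uniteGroups(const GroupList &A, const GroupList &B) {
  GroupList Result;
  for (const GroupList *L : {&A, &B})
    for (AccessGroup G : *L)
      if (std::find(Result.begin(), Result.end(), G) == Result.end())
        Result.push_back(G);
  return Result;
}

// Intersection in the order of A: keep only groups that also appear in B.
GroupList intersectGroups(const GroupList &A, const GroupList &B) {
  GroupList Result;
  for (AccessGroup G : A)
    if (std::find(B.begin(), B.end(), G) != B.end())
      Result.push_back(G);
  return Result;
}
```

One way to read the choice in propagateMetadata: an instruction that replaces several others may only claim the access groups that all of them shared, so the intersection is the conservative result.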

llvm/trunk/lib/IR/LLVMContext.cpp

Show First 20 Lines • Show All 55 Lines • ▼ Show 20 Lines	std::pair<unsigned, StringRef> MDKinds[] = {
{MD_align, "align"},		{MD_align, "align"},
{MD_loop, "llvm.loop"},		{MD_loop, "llvm.loop"},
{MD_type, "type"},		{MD_type, "type"},
{MD_section_prefix, "section_prefix"},		{MD_section_prefix, "section_prefix"},
{MD_absolute_symbol, "absolute_symbol"},		{MD_absolute_symbol, "absolute_symbol"},
{MD_associated, "associated"},		{MD_associated, "associated"},
{MD_callees, "callees"},		{MD_callees, "callees"},
{MD_irr_loop, "irr_loop"},		{MD_irr_loop, "irr_loop"},
		{MD_access_group, "llvm.access.group"},
};		};

for (auto &MDKind : MDKinds) {		for (auto &MDKind : MDKinds) {
unsigned ID = getMDKindID(MDKind.second);		unsigned ID = getMDKindID(MDKind.second);
assert(ID == MDKind.first && "metadata kind id drifted");		assert(ID == MDKind.first && "metadata kind id drifted");
(void)ID;		(void)ID;
}		}


llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp

Show First 20 Lines • Show All 168 Lines • ▼ Show 20 Lines	Instruction *InstCombiner::SimplifyAnyMemTransfer(AnyMemTransferInst *MI) {
// Alignment from the mem intrinsic will be better, so use it.		// Alignment from the mem intrinsic will be better, so use it.
L->setAlignment(CopySrcAlign);		L->setAlignment(CopySrcAlign);
if (CopyMD)		if (CopyMD)
L->setMetadata(LLVMContext::MD_tbaa, CopyMD);		L->setMetadata(LLVMContext::MD_tbaa, CopyMD);
MDNode *LoopMemParallelMD =		MDNode *LoopMemParallelMD =
MI->getMetadata(LLVMContext::MD_mem_parallel_loop_access);		MI->getMetadata(LLVMContext::MD_mem_parallel_loop_access);
if (LoopMemParallelMD)		if (LoopMemParallelMD)
L->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);		L->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);
		MDNode *AccessGroupMD = MI->getMetadata(LLVMContext::MD_access_group);
		if (AccessGroupMD)
		L->setMetadata(LLVMContext::MD_access_group, AccessGroupMD);

StoreInst *S = Builder.CreateStore(L, Dest);		StoreInst *S = Builder.CreateStore(L, Dest);
// Alignment from the mem intrinsic will be better, so use it.		// Alignment from the mem intrinsic will be better, so use it.
S->setAlignment(CopyDstAlign);		S->setAlignment(CopyDstAlign);
if (CopyMD)		if (CopyMD)
S->setMetadata(LLVMContext::MD_tbaa, CopyMD);		S->setMetadata(LLVMContext::MD_tbaa, CopyMD);
if (LoopMemParallelMD)		if (LoopMemParallelMD)
S->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);		S->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);
		if (AccessGroupMD)
		S->setMetadata(LLVMContext::MD_access_group, AccessGroupMD);

if (auto *MT = dyn_cast<MemTransferInst>(MI)) {		if (auto *MT = dyn_cast<MemTransferInst>(MI)) {
// non-atomics can be volatile		// non-atomics can be volatile
L->setVolatile(MT->isVolatile());		L->setVolatile(MT->isVolatile());
S->setVolatile(MT->isVolatile());		S->setVolatile(MT->isVolatile());
}		}
if (isa<AtomicMemTransferInst>(MI)) {		if (isa<AtomicMemTransferInst>(MI)) {
// atomics have to be unordered		// atomics have to be unordered

llvm/trunk/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

Show First 20 Lines • Show All 487 Lines • ▼ Show 20 Lines	for (const auto &MDPair : MD) {
case LLVMContext::MD_prof:		case LLVMContext::MD_prof:
case LLVMContext::MD_fpmath:		case LLVMContext::MD_fpmath:
case LLVMContext::MD_tbaa_struct:		case LLVMContext::MD_tbaa_struct:
case LLVMContext::MD_invariant_load:		case LLVMContext::MD_invariant_load:
case LLVMContext::MD_alias_scope:		case LLVMContext::MD_alias_scope:
case LLVMContext::MD_noalias:		case LLVMContext::MD_noalias:
case LLVMContext::MD_nontemporal:		case LLVMContext::MD_nontemporal:
case LLVMContext::MD_mem_parallel_loop_access:		case LLVMContext::MD_mem_parallel_loop_access:
		case LLVMContext::MD_access_group:
// All of these directly apply.		// All of these directly apply.
NewLoad->setMetadata(ID, N);		NewLoad->setMetadata(ID, N);
break;		break;

case LLVMContext::MD_nonnull:		case LLVMContext::MD_nonnull:
copyNonnullMetadata(LI, N, *NewLoad);		copyNonnullMetadata(LI, N, *NewLoad);
break;		break;
case LLVMContext::MD_align:		case LLVMContext::MD_align:

llvm/trunk/lib/Transforms/InstCombine/InstCombinePHI.cpp

Show First 20 Lines • Show All 602 Lines • ▼ Show 20 Lines	unsigned KnownIDs[] = {
LLVMContext::MD_range,		LLVMContext::MD_range,
LLVMContext::MD_invariant_load,		LLVMContext::MD_invariant_load,
LLVMContext::MD_alias_scope,		LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias,		LLVMContext::MD_noalias,
LLVMContext::MD_nonnull,		LLVMContext::MD_nonnull,
LLVMContext::MD_align,		LLVMContext::MD_align,
LLVMContext::MD_dereferenceable,		LLVMContext::MD_dereferenceable,
LLVMContext::MD_dereferenceable_or_null,		LLVMContext::MD_dereferenceable_or_null,
		LLVMContext::MD_access_group,
};		};

for (unsigned ID : KnownIDs)		for (unsigned ID : KnownIDs)
NewLI->setMetadata(ID, FirstLI->getMetadata(ID));		NewLI->setMetadata(ID, FirstLI->getMetadata(ID));

// Add all operands to the new PHI and combine TBAA metadata.		// Add all operands to the new PHI and combine TBAA metadata.
for (unsigned i = 1, e = PN.getNumIncomingValues(); i != e; ++i) {		for (unsigned i = 1, e = PN.getNumIncomingValues(); i != e; ++i) {
LoadInst *LI = cast<LoadInst>(PN.getIncomingValue(i));		LoadInst *LI = cast<LoadInst>(PN.getIncomingValue(i));

llvm/trunk/lib/Transforms/Scalar/GVNHoist.cpp

Show First 20 Lines • Show All 240 Lines • ▼ Show 20 Lines	public:
const VNtoInsns &getStoreVNTable() const { return VNtoCallsStores; }		const VNtoInsns &getStoreVNTable() const { return VNtoCallsStores; }
};		};

static void combineKnownMetadata(Instruction *ReplInst, Instruction *I) {		static void combineKnownMetadata(Instruction *ReplInst, Instruction *I) {
static const unsigned KnownIDs[] = {		static const unsigned KnownIDs[] = {
LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,		LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias, LLVMContext::MD_range,		LLVMContext::MD_noalias, LLVMContext::MD_range,
LLVMContext::MD_fpmath, LLVMContext::MD_invariant_load,		LLVMContext::MD_fpmath, LLVMContext::MD_invariant_load,
LLVMContext::MD_invariant_group};		LLVMContext::MD_invariant_group, LLVMContext::MD_access_group};
combineMetadata(ReplInst, I, KnownIDs, true);		combineMetadata(ReplInst, I, KnownIDs, true);
}		}

// This pass hoists common computations across branches sharing common		// This pass hoists common computations across branches sharing common
// dominator. The primary goal is to reduce the code size, and in some		// dominator. The primary goal is to reduce the code size, and in some
// cases reduce critical path (by exposing more ILP).		// cases reduce critical path (by exposing more ILP).
class GVNHoist {		class GVNHoist {
public:		public:

llvm/trunk/lib/Transforms/Scalar/LoopVersioningLICM.cpp

Show First 20 Lines • Show All 627 Lines • ▼ Show 20 Lines	if (isLegalForVersioning()) {
DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();		DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
LoopVersioning LVer(*LAI, CurLoop, LI, DT, SE, true);		LoopVersioning LVer(*LAI, CurLoop, LI, DT, SE, true);
LVer.versionLoop();		LVer.versionLoop();
// Set Loop Versioning metaData for original loop.		// Set Loop Versioning metaData for original loop.
addStringMetadataToLoop(LVer.getNonVersionedLoop(), LICMVersioningMetaData);		addStringMetadataToLoop(LVer.getNonVersionedLoop(), LICMVersioningMetaData);
// Set Loop Versioning metaData for version loop.		// Set Loop Versioning metaData for version loop.
addStringMetadataToLoop(LVer.getVersionedLoop(), LICMVersioningMetaData);		addStringMetadataToLoop(LVer.getVersionedLoop(), LICMVersioningMetaData);
// Set "llvm.mem.parallel_loop_access" metaData to versioned loop.		// Set "llvm.mem.parallel_loop_access" metaData to versioned loop.
		// FIXME: "llvm.mem.parallel_loop_access" annotates memory access
		// instructions, not loops.
addStringMetadataToLoop(LVer.getVersionedLoop(),		addStringMetadataToLoop(LVer.getVersionedLoop(),
"llvm.mem.parallel_loop_access");		"llvm.mem.parallel_loop_access");
// Update version loop with aggressive aliasing assumption.		// Update version loop with aggressive aliasing assumption.
setNoAliasToLoop(LVer.getVersionedLoop());		setNoAliasToLoop(LVer.getVersionedLoop());
Changed = true;		Changed = true;
}		}
return Changed;		return Changed;
}		}

llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp

Show First 20 Lines • Show All 990 Lines • ▼ Show 20 Lines	bool MemCpyOptPass::performCallSlotOptzn(Instruction *cpy, Value *cpyDest,
// its dependence information by changing its parameter.		// its dependence information by changing its parameter.
MD->removeInstruction(C);		MD->removeInstruction(C);

// Update AA metadata		// Update AA metadata
// FIXME: MD_tbaa_struct and MD_mem_parallel_loop_access should also be		// FIXME: MD_tbaa_struct and MD_mem_parallel_loop_access should also be
// handled here, but combineMetadata doesn't support them yet		// handled here, but combineMetadata doesn't support them yet
unsigned KnownIDs[] = {LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,		unsigned KnownIDs[] = {LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias,		LLVMContext::MD_noalias,
LLVMContext::MD_invariant_group};		LLVMContext::MD_invariant_group,
		LLVMContext::MD_access_group};
combineMetadata(C, cpy, KnownIDs, true);		combineMetadata(C, cpy, KnownIDs, true);

// Remove the memcpy.		// Remove the memcpy.
MD->removeInstruction(cpy);		MD->removeInstruction(cpy);
++NumMemCpyInstr;		++NumMemCpyInstr;

return true;		return true;
}		}

llvm/trunk/lib/Transforms/Scalar/SROA.cpp

Show First 20 Lines • Show All 2,587 Lines • ▼ Show 20 Lines	if (DL.getTypeSizeInBits(V->getType()) != IntTy->getBitWidth()) {
IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "oldload");		IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "oldload");
Old = convertValue(DL, IRB, Old, IntTy);		Old = convertValue(DL, IRB, Old, IntTy);
assert(BeginOffset >= NewAllocaBeginOffset && "Out of bounds offset");		assert(BeginOffset >= NewAllocaBeginOffset && "Out of bounds offset");
uint64_t Offset = BeginOffset - NewAllocaBeginOffset;		uint64_t Offset = BeginOffset - NewAllocaBeginOffset;
V = insertInteger(DL, IRB, Old, SI.getValueOperand(), Offset, "insert");		V = insertInteger(DL, IRB, Old, SI.getValueOperand(), Offset, "insert");
}		}
V = convertValue(DL, IRB, V, NewAllocaTy);		V = convertValue(DL, IRB, V, NewAllocaTy);
StoreInst *Store = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment());		StoreInst *Store = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment());
Store->copyMetadata(SI, LLVMContext::MD_mem_parallel_loop_access);		Store->copyMetadata(SI, {LLVMContext::MD_mem_parallel_loop_access,
		LLVMContext::MD_access_group});
if (AATags)		if (AATags)
Store->setAAMetadata(AATags);		Store->setAAMetadata(AATags);
Pass.DeadInsts.insert(&SI);		Pass.DeadInsts.insert(&SI);
LLVM_DEBUG(dbgs() << " to: " << *Store << "\n");		LLVM_DEBUG(dbgs() << " to: " << *Store << "\n");
return true;		return true;
}		}

bool visitStoreInst(StoreInst &SI) {		bool visitStoreInst(StoreInst &SI) {
if (NewBeginOffset == NewAllocaBeginOffset &&
NewSI = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment(),		NewSI = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment(),
SI.isVolatile());		SI.isVolatile());
} else {		} else {
unsigned AS = SI.getPointerAddressSpace();		unsigned AS = SI.getPointerAddressSpace();
Value *NewPtr = getNewAllocaSlicePtr(IRB, V->getType()->getPointerTo(AS));		Value *NewPtr = getNewAllocaSlicePtr(IRB, V->getType()->getPointerTo(AS));
NewSI = IRB.CreateAlignedStore(V, NewPtr, getSliceAlign(V->getType()),		NewSI = IRB.CreateAlignedStore(V, NewPtr, getSliceAlign(V->getType()),
SI.isVolatile());		SI.isVolatile());
}		}
NewSI->copyMetadata(SI, LLVMContext::MD_mem_parallel_loop_access);		NewSI->copyMetadata(SI, {LLVMContext::MD_mem_parallel_loop_access,
		LLVMContext::MD_access_group});
if (AATags)		if (AATags)
NewSI->setAAMetadata(AATags);		NewSI->setAAMetadata(AATags);
if (SI.isVolatile())		if (SI.isVolatile())
NewSI->setAtomic(SI.getOrdering(), SI.getSyncScopeID());		NewSI->setAtomic(SI.getOrdering(), SI.getSyncScopeID());
Pass.DeadInsts.insert(&SI);		Pass.DeadInsts.insert(&SI);
deleteIfTriviallyDead(OldOp);		deleteIfTriviallyDead(OldOp);

LLVM_DEBUG(dbgs() << " to: " << *NewSI << "\n");		LLVM_DEBUG(dbgs() << " to: " << *NewSI << "\n");
for (;;) {
auto AS = LI->getPointerAddressSpace();		auto AS = LI->getPointerAddressSpace();
auto *PartPtrTy = PartTy->getPointerTo(AS);		auto *PartPtrTy = PartTy->getPointerTo(AS);
LoadInst *PLoad = IRB.CreateAlignedLoad(		LoadInst *PLoad = IRB.CreateAlignedLoad(
getAdjustedPtr(IRB, DL, BasePtr,		getAdjustedPtr(IRB, DL, BasePtr,
APInt(DL.getIndexSizeInBits(AS), PartOffset),		APInt(DL.getIndexSizeInBits(AS), PartOffset),
PartPtrTy, BasePtr->getName() + "."),		PartPtrTy, BasePtr->getName() + "."),
getAdjustedAlignment(LI, PartOffset, DL), /*IsVolatile*/ false,		getAdjustedAlignment(LI, PartOffset, DL), /*IsVolatile*/ false,
LI->getName());		LI->getName());
PLoad->copyMetadata(*LI, LLVMContext::MD_mem_parallel_loop_access);		PLoad->copyMetadata(*LI, {LLVMContext::MD_mem_parallel_loop_access,
		LLVMContext::MD_access_group});

// Append this load onto the list of split loads so we can find it later		// Append this load onto the list of split loads so we can find it later
// to rewrite the stores.		// to rewrite the stores.
SplitLoads.push_back(PLoad);		SplitLoads.push_back(PLoad);

// Now build a new slice for the alloca.		// Now build a new slice for the alloca.
NewSlices.push_back(		NewSlices.push_back(
Slice(BaseOffset + PartOffset, BaseOffset + PartOffset + PartSize,		Slice(BaseOffset + PartOffset, BaseOffset + PartOffset + PartSize,
for (User *LU : LI->users()) {

auto AS = SI->getPointerAddressSpace();		auto AS = SI->getPointerAddressSpace();
StoreInst *PStore = IRB.CreateAlignedStore(		StoreInst *PStore = IRB.CreateAlignedStore(
PLoad,		PLoad,
getAdjustedPtr(IRB, DL, StoreBasePtr,		getAdjustedPtr(IRB, DL, StoreBasePtr,
APInt(DL.getIndexSizeInBits(AS), PartOffset),		APInt(DL.getIndexSizeInBits(AS), PartOffset),
PartPtrTy, StoreBasePtr->getName() + "."),		PartPtrTy, StoreBasePtr->getName() + "."),
getAdjustedAlignment(SI, PartOffset, DL), /*IsVolatile*/ false);		getAdjustedAlignment(SI, PartOffset, DL), /*IsVolatile*/ false);
PStore->copyMetadata(*LI, LLVMContext::MD_mem_parallel_loop_access);		PStore->copyMetadata(*LI, {LLVMContext::MD_mem_parallel_loop_access,
		LLVMContext::MD_access_group});
LLVM_DEBUG(dbgs() << " +" << PartOffset << ":" << *PStore << "\n");		LLVM_DEBUG(dbgs() << " +" << PartOffset << ":" << *PStore << "\n");
}		}

// We want to immediately iterate on any allocas impacted by splitting		// We want to immediately iterate on any allocas impacted by splitting
// this store, and we have to track any promotable alloca (indicated by		// this store, and we have to track any promotable alloca (indicated by
// a direct store) as needing to be resplit because it is no longer		// a direct store) as needing to be resplit because it is no longer
// promotable.		// promotable.
if (AllocaInst *OtherAI = dyn_cast<AllocaInst>(StoreBasePtr)) {		if (AllocaInst *OtherAI = dyn_cast<AllocaInst>(StoreBasePtr)) {

llvm/trunk/lib/Transforms/Scalar/Scalarizer.cpp

	// vector to scalar instructions.			// vector to scalar instructions.
	bool ScalarizerVisitor::canTransferMetadata(unsigned Tag) {			bool ScalarizerVisitor::canTransferMetadata(unsigned Tag) {
	return (Tag == LLVMContext::MD_tbaa			return (Tag == LLVMContext::MD_tbaa
	|| Tag == LLVMContext::MD_fpmath			|| Tag == LLVMContext::MD_fpmath
	|| Tag == LLVMContext::MD_tbaa_struct			|| Tag == LLVMContext::MD_tbaa_struct
	|| Tag == LLVMContext::MD_invariant_load			|| Tag == LLVMContext::MD_invariant_load
	|| Tag == LLVMContext::MD_alias_scope			|| Tag == LLVMContext::MD_alias_scope
	|| Tag == LLVMContext::MD_noalias			|| Tag == LLVMContext::MD_noalias
	|| Tag == ParallelLoopAccessMDKind);			|| Tag == ParallelLoopAccessMDKind
				|| Tag == LLVMContext::MD_access_group);
	}			}

	// Transfer metadata from Op to the instructions in CV if it is known			// Transfer metadata from Op to the instructions in CV if it is known
	// to be safe to do so.			// to be safe to do so.
	void ScalarizerVisitor::transferMetadata(Instruction *Op, const ValueVector &CV) {			void ScalarizerVisitor::transferMetadata(Instruction *Op, const ValueVector &CV) {
	SmallVector<std::pair<unsigned, MDNode *>, 4> MDs;			SmallVector<std::pair<unsigned, MDNode *>, 4> MDs;
	Op->getAllMetadataOtherThanDebugLoc(MDs);			Op->getAllMetadataOtherThanDebugLoc(MDs);
	for (unsigned I = 0, E = CV.size(); I != E; ++I) {			for (unsigned I = 0, E = CV.size(); I != E; ++I) {

llvm/trunk/lib/Transforms/Utils/InlineFunction.cpp

#include "llvm/Analysis/BlockFrequencyInfo.h"		#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/CallGraph.h"		#include "llvm/Analysis/CallGraph.h"
#include "llvm/Analysis/CaptureTracking.h"		#include "llvm/Analysis/CaptureTracking.h"
#include "llvm/Analysis/EHPersonalities.h"		#include "llvm/Analysis/EHPersonalities.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"		#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/IR/Argument.h"		#include "llvm/IR/Argument.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
static void HandleInlinedEHPad(InvokeInst *II, BasicBlock *FirstNewBlock,

// Now that everything is happy, we have one final detail. The PHI nodes in		// Now that everything is happy, we have one final detail. The PHI nodes in
// the exception destination block still have entries due to the original		// the exception destination block still have entries due to the original
// invoke instruction. Eliminate these entries (which might even delete the		// invoke instruction. Eliminate these entries (which might even delete the
// PHI node) now.		// PHI node) now.
UnwindDest->removePredecessor(InvokeBB);		UnwindDest->removePredecessor(InvokeBB);
}		}

/// When inlining a call site that has !llvm.mem.parallel_loop_access metadata,		/// When inlining a call site that has !llvm.mem.parallel_loop_access or
/// that metadata should be propagated to all memory-accessing cloned		/// llvm.access.group metadata, that metadata should be propagated to all
/// instructions.		/// memory-accessing cloned instructions.
static void PropagateParallelLoopAccessMetadata(CallSite CS,		static void PropagateParallelLoopAccessMetadata(CallSite CS,
ValueToValueMapTy &VMap) {		ValueToValueMapTy &VMap) {
MDNode *M =		MDNode *M =
CS.getInstruction()->getMetadata(LLVMContext::MD_mem_parallel_loop_access);		CS.getInstruction()->getMetadata(LLVMContext::MD_mem_parallel_loop_access);
if (!M)		MDNode *CallAccessGroup =
		CS.getInstruction()->getMetadata(LLVMContext::MD_access_group);
		if (!M && !CallAccessGroup)
return;		return;

for (ValueToValueMapTy::iterator VMI = VMap.begin(), VMIE = VMap.end();		for (ValueToValueMapTy::iterator VMI = VMap.begin(), VMIE = VMap.end();
VMI != VMIE; ++VMI) {		VMI != VMIE; ++VMI) {
if (!VMI->second)		if (!VMI->second)
continue;		continue;

Instruction *NI = dyn_cast<Instruction>(VMI->second);		Instruction *NI = dyn_cast<Instruction>(VMI->second);
if (!NI)		if (!NI)
continue;		continue;

if (MDNode *PM = NI->getMetadata(LLVMContext::MD_mem_parallel_loop_access)) {		if (M) {
		if (MDNode *PM =
		NI->getMetadata(LLVMContext::MD_mem_parallel_loop_access)) {
M = MDNode::concatenate(PM, M);		M = MDNode::concatenate(PM, M);
NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);		NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);
} else if (NI->mayReadOrWriteMemory()) {		} else if (NI->mayReadOrWriteMemory()) {
NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);		NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);
}		}
}		}

		if (NI->mayReadOrWriteMemory()) {
		MDNode *UnitedAccGroups = uniteAccessGroups(
		NI->getMetadata(LLVMContext::MD_access_group), CallAccessGroup);
		NI->setMetadata(LLVMContext::MD_access_group, UnitedAccGroups);
		}
		}
}		}

/// When inlining a function that contains noalias scope metadata,		/// When inlining a function that contains noalias scope metadata,
/// this metadata needs to be cloned so that the inlined blocks		/// this metadata needs to be cloned so that the inlined blocks
/// have different "unique scopes" at every call site. Were this not done, then		/// have different "unique scopes" at every call site. Were this not done, then
/// aliasing scopes from a function inlined into a caller multiple times could		/// aliasing scopes from a function inlined into a caller multiple times could
/// not be differentiated (and this would lead to miscompiles because the		/// not be differentiated (and this would lead to miscompiles because the
/// non-aliasing property communicated by the metadata could have		/// non-aliasing property communicated by the metadata could have
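
The InlineFunction.cpp hunk above gives every cloned memory access the union of its own access groups and those of the call site. A minimal sketch of that merge follows, assuming access groups can be modeled as plain integer IDs; `AccessGroupSet` and `uniteAccessGroupsModel` are hypothetical names for illustration and are not the `uniteAccessGroups` helper the patch calls.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in: one access group is an int ID, and an access group
// list is the set of IDs attached to an instruction.
using AccessGroupSet = std::set<int>;

// Model of the merge done for each inlined memory access: the result holds
// the groups already on the cloned instruction plus the groups on the call.
AccessGroupSet uniteAccessGroupsModel(const AccessGroupSet &InstGroups,
                                      const AccessGroupSet &CallGroups) {
  AccessGroupSet Result = InstGroups;
  Result.insert(CallGroups.begin(), CallGroups.end());
  return Result;
}
```

With the groups from parallel-loop-md-merge.ll, a store in group 6 inlined through a call in group 10 ends up in both groups, so the callee's loop and the caller's loop can each still find their own group on it.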

llvm/trunk/lib/Transforms/Utils/Local.cpp

#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/EHPersonalities.h"		#include "llvm/Analysis/EHPersonalities.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/LazyValueInfo.h"		#include "llvm/Analysis/LazyValueInfo.h"
#include "llvm/Analysis/MemoryBuiltins.h"		#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/MemorySSAUpdater.h"		#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/BinaryFormat/Dwarf.h"		#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/IR/Argument.h"		#include "llvm/IR/Argument.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/ConstantRange.h"		#include "llvm/IR/ConstantRange.h"
switch (Kind) {
break;		break;
case LLVMContext::MD_alias_scope:		case LLVMContext::MD_alias_scope:
K->setMetadata(Kind, MDNode::getMostGenericAliasScope(JMD, KMD));		K->setMetadata(Kind, MDNode::getMostGenericAliasScope(JMD, KMD));
break;		break;
case LLVMContext::MD_noalias:		case LLVMContext::MD_noalias:
case LLVMContext::MD_mem_parallel_loop_access:		case LLVMContext::MD_mem_parallel_loop_access:
K->setMetadata(Kind, MDNode::intersect(JMD, KMD));		K->setMetadata(Kind, MDNode::intersect(JMD, KMD));
break;		break;
		case LLVMContext::MD_access_group:
		K->setMetadata(LLVMContext::MD_access_group,
		intersectAccessGroups(K, J));
		break;
case LLVMContext::MD_range:		case LLVMContext::MD_range:

// If K does move, use most generic range. Otherwise keep the range of		// If K does move, use most generic range. Otherwise keep the range of
// K.		// K.
if (DoesKMove)		if (DoesKMove)
// FIXME: If K does move, we should drop the range info and nonnull.		// FIXME: If K does move, we should drop the range info and nonnull.
// Currently this function is used with DoesKMove in passes		// Currently this function is used with DoesKMove in passes
// doing hoisting/sinking and the current behavior of using the		// doing hoisting/sinking and the current behavior of using the
void llvm::combineMetadataForCSE(Instruction *K, const Instruction *J,		void llvm::combineMetadataForCSE(Instruction *K, const Instruction *J,
bool KDominatesJ) {		bool KDominatesJ) {
unsigned KnownIDs[] = {		unsigned KnownIDs[] = {
LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,		LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias, LLVMContext::MD_range,		LLVMContext::MD_noalias, LLVMContext::MD_range,
LLVMContext::MD_invariant_load, LLVMContext::MD_nonnull,		LLVMContext::MD_invariant_load, LLVMContext::MD_nonnull,
LLVMContext::MD_invariant_group, LLVMContext::MD_align,		LLVMContext::MD_invariant_group, LLVMContext::MD_align,
LLVMContext::MD_dereferenceable,		LLVMContext::MD_dereferenceable,
LLVMContext::MD_dereferenceable_or_null};		LLVMContext::MD_dereferenceable_or_null,
		LLVMContext::MD_access_group};
combineMetadata(K, J, KnownIDs, KDominatesJ);		combineMetadata(K, J, KnownIDs, KDominatesJ);
}		}

void llvm::patchReplacementInstruction(Instruction *I, Value *Repl) {		void llvm::patchReplacementInstruction(Instruction *I, Value *Repl) {
auto *ReplInst = dyn_cast<Instruction>(Repl);		auto *ReplInst = dyn_cast<Instruction>(Repl);
if (!ReplInst)		if (!ReplInst)
return;		return;


// In general, GVN unifies expressions over different control-flow		// In general, GVN unifies expressions over different control-flow
// regions, and so we need a conservative combination of the noalias		// regions, and so we need a conservative combination of the noalias
// scopes.		// scopes.
static const unsigned KnownIDs[] = {		static const unsigned KnownIDs[] = {
LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,		LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias, LLVMContext::MD_range,		LLVMContext::MD_noalias, LLVMContext::MD_range,
LLVMContext::MD_fpmath, LLVMContext::MD_invariant_load,		LLVMContext::MD_fpmath, LLVMContext::MD_invariant_load,
LLVMContext::MD_invariant_group, LLVMContext::MD_nonnull};		LLVMContext::MD_invariant_group, LLVMContext::MD_nonnull,
		LLVMContext::MD_access_group};
combineMetadata(ReplInst, I, KnownIDs, false);		combineMetadata(ReplInst, I, KnownIDs, false);
}		}

template <typename RootType, typename DominatesFn>		template <typename RootType, typename DominatesFn>
static unsigned replaceDominatedUsesWith(Value *From, Value *To,		static unsigned replaceDominatedUsesWith(Value *From, Value *To,
const RootType &Root,		const RootType &Root,
const DominatesFn &Dominates) {		const DominatesFn &Dominates) {
assert(From->getType() == To->getType());		assert(From->getType() == To->getType());
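
The new `MD_access_group` case in `combineMetadata` keeps only what `intersectAccessGroups(K, J)` returns. A rough model of that rule is sketched below, assuming the result is simply the set of groups common to both instructions; the names are hypothetical and any special cases handled by the real helper are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

// Hypothetical stand-in: an access group list as a set of integer IDs.
using AccessGroupSet = std::set<int>;

// Model of combining two instructions into one: the surviving instruction
// may only claim membership in groups that both originals belonged to.
AccessGroupSet intersectAccessGroupsModel(const AccessGroupSet &A,
                                          const AccessGroupSet &B) {
  AccessGroupSet Result;
  // std::set iterates in sorted order, which std::set_intersection requires.
  std::set_intersection(A.begin(), A.end(), B.begin(), B.end(),
                        std::inserter(Result, Result.begin()));
  return Result;
}
```

Intersecting is the conservative choice here: a group that only one of the two instructions carried says nothing about the other, so it is dropped rather than kept.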

llvm/trunk/lib/Transforms/Utils/LoopUtils.cpp

void llvm::initializeLoopPassPass(PassRegistry &Registry) {
INITIALIZE_PASS_DEPENDENCY(LCSSAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(LCSSAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(BasicAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(BasicAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(SCEVAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(SCEVAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass)		INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass)
}		}

static Optional<MDNode *> findOptionMDForLoopID(MDNode *LoopID,
StringRef Name) {
// Return none if LoopID is false.
if (!LoopID)
return None;

// First operand should refer to the loop id itself.
assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
assert(LoopID->getOperand(0) == LoopID && "invalid loop id");

// Iterate over LoopID operands and look for MDString Metadata
for (unsigned i = 1, e = LoopID->getNumOperands(); i < e; ++i) {
MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i));
if (!MD)
continue;
MDString *S = dyn_cast<MDString>(MD->getOperand(0));
if (!S)
continue;
// Return true if MDString holds expected MetaData.
if (Name.equals(S->getString()))
return MD;
}
return None;
}

static Optional<MDNode *> findOptionMDForLoop(const Loop *TheLoop,
StringRef Name) {
return findOptionMDForLoopID(TheLoop->getLoopID(), Name);
}

/// Find string metadata for loop		/// Find string metadata for loop
///		///
/// If it has a value (e.g. {"llvm.distribute", 1} return the value as an		/// If it has a value (e.g. {"llvm.distribute", 1} return the value as an
/// operand or null otherwise. If the string metadata is not found return		/// operand or null otherwise. If the string metadata is not found return
/// Optional's not-a-value.		/// Optional's not-a-value.
Optional<const MDOperand *> llvm::findStringMetadataForLoop(Loop *TheLoop,		Optional<const MDOperand *> llvm::findStringMetadataForLoop(const Loop *TheLoop,
StringRef Name) {		StringRef Name) {
auto MD = findOptionMDForLoop(TheLoop, Name).getValueOr(nullptr);		MDNode *MD = findOptionMDForLoop(TheLoop, Name);
if (!MD)		if (!MD)
return None;		return None;
switch (MD->getNumOperands()) {		switch (MD->getNumOperands()) {
case 1:		case 1:
return nullptr;		return nullptr;
case 2:		case 2:
return &MD->getOperand(1);		return &MD->getOperand(1);
default:		default:
llvm_unreachable("loop metadata has 0 or 1 operand");		llvm_unreachable("loop metadata has 0 or 1 operand");
}		}
}		}

static Optional<bool> getOptionalBoolLoopAttribute(const Loop *TheLoop,		static Optional<bool> getOptionalBoolLoopAttribute(const Loop *TheLoop,
StringRef Name) {		StringRef Name) {
Optional<MDNode *> MD = findOptionMDForLoop(TheLoop, Name);		MDNode *MD = findOptionMDForLoop(TheLoop, Name);
if (!MD.hasValue())		if (!MD)
return None;
MDNode *OptionNode = MD.getValue();
if (OptionNode == nullptr)
return None;		return None;
switch (OptionNode->getNumOperands()) {		switch (MD->getNumOperands()) {
case 1:		case 1:
// When the value is absent it is interpreted as 'attribute set'.		// When the value is absent it is interpreted as 'attribute set'.
return true;		return true;
case 2:		case 2:
return mdconst::extract_or_null<ConstantInt>(		return mdconst::extract_or_null<ConstantInt>(MD->getOperand(1).get());
OptionNode->getOperand(1).get());
}		}
llvm_unreachable("unexpected number of options");		llvm_unreachable("unexpected number of options");
}		}

static bool getBooleanLoopAttribute(const Loop *TheLoop, StringRef Name) {		static bool getBooleanLoopAttribute(const Loop *TheLoop, StringRef Name) {
return getOptionalBoolLoopAttribute(TheLoop, Name).getValueOr(false);		return getOptionalBoolLoopAttribute(TheLoop, Name).getValueOr(false);
}		}

if (InheritAllAttrs || InheritSomeAttrs) {
}		}
} else {		} else {
// Modified if we dropped at least one attribute.		// Modified if we dropped at least one attribute.
Changed = OrigLoopID->getNumOperands() > 1;		Changed = OrigLoopID->getNumOperands() > 1;
}		}

bool HasAnyFollowup = false;		bool HasAnyFollowup = false;
for (StringRef OptionName : FollowupOptions) {		for (StringRef OptionName : FollowupOptions) {
MDNode *FollowupNode =		MDNode *FollowupNode = findOptionMDForLoopID(OrigLoopID, OptionName);
findOptionMDForLoopID(OrigLoopID, OptionName).getValueOr(nullptr);
if (!FollowupNode)		if (!FollowupNode)
continue;		continue;

HasAnyFollowup = true;		HasAnyFollowup = true;
for (const MDOperand &Option : drop_begin(FollowupNode->operands(), 1)) {		for (const MDOperand &Option : drop_begin(FollowupNode->operands(), 1)) {
MDs.push_back(Option.get());		MDs.push_back(Option.get());
Changed = true;		Changed = true;
}		}
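
On the right-hand side of the LoopUtils.cpp hunks, callers treat the option lookup result as a plain `MDNode *` and test it for null, where the left-hand side unwrapped an `Optional<MDNode *>`. The sketch below models that lookup convention with simple structs; `OptionModel` and `findOptionModel` are hypothetical names and do not use LLVM's metadata classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one loop option node such as
// !{!"llvm.loop.parallel_accesses", !6}: a name followed by its arguments.
struct OptionModel {
  std::string Name;
  std::vector<int> Args;
};

// Returns the first option whose name matches, or nullptr when the loop
// carries no such option. The null pointer alone signals "not found".
const OptionModel *findOptionModel(const std::vector<OptionModel> &Options,
                                   const std::string &Name) {
  for (const OptionModel &Opt : Options)
    if (Opt.Name == Name)
      return &Opt;
  return nullptr;
}
```

Under this convention a call site needs only one null check, which mirrors how `.getValueOr(nullptr)` and `MD.hasValue()` disappear from the new column.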

llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp

if (isa<DbgInfoIntrinsic>(I1) || isa<DbgInfoIntrinsic>(I2)) {
LLVMContext::MD_range,		LLVMContext::MD_range,
LLVMContext::MD_fpmath,		LLVMContext::MD_fpmath,
LLVMContext::MD_invariant_load,		LLVMContext::MD_invariant_load,
LLVMContext::MD_nonnull,		LLVMContext::MD_nonnull,
LLVMContext::MD_invariant_group,		LLVMContext::MD_invariant_group,
LLVMContext::MD_align,		LLVMContext::MD_align,
LLVMContext::MD_dereferenceable,		LLVMContext::MD_dereferenceable,
LLVMContext::MD_dereferenceable_or_null,		LLVMContext::MD_dereferenceable_or_null,
LLVMContext::MD_mem_parallel_loop_access};		LLVMContext::MD_mem_parallel_loop_access,
		LLVMContext::MD_access_group};
combineMetadata(I1, I2, KnownIDs, true);		combineMetadata(I1, I2, KnownIDs, true);

// I1 and I2 are being combined into a single instruction. Its debug		// I1 and I2 are being combined into a single instruction. Its debug
// location is the merged locations of the original instructions.		// location is the merged locations of the original instructions.
I1->applyMergedLocation(I1->getDebugLoc(), I2->getDebugLoc());		I1->applyMergedLocation(I1->getDebugLoc(), I2->getDebugLoc());

I2->eraseFromParent();		I2->eraseFromParent();
Changed = true;		Changed = true;

llvm/trunk/test/Analysis/LoopInfo/annotated-parallel-complex.ll

				; RUN: opt -loops -analyze < %s | FileCheck %s
				;
				; void func(long n, double A[static const restrict 4*n], double B[static const restrict 4*n]) {
				; for (long i = 0; i < n; i += 1)
				; for (long j = 0; j < n; j += 1)
				; for (long k = 0; k < n; k += 1)
				; for (long l = 0; l < n; l += 1) {
				; A[i + j + k + l] = 21;
				; B[i + j + k + l] = 42;
				; }
				; }
				;
				; Check that isAnnotatedParallel is working as expected.
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @func(i64 %n, double* noalias nonnull %A, double* noalias nonnull %B) {
				entry:
				br label %for.cond

				for.cond:
				%i.0 = phi i64 [ 0, %entry ], [ %add28, %for.inc27 ]
				%cmp = icmp slt i64 %i.0, %n
				br i1 %cmp, label %for.cond2, label %for.end29

				for.cond2:
				%j.0 = phi i64 [ %add25, %for.inc24 ], [ 0, %for.cond ]
				%cmp3 = icmp slt i64 %j.0, %n
				br i1 %cmp3, label %for.cond6, label %for.inc27

				for.cond6:
				%k.0 = phi i64 [ %add22, %for.inc21 ], [ 0, %for.cond2 ]
				%cmp7 = icmp slt i64 %k.0, %n
				br i1 %cmp7, label %for.cond10, label %for.inc24

				for.cond10:
				%l.0 = phi i64 [ %add20, %for.body13 ], [ 0, %for.cond6 ]
				%cmp11 = icmp slt i64 %l.0, %n
				br i1 %cmp11, label %for.body13, label %for.inc21

				for.body13:
				%add = add nuw nsw i64 %i.0, %j.0
				%add14 = add nuw nsw i64 %add, %k.0
				%add15 = add nuw nsw i64 %add14, %l.0
				%arrayidx = getelementptr inbounds double, double* %A, i64 %add15
				store double 2.100000e+01, double* %arrayidx, align 8, !llvm.access.group !5
				%add16 = add nuw nsw i64 %i.0, %j.0
				%add17 = add nuw nsw i64 %add16, %k.0
				%add18 = add nuw nsw i64 %add17, %l.0
				%arrayidx19 = getelementptr inbounds double, double* %B, i64 %add18
				store double 4.200000e+01, double* %arrayidx19, align 8, !llvm.access.group !6
				%add20 = add nuw nsw i64 %l.0, 1
				br label %for.cond10, !llvm.loop !11

				for.inc21:
				%add22 = add nuw nsw i64 %k.0, 1
				br label %for.cond6, !llvm.loop !14

				for.inc24:
				%add25 = add nuw nsw i64 %j.0, 1
				br label %for.cond2, !llvm.loop !16

				for.inc27:
				%add28 = add nuw nsw i64 %i.0, 1
				br label %for.cond, !llvm.loop !18

				for.end29:
				ret void
				}

				; access groups
				!7 = distinct !{}
				!8 = distinct !{}
				!10 = distinct !{}

				; access group lists
				!5 = !{!7, !10}
				!6 = !{!7, !8, !10}

				; LoopIDs
				!11 = distinct !{!11, !{!"llvm.loop.parallel_accesses", !10}}
				!14 = distinct !{!14, !{!"llvm.loop.parallel_accesses", !8, !10}}
				!16 = distinct !{!16, !{!"llvm.loop.parallel_accesses", !8}}
				!18 = distinct !{!18, !{!"llvm.loop.parallel_accesses", !7}}


				; CHECK: Parallel Loop at depth 1
				; CHECK-NOT: Parallel
				; CHECK: Loop at depth 2
				; CHECK: Parallel Loop
				; CHECK: Parallel Loop
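
The CHECK lines above follow from comparing each loop's `llvm.loop.parallel_accesses` groups against the access group lists on the two stores. A minimal model of that check is sketched below, assuming a loop counts as parallel when every memory access belongs to at least one group the loop names; the names are hypothetical and this is not LLVM's `isAnnotatedParallel` implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in: an access group is an int ID (the metadata number
// in the test), and each memory access carries a set of such IDs.
using AccessGroupSet = std::set<int>;

// Model: the loop is parallel when every access shares at least one group
// with the loop's parallel_accesses list.
bool isAnnotatedParallelModel(const AccessGroupSet &ParallelAccesses,
                              const std::vector<AccessGroupSet> &Accesses) {
  return std::all_of(
      Accesses.begin(), Accesses.end(), [&](const AccessGroupSet &Groups) {
        return std::any_of(Groups.begin(), Groups.end(), [&](int G) {
          return ParallelAccesses.count(G) != 0;
        });
      });
}
```

Fed the stores' lists {7, 10} and {7, 8, 10}, the model accepts the loops naming {7}, {8, 10}, and {10} and rejects the one naming only {8}, because the first store is not in group 8. That matches the expected output: parallel at depth 1, not parallel at depth 2, parallel for the two inner loops.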

llvm/trunk/test/Analysis/LoopInfo/annotated-parallel-simple.ll

				; RUN: opt -loops -analyze < %s | FileCheck %s
				;
				; void func(long n, double A[static const restrict n]) {
				; for (long i = 0; i < n; i += 1)
				; A[i] = 21;
				; }
				;
				; Check that isAnnotatedParallel is working as expected.
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @func(i64 %n, double* noalias nonnull %A) {
				entry:
				br label %for.cond

				for.cond:
				%i.0 = phi i64 [ 0, %entry ], [ %add, %for.body ]
				%cmp = icmp slt i64 %i.0, %n
				br i1 %cmp, label %for.body, label %for.end

				for.body:
				%arrayidx = getelementptr inbounds double, double* %A, i64 %i.0
				store double 2.100000e+01, double* %arrayidx, align 8, !llvm.access.group !6
				%add = add nuw nsw i64 %i.0, 1
				br label %for.cond, !llvm.loop !7

				for.end:
				ret void
				}

				!6 = distinct !{} ; access group

				!7 = distinct !{!7, !9} ; LoopID
				!9 = !{!"llvm.loop.parallel_accesses", !6}


				; CHECK: Parallel Loop

llvm/trunk/test/ThinLTO/X86/lazyload_metadata.ll

	; Do setup work for all below tests: generate bitcode and combined index			; Do setup work for all below tests: generate bitcode and combined index
	; RUN: opt -module-summary %s -o %t.bc -bitcode-mdindex-threshold=0			; RUN: opt -module-summary %s -o %t.bc -bitcode-mdindex-threshold=0
	; RUN: opt -module-summary %p/Inputs/lazyload_metadata.ll -o %t2.bc -bitcode-mdindex-threshold=0			; RUN: opt -module-summary %p/Inputs/lazyload_metadata.ll -o %t2.bc -bitcode-mdindex-threshold=0
	; RUN: llvm-lto -thinlto-action=thinlink -o %t3.bc %t.bc %t2.bc			; RUN: llvm-lto -thinlto-action=thinlink -o %t3.bc %t.bc %t2.bc
	; REQUIRES: asserts			; REQUIRES: asserts

	; Check that importing @globalfunc1 does not trigger loading all the global			; Check that importing @globalfunc1 does not trigger loading all the global
	; metadata for @globalfunc2 and @globalfunc3			; metadata for @globalfunc2 and @globalfunc3

	; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \			; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \
	; RUN: -o /dev/null -stats \			; RUN: -o /dev/null -stats \
	; RUN: 2>&1 | FileCheck %s -check-prefix=LAZY			; RUN: 2>&1 | FileCheck %s -check-prefix=LAZY
	; LAZY: 55 bitcode-reader - Number of Metadata records loaded			; LAZY: 57 bitcode-reader - Number of Metadata records loaded
	; LAZY: 2 bitcode-reader - Number of MDStrings loaded			; LAZY: 2 bitcode-reader - Number of MDStrings loaded

	; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \			; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \
	; RUN: -o /dev/null -disable-ondemand-mds-loading -stats \			; RUN: -o /dev/null -disable-ondemand-mds-loading -stats \
	; RUN: 2>&1 | FileCheck %s -check-prefix=NOTLAZY			; RUN: 2>&1 | FileCheck %s -check-prefix=NOTLAZY
	; NOTLAZY: 64 bitcode-reader - Number of Metadata records loaded			; NOTLAZY: 66 bitcode-reader - Number of Metadata records loaded
	; NOTLAZY: 7 bitcode-reader - Number of MDStrings loaded			; NOTLAZY: 7 bitcode-reader - Number of MDStrings loaded


	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.11.0"			target triple = "x86_64-apple-macosx10.11.0"

	define void @globalfunc1(i32 %arg) {			define void @globalfunc1(i32 %arg) {
	%x = call i1 @llvm.type.test(i8* undef, metadata !"typeid1")			%x = call i1 @llvm.type.test(i8* undef, metadata !"typeid1")

llvm/trunk/test/Transforms/Inline/parallel-loop-md-callee.ll

				; RUN: opt -S -inline < %s | FileCheck %s
				;
				; Check that the !llvm.access.group is still present after inlining.
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @Body(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p, i32 %i) {
				entry:
				%idxprom = sext i32 %i to i64
				%arrayidx = getelementptr inbounds i32, i32* %p, i64 %idxprom
				%0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !0
				%cmp = icmp eq i32 %0, 0
				%arrayidx2 = getelementptr inbounds i32, i32* %res, i64 %idxprom
				%1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !0
				br i1 %cmp, label %cond.end, label %cond.false

				cond.false:
				%arrayidx6 = getelementptr inbounds i32, i32* %d, i64 %idxprom
				%2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !0
				%add = add nsw i32 %2, %1
				br label %cond.end

				cond.end:
				%cond = phi i32 [ %add, %cond.false ], [ %1, %entry ]
				store i32 %cond, i32* %arrayidx2, align 4
				ret void
				}

				define void @Test(i32* %res, i32* %c, i32* %d, i32* %p, i32 %n) {
				entry:
				br label %for.cond

				for.cond:
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%cmp = icmp slt i32 %i.0, 1600
				br i1 %cmp, label %for.body, label %for.end

				for.body:
				call void @Body(i32* %res, i32* undef, i32* %d, i32* %p, i32 %i.0), !llvm.access.group !0
				%inc = add nsw i32 %i.0, 1
				br label %for.cond, !llvm.loop !1

				for.end:
				ret void
				}

				!0 = distinct !{} ; access group
				!1 = distinct !{!1, !{!"llvm.loop.parallel_accesses", !0}} ; LoopID


				; CHECK-LABEL: @Test
				; CHECK: load i32,{{.*}}, !llvm.access.group !0
				; CHECK: load i32,{{.*}}, !llvm.access.group !0
				; CHECK: load i32,{{.*}}, !llvm.access.group !0
				; CHECK: store i32 {{.*}}, !llvm.access.group !0
				; CHECK: br label %for.cond, !llvm.loop !1

llvm/trunk/test/Transforms/Inline/parallel-loop-md-merge.ll

				; RUN: opt -always-inline -globalopt -S < %s | FileCheck %s
				;
				; static void __attribute__((always_inline)) callee(long n, double A[static const restrict n], long i) {
				; for (long j = 0; j < n; j += 1)
				; A[i * n + j] = 42;
				; }
				;
				; void caller(long n, double A[static const restrict n]) {
				; for (long i = 0; i < n; i += 1)
				; callee(n, A, i);
				; }
				;
				; Check that the access groups (llvm.access.group) are correctly merged.
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define internal void @callee(i64 %n, double* noalias nonnull %A, i64 %i) #0 {
				entry:
				br label %for.cond

				for.cond:
				%j.0 = phi i64 [ 0, %entry ], [ %add1, %for.body ]
				%cmp = icmp slt i64 %j.0, %n
				br i1 %cmp, label %for.body, label %for.end

				for.body:
				%mul = mul nsw i64 %i, %n
				%add = add nsw i64 %mul, %j.0
				%arrayidx = getelementptr inbounds double, double* %A, i64 %add
				store double 4.200000e+01, double* %arrayidx, align 8, !llvm.access.group !6
				%add1 = add nuw nsw i64 %j.0, 1
				br label %for.cond, !llvm.loop !7

				for.end:
				ret void
				}

				attributes #0 = { alwaysinline }

				!6 = distinct !{} ; access group
				!7 = distinct !{!7, !9} ; LoopID
				!9 = !{!"llvm.loop.parallel_accesses", !6}


				define void @caller(i64 %n, double* noalias nonnull %A) {
				entry:
				br label %for.cond

				for.cond:
				%i.0 = phi i64 [ 0, %entry ], [ %add, %for.body ]
				%cmp = icmp slt i64 %i.0, %n
				br i1 %cmp, label %for.body, label %for.end

				for.body:
				call void @callee(i64 %n, double* %A, i64 %i.0), !llvm.access.group !10
				%add = add nuw nsw i64 %i.0, 1
				br label %for.cond, !llvm.loop !11

				for.end:
				ret void
				}

				!10 = distinct !{} ; access group
				!11 = distinct !{!11, !12} ; LoopID
				!12 = !{!"llvm.loop.parallel_accesses", !10}


				; CHECK: store double 4.200000e+01, {{.*}} !llvm.access.group ![[ACCESS_GROUP_LIST_3:[0-9]+]]
				; CHECK: br label %for.cond.i, !llvm.loop ![[LOOP_INNER:[0-9]+]]
				; CHECK: br label %for.cond, !llvm.loop ![[LOOP_OUTER:[0-9]+]]

				; CHECK: ![[ACCESS_GROUP_LIST_3]] = !{![[ACCESS_GROUP_INNER:[0-9]+]], ![[ACCESS_GROUP_OUTER:[0-9]+]]}
				; CHECK: ![[ACCESS_GROUP_INNER]] = distinct !{}
				; CHECK: ![[ACCESS_GROUP_OUTER]] = distinct !{}
				; CHECK: ![[LOOP_INNER]] = distinct !{![[LOOP_INNER]], ![[ACCESSES_INNER:[0-9]+]]}
				; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}
				; CHECK: ![[LOOP_OUTER]] = distinct !{![[LOOP_OUTER]], ![[ACCESSES_OUTER:[0-9]+]]}
				; CHECK: ![[ACCESSES_OUTER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_OUTER]]}
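
The CHECK lines of the two inlining tests above suggest a simple merge rule: an inlined memory access ends up carrying its own access groups plus those of the call site it was inlined through. The following is a minimal Python model of that expectation, not the code in InlineFunction.cpp; the function names `merge_on_inline` and `as_metadata` and the string labels for the groups are hypothetical and exist only for illustration.

```python
def merge_on_inline(inst_groups, callsite_groups):
    """Model: access groups an instruction carries after being inlined.

    inst_groups     -- groups on the instruction inside the callee
    callsite_groups -- groups on the call instruction in the caller
    Order is preserved and duplicates are dropped.
    """
    merged = []
    for group in list(inst_groups) + list(callsite_groups):
        if group not in merged:
            merged.append(group)
    return merged


def as_metadata(groups):
    """Model of the operand shape: nothing, one group, or a list of groups."""
    if not groups:
        return None
    if len(groups) == 1:
        return groups[0]
    return tuple(groups)


# parallel-loop-md-merge.ll: the store belongs to the inner loop's group and
# the call site belongs to the outer loop's group.
print(as_metadata(merge_on_inline(["inner"], ["outer"])))

# parallel-loop-md-callee.ll: the store in @Body has no group of its own,
# the loads already use the same group as the call site.
print(as_metadata(merge_on_inline([], ["g0"])))
print(as_metadata(merge_on_inline(["g0"], ["g0"])))
```

In this model the merged store references a two-element list, which corresponds to the `ACCESS_GROUP_LIST_3` node checked above, while both loop IDs keep referring to their own single access group.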

llvm/trunk/test/Transforms/Inline/parallel-loop-md.ll

entry:		entry:
br label %for.cond		br label %for.cond

for.cond: ; preds = %for.body, %entry		for.cond: ; preds = %for.body, %entry
%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]		%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
%cmp = icmp slt i32 %i.0, 1600		%cmp = icmp slt i32 %i.0, 1600
br i1 %cmp, label %for.body, label %for.end		br i1 %cmp, label %for.body, label %for.end

for.body: ; preds = %for.cond		for.body: ; preds = %for.cond
call void @Body(i32* %res, i32* undef, i32* %d, i32* %p, i32 %i.0), !llvm.mem.parallel_loop_access !0		call void @Body(i32* %res, i32* undef, i32* %d, i32* %p, i32 %i.0), !llvm.access.group !0
%inc = add nsw i32 %i.0, 1		%inc = add nsw i32 %i.0, 1
br label %for.cond, !llvm.loop !0		br label %for.cond, !llvm.loop !1

for.end: ; preds = %for.cond		for.end: ; preds = %for.cond
ret void		ret void
}		}

; CHECK-LABEL: @Test		; CHECK-LABEL: @Test
; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0		; CHECK: load i32,{{.*}}, !llvm.access.group !0
; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0		; CHECK: load i32,{{.*}}, !llvm.access.group !0
; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0		; CHECK: load i32,{{.*}}, !llvm.access.group !0
; CHECK: store i32{{.*}}, !llvm.mem.parallel_loop_access !0		; CHECK: store i32{{.*}}, !llvm.access.group !0
; CHECK: br label %for.cond, !llvm.loop !0		; CHECK: br label %for.cond, !llvm.loop !1

attributes #0 = { norecurse nounwind uwtable }		attributes #0 = { norecurse nounwind uwtable }

!0 = distinct !{!0}		!0 = distinct !{}
		!1 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !0}}

llvm/trunk/test/Transforms/InstCombine/intersect-accessgroup.ll

				; RUN: opt -instcombine -S < %s | FileCheck %s
				;
				; void func(long n, double A[static const restrict n]) {
				; for (int i = 0; i < n; i+=1)
				; for (int j = 0; j < n;j+=1)
				; for (int k = 0; k < n; k += 1)
				; for (int l = 0; l < n; l += 1) {
				; double *p = &A[i + j + k + l];
				; double x = *p;
				; double y = *p;
				; arg(x + y);
				; }
				; }
				;
				; Check for correctly merging access group metadata for instcombine
				; (only common loops are parallel == intersection)
				; Note that combined load would be parallel to loop !16 since both
				; origin loads are parallel to it, but it references two access groups
				; (!8 and !9), neither of which contain both loads. As such, the
				; information that the combined load is parallel to !16 is lost.
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				declare void @arg(double)

				define void @func(i64 %n, double* noalias nonnull %A) {
				entry:
				br label %for.cond

				for.cond:
				%i.0 = phi i32 [ 0, %entry ], [ %add31, %for.inc30 ]
				%conv = sext i32 %i.0 to i64
				%cmp = icmp slt i64 %conv, %n
				br i1 %cmp, label %for.cond2, label %for.end32

				for.cond2:
				%j.0 = phi i32 [ %add28, %for.inc27 ], [ 0, %for.cond ]
				%conv3 = sext i32 %j.0 to i64
				%cmp4 = icmp slt i64 %conv3, %n
				br i1 %cmp4, label %for.cond8, label %for.inc30

				for.cond8:
				%k.0 = phi i32 [ %add25, %for.inc24 ], [ 0, %for.cond2 ]
				%conv9 = sext i32 %k.0 to i64
				%cmp10 = icmp slt i64 %conv9, %n
				br i1 %cmp10, label %for.cond14, label %for.inc27

				for.cond14:
				%l.0 = phi i32 [ %add23, %for.body19 ], [ 0, %for.cond8 ]
				%conv15 = sext i32 %l.0 to i64
				%cmp16 = icmp slt i64 %conv15, %n
				br i1 %cmp16, label %for.body19, label %for.inc24

				for.body19:
				%add = add nsw i32 %i.0, %j.0
				%add20 = add nsw i32 %add, %k.0
				%add21 = add nsw i32 %add20, %l.0
				%idxprom = sext i32 %add21 to i64
				%arrayidx = getelementptr inbounds double, double* %A, i64 %idxprom
				%0 = load double, double* %arrayidx, align 8, !llvm.access.group !1
				%1 = load double, double* %arrayidx, align 8, !llvm.access.group !2
				%add22 = fadd double %0, %1
				call void @arg(double %add22), !llvm.access.group !3
				%add23 = add nsw i32 %l.0, 1
				br label %for.cond14, !llvm.loop !11

				for.inc24:
				%add25 = add nsw i32 %k.0, 1
				br label %for.cond8, !llvm.loop !14

				for.inc27:
				%add28 = add nsw i32 %j.0, 1
				br label %for.cond2, !llvm.loop !16

				for.inc30:
				%add31 = add nsw i32 %i.0, 1
				br label %for.cond, !llvm.loop !18

				for.end32:
				ret void
				}


				; access groups
				!7 = distinct !{}
				!8 = distinct !{}
				!9 = distinct !{}

				; access group lists
				!1 = !{!7, !9}
				!2 = !{!7, !8}
				!3 = !{!7, !8, !9}

				!11 = distinct !{!11, !13}
				!13 = !{!"llvm.loop.parallel_accesses", !7}

				!14 = distinct !{!14, !15}
				!15 = !{!"llvm.loop.parallel_accesses", !8}

				!16 = distinct !{!16, !17}
				!17 = !{!"llvm.loop.parallel_accesses", !8, !9}

				!18 = distinct !{!18, !19}
				!19 = !{!"llvm.loop.parallel_accesses", !9}


				; CHECK: load double, {{.*}} !llvm.access.group ![[ACCESSGROUP_0:[0-9]+]]
				; CHECK: br label %for.cond14, !llvm.loop ![[LOOP_4:[0-9]+]]

				; CHECK: ![[ACCESSGROUP_0]] = distinct !{}

				; CHECK: ![[LOOP_4]] = distinct !{![[LOOP_4]], ![[PARALLEL_ACCESSES_5:[0-9]+]]}
				; CHECK: ![[PARALLEL_ACCESSES_5]] = !{!"llvm.loop.parallel_accesses", ![[ACCESSGROUP_0]]}
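
The header comment of this test describes an intersection rule and the information it loses. A small Python model makes the arithmetic concrete. It assumes that an access counts as parallel for a loop when at least one of its access groups is listed in that loop's `llvm.loop.parallel_accesses`; the helper names are hypothetical, and the metadata numbers are copied from the test input above.

```python
def intersect_groups(a, b):
    """Groups kept when two accesses are combined: only the common ones."""
    return [g for g in a if g in b]


def is_parallel_in(access_groups, loop_parallel_accesses):
    """True if any group of the access is listed by the loop."""
    return any(g in loop_parallel_accesses for g in access_groups)


# Loop ID -> access groups named in its llvm.loop.parallel_accesses.
LOOPS = {
    "!11": ["!7"],
    "!14": ["!8"],
    "!16": ["!8", "!9"],
    "!18": ["!9"],
}


def parallel_loops(access_groups):
    """All loops for which an access with these groups is parallel."""
    return [loop for loop, listed in LOOPS.items()
            if is_parallel_in(access_groups, listed)]


load0 = ["!7", "!9"]   # access group list !1
load1 = ["!7", "!8"]   # access group list !2
combined = intersect_groups(load0, load1)

print(parallel_loops(load0))
print(parallel_loops(load1))
print(combined, parallel_loops(combined))
```

Under these assumptions both original loads are parallel for `!11` and `!16`, but the combined load keeps only `!7` and is therefore parallel for `!11` alone. That matches the single distinct access group expected by the CHECK lines and the loss of `!16` noted in the test comment.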

llvm/trunk/test/Transforms/InstCombine/loadstore-metadata.ll

entry:		entry:
%l = load i32, i32* %ptr, !range !5		%l = load i32, i32* %ptr, !range !5
%c = bitcast i32 %l to float		%c = bitcast i32 %l to float
ret float %c		ret float %c
}		}

define i32 @test_load_cast_combine_invariant(float* %ptr) {		define i32 @test_load_cast_combine_invariant(float* %ptr) {
; Ensure (cast (load (...))) -> (load (cast (...))) preserves invariant metadata.		; Ensure (cast (load (...))) -> (load (cast (...))) preserves invariant metadata.
; CHECK-LABEL: @test_load_cast_combine_invariant(		; CHECK-LABEL: @test_load_cast_combine_invariant(
; CHECK: load i32, i32* %{{.*}}, !invariant.load !5		; CHECK: load i32, i32* %{{.*}}, !invariant.load !7
entry:		entry:
%l = load float, float* %ptr, !invariant.load !6		%l = load float, float* %ptr, !invariant.load !6
%c = bitcast float %l to i32		%c = bitcast float %l to i32
ret i32 %c		ret i32 %c
}		}

define i32 @test_load_cast_combine_nontemporal(float* %ptr) {		define i32 @test_load_cast_combine_nontemporal(float* %ptr) {
; Ensure (cast (load (...))) -> (load (cast (...))) preserves nontemporal		; Ensure (cast (load (...))) -> (load (cast (...))) preserves nontemporal
; metadata.		; metadata.
; CHECK-LABEL: @test_load_cast_combine_nontemporal(		; CHECK-LABEL: @test_load_cast_combine_nontemporal(
; CHECK: load i32, i32* %{{.*}}, !nontemporal !6		; CHECK: load i32, i32* %{{.*}}, !nontemporal !8
entry:		entry:
%l = load float, float* %ptr, !nontemporal !7		%l = load float, float* %ptr, !nontemporal !7
%c = bitcast float %l to i32		%c = bitcast float %l to i32
ret i32 %c		ret i32 %c
}		}

define i8* @test_load_cast_combine_align(i32** %ptr) {		define i8* @test_load_cast_combine_align(i32** %ptr) {
; Ensure (cast (load (...))) -> (load (cast (...))) preserves align		; Ensure (cast (load (...))) -> (load (cast (...))) preserves align
; metadata.		; metadata.
; CHECK-LABEL: @test_load_cast_combine_align(		; CHECK-LABEL: @test_load_cast_combine_align(
; CHECK: load i8*, i8** %{{.*}}, !align !7		; CHECK: load i8*, i8** %{{.*}}, !align !9
entry:		entry:
%l = load i32*, i32** %ptr, !align !8		%l = load i32*, i32** %ptr, !align !8
%c = bitcast i32* %l to i8*		%c = bitcast i32* %l to i8*
ret i8* %c		ret i8* %c
}		}

define i8* @test_load_cast_combine_deref(i32** %ptr) {		define i8* @test_load_cast_combine_deref(i32** %ptr) {
; Ensure (cast (load (...))) -> (load (cast (...))) preserves dereferenceable		; Ensure (cast (load (...))) -> (load (cast (...))) preserves dereferenceable
; metadata.		; metadata.
; CHECK-LABEL: @test_load_cast_combine_deref(		; CHECK-LABEL: @test_load_cast_combine_deref(
; CHECK: load i8*, i8** %{{.*}}, !dereferenceable !7		; CHECK: load i8*, i8** %{{.*}}, !dereferenceable !9
entry:		entry:
%l = load i32*, i32** %ptr, !dereferenceable !8		%l = load i32*, i32** %ptr, !dereferenceable !8
%c = bitcast i32* %l to i8*		%c = bitcast i32* %l to i8*
ret i8* %c		ret i8* %c
}		}

define i8* @test_load_cast_combine_deref_or_null(i32** %ptr) {		define i8* @test_load_cast_combine_deref_or_null(i32** %ptr) {
; Ensure (cast (load (...))) -> (load (cast (...))) preserves		; Ensure (cast (load (...))) -> (load (cast (...))) preserves
; dereferenceable_or_null metadata.		; dereferenceable_or_null metadata.
; CHECK-LABEL: @test_load_cast_combine_deref_or_null(		; CHECK-LABEL: @test_load_cast_combine_deref_or_null(
; CHECK: load i8*, i8** %{{.*}}, !dereferenceable_or_null !7		; CHECK: load i8*, i8** %{{.*}}, !dereferenceable_or_null !9
entry:		entry:
%l = load i32*, i32** %ptr, !dereferenceable_or_null !8		%l = load i32*, i32** %ptr, !dereferenceable_or_null !8
%c = bitcast i32* %l to i8*		%c = bitcast i32* %l to i8*
ret i8* %c		ret i8* %c
}		}

define void @test_load_cast_combine_loop(float* %src, i32* %dst, i32 %n) {		define void @test_load_cast_combine_loop(float* %src, i32* %dst, i32 %n) {
; Ensure (cast (load (...))) -> (load (cast (...))) preserves loop access		; Ensure (cast (load (...))) -> (load (cast (...))) preserves loop access
; metadata.		; metadata.
; CHECK-LABEL: @test_load_cast_combine_loop(		; CHECK-LABEL: @test_load_cast_combine_loop(
; CHECK: load i32, i32* %{{.*}}, !llvm.mem.parallel_loop_access !4		; CHECK: load i32, i32* %{{.*}}, !llvm.access.group !6
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%i = phi i32 [ 0, %entry ], [ %i.next, %loop ]		%i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
%src.gep = getelementptr inbounds float, float* %src, i32 %i		%src.gep = getelementptr inbounds float, float* %src, i32 %i
%dst.gep = getelementptr inbounds i32, i32* %dst, i32 %i		%dst.gep = getelementptr inbounds i32, i32* %dst, i32 %i
%l = load float, float* %src.gep, !llvm.mem.parallel_loop_access !4		%l = load float, float* %src.gep, !llvm.access.group !9
%c = bitcast float %l to i32		%c = bitcast float %l to i32
store i32 %c, i32* %dst.gep		store i32 %c, i32* %dst.gep
%i.next = add i32 %i, 1		%i.next = add i32 %i, 1
%cmp = icmp slt i32 %i.next, %n		%cmp = icmp slt i32 %i.next, %n
br i1 %cmp, label %loop, label %exit, !llvm.loop !1		br i1 %cmp, label %loop, label %exit, !llvm.loop !1

exit:		exit:
ret void		ret void
}		}

; This is the metadata tuple that we reference above:		; This is the metadata tuple that we reference above:
; CHECK: ![[MD]] = !{i64 1, i64 0}		; CHECK: ![[MD]] = !{i64 1, i64 0}
!0 = !{!1, !1, i64 0}		!0 = !{!1, !1, i64 0}
!1 = !{!"scalar type", !2}		!1 = !{!"scalar type", !2}
!2 = !{!"root"}		!2 = !{!"root"}
!3 = distinct !{!3, !4}		!3 = distinct !{!3, !4}
!4 = distinct !{!4}		!4 = distinct !{!4, !{!"llvm.loop.parallel_accesses", !9}}
!5 = !{i32 0, i32 42}		!5 = !{i32 0, i32 42}
!6 = !{}		!6 = !{}
!7 = !{i32 1}		!7 = !{i32 1}
!8 = !{i64 8}		!8 = !{i64 8}
		!9 = distinct !{}

llvm/trunk/test/Transforms/InstCombine/mem-par-metadata-memcpy.ll

	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -S | FileCheck %s
	;			;
	; Make sure the llvm.mem.parallel_loop_access meta-data is preserved			; Make sure the llvm.access.group meta-data is preserved
	; when a memcpy is replaced with a load+store by instcombine			; when a memcpy is replaced with a load+store by instcombine
	;			;
	; #include <string.h>			; #include <string.h>
	; void test(char* out, long size)			; void test(char* out, long size)
	; {			; {
	; #pragma clang loop vectorize(assume_safety)			; #pragma clang loop vectorize(assume_safety)
	; for (long i = 0; i < size; i+=2) {			; for (long i = 0; i < size; i+=2) {
	; memcpy(&(out[i]), &(out[i+size]), 2);			; memcpy(&(out[i]), &(out[i+size]), 2);
	; }			; }
	; }			; }

	; CHECK: for.body:			; CHECK: for.body:
	; CHECK: %{{.*}} = load i16, i16* %{{.*}}, align 1, !llvm.mem.parallel_loop_access !1			; CHECK: %{{.*}} = load i16, i16* %{{.*}}, align 1, !llvm.access.group !1
	; CHECK: store i16 %{{.*}}, i16* %{{.*}}, align 1, !llvm.mem.parallel_loop_access !1			; CHECK: store i16 %{{.*}}, i16* %{{.*}}, align 1, !llvm.access.group !1


	; ModuleID = '<stdin>'			; ModuleID = '<stdin>'
	source_filename = "memcpy.pragma.cpp"			source_filename = "memcpy.pragma.cpp"
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @_Z4testPcl(i8* %out, i64 %size) #0 {			define void @_Z4testPcl(i8* %out, i64 %size) #0 {
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond: ; preds = %for.inc, %entry			for.cond: ; preds = %for.inc, %entry
	%i.0 = phi i64 [ 0, %entry ], [ %add2, %for.inc ]			%i.0 = phi i64 [ 0, %entry ], [ %add2, %for.inc ]
	%cmp = icmp slt i64 %i.0, %size			%cmp = icmp slt i64 %i.0, %size
	br i1 %cmp, label %for.body, label %for.end			br i1 %cmp, label %for.body, label %for.end

	for.body: ; preds = %for.cond			for.body: ; preds = %for.cond
	%arrayidx = getelementptr inbounds i8, i8* %out, i64 %i.0			%arrayidx = getelementptr inbounds i8, i8* %out, i64 %i.0
	%add = add nsw i64 %i.0, %size			%add = add nsw i64 %i.0, %size
	%arrayidx1 = getelementptr inbounds i8, i8* %out, i64 %add			%arrayidx1 = getelementptr inbounds i8, i8* %out, i64 %add
	call void @llvm.memcpy.p0i8.p0i8.i64(i8* %arrayidx, i8* %arrayidx1, i64 2, i1 false), !llvm.mem.parallel_loop_access !1			call void @llvm.memcpy.p0i8.p0i8.i64(i8* %arrayidx, i8* %arrayidx1, i64 2, i1 false), !llvm.access.group !4
	br label %for.inc			br label %for.inc

	for.inc: ; preds = %for.body			for.inc: ; preds = %for.body
	%add2 = add nsw i64 %i.0, 2			%add2 = add nsw i64 %i.0, 2
	br label %for.cond, !llvm.loop !2			br label %for.cond, !llvm.loop !2

	for.end: ; preds = %for.cond			for.end: ; preds = %for.cond
	ret void			ret void
	}			}

	; Function Attrs: argmemonly nounwind			; Function Attrs: argmemonly nounwind
	declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1) #1			declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1) #1

	attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }			attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #1 = { argmemonly nounwind }			attributes #1 = { argmemonly nounwind }

	!llvm.ident = !{!0}			!llvm.ident = !{!0}

	!0 = !{!"clang version 4.0.0 (cfe/trunk 277751)"}			!0 = !{!"clang version 4.0.0 (cfe/trunk 277751)"}
	!1 = distinct !{!1, !2, !3}			!1 = distinct !{!1, !2, !3, !{!"llvm.loop.parallel_accesses", !4}}
	!2 = distinct !{!2, !3}			!2 = distinct !{!2, !3}
	!3 = !{!"llvm.loop.vectorize.enable", i1 true}			!3 = !{!"llvm.loop.vectorize.enable", i1 true}
				!4 = distinct !{} ; access group

llvm/trunk/test/Transforms/LoopVectorize/X86/force-ifcvt.ll

	; RUN: opt -loop-vectorize -S < %s | FileCheck %s			; RUN: opt -loop-vectorize -S < %s | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Function Attrs: norecurse nounwind uwtable			; Function Attrs: norecurse nounwind uwtable
	define void @Test(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p) #0 {			define void @Test(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p) #0 {
	entry:			entry:
	br label %for.body			br label %for.body

	; CHECK-LABEL: @Test			; CHECK-LABEL: @Test
	; CHECK: <4 x i32>			; CHECK: <4 x i32>

	for.body: ; preds = %cond.end, %entry			for.body: ; preds = %cond.end, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]
	%arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv
	%0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0			%0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !1
	%cmp1 = icmp eq i32 %0, 0			%cmp1 = icmp eq i32 %0, 0
	%arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv			%arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
	%1 = load i32, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0			%1 = load i32, i32* %arrayidx3, align 4, !llvm.access.group !1
	br i1 %cmp1, label %cond.end, label %cond.false			br i1 %cmp1, label %cond.end, label %cond.false

	cond.false: ; preds = %for.body			cond.false: ; preds = %for.body
	%arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv			%arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
	%2 = load i32, i32* %arrayidx7, align 4, !llvm.mem.parallel_loop_access !0			%2 = load i32, i32* %arrayidx7, align 4, !llvm.access.group !1
	%add = add nsw i32 %2, %1			%add = add nsw i32 %2, %1
	br label %cond.end			br label %cond.end

	cond.end: ; preds = %for.body, %cond.false			cond.end: ; preds = %for.body, %cond.false
	%cond = phi i32 [ %add, %cond.false ], [ %1, %for.body ]			%cond = phi i32 [ %add, %cond.false ], [ %1, %for.body ]
	store i32 %cond, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0			store i32 %cond, i32* %arrayidx3, align 4, !llvm.access.group !1
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 16			%exitcond = icmp eq i64 %indvars.iv.next, 16
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

	for.end: ; preds = %cond.end			for.end: ; preds = %cond.end
	ret void			ret void
	}			}

	attributes #0 = { norecurse nounwind uwtable "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" }			attributes #0 = { norecurse nounwind uwtable "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" }

	!0 = distinct !{!0}			!0 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !1}}
				!1 = distinct !{}

llvm/trunk/test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll

entry:		entry:
%indvars.iv.reg2mem = alloca i64		%indvars.iv.reg2mem = alloca i64
%"reg2mem alloca point" = bitcast i32 0 to i32		%"reg2mem alloca point" = bitcast i32 0 to i32
store i64 0, i64* %indvars.iv.reg2mem		store i64 0, i64* %indvars.iv.reg2mem
br label %for.body		br label %for.body

for.body: ; preds = %for.body.for.body_crit_edge, %entry		for.body: ; preds = %for.body.for.body_crit_edge, %entry
%indvars.iv.reload = load i64, i64* %indvars.iv.reg2mem		%indvars.iv.reload = load i64, i64* %indvars.iv.reg2mem
%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.reload		%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.reload
%0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3		%0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !4
%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv.reload		%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv.reload
%1 = load i32, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3		%1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !4
%idxprom3 = sext i32 %1 to i64		%idxprom3 = sext i32 %1 to i64
%arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3		%arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !3		store i32 %0, i32* %arrayidx4, align 4, !llvm.access.group !4
%indvars.iv.next = add i64 %indvars.iv.reload, 1		%indvars.iv.next = add i64 %indvars.iv.reload, 1
; A new store without the parallel metadata here:		; A new store without the parallel metadata here:
store i64 %indvars.iv.next, i64* %indvars.iv.next.reg2mem		store i64 %indvars.iv.next, i64* %indvars.iv.next.reg2mem
%indvars.iv.next.reload1 = load i64, i64* %indvars.iv.next.reg2mem		%indvars.iv.next.reload1 = load i64, i64* %indvars.iv.next.reg2mem
%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next.reload1		%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next.reload1
%2 = load i32, i32* %arrayidx6, align 4, !llvm.mem.parallel_loop_access !3		%2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !4
store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3		store i32 %2, i32* %arrayidx2, align 4, !llvm.access.group !4
%indvars.iv.next.reload = load i64, i64* %indvars.iv.next.reg2mem		%indvars.iv.next.reload = load i64, i64* %indvars.iv.next.reg2mem
%lftr.wideiv = trunc i64 %indvars.iv.next.reload to i32		%lftr.wideiv = trunc i64 %indvars.iv.next.reload to i32
%exitcond = icmp eq i32 %lftr.wideiv, 512		%exitcond = icmp eq i32 %lftr.wideiv, 512
br i1 %exitcond, label %for.end, label %for.body.for.body_crit_edge, !llvm.loop !3		br i1 %exitcond, label %for.end, label %for.body.for.body_crit_edge, !llvm.loop !3

for.body.for.body_crit_edge: ; preds = %for.body		for.body.for.body_crit_edge: ; preds = %for.body
%indvars.iv.next.reload2 = load i64, i64* %indvars.iv.next.reg2mem		%indvars.iv.next.reload2 = load i64, i64* %indvars.iv.next.reg2mem
store i64 %indvars.iv.next.reload2, i64* %indvars.iv.reg2mem		store i64 %indvars.iv.next.reload2, i64* %indvars.iv.reg2mem
br label %for.body		br label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret void		ret void
}		}

!3 = !{!3}		!3 = !{!3, !{!"llvm.loop.parallel_accesses", !4}}
		!4 = distinct !{}

llvm/trunk/test/Transforms/LoopVectorize/X86/parallel-loops.ll

	;CHECK: <4 x i32>			;CHECK: <4 x i32>
	define void @parallel_loop(i32* nocapture %a, i32* nocapture %b) nounwind uwtable {			define void @parallel_loop(i32* nocapture %a, i32* nocapture %b) nounwind uwtable {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
	%0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3			%0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !13
	%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
	%1 = load i32, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			%1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !13
	%idxprom3 = sext i32 %1 to i64			%idxprom3 = sext i32 %1 to i64
	%arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3			%arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3
	; This store might have originated from inlining a function with a parallel			; This store might have originated from inlining a function with a parallel
	; loop. Refers to a list with the "original loop reference" (!4) also included.			; loop. Refers to a list with the "original loop reference" (!4) also included.
	store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !5			store i32 %0, i32* %arrayidx4, align 4, !llvm.access.group !15
	%indvars.iv.next = add i64 %indvars.iv, 1			%indvars.iv.next = add i64 %indvars.iv, 1
	%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next			%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next
	%2 = load i32, i32* %arrayidx6, align 4, !llvm.mem.parallel_loop_access !3			%2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !13
	store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			store i32 %2, i32* %arrayidx2, align 4, !llvm.access.group !13
	%lftr.wideiv = trunc i64 %indvars.iv.next to i32			%lftr.wideiv = trunc i64 %indvars.iv.next to i32
	%exitcond = icmp eq i32 %lftr.wideiv, 512			%exitcond = icmp eq i32 %lftr.wideiv, 512
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	; The same loop with an illegal parallel loop metadata: the memory			; The same loop with an illegal parallel loop metadata: the memory
	; accesses refer to a different loop's identifier.			; accesses refer to a different loop's identifier.

	;CHECK-LABEL: @mixed_metadata(			;CHECK-LABEL: @mixed_metadata(
	;CHECK-NOT: <4 x i32>			;CHECK-NOT: <4 x i32>

	define void @mixed_metadata(i32* nocapture %a, i32* nocapture %b) nounwind uwtable {			define void @mixed_metadata(i32* nocapture %a, i32* nocapture %b) nounwind uwtable {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
	%0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !6			%0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !16
	%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
	%1 = load i32, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !6			%1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !16
	%idxprom3 = sext i32 %1 to i64			%idxprom3 = sext i32 %1 to i64
	%arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3			%arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3
	; This refers to the loop marked with !7 which we are not in at the moment.			; This refers to the loop marked with !7 which we are not in at the moment.
	; It should prevent detecting as a parallel loop.			; It should prevent detecting as a parallel loop.
	store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !7			store i32 %0, i32* %arrayidx4, align 4, !llvm.access.group !17
	%indvars.iv.next = add i64 %indvars.iv, 1			%indvars.iv.next = add i64 %indvars.iv, 1
	%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next			%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next
	%2 = load i32, i32* %arrayidx6, align 4, !llvm.mem.parallel_loop_access !6			%2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !16
	store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !6			store i32 %2, i32* %arrayidx2, align 4, !llvm.access.group !16
	%lftr.wideiv = trunc i64 %indvars.iv.next to i32			%lftr.wideiv = trunc i64 %indvars.iv.next to i32
	%exitcond = icmp eq i32 %lftr.wideiv, 512			%exitcond = icmp eq i32 %lftr.wideiv, 512
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !6			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !6

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	!3 = !{!3}			!3 = !{!3, !{!"llvm.loop.parallel_accesses", !13, !15}}
	!4 = !{!4}			!4 = !{!4, !{!"llvm.loop.parallel_accesses", !14, !15}}
	!5 = !{!3, !4}			!6 = !{!6, !{!"llvm.loop.parallel_accesses", !16}}
	!6 = !{!6}			!7 = !{!7, !{!"llvm.loop.parallel_accesses", !17}}
	!7 = !{!7}			!13 = distinct !{}
				!14 = distinct !{}
				!15 = distinct !{}
				!16 = distinct !{}
				!17 = distinct !{}
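
The two functions in this test differ only in whether every memory access belongs to a group that the loop itself lists. A rough Python model of that whole-loop check follows. It is loosely based on what `Loop::isAnnotatedParallel` is expected to decide, is not the LoopInfo.cpp code, and uses a hypothetical helper name with metadata numbers copied from the right-hand side of the diff.

```python
def is_annotated_parallel(loop_parallel_accesses, memory_accesses):
    """Model: a loop is parallel only if every memory access in it has at
    least one access group listed in the loop's llvm.loop.parallel_accesses.

    memory_accesses is a list with one list of access groups per load/store;
    an access without metadata is represented by an empty list.
    """
    listed = set(loop_parallel_accesses)
    return all(any(g in listed for g in groups) for groups in memory_accesses)


# @parallel_loop: loop !3 lists !13 and !15; the one store uses !15.
parallel_loop = is_annotated_parallel(
    ["!13", "!15"],
    [["!13"], ["!13"], ["!15"], ["!13"], ["!13"]],
)

# @mixed_metadata: loop !6 lists only !16, but one store uses !17.
mixed_metadata = is_annotated_parallel(
    ["!16"],
    [["!16"], ["!16"], ["!17"], ["!16"], ["!16"]],
)

# A single access without any access group also defeats the annotation,
# as in parallel-loops-after-reg2mem.ll.
with_plain_store = is_annotated_parallel(["!4"], [["!4"], [], ["!4"]])

print(parallel_loop, mixed_metadata, with_plain_store)
```

In this model only `@parallel_loop` qualifies, which is consistent with the `CHECK: <4 x i32>` and `CHECK-NOT: <4 x i32>` expectations in the test.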

llvm/trunk/test/Transforms/LoopVectorize/X86/pr34438.ll

	; CHECK: load <8 x float>, <8 x float>*			; CHECK: load <8 x float>, <8 x float>*
	; CHECK: fadd fast <8 x float>			; CHECK: fadd fast <8 x float>
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3			%0 = load float, float* %arrayidx, align 4, !llvm.access.group !5
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
	%1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			%1 = load float, float* %arrayidx2, align 4, !llvm.access.group !5
	%add = fadd fast float %0, %1			%add = fadd fast float %0, %1
	store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			store float %add, float* %arrayidx2, align 4, !llvm.access.group !5
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 8			%exitcond = icmp eq i64 %indvars.iv.next, 8
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4

	for.end:			for.end:
	ret void			ret void
	}			}

	!3 = !{!3}			!3 = !{!3, !{!"llvm.loop.parallel_accesses", !5}}
	!4 = !{!4}			!4 = !{!4}
				!5 = distinct !{}

llvm/trunk/test/Transforms/LoopVectorize/X86/vect.omp.force.ll


	define void @vectorized(float* noalias nocapture %A, float* noalias nocapture %B) {			define void @vectorized(float* noalias nocapture %A, float* noalias nocapture %B) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !1			%0 = load float, float* %arrayidx, align 4, !llvm.access.group !11
	%call = tail call float @llvm.sin.f32(float %0)			%call = tail call float @llvm.sin.f32(float %0)
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
	store float %call, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !1			store float %call, float* %arrayidx2, align 4, !llvm.access.group !11
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
	%lftr.wideiv = trunc i64 %indvars.iv.next to i32			%lftr.wideiv = trunc i64 %indvars.iv.next to i32
	%exitcond = icmp eq i32 %lftr.wideiv, 1000			%exitcond = icmp eq i32 %lftr.wideiv, 1000
	br i1 %exitcond, label %for.end.loopexit, label %for.body, !llvm.loop !1			br i1 %exitcond, label %for.end.loopexit, label %for.body, !llvm.loop !1

	for.end.loopexit:			for.end.loopexit:
	br label %for.end			br label %for.end

	for.end:			for.end:
	ret void			ret void
	}			}

	!1 = !{!1, !2}			!1 = !{!1, !2, !{!"llvm.loop.parallel_accesses", !11}}
	!2 = !{!"llvm.loop.vectorize.enable", i1 true}			!2 = !{!"llvm.loop.vectorize.enable", i1 true}
				!11 = distinct !{}

	;			;
	; This method will not be vectorized, as scalar cost is lower than any of vector costs.			; This method will not be vectorized, as scalar cost is lower than any of vector costs.
	;			;

	define void @not_vectorized(float* noalias nocapture %A, float* noalias nocapture %B) {			define void @not_vectorized(float* noalias nocapture %A, float* noalias nocapture %B) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3			%0 = load float, float* %arrayidx, align 4, !llvm.access.group !13
	%call = tail call float @llvm.sin.f32(float %0)			%call = tail call float @llvm.sin.f32(float %0)
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
	store float %call, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			store float %call, float* %arrayidx2, align 4, !llvm.access.group !13
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
	%lftr.wideiv = trunc i64 %indvars.iv.next to i32			%lftr.wideiv = trunc i64 %indvars.iv.next to i32
	%exitcond = icmp eq i32 %lftr.wideiv, 1000			%exitcond = icmp eq i32 %lftr.wideiv, 1000
	br i1 %exitcond, label %for.end.loopexit, label %for.body, !llvm.loop !3			br i1 %exitcond, label %for.end.loopexit, label %for.body, !llvm.loop !3

	for.end.loopexit:			for.end.loopexit:
	br label %for.end			br label %for.end

	for.end:			for.end:
	ret void			ret void
	}			}

	declare float @llvm.sin.f32(float) nounwind readnone			declare float @llvm.sin.f32(float) nounwind readnone

	; Dummy metadata			; Dummy metadata
	!3 = !{!3}			!3 = !{!3, !{!"llvm.loop.parallel_accesses", !13}}
				!13 = distinct !{}

llvm/trunk/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll

	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*
	; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <8 x float>, <8 x float>* [[TMP6]], align 4			; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <8 x float>, <8 x float>* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP7:%.*]] = fadd fast <8 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd fast <8 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[TMP5]] to <8 x float>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[TMP5]] to <8 x float>*
	; CHECK-NEXT: store <8 x float> [[TMP7]], <8 x float>* [[TMP8]], align 4			; CHECK-NEXT: store <8 x float> [[TMP7]], <8 x float>* [[TMP8]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
	; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16			; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
	; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0			; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !1
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 20, 16			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 20, 16
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX]], align 4, !llvm.mem.parallel_loop_access !3			; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX]], align 4, !llvm.access.group !0
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[ARRAYIDX2]], align 4, !llvm.mem.parallel_loop_access !3			; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[ARRAYIDX2]], align 4, !llvm.access.group !0
	; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP10]], [[TMP11]]
	; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4, !llvm.mem.parallel_loop_access !3			; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4, !llvm.access.group !0
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 20			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 20
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !4			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !5
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !1			%0 = load float, float* %arrayidx, align 4, !llvm.access.group !11
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
	%1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !1			%1 = load float, float* %arrayidx2, align 4, !llvm.access.group !11
	%add = fadd fast float %0, %1			%add = fadd fast float %0, %1
	store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !1			store float %add, float* %arrayidx2, align 4, !llvm.access.group !11
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 20			%exitcond = icmp eq i64 %indvars.iv.next, 20
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !1			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !1

	for.end:			for.end:
	ret void			ret void
	}			}

	!1 = !{!1, !2}			!1 = !{!1, !2, !{!"llvm.loop.parallel_accesses", !11}}
	!2 = !{!"llvm.loop.vectorize.enable", i1 true}			!2 = !{!"llvm.loop.vectorize.enable", i1 true}
				!11 = distinct !{}

	;			;
	; This loop will be vectorized as the trip count is below the threshold but no			; This loop will be vectorized as the trip count is below the threshold but no
	; scalar iterations are needed thanks to folding its tail.			; scalar iterations are needed thanks to folding its tail.
	;			;
	define void @vectorized1(float* noalias nocapture %A, float* noalias nocapture readonly %B) {			define void @vectorized1(float* noalias nocapture %A, float* noalias nocapture readonly %B) {
	; CHECK-LABEL: @vectorized1(			; CHECK-LABEL: @vectorized1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	Show All 15 Lines
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*
	; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <8 x float>, <8 x float>* [[TMP6]], align 4			; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <8 x float>, <8 x float>* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP7:%.*]] = fadd fast <8 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd fast <8 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]
	; CHECK-NEXT: [[TMP8:%.*]] = icmp ule <8 x i64> [[INDUCTION]], <i64 19, i64 19, i64 19, i64 19, i64 19, i64 19, i64 19, i64 19>			; CHECK-NEXT: [[TMP8:%.*]] = icmp ule <8 x i64> [[INDUCTION]], <i64 19, i64 19, i64 19, i64 19, i64 19, i64 19, i64 19, i64 19>
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast float* [[TMP5]] to <8 x float>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast float* [[TMP5]] to <8 x float>*
	; CHECK-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP7]], <8 x float>* [[TMP9]], i32 4, <8 x i1> [[TMP8]])			; CHECK-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP7]], <8 x float>* [[TMP9]], i32 4, <8 x i1> [[TMP8]])
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
	; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 24			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 24
	; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !6			; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !8
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3			%0 = load float, float* %arrayidx, align 4, !llvm.access.group !13
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
	%1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			%1 = load float, float* %arrayidx2, align 4, !llvm.access.group !13
	%add = fadd fast float %0, %1			%add = fadd fast float %0, %1
	store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			store float %add, float* %arrayidx2, align 4, !llvm.access.group !13
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 20			%exitcond = icmp eq i64 %indvars.iv.next, 20
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3

	for.end:			for.end:
	ret void			ret void
	}			}

	!3 = !{!3}			!3 = !{!3, !{!"llvm.loop.parallel_accesses", !13}}
				!13 = distinct !{}

	;			;
	; This loop will be vectorized as the trip count is below the threshold but no			; This loop will be vectorized as the trip count is below the threshold but no
	; scalar iterations are needed.			; scalar iterations are needed.
	;			;
	define void @vectorized2(float* noalias nocapture %A, float* noalias nocapture readonly %B) {			define void @vectorized2(float* noalias nocapture %A, float* noalias nocapture readonly %B) {
	; CHECK-LABEL: @vectorized2(			; CHECK-LABEL: @vectorized2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	Show All 14 Lines
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*
	; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <8 x float>, <8 x float>* [[TMP6]], align 4			; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <8 x float>, <8 x float>* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP7:%.*]] = fadd fast <8 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd fast <8 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[TMP5]] to <8 x float>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[TMP5]] to <8 x float>*
	; CHECK-NEXT: store <8 x float> [[TMP7]], <8 x float>* [[TMP8]], align 4			; CHECK-NEXT: store <8 x float> [[TMP7]], <8 x float>* [[TMP8]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
	; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16			; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
	; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !9			; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !11
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 16, 16			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 16, 16
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX]], align 4, !llvm.mem.parallel_loop_access !7			; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX]], align 4, !llvm.access.group !7
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[ARRAYIDX2]], align 4, !llvm.mem.parallel_loop_access !7			; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[ARRAYIDX2]], align 4, !llvm.access.group !7
	; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP10]], [[TMP11]]
	; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4, !llvm.mem.parallel_loop_access !7			; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4, !llvm.access.group !7
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 16			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 16
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !10			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !12
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3			%0 = load float, float* %arrayidx, align 4, !llvm.access.group !13
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
	%1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			%1 = load float, float* %arrayidx2, align 4, !llvm.access.group !13
	%add = fadd fast float %0, %1			%add = fadd fast float %0, %1
	store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			store float %add, float* %arrayidx2, align 4, !llvm.access.group !13
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 16			%exitcond = icmp eq i64 %indvars.iv.next, 16
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4

	for.end:			for.end:
	ret void			ret void
	}			}

	!4 = !{!4}			!4 = !{!4}

llvm/trunk/test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll

	; CHECK-LABEL: not_too_small_tc			; CHECK-LABEL: not_too_small_tc
	; CHECK-AVX2: LV: Selecting VF: 16.			; CHECK-AVX2: LV: Selecting VF: 16.
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds i8, i8* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds i8, i8* %B, i64 %indvars.iv
	%l1 = load i8, i8* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3			%l1 = load i8, i8* %arrayidx, align 4, !llvm.access.group !13
	%arrayidx2 = getelementptr inbounds i8, i8* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds i8, i8* %A, i64 %indvars.iv
	%l2 = load i8, i8* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			%l2 = load i8, i8* %arrayidx2, align 4, !llvm.access.group !13
	%add = add i8 %l1, %l2			%add = add i8 %l1, %l2
	store i8 %add, i8* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			store i8 %add, i8* %arrayidx2, align 4, !llvm.access.group !13
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 16			%exitcond = icmp eq i64 %indvars.iv.next, 16
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4

	for.end:			for.end:
	ret void			ret void
	}			}
	!3 = !{!3}			!3 = !{!3, !{!"llvm.loop.parallel_accesses", !13}}
	!4 = !{!4}			!4 = !{!4}
				!13 = distinct !{}

llvm/trunk/test/Transforms/SROA/mem-par-metadata-sroa.ll

; RUN: opt < %s -sroa -S \| FileCheck %s		; RUN: opt < %s -sroa -S \| FileCheck %s
;		;
; Make sure the llvm.mem.parallel_loop_access meta-data is preserved		; Make sure the llvm.access.group meta-data is preserved
; when a load/store is replaced with another load/store by sroa		; when a load/store is replaced with another load/store by sroa
;		;
; class Complex {		; class Complex {
; private:		; private:
; float real_;		; float real_;
; float imaginary_;		; float imaginary_;
;		;
; public:		; public:
Show All 16 Lines
; for (long offset = 0; offset < size; ++offset) {		; for (long offset = 0; offset < size; ++offset) {
; Complex t0 = out[offset];		; Complex t0 = out[offset];
; out[offset] = t0 + t0;		; out[offset] = t0 + t0;
; }		; }
; }		; }

; CHECK: for.body:		; CHECK: for.body:
; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4		; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4
; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.mem.parallel_loop_access !1		; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.access.group !1
; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4		; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4
; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.mem.parallel_loop_access !1		; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.access.group !1
; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4		; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4
; CHECK: br label		; CHECK: br label

; ModuleID = '<stdin>'		; ModuleID = '<stdin>'
source_filename = "mem-par-metadata-sroa1.cpp"		source_filename = "mem-par-metadata-sroa1.cpp"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"		target triple = "x86_64-unknown-linux-gnu"

Show All 11 Lines	for.cond: ; preds = %for.body, %entry
%offset.0 = phi i64 [ 0, %entry ], [ %inc, %for.body ]		%offset.0 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
%cmp = icmp slt i64 %offset.0, %size		%cmp = icmp slt i64 %offset.0, %size
br i1 %cmp, label %for.body, label %for.end		br i1 %cmp, label %for.body, label %for.end

for.body: ; preds = %for.cond		for.body: ; preds = %for.cond
%arrayidx = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.0		%arrayidx = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.0
%real_.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0		%real_.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0
%real_.i.i = getelementptr inbounds %class.Complex, %class.Complex* %arrayidx, i64 0, i32 0		%real_.i.i = getelementptr inbounds %class.Complex, %class.Complex* %arrayidx, i64 0, i32 0
%0 = load float, float* %real_.i.i, align 4, !llvm.mem.parallel_loop_access !1		%0 = load float, float* %real_.i.i, align 4, !llvm.access.group !11
store float %0, float* %real_.i, align 4, !llvm.mem.parallel_loop_access !1		store float %0, float* %real_.i, align 4, !llvm.access.group !11
%imaginary_.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1		%imaginary_.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1
%imaginary_.i.i = getelementptr inbounds %class.Complex, %class.Complex* %arrayidx, i64 0, i32 1		%imaginary_.i.i = getelementptr inbounds %class.Complex, %class.Complex* %arrayidx, i64 0, i32 1
%1 = load float, float* %imaginary_.i.i, align 4, !llvm.mem.parallel_loop_access !1		%1 = load float, float* %imaginary_.i.i, align 4, !llvm.access.group !11
store float %1, float* %imaginary_.i, align 4, !llvm.mem.parallel_loop_access !1		store float %1, float* %imaginary_.i, align 4, !llvm.access.group !11
%arrayidx1 = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.0		%arrayidx1 = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.0
%real_.i1 = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0		%real_.i1 = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0
%2 = load float, float* %real_.i1, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1		%2 = load float, float* %real_.i1, align 4, !noalias !3, !llvm.access.group !11
%real_2.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0		%real_2.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0
%3 = load float, float* %real_2.i, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1		%3 = load float, float* %real_2.i, align 4, !noalias !3, !llvm.access.group !11
%add.i = fadd float %2, %3		%add.i = fadd float %2, %3
%imaginary_.i2 = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1		%imaginary_.i2 = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1
%4 = load float, float* %imaginary_.i2, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1		%4 = load float, float* %imaginary_.i2, align 4, !noalias !3, !llvm.access.group !11
%imaginary_3.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1		%imaginary_3.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1
%5 = load float, float* %imaginary_3.i, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1		%5 = load float, float* %imaginary_3.i, align 4, !noalias !3, !llvm.access.group !11
%add4.i = fadd float %4, %5		%add4.i = fadd float %4, %5
%real_.i.i3 = getelementptr inbounds %class.Complex, %class.Complex* %tmpcast, i64 0, i32 0		%real_.i.i3 = getelementptr inbounds %class.Complex, %class.Complex* %tmpcast, i64 0, i32 0
store float %add.i, float* %real_.i.i3, align 4, !alias.scope !3, !llvm.mem.parallel_loop_access !1		store float %add.i, float* %real_.i.i3, align 4, !alias.scope !3, !llvm.access.group !11
%imaginary_.i.i4 = getelementptr inbounds %class.Complex, %class.Complex* %tmpcast, i64 0, i32 1		%imaginary_.i.i4 = getelementptr inbounds %class.Complex, %class.Complex* %tmpcast, i64 0, i32 1
store float %add4.i, float* %imaginary_.i.i4, align 4, !alias.scope !3, !llvm.mem.parallel_loop_access !1		store float %add4.i, float* %imaginary_.i.i4, align 4, !alias.scope !3, !llvm.access.group !11
%6 = bitcast %class.Complex* %arrayidx1 to i64*		%6 = bitcast %class.Complex* %arrayidx1 to i64*
%7 = load i64, i64* %ref.tmp, align 8, !llvm.mem.parallel_loop_access !1		%7 = load i64, i64* %ref.tmp, align 8, !llvm.access.group !11
store i64 %7, i64* %6, align 4, !llvm.mem.parallel_loop_access !1		store i64 %7, i64* %6, align 4, !llvm.access.group !11
%inc = add nsw i64 %offset.0, 1		%inc = add nsw i64 %offset.0, 1
br label %for.cond, !llvm.loop !1		br label %for.cond, !llvm.loop !1

for.end: ; preds = %for.cond		for.end: ; preds = %for.cond
ret void		ret void
}		}

; Function Attrs: argmemonly nounwind		; Function Attrs: argmemonly nounwind
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1) #1		declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1) #1

attributes #0 = { norecurse nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }		attributes #0 = { norecurse nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { argmemonly nounwind }		attributes #1 = { argmemonly nounwind }

!llvm.ident = !{!0}		!llvm.ident = !{!0}

!0 = !{!"clang version 4.0.0 (cfe/trunk 277751)"}		!0 = !{!"clang version 4.0.0 (cfe/trunk 277751)"}
!1 = distinct !{!1, !2}		!1 = distinct !{!1, !2, !{!"llvm.loop.parallel_accesses", !11}}
!2 = !{!"llvm.loop.vectorize.enable", i1 true}		!2 = !{!"llvm.loop.vectorize.enable", i1 true}
!3 = !{!4}		!3 = !{!4}
!4 = distinct !{!4, !5, !"_ZNK7ComplexplERKS_: %agg.result"}		!4 = distinct !{!4, !5, !"_ZNK7ComplexplERKS_: %agg.result"}
!5 = distinct !{!5, !"_ZNK7ComplexplERKS_"}		!5 = distinct !{!5, !"_ZNK7ComplexplERKS_"}
		!11 = distinct !{}
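
The SROA test above checks that `!llvm.access.group` survives when a load/store is replaced. A rough way to see why this matters is the sketch below: a toy Python model, not LLVM code, that treats a loop as annotated parallel only if every memory access carries an access group listed in the loop's `llvm.loop.parallel_accesses`. The names `MemAccess` and `is_annotated_parallel` are hypothetical and introduced only for illustration; the rule itself is an assumption paraphrased from the metadata's intended semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemAccess:
    """Hypothetical stand-in for a load/store and its !llvm.access.group."""
    name: str
    access_groups: frozenset = frozenset()


def is_annotated_parallel(accesses, parallel_groups):
    """Return True if every access belongs to at least one group that the
    loop lists under llvm.loop.parallel_accesses (toy model, assumed rule)."""
    groups = set(parallel_groups)
    return all(acc.access_groups & groups for acc in accesses)


# Loop metadata modelled on: !1 = distinct !{!1, !2, !{!"llvm.loop.parallel_accesses", !11}}
loop_groups = {"!11"}

# A replacement store that keeps its access group leaves the loop parallel.
preserved = [
    MemAccess("load", frozenset({"!11"})),
    MemAccess("store", frozenset({"!11"})),
]

# A replacement store that lost its metadata makes the check fail.
dropped = [
    MemAccess("load", frozenset({"!11"})),
    MemAccess("store"),
]

print(is_annotated_parallel(preserved, loop_groups))
print(is_annotated_parallel(dropped, loop_groups))
```

In this model a single access without a matching group is enough to lose the parallel annotation, which is why passes that rewrite loads and stores are expected to carry the metadata over.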

llvm/trunk/test/Transforms/Scalarizer/basic.ll

	; CHECK: store i32 %add.i3, i32* %dst.i3, align 4, !tbaa.struct ![[TAG]]			; CHECK: store i32 %add.i3, i32* %dst.i3, align 4, !tbaa.struct ![[TAG]]
	; CHECK: ret void			; CHECK: ret void
	%val = load <4 x i32> , <4 x i32> *%src, !tbaa.struct !5			%val = load <4 x i32> , <4 x i32> *%src, !tbaa.struct !5
	%add = add <4 x i32> %val, %val			%add = add <4 x i32> %val, %val
	store <4 x i32> %add, <4 x i32> *%dst, !tbaa.struct !5			store <4 x i32> %add, <4 x i32> *%dst, !tbaa.struct !5
	ret void			ret void
	}			}

	; Check that llvm.mem.parallel_loop_access information is preserved.			; Check that llvm.access.group information is preserved.
	define void @f5(i32 %count, <4 x i32> %src, <4 x i32> %dst) {			define void @f5(i32 %count, <4 x i32> %src, <4 x i32> %dst) {
	; CHECK-LABEL: @f5(			; CHECK-LABEL: @f5(
	; CHECK: %val.i0 = load i32, i32* %this_src.i0, align 16, !llvm.mem.parallel_loop_access ![[TAG:[0-9]*]]			; CHECK: %val.i0 = load i32, i32* %this_src.i0, align 16, !llvm.access.group ![[TAG:[0-9]*]]
	; CHECK: %val.i1 = load i32, i32* %this_src.i1, align 4, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: %val.i1 = load i32, i32* %this_src.i1, align 4, !llvm.access.group ![[TAG]]
	; CHECK: %val.i2 = load i32, i32* %this_src.i2, align 8, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: %val.i2 = load i32, i32* %this_src.i2, align 8, !llvm.access.group ![[TAG]]
	; CHECK: %val.i3 = load i32, i32* %this_src.i3, align 4, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: %val.i3 = load i32, i32* %this_src.i3, align 4, !llvm.access.group ![[TAG]]
	; CHECK: store i32 %add.i0, i32* %this_dst.i0, align 16, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: store i32 %add.i0, i32* %this_dst.i0, align 16, !llvm.access.group ![[TAG]]
	; CHECK: store i32 %add.i1, i32* %this_dst.i1, align 4, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: store i32 %add.i1, i32* %this_dst.i1, align 4, !llvm.access.group ![[TAG]]
	; CHECK: store i32 %add.i2, i32* %this_dst.i2, align 8, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: store i32 %add.i2, i32* %this_dst.i2, align 8, !llvm.access.group ![[TAG]]
	; CHECK: store i32 %add.i3, i32* %this_dst.i3, align 4, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: store i32 %add.i3, i32* %this_dst.i3, align 4, !llvm.access.group ![[TAG]]
	; CHECK: ret void			; CHECK: ret void
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%index = phi i32 [ 0, %entry ], [ %next_index, %loop ]			%index = phi i32 [ 0, %entry ], [ %next_index, %loop ]
	%this_src = getelementptr <4 x i32>, <4 x i32> *%src, i32 %index			%this_src = getelementptr <4 x i32>, <4 x i32> *%src, i32 %index
	%this_dst = getelementptr <4 x i32>, <4 x i32> *%dst, i32 %index			%this_dst = getelementptr <4 x i32>, <4 x i32> *%dst, i32 %index
	%val = load <4 x i32> , <4 x i32> *%this_src, !llvm.mem.parallel_loop_access !3			%val = load <4 x i32> , <4 x i32> *%this_src, !llvm.access.group !13
	%add = add <4 x i32> %val, %val			%add = add <4 x i32> %val, %val
	store <4 x i32> %add, <4 x i32> *%this_dst, !llvm.mem.parallel_loop_access !3			store <4 x i32> %add, <4 x i32> *%this_dst, !llvm.access.group !13
	%next_index = add i32 %index, -1			%next_index = add i32 %index, -1
	%continue = icmp ne i32 %next_index, %count			%continue = icmp ne i32 %next_index, %count
	br i1 %continue, label %loop, label %end, !llvm.loop !3			br i1 %continue, label %loop, label %end, !llvm.loop !3

	end:			end:
	ret void			ret void
	}			}

	[203 lines not shown]

	exit:			exit:
	ret <4 x float> %next_acc			ret <4 x float> %next_acc
	}			}

	!0 = !{ !"root" }			!0 = !{ !"root" }
	!1 = !{ !"set1", !0 }			!1 = !{ !"set1", !0 }
	!2 = !{ !"set2", !0 }			!2 = !{ !"set2", !0 }
	!3 = !{ !3 }			!3 = !{ !3, !{!"llvm.loop.parallel_accesses", !13} }
	!4 = !{ float 4.0 }			!4 = !{ float 4.0 }
	!5 = !{ i64 0, i64 8, null }			!5 = !{ i64 0, i64 8, null }
				!13 = distinct !{}

llvm/trunk/test/Transforms/SimplifyCFG/combine-parallel-mem-md.ll

	; RUN: opt -simplifycfg -S < %s | FileCheck %s			; RUN: opt -simplifycfg -S < %s | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Function Attrs: norecurse nounwind uwtable			; Function Attrs: norecurse nounwind uwtable
	define void @Test(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p) #0 {			define void @Test(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p) #0 {
	entry:			entry:
	br label %for.body			br label %for.body

	; CHECK-LABEL: @Test			; CHECK-LABEL: @Test
	; CHECK: load i32, i32* {{.*}}, align 4, !llvm.mem.parallel_loop_access !0			; CHECK: load i32, i32* {{.*}}, align 4, !llvm.access.group !0
	; CHECK: load i32, i32* {{.*}}, align 4, !llvm.mem.parallel_loop_access !0			; CHECK: load i32, i32* {{.*}}, align 4, !llvm.access.group !0
	; CHECK: store i32 {{.*}}, align 4, !llvm.mem.parallel_loop_access !0			; CHECK: store i32 {{.*}}, align 4, !llvm.access.group !0
	; CHECK-NOT: load			; CHECK-NOT: load
	; CHECK-NOT: store			; CHECK-NOT: store

	for.body: ; preds = %cond.end, %entry			for.body: ; preds = %cond.end, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]
	%arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv
	%0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0			%0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !0
	%cmp1 = icmp eq i32 %0, 0			%cmp1 = icmp eq i32 %0, 0
	br i1 %cmp1, label %cond.true, label %cond.false			br i1 %cmp1, label %cond.true, label %cond.false

	cond.false: ; preds = %for.body			cond.false: ; preds = %for.body
	%arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv			%arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
	%v = load i32, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0			%v = load i32, i32* %arrayidx3, align 4, !llvm.access.group !0
	%arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv			%arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
	%1 = load i32, i32* %arrayidx7, align 4, !llvm.mem.parallel_loop_access !0			%1 = load i32, i32* %arrayidx7, align 4, !llvm.access.group !0
	%add = add nsw i32 %1, %v			%add = add nsw i32 %1, %v
	br label %cond.end			br label %cond.end

	cond.true: ; preds = %for.body			cond.true: ; preds = %for.body
	%arrayidx4 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv			%arrayidx4 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
	%w = load i32, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0			%w = load i32, i32* %arrayidx4, align 4, !llvm.access.group !0
	%arrayidx8 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv			%arrayidx8 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
	%2 = load i32, i32* %arrayidx8, align 4, !llvm.mem.parallel_loop_access !0			%2 = load i32, i32* %arrayidx8, align 4, !llvm.access.group !0
	%add2 = add nsw i32 %2, %w			%add2 = add nsw i32 %2, %w
	br label %cond.end			br label %cond.end

	cond.end: ; preds = %for.body, %cond.false			cond.end: ; preds = %for.body, %cond.false
	%cond = phi i32 [ %add, %cond.false ], [ %add2, %cond.true ]			%cond = phi i32 [ %add, %cond.false ], [ %add2, %cond.true ]
	%arrayidx9 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv			%arrayidx9 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
	store i32 %cond, i32* %arrayidx9, align 4, !llvm.mem.parallel_loop_access !0			store i32 %cond, i32* %arrayidx9, align 4, !llvm.access.group !0
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 16			%exitcond = icmp eq i64 %indvars.iv.next, 16
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

	for.end: ; preds = %cond.end			for.end: ; preds = %cond.end
	ret void			ret void
	}			}

	attributes #0 = { norecurse nounwind uwtable }			attributes #0 = { norecurse nounwind uwtable }

	!0 = distinct !{!0, !1}			!0 = distinct !{!0, !1, !{!"llvm.loop.parallel_accesses", !10}}
	!1 = !{!"llvm.loop.vectorize.enable", i1 true}			!1 = !{!"llvm.loop.vectorize.enable", i1 true}
				!10 = distinct !{}

Diff 179014
