This is an archive of the discontinued LLVM Phabricator instance.


Differential D52116

Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
ClosedPublic

Authored by Meinersbur on Sep 14 2018, 12:04 PM.

Details

Reviewers

hfinkel
pekka
paul.redmond
reames
hsaito
pekka.jaaskelainen
jdoerfert

Commits

rG978ba61536c2: Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
rL349725: Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.

Summary

The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. Unfortunately, a LoopID is not a loop identifier. It is neither unique (there is even a regression test assigning the same LoopID to multiple loops; this can also happen if passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is removed from or added to a LoopID's MDNode, the loop also receives a new LoopID; this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even if it is just to mark that a loop should not be processed again, as llvm.loop.isvectorized does for both the versioned and the unversioned loop), the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses. The llvm.mem.parallel_loop_access metadata on instructions is replaced by llvm.access.group metadata. llvm.access.group points to a distinct MDNode with no operands (so that operands never need to be added or removed), called an "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses listing all the access groups that are parallel (no dependencies carried by this loop).

This intentionally avoids any kind of "ID". Loops that are cloned or have their attributes modified retain the llvm.loop.parallel_accesses attribute. Access instructions that are cloned point to the same access group. It is not necessary for each access to have its own "ID" MDNode; memory access instructions with the same behavior can be grouped together.
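
The difference between the two schemes can be illustrated with a small standalone model. This is only a sketch under stated assumptions: the types `AccessGroup`, `LoopID`, `MemAccess` and the helpers below are hypothetical stand-ins, not LLVM classes, and a loop attribute change is modeled by allocating a fresh `LoopID` object, mirroring how a LoopID's MDNode is replaced rather than edited in place.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <vector>

// Toy stand-ins for LLVM metadata; none of these types are LLVM's own.
struct AccessGroup {}; // models a `distinct !{}` node: identity only

struct LoopID { // models the MDNode behind !llvm.loop
  std::set<const AccessGroup *> ParallelAccesses; // llvm.loop.parallel_accesses
  bool IsVectorized = false; // stands in for any other loop attribute
};

struct MemAccess {
  const LoopID *OldParallelLoop; // models llvm.mem.parallel_loop_access
  const AccessGroup *Group;      // models llvm.access.group
};

// Changing a loop attribute yields a new LoopID node; the old one is not
// modified, so pointers to it no longer identify the loop.
std::unique_ptr<LoopID> withVectorizedFlag(const LoopID &Old) {
  auto New = std::make_unique<LoopID>(Old);
  New->IsVectorized = true;
  return New;
}

// Old scheme: every access must name the loop's current LoopID.
bool isParallelOld(const LoopID &L, const std::vector<MemAccess> &Body) {
  for (const MemAccess &A : Body)
    if (A.OldParallelLoop != &L)
      return false;
  return true;
}

// New scheme: every access must be in a group the loop lists as parallel.
bool isParallelNew(const LoopID &L, const std::vector<MemAccess> &Body) {
  for (const MemAccess &A : Body)
    if (!A.Group || !L.ParallelAccesses.count(A.Group))
      return false;
  return true;
}

// Hypothetical driver: records both checks before and after an attribute change.
struct Outcome {
  bool OldBefore, OldAfter, NewBefore, NewAfter;
};

Outcome demoAttributeChange() {
  AccessGroup G;
  auto Loop = std::make_unique<LoopID>();
  Loop->ParallelAccesses.insert(&G);
  std::vector<MemAccess> Body = {{Loop.get(), &G}, {Loop.get(), &G}};

  Outcome R;
  R.OldBefore = isParallelOld(*Loop, Body);
  R.NewBefore = isParallelNew(*Loop, Body);
  auto Updated = withVectorizedFlag(*Loop); // the loop now has a new LoopID
  R.OldAfter = isParallelOld(*Updated, Body);
  R.NewAfter = isParallelNew(*Updated, Body);
  return R;
}
```

In this model the old check fails after the attribute change because the accesses still point at the previous `LoopID` object, while the new check keeps succeeding because the copied `LoopID` carries the same access-group list along.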

The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but should be considered deprecated.

Possible extensions/follow-up patches:

AutoUpgrade llvm.mem.parallel_loop_access to llvm.access.group such that we can remove its handling in the passes.

Diff Detail

Repository

rL LLVM

Build Status

Buildable 25850
Build 25849: arc lint + arc unit

Event Timeline

Meinersbur created this revision. Sep 14 2018, 12:04 PM

Herald added subscribers: dexonsmith, steven_wu, dmgreen and 2 others. Sep 14 2018, 12:04 PM

Meinersbur mentioned this in D52117: Generate llvm.loop.parallel_accesses instead of llvm.mem.parallel_loop_access metadata. Sep 14 2018, 12:04 PM

Meinersbur added a child revision: D52117: Generate llvm.loop.parallel_accesses instead of llvm.mem.parallel_loop_access metadata.

Meinersbur mentioned this in D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. Sep 28 2018, 4:21 AM

Thanks for making this MD more robust! It is essential for the vectorization performance of pocl. Sorry for my slow response time.

Related to OpenCL, there's usually a 3D work-item loop, all levels of which are parallel. I didn't spot a test that shows multiple loop levels and thus multiple parallel access groups attached to an instruction. Moreover, the inliner MD transfer would indeed be much improved if it replicated the parallel access info from all loop hierarchy levels downwards. The comment which mentions that you focus on the inner loop is true, but if there is (or will be) a loop interchange optimization pass that utilizes the parallel loop info, then any loop level might potentially be moved to become the inner loop.

This is again very useful for OpenCL work-item loops as it allows optimizing memory access patterns via loop interchange, thus essentially performing outer loop vectorization when it's the best way to get performance.

This revision is now accepted and ready to land. Oct 3 2018, 12:47 AM

Thank you for your feedback. If you don't mind, before I commit this, I will prepare a patch for clang generating this new kind of metadata (since at the moment nothing is generating it) and give others some time for feedback.

Dropping one of the llvm.access.group annotations instead of merging them is a trade-off I had to make. It only affects the inlining situation where a loop contains a call to a function that itself contains a loop, with both loops being marked as parallel. I can make a patch suggestion later.

I actually already created a review for the Clang part which is D52117, but uploaded the wrong diff. Corrected. I'll wait for both being accepted before committing.

Rebase

Harbormaster completed remote builds in B23448: Diff 168252. Oct 4 2018, 3:12 AM

dexonsmith removed a subscriber: dexonsmith. Oct 4 2018, 4:15 PM

@hfinkel ping

In D52116#1283910, @Meinersbur wrote:

@hfinkel ping

I'm basically happy with this, but we shouldn't add things to the LangRef without an RFC. I don't recall seeing one.

lib/Transforms/Scalar/LoopVersioningLICM.cpp
631	I don't understand what this FIXME is saying. What needs to be fixed?
lib/Transforms/Utils/InlineFunction.cpp
809	Is the problem with "updating all uses of one of the access groups" that we don't have a way to efficiently enumerate them? Would we need to scan the functions for branches and collect all of the loop-id metadata that's relevant first? It would be nice not to lose this information.

I am going to prepare an RFC.

lib/Transforms/Scalar/LoopVersioningLICM.cpp
631	The line below adds `llvm.mem.parallel_loop_access` to a LoopID, but that metadata is expected as an annotation on instructions that access memory (with the loop it is parallel to as its parameter). There is no code that looks for `llvm.mem.parallel_loop_access` in LoopID metadata. Hence, adding the property has no effect.
lib/Transforms/Utils/InlineFunction.cpp
809	Either search (and update) all LoopIDs that reference the access group or create a new 'meta-access-group' as outlined in the FIXME. Of course it would be nice to not lose information, but as for any analysis there is a trade-off between accuracy and computational complexity. E.g. it would be nice if alias analysis were control-flow-sensitive. At the moment the only passes making use of `Loop::isAnnotatedParallel()` are LoopVectorize, LoopVersioningLICM, LoopDistribute and LoopLoadElimination, all of which only process innermost loops. That is, there would be no effect of keeping more information. The FIXME is also still in the code so that the issue is not forgotten. Do you still want me to implement one of the solutions?

hfinkel added inline comments. Dec 4 2018, 8:03 AM

lib/Transforms/Scalar/LoopVersioningLICM.cpp
631	OIC, okay. Thanks.
lib/Transforms/Utils/InlineFunction.cpp
809	> Of course it would be nice to not lose information, but as for any analysis there is a trade-off between accuracy and computational complexity.

Clearly I know this ;)

> At the moment the only passes making use of Loop::isAnnotatedParallel() are LoopVectorize, LoopVersioningLICM, LoopDistribute and LoopLoadElimination, all of which only process innermost loops. That is, there would be no effect of keeping more information.

Two things: First, this might be true today, but very soon won't be true (vectorization will soon handle outer loops, and we are developing other loop-nest transformations), and if we do the right thing now, we won't need to go back and fix this later. Second, while it's true that full loop unrolling generally happens before inlining, we could have a loop that, at the point of inlining, is an inner loop, but is later fully unrolled such that before vectorization (etc.) the loop here might become, once again, the inner loop.

> Do you still want me to implement one of the solutions?

Yes.

Allow multiple access groups per instruction, i.e. an instruction can be in multiple access groups. This allows a simple 'union' operation that occurs when inlining into another function. A memory access is considered parallel when at least one of its access groups is listed in llvm.loop.parallel_accesses. This is prioritized over the 'intersect' case for combining instructions, which would be the dual. We only do best-effort there.
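
The union, the intersection, and the "at least one group is listed" rule can be sketched with plain sets. This is an illustrative model only: `GroupId`, `GroupSet`, `uniteGroups`, `intersectGroups` and `isAccessParallel` are hypothetical names operating on integers, not the MDNode-based helpers added by this patch.

```cpp
#include <cassert>
#include <set>

using GroupId = int;                // stands in for a distinct, empty MDNode
using GroupSet = std::set<GroupId>; // stands in for an access-group list

// Inlining: the inlined access keeps its own groups and also joins the groups
// of the call site, so loops in both the caller and the callee still match it.
GroupSet uniteGroups(const GroupSet &A, const GroupSet &B) {
  GroupSet R = A;
  R.insert(B.begin(), B.end());
  return R;
}

// Combining two accesses into one: keep only the groups both belonged to.
GroupSet intersectGroups(const GroupSet &A, const GroupSet &B) {
  GroupSet R;
  for (GroupId G : A)
    if (B.count(G))
      R.insert(G);
  return R;
}

// An access counts as parallel for a loop if at least one of its groups is
// listed in the loop's parallel-accesses set.
bool isAccessParallel(const GroupSet &AccessGroups,
                      const GroupSet &LoopParallelAccesses) {
  for (GroupId G : AccessGroups)
    if (LoopParallelAccesses.count(G))
      return true;
  return false;
}
```

With this rule, an access with groups {1} inlined at a call site with groups {2} ends up in {1, 2} and is recognized by a loop listing either group, without touching any loop metadata.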

Harbormaster completed remote builds in B25850: Diff 177322. Dec 7 2018, 2:36 PM

Meinersbur edited the summary of this revision. Dec 7 2018, 2:42 PM

Meinersbur added a reviewer: jdoerfert.

Rebase to trunk

Harbormaster completed remote builds in B25995: Diff 178092. Dec 13 2018, 10:25 AM

Reinsert parts of LangRef.rst that went missing after merge.

lib/Transforms/Utils/LoopUtils.cpp
185–224	This has moved to LoopInfo.cpp. LoopUtils.cpp belongs to libTransform, but not all users have a dependency on libTransform (such as `Loop::isAnnotatedParallel()`).

Harbormaster completed remote builds in B25996: Diff 178093. Dec 13 2018, 10:36 AM

hfinkel added inline comments. Dec 18 2018, 5:32 PM

docs/LangRef.rst
5349	update -> updating
5371	of -> if
include/llvm/Analysis/VectorUtils.h
120	access group lists -> access-group lists
126	access group list -> access-group list
lib/Analysis/LoopInfo.cpp
332	Repeating this set of asserts seems unfortunate. Also, they have no comment. Maybe make a function?

  assert(AccGroup->isDistinct() && "Access group metadata nodes must be distinct");
  assert(AccGroup->getNumOperands() == 0 && "Access group metadata nodes must have zero operands");

You also repeat these asserts in lib/Analysis/VectorUtils.cpp below.
lib/Analysis/VectorUtils.cpp
474	Indentation here is odd.
477	Indentation here looks odd too.

Address @hfinkel's review
clang-format

Harbormaster completed remote builds in B26151: Diff 178943. Dec 19 2018, 12:29 PM

Aside from the requested renaming (see below), this LGTM.

include/llvm/Analysis/LoopInfo.h
1016	I'd prefer to name this isValidAsAccessGroup() (because the function does not actually determine whether the MD node is an access group, but rather, whether it might be valid to use it as one).

Rename: isAccessGroup -> isValidAsAccessGroup
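
The check discussed in this review (distinct node, zero operands) can be sketched against a mock node type. This is a minimal sketch, not the committed code: `MockMDNode` is a hypothetical stand-in for `MDNode` that only provides the two queries used in the asserts quoted above.

```cpp
#include <cassert>
#include <vector>

// Minimal mock of a metadata node; the real llvm::MDNode has a much larger API.
struct MockMDNode {
  bool Distinct;
  std::vector<const MockMDNode *> Operands;
  bool isDistinct() const { return Distinct; }
  unsigned getNumOperands() const {
    return static_cast<unsigned>(Operands.size());
  }
};

// A node may serve as an access group only if it is distinct and empty. This
// does not prove the node is used as an access group, hence the name.
bool isValidAsAccessGroup(const MockMDNode *Node) {
  return Node->isDistinct() && Node->getNumOperands() == 0;
}
```

A single predicate like this can back both repeated assert sites, and it also separates an access group (empty) from a list of access groups (non-empty).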

Harbormaster completed remote builds in B26161: Diff 178979. Dec 19 2018, 3:32 PM

Closed by commit rL349725: Introduce llvm.loop.parallel_accesses and llvm.access.group metadata. (authored by Meinersbur). Dec 19 2018, 9:01 PM

This revision was automatically updated to reflect the committed changes.

Meinersbur marked an inline comment as done.

Meinersbur mentioned this in rC349823: [CodeGen] Generate llvm.loop.parallel_accesses instead of llvm.mem.. Dec 20 2018, 1:28 PM

Meinersbur mentioned this in rL349823: [CodeGen] Generate llvm.loop.parallel_accesses instead of llvm.mem..

Revision Contents

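Path	Size
docs/LangRef.rst	114 lines
include/llvm/Analysis/LoopInfo.h	15 lines
include/llvm/Analysis/LoopInfoImpl.h	5 lines
include/llvm/Analysis/VectorUtils.h	15 lines
include/llvm/IR/LLVMContext.h	1 line
include/llvm/Transforms/Utils/LoopUtils.h	2 lines
lib/Analysis/LoopInfo.cpp	68 lines
lib/Analysis/VectorUtils.cpp	91 lines
lib/IR/LLVMContext.cpp	1 line
lib/Transforms/InstCombine/InstCombineCalls.cpp	5 lines
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp	1 line
lib/Transforms/InstCombine/InstCombinePHI.cpp	1 line
lib/Transforms/Scalar/GVNHoist.cpp	2 lines
lib/Transforms/Scalar/LoopVersioningLICM.cpp	2 lines
lib/Transforms/Scalar/MemCpyOptimizer.cpp	3 lines
lib/Transforms/Scalar/SROA.cpp	8 lines
lib/Transforms/Scalar/Scalarizer.cpp	3 lines
lib/Transforms/Utils/InlineFunction.cpp	26 lines
lib/Transforms/Utils/Local.cpp	11 lines
lib/Transforms/Utils/LoopUtils.cpp	38 lines
lib/Transforms/Utils/SimplifyCFG.cpp	3 lines
test/Analysis/LoopInfo/annotated-parallel-complex.ll	91 lines
test/Analysis/LoopInfo/annotated-parallel-simple.ll	37 lines
test/ThinLTO/X86/lazyload_metadata.ll	4 lines
test/Transforms/Inline/parallel-loop-md-callee.ll	56 lines
test/Transforms/Inline/parallel-loop-md-merge.ll	78 lines
test/Transforms/Inline/parallel-loop-md.ll	18 lines
test/Transforms/InstCombine/intersect-accessgroup.ll	113 lines
test/Transforms/InstCombine/loadstore-metadata.ll	17 lines
test/Transforms/InstCombine/mem-par-metadata-memcpy.ll	11 lines
test/Transforms/LoopVectorize/X86/force-ifcvt.ll	11 lines
test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll	13 lines
test/Transforms/LoopVectorize/X86/parallel-loops.ll	34 lines
test/Transforms/LoopVectorize/X86/pr34438.ll	9 lines
test/Transforms/LoopVectorize/X86/vect.omp.force.ll	14 lines
test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll	57 lines
test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll	9 lines
test/Transforms/SROA/mem-par-metadata-sroa.ll	33 lines
test/Transforms/Scalarizer/basic.ll	25 lines
test/Transforms/SimplifyCFG/combine-parallel-mem-md.ll	21 lines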

Commit	Tree	Parents	Author	Summary	Date
52791d0f87d4	1b6b70cd3d1b	fa2fb2ef477e	Michael Kruse	Fix continue/return	Dec 7 2018, 2:30 PM
fa2fb2ef477e	3ec0aa7bcf20	902e9a55c944	Michael Kruse	Manual formatting	Dec 7 2018, 2:03 PM
902e9a55c944	cae3a90bb6e6	0c968eb07a69	Michael Kruse	clang-format	Dec 7 2018, 11:53 AM
0c968eb07a69	88131ce4c828	cfc6659c5a0f	Michael Kruse	isAnnotatedParallel test cases	Dec 7 2018, 10:50 AM
cfc6659c5a0f	b50d4628194c	e63c010f669f	Michael Kruse	More relevant things in test case	Dec 6 2018, 5:38 PM
e63c010f669f	5b19b1bb178b	a3f881766ad0	Michael Kruse	instcombine access group metadata test case	Dec 6 2018, 5:34 PM
a3f881766ad0	dbea21d942e8	225fea15f3f3	Michael Kruse	Test access group merging	Dec 6 2018, 2:36 PM
225fea15f3f3	a46a12a5ee01	a7c8fb51b69a	Michael Kruse	Allow unions of access groups	Dec 6 2018, 12:15 PM
a7c8fb51b69a	9b82d926041d	542766f4f5b2	Michael Kruse	clang-format	Dec 5 2018, 11:33 AM
542766f4f5b2	0da2597227e7	6a6a1ea599a4	Michael Kruse	Merge access groups	Dec 5 2018, 11:29 AM
6a6a1ea599a4	7333e25de608	f802aed99181 89a173bdd79f	Michael Kruse	Merge branch 'access_group' into HEAD	Dec 4 2018, 4:36 PM
89a173bdd79f	be3bcf5fb975	a348028e7c25 aaa6a63d3fd4	Michael Kruse	Merge branch 'access_group' into HEAD	Dec 4 2018, 1:11 PM
aaa6a63d3fd4	c608d8cfde79	0375fe2e5c2d c9b5c59206b8	Michael Kruse	Merge remote-tracking branch 'official/master' into access_group	Oct 4 2018, 2:08 AM
0375fe2e5c2d	9a518291625f	5c738478d09a 7a4f5a4b4d26	Michael Kruse	Merge branch 'access_group' into HEAD	Oct 3 2018, 4:11 AM
7a4f5a4b4d26	18a2d5aef6ea	6199925006dd	Michael Kruse	Silence warning	Sep 26 2018, 4:05 AM
6199925006dd	b4b3c865b6bf	1f289b509371	Michael Kruse	Use call access group if instruction's access group is not set	Sep 26 2018, 4:01 AM
1f289b509371	5c0d20cb187b	3303750bfb00 2a46b8063dd9	Michael Kruse	Merge branch 'access_group' into HEAD	Sep 26 2018, 2:57 AM
2a46b8063dd9	e6aea037220c	a3c3e205293c	Michael Kruse	Assert correct access groups	Sep 14 2018, 11:41 AM
a3c3e205293c	9ffcefebe0ed	3c620e729e3b	Michael Kruse	Cleanup	Sep 14 2018, 10:22 AM
3c620e729e3b	a858ed8f6637	bd194b54c3a8	Michael Kruse	clang-format	Sep 13 2018, 8:46 PM
bd194b54c3a8	1168fa5f56b5	03e629a1a434	Michael Kruse	access groups	Sep 13 2018, 8:37 PM

Diff 177322

docs/LangRef.rst

	[5,112 lines not shown]
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Metadata prefixed with ``llvm.loop.vectorize`` or ``llvm.loop.interleave`` are			Metadata prefixed with ``llvm.loop.vectorize`` or ``llvm.loop.interleave`` are
	used to control per-loop vectorization and interleaving parameters such as			used to control per-loop vectorization and interleaving parameters such as
	vectorization width and interleave count. These metadata should be used in			vectorization width and interleave count. These metadata should be used in
	conjunction with ``llvm.loop`` loop identification metadata. The			conjunction with ``llvm.loop`` loop identification metadata. The
	``llvm.loop.vectorize`` and ``llvm.loop.interleave`` metadata are only			``llvm.loop.vectorize`` and ``llvm.loop.interleave`` metadata are only
	optimization hints and the optimizer will only interleave and vectorize loops if			optimization hints and the optimizer will only interleave and vectorize loops if
	it believes it is safe to do so. The ``llvm.mem.parallel_loop_access`` metadata			it believes it is safe to do so. The ``llvm.loop.parallel_accesses`` metadata
	which contains information about loop-carried memory dependencies can be helpful			which contains information about loop-carried memory dependencies can be helpful
	in determining the safety of these transformations.			in determining the safety of these transformations.

	'``llvm.loop.interleave.count``' Metadata			'``llvm.loop.interleave.count``' Metadata
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	This metadata suggests an interleave count to the loop interleaver.			This metadata suggests an interleave count to the loop interleaver.
	The first operand is the string ``llvm.loop.interleave.count`` and the			The first operand is the string ``llvm.loop.interleave.count`` and the
	[185 lines not shown]
	.. code-block:: llvm			.. code-block:: llvm

	!0 = !{!"llvm.loop.distribute.enable", i1 0}			!0 = !{!"llvm.loop.distribute.enable", i1 0}
	!1 = !{!"llvm.loop.distribute.enable", i1 1}			!1 = !{!"llvm.loop.distribute.enable", i1 1}

	This metadata should be used in conjunction with ``llvm.loop`` loop			This metadata should be used in conjunction with ``llvm.loop`` loop
	identification metadata.			identification metadata.

	'``llvm.mem``'			'``llvm.access.group``' Metadata
	^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Metadata types used to annotate memory accesses with information helpful			``llvm.access.group`` metadata can be attached to any instruction that
	for optimizations are prefixed with ``llvm.mem``.			potentially accesses memory. It can point to a single distinct metadata
				node, which we call access group. This node represents all memory access
				instructions referring to it via ``llvm.access.group``. When an
				instruction belongs to multiple access groups, it can also point to a
				list of accesses groups, illustrated by the following example.

	'``llvm.mem.parallel_loop_access``' Metadata			.. code-block:: llvm
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	The ``llvm.mem.parallel_loop_access`` metadata refers to a loop identifier,			%val = load i32, i32* %arrayidx, !llvm.access.group !0
	or metadata containing a list of loop identifiers for nested loops.			...
	The metadata is attached to memory accessing instructions and denotes that			!0 = !{!1, !2}
	no loop carried memory dependence exist between it and other instructions denoted			!1 = distinct !{}
	with the same loop identifier. The metadata on memory reads also implies that			!2 = distinct !{}
	if conversion (i.e. speculative execution within a loop iteration) is safe.
				It is illegal for the list node to be empty since it might be confused
	Precisely, given two instructions ``m1`` and ``m2`` that both have the			with an access group.
	``llvm.mem.parallel_loop_access`` metadata, with ``L1`` and ``L2`` being the
	set of loops associated with that metadata, respectively, then there is no loop			The access group metadata node must be 'distinct' to avoid collapsing
	carried dependence between ``m1`` and ``m2`` for loops in both ``L1`` and			multiple access groups by content. A access group metadata node must
	``L2``.			always be empty which can be used to distinguish an access group
				metadata node from a list of access groups. Being empty avoids the
	As a special case, if all memory accessing instructions in a loop have			situation that the content must be updated which, because metadata is
	``llvm.mem.parallel_loop_access`` metadata that refers to that loop, then the			immutable by design, would required finding and update all references
	loop has no loop carried memory dependences and is considered to be a parallel			to the access group node.
	loop.
				The access group can be used to refer to a memory access instruction
				without pointing to it directly (which is not possible in global
				metadata). Currently, the only metadata making use of it is
				``llvm.loop.parallel_accesses``.

				'``llvm.loop.parallel_accesses``' Metadata
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Note that if not all memory access instructions have such metadata referring to			The ``llvm.loop.parallel_accesses`` metadata refers to one or more
	the loop, then the loop is considered not being trivially parallel. Additional			access group metadata nodes (see ``llvm.access.group``). It denotes that
				no loop-carried memory dependence exist between it and other instructions
				in the loop with this metadata.

				Let ``m1`` and ``m2`` be two instructions that both have the
				``llvm.access.group`` metadata to the access group ``g1``, respectively
				``g2`` (which might be identical). If a loop contains both access groups
				in its ``llvm.loop.parallel_accesses`` metadata, then the compiler can
				assume that there is no dependency between ``m1`` and ``m2`` carried by
				this loop. Instructions that belong to multiple access groups are
				considered having this property of at least one of the access groups
				matches the ``llvm.loop.parallel_accesses`` list.

				If all memory-accessing instructions in a loop have
				``llvm.loop.parallel_accesses`` metadata that refers to that loop, then the
				loop has no loop carried memory dependences and is considered to be a
				parallel loop.

				Note that if not all memory access instructions belong to an access
				group referred to by ``llvm.loop.parallel_accesses``, then the loop must
				not be considered trivially parallel. Additional
	memory dependence analysis is required to make that determination. As a fail			memory dependence analysis is required to make that determination. As a fail
	safe mechanism, this causes loops that were originally parallel to be considered			safe mechanism, this causes loops that were originally parallel to be considered
	sequential (if optimization passes that are unaware of the parallel semantics			sequential (if optimization passes that are unaware of the parallel semantics
	insert new memory instructions into the loop body).			insert new memory instructions into the loop body).

	Example of a loop that is considered parallel due to its correct use of			Example of a loop that is considered parallel due to its correct use of
	both ``llvm.loop`` and ``llvm.mem.parallel_loop_access``			both ``llvm.access.group`` and ``llvm.loop.parallel_accesses``
	metadata types that refer to the same loop identifier metadata.			metadata types.

	.. code-block:: llvm			.. code-block:: llvm

	for.body:			for.body:
	...			...
	%val0 = load i32, i32* %arrayidx, !llvm.mem.parallel_loop_access !0			%val0 = load i32, i32* %arrayidx, !llvm.access.group !1
	...			...
	store i32 %val0, i32* %arrayidx1, !llvm.mem.parallel_loop_access !0			store i32 %val0, i32* %arrayidx1, !llvm.access.group !1
	...			...
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

	for.end:			for.end:
	...			...
	!0 = !{!0}			!0 = !{!0, !{"llvm.loop.parallel_accesses", !1}}
				!1 = distinct !{}

	It is also possible to have nested parallel loops. In that case the			It is also possible to have nested parallel loops:
	memory accesses refer to a list of loop identifier metadata nodes instead of
	the loop identifier metadata node directly:

	.. code-block:: llvm			.. code-block:: llvm

	outer.for.body:			outer.for.body:
	...			...
	%val1 = load i32, i32* %arrayidx3, !llvm.mem.parallel_loop_access !2			%val1 = load i32, i32* %arrayidx3, !llvm.access.group !4
	...			...
	br label %inner.for.body			br label %inner.for.body

	inner.for.body:			inner.for.body:
	...			...
	%val0 = load i32, i32* %arrayidx1, !llvm.mem.parallel_loop_access !0			%val0 = load i32, i32* %arrayidx1, !llvm.access.group !3
	...			...
	store i32 %val0, i32* %arrayidx2, !llvm.mem.parallel_loop_access !0			store i32 %val0, i32* %arrayidx2, !llvm.access.group !3
	...			...
	br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop !1			br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop !1

	inner.for.end:			inner.for.end:
	...			...
	store i32 %val1, i32* %arrayidx4, !llvm.mem.parallel_loop_access !2			store i32 %val1, i32* %arrayidx4, !llvm.access.group !4
	...			...
	br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2			br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2

	outer.for.end: ; preds = %for.body			outer.for.end: ; preds = %for.body
	...			...
	!0 = !{!1, !2} ; a list of loop identifiers			!1 = !{!1, !{"llvm.loop.parallel_accesses", !3}} ; an identifier for the inner loop
	!1 = !{!1} ; an identifier for the inner loop			!2 = !{!2, !{"llvm.loop.parallel_accesses", !3, !4}} ; an identifier for the outer loop
	!2 = !{!2} ; an identifier for the outer loop			!3 = distinct !{} ; access group for instructions in the inner loop (which are implicitly contained in outer loop as well)
				!4 = distinct !{} ; access group for instructions in the outer, but not the inner loop

	'``irr_loop``' Metadata			'``irr_loop``' Metadata
	^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^

	``irr_loop`` metadata may be attached to the terminator instruction of a basic			``irr_loop`` metadata may be attached to the terminator instruction of a basic
	block that's an irreducible loop header (note that an irreducible loop has more			block that's an irreducible loop header (note that an irreducible loop has more
	than once header basic blocks.) If ``irr_loop`` metadata is attached to the			than once header basic blocks.) If ``irr_loop`` metadata is attached to the
	terminator instruction of a basic block that is not really an irreducible loop			terminator instruction of a basic block that is not really an irreducible loop
	[10,528 lines not shown]

include/llvm/Analysis/LoopInfo.h

[402 lines not shown]	#endif
}		}

/// Verify loop structure		/// Verify loop structure
void verifyLoop() const;		void verifyLoop() const;

/// Verify loop structure of this loop and all nested loops.		/// Verify loop structure of this loop and all nested loops.
void verifyLoopNest(DenseSet<const LoopT *> *Loops) const;		void verifyLoopNest(DenseSet<const LoopT *> *Loops) const;

		/// Returns true if the loop is annotated parallel.
		///
		/// Derived classes can override this method using static template
		/// polymorphism.
		bool isAnnotatedParallel() const { return false; }

/// Print loop with all the BBs inside it.		/// Print loop with all the BBs inside it.
void print(raw_ostream &OS, unsigned Depth = 0, bool Verbose = false) const;		void print(raw_ostream &OS, unsigned Depth = 0, bool Verbose = false) const;

protected:		protected:
friend class LoopInfoBase<BlockT, LoopT>;		friend class LoopInfoBase<BlockT, LoopT>;

/// This creates an empty loop.		/// This creates an empty loop.
LoopBase() : ParentLoop(nullptr) {}		LoopBase() : ParentLoop(nullptr) {}
[565 lines not shown]	public:
void print(raw_ostream &O, const Module *M = nullptr) const override;		void print(raw_ostream &O, const Module *M = nullptr) const override;

void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;
};		};

/// Function to print a loop's contents as LLVM's text IR assembly.		/// Function to print a loop's contents as LLVM's text IR assembly.
void printLoop(Loop &L, raw_ostream &OS, const std::string &Banner = "");		void printLoop(Loop &L, raw_ostream &OS, const std::string &Banner = "");


		/// Find string metadata for a loop.
		///
		/// Returns the MDNode where the first operand is the metadata's name. The
		/// following operands are the metadata's values. If no metadata with @p Name is
		/// found, return nullptr.
		MDNode *findOptionMDForLoop(const Loop *TheLoop, StringRef Name);


} // End llvm namespace		} // End llvm namespace

#endif		#endif

include/llvm/Analysis/LoopInfoImpl.h

[386 lines not shown]	void LoopBase<BlockT, LoopT>::verifyLoopNest(
// Verify the subloops.		// Verify the subloops.
for (iterator I = begin(), E = end(); I != E; ++I)		for (iterator I = begin(), E = end(); I != E; ++I)
(*I)->verifyLoopNest(Loops);		(*I)->verifyLoopNest(Loops);
}		}

template <class BlockT, class LoopT>		template <class BlockT, class LoopT>
void LoopBase<BlockT, LoopT>::print(raw_ostream &OS, unsigned Depth,		void LoopBase<BlockT, LoopT>::print(raw_ostream &OS, unsigned Depth,
bool Verbose) const {		bool Verbose) const {
OS.indent(Depth * 2) << "Loop at depth " << getLoopDepth() << " containing: ";		OS.indent(Depth * 2);
		if (static_cast<const LoopT *>(this)->isAnnotatedParallel())
		OS << "Parallel ";
		OS << "Loop at depth " << getLoopDepth() << " containing: ";

BlockT *H = getHeader();		BlockT *H = getHeader();
for (unsigned i = 0; i < getBlocks().size(); ++i) {		for (unsigned i = 0; i < getBlocks().size(); ++i) {
BlockT *BB = getBlocks()[i];		BlockT *BB = getBlocks()[i];
if (!Verbose) {		if (!Verbose) {
if (i)		if (i)
OS << ",";		OS << ",";
BB->printAsOperand(OS, false);		BB->printAsOperand(OS, false);
[354 lines not shown]

include/llvm/Analysis/VectorUtils.h

	[111 lines not shown]
	///			///
	/// If the optional TargetTransformInfo is provided, this function tries harder			/// If the optional TargetTransformInfo is provided, this function tries harder
	/// to do less work by only looking at illegal types.			/// to do less work by only looking at illegal types.
	MapVector<Instruction*, uint64_t>			MapVector<Instruction*, uint64_t>
	computeMinimumValueSizes(ArrayRef<BasicBlock*> Blocks,			computeMinimumValueSizes(ArrayRef<BasicBlock*> Blocks,
	DemandedBits &DB,			DemandedBits &DB,
	const TargetTransformInfo *TTI=nullptr);			const TargetTransformInfo *TTI=nullptr);

				/// Compute the union of two access group lists.
				///
				/// If the list contains just one access group, it is returned directly. If the
				/// list is empty, returns nullptr.
				MDNode *uniteAccessGroups(MDNode *AccGroups1, MDNode *AccGroups2);

				/// Compute the access group list of access groups that @p Inst1 and @p Inst2 are both in. If either instruction does not access memory at all, it is considered to be in every list.
				hfinkel: access group list -> access-group list
				///
				/// If the list contains just one access group, it is returned directly. If the
				/// list is empty, returns nullptr.
				MDNode *intersectAccessGroups(const Instruction *Inst1, const Instruction *Inst2);

	/// Specifically, let Kinds = [MD_tbaa, MD_alias_scope, MD_noalias, MD_fpmath,			/// Specifically, let Kinds = [MD_tbaa, MD_alias_scope, MD_noalias, MD_fpmath,
	/// MD_nontemporal]. For K in Kinds, we get the MDNode for K from each of the			/// MD_nontemporal, MD_access_group].
				/// For K in Kinds, we get the MDNode for K from each of the
	/// elements of VL, compute their "intersection" (i.e., the most generic			/// elements of VL, compute their "intersection" (i.e., the most generic
	/// metadata value that covers all of the individual values), and set I's			/// metadata value that covers all of the individual values), and set I's
	/// metadata for M equal to the intersection value.			/// metadata for M equal to the intersection value.
	///			///
	/// This function always sets a (possibly null) value for each K in Kinds.			/// This function always sets a (possibly null) value for each K in Kinds.
	Instruction *propagateMetadata(Instruction *I, ArrayRef<Value *> VL);			Instruction *propagateMetadata(Instruction *I, ArrayRef<Value *> VL);

	/// Create a mask that filters the members of an interleave group where there			/// Create a mask that filters the members of an interleave group where there

include/llvm/IR/LLVMContext.h

Show First 20 Lines • Show All 96 Lines • ▼ Show 20 Lines	enum : unsigned {
MD_align = 17, // "align"		MD_align = 17, // "align"
MD_loop = 18, // "llvm.loop"		MD_loop = 18, // "llvm.loop"
MD_type = 19, // "type"		MD_type = 19, // "type"
MD_section_prefix = 20, // "section_prefix"		MD_section_prefix = 20, // "section_prefix"
MD_absolute_symbol = 21, // "absolute_symbol"		MD_absolute_symbol = 21, // "absolute_symbol"
MD_associated = 22, // "associated"		MD_associated = 22, // "associated"
MD_callees = 23, // "callees"		MD_callees = 23, // "callees"
MD_irr_loop = 24, // "irr_loop"		MD_irr_loop = 24, // "irr_loop"
		MD_access_group = 25, // "llvm.access.group"
};		};

/// Known operand bundle tag IDs, which always have the same value. All		/// Known operand bundle tag IDs, which always have the same value. All
/// operand bundle tags that LLVM has special knowledge of are listed here.		/// operand bundle tags that LLVM has special knowledge of are listed here.
/// Additionally, this scheme allows LLVM to efficiently check for specific		/// Additionally, this scheme allows LLVM to efficiently check for specific
/// operand bundle tags without comparing strings.		/// operand bundle tags without comparing strings.
enum : unsigned {		enum : unsigned {
OB_deopt = 0, // "deopt"		OB_deopt = 0, // "deopt"

include/llvm/Transforms/Utils/LoopUtils.h

	/// Returns the instructions that use values defined in the loop.			/// Returns the instructions that use values defined in the loop.
	SmallVector<Instruction *, 8> findDefsUsedOutsideOfLoop(Loop *L);			SmallVector<Instruction *, 8> findDefsUsedOutsideOfLoop(Loop *L);

	/// Find string metadata for loop			/// Find string metadata for loop
	///			///
	/// If it has a value (e.g. {"llvm.distribute", 1} return the value as an			/// If it has a value (e.g. {"llvm.distribute", 1} return the value as an
	/// operand or null otherwise. If the string metadata is not found return			/// operand or null otherwise. If the string metadata is not found return
	/// Optional's not-a-value.			/// Optional's not-a-value.
	Optional<const MDOperand *> findStringMetadataForLoop(Loop *TheLoop,			Optional<const MDOperand *> findStringMetadataForLoop(const Loop *TheLoop,
	StringRef Name);			StringRef Name);

	/// Set input string into loop metadata by keeping other values intact.			/// Set input string into loop metadata by keeping other values intact.
	void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,			void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
	unsigned V = 0);			unsigned V = 0);

	/// Get a loop's estimated trip count based on branch weight metadata.			/// Get a loop's estimated trip count based on branch weight metadata.
	/// Returns 0 when the count is estimated to be 0, or None when a meaningful			/// Returns 0 when the count is estimated to be 0, or None when a meaningful

lib/Analysis/LoopInfo.cpp

}		}

bool Loop::isAnnotatedParallel() const {		bool Loop::isAnnotatedParallel() const {
MDNode *DesiredLoopIdMetadata = getLoopID();		MDNode *DesiredLoopIdMetadata = getLoopID();

if (!DesiredLoopIdMetadata)		if (!DesiredLoopIdMetadata)
return false;		return false;

		MDNode *ParallelAccesses =
		findOptionMDForLoop(this, "llvm.loop.parallel_accesses");
		SmallPtrSet<MDNode *, 4>
		ParallelAccessGroups; // For scalable 'contains' check.
		if (ParallelAccesses) {
		for (auto &MD : drop_begin(ParallelAccesses->operands(), 1)) {
		MDNode *AccGroup = cast<MDNode>(MD.get());
		assert(AccGroup->isDistinct());
		assert(AccGroup->getNumOperands() == 0);
		ParallelAccessGroups.insert(AccGroup);
		}
		}

// The loop branch contains the parallel loop metadata. In order to ensure		// The loop branch contains the parallel loop metadata. In order to ensure
// that any parallel-loop-unaware optimization pass hasn't added loop-carried		// that any parallel-loop-unaware optimization pass hasn't added loop-carried
// dependencies (thus converted the loop back to a sequential loop), check		// dependencies (thus converted the loop back to a sequential loop), check
// that all the memory instructions in the loop contain parallelism metadata		// that all the memory instructions in the loop belong to an access group that
// that point to the same unique "loop id metadata" the loop branch does.		// is parallel to this loop.
for (BasicBlock *BB : this->blocks()) {		for (BasicBlock *BB : this->blocks()) {
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
if (!I.mayReadOrWriteMemory())		if (!I.mayReadOrWriteMemory())
continue;		continue;

		if (auto AccessGroup = I.getMetadata(LLVMContext::MD_access_group)) {
		auto ContainsAccessGroup = [&ParallelAccessGroups](MDNode *AG) -> bool {
		if (AG->getNumOperands() == 0) {
		assert(AG->isDistinct());
		return ParallelAccessGroups.count(AG);
		}

		for (auto &AccessListItem : AG->operands()) {
		MDNode *AccGroup = cast<MDNode>(AccessListItem.get());
		assert(AccGroup->isDistinct());
		hfinkel: Repeating this set of asserts seems unfortunate. Also, they have no comment. Maybe make a function?
		assert(AccGroup->isDistinct() && "Access group metadata nodes must be distinct");
		assert(AccGroup->getNumOperands() == 0 && "Access group metadata nodes must have zero operands");
		You also repeat these asserts in lib/Analysis/VectorUtils.cpp below.
		assert(AccGroup->getNumOperands() == 0);
		if (ParallelAccessGroups.count(AccGroup))
		return true;
		}
		return false;
		};

		if (ContainsAccessGroup(AccessGroup))
		continue;
		}

// The memory instruction can refer to the loop identifier metadata		// The memory instruction can refer to the loop identifier metadata
// directly or indirectly through another list metadata (in case of		// directly or indirectly through another list metadata (in case of
// nested parallel loops). The loop identifier metadata refers to		// nested parallel loops). The loop identifier metadata refers to
// itself so we can check both cases with the same routine.		// itself so we can check both cases with the same routine.
MDNode *LoopIdMD =		MDNode *LoopIdMD =
I.getMetadata(LLVMContext::MD_mem_parallel_loop_access);		I.getMetadata(LLVMContext::MD_mem_parallel_loop_access);

if (!LoopIdMD)		if (!LoopIdMD)
▲ Show 20 Lines • Show All 374 Lines • ▼ Show 20 Lines	if (!ExitBlocks.empty()) {
for (auto *Block : ExitBlocks)		for (auto *Block : ExitBlocks)
if (Block)		if (Block)
Block->print(OS);		Block->print(OS);
else		else
OS << "Printing <null> block";		OS << "Printing <null> block";
}		}
}		}

		/// Find and return the loop attribute node for the attribute @p Name in @p
		/// LoopID. Return nullptr if there is no such attribute.
		static MDNode *findOptionMDForLoopID(MDNode *LoopID, StringRef Name) {
		// Return none if LoopID is false.
		if (!LoopID)
		return nullptr;

		// First operand should refer to the loop id itself.
		assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
		assert(LoopID->getOperand(0) == LoopID && "invalid loop id");

		// Iterate over LoopID operands and look for MDString Metadata
		for (unsigned i = 1, e = LoopID->getNumOperands(); i < e; ++i) {
		MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i));
		if (!MD)
		continue;
		MDString *S = dyn_cast<MDString>(MD->getOperand(0));
		if (!S)
		continue;
		// Return true if MDString holds expected MetaData.
		if (Name.equals(S->getString()))
		return MD;
		}
		return nullptr;
		}

		MDNode *llvm::findOptionMDForLoop(const Loop *TheLoop, StringRef Name) {
		return findOptionMDForLoopID(TheLoop->getLoopID(), Name);
		}
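
A note on the access-group check added to Loop::isAnnotatedParallel() above: as a minimal sketch, its core test can be modeled without any LLVM types. The names `AccessGroupId`, `MemAccess`, and `allAccessesParallel` are hypothetical and exist only for this illustration; plain integers stand in for distinct access-group nodes. The sketch assumes each memory access is described solely by its access-group membership, and it deliberately leaves out the fallback to the older llvm.mem.parallel_loop_access check that the patch keeps.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for one distinct, operand-less access-group node.
using AccessGroupId = int;

// Hypothetical stand-in for a memory-accessing instruction. Groups models the
// !llvm.access.group attachment: a single group or a list of groups.
// An empty vector models an instruction without that metadata.
struct MemAccess {
  std::vector<AccessGroupId> Groups;
};

// Returns true if every memory access belongs to at least one group that the
// loop lists as parallel (modeling the llvm.loop.parallel_accesses operands).
bool allAccessesParallel(const std::set<AccessGroupId> &ParallelGroups,
                         const std::vector<MemAccess> &Accesses) {
  for (const MemAccess &A : Accesses) {
    bool InParallelGroup = false;
    for (AccessGroupId G : A.Groups) {
      if (ParallelGroups.count(G)) {
        InParallelGroup = true;
        break;
      }
    }
    // One access outside all parallel groups makes the loop non-parallel.
    if (!InParallelGroup)
      return false;
  }
  return true;
}
```

In this model a loop with no memory accesses is trivially parallel, and a single access without a matching group is enough to reject the loop.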

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// LoopInfo implementation		// LoopInfo implementation
//		//

char LoopInfoWrapperPass::ID = 0;		char LoopInfoWrapperPass::ID = 0;
INITIALIZE_PASS_BEGIN(LoopInfoWrapperPass, "loops", "Natural Loop Information",		INITIALIZE_PASS_BEGIN(LoopInfoWrapperPass, "loops", "Natural Loop Information",
true, true)		true, true)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)

lib/Analysis/VectorUtils.cpp

Show First 20 Lines • Show All 458 Lines • ▼ Show 20 Lines	for (auto MI = ECs.member_begin(I), ME = ECs.member_end(); MI != ME; ++MI) {
if (MinBW < Ty->getScalarSizeInBits())		if (MinBW < Ty->getScalarSizeInBits())
MinBWs[cast<Instruction>(*MI)] = MinBW;		MinBWs[cast<Instruction>(*MI)] = MinBW;
}		}
}		}

return MinBWs;		return MinBWs;
}		}

		/// Add all access groups in @p AccGroups to @p List.
		template <typename ListT>
		static void addToAccessGroupList(ListT &List, MDNode *AccGroups) {
		// Interpret an access group as a list containing itself.
		if (AccGroups->getNumOperands() == 0) {
		assert(AccGroups->isDistinct());
		List.insert(AccGroups);
		return;
		hfinkel: Indentation here is odd.
		}

		for (auto &AccGroupListOp : AccGroups->operands()) {
		hfinkel: Indentation here looks odd too.
		auto* Item = cast<MDNode>(AccGroupListOp.get());
		assert(Item->isDistinct());
		assert(Item->getNumOperands()==0);
		List.insert(Item);
		}
		};

		MDNode *llvm::uniteAccessGroups(MDNode *AccGroups1,
		MDNode *AccGroups2) {
		if (!AccGroups1)
		return AccGroups2;
		if (!AccGroups2)
		return AccGroups1;
		if (AccGroups1 == AccGroups2)
		return AccGroups1;

		SmallSetVector<Metadata *, 4> Union;
		addToAccessGroupList(Union, AccGroups1);
		addToAccessGroupList(Union, AccGroups2);

		if (Union.size() == 0)
		return nullptr;
		if (Union.size() == 1)
		return cast<MDNode>(Union.front());

		LLVMContext &Ctx = AccGroups1->getContext();
		return MDNode::get(Ctx, Union.getArrayRef());
		}
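
As a rough illustration of what uniteAccessGroups computes, the following sketch models the union with standard containers instead of MDNode. `GroupList` and `uniteGroups` are hypothetical names, and integers stand in for distinct access-group nodes. In this model an empty vector corresponds to nullptr, a one-element vector to a single access group returned directly, and a longer vector to a newly created list node. First-seen order is kept, on the assumption that it matches the insertion order of the SmallSetVector used above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: a list of access-group ids. Empty models "no metadata".
using GroupList = std::vector<int>;

// Union of two access-group lists, without duplicates, in first-seen order.
GroupList uniteGroups(const GroupList &A, const GroupList &B) {
  GroupList Union;
  auto AddAll = [&Union](const GroupList &L) {
    for (int G : L)
      if (std::find(Union.begin(), Union.end(), G) == Union.end())
        Union.push_back(G);
  };
  AddAll(A);
  AddAll(B);
  return Union;
}
```

The linear duplicate search is acceptable here only because such lists are expected to be short; the patch itself uses a set-backed vector for the same purpose.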

		MDNode *llvm::intersectAccessGroups(const Instruction *Inst1,
		const Instruction *Inst2) {
		bool MayAccessMem1 = Inst1->mayReadOrWriteMemory();
		bool MayAccessMem2 = Inst2->mayReadOrWriteMemory();

		if (!MayAccessMem1 && !MayAccessMem2)
		return nullptr;
		if (!MayAccessMem1)
		return Inst2->getMetadata(LLVMContext::MD_access_group);
		if (!MayAccessMem2)
		return Inst1->getMetadata(LLVMContext::MD_access_group);

		MDNode* MD1 = Inst1->getMetadata(LLVMContext::MD_access_group);
		MDNode* MD2 = Inst2->getMetadata(LLVMContext::MD_access_group);
		if (!MD1 || !MD2)
		return nullptr;
		if (MD1 == MD2)
		return MD1;

		// Use set for scalable 'contains' check.
		SmallPtrSet<Metadata *, 4> AccGroupSet2;
		addToAccessGroupList(AccGroupSet2, MD2);

		SmallVector<Metadata *, 4> Intersection;
		if (MD1->getNumOperands() == 0) {
		assert(MD1->isDistinct());
		if (AccGroupSet2.count(MD1))
		Intersection.push_back(MD1);
		} else {
		for (const MDOperand &Node : MD1->operands()) {
		auto* Item = cast<MDNode>(Node.get());
		assert(Item->isDistinct());
		if (AccGroupSet2.count(Item))
		Intersection.push_back(Item);
		}
		}

		if (Intersection.size() == 0)
		return nullptr;
		if (Intersection.size() == 1)
		return cast<MDNode>(Intersection.front());

		LLVMContext &Ctx = Inst1->getContext();
		return MDNode::get(Ctx, Intersection);
		}
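
The intersection rule above, including the special case for instructions that do not touch memory, can be sketched in the same simplified style. `GroupList`, `Inst`, and `intersectGroups` are hypothetical names used only here; integers again stand in for distinct access-group nodes, and an empty vector models a null result. This is an approximation of the logic for reading purposes, not the LLVM implementation.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model: a list of access-group ids. Empty models "no metadata".
using GroupList = std::vector<int>;

// Hypothetical stand-in for an instruction and its access-group attachment.
struct Inst {
  bool MayAccessMemory;
  GroupList Groups;
};

// Groups that both instructions belong to. An instruction that does not
// access memory is treated as a member of every group, so it never narrows
// the other instruction's list.
GroupList intersectGroups(const Inst &I1, const Inst &I2) {
  if (!I1.MayAccessMemory && !I2.MayAccessMemory)
    return {};
  if (!I1.MayAccessMemory)
    return I2.Groups;
  if (!I2.MayAccessMemory)
    return I1.Groups;

  // Set for fast membership tests against the second list.
  std::set<int> InSecond(I2.Groups.begin(), I2.Groups.end());
  GroupList Result;
  for (int G : I1.Groups)
    if (InSecond.count(G))
      Result.push_back(G);
  return Result;
}
```

The result keeps the order of the first instruction's list, which mirrors how the loop above walks the operands of MD1.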

/// \returns \p I after propagating metadata from \p VL.		/// \returns \p I after propagating metadata from \p VL.
Instruction *llvm::propagateMetadata(Instruction *Inst, ArrayRef<Value *> VL) {		Instruction *llvm::propagateMetadata(Instruction *Inst, ArrayRef<Value *> VL) {
Instruction *I0 = cast<Instruction>(VL[0]);		Instruction *I0 = cast<Instruction>(VL[0]);
SmallVector<std::pair<unsigned, MDNode *>, 4> Metadata;		SmallVector<std::pair<unsigned, MDNode *>, 4> Metadata;
I0->getAllMetadataOtherThanDebugLoc(Metadata);		I0->getAllMetadataOtherThanDebugLoc(Metadata);

for (auto Kind :		for (auto Kind :
{LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,		{LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias, LLVMContext::MD_fpmath,		LLVMContext::MD_noalias, LLVMContext::MD_fpmath,
LLVMContext::MD_nontemporal, LLVMContext::MD_invariant_load}) {		LLVMContext::MD_nontemporal, LLVMContext::MD_invariant_load,LLVMContext::MD_access_group}) {
MDNode *MD = I0->getMetadata(Kind);		MDNode *MD = I0->getMetadata(Kind);

for (int J = 1, E = VL.size(); MD && J != E; ++J) {		for (int J = 1, E = VL.size(); MD && J != E; ++J) {
const Instruction *IJ = cast<Instruction>(VL[J]);		const Instruction *IJ = cast<Instruction>(VL[J]);
MDNode *IMD = IJ->getMetadata(Kind);		MDNode *IMD = IJ->getMetadata(Kind);
switch (Kind) {		switch (Kind) {
case LLVMContext::MD_tbaa:		case LLVMContext::MD_tbaa:
MD = MDNode::getMostGenericTBAA(MD, IMD);		MD = MDNode::getMostGenericTBAA(MD, IMD);
break;		break;
case LLVMContext::MD_alias_scope:		case LLVMContext::MD_alias_scope:
MD = MDNode::getMostGenericAliasScope(MD, IMD);		MD = MDNode::getMostGenericAliasScope(MD, IMD);
break;		break;
case LLVMContext::MD_fpmath:		case LLVMContext::MD_fpmath:
MD = MDNode::getMostGenericFPMath(MD, IMD);		MD = MDNode::getMostGenericFPMath(MD, IMD);
break;		break;
case LLVMContext::MD_noalias:		case LLVMContext::MD_noalias:
case LLVMContext::MD_nontemporal:		case LLVMContext::MD_nontemporal:
case LLVMContext::MD_invariant_load:		case LLVMContext::MD_invariant_load:
MD = MDNode::intersect(MD, IMD);		MD = MDNode::intersect(MD, IMD);
break;		break;
		case LLVMContext::MD_access_group:
		MD = intersectAccessGroups(Inst, IJ);
		break;
default:		default:
llvm_unreachable("unhandled metadata");		llvm_unreachable("unhandled metadata");
}		}
}		}

Inst->setMetadata(Kind, MD);		Inst->setMetadata(Kind, MD);
}		}


lib/IR/LLVMContext.cpp

Show First 20 Lines • Show All 55 Lines • ▼ Show 20 Lines	std::pair<unsigned, StringRef> MDKinds[] = {
{MD_align, "align"},		{MD_align, "align"},
{MD_loop, "llvm.loop"},		{MD_loop, "llvm.loop"},
{MD_type, "type"},		{MD_type, "type"},
{MD_section_prefix, "section_prefix"},		{MD_section_prefix, "section_prefix"},
{MD_absolute_symbol, "absolute_symbol"},		{MD_absolute_symbol, "absolute_symbol"},
{MD_associated, "associated"},		{MD_associated, "associated"},
{MD_callees, "callees"},		{MD_callees, "callees"},
{MD_irr_loop, "irr_loop"},		{MD_irr_loop, "irr_loop"},
		{MD_access_group, "llvm.access.group"},
};		};

for (auto &MDKind : MDKinds) {		for (auto &MDKind : MDKinds) {
unsigned ID = getMDKindID(MDKind.second);		unsigned ID = getMDKindID(MDKind.second);
assert(ID == MDKind.first && "metadata kind id drifted");		assert(ID == MDKind.first && "metadata kind id drifted");
(void)ID;		(void)ID;
}		}


lib/Transforms/InstCombine/InstCombineCalls.cpp

Show First 20 Lines • Show All 168 Lines • ▼ Show 20 Lines	Instruction InstCombiner::SimplifyAnyMemTransfer(AnyMemTransferInst MI) {
// Alignment from the mem intrinsic will be better, so use it.		// Alignment from the mem intrinsic will be better, so use it.
L->setAlignment(CopySrcAlign);		L->setAlignment(CopySrcAlign);
if (CopyMD)		if (CopyMD)
L->setMetadata(LLVMContext::MD_tbaa, CopyMD);		L->setMetadata(LLVMContext::MD_tbaa, CopyMD);
MDNode *LoopMemParallelMD =		MDNode *LoopMemParallelMD =
MI->getMetadata(LLVMContext::MD_mem_parallel_loop_access);		MI->getMetadata(LLVMContext::MD_mem_parallel_loop_access);
if (LoopMemParallelMD)		if (LoopMemParallelMD)
L->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);		L->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);
		MDNode *AccessGroupMD = MI->getMetadata(LLVMContext::MD_access_group);
		if (AccessGroupMD)
		L->setMetadata(LLVMContext::MD_access_group, AccessGroupMD);

StoreInst *S = Builder.CreateStore(L, Dest);		StoreInst *S = Builder.CreateStore(L, Dest);
// Alignment from the mem intrinsic will be better, so use it.		// Alignment from the mem intrinsic will be better, so use it.
S->setAlignment(CopyDstAlign);		S->setAlignment(CopyDstAlign);
if (CopyMD)		if (CopyMD)
S->setMetadata(LLVMContext::MD_tbaa, CopyMD);		S->setMetadata(LLVMContext::MD_tbaa, CopyMD);
if (LoopMemParallelMD)		if (LoopMemParallelMD)
S->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);		S->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);
		if (AccessGroupMD)
		S->setMetadata(LLVMContext::MD_access_group, AccessGroupMD);

if (auto *MT = dyn_cast<MemTransferInst>(MI)) {		if (auto *MT = dyn_cast<MemTransferInst>(MI)) {
// non-atomics can be volatile		// non-atomics can be volatile
L->setVolatile(MT->isVolatile());		L->setVolatile(MT->isVolatile());
S->setVolatile(MT->isVolatile());		S->setVolatile(MT->isVolatile());
}		}
if (isa<AtomicMemTransferInst>(MI)) {		if (isa<AtomicMemTransferInst>(MI)) {
// atomics have to be unordered		// atomics have to be unordered

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

Show First 20 Lines • Show All 487 Lines • ▼ Show 20 Lines	for (const auto &MDPair : MD) {
case LLVMContext::MD_prof:		case LLVMContext::MD_prof:
case LLVMContext::MD_fpmath:		case LLVMContext::MD_fpmath:
case LLVMContext::MD_tbaa_struct:		case LLVMContext::MD_tbaa_struct:
case LLVMContext::MD_invariant_load:		case LLVMContext::MD_invariant_load:
case LLVMContext::MD_alias_scope:		case LLVMContext::MD_alias_scope:
case LLVMContext::MD_noalias:		case LLVMContext::MD_noalias:
case LLVMContext::MD_nontemporal:		case LLVMContext::MD_nontemporal:
case LLVMContext::MD_mem_parallel_loop_access:		case LLVMContext::MD_mem_parallel_loop_access:
		case LLVMContext::MD_access_group:
// All of these directly apply.		// All of these directly apply.
NewLoad->setMetadata(ID, N);		NewLoad->setMetadata(ID, N);
break;		break;

case LLVMContext::MD_nonnull:		case LLVMContext::MD_nonnull:
copyNonnullMetadata(LI, N, *NewLoad);		copyNonnullMetadata(LI, N, *NewLoad);
break;		break;
case LLVMContext::MD_align:		case LLVMContext::MD_align:

lib/Transforms/InstCombine/InstCombinePHI.cpp

Show First 20 Lines • Show All 602 Lines • ▼ Show 20 Lines	unsigned KnownIDs[] = {
LLVMContext::MD_range,		LLVMContext::MD_range,
LLVMContext::MD_invariant_load,		LLVMContext::MD_invariant_load,
LLVMContext::MD_alias_scope,		LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias,		LLVMContext::MD_noalias,
LLVMContext::MD_nonnull,		LLVMContext::MD_nonnull,
LLVMContext::MD_align,		LLVMContext::MD_align,
LLVMContext::MD_dereferenceable,		LLVMContext::MD_dereferenceable,
LLVMContext::MD_dereferenceable_or_null,		LLVMContext::MD_dereferenceable_or_null,
		LLVMContext::MD_access_group,
};		};

for (unsigned ID : KnownIDs)		for (unsigned ID : KnownIDs)
NewLI->setMetadata(ID, FirstLI->getMetadata(ID));		NewLI->setMetadata(ID, FirstLI->getMetadata(ID));

// Add all operands to the new PHI and combine TBAA metadata.		// Add all operands to the new PHI and combine TBAA metadata.
for (unsigned i = 1, e = PN.getNumIncomingValues(); i != e; ++i) {		for (unsigned i = 1, e = PN.getNumIncomingValues(); i != e; ++i) {
LoadInst *LI = cast<LoadInst>(PN.getIncomingValue(i));		LoadInst *LI = cast<LoadInst>(PN.getIncomingValue(i));

lib/Transforms/Scalar/GVNHoist.cpp

Show First 20 Lines • Show All 240 Lines • ▼ Show 20 Lines	public:
const VNtoInsns &getStoreVNTable() const { return VNtoCallsStores; }		const VNtoInsns &getStoreVNTable() const { return VNtoCallsStores; }
};		};

static void combineKnownMetadata(Instruction *ReplInst, Instruction *I) {		static void combineKnownMetadata(Instruction *ReplInst, Instruction *I) {
static const unsigned KnownIDs[] = {		static const unsigned KnownIDs[] = {
LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,		LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias, LLVMContext::MD_range,		LLVMContext::MD_noalias, LLVMContext::MD_range,
LLVMContext::MD_fpmath, LLVMContext::MD_invariant_load,		LLVMContext::MD_fpmath, LLVMContext::MD_invariant_load,
LLVMContext::MD_invariant_group};		LLVMContext::MD_invariant_group, LLVMContext::MD_access_group};
combineMetadata(ReplInst, I, KnownIDs, true);		combineMetadata(ReplInst, I, KnownIDs, true);
}		}

// This pass hoists common computations across branches sharing common		// This pass hoists common computations across branches sharing common
// dominator. The primary goal is to reduce the code size, and in some		// dominator. The primary goal is to reduce the code size, and in some
// cases reduce critical path (by exposing more ILP).		// cases reduce critical path (by exposing more ILP).
class GVNHoist {		class GVNHoist {
public:		public:

lib/Transforms/Scalar/LoopVersioningLICM.cpp

Show First 20 Lines • Show All 622 Lines • ▼ Show 20 Lines	if (isLegalForVersioning()) {
DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();		DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
LoopVersioning LVer(*LAI, CurLoop, LI, DT, SE, true);		LoopVersioning LVer(*LAI, CurLoop, LI, DT, SE, true);
LVer.versionLoop();		LVer.versionLoop();
// Set Loop Versioning metaData for original loop.		// Set Loop Versioning metaData for original loop.
addStringMetadataToLoop(LVer.getNonVersionedLoop(), LICMVersioningMetaData);		addStringMetadataToLoop(LVer.getNonVersionedLoop(), LICMVersioningMetaData);
// Set Loop Versioning metaData for version loop.		// Set Loop Versioning metaData for version loop.
addStringMetadataToLoop(LVer.getVersionedLoop(), LICMVersioningMetaData);		addStringMetadataToLoop(LVer.getVersionedLoop(), LICMVersioningMetaData);
// Set "llvm.mem.parallel_loop_access" metaData to versioned loop.		// Set "llvm.mem.parallel_loop_access" metaData to versioned loop.
		// FIXME: "llvm.mem.parallel_loop_access" annotates memory access
		hfinkel: I don't understand what this FIXME is saying. What needs to be fixed?
		Meinersbur: The line below adds `llvm.mem.parallel_loop_access` to a LoopID, but that metadata is expected as an annotation on instructions that access memory (with the loop it is parallel to as its parameter). No code looks for `llvm.mem.parallel_loop_access` in LoopID metadata, so adding the property has no effect.
		hfinkel: OIC, okay. Thanks.
		// instructions, not loops.
addStringMetadataToLoop(LVer.getVersionedLoop(),		addStringMetadataToLoop(LVer.getVersionedLoop(),
"llvm.mem.parallel_loop_access");		"llvm.mem.parallel_loop_access");
// Update version loop with aggressive aliasing assumption.		// Update version loop with aggressive aliasing assumption.
setNoAliasToLoop(LVer.getVersionedLoop());		setNoAliasToLoop(LVer.getVersionedLoop());
Changed = true;		Changed = true;
}		}
return Changed;		return Changed;
}		}

lib/Transforms/Scalar/MemCpyOptimizer.cpp

Show First 20 Lines • Show All 990 Lines • ▼ Show 20 Lines	bool MemCpyOptPass::performCallSlotOptzn(Instruction cpy, Value cpyDest,
// its dependence information by changing its parameter.		// its dependence information by changing its parameter.
MD->removeInstruction(C);		MD->removeInstruction(C);

// Update AA metadata		// Update AA metadata
// FIXME: MD_tbaa_struct and MD_mem_parallel_loop_access should also be		// FIXME: MD_tbaa_struct and MD_mem_parallel_loop_access should also be
// handled here, but combineMetadata doesn't support them yet		// handled here, but combineMetadata doesn't support them yet
unsigned KnownIDs[] = {LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,		unsigned KnownIDs[] = {LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias,		LLVMContext::MD_noalias,
LLVMContext::MD_invariant_group};		LLVMContext::MD_invariant_group,
		LLVMContext::MD_access_group};
combineMetadata(C, cpy, KnownIDs, true);		combineMetadata(C, cpy, KnownIDs, true);

// Remove the memcpy.		// Remove the memcpy.
MD->removeInstruction(cpy);		MD->removeInstruction(cpy);
++NumMemCpyInstr;		++NumMemCpyInstr;

return true;		return true;
}		}

lib/Transforms/Scalar/SROA.cpp

Show First 20 Lines • Show All 2,587 Lines • ▼ Show 20 Lines	if (DL.getTypeSizeInBits(V->getType()) != IntTy->getBitWidth()) {
IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "oldload");		IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "oldload");
Old = convertValue(DL, IRB, Old, IntTy);		Old = convertValue(DL, IRB, Old, IntTy);
assert(BeginOffset >= NewAllocaBeginOffset && "Out of bounds offset");		assert(BeginOffset >= NewAllocaBeginOffset && "Out of bounds offset");
uint64_t Offset = BeginOffset - NewAllocaBeginOffset;		uint64_t Offset = BeginOffset - NewAllocaBeginOffset;
V = insertInteger(DL, IRB, Old, SI.getValueOperand(), Offset, "insert");		V = insertInteger(DL, IRB, Old, SI.getValueOperand(), Offset, "insert");
}		}
V = convertValue(DL, IRB, V, NewAllocaTy);		V = convertValue(DL, IRB, V, NewAllocaTy);
StoreInst *Store = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment());		StoreInst *Store = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment());
Store->copyMetadata(SI, LLVMContext::MD_mem_parallel_loop_access);		Store->copyMetadata(SI, {LLVMContext::MD_mem_parallel_loop_access,
		LLVMContext::MD_access_group});
if (AATags)		if (AATags)
Store->setAAMetadata(AATags);		Store->setAAMetadata(AATags);
Pass.DeadInsts.insert(&SI);		Pass.DeadInsts.insert(&SI);
LLVM_DEBUG(dbgs() << " to: " << *Store << "\n");		LLVM_DEBUG(dbgs() << " to: " << *Store << "\n");
return true;		return true;
}		}

bool visitStoreInst(StoreInst &SI) {		bool visitStoreInst(StoreInst &SI) {
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	if (NewBeginOffset == NewAllocaBeginOffset &&
NewSI = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment(),		NewSI = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment(),
SI.isVolatile());		SI.isVolatile());
} else {		} else {
unsigned AS = SI.getPointerAddressSpace();		unsigned AS = SI.getPointerAddressSpace();
Value *NewPtr = getNewAllocaSlicePtr(IRB, V->getType()->getPointerTo(AS));		Value *NewPtr = getNewAllocaSlicePtr(IRB, V->getType()->getPointerTo(AS));
NewSI = IRB.CreateAlignedStore(V, NewPtr, getSliceAlign(V->getType()),		NewSI = IRB.CreateAlignedStore(V, NewPtr, getSliceAlign(V->getType()),
SI.isVolatile());		SI.isVolatile());
}		}
NewSI->copyMetadata(SI, LLVMContext::MD_mem_parallel_loop_access);		NewSI->copyMetadata(SI, {LLVMContext::MD_mem_parallel_loop_access,
		LLVMContext::MD_access_group});
if (AATags)		if (AATags)
NewSI->setAAMetadata(AATags);		NewSI->setAAMetadata(AATags);
if (SI.isVolatile())		if (SI.isVolatile())
NewSI->setAtomic(SI.getOrdering(), SI.getSyncScopeID());		NewSI->setAtomic(SI.getOrdering(), SI.getSyncScopeID());
Pass.DeadInsts.insert(&SI);		Pass.DeadInsts.insert(&SI);
deleteIfTriviallyDead(OldOp);		deleteIfTriviallyDead(OldOp);

LLVM_DEBUG(dbgs() << " to: " << *NewSI << "\n");		LLVM_DEBUG(dbgs() << " to: " << *NewSI << "\n");
▲ Show 20 Lines • Show All 1,094 Lines • ▼ Show 20 Lines	for (;;) {
auto *PartPtrTy = PartTy->getPointerTo(AS);		auto *PartPtrTy = PartTy->getPointerTo(AS);
LoadInst *PLoad = IRB.CreateAlignedLoad(		LoadInst *PLoad = IRB.CreateAlignedLoad(
getAdjustedPtr(IRB, DL, BasePtr,		getAdjustedPtr(IRB, DL, BasePtr,
APInt(DL.getIndexSizeInBits(AS), PartOffset),		APInt(DL.getIndexSizeInBits(AS), PartOffset),
PartPtrTy, BasePtr->getName() + "."),		PartPtrTy, BasePtr->getName() + "."),
getAdjustedAlignment(LI, PartOffset, DL), /*IsVolatile*/ false,		getAdjustedAlignment(LI, PartOffset, DL), /*IsVolatile*/ false,
LI->getName());		LI->getName());
PLoad->copyMetadata(*LI, LLVMContext::MD_mem_parallel_loop_access);		PLoad->copyMetadata(*LI, LLVMContext::MD_mem_parallel_loop_access);
		PLoad->copyMetadata(*LI, LLVMContext::MD_access_group);

// Append this load onto the list of split loads so we can find it later		// Append this load onto the list of split loads so we can find it later
// to rewrite the stores.		// to rewrite the stores.
SplitLoads.push_back(PLoad);		SplitLoads.push_back(PLoad);

// Now build a new slice for the alloca.		// Now build a new slice for the alloca.
NewSlices.push_back(		NewSlices.push_back(
Slice(BaseOffset + PartOffset, BaseOffset + PartOffset + PartSize,		Slice(BaseOffset + PartOffset, BaseOffset + PartOffset + PartSize,
Show All 40 Lines	for (User *LU : LI->users()) {
auto AS = SI->getPointerAddressSpace();		auto AS = SI->getPointerAddressSpace();
StoreInst *PStore = IRB.CreateAlignedStore(		StoreInst *PStore = IRB.CreateAlignedStore(
PLoad,		PLoad,
getAdjustedPtr(IRB, DL, StoreBasePtr,		getAdjustedPtr(IRB, DL, StoreBasePtr,
APInt(DL.getIndexSizeInBits(AS), PartOffset),		APInt(DL.getIndexSizeInBits(AS), PartOffset),
PartPtrTy, StoreBasePtr->getName() + "."),		PartPtrTy, StoreBasePtr->getName() + "."),
getAdjustedAlignment(SI, PartOffset, DL), /*IsVolatile*/ false);		getAdjustedAlignment(SI, PartOffset, DL), /*IsVolatile*/ false);
PStore->copyMetadata(*LI, LLVMContext::MD_mem_parallel_loop_access);		PStore->copyMetadata(*LI, LLVMContext::MD_mem_parallel_loop_access);
		PStore->copyMetadata(*LI, LLVMContext::MD_access_group);
LLVM_DEBUG(dbgs() << " +" << PartOffset << ":" << *PStore << "\n");		LLVM_DEBUG(dbgs() << " +" << PartOffset << ":" << *PStore << "\n");
}		}

// We want to immediately iterate on any allocas impacted by splitting		// We want to immediately iterate on any allocas impacted by splitting
// this store, and we have to track any promotable alloca (indicated by		// this store, and we have to track any promotable alloca (indicated by
// a direct store) as needing to be resplit because it is no longer		// a direct store) as needing to be resplit because it is no longer
// promotable.		// promotable.
if (AllocaInst *OtherAI = dyn_cast<AllocaInst>(StoreBasePtr)) {		if (AllocaInst *OtherAI = dyn_cast<AllocaInst>(StoreBasePtr)) {
▲ Show 20 Lines • Show All 723 Lines • Show Last 20 Lines

lib/Transforms/Scalar/Scalarizer.cpp

	Show First 20 Lines • Show All 373 Lines • ▼ Show 20 Lines
	// vector to scalar instructions.			// vector to scalar instructions.
	bool ScalarizerVisitor::canTransferMetadata(unsigned Tag) {			bool ScalarizerVisitor::canTransferMetadata(unsigned Tag) {
	return (Tag == LLVMContext::MD_tbaa			return (Tag == LLVMContext::MD_tbaa
	\|\| Tag == LLVMContext::MD_fpmath			\|\| Tag == LLVMContext::MD_fpmath
	\|\| Tag == LLVMContext::MD_tbaa_struct			\|\| Tag == LLVMContext::MD_tbaa_struct
	\|\| Tag == LLVMContext::MD_invariant_load			\|\| Tag == LLVMContext::MD_invariant_load
	\|\| Tag == LLVMContext::MD_alias_scope			\|\| Tag == LLVMContext::MD_alias_scope
	\|\| Tag == LLVMContext::MD_noalias			\|\| Tag == LLVMContext::MD_noalias
	\|\| Tag == ParallelLoopAccessMDKind);			\|\| Tag == ParallelLoopAccessMDKind
				\|\| Tag == LLVMContext::MD_access_group);
	}			}

	// Transfer metadata from Op to the instructions in CV if it is known			// Transfer metadata from Op to the instructions in CV if it is known
	// to be safe to do so.			// to be safe to do so.
	void ScalarizerVisitor::transferMetadata(Instruction *Op, const ValueVector &CV) {			void ScalarizerVisitor::transferMetadata(Instruction *Op, const ValueVector &CV) {
	SmallVector<std::pair<unsigned, MDNode *>, 4> MDs;			SmallVector<std::pair<unsigned, MDNode *>, 4> MDs;
	Op->getAllMetadataOtherThanDebugLoc(MDs);			Op->getAllMetadataOtherThanDebugLoc(MDs);
	for (unsigned I = 0, E = CV.size(); I != E; ++I) {			for (unsigned I = 0, E = CV.size(); I != E; ++I) {
	▲ Show 20 Lines • Show All 433 Lines • Show Last 20 Lines

lib/Transforms/Utils/InlineFunction.cpp

Show All 25 Lines
#include "llvm/Analysis/BlockFrequencyInfo.h"		#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/CallGraph.h"		#include "llvm/Analysis/CallGraph.h"
#include "llvm/Analysis/CaptureTracking.h"		#include "llvm/Analysis/CaptureTracking.h"
#include "llvm/Analysis/EHPersonalities.h"		#include "llvm/Analysis/EHPersonalities.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"		#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/IR/Argument.h"		#include "llvm/IR/Argument.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
▲ Show 20 Lines • Show All 723 Lines • ▼ Show 20 Lines	static void HandleInlinedEHPad(InvokeInst *II, BasicBlock *FirstNewBlock,

// Now that everything is happy, we have one final detail. The PHI nodes in		// Now that everything is happy, we have one final detail. The PHI nodes in
// the exception destination block still have entries due to the original		// the exception destination block still have entries due to the original
// invoke instruction. Eliminate these entries (which might even delete the		// invoke instruction. Eliminate these entries (which might even delete the
// PHI node) now.		// PHI node) now.
UnwindDest->removePredecessor(InvokeBB);		UnwindDest->removePredecessor(InvokeBB);
}		}

/// When inlining a call site that has !llvm.mem.parallel_loop_access metadata,		/// When inlining a call site that has !llvm.mem.parallel_loop_access or
/// that metadata should be propagated to all memory-accessing cloned		/// llvm.access.group metadata, that metadata should be propagated to all
/// instructions.		/// memory-accessing cloned instructions.
static void PropagateParallelLoopAccessMetadata(CallSite CS,		static void PropagateParallelLoopAccessMetadata(CallSite CS,
ValueToValueMapTy &VMap) {		ValueToValueMapTy &VMap) {
MDNode *M =		MDNode *M =
CS.getInstruction()->getMetadata(LLVMContext::MD_mem_parallel_loop_access);		CS.getInstruction()->getMetadata(LLVMContext::MD_mem_parallel_loop_access);
if (!M)		MDNode *CallAccessGroup =
		CS.getInstruction()->getMetadata(LLVMContext::MD_access_group);
		if (!M && !CallAccessGroup)
return;		return;

for (ValueToValueMapTy::iterator VMI = VMap.begin(), VMIE = VMap.end();		for (ValueToValueMapTy::iterator VMI = VMap.begin(), VMIE = VMap.end();
VMI != VMIE; ++VMI) {		VMI != VMIE; ++VMI) {
if (!VMI->second)		if (!VMI->second)
continue;		continue;

Instruction *NI = dyn_cast<Instruction>(VMI->second);		Instruction *NI = dyn_cast<Instruction>(VMI->second);
if (!NI)		if (!NI)
continue;		continue;

if (MDNode *PM = NI->getMetadata(LLVMContext::MD_mem_parallel_loop_access)) {		if (M) {
		if (MDNode *PM =
		NI->getMetadata(LLVMContext::MD_mem_parallel_loop_access)) {
M = MDNode::concatenate(PM, M);		M = MDNode::concatenate(PM, M);
NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);		NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);
} else if (NI->mayReadOrWriteMemory()) {		} else if (NI->mayReadOrWriteMemory()) {
NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);		NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);
}		}
}		}

		if (NI->mayReadOrWriteMemory()) {
		MDNode *UnitedAccGroups = uniteAccessGroups(
		NI->getMetadata(LLVMContext::MD_access_group), CallAccessGroup);
		NI->setMetadata(LLVMContext::MD_access_group, UnitedAccGroups);
		}
		hfinkel: Is the problem with "updating all uses of one of the access groups" that we don't have a way to efficiently enumerate them? Would we need to scan the functions for branches and collect all of the loop-id metadata that's relevant first? It would be nice not to lose this information.
		Meinersbur (author): Either search (and update) all LoopIDs that reference the access group or create a new 'meta-access-group' as outlined in the FIXME. Of course it would be nice to not lose information, but as for any analysis there is a trade-off between accuracy and computational complexity. E.g. it would be nice if alias analysis were control-flow-sensitive. At the moment the only passes making use of `Loop::isAnnotatedParallel()` are LoopVectorize, LoopVersioningLICM, LoopDistribute and LoopLoadElimination, all of which only process innermost loops. That is, there would be no effect of keeping more information. There's also the FIXME still in the code so that the issue is not forgotten. Do you still want me to implement one of the solutions?
		hfinkel:
		> Of course it would be nice to not lose information, but as for any analysis there is a trade-off between accuracy and computational complexity.
		Clearly I know this ;)
		> At the moment the only passes making use of Loop::isAnnotatedParallel() are LoopVectorize, LoopVersioningLICM, LoopDistribute and LoopLoadElimination, all of which only process innermost loops. That is, there would be no effect of keeping more information.
		Two things: First, this might be true today, but very soon won't be true (vectorization will soon handle outer loops, and we are developing other loop-nest transformations), and if we do the right thing now, we won't need to go back and fix this later. Second, while it's true that full loop unrolling generally happens before inlining, we could have a loop that, at the point of inlining, is an inner loop, but is later fully unrolled such that before vectorization (etc.) the loop here might become, once again, the inner loop.
		> Do you still want me to implement one of the solutions?
		Yes.
		}
}		}

/// When inlining a function that contains noalias scope metadata,		/// When inlining a function that contains noalias scope metadata,
/// this metadata needs to be cloned so that the inlined blocks		/// this metadata needs to be cloned so that the inlined blocks
/// have different "unique scopes" at every call site. Were this not done, then		/// have different "unique scopes" at every call site. Were this not done, then
/// aliasing scopes from a function inlined into a caller multiple times could		/// aliasing scopes from a function inlined into a caller multiple times could
/// not be differentiated (and this would lead to miscompiles because the		/// not be differentiated (and this would lead to miscompiles because the
/// non-aliasing property communicated by the metadata could have		/// non-aliasing property communicated by the metadata could have
▲ Show 20 Lines • Show All 1,567 Lines • Show Last 20 Lines
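
The access-group handling in `PropagateParallelLoopAccessMetadata` above can be pictured with a small standalone model. This is only a sketch under stated assumptions, not the `uniteAccessGroups` implementation from the patch: `GroupList` and `uniteGroups` are hypothetical names, and each `int` stands in for one `distinct !{}` access group node.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for an access group list; every int plays the role
// of one `distinct !{}` metadata node.
using GroupList = std::vector<int>;

// Order-preserving union: the groups already on the inlined instruction come
// first, followed by any groups of the call site that are not yet present.
GroupList uniteGroups(const GroupList &InlinedInst, const GroupList &CallSite) {
  GroupList Result;
  auto AppendUnique = [&Result](const GroupList &Groups) {
    for (int G : Groups)
      if (std::find(Result.begin(), Result.end(), G) == Result.end())
        Result.push_back(G);
  };
  AppendUnique(InlinedInst);
  AppendUnique(CallSite);
  return Result;
}
```

In this model, an inlined store carrying group 6 under a call site carrying group 10 ends up with the list {6, 10}, so it can still be matched by the inner loop (which lists 6) and by the outer loop (which lists 10).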

lib/Transforms/Utils/Local.cpp

Show All 28 Lines
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/EHPersonalities.h"		#include "llvm/Analysis/EHPersonalities.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/LazyValueInfo.h"		#include "llvm/Analysis/LazyValueInfo.h"
#include "llvm/Analysis/MemoryBuiltins.h"		#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/MemorySSAUpdater.h"		#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/BinaryFormat/Dwarf.h"		#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/IR/Argument.h"		#include "llvm/IR/Argument.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/ConstantRange.h"		#include "llvm/IR/ConstantRange.h"
▲ Show 20 Lines • Show All 2,262 Lines • ▼ Show 20 Lines	switch (Kind) {
break;		break;
case LLVMContext::MD_alias_scope:		case LLVMContext::MD_alias_scope:
K->setMetadata(Kind, MDNode::getMostGenericAliasScope(JMD, KMD));		K->setMetadata(Kind, MDNode::getMostGenericAliasScope(JMD, KMD));
break;		break;
case LLVMContext::MD_noalias:		case LLVMContext::MD_noalias:
case LLVMContext::MD_mem_parallel_loop_access:		case LLVMContext::MD_mem_parallel_loop_access:
K->setMetadata(Kind, MDNode::intersect(JMD, KMD));		K->setMetadata(Kind, MDNode::intersect(JMD, KMD));
break;		break;
		case LLVMContext::MD_access_group:
		K->setMetadata(LLVMContext::MD_access_group,
		intersectAccessGroups(K, J));
		break;
case LLVMContext::MD_range:		case LLVMContext::MD_range:

// If K does move, use most generic range. Otherwise keep the range of		// If K does move, use most generic range. Otherwise keep the range of
// K.		// K.
if (DoesKMove)		if (DoesKMove)
// FIXME: If K does move, we should drop the range info and nonnull.		// FIXME: If K does move, we should drop the range info and nonnull.
// Currently this function is used with DoesKMove in passes		// Currently this function is used with DoesKMove in passes
// doing hoisting/sinking and the current behavior of using the		// doing hoisting/sinking and the current behavior of using the
Show All 40 Lines
void llvm::combineMetadataForCSE(Instruction *K, const Instruction *J,		void llvm::combineMetadataForCSE(Instruction *K, const Instruction *J,
bool KDominatesJ) {		bool KDominatesJ) {
unsigned KnownIDs[] = {		unsigned KnownIDs[] = {
LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,		LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias, LLVMContext::MD_range,		LLVMContext::MD_noalias, LLVMContext::MD_range,
LLVMContext::MD_invariant_load, LLVMContext::MD_nonnull,		LLVMContext::MD_invariant_load, LLVMContext::MD_nonnull,
LLVMContext::MD_invariant_group, LLVMContext::MD_align,		LLVMContext::MD_invariant_group, LLVMContext::MD_align,
LLVMContext::MD_dereferenceable,		LLVMContext::MD_dereferenceable,
LLVMContext::MD_dereferenceable_or_null};		LLVMContext::MD_dereferenceable_or_null,
		LLVMContext::MD_access_group};
combineMetadata(K, J, KnownIDs, KDominatesJ);		combineMetadata(K, J, KnownIDs, KDominatesJ);
}		}

void llvm::patchReplacementInstruction(Instruction *I, Value *Repl) {		void llvm::patchReplacementInstruction(Instruction *I, Value *Repl) {
auto *ReplInst = dyn_cast<Instruction>(Repl);		auto *ReplInst = dyn_cast<Instruction>(Repl);
if (!ReplInst)		if (!ReplInst)
return;		return;

Show All 14 Lines	void llvm::patchReplacementInstruction(Instruction *I, Value *Repl) {

// In general, GVN unifies expressions over different control-flow		// In general, GVN unifies expressions over different control-flow
// regions, and so we need a conservative combination of the noalias		// regions, and so we need a conservative combination of the noalias
// scopes.		// scopes.
static const unsigned KnownIDs[] = {		static const unsigned KnownIDs[] = {
LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,		LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias, LLVMContext::MD_range,		LLVMContext::MD_noalias, LLVMContext::MD_range,
LLVMContext::MD_fpmath, LLVMContext::MD_invariant_load,		LLVMContext::MD_fpmath, LLVMContext::MD_invariant_load,
LLVMContext::MD_invariant_group, LLVMContext::MD_nonnull};		LLVMContext::MD_invariant_group, LLVMContext::MD_nonnull,
		LLVMContext::MD_access_group};
combineMetadata(ReplInst, I, KnownIDs, false);		combineMetadata(ReplInst, I, KnownIDs, false);
}		}

template <typename RootType, typename DominatesFn>		template <typename RootType, typename DominatesFn>
static unsigned replaceDominatedUsesWith(Value *From, Value *To,		static unsigned replaceDominatedUsesWith(Value *From, Value *To,
const RootType &Root,		const RootType &Root,
const DominatesFn &Dominates) {		const DominatesFn &Dominates) {
assert(From->getType() == To->getType());		assert(From->getType() == To->getType());
▲ Show 20 Lines • Show All 484 Lines • Show Last 20 Lines
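
When `combineMetadata` merges two instructions into one, the `MD_access_group` case above keeps only what both instructions agree on. The following is a minimal model of that idea, not the patch's `intersectAccessGroups`: `GroupList` and `intersectGroups` are hypothetical names, and each `int` stands in for one access group node.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for an access group list; every int plays the role
// of one `distinct !{}` metadata node.
using GroupList = std::vector<int>;

// Keep the groups of K that also appear on J, preserving K's order. A group
// that only one of the two instructions belonged to is dropped, because the
// merged instruction can no longer claim it for both original accesses.
GroupList intersectGroups(const GroupList &K, const GroupList &J) {
  GroupList Result;
  for (int G : K)
    if (std::find(J.begin(), J.end(), G) != J.end())
      Result.push_back(G);
  return Result;
}
```

This is conservative by design: if K carries {7, 8} and J carries {7, 9}, only 7 survives, even when some loop lists both 8 and 9 as parallel.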

lib/Transforms/Utils/LoopUtils.cpp

Show First 20 Lines • Show All 176 Lines • ▼ Show 20 Lines	void llvm::initializeLoopPassPass(PassRegistry &Registry) {
INITIALIZE_PASS_DEPENDENCY(LoopSimplify)		INITIALIZE_PASS_DEPENDENCY(LoopSimplify)
INITIALIZE_PASS_DEPENDENCY(LCSSAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(LCSSAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(BasicAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(BasicAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(SCEVAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(SCEVAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass)		INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass)
}		}

/// Find string metadata for loop		/// Find string metadata for loop
///		///
/// If it has a value (e.g. {"llvm.distribute", 1} return the value as an		/// If it has a value (e.g. {"llvm.distribute", 1} return the value as an
/// operand or null otherwise. If the string metadata is not found return		/// operand or null otherwise. If the string metadata is not found return
/// Optional's not-a-value.		/// Optional's not-a-value.
Optional<const MDOperand *> llvm::findStringMetadataForLoop(Loop *TheLoop,		Optional<const MDOperand *> llvm::findStringMetadataForLoop(const Loop *TheLoop,
StringRef Name) {		StringRef Name) {
MDNode *LoopID = TheLoop->getLoopID();		MDNode *MD = findOptionMDForLoop(TheLoop, Name);
// Return none if LoopID is false.
if (!LoopID)
return None;

// First operand should refer to the loop id itself.
assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
assert(LoopID->getOperand(0) == LoopID && "invalid loop id");

// Iterate over LoopID operands and look for MDString Metadata
for (unsigned i = 1, e = LoopID->getNumOperands(); i < e; ++i) {
MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i));
if (!MD)		if (!MD)
continue;		return None;
MDString *S = dyn_cast<MDString>(MD->getOperand(0));
if (!S)
continue;
// Return true if MDString holds expected MetaData.
if (Name.equals(S->getString()))
switch (MD->getNumOperands()) {		switch (MD->getNumOperands()) {
case 1:		case 1:
return nullptr;		return nullptr;
case 2:		case 2:
return &MD->getOperand(1);		return &MD->getOperand(1);
default:		default:
llvm_unreachable("loop metadata has 0 or 1 operand");		llvm_unreachable("loop metadata has 0 or 1 operand");
}		}
}		}
return None;
}

/// Does a BFS from a given node to all of its children inside a given loop.		/// Does a BFS from a given node to all of its children inside a given loop.
Meinersbur (author): This has moved to LoopInfo.cpp. LoopUtils.cpp belongs to libTransform, but not all users (such as `Loop::isAnnotatedParallel()`) depend on libTransform.
/// The returned vector of nodes includes the starting point.		/// The returned vector of nodes includes the starting point.
SmallVector<DomTreeNode *, 16>		SmallVector<DomTreeNode *, 16>
llvm::collectChildrenInLoop(DomTreeNode *N, const Loop *CurLoop) {		llvm::collectChildrenInLoop(DomTreeNode *N, const Loop *CurLoop) {
SmallVector<DomTreeNode *, 16> Worklist;		SmallVector<DomTreeNode *, 16> Worklist;
auto AddRegionToWorklist = [&](DomTreeNode *DTN) {		auto AddRegionToWorklist = [&](DomTreeNode *DTN) {
// Only include subregions in the top level loop.		// Only include subregions in the top level loop.
BasicBlock *BB = DTN->getBlock();		BasicBlock *BB = DTN->getBlock();
if (CurLoop->contains(BB))		if (CurLoop->contains(BB))
▲ Show 20 Lines • Show All 477 Lines • Show Last 20 Lines
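
The doc comment on `findStringMetadataForLoop` above describes three distinct outcomes: option not found, option found without a value, and option found with a value. A rough standalone sketch of that three-state return, using `std::optional` in place of LLVM's `Optional`, might look as follows. `LoopOption` and `findOption` are hypothetical names, and the plain `int` value is an assumption standing in for an `MDOperand`.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of one loop option: a name and, optionally, a value.
struct LoopOption {
  std::string Name;
  std::optional<int> Value;
};

// Returns std::nullopt if no option with this name exists, nullptr if the
// option exists but has no value, and a pointer to the value otherwise.
// The returned pointer refers into Options and is only valid while it lives.
std::optional<const int *> findOption(const std::vector<LoopOption> &Options,
                                      const std::string &Name) {
  for (const LoopOption &Opt : Options) {
    if (Opt.Name != Name)
      continue;
    if (Opt.Value)
      return &*Opt.Value;
    return nullptr;
  }
  return std::nullopt;
}
```

Callers therefore have to check `has_value()` first and only then test the contained pointer against null.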

lib/Transforms/Utils/SimplifyCFG.cpp

Show First 20 Lines • Show All 1,315 Lines • ▼ Show 20 Lines	if (isa<DbgInfoIntrinsic>(I1) \|\| isa<DbgInfoIntrinsic>(I2)) {
LLVMContext::MD_range,		LLVMContext::MD_range,
LLVMContext::MD_fpmath,		LLVMContext::MD_fpmath,
LLVMContext::MD_invariant_load,		LLVMContext::MD_invariant_load,
LLVMContext::MD_nonnull,		LLVMContext::MD_nonnull,
LLVMContext::MD_invariant_group,		LLVMContext::MD_invariant_group,
LLVMContext::MD_align,		LLVMContext::MD_align,
LLVMContext::MD_dereferenceable,		LLVMContext::MD_dereferenceable,
LLVMContext::MD_dereferenceable_or_null,		LLVMContext::MD_dereferenceable_or_null,
LLVMContext::MD_mem_parallel_loop_access};		LLVMContext::MD_mem_parallel_loop_access,
		LLVMContext::MD_access_group};
combineMetadata(I1, I2, KnownIDs, true);		combineMetadata(I1, I2, KnownIDs, true);

// I1 and I2 are being combined into a single instruction. Its debug		// I1 and I2 are being combined into a single instruction. Its debug
// location is the merged locations of the original instructions.		// location is the merged locations of the original instructions.
I1->applyMergedLocation(I1->getDebugLoc(), I2->getDebugLoc());		I1->applyMergedLocation(I1->getDebugLoc(), I2->getDebugLoc());

I2->eraseFromParent();		I2->eraseFromParent();
Changed = true;		Changed = true;
▲ Show 20 Lines • Show All 4,769 Lines • Show Last 20 Lines

test/Analysis/LoopInfo/annotated-parallel-complex.ll

This file was added.

				; RUN: opt -loops -analyze < %s \| FileCheck %s
				;
				; void func(long n, double A[static const restrict 4*n], double B[static const restrict 4*n]) {
				; for (long i = 0; i < n; i += 1)
				; for (long j = 0; j < n; j += 1)
				; for (long k = 0; k < n; k += 1)
				; for (long l = 0; l < n; l += 1) {
				; A[i + j + k + l] = 21;
				; B[i + j + k + l] = 42;
				; }
				; }
				;
				; Check that isAnnotatedParallel is working as expected.
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @func(i64 %n, double* noalias nonnull %A, double* noalias nonnull %B) {
				entry:
				br label %for.cond

				for.cond:
				%i.0 = phi i64 [ 0, %entry ], [ %add28, %for.inc27 ]
				%cmp = icmp slt i64 %i.0, %n
				br i1 %cmp, label %for.cond2, label %for.end29

				for.cond2:
				%j.0 = phi i64 [ %add25, %for.inc24 ], [ 0, %for.cond ]
				%cmp3 = icmp slt i64 %j.0, %n
				br i1 %cmp3, label %for.cond6, label %for.inc27

				for.cond6:
				%k.0 = phi i64 [ %add22, %for.inc21 ], [ 0, %for.cond2 ]
				%cmp7 = icmp slt i64 %k.0, %n
				br i1 %cmp7, label %for.cond10, label %for.inc24

				for.cond10:
				%l.0 = phi i64 [ %add20, %for.body13 ], [ 0, %for.cond6 ]
				%cmp11 = icmp slt i64 %l.0, %n
				br i1 %cmp11, label %for.body13, label %for.inc21

				for.body13:
				%add = add nuw nsw i64 %i.0, %j.0
				%add14 = add nuw nsw i64 %add, %k.0
				%add15 = add nuw nsw i64 %add14, %l.0
				%arrayidx = getelementptr inbounds double, double* %A, i64 %add15
				store double 2.100000e+01, double* %arrayidx, align 8, !llvm.access.group !5
				%add16 = add nuw nsw i64 %i.0, %j.0
				%add17 = add nuw nsw i64 %add16, %k.0
				%add18 = add nuw nsw i64 %add17, %l.0
				%arrayidx19 = getelementptr inbounds double, double* %B, i64 %add18
				store double 4.200000e+01, double* %arrayidx19, align 8, !llvm.access.group !6
				%add20 = add nuw nsw i64 %l.0, 1
				br label %for.cond10, !llvm.loop !11

				for.inc21:
				%add22 = add nuw nsw i64 %k.0, 1
				br label %for.cond6, !llvm.loop !14

				for.inc24:
				%add25 = add nuw nsw i64 %j.0, 1
				br label %for.cond2, !llvm.loop !16

				for.inc27:
				%add28 = add nuw nsw i64 %i.0, 1
				br label %for.cond, !llvm.loop !18

				for.end29:
				ret void
				}

				; access groups
				!7 = distinct !{}
				!8 = distinct !{}
				!10 = distinct !{}

				; access group lists
				!5 = !{!7, !10}
				!6 = !{!7, !8, !10}

				; LoopIDs
				!11 = distinct !{!11, !{!"llvm.loop.parallel_accesses", !10}}
				!14 = distinct !{!14, !{!"llvm.loop.parallel_accesses", !8, !10}}
				!16 = distinct !{!16, !{!"llvm.loop.parallel_accesses", !8}}
				!18 = distinct !{!18, !{!"llvm.loop.parallel_accesses", !7}}


				; CHECK: Parallel Loop at depth 1
				; CHECK-NOT: Parallel
				; CHECK: Loop at depth 2
				; CHECK: Parallel Loop
				; CHECK: Parallel Loop
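
The CHECK lines above follow from a simple rule: a loop counts as parallel when every memory access in it belongs to at least one access group that the loop's `llvm.loop.parallel_accesses` lists. Below is a simplified model of that rule, not the actual `Loop::isAnnotatedParallel()` code. It ignores the legacy `llvm.mem.parallel_loop_access` metadata and loops without a LoopID; `GroupList`, `accessIsParallel` and `loopIsAnnotatedParallel` are hypothetical names, and the integers mirror the metadata numbers of this test.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a list of access groups (metadata node numbers).
using GroupList = std::vector<int>;

// True if the access is in at least one group the loop declares parallel.
bool accessIsParallel(const GroupList &AccessGroups,
                      const GroupList &LoopParallelGroups) {
  return std::any_of(AccessGroups.begin(), AccessGroups.end(), [&](int G) {
    return std::find(LoopParallelGroups.begin(), LoopParallelGroups.end(),
                     G) != LoopParallelGroups.end();
  });
}

// True if every memory access inside the loop is parallel with respect to it.
bool loopIsAnnotatedParallel(const std::vector<GroupList> &Accesses,
                             const GroupList &LoopParallelGroups) {
  return std::all_of(Accesses.begin(), Accesses.end(),
                     [&](const GroupList &A) {
                       return accessIsParallel(A, LoopParallelGroups);
                     });
}
```

With the two stores carrying {7, 10} and {7, 8, 10}, the model agrees with the expected output: the i-loop (lists 7) is parallel, the j-loop (lists 8) is not because the store to A is not in group 8, and the k-loop (lists 8 and 10) and l-loop (lists 10) are parallel.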

test/Analysis/LoopInfo/annotated-parallel-simple.ll

This file was added.

				; RUN: opt -loops -analyze < %s \| FileCheck %s
				;
				; void func(long n, double A[static const restrict n]) {
				; for (long i = 0; i < n; i += 1)
				; A[i] = 21;
				; }
				;
				; Check that isAnnotatedParallel is working as expected.
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @func(i64 %n, double* noalias nonnull %A) {
				entry:
				br label %for.cond

				for.cond:
				%i.0 = phi i64 [ 0, %entry ], [ %add, %for.body ]
				%cmp = icmp slt i64 %i.0, %n
				br i1 %cmp, label %for.body, label %for.end

				for.body:
				%arrayidx = getelementptr inbounds double, double* %A, i64 %i.0
				store double 2.100000e+01, double* %arrayidx, align 8, !llvm.access.group !6
				%add = add nuw nsw i64 %i.0, 1
				br label %for.cond, !llvm.loop !7

				for.end:
				ret void
				}

				!6 = distinct !{} ; access group

				!7 = distinct !{!7, !9} ; LoopID
				!9 = !{!"llvm.loop.parallel_accesses", !6}


				; CHECK: Parallel Loop

test/ThinLTO/X86/lazyload_metadata.ll

	; Do setup work for all below tests: generate bitcode and combined index			; Do setup work for all below tests: generate bitcode and combined index
	; RUN: opt -module-summary %s -o %t.bc -bitcode-mdindex-threshold=0			; RUN: opt -module-summary %s -o %t.bc -bitcode-mdindex-threshold=0
	; RUN: opt -module-summary %p/Inputs/lazyload_metadata.ll -o %t2.bc -bitcode-mdindex-threshold=0			; RUN: opt -module-summary %p/Inputs/lazyload_metadata.ll -o %t2.bc -bitcode-mdindex-threshold=0
	; RUN: llvm-lto -thinlto-action=thinlink -o %t3.bc %t.bc %t2.bc			; RUN: llvm-lto -thinlto-action=thinlink -o %t3.bc %t.bc %t2.bc
	; REQUIRES: asserts			; REQUIRES: asserts

	; Check that importing @globalfunc1 does not trigger loading all the global			; Check that importing @globalfunc1 does not trigger loading all the global
	; metadata for @globalfunc2 and @globalfunc3			; metadata for @globalfunc2 and @globalfunc3

	; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \			; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \
	; RUN: -o /dev/null -stats \			; RUN: -o /dev/null -stats \
	; RUN: 2>&1 \| FileCheck %s -check-prefix=LAZY			; RUN: 2>&1 \| FileCheck %s -check-prefix=LAZY
	; LAZY: 55 bitcode-reader - Number of Metadata records loaded			; LAZY: 57 bitcode-reader - Number of Metadata records loaded
	; LAZY: 2 bitcode-reader - Number of MDStrings loaded			; LAZY: 2 bitcode-reader - Number of MDStrings loaded

	; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \			; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \
	; RUN: -o /dev/null -disable-ondemand-mds-loading -stats \			; RUN: -o /dev/null -disable-ondemand-mds-loading -stats \
	; RUN: 2>&1 \| FileCheck %s -check-prefix=NOTLAZY			; RUN: 2>&1 \| FileCheck %s -check-prefix=NOTLAZY
	; NOTLAZY: 64 bitcode-reader - Number of Metadata records loaded			; NOTLAZY: 66 bitcode-reader - Number of Metadata records loaded
	; NOTLAZY: 7 bitcode-reader - Number of MDStrings loaded			; NOTLAZY: 7 bitcode-reader - Number of MDStrings loaded


	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.11.0"			target triple = "x86_64-apple-macosx10.11.0"

	define void @globalfunc1(i32 %arg) {			define void @globalfunc1(i32 %arg) {
	%x = call i1 @llvm.type.test(i8* undef, metadata !"typeid1")			%x = call i1 @llvm.type.test(i8* undef, metadata !"typeid1")
	Show All 31 Lines

test/Transforms/Inline/parallel-loop-md-callee.ll

This file was added.

				; RUN: opt -S -inline < %s \| FileCheck %s
				;
				; Check that the !llvm.access.group is still present after inlining.
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @Body(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p, i32 %i) {
				entry:
				%idxprom = sext i32 %i to i64
				%arrayidx = getelementptr inbounds i32, i32* %p, i64 %idxprom
				%0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !0
				%cmp = icmp eq i32 %0, 0
				%arrayidx2 = getelementptr inbounds i32, i32* %res, i64 %idxprom
				%1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !0
				br i1 %cmp, label %cond.end, label %cond.false

				cond.false:
				%arrayidx6 = getelementptr inbounds i32, i32* %d, i64 %idxprom
				%2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !0
				%add = add nsw i32 %2, %1
				br label %cond.end

				cond.end:
				%cond = phi i32 [ %add, %cond.false ], [ %1, %entry ]
				store i32 %cond, i32* %arrayidx2, align 4
				ret void
				}

				define void @Test(i32* %res, i32* %c, i32* %d, i32* %p, i32 %n) {
				entry:
				br label %for.cond

				for.cond:
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%cmp = icmp slt i32 %i.0, 1600
				br i1 %cmp, label %for.body, label %for.end

				for.body:
				call void @Body(i32* %res, i32* undef, i32* %d, i32* %p, i32 %i.0), !llvm.access.group !0
				%inc = add nsw i32 %i.0, 1
				br label %for.cond, !llvm.loop !1

				for.end:
				ret void
				}

				!0 = distinct !{} ; access group
				!1 = distinct !{!1, !{!"llvm.loop.parallel_accesses", !0}} ; LoopID


				; CHECK-LABEL: @Test
				; CHECK: load i32,{{.*}}, !llvm.access.group !0
				; CHECK: load i32,{{.*}}, !llvm.access.group !0
				; CHECK: load i32,{{.*}}, !llvm.access.group !0
				; CHECK: store i32 {{.*}}, !llvm.access.group !0
				; CHECK: br label %for.cond, !llvm.loop !1

test/Transforms/Inline/parallel-loop-md-merge.ll

This file was added.

				; RUN: opt -always-inline -globalopt -S < %s \| FileCheck %s
				;
				; static void __attribute__((always_inline)) callee(long n, double A[static const restrict n], long i) {
				; for (long j = 0; j < n; j += 1)
				; A[i * n + j] = 42;
				; }
				;
				; void caller(long n, double A[static const restrict n]) {
				; for (long i = 0; i < n; i += 1)
				; callee(n, A, i);
				; }
				;
				; Check that the access groups (llvm.access.group) are correctly merged.
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define internal void @callee(i64 %n, double* noalias nonnull %A, i64 %i) #0 {
				entry:
				br label %for.cond

				for.cond:
				%j.0 = phi i64 [ 0, %entry ], [ %add1, %for.body ]
				%cmp = icmp slt i64 %j.0, %n
				br i1 %cmp, label %for.body, label %for.end

				for.body:
				%mul = mul nsw i64 %i, %n
				%add = add nsw i64 %mul, %j.0
				%arrayidx = getelementptr inbounds double, double* %A, i64 %add
				store double 4.200000e+01, double* %arrayidx, align 8, !llvm.access.group !6
				%add1 = add nuw nsw i64 %j.0, 1
				br label %for.cond, !llvm.loop !7

				for.end:
				ret void
				}

				attributes #0 = { alwaysinline }

				!6 = distinct !{} ; access group
				!7 = distinct !{!7, !9} ; LoopID
				!9 = !{!"llvm.loop.parallel_accesses", !6}


				define void @caller(i64 %n, double* noalias nonnull %A) {
				entry:
				br label %for.cond

				for.cond:
				%i.0 = phi i64 [ 0, %entry ], [ %add, %for.body ]
				%cmp = icmp slt i64 %i.0, %n
				br i1 %cmp, label %for.body, label %for.end

				for.body:
				call void @callee(i64 %n, double* %A, i64 %i.0), !llvm.access.group !10
				%add = add nuw nsw i64 %i.0, 1
				br label %for.cond, !llvm.loop !11

				for.end:
				ret void
				}

				!10 = distinct !{} ; access group
				!11 = distinct !{!11, !12} ; LoopID
				!12 = !{!"llvm.loop.parallel_accesses", !10}


				; CHECK: store double 4.200000e+01, {{.*}} !llvm.access.group ![[ACCESS_GROUP_LIST_3:[0-9]+]]
				; CHECK: br label %for.cond.i, !llvm.loop ![[LOOP_INNER:[0-9]+]]
				; CHECK: br label %for.cond, !llvm.loop ![[LOOP_OUTER:[0-9]+]]

				; CHECK: ![[ACCESS_GROUP_LIST_3]] = !{![[ACCESS_GROUP_INNER:[0-9]+]], ![[ACCESS_GROUP_OUTER:[0-9]+]]}
				; CHECK: ![[ACCESS_GROUP_INNER]] = distinct !{}
				; CHECK: ![[ACCESS_GROUP_OUTER]] = distinct !{}
				; CHECK: ![[LOOP_INNER]] = distinct !{![[LOOP_INNER]], ![[ACCESSES_INNER:[0-9]+]]}
				; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}
				; CHECK: ![[LOOP_OUTER]] = distinct !{![[LOOP_OUTER]], ![[ACCESSES_OUTER:[0-9]+]]}
				; CHECK: ![[ACCESSES_OUTER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_OUTER]]}

test/Transforms/Inline/parallel-loop-md.ll

Show All 31 Lines	entry:
br label %for.cond		br label %for.cond

for.cond: ; preds = %for.body, %entry		for.cond: ; preds = %for.body, %entry
%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]		%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
%cmp = icmp slt i32 %i.0, 1600		%cmp = icmp slt i32 %i.0, 1600
br i1 %cmp, label %for.body, label %for.end		br i1 %cmp, label %for.body, label %for.end

for.body: ; preds = %for.cond		for.body: ; preds = %for.cond
call void @Body(i32* %res, i32* undef, i32* %d, i32* %p, i32 %i.0), !llvm.mem.parallel_loop_access !0		call void @Body(i32* %res, i32* undef, i32* %d, i32* %p, i32 %i.0), !llvm.access.group !0
%inc = add nsw i32 %i.0, 1		%inc = add nsw i32 %i.0, 1
br label %for.cond, !llvm.loop !0		br label %for.cond, !llvm.loop !1

for.end: ; preds = %for.cond		for.end: ; preds = %for.cond
ret void		ret void
}		}

; CHECK-LABEL: @Test		; CHECK-LABEL: @Test
; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0		; CHECK: load i32,{{.*}}, !llvm.access.group !0
; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0		; CHECK: load i32,{{.*}}, !llvm.access.group !0
; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0		; CHECK: load i32,{{.*}}, !llvm.access.group !0
; CHECK: store i32{{.*}}, !llvm.mem.parallel_loop_access !0		; CHECK: store i32{{.*}}, !llvm.access.group !0
; CHECK: br label %for.cond, !llvm.loop !0		; CHECK: br label %for.cond, !llvm.loop !1

attributes #0 = { norecurse nounwind uwtable }		attributes #0 = { norecurse nounwind uwtable }

!0 = distinct !{!0}		!0 = distinct !{}
		!1 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !0}}

test/Transforms/InstCombine/intersect-accessgroup.ll

This file was added.

				; RUN: opt -instcombine -S < %s | FileCheck %s
				;
				; void func(long n, double A[static const restrict n]) {
				; for (int i = 0; i < n; i+=1)
				; for (int j = 0; j < n;j+=1)
				; for (int k = 0; k < n; k += 1)
				; for (int l = 0; l < n; l += 1) {
				; double *p = &A[i + j + k + l];
				; double x = *p;
				; double y = *p;
				; arg(x + y);
				; }
				; }
				;
				; Check that instcombine merges access group metadata correctly
				; (only loops common to both loads stay parallel == intersection).
				; Note that the combined load would be parallel to loop !16 since both
				; original loads are parallel to it, but !16 references two access groups
				; (!8 and !9), neither of which contains both loads. As such, the
				; information that the combined load is parallel to !16 is lost.
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				declare void @arg(double)

				define void @func(i64 %n, double* noalias nonnull %A) {
				entry:
				br label %for.cond

				for.cond:
				%i.0 = phi i32 [ 0, %entry ], [ %add31, %for.inc30 ]
				%conv = sext i32 %i.0 to i64
				%cmp = icmp slt i64 %conv, %n
				br i1 %cmp, label %for.cond2, label %for.end32

				for.cond2:
				%j.0 = phi i32 [ %add28, %for.inc27 ], [ 0, %for.cond ]
				%conv3 = sext i32 %j.0 to i64
				%cmp4 = icmp slt i64 %conv3, %n
				br i1 %cmp4, label %for.cond8, label %for.inc30

				for.cond8:
				%k.0 = phi i32 [ %add25, %for.inc24 ], [ 0, %for.cond2 ]
				%conv9 = sext i32 %k.0 to i64
				%cmp10 = icmp slt i64 %conv9, %n
				br i1 %cmp10, label %for.cond14, label %for.inc27

				for.cond14:
				%l.0 = phi i32 [ %add23, %for.body19 ], [ 0, %for.cond8 ]
				%conv15 = sext i32 %l.0 to i64
				%cmp16 = icmp slt i64 %conv15, %n
				br i1 %cmp16, label %for.body19, label %for.inc24

				for.body19:
				%add = add nsw i32 %i.0, %j.0
				%add20 = add nsw i32 %add, %k.0
				%add21 = add nsw i32 %add20, %l.0
				%idxprom = sext i32 %add21 to i64
				%arrayidx = getelementptr inbounds double, double* %A, i64 %idxprom
				%0 = load double, double* %arrayidx, align 8, !llvm.access.group !1
				%1 = load double, double* %arrayidx, align 8, !llvm.access.group !2
				%add22 = fadd double %0, %1
				call void @arg(double %add22), !llvm.access.group !3
				%add23 = add nsw i32 %l.0, 1
				br label %for.cond14, !llvm.loop !11

				for.inc24:
				%add25 = add nsw i32 %k.0, 1
				br label %for.cond8, !llvm.loop !14

				for.inc27:
				%add28 = add nsw i32 %j.0, 1
				br label %for.cond2, !llvm.loop !16

				for.inc30:
				%add31 = add nsw i32 %i.0, 1
				br label %for.cond, !llvm.loop !18

				for.end32:
				ret void
				}


				; access groups
				!7 = distinct !{}
				!8 = distinct !{}
				!9 = distinct !{}

				; access group lists
				!1 = !{!7, !9}
				!2 = !{!7, !8}
				!3 = !{!7, !8, !9}

				!11 = distinct !{!11, !13}
				!13 = !{!"llvm.loop.parallel_accesses", !7}

				!14 = distinct !{!14, !15}
				!15 = !{!"llvm.loop.parallel_accesses", !8}

				!16 = distinct !{!16, !17}
				!17 = !{!"llvm.loop.parallel_accesses", !8, !9}

				!18 = distinct !{!18, !19}
				!19 = !{!"llvm.loop.parallel_accesses", !9}


				; CHECK: load double, {{.*}} !llvm.access.group ![[ACCESSGROUP_0:[0-9]+]]
				; CHECK: br label %for.cond14, !llvm.loop ![[LOOP_4:[0-9]+]]

				; CHECK: ![[ACCESSGROUP_0]] = distinct !{}

				; CHECK: ![[LOOP_4]] = distinct !{![[LOOP_4]], ![[PARALLEL_ACCESSES_5:[0-9]+]]}
				; CHECK: ![[PARALLEL_ACCESSES_5]] = !{!"llvm.loop.parallel_accesses", ![[ACCESSGROUP_0]]}
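
The intersection rule described in the test's header comment can be replayed on the metadata numbers used in this file. This is a small Python model, not the patch's C++ code. `intersect_access_groups` and `is_parallel_to` are hypothetical helper names, and the "parallel if at least one group is listed" rule is taken as the working assumption for how an access with several groups is matched against a loop.

```python
def intersect_access_groups(groups_a, groups_b):
    """Keep only the access groups that both original accesses belong to."""
    return [g for g in groups_a if g in groups_b]


def is_parallel_to(access_groups, loop_parallel_accesses):
    """An access is parallel to a loop if any of its groups is listed by the loop."""
    return any(g in loop_parallel_accesses for g in access_groups)


# Access group lists of the two loads (!1 and !2 in the test).
load0 = ["!7", "!9"]
load1 = ["!7", "!8"]

# llvm.loop.parallel_accesses operands, keyed by LoopID.
loops = {
    "!11": ["!7"],
    "!14": ["!8"],
    "!16": ["!8", "!9"],
    "!18": ["!9"],
}

# instcombine folds the two loads into one; the result keeps the common groups.
merged = intersect_access_groups(load0, load1)

for loop_id, listed in loops.items():
    print(loop_id, is_parallel_to(merged, listed))
```

Both original loads are parallel to `!16` (one through `!9`, the other through `!8`), yet the merged load only keeps `!7`, so in this model it remains parallel to `!11` alone. That is the information loss the header comment points out.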

test/Transforms/InstCombine/loadstore-metadata.ll

entry:
%l = load i32, i32* %ptr, !range !5		%l = load i32, i32* %ptr, !range !5
%c = bitcast i32 %l to float		%c = bitcast i32 %l to float
ret float %c		ret float %c
}		}

define i32 @test_load_cast_combine_invariant(float* %ptr) {		define i32 @test_load_cast_combine_invariant(float* %ptr) {
; Ensure (cast (load (...))) -> (load (cast (...))) preserves invariant metadata.		; Ensure (cast (load (...))) -> (load (cast (...))) preserves invariant metadata.
; CHECK-LABEL: @test_load_cast_combine_invariant(		; CHECK-LABEL: @test_load_cast_combine_invariant(
; CHECK: load i32, i32* %{{.*}}, !invariant.load !5		; CHECK: load i32, i32* %{{.*}}, !invariant.load !7
entry:		entry:
%l = load float, float* %ptr, !invariant.load !6		%l = load float, float* %ptr, !invariant.load !6
%c = bitcast float %l to i32		%c = bitcast float %l to i32
ret i32 %c		ret i32 %c
}		}

define i32 @test_load_cast_combine_nontemporal(float* %ptr) {		define i32 @test_load_cast_combine_nontemporal(float* %ptr) {
; Ensure (cast (load (...))) -> (load (cast (...))) preserves nontemporal		; Ensure (cast (load (...))) -> (load (cast (...))) preserves nontemporal
; metadata.		; metadata.
; CHECK-LABEL: @test_load_cast_combine_nontemporal(		; CHECK-LABEL: @test_load_cast_combine_nontemporal(
; CHECK: load i32, i32* %{{.*}}, !nontemporal !6		; CHECK: load i32, i32* %{{.*}}, !nontemporal !8
entry:		entry:
%l = load float, float* %ptr, !nontemporal !7		%l = load float, float* %ptr, !nontemporal !7
%c = bitcast float %l to i32		%c = bitcast float %l to i32
ret i32 %c		ret i32 %c
}		}

define i8* @test_load_cast_combine_align(i32** %ptr) {		define i8* @test_load_cast_combine_align(i32** %ptr) {
; Ensure (cast (load (...))) -> (load (cast (...))) preserves align		; Ensure (cast (load (...))) -> (load (cast (...))) preserves align
; metadata.		; metadata.
; CHECK-LABEL: @test_load_cast_combine_align(		; CHECK-LABEL: @test_load_cast_combine_align(
; CHECK: load i8*, i8** %{{.*}}, !align !7		; CHECK: load i8*, i8** %{{.*}}, !align !9
entry:		entry:
%l = load i32*, i32** %ptr, !align !8		%l = load i32*, i32** %ptr, !align !8
%c = bitcast i32* %l to i8*		%c = bitcast i32* %l to i8*
ret i8* %c		ret i8* %c
}		}

define i8* @test_load_cast_combine_deref(i32** %ptr) {		define i8* @test_load_cast_combine_deref(i32** %ptr) {
; Ensure (cast (load (...))) -> (load (cast (...))) preserves dereferenceable		; Ensure (cast (load (...))) -> (load (cast (...))) preserves dereferenceable
; metadata.		; metadata.
; CHECK-LABEL: @test_load_cast_combine_deref(		; CHECK-LABEL: @test_load_cast_combine_deref(
; CHECK: load i8*, i8** %{{.*}}, !dereferenceable !7		; CHECK: load i8*, i8** %{{.*}}, !dereferenceable !9
entry:		entry:
%l = load i32*, i32** %ptr, !dereferenceable !8		%l = load i32*, i32** %ptr, !dereferenceable !8
%c = bitcast i32* %l to i8*		%c = bitcast i32* %l to i8*
ret i8* %c		ret i8* %c
}		}

define i8* @test_load_cast_combine_deref_or_null(i32** %ptr) {		define i8* @test_load_cast_combine_deref_or_null(i32** %ptr) {
; Ensure (cast (load (...))) -> (load (cast (...))) preserves		; Ensure (cast (load (...))) -> (load (cast (...))) preserves
; dereferenceable_or_null metadata.		; dereferenceable_or_null metadata.
; CHECK-LABEL: @test_load_cast_combine_deref_or_null(		; CHECK-LABEL: @test_load_cast_combine_deref_or_null(
; CHECK: load i8*, i8** %{{.*}}, !dereferenceable_or_null !7		; CHECK: load i8*, i8** %{{.*}}, !dereferenceable_or_null !9
entry:		entry:
%l = load i32*, i32** %ptr, !dereferenceable_or_null !8		%l = load i32*, i32** %ptr, !dereferenceable_or_null !8
%c = bitcast i32* %l to i8*		%c = bitcast i32* %l to i8*
ret i8* %c		ret i8* %c
}		}

define void @test_load_cast_combine_loop(float* %src, i32* %dst, i32 %n) {		define void @test_load_cast_combine_loop(float* %src, i32* %dst, i32 %n) {
; Ensure (cast (load (...))) -> (load (cast (...))) preserves loop access		; Ensure (cast (load (...))) -> (load (cast (...))) preserves loop access
; metadata.		; metadata.
; CHECK-LABEL: @test_load_cast_combine_loop(		; CHECK-LABEL: @test_load_cast_combine_loop(
; CHECK: load i32, i32* %{{.*}}, !llvm.mem.parallel_loop_access !4		; CHECK: load i32, i32* %{{.*}}, !llvm.access.group !6
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%i = phi i32 [ 0, %entry ], [ %i.next, %loop ]		%i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
%src.gep = getelementptr inbounds float, float* %src, i32 %i		%src.gep = getelementptr inbounds float, float* %src, i32 %i
%dst.gep = getelementptr inbounds i32, i32* %dst, i32 %i		%dst.gep = getelementptr inbounds i32, i32* %dst, i32 %i
%l = load float, float* %src.gep, !llvm.mem.parallel_loop_access !4		%l = load float, float* %src.gep, !llvm.access.group !9
%c = bitcast float %l to i32		%c = bitcast float %l to i32
store i32 %c, i32* %dst.gep		store i32 %c, i32* %dst.gep
%i.next = add i32 %i, 1		%i.next = add i32 %i, 1
%cmp = icmp slt i32 %i.next, %n		%cmp = icmp slt i32 %i.next, %n
br i1 %cmp, label %loop, label %exit, !llvm.loop !1		br i1 %cmp, label %loop, label %exit, !llvm.loop !1

exit:		exit:
ret void		ret void
}		}

; This is the metadata tuple that we reference above:		; This is the metadata tuple that we reference above:
; CHECK: ![[MD]] = !{i64 1, i64 0}		; CHECK: ![[MD]] = !{i64 1, i64 0}
!0 = !{!1, !1, i64 0}		!0 = !{!1, !1, i64 0}
!1 = !{!"scalar type", !2}		!1 = !{!"scalar type", !2}
!2 = !{!"root"}		!2 = !{!"root"}
!3 = distinct !{!3, !4}		!3 = distinct !{!3, !4}
!4 = distinct !{!4}		!4 = distinct !{!4, !{!"llvm.loop.parallel_accesses", !9}}
!5 = !{i32 0, i32 42}		!5 = !{i32 0, i32 42}
!6 = !{}		!6 = !{}
!7 = !{i32 1}		!7 = !{i32 1}
!8 = !{i64 8}		!8 = !{i64 8}
		!9 = distinct !{}

test/Transforms/InstCombine/mem-par-metadata-memcpy.ll

	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -S | FileCheck %s
	;			;
	; Make sure the llvm.mem.parallel_loop_access meta-data is preserved			; Make sure the llvm.access.group meta-data is preserved
	; when a memcpy is replaced with a load+store by instcombine			; when a memcpy is replaced with a load+store by instcombine
	;			;
	; #include <string.h>			; #include <string.h>
	; void test(char* out, long size)			; void test(char* out, long size)
	; {			; {
	; #pragma clang loop vectorize(assume_safety)			; #pragma clang loop vectorize(assume_safety)
	; for (long i = 0; i < size; i+=2) {			; for (long i = 0; i < size; i+=2) {
	; memcpy(&(out[i]), &(out[i+size]), 2);			; memcpy(&(out[i]), &(out[i+size]), 2);
	; }			; }
	; }			; }

	; CHECK: for.body:			; CHECK: for.body:
	; CHECK: %{{.*}} = load i16, i16* %{{.*}}, align 1, !llvm.mem.parallel_loop_access !1			; CHECK: %{{.*}} = load i16, i16* %{{.*}}, align 1, !llvm.access.group !1
	; CHECK: store i16 %{{.*}}, i16* %{{.*}}, align 1, !llvm.mem.parallel_loop_access !1			; CHECK: store i16 %{{.*}}, i16* %{{.*}}, align 1, !llvm.access.group !1


	; ModuleID = '<stdin>'			; ModuleID = '<stdin>'
	source_filename = "memcpy.pragma.cpp"			source_filename = "memcpy.pragma.cpp"
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @_Z4testPcl(i8* %out, i64 %size) #0 {			define void @_Z4testPcl(i8* %out, i64 %size) #0 {
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond: ; preds = %for.inc, %entry			for.cond: ; preds = %for.inc, %entry
	%i.0 = phi i64 [ 0, %entry ], [ %add2, %for.inc ]			%i.0 = phi i64 [ 0, %entry ], [ %add2, %for.inc ]
	%cmp = icmp slt i64 %i.0, %size			%cmp = icmp slt i64 %i.0, %size
	br i1 %cmp, label %for.body, label %for.end			br i1 %cmp, label %for.body, label %for.end

	for.body: ; preds = %for.cond			for.body: ; preds = %for.cond
	%arrayidx = getelementptr inbounds i8, i8* %out, i64 %i.0			%arrayidx = getelementptr inbounds i8, i8* %out, i64 %i.0
	%add = add nsw i64 %i.0, %size			%add = add nsw i64 %i.0, %size
	%arrayidx1 = getelementptr inbounds i8, i8* %out, i64 %add			%arrayidx1 = getelementptr inbounds i8, i8* %out, i64 %add
	call void @llvm.memcpy.p0i8.p0i8.i64(i8* %arrayidx, i8* %arrayidx1, i64 2, i1 false), !llvm.mem.parallel_loop_access !1			call void @llvm.memcpy.p0i8.p0i8.i64(i8* %arrayidx, i8* %arrayidx1, i64 2, i1 false), !llvm.access.group !4
	br label %for.inc			br label %for.inc

	for.inc: ; preds = %for.body			for.inc: ; preds = %for.body
	%add2 = add nsw i64 %i.0, 2			%add2 = add nsw i64 %i.0, 2
	br label %for.cond, !llvm.loop !2			br label %for.cond, !llvm.loop !2

	for.end: ; preds = %for.cond			for.end: ; preds = %for.cond
	ret void			ret void
	}			}

	; Function Attrs: argmemonly nounwind			; Function Attrs: argmemonly nounwind
	declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1) #1			declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1) #1

	attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }			attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #1 = { argmemonly nounwind }			attributes #1 = { argmemonly nounwind }

	!llvm.ident = !{!0}			!llvm.ident = !{!0}

	!0 = !{!"clang version 4.0.0 (cfe/trunk 277751)"}			!0 = !{!"clang version 4.0.0 (cfe/trunk 277751)"}
	!1 = distinct !{!1, !2, !3}			!1 = distinct !{!1, !2, !3, !{!"llvm.loop.parallel_accesses", !4}}
	!2 = distinct !{!2, !3}			!2 = distinct !{!2, !3}
	!3 = !{!"llvm.loop.vectorize.enable", i1 true}			!3 = !{!"llvm.loop.vectorize.enable", i1 true}
				!4 = distinct !{} ; access group

test/Transforms/LoopVectorize/X86/force-ifcvt.ll

	; RUN: opt -loop-vectorize -S < %s | FileCheck %s			; RUN: opt -loop-vectorize -S < %s | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Function Attrs: norecurse nounwind uwtable			; Function Attrs: norecurse nounwind uwtable
	define void @Test(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p) #0 {			define void @Test(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p) #0 {
	entry:			entry:
	br label %for.body			br label %for.body

	; CHECK-LABEL: @Test			; CHECK-LABEL: @Test
	; CHECK: <4 x i32>			; CHECK: <4 x i32>

	for.body: ; preds = %cond.end, %entry			for.body: ; preds = %cond.end, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]
	%arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv
	%0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0			%0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !1
	%cmp1 = icmp eq i32 %0, 0			%cmp1 = icmp eq i32 %0, 0
	%arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv			%arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
	%1 = load i32, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0			%1 = load i32, i32* %arrayidx3, align 4, !llvm.access.group !1
	br i1 %cmp1, label %cond.end, label %cond.false			br i1 %cmp1, label %cond.end, label %cond.false

	cond.false: ; preds = %for.body			cond.false: ; preds = %for.body
	%arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv			%arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
	%2 = load i32, i32* %arrayidx7, align 4, !llvm.mem.parallel_loop_access !0			%2 = load i32, i32* %arrayidx7, align 4, !llvm.access.group !1
	%add = add nsw i32 %2, %1			%add = add nsw i32 %2, %1
	br label %cond.end			br label %cond.end

	cond.end: ; preds = %for.body, %cond.false			cond.end: ; preds = %for.body, %cond.false
	%cond = phi i32 [ %add, %cond.false ], [ %1, %for.body ]			%cond = phi i32 [ %add, %cond.false ], [ %1, %for.body ]
	store i32 %cond, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0			store i32 %cond, i32* %arrayidx3, align 4, !llvm.access.group !1
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 16			%exitcond = icmp eq i64 %indvars.iv.next, 16
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

	for.end: ; preds = %cond.end			for.end: ; preds = %cond.end
	ret void			ret void
	}			}

	attributes #0 = { norecurse nounwind uwtable "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" }			attributes #0 = { norecurse nounwind uwtable "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" }

	!0 = distinct !{!0}			!0 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !1}}
				!1 = distinct !{}

test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll

entry:
%indvars.iv.reg2mem = alloca i64		%indvars.iv.reg2mem = alloca i64
%"reg2mem alloca point" = bitcast i32 0 to i32		%"reg2mem alloca point" = bitcast i32 0 to i32
store i64 0, i64* %indvars.iv.reg2mem		store i64 0, i64* %indvars.iv.reg2mem
br label %for.body		br label %for.body

for.body: ; preds = %for.body.for.body_crit_edge, %entry		for.body: ; preds = %for.body.for.body_crit_edge, %entry
%indvars.iv.reload = load i64, i64* %indvars.iv.reg2mem		%indvars.iv.reload = load i64, i64* %indvars.iv.reg2mem
%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.reload		%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.reload
%0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3		%0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !4
%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv.reload		%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv.reload
%1 = load i32, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3		%1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !4
%idxprom3 = sext i32 %1 to i64		%idxprom3 = sext i32 %1 to i64
%arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3		%arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !3		store i32 %0, i32* %arrayidx4, align 4, !llvm.access.group !4
%indvars.iv.next = add i64 %indvars.iv.reload, 1		%indvars.iv.next = add i64 %indvars.iv.reload, 1
; A new store without the parallel metadata here:		; A new store without the parallel metadata here:
store i64 %indvars.iv.next, i64* %indvars.iv.next.reg2mem		store i64 %indvars.iv.next, i64* %indvars.iv.next.reg2mem
%indvars.iv.next.reload1 = load i64, i64* %indvars.iv.next.reg2mem		%indvars.iv.next.reload1 = load i64, i64* %indvars.iv.next.reg2mem
%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next.reload1		%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next.reload1
%2 = load i32, i32* %arrayidx6, align 4, !llvm.mem.parallel_loop_access !3		%2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !4
store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3		store i32 %2, i32* %arrayidx2, align 4, !llvm.access.group !4
%indvars.iv.next.reload = load i64, i64* %indvars.iv.next.reg2mem		%indvars.iv.next.reload = load i64, i64* %indvars.iv.next.reg2mem
%lftr.wideiv = trunc i64 %indvars.iv.next.reload to i32		%lftr.wideiv = trunc i64 %indvars.iv.next.reload to i32
%exitcond = icmp eq i32 %lftr.wideiv, 512		%exitcond = icmp eq i32 %lftr.wideiv, 512
br i1 %exitcond, label %for.end, label %for.body.for.body_crit_edge, !llvm.loop !3		br i1 %exitcond, label %for.end, label %for.body.for.body_crit_edge, !llvm.loop !3

for.body.for.body_crit_edge: ; preds = %for.body		for.body.for.body_crit_edge: ; preds = %for.body
%indvars.iv.next.reload2 = load i64, i64* %indvars.iv.next.reg2mem		%indvars.iv.next.reload2 = load i64, i64* %indvars.iv.next.reg2mem
store i64 %indvars.iv.next.reload2, i64* %indvars.iv.reg2mem		store i64 %indvars.iv.next.reload2, i64* %indvars.iv.reg2mem
br label %for.body		br label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret void		ret void
}		}

!3 = !{!3}		!3 = !{!3, !{!"llvm.loop.parallel_accesses", !4}}
		!4 = distinct !{}

test/Transforms/LoopVectorize/X86/parallel-loops.ll

	;CHECK: <4 x i32>			;CHECK: <4 x i32>
	define void @parallel_loop(i32* nocapture %a, i32* nocapture %b) nounwind uwtable {			define void @parallel_loop(i32* nocapture %a, i32* nocapture %b) nounwind uwtable {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
	%0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3			%0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !13
	%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
	%1 = load i32, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			%1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !13
	%idxprom3 = sext i32 %1 to i64			%idxprom3 = sext i32 %1 to i64
	%arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3			%arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3
	; This store might have originated from inlining a function with a parallel			; This store might have originated from inlining a function with a parallel
	; loop. Refers to a list with the "original loop reference" (!4) also included.			; loop. Refers to a list with the "original loop reference" (!4) also included.
	store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !5			store i32 %0, i32* %arrayidx4, align 4, !llvm.access.group !15
	%indvars.iv.next = add i64 %indvars.iv, 1			%indvars.iv.next = add i64 %indvars.iv, 1
	%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next			%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next
	%2 = load i32, i32* %arrayidx6, align 4, !llvm.mem.parallel_loop_access !3			%2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !13
	store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			store i32 %2, i32* %arrayidx2, align 4, !llvm.access.group !13
	%lftr.wideiv = trunc i64 %indvars.iv.next to i32			%lftr.wideiv = trunc i64 %indvars.iv.next to i32
	%exitcond = icmp eq i32 %lftr.wideiv, 512			%exitcond = icmp eq i32 %lftr.wideiv, 512
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	; The same loop with an illegal parallel loop metadata: the memory			; The same loop with an illegal parallel loop metadata: the memory
	; accesses refer to a different loop's identifier.			; accesses refer to a different loop's identifier.

	;CHECK-LABEL: @mixed_metadata(			;CHECK-LABEL: @mixed_metadata(
	;CHECK-NOT: <4 x i32>			;CHECK-NOT: <4 x i32>

	define void @mixed_metadata(i32* nocapture %a, i32* nocapture %b) nounwind uwtable {			define void @mixed_metadata(i32* nocapture %a, i32* nocapture %b) nounwind uwtable {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
	%0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !6			%0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !16
	%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
	%1 = load i32, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !6			%1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !16
	%idxprom3 = sext i32 %1 to i64			%idxprom3 = sext i32 %1 to i64
	%arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3			%arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3
	; This refers to the loop marked with !7 which we are not in at the moment.			; This refers to the loop marked with !7 which we are not in at the moment.
	; It should prevent detecting as a parallel loop.			; It should prevent detecting as a parallel loop.
	store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !7			store i32 %0, i32* %arrayidx4, align 4, !llvm.access.group !17
	%indvars.iv.next = add i64 %indvars.iv, 1			%indvars.iv.next = add i64 %indvars.iv, 1
	%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next			%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next
	%2 = load i32, i32* %arrayidx6, align 4, !llvm.mem.parallel_loop_access !6			%2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !16
	store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !6			store i32 %2, i32* %arrayidx2, align 4, !llvm.access.group !16
	%lftr.wideiv = trunc i64 %indvars.iv.next to i32			%lftr.wideiv = trunc i64 %indvars.iv.next to i32
	%exitcond = icmp eq i32 %lftr.wideiv, 512			%exitcond = icmp eq i32 %lftr.wideiv, 512
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !6			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !6

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	!3 = !{!3}			!3 = !{!3, !{!"llvm.loop.parallel_accesses", !13, !15}}
	!4 = !{!4}			!4 = !{!4, !{!"llvm.loop.parallel_accesses", !14, !15}}
	!5 = !{!3, !4}			!6 = !{!6, !{!"llvm.loop.parallel_accesses", !16}}
	!6 = !{!6}			!7 = !{!7, !{!"llvm.loop.parallel_accesses", !17}}
	!7 = !{!7}			!13 = distinct !{}
				!14 = distinct !{}
				!15 = distinct !{}
				!16 = distinct !{}
				!17 = distinct !{}
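
The two functions in this file differ only in whether every memory access belongs to a group that the enclosing loop lists. Here is a hedged Python sketch of that check, using the metadata numbers from the right-hand column. `is_annotated_parallel` is a hypothetical helper written for this illustration; it models the rule and does not reproduce LLVM's implementation.

```python
def is_annotated_parallel(loop_parallel_accesses, accesses):
    """Model: a loop counts as parallel only if every memory access in it
    belongs to at least one access group listed by the loop."""
    listed = set(loop_parallel_accesses)
    return all(any(g in listed for g in groups) for groups in accesses)


# @parallel_loop: loop !3 lists !13 and !15; the accesses use !13 or !15.
parallel_loop_accesses = [["!13"], ["!13"], ["!15"], ["!13"], ["!13"]]
parallel_loop_ok = is_annotated_parallel(["!13", "!15"], parallel_loop_accesses)

# @mixed_metadata: loop !6 lists only !16, but one store is in group !17.
mixed_metadata_accesses = [["!16"], ["!16"], ["!17"], ["!16"], ["!16"]]
mixed_metadata_ok = is_annotated_parallel(["!16"], mixed_metadata_accesses)

# A single access without any access group also blocks the annotation.
with_unannotated = parallel_loop_accesses + [[]]
unannotated_ok = is_annotated_parallel(["!13", "!15"], with_unannotated)

print(parallel_loop_ok, mixed_metadata_ok, unannotated_ok)
```

Under this model the first loop qualifies and the second does not, matching the `<4 x i32>` and `CHECK-NOT: <4 x i32>` expectations for the two functions.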

test/Transforms/LoopVectorize/X86/pr34438.ll

	; CHECK: load <8 x float>, <8 x float>*			; CHECK: load <8 x float>, <8 x float>*
	; CHECK: fadd fast <8 x float>			; CHECK: fadd fast <8 x float>
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3			%0 = load float, float* %arrayidx, align 4, !llvm.access.group !5
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
	%1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			%1 = load float, float* %arrayidx2, align 4, !llvm.access.group !5
	%add = fadd fast float %0, %1			%add = fadd fast float %0, %1
	store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			store float %add, float* %arrayidx2, align 4, !llvm.access.group !5
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 8			%exitcond = icmp eq i64 %indvars.iv.next, 8
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4

	for.end:			for.end:
	ret void			ret void
	}			}

	!3 = !{!3}			!3 = !{!3, !{!"llvm.loop.parallel_accesses", !5}}
	!4 = !{!4}			!4 = !{!4}
				!5 = distinct !{}

test/Transforms/LoopVectorize/X86/vect.omp.force.ll


	define void @vectorized(float* noalias nocapture %A, float* noalias nocapture %B) {			define void @vectorized(float* noalias nocapture %A, float* noalias nocapture %B) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !1			%0 = load float, float* %arrayidx, align 4, !llvm.access.group !11
	%call = tail call float @llvm.sin.f32(float %0)			%call = tail call float @llvm.sin.f32(float %0)
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
	store float %call, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !1			store float %call, float* %arrayidx2, align 4, !llvm.access.group !11
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
	%lftr.wideiv = trunc i64 %indvars.iv.next to i32			%lftr.wideiv = trunc i64 %indvars.iv.next to i32
	%exitcond = icmp eq i32 %lftr.wideiv, 1000			%exitcond = icmp eq i32 %lftr.wideiv, 1000
	br i1 %exitcond, label %for.end.loopexit, label %for.body, !llvm.loop !1			br i1 %exitcond, label %for.end.loopexit, label %for.body, !llvm.loop !1

	for.end.loopexit:			for.end.loopexit:
	br label %for.end			br label %for.end

	for.end:			for.end:
	ret void			ret void
	}			}

	!1 = !{!1, !2}			!1 = !{!1, !2, !{!"llvm.loop.parallel_accesses", !11}}
	!2 = !{!"llvm.loop.vectorize.enable", i1 true}			!2 = !{!"llvm.loop.vectorize.enable", i1 true}
				!11 = distinct !{}

	;			;
	; This method will not be vectorized, as scalar cost is lower than any of vector costs.			; This method will not be vectorized, as scalar cost is lower than any of vector costs.
	;			;

	define void @not_vectorized(float* noalias nocapture %A, float* noalias nocapture %B) {			define void @not_vectorized(float* noalias nocapture %A, float* noalias nocapture %B) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3			%0 = load float, float* %arrayidx, align 4, !llvm.access.group !13
	%call = tail call float @llvm.sin.f32(float %0)			%call = tail call float @llvm.sin.f32(float %0)
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
	store float %call, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			store float %call, float* %arrayidx2, align 4, !llvm.access.group !13
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
	%lftr.wideiv = trunc i64 %indvars.iv.next to i32			%lftr.wideiv = trunc i64 %indvars.iv.next to i32
	%exitcond = icmp eq i32 %lftr.wideiv, 1000			%exitcond = icmp eq i32 %lftr.wideiv, 1000
	br i1 %exitcond, label %for.end.loopexit, label %for.body, !llvm.loop !3			br i1 %exitcond, label %for.end.loopexit, label %for.body, !llvm.loop !3

	for.end.loopexit:			for.end.loopexit:
	br label %for.end			br label %for.end

	for.end:			for.end:
	ret void			ret void
	}			}

	declare float @llvm.sin.f32(float) nounwind readnone			declare float @llvm.sin.f32(float) nounwind readnone

	; Dummy metadata			; Dummy metadata
	!3 = !{!3}			!3 = !{!3, !{!"llvm.loop.parallel_accesses", !13}}
				!13 = distinct !{}

test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll

	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 20, 16			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 20, 16
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX]], align 4, !llvm.mem.parallel_loop_access !3			; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX]], align 4, !llvm.access.group !3
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[ARRAYIDX2]], align 4, !llvm.mem.parallel_loop_access !3			; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[ARRAYIDX2]], align 4, !llvm.access.group !3
	; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP10]], [[TMP11]]
	; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4, !llvm.mem.parallel_loop_access !3			; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4, !llvm.access.group !3
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 20			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 20
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !4			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !5
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !1			%0 = load float, float* %arrayidx, align 4, !llvm.access.group !11
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
	%1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !1			%1 = load float, float* %arrayidx2, align 4, !llvm.access.group !11
	%add = fadd fast float %0, %1			%add = fadd fast float %0, %1
	store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !1			store float %add, float* %arrayidx2, align 4, !llvm.access.group !11
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 20			%exitcond = icmp eq i64 %indvars.iv.next, 20
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !1			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !1

	for.end:			for.end:
	ret void			ret void
	}			}

	!1 = !{!1, !2}			!1 = !{!1, !2, !{!"llvm.loop.parallel_accesses", !11}}
	!2 = !{!"llvm.loop.vectorize.enable", i1 true}			!2 = !{!"llvm.loop.vectorize.enable", i1 true}
				!11 = distinct !{}

	;			;
	; This loop will be vectorized as the trip count is below the threshold but no			; This loop will be vectorized as the trip count is below the threshold but no
	; scalar iterations are needed thanks to folding its tail.			; scalar iterations are needed thanks to folding its tail.
	;			;
	define void @vectorized1(float* noalias nocapture %A, float* noalias nocapture readonly %B) {			define void @vectorized1(float* noalias nocapture %A, float* noalias nocapture readonly %B) {
	; CHECK-LABEL: @vectorized1(			; CHECK-LABEL: @vectorized1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*
	; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <8 x float>, <8 x float>* [[TMP6]], align 4			; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <8 x float>, <8 x float>* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP7:%.*]] = fadd fast <8 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd fast <8 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]
	; CHECK-NEXT: [[TMP8:%.*]] = icmp ule <8 x i64> [[INDUCTION]], <i64 19, i64 19, i64 19, i64 19, i64 19, i64 19, i64 19, i64 19>			; CHECK-NEXT: [[TMP8:%.*]] = icmp ule <8 x i64> [[INDUCTION]], <i64 19, i64 19, i64 19, i64 19, i64 19, i64 19, i64 19, i64 19>
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast float* [[TMP5]] to <8 x float>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast float* [[TMP5]] to <8 x float>*
	; CHECK-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP7]], <8 x float>* [[TMP9]], i32 4, <8 x i1> [[TMP8]])			; CHECK-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP7]], <8 x float>* [[TMP9]], i32 4, <8 x i1> [[TMP8]])
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
	; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 24			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 24
	; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !6			; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !7
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 24, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[ARRAYIDX]], align 4, !llvm.access.group !9
				; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: [[TMP12:%.*]] = load float, float* [[ARRAYIDX2]], align 4, !llvm.access.group !9
				; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP11]], [[TMP12]]
				; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4, !llvm.access.group !9
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 20
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !10
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3			%0 = load float, float* %arrayidx, align 4, !llvm.access.group !13
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
	%1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			%1 = load float, float* %arrayidx2, align 4, !llvm.access.group !13
	%add = fadd fast float %0, %1			%add = fadd fast float %0, %1
	store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			store float %add, float* %arrayidx2, align 4, !llvm.access.group !13
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 20			%exitcond = icmp eq i64 %indvars.iv.next, 20
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3

	for.end:			for.end:
	ret void			ret void
	}			}

	!3 = !{!3}			!3 = !{!3, !{!"llvm.loop.parallel_accesses", !13}}
				!13 = distinct !{}

	;			;
	; This loop will be vectorized as the trip count is below the threshold but no			; This loop will be vectorized as the trip count is below the threshold but no
	; scalar iterations are needed.			; scalar iterations are needed.
	;			;
	define void @vectorized2(float* noalias nocapture %A, float* noalias nocapture readonly %B) {			define void @vectorized2(float* noalias nocapture %A, float* noalias nocapture readonly %B) {
	; CHECK-LABEL: @vectorized2(			; CHECK-LABEL: @vectorized2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*
	; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <8 x float>, <8 x float>* [[TMP6]], align 4			; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <8 x float>, <8 x float>* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP7:%.*]] = fadd fast <8 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd fast <8 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[TMP5]] to <8 x float>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[TMP5]] to <8 x float>*
	; CHECK-NEXT: store <8 x float> [[TMP7]], <8 x float>* [[TMP8]], align 4			; CHECK-NEXT: store <8 x float> [[TMP7]], <8 x float>* [[TMP8]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
	; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16			; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
	; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !9			; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !11
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 16, 16			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 16, 16
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX]], align 4, !llvm.mem.parallel_loop_access !7			; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX]], align 4, !llvm.access.group !9
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[ARRAYIDX2]], align 4, !llvm.mem.parallel_loop_access !7			; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[ARRAYIDX2]], align 4, !llvm.access.group !9
	; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP10]], [[TMP11]]
	; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4, !llvm.mem.parallel_loop_access !7			; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4, !llvm.access.group !9
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 16			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 16
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !10			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !12
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3			%0 = load float, float* %arrayidx, align 4, !llvm.access.group !13
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
	%1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			%1 = load float, float* %arrayidx2, align 4, !llvm.access.group !13
	%add = fadd fast float %0, %1			%add = fadd fast float %0, %1
	store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			store float %add, float* %arrayidx2, align 4, !llvm.access.group !13
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 16			%exitcond = icmp eq i64 %indvars.iv.next, 16
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4

	for.end:			for.end:
	ret void			ret void
	}			}

	!4 = !{!4}			!4 = !{!4}

test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll

	; CHECK-LABEL: not_too_small_tc			; CHECK-LABEL: not_too_small_tc
	; CHECK-AVX2: LV: Selecting VF: 16.			; CHECK-AVX2: LV: Selecting VF: 16.
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds i8, i8* %B, i64 %indvars.iv			%arrayidx = getelementptr inbounds i8, i8* %B, i64 %indvars.iv
	%l1 = load i8, i8* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3			%l1 = load i8, i8* %arrayidx, align 4, !llvm.access.group !13
	%arrayidx2 = getelementptr inbounds i8, i8* %A, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds i8, i8* %A, i64 %indvars.iv
	%l2 = load i8, i8* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			%l2 = load i8, i8* %arrayidx2, align 4, !llvm.access.group !13
	%add = add i8 %l1, %l2			%add = add i8 %l1, %l2
	store i8 %add, i8* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3			store i8 %add, i8* %arrayidx2, align 4, !llvm.access.group !13
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 16			%exitcond = icmp eq i64 %indvars.iv.next, 16
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4

	for.end:			for.end:
	ret void			ret void
	}			}
	!3 = !{!3}			!3 = !{!3, !{!"llvm.loop.parallel_accesses", !13}}
	!4 = !{!4}			!4 = !{!4}
				!13 = distinct !{}

test/Transforms/SROA/mem-par-metadata-sroa.ll

; RUN: opt < %s -sroa -S | FileCheck %s		; RUN: opt < %s -sroa -S | FileCheck %s
;		;
; Make sure the llvm.mem.parallel_loop_access meta-data is preserved		; Make sure the llvm.access.group meta-data is preserved
; when a load/store is replaced with another load/store by sroa		; when a load/store is replaced with another load/store by sroa
;		;
; class Complex {		; class Complex {
; private:		; private:
; float real_;		; float real_;
; float imaginary_;		; float imaginary_;
;		;
; public:		; public:
; for (long offset = 0; offset < size; ++offset) {		; for (long offset = 0; offset < size; ++offset) {
; Complex t0 = out[offset];		; Complex t0 = out[offset];
; out[offset] = t0 + t0;		; out[offset] = t0 + t0;
; }		; }
; }		; }

; CHECK: for.body:		; CHECK: for.body:
; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4		; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4
; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.mem.parallel_loop_access !1		; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.access.group !1
; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4		; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4
; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.mem.parallel_loop_access !1		; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.access.group !1
; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4		; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4
; CHECK: br label		; CHECK: br label

; ModuleID = '<stdin>'		; ModuleID = '<stdin>'
source_filename = "mem-par-metadata-sroa1.cpp"		source_filename = "mem-par-metadata-sroa1.cpp"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"		target triple = "x86_64-unknown-linux-gnu"

Show All 11 Lines	for.cond: ; preds = %for.body, %entry
%offset.0 = phi i64 [ 0, %entry ], [ %inc, %for.body ]		%offset.0 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
%cmp = icmp slt i64 %offset.0, %size		%cmp = icmp slt i64 %offset.0, %size
br i1 %cmp, label %for.body, label %for.end		br i1 %cmp, label %for.body, label %for.end

for.body: ; preds = %for.cond		for.body: ; preds = %for.cond
%arrayidx = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.0		%arrayidx = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.0
%real_.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0		%real_.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0
%real_.i.i = getelementptr inbounds %class.Complex, %class.Complex* %arrayidx, i64 0, i32 0		%real_.i.i = getelementptr inbounds %class.Complex, %class.Complex* %arrayidx, i64 0, i32 0
%0 = load float, float* %real_.i.i, align 4, !llvm.mem.parallel_loop_access !1		%0 = load float, float* %real_.i.i, align 4, !llvm.access.group !11
store float %0, float* %real_.i, align 4, !llvm.mem.parallel_loop_access !1		store float %0, float* %real_.i, align 4, !llvm.access.group !11
%imaginary_.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1		%imaginary_.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1
%imaginary_.i.i = getelementptr inbounds %class.Complex, %class.Complex* %arrayidx, i64 0, i32 1		%imaginary_.i.i = getelementptr inbounds %class.Complex, %class.Complex* %arrayidx, i64 0, i32 1
%1 = load float, float* %imaginary_.i.i, align 4, !llvm.mem.parallel_loop_access !1		%1 = load float, float* %imaginary_.i.i, align 4, !llvm.access.group !11
store float %1, float* %imaginary_.i, align 4, !llvm.mem.parallel_loop_access !1		store float %1, float* %imaginary_.i, align 4, !llvm.access.group !11
%arrayidx1 = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.0		%arrayidx1 = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.0
%real_.i1 = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0		%real_.i1 = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0
%2 = load float, float* %real_.i1, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1		%2 = load float, float* %real_.i1, align 4, !noalias !3, !llvm.access.group !11
%real_2.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0		%real_2.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0
%3 = load float, float* %real_2.i, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1		%3 = load float, float* %real_2.i, align 4, !noalias !3, !llvm.access.group !11
%add.i = fadd float %2, %3		%add.i = fadd float %2, %3
%imaginary_.i2 = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1		%imaginary_.i2 = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1
%4 = load float, float* %imaginary_.i2, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1		%4 = load float, float* %imaginary_.i2, align 4, !noalias !3, !llvm.access.group !11
%imaginary_3.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1		%imaginary_3.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1
%5 = load float, float* %imaginary_3.i, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1		%5 = load float, float* %imaginary_3.i, align 4, !noalias !3, !llvm.access.group !11
%add4.i = fadd float %4, %5		%add4.i = fadd float %4, %5
%real_.i.i3 = getelementptr inbounds %class.Complex, %class.Complex* %tmpcast, i64 0, i32 0		%real_.i.i3 = getelementptr inbounds %class.Complex, %class.Complex* %tmpcast, i64 0, i32 0
store float %add.i, float* %real_.i.i3, align 4, !alias.scope !3, !llvm.mem.parallel_loop_access !1		store float %add.i, float* %real_.i.i3, align 4, !alias.scope !3, !llvm.access.group !11
%imaginary_.i.i4 = getelementptr inbounds %class.Complex, %class.Complex* %tmpcast, i64 0, i32 1		%imaginary_.i.i4 = getelementptr inbounds %class.Complex, %class.Complex* %tmpcast, i64 0, i32 1
store float %add4.i, float* %imaginary_.i.i4, align 4, !alias.scope !3, !llvm.mem.parallel_loop_access !1		store float %add4.i, float* %imaginary_.i.i4, align 4, !alias.scope !3, !llvm.access.group !11
%6 = bitcast %class.Complex* %arrayidx1 to i64*		%6 = bitcast %class.Complex* %arrayidx1 to i64*
%7 = load i64, i64* %ref.tmp, align 8, !llvm.mem.parallel_loop_access !1		%7 = load i64, i64* %ref.tmp, align 8, !llvm.access.group !11
store i64 %7, i64* %6, align 4, !llvm.mem.parallel_loop_access !1		store i64 %7, i64* %6, align 4, !llvm.access.group !11
%inc = add nsw i64 %offset.0, 1		%inc = add nsw i64 %offset.0, 1
br label %for.cond, !llvm.loop !1		br label %for.cond, !llvm.loop !1

for.end: ; preds = %for.cond		for.end: ; preds = %for.cond
ret void		ret void
}		}

; Function Attrs: argmemonly nounwind		; Function Attrs: argmemonly nounwind
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1) #1		declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1) #1

attributes #0 = { norecurse nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }		attributes #0 = { norecurse nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { argmemonly nounwind }		attributes #1 = { argmemonly nounwind }

!llvm.ident = !{!0}		!llvm.ident = !{!0}

!0 = !{!"clang version 4.0.0 (cfe/trunk 277751)"}		!0 = !{!"clang version 4.0.0 (cfe/trunk 277751)"}
!1 = distinct !{!1, !2}		!1 = distinct !{!1, !2, !{!"llvm.loop.parallel_accesses", !11}}
!2 = !{!"llvm.loop.vectorize.enable", i1 true}		!2 = !{!"llvm.loop.vectorize.enable", i1 true}
!3 = !{!4}		!3 = !{!4}
!4 = distinct !{!4, !5, !"_ZNK7ComplexplERKS_: %agg.result"}		!4 = distinct !{!4, !5, !"_ZNK7ComplexplERKS_: %agg.result"}
!5 = distinct !{!5, !"_ZNK7ComplexplERKS_"}		!5 = distinct !{!5, !"_ZNK7ComplexplERKS_"}
		!11 = distinct !{}

test/Transforms/Scalarizer/basic.ll

	; CHECK: store i32 %add.i3, i32* %dst.i3, align 4, !tbaa.struct ![[TAG]]			; CHECK: store i32 %add.i3, i32* %dst.i3, align 4, !tbaa.struct ![[TAG]]
	; CHECK: ret void			; CHECK: ret void
	%val = load <4 x i32> , <4 x i32> *%src, !tbaa.struct !5			%val = load <4 x i32> , <4 x i32> *%src, !tbaa.struct !5
	%add = add <4 x i32> %val, %val			%add = add <4 x i32> %val, %val
	store <4 x i32> %add, <4 x i32> *%dst, !tbaa.struct !5			store <4 x i32> %add, <4 x i32> *%dst, !tbaa.struct !5
	ret void			ret void
	}			}

	; Check that llvm.mem.parallel_loop_access information is preserved.			; Check that llvm.access.group information is preserved.
	define void @f5(i32 %count, <4 x i32> *%src, <4 x i32> *%dst) {			define void @f5(i32 %count, <4 x i32> *%src, <4 x i32> *%dst) {
	; CHECK-LABEL: @f5(			; CHECK-LABEL: @f5(
	; CHECK: %val.i0 = load i32, i32* %this_src.i0, align 16, !llvm.mem.parallel_loop_access ![[TAG:[0-9]*]]			; CHECK: %val.i0 = load i32, i32* %this_src.i0, align 16, !llvm.access.group ![[TAG:[0-9]*]]
	; CHECK: %val.i1 = load i32, i32* %this_src.i1, align 4, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: %val.i1 = load i32, i32* %this_src.i1, align 4, !llvm.access.group ![[TAG]]
	; CHECK: %val.i2 = load i32, i32* %this_src.i2, align 8, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: %val.i2 = load i32, i32* %this_src.i2, align 8, !llvm.access.group ![[TAG]]
	; CHECK: %val.i3 = load i32, i32* %this_src.i3, align 4, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: %val.i3 = load i32, i32* %this_src.i3, align 4, !llvm.access.group ![[TAG]]
	; CHECK: store i32 %add.i0, i32* %this_dst.i0, align 16, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: store i32 %add.i0, i32* %this_dst.i0, align 16, !llvm.access.group ![[TAG]]
	; CHECK: store i32 %add.i1, i32* %this_dst.i1, align 4, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: store i32 %add.i1, i32* %this_dst.i1, align 4, !llvm.access.group ![[TAG]]
	; CHECK: store i32 %add.i2, i32* %this_dst.i2, align 8, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: store i32 %add.i2, i32* %this_dst.i2, align 8, !llvm.access.group ![[TAG]]
	; CHECK: store i32 %add.i3, i32* %this_dst.i3, align 4, !llvm.mem.parallel_loop_access ![[TAG]]			; CHECK: store i32 %add.i3, i32* %this_dst.i3, align 4, !llvm.access.group ![[TAG]]
	; CHECK: ret void			; CHECK: ret void
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%index = phi i32 [ 0, %entry ], [ %next_index, %loop ]			%index = phi i32 [ 0, %entry ], [ %next_index, %loop ]
	%this_src = getelementptr <4 x i32>, <4 x i32> *%src, i32 %index			%this_src = getelementptr <4 x i32>, <4 x i32> *%src, i32 %index
	%this_dst = getelementptr <4 x i32>, <4 x i32> *%dst, i32 %index			%this_dst = getelementptr <4 x i32>, <4 x i32> *%dst, i32 %index
	%val = load <4 x i32> , <4 x i32> *%this_src, !llvm.mem.parallel_loop_access !3			%val = load <4 x i32> , <4 x i32> *%this_src, !llvm.access.group !13
	%add = add <4 x i32> %val, %val			%add = add <4 x i32> %val, %val
	store <4 x i32> %add, <4 x i32> *%this_dst, !llvm.mem.parallel_loop_access !3			store <4 x i32> %add, <4 x i32> *%this_dst, !llvm.access.group !13
	%next_index = add i32 %index, -1			%next_index = add i32 %index, -1
	%continue = icmp ne i32 %next_index, %count			%continue = icmp ne i32 %next_index, %count
	br i1 %continue, label %loop, label %end, !llvm.loop !3			br i1 %continue, label %loop, label %end, !llvm.loop !3

	end:			end:
	ret void			ret void
	}			}


	exit:			exit:
	ret <4 x float> %next_acc			ret <4 x float> %next_acc
	}			}

	!0 = !{ !"root" }			!0 = !{ !"root" }
	!1 = !{ !"set1", !0 }			!1 = !{ !"set1", !0 }
	!2 = !{ !"set2", !0 }			!2 = !{ !"set2", !0 }
	!3 = !{ !3 }			!3 = !{ !3, !{!"llvm.loop.parallel_accesses", !13} }
	!4 = !{ float 4.0 }			!4 = !{ float 4.0 }
	!5 = !{ i64 0, i64 8, null }			!5 = !{ i64 0, i64 8, null }
				!13 = distinct !{}

test/Transforms/SimplifyCFG/combine-parallel-mem-md.ll

	; RUN: opt -simplifycfg -S < %s | FileCheck %s			; RUN: opt -simplifycfg -S < %s | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Function Attrs: norecurse nounwind uwtable			; Function Attrs: norecurse nounwind uwtable
	define void @Test(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p) #0 {			define void @Test(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p) #0 {
	entry:			entry:
	br label %for.body			br label %for.body

	; CHECK-LABEL: @Test			; CHECK-LABEL: @Test
	; CHECK: load i32, i32* {{.*}}, align 4, !llvm.mem.parallel_loop_access !0			; CHECK: load i32, i32* {{.*}}, align 4, !llvm.access.group !0
	; CHECK: load i32, i32* {{.*}}, align 4, !llvm.mem.parallel_loop_access !0			; CHECK: load i32, i32* {{.*}}, align 4, !llvm.access.group !0
	; CHECK: store i32 {{.*}}, align 4, !llvm.mem.parallel_loop_access !0			; CHECK: store i32 {{.*}}, align 4, !llvm.access.group !0
	; CHECK-NOT: load			; CHECK-NOT: load
	; CHECK-NOT: store			; CHECK-NOT: store

	for.body: ; preds = %cond.end, %entry			for.body: ; preds = %cond.end, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]
	%arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv
	%0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0			%0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !0
	%cmp1 = icmp eq i32 %0, 0			%cmp1 = icmp eq i32 %0, 0
	br i1 %cmp1, label %cond.true, label %cond.false			br i1 %cmp1, label %cond.true, label %cond.false

	cond.false: ; preds = %for.body			cond.false: ; preds = %for.body
	%arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv			%arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
	%v = load i32, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0			%v = load i32, i32* %arrayidx3, align 4, !llvm.access.group !0
	%arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv			%arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
	%1 = load i32, i32* %arrayidx7, align 4, !llvm.mem.parallel_loop_access !0			%1 = load i32, i32* %arrayidx7, align 4, !llvm.access.group !0
	%add = add nsw i32 %1, %v			%add = add nsw i32 %1, %v
	br label %cond.end			br label %cond.end

	cond.true: ; preds = %for.body			cond.true: ; preds = %for.body
	%arrayidx4 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv			%arrayidx4 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
	%w = load i32, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0			%w = load i32, i32* %arrayidx4, align 4, !llvm.access.group !0
	%arrayidx8 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv			%arrayidx8 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
	%2 = load i32, i32* %arrayidx8, align 4, !llvm.mem.parallel_loop_access !0			%2 = load i32, i32* %arrayidx8, align 4, !llvm.access.group !0
	%add2 = add nsw i32 %2, %w			%add2 = add nsw i32 %2, %w
	br label %cond.end			br label %cond.end

	cond.end: ; preds = %for.body, %cond.false			cond.end: ; preds = %for.body, %cond.false
	%cond = phi i32 [ %add, %cond.false ], [ %add2, %cond.true ]			%cond = phi i32 [ %add, %cond.false ], [ %add2, %cond.true ]
	%arrayidx9 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv			%arrayidx9 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
	store i32 %cond, i32* %arrayidx9, align 4, !llvm.mem.parallel_loop_access !0			store i32 %cond, i32* %arrayidx9, align 4, !llvm.access.group !0
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 16			%exitcond = icmp eq i64 %indvars.iv.next, 16
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

	for.end: ; preds = %cond.end			for.end: ; preds = %cond.end
	ret void			ret void
	}			}

	attributes #0 = { norecurse nounwind uwtable }			attributes #0 = { norecurse nounwind uwtable }

	!0 = distinct !{!0, !1}			!0 = distinct !{!0, !1, !{!"llvm.loop.parallel_accesses", !10}}
	!1 = !{!"llvm.loop.vectorize.enable", i1 true}			!1 = !{!"llvm.loop.vectorize.enable", i1 true}
				!10 = distinct !{}

Revision Contents

Diff 177322
