This is an archive of the discontinued LLVM Phabricator instance.

Differential D49281

[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
ClosedPublic

Authored by Meinersbur on Jul 12 2018, 10:15 PM.

Details

Reviewers

hfinkel
dmgreen
hsaito
jeffhammond
jingyue
eliben
anemet
Ashutosh
ashutosh.nema
mcrosier
fhahn
hiraditya
meheff
bollu

Commits

rG7244852557ca: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
rL348944: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
rC348944: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.

Summary

When multiple loop transformations are defined in a loop's metadata, their order of execution is determined by the order of their respective passes in the pass pipeline. For instance,

#pragma clang loop unroll_and_jam(enable)
#pragma clang loop distribute(enable)

is the same as

#pragma clang loop distribute(enable)
#pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled before the UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also, the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.
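
As a rough illustration of the point above, the following Python sketch models a fixed pipeline in which the requested transformations always run in pipeline order, whatever order the pragmas were written in. `PIPELINE` and `execution_order` are hypothetical names for this sketch, and the two-entry pipeline is a simplification, not LLVM's actual pass list.

```python
# Toy model: a fixed pass pipeline, assumed here to run distribution first.
PIPELINE = ["distribute", "unroll_and_jam"]


def execution_order(pragmas):
    """Return the requested transformations in the order the pipeline runs them."""
    requested = set(pragmas)  # the source order of the pragmas is lost here
    return [name for name in PIPELINE if name in requested]


first = execution_order(["unroll_and_jam", "distribute"])
second = execution_order(["distribute", "unroll_and_jam"])
print(first == second)  # both spellings yield the same execution order
```

In this model the only way to change the outcome is to change `PIPELINE` itself, which is the limitation the summary describes.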

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

!0 = !{!0, !1, !2}
!1 = !{!"llvm.loop.unroll_and_jam.enable"}
!2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
!3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.
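
The effect of the followup attribute can be sketched with a small Python model. This is only an illustration under simplifying assumptions: loop attributes are a plain dict rather than MDNodes, and `apply_unroll_and_jam` is a hypothetical helper, not a function in LLVM.

```python
# Toy model only: loop attributes are a plain dict, not LLVM metadata nodes.
def apply_unroll_and_jam(loop_attrs):
    """Return the attributes of the jammed inner loop, or None if not requested."""
    if "llvm.loop.unroll_and_jam.enable" not in loop_attrs:
        return None
    # The followup list names the attributes the resulting inner loop receives.
    followup = loop_attrs.get("llvm.loop.unroll_and_jam.followup_inner", [])
    return {name: True for name in followup}


outer = {
    "llvm.loop.unroll_and_jam.enable": True,
    "llvm.loop.unroll_and_jam.followup_inner": ["llvm.loop.distribute.enable"],
}
inner = apply_unroll_and_jam(outer)
print(inner)
```

The jammed inner loop ends up carrying only `llvm.loop.distribute.enable`, so a later distribution pass would see the request on the new loop rather than on the original one.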

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, by using Polly to perform these transformations, or by adding a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. The responsible pass cannot emit such a warning itself because the transformation might be 'hidden' in a followup attribute when it is executed, or the pass is not present in the pipeline at all. For this reason, this patch introduces a WarnMissedTransformations pass to warn about orphaned transformations.

To ensure that no other transformation is executed before the intended one, the attribute llvm.loop.transformations.disable_nonforced can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

This approach deviates from the proposal in the RFC at http://lists.llvm.org/pipermail/llvm-dev/2018-May/123690.html. There are three reasons:

In the RFC approach, when a pass wants to determine whether it should perform a transformation, it must search the list of transformations for the first transformation on its loop. When inlining, the two transformation lists must be combined.
For compatibility, the current approach using loop IDs must still be supported, either by having each pass look up both metadata formats or by using the AutoUpgrade mechanism. This patch's approach keeps the current mechanism.
The "loop IDs" can change. When an attribute is added to or removed from the loop ID metadata (e.g. adding "llvm.loop.isvectorized" after vectorizing), the loop is assigned a new, distinct MDNode. Every reference to these nodes needs to be updated as well, and metadata nodes do not support RAUW. Some of the current passes (e.g. LoopDistribution) keep the same loop ID for multiple output loops such that the loop ID no longer uniquely identifies a loop. This patch's approach does not reference loop IDs in the attribute values.

Diff Detail

Repository

rL LLVM

Build Status

Buildable 25622
Build 25621: arc lint + arc unit

Event Timeline

Meinersbur created this revision. Jul 12 2018, 10:15 PM

Herald added a reviewer: bollu. Jul 12 2018, 10:15 PM

Herald added subscribers: dexonsmith, steven_wu, zzheng and 3 others.

I like the idea of this, giving control to the user to figure out the best way to manipulate loops (even if they end up shooting themselves in the foot with it :) )

What does this mean for the clang side of this patch? Has the user-visible loop-naming scheme changed, or will it be mapped to this metadata?

What follows is mostly a number of small Nits:

docs/TransformMetadata.rst
24	Nit: from the emitted IR
54	Nit: transformation
83	Nit: transformed
88	Nit: occur
253	Nit: have
344	Should we fix this? Will it work as expected with nonforced, if it was enabled?
include/llvm/LinkAllPasses.h
223	Nit: Space I guess? I think this file could do with a clang-formatting
include/llvm/Transforms/Scalar/WarnMissedTransforms.h
36	Are these two needed here, if they are declared in other places?
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
361	This is called SubLoop here

Thanks for the review.

In D49281#1162931, @dmgreen wrote:

What does this mean for the clang side of this patch? Has the user-visible loop-naming scheme changed, or will it be mapped to this metadata?

I also have a patch for clang in mind to make the order defined by the frontend, not the pass pipeline. However, for compatibility it will have to put them in the order of the current pass order e.g.

#pragma unroll_and_jam
#pragma clang loop vectorize(enable)

and

#pragma clang loop vectorize(enable)
#pragma unroll_and_jam

both have to emit

!0 = !{!0,
    !{!"llvm.loop.vectorize.enable"},
    !{!"llvm.loop.vectorize.followup_all",
        !{!"llvm.loop.unroll_and_jam.enable"}}}

A different syntax is needed that the user can use if they want to define the transformation order.

Also for compatibility we cannot error-out if transformation undefined (at least for any combination of currently supported transformations), but could use the Auto-Upgrader.

ping

Meinersbur marked 7 inline comments as done. Jul 26 2018, 7:30 PM

Meinersbur added inline comments.

docs/TransformMetadata.rst
344	Yes, but in a separate patch that adds interchange-specific metadata. LoopInterchange currently is not enabled by default and does not modify metadata so its interaction with other transformation is less significant.
include/llvm/LinkAllPasses.h
223	I used `git clang-format origin/master` which only applies to changed lines like this one.
include/llvm/Transforms/Scalar/WarnMissedTransforms.h
36	Depends on which headers are included when using this pass. `WarnMissedTransforms.cpp` itself does not include `include/llvm/InitializePasses.h` or `include/llvm/Transforms/Scalar.h`. Having the forward declarations here matches the translation unit header idiom and the compiler checks that the definition matches the declaration here.

Address @dmgreen's remarks

Harbormaster completed remote builds in B20769: Diff 157638. Jul 26 2018, 7:32 PM

Some extra tests for nonforced + a pragma would be good to see.

I'm not much of an expert on the vectoriser changes here.

docs/LangRef.rst
5289	These can now move down with the other unroll_and_jam metadata
5307	Nit: This loop
docs/TransformMetadata.rst
72	Nit: Maybe change the second "for instance"
164	Do you think it's worth mentioning unroll.count and unroll.disable etc, before jumping into the followup metadata?
207	Again, could mention unroll_and_jam.count and enable/disable.
265	Again, maybe describe other metadata first?
344	Yep. Certainly a separate patch.
396	Nit: pass pass
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
206	Same as unroll. What if PragmaCount and hasDisableAllTransformsHint?
lib/Transforms/Scalar/LoopUnrollPass.cpp
755	What if it has PragmaCount and hasDisableAllTransformsHint? Should that not enable too?
test/Transforms/LoopUnroll/unroll-pragmas_transform.ll
2	This file looks sensible on its own and I think it looks OK to be committed separately. (Apart from the nit below)
6	Nit: Is this sentence still true?

In D49281#1187583, @dmgreen wrote:

I'm not much of an expert on the vectoriser changes here.

There is a call for a vectorizer person, and here I am, but before going there, ever since this patch was uploaded, I've been thinking whether the original intent of this patch can be really accomplished ---- especially so with the set of "hints" being used here. So, I'd like to go back and start there if that's okay. I think this patch is in some sense based on the optimism of "programmer won't abuse this and transformation will happen". Reality is more like programmer will try using those pragmas to arm-twist the compiler to get the set of transformations he/she wants w/o thinking deep enough about what happens at each step of the way. As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens. From what I've read --- granted that I haven't gone through this very deep, the fall back aspect isn't handled well. If we don't start from "programmer specified transformation may fail to kick-in", providing this feature to the programmers would quickly backfire and we'll get tons of this doesn't work that doesn't work problem reports ---- which is a big mess/disaster. If we are doing this for research purposes, that may be fine. I'm looking at this from a production compiler development perspective.

There are two different approaches we can think of.

Start from defining the transformation directives that actually transforms in most cases. For example, Intel compiler's implementation of OpenMP SIMD is in this position, and we are trying to bring the same position to LLVM LV. Then, failing to transform is a compiler bug (or a programmer has a bug in the source code to fix). OpenMP SIMD is defined in such a way for compilers to be able to take this positioning. In this approach, we can stop processing transformation directives after the first failed transformation. Even in this case, for example, programmer probably won't know under certain circumstances, vectorizer won't produce remainder loop, or under certain cases, vectorizer completely unrolls the loop so that there aren't any loops left after vectorization. So, controlling a single transformation in a programmer predictable manner --- enough to describe what should happen next --- is a big task.
Based on transformation hint directives. Since what the programmer is using is a hint, for each transformation, programmer needs to tell the next step 1) if the transformation kicks in and also 2) if the transformation does not kick-in. This will be very messy and I'm not sure how practical it would be for the programmer to specify the behaviors for all situations.

If these kinds of stuff have been already discussed, please give me the pointers and I'll try to digest. Else, we can talk about this in this review or in llvm-dev. Hope my argument makes sense to you/others, but please feel free to ask for clarification. I've been working on making SIMD directive programmer predictable for many years. So, I may be skipping some explanations that became natural enough to me over the years.

In short, doing this even for one transformation (in my case SIMD) is difficult enough. If we are trying to expand to multiple transformations, we should try doing so in baby steps.

The idea behind this is so powerful such that even if we start from "best effort basis", programmers will quickly jump on and say make this more robust/predictable. We'd rather spend time to design this as a robust/predictable feature from the beginning than having to work on it under the customer pressure.

Thanks,
Hideki

... As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens.

Hideki, I think that LLVM does the right thing here: We provide a separation between the hint and the mandate (i.e., using assume_safety or not). By default, the compiler should still provide safety conditions. The alternative to providing the functionality proposed here is, in reality, source-to-source code generators (which often do semantically-incorrect things and are hard to use in production for a large number of reasons). Also, the client for this functionality is not just programmers directly, but other tools (e.g., autotuners and other higher-level languages), which is also why generating safety predicates, or at least having that option, is important. I think that LLVM also currently does the right thing regarding transformations that don't apply: we issue a warning (which bubbles up from the optimizer via the optimization-remark interface). We should certainly continue to do that (and I know that Michael has experimented with ways of making this continue to happen reliably).

The idea behind this is so powerful such that even if we start from "best effort basis", programmers will quickly jump on and say make this more robust/predictable. We'd rather spend time to design this as a robust/predictable feature from the beginning than having to work on it under the customer pressure.

I don't think that we can ever really have something in this space which isn't best effort, but I think that providing a warning is both necessary (because silent failure is poor user experience) and sufficient (it's not clear to me what kind of fallback we could provide that would be more robust in practice). If people start providing us with bug reports about loops that we couldn't transform, but should have transformed, that will be great data on what to improve. That having been said, if you can suggest pragmas that have semantics that allow us to control loop transformations in a way that's more robust than the current ones, then please, of course, suggest them.

I feel obligated to note, however, that the motivation for this work comes entirely from our experience supporting HPC users. I'm convinced that it will provide a significantly better user experience over the current state of the art. I'm sure that you've seen code that comes out of higher-level code generators and the like. These tend to be hard to maintain, inflexible, and buggy, and the code produced is difficult for both humans and compilers to understand. I'm sure you've also seen cases where people implement these transformations by hand (do I need to go on?). The compiler can perform these transformations, and having the user able to direct the compiler to do so is a much better option. To some extent, a compiler can have cost models and heuristics to apply these automatically, but only occasionally do we actually have enough static information to do so (even with some PGO capabilities).

In D49281#1187583, @dmgreen wrote:

Some extra tests for nonforced + a pragma would be good to see.

Any transformations in particular?

In D49281#1188026, @hsaito wrote:

In D49281#1187583, @dmgreen wrote:

I'm not much of an expert on the vectoriser changes here.

There is a call for a vectorizer person, and here I am,

Thank you!!!

but before going there, ever since this patch was uploaded, I've been thinking whether the original intent of this patch can be really accomplished ---- especially so with the set of "hints" being used here. So, I'd like to go back and start there if that's okay. I think this patch is in some sense based on the optimism of "programmer won't abuse this and transformation will happen". Reality is more like programmer will try using those pragmas to arm-twist the compiler to get the set of transformations he/she wants w/o thinking deep enough about what happens at each step of the way. As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens. From what I've read --- granted that I haven't gone through this very deep, the fall back aspect isn't handled well. If we don't start from "programmer specified transformation may fail to kick-in", providing this feature to the programmers would quickly backfire and we'll get tons of this doesn't work that doesn't work problem reports ---- which is a big mess/disaster. If we are doing this for research purposes, that may be fine. I'm looking at this from a production compiler development perspective.

There are two different approaches we can think of.

Start from defining the transformation directives that actually transforms in most cases. For example, Intel compiler's implementation of OpenMP SIMD is in this position, and we are trying to bring the same position to LLVM LV. Then, failing to transform is a compiler bug (or a programmer has a bug in the source code to fix). OpenMP SIMD is defined in such a way for compilers to be able to take this positioning. In this approach, we can stop processing transformation directives after the first failed transformation. Even in this case, for example, programmer probably won't know under certain circumstances, vectorizer won't produce remainder loop, or under certain cases, vectorizer completely unrolls the loop so that there aren't any loops left after vectorization. So, controlling a single transformation in a programmer predictable manner --- enough to describe what should happen next --- is a big task.

Based on transformation hint directives. Since what the programmer is using is a hint, for each transformation, programmer needs to tell the next step 1) if the transformation kicks in and also 2) if the transformation does not kick-in. This will be very messy and I'm not sure how practical it would be for the programmer to specify the behaviors for all situations.

If these kinds of stuff have been already discussed, please give me the pointers and I'll try to digest. Else, we can talk about this in this review or in llvm-dev. Hope my argument makes sense to you/others, but please feel free to ask for clarification. I've been working on making SIMD directive programmer predictable for many years. So, I may be skipping some explanations that became natural enough to me over the years.

All these issues apply to the current loop metadata/#pragma clang loop as well. A user can specify #pragma clang loop distribute(enable) vectorize_width(4) with the expectation that these will be carried out. If not, this can be a bug (or semantically incorrect) just as if the distribution/vectorization was explicitly ordered using followup-attributes.
In contrast, currently there is no way to even express that the transformations should be carried out in the reverse order (first vectorization, then distribution; let's ignore for the moment whether this makes sense, it definitely makes sense with loop transformations other than those for which LLVM currently has metadata). Our longer-term plans are to support this in LLVM and the first step is to make sequences of transformations expressible in IR. Of course our goal is also to make transformations more applicable/robust. I see these as two orthogonal problems.

I don't think we need a fallback loop transformation. If a transformation cannot be applied, the user's reaction is probably not "let's do a different transformation then" (which will in most cases also fail for the same reasons the primary transformation failed), but "unfortunately the compiler cannot do the transformation I need to get the best performance, so I will implement it by hand then." Even if the first option is what the programmer wants, they will implement it using preprocessor switches, as is common today with different compilers that support/do not support specific pragmas.

I have already had some discussions on the reliability of transformations in the compiler. Some groups do not want to rely at all on the compiler being able to do a specific optimization and use libraries instead. Those libraries will 'miscompile' the input if certain preconditions are not met, which is the desired outcome where slow execution just is no option. With incorrect results it at least becomes obvious that there is a problem. However, this is not an option for a compiler where correctness is the most important aspect (and only relaxed by attributes such as assume_safety). For some use cases, such as autotuning, being able to rely on the compiler producing correct output makes it possible in the first place.
We could argue about whether we want high-level transformations in a low-level compiler in the first place. I think this has been answered a long time ago: LoopVectorizer, LoopUnroll, LoopInterchange, LoopDistribution, LoopUnswitch, etc. Since we have this kind of transformation, why not make them as good as possible?

In short, doing this even for one transformation (in my case SIMD) is difficult enough.

I'd be happy to extract attributes for specific transformations into separate reviews. However, to become useful, I think we need the entire set.

If we are trying to expand to multiple transformations, we should try doing so in baby steps.

I understood that you are working on making the loop vectorizer predictable, i.e. we are working towards the same goal.

The idea behind this is so powerful such that even if we start from "best effort basis", programmers will quickly jump on and say make this more robust/predictable. We'd rather spend time to design this as a robust/predictable feature from the beginning than having to work on it under the customer pressure.

I think the current one-pass-per-transformation is indeed very fragile and I am working on something that should be more robust.

lib/Transforms/Scalar/LoopUnrollPass.cpp
755	If `llvm.loop.unroll.enable` is not set, the interpretation here is that the transformation is 'non-forced', that is, `llvm.loop.unroll.count` is a hint to the compiler that if it unrolls, then it should unroll by that amount. `llvm.loop.disable_nonforced` overrides the decision whether to unroll, i.e. the unroll factor does not matter. I am aware that the concept of 'forced' transformations is not consistent between passes, but I am trying to give it some consistency. Passes could query shared code such as `hasUnrollTransformation` in `LoopUtils`. `hasUnrollTransformation` currently follows your interpretation of `llvm.loop.unroll.count` / `llvm.loop.disable_nonforced`. I am happy to implement either definition, as long as we find a consistent rule.
test/Transforms/LoopUnroll/unroll-pragmas_transform.ll
2	This is a copy of `unroll-pragmas.ll` with any ambiguous metadata replaced by follow-up attributes. One hope is to generally make 'multiple transformation attributes on the same loop' illegal and have it rejected by the IR verifier (since the result depends on an implementation detail: the order in the pass manager). In this case this file would replace `unroll-pragmas.ll`. But my expectation is that we cannot break backwards-compatibility this way.
6	Yes: When no follow-up attributes are specified, the default ones are added (here: `llvm.loop.unroll.disable` to disable further unrolling). In case there are follow-up attribute lists, there is no default and the transformation-disabling must be added explicitly (MDNode `!18`) and of course added after unrolling and recognized by the second LoopUnroll.
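
The default-versus-explicit rule described in the last inline comment can be sketched as a toy Python function. This is a simplified model, not the pass's code: `attributes_after_unroll` and its `followup` parameter are hypothetical, and `None` stands for "no follow-up attribute list was specified".

```python
def attributes_after_unroll(followup=None):
    """Attributes attached to the unrolled loop in this toy model."""
    if followup is None:
        # No follow-up list: the default is added to prevent further unrolling.
        return ["llvm.loop.unroll.disable"]
    # A follow-up list replaces the default; disabling must then be listed explicitly.
    return list(followup)


print(attributes_after_unroll())
print(attributes_after_unroll(["llvm.loop.vectorize.enable"]))
```

In the second call the result does not contain `llvm.loop.unroll.disable`, so in this model a second unroll pass would not be blocked unless the follow-up list says so.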

I agree that this is a very powerful idea (something I wish I'd had back when I was writing pseudo-hpc applications). I think it's well worth having, but equally worth making sure we get it right. The clang side is very important for that.

In D49281#1188441, @Meinersbur wrote:

In D49281#1187583, @dmgreen wrote:

Some extra tests for nonforced + a pragma would be good to see.

Any transformations in particular?

There seem to be tests that nonforced disables things, but not that nonforced + an attribute keeps it enabled. e.g. the unroll.count metadata.

lib/Transforms/Scalar/LoopUnrollPass.cpp
755	I would expect that if a loop has any metadata for a pass, that would mean disable_nonforced doesn't apply. As if the user has specified some metadata, it likely wants something to happen. I think in this specific case llvm.loop.unroll.count implies llvm.loop.unroll.enable, and we wouldn't put both on a loop for "#pragma unroll(4)" or "#pragma clang loop unroll_count(4)"
test/Transforms/LoopUnroll/unroll-pragmas_transform.ll
2	Ah, I missed the "followup" here. Is it worth replicating this entire file, or should it just be an extra test in the old file. The "followup" on unroll_1 seems to be the only test changed here? To add unroll.disable as a followup attribute? I'm not sure I see why. Would we expect "#pragma unroll(1)" to not work as it did before? (disable unroll)
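
The two readings of `llvm.loop.unroll.count` discussed for LoopUnrollPass.cpp line 755 can be contrasted in a short Python sketch. It is a deliberately reduced model: attributes are a set of strings, `llvm.loop.unroll.disable` and all cost heuristics are ignored, and both function names are hypothetical.

```python
def unroll_allowed_count_is_hint(attrs):
    """Reading 1: only unroll.enable forces unrolling; unroll.count is a hint."""
    if "llvm.loop.disable_nonforced" in attrs:
        return "llvm.loop.unroll.enable" in attrs
    return True  # otherwise the pass heuristics may still decide to unroll


def unroll_allowed_count_implies_enable(attrs):
    """Reading 2: any unroll request, including unroll.count, counts as forced."""
    if "llvm.loop.disable_nonforced" in attrs:
        return ("llvm.loop.unroll.enable" in attrs
                or "llvm.loop.unroll.count" in attrs)
    return True


attrs = {"llvm.loop.unroll.count", "llvm.loop.disable_nonforced"}
print(unroll_allowed_count_is_hint(attrs))
print(unroll_allowed_count_implies_enable(attrs))
```

The two functions differ only for a loop that carries a count together with `llvm.loop.disable_nonforced` but no explicit enable, which is exactly the case raised in the inline comments.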

In D49281#1188249, @hfinkel wrote:

... As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens.

Hideki, I think that LLVM does the right thing here: We provide a separation between the hint and the mandate (i.e., using assume_safety or not).

That, I know. I wasn't questioning about that.

What I'm not seeing from this RFC/patch is that, if the programmer specifies transformation behavior A -> B -> C, what happens if transformation A does not kick-in? Should we just warn that "A did not happen" and stop processing the request B and C?
Also, if the programmer requests that the loop to be distribute in three ways and specify different transformations for each, what should the latter transformation do if the loop is distributed in two ways or four ways? If we are serious about introducing this kind of features, we should clearly define what should happen when the programmer intention cannot be satisfied well enough ---- when we should continue honoring and when we should stop honoring. If we say "we should stop in all those circumstances", that should simplify the problem a lot. If we say "we should allow to continue on subset of those cases", we should clearly state which subset and why. If there are any prior discussions (or descriptions within this patch) along this lines, please point me to that.

In D49281#1189643, @hsaito wrote:

In D49281#1188249, @hfinkel wrote:

... As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens.

Hideki, I think that LLVM does the right thing here: We provide a separation between the hint and the mandate (i.e., using assume_safety or not).

That, I know. I wasn't questioning about that.

What I'm not seeing from this RFC/patch is that, if the programmer specifies transformation behavior A -> B -> C, what happens if transformation A does not kick-in? Should we just warn that "A did not happen" and stop processing the request B and C?
Also, if the programmer requests that the loop to be distribute in three ways and specify different transformations for each, what should the latter transformation do if the loop is distributed in two ways or four ways? If we are serious about introducing this kind of features, we should clearly define what should happen when the programmer intention cannot be satisfied well enough ---- when we should continue honoring and when we should stop honoring. If we say "we should stop in all those circumstances", that should simplify the problem a lot. If we say "we should allow to continue on subset of those cases", we should clearly state which subset and why. If there are any prior discussions (or descriptions within this patch) along this lines, please point me to that.

I certainly agree that we should document this.

In D49281#1189643, @hsaito wrote:

What I'm not seeing from this RFC/patch is that, if the programmer specifies transformation behavior A -> B -> C, what happens if transformation A does not kick-in? Should we just warn that "A did not happen" and stop processing the request B and C?

Yes. A warning will be emitted by the -transform-warning pass (Please see Passes.rst). B and C cannot apply on a loop that does not exist.
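
A minimal Python sketch of this answer, assuming a chain in which each later transformation exists only as a followup attribute of the loop produced by the previous one. `run_chain` and `can_apply` are hypothetical names; in LLVM the warning would come from the -transform-warning pass, not from code like this.

```python
def run_chain(chain, can_apply):
    """Apply transformations in order and stop at the first one that fails."""
    applied, warnings = [], []
    for name in chain:
        if not can_apply(name):
            warnings.append(f"transformation '{name}' was not applied")
            # Later steps are attached to a loop that was never created.
            break
        applied.append(name)
    return applied, warnings


applied, warnings = run_chain(["A", "B", "C"], lambda name: name != "A")
print(applied, warnings)
```

With A failing, nothing is applied and a single warning about A is recorded; B and C are never attempted because their attributes would only appear on A's output loop.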

Also, if the programmer requests that the loop to be distribute in three ways and specify different transformations for each, what should the latter transformation do if the loop is distributed in two ways or four ways?

The current LoopDistribution pass unfortunately does not support this, but a goal is to make the user able to define what code should become their own loop. See "A Proposal for Loop-Transformation Pragmas" (https://arxiv.org/abs/1805.03374).
For the current LoopDistribution pass, only two categories of followup-attributes can be defined: noncyclic and cyclic. The noncyclic category can be added to multiple loops.

If we are serious about introducing this kind of features, we should clearly define what should happen when the programmer intention cannot be satisfied well enough ---- when we should continue honoring and when we should stop honoring. If we say "we should stop in all those circumstances", that should simplify the problem a lot. If we say "we should allow to continue on subset of those cases", we should clearly state which subset and why. If there are any prior discussions (or descriptions within this patch) along this lines, please point me to that.

Documented in TransformMetadata.rst line 57ff.

docs/TransformMetadata.rst
164	Yes, maybe, but they are also already documented in the `LangRef.rst`. Please understand that the goal in this patch is to define a transformation model for each pass such that it is clear what are those followup-loops, not to write an exhaustive documentation.

In D49281#1189737, @Meinersbur wrote:

In D49281#1189643, @hsaito wrote:

What I'm not seeing from this RFC/patch is that, if the programmer specifies transformation behavior A -> B -> C, what happens if transformation A does not kick-in? Should we just warn that "A did not happen" and stop processing the request B and C?

Yes. A warning will be emitted by the -transform-warning pass (Please see Passes.rst).

This part, I know you did.

B and C cannot apply on a loop that does not exist.

I don't think this is explicitly written. Here's an example. Suppose A is vectorization and B is unroll. If a loop is somehow not vectorized, unrolling can still happen to the non-vectorized loop. Whether we stop unrolling in this situation is what I'd like to see us being explicit about.

Also, if the programmer requests that the loop to be distribute in three ways and specify different transformations for each, what should the latter transformation do if the loop is distributed in two ways or four ways?

The current LoopDistribution pass unfortunately does not support this, but a goal is to let the user define what code should become its own loop. See "A Proposal for Loop-Transformation Pragmas" (https://arxiv.org/abs/1805.03374).
For the current LoopDistribution pass, only two categories of followup-attributes can be defined: noncyclic and cyclic. The noncyclic category can be added to multiple loops.

Whether distribution currently supports that is a different issue. I'm sure we will be expanding the features in the future. This composability discussion should encapsulate the baseline behaviors for enough possible future situations ---- else we have to keep revising baseline behaviors, which is very bad.

If we are serious about introducing this kind of features, we should clearly define what should happen when the programmer intention cannot be satisfied well enough ---- when we should continue honoring and when we should stop honoring. If we say "we should stop in all those circumstances", that should simplify the problem a lot. If we say "we should allow to continue on subset of those cases", we should clearly state which subset and why. If there are any prior discussions (or descriptions within this patch) along this lines, please point me to that.

Documented in TransformMetadata.rst line 57ff.

I only see the warning behavior there. I'd like to see us explicitly saying that any subsequent explicit transformation metadata will be ignored for the given loop ---- if that's what we'll agree on, or be explicit about something else we'll agree on in the terms that can be clearly explainable to the programmers. "Compiler will skip all remaining transformations after the first failed transform" is pretty straightforward to the programmers. If anyone is proposing other behaviors, I'd like to also see how to explain those behaviors to the programmers.

Meinersbur mentioned this in D50075: [UnJ] Improve explicit loop count checks.Aug 9 2018, 2:27 PM

Explicitly document followup of not applied transformations to be ignored
Unroll/UnrollAndJam: Interpret enable/count/full as forced
Unroll/UnrollAndJam: Add tests for disable_nonforced combined with enable/count/full
Reduce size of unroll-pragmas_transform.ll

Harbormaster completed remote builds in B21316: Diff 160051.Aug 9 2018, 8:42 PM

In D49281#1189774, @hsaito wrote:

I'd like to see us explicitly saying that any subsequent explicit transformation metadata will be ignored for the given loop ---- if that's what we'll agree on, or be explicit about something else we'll agree on in the terms that can be clearly explainable to the programmers. "Compiler will skip all remaining transformations after the first failed transform" is pretty straightforward to the programmers. If anyone is proposing other behaviors, I'd like to also see how to explain those behaviors to the programmers.

I added a paragraph to TransformMetadata.rst. (I was assuming it was obvious from the definition: A transformation in a followup-attribute only becomes assigned to a loop by the loop transformation pass. Before that, it is not associated with any loop)
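
The rule described here (a followup-attribute is only attached to a loop by the pass that actually performs the transformation) can be illustrated with a small standalone model. This is a sketch only: `ToyLoop` and `tryVectorize` are hypothetical names, attributes are modelled as plain strings, and none of this is the LLVM API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a loop's metadata: the attributes currently
// attached to the loop, plus followup lists that are NOT attached to any
// loop until a transformation produces the loop they are meant for.
struct ToyLoop {
  std::vector<std::string> Attrs;
  std::map<std::string, std::vector<std::string> > Followups;
};

bool hasAttr(const ToyLoop &L, const std::string &Name) {
  for (const std::string &A : L.Attrs)
    if (A == Name)
      return true;
  return false;
}

// Toy "vectorizer". On success, Out is the new loop and carries exactly the
// attributes listed in the followup. On failure nothing is attached anywhere,
// so transformations named in the followup are never requested on any loop.
bool tryVectorize(const ToyLoop &In, bool Legal, ToyLoop &Out) {
  if (!hasAttr(In, "llvm.loop.vectorize.enable") || !Legal)
    return false;
  Out = ToyLoop();
  auto It = In.Followups.find("llvm.loop.vectorize.followup_vectorized");
  if (It != In.Followups.end())
    Out.Attrs = It->second;
  return true;
}
```

In this model, a failed vectorization leaves the input loop without any unroll request, which corresponds to "skip all remaining transformations after the first failed transform".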

In D49281#1194814, @Meinersbur wrote:

In D49281#1189774, @hsaito wrote:

I'd like to see us explicitly saying that any subsequent explicit transformation metadata will be ignored for the given loop ---- if that's what we'll agree on, or be explicit about something else we'll agree on in the terms that can be clearly explainable to the programmers. "Compiler will skip all remaining transformations after the first failed transform" is pretty straightforward to the programmers. If anyone is proposing other behaviors, I'd like to also see how to explain those behaviors to the programmers.

I added a paragraph to TransformMetadata.rst. (I was assuming it was obvious from the definition: A transformation in a followup-attribute only becomes assigned to a loop by the loop transformation pass. Before that, it is not associated with any loop)

The added paragraph looks good to me on the implementation side specification. Looking forward to see the programmers (i.e., compiler users, not compiler writers) side pragma description, but that will not gate my review of this patch. There is a difference between specification forcing one behavior versus implementation choice ends up in the same behavior. I wanted the former, not the latter. With this specification, we can have another implementation choice ---- attaching all those metadata to the loop, to be updated by the successful transformation, and let failed transform drop subsequent ones. I'm not saying it's better to go that way. What I'm saying is that if, for some reason, we later choose to implement this differently, there is a specification to guide how to implement the feature correctly. Hope I don't sound too picky. I just want to provide consistent experiences to the programmers.

In D49281#1196034, @hsaito wrote:

The added paragraph looks good to me on the implementation side specification. Looking forward to see the programmers (i.e., compiler users, not compiler writers) side pragma description, but that will not gate my review of this patch. There is a difference between specification forcing one behavior versus implementation choice ends up in the same behavior. I wanted the former, not the latter.

Different behavior of different implementations is also a serious concern for me. I have three different implementations in mind (the current loop transformations, an extension to Polly to use this metadata, and an idealized loop-transformation pass; the latter two being more powerful is one of the motivations for this patch). Given the prototypical transformations in TransformMetadata.rst, I think the model is applicable to other implementations as well.

With this specification, we can have another implementation choice ---- attaching all those metadata to the loop, to be updated by the successful transformation, and let failed transform drop subsequent ones. I'm not saying it's better to go that way. What I'm saying is that if, for some reason, we later choose to implement this differently, there is a specification to guide how to implement the feature correctly.

This would unfortunately break existing behavior. E.g. llvm.loop.distribute.enable and llvm.loop.vectorize.enable can both be specified in the same loop attributes. Currently, if LoopDistribution fails, the attribute llvm.loop.vectorize.enable will be left untouched. If we change LoopDistribution to remove it, the loop would not be vectorized anymore (assuming the heuristic does not deem it profitable).
It also does not do what motivates this patch: neither can the order of transformations be specified, nor can the same transformation be applied multiple times.

I was testing the code and ran into some problems with debug metadata on the loop nodes (actually using -Rpass=unroll in that case). Can you make sure that works as expected?

docs/TransformMetadata.rst
111	Nit: never be added
include/llvm/Transforms/Utils/LoopUtils.h
177	Nit: transformations->transformation
185	Nit: inherit
197	Nit: choose
227	Nit: warning
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
205	This code will need rebasing. There is a check earlier that looks for disable metadata that could be replaced by this. Look for HasUnrollDisablePragma/HasUnrollAndJamDisablePragma. If the same was done for unrolling, I think that would remove the need for the IgnoreUser (although your comment about it is probably still true).
lib/Transforms/Utils/LoopUtils.cpp
285	Maybe InheritSomeAttrs -> InheritNonExceptAttrs?

hiraditya added inline comments.Aug 14 2018, 5:13 AM

lib/Transforms/Scalar/LoopUnrollPass.cpp
1085	nit: maybe put the string literals as a separate declaration?
lib/Transforms/Utils/LoopUnrollRuntime.cpp
542	What is the rationale of using pointer to a pointer here? If we want to assign to ResultLoop, then maybe we can just return ResultLoop and bool as a pair.
925	nit: space

Meinersbur mentioned this in D50698: [UnJ] Ensure unroll_and_jam metadata is removed once consumed..Aug 14 2018, 3:04 PM

I am thinking about adding a LoopMetadataTracker (sort of a combination of LoopVectorizeHints and AssumptionTracker) analysis pass which would centralize the interpretation of that metadata and avoid the linear search through the metadata list when looking up a specific attribute.
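
A minimal sketch of the idea behind such a tracker, assuming attributes are simple (name, value) pairs: the list is scanned once to build an index, and later lookups no longer walk the list. `ToyAttr`, `findLinear` and `ToyAttrIndex` are hypothetical names for illustration, not part of the patch.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> ToyAttr; // (attribute name, value)

// Baseline: linear search through the attribute list on every query.
// Returns a pointer to the value, or nullptr if the name is absent.
const int *findLinear(const std::vector<ToyAttr> &List,
                      const std::string &Name) {
  for (const ToyAttr &A : List)
    if (A.first == Name)
      return &A.second;
  return nullptr;
}

// Index built once per loop; lookups are then hash-table queries.
class ToyAttrIndex {
  std::unordered_map<std::string, int> Map;

public:
  explicit ToyAttrIndex(const std::vector<ToyAttr> &List) {
    // emplace does not overwrite, so the first occurrence wins,
    // matching what findLinear would return.
    for (const ToyAttr &A : List)
      Map.emplace(A.first, A.second);
  }

  const int *lookup(const std::string &Name) const {
    auto It = Map.find(Name);
    return It == Map.end() ? nullptr : &It->second;
  }
};
```

The trade-off in a real pass would be keeping the index in sync whenever a transformation rewrites the loop's metadata, which this sketch does not address.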

lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
205	I'd push towards refactoring out the common parts of `computeUnrollCount` used by LoopUnroll and LoopUnrollAndJam. Currently `computeUnrollCount` uses lots of settings meant for LoopUnroll (`llvm.loop.unroll.` metadata which should not exist anymore, OptimizationRemarkMissed specific to LoopUnroll, `-unroll-count`, `PartialThreshold`, handling of full unroll, loop peeling that UnrollAndJam does not support, being used in a single call by UnrollAndJam for two different things: determining `ExplicitUnroll` (i.e. whether normal unroll is forced) and the unroll-and-jam count). It's hard to understand the subtleties between those codes. I gave up at some point and added the `IgnoreUser` flag to make test cases pass.
lib/Transforms/Utils/LoopUnrollRuntime.cpp
542	If the Result loop is not needed, one can pass `nullptr` (which is the default argument). Returning `std::pair` will require more changes.
925	This is done by `clang-format`. I try not to fight its decisions and hope for future improvement.
lib/Transforms/Utils/LoopUtils.cpp
285	Avoiding double negation here.
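
The two interface styles discussed for line 542 (an optional out-parameter that defaults to `nullptr`, versus returning a pair) can be compared side by side. This is a sketch with a made-up `ToyLoop` type and toy functions; it does not reproduce the real signature in LoopUnrollRuntime.cpp.

```cpp
#include <cassert>
#include <utility>

// Hypothetical loop type for illustration only.
struct ToyLoop {
  unsigned Factor;
  ToyLoop() : Factor(1) {}
};

// Style 1: optional out-parameter. Callers that do not need the resulting
// loop simply omit the last argument.
bool toyUnroll(ToyLoop &L, unsigned Count, ToyLoop **ResultLoop = nullptr) {
  bool Applied = Count >= 2;
  if (Applied)
    L.Factor = Count;
  if (ResultLoop)
    *ResultLoop = Applied ? &L : nullptr;
  return Applied;
}

// Style 2: return both values. Every caller must unpack the pair, even those
// that only care about the boolean.
std::pair<bool, ToyLoop *> toyUnrollPair(ToyLoop &L, unsigned Count) {
  ToyLoop *Result = nullptr;
  bool Applied = toyUnroll(L, Count, &Result);
  return std::make_pair(Applied, Result);
}
```

With the first style, existing call sites stay unchanged when the extra result is introduced, which matches the concern that returning a pair "will require more changes".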

Meinersbur added inline comments.Aug 14 2018, 3:46 PM

lib/Transforms/Utils/LoopUnrollRuntime.cpp
925	I will try to remove the space in patch updates, but it may sneak in again when I re-run clang-format and forget about it before submission.

hsaito added inline comments.Aug 15 2018, 5:33 PM

docs/LangRef.rst
5185	I understand that the RST file update should talk about what happens today, but for the sake of code review, it's good to discuss what could happen in the reasonably foreseeable future so that we don't under-design things. I think we should be thinking ahead about the vectorizer peeling the loop, for example, for alignment optimization. Such a peeled loop could be fully unrolled if the trip count is known, or vectorized with a mask. The main vector loop could be fully unrolled. There may be more than one remainder loop, e.g., a vectorized remainder followed by a scalar remainder. A remainder loop may be fully unrolled. All those situations could happen without the programmer knowing it will happen that way. Some of the questions we want to think about before the real need arises: Will the loop attribute get dropped if the "loop" is fully unrolled? How do we designate more than one remainder loop? Will the loop attribute be applicable to a vectorized peel/remainder? Should we have a way to designate a runtime-DD non-vectorizable loop separately from the remainder?
5282	Remainder here may be unrolled again or fully unrolled (see the comments on vectorize metadata). What do we do for that?
5342	Is there an assumption of unroll_and_jam operating only on a double loop and/or a perfect loop? Technically speaking, we can unroll_and_jam a loop if we can legally outerloop-vectorize. So, there can be multiple inner loops.
5411	Looks rather centric to distribute-for-vectorization. Loop distribution can happen for many reasons (and it may be for more than one reason). Are we going to define followup_ Metadata for each of those reasons? What'll happen if a loop matches the characteristics of more than one Metadata?

Report unroll-and-jam as not applied even if unroll is present as well.
- rename followup_cyclic/followup_noncyclic to followup_sequential/followup_coincident
- Move hasUnrollAndJamTransformation in LoopUnrollAndJamPass to different place
- Remove some unrelated whitespace changes made by clang-format
- Extract followup attribute names into constant

docs/LangRef.rst
5185	I think a prologue/peel is analogous to epilogue/remainder. That is, a new `llvm.loop.vectorize.followup_peel` can be added. Should be handled as two separate transformations (such as vectorize/interleave). That is, `llvm.loop.disable_nonforced` would ensure that a loop does not unexpectedly disappear. `llvm.loop.followup_remainder` should apply on any of the remainder loops. If a finer distinction is required, we can add more specific attributes. This can already happen, at least with LoopUnroll/LoopUnrollAndJam. The docs mention that in this case the `followup_remainder` is dropped. However, changing the model of what a transformation transforms a loop to can indeed raise some backward-compatibility issues. This also applies to the user interface. If a programmer added `#pragma clang loop vectorize(enable)`, do they expect it to be unrolled as well? Loop peeling? D50480 is interesting here: At `-Os`, it uses masking instead of an epilogue to avoid a code copy. In this case `followup_remainder` explicitly states that there's not necessarily a remainder loop, so I don't see a problem here. But a programmer might expect more control over what the output structure is. We can add more attributes to control this behaviour, such as `llvm.loop.vectorize.peel.enable`, `llvm.loop.vectorize.remainder.enable`, `llvm.loop.vectorize.allow_versioning`. The interesting question is, what is the default setting? If we go by the current behavior to maximize backwards-compatibility, remainder and versioning would be enabled by default (if not in `-Os`), peeling disabled because it is not yet implemented. On the other side we probably do not want frontends to have to emit the most recent enable-metadata to get the best vectorization. So we would enable all features by default, but the output loops might be different from what the programmer intended before the feature is introduced.
We can enable all features unless the transformation is forced, in which case all deviations from the current transformation model need to be explicitly enabled. IMHO, we can decide this case-by-case, weighing compatibility concerns and optimization levels. Then again, such transformations do not influence the correctness of the output. To be less concerned about compatibility issues, I could for now remove all followup-attributes except those that are 'central' to the transformation, and `followup_all`. For vectorization, there will always be the performance-critical vectorized loop (i.e. `followup_vectorized`), independent of whether there is a prologue, epilogue or fallback. For partial unrolling, it is always an unrolled loop. Will the loop attribute get dropped if the "loop" is fully unrolled? Yes. But it should not happen if `llvm.loop.disable_nonforced` is used and the unroll is not explicitly specified. How do we designate more than one remainder loop? Using different attributes. Like `followup_all`, it is possible to address a group of loops. Will the loop attribute be applicable to a vectorized peel/remainder? Only for the followup that addresses them. Should we have a way to designate a runtime-DD non-vectorizable loop separately from the remainder? As mentioned sometime before, the typical reaction to 'loop not vectorized' is not 'ok, let's unroll it instead', but 'how can I make it vectorize'. So I don't think fallbacks are necessary (unless we can apply a sequence of transformations to multiple loops), but I am open if you think there is a need for such.
5282	`followup_remainder` is ignored. If this is not clear from the section in `TransformMetadata.rst`, please tell me.

Meinersbur added inline comments.Aug 17 2018, 4:23 PM

docs/LangRef.rst
5296–5297	@dmgreen This directly contradicts the `nounroll_plus_unroll_and_jam` test case in `Transforms/LoopUnrollAndJam/pragma.ll`
5342	There's still an outermost (unrolled) loop and an innermost (jammed) loop. We could also adds followups for middle loops. If it is the naming that concerns you: Would you prefer `followup_unrolled` and `followup_jammed`?
5411	There is no overlap between `cyclic` and `noncyclic`. For the extended loop-transformations, the user would name the loops they want distributed. Indeed, these followups are specific to the current distribution pass. However, I think it is easy for any distribution to determine whether a loop has cyclic dependences and add those attributes to any output loop that matches. `makeFollowupLoopID` can already combine attributes from multiple followups.
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
205	`IgnoreUser` exists for the `nounroll_plus_unroll_and_jam` in `LoopUnrollAndJam/pragma.ll`. `llvm.loop.unroll.disable` causes `hasUnrollTransformation` in `computeUnrollCount` to return `TM_Disable`. Unrolling inside `computeUnrollCount` is disabled by setting the unroll factor to 0. UnrollAndJam then tries to use that unroll factor.

dmgreen added inline comments.Aug 19 2018, 2:35 AM

docs/LangRef.rst
5296–5297	The way this should be working at the moment is: (1) if there is any unroll_and_jam metadata, do that thing (the user explicitly asked for a thing -> do it); (2) if there is any unroll metadata, disable unrollandjam (leave the loop to the unroller); (3) otherwise, use the normal heuristics. I think with "but no `llvm.loop.unroll_and_jam` metadata", that is what this is saying. Correct me if I'm wrong and it's not working like this. Or feel free to update it if it's unclear. Or if you think this should work another way...? ;) I originally envisioned unrollandjam as an extension to the unroll pass, so I sometimes see the two things as interrelated. If a user specifies loop.unroll.disable, they almost certainly wanted to disable all unrolling, not just that in the unroll pass.
5342	I believe he meant this being unrolled and jammed: `for i { for j A(i,j); for k B(i,k) }`. This is not something we currently support as I didn't think it would ever be likely to be profitable. Users specifying metadata might change that. The pass could be able to be expanded to work on this (I think), but it's not something that it currently does.
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
205	Have you considered moving the "disable" check out of computeUnrollCount and into tryUnrollLoop, where the existing "HasUnrollDisablePragma" check is? Hopefully that could be replaced with the new method, much like HasUnrollAndJamDisablePragma has been, and would mean this computeUnrollCount function would just work as it used to.
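
The precedence described above for lines 5296–5297 (unroll_and_jam metadata first, then unroll metadata, then heuristics) can be written down as a small decision function. This is a sketch of the described intent only: the enum, the function names and the string-prefix matching are illustrative assumptions, not the logic in LoopUnrollAndJamPass.cpp.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class ToyDecision {
  FollowUnrollAndJamMetadata, // the user explicitly asked for something
  LeaveToUnroller,            // unroll metadata present: skip unroll-and-jam
  UseHeuristics               // no relevant metadata at all
};

static bool hasPrefix(const std::string &S, const std::string &Prefix) {
  return S.compare(0, Prefix.size(), Prefix) == 0;
}

ToyDecision decideUnrollAndJam(const std::vector<std::string> &Attrs) {
  bool HasUnrollAndJam = false;
  bool HasUnroll = false;
  for (const std::string &A : Attrs) {
    // The two prefixes cannot be confused: after "llvm.loop.unroll" one
    // continues with '_' and the other with '.'.
    if (hasPrefix(A, "llvm.loop.unroll_and_jam."))
      HasUnrollAndJam = true;
    else if (hasPrefix(A, "llvm.loop.unroll."))
      HasUnroll = true;
  }
  if (HasUnrollAndJam)
    return ToyDecision::FollowUnrollAndJamMetadata;
  if (HasUnroll)
    return ToyDecision::LeaveToUnroller;
  return ToyDecision::UseHeuristics;
}
```

Under this reading, a loop carrying both `llvm.loop.unroll.disable` and unroll_and_jam metadata (the `nounroll_plus_unroll_and_jam` situation) still follows the unroll_and_jam request.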

Meinersbur added inline comments.Aug 20 2018, 10:16 AM

docs/LangRef.rst
5296–5297	"If there is any unroll_and_jam metadata do that thing (the user explicitly asked for a thing -> do it)": There is a comment in LoopUnrollAndJamPass.cpp: `// We have already checked that the loop has no unroll.* pragmas.` According to this, this is not true (since only checked afterwards) and `computeUnrollCount` will consider e.g. `llvm.loop.unroll.count` even when used for unroll-and-jam. I was concerned about the last phrase `plus llvm.loop.unroll.disable metadata will disable unroll and jam too.`, but it might be a misinterpretation in that it will disable unroll-and-jam, but only if unroll-and-jam is not explicitly enabled.
5342	The j- and k-loops are both inner loops, so `followup_inner` should apply to both of them. Distinguishing them might be possible when introducing a mechanism like the one for naming the output loops of loop distribution.
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
205	There are multiple mechanisms in `computeUnrollCount` that disable unrolling (such as `UnrollCountPragmaValue` returning zero). If I was to fix this method, I'd do it cleanly by refactoring-out the mechanism that computes the unroll factor when pragmas/options are absent (and not emit any LoopUnroll-specific diagnostics).

hfinkel added inline comments.Sep 21 2018, 8:44 AM

docs/LangRef.rst
5117	This is too strong (see comment below).
5196	preserving -> preserve
5196	e.g., because two ...
5198	vector lane -> set of vector lanes
5204	added to both the vectorized and remainder loop
docs/TransformMetadata.rst
15	By default, transformation passes use heuristics to determine whether or not to perform transformations, and when doing so, other details of how the transformations are applied (e.g., which vectorization factor to select).
18	As stated, this is untrue (for -O3). For -O3, we only require a likely speedup across many workloads (and slowdowns be unlikely). This is why, for example, under -O3, we can vectorize with runtime checks. How about this wording: Unless the optimizer is otherwise directed, transformations are applied conservatively. This conservatism generally allows the optimizer to avoid unprofitable transformations, but in practice, this results in the optimizer not applying transformations that would be highly profitable.
23	it -> they
25	for -> of
52	Unrolling, etc. - no need to capitalize.
55	We should be careful with the language here. As any of these can be dropped without changing the semantics of the code, nothing here is "mandatory". How about saying, ", or convey a specific request from the user"
59	optimization-missed warning
61	I know what you mean by "separate", but I think it's better to say: is separate. -> can be unused by the optimizer without generating a warning.
69	This wording is too strong. I think that we need to draw a distinction here between (at least) three classes of transformations: (1) canonicalizing transformations, (2) (cost-model-driven) restructuring transformations, and (3) low-level (target-specific) transformations (e.g., using ctr-register-based loops on PowerPC). I believe that this pragma should only affect those in class (2). Canonicalizing transformations are always performed (when optimizing at all), and low-level transformations are beyond the reach of this kind of metadata. I'd recommend using the wording that this metadata disables "optional, high-level, restructuring transformations."
71	avoids that the loop is altered -> avoids the loop being altered
88	Why is this a useful feature? Should we allow only one transformation per node?
91	loop being vectorized -> loop to be vectorized
104	This leaves open the question of whether the vectorizer adds the 'isvectorized' attribute when a follow-up is specific. It should, right?
121	for -> to
122	comma after following
153	Why would isvectorized not always be provided?
301	where 'rtc' is the generated runtime safety check.
391	must be -> should be
392	responsible for this reporting
393	they might -> there might
393	being able -> able
395	is -> may be (keep the entire list in the hypothetical)
411	in a fixedpoint loop -> using a dynamic ordering (not to be too prescriptive)
include/llvm/Transforms/Utils/LoopUtils.h
224	We can't have metadata necessary for correctness. 'ForcedByUser' is fine to indicate that the user should receive a warning if the transformation cannot be performed.
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
364	So each inner loop gets the same id? That doesn't sound right.

Meinersbur marked 26 inline comments as done.Sep 28 2018, 4:21 AM

Meinersbur added inline comments.

docs/TransformMetadata.rst
88	It's not a new feature, but effectively the current behavior. ("is possible for compatibility reasons"). I'd indeed prefer to crash if multiple transformations are applied to avoid undefined behavior, which unfortunately is a breaking change.
104	I think it should not, but the metadata gives complete control over which attributes will be in the new loop to the metadata. If a frontend wants to apply a transformation twice, it should be able to do so. I think the paragraph says that no such additional implicit attributes are added when a followup is specified: If no followup is specified, ... I added "If, and only if," to make it even clearer.
153	I don't know how LoopVectorize behaves when encountering vector instructions, but I want to avoid the vagueness of some passes adding implicit metadata in some situation. The IR should have the control over whether a transformation is applied multiple times.
include/llvm/Transforms/Utils/LoopUtils.h
224	[comment] This indicates that using metadata for user-directed loop-transformation #pragmas is a leaky abstraction.
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
364	`LoopID` is a misnomer. A LoopID is neither unique (multiple loops having the same LoopID, e.g. because LoopVersioning or any pass not aware of loops copied the BBs of a loop; there is even a regression test for the behaviour with non-unique LoopID) nor identifying (adding/removing attributes in the LoopID MDNode will create a new MDNode; see D52116 for a fix for llvm.loop.parallel_accesses assuming this property). Fixing LoopID to be identifier-like is not possible with the current MDNode structure and would require making any pass that copies code aware of LoopIDs. It would be easier to not assume that LoopID has any identifying properties. I am open to renaming 'LoopID' to something else.

Rebase
Address @hfinkel's remarks

Harbormaster completed remote builds in B23237: Diff 167450.Sep 28 2018, 4:24 AM

ping

Herald added a subscriber: arphaman.Nov 19 2018, 1:02 PM

dmgreen added inline comments.Nov 25 2018, 12:09 PM

docs/TransformMetadata.rst
18	(Space in howthe)
lib/Transforms/Scalar/LoopUnrollPass.cpp
753	This shouldn't be needed here. Before this patch, there was a single place that checked if the loop had unroll disable pragma (HasUnrollDisablePragma at the start of tryToUnrollLoop). It seems best to keep that as-is in this patch (it's already long enough!) and remove HasUnrollDisablePragma, replacing it with the new hasUnrollTransformation & TM_Disable check. Then we won't need this IgnoreUser.
lib/Transforms/Utils/LoopUtils.cpp
297	Would this fall over if the metadata was not a string? Such as debug metadata.

Meinersbur marked 3 inline comments as done.Nov 29 2018, 11:51 PM

Meinersbur added inline comments.

lib/Transforms/Scalar/LoopUnrollPass.cpp
753	This is here because of the unfortunate interaction between LoopUnroll and LoopUnrollAndJam. `computeUnrollAndJamCount` uses the result of this function to itself determine whether it should unroll-and-jam. `HasUnrollDisablePragma` checks for the `llvm.loop.unroll.disable` property. `hasUnrollTransformation` returns whether LoopUnroll should do something, which is not interchangeable. For some reason, `llvm.loop.unroll.enable` is handled here, but `llvm.loop.unroll.count` and `llvm.loop.unroll.full` are handled here and therefore have an influence on LoopUnrollAndJam. I would be glad if you, the author of LoopUnrollAndJam, could untangle this.
lib/Transforms/Utils/LoopUtils.cpp
297	This was previously checked to be in a LoopID, therefore cannot be debug metadata. This assumes that the metadata is not malformed. However, this is nowhere handled gracefully in LLVM. For instance, `UnrollAndJamCountPragmaValue` will trigger an assertion if the MDNode has not exactly 2 items, or the second item is something else than a positive integer. In the case here, an assertion in `cast<T>` will trigger. I added extra checks at this location, but there are many others.

Address dmgreen's comments
Rebase

Harbormaster completed remote builds in B25530: Diff 176044.Nov 29 2018, 11:52 PM

dmgreen added inline comments.Dec 2 2018, 12:08 PM

lib/Transforms/Scalar/LoopUnrollPass.cpp
753	Sometimes it's easier to show with code :-) so this is what I was thinking of: https://reviews.llvm.org/P8121 Unless you think that will not work for some reason? It passes all the tests you have here, and removes HasUnrollDisablePragma and the IgnoreUser, so seems cleaner. It also has the advantage of keeping unrelated changes to a minimum and not introducing a second place for llvm.loop.unroll.disable to be checked.

dmgreen added inline comments.Dec 2 2018, 12:11 PM

lib/Transforms/Utils/LoopUtils.cpp
297	Yeah, malformed input would be fine to not handle, as far as I understand (or perhaps is just QOI). But I was testing something like this (hope I still have it correct):
    void c(int n, int *w, int *x, int *y, int *z, int *a) {
    #pragma clang loop distribute(enable) vectorize(disable)
      for (int i = 0; i < n; i++) {
        x[i] = y[i] + z[i] * w[i];
        a[i+1] = (a[i-1] + a[i] + a[i+1]) / 3.0;
        y[i] = z[i] - x[i];
      }
    }
Ran with "clang -O3 distribute.c -S -g" would crash with the previous patch. Now I think it doesn't drop the distribute metadata? I believe the llvm.loop metadata will look something like !58 in:
    !58 = distinct !{!58, !30, !59, !60, !61, !62}
    !59 = !DILocation(line: 8, column: 5, scope: !20)
    !60 = !{!"llvm.loop.vectorize.width", i32 1}
    !61 = !{!"llvm.loop.unroll.disable"}
    !62 = !{!"llvm.loop.distribute.enable", i1 true}
!30 is a DILocation too, which I think are the parts causing the problems.

Meinersbur marked 2 inline comments as done.Dec 3 2018, 1:49 PM

Meinersbur added inline comments.

lib/Transforms/Scalar/LoopUnrollPass.cpp
753	Thank you for the patch. I am not 100% sure whether this does not change LoopUnroll's behavior. That is, with `!{!"llvm.loop.unroll.count", i32 1}` it currently executes
    UP.Count = PragmaCount;
    UP.Runtime = true;
    UP.AllowExpensiveTripCount = true;
    UP.Force = true;
    if ((UP.AllowRemainder || (TripMultiple % PragmaCount == 0)) &&
        getUnrolledLoopSize(LoopSize, UP) < PragmaUnrollThreshold)
      return true;
whereas with your patch it bails out early (it might still do peeling even if UP.Count is 1). Also, the `-unroll-count` command-line option would be evaluated first before your patch. unroll-pragmas_contradiction.ll fails with your patch. However, I like that it indeed makes the unroll decision simpler and goes in the direction of separating LoopUnroll and LoopUnrollAndJam's decision logic.
lib/Transforms/Utils/LoopUtils.cpp
297	I may not have considered that CGLoopInfo.cpp also adds debug locations to LoopIDs. Should be fixed with the previous update. Thanks for noticing. I also made a mistake in that update which dropped all non-distribute metadata instead of the distribute metadata. It made one regression test fail.
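
The problem discussed for line 297 (a loop's metadata list mixes debug locations with named attributes, and a pass must drop only the attributes it consumed) can be modelled in a few lines. This is a sketch under simplifying assumptions: `ToyEntry` and `dropAttrsWithPrefix` are hypothetical, and real metadata nodes carry operands rather than a single name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one entry in a loop's metadata list: either a debug
// location (no attribute name at all) or a named attribute.
struct ToyEntry {
  enum Kind { DebugLoc, Attribute } K;
  std::string Name; // only meaningful when K == Attribute
};

ToyEntry makeDebugLoc() {
  ToyEntry E;
  E.K = ToyEntry::DebugLoc;
  return E;
}

ToyEntry makeAttr(const std::string &Name) {
  ToyEntry E;
  E.K = ToyEntry::Attribute;
  E.Name = Name;
  return E;
}

// Remove only the attributes whose name starts with Prefix. The kind is
// checked before the name is inspected, so debug locations are never treated
// as strings, and everything that does not match is kept.
std::vector<ToyEntry> dropAttrsWithPrefix(const std::vector<ToyEntry> &LoopID,
                                          const std::string &Prefix) {
  std::vector<ToyEntry> Kept;
  for (const ToyEntry &E : LoopID) {
    bool IsTarget = E.K == ToyEntry::Attribute &&
                    E.Name.compare(0, Prefix.size(), Prefix) == 0;
    if (!IsTarget)
      Kept.push_back(E);
  }
  return Kept;
}
```

Inverting the `IsTarget` test would reproduce the regression mentioned above, where all non-distribute metadata was dropped instead of the distribute metadata.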

Rebase
Fix drop metadata regression
Apply dmgreen's patch suggestion

Harbormaster completed remote builds in B25622: Diff 176471.Dec 3 2018, 1:51 PM

A few additional comments. Otherwise, this LGTM. When @dmgreen is happy with the unrolling changes, I think you're good to go.

docs/LangRef.rst
5124	Maybe add, "It is recommended to use this metadata when using any of the other llvm.loop.* metadata to direct specific transformations."
docs/TransformMetadata.rst
400	there -> they
lib/Transforms/Scalar/LoopDistribute.cpp
87	This should say Followup, not Followu, I suppose.
lib/Transforms/Scalar/WarnMissedTransforms.cpp
32	Here and below, explicitly specified should have a hyphen (it is a compound adjective): explicitly-specified loop unrolling. That having been said, I'd prefer a different phrasing altogether. These are end-user visible messages, and I think that we can make these slightly more user friendly. How about this: "loop not unrolled: the optimizer was unable to perform the requested transformation" (and similar for the others)

This revision is now accepted and ready to land.Dec 3 2018, 4:32 PM

Add transformation order notice to llvm.loop.disable_nonforced.
Typos
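
As a rough illustration of the recommendation for `llvm.loop.disable_nonforced` discussed above, the fragment below sketches a loop whose only permitted transformation is an explicitly requested unroll by 4. The block and value names (`%exitcond`, `%for.exit`, `%for.body`) are placeholders chosen for this example and are not taken from the patch or its tests.

```llvm
; Latch branch of the loop carrying the loop ID (names are placeholders).
br i1 %exitcond, label %for.exit, label %for.body, !llvm.loop !0

; Loop ID: the first operand refers to the node itself.
!0 = distinct !{!0, !1, !2}
; Disable heuristic-driven (non-forced) loop transformations.
!1 = !{!"llvm.loop.disable_nonforced"}
; The explicitly requested (forced) transformation: unroll by a factor of 4.
!2 = !{!"llvm.loop.unroll.count", i32 4}
```

With this combination, no profitability heuristic should reshape the loop before the requested unrolling is considered; mandatory canonicalizations such as loop rotation still apply.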

Harbormaster completed remote builds in B25647: Diff 176545. Dec 3 2018, 9:15 PM

Meinersbur added inline comments. Dec 3 2018, 9:15 PM

lib/Transforms/Scalar/WarnMissedTransforms.cpp
32	I think that "the optimizer was unable to perform" is less accurate: it gives the impression that the optimizer actually tried to perform the transformation, but one of the reasons the metadata is still present is that the corresponding pass is not in the pipeline (e.g. because `-fno-vectorize` is given or `-mllvm -enable-unroll-and-jam` is missing). That is, the user should modify the compiler flags instead of tweaking the source code. That being said, "failed to ..." is not much better. Any better suggestions?

dexonsmith removed a subscriber: dexonsmith. Dec 3 2018, 10:05 PM

hfinkel added inline comments. Dec 4 2018, 7:49 AM

lib/Transforms/Scalar/WarnMissedTransforms.cpp
32	> I think that "the optimizer was unable to perform" is less accurate: ... but one of the reasons the metadata is still present is that the corresponding pass is not in the pipeline...
I disagree that it is less accurate, and the optimizer might be unable to perform an optimization for structural reasons, and to say that something "failed" clearly implies to me that it was explicitly attempted (which in this case it was not). Nevertheless, this is a good point, and we could provide a more-useful message. How about this: "loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering"

Meinersbur mentioned this in D55288: [test] Fix tests for changed optimizer warning message. Dec 4 2018, 11:56 AM

Clear-up leftover transformation warning messages

This change requires a patch to Clang: D55288

Harbormaster completed remote builds in B25685: Diff 176683. Dec 4 2018, 11:58 AM

Meinersbur added a child revision: D55288: [test] Fix tests for changed optimizer warning message. Dec 4 2018, 11:59 AM

> When @dmgreen is happy with the unrolling changes, I think you're good to go.

Certainly. If you are happy, I am happy. Thanks.

[test] Revise tests
- Consistent disable_nonforced testing
- Unify followup-attribute testing. The previous approach was to copy existing test cases and emulate the behavior of the loop transformation passes using followup attributes. This had the disadvantage that the test would pass even if the followup attribute was ignored (indeed, some were misspelled) since the result is the same. Instead, use a new "followup.ll" test per loop pass that checks for the presence of new attributes specific to each followup.
[docs] Add followup attribute recommendations
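
To illustrate the testing approach described above, here is a minimal sketch of loop metadata that gives each followup a distinct marker attribute, so a test can tell which followup was applied to which output loop. The followup names are those documented in this revision's LangRef.rst changes; the marker strings `FollowupVectorized` and `FollowupEpilogue`, and the assumption that they are checked with FileCheck, are hypothetical and chosen only for this example.

```llvm
; Loop ID of the input loop: request vectorization and name the followups.
!0 = distinct !{!0, !1, !2, !3}
!1 = !{!"llvm.loop.vectorize.enable", i1 true}
; Attributes the vectorized loop is expected to carry afterwards.
!2 = !{!"llvm.loop.vectorize.followup_vectorized", !4}
; Attributes the epilogue (remainder) loop is expected to carry afterwards.
!3 = !{!"llvm.loop.vectorize.followup_epilogue", !5}
; Marker attributes (hypothetical names), unique per followup.
!4 = !{!"FollowupVectorized"}
!5 = !{!"FollowupEpilogue"}
```

Because the markers have no effect on any transformation, their presence in the output can only come from the followup mechanism itself, which is what makes a misspelled or ignored followup attribute detectable.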

Harbormaster completed remote builds in B25946: Diff 177820. Dec 11 2018, 9:27 PM

Closed by commit rC348944: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. (authored by Meinersbur). Dec 12 2018, 9:38 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: cfe-commits. Dec 12 2018, 9:38 AM

Revision Contents

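Path	Size
docs/LangRef.rst	116 lines
docs/Passes.rst	5 lines
docs/TransformMetadata.rst	420 lines
docs/index.rst	1 line
include/llvm/InitializePasses.h	1 line
include/llvm/LinkAllPasses.h	1 line
include/llvm/Transforms/Scalar.h	7 lines
include/llvm/Transforms/Scalar/WarnMissedTransforms.h	38 lines
include/llvm/Transforms/Utils/LoopUtils.h	71 lines
include/llvm/Transforms/Utils/UnrollLoop.h	23 lines
include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h	6 lines
lib/Analysis/LoopInfo.cpp	18 lines
lib/Passes/PassBuilder.cpp	2 lines
lib/Passes/PassRegistry.def	1 line
lib/Transforms/IPO/PassManagerBuilder.cpp	4 lines
lib/Transforms/Scalar/CMakeLists.txt	1 line
lib/Transforms/Scalar/LoopDistribute.cpp	50 lines
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp	77 lines
lib/Transforms/Scalar/LoopUnrollPass.cpp	41 lines
lib/Transforms/Scalar/LoopVersioningLICM.cpp	5 lines
lib/Transforms/Scalar/Scalar.cpp	1 line
lib/Transforms/Scalar/WarnMissedTransforms.cpp	134 lines
lib/Transforms/Utils/LoopUnroll.cpp	7 lines
lib/Transforms/Utils/LoopUnrollAndJam.cpp	15 lines
lib/Transforms/Utils/LoopUnrollRuntime.cpp	35 lines
lib/Transforms/Utils/LoopUtils.cpp	261 lines
lib/Transforms/Vectorize/LoopVectorize.cpp	73 lines
test/Other/new-pm-defaults.ll	1 line
test/Other/new-pm-thinlto-defaults.ll	1 line
test/Other/opt-O2-pipeline.ll	4 lines
test/Other/opt-O3-pipeline.ll	4 lines
test/Other/opt-Os-pipeline.ll	4 lines
test/Other/opt-hot-cold-split.ll	4 lines
test/Transforms/LoopDistribute/disable-heuristic.ll	48 lines
test/Transforms/LoopDistribute/followup.ll	62 lines
test/Transforms/LoopTransformWarning/distribution-remarks-missed.ll	99 lines
test/Transforms/LoopTransformWarning/unrollandjam-remarks-missed.ll	99 lines
test/Transforms/LoopTransformWarning/unrolling-remarks-missed.ll	99 lines
test/Transforms/LoopTransformWarning/vectorization-remarks-missed.ll	113 lines
test/Transforms/LoopUnroll/disable_nonforced.ll	26 lines
test/Transforms/LoopUnroll/disable_nonforced_count.ll	27 lines
test/Transforms/LoopUnroll/disable_nonforced_enable.ll	27 lines
test/Transforms/LoopUnroll/disable_nonforced_full.ll	29 lines
test/Transforms/LoopUnroll/runtime-loop_transform.ll	251 lines
test/Transforms/LoopUnroll/unroll-count_transform.ll	26 lines
test/Transforms/LoopUnroll/unroll-pragmas-disabled_transform.ll	150 lines
test/Transforms/LoopUnroll/unroll-pragmas_transform.ll	34 lines
test/Transforms/LoopUnrollAndJam/disable_nonforced.ll	47 lines
test/Transforms/LoopUnrollAndJam/disable_nonforced_count.ll	49 lines
test/Transforms/LoopUnrollAndJam/disable_nonforced_enable.ll	49 lines
test/Transforms/LoopUnrollAndJam/followup-metadata.ll	63 lines
test/Transforms/LoopUnrollAndJam/pragma.ll	2 lines
test/Transforms/LoopVectorize/X86/already-vectorized_transform.ll	50 lines
test/Transforms/LoopVectorize/X86/vectorization-remarks-missed.ll	10 lines
test/Transforms/LoopVectorize/X86/x86_fp80-vector-store_transform.ll	32 lines
test/Transforms/LoopVectorize/disable-heuristic.ll	26 lines
test/Transforms/LoopVectorize/duplicated-metadata_transform.ll	30 lines
test/Transforms/LoopVectorize/followups.ll	41 lines
test/Transforms/LoopVectorize/hints-trans_transform.ll	30 lines
test/Transforms/LoopVectorize/multiple-strides-vectorization_transform.ll	67 lines
test/Transforms/LoopVectorize/no_array_bounds.ll	2 lines
test/Transforms/LoopVectorize/no_switch.ll	8 lines
test/Transforms/LoopVectorize/vectorize-once_transform.ll	79 lines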

Commit	Tree	Parents	Author	Summary	Date
1d4e8a0c7f45	fd2b80e9ff37	53ee91634d7e	Michael Kruse	clang-format	Dec 3 2018, 1:28 PM
53ee91634d7e	bd3f62587386	c8747140519c d1325655d524	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Dec 3 2018, 1:11 PM
c8747140519c	2484d06712fe	7ffff98b7442	Michael Kruse	Apply dmgreen's patch suggestion	Dec 3 2018, 1:09 PM
7ffff98b7442	fa75be0e7da5	7653ff54c56e	Michael Kruse	Fix regressions	Dec 3 2018, 12:24 PM
7653ff54c56e	ae129a79af44	8d2e1c42f7c5 4a9bb285e14e	Michael Kruse	Merge remote-tracking branch 'github-mirror/master' into HEAD	Dec 3 2018, 11:29 AM
8d2e1c42f7c5	0f799c1a7c92	c072ec7e47be	Michael Kruse	Address dmgreen's comments	Nov 29 2018, 11:50 PM
c072ec7e47be	cb37f982aef3	38a6662743a7 9f3f290561f7	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Nov 29 2018, 10:05 PM
38a6662743a7	3b14e80359ff	07b822e4d482 e962d26b1fd9	Michael Kruse	Merge branch 'followup' into HEAD	Nov 29 2018, 10:03 PM
e962d26b1fd9	40453288113a	f2731b8744d1	Michael Kruse	Address @hfinkel's remarks	Sep 28 2018, 4:22 AM
f2731b8744d1	ed2a5ca8feee	82351167bd74 8a1edcb2277d	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Sep 27 2018, 6:51 AM
82351167bd74	b62124db12ea	3303750bfb00 7a6ebef3d3a5	Michael Kruse	Merge branch 'followup' into HEAD	Sep 27 2018, 6:50 AM
7a6ebef3d3a5	6ca83606e004	df32dd03822e 03e629a1a434	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Sep 13 2018, 1:38 PM
df32dd03822e	aa99ec919167	41df1c315b03 f32491f8a4ce	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Sep 9 2018, 10:03 PM
41df1c315b03	7cd51f74d4d2	3a6038932dd3 35bfd59001d6	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Sep 7 2018, 8:05 PM
3a6038932dd3	12916c14cee2	9a892a1da159 9840d7c8db6a	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Sep 6 2018, 10:45 AM
9a892a1da159	4fd07520a188	9dbe70e50c08	Michael Kruse	'static const char' to 'const char const'	Sep 4 2018, 2:51 PM
9dbe70e50c08	9acd165ae58d	05b033a7b042 87877c50435b	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Sep 4 2018, 11:56 AM
05b033a7b042	aad76ecd5aad	5b7b60f3d378 012ff47f2f57	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Sep 4 2018, 11:41 AM
5b7b60f3d378	8d92a85264f8	5acdeec87961 c38c85037ddf	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Sep 4 2018, 10:35 AM
5acdeec87961	c125258c38c0	e71ae985ab66 e2e1cabd39f8	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 30 2018, 10:00 AM
e71ae985ab66	43e30ea4115b	57494d54a17e 8ce430e18194	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 29 2018, 2:48 PM
57494d54a17e	6b8b25d8acde	4effaceb6316 aea52742efe8	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 28 2018, 5:07 PM
4effaceb6316	bcbe60032bb4	2f27d23c235f cf50706d0b78	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 28 2018, 10:04 AM
2f27d23c235f	669647209080	cd15a3569cb1 0e4afbdc918a	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 27 2018, 12:41 PM
cd15a3569cb1	54a121ae26b9	49e0f2e0f4a3	Michael Kruse	Fix header underline length	Aug 27 2018, 12:40 PM
49e0f2e0f4a3	897bb11ec921	a936e73da4b8 48d2b81f7a14	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 23 2018, 8:35 AM
a936e73da4b8	c6c31037d154	d999eac258db b4dc85b780e0	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 22 2018, 10:32 AM
d999eac258db	deff55e76361	44a5e1350f93 77028e31e5f8	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 21 2018, 8:18 AM
44a5e1350f93	6fee37b8c902	c59309520897 681bdae97747	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 20 2018, 9:40 AM
c59309520897	23e146508e23	2570e69a5a45	Michael Kruse	- Report unroll-and-jam as not applied even if unroll is present as well. (Show More…)	Aug 17 2018, 4:11 PM
2570e69a5a45	172db5bbf876	218b727473ca 7aaa11e8981e	Michael Kruse	Merge branch 'followup' into HEAD	Aug 17 2018, 12:39 PM
7aaa11e8981e	20112a65a5a8	7933842f62c4 39b5a02c6fe2	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 17 2018, 10:05 AM
7933842f62c4	b4778d9a815c	141351757a7c cd2d2cfae69a	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 16 2018, 10:25 AM
141351757a7c	3451855cb09b	a0d83d520ea4 bb0fa60d23a6	Michael Kruse	Merge branch 'followup' into HEAD	Aug 16 2018, 10:22 AM
bb0fa60d23a6	4190c618c506	23717cecfb6c	Michael Kruse	Address comments (Show More…)	Aug 9 2018, 8:36 PM
23717cecfb6c	61a6eac802dd	3f3b925751c9 9d9ecd48fb73	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 9 2018, 2:42 PM
3f3b925751c9	0d57d56ee3cc	bdba58f80a89 5e77ae5ff56a	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 9 2018, 9:29 AM
bdba58f80a89	de49fe29531c	14c7654e71fb d99d65710d0d	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 9 2018, 8:43 AM
14c7654e71fb	718094752feb	b5dbba44809b c5db0e0f7c75	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Aug 9 2018, 7:35 AM
b5dbba44809b	96ec343915c7	f5fbddee1905	Michael Kruse	Address @dmgreen's remarks	Jul 26 2018, 7:29 PM
f5fbddee1905	f9c467169a6b	c0e32a59eda3 09e5b9263888	Michael Kruse	Merge branch 'followup' into HEAD	Jul 26 2018, 7:12 PM
09e5b9263888	b9dd71c79b24	ea94f8291b80	Michael Kruse	Fix pipeline tests	Jul 12 2018, 8:30 PM
ea94f8291b80	2f080a1e7157	036241ecc168 9074a87ea5f5	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Jul 12 2018, 8:21 PM
036241ecc168	a42af78f6e73	8cf9af577cb9	Michael Kruse	Fix some tests	Jul 12 2018, 8:21 PM
8cf9af577cb9	3e4a74998663	46851c04c5c7 54919303bfce	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Jul 12 2018, 2:50 PM
46851c04c5c7	a2fdeb31b77c	c22b9a60ebca	Michael Kruse	Do not emit forced warning within LoopVectorizer itself	Jul 12 2018, 2:49 PM
c22b9a60ebca	33797e225696	223d957d61a0	Michael Kruse	clang-format	Jul 12 2018, 2:43 PM
223d957d61a0	e4caf27e20cc	cffecbadcfa7	Michael Kruse	Cleanup	Jul 12 2018, 2:39 PM
cffecbadcfa7	2785f247640d	ca21f88b086d 87100fdc04a0	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Jul 12 2018, 12:41 PM
ca21f88b086d	56da8538c6b8	ad6effe9deff	Michael Kruse	More cleanup	Jul 12 2018, 12:41 PM
ad6effe9deff	615bdd6feac5	3e92611c14f0 8840e88391b8	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Jul 12 2018, 8:23 AM
3e92611c14f0	6b817e4c311d	e8c63a64efe1 f6e61654a198	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Jul 12 2018, 7:46 AM
e8c63a64efe1	d8eef8693546	2011aaca3d97	Michael Kruse	clang-format	Jul 11 2018, 11:28 PM
2011aaca3d97	bbd71f8704d8	ce3477f8c3ca 04d4b7fd45bc	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Jul 11 2018, 11:12 PM
ce3477f8c3ca	37a335b8f204	a5f1ae6bb3f2	Michael Kruse	Fix transforms	Jul 11 2018, 11:12 PM
a5f1ae6bb3f2	3c7a56512490	fb16ba528715 60aba7d664a9	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Jul 11 2018, 6:58 PM
fb16ba528715	a8c801af7c57	38034de2fa15	Michael Kruse	Cleanup	Jul 11 2018, 6:58 PM
38034de2fa15	c352279b34ea	a5876795890e ea13527137d2	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Jul 11 2018, 1:21 PM
a5876795890e	2492c02789e9	f6b0adfa24c2	Michael Kruse	formatting	Jul 11 2018, 1:20 PM
f6b0adfa24c2	93c0aa69e3cb	8ffbb14f9137	Michael Kruse	Formatting	Jul 11 2018, 1:02 PM
8ffbb14f9137	8b030e410643	daa9a60a3c31	Michael Kruse	Complete doc	Jul 11 2018, 12:42 PM
daa9a60a3c31	19d39c2c2df7	7b021dddc173 d5cfc836bb55	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Jul 11 2018, 9:41 AM
7b021dddc173	b1e30e8da82b	a1b3f2a3ddfe	Michael Kruse	Rename legacy pass	Jul 11 2018, 9:34 AM
a1b3f2a3ddfe	dcd4ccf07149	3fb3eaa061ed	Michael Kruse	Rename pass	Jul 11 2018, 9:28 AM
3fb3eaa061ed	2c0357f64e2c	1a073de9f5a3	Michael Kruse	Rename file	Jul 11 2018, 9:24 AM
1a073de9f5a3	ad536ec0cb97	6614cff0bd44	Michael Kruse	Add llvm.loop.vectorize.followup_all	Jul 10 2018, 10:04 AM
6614cff0bd44	14ed052b851b	9bf900ef48dd	Michael Kruse	Unroll-and-jam followups	Jul 9 2018, 6:27 PM
9bf900ef48dd	680d1e639126	c432e3847350	Michael Kruse	LoopVectorize followup test	Jul 6 2018, 3:18 PM
c432e3847350	fe222a856217	a4663d61397d	Michael Kruse	Teach loop distribution followups	Jul 6 2018, 3:00 PM
a4663d61397d	cac20b5fecf1	896ac51852d4	Michael Kruse	add llvm.loop.transformations.disable_nonforced	Jul 6 2018, 1:22 PM
896ac51852d4	49850085436e	ff71c60e38f4 3f928c753ced	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Jul 6 2018, 10:28 AM
ff71c60e38f4	3ef69224d61b	99307a193823	Michael Kruse	Add unroll-and-jam warning	Jul 6 2018, 10:27 AM
99307a193823	f1fd3f07fa76	a5f5667517f9	Michael Kruse	Leftover transformation warnign pass.	Jul 6 2018, 10:08 AM
a5f5667517f9	3dafa41250e9	16090c9fc7cd 6e0b82fc61ca	Michael Kruse	Merge remote-tracking branch 'official/master' into followup	Jul 5 2018, 11:20 AM
16090c9fc7cd	2742019ee6b2	1eef495cdcc8 d4fcf5e9a041	Michael Kruse	Merge branch 'followup' into HEAD	Jul 5 2018, 11:17 AM
d4fcf5e9a041	231c4fe3ea6f	bcb65bd6d72b	U-FRANKHAGEN\meinersbur	Work in Transformation warning	Jul 3 2018, 4:56 PM
bcb65bd6d72b	7418a1e799e2	e0321f753aa1	U-FRANKHAGEN\meinersbur	Fix LoopVectorize test cases:x	Jul 3 2018, 1:48 PM
e0321f753aa1	6fe3fa7b0cb3	cbf669bd5900	U-FRANKHAGEN\meinersbur	Fix LoopUnroll tests	Jul 3 2018, 12:48 PM
cbf669bd5900	e4f5439c753a	6dfabd4b130f	U-FRANKHAGEN\meinersbur	Reset LoopID after transform	Jul 2 2018, 2:38 PM
6dfabd4b130f	deb5cc64727b	f162464ba813	U-FRANKHAGEN\meinersbur	ambiguous transforms	Jul 2 2018, 12:17 PM

Diff 176471

docs/LangRef.rst

[5,070 lines not shown]

``unpredictable`` metadata may be attached to any branch or switch		``unpredictable`` metadata may be attached to any branch or switch
instruction. It can be used to express the unpredictability of control		instruction. It can be used to express the unpredictability of control
flow. Similar to the llvm.expect intrinsic, it may be used to alter		flow. Similar to the llvm.expect intrinsic, it may be used to alter
optimizations related to compare and branch instructions. The metadata		optimizations related to compare and branch instructions. The metadata
is treated as a boolean value; if it exists, it signals that the branch		is treated as a boolean value; if it exists, it signals that the branch
or switch that it is attached to is completely unpredictable.		or switch that it is attached to is completely unpredictable.

		.. _llvm.loop:

'``llvm.loop``'		'``llvm.loop``'
^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^

It is sometimes useful to attach information to loop constructs. Currently,		It is sometimes useful to attach information to loop constructs. Currently,
loop metadata is implemented as metadata attached to the branch instruction		loop metadata is implemented as metadata attached to the branch instruction
in the loop latch block. This type of metadata refer to a metadata node that is		in the loop latch block. This type of metadata refer to a metadata node that is
guaranteed to be separate for each loop. The loop identifier metadata is		guaranteed to be separate for each loop. The loop identifier metadata is
specified with the name ``llvm.loop``.		specified with the name ``llvm.loop``.
[17 lines not shown]

.. code-block:: llvm		.. code-block:: llvm

br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0		br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
...		...
!0 = !{!0, !1}		!0 = !{!0, !1}
!1 = !{!"llvm.loop.unroll.count", i32 4}		!1 = !{!"llvm.loop.unroll.count", i32 4}

		'``llvm.loop.disable_nonforced``'
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		This metadata disables all optional loop transformations unless
		hfinkel (Done): This is too strong (see comment below).
		explicitly instructed using other transformation metadata such as
		``llvm.loop.unroll.enable``. That is, no heuristic will try to determine
		whether a transformation is profitable. The purpose is to avoid that the
		loop is transformed to a different loop before an explicitly requested
		(forced) transformation is applied. For instance, loop fusion can make
		other transformations impossible. Mandatory loop canonicalizations such
		as loop rotation are still applied.
		hfinkel (Done): Maybe add, "It is recommended to use this metadata when using any of the other llvm.loop.* metadata to direct specific transformations."
		See :ref:`transformation-metadata` for details.

'``llvm.loop.vectorize``' and '``llvm.loop.interleave``'		'``llvm.loop.vectorize``' and '``llvm.loop.interleave``'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Metadata prefixed with ``llvm.loop.vectorize`` or ``llvm.loop.interleave`` are		Metadata prefixed with ``llvm.loop.vectorize`` or ``llvm.loop.interleave`` are
used to control per-loop vectorization and interleaving parameters such as		used to control per-loop vectorization and interleaving parameters such as
vectorization width and interleave count. These metadata should be used in		vectorization width and interleave count. These metadata should be used in
conjunction with ``llvm.loop`` loop identification metadata. The		conjunction with ``llvm.loop`` loop identification metadata. The
``llvm.loop.vectorize`` and ``llvm.loop.interleave`` metadata are only		``llvm.loop.vectorize`` and ``llvm.loop.interleave`` metadata are only
[42 lines not shown]	.. code-block:: llvm

!0 = !{!"llvm.loop.vectorize.width", i32 4}		!0 = !{!"llvm.loop.vectorize.width", i32 4}

Note that setting ``llvm.loop.vectorize.width`` to 1 disables		Note that setting ``llvm.loop.vectorize.width`` to 1 disables
vectorization of the loop. If ``llvm.loop.vectorize.width`` is set to		vectorization of the loop. If ``llvm.loop.vectorize.width`` is set to
0 or if the loop does not have this metadata the width will be		0 or if the loop does not have this metadata the width will be
determined automatically.		determined automatically.

		'``llvm.loop.vectorize.followup_vectorized``' Metadata
		hsaito (Not Done): I understand that the RST file update should talk about what happens today, but for the sake of code review, it's good to discuss what could happen in the reasonably foreseeable future so that we don't under-design things. I think we should be thinking ahead about:
		- the vectorizer peeling the loop, for example, for alignment optimization. Such a peeled loop could be fully unrolled if the trip count is known, or vectorized with a mask.
		- the main vector loop could be fully unrolled.
		- there may be more than one remainder loop, e.g., a vectorized remainder followed by a scalar remainder.
		- the remainder loop may be fully unrolled.
		All those situations could happen without the programmer knowing it will happen that way. Some of the questions we want to think about before the real need arises:
		- Will the loop attribute get dropped if the "loop" is fully unrolled?
		- How do we designate more than one remainder loop?
		- Will the loop attribute be applicable for the vectorized peel/remainder?
		- Should we have a way to designate a runtime-DD non-vectorizable loop separately from the remainder?
		Meinersbur (Author, Not Done):
		1. I think a prologue/peel is analogous to epilogue/remainder. That is, a new `llvm.loop.vectorize.followup_peel` can be added.
		2. Should be handled as two separate transformations (such as vectorize/interleave). That is, `llvm.loop.disable_nonforced` would ensure that a loop does not unexpectedly disappear.
		3. `llvm.loop.followup_remainder` should apply to any of the remainder loops. If a finer distinction is required, we can add more specific attributes.
		4. This can already happen, at least with LoopUnroll/LoopUnrollAndJam. The docs mention that in this case the `followup_remainder` is dropped.
		However, changing the model a transformation transforms to can indeed raise some backward-compatibility issues. This also applies to the user interface. If a programmer added `#pragma clang loop vectorize(enable)`, do they expect it to be unrolled as well? Loop peeling?
		D50480 is interesting here: At `-Os`, it uses masking instead of an epilogue to avoid a code copy. In this case `followup_remainder` explicitly states that there's not necessarily a remainder loop, so I don't see a problem here. But a programmer might expect more control over what the output structure is.
		We can add more attributes to control this behaviour, such as `llvm.loop.vectorize.peel.enable`, `llvm.loop.vectorize.remainder.enable`, `llvm.loop.vectorize.allow_versioning`. The interesting question is, what is the default setting?
		- If we go by the current behavior to maximize backwards-compatibility, remainder and versioning would be enabled by default (if not in `-Os`), peeling disabled because it is not yet implemented.
		- On the other side, we probably do not want frontends to emit the most recent enable-metadata to get the best vectorization. So we would enable all features by default, but the output loops might be different from what the programmer intended before the feature is introduced.
		- We can enable all features unless the transformation is forced, in which case all deviations from the current transformation model need to be explicitly enabled.
		IMHO, we can decide this case-by-case, weighing compatibility concerns and optimization levels. Then again, such transformations do not influence the correctness of the output.
		To be less concerned about compatibility issues, I could for now remove all followup attributes except those that are 'central' to the transformation, and `followup_all`. For vectorization, there will always be the performance-critical vectorized loop (i.e. `followup_vectorized`), independent of whether there is a prologue, epilogue or fallback. For partial unrolling, it is always an unrolled loop.
		> Will the loop attribute get dropped if the "loop" is fully unrolled?
		Yes. But it should not happen if `llvm.loop.disable_nonforced` is used and the unroll is not explicitly specified.
		> How do we designate more than one remainder loop?
		Using different attributes. Like `followup_all`, it is possible to address a group of loops.
		> Will the loop attribute applicable for vectorized peel/remainder?
		Only for the followup that addresses them.
		> Should we have a way to designate runtime-DD non-vectorizable loop separately from remainder?
		As mentioned sometime before, the typical reaction to 'loop not vectorized' is not 'ok, let's unroll it instead', but 'how can I make it vectorize'. So I don't think fallbacks are necessary (unless we can apply a sequence of transformations to multiple loops), but I am open if you think there is a need for such.
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		This metadata defines which loop attributes the vectorized loop will
		have. See :ref:`transformation-metadata` for details.

		'``llvm.loop.vectorize.followup_epilogue``' Metadata
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		This metadata defines which loop attributes the epilogue will have. The
		epilogue is not vectorized and is executed when either the vectorized
		loop is not known to preserve semantics (because e.g., it processes two
		hfinkel (Done): preserving -> preserve
		hfinkel (Done): e.g., because two ...
		arrays that are found to alias by a runtime check) or for the last
		iterations that do not fill a complete set of vector lanes. See
		hfinkel (Done): vector lane -> set of vector lanes
		:ref:`Transformation Metadata <transformation-metadata>` for details.

		'``llvm.loop.vectorize.followup_all``' Metadata
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Attributes in the metadata will be added to both the vectorized and
		hfinkel (Done): added to both the vectorized and remainder loop
		epilogue loop.
		See :ref:`Transformation Metadata <transformation-metadata>` for details.

'``llvm.loop.unroll``'		'``llvm.loop.unroll``'
^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^

Metadata prefixed with ``llvm.loop.unroll`` are loop unrolling		Metadata prefixed with ``llvm.loop.unroll`` are loop unrolling
optimization hints such as the unroll factor. ``llvm.loop.unroll``		optimization hints such as the unroll factor. ``llvm.loop.unroll``
metadata should be used in conjunction with ``llvm.loop`` loop		metadata should be used in conjunction with ``llvm.loop`` loop
identification metadata. The ``llvm.loop.unroll`` metadata are only		identification metadata. The ``llvm.loop.unroll`` metadata are only
optimization hints and the unrolling will only be performed if the		optimization hints and the unrolling will only be performed if the
[52 lines not shown]
This metadata suggests that the loop should be unrolled fully. The		This metadata suggests that the loop should be unrolled fully. The
metadata has a single operand which is the string ``llvm.loop.unroll.full``.		metadata has a single operand which is the string ``llvm.loop.unroll.full``.
For example:		For example:

.. code-block:: llvm		.. code-block:: llvm

!0 = !{!"llvm.loop.unroll.full"}		!0 = !{!"llvm.loop.unroll.full"}

		'``llvm.loop.unroll.followup``' Metadata
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		This metadata defines which loop attributes the unrolled loop will have.
		See :ref:`Transformation Metadata <transformation-metadata>` for details.

		'``llvm.loop.unroll.followup_remainder``' Metadata
		hsaito (Not Done): Remainder here may be unrolled again or fully unrolled (see the comments on vectorize metadata). What do we do for that?
		Meinersbur (Author, Not Done): `followup_remainder` is ignored. If this is not clear from the section in `TransformMetadata.rst`, please tell me.
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		This metadata defines which loop attributes the remainder loop after
		partial/runtime unrolling will have. See
		:ref:`Transformation Metadata <transformation-metadata>` for details.

'``llvm.loop.unroll_and_jam``'		'``llvm.loop.unroll_and_jam``'
		dmgreen (Done): These can now move down with the other unroll_and_jam metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This metadata is treated very similarly to the ``llvm.loop.unroll`` metadata		This metadata is treated very similarly to the ``llvm.loop.unroll`` metadata
above, but affect the unroll and jam pass. In addition any loop with		above, but affect the unroll and jam pass. In addition any loop with
``llvm.loop.unroll`` metadata but no ``llvm.loop.unroll_and_jam`` metadata will		``llvm.loop.unroll`` metadata but no ``llvm.loop.unroll_and_jam`` metadata will
disable unroll and jam (so ``llvm.loop.unroll`` metadata will be left to the		disable unroll and jam (so ``llvm.loop.unroll`` metadata will be left to the
unroller, plus ``llvm.loop.unroll.disable`` metadata will disable unroll and jam		unroller, plus ``llvm.loop.unroll.disable`` metadata will disable unroll and jam
too.)		too.)
		Meinersbur (Author, Not Done): @dmgreen This directly contradicts the `nounroll_plus_unroll_and_jam` test case in `Transforms/LoopUnrollAndJam/pragma.ll`
		dmgreen (Not Done): The way this should be working at the moment is:
		1. If there is any unroll_and_jam metadata, do that thing (the user explicitly asked for a thing -> do it)
		2. If there is any unroll metadata, disable unrollandjam (leave the loop to the unroller)
		3. Normal heuristics
		I think with "but no `llvm.loop.unroll_and_jam` metadata", that is what this is saying. Correct me if I'm wrong and it's not working like this. Or feel free to update it if it's unclear. Or if you think this should work another way...? ;)
		I originally envisioned unrollandjam as an extension to the unroll pass, so I sometimes see the two things as interrelated. If a user specifies loop.unroll.disable, they almost certainly wanted to disable all unrolling, not just that in the unroll pass.
		Meinersbur (Author, Not Done):
		> If there is any unroll_and_jam metadata do that thing (the user explicitly asked for a thing -> do it)
		There is a comment in LoopUnrollAndJamPass.cpp: `// We have already checked that the loop has no unroll.* pragmas.` According to this, this is not true (since only checked afterwards) and `computeUnrollCount` will consider e.g. `llvm.loop.unroll.count` even when used for unroll-and-jam.
		I was concerned about the last phrase, "plus `llvm.loop.unroll.disable` metadata will disable unroll and jam too", but it might be a misinterpretation in that it will disable unroll-and-jam, but only if unroll-and-jam is not explicitly enabled.

The metadata for unroll and jam otherwise is the same as for ``unroll``.		The metadata for unroll and jam otherwise is the same as for ``unroll``.
``llvm.loop.unroll_and_jam.enable``, ``llvm.loop.unroll_and_jam.disable`` and		``llvm.loop.unroll_and_jam.enable``, ``llvm.loop.unroll_and_jam.disable`` and
``llvm.loop.unroll_and_jam.count`` do the same as for unroll.		``llvm.loop.unroll_and_jam.count`` do the same as for unroll.
``llvm.loop.unroll_and_jam.full`` is not supported. Again these are only hints		``llvm.loop.unroll_and_jam.full`` is not supported. Again these are only hints
and the normal safety checks will still be performed.		and the normal safety checks will still be performed.

'``llvm.loop.unroll_and_jam.count``' Metadata		'``llvm.loop.unroll_and_jam.count``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		dmgreen: Nit: This loop
This metadata suggests an unroll and jam factor to use, similarly to		This metadata suggests an unroll and jam factor to use, similarly to
``llvm.loop.unroll.count``. The first operand is the string		``llvm.loop.unroll.count``. The first operand is the string
``llvm.loop.unroll_and_jam.count`` and the second operand is a positive integer		``llvm.loop.unroll_and_jam.count`` and the second operand is a positive integer
specifying the unroll factor. For example:		specifying the unroll factor. For example:

.. code-block:: llvm		.. code-block:: llvm

!0 = !{!"llvm.loop.unroll_and_jam.count", i32 4}		!0 = !{!"llvm.loop.unroll_and_jam.count", i32 4}
Show All 18 Lines
trip count is known at compile time and partially unrolled if the trip count is		trip count is known at compile time and partially unrolled if the trip count is
not known at compile time. The metadata has a single operand which is the		not known at compile time. The metadata has a single operand which is the
string ``llvm.loop.unroll_and_jam.enable``. For example:		string ``llvm.loop.unroll_and_jam.enable``. For example:

.. code-block:: llvm		.. code-block:: llvm

!0 = !{!"llvm.loop.unroll_and_jam.enable"}		!0 = !{!"llvm.loop.unroll_and_jam.enable"}

		'``llvm.loop.unroll_and_jam.followup_outer``' Metadata
		hsaito: Is there an assumption of unroll_and_jam operating only on a double loop and/or a perfect loop? Technically speaking, we can unroll_and_jam a loop if we can legally outerloop-vectorize. So, there can be multiple inner loops.
		Meinersbur (Author): There's still an outermost (unrolled) loop and an innermost (jammed) loop. We could also add followups for middle loops. If it is the naming that concerns you: Would you prefer `followup_unrolled` and `followup_jammed`?
		dmgreen: I believe he meant this being unrolled and jammed: `for i { for j A(i,j) for k B(i,k) }` This is not something we currently support as I didn't think it would ever be likely to be profitable. Users specifying metadata might change that. The pass could be expanded to work on this (I think), but it's not something that it currently does.
		Meinersbur (Author): The j- and k-loops are both inner loops, so `followup_inner` should apply to both of them. Distinguishing them might be possible when introducing a mechanism like the one for naming the output loops of loop distribution.
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		This metadata defines which loop attributes the outer unrolled loop will
		have. See :ref:`Transformation Metadata <transformation-metadata>` for
		details.

		'``llvm.loop.unroll_and_jam.followup_inner``' Metadata
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		This metadata defines which loop attributes the inner jammed loop will
		have. See :ref:`Transformation Metadata <transformation-metadata>` for
		details.

		'``llvm.loop.unroll_and_jam.followup_remainder_outer``' Metadata
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		This metadata defines which attributes the epilogue of the outer loop
		will have. This loop is usually fully unrolled, meaning there is no
		such loop. This attribute will be ignored in this case. See
		:ref:`Transformation Metadata <transformation-metadata>` for details.

		'``llvm.loop.unroll_and_jam.followup_remainder_inner``' Metadata
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		This metadata defines which attributes the inner loop of the epilogue
		will have. The outer epilogue will usually be unrolled, meaning there
		can be multiple inner remainder loops. See
		:ref:`Transformation Metadata <transformation-metadata>` for details.

		'``llvm.loop.unroll_and_jam.followup_all``' Metadata
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Attributes specified in the metadata are added to all
		``llvm.loop.unroll_and_jam.*`` loops. See
		:ref:`Transformation Metadata <transformation-metadata>` for details.

'``llvm.loop.licm_versioning.disable``' Metadata		'``llvm.loop.licm_versioning.disable``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This metadata indicates that the loop should not be versioned for the purpose		This metadata indicates that the loop should not be versioned for the purpose
of enabling loop-invariant code motion (LICM). The metadata has a single operand		of enabling loop-invariant code motion (LICM). The metadata has a single operand
which is the string ``llvm.loop.licm_versioning.disable``. For example:		which is the string ``llvm.loop.licm_versioning.disable``. For example:

.. code-block:: llvm		.. code-block:: llvm
Show All 16 Lines
.. code-block:: llvm		.. code-block:: llvm

!0 = !{!"llvm.loop.distribute.enable", i1 0}		!0 = !{!"llvm.loop.distribute.enable", i1 0}
!1 = !{!"llvm.loop.distribute.enable", i1 1}		!1 = !{!"llvm.loop.distribute.enable", i1 1}

This metadata should be used in conjunction with ``llvm.loop`` loop		This metadata should be used in conjunction with ``llvm.loop`` loop
identification metadata.		identification metadata.

		'``llvm.loop.distribute.followup_coincident``' Metadata
		hsaito: Looks rather centric to distribute-for-vectorization. Loop distribution can happen for many reasons (and there may be more than one reason). Are we going to define followup_ Metadata for each of those reasons? What'll happen if a loop matches the characteristics of more than one Metadata?
		Meinersbur (Author): There is no overlap between `cyclic` and `noncyclic`. For the [[ https://arxiv.org/abs/1805.… | extended loop-transformations ]], the user would name the loops they want distributed. Indeed, these followups are specific to the current distribution pass. However, I think it is easy for any distribution to determine whether a loop has cyclic dependences and add those attributes to any output loop that matches. `makeFollowupLoopID` can already combine attributes from multiple followups.
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		This metadata defines which attributes the extracted loops without
		cyclic dependencies (i.e. loops that can be vectorized) will have. See
		:ref:`Transformation Metadata <transformation-metadata>` for details.

		'``llvm.loop.distribute.followup_sequential``' Metadata
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		This metadata defines which attributes the isolated loops with unsafe
		memory dependencies will have. See
		:ref:`Transformation Metadata <transformation-metadata>` for details.

		'``llvm.loop.distribute.followup_fallback``' Metadata
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		If loop versioning is necessary, this metadata defines the attributes
		the non-distributed fallback version will have. See
		:ref:`Transformation Metadata <transformation-metadata>` for details.

		'``llvm.loop.distribute.followup_all``' Metadata
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		The attributes in this metadata are added to all followup loops of the
		loop distribution pass. See
		:ref:`Transformation Metadata <transformation-metadata>` for details.

'``llvm.mem``'		'``llvm.mem``'
^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^

Metadata types used to annotate memory accesses with information helpful		Metadata types used to annotate memory accesses with information helpful
for optimizations are prefixed with ``llvm.mem``.		for optimizations are prefixed with ``llvm.mem``.

'``llvm.mem.parallel_loop_access``' Metadata		'``llvm.mem.parallel_loop_access``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

docs/Passes.rst

	Displays the post dominator tree using the GraphViz tool.			Displays the post dominator tree using the GraphViz tool.

	``-view-postdom-only``: View postdominance tree of function (with no function bodies)			``-view-postdom-only``: View postdominance tree of function (with no function bodies)
	-------------------------------------------------------------------------------------			-------------------------------------------------------------------------------------

	Displays the post dominator tree using the GraphViz tool, but omitting function			Displays the post dominator tree using the GraphViz tool, but omitting function
	bodies.			bodies.

				``-transform-warning``: Report missed forced transformations
				------------------------------------------------------------

				Emits warnings about not yet applied forced transformations (e.g. from
				``#pragma omp simd``).

docs/TransformMetadata.rst

This file was added.

				.. _transformation-metadata:

				============================
				Code Transformation Metadata
				============================

				.. contents::
				:local:

				Overview
				========


				LLVM transformation passes can be controlled by attaching metadata to
				the code to transform. By default, transformation passes use heuristics
				hfinkel: By default, transformation passes use heuristics to determine whether or not to perform transformations, and when doing so, other details of how the transformations are applied (e.g., which vectorization factor to select).
				to determine whether or not to perform transformations, and when doing
				so, other details of how the transformations are applied (e.g., which
				vectorization factor to select).
				hfinkel: As stated, this is untrue (for -O3). For -O3, we only require a likely speedup across many workloads (and slowdowns be unlikely). This is why, for example, under -O3, we can vectorize with runtime checks. How about this wording: Unless the optimizer is otherwise directed, transformations are applied conservatively. This conservatism generally allows the optimizer to avoid unprofitable transformations, but in practice, this results in the optimizer not applying transformations that would be highly profitable.
				dmgreen: (Space in howthe)
				Unless the optimizer is otherwise directed, transformations are applied
				conservatively. This conservatism generally allows the optimizer to
				avoid unprofitable transformations, but in practice, this results in the
				optimizer not applying transformations that would be highly profitable.

				hfinkel: it -> they
				Frontends can give additional hints to LLVM passes on which
				dmgreen: Nit: from the emitted IR
				transformations they should apply. This can be additional knowledge that
				hfinkel: for -> of
				cannot be derived from the emitted IR, or directives passed from the
				user/programmer. OpenMP pragmas are an example of the latter.

				If any such metadata is dropped from the program, the code's semantics
				must not change.

				Metadata on Loops
				=================

				Attributes can be attached to loops as described in :ref:`llvm.loop`.
				Attributes can describe properties of the loop, disable transformations,
				force specific transformations and set transformation options.

				Because metadata nodes are immutable (with the exception of
				``MDNode::replaceOperandWith`` which is dangerous to use on uniqued
				metadata), in order to add or remove a loop attribute, a new ``MDNode``
				must be created and assigned as the new ``llvm.loop`` metadata. Any
				connection between the old ``MDNode`` and the loop is lost. The
				``llvm.loop`` node is also used as LoopID (``Loop::getLoopID()``), i.e.
				the loop effectively gets a new identifier. For instance,
				``llvm.mem.parallel_loop_access`` references the LoopID. Therefore, if
				the parallel access property is to be preserved after adding/removing
				loop attributes, any ``llvm.mem.parallel_loop_access`` reference must be
				updated to the new LoopID.

				Transformation Metadata Structure
				=================================
				hfinkel: Unrolling, etc. - no need to capitalize.

				Some attributes describe code transformations (unrolling, vectorizing,
				dmgreen: Nit: transformation
				loop distribution, etc.). They can either be a hint to the optimizer
				hfinkel: We should be careful with the language here. As any of these can be dropped without changing the semantics of the code, nothing here is "mandatory". How about saying, ", or convey a specific request from the user"
				that a transformation might be beneficial, an instruction to use a specific
				option, or convey a specific request from the user (such as
				``#pragma clang loop`` or ``#pragma omp simd``).

				hfinkel: optimization-missed warning
				If a transformation is forced but cannot be carried out for any reason,
				an optimization-missed warning must be emitted. Semantic information
				hfinkel: I know what you mean by "separate", but I think it's better to say: is separate. -> can be unused by the optimizer without generating a warning.
				such as a transformation being safe (e.g.
				``llvm.mem.parallel_loop_access``) can be unused by the optimizer
				without generating a warning.

				Unless explicitly disabled, any optimization pass may heuristically
				determine whether a transformation is beneficial and apply it. If
				metadata for another transformation was specified, applying a different
				transformation before it might be inadvertent due to being applied on a
				hfinkel: This wording is too strong. I think that we need to draw a distinction here between (at least) three classes of transformations:
				1. canonicalizing transformations
				2. (cost-model-driven) restructuring transformations
				3. low-level (target-specific) transformations (e.g., using ctr-register-based loops on PowerPC)
				I believe that this pragma should only affect those in class (2). Canonicalizing transformations are always performed (when optimizing at all), and low-level transformations are beyond the reach of this kind of metadata. I'd recommend using the wording that this metadata disables, "optional, high-level, restructuring transformations."
				different loop or the loop not existing anymore. To avoid having to
				explicitly disable an unknown number of passes, the attribute
				hfinkel: avoids that the loop is altered -> avoids the loop being altered
				``llvm.loop.disable_nonforced`` disables all optional, high-level,
				dmgreen: Nit: Maybe change the second "for instance"
				restructuring transformations.

				The following example avoids the loop being altered before being
				vectorized, for instance being unrolled.

				.. code-block:: llvm

				br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !0
				...
				!0 = distinct !{!0, !1, !2}
				!1 = !{!"llvm.loop.vectorize.enable", i1 true}
				dmgreen: Nit: transformed
				!2 = !{!"llvm.loop.disable_nonforced"}

				After a transformation is applied, follow-up attributes are set on the
				transformed and/or new loop(s). This allows additional attributes
				including followup-transformations to be specified. Specifying multiple
				dmgreen: Nit: occur
				hfinkel: Why is this a useful feature? Should we allow only one transformation per node?
				Meinersbur (Author): It's not a new feature, but effectively the current behavior. ("is possible for compatibility reasons"). I'd indeed prefer to crash if multiple transformations are applied to avoid undefined behavior, which unfortunately is a breaking change.
				transformations in the same metadata node is possible for compatibility
				reasons, but their execution order is undefined. For instance, when
				``llvm.loop.vectorize.enable`` and ``llvm.loop.unroll.enable`` are
				hfinkel: loop being vectorized -> loop to be vectorized
				specified at the same time, unrolling may occur either before or after
				vectorization.

				As an example, the following instructs a loop to be vectorized and only
				then unrolled.

				.. code-block:: llvm

				!0 = distinct !{!0, !1, !2, !3}
				!1 = !{!"llvm.loop.vectorize.enable", i1 true}
				!2 = !{!"llvm.loop.disable_nonforced"}
				!3 = !{!"llvm.loop.vectorize.followup_vectorized", !{!"llvm.loop.unroll.enable"}}

				hfinkel: This leaves open the question of whether the vectorizer adds the 'isvectorized' attribute when a follow-up is specified. It should, right?
				Meinersbur (Author): I think it should not, but the metadata gives complete control over which attributes will be in the new loop. If a frontend wants to apply a transformation twice, it should be able to do so. I think the paragraph says that no such additional implicit attributes are added when a followup is specified:
				> If no followup is specified, ...
				I added "If, and only if," to make it even clearer.
				If, and only if, no followup is specified, the pass may add attributes itself.
				For instance, the vectorizer adds a ``llvm.loop.isvectorized`` attribute and
				all attributes from the original loop excluding its loop vectorizer
				attributes. To avoid this, an empty followup attribute can be used, e.g.

				.. code-block:: llvm

				dmgreen: Nit: never be added
				!3 = !{!"llvm.loop.vectorize.followup_vectorized"}

				The followup attributes of a transformation that cannot be applied will
				never be added to a loop and are therefore effectively ignored. This means
				that any followup-transformation in such attributes requires that its
				prior transformations are applied before the followup-transformation.
				The user should receive a warning about the first transformation in the
				transformation chain that could not be applied if it is a forced
				transformation. All following transformations are skipped.

				hfinkel: for -> to
				Pass-Specific Transformation Metadata
				hfinkel: comma after following
				=====================================

				Transformation options are specific to each transformation. In the
				following, we present the model for each LLVM loop optimization pass and
				the metadata to influence them.

				Loop Vectorization and Interleaving
				-----------------------------------

				Loop vectorization and interleaving are interpreted as a single
				transformation. It is interpreted as forced if
				``!{!"llvm.loop.vectorize.enable", i1 true}`` is set.

				Assuming the pre-vectorization loop is

				.. code-block:: c

				for (int i = 0; i < n; i+=1) // original loop
				Stmt(i);

				then the code after vectorization will be approximately (assuming a
				SIMD width of 4):

				.. code-block:: c

				int i = 0;
				if (rtc) {
				for (; i + 3 < n; i+=4) // vectorized/interleaved loop
				Stmt(i:i+3);
				}
				for (; i < n; i+=1) // epilogue loop
				hfinkel: Why would isvectorized not always be provided?
				Meinersbur (Author): I don't know how LoopVectorize behaves when encountering vector instructions, but I want to avoid the vagueness of some passes adding implicit metadata in some situation. The IR should have the control over whether a transformation is applied multiple times.
				Stmt(i);

				where ``rtc`` is a generated runtime check.

				``llvm.loop.vectorize.followup_vectorized`` will set the attributes for
				the vectorized loop. If not specified, ``llvm.loop.isvectorized`` is
				combined with the original loop's attributes to avoid it being
				vectorized multiple times.

				``llvm.loop.vectorize.followup_epilogue`` will set the attributes for
				the remainder loop. If not specified, it will have the original loop's
				dmgreen: Do you think it's worth mentioning unroll.count and unroll.disable etc, before jumping into the followup metadata?
				Meinersbur (Author): Yes, maybe, but they are also already documented in the `LangRef.rst`. Please understand that the goal in this patch is to define a transformation model for each pass such that it is clear what are those followup-loops, not to write an exhaustive documentation.
				attributes combined with ``llvm.loop.isvectorized`` and
				``llvm.loop.unroll.runtime.disable`` (unless the original loop already
				has unroll metadata).

				The attributes specified by ``llvm.loop.vectorize.followup_all`` are
				added to both loops.
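As an illustrative sketch (not part of the reviewed patch), the loop structure above can be modelled in plain C to confirm that the vectorized loop and the epilogue loop together execute every iteration exactly once, whether or not the runtime check passes. The names `stmt`, `vectorized_model`, `covers_each_index_once` and `MAX_N` are hypothetical, and the inner `lane` loop merely stands in for one SIMD operation of width 4.

```c
#include <assert.h>

enum { MAX_N = 64 };  /* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for Stmt(i): counts how often index i is executed. */
static void stmt(int visits[], int i) { visits[i] += 1; }

/* Scalar model of the loop structure after vectorization (width 4). */
static void vectorized_model(int visits[], int n, int rtc) {
    int i = 0;
    if (rtc) {
        for (; i + 3 < n; i += 4)                  /* vectorized/interleaved loop */
            for (int lane = 0; lane < 4; lane += 1)
                stmt(visits, i + lane);            /* models Stmt(i:i+3) */
    }
    for (; i < n; i += 1)                          /* epilogue loop */
        stmt(visits, i);
}

/* Returns 1 if every index in [0, n) is executed exactly once. */
int covers_each_index_once(int n, int rtc) {
    int visits[MAX_N] = {0};
    assert(n >= 0 && n <= MAX_N);
    vectorized_model(visits, n, rtc);
    for (int i = 0; i < n; i += 1)
        if (visits[i] != 1)
            return 0;
    return 1;
}
```

When `rtc` is false, the epilogue loop runs all iterations by itself. In this model that is why the epilogue remains a full scalar loop, to which `followup_epilogue` attributes would apply.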

				Loop Unrolling
				--------------

				Unrolling is interpreted as forced if any ``!{!"llvm.loop.unroll.enable"}``
				metadata or option (``llvm.loop.unroll.count``, ``llvm.loop.unroll.full``)
				is present. Unrolling can be full unrolling, partial unrolling of a loop
				with constant trip count or runtime unrolling of a loop with a trip
				count unknown at compile-time.

				If the loop has been unrolled fully, there is no followup-loop. For
				partial/runtime unrolling, the original loop of

				.. code-block:: c

				for (int i = 0; i < n; i+=1) // original loop
				Stmt(i);

				is transformed into (using an unroll factor of 4):

				.. code-block:: c

				int i = 0;
				for (; i + 3 < n; i+=4) { // unrolled loop
				Stmt(i);
				Stmt(i+1);
				Stmt(i+2);
				Stmt(i+3);
				}
				for (; i < n; i+=1) // remainder loop
				Stmt(i);

				``llvm.loop.unroll.followup_unrolled`` will set the loop attributes of
				the unrolled loop. If not specified, the attributes of the original loop
				without the ``llvm.loop.unroll.*`` attributes are copied and
				``llvm.loop.unroll.disable`` added to it.

				dmgreen: Again, could mention unroll_and_jam.count and enable/disable.
				``llvm.loop.unroll.followup_remainder`` defines the attributes of the
				remainder loop. If not specified, the remainder loop will have no
				attributes. The remainder loop might not be present due to being fully
				unrolled, in which case this attribute has no effect.

				Attributes defined in ``llvm.loop.unroll.followup_all`` are added to the
				unrolled and remainder loops.
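The partial/runtime unrolling model can be checked with a small C sketch (again not part of the patch). It assumes a hypothetical, order-sensitive stand-in for `Stmt(i)` so that any reordering or skipped iteration would change the result. The names `stmt_hash`, `run_original`, `run_unrolled_by_4` and `unroll_self_check` are made up for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for Stmt(i): folds i into an order-sensitive hash,
   so executing the statements in a different order changes the result.
   Unsigned arithmetic wraps, which is well defined in C. */
static unsigned long long stmt_hash(unsigned long long acc, int i) {
    return acc * 31ULL + (unsigned long long)i;
}

unsigned long long run_original(int n) {
    unsigned long long acc = 7;
    for (int i = 0; i < n; i += 1)      /* original loop */
        acc = stmt_hash(acc, i);
    return acc;
}

unsigned long long run_unrolled_by_4(int n) {
    unsigned long long acc = 7;
    int i = 0;
    for (; i + 3 < n; i += 4) {         /* unrolled loop */
        acc = stmt_hash(acc, i);
        acc = stmt_hash(acc, i + 1);
        acc = stmt_hash(acc, i + 2);
        acc = stmt_hash(acc, i + 3);
    }
    for (; i < n; i += 1)               /* remainder loop */
        acc = stmt_hash(acc, i);
    return acc;
}

/* Compares both versions for trip counts that do and do not divide by 4. */
void unroll_self_check(void) {
    for (int n = 0; n <= 17; n += 1)
        assert(run_original(n) == run_unrolled_by_4(n));
}
```

Calling `unroll_self_check()` from a `main` function exercises trip counts 0 through 17. For trip counts below 4 only the remainder loop runs, and for multiples of 4 it runs zero iterations.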

				Unroll-And-Jam
				--------------

				Unroll-and-jam uses the following transformation model (here with an
				unroll factor of 2). Currently, it does not support a fallback version
				when the transformation is unsafe.

				.. code-block:: c

				for (int i = 0; i < n; i+=1) { // original outer loop
				Fore(i);
				for (int j = 0; j < m; j+=1) // original inner loop
				SubLoop(i, j);
				Aft(i);
				}

				.. code-block:: c

				int i = 0;
				for (; i + 1 < n; i+=2) { // unrolled outer loop
				Fore(i);
				Fore(i+1);
				for (int j = 0; j < m; j+=1) { // unrolled inner loop
				SubLoop(i, j);
				SubLoop(i+1, j);
				}
				Aft(i);
				Aft(i+1);
				}
				for (; i < n; i+=1) { // remainder outer loop
				Fore(i);
				for (int j = 0; j < m; j+=1) // remainder inner loop
				SubLoop(i, j);
				Aft(i);
				}

				``llvm.loop.unroll_and_jam.followup_outer`` will set the loop attributes
				of the unrolled outer loop. If not specified, the attributes of the
				dmgreen: Nit: have
				original outer loop without the ``llvm.loop.unroll.*`` attributes are
				copied and ``llvm.loop.unroll.disable`` added to it.

				``llvm.loop.unroll_and_jam.followup_inner`` will set the loop attributes
				of the unrolled inner loop. If not specified, the attributes of the
				original inner loop are used unchanged.

				``llvm.loop.unroll_and_jam.followup_remainder_outer`` sets the loop
				attributes of the outer remainder loop. If not specified it will not
				have any attributes. The remainder loop might not be present due to
				being fully unrolled.

				dmgreen: Again, maybe describe other metadata first?
				``llvm.loop.unroll_and_jam.followup_remainder_inner`` sets the loop
				attributes of the inner remainder loop. If not specified it will have
				the attributes of the original inner loop. If the outer remainder loop
				is unrolled, the inner remainder loop might be present multiple times.

				Attributes defined in ``llvm.loop.unroll_and_jam.followup_all`` are
				added to all of the aforementioned output loops.
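A minimal C sketch of this model (not part of the patch) is given below. It assumes hypothetical bodies for `Fore`, `SubLoop` and `Aft` in which each outer iteration `i` only touches `out[i]`, so interleaving the inner-loop bodies of `i` and `i+1` is safe. With other loop bodies the pass would have to prove that this reordering is legal. All names and the sizes `ROWS` and `COLS` are assumptions made for illustration.

```c
#include <assert.h>

enum { ROWS = 5, COLS = 3 };  /* hypothetical sizes; ROWS is odd so the remainder loop runs */

/* Hypothetical loop body pieces; each outer iteration i only touches out[i]. */
static void fore(int out[], int i) { out[i] = 0; }
static void sub_loop(int out[], int i, int j) { out[i] += 10 * i + j; }
static void aft(int out[], int i) { out[i] *= 2; }

void uaj_original(int out[]) {
    for (int i = 0; i < ROWS; i += 1) {        /* original outer loop */
        fore(out, i);
        for (int j = 0; j < COLS; j += 1)      /* original inner loop */
            sub_loop(out, i, j);
        aft(out, i);
    }
}

void uaj_unroll_and_jam_by_2(int out[]) {
    int i = 0;
    for (; i + 1 < ROWS; i += 2) {             /* unrolled outer loop */
        fore(out, i);
        fore(out, i + 1);
        for (int j = 0; j < COLS; j += 1) {    /* unrolled (jammed) inner loop */
            sub_loop(out, i, j);
            sub_loop(out, i + 1, j);
        }
        aft(out, i);
        aft(out, i + 1);
    }
    for (; i < ROWS; i += 1) {                 /* remainder outer loop */
        fore(out, i);
        for (int j = 0; j < COLS; j += 1)      /* remainder inner loop */
            sub_loop(out, i, j);
        aft(out, i);
    }
}

/* Returns 1 if both versions produce the same values. */
int uaj_results_match(void) {
    int a[ROWS], b[ROWS];
    uaj_original(a);
    uaj_unroll_and_jam_by_2(b);
    for (int i = 0; i < ROWS; i += 1) {
        assert(a[i] == 2 * (30 * i + 3));      /* closed form of one row */
        if (a[i] != b[i])
            return 0;
    }
    return 1;
}
```

The four commented loops in `uaj_unroll_and_jam_by_2` correspond to the loops addressed by `followup_outer`, `followup_inner`, `followup_remainder_outer` and `followup_remainder_inner`, respectively.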

				Loop Distribution
				-----------------

				The LoopDistribution pass tries to separate vectorizable parts of a loop
				from the non-vectorizable part (which otherwise would make the entire
				loop non-vectorizable). Conceptually, it transforms a loop such as

				.. code-block:: c

				for (int i = 1; i < n; i+=1) { // original loop
				A[i] = i;
				B[i] = 2 + B[i];
				C[i] = 3 + C[i - 1];
				}

				into the following code:

				.. code-block:: c

				if (rtc) {
				for (int i = 1; i < n; i+=1) // coincident loop
				A[i] = i;
				for (int i = 1; i < n; i+=1) // coincident loop
				B[i] = 2 + B[i];
				for (int i = 1; i < n; i+=1) // sequential loop
				C[i] = 3 + C[i - 1];
				} else {
				for (int i = 1; i < n; i+=1) { // fallback loop
				hfinkel: where 'rtc' is the generated runtime safety check.
				A[i] = i;
				B[i] = 2 + B[i];
				C[i] = 3 + C[i - 1];
				}
				}

				where ``rtc`` is a generated runtime check.

				``llvm.loop.distribute.followup_coincident`` sets the loop attributes of
				all loops without loop-carried dependencies (i.e. vectorizable loops).
				There might be more than one such loop. If not defined, the loops will
				inherit the original loop's attributes.

				``llvm.loop.distribute.followup_sequential`` sets the loop attributes of the
				loop with potentially unsafe dependencies. There should be at most one
				such loop. If not defined, the loop will inherit the original loop's
				attributes.

				``llvm.loop.distribute.followup_fallback`` defines the loop attributes
				for the fallback loop, which is a copy of the original loop for when
				loop versioning is required. If undefined, the fallback loop inherits
				all attributes from the original loop.

				Attributes defined in ``llvm.loop.distribute.followup_all`` are added to
				all of the aforementioned output loops.
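The distribution model can be exercised with the following C sketch (not part of the patch). It runs the original loop and the distributed version on identically initialised arrays and compares the results for both outcomes of the runtime check. The `Arrays` type, the initial values, the trip count `N` and all function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { N = 8 };  /* hypothetical trip count */

typedef struct { int a[N], b[N], c[N]; } Arrays;

static void dist_init(Arrays *x) {
    for (int i = 0; i < N; i += 1) {
        x->a[i] = 0;
        x->b[i] = i;
        x->c[i] = 1;
    }
}

static void dist_original(Arrays *x) {
    for (int i = 1; i < N; i += 1) {       /* original loop */
        x->a[i] = i;
        x->b[i] = 2 + x->b[i];
        x->c[i] = 3 + x->c[i - 1];         /* loop-carried dependence */
    }
}

static void dist_distributed(Arrays *x, int rtc) {
    if (rtc) {
        for (int i = 1; i < N; i += 1)     /* coincident loop */
            x->a[i] = i;
        for (int i = 1; i < N; i += 1)     /* coincident loop */
            x->b[i] = 2 + x->b[i];
        for (int i = 1; i < N; i += 1)     /* sequential loop */
            x->c[i] = 3 + x->c[i - 1];
    } else {
        dist_original(x);                  /* fallback loop */
    }
}

/* Returns 1 if the distributed code computes the same arrays as the original. */
int dist_results_match(int rtc) {
    Arrays ref, out;
    dist_init(&ref);
    dist_init(&out);
    dist_original(&ref);
    dist_distributed(&out, rtc);
    assert(ref.c[N - 1] == 1 + 3 * (N - 1));  /* recurrence c[i] = 3 + c[i-1] */
    return memcmp(&ref, &out, sizeof ref) == 0;
}
```

Only the third loop carries a dependence from one iteration to the next, which is why it is the single loop labelled sequential in this model.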

				Versioning LICM
				---------------

				The pass hoists code out of loops when that code is loop-invariant only
				under dynamic conditions. For instance, it transforms the loop

				.. code-block:: c

				for (int i = 0; i < n; i+=1) // original loop
				A[i] = B[0];

				into:

				.. code-block:: c

				if (rtc) {
				auto b = B[0];
				dmgreen: Should we fix this? Will it work as expected with nonforced, if it was enabled?
				Meinersbur (Author): Yes, but in a separate patch that adds interchange-specific metadata. LoopInterchange currently is not enabled by default and does not modify metadata so its interaction with other transformation is less significant.
				dmgreen: Yep. Certainly a separate patch.
				for (int i = 0; i < n; i+=1) // versioned loop
				A[i] = b;
				} else {
				for (int i = 0; i < n; i+=1) // unversioned loop
				A[i] = B[0];
				}

				The runtime condition (``rtc``) checks that the array ``A`` and the
				element ``B[0]`` do not alias.

				Currently, this transformation does not support followup-attributes.
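A rough C sketch of the versioning model follows (not part of the patch). The runtime check here is a hypothetical address-range test that compares addresses as integers; the check that the pass actually generates may look different. The names `licm_rtc`, `licm_fill` and `licm_demo` are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical runtime check: returns 1 if *b0 lies outside a[0..n).
   Addresses are compared as integers to avoid relational comparison of
   pointers into unrelated objects. */
static int licm_rtc(const int *a, int n, const int *b0) {
    uintptr_t lo = (uintptr_t)a;
    uintptr_t hi = (uintptr_t)(a + n);
    uintptr_t p = (uintptr_t)b0;
    return p < lo || p >= hi;
}

/* Returns 1 if the versioned loop ran, 0 if the unversioned loop ran. */
int licm_fill(int *a, int n, const int *b) {
    if (licm_rtc(a, n, &b[0])) {
        int b0 = b[0];                       /* hoisted load */
        for (int i = 0; i < n; i += 1)       /* versioned loop */
            a[i] = b0;
        return 1;
    }
    for (int i = 0; i < n; i += 1)           /* unversioned loop */
        a[i] = b[0];
    return 0;
}

/* Runs licm_fill with B either aliasing A (alias != 0) or separate from it. */
int licm_demo(int alias) {
    int a[4] = {1, 2, 3, 4};
    int separate[1] = {9};
    const int *b = alias ? &a[2] : separate;
    int expected = b[0];
    int took_versioned = licm_fill(a, 4, b);
    for (int i = 0; i < 4; i += 1)
        assert(a[i] == expected);
    return took_versioned;
}
```

In this sketch `licm_demo(0)` takes the versioned path and `licm_demo(1)` takes the unversioned one. Both fill `A` with the value of `B[0]`, so the example illustrates only which loop version is selected.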

				Loop Interchange
				----------------

				Currently, the ``LoopInterchange`` pass does not use any metadata.

				Ambiguous Transformation Order
				==============================

				If there are multiple transformations defined, the order in which they are
				executed depends on the order in LLVM's pass pipeline, which is subject
				to change. The default optimization pipeline (anything higher than
				``-O0``) has the following order.

				When using the legacy pass manager:

				- LoopInterchange (if enabled)
				- SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
				- VersioningLICM (if enabled)
				- LoopDistribute
				- LoopVectorizer
				- LoopUnrollAndJam (if enabled)
				- LoopUnroll (partial and runtime unrolling)

				When using the legacy pass manager with LTO:

				- LoopInterchange (if enabled)
				- SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
				- LoopVectorizer
				- LoopUnroll (partial and runtime unrolling)

				When using the new pass manager:

				- SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
				- LoopDistribute
				- LoopVectorizer
				hfinkel: must be -> should be
				- LoopUnrollAndJam (if enabled)
				hfinkel: responsible for this reporting
				- LoopUnroll (partial and runtime unrolling)
				hfinkel: they might -> there might
				hfinkel: being able -> able

				Leftover Transformations
				hfinkel: is -> may be (keep the entire list in the hypothetical)
				========================
				dmgreen: Nit: pass pass

				Forced transformations that have not been applied after the last
				transformation pass should be reported to the user. The transformation
				passes themselves cannot be responsible for this reporting because they
				hfinkel: there -> they
				might not be in the pipeline, there might be multiple passes able to
				apply a transformation (e.g. ``LoopInterchange`` and Polly) or a
				transformation attribute may be 'hidden' inside another pass's followup
				attribute.

				The pass ``-transform-warning`` (``WarnMissedTransformationsPass``)
				emits such warnings. It should be placed after the last transformation
				pass.

				The current pass pipeline has a fixed order in which transformation
				passes are executed. A transformation can be in the followup of a pass
				hfinkel: in a fixedpoint loop -> using a dynamic ordering (not to be too prescriptive)
				that is executed later and thus leftover. For instance, a loop nest
				cannot be distributed and then interchanged with the current pass
				pipeline. The loop distribution will execute, but no loop interchange
				pass follows it, so any loop interchange metadata will be ignored. The
				``-transform-warning`` pass should emit a warning in this case.

				Future versions of LLVM may fix this by executing transformations using
				a dynamic ordering.
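
The leftover scenario can be modelled in a few lines of C++. This is a sketch under simplifying assumptions, not the logic of `WarnMissedTransformationsPass`: each pass runs exactly once in pipeline order, a followup transformation only becomes visible once its predecessor has been applied, and `leftoverTransformations` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Walk a fixed pipeline once. `Requested` is the chain of transformations a
// loop asks for, in followup order. Returns the requests that are never
// applied because no matching pass runs late enough.
static std::vector<std::string>
leftoverTransformations(const std::vector<std::string> &Pipeline,
                        const std::vector<std::string> &Requested) {
  std::size_t Next = 0; // index of the next transformation waiting to run
  for (const std::string &Pass : Pipeline) {
    if (Next < Requested.size() && Pass == Requested[Next])
      ++Next; // applied; its followup becomes the next request
  }
  return std::vector<std::string>(
      Requested.begin() + static_cast<std::ptrdiff_t>(Next), Requested.end());
}
```

With the new-pass-manager order listed above, a request to distribute and then interchange leaves `LoopInterchange` unapplied in this model, which is the case a warning should cover.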

docs/index.rst

Show First 20 Lines • Show All 286 Lines • ▼ Show 20 Lines	.. toctree::
AMDGPUUsage		AMDGPUUsage
StackMaps		StackMaps
InAlloca		InAlloca
BigEndianNEON		BigEndianNEON
CoverageMappingFormat		CoverageMappingFormat
Statepoints		Statepoints
MergeFunctions		MergeFunctions
TypeMetadata		TypeMetadata
		TransformMetadata
FaultMaps		FaultMaps
MIRLangRef		MIRLangRef
Coroutines		Coroutines
GlobalISel		GlobalISel
XRay		XRay
XRayExample		XRayExample
XRayFDRFormat		XRayFDRFormat
PDB/index		PDB/index
▲ Show 20 Lines • Show All 293 Lines • Show Last 20 Lines

include/llvm/InitializePasses.h

	Show First 20 Lines • Show All 393 Lines • ▼ Show 20 Lines
	void initializeTypeBasedAAWrapperPassPass(PassRegistry&);			void initializeTypeBasedAAWrapperPassPass(PassRegistry&);
	void initializeUnifyFunctionExitNodesPass(PassRegistry&);			void initializeUnifyFunctionExitNodesPass(PassRegistry&);
	void initializeUnpackMachineBundlesPass(PassRegistry&);			void initializeUnpackMachineBundlesPass(PassRegistry&);
	void initializeUnreachableBlockElimLegacyPassPass(PassRegistry&);			void initializeUnreachableBlockElimLegacyPassPass(PassRegistry&);
	void initializeUnreachableMachineBlockElimPass(PassRegistry&);			void initializeUnreachableMachineBlockElimPass(PassRegistry&);
	void initializeVerifierLegacyPassPass(PassRegistry&);			void initializeVerifierLegacyPassPass(PassRegistry&);
	void initializeVirtRegMapPass(PassRegistry&);			void initializeVirtRegMapPass(PassRegistry&);
	void initializeVirtRegRewriterPass(PassRegistry&);			void initializeVirtRegRewriterPass(PassRegistry&);
				void initializeWarnMissedTransformationsLegacyPass(PassRegistry &);
	void initializeWasmEHPreparePass(PassRegistry&);			void initializeWasmEHPreparePass(PassRegistry&);
	void initializeWholeProgramDevirtPass(PassRegistry&);			void initializeWholeProgramDevirtPass(PassRegistry&);
	void initializeWinEHPreparePass(PassRegistry&);			void initializeWinEHPreparePass(PassRegistry&);
	void initializeWriteBitcodePassPass(PassRegistry&);			void initializeWriteBitcodePassPass(PassRegistry&);
	void initializeWriteThinLTOBitcodePass(PassRegistry&);			void initializeWriteThinLTOBitcodePass(PassRegistry&);
	void initializeXRayInstrumentationPass(PassRegistry&);			void initializeXRayInstrumentationPass(PassRegistry&);

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_INITIALIZEPASSES_H			#endif // LLVM_INITIALIZEPASSES_H

include/llvm/LinkAllPasses.h

Show First 20 Lines • Show All 214 Lines • ▼ Show 20 Lines	ForcePassLinking() {
(void) llvm::createSpeculativeExecutionIfHasBranchDivergencePass();		(void) llvm::createSpeculativeExecutionIfHasBranchDivergencePass();
(void) llvm::createRewriteSymbolsPass();		(void) llvm::createRewriteSymbolsPass();
(void) llvm::createStraightLineStrengthReducePass();		(void) llvm::createStraightLineStrengthReducePass();
(void) llvm::createMemDerefPrinter();		(void) llvm::createMemDerefPrinter();
(void) llvm::createMustExecutePrinter();		(void) llvm::createMustExecutePrinter();
(void) llvm::createFloat2IntPass();		(void) llvm::createFloat2IntPass();
(void) llvm::createEliminateAvailableExternallyPass();		(void) llvm::createEliminateAvailableExternallyPass();
(void) llvm::createScalarizeMaskedMemIntrinPass();		(void) llvm::createScalarizeMaskedMemIntrinPass();
		(void) llvm::createWarnMissedTransformationsPass();
		dmgreen: Nit: Space I guess? I think this file could do with a clang-formatting
		Meinersbur: I used `git clang-format origin/master` which only applies to changed lines like this one.

(void)new llvm::IntervalPartition();		(void)new llvm::IntervalPartition();
(void)new llvm::ScalarEvolutionWrapperPass();		(void)new llvm::ScalarEvolutionWrapperPass();
llvm::Function::Create(nullptr, llvm::GlobalValue::ExternalLinkage)->viewCFGOnly();		llvm::Function::Create(nullptr, llvm::GlobalValue::ExternalLinkage)->viewCFGOnly();
llvm::RGPassManager RGM;		llvm::RGPassManager RGM;
llvm::TargetLibraryInfoImpl TLII;		llvm::TargetLibraryInfoImpl TLII;
llvm::TargetLibraryInfo TLI(TLII);		llvm::TargetLibraryInfo TLI(TLII);
llvm::AliasAnalysis AA(TLI);		llvm::AliasAnalysis AA(TLI);
Show All 9 Lines

include/llvm/Transforms/Scalar.h

	Show First 20 Lines • Show All 478 Lines • ▼ Show 20 Lines
	FunctionPass *createLibCallsShrinkWrapPass();			FunctionPass *createLibCallsShrinkWrapPass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// LoopSimplifyCFG - This pass performs basic CFG simplification on loops,			// LoopSimplifyCFG - This pass performs basic CFG simplification on loops,
	// primarily to help other loop passes.			// primarily to help other loop passes.
	//			//
	Pass *createLoopSimplifyCFGPass();			Pass *createLoopSimplifyCFGPass();

				//===----------------------------------------------------------------------===//
				//
				// WarnMissedTransformations - This pass emits warnings for leftover forced
				// transformations.
				//
				Pass *createWarnMissedTransformationsPass();
	} // End llvm namespace			} // End llvm namespace

	#endif			#endif

include/llvm/Transforms/Scalar/WarnMissedTransforms.h

This file was added.

				//===- WarnMissedTransforms.h -----------------------------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Emit warnings if forced code transformations have not been performed.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_SCALAR_WARNMISSEDTRANSFORMS_H
				#define LLVM_TRANSFORMS_SCALAR_WARNMISSEDTRANSFORMS_H

				#include "llvm/IR/PassManager.h"

				namespace llvm {
				class Function;
				class Loop;
				class LPMUpdater;

				// New pass manager boilerplate.
				class WarnMissedTransformationsPass
				: public PassInfoMixin<WarnMissedTransformationsPass> {
				public:
				explicit WarnMissedTransformationsPass() {}

				PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
				};

				// Legacy pass manager boilerplate.
				Pass *createWarnMissedTransformationsPass();
				void initializeWarnMissedTransformationsLegacyPass(PassRegistry &);
				} // end namespace llvm
				dmgreen: Are these two needed here, if they are declared in other places?
				Meinersbur: Depends on which headers are included when using this pass. `WarnMissedTransforms.cpp` itself does not include `include/llvm/InitializePasses.h` or `include/llvm/Transforms/Scalar.h`. Having the forward declarations here matches the translation unit header idiom and the compiler checks that the definition matches the declaration here.

				#endif // LLVM_TRANSFORMS_SCALAR_WARNMISSEDTRANSFORMS_H

include/llvm/Transforms/Utils/LoopUtils.h

	Show First 20 Lines • Show All 165 Lines • ▼ Show 20 Lines
	/// Find string metadata for loop			/// Find string metadata for loop
	///			///
	/// If it has a value (e.g. {"llvm.distribute", 1} return the value as an			/// If it has a value (e.g. {"llvm.distribute", 1} return the value as an
	/// operand or null otherwise. If the string metadata is not found return			/// operand or null otherwise. If the string metadata is not found return
	/// Optional's not-a-value.			/// Optional's not-a-value.
	Optional<const MDOperand *> findStringMetadataForLoop(Loop *TheLoop,			Optional<const MDOperand *> findStringMetadataForLoop(Loop *TheLoop,
	StringRef Name);			StringRef Name);

				/// Find named metadata for a loop with an integer value.
				llvm::Optional<int> getOptionalIntLoopAttribute(Loop *TheLoop, StringRef Name);

				/// Create a new loop identifier for a loop created from a loop transformation.
				dmgreen: Nit: transformations->transformation
				///
				/// @param OrigLoopID The loop ID of the loop before the transformation.
				/// @param FollowupAttrs List of attribute names that contain attributes to be
				/// added to the new loop ID.
				/// @param InheritAttrsExceptPrefix Selects which attributes should be inherited
				/// from the original loop. The following values
				/// are considered:
				/// nullptr : Inherit all attributes from @p OrigLoopID.
				dmgreen: Nit: inherit
				/// "" : Do not inherit any attribute from @p OrigLoopID; only use
				/// those specified by a followup attribute.
				/// "<prefix>": Inherit all attributes except those which start with
				/// <prefix>; commonly used to remove metadata for the
				/// applied transformation.
				/// @param AlwaysNew If true, do not try to reuse OrigLoopID and never return
				/// None.
				///
				/// @return The loop ID for the after-transformation loop. The following values
				/// can be returned:
				/// None : No followup attribute was found; it is up to the
				/// transformation to choose attributes that make sense.
				dmgreen: Nit: choose
				/// @p OrigLoopID: The original identifier can be reused.
				/// nullptr : The new loop has no attributes.
				/// MDNode* : A new unique loop identifier.
				Optional<MDNode *>
				makeFollowupLoopID(MDNode *OrigLoopID, ArrayRef<StringRef> FollowupAttrs,
				const char *InheritOptionsAttrsPrefix = "",
				bool AlwaysNew = false);

				/// Look for the loop attribute that disables all transformation heuristics.
				bool hasDisableAllTransformsHint(const Loop *L);

				/// The mode sets how eagerly a transformation should be applied.
				enum TransformationMode {
				/// The pass can use heuristics to determine whether a transformation should
				/// be applied.
				TM_Unspecified,

				/// The transformation should be applied without considering a cost model.
				TM_Enable,

				/// The transformation should not be applied.
				TM_Disable,

				/// Force is a flag and should not be used alone.
				TM_Force = 0x04,

				/// The transformation was directed by the user, e.g. by a #pragma in
				hfinkel: We can't have metadata necessary for correctness. 'ForcedByUser' is fine to indicate that the user should receive a warning if the transformation cannot be performed.
				Meinersbur: [comment] This indicates that using metadata for user-directed loop-transformation #pragmas is a leaky abstraction.
				/// the source code. If the transformation could not be applied, a
				/// warning should be emitted.
				TM_ForcedByUser = TM_Enable | TM_Force,
				dmgreen: Nit: warning

				/// The transformation must not be applied. For instance, `#pragma clang loop
				/// unroll(disable)` explicitly forbids any unrolling to take place. Unlike
				/// general loop metadata, it must not be dropped. Most passes should not
				/// behave differently under TM_Disable and TM_SuppressedByUser.
				TM_SuppressedByUser = TM_Disable | TM_Force
				};

				/// @{
				/// Get the mode for LLVM's supported loop transformations.
				TransformationMode hasUnrollTransformation(Loop *L);
				TransformationMode hasUnrollAndJamTransformation(Loop *L);
				TransformationMode hasVectorizeTransformation(Loop *L);
				TransformationMode hasDistributeTransformation(Loop *L);
				TransformationMode hasLICMVersioningTransformation(Loop *L);
				/// @}

	/// Set input string into loop metadata by keeping other values intact.			/// Set input string into loop metadata by keeping other values intact.
	void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,			void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
	unsigned V = 0);			unsigned V = 0);

	/// Get a loop's estimated trip count based on branch weight metadata.			/// Get a loop's estimated trip count based on branch weight metadata.
	/// Returns 0 when the count is estimated to be 0, or None when a meaningful			/// Returns 0 when the count is estimated to be 0, or None when a meaningful
	/// estimate can not be made.			/// estimate can not be made.
	Optional<unsigned> getLoopEstimatedTripCount(Loop *L);			Optional<unsigned> getLoopEstimatedTripCount(Loop *L);
	▲ Show 20 Lines • Show All 74 Lines • Show Last 20 Lines
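
The three inheritance cases documented for the prefix parameter of `makeFollowupLoopID` above can be illustrated with a small stand-alone model. This is only a sketch: it works on plain strings rather than `MDNode` operands, `inheritAttrs` is a hypothetical name, and it covers the inheritance step alone, leaving out the followup attributes that are added afterwards.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Attrs = std::vector<std::string>;

// Model only: selects which attribute names survive from the original loop.
static Attrs inheritAttrs(const Attrs &Orig, const char *ExceptPrefix) {
  if (!ExceptPrefix)
    return Orig; // nullptr: inherit all attributes
  const std::string Prefix(ExceptPrefix);
  if (Prefix.empty())
    return {}; // "": inherit nothing
  Attrs Kept;
  for (const std::string &Name : Orig) {
    // Keep every attribute whose name does not start with the prefix.
    if (Name.compare(0, Prefix.size(), Prefix) != 0)
      Kept.push_back(Name);
  }
  return Kept;
}
```

Passing a prefix such as `"llvm.loop.unroll."` models the common case of dropping the metadata of the transformation that has just been applied while keeping everything else.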

include/llvm/Transforms/Utils/UnrollLoop.h

	Show All 29 Lines
	class Loop;			class Loop;
	class LoopInfo;			class LoopInfo;
	class MDNode;			class MDNode;
	class OptimizationRemarkEmitter;			class OptimizationRemarkEmitter;
	class ScalarEvolution;			class ScalarEvolution;

	using NewLoopsMap = SmallDenseMap<const Loop *, Loop *, 4>;			using NewLoopsMap = SmallDenseMap<const Loop *, Loop *, 4>;

				/// @{
				/// Metadata attribute names
				const char *const LLVMLoopUnrollFollowupAll = "llvm.loop.unroll.followup_all";
				const char *const LLVMLoopUnrollFollowupUnrolled =
				"llvm.loop.unroll.followup_unrolled";
				const char *const LLVMLoopUnrollFollowupRemainder =
				"llvm.loop.unroll.followup_remainder";
				/// @}

	const Loop* addClonedBlockToLoopInfo(BasicBlock *OriginalBB,			const Loop* addClonedBlockToLoopInfo(BasicBlock *OriginalBB,
	BasicBlock *ClonedBB, LoopInfo *LI,			BasicBlock *ClonedBB, LoopInfo *LI,
	NewLoopsMap &NewLoops);			NewLoopsMap &NewLoops);

	/// Represents the result of a \c UnrollLoop invocation.			/// Represents the result of a \c UnrollLoop invocation.
	enum class LoopUnrollResult {			enum class LoopUnrollResult {
	/// The loop was not modified.			/// The loop was not modified.
	Unmodified,			Unmodified,
	Show All 10 Lines

	LoopUnrollResult UnrollLoop(Loop *L, unsigned Count, unsigned TripCount,			LoopUnrollResult UnrollLoop(Loop *L, unsigned Count, unsigned TripCount,
	bool Force, bool AllowRuntime,			bool Force, bool AllowRuntime,
	bool AllowExpensiveTripCount, bool PreserveCondBr,			bool AllowExpensiveTripCount, bool PreserveCondBr,
	bool PreserveOnlyFirst, unsigned TripMultiple,			bool PreserveOnlyFirst, unsigned TripMultiple,
	unsigned PeelCount, bool UnrollRemainder,			unsigned PeelCount, bool UnrollRemainder,
	LoopInfo *LI, ScalarEvolution *SE,			LoopInfo *LI, ScalarEvolution *SE,
	DominatorTree *DT, AssumptionCache *AC,			DominatorTree *DT, AssumptionCache *AC,
	OptimizationRemarkEmitter *ORE, bool PreserveLCSSA);			OptimizationRemarkEmitter *ORE, bool PreserveLCSSA,
				Loop **RemainderLoop = nullptr);

	bool UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,			bool UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,
	bool AllowExpensiveTripCount,			bool AllowExpensiveTripCount,
	bool UseEpilogRemainder, bool UnrollRemainder,			bool UseEpilogRemainder, bool UnrollRemainder,
	LoopInfo *LI,			LoopInfo *LI, ScalarEvolution *SE,
	ScalarEvolution *SE, DominatorTree *DT,			DominatorTree *DT, AssumptionCache *AC,
	AssumptionCache *AC,			bool PreserveLCSSA,
	bool PreserveLCSSA);			Loop **ResultLoop = nullptr);

	void computePeelCount(Loop *L, unsigned LoopSize,			void computePeelCount(Loop *L, unsigned LoopSize,
	TargetTransformInfo::UnrollingPreferences &UP,			TargetTransformInfo::UnrollingPreferences &UP,
	unsigned &TripCount, ScalarEvolution &SE);			unsigned &TripCount, ScalarEvolution &SE);

	bool canPeel(Loop *L);			bool canPeel(Loop *L);

	bool peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI, ScalarEvolution *SE,			bool peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI, ScalarEvolution *SE,
	DominatorTree *DT, AssumptionCache *AC, bool PreserveLCSSA);			DominatorTree *DT, AssumptionCache *AC, bool PreserveLCSSA);

	LoopUnrollResult UnrollAndJamLoop(Loop *L, unsigned Count, unsigned TripCount,			LoopUnrollResult UnrollAndJamLoop(Loop *L, unsigned Count, unsigned TripCount,
	unsigned TripMultiple, bool UnrollRemainder,			unsigned TripMultiple, bool UnrollRemainder,
	LoopInfo *LI, ScalarEvolution *SE,			LoopInfo *LI, ScalarEvolution *SE,
	DominatorTree *DT, AssumptionCache *AC,			DominatorTree *DT, AssumptionCache *AC,
	OptimizationRemarkEmitter *ORE);			OptimizationRemarkEmitter *ORE,
				Loop **EpilogueLoop = nullptr);

	bool isSafeToUnrollAndJam(Loop *L, ScalarEvolution &SE, DominatorTree &DT,			bool isSafeToUnrollAndJam(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
	DependenceInfo &DI);			DependenceInfo &DI);

	bool computeUnrollCount(Loop *L, const TargetTransformInfo &TTI,			bool computeUnrollCount(Loop *L, const TargetTransformInfo &TTI,
	DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,			DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,
	const SmallPtrSetImpl<const Value *> &EphValues,			const SmallPtrSetImpl<const Value *> &EphValues,
	OptimizationRemarkEmitter *ORE, unsigned &TripCount,			OptimizationRemarkEmitter *ORE, unsigned &TripCount,
	Show All 31 Lines

include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

Show First 20 Lines • Show All 107 Lines • ▼ Show 20 Lines	public:
bool allowVectorization(Function *F, Loop *L, bool AlwaysVectorize) const;		bool allowVectorization(Function *F, Loop *L, bool AlwaysVectorize) const;

/// Dumps all the hint information.		/// Dumps all the hint information.
void emitRemarkWithHints() const;		void emitRemarkWithHints() const;

unsigned getWidth() const { return Width.Value; }		unsigned getWidth() const { return Width.Value; }
unsigned getInterleave() const { return Interleave.Value; }		unsigned getInterleave() const { return Interleave.Value; }
unsigned getIsVectorized() const { return IsVectorized.Value; }		unsigned getIsVectorized() const { return IsVectorized.Value; }
enum ForceKind getForce() const { return (ForceKind)Force.Value; }		enum ForceKind getForce() const {
		if (Force.Value == FK_Undefined && hasDisableAllTransformsHint(TheLoop))
		return FK_Disabled;
		return (ForceKind)Force.Value;
		}

/// If hints are provided that force vectorization, use the AlwaysPrint		/// If hints are provided that force vectorization, use the AlwaysPrint
/// pass name to force the frontend to print the diagnostic.		/// pass name to force the frontend to print the diagnostic.
const char *vectorizeAnalysisPassName() const;		const char *vectorizeAnalysisPassName() const;

bool allowReordering() const {		bool allowReordering() const {
// When enabling loop hints are provided we allow the vectorizer to change		// When enabling loop hints are provided we allow the vectorizer to change
// the order of operations that is given by the scalar loop. This is not		// the order of operations that is given by the scalar loop. This is not
▲ Show 20 Lines • Show All 367 Lines • Show Last 20 Lines

lib/Analysis/LoopInfo.cpp

Show First 20 Lines • Show All 231 Lines • ▼ Show 20 Lines	MDNode *Loop::getLoopID() const {
}		}
if (!LoopID || LoopID->getNumOperands() == 0 ||		if (!LoopID || LoopID->getNumOperands() == 0 ||
LoopID->getOperand(0) != LoopID)		LoopID->getOperand(0) != LoopID)
return nullptr;		return nullptr;
return LoopID;		return LoopID;
}		}

void Loop::setLoopID(MDNode *LoopID) const {		void Loop::setLoopID(MDNode *LoopID) const {
assert(LoopID && "Loop ID should not be null");		assert((!LoopID || LoopID->getNumOperands() > 0) &&
assert(LoopID->getNumOperands() > 0 && "Loop ID needs at least one operand");		"Loop ID needs at least one operand");
assert(LoopID->getOperand(0) == LoopID && "Loop ID should refer to itself");		assert((!LoopID || LoopID->getOperand(0) == LoopID) &&
		"Loop ID should refer to itself");

if (BasicBlock *Latch = getLoopLatch()) {
Latch->getTerminator()->setMetadata(LLVMContext::MD_loop, LoopID);
return;
}

assert(!getLoopLatch() &&
"The loop should have no single latch at this point");
BasicBlock *H = getHeader();		BasicBlock *H = getHeader();
for (BasicBlock *BB : this->blocks()) {		for (BasicBlock *BB : this->blocks()) {
Instruction *TI = BB->getTerminator();		Instruction *TI = BB->getTerminator();
for (BasicBlock *Successor : successors(TI)) {		for (BasicBlock *Successor : successors(TI)) {
if (Successor == H)		if (Successor == H) {
TI->setMetadata(LLVMContext::MD_loop, LoopID);		TI->setMetadata(LLVMContext::MD_loop, LoopID);
		break;
		}
}		}
}		}
}		}

void Loop::setLoopAlreadyUnrolled() {		void Loop::setLoopAlreadyUnrolled() {
MDNode *LoopID = getLoopID();		MDNode *LoopID = getLoopID();
// First remove any existing loop unrolling metadata.		// First remove any existing loop unrolling metadata.
SmallVector<Metadata *, 4> MDs;		SmallVector<Metadata *, 4> MDs;
▲ Show 20 Lines • Show All 495 Lines • Show Last 20 Lines

lib/Passes/PassBuilder.cpp

Show First 20 Lines • Show All 141 Lines • ▼ Show 20 Lines
#include "llvm/Transforms/Scalar/SROA.h"		#include "llvm/Transforms/Scalar/SROA.h"
#include "llvm/Transforms/Scalar/Scalarizer.h"		#include "llvm/Transforms/Scalar/Scalarizer.h"
#include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"		#include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"
#include "llvm/Transforms/Scalar/SimplifyCFG.h"		#include "llvm/Transforms/Scalar/SimplifyCFG.h"
#include "llvm/Transforms/Scalar/Sink.h"		#include "llvm/Transforms/Scalar/Sink.h"
#include "llvm/Transforms/Scalar/SpeculateAroundPHIs.h"		#include "llvm/Transforms/Scalar/SpeculateAroundPHIs.h"
#include "llvm/Transforms/Scalar/SpeculativeExecution.h"		#include "llvm/Transforms/Scalar/SpeculativeExecution.h"
#include "llvm/Transforms/Scalar/TailRecursionElimination.h"		#include "llvm/Transforms/Scalar/TailRecursionElimination.h"
		#include "llvm/Transforms/Scalar/WarnMissedTransforms.h"
#include "llvm/Transforms/Utils/AddDiscriminators.h"		#include "llvm/Transforms/Utils/AddDiscriminators.h"
#include "llvm/Transforms/Utils/BreakCriticalEdges.h"		#include "llvm/Transforms/Utils/BreakCriticalEdges.h"
#include "llvm/Transforms/Utils/EntryExitInstrumenter.h"		#include "llvm/Transforms/Utils/EntryExitInstrumenter.h"
#include "llvm/Transforms/Utils/LCSSA.h"		#include "llvm/Transforms/Utils/LCSSA.h"
#include "llvm/Transforms/Utils/LibCallsShrinkWrap.h"		#include "llvm/Transforms/Utils/LibCallsShrinkWrap.h"
#include "llvm/Transforms/Utils/LoopSimplify.h"		#include "llvm/Transforms/Utils/LoopSimplify.h"
#include "llvm/Transforms/Utils/LowerInvoke.h"		#include "llvm/Transforms/Utils/LowerInvoke.h"
#include "llvm/Transforms/Utils/Mem2Reg.h"		#include "llvm/Transforms/Utils/Mem2Reg.h"
▲ Show 20 Lines • Show All 670 Lines • ▼ Show 20 Lines	PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
// combiner for cleanup here so that the unrolling and LICM can be pipelined		// combiner for cleanup here so that the unrolling and LICM can be pipelined
// across the loop nests.		// across the loop nests.
// We do UnrollAndJam in a separate LPM to ensure it happens before unroll		// We do UnrollAndJam in a separate LPM to ensure it happens before unroll
if (EnableUnrollAndJam) {		if (EnableUnrollAndJam) {
OptimizePM.addPass(		OptimizePM.addPass(
createFunctionToLoopPassAdaptor(LoopUnrollAndJamPass(Level)));		createFunctionToLoopPassAdaptor(LoopUnrollAndJamPass(Level)));
}		}
OptimizePM.addPass(LoopUnrollPass(LoopUnrollOptions(Level)));		OptimizePM.addPass(LoopUnrollPass(LoopUnrollOptions(Level)));
		OptimizePM.addPass(WarnMissedTransformationsPass());
OptimizePM.addPass(InstCombinePass());		OptimizePM.addPass(InstCombinePass());
OptimizePM.addPass(RequireAnalysisPass<OptimizationRemarkEmitterAnalysis, Function>());		OptimizePM.addPass(RequireAnalysisPass<OptimizationRemarkEmitterAnalysis, Function>());
OptimizePM.addPass(createFunctionToLoopPassAdaptor(LICMPass(), DebugLogging));		OptimizePM.addPass(createFunctionToLoopPassAdaptor(LICMPass(), DebugLogging));

// Now that we've vectorized and unrolled loops, we may have more refined		// Now that we've vectorized and unrolled loops, we may have more refined
// alignment information, try to re-derive it here.		// alignment information, try to re-derive it here.
OptimizePM.addPass(AlignmentFromAssumptionsPass());		OptimizePM.addPass(AlignmentFromAssumptionsPass());

▲ Show 20 Lines • Show All 1,126 Lines • Show Last 20 Lines

lib/Passes/PassRegistry.def

	Show First 20 Lines • Show All 222 Lines • ▼ Show 20 Lines
	FUNCTION_PASS("unroll<peeling;no-runtime>",LoopUnrollPass(LoopUnrollOptions().setPeeling(true).setRuntime(false)))			FUNCTION_PASS("unroll<peeling;no-runtime>",LoopUnrollPass(LoopUnrollOptions().setPeeling(true).setRuntime(false)))
	FUNCTION_PASS("verify", VerifierPass())			FUNCTION_PASS("verify", VerifierPass())
	FUNCTION_PASS("verify<domtree>", DominatorTreeVerifierPass())			FUNCTION_PASS("verify<domtree>", DominatorTreeVerifierPass())
	FUNCTION_PASS("verify<loops>", LoopVerifierPass())			FUNCTION_PASS("verify<loops>", LoopVerifierPass())
	FUNCTION_PASS("verify<memoryssa>", MemorySSAVerifierPass())			FUNCTION_PASS("verify<memoryssa>", MemorySSAVerifierPass())
	FUNCTION_PASS("verify<regions>", RegionInfoVerifierPass())			FUNCTION_PASS("verify<regions>", RegionInfoVerifierPass())
	FUNCTION_PASS("view-cfg", CFGViewerPass())			FUNCTION_PASS("view-cfg", CFGViewerPass())
	FUNCTION_PASS("view-cfg-only", CFGOnlyViewerPass())			FUNCTION_PASS("view-cfg-only", CFGOnlyViewerPass())
				FUNCTION_PASS("transform-warning", WarnMissedTransformationsPass())
	#undef FUNCTION_PASS			#undef FUNCTION_PASS

	#ifndef LOOP_ANALYSIS			#ifndef LOOP_ANALYSIS
	#define LOOP_ANALYSIS(NAME, CREATE_PASS)			#define LOOP_ANALYSIS(NAME, CREATE_PASS)
	#endif			#endif
	LOOP_ANALYSIS("no-op-loop", NoOpLoopAnalysis())			LOOP_ANALYSIS("no-op-loop", NoOpLoopAnalysis())
	LOOP_ANALYSIS("access-info", LoopAccessAnalysis())			LOOP_ANALYSIS("access-info", LoopAccessAnalysis())
	LOOP_ANALYSIS("ivusers", IVUsersAnalysis())			LOOP_ANALYSIS("ivusers", IVUsersAnalysis())
	Show All 25 Lines

lib/Transforms/IPO/PassManagerBuilder.cpp

Show First 20 Lines • Show All 696 Lines • ▼ Show 20 Lines	if (!DisableUnrollLoops) {

// Runtime unrolling will introduce runtime check in loop prologue. If the		// Runtime unrolling will introduce runtime check in loop prologue. If the
// unrolled loop is a inner loop, then the prologue will be inside the		// unrolled loop is a inner loop, then the prologue will be inside the
// outer loop. LICM pass can help to promote the runtime check out if the		// outer loop. LICM pass can help to promote the runtime check out if the
// checked value is loop invariant.		// checked value is loop invariant.
MPM.add(createLICMPass());		MPM.add(createLICMPass());
}		}

		MPM.add(createWarnMissedTransformationsPass());

// After vectorization and unrolling, assume intrinsics may tell us more		// After vectorization and unrolling, assume intrinsics may tell us more
// about pointer alignments.		// about pointer alignments.
MPM.add(createAlignmentFromAssumptionsPass());		MPM.add(createAlignmentFromAssumptionsPass());

// FIXME: We shouldn't bother with this anymore.		// FIXME: We shouldn't bother with this anymore.
MPM.add(createStripDeadPrototypesPass()); // Get rid of dead prototypes		MPM.add(createStripDeadPrototypesPass()); // Get rid of dead prototypes

// GlobalOpt already deletes dead functions and globals, at -O2 try a		// GlobalOpt already deletes dead functions and globals, at -O2 try a
▲ Show 20 Lines • Show All 159 Lines • ▼ Show 20 Lines	void PassManagerBuilder::addLTOOptimizationPasses(legacy::PassManagerBase &PM) {

if (!DisableUnrollLoops)		if (!DisableUnrollLoops)
PM.add(createSimpleLoopUnrollPass(OptLevel)); // Unroll small loops		PM.add(createSimpleLoopUnrollPass(OptLevel)); // Unroll small loops
PM.add(createLoopVectorizePass(true, LoopVectorize));		PM.add(createLoopVectorizePass(true, LoopVectorize));
// The vectorizer may have significantly shortened a loop body; unroll again.		// The vectorizer may have significantly shortened a loop body; unroll again.
if (!DisableUnrollLoops)		if (!DisableUnrollLoops)
PM.add(createLoopUnrollPass(OptLevel));		PM.add(createLoopUnrollPass(OptLevel));

		PM.add(createWarnMissedTransformationsPass());

// Now that we've optimized loops (in particular loop induction variables),		// Now that we've optimized loops (in particular loop induction variables),
// we may have exposed more scalar opportunities. Run parts of the scalar		// we may have exposed more scalar opportunities. Run parts of the scalar
// optimizer again at this point.		// optimizer again at this point.
addInstructionCombiningPass(PM); // Initial cleanup		addInstructionCombiningPass(PM); // Initial cleanup
PM.add(createCFGSimplificationPass()); // if-convert		PM.add(createCFGSimplificationPass()); // if-convert
PM.add(createSCCPPass()); // Propagate exposed constants		PM.add(createSCCPPass()); // Propagate exposed constants
addInstructionCombiningPass(PM); // Clean up again		addInstructionCombiningPass(PM); // Clean up again
PM.add(createBitTrackingDCEPass());		PM.add(createBitTrackingDCEPass());
[187 lines not shown]

lib/Transforms/Scalar/CMakeLists.txt

[62 lines not shown]	add_llvm_library(LLVMScalarOpts
SimpleLoopUnswitch.cpp		SimpleLoopUnswitch.cpp
SimplifyCFGPass.cpp		SimplifyCFGPass.cpp
Sink.cpp		Sink.cpp
SpeculativeExecution.cpp		SpeculativeExecution.cpp
SpeculateAroundPHIs.cpp		SpeculateAroundPHIs.cpp
StraightLineStrengthReduce.cpp		StraightLineStrengthReduce.cpp
StructurizeCFG.cpp		StructurizeCFG.cpp
TailRecursionElimination.cpp		TailRecursionElimination.cpp
		WarnMissedTransforms.cpp

ADDITIONAL_HEADER_DIRS		ADDITIONAL_HEADER_DIRS
${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms		${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms
${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms/Scalar		${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms/Scalar

DEPENDS		DEPENDS
intrinsics_gen		intrinsics_gen
)		)

lib/Transforms/Scalar/LoopDistribute.cpp

[72 lines not shown]
#include <tuple>		#include <tuple>
#include <utility>		#include <utility>

using namespace llvm;		using namespace llvm;

#define LDIST_NAME "loop-distribute"		#define LDIST_NAME "loop-distribute"
#define DEBUG_TYPE LDIST_NAME		#define DEBUG_TYPE LDIST_NAME

		/// @{
		/// Metadata attribute names
		const char *const LLVMLoopDistributeFollowupAll =
		"llvm.loop.distribute.followup_all";
		const char *const LLVMLoopDistributeFollowupCoincident =
		"llvm.loop.distribute.followup_coincident";
		const char *const LLVMLoopDistributeFollowuSequential =
		hfinkel (Done): This should say Followup, not Followu, I suppose.
		"llvm.loop.distribute.followup_sequential";
		const char *const LLVMLoopDistributeFollowupFallback =
		"llvm.loop.distribute.followup_fallback";
		/// @}

static cl::opt<bool>		static cl::opt<bool>
LDistVerify("loop-distribute-verify", cl::Hidden,		LDistVerify("loop-distribute-verify", cl::Hidden,
cl::desc("Turn on DominatorTree and LoopInfo verification "		cl::desc("Turn on DominatorTree and LoopInfo verification "
"after Loop Distribution"),		"after Loop Distribution"),
cl::init(false));		cl::init(false));

static cl::opt<bool> DistributeNonIfConvertible(		static cl::opt<bool> DistributeNonIfConvertible(
"loop-distribute-non-if-convertible", cl::Hidden,		"loop-distribute-non-if-convertible", cl::Hidden,
[92 lines not shown]	public:

/// The cloned loop. If this partition is mapped to the original loop,		/// The cloned loop. If this partition is mapped to the original loop,
/// this is null.		/// this is null.
const Loop *getClonedLoop() const { return ClonedLoop; }		const Loop *getClonedLoop() const { return ClonedLoop; }

/// Returns the loop where this partition ends up after distribution.		/// Returns the loop where this partition ends up after distribution.
/// If this partition is mapped to the original loop then use the block from		/// If this partition is mapped to the original loop then use the block from
/// the loop.		/// the loop.
const Loop *getDistributedLoop() const {		Loop *getDistributedLoop() const {
return ClonedLoop ? ClonedLoop : OrigLoop;		return ClonedLoop ? ClonedLoop : OrigLoop;
}		}

/// The VMap that is populated by cloning and then used in		/// The VMap that is populated by cloning and then used in
/// remapinstruction to remap the cloned instructions.		/// remapinstruction to remap the cloned instructions.
ValueToValueMapTy &getVMap() { return VMap; }		ValueToValueMapTy &getVMap() { return VMap; }

/// Remaps the cloned instructions using VMap.		/// Remaps the cloned instructions using VMap.
[240 lines not shown]	void cloneLoops() {
Loop *NewLoop;		Loop *NewLoop;

assert(!PartitionContainer.empty() && "at least two partitions expected");		assert(!PartitionContainer.empty() && "at least two partitions expected");
// We're cloning the preheader along with the loop so we already made sure		// We're cloning the preheader along with the loop so we already made sure
// it was empty.		// it was empty.
assert(&*OrigPH->begin() == OrigPH->getTerminator() &&		assert(&*OrigPH->begin() == OrigPH->getTerminator() &&
"preheader not empty");		"preheader not empty");

		// Preserve the original loop ID for use after the transformation.
		MDNode *OrigLoopID = L->getLoopID();

// Create a loop for each partition except the last. Clone the original		// Create a loop for each partition except the last. Clone the original
// loop before PH along with adding a preheader for the cloned loop. Then		// loop before PH along with adding a preheader for the cloned loop. Then
// update PH to point to the newly added preheader.		// update PH to point to the newly added preheader.
BasicBlock *TopPH = OrigPH;		BasicBlock *TopPH = OrigPH;
unsigned Index = getSize() - 1;		unsigned Index = getSize() - 1;
for (auto I = std::next(PartitionContainer.rbegin()),		for (auto I = std::next(PartitionContainer.rbegin()),
E = PartitionContainer.rend();		E = PartitionContainer.rend();
I != E; ++I, --Index, TopPH = NewLoop->getLoopPreheader()) {		I != E; ++I, --Index, TopPH = NewLoop->getLoopPreheader()) {
auto *Part = &*I;		auto *Part = &*I;

NewLoop = Part->cloneLoopWithPreheader(TopPH, Pred, Index, LI, DT);		NewLoop = Part->cloneLoopWithPreheader(TopPH, Pred, Index, LI, DT);

Part->getVMap()[ExitBlock] = TopPH;		Part->getVMap()[ExitBlock] = TopPH;
Part->remapInstructions();		Part->remapInstructions();
		setNewLoopID(OrigLoopID, Part);
}		}
Pred->getTerminator()->replaceUsesOfWith(OrigPH, TopPH);		Pred->getTerminator()->replaceUsesOfWith(OrigPH, TopPH);

		// Also set a new loop ID for the last loop.
		setNewLoopID(OrigLoopID, &PartitionContainer.back());

// Now go in forward order and update the immediate dominator for the		// Now go in forward order and update the immediate dominator for the
// preheaders with the exiting block of the previous loop. Dominance		// preheaders with the exiting block of the previous loop. Dominance
// within the loop is updated in cloneLoopWithPreheader.		// within the loop is updated in cloneLoopWithPreheader.
for (auto Curr = PartitionContainer.cbegin(),		for (auto Curr = PartitionContainer.cbegin(),
Next = std::next(PartitionContainer.cbegin()),		Next = std::next(PartitionContainer.cbegin()),
E = PartitionContainer.cend();		E = PartitionContainer.cend();
Next != E; ++Curr, ++Next)		Next != E; ++Curr, ++Next)
DT->changeImmediateDominator(		DT->changeImmediateDominator(
[99 lines not shown]	for (auto I = PartitionContainer.begin(); I != PartitionContainer.end();) {
I->moveTo(*PrevMatch);		I->moveTo(*PrevMatch);
I = PartitionContainer.erase(I);		I = PartitionContainer.erase(I);
} else {		} else {
PrevMatch = nullptr;		PrevMatch = nullptr;
++I;		++I;
}		}
}		}
}		}

		/// Assign new LoopIDs for the partition's cloned loop.
		void setNewLoopID(MDNode *OrigLoopID, InstPartition *Part) {
		Optional<MDNode *> PartitionID = makeFollowupLoopID(
		OrigLoopID,
		{LLVMLoopDistributeFollowupAll,
		Part->hasDepCycle() ? LLVMLoopDistributeFollowuSequential
		: LLVMLoopDistributeFollowupCoincident});
		if (PartitionID.hasValue()) {
		Loop *NewLoop = Part->getDistributedLoop();
		NewLoop->setLoopID(PartitionID.getValue());
		}
		}
};		};

/// For each memory instruction, this class maintains difference of the		/// For each memory instruction, this class maintains difference of the
/// number of unsafe dependences that start out from this instruction minus		/// number of unsafe dependences that start out from this instruction minus
/// those that end here.		/// those that end here.
///		///
/// By traversing the memory instructions in program order and accumulating this		/// By traversing the memory instructions in program order and accumulating this
/// number, we know whether any unsafe dependence crosses over a program point.		/// number, we know whether any unsafe dependence crosses over a program point.
[152 lines not shown]	bool processLoop(std::function<const LoopAccessInfo &(Loop &)> &GetLAA) {
// Don't distribute the loop if we need too many SCEV run-time checks.		// Don't distribute the loop if we need too many SCEV run-time checks.
const SCEVUnionPredicate &Pred = LAI->getPSE().getUnionPredicate();		const SCEVUnionPredicate &Pred = LAI->getPSE().getUnionPredicate();
if (Pred.getComplexity() > (IsForced.getValueOr(false)		if (Pred.getComplexity() > (IsForced.getValueOr(false)
? PragmaDistributeSCEVCheckThreshold		? PragmaDistributeSCEVCheckThreshold
: DistributeSCEVCheckThreshold))		: DistributeSCEVCheckThreshold))
return fail("TooManySCEVRuntimeChecks",		return fail("TooManySCEVRuntimeChecks",
"too many SCEV run-time checks needed.\n");		"too many SCEV run-time checks needed.\n");

		if (!IsForced.getValueOr(false) && hasDisableAllTransformsHint(L))
		return fail("HeuristicDisabled", "distribution heuristic disabled");

LLVM_DEBUG(dbgs() << "\nDistributing loop: " << *L << "\n");		LLVM_DEBUG(dbgs() << "\nDistributing loop: " << *L << "\n");
// We're done forming the partitions set up the reverse mapping from		// We're done forming the partitions set up the reverse mapping from
// instructions to partitions.		// instructions to partitions.
Partitions.setupPartitionIdOnInstructions();		Partitions.setupPartitionIdOnInstructions();

// To keep things simple have an empty preheader before we version or clone		// To keep things simple have an empty preheader before we version or clone
// the loop. (Also split if this has no predecessor, i.e. entry, because we		// the loop. (Also split if this has no predecessor, i.e. entry, because we
// rely on PH having a predecessor.)		// rely on PH having a predecessor.)
if (!PH->getSinglePredecessor() || &*PH->begin() != PH->getTerminator())		if (!PH->getSinglePredecessor() || &*PH->begin() != PH->getTerminator())
SplitBlock(PH, PH->getTerminator(), DT, LI);		SplitBlock(PH, PH->getTerminator(), DT, LI);

// If we need run-time checks, version the loop now.		// If we need run-time checks, version the loop now.
auto PtrToPartition = Partitions.computePartitionSetForPointers(*LAI);		auto PtrToPartition = Partitions.computePartitionSetForPointers(*LAI);
const auto *RtPtrChecking = LAI->getRuntimePointerChecking();		const auto *RtPtrChecking = LAI->getRuntimePointerChecking();
const auto &AllChecks = RtPtrChecking->getChecks();		const auto &AllChecks = RtPtrChecking->getChecks();
auto Checks = includeOnlyCrossPartitionChecks(AllChecks, PtrToPartition,		auto Checks = includeOnlyCrossPartitionChecks(AllChecks, PtrToPartition,
RtPtrChecking);		RtPtrChecking);

if (!Pred.isAlwaysTrue() || !Checks.empty()) {		if (!Pred.isAlwaysTrue() || !Checks.empty()) {
		MDNode *OrigLoopID = L->getLoopID();

LLVM_DEBUG(dbgs() << "\nPointers:\n");		LLVM_DEBUG(dbgs() << "\nPointers:\n");
LLVM_DEBUG(LAI->getRuntimePointerChecking()->printChecks(dbgs(), Checks));		LLVM_DEBUG(LAI->getRuntimePointerChecking()->printChecks(dbgs(), Checks));
LoopVersioning LVer(*LAI, L, LI, DT, SE, false);		LoopVersioning LVer(*LAI, L, LI, DT, SE, false);
LVer.setAliasChecks(std::move(Checks));		LVer.setAliasChecks(std::move(Checks));
LVer.setSCEVChecks(LAI->getPSE().getUnionPredicate());		LVer.setSCEVChecks(LAI->getPSE().getUnionPredicate());
LVer.versionLoop(DefsUsedOutside);		LVer.versionLoop(DefsUsedOutside);
LVer.annotateLoopWithNoAlias();		LVer.annotateLoopWithNoAlias();

		// The unversioned loop will not be changed, so we inherit all attributes
		// from the original loop, but remove the loop distribution metadata to
		// avoid distributing it again.
		MDNode *UnversionedLoopID =
		makeFollowupLoopID(OrigLoopID,
		{LLVMLoopDistributeFollowupAll,
		LLVMLoopDistributeFollowupFallback},
		"llvm.loop.distribute.", true)
		.getValue();
		LVer.getNonVersionedLoop()->setLoopID(UnversionedLoopID);
}		}

// Create identical copies of the original loop for each partition and hook		// Create identical copies of the original loop for each partition and hook
// them up sequentially.		// them up sequentially.
Partitions.cloneLoops();		Partitions.cloneLoops();

// Now, we remove the instruction from each loop that don't belong to that		// Now, we remove the instruction from each loop that don't belong to that
// partition.		// partition.
[249 lines not shown]
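
As a reading aid for the LoopDistribute.cpp changes above, the following is a minimal sketch of which followup attribute names are requested for each resulting loop. It does not use LLVM; `PartitionModel`, `followupNamesForPartition` and `followupNamesForFallback` are hypothetical names introduced only for illustration, and the sketch says nothing about how `makeFollowupLoopID` combines the names. The attribute strings are the ones defined in the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a partition: only the dependence-cycle flag
// matters for choosing the followup attribute names.
struct PartitionModel {
  bool HasDepCycle;
};

// Attribute names as defined in the patch.
const char *const FollowupAll = "llvm.loop.distribute.followup_all";
const char *const FollowupCoincident =
    "llvm.loop.distribute.followup_coincident";
const char *const FollowupSequential =
    "llvm.loop.distribute.followup_sequential";
const char *const FollowupFallback = "llvm.loop.distribute.followup_fallback";

// Names requested for a distributed partition (mirrors setNewLoopID):
// followup_all plus either followup_sequential or followup_coincident.
std::vector<std::string> followupNamesForPartition(const PartitionModel &P) {
  return {FollowupAll, P.HasDepCycle ? FollowupSequential : FollowupCoincident};
}

// Names requested for the unversioned fallback loop (mirrors processLoop).
std::vector<std::string> followupNamesForFallback() {
  return {FollowupAll, FollowupFallback};
}
```

In this model a partition with a dependence cycle asks for the `followup_sequential` attributes and every other partition asks for `followup_coincident`; both also ask for `followup_all`.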

lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp

[50 lines not shown]
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <string>		#include <string>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "loop-unroll-and-jam"		#define DEBUG_TYPE "loop-unroll-and-jam"

		/// @{
		/// Metadata attribute names
		const char *const LLVMLoopUnrollAndJamFollowupAll =
		"llvm.loop.unroll_and_jam.followup_all";
		const char *const LLVMLoopUnrollAndJamFollowupInner =
		"llvm.loop.unroll_and_jam.followup_inner";
		const char *const LLVMLoopUnrollAndJamFollowupOuter =
		"llvm.loop.unroll_and_jam.followup_outer";
		const char *const LLVMLoopUnrollAndJamFollowupRemainderInner =
		"llvm.loop.unroll_and_jam.followup_remainder_inner";
		const char *const LLVMLoopUnrollAndJamFollowupRemainderOuter =
		"llvm.loop.unroll_and_jam.followup_remainder_outer";
		/// @}

static cl::opt<bool>		static cl::opt<bool>
AllowUnrollAndJam("allow-unroll-and-jam", cl::Hidden,		AllowUnrollAndJam("allow-unroll-and-jam", cl::Hidden,
cl::desc("Allows loops to be unroll-and-jammed."));		cl::desc("Allows loops to be unroll-and-jammed."));

static cl::opt<unsigned> UnrollAndJamCount(		static cl::opt<unsigned> UnrollAndJamCount(
"unroll-and-jam-count", cl::Hidden,		"unroll-and-jam-count", cl::Hidden,
cl::desc("Use this unroll count for all loops including those with "		cl::desc("Use this unroll count for all loops including those with "
"unroll_and_jam_count pragma values, for testing purposes"));		"unroll_and_jam_count pragma values, for testing purposes"));
[40 lines not shown]	static bool HasAnyUnrollPragma(const Loop *L, StringRef Prefix) {
return false;		return false;
}		}

// Returns true if the loop has an unroll_and_jam(enable) pragma.		// Returns true if the loop has an unroll_and_jam(enable) pragma.
static bool HasUnrollAndJamEnablePragma(const Loop *L) {		static bool HasUnrollAndJamEnablePragma(const Loop *L) {
return GetUnrollMetadataForLoop(L, "llvm.loop.unroll_and_jam.enable");		return GetUnrollMetadataForLoop(L, "llvm.loop.unroll_and_jam.enable");
}		}

// Returns true if the loop has an unroll_and_jam(disable) pragma.
static bool HasUnrollAndJamDisablePragma(const Loop *L) {
return GetUnrollMetadataForLoop(L, "llvm.loop.unroll_and_jam.disable");
}

// If loop has an unroll_and_jam_count pragma return the (necessarily		// If loop has an unroll_and_jam_count pragma return the (necessarily
// positive) value from the pragma. Otherwise return 0.		// positive) value from the pragma. Otherwise return 0.
static unsigned UnrollAndJamCountPragmaValue(const Loop *L) {		static unsigned UnrollAndJamCountPragmaValue(const Loop *L) {
MDNode *MD = GetUnrollMetadataForLoop(L, "llvm.loop.unroll_and_jam.count");		MDNode *MD = GetUnrollMetadataForLoop(L, "llvm.loop.unroll_and_jam.count");
if (MD) {		if (MD) {
assert(MD->getNumOperands() == 2 &&		assert(MD->getNumOperands() == 2 &&
"Unroll count hint metadata should have two operands.");		"Unroll count hint metadata should have two operands.");
unsigned Count =		unsigned Count =
[60 lines not shown]	if (PragmaCount > 0) {
UP.Force = true;		UP.Force = true;
if ((UP.AllowRemainder || (OuterTripMultiple % PragmaCount == 0)) &&		if ((UP.AllowRemainder || (OuterTripMultiple % PragmaCount == 0)) &&
getUnrollAndJammedLoopSize(OuterLoopSize, UP) < UP.Threshold &&		getUnrollAndJammedLoopSize(OuterLoopSize, UP) < UP.Threshold &&
getUnrollAndJammedLoopSize(InnerLoopSize, UP) <		getUnrollAndJammedLoopSize(InnerLoopSize, UP) <
UP.UnrollAndJamInnerLoopThreshold)		UP.UnrollAndJamInnerLoopThreshold)
return true;		return true;
}		}

bool PragmaEnableUnroll = HasUnrollAndJamEnablePragma(L);		bool PragmaEnableUnroll = HasUnrollAndJamEnablePragma(L);
		dmgreen (Done): This code will need rebasing. There is a check earlier that looks for disable metadata that could be replaced by this. Look for HasUnrollDisablePragma/HasUnrollAndJamDisablePragma. If the same was done for unrolling, I think that would remove the need for the IgnoreUser (although your comment about it is probably still true).
		Meinersbur (Author, Not Done): I'd push towards refactoring out the common parts of `computeUnrollCount` used by LoopUnroll and LoopUnrollAndJam. Currently `computeUnrollCount` uses lots of settings meant for LoopUnroll (`llvm.loop.unroll.` metadata which should not exist anymore, OptimizationRemarkMissed specific to LoopUnroll, `-unroll-count`, `PartialThreshold`, handling of full unroll, loop peeling that UnrollAndJam does not support, being used in a single call by UnrollAndJam for two different things: determining `ExplicitUnroll` (i.e. whether normal unroll is forced) and the unroll-and-jam count). It's hard to understand the subtleties between those codes. I gave up at some point and added the `IgnoreUser` flag to make test cases pass.
		Meinersbur (Author, Not Done): `IgnoreUser` exists for the `nounroll_plus_unroll_and_jam` in `LoopUnrollAndJam\pragma.ll`. `llvm.loop.unroll.disable` causes `hasUnrollTransformation` in `computeUnrollCount` to return `TM_Disable`. Unrolling inside `computeUnrollCount` is disabled by setting the unroll factor to 0. UnrollAndJam then tries to use that unroll factor.
		dmgreen (Not Done): Have you considered moving the "disable" check out of computeUnrollCount and into tryUnrollLoop, where the existing "HasUnrollDisablePragma" check is? Hopefully that could be replaced with the new method, much like HasUnrollAndJamDisablePragma has been, and would mean this computeUnrollCount function would just work as it used to.
		Meinersbur (Author, Not Done): There are multiple mechanisms in `computeUnrollCount` that disable unrolling (such as `UnrollCountPragmaValue` returning zero). If I were to fix this method, I'd do it cleanly by refactoring out the mechanism that computes the unroll factor when pragmas/options are absent (and not emit any LoopUnroll-specific diagnostics).
bool ExplicitUnrollAndJamCount = PragmaCount > 0 || UserUnrollCount;		bool ExplicitUnrollAndJamCount = PragmaCount > 0 || UserUnrollCount;
		dmgreen (Not Done): Same as unroll. What if PragmaCount and hasDisableAllTransformsHint?
bool ExplicitUnrollAndJam = PragmaEnableUnroll || ExplicitUnrollAndJamCount;		bool ExplicitUnrollAndJam = PragmaEnableUnroll || ExplicitUnrollAndJamCount;

// If the loop has an unrolling pragma, we want to be more aggressive with		// If the loop has an unrolling pragma, we want to be more aggressive with
// unrolling limits.		// unrolling limits.
if (ExplicitUnrollAndJam)		if (ExplicitUnrollAndJam)
UP.UnrollAndJamInnerLoopThreshold = PragmaUnrollAndJamThreshold;		UP.UnrollAndJamInnerLoopThreshold = PragmaUnrollAndJamThreshold;

if (!UP.AllowRemainder && getUnrollAndJammedLoopSize(InnerLoopSize, UP) >=		if (!UP.AllowRemainder && getUnrollAndJammedLoopSize(InnerLoopSize, UP) >=
[88 lines not shown]	tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI,
// Exit early if unrolling is disabled.		// Exit early if unrolling is disabled.
if (!UP.UnrollAndJam || UP.UnrollAndJamInnerLoopThreshold == 0)		if (!UP.UnrollAndJam || UP.UnrollAndJamInnerLoopThreshold == 0)
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;

LLVM_DEBUG(dbgs() << "Loop Unroll and Jam: F["		LLVM_DEBUG(dbgs() << "Loop Unroll and Jam: F["
<< L->getHeader()->getParent()->getName() << "] Loop %"		<< L->getHeader()->getParent()->getName() << "] Loop %"
<< L->getHeader()->getName() << "\n");		<< L->getHeader()->getName() << "\n");

		TransformationMode EnableMode = hasUnrollAndJamTransformation(L);
		if (EnableMode & TM_Disable)
		return LoopUnrollResult::Unmodified;

// A loop with any unroll pragma (enabling/disabling/count/etc) is left for		// A loop with any unroll pragma (enabling/disabling/count/etc) is left for
// the unroller, so long as it does not explicitly have unroll_and_jam		// the unroller, so long as it does not explicitly have unroll_and_jam
// metadata. This means #pragma nounroll will disable unroll and jam as well		// metadata. This means #pragma nounroll will disable unroll and jam as well
// as unrolling		// as unrolling
if (HasUnrollAndJamDisablePragma(L) ||		if (HasAnyUnrollPragma(L, "llvm.loop.unroll.") &&
(HasAnyUnrollPragma(L, "llvm.loop.unroll.") &&		!HasAnyUnrollPragma(L, "llvm.loop.unroll_and_jam.")) {
!HasAnyUnrollPragma(L, "llvm.loop.unroll_and_jam."))) {
LLVM_DEBUG(dbgs() << " Disabled due to pragma.\n");		LLVM_DEBUG(dbgs() << " Disabled due to pragma.\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}

if (!isSafeToUnrollAndJam(L, SE, DT, DI)) {		if (!isSafeToUnrollAndJam(L, SE, DT, DI)) {
LLVM_DEBUG(dbgs() << " Disabled due to not being safe.\n");		LLVM_DEBUG(dbgs() << " Disabled due to not being safe.\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}
[22 lines not shown]	if (NumInlineCandidates != 0) {
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}
if (Convergent) {		if (Convergent) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << " Not unrolling loop with convergent instructions.\n");		dbgs() << " Not unrolling loop with convergent instructions.\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}

		// Save original loop IDs for after the transformation.
		MDNode *OrigOuterLoopID = L->getLoopID();
		MDNode *OrigSubLoopID = SubLoop->getLoopID();
		dmgreen (Done): This is called SubLoop here

		// To assign the loop id of the epilogue, assign it before unrolling it so it
		// is applied to every inner loop of the epilogue. We later apply the loop ID
		hfinkel (Not Done): So each inner loop gets the same id? That doesn't sound right.
		Meinersbur (Author, Not Done): `LoopID` is a misnomer. A LoopID is neither unique (multiple loops can have the same LoopID, e.g. because LoopVersioning or any pass not aware of loops copied the BBs of a loop; there is even a regression test for the behaviour with non-unique LoopID) nor identifying (adding/removing attributes in the LoopID MDNode will create a new MDNode; see D52116 for a fix for llvm.loop.parallel_accesses assuming this property). Fixing LoopID to be identifier-like is not possible with the current MDNode structure and would require making any pass that copies code aware of LoopIDs. It would be easier to not assume that LoopID has any identifying properties. I am open to renaming 'LoopID' to something else.
		// for the jammed inner loop.
		Optional<MDNode *> NewInnerEpilogueLoopID = makeFollowupLoopID(
		OrigOuterLoopID, {LLVMLoopUnrollAndJamFollowupAll,
		LLVMLoopUnrollAndJamFollowupRemainderInner});
		if (NewInnerEpilogueLoopID.hasValue())
		SubLoop->setLoopID(NewInnerEpilogueLoopID.getValue());

// Find trip count and trip multiple		// Find trip count and trip multiple
unsigned OuterTripCount = SE.getSmallConstantTripCount(L, Latch);		unsigned OuterTripCount = SE.getSmallConstantTripCount(L, Latch);
unsigned OuterTripMultiple = SE.getSmallConstantTripMultiple(L, Latch);		unsigned OuterTripMultiple = SE.getSmallConstantTripMultiple(L, Latch);
unsigned InnerTripCount = SE.getSmallConstantTripCount(SubLoop, SubLoopLatch);		unsigned InnerTripCount = SE.getSmallConstantTripCount(SubLoop, SubLoopLatch);

// Decide if, and by how much, to unroll		// Decide if, and by how much, to unroll
bool IsCountSetExplicitly = computeUnrollAndJamCount(		bool IsCountSetExplicitly = computeUnrollAndJamCount(
L, SubLoop, TTI, DT, LI, SE, EphValues, &ORE, OuterTripCount,		L, SubLoop, TTI, DT, LI, SE, EphValues, &ORE, OuterTripCount,
OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP);		OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP);
if (UP.Count <= 1)		if (UP.Count <= 1)
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
// Unroll factor (Count) must be less or equal to TripCount.		// Unroll factor (Count) must be less or equal to TripCount.
if (OuterTripCount && UP.Count > OuterTripCount)		if (OuterTripCount && UP.Count > OuterTripCount)
UP.Count = OuterTripCount;		UP.Count = OuterTripCount;

LoopUnrollResult UnrollResult =		Loop *EpilogueOuterLoop = nullptr;
UnrollAndJamLoop(L, UP.Count, OuterTripCount, OuterTripMultiple,		LoopUnrollResult UnrollResult = UnrollAndJamLoop(
UP.UnrollRemainder, LI, &SE, &DT, &AC, &ORE);		L, UP.Count, OuterTripCount, OuterTripMultiple, UP.UnrollRemainder, LI,
		&SE, &DT, &AC, &ORE, &EpilogueOuterLoop);

		// Assign new loop attributes.
		if (EpilogueOuterLoop) {
		Optional<MDNode *> NewOuterEpilogueLoopID = makeFollowupLoopID(
		OrigOuterLoopID, {LLVMLoopUnrollAndJamFollowupAll,
		LLVMLoopUnrollAndJamFollowupRemainderOuter});
		if (NewOuterEpilogueLoopID.hasValue())
		EpilogueOuterLoop->setLoopID(NewOuterEpilogueLoopID.getValue());
		}

		Optional<MDNode *> NewInnerLoopID =
		makeFollowupLoopID(OrigOuterLoopID, {LLVMLoopUnrollAndJamFollowupAll,
		LLVMLoopUnrollAndJamFollowupInner});
		if (NewInnerLoopID.hasValue())
		SubLoop->setLoopID(NewInnerLoopID.getValue());
		else
		SubLoop->setLoopID(OrigSubLoopID);

		if (UnrollResult == LoopUnrollResult::PartiallyUnrolled) {
		Optional<MDNode *> NewOuterLoopID = makeFollowupLoopID(
		OrigOuterLoopID,
		{LLVMLoopUnrollAndJamFollowupAll, LLVMLoopUnrollAndJamFollowupOuter});
		if (NewOuterLoopID.hasValue()) {
		L->setLoopID(NewOuterLoopID.getValue());

		// Do not setLoopAlreadyUnrolled if a followup was given.
		return UnrollResult;
		}
		}

// If loop has an unroll count pragma or unrolled by explicitly set count		// If loop has an unroll count pragma or unrolled by explicitly set count
// mark loop as unrolled to prevent unrolling beyond that requested.		// mark loop as unrolled to prevent unrolling beyond that requested.
if (UnrollResult != LoopUnrollResult::FullyUnrolled && IsCountSetExplicitly)		if (UnrollResult != LoopUnrollResult::FullyUnrolled && IsCountSetExplicitly)
L->setLoopAlreadyUnrolled();		L->setLoopAlreadyUnrolled();

return UnrollResult;		return UnrollResult;
}		}
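
The tail of the function above decides whether the outer loop is marked as already unrolled. The following standalone sketch restates that decision under simplifying assumptions: `UnrollResultModel` and `shouldMarkAlreadyUnrolled` are hypothetical names, and `HasOuterFollowup` stands for "`makeFollowupLoopID` returned a value for the outer loop". It is a reading aid, not the pass's code.

```cpp
#include <cassert>

// Mirrors the three LoopUnrollResult outcomes named in the patch.
enum class UnrollResultModel { Unmodified, PartiallyUnrolled, FullyUnrolled };

// Returns true when the outer loop would be marked via
// setLoopAlreadyUnrolled() at the end of the function.
bool shouldMarkAlreadyUnrolled(UnrollResultModel Result, bool HasOuterFollowup,
                               bool IsCountSetExplicitly) {
  // If a followup LoopID was assigned to a partially unrolled outer loop,
  // the function returns early and the marker is not added.
  if (Result == UnrollResultModel::PartiallyUnrolled && HasOuterFollowup)
    return false;
  // Otherwise the pre-existing rule applies: mark loops that were not fully
  // unrolled and whose count was set explicitly.
  return Result != UnrollResultModel::FullyUnrolled && IsCountSetExplicitly;
}
```

Read this way, a followup attribute list on the outer loop takes precedence over the "already unrolled" marker, which matches the comment "Do not setLoopAlreadyUnrolled if a followup was given".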
[90 lines not shown]
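
The inline discussion in this file argues that a LoopID is neither unique nor identifying. A toy model may make those two properties concrete. It assumes an immutable attribute list shared by reference in place of the real MDNode, and all names (`AttrList`, `LoopIDModel`, `LoopModel`, `cloneLoop`, `withAttribute`) are hypothetical; nothing here is LLVM API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: an immutable attribute list stands in for the MDNode.
using AttrList = std::vector<std::string>;
using LoopIDModel = std::shared_ptr<const AttrList>;

struct LoopModel {
  LoopIDModel ID;
};

// A pass that copies a loop without being loop-aware copies the reference,
// so both loops end up with the same "LoopID" (not unique).
LoopModel cloneLoop(const LoopModel &L) { return LoopModel{L.ID}; }

// Adding an attribute builds a new node and leaves the old one untouched,
// so the "LoopID" of a loop changes over time (not identifying).
LoopIDModel withAttribute(const LoopIDModel &ID, const std::string &Attr) {
  assert(ID && "expected an existing attribute list");
  AttrList Copy = *ID;
  Copy.push_back(Attr);
  return std::make_shared<AttrList>(std::move(Copy));
}
```

In this model, comparing two `LoopIDModel` pointers tells you only whether two loops currently share an attribute list, not whether they are the same loop.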

lib/Transforms/Scalar/LoopUnrollPass.cpp

[655 lines not shown]
}		}

// Returns true if the loop has an unroll(enable) pragma. This metadata is used		// Returns true if the loop has an unroll(enable) pragma. This metadata is used
// for both "#pragma unroll" and "#pragma clang loop unroll(enable)" directives.		// for both "#pragma unroll" and "#pragma clang loop unroll(enable)" directives.
static bool HasUnrollEnablePragma(const Loop *L) {		static bool HasUnrollEnablePragma(const Loop *L) {
return GetUnrollMetadataForLoop(L, "llvm.loop.unroll.enable");		return GetUnrollMetadataForLoop(L, "llvm.loop.unroll.enable");
}		}

// Returns true if the loop has an unroll(disable) pragma.
static bool HasUnrollDisablePragma(const Loop *L) {
return GetUnrollMetadataForLoop(L, "llvm.loop.unroll.disable");
}

// Returns true if the loop has an runtime unroll(disable) pragma.		// Returns true if the loop has an runtime unroll(disable) pragma.
static bool HasRuntimeUnrollDisablePragma(const Loop *L) {		static bool HasRuntimeUnrollDisablePragma(const Loop *L) {
return GetUnrollMetadataForLoop(L, "llvm.loop.unroll.runtime.disable");		return GetUnrollMetadataForLoop(L, "llvm.loop.unroll.runtime.disable");
}		}

// If loop has an unroll_count pragma return the (necessarily		// If loop has an unroll_count pragma return the (necessarily
// positive) value from the pragma. Otherwise return 0.		// positive) value from the pragma. Otherwise return 0.
static unsigned UnrollCountPragmaValue(const Loop *L) {		static unsigned UnrollCountPragmaValue(const Loop *L) {
[31 lines not shown]	static uint64_t getUnrolledLoopSize(
unsigned LoopSize,		unsigned LoopSize,
TargetTransformInfo::UnrollingPreferences &UP) {		TargetTransformInfo::UnrollingPreferences &UP) {
assert(LoopSize >= UP.BEInsns && "LoopSize should not be less than BEInsns!");		assert(LoopSize >= UP.BEInsns && "LoopSize should not be less than BEInsns!");
return (uint64_t)(LoopSize - UP.BEInsns) * UP.Count + UP.BEInsns;		return (uint64_t)(LoopSize - UP.BEInsns) * UP.Count + UP.BEInsns;
}		}
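
For readers checking the size estimate by hand, here is a standalone restatement of the formula with one worked value. `UnrollPrefsModel` and `unrolledLoopSize` are hypothetical stand-ins that carry only the two fields the formula reads; the arithmetic itself is copied from `getUnrolledLoopSize` above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two UnrollingPreferences fields used here.
struct UnrollPrefsModel {
  unsigned BEInsns; // counted once in the result, not once per copy
  unsigned Count;   // unroll factor
};

// Same arithmetic as getUnrolledLoopSize: the part of the loop beyond
// BEInsns is multiplied by Count, and BEInsns is added back once.
uint64_t unrolledLoopSize(unsigned LoopSize, const UnrollPrefsModel &UP) {
  assert(LoopSize >= UP.BEInsns && "LoopSize should not be less than BEInsns!");
  return static_cast<uint64_t>(LoopSize - UP.BEInsns) * UP.Count + UP.BEInsns;
}
```

With `LoopSize = 10`, `BEInsns = 2` and `Count = 4`, the estimate is `(10 - 2) * 4 + 2 = 34`; with `Count = 1` it reduces to the original size of 10.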

// Returns true if unroll count was set explicitly.		// Returns true if unroll count was set explicitly.
// Calculates unroll count and writes it to UP.Count.		// Calculates unroll count and writes it to UP.Count.
		// Unless IgnoreUser is true, will also use metadata and command-line options
		// that are specific to the LoopUnroll pass (which, for instance, are
		// irrelevant for the LoopUnrollAndJam pass).
		// FIXME: This function is used by LoopUnroll and LoopUnrollAndJam, but consumes
		// many LoopUnroll-specific options. The shared functionality should be
		// refactored into its own function.
bool llvm::computeUnrollCount(		bool llvm::computeUnrollCount(
Loop *L, const TargetTransformInfo &TTI, DominatorTree &DT, LoopInfo *LI,		Loop *L, const TargetTransformInfo &TTI, DominatorTree &DT, LoopInfo *LI,
ScalarEvolution &SE, const SmallPtrSetImpl<const Value *> &EphValues,		ScalarEvolution &SE, const SmallPtrSetImpl<const Value *> &EphValues,
OptimizationRemarkEmitter *ORE, unsigned &TripCount, unsigned MaxTripCount,		OptimizationRemarkEmitter *ORE, unsigned &TripCount, unsigned MaxTripCount,
unsigned &TripMultiple, unsigned LoopSize,		unsigned &TripMultiple, unsigned LoopSize,
TargetTransformInfo::UnrollingPreferences &UP, bool &UseUpperBound) {		TargetTransformInfo::UnrollingPreferences &UP, bool &UseUpperBound) {

// Check for explicit Count.		// Check for explicit Count.
// 1st priority is unroll count set by "unroll-count" option.		// 1st priority is unroll count set by "unroll-count" option.
bool UserUnrollCount = UnrollCount.getNumOccurrences() > 0;		bool UserUnrollCount = UnrollCount.getNumOccurrences() > 0;
if (UserUnrollCount) {		if (UserUnrollCount) {
UP.Count = UnrollCount;		UP.Count = UnrollCount;
UP.AllowExpensiveTripCount = true;		UP.AllowExpensiveTripCount = true;
UP.Force = true;		UP.Force = true;
if (UP.AllowRemainder && getUnrolledLoopSize(LoopSize, UP) < UP.Threshold)		if (UP.AllowRemainder && getUnrolledLoopSize(LoopSize, UP) < UP.Threshold)
[13 lines not shown]	bool llvm::computeUnrollCount(
}		}
bool PragmaFullUnroll = HasUnrollFullPragma(L);		bool PragmaFullUnroll = HasUnrollFullPragma(L);
if (PragmaFullUnroll && TripCount != 0) {		if (PragmaFullUnroll && TripCount != 0) {
UP.Count = TripCount;		UP.Count = TripCount;
if (getUnrolledLoopSize(LoopSize, UP) < PragmaUnrollThreshold)		if (getUnrolledLoopSize(LoopSize, UP) < PragmaUnrollThreshold)
return false;		return false;
}		}

bool PragmaEnableUnroll = HasUnrollEnablePragma(L);		bool PragmaEnableUnroll = HasUnrollEnablePragma(L);
		dmgreen: This shouldn't be needed here. Before this patch, there was a single place that checked if the loop had unroll disable pragma (HasUnrollDisablePragma at the start of tryToUnrollLoop). It seems best to keep that as-is in this patch (it's already long enough!) and remove HasUnrollDisablePragma, replacing it with the new hasUnrollTransformation & TM_Disable check. Then we won't need this IgnoreUser.
		Meinersbur (author): This is here because of the unfortunate interaction between LoopUnroll and LoopUnrollAndJam. `computeUnrollAndJamCount` uses the result of this function to itself determine whether it should unroll-and-jam. `HasUnrollDisablePragma` checks for the `llvm.loop.unroll.enable` property. `hasUnrollTransformation` returns whether LoopUnroll should do something which is not interchangeable. For some reason, `llvm.loop.unroll.enable` is handled here, but `llvm.loop.unroll.count` and `llvm.loop.unroll.full` are handled here and therefore have an influence on LoopUnrollAndJam. I would be glad if you, the author of LoopUnrollAndJam, could untangle this.
		dmgreen: Sometimes it's easier to show with code :-) so this is what I was thinking of: https://reviews.llvm.org/P8121 Unless you think that will not work for some reason? It passes all the tests you have here, and removes HasUnrollDisablePragma and the IgnoreUser, so seems cleaner. It also has the advantage of keeping unrelated changes to a minimum and not introducing a second place for llvm.loop.unroll.disable to be checked.
		Meinersbur (author): Thank you for the patch. I am not 100% sure whether this does not change LoopUnroll's behavior. That is, with `!{!"llvm.loop.unroll.count", i32 1}` it currently executes `UP.Count = PragmaCount; UP.Runtime = true; UP.AllowExpensiveTripCount = true; UP.Force = true; if ((UP.AllowRemainder || (TripMultiple % PragmaCount == 0)) && getUnrolledLoopSize(LoopSize, UP) < PragmaUnrollThreshold) return true;` whereas with your patch it bails out early (it might still do peeling even if UP.Count is 1). Also, the `-unroll-count` command-line option would be evaluated first before your patch. The attached test unroll-pragmas_contradiction.ll (950 B) fails with your patch. However, I like that it indeed makes the unroll decision simpler and goes in the direction of separating LoopUnroll and LoopUnrollAndJam's decision logic.
bool ExplicitUnroll = PragmaCount > 0 || PragmaFullUnroll ||		bool ExplicitUnroll = PragmaCount > 0 || PragmaFullUnroll ||
PragmaEnableUnroll || UserUnrollCount;		PragmaEnableUnroll || UserUnrollCount;
		dmgreen: What if it has PragmaCount and hasDisableAllTransformsHint? Should that not enable too?
		Meinersbur (author): If `llvm.loop.unroll.enable` is not set, the interpretation here is that the transformation is 'non-forced', that is, `llvm.loop.unroll.count` is a hint to the compiler that if it unrolls, then it should unroll by that amount. `llvm.loop.disable_nonforced` overrides the decision whether to unroll, i.e. the unroll factor does not matter. I am aware that the concept of 'forced' transformations is not consistent between passes, but I am trying to give it some consistency. Passes could query shared code such as `hasUnrollTransformation` in `LoopUtils`. `hasUnrollTransformation` currently follows your interpretation of `llvm.loop.unroll.count` / `llvm.loop.disable_nonforced`. I am happy to implement either definition, as long as we find a consistent rule.
		dmgreen: I would expect that if a loop has any metadata for a pass, that would mean disable_nonforced doesn't apply, since if the user has specified some metadata, they likely want something to happen. I think in this specific case llvm.loop.unroll.count implies llvm.loop.unroll.enable, and we wouldn't put both on a loop for "#pragma unroll(4)" or "#pragma clang loop unroll_count(4)".
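The two readings discussed in this thread can be contrasted with a small standalone sketch. Everything here is hypothetical: `LoopHintsSketch`, `UnrollModeSketch` and both functions are illustrative stand-ins, not the `hasUnrollTransformation` implementation from the patch, and other metadata such as `llvm.loop.unroll.disable` and `llvm.loop.unroll.full` is left out for brevity.

```cpp
#include <cassert>

// Hypothetical summary of the loop metadata relevant to this thread.
struct LoopHintsSketch {
  bool UnrollEnable = false;     // llvm.loop.unroll.enable present
  unsigned UnrollCount = 0;      // llvm.loop.unroll.count, 0 if absent
  bool DisableNonforced = false; // llvm.loop.disable_nonforced present
};

enum class UnrollModeSketch { Heuristic, Forced, Disabled };

// Reading A: only unroll.enable forces the transformation; a bare count is
// a hint about the factor, so disable_nonforced still suppresses unrolling.
inline UnrollModeSketch modeCountIsHint(const LoopHintsSketch &H) {
  if (H.UnrollEnable)
    return UnrollModeSketch::Forced;
  if (H.DisableNonforced)
    return UnrollModeSketch::Disabled;
  return UnrollModeSketch::Heuristic;
}

// Reading B: any unroll metadata, including a bare count, implies enable,
// so disable_nonforced does not apply to that loop.
inline UnrollModeSketch modeCountImpliesEnable(const LoopHintsSketch &H) {
  if (H.UnrollEnable || H.UnrollCount > 0)
    return UnrollModeSketch::Forced;
  if (H.DisableNonforced)
    return UnrollModeSketch::Disabled;
  return UnrollModeSketch::Heuristic;
}
```

The two readings differ only for a loop that carries a count together with `llvm.loop.disable_nonforced` but no explicit enable: reading A leaves it alone, reading B unrolls it.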

if (ExplicitUnroll && TripCount != 0) {		if (ExplicitUnroll && TripCount != 0) {
// If the loop has an unrolling pragma, we want to be more aggressive with		// If the loop has an unrolling pragma, we want to be more aggressive with
// unrolling limits. Set thresholds to at least the PragmaUnrollThreshold		// unrolling limits. Set thresholds to at least the PragmaUnrollThreshold
// value which is larger than the default limits.		// value which is larger than the default limits.
UP.Threshold = std::max<unsigned>(UP.Threshold, PragmaUnrollThreshold);		UP.Threshold = std::max<unsigned>(UP.Threshold, PragmaUnrollThreshold);
UP.PartialThreshold =		UP.PartialThreshold =
std::max<unsigned>(UP.PartialThreshold, PragmaUnrollThreshold);		std::max<unsigned>(UP.PartialThreshold, PragmaUnrollThreshold);
[202 lines not shown]	static LoopUnrollResult tryToUnrollLoop(
const TargetTransformInfo &TTI, AssumptionCache &AC,		const TargetTransformInfo &TTI, AssumptionCache &AC,
OptimizationRemarkEmitter &ORE, bool PreserveLCSSA, int OptLevel,		OptimizationRemarkEmitter &ORE, bool PreserveLCSSA, int OptLevel,
Optional<unsigned> ProvidedCount, Optional<unsigned> ProvidedThreshold,		Optional<unsigned> ProvidedCount, Optional<unsigned> ProvidedThreshold,
Optional<bool> ProvidedAllowPartial, Optional<bool> ProvidedRuntime,		Optional<bool> ProvidedAllowPartial, Optional<bool> ProvidedRuntime,
Optional<bool> ProvidedUpperBound, Optional<bool> ProvidedAllowPeeling) {		Optional<bool> ProvidedUpperBound, Optional<bool> ProvidedAllowPeeling) {
LLVM_DEBUG(dbgs() << "Loop Unroll: F["		LLVM_DEBUG(dbgs() << "Loop Unroll: F["
<< L->getHeader()->getParent()->getName() << "] Loop %"		<< L->getHeader()->getParent()->getName() << "] Loop %"
<< L->getHeader()->getName() << "\n");		<< L->getHeader()->getName() << "\n");
if (HasUnrollDisablePragma(L))		if (hasUnrollTransformation(L) & TM_Disable)
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
if (!L->isLoopSimplifyForm()) {		if (!L->isLoopSimplifyForm()) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << " Not unrolling loop which is not in loop-simplify form.\n");		dbgs() << " Not unrolling loop which is not in loop-simplify form.\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}

unsigned NumInlineCandidates;		unsigned NumInlineCandidates;
[80 lines not shown]	bool IsCountSetExplicitly = computeUnrollCount(
L, TTI, DT, LI, SE, EphValues, &ORE, TripCount, MaxTripCount,		L, TTI, DT, LI, SE, EphValues, &ORE, TripCount, MaxTripCount,
TripMultiple, LoopSize, UP, UseUpperBound);		TripMultiple, LoopSize, UP, UseUpperBound);
if (!UP.Count)		if (!UP.Count)
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
// Unroll factor (Count) must be less or equal to TripCount.		// Unroll factor (Count) must be less or equal to TripCount.
if (TripCount && UP.Count > TripCount)		if (TripCount && UP.Count > TripCount)
UP.Count = TripCount;		UP.Count = TripCount;

		// Save loop properties before it is transformed.
		MDNode *OrigLoopID = L->getLoopID();

// Unroll the loop.		// Unroll the loop.
		Loop *RemainderLoop = nullptr;
LoopUnrollResult UnrollResult = UnrollLoop(		LoopUnrollResult UnrollResult = UnrollLoop(
L, UP.Count, TripCount, UP.Force, UP.Runtime, UP.AllowExpensiveTripCount,		L, UP.Count, TripCount, UP.Force, UP.Runtime, UP.AllowExpensiveTripCount,
UseUpperBound, MaxOrZero, TripMultiple, UP.PeelCount, UP.UnrollRemainder,		UseUpperBound, MaxOrZero, TripMultiple, UP.PeelCount, UP.UnrollRemainder,
LI, &SE, &DT, &AC, &ORE, PreserveLCSSA);		LI, &SE, &DT, &AC, &ORE, PreserveLCSSA, &RemainderLoop);
if (UnrollResult == LoopUnrollResult::Unmodified)		if (UnrollResult == LoopUnrollResult::Unmodified)
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;

		if (RemainderLoop) {
		Optional<MDNode *> RemainderLoopID =
		makeFollowupLoopID(OrigLoopID, {LLVMLoopUnrollFollowupAll,
		hiraditya: nit: maybe put the string literals as a separate declaration?
		LLVMLoopUnrollFollowupRemainder});
		if (RemainderLoopID.hasValue())
		RemainderLoop->setLoopID(RemainderLoopID.getValue());
		}

		if (UnrollResult != LoopUnrollResult::FullyUnrolled) {
		Optional<MDNode *> NewLoopID =
		makeFollowupLoopID(OrigLoopID, {LLVMLoopUnrollFollowupAll,
		LLVMLoopUnrollFollowupUnrolled});
		if (NewLoopID.hasValue()) {
		L->setLoopID(NewLoopID.getValue());

		// Do not setLoopAlreadyUnrolled if loop attributes have been specified
		// explicitly.
		return UnrollResult;
		}
		}

// If loop has an unroll count pragma or unrolled by explicitly set count		// If loop has an unroll count pragma or unrolled by explicitly set count
// mark loop as unrolled to prevent unrolling beyond that requested.		// mark loop as unrolled to prevent unrolling beyond that requested.
// If the loop was peeled, we already "used up" the profile information		// If the loop was peeled, we already "used up" the profile information
// we had, so we don't want to unroll or peel again.		// we had, so we don't want to unroll or peel again.
if (UnrollResult != LoopUnrollResult::FullyUnrolled &&		if (UnrollResult != LoopUnrollResult::FullyUnrolled &&
(IsCountSetExplicitly || UP.PeelCount))		(IsCountSetExplicitly || UP.PeelCount))
L->setLoopAlreadyUnrolled();		L->setLoopAlreadyUnrolled();

[283 lines not shown]

lib/Transforms/Scalar/LoopVersioningLICM.cpp

	[588 lines not shown]

	bool LoopVersioningLICM::runOnLoop(Loop *L, LPPassManager &LPM) {			bool LoopVersioningLICM::runOnLoop(Loop *L, LPPassManager &LPM) {
	// This will automatically release all resources hold by the current			// This will automatically release all resources hold by the current
	// LoopVersioningLICM object.			// LoopVersioningLICM object.
	AutoResetter Resetter(*this);			AutoResetter Resetter(*this);

	if (skipLoop(L))			if (skipLoop(L))
	return false;			return false;

				// Do not do the transformation if disabled by metadata.
				if (hasLICMVersioningTransformation(L) & TM_Disable)
				return false;

	// Get Analysis information.			// Get Analysis information.
	AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();			AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
	SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();			SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();
	LAA = &getAnalysis<LoopAccessLegacyAnalysis>();			LAA = &getAnalysis<LoopAccessLegacyAnalysis>();
	ORE = &getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE();			ORE = &getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE();
	LAI = nullptr;			LAI = nullptr;
	// Set Current Loop			// Set Current Loop
	CurLoop = L;			CurLoop = L;
	[52 lines not shown]

lib/Transforms/Scalar/Scalar.cpp

[68 lines not shown]	void llvm::initializeScalarOpts(PassRegistry &Registry) {
initializeLoopInterchangePass(Registry);		initializeLoopInterchangePass(Registry);
initializeLoopPredicationLegacyPassPass(Registry);		initializeLoopPredicationLegacyPassPass(Registry);
initializeLoopRotateLegacyPassPass(Registry);		initializeLoopRotateLegacyPassPass(Registry);
initializeLoopStrengthReducePass(Registry);		initializeLoopStrengthReducePass(Registry);
initializeLoopRerollPass(Registry);		initializeLoopRerollPass(Registry);
initializeLoopUnrollPass(Registry);		initializeLoopUnrollPass(Registry);
initializeLoopUnrollAndJamPass(Registry);		initializeLoopUnrollAndJamPass(Registry);
initializeLoopUnswitchPass(Registry);		initializeLoopUnswitchPass(Registry);
		initializeWarnMissedTransformationsLegacyPass(Registry);
initializeLoopVersioningLICMPass(Registry);		initializeLoopVersioningLICMPass(Registry);
initializeLoopIdiomRecognizeLegacyPassPass(Registry);		initializeLoopIdiomRecognizeLegacyPassPass(Registry);
initializeLowerAtomicLegacyPassPass(Registry);		initializeLowerAtomicLegacyPassPass(Registry);
initializeLowerExpectIntrinsicPass(Registry);		initializeLowerExpectIntrinsicPass(Registry);
initializeLowerGuardIntrinsicLegacyPassPass(Registry);		initializeLowerGuardIntrinsicLegacyPassPass(Registry);
initializeMemCpyOptLegacyPassPass(Registry);		initializeMemCpyOptLegacyPassPass(Registry);
initializeMergeICmpsPass(Registry);		initializeMergeICmpsPass(Registry);
initializeMergedLoadStoreMotionLegacyPassPass(Registry);		initializeMergedLoadStoreMotionLegacyPassPass(Registry);
[202 lines not shown]

lib/Transforms/Scalar/WarnMissedTransforms.cpp

This file was added.

				//===- LoopTransformWarning.cpp - ----------------------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Emit warnings if forced code transformations have not been performed.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/Scalar/WarnMissedTransforms.h"
				#include "llvm/Analysis/OptimizationRemarkEmitter.h"
				#include "llvm/Transforms/Utils/LoopUtils.h"

				using namespace llvm;

				#define DEBUG_TYPE "transform-warning"

				/// Emit warnings for forced (i.e. user-defined) loop transformations which have
				/// still not been performed.
				static void warnAboutLeftoverTransformations(Loop *L,
				OptimizationRemarkEmitter *ORE) {
				if (hasUnrollTransformation(L) == TM_ForcedByUser) {
				LLVM_DEBUG(dbgs() << "Leftover unroll transformation\n");
				ORE->emit(
				DiagnosticInfoOptimizationFailure(DEBUG_TYPE,
				"FailedRequestedUnrolling",
				L->getStartLoc(), L->getHeader())
				<< "loop not unrolled: failed explicitly specified loop unrolling");
				hfinkel: Here and below, "explicitly specified" should have a hyphen (it is a compound adjective): "explicitly-specified loop unrolling". That having been said, I'd prefer a different phrasing altogether. These are end-user visible messages, and I think that we can make these slightly more user friendly. How about this: "loop not unrolled: the optimizer was unable to perform the requested transformation" (and similar for the others).
				Meinersbur (author): I think that "the optimizer was unable to perform" is less accurate: it gives the impression that the optimizer actually tried to perform the transformation, but one of the reasons the metadata is still present is that the corresponding pass is not in the pipeline (e.g. because of `-fno-vectorize`, or because `-mllvm -enable-unroll-and-jam` is missing). That is, the user should modify the compiler flags instead of tweaking the source code. That being said, "failed to ..." is not much better. Any better suggestions?
				hfinkel: (quoting Meinersbur) 'I think that "the optimizer was unable to perform" is less accurate: ... but one of the reasons the metadata is still present is that the corresponding pass is not in the pipeline...' I disagree that it is less accurate: the optimizer might be unable to perform an optimization for structural reasons, and to say that something "failed" clearly implies to me that it was explicitly attempted (which in this case it was not). Nevertheless, this is a good point, and we could provide a more useful message. How about this: "loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering".
				}

				if (hasUnrollAndJamTransformation(L) == TM_ForcedByUser) {
				LLVM_DEBUG(dbgs() << "Leftover unroll-and-jam transformation\n");
				ORE->emit(DiagnosticInfoOptimizationFailure(
				DEBUG_TYPE, "FailedRequestedUnrollAndJamming",
				L->getStartLoc(), L->getHeader())
				<< "loop not unroll-and-jammed: failed explicitly specified loop "
				"unroll-and-jam");
				}

				if (hasVectorizeTransformation(L) == TM_ForcedByUser) {
				LLVM_DEBUG(dbgs() << "Leftover vectorization transformation\n");
				Optional<int> VectorizeWidth =
				getOptionalIntLoopAttribute(L, "llvm.loop.vectorize.width");
				Optional<int> InterleaveCount =
				getOptionalIntLoopAttribute(L, "llvm.loop.interleave.count");

				if (VectorizeWidth.getValueOr(0) != 1)
				ORE->emit(DiagnosticInfoOptimizationFailure(
				DEBUG_TYPE, "FailedRequestedVectorization",
				L->getStartLoc(), L->getHeader())
				<< "loop not vectorized: "
				<< "failed explicitly specified loop vectorization");
				else if (InterleaveCount.getValueOr(0) != 1)
				ORE->emit(DiagnosticInfoOptimizationFailure(
				DEBUG_TYPE, "FailedRequestedInterleaving", L->getStartLoc(),
				L->getHeader())
				<< "loop not interleaved: "
				<< "failed explicitly specified loop interleaving");
				}

				if (hasDistributeTransformation(L) == TM_ForcedByUser) {
				LLVM_DEBUG(dbgs() << "Leftover distribute transformation\n");
				ORE->emit(DiagnosticInfoOptimizationFailure(
				DEBUG_TYPE, "FailedRequestedDistribution", L->getStartLoc(),
				L->getHeader())
				<< "loop not distributed: failed explicitly specified loop "
				"distribution");
				}
				}

				static void warnAboutLeftoverTransformations(Function *F, LoopInfo *LI,
				OptimizationRemarkEmitter *ORE) {
				for (auto *L : LI->getLoopsInPreorder())
				warnAboutLeftoverTransformations(L, ORE);
				}

				// New pass manager boilerplate
				PreservedAnalyses
				WarnMissedTransformationsPass::run(Function &F, FunctionAnalysisManager &AM) {
				auto &ORE = AM.getResult<OptimizationRemarkEmitterAnalysis>(F);
				auto &LI = AM.getResult<LoopAnalysis>(F);

				warnAboutLeftoverTransformations(&F, &LI, &ORE);

				return PreservedAnalyses::all();
				}

				// Legacy pass manager boilerplate
				namespace {
				class WarnMissedTransformationsLegacy : public FunctionPass {
				public:
				static char ID;

				explicit WarnMissedTransformationsLegacy() : FunctionPass(ID) {
				initializeWarnMissedTransformationsLegacyPass(
				*PassRegistry::getPassRegistry());
				}

				bool runOnFunction(Function &F) override {
				if (skipFunction(F))
				return false;

				auto &ORE = getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE();
				auto &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();

				warnAboutLeftoverTransformations(&F, &LI, &ORE);
				return false;
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<OptimizationRemarkEmitterWrapperPass>();
				AU.addRequired<LoopInfoWrapperPass>();

				AU.setPreservesAll();
				}
				};
				} // end anonymous namespace

				char WarnMissedTransformationsLegacy::ID = 0;

				INITIALIZE_PASS_BEGIN(WarnMissedTransformationsLegacy, "transform-warning",
				"Warn about non-applied transformations", false, false)
				INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(OptimizationRemarkEmitterWrapperPass)
				INITIALIZE_PASS_END(WarnMissedTransformationsLegacy, "transform-warning",
				"Warn about non-applied transformations", false, false)

				Pass *llvm::createWarnMissedTransformationsPass() {
				return new WarnMissedTransformationsLegacy();
				}

lib/Transforms/Utils/LoopUnroll.cpp

[323 lines not shown]
///		///
/// If we want to perform PGO-based loop peeling, PeelCount is set to the		/// If we want to perform PGO-based loop peeling, PeelCount is set to the
/// number of iterations we want to peel off.		/// number of iterations we want to peel off.
///		///
/// The LoopInfo Analysis that is passed will be kept consistent.		/// The LoopInfo Analysis that is passed will be kept consistent.
///		///
/// This utility preserves LoopInfo. It will also preserve ScalarEvolution and		/// This utility preserves LoopInfo. It will also preserve ScalarEvolution and
/// DominatorTree if they are non-null.		/// DominatorTree if they are non-null.
		///
		/// If RemainderLoop is non-null, it will receive the remainder loop (if
		/// required and not fully unrolled).
LoopUnrollResult llvm::UnrollLoop(		LoopUnrollResult llvm::UnrollLoop(
Loop *L, unsigned Count, unsigned TripCount, bool Force, bool AllowRuntime,		Loop *L, unsigned Count, unsigned TripCount, bool Force, bool AllowRuntime,
bool AllowExpensiveTripCount, bool PreserveCondBr, bool PreserveOnlyFirst,		bool AllowExpensiveTripCount, bool PreserveCondBr, bool PreserveOnlyFirst,
unsigned TripMultiple, unsigned PeelCount, bool UnrollRemainder,		unsigned TripMultiple, unsigned PeelCount, bool UnrollRemainder,
LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,		LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
OptimizationRemarkEmitter *ORE, bool PreserveLCSSA) {		OptimizationRemarkEmitter *ORE, bool PreserveLCSSA, Loop **RemainderLoop) {

BasicBlock *Preheader = L->getLoopPreheader();		BasicBlock *Preheader = L->getLoopPreheader();
if (!Preheader) {		if (!Preheader) {
LLVM_DEBUG(dbgs() << " Can't unroll; loop preheader-insertion failed.\n");		LLVM_DEBUG(dbgs() << " Can't unroll; loop preheader-insertion failed.\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}

BasicBlock *LatchBlock = L->getLoopLatch();		BasicBlock *LatchBlock = L->getLoopLatch();
[117 lines not shown]	LoopUnrollResult llvm::UnrollLoop(

bool EpilogProfitability =		bool EpilogProfitability =
UnrollRuntimeEpilog.getNumOccurrences() ? UnrollRuntimeEpilog		UnrollRuntimeEpilog.getNumOccurrences() ? UnrollRuntimeEpilog
: isEpilogProfitable(L);		: isEpilogProfitable(L);

if (RuntimeTripCount && TripMultiple % Count != 0 &&		if (RuntimeTripCount && TripMultiple % Count != 0 &&
!UnrollRuntimeLoopRemainder(L, Count, AllowExpensiveTripCount,		!UnrollRuntimeLoopRemainder(L, Count, AllowExpensiveTripCount,
EpilogProfitability, UnrollRemainder, LI, SE,		EpilogProfitability, UnrollRemainder, LI, SE,
DT, AC, PreserveLCSSA)) {		DT, AC, PreserveLCSSA, RemainderLoop)) {
if (Force)		if (Force)
RuntimeTripCount = false;		RuntimeTripCount = false;
else {		else {
LLVM_DEBUG(dbgs() << "Won't unroll; remainder loop could not be "		LLVM_DEBUG(dbgs() << "Won't unroll; remainder loop could not be "
"generated when assuming runtime trip count\n");		"generated when assuming runtime trip count\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}
}		}
[425 lines not shown]

lib/Transforms/Utils/LoopUnrollAndJam.cpp

[161 lines not shown]	/*

We do this by splitting the blocks in the loop into Fore, Subloop and Aft.		We do this by splitting the blocks in the loop into Fore, Subloop and Aft.
Fore blocks are those before the inner loop, Aft are those after. Normal		Fore blocks are those before the inner loop, Aft are those after. Normal
Unroll code is used to copy each of these sets of blocks and the results are		Unroll code is used to copy each of these sets of blocks and the results are
combined together into the final form above.		combined together into the final form above.

isSafeToUnrollAndJam should be used prior to calling this to make sure the		isSafeToUnrollAndJam should be used prior to calling this to make sure the
unrolling will be valid. Checking profitability is also advisable.		unrolling will be valid. Checking profitability is also advisable.

		If EpilogueLoop is non-null, it receives the epilogue loop (if it was
		necessary to create one and not fully unrolled).
*/		*/
LoopUnrollResult		LoopUnrollResult llvm::UnrollAndJamLoop(
llvm::UnrollAndJamLoop(Loop *L, unsigned Count, unsigned TripCount,		Loop *L, unsigned Count, unsigned TripCount, unsigned TripMultiple,
unsigned TripMultiple, bool UnrollRemainder,		bool UnrollRemainder, LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,		AssumptionCache *AC, OptimizationRemarkEmitter *ORE, Loop **EpilogueLoop) {
AssumptionCache *AC, OptimizationRemarkEmitter *ORE) {

// When we enter here we should have already checked that it is safe		// When we enter here we should have already checked that it is safe
BasicBlock *Header = L->getHeader();		BasicBlock *Header = L->getHeader();
assert(L->getSubLoops().size() == 1);		assert(L->getSubLoops().size() == 1);
Loop *SubLoop = *L->begin();		Loop *SubLoop = *L->begin();

// Don't enter the unroll code if there is nothing to do.		// Don't enter the unroll code if there is nothing to do.
if (TripCount == 0 && Count < 2) {		if (TripCount == 0 && Count < 2) {
LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; almost nothing to do\n");		LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; almost nothing to do\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}

assert(Count > 0);		assert(Count > 0);
assert(TripMultiple > 0);		assert(TripMultiple > 0);
assert(TripCount == 0 || TripCount % TripMultiple == 0);		assert(TripCount == 0 || TripCount % TripMultiple == 0);

// Are we eliminating the loop control altogether?		// Are we eliminating the loop control altogether?
bool CompletelyUnroll = (Count == TripCount);		bool CompletelyUnroll = (Count == TripCount);

// We use the runtime remainder in cases where we don't know trip multiple		// We use the runtime remainder in cases where we don't know trip multiple
if (TripMultiple == 1 || TripMultiple % Count != 0) {		if (TripMultiple == 1 || TripMultiple % Count != 0) {
if (!UnrollRuntimeLoopRemainder(L, Count, /*AllowExpensiveTripCount*/ false,		if (!UnrollRuntimeLoopRemainder(L, Count, /*AllowExpensiveTripCount*/ false,
/*UseEpilogRemainder*/ true,		/*UseEpilogRemainder*/ true,
UnrollRemainder, LI, SE, DT, AC, true)) {		UnrollRemainder, LI, SE, DT, AC, true,
		EpilogueLoop)) {
LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; remainder loop could not be "		LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; remainder loop could not be "
"generated when assuming runtime trip count\n");		"generated when assuming runtime trip count\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}
}		}

// Notify ScalarEvolution that the loop will be substantially changed,		// Notify ScalarEvolution that the loop will be substantially changed,
// if not outright eliminated.		// if not outright eliminated.
[602 lines not shown]

lib/Transforms/Utils/LoopUnrollRuntime.cpp

[374 lines not shown]	if (!CreateRemainderLoop) {
Value *InVal = NewPHI->getIncomingValue(idx);		Value *InVal = NewPHI->getIncomingValue(idx);
NewPHI->setIncomingBlock(idx, NewLatch);		NewPHI->setIncomingBlock(idx, NewLatch);
if (Value *V = VMap.lookup(InVal))		if (Value *V = VMap.lookup(InVal))
NewPHI->setIncomingValue(idx, V);		NewPHI->setIncomingValue(idx, V);
}		}
}		}
if (CreateRemainderLoop) {		if (CreateRemainderLoop) {
Loop *NewLoop = NewLoops[L];		Loop *NewLoop = NewLoops[L];
		MDNode *LoopID = NewLoop->getLoopID();
assert(NewLoop && "L should have been cloned");		assert(NewLoop && "L should have been cloned");

// Only add loop metadata if the loop is not going to be completely		// Only add loop metadata if the loop is not going to be completely
// unrolled.		// unrolled.
if (UnrollRemainder)		if (UnrollRemainder)
return NewLoop;		return NewLoop;

		Optional<MDNode *> NewLoopID = makeFollowupLoopID(
		LoopID, {LLVMLoopUnrollFollowupAll, LLVMLoopUnrollFollowupRemainder});
		if (NewLoopID.hasValue()) {
		NewLoop->setLoopID(NewLoopID.getValue());

		// Do not setLoopAlreadyUnrolled if loop attributes have been defined
		// explicitly.
		return NewLoop;
		}

// Add unroll disable metadata to disable future unrolling for this loop.		// Add unroll disable metadata to disable future unrolling for this loop.
NewLoop->setLoopAlreadyUnrolled();		NewLoop->setLoopAlreadyUnrolled();
return NewLoop;		return NewLoop;
}		}
else		else
return nullptr;		return nullptr;
}		}

[122 lines not shown]
/// Epil: LoopBody; (executes extraiters times)		/// Epil: LoopBody; (executes extraiters times)
/// extraiters -= 1 // Omitted if unroll factor is 2.		/// extraiters -= 1 // Omitted if unroll factor is 2.
/// if (extraiters != 0) jump Epil: // Omitted if unroll factor is 2.		/// if (extraiters != 0) jump Epil: // Omitted if unroll factor is 2.
/// EpilExit:		/// EpilExit:

bool llvm::UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,		bool llvm::UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,
bool AllowExpensiveTripCount,		bool AllowExpensiveTripCount,
bool UseEpilogRemainder,		bool UseEpilogRemainder,
bool UnrollRemainder,		bool UnrollRemainder, LoopInfo *LI,
LoopInfo *LI, ScalarEvolution *SE,		ScalarEvolution *SE, DominatorTree *DT,
DominatorTree *DT, AssumptionCache *AC,		AssumptionCache *AC, bool PreserveLCSSA,
bool PreserveLCSSA) {		Loop **ResultLoop) {
		hiraditya: What is the rationale of using pointer to a pointer here? If we want to assign to ResultLoop, then maybe we can just return ResultLoop and bool as a pair.
		Meinersbur (author): If the Result loop is not needed, one can pass `nullptr` (which is the default argument). Returning `std::pair` will require more changes.
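The trade-off in this thread can be sketched in isolation. `LoopSketch`, `makeRemainderOutParam` and `makeRemainderPair` are hypothetical names used only for illustration and are not the LLVM API. The first function mirrors the optional out-parameter style of `Loop **ResultLoop`, and the second shows the `std::pair` alternative that was suggested.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for llvm::Loop.
struct LoopSketch {
  int Id;
};

// Out-parameter style: callers that do not need the remainder loop simply
// omit the last argument, so existing call sites stay unchanged.
inline bool makeRemainderOutParam(LoopSketch *L, LoopSketch *Remainder,
                                  LoopSketch **ResultLoop = nullptr) {
  if (!L)
    return false;
  if (ResultLoop)
    *ResultLoop = Remainder; // only written when the caller asked for it
  return true;
}

// Pair style: the result is always returned, so every caller has to be
// updated to unpack it, even when only the bool is of interest.
inline std::pair<bool, LoopSketch *> makeRemainderPair(LoopSketch *L,
                                                       LoopSketch *Remainder) {
  if (!L)
    return std::pair<bool, LoopSketch *>(false, nullptr);
  return std::pair<bool, LoopSketch *>(true, Remainder);
}
```

A caller that ignores the remainder loop writes `makeRemainderOutParam(&Main, &Rem)` and is unaffected by the extra parameter, which matches the author's point that returning a pair would require more changes.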
LLVM_DEBUG(dbgs() << "Trying runtime unrolling on Loop: \n");		LLVM_DEBUG(dbgs() << "Trying runtime unrolling on Loop: \n");
LLVM_DEBUG(L->dump());		LLVM_DEBUG(L->dump());
LLVM_DEBUG(UseEpilogRemainder ? dbgs() << "Using epilog remainder.\n"		LLVM_DEBUG(UseEpilogRemainder ? dbgs() << "Using epilog remainder.\n"
: dbgs() << "Using prolog remainder.\n");		: dbgs() << "Using prolog remainder.\n");

// Make sure the loop is in canonical form.		// Make sure the loop is in canonical form.
if (!L->isLoopSimplifyForm()) {		if (!L->isLoopSimplifyForm()) {
LLVM_DEBUG(dbgs() << "Not in simplify form!\n");		LLVM_DEBUG(dbgs() << "Not in simplify form!\n");
▲ Show 20 Lines • Show All 366 Lines • ▼ Show 20 Lines	if (OtherExits.size() > 0) {
// LoopSimplifyForm.		// LoopSimplifyForm.
formDedicatedExitBlocks(L, DT, LI, PreserveLCSSA);		formDedicatedExitBlocks(L, DT, LI, PreserveLCSSA);
// Generate dedicated exit blocks for the remainder loop if one exists, to		// Generate dedicated exit blocks for the remainder loop if one exists, to
// preserve LoopSimplifyForm.		// preserve LoopSimplifyForm.
if (remainderLoop)		if (remainderLoop)
formDedicatedExitBlocks(remainderLoop, DT, LI, PreserveLCSSA);		formDedicatedExitBlocks(remainderLoop, DT, LI, PreserveLCSSA);
}		}

		auto UnrollResult = LoopUnrollResult::Unmodified;
		hiraditya [Done]: nit: space
		Meinersbur (author) [Done]: This is done by `clang-format`. I try not to fight its decisions and hope for future improvement.
		Meinersbur (author) [Done]: I will try to remove the space in patch updates, but it may sneak in again when I re-run clang-format and forget about it before submission.
if (remainderLoop && UnrollRemainder) {		if (remainderLoop && UnrollRemainder) {
LLVM_DEBUG(dbgs() << "Unrolling remainder loop\n");		LLVM_DEBUG(dbgs() << "Unrolling remainder loop\n");
		UnrollResult =
UnrollLoop(remainderLoop, /*Count*/ Count - 1, /*TripCount*/ Count - 1,		UnrollLoop(remainderLoop, /*Count*/ Count - 1, /*TripCount*/ Count - 1,
/*Force*/ false, /*AllowRuntime*/ false,		/*Force*/ false, /*AllowRuntime*/ false,
/*AllowExpensiveTripCount*/ false, /*PreserveCondBr*/ true,		/*AllowExpensiveTripCount*/ false, /*PreserveCondBr*/ true,
/*PreserveOnlyFirst*/ false, /*TripMultiple*/ 1,		/*PreserveOnlyFirst*/ false, /*TripMultiple*/ 1,
/*PeelCount*/ 0, /*UnrollRemainder*/ false, LI, SE, DT, AC,		/*PeelCount*/ 0, /*UnrollRemainder*/ false, LI, SE, DT, AC,
/*ORE*/ nullptr, PreserveLCSSA);		/*ORE*/ nullptr, PreserveLCSSA);
}		}

		if (ResultLoop && UnrollResult != LoopUnrollResult::FullyUnrolled)
		*ResultLoop = remainderLoop;
NumRuntimeUnrolled++;		NumRuntimeUnrolled++;
return true;		return true;
}		}
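
The inline discussion about `Loop **ResultLoop` comes down to an optional out-parameter: callers that do not care pass nothing, and the function writes through the pointer only when it is non-null. A minimal standalone sketch of that pattern follows. `Loop` is a stand-in struct and `runtimeUnrollSketch` is a hypothetical name; neither is the LLVM API, and the body keeps only the result-reporting step of the patched function.

```cpp
#include <cassert>

// Stand-in for llvm::Loop, for illustration only.
struct Loop {
  int Id;
};

// Hypothetical helper shaped like the patched function: the bool result says
// whether the IR was changed, and the optional out-parameter receives the
// remainder loop unless that loop was fully unrolled away.
bool runtimeUnrollSketch(Loop *Remainder, bool RemainderFullyUnrolled,
                         Loop **ResultLoop = nullptr) {
  // A remainder can only have been fully unrolled if one existed.
  assert((Remainder || !RemainderFullyUnrolled) && "no remainder to unroll");
  if (ResultLoop && !RemainderFullyUnrolled)
    *ResultLoop = Remainder;
  return true;
}
```

A caller that needs the remainder passes the address of a `Loop *`; one that does not simply omits the argument, which is the property cited in the reply against switching to `std::pair`.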

lib/Transforms/Utils/LoopUtils.cpp

Show First 20 Lines • Show All 177 Lines • ▼ Show 20 Lines	void llvm::initializeLoopPassPass(PassRegistry &Registry) {
INITIALIZE_PASS_DEPENDENCY(LCSSAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(LCSSAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(BasicAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(BasicAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(SCEVAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(SCEVAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass)		INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass)
}		}

/// Find string metadata for loop		static Optional<MDNode *> findOptionMDForLoopID(MDNode *LoopID,
///
/// If it has a value (e.g. {"llvm.distribute", 1} return the value as an
/// operand or null otherwise. If the string metadata is not found return
/// Optional's not-a-value.
Optional<const MDOperand *> llvm::findStringMetadataForLoop(Loop *TheLoop,
StringRef Name) {		StringRef Name) {
MDNode *LoopID = TheLoop->getLoopID();
// Return none if LoopID is false.		// Return none if LoopID is false.
if (!LoopID)		if (!LoopID)
return None;		return None;

// First operand should refer to the loop id itself.		// First operand should refer to the loop id itself.
assert(LoopID->getNumOperands() > 0 && "requires at least one operand");		assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
assert(LoopID->getOperand(0) == LoopID && "invalid loop id");		assert(LoopID->getOperand(0) == LoopID && "invalid loop id");

// Iterate over LoopID operands and look for MDString Metadata		// Iterate over LoopID operands and look for MDString Metadata
for (unsigned i = 1, e = LoopID->getNumOperands(); i < e; ++i) {		for (unsigned i = 1, e = LoopID->getNumOperands(); i < e; ++i) {
MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i));		MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i));
if (!MD)		if (!MD)
continue;		continue;
MDString *S = dyn_cast<MDString>(MD->getOperand(0));		MDString *S = dyn_cast<MDString>(MD->getOperand(0));
if (!S)		if (!S)
continue;		continue;
// Return true if MDString holds expected MetaData.		// Return true if MDString holds expected MetaData.
if (Name.equals(S->getString()))		if (Name.equals(S->getString()))
		return MD;
		}
		return None;
		}

		static Optional<MDNode *> findOptionMDForLoop(const Loop *TheLoop,
		StringRef Name) {
		return findOptionMDForLoopID(TheLoop->getLoopID(), Name);
		}

		/// Find string metadata for loop
		///
		/// If it has a value (e.g. {"llvm.distribute", 1} return the value as an
		/// operand or null otherwise. If the string metadata is not found return
		/// Optional's not-a-value.
		Optional<const MDOperand *> llvm::findStringMetadataForLoop(Loop *TheLoop,
		StringRef Name) {
		auto MD = findOptionMDForLoop(TheLoop, Name).getValueOr(nullptr);
		if (!MD)
		return None;
switch (MD->getNumOperands()) {		switch (MD->getNumOperands()) {
case 1:		case 1:
return nullptr;		return nullptr;
case 2:		case 2:
return &MD->getOperand(1);		return &MD->getOperand(1);
default:		default:
llvm_unreachable("loop metadata has 0 or 1 operand");		llvm_unreachable("loop metadata has 0 or 1 operand");
}		}
}		}

		static Optional<bool> getOptionalBoolLoopAttribute(const Loop *TheLoop,
		StringRef Name) {
		Optional<MDNode *> MD = findOptionMDForLoop(TheLoop, Name);
		if (!MD.hasValue())
		return None;
		MDNode *OptionNode = MD.getValue();
		if (OptionNode == nullptr)
		return None;
		switch (OptionNode->getNumOperands()) {
		case 1:
		// When the value is absent it is interpreted as 'attribute set'.
		return true;
		case 2:
		return mdconst::extract_or_null<ConstantInt>(
		OptionNode->getOperand(1).get());
		}
		llvm_unreachable("unexpected number of options");
		}

		static bool getBooleanLoopAttribute(const Loop *TheLoop, StringRef Name) {
		return getOptionalBoolLoopAttribute(TheLoop, Name).getValueOr(false);
		}

		llvm::Optional<int> llvm::getOptionalIntLoopAttribute(Loop *TheLoop,
		StringRef Name) {
		const MDOperand *AttrMD =
		findStringMetadataForLoop(TheLoop, Name).getValueOr(nullptr);
		if (!AttrMD)
		return None;

		ConstantInt *IntMD = mdconst::extract_or_null<ConstantInt>(AttrMD->get());
		if (!IntMD)
		return None;

		return IntMD->getSExtValue();
		}

		Optional<MDNode *> llvm::makeFollowupLoopID(
		MDNode *OrigLoopID, ArrayRef<StringRef> FollowupOptions,
		const char *InheritOptionsExceptPrefix, bool AlwaysNew) {
		if (!OrigLoopID) {
		if (AlwaysNew)
		return nullptr;
return None;		return None;
}		}

		assert(OrigLoopID->getOperand(0) == OrigLoopID);

		bool InheritAllAttrs = !InheritOptionsExceptPrefix;
		bool InheritSomeAttrs =
		dmgreen [Not Done]: Maybe InheritSomeAttrs -> InheritNonExceptAttrs?
		Meinersbur (author) [Not Done]: Avoiding double negation here.
		InheritOptionsExceptPrefix && InheritOptionsExceptPrefix[0] != '\0';
		SmallVector<Metadata *, 8> MDs;
		MDs.push_back(nullptr);

		bool Changed = false;
		if (InheritAllAttrs || InheritSomeAttrs) {
		for (const MDOperand &Existing : drop_begin(OrigLoopID->operands(), 1)) {
		MDNode *Op = cast<MDNode>(Existing.get());

		auto InheritThisAttribute = [InheritSomeAttrs,
		InheritOptionsExceptPrefix](MDNode *Op) {
		if (!InheritSomeAttrs)
		dmgreen [Done]: Would this fall over if the metadata was not a string? Such as debug metadata.
		Meinersbur (author) [Done]: This was previously checked to be in a LoopID, therefore cannot be debug metadata. This assumes that the metadata is not malformed. However, this is nowhere handled gracefully in LLVM. For instance, `UnrollAndJamCountPragmaValue` will trigger an assertion if the MDNode does not have exactly 2 items, or the second item is something other than a positive integer. In the case here, an assertion in `cast<T>` will trigger. I added extra checks at this location, but there are many others.
		dmgreen [Not Done]: Yeah, malformed input would be fine to not handle, as far as I understand (or perhaps is just QOI). But I was testing something like this (hope I still have it correct):
		  void c(int n, int* w, int* x, int* y, int* z, int* a) {
		  #pragma clang loop distribute(enable) vectorize(disable)
		    for (int i=0; i < n; i++) {
		      x[i] = y[i] + z[i]*w[i];
		      a[i+1] = (a[i-1] + a[i] + a[i+1])/3.0;
		      y[i] = z[i] - x[i];
		    }
		  }
		Ran with "clang -O3 distribute.c -S -g" would crash with the previous patch. Now I think it doesn't drop the distribute metadata? I believe the llvm.loop metadata will look something like !58 in:
		  !58 = distinct !{!58, !30, !59, !60, !61, !62}
		  !59 = !DILocation(line: 8, column: 5, scope: !20)
		  !60 = !{!"llvm.loop.vectorize.width", i32 1}
		  !61 = !{!"llvm.loop.unroll.disable"}
		  !62 = !{!"llvm.loop.distribute.enable", i1 true}
		!30 is a DILocation too, which I think are the parts causing the problems.
		Meinersbur (author) [Done]: I may not have considered that CGLoopInfo.cpp also adds debug locations to LoopIDs. Should be fixed with the previous update. Thanks for noticing. I also made a mistake in that update which dropped all non-distribute metadata instead of the distribute metadata. It made one regression test fail.
		return false;

		// Skip malformatted attribute metadata nodes.
		if (Op->getNumOperands() == 0)
		return true;
		Metadata *NameMD = Op->getOperand(0).get();
		if (!isa<MDString>(NameMD))
		return true;
		StringRef AttrName = cast<MDString>(NameMD)->getString();

		// Do not inherit excluded attributes.
		return !AttrName.startswith(InheritOptionsExceptPrefix);
		};

		if (InheritThisAttribute(Op))
		MDs.push_back(Op);
		else
		Changed = true;
		}
		} else {
		// Modified if we dropped at least one attribute.
		Changed = OrigLoopID->getNumOperands() > 1;
		}

		bool HasAnyFollowup = false;
		for (StringRef OptionName : FollowupOptions) {
		MDNode *FollowupNode =
		findOptionMDForLoopID(OrigLoopID, OptionName).getValueOr(nullptr);
		if (!FollowupNode)
		continue;

		HasAnyFollowup = true;
		for (const MDOperand &Option : drop_begin(FollowupNode->operands(), 1)) {
		MDs.push_back(Option.get());
		Changed = true;
		}
		}

		// Attributes of the followup loop not specified explicity, so signal to the
		// transformation pass to add suitable attributes.
		if (!AlwaysNew && !HasAnyFollowup)
		return None;

		// If no attributes were added or remove, the previous loop Id can be reused.
		if (!AlwaysNew && !Changed)
		return OrigLoopID;

		// No attributes is equivalent to having no !llvm.loop metadata at all.
		if (MDs.size() == 1)
		return nullptr;

		// Build the new loop ID.
		MDTuple *FollowupLoopID = MDNode::get(OrigLoopID->getContext(), MDs);
		FollowupLoopID->replaceOperandWith(0, FollowupLoopID);
		return FollowupLoopID;
		}
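
The steps of `makeFollowupLoopID` (inherit attributes unless excluded by prefix, append the operands of any matching followup attribute, and signal "nothing specified" when no followup was found) can be modelled without LLVM types. The sketch below is a simplification under stated assumptions: `Attr` and `makeFollowupAttrsSketch` are hypothetical names, attributes are reduced to their names, and the model does not distinguish "reuse the original loop ID" from "build a new one".

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of one loop attribute. Nested holds the attributes that
// a followup attribute asks to be placed on the followup loop.
struct Attr {
  std::string Name;
  std::vector<Attr> Nested;
};

// Returns std::nullopt when no followup attribute was found and AlwaysNew is
// false, so the calling pass knows it must choose attributes itself.
// InheritExceptPrefix: nullptr inherits everything, "" inherits nothing,
// any other string excludes attributes whose name starts with it.
std::optional<std::vector<Attr>>
makeFollowupAttrsSketch(const std::vector<Attr> &Orig,
                        const std::vector<std::string> &FollowupOptions,
                        const char *InheritExceptPrefix, bool AlwaysNew) {
  const bool InheritAll = !InheritExceptPrefix;
  const bool InheritSome =
      InheritExceptPrefix && InheritExceptPrefix[0] != '\0';

  std::vector<Attr> Result;
  if (InheritAll || InheritSome) {
    for (const Attr &A : Orig) {
      assert(!A.Name.empty() && "sketch assumes every attribute is named");
      const bool Excluded =
          InheritSome && A.Name.rfind(InheritExceptPrefix, 0) == 0;
      if (!Excluded)
        Result.push_back(A);
    }
  }

  bool HasAnyFollowup = false;
  for (const std::string &Option : FollowupOptions) {
    for (const Attr &A : Orig) {
      if (A.Name != Option)
        continue;
      HasAnyFollowup = true;
      Result.insert(Result.end(), A.Nested.begin(), A.Nested.end());
      break; // Only the first attribute with this name is considered.
    }
  }

  if (!AlwaysNew && !HasAnyFollowup)
    return std::nullopt;
  return Result;
}
```

An empty result in this model corresponds to the `nullptr` return in the patch (no `!llvm.loop` metadata at all), while `std::nullopt` corresponds to `None`.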

		bool llvm::hasDisableAllTransformsHint(const Loop *L) {
		return getBooleanLoopAttribute(L, "llvm.loop.disable_nonforced");
		}

		TransformationMode llvm::hasUnrollTransformation(Loop *L) {
		if (getBooleanLoopAttribute(L, "llvm.loop.unroll.disable"))
		return TM_SuppressedByUser;

		Optional<int> Count =
		getOptionalIntLoopAttribute(L, "llvm.loop.unroll.count");
		if (Count.hasValue())
		return Count.getValue() == 1 ? TM_SuppressedByUser : TM_ForcedByUser;

		if (getBooleanLoopAttribute(L, "llvm.loop.unroll.enable"))
		return TM_ForcedByUser;

		if (getBooleanLoopAttribute(L, "llvm.loop.unroll.full"))
		return TM_ForcedByUser;

		if (hasDisableAllTransformsHint(L))
		return TM_Disable;

		return TM_Unspecified;
		}
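
The order of the checks in `hasUnrollTransformation` matters: explicit per-transformation attributes are consulted before `llvm.loop.disable_nonforced`, so a forced unroll survives the blanket disable. A minimal standalone sketch of that precedence follows. `Mode`, `Attrs`, `hasFlag` and `unrollModeSketch` are hypothetical names local to the example, and treating a stored value of 0 as "not set" is an assumption of the sketch, not something the patch states.

```cpp
#include <cassert>
#include <map>
#include <string>

// Local to this sketch; the patch's own enum is TransformationMode.
enum class Mode { Unspecified, Disable, ForcedByUser, SuppressedByUser };

// Attribute name -> integer value. A flag without a value is stored as 1.
using Attrs = std::map<std::string, int>;

static bool hasFlag(const Attrs &A, const std::string &Name) {
  auto It = A.find(Name);
  return It != A.end() && It->second != 0;
}

Mode unrollModeSketch(const Attrs &A) {
  if (hasFlag(A, "llvm.loop.unroll.disable"))
    return Mode::SuppressedByUser;

  auto Count = A.find("llvm.loop.unroll.count");
  if (Count != A.end()) {
    assert(Count->second > 0 && "sketch assumes a positive unroll count");
    // An unroll count of 1 leaves the loop as it is.
    return Count->second == 1 ? Mode::SuppressedByUser : Mode::ForcedByUser;
  }

  if (hasFlag(A, "llvm.loop.unroll.enable") ||
      hasFlag(A, "llvm.loop.unroll.full"))
    return Mode::ForcedByUser;

  // Checked last, so it only affects loops without an explicit request.
  if (hasFlag(A, "llvm.loop.disable_nonforced"))
    return Mode::Disable;
  return Mode::Unspecified;
}
```

With both `llvm.loop.disable_nonforced` and `llvm.loop.unroll.count` set to 4, the sketch reports the loop as forced by the user; with only `llvm.loop.disable_nonforced`, it reports the transformation as disabled.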

		TransformationMode llvm::hasUnrollAndJamTransformation(Loop *L) {
		if (getBooleanLoopAttribute(L, "llvm.loop.unroll_and_jam.disable"))
		return TM_SuppressedByUser;

		Optional<int> Count =
		getOptionalIntLoopAttribute(L, "llvm.loop.unroll_and_jam.count");
		if (Count.hasValue())
		return Count.getValue() == 1 ? TM_SuppressedByUser : TM_ForcedByUser;

		if (getBooleanLoopAttribute(L, "llvm.loop.unroll_and_jam.enable"))
		return TM_ForcedByUser;

		if (hasDisableAllTransformsHint(L))
		return TM_Disable;

		return TM_Unspecified;
		}

		TransformationMode llvm::hasVectorizeTransformation(Loop *L) {
		Optional<bool> Enable =
		getOptionalBoolLoopAttribute(L, "llvm.loop.vectorize.enable");

		if (Enable == false)
		return TM_SuppressedByUser;

		Optional<int> VectorizeWidth =
		getOptionalIntLoopAttribute(L, "llvm.loop.vectorize.width");
		Optional<int> InterleaveCount =
		getOptionalIntLoopAttribute(L, "llvm.loop.interleave.count");

		if (Enable == true) {
		// 'Forcing' vector width and interleave count to one effectively disables
		// this tranformation.
		if (VectorizeWidth == 1 && InterleaveCount == 1)
		return TM_SuppressedByUser;
		return TM_ForcedByUser;
		}

		if (getBooleanLoopAttribute(L, "llvm.loop.isvectorized"))
		return TM_Disable;

		if (VectorizeWidth == 1 && InterleaveCount == 1)
		return TM_Disable;

		if (VectorizeWidth > 1 || InterleaveCount > 1)
		return TM_Enable;

		if (hasDisableAllTransformsHint(L))
		return TM_Disable;

		return TM_Unspecified;
		}

		TransformationMode llvm::hasDistributeTransformation(Loop *L) {
		if (getBooleanLoopAttribute(L, "llvm.loop.distribute.enable"))
		return TM_ForcedByUser;

		if (hasDisableAllTransformsHint(L))
		return TM_Disable;

		return TM_Unspecified;
		}

		TransformationMode llvm::hasLICMVersioningTransformation(Loop *L) {
		if (getBooleanLoopAttribute(L, "llvm.loop.licm_versioning.disable"))
		return TM_SuppressedByUser;

		if (hasDisableAllTransformsHint(L))
		return TM_Disable;

		return TM_Unspecified;
		}

/// Does a BFS from a given node to all of its children inside a given loop.		/// Does a BFS from a given node to all of its children inside a given loop.
/// The returned vector of nodes includes the starting point.		/// The returned vector of nodes includes the starting point.
SmallVector<DomTreeNode *, 16>		SmallVector<DomTreeNode *, 16>
llvm::collectChildrenInLoop(DomTreeNode *N, const Loop *CurLoop) {		llvm::collectChildrenInLoop(DomTreeNode *N, const Loop *CurLoop) {
SmallVector<DomTreeNode *, 16> Worklist;		SmallVector<DomTreeNode *, 16> Worklist;
auto AddRegionToWorklist = [&](DomTreeNode *DTN) {		auto AddRegionToWorklist = [&](DomTreeNode *DTN) {
// Only include subregions in the top level loop.		// Only include subregions in the top level loop.
BasicBlock *BB = DTN->getBlock();		BasicBlock *BB = DTN->getBlock();
▲ Show 20 Lines • Show All 478 Lines • Show Last 20 Lines

lib/Transforms/Vectorize/LoopVectorize.cpp


Show First 20 Lines • Show All 146 Lines • ▼ Show 20 Lines
#include <utility>		#include <utility>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;

#define LV_NAME "loop-vectorize"		#define LV_NAME "loop-vectorize"
#define DEBUG_TYPE LV_NAME		#define DEBUG_TYPE LV_NAME

		/// @{
		/// Metadata attribute names
		const char *const LLVMLoopVectorizeFollowupAll =
		"llvm.loop.vectorize.followup_all";
		const char *const LLVMLoopVectorizeFollowupVectorized =
		"llvm.loop.vectorize.followup_vectorized";
		const char *const LLVMLoopVectorizeFollowupEpilogue =
		"llvm.loop.vectorize.followup_epilogue";
		/// @}

STATISTIC(LoopsVectorized, "Number of loops vectorized");		STATISTIC(LoopsVectorized, "Number of loops vectorized");
STATISTIC(LoopsAnalyzed, "Number of loops analyzed for vectorization");		STATISTIC(LoopsAnalyzed, "Number of loops analyzed for vectorization");

/// Loops with a known constant trip count below this number are vectorized only		/// Loops with a known constant trip count below this number are vectorized only
/// if no scalar iteration overheads are incurred.		/// if no scalar iteration overheads are incurred.
static cl::opt<unsigned> TinyTripCountVectorThreshold(		static cl::opt<unsigned> TinyTripCountVectorThreshold(
"vectorizer-min-trip-count", cl::init(16), cl::Hidden,		"vectorizer-min-trip-count", cl::init(16), cl::Hidden,
cl::desc("Loops with a constant trip count that is smaller than this "		cl::desc("Loops with a constant trip count that is smaller than this "
▲ Show 20 Lines • Show All 628 Lines • ▼ Show 20 Lines
void InnerLoopVectorizer::addMetadata(ArrayRef<Value *> To,		void InnerLoopVectorizer::addMetadata(ArrayRef<Value *> To,
Instruction *From) {		Instruction *From) {
for (Value *V : To) {		for (Value *V : To) {
if (Instruction *I = dyn_cast<Instruction>(V))		if (Instruction *I = dyn_cast<Instruction>(V))
addMetadata(I, From);		addMetadata(I, From);
}		}
}		}

static void emitMissedWarning(Function *F, Loop *L,
const LoopVectorizeHints &LH,
OptimizationRemarkEmitter *ORE) {
LH.emitRemarkWithHints();

if (LH.getForce() == LoopVectorizeHints::FK_Enabled) {
if (LH.getWidth() != 1)
ORE->emit(DiagnosticInfoOptimizationFailure(
DEBUG_TYPE, "FailedRequestedVectorization",
L->getStartLoc(), L->getHeader())
<< "loop not vectorized: "
<< "failed explicitly specified loop vectorization");
else if (LH.getInterleave() != 1)
ORE->emit(DiagnosticInfoOptimizationFailure(
DEBUG_TYPE, "FailedRequestedInterleaving", L->getStartLoc(),
L->getHeader())
<< "loop not interleaved: "
<< "failed explicitly specified loop interleaving");
}
}

namespace llvm {		namespace llvm {

/// LoopVectorizationCostModel - estimates the expected speedups due to		/// LoopVectorizationCostModel - estimates the expected speedups due to
/// vectorization.		/// vectorization.
/// In many cases vectorization is not profitable. This can happen because of		/// In many cases vectorization is not profitable. This can happen because of
/// a number of reasons. In this class we mainly attempt to predict the		/// a number of reasons. In this class we mainly attempt to predict the
/// expected speedup/slowdowns due to the supported instruction set. We use the		/// expected speedup/slowdowns due to the supported instruction set. We use the
/// TargetTransformInfo to query the different backends for the cost of		/// TargetTransformInfo to query the different backends for the cost of
▲ Show 20 Lines • Show All 544 Lines • ▼ Show 20 Lines	static bool isExplicitVecOuterLoop(Loop *OuterLp,
Function *Fn = OuterLp->getHeader()->getParent();		Function *Fn = OuterLp->getHeader()->getParent();
if (!Hints.allowVectorization(Fn, OuterLp, false /*AlwaysVectorize*/)) {		if (!Hints.allowVectorization(Fn, OuterLp, false /*AlwaysVectorize*/)) {
LLVM_DEBUG(dbgs() << "LV: Loop hints prevent outer loop vectorization.\n");		LLVM_DEBUG(dbgs() << "LV: Loop hints prevent outer loop vectorization.\n");
return false;		return false;
}		}

if (!Hints.getWidth()) {		if (!Hints.getWidth()) {
LLVM_DEBUG(dbgs() << "LV: Not vectorizing: No user vector width.\n");		LLVM_DEBUG(dbgs() << "LV: Not vectorizing: No user vector width.\n");
emitMissedWarning(Fn, OuterLp, Hints, ORE);		Hints.emitRemarkWithHints();
return false;		return false;
}		}

if (Hints.getInterleave() > 1) {		if (Hints.getInterleave() > 1) {
// TODO: Interleave support is future work.		// TODO: Interleave support is future work.
LLVM_DEBUG(dbgs() << "LV: Not vectorizing: Interleave is not supported for "		LLVM_DEBUG(dbgs() << "LV: Not vectorizing: Interleave is not supported for "
"outer loops.\n");		"outer loops.\n");
emitMissedWarning(Fn, OuterLp, Hints, ORE);		Hints.emitRemarkWithHints();
return false;		return false;
}		}

return true;		return true;
}		}

static void collectSupportedLoops(Loop &L, LoopInfo *LI,		static void collectSupportedLoops(Loop &L, LoopInfo *LI,
OptimizationRemarkEmitter *ORE,		OptimizationRemarkEmitter *ORE,
▲ Show 20 Lines • Show All 1,337 Lines • ▼ Show 20 Lines	\| [ ]_\| <-- old scalar loop to handle remainder.
\ v		\ v
>[ ] <-- exit block.		>[ ] <-- exit block.
...		...
*/		*/

BasicBlock *OldBasicBlock = OrigLoop->getHeader();		BasicBlock *OldBasicBlock = OrigLoop->getHeader();
BasicBlock *VectorPH = OrigLoop->getLoopPreheader();		BasicBlock *VectorPH = OrigLoop->getLoopPreheader();
BasicBlock *ExitBlock = OrigLoop->getExitBlock();		BasicBlock *ExitBlock = OrigLoop->getExitBlock();
		MDNode *OrigLoopID = OrigLoop->getLoopID();
assert(VectorPH && "Invalid loop structure");		assert(VectorPH && "Invalid loop structure");
assert(ExitBlock && "Must have an exit block");		assert(ExitBlock && "Must have an exit block");

// Some loops have a single integer induction variable, while other loops		// Some loops have a single integer induction variable, while other loops
// don't. One example is c++ iterators that often have multiple pointer		// don't. One example is c++ iterators that often have multiple pointer
// induction variables. In the code below we also support a case where we		// induction variables. In the code below we also support a case where we
// don't have a single induction variable.		// don't have a single induction variable.
//		//
▲ Show 20 Lines • Show All 127 Lines • ▼ Show 20 Lines	BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton() {
// Save the state.		// Save the state.
LoopVectorPreHeader = Lp->getLoopPreheader();		LoopVectorPreHeader = Lp->getLoopPreheader();
LoopScalarPreHeader = ScalarPH;		LoopScalarPreHeader = ScalarPH;
LoopMiddleBlock = MiddleBlock;		LoopMiddleBlock = MiddleBlock;
LoopExitBlock = ExitBlock;		LoopExitBlock = ExitBlock;
LoopVectorBody = VecBody;		LoopVectorBody = VecBody;
LoopScalarBody = OldBasicBlock;		LoopScalarBody = OldBasicBlock;

		Optional<MDNode *> VectorizedLoopID =
		makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll,
		LLVMLoopVectorizeFollowupVectorized});
		if (VectorizedLoopID.hasValue()) {
		Lp->setLoopID(VectorizedLoopID.getValue());

		// Do not setAlreadyVectorized if loop attributes have been defined
		// explicitly.
		return LoopVectorPreHeader;
		}

// Keep all loop hints from the original loop on the vector loop (we'll		// Keep all loop hints from the original loop on the vector loop (we'll
// replace the vectorizer-specific hints below).		// replace the vectorizer-specific hints below).
if (MDNode *LID = OrigLoop->getLoopID())		if (MDNode *LID = OrigLoop->getLoopID())
Lp->setLoopID(LID);		Lp->setLoopID(LID);

LoopVectorizeHints Hints(Lp, true, *ORE);		LoopVectorizeHints Hints(Lp, true, *ORE);
Hints.setAlreadyVectorized();		Hints.setAlreadyVectorized();

▲ Show 20 Lines • Show All 4,279 Lines • ▼ Show 20 Lines	#endif /* NDEBUG */
PredicatedScalarEvolution PSE(SE, L);		PredicatedScalarEvolution PSE(SE, L);

// Check if it is legal to vectorize the loop.		// Check if it is legal to vectorize the loop.
LoopVectorizationRequirements Requirements(*ORE);		LoopVectorizationRequirements Requirements(*ORE);
LoopVectorizationLegality LVL(L, PSE, DT, TLI, AA, F, GetLAA, LI, ORE,		LoopVectorizationLegality LVL(L, PSE, DT, TLI, AA, F, GetLAA, LI, ORE,
&Requirements, &Hints, DB, AC);		&Requirements, &Hints, DB, AC);
if (!LVL.canVectorize(EnableVPlanNativePath)) {		if (!LVL.canVectorize(EnableVPlanNativePath)) {
LLVM_DEBUG(dbgs() << "LV: Not vectorizing: Cannot prove legality.\n");		LLVM_DEBUG(dbgs() << "LV: Not vectorizing: Cannot prove legality.\n");
emitMissedWarning(F, L, Hints, ORE);		Hints.emitRemarkWithHints();
return false;		return false;
}		}

// Check the function attributes to find out if this function should be		// Check the function attributes to find out if this function should be
// optimized for size.		// optimized for size.
bool OptForSize =		bool OptForSize =
Hints.getForce() != LoopVectorizeHints::FK_Enabled && F->optForSize();		Hints.getForce() != LoopVectorizeHints::FK_Enabled && F->optForSize();

▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	#endif /* NDEBUG */
// an integer loop and the vector instructions selected are purely integer		// an integer loop and the vector instructions selected are purely integer
// vector instructions?		// vector instructions?
if (F->hasFnAttribute(Attribute::NoImplicitFloat)) {		if (F->hasFnAttribute(Attribute::NoImplicitFloat)) {
LLVM_DEBUG(dbgs() << "LV: Can't vectorize when the NoImplicitFloat"		LLVM_DEBUG(dbgs() << "LV: Can't vectorize when the NoImplicitFloat"
"attribute is used.\n");		"attribute is used.\n");
ORE->emit(createLVMissedAnalysis(Hints.vectorizeAnalysisPassName(),		ORE->emit(createLVMissedAnalysis(Hints.vectorizeAnalysisPassName(),
"NoImplicitFloat", L)		"NoImplicitFloat", L)
<< "loop not vectorized due to NoImplicitFloat attribute");		<< "loop not vectorized due to NoImplicitFloat attribute");
emitMissedWarning(F, L, Hints, ORE);		Hints.emitRemarkWithHints();
return false;		return false;
}		}

// Check if the target supports potentially unsafe FP vectorization.		// Check if the target supports potentially unsafe FP vectorization.
// FIXME: Add a check for the type of safety issue (denormal, signaling)		// FIXME: Add a check for the type of safety issue (denormal, signaling)
// for the target we're vectorizing for, to make sure none of the		// for the target we're vectorizing for, to make sure none of the
// additional fp-math flags can help.		// additional fp-math flags can help.
if (Hints.isPotentiallyUnsafe() &&		if (Hints.isPotentiallyUnsafe() &&
TTI->isFPVectorizationPotentiallyUnsafe()) {		TTI->isFPVectorizationPotentiallyUnsafe()) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LV: Potentially unsafe FP op prevents vectorization.\n");		dbgs() << "LV: Potentially unsafe FP op prevents vectorization.\n");
ORE->emit(		ORE->emit(
createLVMissedAnalysis(Hints.vectorizeAnalysisPassName(), "UnsafeFP", L)		createLVMissedAnalysis(Hints.vectorizeAnalysisPassName(), "UnsafeFP", L)
<< "loop not vectorized due to unsafe FP support.");		<< "loop not vectorized due to unsafe FP support.");
emitMissedWarning(F, L, Hints, ORE);		Hints.emitRemarkWithHints();
return false;		return false;
}		}

bool UseInterleaved = TTI->enableInterleavedAccessVectorization();		bool UseInterleaved = TTI->enableInterleavedAccessVectorization();
InterleavedAccessInfo IAI(PSE, L, DT, LI, LVL.getLAI());		InterleavedAccessInfo IAI(PSE, L, DT, LI, LVL.getLAI());

// If an override option has been passed in for interleaved accesses, use it.		// If an override option has been passed in for interleaved accesses, use it.
if (EnableInterleavedMemAccesses.getNumOccurrences() > 0)		if (EnableInterleavedMemAccesses.getNumOccurrences() > 0)
Show All 25 Lines	#endif /* NDEBUG */
unsigned UserIC = Hints.getInterleave();		unsigned UserIC = Hints.getInterleave();

// Identify the diagnostic messages that should be produced.		// Identify the diagnostic messages that should be produced.
std::pair<StringRef, std::string> VecDiagMsg, IntDiagMsg;		std::pair<StringRef, std::string> VecDiagMsg, IntDiagMsg;
bool VectorizeLoop = true, InterleaveLoop = true;		bool VectorizeLoop = true, InterleaveLoop = true;
if (Requirements.doesNotMeet(F, L, Hints)) {		if (Requirements.doesNotMeet(F, L, Hints)) {
LLVM_DEBUG(dbgs() << "LV: Not vectorizing: loop did not meet vectorization "		LLVM_DEBUG(dbgs() << "LV: Not vectorizing: loop did not meet vectorization "
"requirements.\n");		"requirements.\n");
emitMissedWarning(F, L, Hints, ORE);		Hints.emitRemarkWithHints();
return false;		return false;
}		}

if (VF.Width == 1) {		if (VF.Width == 1) {
LLVM_DEBUG(dbgs() << "LV: Vectorization is possible but not beneficial.\n");		LLVM_DEBUG(dbgs() << "LV: Vectorization is possible but not beneficial.\n");
VecDiagMsg = std::make_pair(		VecDiagMsg = std::make_pair(
"VectorizationNotBeneficial",		"VectorizationNotBeneficial",
"the cost-model indicates that vectorization is not beneficial");		"the cost-model indicates that vectorization is not beneficial");
▲ Show 20 Lines • Show All 60 Lines • ▼ Show 20 Lines	if (!VectorizeLoop && !InterleaveLoop) {
LLVM_DEBUG(dbgs() << "LV: Found a vectorizable loop (" << VF.Width		LLVM_DEBUG(dbgs() << "LV: Found a vectorizable loop (" << VF.Width
<< ") in " << DebugLocStr << '\n');		<< ") in " << DebugLocStr << '\n');
LLVM_DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n');		LLVM_DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n');
}		}

LVP.setBestPlan(VF.Width, IC);		LVP.setBestPlan(VF.Width, IC);

using namespace ore;		using namespace ore;
		bool DisableRuntimeUnroll = false;
		MDNode *OrigLoopID = L->getLoopID();

if (!VectorizeLoop) {		if (!VectorizeLoop) {
assert(IC > 1 && "interleave count should not be 1 or 0");		assert(IC > 1 && "interleave count should not be 1 or 0");
// If we decided that it is not legal to vectorize the loop, then		// If we decided that it is not legal to vectorize the loop, then
// interleave it.		// interleave it.
InnerLoopUnroller Unroller(L, PSE, LI, DT, TLI, TTI, AC, ORE, IC, &LVL,		InnerLoopUnroller Unroller(L, PSE, LI, DT, TLI, TTI, AC, ORE, IC, &LVL,
&CM);		&CM);
LVP.executePlan(Unroller, DT);		LVP.executePlan(Unroller, DT);
Show All 10 Lines	InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, IC,
&LVL, &CM);		&LVL, &CM);
LVP.executePlan(LB, DT);		LVP.executePlan(LB, DT);
     ++LoopsVectorized;

     // Add metadata to disable runtime unrolling a scalar loop when there are
     // no runtime checks about strides and memory. A scalar loop that is
     // rarely used is not worth unrolling.
     if (!LB.areSafetyChecksAdded())
-      AddRuntimeUnrollDisableMetaData(L);
+      DisableRuntimeUnroll = true;

     // Report the vectorization decision.
     ORE->emit([&]() {
       return OptimizationRemark(LV_NAME, "Vectorized", L->getStartLoc(),
                                 L->getHeader())
              << "vectorized loop (vectorization width: "
              << NV("VectorizationFactor", VF.Width)
              << ", interleaved count: " << NV("InterleaveCount", IC) << ")";
     });
   }

-  // Mark the loop as already vectorized to avoid vectorizing again.
-  Hints.setAlreadyVectorized();
+  Optional<MDNode *> RemainderLoopID =
+      makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll,
+                                      LLVMLoopVectorizeFollowupEpilogue});
+  if (RemainderLoopID.hasValue()) {
+    L->setLoopID(RemainderLoopID.getValue());
+  } else {
+    if (DisableRuntimeUnroll)
+      AddRuntimeUnrollDisableMetaData(L);
+
+    // Mark the loop as already vectorized to avoid vectorizing again.
+    Hints.setAlreadyVectorized();
+  }

   LLVM_DEBUG(verifyFunction(*L->getHeader()->getParent()));
   return true;
 }

 bool LoopVectorizePass::runImpl(
     Function &F, ScalarEvolution &SE_, LoopInfo &LI_, TargetTransformInfo &TTI_,
     DominatorTree &DT_, BlockFrequencyInfo &BFI_, TargetLibraryInfo *TLI_,

test/Other/new-pm-defaults.ll

 ; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
 ; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopUnrollPass
 ; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
+; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
 ; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
 ; CHECK-O-NEXT: Starting llvm::Function pass manager run.
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Finished llvm::Function pass manager run.
 ; CHECK-O-NEXT: Running pass: AlignmentFromAssumptionsPass

test/Other/new-pm-thinlto-defaults.ll

 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopLoadEliminationPass
 ; CHECK-POSTLINK-O-NEXT: Running analysis: LoopAccessAnalysis
 ; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopUnrollPass
 ; CHECK-POSTLINK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
+; CHECK-POSTLINK-O-NEXT: Running pass: WarnMissedTransformationsPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
 ; CHECK-POSTLINK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
 ; CHECK-POSTLINK-O-NEXT: Starting llvm::Function pass manager run
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-POSTLINK-O-NEXT: Finished llvm::Function pass manager run
 ; CHECK-POSTLINK-O-NEXT: Running pass: AlignmentFromAssumptionsPass

test/Other/opt-O2-pipeline.ll

 ; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Combine redundant instructions
 ; CHECK-NEXT: Canonicalize natural loops
 ; CHECK-NEXT: LCSSA Verifier
 ; CHECK-NEXT: Loop-Closed SSA Form Pass
 ; CHECK-NEXT: Scalar Evolution Analysis
 ; CHECK-NEXT: Loop Pass Manager
 ; CHECK-NEXT: Loop Invariant Code Motion
+; CHECK-NEXT: Lazy Branch Probability Analysis
+; CHECK-NEXT: Lazy Block Frequency Analysis
+; CHECK-NEXT: Optimization Remark Emitter
+; CHECK-NEXT: Warn about non-applied transformations
 ; CHECK-NEXT: Alignment from assumptions
 ; CHECK-NEXT: Strip Unused Function Prototypes
 ; CHECK-NEXT: Dead Global Elimination
 ; CHECK-NEXT: Merge Duplicate Global Constants
 ; CHECK-NEXT: FunctionPass Manager
 ; CHECK-NEXT: Dominator Tree Construction
 ; CHECK-NEXT: Natural Loop Information
 ; CHECK-NEXT: Branch Probability Analysis

test/Other/opt-O3-pipeline.ll

 ; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Combine redundant instructions
 ; CHECK-NEXT: Canonicalize natural loops
 ; CHECK-NEXT: LCSSA Verifier
 ; CHECK-NEXT: Loop-Closed SSA Form Pass
 ; CHECK-NEXT: Scalar Evolution Analysis
 ; CHECK-NEXT: Loop Pass Manager
 ; CHECK-NEXT: Loop Invariant Code Motion
+; CHECK-NEXT: Lazy Branch Probability Analysis
+; CHECK-NEXT: Lazy Block Frequency Analysis
+; CHECK-NEXT: Optimization Remark Emitter
+; CHECK-NEXT: Warn about non-applied transformations
 ; CHECK-NEXT: Alignment from assumptions
 ; CHECK-NEXT: Strip Unused Function Prototypes
 ; CHECK-NEXT: Dead Global Elimination
 ; CHECK-NEXT: Merge Duplicate Global Constants
 ; CHECK-NEXT: FunctionPass Manager
 ; CHECK-NEXT: Dominator Tree Construction
 ; CHECK-NEXT: Natural Loop Information
 ; CHECK-NEXT: Branch Probability Analysis

test/Other/opt-Os-pipeline.ll

 ; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Combine redundant instructions
 ; CHECK-NEXT: Canonicalize natural loops
 ; CHECK-NEXT: LCSSA Verifier
 ; CHECK-NEXT: Loop-Closed SSA Form Pass
 ; CHECK-NEXT: Scalar Evolution Analysis
 ; CHECK-NEXT: Loop Pass Manager
 ; CHECK-NEXT: Loop Invariant Code Motion
+; CHECK-NEXT: Lazy Branch Probability Analysis
+; CHECK-NEXT: Lazy Block Frequency Analysis
+; CHECK-NEXT: Optimization Remark Emitter
+; CHECK-NEXT: Warn about non-applied transformations
 ; CHECK-NEXT: Alignment from assumptions
 ; CHECK-NEXT: Strip Unused Function Prototypes
 ; CHECK-NEXT: Dead Global Elimination
 ; CHECK-NEXT: Merge Duplicate Global Constants
 ; CHECK-NEXT: FunctionPass Manager
 ; CHECK-NEXT: Dominator Tree Construction
 ; CHECK-NEXT: Natural Loop Information
 ; CHECK-NEXT: Branch Probability Analysis

test/Other/opt-hot-cold-split.ll

 ; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Combine redundant instructions
 ; CHECK-NEXT: Canonicalize natural loops
 ; CHECK-NEXT: LCSSA Verifier
 ; CHECK-NEXT: Loop-Closed SSA Form Pass
 ; CHECK-NEXT: Scalar Evolution Analysis
 ; CHECK-NEXT: Loop Pass Manager
 ; CHECK-NEXT: Loop Invariant Code Motion
+; CHECK-NEXT: Lazy Branch Probability Analysis
+; CHECK-NEXT: Lazy Block Frequency Analysis
+; CHECK-NEXT: Optimization Remark Emitter
+; CHECK-NEXT: Warn about non-applied transformations
 ; CHECK-NEXT: Alignment from assumptions
 ; CHECK-NEXT: Strip Unused Function Prototypes
 ; CHECK-NEXT: Dead Global Elimination
 ; CHECK-NEXT: Merge Duplicate Global Constants
 ; CHECK-NEXT: FunctionPass Manager
 ; CHECK-NEXT: Dominator Tree Construction
 ; CHECK-NEXT: Natural Loop Information
 ; CHECK-NEXT: Branch Probability Analysis

test/Transforms/LoopDistribute/disable-heuristic.ll

This file was added.

				; RUN: opt -basicaa -loop-distribute -enable-loop-distribute=0 -S < %s | FileCheck %s

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

				; CHECK-LABEL: @disable_heuristic(
				; CHECK-NOT: for.body.ldist1:
				define void @disable_heuristic(i32* noalias %a,
				i32* noalias %b,
				i32* noalias %c,
				i32* noalias %d,
				i32* noalias %e) {
				entry:
				br label %for.body

				for.body:
				%ind = phi i64 [ 0, %entry ], [ %add, %for.body ]

				%arrayidxA = getelementptr inbounds i32, i32* %a, i64 %ind
				%loadA = load i32, i32* %arrayidxA, align 4

				%arrayidxB = getelementptr inbounds i32, i32* %b, i64 %ind
				%loadB = load i32, i32* %arrayidxB, align 4

				%mulA = mul i32 %loadB, %loadA

				%add = add nuw nsw i64 %ind, 1
				%arrayidxA_plus_4 = getelementptr inbounds i32, i32* %a, i64 %add
				store i32 %mulA, i32* %arrayidxA_plus_4, align 4

				%arrayidxD = getelementptr inbounds i32, i32* %d, i64 %ind
				%loadD = load i32, i32* %arrayidxD, align 4

				%arrayidxE = getelementptr inbounds i32, i32* %e, i64 %ind
				%loadE = load i32, i32* %arrayidxE, align 4

				%mulC = mul i32 %loadD, %loadE

				%arrayidxC = getelementptr inbounds i32, i32* %c, i64 %ind
				store i32 %mulC, i32* %arrayidxC, align 4

				%exitcond = icmp eq i64 %add, 20
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret void
				}

				!0 = distinct !{!0, !{!"llvm.loop.transformations.disable_nonforced"}}

test/Transforms/LoopDistribute/followup.ll

This file was added.

				; RUN: opt -basicaa -loop-distribute -S < %s | FileCheck %s

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(i32* %a, i32* %b, i32* %c, i32* %d, i32* %e) {
				entry:
				br label %for.body

				for.body:
				%ind = phi i64 [ 0, %entry ], [ %add, %for.body ]

				%arrayidxA = getelementptr inbounds i32, i32* %a, i64 %ind
				%loadA = load i32, i32* %arrayidxA, align 4

				%arrayidxB = getelementptr inbounds i32, i32* %b, i64 %ind
				%loadB = load i32, i32* %arrayidxB, align 4

				%mulA = mul i32 %loadB, %loadA

				%add = add nuw nsw i64 %ind, 1
				%arrayidxA_plus_4 = getelementptr inbounds i32, i32* %a, i64 %add
				store i32 %mulA, i32* %arrayidxA_plus_4, align 4

				%arrayidxD = getelementptr inbounds i32, i32* %d, i64 %ind
				%loadD = load i32, i32* %arrayidxD, align 4

				%arrayidxE = getelementptr inbounds i32, i32* %e, i64 %ind
				%loadE = load i32, i32* %arrayidxE, align 4

				%mulC = mul i32 %loadD, %loadE

				%arrayidxC = getelementptr inbounds i32, i32* %c, i64 %ind
				store i32 %mulC, i32* %arrayidxC, align 4

				%exitcond = icmp eq i64 %add, 20
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret void
				}

				!0 = !{!0, !1, !2, !3, !4, !5}
				!1 = !{!"llvm.loop.distribute.enable", i1 true}
				!2 = !{!"llvm.loop.distribute.followup_all", !{!"llvm.loop.unroll.runtime.disable"}}
				!3 = !{!"llvm.loop.distribute.followup_coincident", !{!"llvm.loop.vectorize.enable", i1 false}}
				!4 = !{!"llvm.loop.distribute.followup_sequential", !{!"llvm.loop.vectorize.width", i32 8}}
				!5 = !{!"llvm.loop.distribute.followup_fallback", !{!"llvm.loop.unroll.disable"}}

				; CHECK-LABEL: for.body.lver.orig:
				; CHECK: br i1 %exitcond.lver.orig, label %for.end, label %for.body.lver.orig, !llvm.loop ![[LOOP_ORIG:[0-9]+]]
				; CHECK-LABEL: for.body.ldist1:
				; CHECK: br i1 %exitcond.ldist1, label %for.body.ph, label %for.body.ldist1, !llvm.loop ![[LOOP_SEQUENTIAL:[0-9]+]]
				; CHECK-LABEL: for.body:
				; CHECK: br i1 %exitcond, label %for.end, label %for.body, !llvm.loop ![[LOOP_COINCIDENT:[0-9]+]]

				; CHECK: ![[LOOP_ORIG]] = distinct !{![[LOOP_ORIG]], ![[RUNTIME_DISABLE:[0-9]+]], ![[UNROLL_DISABLE:[0-9]+]]}
				; CHECK: ![[RUNTIME_DISABLE]] = !{!"llvm.loop.unroll.runtime.disable"}
				; CHECK: ![[UNROLL_DISABLE]] = !{!"llvm.loop.unroll.disable"}
				; CHECK: ![[LOOP_SEQUENTIAL]] = distinct !{![[LOOP_SEQUENTIAL]], ![[RUNTIME_DISABLE]], ![[WIDTH:[0-9]+]]}
				; CHECK: ![[WIDTH]] = !{!"llvm.loop.vectorize.width", i32 8}
				; CHECK: ![[LOOP_COINCIDENT]] = distinct !{![[LOOP_COINCIDENT]], ![[RUNTIME_DISABLE]], ![[VECTORIZE_ENABLE:[0-9]+]]}
				; CHECK: ![[VECTORIZE_ENABLE]] = !{!"llvm.loop.vectorize.enable", i1 false}
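
The CHECK lines above suggest how followup attributes are selected: each loop produced by distribution receives the attributes listed under `followup_all` plus those under the followup name for its own role. The following is a minimal standalone sketch of that selection, not the patch's `makeFollowupLoopID`. It models metadata as plain strings and ignores any inheritance of the original loop's other attributes. The names `Followup`, `FollowupResult`, `collectFollowups` and `followupTestAttrs` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one followup entry on a loop, e.g.
// !{!"llvm.loop.distribute.followup_all", !{!"llvm.loop.unroll.runtime.disable"}}.
struct Followup {
  std::string Name;               // name of the followup_* attribute
  std::vector<std::string> Attrs; // attributes it hands to the new loop
};

// Result of a lookup. Found stays false when none of the requested
// followup names is present on the loop.
struct FollowupResult {
  bool Found;
  std::vector<std::string> Attrs;
};

// Collects the attributes of every requested followup, in the order the
// names are requested (an assumption of this sketch).
FollowupResult collectFollowups(const std::vector<Followup> &LoopAttrs,
                                const std::vector<std::string> &Wanted) {
  FollowupResult R;
  R.Found = false;
  for (const std::string &Name : Wanted)
    for (const Followup &F : LoopAttrs)
      if (F.Name == Name) {
        R.Found = true;
        R.Attrs.insert(R.Attrs.end(), F.Attrs.begin(), F.Attrs.end());
      }
  return R;
}

// The four followup entries attached to the loop in followup.ll,
// with each nested attribute flattened to text.
std::vector<Followup> followupTestAttrs() {
  return {
      {"llvm.loop.distribute.followup_all",
       {"llvm.loop.unroll.runtime.disable"}},
      {"llvm.loop.distribute.followup_coincident",
       {"llvm.loop.vectorize.enable, i1 false"}},
      {"llvm.loop.distribute.followup_sequential",
       {"llvm.loop.vectorize.width, i32 8"}},
      {"llvm.loop.distribute.followup_fallback",
       {"llvm.loop.unroll.disable"}},
  };
}
```

Requesting `followup_all` together with `followup_sequential` yields the runtime-unroll-disable attribute followed by the width attribute, which mirrors the `LOOP_SEQUENTIAL` node checked above. Requesting a name that is absent leaves `Found` false, which a caller could use to fall back to its default behaviour.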

test/Transforms/LoopTransformWarning/distribution-remarks-missed.ll

This file was added.

				; Legacy pass manager
				; RUN: opt < %s -transform-warning -disable-output -pass-remarks-missed=transform-warning -pass-remarks-analysis=transform-warning 2>&1 | FileCheck %s
				; RUN: opt < %s -transform-warning -disable-output -pass-remarks-output=%t.yaml
				; RUN: cat %t.yaml | FileCheck -check-prefix=YAML %s

				; New pass manager
				; RUN: opt < %s -passes=transform-warning -disable-output -pass-remarks-missed=transform-warning -pass-remarks-analysis=transform-warning 2>&1 | FileCheck %s
				; RUN: opt < %s -passes=transform-warning -disable-output -pass-remarks-output=%t.yaml
				; RUN: cat %t.yaml | FileCheck -check-prefix=YAML %s


				; CHECK: warning: source.cpp:19:5: loop not distributed: failed explicitly specified loop distribution

				; YAML: --- !Failure
				; YAML-NEXT: Pass: transform-warning
				; YAML-NEXT: Name: FailedRequestedDistribution
				; YAML-NEXT: DebugLoc: { File: source.cpp, Line: 19, Column: 5 }
				; YAML-NEXT: Function: _Z17test_array_boundsPiS_i
				; YAML-NEXT: Args:
				; YAML-NEXT: - String: 'loop not distributed: failed explicitly specified loop distribution'
				; YAML-NEXT: ...

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

				define void @_Z17test_array_boundsPiS_i(i32* nocapture %A, i32* nocapture readonly %B, i32 %Length) !dbg !8 {
				entry:
				%cmp9 = icmp sgt i32 %Length, 0, !dbg !32
				br i1 %cmp9, label %for.body.preheader, label %for.end, !dbg !32

				for.body.preheader:
				br label %for.body, !dbg !35

				for.body:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i32, i32* %B, i64 %indvars.iv, !dbg !35
				%0 = load i32, i32* %arrayidx, align 4, !dbg !35, !tbaa !18
				%idxprom1 = sext i32 %0 to i64, !dbg !35
				%arrayidx2 = getelementptr inbounds i32, i32* %A, i64 %idxprom1, !dbg !35
				%1 = load i32, i32* %arrayidx2, align 4, !dbg !35, !tbaa !18
				%arrayidx4 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv, !dbg !35
				store i32 %1, i32* %arrayidx4, align 4, !dbg !35, !tbaa !18
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !32
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32, !dbg !32
				%exitcond = icmp eq i32 %lftr.wideiv, %Length, !dbg !32
				br i1 %exitcond, label %for.end.loopexit, label %for.body, !dbg !32, !llvm.loop !50

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void, !dbg !36
				}

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!9, !10}
				!llvm.ident = !{!11}

				!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, producer: "clang version 3.5.0", isOptimized: true, runtimeVersion: 6, emissionKind: LineTablesOnly, file: !1, enums: !2, retainedTypes: !2, globals: !2, imports: !2)
				!1 = !DIFile(filename: "source.cpp", directory: ".")
				!2 = !{}
				!4 = distinct !DISubprogram(name: "test", line: 1, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 1, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!5 = !DIFile(filename: "source.cpp", directory: ".")
				!6 = !DISubroutineType(types: !2)
				!7 = distinct !DISubprogram(name: "test_disabled", line: 10, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 10, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!8 = distinct !DISubprogram(name: "test_array_bounds", line: 16, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 16, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!9 = !{i32 2, !"Dwarf Version", i32 2}
				!10 = !{i32 2, !"Debug Info Version", i32 3}
				!11 = !{!"clang version 3.5.0"}
				!12 = !DILocation(line: 3, column: 8, scope: !13)
				!13 = distinct !DILexicalBlock(line: 3, column: 3, file: !1, scope: !4)
				!16 = !DILocation(line: 4, column: 5, scope: !17)
				!17 = distinct !DILexicalBlock(line: 3, column: 36, file: !1, scope: !13)
				!18 = !{!19, !19, i64 0}
				!19 = !{!"int", !20, i64 0}
				!20 = !{!"omnipotent char", !21, i64 0}
				!21 = !{!"Simple C/C++ TBAA"}
				!22 = !DILocation(line: 5, column: 9, scope: !23)
				!23 = distinct !DILexicalBlock(line: 5, column: 9, file: !1, scope: !17)
				!24 = !DILocation(line: 8, column: 1, scope: !4)
				!25 = !DILocation(line: 12, column: 8, scope: !26)
				!26 = distinct !DILexicalBlock(line: 12, column: 3, file: !1, scope: !7)
				!30 = !DILocation(line: 13, column: 5, scope: !26)
				!31 = !DILocation(line: 14, column: 1, scope: !7)
				!32 = !DILocation(line: 18, column: 8, scope: !33)
				!33 = distinct !DILexicalBlock(line: 18, column: 3, file: !1, scope: !8)
				!35 = !DILocation(line: 19, column: 5, scope: !33)
				!36 = !DILocation(line: 20, column: 1, scope: !8)
				!37 = distinct !DILexicalBlock(line: 24, column: 3, file: !1, scope: !46)
				!38 = !DILocation(line: 27, column: 3, scope: !37)
				!39 = !DILocation(line: 31, column: 3, scope: !37)
				!40 = !DILocation(line: 28, column: 9, scope: !37)
				!41 = !DILocation(line: 29, column: 11, scope: !37)
				!42 = !DILocation(line: 29, column: 7, scope: !37)
				!43 = !DILocation(line: 27, column: 32, scope: !37)
				!44 = !DILocation(line: 27, column: 30, scope: !37)
				!45 = !DILocation(line: 27, column: 21, scope: !37)
				!46 = distinct !DISubprogram(name: "test_multiple_failures", line: 26, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 26, file: !1, scope: !5, type: !6, retainedNodes: !2)

				!50 = !{!50, !{!"llvm.loop.distribute.enable"}}
				No newline at end of file

test/Transforms/LoopTransformWarning/unrollandjam-remarks-missed.ll

This file was added.

				; Legacy pass manager
				; RUN: opt < %s -transform-warning -disable-output -pass-remarks-missed=transform-warning -pass-remarks-analysis=transform-warning 2>&1 | FileCheck %s
				; RUN: opt < %s -transform-warning -disable-output -pass-remarks-output=%t.yaml
				; RUN: cat %t.yaml | FileCheck -check-prefix=YAML %s

				; New pass manager
				; RUN: opt < %s -passes=transform-warning -disable-output -pass-remarks-missed=transform-warning -pass-remarks-analysis=transform-warning 2>&1 | FileCheck %s
				; RUN: opt < %s -passes=transform-warning -disable-output -pass-remarks-output=%t.yaml
				; RUN: cat %t.yaml | FileCheck -check-prefix=YAML %s


				; CHECK: warning: source.cpp:19:5: loop not unroll-and-jammed: failed explicitly specified loop unroll-and-jam

				; YAML: --- !Failure
				; YAML-NEXT: Pass: transform-warning
				; YAML-NEXT: Name: FailedRequestedUnrollAndJamming
				; YAML-NEXT: DebugLoc: { File: source.cpp, Line: 19, Column: 5 }
				; YAML-NEXT: Function: _Z17test_array_boundsPiS_i
				; YAML-NEXT: Args:
				; YAML-NEXT: - String: 'loop not unroll-and-jammed: failed explicitly specified loop unroll-and-jam'
				; YAML-NEXT: ...

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

				define void @_Z17test_array_boundsPiS_i(i32* nocapture %A, i32* nocapture readonly %B, i32 %Length) !dbg !8 {
				entry:
				%cmp9 = icmp sgt i32 %Length, 0, !dbg !32
				br i1 %cmp9, label %for.body.preheader, label %for.end, !dbg !32

				for.body.preheader:
				br label %for.body, !dbg !35

				for.body:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i32, i32* %B, i64 %indvars.iv, !dbg !35
				%0 = load i32, i32* %arrayidx, align 4, !dbg !35, !tbaa !18
				%idxprom1 = sext i32 %0 to i64, !dbg !35
				%arrayidx2 = getelementptr inbounds i32, i32* %A, i64 %idxprom1, !dbg !35
				%1 = load i32, i32* %arrayidx2, align 4, !dbg !35, !tbaa !18
				%arrayidx4 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv, !dbg !35
				store i32 %1, i32* %arrayidx4, align 4, !dbg !35, !tbaa !18
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !32
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32, !dbg !32
				%exitcond = icmp eq i32 %lftr.wideiv, %Length, !dbg !32
				br i1 %exitcond, label %for.end.loopexit, label %for.body, !dbg !32, !llvm.loop !50

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void, !dbg !36
				}

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!9, !10}
				!llvm.ident = !{!11}

				!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, producer: "clang version 3.5.0", isOptimized: true, runtimeVersion: 6, emissionKind: LineTablesOnly, file: !1, enums: !2, retainedTypes: !2, globals: !2, imports: !2)
				!1 = !DIFile(filename: "source.cpp", directory: ".")
				!2 = !{}
				!4 = distinct !DISubprogram(name: "test", line: 1, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 1, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!5 = !DIFile(filename: "source.cpp", directory: ".")
				!6 = !DISubroutineType(types: !2)
				!7 = distinct !DISubprogram(name: "test_disabled", line: 10, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 10, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!8 = distinct !DISubprogram(name: "test_array_bounds", line: 16, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 16, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!9 = !{i32 2, !"Dwarf Version", i32 2}
				!10 = !{i32 2, !"Debug Info Version", i32 3}
				!11 = !{!"clang version 3.5.0"}
				!12 = !DILocation(line: 3, column: 8, scope: !13)
				!13 = distinct !DILexicalBlock(line: 3, column: 3, file: !1, scope: !4)
				!16 = !DILocation(line: 4, column: 5, scope: !17)
				!17 = distinct !DILexicalBlock(line: 3, column: 36, file: !1, scope: !13)
				!18 = !{!19, !19, i64 0}
				!19 = !{!"int", !20, i64 0}
				!20 = !{!"omnipotent char", !21, i64 0}
				!21 = !{!"Simple C/C++ TBAA"}
				!22 = !DILocation(line: 5, column: 9, scope: !23)
				!23 = distinct !DILexicalBlock(line: 5, column: 9, file: !1, scope: !17)
				!24 = !DILocation(line: 8, column: 1, scope: !4)
				!25 = !DILocation(line: 12, column: 8, scope: !26)
				!26 = distinct !DILexicalBlock(line: 12, column: 3, file: !1, scope: !7)
				!30 = !DILocation(line: 13, column: 5, scope: !26)
				!31 = !DILocation(line: 14, column: 1, scope: !7)
				!32 = !DILocation(line: 18, column: 8, scope: !33)
				!33 = distinct !DILexicalBlock(line: 18, column: 3, file: !1, scope: !8)
				!35 = !DILocation(line: 19, column: 5, scope: !33)
				!36 = !DILocation(line: 20, column: 1, scope: !8)
				!37 = distinct !DILexicalBlock(line: 24, column: 3, file: !1, scope: !46)
				!38 = !DILocation(line: 27, column: 3, scope: !37)
				!39 = !DILocation(line: 31, column: 3, scope: !37)
				!40 = !DILocation(line: 28, column: 9, scope: !37)
				!41 = !DILocation(line: 29, column: 11, scope: !37)
				!42 = !DILocation(line: 29, column: 7, scope: !37)
				!43 = !DILocation(line: 27, column: 32, scope: !37)
				!44 = !DILocation(line: 27, column: 30, scope: !37)
				!45 = !DILocation(line: 27, column: 21, scope: !37)
				!46 = distinct !DISubprogram(name: "test_multiple_failures", line: 26, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 26, file: !1, scope: !5, type: !6, retainedNodes: !2)

				!50 = !{!50, !{!"llvm.loop.unroll_and_jam.enable"}}
				No newline at end of file

test/Transforms/LoopTransformWarning/unrolling-remarks-missed.ll

This file was added.

				; Legacy pass manager
				; RUN: opt < %s -transform-warning -disable-output -pass-remarks-missed=transform-warning -pass-remarks-analysis=transform-warning 2>&1 | FileCheck %s
				; RUN: opt < %s -transform-warning -disable-output -pass-remarks-output=%t.yaml
				; RUN: cat %t.yaml | FileCheck -check-prefix=YAML %s

				; New pass manager
				; RUN: opt < %s -passes=transform-warning -disable-output -pass-remarks-missed=transform-warning -pass-remarks-analysis=transform-warning 2>&1 | FileCheck %s
				; RUN: opt < %s -passes=transform-warning -disable-output -pass-remarks-output=%t.yaml
				; RUN: cat %t.yaml | FileCheck -check-prefix=YAML %s


				; CHECK: warning: source.cpp:19:5: loop not unrolled: failed explicitly specified loop unrolling

				; YAML: --- !Failure
				; YAML-NEXT: Pass: transform-warning
				; YAML-NEXT: Name: FailedRequestedUnrolling
				; YAML-NEXT: DebugLoc: { File: source.cpp, Line: 19, Column: 5 }
				; YAML-NEXT: Function: _Z17test_array_boundsPiS_i
				; YAML-NEXT: Args:
				; YAML-NEXT: - String: 'loop not unrolled: failed explicitly specified loop unrolling'
				; YAML-NEXT: ...

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

				define void @_Z17test_array_boundsPiS_i(i32* nocapture %A, i32* nocapture readonly %B, i32 %Length) !dbg !8 {
				entry:
				%cmp9 = icmp sgt i32 %Length, 0, !dbg !32
				br i1 %cmp9, label %for.body.preheader, label %for.end, !dbg !32

				for.body.preheader:
				br label %for.body, !dbg !35

				for.body:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i32, i32* %B, i64 %indvars.iv, !dbg !35
				%0 = load i32, i32* %arrayidx, align 4, !dbg !35, !tbaa !18
				%idxprom1 = sext i32 %0 to i64, !dbg !35
				%arrayidx2 = getelementptr inbounds i32, i32* %A, i64 %idxprom1, !dbg !35
				%1 = load i32, i32* %arrayidx2, align 4, !dbg !35, !tbaa !18
				%arrayidx4 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv, !dbg !35
				store i32 %1, i32* %arrayidx4, align 4, !dbg !35, !tbaa !18
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !32
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32, !dbg !32
				%exitcond = icmp eq i32 %lftr.wideiv, %Length, !dbg !32
				br i1 %exitcond, label %for.end.loopexit, label %for.body, !dbg !32, !llvm.loop !50

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void, !dbg !36
				}

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!9, !10}
				!llvm.ident = !{!11}

				!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, producer: "clang version 3.5.0", isOptimized: true, runtimeVersion: 6, emissionKind: LineTablesOnly, file: !1, enums: !2, retainedTypes: !2, globals: !2, imports: !2)
				!1 = !DIFile(filename: "source.cpp", directory: ".")
				!2 = !{}
				!4 = distinct !DISubprogram(name: "test", line: 1, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 1, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!5 = !DIFile(filename: "source.cpp", directory: ".")
				!6 = !DISubroutineType(types: !2)
				!7 = distinct !DISubprogram(name: "test_disabled", line: 10, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 10, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!8 = distinct !DISubprogram(name: "test_array_bounds", line: 16, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 16, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!9 = !{i32 2, !"Dwarf Version", i32 2}
				!10 = !{i32 2, !"Debug Info Version", i32 3}
				!11 = !{!"clang version 3.5.0"}
				!12 = !DILocation(line: 3, column: 8, scope: !13)
				!13 = distinct !DILexicalBlock(line: 3, column: 3, file: !1, scope: !4)
				!16 = !DILocation(line: 4, column: 5, scope: !17)
				!17 = distinct !DILexicalBlock(line: 3, column: 36, file: !1, scope: !13)
				!18 = !{!19, !19, i64 0}
				!19 = !{!"int", !20, i64 0}
				!20 = !{!"omnipotent char", !21, i64 0}
				!21 = !{!"Simple C/C++ TBAA"}
				!22 = !DILocation(line: 5, column: 9, scope: !23)
				!23 = distinct !DILexicalBlock(line: 5, column: 9, file: !1, scope: !17)
				!24 = !DILocation(line: 8, column: 1, scope: !4)
				!25 = !DILocation(line: 12, column: 8, scope: !26)
				!26 = distinct !DILexicalBlock(line: 12, column: 3, file: !1, scope: !7)
				!30 = !DILocation(line: 13, column: 5, scope: !26)
				!31 = !DILocation(line: 14, column: 1, scope: !7)
				!32 = !DILocation(line: 18, column: 8, scope: !33)
				!33 = distinct !DILexicalBlock(line: 18, column: 3, file: !1, scope: !8)
				!35 = !DILocation(line: 19, column: 5, scope: !33)
				!36 = !DILocation(line: 20, column: 1, scope: !8)
				!37 = distinct !DILexicalBlock(line: 24, column: 3, file: !1, scope: !46)
				!38 = !DILocation(line: 27, column: 3, scope: !37)
				!39 = !DILocation(line: 31, column: 3, scope: !37)
				!40 = !DILocation(line: 28, column: 9, scope: !37)
				!41 = !DILocation(line: 29, column: 11, scope: !37)
				!42 = !DILocation(line: 29, column: 7, scope: !37)
				!43 = !DILocation(line: 27, column: 32, scope: !37)
				!44 = !DILocation(line: 27, column: 30, scope: !37)
				!45 = !DILocation(line: 27, column: 21, scope: !37)
				!46 = distinct !DISubprogram(name: "test_multiple_failures", line: 26, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 26, file: !1, scope: !5, type: !6, retainedNodes: !2)

				!50 = !{!50, !{!"llvm.loop.unroll.enable"}}
				No newline at end of file

test/Transforms/LoopTransformWarning/vectorization-remarks-missed.ll

This file was added.

				; Legacy pass manager
				; RUN: opt < %s -transform-warning -disable-output -pass-remarks-missed=transform-warning -pass-remarks-analysis=transform-warning 2>&1 | FileCheck %s
				; RUN: opt < %s -transform-warning -disable-output -pass-remarks-output=%t.yaml
				; RUN: cat %t.yaml | FileCheck -check-prefix=YAML %s

				; New pass manager
				; RUN: opt < %s -passes=transform-warning -disable-output -pass-remarks-missed=transform-warning -pass-remarks-analysis=transform-warning 2>&1 | FileCheck %s
				; RUN: opt < %s -passes=transform-warning -disable-output -pass-remarks-output=%t.yaml
				; RUN: cat %t.yaml | FileCheck -check-prefix=YAML %s

				; C/C++ code for tests
				; void test(int *A, int Length) {
				; #pragma clang loop vectorize(enable) interleave(enable)
				; for (int i = 0; i < Length; i++) {
				; A[i] = i;
				; if (A[i] > Length)
				; break;
				; }
				; }
				; File, line, and column should match those specified in the metadata
				; CHECK: warning: source.cpp:19:5: loop not vectorized: failed explicitly specified loop vectorization

				; YAML: --- !Failure
				; YAML-NEXT: Pass: transform-warning
				; YAML-NEXT: Name: FailedRequestedVectorization
				; YAML-NEXT: DebugLoc: { File: source.cpp, Line: 19, Column: 5 }
				; YAML-NEXT: Function: _Z17test_array_boundsPiS_i
				; YAML-NEXT: Args:
				; YAML-NEXT: - String: 'loop not vectorized: '
				; YAML-NEXT: - String: failed explicitly specified loop vectorization
				; YAML-NEXT: ...

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

				define void @_Z17test_array_boundsPiS_i(i32* nocapture %A, i32* nocapture readonly %B, i32 %Length) !dbg !8 {
				entry:
				%cmp9 = icmp sgt i32 %Length, 0, !dbg !32
				br i1 %cmp9, label %for.body.preheader, label %for.end, !dbg !32, !llvm.loop !34

				for.body.preheader:
				br label %for.body, !dbg !35

				for.body:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i32, i32* %B, i64 %indvars.iv, !dbg !35
				%0 = load i32, i32* %arrayidx, align 4, !dbg !35, !tbaa !18
				%idxprom1 = sext i32 %0 to i64, !dbg !35
				%arrayidx2 = getelementptr inbounds i32, i32* %A, i64 %idxprom1, !dbg !35
				%1 = load i32, i32* %arrayidx2, align 4, !dbg !35, !tbaa !18
				%arrayidx4 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv, !dbg !35
				store i32 %1, i32* %arrayidx4, align 4, !dbg !35, !tbaa !18
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !32
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32, !dbg !32
				%exitcond = icmp eq i32 %lftr.wideiv, %Length, !dbg !32
				br i1 %exitcond, label %for.end.loopexit, label %for.body, !dbg !32, !llvm.loop !34

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void, !dbg !36
				}

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!9, !10}
				!llvm.ident = !{!11}

				!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, producer: "clang version 3.5.0", isOptimized: true, runtimeVersion: 6, emissionKind: LineTablesOnly, file: !1, enums: !2, retainedTypes: !2, globals: !2, imports: !2)
				!1 = !DIFile(filename: "source.cpp", directory: ".")
				!2 = !{}
				!4 = distinct !DISubprogram(name: "test", line: 1, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 1, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!5 = !DIFile(filename: "source.cpp", directory: ".")
				!6 = !DISubroutineType(types: !2)
				!7 = distinct !DISubprogram(name: "test_disabled", line: 10, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 10, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!8 = distinct !DISubprogram(name: "test_array_bounds", line: 16, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 16, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!9 = !{i32 2, !"Dwarf Version", i32 2}
				!10 = !{i32 2, !"Debug Info Version", i32 3}
				!11 = !{!"clang version 3.5.0"}
				!12 = !DILocation(line: 3, column: 8, scope: !13)
				!13 = distinct !DILexicalBlock(line: 3, column: 3, file: !1, scope: !4)
				!14 = !{!14, !15, !15}
				!15 = !{!"llvm.loop.vectorize.enable", i1 true}
				!16 = !DILocation(line: 4, column: 5, scope: !17)
				!17 = distinct !DILexicalBlock(line: 3, column: 36, file: !1, scope: !13)
				!18 = !{!19, !19, i64 0}
				!19 = !{!"int", !20, i64 0}
				!20 = !{!"omnipotent char", !21, i64 0}
				!21 = !{!"Simple C/C++ TBAA"}
				!22 = !DILocation(line: 5, column: 9, scope: !23)
				!23 = distinct !DILexicalBlock(line: 5, column: 9, file: !1, scope: !17)
				!24 = !DILocation(line: 8, column: 1, scope: !4)
				!25 = !DILocation(line: 12, column: 8, scope: !26)
				!26 = distinct !DILexicalBlock(line: 12, column: 3, file: !1, scope: !7)
				!27 = !{!27, !28, !29}
				!28 = !{!"llvm.loop.interleave.count", i32 1}
				!29 = !{!"llvm.loop.vectorize.width", i32 1}
				!30 = !DILocation(line: 13, column: 5, scope: !26)
				!31 = !DILocation(line: 14, column: 1, scope: !7)
				!32 = !DILocation(line: 18, column: 8, scope: !33)
				!33 = distinct !DILexicalBlock(line: 18, column: 3, file: !1, scope: !8)
				!34 = !{!34, !15}
				!35 = !DILocation(line: 19, column: 5, scope: !33)
				!36 = !DILocation(line: 20, column: 1, scope: !8)
				!37 = distinct !DILexicalBlock(line: 24, column: 3, file: !1, scope: !46)
				!38 = !DILocation(line: 27, column: 3, scope: !37)
				!39 = !DILocation(line: 31, column: 3, scope: !37)
				!40 = !DILocation(line: 28, column: 9, scope: !37)
				!41 = !DILocation(line: 29, column: 11, scope: !37)
				!42 = !DILocation(line: 29, column: 7, scope: !37)
				!43 = !DILocation(line: 27, column: 32, scope: !37)
				!44 = !DILocation(line: 27, column: 30, scope: !37)
				!45 = !DILocation(line: 27, column: 21, scope: !37)
				!46 = distinct !DISubprogram(name: "test_multiple_failures", line: 26, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: true, unit: !0, scopeLine: 26, file: !1, scope: !5, type: !6, retainedNodes: !2)

test/Transforms/LoopUnroll/disable_nonforced.ll

This file was added.

				; RUN: opt -loop-unroll -S < %s | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				; CHECK-LABEL: @disable_nonforced(
				; CHECK: load
				; CHECK-NOT: load
				define void @disable_nonforced(i32* nocapture %a) {
				entry:
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%inc = add nsw i32 %0, 1
				store i32 %inc, i32* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 64
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret void
				}

				!0 = !{!0, !{!"llvm.loop.transformations.disable_nonforced"}}

test/Transforms/LoopUnroll/disable_nonforced_count.ll

This file was added.

				; RUN: opt -loop-unroll -S < %s | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				; CHECK-LABEL: @disable_nonforced_count(
				; CHECK: store
				; CHECK: store
				; CHECK-NOT: store
				define void @disable_nonforced_count(i32* nocapture %a) {
				entry:
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%inc = add nsw i32 %0, 1
				store i32 %inc, i32* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 64
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret void
				}

				!0 = !{!0, !{!"llvm.loop.transformations.disable_nonforced"}, !{!"llvm.loop.unroll.count", i32 2}}

test/Transforms/LoopUnroll/disable_nonforced_enable.ll

This file was added.

				; RUN: opt -loop-unroll -unroll-count=2 -S < %s | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				; CHECK-LABEL: @disable_nonforced_enable(
				; CHECK: store
				; CHECK: store
				; CHECK-NOT: store
				define void @disable_nonforced_enable(i32* nocapture %a) {
				entry:
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%inc = add nsw i32 %0, 1
				store i32 %inc, i32* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 64
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret void
				}

				!0 = !{!0, !{!"llvm.loop.transformations.disable_nonforced"}, !{!"llvm.loop.unroll.enable"}}

test/Transforms/LoopUnroll/disable_nonforced_full.ll

This file was added.

				; RUN: opt -loop-unroll -S < %s | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				; CHECK-LABEL: @disable_nonforced_full(
				; CHECK: store
				; CHECK: store
				; CHECK: store
				; CHECK: store
				; CHECK-NOT: store
				define void @disable_nonforced_full(i32* nocapture %a) {
				entry:
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%inc = add nsw i32 %0, 1
				store i32 %inc, i32* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 4
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret void
				}

				!0 = !{!0, !{!"llvm.loop.transformations.disable_nonforced"}, !{!"llvm.loop.unroll.full"}}

test/Transforms/LoopUnroll/runtime-loop_transform.ll

This file was added.

				; RUN: opt < %s -S -loop-unroll -unroll-runtime=true -unroll-runtime-epilog=true | FileCheck %s -check-prefix=EPILOG
				; RUN: opt < %s -S -loop-unroll -unroll-runtime=true -unroll-runtime-epilog=false | FileCheck %s -check-prefix=PROLOG

				; RUN: opt < %s -S -passes='require<opt-remark-emit>,unroll' -unroll-runtime=true -unroll-runtime-epilog=true | FileCheck %s -check-prefix=EPILOG
				; RUN: opt < %s -S -passes='require<opt-remark-emit>,unroll' -unroll-runtime=true -unroll-runtime-epilog=false | FileCheck %s -check-prefix=PROLOG

				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

				; Tests for unrolling loops with run-time trip counts

				; EPILOG: %xtraiter = and i32 %n
				; EPILOG: %lcmp.mod = icmp ne i32 %xtraiter, 0
				; EPILOG: br i1 %lcmp.mod, label %for.body.epil.preheader, label %for.end.loopexit

				; PROLOG: %xtraiter = and i32 %n
				; PROLOG: %lcmp.mod = icmp ne i32 %xtraiter, 0
				; PROLOG: br i1 %lcmp.mod, label %for.body.prol.preheader, label %for.body.prol.loopexit

				; EPILOG: for.body.epil:
				; EPILOG: %indvars.iv.epil = phi i64 [ %indvars.iv.next.epil, %for.body.epil ], [ %indvars.iv.unr, %for.body.epil.preheader ]
				; EPILOG: %epil.iter.sub = sub i32 %epil.iter, 1
				; EPILOG: %epil.iter.cmp = icmp ne i32 %epil.iter.sub, 0
				; EPILOG: br i1 %epil.iter.cmp, label %for.body.epil, label %for.end.loopexit.epilog-lcssa, !llvm.loop !2

				; PROLOG: for.body.prol:
				; PROLOG: %indvars.iv.prol = phi i64 [ %indvars.iv.next.prol, %for.body.prol ], [ 0, %for.body.prol.preheader ]
				; PROLOG: %prol.iter.sub = sub i32 %prol.iter, 1
				; PROLOG: %prol.iter.cmp = icmp ne i32 %prol.iter.sub, 0
				; PROLOG: br i1 %prol.iter.cmp, label %for.body.prol, label %for.body.prol.loopexit.unr-lcssa, !llvm.loop !0


				define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly {
				entry:
				%cmp1 = icmp eq i32 %n, 0
				br i1 %cmp1, label %for.end, label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
				%sum.02 = phi i32 [ %add, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%add = add nsw i32 %0, %sum.02
				%indvars.iv.next = add i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %n
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4

				for.end: ; preds = %for.body, %entry
				%sum.0.lcssa = phi i32 [ 0, %entry ], [ %add, %for.body ]
				ret i32 %sum.0.lcssa
				}


				; Still try to completely unroll loops with compile-time trip counts
				; even if the -unroll-runtime is specified

				; EPILOG: for.body:
				; EPILOG-NOT: for.body.epil:

				; PROLOG: for.body:
				; PROLOG-NOT: for.body.prol:

				define i32 @test1(i32* nocapture %a) nounwind uwtable readonly {
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%sum.01 = phi i32 [ 0, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%add = add nsw i32 %0, %sum.01
				%indvars.iv.next = add i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, 5
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret i32 %add
				}

				; This is test 2007-05-09-UnknownTripCount.ll which can be unrolled now
				; if the -unroll-runtime option is turned on

				; EPILOG: bb72.2:
				; PROLOG: bb72.2:

				define void @foo(i32 %trips) {
				entry:
				br label %cond_true.outer

				cond_true.outer:
				%indvar1.ph = phi i32 [ 0, %entry ], [ %indvar.next2, %bb72 ]
				br label %bb72

				bb72:
				%indvar.next2 = add i32 %indvar1.ph, 1
				%exitcond3 = icmp eq i32 %indvar.next2, %trips
				br i1 %exitcond3, label %cond_true138, label %cond_true.outer

				cond_true138:
				ret void
				}


				; Test run-time unrolling for a loop that counts down by -2.

				; EPILOG: for.body.epil:
				; EPILOG: br i1 %epil.iter.cmp, label %for.body.epil, label %for.cond.for.end_crit_edge.epilog-lcssa

				; PROLOG: for.body.prol:
				; PROLOG: br i1 %prol.iter.cmp, label %for.body.prol, label %for.body.prol.loopexit

				define zeroext i16 @down(i16* nocapture %p, i32 %len) nounwind uwtable readonly {
				entry:
				%cmp2 = icmp eq i32 %len, 0
				br i1 %cmp2, label %for.end, label %for.body

				for.body: ; preds = %for.body, %entry
				%p.addr.05 = phi i16* [ %incdec.ptr, %for.body ], [ %p, %entry ]
				%len.addr.04 = phi i32 [ %sub, %for.body ], [ %len, %entry ]
				%res.03 = phi i32 [ %add, %for.body ], [ 0, %entry ]
				%incdec.ptr = getelementptr inbounds i16, i16* %p.addr.05, i64 1
				%0 = load i16, i16* %p.addr.05, align 2
				%conv = zext i16 %0 to i32
				%add = add i32 %conv, %res.03
				%sub = add nsw i32 %len.addr.04, -2
				%cmp = icmp eq i32 %sub, 0
				br i1 %cmp, label %for.cond.for.end_crit_edge, label %for.body

				for.cond.for.end_crit_edge: ; preds = %for.body
				%phitmp = trunc i32 %add to i16
				br label %for.end

				for.end: ; preds = %for.cond.for.end_crit_edge, %entry
				%res.0.lcssa = phi i16 [ %phitmp, %for.cond.for.end_crit_edge ], [ 0, %entry ]
				ret i16 %res.0.lcssa
				}

				; Test run-time unrolling disable metadata.
				; EPILOG: for.body:
				; EPILOG-NOT: for.body.epil:

				; PROLOG: for.body:
				; PROLOG-NOT: for.body.prol:

				define zeroext i16 @test2(i16* nocapture %p, i32 %len) nounwind uwtable readonly {
				entry:
				%cmp2 = icmp eq i32 %len, 0
				br i1 %cmp2, label %for.end, label %for.body

				for.body: ; preds = %for.body, %entry
				%p.addr.05 = phi i16* [ %incdec.ptr, %for.body ], [ %p, %entry ]
				%len.addr.04 = phi i32 [ %sub, %for.body ], [ %len, %entry ]
				%res.03 = phi i32 [ %add, %for.body ], [ 0, %entry ]
				%incdec.ptr = getelementptr inbounds i16, i16* %p.addr.05, i64 1
				%0 = load i16, i16* %p.addr.05, align 2
				%conv = zext i16 %0 to i32
				%add = add i32 %conv, %res.03
				%sub = add nsw i32 %len.addr.04, -2
				%cmp = icmp eq i32 %sub, 0
				br i1 %cmp, label %for.cond.for.end_crit_edge, label %for.body, !llvm.loop !0

				for.cond.for.end_crit_edge: ; preds = %for.body
				%phitmp = trunc i32 %add to i16
				br label %for.end

				for.end: ; preds = %for.cond.for.end_crit_edge, %entry
				%res.0.lcssa = phi i16 [ %phitmp, %for.cond.for.end_crit_edge ], [ 0, %entry ]
				ret i16 %res.0.lcssa
				}

				; dont unroll loop with multiple exit/exiting blocks, unless
				; -runtime-unroll-multi-exit=true
				; single exit, multiple exiting blocks.
				define void @unique_exit(i32 %arg) {
				; PROLOG: unique_exit(
				; PROLOG-NOT: .unr

				; EPILOG: unique_exit(
				; EPILOG-NOT: .unr
				entry:
				%tmp = icmp sgt i32 undef, %arg
				br i1 %tmp, label %preheader, label %returnblock

				preheader: ; preds = %entry
				br label %header

				LoopExit: ; preds = %header, %latch
				%tmp2.ph = phi i32 [ %tmp4, %header ], [ -1, %latch ]
				br label %returnblock

				returnblock: ; preds = %LoopExit, %entry
				%tmp2 = phi i32 [ -1, %entry ], [ %tmp2.ph, %LoopExit ]
				ret void

				header: ; preds = %preheader, %latch
				%tmp4 = phi i32 [ %inc, %latch ], [ %arg, %preheader ]
				%inc = add nsw i32 %tmp4, 1
				br i1 true, label %LoopExit, label %latch

				latch: ; preds = %header
				%cmp = icmp slt i32 %inc, undef
				br i1 %cmp, label %header, label %LoopExit
				}

				; multiple exit blocks. don't unroll
				define void @multi_exit(i64 %trip, i1 %cond) {
				; PROLOG: multi_exit(
				; PROLOG-NOT: .unr

				; EPILOG: multi_exit(
				; EPILOG-NOT: .unr
				entry:
				br label %loop_header

				loop_header:
				%iv = phi i64 [ 0, %entry ], [ %iv_next, %loop_latch ]
				br i1 %cond, label %loop_latch, label %loop_exiting_bb1

				loop_exiting_bb1:
				br i1 false, label %loop_exiting_bb2, label %exit1

				loop_exiting_bb2:
				br i1 false, label %loop_latch, label %exit3

				exit3:
				ret void

				loop_latch:
				%iv_next = add i64 %iv, 1
				%cmp = icmp ne i64 %iv_next, %trip
				br i1 %cmp, label %loop_header, label %exit2.loopexit

				exit1:
				ret void

				exit2.loopexit:
				ret void
				}
				!0 = distinct !{!0, !1, !2, !3}
				!1 = !{!"llvm.loop.unroll.runtime.disable"}
				!2 = !{!"llvm.loop.unroll.followup_unrolled", !{!"llvm.loop.unroll.disable"}}
				!3 = !{!"llvm.loop.unroll.followup_remainder", !{!"llvm.loop.unroll.disable"}}
				!4 = distinct !{!4, !2, !3}

				; EPILOG: !0 = distinct !{!0, !1}
				; EPILOG: !1 = !{!"llvm.loop.unroll.disable"}

				; PROLOG: !0 = distinct !{!0, !1}
				; PROLOG: !1 = !{!"llvm.loop.unroll.disable"}

test/Transforms/LoopUnroll/unroll-count_transform.ll

This file was added.

				; RUN: opt < %s -S -loop-unroll -unroll-count=2 | FileCheck %s
				; Checks that "llvm.loop.unroll.disable" is set when
				; unroll with count set by user has been applied.
				;
				; CHECK-LABEL: @foo(
				; CHECK: llvm.loop.unroll.disable

				define void @foo(i32* nocapture %a) {
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%inc = add nsw i32 %0, 1
				store i32 %inc, i32* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 64
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body
				ret void
				}

				!0 = !{!0, !{!"llvm.loop.unroll.followup", !{!"llvm.loop.unroll.disable"}}}

test/Transforms/LoopUnroll/unroll-pragmas-disabled_transform.ll

This file was added.

				; RUN: opt < %s -loop-unroll -S | FileCheck %s
				;
				; Verify that the unrolling pass removes existing unroll count metadata
				; and adds a disable unrolling node after unrolling is complete.

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; #pragma clang loop vectorize(enable) unroll_count(4) vectorize_width(8)
				;
				; Unroll count metadata should be replaced with unroll(disable). Vectorize
				; metadata should be untouched.
				;
				; CHECK-LABEL: @unroll_count_4(
				; CHECK: br i1 {{.*}}, label {{.*}}, label {{.*}}, !llvm.loop ![[LOOP_1:.*]]
				define void @unroll_count_4(i32* nocapture %a) {
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%inc = add nsw i32 %0, 1
				store i32 %inc, i32* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 64
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !1

				for.end: ; preds = %for.body
				ret void
				}
				!1 = !{!1, !3, !11}
				!2 = !{!"llvm.loop.vectorize.enable", i1 true}
				!3 = !{!"llvm.loop.unroll.count", i32 4}
				!4 = !{!"llvm.loop.vectorize.width", i32 8}
				!11 = !{!"llvm.loop.unroll.followup_unrolled", !2, !4, !{!"llvm.loop.unroll.disable"}}

				; #pragma clang loop unroll(full)
				;
				; An unroll disable metadata node is only added for the unroll count case.
				; In this case, the loop has a full unroll metadata but can't be fully unrolled
				; because the trip count is dynamic. The full unroll metadata should remain
				; after unrolling.
				;
				; CHECK-LABEL: @unroll_full(
				; CHECK: br i1 {{.*}}, label {{.*}}, label {{.*}}, !llvm.loop ![[LOOP_2:.*]]
				define void @unroll_full(i32* nocapture %a, i32 %b) {
				entry:
				%cmp3 = icmp sgt i32 %b, 0
				br i1 %cmp3, label %for.body, label %for.end, !llvm.loop !5

				for.body: ; preds = %entry, %for.body
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%inc = add nsw i32 %0, 1
				store i32 %inc, i32* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %b
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !5

				for.end: ; preds = %for.body, %entry
				ret void
				}
				!5 = !{!5, !6}
				!6 = !{!"llvm.loop.unroll.full"}

				; #pragma clang loop unroll(disable)
				;
				; Unroll metadata should not change.
				;
				; CHECK-LABEL: @unroll_disable(
				; CHECK: br i1 {{.*}}, label {{.*}}, label {{.*}}, !llvm.loop ![[LOOP_3:.*]]
				define void @unroll_disable(i32* nocapture %a) {
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%inc = add nsw i32 %0, 1
				store i32 %inc, i32* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 64
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !7

				for.end: ; preds = %for.body
				ret void
				}
				!7 = !{!7, !8}
				!8 = !{!"llvm.loop.unroll.disable"}

				; This function contains two loops which share the same llvm.loop metadata node
				; with an llvm.loop.unroll.count 2 hint. Both loops should be unrolled. This
				; verifies that adding disable metadata to a loop after unrolling doesn't affect
				; other loops which previously shared the same llvm.loop metadata.
				;
				; CHECK-LABEL: @shared_metadata(
				; CHECK: store i32
				; CHECK: store i32
				; CHECK: br i1 {{.*}}, label {{.*}}, label {{.*}}, !llvm.loop ![[LOOP_4:.*]]
				; CHECK: store i32
				; CHECK: store i32
				; CHECK: br i1 {{.*}}, label {{.*}}, label {{.*}}, !llvm.loop ![[LOOP_5:.*]]
				define void @shared_metadata(i32* nocapture %List) #0 {
				entry:
				br label %for.body3

				for.body3: ; preds = %for.body3, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body3 ]
				%arrayidx = getelementptr inbounds i32, i32* %List, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%add4 = add nsw i32 %0, 10
				store i32 %add4, i32* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 4
				br i1 %exitcond, label %for.body3.1.preheader, label %for.body3, !llvm.loop !9

				for.body3.1.preheader: ; preds = %for.body3
				br label %for.body3.1

				for.body3.1: ; preds = %for.body3.1.preheader, %for.body3.1
				%indvars.iv.1 = phi i64 [ %1, %for.body3.1 ], [ 0, %for.body3.1.preheader ]
				%1 = add nsw i64 %indvars.iv.1, 1
				%arrayidx.1 = getelementptr inbounds i32, i32* %List, i64 %1
				%2 = load i32, i32* %arrayidx.1, align 4
				%add4.1 = add nsw i32 %2, 10
				store i32 %add4.1, i32* %arrayidx.1, align 4
				%exitcond.1 = icmp eq i64 %1, 4
				br i1 %exitcond.1, label %for.inc5.1, label %for.body3.1, !llvm.loop !9

				for.inc5.1: ; preds = %for.body3.1
				ret void
				}
				!9 = !{!9, !10, !13}
				!10 = !{!"llvm.loop.unroll.count", i32 2}
				!13 = !{!"llvm.loop.unroll.followup_unrolled", !{!"llvm.loop.unroll.disable"}}

				; CHECK: ![[LOOP_1]] = distinct !{![[LOOP_1]], ![[VEC_ENABLE:.*]], ![[WIDTH_8:.*]], ![[UNROLL_DISABLE:.*]]}
				; CHECK: ![[VEC_ENABLE]] = !{!"llvm.loop.vectorize.enable", i1 true}
				; CHECK: ![[WIDTH_8]] = !{!"llvm.loop.vectorize.width", i32 8}
				; CHECK: ![[UNROLL_DISABLE]] = !{!"llvm.loop.unroll.disable"}
				; CHECK: ![[LOOP_2]] = distinct !{![[LOOP_2]], ![[UNROLL_FULL:.*]]}
				; CHECK: ![[UNROLL_FULL]] = !{!"llvm.loop.unroll.full"}
				; CHECK: ![[LOOP_3]] = distinct !{![[LOOP_3]], ![[UNROLL_DISABLE:.*]]}
				; CHECK: ![[LOOP_4]] = distinct !{![[LOOP_4]], ![[UNROLL_DISABLE:.*]]}
				; CHECK: ![[LOOP_5]] = distinct !{![[LOOP_5]], ![[UNROLL_DISABLE:.*]]}

test/Transforms/LoopUnroll/unroll-pragmas_transform.ll

This file was added.

				; RUN: opt < %s -loop-unroll -pragma-unroll-threshold=1024 -S | FileCheck -check-prefixes=CHECK,REM %s
				; RUN: opt < %s -loop-unroll -loop-unroll -pragma-unroll-threshold=1024 -S | FileCheck -check-prefixes=CHECK,REM %s
				dmgreen: This file looks sensible on its own and I think it looks OK to be committed separately (apart from the nit below).
				Meinersbur (author): This is a copy of `unroll-pragmas.ll` with any ambiguous metadata replaced by follow-up attributes. One hope is to generally make 'multiple transformation attributes on the same loop' illegal and have the IR verifier reject it (since the result depends on an implementation detail: the order in the pass manager). In that case this file would replace `unroll-pragmas.ll`. But my expectation is that we cannot break backwards compatibility this way.
				dmgreen: Ah, I missed the "followup" here. Is it worth replicating this entire file, or should it just be an extra test in the old file? The "followup" on unroll_1 seems to be the only test changed here? To add unroll.disable as a followup attribute? I'm not sure I see why. Would we expect "#pragma unroll(1)" to not work as it did before (disable unroll)?
				; RUN: opt < %s -loop-unroll -unroll-allow-remainder=0 -pragma-unroll-threshold=1024 -S | FileCheck -check-prefixes=CHECK,NOREM %s
				;
				; Run loop unrolling twice to verify that loop unrolling metadata is properly
				; removed and further unrolling is disabled after the pass is run once.
				dmgreen: Nit: Is this sentence still true?
				Meinersbur (author): Yes. When no follow-up attributes are specified, the default ones are added (here: `llvm.loop.unroll.disable`, to disable further unrolling). When there are follow-up attribute lists, there is no default, so the transformation-disabling attribute must be given explicitly (MDNode `!18`); it is then added after unrolling and recognized by the second LoopUnroll run.
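A minimal sketch of the two cases described in the reply above, using only attribute names that appear in this revision's tests. The node numbers are hypothetical, and the fragment shows only loop metadata, not a complete test. The behavior noted in the comments is taken from the reply and has not been checked against the implementation.

```llvm
; Case 1: no follow-up list. Per the reply above, LoopUnroll is expected to
; fall back to its default and mark the unrolled loop with
; llvm.loop.unroll.disable, so a second -loop-unroll run leaves it alone.
!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.unroll.count", i32 2}

; Case 2: an explicit follow-up list. No default is added, so the list itself
; has to carry llvm.loop.unroll.disable to stop a second unroll.
!2 = distinct !{!2, !1, !3}
!3 = !{!"llvm.loop.unroll.followup_unrolled", !4}
!4 = !{!"llvm.loop.unroll.disable"}
```

Either node would be attached to a loop's latch branch with `!llvm.loop`, as the tests in this file do.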

				; #pragma clang loop unroll_count(1)
				; Loop should not be unrolled
				;
				; CHECK-LABEL: @unroll_1(
				; CHECK: store i32
				; CHECK-NOT: store i32
				; CHECK: br i1
				define void @unroll_1(i32* nocapture %a, i32 %b) {
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%inc = add nsw i32 %0, 1
				store i32 %inc, i32* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 4
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !10

				for.end: ; preds = %for.body
				ret void
				}
				!10 = !{!10, !11, !18}
				!11 = !{!"llvm.loop.unroll.count", i32 1}
				!18 = !{!"llvm.loop.unroll.followup", !{!"llvm.loop.unroll.disable"}}

test/Transforms/LoopUnrollAndJam/disable_nonforced.ll

This file was added.

				; RUN: opt -loop-unroll-and-jam -allow-unroll-and-jam -unroll-runtime -S < %s | FileCheck %s

				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

				; CHECK-LABEL: disable_nonforced
				; CHECK: load
				; CHECK-NOT: load
				define void @disable_nonforced(i32 %I, i32 %J, i32* noalias nocapture %A, i32* noalias nocapture readonly %B) {
				entry:
				%cmp = icmp ne i32 %J, 0
				%cmp122 = icmp ne i32 %I, 0
				%or.cond = and i1 %cmp, %cmp122
				br i1 %or.cond, label %for.outer.preheader, label %for.end

				for.outer.preheader:
				br label %for.outer

				for.outer:
				%i.us = phi i32 [ %add8.us, %for.latch ], [ 0, %for.outer.preheader ]
				br label %for.inner

				for.inner:
				%j.us = phi i32 [ 0, %for.outer ], [ %inc.us, %for.inner ]
				%sum1.us = phi i32 [ 0, %for.outer ], [ %add.us, %for.inner ]
				%arrayidx.us = getelementptr inbounds i32, i32* %B, i32 %j.us
				%0 = load i32, i32* %arrayidx.us, align 4
				%add.us = add i32 %0, %sum1.us
				%inc.us = add nuw i32 %j.us, 1
				%exitcond = icmp eq i32 %inc.us, %J
				br i1 %exitcond, label %for.latch, label %for.inner

				for.latch:
				%add.us.lcssa = phi i32 [ %add.us, %for.inner ]
				%arrayidx6.us = getelementptr inbounds i32, i32* %A, i32 %i.us
				store i32 %add.us.lcssa, i32* %arrayidx6.us, align 4
				%add8.us = add nuw i32 %i.us, 1
				%exitcond25 = icmp eq i32 %add8.us, %I
				br i1 %exitcond25, label %for.end.loopexit, label %for.outer, !llvm.loop !0

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void
				}

				!0 = distinct !{!0, !{!"llvm.loop.disable_nonforced"}}

test/Transforms/LoopUnrollAndJam/disable_nonforced_count.ll

This file was added.

				; RUN: opt -loop-unroll-and-jam -allow-unroll-and-jam -S < %s | FileCheck %s

				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

				; CHECK-LABEL: @disable_nonforced_enable(
				; CHECK: load
				; CHECK: load
				; CHECK-NOT: load
				; CHECK: br i1
				define void @disable_nonforced_enable(i32 %I, i32 %J, i32* noalias nocapture %A, i32* noalias nocapture readonly %B) {
				entry:
				%cmp = icmp ne i32 %J, 0
				%cmp122 = icmp ne i32 %I, 0
				%or.cond = and i1 %cmp, %cmp122
				br i1 %or.cond, label %for.outer.preheader, label %for.end

				for.outer.preheader:
				br label %for.outer

				for.outer:
				%i.us = phi i32 [ %add8.us, %for.latch ], [ 0, %for.outer.preheader ]
				br label %for.inner

				for.inner:
				%j.us = phi i32 [ 0, %for.outer ], [ %inc.us, %for.inner ]
				%sum1.us = phi i32 [ 0, %for.outer ], [ %add.us, %for.inner ]
				%arrayidx.us = getelementptr inbounds i32, i32* %B, i32 %j.us
				%0 = load i32, i32* %arrayidx.us, align 4
				%add.us = add i32 %0, %sum1.us
				%inc.us = add nuw i32 %j.us, 1
				%exitcond = icmp eq i32 %inc.us, %J
				br i1 %exitcond, label %for.latch, label %for.inner

				for.latch:
				%add.us.lcssa = phi i32 [ %add.us, %for.inner ]
				%arrayidx6.us = getelementptr inbounds i32, i32* %A, i32 %i.us
				store i32 %add.us.lcssa, i32* %arrayidx6.us, align 4
				%add8.us = add nuw i32 %i.us, 1
				%exitcond25 = icmp eq i32 %add8.us, %I
				br i1 %exitcond25, label %for.end.loopexit, label %for.outer, !llvm.loop !0

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void
				}

				!0 = distinct !{!0, !{!"llvm.loop.disable_nonforced"}, !{!"llvm.loop.unroll_and_jam.count", i32 2}}

test/Transforms/LoopUnrollAndJam/disable_nonforced_enable.ll

This file was added.

				; RUN: opt -loop-unroll-and-jam -allow-unroll-and-jam -unroll-and-jam-count=2 -S < %s | FileCheck %s

				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

				; CHECK-LABEL: disable_nonforced_enable
				; CHECK: load
				; CHECK: load
				; CHECK-NOT: load
				; CHECK: br i1
				define void @disable_nonforced_enable(i32 %I, i32 %J, i32* noalias nocapture %A, i32* noalias nocapture readonly %B) {
				entry:
				%cmp = icmp ne i32 %J, 0
				%cmp122 = icmp ne i32 %I, 0
				%or.cond = and i1 %cmp, %cmp122
				br i1 %or.cond, label %for.outer.preheader, label %for.end

				for.outer.preheader:
				br label %for.outer

				for.outer:
				%i.us = phi i32 [ %add8.us, %for.latch ], [ 0, %for.outer.preheader ]
				br label %for.inner

				for.inner:
				%j.us = phi i32 [ 0, %for.outer ], [ %inc.us, %for.inner ]
				%sum1.us = phi i32 [ 0, %for.outer ], [ %add.us, %for.inner ]
				%arrayidx.us = getelementptr inbounds i32, i32* %B, i32 %j.us
				%0 = load i32, i32* %arrayidx.us, align 4
				%add.us = add i32 %0, %sum1.us
				%inc.us = add nuw i32 %j.us, 1
				%exitcond = icmp eq i32 %inc.us, %J
				br i1 %exitcond, label %for.latch, label %for.inner

				for.latch:
				%add.us.lcssa = phi i32 [ %add.us, %for.inner ]
				%arrayidx6.us = getelementptr inbounds i32, i32* %A, i32 %i.us
				store i32 %add.us.lcssa, i32* %arrayidx6.us, align 4
				%add8.us = add nuw i32 %i.us, 1
				%exitcond25 = icmp eq i32 %add8.us, %I
				br i1 %exitcond25, label %for.end.loopexit, label %for.outer, !llvm.loop !0

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void
				}

				!0 = distinct !{!0, !{!"llvm.loop.disable_nonforced"}, !{!"llvm.loop.unroll_and_jam.enable"}}
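
The three LoopUnrollAndJam `disable_nonforced*` tests above suggest a simple rule: with `llvm.loop.disable_nonforced` present, the loop is left alone unless an explicit `unroll_and_jam.enable` or `unroll_and_jam.count` attribute forces the transformation. The following is only a minimal Python sketch of that rule as the CHECK lines imply it. The function name, the tuple encoding of metadata, and the `heuristic_profitable` parameter are assumptions made for illustration, and legality checks are ignored entirely; it is not the pass's actual implementation.

```python
# Hypothetical model: loop metadata is a list of tuples, attribute name first.
# Attributes that explicitly request unroll-and-jam (as used in the tests above).
FORCING = (
    "llvm.loop.unroll_and_jam.enable",
    "llvm.loop.unroll_and_jam.count",
)


def unroll_and_jam_allowed(loop_md, heuristic_profitable):
    """Decide whether unroll-and-jam may run on a loop (legality not modeled)."""
    names = {attr[0] for attr in loop_md}
    # An explicit request wins, even when non-forced transformations are disabled.
    if any(name in names for name in FORCING):
        return True
    # Without an explicit request, disable_nonforced switches the heuristic off.
    if "llvm.loop.disable_nonforced" in names:
        return False
    # Otherwise the pass's own cost heuristic decides (assumed input here).
    return heuristic_profitable


only_disabled = [("llvm.loop.disable_nonforced",)]
with_count = only_disabled + [("llvm.loop.unroll_and_jam.count", 2)]
with_enable = only_disabled + [("llvm.loop.unroll_and_jam.enable",)]

print(unroll_and_jam_allowed(only_disabled, True))
print(unroll_and_jam_allowed(with_count, False))
print(unroll_and_jam_allowed(with_enable, False))
```

The three calls mirror `disable_nonforced.ll`, `disable_nonforced_count.ll`, and `disable_nonforced_enable.ll` respectively: the first expects a single load (no transformation), the other two expect two loads (unroll-and-jam by 2).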

test/Transforms/LoopUnrollAndJam/followup-metadata.ll

This file was added.

				; RUN: opt -basicaa -tbaa -loop-unroll-and-jam -allow-unroll-and-jam -unroll-and-jam-count=4 -unroll-remainder < %s -S | FileCheck %s

				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

				define void @followup(i32 %I, i32 %J, i32* noalias nocapture %A, i32* noalias nocapture readonly %B) {
				entry:
				%cmp = icmp ne i32 %J, 0
				%cmp122 = icmp ne i32 %I, 0
				%or.cond = and i1 %cmp, %cmp122
				br i1 %or.cond, label %for.outer.preheader, label %for.end

				for.outer.preheader:
				br label %for.outer

				for.outer:
				%i.us = phi i32 [ %add8.us, %for.latch ], [ 0, %for.outer.preheader ]
				br label %for.inner

				for.inner:
				%j.us = phi i32 [ 0, %for.outer ], [ %inc.us, %for.inner ]
				%sum1.us = phi i32 [ 0, %for.outer ], [ %add.us, %for.inner ]
				%arrayidx.us = getelementptr inbounds i32, i32* %B, i32 %j.us
				%0 = load i32, i32* %arrayidx.us, align 4
				%add.us = add i32 %0, %sum1.us
				%inc.us = add nuw i32 %j.us, 1
				%exitcond = icmp eq i32 %inc.us, %J
				br i1 %exitcond, label %for.latch, label %for.inner

				for.latch:
				%add.us.lcssa = phi i32 [ %add.us, %for.inner ]
				%arrayidx6.us = getelementptr inbounds i32, i32* %A, i32 %i.us
				store i32 %add.us.lcssa, i32* %arrayidx6.us, align 4
				%add8.us = add nuw i32 %i.us, 1
				%exitcond25 = icmp eq i32 %add8.us, %I
				br i1 %exitcond25, label %for.end.loopexit, label %for.outer, !llvm.loop !0

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void
				}

				!0 = !{!0, !1, !2, !3, !4, !6}
				!1 = !{!"llvm.loop.unroll_and_jam.enable"}
				!2 = !{!"llvm.loop.unroll_and_jam.followup_outer", !{!"llvm.loop.unroll.disable"}}
				!3 = !{!"llvm.loop.unroll_and_jam.followup_inner", !{!"llvm.loop.vectorize.width", i32 4}}
				!4 = !{!"llvm.loop.unroll_and_jam.followup_all", !{!"llvm.loop.unroll.runtime.disable"}}
				!6 = !{!"llvm.loop.unroll_and_jam.followup_remainder_inner", !{!"llvm.loop.vectorize.width", i32 1}}

				; CHECK: br i1 %exitcond.3, label %for.latch, label %for.inner, !llvm.loop ![[LOOP_INNER:[0-9]+]]
				; CHECK: br i1 %niter.ncmp.3, label %for.end.loopexit.unr-lcssa.loopexit, label %for.outer, !llvm.loop ![[LOOP_OUTER:[0-9]+]]
				; CHECK: br i1 %exitcond.epil, label %for.latch.epil, label %for.inner.epil, !llvm.loop ![[LOOP_REMAINDER_INNER:[0-9]+]]
				; CHECK: br i1 %exitcond.epil.1, label %for.latch.epil.1, label %for.inner.epil.1, !llvm.loop ![[LOOP_REMAINDER_INNER]]
				; CHECK: br i1 %exitcond.epil.2, label %for.latch.epil.2, label %for.inner.epil.2, !llvm.loop ![[LOOP_REMAINDER_INNER]]

				; CHECK: ![[LOOP_INNER]] = distinct !{![[LOOP_INNER]], ![[RUNTIME_DISABLE:[0-9]+]], ![[VEC_WIDTH:[0-9]+]]}
				; CHECK: ![[RUNTIME_DISABLE]] = !{!"llvm.loop.unroll.runtime.disable"}
				; CHECK: ![[VEC_WIDTH]] = !{!"llvm.loop.vectorize.width", i32 4}
				; CHECK: ![[LOOP_OUTER]] = distinct !{![[LOOP_OUTER]], ![[RUNTIME_DISABLE]], ![[UNROLL_DISABLE:[0-9]+]]}
				; CHECK: ![[UNROLL_DISABLE]] = !{!"llvm.loop.unroll.disable"}
				; CHECK: ![[LOOP_REMAINDER_INNER]] = distinct !{![[LOOP_REMAINDER_INNER]], ![[RUNTIME_DISABLE]], ![[VEC_DISABLE:[0-9]+]]}
				; CHECK: ![[VEC_DISABLE]] = !{!"llvm.loop.vectorize.width", i32 1}
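
The CHECK lines of `followup-metadata.ll` above show how each loop produced by unroll-and-jam gets its attributes: the contents of `followup_all` come first, followed by the contents of the followup attribute specific to that loop, while the original `unroll_and_jam.enable` attribute is not carried over. Below is a minimal Python sketch of that selection, assuming a tuple encoding of the metadata. The helper name `followup_attributes` and the encoding are hypothetical, and inheritance of any other attributes is deliberately not modeled.

```python
# Hypothetical encoding of the !llvm.loop node from followup-metadata.ll:
# each attribute is a tuple whose first element is its name; a followup
# attribute carries the attributes for the resulting loop as further elements.
PREFIX = "llvm.loop.unroll_and_jam."
LOOP_MD = [
    (PREFIX + "enable",),
    (PREFIX + "followup_outer", ("llvm.loop.unroll.disable",)),
    (PREFIX + "followup_inner", ("llvm.loop.vectorize.width", 4)),
    (PREFIX + "followup_all", ("llvm.loop.unroll.runtime.disable",)),
    (PREFIX + "followup_remainder_inner", ("llvm.loop.vectorize.width", 1)),
]


def followup_attributes(loop_md, followup_names):
    """Collect the attributes listed under the given followup names, in order."""
    result = []
    for name in followup_names:
        for attr in loop_md:
            if attr[0] == name:
                # Everything after the name is an attribute for the new loop.
                result.extend(attr[1:])
    return result


inner = followup_attributes(LOOP_MD, [PREFIX + "followup_all", PREFIX + "followup_inner"])
outer = followup_attributes(LOOP_MD, [PREFIX + "followup_all", PREFIX + "followup_outer"])
remainder_inner = followup_attributes(
    LOOP_MD, [PREFIX + "followup_all", PREFIX + "followup_remainder_inner"]
)
print(inner)
print(outer)
print(remainder_inner)
```

Under these assumptions, `inner`, `outer`, and `remainder_inner` correspond to the attribute lists the test expects for `LOOP_INNER`, `LOOP_OUTER`, and `LOOP_REMAINDER_INNER`.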

test/Transforms/LoopUnrollAndJam/pragma.ll

	(310 lines not shown)
	!3 = distinct !{!3, !4}			!3 = distinct !{!3, !4}
	!4 = distinct !{!"llvm.loop.unroll_and_jam.count", i32 8}			!4 = distinct !{!"llvm.loop.unroll_and_jam.count", i32 8}
	!5 = distinct !{!5, !6}			!5 = distinct !{!5, !6}
	!6 = distinct !{!"llvm.loop.unroll_and_jam.enable"}			!6 = distinct !{!"llvm.loop.unroll_and_jam.enable"}
	!7 = distinct !{!7, !8}			!7 = distinct !{!7, !8}
	!8 = distinct !{!"llvm.loop.unroll.disable"}			!8 = distinct !{!"llvm.loop.unroll.disable"}
	!9 = distinct !{!9, !10}			!9 = distinct !{!9, !10}
	!10 = distinct !{!"llvm.loop.unroll.enable"}			!10 = distinct !{!"llvm.loop.unroll.enable"}
	!11 = distinct !{!11, !8, !6}			!11 = distinct !{!11, !8, !6}
	No newline at end of file

test/Transforms/LoopVectorize/X86/already-vectorized_transform.ll

This file was added.

				; RUN: opt < %s -disable-loop-unrolling -debug-only=loop-vectorize -O3 -S 2>&1 | FileCheck %s
				; REQUIRES: asserts
				; We want to make sure that we don't even try to vectorize loops again.
				; The vectorizer used to mark only the un-vectorized loop as already vectorized,
				; and thus tried to vectorize the vectorized loop again.

				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@a = external global [255 x i32]

				; Function Attrs: nounwind readonly uwtable
				define i32 @vect() {
				; CHECK: LV: Checking a loop in "vect"
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				; We need to make sure we did vectorize the loop
				; CHECK: LV: Found a loop: for.body
				; CHECK: LV: We can vectorize this loop!
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%red.05 = phi i32 [ 0, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds [255 x i32], [255 x i32]* @a, i64 0, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%add = add nsw i32 %0, %red.05
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 255
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

				; If it did, we have two loops:
				; CHECK: vector.body:
				; CHECK: br {{.*}} label %vector.body, !llvm.loop [[vect:![0-9]+]]
				; CHECK: for.body:
				; CHECK: br {{.*}} label %for.body, !llvm.loop [[scalar:![0-9]+]]

				for.end: ; preds = %for.body
				ret i32 %add
				}

				!0 = !{!0, !3, !4}
				!3 = !{!"llvm.loop.vectorize.followup_vectorized", !{!"llvm.loop.isvectorized", i32 1}}
				!4 = !{!"llvm.loop.vectorize.followup_epilogue", !{!"llvm.loop.unroll.runtime.disable"}, !{!"llvm.loop.isvectorized", i32 1}}

				; Now, we check for the Hint metadata
				; CHECK: [[vect]] = distinct !{[[vect]], [[width:![0-9]+]]}
				; CHECK: [[width]] = !{!"llvm.loop.isvectorized", i32 1}
				; CHECK: [[scalar]] = distinct !{[[scalar]], [[runtime_unroll:![0-9]+]], [[width]]}
				; CHECK: [[runtime_unroll]] = !{!"llvm.loop.unroll.runtime.disable"}

test/Transforms/LoopVectorize/X86/vectorization-remarks-missed.ll

	; RUN: opt < %s -loop-vectorize -S -pass-remarks-missed='loop-vectorize' -pass-remarks-analysis='loop-vectorize' 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -transform-warning -S -pass-remarks-missed='loop-vectorize' -pass-remarks-analysis='loop-vectorize' 2>&1 | FileCheck %s
	; RUN: opt < %s -loop-vectorize -o /dev/null -pass-remarks-output=%t.yaml			; RUN: opt < %s -loop-vectorize -transform-warning -o /dev/null -pass-remarks-output=%t.yaml
	; RUN: cat %t.yaml | FileCheck -check-prefix=YAML %s			; RUN: cat %t.yaml | FileCheck -check-prefix=YAML %s

	; RUN: opt < %s -passes=loop-vectorize -S -pass-remarks-missed='loop-vectorize' -pass-remarks-analysis='loop-vectorize' 2>&1 | FileCheck %s			; RUN: opt < %s -passes=loop-vectorize,transform-warning -S -pass-remarks-missed='loop-vectorize' -pass-remarks-analysis='loop-vectorize' 2>&1 | FileCheck %s
	; RUN: opt < %s -passes=loop-vectorize -o /dev/null -pass-remarks-output=%t.yaml			; RUN: opt < %s -passes=loop-vectorize,transform-warning -o /dev/null -pass-remarks-output=%t.yaml
	; RUN: cat %t.yaml | FileCheck -check-prefix=YAML %s			; RUN: cat %t.yaml | FileCheck -check-prefix=YAML %s

	; C/C++ code for tests			; C/C++ code for tests
	; void test(int *A, int Length) {			; void test(int *A, int Length) {
	; #pragma clang loop vectorize(enable) interleave(enable)			; #pragma clang loop vectorize(enable) interleave(enable)
	; for (int i = 0; i < Length; i++) {			; for (int i = 0; i < Length; i++) {
	; A[i] = i;			; A[i] = i;
	; if (A[i] > Length)			; if (A[i] > Length)
	(74 lines not shown)
	; YAML-NEXT: Function: _Z17test_array_boundsPiS_i			; YAML-NEXT: Function: _Z17test_array_boundsPiS_i
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: loop not vectorized			; YAML-NEXT: - String: loop not vectorized
	; YAML-NEXT: - String: ' (Force='			; YAML-NEXT: - String: ' (Force='
	; YAML-NEXT: - Force: 'true'			; YAML-NEXT: - Force: 'true'
	; YAML-NEXT: - String: ')'			; YAML-NEXT: - String: ')'
	; YAML-NEXT: ...			; YAML-NEXT: ...
	; YAML-NEXT: --- !Failure			; YAML-NEXT: --- !Failure
	; YAML-NEXT: Pass: loop-vectorize			; YAML-NEXT: Pass: transform-warning
	; YAML-NEXT: Name: FailedRequestedVectorization			; YAML-NEXT: Name: FailedRequestedVectorization
	; YAML-NEXT: DebugLoc: { File: source.cpp, Line: 19, Column: 5 }			; YAML-NEXT: DebugLoc: { File: source.cpp, Line: 19, Column: 5 }
	; YAML-NEXT: Function: _Z17test_array_boundsPiS_i			; YAML-NEXT: Function: _Z17test_array_boundsPiS_i
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'loop not vectorized: '			; YAML-NEXT: - String: 'loop not vectorized: '
	; YAML-NEXT: - String: failed explicitly specified loop vectorization			; YAML-NEXT: - String: failed explicitly specified loop vectorization
	; YAML-NEXT: ...			; YAML-NEXT: ...
	; YAML-NEXT: --- !Analysis			; YAML-NEXT: --- !Analysis
	(209 lines not shown)

test/Transforms/LoopVectorize/X86/x86_fp80-vector-store_transform.ll

This file was added.

				; RUN: opt -O3 -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -S < %s | FileCheck %s

				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.7.0"

				@x = common global [1024 x x86_fp80] zeroinitializer, align 16

				;CHECK-LABEL: @example(
				;CHECK-NOT: bitcast x86_fp80* {{%[^ ]+}} to <{{[2-9][0-9]}} x x86_fp80>
				;CHECK: store
				;CHECK: ret void

				define void @example() nounwind ssp uwtable {
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%conv = sitofp i32 1 to x86_fp80
				%arrayidx = getelementptr inbounds [1024 x x86_fp80], [1024 x x86_fp80]* @x, i64 0, i64 %indvars.iv
				store x86_fp80 %conv, x86_fp80* %arrayidx, align 16
				%indvars.iv.next = add i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, 1024
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body
				ret void
				}

				!0 = !{!0, !1}
				!1 = !{!"llvm.loop.vectorize.followup", !{!"llvm.loop.isvectorized", i1 true}}

test/Transforms/LoopVectorize/disable-heuristic.ll

This file was added.

				; RUN: opt -loop-vectorize -force-vector-interleave=1 -dce -instcombine -S < %s | FileCheck %s

				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

				; CHECK-LABEL: @disable_heuristic(
				; CHECK-NOT: x i32>
				define void @disable_heuristic(i32* nocapture %a, i32 %n) {
				entry:
				%cmp4 = icmp sgt i32 %n, 0
				br i1 %cmp4, label %for.body, label %for.end

				for.body:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = trunc i64 %indvars.iv to i32
				store i32 %0, i32* %arrayidx, align 4
				%indvars.iv.next = add i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %n
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret void
				}

				!0 = !{!0, !{!"llvm.loop.transformations.disable_nonforced"}}

test/Transforms/LoopVectorize/duplicated-metadata_transform.ll

This file was added.

				; RUN: opt < %s -loop-vectorize -S 2>&1 | FileCheck %s
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				; This test makes sure we don't duplicate the loop vectorizer's metadata
				; while marking them as already vectorized (by setting width = 1), even
				; at lower optimization levels, where no extra cleanup is done

				define void @_Z3fooPf(float* %a) {
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %indvars.iv
				%p = load float, float* %arrayidx, align 4
				%mul = fmul float %p, 2.000000e+00
				store float %mul, float* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body
				ret void
				}

				!0 = !{!0, !1, !2}
				!1 = !{!"llvm.loop.vectorize.width", i32 4}
				!2 = !{!"llvm.loop.vectorize.followup", !{!"llvm.loop.isvectorized", i32 1}}
				; CHECK-NOT: !{metadata !"llvm.loop.vectorize.width", i32 4}
				; CHECK: !{!"llvm.loop.isvectorized", i32 1}

test/Transforms/LoopVectorize/followups.ll

This file was added.

				; RUN: opt -loop-vectorize -S < %s | FileCheck %s

				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

				; CHECK-LABEL: @followups(
				define void @followups(i32* nocapture %a, i32 %n) {
				entry:
				%cmp4 = icmp sgt i32 %n, 0
				br i1 %cmp4, label %for.body, label %for.end

				for.body:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = trunc i64 %indvars.iv to i32
				store i32 %0, i32* %arrayidx, align 4
				%indvars.iv.next = add i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %n
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret void
				}

				!0 = !{!0, !1, !2, !3, !4, !5}
				!1 = !{!"llvm.loop.vectorize.enable", i1 true}
				!2 = !{!"llvm.loop.vectorize.width", i32 4}
				!3 = !{!"llvm.loop.vectorize.followup_vectorized", !{!"llvm.loop.isvectorized", i1 true}}
				!4 = !{!"llvm.loop.vectorize.followup_epilogue", !{!"llvm.loop.unroll.disable"}}
				!5 = !{!"llvm.loop.vectorize.followup_all", !{!"llvm.loop.unroll.runtime.disable"}}

				; CHECK-LABEL: vector.body:
				; CHECK: br i1 %13, label %middle.block, label %vector.body, !llvm.loop ![[LOOP_VECTOR:[0-9]+]]
				; CHECK-LABEL: for.body:
				; CHECK: br i1 %exitcond, label %for.end.loopexit, label %for.body, !llvm.loop ![[LOOP_REMAINDER:[0-9]+]]

				; CHECK: ![[LOOP_VECTOR]] = distinct !{![[LOOP_VECTOR]], ![[RUNTIMEUNROLL_DISABLE:[0-9]+]], ![[ISVECTORIZED:[0-9]+]]}
				; CHECK: ![[RUNTIMEUNROLL_DISABLE]] = !{!"llvm.loop.unroll.runtime.disable"}
				; CHECK: ![[ISVECTORIZED]] = !{!"llvm.loop.isvectorized", i1 true}
				; CHECK: ![[LOOP_REMAINDER]] = distinct !{![[LOOP_REMAINDER]], ![[RUNTIMEUNROLL_DISABLE]], ![[UNROLLDISABLE:[0-9]+]]}
				; CHECK: ![[UNROLLDISABLE]] = !{!"llvm.loop.unroll.disable"}

test/Transforms/LoopVectorize/hints-trans_transform.ll

This file was added.

				; RUN: opt -S -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -instsimplify -simplifycfg < %s | FileCheck %s
				; Note: -instsimplify -simplifycfg remove the (now dead) original loop, making
				; it easy to test that the llvm.loop.unroll.disable hint is still present.
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				; Function Attrs: norecurse nounwind uwtable
				define void @foo(i32* nocapture %b) #0 {
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret void

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
				store i32 1, i32* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 16
				br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !0
				}

				; CHECK-LABEL: @foo
				; CHECK: = !{!"llvm.loop.unroll.disable"}

				attributes #0 = { norecurse nounwind uwtable }

				!0 = distinct !{!0, !1, !2}
				!1 = !{!"llvm.loop.unroll.disable"}
				!2 = !{!"llvm.loop.vectorize.followup", !1}

test/Transforms/LoopVectorize/multiple-strides-vectorization_transform.ll

This file was added.

				; RUN: opt -loop-vectorize -force-vector-width=4 -S < %s | FileCheck %s

				; This is the test case from PR26314.
				; When we were retrying dependence checking with memchecks only,
				; the loop-invariant access in the inner loop was incorrectly determined to be wrapping
				; because it was not strided in the inner loop.
				; Improved wrapping detection allows vectorization in the following case.

				; #define Z 32
				; typedef struct s {
				; int v1[Z];
				; int v2[Z];
				; int v3[Z][Z];
				; } s;
				;
				; void slow_function (s* const obj, int z) {
				; for (int j=0; j<Z; j++) {
				; for (int k=0; k<z; k++) {
				; int x = obj->v1[k] + obj->v2[j];
				; obj->v3[j][k] += x;
				; }
				; }
				; }

				; CHECK-LABEL: Test
				; CHECK: <4 x i64>
				; CHECK: <4 x i32>, <4 x i32>
				; CHECK: !{!"llvm.loop.isvectorized", i32 1}

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				%struct.s = type { [32 x i32], [32 x i32], [32 x [32 x i32]] }

				define void @Test(%struct.s* nocapture %obj, i64 %z) #0 {
				br label %.outer.preheader


				.outer.preheader:
				%i = phi i64 [ 0, %0 ], [ %i.next, %.outer ]
				%1 = getelementptr inbounds %struct.s, %struct.s* %obj, i64 0, i32 1, i64 %i
				br label %.inner

				.exit:
				ret void

				.outer:
				%i.next = add nuw nsw i64 %i, 1
				%exitcond.outer = icmp eq i64 %i.next, 32
				br i1 %exitcond.outer, label %.exit, label %.outer.preheader

				.inner:
				%j = phi i64 [ 0, %.outer.preheader ], [ %j.next, %.inner ]
				%2 = getelementptr inbounds %struct.s, %struct.s* %obj, i64 0, i32 0, i64 %j
				%3 = load i32, i32* %2
				%4 = load i32, i32* %1
				%5 = add nsw i32 %4, %3
				%6 = getelementptr inbounds %struct.s, %struct.s* %obj, i64 0, i32 2, i64 %i, i64 %j
				%7 = load i32, i32* %6
				%8 = add nsw i32 %5, %7
				store i32 %8, i32* %6
				%j.next = add nuw nsw i64 %j, 1
				%exitcond.inner = icmp eq i64 %j.next, %z
				br i1 %exitcond.inner, label %.outer, label %.inner, !llvm.loop !0
				}

				!0 = !{!0, !1}
				!1 = !{!"llvm.loop.vectorize.followup", !{!"llvm.loop.isvectorized", i32 1}}

test/Transforms/LoopVectorize/no_array_bounds.ll

	; RUN: opt < %s -loop-vectorize -S 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -transform-warning -S 2>&1 | FileCheck %s

	; Verify that a warning is generated when vectorization/interleaving is explicitly specified and fails to occur.			; Verify that a warning is generated when vectorization/interleaving is explicitly specified and fails to occur.
	; CHECK: warning: no_array_bounds.cpp:5:5: loop not vectorized: failed explicitly specified loop vectorization			; CHECK: warning: no_array_bounds.cpp:5:5: loop not vectorized: failed explicitly specified loop vectorization
	; CHECK: warning: no_array_bounds.cpp:10:5: loop not interleaved: failed explicitly specified loop interleaving			; CHECK: warning: no_array_bounds.cpp:10:5: loop not interleaved: failed explicitly specified loop interleaving

	; #pragma clang loop vectorize(enable)			; #pragma clang loop vectorize(enable)
	; for (int i = 0; i < number; i++) {			; for (int i = 0; i < number; i++) {
	; A[B[i]]++;			; A[B[i]]++;
	(91 lines not shown)

test/Transforms/LoopVectorize/no_switch.ll

	; RUN: opt < %s -loop-vectorize -force-vector-width=4 -S 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-width=4 -transform-warning -S 2>&1 | FileCheck %s
	; RUN: opt < %s -loop-vectorize -force-vector-width=1 -S 2>&1 | FileCheck %s -check-prefix=NOANALYSIS			; RUN: opt < %s -loop-vectorize -force-vector-width=4 -pass-remarks-missed='loop-vectorize' -transform-warning -S 2>&1 | FileCheck %s -check-prefix=MOREINFO
	; RUN: opt < %s -loop-vectorize -force-vector-width=4 -pass-remarks-missed='loop-vectorize' -S 2>&1 | FileCheck %s -check-prefix=MOREINFO

	; CHECK: remark: source.cpp:4:5: loop not vectorized: loop contains a switch statement			; CHECK: remark: source.cpp:4:5: loop not vectorized: loop contains a switch statement
	; CHECK: warning: source.cpp:4:5: loop not vectorized: failed explicitly specified loop vectorization			; CHECK: warning: source.cpp:4:5: loop not vectorized: failed explicitly specified loop vectorization

	; NOANALYSIS-NOT: remark: {{.*}}
	; NOANALYSIS: warning: source.cpp:4:5: loop not interleaved: failed explicitly specified loop interleaving

	; MOREINFO: remark: source.cpp:4:5: loop not vectorized: loop contains a switch statement			; MOREINFO: remark: source.cpp:4:5: loop not vectorized: loop contains a switch statement
	; MOREINFO: remark: source.cpp:4:5: loop not vectorized (Force=true, Vector Width=4)			; MOREINFO: remark: source.cpp:4:5: loop not vectorized (Force=true, Vector Width=4)
	; MOREINFO: warning: source.cpp:4:5: loop not vectorized: failed explicitly specified loop vectorization			; MOREINFO: warning: source.cpp:4:5: loop not vectorized: failed explicitly specified loop vectorization

	; CHECK: _Z11test_switchPii			; CHECK: _Z11test_switchPii
	; CHECK-NOT: x i32>			; CHECK-NOT: x i32>
	; CHECK: ret			; CHECK: ret

	(75 lines not shown)

test/Transforms/LoopVectorize/vectorize-once_transform.ll

This file was added.

				; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S -simplifycfg | FileCheck %s

				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

				;
				; We want to make sure that we are vectorizing the scalar loop only once
				; even if the pass manager runs the vectorizer multiple times due to inlining.


				; This test checks that we add metadata to vectorized loops
				; CHECK-LABEL: @_Z4foo1Pii(
				; CHECK: <4 x i32>
				; CHECK: llvm.loop
				; CHECK: ret

				; This test comes from the loop:
				;
				;int foo (int *A, int n) {
				; return std::accumulate(A, A + n, 0);
				;}
				define i32 @_Z4foo1Pii(i32* %A, i32 %n) #0 {
				entry:
				%idx.ext = sext i32 %n to i64
				%add.ptr = getelementptr inbounds i32, i32* %A, i64 %idx.ext
				%cmp3.i = icmp eq i32 %n, 0
				br i1 %cmp3.i, label %_ZSt10accumulateIPiiET0_T_S2_S1_.exit, label %for.body.i

				for.body.i: ; preds = %entry, %for.body.i
				%__init.addr.05.i = phi i32 [ %add.i, %for.body.i ], [ 0, %entry ]
				%__first.addr.04.i = phi i32* [ %incdec.ptr.i, %for.body.i ], [ %A, %entry ]
				%0 = load i32, i32* %__first.addr.04.i, align 4
				%add.i = add nsw i32 %0, %__init.addr.05.i
				%incdec.ptr.i = getelementptr inbounds i32, i32* %__first.addr.04.i, i64 1
				%cmp.i = icmp eq i32* %incdec.ptr.i, %add.ptr
				br i1 %cmp.i, label %_ZSt10accumulateIPiiET0_T_S2_S1_.exit, label %for.body.i, !llvm.loop !2

				_ZSt10accumulateIPiiET0_T_S2_S1_.exit: ; preds = %for.body.i, %entry
				%__init.addr.0.lcssa.i = phi i32 [ 0, %entry ], [ %add.i, %for.body.i ]
				ret i32 %__init.addr.0.lcssa.i
				}

				; This test checks that we don't vectorize loops that are marked with the "width" == 1 metadata.
				; CHECK-LABEL: @_Z4foo2Pii(
				; CHECK-NOT: <4 x i32>
				; CHECK: llvm.loop
				; CHECK: ret
				define i32 @_Z4foo2Pii(i32* %A, i32 %n) #0 {
				entry:
				%idx.ext = sext i32 %n to i64
				%add.ptr = getelementptr inbounds i32, i32* %A, i64 %idx.ext
				%cmp3.i = icmp eq i32 %n, 0
				br i1 %cmp3.i, label %_ZSt10accumulateIPiiET0_T_S2_S1_.exit, label %for.body.i

				for.body.i: ; preds = %entry, %for.body.i
				%__init.addr.05.i = phi i32 [ %add.i, %for.body.i ], [ 0, %entry ]
				%__first.addr.04.i = phi i32* [ %incdec.ptr.i, %for.body.i ], [ %A, %entry ]
				%0 = load i32, i32* %__first.addr.04.i, align 4
				%add.i = add nsw i32 %0, %__init.addr.05.i
				%incdec.ptr.i = getelementptr inbounds i32, i32* %__first.addr.04.i, i64 1
				%cmp.i = icmp eq i32* %incdec.ptr.i, %add.ptr
				br i1 %cmp.i, label %_ZSt10accumulateIPiiET0_T_S2_S1_.exit, label %for.body.i, !llvm.loop !0

				_ZSt10accumulateIPiiET0_T_S2_S1_.exit: ; preds = %for.body.i, %entry
				%__init.addr.0.lcssa.i = phi i32 [ 0, %entry ], [ %add.i, %for.body.i ]
				ret i32 %__init.addr.0.lcssa.i
				}

				attributes #0 = { nounwind readonly ssp uwtable "fp-contract-model"="standard" "no-frame-pointer-elim" "no-frame-pointer-elim-non-leaf" "realign-stack" "relocation-model"="pic" "ssp-buffers-size"="8" }

				; CHECK: !0 = distinct !{!0, !1}
				; CHECK: !1 = !{!"llvm.loop.isvectorized", i32 1}
				; CHECK: !2 = distinct !{!2, !3, !1}
				; CHECK: !3 = !{!"llvm.loop.unroll.runtime.disable"}

				!0 = !{!0, !1}
				!1 = !{!"llvm.loop.vectorize.width", i32 1}
				!2 = !{!2, !3, !4}
				!3 = !{!"llvm.loop.vectorize.followup_vectorized", !{!"llvm.loop.isvectorized", i32 1}}
				!4 = !{!"llvm.loop.vectorize.followup_epilogue", !{!"llvm.loop.unroll.runtime.disable"}, !{!"llvm.loop.isvectorized", i32 1}}

[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. (Closed, Public)

Diff 176471
