Download Raw Diff

Details

Reviewers

hfinkel
dmgreen
hsaito
jeffhammond
jingyue
eliben
anemet
Ashutosh
ashutosh.nema
mcrosier
fhahn
hiraditya
meheff
bollu

Commits

rG7244852557ca: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
rL348944: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
rC348944: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.

Summary

When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.

#pragma clang loop unroll_and_jam(enable)
#pragma clang loop distribute(enable)

is the same as

#pragma clang loop distribute(enable)
#pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

!0 = !{!0, !1, !2}
!1 = !{!"llvm.loop.unroll_and_jam.enable"}
!2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
!3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.

To ensure that no other transformation is executed before the intended one, the attribute llvm.loop.transformations.disable_nonforced can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

This approach deviates from proposal in the RFC at http://lists.llvm.org/pipermail/llvm-dev/2018-May/123690.html. There are three reasons:

In the RFC approach, when a pass wants to determine whether it should perform a transformation, it must search the list of transformations for the first transformation on its loop. When inlining, the two transformation list must be combined.
For compatibility, the current approach using loop IDs must still be supported, either each pass looks up both metadata formats, or using the AutoUpgrade mechanism. This patch's approach keeps the current mechanism.
The "loop IDs" can change. When a attribute is added/removed from the Loop ID metata (e.g. adding "llvm.loop.isvectorized" after vectorizing), the loop is assigned a new, distrinct MDNode. Every reference to these nodes needs to be updated as well, and metadata nodes do not support RAUW. Some of the current passes (e.g. LoopDistribution) keep the same loop ID for multiple output loops such that the loop ID is not uniquely identifying loops anymore. This patch's approach does not reference loop IDs in the attribute values.

Diff Detail

Repository: rC Clang

Event Timeline

Meinersbur created this revision.Jul 12 2018, 10:15 PM

Herald added a reviewer: bollu. · View Herald TranscriptJul 12 2018, 10:15 PM

Herald added subscribers: dexonsmith, steven_wu, zzheng and 3 others. · View Herald Transcript

I like the idea of this, giving control to the user to figure out the best way to manipulate loops (even if they end up shooting themselves in the foot with it :) )

What does this mean for the clang side of this patch? Has the user-visible loop-naming scheme changed, or will it be mapped to these matadatas?

What follows is mostly a number of small Nits:

docs/TransformMetadata.rst
23 ↗	(On Diff #155318)	Nit: from the emitted IR
53 ↗	(On Diff #155318)	Nit: transformation
82 ↗	(On Diff #155318)	Nit: transformed
87 ↗	(On Diff #155318)	Nit: occur
252 ↗	(On Diff #155318)	Nit: have
343 ↗	(On Diff #155318)	Should we fix this? Will it work as expected with nonforced, if it was enabled?
include/llvm/LinkAllPasses.h
221 ↗	(On Diff #155318)	Nit: Space I guess? I think this file could do with a clang-formatting
include/llvm/Transforms/Scalar/WarnMissedTransforms.h
35 ↗	(On Diff #155318)	Are these two needed here, if they are declared in other places?
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
336 ↗	(On Diff #155318)	This is called SubLoop here

Thanks for the review.

In D49281#1162931, @dmgreen wrote:

What does this mean for the clang side of this patch? Has the user-visible loop-naming scheme changed, or will it be mapped to these matadatas?

I also have a patch for clang in mind to make the order defined by the frontend, not the pass pipeline. However, for compatibility it will have to put them in the order of the current pass order e.g.

#pragma unroll_and_jam
#pragma clang loop vectorize(enable)

and

#pragma clang loop vectorize(enable)
#pragma unroll_and_jam

both have to emit

!0 = !{!0, 
    !{!"llvm.loop.vectorize.enable"},
    !{!"llvm.loop.vectorize.followup_all", 
        !{!"llvm.loop.unrollandjam.enable"}}}}

A different syntax is needed that the user can use if they want to define the transformation order.

Also for compatibility we cannot error-out if transformation undefined (at least for any combination of currently supported transformations), but could use the Auto-Upgrader.

ping

Meinersbur marked 7 inline comments as done.Jul 26 2018, 7:30 PM

Meinersbur added inline comments.

docs/TransformMetadata.rst
343 ↗	(On Diff #155318)	Yes, but in a separate patch that adds interchange-specific metadata. LoopInterchange currently is not enabled by default and does not modify metadata so its interaction with other transformation is less significant.
include/llvm/LinkAllPasses.h
221 ↗	(On Diff #155318)	I used `git clang-format origin/master` which only applies to changed lines like this one.
include/llvm/Transforms/Scalar/WarnMissedTransforms.h
35 ↗	(On Diff #155318)	Depends on which headers are included when using this pass. `WarnMissingTransforms.cpp` itself does not include `include/llvm/InitializePasses.h` or `include/llvm/Transforms/Scalar.h`. Having the forward declarations here matches the translation unit header idiom an the compiler checks that the definition matches the declaration here.

Address @dmgreen's remarks

Harbormaster completed remote builds in B20769: Diff 157638.Jul 26 2018, 7:32 PM

Some extra tests for nonforced + a pragma would be good to see.

I'm not much of an expert on the vectoriser changes here.

docs/LangRef.rst
5245 ↗	(On Diff #157638)	These can now move down with the other unroll_and_jam metadata
5263 ↗	(On Diff #157638)	Nit: This loop
docs/TransformMetadata.rst
343 ↗	(On Diff #155318)	Yep. Certainly a separate patch.
71 ↗	(On Diff #157638)	Nit: Maybe change the second "for instance"
163 ↗	(On Diff #157638)	Do you think it's worth mentioning unroll.count and unroll.disable etc, before jumping into the followup metadata?
206 ↗	(On Diff #157638)	Again, could mention unroll_and_jam.count and enable/disable.
264 ↗	(On Diff #157638)	Again, maybe describe other metadata first?
395 ↗	(On Diff #157638)	Nit: pass pass
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
194 ↗	(On Diff #157638)	Same as unroll. What if PragmaCount and hasDisableAllTransformsHint?
lib/Transforms/Scalar/LoopUnrollPass.cpp
753 ↗	(On Diff #157638)	What if it has PragmaCount and hasDisableAllTransformsHint? Should that not enable too?
test/Transforms/LoopUnroll/unroll-pragmas_transform.ll
1 ↗	(On Diff #157638)	This file look sensible on it's own and I think looks OK to be committed separately. (Apart from the nit below)
5 ↗	(On Diff #157638)	Nit: Is this sentence still true?

In D49281#1187583, @dmgreen wrote:

I'm not much of an expert on the vectoriser changes here.

There is a call for a vectorizer person, and here I am, but before going there, ever since this patch was uploaded, I've been thinking whether the original intent of this patch can be really accomplished ---- especially so with the set of "hints" being used here. So, I'd like to go back and start there if that's okay. I think this patch is in some sense based on the optimism of "programmer won't abuse this and transformation will happen". Reality is more like programmer will try using those pragmas to arm-twist the compiler to get the set of transformations he/she wants w/o thinking deep enough about what happens at each step of the way. As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens. From what I've read --- granted that I haven't gone through this very deep, the fall back aspect isn't handled well. If we don't start from "programmer specified transformation may fail to kick-in", providing this feature to the programmers would quickly backfire and we'll get tons of this doesn't work that doesn't work problem reports ---- which is a big mess/disaster. If we are doing this for research purposes, that may be fine. I'm looking at this from a production compiler development perspective.

There are two different approaches we can think of.

Start from defining the transformation directives that actually transforms in most cases. For example, Intel compiler's implementation of OpenMP SIMD is in this position, and we are trying to bring the same position to LLVM LV. Then, failing to transform is a compiler bug (or a programmer has a bug in the source code to fix). OpenMP SIMD is defined in such a way for compilers to be able to take this positioning. In this approach, we can stop processing transformation directives after the first failed transformation. Even in this case, for example, programmer probably won't know under certain circumstances, vectorizer won't produce remainder loop, or under certain cases, vectorizer completely unrolls the loop so that there aren't any loops left after vectorization. So, controlling a single transformation in a programmer predictable manner --- enough to describe what should happen next --- is a big task.
Based on transformation hint directives. Since what the programmer is using is a hint, for each transformation, programmer needs to tell the next step 1) if the transformation kicks in and also 2) if the transformation does not kick-in. This will be very messy and I'm not sure how practical it would be for the programmer to specify the behaviors for all situations.

If these kinds of stuff have been already discussed, please give me the pointers and I'll try to digest. Else, we can talk about this in this review or in llvm-dev. Hope my argument makes sense to you/others, but please feel free to ask for clarification. I've been working on making SIMD directive programmer predictable for many years. So, I may be skipping some explanations that became natural enough to me over the years.

In short, doing this even for one transformation (in my case SIMD) is difficult enough. If we are trying to expand to multiple transformations, we should try doing so in baby steps.

The idea behind this is so powerful such that even if we start from "best effort basis", programmers will quickly jump on and say make this more robust/predictable. We'd rather spend time to design this as a robust/predictable feature from the beginning than having to work on it under the customer pressure.

Thanks,
Hideki

... As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens.

Hideki, I think that LLVM does the right thing here: We provide a separation between the hint and the mandate (i.e., using assume_safety or not). By default, the compiler should still provide safety conditions. The alternative to providing the functionality proposed here is, in reality, source-to-source code generators (which often do semantically-incorrect things and are hard to use in production for a large number of reasons). Also, the client for this functionality is not just programmers directly, but other tools (e.g., autotuners and other higher-level languages), which is also why generating safety predicates, or at least having that option, is important. I think that LLVM also currently does the right thing regarding transformations that don't apply: we issue a warning (which bubbles up from the optimizer via the optimization-remark interface). We should certainly continue to do that (and I know that Michael has experimented with ways of making this continue to happen reliably).

The idea behind this is so powerful such that even if we start from "best effort basis", programmers will quickly jump on and say make this more robust/predictable. We'd rather spend time to design this as a robust/predictable feature from the beginning than having to work on it under the customer pressure.

I don't think that we can ever really having something in this space which isn't best effort, but, I think that providing a warning is both necessary (because silent failure is poor user experience) and sufficient (it's not clear to me what kind of fallback we could provide that would be more robust in practice). If people start providing us with bug reports about loops that we couldn't transform, but should have transformed, that will be great data on what to improve. That having been said, if you can suggest pragmas that have semantics that allow us to control loop transformations in a way that's more robust than the current ones, then please, of course, suggest them.

I feel obligated to note, however, that the motivation for this work comes entirely from our experience supporting HPC users. I'm convinced that it will provide a significantly better user experience over the current state of the art. I'm sure that you've seen code that comes out of higher-level code generators and the like. These tend to be hard to maintain, inflexible, and buggy, and the code produced is difficult for both humans and compilers to understand. I'm sure you've also seen cases where people implement these transformations by hand (do I need to go on?). The compiler can perform these transformations and having the user able to direct the compiler to do so if a much better option. To some extent, a compiler can have cost models and heuristics to apply these automatically, but only occasionally do we actually have enough static information to do so (even with some PGO capabilities).

In D49281#1187583, @dmgreen wrote:

Some extra tests for nonforced + a pragma would be good to see.

Any transformations in particular?

In D49281#1188026, @hsaito wrote:

In D49281#1187583, @dmgreen wrote:

I'm not much of an expert on the vectoriser changes here.

There is a call for a vectorizer person, and here I am,

Thank you!!!

but before going there, ever since this patch was uploaded, I've been thinking whether the original intent of this patch can be really accomplished ---- especially so with the set of "hints" being used here. So, I'd like to go back and start there if that's okay. I think this patch is in some sense based on the optimism of "programmer won't abuse this and transformation will happen". Reality is more like programmer will try using those pragmas to arm-twist the compiler to get the set of transformations he/she wants w/o thinking deep enough about what happens at each step of the way. As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens. From what I've read --- granted that I haven't gone through this very deep, the fall back aspect isn't handled well. If we don't start from "programmer specified transformation may fail to kick-in", providing this feature to the programmers would quickly backfire and we'll get tons of this doesn't work that doesn't work problem reports ---- which is a big mess/disaster. If we are doing this for research purposes, that may be fine. I'm looking at this from a production compiler development perspective.

There are two different approaches we can think of.

Start from defining the transformation directives that actually transforms in most cases. For example, Intel compiler's implementation of OpenMP SIMD is in this position, and we are trying to bring the same position to LLVM LV. Then, failing to transform is a compiler bug (or a programmer has a bug in the source code to fix). OpenMP SIMD is defined in such a way for compilers to be able to take this positioning. In this approach, we can stop processing transformation directives after the first failed transformation. Even in this case, for example, programmer probably won't know under certain circumstances, vectorizer won't produce remainder loop, or under certain cases, vectorizer completely unrolls the loop so that there aren't any loops left after vectorization. So, controlling a single transformation in a programmer predictable manner --- enough to describe what should happen next --- is a big task.

Based on transformation hint directives. Since what the programmer is using is a hint, for each transformation, programmer needs to tell the next step 1) if the transformation kicks in and also 2) if the transformation does not kick-in. This will be very messy and I'm not sure how practical it would be for the programmer to specify the behaviors for all situations.

If these kinds of stuff have been already discussed, please give me the pointers and I'll try to digest. Else, we can talk about this in this review or in llvm-dev. Hope my argument makes sense to you/others, but please feel free to ask for clarification. I've been working on making SIMD directive programmer predictable for many years. So, I may be skipping some explanations that became natural enough to me over the years.

All these issues apply to the current loop metadata/#pragma clang loop as well. A user can specify #pragma clang loop distribute(enable) vectorize_width(4) with the expectation that these will be carried out. If not, this can be a bug (or semantically incorrect) just as if the distribution/vecorization was explicitly orderder using followup-attributes.
In contrast, currently there is no way to even express that the transformations should be carried out in there reverse order (first vectorization, then distribution; let's ignore for the moment whether this makes sense, it definitely makes sense with loop transformations other than those for which LLVM currently has metadata). Our longer-term plans are to support this in LLVM and the first step is to make sequences of transformations expressible in IR. Of course our goal is also to make transformations more applicable/robust. I see these as two orthogonal problems.

I don't think we need a fallback loop transformation. If a transformation cannot be applied, the user's reactions is probably not "let's do a different transformation then" (which will in most cases also fail for the same reasons the primary transformation failed), but "unfortunately the compiler cannot do my transformation I need to get the best performance, will implement it by hand then." Even if the first option is what the programmer wantsm they will implement it using preprocessor switches as it is common today with different compilers that support/do not support specific pragmas.

I had already some discussions on the reliability of transformations in the compiler. Some groups do not want to rely at all on the compiler being able to do a specific optimization and user libraries instead. Those libraries will 'miscompile' the input if certain preconditions are not met which is the desired outcome where slow execution just is no option. With incorrect results it at least becomes obvious that there is a problem. However, this is not an option for a compiler where correctness is the most important aspect (and only relaxed by attributes such as assume_safety). For some use cases, such as autotuning, being able to rely on the compiler producing correct output makes it possible in first place.
We could argue about whether we want high-level transformations in a low-level compiler in the first place. I think this has been answered a long time ago: LoopVectorizer, LoopUnroll, LoopInterchange, LoopDistribution, LoopUnswitch, etc. Since we have this kind of transformation, why not mkaing them as good as possible?

In short, doing this even for one transformation (in my case SIMD) is difficult enough.

I'd be happy to extract-out attributes for specific transformations in separate reviews. However, to be come useful, I think we need the entire set.

If we are trying to expand to multiple transformations, we should try doing so in baby steps.

I understood that you are working on making the loop vectorizer predictable, i.e. we are working towards the same goal.

The idea behind this is so powerful such that even if we start from "best effort basis", programmers will quickly jump on and say make this more robust/predictable. We'd rather spend time to design this as a robust/predictable feature from the beginning than having to work on it under the customer pressure.

I think the current one-pass-per-transformation is indeed very fragile and I am working on something that should be more robust.

lib/Transforms/Scalar/LoopUnrollPass.cpp
753 ↗	(On Diff #157638)	If `llvm.loop.unroll.enable` is not set, interpretation here is that the transformation is 'non-forced', that is, `llvm.loop.unroll.count` is a hint to the compiler that if it unrolls, then it should unroll by that amount. `llvm.loop.disable_nonforced` overrride the decision whether to unroll, i.e. the unroll factor does not matter. I am aware that the concept of 'forced' transformations is not consistent between passes, but I am trying to give it some consistency. Passes could query shared code such as `hasUnrollTransformation` in `LoopUtils`. `hasUnrollTransformation` currently follows your interpretation of `llvm.loop.unroll.count` / `llvm.loop.disable_nonforced`. I am happy to implement either definition, as long as we find a consistent rule.
test/Transforms/LoopUnroll/unroll-pragmas_transform.ll
1 ↗	(On Diff #157638)	This is a copy of `unroll-pragmas.ll` and any ambiguous metadata replaced by follow-up attributes. An hope is to generally make 'multiple transformation attributes on the same loop' illegal and rejected by the IR verifier (since the result depends on an implementation detail: the order in the pass manager). In this case this file would replace `unroll-pragmas.ll`. But my expectation is that we cannot break backwards-compatibility this way.
5 ↗	(On Diff #157638)	Yes: When no follow-up attributes are specified, the default ones are added (here: `llvm.loop.unroll.disable` to disable further unrolling). In case there are follow-up attribute lists, there is no default and the transformation-disabling must be added explicitly (MDNode `!18`) and of course added after unrolling and recognized by the second LoopUnroll.

I agree that this is a very powerful idea (something I wish I'd had back when I was writing psuedo-hpc applications). I think it's well worth having, but equally worth making sure we get it right. The clang side is very important for that.

In D49281#1188441, @Meinersbur wrote:

In D49281#1187583, @dmgreen wrote:

Some extra tests for nonforced + a pragma would be good to see.

Any transformations in particular?

There seem to be tests that nonforced disables things, but not that nonforced + an attribute keeps it enabled. e.g. the unroll.count metadata.

lib/Transforms/Scalar/LoopUnrollPass.cpp
753 ↗	(On Diff #157638)	I would expect that if a loop has any metadata for a pass, that would mean disable_nonforced doesn't apply. As if the user has specified some metadata, it likely wants something to happen. I think in this specific case llvm.loop.unroll.count implies llvm.loop.unroll.enable, and we wouldn't put both on a loop for "#pragma unroll(4)" or "#pragma clang loop unroll_count(4)"
test/Transforms/LoopUnroll/unroll-pragmas_transform.ll
1 ↗	(On Diff #157638)	Ah, I missed the "followup" here. Is it worth replicating this entire file, or should it just be an extra test in the old file. The "followup" on unroll_1 seems to be the only test changed here? To add unroll.disable as a followup attribute? I'm not sure I see why. Would we expect "#pragma unroll(1)" to not work as it did before? (disable unroll)

In D49281#1188249, @hfinkel wrote:

... As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens.

Hideki, I think that LLVM does the right thing here: We provide a separation between the hint and the mandate (i.e., using assume_safety or not).

That, I know. I wasn't questioning about that.

What I'm not seeing from this RFC/patch is that, if the programmer specifies transformation behavior A -> B -> C, what happens if transformation A does not kick-in? Should we just warn that "A did not happen" and stop processing the request B and C?
Also, if the programmer requests that the loop to be distribute in three ways and specify different transformations for each, what should the latter transformation do if the loop is distributed in two ways or four ways? If we are serious about introducing this kind of features, we should clearly define what should happen when the programmer intention cannot be satisfied well enough ---- when we should continue honoring and when we should stop honoring. If we say "we should stop in all those circumstances", that should simplify the problem a lot. If we say "we should allow to continue on subset of those cases", we should clearly state which subset and why. If there are any prior discussions (or descriptions within this patch) along this lines, please point me to that.

In D49281#1189643, @hsaito wrote:

In D49281#1188249, @hfinkel wrote:

... As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens.

Hideki, I think that LLVM does the right thing here: We provide a separation between the hint and the mandate (i.e., using assume_safety or not).

That, I know. I wasn't questioning about that.

What I'm not seeing from this RFC/patch is that, if the programmer specifies transformation behavior A -> B -> C, what happens if transformation A does not kick-in? Should we just warn that "A did not happen" and stop processing the request B and C?
Also, if the programmer requests that the loop to be distribute in three ways and specify different transformations for each, what should the latter transformation do if the loop is distributed in two ways or four ways? If we are serious about introducing this kind of features, we should clearly define what should happen when the programmer intention cannot be satisfied well enough ---- when we should continue honoring and when we should stop honoring. If we say "we should stop in all those circumstances", that should simplify the problem a lot. If we say "we should allow to continue on subset of those cases", we should clearly state which subset and why. If there are any prior discussions (or descriptions within this patch) along this lines, please point me to that.

I certainly agree that we should document this.

In D49281#1189643, @hsaito wrote:

What I'm not seeing from this RFC/patch is that, if the programmer specifies transformation behavior A -> B -> C, what happens if transformation A does not kick-in? Should we just warn that "A did not happen" and stop processing the request B and C?

Yes. A warning will be emitted by the -transform-warning pass (Please see Passes.rst). B and C cannot apply on a loop that does not exist.

Also, if the programmer requests that the loop to be distribute in three ways and specify different transformations for each, what should the latter transformation do if the loop is distributed in two ways or four ways?

The current LoopDistribution pass unfortunately does not support this, by a goal is to make the user able to define what code should become their own loop. See [[ A Proposal for Loop-Transformation Pragmas | https://arxiv.org/abs/1805.03374 ]].
For the current LoopDistribution pass, only two categories of followup-attributes can be defined noncyclic and cyclic. The noncyclic category can be added to multiple loops.

If we are serious about introducing this kind of features, we should clearly define what should happen when the programmer intention cannot be satisfied well enough ---- when we should continue honoring and when we should stop honoring. If we say "we should stop in all those circumstances", that should simplify the problem a lot. If we say "we should allow to continue on subset of those cases", we should clearly state which subset and why. If there are any prior discussions (or descriptions within this patch) along this lines, please point me to that.

Documented in TransformMetadata.rst line 57ff.

docs/TransformMetadata.rst
163 ↗	(On Diff #157638)	Yes, maybe, but they are also already documented in the `LangRef.rst`. Please understand that the goal in this patch is to define a transformation model for each pass such that it is clear what are those followup-loops, not to write an exhaustive documentation.

In D49281#1189737, @Meinersbur wrote:

In D49281#1189643, @hsaito wrote:

What I'm not seeing from this RFC/patch is that, if the programmer specifies transformation behavior A -> B -> C, what happens if transformation A does not kick-in? Should we just warn that "A did not happen" and stop processing the request B and C?

Yes. A warning will be emitted by the -transform-warning pass (Please see Passes.rst).

This part, I know you did.

B and C cannot apply on a loop that does not exist.

I don't think this is explicitly written. Here's an example. Suppose A is vectorization and B is unroll. If a loop is somehow not vectorized. Unrolling can still happen to the non-vectorized loop. Whether we stop unrolling in this situation is what I'd like to see us being explicit about.

Also, if the programmer requests that the loop to be distribute in three ways and specify different transformations for each, what should the latter transformation do if the loop is distributed in two ways or four ways?

The current LoopDistribution pass unfortunately does not support this, by a goal is to make the user able to define what code should become their own loop. See [[ A Proposal for Loop-Transformation Pragmas | https://arxiv.org/abs/1805.03374 ]].
For the current LoopDistribution pass, only two categories of followup-attributes can be defined noncyclic and cyclic. The noncyclic category can be added to multiple loops.

Whether distribution currently supports that is a different issue. I'm sure we will be expanding the features in the futures. This composability discussion should encapsulate the baseline behaviors for enough of possible future situations ---- else we have to keep revising baseline behaviors, which is very bad.

If we are serious about introducing this kind of features, we should clearly define what should happen when the programmer intention cannot be satisfied well enough ---- when we should continue honoring and when we should stop honoring. If we say "we should stop in all those circumstances", that should simplify the problem a lot. If we say "we should allow to continue on subset of those cases", we should clearly state which subset and why. If there are any prior discussions (or descriptions within this patch) along this lines, please point me to that.

Documented in TransformMetadata.rst line 57ff.

I only see the warning behavior there. I'd like to see us explicitly saying that any subsequent explicit transformation metadata will be ignored for the given loop ---- if that's what we'll agree on, or be explicit about something else we'll agree on in the terms that can be clearly explainable to the programmers. "Compiler will skip all remaining transformations after the first failed transform" is pretty straightforward to the programmers. If anyone is proposing other behaviors, I'd like to also see how to explain those behaviors to the programmers.

Meinersbur mentioned this in D50075: [UnJ] Improve explicit loop count checks.Aug 9 2018, 2:27 PM

Explicitly document followup of not applied transformations to be ignored
Unroll/UnrollAndJam: Interpret enable/count/full as forced
Unroll/UnrollAndJam: Add tests for disable_nonforced combined with enable/count/full
Reduce size of unroll-pragmas_transform.ll

Harbormaster completed remote builds in B21316: Diff 160051.Aug 9 2018, 8:42 PM

In D49281#1189774, @hsaito wrote:

I'd like to see us explicitly saying that any subsequent explicit transformation metadata will be ignored for the given loop ---- if that's what we'll agree on, or be explicit about something else we'll agree on in the terms that can be clearly explainable to the programmers. "Compiler will skip all remaining transformations after the first failed transform" is pretty straightforward to the programmers. If anyone is proposing other behaviors, I'd like to also see how to explain those behaviors to the programmers.

I added a paragraph to TransformMetadata.rst. (I was assuming it was obvious from the definition: A transformation in a followup-attribute only becomes assigned to a loop by the loop transformation pass. Before that, it is not associated with any loop)

In D49281#1194814, @Meinersbur wrote:

In D49281#1189774, @hsaito wrote:

I'd like to see us explicitly saying that any subsequent explicit transformation metadata will be ignored for the given loop ---- if that's what we'll agree on, or be explicit about something else we'll agree on in the terms that can be clearly explainable to the programmers. "Compiler will skip all remaining transformations after the first failed transform" is pretty straightforward to the programmers. If anyone is proposing other behaviors, I'd like to also see how to explain those behaviors to the programmers.

I added a paragraph to TransformMetadata.rst. (I was assuming it was obvious from the definition: A transformation in a followup-attribute only becomes assigned to a loop by the loop transformation pass. Before that, it is not associated with any loop)

The added paragraph looks good to me on the implementation side specification. Looking forward to see the programmers (i.e., compiler users, not compiler writers) side pragma description, but that will not gate my review of this patch. There is a difference between specification forcing one behavior versus implementation choice ends up in the same behavior. I wanted the former, not the latter. With this specification, we can have another implementation choice ---- attaching all those metadata to the loop, to be updated by the successful transformation, and let failed transform drop subsequent ones. I'm not saying it's better to go that way. What I'm saying is that if, for some reason, we later choose to implement this differently, there is a specification to guide how to implement the feature correctly. Hope I don't sound too picky. I just want to provide consistent experiences to the programmers.

In D49281#1196034, @hsaito wrote:

The added paragraph looks good to me on the implementation side specification. Looking forward to see the programmers (i.e., compiler users, not compiler writers) side pragma description, but that will not gate my review of this patch. There is a difference between specification forcing one behavior versus implementation choice ends up in the same behavior. I wanted the former, not the latter.

Different behavior of different implementations is also a serious concern for me. I have three different implementations in mind (the current loop transformations, an extension to Polly to use this metadata, and an idealized loop-transformation pass; the latter two being more powerful is one of the motivtions for this path). Given the prototypical transformations in TransformMetadata.rst, I think the model is applicable to other implementations as well.

With this specification, we can have another implementation choice ---- attaching all those metadata to the loop, to be updated by the successful transformation, and let failed transform drop subsequent ones. I'm not saying it's better to go that way. What I'm saying is that if, for some reason, we later choose to implement this differently, there is a specification to guide how to implement the feature correctly.

This would unfortunately break existing behavior. E.g. llvm.loop.distribute.enable and llvm.loop.vectorize,enable can both be specified in the same loop attributes. Currently, if LoopDistribution fails, the attribute llvm.loop.vectorize.enable will be left untouched. If we change LoopDistribution to remove it, the loop would not be vectorized anymore (assuming the heuristic does not deem it profitable).
It also does do what motivates this patch: Neither the order of transformations be specified, nor can the same transformation be applied multiple times.

I was testing the code and ran into some problems with debug metadata on the loop nodes (actually using -Rpass=unroll in that case). Can you make sure that works as expected?

docs/TransformMetadata.rst
110 ↗	(On Diff #160051)	Nit: never be added
include/llvm/Transforms/Utils/LoopUtils.h
487 ↗	(On Diff #160051)	Nit: transformations->transformation
495 ↗	(On Diff #160051)	Nit: inherit
507 ↗	(On Diff #160051)	Nit: choose
537 ↗	(On Diff #160051)	Nit: warning
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
194 ↗	(On Diff #160051)	This code will need rebasing. There is a check earlier that looks for disable metadata that could be replaced by this. Look for HasUnrollDisablePragma/HasUnrollAndJamDisablePragma. If the same was done for unrolling, I think that would remove the need for the IgnoreUser (although your comment about it is probably still true).
lib/Transforms/Utils/LoopUtils.cpp
1384 ↗	(On Diff #160051)	Maybe InheritSomeAttrs -> InheritNonExceptAttrs?

hiraditya added inline comments.Aug 14 2018, 5:13 AM

lib/Transforms/Scalar/LoopUnrollPass.cpp
1101 ↗	(On Diff #160051)	nit: maybe put the string literals as a separate declaration?
lib/Transforms/Utils/LoopUnrollRuntime.cpp
543 ↗	(On Diff #160051)	What is the rationale of using pointer to a pointer here? If we want to assign to ResultLoop, then maybe we can just return ResultLoop and bool as a pair.
912 ↗	(On Diff #160051)	nit: space

Meinersbur mentioned this in D50698: [UnJ] Ensure unroll_and_jam metadata is removed once consumed..Aug 14 2018, 3:04 PM

I am thinking about adding a LoopMetadataTacker (sort of a combination of LoopVectorizeHints and AssumptionTracker) analysis pass which would centralize the interpretation of that metadata and avoid the linear search through the metadata list when looking up a specific attribute.

lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
194 ↗	(On Diff #160051)	I'd push towards refectoring-out the common parts of `computeUnrollCount` used by LoopUnroll and LoopUnrollAndJam. Currently `computeUnrollCount` uses lots of settings meant for LoopUnroll (`llvm.loop.unroll.` metadata which should not exist anymore, OptimizationRemarkMissed specific to LoopUnroll, `-unroll-count`, `PartialThreshold`, handling of full unroll, loop peeling that UnrollAndJam does not support, being used in a single call by UnrollAndJam for two different things: determining `ExplicitUnroll` (i.e. is normal unroll is forced) and the unroll-and-jam count). It's hard to understand the subtleties between those codes. I gave up at some point and added the `IgnoreUser` flag to make test cases pass.
lib/Transforms/Utils/LoopUnrollRuntime.cpp
543 ↗	(On Diff #160051)	If the Result loop is not needed, one can pass `nullptr` (which is the default argument). Returning `std::pair` will require more changes.
912 ↗	(On Diff #160051)	This is done by `clang-format`. It try not to fight its decisions and hope for future improvement.
lib/Transforms/Utils/LoopUtils.cpp
1384 ↗	(On Diff #160051)	Avoiding double negation here.

Meinersbur added inline comments.Aug 14 2018, 3:46 PM

lib/Transforms/Utils/LoopUnrollRuntime.cpp
912 ↗	(On Diff #160051)	I will try to remove the space in patch updates, but may sneak in again when I re-run clang-format and forget about it before submission.

hsaito added inline comments.Aug 15 2018, 5:33 PM

docs/LangRef.rst
5154 ↗	(On Diff #160051)	I understand that the RST file update should talk about what happens today, but for the sake of code review, it's good to discuss what could happen in reasonably foreseeable future so that we don't under-design things. I think we should be thinking ahead about vectorizer peeling the loop, for example, for alignment optimization. Such peeled loop could be fully unrolled if the trip count is known, or vectorized with mask. main vector loop could be fully unrolled there may be more than one remainder loop, e.g., vectorized remainder followed by scalar remainder. remainder loop may be fully unrolled. All those situations could happen w/o programmer knowing it'll happen that way. Some of the questions we want to think before the real need arises: Will the loop attribute get dropped if the "loop" is fully unrolled? How do we designate more than one remainder loop? Will the loop attribute applicable for vectorized peel/remainder? Should we have a way to designate runtime-DD non-vectorizable loop separately from remainder?
5251 ↗	(On Diff #160051)	Remainder here may be unrolled again or fully unrolled (see the comments on vectorize metadata). What do we do for that?
5311 ↗	(On Diff #160051)	Is there an assumption of unroll_and_jam operating only on a double loop and/or a perfect loop? Technically speaking, we can unroll_and_jam a loop if we can legally outerloop-vectorize. So, there can be multiple inner loops.
5380 ↗	(On Diff #160051)	Looks rather centric to distribute-for-vectorization. Loop distribution can happen for many reasons (and it may be more than one reasons). Are we going to define followup_ Metadata for each of those reasons? What'll happen if a loop matches the characteristics of more than one Metadata?

Report unroll-and-jam as not applied even if unroll is present as well.
- rename followup_cyclic/followup_noncyclic to followup_sequential/followup_coincident
- Move hasUnrollAndJamTransformation in LoopUnrollAndJamPass to different place
- Remove some unrelated whitespace changes made by clang-format
- Extract followup attribute names into constant

docs/LangRef.rst
5154 ↗	(On Diff #160051)	I think a prologue/peel is analogous to epilogue/remainder. That is, a new `llvm.loop.vectorize.followup_peel` can be added. Should be handled as two separate transformations (such as vectorize/interleave). That is `llvm.loop.disable_nonforced` would ensure that a loop does not unexpectedly disappear `llvm.loop.followup_remainder` should apply on any of the remainder loops. If a finer distinction is required, we can add more specific attributes. This can already happen, at least with LoopUnroll/LoopUnrollandJam. The docs mentions that in this case the `followup_remainder` is dropped. However, changing the model a transformation transform to can indeed raise some backward-compatibility issues. This also applies to the user-interface. If a programmer added #pragma clang loop vectorize(enable) do they expect it to be unrolled as well? Loop peeling? D50480 is interesting here: At `-Os`, it uses masking instead of an epilogue to avoid a code copy. In this case `followup_remainder` explicitly states that there's not necessarily a remainder loop, so I don't see a problem here. But a programmer might expect more control over what the output structure is. We can add more attributes to control this behaviour, such as `llvm.loop.vectorize.peel.enable`, `llvm.loop.vectorize.remainder.enable`, `llvm.loop.vectorize.allow_versioning`. The interesting question is, what is the default setting? If we go by the current behavior to maximize backwards-compatibility, remainder and versioning would be enabled by default (if not in `-Os`), peeling disabled because it is not yet implemented. On the other side we probably do not want frontends to emit the most recent enable-metadata to get the best vectorization. So we would enable all features by default, but the output loops might be different from what the programmer intended before the feature is introduced. We can enable all features unless the transformation is forced, in which case all deviations from the current transformation model needs to be explicitly enabled. IMHO, we can decide this case-by-case, weighting compatibility concerns and optimization levels. Then again, such transformations does not influence the correctness of the output. To be less concerned about compatibility issues, I could for now remove all followup-attributes except those that are 'central' to the transformation, and `followup_all`. For vectorization, there will always be the performance-critical vectorized loop (i.e. `followup_vectorized`), independent of whether there is a prologue, epilogue or fallback. For partial unrolling, it is always a unrolled loop. Will the loop attribute get dropped if the "loop" is fully unrolled? Yes. But it should not happen if `llvm.loop.disable_nonforced` is used and the unroll is not explicitly specified. How do we designate more than one remainder loop? Using different attributes. Like `followup_all` it is possibly to address a group of loops. Will the loop attribute applicable for vectorized peel/remainder? Only for the followup that addresses them Should we have a way to designate runtime-DD non-vectorizable loop separately from remainder? As mentioned sometime before, the typical reaction to 'loop not vectorized' is not 'ok, let's unroll it instead', but 'how can I make it vectorize'. So I don't think fallbacks are necessary (unless we can apply a sequence of transformation to multiple loop), but I am open if you think there is a need for such.
5251 ↗	(On Diff #160051)	`followup_remainder` is ignored. If this it is not clear from the section in `TransformMetadata.rst`, please tell me.

Meinersbur added inline comments.Aug 17 2018, 4:23 PM

docs/LangRef.rst
5265–5266 ↗	(On Diff #160051)	@dmgreen This directly contradicts the `nounroll_plus_unroll_and_jam` test case in `Transforms/LoopUnrollAndJam/pragma.ll`
5311 ↗	(On Diff #160051)	There's still an outermost (unrolled) loop and an innermost (jammed) loop. We could also adds followups for middle loops. If it is the naming that concerns you: Would you prefer `followup_unrolled` and `followup_jammed`?
5380 ↗	(On Diff #160051)	There is no overlap between `cyclic` and `noncyclic`. For the extended loop-transformations, the user would name the loops they want distributed. Indeed, these followup are are specific to the current distribution pass. However, I think it is easy for any distribution to determine whether a loop has cyclic dependences and and those attributes to any output loop that matches. `makeFollowupLoopID` can already combine attributes from multiple followups.
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
194 ↗	(On Diff #160051)	`IgnoreUser` exists for the `nounroll_plus_unroll_and_jam` in `LoopUnrollAndJam\pragma.ll`. `llvm.loop.unroll.disable causes`hasUnrollTransformation` in `computeUnrollCount` to return `TM_Disable`. Unrolling inside `computeUnrollCount` is disabled setting the unroll factor to 0. UnrollAndJam then tries to use the that unroll factor.

dmgreen added inline comments.Aug 19 2018, 2:35 AM

docs/LangRef.rst
5265–5266 ↗	(On Diff #160051)	The way this should be working at the moment is: If there is any unroll_and_jam metadata do that thing (the user explicitly asked for a thing -> do it) if there is any unroll metadata disable unrollandjam (leave the loop to the unroller) normal heiristics I think with "but no `llvm.loop.unroll_and_jam` metadata", that is what this is saying. Correct me if I'm wrong and it's not working like this. Or feel free to update it if it's unclear. Or if you think this should work another way...? ;) I originally invisioned unrollandjam as an extension to the unroll pass, so I sometimes see the two things as interrelated. If a user specifies loop.unroll.disable, they almost certainly wanted to disable all unrolling, not just that in the unroll pass.
5311 ↗	(On Diff #160051)	I believe he meant this being unrolled and jammed: for i { for j A(i,j) for k B(i,k) } This is not something we currently support as I didn't think it would ever be likely to be profitable. Users specifying metadata might change that. The pass could be able to be expanded to work on this (I think), but it's not something that it currently does.
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
194 ↗	(On Diff #160051)	Have you considered moving the "disable" check out of computeUnrollCount and into tryUnrollLoop, where the existing "HasUnrollDisablePragma" check is? Hopefully that could be replaced with the new method, much like HasUnrollAndJamDisablePragma has been, and would mean this computeUnrollCount function would just work as it used to.

Meinersbur added inline comments.Aug 20 2018, 10:16 AM

docs/LangRef.rst
5265–5266 ↗	(On Diff #160051)	If there is any unroll_and_jam metadata do that thing (the user explicitly asked for a thing -> do it) There is a comment in LoopUnrollAndJamPass.cpp: // We have already checked that the loop has no unroll.* pragmas. According to this, this is not true (since only checked afterwards) and `computeUnrollCount` will consider e.g. `llvm.loop.unroll.count` even when used for unroll-and-jam. I was concerned about the last phrase `plus llvm.loop.unroll.disable metadata will disable unroll and jam too.`, but it might be a misinterpretation in that it will disable unroll-and-jam, but only if unroll-and-jam is not explicitly enabled.
5311 ↗	(On Diff #160051)	The j- and k-loops are both inner loops, so `followup_inner` should apply to both of them. distinguishing them might be possible when introducing a mechanism like for naming the output loops of loop distribution.
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
194 ↗	(On Diff #160051)	There are multiple mechanisms in `computeUnrollCount` that disable unrolling (such as `UnrollCountPragmaValue` returning zero). If I was to fix this method, I'd do it cleanly by refactoring-out the mechanism that computes the unroll factor when pragmas/options are absent (and not emit any LoopUnroll-specific diagnostics).

hfinkel added inline comments.Sep 21 2018, 8:44 AM

docs/LangRef.rst
5092 ↗	(On Diff #161358)	This is too strong (see comment below).
5165 ↗	(On Diff #161358)	preserving -> preserve
5165 ↗	(On Diff #161358)	e.g., because two ...
5167 ↗	(On Diff #161358)	vector lane -> set of vector lanes
5173 ↗	(On Diff #161358)	added to both the vectorized and remainder loop
docs/TransformMetadata.rst
14 ↗	(On Diff #161358)	By default, transformation passes use heuristics to determine whether or not to perform transformations, and when doing so, other details of how the transformations are applied (e.g., which vectorization factor to select).
17 ↗	(On Diff #161358)	As stated, this is untrue (for -O3). For -O3, we only require a likely speedup across many workloads (and slowdowns be unlikely). This is why, for example, under -O3, we can vectorize with runtime checks. How about this wording: Unless the optimizer is otherwise directed, transformations are applied conservatively. This conservatism generally allows the optimizer to avoid unprofitable transformations, but in practice, this results in the optimizer not applying transformations that would be highly profitable.
22 ↗	(On Diff #161358)	it -> they
24 ↗	(On Diff #161358)	for -> of
51 ↗	(On Diff #161358)	Unrolling, etc. - no need to capitalize.
54 ↗	(On Diff #161358)	We should be careful with the language here. As any of these can be dropped without changing the semantics of the code, nothing here is "mandatory". How about saying, ", or convey a specific request from the user"
58 ↗	(On Diff #161358)	optimization-missed warning
60 ↗	(On Diff #161358)	I know what you mean by "separate", but I think it's better to say: is separate. -> can be unused by the optimizer without generating a warning.
68 ↗	(On Diff #161358)	This wording is too strong. I think that we need to draw a distinction here between (at least) three classes of transformations: canonicalizing transformations (cost-model-driven) restructuring transformations low-level (target-specific) transformations (e.g., using ctr-register-based loops on PowerPC) I believe that this pragma should only affect those in class (2). Canonizalizing transformations are always performed (when optimizing at all), and low-level transformations are beyond the reach of this kind of metadata. I'd recommend using the wording that this metadata disables, "optional, high-level, restructuring transformations."
70 ↗	(On Diff #161358)	avoids that the loop is altered -> avoids the loop being altered
87 ↗	(On Diff #161358)	Why is this a useful feature? Should we allow only one transformation per node?
90 ↗	(On Diff #161358)	loop being vectorized -> loop to be vectorized
103 ↗	(On Diff #161358)	This leaves open the question of whether the vectorizer adds the 'isvectorized' attribute when a follow-up is specific. It should, right?
120 ↗	(On Diff #161358)	for -> to
121 ↗	(On Diff #161358)	comma after following
152 ↗	(On Diff #161358)	Why would isvectorized not always be provided?
300 ↗	(On Diff #161358)	where 'rtc' is the generated runtime safety check.
390 ↗	(On Diff #161358)	must be -> should be
391 ↗	(On Diff #161358)	responsible for this reporting
392 ↗	(On Diff #161358)	they might -> there might
392 ↗	(On Diff #161358)	being able - > able
394 ↗	(On Diff #161358)	is -> may be (keep the entire list in the hypothetical)
410 ↗	(On Diff #161358)	in a fixedpoint loop -> using a dynamic ordering (not to be too prescriptive)
include/llvm/Transforms/Utils/LoopUtils.h
534 ↗	(On Diff #161358)	We can't have metadata necessary for correctness. 'ForcedByUser' is fine to indicate that the user should receive a warning if the transformation cannot be performed.
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
365 ↗	(On Diff #161358)	So each inner loop gets the same id? That doesn't sound right.

Meinersbur marked 26 inline comments as done.Sep 28 2018, 4:21 AM

Meinersbur added inline comments.

docs/TransformMetadata.rst
87 ↗	(On Diff #161358)	It's not a new feature, but effectively the current behavior. ("is possible for compatibility reasons"). I'd indeed prefer to crash if multiple transformations are applied to avoid undefined behavior, which unfortunately is a breaking change.
103 ↗	(On Diff #161358)	I think it should not, but the metadata gives complete control over which attributes will be in the new loop to the metadata. If a frontend wants to apply a transformation twice, it should be able to do so. I think the paragraph says that no such additional implicit attributes are added when a followup is specified: If no followup is specified, ... I added "If, and only if," to make it even clearer.
152 ↗	(On Diff #161358)	I don't know how LoopVectorize behaves when encountering vector instructions, but I want to avoid the vagueness of some passes adding implicit metadata in some situation. The IR should have the control over whether a transformation is applied multiple times.
include/llvm/Transforms/Utils/LoopUtils.h
534 ↗	(On Diff #161358)	[comment] This indicates that using metadata for user-directed loop-transformation #pragmas is a leaky abstraction.
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
365 ↗	(On Diff #161358)	`LoopID` is a misnomer. A LoopID is neither unique (multiple loops having the same LoopID, e.g. because LoopVersioning of any pass not aware of loops copied the BBs of a loop; there is even a regression test for the behaviour with non-unique LoopID) nor identifying (adding/removing attributes in the LoopID MDNode will create a new MDNode; see D52116 for a fix for llvm.loop.parallel_accesses assuming this property). Fixing LoopID to be identifier-like is not possible with the current MDNode structure and would require to make any pass that copies code to be aware of LoopIDs. It would be easier to not assume that LoopID has any identifying properties. I am open to rename 'LoopID' to something else.

Rebase
Address @hfinkel's remarks

Harbormaster completed remote builds in B23237: Diff 167450.Sep 28 2018, 4:24 AM

ping

Herald added a subscriber: arphaman. · View Herald TranscriptNov 19 2018, 1:02 PM

dmgreen added inline comments.Nov 25 2018, 12:09 PM

docs/TransformMetadata.rst
17 ↗	(On Diff #167450)	(Space in howthe)
lib/Transforms/Scalar/LoopUnrollPass.cpp
761 ↗	(On Diff #167450)	This shouldn't be needed here. Before this patch, there was a single place that checked if the loop had unroll disable pragma (HasUnrollDisablePragma at the start of tryToUnrollLoop). It seems best to keep that as-is in this patch (it's already long enough!) and remove HasUnrollDisablePragma, replacing it with the new hasUnrollTransformation & TM_Disable check. Then we won't need this IgnoreUser.
lib/Transforms/Utils/LoopUtils.cpp
297 ↗	(On Diff #167450)	Would this fall over if the metadata was not a string? Such as debug metadata.

Meinersbur marked 3 inline comments as done.Nov 29 2018, 11:51 PM

Meinersbur added inline comments.

lib/Transforms/Scalar/LoopUnrollPass.cpp
761 ↗	(On Diff #167450)	This is here because if the unfortunate interaction between LoopUnroll and LoopUnrollAndJam. `computeUnrollAndJamCount` uses the result of this function to itself determine whether it should unroll-and-jam. `HasUnrollDisablePragma` checks for the `llvm.loop.unroll.enable` property. `hasUnrollTransformation` returns whether LoopUnroll should do something which is not interchangeable. For some reason, `llvm.loop.unroll.enable` is handled here, but `llvm.loop.unroll.count` and `llvm.loop.unroll.full` are handled here and therefore have in influence on LoopUnrollAndJam. I would be glad if you, the author of LoopUnrollAndJam, could untangle this.
lib/Transforms/Utils/LoopUtils.cpp
297 ↗	(On Diff #167450)	This was previously checked to be in a LoopID, therefore cannot be debug metadata. This assumes that the metadata is not malformed. However, this is nowhere handled gracefully in LLVM. For instance, `UnrollAndJamCountPragmaValue` will trigger an assertion if the MDNode has not exactly 2 items, or the second item is something else than a positive integer. In the case here, an assertion in `cast<T>` will trigger. I added extra checks at this location, but there are many others.

Address dmgreen's comments
Rebase

Harbormaster completed remote builds in B25530: Diff 176044.Nov 29 2018, 11:52 PM

dmgreen added inline comments.Dec 2 2018, 12:08 PM

lib/Transforms/Scalar/LoopUnrollPass.cpp
761 ↗	(On Diff #167450)	Sometimes it's easier to show with code :-) so this is what I was thinking of: https://reviews.llvm.org/P8121 Unless you think that will not work for some reason? It passes all the tests you have here, and removes HasUnrollDisablePragma and the IgnoreUser, so seems cleaner. It also has the advantage of keeping unrelated changes to a minimum and not introducing a second place for llvm.loop.unroll.disable to be checked.

dmgreen added inline comments.Dec 2 2018, 12:11 PM

lib/Transforms/Utils/LoopUtils.cpp
297 ↗	(On Diff #167450)	Yeah, malformed input would be fine to not handle, as far as I understand (or perhaps is just QOI). But I was testing something like this (hope I still have it correct): void c(int n, int* w, int* x, int y, int z, int a) { #pragma clang loop distribute(enable) vectorize(disable) for (int i=0; i < n; i++) { x[i] = y[i] + z[i]w[i]; a[i+1] = (a[i-1] + a[i] + a[i+1])/3.0; y[i] = z[i] - x[i]; } } Ran with "clang -O3 distribute.c -S -g" would crash with the previous patch. Now I think it doesn't drop the distribute metadata? I believe the llvm.loop metedata will looks something like !58 in: !58 = distinct !{!58, !30, !59, !60, !61, !62} !59 = !DILocation(line: 8, column: 5, scope: !20) !60 = !{!"llvm.loop.vectorize.width", i32 1} !61 = !{!"llvm.loop.unroll.disable"} !62 = !{!"llvm.loop.distribute.enable", i1 true} !30 is a DILocation too, which I think are the parts causing the problems.

Meinersbur marked 2 inline comments as done.Dec 3 2018, 1:49 PM

Meinersbur added inline comments.

lib/Transforms/Scalar/LoopUnrollPass.cpp
761 ↗	(On Diff #167450)	Thank you for the patch. I am not 100% sure whether this does not change LoopUnroll's behavior. That is, with `!{!"llvm.loop.unroll.count", i32 1}` it currently executes UP.Count = PragmaCount; UP.Runtime = true; UP.AllowExpensiveTripCount = true; UP.Force = true; if ((UP.AllowRemainder \|\| (TripMultiple % PragmaCount == 0)) && getUnrolledLoopSize(LoopSize, UP) < PragmaUnrollThreshold) return true; where as with your patch it bails out early (it might still do peeling even if UP.Count is 1). Also, the `-unroll-count` command-line option would be evaluated first before your patch. unroll-pragmas_contradiction.ll950 BDownload fails with your patch. However, I like that it indeed makes the unroll decision simpler and goes in the direction of separating LoopUnroll and LoopUnrollAndJam's decision logic.
lib/Transforms/Utils/LoopUtils.cpp
297 ↗	(On Diff #167450)	I may not have considered that CGLoopInfo.cpp also adds debug locations to LoopIDs. Should be fixed with the previous update. Thanks for noticing. I also made a mistake in that update which dropped all non-distribute metadata instead of the distribute metadata. It made one regression test fail.

Rebress
Fix drop metadata regression
Apply dmgreen's patch suggestion

Harbormaster completed remote builds in B25622: Diff 176471.Dec 3 2018, 1:51 PM

A few additional comments. Otherwise, this LGTM. When @dmgreen is happy with the unrolling changes, I think you're good to go.

docs/LangRef.rst
5124 ↗	(On Diff #176471)	Maybe add, "It is recommended to use this metadata when using any of the other llvm.loop.* metadata to direct specific transformations."
docs/TransformMetadata.rst
400 ↗	(On Diff #176471)	there -> they
lib/Transforms/Scalar/LoopDistribute.cpp
87 ↗	(On Diff #176471)	This should say Followup, not Followu, I suppose.
lib/Transforms/Scalar/WarnMissedTransforms.cpp
32 ↗	(On Diff #176471)	Here and below, explicitly specified should have a hyphen (it is a compound adjective): explicitly-specified loop unrolling that having been said, I'd prefer a different phrasing all together. These are end-user visible messages, and I think that we can make these slightly more user friendly. How about this: "loop not unrolled: the optimizer was unable to perform the requested transformation" (and similar for the others)

This revision is now accepted and ready to land.Dec 3 2018, 4:32 PM

Add transformation order notice to llvm.loop.disable_nonforced.
Typos

Harbormaster completed remote builds in B25647: Diff 176545.Dec 3 2018, 9:15 PM

Meinersbur added inline comments.Dec 3 2018, 9:15 PM

lib/Transforms/Scalar/WarnMissedTransforms.cpp
32 ↗	(On Diff #176471)	I think that "the optimizer was unable to perform" is less accurate: it gives the impression that the optimizer actually tried to perform the transformation, but one of the reasons the metadata is still present is that the corresponding pass is not in the pipeline (e.g. because of `-fno-vectorize` or `-mllvm -enable-unroll-and-jam` is missing). That is, the user should modify the compiler flags instead of tweaking the source code. That being said, "failed to ..." is not much better. Any better suggestions?

dexonsmith removed a subscriber: dexonsmith.Dec 3 2018, 10:05 PM

hfinkel added inline comments.Dec 4 2018, 7:49 AM

lib/Transforms/Scalar/WarnMissedTransforms.cpp
32 ↗	(On Diff #176471)	I think that "the optimizer was unable to perform" is less accurate: ... but one of the reasons the metadata is still present is that the corresponding pass is not in the pipeline... I disagree that it is less accurate, and the optimizer might be unable to perform an optimization for structural reasons, and to say that something "failed" clearly implies to me that it was explicitly attempted (which in this case it was not). Nevertheless, this is a good point, and we could provide a more-useful message. How about this: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering

Meinersbur mentioned this in D55288: [test] Fix tests for changed optimizer warning message.Dec 4 2018, 11:56 AM

Clear-up leftover transformation warning messages

This change requires a patch to Clang: D55288

Harbormaster completed remote builds in B25685: Diff 176683.Dec 4 2018, 11:58 AM

Meinersbur added a child revision: D55288: [test] Fix tests for changed optimizer warning message.Dec 4 2018, 11:59 AM

When @dmgreen is happy with the unrolling changes, I think you're good to go.

Certainly. If you are happy, I am happy. Thanks.

[test] Revise tests
- Consistent disable_nonforced testing
- Unify followup-attribute testing. The previous approach was to copy existing test cases and emulate the behavior of the loop transformation passes using followup attributes. This had the disadvantage that the pass would pass even if the followup-attribute was ignored (indeed, some were misspelled) since the result is the same. Instead, use a new "followup.ll" test per loop pass that checks the presence of new attributes specific for each followup.
[docs] Add followup attribute recommendations

Harbormaster completed remote builds in B25946: Diff 177820.Dec 11 2018, 9:27 PM

Closed by commit rC348944: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. (authored by Meinersbur). · Explain WhyDec 12 2018, 9:38 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: cfe-commits. · View Herald TranscriptDec 12 2018, 9:38 AM

This is an archive of the discontinued LLVM Phabricator instance.

[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 177870

test/Misc/backend-optimization-failure-nodbg.cpp

test/Misc/backend-optimization-failure.cpp

This is an archive of the discontinued LLVM Phabricator instance.

[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 177870

test/Misc/backend-optimization-failure-nodbg.cpp

test/Misc/backend-optimization-failure.cpp

[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
ClosedPublic