Page MenuHomePhabricator

[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.

Authored by Meinersbur on Jul 12 2018, 10:15 PM.



When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.

#pragma clang loop unroll_and_jam(enable)
#pragma clang loop distribute(enable)

is the same as

#pragma clang loop distribute(enable)
#pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

!0 = !{!0, !1, !2}
!1 = !{!"llvm.loop.unroll_and_jam.enable"}
!2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
!3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.

To ensure that no other transformation is executed before the intended one, the attribute llvm.loop.transformations.disable_nonforced can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

This approach deviates from proposal in the RFC at There are three reasons:

  • In the RFC approach, when a pass wants to determine whether it should perform a transformation, it must search the list of transformations for the first transformation on its loop. When inlining, the two transformation list must be combined.
  • For compatibility, the current approach using loop IDs must still be supported, either each pass looks up both metadata formats, or using the AutoUpgrade mechanism. This patch's approach keeps the current mechanism.
  • The "loop IDs" can change. When a attribute is added/removed from the Loop ID metata (e.g. adding "llvm.loop.isvectorized" after vectorizing), the loop is assigned a new, distrinct MDNode. Every reference to these nodes needs to be updated as well, and metadata nodes do not support RAUW. Some of the current passes (e.g. LoopDistribution) keep the same loop ID for multiple output loops such that the loop ID is not uniquely identifying loops anymore. This patch's approach does not reference loop IDs in the attribute values.

Diff Detail

rC Clang

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Meinersbur added inline comments.Aug 17 2018, 4:23 PM
5265–5266 ↗(On Diff #160051)

@dmgreen This directly contradicts the nounroll_plus_unroll_and_jam test case in Transforms/LoopUnrollAndJam/pragma.ll

5311 ↗(On Diff #160051)

There's still an outermost (unrolled) loop and an innermost (jammed) loop. We could also adds followups for middle loops.

If it is the naming that concerns you: Would you prefer followup_unrolled and followup_jammed?

5380 ↗(On Diff #160051)

There is no overlap between cyclic and noncyclic. For the extended loop-transformations, the user would name the loops they want distributed.

Indeed, these followup are are specific to the current distribution pass. However, I think it is easy for any distribution to determine whether a loop has cyclic dependences and and those attributes to any output loop that matches.

makeFollowupLoopID can already combine attributes from multiple followups.

194 ↗(On Diff #160051)

IgnoreUser exists for the nounroll_plus_unroll_and_jam in LoopUnrollAndJam\pragma.ll.

llvm.loop.unroll.disable causeshasUnrollTransformation` in computeUnrollCount to return TM_Disable. Unrolling inside computeUnrollCount is disabled setting the unroll factor to 0. UnrollAndJam then tries to use the that unroll factor.

dmgreen added inline comments.Aug 19 2018, 2:35 AM
5265–5266 ↗(On Diff #160051)

The way this should be working at the moment is:

If there is any unroll_and_jam metadata
  do that thing             (the user explicitly asked for a thing -> do it)
if there is any unroll metadata
  disable unrollandjam      (leave the loop to the unroller)
normal heiristics

I think with "but no `llvm.loop.unroll_and_jam` metadata", that is what this is saying. Correct me if I'm wrong and it's not working like this. Or feel free to update it if it's unclear. Or if you think this should work another way...? ;)

I originally invisioned unrollandjam as an extension to the unroll pass, so I sometimes see the two things as interrelated. If a user specifies loop.unroll.disable, they almost certainly wanted to disable all unrolling, not just that in the unroll pass.

5311 ↗(On Diff #160051)

I believe he meant this being unrolled and jammed:

for i {
  for j
  for k

This is not something we currently support as I didn't think it would ever be likely to be profitable. Users specifying metadata might change that. The pass could be able to be expanded to work on this (I think), but it's not something that it currently does.

194 ↗(On Diff #160051)

Have you considered moving the "disable" check out of computeUnrollCount and into tryUnrollLoop, where the existing "HasUnrollDisablePragma" check is? Hopefully that could be replaced with the new method, much like HasUnrollAndJamDisablePragma has been, and would mean this computeUnrollCount function would just work as it used to.

Meinersbur added inline comments.Aug 20 2018, 10:16 AM
5265–5266 ↗(On Diff #160051)
If there is any unroll_and_jam metadata
  do that thing             (the user explicitly asked for a thing -> do it)

There is a comment in LoopUnrollAndJamPass.cpp:

// We have already checked that the loop has no unroll.* pragmas.

According to this, this is not true (since only checked afterwards) and computeUnrollCount will consider e.g. llvm.loop.unroll.count even when used for unroll-and-jam.

I was concerned about the last phrase plus llvm.loop.unroll.disable metadata will disable unroll and jam too., but it might be a misinterpretation in that it will disable unroll-and-jam, but only if unroll-and-jam is not explicitly enabled.

5311 ↗(On Diff #160051)

The j- and k-loops are both inner loops, so followup_inner should apply to both of them.
distinguishing them might be possible when introducing a mechanism like for naming the output loops of loop distribution.

194 ↗(On Diff #160051)

There are multiple mechanisms in computeUnrollCount that disable unrolling (such as UnrollCountPragmaValue returning zero).

If I was to fix this method, I'd do it cleanly by refactoring-out the mechanism that computes the unroll factor when pragmas/options are absent (and not emit any LoopUnroll-specific diagnostics).

hfinkel added inline comments.Sep 21 2018, 8:44 AM
5092 ↗(On Diff #161358)

This is too strong (see comment below).

5165 ↗(On Diff #161358)

preserving -> preserve

5165 ↗(On Diff #161358)

e.g., because two ...

5167 ↗(On Diff #161358)

vector lane -> set of vector lanes

5173 ↗(On Diff #161358)

added to both the vectorized and remainder loop

14 ↗(On Diff #161358)

By default, transformation passes use heuristics to determine whether or not to perform transformations, and when doing so, other details of how the transformations are applied (e.g., which vectorization factor to select).

17 ↗(On Diff #161358)

As stated, this is untrue (for -O3). For -O3, we only require a likely speedup across many workloads (and slowdowns be unlikely). This is why, for example, under -O3, we can vectorize with runtime checks. How about this wording:

Unless the optimizer is otherwise directed, transformations are applied conservatively. This conservatism generally allows the optimizer to avoid unprofitable transformations, but in practice, this results in the optimizer not applying transformations that would be highly profitable.

22 ↗(On Diff #161358)

it -> they

24 ↗(On Diff #161358)

for -> of

51 ↗(On Diff #161358)

Unrolling, etc. - no need to capitalize.

54 ↗(On Diff #161358)

We should be careful with the language here. As any of these can be dropped without changing the semantics of the code, nothing here is "mandatory". How about saying, ", or convey a specific request from the user"

58 ↗(On Diff #161358)

optimization-missed warning

60 ↗(On Diff #161358)

I know what you mean by "separate", but I think it's better to say:

is separate. -> can be unused by the optimizer without generating a warning.

68 ↗(On Diff #161358)

This wording is too strong. I think that we need to draw a distinction here between (at least) three classes of transformations:

  1. canonicalizing transformations
  2. (cost-model-driven) restructuring transformations
  3. low-level (target-specific) transformations (e.g., using ctr-register-based loops on PowerPC)

I believe that this pragma should only affect those in class (2). Canonizalizing transformations are always performed (when optimizing at all), and low-level transformations are beyond the reach of this kind of metadata.

I'd recommend using the wording that this metadata disables, "optional, high-level, restructuring transformations."

70 ↗(On Diff #161358)

avoids that the loop is altered -> avoids the loop being altered

87 ↗(On Diff #161358)

Why is this a useful feature? Should we allow only one transformation per node?

90 ↗(On Diff #161358)

loop being vectorized -> loop to be vectorized

103 ↗(On Diff #161358)

This leaves open the question of whether the vectorizer adds the 'isvectorized' attribute when a follow-up is specific. It should, right?

120 ↗(On Diff #161358)

for -> to

121 ↗(On Diff #161358)

comma after following

152 ↗(On Diff #161358)

Why would isvectorized not always be provided?

300 ↗(On Diff #161358)

where 'rtc' is the generated runtime safety check.

390 ↗(On Diff #161358)

must be -> should be

391 ↗(On Diff #161358)

responsible for this reporting

392 ↗(On Diff #161358)

they might -> there might

392 ↗(On Diff #161358)

being able - > able

394 ↗(On Diff #161358)

is -> may be

(keep the entire list in the hypothetical)

410 ↗(On Diff #161358)

in a fixedpoint loop -> using a dynamic ordering

(not to be too prescriptive)

534 ↗(On Diff #161358)

We can't have metadata necessary for correctness. 'ForcedByUser' is fine to indicate that the user should receive a warning if the transformation cannot be performed.

365 ↗(On Diff #161358)

So each inner loop gets the same id? That doesn't sound right.

Meinersbur marked 26 inline comments as done.Sep 28 2018, 4:21 AM
Meinersbur added inline comments.
87 ↗(On Diff #161358)

It's not a new feature, but effectively the current behavior. ("is possible for compatibility
reasons"). I'd indeed prefer to crash if multiple transformations are applied to avoid undefined behavior, which unfortunately is a breaking change.

103 ↗(On Diff #161358)

I think it should not, but the metadata gives complete control over which attributes will be in the new loop to the metadata. If a frontend wants to apply a transformation twice, it should be able to do so.

I think the paragraph says that no such additional implicit attributes are added when a followup is specified:

If no followup is specified, ...

I added "If, and only if," to make it even clearer.

152 ↗(On Diff #161358)

I don't know how LoopVectorize behaves when encountering vector instructions, but I want to avoid the vagueness of some passes adding implicit metadata in some situation. The IR should have the control over whether a transformation is applied multiple times.

534 ↗(On Diff #161358)

[comment] This indicates that using metadata for user-directed loop-transformation #pragmas is a leaky abstraction.

365 ↗(On Diff #161358)

LoopID is a misnomer. A LoopID is neither unique (multiple loops having the same LoopID, e.g. because LoopVersioning of any pass not aware of loops copied the BBs of a loop; there is even a regression test for the behaviour with non-unique LoopID) nor identifying (adding/removing attributes in the LoopID MDNode will create a new MDNode; see D52116 for a fix for llvm.loop.parallel_accesses assuming this property).

Fixing LoopID to be identifier-like is not possible with the current MDNode structure and would require to make any pass that copies code to be aware of LoopIDs. It would be easier to not assume that LoopID has any identifying properties. I am open to rename 'LoopID' to something else.

Meinersbur marked an inline comment as done.
dmgreen added inline comments.Sun, Nov 25, 12:09 PM
17 ↗(On Diff #167450)

(Space in howthe)

761 ↗(On Diff #167450)

This shouldn't be needed here. Before this patch, there was a single place that checked if the loop had unroll disable pragma (HasUnrollDisablePragma at the start of tryToUnrollLoop). It seems best to keep that as-is in this patch (it's already long enough!) and remove HasUnrollDisablePragma, replacing it with the new hasUnrollTransformation & TM_Disable check. Then we won't need this IgnoreUser.

297 ↗(On Diff #167450)

Would this fall over if the metadata was not a string? Such as debug metadata.

Meinersbur marked 3 inline comments as done.Thu, Nov 29, 11:51 PM
Meinersbur added inline comments.
761 ↗(On Diff #167450)

This is here because if the unfortunate interaction between LoopUnroll and LoopUnrollAndJam. computeUnrollAndJamCount uses the result of this function to itself determine whether it should unroll-and-jam.

HasUnrollDisablePragma checks for the llvm.loop.unroll.enable property. hasUnrollTransformation returns whether LoopUnroll should do something which is not interchangeable. For some reason, llvm.loop.unroll.enable is handled here, but llvm.loop.unroll.count and llvm.loop.unroll.full are handled here and therefore have in influence on LoopUnrollAndJam.

I would be glad if you, the author of LoopUnrollAndJam, could untangle this.

297 ↗(On Diff #167450)

This was previously checked to be in a LoopID, therefore cannot be debug metadata.

This assumes that the metadata is not malformed. However, this is nowhere handled gracefully in LLVM. For instance, UnrollAndJamCountPragmaValue will trigger an assertion if the MDNode has not exactly 2 items, or the second item is something else than a positive integer. In the case here, an assertion in cast<T> will trigger.

I added extra checks at this location, but there are many others.

Meinersbur marked an inline comment as done.
  • Address dmgreen's comments
  • Rebase
dmgreen added inline comments.Sun, Dec 2, 12:08 PM
761 ↗(On Diff #167450)

Sometimes it's easier to show with code :-) so this is what I was thinking of:
Unless you think that will not work for some reason? It passes all the tests you have here, and removes HasUnrollDisablePragma and the IgnoreUser, so seems cleaner. It also has the advantage of keeping unrelated changes to a minimum and not introducing a second place for llvm.loop.unroll.disable to be checked.

dmgreen added inline comments.Sun, Dec 2, 12:11 PM
297 ↗(On Diff #167450)

Yeah, malformed input would be fine to not handle, as far as I understand (or perhaps is just QOI). But I was testing something like this (hope I still have it correct):

void c(int n, int* w, int* x, int *y, int* z, int *a) {
#pragma clang loop distribute(enable)  vectorize(disable)
    for (int i=0; i < n; i++) {
        x[i] = y[i] + z[i]*w[i];
        a[i+1] = (a[i-1] + a[i] + a[i+1])/3.0;
        y[i] = z[i] - x[i];

Ran with "clang -O3 distribute.c -S -g" would crash with the previous patch. Now I think it doesn't drop the distribute metadata? I believe the llvm.loop metedata will looks something like !58 in:

!58 = distinct !{!58, !30, !59, !60, !61, !62}
!59 = !DILocation(line: 8, column: 5, scope: !20)
!60 = !{!"llvm.loop.vectorize.width", i32 1}
!61 = !{!"llvm.loop.unroll.disable"}
!62 = !{!"llvm.loop.distribute.enable", i1 true}

!30 is a DILocation too, which I think are the parts causing the problems.

Meinersbur marked 2 inline comments as done.Mon, Dec 3, 1:49 PM
Meinersbur added inline comments.
761 ↗(On Diff #167450)

Thank you for the patch. I am not 100% sure whether this does not change LoopUnroll's behavior. That is, with !{!"llvm.loop.unroll.count", i32 1} it currently executes

UP.Count = PragmaCount;
UP.Runtime = true;
UP.AllowExpensiveTripCount = true;
UP.Force = true;
if ((UP.AllowRemainder || (TripMultiple % PragmaCount == 0)) &&
    getUnrolledLoopSize(LoopSize, UP) < PragmaUnrollThreshold)
  return true;

where as with your patch it bails out early (it might still do peeling even if UP.Count is 1). Also, the -unroll-count command-line option would be evaluated first before your patch.

fails with your patch.

However, I like that it indeed makes the unroll decision simpler and goes in the direction of separating LoopUnroll and LoopUnrollAndJam's decision logic.

297 ↗(On Diff #167450)

I may not have considered that CGLoopInfo.cpp also adds debug locations to LoopIDs. Should be fixed with the previous update. Thanks for noticing.

I also made a mistake in that update which dropped all non-distribute metadata instead of the distribute metadata. It made one regression test fail.

Meinersbur updated this revision to Diff 176471.Mon, Dec 3, 1:51 PM
  • Rebress
  • Fix drop metadata regression
  • Apply dmgreen's patch suggestion
hfinkel accepted this revision.Mon, Dec 3, 4:32 PM

A few additional comments. Otherwise, this LGTM. When @dmgreen is happy with the unrolling changes, I think you're good to go.

5124 ↗(On Diff #176471)

Maybe add, "It is recommended to use this metadata when using any of the other llvm.loop.* metadata to direct specific transformations."

400 ↗(On Diff #176471)

there -> they

87 ↗(On Diff #176471)

This should say Followup, not Followu, I suppose.

32 ↗(On Diff #176471)

Here and below, explicitly specified should have a hyphen (it is a compound adjective):

explicitly-specified loop unrolling

that having been said, I'd prefer a different phrasing all together. These are end-user visible messages, and I think that we can make these slightly more user friendly. How about this:

"loop not unrolled: the optimizer was unable to perform the requested transformation"

(and similar for the others)

This revision is now accepted and ready to land.Mon, Dec 3, 4:32 PM
Meinersbur updated this revision to Diff 176545.Mon, Dec 3, 9:15 PM
Meinersbur marked 4 inline comments as done.
  • Add transformation order notice to llvm.loop.disable_nonforced.
  • Typos
Meinersbur added inline comments.Mon, Dec 3, 9:15 PM
32 ↗(On Diff #176471)

I think that "the optimizer was unable to perform" is less accurate: it gives the impression that the optimizer actually tried to perform the transformation, but one of the reasons the metadata is still present is that the corresponding pass is not in the pipeline (e.g. because of -fno-vectorize or -mllvm -enable-unroll-and-jam is missing). That is, the user should modify the compiler flags instead of tweaking the source code.

That being said, "failed to ..." is not much better. Any better suggestions?

hfinkel added inline comments.Tue, Dec 4, 7:49 AM
32 ↗(On Diff #176471)

I think that "the optimizer was unable to perform" is less accurate: ... but one of the reasons the metadata is still present is that the corresponding pass is not in the pipeline...

I disagree that it is less accurate, and the optimizer might be unable to perform an optimization for structural reasons, and to say that something "failed" clearly implies to me that it was explicitly attempted (which in this case it was not). Nevertheless, this is a good point, and we could provide a more-useful message. How about this:

loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering
Meinersbur marked 2 inline comments as done.
  • Clear-up leftover transformation warning messages

This change requires a patch to Clang: D55288

dmgreen accepted this revision.Tue, Dec 4, 2:14 PM

When @dmgreen is happy with the unrolling changes, I think you're good to go.

Certainly. If you are happy, I am happy. Thanks.

  • [test] Revise tests
    • Consistent disable_nonforced testing
    • Unify followup-attribute testing. The previous approach was to copy existing test cases and emulate the behavior of the loop transformation passes using followup attributes. This had the disadvantage that the pass would pass even if the followup-attribute was ignored (indeed, some were misspelled) since the result is the same. Instead, use a new "followup.ll" test per loop pass that checks the presence of new attributes specific for each followup.
  • [docs] Add followup attribute recommendations
This revision was automatically updated to reflect the committed changes.