This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Generate follow-up metadata for loops with more than one transformation.
ClosedPublic

Authored by Meinersbur on Feb 8 2019, 2:34 PM.

Details

Summary

Before this patch, CGLoop would dump all transformations for a loop into a single LoopID without encoding any order in which to apply them. rL348944 added the possibility to encode a transformation order using followup-attributes.

When a loop has more than one transformation, use the follow-up attribute define the order in which they are applied. The emitted order is the defacto order as defined by the current LLVM pass pipeline, which is:

  1. LoopFullUnrollPass
  2. LoopDistributePass
  3. LoopVectorizePass
  4. LoopUnrollAndJamPass
  5. LoopUnrollPass
  6. MachinePipeliner

This patch should therefore not change the assembly output, assuming that all explicit transformations can be applied, and no implicit transformations in-between. In the former case, WarnMissedTransformationsPass should emit a warning (except for MachinePipeliner which is not implemented yet). The latter could be avoided by adding 'llvm.loop.disable_nonforced' attributes.

Because LoopUnrollAndJamPass processes a loop nest, generation of the MDNode is delayed to after the inner loop metadata have been processed. A temporary LoopID is therefore used to annotate instructions and RAUW'ed by the actual LoopID later.

Diff Detail

Event Timeline

Meinersbur created this revision.Feb 8 2019, 2:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2019, 2:34 PM
Herald added a subscriber: zzheng. · View Herald Transcript
rjmccall resigned from this revision.Feb 12 2019, 2:04 PM

Sorry, this is deep in the loop analysis code, and I don't remember anything about this stuff, so I'm not sure I'm going to be a useful reviewer.

Anastasia resigned from this revision.Feb 18 2019, 7:50 AM

This seems like an interesting change. So far in OpenCL we have actively used only unroll. I will let others to assess in more details.

lib/CodeGen/CGLoopInfo.cpp
107

Will this end up emitted in the final module?

Anastasia added a subscriber: Anastasia.
Meinersbur marked an inline comment as done.Feb 18 2019, 9:18 AM
Meinersbur added inline comments.
lib/CodeGen/CGLoopInfo.cpp
107

This MDNode is emitted in the module generated by clang -- as a follow-up attribute -- in case there is another transformation after partial unrolling (currently just pipelining).

ping

The Polly-powered additional transformations now also generate this kind of metadata.

ping

This is a good follow up for rL348944, but I have no familiarity to the code under Clang. So, let me thank you for doing this work and move myself to the subscriber.

hsaito removed a reviewer: hsaito.Mar 19 2019, 2:52 PM
hsaito added a subscriber: hsaito.

Hello. I also don't feel very familiar with clang, but had a poke around and I think it looks pretty good. I see unroll and jam is being awkward again.

This could maybe do with a few extra tests. Am I correct in saying something like this:

#pragma unroll_and_jam(4)
for(int j = 0; j < n; j++) {
  #pragma unroll(4)
  for(int k = 0; k < n; k++) {
    x[j*n+k] = 10;
  }
}

would end up with a llvm.loop.unroll_and_jam.followup_inner with a llvm.loop.unroll_count?

lib/CodeGen/CGLoopInfo.cpp
502

I was having trouble parsing this sentance. Does it mean both the inner loop and the outer loop both have unroll-and-jam? UnrollAndJam processes loops from inner to outer, so if this is working as I think, maybe it should be put into BeforeJam.

504

= Attrs.UnrollAndJamEnable ?

lib/CodeGen/CGLoopInfo.h
135

*encodes

170

*encodes

Meinersbur marked 5 inline comments as done.
  • Rebase
  • Add two test cases for all attributes combined (as inner and outer loop for of an unroll-and-jam)
  • Of two nested unroll-and-jams, apply the inner first
  • Typos and comment clarification
  • Add missing llvm.loop.isvectorized when attributes are split between BeforeJam and AfterJam

This could maybe do with a few extra tests. Am I correct in saying something like this:

#pragma unroll_and_jam(4)
for(int j = 0; j < n; j++) {
  #pragma unroll(4)
  for(int k = 0; k < n; k++) {
    x[j*n+k] = 10;
  }
}

would end up with a llvm.loop.unroll_and_jam.followup_inner with a llvm.loop.unroll_count?

correct. I added a new test pragma-followup_inner.cpp where this can be seen.

lib/CodeGen/CGLoopInfo.cpp
502

Does it mean both the inner loop and the outer loop both have unroll-and-jam?

Correct. I was thinking of such a case:

#pragma unroll_and_jam(2)
for (int i = 0; i < m; i+=1)
  #pragma unroll_and_jam(3)
  for (int j = 0; j < n; j+=1)

Assuming the inner unroll-and-jam would do something, would it happen before or after the outer transformation. As you mentioned, there is an argument to put it into BeforeJam. Changed.

504

thanks

dmgreen accepted this revision.Mar 26 2019, 5:41 AM

Yeah, OK. This looks like a good patch to me. As I said, I'm not a clang expert, but the code looks sensible enough. (Perhaps wait a couple of days in case others have objections.)

This revision is now accepted and ready to land.Mar 26 2019, 5:41 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2019, 10:50 AM

@dmgreen Thank you for the review!