This is an archive of the discontinued LLVM Phabricator instance.

[UnrollAndJam] Add unroll_and_jam pragma handling
ClosedPublic

Authored by dmgreen on May 23 2018, 9:00 AM.

Details

Summary

This adds support for the unroll_and_jam pragma, to go with D41953. The name of
the pragma is copied from the Intel compiler, and most of the code works the same as
for unroll.

I have some doubts whether this will ever be used sensibly in the real world, but can
be useful for testing and was not difficult to put together.

Diff Detail

Event Timeline

dmgreen created this revision.May 23 2018, 9:00 AM

I have some doubts whether this will ever be used sensibly in the real world, but can
be useful for testing and was not difficult to put together.

I definitely support having pragmas for these kinds of loop transformations. In my experience, they are used.

This is straightforward in that it clones the implementation of #pragma unroll for unroll_and_jam.

Hence, it also shares the same problem of clang's LoopHints, namely that the order of loop transformations (if there are multiple on the same loop: loop distribution, vectorization, etc) is defined by the order of the passes in the pass pipeline, which should be an implementation detail.

I am currently working on this topic. Could we maybe disable the #pragma clang loop unroll_and_jam spelling ftm to avoid compatibility issues? However, the problem already exists for the other loop hints, so I will have to ensure compatibility with those anyway.

In my experience, they are used.

Good to know, cheers.

Could we maybe disable the #pragma clang loop unroll_and_jam spelling ftm to avoid compatibility issues?

Sure, I'm not against. It sounds like you have opinions on how this should work. That's good. If there are multiple clang loop pragma's, what is the expected behaviour there?

In the llvm side of this, in the unroll and jam pass, I made it so that a loop with "llvm.loop.unroll" metadata without any "llvm.loop.unroll_and_jam" metadata will not do unroll and jam, it will leave the loop for the unroller. On the expectation that the use really wants to unroll (and it applies to llvm.loop.unroll.disable too, disabling unroll and jam as well as unroll). I haven't done anything with other loop metadata though.

Could we maybe disable the #pragma clang loop unroll_and_jam spelling ftm to avoid compatibility issues?

Sure, I'm not against. It sounds like you have opinions on how this should work. That's good. If there are multiple clang loop pragma's, what is the expected behaviour there?

In the llvm side of this, in the unroll and jam pass, I made it so that a loop with "llvm.loop.unroll" metadata without any "llvm.loop.unroll_and_jam" metadata will not do unroll and jam, it will leave the loop for the unroller. On the expectation that the use really wants to unroll (and it applies to llvm.loop.unroll.disable too, disabling unroll and jam as well as unroll). I haven't done anything with other loop metadata though.

I'd expect that transformations "stack up". E.g.

#pragma unroll_and_jam
#pragma unroll(4)

which I'd expect to unroll by a factor of 4 first, then try to unroll-and-jam (which I am not sure D41953 can do due to then containing 4 loops). On the other hand

#pragma unroll(4)
#pragma unroll_and_jam

should unroll-and-jam, and then unroll the outer loop by a factor of 4.

Just out of curiousity:

  • How do you plan to implement this? Are you going to generate from the pragma some sort of "script" that dictates the transformation order which is going to be fed to the pass manager?
  • About the stacking of pragmas, in your example you apply the bottom one first, but would a user perhaps expect the first to be applied? In other words, is the expected behaviour described somewhere (in a spec)?
dmgreen updated this revision to Diff 148393.May 24 2018, 5:51 AM

This splits out the pragma clang loop unroll_and_jam handling into D47320, for if/when we need it. Which I believe is what you wanted, correct me if I'm wrong.

Yes, this is what I had in mind. Thank you.

I am preparing an RFC on what I am trying to do. This should clarify our goals and to what extend #pragma clang loop conflicts with it.

Thanks.

I noticed in the paper that you used the name "unrollandjam", minus underscores. Should I change this use that spelling here? I have no strong opinion of one over the other (was just using what I had found from the Intel docs).

I noticed in the paper that you used the name "unrollandjam", minus underscores. Should I change this use that spelling here? I have no strong opinion of one over the other (was just using what I had found from the Intel docs).

IMHO you can keep unroll_and_jam (which is already supported by Intel syntax). When I imagined the name, I had xlc's unrollandfuse in mind, but found that "and jam" is better known than "and fuse".

We can have a discussion about how to name them in general. nounroll_and_jam seems a strange mix of write words together and separate words by underscores.

I quite like the UnrollAndFuse naming. I'd not heard that the xlc compiler called it that. The UnrollAndJam pass was origin named that before I renamed for similar reasons (UnrollAndJam being more well known).

I see your point about the mix of underscores. "nounroll_and_jam" also comes from the Intel docs, and theres already "nounroll" vs "unroll". The "no" becomes a qualifier on "unroll_and_jam". "no_unroll_and_jam" feels less consistent to me.

But if you have a strong opinion, I'm happy enough to change it.

I see your point about the mix of underscores. "nounroll_and_jam" also comes from the Intel docs, and theres already "nounroll" vs "unroll". The "no" becomes a qualifier on "unroll_and_jam". "no_unroll_and_jam" feels less consistent to me.

nounroll_and_jam looks like it should be parsed as "(no unroll) and jam" (do not unroll, but fuse) instead of "no (unroll-and-jam)" because nounroll is one word and as you mentioned, already used as a keyword somewhere else. Other variants use the underscore to append an option, e.g. vectorize_width.

But if you have a strong opinion, I'm happy enough to change it.

I don't. Feel free to chose the name you think fits best. We might support multiple spellings if necessary.

If we want to add more transformations, it would be nice to have an explicit naming scheme. E.g. for "register tiling", "stream_unroll" (supported by xlc), "index set splitting", "statement reordering", "strip mine", "overlap tiling", "diamond tiling", "thread-parallelization", "task-parallelization", "loop unswitching", etc.

I see your point about the mix of underscores. "nounroll_and_jam" also comes from the Intel docs, and theres already "nounroll" vs "unroll". The "no" becomes a qualifier on "unroll_and_jam". "no_unroll_and_jam" feels less consistent to me.

nounroll_and_jam looks like it should be parsed as "(no unroll) and jam" (do not unroll, but fuse) instead of "no (unroll-and-jam)" because nounroll is one word and as you mentioned, already used as a keyword somewhere else. Other variants use the underscore to append an option, e.g. vectorize_width.

But if you have a strong opinion, I'm happy enough to change it.

I don't. Feel free to chose the name you think fits best. We might support multiple spellings if necessary.

I agree that we can support multiple spellings (especially for compatibility with other compilers). I have a preference for using the underscores as our primary spelling. I think that it's easier to read. nounroll_and_jam is fine for compatibility if we'd like. I prefer we have a different syntax that we can use consistently within the 'clang loop' pragmas. How about 'unroll_and_jam disable' or similar?

If we want to add more transformations, it would be nice to have an explicit naming scheme. E.g. for "register tiling", "stream_unroll" (supported by xlc), "index set splitting", "statement reordering", "strip mine", "overlap tiling", "diamond tiling", "thread-parallelization", "task-parallelization", "loop unswitching", etc.

I have a preference for using the underscores as our primary spelling. I think that it's easier to read.

I agree with it being easier to read.

I prefer we have a different syntax that we can use consistently within the 'clang loop' pragmas. How about 'unroll_and_jam disable' or similar?

The code I had for #pragma clang loop (now in D47320, although I may not have split all the relevant parts into there) was doing the same thing as the unroll code. So worked the same way, I think looking like "#pragma clang loop unroll_and_jam(disable)" vs enable. It sounds sensible to me to have these look the same way as unroll clang loop pragmas, for both the old syntax and the new from the RFC.

alexfh removed a reviewer: alexfh.Jun 7 2018, 7:12 AM
dmgreen updated this revision to Diff 158219.Jul 31 2018, 5:01 AM

Rebase.

Michael, you happy with this part?

The pragma clang loop part is off in D47320. Let me know if/when I should do something with that.

Meinersbur accepted this revision.Jul 31 2018, 10:13 AM

LGTM

lib/Sema/SemaStmtAttr.cpp
183

[nit] unrelated whitespace change?

test/CodeGenCXX/pragma-unroll-and-jam.cpp
5

[suggestion] CHECK-LABEL? (applies to the function names below as well)

This revision is now accepted and ready to land.Jul 31 2018, 10:13 AM
This revision was automatically updated to reflect the committed changes.