This is an archive of the discontinued LLVM Phabricator instance.

Update documentation for unroll pragmas on loops with runtime trip counts
ClosedPublic

Authored by meheff on Jun 30 2015, 4:55 PM.

Details

Reviewers
broune
hfinkel
Summary

This change updates the documentation for the loop unrolling pragma behavior
change in http://reviews.llvm.org/D10854 Specifically, with that change
"#pragma unroll" will not unroll loops with a runtime trip count.

Diff Detail

Repository
rL LLVM

Event Timeline

meheff updated this revision to Diff 28830.Jun 30 2015, 4:55 PM
meheff retitled this revision from to Update documentation for unroll pragmas on loops with runtime trip counts.
meheff updated this object.
meheff edited the test plan for this revision. (Show Details)
meheff added reviewers: hfinkel, broune.
meheff set the repository for this revision to rL LLVM.
meheff added subscribers: Unknown Object (MLST), Unknown Object (MLST).
broune accepted this revision.Jun 30 2015, 8:50 PM
broune edited edge metadata.
broune added inline comments.
include/clang/Basic/AttrDocs.td
1330

Is #pragma unroll different from #pragma unroll(full) ? If not, why not make the two doc strings the same?

This revision is now accepted and ready to land.Jun 30 2015, 8:50 PM
meheff added inline comments.Jul 1 2015, 9:26 AM
include/clang/Basic/AttrDocs.td
1330

In AttrDocs.td there is not documentation for "#pragma clang loop unroll". Instead it just mentions "#pragma clang loop" and points to LanguageExtensions for details. Regarding LanguageExtensions documentation of "#pragma clang loop unroll" and AttrDocs documentation for "#pragma unroll", I believe they do both say the same thing (or at least aren't conflicting with each other) with the LanguageExtensions documentation providing more context and describing the limits on unrolling.

Little bit of background here. There are two forms of the loop unroll pragmas:

#pragma unroll
#pragma unroll N

and:

#pragma clang loop unroll(full)
#pragma clang loop unroll_count(N)

The "#pragma unroll" forms are supported for compatibilty with other compilers (such as nvcc CUDA compiler) where this form is common while "#pragma clang .." is a more cannonical form specific to clang. "#pragma unroll" and "#pragma clang loop unroll(full)" are equivalent. What I propose in the related optimization patch (http://reviews.llvm.org/D10854) is that "#pragma unroll" should enable loop unrolling with a runtime trip count of the loop. So if you have:

void foo(int n) {
#pragma unroll

for (int i=0; i<n; i++) { ... }

}

Then the optimizer will unroll the loop with some reasonable power-of-two factor. One potentially funny bit about this approach is that "#pragma clang loop unroll(full)" will also enable unrolling of a loop with a runtime trip count. This may be surprising... or maybe not. Full unrolling is clearly not possible in this case so just being more aggressive with unrolling may be the sensible thing to do.

hfinkel added inline comments.Jul 2 2015, 4:38 PM
include/clang/Basic/AttrDocs.td
1330

One potentially funny bit about this approach is that "#pragma clang loop unroll(full)" will also enable unrolling of a loop with a runtime trip count. This may be surprising... or maybe not. Full unrolling is clearly not possible in this case so just being more aggressive with unrolling may be the sensible thing to do.

I really don't like this. If the user is requesting 'full' unrolling, I'd rather we do only that (at least up to some really high limit, as we do currently). Runtime unrolling adds extra expense in the low-trip-count case (because of the various checks involved) and has a different set of trade-offs than simpler unrolling.

I think that we should have 'full' unrolling add "llvm.loop.unroll.runtime.disable" metadata. We should add some other mode, "aggressive" for example, which is like full unrolling but also allows for runtime unrolling. I think the desired end state here is that -Rpass=unroll would provide some specific feedback in the case where full unrolling was requested, but not possible because of a runtime trip count.

meheff added inline comments.Jul 7 2015, 1:40 PM
include/clang/Basic/AttrDocs.td
1330

I think that makes sense. The issue we were seeing was a loop with runtime trip count behaving badly because it was marked '#pragma unroll', had a runtime trip count, and we'd like to enable runtime unrolling for the NVPTX target (http://reviews.llvm.org/D10855). IIRC, the loop sometimes is compile-time trip count depending upon template magic. Anyway, your suggestion (unroll(full) implies llvm.loop.unroll.runtime.disable) fixes that issue. One question is how do you feel about enabling runtime unrolling with '#pragma unroll N"? This gives more benefit to loops with higher trip counts, but I could be convinced either way.

hfinkel added inline comments.Jul 8 2015, 5:10 PM
include/clang/Basic/AttrDocs.td
1330

One question is how do you feel about enabling runtime unrolling with '#pragma unroll N"? This gives more benefit to loops with higher trip counts, but I could be convinced either way.

I think it makes sense to enable it for '#pragma unroll N', since to user likely expects that behavior in the case of a runtime trip count.

meheff updated this revision to Diff 29423.Jul 9 2015, 8:05 PM
meheff updated this object.
meheff edited edge metadata.
meheff removed rL LLVM as the repository for this revision.
hfinkel added inline comments.Jul 9 2015, 8:12 PM
docs/LanguageExtensions.rst
1995

I know this is a pre-existing issue, but -pragma-unroll-threshold is an LLVM option, not a Clang option, and I don't think we should mention it directly here. If I did not know how LLVM worked, and what -mllvm did, I'd find this documentation confusing.

Maybe just not mentioning this is best. What do you think?

meheff updated this revision to Diff 29425.Jul 9 2015, 8:23 PM

Removing the llvm-specific flag sounds good to me. Updated in diff.

hfinkel accepted this revision.Jul 9 2015, 9:01 PM
hfinkel edited edge metadata.

LGTM, thanks!

Eugene.Zelenko added a subscriber: Eugene.Zelenko.

Committed in r242048.