This change updates the documentation for the loop unrolling pragma behavior
change in http://reviews.llvm.org/D10854 Specifically, with that change
"#pragma unroll" will not unroll loops with a runtime trip count.
Details
Diff Detail
Event Timeline
include/clang/Basic/AttrDocs.td | ||
---|---|---|
1330 ↗ | (On Diff #28830) | Is #pragma unroll different from #pragma unroll(full) ? If not, why not make the two doc strings the same? |
include/clang/Basic/AttrDocs.td | ||
---|---|---|
1330 ↗ | (On Diff #28830) | In AttrDocs.td there is not documentation for "#pragma clang loop unroll". Instead it just mentions "#pragma clang loop" and points to LanguageExtensions for details. Regarding LanguageExtensions documentation of "#pragma clang loop unroll" and AttrDocs documentation for "#pragma unroll", I believe they do both say the same thing (or at least aren't conflicting with each other) with the LanguageExtensions documentation providing more context and describing the limits on unrolling. Little bit of background here. There are two forms of the loop unroll pragmas: #pragma unroll and: #pragma clang loop unroll(full) The "#pragma unroll" forms are supported for compatibilty with other compilers (such as nvcc CUDA compiler) where this form is common while "#pragma clang .." is a more cannonical form specific to clang. "#pragma unroll" and "#pragma clang loop unroll(full)" are equivalent. What I propose in the related optimization patch (http://reviews.llvm.org/D10854) is that "#pragma unroll" should enable loop unrolling with a runtime trip count of the loop. So if you have: void foo(int n) { for (int i=0; i<n; i++) { ... } } Then the optimizer will unroll the loop with some reasonable power-of-two factor. One potentially funny bit about this approach is that "#pragma clang loop unroll(full)" will also enable unrolling of a loop with a runtime trip count. This may be surprising... or maybe not. Full unrolling is clearly not possible in this case so just being more aggressive with unrolling may be the sensible thing to do. |
include/clang/Basic/AttrDocs.td | ||
---|---|---|
1330 ↗ | (On Diff #28830) |
I really don't like this. If the user is requesting 'full' unrolling, I'd rather we do only that (at least up to some really high limit, as we do currently). Runtime unrolling adds extra expense in the low-trip-count case (because of the various checks involved) and has a different set of trade-offs than simpler unrolling. I think that we should have 'full' unrolling add "llvm.loop.unroll.runtime.disable" metadata. We should add some other mode, "aggressive" for example, which is like full unrolling but also allows for runtime unrolling. I think the desired end state here is that -Rpass=unroll would provide some specific feedback in the case where full unrolling was requested, but not possible because of a runtime trip count. |
include/clang/Basic/AttrDocs.td | ||
---|---|---|
1330 ↗ | (On Diff #28830) | I think that makes sense. The issue we were seeing was a loop with runtime trip count behaving badly because it was marked '#pragma unroll', had a runtime trip count, and we'd like to enable runtime unrolling for the NVPTX target (http://reviews.llvm.org/D10855). IIRC, the loop sometimes is compile-time trip count depending upon template magic. Anyway, your suggestion (unroll(full) implies llvm.loop.unroll.runtime.disable) fixes that issue. One question is how do you feel about enabling runtime unrolling with '#pragma unroll N"? This gives more benefit to loops with higher trip counts, but I could be convinced either way. |
include/clang/Basic/AttrDocs.td | ||
---|---|---|
1330 ↗ | (On Diff #28830) |
I think it makes sense to enable it for '#pragma unroll N', since to user likely expects that behavior in the case of a runtime trip count. |
docs/LanguageExtensions.rst | ||
---|---|---|
1995 | I know this is a pre-existing issue, but -pragma-unroll-threshold is an LLVM option, not a Clang option, and I don't think we should mention it directly here. If I did not know how LLVM worked, and what -mllvm did, I'd find this documentation confusing. Maybe just not mentioning this is best. What do you think? |
I know this is a pre-existing issue, but -pragma-unroll-threshold is an LLVM option, not a Clang option, and I don't think we should mention it directly here. If I did not know how LLVM worked, and what -mllvm did, I'd find this documentation confusing.
Maybe just not mentioning this is best. What do you think?