This is an archive of the discontinued LLVM Phabricator instance.

Update documentation for unroll pragmas on loops with runtime trip counts
ClosedPublic

Authored by meheff on Jun 30 2015, 4:55 PM.

Download Raw Diff

Details

Reviewers

broune
hfinkel

Summary

This change updates the documentation for the loop unrolling pragma behavior
change in http://reviews.llvm.org/D10854 Specifically, with that change
"#pragma unroll" will not unroll loops with a runtime trip count.

Diff Detail

Repository: rL LLVM

Event Timeline

meheff updated this revision to Diff 28830.Jun 30 2015, 4:55 PM

meheff retitled this revision from to Update documentation for unroll pragmas on loops with runtime trip counts.

meheff updated this object.

meheff edited the test plan for this revision. (Show Details)

meheff added reviewers: hfinkel, broune.

meheff set the repository for this revision to rL LLVM.

meheff added subscribers: Unknown Object (MLST), Unknown Object (MLST).

broune accepted this revision.Jun 30 2015, 8:50 PM

broune edited edge metadata.

broune added inline comments.

include/clang/Basic/AttrDocs.td
1330	Is #pragma unroll different from #pragma unroll(full) ? If not, why not make the two doc strings the same?

This revision is now accepted and ready to land.Jun 30 2015, 8:50 PM

meheff added inline comments.Jul 1 2015, 9:26 AM

include/clang/Basic/AttrDocs.td
1330	In AttrDocs.td there is not documentation for "#pragma clang loop unroll". Instead it just mentions "#pragma clang loop" and points to LanguageExtensions for details. Regarding LanguageExtensions documentation of "#pragma clang loop unroll" and AttrDocs documentation for "#pragma unroll", I believe they do both say the same thing (or at least aren't conflicting with each other) with the LanguageExtensions documentation providing more context and describing the limits on unrolling. Little bit of background here. There are two forms of the loop unroll pragmas: #pragma unroll #pragma unroll N and: #pragma clang loop unroll(full) #pragma clang loop unroll_count(N) The "#pragma unroll" forms are supported for compatibilty with other compilers (such as nvcc CUDA compiler) where this form is common while "#pragma clang .." is a more cannonical form specific to clang. "#pragma unroll" and "#pragma clang loop unroll(full)" are equivalent. What I propose in the related optimization patch (http://reviews.llvm.org/D10854) is that "#pragma unroll" should enable loop unrolling with a runtime trip count of the loop. So if you have: void foo(int n) { #pragma unroll for (int i=0; i<n; i++) { ... } } Then the optimizer will unroll the loop with some reasonable power-of-two factor. One potentially funny bit about this approach is that "#pragma clang loop unroll(full)" will also enable unrolling of a loop with a runtime trip count. This may be surprising... or maybe not. Full unrolling is clearly not possible in this case so just being more aggressive with unrolling may be the sensible thing to do.

hfinkel added inline comments.Jul 2 2015, 4:38 PM

include/clang/Basic/AttrDocs.td
1330	One potentially funny bit about this approach is that "#pragma clang loop unroll(full)" will also enable unrolling of a loop with a runtime trip count. This may be surprising... or maybe not. Full unrolling is clearly not possible in this case so just being more aggressive with unrolling may be the sensible thing to do. I really don't like this. If the user is requesting 'full' unrolling, I'd rather we do only that (at least up to some really high limit, as we do currently). Runtime unrolling adds extra expense in the low-trip-count case (because of the various checks involved) and has a different set of trade-offs than simpler unrolling. I think that we should have 'full' unrolling add "llvm.loop.unroll.runtime.disable" metadata. We should add some other mode, "aggressive" for example, which is like full unrolling but also allows for runtime unrolling. I think the desired end state here is that -Rpass=unroll would provide some specific feedback in the case where full unrolling was requested, but not possible because of a runtime trip count.

hfinkel mentioned this in D10854: Enable runtime unrolling with unroll pragma metadata.Jul 2 2015, 4:39 PM

meheff added inline comments.Jul 7 2015, 1:40 PM

include/clang/Basic/AttrDocs.td
1330	I think that makes sense. The issue we were seeing was a loop with runtime trip count behaving badly because it was marked '#pragma unroll', had a runtime trip count, and we'd like to enable runtime unrolling for the NVPTX target (http://reviews.llvm.org/D10855). IIRC, the loop sometimes is compile-time trip count depending upon template magic. Anyway, your suggestion (unroll(full) implies llvm.loop.unroll.runtime.disable) fixes that issue. One question is how do you feel about enabling runtime unrolling with '#pragma unroll N"? This gives more benefit to loops with higher trip counts, but I could be convinced either way.

hfinkel added inline comments.Jul 8 2015, 5:10 PM

include/clang/Basic/AttrDocs.td
1330	One question is how do you feel about enabling runtime unrolling with '#pragma unroll N"? This gives more benefit to loops with higher trip counts, but I could be convinced either way. I think it makes sense to enable it for '#pragma unroll N', since to user likely expects that behavior in the case of a runtime trip count.

meheff updated this revision to Diff 29423.Jul 9 2015, 8:05 PM

meheff updated this object.

meheff edited edge metadata.

meheff removed rL LLVM as the repository for this revision.

hfinkel added inline comments.Jul 9 2015, 8:12 PM

docs/LanguageExtensions.rst
1995	I know this is a pre-existing issue, but -pragma-unroll-threshold is an LLVM option, not a Clang option, and I don't think we should mention it directly here. If I did not know how LLVM worked, and what -mllvm did, I'd find this documentation confusing. Maybe just not mentioning this is best. What do you think?

Removing the llvm-specific flag sounds good to me. Updated in diff.

LGTM, thanks!

Committed in r242048.

Revision Contents

Path

Size

docs/

LanguageExtensions.rst

10 lines

include/

clang/

Basic/

AttrDocs.td

7 lines

Diff 28830

docs/LanguageExtensions.rst

	Show First 20 Lines • Show All 1,985 Lines • ▼ Show 20 Lines
	Unrolling a loop reduces the loop control overhead and exposes more			Unrolling a loop reduces the loop control overhead and exposes more
	opportunities for ILP. Loops can be fully or partially unrolled. Full unrolling			opportunities for ILP. Loops can be fully or partially unrolled. Full unrolling
	eliminates the loop and replaces it with an enumerated sequence of loop			eliminates the loop and replaces it with an enumerated sequence of loop
	iterations. Full unrolling is only possible if the loop trip count is known at			iterations. Full unrolling is only possible if the loop trip count is known at
	compile time. Partial unrolling replicates the loop body within the loop and			compile time. Partial unrolling replicates the loop body within the loop and
	reduces the trip count.			reduces the trip count.

	If ``unroll(full)`` is specified the unroller will attempt to fully unroll the			If ``unroll(full)`` is specified the unroller will attempt to fully unroll the
	loop if the trip count is known at compile time. If the loop count is not known			loop if the trip count is known at compile time. If the fully unrolled code size
	or the fully unrolled code size is greater than the limit specified by the			is greater than the limit specified by the `-pragma-unroll-threshold` command
				hfinkelUnsubmitted Not Done Reply Inline Actions I know this is a pre-existing issue, but -pragma-unroll-threshold is an LLVM option, not a Clang option, and I don't think we should mention it directly here. If I did not know how LLVM worked, and what -mllvm did, I'd find this documentation confusing. Maybe just not mentioning this is best. What do you think? hfinkel: I know this is a pre-existing issue, but -pragma-unroll-threshold is an LLVM option, not a…
	`-pragma-unroll-threshold` command line option the loop will be partially			line option the loop will be partially unrolled subject to the same limit. If
	unrolled subject to the same limit.			the loop trip count is unknown at compile time the unroller tries to unroll the
				loop by a power-of-two factor subject to an unrolled size limit specified by
				the `-unroll-threshold` command line option.

	.. code-block:: c++			.. code-block:: c++

	#pragma clang loop unroll(full)			#pragma clang loop unroll(full)
	for(...) {			for(...) {
	...			...
	}			}

	Show All 31 Lines

include/clang/Basic/AttrDocs.td

	Show First 20 Lines • Show All 1,317 Lines • ▼ Show 20 Lines
	def UnrollHintDocs : Documentation {			def UnrollHintDocs : Documentation {
	let Category = DocCatStmt;			let Category = DocCatStmt;
	let Heading = "#pragma unroll, #pragma nounroll";			let Heading = "#pragma unroll, #pragma nounroll";
	let Content = [{			let Content = [{
	Loop unrolling optimization hints can be specified with ``#pragma unroll`` and			Loop unrolling optimization hints can be specified with ``#pragma unroll`` and
	``#pragma nounroll``. The pragma is placed immediately before a for, while,			``#pragma nounroll``. The pragma is placed immediately before a for, while,
	do-while, or c++11 range-based for loop.			do-while, or c++11 range-based for loop.

	Specifying ``#pragma unroll`` without a parameter directs the loop unroller to			Specifying ``#pragma unroll`` without a parameter directs the loop
	attempt to fully unroll the loop if the trip count is known at compile time:			unroller to attempt to fully unroll the loop if the trip count is
				known at compile time. If the trip count is not known at compile time
				the loop unroller tries to unroll the loop by a power-of-two factor
				subject to a limit on unrolled code size.
				brouneUnsubmitted Not Done Reply Inline Actions Is #pragma unroll different from #pragma unroll(full) ? If not, why not make the two doc strings the same? broune: Is #pragma unroll different from #pragma unroll(full) ? If not, why not make the two doc…
				meheffAuthorUnsubmitted Not Done Reply Inline Actions In AttrDocs.td there is not documentation for "#pragma clang loop unroll". Instead it just mentions "#pragma clang loop" and points to LanguageExtensions for details. Regarding LanguageExtensions documentation of "#pragma clang loop unroll" and AttrDocs documentation for "#pragma unroll", I believe they do both say the same thing (or at least aren't conflicting with each other) with the LanguageExtensions documentation providing more context and describing the limits on unrolling. Little bit of background here. There are two forms of the loop unroll pragmas: #pragma unroll #pragma unroll N and: #pragma clang loop unroll(full) #pragma clang loop unroll_count(N) The "#pragma unroll" forms are supported for compatibilty with other compilers (such as nvcc CUDA compiler) where this form is common while "#pragma clang .." is a more cannonical form specific to clang. "#pragma unroll" and "#pragma clang loop unroll(full)" are equivalent. What I propose in the related optimization patch (http://reviews.llvm.org/D10854) is that "#pragma unroll" should enable loop unrolling with a runtime trip count of the loop. So if you have: void foo(int n) { #pragma unroll for (int i=0; i<n; i++) { ... } } Then the optimizer will unroll the loop with some reasonable power-of-two factor. One potentially funny bit about this approach is that "#pragma clang loop unroll(full)" will also enable unrolling of a loop with a runtime trip count. This may be surprising... or maybe not. Full unrolling is clearly not possible in this case so just being more aggressive with unrolling may be the sensible thing to do. meheff: In AttrDocs.td there is not documentation for "#pragma clang loop unroll". Instead it just…
				hfinkelUnsubmitted Not Done Reply Inline Actions One potentially funny bit about this approach is that "#pragma clang loop unroll(full)" will also enable unrolling of a loop with a runtime trip count. This may be surprising... or maybe not. Full unrolling is clearly not possible in this case so just being more aggressive with unrolling may be the sensible thing to do. I really don't like this. If the user is requesting 'full' unrolling, I'd rather we do only that (at least up to some really high limit, as we do currently). Runtime unrolling adds extra expense in the low-trip-count case (because of the various checks involved) and has a different set of trade-offs than simpler unrolling. I think that we should have 'full' unrolling add "llvm.loop.unroll.runtime.disable" metadata. We should add some other mode, "aggressive" for example, which is like full unrolling but also allows for runtime unrolling. I think the desired end state here is that -Rpass=unroll would provide some specific feedback in the case where full unrolling was requested, but not possible because of a runtime trip count. hfinkel: > One potentially funny bit about this approach is that "#pragma clang loop unroll(full)" will…
				meheffAuthorUnsubmitted Not Done Reply Inline Actions I think that makes sense. The issue we were seeing was a loop with runtime trip count behaving badly because it was marked '#pragma unroll', had a runtime trip count, and we'd like to enable runtime unrolling for the NVPTX target (http://reviews.llvm.org/D10855). IIRC, the loop sometimes is compile-time trip count depending upon template magic. Anyway, your suggestion (unroll(full) implies llvm.loop.unroll.runtime.disable) fixes that issue. One question is how do you feel about enabling runtime unrolling with '#pragma unroll N"? This gives more benefit to loops with higher trip counts, but I could be convinced either way. meheff: I think that makes sense. The issue we were seeing was a loop with runtime trip count behaving…
				hfinkelUnsubmitted Not Done Reply Inline Actions One question is how do you feel about enabling runtime unrolling with '#pragma unroll N"? This gives more benefit to loops with higher trip counts, but I could be convinced either way. I think it makes sense to enable it for '#pragma unroll N', since to user likely expects that behavior in the case of a runtime trip count. hfinkel: > One question is how do you feel about enabling runtime unrolling with '#pragma unroll N"?

	.. code-block:: c++			.. code-block:: c++

	#pragma unroll			#pragma unroll
	for (...) {			for (...) {
	...			...
	}			}

	▲ Show 20 Lines • Show All 231 Lines • Show Last 20 Lines