This is an archive of the discontinued LLVM Phabricator instance.

Use loop unrolling pragma metadata in the loop unroller (take 2)
ClosedPublic

Authored by meheff on Jun 14 2014, 3:39 PM.

Details

Summary

This is the second attempt at adding support for using the unrolling pragma metadata in the loop unroller. The previous patch (r210721) was reverted as it was a suspect in test failures (root cause was determined to be a different patch).

Again, here are the supported pragmas and their meaning (they are passed through the IR as metadata):

#pragma clang loop unroll(enable) unroll the loop completely
#pragma clang loop unroll(disable)
do not unroll the loop.
#pragma clang loop unroll_count(N) // unroll the loop N time

If the unroller is unable to unroll the loop as directed by the pragma then the unroller will still generally be more aggressive than the default limits.

This change includes more refactoring than the original patch. After a second look, I felt this was necessary as the original logic was a bit convoluted and layering on the pragma handling just made it worse. Hopefully this change makes it easier to understand.

Hal, this addresses your suggestions of making the pragma unroll limit a cl opt and a size threshold (rather than an unroll count), also it emits optimization remarks if the loop cannot be unrolled as directed by the pragma.

Diff Detail

Event Timeline

meheff updated this revision to Diff 10421.Jun 14 2014, 3:39 PM
meheff retitled this revision from to Use loop unrolling pragma metadata in the loop unroller (take 2).
meheff updated this object.
meheff edited the test plan for this revision. (Show Details)
meheff added reviewers: eliben, hfinkel, reames.
meheff added a subscriber: Unknown Object (MLST).
hfinkel added inline comments.Jun 16 2014, 2:15 AM
lib/Transforms/Scalar/LoopUnrollPass.cpp
54

4096 is much too small... we're worried about catching cases that might cause us to segfault, right? Make this at least an order of magnitude larger. You should experiment with this, take some simple loop and set the limit so that the memory size increase is limited to 200 MB or something like that.

meheff updated this revision to Diff 10453.Jun 16 2014, 11:59 AM
meheff added inline comments.
lib/Transforms/Scalar/LoopUnrollPass.cpp
54

I built a release build and ran it over a simple loop with pragma unroll(enable) with various values. Looks like 32K results in a couple hundred more MB being consumed so changed to that value.

I should also mention that I encountered some long compilation times which are superlinear with the unroll count when experimenting with the pragma loop limit. With the current limit (32K) on a simple loop the compilation time is ~7s. Doubling the limit results in a compilation time of ~50s. It seems to be beneath llvm::UnrollLoop -> FoldBlockIntoPredecessor -> llvm::ScalarEvolution::forgetLoop. This may suggest that we don't want to move much higher than 32K.

Mark

hfinkel edited edge metadata.Jun 16 2014, 3:15 PM
  • Original Message -----

From: "Mark Heffernan" <meheff@google.com>
To: meheff@google.com, eliben@google.com, hfinkel@anl.gov, public@my.anl.gov, full@mailrelay.anl.gov, "name com"
<name.com@mailrelay.anl.gov>
Cc: llvm-commits@cs.uiuc.edu
Sent: Monday, June 16, 2014 2:00:36 PM
Subject: Re: [PATCH] Use loop unrolling pragma metadata in the loop unroller (take 2)

Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:54
@@ +53,3 @@
+static cl::opt<unsigned>
+PragmaUnrollThreshold("pragma-unroll-threshold", cl::init(4096),
cl::Hidden,
+ cl::desc("Unrolled size limit for loops with an unroll(enable) or

"

hfinkel@anl.gov wrote:

4096 is much too small... we're worried about catching cases that
might cause us to segfault, right? Make this at least an order of
magnitude larger. You should experiment with this, take some
simple loop and set the limit so that the memory size increase is
limited to 200 MB or something like that.

I built a release build and ran it over a simple loop with pragma
unroll(enable) with various values. Looks like 32K results in a
couple hundred more MB being consumed so changed to that value.

Okay; I'm satisfied with this. Thanks! To be on the safe side, maybe cut it in half again (because because not all instructions contribute to the cost, so there is some variance here).

-Hal

http://reviews.llvm.org/D4147

PR 20058 filed for the long compilation time with high unroll count issue.

Also, will chop pragma threshold in half to 16K as suggested.

meheff updated this revision to Diff 10467.Jun 16 2014, 4:09 PM
meheff edited edge metadata.
eliben accepted this revision.Jun 17 2014, 12:04 PM
eliben edited edge metadata.
This revision is now accepted and ready to land.Jun 17 2014, 12:04 PM
meheff closed this revision.Jun 17 2014, 12:04 PM