This is the second attempt at adding support for using the unrolling pragma metadata in the loop unroller. The previous patch (r210721) was reverted because it was suspected of causing test failures (the root cause was later determined to be a different patch).
Again, here are the supported pragmas and their meaning (they are passed through the IR as metadata):
#pragma clang loop unroll(enable)    // unroll the loop completely
#pragma clang loop unroll(disable)   // do not unroll the loop
#pragma clang loop unroll_count(N)   // unroll the loop N times
If the unroller is unable to unroll the loop as directed by the pragma, it will still generally be more aggressive than the default limits allow.
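For illustration, here is a minimal sketch of how the three pragmas attach to loops in source. The function names are made up for this example and are not part of the patch. The pragmas only influence code generation, so each function returns the same result whether or not the unroller honors them.

```cpp
#include <cassert>
#include <cstddef>

// Small constant trip count: a natural candidate for complete unrolling.
int sum8(const int *a) {
  int s = 0;
#pragma clang loop unroll(enable)
  for (int i = 0; i < 8; ++i)
    s += a[i];
  return s;
}

// Ask the unroller to leave this loop rolled.
int sum_rolled(const int *a, std::size_t n) {
  assert(a != nullptr || n == 0);
  int s = 0;
#pragma clang loop unroll(disable)
  for (std::size_t i = 0; i < n; ++i)
    s += a[i];
  return s;
}

// Ask for an unroll factor of 4; the trip count is only known at run time.
int sum_by4(const int *a, std::size_t n) {
  assert(a != nullptr || n == 0);
  int s = 0;
#pragma clang loop unroll_count(4)
  for (std::size_t i = 0; i < n; ++i)
    s += a[i];
  return s;
}
```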
This change includes more refactoring than the original patch. After a second look, I felt this was necessary as the original logic was a bit convoluted and layering on the pragma handling just made it worse. Hopefully this change makes it easier to understand.
Hal, this addresses your suggestions to make the pragma unroll limit a cl opt and a size threshold (rather than an unroll count). It also emits optimization remarks if the loop cannot be unrolled as directed by the pragma.
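To make the difference between a size threshold and a count limit concrete, here is a rough sketch of one way a size threshold could cap a pragma-requested count. This is not the code in the patch: `clampUnrollCount` and its parameters are hypothetical names, and the patch's actual policy may differ.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (an assumption for illustration, not from the patch).
// LoopSize is the estimated size of one copy of the loop body, and
// SizeThreshold is the maximum size allowed for the unrolled loop.
// Returns the count to use; 1 means "do not unroll".
unsigned clampUnrollCount(unsigned RequestedCount, unsigned LoopSize,
                          unsigned SizeThreshold) {
  assert(LoopSize > 0 && "loop body must have a nonzero size estimate");
  // Largest count whose unrolled body still fits under the threshold.
  unsigned MaxCount = SizeThreshold / LoopSize;
  return std::max(1u, std::min(RequestedCount, MaxCount));
}
```

Under this sketch, a threshold of 4096 and a body size of 100 would reduce a requested count of 64 to 40, while a requested count of 8 would pass through unchanged. A case where the count is reduced like this is where an optimization remark would be useful.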
4096 is much too small... we're worried about catching cases that might cause us to segfault, right? Make this at least an order of magnitude larger. You should experiment with this: take some simple loop and set the limit so that the memory size increase is limited to 200 MB or something like that.