This is an archive of the discontinued LLVM Phabricator instance.

[UnJ] Improve explicit loop count checks
ClosedPublic

Authored by dmgreen on Jul 31 2018, 8:54 AM.

Details

Summary

Try to improve the computed counts when it has been explicitly set by a pragma or command line option. This moves the code around, so that first call to computeUnrollCount to get a sensible count and override that if explicit unroll and jam counts are specified.

Also added some extra debug messages for when unroll and jamming is disabled.

Diff Detail

Repository
rL LLVM

Event Timeline

dmgreen created this revision.Jul 31 2018, 8:54 AM

I find the summary confusing. Can we say that simple unrolling will be prioritized over llvm.loop.unroll_and_jam.count metadata and -unroll-and-jam-count= cmdline options?

The test case does not seem to test this. The comment mentions it expects unroll-and-jam to happen, instead of prioritizing llvm.loop.unroll.enable.

lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
160 ↗(On Diff #158291)

OuterTripCount is passed by-reference to computeUnrollCount but passed by-value to computeUnrollAndJamCount. It is used here and by the caller of computeUnrollAndJamCount (but which will use the non-updated value because of passing by-value). Is this intentional?

198 ↗(On Diff #158291)

ExplicitUnroll seem to have two meanings: Once to determine whether simple unroll is enabled, second to determine whether Unroll-And-Jam is enabled. Could we rename this to ExplicitUnrollAndJam?

213–216 ↗(On Diff #158291)

What is supposed to happen with llvm.loop.unroll_and_jam.enable set, but not llvm.loop.unroll_and_jam.count? My expectation would be that the count is determined automatically, including considering UP.UnrollAndJamInnerLoopThreshold which would be skipped here.

dmgreen updated this revision to Diff 158346.Jul 31 2018, 11:45 AM
dmgreen marked an inline comment as done.
dmgreen edited the summary of this revision. (Show Details)

Thanks for taking a look. The problem I was trying to solve here was the runtime thresholds preventing UnJ even if an explicit count was set.

There are other pragma tests in pragma.ll, which check combinations of unroll and unroll_and_jam pragmas. The current behaviour if there is both unroll metadata and unroll_and_jam metadata isn't currently very refined. I would expect, at least in the default pipeline, for the unroll metadata to be handled first in one of the early unroll passes.

Let me know if you think I should change how it works here.

lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
160 ↗(On Diff #158291)

I believe computeUnrollCount will only update TripCount if it is fully unrolling, and only update it to either TripCount (itself) or MaxTripCount. The second shouldn't be used as we are setting it to 0 here, so OuterTripCount shouldn't be updated.

I've removed the OuterTripCount use below in this function, I think the increased threshold should apply for partial loops too.

213–216 ↗(On Diff #158291)

Yep, Good point. The pragma threshold is very high, but should still be checked. I've tried to split ExplicitCount's vs other Explicits.

Meinersbur accepted this revision.Aug 9 2018, 2:27 PM

There are other pragma tests in pragma.ll, which check combinations of unroll and unroll_and_jam pragmas. The current behaviour if there is both unroll metadata and unroll_and_jam metadata isn't currently very refined. I would expect, at least in the default pipeline, for the unroll metadata to be handled first in one of the early unroll passes.

This is what I am trying to tackle in D49281: Ideally one must not define multiple transformations at all, but if the frontend wants multiple transformations, it has to explicitly define an order.

Let me know if you think I should change how it works here.

Except the CHECK-NOT: LGTM

test/Transforms/LoopUnrollAndJam/pragma-explicit.ll
9 ↗(On Diff #158346)

-NOT tests are quite fragile. To check that there is no 4th unroll, did you consider checking e.g. the ; preds = of the BB that follows the last unroll copy?

This revision is now accepted and ready to land.Aug 9 2018, 2:27 PM

Thanks

test/Transforms/LoopUnrollAndJam/pragma-explicit.ll
9 ↗(On Diff #158346)

The unrolled BB's get merged, usually leaving just the first block, so unfortunately I don't think the preds will show much. I've changed things to check the outer phi nodes, where it can check that the return phi comes from %.pre60.3, in this case.

This revision was automatically updated to reflect the committed changes.
Meinersbur added inline comments.Aug 14 2018, 3:56 PM
test/Transforms/LoopUnrollAndJam/pragma-explicit.ll
9 ↗(On Diff #158346)

Correct, as I had to find myself in D50075. Sorry for the incorrect suggestion.

I ended up checking for the number of stores (with only one in the source IR) follows-by a CHECK-NOT: store. This does not depend on any instruction naming. To avoid occurrences in the remainder to be matched, add a CHECK: br i1 to match the end of the BasicBlock before the CHECK-NOT. For instance here: https://reviews.llvm.org/D49281#change-CH2qXAAQjJUs