This is an archive of the discontinued LLVM Phabricator instance.

Verify profile data confirms large loop trip counts.
ClosedPublic

Authored by mtrofin on Feb 5 2018, 9:00 PM.

Details

Summary

Loops with inequality comparers, such as:

// unsigned bound
for (unsigned i = 1; i < bound; ++i) {...}

have getSmallConstantMaxTripCount report a large maximum static
trip count - in this case, 0xffff fffe. However, profiling info
may show that the trip count is much smaller, and thus
counter-recommend vectorization.

This change:

  • flips loop-vectorize-with-block-frequency on by default.
  • validates profiled loop frequency data supports vectorization, when static info appears to not counter-recommend it. Absence of profile data means we rely on static data, just as we've done so far.

Diff Detail

Repository
rL LLVM

Event Timeline

mtrofin created this revision.Feb 5 2018, 9:00 PM
davidxl added inline comments.Feb 5 2018, 10:09 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
8353 ↗(On Diff #132934)

What is the ExpectedTC returned in this case? Why does it not return CouldNotCompute (or 0)?

mtrofin added inline comments.Feb 5 2018, 10:59 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
8353 ↗(On Diff #132934)

For example, for:

for (unsigned i = 1; i < something; ++i) {
  ...
}

it returns 0xffff fffe - assuming unsigned something. That's correct - that's the maximum times the loop body would be executed, because something might at most be 0xffff ffff. If the step were 2 instead of 1, the ExpectedTC would reflect that accordingly (meaning, the trip count would be half).
In contrast, CouldNotCompute is produced in cases like unsupported loops, for instance those with multiple exits.

When it comes to '0', the API behaves a bit unexpectedly, I'd argue: in cases like the ones in the current set of unit tests, it ends up adding 1 to the max taken back edges (which is 0xffff ffff there), which through overflow gets us to 0. That was the reason those tests were passing.

I'd love to hear others' thoughts are on this overflow behavior, and whether there's any reason we shouldn't refactor it, because relying on the overflow to represent "not having an expected trip count" falls through in cases like I just presented (e.g. non-unitary step, starting from something else than 0, etc)

This issue of determining "no expected trip counts" aside, I'd argue the issue addressed here can be addressed separately.

davidxl added inline comments.Feb 5 2018, 11:13 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
8353 ↗(On Diff #132934)

I think the logic here should implement the order of precedence in this way:

  1. If there is known constant trip count (such as for( ...; i < 1000; i++), use that as the expectedTC
  2. otherwise if there is profile based estimated trip count, use it as the expectedTC
  3. use the smallConstantMaxTripCount as an estimation.
mtrofin updated this revision to Diff 132946.Feb 6 2018, 12:28 AM

Incorporating davidxl's feedback.

mtrofin added inline comments.Feb 6 2018, 12:29 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
8353 ↗(On Diff #132934)

That would make it more clear, indeed.

bkramer added a subscriber: bkramer.Feb 6 2018, 5:00 AM
bkramer added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
8356 ↗(On Diff #132946)

I'd prefer something like ConstExits->getValue()->getActiveBits() <= 32. getZExtValue is not safe for values larger than 64 bits, which are rare but can happen in theory.

mtrofin updated this revision to Diff 133029.Feb 6 2018, 9:29 AM

Incorporated bkramer's feedback.

bkramer added inline comments.Feb 6 2018, 9:41 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
8359 ↗(On Diff #133029)

This is now redundant ;)

mtrofin added inline comments.Feb 6 2018, 9:47 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
8359 ↗(On Diff #133029)

Why? getZExtValue() could be 0xffff ffff (getActiveBits() == 32).

That would overflow on line 8358.

bkramer added inline comments.Feb 6 2018, 10:10 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
8359 ↗(On Diff #133029)

I see. Off by one. I should've suggested ExitsCount.ult(std::numeric_limits<unsigned>::max()) in that case, which should do the same as your current condition.

davidxl accepted this revision.Feb 6 2018, 10:38 AM

LGTM + incorporating Ben's suggestion to use 'ult' if it can simplify the condition.

This revision is now accepted and ready to land.Feb 6 2018, 10:38 AM
mtrofin updated this revision to Diff 133060.Feb 6 2018, 12:19 PM

Incorporated bkramer's feedback (2)

mtrofin added inline comments.Feb 6 2018, 12:21 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
8359 ↗(On Diff #133029)

Ah - thanks, that looks better, indeed!

This revision was automatically updated to reflect the committed changes.