This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Canonicalize X - urem X, Y patterns
ClosedPublic

Authored by reames on Nov 16 2021, 11:30 AM.

Details

Summary

There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version.

The motivation here is that runtime unroll produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert the the andn form which SCEV recognizes, so essentially, this is just fixing a nasty pass ordering dependency.

Why this appears to minorly negatively impact hardware loop recognition on ARM, I have no idea. I definitely don't consider that a blocker. I can't even tell from the test if this is actually a regression - the test is too poorly structured to be informative.

Diff Detail

Event Timeline

reames created this revision.Nov 16 2021, 11:30 AM
reames requested review of this revision.Nov 16 2021, 11:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 16 2021, 11:30 AM
lebedev.ri added inline comments.Nov 16 2021, 11:34 AM
llvm/lib/Analysis/ScalarEvolution.cpp
2604

What about the more general case of (-1 * urem X, Y) + X + Z --> ((-1 * urem X, Y) + X) + Z --> (Y * X/Y) + Z ?

reames added inline comments.Nov 16 2021, 11:35 AM
llvm/lib/Analysis/ScalarEvolution.cpp
2604

The pattern matching for this gets hairy. I think it's worthwhile to do (so that we can handle backedge taken count expressions in runtime unroll), but I'd strongly prefer to generalize in a separate commit with it's own review.

lebedev.ri accepted this revision.Nov 16 2021, 11:41 AM

This LG. Not sure if second opinion is needed.

llvm/lib/Analysis/ScalarEvolution.cpp
2604

Yep, just highlighting that this is only the simplest case possible.

This revision is now accepted and ready to land.Nov 16 2021, 11:41 AM

@dmgreen might want to look at the ARM test changes. The code change itself looks good to me though.

Why this appears to minorly negatively impact hardware loop recognition on ARM, I have no idea.

We're recognizing more loops, not less. The "le" instruction indicates we've recognized a loop; we generate a generic branch "bne" when we don't. It's more instructions outside the loop because hardware loop optimization requires materializing the precise trip count. The sequence we're using to compute the trip count is messy, though. I guess the issue is that we're running SCEVExpander later than we otherwise would?

Adding a couple reviewers more familiar with MVE in case they have any further comment on this, but the ARM changes should be fine as-is.

This revision was landed with ongoing or failed builds.Nov 16 2021, 11:59 AM
This revision was automatically updated to reflect the committed changes.

Yeah, these look good now. Thanks!