This is an archive of the discontinued LLVM Phabricator instance.

Disable loop peeling during full unrolling pass.
ClosedPublic

Authored by tejohnson on Aug 2 2017, 10:12 PM.

Details

Summary

Peeling should not occur during the full unrolling invocation early
in the pipeline, but rather later with partial and runtime loop
unrolling. The later loop unrolling invocation will also eventually
utilize profile summary and branch frequency information, which
we would like to use to control peeling. And for ThinLTO we want
to delay peeling until the backend (post thin link) phase, just as
we do for most types of unrolling.

Ensure peeling doesn't occur during the full unrolling invocation
by adding a parameter to the shared implementation function, similar
to the way partial and runtime loop unrolling are disabled.

Performance results for ThinLTO suggest this has a neutral to positive
effect on some internal benchmarks.

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson created this revision.Aug 2 2017, 10:12 PM
davidxl edited edge metadata.Aug 2 2017, 10:33 PM

Perhaps it is cleaner to

  1. pass the information whether it is a full unroll to tryToUnroll
  2. add a internal option FullUnrollAllowPeeling and make it off by default? This will be similar to UnrollAllowPeeling flag, but takes precedence if it is full unroll.
lib/Transforms/Scalar/LoopUnrollPass.cpp
204 ↗(On Diff #109484)

Move this closer to line 188 above?

chandlerc edited edge metadata.Aug 2 2017, 10:58 PM

Perhaps it is cleaner to

  1. pass the information whether it is a full unroll to tryToUnroll
  2. add a internal option FullUnrollAllowPeeling and make it off by default? This will be similar to UnrollAllowPeeling flag, but takes precedence if it is full unroll.

While I'm somewhat fond of refactoring these interfaces to be less of a mess, would it be better in a follow-up patch? I would also hope it can tackle more than just peeling but encompass several of the numerous parameters.

lib/Transforms/Scalar/LoopUnrollPass.cpp
204 ↗(On Diff #109484)

I think this location is more consistent with the surrounding code... I'm reluctant to deviate here from what seems like a very consistent pattern.

1289 ↗(On Diff #109484)

Shouldn't this pass AllowPeeling since you added that variable above?

davidxl added inline comments.Aug 2 2017, 11:03 PM
lib/Transforms/Scalar/LoopUnrollPass.cpp
204 ↗(On Diff #109484)

Right. They all should be interleaved so that settings to the same flag should be side by side, but that is not relevant here.

tejohnson marked an inline comment as done.Aug 3 2017, 6:06 AM

Perhaps it is cleaner to

  1. pass the information whether it is a full unroll to tryToUnroll
  2. add a internal option FullUnrollAllowPeeling and make it off by default? This will be similar to UnrollAllowPeeling flag, but takes precedence if it is full unroll.

While I'm somewhat fond of refactoring these interfaces to be less of a mess, would it be better in a follow-up patch? I would also hope it can tackle more than just peeling but encompass several of the numerous parameters.

Right, I wanted to be consistent with what is there for other unrolling features and how we disable them during full unrolling.

lib/Transforms/Scalar/LoopUnrollPass.cpp
204 ↗(On Diff #109484)

Right, I put it here to be consistent.

1289 ↗(On Diff #109484)

Woops, added that variable then realized I could just pass None directly and forgot to remove it. I'll remove the variable.

tejohnson updated this revision to Diff 109535.Aug 3 2017, 6:28 AM
tejohnson marked an inline comment as done.

Remove extraneous variable

davidxl accepted this revision.Aug 3 2017, 10:49 AM

lgtm

This revision is now accepted and ready to land.Aug 3 2017, 10:49 AM
This revision was automatically updated to reflect the committed changes.
reames added a subscriber: reames.Sep 21 2017, 10:18 PM

FYI, this patch causes a somewhat serious regression on one of our internal benchmarks. Given we have a custom pass pipeline, it's definitely our responsibility to adapt to upstream changes not the other way around, but it would have helped us recognize the problem if either of two things had happened along with this patch:

  1. The patch had been separated into two pieces: adding the new control know without changing the default and then a small patch changing the default.
  2. The change had been announced in some way. An email to llvm-dev probably would have been sufficient.

Given we're hardly the only ones with custom pass pipelines, I'd encourage everyone involved (author, reviewers, etc..) to think about the upgrade path in similar cases which arise in the future.

FYI, this patch causes a somewhat serious regression on one of our internal benchmarks. Given we have a custom pass pipeline, it's definitely our responsibility to adapt to upstream changes not the other way around, but it would have helped us recognize the problem if either of two things had happened along with this patch:

  1. The patch had been separated into two pieces: adding the new control know without changing the default and then a small patch changing the default.
  2. The change had been announced in some way. An email to llvm-dev probably would have been sufficient.

Given we're hardly the only ones with custom pass pipelines, I'd encourage everyone involved (author, reviewers, etc..) to think about the upgrade path in similar cases which arise in the future.

That's clearly interesting. This change, however, seems like it should be more-or-less a nop. If you're going to fully unroll, why would peeling help? Do you end up with a shorter critical path through the branches if we "peel" first?

That's clearly interesting. This change, however, seems like it should be more-or-less a nop. If you're going to fully unroll, why would peeling help? Do you end up with a shorter critical path through the branches if we "peel" first?

In our pass pipeline, we'd end up with loop-unrolling being quite a bit earlier than upstream. This wasn't so much an intentional choice as an accident of history that we really hadn't evaluated up until now. My best guess is that the peeling was accidentally exposing information to other transformations and that the actual unrolling was incidental to the performance swing. Someone on my team is investigating that a bit further to see what actually happened; what I just said was just an educated guess.