Page MenuHomePhabricator

[LV] Generate RT checks up-front and remove them if required.
ClosedPublic

Authored by fhahn on Mar 11 2020, 4:03 AM.

Details

Summary

This patch updates LV to generate the runtime checks just after cost
modeling, to allow a more precise estimate of the actual cost of the
checks. This information will be used in future patches to generate
larger runtime checks in cases where the checks only make up a small
fraction of the expected scalar loop execution time.

The runtime checks are created up-front in a temporary block to allow better
estimating the cost and un-linked from the existing IR. After deciding to
vectorize, the checks are moved backed. If deciding not to vectorize, the
temporary block is completely removed.

This patch is similar in spirit to D71053, but explores a different
direction: instead of delaying the decision on whether to vectorize in
the presence of runtime checks it instead optimistically creates the
runtime checks early and discards them later if decided to not
vectorize. This has the advantage that the cost-modeling decisions
can be kept together and can be done up-front and thus preserving the
general code structure. I think delaying (part) of the decision to
vectorize would also make the VPlan migration a bit harder.

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
lebedev.ri added inline comments.Aug 7 2020, 8:21 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3348–3350
lebedev.ri requested changes to this revision.Aug 12 2020, 2:33 PM

(marking as reviewed)

This revision now requires changes to proceed.Aug 12 2020, 2:33 PM
fhahn added a comment.Oct 7 2020, 11:00 AM

Just in the process of rebasing this again now that most other parts landed and discovered I accidentally made our job harder by teaching SCEVExpander to preserve LCSSA. Needs a bit of more work now.

fhahn updated this revision to Diff 314335.Jan 4 2021, 2:11 AM

Rebased on top of current trunk. This version now can build MultiSource/SPEC2006/SPEC2000 with -O3 -flto without crashing.

Just in the process of rebasing this again now that most other parts landed and discovered I accidentally made our job harder by teaching SCEVExpander to preserve LCSSA. Needs a bit of more work now.

I recently fixed a related issue (0ea3749b3cde), which makes this point obsolete; the final CFG is constructed before expanding, which solves the issue.

rkruppe removed a subscriber: rkruppe.Jan 4 2021, 2:52 AM
fhahn updated this revision to Diff 315352.Jan 8 2021, 5:00 AM

Split off the addRuntimeCheck changes to D94295

fhahn added a comment.Jan 8 2021, 5:34 AM

Sent mail to llvm-dev (https://lists.llvm.org/pipermail/llvm-dev/2021-January/147678.html), to get thoughts on the 'speculatively create IR and throw away if not needed` pattern in general

fhahn updated this revision to Diff 319575.Jan 27 2021, 7:36 AM

Rebase & ping. The thread on llvm-dev did not surface any objections, so I think it might be time to finally restart the review on this one :)

fhahn retitled this revision from [LV] Generate RT checks up-front and remove them if required. (WIP) to [LV] Generate RT checks up-front and remove them if required..Jan 27 2021, 7:42 AM
lebedev.ri accepted this revision.Jan 27 2021, 8:31 AM

I haven't reviewed this in full detail, but this looks good to me overall.
It would probably be good for someone else to also signoff, too.

This revision is now accepted and ready to land.Jan 27 2021, 8:31 AM

I'm going to take a look next week if nobody outstrip me...

fhahn updated this revision to Diff 322101.Feb 8 2021, 7:16 AM

Rebased

I'm going to take a look next week if nobody outstrip me...

That would be very much appreciated! :)

fhahn updated this revision to Diff 323975.Feb 16 2021, 5:56 AM

Rebased again & ping :)

I am planning on landing this within the next few days or early next week, unless there are any additional concerns. The 'build IR and throw away if not needed' is already used in multiple places and there have been no objections in llvm-dev. I am also more than happy to iterate on this once in tree to iron out any issues.

I started looking into this... Hopefully, will be able to provide my feedback by tomorrow.

ebrevnov added inline comments.Feb 17 2021, 2:55 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
435

Please add empty lines around and add something like "// Forward declarations" or similar

460

Would "RTChecks" be a better name?

740

Update description on return value. Also consider using Optional<BasicBlock *>.

1887

Some general notes. I would like us to:

  1. Consolidate all the work related to runtime checks under this helper abstraction. Currently, it looks like some functionality still on it's old place. Find more comments on this as you go.
  2. Make clear boundaries between this GeneratedRTChecks and its users. Please, convert it to class and define clear public API.
1901

'Preheader' looks redundant here as it can easily be get from Loop

1911

Do you really need as separate hook to kick in generation of runtime checks? In other words can we simply move this functionality to constructor?

1914

The goal is to create "un-linked" block(s). Right? In that case, would it be simpler to not depend on Preheader and use CreateBlock directly. I believe it would be easier to un-link the block then.

1923

I would suggest generating SCEV and MemRuntime checks into separate temporary blocks in the first place. I believe that will simplify further managment.

1957

I'm surprised to see the following piece of code dealing with deletion. I would expected SCEVExpanderCleaner does all necessary cleanup. It should be enough just to properly set "markResultUsed". If this is not the case can we rework in this direction?

3281–3299

I believe this functionality should be moved to GeneratedRTChecks.

3288–3289

I believe this functionality should be moved to GeneratedRTChecks.

3293–3296

Please address.

3356

I believe this functionality should be moved to GeneratedRTChecks.

3374

TmpBlock should not be directly exposed outside of GeneratedRTChecks.

8069

I think we should better get all info about runtime checks from GeneratedruntimeChecks directly.

9511

Why do we need to create GeneratedRTChecks outside of InnerLoopVectorizer? Can/Should it have bigger life time?

9760–9765

Why this was moved from its original place?

llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll
59

Looks like we stopped vectorizing something, did we?

fhahn updated this revision to Diff 324991.Feb 19 2021, 8:32 AM
fhahn marked 14 inline comments as done.

Addressed comments, thanks! Major changes include turning GeneratedRTChecks into a class, maintaining separate blocks for SCEV checks and mem checks, moving most code to add checks back to CFG to GeneratedRTChecks.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
460

Agreed, changed to RTChecks, thanks!

1887

That's a great suggestion, thanks! I moved most of the code from emitSCEVChecks & emitMemRuntimeChecks here. There's some code (mostly assertions), which remain in the original functions, because they need access to various member of ILV.

The new function names potentially can be improved and they should have doc-comments, but I'd prefer to first settle on the final API before doing that.

1914

I updated the code put SCEV and memory checks in separate blocks. It still uses SplitBlock, because this keeps the blocks in the DT/LI, which may be used during SCEV expansion and it also keeps them linked to the right predecessors during expansion (SCEVExpander tries to find values to re-use in predecessors).

1957

The extra cleanup is needed for instructions create directly by addRuntimeChecks, like compares, and/ors which compare pointer bounds (which are generated through SCEV expansion). I updated the comment and made it clearer. It is also only needed for the block with the mem-checks.

9511

For some users (e.g. epilogue vectorization), GeneratedRTChecks needs to be shared unfortunately.

9760–9765

This is only needed for the follow-up, I moved it to the original place for this patch.

llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll
59

We still vectorize, but we now do not create an unnecessary phi node (which has no users)

Thanks for doing the changes. Overall this version looks fine to me. Please go ahead with some final cleanups.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
743

Please update this one as well.

1914

Please, put that info into the code as a comment. SCEVExpander also can decide to hoist some instruction(s) into predecessors. Will they be properly scheduled for deletion?

1982–1983

Please revisit

2028

Why do we need to set it to null?

2055

Why storing to temp?

2056

Why needed?

fhahn updated this revision to Diff 326063.Feb 24 2021, 5:43 AM
fhahn marked 2 inline comments as done.

Addressed latest round of comments, thanks! I also added additional comments to some of the fields in GeneratedRTChecks.

fhahn updated this revision to Diff 326064.Feb 24 2021, 5:46 AM
fhahn marked 2 inline comments as done.

re-flow comment not properly formatted

fhahn marked 11 inline comments as done.Feb 24 2021, 5:51 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1911

I forgot to post my response here. In some cases we know that we do not need runtime checks up front (e.g. because we only interleave the loop). Having a helper to only generate the RT checks if needed helps to avoid unnecessary work.

1914

That's a great idea, I added the comment. All instructions that are generated by SCEVExpander should be tracked by it; it uses an IRBuilder callback to collect them, so it should not matter where exactly they are added.

1950

I am not entirely sure, but I think if the compares could be simplified to true/false during generation from SCEV expression, LoopAccessInfo should have been able to figure out that the checks are unneeded. We could also check for i1 true, but I think that would be best in a separate patch, as the original code had the same assert.

1982–1983

Fixed comment, thanks!

2028

This is used to indicate the check was used and should not be cleaned up. I'll add a comment.

2055

It's not needed any longer, earlier versions set MemCheckBlock = nullptr. I removed it.

2056

This is used to indicate the check was used and should not be cleaned up. I'll add a comment.

ebrevnov accepted this revision.Feb 24 2021, 7:34 PM

LGTM.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2023

Can/Should SCEVCheckCond be used instead? Just to make it symmetric with emitMemRuntimeChecks.

fhahn updated this revision to Diff 326363.Feb 25 2021, 5:12 AM
fhahn marked 5 inline comments as done.

Update to use SCEVCheckCond.

This revision was landed with ongoing or failed builds.Mar 1 2021, 2:48 AM
This revision was automatically updated to reflect the committed changes.