Page MenuHomePhabricator

[LV] Generate RT checks up-front and remove them if required.

Authored by fhahn on Mar 11 2020, 4:03 AM.



This patch updates LV to generate the runtime checks just after cost
modeling, to allow a more precise estimate of the actual cost of the
checks. This information will be used in future patches to generate
larger runtime checks in cases where the checks only make up a small
fraction of the expected scalar loop execution time.

The runtime checks are created up-front in a temporary block to allow better
estimating the cost and un-linked from the existing IR. After deciding to
vectorize, the checks are moved backed. If deciding not to vectorize, the
temporary block is completely removed.

This patch is similar in spirit to D71053, but explores a different
direction: instead of delaying the decision on whether to vectorize in
the presence of runtime checks it instead optimistically creates the
runtime checks early and discards them later if decided to not
vectorize. This has the advantage that the cost-modeling decisions
can be kept together and can be done up-front and thus preserving the
general code structure. I think delaying (part) of the decision to
vectorize would also make the VPlan migration a bit harder.

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

Diff Detail

Unit TestsFailed

60 msLLVM.Transforms/LoopVectorize::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/opt -S -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 < /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/test/Transforms/LoopVectorize/iv_outside_user.ll | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/test/Transforms/LoopVectorize/iv_outside_user.ll
30 msLLVM.Transforms/LoopVectorize/X86::Unknown Unit Message ("")
Script: -- : 'RUN: at line 2'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/opt < /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -S | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
lebedev.ri added inline comments.Aug 7 2020, 8:21 AM
lebedev.ri requested changes to this revision.Aug 12 2020, 2:33 PM

(marking as reviewed)

This revision now requires changes to proceed.Aug 12 2020, 2:33 PM
fhahn added a comment.Oct 7 2020, 11:00 AM

Just in the process of rebasing this again now that most other parts landed and discovered I accidentally made our job harder by teaching SCEVExpander to preserve LCSSA. Needs a bit of more work now.

fhahn updated this revision to Diff 314335.Jan 4 2021, 2:11 AM

Rebased on top of current trunk. This version now can build MultiSource/SPEC2006/SPEC2000 with -O3 -flto without crashing.

Just in the process of rebasing this again now that most other parts landed and discovered I accidentally made our job harder by teaching SCEVExpander to preserve LCSSA. Needs a bit of more work now.

I recently fixed a related issue (0ea3749b3cde), which makes this point obsolete; the final CFG is constructed before expanding, which solves the issue.

rkruppe removed a subscriber: rkruppe.Jan 4 2021, 2:52 AM
fhahn updated this revision to Diff 315352.Jan 8 2021, 5:00 AM

Split off the addRuntimeCheck changes to D94295

fhahn added a comment.Jan 8 2021, 5:34 AM

Sent mail to llvm-dev (, to get thoughts on the 'speculatively create IR and throw away if not needed` pattern in general

fhahn updated this revision to Diff 319575.Jan 27 2021, 7:36 AM

Rebase & ping. The thread on llvm-dev did not surface any objections, so I think it might be time to finally restart the review on this one :)

fhahn retitled this revision from [LV] Generate RT checks up-front and remove them if required. (WIP) to [LV] Generate RT checks up-front and remove them if required..Jan 27 2021, 7:42 AM
lebedev.ri accepted this revision.Jan 27 2021, 8:31 AM

I haven't reviewed this in full detail, but this looks good to me overall.
It would probably be good for someone else to also signoff, too.

This revision is now accepted and ready to land.Jan 27 2021, 8:31 AM

I'm going to take a look next week if nobody outstrip me...

fhahn updated this revision to Diff 322101.Feb 8 2021, 7:16 AM


I'm going to take a look next week if nobody outstrip me...

That would be very much appreciated! :)

fhahn updated this revision to Diff 323975.Feb 16 2021, 5:56 AM

Rebased again & ping :)

I am planning on landing this within the next few days or early next week, unless there are any additional concerns. The 'build IR and throw away if not needed' is already used in multiple places and there have been no objections in llvm-dev. I am also more than happy to iterate on this once in tree to iron out any issues.

I started looking into this... Hopefully, will be able to provide my feedback by tomorrow.

ebrevnov added inline comments.Feb 17 2021, 2:55 AM

Please add empty lines around and add something like "// Forward declarations" or similar


Would "RTChecks" be a better name?


Update description on return value. Also consider using Optional<BasicBlock *>.


Some general notes. I would like us to:

  1. Consolidate all the work related to runtime checks under this helper abstraction. Currently, it looks like some functionality still on it's old place. Find more comments on this as you go.
  2. Make clear boundaries between this GeneratedRTChecks and its users. Please, convert it to class and define clear public API.

'Preheader' looks redundant here as it can easily be get from Loop


Do you really need as separate hook to kick in generation of runtime checks? In other words can we simply move this functionality to constructor?


The goal is to create "un-linked" block(s). Right? In that case, would it be simpler to not depend on Preheader and use CreateBlock directly. I believe it would be easier to un-link the block then.


I would suggest generating SCEV and MemRuntime checks into separate temporary blocks in the first place. I believe that will simplify further managment.


I'm surprised to see the following piece of code dealing with deletion. I would expected SCEVExpanderCleaner does all necessary cleanup. It should be enough just to properly set "markResultUsed". If this is not the case can we rework in this direction?


I believe this functionality should be moved to GeneratedRTChecks.


I believe this functionality should be moved to GeneratedRTChecks.


Please address.


I believe this functionality should be moved to GeneratedRTChecks.


TmpBlock should not be directly exposed outside of GeneratedRTChecks.


I think we should better get all info about runtime checks from GeneratedruntimeChecks directly.


Why do we need to create GeneratedRTChecks outside of InnerLoopVectorizer? Can/Should it have bigger life time?


Why this was moved from its original place?

59 ↗(On Diff #323975)

Looks like we stopped vectorizing something, did we?

fhahn updated this revision to Diff 324991.Feb 19 2021, 8:32 AM
fhahn marked 14 inline comments as done.

Addressed comments, thanks! Major changes include turning GeneratedRTChecks into a class, maintaining separate blocks for SCEV checks and mem checks, moving most code to add checks back to CFG to GeneratedRTChecks.


Agreed, changed to RTChecks, thanks!


That's a great suggestion, thanks! I moved most of the code from emitSCEVChecks & emitMemRuntimeChecks here. There's some code (mostly assertions), which remain in the original functions, because they need access to various member of ILV.

The new function names potentially can be improved and they should have doc-comments, but I'd prefer to first settle on the final API before doing that.


I updated the code put SCEV and memory checks in separate blocks. It still uses SplitBlock, because this keeps the blocks in the DT/LI, which may be used during SCEV expansion and it also keeps them linked to the right predecessors during expansion (SCEVExpander tries to find values to re-use in predecessors).


The extra cleanup is needed for instructions create directly by addRuntimeChecks, like compares, and/ors which compare pointer bounds (which are generated through SCEV expansion). I updated the comment and made it clearer. It is also only needed for the block with the mem-checks.


For some users (e.g. epilogue vectorization), GeneratedRTChecks needs to be shared unfortunately.


This is only needed for the follow-up, I moved it to the original place for this patch.

59 ↗(On Diff #323975)

We still vectorize, but we now do not create an unnecessary phi node (which has no users)

Thanks for doing the changes. Overall this version looks fine to me. Please go ahead with some final cleanups.


Please update this one as well.


Please, put that info into the code as a comment. SCEVExpander also can decide to hoist some instruction(s) into predecessors. Will they be properly scheduled for deletion?


Please revisit


Why do we need to set it to null?


Why storing to temp?


Why needed?

fhahn updated this revision to Diff 326063.Feb 24 2021, 5:43 AM
fhahn marked 2 inline comments as done.

Addressed latest round of comments, thanks! I also added additional comments to some of the fields in GeneratedRTChecks.

fhahn updated this revision to Diff 326064.Feb 24 2021, 5:46 AM
fhahn marked 2 inline comments as done.

re-flow comment not properly formatted

fhahn marked 11 inline comments as done.Feb 24 2021, 5:51 AM
fhahn added inline comments.

I forgot to post my response here. In some cases we know that we do not need runtime checks up front (e.g. because we only interleave the loop). Having a helper to only generate the RT checks if needed helps to avoid unnecessary work.


That's a great idea, I added the comment. All instructions that are generated by SCEVExpander should be tracked by it; it uses an IRBuilder callback to collect them, so it should not matter where exactly they are added.


I am not entirely sure, but I think if the compares could be simplified to true/false during generation from SCEV expression, LoopAccessInfo should have been able to figure out that the checks are unneeded. We could also check for i1 true, but I think that would be best in a separate patch, as the original code had the same assert.


Fixed comment, thanks!


This is used to indicate the check was used and should not be cleaned up. I'll add a comment.


It's not needed any longer, earlier versions set MemCheckBlock = nullptr. I removed it.


This is used to indicate the check was used and should not be cleaned up. I'll add a comment.

ebrevnov accepted this revision.Feb 24 2021, 7:34 PM



Can/Should SCEVCheckCond be used instead? Just to make it symmetric with emitMemRuntimeChecks.

fhahn updated this revision to Diff 326363.Feb 25 2021, 5:12 AM
fhahn marked 5 inline comments as done.

Update to use SCEVCheckCond.

This revision was landed with ongoing or failed builds.Mar 1 2021, 2:48 AM
This revision was automatically updated to reflect the committed changes.