This is an archive of the discontinued LLVM Phabricator instance.

[LV] Pick vector loop body as insert point for SCEV expansion.
ClosedPublic

Authored by fhahn on Jul 7 2020, 3:08 AM.

Details

Summary

Currently the DomTree is not kept up to date for additional blocks
generated in the vector loop, for example when vectorizing with
predication. SCEVExpander relies on dominance checks when looking for
existing instructions to re-use and in some cases that can lead to the
expander picking instructions that do not actually dominate their insert
point (e.g. as in PR46525).

Unfortunately keeping the DT up-to-date is a bit tricky, because the CFG
is only patched up after generating code for a block. For now, we can
just use the vector loop header, as this ensures the inserted
instructions dominate all uses in the vector loop. There should be no
noticeable impact on the generated code, as other passes should sink
those instructions, if profitable.

Fixes PR46525.

Diff Detail

Event Timeline

fhahn created this revision.Jul 7 2020, 3:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 7 2020, 3:08 AM
dmgreen added a subscriber: dmgreen.Jul 7 2020, 4:51 AM

Is the DT reliable enough to use for checking the block is in the loop? I see we might have to exclude the preheader and midblock. But if it's not uptodate, and only knows about split block, it might think the midblock it dominated by the vector body.

Maybe if LI->getLoopFor(LoopVectorBody == LI->getLoopFor(InsertBB) works, that may be better? It looks like LI is kept uptodate as blocks get split.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2881

*choose

fhahn updated this revision to Diff 276435.Jul 8 2020, 7:49 AM

Is the DT reliable enough to use for checking the block is in the loop? I see we might have to exclude the preheader and midblock. But if it's not uptodate, and only knows about split block, it might think the midblock it dominated by the vector body.

Maybe if LI->getLoopFor(LoopVectorBody == LI->getLoopFor(InsertBB) works, that may be better? It looks like LI is kept uptodate as blocks get split.

We are mainly want to avoid using LoopVectorBody as insert point for blocks not dominated by t. The new blocks will be unreachable initially and dominated by everything, so it should work fine.

That being said, LI is updated directly as you mentioned, so it is probably better to just use LI. Updated the patch, thanks!

dmgreen accepted this revision.Jul 8 2020, 8:14 AM

Thanks. LGTM.

This revision is now accepted and ready to land.Jul 8 2020, 8:14 AM
This revision was automatically updated to reflect the committed changes.
Ayal added a comment.Jul 12 2020, 7:58 AM

Good catch!

The specific culprit for the pr is the call to SE.DT.dominates(EntInst, InsertPt) by SCEVExpander::FindValueInExprValueMap().

Would probably be better to keep DT up to date as we go along, due to SE.DT's dependence on it, instead of fixing it after code-gen via updateDominatorTree(); but there was some reason for doing it this way(?).

Good catch!

The specific culprit for the pr is the call to SE.DT.dominates(EntInst, InsertPt) by SCEVExpander::FindValueInExprValueMap().

Would probably be better to keep DT up to date as we go along, due to SE.DT's dependence on it, instead of fixing it after code-gen via updateDominatorTree(); but there was some reason for doing it this way(?).

Yes, ideally we would keep the DT up-to-date while vectorizing. It is a slightly bigger change though, mostly because we only patch up the CFG after vectorizing some blocks. I'll look into that as follow-up.