This is an archive of the discontinued LLVM Phabricator instance.

[LoopStrengthReduction] Treat SCEVUnknown pessimistically in LSR
AbandonedPublic

Authored by mkazantsev on Jun 6 2017, 12:50 AM.

Details

Summary

When evaluating formulae for LSR, we sometimes may have SCEVUnknown included
into the expression. When we try to make a fixup for an instruction outside the loop
and this SCEVUnknown is loop-variant, we are unable to duplicate/sink its s out of loop.
We must reuse the instruction from the loop after fixup. As result, it will not be trivially
dead for sure. We are also unable to evaluate the number of other instructions that will
also not become dead, because they are still used by it. This makes our reasoning about
profitability of application of the formula misleading.

On the other hand, while doing this transformation, we may create LSR IV Phis in the loop,
thus increasing the code. These new Phis will also be (likely) not dead. As result, this
misleading choise of a formula ends up with code increase without any benefits. This only
leads to performance, compile time and code size degradations.

The examples of tests where the LSR only produces new instructions inside and ouside the
loop are demonstrated in the attached test.

To avoid this situation, this patch changes LSR cost model so that it rejects formulae with
registers that include loop-variant SCEVUnknown values if they require at least one fixup
outside the loop.

Diff Detail

Event Timeline

mkazantsev created this revision.Jun 6 2017, 12:50 AM
mkazantsev updated this revision to Diff 101520.Jun 6 2017, 1:01 AM

Fixed a typo in comment.

wmi edited edge metadata.Jun 22 2017, 10:13 AM

Hi Max,

The way you treat formula containing loop variant unknown as loser will definitely help for some cases. But I can also think of some cases that LSRUse with all-fixups-outside-loop may have no available formula after this filtering and LSR will end up doing nothing for the whole loop. This may cause regression. Think about a hypothetical loop with 10 different induction variables inside of loop. Ideally LSR can replace those 10 induction variables with a single one and will be very helpful for performance. If because a LSRUse with all-fixups-outside-loop has no available formula after filtering all the losers and LSR finds no solution for the loop, it will be a big lost.

Thanks,
Wei.

mkazantsev planned changes to this revision.Jun 29 2017, 11:15 PM

This patch doesn't only help, but in some cases produces the worse code. The modification of the test rm-and-tst-peephole.ll is actually a red flad. Need to look closer into the situations when this patch makes us worse.

mkazantsev abandoned this revision.Oct 10 2017, 10:21 PM

No obvious reasons to keep it since it is not purely profitable transform.