This is an archive of the discontinued LLVM Phabricator instance.

[MachineCombiner] Add up latencies of all instructions in new pattern.
ClosedPublic

Authored by fhahn on Nov 21 2017, 7:53 AM.

Details

Summary

When calculating the RootLatency, we add up all the latencies of the
deleted instructions. But for NewRootLatency we only add the latency of
the new root instructions, ignoring the latencies of the other
instructions inserted. This leads the combiner to underestimate the cost
of patterns which add multiple instructions. This patch fixes that by
summing up the latencies of all new instructions. For NewRootNode, the
more complex getLatency function is used.

Note that we may be slightly more precise than just summing up
all latencies. For example, consider a pattern like

r1 = INS1 ..
r2 = INS2 ..
r3 = INS3 r1, r2

I think in some other places, the total latency of the pattern would be
estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider
that worth changing, I think it would be best to do in a follow-up
patch.

Diff Detail

Event Timeline

fhahn created this revision.Nov 21 2017, 7:53 AM

I ran into this being a problem in D40306. The patterns added there are are not profitable on Cortex-A57, but because the NewRootLatency is under-estimated, the unprofitable pattern gets applied.

Thanks Evandro for having a look! I would appreciate any feedback from Gerolf or Sebastian, as they reviewed/added the change summing up the latencies of the deleted instructions.

Gerolf edited edge metadata.Dec 5 2017, 2:01 PM

I think your commit makes accounting for both cases - the instructions inserted and the instructions deleted - consistent. From that angle this look OK. I have also some food for thought: in both cases the code makes the assumption that it inserts a dependent instruction chain. Only then it is correct to simply add instruction latencies. I have not checked that this assumption is correct for all cases. So I suggest to add a comment about this. Assuming also that you don't see any notable perf regression you can go ahead and commit from my perspective: LGTM.

Thanks for working on this and your patience!
Cheers
Gerolf

fhahn updated this revision to Diff 125750.Dec 6 2017, 9:26 AM

Thanks for having a look. There are no perf regressions on Cortex-A57 on SPEC2k, the lnt test suite and SPEC2006. I'll commit this now, but will also run SPEC2017 on another board.

I've added a comment about the dependency chains. I tried to cover this issue in the original review description, but the dependent instruction chains wording is more concise.

fhahn accepted this revision.Dec 6 2017, 9:27 AM

Accept in Phabricator after LGTM by Gerolf and Evandro, to make Arc happy.

This revision is now accepted and ready to land.Dec 6 2017, 9:27 AM
fhahn closed this revision.Dec 6 2017, 12:28 PM