This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Set COPY ZR isAsCheapAsAMove when needed.
ClosedPublic

Authored by haicheng on Jul 14 2016, 8:21 AM.

Details

Summary

Hi Matthias,

I tried

case TargetOpcode::COPY:
    return true;

on cortex-a57 and kryo by running spec2006. Below is the result (positive is good),

astar+0.650%
bzip2+0.236%
dealII-8.229%
gcc+0.550%
gobmk-13.273%
h264ref+0.062%
hmmer+0.575%
lbm-0.008%
libquantum+0.014%
mcf+0.508%
milc+0.100%
namd-0.067%
omnetpp-4.855%
perlbench-12.301%
povray-14.307%
sjeng-0.989%
soplex+0.005%
sphinx3-0.363%
xalancbmk-0.148%

Unfortunately, several benchmarks have double-digit regression. However, on Kryo, setting any COPY isAsCheapAsAMove brings small win, but restricting it to COPY ZR is slightly better.

Thanks,

Haicheng

Diff Detail

Repository
rL LLVM

Event Timeline

haicheng updated this revision to Diff 63978.Jul 14 2016, 8:21 AM
haicheng retitled this revision from to [AArch64] Set COPY ZR isAsCheapAsAMove when needed..
haicheng updated this object.
haicheng set the repository for this revision to rL LLVM.
haicheng added subscribers: t.p.northover, mssimpso.

I'm not sure I understand your argument, or the point of this patch. Can you explain in detail what you're trying to achieve, what are the benchmark numbers you achieve to show that it's a good decision, and what shows it's not a good thing for other cores?

I'm not sure I understand your argument, or the point of this patch. Can you explain in detail what you're trying to achieve, what are the benchmark numbers you achieve to show that it's a good decision, and what shows it's not a good thing for other cores?

This patch is the third part of D19985. Similar to Cyclone, Kryo has ZCZeroing feature which means COPY ZR is free. Cyclone uses general CheapAsAMoveHandling which already sets COPY isCheapAsAMove. However, Kryo uses custom CheapAsMoveHandling so that I need to set COPY ZR isCheapAsAMove explicitly.

In D19985, @MatzeB suggests maybe I should set any COPY isCheapAsAMove for all the subtargets using custom CheapAsMoveHandling rather than just COPY ZR. I did the experiments on A57 and Kryo by running spec2006. The results of A57 is shown in the patch summary which is bad. On Kryo, setting any COPY isCheapAsAMove improves the geomean of spec2006 by +0.05%, but restricting it to COPY ZR improves the geomean by +0.1%.

Currently, only Kryo has both ZCZeroing and CheapAsMoveHandling features. D22256 is the second part of D19985 which sets float point register zeroing out isCheapAsAMove. This patch sets the integer registers.

MatzeB accepted this revision.Jul 14 2016, 12:00 PM
MatzeB edited edge metadata.

Unfortunately, several benchmarks have double-digit regression. However, on Kryo, setting any COPY isAsCheapAsAMove brings small win, but restricting it to COPY ZR is slightly better.

Wow! I would never expect that to have such a huge effect, we may have a bug or bad strategy somewhere in the generic codegen. It would certainly be interesting to learn why we have those huge swings...

In any case this patch LGTM.

This revision is now accepted and ready to land.Jul 14 2016, 12:00 PM

Unfortunately, several benchmarks have double-digit regression. However, on Kryo, setting any COPY isAsCheapAsAMove brings small win, but restricting it to COPY ZR is slightly better.

Wow! I would never expect that to have such a huge effect, we may have a bug or bad strategy somewhere in the generic codegen. It would certainly be interesting to learn why we have those huge swings...

In any case this patch LGTM.

Thank you again, Matthias.

Actually, if I replace

case TargetOpcode::COPY:
    return true;

by

default:
    return isZeroCost(MI);

All double-digit regressions are gone on A57. Only omnetpp has -3.6% regression and povray has -2.2% regression on A57. isZeroCost() includes COPY and other cheap general pseudo instructions.

This revision was automatically updated to reflect the committed changes.