This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] MachineSSA pass to reduce the number of CR-logical operations
ClosedPublic

Authored by nemanjai on Feb 27 2017, 2:48 PM.

Details

Summary

This is an initial attempt at a pass that will traverse the machine code while still in SSA form and attempt various transformations aimed at reducing the number of CR-logical operations. The only support provided by the initial patch is for splitting basic blocks on binary CR-logical operations (i.e. branch early on operands). The hope however is that the provided infrastructure will allow for further transformations that can reduce the number of these operations.

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai created this revision.Feb 27 2017, 2:48 PM
echristo edited edge metadata.Mar 27 2017, 3:56 PM

How does this work on the testcases that hfinkel added in PR32320?

-eric

How does this work on the testcases that hfinkel added in PR32320?

-eric

I plan to run those to try to get reliable performance data to motivate heuristics for this pass. I'll do that after the dev conference and report the impact of this patch on those.

jtony added inline comments.Nov 28 2017, 11:39 AM
lib/Target/PowerPC/PPCMachineBasicBlockUtils.h
93 ↗(On Diff #89942)

It looks to me this static function uses too many parameters, which makes it a little bit hard to understand. Is it possible for us to reduce the number of parameters used here? One way I could think of is to make this a member function of class PPCReduceCRLogicals. In that case we could make some of the parameters member of the class, so that we don't have to pass them around. But it is totally up to you to decide whether the effort is worthwhile or not.

lib/Target/PowerPC/PPCReduceCRLogicals.cpp
220 ↗(On Diff #89942)

Can we do what the FIXME suggested here in this patch? Since the community doesn't like us putting FIXME in the code.

nemanjai added inline comments.Nov 28 2017, 11:43 AM
lib/Target/PowerPC/PPCMachineBasicBlockUtils.h
93 ↗(On Diff #89942)

I don't want to make it a member variable as I think this is useful for other purposes than just this pass. However, I think it might be useful to wrap all the parameters into a struct that we'll build up and pass.

lib/Target/PowerPC/PPCReduceCRLogicals.cpp
220 ↗(On Diff #89942)

I meant to remove this FIXME. Thanks for pointing it out.

nemanjai updated this revision to Diff 124650.Nov 28 2017, 3:24 PM
  • Rebase
  • Fix test cases that fail on ToT
  • Simplify the split function signature
  • Add the input CR logical instructions back to the queue if we split on the use

Some performance numbers as requested by @echristo (all run on a Power8 2GHz machine, bound to a specific physical CPU):
SPEC2006 (run time improvements - negative is good):
444.namd,868.7807,0.0000,850.4903,0.0000,-2.11%
447.dealII,630.6040,0.0000,631.2014,0.0000,0.09%
450.soplex,473.6103,0.0000,417.0427,0.0000,-11.94%
453.povray,505.4464,0.0000,487.0534,0.0000,-3.64%
401.bzip2,969.0608,0.0000,923.7337,0.0000,-4.68%
445.gobmk,868.6733,0.0000,857.6179,0.0000,-1.27%
464.h264ref,1053.8501,0.0000,1005.3811,0.0000,-4.60%
471.omnetpp,473.8660,0.0000,444.2557,0.0000,-6.25%
Changes below 1% omitted.
This was from a single run of SPEC2016 with r319300 and then same revision with this patch applied. Since it was a single run, I don't have information about the noise range, but the important thing is that the trend seems to be rather clear - performance improves with this patch.

For Hal's benchmarks, the relative performance doesn't change almost at all with this patch. The absolute performance improves with 4,3,float, 4,4,float by about 30% and degrades with 4,4,int by about 20%. The rest are unchanged.

Some performance numbers as requested by @echristo (all run on a Power8 2GHz machine, bound to a specific physical CPU):
SPEC2006 (run time improvements - negative is good):
444.namd,868.7807,0.0000,850.4903,0.0000,-2.11%
447.dealII,630.6040,0.0000,631.2014,0.0000,0.09%
450.soplex,473.6103,0.0000,417.0427,0.0000,-11.94%
453.povray,505.4464,0.0000,487.0534,0.0000,-3.64%
401.bzip2,969.0608,0.0000,923.7337,0.0000,-4.68%
445.gobmk,868.6733,0.0000,857.6179,0.0000,-1.27%
464.h264ref,1053.8501,0.0000,1005.3811,0.0000,-4.60%
471.omnetpp,473.8660,0.0000,444.2557,0.0000,-6.25%
Changes below 1% omitted.
This was from a single run of SPEC2016 with r319300 and then same revision with this patch applied. Since it was a single run, I don't have information about the noise range, but the important thing is that the trend seems to be rather clear - performance improves with this patch.

Quite.

For Hal's benchmarks, the relative performance doesn't change almost at all with this patch. The absolute performance improves with 4,3,float, 4,4,float by about 30% and degrades with 4,4,int by about 20%. The rest are unchanged.

Weird.

Anyhow, I'm ok with this.

echristo accepted this revision.Nov 30 2017, 7:39 AM
This revision is now accepted and ready to land.Nov 30 2017, 7:39 AM
This revision was automatically updated to reflect the committed changes.