This is an archive of the discontinued LLVM Phabricator instance.

Do not use isel on P7 and P8
Needs ReviewPublic

Authored by ohsallen on Mar 11 2015, 10:42 AM.

Details

Reviewers
hfinkel
Summary

It appears that using branches is almost always better than using isel for P7 and P8, because of branch prediction — the decision outcome has to be almost completely random for isel to win. Also isel has a way higher latency for P8 than for A2, for instance. This patch disables the usage of isel for P7 and P8, and employs regular branches instead.

Had to change some regression tests to use A2 instead of P7 for checking that isel is generated. There are still 2 tests for which I have problems:

LLVM :: CodeGen/PowerPC/ifcvt.ll
LLVM :: CodeGen/PowerPC/p8-isel-sched.ll

ifcvt.ll: using A2 does not generate isel (Hal: seems fishy, right?)
p8-isel-sched.ll: shall we remove this testcase?

Thanks,
Olivier

Diff Detail

Event Timeline

ohsallen updated this revision to Diff 21742.Mar 11 2015, 10:42 AM
ohsallen retitled this revision from to Do not use isel on P7 and P8.
ohsallen updated this object.
ohsallen edited the test plan for this revision. (Show Details)
ohsallen added a reviewer: hfinkel.
ohsallen added a subscriber: Unknown Object (MLST).
hfinkel edited edge metadata.Mar 16 2015, 11:08 PM

ifcvt.ll: using A2 does not generate isel (Hal: seems fishy, right?)

Yea, we should figure this out.

p8-isel-sched.ll: shall we remove this testcase?

No, but once you control this using some target feature, you can enable it specifically for this test.

Also, one more wrinkle, we should still generate isel on the p7/p8 when optimizing for size.

lib/Target/PowerPC/PPCSubtarget.h
268

I don't want to do it this way. Either just remove HasISEL from the P7/P8 processor definitions, or add a ISELIsSlow (or whatever) to PPC.td, but we shouldn't hard code it like this here.

test/CodeGen/PowerPC/crbit-asm.ll
1

Please keep the isel test coverage here (and in crbit.ll below), but we can use different -check-prefix when running on the pwr7 vs. the a2.

ohsallen updated this revision to Diff 22201.Mar 18 2015, 12:04 PM
ohsallen edited edge metadata.

Thanks for the comments, Hal. Here is a version with a change in the processor definitions of P7/P8. The logic has been moved into enableEarlyIfConversion(), and the optsize attribute is considered. I had to change the signature of that function for that purpose.

The target independent parts look fine to me.

-eric

I've not run benchmarks on the P8, only the P7 (I did 20 test suite runs in each configuration today), but I think that the situation is more complicated than you imply...

With this patch I see (negative is speedup):

External/SPEC/CFP2006/444.namd/444.namd
-29.3699% +/- 19.5658%
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk
2.46854% +/- 0.541648%
MultiSource/Benchmarks/McCat/04-bisect/bisect
15.5624% +/- 5.32078%
MultiSource/Benchmarks/mediabench/adpcm/rawdaudio/rawdaudio
31.4063% +/- 19.7579%

So the namd speedup is nice, but essentially everything else is a slowdown. I think that we need a more in-depth analysis of what's going on here.

As discussed with Hal, using the isel instruction can be beneficial with the current infrastructure because it reduces the number of basic blocks, and enables some late optimizations. Ultimately, we should keep the select instructions and expand them into branches after those optimizations happened. In the meantime, we will keep using isel by default and provide a -misel/-mno-isel option in Clang.

Thanks,
Olivier