Page MenuHomePhabricator

[ARM] Don't pessimize i32 vselect.

Authored by chatur01 on Nov 17 2015, 1:10 AM.



The underlying issues surrounding codegen for 32-bit vselects have been resolved. The pessimistic costs for 64-bit vselects remain due to the bad
scalarization that is still happening there.

I tested this on A57 in T32, A32 and A64 modes. I saw no regressions, and some improvements.

From my benchmarks, I saw these improvements in A57 (T32)
spec.cpu2000.ref.177_mesa 5.95%
lnt.SingleSource/Benchmarks/Shootout/strcat 12.93%
lnt.MultiSource/Benchmarks/MiBench/telecomm-CRC32/telecomm-CRC32 11.89%

I also measured A57 A32 and A9 T32 and found no performance regressions. I see much bigger wins in third-party benchmarks with this change.

Diff Detail


Event Timeline

chatur01 retitled this revision from to [ARM] Don't pessimize i32 vselect..
chatur01 updated this object.
chatur01 set the repository for this revision to rL LLVM.
chatur01 added a subscriber: llvm-commits.
rengolin accepted this revision.Nov 17 2015, 1:27 AM
rengolin added a reviewer: rengolin.

Hi Charlie,

The original commit mentions Cortex-A8's 25% improvement performance in PAQp8. It seems you've only tested in OOO cores, can you just double check on either A53 or A7/A8? It's probably a pattern we now recognise better, but would be good to be on the safe side.

If you find no regressions either, and I'm not expecting any, LGTM. Thanks!


This revision is now accepted and ready to land.Nov 17 2015, 1:27 AM

Thanks Renato. I double-checked A53 T32 and found no regressions in SPEC 2000, 2006 or LNT.

Adobe-C++/loop_unroll was up 6.49%
ASC_Sequoia/AMGmk/AMGmk was up 5.04%

Several other improvements in LNT.

This revision was automatically updated to reflect the committed changes.

I missed some mechanical changes in test/Analysis/CostModel/ARM/select.ll in this review. I committed the changes along with what's in this revision, which I felt were appropriate for post-commit review if necessary. Sorry about that!