This is an archive of the discontinued LLVM Phabricator instance.

[CostModel][AArch64] Increase cost of vector insert element and add missing cast costs
Closed, Public

Authored by sbaranga on Aug 11 2015, 6:39 AM.

Details

Summary

Increase the estimated costs for insert/extract element operations on
AArch64. This is motivated by results from benchmarking interleaved
accesses.

Add missing costs for zext/sext/trunc instructions and some integer-to-floating-point
conversions. These costs were previously calculated by scalarizing these
operations, so they were affected by the increased cost of the insert/extract
element operations.
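To show why scalarized cast costs track the insert/extract cost, here is a minimal standalone sketch. It assumes a hypothetical, simplified cost model: the names `CastOp`, `VecTy`, `castCost`, `scalarizedCastCost`, `kCastCostTable`, and all cost values are illustrative and are not taken from the patch or from LLVM's actual AArch64 tables. The scalarization fallback roughly mirrors the usual generic approach (extract each lane, do a scalar cast, insert the result). An explicit per-type table entry, which is how LLVM backends commonly express cast costs, is unaffected by the insert/extract cost.

```cpp
#include <map>
#include <tuple>

// Hypothetical, simplified cost model for illustration only: the names and
// cost values are assumptions, not LLVM's actual AArch64 numbers.
enum class CastOp { ZExt, SExt, Trunc, SIToFP, UIToFP };

// A fixed-width vector type such as <4 x i16>: lane count and element width.
struct VecTy {
  int lanes;
  int elemBits;
};

constexpr int kInsertExtractCost = 3;  // assumed cost of one lane insert/extract
constexpr int kScalarCastCost = 1;     // assumed cost of one scalar cast

// Fallback: scalarize the cast. Each lane is extracted from the source,
// converted by a scalar instruction, and inserted into the result, so the
// total scales with the insert/extract cost.
int scalarizedCastCost(const VecTy &src) {
  return src.lanes *
         (kInsertExtractCost + kScalarCastCost + kInsertExtractCost);
}

// Key: (opcode, lane count, destination element bits, source element bits).
using CastKey = std::tuple<CastOp, int, int, int>;

// Explicit costs for casts assumed to lower to a single vector instruction.
const std::map<CastKey, int> kCastCostTable = {
    {{CastOp::ZExt, 4, 32, 16}, 1},    // <4 x i16> -> <4 x i32>
    {{CastOp::SExt, 4, 32, 16}, 1},    // <4 x i16> -> <4 x i32>
    {{CastOp::Trunc, 4, 16, 32}, 1},   // <4 x i32> -> <4 x i16>
    {{CastOp::SIToFP, 4, 32, 32}, 1},  // <4 x i32> -> <4 x float>
};

// Prefer a table entry; otherwise fall back to the scalarized estimate.
int castCost(CastOp op, const VecTy &src, int dstElemBits) {
  auto it = kCastCostTable.find(
      CastKey{op, src.lanes, dstElemBits, src.elemBits});
  if (it != kCastCostTable.end())
    return it->second;  // independent of kInsertExtractCost
  return scalarizedCastCost(src);
}
```

With these assumed constants, `castCost(CastOp::ZExt, VecTy{4, 16}, 32)` returns the table value 1, while `castCost(CastOp::UIToFP, VecTy{4, 16}, 32)` has no entry and falls back to 4 * (3 + 1 + 3) = 28. Raising `kInsertExtractCost` inflates only the second result.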


Event Timeline

sbaranga updated this revision to Diff 31806 (Aug 11 2015, 6:39 AM).
sbaranga retitled this revision to [CostModel][AArch64] Increase cost of vector insert element and add missing cast costs.
sbaranga updated this object.
sbaranga added a subscriber: llvm-commits.

Benchmarking results (LNT, SPEC CPU2000, SPEC CPU2006) on Cortex-A53 and Cortex-A57 at r243774:

Cortex-A53:

Performance Regressions - Execution Time
lnt.MultiSource/Benchmarks/Prolangs-C++/primes/primes 2.97%
lnt.MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt 1.39%
spec.cpu2006.ref.464_h264ref 1.09%

Performance Improvements - Execution Time
lnt.MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 -21.93%
lnt.SingleSource/Benchmarks/BenchmarkGame/puzzle -16.42%
lnt.MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt -5.00%
lnt.SingleSource/Benchmarks/Misc/flops -3.16%
spec.cpu2000.ref.183_equake -2.54%
lnt.MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk -1.97%
spec.cpu2000.ref.253_perlbmk -1.18%
spec.cpu2006.ref.447_dealII -1.10%

Cortex-A57:

Performance Regressions - Execution Time
lnt.MultiSource/Benchmarks/BitBench/five11/five11 3.20%
spec.cpu2006.ref.482_sphinx3 1.27%
lnt.SingleSource/Benchmarks/BenchmarkGame/n-body 1.03%

Performance Improvements - Execution Time
lnt.SingleSource/Benchmarks/BenchmarkGame/puzzle -16.11%
lnt.MultiSource/Benchmarks/Olden/bh/bh -8.30%
lnt.MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 -5.39%
lnt.MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt -4.37%
lnt.MultiSource/Applications/hexxagon/hexxagon -4.14%
lnt.SingleSource/UnitTests/Vectorizer/gcc-loops -4.06%
lnt.MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt -2.55%
spec.cpu2006.ref.453_povray -2.31%
spec.cpu2006.ref.444_namd -2.16%
lnt.SingleSource/Benchmarks/Misc-C++/oopack_v1p8 -1.80%
lnt.SingleSource/Benchmarks/Shootout-C++/methcall -1.23%
lnt.MultiSource/Benchmarks/TSVC/ControlLoops-flt/ControlLoops-flt -1.14%
lnt.SingleSource/Benchmarks/Shootout-C++/ary3 -1.00%
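To read the percentages above as speedup factors, a small hypothetical helper can convert them. It assumes each figure is the relative change in execution time, (new - old) / old * 100, which fits the "Execution Time" headings and the negative values listed under improvements; `speedupFromPercentChange` is an illustrative name, not part of LNT or any benchmarking tool.

```cpp
// Hypothetical helper. It assumes each reported figure is the relative change
// in execution time, (new - old) / old * 100, so negative values mean faster.
// Returns old_time / new_time: values above 1.0 are speedups, below 1.0 slowdowns.
double speedupFromPercentChange(double percentChange) {
  return 1.0 / (1.0 + percentChange / 100.0);
}
```

Under that assumption, the -21.93% enc-pc1 result on Cortex-A53 is a speedup of about 1.28x, and the 2.97% primes regression is about 0.97x.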

rengolin accepted this revision (Aug 13 2015, 6:00 AM).
rengolin added a reviewer: rengolin.

The changes are consistent with what we've seen so far with the shuffles, and the benchmark numbers are very compelling.

Thanks for computing that table!

LGTM. Thanks!

This revision is now accepted and ready to land (Aug 13 2015, 6:00 AM).

Thanks, Renato! I'll commit this on Monday.

Cheers,
Silviu

mcrosier edited edge metadata (Aug 14 2015, 10:21 AM).
mcrosier added a subscriber: mcrosier.
sbaranga closed this revision (Aug 17 2015, 9:05 AM).

Thanks, r245226

-Silviu