This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Fix incorrect cost tree calculation.
Abandoned, Public

Authored by dtemirbulatov on Feb 3 2019, 4:10 PM.

Details

Summary

I found that during tree cost calculation, the algorithm uses tree entries that were not supposed to be vectorized and were rejected at an early stage, yet we still estimate those entries during the whole-tree estimation. The following change fixes this issue.
Also, here is SPEC CPU2006 data before and after this change, measured on:
...
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
stepping : 3
microcode : 0xc6
cpu MHz : 2429.650
cache size : 6144 KB
....
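Before (columns: benchmark, reference time in seconds, run time in seconds, ratio; * marks the selected median run):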
400.perlbench NR
401.bzip2 9650 502 19.2 S
401.bzip2 9650 481 20.1 S
401.bzip2 9650 500 19.3 *
403.gcc 8050 248 32.5 S
403.gcc 8050 244 33.0 S
403.gcc 8050 245 32.9 *
429.mcf 9120 328 27.8 *
429.mcf 9120 322 28.3 S
429.mcf 9120 331 27.5 S
445.gobmk 10490 468 22.4 *
445.gobmk 10490 475 22.1 S
445.gobmk 10490 466 22.5 S
456.hmmer 9330 349 26.7 S
456.hmmer 9330 348 26.8 S
456.hmmer 9330 349 26.7 *
458.sjeng 12100 458 26.4 S
458.sjeng 12100 588 20.6 S
458.sjeng 12100 467 25.9 *
462.libquantum 20720 269 77.1 *
462.libquantum 20720 312 66.4 S
462.libquantum 20720 267 77.7 S
464.h264ref 22130 516 42.9 *
464.h264ref 22130 516 42.9 S
464.h264ref 22130 515 43.0 S
471.omnetpp 6250 327 19.1 S
471.omnetpp 6250 330 18.9 *
471.omnetpp 6250 333 18.8 S
473.astar 7020 -- CE
483.xalancbmk 6900 -- CE

400.perlbench NR
401.bzip2 9650 500 19.3 *
403.gcc 8050 245 32.9 *
429.mcf 9120 328 27.8 *
445.gobmk 10490 468 22.4 *
456.hmmer 9330 349 26.7 *
458.sjeng 12100 467 25.9 *
462.libquantum 20720 269 77.1 *
464.h264ref 22130 516 42.9 *
471.omnetpp 6250 330 18.9 *
473.astar NR
483.xalancbmk NR

After:
400.perlbench NR
401.bzip2 9650 493 19.6 S
401.bzip2 9650 491 19.6 S
401.bzip2 9650 492 19.6 *
403.gcc 8050 254 31.7 S
403.gcc 8050 253 31.8 S
403.gcc 8050 254 31.7 *
429.mcf 9120 329 27.7 S
429.mcf 9120 328 27.8 *
429.mcf 9120 327 27.9 S
445.gobmk 10490 469 22.4 S
445.gobmk 10490 468 22.4 S
445.gobmk 10490 468 22.4 *
456.hmmer 9330 347 26.9 S
456.hmmer 9330 427 21.8 S
456.hmmer 9330 348 26.8 *
458.sjeng 12100 460 26.3 S
458.sjeng 12100 662 18.3 S
458.sjeng 12100 460 26.3 *
462.libquantum 20720 268 77.3 *
462.libquantum 20720 268 77.4 S
462.libquantum 20720 341 60.8 S
464.h264ref 22130 504 43.9 S
464.h264ref 22130 500 44.2 S
464.h264ref 22130 503 44.0 *
471.omnetpp 6250 325 19.3 *
471.omnetpp 6250 324 19.3 S
471.omnetpp 6250 328 19.1 S
473.astar 7020 -- CE
483.xalancbmk 6900 -- CE

400.perlbench NR
401.bzip2 9650 492 19.6 *
403.gcc 8050 254 31.7 *
429.mcf 9120 328 27.8 *
445.gobmk 10490 468 22.4 *
456.hmmer 9330 348 26.8 *
458.sjeng 12100 460 26.3 *
462.libquantum 20720 268 77.3 *
464.h264ref 22130 503 44.0 *
471.omnetpp 6250 325 19.3 *
473.astar NR
483.xalancbmk NR
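
To compare the two tables, it helps to make the arithmetic explicit. In every row the ratio equals the reference time divided by the run time (for example, 9650 / 500 = 19.3 for 401.bzip2). The following is a minimal sketch, not part of the patch; the function names are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Ratio as it appears in the tables: reference time divided by run time.
// (Hypothetical helper, not part of any SPEC tooling.)
double specRatio(double RefSeconds, double RunSeconds) {
  assert(RunSeconds > 0 && "run time must be positive");
  return RefSeconds / RunSeconds;
}

// Relative change in run time, in percent.
// Negative means the "After" run was faster than the "Before" run.
double runTimeChangePercent(double BeforeSeconds, double AfterSeconds) {
  assert(BeforeSeconds > 0 && "baseline run time must be positive");
  return (AfterSeconds - BeforeSeconds) / BeforeSeconds * 100.0;
}
```

Applied to the selected runs, runTimeChangePercent(516, 503) gives about -2.5% for 464.h264ref (faster), while runTimeChangePercent(245, 254) gives about +3.7% for 403.gcc (slower). Several individual runs differ by more than that within a single configuration (458.sjeng ranges from 458 to 588 seconds before the change), so differences of this size should be read with caution.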

Event Timeline

dtemirbulatov created this revision. Feb 3 2019, 4:10 PM

Looks ok to me (after that comment fix), but @ABataev should probably have the final say.

lib/Transforms/Vectorize/SLPVectorizer.cpp, line 704

Please make this comment more explanatory.

It does not look correct to me. It seems you're throwing away the cost of the gather nodes.
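
One way to read this objection is with a toy model. The sketch below is not LLVM's code: ToyEntry, both cost functions, and the numbers are hypothetical, and it assumes a negative total is treated as profitable. It only illustrates how skipping gather entries changes the total.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for an SLP tree entry (not LLVM's TreeEntry).
struct ToyEntry {
  bool NeedToGather; // true if the bundle was rejected and must be built from scalars
  int Cost;          // vectorized entry: vector cost minus scalar cost;
                     // gather entry: cost of inserting the scalars into a vector
};

// Sums every entry, gather entries included.
int treeCostWithGathers(const std::vector<ToyEntry> &Tree) {
  int Total = 0;
  for (const ToyEntry &E : Tree)
    Total += E.Cost;
  return Total;
}

// Skips gather entries entirely, which drops their insertion cost from the total.
int treeCostSkippingGathers(const std::vector<ToyEntry> &Tree) {
  int Total = 0;
  for (const ToyEntry &E : Tree)
    if (!E.NeedToGather)
      Total += E.Cost;
  return Total;
}

// Made-up example: two profitable vector entries fed by one gather entry.
std::vector<ToyEntry> exampleTree() {
  return {{false, -2}, {false, -1}, {true, 4}};
}
```

With these made-up numbers, treeCostWithGathers(exampleTree()) is 1 (not profitable), while treeCostSkippingGathers(exampleTree()) is -3 (looks profitable). The gather still has to be emitted if the tree is vectorized, so leaving out its cost would make the tree look cheaper than it is.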

dtemirbulatov abandoned this revision. Feb 4 2019, 6:11 PM