Page MenuHomePhabricator

[SLP] Also try to vectorize incoming values of PHIs .
ClosedPublic

Authored by fhahn on Oct 2 2020, 5:55 AM.

Details

Summary

Currently we do not consider incoming values of PHIs as roots for SLP
vectorization. This means we miss scenarios like the one in the test
case and PR47670.

It appears quite straight-forward to consider incoming values of PHIs as
roots for vectorization, but I might be missing something that makes
this problematic.

In terms of vectorized instructions, this applies to quite a few
benchmarks across MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto

Same hash: 185 (filtered out)
Remaining: 52
Metric: SLP.NumVectorInstructions

Program                                        base    patch   diff
 test-suite...ProxyApps-C++/HPCCG/HPCCG.test     9.00   27.00  200.0%
 test-suite...C/CFP2000/179.art/179.art.test     8.00   22.00  175.0%
 test-suite...T2006/458.sjeng/458.sjeng.test    14.00   30.00  114.3%
 test-suite...ce/Benchmarks/PAQ8p/paq8p.test    11.00   18.00  63.6%
 test-suite...s/FreeBench/neural/neural.test    12.00   18.00  50.0%
 test-suite...rimaran/enc-3des/enc-3des.test    65.00   95.00  46.2%
 test-suite...006/450.soplex/450.soplex.test    63.00   89.00  41.3%
 test-suite...ProxyApps-C++/CLAMR/CLAMR.test   177.00  250.00  41.2%
 test-suite...nchmarks/McCat/18-imp/imp.test    13.00   18.00  38.5%
 test-suite.../Applications/sgefa/sgefa.test    26.00   35.00  34.6%
 test-suite...pplications/oggenc/oggenc.test   100.00  133.00  33.0%
 test-suite...6/482.sphinx3/482.sphinx3.test   103.00  134.00  30.1%
 test-suite...oxyApps-C++/miniFE/miniFE.test   169.00  213.00  26.0%
 test-suite.../Benchmarks/Olden/tsp/tsp.test    59.00   73.00  23.7%
 test-suite...TimberWolfMC/timberwolfmc.test   503.00  622.00  23.7%
 test-suite...T2006/456.hmmer/456.hmmer.test    65.00   79.00  21.5%
 test-suite...libquantum/462.libquantum.test    58.00   68.00  17.2%
 test-suite...ternal/HMMER/hmmcalibrate.test    84.00   98.00  16.7%
 test-suite...ications/JM/ldecod/ldecod.test   351.00  401.00  14.2%
 test-suite...arks/VersaBench/dbms/dbms.test    52.00   57.00   9.6%
 test-suite...ce/Benchmarks/Olden/bh/bh.test   118.00  128.00   8.5%
 test-suite.../Benchmarks/Bullet/bullet.test   6355.00 6880.00  8.3%
 test-suite...nsumer-lame/consumer-lame.test   480.00  519.00   8.1%
 test-suite...000/183.equake/183.equake.test   226.00  244.00   8.0%
 test-suite...chmarks/Olden/power/power.test   105.00  113.00   7.6%
 test-suite...6/471.omnetpp/471.omnetpp.test    92.00   99.00   7.6%
 test-suite...ications/JM/lencod/lencod.test   1173.00 1261.00  7.5%
 test-suite...0/253.perlbmk/253.perlbmk.test    55.00   59.00   7.3%
 test-suite...oxyApps-C/miniAMR/miniAMR.test    92.00   98.00   6.5%
 test-suite...chmarks/MallocBench/gs/gs.test   446.00  473.00   6.1%
 test-suite.../CINT2006/403.gcc/403.gcc.test   464.00  491.00   5.8%
 test-suite...6/464.h264ref/464.h264ref.test   998.00  1055.00  5.7%
 test-suite...006/453.povray/453.povray.test   5711.00 6007.00  5.2%
 test-suite...FreeBench/distray/distray.test   102.00  107.00   4.9%
 test-suite...:: External/Povray/povray.test   4184.00 4378.00  4.6%
 test-suite...DOE-ProxyApps-C/CoMD/CoMD.test   112.00  117.00   4.5%
 test-suite...T2006/445.gobmk/445.gobmk.test   104.00  108.00   3.8%
 test-suite...CI_Purple/SMG2000/smg2000.test   789.00  819.00   3.8%
 test-suite...yApps-C++/PENNANT/PENNANT.test   233.00  241.00   3.4%
 test-suite...marks/7zip/7zip-benchmark.test   417.00  428.00   2.6%
 test-suite...arks/mafft/pairlocalalign.test   627.00  643.00   2.6%
 test-suite.../Benchmarks/nbench/nbench.test   259.00  265.00   2.3%
 test-suite...006/447.dealII/447.dealII.test   4641.00 4732.00  2.0%
 test-suite...lications/ClamAV/clamscan.test   106.00  108.00   1.9%
 test-suite...CFP2000/177.mesa/177.mesa.test   1639.00 1664.00  1.5%
 test-suite...oxyApps-C/RSBench/rsbench.test    66.00   65.00  -1.5%
 test-suite.../CINT2000/252.eon/252.eon.test   3416.00 3444.00  0.8%
 test-suite...CFP2000/188.ammp/188.ammp.test   1846.00 1861.00  0.8%
 test-suite.../CINT2000/176.gcc/176.gcc.test   152.00  153.00   0.7%
 test-suite...CFP2006/444.namd/444.namd.test   3528.00 3544.00  0.5%
 test-suite...T2006/473.astar/473.astar.test    98.00   98.00   0.0%
 test-suite...frame_layout/frame_layout.test    NaN     39.00   nan%

On ARM64, there appears to be a slight regression on SPEC2006, which
might be interesting to investigate:

test-suite...T2006/473.astar/473.astar.test   0.9%

Diff Detail

Event Timeline

fhahn created this revision.Oct 2 2020, 5:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 2 2020, 5:55 AM
fhahn requested review of this revision.Oct 2 2020, 5:55 AM
fhahn retitled this revision from [SLP] Also try to vectorize incoming values of PHs . to [SLP] Also try to vectorize incoming values of PHIs ..Oct 2 2020, 5:58 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7630

I dropped the early return, because I am not sure if the bail-out on PHIs without 2 incoming values was intentional. I might split this off into a separate patch, if I can come up with an independent test.

lebedev.ri resigned from this revision.Oct 11 2020, 11:27 PM

@spatel, @RKSimon any concerns?

Improvements look great,

We're waiting for @ABataev to return from holiday, but it looks ok to me.

ABataev added inline comments.Oct 20 2020, 8:07 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7658–7663

I would add a check here that we're going to vectorize the incoming values from a different basic block only, it is better to postpone vectorization of the values from the current basic block. It may reduce the compile time, I assume.

fhahn updated this revision to Diff 299445.Oct 20 2020, 12:48 PM

Address comments, skip case where the incoming block matches the current block.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7658–7663

Thanks, I added a continue if the incoming block matches the current one.

ABataev added inline comments.Oct 20 2020, 12:58 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7642–7646

The comment is not quite correct, I think. The instructions from the current block would not necessarily be considered for vectorization, we still may miss them. That's why I said that it would be good to gather these instructions into a container and then process them after the end of the processing of the current block if they were not vectorized(deleted) yet.
This can be implemented later, just need to improve the comment and add a TODO to implement delayed vectorization for the incoming values from the current block.

fhahn updated this revision to Diff 299468.Oct 20 2020, 2:13 PM

Adjust comment, add TODO

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7642–7646

Thanks for clarifying. I updated the wording and added a TODO, please let me know if it is still missing something.

This revision is now accepted and ready to land.Oct 20 2020, 3:30 PM
fhahn added a comment.Mon, Nov 2, 1:52 PM

Land this?

yes, will do in the next few days.

This revision was landed with ongoing or failed builds.Fri, Nov 6, 4:59 AM
This revision was automatically updated to reflect the committed changes.

Hi!
This seems to fail if one of the predecessors to the PHI block is dead and contain weird stuff e.g. like this:

define void @foo() {
bb.0:
  br label %bb.1

dead:
  %tmp0 = add i16 %tmp0, undef
  br label %bb.1

bb.1:
  %tmp1 = phi i16 [ undef, %bb.0 ], [ %tmp0, %dead ]
  ret void
}

This seems to fail if one of the predecessors to the PHI block is dead and contain weird stuff e.g. like this:

Thanks for the example. I added an unreachable code bailout to avoid the bug:
08834979

fhahn added a comment.Tue, Nov 17, 1:52 PM

This seems to fail if one of the predecessors to the PHI block is dead and contain weird stuff e.g. like this:

Thanks for the example. I added an unreachable code bailout to avoid the bug:
08834979

Thanks for pushing the fix!