This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Control maximum vectorization factor from TTI
ClosedPublic

Authored by rampitec on Nov 24 2020, 2:33 PM.

Details

Summary

D82227 added a proper check to limit PHI vectorization to the
maximum vector register size. That unfortunately resulted in at
least a couple of regressions on SystemZ and x86.

This change reverts the PHI handling from D82227 and replaces it
with a more general check in SLPVectorizerPass::tryToVectorizeList().
Moving the check into tryToVectorizeList() allows vectorization
to restart if the initial chunk fails.

However, this function is more general and handles not only PHIs
but everything SLP handles. If the vectorization factor were
limited to the maximum vector register size here, it would restrict
far more vectorization than before, leading to further regressions.
Therefore a new TTI callback, getMaximumVF(), is added with a
default of 0, which preserves the current behavior and limits
nothing. Targets can then decide what is better for them.
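The intended semantics of the default can be sketched in plain C++. This is a minimal model, not the actual LLVM code; the function and parameter names are illustrative assumptions. A TTI result of 0 imposes no extra limit, while a nonzero result caps the vectorization factor:

```cpp
#include <algorithm>

// Hypothetical model of how an SLP-style pass could combine a target's
// getMaximumVF() answer with a candidate vectorization factor.
// A TTI result of 0 means "no target-imposed limit".
unsigned clampVF(unsigned CandidateVF, unsigned TTIMaxVF) {
  if (TTIMaxVF == 0)
    return CandidateVF;                    // default: limit nothing
  return std::min(CandidateVF, TTIMaxVF);  // target opted in to a cap
}
```

With the default of 0 every candidate factor passes through unchanged, preserving the pre-patch behavior; a target returning, say, 2 caps the factor at 2, and returning 1 effectively disables vectorization for the affected chains.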

The callback takes ElementSize, just like the similar getMinimumVF()
function, plus the main opcode of the chain. The latter is needed to
avoid regressions, at least on AMDGPU: we can have loads and stores
up to 128 bits wide, but only <2 x 16-bit> vector math on some
subtargets, where the rest shall not be vectorized. That is, we need
to differentiate based on both the element size and the operation
itself.
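As a hedged illustration of why the opcode matters, an AMDGPU-like target could answer differently for memory operations than for arithmetic. The enum and the numbers below are assumptions made for the sketch, not the real TTI hook:

```cpp
// Stand-ins for LLVM Instruction opcodes; purely illustrative.
enum Opcode { Load, Store, Add };

// Hypothetical target override: loads and stores may be up to 128 bits
// wide, other chains only vectorize as two 16-bit lanes (packed math),
// and everything else is left scalar (factor 1).
unsigned getMaximumVF(unsigned ElemSizeInBits, Opcode Op) {
  if (Op == Load || Op == Store)
    return 128 / ElemSizeInBits;        // e.g. <8 x i16> or <4 x i32>
  return ElemSizeInBits == 16 ? 2 : 1;  // <2 x i16> math on some subtargets
}
```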

Diff Detail

Event Timeline

rampitec created this revision.Nov 24 2020, 2:33 PM
Herald added a project: Restricted Project.Nov 24 2020, 2:33 PM
rampitec requested review of this revision.Nov 24 2020, 2:33 PM
rampitec updated this revision to Diff 307482.Nov 24 2020, 5:15 PM
jonpa added a comment.Nov 28 2020, 9:48 AM

Not sure how appropriate it is to commit a target test using llc into the Transforms directory.

You are probably right - perhaps it is enough to run opt and add a comment explaining that the CHECKed instructions would result in spilling.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
129

maybe include "slp" in the description as well, to be really clear..?

748

Is there a reason for having the default TTI value of 0 instead of UINT_MAX directly as you first suggested?

jonpa added a comment.Nov 28 2020, 9:49 AM

This patch solves the issue I was seeing on SystemZ - the benchmark regression is gone :-)

> Not sure how appropriate it is to commit a target test using llc into the Transforms directory.

> You are probably right - perhaps it is enough to run opt and add a comment explaining that the CHECKed instructions would result in spilling.

Yes, this is an SLP patch, so the regression test should only run opt -slp-vectorizer and either auto-generate checks for all of the lines or minimally check just enough lines to determine that the IR instructions are or are not vectorized as expected. If we want an unoptimized IR -> asm test, that should go into the test-suite rather than the regression tests.

jonpa added a comment.Dec 7 2020, 3:02 PM

@rampitec : It would be nice to see this go in soon...

rampitec updated this revision to Diff 310036.Dec 7 2020, 3:12 PM
rampitec marked an inline comment as done.

Updated option description.
Updated callback comment to clarify this is used by SLP only now.
Dropped BE test.
Added a store to the general test to make it useful if anyone needs to run the AMDGPU BE on it too.

> @rampitec : It would be nice to see this go in soon...

Sorry, just returned from vacation. I believe I have addressed the comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
748

Yes, there is getMinimumVF() with a default of 0. This is just for uniformity.

jonpa added a comment.Dec 11 2020, 3:52 PM

This LGTM, but for one little detail: "Maximum SLP vectorization factor" should perhaps include "0=unlimited" or something similar, to avoid 0 being confused with meaning "off". Or maybe that isn't needed for a hidden option?

@spatel looks good to you as well?

rampitec updated this revision to Diff 311336.Dec 11 2020, 4:42 PM

Added 0=unlimited to the option description.

spatel accepted this revision.Dec 12 2020, 9:00 AM

LGTM

This revision is now accepted and ready to land.Dec 12 2020, 9:00 AM
This revision was automatically updated to reflect the committed changes.

This commit seems to be causing some internal test failures (raising a SIGFPE). I'm working on getting more details or a reduced test case to share. Is disabling this with -mllvm -slp-max-vf=1 generally OK as a short term workaround? That seems to disable enough bits that the tests pass with that, but I'm not sure if there are other implications by setting that.

> This commit seems to be causing some internal test failures (raising a SIGFPE). I'm working on getting more details or a reduced test case to share. Is disabling this with -mllvm -slp-max-vf=1 generally OK as a short term workaround? That seems to disable enough bits that the tests pass with that, but I'm not sure if there are other implications by setting that.

Which tests are these? Setting the option to 1 effectively disables vectorization.

> This commit seems to be causing some internal test failures (raising a SIGFPE). I'm working on getting more details or a reduced test case to share. Is disabling this with -mllvm -slp-max-vf=1 generally OK as a short term workaround? That seems to disable enough bits that the tests pass with that, but I'm not sure if there are other implications by setting that.

> Which tests are these? Setting the option to 1 effectively disables vectorization.

Sorry, forgot to follow up: the issue ended up being due to use of feenableexcept() together with the default -ffp-exception-behavior=ignore -- that is, asking for traps on div-by-zero errors at runtime, while telling the compiler at build time that it may assume no exceptions will be raised. The crashing instruction was divpd %xmm0,%xmm12, and the SIGFPE came from a div-by-zero in one of the unused lanes of the packed value. The team is just removing feenableexcept(), which fixes the issue. No action needed here :)
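The failure mode can be reproduced without any vectorizer at all: a floating-point divide by zero always sets the FE_DIVBYZERO status flag, and glibc's feenableexcept() turns a set flag into a SIGFPE. The sketch below only tests the flag instead of unmasking it, so it runs to completion; it assumes an IEEE-754 host and relies on volatile to keep the divide from being folded away at compile time:

```cpp
#include <cfenv>

// Demonstrates that a div-by-zero sets the FE_DIVBYZERO status flag even
// when the result is never used -- the same flag an unused, padded lane
// of a vectorized divide would set. With feenableexcept(FE_DIVBYZERO)
// (a glibc extension) that set flag would instead deliver SIGFPE.
bool divByZeroFlagRaised() {
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double Zero = 0.0;      // volatile keeps the divide at runtime
  volatile double R = 1.0 / Zero;  // quiet by default: R becomes +inf
  (void)R;
  return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```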