This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Support 64-bit vectorization (WIP)
Abandoned · Public

Authored by RKSimon on Jun 8 2021, 1:34 PM.

Details

Summary

Based off bugs such as:

https://bugs.llvm.org/show_bug.cgi?id=47491
https://bugs.llvm.org/show_bug.cgi?id=49933

This is a WIP patch investigating whether x86 can support vectorizing to 64-bit vector widths. Although 128 bits is the minimum architectural vector width on SSE targets, we do have fast 64-bit loads/stores, and we've widened all shorter vector types to 128 bits for some time, so there should be no extension/truncation nastiness to get in the way.

The description for getMinVectorRegisterBitWidth() is "The width of the smallest vector register type". If there's concern that we're stretching this too far, we could add a new 'getMinVectorizationBitWidth()' TTI call that calls getMinVectorRegisterBitWidth() by default, and use that instead. At the moment only SLP and VectorCombiner actually use it.
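
As a rough illustration of the fallback idea described above, the following is a minimal, self-contained sketch and not LLVM code. SketchTTI, SketchX86TTI and isWideEnough are hypothetical stand-ins invented for this example; only the two hook names come from the summary, and the 128/64 values are assumptions chosen to mirror the SSE case.

```cpp
// Hypothetical stand-in for a TTI-like interface; NOT LLVM's real class.
struct SketchTTI {
  virtual ~SketchTTI() = default;

  // Meaning taken from the summary: width of the smallest vector register
  // type. The value 128 is an assumption for this sketch.
  virtual unsigned getMinVectorRegisterBitWidth() const { return 128; }

  // Proposed hook (name from the summary): by default it just forwards to
  // the register width, so targets that do not override it are unchanged.
  virtual unsigned getMinVectorizationBitWidth() const {
    return getMinVectorRegisterBitWidth();
  }
};

// Hypothetical SSE-like target: registers stay 128 bits wide, but the
// vectorizer is allowed to start at 64 bits.
struct SketchX86TTI : SketchTTI {
  unsigned getMinVectorizationBitWidth() const override { return 64; }
};

// Hypothetical client-side check, as a vectorizer might perform it: is a
// candidate of NumElts elements of EltBits bits each wide enough?
bool isWideEnough(const SketchTTI &TTI, unsigned NumElts, unsigned EltBits) {
  return NumElts * EltBits >= TTI.getMinVectorizationBitWidth();
}
```

With this split, a 2 x i32 candidate (64 bits) would pass the check on the SSE-like target while still being rejected on a target that keeps the default, and the meaning of getMinVectorRegisterBitWidth() itself is left untouched.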

Event Timeline

RKSimon created this revision. Jun 8 2021, 1:34 PM
RKSimon requested review of this revision. Jun 8 2021, 1:34 PM
Herald added a project: Restricted Project. Jun 8 2021, 1:34 PM
RKSimon updated this revision to Diff 358938. Jul 15 2021, 5:53 AM
RKSimon edited the summary of this revision.

rebase - still WIP

I foresee some problems with this in LTO mode. Small vectors are a real pain with SLP + LTO: because the optimization happens too early, we may miss better opportunities in later passes. We need some kind of check that SLP is running either at link time with LTO or without LTO at all, and enable small vectors only in those modes. If SLP runs at compile time with LTO, it is better to limit it to large vectors.

One extra note: I ran into the same problem with the non-power-2 patch and had to limit the maximum allowed vectorization factor because of regressions with very small vectors.
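
The gating suggested in this comment could look roughly like the sketch below. This is only an illustration under stated assumptions: SketchPhase and minSLPVectorBits are hypothetical names, and nothing here reflects how LLVM actually exposes the LTO phase to the SLP vectorizer.

```cpp
// Hypothetical description of where in the pipeline SLP is running.
enum class SketchPhase {
  NoLTO,      // ordinary compile, no LTO
  LTOCompile, // compile-time step of an LTO build
  LTOLink     // link-time step of an LTO build
};

// Returns the minimum vector width (in bits) SLP should consider.
// SmallBits is the reduced width (e.g. 64); RegisterBits is the smallest
// vector register width (e.g. 128). Both are supplied by the caller.
unsigned minSLPVectorBits(SketchPhase Phase, unsigned SmallBits,
                          unsigned RegisterBits) {
  switch (Phase) {
  case SketchPhase::NoLTO:
  case SketchPhase::LTOLink:
    // No later whole-program passes will get another chance, so small
    // vectors are allowed here.
    return SmallBits;
  case SketchPhase::LTOCompile:
    // Too early: keep to large vectors and leave small ones for link time.
    return RegisterBits;
  }
  return RegisterBits; // conservative fallback for unexpected values
}
```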

RKSimon planned changes to this revision. Jul 15 2021, 8:15 AM

Cheers - this patch is still at the prototype stage and I'm still investigating how we can avoid regressions.

Hi Simon, could you try this patch and D108826 at LTO?

Not easily for at least another week - is there something in particular you're after? I'm on PTO and only have a rather under-powered Windows laptop at hand...

PS - I haven't looked at this patch for a little while, I was focusing on helping you with your non-pow2 support first to see how close that got us before deciding whether this patch was necessary.

No problem. It's just that the non-power-2 patch has very similar problems to this patch under LTO. OK, I'll try to test everything myself, but only after a 2-week vacation.

RKSimon abandoned this revision. Apr 28 2022, 7:59 AM

@ABataev is working on something similar (and more focussed) in D124284

Herald added a project: Restricted Project. Apr 28 2022, 7:59 AM
Herald added a subscriber: StephenFan.