This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve vectorization of gathered loads.
Needs ReviewPublic

Authored by ABataev on Jul 14 2021, 7:41 AM.

Details

Summary

When building the vectorization graph, the compiler may end up with the
consecutive loads in the different branches, which end up to be
gathered. We can scan these loads and try to load them as final
vectorized load and then reshuffle between the branches to avoid extra
scalar loads in the code.

Part of D57059

Diff Detail

Event Timeline

ABataev created this revision.Jul 14 2021, 7:41 AM
ABataev requested review of this revision.Jul 14 2021, 7:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2021, 7:41 AM
RKSimon edited the summary of this revision. (Show Details)Jul 14 2021, 7:47 AM

Any idea why so many of these test changes are just undef<->poison changes?

Any idea why so many of these test changes are just undef<->poison changes?

Changes in the vectorizations of gathers. I'm going to prepare a separate patch and fix it there.

A few minors

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3705

Add comment explaining this logic

3723

Can you clean this up? The clang-formating has made it difficult to follow.

4364

Could we end up in a situation where we end up truncating here and causing overflow problems later on?

6810–7064

By using .back() can't we safely use this for the Entries.size() == 1 case as well?

ABataev added inline comments.Jul 30 2021, 2:35 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4364

This is what we already have when we're trying to build the vector for the GEPs. Here we end up with the IntPtr type.

6810–7064

I had trouble with it for an unknown reason, will check it once again.

ABataev added inline comments.Jul 30 2021, 2:40 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6810–7064

Ahh, If I use .back() here, the compiler might emit something like this:

%vec = shuffle %v1, %v1, <0,1,2,undef>

I.e. the second argument is not replaced by the PoisonValue vector.

ABataev updated this revision to Diff 363210.Jul 30 2021, 2:57 PM

Rebase, fixes and improvements.

vporpo added a subscriber: vporpo.Nov 11 2021, 8:01 PM
RKSimon added inline comments.May 1 2022, 8:44 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2249

add newline between member variables

3010

Please can you move the canVectorizeLoads code as a NFC pre-commit so that the patch becomes smaller?

Herald added a project: Restricted Project. · View Herald TranscriptMay 1 2022, 8:44 AM

rebase?

Working on it

dtemirbulatov added a comment.EditedSep 5 2022, 2:09 PM

@ABataev, I accidently hit the error while reviewing your patch. I have tried 2819126d0c4f42573464ad5b64a2829e6c1f3c22 since I have merge error with the main branch.
Here is the reproducer:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @dfct(ptr %t) #0 {
entry:
  br label %vector.memcheck

vector.memcheck:                                  ; preds = %entry
  %bound0595 = icmp ult ptr null, null
  %bound1596 = icmp ult ptr null, %t
  %found.conflict597 = and i1 %bound0595, %bound1596
  %bound0599 = icmp ult ptr null, null
  %bound1600 = icmp ult ptr null, null
  %found.conflict601 = and i1 %bound0599, %bound1600
  %conflict.rdx602 = or i1 %found.conflict597, %found.conflict601
  %bound0603 = icmp ult ptr null, %t
  %bound1604 = icmp ult ptr null, %t
  %found.conflict605 = and i1 %bound0603, %bound1604
  %conflict.rdx606 = or i1 %conflict.rdx602, %found.conflict605
  %bound0607 = icmp ult ptr null, null
  %bound1608 = icmp ult ptr null, null
  %found.conflict609 = and i1 %bound0607, %bound1608
  %conflict.rdx610 = or i1 %conflict.rdx606, %found.conflict609
  br i1 %conflict.rdx610, label %for.body.preheader701, label %vector.ph

vector.ph:                                        ; preds = %vector.memcheck
  ret void

for.body.preheader701:                            ; preds = %vector.memcheck
  ret void
}

attributes #0 = { "target-features"="+avx,+avx2,+crc32,+cx8,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" }

it says "Instruction does not dominate all uses!"