Page MenuHomePhabricator

[LV] Enable the LoopVectorizer to create pointer inductions

Authored by anwel on Jun 5 2020, 8:49 AM.



Currently, pointer inductions are scalarized by the LoopVectorizer. Individual getelementptrs are built to create an address from the induction variable and are then inserted into vectors to be used by vector loads and stores. This patch enables the LoopVectorizer to build a phi of pointer type and provide the vector loads and stores with vector type getelementptrs built from the pointer induction variable.

Diff Detail

Event Timeline

anwel created this revision.Jun 5 2020, 8:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 5 2020, 8:49 AM
SjoerdMeijer added a comment.EditedJun 22 2020, 9:36 AM

The approach looks very reasonable to me. A first round of mainly nits, while in the mean time I go over this a second time and let things sink in.


Ah, look at this....I was actually wondering about this last week, and now I found it :)


nit: unnecessary change?


Nit: can we just return here, and get rid of the "else"? That saves some indentation.


Nit: I don't know if the step is always a constant for a PtrInduction, so don't know if this should an assert or just a return.


Nit: NewOldPhi is a bit of a confusing name.


nit: this comment could be more descriptive.


Nit: this comment is slightly out of date, i.e. the ScalarPtrs part....


.... because we are adding things to Worklist here.
I was getting confused a bit if this should not be ScalarsPtrs, but because it is an induction it makes sense to add it to Worklist?


Nit: missing .

anwel updated this revision to Diff 273615.Jun 26 2020, 1:42 AM
anwel marked 12 inline comments as done.

All nitpicks have been addressed.


Yes, but also unnecessary return in the first place, and a change very close to the relevant changes?


Yes, we can.


Using the assert here was inspired by emitTransformedIndex, which was - and, if Cost->isScalarAfterVectorization(P, VF) is true, still is - used for scalarising the induction. It uses the exact same assert for the step of a pointer induction.


Okay, maybe a bit confusing. I renamed it to NewPointerPhi, that's a bit more straightforward.


Comment has been made more descriptive :)

SjoerdMeijer accepted this revision.Jun 29 2020, 8:49 AM

This looks like a good change to me. Please wait a day or 2 with committing this in case @Ayal or @fhahn have additional comments.

This revision is now accepted and ready to land.Jun 29 2020, 8:49 AM
This revision was automatically updated to reflect the committed changes.
rupprecht reopened this revision.Jul 6 2020, 6:05 PM
rupprecht added a subscriber: rupprecht.

Hi, I had to revert this in 10c82eecbcb7 due to crashing when building gzip (util.c). Reduced as both C and IR:

$ cat repro.c
void a(char* b) {
  for (char* c = 0; c != b;)
    if (*--c) *c = '_';
$ clang -cc1 -emit-obj -target-cpu x86-64 -target-feature +sse4.2 -O2 -Wall -vectorize-loops repro.c
PHI node entries do not match predecessors!
  %pointer.phi = phi i8* [ null, ], [ %2, %vector.body ]
label %vector.body
clang: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8029: bool llvm::LoopVectorizePass::processLoop(llvm::Loop *): Assertion `!verifyFunction(*L->getHeader()->getParent(), &dbgs())' failed.
PLEASE submit a bug report to and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /home/rupprecht/dev/clang -cc1 -emit-obj -target-cpu x86-64 -target-feature +sse4.2 -O2 -Wall -vectorize-loops repro.c 
1.      <eof> parser at end of file
2.      Per-module optimization passes
3.      Running pass 'Function Pass Manager' on module 'repro.c'.
4.      Running pass 'Loop Vectorization' on function '@a'
 #0 0x0000000008912197 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/rupprecht/src/llvm-project/llvm/lib/Support/Unix/
 #1 0x0000000008912339 PrintStackTraceSignalHandler(void*) /home/rupprecht/src/llvm-project/llvm/lib/Support/Unix/
 #2 0x0000000008910b4b llvm::sys::RunSignalHandlers() /home/rupprecht/src/llvm-project/llvm/lib/Support/Signals.cpp:67:5
 #3 0x0000000008912a95 SignalHandler(int) /home/rupprecht/src/llvm-project/llvm/lib/Support/Unix/
 #4 0x00007f7c948ff110 __restore_rt (/lib/x86_64-linux-gnu/
 #5 0x00007f7c92e7a761 raise /build/glibc-M65Gwz/glibc-2.30/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #6 0x00007f7c92e6455b abort /build/glibc-M65Gwz/glibc-2.30/stdlib/abort.c:81:7
 #7 0x00007f7c92e6442f get_sysdep_segment_value /build/glibc-M65Gwz/glibc-2.30/intl/loadmsgcat.c:509:8
 #8 0x00007f7c92e6442f _nl_load_domain /build/glibc-M65Gwz/glibc-2.30/intl/loadmsgcat.c:970:34
 #9 0x00007f7c92e73092 (/lib/x86_64-linux-gnu/
#10 0x0000000008adb37b llvm::LoopVectorizePass::processLoop(llvm::Loop*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8030:3

$ cat repro.ll
; ModuleID = './repro.ll'
source_filename = "repro.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: noinline nounwind
define void @a(i8* %b) #0 {
  %b.addr = alloca i8*, align 8
  %c = alloca i8*, align 8
  store i8* %b, i8** %b.addr, align 8
  br label %for.cond

for.cond:                                         ; preds = %if.end, %entry
  %0 = load i8*, i8** %c, align 8
  %1 = load i8*, i8** %b.addr, align 8
  %cmp = icmp ne i8* %0, %1
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %2 = load i8*, i8** %c, align 8
  %incdec.ptr = getelementptr inbounds i8, i8* %2, i32 -1
  store i8* %incdec.ptr, i8** %c, align 8
  %3 = load i8, i8* %incdec.ptr, align 1
  %tobool = icmp ne i8 %3, 0
  br i1 %tobool, label %if.then, label %if.end

if.then:                                          ; preds = %for.body
  %4 = load i8*, i8** %c, align 8
  store i8 95, i8* %4, align 1
  br label %if.end

if.end:                                           ; preds = %if.then, %for.body
  br label %for.cond

for.end:                                          ; preds = %for.cond
  ret void

attributes #0 = { noinline nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}

!0 = !{i32 1, !"wchar_size", i32 4}
$ opt -O2 repro.ll -o repro.o
<same as above>

Reopening as the patch is reverted

This revision is now accepted and ready to land.Jul 6 2020, 6:05 PM
rupprecht requested changes to this revision.Jul 6 2020, 6:09 PM

I guess I should also request changes since there's a crash :)

This revision now requires changes to proceed.Jul 6 2020, 6:09 PM
anwel updated this revision to Diff 276391.Jul 8 2020, 5:48 AM

Hi, there was a bug concerning the location at which the new getelementptr instruction would be generated, thanks for catching this.
I have fixed the bug and derived a LoopVectorize test case from your C example.

dmgreen added inline comments.Jul 8 2020, 7:51 AM

Latch might be a better name here.


I think if you use x86 as a target (and needs it for the costing), the test needs to go into test/Transforms/LoopVectorize/X86 in case the target is not compiled in.


Also some of this might be able to be cleaned up, like the local_unnamed_addr, the metadata and all/most(?) of the attributes.

fhahn added inline comments.Jul 8 2020, 7:57 AM

It looks like the options above actually force vectorization with a certain factor. In that case, it Is probably best to remove the triple.

I'd also consider just checking the loop-vectorize output (without -dce -instcombine), if it is not too messy, as it makes the test more prone to break when something changes in instcombine. Also, it might be possible to only specifically check the IR related to the generated induction, rather than autogenerating the checks, which include a lot of relatively irrelevant stuff.

anwel updated this revision to Diff 276970.Fri, Jul 10, 3:03 AM
anwel marked 6 inline comments as done.

Revisited the new test to make it cleaner.


I thought it did need the target information to behave in the right way, but apparently I was mistaken - so no relocation necessary, I removed the target.


Thanks for the feedback, I don't have much experience writing opt tests so your advice is very welcome.

I have removed the triple and the meta data, after checking that we don't need them, and reduced the checks to, vector.body and the loop latch that changes the induction variable.


Should be a lot cleaner now.

dmgreen accepted this revision.Tue, Jul 14, 2:50 AM

Thanks. This LGTM.

Perhaps wait a few days until after the branch is taken though, to be on the safe side.

This revision was not accepted when it landed; it landed in state Needs Review.Fri, Jul 17, 5:35 AM
This revision was automatically updated to reflect the committed changes.