Currently, pointer inductions are scalarized by the LoopVectorizer. Individual getelementptrs are built to create an address from the induction variable and are then inserted into vectors to be used by vector loads and stores. This patch enables the LoopVectorizer to build a phi of pointer type and provide the vector loads and stores with vector type getelementptrs built from the pointer induction variable.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The approach looks very reasonable to me. A first round of mainly nits, while in the mean time I go over this a second time and let things sink in.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
4192 | Ah, look at this....I was actually wondering about this last week, and now I found it :) | |
4222 | nit: unnecessary change? | |
4222 | Nit: can we just return here, and get rid of the "else"? That saves some indentation. | |
4223 | Nit: I don't know if the step is always a constant for a PtrInduction, so don't know if this should an assert or just a return. | |
4230 | Nit: NewOldPhi is a bit of a confusing name. | |
4246 | nit: this comment could be more descriptive. | |
4535 | Nit: this comment is slightly out of date, i.e. the ScalarPtrs part.... | |
4539 | .... because we are adding things to Worklist here. | |
4574 | Nit: missing . |
All nitpicks have been addressed.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
4222 | Yes, but also unnecessary return in the first place, and a change very close to the relevant changes? | |
4222 | Yes, we can. | |
4223 | Using the assert here was inspired by emitTransformedIndex, which was - and, if Cost->isScalarAfterVectorization(P, VF) is true, still is - used for scalarising the induction. It uses the exact same assert for the step of a pointer induction. | |
4230 | Okay, maybe a bit confusing. I renamed it to NewPointerPhi, that's a bit more straightforward. | |
4246 | Comment has been made more descriptive :) |
Hi, I had to revert this in 10c82eecbcb7 due to crashing when building gzip (util.c). Reduced as both C and IR:
$ cat repro.c void a(char* b) { for (char* c = 0; c != b;) if (*--c) *c = '_'; } $ clang -cc1 -emit-obj -target-cpu x86-64 -target-feature +sse4.2 -O2 -Wall -vectorize-loops repro.c PHI node entries do not match predecessors! %pointer.phi = phi i8* [ null, %vector.ph ], [ %2, %vector.body ] label %vector.body label %vector.ph clang: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8029: bool llvm::LoopVectorizePass::processLoop(llvm::Loop *): Assertion `!verifyFunction(*L->getHeader()->getParent(), &dbgs())' failed. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: /home/rupprecht/dev/clang -cc1 -emit-obj -target-cpu x86-64 -target-feature +sse4.2 -O2 -Wall -vectorize-loops repro.c 1. <eof> parser at end of file 2. Per-module optimization passes 3. Running pass 'Function Pass Manager' on module 'repro.c'. 4. Running pass 'Loop Vectorization' on function '@a' #0 0x0000000008912197 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/rupprecht/src/llvm-project/llvm/lib/Support/Unix/Signals.inc:564:11 #1 0x0000000008912339 PrintStackTraceSignalHandler(void*) /home/rupprecht/src/llvm-project/llvm/lib/Support/Unix/Signals.inc:625:1 #2 0x0000000008910b4b llvm::sys::RunSignalHandlers() /home/rupprecht/src/llvm-project/llvm/lib/Support/Signals.cpp:67:5 #3 0x0000000008912a95 SignalHandler(int) /home/rupprecht/src/llvm-project/llvm/lib/Support/Unix/Signals.inc:406:1 #4 0x00007f7c948ff110 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14110) #5 0x00007f7c92e7a761 raise /build/glibc-M65Gwz/glibc-2.30/signal/../sysdeps/unix/sysv/linux/raise.c:51:1 #6 0x00007f7c92e6455b abort /build/glibc-M65Gwz/glibc-2.30/stdlib/abort.c:81:7 #7 0x00007f7c92e6442f get_sysdep_segment_value /build/glibc-M65Gwz/glibc-2.30/intl/loadmsgcat.c:509:8 #8 0x00007f7c92e6442f _nl_load_domain /build/glibc-M65Gwz/glibc-2.30/intl/loadmsgcat.c:970:34 #9 0x00007f7c92e73092 (/lib/x86_64-linux-gnu/libc.so.6+0x34092) #10 0x0000000008adb37b llvm::LoopVectorizePass::processLoop(llvm::Loop*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8030:3 $ cat repro.ll ; ModuleID = './repro.ll' source_filename = "repro.c" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" ; Function Attrs: noinline nounwind define void @a(i8* %b) #0 { entry: %b.addr = alloca i8*, align 8 %c = alloca i8*, align 8 store i8* %b, i8** %b.addr, align 8 br label %for.cond for.cond: ; preds = %if.end, %entry %0 = load i8*, i8** %c, align 8 %1 = load i8*, i8** %b.addr, align 8 %cmp = icmp ne i8* %0, %1 br i1 %cmp, label %for.body, label %for.end for.body: ; preds = %for.cond %2 = load i8*, i8** %c, align 8 %incdec.ptr = getelementptr inbounds i8, i8* %2, i32 -1 store i8* %incdec.ptr, i8** %c, align 8 %3 = load i8, i8* %incdec.ptr, align 1 %tobool = icmp ne i8 %3, 0 br i1 %tobool, label %if.then, label %if.end if.then: ; preds = %for.body %4 = load i8*, i8** %c, align 8 store i8 95, i8* %4, align 1 br label %if.end if.end: ; preds = %if.then, %for.body br label %for.cond for.end: ; preds = %for.cond ret void } attributes #0 = { noinline nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" } !llvm.module.flags = !{!0} !0 = !{i32 1, !"wchar_size", i32 4} $ opt -O2 repro.ll -o repro.o <same as above>
Reopening as the patch is reverted
Hi, there was a bug concerning the location at which the new getelementptr instruction would be generated, thanks for catching this.
I have fixed the bug and derived a LoopVectorize test case from your C example.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
1033 | Latch might be a better name here. | |
llvm/test/Transforms/LoopVectorize/pointer-induction.ll | ||
5 | I think if you use x86 as a target (and needs it for the costing), the test needs to go into test/Transforms/LoopVectorize/X86 in case the target is not compiled in. | |
8 | Also some of this might be able to be cleaned up, like the local_unnamed_addr, the metadata and all/most(?) of the attributes. |
llvm/test/Transforms/LoopVectorize/pointer-induction.ll | ||
---|---|---|
5 | It looks like the options above actually force vectorization with a certain factor. In that case, it Is probably best to remove the triple. I'd also consider just checking the loop-vectorize output (without -dce -instcombine), if it is not too messy, as it makes the test more prone to break when something changes in instcombine. Also, it might be possible to only specifically check the IR related to the generated induction, rather than autogenerating the checks, which include a lot of relatively irrelevant stuff. |
Revisited the new test to make it cleaner.
llvm/test/Transforms/LoopVectorize/pointer-induction.ll | ||
---|---|---|
5 | I thought it did need the target information to behave in the right way, but apparently I was mistaken - so no relocation necessary, I removed the target. | |
5 | Thanks for the feedback, I don't have much experience writing opt tests so your advice is very welcome. I have removed the triple and the meta data, after checking that we don't need them, and reduced the checks to vector.ph, vector.body and the loop latch that changes the induction variable. | |
8 | Should be a lot cleaner now. |
Thanks. This LGTM.
Perhaps wait a few days until after the branch is taken though, to be on the safe side.
Latch might be a better name here.