This is an archive of the discontinued LLVM Phabricator instance.

[LV] Fix recording of BranchTakenCount for FoldTail
ClosedPublic

Authored by Ayal on Apr 24 2020, 5:59 PM.

Details

Reviewers

fhahn
gilr
rengolin

Commits

rGa3c964a278b4: [LV] Fix recording of BranchTakenCount for FoldTail

Summary

When folding the tail, the branch taken count (BackedgeTakenCount in the code) is
computed during initial VPlan execution and recorded for use by the compare that
computes the loop's mask. This recording should set the State directly, instead of
reusing the Value2VPValue mapping, which serves original Values present prior to
vectorization.
The branch taken count may be a constant Value that is also used elsewhere in
the loop; trying to employ Value2VPValue for both leads to the issue reported in
https://reviews.llvm.org/D76992#inline-721028
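
As an illustration of the summary, here is a minimal C++ sketch of the aliasing problem. It does not use LLVM: `Value`, `VPValue`, `ValueMap`, `recordBTCViaMap`, and `State` are hypothetical stand-ins, and the map only mimics the fact that Value2VPValue is keyed by the original IR Value. Because IR constants are uniqued, a constant trip-count-minus-one and a constant already used in the loop are the same key, so recording the count through the map displaces the existing entry.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for llvm::Value and VPValue (not the LLVM classes).
struct Value { std::string Name; };
struct VPValue { std::string Role; };

using ValueMap = std::map<const Value *, VPValue *>;

// Old scheme (sketch): record the backedge-taken count by keying the map on
// the IR value holding trip-count-minus-1. Returns the VPValue that was
// displaced, or nullptr if the key was free.
VPValue *recordBTCViaMap(ValueMap &Value2VPValue, const Value *TCMO,
                         VPValue *BTC) {
  VPValue *Displaced = nullptr;
  auto It = Value2VPValue.find(TCMO);
  if (It != Value2VPValue.end())
    Displaced = It->second;
  Value2VPValue[TCMO] = BTC;
  return Displaced;
}

// New scheme (sketch): keep the backedge-taken count in separate state,
// leaving the Value-keyed map untouched. One unroll part only, for brevity.
struct State {
  std::map<const VPValue *, const Value *> Generated;
  void set(const VPValue *Def, const Value *V) { Generated[Def] = V; }
};
```

With a map that already holds the constant as a loop operand, `recordBTCViaMap` returns that operand's VPValue (it has been overwritten), whereas `State::set` leaves the map as it was.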

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Ayal created this revision. Apr 24 2020, 5:59 PM

Herald added a reviewer: rengolin. Apr 24 2020, 5:59 PM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, vkmr, rogfer01 and 3 others.

Harbormaster completed remote builds in B54655: Diff 260045. Apr 24 2020, 6:58 PM

LGTM, thanks!

I'll add a reduced version of the crashing case from D78847 when I recommit it.

This revision is now accepted and ready to land. Apr 25 2020, 6:51 AM

fhahn mentioned this in D76992: [VPlan] Add & use VPValue operands for VPWidenRecipe (NFC). Apr 26 2020, 4:39 AM

Added test. Rebased.

In D78847#2003543, @fhahn wrote:

LGTM, thanks!

I'll add a reduced version of the crashing case from D78847 when I recommit it.

Thanks! Managed to devise a test independent of D76992 (proving it ain't the latter's fault ;-)

In D78847#2004129, @Ayal wrote:

Thanks! Managed to devise a test independent of D76992 (proving it ain't the latter's fault ;-)

Great, that's even better!

Closed by commit rGa3c964a278b4: [LV] Fix recording of BranchTakenCount for FoldTail (authored by Ayal). Apr 26 2020, 10:37 AM

This revision was automatically updated to reflect the committed changes.

Hello,

The fix delivered in this patch causes an assertion failure on a testcase whose loop has a very small trip count and goes through fold-tail-by-masking.
The reduced testcase is shown below. The options are: -loop-vectorize -force-vector-interleave=4

For your reference, at the time of this writing, my HEAD is at commit 2d3f5a62de8e5d2cc25aaa49d0a00d31ed32544a

$ cat simple.ll
define void @foo() {
entry:
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, 15
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

$ opt -loop-vectorize -force-vector-interleave=4 -S ./simple.ll

opt: llvm-project/llvm/include/llvm/IR/Instructions.h:1140: void llvm::ICmpInst::AssertOK(): Assertion `getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to ICmp instruction are not of the same type!"' failed.

Stack dump:
0.      Program arguments: build/bin/opt -loop-vectorize -force-vector-interleave=4 -S -debug-only=loop-vectorize simple.ll
1.      Running pass 'Function Pass Manager' on module 'simple.ll'.
2.      Running pass 'Loop Vectorization' on function '@foo'
 #0 0x00007c2509ec9bf4 PrintStackTraceSignalHandler(void*) (build/bin/../lib/libLLVMSupport.so.11git+0x1e9bf4)
 #1 0x00007c2509ec6b98 llvm::sys::RunSignalHandlers() (build/bin/../lib/libLLVMSupport.so.11git+0x1e6b98)
 #2 0x00007c2509ec9f04 SignalHandler(int) (build/bin/../lib/libLLVMSupport.so.11git+0x1e9f04)
 #3 0x00007c250ff004d8 (linux-vdso64.so.1+0x4d8)
 #4 0x00007c25090de98c __libc_signal_restore_set /build/glibc-uvws04/glibc-2.27/signal/../sysdeps/unix/sysv/linux/nptl-signals.h:80:0
 #5 0x00007c25090de98c raise /build/glibc-uvws04/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:48:0
 #6 0x00007c25090e0be0 abort /build/glibc-uvws04/glibc-2.27/stdlib/abort.c:79:0
 #7 0x00007c25090cbb38 __assert_fail_base /build/glibc-uvws04/glibc-2.27/assert/assert.c:92:0
 #8 0x00007c25090cbbe4 __assert_fail /build/glibc-uvws04/glibc-2.27/assert/assert.c:101:0
 #9 0x00007c25098fe518 llvm::ICmpInst::AssertOK() (build/bin/../lib/libLLVMVectorize.so.11git+0x7e518)
#10 0x00007c25098cc31c llvm::IRBuilderBase::CreateICmp(llvm::CmpInst::Predicate, llvm::Value*, llvm::Value*, llvm::Twine const&) (build/bin/../lib/libLLVMVectorize.so.11git+0x4c31c)
#11 0x00007c250996130c llvm::VPInstruction::generateInstruction(llvm::VPTransformState&, unsigned int) (build/bin/../lib/libLLVMVectorize.so.11git+0xe130c)
#12 0x00007c2509961810 llvm::VPInstruction::execute(llvm::VPTransformState&) (build/bin/../lib/libLLVMVectorize.so.11git+0xe1810)
#13 0x00007c25099604bc llvm::VPBasicBlock::execute(llvm::VPTransformState*) (build/bin/../lib/libLLVMVectorize.so.11git+0xe04bc)
#14 0x00007c25099620a8 llvm::VPlan::execute(llvm::VPTransformState*) (build/bin/../lib/libLLVMVectorize.so.11git+0xe20a8)
#15 0x00007c25098eb368 llvm::LoopVectorizationPlanner::executePlan(llvm::InnerLoopVectorizer&, llvm::DominatorTree*) (build/bin/../lib/libLLVMVectorize.so.11git+0x6b368)
#16 0x00007c25098f7380 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (build/bin/../lib/libLLVMVectorize.so.11git+0x77380)
#17 0x00007c25098f8b74 llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) (build/bin/../lib/libLLVMVectorize.so.11git+0x78b74)
#18 0x00007c2509903320 (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) (build/bin/../lib/libLLVMVectorize.so.11git+0x83320)
#19 0x00007c250af0069c llvm::FPPassManager::runOnFunction(llvm::Function&) (build/bin/../lib/libLLVMCore.so.11git+0x27069c)
#20 0x00007c250af00b00 llvm::FPPassManager::runOnModule(llvm::Module&) (build/bin/../lib/libLLVMCore.so.11git+0x270b00)
#21 0x00007c250af012b4 llvm::legacy::PassManagerImpl::run(llvm::Module&) (build/bin/../lib/libLLVMCore.so.11git+0x2712b4)
#22 0x00007c250af0195c llvm::legacy::PassManager::run(llvm::Module&) (build/bin/../lib/libLLVMCore.so.11git+0x27195c)
#23 0x0000000010031aec main (build/bin/opt+0x10031aec)
#24 0x00007c25090b441c generic_start_main /build/glibc-uvws04/glibc-2.27/csu/../csu/libc-start.c:310:0
#25 0x00007c25090b4618 __libc_start_main /build/glibc-uvws04/glibc-2.27/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:116:0

It looks like recording a vector value for BackedgeTakenCount causes the type mismatch when generating the icmp instructions:

...
vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %induction = add i64 %index, 0
  %induction1 = add i64 %index, 1
  %induction2 = add i64 %index, 2
  %induction3 = add i64 %index, 3
  %0 = icmp ule i64 %induction, 14   <======== assert when generating this line

In this example, the operand[0] (%induction) correctly has type i64, but the loop bound (14) is of vector type <1 x i64>
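
For context, the compare that asserts is the one producing the fold-tail mask: a lane stays active while its induction value is at most the backedge-taken count, which here is 14 (the trip count of 15 minus one). The following is a rough standalone C++ sketch of that predicate, not the vectorizer's code; `tailMask` and `activeLanes` are hypothetical names, and the sketch assumes a trip count of at least 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of the fold-tail mask for one vector iteration starting at Index.
// A lane is active iff its induction value is <= the backedge-taken count.
// VF = lanes per part, UF = interleave count (illustrative parameters).
std::vector<bool> tailMask(uint64_t Index, unsigned VF, unsigned UF,
                           uint64_t BackedgeTakenCount) {
  std::vector<bool> Mask;
  for (unsigned Part = 0; Part < UF; ++Part)
    for (unsigned Lane = 0; Lane < VF; ++Lane)
      Mask.push_back(Index + uint64_t(Part) * VF + Lane <= BackedgeTakenCount);
  return Mask;
}

// Count the lanes that execute over the whole rounded-up vector loop.
// Assumes TripCount >= 1.
uint64_t activeLanes(uint64_t TripCount, unsigned VF, unsigned UF) {
  uint64_t Step = uint64_t(VF) * UF;
  uint64_t Active = 0;
  for (uint64_t Index = 0; Index < TripCount; Index += Step)
    for (bool B : tailMask(Index, VF, UF, TripCount - 1))
      Active += B;
  return Active;
}
```

For the reduced testcase (trip count 15, VF 1, interleave 4), the last vector iteration starts at index 12 and only its first three parts are active, so exactly 15 scalar iterations run in total.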

There might be multiple ways to address this assert failure. I list a few simple ones below for your reference; they may or may not be good solutions.

Option 1: Do not generate the icmp instructions for %induction. In this particular testcase, these instructions seem to be redundant.
Option 2: If we do generate the icmp instructions above, can we set the BackedgeTakenCount in the State depending on the type of the first operand? In cases like this one, where the first operand is not of vector type, using Value *TCMO instead of Value *VTCMO might be an option.

I will open a Bugzilla and copy its link to this page when my password reset goes through.

Thanks,
Anh

In D78847#2028708, @anhtuyen wrote:

Hello,

[snip]

In this example, the operand[0] (%induction) correctly has type i64, but the loop bound (14) is of vector type <1 x i64>

There might be multiple ways to address this assert failure. I list below a few simple ones for your reference: they might or might not be a good solution at all.

Option 1: Not to generate the icmp instructions for %induction. In the particular case of this testcase, these instructions seem to be redundant.

Option 2: If we are to generate the icmp instructions above, can we set the BackedgeTakenCount to the State depending on the type of the first operand? In cases like this one when the first operand is not a vector type, using Value *TCMO instead of Value *VTCMO might be an option.

I will open a Bugzilla and copy its link to this page when my password reset goes through.

Thanks,
Anh

Yes, thanks for catching this!
One quick fix is indeed to set VTCMO to TCMO when State->VF == 1, instead of "splatting" it into a vector of a single element.
I am wondering whether fold-tail-by-masking should be restricted to work for VF>1 only, given that only vectors (loads/stores) get masked.
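
The quick fix can be pictured with a toy type model in standalone C++. This is only a sketch under stated assumptions: `Ty`, `splatType`, `btcType`, and `icmpOK` are hypothetical helpers, not LLVM APIs. They capture two facts from the report: splatting with one element yields `<1 x i64>` rather than `i64`, and icmp requires both operands to have the same type.

```cpp
#include <cassert>

// Toy type: Lanes == 0 means scalar iN, Lanes >= 1 means <Lanes x iN>.
struct Ty {
  unsigned Bits;
  unsigned Lanes;
  bool operator==(const Ty &O) const {
    return Bits == O.Bits && Lanes == O.Lanes;
  }
};

// A splat always produces a vector type, even for a single lane.
Ty splatType(Ty Scalar, unsigned VF) { return Ty{Scalar.Bits, VF}; }

// Type of the recorded backedge-taken count. With ScalarWhenVF1 set, the
// scalar is kept as-is when VF == 1 (the quick fix); otherwise it is splat.
Ty btcType(Ty Scalar, unsigned VF, bool ScalarWhenVF1) {
  if (ScalarWhenVF1 && VF == 1)
    return Scalar;
  return splatType(Scalar, VF);
}

// icmp is only well-formed when both operand types are identical.
bool icmpOK(Ty LHS, Ty RHS) { return LHS == RHS; }
```

In this model, a scalar `i64` induction compared against a splat count fails the check at VF 1 and passes once the count is kept scalar. The check still depends on the other operand being scalar as well.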

In D78847#2028759, @Ayal wrote:

Yes, thanks for catching this!
One quick fix is indeed to set VTCMO to TCMO when State->VF == 1, instead of "splatting" it into a vector of a single element.
Thinking if fold-tail-by-masking should be restricted to work for VF>1 only, given that only vectors (loads/stores) get masked.

I also came up with (and gave up on) that fix last week, because it would not work for a loop whose VF is 1 but whose loop bound is a vector. I will come up with an example shortly to demonstrate my thought.

In D78847#2028819, @anhtuyen wrote:

I also came up with (and gave up on) that fix last week, because it would not work for a loop whose VF is 1 but whose loop bound is a vector. I will come up with an example shortly to demonstrate my thought.

Below is an example demonstrating that setting VTCMO to TCMO when State->VF == 1 will not help for a loop with VF 1 whose loop bound is a vector.
In this example, I add some profile metadata to guide the optimization selection process.
As you can tell from the metadata, the numeric values were chosen more or less arbitrarily as 10, 20, 30, and so on.
Almost any values will work as long as they are meaningful.

$ cat ./simple2.ll

define void @foo() !prof !12 {
entry:
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %addr = phi double* [ %ptr, %for.body ], [ undef, %entry ]
  %ptr = getelementptr inbounds double, double* %addr, i64 1
  %cond = icmp eq double* %ptr, undef
  br i1 %cond, label %for.cond.cleanup, label %for.body
}

!llvm.module.flags = !{!0}

!0 = !{i32 1, !"ProfileSummary", !1}
!1 = !{!2, !3, !4, !5, !6, !7, !8, !9}
!2 = !{!"ProfileFormat", !"InstrProf"}
!3 = !{!"TotalCount", i64 10}
!4 = !{!"MaxCount", i64 20}
!5 = !{!"MaxInternalCount", i64 30}
!6 = !{!"MaxFunctionCount", i64 40}
!7 = !{!"NumCounts", i64 50}
!8 = !{!"NumFunctions", i64 60}
!9 = !{!"DetailedSummary", !10}
!10 = !{!11}
!11 = !{i32 999999, i64 70, i32 80}
!12 = !{!"function_entry_count", i64 0}

Again, I will use the same option -loop-vectorize -force-vector-interleave=4

$ opt -loop-vectorize -force-vector-interleave=4 -S ./simple2.ll

opt: llvm-project/llvm/include/llvm/IR/Instructions.h:1144: void llvm::ICmpInst::AssertOK(): Assertion `getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to ICmp instruction are not of the same type!"' failed.

Stack dump:
0.      Program arguments: build/bin/opt -loop-vectorize -force-vector-interleave=4 -S -debug-only=loop-vectorize simple2.ll
1.      Running pass 'Function Pass Manager' on module 'simple2.ll'.
2.      Running pass 'Loop Vectorization' on function '@foo'
 #0 0x000070cd938b9bf4 PrintStackTraceSignalHandler(void*) (build/bin/../lib/libLLVMSupport.so.11git+0x1e9bf4)
 #1 0x000070cd938b6b98 llvm::sys::RunSignalHandlers() (build/bin/../lib/libLLVMSupport.so.11git+0x1e6b98)
 #2 0x000070cd938b9f04 SignalHandler(int) (build/bin/../lib/libLLVMSupport.so.11git+0x1e9f04)
 #3 0x000070cd998f04d8 (linux-vdso64.so.1+0x4d8)
 #4 0x000070cd92ace98c __libc_signal_restore_set /build/glibc-uvws04/glibc-2.27/signal/../sysdeps/unix/sysv/linux/nptl-signals.h:80:0
 #5 0x000070cd92ace98c raise /build/glibc-uvws04/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:48:0
 #6 0x000070cd92ad0be0 abort /build/glibc-uvws04/glibc-2.27/stdlib/abort.c:79:0
 #7 0x000070cd92abbb38 __assert_fail_base /build/glibc-uvws04/glibc-2.27/assert/assert.c:92:0
 #8 0x000070cd92abbbe4 __assert_fail /build/glibc-uvws04/glibc-2.27/assert/assert.c:101:0
 #9 0x000070cd932ee5bc llvm::ICmpInst::AssertOK() (build/bin/../lib/libLLVMVectorize.so.11git+0x7e5bc)
#10 0x000070cd932bc37c llvm::IRBuilderBase::CreateICmp(llvm::CmpInst::Predicate, llvm::Value*, llvm::Value*, llvm::Twine const&) (build/bin/../lib/libLLVMVectorize.so.11git+0x4c37c)
#11 0x000070cd933513cc llvm::VPInstruction::generateInstruction(llvm::VPTransformState&, unsigned int) (build/bin/../lib/libLLVMVectorize.so.11git+0xe13cc)
#12 0x000070cd933518d0 llvm::VPInstruction::execute(llvm::VPTransformState&) (build/bin/../lib/libLLVMVectorize.so.11git+0xe18d0)
#13 0x000070cd9335057c llvm::VPBasicBlock::execute(llvm::VPTransformState*) (build/bin/../lib/libLLVMVectorize.so.11git+0xe057c)
#14 0x000070cd93352188 llvm::VPlan::execute(llvm::VPTransformState*) (build/bin/../lib/libLLVMVectorize.so.11git+0xe2188)
#15 0x000070cd932db3c8 llvm::LoopVectorizationPlanner::executePlan(llvm::InnerLoopVectorizer&, llvm::DominatorTree*) (build/bin/../lib/libLLVMVectorize.so.11git+0x6b3c8)
#16 0x000070cd932e73e0 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (build/bin/../lib/libLLVMVectorize.so.11git+0x773e0)
#17 0x000070cd932e8bd4 llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) (build/bin/../lib/libLLVMVectorize.so.11git+0x78bd4)
#18 0x000070cd932f33e0 (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) (build/bin/../lib/libLLVMVectorize.so.11git+0x833e0)
#19 0x000070cd948f073c llvm::FPPassManager::runOnFunction(llvm::Function&) (build/bin/../lib/libLLVMCore.so.11git+0x27073c)
#20 0x000070cd948f0ba0 llvm::FPPassManager::runOnModule(llvm::Module&) (build/bin/../lib/libLLVMCore.so.11git+0x270ba0)
#21 0x000070cd948f1354 llvm::legacy::PassManagerImpl::run(llvm::Module&) (build/bin/../lib/libLLVMCore.so.11git+0x271354)
#22 0x000070cd948f19fc llvm::legacy::PassManager::run(llvm::Module&) (build/bin/../lib/libLLVMCore.so.11git+0x2719fc)
#23 0x0000000010031aec main (build/bin/opt+0x10031aec)
#24 0x000070cd92aa441c generic_start_main /build/glibc-uvws04/glibc-2.27/csu/../csu/libc-start.c:310:0
#25 0x000070cd92aa4618 __libc_start_main /build/glibc-uvws04/glibc-2.27/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:116:0

When BackedgeTakenCount is set to TCMO if State->VF == 1, the mismatch occurs here:

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %0 = add i64 %index, 0
  %next.gep = getelementptr double, double* undef, i64 %0
  %1 = add i64 %index, 1
  %next.gep1 = getelementptr double, double* undef, i64 %1
  %2 = add i64 %index, 2
  %next.gep2 = getelementptr double, double* undef, i64 %2
  %3 = add i64 %index, 3
  %next.gep3 = getelementptr double, double* undef, i64 %3
  %broadcast.splatinsert = insertelement <1 x i64> undef, i64 %index, i32 0
  %broadcast.splat = shufflevector <1 x i64> %broadcast.splatinsert, <1 x i64> undef, <1 x i32> zeroinitializer
  %vec.iv = add <1 x i64> %broadcast.splat, zeroinitializer
  %vec.iv4 = add <1 x i64> %broadcast.splat, <i64 1>
  %vec.iv5 = add <1 x i64> %broadcast.splat, <i64 2>
  %vec.iv6 = add <1 x i64> %broadcast.splat, <i64 3>
  %4 = icmp ule <1 x i64> %vec.iv, <i64 2305843009213693951>  <======== assert when generating this line

In this case, operand[0] (which is %vec.iv) has type <1 x i64>. The loop bound, however, gets type i64 instead of the expected vector <i64 2305843009213693951> of type <1 x i64>.

In D78847#2028853, @anhtuyen wrote:

[snip]

Below is an example to demonstrate that setting VTCMO to TCMO when State->VF == 1 will not help in the case of a loop of VF 1 having a vector loop-bound.

[snip]

In this case, the operand[0] (which is %vec.iv) has type <1 x i64>. The loop-bound, however, will get the type as **i64** instead of the expected **<i64 2305843009213693951>** .

Right, VPWidenCanonicalIVRecipe::execute() also needs to treat VF==1 differently.

In D78847#2028930, @Ayal wrote:
In D78847#2028853, @anhtuyen wrote:

[snip]

Below is an example to demonstrate that setting VTCMO to TCMO when State->VF == 1 will not help in the case of a loop of VF 1 having a vector loop-bound.

[snip]
In this case, the operand[0] (which is %vec.iv) has type <1 x i64>. The loop-bound, however, will get the type as **i64** instead of the expected **<i64 2305843009213693951>** .
Right, VPWidenCanonicalIVRecipe::execute() also needs to treat VF==1 differently.

I looked at that, too. It still asserts, just at a different location. A little more work is needed.

opt -loop-vectorize -force-vector-interleave=4 -S simple2.ll

opt: llvm-project/llvm/lib/IR/Instructions.cpp:2446: static llvm::BinaryOperator *llvm::BinaryOperator::Create(llvm::Instruction::BinaryOps, llvm::Value *, llvm::Value *, const llvm::Twine &, llvm::Instruction *): Assertion `S1->getType() == S2->getType() && "Cannot create binary operator with two operands of differing type!"' failed.

In D78847#2029339, @anhtuyen wrote:

[snip]

Right, VPWidenCanonicalIVRecipe::execute() also needs to treat VF==1 differently.

I looked at that, too. It still gives us the assert at a different location. We will need a little more work to do.

The above implies, when VF==1, setting VStart to CanonicalIV instead of splatting it, and setting VStep to ConstantInt::get(STy, Part) instead of ConstantVector::get(Indices). Would doing so pass all your tests?

Some of this issue stems from not using the overloaded getBroadcastInstrs().

The more general issues raised are whether to apply foldTail when VF==1 in the absence of masked scalar loads/stores, and/or whether to internally turn foldTail on for small loops (due to cost considerations) when the VF and/or UF are provided externally (bypassing their cost-based selection process).
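
The VF==1 special case suggested above can be sketched in standalone C++ as follows. This is an illustration only, not the code of VPWidenCanonicalIVRecipe::execute(): `IVPart` and `widenCanonicalIV` are hypothetical names, and the lane formula (Part * VF + Lane added to the canonical IV) is an assumption read off the generated IR quoted earlier in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Result for one unroll part: either a scalar or a vector of VF lanes.
struct IVPart {
  bool IsVector;
  std::vector<uint64_t> Lanes; // exactly one entry when !IsVector
};

// Sketch: widened canonical IV for unroll part Part.
// VF > 1 : splat(Index) + <Part*VF + 0, ..., Part*VF + VF-1>  (vector)
// VF == 1: Index + Part, with no splat and a scalar step      (scalar)
IVPart widenCanonicalIV(uint64_t Index, unsigned VF, unsigned Part) {
  if (VF == 1)
    return IVPart{false, {Index + Part}};
  std::vector<uint64_t> Lanes(VF, Index); // stands in for the splat
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    Lanes[Lane] += uint64_t(Part) * VF + Lane;
  return IVPart{true, Lanes};
}
```

Keeping the VF==1 result scalar means it has the same shape as a scalar backedge-taken count, so the mask compare sees matching operand types.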

In D78847#2031205, @Ayal wrote:
In D78847#2029339, @anhtuyen wrote:

[snip]
Right, VPWidenCanonicalIVRecipe::execute() also needs to treat VF==1 differently.

I looked at that, too. It still gives us the assert at a different location. We will need a little more work to do.
The above implies, when VF==1, setting VStart to CanonicalIV instead of splatting it, and setting VStep to ConstantInt::get(STy, Part) instead of ConstantVector::get(Indices). Would doing so pass all your tests?

Some of this issue stems from not using the overloaded getBroadcastInstrs().

The more general issues raised are whether to apply foldTail when VF==1 in the absence of masked scalar loads/stores, and/or whether to internally turn foldTail on for small loops (due to cost considerations) when the VF and/or UF are provided externally (bypassing their cost-based selection process).

Ah, yes! That should work for me. Thanks!
This is just my personal preference, but I think fold-tail-by-masking should be restricted to VF>1 only.

Based on the discussion with Ayal, I created a quick patch for the type-mismatch issue I reported above.
https://reviews.llvm.org/D79976

Revision Contents

Path                                                                              Size
llvm/lib/Transforms/Vectorize/VPlan.h                                             5 lines
llvm/lib/Transforms/Vectorize/VPlan.cpp                                           4 lines
llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll   8 lines
llvm/test/Transforms/LoopVectorize/tail-folding-counting-down.ll                  33 lines
Diff 260174

llvm/lib/Transforms/Vectorize/VPlan.h

   VPlan(VPBlockBase *Entry = nullptr) : Entry(Entry) {
     if (Entry)
       Entry->setPlan(this);
   }

   ~VPlan() {
     if (Entry)
       VPBlockBase::deleteCFG(Entry);
     for (auto &MapEntry : Value2VPValue)
-      if (MapEntry.second != BackedgeTakenCount)
-        delete MapEntry.second;
+      delete MapEntry.second;
     if (BackedgeTakenCount)
-      delete BackedgeTakenCount; // Delete once, if in Value2VPValue or not.
+      delete BackedgeTakenCount;
     for (VPValue *Def : VPExternalDefs)
       delete Def;
     for (VPValue *CBV : VPCBVs)
       delete CBV;
   }

   /// Generate the IR code for this VPlan.
   void execute(struct VPTransformState *State);

llvm/lib/Transforms/Vectorize/VPlan.cpp

 /// basic-blocks as needed, and fill them all.
 void VPlan::execute(VPTransformState *State) {
   // -1. Check if the backedge taken count is needed, and if so build it.
   if (BackedgeTakenCount && BackedgeTakenCount->getNumUsers()) {
     Value *TC = State->TripCount;
     IRBuilder<> Builder(State->CFG.PrevBB->getTerminator());
     auto *TCMO = Builder.CreateSub(TC, ConstantInt::get(TC->getType(), 1),
                                    "trip.count.minus.1");
-    Value2VPValue[TCMO] = BackedgeTakenCount;
+    Value *VTCMO = Builder.CreateVectorSplat(State->VF, TCMO, "broadcast");
+    for (unsigned Part = 0, UF = State->UF; Part < UF; ++Part)
+      State->set(BackedgeTakenCount, VTCMO, Part);
   }

   // 0. Set the reverse mapping from VPValues to Values for code generation.
   for (auto &Entry : Value2VPValue)
     State->VPValue2Value[Entry.second] = Entry.first;

   BasicBlock *VectorPreHeaderBB = State->CFG.PrevBB;
   BasicBlock *VectorHeaderBB = VectorPreHeaderBB->getSingleSuccessor();

llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll

 ; ENABLED_MASKED_STRIDED-NEXT: entry:
 ; ENABLED_MASKED_STRIDED-NEXT: [[CMP9:%.*]] = icmp sgt i32 [[N:%.*]], 0
 ; ENABLED_MASKED_STRIDED-NEXT: br i1 [[CMP9]], label [[VECTOR_PH:%.*]], label [[FOR_END:%.*]]
 ; ENABLED_MASKED_STRIDED: vector.ph:
 ; ENABLED_MASKED_STRIDED-NEXT: [[CONV:%.*]] = zext i8 [[GUARD:%.*]] to i32
 ; ENABLED_MASKED_STRIDED-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 7
 ; ENABLED_MASKED_STRIDED-NEXT: [[N_VEC:%.*]] = and i32 [[N_RND_UP]], -8
 ; ENABLED_MASKED_STRIDED-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = add i32 [[N]], -1
-; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> undef, i32 [[CONV]], i32 0
-; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> undef, <8 x i32> zeroinitializer
 ; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <8 x i32> undef, i32 [[TRIP_COUNT_MINUS_1]], i32 0
 ; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT1]], <8 x i32> undef, <8 x i32> zeroinitializer
+; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> undef, i32 [[CONV]], i32 0
+; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> undef, <8 x i32> zeroinitializer
 ; ENABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]
 ; ENABLED_MASKED_STRIDED: vector.body:
 ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND:%.*]] = phi <8 x i32> [ <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = icmp ugt <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = shl nuw nsw i32 [[INDEX]], 1
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* [[P:%.*]], i32 [[TMP1]]
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = icmp ule <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT2]]

[snip]

 ; ENABLED_MASKED_STRIDED-NEXT: entry:
 ; ENABLED_MASKED_STRIDED-NEXT: [[CMP9:%.*]] = icmp sgt i32 [[N:%.*]], 0
 ; ENABLED_MASKED_STRIDED-NEXT: br i1 [[CMP9]], label [[VECTOR_PH:%.*]], label [[FOR_END:%.*]]
 ; ENABLED_MASKED_STRIDED: vector.ph:
 ; ENABLED_MASKED_STRIDED-NEXT: [[CONV:%.*]] = zext i8 [[GUARD:%.*]] to i32
 ; ENABLED_MASKED_STRIDED-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 7
 ; ENABLED_MASKED_STRIDED-NEXT: [[N_VEC:%.*]] = and i32 [[N_RND_UP]], -8
 ; ENABLED_MASKED_STRIDED-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = add i32 [[N]], -1
-; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> undef, i32 [[CONV]], i32 0
-; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> undef, <8 x i32> zeroinitializer
 ; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <8 x i32> undef, i32 [[TRIP_COUNT_MINUS_1]], i32 0
 ; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT1]], <8 x i32> undef, <8 x i32> zeroinitializer
+; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> undef, i32 [[CONV]], i32 0
+; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> undef, <8 x i32> zeroinitializer
 ; ENABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]
 ; ENABLED_MASKED_STRIDED: vector.body:
 ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND:%.*]] = phi <8 x i32> [ <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = icmp ugt <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = mul nsw i32 [[INDEX]], 3
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* [[P:%.*]], i32 [[TMP1]]
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = icmp ule <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT2]]

llvm/test/Transforms/LoopVectorize/tail-folding-counting-down.ll

 while.body:
[snip]
   br i1 %cmp, label %while.end.loopexit, label %while.body

 while.end.loopexit:
   br label %while.end

 while.end:
   ret void
 }
+
+; Make sure a loop is successfully vectorized with fold-tail when the backedge
+; taken count is constant and used inside the loop. Issue revealed by D76992.
+;
+define void @reuse_const_btc(i8* %A) optsize {
+; CHECK-LABEL: @reuse_const_btc
+; CHECK: {{%.*}} = icmp ule <4 x i32> {{%.*}}, <i32 13, i32 13, i32 13, i32 13>
+; CHECK: {{%.*}} = select <4 x i1> {{%.*}}, <4 x i32> <i32 12, i32 12, i32 12, i32 12>, <4 x i32> <i32 13, i32 13, i32 13, i32 13>
+;
+entry:
+  br label %loop
+
+loop:
+  %riv = phi i32 [ 13, %entry ], [ %rivMinus1, %merge ]
+  %sub = sub nuw nsw i32 20, %riv
+  %arrayidx = getelementptr inbounds i8, i8* %A, i32 %sub
+  %cond0 = icmp eq i32 %riv, 7
+  br i1 %cond0, label %then, label %else
+then:
+  br label %merge
+else:
+  br label %merge
+merge:
+  %blend = phi i32 [ 13, %then ], [ 12, %else ]
+  %trunc = trunc i32 %blend to i8
+  store i8 %trunc, i8* %arrayidx, align 1
+  %rivMinus1 = add nuw nsw i32 %riv, -1
+  %cond = icmp eq i32 %riv, 0
+  br i1 %cond, label %exit, label %loop
+
+exit:
+  ret void
+}
