Page MenuHomePhabricator

[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)
ClosedPublic

Authored by gilr on Oct 7 2019, 7:39 AM.

Details

Summary

The sink-after and interleave-group vectorization decisions were so far applied to VPlan during initial VPlan construction, which complicates VPlan construction – also because of
their inter-dependence.
This patch refactors buildVPlanWithRecipes() to construct a simpler initial VPlan and later apply both these vectorization decisions, in order, as VPlan-to-VPlan
transformations.

Diff Detail

Event Timeline

gilr created this revision.Oct 7 2019, 7:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2019, 7:39 AM
rengolin added inline comments.Oct 9 2019, 3:28 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6559

Other try{something} functions return a recipe pointer, while this one returns a boolean.

If you rename this to "check" or "can" (instead of try), then you shouldn't clamp the range.

I'm not sure what's best here, but this way looks a bit odd.

gilr marked an inline comment as done.Oct 9 2019, 5:48 AM
gilr added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6559

Agreed. Will inline this code at call site instead.

gilr updated this revision to Diff 224026.Oct 9 2019, 5:58 AM
  • Applied review comment
  • Simplified predicate lambda function

So, IIUC, this is changing tryCreateRecipe to move the interleave recipe creation to the caller, buildVPlanWithVPRecipes. The dependencies with the sink values is recorded initially, then the plans are created, then the sinks are applied and, if any, the interleave groups.

The refactoring of VPRecipeBase make sense to me and the resulting code looks cleaner, but this doesn't look like an NFC change. Not that this is a bad thing, but I can't quite reach the conclusion that all the loops that would have been interleaved will continue to do so, because the order of the plans may change (for better or worse) the conditions in which the plan starts with.

Regardless, I think this is a positive change and goes in the direction we want the VPlan infrastructure to be. It also looks semantically equivalent (with the caveat above), so the change looks good to me.

It would be good to wait for further reviews on the next few days, just in case I missed something. I haven't looked at this code for a while, so that's very likely. :)

Thanks!

Ayal added inline comments.Oct 9 2019, 9:17 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6986

This if (!Recipe) case and the next should be nested?

6995

Update above comment: we no longer check for Interleave Groups widening recipe here, only Inductions and Phi nodes.

7133

Can retain the early-exiting "if (tryToCreateRecipe()) continue"?

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
53

receipes >> recipes

llvm/lib/Transforms/Vectorize/VPlan.cpp
117

Better place the implementation of removeFromParent() next to that of eraseFromParent() below.

286

Above comment suffices at header file only (and strictly speaking it's about Recipes rather than instructions).

Update Parent as done in insertBefore() above.

Should (existing) insertBefore() also assert !Parent before setting it?

llvm/lib/Transforms/Vectorize/VPlan.h
611

Better place removeFromParent() right before eraseFromParent() below, to emphasize that the latter (only) also deletes it.

gilr added a comment.Oct 9 2019, 10:59 AM

So, IIUC, this is changing tryCreateRecipe to move the interleave recipe creation to the caller, buildVPlanWithVPRecipes. The dependencies with the sink values is recorded initially, then the plans are created, then the sinks are applied and, if any, the interleave groups.

Correct. Motivation is to express these dependecies as VPlan transformation phase ordering. Ayal discussed this is more details in his 2017 VPlan talk.

... but this doesn't look like an NFC change. Not that this is a bad thing, but I can't quite reach the conclusion that all the loops that would have been interleaved will continue to do so, because the order of the plans may change (for better or worse) the conditions in which the plan starts with.
Regardless, I think this is a positive change and goes in the direction we want the VPlan infrastructure to be. It also looks semantically equivalent (with the caveat above), so the change looks good to me.
It would be good to wait for further reviews on the next few days, just in case I missed something.

Excellent. The intention is indeed to only change the way Planner executes these already-taken decisions (SA by Legal, IG by CostModel).
Thanks!

gilr marked 7 inline comments as done.Oct 11 2019, 5:01 AM
gilr updated this revision to Diff 224571.Oct 11 2019, 5:23 AM

Applied review comments.

fhahn added inline comments.Oct 11 2019, 6:41 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7156

Not sure how other feel, but I think it would be great if we could move this transform out of LoopVectorize.cpp , to group together VP2VP transforms. I think it would fit well into llvm/lib/Transforms/Vectorize/VPlanHCFGTransforms.h (although the name mentions HFCGTransforms, maybe it should be just VplanToVplanTransforms.h/cpp).

I could not spot anything that would prevent moving it to a different file on first glance.

fhahn added a comment.Oct 11 2019, 7:23 AM

A while ago, I put up a patch to do sinking just on the VPInstruction/recipe level, but I never finished integrating it: D46826.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7164

This could just be Sink->moveAfter(RecipeBuilder.getRecipe(Entry.second)) . I've added it in D46825 and now finally have a reason to commit it ;)

gilr marked 2 inline comments as done.Oct 13 2019, 3:48 AM
gilr added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7156

This is currently still ingredient-based, i.e. not a pure VPlan2VPlan transformation, and as you mention the VPlan2VPlan part it's basically just a moveAfter(). A VPlan-based sinkAfter() should not be based on ingredients (as stated in D46826) and instead might take a Recipe2Recipe map., but that seems a bit of an overkill for this patch. Modelling VPlan-based transformations is definitely worth a larger discussion.

7164

Right. Will use it instead.
Seems it doesn't update Parent, though. Will rewrite as a composition of the more basic removeFromParent(), insertAfter() to avoid duplicating that and the assertions.

gilr updated this revision to Diff 224777.Oct 13 2019, 3:50 AM

Applied review comments.

Ayal added inline comments.Oct 13 2019, 4:23 AM
llvm/lib/Transforms/Vectorize/VPlan.cpp
290

nit: use Parent, as in insertBefore above.

gilr marked an inline comment as done.Oct 13 2019, 4:39 AM
gilr updated this revision to Diff 224779.Oct 13 2019, 4:42 AM

Applied review comments.

fhahn added inline comments.Oct 13 2019, 11:54 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7156

Sounds good to me, let's not overcomplicate things.

Yep, D46826 is still not really suitable here yet, which I only realised after accidentally submitting the comment here. Sorry for the confusion.

7164

Makes sense, thanks!

Originally I tried to mirror the implementation of Instruction::moveAfter, but it seems like it is only supposed to move instructions inside a basic block, although that's not really clear from the documentation.

llvm/lib/Transforms/Vectorize/VPlan.h
985

nit: it would be great to have a doc comment.

gilr updated this revision to Diff 224792.Oct 13 2019, 1:08 PM
gilr marked an inline comment as done.

Applied review comments.

Ayal added inline comments.Oct 13 2019, 2:14 PM
llvm/lib/Transforms/Vectorize/VPlan.h
985

While you're at it... worth adding that a full mask (potentially used by this recipe) is represented by nullptr, as well.

gilr updated this revision to Diff 224808.Oct 13 2019, 11:19 PM

Applied review comments.
Reused VPWidenMemoryInstructionRecipe's new getMask() in its execute().

gilr marked an inline comment as done.Oct 13 2019, 11:19 PM
gilr updated this revision to Diff 227432.Nov 1 2019, 6:01 AM

Rebased.

gilr added a comment.Nov 1 2019, 6:25 AM

Any other concern, guys?

Ayal added a comment.Nov 1 2019, 1:57 PM

Any other concern, guys?

None from me.
@rengolin , @fhahn - ok to land?

hsaito added inline comments.Nov 1 2019, 2:43 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6984

I think the code here is easier to understand if written as

if ((Recipe = tryToWiden...())     ||
    (Recipe = tryToOptimize...()) ||
    (Recipe = tryToBlend())         ||
    (isa<PHINode()> && (Recipe = new VPWidenPHIRecipe())) {
    setRecipe(Instr, Recipe);
}

None from me.
@rengolin , @fhahn - ok to land?

None from me either. LGTM.

fhahn accepted this revision.Nov 2 2019, 10:30 AM

LGTM, thanks for getting things started in that direction!

Just a few optional small nits.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6895

nit: could be changed to IsSingleton, to make it clearer we this is a boolean.

7064

nit: [r]ecipes?

7068

nit: [i]nstructions?

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
53

It may be worth mentioning the implicit contract: to mark a recipe for recording, the instruction needs to be added with a nullptr value.

57

maybe mention that we only record only if the instruction is already in the map.

This revision is now accepted and ready to land.Nov 2 2019, 10:30 AM
gilr marked 6 inline comments as done.Nov 3 2019, 8:55 AM

Thanks everyone!

gilr updated this revision to Diff 227617.Nov 3 2019, 8:57 AM

Addressed comments.

This revision was automatically updated to reflect the committed changes.
bkramer added a subscriber: bkramer.EditedNov 4 2019, 6:08 AM

This fails with asan: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/36343/steps/check-llvm%20asan/logs/stdio
Reverted in d3ec06d219788801380af1948c7f7ef9d3c6100b

==27731==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000300b0 at pc 0x000006bbd784 bp 0x7ffc45da9750 sp 0x7ffc45da9748
READ of size 8 at 0x6060000300b0 thread T0
    #0 0x6bbd783 in getInsertPos /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Analysis/VectorUtils.h:467:41
    #1 0x6bbd783 in operator() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7082
    #2 0x6bbd783 in __invoke<(lambda at /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7080:20) &, unsigned int> /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/type_traits:3528
    #3 0x6bbd783 in __call<(lambda at /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7080:20) &, unsigned int> /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/__functional_base:317
    #4 0x6bbd783 in operator() /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/functional:1540
    #5 0x6bbd783 in std::__1::__function::__func<llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(llvm::VFRange&, llvm::SmallPtrSetImpl<llvm::Value*>&, llvm::SmallPtrSetImpl<llvm::Instruction*>&)::$_31, std::__1::allocator<llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(llvm::VFRange&, llvm::SmallPtrSetImpl<llvm::Value*>&, llvm::SmallPtrSetImpl<llvm::Instruction*>&)::$_31>, bool (unsigned int)>::operator()(unsigned int&&) /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/functional:1714
    #6 0x6b8232b in operator() /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/functional:1867:16
    #7 0x6b8232b in operator() /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/functional:2473
    #8 0x6b8232b in getDecisionAndClampRange /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6617
    #9 0x6b8232b in llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(llvm::VFRange&, llvm::SmallPtrSetImpl<llvm::Value*>&, llvm::SmallPtrSetImpl<llvm::Instruction*>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7085
    #10 0x6b74387 in llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes(unsigned int, unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7043:9
    #11 0x6b71f01 in llvm::LoopVectorizationPlanner::plan(unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6453:5
    #12 0x6b912b1 in llvm::LoopVectorizePass::processLoop(llvm::Loop*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7623:47
    #13 0x6b9baa8 in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7843:16
    #14 0x6ba7c92 in (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1617:17
    #15 0x52d7953 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1481:27
    #16 0x52d8132 in llvm::FPPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1517:16
    #17 0x52d8fad in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1582:27
    #18 0x52d8fad in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1694
    #19 0xafbff4 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:890:12
    #20 0x7f5ecce0a2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
    #21 0x9e53e9 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/opt+0x9e53e9)

0x6060000300b0 is located 48 bytes inside of 56-byte region [0x606000030080,0x6060000300b8)
freed by thread T0 here:
    #0 0xaa9bd8 in operator delete(void*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cc:166
    #1 0x6b73274 in llvm::InterleavedAccessInfo::reset() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Analysis/VectorUtils.h:550:7
    #2 0x6b7239b in llvm::LoopVectorizationPlanner::plan(unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6444:23
    #3 0x6b912b1 in llvm::LoopVectorizePass::processLoop(llvm::Loop*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7623:47
    #4 0x6b9baa8 in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7843:16
    #5 0x6ba7c92 in (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1617:17
    #6 0x52d7953 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1481:27
    #7 0x52d8132 in llvm::FPPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1517:16
    #8 0x52d8fad in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1582:27
    #9 0x52d8fad in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1694
    #10 0xafbff4 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:890:12
    #11 0x7f5ecce0a2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)

previously allocated by thread T0 here:
    #0 0xaa9198 in operator new(unsigned long) /b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cc:105
    #1 0x434e673 in llvm::InterleavedAccessInfo::createInterleaveGroup(llvm::Instruction*, int, llvm::Align) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Analysis/VectorUtils.h:643:9
    #2 0x434b4d9 in llvm::InterleavedAccessInfo::analyzeInterleaving(bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/VectorUtils.cpp:930:17
    #3 0x6b90a71 in llvm::LoopVectorizePass::processLoop(llvm::Loop*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7608:9
    #4 0x6b9baa8 in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7843:16
    #5 0x6ba7c92 in (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1617:17
    #6 0x52d7953 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1481:27
    #7 0x52d8132 in llvm::FPPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1517:16
    #8 0x52d8fad in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1582:27
    #9 0x52d8fad in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1694
    #10 0xafbff4 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:890:12
    #11 0x7f5ecce0a2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
gilr added a comment.Nov 5 2019, 11:24 PM

Fail was due to a bug in IAI whose reset() method was not clearing the set holding pointers to the interleave groups being deleted, uncovered by this patch's use of the set. Patch recommited as 100e797adb433724a17c9b42b6533cd634cb796b with a fix to reset().
Thanks Benjamin!

rtrieu added a subscriber: rtrieu.Nov 6 2019, 8:22 PM

The recommited patch is hitting an assert. See message and reduced test case below:

assertion failed at llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h:91 in llvm::VPRecipeBase *llvm::VPRecipeBuilder::getRecipe(llvm::Instruction *): Ingredient2Recipe[I] != nullptr && "Ingredient doesn't have a recipe"

"clang" \
"-cc1" \
"-triple" "x86_64-unknown-linux-gnu" \
"-emit-obj" \
"-O3" \
"-w" \
"-vectorize-loops" \
"-x" "c" "test.c"
int test(int I1, int *Arr1, int *Arr2) {
  int i = 0, j = 0;
  int I2 = 0;
  
  int ret = 0;
                          
  for (i = 0; i < 5; i++) {
    if ((Arr1[i] < 5)) {
      I2 = I1;
    }
  } 

  for (i = 0; i < I2; i++, j++) {
    Arr2[3] = (j < I1 ? Arr1[j] : 0);
  }         
  if (j < I1) {
    ret = 1;
  }
        
  return ret;
}

The recommited patch is hitting an assert. See message and reduced test case below:

assertion failed at llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h:91 in llvm::VPRecipeBase *llvm::VPRecipeBuilder::getRecipe(llvm::Instruction *): Ingredient2Recipe[I] != nullptr && "Ingredient doesn't have a recipe"

I've got a similar crash. My case is a little bit simpler:

#include <stdio.h>

void find(int *data) {
    char t = 0; // changing char to int won't trigger the crash
    for (int i = 789; i > 0; i--)
        t = t || (data[i] == 42); // note the short circuit of OR
    if (t) // removing this conditional won't trigger the crash
      printf("some string");
}

This compiles to (under clang++ -O3):

define dso_local void @_Z4findPi(i32* nocapture readonly %0) local_unnamed_addr #0 {
  br label %4

2:                                                ; preds = %11
  %3 = phi i1 [ %12, %11 ]
  br i1 %3, label %16, label %18

4:                                                ; preds = %11, %1
  %5 = phi i64 [ 789, %1 ], [ %13, %11 ]
  %6 = phi i1 [ true, %1 ], [ %15, %11 ]
  br i1 %6, label %7, label %11

7:                                                ; preds = %4
  %8 = getelementptr inbounds i32, i32* %0, i64 %5
  %9 = load i32, i32* %8, align 4, !tbaa !2
  %10 = icmp eq i32 %9, 42
  br label %11

11:                                               ; preds = %4, %7
  %12 = phi i1 [ true, %4 ], [ %10, %7 ]
  %13 = add nsw i64 %5, -1
  %14 = icmp eq i64 %13, 0
  %15 = xor i1 %12, true
  br i1 %14, label %2, label %4

16:                                               ; preds = %2
  %17 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([12 x i8], [12 x i8]* @.str, i64 0, i64 0))
  br label %18

18:                                               ; preds = %2, %16
  ret void
}

It identifies "br i1 %6, label %7, label %11" as sinkable and tries to sink it.

tianqing added inline comments.Nov 7 2019, 6:29 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7128

Previously we have this conditional guard against sinking of certain instructions so the problem got hidden. The real problem should be in IVDescriptors.cpp:RecurrenceDescriptor::isFirstOrderRecurrence(). Probably it's better to allow only a whitelist of arithmetic opcodes.

Please note there is another discussion https://reviews.llvm.org/D69228 going on with that function.

fhahn added a comment.Nov 8 2019, 5:11 AM

Gil, I can put a patch up for not sinking past terminators in OVDescriptor, as I am already working on patches in that area.

I’d also suggest to land the removeFromParent & co changes separately from the other changes in the patch.

gilr added a subscriber: echristo.Nov 8 2019, 5:46 AM

Gil, I can put a patch up for not sinking past terminators in OVDescriptor, as I am already working on patches in that area.

I’d also suggest to land the removeFromParent & co changes separately from the other changes in the patch.

Ah, sorry, recommitted before noticing this comment :(
Also added a LIT based on @rtrieu's test case (thanks @rtrieu, @echristo, @tianqing!)

I'm seeing a new failure with this patch: http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable/builds/1128 . Crash log below. Please let me know if you need me to come up with a testcase.

`
FAILED: out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/tcpdump.o 
/bin/bash -c "PWD=/proc/self/cwd ulimit -t 240; llvm.inst/bin/clang 	-I external/tcpdump -I out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates -I out/target/product/angler/gen/EXECUTABLES/tcpdump_intermediates -I libnativehelper/include/nativehelper \$(cat out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/import_includes)  -I system/core/include -I system/media/audio/include -I hardware/libhardware/include -I hardware/libhardware_legacy/include -I hardware/ril/include -I libnativehelper/include -I frameworks/native/include -I frameworks/native/opengl/include -isystem frameworks/av/include -isystem out/target/product/angler/obj/include -isystem device/huawei/angler/kernel-headers -isystem hardware/qcom/msm8994/kernel-headers -isystem bionic/libc/arch-arm64/include -isystem bionic/libc/include -isystem bionic/libc/kernel/uapi -isystem bionic/libc/kernel/uapi/asm-arm64 -isystem bionic/libc/kernel/android/uapi -c  -fno-exceptions -Wno-multichar -fno-strict-aliasing -fstack-protector-strong -ffunction-sections -fdata-sections -funwind-tables -Wa,--noexecstack -Werror=format-security -D_FORTIFY_SOURCE=2 -fno-short-enums -no-canonical-prefixes -Werror=pointer-to-int-cast -Werror=int-to-pointer-cast -Werror=implicit-function-declaration -DNDEBUG -O2 -g -Wstrict-aliasing=2 -DANDROID -fmessage-length=0 -W -Wall -Wno-unused -Winit-self -Wpointer-arith -DNDEBUG -UDEBUG -fdebug-prefix-map=/proc/self/cwd= -D__compiler_offsetof=__builtin_offsetof -Werror=int-conversion -Wno-reserved-id-macro -Wno-format-pedantic -Wno-unused-command-line-argument -fcolor-diagnostics -Wno-expansion-to-defined -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Werror=date-time -nostdlibinc -mcpu=cortex-a53 -target aarch64-linux-android -Bprebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/aarch64-linux-android/bin   -std=gnu99    -D_BSD_SOURCE -DHAVE_CONFIG_H -D_U_=\"__attribute__((unused))\" -Werror -fpie -D_USING_LIBCXX   -Werror=int-to-pointer-cast -Werror=pointer-to-int-cast -Werror=address-of-temporary -Werror=return-type -Wno-error -O3 -mllvm -polly -mllvm -polly-position=before-vectorizer -mllvm -polly-process-unprofitable -MD -MF out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/tcpdump.d -o out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/tcpdump.o external/tcpdump/tcpdump.c"
clang: /var/lib/buildbot/slaves/hexagon-build-03/aosp/llvm.src/llvm/include/llvm/Support/Casting.h:264: typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::VPWidenMemoryInstructionRecipe, Y = llvm::VPRecipeBase]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
Stack dump:
0.	Program arguments: llvm.inst/bin/clang -cc1 -triple aarch64-unknown-linux-android -emit-obj -mnoexecstack -disable-free -main-file-name tcpdump.c -mrelocation-model pic -pic-level 1 -pic-is-pie -mthread-model posix -mframe-pointer=all -relaxed-aliasing -fno-rounding-math -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu cortex-a53 -target-feature +fp-armv8 -target-feature +neon -target-feature +crc -target-feature +crypto -target-feature +sha2 -target-feature +aes -target-abi aapcs -mllvm -aarch64-fix-cortex-a53-835769=1 -fallow-half-arguments-and-returns -dwarf-column-info -debug-info-kind=limited -dwarf-version=4 -debugger-tuning=gdb -ffunction-sections -fdata-sections -nostdsysteminc -resource-dir llvm.inst/lib/clang/10.0.0 -dependency-file out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/tcpdump.d -MT out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/tcpdump.o -sys-header-deps -isystem frameworks/av/include -isystem out/target/product/angler/obj/include -isystem device/huawei/angler/kernel-headers -isystem hardware/qcom/msm8994/kernel-headers -isystem bionic/libc/arch-arm64/include -isystem bionic/libc/include -isystem bionic/libc/kernel/uapi -isystem bionic/libc/kernel/uapi/asm-arm64 -isystem bionic/libc/kernel/android/uapi -I external/tcpdump -I out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates -I out/target/product/angler/gen/EXECUTABLES/tcpdump_intermediates -I libnativehelper/include/nativehelper -I external/boringssl/src/include -I external/boringssl/src/include -I external/boringssl/src/include -I external/libpcap -I external/libcxx/include -I external/libcxxabi/include -I system/core/include -I system/media/audio/include -I hardware/libhardware/include -I hardware/libhardware_legacy/include -I hardware/ril/include -I libnativehelper/include -I frameworks/native/include -I frameworks/native/opengl/include -D _FORTIFY_SOURCE=2 -D NDEBUG -D ANDROID -D NDEBUG -U DEBUG -D __compiler_offsetof=__builtin_offsetof -D _BSD_SOURCE -D HAVE_CONFIG_H -D _U_=__attribute__((unused)) -D _USING_LIBCXX -internal-isystem llvm.inst/lib/clang/10.0.0/include -O3 -Wno-multichar -Werror=format-security -Werror=pointer-to-int-cast -Werror=int-to-pointer-cast -Werror=implicit-function-declaration -Wstrict-aliasing=2 -W -Wall -Wno-unused -Winit-self -Wpointer-arith -Werror=int-conversion -Wno-reserved-id-macro -Wno-format-pedantic -Wno-unused-command-line-argument -Wno-expansion-to-defined -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Werror=date-time -Werror -Werror=int-to-pointer-cast -Werror=pointer-to-int-cast -Werror=address-of-temporary -Werror=return-type -Wno-error -std=gnu99 -fdebug-compilation-dir /var/lib/buildbot/slaves/hexagon-build-03/aosp -fdebug-prefix-map=/proc/self/cwd= -ferror-limit 19 -fmessage-length 0 -stack-protector 2 -fno-signed-char -fgnuc-version=4.2.1 -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -mllvm -polly -mllvm -polly-position=before-vectorizer -mllvm -polly-process-unprofitable -o out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/tcpdump.o -x c external/tcpdump/tcpdump.c 
1.	<eof> parser at end of file
2.	Per-module optimization passes
3.	Running pass 'Function Pass Manager' on module 'external/tcpdump/tcpdump.c'.
4.	Running pass 'Loop Vectorization' on function '@lookup_ndo_printer'
 #0 0x00000000019a2684 PrintStackTraceSignalHandler(void*) (llvm.inst/bin/clang+0x19a2684)
 #1 0x00000000019a02de llvm::sys::RunSignalHandlers() (llvm.inst/bin/clang+0x19a02de)
 #2 0x00000000019a2aa8 SignalHandler(int) (llvm.inst/bin/clang+0x19a2aa8)
 #3 0x00007f370a6ccd10 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10d10)
 #4 0x00007f37095b9267 raise /build/buildd/glibc-2.21/signal/../sysdeps/unix/sysv/linux/raise.c:55:0
 #5 0x00007f37095baeca abort /build/buildd/glibc-2.21/stdlib/abort.c:91:0
 #6 0x00007f37095b203d __assert_fail_base /build/buildd/glibc-2.21/assert/assert.c:92:0
 #7 0x00007f37095b20f2 (/lib/x86_64-linux-gnu/libc.so.6+0x2e0f2)
 #8 0x0000000001adbe07 llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(llvm::VFRange&, llvm::SmallPtrSetImpl<llvm::Value*>&, llvm::SmallPtrSetImpl<llvm::Instruction*>&) (llvm.inst/bin/clang+0x1adbe07)
 #9 0x0000000001ad5f5f llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes(unsigned int, unsigned int) (llvm.inst/bin/clang+0x1ad5f5f)
#10 0x0000000001ad57c2 llvm::LoopVectorizationPlanner::plan(unsigned int) (llvm.inst/bin/clang+0x1ad57c2)
#11 0x0000000001adfd4b llvm::LoopVectorizePass::processLoop(llvm::Loop*) (llvm.inst/bin/clang+0x1adfd4b)
#12 0x0000000001ae2ba4 llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) (llvm.inst/bin/clang+0x1ae2ba4)
#13 0x0000000001ae6813 (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) (llvm.inst/bin/clang+0x1ae6813)
#14 0x0000000001402eb3 llvm::FPPassManager::runOnFunction(llvm::Function&) (llvm.inst/bin/clang+0x1402eb3)
#15 0x0000000001403198 llvm::FPPassManager::runOnModule(llvm::Module&) (llvm.inst/bin/clang+0x1403198)
#16 0x0000000001403696 llvm::legacy::PassManagerImpl::run(llvm::Module&) (llvm.inst/bin/clang+0x1403696)
#17 0x0000000001b7d310 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) (llvm.inst/bin/clang+0x1b7d310)
#18 0x00000000027a9c84 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (llvm.inst/bin/clang+0x27a9c84)
#19 0x0000000002e90cf3 clang::ParseAST(clang::Sema&, bool, bool) (llvm.inst/bin/clang+0x2e90cf3)
#20 0x00000000020c2480 clang::FrontendAction::Execute() (llvm.inst/bin/clang+0x20c2480)
#21 0x0000000002061698 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (llvm.inst/bin/clang+0x2061698)
#22 0x000000000216b03d clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (llvm.inst/bin/clang+0x216b03d)
#23 0x000000000094091f cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (llvm.inst/bin/clang+0x94091f)
#24 0x000000000093e3b1 main (llvm.inst/bin/clang+0x93e3b1)
#25 0x00007f37095a4a40 __libc_start_main /build/buildd/glibc-2.21/csu/libc-start.c:323:0
#26 0x000000000093b649 _start (llvm.inst/bin/clang+0x93b649)
clang: error: unable to execute command: Aborted (core dumped)
clang: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 10.0.0 (/var/lib/buildbot/slaves/hexagon-build-03/aosp/llvm.src/clang e25bc5e0247141cb31093a370e22fe3249bdbb05)
Target: aarch64-unknown-linux-android
Thread model: posix
InstalledDir: llvm.inst/bin
clang: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang: note: diagnostic msg: 
********************

`
gilr added a comment.Nov 8 2019, 12:01 PM

Please let me know if you need me to come up with a testcase.

That would be great. Thanks Eli!

Testcase follows; reproduce with opt -loop-vectorize:

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-android"

%0 = type { i32 ()*, i32 }

@0 = internal unnamed_addr constant [59 x %0] [%0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 {i32 ()* null, i32 258}, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer,
%0 zeroinitializer], align 8

define dso_local void @lookup_ndo_printer(i32 %arg) {
bb1:
  br label %bb2

bb2:
  %tmp = phi %0* [ %tmp6, %bb2 ], [ getelementptr inbounds ([59 x %0], [59 x %0]* @0, i64 0, i64 0), %bb1 ]
  %tmp3 = getelementptr inbounds %0, %0* %tmp, i64 0, i32 1
  %tmp4 = load i32, i32* %tmp3, align 8
  %tmp5 = icmp eq i32 %tmp4, 258
  %tmp6 = getelementptr inbounds %0, %0* %tmp, i64 1
  br i1 %tmp5, label %bb65, label %bb2

bb65:
  unreachable
}
Ayal added a comment.Nov 9 2019, 2:21 PM

Testcase follows; reproduce with opt -loop-vectorize:

[snip]

This testcase is similar to PR40816, where a load from an array of constants feeds the compare of the loop's latch, allowing SCEV to determine the loop's trip count by looking at these constants. In this case there's a discrepancy between vectorizing such a load as an interleave group due to its strided access pattern, and scalarizing it due to being UniformAfterVectorization (see https://reviews.llvm.org/D68831#inline-627725). To be consistent with the previous behavior (of vectorizing), willWiden above should succeed if the Decision is CM_Interleave, even if the instruction isScalarAfterVectorization().

TODO notes for separate follow-up non-NFC patches:

  1. It's probably better to (consistently) scalarize such cases, or even identify them as DeadInstructions, rather than vectorize them w/ or w/o an interleave group.
  2. Such loops can and should be optimized to have a simple induction variable with constant trip count regardless of (and prior to) LV.
gilr added a comment.Nov 9 2019, 10:57 PM

Testcase follows; reproduce with opt -loop-vectorize:

Thanks Eli!
Patch recommitted with a fix and a lit based on this testcase.

  1. Such loops can and should be optimized to have a simple induction variable with constant trip count regardless of (and prior to) LV.

indvars does this transform. That said, it only runs once; in some cases, we manage to simplify the loop after indvars runs. Maybe it makes sense to run indvars again? We'd want to measure how much it actually triggers in practice.

Ayal added a comment.Nov 12 2019, 12:12 PM
  1. Such loops can and should be optimized to have a simple induction variable with constant trip count regardless of (and prior to) LV.

indvars does this transform. That said, it only runs once; in some cases, we manage to simplify the loop after indvars runs. Maybe it makes sense to run indvars again? We'd want to measure how much it actually triggers in practice.

Indeed indvars takes care of the testcase provided in https://reviews.llvm.org/D68577#1739605, leaving nothing to be vectorized. It does not handle the testcase of PR40816, though, presumably because of the store inside its loop (which prevents IndVarSimplify::PredicateLoopExits() from succeeding). Perhaps indvars should be extented to convert any loop to use a simple canonical IV whenever SCEV can compute its trip count.
Running indvars before LV might also relieve the latter from having to take care of live outs that are IV's.