The sink-after and interleave-group vectorization decisions were so far applied to VPlan during initial VPlan construction, which complicates VPlan construction – also because of
their inter-dependence.
This patch refactors buildVPlanWithRecipes() to construct a simpler initial VPlan and later apply both these vectorization decisions, in order, as VPlan-to-VPlan
transformations.
Details
Diff Detail
Event Timeline
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6472 | Other try{something} functions return a recipe pointer, while this one returns a boolean. If you rename this to "check" or "can" (instead of try), then you shouldn't clamp the range. I'm not sure what's best here, but this way looks a bit odd. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6472 | Agreed. Will inline this code at call site instead. |
So, IIUC, this is changing tryCreateRecipe to move the interleave recipe creation to the caller, buildVPlanWithVPRecipes. The dependencies with the sink values is recorded initially, then the plans are created, then the sinks are applied and, if any, the interleave groups.
The refactoring of VPRecipeBase make sense to me and the resulting code looks cleaner, but this doesn't look like an NFC change. Not that this is a bad thing, but I can't quite reach the conclusion that all the loops that would have been interleaved will continue to do so, because the order of the plans may change (for better or worse) the conditions in which the plan starts with.
Regardless, I think this is a positive change and goes in the direction we want the VPlan infrastructure to be. It also looks semantically equivalent (with the caveat above), so the change looks good to me.
It would be good to wait for further reviews on the next few days, just in case I missed something. I haven't looked at this code for a while, so that's very likely. :)
Thanks!
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6918 | This if (!Recipe) case and the next should be nested? | |
6932 | Update above comment: we no longer check for Interleave Groups widening recipe here, only Inductions and Phi nodes. | |
7064 | Can retain the early-exiting "if (tryToCreateRecipe()) continue"? | |
llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h | ||
53 | receipes >> recipes | |
llvm/lib/Transforms/Vectorize/VPlan.cpp | ||
117 | Better place the implementation of removeFromParent() next to that of eraseFromParent() below. | |
288 | Above comment suffices at header file only (and strictly speaking it's about Recipes rather than instructions). Update Parent as done in insertBefore() above. Should (existing) insertBefore() also assert !Parent before setting it? | |
llvm/lib/Transforms/Vectorize/VPlan.h | ||
611 | Better place removeFromParent() right before eraseFromParent() below, to emphasize that the latter (only) also deletes it. |
Correct. Motivation is to express these dependecies as VPlan transformation phase ordering. Ayal discussed this is more details in his 2017 VPlan talk.
... but this doesn't look like an NFC change. Not that this is a bad thing, but I can't quite reach the conclusion that all the loops that would have been interleaved will continue to do so, because the order of the plans may change (for better or worse) the conditions in which the plan starts with.
Regardless, I think this is a positive change and goes in the direction we want the VPlan infrastructure to be. It also looks semantically equivalent (with the caveat above), so the change looks good to me.
It would be good to wait for further reviews on the next few days, just in case I missed something.
Excellent. The intention is indeed to only change the way Planner executes these already-taken decisions (SA by Legal, IG by CostModel).
Thanks!
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
7089 | Not sure how other feel, but I think it would be great if we could move this transform out of LoopVectorize.cpp , to group together VP2VP transforms. I think it would fit well into llvm/lib/Transforms/Vectorize/VPlanHCFGTransforms.h (although the name mentions HFCGTransforms, maybe it should be just VplanToVplanTransforms.h/cpp). I could not spot anything that would prevent moving it to a different file on first glance. |
A while ago, I put up a patch to do sinking just on the VPInstruction/recipe level, but I never finished integrating it: D46826.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
7097 | This could just be Sink->moveAfter(RecipeBuilder.getRecipe(Entry.second)) . I've added it in D46825 and now finally have a reason to commit it ;) |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
7089 | This is currently still ingredient-based, i.e. not a pure VPlan2VPlan transformation, and as you mention the VPlan2VPlan part it's basically just a moveAfter(). A VPlan-based sinkAfter() should not be based on ingredients (as stated in D46826) and instead might take a Recipe2Recipe map., but that seems a bit of an overkill for this patch. Modelling VPlan-based transformations is definitely worth a larger discussion. | |
7097 | Right. Will use it instead. |
llvm/lib/Transforms/Vectorize/VPlan.cpp | ||
---|---|---|
292 | nit: use Parent, as in insertBefore above. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
7089 | Sounds good to me, let's not overcomplicate things. Yep, D46826 is still not really suitable here yet, which I only realised after accidentally submitting the comment here. Sorry for the confusion. | |
7097 | Makes sense, thanks! Originally I tried to mirror the implementation of Instruction::moveAfter, but it seems like it is only supposed to move instructions inside a basic block, although that's not really clear from the documentation. | |
llvm/lib/Transforms/Vectorize/VPlan.h | ||
981 | nit: it would be great to have a doc comment. |
llvm/lib/Transforms/Vectorize/VPlan.h | ||
---|---|---|
981 | While you're at it... worth adding that a full mask (potentially used by this recipe) is represented by nullptr, as well. |
Applied review comments.
Reused VPWidenMemoryInstructionRecipe's new getMask() in its execute().
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6916 | I think the code here is easier to understand if written as if ((Recipe = tryToWiden...()) || (Recipe = tryToOptimize...()) || (Recipe = tryToBlend()) || (isa<PHINode()> && (Recipe = new VPWidenPHIRecipe())) { setRecipe(Instr, Recipe); } |
LGTM, thanks for getting things started in that direction!
Just a few optional small nits.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6823 | nit: could be changed to IsSingleton, to make it clearer we this is a boolean. | |
7001 | nit: [r]ecipes? | |
7005 | nit: [i]nstructions? | |
llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h | ||
53 | It may be worth mentioning the implicit contract: to mark a recipe for recording, the instruction needs to be added with a nullptr value. | |
57 | maybe mention that we only record only if the instruction is already in the map. |
This fails with asan: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/36343/steps/check-llvm%20asan/logs/stdio
Reverted in d3ec06d219788801380af1948c7f7ef9d3c6100b
==27731==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000300b0 at pc 0x000006bbd784 bp 0x7ffc45da9750 sp 0x7ffc45da9748 READ of size 8 at 0x6060000300b0 thread T0 #0 0x6bbd783 in getInsertPos /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Analysis/VectorUtils.h:467:41 #1 0x6bbd783 in operator() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7082 #2 0x6bbd783 in __invoke<(lambda at /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7080:20) &, unsigned int> /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/type_traits:3528 #3 0x6bbd783 in __call<(lambda at /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7080:20) &, unsigned int> /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/__functional_base:317 #4 0x6bbd783 in operator() /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/functional:1540 #5 0x6bbd783 in std::__1::__function::__func<llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(llvm::VFRange&, llvm::SmallPtrSetImpl<llvm::Value*>&, llvm::SmallPtrSetImpl<llvm::Instruction*>&)::$_31, std::__1::allocator<llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(llvm::VFRange&, llvm::SmallPtrSetImpl<llvm::Value*>&, llvm::SmallPtrSetImpl<llvm::Instruction*>&)::$_31>, bool (unsigned int)>::operator()(unsigned int&&) /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/functional:1714 #6 0x6b8232b in operator() /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/functional:1867:16 #7 0x6b8232b in operator() /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/functional:2473 #8 0x6b8232b in getDecisionAndClampRange /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6617 #9 0x6b8232b in llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(llvm::VFRange&, llvm::SmallPtrSetImpl<llvm::Value*>&, llvm::SmallPtrSetImpl<llvm::Instruction*>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7085 #10 0x6b74387 in llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes(unsigned int, unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7043:9 #11 0x6b71f01 in llvm::LoopVectorizationPlanner::plan(unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6453:5 #12 0x6b912b1 in llvm::LoopVectorizePass::processLoop(llvm::Loop*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7623:47 #13 0x6b9baa8 in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7843:16 #14 0x6ba7c92 in (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1617:17 #15 0x52d7953 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1481:27 #16 0x52d8132 in llvm::FPPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1517:16 #17 0x52d8fad in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1582:27 #18 0x52d8fad in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1694 #19 0xafbff4 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:890:12 #20 0x7f5ecce0a2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) #21 0x9e53e9 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/opt+0x9e53e9) 0x6060000300b0 is located 48 bytes inside of 56-byte region [0x606000030080,0x6060000300b8) freed by thread T0 here: #0 0xaa9bd8 in operator delete(void*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cc:166 #1 0x6b73274 in llvm::InterleavedAccessInfo::reset() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Analysis/VectorUtils.h:550:7 #2 0x6b7239b in llvm::LoopVectorizationPlanner::plan(unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6444:23 #3 0x6b912b1 in llvm::LoopVectorizePass::processLoop(llvm::Loop*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7623:47 #4 0x6b9baa8 in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7843:16 #5 0x6ba7c92 in (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1617:17 #6 0x52d7953 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1481:27 #7 0x52d8132 in llvm::FPPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1517:16 #8 0x52d8fad in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1582:27 #9 0x52d8fad in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1694 #10 0xafbff4 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:890:12 #11 0x7f5ecce0a2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) previously allocated by thread T0 here: #0 0xaa9198 in operator new(unsigned long) /b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cc:105 #1 0x434e673 in llvm::InterleavedAccessInfo::createInterleaveGroup(llvm::Instruction*, int, llvm::Align) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Analysis/VectorUtils.h:643:9 #2 0x434b4d9 in llvm::InterleavedAccessInfo::analyzeInterleaving(bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/VectorUtils.cpp:930:17 #3 0x6b90a71 in llvm::LoopVectorizePass::processLoop(llvm::Loop*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7608:9 #4 0x6b9baa8 in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7843:16 #5 0x6ba7c92 in (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1617:17 #6 0x52d7953 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1481:27 #7 0x52d8132 in llvm::FPPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1517:16 #8 0x52d8fad in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1582:27 #9 0x52d8fad in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1694 #10 0xafbff4 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:890:12 #11 0x7f5ecce0a2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
Fail was due to a bug in IAI whose reset() method was not clearing the set holding pointers to the interleave groups being deleted, uncovered by this patch's use of the set. Patch recommited as 100e797adb433724a17c9b42b6533cd634cb796b with a fix to reset().
Thanks Benjamin!
The recommited patch is hitting an assert. See message and reduced test case below:
assertion failed at llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h:91 in llvm::VPRecipeBase *llvm::VPRecipeBuilder::getRecipe(llvm::Instruction *): Ingredient2Recipe[I] != nullptr && "Ingredient doesn't have a recipe"
"clang" \ "-cc1" \ "-triple" "x86_64-unknown-linux-gnu" \ "-emit-obj" \ "-O3" \ "-w" \ "-vectorize-loops" \ "-x" "c" "test.c"
int test(int I1, int *Arr1, int *Arr2) { int i = 0, j = 0; int I2 = 0; int ret = 0; for (i = 0; i < 5; i++) { if ((Arr1[i] < 5)) { I2 = I1; } } for (i = 0; i < I2; i++, j++) { Arr2[3] = (j < I1 ? Arr1[j] : 0); } if (j < I1) { ret = 1; } return ret; }
I've got a similar crash. My case is a little bit simpler:
#include <stdio.h> void find(int *data) { char t = 0; // changing char to int won't trigger the crash for (int i = 789; i > 0; i--) t = t || (data[i] == 42); // note the short circuit of OR if (t) // removing this conditional won't trigger the crash printf("some string"); }
This compiles to (under clang++ -O3):
define dso_local void @_Z4findPi(i32* nocapture readonly %0) local_unnamed_addr #0 { br label %4 2: ; preds = %11 %3 = phi i1 [ %12, %11 ] br i1 %3, label %16, label %18 4: ; preds = %11, %1 %5 = phi i64 [ 789, %1 ], [ %13, %11 ] %6 = phi i1 [ true, %1 ], [ %15, %11 ] br i1 %6, label %7, label %11 7: ; preds = %4 %8 = getelementptr inbounds i32, i32* %0, i64 %5 %9 = load i32, i32* %8, align 4, !tbaa !2 %10 = icmp eq i32 %9, 42 br label %11 11: ; preds = %4, %7 %12 = phi i1 [ true, %4 ], [ %10, %7 ] %13 = add nsw i64 %5, -1 %14 = icmp eq i64 %13, 0 %15 = xor i1 %12, true br i1 %14, label %2, label %4 16: ; preds = %2 %17 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([12 x i8], [12 x i8]* @.str, i64 0, i64 0)) br label %18 18: ; preds = %2, %16 ret void }
It identifies "br i1 %6, label %7, label %11" as sinkable and tries to sink it.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
7060 | Previously we have this conditional guard against sinking of certain instructions so the problem got hidden. The real problem should be in IVDescriptors.cpp:RecurrenceDescriptor::isFirstOrderRecurrence(). Probably it's better to allow only a whitelist of arithmetic opcodes. Please note there is another discussion https://reviews.llvm.org/D69228 going on with that function. |
Gil, I can put a patch up for not sinking past terminators in OVDescriptor, as I am already working on patches in that area.
I’d also suggest to land the removeFromParent & co changes separately from the other changes in the patch.
I'm seeing a new failure with this patch: http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable/builds/1128 . Crash log below. Please let me know if you need me to come up with a testcase.
` FAILED: out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/tcpdump.o /bin/bash -c "PWD=/proc/self/cwd ulimit -t 240; llvm.inst/bin/clang -I external/tcpdump -I out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates -I out/target/product/angler/gen/EXECUTABLES/tcpdump_intermediates -I libnativehelper/include/nativehelper \$(cat out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/import_includes) -I system/core/include -I system/media/audio/include -I hardware/libhardware/include -I hardware/libhardware_legacy/include -I hardware/ril/include -I libnativehelper/include -I frameworks/native/include -I frameworks/native/opengl/include -isystem frameworks/av/include -isystem out/target/product/angler/obj/include -isystem device/huawei/angler/kernel-headers -isystem hardware/qcom/msm8994/kernel-headers -isystem bionic/libc/arch-arm64/include -isystem bionic/libc/include -isystem bionic/libc/kernel/uapi -isystem bionic/libc/kernel/uapi/asm-arm64 -isystem bionic/libc/kernel/android/uapi -c -fno-exceptions -Wno-multichar -fno-strict-aliasing -fstack-protector-strong -ffunction-sections -fdata-sections -funwind-tables -Wa,--noexecstack -Werror=format-security -D_FORTIFY_SOURCE=2 -fno-short-enums -no-canonical-prefixes -Werror=pointer-to-int-cast -Werror=int-to-pointer-cast -Werror=implicit-function-declaration -DNDEBUG -O2 -g -Wstrict-aliasing=2 -DANDROID -fmessage-length=0 -W -Wall -Wno-unused -Winit-self -Wpointer-arith -DNDEBUG -UDEBUG -fdebug-prefix-map=/proc/self/cwd= -D__compiler_offsetof=__builtin_offsetof -Werror=int-conversion -Wno-reserved-id-macro -Wno-format-pedantic -Wno-unused-command-line-argument -fcolor-diagnostics -Wno-expansion-to-defined -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Werror=date-time -nostdlibinc -mcpu=cortex-a53 -target aarch64-linux-android -Bprebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/aarch64-linux-android/bin -std=gnu99 -D_BSD_SOURCE -DHAVE_CONFIG_H -D_U_=\"__attribute__((unused))\" -Werror -fpie -D_USING_LIBCXX -Werror=int-to-pointer-cast -Werror=pointer-to-int-cast -Werror=address-of-temporary -Werror=return-type -Wno-error -O3 -mllvm -polly -mllvm -polly-position=before-vectorizer -mllvm -polly-process-unprofitable -MD -MF out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/tcpdump.d -o out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/tcpdump.o external/tcpdump/tcpdump.c" clang: /var/lib/buildbot/slaves/hexagon-build-03/aosp/llvm.src/llvm/include/llvm/Support/Casting.h:264: typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::VPWidenMemoryInstructionRecipe, Y = llvm::VPRecipeBase]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed. Stack dump: 0. Program arguments: llvm.inst/bin/clang -cc1 -triple aarch64-unknown-linux-android -emit-obj -mnoexecstack -disable-free -main-file-name tcpdump.c -mrelocation-model pic -pic-level 1 -pic-is-pie -mthread-model posix -mframe-pointer=all -relaxed-aliasing -fno-rounding-math -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu cortex-a53 -target-feature +fp-armv8 -target-feature +neon -target-feature +crc -target-feature +crypto -target-feature +sha2 -target-feature +aes -target-abi aapcs -mllvm -aarch64-fix-cortex-a53-835769=1 -fallow-half-arguments-and-returns -dwarf-column-info -debug-info-kind=limited -dwarf-version=4 -debugger-tuning=gdb -ffunction-sections -fdata-sections -nostdsysteminc -resource-dir llvm.inst/lib/clang/10.0.0 -dependency-file out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/tcpdump.d -MT out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/tcpdump.o -sys-header-deps -isystem frameworks/av/include -isystem out/target/product/angler/obj/include -isystem device/huawei/angler/kernel-headers -isystem hardware/qcom/msm8994/kernel-headers -isystem bionic/libc/arch-arm64/include -isystem bionic/libc/include -isystem bionic/libc/kernel/uapi -isystem bionic/libc/kernel/uapi/asm-arm64 -isystem bionic/libc/kernel/android/uapi -I external/tcpdump -I out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates -I out/target/product/angler/gen/EXECUTABLES/tcpdump_intermediates -I libnativehelper/include/nativehelper -I external/boringssl/src/include -I external/boringssl/src/include -I external/boringssl/src/include -I external/libpcap -I external/libcxx/include -I external/libcxxabi/include -I system/core/include -I system/media/audio/include -I hardware/libhardware/include -I hardware/libhardware_legacy/include -I hardware/ril/include -I libnativehelper/include -I frameworks/native/include -I frameworks/native/opengl/include -D _FORTIFY_SOURCE=2 -D NDEBUG -D ANDROID -D NDEBUG -U DEBUG -D __compiler_offsetof=__builtin_offsetof -D _BSD_SOURCE -D HAVE_CONFIG_H -D _U_=__attribute__((unused)) -D _USING_LIBCXX -internal-isystem llvm.inst/lib/clang/10.0.0/include -O3 -Wno-multichar -Werror=format-security -Werror=pointer-to-int-cast -Werror=int-to-pointer-cast -Werror=implicit-function-declaration -Wstrict-aliasing=2 -W -Wall -Wno-unused -Winit-self -Wpointer-arith -Werror=int-conversion -Wno-reserved-id-macro -Wno-format-pedantic -Wno-unused-command-line-argument -Wno-expansion-to-defined -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Werror=date-time -Werror -Werror=int-to-pointer-cast -Werror=pointer-to-int-cast -Werror=address-of-temporary -Werror=return-type -Wno-error -std=gnu99 -fdebug-compilation-dir /var/lib/buildbot/slaves/hexagon-build-03/aosp -fdebug-prefix-map=/proc/self/cwd= -ferror-limit 19 -fmessage-length 0 -stack-protector 2 -fno-signed-char -fgnuc-version=4.2.1 -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -mllvm -polly -mllvm -polly-position=before-vectorizer -mllvm -polly-process-unprofitable -o out/target/product/angler/obj/EXECUTABLES/tcpdump_intermediates/tcpdump.o -x c external/tcpdump/tcpdump.c 1. <eof> parser at end of file 2. Per-module optimization passes 3. Running pass 'Function Pass Manager' on module 'external/tcpdump/tcpdump.c'. 4. Running pass 'Loop Vectorization' on function '@lookup_ndo_printer' #0 0x00000000019a2684 PrintStackTraceSignalHandler(void*) (llvm.inst/bin/clang+0x19a2684) #1 0x00000000019a02de llvm::sys::RunSignalHandlers() (llvm.inst/bin/clang+0x19a02de) #2 0x00000000019a2aa8 SignalHandler(int) (llvm.inst/bin/clang+0x19a2aa8) #3 0x00007f370a6ccd10 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10d10) #4 0x00007f37095b9267 raise /build/buildd/glibc-2.21/signal/../sysdeps/unix/sysv/linux/raise.c:55:0 #5 0x00007f37095baeca abort /build/buildd/glibc-2.21/stdlib/abort.c:91:0 #6 0x00007f37095b203d __assert_fail_base /build/buildd/glibc-2.21/assert/assert.c:92:0 #7 0x00007f37095b20f2 (/lib/x86_64-linux-gnu/libc.so.6+0x2e0f2) #8 0x0000000001adbe07 llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(llvm::VFRange&, llvm::SmallPtrSetImpl<llvm::Value*>&, llvm::SmallPtrSetImpl<llvm::Instruction*>&) (llvm.inst/bin/clang+0x1adbe07) #9 0x0000000001ad5f5f llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes(unsigned int, unsigned int) (llvm.inst/bin/clang+0x1ad5f5f) #10 0x0000000001ad57c2 llvm::LoopVectorizationPlanner::plan(unsigned int) (llvm.inst/bin/clang+0x1ad57c2) #11 0x0000000001adfd4b llvm::LoopVectorizePass::processLoop(llvm::Loop*) (llvm.inst/bin/clang+0x1adfd4b) #12 0x0000000001ae2ba4 llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) (llvm.inst/bin/clang+0x1ae2ba4) #13 0x0000000001ae6813 (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) (llvm.inst/bin/clang+0x1ae6813) #14 0x0000000001402eb3 llvm::FPPassManager::runOnFunction(llvm::Function&) (llvm.inst/bin/clang+0x1402eb3) #15 0x0000000001403198 llvm::FPPassManager::runOnModule(llvm::Module&) (llvm.inst/bin/clang+0x1403198) #16 0x0000000001403696 llvm::legacy::PassManagerImpl::run(llvm::Module&) (llvm.inst/bin/clang+0x1403696) #17 0x0000000001b7d310 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) (llvm.inst/bin/clang+0x1b7d310) #18 0x00000000027a9c84 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (llvm.inst/bin/clang+0x27a9c84) #19 0x0000000002e90cf3 clang::ParseAST(clang::Sema&, bool, bool) (llvm.inst/bin/clang+0x2e90cf3) #20 0x00000000020c2480 clang::FrontendAction::Execute() (llvm.inst/bin/clang+0x20c2480) #21 0x0000000002061698 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (llvm.inst/bin/clang+0x2061698) #22 0x000000000216b03d clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (llvm.inst/bin/clang+0x216b03d) #23 0x000000000094091f cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (llvm.inst/bin/clang+0x94091f) #24 0x000000000093e3b1 main (llvm.inst/bin/clang+0x93e3b1) #25 0x00007f37095a4a40 __libc_start_main /build/buildd/glibc-2.21/csu/libc-start.c:323:0 #26 0x000000000093b649 _start (llvm.inst/bin/clang+0x93b649) clang: error: unable to execute command: Aborted (core dumped) clang: error: clang frontend command failed due to signal (use -v to see invocation) clang version 10.0.0 (/var/lib/buildbot/slaves/hexagon-build-03/aosp/llvm.src/clang e25bc5e0247141cb31093a370e22fe3249bdbb05) Target: aarch64-unknown-linux-android Thread model: posix InstalledDir: llvm.inst/bin clang: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. clang: note: diagnostic msg: ******************** `
Testcase follows; reproduce with opt -loop-vectorize:
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" target triple = "aarch64-unknown-linux-android" %0 = type { i32 ()*, i32 } @0 = internal unnamed_addr constant [59 x %0] [%0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 {i32 ()* null, i32 258}, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer, %0 zeroinitializer], align 8 define dso_local void @lookup_ndo_printer(i32 %arg) { bb1: br label %bb2 bb2: %tmp = phi %0* [ %tmp6, %bb2 ], [ getelementptr inbounds ([59 x %0], [59 x %0]* @0, i64 0, i64 0), %bb1 ] %tmp3 = getelementptr inbounds %0, %0* %tmp, i64 0, i32 1 %tmp4 = load i32, i32* %tmp3, align 8 %tmp5 = icmp eq i32 %tmp4, 258 %tmp6 = getelementptr inbounds %0, %0* %tmp, i64 1 br i1 %tmp5, label %bb65, label %bb2 bb65: unreachable }
[snip]
This testcase is similar to PR40816, where a load from an array of constants feeds the compare of the loop's latch, allowing SCEV to determine the loop's trip count by looking at these constants. In this case there's a discrepancy between vectorizing such a load as an interleave group due to its strided access pattern, and scalarizing it due to being UniformAfterVectorization (see https://reviews.llvm.org/D68831#inline-627725). To be consistent with the previous behavior (of vectorizing), willWiden above should succeed if the Decision is CM_Interleave, even if the instruction isScalarAfterVectorization().
TODO notes for separate follow-up non-NFC patches:
- It's probably better to (consistently) scalarize such cases, or even identify them as DeadInstructions, rather than vectorize them w/ or w/o an interleave group.
- Such loops can and should be optimized to have a simple induction variable with constant trip count regardless of (and prior to) LV.
indvars does this transform. That said, it only runs once; in some cases, we manage to simplify the loop after indvars runs. Maybe it makes sense to run indvars again? We'd want to measure how much it actually triggers in practice.
Indeed indvars takes care of the testcase provided in https://reviews.llvm.org/D68577#1739605, leaving nothing to be vectorized. It does not handle the testcase of PR40816, though, presumably because of the store inside its loop (which prevents IndVarSimplify::PredicateLoopExits() from succeeding). Perhaps indvars should be extented to convert any loop to use a simple canonical IV whenever SCEV can compute its trip count.
Running indvars before LV might also relieve the latter from having to take care of live outs that are IV's.
Other try{something} functions return a recipe pointer, while this one returns a boolean.
If you rename this to "check" or "can" (instead of try), then you shouldn't clamp the range.
I'm not sure what's best here, but this way looks a bit odd.