This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Replace use of needsVectorIV with VPlan user check.
ClosedPublic

Authored by fhahn on Apr 13 2022, 12:59 PM.

Details

Summary

Replace one of the last remaining uses of ::needsVectorIV, preparing for
its removal.

Diff Detail

Event Timeline

fhahn created this revision.Apr 13 2022, 12:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 13 2022, 12:59 PM
fhahn requested review of this revision.Apr 13 2022, 12:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 13 2022, 12:59 PM
Herald added a subscriber: vkmr. · View Herald Transcript
Ayal accepted this revision.Apr 13 2022, 1:35 PM

Looks good to me.
Can this too lead to different - more accurate decisions? If so wonder if that could be tested(?)

This revision is now accepted and ready to land.Apr 13 2022, 1:35 PM
This revision was landed with ongoing or failed builds.Apr 28 2022, 8:30 AM
This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Apr 28 2022, 8:32 AM

Looks good to me.
Can this too lead to different - more accurate decisions? If so wonder if that could be tested(?)

The code was already falling back to usesScalars in the loop below, if needsVectorIV was returning true. In the committed version I completely dropped the check entirely. I wasn't able to find a case where this would lead to more accurate decisions unfortunately.

Ayal added inline comments.Apr 28 2022, 9:16 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
432

80 columns?

"to use Step[s] instead"

fhahn added inline comments.Apr 28 2022, 9:23 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
432

I realized the line length issue immediately after committing :( Should already be fixed by rG2883de05145f

I bisected a new crash while building the Linux kernel for arm64 to this change:

# bad: [9e3b7e8e656bdcf5aeafbac7b345a7b29272434c] [X86] getTargetVShiftByConstNode - use SelectionDAG::FoldConstantArithmetic to perform constant folding. NFCI.
# good: [c74a706893f0667d6aae2d7704d21af97c92dc07] [LegacyPM] Remove ThreadSanitizerLegacyPass
git bisect start '9e3b7e8e656bdcf5aeafbac7b345a7b29272434c' 'c74a706893f0667d6aae2d7704d21af97c92dc07'
# good: [de7e5394320bd1c3a212d1242cb163fd006e2150] [gn build] (manually) port 0f1b5f115a7f
git bisect good de7e5394320bd1c3a212d1242cb163fd006e2150
# good: [5420834aadbd271b3773f53fbbd58e9437952616] [demangler] Fix demangling a template argument which happens to be a null pointer
git bisect good 5420834aadbd271b3773f53fbbd58e9437952616
# bad: [2883de05145fc5b4afb99b91f69ebb835af36af5] [VPlan] Fix comment formatting from 43842b887e.
git bisect bad 2883de05145fc5b4afb99b91f69ebb835af36af5
# good: [9861ca0c23a60aa18874bb4378f359d4659a6ee6] Revert "[COST]Improve cost model for shuffles in SLP."
git bisect good 9861ca0c23a60aa18874bb4378f359d4659a6ee6
# good: [f6b7fd20a52ef83d0462db190eb40800afda2506] [lldb] Remove patch reject file (.rej)
git bisect good f6b7fd20a52ef83d0462db190eb40800afda2506
# bad: [43842b887e0a7b918bb2d6c9f672025b2c621f8a] [VPlan] Remove uneeded needsVectorIV check.
git bisect bad 43842b887e0a7b918bb2d6c9f672025b2c621f8a
# good: [50d648b40ecdb0bedc0676ed96aad59ebf257d7e] [mlir][emitc] Replace !emitc.opaque pointers
git bisect good 50d648b40ecdb0bedc0676ed96aad59ebf257d7e
# first bad commit: [43842b887e0a7b918bb2d6c9f672025b2c621f8a] [VPlan] Remove uneeded needsVectorIV check.

A simplified C reproducer:

ata_scsi_rbuf, ata_format_dsm_trim_descr_count;
long *ata_scsi_write_same_xlat_buf;
ata_scsi_write_same_xlat() {
  while (ata_scsi_rbuf) {
    long entry = ata_format_dsm_trim_descr_count > 65535
                     ? 65535
                     : ata_format_dsm_trim_descr_count;
    ata_scsi_write_same_xlat_buf[ata_scsi_rbuf++] = entry;
    ata_format_dsm_trim_descr_count -= 5;
  }
}
$ clang --target=aarch64-linux-gnu -mgeneral-regs-only -c -o /dev/null libata-scsi.i
...

$ clang --target=aarch64-linux-gnu -O2 -c -o /dev/null libata-scsi.i
...

$ clang --target=aarch64-linux-gnu -O2 -mgeneral-regs-only -c -o /dev/null libata-scsi.i
...
clang: /home/nathan/cbl/src/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9473: virtual void llvm::VPWidenIntOrFpInductionRecipe::execute(llvm::VPTransformState &): Assertion `State.VF.isVector() && "must have vector VF"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: clang --target=aarch64-linux-gnu -O2 -mgeneral-regs-only -c -o /dev/null libata-scsi.i
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x0000aaaacbe357a8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x34857a8)
 #1 0x0000aaaacbe339b4 llvm::sys::RunSignalHandlers() (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x34839b4)
 #2 0x0000aaaacbdc062c (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) CrashRecoveryContext.cpp:0:0
 #3 0x0000aaaacbdc07dc CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #4 0x0000ffff8ae8284c (linux-vdso.so.1+0x84c)
 #5 0x0000ffff8a9a64a8 __pthread_kill_implementation (/lib64/libc.so.6+0x864a8)
 #6 0x0000ffff8a95e940 gsignal (/lib64/libc.so.6+0x3e940)
 #7 0x0000ffff8a94aef8 abort (/lib64/libc.so.6+0x2aef8)
 #8 0x0000ffff8a958008 __assert_fail_base (/lib64/libc.so.6+0x38008)
 #9 0x0000ffff8a958070 __assert_perror_fail (/lib64/libc.so.6+0x38070)
#10 0x0000aaaacbfd6130 llvm::VPWidenIntOrFpInductionRecipe::execute(llvm::VPTransformState&) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x3626130)
#11 0x0000aaaacc05b0c8 llvm::VPBasicBlock::execute(llvm::VPTransformState*) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x36ab0c8)
#12 0x0000aaaacc05ca98 llvm::VPRegionBlock::execute(llvm::VPTransformState*) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x36aca98)
#13 0x0000aaaacc05ef90 llvm::VPlan::execute(llvm::VPTransformState*) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x36aef90)
#14 0x0000aaaacbfc6a58 llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x3616a58)
#15 0x0000aaaacbfdcab0 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x362cab0)
#16 0x0000aaaacbfdfff4 llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x362fff4)
#17 0x0000aaaacbfe08cc llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x36308cc)
#18 0x0000aaaacb796818 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x2de6818)
#19 0x0000aaaacb799d28 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x2de9d28)
#20 0x0000aaaacb795504 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x2de5504)
#21 0x0000aaaacc594d3c (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile> >&) BackendUtil.cpp:0:0
#22 0x0000aaaacc58e540 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x3bde540)
#23 0x0000aaaacc8e841c clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) CodeGenAction.cpp:0:0
#24 0x0000aaaacd109bb4 clang::ParseAST(clang::Sema&, bool, bool) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x4759bb4)
#25 0x0000aaaacc82dcfc clang::FrontendAction::Execute() (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x3e7dcfc)
#26 0x0000aaaacc7bb7d0 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x3e0b7d0)
#27 0x0000aaaacc8e2878 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x3f32878)
#28 0x0000aaaacab333d4 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x21833d4)
#29 0x0000aaaacab31800 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) driver.cpp:0:0
#30 0x0000aaaacc696e00 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool*) const::$_1>(long) Job.cpp:0:0
#31 0x0000aaaacbdc04d4 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x34104d4)
#32 0x0000aaaacc6968c0 clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool*) const (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x3ce68c0)
#33 0x0000aaaacc665f2c clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x3cb5f2c)
#34 0x0000aaaacc666198 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) const (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x3cb6198)
#35 0x0000aaaacc679318 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x3cc9318)
#36 0x0000aaaacab30f90 main (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x2180f90)
#37 0x0000ffff8a94b1c8 __libc_start_call_main (/lib64/libc.so.6+0x2b1c8)
#38 0x0000ffff8a94b2a0 __libc_start_main@GLIBC_2.17 (/lib64/libc.so.6+0x2b2a0)
#39 0x0000aaaacab2e9b0 _start (/home/nathan/tmp/build/llvm/stage1/bin/clang-15+0x217e9b0)
clang-15: error: clang frontend command failed with exit code 134 (use -v to see invocation)
ClangBuiltLinux clang version 15.0.0 (https://github.com/llvm/llvm-project 1fbdf3a02ed610a9d03ea0b908f2b9cdab75a82a)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/nathan/tmp/build/llvm/stage1/bin
clang-15: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.

A simplified C reproducer:

Thanks, I reverted the change while I investigate. It looks like the issue is that we create VPWidenCall recipes even in VPlans with VF = 1.

alexfh added a subscriber: alexfh.May 9 2022, 9:13 AM

The last recommit of this patch (8b48223447311af8b3022697dd58858e1ce6975f "Recommit ""[VPlan] Remove uneeded needsVectorIV check.""") causes the compiler to crash on some of our code. A reduced test case:

$ cat q.cc
int a[2];
int b;
bool c() {
  bool d;
  for (int e; e; e--) {
    d = !d;
    b += d ?: a[e];
  }
  return b;
}
$ ./clang -cc1 -triple x86_64--linux-gnu -emit-obj -O2 -vectorize-loops q.cc
Stack dump:
0.      Program arguments: ./clang -cc1 -triple x86_64--linux-gnu -emit-obj -O2 -vectorize-loops q.cc
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x000055cb5104bfa8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (./clang+0x6e4bfa8)
 #1 0x000055cb5104c3c5 SignalHandler(int) (./clang+0x6e4c3c5)
 #2 0x00007f464c15b750 __restore_rt (/usr/grte/v5/lib64/libpthread.so.0+0x15750)
 #3 0x000055cb4d2f15fc llvm::VPlanTransforms::optimizeInductions(llvm::VPlan&, llvm::ScalarEvolution&) (./clang+0x30f15fc)
 #4 0x000055cb4d2eae74 llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(llvm::VFRange&, llvm::SmallPtrSetImpl<llvm::Instruction*>&, llvm::MapVector<llvm::Instruction*, llvm::Instruction*, llvm::DenseMap<llvm::Instruction*, unsigned int, llvm::DenseMapInfo<llvm::Instruction*, void>, llvm::detail::DenseMapPair<llvm::Instruction*, unsigned int> >, std::__u::vector<std::__u::pair<llvm::Instruction*, llvm::Instruction*>, std::__u::allocator<std::__u::pair<llvm::Instruction*, llvm::Instruction*> > > > const&) (./clang+0x30eae74)
 #5 0x000055cb4d2e903b llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes(llvm::ElementCount, llvm::ElementCount) (./clang+0x30e903b)
 #6 0x000055cb4e865734 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (.cold) (./clang+0x4665734)
 #7 0x000055cb4d143dcf llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__u::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) (./clang+0x2f43dcf)
 #8 0x000055cb4d143b7d llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (./clang+0x2f43b7d)
 #9 0x000055cb4d143832 llvm::detail::PassModel<llvm::Function, llvm::LoopVectorizePass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (./clang+0x2f43832)
#10 0x000055cb4db331fd llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (./clang+0x39331fd)
#11 0x000055cb4db32db2 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (./clang+0x3932db2)
#12 0x000055cb4d95fd16 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (./clang+0x375fd16)
#13 0x000055cb4d3cca92 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (./clang+0x31cca92)
#14 0x000055cb4d1fdbe4 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (./clang+0x2ffdbe4)
#15 0x000055cb4d233efd (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream> >&, std::__u::unique_ptr<llvm::ToolOutputFile, std::__u::default_delete<llvm::ToolOutputFile> >&) (./clang+0x3033efd)
#16 0x000055cb4d22cee5 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream> >) (./clang+0x302cee5)
#17 0x000055cb4dd2e253 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (./clang+0x3b2e253)
#18 0x000055cb4d226af4 clang::ParseAST(clang::Sema&, bool, bool) (./clang+0x3026af4)
#19 0x000055cb4d308654 clang::FrontendAction::Execute() (./clang+0x3108654)
#20 0x000055cb4d308472 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (./clang+0x3108472)
#21 0x000055cb4d3082e5 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (./clang+0x31082e5)
#22 0x000055cb4dcf3fca cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (./clang+0x3af3fca)
#23 0x000055cb4d3cab81 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) (./clang+0x31cab81)
#24 0x000055cb4dbdddf2 main (./clang+0x39dddf2)
#25 0x00007f464bfe98d3 __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x628d3)
#26 0x000055cb4d3ca9fa _start (./clang+0x31ca9fa)
Segmentation fault

Do you see how to fix this quickly? Otherwise, please revert the commit while you're working on a fix. Thanks!

fhahn added a comment.May 9 2022, 12:34 PM

Do you see how to fix this quickly? Otherwise, please revert the commit while you're working on a fix. Thanks!

Thanks for the test case! This highlights other parts that need to be cleaned up before landing this again. I've reverted the patch for now.

fhahn added a comment.May 9 2022, 12:36 PM

Do you see how to fix this quickly? Otherwise, please revert the commit while you're working on a fix. Thanks!

Thanks for the test case! This highlights other parts that need to be cleaned up before landing this again. I've reverted the patch for now.

D124936 is part of the required cleanup.

fhahn added a comment.Jun 8 2022, 10:54 AM

Do you see how to fix this quickly? Otherwise, please revert the commit while you're working on a fix. Thanks!

Thanks for the test case! This highlights other parts that need to be cleaned up before landing this again. I've reverted the patch for now.

I recommitted the change now that all pending cleanups have landed.

Hi @fhahn , this looks like a regression from your commit below:

commit cedfd7a2e536d2ff6da44e89a024baa402dc3e58
Author: Florian Hahn <flo@fhahn.com>
Date:   Wed Jun 8 14:06:45 2022 +0100

    Recommit "[VPlan] Remove uneeded needsVectorIV check."

    This reverts commit 266ea446ab747671eb6c736569c3c9c5f3c53d11.

    The reasons for the revert have been addressed by cleaning up condition
    handling in VPlan and properly marking VPBranchOnMaskRecipe as using
    scalars.

    The test case for the revert from D123720 has been added in 3d663308a5d.

Last good commit is 98d4f0651a7f90df161167362e2060e0b4de7969. Could you please have a look?

> cat red3.c
int foo(float *p, int n)
{
  int count = 0, prev = 0;
  for (int i = 0; i < n; i++) {
    if (p[prev] < 0)
      count++;
    prev = i;
  }
  return count;
}
clang -mcpu=pwr10 -O2 -fprofile-generate -c red3.c -target "powerpc-ibm-aix7.2.0.0"

or reproduce with the attached IR using:

opt -passes=loop-vectorize input.ll -S -o out.ll -debug-only=loop-vectorize
...
LoopVectorize.cpp:9434: virtual void llvm::VPWidenIntOrFpInductionRecipe::execute(llvm::VPTransformState &): Assertion `State.VF.isVector() && "must have vector VF"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/bmahjour/workspaces/wyvern_all/upstream-fork/build/bin/opt -passes=loop-vectorize input.ll -S -o out.ll -debug-only=loop-vectorize
 #0 0x0000000012ea7414 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/bmahjour/workspaces/wyvern_all/upstream-fork/build/bin/opt+0x12ea7414)
 #1 0x0000000012ea7834 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #2 0x0000000012ea4558 llvm::sys::RunSignalHandlers() (/home/bmahjour/workspaces/wyvern_all/upstream-fork/build/bin/opt+0x12ea4558)
 #3 0x0000000012ea7afc SignalHandler(int) Signals.cpp:0:0
 #4 0x00002000000604c8 (linux-vdso64.so.1+0x4c8)
 #5 0x000020000064d168 __libc_signal_restore_set /build/glibc-E31xyL/glibc-2.31/signal/../sysdeps/unix/sysv/linux/internal-signals.h:86:3
 #6 0x000020000064d168 raise /build/glibc-E31xyL/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:48:3
 #7 0x0000200000624850 abort /build/glibc-E31xyL/glibc-2.31/stdlib/abort.c:79:7
 #8 0x000020000063c49c __assert_fail_base /build/glibc-E31xyL/glibc-2.31/assert/assert.c:92:3
 #9 0x000020000063c540 __assert_fail /build/glibc-E31xyL/glibc-2.31/assert/assert.c:101:3
#10 0x00000000130df418 llvm::VPWidenIntOrFpInductionRecipe::execute(llvm::VPTransformState&) (/home/bmahjour/workspaces/wyvern_all/upstream-fork/build/bin/opt+0x130df418)
#11 0x00000000131abd6c llvm::VPBasicBlock::execute(llvm::VPTransformState*) (/home/bmahjour/workspaces/wyvern_all/upstream-fork/build/bin/opt+0x131abd6c)
#12 0x00000000131ad898 llvm::VPRegionBlock::execute(llvm::VPTransformState*) (/home/bmahjour/workspaces/wyvern_all/upstream-fork/build/bin/opt+0x131ad898)
#13 0x00000000131b1188 llvm::VPlan::execute(llvm::VPTransformState*) (/home/bmahjour/workspaces/wyvern_all/upstream-fork/build/bin/opt+0x131b1188)
#14 0x00000000130ce954 llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*) (/home/bmahjour/workspaces/wyvern_all/upstream-fork/build/bin/opt+0x130ce954)

Looks like we try to vectorize by VF=1 UF=8, but the VPlan has a widen-induction recipe:

VPlan 'Initial VPlan for VF={1},UF>=1' {
Live-in vp<%1> = vector-trip-count

vector.ph:
Successor(s): vector loop

<x1> vector loop: {
  vector.body:
    EMIT vp<%2> = CANONICAL-INDUCTION
    FIRST-ORDER-RECURRENCE-PHI ir<%prev.011> = phi ir<0>, ir<%indvars15>
    WIDEN-REDUCTION-PHI ir<%count.010> = phi ir<0>, ir<%count.1>
    WIDEN-INDUCTION\l" +
    "  %pgocount1314 = phi %5, 0\l" +
    "  ir<%indvars15>, ir<1>
    EMIT vp<%6> = first-order splice ir<%prev.011> ir<%indvars15>
    CLONE ir<%arrayidx> = getelementptr ir<%p>, vp<%6>
    CLONE ir<%6> = load ir<%arrayidx>
    CLONE ir<%cmp1> = fcmp ir<%6>, ir<0.000000e+00>
    CLONE ir<%inc> = zext ir<%cmp1>
    CLONE ir<%count.1> = add ir<%count.010>, ir<%inc>
    EMIT vp<%12> = VF * UF +(nuw)  vp<%2>
    EMIT branch-on-count  vp<%12> vp<%1>
  No successors
}

In the good case the VPlan uses scalar-steps recipe instead:

VPlan 'Initial VPlan for VF={1},UF>=1' {
Live-in vp<%1> = vector-trip-count

vector.ph:
Successor(s): vector loop

<x1> vector loop: {
  vector.body:
    EMIT vp<%2> = CANONICAL-INDUCTION
    FIRST-ORDER-RECURRENCE-PHI ir<%prev.011> = phi ir<0>, vp<%5>
    WIDEN-REDUCTION-PHI ir<%count.010> = phi ir<0>, ir<%count.1>
    vp<%5>    = SCALAR-STEPS vp<%2>, ir<0>, ir<1>
    EMIT vp<%6> = first-order splice ir<%prev.011> vp<%5>
    CLONE ir<%arrayidx> = getelementptr ir<%p>, vp<%6>
    CLONE ir<%6> = load ir<%arrayidx>
    CLONE ir<%cmp1> = fcmp ir<%6>, ir<0.000000e+00>
    CLONE ir<%inc> = zext ir<%cmp1>
    CLONE ir<%count.1> = add ir<%count.010>, ir<%inc>
    EMIT vp<%12> = VF * UF +(nuw)  vp<%2>
    EMIT branch-on-count  vp<%12> vp<%1>
  No successors
}
fhahn added a comment.Jun 23 2022, 9:37 AM

Hi @fhahn , this looks like a regression from your commit below:

Thanks for the reproducer! Looking into it now. I think we will also need to update the way we print truncated widen IV recipes, as this is at the moment quite confusing.

The users of WIDEN-INDUCTION in the above recipe are the FIRST-ORDER-RECURRENCE-PHI ir<%prev.011> = phi ir<0>, ir<%indvars15> and EMIT vp<%6> = first-order splice ir<%prev.011> ir<%indvars15>. The first one is a VPFirstOrderRecurrencePHIRecipe and the second one is a VPInstruction, neither of which override usesScalars to return true. It doesn't seem safe to make useScalars return true for these types of recipes, so perhaps we still need to check IV->needsVectorIV()?

Note this is also related to rG85983ca42ec6102b1692d0451090cacd002c958f, so simply reverting the above commit won't fix the case I quoted above.

fhahn added a comment.Jun 28 2022, 1:06 PM

The users of WIDEN-INDUCTION in the above recipe are the FIRST-ORDER-RECURRENCE-PHI ir<%prev.011> = phi ir<0>, ir<%indvars15> and EMIT vp<%6> = first-order splice ir<%prev.011> ir<%indvars15>. The first one is a VPFirstOrderRecurrencePHIRecipe and the second one is a VPInstruction, neither of which override usesScalars to return true. It doesn't seem safe to make useScalars return true for these types of recipes, so perhaps we still need to check IV->needsVectorIV()?

Note this is also related to rG85983ca42ec6102b1692d0451090cacd002c958f, so simply reverting the above commit won't fix the case I quoted above.

Thanks, I put up D128755 to fix it by making sure that we replace all widen-inductions with scalar steps for plans with scalar VFs.

The users of WIDEN-INDUCTION in the above recipe are the FIRST-ORDER-RECURRENCE-PHI ir<%prev.011> = phi ir<0>, ir<%indvars15> and EMIT vp<%6> = first-order splice ir<%prev.011> ir<%indvars15>. The first one is a VPFirstOrderRecurrencePHIRecipe and the second one is a VPInstruction, neither of which override usesScalars to return true. It doesn't seem safe to make useScalars return true for these types of recipes, so perhaps we still need to check IV->needsVectorIV()?

Note this is also related to rG85983ca42ec6102b1692d0451090cacd002c958f, so simply reverting the above commit won't fix the case I quoted above.

Thanks, I put up D128755 to fix it by making sure that we replace all widen-inductions with scalar steps for plans with scalar VFs.

This is strange, but I'm still getting the same crash even when applying https://reviews.llvm.org/D128755. Looks like Plan.hasVF(ElementCount::getFixed(1)); returns false for both vplans!

danlark added a subscriber: danlark.EditedDec 21 2022, 5:17 PM

Hi @fhahn, this is the first commit that caused a miscompilation we observed.

I don't particular think this is the root cause and likely it uncovered some other bug but still

#include <optional>
#include <string>
#include <string_view>
#include <iostream>

__attribute__((always_inline)) static inline unsigned char constant_time_ge_8(unsigned int a, unsigned int b) {
  return ~(((unsigned int)(a - b)) >> 15);
}

__attribute__((noinline)) unsigned char CheckPadding(unsigned char *data, unsigned int length)
{
    unsigned int padding_length = data[length - 1];
    unsigned char res = 0;

    for (unsigned int i = 0; i < length - 1; i++) {
        unsigned char mask = constant_time_ge_8(padding_length, i);
        unsigned char b = data[length - 1 - i];
        res |= mask & (padding_length ^ b);
    }

    return res;
}

__attribute__((noinline)) unsigned char CheckPaddingNoVec(unsigned char *data, unsigned int length)
{
    unsigned int padding_length = data[length - 1];
    unsigned char res = 0;

#pragma clang loop vectorize(disable)
    for (unsigned int i = 0; i < length - 1; i++) {
        unsigned char mask = constant_time_ge_8(padding_length, i);
        unsigned char b = data[length - 1 - i];
        res |= mask & (padding_length ^ b);
    }

    return res;
}

__attribute__((aligned(16))) unsigned char data_length41[41] = {
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x03, 0x03, 0x03, 0x03};

__attribute__((aligned(16))) unsigned char data_length40[40] = {
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x03, 0x03, 0x03};

int main() {
    std::cout << (int)CheckPadding(data_length40, sizeof(data_length40)) << std::endl;
    std::cout << (int)CheckPadding(data_length41, sizeof(data_length41)) << std::endl;
    std::cout << (int)CheckPaddingNoVec(data_length40, sizeof(data_length40)) << std::endl;
    std::cout << (int)CheckPaddingNoVec(data_length41, sizeof(data_length41)) << std::endl;
}

This has the output of 0 3 0 0 with the recommitted patch whereas the correct result is 0 0 0 0. https://gcc.godbolt.org/z/Mz9eeMscn

If we apply back

--- a/llvm-project/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm-project/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -419,7 +419,12 @@ void VPlanTransforms::optimizeInductions
 
     VPScalarIVStepsRecipe *Steps = new VPScalarIVStepsRecipe(ID, BaseIV, Step);
     HeaderVPBB->insert(Steps, IP);
-
+    // If there are no vector users of IV, simply update all users to use Step
+    // instead.
+    if (!WideIV->needsVectorIV()) {
+      WideIV->replaceAllUsesWith(Steps);
+      continue;
+    }
     // Update scalar users of IV to use Step instead. Use SetVector to ensure
     // the list of users doesn't contain duplicates.
     SetVector<VPUser *> Users(WideIV->user_begin(), WideIV->user_end());

Then it passes