This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve isGatherShuffledEntry by trying per-register shuffle.
ClosedPublic

Authored by ABataev on May 3 2023, 5:47 AM.

Details

Summary

Currently when building gather/buildvector node, we try to build nodes
shuffles without taking into account separate vector registers. WE can
improve final codegen and the whole vectorization process by including
this info into the analysis and the vector code emission, allows to emit
better vectorized code.

Diff Detail

Event Timeline

ABataev created this revision.May 3 2023, 5:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2023, 5:47 AM
ABataev requested review of this revision.May 3 2023, 5:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2023, 5:47 AM
RKSimon added inline comments.May 4 2023, 3:54 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2515

Update the \returns description

ABataev updated this revision to Diff 519650.May 4 2023, 2:06 PM

Address comments

Hi Alexey,

I tried to measure performance impact of this patch. Observed ~10% regression on nnet_test spec from coremark-pro test suite. (on avx2 with -flto).

Hi Alexey,

I tried to measure performance impact of this patch. Observed ~10% regression on nnet_test spec from coremark-pro test suite. (on avx2 with -flto).

Could you provide more details what is the cause? The patch itself just improves shuffles emissions, most probably there are some problems with the TTI cost model.

command to reproduce: opt -passes=slp-vectorizer -mtriple=x86_64 -mcpu=core-avx2 -S test.ll

BTW. which gcc version you use to build? I've experienced problem building with both 7.5.0 and 8.5.0 (I did not try any later versions) while there were no issues w/o patch.
Here is fail log:
<dir>/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3573:24: error: declaration of llvm::TargetTransformInfo* llvm::slpvectorizer::BoUpSLP::TTI [-fpermissive]

TargetTransformInfo *TTI;
                     ^~~

In file included from <dir>/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:47:0:
<dir>/llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h:205:29: error: changes meaning of TTI from typedef class llvm::TargetTransformInfo llvm::TTI [-fpermissive]
typedef TargetTransformInfo TTI;

^~~
ABataev added a comment.EditedJun 29 2023, 9:55 AM

command to reproduce: opt -passes=slp-vectorizer -mtriple=x86_64 -mcpu=core-avx2 -S test.ll

HMM, llvm-mca shows that the code after patch is better than the code before (with the patch the throughput is 4.0, without - 5.0)

https://godbolt.org/z/zvzTKedPd

BTW. which gcc version you use to build? I've experienced problem building with both 7.5.0 and 8.5.0 (I did not try any later versions) while there were no issues w/o patch.
Here is fail log:
<dir>/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3573:24: error: declaration of llvm::TargetTransformInfo* llvm::slpvectorizer::BoUpSLP::TTI [-fpermissive]

TargetTransformInfo *TTI;
                     ^~~

In file included from <dir>/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:47:0:
<dir>/llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h:205:29: error: changes meaning of TTI from typedef class llvm::TargetTransformInfo llvm::TTI [-fpermissive]
typedef TargetTransformInfo TTI;

^~~

It is most probably because of the using TTI:: in the function declaration, will fix this.

command to reproduce: opt -passes=slp-vectorizer -mtriple=x86_64 -mcpu=core-avx2 -S test.ll

HMM, llvm-mca shows that the code after patch is better than the code before (with the patch the throughput is 4.0, without - 5.0)

https://godbolt.org/z/zvzTKedPd

It puzzled me too actually. It was the only vectorization candidate that changed behavior due to the patch in the test. I could not find any obvious flaw in handling subsequent vectorization sequences.
Eyeballing of generated code before/after also did not reveal any obvious issues.

command to reproduce: opt -passes=slp-vectorizer -mtriple=x86_64 -mcpu=core-avx2 -S test.ll

HMM, llvm-mca shows that the code after patch is better than the code before (with the patch the throughput is 4.0, without - 5.0)

https://godbolt.org/z/zvzTKedPd

It puzzled me too actually. It was the only vectorization candidate that changed behavior due to the patch in the test. I could not find any obvious flaw in handling subsequent vectorization sequences.
Eyeballing of generated code before/after also did not reveal any obvious issues.

I fixed cost estimation, now it results in the same translation (I added your test, no changes)

ABataev updated this revision to Diff 538237.Jul 7 2023, 1:02 PM

Rebase, ping!

ABataev updated this revision to Diff 538755.Jul 10 2023, 11:46 AM

Rebase, ping!

ABataev updated this revision to Diff 540995.Jul 17 2023, 6:40 AM

Rebase, ping!

ABataev updated this revision to Diff 544471.Jul 26 2023, 12:20 PM

Rebase, ping!

ABataev updated this revision to Diff 546043.Aug 1 2023, 7:14 AM

Rebase, ping!!!
Required to continue the work on cost model unification and non-power-2.

ABataev updated this revision to Diff 547869.Aug 7 2023, 11:25 AM

Rebase, ping!

ABataev updated this revision to Diff 548992.Aug 10 2023, 5:33 AM

Rebase, ping!

Why are we doing this in SLP and not TTI getShuffleCost? It looks like TTI should be doing a better job of recognizing how it should split the shuffle mask and determine the shuffle kinds per split?

Why are we doing this in SLP and not TTI getShuffleCost? It looks like TTI should be doing a better job of recognizing how it should split the shuffle mask and determine the shuffle kinds per split?

I thought about this. Unfortunately, TTI is too late here + it does not have idion of TreeEntry.

What this function does? It scans the list of scalars and then checks, if it can build a shuffle, using previously build TreeEntries (use previously build vector to build the new one) to reduce number of inserts.
It supports only 1- and 2-vector shuffles, as 3- and more vector shuffles may not be effective for some targets.
Why cannot it be done in TTI? 1. TTI does not know about TreeEntries. 2. Without this knowledge it won't help in complex situations.

Assume we have 3 already built TreeEntries: <abcd>, <efgh>, <ijkl>. But the actual vector register includes only 2 elements (i.e. each this vector uses 2 vector registers). And we trying to build a new buildvector (NeedToGather) entry with elements <acih>.
Original function will help to translate this gather node into this:

%s1 = shuffle <abcd>, <efgh>, <0,2,poison,7>
%v = insertelement %s1, i, 2 - might be very cost ineffective + there might be several such inserts, increasing the overall cost of the buildvector/gather node.

This reworked function can do this:

%s1 = shuffle <abcd>, poison, <0,2>
%s2 = shuffle <efgh>, <ijkl>, <4,3>
%v = shuffle %s1, %s2, <0,1,2,3> - free insert subvector shuffle

TTI won't be able to transform the first variant to the second one without knowing that there is <ijkl> vector already exists. It requires teaching it about the whole SLP graph idiom.

ABataev updated this revision to Diff 552339.Aug 22 2023, 6:51 AM

Rebase, ping!

ABataev updated this revision to Diff 554447.Aug 29 2023, 11:59 AM

Rebase, ping!

ikelarev removed a subscriber: ikelarev.Sep 6 2023, 8:10 AM
reames added a subscriber: reames.Sep 6 2023, 2:44 PM
reames added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
9177–9179

Stylistic suggestion here. This function is quite complicated, and following all the changes through are tricky. I *think* this is just performing the same operation for each sub-range (divided by register). If so, I'd suggest leaving the function alone, and introducing a wrapper which just calls the old version with a sub-range of the VLs, and builds up the vector results.

If this works here, then applying the same basic idea throughout the patch would make it much easier to follow and review.

ABataev added inline comments.Sep 6 2023, 3:04 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
9177–9179

Yes, it splits the multi-register vector into several one-register vectors and just performs the same analysis, as before, to avoid long multi-vector shuffles. Will try to move the previous logic into a separate function.

ABataev updated this revision to Diff 556298.Sep 8 2023, 1:12 PM

Rebase, address comments

ABataev updated this revision to Diff 556434.Sep 11 2023, 8:05 AM

Rebase, ping!

ABataev updated this revision to Diff 556877.Sep 15 2023, 12:47 PM

Rebase, ping!

I've done my best, but I'm really struggling to understand a lot of this patch :(

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7260

Add method description, also why is it called estimateNodesPermuteCost when it doesn't seem to use/return costs?

7269

This logic makes very little sense to me

7830

Would we be better off merging the 1TE + 2TE add() methods?

I've done my best, but I'm really struggling to understand a lot of this patch :(

Generally speaking, it does the same as before, just splits the node into vector registers and tries to do per register shuffles instead of per node shuffles. It may reduce final number of shuffles, since we do not need to permute all the register in the node, just some of them. The single node may result in several vector registers, no need to permute all of them.

ABataev added inline comments.Sep 20 2023, 1:13 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7260

It adds the cost to final cost estimation, see e.g. lines 7169 and 7176.

7269

If we decided to permute E1 and (possibly) E2 already, and we found that we again need to permute the same nodes, no need to do it immediately, instead need to include the sub-Mask into CommonMask and do it later to avoid double shuffling of the same nodes.

7830

I'll try to rework this function to avoid it.

ABataev updated this revision to Diff 557185.Sep 21 2023, 9:11 AM

Rebase, address comments

ABataev updated this revision to Diff 557616.Oct 5 2023, 9:23 AM

Rebase, ping!

RKSimon added inline comments.Oct 17 2023, 7:34 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
9523

Expected positive

9531

assert(VL.size() % NumParts == 0)?

10343

Pull these kind of NFC cleanups out of the patch

ABataev updated this revision to Diff 557763.Oct 18 2023, 1:04 PM

Rebase, address comments

RKSimon added inline comments.Oct 18 2023, 3:03 PM
llvm/test/Transforms/SLPVectorizer/X86/multi-nodes-to-shuffle.ll
1–3

Please can you add a AVX2 run to confirm that the shuffles are still being suitably split for legal 256-bit vectors?

; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux -slp-threshold=-115 | FileCheck % --check-prefix=SSE
; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux mattr=+avx2 -slp-threshold=-115 | FileCheck % --check-prefix=AVX
vdmitrie added inline comments.Oct 18 2023, 6:20 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7007

I'm not sure I understand the purpose of this flag. Could you please describe?
It is quite confusing that it is already set right at construction when no estimation was done yet.

7275

InVectors.size() >=1 is same as !InVectors.empty()

Having this check assumes that InVectors can be empty here, but at the same time InVectors.front() below (at line 7267) assumes it is not empty. So which one is the right assumption?

7741–7742

perhaps worth renaming to GatherShuffles since it becomes an array of optionals

7743

This ScalarTy declaration hides one at the function scope.

7749

ScalarTy ?

7759

VecTy is already defined at 7674. Can just reuse it.

ABataev updated this revision to Diff 557790.Oct 19 2023, 10:42 AM

Rebase, address comments

ABataev updated this revision to Diff 557791.Oct 19 2023, 10:43 AM

Fix comment

ABataev updated this revision to Diff 557793.Oct 19 2023, 10:57 AM

Fix formatting

@vdmitrie Any more comments?

vdmitrie added inline comments.Oct 23 2023, 1:22 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7007

Would this sound more accurate: "While set, we are still trying ..." ?
I just did only see it cleared.

7275

InVectors.size() >=1 is same as !InVectors.empty()

Having this check assumes that InVectors can be empty here, but at the same time InVectors.front() below (at line 7267) assumes it is not empty. So which one is the right assumption?

What about the assumption? I believe "!InVectors.empty()" is always true. That is why I asked the question about assumption. This method is called twice in the patch and both have early return when InVectors is empty.

9523

Typo: "poistive"

10420–10421

nit: suggest hoisting "Entries.front().front()" subexpr and assign to dedicated local variable of type "const TreeEntry *"

ABataev added inline comments.Oct 23 2023, 2:13 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7275

Yes, this check can be dropped here.

ABataev updated this revision to Diff 557850.Oct 23 2023, 2:14 PM

Rebase, address comments

This revision is now accepted and ready to land.Oct 23 2023, 2:32 PM
RKSimon accepted this revision.Oct 25 2023, 1:25 AM

LGTM

This revision was landed with ongoing or failed builds.Oct 26 2023, 6:00 AM
This revision was automatically updated to reflect the committed changes.

Hi, this reland causes crash: https://github.com/llvm/llvm-project/commit/196d154ab7a76e8ccb11addf61ff53387e397130.

ld.lld: /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7602: void llvm::slpvectorizer::BoUpSLP::ShuffleCostEstimator::add(const TreeEntry &, ArrayRef<int>): Assertion `NumParts > 0 && NumParts < Mask.size() && "Expected positive number of registers."' failed.

@zequanwu Do you have a repro please?

@zequanwu Do you have a repro please?

This only happens when using -fprofile-use=. I'll try to come up with a smaller repro on Monday.

Stack trace:

clang: /usr/local/google/home/zequanwu/workspace/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7602: void llvm::slpvectorizer::BoUpSLP::ShuffleCostEstimator::add(const TreeEntry &, ArrayRef<int>): Assertion `NumParts > 0 && NumParts < Mask.size() && "Expected positive number of registers."' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang -MMD -MF obj/third_party/boringssl/boringssl/curve25519.o.d -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_OZONE=1 -DOFFICIAL_BUILD -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DNO_UNWIND_TABLES -D_GNU_SOURCE -D_LIBCPP_ENABLE_SAFE_MODE=1 -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -D_LIBCXXABI_DISABLE_VISIBILITY_ANNOTATIONS -DCR_LIBCXX_REVISION=9b27200e8219bbc9e05ba8435456554a6e35d67d -DCR_SYSROOT_KEY=20230611T210420Z-2 -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -DBORINGSSL_IMPLEMENTATION -D_BORINGSSL_LIBPKI_ -DBORINGSSL_ALLOW_CXX_RUNTIME -DBORINGSSL_NO_STATIC_INITIALIZER -DOPENSSL_SMALL -I../.. -Igen -I../../buildtools/third_party/libc++ -I../../third_party/boringssl/src/include -fno-delete-null-pointer-checks -fno-ident -fno-strict-aliasing -fstack-protector -fno-unwind-tables -fno-asynchronous-unwind-tables -fPIC -pthread -fcolor-diagnostics -fmerge-all-constants -fcrash-diagnostics-dir=../../tools/clang/crashreports -mllvm -instcombine-lower-dbg-declare=0 -mllvm -split-threshold-for-reg-with-hint=0 -ffp-contract=off -fcomplete-member-pointers -m64 -msse3 -ffile-compilation-dir=. -no-canonical-prefixes -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -gdwarf-4 -g2 -gdwarf-aranges -ggnu-pubnames -Xclang -fuse-ctor-homing -fprofile-use=../../chrome/build/pgo_profiles/chrome-linux-main-1698407887-6c91c384bff1766cf9949635c133936a2feea1bd.profdata -Wno-profile-instr-unprofiled -Wno-profile-instr-out-of-date -Wno-backend-plugin -mllvm -enable-ext-tsp-block-placement=1 -fvisibility=hidden -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -Wall -Wno-unused-variable -Wno-c++11-narrowing -Wno-unused-but-set-variable -Wno-misleading-indentation -Wno-missing-field-initializers -Wno-unused-parameter -Wno-psabi -Wloop-analysis -Wno-unneeded-internal-declaration -Wenum-compare-conditional -Wno-ignored-pragma-optimize -Wno-deprecated-builtins -Wno-bitfield-constant-conversion -Wno-deprecated-this-capture -Wno-invalid-offsetof -Wno-vla-extension -Wno-thread-safety-reference-return -Wno-delayed-template-parsing-in-cxx20 -Werror -O2 -fdata-sections -ffunction-sections -fno-unique-section-names -fno-math-errno -std=c11 --sysroot=../../build/linux/debian_bullseye_amd64-sysroot -c ../../third_party/boringssl/src/crypto/curve25519/curve25519.c -o obj/third_party/boringssl/boringssl/curve25519.o
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x0000556e5dfa28c8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x7cbf8c8)
 #1 0x0000556e5dfa048e llvm::sys::RunSignalHandlers() (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x7cbd48e)
 #2 0x0000556e5df0a786 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007f92cb85a510 (/lib/x86_64-linux-gnu/libc.so.6+0x3c510)
 #4 0x00007f92cb8a80fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007f92cb85a472 raise ./signal/../sysdeps/posix/raise.c:27:6
 #6 0x00007f92cb8444b2 abort ./stdlib/abort.c:81:7
 #7 0x00007f92cb8443d5 _nl_load_domain ./intl/loadmsgcat.c:1177:9
 #8 0x00007f92cb8533a2 (/lib/x86_64-linux-gnu/libc.so.6+0x353a2)
 #9 0x0000556e5f69792c llvm::slpvectorizer::BoUpSLP::ShuffleCostEstimator::add(llvm::slpvectorizer::BoUpSLP::TreeEntry const&, llvm::ArrayRef<int>) SLPVectorizer.cpp:0:0
#10 0x0000556e5f692f38 llvm::slpvectorizer::BoUpSLP::getEntryCost(llvm::slpvectorizer::BoUpSLP::TreeEntry const*, llvm::ArrayRef<llvm::Value*>, llvm::SmallPtrSetImpl<llvm::Value*>&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93aff38)
#11 0x0000556e5f69c33b llvm::slpvectorizer::BoUpSLP::getTreeCost(llvm::ArrayRef<llvm::Value*>) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93b933b)
#12 0x0000556e5f6c0a1a llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93dda1a)
#13 0x0000556e5f6c1a2c llvm::SLPVectorizerPass::tryToVectorize(llvm::Instruction*, llvm::slpvectorizer::BoUpSLP&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93dea2c)
#14 0x0000556e5f6c4be2 llvm::SLPVectorizerPass::tryToVectorize(llvm::ArrayRef<llvm::WeakTrackingVH>, llvm::slpvectorizer::BoUpSLP&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93e1be2)
#15 0x0000556e5f6c4b2f llvm::SLPVectorizerPass::vectorizeRootInstruction(llvm::PHINode*, llvm::Instruction*, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&, llvm::TargetTransformInfo*) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93e1b2f)
#16 0x0000556e5f6bbc2a llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93d8c2a)
#17 0x0000556e5f6b96fc llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93d66fc)
#18 0x0000556e5f6b8c9e llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93d5c9e)
#19 0x0000556e5f37319d llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#20 0x0000556e5d9f2a24 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x770fa24)
#21 0x0000556e5be0377d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) AMDGPUTargetMachine.cpp:0:0
#22 0x0000556e5d9f6cb3 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x7713cb3)
#23 0x0000556e5be0351d llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) AMDGPUTargetMachine.cpp:0:0
#24 0x0000556e5d9f1c04 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x770ec04)
#25 0x0000556e5e74f5b5 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&) BackendUtil.cpp:0:0
#26 0x0000556e5e746102 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x8463102)
#27 0x0000556e5ec426b1 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) CodeGenAction.cpp:0:0
#28 0x0000556e60390b56 clang::ParseAST(clang::Sema&, bool, bool) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0xa0adb56)
#29 0x0000556e5eb59f7f clang::FrontendAction::Execute() (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x8876f7f)
#30 0x0000556e5eacb01d clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x87e801d)
#31 0x0000556e5ec3ad4e clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x8957d4e)
#32 0x0000556e5ba36012 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x5753012)
#33 0x0000556e5ba323ed ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#34 0x0000556e5e92b9b9 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#35 0x0000556e5df0a4c6 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x7c274c6)
#36 0x0000556e5e92b0c2 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x86480c2)
#37 0x0000556e5e8e6107 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x8603107)
#38 0x0000556e5e8e6647 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x8603647)
#39 0x0000556e5e9065e9 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x86235e9)
#40 0x0000556e5ba31836 clang_main(int, char**, llvm::ToolContext const&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x574e836)
#41 0x0000556e5ba42871 main (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x575f871)
#42 0x00007f92cb8456ca __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#43 0x00007f92cb845785 call_init ./csu/../csu/libc-start.c:128:20
#44 0x00007f92cb845785 __libc_start_main ./csu/../csu/libc-start.c:347:5
#45 0x0000556e5ba2e8e1 _start (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x574b8e1)
clang: error: clang frontend command failed with exit code 134 (use -v to see invocation)
clang version 18.0.0 (git@github.com:ZequanWu/llvm-project.git 196d154ab7a76e8ccb11addf61ff53387e397130)

@zequanwu Do you have a repro please?

I prepared the repro myself, will fix it ASAP.

thakis added a subscriber: thakis.Oct 30 2023, 7:32 AM

If this is known broken at head, can we revert while investigation is underway?

If this is known broken at head, can we revert while investigation is underway?

Will commit a fix in a minute

@zequanwu Do you have a repro please?

This only happens when using -fprofile-use=. I'll try to come up with a smaller repro on Monday.

Stack trace:

clang: /usr/local/google/home/zequanwu/workspace/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7602: void llvm::slpvectorizer::BoUpSLP::ShuffleCostEstimator::add(const TreeEntry &, ArrayRef<int>): Assertion `NumParts > 0 && NumParts < Mask.size() && "Expected positive number of registers."' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang -MMD -MF obj/third_party/boringssl/boringssl/curve25519.o.d -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_OZONE=1 -DOFFICIAL_BUILD -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DNO_UNWIND_TABLES -D_GNU_SOURCE -D_LIBCPP_ENABLE_SAFE_MODE=1 -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -D_LIBCXXABI_DISABLE_VISIBILITY_ANNOTATIONS -DCR_LIBCXX_REVISION=9b27200e8219bbc9e05ba8435456554a6e35d67d -DCR_SYSROOT_KEY=20230611T210420Z-2 -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -DBORINGSSL_IMPLEMENTATION -D_BORINGSSL_LIBPKI_ -DBORINGSSL_ALLOW_CXX_RUNTIME -DBORINGSSL_NO_STATIC_INITIALIZER -DOPENSSL_SMALL -I../.. -Igen -I../../buildtools/third_party/libc++ -I../../third_party/boringssl/src/include -fno-delete-null-pointer-checks -fno-ident -fno-strict-aliasing -fstack-protector -fno-unwind-tables -fno-asynchronous-unwind-tables -fPIC -pthread -fcolor-diagnostics -fmerge-all-constants -fcrash-diagnostics-dir=../../tools/clang/crashreports -mllvm -instcombine-lower-dbg-declare=0 -mllvm -split-threshold-for-reg-with-hint=0 -ffp-contract=off -fcomplete-member-pointers -m64 -msse3 -ffile-compilation-dir=. -no-canonical-prefixes -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -gdwarf-4 -g2 -gdwarf-aranges -ggnu-pubnames -Xclang -fuse-ctor-homing -fprofile-use=../../chrome/build/pgo_profiles/chrome-linux-main-1698407887-6c91c384bff1766cf9949635c133936a2feea1bd.profdata -Wno-profile-instr-unprofiled -Wno-profile-instr-out-of-date -Wno-backend-plugin -mllvm -enable-ext-tsp-block-placement=1 -fvisibility=hidden -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -Wall -Wno-unused-variable -Wno-c++11-narrowing -Wno-unused-but-set-variable -Wno-misleading-indentation -Wno-missing-field-initializers -Wno-unused-parameter -Wno-psabi -Wloop-analysis -Wno-unneeded-internal-declaration -Wenum-compare-conditional -Wno-ignored-pragma-optimize -Wno-deprecated-builtins -Wno-bitfield-constant-conversion -Wno-deprecated-this-capture -Wno-invalid-offsetof -Wno-vla-extension -Wno-thread-safety-reference-return -Wno-delayed-template-parsing-in-cxx20 -Werror -O2 -fdata-sections -ffunction-sections -fno-unique-section-names -fno-math-errno -std=c11 --sysroot=../../build/linux/debian_bullseye_amd64-sysroot -c ../../third_party/boringssl/src/crypto/curve25519/curve25519.c -o obj/third_party/boringssl/boringssl/curve25519.o
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x0000556e5dfa28c8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x7cbf8c8)
 #1 0x0000556e5dfa048e llvm::sys::RunSignalHandlers() (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x7cbd48e)
 #2 0x0000556e5df0a786 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007f92cb85a510 (/lib/x86_64-linux-gnu/libc.so.6+0x3c510)
 #4 0x00007f92cb8a80fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007f92cb85a472 raise ./signal/../sysdeps/posix/raise.c:27:6
 #6 0x00007f92cb8444b2 abort ./stdlib/abort.c:81:7
 #7 0x00007f92cb8443d5 _nl_load_domain ./intl/loadmsgcat.c:1177:9
 #8 0x00007f92cb8533a2 (/lib/x86_64-linux-gnu/libc.so.6+0x353a2)
 #9 0x0000556e5f69792c llvm::slpvectorizer::BoUpSLP::ShuffleCostEstimator::add(llvm::slpvectorizer::BoUpSLP::TreeEntry const&, llvm::ArrayRef<int>) SLPVectorizer.cpp:0:0
#10 0x0000556e5f692f38 llvm::slpvectorizer::BoUpSLP::getEntryCost(llvm::slpvectorizer::BoUpSLP::TreeEntry const*, llvm::ArrayRef<llvm::Value*>, llvm::SmallPtrSetImpl<llvm::Value*>&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93aff38)
#11 0x0000556e5f69c33b llvm::slpvectorizer::BoUpSLP::getTreeCost(llvm::ArrayRef<llvm::Value*>) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93b933b)
#12 0x0000556e5f6c0a1a llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93dda1a)
#13 0x0000556e5f6c1a2c llvm::SLPVectorizerPass::tryToVectorize(llvm::Instruction*, llvm::slpvectorizer::BoUpSLP&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93dea2c)
#14 0x0000556e5f6c4be2 llvm::SLPVectorizerPass::tryToVectorize(llvm::ArrayRef<llvm::WeakTrackingVH>, llvm::slpvectorizer::BoUpSLP&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93e1be2)
#15 0x0000556e5f6c4b2f llvm::SLPVectorizerPass::vectorizeRootInstruction(llvm::PHINode*, llvm::Instruction*, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&, llvm::TargetTransformInfo*) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93e1b2f)
#16 0x0000556e5f6bbc2a llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93d8c2a)
#17 0x0000556e5f6b96fc llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93d66fc)
#18 0x0000556e5f6b8c9e llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x93d5c9e)
#19 0x0000556e5f37319d llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#20 0x0000556e5d9f2a24 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x770fa24)
#21 0x0000556e5be0377d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) AMDGPUTargetMachine.cpp:0:0
#22 0x0000556e5d9f6cb3 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x7713cb3)
#23 0x0000556e5be0351d llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) AMDGPUTargetMachine.cpp:0:0
#24 0x0000556e5d9f1c04 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x770ec04)
#25 0x0000556e5e74f5b5 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&) BackendUtil.cpp:0:0
#26 0x0000556e5e746102 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x8463102)
#27 0x0000556e5ec426b1 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) CodeGenAction.cpp:0:0
#28 0x0000556e60390b56 clang::ParseAST(clang::Sema&, bool, bool) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0xa0adb56)
#29 0x0000556e5eb59f7f clang::FrontendAction::Execute() (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x8876f7f)
#30 0x0000556e5eacb01d clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x87e801d)
#31 0x0000556e5ec3ad4e clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x8957d4e)
#32 0x0000556e5ba36012 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x5753012)
#33 0x0000556e5ba323ed ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#34 0x0000556e5e92b9b9 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#35 0x0000556e5df0a4c6 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x7c274c6)
#36 0x0000556e5e92b0c2 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x86480c2)
#37 0x0000556e5e8e6107 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x8603107)
#38 0x0000556e5e8e6647 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x8603647)
#39 0x0000556e5e9065e9 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x86235e9)
#40 0x0000556e5ba31836 clang_main(int, char**, llvm::ToolContext const&) (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x574e836)
#41 0x0000556e5ba42871 main (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x575f871)
#42 0x00007f92cb8456ca __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#43 0x00007f92cb845785 call_init ./csu/../csu/libc-start.c:128:20
#44 0x00007f92cb845785 __libc_start_main ./csu/../csu/libc-start.c:347:5
#45 0x0000556e5ba2e8e1 _start (/usr/local/google/home/zequanwu/workspace/llvm-project/build/cmake/bin/clang+0x574b8e1)
clang: error: clang frontend command failed with exit code 134 (use -v to see invocation)
clang version 18.0.0 (git@github.com:ZequanWu/llvm-project.git 196d154ab7a76e8ccb11addf61ff53387e397130)

Fixed in af15c46777208a4cb4b276c4974a5b556608a415

Fixed in af15c46777208a4cb4b276c4974a5b556608a415

Thanks, confirmed that is fixed.