Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline

Vectorization Of Conditional Statements Using BOSCC
Needs ReviewPublic

Authored by Ashutosh on Nov 30 2022, 10:20 PM.



This is a proposal about implementing vectorization of conditional statements using BOSCC(branches-on-superword-condition-codes).

Current LLVM's Loop vectorization deploy flattning of control flow where the guarded code is executed in all the paths with the help of predicate mask instructions. In many cases the execution of the instructions in all control paths is not optimal.

BOSCC inserts branches that skip a region if the predicate of the region entry evaluates to false.

Consider below loop:

for (unsigned i = 0; i < len; i++) {
  if (X[i]) {
    A[i] = B[i] + C[i];
  } else {
    D[i] = E[i] * F[i];

Existing LLVM Flatten Style Vectorization:

for (unsigned i = 0; i < len; i+=4) {
  VectorMask = (X[i,i+1,i+2,i+3] != <0,0,0,0>);
  FlipVectorMask = XOR VectorMask <true, true, true, true>
  Mask.Vector.Store.A[i,i+1,i+2,i+3] = Mask.Vector.Load.B[i,i+1,i+2,i+3] 
                                     + Mask.Vector.Load.C[i,i+1,i+2,i+3]; // Based on VectorMask
  Mask.Vector.Store.D[i,i+1,i+2,i+3] = Mask.Vector.Load.E[i,i+1,i+2,i+3] 
                                     * Mask.Vector.Load.F[i,i+1,i+2,i+3]; // Based on FlipVectorMask

BOSCC Style Vectorization:

for (unsigned i = 0; i < len; i+=4) {
  VectorMask = (X[i,i+1,i+2,i+3] != <0,0,0,0>);
  VectorMaskScalar = VectorToScalarCast VectorMask;

  if (VectorMaskScalar) {
    Mask.Vector.Store.A[i,i+1,i+2,i+3] = Mask.Vector.Load.B[i,i+1,i+2,i+3] 
                                       + Mask.Vector.Load.C[i,i+1,i+2,i+3]; // Based on VectorMask

  FlipVectorMask = XOR VectorMask <true, true, true, true>
  FlipVectorMaskScalar = VectorToScalarCast FlipVectorMask;

  if (FlipVectorMaskScalar) {
    Mask.Vector.Store.D[i,i+1,i+2,i+3] = Mask.Vector.Load.E[i,i+1,i+2,i+3] 
                                       * Mask.Vector.Load.F[i,i+1,i+2,i+3]; // Based on FlipVectorMask

Under this change we introduce the following:
1: BOSCCBlockPlanner : Facilitates to generate the required block layout for BOSCC blocks during VPlan.
2: New Recipes:

a: VPBranchOnBOSCCGuardSC : This recipe is responsible for generating the required conditional entry check on a vector block.
b: VPBOSCCLiveOutRecipe : This recipe is responsible to generate PHI for the live out from the guarded vector blocks.

Diff Detail

Event Timeline

Ashutosh created this revision.Nov 30 2022, 10:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 30 2022, 10:20 PM
Ashutosh requested review of this revision.Nov 30 2022, 10:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 30 2022, 10:20 PM
nlopes added inline comments.Dec 1 2022, 12:31 AM

Please use poison instead of undef as placeholder whenever possible. We are trying to remove undef from LLVM.
Thank you!

Ashutosh added inline comments.Dec 2 2022, 1:10 AM

Sure i'll address this.

Hi Ashutosh,
Did you do benchmarking with this patch? For example, does this benefit/regress SPEC'17 by any means on x86 or AArch64 platforms? I see that it is disabled by default but it is useful to understand the impact. Thanks!

I do not see any change in SPEC Benchmark performed on my threadripper 3995wx box.

Below are the run results for SPEC2017 C/C++ Benchmarks:

SPEC2017Benchmark Baseline BOSCC
500.perlbench_r 330 330
502.gcc_r 192 192
505.mcf_r 289 289
520.omnetpp_r 428 429
523.xalancbmk_r 207 209
525.x264_r 140 140
531.deepsjeng_r 269 268
541.leela_r 354 351
557.xz_r 292 293
508.namd_r 151 152
510.parest_r 228 229
511.povray_r 326 328
519.lbm_r 152 151
538.imagick_r 285 283
544.nab_r 187 187

I'm ignoring couple of second variations, probably these are run to run variations.

mdchen added a subscriber: mdchen.Dec 5 2022, 1:23 AM
Matt added a subscriber: Matt.Dec 7 2022, 6:24 PM

Gentle Reminder !

Dear Reviewers and Community,

Greetings for the new year !

This patch is under review since a long time, please let me know how I can help to make progress.

Please feel free to ask if you need more information in terms of design details, or more test cases, etc..


fhahn added a comment.Jan 5 2023, 2:53 PM

Thanks for sharing the patch and SPEC data. AFAICT there's no impact on the benchmarks you mentioned. Do you have any data on workloads where this transform is beneficial?


We have observed performance uplift for some internal benchmarks and applications.

In general BOSCC guards benefitting large basic blocks.

Tried TSVC and got mixed results:
Loop Baseline Time(sec) BOSCC Time(sec) Baseline/BOSCC
s253 26.424 27.844 0.95
s271 54.904 59.162 0.93
s2711 54.793 59.333 0.92
s2712 56.953 61.567 0.93
s441 49.145 16.976 2.89
s482 23 20.39 1.13
s272 26.103 1.486 17.57
s278 28.273 17.577 1.61
s279 22.587 10.419 2.17
s1279 15.404 2.858 5.39
s2710 28.848 17.956 1.61


Ashutosh updated this revision to Diff 500522.Feb 26 2023, 12:40 AM

Addressed the regressions in TSVC and updated the patch.

Observed these regressions are due to extra shuffle generated during append mask functionality.

This was appending compare mask corresponding to various unroll parts and then considering the appended mask in vector guard check.

VectorMask1 = (X[i,i+1,i+2,i+3] != <0,0,0,0>);  // Mask From UnrollPart1
VectorMask2 = (X[i+4,i+5,i+6,i+7] != <0,0,0,0>); // Mask From UnrollPart2

AppendedVectorMask = AppendVectorMask VectorMask1, VectorMask2;
VectorMaskScalar = VectorToScalarCast AppendedVectorMask;

if (VectorMaskScalar) {
  Mask.Vector.Store.A[i,i+1,i+2,i+3] = Mask.Vector.Load.B[i,i+1,i+2,i+3] 
                                     + Mask.Vector.Load.C[i,i+1,i+2,i+3]; // Based on VectorMask1
  Mask.Vector.Store.A[i+4,i+5,i+6,i+7] = Mask.Vector.Load.B[i+4,i+5,i+6,i+7] 
                                       + Mask.Vector.Load.C[i+4,i+5,i+6,i+7]; // Based on VectorMask2


This was not optimal specially for cases where the unroll factor is >=4, because it has to promote the mask to match the types.

Instead now updated the guard check to consider compare mask corresponding to each unroll part separately:

VectorMask1 = (X[i,i+1,i+2,i+3] != <0,0,0,0>);  // Mask From UnrollPart1
VectorMask2 = (X[i+4,i+5,i+6,i+7] != <0,0,0,0>);  // Mask From UnrollPart2

VectorMaskScalar1 = VectorToScalarCast VectorMask1;
VectorMaskScalar2 = VectorToScalarCast VectorMask2;
VectorMaskScalar = VectorMaskScalar1 OR VectorMaskScalar2
if (VectorMaskScalar) {
  Mask.Vector.Store.A[i,i+1,i+2,i+3] = Mask.Vector.Load.B[i,i+1,i+2,i+3] 
                                     + Mask.Vector.Load.C[i,i+1,i+2,i+3]; // Based on VectorMask1
  Mask.Vector.Store.A[i+4,i+5,i+6,i+7] = Mask.Vector.Load.B[i+4,i+5,i+6,i+7] 
                                       + Mask.Vector.Load.C[i+4,i+5,i+6,i+7]; // Based on VectorMask2


Ashutosh added a comment.EditedFeb 26 2023, 12:48 AM

TSVC Improvements (Runtime In Sec):

LOOPBOSCCBaseline(Flatten)Uplift Percentage
foad added a subscriber: foad.May 10 2023, 4:18 AM

Gentle reminder !

fhahn added a comment.May 30 2023, 8:16 AM

It looks like the patch doesn't apply cleanly any longer. Could you rebase it?


no need for private:


Did you find any cases where the transform isn't profitable? Shouldn't this take into account at least the potential savings if the block isn't executed together with the amount of execution units/resource saturation?

Also, should this take into account profile info and avoid the transform if the block is executed frequently?


dead code?


dead code?

126 ↗(On Diff #500522)

Is this needed? If possible that should avoided.


The recipe shouldn't directly reference VPBasicBlocks; this may cause issues if the VPBasicBlocks are renmoved or replaced for example. Is it possible to use the predecessors in the CFG in the VPlan?


Could you explain why this is needed (as well as markBOSCCBlock and isBOSCCBlock)? It would be good to avoid having to put those details into VPBasicBlock.


Is this needed? If a VPlan transform removes a BSOCC block then this will become out-of-date.


Please try to clean up the input IR for the test. It appears like there many unused arguments, function attributes and metadata.


check should not be needed and can be simplified.


just pass %len as i64 so there's no need to zext

Ashutosh updated this revision to Diff 528745.Jun 6 2023, 2:07 AM

Updated patch with latest rev

Ashutosh marked 6 inline comments as done.Jun 6 2023, 2:26 AM
Ashutosh added inline comments.

Having an additional check is a tradeoff, in general we have experienced covering expensive or large block is beneficial compared to small sized block.
Definitely with the help of profile info we can make more precise judgements.
I'll check the possibility to include that.


I understood your point, but for this recipe its important to know the guard block and continue block.

Using this info it will add the required incomings to the live out PHI.

Possibly we can maintain the mapping for this information in VPlan and while executing VPBOSCCLiveOutRecipe it can infer the details from it ?

If you have any alternatives please suggest.


This is required to understand the BOSCC blocks during mergeBlocksIntoPredecessors.
It merges the BOSCC blocks, which troubles the live out generations.

Gentle Reminder !

Allen added a subscriber: Allen.Jul 6 2023, 7:07 PM
Matt added a comment.Aug 1 2023, 8:19 PM


While trying out the patch I've run into an internal compiler error (ICE): It looks like an assertion violation coming from collectPoisonGeneratingRecipes (on VPWidenMemoryInstructionRecipe).

To reproduce, apply the patch on top of a recent LLVM and build with asserts enabled; in particular, I've used b6847edfc235829b37dd6d734ef5bbfa0a58b6fc and CMAKE_BUILD_TYPE=RelWithDebInfo.

Then, use the newly built clang++ to compile the reproducer source code "repro.cpp" (below) as follows :

clang++ -O2 -march=znver4 -mllvm -enable-boscc-vectorization -c repro.cpp
clang++ -O2 -march=x86-64-v4 -mllvm -enable-boscc-vectorization -c repro.cpp

Either -march=znver4 or -march=x86-64-v4 suffices (or another AVX-512 target, like -march=skylake-avx512).
OTOH, the ICE does not occur at other optimization levels (O1, O3, Ofast, Os) or pre-AVX-512 x86 targets like znver1-znver3 or x86-64-v3.

Reproducer source code:

// repro.cpp

bool condition;
double *p_a, *p_b;
long begin;
long end;

void f() {
  auto lambda = [](long i) {
    if (condition) {
      p_a[i] = p_b[i];
      if (p_a[i])
        p_a[i] = 0.0;
  for (long i = begin; i != end; ++i) {


clang++: /subdir/src/llvm-project/llvm/include/llvm/Support/Casting.h:109: static bool llvm::isa_impl_cl<To, const From*>::doit(const From*) [with To = llvm::Instruction; From = llvm::Value]: Assertion `Val && "isa<> used on a null pointer"' failed.
PLEASE submit a bug report to and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /subdir/src/build_x86-release-asserts-boscc/bin/clang++ -O2 -march=x86-64-v4 -mllvm -enable-boscc-vectorization -c repro.cpp
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x00007feccea5a2be llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /subdir/src/llvm-project/llvm/lib/Support/Unix/
 #1 0x00007feccea58266 llvm::sys::RunSignalHandlers() /subdir/src/llvm-project/llvm/lib/Support/Signals.cpp:104:20
 #2 0x00007fecce9820c8 HandleCrash /subdir/src/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:73:5
 #3 0x00007fecce9820c8 CrashRecoverySignalHandler(int) /subdir/src/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:390:62
 #4 0x00007feccdf5fd00 __restore_rt (/lib64/
 #5 0x00007feccdf5fc6b raise (/lib64/
 #6 0x00007feccdf61305 abort (/lib64/
 #7 0x00007feccdf57c6a __assert_fail_base (/lib64/
 #8 0x00007feccdf57cf2 (/lib64/
 #9 0x00007fecd11067b3 (/subdir/src/build_x86-release-asserts-boscc/bin/../lib/
#10 0x00007fecd110eb8d llvm::isa_impl_cl<llvm::VPWidenMemoryInstructionRecipe, llvm::VPRecipeBase const*>::doit(llvm::VPRecipeBase const*) /subdir/src/llvm-project/llvm/include/llvm/Support/Casting.h:109:5
#11 0x00007fecd110eb8d llvm::isa_impl_cl<llvm::VPWidenMemoryInstructionRecipe, llvm::VPRecipeBase const*>::doit(llvm::VPRecipeBase const*) /subdir/src/llvm-project/llvm/include/llvm/Support/Casting.h:108:22
#12 0x00007fecd110eb8d llvm::isa_impl_wrap<llvm::VPWidenMemoryInstructionRecipe, llvm::VPRecipeBase const*, llvm::VPRecipeBase const*>::doit(llvm::VPRecipeBase const* const&) /subdir/src/llvm-project/llvm/include/llvm/Support/Casting.h:137:41
#13 0x00007fecd110eb8d llvm::isa_impl_wrap<llvm::VPWidenMemoryInstructionRecipe, llvm::VPRecipeBase const* const, llvm::VPRecipeBase const*>::doit(llvm::VPRecipeBase const* const&) /subdir/src/llvm-project/llvm/include/llvm/Support/Casting.h:129:13
#14 0x00007fecd110eb8d llvm::CastIsPossible<llvm::VPWidenMemoryInstructionRecipe, llvm::VPRecipeBase const*, void>::isPossible(llvm::VPRecipeBase const* const&) /subdir/src/llvm-project/llvm/include/llvm/Support/Casting.h:257:62
#15 0x00007fecd110eb8d llvm::CastInfo<llvm::VPWidenMemoryInstructionRecipe, llvm::VPRecipeBase* const, void>::isPossible(llvm::VPRecipeBase* const&) /subdir/src/llvm-project/llvm/include/llvm/Support/Casting.h:509:38
#16 0x00007fecd110eb8d bool llvm::isa<llvm::VPWidenMemoryInstructionRecipe, llvm::VPRecipeBase*>(llvm::VPRecipeBase* const&) /subdir/src/llvm-project/llvm/include/llvm/Support/Casting.h:549:46
#17 0x00007fecd110eb8d llvm::InnerLoopVectorizer::collectPoisonGeneratingRecipes(llvm::VPTransformState&)::'lambda'(llvm::VPRecipeBase*)::operator()(llvm::VPRecipeBase*) const /subdir/src/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1133:46
#18 0x00007fecd11373e7 llvm::InnerLoopVectorizer::collectPoisonGeneratingRecipes(llvm::VPTransformState&) /subdir/src/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1171:55
#19 0x00007fecd115d279 llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>>*) /subdir/src/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7795:30
#20 0x00007fecd116715a llvm::LoopVectorizePass::processLoop(llvm::Loop*) /subdir/src/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10736:24
#21 0x00007fecd11692cf llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo*, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AssumptionCache&, llvm::LoopAccessInfoManager&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) /subdir/src/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10833:27
#22 0x00007fecd1169c02 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /subdir/src/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10867:5
#23 0x00007fecccec45a5 llvm::detail::PassModel<llvm::Function, llvm::LoopVectorizePass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /subdir/src/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:90:3
#24 0x00007fecd2367373 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /subdir/src/llvm-project/llvm/include/llvm/IR/PassManager.h:521:20
#25 0x00007fecd2367373 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /subdir/src/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:89:41
#26 0x00007feccf4cde52 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /subdir/src/llvm-project/llvm/lib/IR/PassManager.cpp:129:41
#27 0x00007fecd2356355 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /subdir/src/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:90:3
#28 0x00007feccf4cb763 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /subdir/src/llvm-project/llvm/include/llvm/IR/PassManager.h:521:20
#29 0x00007fecd23688d1 llvm::SmallPtrSetImplBase::isSmall() const /subdir/src/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:195:33
#30 0x00007fecd23688d1 llvm::SmallPtrSetImplBase::~SmallPtrSetImplBase() /subdir/src/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:83:17
#31 0x00007fecd23688d1 llvm::SmallPtrSetImpl<llvm::AnalysisKey*>::~SmallPtrSetImpl() /subdir/src/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:345:7
#32 0x00007fecd23688d1 llvm::SmallPtrSet<llvm::AnalysisKey*, 2u>::~SmallPtrSet() /subdir/src/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:451:7
#33 0x00007fecd23688d1 llvm::PreservedAnalyses::~PreservedAnalyses() /subdir/src/llvm-project/llvm/include/llvm/IR/PassManager.h:152:7
#34 0x00007fecd23688d1 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&) /subdir/src/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1100:12
#35 0x00007fecd236bbf9 RunCodegenPipeline /subdir/src/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1110:23
#36 0x00007fecd236bbf9 EmitAssembly /subdir/src/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1158:21
#37 0x00007fecd236bbf9 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>) /subdir/src/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1320:25
#38 0x00007fecd27d5c1f llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>::release() /subdir/src/llvm-project/llvm/include/llvm/ADT/IntrusiveRefCntPtr.h:223:9
#39 0x00007fecd27d5c1f llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>::~IntrusiveRefCntPtr() /subdir/src/llvm-project/llvm/include/llvm/ADT/IntrusiveRefCntPtr.h:191:34
#40 0x00007fecd27d5c1f clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) /subdir/src/llvm-project/clang/lib/CodeGen/CodeGenAction.cpp:386:24
#41 0x00007feccbc03869 clang::ParseAST(clang::Sema&, bool, bool) /subdir/src/llvm-project/clang/lib/Parse/ParseAST.cpp:176:34
#42 0x00007fecd0facd71 clang::FrontendAction::Execute() /subdir/src/llvm-project/clang/lib/Frontend/FrontendAction.cpp:1058:21
#43 0x00007fecd0f37bf4 llvm::Error::getPtr() const /subdir/src/llvm-project/llvm/include/llvm/Support/Error.h:270:51
#44 0x00007fecd0f37bf4 llvm::Error::operator bool() /subdir/src/llvm-project/llvm/include/llvm/Support/Error.h:233:22
#45 0x00007fecd0f37bf4 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /subdir/src/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:1053:42
#46 0x00007fecd2ebd4b5 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /subdir/src/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:273:32
#47 0x00000000004137b6 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /subdir/src/llvm-project/clang/tools/driver/cc1_main.cpp:249:40
#48 0x000000000040ca30 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) /subdir/src/llvm-project/clang/tools/driver/driver.cpp:366:20
#49 0x00007fecd0c4f2d1 operator() /subdir/src/llvm-project/clang/lib/Driver/Job.cpp:440:32
#50 0x00007fecd0c4f2d1 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) /subdir/src/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:52
#51 0x00007fecce982235 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) /subdir/src/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:427:10
#52 0x00007fecd0c4f9ee clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (.part.0) /subdir/src/llvm-project/clang/lib/Driver/Job.cpp:444:10
#53 0x00007fecd0c14554 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const /subdir/src/llvm-project/clang/lib/Driver/Compilation.cpp:200:3
#54 0x00007fecd0c14d0c clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const /subdir/src/llvm-project/clang/lib/Driver/Compilation.cpp:253:5
#55 0x00007fecd0c2182c llvm::SmallVectorBase<unsigned int>::empty() const /subdir/src/llvm-project/llvm/include/llvm/ADT/SmallVector.h:94:46
#56 0x00007fecd0c2182c clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) /subdir/src/llvm-project/clang/lib/Driver/Driver.cpp:1906:28
#57 0x0000000000411a8a llvm::SmallVectorBase<unsigned int>::size() const /subdir/src/llvm-project/llvm/include/llvm/ADT/SmallVector.h:91:32
#58 0x0000000000411a8a llvm::SmallVectorTemplateCommon<std::pair<int, clang::driver::Command const*>, void>::end() /subdir/src/llvm-project/llvm/include/llvm/ADT/SmallVector.h:272:41
#59 0x0000000000411a8a clang_main(int, char**, llvm::ToolContext const&) /subdir/src/llvm-project/clang/tools/driver/driver.cpp:544:26
#60 0x000000000040b234 main /subdir/src/build_x86-release-asserts-boscc/tools/clang/tools/driver/clang-driver.cpp:16:1
#61 0x00007feccdf4a24d __libc_start_main (/lib64/
#62 0x000000000040b26a _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:122:0

clang++: error: clang frontend command failed with exit code 134 (use -v to see invocation)
clang version 18.0.0 ( b6847edfc235829b37dd6d734ef5bbfa0a58b6fc)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /subdir/src/build_x86-release-asserts-boscc/bin
Ashutosh updated this revision to Diff 546699.Aug 2 2023, 10:42 PM

Thanks for reporting the failure, this patch includes the fix for it.
Also rebased it to align it latest changes.

Matt added a reviewer: Matt.Aug 18 2023, 3:57 PM
Matt added a comment.Aug 18 2023, 3:59 PM

Ashutosh: Thank you for the fix! I can confirm I have checked the revised version of the patch and no longer see the ICE on my end.

I'm going to leave a few (relatively minor) review comments; other than that, the patch can likely benefit from a final look from an SLP code owner(s), but it may be useful overall.

Matt added inline comments.Aug 18 2023, 4:15 PM

(Extremely minor / formatting: stray space replaced with a comma after i.e.)


I'm curious about the difference from AOCC here. Context: My understanding (from the EuroLLVM 2023) talk is that this has been already implemented in AOCC before. I'm wondering, were the cost model decisions / profitability analysis the same? I'm curious about it in the context of the (already resolved) ICE--is it possible that AOCC was able to avoid running into an ICE whereas this patch previously exposed a possibly missing precondition/assumption (could there be any more follow-on issues?)


(Minor typo; tense): s/corrosponds/corresponds/


(Minor typo) s/Auxilary/Auxiliary/


(I presume we care about usage across more than one basic block in general?)


Is it intentional to have EnableBOSCCVectorization in the global namespace (outside of namespace llvm, cf. EnableVPlanNativePath a few lines above)?


This comment appears to be unfinished.


(update comment alone to reflect already correctly using PoisonValue in the code)


(update comment alone to reflect already correctly using PoisonValue in the code)


I think this function could benefit from more comments explaining the logic--similarly to comments in createVectorToScalarCast in this patch (which are pretty good!)


(Coding style) Nit: Preincrement form ++RecipeI is preferable (particularly for an iterator),

Ashutosh updated this revision to Diff 555657.Sep 3 2023, 11:37 PM

Addressed review comments.
Requesting SLP developers/owners and other reviewers to have a look.

Ashutosh marked 15 inline comments as done.Sep 3 2023, 11:45 PM
Ashutosh added inline comments.

Thanks for the review Mat. Unfortunately the reported ICE was due to some design changes made to upstream this change. I don’t anticipate further issues with this change.

Ashutosh marked an inline comment as done.Sep 25 2023, 7:45 AM