This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve isUndefVector function by adding insertelement analysis.
ClosedPublic

Authored by ABataev on Sep 14 2022, 1:37 PM.

Details

Summary

Added a mask and analysis of the buildvector sequence to the
isUndefVector function, which improves codegen and cost estimation.

Metric: SLP.NumVectorInstructions

Program                                                                   results   results0  diff
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  27362.00  27360.00  -0.0%

Metric: size..text

Program                                                             results    results0   diff
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  805299.00  806035.00  0.1%

526.blender_r - some extra code is vectorized.
508.namd_r - some extra code is optimized out.

Event Timeline

ABataev created this revision. Sep 14 2022, 1:37 PM
Herald added a project: Restricted Project. Sep 14 2022, 1:37 PM
ABataev requested review of this revision. Sep 14 2022, 1:37 PM
ABataev updated this revision to Diff 460268. Sep 14 2022, 5:13 PM

Relaxed constraints, updated function description

vdmitrie accepted this revision. Sep 15 2022, 2:37 PM

Looks good.

This revision is now accepted and ready to land. Sep 15 2022, 2:37 PM
RKSimon edited the summary of this revision. Sep 15 2022, 2:41 PM
This revision was landed with ongoing or failed builds. Sep 16 2022, 2:41 PM
This revision was automatically updated to reflect the committed changes.
srj added a subscriber: srj. Sep 20 2022, 11:33 AM

This change seems to have injected a bad-call-to-free() crash in some Halide code. Looking to get a good repro case for you now.

srj added a comment. Sep 20 2022, 12:21 PM

As of yet, I haven't been able to get a .ll file that will repro this -- we crash while generating it, and AFAIK there isn't a way to defer running the SLP pass until llc -- so currently the simplest repro case requires building Halide locally. In the meantime, here's a stacktrace of the failure:

#1  0x00007ffff07a6546 in __GI_abort () at abort.c:79
#2  0x00007ffff07fded8 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff091bc2f "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff080593a in malloc_printerr (str=str@entry=0x7ffff091e0d0 "free(): invalid next size (fast)") at malloc.c:5628
#4  0x00007ffff0806d94 in _int_free (av=0x7ffff0952b80 <main_arena>, p=0x555556250330, have_lock=have_lock@entry=0) at malloc.c:4481
#5  0x00007ffff080a9d4 in __GI___libc_free (mem=<optimized out>) at malloc.c:3309
#6  0x00007ffff3b8dc7a in llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::slpvectorizer::BoUpSLP::TreeEntry*) [clone .localalias] ()
   from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so
#7  0x00007ffff3ba4a11 in llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::MapVector<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>, llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseMapPair<llvm::Value*, unsigned int> >, std::vector<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u> >, std::allocator<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u> > > > >&) [clone .localalias] ()
   from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so
#8  0x00007ffff3ba71b4 in llvm::slpvectorizer::BoUpSLP::vectorizeTree() [clone .localalias] ()
   from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so
#9  0x00007ffff3bb4f0b in llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) [clone .localalias] ()
   from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so
#10 0x00007ffff3bb6976 in llvm::SLPVectorizerPass::vectorizeInsertElementInst(llvm::InsertElementInst*, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) [clone .localalias] () from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so
#11 0x00007ffff3bb8096 in llvm::SLPVectorizerPass::vectorizeSimpleInstructions(llvm::SmallSetVector<llvm::Instruction*, 8u>&, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&, bool) [clone .localalias] () from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so
#12 0x00007ffff3bbb77d in llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) [clone .localalias] ()
   from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so
#13 0x00007ffff3bbef23 in llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) [clone .localalias] ()
   from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so
#14 0x00007ffff3bbfcdd in llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) ()
   from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so
#15 0x00007ffff368cd9d in llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) () from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so
#16 0x00007ffff24b9d2c in llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) ()
   from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so
#17 0x00007ffff4bcec07 in llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) ()
   from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so
#18 0x00007ffff15afda5 in llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) () from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so
#19 0x00007ffff15a6e9c in Halide::Internal::CodeGen_LLVM::optimize_module() () from /usr/local/google/home/srj/GitHub/Halide/distrib/lib/libHalide.so

You can use -mllvm -opt-bisect-limit to limit the number of transformation passes.

srj added a comment. Sep 20 2022, 1:00 PM

You can use -mllvm -opt-bisect-limit to limit the number of transformation passes.

I'm sorry, I don't understand -- are those flags to llc? If so, that doesn't help me; if I leave PipelineTuningOptions.SLPVectorization=true in Halide's codegen, we crash before any .ll can be generated (and thus flags to llc are irrelevant). If I set PipelineTuningOptions.SLPVectorization=false in Halide's codegen, I get .ll out, but I don't see any way to apply that pass via llc.

srj added a comment. Edited Sep 20 2022, 1:01 PM

Slightly more detailed traceback from a RelWithDebInfo build of LLVM. Looks like one of the SmallVectors created during processing of case Instruction::InsertElement is bad when we try to free it? I wonder if Mask.swap(PrevMask); on SLPVectorizer.cpp:8160 could be suspicious here.

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007ffff0adc546 in __GI_abort () at abort.c:79
#2  0x00007ffff0b33ed8 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff0c51c2f "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff0b3b93a in malloc_printerr (str=str@entry=0x7ffff0c540d0 "free(): invalid next size (fast)") at malloc.c:5628
#4  0x00007ffff0b3cd94 in _int_free (av=0x7ffff0c88b80 <main_arena>, p=0x555555d65cb0, have_lock=have_lock@entry=0) at malloc.c:4481
#5  0x00007ffff0b409d4 in __GI___libc_free (mem=<optimized out>) at malloc.c:3309
#6  0x00007ffff3c2bed9 in llvm::SmallVector<int, 12u>::~SmallVector (this=0x7fffffff76a0, __in_chrg=<optimized out>)
    at /usr/local/google/home/srj/GitHub/llvm-project/16/llvm/include/llvm/ADT/SmallVector.h:1187
#7  llvm::slpvectorizer::BoUpSLP::vectorizeTree (this=0x7fffffff9290, E=0x555555e68ef0)
    at /usr/local/google/home/srj/GitHub/llvm-project/16/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8199
#8  0x00007ffff3c34370 in llvm::slpvectorizer::BoUpSLP::vectorizeTree (this=0x7fffffff9290, ExternallyUsedValues=...)
    at /usr/include/c++/12/bits/unique_ptr.h:461
#9  0x00007ffff3c36ae6 in llvm::slpvectorizer::BoUpSLP::vectorizeTree (this=<optimized out>)
    at /usr/local/google/home/srj/GitHub/llvm-project/16/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8610
#10 0x00007ffff3c492b8 in llvm::SLPVectorizerPass::tryToVectorizeList (this=<optimized out>, VL=..., R=..., LimitForRegisterSize=false)
    at /usr/local/google/home/srj/GitHub/llvm-project/16/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10588
#11 0x00007ffff3c4bce0 in llvm::SLPVectorizerPass::vectorizeInsertElementInst (this=0x555556603868, IEI=0x555555c76880, BB=<optimized out>, R=...)
    at /usr/local/google/home/srj/GitHub/llvm-project/16/llvm/include/llvm/ADT/SmallVector.h:277
#12 0x00007ffff3c4cb52 in llvm::SLPVectorizerPass::vectorizeSimpleInstructions (this=0x555556603868, Instructions=..., BB=0x555555edf5a0, R=..., 
    AtTerminator=<optimized out>) at /usr/local/google/home/srj/GitHub/llvm-project/16/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:12182
#13 0x00007ffff3c4e35a in llvm::SLPVectorizerPass::vectorizeChainsInBlock (this=0x555556603868, BB=0x555555edf5a0, R=...)
    at /usr/local/google/home/srj/GitHub/llvm-project/16/llvm/include/llvm/IR/Instruction.h:185
#14 0x00007ffff3c50d7b in llvm::SLPVectorizerPass::runImpl (this=0x555556603868, F=..., SE_=<optimized out>, TTI_=<optimized out>, TLI_=<optimized out>, 
    AA_=<optimized out>, LI_=0x555555f4cb48, DT_=0x555555db9f08, AC_=0x5555565d52c8, DB_=0x555555ea20d8, ORE_=0x555555d9ed68)
    at /usr/local/google/home/srj/GitHub/llvm-project/16/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10233
#15 0x00007ffff3c51db8 in llvm::SLPVectorizerPass::run (this=0x555556603868, F=..., AM=...)
    at /usr/local/google/home/srj/GitHub/llvm-project/16/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10164
#16 0x00007ffff37641cd in llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (this=<optimized out>, IR=..., AM=...)
    at /usr/local/google/home/srj/GitHub/llvm-project/16/llvm/include/llvm/IR/PassManagerInternal.h:86
#17 0x00007ffff271bdd5 in llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (AM=..., IR=..., this=<optimized out>) at /usr/include/c++/12/bits/unique_ptr.h:461
#18 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (this=<optimized out>, IR=..., AM=...)
    at /usr/local/google/home/srj/GitHub/llvm-project/16/llvm/include/llvm/IR/PassManagerInternal.h:88
#19 0x00007ffff4b17749 in llvm::ModuleToFunctionPassAdaptor::run (this=<optimized out>, M=..., AM=...) at /usr/include/c++/12/bits/unique_ptr.h:461
#20 0x00007ffff18ebe35 in llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) () at /usr/local/google/home/srj/GitHub/llvm-project/16/llvm/include/llvm/MC/SubtargetFeature.h:121
#21 0x00007ffff18e2f2c in Halide::Internal::CodeGen_LLVM::optimize_module() ()

You can use -mllvm -opt-bisect-limit to limit the number of transformation passes.

I'm sorry, I don't understand -- are those flags to llc? If so, that doesn't help me; if I leave PipelineTuningOptions.SLPVectorization=true in Halide's codegen, we crash before any .ll can be generated (and thus flags to llc are irrelevant). If I set PipelineTuningOptions.SLPVectorization=false in Halide's codegen, I get .ll out, but I don't see any way to apply that pass via llc.

No, it is for opt (just -opt-bisect-limit) or clang/clang++ (-mllvm -opt-bisect-limit).

Hmmm, I wonder what's wrong with the swap here but I'll check.

srj added a comment. Sep 20 2022, 1:20 PM

Running in a different environment with a bit more runtime checking gives me this:

F0000 00:00:1663704938.984352  305116 debugallocation.cc:397] RAW: memory stomping bug: a word after object at 0x26193e2a84f0 has been corrupted
    @     0x55c15c135380  absl::raw_log_internal::RawLog()
    @     0x55c15c028832  MallocBlock::CheckLocked()
    @     0x55c15c028577  MallocBlock::CheckAndClear()
    @     0x55c15c1bb892  __libc_free
    @     0x55c15aed36c2  llvm::slpvectorizer::BoUpSLP::getEntryCost()
    @     0x55c15aed9c83  llvm::slpvectorizer::BoUpSLP::getTreeCost()
    @     0x55c15aeee58c  llvm::SLPVectorizerPass::tryToVectorizeList()
    @     0x55c15aef26e5  llvm::SLPVectorizerPass::vectorizeInsertElementInst()
    @     0x55c15aef294b  llvm::SLPVectorizerPass::vectorizeSimpleInstructions()
    @     0x55c15aeeaec7  llvm::SLPVectorizerPass::vectorizeChainsInBlock()
    @     0x55c15aee8e30  llvm::SLPVectorizerPass::runImpl()
    @     0x55c15aee872a  llvm::SLPVectorizerPass::run()
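
The allocator message above says that a word just past a heap object was overwritten. That usually means a write ran off the end of a buffer and was only noticed later, when the block was freed, which would also fit the earlier crashes inside free(). The following is a minimal sketch of that class of bug and a guard against it. It uses std::vector and a hypothetical helper, copyMaskAt; neither is SLPVectorizer code, and the sketch does not claim to identify the actual cause in this revision.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (an illustration, not LLVM code): copy `src` into
// `dst` starting at element `offset`. The copy is refused when it would not
// fit, because an unchecked std::copy in that situation writes past the end
// of dst's heap buffer and corrupts whatever the allocator keeps there.
bool copyMaskAt(std::vector<int> &dst, const std::vector<int> &src,
                std::size_t offset) {
  // Checked in this order so that `dst.size() - offset` cannot wrap around.
  if (offset > dst.size() || src.size() > dst.size() - offset)
    return false;
  std::copy(src.begin(), src.end(),
            dst.begin() + static_cast<std::ptrdiff_t>(offset));
  return true;
}
```

With a 25-element destination and a 16-element source, offsets 0 through 9 are accepted and offset 10 or larger is rejected. Without the check, the larger offsets would be silent heap corruption that surfaces only at a later free().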

srj added a comment. Sep 20 2022, 1:33 PM

OK, I finally got a failure with ASAN enabled, hopefully this will be enough to track it down:

=================================================================
==349869==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60b000dd8e04 at pc 0x55c2ef8886fe bp 0x7ffdda88ecb0 sp 0x7ffdda88e470
WRITE of size 64 at 0x60b000dd8e04 thread T0
    #0 0x55c2ef8886fd in __asan_memmove third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:30:3
    #1 0x55c2f485ecdb in __copy_impl<int, int, void> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__algorithm/copy.h:56:5
    #2 0x55c2f485ecdb in __copy<int *, int *, int *, 0> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__algorithm/copy.h:94:18
    #3 0x55c2f485ecdb in copy<int *, int *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__algorithm/copy.h:103:10
    #4 0x55c2f485ecdb in copy<llvm::SmallVector<int, 12U> &, int *> third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLExtras.h:1642:10
    #5 0x55c2f485ecdb in llvm::slpvectorizer::BoUpSLP::getEntryCost(llvm::slpvectorizer::BoUpSLP::TreeEntry const*, llvm::ArrayRef<llvm::Value*>) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6416:7
    #6 0x55c2f4869ebd in llvm::slpvectorizer::BoUpSLP::getTreeCost(llvm::ArrayRef<llvm::Value*>) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7218:25
    #7 0x55c2f488d4d2 in llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10575:32
    #8 0x55c2f4894dfc in llvm::SLPVectorizerPass::vectorizeInsertElementInst(llvm::InsertElementInst*, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:12031:10
    #9 0x55c2f4895456 in llvm::SLPVectorizerPass::vectorizeSimpleInstructions(llvm::SmallSetVector<llvm::Instruction*, 8u>&, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&, bool) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:12182:21
    #10 0x55c2f48885e4 in llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:12491:21
    #11 0x55c2f4884f84 in llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10233:16
    #12 0x55c2f4884198 in llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10164:18
    #13 0x55c2f2aa37e1 in llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) third_party/llvm/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #14 0x55c2f5944276 in llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) third_party/llvm/llvm-project/llvm/include/llvm/IR/PassManager.h:517:40
    #15 0x55c2f16f3df1 in llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) third_party/llvm/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #16 0x55c2f5942170 in llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) third_party/llvm/llvm-project/llvm/lib/IR/PassManager.cpp:124:38
    #17 0x55c2efb948d1 in llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) third_party/llvm/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #18 0x55c2f59429e6 in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) third_party/llvm/llvm-project/llvm/include/llvm/IR/PassManager.h:517:40
    #19 0x55c2efb2ece1 in Halide::Internal::CodeGen_LLVM::optimize_module() third_party/halide/halide/src/CodeGen_LLVM.cpp:1276:9
    #20 0x55c2efb29e39 in Halide::Internal::CodeGen_LLVM::finish_codegen() third_party/halide/halide/src/CodeGen_LLVM.cpp:587:19
    #21 0x55c2efb2b408 in Halide::Internal::CodeGen_LLVM::compile(Halide::Module const&) third_party/halide/halide/src/CodeGen_LLVM.cpp:576:12
    #22 0x55c2efb22f54 in Halide::codegen_llvm(Halide::Module const&, llvm::LLVMContext&) third_party/halide/halide/src/CodeGen_LLVM.cpp:44:16
    #23 0x55c2efe39886 in Halide::Module::compile(std::__u::map<Halide::OutputFileType, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>, std::__u::less<Halide::OutputFileType>, std::__u::allocator<std::__u::pair<Halide::OutputFileType const, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>>>> const&) const third_party/halide/halide/src/Module.cpp:662:51
    #24 0x55c2eff591a2 in Halide::Pipeline::compile_to_llvm_assembly(std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>> const&, std::__u::vector<Halide::Argument, std::__u::allocator<Halide::Argument>> const&, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>> const&, Halide::Target const&) third_party/halide/halide/src/Pipeline.cpp:341:7
    #25 0x55c2ef978bd8 in Halide::Func::compile_to_llvm_assembly(std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>> const&, std::__u::vector<Halide::Argument, std::__u::allocator<Halide::Argument>> const&, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>> const&, Halide::Target const&) third_party/halide/halide/src/Func.cpp:3418:16
    #26 0x55c2ef90e410 in main third_party/halide/halide/apps/fft/main.cpp:112:14
    #27 0x7fc79ef6b632 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x61632) (BuildId: 280088eab084c30a3992a9bce5c35b44)
    #28 0x55c2ef7ebd69 in _start /build/work/ab393f4ac612f9027aae6b1a7226027ba2a2/google3/blaze-out/k8-opt/bin/third_party/grte/v5_src/grte-scratch/BUILD/src/csu/../sysdeps/x86_64/start.S:120

0x60b000dd8e04 is located 0 bytes after 100-byte region [0x60b000dd8da0,0x60b000dd8e04)
allocated by thread T0 here:
    #0 0x55c2ef888f2e in malloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
    #1 0x55c2f5d9bdca in safe_malloc third_party/llvm/llvm-project/llvm/include/llvm/Support/MemAlloc.h:26:18
    #2 0x55c2f5d9bdca in llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) third_party/llvm/llvm-project/llvm/lib/Support/SmallVector.cpp:126:15
    #3 0x55c2f0ec4d63 in grow_pod third_party/llvm/llvm-project/llvm/include/llvm/ADT/SmallVector.h:127:11
    #4 0x55c2f0ec4d63 in grow third_party/llvm/llvm-project/llvm/include/llvm/ADT/SmallVector.h:512:41
    #5 0x55c2f0ec4d63 in llvm::SmallVectorTemplateBase<int, true>::growAndAssign(unsigned long, int) third_party/llvm/llvm-project/llvm/include/llvm/ADT/SmallVector.h:534:11
    #6 0x55c2f485eca2 in SmallVector third_party/llvm/llvm-project/llvm/include/llvm/ADT/SmallVector.h:1194:11
    #7 0x55c2f485eca2 in llvm::slpvectorizer::BoUpSLP::getEntryCost(llvm::slpvectorizer::BoUpSLP::TreeEntry const*, llvm::ArrayRef<llvm::Value*>) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6415:24
    #8 0x55c2f4869ebd in llvm::slpvectorizer::BoUpSLP::getTreeCost(llvm::ArrayRef<llvm::Value*>) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7218:25
    #9 0x55c2f488d4d2 in llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10575:32
    #10 0x55c2f4894dfc in llvm::SLPVectorizerPass::vectorizeInsertElementInst(llvm::InsertElementInst*, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:12031:10
    #11 0x55c2f4895456 in llvm::SLPVectorizerPass::vectorizeSimpleInstructions(llvm::SmallSetVector<llvm::Instruction*, 8u>&, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&, bool) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:12182:21
    #12 0x55c2f48885e4 in llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:12491:21
    #13 0x55c2f4884f84 in llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10233:16
    #14 0x55c2f4884198 in llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10164:18
    #15 0x55c2f2aa37e1 in llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) third_party/llvm/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #16 0x55c2f5944276 in llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) third_party/llvm/llvm-project/llvm/include/llvm/IR/PassManager.h:517:40
    #17 0x55c2f16f3df1 in llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) third_party/llvm/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #18 0x55c2f5942170 in llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) third_party/llvm/llvm-project/llvm/lib/IR/PassManager.cpp:124:38
    #19 0x55c2efb948d1 in llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) third_party/llvm/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #20 0x55c2f59429e6 in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) third_party/llvm/llvm-project/llvm/include/llvm/IR/PassManager.h:517:40
    #21 0x55c2efb2ece1 in Halide::Internal::CodeGen_LLVM::optimize_module() third_party/halide/halide/src/CodeGen_LLVM.cpp:1276:9
    #22 0x55c2efb29e39 in Halide::Internal::CodeGen_LLVM::finish_codegen() third_party/halide/halide/src/CodeGen_LLVM.cpp:587:19
    #23 0x55c2efb2b408 in Halide::Internal::CodeGen_LLVM::compile(Halide::Module const&) third_party/halide/halide/src/CodeGen_LLVM.cpp:576:12
    #24 0x55c2efb22f54 in Halide::codegen_llvm(Halide::Module const&, llvm::LLVMContext&) third_party/halide/halide/src/CodeGen_LLVM.cpp:44:16
    #25 0x55c2efe39886 in Halide::Module::compile(std::__u::map<Halide::OutputFileType, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>, std::__u::less<Halide::OutputFileType>, std::__u::allocator<std::__u::pair<Halide::OutputFileType const, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>>>> const&) const third_party/halide/halide/src/Module.cpp:662:51
    #26 0x55c2eff591a2 in Halide::Pipeline::compile_to_llvm_assembly(std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>> const&, std::__u::vector<Halide::Argument, std::__u::allocator<Halide::Argument>> const&, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>> const&, Halide::Target const&) third_party/halide/halide/src/Pipeline.cpp:341:7
    #27 0x55c2ef978bd8 in Halide::Func::compile_to_llvm_assembly(std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>> const&, std::__u::vector<Halide::Argument, std::__u::allocator<Halide::Argument>> const&, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>> const&, Halide::Target const&) third_party/halide/halide/src/Func.cpp:3418:16
    #28 0x55c2ef90e410 in main third_party/halide/halide/apps/fft/main.cpp:112:14
    #29 0x7fc79ef6b632 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x61632) (BuildId: 280088eab084c30a3992a9bce5c35b44)
    #30 0x55c2ef7ebd69 in _start /build/work/ab393f4ac612f9027aae6b1a7226027ba2a2/google3/blaze-out/k8-opt/bin/third_party/grte/v5_src/grte-scratch/BUILD/src/csu/../sysdeps/x86_64/start.S:120

SUMMARY: AddressSanitizer: heap-buffer-overflow third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:30:3 in __asan_memmove
Shadow bytes around the buggy address:
  0x0c16801b3170: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c16801b3180: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c16801b3190: 00 00 00 00 00 00 fa fa fa fa fa fa fa fa fa fa
  0x0c16801b31a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c16801b31b0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c16801b31c0:[04]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c16801b31d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c16801b31e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
  0x0c16801b31f0: fa fa fa fa fa fa 00 00 00 00 00 00 00 00 00 00
  0x0c16801b3200: 00 00 04 fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c16801b3210: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==349869==ABORTING
srj added a comment.Sep 20 2022, 1:39 PM

The line numbers in the ASAN trace are slightly different from this patch, but the culprit seems to be:

SmallVector<int> InsertMask(NumElts, UndefMaskElem);
copy(Mask, std::next(InsertMask.begin(), OffsetBeg));    <-- THIS LINE

Not sure about the semantics here, but presumably InsertMask doesn't have the capacity for the copy and gets tromped on...

Ok, thanks, will add an assertion and check what's wrong here.
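
For illustration only, here is a minimal sketch of the kind of bounds check being discussed. It is not the code from the patch or from the eventual fix. It uses std::vector in place of llvm::SmallVector, and the names kUndefMaskElem and copyMaskAt are hypothetical. The point is that copying Mask to InsertMask.begin() + OffsetBeg is only safe when OffsetBeg + Mask.size() does not exceed InsertMask.size().

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for LLVM's UndefMaskElem sentinel.
constexpr int kUndefMaskElem = -1;

// Copies Mask into InsertMask starting at OffsetBeg.
// Returns false and leaves InsertMask untouched if Mask would not fit,
// instead of writing past the end of the destination buffer.
bool copyMaskAt(const std::vector<int> &Mask, std::vector<int> &InsertMask,
                std::size_t OffsetBeg) {
  // Written to avoid unsigned overflow in OffsetBeg + Mask.size().
  if (OffsetBeg > InsertMask.size() ||
      Mask.size() > InsertMask.size() - OffsetBeg)
    return false;
  std::copy(Mask.begin(), Mask.end(),
            InsertMask.begin() + static_cast<std::ptrdiff_t>(OffsetBeg));
  return true;
}
```

In a debug build the same condition could be expressed as an assertion placed before the copy, which would report the bad offset at the call site rather than leaving it for ASAN to catch.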

This should be fixed in e664dea1821ab1277e62f0b4074fb02867636e6e.