This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] implement single BB stack-move optimization which unify the static unescaped allocas
ClosedPublic

Authored by khei4 on Jun 21 2023, 11:50 AM.

Details

Summary

This is takeover from @pcwalton 's https://reviews.llvm.org/D140089

New transformation unifies the two allocates on the same BB if the followings are satisfied
(1) The src is fully copied to dest by memcpy, memmove or store&load.
(2) src and dest are 1) static alloca, which is to avoid stacksave/stackrestore handling, 2) unescaped allocas.
(3) The dest has no Mod Ref except full-sized lifetime intrinsic and copy itself, from the alloca to the store.
(4) If the dest has no Mod => src has no Ref, dest has no Ref => src has no Mod from the store to the end of BB.

Roughly, the target structure on this patch is like the following.

{
   unescaped static allocas of src/dest

   ↓ src/dest may have mod/ref. Practically it may be sufficient to bailout if dest has any ModRef, many case would be problem if the dest is replaced with src. 

   full-size copy from src to dest, memcpy/memmove/load-store of the src value.

   ↓ src/dest may have mod/ref. There have to be no read(ref) with the other one's write(mod)
   
   end without capturing both of the alloca.
}

For the dest modification before the copy, it could be ok to erase those mods, because of the unescaped allocas in this case.
For mod/ref condition after the full-size copy, the @pcwalton 's original patch did the BB-level liveness checks. Maybe this patch has a lot of room to improve. The inherent condition is that one of two ptr is unnecessary for any execution path.
It seems like there are almost no compile-time regressions. https://llvm-compile-time-tracker.com/compare.php?from=0f9f95146a7fc6fa4f9bc3c1aa2a23386f521dac&to=a6c40d12ea1c6a4b82e88ce1c7b2a723d199fb85&stat=instructions:u

pre-commit tests: https://reviews.llvm.org/D152277

pre-commit tests for user order crash https://reviews.llvm.org/D155179

pre-commit tests for terminator crash https://reviews.llvm.org/D155571

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jplehr added a subscriber: jplehr.Jul 13 2023, 12:37 AM

Hi,
it may be the case that this broke the AMDGPU OpenMP buildbot (https://lab.llvm.org/buildbot/#/builders/193/builds/34626)
While we had some trouble with randomly occurring failures lately, the printed stack trace has this particular transforms in it.
Would you mind looking into it? I'm happy to assist if needed.

Seems this causes clang crash in stage2.

https://lab.llvm.org/buildbot/#/builders/198/builds/3639
https://lab.llvm.org/buildbot/#/builders/121/builds/32483

I saw the crash also in our private x86_64-linux builder.

khei4 added a comment.Jul 13 2023, 2:03 AM

@jplehr @chapuni Thank you for the report! I'm sorry to bother you. Although I can't find why this crashed, I'll revert it first.

nikic added a comment.Jul 13 2023, 2:21 AM

@khei4 I think the problem is that the load/store can be in a different block, so the UI->comesBefore(Store) check asserts. (We checked that UI is in the same block, but not Store.)

khei4 added a comment.Jul 13 2023, 2:33 AM

@nikic Ah, thanks! That must be one of the multi-bb cases I should have tested. Thank you for finding it!

chfast added a subscriber: chfast.Jul 13 2023, 4:30 AM
khei4 added a comment.Jul 13 2023, 4:43 AM

@nikic Although I can conservatively add checks that confirm load/store and src/dest allocas are all in one BB, following simple load/store separated test (https://reviews.llvm.org/D155179) doesn't reproduce the error on openmp

define void @multi_bb_load_store(i1 %b) {
  %src = alloca i32, align 4
  %dest = alloca i32, align 4
  call void @llvm.lifetime.start.p0(i64 4, ptr nocapture %src)
  call void @llvm.lifetime.start.p0(i64 4, ptr nocapture %dest)
  store i32 42, ptr %src
  %1 = call i32 @use_nocapture(ptr nocapture %src)

  %src.val = load i32, ptr %src
  br label %bb0

bb0:
  store i32 %src.val, ptr %dest
  br label %bb1

bb1:
  %2 = call i32 @use_nocapture(ptr nocapture %dest)
  call void @llvm.lifetime.end.p0(i64 4, ptr nocapture %src)
  call void @llvm.lifetime.end.p0(i64 4, ptr nocapture %dest)
  ret void
}

I'll find openmp/libomptarget/test/offloading/std_complex_arithmetic.cpp 's IR to reproduce the failure locally.

@jplehr
Sorry, I'm pretty novice for openMP, and https://godbolt.org/z/1xq5Ta9rh 's output IR doesn't reproduce the failure we see on the build, could you kindly teach me what is a good way to obtain failed IR?

khei4 reopened this revision.Jul 13 2023, 4:46 AM
This revision is now accepted and ready to land.Jul 13 2023, 4:46 AM
khei4 updated this revision to Diff 539965.Jul 13 2023, 5:03 AM

check that load/store and dest/src alloca are all in the same bb

khei4 updated this revision to Diff 540309.Jul 14 2023, 1:01 AM
  • Check store/load and allocas are in a single bb strictly (will be removed soon although)
  • use DT partially

@nikic Thanks! I now know that the original implementation just luckily avoid different block users by seeing the Store itself first. CSE or some other transform seems to introduce those users apriori to the Store. Thank you @serge-sans-paille !

Could you also add the test case from https://reviews.llvm.org/D155207 ? It was reproducing the issue I had when compiling firefox.

Could you also add the test case from https://reviews.llvm.org/D155207 ? It was reproducing the issue I had when compiling firefox.

I think you should pre-commit the test case.

khei4 added a comment.EditedJul 14 2023, 2:49 AM

@serge-sans-paille Thanks! I'm sorry for the odd revisions. I will add it by another commit https://reviews.llvm.org/D155179. I set it to the parent of this revision.

@chfast
Thanks too! I'll do that!

nikic added a comment.Jul 14 2023, 2:51 AM

We shouldn't use dominates() in this patch, as it's not always semantically correct. Just the BB check you added should be enough to fix the issue, no?

khei4 updated this revision to Diff 540355.Jul 14 2023, 4:21 AM
  • revert dominates to comesBefore
khei4 added a comment.Jul 14 2023, 4:23 AM

@nikic

We shouldn't use dominates() in this patch, as it's not always semantically correct. Just the BB check you added should be enough to fix the issue, no?

Thanks! Right! The semantics would be different (strict?) from the multi-BB case, and checking is enough to prevent crashes.

We shouldn't use dominates() in this patch, as it's not always semantically correct.

Can you explain why?

Just the BB check you added should be enough to fix the issue, no?

Yes, that was my first patch and I tested it successfully on firefox.

nikic added a comment.Jul 14 2023, 5:29 AM

We shouldn't use dominates() in this patch, as it's not always semantically correct.

Can you explain why?

A lot of these comesBefore() are actually reachability queries. This reduces to comesBefore() within one block, but is not the same as dominates() across blocks.

khei4 updated this revision to Diff 540639.Jul 14 2023, 9:40 PM
khei4 edited the summary of this revision. (Show Details)

rebase

khei4 added a comment.Jul 14 2023, 9:41 PM

We shouldn't use dominates() in this patch, as it's not always semantically correct.

Can you explain why?

A lot of these comesBefore() are actually reachability queries. This reduces to comesBefore() within one block, but is not the same as dominates() across blocks.

Thank you for your explanation! Yes, that was precisely the reachability, and semantics will differ from domination on multi-bb. But that is at least sound and might be reasonable to restrict only dominated cases. I'm still finding the reasonable scope ;)

nikic accepted this revision.Jul 15 2023, 12:21 AM

LGTM

hoy added a subscriber: hoy.Jul 17 2023, 11:23 AM
hoy added inline comments.
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1616

Hi, it looks like this resulted in a case where a liftimeEnd was inserted to the end of a block and invalidating the terminator convention:

invoke fastcc void @_ZN5folly3f146detail11F14BasicMapINS1_19NodeContainerPolicyIN8facebook8infrasec13authorization8IdentityEKSt10shared_ptrIKNS_10F14NodeMapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_10F14NodeSetISF_NS_23HeterogeneousAccessHashISF_vEENS_26HeterogeneousAccessEqualToISF_vEESaISF_EEESI_SK_SaISt4pairIKSF_SM_EEEEEvvvEEE13initialInsertIN6ranges14basic_iteratorINSY_14adaptor_cursorINSZ_INSY_9join_viewINSY_14transform_viewINSY_8ref_viewISt6vectorINS5_9Flattener18FlattenedGroupDataESaIS16_EEEEMS16_NS9_IS7_SU_NSH_IS7_vEENSJ_IS7_vEESaISN_IKS7_SU_EEEEEEE6cursorILb0EEEEENSY_9move_viewIS1I_E7adaptorILb0EEEEEEEEEvT_S1S_m(ptr noundef nonnull align 8 dereferenceable(24) %0, ptr noundef nonnull byval(%"struct.ranges::basic_iterator.638") align 8 %5, ptr noundef nonnull byval(%"struct.ranges::basic_iterator.638") align 8 %3, i64 noundef 0)
          to label %22 unwind label %20, !dbg !56200
  call void @llvm.lifetime.end.p0(i64 40, ptr %5), !dbg !54832

This in turns caused a compiler ICE:

clang:      llvm-project/llvm/include/llvm/Support/Casting.h:662: decltype(auto) llvm::dyn_cast(From *) [To = llvm::InvokeInst, From = llvm::Instruction]: Assertion `detail::isPresent(Val) && "dyn_cast on a non-existent value"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /home/hoy/install/llvm-chk/bin/clang @clang.rsp
 #0 0x00007f85fd06e248 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int)      llvm-project/llvm/lib/Support/Unix/Signals.inc:602:13
 #1 0x00007f85fd06c4c0 llvm::sys::RunSignalHandlers()      llvm-project/llvm/lib/Support/Signals.cpp:105:18
 #2 0x00007f85fd06e8ed SignalHandler(int)      llvm-project/llvm/lib/Support/Unix/Signals.inc:413:1
 #3 0x00007f86011cccf0 __restore_rt (/lib64/libpthread.so.0+0x12cf0)
 #4 0x00007f85fc1e0acf raise (/lib64/libc.so.6+0x4eacf)
 #5 0x00007f85fc1b3ea5 abort (/lib64/libc.so.6+0x21ea5)
 #6 0x00007f85fc1b3d79 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d79)
 #7 0x00007f85fc1d9426 (/lib64/libc.so.6+0x47426)
 #8 0x00007f85fe6641e1 decltype(auto) llvm::dyn_cast<llvm::InvokeInst, llvm::Instruction>(llvm::Instruction*)      llvm-project/llvm/include/llvm/Support/Casting.h:663:10
 #9 0x00007f85fe664d0f markAliveBlocks(llvm::Function&, llvm::SmallPtrSetImpl<llvm::BasicBlock*>&, llvm::DomTreeUpdater*)      llvm-project/llvm/lib/Transforms/Utils/Local.cpp:2514:15
#10 0x00007f85fe6645eb llvm::removeUnreachableBlocks(llvm::Function&, llvm::DomTreeUpdater*, llvm::MemorySSAUpdater*)      llvm-project/llvm/lib/Transforms/Utils/Local.cpp:2663:8
#11 0x00007f85fed64542 simplifyFunctionCFGImpl(llvm::Function&, llvm::TargetTransformInfo const&, llvm::DominatorTree*, llvm::SimplifyCFGOptions const&)      llvm-project/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp:0:22
#12 0x00007f85fed64542 simplifyFunctionCFG(llvm::Function&, llvm::TargetTransformInfo const&, llvm::DominatorTree*, llvm::SimplifyCFGOptions const&)      llvm-project/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp:301:18
#13 0x00007f85fed64357 llvm::SimplifyCFGPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)      llvm-project/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp:363:7

Reverting the patch clears the error.

khei4 added inline comments.Jul 18 2023, 2:21 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1616

@hoy Thank you for the report with helpful info! Yeah, I forgot to port that handling the original patch does. Sorry for being late, I assume this is not so difficult but I'll revert it first.

khei4 updated this revision to Diff 541421.Jul 18 2023, 3:11 AM

To address teminator lifetime insertion issue. Rebased for https://reviews.llvm.org/D155571

  • don't insert if the lastuser is the terminator for the BB,
    • This requires PostDominatorTree, so coming patch insert this, I believe the absence of the lifetime.end is not so much an issue ;) I'll appreciate if you tell me if it matters.
This revision is now accepted and ready to land.Jul 18 2023, 3:12 AM
nikic accepted this revision.Jul 18 2023, 5:37 AM

LGTM with a nit.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1618
khei4 edited the summary of this revision. (Show Details)Jul 19 2023, 1:39 AM
khei4 updated this revision to Diff 541903.Jul 19 2023, 2:10 AM
khei4 marked an inline comment as done.

apply feedback and rebase

ayzhao added a subscriber: ayzhao.Jul 21 2023, 4:18 PM

I'm seeing a crash caused by this patch when I'm building Chrome's unit tests on Windows:

C:\src\chromium\src>autoninja -C out\continuous obj/base/base_unittests/checked_iterators_unittest.obj
"C:\src\depot_tools\bootstrap-2@3_8_10_chromium_26_bin\python3\bin\python3.exe" C:\src\depot_tools\ninja.py -C out\continuous obj/base/base_unittests/checked_iterators_unittest.obj -j 130
ninja: Entering directory `out\continuous'
[1/1] CXX obj/base/base_unittests/checked_iterators_unittest.obj
FAILED: obj/base/base_unittests/checked_iterators_unittest.obj
"C:/src/depot_tools/bootstrap-2@3_8_10_chromium_26_bin/python3/bin/python3.exe" ../../build/toolchain/clang_code_coverage_wrapper.py --target-os=win ..\..\..\..\llvm-project\build-ninja\bin\clang-cl.exe /c ../../base/containers/checked_iterators_unittest.cc /Foobj/base/base_unittests/checked_iterators_unittest.obj /nologo /showIncludes:user /winsysroot../../third_party/depot_tools/win_toolchain/vs_files/27370823e7 -DUSE_AURA=1 -D_HAS_NODISCARD -D_CRT_NONSTDC_NO_WARNINGS -D_WINSOCK_DEPRECATED_NO_WARNINGS -D_LIBCPP_ENABLE_ASSERTIONS=1 -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -DCR_LIBCXX_REVISION=84fb809dd6dae36d556dc0bb702c6cc2ce9d4b80 -D__STD_C -D_CRT_RAND_S -D_CRT_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_DEPRECATE -D_ATL_NO_OPENGL -D_WINDOWS -DCERT_CHAIN_PARA_HAS_EXTRA_FIELDS -DPSAPI_VERSION=2 -DWIN32 -D_SECURE_ATL -DWINAPI_FAMILY=WINAPI_FAMILY_DESKTOP_APP -DWIN32_LEAN_AND_MEAN -DNOMINMAX -D_UNICODE -DUNICODE -DNTDDI_VERSION=NTDDI_WIN10_NI -D_WIN32_WINNT=0x0A00 -DWINVER=0x0A00 -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -DU_USING_ICU_NAMESPACE=0 -DU_ENABLE_DYLOAD=0 -DUSE_CHROMIUM_ICU=1 -DU_ENABLE_TRACING=1 -DU_ENABLE_RESOURCE_TRACING=0 -DU_STATIC_IMPLEMENTATION -DICU_UTIL_DATA_IMPL=ICU_UTIL_DATA_FILE -DGOOGLE_PROTOBUF_NO_RTTI -DGOOGLE_PROTOBUF_NO_STATIC_INITIALIZER -DGOOGLE_PROTOBUF_INTERNAL_DONATE_STEAL_INLINE=0 -DGTEST_API_= -DGTEST_HAS_POSIX_RE=0 -DGTEST_LANG_CXX11=1 -DGTEST_HAS_TR1_TUPLE=0 -DGTEST_HAS_ABSL=1 -DUNIT_TEST -I../.. -Igen -I../../buildtools/third_party/libc++ -I../../third_party/perfetto/include -Igen/third_party/perfetto/build_config -Igen/third_party/perfetto -I../../third_party/abseil-cpp -I../../third_party/boringssl/src/include -I../../third_party/protobuf/src -Igen/protoc_out -I../../third_party/ced/src -I../../third_party/icu/source/common -I../../third_party/icu/source/i18n -I../../third_party/googletest/custom -I../../third_party/googletest/src/googlemock/include -I../../third_party/googletest/src/googletest/include /W4 -Wimplicit-fallthrough -Wextra-semi -Wunreachable-code-aggressive -Wthread-safety /WX -Wno-missing-field-initializers -Wno-unused-parameter -Wno-psabi -Wloop-analysis -Wno-unneeded-internal-declaration -Wno-nonportable-include-path -Wenum-compare-conditional -Wno-ignored-pragma-optimize -Wno-deprecated-builtins -Wno-bitfield-constant-conversion -Wno-deprecated-this-capture -Wshadow -fno-delete-null-pointer-checks -fno-ident -fcolor-diagnostics -fmerge-all-constants -fcrash-diagnostics-dir=../../tools/clang/crashreports -mllvm -instcombine-lower-dbg-declare=0 /clang:-ffp-contract=off -fcomplete-member-pointers /Gy /FS /bigobj /utf-8 /Zc:twoPhase -ffile-reproducible /Zc:sizedDealloc- /D__WRL_ENABLE_FUNCTION_STATICS__ -fmsc-version=1934 -m64 -msse3 /Brepro -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= -ffile-compilation-dir=. -no-canonical-prefixes -ftrivial-auto-var-init=pattern /O1 /Ob2 /Oy- /Zc:inline /Gw /Oi -fprofile-instr-generate -fcoverage-mapping -fprofile-update=atomic -mllvm -runtime-counter-relocation=true -mllvm -limited-coverage-experimental=true /guard:cf /MT -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -DPROTOBUF_ALLOW_DEPRECATED=1 -Wno-inconsistent-missing-override /wd4800 /std:c++20 -Wno-trigraphs /TP /GR- -I../../buildtools/third_party/libc++/trunk/include /Fd"obj/base/base_unittests_cc.pdb"
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: ..\\..\\..\\..\\llvm-project\\build-ninja\\bin\\clang-cl.exe /c ../../base/containers/checked_iterators_unittest.cc /Foobj/base/base_unittests/checked_iterators_unittest.obj /nologo /showIncludes:user /winsysroot../../third_party/depot_tools/win_toolchain/vs_files/27370823e7 -DUSE_AURA=1 -D_HAS_NODISCARD -D_CRT_NONSTDC_NO_WARNINGS -D_WINSOCK_DEPRECATED_NO_WARNINGS -D_LIBCPP_ENABLE_ASSERTIONS=1 -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -DCR_LIBCXX_REVISION=84fb809dd6dae36d556dc0bb702c6cc2ce9d4b80 -D__STD_C -D_CRT_RAND_S -D_CRT_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_DEPRECATE -D_ATL_NO_OPENGL -D_WINDOWS -DCERT_CHAIN_PARA_HAS_EXTRA_FIELDS -DPSAPI_VERSION=2 -DWIN32 -D_SECURE_ATL -DWINAPI_FAMILY=WINAPI_FAMILY_DESKTOP_APP -DWIN32_LEAN_AND_MEAN -DNOMINMAX -D_UNICODE -DUNICODE -DNTDDI_VERSION=NTDDI_WIN10_NI -D_WIN32_WINNT=0x0A00 -DWINVER=0x0A00 -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -DU_USING_ICU_NAMESPACE=0 -DU_ENABLE_DYLOAD=0 -DUSE_CHROMIUM_ICU=1 -DU_ENABLE_TRACING=1 -DU_ENABLE_RESOURCE_TRACING=0 -DU_STATIC_IMPLEMENTATION -DICU_UTIL_DATA_IMPL=ICU_UTIL_DATA_FILE -DGOOGLE_PROTOBUF_NO_RTTI -DGOOGLE_PROTOBUF_NO_STATIC_INITIALIZER -DGOOGLE_PROTOBUF_INTERNAL_DONATE_STEAL_INLINE=0 -DGTEST_API_= -DGTEST_HAS_POSIX_RE=0 -DGTEST_LANG_CXX11=1 -DGTEST_HAS_TR1_TUPLE=0 -DGTEST_HAS_ABSL=1 -DUNIT_TEST -I../.. -Igen -I../../buildtools/third_party/libc++ -I../../third_party/perfetto/include -Igen/third_party/perfetto/build_config -Igen/third_party/perfetto -I../../third_party/abseil-cpp -I../../third_party/boringssl/src/include -I../../third_party/protobuf/src -Igen/protoc_out -I../../third_party/ced/src -I../../third_party/icu/source/common -I../../third_party/icu/source/i18n -I../../third_party/googletest/custom -I../../third_party/googletest/src/googlemock/include -I../../third_party/googletest/src/googletest/include /W4 -Wimplicit-fallthrough -Wextra-semi -Wunreachable-code-aggressive -Wthread-safety /WX -Wno-missing-field-initializers -Wno-unused-parameter -Wno-psabi -Wloop-analysis -Wno-unneeded-internal-declaration -Wno-nonportable-include-path -Wenum-compare-conditional -Wno-ignored-pragma-optimize -Wno-deprecated-builtins -Wno-bitfield-constant-conversion -Wno-deprecated-this-capture -Wshadow -fno-delete-null-pointer-checks -fno-ident -fcolor-diagnostics -fmerge-all-constants -fcrash-diagnostics-dir=../../tools/clang/crashreports -mllvm -instcombine-lower-dbg-declare=0 /clang:-ffp-contract=off -fcomplete-member-pointers /Gy /FS /bigobj /utf-8 /Zc:twoPhase -ffile-reproducible /Zc:sizedDealloc- /D__WRL_ENABLE_FUNCTION_STATICS__ -fmsc-version=1934 -m64 -msse3 /Brepro -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= -ffile-compilation-dir=. -no-canonical-prefixes -ftrivial-auto-var-init=pattern /O1 /Ob2 /Oy- /Zc:inline /Gw /Oi -fprofile-instr-generate -fcoverage-mapping -fprofile-update=atomic -mllvm -runtime-counter-relocation=true -mllvm -limited-coverage-experimental=true /guard:cf /MT -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -DPROTOBUF_ALLOW_DEPRECATED=1 -Wno-inconsistent-missing-override /wd4800 /std:c++20 -Wno-trigraphs /TP /GR- -I../../buildtools/third_party/libc++/trunk/include /Fdobj/base/base_unittests_cc.pdb
1.      <eof> parser at end of file
2.      Optimizer
Exception Code: 0xC0000005
 #0 0x00007ff798095239 llvm::Type::getContext(void) const C:\src\llvm-project\llvm\include\llvm\IR\Type.h:129:0
 #1 0x00007ff79845fcc6 llvm::Value::getContext(void) const C:\src\llvm-project\llvm\lib\IR\Value.cpp:1069:0
 #2 0x00007ff798bd5a78 llvm::Value::setMetadata(unsigned int, class llvm::MDNode *) C:\src\llvm-project\llvm\lib\IR\Metadata.cpp:1405:0
 #3 0x00007ff798bd612c llvm::Instruction::setMetadata(unsigned int, class llvm::MDNode *) C:\src\llvm-project\llvm\lib\IR\Metadata.cpp:1542:0
 #4 0x00007ff79ea7b320 llvm::MemCpyOptPass::performStackMoveOptzn(class llvm::Instruction *, class llvm::Instruction *, class llvm::AllocaInst *, class llvm::AllocaInst *, unsigned __int64, class llvm::BatchAAResults &) C:\src\llvm-project\llvm\lib\Transforms\Scalar\MemCpyOptimizer.cpp:1632:0
 #5 0x00007ff79ea7e9d2 llvm::MemCpyOptPass::processMemCpy(class llvm::MemCpyInst *, class llvm::ilist_iterator<struct llvm::ilist_detail::node_options<class llvm::Instruction, 1, 0, void>, 0, 0> &) C:\src\llvm-project\llvm\lib\Transforms\Scalar\MemCpyOptimizer.cpp:1754:0
 #6 0x00007ff79ea7fd7c llvm::MemCpyOptPass::iterateOnFunction(class llvm::Function &) C:\src\llvm-project\llvm\lib\Transforms\Scalar\MemCpyOptimizer.cpp:1978:0
 #7 0x00007ff79ea801bd llvm::MemCpyOptPass::runImpl(class llvm::Function &, class llvm::TargetLibraryInfo *, class llvm::AAResults *, class llvm::AssumptionCache *, class llvm::DominatorTree *, class llvm::MemorySSA *) C:\src\llvm-project\llvm\lib\Transforms\Scalar\MemCpyOptimizer.cpp:2032:0
 #8 0x00007ff79ea80088 llvm::MemCpyOptPass::run(class llvm::Function &, class llvm::AnalysisManager<class llvm::Function> &) C:\src\llvm-project\llvm\lib\Transforms\Scalar\MemCpyOptimizer.cpp:2009:0
 #9 0x00007ff79bd63fc8 llvm::detail::PassModel<class llvm::Function, class llvm::MemCpyOptPass, class llvm::PreservedAnalyses, class llvm::AnalysisManager<class llvm::Function>>::run(class llvm::Function &, class llvm::AnalysisManager<class llvm::Function> &) C:\src\llvm-project\llvm\include\llvm\IR\PassManagerInternal.h:89:0
#10 0x00007ff79924686c llvm::PassManager<class llvm::Function, class llvm::AnalysisManager<class llvm::Function>>::run(class llvm::Function &, class llvm::AnalysisManager<class llvm::Function> &) C:\src\llvm-project\llvm\include\llvm\IR\PassManager.h:521:0
#11 0x00007ff799d4dac8 llvm::detail::PassModel<class llvm::Function, class llvm::PassManager<class llvm::Function, class llvm::AnalysisManager<class llvm::Function>>, class llvm::PreservedAnalyses, class llvm::AnalysisManager<class llvm::Function>>::run(class llvm::Function &, class llvm::AnalysisManager<class llvm::Function> &) C:\src\llvm-project\llvm\include\llvm\IR\PassManagerInternal.h:89:0
#12 0x00007ff79b335b5d llvm::CGSCCToFunctionPassAdaptor::run(class llvm::LazyCallGraph::SCC &, class llvm::AnalysisManager<class llvm::LazyCallGraph::SCC, class llvm::LazyCallGraph &> &, class llvm::LazyCallGraph &, struct llvm::CGSCCUpdateResult &) C:\src\llvm-project\llvm\lib\Analysis\CGSCCPassManager.cpp:535:0
#13 0x00007ff79bd6ad52 llvm::detail::PassModel<class llvm::LazyCallGraph::SCC, class llvm::CGSCCToFunctionPassAdaptor, class llvm::PreservedAnalyses, class llvm::AnalysisManager<class llvm::LazyCallGraph::SCC, class llvm::LazyCallGraph &>, class llvm::LazyCallGraph &, struct llvm::CGSCCUpdateResult &>::run(class llvm::LazyCallGraph::SCC &, class llvm::AnalysisManager<class llvm::LazyCallGraph::SCC, class llvm::LazyCallGraph &> &, class llvm::LazyCallGraph &, struct llvm::CGSCCUpdateResult &) C:\src\llvm-project\llvm\include\llvm\IR\PassManagerInternal.h:89:0
#14 0x00007ff79b333bbd llvm::PassManager<class llvm::LazyCallGraph::SCC, class llvm::AnalysisManager<class llvm::LazyCallGraph::SCC, class llvm::LazyCallGraph &>, class llvm::LazyCallGraph &, struct llvm::CGSCCUpdateResult &>::run(class llvm::LazyCallGraph::SCC &, class llvm::AnalysisManager<class llvm::LazyCallGraph::SCC, class llvm::LazyCallGraph &> &, class llvm::LazyCallGraph &, struct llvm::CGSCCUpdateResult &) C:\src\llvm-project\llvm\lib\Analysis\CGSCCPassManager.cpp:90:0
#15 0x00007ff79bd84252 llvm::detail::PassModel<class llvm::LazyCallGraph::SCC, class llvm::PassManager<class llvm::LazyCallGraph::SCC, class llvm::AnalysisManager<class llvm::LazyCallGraph::SCC, class llvm::LazyCallGraph &>, class llvm::LazyCallGraph &, struct llvm::CGSCCUpdateResult &>, class llvm::PreservedAnalyses, class llvm::AnalysisManager<class llvm::LazyCallGraph::SCC, class llvm::LazyCallGraph &>, class llvm::LazyCallGraph &, struct llvm::CGSCCUpdateResult &>::run(class llvm::LazyCallGraph::SCC &, class llvm::AnalysisManager<class llvm::LazyCallGraph::SCC, class llvm::LazyCallGraph &> &, class llvm::LazyCallGraph &, struct llvm::CGSCCUpdateResult &) C:\src\llvm-project\llvm\include\llvm\IR\PassManagerInternal.h:89:0
#16 0x00007ff79b334a79 llvm::DevirtSCCRepeatedPass::run(class llvm::LazyCallGraph::SCC &, class llvm::AnalysisManager<class llvm::LazyCallGraph::SCC, class llvm::LazyCallGraph &> &, class llvm::LazyCallGraph &, struct llvm::CGSCCUpdateResult &) C:\src\llvm-project\llvm\lib\Analysis\CGSCCPassManager.cpp:405:0
#17 0x00007ff79bea5db2 llvm::detail::PassModel<class llvm::LazyCallGraph::SCC, class llvm::DevirtSCCRepeatedPass, class llvm::PreservedAnalyses, class llvm::AnalysisManager<class llvm::LazyCallGraph::SCC, class llvm::LazyCallGraph &>, class llvm::LazyCallGraph &, struct llvm::CGSCCUpdateResult &>::run(class llvm::LazyCallGraph::SCC &, class llvm::AnalysisManager<class llvm::LazyCallGraph::SCC, class llvm::LazyCallGraph &> &, class llvm::LazyCallGraph &, struct llvm::CGSCCUpdateResult &) C:\src\llvm-project\llvm\include\llvm\IR\PassManagerInternal.h:89:0
#18 0x00007ff79b334560 llvm::ModuleToPostOrderCGSCCPassAdaptor::run(class llvm::Module &, class llvm::AnalysisManager<class llvm::Module> &) C:\src\llvm-project\llvm\lib\Analysis\CGSCCPassManager.cpp:278:0
#19 0x00007ff79bd709a8 llvm::detail::PassModel<class llvm::Module, class llvm::ModuleToPostOrderCGSCCPassAdaptor, class llvm::PreservedAnalyses, class llvm::AnalysisManager<class llvm::Module>>::run(class llvm::Module &, class llvm::AnalysisManager<class llvm::Module> &) C:\src\llvm-project\llvm\include\llvm\IR\PassManagerInternal.h:89:0
#20 0x00007ff7992454dc llvm::PassManager<class llvm::Module, class llvm::AnalysisManager<class llvm::Module>>::run(class llvm::Module &, class llvm::AnalysisManager<class llvm::Module> &) C:\src\llvm-project\llvm\include\llvm\IR\PassManager.h:521:0
#21 0x00007ff79e83e609 llvm::ModuleInlinerWrapperPass::run(class llvm::Module &, class llvm::AnalysisManager<class llvm::Module> &) C:\src\llvm-project\llvm\lib\Transforms\IPO\Inliner.cpp:633:0
#22 0x00007ff79bd6b3b8 llvm::detail::PassModel<class llvm::Module, class llvm::ModuleInlinerWrapperPass, class llvm::PreservedAnalyses, class llvm::AnalysisManager<class llvm::Module>>::run(class llvm::Module &, class llvm::AnalysisManager<class llvm::Module> &) C:\src\llvm-project\llvm\include\llvm\IR\PassManagerInternal.h:89:0
#23 0x00007ff7992454dc llvm::PassManager<class llvm::Module, class llvm::AnalysisManager<class llvm::Module>>::run(class llvm::Module &, class llvm::AnalysisManager<class llvm::Module> &) C:\src\llvm-project\llvm\include\llvm\IR\PassManager.h:521:0
#24 0x00007ff799d338a1 `anonymous namespace'::EmitAssemblyHelper::RunOptimizationPipeline C:\src\llvm-project\clang\lib\CodeGen\BackendUtil.cpp:1100:0
#25 0x00007ff799d2ddc3 `anonymous namespace'::EmitAssemblyHelper::EmitAssembly C:\src\llvm-project\clang\lib\CodeGen\BackendUtil.cpp:1157:0
#26 0x00007ff799d2d139 clang::EmitBackendOutput(class clang::DiagnosticsEngine &, class clang::HeaderSearchOptions const &, class clang::CodeGenOptions const &, class clang::TargetOptions const &, class clang::LangOptions const &, class llvm::StringRef, class llvm::Module *, enum clang::BackendAction, class llvm::IntrusiveRefCntPtr<class llvm::vfs::FileSystem>, class std::unique_ptr<class llvm::raw_pwrite_stream, struct std::default_delete<class llvm::raw_pwrite_stream>>) C:\src\llvm-project\clang\lib\CodeGen\BackendUtil.cpp:1324:0
#27 0x00007ff79a5fe890 clang::BackendConsumer::HandleTranslationUnit(class clang::ASTContext &) C:\src\llvm-project\clang\lib\CodeGen\CodeGenAction.cpp:390:0
#28 0x00007ff79d41e3f5 clang::ParseAST(class clang::Sema &, bool, bool) C:\src\llvm-project\clang\lib\Parse\ParseAST.cpp:183:0
#29 0x00007ff79a38e0f2 clang::ASTFrontendAction::ExecuteAction(void) C:\src\llvm-project\clang\lib\Frontend\FrontendAction.cpp:1170:0
#30 0x00007ff79a5f7a9a clang::CodeGenAction::ExecuteAction(void) C:\src\llvm-project\clang\lib\CodeGen\CodeGenAction.cpp:1183:0
#31 0x00007ff79a38dabd clang::FrontendAction::Execute(void) C:\src\llvm-project\clang\lib\Frontend\FrontendAction.cpp:1062:0
#32 0x00007ff798a5cba7 clang::CompilerInstance::ExecuteAction(class clang::FrontendAction &) C:\src\llvm-project\clang\lib\Frontend\CompilerInstance.cpp:1049:0
#33 0x00007ff798b5dec2 clang::ExecuteCompilerInvocation(class clang::CompilerInstance *) C:\src\llvm-project\clang\lib\FrontendTool\ExecuteCompilerInvocation.cpp:272:0
#34 0x00007ff798054cb3 cc1_main(class llvm::ArrayRef<char const *>, char const *, void *) C:\src\llvm-project\clang\tools\driver\cc1_main.cpp:249:0
#35 0x00007ff79804281b ExecuteCC1Tool C:\src\llvm-project\clang\tools\driver\driver.cpp:366:0
#36 0x00007ff7980443cd clang_main::<lambda_0>::operator() C:\src\llvm-project\clang\tools\driver\driver.cpp:506:0
#37 0x00007ff79804439d llvm::function_ref<int (llvm::SmallVectorImpl<const char *> &)>::callback_fn<`lambda at C:/src/llvm-project/clang/tools/driver/driver.cpp:505:25'> C:\src\llvm-project\llvm\include\llvm\ADT\STLFunctionalExtras.h:45:0
#38 0x00007ff799fe7691 llvm::function_ref<(class llvm::SmallVectorImpl<char const *> &)>::operator()(class llvm::SmallVectorImpl<char const *> &) const C:\src\llvm-project\llvm\include\llvm\ADT\STLFunctionalExtras.h:68:0
#39 0x00007ff799fe3587 clang::driver::CC1Command::Execute::<lambda_1>::operator() C:\src\llvm-project\clang\lib\Driver\Job.cpp:440:0
#40 0x00007ff799fe3553 llvm::function_ref<void ()>::callback_fn<`lambda at C:/src/llvm-project/clang/lib/Driver/Job.cpp:440:22'> C:\src\llvm-project\llvm\include\llvm\ADT\STLFunctionalExtras.h:45:0
#41 0x00007ff7987d3b47 llvm::function_ref<(void)>::operator()(void) const C:\src\llvm-project\llvm\include\llvm\ADT\STLFunctionalExtras.h:68:0
#42 0x00007ff7987e0a7f llvm::CrashRecoveryContext::RunSafely(class llvm::function_ref<(void)>) C:\src\llvm-project\llvm\lib\Support\CrashRecoveryContext.cpp:235:0
#43 0x00007ff799fe30c8 clang::driver::CC1Command::Execute(class llvm::ArrayRef<class std::optional<class llvm::StringRef>>, class std::basic_string<char, struct std::char_traits<char>, class std::allocator<char>> *, bool *) const C:\src\llvm-project\clang\lib\Driver\Job.cpp:440:0
#44 0x00007ff7989dfbbf clang::driver::Compilation::ExecuteCommand(class clang::driver::Command const &, class clang::driver::Command const *&, bool) const C:\src\llvm-project\clang\lib\Driver\Compilation.cpp:199:0
#45 0x00007ff7989dfddf clang::driver::Compilation::ExecuteJobs(class clang::driver::JobList const &, class llvm::SmallVectorImpl<struct std::pair<int, class clang::driver::Command const *>> &, bool) const C:\src\llvm-project\clang\lib\Driver\Compilation.cpp:253:0
#46 0x00007ff7989fe0d2 clang::driver::Driver::ExecuteCompilation(class clang::driver::Compilation &, class llvm::SmallVectorImpl<struct std::pair<int, class clang::driver::Command const *>> &) C:\src\llvm-project\clang\lib\Driver\Driver.cpp:1906:0
#47 0x00007ff798042291 clang_main(int, char **, struct llvm::ToolContext const &) C:\src\llvm-project\clang\tools\driver\driver.cpp:542:0
#48 0x00007ff798077a02 main C:\src\llvm-project\build-ninja\tools\clang\tools\driver\clang-driver.cpp:15:0
#49 0x00007ff7a1bf3b89 invoke_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:79:0
#50 0x00007ff7a1bf3cbe __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288:0
#51 0x00007ff7a1bf3d3e __scrt_common_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:331:0
#52 0x00007ff7a1bf3d5e mainCRTStartup D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:17:0
#53 0x00007ffdcdd77614 (C:\Windows\System32\KERNEL32.DLL+0x17614)
#54 0x00007ffdcf3826b1 (C:\Windows\SYSTEM32\ntdll.dll+0x526b1)
clang-cl: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 17.0.0 (git@github.com:llvm/llvm-project.git f7e4304120506c9973a5ac939e06c106d8816911)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: ..\..\..\..\llvm-project\build-ninja\bin
clang-cl: note: diagnostic msg:
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-cl: note: diagnostic msg: ../../tools/clang/crashreports\checked_iterators_unittest-808a19.cpp
clang-cl: note: diagnostic msg: ../../tools/clang/crashreports\checked_iterators_unittest-808a19.sh
clang-cl: note: diagnostic msg:

********************
ninja: build stopped: subcommand failed.

khei4 added a comment.Jul 21 2023, 9:09 PM

@ayzhao Hi! Thanks! I reverted first!

Although I can't reproduce this locally, I think this is suspicious.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1632–1633

Hmm, it seems like setMetadata is called deleted instruction.

khei4 reopened this revision.Jul 21 2023, 10:44 PM
This revision is now accepted and ready to land.Jul 21 2023, 10:44 PM
nikic added a comment.Jul 22 2023, 1:03 AM

Although I can't reproduce this locally, I think this is suspicious.

This probably happens when there is !noalias on lifetime intrinsics. You should probably not add those to NoAliasInstrs, as they will be removed anyway (and are removed before the metadata update).

khei4 updated this revision to Diff 543195.Jul 22 2023, 8:23 AM

don't remove noalias metadata from erased lifetime markers

khei4 added a comment.Jul 22 2023, 8:25 AM

@nikic Thanks! I added a test and prevented insertion of lifetime intrinsic to NoAliasInstrs on CaptureTracking lambda function. Something odd is I can't reproduce reported errors on that test, which have noalias metadata on lifetime intrinsics.

nikic added a comment.Jul 22 2023, 8:57 AM

@nikic Thanks! I added a test and prevented insertion of lifetime intrinsic to NoAliasInstrs on CaptureTracking lambda function. Something odd is I can't reproduce reported errors on that test, which have noalias metadata on lifetime intrinsics.

Do you get memory errors when running under valgrind? Use after free doesn't always result in a crash.

khei4 added a comment.Jul 22 2023, 6:27 PM

Do you get memory errors when running under valgrind? Use after free doesn't always result in a crash.

Ah, yes. valgrind --track-origins=yes found the one for the added test! Thanks, it's beneficial!

==1060872== Command: ./build/bin/opt -passes=memcpyopt -S ./test.ll
==1060872== 
==1060872== Invalid read of size 8
==1060872==    at 0x271F568: llvm::Instruction::setMetadata(unsigned int, llvm::MDNode*) ...
nikic accepted this revision.Jul 23 2023, 9:51 AM

LGTM

This revision was landed with ongoing or failed builds.Jul 23 2023, 6:24 PM
This revision was automatically updated to reflect the committed changes.

Putting this here despite not having much information to provide yet, but this breaks cross-language LTO builds of Firefox in rust libstd BtreeMap code (compiled with rustc 1.71/llvm 16, LTO-ed with llvm-trunk).

khei4 added a comment.Aug 2 2023, 2:58 AM

@glandium Thank you for the report! Sorry, it might take a while to reproduce it on my local. If it exists, could you share any message or crash log to suspect this patch? (I'll also appreciate it if you provide a procedure or doc to reproduce the error.)

Heads up - we are seeing test failures that root cause to this revision.

More interesting, we are seeing some of these tests failing also under AddressSanitizer, complaining about stack-use-after-scope (https://github.com/google/sanitizers/wiki/AddressSanitizerUseAfterScope) - which seems to correspond well to the topic of this change.

We'll try to come up with a reproducer, but this might take time.

I'd like to revert this patch, it also seriously brake Asan.
Asan (and other sanitizer) relies on those lifetime markers which were removed.

And it's already in 17x branch, so we will need to cherry-pick the revert.

To address teminator lifetime insertion issue. Rebased for https://reviews.llvm.org/D155571

  • don't insert if the lastuser is the terminator for the BB,
    • This requires PostDominatorTree, so coming patch insert this, I believe the absence of the lifetime.end is not so much an issue ;) I'll appreciate if you tell me if it matters.

It's and issue for Sanitizers.

vitalybuka reopened this revision.Aug 2 2023, 12:40 PM
This revision is now accepted and ready to land.Aug 2 2023, 12:40 PM
vitalybuka requested changes to this revision.Aug 2 2023, 12:52 PM

I have reproducer, and lifetime markers are very broken.

I need to figure out if I can share the code, or create a new one, it may take day or so.

This revision now requires changes to proceed.Aug 2 2023, 12:52 PM

@glandium Thank you for the report! Sorry, it might take a while to reproduce it on my local. If it exists, could you share any message or crash log to suspect this patch? (I'll also appreciate it if you provide a procedure or doc to reproduce the error.)

I "suspect" this patch because I bisected the Firefox problem down to it. Revisions of llvm after the landing make Firefox crash, Revisions of llvm before don't :) The way Firefox crashes is that the rust BTreeMap code is freeing data from the stack that is not what it's supposed to be (as in, it's freeing something as a String that was never constructed as a String before)... which would probably match this message:

Heads up - we are seeing test failures that root cause to this revision.

More interesting, we are seeing some of these tests failing also under AddressSanitizer, complaining about stack-use-after-scope (https://github.com/google/sanitizers/wiki/AddressSanitizerUseAfterScope) - which seems to correspond well to the topic of this change.

We'll try to come up with a reproducer, but this might take time.

Presumably, a rust compiler using llvm trunk should be able to reproduce this on something smaller than Firefox.

However what I see is that:

  • src has no lifetime markers - so it's considered alive all the time
  • dst has lifetiem markers

I guess the pass removes dst and transfer lifetimes to src, effectively reducing scope. So I would recommend to handle this case some how. In general dst lifetime can only expand src lifetime.

BTW. Also there are cases when lifetime is not very trivial between basic block. We don't have issue with that yet, but I think it's better to bail out and don't touch if we have more then 1 start/end for source or dst, or final lifetime can not be expressed as single range.

This is reproducer 6ee9de7a676a22e8ee5254a91c9a14fdcb8b7f94

So the pass MUST not reduce lifetime of existing allocas - that's miscompile.
Expanding is acceptable, but can make stack coloring or sanitizers less efficient.

khei4 added a comment.EditedAug 2 2023, 8:21 PM

@vitalybuka @glandium @eaeltsin
Thank you for reducing crashes, and reverting! I really appreciate your analysis.

That means Asan has different semantics for unescaped src.

{
lifetime.start(src)
...
lifetime.end(src)
ret void
}

and

{
...(no lifetime.start/end)
ret void
}

right?
I (perhaps mistakenly) think, in the case you give, unescaped src alloca lifetime seems to have not so much change, at least for the point that it's alive during the function call, but releasing by even automatic end and manual end might differ. Anyway, I'll try to repro the Asan crash and see the problem,

  • src has no lifetime markers - so it's considered alive all the time
  • dst has lifetiem markers

I guess the pass removes dst and transfer lifetimes to src, effectively reducing scope. So I would recommend to handle this case some how. In general dst lifetime can only expand src lifetime.

BTW. Also there are cases when lifetime is not very trivial between basic block. We don't have issue with that yet, but I think it's better to bail out and don't touch if we have more then 1 start/end for source or dst, or final lifetime can not be expressed as single range.

So the pass MUST not reduce lifetime of existing allocas - that's miscompile.
Expanding is acceptable, but can make stack coloring or sanitizers less efficient.

Hmm, seems reasonable, it may take time, but I'll treat those! thanks!

vitalybuka added a comment.EditedAug 2 2023, 11:08 PM

@vitalybuka @glandium @eaeltsin
Thank you for reducing crashes, and reverting! I really appreciate your analysis.

  • src has no lifetime markers - so it's considered alive all the time
  • dst has lifetiem markers

I guess the pass removes dst and transfer lifetimes to src, effectively reducing scope. So I would recommend to handle this case some how. In general dst lifetime can only expand src lifetime.

That means Asan has the different semantics for

{
lifetime.start(src)
...
lifetime.end(src)
ret void
}

and

{
...(no lifetime.start/end)
ret void
}

Not sure what you asking.
Lifetime intrinsics are optional. if they are omitted, alloca is valid from entry to any return. So asan, if no markers, can not detect use after scope, if the is markers, it will complain on any alloca access after the end marker.

Reproducer is probably oversimplified. In real miscompile case after the end, inserted by D153453 transformation, there was use of the src alloca. Which was valid when we have no markers, but invalid now, because it's after the end.

I've update the reproducer df0b1df99c9ccf110482661678321e596566c725.
That's what we have with D153453

define void @test() {
entry:
  %agg.tmp.sroa.14 = alloca [20 x i8], align 4
  %agg.tmp.sroa.14.128.sroa_idx = getelementptr i8, ptr %agg.tmp.sroa.14, i64 4
  call void @llvm.lifetime.start.p0(i64 20, ptr %agg.tmp.sroa.14)
  call void @llvm.memcpy.p0.p0.i64(ptr %agg.tmp.sroa.14.128.sroa_idx, ptr null, i64 1, i1 false)
  %agg.tmp3.sroa.35.128.sroa_idx = getelementptr i8, ptr %agg.tmp.sroa.14, i64 4
  call void @llvm.memcpy.p0.p0.i64(ptr inttoptr (i64 4 to ptr), ptr %agg.tmp3.sroa.35.128.sroa_idx, i64 1, i1 false)
  call void @llvm.lifetime.end.p0(i64 20, ptr %agg.tmp.sroa.14)                                   ; <------------------------------- END
  call void @llvm.memcpy.p0.p0.i64(ptr null, ptr %agg.tmp3.sroa.35.128.sroa_idx, i64 1, i1 false) ; <------------------------------- use after END
  ret void
}

right?
I (perhaps mistakenly) think, in the case you give, src alloca lifetime seems to have not so much change, at least for the point that it's alive during the function call, but releasing by even automatic end and manual end might differ. Anyway, I'll try to repro the Asan crash and see the problem,

BTW. Also there are cases when lifetime is not very trivial between basic block. We don't have issue with that yet, but I think it's better to bail out and don't touch if we have more then 1 start/end for source or dst, or final lifetime can not be expressed as single range.

So the pass MUST not reduce lifetime of existing allocas - that's miscompile.
Expanding is acceptable, but can make stack coloring or sanitizers less efficient.

Hmm, seems reasonable, it may take time, but I'll treat those! thanks!

Hmm, seems reasonable, it may take time, but I'll treat those! thanks!

Maybe it's not so bad.
If we limit to alloca with at most one pair of start/end we just need to pick which start/end out of up 4 markers to keep using Domination.

e.g, if either src or dst has no markers, we strip markers from src
if src.start dominates dst.start then keep src.start
if dst.start dominates src.start then keep dst.start
if neither, don't perform transformation

similar with ends and postdominate.

khei4 added a comment.Aug 2 2023, 11:29 PM

Not sure what you asking.
Lifetime intrinsics are optional. if they are omitted, alloca is valid from entry to any return. So asan, if no markers, can not detect use after scope, if the is markers, it will complain on any alloca access after the end marker.

Reproducer is probably oversimplified. In real miscompile case after the end, inserted by D153453 transformation, there was use of the src alloca. Which was valid when we have no markers, but invalid now, because it's after the end.

I've update the reproducer df0b1df99c9ccf110482661678321e596566c725.

@vitalybuka
Thanks! The case you give makes it clear! (Sorry, my words were misreading, I referred to your reproducers, and you answered me clearly :)
Actually, the use after the end is problematic, and this patch introduces lifetime.end for %agg.tmp3.sroa.35.128.sroa_idx,, and other optimization seems to lift the contiguous memcpy src, which causes use after lifetime.end.
Although the patch may be on that optimization, I'll handle that! thanks!

nikic added a comment.Aug 7 2023, 5:38 AM

I looked into this a bit, and I believe the problem here is that we're not adding MSSA MemoryAccesses for the newly inserted lifetime intrinsics. The actual intrinsic placement transform itself is correct, it's just that following MSSA queries fail to see them and perform incorrect transforms as a result.

khei4 updated this revision to Diff 548068.Aug 7 2023, 11:10 PM

create MemoryAccess for newly inserted lifetime intrinsics.

khei4 added a comment.Aug 7 2023, 11:18 PM

I looked into this a bit, and I believe the problem here is that we're not adding MSSA MemoryAccesses for the newly inserted lifetime intrinsics. The actual intrinsic placement transform itself is correct, it's just that following MSSA queries fail to see them and perform incorrect transforms as a result.

@nikic
Thanks! You are right. I created MemoryAccess for newly created lifetime markers.

If we limit to alloca with at most one pair of start/end we just need to pick which start/end out of up 4 markers to keep using Domination.

e.g, if either src or dst has no markers, we strip markers from src
if src.start dominates dst.start then keep src.start
if dst.start dominates src.start then keep dst.start
if neither, don't perform transformation

similar with ends and postdominate.

@vitalybuka

Thank you! I missed this. Although I'm not pretty sure about the lifetime-related optimizations e.g. stack-coloring, I begin to feel like if the transformation is possible(with any interleaving lifetime), then the transformed one(single shrink-wrapped lifetime) may be not worse than the original one.

llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll
22

I thought This memcpy removal is because of undef value copying, but if so original example should be also removed. The current MemoryAccess creation might not be complete.

nikic added inline comments.Aug 7 2023, 11:21 PM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1613

You should omit the third parameter (Template) here, because the effects of the lifetime intrinsics have no relation to those of FirstMA.

Also, shouldn't we be inserting the access *before* the first user, rather than after?

khei4 updated this revision to Diff 548128.Aug 8 2023, 2:41 AM

use createMemoryAccessBefore instead of createMemoryAccessAfter and use Definitions

nikic added inline comments.Aug 8 2023, 2:45 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1614

Why does this use getDefiningAccess() now? I would expect the call to be just MSSAU->createMemoryAcessBefore(Start, FirstMA).

khei4 added a comment.Aug 8 2023, 3:10 AM

I slightly modified the tests by https://github.com/llvm/llvm-project/commit/90ecb9d5b082e331a569f8c06f85289faa2a5c5f

  • add -verify-memoryssa to the opt flag
  • make memcpy defined on lifetime-missing test
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1614

Sorry, I didn't get the point and just mimicked the other use on MemCpyOpt.

You should omit the third parameter (Template) here, because the effects of the lifetime intrinsics have no relation to those of FirstMA.

Do you mean createDefinedAccess?
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/MemorySSA.cpp#L1682-L1685
The third argument for createMemoryAccessBefore seems necessary(I'm not sure whether this is inherently so) https://github.com/llvm/llvm-project/blob/71c3a5519dbcd609fb64560ac7fdfe8db149b905/llvm/include/llvm/Analysis/MemorySSAUpdater.h

Also, shouldn't we be inserting the access *before* the first user, rather than after?

This seems exactly right! Applied.

khei4 updated this revision to Diff 548135.Aug 8 2023, 3:12 AM

rebased for test fix

nikic added inline comments.Aug 8 2023, 3:14 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1614

Ah, ignore my comment, you are right.

nikic added inline comments.Aug 8 2023, 6:32 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1627

I think for the lifetime.start using getDefiningAccess() is right, as we're inserting it before the access (the new defining access of FirstMA will become the newly inserted MemoryDef). However, I think that for the lifetime.end we need to pass LastMA rather than LastMA->getDefiningAccess() as the defining access.

khei4 updated this revision to Diff 548226.Aug 8 2023, 7:56 AM

use LastMA(MemoryAccess) directly rather than LastMA->getDefiningAccess() for MemorySSAUpdater::createMemoryAccessAfter

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1627

Thanks!

However, I think that for the lifetime.end we need to pass LastMA rather than LastMA->getDefiningAccess() as the defining access.

Hmm, I see all use of createMemoryAccessAfter on MemCpyOpt is so, TBH, I didn't get the meaning of defining access, anyway I can try it!

nikic accepted this revision.Aug 8 2023, 8:42 AM

LGTM

khei4 added a comment.Aug 8 2023, 5:55 PM

@vitalybuka What do you think about this change? Especially the lifetime part is unchanged because it would be safe.

I begin to feel like if the transformation is possible(with any interleaving lifetime), then the transformed one(single shrink-wrapped lifetime) may be not worse than the original one.

Kobzol added a subscriber: Kobzol.Aug 10 2023, 2:32 AM
vitalybuka accepted this revision.Aug 10 2023, 6:29 PM

Looks strange that we add lifetime markers on alloca which has no markers before.
Can't create counterexample, precondition blocks transformation.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
72

misaligned, clang-format please?

1529

they also mark when object is considered alive/dead

1532

why do we need this condition?

This revision is now accepted and ready to land.Aug 10 2023, 6:29 PM
khei4 marked 2 inline comments as done.Aug 13 2023, 5:44 AM

@vitalybuka Thank you for the review!

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
72

I aligned! Thanks!

1532

This transformation currently sees the full-sized (and not variable) lifetime intrinsics as an ignorable mod, because removing them doesn't change the semantics(but as you say, stack coloring might be affected), same as https://reviews.llvm.org/D140089, but the former patch does have more meaning to the full-sized lifetime intrinsics for liveness analysis.

khei4 updated this revision to Diff 549692.Aug 13 2023, 5:45 AM
khei4 marked an inline comment as done.

Apply clang-format, update comments.

khei4 added a comment.Aug 13 2023, 5:47 AM

@vitalybuka

Looks strange that we add lifetime markers on alloca which has no markers before.
Can't create counterexample, precondition blocks transformation.

Yeah, this is hard, but I believe capture tracking does guarantee that this is safe for unescaped allocas.

This revision was landed with ongoing or failed builds.Aug 13 2023, 6:10 AM
This revision was automatically updated to reflect the committed changes.
This revision is now accepted and ready to land.Aug 13 2023, 4:31 PM
khei4 added a comment.EditedAug 13 2023, 11:02 PM

@vitalybuka Thank you!
seems like MemoryAccess Creation is broken... (only stage2 build?)

FAILED: tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D_LIBCPP_ENABLE_HARDENED_MODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/unittests/Analysis/FlowSensitive -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/unittests/Analysis/FlowSensitive -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/third-party/unittest/googletest/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/third-party/unittest/googlemock/include -nostdinc++ -isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/include -isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/include/c++/v1 -fsanitize=undefined -Wl,--rpath=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/lib -L/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/lib -w -stdlib=libc++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-class-memaccess -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fno-omit-frame-pointer -gline-tables-only -fsanitize=undefined -fno-sanitize=vptr,function -fno-sanitize-recover=all -fsanitize-blacklist=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/sanitizers/ubsan_ignorelist.txt -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG  -Wno-variadic-macros -Wno-gnu-zero-variadic-macro-arguments -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -Wno-suggest-override -std=c++17 -MD -MT tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o -MF tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o.d -o tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o -c /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/unittests/Analysis/FlowSensitive/MultiVarConstantPropagationTest.cpp
clang++: /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/lib/Analysis/MemorySSA.cpp:1693: llvm::MemoryUseOrDef *llvm::MemorySSA::createDefinedAccess(llvm::Instruction *, llvm::MemoryAccess *, const llvm::MemoryUseOrDef *, bool): Assertion `(!Definition || !isa<MemoryUse>(Definition)) && "A use cannot be a defining access"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang++ -fsanitize=undefined -L/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/lib -w -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-class-memaccess -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fno-omit-frame-pointer -gline-tables-only -fsanitize=undefined -fno-sanitize=vptr,function -fno-sanitize-recover=all -fsanitize-blacklist=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/sanitizers/ubsan_ignorelist.txt -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -Wno-variadic-macros -Wno-gnu-zero-variadic-macro-arguments -fno-exceptions -funwind-tables -fno-rtti -Wno-suggest-override -std=c++17 -fdiagnostics-color -Wl,--rpath=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/lib -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D_LIBCPP_ENABLE_HARDENED_MODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/unittests/Analysis/FlowSensitive -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/unittests/Analysis/FlowSensitive -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/third-party/unittest/googletest/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/third-party/unittest/googlemock/include -nostdinc++ -isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/include -isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/include/c++/v1 -stdlib=libc++ -DNDEBUG -UNDEBUG -c -MD -MT tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o -MF tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o.d -fcolor-diagnostics -o tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/unittests/Analysis/FlowSensitive/MultiVarConstantPropagationTest.cpp
1.	<eof> parser at end of file
2.	Optimizer
 #0 0x00005638879950f7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang+++0x7b4f0f7)
 #1 0x0000563887992fee llvm::sys::RunSignalHandlers() (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang+++0x7b4cfee)
 #2 0x0000563887900248 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007fb9f783bcf0 (/lib/x86_64-linux-gnu/libc.so.6+0x3bcf0)
 #4 0x00007fb9f789226b pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x9226b)
 #5 0x00007fb9f783bc46 raise (/lib/x86_64-linux-gnu/libc.so.6+0x3bc46)
 #6 0x00007fb9f78227fc abort (/lib/x86_64-linux-gnu/libc.so.6+0x227fc)
 #7 0x00007fb9f782271b (/lib/x86_64-linux-gnu/libc.so.6+0x2271b)
 #8 0x00007fb9f7833596 (/lib/x86_64-linux-gnu/libc.so.6+0x33596)
 #9 0x0000563886bee00b llvm::MemorySSA::createDefinedAccess(llvm::Instruction*, llvm::MemoryAccess*, llvm::MemoryUseOrDef const*, bool) (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang+++0x6da800b)
#10 0x0000563886c0c585 llvm::MemorySSAUpdater::createMemoryAccessAfter(llvm::Instruction*, llvm::MemoryAccess*, llvm::MemoryAccess*) (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang+++0x6dc6585)
#11 0x0000563889005416 llvm::MemCpyOptPass::performStackMoveOptzn(llvm::Instruction*, llvm::Instruction*, llvm::AllocaInst*, llvm::AllocaInst*, unsigned long, llvm::BatchAAResults&) (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang+++0x91bf416)
nikic added a comment.Aug 15 2023, 3:48 AM

Here's a reproducer for the assertion failure:

define i32 @test() {
  %src = alloca %struct.Foo, align 4
  %dest = alloca %struct.Foo, align 4
  call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %src)
  store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dest, ptr align 4 %src, i64 12, i1 false)
  call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %src)
  %v = load i32, ptr %dest
  ret i32 %v
}
khei4 added a comment.EditedAug 15 2023, 4:10 AM

@nikic Thanks! I was wondering readonly call isn't MemoryUse(Currently seems not so, that can probably be patched);)

%src = alloca %struct.Foo, align 4
%dest = alloca %struct.Foo, align 4
store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src
call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dest, ptr align 4 %src, i64 12, i1 false)
%_v = call i32 @use_readonly(ptr nocapture readonly %dest)
%_v2 = call i32 @use_readonly(ptr nocapture readonly %src)
ret void

I'll add a test for this case and rebase it!

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1626

So, replace the second argument with LastMA->getDefiningAccess() works. It seems like getDefiningAccess is idempotent for MemoryDef.

khei4 updated this revision to Diff 550253.EditedAug 15 2023, 4:22 AM
nikic accepted this revision.Aug 15 2023, 5:17 AM

LGTM

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1626

getDefiningAccess() is not idempotent for MemoryDef. However, it's still fine, because of something I didn't realize before: The defining access we specify here actually doesn't matter at all. It will always be replaced with the correct defining access in https://github.com/llvm/llvm-project/blob/98322d3eb43168a6a64c1a15a1e754e15c04aa2f/llvm/lib/Analysis/MemorySSAUpdater.cpp#L341.

I may look into refactoring these APIs, because the way they currently work is very confusing...

khei4 added inline comments.Aug 15 2023, 6:30 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1626

getDefiningAccess() is not idempotent for MemoryDef

Hmm. Thanks!

I see, getPreviousDef might be used on multi-BB cases; To reminder at least I can attach comments!

khei4 updated this revision to Diff 550307.Aug 15 2023, 6:31 AM
nikic added inline comments.Aug 15 2023, 6:53 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1626

I've put up https://reviews.llvm.org/D157979 to clarify this.