Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline

[MemCpyOpt] implement single BB stack-move optimization which unify the static unescaped allocas
ClosedPublic

Authored by khei4 on Jun 21 2023, 11:50 AM.

Details

Summary

This is takeover from @pcwalton 's https://reviews.llvm.org/D140089

New transformation unifies the two allocates on the same BB if the followings are satisfied
(1) The src is fully copied to dest by memcpy, memmove or store&load.
(2) src and dest are 1) static alloca, which is to avoid stacksave/stackrestore handling, 2) unescaped allocas.
(3) The dest has no Mod Ref except full-sized lifetime intrinsic and copy itself, from the alloca to the store.
(4) If the dest has no Mod => src has no Ref, dest has no Ref => src has no Mod from the store to the end of BB.

Roughly, the target structure on this patch is like the following.

{
   unescaped static allocas of src/dest

   ↓ src/dest may have mod/ref. Practically it may be sufficient to bailout if dest has any ModRef, many case would be problem if the dest is replaced with src. 

   full-size copy from src to dest, memcpy/memmove/load-store of the src value.

   ↓ src/dest may have mod/ref. There have to be no read(ref) with the other one's write(mod)
   
   end without capturing both of the alloca.
}

For the dest modification before the copy, it could be ok to erase those mods, because of the unescaped allocas in this case.
For mod/ref condition after the full-size copy, the @pcwalton 's original patch did the BB-level liveness checks. Maybe this patch has a lot of room to improve. The inherent condition is that one of two ptr is unnecessary for any execution path.
It seems like there are almost no compile-time regressions. https://llvm-compile-time-tracker.com/compare.php?from=0f9f95146a7fc6fa4f9bc3c1aa2a23386f521dac&to=a6c40d12ea1c6a4b82e88ce1c7b2a723d199fb85&stat=instructions:u

pre-commit tests: https://reviews.llvm.org/D152277

pre-commit tests for user order crash https://reviews.llvm.org/D155179

pre-commit tests for terminator crash https://reviews.llvm.org/D155571

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
khei4 added a comment.Aug 2 2023, 2:58 AM

@glandium Thank you for the report! Sorry, it might take a while to reproduce it on my local. If it exists, could you share any message or crash log to suspect this patch? (I'll also appreciate it if you provide a procedure or doc to reproduce the error.)

Heads up - we are seeing test failures that root cause to this revision.

More interesting, we are seeing some of these tests failing also under AddressSanitizer, complaining about stack-use-after-scope (https://github.com/google/sanitizers/wiki/AddressSanitizerUseAfterScope) - which seems to correspond well to the topic of this change.

We'll try to come up with a reproducer, but this might take time.

I'd like to revert this patch, it also seriously brake Asan.
Asan (and other sanitizer) relies on those lifetime markers which were removed.

And it's already in 17x branch, so we will need to cherry-pick the revert.

To address teminator lifetime insertion issue. Rebased for https://reviews.llvm.org/D155571

  • don't insert if the lastuser is the terminator for the BB,
    • This requires PostDominatorTree, so coming patch insert this, I believe the absence of the lifetime.end is not so much an issue ;) I'll appreciate if you tell me if it matters.

It's and issue for Sanitizers.

vitalybuka reopened this revision.Aug 2 2023, 12:40 PM
This revision is now accepted and ready to land.Aug 2 2023, 12:40 PM
vitalybuka requested changes to this revision.Aug 2 2023, 12:52 PM

I have reproducer, and lifetime markers are very broken.

I need to figure out if I can share the code, or create a new one, it may take day or so.

This revision now requires changes to proceed.Aug 2 2023, 12:52 PM

@glandium Thank you for the report! Sorry, it might take a while to reproduce it on my local. If it exists, could you share any message or crash log to suspect this patch? (I'll also appreciate it if you provide a procedure or doc to reproduce the error.)

I "suspect" this patch because I bisected the Firefox problem down to it. Revisions of llvm after the landing make Firefox crash, Revisions of llvm before don't :) The way Firefox crashes is that the rust BTreeMap code is freeing data from the stack that is not what it's supposed to be (as in, it's freeing something as a String that was never constructed as a String before)... which would probably match this message:

Heads up - we are seeing test failures that root cause to this revision.

More interesting, we are seeing some of these tests failing also under AddressSanitizer, complaining about stack-use-after-scope (https://github.com/google/sanitizers/wiki/AddressSanitizerUseAfterScope) - which seems to correspond well to the topic of this change.

We'll try to come up with a reproducer, but this might take time.

Presumably, a rust compiler using llvm trunk should be able to reproduce this on something smaller than Firefox.

However what I see is that:

  • src has no lifetime markers - so it's considered alive all the time
  • dst has lifetiem markers

I guess the pass removes dst and transfer lifetimes to src, effectively reducing scope. So I would recommend to handle this case some how. In general dst lifetime can only expand src lifetime.

BTW. Also there are cases when lifetime is not very trivial between basic block. We don't have issue with that yet, but I think it's better to bail out and don't touch if we have more then 1 start/end for source or dst, or final lifetime can not be expressed as single range.

This is reproducer 6ee9de7a676a22e8ee5254a91c9a14fdcb8b7f94

So the pass MUST not reduce lifetime of existing allocas - that's miscompile.
Expanding is acceptable, but can make stack coloring or sanitizers less efficient.

khei4 added a comment.EditedAug 2 2023, 8:21 PM

@vitalybuka @glandium @eaeltsin
Thank you for reducing crashes, and reverting! I really appreciate your analysis.

That means Asan has different semantics for unescaped src.

{
lifetime.start(src)
...
lifetime.end(src)
ret void
}

and

{
...(no lifetime.start/end)
ret void
}

right?
I (perhaps mistakenly) think, in the case you give, unescaped src alloca lifetime seems to have not so much change, at least for the point that it's alive during the function call, but releasing by even automatic end and manual end might differ. Anyway, I'll try to repro the Asan crash and see the problem,

  • src has no lifetime markers - so it's considered alive all the time
  • dst has lifetiem markers

I guess the pass removes dst and transfer lifetimes to src, effectively reducing scope. So I would recommend to handle this case some how. In general dst lifetime can only expand src lifetime.

BTW. Also there are cases when lifetime is not very trivial between basic block. We don't have issue with that yet, but I think it's better to bail out and don't touch if we have more then 1 start/end for source or dst, or final lifetime can not be expressed as single range.

So the pass MUST not reduce lifetime of existing allocas - that's miscompile.
Expanding is acceptable, but can make stack coloring or sanitizers less efficient.

Hmm, seems reasonable, it may take time, but I'll treat those! thanks!

vitalybuka added a comment.EditedAug 2 2023, 11:08 PM

@vitalybuka @glandium @eaeltsin
Thank you for reducing crashes, and reverting! I really appreciate your analysis.

  • src has no lifetime markers - so it's considered alive all the time
  • dst has lifetiem markers

I guess the pass removes dst and transfer lifetimes to src, effectively reducing scope. So I would recommend to handle this case some how. In general dst lifetime can only expand src lifetime.

That means Asan has the different semantics for

{
lifetime.start(src)
...
lifetime.end(src)
ret void
}

and

{
...(no lifetime.start/end)
ret void
}

Not sure what you asking.
Lifetime intrinsics are optional. if they are omitted, alloca is valid from entry to any return. So asan, if no markers, can not detect use after scope, if the is markers, it will complain on any alloca access after the end marker.

Reproducer is probably oversimplified. In real miscompile case after the end, inserted by D153453 transformation, there was use of the src alloca. Which was valid when we have no markers, but invalid now, because it's after the end.

I've update the reproducer df0b1df99c9ccf110482661678321e596566c725.
That's what we have with D153453

define void @test() {
entry:
  %agg.tmp.sroa.14 = alloca [20 x i8], align 4
  %agg.tmp.sroa.14.128.sroa_idx = getelementptr i8, ptr %agg.tmp.sroa.14, i64 4
  call void @llvm.lifetime.start.p0(i64 20, ptr %agg.tmp.sroa.14)
  call void @llvm.memcpy.p0.p0.i64(ptr %agg.tmp.sroa.14.128.sroa_idx, ptr null, i64 1, i1 false)
  %agg.tmp3.sroa.35.128.sroa_idx = getelementptr i8, ptr %agg.tmp.sroa.14, i64 4
  call void @llvm.memcpy.p0.p0.i64(ptr inttoptr (i64 4 to ptr), ptr %agg.tmp3.sroa.35.128.sroa_idx, i64 1, i1 false)
  call void @llvm.lifetime.end.p0(i64 20, ptr %agg.tmp.sroa.14)                                   ; <------------------------------- END
  call void @llvm.memcpy.p0.p0.i64(ptr null, ptr %agg.tmp3.sroa.35.128.sroa_idx, i64 1, i1 false) ; <------------------------------- use after END
  ret void
}

right?
I (perhaps mistakenly) think, in the case you give, src alloca lifetime seems to have not so much change, at least for the point that it's alive during the function call, but releasing by even automatic end and manual end might differ. Anyway, I'll try to repro the Asan crash and see the problem,

BTW. Also there are cases when lifetime is not very trivial between basic block. We don't have issue with that yet, but I think it's better to bail out and don't touch if we have more then 1 start/end for source or dst, or final lifetime can not be expressed as single range.

So the pass MUST not reduce lifetime of existing allocas - that's miscompile.
Expanding is acceptable, but can make stack coloring or sanitizers less efficient.

Hmm, seems reasonable, it may take time, but I'll treat those! thanks!

Hmm, seems reasonable, it may take time, but I'll treat those! thanks!

Maybe it's not so bad.
If we limit to alloca with at most one pair of start/end we just need to pick which start/end out of up 4 markers to keep using Domination.

e.g, if either src or dst has no markers, we strip markers from src
if src.start dominates dst.start then keep src.start
if dst.start dominates src.start then keep dst.start
if neither, don't perform transformation

similar with ends and postdominate.

khei4 added a comment.Aug 2 2023, 11:29 PM

Not sure what you asking.
Lifetime intrinsics are optional. if they are omitted, alloca is valid from entry to any return. So asan, if no markers, can not detect use after scope, if the is markers, it will complain on any alloca access after the end marker.

Reproducer is probably oversimplified. In real miscompile case after the end, inserted by D153453 transformation, there was use of the src alloca. Which was valid when we have no markers, but invalid now, because it's after the end.

I've update the reproducer df0b1df99c9ccf110482661678321e596566c725.

@vitalybuka
Thanks! The case you give makes it clear! (Sorry, my words were misreading, I referred to your reproducers, and you answered me clearly :)
Actually, the use after the end is problematic, and this patch introduces lifetime.end for %agg.tmp3.sroa.35.128.sroa_idx,, and other optimization seems to lift the contiguous memcpy src, which causes use after lifetime.end.
Although the patch may be on that optimization, I'll handle that! thanks!

nikic added a comment.Aug 7 2023, 5:38 AM

I looked into this a bit, and I believe the problem here is that we're not adding MSSA MemoryAccesses for the newly inserted lifetime intrinsics. The actual intrinsic placement transform itself is correct, it's just that following MSSA queries fail to see them and perform incorrect transforms as a result.

khei4 updated this revision to Diff 548068.Aug 7 2023, 11:10 PM

create MemoryAccess for newly inserted lifetime intrinsics.

khei4 added a comment.Aug 7 2023, 11:18 PM

I looked into this a bit, and I believe the problem here is that we're not adding MSSA MemoryAccesses for the newly inserted lifetime intrinsics. The actual intrinsic placement transform itself is correct, it's just that following MSSA queries fail to see them and perform incorrect transforms as a result.

@nikic
Thanks! You are right. I created MemoryAccess for newly created lifetime markers.

If we limit to alloca with at most one pair of start/end we just need to pick which start/end out of up 4 markers to keep using Domination.

e.g, if either src or dst has no markers, we strip markers from src
if src.start dominates dst.start then keep src.start
if dst.start dominates src.start then keep dst.start
if neither, don't perform transformation

similar with ends and postdominate.

@vitalybuka

Thank you! I missed this. Although I'm not pretty sure about the lifetime-related optimizations e.g. stack-coloring, I begin to feel like if the transformation is possible(with any interleaving lifetime), then the transformed one(single shrink-wrapped lifetime) may be not worse than the original one.

llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll
23

I thought This memcpy removal is because of undef value copying, but if so original example should be also removed. The current MemoryAccess creation might not be complete.

nikic added inline comments.Aug 7 2023, 11:21 PM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1613

You should omit the third parameter (Template) here, because the effects of the lifetime intrinsics have no relation to those of FirstMA.

Also, shouldn't we be inserting the access *before* the first user, rather than after?

khei4 updated this revision to Diff 548128.Aug 8 2023, 2:41 AM

use createMemoryAccessBefore instead of createMemoryAccessAfter and use Definitions

nikic added inline comments.Aug 8 2023, 2:45 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1614

Why does this use getDefiningAccess() now? I would expect the call to be just MSSAU->createMemoryAcessBefore(Start, FirstMA).

khei4 added a comment.Aug 8 2023, 3:10 AM

I slightly modified the tests by https://github.com/llvm/llvm-project/commit/90ecb9d5b082e331a569f8c06f85289faa2a5c5f

  • add -verify-memoryssa to the opt flag
  • make memcpy defined on lifetime-missing test
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1614

Sorry, I didn't get the point and just mimicked the other use on MemCpyOpt.

You should omit the third parameter (Template) here, because the effects of the lifetime intrinsics have no relation to those of FirstMA.

Do you mean createDefinedAccess?
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/MemorySSA.cpp#L1682-L1685
The third argument for createMemoryAccessBefore seems necessary(I'm not sure whether this is inherently so) https://github.com/llvm/llvm-project/blob/71c3a5519dbcd609fb64560ac7fdfe8db149b905/llvm/include/llvm/Analysis/MemorySSAUpdater.h

Also, shouldn't we be inserting the access *before* the first user, rather than after?

This seems exactly right! Applied.

khei4 updated this revision to Diff 548135.Aug 8 2023, 3:12 AM

rebased for test fix

nikic added inline comments.Aug 8 2023, 3:14 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1614

Ah, ignore my comment, you are right.

nikic added inline comments.Aug 8 2023, 6:32 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1627

I think for the lifetime.start using getDefiningAccess() is right, as we're inserting it before the access (the new defining access of FirstMA will become the newly inserted MemoryDef). However, I think that for the lifetime.end we need to pass LastMA rather than LastMA->getDefiningAccess() as the defining access.

khei4 updated this revision to Diff 548226.Aug 8 2023, 7:56 AM

use LastMA(MemoryAccess) directly rather than LastMA->getDefiningAccess() for MemorySSAUpdater::createMemoryAccessAfter

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1627

Thanks!

However, I think that for the lifetime.end we need to pass LastMA rather than LastMA->getDefiningAccess() as the defining access.

Hmm, I see all use of createMemoryAccessAfter on MemCpyOpt is so, TBH, I didn't get the meaning of defining access, anyway I can try it!

nikic accepted this revision.Aug 8 2023, 8:42 AM

LGTM

khei4 added a comment.Aug 8 2023, 5:55 PM

@vitalybuka What do you think about this change? Especially the lifetime part is unchanged because it would be safe.

I begin to feel like if the transformation is possible(with any interleaving lifetime), then the transformed one(single shrink-wrapped lifetime) may be not worse than the original one.

Kobzol added a subscriber: Kobzol.Aug 10 2023, 2:32 AM
vitalybuka accepted this revision.Aug 10 2023, 6:29 PM

Looks strange that we add lifetime markers on alloca which has no markers before.
Can't create counterexample, precondition blocks transformation.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
70

misaligned, clang-format please?

1529

they also mark when object is considered alive/dead

1532

why do we need this condition?

This revision is now accepted and ready to land.Aug 10 2023, 6:29 PM
khei4 marked 2 inline comments as done.Aug 13 2023, 5:44 AM

@vitalybuka Thank you for the review!

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
70

I aligned! Thanks!

1532

This transformation currently sees the full-sized (and not variable) lifetime intrinsics as an ignorable mod, because removing them doesn't change the semantics(but as you say, stack coloring might be affected), same as https://reviews.llvm.org/D140089, but the former patch does have more meaning to the full-sized lifetime intrinsics for liveness analysis.

khei4 updated this revision to Diff 549692.Aug 13 2023, 5:45 AM
khei4 marked an inline comment as done.

Apply clang-format, update comments.

khei4 added a comment.Aug 13 2023, 5:47 AM

@vitalybuka

Looks strange that we add lifetime markers on alloca which has no markers before.
Can't create counterexample, precondition blocks transformation.

Yeah, this is hard, but I believe capture tracking does guarantee that this is safe for unescaped allocas.

This revision was landed with ongoing or failed builds.Aug 13 2023, 6:10 AM
This revision was automatically updated to reflect the committed changes.
This revision is now accepted and ready to land.Aug 13 2023, 4:31 PM
khei4 added a comment.EditedAug 13 2023, 11:02 PM

@vitalybuka Thank you!
seems like MemoryAccess Creation is broken... (only stage2 build?)

FAILED: tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D_LIBCPP_ENABLE_HARDENED_MODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/unittests/Analysis/FlowSensitive -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/unittests/Analysis/FlowSensitive -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/third-party/unittest/googletest/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/third-party/unittest/googlemock/include -nostdinc++ -isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/include -isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/include/c++/v1 -fsanitize=undefined -Wl,--rpath=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/lib -L/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/lib -w -stdlib=libc++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-class-memaccess -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fno-omit-frame-pointer -gline-tables-only -fsanitize=undefined -fno-sanitize=vptr,function -fno-sanitize-recover=all -fsanitize-blacklist=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/sanitizers/ubsan_ignorelist.txt -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG  -Wno-variadic-macros -Wno-gnu-zero-variadic-macro-arguments -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -Wno-suggest-override -std=c++17 -MD -MT tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o -MF tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o.d -o tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o -c /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/unittests/Analysis/FlowSensitive/MultiVarConstantPropagationTest.cpp
clang++: /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/lib/Analysis/MemorySSA.cpp:1693: llvm::MemoryUseOrDef *llvm::MemorySSA::createDefinedAccess(llvm::Instruction *, llvm::MemoryAccess *, const llvm::MemoryUseOrDef *, bool): Assertion `(!Definition || !isa<MemoryUse>(Definition)) && "A use cannot be a defining access"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang++ -fsanitize=undefined -L/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/lib -w -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-class-memaccess -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fno-omit-frame-pointer -gline-tables-only -fsanitize=undefined -fno-sanitize=vptr,function -fno-sanitize-recover=all -fsanitize-blacklist=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/sanitizers/ubsan_ignorelist.txt -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -Wno-variadic-macros -Wno-gnu-zero-variadic-macro-arguments -fno-exceptions -funwind-tables -fno-rtti -Wno-suggest-override -std=c++17 -fdiagnostics-color -Wl,--rpath=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/lib -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D_LIBCPP_ENABLE_HARDENED_MODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/unittests/Analysis/FlowSensitive -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/unittests/Analysis/FlowSensitive -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/third-party/unittest/googletest/include -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/third-party/unittest/googlemock/include -nostdinc++ -isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/include -isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/include/c++/v1 -stdlib=libc++ -DNDEBUG -UNDEBUG -c -MD -MT tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o -MF tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o.d -fcolor-diagnostics -o tools/clang/unittests/Analysis/FlowSensitive/CMakeFiles/ClangAnalysisFlowSensitiveTests.dir/MultiVarConstantPropagationTest.cpp.o /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/unittests/Analysis/FlowSensitive/MultiVarConstantPropagationTest.cpp
1.	<eof> parser at end of file
2.	Optimizer
 #0 0x00005638879950f7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang+++0x7b4f0f7)
 #1 0x0000563887992fee llvm::sys::RunSignalHandlers() (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang+++0x7b4cfee)
 #2 0x0000563887900248 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007fb9f783bcf0 (/lib/x86_64-linux-gnu/libc.so.6+0x3bcf0)
 #4 0x00007fb9f789226b pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x9226b)
 #5 0x00007fb9f783bc46 raise (/lib/x86_64-linux-gnu/libc.so.6+0x3bc46)
 #6 0x00007fb9f78227fc abort (/lib/x86_64-linux-gnu/libc.so.6+0x227fc)
 #7 0x00007fb9f782271b (/lib/x86_64-linux-gnu/libc.so.6+0x2271b)
 #8 0x00007fb9f7833596 (/lib/x86_64-linux-gnu/libc.so.6+0x33596)
 #9 0x0000563886bee00b llvm::MemorySSA::createDefinedAccess(llvm::Instruction*, llvm::MemoryAccess*, llvm::MemoryUseOrDef const*, bool) (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang+++0x6da800b)
#10 0x0000563886c0c585 llvm::MemorySSAUpdater::createMemoryAccessAfter(llvm::Instruction*, llvm::MemoryAccess*, llvm::MemoryAccess*) (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang+++0x6dc6585)
#11 0x0000563889005416 llvm::MemCpyOptPass::performStackMoveOptzn(llvm::Instruction*, llvm::Instruction*, llvm::AllocaInst*, llvm::AllocaInst*, unsigned long, llvm::BatchAAResults&) (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang+++0x91bf416)
nikic added a comment.Aug 15 2023, 3:48 AM

Here's a reproducer for the assertion failure:

define i32 @test() {
  %src = alloca %struct.Foo, align 4
  %dest = alloca %struct.Foo, align 4
  call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %src)
  store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dest, ptr align 4 %src, i64 12, i1 false)
  call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %src)
  %v = load i32, ptr %dest
  ret i32 %v
}
khei4 added a comment.EditedAug 15 2023, 4:10 AM

@nikic Thanks! I was wondering readonly call isn't MemoryUse(Currently seems not so, that can probably be patched);)

%src = alloca %struct.Foo, align 4
%dest = alloca %struct.Foo, align 4
store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src
call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dest, ptr align 4 %src, i64 12, i1 false)
%_v = call i32 @use_readonly(ptr nocapture readonly %dest)
%_v2 = call i32 @use_readonly(ptr nocapture readonly %src)
ret void

I'll add a test for this case and rebase it!

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1626

So, replace the second argument with LastMA->getDefiningAccess() works. It seems like getDefiningAccess is idempotent for MemoryDef.

khei4 updated this revision to Diff 550253.EditedAug 15 2023, 4:22 AM
nikic accepted this revision.Aug 15 2023, 5:17 AM

LGTM

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1626

getDefiningAccess() is not idempotent for MemoryDef. However, it's still fine, because of something I didn't realize before: The defining access we specify here actually doesn't matter at all. It will always be replaced with the correct defining access in https://github.com/llvm/llvm-project/blob/98322d3eb43168a6a64c1a15a1e754e15c04aa2f/llvm/lib/Analysis/MemorySSAUpdater.cpp#L341.

I may look into refactoring these APIs, because the way they currently work is very confusing...

khei4 added inline comments.Aug 15 2023, 6:30 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1626

getDefiningAccess() is not idempotent for MemoryDef

Hmm. Thanks!

I see, getPreviousDef might be used on multi-BB cases; To reminder at least I can attach comments!

khei4 updated this revision to Diff 550307.Aug 15 2023, 6:31 AM
nikic added inline comments.Aug 15 2023, 6:53 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1626

I've put up https://reviews.llvm.org/D157979 to clarify this.