This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] implement multi BB stack-move optimization
ClosedPublic

Authored by khei4 on Jul 16 2023, 8:36 AM.

Details

Summary

This patch extends https://reviews.llvm.org/D153453 to muti-BB case, aims to remove memcpy like https://internals.rust-lang.org/t/the-way-to-see-how-memcpy-and-alloca-introduced-by-move-is-optimized/19148/4

Single-BB version was like the following

(1) The src is fully copied to dest by memcpy, memmove or store&load.
(2) src and dest are 1) static alloca, which is to avoid stacksave/stackrestore handling, 2) unescaped allocas.
(3) The dest has no Mod Ref except full-sized lifetime intrinsic and copy itself, from the alloca to the store.
(4) If the dest has no Mod => src has no Ref, dest has no Ref => src has no Mod from the store to the end of BB.

This patch changes the original single-BB patch's ModRef checking conditions (3) and (4) by using reachability and dominator tree query as the following

(3) Mod/Ref for dest doesn't exist if it's reachable to full-size copy
(4) All uses not post dominated by the full-size copy satisfies (dest-Mod NAND src-Ref) AND (dest-Ref NAND src-Mod),

NOTE: For above conditions, this is more conservative than pcwalton's live analysis approach, and we can transform if all execution passes have no conflict, like cache coherency for the src and dest. It's safe for unescaped static contexts, if any ref, after the other one's mod and before the self-full-size Mod. This Mod could be called Def like the original pcwalton's patch. To do this, need a liveness analysis like the original patch or similar things. Also, unescaped alloca and byval args have almost the same semantics, so we can extend the way also.

pre-commit test: https://reviews.llvm.org/D155422
compile-time regression: https://llvm-compile-time-tracker.com/compare.php?from=2975ccb4b06b3d3aedd86ab21729146e441521d7&to=83fcffd325e79dd89e2b932b053945f868659f56&stat=instructions:u

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptJul 16 2023, 8:36 AM
khei4 requested review of this revision.Jul 16 2023, 8:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 16 2023, 8:36 AM
khei4 edited the summary of this revision. (Show Details)Jul 16 2023, 10:05 AM
khei4 updated this revision to Diff 540870.Jul 16 2023, 11:24 PM

add

  • Dom/PDomTree branches and rebased for instruction level nearst common pdom calculation https://reviews.llvm.org/D155420
  • insertion for lifetime.end/start for above dom/pdom cases
  • use isPotentiallyReachable for reachable query.

TODO: seems like conditions and/or implementation isn't sufficient to reduce loop example.

khei4 edited the summary of this revision. (Show Details)Jul 16 2023, 11:32 PM
khei4 updated this revision to Diff 540931.Jul 17 2023, 3:18 AM
  • do batch reachability checks
  • change pm tests
  • remove TODO comments
NOTE: loop fix might be another revision, but load/store should be erased.
nikic added a comment.Jul 17 2023, 6:34 AM

This looks okay to me as a fairly conservative extension. Does requiring PDT have any impact on compile-time?

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1633

Hm, this return true doesn't seem right to me if it the instruction is inside a loop. I think we still need to perform the reachability check in this case.

1704

A possible issue here is that PDom could be a terminator, e.g. an invoke. In that case ++ will go past the end of the block. I think it would be fine to treat this case as if there is no PDom, as it's unlikely to occur in practice (generally, the pdom use will be in a lifetime intrinsic, which can't be a terminator).

nikic added inline comments.Jul 17 2023, 11:27 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1704

Heh, looks like exactly this issue got reported here: https://reviews.llvm.org/D153453#inline-1503870 I failed to realize that this can already be an issue with the initial patch, not just this extension...

khei4 added a comment.Jul 18 2023, 2:24 AM

@nikic Thank you for the review! I'll handle the teminator problem on the single-BB first! I'll add tests and reopen it;)

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1704

Thank you! As you said exactly, I have forgotten to see it, implemented on pcwalton's patch.

khei4 updated this revision to Diff 542302.EditedJul 19 2023, 9:42 PM

address feedback

TODO

  • batch reachability checks for instructions
  • more negative cases
khei4 edited the summary of this revision. (Show Details)Jul 19 2023, 11:19 PM
nikic added inline comments.Jul 20 2023, 3:33 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1705

This looks suspicious to me. Would getBlock() return nullptr, or already getIDom()?

khei4 updated this revision to Diff 542792.EditedJul 21 2023, 1:08 AM
khei4 retitled this revision from (WIP) [MemCpyOpt] implement multi BB stack-move optimization to [MemCpyOpt] implement multi BB stack-move optimization.
khei4 edited the summary of this revision. (Show Details)

apply feedback

The primary unfinished case is the original pcwalton's _is_def suffix cases, which fully overwrite src/def after the copy. This inherently requires liveness-analysis kind things, but I think for safe Rust move cases, Rust reference property aliasing xor mutability doesn't allow so much case for that (maybe?). So I hope this might have a slight effect on Rust programs :) (Theoretically, I didn't see rustc-codegen so much...)

khei4 added a comment.Jul 21 2023, 1:10 AM

@nikic
Thank you for the review!

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1633

Right! We need to use the batch BB reachability for instructions!

1705

Hmm, certainly. So far, I'm also not sure about this behavior, for the current invoke example.
when PDom = %rv = invoke i32 @use_nocapture(ptr %src) to label %suc unwind label %unw

((*PDT)[PDom->getParent()]->getIDom() == nullptr) => 0
((*PDT)[PDom->getParent()]->getIDom()->getBlock() == nullptr) => 1

Something wrong might be on PDom. Anyway, I could handle both nullptr conservatively.

khei4 edited the summary of this revision. (Show Details)Jul 21 2023, 1:13 AM
khei4 retitled this revision from [MemCpyOpt] implement multi BB stack-move optimization to (WIP) [MemCpyOpt] implement multi BB stack-move optimization.Aug 7 2023, 11:54 PM
khei4 updated this revision to Diff 548450.Aug 8 2023, 8:44 PM

rebase and a bailout for MemoryAccess creation on lifetime.end inserting for non-user post dominator.

khei4 retitled this revision from (WIP) [MemCpyOpt] implement multi BB stack-move optimization to [MemCpyOpt] implement multi BB stack-move optimization.Aug 8 2023, 8:44 PM
nikic added inline comments.Aug 12 2023, 8:25 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1624

Is this check needed? I would expect that this is covered by adding the successors to the reachability worklist. Then we shouldn't need LoopInfo.

1702

bb -> BB

1710

I don't get what you're doing here with AA. Why is this necessary now and wasn't needed in the single BB case?

khei4 marked an inline comment as done.Aug 13 2023, 8:39 AM
khei4 added inline comments.
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1624

Hmm, you seem right. At least the current test results don't change. I could erase the LoopInfo!

1710

This was to know whether the post dominator has MemoryAccess on SrcAlloca.
If it has MemoryAccess, check whether it's terminator or not, and find the place to insert MemoryAccess.
If it's not, currently skipped because I can't know how to insert MemoryDef or MemoryPhi consistently with any post dominators.

khei4 updated this revision to Diff 549706.Aug 13 2023, 8:41 AM

apply feedback

  • Remove LoopInfo
  • fix comments
nikic added inline comments.Aug 16 2023, 12:08 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1710

Based on D157979, you can pass nullptr as the defining access and use createMemoryAccessInBB with Begin insertion place (when inserting at start of block).

nikic added a comment.Aug 16 2023, 1:01 AM

It looks like a clang-format of the whole file snuck in.

khei4 added a comment.Aug 16 2023, 1:34 AM

It looks like a clang-format of the whole file snuck in.

Ah, ... sorry for the noise.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1710

Sounds great! Thank you! I'll handle it.

khei4 updated this revision to Diff 550663.Aug 16 2023, 1:47 AM
khei4 edited the summary of this revision. (Show Details)
  • remove unrelated clang-format things
  • handle post dominator for each case 1) invoke, 2) user of SrcAlloca, 3) other.
nikic added inline comments.Aug 17 2023, 5:27 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1710

Hm, I really don't like how we distinguish between pdom being a user and the start of a block here. I think it may be better if we explicitly distinguish these two cases during the analysis.

I had something like this in mind:

/// PDom is either an Instruction, or the start of a BasicBlock.
using PDom = PointerUnion<Instruction *, BasicBlock *>;
static PDom findNearestCommonPostDominator(PDom I1, PDom I2,
                                           PostDominatorTree *PDT) {
  auto GetParent = [](PDom I) {
    if (auto *BB = dyn_cast<BasicBlock *>(I))
      return BB;
    return cast<Instruction *>(I)->getParent();
  };
  BasicBlock *BB1 = GetParent(I1);
  BasicBlock *BB2 = GetParent(I2);
  if (BB1 == BB2) {
    if (isa<BasicBlock *>(I1))
      return I1;
    if (isa<BasicBlock *>(I2))
      return I2;
    return cast<Instruction *>(I1)->comesBefore(cast<Instruction *>(I2)) ? I2
                                                                         : I1;
  }
  BasicBlock *PDomBB = PDT->findNearestCommonDominator(BB1, BB2);
  if (!PDomBB)
    return nullptr;
  if (BB2 == PDomBB)
    return I2;
  if (BB1 == PDomBB)
    return I1;
  return PDomBB;
}

For Instruction * PDom you would use createMemoryAccesAfter, for BasicBlock * you would use createMemoryAccessInBB.

khei4 added a comment.Aug 17 2023, 8:42 AM

Thank you for the review!

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1710

Yeah, I agree, it seems better to define the insertion point type as you gave. Moreover, checking BB or not could erase the ModRef check!

khei4 updated this revision to Diff 551153.Aug 17 2023, 8:44 AM

Use UnionPointer for post-dominator finding and erase ModRefCheck after that.

nikic added inline comments.Aug 17 2023, 9:16 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1434–1437

I think I got these the wrong way around.

If I1 is the start of the block, then I2 is either also the start of the block or comes later in it, so it post-dominates.

1596

Remove TODO?

1714

because

1720

And drop the else below.

khei4 edited the summary of this revision. (Show Details)Aug 18 2023, 12:11 AM
khei4 updated this revision to Diff 551407.Aug 18 2023, 12:20 AM
khei4 marked an inline comment as done.
  • modify description, fix typo
  • fix post-dominator
  • rebase for tests
khei4 marked 3 inline comments as done.Aug 18 2023, 12:21 AM

Thanks!

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1434–1437

Ah, thanks! This part only seems wrong(normal dominator rather than post). I added the test to check this, also.

nikic added inline comments.Aug 20 2023, 3:04 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1434–1437

This is still the wrong way around (and your new test multi_bb_pdom_test0 does show it: The lifetime.end is before the final use).

1610

I don't understand what this FIXME is about.

1704

nit: Space before (

1714

user -> a user

nikic added inline comments.Aug 20 2023, 3:04 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1722

This should use PDomB->getFirstInsertionPoint(). Can you add a test where we need to insert in a block with phi nodes? In that case, the insertion has to be after the phi nodes, not before.

khei4 updated this revision to Diff 551904.Aug 20 2023, 10:43 PM
khei4 marked 5 inline comments as done.

apply feedbacks

  • refine comments
  • use getFirstInsertionPt
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1434–1437

Thanks! Sorry, I didn't notice the reverted change...

1610

This meant adding the Instruction version isPotentiallyReachableFromMany (currently only for BasicBlocks) might be helpful. I copied the code from CFG.h. Although even for the BasicBlocks version, a user is only a few cases.

1722

Thanks! It seems right, I added a test.

This revision is now accepted and ready to land.Aug 21 2023, 7:22 AM
khei4 updated this revision to Diff 552150.Aug 21 2023, 3:02 PM

rebased for test update

nikic added a comment.Aug 22 2023, 3:50 AM

I think the patch description could use an update to focus more on what this patch changes (e.g. the static alloca condition is not new, and just one of many old conditions).

khei4 reopened this revision.Aug 24 2023, 6:50 AM
This revision is now accepted and ready to land.Aug 24 2023, 6:50 AM
khei4 added a comment.EditedAug 24 2023, 6:52 AM

@nikic Thanks! I'll reduce the descriptions!

It seems like a sanitizer crash happened on the createMemoryAccessBefore again. https://lab.llvm.org/buildbot/#/builders/37/builds/24650 I'll revert first!

#4 0x000055b2d243cc22 llvm::MemorySSAUpdater::createMemoryAccessBefore(llvm::Instruction*, llvm::MemoryAccess*, llvm::MemoryUseOrDef*) (/b/sanitizer-x86_64-linux-autoconf/build/tsan_debug_build/bin/clang+++0x5d80c22)
#5 0x000055b2d4bb8be6 llvm::MemCpyOptPass::performStackMoveOptzn(llvm::Instruction*, llvm::Instruction*, llvm::AllocaInst*, llvm::AllocaInst*, unsigned long, llvm::BatchAAResults&) (/b/sanitizer-x86_64-linux-autoconf/build/tsan_debug_build/bin/clang+++0x84fcbe6)
#6 0x000055b2d4bb9714 llvm::MemCpyOptPass::processMemCpy(llvm::MemCpyInst*, llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void>, false, false>&) (.part.0) MemCpyOptimizer.cpp:0:0
#7 0x000055b2d4bbe301 llvm::MemCpyOptPass::iterateOnFunction(llvm::Function&) (/b/sanitizer-x86_64-linux-autoconf/build/tsan_debug_build/bin/clang+++0x8502301)

Edit) I'm now searching the failure, I also believe this is similar to the last user insertion patterns.

The reproducer is the case dominator doesn't have any memory access for src. As the following

define void @multi_bb_dom_test0(i1 %b) {
  %src = alloca %struct.Foo, align 4
  %dest = alloca %struct.Foo, align 4
  br i1 %b, label %bb0, label %bb1

bb0:
  store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src
  br label %bb2

bb1:
  store %struct.Foo { i32 40, i32 50, i32 60 }, ptr %src
  br label %bb2

bb2:
  call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %dest)
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dest, ptr align 4 %src, i64 12, i1 false); 1
  %1 = call i32 @use_nocapture(ptr noundef nocapture %dest)

  ret void
}

I'll add a few tests and fix it.

khei4 edited the summary of this revision. (Show Details)Aug 24 2023, 7:16 AM
khei4 updated this revision to Diff 553150.Aug 24 2023, 9:01 AM

implement and use findNearestCommonDominator for InsertionPt

nikic added inline comments.Aug 26 2023, 12:13 PM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1434

Shouldn't this be I1?

khei4 updated this revision to Diff 553757.Aug 26 2023, 1:26 PM

tweak comment

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1434

In this case, I use BasicBlock to represent the terminator of the block, while it's first non-phi in the PDom. This might be confusing, so I commented on it just in time.

nikic accepted this revision.Aug 26 2023, 1:33 PM

LGTM

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1434

Oh yeah, that makes sense.

This revision was landed with ongoing or failed builds.Aug 26 2023, 2:50 PM
This revision was automatically updated to reflect the committed changes.
khei4 added a comment.EditedAug 26 2023, 3:48 PM

https://lab.llvm.org/buildbot/#/builders/36/builds/37042

Hmm, seems like the assertion failed somehow. (Is this due to the DominatorTree update being forgotten somewhere?)
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/GenericDomTree.h#L500

 gh -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=c++17 -MD -MT lib/Object/CMakeFiles/LLVMObject.dir/MachOObjectFile.cpp.o -MF lib/Object/CMakeFiles/LLVMObject.dir/MachOObjectFile.cpp.o.d -o lib/Object/CMakeFiles/LLVMObject.dir/MachOObjectFile.cpp.o -c /home/buildbots/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/lib/Object/MachOObjectFile.cpp
clang++: /home/buildbots/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/include/llvm/Support/GenericDomTree.h:500: NodeT* llvm::DominatorTreeBase<NodeT, IsPostDom>::findNearestCommonDominator(NodeT*, NodeT*) const [with NodeT = llvm::BasicBlock; bool IsPostDom = false]: Assertion `NodeA && "A must be in the tree"' failed.
...
 #6 0x00007fffbaeddc90 __assert_fail_base (/lib64/libc.so.6+0x3dc90)
 #7 0x00007fffbaeddd34 __assert_fail (/lib64/libc.so.6+0x3dd34)
 #8 0x00000000131ae08c llvm::DominatorTreeBase<llvm::BasicBlock, false>::findNearestCommonDominator(llvm::BasicBlock*, llvm::BasicBlock*) const (/home/buildbots/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/install/stage1/bin/clang+++0x131ae08c)
 #9 0x0000000017b08920 llvm::MemCpyOptPass::performStackMoveOptzn(llvm::Instruction*, llvm::Instruction*, llvm::AllocaInst*, llvm::AllocaInst*, unsigned long, llvm::BatchAAResults&)::'lambda0'(llvm::Instruction*, llvm::function_ref<bool (llvm::Instruction*)>)::operator()(llvm::Instruction*, llvm::function_ref<bool (llvm::Instruction*)>) const MemCpyOptimizer.cpp:0:0
#10 0x0000000017b09308 llvm::MemCpyOptPass::performStackMoveOptzn(llvm::Instruction*, llvm::Instruction*, llvm::AllocaInst*, llvm::AllocaInst*, unsigned long,
vitalybuka reopened this revision.Aug 26 2023, 7:26 PM
This revision is now accepted and ready to land.Aug 26 2023, 7:26 PM

@khei4 This happens when one of the instructions is in an unreachable block (disconnected from the CFG and not part of the DT). Missing the special cases here: https://github.com/llvm/llvm-project/blob/e41d383440883fa2f87012b765fcc2d450f6fd55/llvm/lib/IR/Dominators.cpp#L350-L353

khei4 updated this revision to Diff 553843.Aug 27 2023, 10:56 PM

@nikic Thanks! fixed.

nikic added a comment.Aug 28 2023, 3:33 AM

Is it necessary to handle this for the PDT case as well, or can it not happen there? (Maybe all blocks get connected to the virtual exit block there?)

Is it necessary to handle this for the PDT case, or can it not happen there? (Maybe all blocks get connected to the virtual exit block there?)

@nikic Thanks! It sounds like a good point! I couldn't follow completely, but I do believe so; DT's entry is PDT's leaves, so I guess we can calculate PDom if it's unreachable from entry, but I'll add the test for the case that contains a cycle unreachable from entry.

khei4 updated this revision to Diff 554019.Aug 28 2023, 9:50 PM

rebase for the test

nikic accepted this revision.Aug 29 2023, 12:07 AM

LGTM

llvm/test/Transforms/MemCpyOpt/stack-move.ll
620

There are these weird ; 1 comments in some of the tests.

khei4 updated this revision to Diff 554212.Aug 29 2023, 1:27 AM

remove test noises

This revision was landed with ongoing or failed builds.Aug 29 2023, 3:40 AM
This revision was automatically updated to reflect the committed changes.
bgraur added a subscriber: bgraur.Sep 7 2023, 7:09 AM
bgraur added a comment.Sep 7 2023, 7:39 AM

@khei4 the last commit is causing again lots of false positive for tests executed under asan.

We (the google compilers team) got several tests with no asan findings when built before rG3a1409f93da32bf626f76257e0aac71716f2f67e that trigger stack-use-after-scope when built with this commit.

Coming up with a reproducer is pretty time consuming, could you please revert this until one is available?

vitalybuka reopened this revision.Sep 7 2023, 11:19 AM
vitalybuka added a subscriber: vitalybuka.

@khei4 the last commit is causing again lots of false positive for tests executed under asan.

We (the google compilers team) got several tests with no asan findings when built before rG3a1409f93da32bf626f76257e0aac71716f2f67e that trigger stack-use-after-scope when built with this commit.

Coming up with a reproducer is pretty time consuming, could you please revert this until one is available?

reverted in efe8aa2e618122e8050af10cc5d6ad83f24ef557

If this optimization is critical, maybe you can split this patch in two?

  1. BB stack-move optimization which strips all lifetime markers from src alloca
  2. insert alloca on dst

Then we can continue related/revert patch 2 as needed.

This revision is now accepted and ready to land.Sep 7 2023, 11:19 AM
nikic added a comment.Sep 8 2023, 3:56 AM

Conceptually, this optimization can only cause false negatives with asan, not false positives. We'll need a reproducer to understand what's going wrong.

Conceptually, this optimization can only cause false negatives with asan, not false positives. We'll need a reproducer to understand what's going wrong.

Yes, that's why I think that something is definitely wrong with lifetimes, so better to revert ASAP.

And I believe @joanahalili is working on reproducer. Let me know if I can help.

khei4 added a comment.Sep 11 2023, 3:30 AM

@vitalybuka @bgraur @joanahalili
Thank you for reporting this!

FWIW, this transformation itself ensures 

  1. allocas are never captured on IR level. (This is the requirements for this transformation.)
  2. newly inserted lifetime.start dominates all use of memory location, and inserted end post-dominates those.

So, at least on the IR level, I believe this doesn't introduce any problematic use for the memory location, but certainly, this changes the memory use of the original source code, and it may introduce the unexpected use of the address, out of the scope that original source code had, by inserting (post)dominating lifetime intrinsics. I assumed lifetime markers are just overwriting the memory location by undef values, and the original lifetime ranges (scope?) for the memory location could be changed by the optimization if we correctly attach lifetime intrinsics, but I think it was wrong for the asan, right? (I'm not sure about the actual false positive case, any concrete counter-example will be appreciated.)
I should have cared about the original lifetime to kindly introduce this optimization, as vitaly says https://reviews.llvm.org/D153453#4556386. I believe this optimization is effective practically e.g. for the Rust programs that introduce memcpy for move, so splitting the patch sounds reasonable.

Assuming that inserting the new lifetime marker(or use) for an address out of the original scope would be problematic for asan, I want to try to use the single range lifetime completely covered other one, as vitaly suggested, for the first step!

khei4 updated this revision to Diff 556518.Sep 11 2023, 8:29 PM

strip the src lifetime if the transform suceed

khei4 added a comment.Sep 11 2023, 8:29 PM

@vitalybuka Ah, sorry, I was confused and not sure what you said. Did you mean just stripping the src lifetime work?

@khei4 the last commit is causing again lots of false positive for tests executed under asan.

We (the google compilers team) got several tests with no asan findings when built before rG3a1409f93da32bf626f76257e0aac71716f2f67e that trigger stack-use-after-scope when built with this commit.

Coming up with a reproducer is pretty time consuming, could you please revert this until one is available?

reverted in efe8aa2e618122e8050af10cc5d6ad83f24ef557

If this optimization is critical, maybe you can split this patch in two?

  1. BB stack-move optimization which strips all lifetime markers from src alloca
  2. insert alloca on dst

Then we can continue related/revert patch 2 as needed.

vitalybuka added a comment.EditedSep 13 2023, 1:32 PM

@vitalybuka Ah, sorry, I was confused and not sure what you said. Did you mean just stripping the src lifetime work?

Yes.

  1. First you land the MemCpyOpt patch which remove lifetime from the preserved alloca. (assuming perf improvement from lifetime markers is smaller then from MemCpyOpt)
  2. Then you land another patch with "correct" lifetimes. Looks like this is most tricky part so we can iterate here.

We have preliminary repro, which I will share by tomorrow.

But I still recommend to split the patch as above. Even with repro, I would not be surprised we hit another case and have to revert.

@khei4 the last commit is causing again lots of false positive for tests executed under asan.

We (the google compilers team) got several tests with no asan findings when built before rG3a1409f93da32bf626f76257e0aac71716f2f67e that trigger stack-use-after-scope when built with this commit.

Coming up with a reproducer is pretty time consuming, could you please revert this until one is available?

reverted in efe8aa2e618122e8050af10cc5d6ad83f24ef557

If this optimization is critical, maybe you can split this patch in two?

  1. BB stack-move optimization which strips all lifetime markers from src alloca
  2. insert alloca on dst

Then we can continue related/revert patch 2 as needed.

vitalybuka accepted this revision.Sep 13 2023, 1:39 PM

Yes, like this.

LGTM,

If you can wait a few hours I try the patch on our internal code.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1678–1730

Please drop TODO: Reconstruct merged lifetime markers.

Please update llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll

khei4 updated this revision to Diff 556741.Sep 13 2023, 8:45 PM
  • update lifetime-missing.ll
  • drop TODO comment.
khei4 added a comment.Sep 13 2023, 8:47 PM

@vitalybuka Thank you a lot! I'll try to land this in a day!

If you can wait a few hours I try the patch on our internal code.

I don't see any regressions.