This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] precommit test for memcpy removal for immutable arguments from attributes (NFC)
ClosedPublic

Authored by khei4 on May 19 2023, 7:44 AM.

Diff Detail

Unit TestsFailed

Event Timeline

khei4 created this revision.May 19 2023, 7:44 AM
khei4 requested review of this revision.May 19 2023, 7:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2023, 7:44 AM
khei4 updated this revision to Diff 523777.May 19 2023, 7:51 AM
khei4 edited the summary of this revision. (Show Details)

add todo and struct case.

khei4 edited the summary of this revision. (Show Details)May 19 2023, 7:55 AM
nikic added inline comments.May 20 2023, 1:10 AM
llvm/test/Transforms/MemCpyOpt/memcpy.ll
398

Drop noundef, it's not relevant. Can also change return type to void. I also don't think noalias readonly on the function argument matter, only align 4. The noalias readonly are relevant on the call only.

409

Also needs nocapture, otherwise the address may be significant.

StephenFan added inline comments.May 20 2023, 1:52 AM
llvm/test/Transforms/MemCpyOpt/memcpy.ll
397

For readonly, the LangRef says: This attribute indicates that the function does not write through this pointer argument, even though it may write to the memory that the pointer points to.
IIUC, it means there is no write that through this argument, but we can't guarantee that the memory is not written. So maybe we need to make sure the function @fnr has the attribute memory(read)?

nikic added inline comments.May 20 2023, 1:54 AM
llvm/test/Transforms/MemCpyOpt/memcpy.ll
397

The combination of readonly and noalias guarantees this. Due to the noalias the only write can happen though that argument, which is readonly.

StephenFan added inline comments.May 20 2023, 2:05 AM
llvm/test/Transforms/MemCpyOpt/memcpy.ll
397

Thanks for your explanation!

khei4 added inline comments.May 20 2023, 10:15 PM
llvm/test/Transforms/MemCpyOpt/memcpy.ll
398

Drop noundef, it's not relevant. Can also change return type to void.

Thanks! They seemed correct!

I also don't think noalias readonly on the function argument matter, only align 4. The noalias readonly are relevant on the call only.

I worry about the case like

#include <thread>

void ro(int &d) { d; }

void w(int &d) { d = 13; }

int main() {
  int *src, *dest1, *dest2;
  *src = 42;
  std::memcpy(dest1, src, sizeof dest1);
  std::thread ro_t1(ro, dest1);
  std::memcpy(dest2, src, sizeof dest2);
  std::thread w_t(w, dest2);
  std::thread ro_t2(ro, dest2);
}

(although I'm not sure noalias can be actually attributed for this specific case)

It seems like the transformation would break the noalias property of the callee parameter, because memcpy might be used to introduce the actual noalias property.

409

Thanks! Yeah, removing memcpy by only noalias and readonly might broke capturing behavior!
Also I noticed readnone doesn't need nocapture, so I'll add a test for that case :)

khei4 edited the summary of this revision. (Show Details)May 20 2023, 10:25 PM
khei4 updated this revision to Diff 524079.May 20 2023, 11:00 PM
khei4 retitled this revision from precommit test for memcpy removal for noalias, readonly attributed arguments to [MemCpyOpt] precommit test for memcpy removal for immutable arguments from attributes (NFC.
khei4 edited the summary of this revision. (Show Details)

apply feedback, add cases for attributes union of Callsite and Parameter and negative case

khei4 retitled this revision from [MemCpyOpt] precommit test for memcpy removal for immutable arguments from attributes (NFC to [MemCpyOpt] precommit test for memcpy removal for immutable arguments from attributes (NFC).May 20 2023, 11:00 PM
khei4 updated this revision to Diff 524092.May 21 2023, 3:58 AM

add ret void and update tests (Sorry for being noisy...)

khei4 added inline comments.May 21 2023, 4:07 AM
llvm/test/Transforms/MemCpyOpt/memcpy.ll
409

Also I noticed readnone doesn't need nocapture, so I'll add a test for that case :)

Sorry, that's not. Maybe readnone only guarantee no-deref, so capturing with readnone seems valid.

nikic added inline comments.May 21 2023, 10:08 AM
llvm/test/Transforms/MemCpyOpt/memcpy.ll
398

Yes, I think you're right. Basically, in terms of clobbers, the things we have to ensure are:

  • The memcpy dst is not modified by the call. (The noalias readonly ensures this.)
  • The memcpy dst is not modified between the memcpy and the call. (Needs MSSA clobber check.)
  • The memcpy src is not modified between the memcpy and the call. (Needs MSSA clobber check.)
  • The memcpy src is not modified by the call. Needs a ModRef check. This is the bit that could be ensured by noalias readonly the function parameter. (But also through other means, e.g. if the source is an unescaped alloca.)
409

Yes, a readnone argument can still be captured. Even if the function itself is memory(none) it can still be captured via return value or via unwind/divergence side effects.

khei4 added inline comments.May 21 2023, 11:32 PM
llvm/test/Transforms/MemCpyOpt/memcpy.ll
398

Thank you for the review! It saves me a lot!
My example was a little subtle to say the necessity for noalias, but now become clear!

409

Thanks! I see!

khei4 updated this revision to Diff 524171.May 21 2023, 11:52 PM

add variational attributes nowrite for memory

khei4 updated this revision to Diff 524200.May 22 2023, 2:17 AM

update tests

To write down what I said on the last call: For this transform, we need to be careful because we don't know what kind of accesses the called function will perform. The function may have an unknown assumption on the alignment and may access an unknown number of bytes. As such, we can only perform the transform if we know the alignment/size of the object that is passed to the function.

If we have something like this

define void @test(i1 %c, ptr noalias %val) {
  %val1 = alloca [4 x i8], align 4
  %val2 = alloca [16 x i8], align 16
  %val3 = select i1 %c, ptr %val1, ptr %val2
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %val3, ptr align 4 %val, i64 4, i1 false)
  call void @f(ptr nocapture noalias readonly %val3)
  ret void
}

then it might be that %c is always known false, so @f assumes it gets the %val2 = alloca [16 x i8], align 16 pointer. As such, it might assume that the pointer is aligned to 16 bytes and is 16 bytes large. As such, we need to be sure that %val satisfies this as well if we do the replacement.

I think in practice, this means we should only do the replacement if a) the pointer passed to the function is an alloca (which ensures we know alignment & size) b) the memcpy copies the full size of the alloca (which ensures that the new pointer is dereferencable for the required range) and c) the new pointer has alignment >= the alloca alignment.

khei4 added a comment.May 25 2023, 6:08 AM

Thank you so much!

To write down what I said on the last call: For this transform, we need to be careful because we don't know what kind of accesses the called function will perform. The function may have an unknown assumption on the alignment and may access an unknown number of bytes. As such, we can only perform the transform if we know the alignment/size of the object that is passed to the function.

If we have something like this

define void @test(i1 %c, ptr noalias %val) {
  %val1 = alloca [4 x i8], align 4
  %val2 = alloca [16 x i8], align 16
  %val3 = select i1 %c, ptr %val1, ptr %val2
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %val3, ptr align 4 %val, i64 4, i1 false)
  call void @f(ptr nocapture noalias readonly %val3)
  ret void
}

then it might be that %c is always known false, so @f assumes it gets the %val2 = alloca [16 x i8], align 16 pointer. As such, it might assume that the pointer is aligned to 16 bytes and is 16 bytes large. As such, we need to be sure that %val satisfies this as well if we do the replacement.

Sounds reasonable! For the first step, I'll assume *syntactically* possible args' alignments are coming to the args, and see whether they satisfy the following conditions. I'll entrust further analysis(for example, the above always-known-value case you gave without any folding) on other passes except MemCpyOptimizer, at least for the first step! (I think it might not be valuable to do it on this pass).
I'll add 1) both valid alignment selected case, 2) invalid-valid selected case (like you gave,)!

I think in practice, this means we should only do the replacement if
a) the pointer passed to the function is an alloca (which ensures we know alignment & size)
b) the memcpy copies the full size of the alloca (which ensures that the new pointer is dereferencable for the required range) and
c) the new pointer has alignment >= the alloca alignment.

Sounds good! Moreover, b seems like possible to extend less than the size of alloca :).

I'll add tests to check each conditions! Thanks!

khei4 updated this revision to Diff 525611.May 25 2023, 7:46 AM

add the cases memcpy and alloca have alignment assumptions
expect removable if just no alignment attributes are given to the call site.

khei4 updated this revision to Diff 526253.May 27 2023, 3:52 AM
khei4 retitled this revision from [MemCpyOpt] precommit test for memcpy removal for immutable arguments from attributes (NFC) to (WIP) [MemCpyOpt] precommit test for memcpy removal for immutable arguments from attributes (NFC).

update dupped tests name, also refinement might be needed

khei4 planned changes to this revision.May 27 2023, 3:52 AM
khei4 updated this revision to Diff 526254.May 27 2023, 3:56 AM

update tests (sorry for being noisy

khei4 updated this revision to Diff 526255.May 27 2023, 3:58 AM

not yet handled precisely, but remove basic cases without seeing alignment

khei4 updated this revision to Diff 526306.May 27 2023, 8:13 PM
khei4 edited the summary of this revision. (Show Details)

remove redundant tests, update tests

khei4 updated this revision to Diff 526518.May 29 2023, 11:14 PM
khei4 retitled this revision from (WIP) [MemCpyOpt] precommit test for memcpy removal for immutable arguments from attributes (NFC) to [MemCpyOpt] precommit test for memcpy removal for immutable arguments from attributes (NFC).
khei4 edited the summary of this revision. (Show Details)

add test cases

  • no noalias attributed arg src
  • unescaped alloca src
  • modified between memcpy and call
khei4 updated this revision to Diff 526992.May 31 2023, 4:14 AM
khei4 retitled this revision from [MemCpyOpt] precommit test for memcpy removal for immutable arguments from attributes (NFC) to (WIP) [MemCpyOpt] precommit test for memcpy removal for immutable arguments from attributes (NFC).

tweak

  • add bigger align case
  • reduce unnecessary branch on some tetst

TODO: sort test cases to correspond the transformation branch

khei4 planned changes to this revision.May 31 2023, 4:14 AM
khei4 updated this revision to Diff 528077.Jun 3 2023, 12:58 AM
khei4 retitled this revision from (WIP) [MemCpyOpt] precommit test for memcpy removal for immutable arguments from attributes (NFC) to [MemCpyOpt] precommit test for memcpy removal for immutable arguments from attributes (NFC).

fix

  • wrong tests
  • bigger alignment assumption
  • comments' initial cases
khei4 updated this revision to Diff 528079.Jun 3 2023, 1:29 AM
This comment was removed by khei4.
khei4 updated this revision to Diff 528081.Jun 3 2023, 1:31 AM

revert wrong diff update

khei4 updated this revision to Diff 528088.Jun 3 2023, 3:22 AM

add cases

  • VLA alloca
  • non-alloca destination
khei4 updated this revision to Diff 528285.Jun 4 2023, 10:23 PM

remove smaller memcpy length case temporary

khei4 added a comment.Jun 4 2023, 10:24 PM

I think if the dest is unescaped alloca, allocasize and memcpy length corresponding is unnecessary, because

  • If alloca length < memcpy length, memcpy is UB so we could transform it anyway?
  • If memcpy length < alloca length, if the alloca is unescaped transformation will widen the dereferenceability for param, but dereferenceable(bigger) subsumes dereferenceable(smaller) so, some UB will non-UB, but inverse doesn't happens. This will be valid?

example: https://llvm.godbolt.org/z/Pd383eo46

But I feel like it’s a little bit too detailed to consider before merging the general case ;)

nikic added a comment.Jun 4 2023, 11:56 PM

I think if the dest is unescaped alloca, allocasize and memcpy length corresponding is unnecessary, because

  • If alloca length < memcpy length, memcpy is UB so we could transform it anyway?

This should be safe (though not very useful...)

  • If memcpy length < alloca length, if the alloca is unescaped transformation will widen the dereferenceability for param, but dereferenceable(bigger) subsumes dereferenceable(smaller) so, some UB will non-UB, but inverse doesn't happens. This will be valid?

example: https://llvm.godbolt.org/z/Pd383eo46

In this case, the original pointer (%val1) is dereferenceable to 4 bytes, but the new one (%val) is only known dereferenceable to 1 byte, so we are replacing dereferencable(bigger) with dereferenceable(smaller), no? That wouldn't be valid.

khei4 added a comment.Jun 5 2023, 3:50 AM

In this case, the original pointer (%val1) is dereferenceable to 4 bytes, but the new one (%val) is only known dereferenceable to 1 byte, so we are replacing dereferencable(bigger) with dereferenceable(smaller), no? That wouldn't be valid.

Yeah. As you said not only memcpy length and alloca size, but the new pointer's dereferenceability matters. Yeah, if %val1's dereferenceablity is shown in someway(although I'm not sure alloca can be ensured dereferenceable without initializer), that'll be a problem! Thanks! But anyway currently these case seem a little far from a practical case!

khei4 updated this revision to Diff 529167.Jun 6 2023, 10:58 PM

add

  • smaller alignment failure case
  • address space failure case
  • scalable vector failure case
khei4 updated this revision to Diff 529240.Jun 7 2023, 3:57 AM

add cases

  • dst modified
  • enforced alignment

remove obsolete comment
replace unnecessary homogeneous scalable aggregate

khei4 updated this revision to Diff 529278.Jun 7 2023, 6:24 AM

update alignment enforcing test

nikic added a comment.Jun 8 2023, 12:12 AM

I think this is missing tests where some of the attributes on the argument are missing: E.g. only readonly nocapture but not noalias or only readonly noalias but not nocapture.

khei4 updated this revision to Diff 529534.Jun 8 2023, 1:49 AM

add

  • may alias failure
  • may capture failure
  • may write failure

Although on may alias case might be conservative because current implementation checks unescaped alloca only.

nikic accepted this revision.Jun 8 2023, 5:19 AM

LGTM

This revision is now accepted and ready to land.Jun 8 2023, 5:19 AM