This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Avoid infinite loop in processMemSetMemCpyDependence (PR54983)
ClosedPublic

Authored by nikic on Apr 20 2022, 2:19 AM.

Details

Summary

This adds an additional transform to drop zero-size memcpys, also in the case where the size is only zero after instruction simplification. the motivation is the case from PR54983 where the size is non-trivially zero, and processMemSetMemCpyDependence() keeps trying to reduce the memset size by zero bytes.

I'm a bit unsure on this patch -- it's not really principled. It only works on the premise that if InstSimplify doesn't realize the size is zero, then AA also won't.

The principled approach would be to instead add a isKnownNonZero() guard to the processMemSetMemCpyDependence() transform, but I suspect that would render that optimization mostly useless (at least it breaks all the existing test coverage -- worth noting that the constant size case is also handled by DSE, so I think this transform is primarily about the dynamic size case). Though I can't say I particularly care about this transform either, so I'd be happy to go with that direction instead.

Fixes https://github.com/llvm/llvm-project/issues/54983.
Fixes https://github.com/llvm/llvm-project/issues/64886.

Diff Detail

Event Timeline

nikic created this revision.Apr 20 2022, 2:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2022, 2:19 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
nikic requested review of this revision.Apr 20 2022, 2:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2022, 2:19 AM
nikic updated this revision to Diff 552253.Aug 22 2023, 1:07 AM
nikic edited the summary of this revision. (Show Details)

Rebase and handle undef/poison.

fhahn added inline comments.Aug 22 2023, 2:26 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1300

I think the underlying issue here is that the builder folds this GEP to Dest is SrcSize is poison. This in turn leads to the new MemSet to must-alias the original memcpy, causing the cycle IIUC. Could we break the cycle here by bailing out if the pointer folds to Dest as more general solution and leave the cleanup of the cases where the size simplifies to 0 to instcombine?

llvm/test/Transforms/MemCpyOpt/memcpy-zero-size.ll
35

might be good to use pointer arguments passed to the function here, instead of null/inttoptr so the poison is the only issue here

nikic updated this revision to Diff 552320.Aug 22 2023, 5:24 AM

Avoid copy from null in test.

nikic added inline comments.Aug 22 2023, 5:25 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1300

In this case, yes:

%1 = icmp ule i64 %len, poison
%2 = sub i64 %len, poison
%3 = select i1 %1, i64 0, i64 %2
%4 = icmp ule i64 %3, poison
%5 = sub i64 %3, poison
%6 = select i1 %4, i64 0, i64 %5
call void @llvm.memset.p0.i64(ptr align 1 inttoptr (i64 -1 to ptr), i8 0, i64 %6, i1 false)
call void @llvm.memcpy.p0.p0.i64(ptr inttoptr (i64 -1 to ptr), ptr null, i64 poison, i1 false)
ret void

But not in the other case:

%size = shl i64 0, 0
%1 = icmp ule i64 1, %size
%2 = sub i64 1, %size
%3 = select i1 %1, i64 0, i64 %2
%4 = getelementptr i8, ptr %p, i64 %size
%5 = icmp ule i64 %3, %size
%6 = sub i64 %3, %size
%7 = select i1 %5, i64 0, i64 %6
%8 = getelementptr i8, ptr %p, i64 %size
call void @llvm.memset.p0.i64(ptr align 1 %8, i8 0, i64 %7, i1 false)
call void @llvm.memcpy.p0.p0.i64(ptr %p, ptr %p2, i64 %size, i1 false)
ret void

Here the pointers aren't directly equal, but AA can determine they're equal.

So just checking that we get back the old Dest wouldn't fully avoid the issue.

llvm/test/Transforms/MemCpyOpt/memcpy-zero-size.ll
35

I've replaced the null, but the inttoptr is important for the IRBuilder constant folding.

davide accepted this revision.Aug 28 2023, 5:44 PM
davide added a subscriber: davide.

This sounds reasonable to me. Thanks, assuming @fhahn has no other comments.

This revision is now accepted and ready to land.Aug 28 2023, 5:44 PM

Proving the size is non-zero is a higher bar than proving the zero case and eliminating the memcpy in those cases. So this seems reasonable to me.

In cases where zero length cannot be proven by AA/InstSimplify in processMemCpy, is it still possible to end up in an infinite loop in processMemSetMemCpyDependence?