This is an archive of the discontinued LLVM Phabricator instance.

[BasicAA] Add support for memmove intrinsic
ClosedPublic

Authored by ebrevnov on Jan 12 2022, 1:34 AM.

Details

Summary

Currently, BasicAA has special support for the llvm.memcpy.* intrinsics. This change extends this support to any memory transfer operation, in particular the llvm.memmove.* intrinsics.

Diff Detail

Event Timeline

ebrevnov created this revision.Jan 12 2022, 1:34 AM
ebrevnov requested review of this revision.Jan 12 2022, 1:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2022, 1:34 AM
xbolva00 added inline comments.
llvm/lib/Analysis/BasicAliasAnalysis.cpp
1012–1013

Update comment

nikic requested changes to this revision.Jan 12 2022, 1:46 AM

Pretty sure this is dead code and the generic code already takes care of it. Your tests also pass for me without this patch.

This revision now requires changes to proceed.Jan 12 2022, 1:46 AM
fhahn added inline comments.Jan 12 2022, 1:49 AM
llvm/lib/Analysis/BasicAliasAnalysis.cpp
1013–1019

Is this correct?

LangRef says the following for memmove. Note that the inputs may overlap, which is different from the requirement stated in the comment.

The ‘llvm.memmove.*’ intrinsics copy a block of memory from the source location to the destination location, which may overlap. It copies “len” bytes of memory over. If the argument is known to be aligned to some boundary, this can be specified as an attribute on the argument.

If <len> is 0, it is no-op modulo the behavior of attributes attached to the arguments. If <len> is not a well-defined value, the behavior is undefined. If <len> is not zero, both <dest> and <src> should be well-defined, otherwise the behavior is undefined.

https://llvm.org/docs/LangRef.html#id505
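As an aside, the overlap guarantee quoted above is the key semantic difference between the two intrinsics. A minimal sketch (the `shift_left` helper is hypothetical, not from the patch) of why overlapping ranges require memmove:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Shift a buffer's contents left by one byte. The source and destination
// ranges overlap, so memmove is required; calling memcpy on overlapping
// ranges would be undefined behavior.
void shift_left(char *buf, std::size_t len) {
    if (len > 1)
        std::memmove(buf, buf + 1, len - 1);
}
```

For example, applying `shift_left` to `"abcd"` (length 4) leaves the buffer holding `"bcdd"`: the last byte is unchanged because only `len - 1` bytes are moved.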

xbolva00 added inline comments.Jan 12 2022, 1:50 AM
llvm/lib/Analysis/BasicAliasAnalysis.cpp
1003

There is better support for inaccessiblememonly now.

Pretty sure this is dead code and the generic code already takes care of it. Your tests also pass for me without this patch.

For some reason the generic code didn't handle my original test case. OK, let me check why... probably we should make the generic code more generic :-)

The problem turned out to be in the check "AAQI.CI->isNotCapturedBeforeOrAt(Object, Call)" at BasicAliasAnalysis.cpp:935. AFAIU, if the called function is marked with "inaccessiblemem_or_argmemonly" (or stronger) then we may simply ignore this check. Do you think this is a move in the right direction?
I got the expected result if I change the mentioned line to:

(AAQI.CI->isNotCapturedBeforeOrAt(Object, Call) || AAResults::onlyAccessesInaccessibleOrArgMem(getModRefBehavior(Call)))

Er, this is definitely unsound in general. It's plausibly correct for the memtransfer routines, since they don't load any pointers from existing state, but not for arbitrary calls.

Looking at it, I think your actual problem is that we're missing generic handling for the argmemonly + readonly/writeonly param case. The existing memcpy case should be subsumable by that, and so should the memmove case.

Having said that, generalizing to memmove first, and then generic seems entirely reasonable to me. (i.e. I have no problem with a cleaned up version of this patch landing.)

llvm/lib/Analysis/BasicAliasAnalysis.cpp
1013–1019

Well, the comment above is definitely wrong after allowing memmove, but I don't see that the guarded code depends on the comment at all. We could just remove the comment.

nikic added a comment.Jan 12 2022, 8:10 AM

@reames The generic argmemonly + readonly/writeonly case is handled in https://github.com/llvm/llvm-project/blob/6db04b97e6a2873cbf1252b7b6b7efedf3fceb4e/llvm/lib/Analysis/AliasAnalysis.cpp#L248 I believe.

I'd like to see a test case that shows the issue, because I'm not sure I understand which case we're currently failing to handle.

@nikic Thanks for the pointer. I agree. This looks like it should be fully handled already.

@ebrevnov You really need a test case to motivate this. Maybe file a bug and share some analysis on how you got here?

Thank you all for your input. The reason the generic argmemonly case (pointed out by @nikic) doesn't handle my case is the presence of a deopt bundle on the call. Currently, "BasicAAResult::getModRefBehavior(const CallBase *Call)" simply ignores function attributes in the presence of deopt bundles. By specification: "From the compiler’s perspective, deoptimization operand bundles make the call sites they’re attached to at least readonly. They read through all of their pointer typed operands (even if they’re not otherwise escaped) and the entire visible heap". That has two consequences:

  1. The current code which special-cases AnyMemCpyInst (the one I was trying to extend to any memory transfer) is incorrect because it may return NoModRef if the memcpy intrinsic has operand bundles. Should it be fixed or just removed (it can be removed only once the next item is resolved, though)?
  2. It's impossible to describe memcpy/memmove modref behavior with the existing encoding defined by FunctionModRefBehavior. In particular, for memmove/memcpy with a deopt bundle we need to encode "reads everything, writes inaccessible or argmem only". The current scheme doesn't support setting different modref for individual types of memory. The best solution I've come up with so far is to split the available space (currently 32 bits, given by "enum FunctionModRefBehavior") into 4 regions (4 regions is enough to represent the 8 variants defined by "enum class ModRefInfo") and associate each region with a specific modref value from the ModRefInfo enum. With this approach we can represent any variation for 8 independent memory types in 32 bits, which seems to be enough (assuming modref info won't change). In case that turns out to be insufficient, we can easily double the number of supported attributes by switching to 64 bits.

To make it clearer, I can give an example. Say we need to encode "reads everything, writes inaccessible or argmem only". Assume we split our 32 bits into regions in the following way: "MustMod | MustRef | Mod | Ref" (each region is 4 bits). Then the value would be "FMRL_Nowhere | FMRL_Nowhere | FMRL_ArgumentPointees_OR_FMRL_InaccessibleMem | FMRL_Anywhere".
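A minimal, self-contained sketch of this packing idea (illustrative only: the FMRL_* names follow the proposal above rather than the actual LLVM definitions, and 8-bit regions are used instead of 4 for simplicity):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical location bitmask, one bit per memory location kind.
enum Loc : std::uint8_t {
  FMRL_Nowhere = 0,
  FMRL_ArgumentPointees = 1 << 0,
  FMRL_InaccessibleMem = 1 << 1,
  FMRL_Anywhere = 0xFF,
};

// Pack four per-access-kind location masks into one 32-bit value,
// laid out as "MustMod | MustRef | Mod | Ref" from high to low bits.
constexpr std::uint32_t encode(std::uint8_t MustMod, std::uint8_t MustRef,
                               std::uint8_t Mod, std::uint8_t Ref) {
  return (std::uint32_t(MustMod) << 24) | (std::uint32_t(MustRef) << 16) |
         (std::uint32_t(Mod) << 8) | std::uint32_t(Ref);
}

constexpr std::uint8_t modLocs(std::uint32_t V) { return (V >> 8) & 0xFF; }
constexpr std::uint8_t refLocs(std::uint32_t V) { return V & 0xFF; }

// "reads everything, writes inaccessible or argmem only", i.e. the
// behavior described above for memmove/memcpy with a deopt bundle.
constexpr std::uint32_t MemTransferWithDeopt =
    encode(FMRL_Nowhere, FMRL_Nowhere,
           FMRL_ArgumentPointees | FMRL_InaccessibleMem, FMRL_Anywhere);
```

Decoding `MemTransferWithDeopt` then yields independent mod and ref location sets, which the current single-enum FunctionModRefBehavior cannot express.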
Do you think this is reasonable? Anything better?
Anyway, I assume it should be discussed on the dev list...

Thanks
Evgeniy

@ebrevnov Thanks for the explanation, that makes a lot more sense!

Your explanation raises a question for me. Have we ever defined the semantics of operand bundles on intrinsic calls? We've certainly done so for arbitrary calls, but intrinsics are somewhat "special". The whole point of them is that they have known semantics. This is purely an "I don't remember" question; the answer may be obvious. I'm just hoping for a reminder/pointer to something to spark my memory.

On the solution, I'm really hesitant to generalize the modref behavior for this case. That's a bunch of complexity for one narrow example. I think we need a couple more motivating examples before that becomes reasonable.

On the other hand, patching the memcpy special case to be correct with operand bundles, and then applying this patch so it applies to memmove too, would seem reasonable. Particularly since we can now include a nice comment about why this is not handled by more generic code elsewhere. :)

ebrevnov updated this revision to Diff 402468.Jan 24 2022, 4:01 AM
This comment was removed by ebrevnov.
ebrevnov updated this revision to Diff 402475.Jan 24 2022, 4:15 AM

Fix comment

@ebrevnov Thanks for the explanation, that makes a lot more sense!

Your explanation raises a question for me. Have we ever defined the semantics of operand bundles on intrinsic calls? We've certainly done so for arbitrary calls, but intrinsics are somewhat "special". The whole point of them is that they have known semantics. This is purely an "I don't remember" question; the answer may be obvious. I'm just hoping for a reminder/pointer to something to spark my memory.

I don't know either. As far as I can judge from common logic and the implementation of "CallBase::hasReadingOperandBundles()" (especially its comment), the same rules should apply to intrinsics.

On the solution, I'm really hesitant to generalize the modref behavior for this case. That's a bunch of complexity for one narrow example. I think we need a couple more motivating examples before that becomes reasonable.

That might be a pretty general problem for us, as virtually every call has a deopt bundle. I'm gathering statistics to better understand the scope. Will share once available.

On the other hand, patching the memcpy special case to be correct with operand bundles, and then applying this patch so it applies to memmove too, would seem reasonable. Particularly since we can now include a nice comment about why this is not handled by more generic code elsewhere. :)

Agreed; it's not clear how soon, if ever, we can support that in the general case. Fixed the comment and rebased on top of D118033.

nikic added a comment.Jan 24 2022, 5:41 AM

@ebrevnov For what it's worth, I think your suggestion to split FMRB into per-location-kind modref information makes sense. That seems like the right high-level modelling approach.

reames accepted this revision.Jan 25 2022, 7:34 AM

LGTM on this patch, assuming you update the commit message to reflect the reframing of this as a bugfix, and why.

If you want to explore the broader aliasing change, we can. It sounds like @nikic thinks this might be useful. I'm a bit skeptical, but we can explore that in its own review.

nikic accepted this revision.Jan 26 2022, 1:04 AM
This revision is now accepted and ready to land.Jan 26 2022, 1:04 AM
This revision was landed with ongoing or failed builds.Jan 28 2022, 3:19 AM
This revision was automatically updated to reflect the committed changes.