Page MenuHomePhabricator

[MemCpyOpt] Correctly merge alias scopes during call slot optimization
Needs ReviewPublic

Authored by modimo on Mon, Nov 16, 3:44 PM.

Details

Summary

When MemCpyOpt performs call slot optimization it will concatenate the alias.scope metadata between the function call and the memcpy. However, scoped AA relies on the domains in metadata to be maintained in a caller-callee relationship. Naive concatenation breaks this assumption leading to bad AA results.

The fix is to take the intersection of domains then union the scopes within those domains.

The original bug came from a case of rust bad codegen which uses this bad aliasing to perform additional memcpy optimizations. As show in the added test case %src got forwarded past its lifetime leading to a dereference of garbage data.

Testing
ninja check-llvm

Diff Detail

Event Timeline

modimo created this revision.Mon, Nov 16, 3:44 PM
modimo requested review of this revision.Mon, Nov 16, 3:44 PM
modimo edited the summary of this revision. (Show Details)
modimo added a subscriber: test.
modimo updated this revision to Diff 305616.Mon, Nov 16, 3:46 PM

Add more comments in test case

modimo edited the summary of this revision. (Show Details)Mon, Nov 16, 3:46 PM

I don't get this fix. Adding extra scopes to alias.scope should always be conservatively correct. I think there may be some confusion here due to the malformed noalias metadata you're using.

llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll
35

This alias.scope metadata looks corrupt. alias.scope accepts a list of scopes. I don't see any scopes or domains here.

hoy added a subscriber: hoy.Tue, Nov 17, 4:02 PM

Adding extra scopes to alias.scope should always be conservatively correct

That is also my understanding, but per https://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata, in particular this statement:

"When evaluating an aliasing query, if for some domain, the set of scopes with that domain in one instruction’s alias.scope list is a subset of (or equal to) the set of scopes for that domain in another instruction’s noalias list, then the two memory accesses are assumed not to alias.

, if the extra scopes being added meet the condition above, a no-alias answer could be given. This doesn't sound conservative to me. Please correct me if I miss anything.

In D91576#2401162, @hoy wrote:

Adding extra scopes to alias.scope should always be conservatively correct

That is also my understanding, but per https://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata, in particular this statement:

"When evaluating an aliasing query, if for some domain, the set of scopes with that domain in one instruction’s alias.scope list is a subset of (or equal to) the set of scopes for that domain in another instruction’s noalias list, then the two memory accesses are assumed not to alias.

, if the extra scopes being added meet the condition above, a no-alias answer could be given. This doesn't sound conservative to me. Please correct me if I miss anything.

Note the "is a subset" requirement. If you add more scopes to the list, then the list is a subset of strictly less noalias scope lists. {!0} is a subset of {!0}, but {!0, !1} is not a subset of {!0}.

In the description, you say:

This causes future optimizations to get incorrect AA results leading to bad code.

Do you have an example/bugreport that shows this ? Based on this testcase, I cannot conclude that what is produced today is wrong. The only thing that might happen is that you get more 'aliasing', as the !alias.scopes have been merged.

hoy added a comment.Wed, Nov 18, 9:25 AM
In D91576#2401162, @hoy wrote:

Adding extra scopes to alias.scope should always be conservatively correct

That is also my understanding, but per https://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata, in particular this statement:

"When evaluating an aliasing query, if for some domain, the set of scopes with that domain in one instruction’s alias.scope list is a subset of (or equal to) the set of scopes for that domain in another instruction’s noalias list, then the two memory accesses are assumed not to alias.

, if the extra scopes being added meet the condition above, a no-alias answer could be given. This doesn't sound conservative to me. Please correct me if I miss anything.

Note the "is a subset" requirement. If you add more scopes to the list, then the list is a subset of strictly less noalias scope lists. {!0} is a subset of {!0}, but {!0, !1} is not a subset of {!0}.

If !0 and !1 don't share the same domain, alias.scope {!0, !1} and noalias {!0} can still be considered non-aliased, since the statement says "if for some domain, ..." ?

BTW, what does a domain stand? Does it represent a set of abstract memory locations? Thanks.

modimo added a comment.EditedWed, Nov 18, 2:44 PM

I've looked more into what alias.scopes are and I think I've got a better handle on what's actually wrong. In the meantime let me try to describe the original problem better. The first section is checking my understanding and making sure we're on the same page. Skip to the second part for the bug details.

My understanding of domains

Let's say we have the following (I slightly modified test/Transforms/Inline/noalias.ll for this):

define void @hello(float* noalias nocapture %a, float* nocapture readonly %c) #0 {
entry:
  %0 = load float, float* %c, align 4
  %arrayidx = getelementptr inbounds float, float* %a, i64 5
  store float %0, float* %arrayidx, align 4
  ret void
}

define void @hello2(float* noalias nocapture %a, float* nocapture readonly %c) #0 {
entry:
  %0 = load float, float* %c, align 4
  %arrayidx = getelementptr inbounds float, float* %a, i64 5
  store float %0, float* %arrayidx, align 4
  ret void
}

define void @foo(float* nocapture %a, float* nocapture readonly %c) #0 {
entry:
  tail call void @hello(float* %a, float* %c)
  tail call void @hello2(float* %a, float* %c)
  %0 = load float, float* %c, align 4
  %arrayidx = getelementptr inbounds float, float* %a, i64 7
  store float %0, float* %arrayidx, align 4
  ret void
}

After -inline -enable-noalias-to-md-conversion

; Function Attrs: nounwind uwtable
define void @foo(float* nocapture %a, float* nocapture readonly %c) #0 {
entry:
  %0 = load float, float* %c, align 4, !noalias !0
  %arrayidx.i = getelementptr inbounds float, float* %a, i64 5
  store float %0, float* %arrayidx.i, align 4, !alias.scope !0
  %1 = load float, float* %c, align 4, !noalias !3
  %arrayidx.i1 = getelementptr inbounds float, float* %a, i64 5
  store float %1, float* %arrayidx.i1, align 4, !alias.scope !3
  %2 = load float, float* %c, align 4
  %arrayidx = getelementptr inbounds float, float* %a, i64 7
  store float %2, float* %arrayidx, align 4
  ret void
}

attributes #0 = { nounwind uwtable }

!0 = !{!1}
!1 = distinct !{!1, !2, !"hello: %a"}
!2 = distinct !{!2, !"hello"}
!3 = !{!4}
!4 = distinct !{!4, !5, !"hello2: %a"}
!5 = distinct !{!5, !"hello2"}

Looking at the code in AddAliasScopeMetadata I believe what happens is that:

  1. We create a new domain (!2 and !5) for each inline site. Doesn't matter if the callee is the same (say if I called hello twice) two still get created
  2. Scopes are created around arguments that are marked noalias, so in this case !4 and !1
  3. Alias sets are the last to be created, these are what's tagged on the actual instructions themselves (!0 and !3)

When querying AA results by adding -basic-aa -scoped-noalias-aa -aa-eval -evaluate-aa-metadata -print-all-alias-modref-info to opt on this pair:

MayAlias:   %0 = load float, float* %c, align 4, !noalias !0 <->   store float %1, float* %arrayidx.i1, align 4, !alias.scope !3

The following steps are taken:

  1. Find the underlying domains left to right behind !noalias !0, in this case !2 but it could be more
  2. Find all scopes in in these domains that belong to !0 (noalias_set) and !3 (aliasscope_set)
    1. In this case, noalias_set={!1} and aliasscope_set={}
  3. Since aliasscope_set is empty here that's a bail condition and we conclude MayAlias

In a more interesting case such as:

NoAlias:   %0 = load float, float* %c, align 4, !noalias !0 <->   store float %0, float* %arrayidx.i, align 4, !alias.scope !0

Here noalias_set={!1} and aliasscope_set={!1} and we check to see if noalias_set is a superset of aliasscope_set. Since it is, we can concluded this is noalias

In the step (2) above though this is isolated to a single domain at a time. When for any single domain the superset relationship holds, then we immediately conclude noalias.
I think this is because domains are nested based from inlining. If I have domains {!1, !2} in my metadata, that indicates the I'm currently in callee2 of this call chain caller->callee1->callee2. Thus, as long as this property holds the first domain that matches with a superset is enough to conclude noalias.

How the bug I'm looking at manifests

In memcpyopt we can fold 2 memcpys into a single one. This causes the alias.scope metadata to be merged in a union. Suppose I'm merging the following 2 instances (I'm reusing the metadata from the above examples):

define i8 @test(i8 %input) {
  %tmp = alloca i8
  %dst = alloca i8
  %src = alloca i8
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %src), !noalias !3
  store i8 %input, i8* %src
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %tmp, i8* align 8 %src, i64 1, i1 false), !alias.scope !1
  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %src), !noalias !3
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %dst, i8* align 8 %tmp, i64 1, i1 false), !alias.scope !3
  %ret_value = load i8, i8* %dst
  ret i8 %ret_value
}

declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture)
declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture)
declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)

!0 = !{!1}
!1 = distinct !{!1, !2, !"hello: %a"}
!2 = distinct !{!2, !"hello"}
!3 = !{!4}
!4 = distinct !{!4, !5, !"hello2: %a"}
!5 = distinct !{!5, !"hello2"}

The merged instance then will look like (-memcpyopt -enable-noalias-to-md-conversion -basic-aa -scoped-noalias-aa -aa-eval -evaluate-aa-metadata -print-all-alias-modref-info -S)

define i8 @test(i8 %input) {
  %tmp = alloca i8, align 1
  %dst = alloca i8, align 1
  %src = alloca i8, align 1
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %src), !noalias !0
  store i8 %input, i8* %src, align 1
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %dst, i8* align 8 %src, i64 1, i1 false), !alias.scope !3
  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %src), !noalias !0
  %ret_value = load i8, i8* %dst, align 1
  ret i8 %ret_value
}
...
!0 = !{!1}
!1 = distinct !{!1, !2, !"hello2: %a"}
!2 = distinct !{!2, !"hello2"}
!3 = !{!1, !4, !5, !"hello: %a"}
!4 = distinct !{!4, !5, !"hello: %a"}
!5 = distinct !{!5, !"hello"}

!3 now represents the union of the previous scopes. Now, if I ask AA what the relationship between:

NoModRef:   call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %dst, i8* align 8 %src, i64 1, i1 false), !alias.scope !3 <->   call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %src), !noalias !0

I get a result which seems incorrect. lifetime.end definitely does ref %src from the previous memcpy. When querying scoped-AA we look at the first domain inside !noalias !0, which is !2 and its scopes which is {!1}. The scopes in that domain for !alias.scope !3 is however also only {!1} as well since the added metadata is in a separate domain. Thus because {!1} is a superset of {!1} scoped-AA concludes its nomodref.

I think the underlying problem is that scoped-AA relies on any appended domains to reside in a caller-callee dependence. If that holds, in this example you wouldn't care that you have another domain in !alias.scope !3 since it would be a callee within a scope that we know doesn't alias. However, because memcpyopt is not inlining and doesn't preserve this rule the merged metadata doesn't express the new dependence correctly and we lose aliasing information.

Hi Modi,

Following line in your input example is wrong and explains why the resulting alias info is corrupt:

call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %tmp, i8* align 8 %src, i64 1, i1 false), !alias.scope !1

It should be !alias.scope !0

Given that, it does not change the result of the alias analysis which indeed seems to be wrong. It seems the 'union' should only be allowed if the domains match (and maybe not just for this case ?)

I would propose as solution to not 'merge' the !alias.scope in this case, but to:

  • keep it if it is identical on both memcpy
  • throw it away if it is not identical

Note to self: check why this should not be a problem with the full restrict patches (which does not use !alias.scope).

Hi Modi,

Following line in your input example is wrong and explains why the resulting alias info is corrupt:

call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %tmp, i8* align 8 %src, i64 1, i1 false), !alias.scope !1

It should be !alias.scope !0

Thanks for verifying! Good catch, I'll update the patch with this fixed up example.

Given that, it does not change the result of the alias analysis which indeed seems to be wrong. It seems the 'union' should only be allowed if the domains match (and maybe not just for this case ?)

Given how the problem manifests I agree that this is general to any merging of alias.scope. This is a pretty complicated area and I think I've drilled through to the ground truth.

I would propose as solution to not 'merge' the !alias.scope in this case, but to:

  • keep it if it is identical on both memcpy
  • throw it away if it is not identical

That's a reasonable approach as well. I mentioned this in the summary as "We could also strip away or 'intersect' alias.scope on the final merge to still maintain correctness". I found in my specific case final codegen was not impacted by hindering this optimization and the trade-off would be less powerful AA for later phases.

modimo updated this revision to Diff 306275.Wed, Nov 18, 6:11 PM

Update unit test with correct set of metadata

modimo edited the summary of this revision. (Show Details)Wed, Nov 18, 6:16 PM
modimo edited the summary of this revision. (Show Details)Wed, Nov 18, 6:19 PM
modimo added inline comments.Wed, Nov 18, 6:23 PM
llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll
35

Yeah the metadata in my first example was wrong. I've dug through how scopes and domains work and the current example should be representative of legal metadata.

I would not block the optimization, but just get the !alias.scope metadata correct

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
988 ↗(On Diff #306275)

Omit MD_alias_scope, as the 'default merge' mechanism is definitely invalid for this case.
Replace it (after the combineMetadata) with something like:

if (C->getMetadata(LLVMContext::MD_alias_scope) != cpyLoad->getMetadata(LLVMContext::MD_alias_scope)

C->setMetadata(LLVMContext::MD_alias_scope, nullptr);

(and add a decent comment ;) )

modimo updated this revision to Diff 306547.Thu, Nov 19, 3:21 PM

Changing to conservative merge of metadata rather than bailing on optimization. Updated test case due to that.

modimo retitled this revision from [MemCpyOpt] Bail on call slot optimization if it merges alias scopes to [MemCpyOpt] Correctly merge alias scopes during call slot optimization.Thu, Nov 19, 3:23 PM
modimo edited the summary of this revision. (Show Details)

@jeroen.dobbelaere I think the correct merging in all cases requires this strategy (or one which checks that all the domains match). If true (@hfinkel?) that'll be a larger change that needs more testing.

smith added a subscriber: smith.Fri, Nov 20, 3:28 AM
nikic added a comment.Mon, Nov 23, 4:12 AM

Unless there's something that makes this issue specific to the call slot optimization, wouldn't it be better to adjust the logic in MDNode::getMostGenericAliasScope() to ensure that any code performing alias scope merging is covered?

Unless there's something that makes this issue specific to the call slot optimization, wouldn't it be better to adjust the logic in MDNode::getMostGenericAliasScope() to ensure that any code performing alias scope merging is covered?

Yes, after thinking some more about it, that is probably the best solution. This also corresponds to what the Full Restrict patches are doing when merging ptr_provenance operands. The merging/handling of different domains should only be done during inlining and that seems to be done explicitly in InlineFunction.cpp. It is for this use case that the langref explains the behavior of alias.scope/noalias wrt handling of aliasing.

llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll
21

It would be nice if we could check that the !alias.scope really has been omitted.

modimo updated this revision to Diff 308463.Mon, Nov 30, 1:01 PM
modimo edited the summary of this revision. (Show Details)

Move the fix to getMostGenericAliasScope. Renamed metadata in test case.

modimo added inline comments.Mon, Nov 30, 1:03 PM
llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll
21

added --match-full-lines in FileCheck to do just that :D

About !alias.scope and !noalias (also see https://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata):

  • 'Scopes in one domain don't affect scopes in other domains' -> this is used during inlining
  • The 'alias.scope' indicates for an instruction A will not alias with another instruction B IF, for a domain, the alias.scopes for A belonging to that domain are a subset of the 'noalias' scopes of instruction B.

Because of that: when merging metadata at the same level (merging two store instructions for example),
you can only keep the domains that are in both instructions valid. For those, you must take the union of the scopes.

Examples: (vXdY : variable X in Domain Y)

a) same domain(s):

store1: alias.scope v1d1
store2: alias.scope v2d1
=>
store_merged: alias.scope  v1d1, v2d2

b) Disjunct domains

store1 : alias.scope  v1d1
store2 : alias.scope  v2d2
=>
store_merged:  NO alias.scope

c) partially overlapping (subset):

store1: alias.scope v1d1
store2: alias.scope v2d1, v3d2
=>
store_merged: alias.scope v1d1, v2d1

d) partially overlapping:

store1: alias.scope v1d1, v2d2
store2: alias.scope v3d2, v4d3
=>
store_merged: alias.scope v2d2, v3d2
NOTE: why is concatenate not valid ?

Take example (b). Suppose you have a:

store3 : noalias  v1d1

now, store1 and store3 do not alias. store2 and store3 might alias.
The merger store_merged might alias with store3. So, it is not valid to just take the union of the scopes of store1 and store2.

llvm/lib/IR/Metadata.cpp
948

Two remarks:http://www.cplusplus.com/reference/algorithm/includes/ )

  1. SmallPtrSet is not sorted, so it does not fulfill the preconditions for std::includes (See http://www.cplusplus.com/reference/algorithm/includes/ )
  2. imho, this is still not a valid merging for !alias.scope (See external comment)

You probably want to do:

if (ADomains != BDomains)
  return nullptr;

or take a real intersection of the domains, and then the union of the scopes belonging to the intersection of the domains.

tislam added a subscriber: tislam.Tue, Dec 1, 11:19 AM
tislam added inline comments.
llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll
12

Regarding the alias.scope metadata !3 = !{!1, !4, !5, !"callee0: %a"}. Should this be !3 = !{!1, !4}? An alias.scope metadata should be a list of scope metadata and should not include any domain metadata (!5, for example).

modimo updated this revision to Diff 308796.Tue, Dec 1, 3:21 PM
modimo edited the summary of this revision. (Show Details)

Go with intersection of domains then union the scopes within those domains as discussed. Updating tests to match latest behavior.

modimo added inline comments.Tue, Dec 1, 3:27 PM
llvm/lib/IR/Metadata.cpp
948

That's good to know about SmallPtrSet, I'll keep that in mind for the future. For (2) your approach is the correct one and I've updated the patch to match.

llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll
12

Yes good catch. I regenerated the output with trunk which matches your statement. I suspect the top half and bottom half got out of sync at some point in the revisions.

kkwli0 added a subscriber: kkwli0.Tue, Dec 1, 6:27 PM