This is an archive of the discontinued LLVM Phabricator instance.

[LangRef] Add uniqueptr attribute.
Abandoned · Public

Authored by fhahn on Jul 29 2020, 8:51 AM.

Details

Summary

This patch adds a new uniqueptr attribute.

The attribute can be used to indicate that an argument is the only
pointer referencing the underlying object it is based on.

The intent of this attribute is to guarantee that no pointers to the
object have escaped before the function is called.

The motivating case is arguments passed indirectly on platforms that do
not use byval, like AArch64. On AArch64, objects passed by value are
allocated in the caller and then passed as pointers, but nothing in the
called function indicates that the argument is the only pointer to the
underlying object.
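A minimal C sketch of the lowering described above, assuming the AAPCS64 rule that composite types larger than 16 bytes are passed as a pointer to a copy made by the caller. The names `test_lowered` and `call_test_by_value` and the placeholder fields `p0`..`p2` are hypothetical. The point is only that the callee receives the sole pointer to a temporary it may modify, while the caller's original object stays untouched.

```c
#include <string.h>

/* Placeholder layout matching the IR's { i8*, i8*, i8*, i32 }. */
struct Foo {
    char *p0, *p1, *p2;
    int i;
};

/* Roughly what the callee of `int test(struct Foo x)` receives after
   lowering: a pointer to a caller-owned copy, not the caller's object. */
static int test_lowered(struct Foo *x) {
    x->i += 1;       /* the callee may modify its copy freely */
    return x->i;
}

/* Roughly what the caller does for `test(*f)`: allocate a temporary in its
   own frame, copy the argument into it, and pass the temporary's address. */
int call_test_by_value(const struct Foo *f) {
    struct Foo tmp;
    memcpy(&tmp, f, sizeof tmp);
    return test_lowered(&tmp);   /* &tmp is the only pointer to tmp */
}
```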

This currently prevents some optimizations. The example below is roughly the
IR generated by Clang for a function that takes a struct by value on AArch64.
We cannot remove the second load, because nothing guarantees that %x
did not escape before the function was called. The new attribute provides
exactly this guarantee, allowing us to remove the second load.

%struct.Foo = type { i8*, i8*, i8*, i32 }

define i32 @test(%struct.Foo* nocapture readonly %x) {
entry:
  %x1 = getelementptr inbounds %struct.Foo, %struct.Foo* %x, i64 0, i32 3
  %0 = load i32, i32* %x1, align 8
  ; If a pointer to *%x escaped before the call, @sideeffect may write to it.
  tail call void bitcast (void (...)* @sideeffect to void ()*)()
  ; Therefore this reload cannot be replaced by %0.
  %1 = load i32, i32* %x1, align 8
  %add = add nsw i32 %1, %0
  ret i32 %add
}

declare void @sideeffect(...)
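To see why the second load must stay without such a guarantee, here is a hedged C analogue of the IR above. `ESCAPED` is a hypothetical global standing in for any pointer to the object that escaped before the call; when it points at the argument, `sideeffect()` changes the field between the two loads, so folding the reload into the first load would give a wrong result.

```c
#include <stddef.h>

/* Placeholder layout matching the IR's { i8*, i8*, i8*, i32 }. */
struct Foo {
    char *p0, *p1, *p2;
    int i;
};

/* Hypothetical global through which a pointer to the object has escaped. */
struct Foo *ESCAPED = NULL;

void sideeffect(void) {
    if (ESCAPED)
        ESCAPED->i += 1;   /* writes the object through the escaped pointer */
}

/* Mirrors the IR: load x->i, call sideeffect(), load x->i again. */
int test(struct Foo *x) {
    int a = x->i;
    sideeffect();
    return x->i + a;       /* in general this reload cannot be replaced by `a` */
}
```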

In a way, the uniqueptr attribute should behave similarly to byval,
except that the object is allocated in the caller rather than on the
callee's stack.

I am a bit reluctant to add yet another attribute, but unfortunately there
seems to be no combination of attributes that achieves the desired
optimizations.

noalias is relatively close, but it is not suitable, because references
to the object can escape in the called function.

I am also not sure about the name of the attribute. Any suggestions for
a better name are more than welcome!

Event Timeline

fhahn created this revision. Jul 29 2020, 8:51 AM
Herald added a project: Restricted Project. Jul 29 2020, 8:51 AM
fhahn requested review of this revision. Jul 29 2020, 8:51 AM

I think this sounds suspiciously like byref.

llvm/include/llvm/IR/Attributes.td:160

Missing comment.

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:742

Missing bitcode and asm parser tests.

fhahn added a comment. Jul 29 2020, 9:22 AM

Sorry for not being up-front about this: some comments and tests are still missing, but I'd like to make sure this is going in the right direction first.

As discussed on IRC, I think noalias is actually what you want. Storing a pointer and loading it does *not* break the "based on" relationship (https://llvm.org/docs/LangRef.html#pointeraliasing). We might want to clarify the rules though.

@hfinkel @jeroen.dobbelaere ^ WDYT?

Why does noalias not provide the relevant guarantee? Even if the pointer had escaped, your call to @sideeffect would not be allowed to access the value through that escaped pointer (unless it were passed as an argument to @sideeffect) because that would violate the rule that all values accessed through a noalias pointer are accessed only through pointers based on the noalias pointer within the scope of the noalias pointer.

fhahn added a comment. Jul 29 2020, 9:46 AM

I think it boils down to the exact interpretation of noalias for the IR example below. Assume %0 were marked as noalias. If I understand your comment correctly, @sideeffect would not be allowed to modify the underlying object of %0 (this also matches my reading of the based-on rules, with "value" referring to an LLVM IR/SSA value).

%struct.Foo = type { i8*, i8*, i32 }

@PTR = dso_local local_unnamed_addr global %struct.Foo* null, align 8

define dso_local i32 @test(%struct.Foo* %0) local_unnamed_addr {
  %2 = getelementptr inbounds %struct.Foo, %struct.Foo* %0, i64 0, i32 2
  %3 = load i32, i32* %2, align 8
  store %struct.Foo* %0, %struct.Foo** @PTR, align 8
  tail call void bitcast (void (...)* @sideeffect to void ()*)()
  %4 = load i32, i32* %2, align 8
  %5 = add nsw i32 %4, %3
  ret i32 %5
}

declare dso_local void @sideeffect(...) local_unnamed_addr

If the above is true, then noalias would be too strict, because with the uniqueptr semantics, modifications via a pointer that escapes inside the function should be allowed. IIUC we need to allow such modifications, because references to passed-by-value objects can escape and be used to modify the object in callees, as in the code below, where @sideeffect would be allowed to modify x via PTR:

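struct Foo {
    char *p0, *p1;  /* placeholder fields matching the IR's { i8*, i8*, i32 } */
    int i;
};

void sideeffect(void);

struct Foo *PTR;

int test(struct Foo x) {
    int a = x.i;
    PTR = &x;       /* the address of the by-value parameter escapes */
    sideeffect();   /* may modify x through PTR */
    return x.i + a;
}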

No, that's okay. The "based on" relationship flows through memory as well. The C spec for restrict does a better job than the LangRef in formalizing that, but our semantics need to allow this too (otherwise our SSA conversion in SROA/mem2reg would not be sound for noalias pointers). In this case, PTR is based on &x, and accesses through it in sideeffect are allowed even if x is noalias. One of the things that makes handling noalias pointers more difficult than one might expect is, specifically, that we do need to check for cases where the restrict pointer itself escapes, because then any other pointer of unknown provenance might be based on it.
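A small C sketch of this point, using a `restrict`-qualified parameter as the C counterpart of `noalias` (an analogy; the placeholder fields `p0` and `p1` are hypothetical). It mirrors the IR above: the pointer is stored to `PTR`, and `sideeffect()` writes through it. Because the value of `PTR` was copied from `x`, it is based on `x`, so the program is well-defined under the C `restrict` rules and the reload after the call is required.

```c
/* Placeholder layout matching the IR's { i8*, i8*, i32 }. */
struct Foo {
    char *p0, *p1;
    int i;
};

struct Foo *PTR;

void sideeffect(void) {
    PTR->i += 1;   /* PTR holds a copy of x, so this access is based on x */
}

/* x is restrict-qualified. Storing x into PTR and accessing the object
   through PTR inside sideeffect() is still allowed, because "based on"
   survives the round trip through memory. */
int test(struct Foo *restrict x) {
    int a = x->i;
    PTR = x;
    sideeffect();
    return x->i + a;   /* must observe the increment, so x->i is reloaded */
}
```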

Agreed, thanks for confirming.

I guess I am curious how the restrict pointer itself can escape in such a way?

What do you mean? Just like in your example, it can be assigned to a global variable (or, it can be assigned to *p, where p is an argument).
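The out-parameter case can be sketched the same way. The function `escape_via_out` and the callback `bump` are hypothetical. After `*out = x`, any pointer later loaded from `*out` is based on `x`, so an alias analysis has to treat the callback's write as a possible modification of `*x` and reload it.

```c
/* x escapes through an out-parameter rather than a global. */
int escape_via_out(int *restrict x, int **out, void (*cb)(int **)) {
    int a = *x;
    *out = x;        /* after this store, *out holds a pointer based on x */
    cb(out);         /* cb may access the object through *out */
    return *x + a;   /* so *x cannot be assumed unchanged across the call */
}

/* Example callback that writes through the escaped pointer. */
void bump(int **slot) {
    **slot += 1;
}
```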

Oh right, I thought you meant there are cases where escaping is not allowed at the C level, but you meant that we need to be careful during alias analysis?

Exactly.

fhahn abandoned this revision. Jul 29 2020, 10:36 AM

OK, that makes sense. I thought we disallowed going through memory for based-on because it would make for simpler analysis, but that would indeed make mem2reg/SROA's handling of noalias invalid.

IIUC noalias should work fine for the intended purpose.
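A hedged illustration of why based-on has to flow through memory: in the hypothetical `via_memory`, the `restrict` pointer is stored into a local slot and loaded back, which is roughly what unoptimized IR does with locals before mem2reg/SROA promote them to SSA values. Under the C `restrict` rules `y` is still based on `x`, so the function must return 5, exactly like the promoted form in the equally hypothetical `promoted`. If loading the pointer back broke the based-on chain, the two forms would not mean the same thing.

```c
/* The restrict pointer round-trips through a local memory slot. */
int via_memory(int *restrict x) {
    int *slot;
    int **p = &slot;   /* taking the address keeps slot in memory */
    *p = x;            /* store the restrict pointer ... */
    int *y = *p;       /* ... and load it back: y is still based on x */
    *y = 5;
    return *x;         /* well-defined, returns 5 */
}

/* What promotion to SSA effectively yields: the store/load pair is gone
   and y is x directly. */
int promoted(int *restrict x) {
    int *y = x;
    *y = 5;
    return *x;
}
```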