This is an archive of the discontinued LLVM Phabricator instance.

Add ptrmask intrinsic
ClosedPublic

Authored by fhahn on Mar 6 2019, 5:52 PM.

Details

Summary

This patch adds a ptrmask intrinsic which allows masking out bits of a
pointer that must be zero when accessing it, because of ABI alignment
requirements or a restriction of the meaningful bits of a pointer
through the data layout.

This avoids doing a ptrtoint/inttoptr round trip in some cases (e.g. tagged
pointers) and allows us to not lose information about the underlying
object.

Diff Detail

Repository
rL LLVM

Event Timeline

fhahn created this revision.Mar 6 2019, 5:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2019, 5:52 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
arsenm added a subscriber: arsenm.Mar 14 2019, 8:07 AM
arsenm added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
3760 ↗(On Diff #189632)

I don't see why you would use the max size, when you can check the 2 address space sizes

llvm/test/Analysis/BasicAA/strip-nonsignificant-ptr-bits.ll
88 ↗(On Diff #189632)

Needs tests where the address spaces don't match and have different sizes

I'm really afraid that this isn't sound. We've had a number of issues in this space, and we've always resisted attempts to make BasicAA look through inttoptr/ptrtotint. Once you convert to an integer, control dependencies can effectively add additional underlying objects. In cases where this is sound, why not fold away the whole inttoptr(and(ptrtoint)) in the first place?

fhahn marked an inline comment as done.Mar 14 2019, 11:51 AM

I'm really afraid that this isn't sound. We've had a number of issues in this space, and we've always resisted attempts to make BasicAA look through inttoptr/ptrtotint. Once you convert to an integer, control dependencies can effectively add additional underlying objects. In cases where this is sound, why not fold away the whole inttoptr(and(ptrtoint)) in the first place?

Thanks. I agree it is a bit shaky as I think the LangRef is not too clear about the issue. I think it would make sense to specify that the unused bits have to be 0 for memory accesses (at least for MacOS/Linux on Arm64/X86), but maybe not for getelementptr & co. Given how BasicAA is used to look through getlementptrs, I am not entirely sure that can be guaranteed at the moment.

I am also not entirely sure how control dependencies could add new underlying objects with this patch. We are looking backwards for a specific pattern and the underlying object has to dominate the and & ptrtoints.

In the test case, it is not possible to fold to inttoptr(and(ptrtoint)) unfortunately, because the original pointer has some unused bits set, and they need to be cleared before memory accesses, otherwise an invalid address is accessed.

llvm/lib/Analysis/ValueTracking.cpp
3760 ↗(On Diff #189632)

Thanks, I'll update the code, if the underlying code is sound.

Ping. @hfinkel did my response make sense?

I could also split it in 2 parts: 1) strip lower bits due to alignment requirement and 2) strip high bits due to available bits in address space. Would that make it easier to reason about separately?

fhahn added a subscriber: atrick.Mar 29 2019, 11:11 AM

I am also not entirely sure how control dependencies could add new underlying objects with this patch

Please read this https://bugs.llvm.org/show_bug.cgi?id=34548

I *think* that this is okay if the ptrtoint and the and have only one user, and they're in the same basic block, and there's nothing that can throw, etc. in between?

llvm/lib/Analysis/ValueTracking.cpp
3751 ↗(On Diff #189632)

This code is essentially repeated from the code added to BasicAA. Please put this into a utility function or similar.

fhahn updated this revision to Diff 192931.Mar 29 2019, 3:19 PM

I am also not entirely sure how control dependencies could add new underlying objects with this patch

Please read this https://bugs.llvm.org/show_bug.cgi?id=34548

This was very helpful and interesting, thanks!

I *think* that this is okay if the ptrtoint and the and have only one user, and they're in the same basic block, and there's nothing that can throw, etc. in between?

Thanks, from the reading above, those restrictions should prevent applying the
analysis on ptrtoint/inttoptr pairs where multiple objects contribute to the result.

I've updated the patch to check the number of users and to not allow any instructions between ptrtoint,and,inttoptr for now. We can relax that later IMO.

fhahn marked an inline comment as done.Mar 29 2019, 3:19 PM

I am also not entirely sure how control dependencies could add new underlying objects with this patch

Please read this https://bugs.llvm.org/show_bug.cgi?id=34548

I *think* that this is okay if the ptrtoint and the and have only one user, and they're in the same basic block, and there's nothing that can throw, etc. in between?

I'm not sure about this. You could have:

int32* ptr0 = malloc(4);
int32* ptr1 = malloc(4);

if (ptr0+1 != ptr1) return;

int32* ptr = (int*)(int64)(ptr0+1);

in which ptr would alias ptr1. But if you transform ptr to ptr0+1 then it would not alias ptr1. That IR ^ could have resulted from:

int32* ptr0 = malloc(4);
int32* ptr1 = malloc(4);

if (ptr0+1 != ptr1) return;

int64 ptr0_i = (int64)(ptr0+1);
int64 ptr1_i = (int64)(ptr1);

int32* ptr = (int*)ptr1_i;
aqjune added a subscriber: aqjune.Mar 29 2019, 7:39 PM

I am also not entirely sure how control dependencies could add new underlying objects with this patch

Please read this https://bugs.llvm.org/show_bug.cgi?id=34548

I *think* that this is okay if the ptrtoint and the and have only one user, and they're in the same basic block, and there's nothing that can throw, etc. in between?

I'm not sure about this. You could have:

int32* ptr0 = malloc(4);
int32* ptr1 = malloc(4);

if (ptr0+1 != ptr1) return;

int32* ptr = (int*)(int64)(ptr0+1);

in which ptr would alias ptr1. But if you transform ptr to ptr0+1 then it would not alias ptr1. That IR ^ could have resulted from:

int32* ptr0 = malloc(4);
int32* ptr1 = malloc(4);

if (ptr0+1 != ptr1) return;

int64 ptr0_i = (int64)(ptr0+1);
int64 ptr1_i = (int64)(ptr1);

int32* ptr = (int*)ptr1_i;

I just saw this patch. I agree with Sanjoy's comment.
One possible workaround to allow this optimization is to check dereferenceability of the original pointer:

p1 = malloc()
v16 = ptrtoint p1 to i64
p2 = inttoptr v16 to i8*
store i8* p1, 10
store i8* p2, 20
=>
p1 = malloc()
v16 = ptrtoint p1 to i64
p2 = inttoptr v16 to i8*
store i8* p1, 10
store i8* p1, 20 // p2 replaced with p1

If (1) the original pointer p1 has been dereferenced before, and (2) there have been no operation that may have freed p1, we can assume that p2 must alias p1.
Informal reasoning is as follows. It should be guaranteed that replacing p2 with p1 does not introduce undefined behavior. If p1 already have been dereferenced before, replacing it does not introduce new undefined behavior.

Does this workaround work for this patch?

sanjoy requested changes to this revision.Mar 31 2019, 10:35 AM
This revision now requires changes to proceed.Mar 31 2019, 10:35 AM
fhahn added a comment.Mar 31 2019, 2:09 PM

Thanks for taking a look!

I am also not entirely sure how control dependencies could add new underlying objects with this patch

Please read this https://bugs.llvm.org/show_bug.cgi?id=34548

I *think* that this is okay if the ptrtoint and the and have only one user, and they're in the same basic block, and there's nothing that can throw, etc. in between?

I'm not sure about this. You could have:

int32* ptr0 = malloc(4);
int32* ptr1 = malloc(4);

if (ptr0+1 != ptr1) return;

int32* ptr = (int*)(int64)(ptr0+1);

in which ptr would alias ptr1. But if you transform ptr to ptr0+1 then it would not alias ptr1.

I am probably missing something, but I am not sure how such a case would be possible with this patch? It specifically looks for a inttoptr(and(ptrtoint, C)) sequence, where C is such that the (logical) destination of the pointer remains unchanged. Unfortunately I do not think the LangRef is clear about 'irrelevant' bits in pointers (due to alignment or address spaces)

I just saw this patch. I agree with Sanjoy's comment.
One possible workaround to allow this optimization is to check dereferenceability of the original pointer:

p1 = malloc()
v16 = ptrtoint p1 to i64
p2 = inttoptr v16 to i8*
store i8* p1, 10
store i8* p2, 20
=>
p1 = malloc()
v16 = ptrtoint p1 to i64
p2 = inttoptr v16 to i8*
store i8* p1, 10
store i8* p1, 20 // p2 replaced with p1

If (1) the original pointer p1 has been dereferenced before, and (2) there have been no operation that may have freed p1, we can assume that p2 must alias p1.
Informal reasoning is as follows. It should be guaranteed that replacing p2 with p1 does not introduce undefined behavior. If p1 already have been dereferenced before, replacing it does not introduce new undefined behavior.

Does this workaround work for this patch?

In the example, the original pointer (%addr = load %struct.zot*, %struct.zot** %loc, align 8) is not dereference directly and the use case I am looking at is tagged pointers, where the inttoptr(and(ptrtoint(), C) roundtrip is required to get a valid pointer. So the original pointer might not be dereferenceable directly, but logically (ignoring the bits irrelevant for the pointer value) it should still point to the same object. Does that make sense to you?

I'm not sure about this. You could have:

int32* ptr0 = malloc(4);
int32* ptr1 = malloc(4);

if (ptr0+1 != ptr1) return;

int32* ptr = (int*)(int64)(ptr0+1);

in which ptr would alias ptr1. But if you transform ptr to ptr0+1 then it would not alias ptr1.

I am probably missing something, but I am not sure how such a case would be possible with this patch? It specifically looks for a inttoptr(and(ptrtoint, C)) sequence, where C is such that the (logical) destination of the pointer remains unchanged. Unfortunately I do not think the LangRef is clear about 'irrelevant' bits in pointers (due to alignment or address spaces)

For instance:

// Let's say we know malloc(64) will always return a pointer that is 8 byte
// aligned.

int8* ptr0 = malloc(64);
int8* ptr1 = malloc(64);

int8* ptr0_end = ptr0 + 64;

// I'm not sure if this comparison is well defined in C++, but it is well
// defined in LLVM IR:
if (ptr0_end != ptr1) return;

intptr ptr0_end_i = (intptr)ptr0_end;

intptr ptr0_end_masked = ptr0_end_i & -8;

// I think the transform being added in this comment will fire below since it is
// doing inttoptr(and(ptrtoint(ptr0_end), -8)).

int8* aliases_ptr0_and_ptr1 = (int8*)ptr0_end_masked;

Right now aliases_ptr0_and_ptr1 aliases both ptr0 and ptr1 (we can GEP backwards from it to access ptr0 and forwards from it to access ptr1). But if we replace it with ptr0_end then it can be used to access ptr0 only.

I just saw this patch. I agree with Sanjoy's comment.
One possible workaround to allow this optimization is to check dereferenceability of the original pointer:

p1 = malloc()
v16 = ptrtoint p1 to i64
p2 = inttoptr v16 to i8*
store i8* p1, 10
store i8* p2, 20
=>
p1 = malloc()
v16 = ptrtoint p1 to i64
p2 = inttoptr v16 to i8*
store i8* p1, 10
store i8* p1, 20 // p2 replaced with p1

If (1) the original pointer p1 has been dereferenced before, and (2) there have been no operation that may have freed p1, we can assume that p2 must alias p1.
Informal reasoning is as follows. It should be guaranteed that replacing p2 with p1 does not introduce undefined behavior. If p1 already have been dereferenced before, replacing it does not introduce new undefined behavior.

Does this workaround work for this patch?

In the example, the original pointer (%addr = load %struct.zot*, %struct.zot** %loc, align 8) is not dereference directly and the use case I am looking at is tagged pointers, where the inttoptr(and(ptrtoint(), C) roundtrip is required to get a valid pointer. So the original pointer might not be dereferenceable directly, but logically (ignoring the bits irrelevant for the pointer value) it should still point to the same object. Does that make sense to you?

That seems problematic for another reason: IIUC you're saying Alias(inttoptr(ptrtoint(X) & -8), A) == Alias(X, A). But X is an illegal pointer so it does not alias anything (reads and writes on that pointer is illegal)?

@sanjoy I haven't tried to solve this problem myself, but it seems pretty important. It sounds to me like you're laying out an argument for introducing LLVM pointer masking intrinsics that would preserve some sort of inbound property. Is it fair to say that we probably can't fully optimize tagged pointers without using intrinsics to avoid this ptrtoint optimizer trap?

aqjune added a comment.Apr 1 2019, 2:36 AM

@fhahn

In the example, the original pointer (%addr = load %struct.zot*, %struct.zot** %loc, align 8) is not dereference directly and the use case I am looking at is tagged pointers, where the inttoptr(and(ptrtoint(), C) roundtrip is required to get a valid pointer. So the original pointer might not be dereferenceable directly, but logically (ignoring the bits irrelevant for the pointer value) it should still point to the same object. Does that make sense to you?

The problem happens when %addr is untagged and not dereferenceable.
For example, given an array int a[4], &a[4] is not dereferenceable, but (int *)(int)&a[4] may update another object that is adjacent to a[4].
IIUC, if getUnderlyingObject(p) returns an object obj, modifying p should either only update obj or raise undefined behavior, but is unallowed to modify other objects.
If %addr = &a[4], it is incorrect for getUnderlyingObject(inttoptr(and(ptrtoint %addr, C))) to return a.
If there is a guarantee that the object %addr points to is still alive and inttoptr(and(ptrtoint %addr, C)) is within the object (e.g. a[0], .., a[3]), the analysis is correct.

@atrick

@sanjoy I haven't tried to solve this problem myself, but it seems pretty important. It sounds to me like you're laying out an argument for introducing LLVM pointer masking intrinsics that would preserve some sort of inbound property. Is it fair to say that we probably can't fully optimize tagged pointers without using intrinsics to avoid this ptrtoint optimizer trap?

I think your understanding is correct. To support full optimization opportunity, an intrinsic like llvm.ptrmask(p, mask) would work.

fhahn added a comment.Apr 1 2019, 5:44 AM

For instance:

// Let's say we know malloc(64) will always return a pointer that is 8 byte
// aligned.

int8* ptr0 = malloc(64);
int8* ptr1 = malloc(64);

int8* ptr0_end = ptr0 + 64;

// I'm not sure if this comparison is well defined in C++, but it is well
// defined in LLVM IR:
if (ptr0_end != ptr1) return;

intptr ptr0_end_i = (intptr)ptr0_end;

intptr ptr0_end_masked = ptr0_end_i & -8;

// I think the transform being added in this comment will fire below since it is
// doing inttoptr(and(ptrtoint(ptr0_end), -8)).

int8* aliases_ptr0_and_ptr1 = (int8*)ptr0_end_masked;

Right now aliases_ptr0_and_ptr1 aliases both ptr0 and ptr1 (we can GEP backwards from it to access ptr0 and forwards from it to access ptr1). But if we replace it with ptr0_end then it can be used to access ptr0 only.

Ah thanks, together with @aqjune 's response, I think I now know what I was missing. If we have something like

int8_t* obj1 = malloc(4);
int8_t* obj2 = malloc(4);
int p = (intptr_t)(obj1 + 4);

if (p != (intptr_t) obj2) return;
 
*(int8_t*)(intptr_t)(obj1 + 4) = 0;   // <- here we alias ob1 and obj2?

I thought the information obtained via the control flow, p aliases both obj1 and obj2, is limited to the uses of p, but do I understand correctly that this is not the case and the information leaks to all equivalent expressions (that is for the snippet above, without GVN or any common code elimination)? If that is the case, then an intrinsic as suggested by @atrick would help circumvent that issue. If it is not the case and the information that p aliases obj1 and obj2 is limited to uses of p, then I think the restrictions in place should be sufficient to rule out your example (assuming we use integer comparisons for the pointers)

In the example, the original pointer (%addr = load %struct.zot*, %struct.zot** %loc, align 8) is not dereference directly and the use case I am looking at is tagged pointers, where the inttoptr(and(ptrtoint(), C) roundtrip is required to get a valid pointer. So the original pointer might not be dereferenceable directly, but logically (ignoring the bits irrelevant for the pointer value) it should still point to the same object. Does that make sense to you?

That seems problematic for another reason: IIUC you're saying Alias(inttoptr(ptrtoint(X) & -8), A) == Alias(X, A). But X is an illegal pointer so it does not alias anything (reads and writes on that pointer is illegal)?

Agreed, I think we would need to make this explicit in the langref. X is illegal, if you consider all bits of the pointer. But the address space and alignment limit the relevant bits of the pointer, so I suppose we could specify that for logical pointers, only the bits in the limited range identify the pointed-to object.

sanjoy added a comment.Apr 2 2019, 7:30 PM

@sanjoy I haven't tried to solve this problem myself, but it seems pretty important. It sounds to me like you're laying out an argument for introducing LLVM pointer masking intrinsics that would preserve some sort of inbound property. Is it fair to say that we probably can't fully optimize tagged pointers without using intrinsics to avoid this ptrtoint optimizer trap?

I think your understanding is correct. To support full optimization opportunity, an intrinsic like llvm.ptrmask(p, mask) would work.

I agree, but unfortunately it isn't clear to me how we can generate this intrinsic from frontend code (assuming a C++ frontend) that does pointer arithmetic by casting pointers to integers and back.

sanjoy added a comment.Apr 2 2019, 7:51 PM

Ah thanks, together with @aqjune 's response, I think I now know what I was missing. If we have something like

int8_t* obj1 = malloc(4);
int8_t* obj2 = malloc(4);
int p = (intptr_t)(obj1 + 4);

if (p != (intptr_t) obj2) return;
 
*(int8_t*)(intptr_t)(obj1 + 4) = 0;   // <- here we alias ob1 and obj2?

I thought the information obtained via the control flow, p aliases both obj1 and obj2, is limited to the uses of p, but do I understand correctly that this is not the case and the information leaks to all equivalent expressions (that is for the snippet above, without GVN or any common code elimination)?

Yes. In the abstract LLVM machine pointers have provenance and integers don't. All integers with the same bitwise value are equivalent (can be replaced one for another), but bitwise identical pointers are not necessarily equivalent. This lets us do aggressive optimization on integers while still keeping a strong (ish) memory model.

A consequence of this is that when you convert (intptr_t)(obj1 + 4) back to a pointer, the new pointer's provenance includes all pointers whose bitwise value could have been obj1 + 4.

That seems problematic for another reason: IIUC you're saying Alias(inttoptr(ptrtoint(X) & -8), A) == Alias(X, A). But X is an illegal pointer so it does not alias anything (reads and writes on that pointer is illegal)?

Agreed, I think we would need to make this explicit in the langref. X is illegal, if you consider all bits of the pointer. But the address space and alignment limit the relevant bits of the pointer, so I suppose we could specify that for logical pointers, only the bits in the limited range identify the pointed-to object.

I haven't thought this through but it still seems fishy to me: IIRC LLVM's alias predicate is defined *if* a write to X can be observed by Y (or vice versa) *then* X aliases Y. So two readonly locations A and B are both must-alias and no-alias: there can never a write to a readonly location so the antecedent of the predicate is false (so both "A aliases B" and "A does not alias B" are true). It seems like we have a similar situation here: X is an illegal address that you can't load or store from and so both "X alias A" and "X does not alias A" are true. But Alias(inttoptr(ptrtoint(X) & -8), A) (which has a specific answer since it is legal to load from/store to inttoptr(ptrtoint(X) & -8)) has a definite answer.

fhahn added a comment.Apr 3 2019, 2:53 PM

Ah thanks, together with @aqjune 's response, I think I now know what I was missing. If we have something like

int8_t* obj1 = malloc(4);
int8_t* obj2 = malloc(4);
int p = (intptr_t)(obj1 + 4);

if (p != (intptr_t) obj2) return;
 
*(int8_t*)(intptr_t)(obj1 + 4) = 0;   // <- here we alias ob1 and obj2?

I thought the information obtained via the control flow, p aliases both obj1 and obj2, is limited to the uses of p, but do I understand correctly that this is not the case and the information leaks to all equivalent expressions (that is for the snippet above, without GVN or any common code elimination)?

Yes. In the abstract LLVM machine pointers have provenance and integers don't. All integers with the same bitwise value are equivalent (can be replaced one for another), but bitwise identical pointers are not necessarily equivalent. This lets us do aggressive optimization on integers while still keeping a strong (ish) memory model.

A consequence of this is that when you convert (intptr_t)(obj1 + 4) back to a pointer, the new pointer's provenance includes all pointers whose bitwise value could have been obj1 + 4.

Ah thanks, I was missing the global nature of physical pointers. I couldn't find this described anywhere (besides some of those things mentioned at a tutorial at EuroLLVM). If this is not described anywhere, do you think it would make sense to add it to the AliasAnalysis documentation page, for example?

Also, is the bitwise equality propagation just function local or across the whole module? If it is function-local, we might be able to convert inttoptr(and(ptrtoint(X), C)) chains to the intrinsic early on, for functions that just contain the operations to strip away the bits, or somewhere else?

That seems problematic for another reason: IIUC you're saying Alias(inttoptr(ptrtoint(X) & -8), A) == Alias(X, A). But X is an illegal pointer so it does not alias anything (reads and writes on that pointer is illegal)?

Agreed, I think we would need to make this explicit in the langref. X is illegal, if you consider all bits of the pointer. But the address space and alignment limit the relevant bits of the pointer, so I suppose we could specify that for logical pointers, only the bits in the limited range identify the pointed-to object.

I haven't thought this through but it still seems fishy to me: IIRC LLVM's alias predicate is defined *if* a write to X can be observed by Y (or vice versa) *then* X aliases Y. So two readonly locations A and B are both must-alias and no-alias: there can never a write to a readonly location so the antecedent of the predicate is false (so both "A aliases B" and "A does not alias B" are true). It seems like we have a similar situation here: X is an illegal address that you can't load or store from and so both "X alias A" and "X does not alias A" are true. But Alias(inttoptr(ptrtoint(X) & -8), A) (which has a specific answer since it is legal to load from/store to inttoptr(ptrtoint(X) & -8)) has a definite answer.

Hm, if the definition is based on pointers directly and requires de-referenceability, that would be indeed be tricky. If the predicate is defined based on memory locations, one might be able to argue that the memory location is only reference by the valid bits. The documentation about must-alias/no-alias ( http://llvm.org/docs/AliasAnalysis.html#must-may-or-no ) is not as precise I think. I think we would have to resolve this question also when using the intrinsic.

@sanjoy I haven't tried to solve this problem myself, but it seems pretty important. It sounds to me like you're laying out an argument for introducing LLVM pointer masking intrinsics that would preserve some sort of inbound property. Is it fair to say that we probably can't fully optimize tagged pointers without using intrinsics to avoid this ptrtoint optimizer trap?

I think your understanding is correct. To support full optimization opportunity, an intrinsic like llvm.ptrmask(p, mask) would work.

I agree, but unfortunately it isn't clear to me how we can generate this intrinsic from frontend code (assuming a C++ frontend) that does pointer arithmetic by casting pointers to integers and back.

I think that we would need the frontend to directly generate the relevant code. It's the source-language semantics that would make that legal. We wouldn't be able to form them later for the same reason that we couldn't do the analysis later in the first place.

Ah thanks, I was missing the global nature of physical pointers. I couldn't find this described anywhere (besides some of those things mentioned at a tutorial at EuroLLVM). If this is not described anywhere, do you think it would make sense to add it to the AliasAnalysis documentation page, for example?

Yes, I think we should add this to the AA docs. I think the best reference for a consistent LLVM memory model is https://sf.snu.ac.kr/publications/llvmtwin.pdf .

Also, is the bitwise equality propagation just function local or across the whole module? If it is function-local, we might be able to convert inttoptr(and(ptrtoint(X), C)) chains to the intrinsic early on, for functions that just contain the operations to strip away the bits, or somewhere else?

Generally I don't think we can define semantics like these as function local since it would make inlining and outlining non-behavior preserving.

fhahn added a comment.Apr 4 2019, 1:31 PM

Ah thanks, I was missing the global nature of physical pointers. I couldn't find this described anywhere (besides some of those things mentioned at a tutorial at EuroLLVM). If this is not described anywhere, do you think it would make sense to add it to the AliasAnalysis documentation page, for example?

Yes, I think we should add this to the AA docs. I think the best reference for a consistent LLVM memory model is https://sf.snu.ac.kr/publications/llvmtwin.pdf .

Ok great. Thanks for all the input & patience. I'll try to summarize things in the AA docs when I have a bit more time!

Also, is the bitwise equality propagation just function local or across the whole module? If it is function-local, we might be able to convert inttoptr(and(ptrtoint(X), C)) chains to the intrinsic early on, for functions that just contain the operations to strip away the bits, or somewhere else?

Generally I don't think we can define semantics like these as function local since it would make inlining and outlining non-behavior preserving.

Right, that's what I thought. I'll take a look at clang and see if we would have access to the datalayout and if it would be feasible to generate calls to a ptrmask intrinsic.

fhahn updated this revision to Diff 198607.May 8 2019, 2:35 AM

I went with adding a pointer-mask intrinsic, as suggested.

Updated this patch to add the intrinsic and I will soon post a follow up
supporting it in getUnderlyingObject.

fhahn retitled this revision from [BasicAA] Simplify inttoptr(and(ptrtoint(X), C)) to X, if C preserves all significant bits. to Add ptrmask intrinsic.May 8 2019, 2:39 AM
fhahn edited the summary of this revision. (Show Details)
fhahn added a reviewer: aqjune.
jdoerfert added inline comments.
llvm/include/llvm/IR/Intrinsics.td
1026 ↗(On Diff #198607)

Did we think about adding Returned<0> to the attribute list? It would most of LLVM simply look through this (as it should).

1170 ↗(On Diff #198607)

remove

fhahn marked an inline comment as done.May 8 2019, 9:11 AM
fhahn added inline comments.
llvm/include/llvm/IR/Intrinsics.td
1026 ↗(On Diff #198607)

I think that would indicate that ptrmask is a no-op, right? That would not be true, it potentially changes the bit value of the pointer, e.g. masking out the data in the lowest bits of a tagged pointer.

The intended semantics are that the masking is done in a way that the underlying object the pointer points to is not changed, because it only masks out bits that need to be 0 anyways, when the pointer is used to access memory.

jdoerfert added inline comments.May 8 2019, 9:22 AM
llvm/include/llvm/IR/Intrinsics.td
1026 ↗(On Diff #198607)

I see. Maybe add it to stripPointerCasts then. Btw. D61607 introduces the "keep bit pattern the same" version, something we currently do not strictly enforce.

aqjune added inline comments.May 8 2019, 9:24 AM
llvm/docs/LangRef.rst
16406 ↗(On Diff #198607)

Defining this case as undefined behavior may block code motion.
For example (which does LICM):

for (..) {
  if (cond) break;
  p = llvm.ptrmask(p0, mask)
  use(p)
}
=>
p = llvm.ptrmask(p0, mask)
for (..) {
  if (cond) break;
  use(p)
}

If cond is false and mask is invalid, the source program has no UB (because the call is never executed), but the optimized program raises UB, which is incorrect.
Defining p as poison in this case would resolve the problem.

fhahn updated this revision to Diff 198869.May 9 2019, 10:39 AM
fhahn marked 2 inline comments as done.

Change behavior from undefined to poison when passing masks that zero out relevant bits of a pointer, thanks @aqjune

What happens if I use an integer type bigger than the native pointer width? Or smaller?

llvm/docs/LangRef.rst
16396 ↗(On Diff #198869)

Can you split the sentences? Maybe also make them active. To see what I would like I tried to rewrite it myself. No need to take it though!

The `llvm.ptrmask` intrinsic mask out bits of the pointer that are set in the mask.
This allows stripping data from tagged pointers without ever converting them to an integer (ptrtoint/inttoptr). As a consequence, we can preserve more information to facilitate alias analysis and underlying object detection. The masked out bits cannot be meaningful bits as defined by the ABI and data layout. Only bits like required alignment bits shall be masked out.

16404 ↗(On Diff #198869)

Start with the semantic, e.g., the bitwise AND sentence below.

16409 ↗(On Diff #198869)

I'm missing something like the following:

"The semantic of the intrinsic is equivalent to a bitwise AND operation of the two operands."

That could also replace the lowering part (which does not specify much anyway).

And the last sentence is also problematic, now that you specify poison as return value and not UB. You can say: "If the return value is not poison, it points into the same object as the first argument did."

llvm/include/llvm/IR/Intrinsics.td
1025 ↗(On Diff #198869)

I'd drop the latter part. The intrinsic masks out bits. That some architectures require these bits to be 0 when the pointer is "just a use case".

hfinkel added inline comments.May 9 2019, 3:13 PM
llvm/docs/LangRef.rst
16397 ↗(On Diff #198607)

"a restriction of the meaningful bits through the data layout" - I'm not sure what this is supposed to mean. I guess that you're referring to higher-order bits which are not actually used to form valid addresses (e.g., if a system has 64-bit pointers but only uses 48 bits in practice).

16404 ↗(On Diff #198607)

Please specifically specify how the mask is applied. It is currently unclear whether this is computing (P & mask) or (P & ~mask) or something else.

16406 ↗(On Diff #198607)

Hopefully, the UB would occur only if the resulting pointer were accessed. If an invalid mask is UB, this will prevent the intrinsic from being hoisted, and that seems undesirable. Regardless, please remove the UB-related text from here. Once we specify what the intrinsic does, the rest follows from pre-existing semantics and trying to specify all of the relevant conditions here seems unlikely to be useful.

16408 ↗(On Diff #198607)

I think that it's best to specify only the effect. Maybe something like: The effect of this intrinsic is to ensure that ptrtoint(ptrmask(p, m)) == (ptrtoint(p) & m).

16409 ↗(On Diff #198607)

... and the first argument are based on the same underlying object. (ref. the "based on" terminology from the "Pointer Aliasing Rules" section).

fhahn updated this revision to Diff 199008.May 10 2019, 6:16 AM

Thanks for your feedback @jdoerfert & @hfinkel! I've adjusted the description as suggested. I dropped the information about 'relevant' bits and made the description more precise.

fhahn updated this revision to Diff 199011.May 10 2019, 6:27 AM
fhahn marked 8 inline comments as done.

What happens if I use an integer type bigger than the native pointer width? Or smaller?

Initially I wanted to restrict the mask type to the native pointer width, but couldn't find how to do that in Intrinsics.td. On second thought, this might be more restrictive than necessary. I've adjusted the wording to make it clear that we zero-extend or truncate the mask if the bit width does not match the pointer size of the target. What do you think?

fhahn marked an inline comment as done.May 10 2019, 6:32 AM
fhahn added inline comments.
llvm/docs/LangRef.rst
16397 ↗(On Diff #198607)

Yep that was what I tried to say. I've dropped that bit about the relevant bits, as it does not seem to add much information.

What happens if I use an integer type bigger than the native pointer width? Or smaller?

Initially I wanted to restrict the mask type to the native pointer width, but couldn't find how to do that in Intrinsics.td. On second thought, this might be more restrictive than necessary. I've adjusted the wording to make it clear that we zero-extend or truncate the mask if the bit width does not match the pointer size of the target. What do you think?

The wording makes it clear now (what is supposed to happen at least).

Last points from my side:

  • What happened to the poison result sentence. I think we need that to ensure the "based on the same ..." part. Otherwise, we can butcher the pointer with the mask and end up with a "valid pointer" into a different object. I might be wrong here, just want to make sure we have thought of it.
  • Can we add the speculatable attribute? Executing ptrmask should never cause UB (as we agreed on) so we need speculatable to express that, right?
fhahn updated this revision to Diff 199460.May 14 2019, 8:36 AM

What happens if I use an integer type bigger than the native pointer width? Or smaller?

Initially I wanted to restrict the mask type to the native pointer width, but couldn't find how to do that in Intrinsics.td. On second thought, this might be more restrictive than necessary. I've adjusted the wording to make it clear that we zero-extend or truncate the mask if the bit width does not match the pointer size of the target. What do you think?

The wording makes it clear now (what is supposed to happen at least).

Thanks!

Last points from my side:

  • What happened to the poison result sentence. I think we need that to ensure the "based on the same ..." part. Otherwise, we can butcher the pointer with the mask and end up with a "valid pointer" into a different object. I might be wrong here, just want to make sure we have thought of it.

My understanding from Hal's comments was that the behavior should be clear from the pre-existing semantics, but I might have missed something?

The case where we fail to clear bits that must be zero should be clearly handled by the existing semantics. The case where we might clear meaningful bits and/or end up in a situation where we have multiple ptrmask calls for the same pointer and different masks should be OK, as that is what is asked for in the IR. If this creates invalid pointers, the existing semantics handle that case too.

  • Can we add the speculatable attribute? Executing ptrmask should never cause UB (as we agreed on) so we need speculatable to express that, right?

Done, that should be fine. UB can only occur when using the pointer.

  • What happened to the poison result sentence. I think we need that to ensure the "based on the same ..." part. Otherwise, we can butcher the pointer with the mask and end up with a "valid pointer" into a different object. I might be wrong here, just want to make sure we have thought of it.

My understanding from Hal's comments was that the behavior should be clear from the pre-existing semantics, but I might have missed something?

I don't see why but I might be wrong (@hfinkel). What I thought is detailed below.

The case where we fail to clear bits that must be zero should be clearly handled by the existing semantics.

Sure.

The case where we might clear meaningful bits

I'm not sure about this. At least I think this is not what you intended to do initially but maybe this is OK for you.
With this definition, I don't see why clearing lower bits would result in poison as long as the result points into the
same object. Even if not, the current definition doesn't really state what would happen if the resulting pointer would point into any valid object.

If this creates invalid pointers, the existing semantics handle that case too.

I don't know, see above.

llvm/docs/LangRef.rst
16456 ↗(On Diff #199460)

speculatable

fhahn added a comment.May 14 2019, 9:32 AM
  • What happened to the poison result sentence. I think we need that to ensure the "based on the same ..." part. Otherwise, we can butcher the pointer with the mask and end up with a "valid pointer" into a different object. I might be wrong here, just want to make sure we have thought of it.

My understanding from Hal's comments was that the behavior should be clear from the pre-existing semantics, but I might have missed something?

I don't see why but I might be wrong (@hfinkel). What I thought is detailed below.

The case where we fail to clear bits that must be zero should be clearly handled by the existing semantics.

Sure.

The case where we might clear meaningful bits

I'm not sure about this. At least I think this is not what you intended to do initially but maybe this is OK for you.

Yes, the current definition is more permissive than the original one.

With this definition, I don't see why clearing lower bits would result in poison as long as the result points into the
same object. Even if not, the current definition doesn't really state what would happen if the resulting pointer would point into any valid object.

It wouldn't be poison, unless ptrmask yields an inaccessible pointer through the masking operation.

I might be missing your point, but for the case ptrmask returning a pointer that "would point to any valid object", then the underlying object of the input pointer must be that object, as per definition, right? Transformations (and frontends) have to ensure no inconsistencies are introduced. Maybe that's worth calling out?

It might be simpler to describe the semantics in terms of GEP: e.g. ptrmask(x) is equivalent to gep x, (ptrtoint(x)&mask)-ptrtoint(x). I think that has the same semantics as the description, but it might be easier to understand for a reader familiar with GEPs.

Actually, a frontend could emit that explicitly and get reasonable results in most cases, I think. For example, "char*a(char*p) { return p+(((long)p&-8)-(long)p); }" in C produces a single "andq" instruction at -O2. Granted, you probably wouldn't want to depend on that if you're compiling a language that depends on a lot of ptrmasks; it has inaccurate cost modeling, you might end up with extra operations if the sequence gets split into multiple blocks, and it produces low-quality code at -O0.

I don't see why but I might be wrong (@hfinkel). What I thought is detailed below.

I think that the current wording looks good, and does include a statement that the new pointer is based on the same underlying object as the old pointer. Do you have any remaining concerns?

llvm/docs/LangRef.rst
16469 ↗(On Diff #199460)

underlying-object detected [dash for a compound adjective].

16477 ↗(On Diff #199460)

*based on* [the term is "based on"].

It might be simpler to describe the semantics in terms of GEP: e.g. ptrmask(x) is equivalent to gep x, (ptrtoint(x)&mask)-ptrtoint(x). I think that has the same semantics as the description, but it might be easier to understand for a reader familiar with GEPs.

I'm also happy with this.

fhahn updated this revision to Diff 199497.May 14 2019, 12:50 PM
fhahn marked 3 inline comments as done.

Thanks for the excellent suggestions! I've updated the text to use the GEP based description, as the GEP documentation spells out the behavior with respect to invalid pointers in detail. Also incorporated Hal's comments.

fhahn added a comment.May 14 2019, 2:19 PM

@sanjoy it would be great if you could have a quick look and check if the new approach makes sense to you.

sanjoy accepted this revision.May 14 2019, 9:11 PM

@sanjoy it would be great if you could have a quick look and check if the new approach makes sense to you.

LGTM!

This revision is now accepted and ready to land.May 14 2019, 9:11 PM

I just submitted a clang patch to generate ptrmask calls for review and I am planning to submit the intrinsics soon, unless there are any concerns on the clang side.

This revision was automatically updated to reflect the committed changes.