[BasicAA] Simplify inttoptr(and(ptrtoint(X), C)) to X, if C preserves all significant bits.
Needs Revision · Public

Authored by fhahn on Mar 6 2019, 5:52 PM.

Details

Summary

Depending on the ABI alignment and the actual pointer size, some of the
most and least significant bits are not used for addressing. When used
for accessing memory, those bits have to be 0. Those unused bits can be
used to store additional information (tagged pointers) and this
additional information can be masked out using the pattern above. If
only insignificant bits are masked out, the location the pointer
references is not changed.
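
As an illustration of the tagging scheme described above, here is a minimal C++ sketch. It is not code from the patch: the helper names (tagPointer, getTag, stripTag) and the 3-bit tag width are assumptions, chosen for a pointee that is at least 8-byte aligned.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of the patch. They assume the pointee is at
// least 8-byte aligned, so the low 3 bits of a valid address are always zero.
constexpr std::uintptr_t kTagMask = 0x7;

// Pack a small tag into the unused low bits of an aligned pointer.
inline std::uintptr_t tagPointer(void *P, unsigned Tag) {
  auto Bits = reinterpret_cast<std::uintptr_t>(P);
  assert((Bits & kTagMask) == 0 && "pointer is not 8-byte aligned");
  assert(Tag <= kTagMask && "tag does not fit in the low bits");
  return Bits | Tag;
}

// Recover the tag stored in the low bits.
inline unsigned getTag(std::uintptr_t Tagged) {
  return static_cast<unsigned>(Tagged & kTagMask);
}

// Clear the tag bits before dereferencing. This corresponds to the
// inttoptr(and(ptrtoint(X), C)) pattern with C = ~7.
inline void *stripTag(std::uintptr_t Tagged) {
  return reinterpret_cast<void *>(Tagged & ~kTagMask);
}
```

With these helpers, stripTag(tagPointer(P, T)) yields an address equal to P for any suitably aligned P, which is the property the patch would like alias analysis to see through.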

Does this make sense? Unfortunately I did not find anything in the
LangRef about what the 'size of a pointer' in the datalayout actually means.
This patch assumes that bits outside the maximum pointer size and ABI
alignment do not change the location the pointer points to.

This interpretation allows for better handling of AA for tagged
pointers. In any case, I intend to clarify the LangRef if it is indeed missing
information about that scenario.

Event Timeline

fhahn created this revision. Mar 6 2019, 5:52 PM
Herald added a project: Restricted Project. Mar 6 2019, 5:52 PM
Herald added a subscriber: hiraditya.
arsenm added a subscriber: arsenm. Mar 14 2019, 8:07 AM
arsenm added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
3798

I don't see why you would use the max size, when you can check the 2 address space sizes

llvm/test/Analysis/BasicAA/strip-nonsignificant-ptr-bits.ll
89

Needs tests where the address spaces don't match and have different sizes

I'm really afraid that this isn't sound. We've had a number of issues in this space, and we've always resisted attempts to make BasicAA look through inttoptr/ptrtoint. Once you convert to an integer, control dependencies can effectively add additional underlying objects. In cases where this is sound, why not fold away the whole inttoptr(and(ptrtoint)) in the first place?

fhahn marked an inline comment as done. Mar 14 2019, 11:51 AM

I'm really afraid that this isn't sound. We've had a number of issues in this space, and we've always resisted attempts to make BasicAA look through inttoptr/ptrtoint. Once you convert to an integer, control dependencies can effectively add additional underlying objects. In cases where this is sound, why not fold away the whole inttoptr(and(ptrtoint)) in the first place?

Thanks. I agree it is a bit shaky, as I think the LangRef is not too clear about the issue. I think it would make sense to specify that the unused bits have to be 0 for memory accesses (at least for macOS/Linux on Arm64/X86), but maybe not for getelementptr & co. Given how BasicAA is used to look through getelementptrs, I am not entirely sure that can be guaranteed at the moment.

I am also not entirely sure how control dependencies could add new underlying objects with this patch. We are looking backwards for a specific pattern and the underlying object has to dominate the and & ptrtoints.

In the test case, it is unfortunately not possible to fold away the inttoptr(and(ptrtoint)), because the original pointer has some unused bits set, and they need to be cleared before memory accesses; otherwise an invalid address is accessed.

llvm/lib/Analysis/ValueTracking.cpp
3798

Thanks, I'll update the code if the underlying approach is sound.

Ping. @hfinkel did my response make sense?

I could also split it in 2 parts: 1) strip lower bits due to alignment requirement and 2) strip high bits due to available bits in address space. Would that make it easier to reason about separately?
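
To make the two parts concrete, the following sketch computes which bits are significant under an assumed address width and alignment, and checks whether a mask preserves them. The function names and the 48-bit / 8-byte numbers are illustrative assumptions, not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch). AddrBits is the number of low
// bits used for addressing (assumed 1..64); AlignLog2 is log2 of the ABI
// alignment (assumed smaller than AddrBits).
constexpr std::uint64_t significantBits(unsigned AddrBits, unsigned AlignLog2) {
  // Part 2: bits at or above AddrBits are unused by the address space.
  const std::uint64_t InAddressSpace =
      AddrBits >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << AddrBits) - 1;
  // Part 1: the low AlignLog2 bits are zero for any aligned address.
  const std::uint64_t BelowAlignment = (std::uint64_t{1} << AlignLog2) - 1;
  return InAddressSpace & ~BelowAlignment;
}

// In this model, a mask C leaves the location unchanged if it clears no
// significant bit.
constexpr bool preservesSignificantBits(std::uint64_t Mask, unsigned AddrBits,
                                        unsigned AlignLog2) {
  return (significantBits(AddrBits, AlignLog2) & ~Mask) == 0;
}

// Worked examples for 48 address bits and 8-byte alignment.
static_assert(significantBits(48, 3) == 0x0000FFFFFFFFFFF8ULL, "bits 3..47");
static_assert(preservesSignificantBits(~std::uint64_t{7}, 48, 3),
              "clearing only low tag bits is fine");
static_assert(preservesSignificantBits(0x00FFFFFFFFFFFFFFULL, 48, 3),
              "clearing only the top byte is fine");
static_assert(!preservesSignificantBits(~std::uint64_t{0xF}, 48, 3),
              "clearing bit 3 changes the address");
```

Keeping the two masks as separate terms mirrors the proposed split: either term can be dropped to reason about alignment bits or address-space bits alone.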

fhahn added a subscriber: atrick. Fri, Mar 29, 11:11 AM

I am also not entirely sure how control dependencies could add new underlying objects with this patch

Please read this https://bugs.llvm.org/show_bug.cgi?id=34548

I *think* that this is okay if the ptrtoint and the and have only one user, and they're in the same basic block, and there's nothing that can throw, etc. in between?

llvm/lib/Analysis/ValueTracking.cpp
3789

This code is essentially repeated from the code added to BasicAA. Please put this into a utility function or similar.

fhahn updated this revision to Diff 192931. Fri, Mar 29, 3:19 PM

I am also not entirely sure how control dependencies could add new underlying objects with this patch

Please read this https://bugs.llvm.org/show_bug.cgi?id=34548

This was very helpful and interesting, thanks!

I *think* that this is okay if the ptrtoint and the and have only one user, and they're in the same basic block, and there's nothing that can throw, etc. in between?

Thanks, from the reading above, those restrictions should prevent applying the
analysis on ptrtoint/inttoptr pairs where multiple objects contribute to the result.

I've updated the patch to check the number of users and, for now, to not allow any instructions between the ptrtoint, and, and inttoptr. We can relax that later IMO.

fhahn marked an inline comment as done. Fri, Mar 29, 3:19 PM

I am also not entirely sure how control dependencies could add new underlying objects with this patch

Please read this https://bugs.llvm.org/show_bug.cgi?id=34548

I *think* that this is okay if the ptrtoint and the and have only one user, and they're in the same basic block, and there's nothing that can throw, etc. in between?

I'm not sure about this. You could have:

int32* ptr0 = malloc(4);
int32* ptr1 = malloc(4);

if (ptr0+1 != ptr1) return;

int32* ptr = (int*)(int64)(ptr0+1);

in which ptr would alias ptr1. But if you transform ptr to ptr0+1 then it would not alias ptr1. That IR ^ could have resulted from:

int32* ptr0 = malloc(4);
int32* ptr1 = malloc(4);

if (ptr0+1 != ptr1) return;

int64 ptr0_i = (int64)(ptr0+1);
int64 ptr1_i = (int64)(ptr1);

int32* ptr = (int*)ptr1_i;

aqjune added a subscriber: aqjune. Fri, Mar 29, 7:39 PM

I am also not entirely sure how control dependencies could add new underlying objects with this patch

Please read this https://bugs.llvm.org/show_bug.cgi?id=34548

I *think* that this is okay if the ptrtoint and the and have only one user, and they're in the same basic block, and there's nothing that can throw, etc. in between?

I'm not sure about this. You could have:

int32* ptr0 = malloc(4);
int32* ptr1 = malloc(4);

if (ptr0+1 != ptr1) return;

int32* ptr = (int*)(int64)(ptr0+1);

in which ptr would alias ptr1. But if you transform ptr to ptr0+1 then it would not alias ptr1. That IR ^ could have resulted from:

int32* ptr0 = malloc(4);
int32* ptr1 = malloc(4);

if (ptr0+1 != ptr1) return;

int64 ptr0_i = (int64)(ptr0+1);
int64 ptr1_i = (int64)(ptr1);

int32* ptr = (int*)ptr1_i;

I just saw this patch. I agree with Sanjoy's comment.
One possible workaround to allow this optimization is to check dereferenceability of the original pointer:

p1 = malloc()
v16 = ptrtoint p1 to i64
p2 = inttoptr v16 to i8*
store i8* p1, 10
store i8* p2, 20
=>
p1 = malloc()
v16 = ptrtoint p1 to i64
p2 = inttoptr v16 to i8*
store i8* p1, 10
store i8* p1, 20 // p2 replaced with p1

If (1) the original pointer p1 has been dereferenced before, and (2) there has been no operation that may have freed p1, we can assume that p2 must alias p1.
The informal reasoning is as follows. It should be guaranteed that replacing p2 with p1 does not introduce undefined behavior. If p1 has already been dereferenced before, replacing p2 with p1 does not introduce new undefined behavior.

Does this workaround work for this patch?

sanjoy requested changes to this revision. Sun, Mar 31, 10:35 AM
This revision now requires changes to proceed. Sun, Mar 31, 10:35 AM
fhahn added a comment. Sun, Mar 31, 2:09 PM

Thanks for taking a look!

I am also not entirely sure how control dependencies could add new underlying objects with this patch

Please read this https://bugs.llvm.org/show_bug.cgi?id=34548

I *think* that this is okay if the ptrtoint and the and have only one user, and they're in the same basic block, and there's nothing that can throw, etc. in between?

I'm not sure about this. You could have:

int32* ptr0 = malloc(4);
int32* ptr1 = malloc(4);

if (ptr0+1 != ptr1) return;

int32* ptr = (int*)(int64)(ptr0+1);

in which ptr would alias ptr1. But if you transform ptr to ptr0+1 then it would not alias ptr1.

I am probably missing something, but I am not sure how such a case would be possible with this patch? It specifically looks for an inttoptr(and(ptrtoint, C)) sequence, where C is such that the (logical) destination of the pointer remains unchanged. Unfortunately, I do not think the LangRef is clear about 'irrelevant' bits in pointers (due to alignment or address spaces).

I just saw this patch. I agree with Sanjoy's comment.
One possible workaround to allow this optimization is to check dereferenceability of the original pointer:

p1 = malloc()
v16 = ptrtoint p1 to i64
p2 = inttoptr v16 to i8*
store i8* p1, 10
store i8* p2, 20
=>
p1 = malloc()
v16 = ptrtoint p1 to i64
p2 = inttoptr v16 to i8*
store i8* p1, 10
store i8* p1, 20 // p2 replaced with p1

If (1) the original pointer p1 has been dereferenced before, and (2) there has been no operation that may have freed p1, we can assume that p2 must alias p1.
The informal reasoning is as follows. It should be guaranteed that replacing p2 with p1 does not introduce undefined behavior. If p1 has already been dereferenced before, replacing p2 with p1 does not introduce new undefined behavior.

Does this workaround work for this patch?

In the example, the original pointer (%addr = load %struct.zot*, %struct.zot** %loc, align 8) is not dereferenced directly, and the use case I am looking at is tagged pointers, where the inttoptr(and(ptrtoint(X), C)) roundtrip is required to get a valid pointer. So the original pointer might not be dereferenceable directly, but logically (ignoring the bits irrelevant for the pointer value) it should still point to the same object. Does that make sense to you?

I'm not sure about this. You could have:

int32* ptr0 = malloc(4);
int32* ptr1 = malloc(4);

if (ptr0+1 != ptr1) return;

int32* ptr = (int*)(int64)(ptr0+1);

in which ptr would alias ptr1. But if you transform ptr to ptr0+1 then it would not alias ptr1.

I am probably missing something, but I am not sure how such a case would be possible with this patch? It specifically looks for an inttoptr(and(ptrtoint, C)) sequence, where C is such that the (logical) destination of the pointer remains unchanged. Unfortunately, I do not think the LangRef is clear about 'irrelevant' bits in pointers (due to alignment or address spaces).

For instance:

// Let's say we know malloc(64) will always return a pointer that is 8 byte
// aligned.

int8* ptr0 = malloc(64);
int8* ptr1 = malloc(64);

int8* ptr0_end = ptr0 + 64;

// I'm not sure if this comparison is well defined in C++, but it is well
// defined in LLVM IR:
if (ptr0_end != ptr1) return;

intptr ptr0_end_i = (intptr)ptr0_end;

intptr ptr0_end_masked = ptr0_end_i & -8;

// I think the transform being added in this comment will fire below since it is
// doing inttoptr(and(ptrtoint(ptr0_end), -8)).

int8* aliases_ptr0_and_ptr1 = (int8*)ptr0_end_masked;

Right now aliases_ptr0_and_ptr1 aliases both ptr0 and ptr1 (we can GEP backwards from it to access ptr0 and forwards from it to access ptr1). But if we replace it with ptr0_end then it can be used to access ptr0 only.

I just saw this patch. I agree with Sanjoy's comment.
One possible workaround to allow this optimization is to check dereferenceability of the original pointer:

p1 = malloc()
v16 = ptrtoint p1 to i64
p2 = inttoptr v16 to i8*
store i8* p1, 10
store i8* p2, 20
=>
p1 = malloc()
v16 = ptrtoint p1 to i64
p2 = inttoptr v16 to i8*
store i8* p1, 10
store i8* p1, 20 // p2 replaced with p1

If (1) the original pointer p1 has been dereferenced before, and (2) there has been no operation that may have freed p1, we can assume that p2 must alias p1.
The informal reasoning is as follows. It should be guaranteed that replacing p2 with p1 does not introduce undefined behavior. If p1 has already been dereferenced before, replacing p2 with p1 does not introduce new undefined behavior.

Does this workaround work for this patch?

In the example, the original pointer (%addr = load %struct.zot*, %struct.zot** %loc, align 8) is not dereferenced directly, and the use case I am looking at is tagged pointers, where the inttoptr(and(ptrtoint(X), C)) roundtrip is required to get a valid pointer. So the original pointer might not be dereferenceable directly, but logically (ignoring the bits irrelevant for the pointer value) it should still point to the same object. Does that make sense to you?

That seems problematic for another reason: IIUC you're saying Alias(inttoptr(ptrtoint(X) & -8), A) == Alias(X, A). But X is an illegal pointer, so it does not alias anything (reads and writes through that pointer are illegal)?

@sanjoy I haven't tried to solve this problem myself, but it seems pretty important. It sounds to me like you're laying out an argument for introducing LLVM pointer masking intrinsics that would preserve some sort of inbound property. Is it fair to say that we probably can't fully optimize tagged pointers without using intrinsics to avoid this ptrtoint optimizer trap?

aqjune added a comment. Mon, Apr 1, 2:36 AM

@fhahn

In the example, the original pointer (%addr = load %struct.zot*, %struct.zot** %loc, align 8) is not dereferenced directly, and the use case I am looking at is tagged pointers, where the inttoptr(and(ptrtoint(X), C)) roundtrip is required to get a valid pointer. So the original pointer might not be dereferenceable directly, but logically (ignoring the bits irrelevant for the pointer value) it should still point to the same object. Does that make sense to you?

The problem happens when %addr is untagged and not dereferenceable.
For example, given an array int a[4], &a[4] is not dereferenceable, but (int *)(int)&a[4] may update another object that is adjacent to a[4].
IIUC, if getUnderlyingObject(p) returns an object obj, a store through p should either update only obj or raise undefined behavior; it is not allowed to modify other objects.
If %addr = &a[4], it is incorrect for getUnderlyingObject(inttoptr(and(ptrtoint %addr, C))) to return a.
If there is a guarantee that the object %addr points to is still alive and inttoptr(and(ptrtoint %addr, C)) is within the object (e.g. a[0], .., a[3]), the analysis is correct.
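
The in-bounds condition above can be sketched as a simple range check. This is a toy model with made-up addresses, assuming a flat address space; it does not model liveness, and ObjectRange and maskedAddressInBounds are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical model: an object occupies the bytes [Base, Base + Size).
struct ObjectRange {
  std::uint64_t Base;
  std::uint64_t Size;
};

// True if (Addr & Mask) is the address of a byte inside Obj. A
// one-past-the-end address such as &a[4] fails this check.
constexpr bool maskedAddressInBounds(ObjectRange Obj, std::uint64_t Addr,
                                     std::uint64_t Mask) {
  return (Addr & Mask) >= Obj.Base && (Addr & Mask) - Obj.Base < Obj.Size;
}

// int a[4] placed at the assumed address 0x1000 (16 bytes), mask C = ~7.
constexpr ObjectRange A = {0x1000, 16};
static_assert(maskedAddressInBounds(A, 0x100C, ~std::uint64_t{7}),
              "&a[3] masked to 0x1008 stays inside a");
static_assert(!maskedAddressInBounds(A, 0x1010, ~std::uint64_t{7}),
              "&a[4] is one past the end of a");
```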

@atrick

@sanjoy I haven't tried to solve this problem myself, but it seems pretty important. It sounds to me like you're laying out an argument for introducing LLVM pointer masking intrinsics that would preserve some sort of inbound property. Is it fair to say that we probably can't fully optimize tagged pointers without using intrinsics to avoid this ptrtoint optimizer trap?

I think your understanding is correct. To support full optimization opportunity, an intrinsic like llvm.ptrmask(p, mask) would work.

fhahn added a comment. Mon, Apr 1, 5:44 AM

For instance:

// Let's say we know malloc(64) will always return a pointer that is 8 byte
// aligned.

int8* ptr0 = malloc(64);
int8* ptr1 = malloc(64);

int8* ptr0_end = ptr0 + 64;

// I'm not sure if this comparison is well defined in C++, but it is well
// defined in LLVM IR:
if (ptr0_end != ptr1) return;

intptr ptr0_end_i = (intptr)ptr0_end;

intptr ptr0_end_masked = ptr0_end_i & -8;

// I think the transform being added in this comment will fire below since it is
// doing inttoptr(and(ptrtoint(ptr0_end), -8)).

int8* aliases_ptr0_and_ptr1 = (int8*)ptr0_end_masked;

Right now aliases_ptr0_and_ptr1 aliases both ptr0 and ptr1 (we can GEP backwards from it to access ptr0 and forwards from it to access ptr1). But if we replace it with ptr0_end then it can be used to access ptr0 only.

Ah thanks, together with @aqjune 's response, I think I now know what I was missing. If we have something like

int8_t* obj1 = malloc(4);
int8_t* obj2 = malloc(4);
intptr_t p = (intptr_t)(obj1 + 4);

if (p != (intptr_t) obj2) return;

*(int8_t*)(intptr_t)(obj1 + 4) = 0;   // <- here we alias obj1 and obj2?

I thought the information obtained via the control flow, p aliases both obj1 and obj2, is limited to the uses of p, but do I understand correctly that this is not the case and the information leaks to all equivalent expressions (that is for the snippet above, without GVN or any common code elimination)? If that is the case, then an intrinsic as suggested by @atrick would help circumvent that issue. If it is not the case and the information that p aliases obj1 and obj2 is limited to uses of p, then I think the restrictions in place should be sufficient to rule out your example (assuming we use integer comparisons for the pointers)

In the example, the original pointer (%addr = load %struct.zot*, %struct.zot** %loc, align 8) is not dereferenced directly, and the use case I am looking at is tagged pointers, where the inttoptr(and(ptrtoint(X), C)) roundtrip is required to get a valid pointer. So the original pointer might not be dereferenceable directly, but logically (ignoring the bits irrelevant for the pointer value) it should still point to the same object. Does that make sense to you?

That seems problematic for another reason: IIUC you're saying Alias(inttoptr(ptrtoint(X) & -8), A) == Alias(X, A). But X is an illegal pointer, so it does not alias anything (reads and writes through that pointer are illegal)?

Agreed, I think we would need to make this explicit in the langref. X is illegal, if you consider all bits of the pointer. But the address space and alignment limit the relevant bits of the pointer, so I suppose we could specify that for logical pointers, only the bits in the limited range identify the pointed-to object.

sanjoy added a comment. Tue, Apr 2, 7:30 PM

@sanjoy I haven't tried to solve this problem myself, but it seems pretty important. It sounds to me like you're laying out an argument for introducing LLVM pointer masking intrinsics that would preserve some sort of inbound property. Is it fair to say that we probably can't fully optimize tagged pointers without using intrinsics to avoid this ptrtoint optimizer trap?

I think your understanding is correct. To support full optimization opportunity, an intrinsic like llvm.ptrmask(p, mask) would work.

I agree, but unfortunately it isn't clear to me how we can generate this intrinsic from frontend code (assuming a C++ frontend) that does pointer arithmetic by casting pointers to integers and back.

sanjoy added a comment. Tue, Apr 2, 7:51 PM

Ah thanks, together with @aqjune 's response, I think I now know what I was missing. If we have something like

int8_t* obj1 = malloc(4);
int8_t* obj2 = malloc(4);
intptr_t p = (intptr_t)(obj1 + 4);

if (p != (intptr_t) obj2) return;

*(int8_t*)(intptr_t)(obj1 + 4) = 0;   // <- here we alias obj1 and obj2?

I thought the information obtained via the control flow, p aliases both obj1 and obj2, is limited to the uses of p, but do I understand correctly that this is not the case and the information leaks to all equivalent expressions (that is for the snippet above, without GVN or any common code elimination)?

Yes. In the abstract LLVM machine pointers have provenance and integers don't. All integers with the same bitwise value are equivalent (can be replaced one for another), but bitwise identical pointers are not necessarily equivalent. This lets us do aggressive optimization on integers while still keeping a strong (ish) memory model.

A consequence of this is that when you convert (intptr_t)(obj1 + 4) back to a pointer, the new pointer's provenance includes all pointers whose bitwise value could have been obj1 + 4.
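
A toy model may help make this concrete. It is only a sketch under simplifying assumptions (a flat address space, allocations identified by small integer ids, and one-past-the-end counted as a valid address of an allocation); it is not the formal memory model. Allocation and provenanceOfIntToPtr are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model of an allocation occupying [Base, Base + Size).
struct Allocation {
  int Id;
  std::uint64_t Base;
  std::uint64_t Size;
};

// Toy inttoptr: the integer carries no provenance, so the resulting pointer
// may be based on every allocation that has a pointer with this bit pattern,
// including its one-past-the-end pointer.
inline std::vector<int> provenanceOfIntToPtr(const std::vector<Allocation> &Heap,
                                             std::uint64_t Addr) {
  std::vector<int> Ids;
  for (const Allocation &A : Heap)
    if (Addr >= A.Base && Addr - A.Base <= A.Size)
      Ids.push_back(A.Id);
  return Ids;
}

// obj1 and obj2 happen to be adjacent, as in the example above.
inline void provenanceExample() {
  const std::vector<Allocation> Heap = {{1, 0x1000, 4}, {2, 0x1004, 4}};
  // (intptr_t)(obj1 + 4) has the same bits as (intptr_t)obj2.
  assert((provenanceOfIntToPtr(Heap, 0x1004) == std::vector<int>{1, 2}));
  // An address strictly inside obj1 can only come from obj1.
  assert((provenanceOfIntToPtr(Heap, 0x1002) == std::vector<int>{1}));
}
```

In this model the pointer obj1 + 4 itself would carry only obj1 as provenance, which is why replacing the round-tripped pointer with obj1 + 4 loses the possibility of accessing obj2.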

That seems problematic for another reason: IIUC you're saying Alias(inttoptr(ptrtoint(X) & -8), A) == Alias(X, A). But X is an illegal pointer, so it does not alias anything (reads and writes through that pointer are illegal)?

Agreed, I think we would need to make this explicit in the langref. X is illegal, if you consider all bits of the pointer. But the address space and alignment limit the relevant bits of the pointer, so I suppose we could specify that for logical pointers, only the bits in the limited range identify the pointed-to object.

I haven't thought this through, but it still seems fishy to me: IIRC LLVM's alias predicate is defined as: *if* a write to X can be observed by Y (or vice versa) *then* X aliases Y. So two readonly locations A and B are both must-alias and no-alias: there can never be a write to a readonly location, so the antecedent of the predicate is false (so both "A aliases B" and "A does not alias B" are true). It seems like we have a similar situation here: X is an illegal address that you can't load from or store to, and so both "X aliases A" and "X does not alias A" are true. But Alias(inttoptr(ptrtoint(X) & -8), A) has a definite answer, since it is legal to load from/store to inttoptr(ptrtoint(X) & -8).

fhahn added a comment. Wed, Apr 3, 2:53 PM

Ah thanks, together with @aqjune 's response, I think I now know what I was missing. If we have something like

int8_t* obj1 = malloc(4);
int8_t* obj2 = malloc(4);
intptr_t p = (intptr_t)(obj1 + 4);

if (p != (intptr_t) obj2) return;

*(int8_t*)(intptr_t)(obj1 + 4) = 0;   // <- here we alias obj1 and obj2?

I thought the information obtained via the control flow, p aliases both obj1 and obj2, is limited to the uses of p, but do I understand correctly that this is not the case and the information leaks to all equivalent expressions (that is for the snippet above, without GVN or any common code elimination)?

Yes. In the abstract LLVM machine pointers have provenance and integers don't. All integers with the same bitwise value are equivalent (can be replaced one for another), but bitwise identical pointers are not necessarily equivalent. This lets us do aggressive optimization on integers while still keeping a strong (ish) memory model.

A consequence of this is that when you convert (intptr_t)(obj1 + 4) back to a pointer, the new pointer's provenance includes all pointers whose bitwise value could have been obj1 + 4.

Ah thanks, I was missing the global nature of physical pointers. I couldn't find this described anywhere (besides some of those things mentioned at a tutorial at EuroLLVM). If this is not described anywhere, do you think it would make sense to add it to the AliasAnalysis documentation page, for example?

Also, is the bitwise equality propagation just function local or across the whole module? If it is function-local, we might be able to convert inttoptr(and(ptrtoint(X), C)) chains to the intrinsic early on, for functions that just contain the operations to strip away the bits, or somewhere else?

That seems problematic for another reason: IIUC you're saying Alias(inttoptr(ptrtoint(X) & -8), A) == Alias(X, A). But X is an illegal pointer, so it does not alias anything (reads and writes through that pointer are illegal)?

Agreed, I think we would need to make this explicit in the langref. X is illegal, if you consider all bits of the pointer. But the address space and alignment limit the relevant bits of the pointer, so I suppose we could specify that for logical pointers, only the bits in the limited range identify the pointed-to object.

I haven't thought this through, but it still seems fishy to me: IIRC LLVM's alias predicate is defined as: *if* a write to X can be observed by Y (or vice versa) *then* X aliases Y. So two readonly locations A and B are both must-alias and no-alias: there can never be a write to a readonly location, so the antecedent of the predicate is false (so both "A aliases B" and "A does not alias B" are true). It seems like we have a similar situation here: X is an illegal address that you can't load from or store to, and so both "X aliases A" and "X does not alias A" are true. But Alias(inttoptr(ptrtoint(X) & -8), A) has a definite answer, since it is legal to load from/store to inttoptr(ptrtoint(X) & -8).

Hm, if the definition is based on pointers directly and requires dereferenceability, that would indeed be tricky. If the predicate is defined based on memory locations, one might be able to argue that the memory location is only referenced by the valid bits. The documentation about must-alias/no-alias ( http://llvm.org/docs/AliasAnalysis.html#must-may-or-no ) is not that precise, I think. I think we would have to resolve this question also when using the intrinsic.

@sanjoy I haven't tried to solve this problem myself, but it seems pretty important. It sounds to me like you're laying out an argument for introducing LLVM pointer masking intrinsics that would preserve some sort of inbound property. Is it fair to say that we probably can't fully optimize tagged pointers without using intrinsics to avoid this ptrtoint optimizer trap?

I think your understanding is correct. To support full optimization opportunity, an intrinsic like llvm.ptrmask(p, mask) would work.

I agree, but unfortunately it isn't clear to me how we can generate this intrinsic from frontend code (assuming a C++ frontend) that does pointer arithmetic by casting pointers to integers and back.

I think that we would need the frontend to directly generate the relevant code. It's the source-language semantics that would make that legal. We wouldn't be able to form them later for the same reason that we couldn't do the analysis later in the first place.

Ah thanks, I was missing the global nature of physical pointers. I couldn't find this described anywhere (besides some of those things mentioned at a tutorial at EuroLLVM). If this is not described anywhere, do you think it would make sense to add it to the AliasAnalysis documentation page, for example?

Yes, I think we should add this to the AA docs. I think the best reference for a consistent LLVM memory model is https://sf.snu.ac.kr/publications/llvmtwin.pdf .

Also, is the bitwise equality propagation just function local or across the whole module? If it is function-local, we might be able to convert inttoptr(and(ptrtoint(X), C)) chains to the intrinsic early on, for functions that just contain the operations to strip away the bits, or somewhere else?

Generally I don't think we can define semantics like these as function local since it would make inlining and outlining non-behavior preserving.

fhahn added a comment. Thu, Apr 4, 1:31 PM

Ah thanks, I was missing the global nature of physical pointers. I couldn't find this described anywhere (besides some of those things mentioned at a tutorial at EuroLLVM). If this is not described anywhere, do you think it would make sense to add it to the AliasAnalysis documentation page, for example?

Yes, I think we should add this to the AA docs. I think the best reference for a consistent LLVM memory model is https://sf.snu.ac.kr/publications/llvmtwin.pdf .

Ok great. Thanks for all the input & patience. I'll try to summarize things in the AA docs when I have a bit more time!

Also, is the bitwise equality propagation just function local or across the whole module? If it is function-local, we might be able to convert inttoptr(and(ptrtoint(X), C)) chains to the intrinsic early on, for functions that just contain the operations to strip away the bits, or somewhere else?

Generally I don't think we can define semantics like these as function local since it would make inlining and outlining non-behavior preserving.

Right, that's what I thought. I'll take a look at clang and see if we would have access to the datalayout and if it would be feasible to generate calls to a ptrmask intrinsic.