Page MenuHomePhabricator

[CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible.
Needs ReviewPublic

Authored by fhahn on Jul 3 2019, 3:57 AM.

Details

Summary

I am not sure if this is the best way to do the matching and would
appreciate any pointers for improving it.

Also, I am not sure if any of the available targets limits the pointer
size, e.g. to something like 56 bit pointers.

Event Timeline

fhahn created this revision.Jul 3 2019, 3:57 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 3 2019, 3:57 AM
fhahn updated this revision to Diff 207754.Jul 3 2019, 4:21 AM

Strip away llvm changes, use ABIAlignment for address space.

xbolva00 added inline comments.
clang/lib/CodeGen/CGExprScalar.cpp
2025

BO->getLHS()->IgnoreParens() ?

fhahn updated this revision to Diff 207761.Jul 3 2019, 5:10 AM
fhahn marked 2 inline comments as done.

Use IgnoreParens().

fhahn added inline comments.Jul 3 2019, 5:10 AM
clang/lib/CodeGen/CGExprScalar.cpp
2025

That's very useful, thanks!

I don't think this transform is valid, for the same reasons we don't do it in IR optimizations.

jfb added a comment.Jul 3 2019, 3:37 PM

I'm not sure I understand all the implications, and why that would / wouldn't be valid.

Should this be an builtin that can be called from C++ directly?

clang/lib/CodeGen/CGExprScalar.cpp
2034

This assumes pointers are max 64 bits :)
It seems better to shift CV up by bit_per_byte*sizeof(void*) - max_pointer_size, and then down to remove alignment, and finally checking that it's zero?

Which leads me to wonder: should we ever diagnose "bad" pointer masks?

I don't think this transform is valid, for the same reasons we don't do it in IR optimizations.

I believe that in the problematic cases we previously discussed (e.g., from https://reviews.llvm.org/D59065#1449682), they all depend on some control dependency being introduced somewhere in between the initial pointer casts and the other operations. If they're all syntactically together like this, maybe that's safe?

Does this actually catch the ABI code that motivated this in the first place? Isn't that in lib/CodeGen/TargetInfo.cpp (e.g., in emitRoundPointerUpToAlignment)?

I agree with Eli that this isn't obviously a legal transformation. llvm.ptrmask appears to make semantic guarantees about e.g. the pointer after the mask referring to the same underlying object, which means we can only safely emit it when something about the source program makes that guarantee. It's not at all clear that C does so for an expression like (T*) ((intptr_t) x & N).

If they're all syntactically together like this, maybe that's safe?

Having them together syntactically doesn't really help, I think; it might be guarded by some code that does the same conversion (and if you repeat the conversion, it has to produce the same result).

If they're all syntactically together like this, maybe that's safe?

Having them together syntactically doesn't really help, I think; it might be guarded by some code that does the same conversion (and if you repeat the conversion, it has to produce the same result).

Indeed. That's correct (and also why the hasOneUse check at the IR level would have been ineffective). However...

I agree with Eli that this isn't obviously a legal transformation. llvm.ptrmask appears to make semantic guarantees about e.g. the pointer after the mask referring to the same underlying object, which means we can only safely emit it when something about the source program makes that guarantee. It's not at all clear that C does so for an expression like (T*) ((intptr_t) x & N).

I think that this is the key point. First, at the IR level we have a problem because we have no way to robustly track pointer provenance information. If we have if (a == b) { f(a); } the optimizer can transform this code into if (a == b) { f(b); } and we've lost track of whether the parameter to f is based on a or b. At the source level we don't have this problem (because we have the unaltered expressions provided by the user, and can therefore use whatever provenance information that source implies).

Thus, as John says, the question is whether, at the source level, (T*) ((intptr_t) x & N) always has, and only has, the same underlying objects as x - when executing the expression is well defined. In C++, I think that this is clearly true for implementations with "strict pointer safety" (6.6.5.4.3), as the rules for safely-derived pointer values state that, while you can get safely-derived pointer values using integer casts and bitwise operators, the result must be one that could have been safely derived from the original object using well-defined pointer arithmetic, and that's only true for pointers into some array pointed into by x (or one past the end). For implementations with "relaxed pointer safety", it's all implementation defined, so I don't see we couldn't choose our implementation-defined semantics to define this problem away (although we certainly need to be careful that we don't unintentionally make any significant body of preexisting code incompatible with Clang by doing so).

For C, we also need to be concerned with the definition of "based on" (6.7.3.1). In some philosophical sense, this seems trickier (i.e., what if modifying the value of x at some sequence point prior to the expression makes the expressions dead? Are we required, as part of the standardized through experiment, to also modify the other variables to keep the expression alive when performing the "based on" analysis, and do those modifications count for the purposes of determining the "based on" property?). Regardless, given that the intent is to enable optimizations, it seems reasonable to say that (T*) ((intptr_t) x & N) is only based on x. For C, 6.3.2.3 makes the conversion validity itself implementation defined.

@rsmith , thoughts on this?

The pointer/integer conversion is "implementation-defined", but it's not totally unconstrained. C notes that "The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.", and we do have to honor that. The standard allows that "the result ... might not point to an entity of the referenced type", but when in fact it's guaranteed to do so (i.e. it's not just a coincidental result of an implementation decision like the exact address of a global variable — no "guessing"), I do think we have an obligation to make it work. And on a practical level, there has to be *some* way of playing clever address tricks in the language in order to implement things like allocators and so forth. So this makes me very antsy.

If the general language rules are too permissive for some interesting optimization, it's fine to consider builtins that impose stronger restrictions on their use.

The pointer/integer conversion is "implementation-defined", but it's not totally unconstrained. C notes that "The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.", and we do have to honor that. The standard allows that "the result ... might not point to an entity of the referenced type", but when in fact it's guaranteed to do so (i.e. it's not just a coincidental result of an implementation decision like the exact address of a global variable — no "guessing"), I do think we have an obligation to make it work. And on a practical level, there has to be *some* way of playing clever address tricks in the language in order to implement things like allocators and so forth. So this makes me very antsy.

I don't disagree. But I believe the question is if we have:

int *x = malloc(4);
int *y = malloc(4);
if (x & ~15 == y) {
  *(x & ~15) = 5; // Is this allowed, and if so, must the compiler assume that it might set the value of *y?
}

I certainly agree that we must allow the implementation of allocators, etc. But allocators, I think, have the opposite problem. They actually have some large underlying objects (from mmap or whatever), and we want the rest of the system to treat some subobjects of these larger objects as though they were independent objects of some given types. From the point of view of the allocator, we have x, and we have void *memory_pool, and we need to allow x & N to point into memory_pool, but because, from the allocator's perspective, we never knew that x didn't point into memory_pool (as, in fact, it likely does), that should be fine (*).

There might be more of an issue, for example, if for a given object, I happen to know that there's some interesting structure at the beginning of its page (or some other boundary). If I also have a pointer to this structure via some other means, then maybe this will cause a problem. This kind of thing certainly falls outside of the C/C++ abstract machine, and I'd lean toward a flag for supporting it (not on by default). I'm assuming that this would be rare. If I'm wrong, then we shouldn't do this by default.

(*) We do have a problem if we inline the implementation of malloc, given how our noalias return attribute works, but that's a preexisting problem, and the malloc implementation should probably be compiled with -fno-builtin-malloc regardless.

If the general language rules are too permissive for some interesting optimization, it's fine to consider builtins that impose stronger restrictions on their use.

I agree.

Also, and I could be wrong, but my impression is that all of this is extra - this motivating use case requires generating the intrinsic from the code in lib/CodeGen/TargetInfo.cpp - generating it from C/C++ expressions is just a potential additional benefit.

fhahn added a comment.Jul 4 2019, 7:24 AM

Thanks for the quick responses and the helpful comments. Thank you very much Hal, for summarizing the argument from previous discussions. My initial understanding indeed was that by generating ptrmask directly for C/C++ expressions, we can circumvent the issues that come with ptrtoint/inttoptr in LLVM.

One key point that might not be too clear is that the question should be whether (T*) ((intptr_t) x & N) points to the same underlying object as x, iff the mask N preserves all 'relevant' bits of the pointer x. I am not sure if 'relevant' bits is the best term, but I use it to refer to all bits that do not have to be zero due to alignment requirements or pointer size restrictions. With that in mind, let me try to cover the possible cases in terms of C++'s safely-derived pointers, depending on x. (I'm referencing http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf)

  1. if x is a safely-derived pointer, then mask is a no-op and the result of the expression is a safely-derived pointer; as x was safely-derived, all bits that are masked out must already be 0, so according to 3.7.4.3.3, reinterpret_cast<void*>(x) should be equal to reinterpret_cast<void*>(((intptr_t) x & N)).
  1. if x is not a safely-derived pointer, but it becomes one after masking: then x must be the result of a series of bitwise operations, that only modify the bits masked out later by N. Otherwise the whole series of bitwise operations including the masking would violate 3.7.4.3.3 - the result of an additive or bitwise operation, one of whose operands is an integer representation of a safely-derived pointer value P, if that result converted by reinterpret_cast<void*> would compare equal to a safely-derived pointer computable from reinterpret_cast<void*>(P)
  1. if x is not a safely-derived pointer and the mask does not turn it into a safely-derived pointer: in that case, the masking should again not change the safely-derived property, and both would be invalid under strict pointer safety.

I think the key case is 2., where the mask operation is the last step in a series of bitwise operations, taking an integer representation of a safely-derived pointer value P and after masking we get P again. E.g. packing/unpacking bits of a tagged pointer (P | 1) & ~1. After writing all that down, there seems to be one problem though: technically we have a series of bitwise operations and the intermediate values are not integer values of safely-derived pointers. One could argue that the bitwise operations together cancel out each other and are a no-op, resulting in the original pointer.

Does this summary make sense?

fhahn added a comment.Jul 4 2019, 7:43 AM

The pointer/integer conversion is "implementation-defined", but it's not totally unconstrained. C notes that "The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.", and we do have to honor that. The standard allows that "the result ... might not point to an entity of the referenced type", but when in fact it's guaranteed to do so (i.e. it's not just a coincidental result of an implementation decision like the exact address of a global variable — no "guessing"), I do think we have an obligation to make it work. And on a practical level, there has to be *some* way of playing clever address tricks in the language in order to implement things like allocators and so forth. So this makes me very antsy.

I don't disagree. But I believe the question is if we have:

int *x = malloc(4);
int *y = malloc(4);
if (x & ~15 == y) {
  *(x & ~15) = 5; // Is this allowed, and if so, must the compiler assume that it might set the value of *y?
}

If the mask could change the 'relevant' bits of a pointer, like in this example, we would not generate a ptrmask call. We would only generate ptrmask calls here, if the mask only masks out bits that need to be zero due to alignment requirements (and high bits if the pointer size is limited).

Also, and I could be wrong, but my impression is that all of this is extra - this motivating use case requires generating the intrinsic from the code in lib/CodeGen/TargetInfo.cpp - generating it from C/C++ expressions is just a potential additional benefit.

Yes, adding it to clang is extra, my main motivation for the intrinsic was to improve performance of tagged pointers used by a different frontend. Generating the intrinsic from C/C++ would just be an additional benefit, to improve handling of tagged pointers and similar code and provide wider testing for the intrinsic. For the LLVM/Clang codebase, most cases we generate ptrmask with this patch come from PointerIntPair and those should be fairly easy to handle with a builtin, if we cannot generate it automatically.

The pointer/integer conversion is "implementation-defined", but it's not totally unconstrained. C notes that "The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.", and we do have to honor that. The standard allows that "the result ... might not point to an entity of the referenced type", but when in fact it's guaranteed to do so (i.e. it's not just a coincidental result of an implementation decision like the exact address of a global variable — no "guessing"), I do think we have an obligation to make it work. And on a practical level, there has to be *some* way of playing clever address tricks in the language in order to implement things like allocators and so forth. So this makes me very antsy.

I don't disagree. But I believe the question is if we have:

int *x = malloc(4);
int *y = malloc(4);
if (x & ~15 == y) {
  *(x & ~15) = 5; // Is this allowed, and if so, must the compiler assume that it might set the value of *y?
}

I certainly agree that we must allow the implementation of allocators, etc. But allocators, I think, have the opposite problem. They actually have some large underlying objects (from mmap or whatever), and we want the rest of the system to treat some subobjects of these larger objects as though they were independent objects of some given types. From the point of view of the allocator, we have x, and we have void *memory_pool, and we need to allow x & N to point into memory_pool, but because, from the allocator's perspective, we never knew that x didn't point into memory_pool (as, in fact, it likely does), that should be fine (*).

There might be more of an issue, for example, if for a given object, I happen to know that there's some interesting structure at the beginning of its page (or some other boundary).

This is what I was thinking about for allocators; this is a common implementation technique for free / realloc / malloc_size.

If I also have a pointer to this structure via some other means, then maybe this will cause a problem. This kind of thing certainly falls outside of the C/C++ abstract machine, and I'd lean toward a flag for supporting it (not on by default).

If you mean a theoretical minimal C abstract machine that does not correspond to an actual target and is therefore not bound by any of the statements in the C standard that say things like "this is expected to have its obvious translation on the target", then yes, I completely agree. If you're talking about the actual C programming language that does correspond to actual targets, then it's not clear at all that it's outside the C abstract machine, because AFAICT integer-pointer conversions are (1) well-specified on specific targets by this de facto requirement of corresponding directly to pointer representations and (2) well-behaved as long as the integer does correspond to the address of an actual object of that type.

Also, please understand that compiler writers have been telling our users for decades that (1) pointer arithmetic is subject to some restrictions on penalty of UB and (2) they can avoid those restrictions by using pointer-integer conversions and doing integer arithmetic instead. So any proposal to weaken the latter as a workaround makes me very worried, especially if it's also enforcing alignment restrictions that we've generally chosen not to enforce when separated from actual memory accesses.

Also, and I could be wrong, but my impression is that all of this is extra - this motivating use case requires generating the intrinsic from the code in lib/CodeGen/TargetInfo.cpp - generating it from C/C++ expressions is just a potential additional benefit.

I agree that we could use this intrinsic there safely, with the "object" being the variadic arguments area of the original va_list.

The pointer/integer conversion is "implementation-defined", but it's not totally unconstrained. C notes that "The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.", and we do have to honor that. The standard allows that "the result ... might not point to an entity of the referenced type", but when in fact it's guaranteed to do so (i.e. it's not just a coincidental result of an implementation decision like the exact address of a global variable — no "guessing"), I do think we have an obligation to make it work. And on a practical level, there has to be *some* way of playing clever address tricks in the language in order to implement things like allocators and so forth. So this makes me very antsy.

I don't disagree. But I believe the question is if we have:

int *x = malloc(4);
int *y = malloc(4);
if (x & ~15 == y) {
  *(x & ~15) = 5; // Is this allowed, and if so, must the compiler assume that it might set the value of *y?
}

I certainly agree that we must allow the implementation of allocators, etc. But allocators, I think, have the opposite problem. They actually have some large underlying objects (from mmap or whatever), and we want the rest of the system to treat some subobjects of these larger objects as though they were independent objects of some given types. From the point of view of the allocator, we have x, and we have void *memory_pool, and we need to allow x & N to point into memory_pool, but because, from the allocator's perspective, we never knew that x didn't point into memory_pool (as, in fact, it likely does), that should be fine (*).

There might be more of an issue, for example, if for a given object, I happen to know that there's some interesting structure at the beginning of its page (or some other boundary).

This is what I was thinking about for allocators; this is a common implementation technique for free / realloc / malloc_size.

If I also have a pointer to this structure via some other means, then maybe this will cause a problem. This kind of thing certainly falls outside of the C/C++ abstract machine, and I'd lean toward a flag for supporting it (not on by default).

If you mean a theoretical minimal C abstract machine that does not correspond to an actual target and is therefore not bound by any of the statements in the C standard that say things like "this is expected to have its obvious translation on the target", then yes, I completely agree. If you're talking about the actual C programming language that does correspond to actual targets, then it's not clear at all that it's outside the C abstract machine, because AFAICT integer-pointer conversions are (1) well-specified on specific targets by this de facto requirement of corresponding directly to pointer representations and (2) well-behaved as long as the integer does correspond to the address of an actual object of that type.

Also, please understand that compiler writers have been telling our users for decades that (1) pointer arithmetic is subject to some restrictions on penalty of UB and (2) they can avoid those restrictions by using pointer-integer conversions and doing integer arithmetic instead.

Indeed, we have.

So any proposal to weaken the latter as a workaround makes me very worried, especially if it's also enforcing alignment restrictions that we've generally chosen not to enforce when separated from actual memory accesses.

I *think* that what @fhahn points out about the masking restrictions addresses the concerns that we were discussing: My understanding of the potentially-problematic cases here are when the masking could get you from one variable from one underlying object (that you might access) to another variable in a different underlying object (that you might also access), and given the masking restrictions, this is impossible (*): because the high bits can't be used for addressing, and for the lower bits, these fall within the alignment of the type, and so in order for that to move you between underlying objects, the objects would need to be smaller than the type alignment.

(*) I suppose one can arrange for this to break something given some system-specific setup:

// note: the linker script ensures that these are packed and adjacent.
short a;
struct {
  short b, c;
} s;

...
int *ib = (int *) s.b;
int *ia = (int *) (((size_t) ib) & ~1);
 a = 6;
(*(short *) ia) = 5;
return a; // oops, aliasing will now enable us to return 6 here.

The probability that someone is doing that seems low, because they need to be doing the masking and casting with a type with larger alignment requirements than the type they'll actually access.

On the other hand, using the otherwise-unused bits in the pointer to store information, and then mask to get the pointer back, is pretty common. In any case, I recommend that we add a "-f" flag for this.

Also, and I could be wrong, but my impression is that all of this is extra - this motivating use case requires generating the intrinsic from the code in lib/CodeGen/TargetInfo.cpp - generating it from C/C++ expressions is just a potential additional benefit.

I agree that we could use this intrinsic there safely, with the "object" being the variadic arguments area of the original va_list.

I would be opposed to any addition of a -f flag for this absent any evidence that it's valuable for some actual C code; this patch appears to be purely speculative. I certainly don't think we should be adding it default-on. Your argument about alignment is what I was referring to when I mentioned that this is enforcing alignment restrictions in places we traditionally and intentionally haven't.

Note that it won't help Clang's major uses of PointerIntPair and PointerUnion because our traits say that types like Decl and Type have more alignment bits than their formal alignment would indicate, and PointerIntPair occupies alignment bits starting with the most significant.

I would be opposed to any addition of a -f flag for this absent any evidence that it's valuable for some actual C code; this patch appears to be purely speculative. I certainly don't think we should be adding it default-on. Your argument about alignment is what I was referring to when I mentioned that this is enforcing alignment restrictions in places we traditionally and intentionally haven't.

My underlying thought here is: The more we generate a particular IR construct the more quickly we'll find the bugs and the better we'll end up optimizing it. Thus, I found the idea behind this patch appealing. My experience is also that providing more pointer underlying-object information tends to help code quality. However, I definitely see your point that this is potentially risky: bugs uncovered by generating this intrinsic based on this code pattern might reveal only code doing interesting things with pointer bits and not actual optimizer bugs. I think that we have different feelings on the magnitude of that risk, but I definitely glad that you weighed in here.

I suggest that we drop this unless and until we have benchmark data showing it to be useful. If we do, then we can consider an off-by-default flag. It will also be interesting to explore adding a Clang intrinsic.

Note that it won't help Clang's major uses of PointerIntPair and PointerUnion because our traits say that types like Decl and Type have more alignment bits than their formal alignment would indicate, and PointerIntPair occupies alignment bits starting with the most significant.

fhahn added a comment.Jul 9 2019, 10:08 AM

I would be opposed to any addition of a -f flag for this absent any evidence that it's valuable for some actual C code; this patch appears to be purely speculative. I certainly don't think we should be adding it default-on. Your argument about alignment is what I was referring to when I mentioned that this is enforcing alignment restrictions in places we traditionally and intentionally haven't.

Note that it won't help Clang's major uses of PointerIntPair and PointerUnion because our traits say that types like Decl and Type have more alignment bits than their formal alignment would indicate, and PointerIntPair occupies alignment bits starting with the most significant.

In total with this patch, we generate ~180 ptrmask calls when building llvm/clang. As mentioned above, we miss out a bunch of cases, because of PointerIntPair putting its data into the most significant bits. I guess we could cover those cases with a builtin and the main motivation for this patch mostly goes away. Do we have any policies against using clang-only builtins in the codebase?

Anyways, thanks for all the feedback and discussion, it was really helpful!

I wouldn't favor adding something really obscure that was only useful for clang, but I think builtins to set and clear mask bits while promising to preserve object-reference identity would be more generally useful for libraries. For example, there might be somewhere in libc++ that could take advantage of this.

Do we have any policies against using clang-only builtins in the codebase?

No, but obviously it will need to be protected with appropriate ifdefs and integrated in some maintainable fashion.

I wouldn't favor adding something really obscure that was only useful for clang, but I think builtins to set and clear mask bits while promising to preserve object-reference identity would be more generally useful for libraries. For example, there might be somewhere in libc++ that could take advantage of this.

Yep, I think for this to be generally useful we would also need an intrinsic to set bits as well. I'll commit the LLVM patches and improvements to ValueTracking in the next few days for ptrmask and will put up a patch for builtins once I know more about how libcxx could use them.