diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -2554,6 +2554,52 @@
 which specialized optimization passes may use to implement type-based alias
 analysis.
 
+The inttoptr hole
+^^^^^^^^^^^^^^^^^
+
+The pointer aliasing rules do not currently have a consistent interpretation
+in LLVM. The problem lies in the semantics of ``inttoptr`` and of loads that
+produce a pointer: it is not clear what it means for a value to "contribute"
+to the computation of such a pointer, and therefore which pointer values it
+is *based* on.
+
+Suppose we take the strictest possible interpretation: all computations and
+control flow are relevant to whether a value "contributes". Then a pointer
+produced by ``inttoptr`` is based on every pointer that has escaped at that
+point, even if we can prove that its value equals the address of one specific
+object. This makes several transforms that LLVM currently performs illegal,
+because replacing a pointer with one that is based on fewer pointer values can
+turn a well-defined memory access into undefined behavior. For example, an
+``inttoptr`` of a ``ptrtoint`` can't be simplified to the operand of the
+``ptrtoint``. Similarly, if a pointer is stored to memory and then loaded
+back from the same address, the load can't be simplified to the value operand
+of the store.
+
+There are several ways this could be relaxed. The most likely solution is some
+sort of invisible provenance indicator. At its core, this says that if a store
+writes a pointer value and a load then reads that value back, the load is
+based only on the value operand of the store that wrote it, not on any other
+escaped pointer. This description leaves many open questions about how
+pointer operations interact with non-pointer operations: for example, what
+provenance a pointer value has after it is stored to memory and reloaded as an
+integer.
+
+Another possibility is to drop the notion of "based on" entirely and build
+alias analysis around ``inbounds`` instead. Suppose we had a stricter version
+of ``inbounds`` that did not allow computing the address one past the end of
+an object.
+Then we end up with a fairly simple model: pointers themselves are just
+integers, but ``inbounds`` GEPs still preserve something roughly equivalent
+to "based on". The problem is that the current ``inbounds`` allows a pointer
+one past the end of an object, and that address may coincide with the
+address of a different object, which makes the analysis considerably more
+complicated.
+
+LLVM bug 34548 (https://bugs.llvm.org/show_bug.cgi?id=34548) discusses related
+issues in LLVM, and WG14 paper N2311
+(http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2311.pdf) covers related
+issues in the C standard.
+
 .. _volatile:
 
 Volatile Memory Accesses