This is an archive of the discontinued LLVM Phabricator instance.

[RS4GC] Treat inttoptr as base pointer
ClosedPublic

Authored by tolziplohu on Jun 1 2021, 2:33 PM.

Details

Summary

Currently, RewriteStatepointsForGC doesn't support the inttoptr instruction.
This is a problem, because some optimizations can generate inttoptrs; for example, it may inline an memcpy followed by a load into an inttoptr.
Before optimization, the result of the load would be treated as a base pointer, but after, it wouldn't accept it.
On a build with assertions, it would crash with an error; on a release build, it would enter an infinite loop.
This patch makes RS4GC treat inttoptr instructions as base pointers, so the behavior after optimization is the same as before.

It can also be useful for (semi-)conservative garbage collectors for languages with polymorphism, like OCaml.
Integers may be marked as non-pointers with a high or low set bit, then casted to a GC pointer and passed to a polymorphic function.
After this patch, that's possible to do with just an inttoptr.

Diff Detail

Event Timeline

tolziplohu created this revision.Jun 1 2021, 2:33 PM
tolziplohu requested review of this revision.Jun 1 2021, 2:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2021, 2:33 PM
reames requested changes to this revision.Jun 1 2021, 2:46 PM

This is unsound. If you have a test case where the optimizer is generating an inttoptr for a non-integral pointer from well defined IR, please share it so that we can fix the bug.

I'll note that I might be convinced to take something like this, but only with a clear comment describing it as explicit undefined behavior. (i.e. not safe to rely on) I do see an argument for robustness (or reliably crashing) in release builds though. fatal_error might be an alternate approach.

This revision now requires changes to proceed.Jun 1 2021, 2:46 PM
tolziplohu added a comment.EditedJun 1 2021, 3:51 PM

It tuns out the problem only occurs when addrspace(1) isn't specified as non-integral, because inttoptr instructions are ill-typed when it's non-integral. But when addrspace(1) isn't specified as non-integral, RewriteStatepointsForGC still assumes that it is, and that inttoptr isn't allowed. If it's not non-integral, statepoints are only well-defined if the collector is non-moving, so inttoptr isn't a problem. So RS4GC only encounters inttoptr if GC pointers are integral, and thus the collector isn't moving (or the user is risking UB), and accepting it isn't a problem.

Regardless, I still think there needs to be some way to represent an integer as a pointer even with moving collectors, for things like the polymorphism example I gave. In that case, it's not necessary for it to work with actual, valid pointers, only for integer -> pointer -> integer conversion to work, assuming nothing is done with the pointer in the middle and the collector doesn't move it. This is actually possible with non-integral pointers via an addrspacecast before ptrtoint and after inttoptr, which I believe is technically UB but works fine if RS4GC accepts it and the collector knows to ignore it.

reames added a comment.Jun 3 2021, 9:35 AM

It tuns out the problem only occurs when addrspace(1) isn't specified as non-integral, because inttoptr instructions are ill-typed when it's non-integral. But when addrspace(1) isn't specified as non-integral, RewriteStatepointsForGC still assumes that it is, and that inttoptr isn't allowed.

Er, I think you've got a deeper issue here. RS4GC is intended to lower from the abstract machine model to the physical machine model. In the abstract machine model, references (i.e. gc pointers) *must* be non-integral. You don't have to use address space 1 (there are code hooks to change that), but abstract machine model requires non-integral pointers for the address space being rewritten.

If it's not non-integral, statepoints are only well-defined if the collector is non-moving, so inttoptr isn't a problem. So RS4GC only encounters inttoptr if GC pointers are integral, and thus the collector isn't moving (or the user is risking UB), and accepting it isn't a problem.

This is wrong. The problem is that the optimizer can hide a pointer, not the relocation.

Example:
%i = ptrtoint %p
safepoint
%p2 = inttoptr %i
use (%p2)

In this example, the collector might collect the object pointed to by %p as it is not live over the safepoint. This holds for both relocation and non-relocating collectors.

Regardless, I still think there needs to be some way to represent an integer as a pointer even with moving collectors, for things like the polymorphism example I gave. In that case, it's not necessary for it to work with actual, valid pointers, only for integer -> pointer -> integer conversion to work, assuming nothing is done with the pointer in the middle and the collector doesn't move it. This is actually possible with non-integral pointers via an addrspacecast before ptrtoint and after inttoptr, which I believe is technically UB but works fine if RS4GC accepts it and the collector knows to ignore it.

This is a really really hard design problem. I'm happy to discuss this offline, and I'm open to having a rule for handling inttoptr with explicit calls outs on the UB, but I'm worried you are misunderstanding the problem and are going to find yourself down a much deeper rabbit hole than you expect.

I think you've got a deeper issue here. RS4GC is intended to lower from the abstract machine model to the physical machine model. In the abstract machine model, references (i.e. gc pointers) *must* be non-integral. You don't have to use address space 1 (there are code hooks to change that), but abstract machine model requires non-integral pointers for the address space being rewritten.

You're right, I'm wrong. The statepoint semantics require non-integral pointers no matter what. I think RS4GC should make sure that the address space it uses has been specified non-integral and give an error if it hasn't, as this seems like an easy trap to fall into.

This is a really really hard design problem. I'm happy to discuss this offline, and I'm open to having a rule for handling inttoptr with explicit calls outs on the UB, but I'm worried you are misunderstanding the problem and are going to find yourself down a much deeper rabbit hole than you expect.

It is possible I'm misunderstanding, so here's how I see the problem: converting a GC pointer to an integer and then to a pointer again is UB and bad, because the collector can't see it in between. However, many languages require pretending an integer is a pointer. In these cases, it's not actually a valid pointer, and it's not a problem if the GC can't see it; the problem is, it's hard to make that work without accidentally making it possible to hide a real pointer as an integer. I see three main options for dealing with this:

  1. Do nothing, leave it as-is. In this case, the way to do something like this is to hide the inttoptr along with an addrspacecast to the non-integral address space in a noinline function without GC. This works, but is bad for optimization and performance in general, since an inttoptr should ideally be a no-op.
  2. Allow inttoptr (and addrspacecast, which is currently allowed in release builds but not debug builds) in RS4GC, with a note that accessing memory through a pointer created by inttoptr is invalid (a linter to catch simple cases of that might be a good idea?). This is ideal for optimization and performance, since it lowers to a no-op and many optimization passes can see through it, but it's easier for frontend authors to miss and create invalid casts.
  3. Add conversion intrinsics, something like declare i8 addrspace(1)* @llvm.gc.inttoptr(i64). This is harder for frontend authors to screw up, since to find the intrinsic they'll be looking at the docs which will specify that the returned pointer isn't a valid pointer. From a performance perspective, it's in between the other two options, because while it lowers to a no-op, it's opaque to optimizations.

The third option sounds most attractive to me, now that I better understand the risks of the second option. Does my evaluation of the problem sound accurate?

reames added a comment.Jun 4 2021, 2:35 PM

I think RS4GC should make sure that the address space it uses has been specified non-integral and give an error if it hasn't, as this seems like an easy trap to fall into.

Good idea, post a patch and it'd be a quick LGTM.

This is a really really hard design problem. I'm happy to discuss this offline, and I'm open to having a rule for handling inttoptr with explicit calls outs on the UB, but I'm worried you are misunderstanding the problem and are going to find yourself down a much deeper rabbit hole than you expect.

It is possible I'm misunderstanding, so here's how I see the problem: converting a GC pointer to an integer and then to a pointer again is UB and bad, because the collector can't see it in between. However, many languages require pretending an integer is a pointer. In these cases, it's not actually a valid pointer, and it's not a problem if the GC can't see it; the problem is, it's hard to make that work without accidentally making it possible to hide a real pointer as an integer. I see three main options for dealing with this:

  1. Do nothing, leave it as-is. In this case, the way to do something like this is to hide the inttoptr along with an addrspacecast to the non-integral address space in a noinline function without GC. This works, but is bad for optimization and performance in general, since an inttoptr should ideally be a no-op.
  2. Allow inttoptr (and addrspacecast, which is currently allowed in release builds but not debug builds) in RS4GC, with a note that accessing memory through a pointer created by inttoptr is invalid (a linter to catch simple cases of that might be a good idea?). This is ideal for optimization and performance, since it lowers to a no-op and many optimization passes can see through it, but it's easier for frontend authors to miss and create invalid casts.
  3. Add conversion intrinsics, something like declare i8 addrspace(1)* @llvm.gc.inttoptr(i64). This is harder for frontend authors to screw up, since to find the intrinsic they'll be looking at the docs which will specify that the returned pointer isn't a valid pointer. From a performance perspective, it's in between the other two options, because while it lowers to a no-op, it's opaque to optimizations.

The third option sounds most attractive to me, now that I better understand the risks of the second option. Does my evaluation of the problem sound accurate?

All of this depends a lot on the language.

A few quick examples:

  • You could spell a "pin" operator which prevents an object from being moved, and an "unpin" that releases. Within that dynamic scope, it's fine to loose copies of the pointer, provided at least one isn't lost. In this world, the conversion operator only exists from reference to integer, not the other way around. Such a conversion can be hoisted no further than the containing pin operation.
  • For that same language with a non-relocating GC, you can let the conversion float above the pin.
  • A language could allow a "create object from raw memory" mechanism, with all the invariants about a) having valid object state, and b) having the creation object being a *move* of the raw pointer. (e.g. the raw pointer can't survive the creation.) In this model, we'd need the optimizer never to look back through the conversion operator. (e.g. a transform which did convert(p)->f to p->f would be wrong.)

I have yet to find/come up with a sufficiently general model to describe all reasonable variants. At the moment, I'd suggest your variant (1), but as you note, that impacts optimization. Are there concrete examples you're trying to optimize?

Warning: I'm really bad at responding promptly to long design discussions in email. I strongly advise setting up a time to talk offline.

Good idea, post a patch and it'd be a quick LGTM.

Okay, here it is. It turns out most of the existing test cases for RS4GC actually forgot to mark addrspace(1) as non-integral as well, and one even uses a bunch of inttoptrs.

A few quick examples:

  • You could spell a "pin" operator which prevents an object from being moved, and an "unpin" that releases. Within that dynamic scope, it's fine to loose copies of the pointer, provided at least one isn't lost. In this world, the conversion operator only exists from reference to integer, not the other way around. Such a conversion can be hoisted no further than the containing pin operation.
  • For that same language with a non-relocating GC, you can let the conversion float above the pin.
  • A language could allow a "create object from raw memory" mechanism, with all the invariants about a) having valid object state, and b) having the creation object being a *move* of the raw pointer. (e.g. the raw pointer can't survive the creation.) In this model, we'd need the optimizer never to look back through the conversion operator. (e.g. a transform which did convert(p)->f to p->f would be wrong.)

I have yet to find/come up with a sufficiently general model to describe all reasonable variants. At the moment, I'd suggest your variant (1), but as you note, that impacts optimization. Are there concrete examples you're trying to optimize?

My current use case is what I alluded to in the first post: I want to pass something like an i32 to a polymorphic function, without allocating. So I mark the lowest bit and do an inttoptr before passing it to the function. This is the same strategy OCaml uses. So if LLVM inlines the polymorphic function, I'd ideally like it to be able to cancel out the inttoptr -> ptrtoint.

Something else I've noticed a need for is something like declare i8 addrspace(1)* @llvm.gc.is_base_ptr(i8 addrspace(1)*), which would just return its argument, to wrap around the return value of my allocation function. I'd like that to be inlined, but then LLVM finds the wrong base pointer, so currently I have to have another noinline function to pass the return value through. This intrinsic would probably also work for your "create object from raw memory" example, although I'm not sure how to tell the optimizer that the previous pointer is no longer valid. For your other examples, I'm not sure I fully understand the scenario, but converting from a reference to an integer and not back again is always safe, isn't it?

Warning: I'm really bad at responding promptly to long design discussions in email. I strongly advise setting up a time to talk offline.

What do you mean by "offline"? A video call or something? I'd be open to that, when you have time.

This revision was not accepted when it landed; it landed in state Needs Revision.Jun 7 2021, 10:27 AM
This revision was automatically updated to reflect the committed changes.

When reviewing your https://reviews.llvm.org/D103732 (in particular, the constants.ll test which had inttoptr constant expressions), I realized that this patch is necessary due to unreachable code. I went ahead and landed a slightly tweaked version of this change. I believe you should now be able to rebase your D103732 without needing to XFAIL a test.

p.s. I think your use case may still be problematic, but let's talk more on that. Maybe I'm wrong there too. :)

When reviewing your https://reviews.llvm.org/D103732 (in particular, the constants.ll test which had inttoptr constant expressions), I realized that this patch is necessary due to unreachable code. I went ahead and landed a slightly tweaked version of this change. I believe you should now be able to rebase your D103732 without needing to XFAIL a test.

I tried it, but because inttoptr isn't allowed with non-integral pointers and RS4GC doesn't handle addrspacecast, both constants.ll and base-inttoptr.ll can't be rewritten to use non-integral pointers, and both fail with https://reviews.llvm.org/D103732. Actually, with the check for inttoptr in the place you have it now, it will never actually trigger with https://reviews.llvm.org/D103732 because it must be behind an addrspacecast. Moving the check to inside the if but before the address space check would work but I'm not sure if it would have the right semantics; I have another idea below.

p.s. I think your use case may still be problematic, but let's talk more on that. Maybe I'm wrong there too. :)

I actually had another idea which would solve my problems, and also solves the problem above where the tests are failing because addrspacecast isn't allowed. If you want we can wait to discuss it through another medium, but otherwise here it is:

You mentioned a "create object from raw memory" operation, and I realized that that operation is enough to implement all these things, and the semantics make sense. You can do whatever you want to construct a raw pointer - inttoptr, constants, allocate from a global slab - and then you call something like i8 addrspace(1)* @llvm.gc.create_gc_pointer(i8*), which is really just an addrspacecast. When you call that intrinsic, you promise that either the pointer is valid, or you'll never access memory through it and the GC knows to ignore it. You also promise that you won't use the old raw pointer anymore, and neither will any other code, but in practice that's not necessary if you've pinned the object or have a non-moving collector and the GC doesn't deallocate it. This doesn't prevent the frontend from using the raw pointer after calling the intrinsic, but that would be UB, and the optimizer will never move loads or stores to the raw pointer past the intrinsic because the intrinsic might read or write the memory behind the raw pointer.

Any inttoptrs must be behind an addrspacecast anyway, and if a "create object from raw memory" operation is allowed at all then casting an integer to a pointer first must be allowed too. It also solves my allocation problem: the allocation block is a raw pointer, because the memory is manually managed. When an object is allocated, the logic is done on the raw pointer and then the final object pointer is converted to a GC pointer, and it starts being "managed", and RS4GC knows that's a base pointer.

For those following along, the last comment from Sam convinced me to do something I've been considering for a while. In commit ac81cb7e, I removed the verifier rules which made it impossible to properly test the logic in rs4gc with non-integral pointers and went ahead and updated the relevant tests.

Sam and I are scheduled for some offline discussion to see if we can come up with a) a robust solution for his use case, and b) maybe something generally useful in the space of interacting with gc managed objects, pinned objects, and allocation code paths.