The attached patch implements an approach to supporting garbage collection in LLVM that has been mentioned on the mailing list a number of times by now. There are a couple of issues that need to be addressed before submission, but I wanted to get this up to give maximal time for review.
The statepoint intrinsics are intended to enable precise root tracking through the compiler so as to support garbage collectors of all types. Our testing to date has focused on fully relocating collectors (where pointers can change at any safepoint poll or call site), but the infrastructure should support collectors of other styles. The addition of the statepoint intrinsics to LLVM should have no impact on the compilation of any program which does not contain them. There are no side tables created, no extra metadata, and no inhibited optimizations.
A statepoint works by transforming a call site (or safepoint poll site) into an explicit relocation operation. It is the frontend's responsibility (or eventually that of the safepoint insertion pass we've developed, but that's not part of this patch) to ensure that any live pointer to a GC object is correctly added to the statepoint and explicitly relocated. The relocated value is just a normal SSA value (as seen by the optimizer), so merges of relocated and unrelocated values are just normal phis. The explicit relocation operation, the fact that the statepoint is assumed to clobber all memory, and the optimizer's standard semantics together ensure that the relocations flow through IR optimizations correctly.
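As a rough illustration of the explicit relocation model described above, here is a toy Python sketch. It is not part of the patch and does not use any LLVM API; `Heap`, `statepoint`, and `relocate` are hypothetical names invented for this example. The point it tries to show is that a caller must stop using the old pointer after the statepoint and use the relocated value instead.

```python
# Toy model only. Heap, statepoint, and relocate are hypothetical names
# made up for this sketch; they are not LLVM intrinsics or APIs.

class Heap:
    """A tiny heap whose collector moves every surviving object."""

    def __init__(self):
        self.objects = {}        # address -> payload
        self.next_addr = 0x1000

    def alloc(self, payload):
        addr = self.next_addr
        self.next_addr += 0x10
        self.objects[addr] = payload
        return addr

    def collect_and_move(self, roots):
        """Move each rooted object to a fresh address.

        Returns a forwarding map (old address -> new address). Objects
        that are not rooted are dropped.
        """
        forwarding = {}
        survivors = {}
        for old in roots:
            if old is None or old in forwarding:
                continue
            new = self.next_addr
            self.next_addr += 0x10
            survivors[new] = self.objects[old]
            forwarding[old] = new
        self.objects = survivors
        return forwarding


def statepoint(heap, callee, live_pointers):
    """Run `callee` at a safepoint and return (result, token).

    The token carries the relocated value of every live pointer that the
    caller listed. In this toy model the collector always runs and always
    moves objects, which is the worst case for the caller.
    """
    result = callee()
    forwarding = heap.collect_and_move(live_pointers)
    token = [None if p is None else forwarding[p] for p in live_pointers]
    return result, token


def relocate(token, index):
    """Explicit relocation: fetch the new value of the index-th live pointer."""
    return token[index]
```

A caller would allocate an object, pass it in the live pointer list, and then read it back only through `relocate(token, 0)`. The original address is stale after the call, which is the property the explicit relocation is meant to make visible to the optimizer.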
During the lowering process, we currently spill aggressively to the stack. This is not ideal (and we have plans to do better), but it's functional, relatively straightforward, and closely matches the implementation of the patchpoint intrinsics. We leverage the existing StackMap section format, which is already used by the patchpoint intrinsics, to report where pointer values live. Unlike a patchpoint, these locations are known (by the backend) to be writeable during the call. This enables the garbage collector to transparently read and update pointer values if required. We do optimize lowering in certain well-known cases (constant pointers, a.k.a. null, being the key one).
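The spill-and-report scheme above can be pictured with a small Python model. This is a sketch under loud assumptions: it does not reproduce the real StackMap binary layout, and `Frame`, `lower_statepoint`, `collector_visit`, and `reload`, along with the `"stack"` and `"constant"` tags, are hypothetical names chosen for illustration.

```python
# Toy model only. The record format below is invented for this sketch and
# is NOT the StackMap section layout; all names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Frame:
    # stack offset -> raw pointer value currently stored in that slot
    slots: dict = field(default_factory=dict)


def lower_statepoint(frame, live_values, first_offset=0, slot_size=8):
    """Spill each live pointer to its own slot and describe where it lives.

    Null (modeled as 0) is a known constant, so it is reported as a
    constant and never spilled.
    """
    record = []
    offset = first_offset
    for value in live_values:
        if value == 0:
            record.append(("constant", 0))
        else:
            frame.slots[offset] = value
            record.append(("stack", offset))
            offset += slot_size
    return record


def collector_visit(frame, record, forwarding):
    """What a relocating collector might do: rewrite spill slots in place."""
    for kind, where in record:
        if kind == "stack":
            old = frame.slots[where]
            frame.slots[where] = forwarding.get(old, old)


def reload(frame, record):
    """After the call, read each value back from where the record says it is."""
    return [0 if kind == "constant" else frame.slots[where]
            for kind, where in record]
```

Because the slots are writeable during the call, the collector can update them without the compiled code knowing whether anything moved; the code simply reloads after the call.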
There are a few areas of this patch which could use improvement:
- The test coverage could be improved. Most of the tests we've actually been using are built on top of the safepoint insertion mechanism (not included here) and our runtime. We need to improve the IR-level tests for optimizer semantics (i.e. not doing illegal transforms) and for lowering. There are some minimal tests in place for the lowering of simple statepoints.
- The documentation needs revision, but should be reasonably complete.
- Many functions are missing Doxygen comments.
- There's a hack in place to force the use of RSP+Offset addressing rather than RBP-Offset addressing for references in the StackMap section. This works and shouldn't break anyone else, but should definitely be cleaned up. The choice of addressing preference should be up to the runtime.
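For readers unfamiliar with the RSP+Offset versus RBP-Offset distinction in the last item, the two forms name the same slot whenever the distance between the two registers is fixed at the call site. The Python sketch below shows only that arithmetic. It assumes a frame with no dynamic stack adjustment, and `rbp_to_rsp_offset` and its parameters are hypothetical names, not anything in the patch.

```python
# Illustrative arithmetic only; the function and parameter names are
# hypothetical and the fixed-distance frame layout is an assumption.

def rbp_to_rsp_offset(rbp_offset, rsp_to_rbp_distance):
    """Convert an RBP-relative offset to an RSP-relative one.

    Assumes RBP == RSP + rsp_to_rbp_distance at the call site, so
        address = RBP + rbp_offset
                = RSP + (rsp_to_rbp_distance + rbp_offset)
    """
    rsp_offset = rsp_to_rbp_distance + rbp_offset
    if rsp_offset < 0:
        raise ValueError("slot would lie below the stack pointer")
    return rsp_offset
```

With a 32-byte distance, for example, a slot at RBP-8 is the same slot as RSP+24.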
When reviewing, I would greatly appreciate feedback on which issues need to be fixed before submission and which can be addressed afterwards. It is my plan to actively maintain and enhance this infrastructure over the next few months (and years). It's already been developed out of tree for entirely too long (our fault!), and I'd like to move to incremental work in tree as quickly as feasible.
Planned enhancements after submission:
- The ordering of arguments in statepoints is essentially historical cruft at this point. I'm open to suggestions on how to make this more approachable. Reordering arguments would (preferably) be a post-commit action.
- Support for relocatable pointers in callee-saved registers over call sites. This will require the notion of an explicit relocation pseudo op and support for it throughout the backend (particularly the register allocator).
- Optimizations for non-relocating collectors. For example, the clobber semantics of the spill slots aren't needed if the collector isn't relocating roots.
- Further optimizations to reduce the cost of spilling around each statepoint (when required at all).
- Support for invokable statepoints.
- Once this has baked in tree for a while, I plan to delete the existing gc_root code. It is unsound, and essentially unused.
In addition to the enhancements to the infrastructure in the currently proposed patch, we're also working on a number of follow-up changes:
- Verification passes to confirm that safepoints were inserted in a semantically valid way (i.e. no memory access through an unrelocated value after a safepoint has been inserted).
- A transformation pass to convert naive IR to include both safepoint polling sites, and statepoints on every non-leaf call. This transformation pass can be used at initial IR creation time to simplify the frontend authors' work, but is also designed to run on *fully optimized* IR, provided the initial IR meets certain (fairly loose) restrictions.
- A transformation pass to convert normal loads and stores into user provided load and store barriers.
- Further optimizations to reduce the number of safepoints required, and improve the infrastructure as a whole.
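To give a flavor of what the verification item in this list is checking, here is a minimal Python sketch over a straight-line instruction list. It is not the verifier we are developing: a real pass would need dataflow over the full CFG, and the tuple-based instruction encoding and the name `find_stale_uses` are assumptions made up for this example.

```python
# Toy checker only. The instruction encoding is invented for this sketch:
#   ("def", name)                  defines a GC pointer
#   ("use", name)                  accesses memory through a GC pointer
#   ("safepoint", {old: new, ...}) relocates `old` to the new name `new`
# It handles straight-line code only (no branches, no phis).

def find_stale_uses(instructions):
    """Return (index, name) for every use of a pointer that a safepoint
    has invalidated."""
    valid = set()   # pointers that are safe to use right now
    stale = set()   # pointers invalidated by an earlier safepoint
    errors = []
    for index, inst in enumerate(instructions):
        op = inst[0]
        if op == "def":
            valid.add(inst[1])
        elif op == "use":
            if inst[1] in stale:
                errors.append((index, inst[1]))
        elif op == "safepoint":
            relocations = inst[1]
            # Everything live before the safepoint may have moved; only
            # the explicitly relocated names are usable afterwards.
            stale |= valid
            valid = set(relocations.values())
    return errors
```

A sequence that defines `p`, relocates it to `p.rel` at a safepoint, and then uses `p.rel` passes the check. Using `p` after the safepoint is reported as an error at that instruction.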
We've been working on these topics for a while, but the follow-on patches aren't quite as mature as what's being proposed now. Once these pieces stabilize a bit, we plan to upstream them as well. For those who are curious, our work on those topics is available here: https://github.com/AzulSystems/llvm-late-safepoint-placement