Statepoint infrastructure for garbage collection
ClosedPublic

Authored by reames on Oct 8 2014, 2:24 PM.

Details

Summary

The attached patch implements an approach to supporting garbage collection in LLVM that has been mentioned on the mailing list a number of times by now. There's a couple of issues that need to be addressed before submission, but I wanted to get this up to give maximal time for review.

The statepoint intrinsics are intended to enable precise root tracking through the compiler as to support garbage collectors of all types. Our testing to date has focused on fully relocating collectors (where pointers can change at any safepoint poll, or call site), but the infrastructure should support collectors of other styles. The addition of the statepoint intrinsics to LLVM should have no impact on the compilation of any program which does not contain them. There are no side tables created, no extra metadata, and no inhibited optimizations.

A statepoint works by transforming a call site (or safepoint poll site) into an explicit relocation operation. It is the frontend's responsibility (or eventually the safepoint insertion pass we've developed, but that's not part of this patch) to ensure that any live pointer to a GC object is correctly added to the statepoint and explicitly relocated. The relocated value is just a normal SSA value (as seen by the optimizer), so merges of relocated and unrelocated values are just normal phis. The explicit relocation operation, the fact the statepoint is assumed to clobber all memory, and the optimizers standard semantics ensure that the relocations flow through IR optimizations correctly.

During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straight forward, and matches closely the implementations of the patchpoint intrinsics. We leverage the existing StackMap section format, which is already used by the patchpoint intrinsics, to report where pointer values live. Unlike a patchpoint, these locations are known (by the backend) to be writeable during the call. This enables the garbage collector to transparently read and update pointer values if required. We do optimize lowering in certain well known cases (constant pointers, a.k.a. null, being the key one.)

There are a few areas of this patch which could use improvement:

  • The test coverage could be improved. Most of the tests we've actually been using are built on top of the safepoint insertion mechanism (not included here) and our runtime. We need to improve the IR level tests for optimizer semantics (i.e. not doing illegal transforms), and lowering. There are some minimal tests in place for the lowering of simple statepoints.
  • The documentation needs revision, but should be reasonable complete.
  • Many functions are missing doxygen comments
  • There's a hack in to force the use of RSP+Offset addressing vs RBP-Offset addressing for references in the StackMap section. This works, shouldn't break anyone else, but should definitely be cleaned up. The choice of addressing preference should be up to the runtime.

When reviewing, I would greatly appreciate feedback on which issues need to be fixed before submission and those which can be addressed afterwards. It is my plan to actively maintain and enhance this infrastructure over next few months (and years). It's already been developed out of tree entirely too long (our fault!), and I'd like to move to incremental work in tree as quickly as feasible.

Planned enhancements after submission:

  • The ordering of arguments in statepoints is essentially historical cruft at this point. I'm open to suggestions on how to make this more approachable. Reordering arguments would (preferably) be a post commit action.
  • Support for relocatable pointers in callee saved registers over call sites. This will require the notation of an explicit relocation psuedo op and support for it throughout the backend (particularly the register allocator.)
  • Optimizations for non-relocating collectors. For example, the clobber semantics of the spill slots aren't needed if the collector isn't relocating roots.
  • Further optimizations to reduce the cost of spilling around each statepoint (when required at all).
  • Support for invokable statepoints.
  • Once this has baked in tree for a while, I plan to delete the existing gc_root code. It is unsound, and essentially unused.

In addition to the enhancements to the infrastructure in the currently proposed patch, we're also working on a number of follow up changes:

  • Verification passes to confirm that safepoints were inserted in a semantically valid way (i.e. no memory access of a value after it has been inserted)
  • A transformation pass to convert naive IR to include both safepoint polling sites, and statepoints on every non-leaf call. This transformation pass can be used at initial IR creation time to simplify the frontend authors' work, but is also designed to run on *fully optimized* IR, provided the initial IR meets certain (fairly loose) restrictions.
  • A transformation pass to convert normal loads and stores into user provided load and store barriers.
  • Further optimizations to reduce the number of safepoints required, and improve the infrastructure as a whole.

We've been working on these topics for a while, but the follow on patches aren't quite as mature as what's being proposed now. Once these pieces stabilize a bit, we plan to upstream them as well. For those who are curious, our work on those topics is available here: https://github.com/AzulSystems/llvm-late-safepoint-placement

Diff Detail

Repository
rL LLVM
reames retitled this revision from to Statepoint infrastructure for garbage collection.Oct 8 2014, 2:24 PM
reames updated this object.
reames edited the test plan for this revision. (Show Details)
reames added a subscriber: Unknown Object (MLST).
silvas added a subscriber: silvas.Oct 9 2014, 1:31 PM

+Filip Pizlo

reames updated this revision to Diff 14673.Oct 9 2014, 1:37 PM
reames updated this object.

Updating statepoint changes to:

  • include some reasonable documentation. What's here still needs serious work, but at least it's reasonable complete and doesn't contain obvious errors.
  • remove the hack to use Indirect vs Direct locations.
  • rebase on TOT
reames updated this revision to Diff 14677.Oct 9 2014, 1:55 PM

The previous update didn't actually include some of the changes I'd meant it to.

I don't know enough about this topic to comment on the details of the ideas (although they seem reasonable and match what had been discussed on the mailing list), but I have a read part of the patch and found a few typos and have a few inline comments.

Could you also include more contexts in your patches (-U999999 if you produce your patches by svn/git diff)? It generally makes it easier to review changes on Phabricator.
Thanks for working on this.

include/llvm/CodeGen/MachineInstr.h
91 ↗(On Diff #14677)

What are these 'BEGIN CHANGE/END CHANGE' comments ?

include/llvm/IR/Intrinsics.td
500 ↗(On Diff #14677)

Is this necessary ? I don't see in the documentation any case where the statepoint can throw an exception. And if the goal is just to say it can have side-effects on memory, that is the default if there is no annotation according to Intrinsics.td

502 ↗(On Diff #14677)

These (and gc_relocate) should probably have the attribute IntrNoMem if I understood correctly your documentation.

lib/CodeGen/StackMaps.cpp
285 ↗(On Diff #14677)

contigious -> contiguous

lib/IR/LLVMContext.cpp
252 ↗(On Diff #14677)

Unnecessary whitespace change.

lib/IR/Statepoint.cpp
15 ↗(On Diff #14677)

Could be simplified in "return F && F->getIntrinsicID() == Intrinsic::statepoint"

32 ↗(On Diff #14677)

Could be simplified in "return CS.isCall() && isGCRelocate(CS.getInstruction())"

47 ↗(On Diff #14677)

Could be simplified in "return CS.isCall() && isGCRelsult(CS.getInstruction())"

96 ↗(On Diff #14677)

This should probably be changed before upstreaming if it depends on your runtime.

lib/IR/Verifier.cpp
1120 ↗(On Diff #14677)

It is not clear to me why the const_cast is required: I is a const Instruction *, and isStatepoint accepts a const Instruction *.

2561 ↗(On Diff #14677)

gc.result -> gc.relocate

lib/Target/X86/X86FrameLowering.h
74 ↗(On Diff #14677)

Why not just use the override keyword ?

lib/Target/X86/X86ISelLowering.cpp
20621 ↗(On Diff #14677)

devirge -> diverge

lib/Target/X86/X86MCInstLower.cpp
861 ↗(On Diff #14677)

likelly -> likely

862 ↗(On Diff #14677)

Is this invariant something general to LLVM and documented as such, or specific to your application (I am asking because the comment talks about a 'VM')?

870 ↗(On Diff #14677)

Same question: where do this guarantee come from

test/Verifier/statepoint-non-gc-ptr.ll
1 ↗(On Diff #14677)

It is not clear exactly what this tests

reames updated this revision to Diff 14905.Oct 14 2014, 5:41 PM
reames updated this object.

respond to review comments, remove a few pieces of code which aren't relevant for basic functionality, and add experimental prefix to the names.

morisset, I think I've addressed all your comments. Let me know if you see anything I missed.

include/llvm/CodeGen/MachineInstr.h
91 ↗(On Diff #14677)

Removed.

include/llvm/IR/Intrinsics.td
500 ↗(On Diff #14677)

Removed. This was an artefact of work to support invokable statepoints which aren't part of this patch.

502 ↗(On Diff #14677)

You're right. This change will need more testing that I can easily give it at the moment. Do you mind if this change lands separately once this is integrated and I've merged it back into our tree?

lib/IR/Verifier.cpp
1120 ↗(On Diff #14677)

It's not. This is just old code that hadn't been updated. Fixed. Thanks.

lib/Target/X86/X86MCInstLower.cpp
862 ↗(On Diff #14677)

This is preliminary support for atomically patchable calls. For now, I just documented the 32 bit offset requirement and removed the rest. This will be a separate change proposal down the road.

kmod added a subscriber: kmod.Oct 14 2014, 7:01 PM

I think a change like this might be more compelling if you could give more detail on how it would actually help (I can't find the detail I'm looking for in your blog posts). It seems like the value of this patch is that it will work with late safepoint placement, but it'd be nice to see some examples of cases where late safepoint placement gives you something that early safepoint placement (ie by the frontend) doesn't. It kind of feels like either approach will work well with only non-gc values, and neither approach will be able to do much optimization when you do function calls. I'm not trying to claim that that's necessarily true, but it'd be easier to understand your point if there was some example IR.

Kevin,

Let me try to answer the point you're getting at. In doing so, I want
to explicitly separate the statepoint intrinsics which are currently up
for review, and the future late safepoint placement. The statepoint
intrinsics have value separate from the late safepoint placement
approach, and I want to justify them on their own merits.

The basic problem we're trying to solve with these intrinsics is
supporting fully relocating collectors. By definition, such a collector
needs to be precise w.r.t. root tracking. Even worse, we need to ensure
that *all copies* of a pointer are updated. It is not acceptable to
make two copies of a pointer, update one of them, then use the other for
a memory access.

If the compiler is allowed to introduce derived pointers (i.e. pointer
valued temporaries created by the compiler which point somewhere within
an object, or outside it, but associated with it), we also need to track
which *object* each *pointer* to be updated is associated with. This is
required to safely update the pointers.

For the sake of argument, let's say our frontend does safepoint insertion.

There's a couple of approaches which seem like they might work, let's
explore each in turn:

  • We could use patchpoints to record all the values needed for the GC

stack map. This mostly works, but requires that the patchpoint not be
marked readonly or readnone (to prevent illegal reorderings). That
could be a usage convention. The real problem is that the compiler is
still free to introduce multiple *copies* of an SSA value over the
patchpoint. (This is completely legal under SSA semantics.) When it
does so, it creates a situation where the gc could fail to update a
pointer which will then be dereferenced. That's a bug. Worth stating
explicitly, I believe the patchpoint scheme would be sufficient *if you
do not every relocate a root*.

  • We could use the gc.root. gc.root defines the allocs, but does not

define the call format, or any of the mechanisms to ensure proper
relocation. As such, it *by itself* is not viable. Also, gc.root
inherently assumes every value will have a stack slot. Without *heavy*
reengineering, there's no way to have a gc pointer in a callee saved
register over a call site. This is an unfortunate limitation. Any call
representation without explicit relocation suffers from the same bug as
the patchpoint scheme.

  • We could combine gc.root allocas and patchpoints. This essentially

combines the flaws (no gc pointers in callee saved registers over calls,
and missed copies), with no benefit.

The statepoint intrinsics are basically the patchpoint option above, but
with relocation made explicit in the IR. While it's still legal for the
optimizer to create a copy of the value feeding a statepoint, that's now
okay. By construction, there can be no use of the original SSA value
(and thus the copy) after the statepoint. Instead, the explicitly
relocated value is used.

To summarize: We need (something like) statepoints for correctness of
fully relocating collectors.

(The points I'm making here are somewhat subtle. If it would help to
have IR examples here, ask. I'm deferring writing them because it's
time consuming.)

Other advantages of the statepoint approach:

The gc.relocate intrinsics (part of the statepoint proposal) also makes
it explicit in the IR what the base object of each pointer to be
relocated is. This isn't *required* (you could encode the same
information in the arguments of the statepoint), but making it explicit
is much cleaner.

The explicit relocation notation has the potential to be extended in to
the backend. With some register allocator changes (not part of this
patch!), we could support gc pointers in callee saved registers. This
is possible with the (incorrect) patchpoint scheme. It is possible, but
*hard*, with the gc.root scheme.

The posted patch includes a couple of small optimizations (i.e. null
forwarding) that help performance, but could (probably) be implemented
on top of another scheme. We have a number of planned optimizations on
the statepoint mechanism.

Now, let me finally bring up late safepoint placement. The only real
impact on this patch is that, to date, we have only focused on the
*correctness* of a statepoint passing through the optimizer. We have
not attempted to teach the optimizer about how to leverage one or
perform optimizations over one. There's room for improvement here (i.e.
not completely blocking inlining), but we prefer to approach this
problem by simply inserting them late. You could instead choose to
insert them at generation time, and teach the optimizer about their
semantics. That *strategy choice* is independent of the representation
choosen provided that representation is *correct*.

Yours,
Philip

kmod added a comment.Oct 15 2014, 11:48 PM

Sorry yes, I am comparing this approach to gc.root; it seems like gc.root and statepoints are similar in that they both take a spill+reload "reduce the code generator's ability to use copies" approach as compared to a hypothetical "track all copies so that they can be updated in place". It seems like gc.root provides much of the same functionality as statepoint -- gc.root definitely should be able to support a relocating GC as well, and I guess I haven't heard of it being "fundamentally broken" outside of a late-safepoint-placement strategy. So far the arguments I've seen for statepoints over gc.root are

  • easier to save roots in callee-save registers
  • easier to automatically generate gc annotations on arbitrary IR, such as post-compiler-optimizations.

I guess I'm wondering if I'm missing other benefits, and what your thoughts are on whether this would be enough to save statepoints from the same fate as gc.root.

jyh added a subscriber: jyh.Oct 16 2014, 10:25 AM
mjacob added a subscriber: mjacob.Oct 23 2014, 5:22 AM

Hi Philip,

I looked only at parts of the patch so far. I guess certain things could have been broken out into smaller patches that are not part of the statepoints itself, such as the changes to MachineInstr, intrinsic munging, etc.

In general I noticed that the patch requires some love when it comes to the LLVM coding standards (e.g. CamelCase for variables). Besides that nitpicking the code looks really nice.

Please see my inline comments and questions.

Thanks

Cheers,
Juergen

include/llvm/CodeGen/MachineInstr.h
91 ↗(On Diff #14905)

Are you really hitting the limitations of uint8_t for mem references? Changing this by one byte increases the overall size of a MachineInstr from 64 bytes to 72 bytes (due to padding).

include/llvm/Target/Target.td
859 ↗(On Diff #14905)

Is this a hard requirement for STATEPOINTs to work?

include/llvm/Target/TargetFrameLowering.h
202–203 ↗(On Diff #14905)

Yup, couldn't agree more.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7330–7334 ↗(On Diff #14905)

I think we had something similar like this before. I think we changed it because we didn't wan't to modify the LLVM IR by creating a new function call at this point.

7351 ↗(On Diff #14905)

nullptr

reames added a comment.Nov 5 2014, 5:28 PM

Responding to review comments, updated change set coming soon.

include/llvm/CodeGen/MachineInstr.h
91 ↗(On Diff #14905)

We were at one point. I'm not sure we still are. I'm fine removing this in the change which initially gets upstreamed and revisiting this separately.

include/llvm/Target/Target.td
859 ↗(On Diff #14905)

You know, I'm not sure. I've forgotten why that was originally added. I *think* it's probably legacy junk at this point.

I'll remove it and see if I run into any failures.

include/llvm/Target/TargetFrameLowering.h
202–203 ↗(On Diff #14905)

Is this a "fix before submit" comment? Or just general agreement?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7330–7334 ↗(On Diff #14905)

I don't much care for this either, but at this point, it's fairly well tested and works. I'm fine cleaning this up, but do you mind if we land this first then work incrementally?

reames updated this revision to Diff 15885.Nov 6 2014, 11:37 AM

Revised to include changes requested by reviewers. Andy, Juergen, can I get a LGTM here? If you have additional comments, I'm happy to keep making changes once in tree, but keeping the diffs up to date and clean is becoming a real pain.

p.s. This is *not* against TOT. I had to include one change from TOT (in IR/Functions.cpp) to cleanup some now redundant code, but this is otherwise based on the same revision as previous. I will of course rebase before submission.

silvas added a comment.Nov 7 2014, 7:44 PM

We should try to get fpizlo's thoughts on GC-related stuff like this.

ributzka accepted this revision.Nov 21 2014, 1:36 PM

There are still a lot of coding standard violations. I marked a few of them ...

include/llvm/CodeGen/MachineInstr.h
91 ↗(On Diff #15885)

Whitespace

include/llvm/IR/Intrinsics.td
502 ↗(On Diff #15885)

How does that work? Doesn't "anyfloat" the the concrete type from one of the arguments?

include/llvm/IR/Statepoint.h
1 ↗(On Diff #15885)

Copyright Header

2 ↗(On Diff #15885)

I don't think we use leading and trailing "__".

25 ↗(On Diff #15885)

CamelCase

56 ↗(On Diff #15885)

CamelCase

61 ↗(On Diff #15885)

CamelCase

75 ↗(On Diff #15885)

CamelCase

141 ↗(On Diff #15885)

CamelCase

lib/CodeGen/SelectionDAG/StatepointLowering.cpp
134 ↗(On Diff #15885)

CamelCase

135 ↗(On Diff #15885)

ditto

161 ↗(On Diff #15885)

CamelCase

192–194 ↗(On Diff #15885)

CamelCase

198 ↗(On Diff #15885)

CamelCase

200 ↗(On Diff #15885)

CamelCase

201–202 ↗(On Diff #15885)

ditto

234 ↗(On Diff #15885)

CamelCase

246–248 ↗(On Diff #15885)

...

lib/IR/Statepoint.cpp
1 ↗(On Diff #15885)

Copyright Header

lib/IR/Verifier.cpp
1119–1120 ↗(On Diff #15885)

Would be nice to put this in a "isStatepoint(const Value *)" method instead.

This revision is now accepted and ready to land.Nov 21 2014, 1:36 PM
atrick accepted this revision.Nov 21 2014, 5:23 PM

Sorry for the big delay, I don't generally get blocks of time large enough to review a patch this size. It's my fault for not insisting that you split it up into docs, IR representation, lowering to MI, and stackmap generation. Anyway, I can give a LGTM now as long you you address Juergen's coding convention comments and my comments inline and below.

Overall, I'm happy because we've agreed on a long-term plan for a generalized llvm.patchpoint to cover this use case and others. I just need to send out that proposal. In the meantime, this intrinsic lets you bootstrap the functionality.

It's somewhat shady that you create an IR instruction during SelectionDAG (CallInst::create). It's probably OK to use a temporary instruction like this (in fact we used to do it for patchpoint), but not ideal. I think we can live with it until a new uber-patchpoint comes along (no need to address it now).

Question: It looks like lowering may require statepoint and gc_relocate calls not to be interleaved? Is that true? If so, can you document that and ensure that it is verified somewhere?

The function getFrameIndexReferenceForGC() is not GC specific. Please use a more appropriate name, like getFrameIndexOffsetFromSP().

Please fix Sphinx warnings:

/s/patch/docs/Statepoints.rst:10: WARNING: Title underline too short.

Status/Warning

/s/patch/docs/Statepoints.rst:60: WARNING: Title underline too short.

An Example Safepoint Sequence

/s/patch/docs/Statepoints.rst:60: WARNING: Title underline too short.

An Example Safepoint Sequence

/s/patch/docs/Statepoints.rst:158: WARNING: Title underline too short.

'''gc_relocate''' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^
/s/patch/docs/Statepoints.rst:158: WARNING: Title underline too short.

'''gc_relocate''' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^
/s/patch/docs/Statepoints.rst:234: WARNING: Title underline too short.

Polling for a Safepoint

/s/patch/docs/Statepoints.rst:234: WARNING: Title underline too short.

Polling for a Safepoint

looking for now-outdated files... none found
pickling environment... done
checking consistency... /s/patch/docs/Statepoints.rst:: WARNING: document isn't included in any toctree

docs/Statepoints.rst
108 ↗(On Diff #15885)

You should remove the "unused" operand, or commit to it and explain what it's for.

It's strange occurring between # call args and the variadic args. Shouldn't the pattern be something like:

flags, #args, args..., flags, #args, args...

include/llvm/CodeGen/MachineInstr.h
91 ↗(On Diff #15885)

Accidental change.

lib/IR/Verifier.cpp
1117–1125 ↗(On Diff #15885)

This is incomplete because it only verifies that one use of the address is a safepoint, not all users.

lib/Target/X86/X86FrameLowering.cpp
1136–1137 ↗(On Diff #15885)

I think this comment is accidentally wrapped.

1185–1188 ↗(On Diff #15885)

I don't understand this comment or why the assert is disabled.

reames closed this revision.Dec 2 2014, 11:37 AM
reames updated this revision to Diff 16822.

Closed by commit rL223143 (authored by @reames).