This is an archive of the discontinued LLVM Phabricator instance.

[rs4gc] Optionally directly relocated vector of pointers
ClosedPublic

Authored by reames on Jan 7 2016, 4:37 PM.

Details

Summary

This patch teaches rewrite-statepoints-for-gc to relocate vector-of-pointers directly rather than trying to split them. This builds on the recent lowering/IR changes to allow vector typed gc.relocates.

The motivation for this is that we recently found a bug in the vector splitting code where depending on visit order, a vector might not be relocated at some safepoint. Specifically, the bug is that the splitting code wasn't updating the side tables (live vector) of other safepoints. As a result, a vector which was live at two safepoints might not be updated at one of them. However, if you happened to visit safepoints in post order over the dominator tree, everything worked correctly. Weirdly, it turns out that post order is actually an incredibly common order to visit instructions in in practice. Frustratingly, I have not managed to write a test case which actually hits this. I can only reproduce it in large IR files produced by actual applications.

Rather than continue to make this code more complicated, we can remove all of the complexity by just representing the relocation of the entire vector natively in the IR.

At the moment, the new functionality is hidden behind a flag. To use this code, you need to disable use "-rs4gc-split-vector-values=0". Once I have a chance to stress test with this option and get feedback from other users, my plan is to flip the default and remove the original splitting code. I would just remove it now, but given the rareness of the bug, I figured it was better to leave it in place until the new approach has been stress tested.

Diff Detail

Event Timeline

reames updated this revision to Diff 44290.Jan 7 2016, 4:37 PM
reames retitled this revision from to [rs4gc] Optionally directly relocated vector of pointers.
reames updated this object.
reames added a subscriber: llvm-commits.
mjacob added inline comments.Jan 7 2016, 6:12 PM
lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
1331–1342

You can hoist the construction of the i8 pointer in the correct address space before the conditional by using Type::getInt8PtrTy(M->getContext(), Ty->getPointerAddressSpace()). Then you could further simplify the code by sinking the call to Intrinsic::getDeclaration() to a common path.

The resulting code would be similar to this:

Type *Int8Ptr = Type::getInt8PtrTy(M->getContext(), Ty->getPointerAddressSpace());
if (VectorType *VT = dyn_cast<VectorType>(Ty))
  Int8Ptr = VectorType::get(Int8Ptr, VT->getNumElements());
return Intrinsic::getDeclaration(M, Intrinsic::experimental_gc_relocate, {Int8Ptr});
reames updated this revision to Diff 44378.Jan 8 2016, 2:23 PM

address manuel's comments - thanks, that's a lot cleaner!

mjacob accepted this revision.Jan 8 2016, 4:43 PM
mjacob edited edge metadata.

LGTM.

Maybe you could move the caching to the lambda. However I'm fine with it if you had good reasons not to do so.

lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
1321–1328

This comment is a bit inaccurate now.

This revision is now accepted and ready to land.Jan 8 2016, 4:43 PM
reames added a comment.Jan 8 2016, 5:31 PM

Maybe you could move the caching to the lambda. However I'm fine with it if you had good reasons not to do so.

I generally find that having the generation function separate from the caching layer leads to fewer bugs. Maybe it's just me, but I've seen and written a *lot* of bugs where something didn't get cached correctly. I like to keep that portion of the code as simple and obviously correct as possible. I could use a second layer of lambda, but that seemed like a overkill in this situation.

lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
1321–1328

Will tweak before submission.

This revision was automatically updated to reflect the committed changes.