This is an archive of the discontinued LLVM Phabricator instance.

Move easily derivable pointers optimization from CodeGenPrepare to RewriteStatepointPass
ClosedPublic

Authored by igor-laevsky on May 14 2015, 9:59 AM.

Details

Summary

In CodeGen prepare we have optimization which re-creates some derived pointers instead of using gc.relocates for them. I.e if we had following code:

%derived = gep %base, 15
safepoint(base, derived)
%base.rel = gc.relocate(%base)
%derived.rel = gc.relocate(%derived)

It transforms it into:

%derived = gep %base, 15
safepoint(base, derived)
%base.rel = gc.relocate(%base)
%derived.rel = gep %base.rel, 15

This is profitable for pointers which are cheaply computed from their bases.

In this changeset I am moving this optimization from CodeGenPrepare into RewriteSafepointPass. Basic motivation here is that by doing it before gc.relocate insertion we can catch substantially more cases. For example if pointer is located inside a loop with statepoint.

Also I am extending it to happen not only on geps with small constant indices, but on all geps and noop casts.

Diff Detail

Repository
rL LLVM

Event Timeline

igor-laevsky retitled this revision from to Move easily derivable pointers optimization from CodeGenPrepare to RewriteStatepointPass.
igor-laevsky updated this object.
igor-laevsky edited the test plan for this revision. (Show Details)
igor-laevsky set the repository for this revision to rL LLVM.
igor-laevsky added a subscriber: Unknown Object (MLST).
reames edited edge metadata.May 14 2015, 4:58 PM

I haven't had a chance to review the newly added bits in RewriteStatepointsForGC, but I want to comment on the CGP parts first. I see no reason to remove this optimization. If a statepoint is generated through a means other than RSForGC, this optimization is valid and useful. Please update your patch to leave this code in CGP.

(If you chose to common the code in some way, that's fine. I don't mean to discourage that.)

igor-laevsky edited edge metadata.
sanjoy requested changes to this revision.May 18 2015, 12:24 AM
sanjoy edited edge metadata.
sanjoy added inline comments.
lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
1418 ↗(On Diff #25854)

Nit: change this to assert(AllocaMap.count() && "why this should be true!").

1422 ↗(On Diff #25854)

Unless you have a reason to prefer otherwise, I'd push this invariant of RematerializedValue being an Instruction earlier and

typedef DenseMap<Value *, Instruction *> RematerializedValueMapTy;
1456 ↗(On Diff #25854)

Variable naming in this function is somewhat haphazard. Do you mind bringing the variable names up to LLVM's coding standards for the functions you touched in a later separate NFC change, (once this change is in)?

1471 ↗(On Diff #25854)

Most of LLVM consistently avoids bracing small if bodies like these, can you change this to

if (...)
  continue;

here and elsewhere?

1801 ↗(On Diff #25854)

Nit: should be "successfully"

1842 ↗(On Diff #25854)

If TTI is non-null, then take a reference instead of a pointer.

1847 ↗(On Diff #25854)

I'd structure this as:

if (CastInst *CI = ...) {
} else if (GEPInst *GEP = ...) {
} else
  llvm_unreachable("message");

Currently the continue statements are redundant and they make it look like that there is some fallthrough logic that you're trying to avoid, but there isn't.

1850 ↗(On Diff #25854)

Add a string to the assert on why the condition should be true.

1873 ↗(On Diff #25854)

What is a situation where this guard will help?

1887 ↗(On Diff #25854)

If TTI is never null then take TargetTransformInfo & instead.

1893 ↗(On Diff #25854)

I'd rather you not do this, but instead accumulate the values you're supposed to delete into a SmallVector and batch-delete them after the loop.

If you wish to keep this as it is, please use the SmallVector constructor that takes a begin and end iterator.

1914 ↗(On Diff #25854)

No need to fix this now, but add a TODO for adjusting the cost when recomputing the derived pointer will let you remove the computation of the unrelocated derived pointer in cases where the only use of the derived pointer was the statepoint.

1934 ↗(On Diff #25854)

But you may still semantically "promote" a live value that was solely the base pointer for another derived pointer (and had no direct uses after the safepoint) to a pointer that is fundamentally live over the safepoint (has a direct use after the safepoint), right? Will that always work correctly?

1945 ↗(On Diff #25854)

In debug mode assert that ClonedValue does not use any of the values in ChainToBase except LastValue.

1966 ↗(On Diff #25854)

Please common out the cast<InvokeInst>(CS.getInstruction()). That way the assert(CS.isInvoke()) also becomes redundant.

This revision now requires changes to proceed.May 18 2015, 12:24 AM
igor-laevsky edited edge metadata.
igor-laevsky added inline comments.May 18 2015, 9:43 AM
lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
1456 ↗(On Diff #25854)

Sure

1873 ↗(On Diff #25854)

In theory TTI functions can return zero cost. In that case it is possible to have very long instruction chain with estimated cost of zero. It does not seems very right. This is purely hypothetical and I don't think it is very likely to happen on practice.

1914 ↗(On Diff #25854)

Not sure if I am following.

Are you saying about cases when we will be able to remove instruction chain (or part of it) later? I.e it also can happen if there was several intersecting instruction chains we have rematerialized.

1934 ↗(On Diff #25854)

Yes, it should work. Nowhere in this pass we distinguish pointers which are live "for real" and base pointer which are required to be live because of the derived pointer, but does not have uses of their own.

sanjoy accepted this revision.May 18 2015, 10:40 AM
sanjoy edited edge metadata.

LGTM with comments addressed.

lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
1860 ↗(On Diff #25854)

I'm not sure that this is the right meaning of IsComplex. Even with variable indices x86 can usually merge the GEP into the load/store. I think IsComplex should be "has more than one non-constant operand".

1873 ↗(On Diff #25854)

But I'd say in those cases it should be perfectly okay to have a large chain. Otherwise TTI is wrong. :)

Actually, I don't have strong objections to this part, but I think it is more sensible to have this as an optimization to reduce compile time -- in the caller, check if Chain.size() is higher than some threshold (10 is good) and if so bail out of rewriting anything for the given Chain.

In either case, please add a test case that exercises this path.

1914 ↗(On Diff #25854)

I'm talking about the case where we go from:

%D = GEP %B, 4
; No use of %D before statepoint
(%D2, %B2) = statepoint_and_relocate(%D, %B)

to

%B2 = statepoint_and_relocatt(%B)
%D2 = GEP %B2, 4

Here we have only moved the GEP, but did not copy it, so we did not add any cost to the computation overall.

1934 ↗(On Diff #25854)

Great! Please add one or two lines explaining this. If you can figure out a way to assert on this then even better.

test/Transforms/RewriteStatepointsForGC/basics.ll
2 ↗(On Diff #25854)

Very minor: I'd use -spp-rematerialization-threshold=-1 here to make it obvious that you're preventing any rewriting, even zero-cost ones.

test/Transforms/RewriteStatepointsForGC/rematerialize-derived-pointers.ll
12 ↗(On Diff #25854)

Please add some (2-3) more detailed test cases that check that you've generated the correct GEPs and cast instructions.

This revision is now accepted and ready to land.May 18 2015, 10:40 AM
igor-laevsky added inline comments.May 18 2015, 11:56 AM
lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
1873 ↗(On Diff #25854)

I think it's ok for TTI function to return zero - they estimate only runtime cost. But since we are cloning instructions we also need to account compile time for them.

I agree, this threshold fits for the caller better.

This revision was automatically updated to reflect the committed changes.