This is an archive of the discontinued LLVM Phabricator instance.

[RS4GC] Extract rematerilazable candidate search. NFC.
ClosedPublic

Authored by skatkov on Jan 31 2022, 9:29 PM.

Details

Summary

Finding re-materialization chain for derived pointer does not depend on
call site. To avoid this finding for each call site it can be extracted in
a separate routine.

Diff Detail

Event Timeline

skatkov created this revision.Jan 31 2022, 9:29 PM
skatkov requested review of this revision.Jan 31 2022, 9:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2022, 9:29 PM
reames added a comment.Feb 1 2022, 9:39 AM

General direction seems reasonable, a couple inline comments, but likely to converge to an LGTM quickly.

However, I'd suggest taking a step back and asking whether this code makes sense at all. I don't want to make this blocking, but I think it does make sense to reexamine before going much further.

Given a random derived pointer P and it's base B, we can always find the offset via sub(ptrtoint(P), ptrtoint(B)). Materializing the ptrtoint is maybe undesirable, but we already do this in a couple places in the existing code. (Remember, this is a lowering pass.) See gc.get.pointer.offset.

I'd argue this means we should either be remating everything, or nothing. The current heuristic - which amounts to "can we find the gep chain?" - really doesn't seem defensible.

llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
287

Shouldn't this always be the base pointer of the last element in the chain? If so, might be good to either not store explicitly, or at least assert.

2275

This really looks like something we should fix in the findBasePointer code, but we can leave it for the moment. We're adding complexity to avoid fixing another problem, and that seems non ideal.

2330

You're doing a copy here, but I don't see either the copy or the source modified. Did you mean this to be a const reference?

I agree that this re-materialization is not great (especially cost model). Will explain my understanding in a separate comment.

llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
287

According to comment below (special handling of findBasePointer inefficiency) RootOfChain might differ from found base pointer and we need to keep it to make a replacement of RootOfChain during re-materialization.

According to findRematerializationCandidates, RootOfChain might be only base or alternative phi. This is by construction. What assert do you propose to add? Just to repeat a logic of findRematerializationCandidates?

If we solve the problem with inefficiency of findBasePointer we will need this field at all.

2275

Agreed. I wanted to keep it as NFC.

2330

Next line is a reverse of the chain.

Hi Philip, I've tried to consider re-materialization of derived pointers from live range profitability.
Let's our dp = base + Idx *scale + offset.
Let's disp = dp - base = Idx *scale + offset. (this is about your potential proposal to remat all as I've understood it).

Let's we have the following points in a program:
0 Def of Idx and Off
1 Def of dp and probably disp if we consider this case.
2 Call site with statepoint (dp and base should ba alive here)
2.2 Remat of dp close to call site. (If we do re-materialization)
2.8 Remat of dp close to use. (If we do re-materialization)
3 Use of dp.

Now consider 5 scenarios of re-materialization:
A. No re-materialization at all.
B. Remat at call site
C. Remat at use
D. Remat at call site with disp pre-computerd.
E. Remat at call site with disp at use.

What live in different ranges?
[0,1) - Idx and off for A, B, C, D, E.
[1,2) -
A: base, dp
B: Off, Idx, base
C: Off, Idx, base
D: base, disp
E: base, disp
[2, 2.2) -
A: dp
B: Off, Idx, base
C: Off, Idx, base
D: base, disp
E: base, disp
[2.2, 2.8) -
A: dp
B: dp
C: Off, Idx, base
D: dp
E: base, disp
[2.8, 3) - only dp for A, B, C, D, E.

So in general case, in terms of liveness, no re-materialization is better than any re-materialization.
And disp case is better than Idx + Off in case both of them are not constant.
But if Idx and Off are constant and disp as a result we get
[0,1) - none
[1,2) -
A: base, dp
B: base
C: base
D: base
E: base
[2, 2.2) -
A: dp
B: base
C: base
D: base
E: base
[2.2, 2.8) -
A: dp
B: dp
C: base
D: dp
E: base
[2.8, 3) - only dp for A, B, C, D, E.
So any kind of re-materialization is better than no re-materialization.

So in terms of liveness, re-materialization is profitable if Idx and Off are constants.

However re-materialization has some additional effects.
First of all dp computation may potentially be folded into use and in the case re-materialization may be positive in spite of extend of live range due to we eliminate the derived pointer computation.

Additionally even if do not fold derived pointer into all users we may move the derived pointer computation from hot block to cold one (positive impact) or from cold block to hot block (negative impact).

Taking all above (which possibly misses some corner cases) I would say that re-materialization is profitable if

  1. We can fold address computation to all users and Idx and Off are constants.
  2. Rematerialization insert point is colder that original derived point computation and Idx and Off are constants.
  3. If Idx and Off are not constants, think three times before moving to colder block.

Any thoguhts? Would be very glad to consider this from other angles and continue discussion.
For a while we can land this re-factoring due to it change nothing - NFC.

Sorry for the long comment.

reames accepted this revision.Feb 2 2022, 4:12 PM

LGTM to this patch. All of my style/code concerns had good justifications.

On the general topic, I was thinking of remat slightly differently. You broke down the indexing expression into the component pieces (Scale, Idx, Disp). I was thinking about simply computing (Ptr-Base) or RawOffset. RawOffset might be factorable into component pieces, but I wasn't thinking in terms of extending the live ranges of those pieces.

Instead, I was thinking of rewriting this code:
base = ...
p = some_offset_function(base)
...
use (p)

Into
base = ...
rawoffset = some_offset_function(base) - p
...
p = base + rawoffset
use (p)

With this form, p+rawOffset should reliable fold into use (for memory uses at least), and the live range of rawoffset is exactly the same as the original p. We do have an extra variable in the sub-expression, but that's very localized register pressure as the variable for some_offset_function extends one instruction (sub) past the original definition of p.

This revision is now accepted and ready to land.Feb 2 2022, 4:12 PM
This revision was landed with ongoing or failed builds.Feb 3 2022, 7:11 PM
This revision was automatically updated to reflect the committed changes.