This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Remove gc.relocate duplicates
AbandonedPublic

Authored by skatkov on Mar 3 2021, 12:27 AM.

Details

Reviewers
reames
dantrushin
Summary

If there are two relocation tied to the same token and both
base and derived indices are the same then relocations are the same.

In this case we can drop one relocation and use another.

Diff Detail

Event Timeline

skatkov created this revision.Mar 3 2021, 12:27 AM
skatkov requested review of this revision.Mar 3 2021, 12:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2021, 12:27 AM
skatkov added inline comments.Mar 3 2021, 1:09 AM
llvm/include/llvm/IR/Statepoint.h
229

Forgot to update comment. will do after review or before ladning.

reames requested changes to this revision.Mar 3 2021, 12:35 PM
reames added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
2191

You can avoid the isTiedToLandingPad routine in a couple of ways:

  • Key directly off the token value
  • Key off the existing isTiedToInvoke property

I'm also fine with you adding the cover function if desired, just pointing out alternatives in case you hadn't considered them.

2197

This assert will fail. Consider:

%token = call statepoint....
if (c)

%gcr = relocate(token, 0, 0)

else

%gcr2 = relocate(token, 0, 0)
2252

You haven't changed what is live, so the early continue is pointless.

This revision now requires changes to proceed.Mar 3 2021, 12:35 PM
reames added a comment.Mar 3 2021, 3:38 PM

Serguei,

At a high level, what's the motivation for this? After thinking about it a bit, this is essentially just CSE. CSE is already done by early-cse and gvn. (In particular, if you run the changed tests through -instcombine -early-cse you get the same result.)

Are you deliberately working around a pass ordering issue here? Or is there something I'm missing with the patch?

Serguei,

At a high level, what's the motivation for this? After thinking about it a bit, this is essentially just CSE. CSE is already done by early-cse and gvn. (In particular, if you run the changed tests through -instcombine -early-cse you get the same result.)

Are you deliberately working around a pass ordering issue here? Or is there something I'm missing with the patch?

It is interesting. To be honest I did not try cse but gvn did not help me. Makes sense to try while I guess here we cat get a dependency between passes and will need to iterate them in a loop to get the result. In this case I would prefer to move optimization in one pass to reduce compile time.

The motivation case looks as follows:

  1. Consider we have a live gc value which is represented as Phi node.
  2. RS4GC pass will create a new phi node unconditionally to represent a base and insert both live values (base/derived) into gc-bundle.
  3. Then Both relocation (for base and derived pointer) will be created.
  4. In the next statepoint where the same value is alive these both relocations will be included into gc-bundle and relocation for both of them will be created.
  5. This will be repeated for each next statepoint.

Now optimizations:

  1. Something like InstCombine will be able to eliminate the duplicated phi node (created on step 2). And now gc-bundle contains two identical values but different relocations still exist and next statepoint is unchanged.
  2. After that InstCombine is able to eliminate extra value from gc-bundle and as a result we have two relocations with the same indices. This is not simplified by InstCombine.
  3. You say that CSE is able to eliminate extra gc.relocation and we'll get two identical values in the next statepoint but CSE does not simplify statepoint instruction, so it will stop on that.
  4. Next invocation of InstCombine will be able to eliminate extra value in gc-bundle and open CSE way to simplify next relocation.

and so on.
If we have this optimization is in InstCombine, it will be able to simplify the whole chain at once.

Does it make sense?

skatkov added inline comments.Mar 3 2021, 7:09 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
2191

I'll consider but not sure whether tuple has function support token type.
I cannot be based on isTiedToInvoke property because I need to separate gc relocation from the same statepoint for those which are in landing pad and in normal path. Both these relocations will have the same true value of isTiedToInvoke property.

So I will try token directly and if it does not work I will stay with the current approach.

2197

Ok, I'll cover this case, it will do an algorithm a bit more complex :)

2252

If I found a duplicate which has already been processed that it means that both derived and base pointers are already is in LiveGCValues. I can add an assert.

skatkov updated this revision to Diff 328021.Mar 3 2021, 10:03 PM

Handled comments, new tests added

skatkov marked 2 inline comments as done.Mar 3 2021, 10:03 PM

Serguei,

Your explanation of the pass ordering interaction makes a lot of sense, thank you for the nice explanation. However, I think there's still something to the story missing.

EarlyCSE already has dedicated support for this case. Consider the lines:

// gc.relocate is 'special' call: its second and third operands are
// not real values, but indices into statepoint's argument list.
// Get values they point to.
if (const GCRelocateInst *GCR = dyn_cast<GCRelocateInst>(Inst))
  return hash_combine(GCR->getOpcode(), GCR->getOperand(0),
                      GCR->getBasePtr(), GCR->getDerivedPtr());

What should happen here is that at each statepoint we canonicalize the gc.relocates (but leave the duplicate entries in the gc bundle of the statepoint) in a single pass over the IR. We in fact have a test (test/Transforms/EarlyCSE/gc_relocate.ll) which seems to show exactly that happening.

Is it possible that you don't have earlycse at the relevant point in your pass pipeline?

I will note that I don't see an analogous change having been made to GVN. So it's possible that's your problem. (It would be a straight forward tweak in lookupOrAddCall.)

Regardless of your answer, I realized the GVN support would be generally beneficial and useful to have. Patch up for review here: https://reviews.llvm.org/D97974

Serguei,

Your explanation of the pass ordering interaction makes a lot of sense, thank you for the nice explanation. However, I think there's still something to the story missing.

EarlyCSE already has dedicated support for this case. Consider the lines:

// gc.relocate is 'special' call: its second and third operands are
// not real values, but indices into statepoint's argument list.
// Get values they point to.
if (const GCRelocateInst *GCR = dyn_cast<GCRelocateInst>(Inst))
  return hash_combine(GCR->getOpcode(), GCR->getOperand(0),
                      GCR->getBasePtr(), GCR->getDerivedPtr());

What should happen here is that at each statepoint we canonicalize the gc.relocates (but leave the duplicate entries in the gc bundle of the statepoint) in a single pass over the IR. We in fact have a test (test/Transforms/EarlyCSE/gc_relocate.ll) which seems to show exactly that happening.

Is it possible that you don't have earlycse at the relevant point in your pass pipeline?

I will note that I don't see an analogous change having been made to GVN. So it's possible that's your problem. (It would be a straight forward tweak in lookupOrAddCall.)

Thanks a lot! I neede to look into CSE yesterday to save your time, my bad. Indeed CSE is able to make simplification in one pass.
So to make full cleanup after RS4GC we need

  1. InstCombine to eliminate redundant phi
  2. EarlyCSE to clean-up the chain of gc.relocation
  3. InstCombine again to clean-up gc bundle

InstCombine with this patch is able to do it by one pass but anyway if you think that CSE optimization is irrelevant in InstCombine I will abandon this patch.

Let me take a look into GVN whether with your patch GVN can do a full clean-up as in one pass...

Ok, I've checked GVN with your patch. The situation becomes the same as for EarlyCSE. To make a full clean-up we need

  1. InstCombine to eliminate redundant phi
  2. GVN to clean-up the chain of gc.relocation
  3. InstCombine again to clean-up gc bundle

Interesting why GVN cannot remove the duplicated phi node?

So the final question whether we are ok to have all in one in InstCombine with this patch or want to build a clean-up pipeline.
What is your opinion?

Actually I took a look into original reproducer and. found that picture is more complex on it.
Removing of gc.relocate duplication makes a duplication of phi node. And elimination of duplication of phi node open a room for further simplification of the next gc.relocate.

For original reproducer in the pipeline:
-passes=rewrite-statepoints-for-gc,instcombine,early-cse,instcombine,early-cse,instcombine,early-cse,instcombine,early-cse,instcombine,early-cse
only the last one did not make any simplification, all other did.

So the problem here that there is no pass which does both - simplify gc.relocate and eliminate duplicated phi nodes.

Wouldn't it make more sense to fix RS4GC then?
I think D97885 makes this mostly irrelevant. Even if it is not, we might try to detect duplicate phis in RS4GC.

reames added a comment.Mar 5 2021, 9:39 AM

Just to comment here - I'm not arguing that we shouldn't land this. I'm simply wanting to make sure we fully understand the reasoning justifying it, and have fixed all the low hanging issues "nearby".

I do want to look into the "why isn't GVN and EarlyCSE combining two equivalent phi nodes" a bit. Give me a bit to do that as having CSE handle this seems like a more natural fit than instcombine, but if that doesn't work out, I'm not opposed to doing this here since the scope of the CSE is so restricted.

reames added a comment.Mar 5 2021, 4:29 PM

Posted an enhancement for phi handling in GVN: https://reviews.llvm.org/D98080

I'd be curious to know if that's enough for the original test case.

reames added a comment.Mar 5 2021, 4:39 PM

LGTM w/minor comments.

I'm LGTMing this as the code appears correct, but I ask that you don't land this if the previously linked GVN patch solves the original test case. I'm not completely against duplicating the CSE into InstCombine (specific for gc.relocates), but I'd rather avoid that duplication. I also don't want to block progress indefinitely though.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
2179

Can you reword this comment enough to a) emphasize this is cse, and b) explain why we're doing this here in addition to gvn?

2193

Please use .empty here, it's more idiomatic.

2215

again !empty

Serguei, I took another look at just improving rs4gc, and for some reason this time it seemed much easier than we'd talked about it last time. I'd be curious to know if https://reviews.llvm.org/D98122 addresses your original issues (without any of the GVN patches).

Posted an enhancement for phi handling in GVN: https://reviews.llvm.org/D98080

I'd be curious to know if that's enough for the original test case.

D97974 + D98080 does not produce the best result.
It does not remove dup gc.relocate since some point by some reason.
The following pipeline produces the best result:
-passes=rewrite-statepoints-for-gc,gvn,instcombine,gvn,instcombine,gvn,instcombine,gvn,instcombine.
I also checked D98082, nothing changed,

For statepoint with gc.bundle = "gc-live"(i8 addrspace(1)* %base_phi, i8 addrspace(1)* %base_phi, ...) ]
GVN did not combine

%base_phi.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %statepoint_token19, i32 0, i32 0), !dbg !91 ; (%base_phi, %base_phi)
%16 = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %statepoint_token19, i32 1, i32 1), !dbg !91 ; (%base_phi, %base_phi)

I think the solution with EarlyCSE supporting elimination of dup phi nodes seems more interesting...

Serguei, I took another look at just improving rs4gc, and for some reason this time it seemed much easier than we'd talked about it last time. I'd be curious to know if https://reviews.llvm.org/D98122 addresses your original issues (without any of the GVN patches).

It does not address original issue.

I think the solution with EarlyCSE supporting elimination of dup phi nodes seems more interesting...

My prototype to remove dup ph nodes does not work as a single pass due to BasicBlock processing order - not all predecessors are handled before processing phi node.
So EarlyCSE is a dead way without changing basic block processing order.

Serguei, I took another look at just improving rs4gc, and for some reason this time it seemed much easier than we'd talked about it last time. I'd be curious to know if https://reviews.llvm.org/D98122 addresses your original issues (without any of the GVN patches).

It does not address original issue.

ok, I was wrong! The patch D98122 (+instcombine as clean-up) produces the same best result as I was able to get using other approaches.
To me it is a best decision and I'm ready to abandon this change for now if there will no be other producers of duplicated phi nodes.

reames added a comment.Mar 9 2021, 8:52 AM

Posted an enhancement for phi handling in GVN: https://reviews.llvm.org/D98080

I'd be curious to know if that's enough for the original test case.

D97974 + D98080 does not produce the best result.
It does not remove dup gc.relocate since some point by some reason.
The following pipeline produces the best result:
-passes=rewrite-statepoints-for-gc,gvn,instcombine,gvn,instcombine,gvn,instcombine,gvn,instcombine.

Separately from the RS4GC work, would you mind reducing and sharing a test case for this? I'd like to understand why GVN isn't getting this case. From what you wrote, it definitely *should*.

skatkov abandoned this revision.Mar 11 2021, 1:02 AM

Posted an enhancement for phi handling in GVN: https://reviews.llvm.org/D98080

I'd be curious to know if that's enough for the original test case.

D97974 + D98080 does not produce the best result.
It does not remove dup gc.relocate since some point by some reason.
The following pipeline produces the best result:
-passes=rewrite-statepoints-for-gc,gvn,instcombine,gvn,instcombine,gvn,instcombine,gvn,instcombine.

Separately from the RS4GC work, would you mind reducing and sharing a test case for this? I'd like to understand why GVN isn't getting this case. From what you wrote, it definitely *should*.

Hi Philip, I have a trouble to share the original test but after careful merging of all current GVN patches to downstream I ensured that gvn + instcombine gives the same simplification as I saw best one.
So I can confirm that current GVN can handle simplification in a one pass.

With that I abandon this patch and will rely on GVN. However simplification in RS4GC still makes sense to me.