If there are two relocation tied to the same token and both
base and derived indices are the same then relocations are the same.
In this case we can drop one relocation and use another.
Differential D97837
[InstCombine] Remove gc.relocate duplicates skatkov on Mar 3 2021, 12:27 AM. Authored by
Details
If there are two relocation tied to the same token and both In this case we can drop one relocation and use another.
Diff Detail Event Timeline
Comment Actions Serguei, At a high level, what's the motivation for this? After thinking about it a bit, this is essentially just CSE. CSE is already done by early-cse and gvn. (In particular, if you run the changed tests through -instcombine -early-cse you get the same result.) Are you deliberately working around a pass ordering issue here? Or is there something I'm missing with the patch? Comment Actions It is interesting. To be honest I did not try cse but gvn did not help me. Makes sense to try while I guess here we cat get a dependency between passes and will need to iterate them in a loop to get the result. In this case I would prefer to move optimization in one pass to reduce compile time. The motivation case looks as follows:
Now optimizations:
and so on. Does it make sense?
Comment Actions Serguei, Your explanation of the pass ordering interaction makes a lot of sense, thank you for the nice explanation. However, I think there's still something to the story missing. EarlyCSE already has dedicated support for this case. Consider the lines: // gc.relocate is 'special' call: its second and third operands are // not real values, but indices into statepoint's argument list. // Get values they point to. if (const GCRelocateInst *GCR = dyn_cast<GCRelocateInst>(Inst)) return hash_combine(GCR->getOpcode(), GCR->getOperand(0), GCR->getBasePtr(), GCR->getDerivedPtr()); What should happen here is that at each statepoint we canonicalize the gc.relocates (but leave the duplicate entries in the gc bundle of the statepoint) in a single pass over the IR. We in fact have a test (test/Transforms/EarlyCSE/gc_relocate.ll) which seems to show exactly that happening. Is it possible that you don't have earlycse at the relevant point in your pass pipeline? I will note that I don't see an analogous change having been made to GVN. So it's possible that's your problem. (It would be a straight forward tweak in lookupOrAddCall.) Comment Actions Regardless of your answer, I realized the GVN support would be generally beneficial and useful to have. Patch up for review here: https://reviews.llvm.org/D97974 Comment Actions Thanks a lot! I neede to look into CSE yesterday to save your time, my bad. Indeed CSE is able to make simplification in one pass.
InstCombine with this patch is able to do it by one pass but anyway if you think that CSE optimization is irrelevant in InstCombine I will abandon this patch. Let me take a look into GVN whether with your patch GVN can do a full clean-up as in one pass... Comment Actions Ok, I've checked GVN with your patch. The situation becomes the same as for EarlyCSE. To make a full clean-up we need
Interesting why GVN cannot remove the duplicated phi node? So the final question whether we are ok to have all in one in InstCombine with this patch or want to build a clean-up pipeline. Comment Actions Actually I took a look into original reproducer and. found that picture is more complex on it. For original reproducer in the pipeline: So the problem here that there is no pass which does both - simplify gc.relocate and eliminate duplicated phi nodes. Comment Actions Wouldn't it make more sense to fix RS4GC then? Comment Actions Just to comment here - I'm not arguing that we shouldn't land this. I'm simply wanting to make sure we fully understand the reasoning justifying it, and have fixed all the low hanging issues "nearby". I do want to look into the "why isn't GVN and EarlyCSE combining two equivalent phi nodes" a bit. Give me a bit to do that as having CSE handle this seems like a more natural fit than instcombine, but if that doesn't work out, I'm not opposed to doing this here since the scope of the CSE is so restricted. Comment Actions Posted an enhancement for phi handling in GVN: https://reviews.llvm.org/D98080 I'd be curious to know if that's enough for the original test case. Comment Actions LGTM w/minor comments. I'm LGTMing this as the code appears correct, but I ask that you don't land this if the previously linked GVN patch solves the original test case. I'm not completely against duplicating the CSE into InstCombine (specific for gc.relocates), but I'd rather avoid that duplication. I also don't want to block progress indefinitely though.
Comment Actions Serguei, I took another look at just improving rs4gc, and for some reason this time it seemed much easier than we'd talked about it last time. I'd be curious to know if https://reviews.llvm.org/D98122 addresses your original issues (without any of the GVN patches). Comment Actions D97974 + D98080 does not produce the best result. For statepoint with gc.bundle = "gc-live"(i8 addrspace(1)* %base_phi, i8 addrspace(1)* %base_phi, ...) ] %base_phi.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %statepoint_token19, i32 0, i32 0), !dbg !91 ; (%base_phi, %base_phi) %16 = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %statepoint_token19, i32 1, i32 1), !dbg !91 ; (%base_phi, %base_phi) I think the solution with EarlyCSE supporting elimination of dup phi nodes seems more interesting... Comment Actions
My prototype to remove dup ph nodes does not work as a single pass due to BasicBlock processing order - not all predecessors are handled before processing phi node. Comment Actions ok, I was wrong! The patch D98122 (+instcombine as clean-up) produces the same best result as I was able to get using other approaches. Comment Actions Separately from the RS4GC work, would you mind reducing and sharing a test case for this? I'd like to understand why GVN isn't getting this case. From what you wrote, it definitely *should*. Comment Actions Hi Philip, I have a trouble to share the original test but after careful merging of all current GVN patches to downstream I ensured that gvn + instcombine gives the same simplification as I saw best one. With that I abandon this patch and will rely on GVN. However simplification in RS4GC still makes sense to me. |
Forgot to update comment. will do after review or before ladning.