This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel] Add bailout thresholds to CSEMIRBuilder::dominates() and the localizer.
AbandonedPublic

Authored by aemerson on Sep 8 2020, 10:14 AM.

Details

Summary

dominates(A, B) tries to determine dominance by walking an iterator from the beginning of the block potentially until the end. In extremely large BBs this can result in very long compile times, since this is called often from the artifact combiner.

The localizer's intra-block localization phase does a similar thing and walks through each instruction in the block top-down, checking if next instruction is the user set of the localizing instruction. Add a bailout threshold here too.

Longer term we may want to explore algorithmic changes to the code to alleviate these issues, perhaps by using lazy instruction numbering within blocks.

I arrived at these thresholds by building the test suite including SPEC2006, and choosing thresholds that were high enough that they wouldn't fire, going one order of magnitude higher.

Diff Detail

Event Timeline

aemerson created this revision.Sep 8 2020, 10:14 AM
aemerson requested review of this revision.Sep 8 2020, 10:14 AM
llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
38

This change looks concerning.
If this method returns false, it implies that A does not dominate B, and getDominatingInstrForID ends up moving the instruction to the current position. Is this what you're trying to do? (I suspect not).

aemerson added inline comments.Sep 8 2020, 2:32 PM
llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
38

Right, I'd forgotten that getDominatingInstrForID() assumes that A and B are in the same block before its calls dominates(). I'll try passing in a return parameter to signal back when the search was aborted early?

llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
38

That would work. Though I'm not sure if this is the best way to go about - what we'll end up with is a MachineFunction that may or may not have CSE'd instructions (even though it could be). If CSE is being relied upon for canonicalization, this may not be the right approach (where depending on where an instruction is, it may or may not be CSEd).
Alternatively we can make this a part of the CSEConfig which allows targets to opt out of this behavior?

arsenm added inline comments.Sep 9 2020, 7:36 AM
llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
38

I don't think CSE should be a guaranteed property. I don't think we need a knob for this

Hi Amara,

dominates(A, B) tries to determine dominance by walking an iterator from the beginning of the block potentially until the end. In extremely large BBs this can result in very long compile times, since this is called often from the artifact combiner.

Instead of disabling the check, I think it would be best to fix the underlying compile time issue. The solution here would be to switch the list of MachineInstr to numbered instructions so that dominance checks become O(1).
I had meant to look at this for quite some time but didn't get a chance to do it.
Something similar has been done on LLVM IR (see https://reviews.llvm.org/D51664). I wish it was done in a way that could have benefited the Machine IR like I suggested in https://reviews.llvm.org/D51664#1389884, but this ship has sailed!

I understand this is a big under taking so I am fine with the threshold approach in the meantime.

I leave the final approval to Aditya and Matt.

Cheers,
-Quentin

aemerson marked an inline comment as not done.Sep 9 2020, 12:52 PM

Hi Amara,

dominates(A, B) tries to determine dominance by walking an iterator from the beginning of the block potentially until the end. In extremely large BBs this can result in very long compile times, since this is called often from the artifact combiner.

Instead of disabling the check, I think it would be best to fix the underlying compile time issue. The solution here would be to switch the list of MachineInstr to numbered instructions so that dominance checks become O(1).
I had meant to look at this for quite some time but didn't get a chance to do it.
Something similar has been done on LLVM IR (see https://reviews.llvm.org/D51664). I wish it was done in a way that could have benefited the Machine IR like I suggested in https://reviews.llvm.org/D51664#1389884, but this ship has sailed!

I understand this is a big under taking so I am fine with the threshold approach in the meantime.

I leave the final approval to Aditya and Matt.

Cheers,
-Quentin

Yes I saw that thread, it's a shame we didn't have a shared solution for MBB too. There's probably more places where we have hidden super-linear compile time waiting to be triggered.

It's actually not quite as simple as I previously thought. The current logic in getDominatingInstrForID seems to be this:

MachineInstr *A = CSE_lookup_in_same_bb(B)
if (!dominates(A, B))
    move A before B such that dominates(A, B) is true
else
    return pointer to A

If we hit our threshold, what should getDominatingInstrForID() do? Perhaps clone the instruction A instead?

Hi Amara,

dominates(A, B) tries to determine dominance by walking an iterator from the beginning of the block potentially until the end. In extremely large BBs this can result in very long compile times, since this is called often from the artifact combiner.

Instead of disabling the check, I think it would be best to fix the underlying compile time issue. The solution here would be to switch the list of MachineInstr to numbered instructions so that dominance checks become O(1).
I had meant to look at this for quite some time but didn't get a chance to do it.
Something similar has been done on LLVM IR (see https://reviews.llvm.org/D51664). I wish it was done in a way that could have benefited the Machine IR like I suggested in https://reviews.llvm.org/D51664#1389884, but this ship has sailed!

I understand this is a big under taking so I am fine with the threshold approach in the meantime.

I leave the final approval to Aditya and Matt.

Cheers,
-Quentin

Yes I saw that thread, it's a shame we didn't have a shared solution for MBB too. There's probably more places where we have hidden super-linear compile time waiting to be triggered.

It's actually not quite as simple as I previously thought. The current logic in getDominatingInstrForID seems to be this:

MachineInstr *A = CSE_lookup_in_same_bb(B)
if (!dominates(A, B))
    move A before B such that dominates(A, B) is true
else
    return pointer to A

If we hit our threshold, what should getDominatingInstrForID() do? Perhaps clone the instruction A instead?

I would expect something like

Returns None when it bailed.
HasValue && true -> Dominates
HasValue && false -> Not dominates but within threshold
Optional<bool> dominates(A, B);

MachineInstr *MI = getInstrIfExists();
if (MI) {
  if (auto Dom = dominates(MI, CurrPos)) {
    if (!*Dom)
       CurMBB->splice(CurrPos, CurMBB, MI);
    return MachineInstrBuilder(getMF(), MI);
  }
}
return MachineInstrBuilder();
llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
38

It's certainly a nice property to have. For eg, we can check if a new instruction we're building already exists and is being used just by trying building a new instruction and checking for uses of the def, which could be useful for combines (and we have at least one combine that does this in our backend). This could perhaps be enabled by other APIs but is currently unavailable.
I also think of CSE as a canonicalization and later passes can rematerialize as necessary.
Also @aemerson , does this trigger in those workloads you care about (and hence if this change is necessary at all) since you mentioned that you arrived at the limit by going one order of magnitude higher.

This does touch on a bug I was looking into recently. There's a bug where if you attempt to insert an instruction, and it CSEs to the instruction at the insertion point, any new uses of the return value end up getting inserted before the instruction. I think this really needs to check for properly dominates, but it seemed a lot of cases were relying on the current behavior

llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
38

It may be a canonicalization, but I also don't think it's the MIRBuilder's job to ensure everything is canonical

llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
38

I'm not sure if you mean MIRBuilder here or the CSE mechanism. I was talking about the mechanism (the fact that we do it through builders is just implementation detail). My personal take on this is that ending up with a mechanism that doesn't guarantee CSE (assuming instructions can be CSEd) even if you enabled it fully by default, is not ideal.
Perhaps to address compile time for this issue (say O0 pipeline), we could try to observe which opcodes are responsible and selectively disable these with the CSEConfig. When I did measurements a while back among all opcodes CSEd, G_CONSTANT + FCONSTANT were responsible for about 75-80% of the hits in the test suites I looked at (out of tree), and these are usually right at the beginning and hence not a big problem for dominance queries taking too long. Obviously the longer term solution of O(1) dominance queries could be considered, but right now I don't have a better + cleaner suggestion.

arsenm added inline comments.Sep 11 2020, 2:11 PM
llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
38

I mean the MIRBuilder doing anything beyond directly inserting the instruction requested is a compile time hack. If having the MIRBuilder CSE in some situation adds compile time cost, it shouldn't need to do it

llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
38

I'm not sure I follow the argument. CSEMIRBuilder's only responsibility (and the way it's designed currently) is to do the CSE check and if not just fallback to MIRBuilder implementation. Are you suggesting CSEMIRBuilder should also not do CSE if it does more work (which it most certainly will)? That sounds like maybe the regular builder should be used (and not the CSEing).

arsenm added inline comments.Sep 16 2020, 1:48 PM
llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
38

I thought the argument for CSEMIRBuilder was that constructing an instruction is slow enough that it's cheaper to avoid creating it, and this happens often enough to give a net speedup vs. running MachineCSE later. I don't think it's particularly natural to have any kind of CSE or optimization in the builder, and it's only due to this practical compile time concern

arsenm added a comment.Aug 7 2021, 1:41 PM

Is this still relevant?

aemerson abandoned this revision.Oct 4 2021, 12:35 PM

Is this still relevant?

It'll be replaced by instruction numbering.