This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel] Use slot indexes to speed up huge block compile time
AbandonedPublic

Authored by aemerson on Dec 16 2020, 3:18 PM.

Details

Summary

As an alternative to setting threshold limits on CSE dominator queries, this approach requires tht the SlotIndex analysis be run before legalization, and to use the numbering to do constant time dominance checks. There is added complexity in needing to maintain the numbering as we add/remove instructions from the block.

Overall this seems to impact compile time negatively by about ~0.5% on -O0 and -Os but prevents us from having unreasonably long compile times in the worst cases.

Diff Detail

Event Timeline

aemerson created this revision.Dec 16 2020, 3:18 PM
aemerson requested review of this revision.Dec 16 2020, 3:18 PM

I still think the CSE MIR builder shouldn't be required to CSE and can bail early on long dominance queries

I still think the CSE MIR builder shouldn't be required to CSE and can bail early on long dominance queries

Agreed.

Another approach I tried which also worked is to prevent the IRTranslator from merging the constant entry block with it's successor, and delaying that until after the legalizer finished. Combined with also forcing the MIRBuilder to always look for constants to CSE in the entry block of the function, that fixed the issue. Quentin didn't think that was the right approach from a design POV. We have 3 options here, I think SlotIndexes are reasonable but they do seem to incur a slight compile time cost.

I still think the CSE MIR builder shouldn't be required to CSE and can bail early on long dominance queries

I guess the concern is how exactly one defines "early" and "long". Should it be universal? Opt in? Configurable ?.. What if it bailed just a few instructions away from the target instruction. What about the compile time impact of counting and checking in loop in the cases where you don't need and so on.

If slot indexes are preserved and used in other passes that use CSE, do you think the compile time impact is amortized/minimized?

I still think the CSE MIR builder shouldn't be required to CSE and can bail early on long dominance queries

I guess the concern is how exactly one defines "early" and "long". Should it be universal? Opt in? Configurable ?.. What if it bailed just a few instructions away from the target instruction. What about the compile time impact of counting and checking in loop in the cases where you don't need and so on.

I don't know if it's feasible to dynamically require SlotIndexes to be run, but if it is: to ensure we never have different codegen depending on the threshold we can conditionally use SlotIndexes when we detect earlier on (maybe after translation), that the function we're compiling, or at least the entry block, is very large. If we detect that say the block is 100k generic instructions long, we could set a flag somewhere to indicate that and switch to using the indexes vs linked list walk.

If slot indexes are preserved and used in other passes that use CSE, do you think the compile time impact is amortized/minimized?

It's possible it would help a bit. I think the localizer may also benefit from this. I don't see it making a big difference either way though in most code.

If slot indexes are preserved and used in other passes that use CSE, do you think the compile time impact is amortized/minimized?

It's possible it would help a bit. I think the localizer may also benefit from this. I don't see it making a big difference either way though in most code.

In our out of tree backend, CSE is used and preserved for 6 passes in total. I suspect if slot indexes are also preserved, the overhead might not be too much. Hence I was curious if you actually collected some numbers. If only legalizer is being looked at for compile time impact, we're not really evaluating the compile time savings to be had in other passes (assuming amortized cost of slot index) right?

If slot indexes are preserved and used in other passes that use CSE, do you think the compile time impact is amortized/minimized?

It's possible it would help a bit. I think the localizer may also benefit from this. I don't see it making a big difference either way though in most code.

In our out of tree backend, CSE is used and preserved for 6 passes in total. I suspect if slot indexes are also preserved, the overhead might not be too much. Hence I was curious if you actually collected some numbers. If only legalizer is being looked at for compile time impact, we're not really evaluating the compile time savings to be had in other passes (assuming amortized cost of slot index) right?

If slot indexes are preserved and used in other passes that use CSE, do you think the compile time impact is amortized/minimized?

It's possible it would help a bit. I think the localizer may also benefit from this. I don't see it making a big difference either way though in most code.

In our out of tree backend, CSE is used and preserved for 6 passes in total. I suspect if slot indexes are also preserved, the overhead might not be too much. Hence I was curious if you actually collected some numbers. If only legalizer is being looked at for compile time impact, we're not really evaluating the compile time savings to be had in other passes (assuming amortized cost of slot index) right?

If slot indexes are preserved and used in other passes that use CSE, do you think the compile time impact is amortized/minimized?

It's possible it would help a bit. I think the localizer may also benefit from this. I don't see it making a big difference either way though in most code.

In our out of tree backend, CSE is used and preserved for 6 passes in total. I suspect if slot indexes are also preserved, the overhead might not be too much. Hence I was curious if you actually collected some numbers. If only legalizer is being looked at for compile time impact, we're not really evaluating the compile time savings to be had in other passes (assuming amortized cost of slot index) right?

Fair point. It might be quite difficult to ensure every pass preserves it but I'll see what I can do.

aemerson abandoned this revision.Feb 21 2021, 10:10 PM

This is superseded by D94264