This is an archive of the discontinued LLVM Phabricator instance.

[GISel]: Add support for CSEing continuously during GISel passes
ClosedPublic

Authored by aditya_nandakumar on Oct 2 2018, 2:32 PM.

Details

Summary

This patch adds support to continuously CSE instructions during each of the GISel passes. It consists of a GISelCSEInfo analysis pass that can be used by the CSEMIRBuilder.

This patch also includes some changes to MachineIRBuilderBase (which is separated out here - https://reviews.llvm.org/D52131). I have enabled it in the legalizer for now. I've also only updated AArch64 for the API and test changes - I'll update the rest post feedback.

I measured some off tree benchmarks for compile time and I didn't notice too much of a compile time regression (<1%). I'll try to get some numbers for AArch64 with llvm-test-suite.

Looking forward to your feedback.

Diff Detail

Repository
rL LLVM

Event Timeline

I haven't gone through the patch in full but I have a few comments based on the discussion we had today

include/llvm/CodeGen/GlobalISel/CSEInfo.h
43

I'm not keen on using recordNewInstruction to report changes to existing instructions. We should have another interface for mutations

lib/CodeGen/GlobalISel/CSEInfo.cpp
38

It seems that this persists until GISelCSEInfo is destroyed which could be many passes from now. Effectively this imposes behaviour changes on subsequent passes that may not be expecting it.

In the case where a subsequent pass gets it but doesn't intend to preserve it, the subsequent pass is going to pay the cost of the maintenance unnecessarily. In the case where a subsequent pass gets it and intends to preserve it but needs to do so carefully, it's going to be fighting against a default maintenance method that was installed without its knowledge.

Another problem is that GISelCSEInfo doesn't have exclusive ownership of the MF delegate. If a pass wants to use it, this is going to clobber the one they set up.

In summary, I think passes need to install their own strategy to maintain it. This could be the standard one but I think they need to choose it.

Thanks for working on this. Apart from Daniel's suggestion, I have a few more comments but haven't done an in depth analysis of the memory costs/algorithmic properties.

As for performance data, it would be good to have this enabled optionally in the IRTranslator as well. We can compare the effect on compile time and code size in just IRT vs IRT + Legalizer. I would hope we can get a speedup.

include/llvm/CodeGen/GlobalISel/CSEInfo.h
67

Document what this is for?

include/llvm/CodeGen/GlobalISel/CSEMIRBuilder.h
50

We should assert here the assumption that the iterators from the same block.

52

'count' is unused here.

65

Typo 'to'

227

There's a fair amount of code duplication here. This might get a bit religious depending on your philosophy w.r.t macros, but we could factor out most of the code from the first part of the builder methods. I'm not going to say it should definitely be done, I think it's just about worth it though.

lib/CodeGen/GlobalISel/CSEInfo.cpp
272

Should move this as a helper in LLT, something like serializeData()? That way we don't have to have internal bit layout detail in a user of LLT.

aditya_nandakumar marked 4 inline comments as done.

Addressed feedback from Daniel and Amara.
Added a RAIIDelegateInstaller that can be used to explicitly install and uninstall the CSEInfo delegate in the legalizer.

aditya_nandakumar marked an inline comment as done.Oct 19 2018, 6:24 PM
aditya_nandakumar added inline comments.
include/llvm/CodeGen/GlobalISel/CSEInfo.h
43

I can in a separate patch change the API of MachineIRBuilder to reflect the different kinds of recordings that we may need and then update this patch to use that.

include/llvm/CodeGen/GlobalISel/CSEMIRBuilder.h
52

This was used to profile by printing which domination algorithm would be preferable - I updated it now.

227

Thanks. I also want to avoid this code duplication. For now, I've kept it simple, but it's definitely something that needs to be addressed here using whatever mechanism.

lib/CodeGen/GlobalISel/CSEInfo.cpp
38

I've made this explicit now - the passes can use the RAIIDelegateInstaller now to set this up.

nhaehnle removed a subscriber: nhaehnle.Oct 22 2018, 2:10 AM
aditya_nandakumar marked an inline comment as done.

Rebased, and updated to the new MachineIRBuilder interface.
By default the CSE is enabled during IRTranslator and Legalizer now (disable with -enable-cse-in-legalizer/irtranslator=0).
Added a helper class called GISelObserverWrapper which provides a single observer that can call several observers which can be used to register as the delegate to MachineFunction.
I've also updated most tests (except some tests where it's not obvious what the test is doing where CSE is disabled).

By default the CSE is enabled during IRTranslator and Legalizer now (disable with -enable-cse-in-legalizer/irtranslator=0).

I naively wonder if unconditionally enabling a CSE optimization even at -O0 is going to lead to a poorer debug experience in some cases?

I've also updated most tests (except some tests where it's not obvious what the test is doing where CSE is disabled).

If CSE should be enabled by default, I guess that we'll need to figure out what the tests are supposed to be doing so that there aren't many tests mainly testing non-default behaviour?

By default the CSE is enabled during IRTranslator and Legalizer now (disable with -enable-cse-in-legalizer/irtranslator=0).

I naively wonder if unconditionally enabling a CSE optimization even at -O0 is going to lead to a poorer debug experience in some cases?

Yes this is definitely an important question. I only enabled them by default because I wanted to test that CSE works with the pipeline. I can definitely disable CSE for O0 for each of the passes and add some more tests (besides making the unit test more comprehensive) that make sure CSE works in pipeline.

I've also updated most tests (except some tests where it's not obvious what the test is doing where CSE is disabled).

If CSE should be enabled by default, I guess that we'll need to figure out what the tests are supposed to be doing so that there aren't many tests mainly testing non-default behaviour?

I updated most of the tests that just run a single GISel pass. I was hesitant to update the tests that take it all the way to assembly (that were not auto generated).

Updated to add a CSEConfig class that describes what instructions should be CSEd.
Additionally, updated IRTranslator and Legalizer to only perform CSE for G_CONSTANTS for -O0.

aemerson accepted this revision.Jan 15 2019, 11:15 AM

By default the CSE is enabled during IRTranslator and Legalizer now (disable with -enable-cse-in-legalizer/irtranslator=0).

I naively wonder if unconditionally enabling a CSE optimization even at -O0 is going to lead to a poorer debug experience in some cases?

This is a good point, so I think that if we enable it for -O0 it should only be for constants.

I've also updated most tests (except some tests where it's not obvious what the test is doing where CSE is disabled).

If CSE should be enabled by default, I guess that we'll need to figure out what the tests are supposed to be doing so that there aren't many tests mainly testing non-default behaviour?

After some offline discussion, we've decided to not enable this at all at -O0 due to some minor compile time and code size regressions. The CSE here introduces copies as it must define a register when asked to, but at -O0 we don't do the copy elision and so compile time isn't improved, and it likely affects later passes like immediate selection. Overall, LGTM for enabling at -O1 or higher.

This revision is now accepted and ready to land.Jan 15 2019, 11:15 AM

Rebased, updated based on feedback.
With this change, now they're disabled by default (for O0).

Submitted in r351283.