This is an archive of the discontinued LLVM Phabricator instance.

[WIP][GlobalISel] CSE copies
Needs ReviewPublic

Authored by foad on Aug 24 2020, 9:20 AM.

Diff Detail

Event Timeline

foad created this revision.Aug 24 2020, 9:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2020, 9:20 AM
foad requested review of this revision.Aug 24 2020, 9:20 AM
foad added a comment.Aug 24 2020, 9:25 AM

The intent of this is to eliminate a bunch of simple vreg-to-vreg copies that are introduced by legalization. It's work in progress because I haven't updated the ARM and AMDGPU tests yet, and because I'm not sure whether this fits into the CSEMIRBuilder philosophy. I'm not even sure what "CSE" means for a copy. This optimization doesn't require any of the CSE machinery, so it could be done in any MIRBuilder that wants to do a bit of on-the-fly simplification.

arsenm added inline comments.Aug 24 2020, 9:27 AM
llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
168–172

This would also have to be careful to not fold away cross class or bank copies

Shouldn't running the copy_prop combine post-legalization do the same thing as this?

(AArch64 does this as of 7006bb69efb04f32e0a666e8a74f85d7fa745981)

Shouldn't running the copy_prop combine post-legalization do the same thing as this?

(AArch64 does this as of 7006bb69efb04f32e0a666e8a74f85d7fa745981)

I thought the argument for having a CSEMIRBuilder is as a compile time improvement to avoid creating the instructions in the first place. I do think we are missing optimization combines during legalization though, copy propagation being among them.

foad added a comment.Aug 24 2020, 9:59 AM

Shouldn't running the copy_prop combine post-legalization do the same thing as this?

(AArch64 does this as of 7006bb69efb04f32e0a666e8a74f85d7fa745981)

I suppose that might explain why my patch gives codegen improvements for MIPS and X86 but not for AArch64, at least according to the tests I had to update.

foad added a comment.Aug 24 2020, 10:02 AM

I thought the argument for having a CSEMIRBuilder is as a compile time improvement to avoid creating the instructions in the first place. I do think we are missing optimization combines during legalization though, copy propagation being among them.

I think MIRBuilders in general should be doing way more up-front simplification to avoid generating redundant instructions. ConstantFoldingMIRBuilder doesn't seem to be used at all, and none of the MIRBuilders seem to do other trivial simplifications like x+0 -> x.

I thought the argument for having a CSEMIRBuilder is as a compile time improvement to avoid creating the instructions in the first place. I do think we are missing optimization combines during legalization though, copy propagation being among them.

I think MIRBuilders in general should be doing way more up-front simplification to avoid generating redundant instructions. ConstantFoldingMIRBuilder doesn't seem to be used at all, and none of the MIRBuilders seem to do other trivial simplifications like x+0 -> x.

I always thought all the stuff SelectionDAG::getNode tried to do was insane, and a large part of why the DAG is so slow. How often do these trivially simplifiable situations actually get produced during legalization and combines? I would expect it's pretty rare

It might still be a good idea to add this to CSE MIRBuilder anyway since extra copies are so common.

It's a fine balance in how much the builders should do compared to DAG. I also personally found SDAG's getNode to be doing too much (and some of that code was replicated in the DAGCombines).
CSEMIRBuilder builds on the other builders by adding CSE. ConstantFoldingBuilder was POC on how to extend/customize builders. While I'm not sure where the balance on how much the builders should do, assuming we wanted to fold copies, it should be structured to be in one of the base builders (as the change seems unrelated to CSEing in general).

foad added a comment.Aug 26 2020, 2:43 AM

I always thought all the stuff SelectionDAG::getNode tried to do was insane, and a large part of why the DAG is so slow.

If you're ever going to do these simplifications, then surely it makes sense to do them up front so you don't have to create the MIR, rather than doing them in a later combine/simplification pass?

How often do these trivially simplifiable situations actually get produced during legalization and combines? I would expect it's pretty rare

I tried a quick hack to simplify integer arithmetic in MIRBuilder, and you can see the effect on AMDGPU legalization tests here: https://github.com/jayfoad/llvm-project/commit/00d98a70c4dc8e8db2f8465104a5e65ca12409f5

llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
168–172

I ran into this problem with AArch64InstructionSelector::fixupPHIOpBanks but I'm not sure hwo to handle it, since the register bank is not set until after the copy has been created.

I always thought all the stuff SelectionDAG::getNode tried to do was insane, and a large part of why the DAG is so slow.

If you're ever going to do these simplifications, then surely it makes sense to do them up front so you don't have to create the MIR, rather than doing them in a later combine/simplification pass?

How often do these trivially simplifiable situations actually get produced during legalization and combines? I would expect it's pretty rare

I tried a quick hack to simplify integer arithmetic in MIRBuilder, and you can see the effect on AMDGPU legalization tests here: https://github.com/jayfoad/llvm-project/commit/00d98a70c4dc8e8db2f8465104a5e65ca12409f5

These look like some of the cases I want to have an optimizing combiner during the legalizer, but I'm not sure it means they should be done in the MIRBuilder. Some of these cases I think are avoidable. For example, I don't think we take the best path to legalize merge/unmerge, which in turn ends up producing more of these trivial situations

foad added a comment.Aug 26 2020, 8:41 AM

I always thought all the stuff SelectionDAG::getNode tried to do was insane, and a large part of why the DAG is so slow.

If you're ever going to do these simplifications, then surely it makes sense to do them up front so you don't have to create the MIR, rather than doing them in a later combine/simplification pass?

How often do these trivially simplifiable situations actually get produced during legalization and combines? I would expect it's pretty rare

I tried a quick hack to simplify integer arithmetic in MIRBuilder, and you can see the effect on AMDGPU legalization tests here: https://github.com/jayfoad/llvm-project/commit/00d98a70c4dc8e8db2f8465104a5e65ca12409f5

These look like some of the cases I want to have an optimizing combiner during the legalizer, but I'm not sure it means they should be done in the MIRBuilder. Some of these cases I think are avoidable. For example, I don't think we take the best path to legalize merge/unmerge, which in turn ends up producing more of these trivial situations

Yes we could avoid some trivial cases where we do a buildAdd of a buildConstant of 0 or similar: https://github.com/jayfoad/llvm-project/commit/90bc2d596adac8c2e47cc3c763a648da580feb92

But I think most of the simplification opportunities come from lowering/legalizing ops whose operands may or may not be constants, and the lowering/legalization code doesn't bother to check for the constant case. And personally I think it would be tedious and error-prone to have to do that, when you could rely on a builder to do it for you instead.