Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
The intent of this is to eliminate a bunch of simple vreg-to-vreg copies that are introduced by legalization. It's work in progress because I haven't updated the ARM and AMDGPU tests yet, and because I'm not sure whether this fits into the CSEMIRBuilder philosophy. I'm not even sure what "CSE" means for a copy. This optimization doesn't require any of the CSE machinery, so it could be done in any MIRBuilder that wants to do a bit of on-the-fly simplification.
llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp | ||
---|---|---|
168–172 | This would also have to be careful to not fold away cross class or bank copies |
Shouldn't running the copy_prop combine post-legalization do the same thing as this?
(AArch64 does this as of 7006bb69efb04f32e0a666e8a74f85d7fa745981)
I thought the argument for having a CSEMIRBuilder is as a compile time improvement to avoid creating the instructions in the first place. I do think we are missing optimization combines during legalization though, copy propagation being among them.
I suppose that might explain why my patch gives codegen improvements for MIPS and X86 but not for AArch64, at least according to the tests I had to update.
I thought the argument for having a CSEMIRBuilder is as a compile time improvement to avoid creating the instructions in the first place. I do think we are missing optimization combines during legalization though, copy propagation being among them.
I think MIRBuilders in general should be doing way more up-front simplification to avoid generating redundant instructions. ConstantFoldingMIRBuilder doesn't seem to be used at all, and none of the MIRBuilders seem to do other trivial simplifications like x+0 -> x.
I always thought all the stuff SelectionDAG::getNode tried to do was insane, and a large part of why the DAG is so slow. How often do these trivially simplifiable situations actually get produced during legalization and combines? I would expect it's pretty rare
It might still be a good idea to add this to CSE MIRBuilder anyway since extra copies are so common.
It's a fine balance in how much the builders should do compared to DAG. I also personally found SDAG's getNode to be doing too much (and some of that code was replicated in the DAGCombines).
CSEMIRBuilder builds on the other builders by adding CSE. ConstantFoldingBuilder was POC on how to extend/customize builders. While I'm not sure where the balance on how much the builders should do, assuming we wanted to fold copies, it should be structured to be in one of the base builders (as the change seems unrelated to CSEing in general).
If you're ever going to do these simplifications, then surely it makes sense to do them up front so you don't have to create the MIR, rather than doing them in a later combine/simplification pass?
How often do these trivially simplifiable situations actually get produced during legalization and combines? I would expect it's pretty rare
I tried a quick hack to simplify integer arithmetic in MIRBuilder, and you can see the effect on AMDGPU legalization tests here: https://github.com/jayfoad/llvm-project/commit/00d98a70c4dc8e8db2f8465104a5e65ca12409f5
llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp | ||
---|---|---|
168–172 | I ran into this problem with AArch64InstructionSelector::fixupPHIOpBanks but I'm not sure hwo to handle it, since the register bank is not set until after the copy has been created. |
These look like some of the cases I want to have an optimizing combiner during the legalizer, but I'm not sure it means they should be done in the MIRBuilder. Some of these cases I think are avoidable. For example, I don't think we take the best path to legalize merge/unmerge, which in turn ends up producing more of these trivial situations
Yes we could avoid some trivial cases where we do a buildAdd of a buildConstant of 0 or similar: https://github.com/jayfoad/llvm-project/commit/90bc2d596adac8c2e47cc3c763a648da580feb92
But I think most of the simplification opportunities come from lowering/legalizing ops whose operands may or may not be constants, and the lowering/legalization code doesn't bother to check for the constant case. And personally I think it would be tedious and error-prone to have to do that, when you could rely on a builder to do it for you instead.
This would also have to be careful to not fold away cross class or bank copies