Peephole optimizer should not be introducing sub-reg definitions
as they are illegal in machine SSA phase. This patch modifies
the optimizer to not emit sub-register definitions.
Details
- Reviewers
nemanjai arsenm efriedma MatzeB - Group Reviewers
Restricted Project - Commits
- rG31ef15e0442a: Teach peephole optimizer to not emit sub-register defs
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I think that the original author (Matthias Braun) no longer contributes to LLVM as I haven't seen any activity from him so I'll add some reviewers that may be familiar with this code.
llvm/lib/CodeGen/DetectDeadLanes.cpp | ||
---|---|---|
343–344 ↗ | (On Diff #348973) | Can this be conditional so we don't change the behaviour of the common case? if (unsigned SubIdx = Def.getSubReg()) DefinedLanes = TRI->composeSubRegIndexLaneMask(SubIdx, DefinedLanes); |
This assert is correct. Subregister defs in SSA are illegal, and this does run on SSA.
Also could use a mir test
I'm not sure this is the right fix.
From my understanding, REG_SEQUENCE, INSERT_SUBREG, EXTRACT_SUBREG, and PHI should never target a subregister. For COPY, we can target a subregister, but only if that register is a physical register. So this patch only affects COPY instructions.
Given that, I think we should be bailing out sooner; trying to understand subregister defs of physical registers seems tricky, and not really related to the purpose of this pass.
Physical registers do not have subregister indexes. TRI::getSubReg resolves the physreg with index to a different physreg value. It's invalid to combine the two
It certainly does sound odd to have a subreg on a def of a virtual register in SSA. We'll see where this is coming from and why but just a quick look at the -print-after-all output reveals that it is the peephole optimizer that produces it. We'll look into whether this is a bug in the peephole optimizer or if this is some PPC callback called from the peephole optimizer that does this.
Address comments:
- Subreg defs are indeed illegal; taught peephole optimizer not to emit these.
- Added mir test case.
llvm/lib/CodeGen/PeepholeOptimizer.cpp | ||
---|---|---|
592 | I don't see why you need to special case the use opcode | |
608 | The SubIdx should not be passed through here? | |
llvm/test/CodeGen/PowerPC/peephole-subreg.mir | ||
5–30 ↗ | (On Diff #353472) | IR section is not necessary |
32 ↗ | (On Diff #353472) | Should use a relevant name and add a descriptive comment |
35–42 ↗ | (On Diff #353472) | Registers section not necessary |
llvm/lib/CodeGen/PeepholeOptimizer.cpp | ||
---|---|---|
592 | Thanks, you are right, it is not really needed. I have removed the check for the opcode. I have also refined the approach to handle the case for when UseSrcSubIdx is true. | |
608 | I think, we need to copy 32-bit subreg value from the 64-bit register. So we need to generate something like this: Removing SubIdx, leads to copying the complete 64-bit register and we end up generating: which does not look right, unless I am missing something here. | |
llvm/test/CodeGen/PowerPC/peephole-subreg.mir | ||
5–30 ↗ | (On Diff #353472) | I have simplified the IR to generate the test case. However, removing the IR section, gives me error for use of undefined global value '@strlen'. |
llvm/test/CodeGen/PowerPC/peephole-subreg-def.mir | ||
---|---|---|
9–27 | Since I renamed the file, I am writing my response to the earlier comment, of removing the IR section, here for better visibility: "I have simplified the IR to generate the test case. However, removing the IR section, gives me error for use of undefined global value @strlen." |
Sorry for the late reply, I did not see any of the existing test cases trigger on the subreg definition that peephole opt was introducing. I observed it when using -ppc-track-subreg-liveness option for certain test cases.
llvm/test/CodeGen/PowerPC/peephole-subreg-def.mir | ||
---|---|---|
9–27 | You can strip the strlen reference from the MIR, either by replacing with 0 or just deleting the calli. Peephole opt doesn't care about the call here |
llvm/test/CodeGen/PowerPC/peephole-subreg-def.mir | ||
---|---|---|
9–27 | Thanks for your patience. I have been able to remove the IR section. |
llvm/lib/CodeGen/PeepholeOptimizer.cpp | ||
---|---|---|
600 | I don't see why this second copy is necessary since %3 and %6 have the same class, but I guess that's not a big deal |
I don't see why you need to special case the use opcode