This change uses adding implicit use of AMDGPU::EXEC register to avoid VGPR copies movement across the instructions defining EXEC.
Details
Diff Detail
Event Timeline
Please do not forget to add llvm-commits.
I personally do not object this change, let's hear other voices.
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
502–508 ↗ | (On Diff #206265) | Blindly stripping off any implicit_operands could be error prone. We do use other implicit operands on V_MOV_B32 for indirect register indexing |
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
3923–3930 ↗ | (On Diff #206265) | I don't see why this is necessary |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
502–508 ↗ | (On Diff #206265) | This is done just before calling MachineInstr::addImplicitUseDefOperands(). |
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
3923–3930 ↗ | (On Diff #206265) | To prevent moving VGPR copies across the EXEC definitions |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
504 ↗ | (On Diff #206265) | I believe the intent was to remove only duplicate operands, not all implicit operands at all. |
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
3923–3930 ↗ | (On Diff #206265) | I think the question is more, why isn't the addOperand executed unconditionally for non-SGPR destinations? That is: if (!RI.isSGPRClass(DstRC) && !Copy->readsRegister(AMDGPU::EXEC, &RI)) Copy->addOperand(MachineOperand::CreateReg(AMDGPU::EXEC, false, true)); ... without the preceding loop. Presumably the answer might have to do with PHIs that have undef / IMPLICIT_DEF operands? Though, isn't it preferable for those to just be reduced to an IMPLICIT_DEF instead of a COPY from an IMPLICIT_DEF? And if the COPY is optimized away later, does adding the EXEC really hurt? |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
504 ↗ | (On Diff #206265) | For the AMDGPU::COPY, that is the only case here, we don't expect any implicit operands to exist except those added by legalizeGenericOperand. |
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
3923–3930 ↗ | (On Diff #206265) | The PHIs that have an input of COPY from IMPLICIT_DEF were exactly the reason for the "preceding loop" :) |
I'm still not convinced just sticking EXEC reads on copies like this is a complete solution. Yes, we'll now insert the exec read on any copies that came out of isel, but any pass could introduce new copies. I guess if generic passes were more restricted when copies have implicit uses? It would require careful auditing of any generic code that can introduce copies, and may be more pessimizing than we really want
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
502–508 ↗ | (On Diff #206265) | This would be a lot simpler to do I think this part should be split off into a separate patch to fix the extra implicit uses |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
502–508 ↗ | (On Diff #206265) | As you mentioned, there may be another copy makers besides the legalizeGenericOoperand. |
lib/CodeGen/PeepholeOptimizer.cpp | ||
---|---|---|
1811 ↗ | (On Diff #206265) | We should check that we don't have any implicit def and that would be good. |
According the reviewer's request the check for implicit defs has been added.
New helper functions in MachineInstr.h are necessary because the current interface like MachineInstr::getNumExplicitDefs returns just the MCInstDesc::NumDefs for non variadic opcodes.
LGTM for the generic part.
Nits below.
llvm/include/llvm/CodeGen/MachineInstr.h | ||
---|---|---|
430 | Nit: Period at the end of sentence. | |
435 | MO.isDef implies MO.isReg. | |
448 | Implicit operands must be registers. return getNumOperands() - getNumExplicitOperands(); |
Nit: Period at the end of sentence.