This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions
ClosedPublic

Authored by alex-t on Jun 24 2019, 11:09 AM.

Details

Summary

This change uses adding implicit use of AMDGPU::EXEC register to avoid VGPR copies movement across the instructions defining EXEC.

Diff Detail

Repository
rL LLVM

Event Timeline

alex-t created this revision.Jun 24 2019, 11:09 AM
rampitec added subscribers: llvm-commits, Restricted Project.

Please do not forget to add llvm-commits.

I personally do not object this change, let's hear other voices.

arsenm added inline comments.Jun 24 2019, 11:29 AM
lib/Target/AMDGPU/SIFoldOperands.cpp
502–508 ↗(On Diff #206265)

Blindly stripping off any implicit_operands could be error prone. We do use other implicit operands on V_MOV_B32 for indirect register indexing

lib/Target/AMDGPU/SIInstrInfo.cpp
3923–3930 ↗(On Diff #206265)

I don't see why this is necessary

alex-t marked 2 inline comments as done.Jun 24 2019, 11:59 AM
alex-t added inline comments.
lib/Target/AMDGPU/SIFoldOperands.cpp
502–508 ↗(On Diff #206265)

This is done just before calling MachineInstr::addImplicitUseDefOperands().
Also, this is under the "if" condition UseMI->isCopy() so not V_MOV exist so far.

lib/Target/AMDGPU/SIInstrInfo.cpp
3923–3930 ↗(On Diff #206265)

To prevent moving VGPR copies across the EXEC definitions

rampitec added inline comments.Jun 24 2019, 1:18 PM
lib/Target/AMDGPU/SIFoldOperands.cpp
504 ↗(On Diff #206265)

I believe the intent was to remove only duplicate operands, not all implicit operands at all.

nhaehnle added inline comments.Jun 25 2019, 5:42 AM
lib/Target/AMDGPU/SIInstrInfo.cpp
3923–3930 ↗(On Diff #206265)

I think the question is more, why isn't the addOperand executed unconditionally for non-SGPR destinations? That is:

if (!RI.isSGPRClass(DstRC) && !Copy->readsRegister(AMDGPU::EXEC, &RI))
  Copy->addOperand(MachineOperand::CreateReg(AMDGPU::EXEC, false, true));

... without the preceding loop.

Presumably the answer might have to do with PHIs that have undef / IMPLICIT_DEF operands? Though, isn't it preferable for those to just be reduced to an IMPLICIT_DEF instead of a COPY from an IMPLICIT_DEF? And if the COPY is optimized away later, does adding the EXEC really hurt?

alex-t marked 2 inline comments as done.Jun 25 2019, 6:57 AM
alex-t added inline comments.
lib/Target/AMDGPU/SIFoldOperands.cpp
504 ↗(On Diff #206265)

For the AMDGPU::COPY, that is the only case here, we don't expect any implicit operands to exist except those added by legalizeGenericOperand.

lib/Target/AMDGPU/SIInstrInfo.cpp
3923–3930 ↗(On Diff #206265)

The PHIs that have an input of COPY from IMPLICIT_DEF were exactly the reason for the "preceding loop" :)
Since the COPY from IMPLICIT_DEF to VGPR gets equipped with implicit use of EXEC it is not considered as the conversion candidate by the processImplicitDefs pass. So, I just excluded them from marking as using EXEC.

I'm still not convinced just sticking EXEC reads on copies like this is a complete solution. Yes, we'll now insert the exec read on any copies that came out of isel, but any pass could introduce new copies. I guess if generic passes were more restricted when copies have implicit uses? It would require careful auditing of any generic code that can introduce copies, and may be more pessimizing than we really want

lib/Target/AMDGPU/SIFoldOperands.cpp
502–508 ↗(On Diff #206265)

This would be a lot simpler to do
assert(UseMI->getNumOperands() == 3 && UseMI->getOperand(2) == exec), RemoveOperand(2).

I think this part should be split off into a separate patch to fix the extra implicit uses

alex-t marked an inline comment as done.Jun 25 2019, 10:31 AM
alex-t added inline comments.
lib/Target/AMDGPU/SIFoldOperands.cpp
502–508 ↗(On Diff #206265)

As you mentioned, there may be another copy makers besides the legalizeGenericOoperand.
They not necessarily have extra EXEC operand. I primarily take care of the copies introduced by the PHI legalization.

Just ping! Does anybody has any objections?

qcolombet added inline comments.Aug 1 2019, 7:02 AM
lib/CodeGen/PeepholeOptimizer.cpp
1811 ↗(On Diff #206265)

We should check that we don't have any implicit def and that would be good.
Generally speaking, I am somewhat uncomfortable that we attach thing to COPY, but as long as this is just implicit use, that's less critical. Otherwise, like the previous comment said, this will break in subtle way.

alex-t updated this revision to Diff 214124.Aug 8 2019, 5:56 AM

According the reviewer's request the check for implicit defs has been added.
New helper functions in MachineInstr.h are necessary because the current interface like MachineInstr::getNumExplicitDefs returns just the MCInstDesc::NumDefs for non variadic opcodes.

Herald added a project: Restricted Project. · View Herald TranscriptAug 8 2019, 5:56 AM
Herald added a subscriber: hiraditya. · View Herald Transcript

LGTM for the generic part.

Nits below.

llvm/include/llvm/CodeGen/MachineInstr.h
430 ↗(On Diff #214124)

Nit: Period at the end of sentence.

435 ↗(On Diff #214124)

MO.isDef implies MO.isReg.
You can simplify the check.

448 ↗(On Diff #214124)

Implicit operands must be registers.
Thus, you can simplify all this with:

return getNumOperands() - getNumExplicitOperands();
alex-t updated this revision to Diff 215401.Aug 15 2019, 7:58 AM

Suggested nits added.

This revision was not accepted when it landed; it landed in state Needs Review.Aug 21 2019, 8:14 AM
This revision was automatically updated to reflect the committed changes.