This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Reject moving PHI to VALU if the only VGPR input originated from move immediate
ClosedPublic

Authored by alex-t on May 22 2020, 5:34 AM.

Details

Summary

PHIs result register class is set to VGPR or SGPR depending on the cross block value divergence.

       In some cases uniform PHI need to be converted to return VGPR to prevent the oddnumber of moves values from VGPR to SGPR and back.
       PHI should certainly return VGPR if it has at least one VGPR input. This change adds the exception. 
       We don't want to convert uniform PHI to VGPRs in case the only VGPR input is a VGPR to SGPR COPY and definition od the 
       source VGPR in this COPY is move immediate.
                       
bb.0:
  
   %0:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
   %2:sreg_32 = .....

bb.1:
   %3:sreg_32 = PHI %1, %bb.3, %2, %bb.1
   S_BRANCH %bb.3

bb.3:
   %1:sreg_32 = COPY %0 
   S_BRANCH %bb.2

Diff Detail

Event Timeline

alex-t created this revision.May 22 2020, 5:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 22 2020, 5:34 AM
alex-t updated this revision to Diff 265715.May 22 2020, 5:38 AM

test added

rampitec requested changes to this revision.May 22 2020, 10:31 AM

Every mov isMoveImmediate(), it is just a property on MCDesc.
I also assume it relies on the folding later. If that folding will not happen for whatever reason it will be a broken IR. Right?

This revision now requires changes to proceed.May 22 2020, 10:31 AM
alex-t updated this revision to Diff 265786.May 22 2020, 12:45 PM

Explicit check move source operand to be immediate.

Every mov isMoveImmediate(), it is just a property on MCDesc.
I also assume it relies on the folding later. If that folding will not happen for whatever reason it will be a broken IR. Right?

I meant the move a constant value to VGPR. In this particular case it was moving 0. Then GVN force the VGPR to be reused multiple times along the basic block and it's successors.
The PHI in question just needed value 0. No other reasons to use VGPR.

You do not even need a loop to get PHI. A PHI might occur after an simple IF.
Nevertheless, patch retains an illegal copy which it is supposed to remove. The fact that it works in this testcase does not mean it will always work.
I would suggest to fold the immediate copy instead. I.e. you can replace COPY with an S_MOV_B32 (given it fits 32 bit of course).

You do not even need a loop to get PHI. A PHI might occur after an simple IF.
Nevertheless, patch retains an illegal copy which it is supposed to remove. The fact that it works in this testcase does not mean it will always work.
I would suggest to fold the immediate copy instead. I.e. you can replace COPY with an S_MOV_B32 (given it fits 32 bit of course).

The following code does it in one of the next iterations

// If we are just copying an immediate, we can replace the copy with
// s_mov_b32.
if (isSafeToFoldImmIntoCopy(&MI, DefMI, TII, SMovOp, Imm)) {
  MI.getOperand(1).ChangeToImmediate(Imm);
  MI.addImplicitDefUseOperands(MF);
  MI.setDesc(TII->get(SMovOp));
  break;
}

You do not even need a loop to get PHI. A PHI might occur after an simple IF.
Nevertheless, patch retains an illegal copy which it is supposed to remove. The fact that it works in this testcase does not mean it will always work.
I would suggest to fold the immediate copy instead. I.e. you can replace COPY with an S_MOV_B32 (given it fits 32 bit of course).

The following code does it in one of the next iterations

// If we are just copying an immediate, we can replace the copy with
// s_mov_b32.
if (isSafeToFoldImmIntoCopy(&MI, DefMI, TII, SMovOp, Imm)) {
  MI.getOperand(1).ChangeToImmediate(Imm);
  MI.addImplicitDefUseOperands(MF);
  MI.setDesc(TII->get(SMovOp));
  break;
}

What if not isSafeToFoldImmIntoCopy?

You do not even need a loop to get PHI. A PHI might occur after an simple IF.
Nevertheless, patch retains an illegal copy which it is supposed to remove. The fact that it works in this testcase does not mean it will always work.
I would suggest to fold the immediate copy instead. I.e. you can replace COPY with an S_MOV_B32 (given it fits 32 bit of course).

The following code does it in one of the next iterations

// If we are just copying an immediate, we can replace the copy with
// s_mov_b32.
if (isSafeToFoldImmIntoCopy(&MI, DefMI, TII, SMovOp, Imm)) {
  MI.getOperand(1).ChangeToImmediate(Imm);
  MI.addImplicitDefUseOperands(MF);
  MI.setDesc(TII->get(SMovOp));
  break;
}

What if not isSafeToFoldImmIntoCopy?

Okay. isSafeToFoldImmIntoCopy returns 'false' by one of the following:

if (Copy->getOpcode() != AMDGPU::COPY)
   return false;

 if (!MoveImm->isMoveImmediate())
   return false;

 const MachineOperand *ImmOp =
     TII->getNamedOperand(*MoveImm, AMDGPU::OpName::src0);
 if (!ImmOp->isImm())
   return false;

 // FIXME: Handle copies with sub-regs.
 if (Copy->getOperand(0).getSubReg())
   return false;

 switch (MoveImm->getOpcode()) {
 default:
   return false;

The latter 2 conditions - subregs and exact move opcode - are not handled in my code.
So, what would you suggest:

  1. guard with if (isSafeToFoldImmIntoCopy(...))
  2. add subreg and exact opcode check to my 'if' ?
alex-t updated this revision to Diff 266021.May 25 2020, 7:30 AM

Explicit call to isSafeToFoldImmIntoCopy added. Corresponding COPY converted to S_MOV right away.
2 test cases added to check for copy to subreg and inappropriate move opcodes.

rampitec added inline comments.May 26 2020, 10:47 AM
llvm/test/CodeGen/AMDGPU/phi-vgpr-input-moveimm.mir
36

Please fix the label to be a function name.

69

Here too.

72

Does it compile with a white space in function's name?

alex-t updated this revision to Diff 266477.May 27 2020, 3:44 AM
alex-t marked an inline comment as done.

test corrected

llvm/test/CodeGen/AMDGPU/phi-vgpr-input-moveimm.mir
72

Yes. Though the whitespace it a typo it compiles and same name is in output file.

alex-t marked 3 inline comments as done.May 27 2020, 3:45 AM
This revision is now accepted and ready to land.May 27 2020, 9:45 AM
This revision was automatically updated to reflect the committed changes.