This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing
ClosedPublic

Authored by kerbowa on Nov 8 2020, 11:28 PM.

Details

Summary

It is possible for copies or spills to be inserted in the middle of indirect
addressing sequences which use VGPR indexing. Spills to accvgprs could be
effected by the indexing mode.

Add new pseudo instructions that are expanded after register allocation to avoid
the problematic spill or copy placement.

Diff Detail

Event Timeline

kerbowa created this revision.Nov 8 2020, 11:28 PM
kerbowa requested review of this revision.Nov 8 2020, 11:28 PM
kerbowa updated this revision to Diff 303757.Nov 8 2020, 11:40 PM

Remove dead code.

Harbormaster completed remote builds in B78063: Diff 303757.

Is there a practical reason to have different pseudos for set_idx and movrel methods and not just expand a common pseudo accordingly post RA?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
3619–3620

Comment is obsolete.

llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
1338

MI.isBundle()

arsenm added inline comments.Nov 9 2020, 10:35 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
2721

addReg on separate line

Is there a practical reason to have different pseudos for set_idx and movrel methods and not just expand a common pseudo accordingly post RA?

It's a good question but in the end, I chose not to since m0 initializations can be rescheduled, and combing the pseudo's would disallow this. The other issue is that the movrel pseudo includes the scalar version which is used on targets with VGPR indexing as well. I'm open to trying it if you think it seems more practical.

Is there a practical reason to have different pseudos for set_idx and movrel methods and not just expand a common pseudo accordingly post RA?

It's a good question but in the end, I chose not to since m0 initializations can be rescheduled, and combing the pseudo's would disallow this. The other issue is that the movrel pseudo includes the scalar version which is used on targets with VGPR indexing as well. I'm open to trying it if you think it seems more practical.

Both methods use M0, aren't they? The scalar version might be an argument, but I guess it is just a matter of selecting an indirect move for scalar version based on the the subtarget or not.

Is there a practical reason to have different pseudos for set_idx and movrel methods and not just expand a common pseudo accordingly post RA?

It's a good question but in the end, I chose not to since m0 initializations can be rescheduled, and combing the pseudo's would disallow this. The other issue is that the movrel pseudo includes the scalar version which is used on targets with VGPR indexing as well. I'm open to trying it if you think it seems more practical.

Both methods use M0, aren't they?

Yes, but the GPR_IDX method also sets the mode at the same time and cannot be rescheduled unlike a simple write to m0.

The scalar version might be an argument, but I guess it is just a matter of selecting an indirect move for scalar version based on the the subtarget or not.

The s_movrel is expanded from INDIRECT_REG_WRITE_MOVREL_pseudo for subtargets with and without gpr indexing. The VGPR indexing only effects the vector variants where what matters is whether we are changing the mode register or not. Since this method sets m0 and the mode at the same time the pseudo instruction takes an extra sgpr operand. The v_movrel version emits the write to m0 earlier, and so this pseduo does not need the extra operand unless we combine the pseudos and are delaying emitting the m0 write until later than we currently do. My thought was that since the m0 write in the movrel version can be rescheduled it should be expanded earlier.

OK, could you please run PSDB on it as the change is fairly big?

arsenm added inline comments.Nov 12 2020, 11:48 AM
llvm/test/CodeGen/AMDGPU/indirect-addressing-si-gfx9.ll
43–44

Could also use a MIR copy in the exact situation the bad copies were inserted

kerbowa updated this revision to Diff 308201.Nov 29 2020, 12:27 AM

Address comments.

kerbowa marked 2 inline comments as done.Nov 29 2020, 12:29 AM

EPSDB is passing.

arsenm added inline comments.Nov 30 2020, 7:59 AM
llvm/test/CodeGen/AMDGPU/expand-si-indirect.mir
26

Can strip out the type, it's a minor bug this exists here

36

Can you -run-pass=none to compact the register numbers

kerbowa updated this revision to Diff 308543.Nov 30 2020, 10:22 PM

Fix test.

kerbowa marked 2 inline comments as done.Nov 30 2020, 10:23 PM
ruiling added a subscriber: ruiling.Dec 6 2020, 7:31 PM

The idea of the patch looks good to me. I went through the patch, didn't see other issues than the inline comments. But I do hope more experienced guy take further look. Thanks for working on this as we also meet the issue that TwoAddressInstruction pass inserts COPY inside the s_set_gpr_idx_on/off instructions and generate mis-compiled shader. So we really depends on this patch to fix the issue.

llvm/lib/Target/AMDGPU/SIInstructions.td
611

We may also need "let Defs = [M0];" here? as we are changing the bits of M0 registers.
And what kind of benefit we can get through adding M0 into Uses?

626

and also here?

rampitec added inline comments.Dec 7 2020, 3:48 PM
llvm/lib/Target/AMDGPU/SIInstructions.td
611

We may also need "let Defs = [M0];" here? as we are changing the bits of M0 registers.

I think that's right, and MODE too, plus add MODE to uses. S_SET_GPR_IDX_ON defs and uses both.

ruiling added inline comments.Dec 7 2020, 6:50 PM
llvm/lib/Target/AMDGPU/SIInstructions.td
611

@rampitec Thanks for a second look. I think if the new pseudo instruction only touch the mode.gpr_idx_en bit, other instructions (i.e. not s_set_gpr_idx_on/off) that modify/read the MODE register are free to schedule cross this new pseudo instruction and not having problems. So I think we may not need to add the MODE to Uses or Defs here. I am not sure if I missed some-case.

rampitec added inline comments.Dec 7 2020, 8:13 PM
llvm/lib/Target/AMDGPU/SIInstructions.td
611

For some reason set_gpr_idx lists mode. We need to ask Matt why.

JBTW, having a need to def M0 probably defeats the idea of potential rescheduling which justifies the separation of the indirect access methods.

JBTW, having a need to def M0 probably defeats the idea of potential rescheduling which justifies the separation of the indirect access methods.

The GPR_IDX variants are where we are speaking of adding the def M0 right? These are not expanded until after the scheduler. The MOVREL version's M0 def is expanded earlier during ISel finalization.

My thinking was that we wanted rescheduling the M0 def with the MOVREL version but not the GPR_IDX version and that combining these methods wouldn't allow that.

JBTW, having a need to def M0 probably defeats the idea of potential rescheduling which justifies the separation of the indirect access methods.

The GPR_IDX variants are where we are speaking of adding the def M0 right? These are not expanded until after the scheduler. The MOVREL version's M0 def is expanded earlier during ISel finalization.

My thinking was that we wanted rescheduling the M0 def with the MOVREL version but not the GPR_IDX version and that combining these methods wouldn't allow that.

OK, but def is still seems to be needed. We also need Matt to clrify MODE question.

kerbowa updated this revision to Diff 310309.Dec 8 2020, 12:08 PM

Add M0 to defs.

rampitec accepted this revision.Dec 8 2020, 12:17 PM

LGTM. Not sure why MODE was also defined by set_gpr_idx, but that can be corrected later if needed.

This revision is now accepted and ready to land.Dec 8 2020, 12:17 PM
This revision was landed with ongoing or failed builds.Dec 8 2020, 12:24 PM
This revision was automatically updated to reflect the committed changes.