This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Introduce and use the new PRED_COPY opcode
AbandonedPublic

Authored by yassingh on May 11 2023, 11:34 AM.

Details

Reviewers
arsenm
cdevadas
Group Reviewers
Restricted Project
Summary

Since all active lanes data should be preserved while splitting vector registers
that might contain SGPR spills we introduce new opcode to be used instead of
TargetOpcode::COPY.

Define a new predicated copy opcode (PRED_COPY) to be used in LiveRangeSplitting.
Later instead of lowering it to corresponding MachineInstrction it will be lowered to COPY
Instruction after the necessary WWM code is inserted around it in SISimplifyPredicatedCopyPass.
This will prevent duplication of lowerCopy() code.

Diff Detail

Event Timeline

yassingh created this revision.May 11 2023, 11:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2023, 11:34 AM
yassingh requested review of this revision.May 11 2023, 11:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2023, 11:34 AM
yassingh added reviewers: arsenm, cdevadas, Restricted Project.May 14 2023, 11:19 PM

I haven't followed this closely, but how will this interact with GlobalISel? Do we plan to introduce PRED_COPY as well after RegBankSelect there, or is it strictly going to be a Post-ISel thing?
If it's not making it in GISel anytime soon, is it somehow possible to ensure it cannot be found in generic MIR? We have a lot of places that check for ::COPY and if we start having multiple COPY opcodes, I think it could break stuff (e.g. IgnoreCopies pattern bit).

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2365–2367
7906

leftover?

llvm/lib/Target/AMDGPU/SIInstructions.td
3356–3358

Do you have more detail on the failure that's happening when this is used?
Out of correctness it might be nice to have the implicit $exec

(note: no strong opinion either so if other reviewers think it's fine as-is, then it's good for me too)

llvm/lib/Target/AMDGPU/SISimplifyPredicatedCopies.cpp
3

formatting

12–16

Small high-level nit: I think "Lowering" is the right term here. It's not simplifying them, it's completely replacing them with something else right?

61–64
81–82

also nit: you could also use MI.getOpcode() directly unless you need to check the opcode again later.

The current plan is to drop this and replace copies during live range splitting

cdevadas added inline comments.May 17 2023, 6:34 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
3356–3358

The SIFixVGPRCopies pass inserted post regalloc pipeline will add the implicit exec for all vector copies. The Pred_COPY opcode is only short-lived. It will only get introduced by the liverange-split during regalloc and later SISimplifyPredicatedCopies inserted before virtregrewriter changes the Pred_COPY back to COPY opcode after manipulating the wwm-copy as needed.
It's ok even if we don't add the implicit use here.

yassingh updated this revision to Diff 523308.May 18 2023, 3:13 AM
  • Review comments and rebase.
yassingh marked 6 inline comments as done.May 18 2023, 4:30 AM

I haven't followed this closely, but how will this interact with GlobalISel?

No. It's introduced in the greedy register allocator and removed right after.

llvm/lib/Target/AMDGPU/SIInstructions.td
3356–3358
static const TargetRegisterClass *canFoldCopy(const MachineInstr &MI,
                                              unsigned FoldIdx) {
  const TargetInstrInfo &TII = *MI.getParent()->getParent()->getSubtarget().getInstrInfo();
  assert(TII.isCopyInstr(MI) && "MI must be a COPY instruction");
  if (MI.getNumOperands() != 2)
    return nullptr;

This function will early exit if we have 'implicit exec' right from the start for PRED_COPY. There might be more places where we will run into similar problems(haven't fully explored them).

Since we lower PRED_COPY back to COPY which explicitly adds 'exec' operand in SIFixVGPRCopies not having this flag here doesn't affect anything.

llvm/lib/Target/AMDGPU/SISimplifyPredicatedCopies.cpp
80–82

I accidentally added this comment here, removed in dependent patch D143762.

cdevadas added inline comments.May 18 2023, 4:39 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1343

This is no longer needed. We have to deal with Pred-COPY only when greedy-regalloc is invoked, not in the fastalloc.
Can you remove it?

yassingh updated this revision to Diff 524229.May 22 2023, 4:07 AM
  • Remove SISimplifyPredicatedCopies from O0
  • Rebase and formattting

I don’t understand why you are updating this patch when it’s replaced with the new live range splitting hooks

I don’t understand why you are updating this patch when it’s replaced with the new live range splitting hooks

This is target handling for introducing the LiveRange Split opcode (Pred-COPY) for AMDGPU and together with D150388, this patch will avoid my initial Pred-COPY generic opcode implementation patches D143754 & D143757.

I don’t understand why you are updating this patch when it’s replaced with the new live range splitting hooks

This is target handling for introducing the LiveRange Split opcode (Pred-COPY) for AMDGPU and together with D150388, this patch will avoid my initial Pred-COPY generic opcode implementation patches D143754 & D143757.

More specifically I meant SimplifyPredicatedCopies. If they're only inserted in contexts where they are required, then we wouldn't need a pass to remove the unnecessary cases

I don’t understand why you are updating this patch when it’s replaced with the new live range splitting hooks

This is target handling for introducing the LiveRange Split opcode (Pred-COPY) for AMDGPU and together with D150388, this patch will avoid my initial Pred-COPY generic opcode implementation patches D143754 & D143757.

More specifically I meant SimplifyPredicatedCopies. If they're only inserted in contexts where they are required, then we wouldn't need a pass to remove the unnecessary cases

We could merge WWM code insertion logic with SIFixVGPRCopies while still lowering PRED_COPY back to COPY there(to prevent duplication of COPY lowering logic)? That would avoid running a separate pass. Or are you suggesting another approach altogether to handle PRED_COPY instructions?

We could merge WWM code insertion logic with SIFixVGPRCopies while still lowering PRED_COPY back to COPY there(to prevent duplication of COPY lowering logic)? That would avoid running a separate pass. Or are you suggesting another approach altogether to handle PRED_COPY instructions?

No, I'm saying there's no case where PRED_COPY->COPY makes sense. PRED_COPY needs to be lowered differently from COPY. In this scheme we aren't speculatively spamming PRED_COPY on SGPR copies which later need to be simplified. There's no simplification to be performed, only lowering

No, I'm saying there's no case where PRED_COPY->COPY makes sense. PRED_COPY needs to be lowered differently from COPY.

The reason we decided to lower it to COPY was that way this PRED_COPY instruction won't be live for too long and we won't have to worry about how it interacts with other passes.

yassingh abandoned this revision.Jun 6 2023, 5:45 AM

Merged with D143762