This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics
ClosedPublic

Authored by hsmhsm on Feb 16 2020, 3:30 AM.

Details

Summary

Lower trap and debugtrap intrinsics to AMDGPU machine instruction(s).

Event Timeline

hsmhsm created this revision.Feb 16 2020, 3:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2020, 3:30 AM
arsenm added inline comments.Feb 17 2020, 6:37 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3497–3498

This is directly copying the physical register into another physical register. We don't want that. This needs to go through an intermediate virtual register copy

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.debugtrap.ll
4–15

Shouldn't run the DAG tests in the GlobalIsel test directory

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll
96

Can you add another test that uses the stack?

Ideally we would also have a test in a non-kernel function, but I know that won't work right now since we don't handle the special argument inputs yet

arsenm added inline comments.Feb 17 2020, 6:39 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3497–3498

Can you add another assert in loadInputValue? It already asserts assert(Arg->getRegister().isPhysical()), but should also assert the result is virtual

Can you also add a MIR test to make sure the intermediate virtual copy is added? You'll need to explicitly add the queueptr in the MIR MachineFunctionInfo

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll
96

Actually, never mind. A non-kernel function should work. Calling it will not

hsmhsm updated this revision to Diff 245020.Feb 17 2020, 11:51 AM

Take care of review comments by arsenm.

  1. Introduce intermediate virtual register copy
  2. Remove DAG tests in the GlobalIsel test directory
  3. Add LLVM IRs that uses the stack
  4. Assert that destination register is virtual within loadInputValue()
hsmhsm marked 4 inline comments as done.Feb 17 2020, 11:56 AM

Regarding the sharing of original ISelDAG test within GlobalISel, it seems to be not working since there are checks like below in case of ISelDAG path. Hence, let's keep the both the tests separate for now.

; MESA-TRAP: .section .AMDGPU.config
; MESA-TRAP: .long 47180
; MESA-TRAP-NEXT: .long 208

Regarding the sharing of original ISelDAG test within GlobalISel, it seems to be not working since there are checks like below in case of ISelDAG path. Hence, let's keep the both the tests separate for now.

; MESA-TRAP: .section .AMDGPU.config
; MESA-TRAP: .long 47180
; MESA-TRAP-NEXT: .long 208

I would expect these to be the same in both?

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3499

loadInputValue calls getLiveInRegister, you shouldn't need it here

3500

Why do you need to set the class? It shouldn't be necessary at this point

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll
96

Should still add a function test. Also one that has a separate, explicit use of the queue.ptr intrinsic wouldn't hurt either

hsmhsm updated this revision to Diff 245088.Feb 17 2020, 11:44 PM

Share the original DAG test for GlobalISel, and do not re-invent new one here.

hsmhsm marked 3 inline comments as done.Feb 18 2020, 12:08 AM
hsmhsm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3499

My understanding of your earlier review comments is:

  1. loadInputValue() should expect, source register as physical register, and destination register as virtual register, and we should have assertions for these accordingly. The assertion for later was missing, that I added as per your review suggestion.
  2. Now, going by above, it is the responsibility of the caller of loadInputValue() to make sure that it passes destination register as virtual register.
  3. Hence, I created a destination virtual register here by calling getLiveInRegister(). The another call within loadInputValue() is for source physical register.

I am not getting what am I missing here.

3500

We need to set a register class for destination virtual register. otherwise, we get an assertion failure during compilation saying that the register type is not valid.

If it is not suppose to be setting here, where else should I be setting it?
OR
do I need to handle it in a different way? Please suggest it in more detail.

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll
96

Actually, I am bit confused about writing test cases here. Let's start from scratch here.

  1. As you suggested, I have shared the same original DAG test for GlobalISel, without re-inventing the same for GlobalISel. I was making some mistakes earlier, that I learned and corrected it, and sharing of original DAG test works fine now.
  2. Now, starting from this point, please suggest how should I further continue here.
    • What are the additional tests that need to be covered here, and why?
    • Do I need to add the additional tests within original DAG test, and make sure both ISelDAG path and GlobalSel path work fine for these additional tests?
    • Or these are specific to only GlobalISel path for some specific reason?
arsenm added inline comments.Feb 18 2020, 9:55 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3499

loadInputValue loads a physical register copy into a virtual register. This was written for the use by the intrinsics, which have a virtual register result write to already. In this case you don't have or really need one.

What you really want is a version of getLiveInRegister that ensures the copy is inserted, as the second half of loadInputValue does. What you have now I think happens to work, but confusingly since the loadInputValue call will insert it

You may want to split this part into a helper function:

// Insert the argument copy if it doens't already exist.
// FIXME: It seems EmitLiveInCopies isn't called anywhere?
if (!MRI.getVRegDef(LiveIn)) {
  // FIXME: Should have scoped insert pt
  MachineBasicBlock &OrigInsBB = B.getMBB();
  auto OrigInsPt = B.getInsertPt();

  MachineBasicBlock &EntryMBB = B.getMF().front();
  EntryMBB.addLiveIn(Arg->getRegister());
  B.setInsertPt(EntryMBB, EntryMBB.begin());
  B.buildCopy(LiveIn, Arg->getRegister());

  B.setInsertPt(OrigInsBB, OrigInsPt);
}
3500

The register class is only tangentially related to the type. I think you somehow ended up creating a virtual register without setting the type on it. Did something already add the queue ptr as a live in before this lowering?

I also would not call this VDstReg, as that makes it sound like a VGPR.

Another possible strategy would be to emit a call to the queue.ptr intrinsic, and allow that to be legalized

hsmhsm updated this revision to Diff 245346.Feb 19 2020, 1:04 AM

Take care of further review comments by arsenm.

  1. Add an another version of getLiveInRegister() which works with TargetRegisterClass
  2. While copying to destination virtual register, make sure that it is defined
  3. Bit of code refactoring around related part of the source.
hsmhsm marked 2 inline comments as done.Feb 19 2020, 1:12 AM
hsmhsm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3499

I have taken care of it. Now, before making copy to destination virtual register, we ensure that it is defined.

3500

I have taken care of it. I just needed an another version of getLiveInRegister() which works with TargetRegisterClass.

arsenm added inline comments.Feb 19 2020, 2:23 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2237–2238

I still don't think you should need the register class set. This should still have the type set on the virtual register

2243–2246

I think it would be better to have this only take the physical register input, and return the livein virtreg. This wouldn't have the
// Destination virtual register is already defined, just insert copy

case. The intrinsic lowering cases would then be responsible for inserting the extra copy to the expected result

2283

This shouldn't be called twice

2297–2301

Move the assignment up to avoid the weird !()

3495

Ditto

3502

Remove DstReg and just directly refer to the physical register

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
96

Should use MCRegister for Reg to make it clear it's physical.

hsmhsm updated this revision to Diff 245576.Feb 19 2020, 9:56 PM

Changes as per further review comments:

  1. First, I would like to make sure that all the *necessary* foundational issues are handled before actually discussing the crux of some assertion failure, that we hit, if we do not assign a register class to a virtual live-in register of SGPR01.
  2. So, I have taken care of all your previous review comments here, and I have deliberately put assertions about physical registers where it is expected.
  3. Let's ensure these changes are as per your satisfaction, before moving to next point of discussion. Please let me know if I need to take care any remaining issues here before actually discussing assertion failure related issues.
hsmhsm marked 6 inline comments as done.Feb 19 2020, 10:04 PM
hsmhsm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2237–2238

I have taken care of it. But, we hit with an assertion, if we do not set register class. However, let's assume for now that this is what expected here, and later discuss this assertion issue, once we are fine with all the foundational changes.

2243–2246

taken care

2283

taken care

2297–2301

taken care

3495

taken care

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
96

BuildCopy seems not work with MCRegister, hence added an assertion to ensure that *PhyReg* is indeed physical register.

arsenm added inline comments.Feb 20 2020, 4:26 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2243–2246

This is halfway there. You're still passing in both the live in physical register and the corresponding virtual register. I was thinking you would call getLiveInRegister with just the physical register, and it would be responsible for finding out the virtual register, and inserting the entry block copy by calling insertLiveInCopy. loadInputValue then wouldn't need to worry about ensuring the entry block copy was inserted like it does now.

3500

Physical registers don't have types. This will just assert. Here you know the type is just LLT::pointer(CONSTANT_ADDRESSS, 64)

hsmhsm updated this revision to Diff 246097.Feb 22 2020, 10:27 PM

Take care of further review comments by arsenm.

hsmhsm marked an inline comment as done.Feb 22 2020, 10:28 PM
hsmhsm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2243–2246

Taken care

t-tye added a comment.Feb 23 2020, 7:09 AM

I believe that the debug_trap trap handler support no longer needs the queue_ptr to be passed in as it is internally computing it from the dootbell ID available from a GETREG together with a doorbell to queue mapping maintained by the ROCm Runtim that is accessible through the TMB register.

So should that change be reflect in this to simplify things? This is does change the ABI so needs the HSA ABI number to be incremented in the ELF header and AMDGOUUsage documentation updated. However, from the compilers point of view it is not ABI breaking as old code will still work, as it is generating code to set the queue_ptr that is unnecessary.

Adding @kzhuravl for ELF ABI version help.

I believe that the debug_trap trap handler support no longer needs the queue_ptr to be passed in as it is internally computing it from the dootbell ID available from a GETREG together with a doorbell to queue mapping maintained by the ROCm Runtim that is accessible through the TMB register.

So should that change be reflect in this to simplify things? This is does change the ABI so needs the HSA ABI number to be incremented in the ELF header and AMDGOUUsage documentation updated. However, from the compilers point of view it is not ABI breaking as old code will still work, as it is generating code to set the queue_ptr that is unnecessary.

Adding @kzhuravl for ELF ABI version help.

Hi Tony,

If you are specifically asking for llvm.debugtrap() intrinsic here, then, we are already taken care of it, and we are not adding queue_ptr in this case, as you can see from the code changes at AMDGPULegalizerInfo.cpp:3602. The queue_ptr related discussions here are only specific to llvm.trap() intrinsic.

arsenm accepted this revision.Feb 26 2020, 7:07 PM

LGTM

This revision is now accepted and ready to land.Feb 26 2020, 7:07 PM
hsmhsm updated this revision to Diff 247447.EditedFeb 29 2020, 8:41 AM

Couple of additional required changes

[1] SGPR01 is a new register created within legalizeTrapIntrinsic(), and we actually need to get its type as LLT::scalar(64).
[2] Call to insertLiveInCopy() within getLiveInRegister() is only required if the copy is required from physical to virtual register.
[3] When call to insertLiveInCopy() within getLiveInRegister() is required, make sure that it is called irrespective of whether the virtual register is newly created within getLiveInRegister() or not.

t-tye added a comment.Mar 1 2020, 8:31 PM

I believe that the debug_trap trap handler support no longer needs the queue_ptr to be passed in as it is internally computing it from the dootbell ID available from a GETREG together with a doorbell to queue mapping maintained by the ROCm Runtim that is accessible through the TMB register.

So should that change be reflect in this to simplify things? This is does change the ABI so needs the HSA ABI number to be incremented in the ELF header and AMDGOUUsage documentation updated. However, from the compilers point of view it is not ABI breaking as old code will still work, as it is generating code to set the queue_ptr that is unnecessary.

Adding @kzhuravl for ELF ABI version help.

Hi Tony,

If you are specifically asking for llvm.debugtrap() intrinsic here, then, we are already taken care of it, and we are not adding queue_ptr in this case, as you can see from the code changes at AMDGPULegalizerInfo.cpp:3602. The queue_ptr related discussions here are only specific to llvm.trap() intrinsic.

Looking at AMDGPULegalizerInfo.cpp:3602 it appears the queue_ptr is still being set up so not sure what you mean that it is already being taken care of. What I describe is true for both llvm.trap() and llvm.debug_trap(). Should the setup of the queu_ptr be removed since it is no longer needed to support either llvm.trap() or llvm.debug_trap()? Doing so would need a change in AMDGPUUsage, a change to the ELF HSA ABI number, and corresponding ROCm loader changes. There may also need to be a plan to support older ROCm releases.

hsmhsm added a comment.Mar 1 2020, 8:59 PM

I believe that the debug_trap trap handler support no longer needs the queue_ptr to be passed in as it is internally computing it from the dootbell ID available from a GETREG together with a doorbell to queue mapping maintained by the ROCm Runtim that is accessible through the TMB register.

So should that change be reflect in this to simplify things? This is does change the ABI so needs the HSA ABI number to be incremented in the ELF header and AMDGOUUsage documentation updated. However, from the compilers point of view it is not ABI breaking as old code will still work, as it is generating code to set the queue_ptr that is unnecessary.

Adding @kzhuravl for ELF ABI version help.

Hi Tony,

If you are specifically asking for llvm.debugtrap() intrinsic here, then, we are already taken care of it, and we are not adding queue_ptr in this case, as you can see from the code changes at AMDGPULegalizerInfo.cpp:3602. The queue_ptr related discussions here are only specific to llvm.trap() intrinsic.

Looking at AMDGPULegalizerInfo.cpp:3602 it appears the queue_ptr is still being set up so not sure what you mean that it is already being taken care of.

The line number has changed with latest changes. I was basically referring to the function - AMDGPULegalizerInfo::legalizeDebugTrapIntrinsic() by assuming that you referring here only debugtrap() inntrinsic.

What I describe is true for both llvm.trap() and llvm.debug_trap(). Should the setup of the queu_ptr be removed since it is no longer needed to support either llvm.trap() or llvm.debug_trap()? Doing so would need a change in AMDGPUUsage, a change to the ELF HSA ABI number, and corresponding ROCm loader changes. There may also need to be a plan to support older ROCm releases.

I suggest that let's check-in this change so that trap and debug trap support looks similar both in ISELDAG and GLOBALISEL PATH (by setting-up queue_ptr) for now. And, we should take-up what you mention above (to remove queue_ptr) as a separate activity, first in ISELDAG path, and then in GLOBALISEL path.

t-tye added a comment.Mar 1 2020, 10:29 PM

I believe that the debug_trap trap handler support no longer needs the queue_ptr to be passed in as it is internally computing it from the dootbell ID available from a GETREG together with a doorbell to queue mapping maintained by the ROCm Runtim that is accessible through the TMB register.

So should that change be reflect in this to simplify things? This is does change the ABI so needs the HSA ABI number to be incremented in the ELF header and AMDGOUUsage documentation updated. However, from the compilers point of view it is not ABI breaking as old code will still work, as it is generating code to set the queue_ptr that is unnecessary.

Adding @kzhuravl for ELF ABI version help.

Hi Tony,

If you are specifically asking for llvm.debugtrap() intrinsic here, then, we are already taken care of it, and we are not adding queue_ptr in this case, as you can see from the code changes at AMDGPULegalizerInfo.cpp:3602. The queue_ptr related discussions here are only specific to llvm.trap() intrinsic.

Looking at AMDGPULegalizerInfo.cpp:3602 it appears the queue_ptr is still being set up so not sure what you mean that it is already being taken care of.

The line number has changed with latest changes. I was basically referring to the function - AMDGPULegalizerInfo::legalizeDebugTrapIntrinsic() by assuming that you referring here only debugtrap() inntrinsic.

OK I see that now. Thanks.

What I describe is true for both llvm.trap() and llvm.debug_trap(). Should the setup of the queu_ptr be removed since it is no longer needed to support either llvm.trap() or llvm.debug_trap()? Doing so would need a change in AMDGPUUsage, a change to the ELF HSA ABI number, and corresponding ROCm loader changes. There may also need to be a plan to support older ROCm releases.

I suggest that let's check-in this change so that trap and debug trap support looks similar both in ISELDAG and GLOBALISEL PATH (by setting-up queue_ptr) for now. And, we should take-up what you mention above (to remove queue_ptr) as a separate activity, first in ISELDAG path, and then in GLOBALISEL path.

OK that sounds reasonable. Thanks.

arsenm requested changes to this revision.Mar 2 2020, 12:03 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3499

I would just use the literal register value below rather than have a variable for it. At least make this a const Register

3500

The type here should really be LLT::pointer(CONSTANT_ADDRESS, 64)

This revision now requires changes to proceed.Mar 2 2020, 12:03 PM
hsmhsm updated this revision to Diff 247712.Mar 2 2020, 12:38 PM

Take care of latest review comments by matt about type setting of virtual register.

arsenm accepted this revision.Mar 4 2020, 12:23 PM
This revision is now accepted and ready to land.Mar 4 2020, 12:23 PM
This revision was automatically updated to reflect the committed changes.