Lower trap and debugtrap intrinsics to AMDGPU machine instruction(s).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 46596 Build 49128: arc lint + arc unit
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
3375–3376 | This is directly copying the physical register into another physical register. We don't want that. This needs to go through an intermediate virtual register copy | |
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.debugtrap.ll | ||
3–14 | Shouldn't run the DAG tests in the GlobalIsel test directory | |
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll | ||
95 | Can you add another test that uses the stack? Ideally we would also have a test in a non-kernel function, but I know that won't work right now since we don't handle the special argument inputs yet |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
3375–3376 | Can you add another assert in loadInputValue? It already asserts assert(Arg->getRegister().isPhysical()), but should also assert the result is virtual |
Can you also add a MIR test to make sure the intermediate virtual copy is added? You'll need to explicitly add the queueptr in the MIR MachineFunctionInfo
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll | ||
---|---|---|
95 | Actually, never mind. A non-kernel function should work. Calling it will not |
Take care of review comments by arsenm.
- Introduce intermediate virtual register copy
- Remove DAG tests in the GlobalIsel test directory
- Add LLVM IRs that uses the stack
- Assert that destination register is virtual within loadInputValue()
Regarding the sharing of original ISelDAG test within GlobalISel, it seems to be not working since there are checks like below in case of ISelDAG path. Hence, let's keep the both the tests separate for now.
; MESA-TRAP: .section .AMDGPU.config
; MESA-TRAP: .long 47180
; MESA-TRAP-NEXT: .long 208
I would expect these to be the same in both?
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
3377 | loadInputValue calls getLiveInRegister, you shouldn't need it here | |
3378 | Why do you need to set the class? It shouldn't be necessary at this point | |
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll | ||
95 | Should still add a function test. Also one that has a separate, explicit use of the queue.ptr intrinsic wouldn't hurt either |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
3377 | My understanding of your earlier review comments is:
I am not getting what am I missing here. | |
3378 | We need to set a register class for destination virtual register. otherwise, we get an assertion failure during compilation saying that the register type is not valid. If it is not suppose to be setting here, where else should I be setting it? | |
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll | ||
95 | Actually, I am bit confused about writing test cases here. Let's start from scratch here.
|
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
3377 | loadInputValue loads a physical register copy into a virtual register. This was written for the use by the intrinsics, which have a virtual register result write to already. In this case you don't have or really need one. What you really want is a version of getLiveInRegister that ensures the copy is inserted, as the second half of loadInputValue does. What you have now I think happens to work, but confusingly since the loadInputValue call will insert it You may want to split this part into a helper function: // Insert the argument copy if it doens't already exist. // FIXME: It seems EmitLiveInCopies isn't called anywhere? if (!MRI.getVRegDef(LiveIn)) { // FIXME: Should have scoped insert pt MachineBasicBlock &OrigInsBB = B.getMBB(); auto OrigInsPt = B.getInsertPt(); MachineBasicBlock &EntryMBB = B.getMF().front(); EntryMBB.addLiveIn(Arg->getRegister()); B.setInsertPt(EntryMBB, EntryMBB.begin()); B.buildCopy(LiveIn, Arg->getRegister()); B.setInsertPt(OrigInsBB, OrigInsPt); } | |
3378 | The register class is only tangentially related to the type. I think you somehow ended up creating a virtual register without setting the type on it. Did something already add the queue ptr as a live in before this lowering? I also would not call this VDstReg, as that makes it sound like a VGPR. |
Another possible strategy would be to emit a call to the queue.ptr intrinsic, and allow that to be legalized
Take care of further review comments by arsenm.
- Add an another version of getLiveInRegister() which works with TargetRegisterClass
- While copying to destination virtual register, make sure that it is defined
- Bit of code refactoring around related part of the source.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
2232–2233 | I still don't think you should need the register class set. This should still have the type set on the virtual register | |
2238–2241 | I think it would be better to have this only take the physical register input, and return the livein virtreg. This wouldn't have the case. The intrinsic lowering cases would then be responsible for inserting the extra copy to the expected result | |
2277 | This shouldn't be called twice | |
2291–2295 | Move the assignment up to avoid the weird !() | |
3373 | Ditto | |
3380 | Remove DstReg and just directly refer to the physical register | |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h | ||
96 | Should use MCRegister for Reg to make it clear it's physical. |
Changes as per further review comments:
- First, I would like to make sure that all the *necessary* foundational issues are handled before actually discussing the crux of some assertion failure, that we hit, if we do not assign a register class to a virtual live-in register of SGPR01.
- So, I have taken care of all your previous review comments here, and I have deliberately put assertions about physical registers where it is expected.
- Let's ensure these changes are as per your satisfaction, before moving to next point of discussion. Please let me know if I need to take care any remaining issues here before actually discussing assertion failure related issues.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
2232–2233 | I have taken care of it. But, we hit with an assertion, if we do not set register class. However, let's assume for now that this is what expected here, and later discuss this assertion issue, once we are fine with all the foundational changes. | |
2238–2241 | taken care | |
2277 | taken care | |
2291–2295 | taken care | |
3373 | taken care | |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h | ||
96 | BuildCopy seems not work with MCRegister, hence added an assertion to ensure that *PhyReg* is indeed physical register. |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
2238–2241 | This is halfway there. You're still passing in both the live in physical register and the corresponding virtual register. I was thinking you would call getLiveInRegister with just the physical register, and it would be responsible for finding out the virtual register, and inserting the entry block copy by calling insertLiveInCopy. loadInputValue then wouldn't need to worry about ensuring the entry block copy was inserted like it does now. | |
3378 | Physical registers don't have types. This will just assert. Here you know the type is just LLT::pointer(CONSTANT_ADDRESSS, 64) |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
2238–2241 | Taken care |
I believe that the debug_trap trap handler support no longer needs the queue_ptr to be passed in as it is internally computing it from the dootbell ID available from a GETREG together with a doorbell to queue mapping maintained by the ROCm Runtim that is accessible through the TMB register.
So should that change be reflect in this to simplify things? This is does change the ABI so needs the HSA ABI number to be incremented in the ELF header and AMDGOUUsage documentation updated. However, from the compilers point of view it is not ABI breaking as old code will still work, as it is generating code to set the queue_ptr that is unnecessary.
Adding @kzhuravl for ELF ABI version help.
Hi Tony,
If you are specifically asking for llvm.debugtrap() intrinsic here, then, we are already taken care of it, and we are not adding queue_ptr in this case, as you can see from the code changes at AMDGPULegalizerInfo.cpp:3602. The queue_ptr related discussions here are only specific to llvm.trap() intrinsic.
Couple of additional required changes
[1] SGPR01 is a new register created within legalizeTrapIntrinsic(), and we actually need to get its type as LLT::scalar(64).
[2] Call to insertLiveInCopy() within getLiveInRegister() is only required if the copy is required from physical to virtual register.
[3] When call to insertLiveInCopy() within getLiveInRegister() is required, make sure that it is called irrespective of whether the virtual register is newly created within getLiveInRegister() or not.
Looking at AMDGPULegalizerInfo.cpp:3602 it appears the queue_ptr is still being set up so not sure what you mean that it is already being taken care of. What I describe is true for both llvm.trap() and llvm.debug_trap(). Should the setup of the queu_ptr be removed since it is no longer needed to support either llvm.trap() or llvm.debug_trap()? Doing so would need a change in AMDGPUUsage, a change to the ELF HSA ABI number, and corresponding ROCm loader changes. There may also need to be a plan to support older ROCm releases.
The line number has changed with latest changes. I was basically referring to the function - AMDGPULegalizerInfo::legalizeDebugTrapIntrinsic() by assuming that you referring here only debugtrap() inntrinsic.
What I describe is true for both llvm.trap() and llvm.debug_trap(). Should the setup of the queu_ptr be removed since it is no longer needed to support either llvm.trap() or llvm.debug_trap()? Doing so would need a change in AMDGPUUsage, a change to the ELF HSA ABI number, and corresponding ROCm loader changes. There may also need to be a plan to support older ROCm releases.
I suggest that let's check-in this change so that trap and debug trap support looks similar both in ISELDAG and GLOBALISEL PATH (by setting-up queue_ptr) for now. And, we should take-up what you mention above (to remove queue_ptr) as a separate activity, first in ISELDAG path, and then in GLOBALISEL path.
OK I see that now. Thanks.
What I describe is true for both llvm.trap() and llvm.debug_trap(). Should the setup of the queu_ptr be removed since it is no longer needed to support either llvm.trap() or llvm.debug_trap()? Doing so would need a change in AMDGPUUsage, a change to the ELF HSA ABI number, and corresponding ROCm loader changes. There may also need to be a plan to support older ROCm releases.
I suggest that let's check-in this change so that trap and debug trap support looks similar both in ISELDAG and GLOBALISEL PATH (by setting-up queue_ptr) for now. And, we should take-up what you mention above (to remove queue_ptr) as a separate activity, first in ISELDAG path, and then in GLOBALISEL path.
OK that sounds reasonable. Thanks.
Should use MCRegister for Reg to make it clear it's physical.