Add trap handler support.
Details
Diff Detail
Event Timeline
I am sorry for checking in this code into LLVM repo by mistake. Please let me know if you have any comments for this patch so that I will modify & checkin accordingly. Thanks!
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp | ||
---|---|---|
193–194 | Matts: "This also needs to handle debug.trap(), and the case when unreachable’s are turned into traps. There should also be tests specifically for the annotator." | |
test/CodeGen/AMDGPU/trap.ll | ||
74–75 | Matt: "Should check for the enabled feature bits in the kernel_code_t. This also doesn’t have anything setting enable_trap_handler" |
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp | ||
---|---|---|
193–194 | There is a separate llvm.debugtrap intrinsic. I looked around SelectionDAG for other places that introduce ISD::TRAP. The expansion for ISD::DEBUGTRAP creates a TRAP. unreachable instructions also emit a trap if the DAG.getTarget().Options.TrapUnreachable is enabled. | |
test/CodeGen/AMDGPU/trap.ll | ||
74–75 | Yes, that is the problem. You need to be setting the trap handler bit |
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp | ||
---|---|---|
193–194 | Yes, ISD::DEBUGTRAP has been replaced with ISD::TRAP at the legalization phase. So, do we still need to map llvm.debugtrap intrinsic with amdgpu-queue-ptr? |
test/CodeGen/AMDGPU/trap.ll | ||
---|---|---|
74–75 | I am not sure if we should have the compiler be responsible for setting the enable_trap_handler bit. In general I don't think the compiler can figure this out on a per kernel basis. Once we support function calls and indirect calls how could be know if the closure of all functions include some that need the trap handler? Also, even if we did set the bit, for compute, the hardware CP microcode cannot do anything with it other than refuse to execute the kernel. Today I believe the CP micro code ignores this bit and always enables the trap handler as it is needed for CWSR (context switching). So it seems the presence of a trap handler is more a function of the environment that will execute the kernel than the kernel itself. So perhaps we should simply not define the trap handler bit. The code object is already marked as requiring the HSA environment and perhaps this implies the presence of a trap handler. I suspect that graphics kernels have their own "environment" they execute in and may not support trap handlers. If they do not then executing an S_TRAP will simply be a NOP which will not halt the shader. |
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp | ||
---|---|---|
193–194 | Yes. You should add a test which uses this. As it is this will fail because the queue ptr won't be enabled | |
test/CodeGen/AMDGPU/trap.ll | ||
74–75 | I don't think that implies the presence of the trap handler. This is a field in the kernel_code_t, so we should set it. We can figure out a conservative setting in the future whenever indirect calls are needed. I thought enabling this also required reserving 16 SGPRs? |
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp | ||
---|---|---|
193–194 | To enable or disable queue ptr, it depends whether we have used "-mtriple=amdgcn--amdhsa" or not. Since ISD::DEBUGTRAP has been replaced with ISD::TRAP, I don't think the test will fail if we compile llvm.debugtrap with "-mtriple=amdgcn--amdhsa", correct? |
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp | ||
---|---|---|
193–194 | That occurs in the DAG. Here you are trying to identify all the situations where ISD::TRAP will be emitted from the IR. Eventually I want to replace how the ABI is lowered so we don't need to do this anymore, but for now you must predict all of the traps. |
test/CodeGen/AMDGPU/trap.ll | ||
---|---|---|
74–75 | So, once compiler detects "llvm.trap()", shall we set up the trap handler bit in the kernel_code_t? I don't know why we need to add the trap handler bit. Does the bit setting will imply the pretense of the trap handler and otherwise it won't? |
- Fixed format issue.
- Add test case for llvm.debugtrap.
- Add non-hsa path trap handler.
- Confirm that 16 reserved SGPRs are not part of the regular SGPRs.
- Add trap handler bit set in the amd_kernel_code_t.
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
276–279 | This does not need an else | |
1794 | const DebugLoc &DL | |
1986 | return on next line | |
lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
253 ↗ | (On Diff #85971) | This should return void and doesn't need HSA in the name |
test/CodeGen/AMDGPU/trap.ll | ||
1–15 | This should not be checking stderr since the test should pass with HSA | |
2 | This needs a not | |
73 | Check still missing for the enable_trap_handler bit in kernel_code_t | |
78 | You should add another test which ends in unreachable. As far as I can tell there is no flag to set TrapUnreachable, and we don't enable it now, but this should catch it if that ever breaks | |
78 | You also add another test which needs to enable the queue ptr for a different feature | |
91 | GCN should be replaced with ERROR or something like that |
test/CodeGen/AMDGPU/trap.ll | ||
---|---|---|
74–75 | It seems that the presence of a trap handler is a property of producing a code object to be executed using the HSA environment. Under ROCM all kernels will be executed with a trap handler and that trap handler will use the HSA ABI. Other environments may have different trap handlers, or no trap handler at all, and the code generated for llvm.trap/debugtrap would change accordingly. The bit in amd_kernel_code_t would then be set according to the demands of environment independent of using traps. Having a trap handler present does cause an extra 16 SGPRs to be allocated (which should be taken into account when determining wave occupancy), but these are in addition to the SGRs used by the generated code (so no need to reserve them or include them in the SGPR count in the amd_kernel_code_t). |
test/CodeGen/AMDGPU/trap.ll | ||
---|---|---|
74–75 | Do we still need to make sure we leave 16 unallocated for the implicit use? From the ABI spec it seems clear to not include them in the reported SGPR count, but do we need to ensure reported total + 16 is below the hardware limit? |
test/CodeGen/AMDGPU/trap.ll | ||
---|---|---|
74–75 | The hardware limit is the maximum non-privileged SGPRs that can be accessed using the instruction register encoding plus the 16 privileged trap registers. The trap temps are only allocated IF a trap handler is enabled, and are only accessible when executing the trap handler (using special instruction register encoding). So from the point of view of determining the limit of non-privileged SGPRs that can be allocated the presence of a trap handler can be ignored. The fact they are allocated needs to be considered when determining the number of waves that will fit on a CU. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
276–279 | I think we need leave the else here. I assume default setting is Legal, but looks like we need to explicitly specify it otherwise it will throw an error. | |
test/CodeGen/AMDGPU/trap.ll | ||
78 | What feature should I test? | |
78 | What kinds of instructiosn will trigger the trap instruction? |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
276–279 | The else implements the non-HSA expansion that generates an end_prm which is used for graphics. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
276–279 | yes, so I think we need to keep the else part. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
276 | In discussion with @kzhuravl the suggestion was to use a subtarget query to determine the trap abi. Currently there are two ABIs, one for HSA and one for graphics, but in the future there could be others. | |
lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
253 ↗ | (On Diff #86133) | As mentioned in another comment, I think determining if there is a trap handler should be determined from the environment part of the triple. Check with @kzhuravl where the query should be put. |
As @tony-tye mentioned trap handler is more of an environment property (see https://reviews.llvm.org/D26010#inline-251754), so it should be moved into AMDGPUSubtarget or SISubtarget, and we should be setting trap handler bit in pgm rsrc 2 based on the subtarget query.
In order to allow extensibility in the future the suggestion is to introduce an enumeration with available trap handler ABIs. Currently there will be only 2, something along the lines:
enum TrapHandlerAbi { TrapHandlerAbiNone = 0, TrapHandlerAbiHsa = 1 }
Also, introduce a subtarget query that returns trap handler abi:
TrapHandlerAbi() { if (isAmdHsaOs) return TrapHandlerAbiHsa; return TrapHandlerAbiNone; }
Every ISD::TRAP is going to be lowered into S_TRAP_PSEUDO regardless of the trap handler ABI.
In order to expand S_TRAP_PSEUDO, we will have to query what kind of trap handler ABI does the subtarget have. If it is an HSA, then move 1 into v0, and queue ptr into s[0:1]. If trap ABI is none, then lower it to s_engpgm.
@tony-tye did I miss anything? @arsenm, @wdng what do you think?
Thanks
lib/Target/AMDGPU/AMDKernelCodeT.h | ||
---|---|---|
200–202 ↗ | (On Diff #86133) | This is not right. Trap handler bit is located in amd_compute_pgm_rsrc2_t with the field name enable_trap_handler. Refer to: |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
278–279 | If we go with the approach in my latest comment, custom lowering won't be needed. | |
1789–1790 | If we go with the approach in my latest comment, this should switch on ABI type, for HSA ABI this code will be used. For "none" ABIs s_engpgm should be used (without error or warning). | |
2469–2485 | If we go with the approach in my latest comment, this will be gone. | |
lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
253 ↗ | (On Diff #86133) | If we go with the approach in my latest comment, this will be moved to subtarget class. |
lib/Target/AMDGPU/Utils/AMDKernelCodeTInfo.h | ||
90 | enable_trap_handler should be here (see comment with links). | |
116 | There is no is_trap_handler_supported in CODEPROP (see comment with links). |
- Restructure code to make it extensible based on code reviews.
- Move trap handler bit in amd_compute_pgm_rsrc2_t.
- updated patch with complete & full diff
- Restructure code to make it extensible based on code reviews.
- Move trap handler bit in amd_compute_pgm_rsrc2_t.
lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
---|---|---|
112 | Comment unnecessary, also nothing HSA specific here | |
277 | Return enum type | |
277–282 | Badly formatted. return ternary operator | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
1793 | Compare to enum value | |
1805–1809 | Formatting of previous code needs to be fixed | |
1810–1811 | Else on previous line. Instead of else you can early return on the error path | |
1817–1818 | Badly formatted. You already have MF and get the Function* above, so you don't need double getParent | |
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
136–137 ↗ | (On Diff #86456) | This should be removed. The trap handler should be set based on a subtarget feature |
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
136–137 ↗ | (On Diff #86456) | S_00B84C_TRAP_HANDLER(MFI->hasTrapHandler()) needs to know whether trap handler has been set. Where should we the based on the sub- target feature? |
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
641 | Should this be querying the TrapHandlerAbi using getTrapHandlerAbi() from the subtarget and setting to TRUE if not equal to TrapHandlerAbiNone ? | |
lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
113 | Should this be deleted now as there is getTrapHandlerAbi() instead? | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
1807 | Should the 0x1 constant be a named enumeration? Should we use a different values for llvm.trap than for llvm.debugtrap? If so the handling of llvm.debugtrap should not be converted to llvm.trap for AMDGPU. Have we checked with teh HSA Runtime to see what codes the HSA trap handler is expecting? | |
1812–1819 | Should we be emitting any diagnostic here? If there is no trap handler then doesn't that imply that the environment wants an S_ENDPGM to be generated instead? That is not an error, it is that the environment demands as the implementation. For example, that is what the graphics environment wants. | |
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
67 ↗ | (On Diff #86456) | Delete too? |
136–137 ↗ | (On Diff #86456) | Delete too? |
lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
154 ↗ | (On Diff #86456) | Should this be deleted? See above. |
261–263 ↗ | (On Diff #86456) | Delete as now we have the trap ABI query instead. |
Also need to use the trap handler ABI query to see if there is a trap handler, and if there is add the TRAP_HANDLER_SGPR_COUNT to the number of SGPRs budgeted for the wave in determining the number of waves per EU calculation. TRAP_HANDLER_SGPR_COUNT is 16 for GFX6 onwards.
lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
---|---|---|
278–281 | Still using return after else. Should be return ternary operator | |
341–343 | This should be a subtarget feature. Returning this is still a constant based purely on the triple. See FeatureUnalignedBufferAccess for an example | |
test/CodeGen/AMDGPU/trap.ll | ||
1–15 | You should have a run line with no subtarget features enabled, and one each explicitly enabling and disabling the trap handler subtarget feature |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1793 | Say, if we enable trap handler via subtarget feature, should we need to change the if here? |
We enable trap handler support once "-mtriple=amdgcn--amdhsa" is specified. So use the following code to enable trap handler.
bool isTrapHandlerEnabled() const { return getTrapHandlerAbi() == TrapHandlerAbiHsa; }
Not sure why we need to add a subtarget feature. Say if we replace code 328-330 using subtarget feature, do we want to enable trap handler by using both "-mtriple=amdgcn--amdhsa" and "-mattr=+enable-trap-handler"?
def FeatureTrapHandler: SubtargetFeature<"enable-trap-handler", "EnableTrapHandler", "true", "Enable trap handler support" >;
At SIISelLowering.cpp we enable trap handler based on "mtriple=amdgcn--amdhsa", shall we change to based on the subtarget feature input?
if (Subtarget->getTrapHandlerAbi() == SISubtarget::TrapHandlerAbiHsa) { . . . }
What do you think?
I do not see updates to waves-per-eu calculations for SGPRs. Did you upload the correct diff?
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
482–484 | I do not think enabling the trap handler reduces the number of available SGPRs. The hardware allocates and provides access to theses "in addition" to the regular SGPRs. If this is reserving registers then it should be deleted. |
lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
---|---|---|
73 | Add enumeration for S_TRAP codes: enum TrapCode { TrapCodeLLVMTrap = 1, TrapCodeLLVMDebugTrap = 2 }; | |
341–343 | Could you explain this more? The trap handler being present is determined by the environment part of the triple. Currently only the HSA environment uses a trap handler. | |
342 | In order to allow additional trap handler ABIs this should be: | |
517 | I do not see this being used anywhere. I think it should be used in the waves_per_eu calculation. | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
1807 | I checked with the HSA Runtime and currently the trap code is not being consulted. So I propose adding an enum for the codes being used for s_trap and use the value here. It would be better to different code for llvm.trap and llvm.debugtrap. Would that require S_TRAP_PSEUDO to have an immediate operand that is set to the trap code used in the S_TRAP instruction. | |
test/CodeGen/AMDGPU/trap.ll | ||
88 | Would like it to be: |
lib/Target/AMDGPU/AMDGPU.td | ||
---|---|---|
70 | Remove the word enable- |
- Separate llvm.debugtrap and llvm.trap.
- Decrease 16 from available SGPRs once trap handler is enabled.
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1842 | Can we have an enum for these values somewhere instead of hard coding 1? | |
1851–1856 | This duplicates nearly the entire other switch case. They should be consolidated based on the immediate argument and handled with the single pseudo | |
lib/Target/AMDGPU/SIInstructions.td | ||
116 | This conflicts with the VPseudoInst type. You should remove this and change to SPseudoInstSI | |
120–126 | You do not need a second pseudo. You should just add an operand to the other pseudo which is just mimicking the operands of the physical instruction. | |
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
1166 ↗ | (On Diff #87286) | Variable naming convention |
test/CodeGen/AMDGPU/trap.ll | ||
1–15 | Having FUNC and HSA-FUNC doesn't make sense. Replace these both with just GCN | |
2 | Missing a run line with mesa triple | |
77–79 | HSA: | |
86–87 | The check prefix names should be like HSA-TRAP, HSA-NOTRAP, MESA-TRAP, MESA-NOTRAP |
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp | ||
---|---|---|
1034–1045 ↗ | (On Diff #87324) | You should not and do not need to touch generic code |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
276 | DEBUGTRAP should also be legal, you are handling the differences in the selection pattern and custom inserter | |
lib/Target/AMDGPU/SIInstructions.td | ||
114 | You should use i16imm for the operand type to match the instruction (not that it matters much) and name it the same too (which matters more): ( ins i16imm:$simm16) | |
404 | Can you define pseudo-enums for these constants (see for example SRCMODS) | |
test/CodeGen/AMDGPU/trap.ll | ||
1–16 | GCN and HSA are not alternative check prefixes. It doesn't make sense to have it on one of these but not the other. HSA-TRAP for the disabled trap handler feature is broken | |
3–4 | These should both explicitly use the mesa triple, and enable/disable the trap handler in each. The ones which should error are also missing the not. | |
90–91 | These check lines should not be able to coexist |
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp | ||
---|---|---|
1034–1045 ↗ | (On Diff #87324) | I accidentally touched those lines and have reverted them, thanks! |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
276 | Yes, I did set DEBUGTRAP to Legal for next pattern matching at isel. | |
test/CodeGen/AMDGPU/trap.ll | ||
3–4 | As per our discussion with @tony-tye last week, looks like we want to issue warning instead of error, correct? |
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
---|---|---|
1166 ↗ | (On Diff #87529) | Is this the right place to useSISubtarget::TRAP_HANDLER_SGPR_COUNT? The presence of a trap handler does not reduce the number of SGPRs addressable for allocation by the register allocator. It only affects the wave occupancy calculation (I did not see any change to that code). |
lib/Target/AMDGPU/AMDKernelCodeT.h | ||
---|---|---|
198 ↗ | (On Diff #87529) | This change should be fully reverted |
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
1166 ↗ | (On Diff #87529) | I don't think so. This is the allocated number |
test/CodeGen/AMDGPU/trap.ll | ||
5–9 | You seem to have removed the HSA enable/disable and do mesa twice? | |
19 | You should also check for the rsrc2 bit is set |
test/CodeGen/AMDGPU/trap.ll | ||
---|---|---|
5–9 | I also do HSA enable/disable twice: |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1797 | Should have enum (probably in SIDefines.h) for the values put in |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1797 | This probably is not an enum. It looks like an interface with HSA is that we need to set v0 to 1: v0 <- 1 |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1797 | I think defining it in SIDefines.h for now is OK. I am working on some restructuring to shared header files, and will include it in the restructuring. |
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
---|---|---|
1166 ↗ | (On Diff #87529) | I think this should be moved to getMaxNumSGPRs. We still want to report the correct number of addressable SGPRs regardless of trap handler. |
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
---|---|---|
1166 ↗ | (On Diff #87529) | I think in terms of functionality it's the same no matter whether we move the code into getMaxNumSGPRs or not. As getMaxNumSGPRs still invokes this function to calculate MaxNumSGPRs and MaxNumAddressableSGPRs. What do you think? |
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
---|---|---|
1166 ↗ | (On Diff #87529) | Not really. We need to record addressable number of SGPRs in the metadata for tools such as debugger and profiler. The addressable number of SGPRs should be recorded correctly regardless of trap handler (102 for 8+, 104 for everything else). |
- Add rsrc2 trap handler bit check in lit tests.
- Moved the trap handler SGPRs calculation into getMaxNumSGPRs
- Add a trap operand enum for the time being to replace a constant value 1.
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
---|---|---|
1166 ↗ | (On Diff #87529) | Can we drop waves-per-eu calculations from this patch? We need to change few things to math, which can be done as a separate patch. |
lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
---|---|---|
74–82 | After discussion with the HSA runtime team it was decided to use separate S_TRAP codes for each purpose. The current codes would be: enum TrapCode { TrapCodeBreakpoint = 0, TrapCodeLLVMTrap = 1, TrapCodeLLVMDebugTrap = 2, TrapCodeHSADebugTrap = 3 }; These need documenting in the AMDGPU LLVM Spec page. For now we do not have an intrinsic for the HSA debugtrap (which takes an argument) so remove the TrapRegValues enum. | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
1796–1797 | Delete this as the traps generated for llvm.trap and llvm.debugtrap will not pass in any value. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1811 | Delete this as the traps generated for llvm.trap and llvm.debugtrap will not pass in any value. |
lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
---|---|---|
74–82 | The TRAP handler codes can be defined in a new section of: http://llvm.org/docs/AMDGPUUsage.html |
Added enum for better HSA interface and removed mov instruction for V0 as llvm.trap and llvm.debugtrap will not pass in any value.
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1820 | Since llvm.debugtrap is not defined as NORETURN then it seems after a warning it should be lowered to a S_NOP not an S_ENDPGM. llvm.trap is defined as NORETURN so generating S_ENDPGM seems the right choice when there is no trap handler. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1811 | Since TrapType is an enum type should this be a switch with a default: that asserts? Currently the code will silently ignore any new trap types that are added (an assert would avoid that). |
lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
---|---|---|
87 | Sort | |
lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
106 | Can you sort this later until after EnableXNACK? I think the unaligned options should stay together | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
1813–1815 | Asserting !the switch case in the switch case is ugly. There's no reason to special case this, just put unreachable in the default case | |
1826 | addImm on next line. We don't really need to emit anything here though | |
1829–1832 | Just let it default | |
1834 | llvm_unreachable |
Discussed with @tony-tye , keep only 2 CASEs that can happen for the pseudo trap is llvmtrap and llvmdebugtrap in switch block.
Remove the word enable-