This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU : Add trap handler support.
ClosedPublic

Authored by wdng on Oct 26 2016, 1:33 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
This revision now requires changes to proceed.Oct 28 2016, 2:01 PM
wdng retitled this revision from AMDGPU : Add s_trap intrinsic. to AMDGPU : Add trap handler support..Jan 23 2017, 9:36 PM
wdng edited the summary of this revision. (Show Details)
wdng updated this revision to Diff 85535.Jan 23 2017, 11:16 PM
wdng edited edge metadata.

I am sorry for checking in this code into LLVM repo by mistake. Please let me know if you have any comments for this patch so that I will modify & checkin accordingly. Thanks!

wdng added inline comments.Jan 24 2017, 9:16 AM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194 ↗(On Diff #85535)

Matts: "This also needs to handle debug.trap(), and the case when unreachable’s are turned into traps. There should also be tests specifically for the annotator."

test/CodeGen/AMDGPU/trap.ll
7–8 ↗(On Diff #85535)

Matt: "Should check for the enabled feature bits in the kernel_code_t. This also doesn’t have anything setting enable_trap_handler"

wdng added inline comments.Jan 24 2017, 9:19 AM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194 ↗(On Diff #85535)

How should we handle debug.trap()? For the case of unreachable's, should we invoke debug.trap()?

test/CodeGen/AMDGPU/trap.ll
7–8 ↗(On Diff #85535)

I think "@llvm.trap()" will enable_trap_handler, right?

arsenm added inline comments.Jan 24 2017, 11:19 AM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194 ↗(On Diff #85535)

There is a separate llvm.debugtrap intrinsic. I looked around SelectionDAG for other places that introduce ISD::TRAP. The expansion for ISD::DEBUGTRAP creates a TRAP. unreachable instructions also emit a trap if the DAG.getTarget().Options.TrapUnreachable is enabled.

test/CodeGen/AMDGPU/trap.ll
7–8 ↗(On Diff #85535)

Yes, that is the problem. You need to be setting the trap handler bit

wdng added inline comments.Jan 24 2017, 12:29 PM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194 ↗(On Diff #85535)

Yes, ISD::DEBUGTRAP has been replaced with ISD::TRAP at the legalization phase. So, do we still need to map llvm.debugtrap intrinsic with amdgpu-queue-ptr?

tony-tye added inline comments.Jan 24 2017, 12:59 PM
test/CodeGen/AMDGPU/trap.ll
7–8 ↗(On Diff #85535)

I am not sure if we should have the compiler be responsible for setting the enable_trap_handler bit. In general I don't think the compiler can figure this out on a per kernel basis. Once we support function calls and indirect calls how could be know if the closure of all functions include some that need the trap handler?

Also, even if we did set the bit, for compute, the hardware CP microcode cannot do anything with it other than refuse to execute the kernel. Today I believe the CP micro code ignores this bit and always enables the trap handler as it is needed for CWSR (context switching).

So it seems the presence of a trap handler is more a function of the environment that will execute the kernel than the kernel itself.

So perhaps we should simply not define the trap handler bit. The code object is already marked as requiring the HSA environment and perhaps this implies the presence of a trap handler.

I suspect that graphics kernels have their own "environment" they execute in and may not support trap handlers. If they do not then executing an S_TRAP will simply be a NOP which will not halt the shader.

arsenm added inline comments.Jan 24 2017, 1:07 PM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194 ↗(On Diff #85535)

Yes. You should add a test which uses this. As it is this will fail because the queue ptr won't be enabled

test/CodeGen/AMDGPU/trap.ll
7–8 ↗(On Diff #85535)

I don't think that implies the presence of the trap handler. This is a field in the kernel_code_t, so we should set it. We can figure out a conservative setting in the future whenever indirect calls are needed.

I thought enabling this also required reserving 16 SGPRs?

wdng added inline comments.Jan 24 2017, 1:53 PM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194 ↗(On Diff #85535)

To enable or disable queue ptr, it depends whether we have used "-mtriple=amdgcn--amdhsa" or not. Since ISD::DEBUGTRAP has been replaced with ISD::TRAP, I don't think the test will fail if we compile llvm.debugtrap with "-mtriple=amdgcn--amdhsa", correct?

arsenm added inline comments.Jan 24 2017, 1:56 PM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194 ↗(On Diff #85535)

That occurs in the DAG. Here you are trying to identify all the situations where ISD::TRAP will be emitted from the IR. Eventually I want to replace how the ABI is lowered so we don't need to do this anymore, but for now you must predict all of the traps.

wdng added inline comments.Jan 25 2017, 9:49 AM
test/CodeGen/AMDGPU/trap.ll
7–8 ↗(On Diff #85535)

So, once compiler detects "llvm.trap()", shall we set up the trap handler bit in the kernel_code_t? I don't know why we need to add the trap handler bit. Does the bit setting will imply the pretense of the trap handler and otherwise it won't?

wdng updated this revision to Diff 85969.Jan 26 2017, 2:38 PM
  1. Fixed format issue.
  2. Add test case for llvm.debugtrap.
  3. Add non-hsa path trap handler.
  4. Confirm that 16 reserved SGPRs are not part of the regular SGPRs.
  5. Add trap handler bit set in the amd_kernel_code_t.
wdng updated this revision to Diff 85971.Jan 26 2017, 2:57 PM

vim added some indentations by mistake, removed indentations & empty line.

arsenm added inline comments.Jan 26 2017, 3:19 PM
lib/Target/AMDGPU/SIISelLowering.cpp
276–279 ↗(On Diff #85971)

This does not need an else

1794 ↗(On Diff #85971)

const DebugLoc &DL

1986 ↗(On Diff #85971)

return on next line

lib/Target/AMDGPU/SIMachineFunctionInfo.h
253 ↗(On Diff #85971)

This should return void and doesn't need HSA in the name

test/CodeGen/AMDGPU/trap.ll
1 ↗(On Diff #85971)

This should not be checking stderr since the test should pass with HSA

2 ↗(On Diff #85971)

This needs a not

8 ↗(On Diff #85971)

Check still missing for the enable_trap_handler bit in kernel_code_t

26 ↗(On Diff #85971)

GCN should be replaced with ERROR or something like that

30 ↗(On Diff #85971)

You should add another test which ends in unreachable. As far as I can tell there is no flag to set TrapUnreachable, and we don't enable it now, but this should catch it if that ever breaks

30 ↗(On Diff #85971)

You also add another test which needs to enable the queue ptr for a different feature

tony-tye added inline comments.Jan 26 2017, 5:32 PM
test/CodeGen/AMDGPU/trap.ll
7–8 ↗(On Diff #85535)

It seems that the presence of a trap handler is a property of producing a code object to be executed using the HSA environment. Under ROCM all kernels will be executed with a trap handler and that trap handler will use the HSA ABI. Other environments may have different trap handlers, or no trap handler at all, and the code generated for llvm.trap/debugtrap would change accordingly. The bit in amd_kernel_code_t would then be set according to the demands of environment independent of using traps.

Having a trap handler present does cause an extra 16 SGPRs to be allocated (which should be taken into account when determining wave occupancy), but these are in addition to the SGRs used by the generated code (so no need to reserve them or include them in the SGPR count in the amd_kernel_code_t).

arsenm added inline comments.Jan 26 2017, 8:51 PM
test/CodeGen/AMDGPU/trap.ll
7–8 ↗(On Diff #85535)

Do we still need to make sure we leave 16 unallocated for the implicit use? From the ABI spec it seems clear to not include them in the reported SGPR count, but do we need to ensure reported total + 16 is below the hardware limit?

tony-tye added inline comments.Jan 26 2017, 9:10 PM
test/CodeGen/AMDGPU/trap.ll
7–8 ↗(On Diff #85535)

The hardware limit is the maximum non-privileged SGPRs that can be accessed using the instruction register encoding plus the 16 privileged trap registers. The trap temps are only allocated IF a trap handler is enabled, and are only accessible when executing the trap handler (using special instruction register encoding).

So from the point of view of determining the limit of non-privileged SGPRs that can be allocated the presence of a trap handler can be ignored. The fact they are allocated needs to be considered when determining the number of waves that will fit on a CU.

wdng updated this revision to Diff 86125.Jan 27 2017, 2:29 PM
wdng marked 7 inline comments as done.

Address code reviews.

wdng added inline comments.Jan 27 2017, 2:29 PM
lib/Target/AMDGPU/SIISelLowering.cpp
276–279 ↗(On Diff #85971)

I think we need leave the else here. I assume default setting is Legal, but looks like we need to explicitly specify it otherwise it will throw an error.

test/CodeGen/AMDGPU/trap.ll
30 ↗(On Diff #85971)

What feature should I test?

30 ↗(On Diff #85971)

What kinds of instructiosn will trigger the trap instruction?

tony-tye added inline comments.Jan 27 2017, 2:34 PM
lib/Target/AMDGPU/SIISelLowering.cpp
276–279 ↗(On Diff #85971)

The else implements the non-HSA expansion that generates an end_prm which is used for graphics.

Can you upload a full diff? Thanks

wdng added inline comments.Jan 27 2017, 2:43 PM
lib/Target/AMDGPU/SIISelLowering.cpp
276–279 ↗(On Diff #85971)

yes, so I think we need to keep the else part.

wdng updated this revision to Diff 86133.Jan 27 2017, 3:06 PM

Upload a full diff.

arsenm added inline comments.Jan 27 2017, 4:58 PM
lib/Target/AMDGPU/SIMachineFunctionInfo.h
265–266 ↗(On Diff #86133)

Remove Ptr from the names

test/CodeGen/AMDGPU/trap.ll
30 ↗(On Diff #85971)

You can do an address space cast from LDS to flat or do something with the queue ptr intrinsic

tony-tye added inline comments.Jan 27 2017, 9:25 PM
lib/Target/AMDGPU/SIISelLowering.cpp
276 ↗(On Diff #86133)

In discussion with @kzhuravl the suggestion was to use a subtarget query to determine the trap abi. Currently there are two ABIs, one for HSA and one for graphics, but in the future there could be others.

lib/Target/AMDGPU/SIMachineFunctionInfo.h
253 ↗(On Diff #86133)

As mentioned in another comment, I think determining if there is a trap handler should be determined from the environment part of the triple. Check with @kzhuravl where the query should be put.

As @tony-tye mentioned trap handler is more of an environment property (see https://reviews.llvm.org/D26010#inline-251754), so it should be moved into AMDGPUSubtarget or SISubtarget, and we should be setting trap handler bit in pgm rsrc 2 based on the subtarget query.

In order to allow extensibility in the future the suggestion is to introduce an enumeration with available trap handler ABIs. Currently there will be only 2, something along the lines:

enum TrapHandlerAbi {
  TrapHandlerAbiNone = 0,
  TrapHandlerAbiHsa = 1
}

Also, introduce a subtarget query that returns trap handler abi:

TrapHandlerAbi() {
  if (isAmdHsaOs)
    return TrapHandlerAbiHsa;
  return TrapHandlerAbiNone;
}

Every ISD::TRAP is going to be lowered into S_TRAP_PSEUDO regardless of the trap handler ABI.
In order to expand S_TRAP_PSEUDO, we will have to query what kind of trap handler ABI does the subtarget have. If it is an HSA, then move 1 into v0, and queue ptr into s[0:1]. If trap ABI is none, then lower it to s_engpgm.

@tony-tye did I miss anything? @arsenm, @wdng what do you think?

Thanks

lib/Target/AMDGPU/AMDKernelCodeT.h
200–202 ↗(On Diff #86133)
lib/Target/AMDGPU/SIISelLowering.cpp
278–279 ↗(On Diff #86133)

If we go with the approach in my latest comment, custom lowering won't be needed.

1798–1809 ↗(On Diff #86133)

If we go with the approach in my latest comment, this should switch on ABI type, for HSA ABI this code will be used. For "none" ABIs s_engpgm should be used (without error or warning).

2461–2477 ↗(On Diff #86133)

If we go with the approach in my latest comment, this will be gone.

lib/Target/AMDGPU/SIMachineFunctionInfo.h
253 ↗(On Diff #86133)

If we go with the approach in my latest comment, this will be moved to subtarget class.

lib/Target/AMDGPU/Utils/AMDKernelCodeTInfo.h
90 ↗(On Diff #86133)

enable_trap_handler should be here (see comment with links).

115 ↗(On Diff #86133)

There is no is_trap_handler_supported in CODEPROP (see comment with links).

Also, we need to update waves-per-eu calculations.

wdng added a comment.Jan 30 2017, 9:55 AM

Also, we need to update waves-per-eu calculations.

What kinds of changes should we made for the waves-per-eu?

wdng updated this revision to Diff 86451.Jan 31 2017, 10:11 AM
wdng marked 9 inline comments as done.
  1. Restructure code to make it extensible based on code reviews.
  2. Move trap handler bit in amd_compute_pgm_rsrc2_t.
wdng updated this revision to Diff 86456.Jan 31 2017, 10:22 AM
  1. updated patch with complete & full diff
  2. Restructure code to make it extensible based on code reviews.
  3. Move trap handler bit in amd_compute_pgm_rsrc2_t.
arsenm added inline comments.Jan 31 2017, 11:01 AM
lib/Target/AMDGPU/AMDGPUSubtarget.h
100 ↗(On Diff #86456)

Comment unnecessary, also nothing HSA specific here

267 ↗(On Diff #86456)

Return enum type

267–272 ↗(On Diff #86456)

Badly formatted. return ternary operator

lib/Target/AMDGPU/SIISelLowering.cpp
1792 ↗(On Diff #86456)

Compare to enum value

1804–1808 ↗(On Diff #86456)

Formatting of previous code needs to be fixed

1809–1810 ↗(On Diff #86456)

Else on previous line. Instead of else you can early return on the error path

1816–1817 ↗(On Diff #86456)

Badly formatted. You already have MF and get the Function* above, so you don't need double getParent

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
136–137 ↗(On Diff #86456)

This should be removed. The trap handler should be set based on a subtarget feature

wdng marked 3 inline comments as done and 14 inline comments as done.Jan 31 2017, 2:34 PM
wdng added inline comments.
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
136–137 ↗(On Diff #86456)

S_00B84C_TRAP_HANDLER(MFI->hasTrapHandler()) needs to know whether trap handler has been set. Where should we the based on the sub- target feature?

kzhuravl added inline comments.Jan 31 2017, 2:51 PM
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
136–137 ↗(On Diff #86456)

Set enable_trap_handler bit if Subtarget->getTrapHandlerAbi() != TrapHandlerAbiNone?

lib/Target/AMDGPU/SIMachineFunctionInfo.h
154 ↗(On Diff #86456)

not needed.

kzhuravl added inline comments.Jan 31 2017, 2:54 PM
lib/Target/AMDGPU/SIISelLowering.cpp
1806 ↗(On Diff #86456)

@tony-tye, do we want to use other code for s_trap for llvm.debugtrap?

tony-tye added inline comments.Jan 31 2017, 3:32 PM
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
641 ↗(On Diff #86456)

Should this be querying the TrapHandlerAbi using getTrapHandlerAbi() from the subtarget and setting to TRUE if not equal to TrapHandlerAbiNone ?

lib/Target/AMDGPU/AMDGPUSubtarget.h
101 ↗(On Diff #86456)

Should this be deleted now as there is getTrapHandlerAbi() instead?

lib/Target/AMDGPU/SIISelLowering.cpp
1806 ↗(On Diff #86456)

Should the 0x1 constant be a named enumeration?

Should we use a different values for llvm.trap than for llvm.debugtrap? If so the handling of llvm.debugtrap should not be converted to llvm.trap for AMDGPU.

Have we checked with teh HSA Runtime to see what codes the HSA trap handler is expecting?

1811–1818 ↗(On Diff #86456)

Should we be emitting any diagnostic here? If there is no trap handler then doesn't that imply that the environment wants an S_ENDPGM to be generated instead? That is not an error, it is that the environment demands as the implementation. For example, that is what the graphics environment wants.

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
67 ↗(On Diff #86456)

Delete too?

136–137 ↗(On Diff #86456)

Delete too?

lib/Target/AMDGPU/SIMachineFunctionInfo.h
154 ↗(On Diff #86456)

Should this be deleted? See above.

261–263 ↗(On Diff #86456)

Delete as now we have the trap ABI query instead.

Also need to use the trap handler ABI query to see if there is a trap handler, and if there is add the TRAP_HANDLER_SGPR_COUNT to the number of SGPRs budgeted for the wave in determining the number of waves per EU calculation. TRAP_HANDLER_SGPR_COUNT is 16 for GFX6 onwards.

wdng updated this revision to Diff 86841.Feb 2 2017, 9:47 AM
wdng marked 23 inline comments as done.

Address code reviews.

arsenm added inline comments.Feb 2 2017, 10:19 AM
lib/Target/AMDGPU/AMDGPUSubtarget.h
266–269 ↗(On Diff #86841)

Still using return after else. Should be return ternary operator

328–330 ↗(On Diff #86841)

This should be a subtarget feature. Returning this is still a constant based purely on the triple. See FeatureUnalignedBufferAccess for an example

test/CodeGen/AMDGPU/trap.ll
1 ↗(On Diff #86841)

You should have a run line with no subtarget features enabled, and one each explicitly enabling and disabling the trap handler subtarget feature

wdng added inline comments.Feb 2 2017, 12:19 PM
lib/Target/AMDGPU/SIISelLowering.cpp
1792 ↗(On Diff #86456)

Say, if we enable trap handler via subtarget feature, should we need to change the if here?

wdng added a comment.EditedFeb 2 2017, 12:32 PM

We enable trap handler support once "-mtriple=amdgcn--amdhsa" is specified. So use the following code to enable trap handler.

bool isTrapHandlerEnabled() const {
  return getTrapHandlerAbi() == TrapHandlerAbiHsa;
}

Not sure why we need to add a subtarget feature. Say if we replace code 328-330 using subtarget feature, do we want to enable trap handler by using both "-mtriple=amdgcn--amdhsa" and "-mattr=+enable-trap-handler"?

def FeatureTrapHandler: SubtargetFeature<"enable-trap-handler",
  "EnableTrapHandler",
  "true",
  "Enable trap handler support"
>;

At SIISelLowering.cpp we enable trap handler based on "mtriple=amdgcn--amdhsa", shall we change to based on the subtarget feature input?

if (Subtarget->getTrapHandlerAbi() == SISubtarget::TrapHandlerAbiHsa) {
 . . . 
}

What do you think?

I do not see updates to waves-per-eu calculations for SGPRs. Did you upload the correct diff?

wdng added a comment.Feb 2 2017, 1:14 PM

I do not see updates to waves-per-eu calculations for SGPRs. Did you upload the correct diff?

Yes, I did. Introduced TRAP_HANDLER_SGPR_COUNT and add it into ExtraSGPRs.

tony-tye added inline comments.Feb 2 2017, 10:00 PM
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
482–484 ↗(On Diff #86841)

I do not think enabling the trap handler reduces the number of available SGPRs. The hardware allocates and provides access to theses "in addition" to the regular SGPRs. If this is reserving registers then it should be deleted.

tony-tye added inline comments.Feb 2 2017, 10:00 PM
lib/Target/AMDGPU/AMDGPUSubtarget.h
73 ↗(On Diff #86841)

Add enumeration for S_TRAP codes:

enum TrapCode {

TrapCodeLLVMTrap = 1,
TrapCodeLLVMDebugTrap = 2

};

328–330 ↗(On Diff #86841)

Could you explain this more? The trap handler being present is determined by the environment part of the triple. Currently only the HSA environment uses a trap handler.

329 ↗(On Diff #86841)

In order to allow additional trap handler ABIs this should be:
getTrapHandlerAbi() != TrapHandlerAbiNone

508 ↗(On Diff #86841)

I do not see this being used anywhere. I think it should be used in the waves_per_eu calculation.

lib/Target/AMDGPU/SIISelLowering.cpp
1806 ↗(On Diff #86456)

I checked with the HSA Runtime and currently the trap code is not being consulted. So I propose adding an enum for the codes being used for s_trap and use the value here. It would be better to different code for llvm.trap and llvm.debugtrap. Would that require S_TRAP_PSEUDO to have an immediate operand that is set to the trap code used in the S_TRAP instruction.

test/CodeGen/AMDGPU/trap.ll
21 ↗(On Diff #86841)

Would like it to be:
; HSA: s_trap 2

wdng updated this revision to Diff 86981.Feb 3 2017, 9:30 AM

Address code reviews.

arsenm added inline comments.Feb 3 2017, 12:08 PM
lib/Target/AMDGPU/AMDGPU.td
70 ↗(On Diff #86981)

Remove the word enable-

wdng updated this revision to Diff 87286.Feb 6 2017, 1:19 PM
  1. Separate llvm.debugtrap and llvm.trap.
  2. Decrease 16 from available SGPRs once trap handler is enabled.
arsenm added inline comments.Feb 6 2017, 1:40 PM
lib/Target/AMDGPU/SIISelLowering.cpp
1824 ↗(On Diff #87286)

Can we have an enum for these values somewhere instead of hard coding 1?

1833–1838 ↗(On Diff #87286)

This duplicates nearly the entire other switch case. They should be consolidated based on the immediate argument and handled with the single pseudo

lib/Target/AMDGPU/SIInstructions.td
117 ↗(On Diff #87286)

This conflicts with the VPseudoInst type. You should remove this and change to SPseudoInstSI

121–127 ↗(On Diff #87286)

You do not need a second pseudo. You should just add an operand to the other pseudo which is just mimicking the operands of the physical instruction.

lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗(On Diff #87286)

Variable naming convention

test/CodeGen/AMDGPU/trap.ll
1 ↗(On Diff #87286)

Having FUNC and HSA-FUNC doesn't make sense. Replace these both with just GCN

2 ↗(On Diff #87286)

Missing a run line with mesa triple

11–13 ↗(On Diff #87286)

HSA:

20–21 ↗(On Diff #87286)

The check prefix names should be like HSA-TRAP, HSA-NOTRAP, MESA-TRAP, MESA-NOTRAP

wdng updated this revision to Diff 87324.Feb 6 2017, 3:32 PM
wdng marked 7 inline comments as done.

Address code reviews.

arsenm added inline comments.Feb 6 2017, 9:00 PM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1034–1045 ↗(On Diff #87324)

You should not and do not need to touch generic code

lib/Target/AMDGPU/SIISelLowering.cpp
276 ↗(On Diff #87324)

DEBUGTRAP should also be legal, you are handling the differences in the selection pattern and custom inserter

lib/Target/AMDGPU/SIInstructions.td
114 ↗(On Diff #87324)

You should use i16imm for the operand type to match the instruction (not that it matters much) and name it the same too (which matters more): ( ins i16imm:$simm16)

395 ↗(On Diff #87324)

Can you define pseudo-enums for these constants (see for example SRCMODS)

test/CodeGen/AMDGPU/trap.ll
1–2 ↗(On Diff #87324)

GCN and HSA are not alternative check prefixes. It doesn't make sense to have it on one of these but not the other. HSA-TRAP for the disabled trap handler feature is broken

3–4 ↗(On Diff #87324)

These should both explicitly use the mesa triple, and enable/disable the trap handler in each. The ones which should error are also missing the not.

25–26 ↗(On Diff #87324)

These check lines should not be able to coexist

wdng added inline comments.Feb 6 2017, 11:00 PM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1034–1045 ↗(On Diff #87324)

I accidentally touched those lines and have reverted them, thanks!

lib/Target/AMDGPU/SIISelLowering.cpp
276 ↗(On Diff #87324)

Yes, I did set DEBUGTRAP to Legal for next pattern matching at isel.

test/CodeGen/AMDGPU/trap.ll
3–4 ↗(On Diff #87324)

As per our discussion with @tony-tye last week, looks like we want to issue warning instead of error, correct?

wdng updated this revision to Diff 87529.Feb 7 2017, 2:52 PM
wdng marked 6 inline comments as done.

Address code reviews.

tony-tye added inline comments.Feb 7 2017, 3:13 PM
lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗(On Diff #87529)

Is this the right place to useSISubtarget::TRAP_HANDLER_SGPR_COUNT? The presence of a trap handler does not reduce the number of SGPRs addressable for allocation by the register allocator. It only affects the wave occupancy calculation (I did not see any change to that code).

wdng added inline comments.Feb 7 2017, 3:33 PM
lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗(On Diff #87529)

Yes, this is the right place. -mtriple=amdgcn--amdhsa defaults trap handler feature, so ST.isTrapHandlerEnabled() is true, then 16 SGPRs will be deduced otherwise 0 SGPRs will be deducted. @kzhuravl What do you think?

arsenm added inline comments.Feb 7 2017, 7:31 PM
lib/Target/AMDGPU/AMDKernelCodeT.h
198 ↗(On Diff #87529)

This change should be fully reverted

lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗(On Diff #87529)

I don't think so. This is the allocated number

test/CodeGen/AMDGPU/trap.ll
5–9 ↗(On Diff #87529)

You seem to have removed the HSA enable/disable and do mesa twice?

21 ↗(On Diff #87529)

You should also check for the rsrc2 bit is set

wdng marked an inline comment as done.Feb 7 2017, 8:29 PM
wdng added inline comments.
test/CodeGen/AMDGPU/trap.ll
5–9 ↗(On Diff #87529)

I also do HSA enable/disable twice:
; enable HSA trap handler
; RUN: llc -mtriple=amdgcn--amdhsa -mattr=+trap-handler -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=HSA-TRAP %s
; disable HSA trap handler
; RUN: llc -mtriple=amdgcn--amdhsa -mattr=-trap-handler -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=NO-HSA-TRAP %s
; disable HSA trap handler to catch warning for llvm.debugtrap since llvm.trap doesn't issue any warnings
; RUN: llc -mtriple=amdgcn--amdhsa -mattr=-trap-handler -verify-machineinstrs < %s 2>&1 | FileCheck -check-prefix=GCN -check-prefix=GCN-WARNING %s

wdng updated this revision to Diff 87587.Feb 7 2017, 9:05 PM
wdng marked an inline comment as done.

Address code reviews.

arsenm added inline comments.Feb 8 2017, 10:25 AM
lib/Target/AMDGPU/SIISelLowering.cpp
1797 ↗(On Diff #87587)

Should have enum (probably in SIDefines.h) for the values put in

wdng added inline comments.Feb 8 2017, 11:39 AM
lib/Target/AMDGPU/SIISelLowering.cpp
1797 ↗(On Diff #87587)

This probably is not an enum. It looks like an interface with HSA is that we need to set v0 to 1:

v0 <- 1

kzhuravl added inline comments.Feb 8 2017, 11:41 AM
lib/Target/AMDGPU/SIISelLowering.cpp
1797 ↗(On Diff #87587)

I think defining it in SIDefines.h for now is OK. I am working on some restructuring to shared header files, and will include it in the restructuring.

kzhuravl added inline comments.Feb 8 2017, 12:26 PM
lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗(On Diff #87529)

I think this should be moved to getMaxNumSGPRs. We still want to report the correct number of addressable SGPRs regardless of trap handler.

wdng added inline comments.Feb 8 2017, 12:32 PM
lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗(On Diff #87529)

I think in terms of functionality it's the same no matter whether we move the code into getMaxNumSGPRs or not. As getMaxNumSGPRs still invokes this function to calculate MaxNumSGPRs and MaxNumAddressableSGPRs. What do you think?

kzhuravl added inline comments.Feb 8 2017, 12:37 PM
lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗(On Diff #87529)

Not really. We need to record addressable number of SGPRs in the metadata for tools such as debugger and profiler. The addressable number of SGPRs should be recorded correctly regardless of trap handler (102 for 8+, 104 for everything else).

https://github.com/llvm-mirror/llvm/blob/master/lib/Target/AMDGPU/MCTargetDesc/AMDGPURuntimeMD.cpp#L422

wdng updated this revision to Diff 87698.Feb 8 2017, 1:08 PM
  1. Add rsrc2 trap handler bit check in lit tests.
  2. Moved the trap handler SGPRs calculation into getMaxNumSGPRs
  3. Add a trap operand enum for the time being to replace a constant value 1.
kzhuravl added inline comments.Feb 8 2017, 1:14 PM
lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗(On Diff #87529)

Can we drop waves-per-eu calculations from this patch? We need to change few things to math, which can be done as a separate patch.

wdng updated this revision to Diff 87702.Feb 8 2017, 1:25 PM

Drop waves-per-eu calculations.

arsenm added inline comments.Feb 8 2017, 1:56 PM
test/CodeGen/AMDGPU/trap.ll
1 ↗(On Diff #87702)

Missing GCN check prefix

12 ↗(On Diff #87702)

You don't need a check prefix for NO-RSRC2-BIT. That is too specific, the checks should be based on the environment type/options

25 ↗(On Diff #87702)

This is just the comment printed. You should check the actual register config register value too

wdng updated this revision to Diff 87727.Feb 8 2017, 4:08 PM

Address code reviews.

tony-tye added inline comments.Feb 8 2017, 5:16 PM
lib/Target/AMDGPU/AMDGPUSubtarget.h
74–82 ↗(On Diff #87727)

After discussion with the HSA runtime team it was decided to use separate S_TRAP codes for each purpose. The current codes would be:

enum TrapCode {
  TrapCodeBreakpoint = 0,
  TrapCodeLLVMTrap = 1,
  TrapCodeLLVMDebugTrap = 2,
  TrapCodeHSADebugTrap = 3
};

These need documenting in the AMDGPU LLVM Spec page.

For now we do not have an intrinsic for the HSA debugtrap (which takes an argument) so remove the TrapRegValues enum.

lib/Target/AMDGPU/SIISelLowering.cpp
1796–1797 ↗(On Diff #87727)

Delete this as the traps generated for llvm.trap and llvm.debugtrap will not pass in any value.

tony-tye added inline comments.Feb 8 2017, 5:17 PM
lib/Target/AMDGPU/SIISelLowering.cpp
1811 ↗(On Diff #87727)

Delete this as the traps generated for llvm.trap and llvm.debugtrap will not pass in any value.

tony-tye added inline comments.Feb 8 2017, 5:23 PM
lib/Target/AMDGPU/AMDGPUSubtarget.h
74–82 ↗(On Diff #87727)

The TRAP handler codes can be defined in a new section of:

http://llvm.org/docs/AMDGPUUsage.html
wdng updated this revision to Diff 87761.Feb 8 2017, 10:22 PM
wdng marked 2 inline comments as done.

Added enum for better HSA interface and removed mov instruction for V0 as llvm.trap and llvm.debugtrap will not pass in any value.

wdng updated this revision to Diff 87764.Feb 8 2017, 10:49 PM

Upload correct diff file.

tony-tye added inline comments.Feb 9 2017, 8:15 AM
lib/Target/AMDGPU/SIISelLowering.cpp
1820 ↗(On Diff #87764)

Since llvm.debugtrap is not defined as NORETURN then it seems after a warning it should be lowered to a S_NOP not an S_ENDPGM.

llvm.trap is defined as NORETURN so generating S_ENDPGM seems the right choice when there is no trap handler.

wdng updated this revision to Diff 87816.Feb 9 2017, 8:35 AM

Add S_NOP.

wdng updated this revision to Diff 87817.Feb 9 2017, 8:44 AM

Upload full correct diff.

tony-tye added inline comments.Feb 9 2017, 9:25 AM
lib/Target/AMDGPU/SIISelLowering.cpp
1811 ↗(On Diff #87817)

Since TrapType is an enum type should this be a switch with a default: that asserts? Currently the code will silently ignore any new trap types that are added (an assert would avoid that).

wdng updated this revision to Diff 87848.Feb 9 2017, 11:50 AM

Address code reviews.

arsenm added inline comments.Feb 9 2017, 12:10 PM
lib/Target/AMDGPU/AMDGPUSubtarget.cpp
87 ↗(On Diff #87848)

Sort

lib/Target/AMDGPU/AMDGPUSubtarget.h
106 ↗(On Diff #87848)

Can you sort this later until after EnableXNACK? I think the unaligned options should stay together

lib/Target/AMDGPU/SIISelLowering.cpp
1813–1815 ↗(On Diff #87848)

Asserting !the switch case in the switch case is ugly. There's no reason to special case this, just put unreachable in the default case

1826 ↗(On Diff #87848)

addImm on next line.

We don't really need to emit anything here though

1829–1832 ↗(On Diff #87848)

Just let it default

1834 ↗(On Diff #87848)

llvm_unreachable

tony-tye added inline comments.Feb 9 2017, 12:45 PM
lib/Target/AMDGPU/SIISelLowering.cpp
1826 ↗(On Diff #87848)

Discussed with @arsenm and leaving as a NOP does give some flexibility asa tool may want to patch these points. Since llvm.debugtrap is not being generally used it does not hurt to have a NOP.

wdng updated this revision to Diff 87859.Feb 9 2017, 12:59 PM
wdng marked 2 inline comments as done.

Address code reviews.

wdng updated this revision to Diff 87863.Feb 9 2017, 1:08 PM

Discussed with @tony-tye , keep only 2 CASEs that can happen for the pseudo trap is llvmtrap and llvmdebugtrap in switch block.

arsenm accepted this revision.Feb 9 2017, 3:46 PM

LGTM except for minor issues

lib/Target/AMDGPU/SIISelLowering.cpp
1828 ↗(On Diff #87863)

Remove dead line

lib/Target/AMDGPU/SIInstructions.td
392 ↗(On Diff #87863)

Space after :

402 ↗(On Diff #87863)

Ditto

This revision was automatically updated to reflect the committed changes.