This is an archive of the discontinued LLVM Phabricator instance.

Differential D26010

AMDGPU : Add trap handler support.
ClosedPublic

Authored by wdng on Oct 26 2016, 1:33 PM.

Details

Reviewers

tony-tye
b-sumner
tstellarAMD
kzhuravl
arsenm

Commits

rG205bfdb3e9b0: AMDGPU : Add trap handler support.
rL294692: AMDGPU : Add trap handler support.

Summary

Add trap handler support.

Diff Detail

Repository: rL LLVM

Event Timeline

This revision now requires changes to proceed.Oct 28 2016, 2:01 PM

wdng retitled this revision from "AMDGPU : Add s_trap intrinsic." to "AMDGPU : Add trap handler support." Jan 23 2017, 9:36 PM

wdng edited the summary of this revision.

I am sorry for checking this code into the LLVM repo by mistake. Please let me know if you have any comments on this patch so that I can modify it and check it in accordingly. Thanks!

wdng added inline comments.Jan 24 2017, 9:16 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194	Matt: "This also needs to handle debug.trap(), and the case when unreachable’s are turned into traps. There should also be tests specifically for the annotator."
test/CodeGen/AMDGPU/trap.ll
74–75	Matt: "Should check for the enabled feature bits in the kernel_code_t. This also doesn’t have anything setting enable_trap_handler"

wdng added inline comments.Jan 24 2017, 9:19 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194	How should we handle debug.trap()? For the case of unreachable's, should we invoke debug.trap()?
test/CodeGen/AMDGPU/trap.ll
74–75	I think "@llvm.trap()" will set enable_trap_handler, right?

arsenm added inline comments.Jan 24 2017, 11:19 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194	There is a separate llvm.debugtrap intrinsic. I looked around SelectionDAG for other places that introduce ISD::TRAP. The expansion for ISD::DEBUGTRAP creates a TRAP. unreachable instructions also emit a trap if the DAG.getTarget().Options.TrapUnreachable is enabled.
test/CodeGen/AMDGPU/trap.ll
74–75	Yes, that is the problem. You need to be setting the trap handler bit

wdng added inline comments.Jan 24 2017, 12:29 PM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194	Yes, ISD::DEBUGTRAP has been replaced with ISD::TRAP at the legalization phase. So, do we still need to map llvm.debugtrap intrinsic with amdgpu-queue-ptr?

tony-tye added inline comments.Jan 24 2017, 12:59 PM

test/CodeGen/AMDGPU/trap.ll
74–75	I am not sure if we should have the compiler be responsible for setting the enable_trap_handler bit. In general I don't think the compiler can figure this out on a per-kernel basis. Once we support function calls and indirect calls, how could we know if the closure of all functions includes some that need the trap handler? Also, even if we did set the bit, for compute, the hardware CP microcode cannot do anything with it other than refuse to execute the kernel. Today I believe the CP microcode ignores this bit and always enables the trap handler as it is needed for CWSR (context switching). So it seems the presence of a trap handler is more a function of the environment that will execute the kernel than the kernel itself. So perhaps we should simply not define the trap handler bit. The code object is already marked as requiring the HSA environment and perhaps this implies the presence of a trap handler. I suspect that graphics kernels have their own "environment" they execute in and may not support trap handlers. If they do not, then executing an S_TRAP will simply be a NOP which will not halt the shader.

arsenm added inline comments.Jan 24 2017, 1:07 PM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194	Yes. You should add a test which uses this. As it is this will fail because the queue ptr won't be enabled
test/CodeGen/AMDGPU/trap.ll
74–75	I don't think that implies the presence of the trap handler. This is a field in the kernel_code_t, so we should set it. We can figure out a conservative setting in the future whenever indirect calls are needed. I thought enabling this also required reserving 16 SGPRs?

wdng added inline comments.Jan 24 2017, 1:53 PM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194	To enable or disable queue ptr, it depends whether we have used "-mtriple=amdgcn--amdhsa" or not. Since ISD::DEBUGTRAP has been replaced with ISD::TRAP, I don't think the test will fail if we compile llvm.debugtrap with "-mtriple=amdgcn--amdhsa", correct?

arsenm added inline comments.Jan 24 2017, 1:56 PM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
193–194	That occurs in the DAG. Here you are trying to identify all the situations where ISD::TRAP will be emitted from the IR. Eventually I want to replace how the ABI is lowered so we don't need to do this anymore, but for now you must predict all of the traps.
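The point above is that the annotator runs on IR, before ISD::DEBUGTRAP is rewritten to ISD::TRAP in the DAG, so every trap-producing intrinsic has to be listed up front. A minimal, standalone sketch of such a lookup follows. The first four rows come from the diff in this revision; the `llvm.debugtrap` row and the helper `attrForIntrinsic` are assumptions made for illustration, not the committed code.

```cpp
#include <cstring>

// Each row pairs an intrinsic name with the function attribute it requires.
// The llvm.debugtrap row is an assumption inferred from the review thread.
static const char *const HSAIntrinsicToAttr[][2] = {
    {"llvm.amdgcn.dispatch.ptr", "amdgpu-dispatch-ptr"},
    {"llvm.amdgcn.queue.ptr", "amdgpu-queue-ptr"},
    {"llvm.amdgcn.dispatch.id", "amdgpu-dispatch-id"},
    {"llvm.trap", "amdgpu-queue-ptr"},
    {"llvm.debugtrap", "amdgpu-queue-ptr"},
};

// Hypothetical helper: returns the attribute required by the named
// intrinsic, or nullptr when the intrinsic is not in the table.
const char *attrForIntrinsic(const char *Name) {
  for (const auto &Row : HSAIntrinsicToAttr)
    if (std::strcmp(Row[0], Name) == 0)
      return Row[1];
  return nullptr;
}
```

With this shape, a kernel calling either trap intrinsic is annotated with `amdgpu-queue-ptr`, while unrelated intrinsics yield no attribute.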

wdng added inline comments.Jan 25 2017, 9:49 AM

test/CodeGen/AMDGPU/trap.ll
74–75	So, once the compiler detects "llvm.trap()", shall we set the trap handler bit in the kernel_code_t? I don't know why we need to add the trap handler bit. Does setting the bit imply the presence of the trap handler, and leaving it clear imply its absence?

Fixed format issue.
Add test case for llvm.debugtrap.
Add non-hsa path trap handler.
Confirm that 16 reserved SGPRs are not part of the regular SGPRs.
Add trap handler bit set in the amd_kernel_code_t.

vim added some indentations by mistake, removed indentations & empty line.

arsenm added inline comments.Jan 26 2017, 3:19 PM

lib/Target/AMDGPU/SIISelLowering.cpp
276–279	This does not need an else
1794	const DebugLoc &DL
1986	return on next line
lib/Target/AMDGPU/SIMachineFunctionInfo.h
253 ↗	(On Diff #85971)	This should return void and doesn't need HSA in the name
test/CodeGen/AMDGPU/trap.ll
1–15	This should not be checking stderr since the test should pass with HSA
2	This needs a not
73	Check still missing for the enable_trap_handler bit in kernel_code_t
78	You should add another test which ends in unreachable. As far as I can tell there is no flag to set TrapUnreachable, and we don't enable it now, but this should catch it if that ever breaks
78	You should also add another test which needs to enable the queue ptr for a different feature
91	GCN should be replaced with ERROR or something like that

tony-tye added inline comments.Jan 26 2017, 5:32 PM

test/CodeGen/AMDGPU/trap.ll
74–75	It seems that the presence of a trap handler is a property of producing a code object to be executed using the HSA environment. Under ROCM all kernels will be executed with a trap handler and that trap handler will use the HSA ABI. Other environments may have different trap handlers, or no trap handler at all, and the code generated for llvm.trap/debugtrap would change accordingly. The bit in amd_kernel_code_t would then be set according to the demands of the environment, independent of whether traps are used. Having a trap handler present does cause an extra 16 SGPRs to be allocated (which should be taken into account when determining wave occupancy), but these are in addition to the SGPRs used by the generated code (so no need to reserve them or include them in the SGPR count in the amd_kernel_code_t).

arsenm added inline comments.Jan 26 2017, 8:51 PM

test/CodeGen/AMDGPU/trap.ll
74–75	Do we still need to make sure we leave 16 unallocated for the implicit use? From the ABI spec it seems clear to not include them in the reported SGPR count, but do we need to ensure reported total + 16 is below the hardware limit?

tony-tye added inline comments.Jan 26 2017, 9:10 PM

test/CodeGen/AMDGPU/trap.ll
74–75	The hardware limit is the maximum non-privileged SGPRs that can be accessed using the instruction register encoding plus the 16 privileged trap registers. The trap temps are only allocated IF a trap handler is enabled, and are only accessible when executing the trap handler (using special instruction register encoding). So from the point of view of determining the limit of non-privileged SGPRs that can be allocated the presence of a trap handler can be ignored. The fact they are allocated needs to be considered when determining the number of waves that will fit on a CU.

Address code reviews.

Herald added a subscriber: tpr. Jan 27 2017, 2:29 PM

wdng added inline comments.Jan 27 2017, 2:29 PM

lib/Target/AMDGPU/SIISelLowering.cpp
276–279	I think we need to leave the else here. I assume the default setting is Legal, but it looks like we need to specify it explicitly, otherwise it will throw an error.
test/CodeGen/AMDGPU/trap.ll
78	What feature should I test?
78	What kinds of instructions will trigger the trap instruction?

tony-tye added inline comments.Jan 27 2017, 2:34 PM

lib/Target/AMDGPU/SIISelLowering.cpp
276–279	The else implements the non-HSA expansion that generates an s_endpgm, which is used for graphics.

Can you upload a full diff? Thanks

wdng added inline comments.Jan 27 2017, 2:43 PM

lib/Target/AMDGPU/SIISelLowering.cpp
276–279	yes, so I think we need to keep the else part.

Upload a full diff.

arsenm added inline comments.Jan 27 2017, 4:58 PM

lib/Target/AMDGPU/SIMachineFunctionInfo.h
265–266 ↗	(On Diff #86133)	Remove Ptr from the names
test/CodeGen/AMDGPU/trap.ll
78	You can do an address space cast from LDS to flat or do something with the queue ptr intrinsic

tony-tye added inline comments.Jan 27 2017, 9:25 PM

lib/Target/AMDGPU/SIISelLowering.cpp
276	In discussion with @kzhuravl the suggestion was to use a subtarget query to determine the trap abi. Currently there are two ABIs, one for HSA and one for graphics, but in the future there could be others.
lib/Target/AMDGPU/SIMachineFunctionInfo.h
253 ↗	(On Diff #86133)	As mentioned in another comment, I think whether there is a trap handler should be determined from the environment part of the triple. Check with @kzhuravl where the query should be put.

As @tony-tye mentioned trap handler is more of an environment property (see https://reviews.llvm.org/D26010#inline-251754), so it should be moved into AMDGPUSubtarget or SISubtarget, and we should be setting trap handler bit in pgm rsrc 2 based on the subtarget query.

In order to allow extensibility in the future the suggestion is to introduce an enumeration with available trap handler ABIs. Currently there will be only 2, something along these lines:

enum TrapHandlerAbi {
  TrapHandlerAbiNone = 0,
  TrapHandlerAbiHsa = 1
};

Also, introduce a subtarget query that returns trap handler abi:

TrapHandlerAbi getTrapHandlerAbi() const {
  if (isAmdHsaOs)
    return TrapHandlerAbiHsa;
  return TrapHandlerAbiNone;
}

Every ISD::TRAP is going to be lowered into S_TRAP_PSEUDO regardless of the trap handler ABI.
In order to expand S_TRAP_PSEUDO, we will have to query what kind of trap handler ABI the subtarget has. If it is HSA, then move 1 into v0, and queue ptr into s[0:1]. If the trap ABI is none, then lower it to s_endpgm.

@tony-tye did I miss anything? @arsenm, @wdng what do you think?

Thanks
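The proposal above can be sketched as a standalone program. This is only an illustration of the suggested shape at this point in the review: `SubtargetSketch` and `expandTrapPseudo` are hypothetical names, and the expansion is described as text steps rather than real machine instructions.

```cpp
#include <string>
#include <vector>

enum TrapHandlerAbi { TrapHandlerAbiNone = 0, TrapHandlerAbiHsa = 1 };

// Hypothetical stand-in for the subtarget; it only models the OS check.
struct SubtargetSketch {
  bool IsAmdHsaOS;
  TrapHandlerAbi getTrapHandlerAbi() const {
    return IsAmdHsaOS ? TrapHandlerAbiHsa : TrapHandlerAbiNone;
  }
};

// Describes, as text steps, what S_TRAP_PSEUDO would expand to per ABI.
std::vector<std::string> expandTrapPseudo(const SubtargetSketch &ST) {
  switch (ST.getTrapHandlerAbi()) {
  case TrapHandlerAbiHsa:
    // HSA ABI as proposed here: pass 1 in v0 and the queue ptr in s[0:1].
    return {"v0 <- 1", "s[0:1] <- queue ptr", "s_trap"};
  case TrapHandlerAbiNone:
    // No trap handler: end the program without an error or warning.
    return {"s_endpgm"};
  }
  return {}; // Not reached for the two ABIs above.
}
```

Switching on the enum rather than on the triple directly is what leaves room for further ABIs later.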

lib/Target/AMDGPU/AMDKernelCodeT.h
200–202 ↗	(On Diff #86133)	This is not right. Trap handler bit is located in `amd_compute_pgm_rsrc2_t` with the field name `enable_trap_handler`. Refer to: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/amd_hsa_kernel_code.h#L114 https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc/blob/master/AMDGPU-ABI.md#compute-shader-program-settings-2-amd_compute_pgm_rsrc2_t
lib/Target/AMDGPU/SIISelLowering.cpp
278–279	If we go with the approach in my latest comment, custom lowering won't be needed.
1789–1790	If we go with the approach in my latest comment, this should switch on ABI type, for HSA ABI this code will be used. For "none" ABIs `s_endpgm` should be used (without error or warning).
2485–2501	If we go with the approach in my latest comment, this will be gone.
lib/Target/AMDGPU/SIMachineFunctionInfo.h
253 ↗	(On Diff #86133)	If we go with the approach in my latest comment, this will be moved to subtarget class.
lib/Target/AMDGPU/Utils/AMDKernelCodeTInfo.h
90	`enable_trap_handler` should be here (see comment with links).
116	There is no `is_trap_handler_supported` in `CODEPROP` (see comment with links).

Also, we need to update waves-per-eu calculations.

In D26010#660492, @kzhuravl wrote:

Also, we need to update waves-per-eu calculations.

What kinds of changes should we make for the waves-per-eu?

Restructure code to make it extensible based on code reviews.
Move trap handler bit in amd_compute_pgm_rsrc2_t.

updated patch with complete & full diff
Restructure code to make it extensible based on code reviews.
Move trap handler bit in amd_compute_pgm_rsrc2_t.

arsenm added inline comments.Jan 31 2017, 11:01 AM

lib/Target/AMDGPU/AMDGPUSubtarget.h
112	Comment unnecessary, also nothing HSA specific here
277	Return enum type
277–282	Badly formatted. return ternary operator
lib/Target/AMDGPU/SIISelLowering.cpp
1793	Compare to enum value
1805–1809	Formatting of previous code needs to be fixed
1810–1811	Else on previous line. Instead of else you can early return on the error path
1817–1818	Badly formatted. You already have MF and get the Function* above, so you don't need double getParent
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
136–137 ↗	(On Diff #86456)	This should be removed. The trap handler should be set based on a subtarget feature

wdng marked 3 inline comments as done and 14 inline comments as done.Jan 31 2017, 2:34 PM

wdng added inline comments.

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
136–137 ↗	(On Diff #86456)	S_00B84C_TRAP_HANDLER(MFI->hasTrapHandler()) needs to know whether the trap handler has been set. Where should we set it based on the subtarget feature?

kzhuravl added inline comments.Jan 31 2017, 2:51 PM

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
136–137 ↗	(On Diff #86456)	Set enable_trap_handler bit if Subtarget->getTrapHandlerAbi() != TrapHandlerAbiNone?
lib/Target/AMDGPU/SIMachineFunctionInfo.h
154 ↗	(On Diff #86456)	not needed.
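The suggestion to derive the `enable_trap_handler` bit from the subtarget query can be sketched as below. The bit position used here (bit 6 of the rsrc2 word) is an assumption and should be checked against the ABI document linked earlier in the thread; `withTrapHandlerBit` is a hypothetical helper, not part of LLVM.

```cpp
#include <cstdint>

enum TrapHandlerAbi { TrapHandlerAbiNone = 0, TrapHandlerAbiHsa = 1 };

// Assumed bit position of enable_trap_handler within compute_pgm_rsrc2.
constexpr uint32_t TrapHandlerBitShift = 6;

// Sets the bit for any ABI other than "none" and clears it otherwise,
// leaving every other bit of the register value untouched.
uint32_t withTrapHandlerBit(uint32_t Rsrc2, TrapHandlerAbi Abi) {
  const uint32_t Mask = uint32_t(1) << TrapHandlerBitShift;
  return Abi != TrapHandlerAbiNone ? (Rsrc2 | Mask) : (Rsrc2 & ~Mask);
}
```

Comparing against `TrapHandlerAbiNone` rather than `TrapHandlerAbiHsa` keeps the check valid if more ABIs are added.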

kzhuravl added inline comments.Jan 31 2017, 2:54 PM

lib/Target/AMDGPU/SIISelLowering.cpp
1807	@tony-tye, do we want to use a different `s_trap` code for `llvm.debugtrap`?

tony-tye added inline comments.Jan 31 2017, 3:32 PM

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
641	Should this be querying the TrapHandlerAbi using getTrapHandlerAbi() from the subtarget and setting to TRUE if not equal to TrapHandlerAbiNone ?
lib/Target/AMDGPU/AMDGPUSubtarget.h
113	Should this be deleted now as there is getTrapHandlerAbi() instead?
lib/Target/AMDGPU/SIISelLowering.cpp
1807	Should the 0x1 constant be a named enumeration? Should we use different values for llvm.trap than for llvm.debugtrap? If so the handling of llvm.debugtrap should not be converted to llvm.trap for AMDGPU. Have we checked with the HSA Runtime to see what codes the HSA trap handler is expecting?
1812–1819	Should we be emitting any diagnostic here? If there is no trap handler then doesn't that imply that the environment wants an S_ENDPGM to be generated instead? That is not an error, it is what the environment demands as the implementation. For example, that is what the graphics environment wants.
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
67 ↗	(On Diff #86456)	Delete too?
136–137 ↗	(On Diff #86456)	Delete too?
lib/Target/AMDGPU/SIMachineFunctionInfo.h
154 ↗	(On Diff #86456)	Should this be deleted? See above.
261–263 ↗	(On Diff #86456)	Delete as now we have the trap ABI query instead.

Also need to use the trap handler ABI query to see if there is a trap handler, and if there is add the TRAP_HANDLER_SGPR_COUNT to the number of SGPRs budgeted for the wave in determining the number of waves per EU calculation. TRAP_HANDLER_SGPR_COUNT is 16 for GFX6 onwards.
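A rough sketch of that occupancy adjustment follows. Only the value 16 for the trap handler SGPRs comes from the review; the register file size, allocation granule, and wave cap are placeholder assumptions chosen for illustration, and `wavesPerSimd` is a hypothetical function.

```cpp
#include <algorithm>

// Placeholder hardware parameters (assumptions, not taken from the patch).
constexpr unsigned TotalSgprsPerSimd = 800;
constexpr unsigned SgprAllocGranule = 16;
constexpr unsigned MaxWavesPerSimd = 10;
// From the review: 16 trap handler SGPRs for GFX6 onwards.
constexpr unsigned TrapHandlerSgprCount = 16;

// Rounds Value up to the next multiple of Granule.
unsigned alignUp(unsigned Value, unsigned Granule) {
  return (Value + Granule - 1) / Granule * Granule;
}

// The trap temps are added to the per-wave budget only for occupancy;
// they are not part of the SGPR count reported for the kernel.
unsigned wavesPerSimd(unsigned ReportedSgprs, bool TrapHandlerEnabled) {
  unsigned Budget = alignUp(ReportedSgprs, SgprAllocGranule);
  if (TrapHandlerEnabled)
    Budget += TrapHandlerSgprCount;
  if (Budget == 0)
    return MaxWavesPerSimd;
  return std::min(MaxWavesPerSimd, TotalSgprsPerSimd / Budget);
}
```

Under these placeholder numbers a kernel reporting 96 SGPRs fits 8 waves without a trap handler and 7 with one, which is the effect the comment asks the calculation to capture.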

Address code reviews.

arsenm added inline comments.Feb 2 2017, 10:19 AM

lib/Target/AMDGPU/AMDGPUSubtarget.h
278–281	Still using return after else. Should be return ternary operator
341–343	This should be a subtarget feature. Returning this is still a constant based purely on the triple. See FeatureUnalignedBufferAccess for an example
test/CodeGen/AMDGPU/trap.ll
1–15	You should have a run line with no subtarget features enabled, and one each explicitly enabling and disabling the trap handler subtarget feature

wdng added inline comments.Feb 2 2017, 12:19 PM

lib/Target/AMDGPU/SIISelLowering.cpp
1793	Say we enable the trap handler via a subtarget feature; do we need to change the if here?

We enable trap handler support once "-mtriple=amdgcn--amdhsa" is specified. So use the following code to enable trap handler.

bool isTrapHandlerEnabled() const {
  return getTrapHandlerAbi() == TrapHandlerAbiHsa;
}

Not sure why we need to add a subtarget feature. Say if we replace code 328-330 using subtarget feature, do we want to enable trap handler by using both "-mtriple=amdgcn--amdhsa" and "-mattr=+enable-trap-handler"?

def FeatureTrapHandler: SubtargetFeature<"enable-trap-handler",
  "EnableTrapHandler",
  "true",
  "Enable trap handler support"
>;

At SIISelLowering.cpp we enable trap handler based on "-mtriple=amdgcn--amdhsa"; should we change this to be based on the subtarget feature input?

if (Subtarget->getTrapHandlerAbi() == SISubtarget::TrapHandlerAbiHsa) {
 . . . 
}

What do you think?

I do not see updates to waves-per-eu calculations for SGPRs. Did you upload the correct diff?

In D26010#665171, @kzhuravl wrote:

I do not see updates to waves-per-eu calculations for SGPRs. Did you upload the correct diff?

Yes, I did. Introduced TRAP_HANDLER_SGPR_COUNT and add it into ExtraSGPRs.

tony-tye added inline comments.Feb 2 2017, 10:00 PM

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
482–484	I do not think enabling the trap handler reduces the number of available SGPRs. The hardware allocates and provides access to these "in addition" to the regular SGPRs. If this is reserving registers then it should be deleted.

tony-tye added inline comments.Feb 2 2017, 10:00 PM

lib/Target/AMDGPU/AMDGPUSubtarget.h
73	Add enumeration for S_TRAP codes: enum TrapCode { TrapCodeLLVMTrap = 1, TrapCodeLLVMDebugTrap = 2 };
341–343	Could you explain this more? The trap handler being present is determined by the environment part of the triple. Currently only the HSA environment uses a trap handler.
342	In order to allow additional trap handler ABIs this should be: getTrapHandlerAbi() != TrapHandlerAbiNone
517	I do not see this being used anywhere. I think it should be used in the waves_per_eu calculation.
lib/Target/AMDGPU/SIISelLowering.cpp
1807	I checked with the HSA Runtime and currently the trap code is not being consulted. So I propose adding an enum for the codes being used for s_trap and using the value here. It would be better to use different codes for llvm.trap and llvm.debugtrap. Would that require S_TRAP_PSEUDO to have an immediate operand that is set to the trap code used in the S_TRAP instruction?
test/CodeGen/AMDGPU/trap.ll
88	Would like it to be: ; HSA: s_trap 2

Address code reviews.

arsenm added inline comments.Feb 3 2017, 12:08 PM

lib/Target/AMDGPU/AMDGPU.td
70	Remove the word enable-

Separate llvm.debugtrap and llvm.trap.
Decrease 16 from available SGPRs once trap handler is enabled.

arsenm added inline comments.Feb 6 2017, 1:40 PM

lib/Target/AMDGPU/SIISelLowering.cpp
1858	Can we have an enum for these values somewhere instead of hard coding 1?
1867–1872	This duplicates nearly the entire other switch case. They should be consolidated based on the immediate argument and handled with the single pseudo
lib/Target/AMDGPU/SIInstructions.td
116	This conflicts with the VPseudoInst type. You should remove this and change to SPseudoInstSI
120–126	You do not need a second pseudo. You should just add an operand to the other pseudo which is just mimicking the operands of the physical instruction.
lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗	(On Diff #87286)	Variable naming convention
test/CodeGen/AMDGPU/trap.ll
1–15	Having FUNC and HSA-FUNC doesn't make sense. Replace these both with just GCN
2	Missing a run line with mesa triple
77–79	HSA:
86–87	The check prefix names should be like HSA-TRAP, HSA-NOTRAP, MESA-TRAP, MESA-NOTRAP

Address code reviews.

arsenm added inline comments.Feb 6 2017, 9:00 PM

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1034–1045 ↗	(On Diff #87324)	You should not and do not need to touch generic code
lib/Target/AMDGPU/SIISelLowering.cpp
276	DEBUGTRAP should also be legal, you are handling the differences in the selection pattern and custom inserter
lib/Target/AMDGPU/SIInstructions.td
114	You should use i16imm for the operand type to match the instruction (not that it matters much) and name it the same too (which matters more): ( ins i16imm:$simm16)
404	Can you define pseudo-enums for these constants (see for example SRCMODS)
test/CodeGen/AMDGPU/trap.ll
1–16	GCN and HSA are not alternative check prefixes. It doesn't make sense to have it on one of these but not the other. HSA-TRAP for the disabled trap handler feature is broken
3–4	These should both explicitly use the mesa triple, and enable/disable the trap handler in each. The ones which should error are also missing the not.
90–91	These check lines should not be able to coexist

wdng added inline comments.Feb 6 2017, 11:00 PM

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1034–1045 ↗	(On Diff #87324)	I accidentally touched those lines and have reverted them, thanks!
lib/Target/AMDGPU/SIISelLowering.cpp
276	Yes, I did set DEBUGTRAP to Legal for next pattern matching at isel.
test/CodeGen/AMDGPU/trap.ll
3–4	As per our discussion with @tony-tye last week, looks like we want to issue warning instead of error, correct?

Address code reviews.

tony-tye added inline comments.Feb 7 2017, 3:13 PM

lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗	(On Diff #87529)	Is this the right place to use SISubtarget::TRAP_HANDLER_SGPR_COUNT? The presence of a trap handler does not reduce the number of SGPRs addressable for allocation by the register allocator. It only affects the wave occupancy calculation (I did not see any change to that code).

wdng added inline comments.Feb 7 2017, 3:33 PM

lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗	(On Diff #87529)	Yes, this is the right place. -mtriple=amdgcn--amdhsa enables the trap handler feature by default, so ST.isTrapHandlerEnabled() is true and 16 SGPRs will be deducted; otherwise 0 SGPRs will be deducted. @kzhuravl What do you think?

arsenm added inline comments.Feb 7 2017, 7:31 PM

lib/Target/AMDGPU/AMDKernelCodeT.h
198 ↗	(On Diff #87529)	This change should be fully reverted
lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗	(On Diff #87529)	I don't think so. This is the allocated number
test/CodeGen/AMDGPU/trap.ll
5–9	You seem to have removed the HSA enable/disable and do mesa twice?
19	You should also check that the rsrc2 bit is set

wdng marked an inline comment as done.Feb 7 2017, 8:29 PM

wdng added inline comments.

test/CodeGen/AMDGPU/trap.ll
5–9	I also do HSA enable/disable twice:
; enable HSA trap handler
; RUN: llc -mtriple=amdgcn--amdhsa -mattr=+trap-handler -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=HSA-TRAP %s
; disable HSA trap handler
; RUN: llc -mtriple=amdgcn--amdhsa -mattr=-trap-handler -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=NO-HSA-TRAP %s
; disable HSA trap handler to catch warning for llvm.debugtrap since llvm.trap doesn't issue any warnings
; RUN: llc -mtriple=amdgcn--amdhsa -mattr=-trap-handler -verify-machineinstrs < %s 2>&1 | FileCheck -check-prefix=GCN -check-prefix=GCN-WARNING %s

Address code reviews.

arsenm added inline comments.Feb 8 2017, 10:25 AM

lib/Target/AMDGPU/SIISelLowering.cpp
1797	Should have enum (probably in SIDefines.h) for the values put in

wdng added inline comments.Feb 8 2017, 11:39 AM

lib/Target/AMDGPU/SIISelLowering.cpp
1797	This probably is not an enum. It looks like the interface with HSA requires us to set v0 to 1: v0 <- 1

kzhuravl added inline comments.Feb 8 2017, 11:41 AM

lib/Target/AMDGPU/SIISelLowering.cpp
1797	I think defining it in SIDefines.h for now is OK. I am working on some restructuring to shared header files, and will include it in the restructuring.

kzhuravl added inline comments.Feb 8 2017, 12:26 PM

lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗	(On Diff #87529)	I think this should be moved to `getMaxNumSGPRs`. We still want to report the correct number of addressable SGPRs regardless of trap handler.

wdng added inline comments.Feb 8 2017, 12:32 PM

lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗	(On Diff #87529)	I think in terms of functionality it's the same no matter whether we move the code into getMaxNumSGPRs or not. As getMaxNumSGPRs still invokes this function to calculate MaxNumSGPRs and MaxNumAddressableSGPRs. What do you think?

kzhuravl added inline comments.Feb 8 2017, 12:37 PM

lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗	(On Diff #87529)	Not really. We need to record addressable number of SGPRs in the metadata for tools such as debugger and profiler. The addressable number of SGPRs should be recorded correctly regardless of trap handler (102 for 8+, 104 for everything else). https://github.com/llvm-mirror/llvm/blob/master/lib/Target/AMDGPU/MCTargetDesc/AMDGPURuntimeMD.cpp#L422
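The separation being asked for here can be sketched as two functions: one that reports addressable SGPRs independent of the trap handler, and one that applies the trap handler adjustment. The numbers 102 and 104 come from the comment above. The enum names and the 16-SGPR deduction inside `getMaxNumSGPRs` are illustrative assumptions; the deduction was disputed in this thread and the related calculation was later dropped from the patch.

```cpp
// Hypothetical generation tags; only the ordering matters for this sketch.
enum Generation { Gfx6 = 6, Gfx7 = 7, Gfx8 = 8 };

// Addressable SGPRs as stated in the review: 102 for generation 8 and
// later, 104 otherwise. Deliberately independent of the trap handler so
// that metadata consumed by debuggers and profilers stays correct.
unsigned getNumAddressableSGPRs(Generation Gen) {
  return Gen >= Gfx8 ? 102u : 104u;
}

// Hypothetical allocation cap that applies the disputed deduction in one
// place, leaving the addressable count above untouched.
unsigned getMaxNumSGPRs(Generation Gen, bool TrapHandlerEnabled) {
  const unsigned TrapHandlerSgprCount = 16;
  const unsigned Max = getNumAddressableSGPRs(Gen);
  return TrapHandlerEnabled ? Max - TrapHandlerSgprCount : Max;
}
```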

Add rsrc2 trap handler bit check in lit tests.
Moved the trap handler SGPRs calculation into getMaxNumSGPRs
Add a trap operand enum for the time being to replace a constant value 1.

kzhuravl added inline comments.Feb 8 2017, 1:14 PM

lib/Target/AMDGPU/SIRegisterInfo.cpp
1166 ↗	(On Diff #87529)	Can we drop waves-per-eu calculations from this patch? We need to change a few things in the math, which can be done as a separate patch.

Drop waves-per-eu calculations.

arsenm added inline comments.Feb 8 2017, 1:56 PM

test/CodeGen/AMDGPU/trap.ll
1–15	Missing GCN check prefix
12	You don't need a check prefix for NO-RSRC2-BIT. That is too specific, the checks should be based on the environment type/options
25	This is just the comment printed. You should check the actual register config register value too

Address code reviews.

tony-tye added inline comments.Feb 8 2017, 5:16 PM

lib/Target/AMDGPU/AMDGPUSubtarget.h
74–82	After discussion with the HSA runtime team it was decided to use separate S_TRAP codes for each purpose. The current codes would be: enum TrapCode { TrapCodeBreakpoint = 0, TrapCodeLLVMTrap = 1, TrapCodeLLVMDebugTrap = 2, TrapCodeHSADebugTrap = 3 }; These need documenting in the AMDGPU LLVM Spec page. For now we do not have an intrinsic for the HSA debugtrap (which takes an argument) so remove the TrapRegValues enum.
lib/Target/AMDGPU/SIISelLowering.cpp
1796–1797	Delete this as the traps generated for llvm.trap and llvm.debugtrap will not pass in any value.

tony-tye added inline comments.Feb 8 2017, 5:17 PM

lib/Target/AMDGPU/SIISelLowering.cpp
1811	Delete this as the traps generated for llvm.trap and llvm.debugtrap will not pass in any value.

tony-tye added inline comments.Feb 8 2017, 5:23 PM

lib/Target/AMDGPU/AMDGPUSubtarget.h
74–82	The TRAP handler codes can be defined in a new section of: http://llvm.org/docs/AMDGPUUsage.html

Added enum for better HSA interface and removed mov instruction for V0 as llvm.trap and llvm.debugtrap will not pass in any value.

kzhuravl mentioned this in D29741: [AMDGPU] Calculate number of min/max SGPRs/VGPRs for WavesPerEU instead of using switch statement.Feb 8 2017, 10:46 PM

Upload correct diff file.

tony-tye added inline comments.Feb 9 2017, 8:15 AM

lib/Target/AMDGPU/SIISelLowering.cpp
1820	Since llvm.debugtrap is not defined as NORETURN then it seems after a warning it should be lowered to a S_NOP not an S_ENDPGM. llvm.trap is defined as NORETURN so generating S_ENDPGM seems the right choice when there is no trap handler.
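Putting this comment together with the trap codes listed earlier in the thread, the lowering decision can be sketched as a small standalone function. `Op`, `Lowered`, and `lowerTrap` are hypothetical names, and `std::abort` stands in for `llvm_unreachable`; this is a sketch of the decision table discussed in the review, not the committed implementation.

```cpp
#include <cstdlib>

// Trap codes as listed earlier in this review.
enum TrapCode {
  TrapCodeBreakpoint = 0,
  TrapCodeLLVMTrap = 1,
  TrapCodeLLVMDebugTrap = 2,
  TrapCodeHSADebugTrap = 3
};

// Hypothetical result type: which instruction is emitted, plus the
// immediate for s_trap (-1 when the instruction takes no trap code).
enum class Op { STrap, SEndpgm, SNop };
struct Lowered {
  Op Opcode;
  int Imm;
};

Lowered lowerTrap(TrapCode Code, bool TrapHandlerEnabled) {
  switch (Code) {
  case TrapCodeLLVMTrap:
    // llvm.trap is noreturn, so ending the program is a valid fallback.
    if (TrapHandlerEnabled)
      return {Op::STrap, static_cast<int>(Code)};
    return {Op::SEndpgm, -1};
  case TrapCodeLLVMDebugTrap:
    // llvm.debugtrap may return, so the fallback must not end the program.
    if (TrapHandlerEnabled)
      return {Op::STrap, static_cast<int>(Code)};
    return {Op::SNop, -1};
  default:
    std::abort(); // Stand-in for llvm_unreachable on unexpected codes.
  }
}
```

With a trap handler the two intrinsics differ only in the immediate (1 versus 2); without one they differ in whether execution continues.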

Add S_NOP.

Upload full correct diff.

tony-tye added inline comments.Feb 9 2017, 9:25 AM

lib/Target/AMDGPU/SIISelLowering.cpp
1811	Since TrapType is an enum type should this be a switch with a default: that asserts? Currently the code will silently ignore any new trap types that are added (an assert would avoid that).

Address code reviews.

arsenm added inline comments.Feb 9 2017, 12:10 PM

lib/Target/AMDGPU/AMDGPUSubtarget.cpp
87	Sort
lib/Target/AMDGPU/AMDGPUSubtarget.h
106	Can you sort this later, after EnableXNACK? I think the unaligned options should stay together
lib/Target/AMDGPU/SIISelLowering.cpp
1813–1815	Asserting !the switch case in the switch case is ugly. There's no reason to special case this, just put unreachable in the default case
1826	addImm on next line. We don't really need to emit anything here though
1829–1832	Just let it default
1834	llvm_unreachable

tony-tye added inline comments.Feb 9 2017, 12:45 PM

lib/Target/AMDGPU/SIISelLowering.cpp
1826	Discussed with @arsenm and leaving as a NOP does give some flexibility as a tool may want to patch these points. Since llvm.debugtrap is not being generally used it does not hurt to have a NOP.

Address code reviews.

Discussed with @tony-tye: keep only the 2 cases that can occur for the pseudo trap, llvm.trap and llvm.debugtrap, in the switch block.

LGTM except for minor issues

lib/Target/AMDGPU/SIISelLowering.cpp
1828	Remove dead line
lib/Target/AMDGPU/SIInstructions.td
392	Space after :
402	Ditto

Closed by commit rL294692: AMDGPU : Add trap handler support. (authored by wdng). Feb 9 2017, 6:27 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

lib/

Target/

AMDGPU/

AMDGPU.td

6 lines

AMDGPUAnnotateKernelFeatures.cpp

3 lines

4 lines

25 lines

3 lines

4 lines

67 lines

5 lines

12 lines

Utils/

AMDKernelCodeTInfo.h

2 lines

test/

CodeGen/

AMDGPU/

fneg-fabs.f16.ll

6 lines

trap.ll

77 lines

Diff 87848

lib/Target/AMDGPU/AMDGPU.td

  >;

  def FeatureUnalignedBufferAccess : SubtargetFeature<"unaligned-buffer-access",
    "UnalignedBufferAccess",
    "true",
    "Support unaligned global loads and stores"
  >;

+ def FeatureTrapHandler: SubtargetFeature<"trap-handler",
+   "TrapHandler",
+   "true",
+   "Trap handler support"
+ >;
+
  def FeatureUnalignedScratchAccess : SubtargetFeature<"unaligned-scratch-access",
    "UnalignedScratchAccess",
    "true",
    "Support unaligned scratch loads and stores"
  >;

  // XNACK is disabled if SH_MEM_CONFIG.ADDRESS_MODE = GPUVM on chips that support
  // XNACK. The current default kernel driver setting is:

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp

  static const StringRef IntrinsicToAttr[][2] = {
    // .x omitted
    { "llvm.r600.read.tidig.y", "amdgpu-work-item-id-y" },
    { "llvm.r600.read.tidig.z", "amdgpu-work-item-id-z" }
  };

  static const StringRef HSAIntrinsicToAttr[][2] = {
    { "llvm.amdgcn.dispatch.ptr", "amdgpu-dispatch-ptr" },
    { "llvm.amdgcn.queue.ptr", "amdgpu-queue-ptr" },
    { "llvm.amdgcn.dispatch.id", "amdgpu-dispatch-id" },
-   { "llvm.trap", "amdgpu-queue-ptr" }
+   { "llvm.trap", "amdgpu-queue-ptr" },
		wdng (Author, Not Done): Matt: "This also needs to handle debug.trap(), and the case when unreachable’s are turned into traps. There should also be tests specifically for the annotator."
		wdng (Author, Not Done): How should we handle debug.trap()? For the case of unreachable's, should we invoke debug.trap()?
		arsenm (Not Done): There is a separate llvm.debugtrap intrinsic. I looked around SelectionDAG for other places that introduce ISD::TRAP. The expansion for ISD::DEBUGTRAP creates a TRAP. unreachable instructions also emit a trap if DAG.getTarget().Options.TrapUnreachable is enabled.
		wdng (Author, Not Done): Yes, ISD::DEBUGTRAP has been replaced with ISD::TRAP at the legalization phase. So, do we still need to map the llvm.debugtrap intrinsic to amdgpu-queue-ptr?
		arsenm (Not Done): Yes. You should add a test which uses this. As it is, this will fail because the queue ptr won't be enabled.
		wdng (Author, Not Done): Whether the queue ptr is enabled depends on whether we have used "-mtriple=amdgcn--amdhsa" or not. Since ISD::DEBUGTRAP has been replaced with ISD::TRAP, I don't think the test will fail if we compile llvm.debugtrap with "-mtriple=amdgcn--amdhsa", correct?
		arsenm (Not Done): That occurs in the DAG. Here you are trying to identify all the situations where ISD::TRAP will be emitted from the IR. Eventually I want to replace how the ABI is lowered so we don't need to do this anymore, but for now you must predict all of the traps.
		{ "llvm.debugtrap", "amdgpu-queue-ptr" }
};		};

// TODO: We should not add the attributes if the known compile time workgroup		// TODO: We should not add the attributes if the known compile time workgroup
// size is 1 for y/z.		// size is 1 for y/z.

// TODO: Intrinsics that require queue ptr.		// TODO: Intrinsics that require queue ptr.

// We do not need to note the x workitem or workgroup id because they are		// We do not need to note the x workitem or workgroup id because they are
Show All 21 Lines
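
The review thread above boils down to a name-to-attribute lookup: every IR intrinsic that can end up as ISD::TRAP must map to "amdgpu-queue-ptr". A minimal standalone sketch of that lookup follows. The table entries are copied from the diff; the helper `attrForIntrinsic` is hypothetical and is not the pass's actual code.

```cpp
#include <cassert>
#include <cstring>

// Entries copied from HSAIntrinsicToAttr in the diff above.
static const char *const HSAIntrinsicToAttr[][2] = {
    {"llvm.amdgcn.dispatch.ptr", "amdgpu-dispatch-ptr"},
    {"llvm.amdgcn.queue.ptr", "amdgpu-queue-ptr"},
    {"llvm.amdgcn.dispatch.id", "amdgpu-dispatch-id"},
    {"llvm.trap", "amdgpu-queue-ptr"},
    {"llvm.debugtrap", "amdgpu-queue-ptr"},
};

// Hypothetical helper: return the attribute required by an intrinsic name,
// or nullptr if the intrinsic is not in the HSA table.
inline const char *attrForIntrinsic(const char *Name) {
  for (const auto &Entry : HSAIntrinsicToAttr)
    if (std::strcmp(Entry[0], Name) == 0)
      return Entry[1];
  return nullptr;
}
```

A name table like this cannot see the case arsenm raises, where `unreachable` is turned into a trap under `TrapUnreachable`, because no intrinsic call appears in the IR for it.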

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

Show First 20 Lines • Show All 241 Lines • ▼ Show 20 Lines	if (STM.getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS) {
Twine(KernelInfo.DebuggerWavefrontPrivateSegmentOffsetSGPR), false);		Twine(KernelInfo.DebuggerWavefrontPrivateSegmentOffsetSGPR), false);
OutStreamer->emitRawComment(" DebuggerPrivateSegmentBufferSGPR: s" +		OutStreamer->emitRawComment(" DebuggerPrivateSegmentBufferSGPR: s" +
Twine(KernelInfo.DebuggerPrivateSegmentBufferSGPR), false);		Twine(KernelInfo.DebuggerPrivateSegmentBufferSGPR), false);
}		}

OutStreamer->emitRawComment(" COMPUTE_PGM_RSRC2:USER_SGPR: " +		OutStreamer->emitRawComment(" COMPUTE_PGM_RSRC2:USER_SGPR: " +
Twine(G_00B84C_USER_SGPR(KernelInfo.ComputePGMRSrc2)),		Twine(G_00B84C_USER_SGPR(KernelInfo.ComputePGMRSrc2)),
false);		false);
		OutStreamer->emitRawComment(" COMPUTE_PGM_RSRC2:TRAP_HANDLER: " +
		Twine(G_00B84C_TRAP_HANDLER(KernelInfo.ComputePGMRSrc2)),
		false);
OutStreamer->emitRawComment(" COMPUTE_PGM_RSRC2:TGID_X_EN: " +		OutStreamer->emitRawComment(" COMPUTE_PGM_RSRC2:TGID_X_EN: " +
Twine(G_00B84C_TGID_X_EN(KernelInfo.ComputePGMRSrc2)),		Twine(G_00B84C_TGID_X_EN(KernelInfo.ComputePGMRSrc2)),
false);		false);
OutStreamer->emitRawComment(" COMPUTE_PGM_RSRC2:TGID_Y_EN: " +		OutStreamer->emitRawComment(" COMPUTE_PGM_RSRC2:TGID_Y_EN: " +
Twine(G_00B84C_TGID_Y_EN(KernelInfo.ComputePGMRSrc2)),		Twine(G_00B84C_TGID_Y_EN(KernelInfo.ComputePGMRSrc2)),
false);		false);
OutStreamer->emitRawComment(" COMPUTE_PGM_RSRC2:TGID_Z_EN: " +		OutStreamer->emitRawComment(" COMPUTE_PGM_RSRC2:TGID_Z_EN: " +
Twine(G_00B84C_TGID_Z_EN(KernelInfo.ComputePGMRSrc2)),		Twine(G_00B84C_TGID_Z_EN(KernelInfo.ComputePGMRSrc2)),
▲ Show 20 Lines • Show All 213 Lines • ▼ Show 20 Lines	if (STM.getGeneration() < SISubtarget::VOLCANIC_ISLANDS) {
if (FlatUsed)		if (FlatUsed)
ExtraSGPRs = 4;		ExtraSGPRs = 4;
} else {		} else {
if (STM.isXNACKEnabled())		if (STM.isXNACKEnabled())
ExtraSGPRs = 4;		ExtraSGPRs = 4;

if (FlatUsed)		if (FlatUsed)
ExtraSGPRs = 6;		ExtraSGPRs = 6;
}		}

// Record first reserved register and reserved register count fields, and		// Record first reserved register and reserved register count fields, and
		tony-tye (Not Done): I do not think enabling the trap handler reduces the number of available SGPRs. The hardware allocates and provides access to these "in addition" to the regular SGPRs. If this is reserving registers then it should be deleted.
// update max register counts if "amdgpu-debugger-reserve-regs" attribute was		// update max register counts if "amdgpu-debugger-reserve-regs" attribute was
// requested.		// requested.
ProgInfo.ReservedVGPRFirst = STM.debuggerReserveRegs() ? MaxVGPR + 1 : 0;		ProgInfo.ReservedVGPRFirst = STM.debuggerReserveRegs() ? MaxVGPR + 1 : 0;
ProgInfo.ReservedVGPRCount = RI->getNumDebuggerReservedVGPRs(STM);		ProgInfo.ReservedVGPRCount = RI->getNumDebuggerReservedVGPRs(STM);

// Update DebuggerWavefrontPrivateSegmentOffsetSGPR and		// Update DebuggerWavefrontPrivateSegmentOffsetSGPR and
// DebuggerPrivateSegmentBufferSGPR fields if "amdgpu-debugger-emit-prologue"		// DebuggerPrivateSegmentBufferSGPR fields if "amdgpu-debugger-emit-prologue"
// attribute was requested.		// attribute was requested.
▲ Show 20 Lines • Show All 140 Lines • ▼ Show 20 Lines	void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
if (MFI->hasWorkItemIDZ())		if (MFI->hasWorkItemIDZ())
TIDIGCompCnt = 2;		TIDIGCompCnt = 2;
else if (MFI->hasWorkItemIDY())		else if (MFI->hasWorkItemIDY())
TIDIGCompCnt = 1;		TIDIGCompCnt = 1;

ProgInfo.ComputePGMRSrc2 =		ProgInfo.ComputePGMRSrc2 =
S_00B84C_SCRATCH_EN(ProgInfo.ScratchBlocks > 0) |		S_00B84C_SCRATCH_EN(ProgInfo.ScratchBlocks > 0) |
S_00B84C_USER_SGPR(MFI->getNumUserSGPRs()) |		S_00B84C_USER_SGPR(MFI->getNumUserSGPRs()) |
		S_00B84C_TRAP_HANDLER(STM.isTrapHandlerEnabled()) |
		tony-tye (Done): Should this be querying the TrapHandlerAbi using getTrapHandlerAbi() from the subtarget and setting to TRUE if not equal to TrapHandlerAbiNone?
S_00B84C_TGID_X_EN(MFI->hasWorkGroupIDX()) |		S_00B84C_TGID_X_EN(MFI->hasWorkGroupIDX()) |
S_00B84C_TGID_Y_EN(MFI->hasWorkGroupIDY()) |		S_00B84C_TGID_Y_EN(MFI->hasWorkGroupIDY()) |
S_00B84C_TGID_Z_EN(MFI->hasWorkGroupIDZ()) |		S_00B84C_TGID_Z_EN(MFI->hasWorkGroupIDZ()) |
S_00B84C_TG_SIZE_EN(MFI->hasWorkGroupInfo()) |		S_00B84C_TG_SIZE_EN(MFI->hasWorkGroupInfo()) |
S_00B84C_TIDIG_COMP_CNT(TIDIGCompCnt) |		S_00B84C_TIDIG_COMP_CNT(TIDIGCompCnt) |
S_00B84C_EXCP_EN_MSB(0) |		S_00B84C_EXCP_EN_MSB(0) |
S_00B84C_LDS_SIZE(ProgInfo.LDSBlocks) |		S_00B84C_LDS_SIZE(ProgInfo.LDSBlocks) |
S_00B84C_EXCP_EN(0);		S_00B84C_EXCP_EN(0);
▲ Show 20 Lines • Show All 183 Lines • Show Last 20 Lines

lib/Target/AMDGPU/AMDGPUSubtarget.h

Show First 20 Lines • Show All 60 Lines • ▼ Show 20 Lines	enum {
ISAVersion8_0_0,		ISAVersion8_0_0,
ISAVersion8_0_1,		ISAVersion8_0_1,
ISAVersion8_0_2,		ISAVersion8_0_2,
ISAVersion8_0_3,		ISAVersion8_0_3,
ISAVersion8_0_4,		ISAVersion8_0_4,
ISAVersion8_1_0,		ISAVersion8_1_0,
};		};

		enum TrapHandlerAbi {
		TrapHandlerAbiNone = 0,
		TrapHandlerAbiHsa = 1
		};

		tony-tye (Not Done): Add enumeration for S_TRAP codes: enum TrapCode { TrapCodeLLVMTrap = 1, TrapCodeLLVMDebugTrap = 2 };
		enum TrapCode {
		TrapCodeBreakPoint = 0,
		TrapCodeLLVMTrap = 1,
		TrapCodeLLVMDebugTrap = 2,
		TrapCodeHSADebugTrap = 3
		};

		enum TrapRegValues {
		TrapCodeLLVMTrapRegValue = 1
		tony-tye (Not Done): After discussion with the HSA runtime team it was decided to use separate S_TRAP codes for each purpose. The current codes would be: enum TrapCode { TrapCodeBreakpoint = 0, TrapCodeLLVMTrap = 1, TrapCodeLLVMDebugTrap = 2, TrapCodeHSADebugTrap = 3 }; These need documenting in the AMDGPU LLVM Spec page. For now we do not have an intrinsic for the HSA debugtrap (which takes an argument) so remove the TrapRegValues enum.
		tony-tye (Not Done): The TRAP handler codes can be defined in a new section of: http://llvm.org/docs/AMDGPUUsage.html
		};

protected:		protected:
// Basic subtarget description.		// Basic subtarget description.
Triple TargetTriple;		Triple TargetTriple;
Generation Gen;		Generation Gen;
unsigned IsaVersion;		unsigned IsaVersion;
unsigned WavefrontSize;		unsigned WavefrontSize;
int LocalMemorySize;		int LocalMemorySize;
int LDSBankCount;		int LDSBankCount;
unsigned MaxPrivateElementSize;		unsigned MaxPrivateElementSize;

// Possibly statically set by tablegen, but may want to be overridden.		// Possibly statically set by tablegen, but may want to be overridden.
bool FastFMAF32;		bool FastFMAF32;
bool HalfRate64Ops;		bool HalfRate64Ops;

// Dynamially set bits that enable features.		// Dynamially set bits that enable features.
bool FP32Denormals;		bool FP32Denormals;
bool FP64FP16Denormals;		bool FP64FP16Denormals;
bool FPExceptions;		bool FPExceptions;
bool FlatForGlobal;		bool FlatForGlobal;
bool NoAddr64;		bool NoAddr64;
bool UnalignedScratchAccess;		bool UnalignedScratchAccess;
		bool TrapHandler;
		arsenm (Not Done): Can you sort this later until after EnableXNACK? I think the unaligned options should stay together
bool UnalignedBufferAccess;		bool UnalignedBufferAccess;
bool EnableXNACK;		bool EnableXNACK;
bool DebuggerInsertNops;		bool DebuggerInsertNops;
bool DebuggerReserveRegs;		bool DebuggerReserveRegs;
bool DebuggerEmitPrologue;		bool DebuggerEmitPrologue;

		arsenm (Done): Comment unnecessary, also nothing HSA specific here
// Used as options.		// Used as options.
		tony-tye (Done): Should this be deleted now as there is getTrapHandlerAbi() instead?
bool EnableVGPRSpilling;		bool EnableVGPRSpilling;
bool EnablePromoteAlloca;		bool EnablePromoteAlloca;
bool EnableLoadStoreOpt;		bool EnableLoadStoreOpt;
bool EnableUnsafeDSOffsetFolding;		bool EnableUnsafeDSOffsetFolding;
bool EnableSIScheduler;		bool EnableSIScheduler;
bool DumpCode;		bool DumpCode;

// Subtarget statically properties set by tablegen		// Subtarget statically properties set by tablegen
▲ Show 20 Lines • Show All 147 Lines • ▼ Show 20 Lines	public:
bool hasBORROW() const {		bool hasBORROW() const {
return (getGeneration() >= EVERGREEN);		return (getGeneration() >= EVERGREEN);
}		}

bool hasCaymanISA() const {		bool hasCaymanISA() const {
return CaymanISA;		return CaymanISA;
}		}

		TrapHandlerAbi getTrapHandlerAbi() const {
		arsenm (Done): Return enum type
		return isAmdHsaOS() ? TrapHandlerAbiHsa : TrapHandlerAbiNone;
		}

bool isPromoteAllocaEnabled() const {		bool isPromoteAllocaEnabled() const {
		arsenm (Not Done): Still using return after else. Should be return ternary operator
return EnablePromoteAlloca;		return EnablePromoteAlloca;
		arsenm (Done): Badly formatted. return ternary operator
}		}

bool unsafeDSOffsetFoldingEnabled() const {		bool unsafeDSOffsetFoldingEnabled() const {
return EnableUnsafeDSOffsetFolding;		return EnableUnsafeDSOffsetFolding;
}		}

bool dumpCode() const {		bool dumpCode() const {
return DumpCode;		return DumpCode;
Show All 34 Lines	public:
bool hasUnalignedBufferAccess() const {		bool hasUnalignedBufferAccess() const {
return UnalignedBufferAccess;		return UnalignedBufferAccess;
}		}

bool hasUnalignedScratchAccess() const {		bool hasUnalignedScratchAccess() const {
return UnalignedScratchAccess;		return UnalignedScratchAccess;
}		}

		bool isTrapHandlerEnabled() const {
		return TrapHandler;
		}

bool isXNACKEnabled() const {		bool isXNACKEnabled() const {
return EnableXNACK;		return EnableXNACK;
}		}

bool isMesaKernel(const MachineFunction &MF) const {		bool isMesaKernel(const MachineFunction &MF) const {
return isMesa3DOS() && !AMDGPU::isShader(MF.getFunction()->getCallingConv());		return isMesa3DOS() && !AMDGPU::isShader(MF.getFunction()->getCallingConv());
		tony-tye (Not Done): In order to allow additional trap handler ABIs this should be: getTrapHandlerAbi() != TrapHandlerAbiNone
}		}
		arsenm (Not Done): This should be a subtarget feature. Returning this is still a constant based purely on the triple. See FeatureUnalignedBufferAccess for an example
		tony-tye (Not Done): Could you explain this more? The trap handler being present is determined by the environment part of the triple. Currently only the HSA environment uses a trap handler.

// Covers VS/PS/CS graphics shaders		// Covers VS/PS/CS graphics shaders
bool isMesaGfxShader(const MachineFunction &MF) const {		bool isMesaGfxShader(const MachineFunction &MF) const {
return isMesa3DOS() && AMDGPU::isShader(MF.getFunction()->getCallingConv());		return isMesa3DOS() && AMDGPU::isShader(MF.getFunction()->getCallingConv());
}		}

bool isAmdCodeObjectV2(const MachineFunction &MF) const {		bool isAmdCodeObjectV2(const MachineFunction &MF) const {
return isAmdHsaOS() || isMesaKernel(MF);		return isAmdHsaOS() || isMesaKernel(MF);
▲ Show 20 Lines • Show All 157 Lines • ▼ Show 20 Lines
};		};

class SISubtarget final : public AMDGPUSubtarget {		class SISubtarget final : public AMDGPUSubtarget {
public:		public:
enum {		enum {
// The closed Vulkan driver sets 96, which limits the wave count to 8 but		// The closed Vulkan driver sets 96, which limits the wave count to 8 but
// doesn't spill SGPRs as much as when 80 is set.		// doesn't spill SGPRs as much as when 80 is set.
FIXED_SGPR_COUNT_FOR_INIT_BUG = 96		FIXED_SGPR_COUNT_FOR_INIT_BUG = 96
};		};
		tony-tye (Not Done): I do not see this being used anywhere. I think it should be used in the waves_per_eu calculation.

private:		private:
SIInstrInfo InstrInfo;		SIInstrInfo InstrInfo;
SIFrameLowering FrameLowering;		SIFrameLowering FrameLowering;
SITargetLowering TLInfo;		SITargetLowering TLInfo;
std::unique_ptr<GISelAccessor> GISel;		std::unique_ptr<GISelAccessor> GISel;

public:		public:
▲ Show 20 Lines • Show All 131 Lines • Show Last 20 Lines

lib/Target/AMDGPU/AMDGPUSubtarget.cpp

Show All 37 Lines	AMDGPUSubtarget::initializeSubtargetDependencies(const Triple &TT,
// double precision rate, so don't enable by default.		// double precision rate, so don't enable by default.
//		//
// We want to be able to turn these off, but making this a subtarget feature		// We want to be able to turn these off, but making this a subtarget feature
// for SI has the unhelpful behavior that it unsets everything else if you		// for SI has the unhelpful behavior that it unsets everything else if you
// disable it.		// disable it.

SmallString<256> FullFS("+promote-alloca,+fp64-fp16-denormals,+load-store-opt,");		SmallString<256> FullFS("+promote-alloca,+fp64-fp16-denormals,+load-store-opt,");
if (isAmdHsaOS()) // Turn on FlatForGlobal for HSA.		if (isAmdHsaOS()) // Turn on FlatForGlobal for HSA.
FullFS += "+flat-for-global,+unaligned-buffer-access,";		FullFS += "+flat-for-global,+unaligned-buffer-access,+trap-handler,";

FullFS += FS;		FullFS += FS;

ParseSubtargetFeatures(GPU, FullFS);		ParseSubtargetFeatures(GPU, FullFS);

// FIXME: I don't think think Evergreen has any useful support for		// FIXME: I don't think think Evergreen has any useful support for
// denormals, but should be checked. Should we issue a warning somewhere		// denormals, but should be checked. Should we issue a warning somewhere
// if someone tries to enable these?		// if someone tries to enable these?
Show All 24 Lines	: AMDGPUGenSubtargetInfo(TT, GPU, FS),
HalfRate64Ops(false),		HalfRate64Ops(false),

FP32Denormals(false),		FP32Denormals(false),
FP64FP16Denormals(false),		FP64FP16Denormals(false),
FPExceptions(false),		FPExceptions(false),
FlatForGlobal(false),		FlatForGlobal(false),
NoAddr64(false),		NoAddr64(false),
UnalignedScratchAccess(false),		UnalignedScratchAccess(false),
		TrapHandler(false),
		arsenm (Not Done): Sort
UnalignedBufferAccess(false),		UnalignedBufferAccess(false),

EnableXNACK(false),		EnableXNACK(false),
DebuggerInsertNops(false),		DebuggerInsertNops(false),
DebuggerReserveRegs(false),		DebuggerReserveRegs(false),
DebuggerEmitPrologue(false),		DebuggerEmitPrologue(false),

EnableVGPRSpilling(false),		EnableVGPRSpilling(false),
▲ Show 20 Lines • Show All 271 Lines • Show Last 20 Lines
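
The change to `FullFS` above turns the trap handler on by default for HSA while still letting the user's feature string switch it off. The sketch below models that ordering under a loudly stated assumption: later entries in the feature string override earlier ones. `buildFullFS` and `parseFeatures` are hypothetical stand-ins, not LLVM's `ParseSubtargetFeatures`, and the sketch ignores implied features.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical model of how the diff assembles the feature string:
// built-in defaults first, then the user-supplied string FS.
inline std::string buildFullFS(bool IsAmdHsaOS, const std::string &UserFS) {
  std::string FullFS = "+promote-alloca,+fp64-fp16-denormals,+load-store-opt,";
  if (IsAmdHsaOS)
    FullFS += "+flat-for-global,+unaligned-buffer-access,+trap-handler,";
  FullFS += UserFS;
  return FullFS;
}

// Simplified parser (assumption: last occurrence of a feature wins).
inline std::map<std::string, bool> parseFeatures(const std::string &FS) {
  std::map<std::string, bool> Enabled;
  std::stringstream SS(FS);
  std::string Tok;
  while (std::getline(SS, Tok, ',')) {
    if (Tok.size() < 2 || (Tok[0] != '+' && Tok[0] != '-'))
      continue; // skip empty or malformed entries
    Enabled[Tok.substr(1)] = (Tok[0] == '+');
  }
  return Enabled;
}
```

Under that assumption, passing `-trap-handler` as the user string on an HSA target leaves the feature disabled, and a non-HSA target never sees the feature unless the user adds it.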

lib/Target/AMDGPU/SIDefines.h

	Show First 20 Lines • Show All 294 Lines • ▼ Show 20 Lines

	#define R_00B84C_COMPUTE_PGM_RSRC2 0x00B84C			#define R_00B84C_COMPUTE_PGM_RSRC2 0x00B84C
	#define S_00B84C_SCRATCH_EN(x) (((x) & 0x1) << 0)			#define S_00B84C_SCRATCH_EN(x) (((x) & 0x1) << 0)
	#define G_00B84C_SCRATCH_EN(x) (((x) >> 0) & 0x1)			#define G_00B84C_SCRATCH_EN(x) (((x) >> 0) & 0x1)
	#define C_00B84C_SCRATCH_EN 0xFFFFFFFE			#define C_00B84C_SCRATCH_EN 0xFFFFFFFE
	#define S_00B84C_USER_SGPR(x) (((x) & 0x1F) << 1)			#define S_00B84C_USER_SGPR(x) (((x) & 0x1F) << 1)
	#define G_00B84C_USER_SGPR(x) (((x) >> 1) & 0x1F)			#define G_00B84C_USER_SGPR(x) (((x) >> 1) & 0x1F)
	#define C_00B84C_USER_SGPR 0xFFFFFFC1			#define C_00B84C_USER_SGPR 0xFFFFFFC1
				#define S_00B84C_TRAP_HANDLER(x) (((x) & 0x1) << 6)
				#define G_00B84C_TRAP_HANDLER(x) (((x) >> 6) & 0x1)
				#define C_00B84C_TRAP_HANDLER 0xFFFFFFBF
	#define S_00B84C_TGID_X_EN(x) (((x) & 0x1) << 7)			#define S_00B84C_TGID_X_EN(x) (((x) & 0x1) << 7)
	#define G_00B84C_TGID_X_EN(x) (((x) >> 7) & 0x1)			#define G_00B84C_TGID_X_EN(x) (((x) >> 7) & 0x1)
	#define C_00B84C_TGID_X_EN 0xFFFFFF7F			#define C_00B84C_TGID_X_EN 0xFFFFFF7F
	#define S_00B84C_TGID_Y_EN(x) (((x) & 0x1) << 8)			#define S_00B84C_TGID_Y_EN(x) (((x) & 0x1) << 8)
	#define G_00B84C_TGID_Y_EN(x) (((x) >> 8) & 0x1)			#define G_00B84C_TGID_Y_EN(x) (((x) >> 8) & 0x1)
	#define C_00B84C_TGID_Y_EN 0xFFFFFEFF			#define C_00B84C_TGID_Y_EN 0xFFFFFEFF
	#define S_00B84C_TGID_Z_EN(x) (((x) & 0x1) << 9)			#define S_00B84C_TGID_Z_EN(x) (((x) & 0x1) << 9)
	#define G_00B84C_TGID_Z_EN(x) (((x) >> 9) & 0x1)			#define G_00B84C_TGID_Z_EN(x) (((x) >> 9) & 0x1)
	▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines
	#define R_00B860_COMPUTE_TMPRING_SIZE 0x00B860			#define R_00B860_COMPUTE_TMPRING_SIZE 0x00B860
	#define S_00B860_WAVESIZE(x) (((x) & 0x1FFF) << 12)			#define S_00B860_WAVESIZE(x) (((x) & 0x1FFF) << 12)

	#define R_0286E8_SPI_TMPRING_SIZE 0x0286E8			#define R_0286E8_SPI_TMPRING_SIZE 0x0286E8
	#define S_0286E8_WAVESIZE(x) (((x) & 0x1FFF) << 12)			#define S_0286E8_WAVESIZE(x) (((x) & 0x1FFF) << 12)

	#define R_SPILLED_SGPRS 0x4			#define R_SPILLED_SGPRS 0x4
	#define R_SPILLED_VGPRS 0x8			#define R_SPILLED_VGPRS 0x8

	} // End namespace llvm			} // End namespace llvm

	#endif			#endif
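
The new macros follow the file's existing S_/G_/C_ pattern: S_ places a value in its field, G_ extracts it, and C_ is the mask that clears it. The standalone sketch below checks that the TRAP_HANDLER field sits at bit 6, between USER_SGPR (bits 1 to 5) and TGID_X_EN (bit 7). The macros are copied from the diff; `packRsrc2` and `clearTrapHandler` are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Macros copied from the SIDefines.h hunk above.
#define S_00B84C_USER_SGPR(x) (((x) & 0x1F) << 1)
#define G_00B84C_USER_SGPR(x) (((x) >> 1) & 0x1F)
#define S_00B84C_TRAP_HANDLER(x) (((x) & 0x1) << 6)
#define G_00B84C_TRAP_HANDLER(x) (((x) >> 6) & 0x1)
#define C_00B84C_TRAP_HANDLER 0xFFFFFFBF
#define S_00B84C_TGID_X_EN(x) (((x) & 0x1) << 7)
#define G_00B84C_TGID_X_EN(x) (((x) >> 7) & 0x1)

// The clear mask must be the complement of the single bit set by S_.
static_assert(C_00B84C_TRAP_HANDLER == ~(1u << 6),
              "clear mask does not match bit 6");

// Hypothetical helper: pack three of the COMPUTE_PGM_RSRC2 fields.
inline uint32_t packRsrc2(uint32_t UserSGPRs, bool TrapHandler, bool TGIDX) {
  return S_00B84C_USER_SGPR(UserSGPRs) |
         S_00B84C_TRAP_HANDLER(static_cast<uint32_t>(TrapHandler)) |
         S_00B84C_TGID_X_EN(static_cast<uint32_t>(TGIDX));
}

// Hypothetical helper: clear only the trap handler bit.
inline uint32_t clearTrapHandler(uint32_t Rsrc2) {
  return Rsrc2 & C_00B84C_TRAP_HANDLER;
}
```

The same G_ macro is what the AsmPrinter hunk uses to print the `COMPUTE_PGM_RSRC2:TRAP_HANDLER` comment.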

lib/Target/AMDGPU/SIISelLowering.cpp

Show First 20 Lines • Show All 267 Lines • ▼ Show 20 Lines	SITargetLowering::SITargetLowering(const TargetMachine &TM,
}		}

setOperationAction(ISD::BSWAP, MVT::i32, Legal);		setOperationAction(ISD::BSWAP, MVT::i32, Legal);
setOperationAction(ISD::BITREVERSE, MVT::i32, Legal);		setOperationAction(ISD::BITREVERSE, MVT::i32, Legal);

// On SI this is s_memtime and s_memrealtime on VI.		// On SI this is s_memtime and s_memrealtime on VI.
setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Legal);		setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Legal);
setOperationAction(ISD::TRAP, MVT::Other, Legal);		setOperationAction(ISD::TRAP, MVT::Other, Legal);
		setOperationAction(ISD::DEBUGTRAP, MVT::Other, Legal);
		tony-tye (Done): In discussion with @kzhuravl the suggestion was to use a subtarget query to determine the trap abi. Currently there are two ABIs, one for HSA and one for graphics, but in the future there could be others.
		arsenm (Done): DEBUGTRAP should also be legal, you are handling the differences in the selection pattern and custom inserter
		wdng (Author, Done): Yes, I did set DEBUGTRAP to Legal for next pattern matching at isel.

setOperationAction(ISD::FMINNUM, MVT::f64, Legal);		setOperationAction(ISD::FMINNUM, MVT::f64, Legal);
setOperationAction(ISD::FMAXNUM, MVT::f64, Legal);		setOperationAction(ISD::FMAXNUM, MVT::f64, Legal);
		arsenm (Done): This does not need an else
		wdng (Author, Done): I think we need to leave the else here. I assumed the default setting is Legal, but it looks like we need to specify it explicitly, otherwise it will throw an error.
		tony-tye (Done): The else implements the non-HSA expansion that generates an s_endpgm, which is used for graphics.
		wdng (Author, Done): Yes, so I think we need to keep the else part.
		kzhuravl (Done): If we go with the approach in my latest comment, custom lowering won't be needed.

if (Subtarget->getGeneration() >= SISubtarget::SEA_ISLANDS) {		if (Subtarget->getGeneration() >= SISubtarget::SEA_ISLANDS) {
setOperationAction(ISD::FTRUNC, MVT::f64, Legal);		setOperationAction(ISD::FTRUNC, MVT::f64, Legal);
setOperationAction(ISD::FCEIL, MVT::f64, Legal);		setOperationAction(ISD::FCEIL, MVT::f64, Legal);
setOperationAction(ISD::FRINT, MVT::f64, Legal);		setOperationAction(ISD::FRINT, MVT::f64, Legal);
}		}

setOperationAction(ISD::FFLOOR, MVT::f64, Legal);		setOperationAction(ISD::FFLOOR, MVT::f64, Legal);
▲ Show 20 Lines • Show All 1,493 Lines • ▼ Show 20 Lines	if (TII->isMIMG(MI)) {
if (MI.mayLoad())		if (MI.mayLoad())
Flags |= MachineMemOperand::MOLoad;		Flags |= MachineMemOperand::MOLoad;

auto MMO = MF->getMachineMemOperand(PtrInfo, Flags, 0, 0);		auto MMO = MF->getMachineMemOperand(PtrInfo, Flags, 0, 0);
MI.addMemOperand(*MF, MMO);		MI.addMemOperand(*MF, MMO);
return BB;		return BB;
}		}

switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::S_TRAP_PSEUDO: {		case AMDGPU::S_TRAP_PSEUDO: {
		kzhuravl (Done): If we go with the approach in my latest comment, this should switch on ABI type, for HSA ABI this code will be used. For "none" ABIs `s_endpgm` should be used (without error or warning).
DebugLoc DL = MI.getDebugLoc();		const DebugLoc &DL = MI.getDebugLoc();
BuildMI(*BB, MI, DL, TII->get(AMDGPU::V_MOV_B32_e32), AMDGPU::VGPR0)		const int TrapType = MI.getOperand(0).getImm();
.addImm(1);
		arsenm (Done): Compare to enum value
		wdng (Author, Not Done): Say we enable the trap handler via a subtarget feature; would we need to change the if here?
		if (Subtarget->getTrapHandlerAbi() == SISubtarget::TrapHandlerAbiHsa &&
		arsenm (Done): const DebugLoc &DL
		Subtarget->isTrapHandlerEnabled()) {

MachineFunction *MF = BB->getParent();		MachineFunction *MF = BB->getParent();
		arsenm (Not Done): Should have enum (probably in SIDefines.h) for the values put in
		wdng (Author, Not Done): This is probably not an enum. It looks like the interface with HSA requires us to set v0 to 1: v0 <- 1
		kzhuravl (Not Done): I think defining it in SIDefines.h for now is OK. I am working on some restructuring to shared header files, and will include it in the restructuring.
		tony-tye (Done): Delete this as the traps generated for llvm.trap and llvm.debugtrap will not pass in any value.
SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();
unsigned UserSGPR = Info->getQueuePtrUserSGPR();		unsigned UserSGPR = Info->getQueuePtrUserSGPR();
assert(UserSGPR != AMDGPU::NoRegister);		assert(UserSGPR != AMDGPU::NoRegister);

if (!BB->isLiveIn(UserSGPR))		if (!BB->isLiveIn(UserSGPR))
BB->addLiveIn(UserSGPR);		BB->addLiveIn(UserSGPR);

BuildMI(*BB, MI, DL, TII->get(AMDGPU::COPY), AMDGPU::SGPR0_SGPR1)		BuildMI(*BB, MI, DL, TII->get(AMDGPU::COPY), AMDGPU::SGPR0_SGPR1)
.addReg(UserSGPR);		.addReg(UserSGPR);
BuildMI(*BB, MI, DL, TII->get(AMDGPU::S_TRAP)).addImm(0x1)		BuildMI(*BB, MI, DL, TII->get(AMDGPU::S_TRAP))
		kzhuravl (Not Done): @tony-tye, do we want to use other code for `s_trap` for `llvm.debugtrap`?
		tony-tye (Not Done): Should the 0x1 constant be a named enumeration? Should we use different values for llvm.trap than for llvm.debugtrap? If so the handling of llvm.debugtrap should not be converted to llvm.trap for AMDGPU. Have we checked with the HSA Runtime to see what codes the HSA trap handler is expecting?
		tony-tye (Not Done): I checked with the HSA Runtime and currently the trap code is not being consulted. So I propose adding an enum for the codes being used for s_trap and using the value here. It would be better to use different codes for llvm.trap and llvm.debugtrap. Would that require S_TRAP_PSEUDO to have an immediate operand that is set to the trap code used in the S_TRAP instruction?
.addReg(AMDGPU::VGPR0, RegState::Implicit)		.addImm(TrapType)
.addReg(AMDGPU::SGPR0_SGPR1, RegState::Implicit);		.addReg(AMDGPU::SGPR0_SGPR1, RegState::Implicit);
		arsenm (Done): Formatting of previous code needs to be fixed
		} else {
		switch (TrapType) {
		arsenm (Done): Else on previous line. Instead of else you can early return on the error path
		tony-tye (Done): Delete this as the traps generated for llvm.trap and llvm.debugtrap will not pass in any value.
		tony-tye (Not Done): Since TrapType is an enum type should this be a switch with a default: that asserts? Currently the code will silently ignore any new trap types that are added (an assert would avoid that).
		case SISubtarget::TrapCodeBreakPoint:
		assert(TrapType != SISubtarget::TrapCodeBreakPoint &&
		"TrapCodeBreakPoint is not supported!");
		break;
		arsenm (Done): Asserting !the switch case in the switch case is ugly. There's no reason to special case this, just put unreachable in the default case
		case SISubtarget::TrapCodeLLVMTrap:
		BuildMI(*BB, MI, DL, TII->get(AMDGPU::S_ENDPGM));
		break;
		arsenm (Done): Badly formatted. You already have MF and get the Function* above, so you don't need double getParent
		case SISubtarget::TrapCodeLLVMDebugTrap: {
		tony-tye (Done): Should we be emitting any diagnostic here? If there is no trap handler then doesn't that imply that the environment wants an S_ENDPGM to be generated instead? That is not an error, it is that the environment demands as the implementation. For example, that is what the graphics environment wants.
		DiagnosticInfoUnsupported NoTrap(*MF->getFunction(),
		tony-tye (Not Done): Since llvm.debugtrap is not defined as NORETURN then it seems after a warning it should be lowered to a S_NOP not an S_ENDPGM. llvm.trap is defined as NORETURN so generating S_ENDPGM seems the right choice when there is no trap handler.
		"debugtrap handler not supported",
		DL,
		DS_Warning);
		LLVMContext &C = MF->getFunction()->getContext();
		C.diagnose(NoTrap);
		BuildMI(*BB, MI, DL, TII->get(AMDGPU::S_NOP)).addImm(0);
		arsenm (Done): addImm on next line. We don't really need to emit anything here, though.
		tony-tye (Not Done): Discussed with @arsenm, and leaving this as a NOP does give some flexibility, as a tool may want to patch these points. Since llvm.debugtrap is not generally used, it does not hurt to have a NOP.
		break;
		}
		arsenm (Not Done): Remove dead line.
		case SISubtarget::TrapCodeHSADebugTrap:
		assert(TrapType != SISubtarget::TrapCodeHSADebugTrap &&
		"HSA Debug Trap is not supported!");
		break;
		arsenm (Not Done): Just let it default.
		default:
		assert(false && "Unsupported trap handler type!");
		arsenm (Not Done): llvm_unreachable
		break;
		}
		}

MI.eraseFromParent();		MI.eraseFromParent();
return BB;		return BB;
}		}

case AMDGPU::SI_INIT_M0:		case AMDGPU::SI_INIT_M0:
BuildMI(*BB, MI.getIterator(), MI.getDebugLoc(),		BuildMI(*BB, MI.getIterator(), MI.getDebugLoc(),
TII->get(AMDGPU::S_MOV_B32), AMDGPU::M0)		TII->get(AMDGPU::S_MOV_B32), AMDGPU::M0)
.add(MI.getOperand(0));		.add(MI.getOperand(0));
MI.eraseFromParent();		MI.eraseFromParent();
return BB;		return BB;

case AMDGPU::GET_GROUPSTATICSIZE: {		case AMDGPU::GET_GROUPSTATICSIZE: {
DebugLoc DL = MI.getDebugLoc();		DebugLoc DL = MI.getDebugLoc();
BuildMI(*BB, MI, DL, TII->get(AMDGPU::S_MOV_B32))		BuildMI(*BB, MI, DL, TII->get(AMDGPU::S_MOV_B32))
.add(MI.getOperand(0))		.add(MI.getOperand(0))
.addImm(MFI->getLDSSize());		.addImm(MFI->getLDSSize());
MI.eraseFromParent();		MI.eraseFromParent();
return BB;		return BB;
}		}
case AMDGPU::SI_INDIRECT_SRC_V1:		case AMDGPU::SI_INDIRECT_SRC_V1:
		arsenm (Done): Can we have an enum for these values somewhere instead of hard-coding 1?
case AMDGPU::SI_INDIRECT_SRC_V2:		case AMDGPU::SI_INDIRECT_SRC_V2:
case AMDGPU::SI_INDIRECT_SRC_V4:		case AMDGPU::SI_INDIRECT_SRC_V4:
case AMDGPU::SI_INDIRECT_SRC_V8:		case AMDGPU::SI_INDIRECT_SRC_V8:
case AMDGPU::SI_INDIRECT_SRC_V16:		case AMDGPU::SI_INDIRECT_SRC_V16:
return emitIndirectSrc(MI, BB, getSubtarget());		return emitIndirectSrc(MI, BB, getSubtarget());
case AMDGPU::SI_INDIRECT_DST_V1:		case AMDGPU::SI_INDIRECT_DST_V1:
case AMDGPU::SI_INDIRECT_DST_V2:		case AMDGPU::SI_INDIRECT_DST_V2:
case AMDGPU::SI_INDIRECT_DST_V4:		case AMDGPU::SI_INDIRECT_DST_V4:
case AMDGPU::SI_INDIRECT_DST_V8:		case AMDGPU::SI_INDIRECT_DST_V8:
case AMDGPU::SI_INDIRECT_DST_V16:		case AMDGPU::SI_INDIRECT_DST_V16:
return emitIndirectDst(MI, BB, getSubtarget());		return emitIndirectDst(MI, BB, getSubtarget());
case AMDGPU::SI_KILL:		case AMDGPU::SI_KILL:
return splitKillBlock(MI, BB);		return splitKillBlock(MI, BB);
case AMDGPU::V_CNDMASK_B64_PSEUDO: {		case AMDGPU::V_CNDMASK_B64_PSEUDO: {
		arsenm (Not Done): This duplicates nearly the entire other switch case. They should be consolidated based on the immediate argument and handled with the single pseudo.
MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();		MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();

unsigned Dst = MI.getOperand(0).getReg();		unsigned Dst = MI.getOperand(0).getReg();
unsigned Src0 = MI.getOperand(1).getReg();		unsigned Src0 = MI.getOperand(1).getReg();
unsigned Src1 = MI.getOperand(2).getReg();		unsigned Src1 = MI.getOperand(2).getReg();
const DebugLoc &DL = MI.getDebugLoc();		const DebugLoc &DL = MI.getDebugLoc();
unsigned SrcCond = MI.getOperand(3).getReg();		unsigned SrcCond = MI.getOperand(3).getReg();

(97 lines not shown)
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

SDValue SITargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {		SDValue SITargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
switch (Op.getOpcode()) {		switch (Op.getOpcode()) {
default: return AMDGPUTargetLowering::LowerOperation(Op, DAG);		default: return AMDGPUTargetLowering::LowerOperation(Op, DAG);
case ISD::BRCOND: return LowerBRCOND(Op, DAG);		case ISD::BRCOND: return LowerBRCOND(Op, DAG);
case ISD::LOAD: {		case ISD::LOAD: {
SDValue Result = LowerLOAD(Op, DAG);		SDValue Result = LowerLOAD(Op, DAG);
assert((!Result.getNode() \|\|		assert((!Result.getNode() \|\|
		arsenm (Not Done): return on next line
Result.getNode()->getNumValues() == 2) &&		Result.getNode()->getNumValues() == 2) &&
"Load should return a value and a chain");		"Load should return a value and a chain");
return Result;		return Result;
}		}

case ISD::FSIN:		case ISD::FSIN:
case ISD::FCOS:		case ISD::FCOS:
return LowerTrig(Op, DAG);		return LowerTrig(Op, DAG);
(482 lines not shown)	SDValue SITargetLowering::LowerGlobalAddress(AMDGPUMachineFunction *MFI,
// FIXME: Use a PseudoSourceValue once those can be assigned an address space.		// FIXME: Use a PseudoSourceValue once those can be assigned an address space.
MachinePointerInfo PtrInfo(UndefValue::get(PtrTy));		MachinePointerInfo PtrInfo(UndefValue::get(PtrTy));

return DAG.getLoad(PtrVT, DL, DAG.getEntryNode(), GOTAddr, PtrInfo, Align,		return DAG.getLoad(PtrVT, DL, DAG.getEntryNode(), GOTAddr, PtrInfo, Align,
MachineMemOperand::MODereferenceable \|		MachineMemOperand::MODereferenceable \|
MachineMemOperand::MOInvariant);		MachineMemOperand::MOInvariant);
}		}

SDValue SITargetLowering::copyToM0(SelectionDAG &DAG, SDValue Chain,		SDValue SITargetLowering::copyToM0(SelectionDAG &DAG, SDValue Chain,
const SDLoc &DL, SDValue V) const {		const SDLoc &DL, SDValue V) const {
// We can't use S_MOV_B32 directly, because there is no way to specify m0 as		// We can't use S_MOV_B32 directly, because there is no way to specify m0 as
// the destination register.		// the destination register.
//		//
// We can't use CopyToReg, because MachineCSE won't combine COPY instructions,		// We can't use CopyToReg, because MachineCSE won't combine COPY instructions,
// so we will end up with redundant moves to m0.		// so we will end up with redundant moves to m0.
//		//
// We use a pseudo to ensure we emit s_mov_b32 with m0 as the direct result.		// We use a pseudo to ensure we emit s_mov_b32 with m0 as the direct result.

// A Null SDValue creates a glue result.		// A Null SDValue creates a glue result.
SDNode *M0 = DAG.getMachineNode(AMDGPU::SI_INIT_M0, DL, MVT::Other, MVT::Glue,		SDNode *M0 = DAG.getMachineNode(AMDGPU::SI_INIT_M0, DL, MVT::Other, MVT::Glue,
V, Chain);		V, Chain);
return SDValue(M0, 0);		return SDValue(M0, 0);
}		}

SDValue SITargetLowering::lowerImplicitZextParam(SelectionDAG &DAG,		SDValue SITargetLowering::lowerImplicitZextParam(SelectionDAG &DAG,
		kzhuravl (Not Done): If we go with the approach in my latest comment, this will be gone.
SDValue Op,		SDValue Op,
MVT VT,		MVT VT,
unsigned Offset) const {		unsigned Offset) const {
SDLoc SL(Op);		SDLoc SL(Op);
SDValue Param = LowerParameter(DAG, MVT::i32, MVT::i32, SL,		SDValue Param = LowerParameter(DAG, MVT::i32, MVT::i32, SL,
DAG.getEntryNode(), Offset, false);		DAG.getEntryNode(), Offset, false);
// The local size values will have the hi 16-bits as zero.		// The local size values will have the hi 16-bits as zero.
return DAG.getNode(ISD::AssertZext, SL, MVT::i32, Param,		return DAG.getNode(ISD::AssertZext, SL, MVT::i32, Param,
(2,251 lines not shown)
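
The review comments on the trap-lowering switch in this file converge on a simple shape: one switch over the trap kind, a single unreachable default instead of per-case asserts, and a fallback that depends on whether a trap handler is present. The following is a minimal standalone C++ sketch of that decision, not the patch's code. `TrapKind`, `Lowering`, and `lowerTrap` are hypothetical names, the mnemonic strings stand in for the `BuildMI` calls, and `std::abort()` stands in for `llvm_unreachable`.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical mirror of the trap kinds in the review; the values follow the
// TRAPTYPE constants in the diff (LLVM_TRAP = 1, LLVM_DEBUG_TRAP = 2).
enum class TrapKind { LLVMTrap = 1, LLVMDebugTrap = 2 };

// Result of the sketch: the instruction text and whether a warning is issued.
struct Lowering {
  std::string Inst;
  bool Warns;
};

// Illustrative decision only; a real implementation would build machine
// instructions rather than return strings.
Lowering lowerTrap(TrapKind Kind, bool HasTrapHandler) {
  switch (Kind) {
  case TrapKind::LLVMTrap:
    // llvm.trap is noreturn, so ending the program is a valid fallback.
    return HasTrapHandler ? Lowering{"s_trap 1", false}
                          : Lowering{"s_endpgm", false};
  case TrapKind::LLVMDebugTrap:
    // llvm.debugtrap may return, so the fallback warns and leaves a no-op.
    return HasTrapHandler ? Lowering{"s_trap 2", false}
                          : Lowering{"s_nop 0", true};
  default:
    // Stand-in for llvm_unreachable("Unsupported trap handler type!").
    std::abort();
  }
}
```

Keeping the unsupported kinds out of the case list means a newly added trap kind fails loudly in the default branch instead of being silently ignored, which is the concern raised in the first inline comment.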

lib/Target/AMDGPU/SIInstrInfo.td

	(611 lines not shown)
	def DSTCLAMP {			def DSTCLAMP {
	int NONE = 0;			int NONE = 0;
	}			}

	def DSTOMOD {			def DSTOMOD {
	int NONE = 0;			int NONE = 0;
	}			}

				def TRAPTYPE {
				int LLVM_TRAP = 1;
				int LLVM_DEBUG_TRAP = 2;
				}

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// SI Instruction multiclass helpers.			// SI Instruction multiclass helpers.
	//			//
	// Instructions with _32 take 32-bit operands.			// Instructions with _32 take 32-bit operands.
	// Instructions with _64 take 64-bit operands.			// Instructions with _64 take 64-bit operands.
	//			//
	// VOP_* instructions can use either a 32-bit or 64-bit encoding. The 32-bit			// VOP_* instructions can use either a 32-bit or 64-bit encoding. The 32-bit
	(664 lines not shown)

lib/Target/AMDGPU/SIInstructions.td

(105 lines not shown)
}		}

// 64-bit vector move instruction. This is mainly used by the SIFoldOperands		// 64-bit vector move instruction. This is mainly used by the SIFoldOperands
// pass to enable folding of inline immediates.		// pass to enable folding of inline immediates.
def V_MOV_B64_PSEUDO : VPseudoInstSI <(outs VReg_64:$vdst),		def V_MOV_B64_PSEUDO : VPseudoInstSI <(outs VReg_64:$vdst),
(ins VSrc_b64:$src0)>;		(ins VSrc_b64:$src0)>;
} // End let hasSideEffects = 0, mayLoad = 0, mayStore = 0, Uses = [EXEC]		} // End let hasSideEffects = 0, mayLoad = 0, mayStore = 0, Uses = [EXEC]

def S_TRAP_PSEUDO : VPseudoInstSI <(outs), (ins),		def S_TRAP_PSEUDO : SPseudoInstSI <(outs), (ins i16imm:$simm16)> {
		arsenm (Done): You should use i16imm for the operand type to match the instruction (not that it matters much) and name it the same too (which matters more): (ins i16imm:$simm16)
[(trap)]> {
let hasSideEffects = 1;		let hasSideEffects = 1;
let SALU = 1;		let SALU = 1;
		arsenm (Done): This conflicts with the VPseudoInst type. You should remove this and change to SPseudoInstSI.
let usesCustomInserter = 1;		let usesCustomInserter = 1;
}		}

let usesCustomInserter = 1, SALU = 1 in {		let usesCustomInserter = 1, SALU = 1 in {
def GET_GROUPSTATICSIZE : PseudoInstSI <(outs SReg_32:$sdst), (ins),		def GET_GROUPSTATICSIZE : PseudoInstSI <(outs SReg_32:$sdst), (ins),
[(set SReg_32:$sdst, (int_amdgcn_groupstaticsize))]>;		[(set SReg_32:$sdst, (int_amdgcn_groupstaticsize))]>;
} // End let usesCustomInserter = 1, SALU = 1		} // End let usesCustomInserter = 1, SALU = 1

def S_MOV_B64_term : PseudoInstSI<(outs SReg_64:$dst),		def S_MOV_B64_term : PseudoInstSI<(outs SReg_64:$dst),
(ins SSrc_b64:$src0)> {		(ins SSrc_b64:$src0)> {
		arsenm (Done): You do not need a second pseudo. You should just add an operand to the other pseudo, which is just mimicking the operands of the physical instruction.
let SALU = 1;		let SALU = 1;
let isAsCheapAsAMove = 1;		let isAsCheapAsAMove = 1;
let isTerminator = 1;		let isTerminator = 1;
}		}

def S_XOR_B64_term : PseudoInstSI<(outs SReg_64:$dst),		def S_XOR_B64_term : PseudoInstSI<(outs SReg_64:$dst),
(ins SSrc_b64:$src0, SSrc_b64:$src1)> {		(ins SSrc_b64:$src0, SSrc_b64:$src1)> {
let SALU = 1;		let SALU = 1;
(249 lines not shown)	def SI_PC_ADD_REL_OFFSET : SPseudoInstSI <
[(set SReg_64:$dst,		[(set SReg_64:$dst,
(i64 (SIpc_add_rel_offset (tglobaladdr:$ptr_lo), (tglobaladdr:$ptr_hi))))]> {		(i64 (SIpc_add_rel_offset (tglobaladdr:$ptr_lo), (tglobaladdr:$ptr_hi))))]> {
let Defs = [SCC];		let Defs = [SCC];
}		}

} // End SubtargetPredicate = isGCN		} // End SubtargetPredicate = isGCN

let Predicates = [isGCN] in {		let Predicates = [isGCN] in {
		def :Pat<
		arsenm (Not Done): Space after :
		(trap),
		(S_TRAP_PSEUDO TRAPTYPE.LLVM_TRAP)
		>;

		def :Pat<
		(debugtrap),
		(S_TRAP_PSEUDO TRAPTYPE.LLVM_DEBUG_TRAP)
		>;

def : Pat<		def : Pat<
		arsenm (Not Done): Ditto
(int_amdgcn_else i64:$src, bb:$target),		(int_amdgcn_else i64:$src, bb:$target),
(SI_ELSE $src, $target, 0)		(SI_ELSE $src, $target, 0)
		arsenm (Done): Can you define pseudo-enums for these constants (see for example SRCMODS)?
>;		>;

def : Pat <		def : Pat <
(int_AMDGPU_kilp),		(int_AMDGPU_kilp),
(SI_KILL (i32 0xbf800000))		(SI_KILL (i32 0xbf800000))
>;		>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
(732 lines not shown)

lib/Target/AMDGPU/Utils/AMDKernelCodeTInfo.h

	(81 lines not shown)
	COMPPGM1(priv, compute_pgm_rsrc1_priv, PRIV),			COMPPGM1(priv, compute_pgm_rsrc1_priv, PRIV),
	COMPPGM1(enable_dx10_clamp, compute_pgm_rsrc1_dx10_clamp, DX10_CLAMP),			COMPPGM1(enable_dx10_clamp, compute_pgm_rsrc1_dx10_clamp, DX10_CLAMP),
	COMPPGM1(debug_mode, compute_pgm_rsrc1_debug_mode, DEBUG_MODE),			COMPPGM1(debug_mode, compute_pgm_rsrc1_debug_mode, DEBUG_MODE),
	COMPPGM1(enable_ieee_mode, compute_pgm_rsrc1_ieee_mode, IEEE_MODE),			COMPPGM1(enable_ieee_mode, compute_pgm_rsrc1_ieee_mode, IEEE_MODE),
	// TODO: bulky			// TODO: bulky
	// TODO: cdbg_user			// TODO: cdbg_user
	COMPPGM2(enable_sgpr_private_segment_wave_byte_offset, compute_pgm_rsrc2_scratch_en, SCRATCH_EN),			COMPPGM2(enable_sgpr_private_segment_wave_byte_offset, compute_pgm_rsrc2_scratch_en, SCRATCH_EN),
	COMPPGM2(user_sgpr_count, compute_pgm_rsrc2_user_sgpr, USER_SGPR),			COMPPGM2(user_sgpr_count, compute_pgm_rsrc2_user_sgpr, USER_SGPR),
	// TODO: enable_trap_handler			COMPPGM2(enable_trap_handler, compute_pgm_rsrc2_trap_handler, TRAP_HANDLER),
	kzhuravl (Done): `enable_trap_handler` should be here (see comment with links).
	COMPPGM2(enable_sgpr_workgroup_id_x, compute_pgm_rsrc2_tgid_x_en, TGID_X_EN),			COMPPGM2(enable_sgpr_workgroup_id_x, compute_pgm_rsrc2_tgid_x_en, TGID_X_EN),
	COMPPGM2(enable_sgpr_workgroup_id_y, compute_pgm_rsrc2_tgid_y_en, TGID_Y_EN),			COMPPGM2(enable_sgpr_workgroup_id_y, compute_pgm_rsrc2_tgid_y_en, TGID_Y_EN),
	COMPPGM2(enable_sgpr_workgroup_id_z, compute_pgm_rsrc2_tgid_z_en, TGID_Z_EN),			COMPPGM2(enable_sgpr_workgroup_id_z, compute_pgm_rsrc2_tgid_z_en, TGID_Z_EN),
	COMPPGM2(enable_sgpr_workgroup_info, compute_pgm_rsrc2_tg_size_en, TG_SIZE_EN),			COMPPGM2(enable_sgpr_workgroup_info, compute_pgm_rsrc2_tg_size_en, TG_SIZE_EN),
	COMPPGM2(enable_vgpr_workitem_id, compute_pgm_rsrc2_tidig_comp_cnt, TIDIG_COMP_CNT),			COMPPGM2(enable_vgpr_workitem_id, compute_pgm_rsrc2_tidig_comp_cnt, TIDIG_COMP_CNT),
	COMPPGM2(enable_exception_msb, compute_pgm_rsrc2_excp_en_msb, EXCP_EN_MSB), // TODO: split enable_exception_msb			COMPPGM2(enable_exception_msb, compute_pgm_rsrc2_excp_en_msb, EXCP_EN_MSB), // TODO: split enable_exception_msb
	COMPPGM2(granulated_lds_size, compute_pgm_rsrc2_lds_size, LDS_SIZE),			COMPPGM2(granulated_lds_size, compute_pgm_rsrc2_lds_size, LDS_SIZE),
	COMPPGM2(enable_exception, compute_pgm_rsrc2_excp_en, EXCP_EN), // TODO: split enable_exception			COMPPGM2(enable_exception, compute_pgm_rsrc2_excp_en, EXCP_EN), // TODO: split enable_exception
	(9 lines not shown)
	CODEPROP(enable_sgpr_grid_workgroup_count_y, ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y),			CODEPROP(enable_sgpr_grid_workgroup_count_y, ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y),
	CODEPROP(enable_sgpr_grid_workgroup_count_z, ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z),			CODEPROP(enable_sgpr_grid_workgroup_count_z, ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z),
	CODEPROP(enable_ordered_append_gds, ENABLE_ORDERED_APPEND_GDS),			CODEPROP(enable_ordered_append_gds, ENABLE_ORDERED_APPEND_GDS),
	CODEPROP(private_element_size, PRIVATE_ELEMENT_SIZE),			CODEPROP(private_element_size, PRIVATE_ELEMENT_SIZE),
	CODEPROP(is_ptr64, IS_PTR64),			CODEPROP(is_ptr64, IS_PTR64),
	CODEPROP(is_dynamic_callstack, IS_DYNAMIC_CALLSTACK),			CODEPROP(is_dynamic_callstack, IS_DYNAMIC_CALLSTACK),
	CODEPROP(is_debug_enabled, IS_DEBUG_SUPPORTED),			CODEPROP(is_debug_enabled, IS_DEBUG_SUPPORTED),
	CODEPROP(is_xnack_enabled, IS_XNACK_SUPPORTED),			CODEPROP(is_xnack_enabled, IS_XNACK_SUPPORTED),

				kzhuravl (Not Done): There is no `is_trap_handler_supported` in `CODEPROP` (see comment with links).
	FIELD(workitem_private_segment_byte_size),			FIELD(workitem_private_segment_byte_size),
	FIELD(workgroup_group_segment_byte_size),			FIELD(workgroup_group_segment_byte_size),
	FIELD(gds_segment_byte_size),			FIELD(gds_segment_byte_size),
	FIELD(kernarg_segment_byte_size),			FIELD(kernarg_segment_byte_size),
	FIELD(workgroup_fbarrier_count),			FIELD(workgroup_fbarrier_count),
	FIELD(wavefront_sgpr_count),			FIELD(wavefront_sgpr_count),
	FIELD(workitem_vgpr_count),			FIELD(workitem_vgpr_count),
	FIELD(reserved_vgpr_first),			FIELD(reserved_vgpr_first),
	(28 lines not shown)

test/CodeGen/AMDGPU/fneg-fabs.f16.ll

	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=kaveri -verify-machineinstrs < %s \| FileCheck -check-prefix=CI -check-prefix=FUNC %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=kaveri -verify-machineinstrs < %s \| FileCheck -check-prefix=CI -check-prefix=FUNC %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=tonga -verify-machineinstrs < %s \| FileCheck -check-prefix=VI -check-prefix=FUNC %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=tonga -verify-machineinstrs < %s \| FileCheck -check-prefix=VI -check-prefix=FUNC %s

	; GCN-LABEL: {{^}}fneg_fabs_fadd_f16:			; GCN-LABEL: {{^}}fneg_fabs_fadd_f16:
	; CI: v_cvt_f32_f16_e32			; CI: v_cvt_f32_f16_e32
	; CI: v_cvt_f32_f16_e32			; CI: v_cvt_f32_f16_e32
	; CI: v_sub_f32_e64 v{{[0-9]+}}, v{{[0-9]+}}, \|v{{[0-9]+}}\|			; CI: v_sub_f32_e64 v{{[0-9]+}}, v{{[0-9]+}}, \|v{{[0-9]+}}\|

	; VI-NOT: and			; VI-NOT: _and
	; VI: v_sub_f16_e64 {{v[0-9]+}}, {{v[0-9]+}}, \|{{v[0-9]+}}\|			; VI: v_sub_f16_e64 {{v[0-9]+}}, {{v[0-9]+}}, \|{{v[0-9]+}}\|
	define void @fneg_fabs_fadd_f16(half addrspace(1)* %out, half %x, half %y) {			define void @fneg_fabs_fadd_f16(half addrspace(1)* %out, half %x, half %y) {
	%fabs = call half @llvm.fabs.f16(half %x)			%fabs = call half @llvm.fabs.f16(half %x)
	%fsub = fsub half -0.000000e+00, %fabs			%fsub = fsub half -0.000000e+00, %fabs
	%fadd = fadd half %y, %fsub			%fadd = fadd half %y, %fsub
	store half %fadd, half addrspace(1)* %out, align 2			store half %fadd, half addrspace(1)* %out, align 2
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}fneg_fabs_fmul_f16:			; GCN-LABEL: {{^}}fneg_fabs_fmul_f16:
	; CI: v_cvt_f32_f16_e32			; CI: v_cvt_f32_f16_e32
	; CI: v_cvt_f32_f16_e32			; CI: v_cvt_f32_f16_e32
	; CI: v_mul_f32_e64 {{v[0-9]+}}, {{v[0-9]+}}, -\|{{v[0-9]+}}\|			; CI: v_mul_f32_e64 {{v[0-9]+}}, {{v[0-9]+}}, -\|{{v[0-9]+}}\|
	; CI: v_cvt_f16_f32_e32			; CI: v_cvt_f16_f32_e32

	; VI-NOT: and			; VI-NOT: _and
	; VI: v_mul_f16_e64 {{v[0-9]+}}, {{v[0-9]+}}, -\|{{v[0-9]+}}\|			; VI: v_mul_f16_e64 {{v[0-9]+}}, {{v[0-9]+}}, -\|{{v[0-9]+}}\|
	; VI-NOT: and			; VI-NOT: _and
	define void @fneg_fabs_fmul_f16(half addrspace(1)* %out, half %x, half %y) {			define void @fneg_fabs_fmul_f16(half addrspace(1)* %out, half %x, half %y) {
	%fabs = call half @llvm.fabs.f16(half %x)			%fabs = call half @llvm.fabs.f16(half %x)
	%fsub = fsub half -0.000000e+00, %fabs			%fsub = fsub half -0.000000e+00, %fabs
	%fmul = fmul half %y, %fsub			%fmul = fmul half %y, %fsub
	store half %fmul, half addrspace(1)* %out, align 2			store half %fmul, half addrspace(1)* %out, align 2
	ret void			ret void
	}			}

	(78 lines not shown)

test/CodeGen/AMDGPU/trap.ll

	; RUN: llc -mtriple=amdgcn--amdhsa -verify-machineinstrs < %s 2>&1 \| FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn--amdhsa -verify-machineinstrs < %s \| FileCheck -check-prefix=HSA-TRAP %s

				arsenm (Not Done): This needs a not.
				arsenm (Done): Missing a run line with mesa triple.
				; RUN: llc -mtriple=amdgcn--amdhsa -mattr=+trap-handler -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=HSA-TRAP %s
				; RUN: llc -mtriple=amdgcn--amdhsa -mattr=-trap-handler -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=NO-HSA-TRAP %s
				arsenm (Not Done): These should both explicitly use the mesa triple, and enable/disable the trap handler in each. The ones which should error are also missing the not.
				wdng (Author, Not Done): As per our discussion with @tony-tye last week, it looks like we want to issue a warning instead of an error, correct?
				; RUN: llc -mtriple=amdgcn--amdhsa -mattr=-trap-handler -verify-machineinstrs < %s 2>&1 \| FileCheck -check-prefix=GCN -check-prefix=GCN-WARNING %s

				; enable trap handler feature
				; RUN: llc -mtriple=amdgcn-unknown-mesa3d -mattr=+trap-handler -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=NO-MESA-TRAP -check-prefix=TRAP-BIT -check-prefix=MESA-TRAP %s
				; RUN: llc -mtriple=amdgcn-unknown-mesa3d -mattr=+trap-handler -verify-machineinstrs < %s 2>&1 \| FileCheck -check-prefix=GCN -check-prefix=GCN-WARNING -check-prefix=TRAP-BIT %s
				arsenm (Not Done): You seem to have removed the HSA enable/disable and do mesa twice?
				wdng (Author, Not Done): I also do HSA enable/disable twice:
				; enable HSA trap handler
				; RUN: llc -mtriple=amdgcn--amdhsa -mattr=+trap-handler -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=HSA-TRAP %s
				; disable HSA trap handler
				; RUN: llc -mtriple=amdgcn--amdhsa -mattr=-trap-handler -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=NO-HSA-TRAP %s
				; disable HSA trap handler to catch warning for llvm.debugtrap since llvm.trap doesn't issue any warnings
				; RUN: llc -mtriple=amdgcn--amdhsa -mattr=-trap-handler -verify-machineinstrs < %s 2>&1 \| FileCheck -check-prefix=GCN -check-prefix=GCN-WARNING %s

				; disable trap handler feature
				; RUN: llc -mtriple=amdgcn-unknown-mesa3d -mattr=-trap-handler -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=NO-MESA-TRAP -check-prefix=NO-TRAP-BIT -check-prefix=NOMESA-TRAP %s
				arsenm (Not Done): You don't need a check prefix for NO-RSRC2-BIT. That is too specific; the checks should be based on the environment type/options.
				; RUN: llc -mtriple=amdgcn-unknown-mesa3d -mattr=-trap-handler -verify-machineinstrs < %s 2>&1 \| FileCheck -check-prefix=GCN -check-prefix=GCN-WARNING -check-prefix=NO-TRAP-BIT %s

				; RUN: llc -march=amdgcn -verify-machineinstrs < %s 2>&1 \| FileCheck -check-prefix=GCN -check-prefix=GCN-WARNING %s
				arsenm (Not Done): This should not be checking stderr since the test should pass with HSA.
				arsenm (Not Done): You should have a run line with no subtarget features enabled, and one each explicitly enabling and disabling the trap handler subtarget feature.
				arsenm (Done): Having FUNC and HSA-FUNC doesn't make sense. Replace these both with just GCN.
				arsenm (Not Done): Missing GCN check prefix.

				arsenm (Not Done): GCN and HSA are not alternative check prefixes. It doesn't make sense to have it on one of these but not the other. HSA-TRAP for the disabled trap handler feature is broken.
	declare void @llvm.trap() #0			declare void @llvm.trap() #0
				declare void @llvm.debugtrap() #0

				arsenm (Done): You should also check that the rsrc2 bit is set.
				; MESA-TRAP: .section .AMDGPU.config
				; MESA-TRAP: .long 47180
				; MESA-TRAP-NEXT: .long 208

				; NOMESA-TRAP: .section .AMDGPU.config
				; NOMESA-TRAP: .long 47180
				arsenm (Not Done): This is just the comment printed. You should check the actual config register value too.
				; NOMESA-TRAP-NEXT: .long 144
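
The two expected values in these Mesa checks differ in exactly one bit, which a few lines of standalone C++ can confirm. This sketch assumes, without confirmation from the diff shown here, that 47180 (0xB84C) is the address of the COMPUTE_PGM_RSRC2 config register and that bit 6 is its trap handler enable; the constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the MESA-TRAP / NOMESA-TRAP check lines.
constexpr uint32_t RegAddr = 47180;          // first .long in both checks
constexpr uint32_t WithTrapHandler = 208;    // expected with +trap-handler
constexpr uint32_t WithoutTrapHandler = 144; // expected with -trap-handler

// Assumed position of the trap handler enable bit (not stated in the diff).
constexpr uint32_t TrapHandlerBit = 1u << 6;

// Returns the register value with the assumed trap handler bit set or cleared.
constexpr uint32_t setTrapHandler(uint32_t Rsrc2, bool Enable) {
  return Enable ? (Rsrc2 | TrapHandlerBit) : (Rsrc2 & ~TrapHandlerBit);
}
```

Under that assumption, 208 is 144 with bit 6 set, so the pair of checks tests only the trap handler bit and leaves the remaining register fields identical.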

				; GCN-LABEL: {{^}}hsa_trap:
				; HSA-TRAP: enable_trap_handler = 1
				; HSA-TRAP: s_mov_b64 s[0:1], s[4:5]
				; HSA-TRAP: s_trap 1

				; for llvm.trap in hsa path without ABI, direct generate s_endpgm instruction without any warning information
				; NO-HSA-TRAP: enable_trap_handler = 0
				; NO-HSA-TRAP: s_endpgm
				; NO-HSA-TRAP: COMPUTE_PGM_RSRC2:TRAP_HANDLER: 0

				; TRAP-BIT: enable_trap_handler = 1
				; NO-TRAP-BIT: enable_trap_handler = 0
				; NO-MESA-TRAP: s_endpgm
				define void @hsa_trap() {
				call void @llvm.trap()
				ret void
				}

				; MESA-TRAP: .section .AMDGPU.config
				; MESA-TRAP: .long 47180
				; MESA-TRAP-NEXT: .long 208

				; NOMESA-TRAP: .section .AMDGPU.config
				; NOMESA-TRAP: .long 47180
				; NOMESA-TRAP-NEXT: .long 144

				; GCN-WARNING: warning: <unknown>:0:0: in function hsa_debugtrap void (): debugtrap handler not supported
				; GCN-LABEL: {{^}}hsa_debugtrap:
				; HSA-TRAP: enable_trap_handler = 1
				; HSA-TRAP: s_mov_b64 s[0:1], s[4:5]
				; HSA-TRAP: s_trap 2

				; for llvm.debugtrap in non-hsa path without ABI, generate a warning and a s_endpgm instruction
				; NO-HSA-TRAP: enable_trap_handler = 0
				; NO-HSA-TRAP: s_endpgm

				; TRAP-BIT: enable_trap_handler = 1
				; NO-TRAP-BIT: enable_trap_handler = 0
				; NO-MESA-TRAP: s_endpgm
				define void @hsa_debugtrap() {
				call void @llvm.debugtrap()
				ret void
				}

				; For non-HSA path
	; GCN-LABEL: {{^}}trap:			; GCN-LABEL: {{^}}trap:
				arsenm (Not Done): Check still missing for the enable_trap_handler bit in kernel_code_t.
	; GCN: v_mov_b32_e32 v0, 1			; TRAP-BIT: enable_trap_handler = 1
	; GCN: s_mov_b64 s[0:1], s[4:5]			; NO-TRAP-BIT: enable_trap_handler = 0
				wdngAuthorUnsubmitted Not Done Reply Inline Actions Matt: "Should check for the enabled feature bits in the kernel_code_t. This also doesn’t have anything setting enable_trap_handler" wdng: Matt: "Should check for the enabled feature bits in the kernel_code_t. This also doesn’t have…
				wdngAuthorUnsubmitted Not Done Reply Inline Actions I think "@llvm.trap()" will enable_trap_handler, right? wdng: I think "@llvm.trap()" will enable_trap_handler, right?
				arsenmUnsubmitted Not Done Reply Inline Actions Yes, that is the problem. You need to be setting the trap handler bit arsenm: Yes, that is the problem. You need to be setting the trap handler bit
				tony-tyeUnsubmitted Not Done Reply Inline Actions I am not sure if we should have the compiler be responsible for setting the enable_trap_handler bit. In general I don't think the compiler can figure this out on a per kernel basis. Once we support function calls and indirect calls how could be know if the closure of all functions include some that need the trap handler? Also, even if we did set the bit, for compute, the hardware CP microcode cannot do anything with it other than refuse to execute the kernel. Today I believe the CP micro code ignores this bit and always enables the trap handler as it is needed for CWSR (context switching). So it seems the presence of a trap handler is more a function of the environment that will execute the kernel than the kernel itself. So perhaps we should simply not define the trap handler bit. The code object is already marked as requiring the HSA environment and perhaps this implies the presence of a trap handler. I suspect that graphics kernels have their own "environment" they execute in and may not support trap handlers. If they do not then executing an S_TRAP will simply be a NOP which will not halt the shader. tony-tye: I am not sure if we should have the compiler be responsible for setting the enable_trap_handler…
				arsenm: I don't think that implies the presence of the trap handler. This is a field in the kernel_code_t, so we should set it. We can figure out a conservative setting in the future whenever indirect calls are needed. I thought enabling this also required reserving 16 SGPRs?
				wdng: So, once the compiler detects "llvm.trap()", shall we set the trap handler bit in the kernel_code_t? I don't know why we need to add the trap handler bit. Does setting the bit imply the presence of the trap handler, and leaving it unset imply its absence?
				tony-tye: It seems that the presence of a trap handler is a property of producing a code object to be executed using the HSA environment. Under ROCM all kernels will be executed with a trap handler, and that trap handler will use the HSA ABI. Other environments may have different trap handlers, or no trap handler at all, and the code generated for llvm.trap/debugtrap would change accordingly. The bit in amd_kernel_code_t would then be set according to the demands of the environment, independent of whether traps are used. Having a trap handler present does cause an extra 16 SGPRs to be allocated (which should be taken into account when determining wave occupancy), but these are in addition to the SGPRs used by the generated code (so there is no need to reserve them or include them in the SGPR count in the amd_kernel_code_t).
				arsenm: Do we still need to make sure we leave 16 unallocated for the implicit use? From the ABI spec it seems clear that they should not be included in the reported SGPR count, but do we need to ensure that the reported total + 16 is below the hardware limit?
				tony-tye: The hardware limit is the maximum number of non-privileged SGPRs that can be accessed using the instruction register encoding, plus the 16 privileged trap registers. The trap temps are only allocated if a trap handler is enabled, and are only accessible when executing the trap handler (using a special instruction register encoding). So, from the point of view of determining the limit of non-privileged SGPRs that can be allocated, the presence of a trap handler can be ignored. The fact that they are allocated needs to be considered when determining the number of waves that will fit on a CU.
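				Illustration of the SGPR accounting described in this thread: a minimal C++ sketch, not code from the patch. The struct, the function names, and every hardware number are hypothetical parameters chosen for the example; only the 16 trap temp SGPRs come from the discussion. It also assumes the trap temps are added on top of the granule-rounded per-wave count, which the thread does not specify. The point it shows is that the reported SGPR count stays unchanged while the occupancy estimate includes the trap temps.

```cpp
#include <algorithm>

// All values are supplied by the caller; nothing here describes a
// specific GPU. Field names are hypothetical.
struct SgprConfig {
  unsigned TotalSgprsPerSimd; // size of the SGPR file per SIMD (assumed)
  unsigned AllocGranule;      // SGPR allocation granularity (assumed)
  unsigned MaxWavesPerSimd;   // hardware cap on resident waves (assumed)
  unsigned TrapTempSgprs;     // 16 according to the review discussion
};

// Round Value up to the next multiple of Granule.
unsigned roundUp(unsigned Value, unsigned Granule) {
  return (Value + Granule - 1) / Granule * Granule;
}

// ReportedSgprs is the count stored in amd_kernel_code_t. It does NOT
// include the trap temps; they are only added here, for occupancy.
unsigned wavesPerSimd(const SgprConfig &Cfg, unsigned ReportedSgprs,
                      bool TrapHandlerEnabled) {
  unsigned PerWave = roundUp(ReportedSgprs, Cfg.AllocGranule);
  if (TrapHandlerEnabled)
    PerWave += Cfg.TrapTempSgprs; // assumption: added after rounding
  if (PerWave == 0)
    return Cfg.MaxWavesPerSimd;
  return std::min(Cfg.MaxWavesPerSimd, Cfg.TotalSgprsPerSimd / PerWave);
}
```

				With the made-up configuration {512, 8, 10, 16} and a kernel reporting 48 SGPRs, the estimate is 10 waves without a trap handler and 8 with one, even though the reported count is 48 in both cases.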
	; GCN: s_trap 1			; NO-HSA-TRAP: s_endpgm
				; NO-MESA-TRAP: s_endpgm
	define void @trap() {			define void @trap() {
				arsenm: You should add another test which ends in unreachable. As far as I can tell there is no flag to set TrapUnreachable, and we don't enable it now, but this should catch it if that ever breaks.
				wdng: What kinds of instructions will trigger the trap instruction?
				arsenm: You should also add another test which needs to enable the queue ptr for a different feature.
				wdng: What feature should I test?
				arsenm: You can do an address space cast from LDS to flat, or do something with the queue ptr intrinsic.
	call void @llvm.trap()			call void @llvm.trap()
				arsenm: HSA:
	ret void			ret void
	}			}

	attributes #0 = { nounwind noreturn }			attributes #0 = { nounwind noreturn }
				arsenm: GCN should be replaced with ERROR or something like that.
				tony-tye: Would like it to be: ; HSA: s_trap 2
				arsenm: The check prefix names should be like HSA-TRAP, HSA-NOTRAP, MESA-TRAP, MESA-NOTRAP.
				arsenm: These check lines should not be able to coexist.

Revision Contents

Diff 87848
