This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32)
ClosedPublic

Authored by foad on Mar 1 2021, 1:58 AM.

Details

Reviewers

dstuttard
piotr
rampitec
arsenm

Commits

rG796a60d2ea32: [AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32)

Summary

The expected use case is for frontends to insert this into
shaders that are to be run under a debugger. The shader can
then be resumed or single stepped from the point of the call
under debugger control.
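
As a rough illustration of this use case, a frontend might emit something like the following LLVM IR. This is only a sketch: the function name `@shader_main` and the `amdgpu_cs` calling convention are placeholders, and the operand value 1 assumes the ISA-documented behaviour that bit 0 of the s_sethalt immediate sets the halt state (0 clears it).

```llvm
; Hypothetical shader entry point; the name and calling convention are
; placeholders chosen for illustration.
define amdgpu_cs void @shader_main() {
entry:
  ; Assumed encoding: an operand of 1 sets the wave's halt state, so the wave
  ; stops here until a debugger resumes or single-steps it.
  call void @llvm.amdgcn.s.sethalt(i32 1)
  ret void
}

; The operand must be an immediate constant (ImmArg in the intrinsic definition).
declare void @llvm.amdgcn.s.sethalt(i32)
```

Compiled with `llc -march=amdgcn -mcpu=gfx900`, as in the test added by this patch, the call should select to a single `s_sethalt 1`.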

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision. Mar 1 2021, 1:58 AM

Herald added subscribers: kerbowa, hiraditya, t-tye and 5 others. Mar 1 2021, 1:58 AM

foad requested review of this revision. Mar 1 2021, 1:58 AM

Herald added a project: Restricted Project. Mar 1 2021, 1:58 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B91301: Diff 327057. Mar 1 2021, 3:18 AM

arsenm added inline comments. Mar 1 2021, 6:09 AM

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
Line 1288: Is this really willreturn?
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.sethalt.ll
Line 4: I prefer to keep the -global-isel as the first argument

foad added inline comments. Mar 1 2021, 6:15 AM

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
Line 1288: Yes, as far as the generated code is concerned. It doesn't really know that it'll be stopped in the debugger for a while, and we definitely want it to function correctly if/when the debugger resumes it.

Move -global-isel to front.

arsenm accepted this revision. Mar 1 2021, 6:23 AM

This revision is now accepted and ready to land. Mar 1 2021, 6:23 AM

This revision was landed with ongoing or failed builds. Mar 1 2021, 6:30 AM

Closed by commit rG796a60d2ea32: [AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32) (authored by foad).

This revision was automatically updated to reflect the committed changes.

foad added a commit: rG796a60d2ea32: [AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32).

Harbormaster completed remote builds in B91327: Diff 327098. Mar 1 2021, 7:42 AM

The expected use case is for frontends to insert this into
shaders that are to be run under a debugger. The shader can
then be resumed or single stepped from the point of the call
under debugger control.

Please check with the AMDGPU debugger team before making these assumptions :-) It turns out that the debugger did recently add support for tracking the wave halt state independently of the breakpoint state, so this is fine. However, using this does not notify the debugger. A better alternative is to use the llvm.debug intrinsic that will notify the debugger when executed, and also halt the wave for the debugger to resume.

In D97670#2596219, @t-tye wrote:

A better alternative is to use the llvm.debug intrinsic

It looks like llvm.debugtrap is only supported on HSA?

that will notify the debugger when executed, and also halt the wave for the debugger to resume.

Right, I realise that llvm.debugtrap/s_trap is a more normal way of breaking into a debugger. But there is more than one debugger, and more than one way of debugging things, and for cases where you want to halt the program and attach a debugger after the fact, llvm.amdgcn.s.sethalt/s_sethalt seems like a useful thing.
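
To make the distinction in this exchange concrete, the two approaches look roughly like this at the IR level. This is a sketch only: both kernel names are hypothetical, the operand value 1 is assumed to mean "set halt", and the lowering of llvm.debugtrap to s_trap is taken from this discussion, where it is described as restricted to HSA.

```llvm
; Hypothetical kernels, named for illustration only.

; Breaks into the debugger and notifies it. Per the discussion, this is
; lowered to s_trap, but only for HSA targets.
define amdgpu_kernel void @break_and_notify() {
  call void @llvm.debugtrap()
  ret void
}

; Halts the wave without notifying the debugger, leaving it stopped so that
; a debugger can be attached after the fact. Operand 1 is assumed to mean
; "set halt".
define amdgpu_kernel void @halt_for_late_attach() {
  call void @llvm.amdgcn.s.sethalt(i32 1)
  ret void
}

declare void @llvm.debugtrap()
declare void @llvm.amdgcn.s.sethalt(i32)
```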

What other debuggers do you have in mind? There are rocgdb and the UMR hardware debugger. Do you know of others for AMDGPU? If non-HSA targets would like to support a debugger it would make sense that they also add support for llvm.debugtrap as well :-)

Also, does this patch account for the hardware hazard that you must not have an s_endpgm following an s_halt on ASICs that do not support it? One approach to avoid the hazard is to always put an s_nop after an s_halt.

In D97670#2599025, @t-tye wrote:

What other debuggers do you have in mind?

Windbg and umr.

If non-HSA targets would like to support a debugger it would make sense that they also add support for llvm.debugtrap as well :-)

Makes sense to me. It seems like the lowering of llvm.debugtrap to s_trap is deliberately restricted to HSA, but I don't know the history of that.

Also, does this patch account for the hardware hazard that you must not have an s_endpgm following an s_halt on ASICs that do not support it? One approach to avoid the hazard is to always put an s_nop after an s_halt.

No, I was not aware of that. I'll see what I can do...

Revision Contents

Path                                                 Size
llvm/include/llvm/IR/IntrinsicsAMDGPU.td             4 lines
llvm/lib/Target/AMDGPU/SOPInstructions.td            3 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.sethalt.ll    28 lines

Diff 327101

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

[1,277 lines not shown]
  }

  def int_amdgcn_s_decperflevel :
    GCCBuiltin<"__builtin_amdgcn_s_decperflevel">,
    Intrinsic<[], [llvm_i32_ty], [ImmArg<ArgIndex<0>>, IntrNoMem,
                                  IntrHasSideEffects, IntrWillReturn]> {
  }

+ def int_amdgcn_s_sethalt :
+   Intrinsic<[], [llvm_i32_ty], [ImmArg<ArgIndex<0>>, IntrNoMem,
+                                 IntrHasSideEffects, IntrWillReturn]>;

[Inline comment] arsenm: Is this really willreturn?
[Inline comment] foad: Yes, as far as the generated code is concerned. It doesn't really know that it'll be stopped in the debugger for a while, and we definitely want it to function correctly if/when the debugger resumes it.

  def int_amdgcn_s_getreg :
    GCCBuiltin<"__builtin_amdgcn_s_getreg">,
    Intrinsic<[llvm_i32_ty], [llvm_i32_ty],
    [IntrInaccessibleMemOnly, IntrReadMem, IntrSpeculatable,
     IntrWillReturn, ImmArg<ArgIndex<0>>]
    >;

  // Note this can be used to set FP environment properties that are
[811 lines not shown]

llvm/lib/Target/AMDGPU/SOPInstructions.td

[1,222 lines not shown; context: def S_WAKEUP : SOPP_Pseudo <"s_wakeup", (ins) > {]
    let fixed_imm = 1;
    let mayLoad = 1;
    let mayStore = 1;
  }

  let mayLoad = 0, mayStore = 0, hasSideEffects = 1 in
  def S_WAITCNT : SOPP_Pseudo <"s_waitcnt" , (ins WAIT_FLAG:$simm16), "$simm16",
      [(int_amdgcn_s_waitcnt timm:$simm16)]>;
- def S_SETHALT : SOPP_Pseudo <"s_sethalt" , (ins i16imm:$simm16), "$simm16">;
+ def S_SETHALT : SOPP_Pseudo <"s_sethalt" , (ins i32imm:$simm16), "$simm16",
+     [(int_amdgcn_s_sethalt timm:$simm16)]>;
  def S_SETKILL : SOPP_Pseudo <"s_setkill" , (ins i16imm:$simm16), "$simm16">;

  // On SI the documentation says sleep for approximately 64 * low 2
  // bits, consistent with the reported maximum of 448. On VI the
  // maximum reported is 960 cycles, so 960 / 64 = 15 max, so is the
  // maximum really 15 on VI?
  def S_SLEEP : SOPP_Pseudo <"s_sleep", (ins i32imm:$simm16),
      "$simm16", [(int_amdgcn_s_sleep timm:$simm16)]> {
[739 lines not shown]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.sethalt.ll

This file was added.

[Inline comment on line 4] arsenm: I prefer to keep the -global-isel as the first argument

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
; RUN: llc -global-isel -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

define amdgpu_kernel void @test_s_sethalt() {
; GCN-LABEL: test_s_sethalt:
; GCN: ; %bb.0:
; GCN-NEXT: s_sethalt 0
; GCN-NEXT: s_sethalt 1
; GCN-NEXT: s_sethalt 2
; GCN-NEXT: s_sethalt 3
; GCN-NEXT: s_sethalt 4
; GCN-NEXT: s_sethalt 5
; GCN-NEXT: s_sethalt 6
; GCN-NEXT: s_sethalt 7
; GCN-NEXT: s_endpgm
  call void @llvm.amdgcn.s.sethalt(i32 0)
  call void @llvm.amdgcn.s.sethalt(i32 1)
  call void @llvm.amdgcn.s.sethalt(i32 2)
  call void @llvm.amdgcn.s.sethalt(i32 3)
  call void @llvm.amdgcn.s.sethalt(i32 4)
  call void @llvm.amdgcn.s.sethalt(i32 5)
  call void @llvm.amdgcn.s.sethalt(i32 6)
  call void @llvm.amdgcn.s.sethalt(i32 7)
  ret void
}

declare void @llvm.amdgcn.s.sethalt(i32)