This is an archive of the discontinued LLVM Phabricator instance.


Differential D146031

[AMDGPU] Add MMOs for GFX11 Streamout Instructions
ClosedPublic

Authored by rovka on Mar 14 2023, 3:54 AM.


Details

Reviewers

arsenm

Group Reviewers

Restricted Project

Commits

rGd9bf8aba2371: [AMDGPU] Add MMOs for GFX11 Streamout Instructions

Summary

The GFX11 NGG Streamout Instructions perform atomic operations on
dedicated registers. At the moment, they lack machine memory operands,
which causes the si-memory-legalizer pass to treat them conservatively
and introduce several unnecessary waits and cache invalidations.

This patch introduces a new address space to represent these special
registers and teaches instruction selection to add memory operands with
this new address space to DS_ADD/SUB_GS_REG_RTN.
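
As a rough illustration of why the memory operand matters, here is a minimal, self-contained C++ sketch. It is a toy model, not the si-memory-legalizer code: `toy::Instr`, `mustAssumeAnyAddressSpace` and `touchesOnlyStreamoutRegisters` are hypothetical names invented for this example, and only the address space number 128 comes from this revision.

```cpp
#include <cassert>
#include <optional>

namespace toy {

// Address space number chosen in this revision for the streamout registers.
constexpr unsigned StreamoutRegisterAS = 128;

// Hypothetical stand-in for a machine instruction. Only the address space of
// its memory operand is modelled; an empty optional means "no memory operand".
struct Instr {
  std::optional<unsigned> MemOperandAS;
};

// Without a memory operand nothing is known about the access, so a
// conservative pass has to assume it may touch every address space.
inline bool mustAssumeAnyAddressSpace(const Instr &I) {
  return !I.MemOperandAS.has_value();
}

// With a memory operand the access can be attributed to one address space.
inline bool touchesOnlyStreamoutRegisters(const Instr &I) {
  return I.MemOperandAS.has_value() && *I.MemOperandAS == StreamoutRegisterAS;
}

// Small usage example.
inline void demo() {
  Instr Before;                     // models the state before the patch
  Instr After{StreamoutRegisterAS}; // models the state after the patch
  assert(mustAssumeAnyAddressSpace(Before));
  assert(!mustAssumeAnyAddressSpace(After));
  assert(touchesOnlyStreamoutRegisters(After));
}

} // namespace toy
```

In this model, the waits and cache invalidations mentioned above correspond to the `mustAssumeAnyAddressSpace` case; once the access is pinned to its own address space, a pass has enough information to skip them.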

Since this address space is meant to be compiler-internal, we move it
up a bit from the other address spaces and give it the number 128.
According to the LLVM Language Reference, address space numbers can go
all the way up to 2^24, but I'm not sure how well this is supported in
practice [1], so using a smaller number seems safer.

[1] https://github.com/llvm/llvm-project/blob/0107513fe79da7670e37c29c0862794a2213a89c/llvm/utils/TableGen/IntrinsicEmitter.cpp#L401
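
The numbering argument can be written down as a couple of compile-time checks. This is a sketch only: `AddressSpaceLimit` and `isRepresentableAddressSpace` are hypothetical names, and treating 2^24 as an exclusive bound is an assumption of the example rather than something stated in this revision.

```cpp
#include <cassert>
#include <cstdint>

namespace toy {

// Limit quoted in the summary from the LLVM Language Reference. Treating it
// as an exclusive bound is an assumption of this sketch.
constexpr std::uint32_t AddressSpaceLimit = std::uint32_t{1} << 24;

// Values taken from the AMDGPUAS enum shown in this revision.
constexpr unsigned BufferFatPointerAS = 7;
constexpr unsigned StreamoutRegisterAS = 128;

// Hypothetical helper: does the number fit under the quoted limit?
constexpr bool isRepresentableAddressSpace(std::uint32_t AS) {
  return AS < AddressSpaceLimit;
}

// 128 is far below the limit and sits above the user-visible numbers.
static_assert(isRepresentableAddressSpace(StreamoutRegisterAS));
static_assert(StreamoutRegisterAS > BufferFatPointerAS);

} // namespace toy
```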

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rovka created this revision. Mar 14 2023, 3:54 AM

Herald added a project: Restricted Project. Mar 14 2023, 3:54 AM

Herald added subscribers: kosarev, foad, kerbowa and 8 others.

rovka requested review of this revision. Mar 14 2023, 3:54 AM

Herald added a project: Restricted Project. Mar 14 2023, 3:54 AM

Herald added subscribers: llvm-commits, wdng.

rovka added a reviewer: Restricted Project. Mar 14 2023, 3:56 AM

foad added subscribers: nhaehnle, Pierre-vh. Mar 14 2023, 4:49 AM

foad added inline comments.

llvm/docs/AMDGPUUsage.rst
678	My gut feeling is that this should be an internal implementation detail of the compiler, should never be exposed to the user, so should not be documented here. Adding @nhaehnle.
llvm/lib/Target/AMDGPU/AMDGPU.h
363	Maybe add a comment here saying "the following address spaces are internal-only and can be freely renumbered"? E.g. @Pierre-vh plans to add a new user-visible address space 8, but I think it should be fine to do that by renumbering STREAMOUT_REGISTER to 9 to make room. I am assuming we will end up with more internal-only address spaces similar to this one.

Harbormaster completed remote builds in B219305: Diff 505017. Mar 14 2023, 5:19 AM

arsenm added inline comments. Mar 14 2023, 7:00 AM

llvm/docs/AMDGPUUsage.rst
678	I do think we need to document any address space numbers used for any purpose. We should also probably reserve a range of address spaces for external software uses

foad added inline comments. Mar 14 2023, 8:00 AM

llvm/docs/AMDGPUUsage.rst
678	I was thinking this would be completely inaccessible from IR and only introduced during instruction selection. Do you agree? Do you still think it needs to be documented here? Why?

arsenm added inline comments. Mar 14 2023, 8:02 AM

llvm/docs/AMDGPUUsage.rst
678	It would still need to be documented as internally used for something so it doesn't get reused for something else

foad added inline comments. Mar 14 2023, 8:18 AM

llvm/docs/AMDGPUUsage.rst
678	But we can reuse the number for something else whenever we like - we can just renumber STREAMOUT_REGISTER to something different, with no externally visible effect.

t-tye added a subscriber: t-tye. Mar 14 2023, 6:19 PM

t-tye added inline comments.

llvm/docs/AMDGPUUsage.rst
788	My understanding is that address spaces are about memory, not registers. So why is an address space being used for this purpose? Is it that the values are being passed in memory?
llvm/lib/Target/AMDGPU/AMDGPU.h
363	Or could the internal address spaces start at a higher number so they do not have to be renumbered? AMDGPUUsage could then reserve a set of values to "internal use".

rovka added inline comments. Mar 15 2023, 2:47 AM

llvm/docs/AMDGPUUsage.rst
678	I kind of see the point that we don't really have an "LLVM IR Address Space Number" for this, since it doesn't show up in the IR. We could just document that there IS an address space for the Streamout registers (if indeed we decide we need one), but don't assign it a formal address space number in this doc.
788	I think the idea is that these registers are embedded into the GDS. In principle, we could use the region address space and as far as the existing tests are concerned we'd get the same results, but the docs are pretty clear that these are not operating on GDS directly, so it seemed cleaner to have a new address space. I'm open to other interpretations :)

foad added inline comments. Mar 15 2023, 3:44 AM

llvm/docs/AMDGPUUsage.rst
788	> My understanding is that address spaces are about memory, not registers. So why is an address space being used for this purpose? Is it that the values are being passed in memory?
	They are called "registers" in the documentation but as far as the ISA is concerned they act more like memory in a separate address space that does not interact with any other type of memory. The only instructions that access them take an offset into the "register" file and read or write 32 or 64 bits at that offset. The individual registers are not named. Read operations from the register file use LGKMcnt to track when the read has completed. I guess all of this could be implemented by modelling them as individual registers, but my hunch is that it would be more complicated to implement for no practical benefit.
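
To make the "acts like memory" point concrete, here is a toy C++ model of a register file that is addressed by a byte offset and accessed in 32-bit or 64-bit units. Everything in it is hypothetical: the class name, the 64-byte size, and in particular the assumption that the add operation returns the value seen before the update. It does not describe the real hardware behaviour.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace toy {

class RegisterFile {
public:
  // Read 32 or 64 bits starting at a byte offset into the file.
  template <typename T> T read(std::size_t Offset) const {
    static_assert(sizeof(T) == 4 || sizeof(T) == 8,
                  "32-bit or 64-bit accesses only");
    assert(Offset + sizeof(T) <= Bytes.size() && "offset out of range");
    T Value;
    std::memcpy(&Value, Bytes.data() + Offset, sizeof(T));
    return Value;
  }

  // Write 32 or 64 bits starting at a byte offset into the file.
  template <typename T> void write(std::size_t Offset, T Value) {
    static_assert(sizeof(T) == 4 || sizeof(T) == 8,
                  "32-bit or 64-bit accesses only");
    assert(Offset + sizeof(T) <= Bytes.size() && "offset out of range");
    std::memcpy(Bytes.data() + Offset, &Value, sizeof(T));
  }

  // Read-modify-write that hands back the value seen before the add.
  // Returning the old value is an assumption of this sketch.
  std::uint32_t addAndReturnOld(std::size_t Offset, std::uint32_t Operand) {
    const std::uint32_t Old = read<std::uint32_t>(Offset);
    write<std::uint32_t>(Offset, Old + Operand);
    return Old;
  }

private:
  // No named registers, just zero-initialised storage; the size is arbitrary.
  std::array<std::uint8_t, 64> Bytes{};
};

} // namespace toy
```

Nothing in the model names an individual register; callers only ever supply an offset, which is the property that makes an address space a natural fit.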

nhaehnle added inline comments. Mar 16 2023, 1:48 AM

llvm/docs/AMDGPUUsage.rst
678	I agree with @arsenm, we need to document our choice here to reduce the risk of conflicts. Yes, in theory we could renumber if that happens, but isn't it better to try to prevent the problem in the first place? It's also helpful as internal documentation for anybody working on the backend.
788	Yeah, this is just a case of unlucky naming due to different worlds colliding. My understanding is that HW people ended up calling it "registers" because the actual HW implementation looks more like (GRBM) registers than memory, but from the shader code's perspective it very much behaves like memory.

foad added inline comments. Mar 16 2023, 3:48 AM

llvm/lib/Target/AMDGPU/AMDGPU.h
363	> I am assuming we will end up with more internal-only address spaces similar to this one.
	For example we currently use a GWSResource PseudoSourceValue to represent GWS sync resources, and they end up in address space 0, which is just plain wrong. It would be better to use a dedicated internal-only address space for them, at which point the PSV no longer adds any value and we could get rid of it.

Moved the new address space to 128, and clarified that it is an internal address space that can be freely renumbered.

It seems arc didn't want to update the revision Summary from my commit message, so I'll add this as a comment:
I'm also planning to add this paragraph to the commit message:

[...]
Since this address space is meant to be compiler-internal, we move it
up a bit from the other address spaces and give it the number 128.
According to the LLVM Language Reference, address space numbers can go
all the way up to 2^24, but I'm not sure how well this is supported in
practice [1], so using a smaller number seems safer.

[1] https://github.com/llvm/llvm-project/blob/0107513fe79da7670e37c29c0862794a2213a89c/llvm/utils/TableGen/IntrinsicEmitter.cpp#L401

I wonder if you need to update getAliasResult in AMDGPUAliasAnalysis for the new address space.

Harbormaster completed remote builds in B220422: Diff 506559. Mar 20 2023, 6:41 AM

In D146031#4206072, @foad wrote:

I wonder if you need to update getAliasResult in AMDGPUAliasAnalysis for the new address space.

Hm, interesting, I'm not even sure how we create MemoryLocations for these intrinsics. I'll have a look.

scott.linder added a subscriber: scott.linder. Mar 20 2023, 2:15 PM

scott.linder added inline comments.

llvm/docs/AMDGPUUsage.rst
678	I would also tend to agree that we should document it here, as the alternative is more confusing (i.e. seeing it documented but not needing it is worse than noticing it exists and finding no documentation). The fact that it currently exists only transitively during ISEL also seems like an implementation detail, and so I don't think it must dictate whether we document it.
788	Is it worth noting the tenuous use of the term "registers" in the doc being commented on? Something like: Dedicated registers used by the GS NGG Streamout Instructions. The register file is modelled as a memory in a distinct address space because it is indexed by an address-like offset in place of named registers, and because register accesses affect LGKMcnt.

foad added inline comments. Mar 21 2023, 2:10 AM

llvm/docs/AMDGPUUsage.rst
678	> seeing it documented but not needing it
	The risk is seeing it documented and trying to access it from IR and finding that that "doesn't work".

scott.linder added inline comments. Mar 21 2023, 8:44 AM

llvm/docs/AMDGPUUsage.rst
678	We can document it as we already do Generic, stating when it is meaningful. Or we can reserve a range of values for "internal" purposes and document them in another table, so there is no chance for confusion?

foad added inline comments. Mar 21 2023, 9:01 AM

llvm/docs/AMDGPUUsage.rst
678	Sure, if the documentation is clear then I have no objection to documenting it. Do you know what the failure mode is if you try to compile IR that uses an invalid or unsupported address space?

rovka added inline comments. Mar 22 2023, 2:16 AM

llvm/docs/AMDGPUUsage.rst
678	I think it only fails during instruction selection. At the IR level we don't seem to have a problem with unknown address spaces, in fact there's been some effort to let them just pass through. We have a handful of tests using addrspace(999), e.g. to check that AA treats it conservatively. In our case, I could probably teach AMDGPU AA that calls to these GS intrinsics don't affect any pointers (WIP), but I'm not sure what we ought to do about IR pointers to internal address spaces. Should we error out anywhere before instruction selection, or just leave them undisturbed through the middle-end and ICE later on in isel? Should we check this in the IRVerifier? Other thoughts?

foad added inline comments. Mar 22 2023, 2:40 AM

llvm/docs/AMDGPUUsage.rst
678	"failed to select" seems like a reasonable failure mode for now. Adding some kind of checks in the IR verifier sounds nice but I don't know what would be appropriate there. Do we even have a notion of which address spaces are valid for a particular target?

rovka added inline comments. Mar 22 2023, 3:10 AM

llvm/docs/AMDGPUUsage.rst
678	> "failed to select" seems like a reasonable failure mode for now.
	Ok :)
	> Adding some kind of checks in the IR verifier sounds nice but I don't know what would be appropriate there. Do we even have a notion of which address spaces are valid for a particular target?
	I'm not aware of any. I skimmed a bit through the IR verifier and there's no support for validating pointers in general, but we usually have a Triple around so we could at least check that we don't load/store from internal address spaces for AMDGPU triples. How does that sound?
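
The check floated in this comment could look roughly like the sketch below. It is purely illustrative and uses no LLVM types: `isInternalAddressSpace`, `isAMDGPUTriple` and `isPointerAddressSpaceAllowed` are hypothetical helpers, and matching the triple on an `amdgcn` prefix is a simplifying assumption. The author reports further down that the verifier idea was abandoned, so nothing like this landed with the revision.

```cpp
#include <cassert>
#include <string_view>

namespace toy {

// The only internal address space named in this revision.
constexpr unsigned StreamoutRegisterAS = 128;

// Hypothetical predicate for compiler-internal address spaces.
constexpr bool isInternalAddressSpace(unsigned AS) {
  return AS == StreamoutRegisterAS;
}

// Hypothetical triple test; a real check would inspect a parsed triple.
constexpr bool isAMDGPUTriple(std::string_view Triple) {
  return Triple.substr(0, 6) == "amdgcn";
}

// Reject IR pointers into internal address spaces, but only for AMDGPU
// triples; on other targets the number has no special meaning.
constexpr bool isPointerAddressSpaceAllowed(std::string_view Triple,
                                            unsigned AS) {
  return !(isAMDGPUTriple(Triple) && isInternalAddressSpace(AS));
}

// Unknown numbers such as 999 still pass through, as discussed above.
static_assert(isPointerAddressSpaceAllowed("amdgcn-amd-amdhsa", 999));
static_assert(!isPointerAddressSpaceAllowed("amdgcn-amd-amdhsa", 128));

} // namespace toy
```

The sketch also shows the maintenance cost raised later in the thread: the number 128 has to be repeated outside the backend's own enum for such a check to work.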

I have expanded the section in AMDGPUUsage.rst. Hopefully this clarifies the intention, but do let me know if anything could use rephrasing.

I had a stab at catching IR pointers to this address space in the IR Verifier, but AFAICT we shouldn't import lib/Target/AMDGPU/AMDGPU.h there, so we wouldn't have access to our address space enum. I don't think it's worth the effort of maintaining the internal address space numbering in yet another place, so I'm abandoning that idea.

Regarding the alias info for these intrinsics - we don't seem to be doing anything special for other intrinsics (we're not overriding AAResult::getModRefInfo at all). So that should probably be a different patch anyway.

Harbormaster completed remote builds in B221946: Diff 508567. Mar 27 2023, 5:17 AM

arsenm accepted this revision. Mar 31 2023, 2:58 PM

This revision is now accepted and ready to land. Mar 31 2023, 2:58 PM

This revision was landed with ongoing or failed builds. Apr 11 2023, 2:12 AM

Closed by commit rGd9bf8aba2371: [AMDGPU] Add MMOs for GFX11 Streamout Instructions (authored by rovka).

This revision was automatically updated to reflect the committed changes.

rovka added a commit: rGd9bf8aba2371: [AMDGPU] Add MMOs for GFX11 Streamout Instructions.

Revision Contents

llvm/docs/AMDGPUUsage.rst: 8 lines
llvm/include/llvm/IR/IntrinsicsAMDGPU.td: 6 lines
llvm/lib/Target/AMDGPU/AMDGPU.h: 4 lines
llvm/lib/Target/AMDGPU/SIISelLowering.cpp: 9 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.add.gs.reg.rtn.ll: 22 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.sub.gs.reg.rtn.ll: 22 lines

Diff 512374

llvm/docs/AMDGPUUsage.rst

[...] context: .. table:: AMDGPU Address Spaces
  Generic                           0               flat        flat             64      0x0000000000000000
  Global                            1               global      global           64      0x0000000000000000
  Region                            2               N/A         GDS              32      not implemented for AMDHSA
  Local                             3               group       LDS              32      0xFFFFFFFF
  Constant                          4               constant    same as global   64      0x0000000000000000
  Private                           5               private     scratch          32      0xFFFFFFFF
  Constant 32-bit                   6               TODO                                 0x00000000
  Buffer Fat Pointer (experimental) 7               TODO
+ Streamout Registers               128             N/A         GS_REGS
  ================================= =============== =========== ================ ======= ============================

  Generic
  The generic address space is supported unless the Target Properties column
  of :ref:`amdgpu-processor-table` specifies *Does not support generic address
  space*.

  The generic address space uses the hardware flat address support for two fixed
[...] context: Buffer Fat Pointer
  The buffer fat pointer is an experimental address space that is currently
  unsupported in the backend. It exposes a non-integral pointer that is in
  the future intended to support the modelling of 128-bit buffer descriptors
  plus a 32-bit offset into the buffer (in total encapsulating a 160-bit
  pointer), allowing normal LLVM load/store/atomic operations to be used to
  model the buffer descriptors used heavily in graphics workloads targeting
  the backend.

+ Streamout Registers
+ Dedicated registers used by the GS NGG Streamout Instructions. The register
+ file is modelled as a memory in a distinct address space because it is indexed
+ by an address-like offset in place of named registers, and because register
+ accesses affect LGKMcnt. This is an internal address space used only by the
+ compiler. Do not use this address space for IR pointers.
+
  .. _amdgpu-memory-scopes:

  Memory Scopes
  -------------

  This section provides LLVM memory synchronization scopes supported by the AMDGPU
  backend memory model when the target triple OS is ``amdhsa`` (see
  :ref:`amdgpu-amdhsa-memory-model` and :ref:`amdgpu-target-triples`).
[...]

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

[...]
  def int_amdgcn_permlane64 :
    ClangBuiltin<"__builtin_amdgcn_permlane64">,
    Intrinsic<[llvm_i32_ty], [llvm_i32_ty],
              [IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>;

  def int_amdgcn_ds_add_gs_reg_rtn :
    ClangBuiltin<"__builtin_amdgcn_ds_add_gs_reg_rtn">,
    Intrinsic<[llvm_anyint_ty], [llvm_i32_ty, llvm_i32_ty],
-             [ImmArg<ArgIndex<1>>, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree]>;
+             [ImmArg<ArgIndex<1>>, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree],
+             "", [SDNPMemOperand]>;

  def int_amdgcn_ds_sub_gs_reg_rtn :
    ClangBuiltin<"__builtin_amdgcn_ds_sub_gs_reg_rtn">,
    Intrinsic<[llvm_anyint_ty], [llvm_i32_ty, llvm_i32_ty],
-             [ImmArg<ArgIndex<1>>, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree]>;
+             [ImmArg<ArgIndex<1>>, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree],
+             "", [SDNPMemOperand]>;

  def int_amdgcn_ds_bvh_stack_rtn :
    Intrinsic<
      [llvm_i32_ty, llvm_i32_ty], // %vdst, %addr
      [
        llvm_i32_ty, // %addr
        llvm_i32_ty, // %data0
        llvm_v4i32_ty, // %data1
[...]

llvm/lib/Target/AMDGPU/AMDGPU.h

[...]

  /// OpenCL uses address spaces to differentiate between
  /// various memory regions on the hardware. On the CPU
  /// all of the address spaces point to the same memory,
  /// however on the GPU, each address space points to
  /// a separate piece of memory that is unique from other
  /// memory locations.
  namespace AMDGPUAS {
  enum : unsigned {
    // The maximum value for flat, generic, local, private, constant and region.
    MAX_AMDGPU_ADDRESS = 7,

    FLAT_ADDRESS = 0, ///< Address space for flat memory.
    GLOBAL_ADDRESS = 1, ///< Address space for global memory (RAT0, VTX0).
    REGION_ADDRESS = 2, ///< Address space for region memory. (GDS)

    CONSTANT_ADDRESS = 4, ///< Address space for constant memory (VTX2).
    LOCAL_ADDRESS = 3, ///< Address space for local memory.
    PRIVATE_ADDRESS = 5, ///< Address space for private memory.

    CONSTANT_ADDRESS_32BIT = 6, ///< Address space for 32-bit constant memory.

    BUFFER_FAT_POINTER = 7, ///< Address space for 160-bit buffer fat pointers.

+   /// Internal address spaces. Can be freely renumbered.
+   STREAMOUT_REGISTER = 128, ///< Address space for GS NGG Streamout registers.
+   /// end Internal address spaces.
+
    /// Address space for direct addressable parameter memory (CONST0).
    PARAM_D_ADDRESS = 6,
    /// Address space for indirect addressable parameter memory (VTX1).
    PARAM_I_ADDRESS = 7,

    // Do not re-order the CONSTANT_BUFFER_* enums. Several places depend on
    // this order to be able to dynamically index a constant buffer, for
    // example:
[...]

llvm/lib/Target/AMDGPU/SIISelLowering.cpp


   case Intrinsic::amdgcn_buffer_atomic_fadd: {
     Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;

     const ConstantInt *Vol = dyn_cast<ConstantInt>(CI.getOperand(4));
     if (!Vol || !Vol->isZero())
       Info.flags |= MachineMemOperand::MOVolatile;

     return true;
   }
+  case Intrinsic::amdgcn_ds_add_gs_reg_rtn:
+  case Intrinsic::amdgcn_ds_sub_gs_reg_rtn: {
+    Info.opc = ISD::INTRINSIC_W_CHAIN;
+    Info.memVT = MVT::getVT(CI.getOperand(0)->getType());
+    Info.ptrVal = nullptr;
+    Info.fallbackAddressSpace = AMDGPUAS::STREAMOUT_REGISTER;
+    Info.flags = MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
+    return true;
+  }
   case Intrinsic::amdgcn_ds_append:
   case Intrinsic::amdgcn_ds_consume: {
     Info.opc = ISD::INTRINSIC_W_CHAIN;
     Info.memVT = MVT::getVT(CI.getType());
     Info.ptrVal = CI.getOperand(0);
     Info.align.reset();
     Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;


llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.add.gs.reg.rtn.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -global-isel=0 -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s
 ; RUN: llc -global-isel=1 -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s

 declare i32 @llvm.amdgcn.ds.add.gs.reg.rtn.i32(i32, i32 immarg)
 declare i64 @llvm.amdgcn.ds.add.gs.reg.rtn.i64(i32, i32 immarg)

 define amdgpu_gs void @test_add_32(i32 %arg) {
 ; CHECK-LABEL: test_add_32:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
 ; CHECK-NEXT: ds_add_gs_reg_rtn v[0:1], v0 offset:16 gds
-; CHECK-NEXT: s_waitcnt lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
-; CHECK-NEXT: buffer_gl0_inv
-; CHECK-NEXT: buffer_gl1_inv
 ; CHECK-NEXT: s_endpgm
   %unused = call i32 @llvm.amdgcn.ds.add.gs.reg.rtn.i32(i32 %arg, i32 16)
   ret void
 }

 define amdgpu_gs void @test_add_32_use(i32 %arg, ptr addrspace(1) %out) {
 ; CHECK-LABEL: test_add_32_use:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
 ; CHECK-NEXT: ds_add_gs_reg_rtn v[3:4], v0 offset:16 gds
 ; CHECK-NEXT: s_waitcnt lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
-; CHECK-NEXT: buffer_gl0_inv
-; CHECK-NEXT: buffer_gl1_inv
 ; CHECK-NEXT: global_store_b32 v[1:2], v3, off
 ; CHECK-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
 ; CHECK-NEXT: s_endpgm
   %res = call i32 @llvm.amdgcn.ds.add.gs.reg.rtn.i32(i32 %arg, i32 16)
   store i32 %res, ptr addrspace(1) %out, align 4
   ret void
 }

 define amdgpu_gs void @test_add_64(i32 %arg) {
 ; CHECK-LABEL: test_add_64:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
 ; CHECK-NEXT: ds_add_gs_reg_rtn v[0:1], v0 offset:32 gds
-; CHECK-NEXT: s_waitcnt lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
-; CHECK-NEXT: buffer_gl0_inv
-; CHECK-NEXT: buffer_gl1_inv
 ; CHECK-NEXT: s_endpgm
   %unused = call i64 @llvm.amdgcn.ds.add.gs.reg.rtn.i64(i32 %arg, i32 32)
   ret void
 }

 define amdgpu_gs void @test_add_64_use(i32 %arg, ptr addrspace(1) %out) {
 ; CHECK-LABEL: test_add_64_use:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
 ; CHECK-NEXT: ds_add_gs_reg_rtn v[3:4], v0 offset:32 gds
 ; CHECK-NEXT: s_waitcnt lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
-; CHECK-NEXT: buffer_gl0_inv
-; CHECK-NEXT: buffer_gl1_inv
 ; CHECK-NEXT: global_store_b64 v[1:2], v[3:4], off
 ; CHECK-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
 ; CHECK-NEXT: s_endpgm
   %res = call i64 @llvm.amdgcn.ds.add.gs.reg.rtn.i64(i32 %arg, i32 32)
   store i64 %res, ptr addrspace(1) %out, align 4
   ret void
 }

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.sub.gs.reg.rtn.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -global-isel=0 -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s
 ; RUN: llc -global-isel=1 -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s

 declare i32 @llvm.amdgcn.ds.sub.gs.reg.rtn.i32(i32, i32 immarg)
 declare i64 @llvm.amdgcn.ds.sub.gs.reg.rtn.i64(i32, i32 immarg)

 define amdgpu_gs void @test_sub_32(i32 %arg) {
 ; CHECK-LABEL: test_sub_32:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
 ; CHECK-NEXT: ds_sub_gs_reg_rtn v[0:1], v0 offset:16 gds
-; CHECK-NEXT: s_waitcnt lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
-; CHECK-NEXT: buffer_gl0_inv
-; CHECK-NEXT: buffer_gl1_inv
 ; CHECK-NEXT: s_endpgm
   %unused = call i32 @llvm.amdgcn.ds.sub.gs.reg.rtn.i32(i32 %arg, i32 16)
   ret void
 }

 define amdgpu_gs void @test_sub_32_use(i32 %arg, ptr addrspace(1) %out) {
 ; CHECK-LABEL: test_sub_32_use:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
 ; CHECK-NEXT: ds_sub_gs_reg_rtn v[3:4], v0 offset:16 gds
 ; CHECK-NEXT: s_waitcnt lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
-; CHECK-NEXT: buffer_gl0_inv
-; CHECK-NEXT: buffer_gl1_inv
 ; CHECK-NEXT: global_store_b32 v[1:2], v3, off
 ; CHECK-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
 ; CHECK-NEXT: s_endpgm
   %res = call i32 @llvm.amdgcn.ds.sub.gs.reg.rtn.i32(i32 %arg, i32 16)
   store i32 %res, ptr addrspace(1) %out, align 4
   ret void
 }

 define amdgpu_gs void @test_sub_64(i32 %arg) {
 ; CHECK-LABEL: test_sub_64:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
 ; CHECK-NEXT: ds_sub_gs_reg_rtn v[0:1], v0 offset:32 gds
-; CHECK-NEXT: s_waitcnt lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
-; CHECK-NEXT: buffer_gl0_inv
-; CHECK-NEXT: buffer_gl1_inv
 ; CHECK-NEXT: s_endpgm
   %unused = call i64 @llvm.amdgcn.ds.sub.gs.reg.rtn.i64(i32 %arg, i32 32)
   ret void
 }

 define amdgpu_gs void @test_sub_64_use(i32 %arg, ptr addrspace(1) %out) {
 ; CHECK-LABEL: test_sub_64_use:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
 ; CHECK-NEXT: ds_sub_gs_reg_rtn v[3:4], v0 offset:32 gds
 ; CHECK-NEXT: s_waitcnt lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
-; CHECK-NEXT: buffer_gl0_inv
-; CHECK-NEXT: buffer_gl1_inv
 ; CHECK-NEXT: global_store_b64 v[1:2], v[3:4], off
 ; CHECK-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
 ; CHECK-NEXT: s_endpgm
   %res = call i64 @llvm.amdgcn.ds.sub.gs.reg.rtn.i64(i32 %arg, i32 32)
   store i64 %res, ptr addrspace(1) %out, align 4
   ret void
 }