This is an archive of the discontinued LLVM Phabricator instance.

Differential D124884

[AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds
Closed, Public

Authored by rampitec on May 3 2022, 3:01 PM.

Details

Reviewers

arsenm

Commits

rG791ec1c68e3b: [AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds

Diff Detail

Unit Tests: Failed

	Time	Test
	60,040 ms	x64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp
	60,060 ms	x64 debian > libFuzzer.libFuzzer::large.test
	60,040 ms	x64 debian > lit.lit::max-failures.py

Event Timeline

rampitec created this revision. · May 3 2022, 3:01 PM

Herald added a project: Restricted Project. · May 3 2022, 3:01 PM

Herald added subscribers: hsmhsm, foad, kerbowa and 8 others.

rampitec requested review of this revision. · May 3 2022, 3:01 PM

Herald added a project: Restricted Project. · May 3 2022, 3:01 PM

Herald added a subscriber: wdng.

It uses a single mem operand with both load and store in addrspace(4). The addrspace(4) is common for all buffer intrinsics memops. In fact neither MemSDNode nor MemIntrinsicSDNode can have 2 mem ops. Only MachineSDNode and final MI can. I certainly do not want to create a MachineSDNode here and duplicate a lot of buffer operations logic. If we believe we really want 2 mem ops, these can be split in the FinalizeLowering.

It will also need to be rebased on top of D124550 to handle hazard between M0 initialization and LDS DMA.

Why does this need an intrinsic? I thought the whole point of the LDS DMA thing was an optimization the backend would perform and doesn't need to be exposed directly

In D124884#3489729, @arsenm wrote:

Why does this need an intrinsic? I thought the whole point of the LDS DMA thing was an optimization the backend would perform and doesn't need to be exposed directly

We cannot match this pattern. If you look at the addressing mode this is byzantine. Yet another addtid instruction on steroids.

arsenm added inline comments. · May 3 2022, 3:44 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4289	Why is this needed if it's in the MMO?
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1199–1200	Should use the return / data type?

Harbormaster completed remote builds in B162572: Diff 426850. · May 3 2022, 4:11 PM

rampitec added inline comments. · May 3 2022, 4:11 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4289	I believe to select based on the MMO I would need to write a complex pattern.

rampitec updated this revision to Diff 426876. · May 3 2022, 4:15 PM

rampitec marked an inline comment as done.

rampitec added inline comments.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1199–1200	Changed to return. What do you mean by 'use data type'?

arsenm added inline comments. · May 3 2022, 4:19 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1199–1200	You're looking at the pointer element type instead of the return / data type, i.e. we would have i8/i16/i32 return types and you don't need to look at the pointer
1199–1200	I just noticed there is no return type, so this is just introducing a dependency on typed pointers, which is a no-go. I don't actually see why we can't match these from the buffer intrinsic plus LDS access?

rampitec added inline comments. · May 3 2022, 4:20 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1199–1200	It does not return anything. This instruction does not have vdata. The only way to know the size is by looking at the overloaded LDS base pointer pointee.

rampitec added inline comments. · May 3 2022, 4:24 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1199–1200	`LDS address = LDS_base + LDS_offset + inst_offset + (TIDinWave * 4)`. We do not have TIDinWave, certainly not after selection. Even before selection it is extremely problematic. Why is a typed pointer a no-go if that works?
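
To make the formula quoted in this comment concrete, here is a minimal sketch that evaluates it for a single lane. The function name `ldsAddress` and its parameter names are hypothetical and are not part of LLVM or of any hardware API; only the arithmetic comes from the comment above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only (not an LLVM or hardware API).
// Evaluates the formula from the review comment for one lane:
//   LDS address = LDS_base + LDS_offset + inst_offset + (TIDinWave * 4)
inline std::uint32_t ldsAddress(std::uint32_t ldsBase, std::uint32_t ldsOffset,
                                std::uint32_t instOffset,
                                std::uint32_t tidInWave) {
  return ldsBase + ldsOffset + instOffset + tidInWave * 4;
}
```

Under this formula consecutive lanes land 4 bytes apart, which is the TID-dependent term that, according to the comment, cannot be expressed by matching an ordinary buffer load plus an LDS store.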

rampitec added inline comments. · May 3 2022, 4:33 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1199–1200	On top of that, MEM_ADDR also depends on the TID. It is not the same address that a normal buffer_load would use with the same operands.

arsenm added inline comments. · May 3 2022, 4:34 PM

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1268	This should be addrspace 3 pointers only also
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1199–1200	Pointee types have been removed from the IR. If this really needs the type it would need to use an attribute on the parameter to carry it which may be new territory

rampitec added inline comments. · May 3 2022, 4:44 PM

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1268	It cannot be overloaded on the pointee type, infrastructure limitation. We are using llvm_anyptr_ty everywhere in such context. If in turn we cannot use pointee types at all then this could be non overloaded pointer to void in addrspace 3.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1199–1200	It does not really need type but it needs size. I can add immediate to the intrinsic and switch to void* for LDS base.

To confirm: is that OK to add yet another imm to the end of operands of the intrinsic to select a byte size? And then remove the overload. If yes I will do it tomorrow.

In D124884#3489967, @rampitec wrote:

To confirm: is that OK to add yet another imm to the end of operands of the intrinsic to select a byte size? And then remove the overload. If yes I will do it tomorrow.

Or maybe after the pointer, although it is less convenient for lowering.

Harbormaster completed remote builds in B162590: Diff 426876. · May 3 2022, 5:45 PM

In D124884#3489967, @rampitec wrote:

To confirm: is that OK to add yet another imm to the end of operands of the intrinsic to select a byte size? And then remove the overload. If yes I will do it tomorrow.

There's the elementtype attribute for this case which some arm intrinsics seem to be using. Not sure how you're supposed to define an intrinsic to use it though

In D124884#3490643, @arsenm wrote:

In D124884#3489967, @rampitec wrote:

To confirm: is that OK to add yet another imm to the end of operands of the intrinsic to select a byte size? And then remove the overload. If yes I will do it tomorrow.

There's the elementtype attribute for this case which some arm intrinsics seem to be using. Not sure how you're supposed to define an intrinsic to use it though

Apparently this isn't well developed but works. The verifier is hardcoding these intrinsics (it's also looking at the call site instead of the intrinsic declaration attributes)

In D124884#3490649, @arsenm wrote:

In D124884#3490643, @arsenm wrote:

In D124884#3489967, @rampitec wrote:

To confirm: is that OK to add yet another imm to the end of operands of the intrinsic to select a byte size? And then remove the overload. If yes I will do it tomorrow.

There's the elementtype attribute for this case which some arm intrinsics seem to be using. Not sure how you're supposed to define an intrinsic to use it though

Apparently this isn't well developed but works. The verifier is hardcoding these intrinsics (it's also looking at the call site instead of the intrinsic declaration attributes)

What's wrong with the idea of an i32 imm %size argument? That seems to me more in line with the philosophy of caring less about types.

In D124884#3490731, @nhaehnle wrote:

What's wrong with the idea of an i32 imm %size argument? That seems to me more in line with the philosophy of caring less about types.

I'd prefer to keep any intrinsics that look like a load or store to look more like the regular load or store instructions. All of these arbitrary immediate parameters are uglier (e.g. the memory ordering arguments that don't actually work on some of the atomics)

In D124884#3490643, @arsenm wrote:

In D124884#3489967, @rampitec wrote:

To confirm: is that OK to add yet another imm to the end of operands of the intrinsic to select a byte size? And then remove the overload. If yes I will do it tomorrow.

There's the elementtype attribute for this case which some arm intrinsics seem to be using. Not sure how you're supposed to define an intrinsic to use it though

This will need a clang builtin to produce the attribute. I am not sure we really want to expose it as a clang builtin.

Herald added a subscriber: jsilvanus. · May 4 2022, 12:42 PM

In D124884#3490745, @arsenm wrote:

In D124884#3490731, @nhaehnle wrote:

What's wrong with the idea of an i32 imm %size argument? That seems to me more in line with the philosophy of caring less about types.

I'd prefer to keep any intrinsics that look like a load or store to look more like the regular load or store instructions. All of these arbitrary immediate parameters are uglier (e.g. the memory ordering arguments that don't actually work on some of the atomics)

It is more or less similar to memcpy, and memcpy uses size argument.

asroy added a subscriber: asroy. · May 4 2022, 2:30 PM

asroy added inline comments.

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.lds.ll
17	m0 holds the size of LDS, should we save the value of m0 before overwriting it, and write the value back before issuing ds_read?

arsenm added inline comments. · May 4 2022, 2:33 PM

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.lds.ll
17	Every user of m0 is supposed to set it itself, and we hopefully clean up the redundant rewrites. It's not something that's generally saved and restored per operation

rampitec added inline comments. · May 4 2022, 2:34 PM

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.lds.ll
17	DS_* instructions have not read M0 since gfx9, and these intrinsics are only available since gfx9. Moreover, on gfx8 and earlier, selection of DS opcodes takes care of M0 initialization right before the opcode.

Removed the overload and added i32 %size operand instead.
LDS pointer is i8 addrspace(3) now qualified with the address space.
Rebased on the change to handle hazards between m0 initialization and these operations.

Harbormaster completed remote builds in B162803: Diff 427150. · May 4 2022, 5:15 PM

rampitec added a child revision: D125034: [AMDGPU] Add llvm.amdgcn.struct.buffer.load.lds intrinsic. · May 5 2022, 12:02 PM

In D124884#3489729, @arsenm wrote:

Why does this need an intrinsic? I thought the whole point of the LDS DMA thing was an optimization the backend would perform and doesn't need to be exposed directly

In D124884#3492464, @rampitec wrote:

Removed the overload and added i32 %size operand instead.

LDS pointer is i8 addrspace(3) now qualified with the address space.

Rebased on the change to handle hazards between m0 initialization and these operations.

Just to be clear, is your expectation that the intrinsic user saves and restores m0 before calling the buffer_load lds intrinsic?

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.lds.ll
17	Just to be clear, is your expectation that the intrinsic user saves and restores m0 before calling the buffer_load lds intrinsic?

In D124884#3496841, @ramjana wrote:

Just to be clear, is your expectation that the intrinsic user saves and restores m0 before calling the buffer_load lds intrinsic?

No, you do not have to.

Do not split voffset because inst_offset is applied to both VMEM and LDS address and voffset is not. Add a separate operand instead.
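
A small sketch of why the split is unsafe. The model below is a deliberate simplification and an assumption of this note, not the hardware definition: the memory side is treated as a plain sum of voffset, soffset, and inst_offset (buffer base, swizzling, indexing, and any per-lane term are omitted), and the LDS side follows the formula quoted earlier in this review. The names `DmaAddresses` and `modelAddresses` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified address model for illustration only.
struct DmaAddresses {
  std::uint32_t mem; // memory-side byte offset (simplified, assumption)
  std::uint32_t lds; // LDS-side byte address
};

inline DmaAddresses modelAddresses(std::uint32_t voffset, std::uint32_t soffset,
                                   std::uint32_t instOffset,
                                   std::uint32_t ldsOffset,
                                   std::uint32_t tidInWave) {
  DmaAddresses a;
  // voffset only contributes to the memory side.
  a.mem = voffset + soffset + instOffset;
  // inst_offset contributes to both sides.
  a.lds = ldsOffset + instOffset + tidInWave * 4;
  return a;
}
```

In this model, moving 4096 bytes out of voffset (4100 becomes 4) and into inst_offset leaves the memory-side sum unchanged but shifts the LDS address by 4096, which matches the reason given above for keeping voffset whole and passing the immediate offset as its own operand.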

rampitec mentioned this in D125034: [AMDGPU] Add llvm.amdgcn.struct.buffer.load.lds intrinsic. · May 10 2022, 12:09 PM

Harbormaster completed remote builds in B163751: Diff 428454. · May 10 2022, 1:54 PM

Removed support for wider than DWORD ops. See D125409.
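
As a rough illustration of what the i32 size operand selects once wider-than-DWORD operations are dropped, the sketch below classifies the byte sizes mentioned in this review (i8, i16, i32). The enum and function names are hypothetical, and the exact opcodes chosen by the backend are not shown in this part of the page.

```cpp
#include <cassert>

// Hypothetical classification for illustration only; the actual opcode
// selection lives in the backend and is not reproduced here.
enum class LdsLoadWidth { Byte, Short, Dword, Unsupported };

inline LdsLoadWidth classifyLdsLoadSize(unsigned sizeInBytes) {
  switch (sizeInBytes) {
  case 1:
    return LdsLoadWidth::Byte;
  case 2:
    return LdsLoadWidth::Short;
  case 4:
    return LdsLoadWidth::Dword;
  default:
    // Wider-than-DWORD sizes are no longer supported by this revision.
    return LdsLoadWidth::Unsupported;
  }
}
```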

Herald added a subscriber: kosarev. · May 11 2022, 1:39 PM

Harbormaster completed remote builds in B163972: Diff 428759. · May 11 2022, 4:22 PM

piotr added a subscriber: piotr. · May 13 2022, 2:31 AM

piotr added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4277–4279	What is preventing this from clobbering M0? There are intrinsics like int_amdgcn_interp* that have a dependency on M0. Shouldn't the code save the existing M0 and restore it after the load?

arsenm added inline comments. · May 13 2022, 2:55 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4277–4279	m0 isn't treated as a preserved value. Each user is supposed to initialize m0 itself
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8250	I don't see how / where this preserves the LDS bit
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
389	If you're going to rely on the memory operand, the verifier needs to start enforcing these have one memory operand (well, 2 actually with the same sizes)

Return false from getMemOperandsWithOffsetWidth() instead of checking mem operand.

Harbormaster completed remote builds in B164340: Diff 429278. · May 13 2022, 11:28 AM

Switched to direct select, which allows using 2 separate memory operands.
The patch now handles both raw and struct intrinsics.

rampitec marked 2 inline comments as done. · May 13 2022, 1:58 PM

rampitec added inline comments.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8250	It has a different number of operands compared to SIbuffer_load, so it selects into the _LDS versions of the opcodes. In fact, now that I have removed the offset split (because we cannot do it on one pointer only) and dropped multi-dword support, I am starting to think it might be better to drop SIbuffer_load_lds and its patterns and produce a MachineSDNode right here (as in D125279 for global load). It will not be so much code anymore, and I will be able to produce 2 separate memory operands.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
389	On second thought, it is better to just return false here. We cannot have a reasonable pointer here on either side anyway, and in fact even the 2 memory operands it should ideally have would be of different sizes for sub-dword operations. A load can be sub-dword, but the store is always extended to a dword.
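
The size mismatch described in the last comment can be sketched as follows. The struct and function are hypothetical names used only to restate the comment: the load side takes the requested byte size, while the LDS store side is always a full dword.

```cpp
#include <cassert>

// Hypothetical illustration of the two memory operand sizes described above.
struct LdsDmaMemSizes {
  unsigned loadBytes;  // size read from the buffer (1, 2, or 4)
  unsigned storeBytes; // size written to LDS
};

inline LdsDmaMemSizes memOperandSizes(unsigned sizeInBytes) {
  // The store is always extended to a dword, regardless of the load size.
  return {sizeInBytes, 4};
}
```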

Harbormaster completed remote builds in B164387: Diff 429347. · May 13 2022, 4:04 PM

arsenm added inline comments. · May 16 2022, 2:10 PM

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3080 ↗	(On Diff #429347)	Should return false, the verifier isn't enforcing this
3130 ↗	(On Diff #429347)	The verifier should probably be enforcing MMO ordering if you're going to rely on that
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8243	Ditto, verifier isn't enforcing this so shouldn't assert

rampitec marked 2 inline comments as done. · May 16 2022, 2:13 PM

rampitec added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3130 ↗	(On Diff #429347)	I do not rely on their order. Not anymore.

Changed asserts to cannot select.

Harbormaster completed remote builds in B164749: Diff 429847. · May 16 2022, 3:09 PM

arsenm accepted this revision. · May 17 2022, 9:03 AM

This revision is now accepted and ready to land. · May 17 2022, 9:03 AM

This revision was landed with ongoing or failed builds. · May 17 2022, 10:32 AM

Closed by commit rG791ec1c68e3b: [AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds (authored by rampitec).

This revision was automatically updated to reflect the committed changes.

rampitec added a commit: rG791ec1c68e3b: [AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds.

rampitec mentioned this in D125731: [AMDGPU] No need to wait before issuing LDS DMA. · May 17 2022, 10:51 AM

Revision Contents

Path	Size
llvm/include/llvm/IR/IntrinsicsAMDGPU.td	15 lines
llvm/lib/Target/AMDGPU/AMDGPUGISel.td	1 line
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h	2 lines
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp	50 lines
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp	29 lines
llvm/lib/Target/AMDGPU/ (five further entries, file names missing)	39 lines, 32 lines, 4 lines, 12 lines, 12 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.lds.ll	141 lines

Diff 426850

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

@@ class AMDGPUBufferAtomicFP : Intrinsic <
    llvm_i32_ty, // vindex(VGPR)
    llvm_i32_ty, // offset(SGPR/VGPR/imm)
    llvm_i1_ty], // slc(imm)
   [ImmArg<ArgIndex<4>>, IntrWillReturn], "", [SDNPMemOperand]>,
   AMDGPURsrcIntrinsic<1, 0>;

 // Legacy form of the intrinsic. raw and struct forms should be preferred.
 def int_amdgcn_buffer_atomic_fadd : AMDGPUBufferAtomicFP;

+class AMDGPURawBufferLoadLDS : Intrinsic <
+  [],
+  [llvm_v4i32_ty, // rsrc(SGPR)
+   llvm_anyptr_ty, // LDS base offset
    Inline comment (arsenm): This should be addrspace 3 pointers only also
    Inline comment (rampitec): It cannot be overloaded on the pointee type, infrastructure limitation. We are using llvm_anyptr_ty everywhere in such context. If in turn we cannot use pointee types at all then this could be non overloaded pointer to void in addrspace 3.
+   llvm_i32_ty, // offset(VGPR/imm, included in bounds checking and swizzling)
+   llvm_i32_ty, // soffset(SGPR/imm, excluded from bounds checking and swizzling)
+   llvm_i32_ty], // auxiliary data (imm, cachepolicy (bit 0 = glc,
+                 // bit 1 = slc,
+                 // bit 2 = dlc on gfx10+))
+                 // swizzled buffer (bit 3 = swz))
+  [IntrWillReturn, NoCapture<ArgIndex<1>>, ImmArg<ArgIndex<4>>], "", [SDNPMemOperand]>,
+  AMDGPURsrcIntrinsic<0>;
+def int_amdgcn_raw_buffer_load_lds : AMDGPURawBufferLoadLDS;

 } // defset AMDGPUBufferIntrinsics

 // Uses that do not set the done bit should set IntrWriteMem on the
 // call site.
 def int_amdgcn_exp : Intrinsic <[], [
   llvm_i32_ty, // tgt,
   llvm_i32_ty, // en
   llvm_any_ty, // src0 (f32 or i32)
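
The comment on the auxiliary data operand in the hunk above lists a bit layout (bit 0 = glc, bit 1 = slc, bit 2 = dlc on gfx10+, bit 3 = swz). A minimal decoding sketch, assuming only that layout; `AuxBits` and `decodeAux` are hypothetical names, not LLVM APIs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoder for the auxiliary data immediate, following the bit
// layout given in the TableGen comment above.
struct AuxBits {
  bool glc; // bit 0
  bool slc; // bit 1
  bool dlc; // bit 2 (gfx10+ according to the comment)
  bool swz; // bit 3, swizzled buffer
};

inline AuxBits decodeAux(std::uint32_t aux) {
  AuxBits bits;
  bits.glc = (aux & 1u) != 0;
  bits.slc = (aux & 2u) != 0;
  bits.dlc = (aux & 4u) != 0;
  bits.swz = (aux & 8u) != 0;
  return bits;
}
```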

llvm/lib/Target/AMDGPU/AMDGPUGISel.td

 def : GINodeEquiv<G_AMDGPU_ATOMIC_CMPXCHG, AMDGPUatomic_cmp_swap>;
 def : GINodeEquiv<G_AMDGPU_BUFFER_LOAD, SIbuffer_load>;
 def : GINodeEquiv<G_AMDGPU_BUFFER_LOAD_USHORT, SIbuffer_load_ushort>;
 def : GINodeEquiv<G_AMDGPU_BUFFER_LOAD_UBYTE, SIbuffer_load_ubyte>;
 def : GINodeEquiv<G_AMDGPU_BUFFER_LOAD_SSHORT, SIbuffer_load_short>;
 def : GINodeEquiv<G_AMDGPU_BUFFER_LOAD_SBYTE, SIbuffer_load_byte>;
 def : GINodeEquiv<G_AMDGPU_BUFFER_LOAD_FORMAT, SIbuffer_load_format>;
 def : GINodeEquiv<G_AMDGPU_BUFFER_LOAD_FORMAT_D16, SIbuffer_load_format_d16>;
+def : GINodeEquiv<G_AMDGPU_BUFFER_LOAD_LDS, SIbuffer_load_lds>;
 def : GINodeEquiv<G_AMDGPU_TBUFFER_LOAD_FORMAT, SItbuffer_load>;
 def : GINodeEquiv<G_AMDGPU_TBUFFER_LOAD_FORMAT_D16, SItbuffer_load_d16>;
 def : GINodeEquiv<G_AMDGPU_BUFFER_STORE, SIbuffer_store>;
 def : GINodeEquiv<G_AMDGPU_BUFFER_STORE_SHORT, SIbuffer_store_short>;
 def : GINodeEquiv<G_AMDGPU_BUFFER_STORE_BYTE, SIbuffer_store_byte>;
 def : GINodeEquiv<G_AMDGPU_BUFFER_STORE_FORMAT, SIbuffer_store_format>;
 def : GINodeEquiv<G_AMDGPU_BUFFER_STORE_FORMAT_D16, SIbuffer_store_format_d16>;
 def : GINodeEquiv<G_AMDGPU_TBUFFER_STORE_FORMAT, SItbuffer_store>;

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h

@@ Register fixStoreSourceType(MachineIRBuilder &B, Register VData,
     bool IsFormat) const;

 bool legalizeBufferStore(MachineInstr &MI, MachineRegisterInfo &MRI,
     MachineIRBuilder &B, bool IsTyped,
     bool IsFormat) const;
 bool legalizeBufferLoad(MachineInstr &MI, MachineRegisterInfo &MRI,
     MachineIRBuilder &B, bool IsFormat,
     bool IsTyped) const;
+bool legalizeBufferLoadLds(MachineInstr &MI, MachineRegisterInfo &MRI,
+    MachineIRBuilder &B) const;
 bool legalizeBufferAtomic(MachineInstr &MI, MachineIRBuilder &B,
     Intrinsic::ID IID) const;

 bool legalizeBVHIntrinsic(MachineInstr &MI, MachineIRBuilder &B) const;

 bool legalizeFPTruncRound(MachineInstr &MI, MachineIRBuilder &B) const;

 bool legalizeImageIntrinsic(

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

@@ else {
       B.buildMerge(Dst, Repack);
     }
   }

   MI.eraseFromParent();
   return true;
 }

+bool AMDGPULegalizerInfo::legalizeBufferLoadLds(MachineInstr &MI,
+                                                MachineRegisterInfo &MRI,
+                                                MachineIRBuilder &B) const {
+  MachineMemOperand *MMO = *MI.memoperands_begin();
+  const LLT MemTy = MMO->getMemoryType();
+  const LLT S32 = LLT::scalar(32);
+
+  Register RSrc = MI.getOperand(1).getReg();
+
+  // The struct intrinsic variants add one additional operand over raw.
+  const bool HasVIndex = MI.getNumOperands() == 7;
+  Register VIndex;
+  int OpOffset = 0;
+  if (HasVIndex) {
+    VIndex = MI.getOperand(3).getReg();
+    OpOffset = 1;
+  } else {
+    VIndex = B.buildConstant(S32, 0).getReg(0);
+  }
+
+  Register VOffset = MI.getOperand(3 + OpOffset).getReg();
+  Register SOffset = MI.getOperand(4 + OpOffset).getReg();
+  Register M0Val = MI.getOperand(2).getReg();
+
+  unsigned AuxiliaryData = MI.getOperand(5 + OpOffset).getImm();
+  unsigned ImmOffset;
+
+  std::tie(VOffset, ImmOffset) = splitBufferOffsets(B, VOffset);
+  updateBufferMMO(MMO, VOffset, SOffset, ImmOffset, VIndex, MRI);
+
+  B.buildInstr(AMDGPU::COPY)
+      .addDef(AMDGPU::M0)
    Inline comment (piotr): What is preventing this from clobbering M0? There are intrinsics like int_amdgcn_interp* that have a dependency on M0. Shouldn't the code save the existing M0 and restore it after the load?
    Inline comment (arsenm): m0 isn't treated as a preserved value. Each user is supposed to initialize m0 itself
+      .addUse(M0Val);
+  B.buildInstr(AMDGPU::G_AMDGPU_BUFFER_LOAD_LDS)
+      .addUse(RSrc) // rsrc
+      .addUse(VIndex) // vindex
+      .addUse(VOffset) // voffset
+      .addUse(SOffset) // soffset
+      .addImm(ImmOffset) // offset(imm)
+      .addImm(AuxiliaryData) // cachepolicy, swizzled buffer(imm)
+      .addImm(HasVIndex ? -1 : 0) // idxen(imm)
+      .addImm(MemTy.getSizeInBytes()) // data byte size
    Inline comment (arsenm): Why is this needed if it's in the MMO?
    Inline comment (rampitec): I believe to select based on the MMO I would need to write a complex pattern.
+      .addMemOperand(MMO);
+
+  MI.eraseFromParent();
+  return true;
+}
+
 bool AMDGPULegalizerInfo::legalizeAtomicIncDec(MachineInstr &MI,
                                                MachineIRBuilder &B,
                                                bool IsInc) const {
   unsigned Opc = IsInc ? AMDGPU::G_AMDGPU_ATOMIC_INC :
                          AMDGPU::G_AMDGPU_ATOMIC_DEC;
   B.buildInstr(Opc)
     .addDef(MI.getOperand(0).getReg())
     .addUse(MI.getOperand(2).getReg())
@@ bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
   case Intrinsic::amdgcn_struct_buffer_store_format:
     return legalizeBufferStore(MI, MRI, B, false, true);
   case Intrinsic::amdgcn_raw_tbuffer_store:
   case Intrinsic::amdgcn_struct_tbuffer_store:
     return legalizeBufferStore(MI, MRI, B, true, true);
   case Intrinsic::amdgcn_raw_buffer_load:
   case Intrinsic::amdgcn_struct_buffer_load:
     return legalizeBufferLoad(MI, MRI, B, false, false);
+  case Intrinsic::amdgcn_raw_buffer_load_lds:
+    return legalizeBufferLoadLds(MI, MRI, B);
   case Intrinsic::amdgcn_raw_buffer_load_format:
   case Intrinsic::amdgcn_struct_buffer_load_format:
     return legalizeBufferLoad(MI, MRI, B, true, false);
   case Intrinsic::amdgcn_raw_tbuffer_load:
   case Intrinsic::amdgcn_struct_tbuffer_load:
     return legalizeBufferLoad(MI, MRI, B, true, true);
   case Intrinsic::amdgcn_raw_buffer_atomic_swap:
   case Intrinsic::amdgcn_struct_buffer_atomic_swap:
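
The operand indexing in legalizeBufferLoadLds above (HasVIndex and OpOffset) can be summarized with a standalone sketch. It mirrors only the index arithmetic visible in the hunk; the struct and function names are hypothetical, and operand 0 is not touched by the hunk, so it is left out here.

```cpp
#include <cassert>

// Hypothetical summary of the operand positions read by the legalizer code
// above. A vindex value of 0 means "no vindex operand" (raw form).
struct BufferLoadLdsOperands {
  bool hasVIndex;
  unsigned rsrc;    // MI.getOperand(1)
  unsigned ldsBase; // MI.getOperand(2), copied into M0
  unsigned vindex;  // MI.getOperand(3) for struct, absent for raw
  unsigned voffset; // MI.getOperand(3 + OpOffset)
  unsigned soffset; // MI.getOperand(4 + OpOffset)
  unsigned aux;     // MI.getOperand(5 + OpOffset)
};

inline BufferLoadLdsOperands layoutFor(unsigned numOperands) {
  // As in the hunk: the struct variants carry one extra operand (7 total).
  const bool hasVIndex = numOperands == 7;
  const unsigned opOffset = hasVIndex ? 1u : 0u;
  BufferLoadLdsOperands ops;
  ops.hasVIndex = hasVIndex;
  ops.rsrc = 1;
  ops.ldsBase = 2;
  ops.vindex = hasVIndex ? 3u : 0u;
  ops.voffset = 3 + opOffset;
  ops.soffset = 4 + opOffset;
  ops.aux = 5 + opOffset;
  return ops;
}
```

In this diff only the raw intrinsic is dispatched to the function, so the seven-operand path corresponds to the struct form that the code comment anticipates.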

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp

Show First 20 Lines • Show All 2,858 Lines • ▼ Show 20 Lines	void AMDGPURegisterBankInfo::applyMappingImpl(
case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:		case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16:		case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16:
case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:		case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16: {		case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16: {
applyDefaultMapping(OpdMapper);		applyDefaultMapping(OpdMapper);
executeInWaterfallLoop(MI, MRI, {1, 4});		executeInWaterfallLoop(MI, MRI, {1, 4});
return;		return;
}		}
		case AMDGPU::G_AMDGPU_BUFFER_LOAD_LDS: {
		applyDefaultMapping(OpdMapper);
		executeInWaterfallLoop(MI, MRI, {0, 3});
		return;
		}
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
▲ Show 20 Lines • Show All 1,082 Lines • ▼ Show 20 Lines	case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16: {

// soffset		// soffset
OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);		OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);

// Any remaining operands are immediates and were correctly null		// Any remaining operands are immediates and were correctly null
// initialized.		// initialized.
break;		break;
}		}
		case AMDGPU::G_AMDGPU_BUFFER_LOAD_LDS: {
		// rsrc
		OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);

		// vindex
		OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);

		// voffset
		OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);

		// soffset
		OpdsMapping[3] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);

		// Any remaining operands are immediates and were correctly null
		// initialized.
		break;
		}
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:		case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
▲ Show 20 Lines • Show All 458 Lines • ▼ Show 20 Lines	case Intrinsic::amdgcn_raw_tbuffer_load: {
// FIXME: Should make intrinsic ID the last operand of the instruction,		// FIXME: Should make intrinsic ID the last operand of the instruction,
// then this would be the same as store		// then this would be the same as store
OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);		OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);		OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);		OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);		OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
break;		break;
}		}
		case Intrinsic::amdgcn_raw_buffer_load_lds: {
		OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
		OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
		OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
		OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
		break;
		}
case Intrinsic::amdgcn_raw_buffer_store:		case Intrinsic::amdgcn_raw_buffer_store:
case Intrinsic::amdgcn_raw_buffer_store_format:		case Intrinsic::amdgcn_raw_buffer_store_format:
case Intrinsic::amdgcn_raw_tbuffer_store: {		case Intrinsic::amdgcn_raw_tbuffer_store: {
OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);		OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);		OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);		OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);		OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
break;		break;

llvm/lib/Target/AMDGPU/BUFInstructions.td

	defm : MUBUF_LoadIntrinsicPat<SIbuffer_load, v3i32, "BUFFER_LOAD_DWORDX3">;			defm : MUBUF_LoadIntrinsicPat<SIbuffer_load, v3i32, "BUFFER_LOAD_DWORDX3">;
	defm : MUBUF_LoadIntrinsicPat<SIbuffer_load, v4f32, "BUFFER_LOAD_DWORDX4">;			defm : MUBUF_LoadIntrinsicPat<SIbuffer_load, v4f32, "BUFFER_LOAD_DWORDX4">;
	defm : MUBUF_LoadIntrinsicPat<SIbuffer_load, v4i32, "BUFFER_LOAD_DWORDX4">;			defm : MUBUF_LoadIntrinsicPat<SIbuffer_load, v4i32, "BUFFER_LOAD_DWORDX4">;
	defm : MUBUF_LoadIntrinsicPat<SIbuffer_load_byte, i32, "BUFFER_LOAD_SBYTE">;			defm : MUBUF_LoadIntrinsicPat<SIbuffer_load_byte, i32, "BUFFER_LOAD_SBYTE">;
	defm : MUBUF_LoadIntrinsicPat<SIbuffer_load_short, i32, "BUFFER_LOAD_SSHORT">;			defm : MUBUF_LoadIntrinsicPat<SIbuffer_load_short, i32, "BUFFER_LOAD_SSHORT">;
	defm : MUBUF_LoadIntrinsicPat<SIbuffer_load_ubyte, i32, "BUFFER_LOAD_UBYTE">;			defm : MUBUF_LoadIntrinsicPat<SIbuffer_load_ubyte, i32, "BUFFER_LOAD_UBYTE">;
	defm : MUBUF_LoadIntrinsicPat<SIbuffer_load_ushort, i32, "BUFFER_LOAD_USHORT">;			defm : MUBUF_LoadIntrinsicPat<SIbuffer_load_ushort, i32, "BUFFER_LOAD_USHORT">;

				multiclass MUBUF_LoadLDSIntrinsicPat<SDPatternOperator ld, int size, string opcode> {
				def : GCNPat<
				(ld v4i32:$rsrc, 0, 0, i32:$soffset, timm:$offset,
				timm:$auxiliary, 0, (i32 size)),
				(!cast<MUBUF_Pseudo>(opcode # _OFFSET) SReg_128:$rsrc, SCSrc_b32:$soffset, (as_i16timm $offset),
				(extract_cpol $auxiliary), (extract_swz $auxiliary))
				>;

				def : GCNPat<
				(ld v4i32:$rsrc, 0, i32:$voffset, i32:$soffset, timm:$offset,
				timm:$auxiliary, 0, (i32 size)),
				(!cast<MUBUF_Pseudo>(opcode # _OFFEN) VGPR_32:$voffset, SReg_128:$rsrc, SCSrc_b32:$soffset, (as_i16timm $offset),
				(extract_cpol $auxiliary), (extract_swz $auxiliary))
				>;

				def : GCNPat<
				(ld v4i32:$rsrc, i32:$vindex, 0, i32:$soffset, timm:$offset,
				timm:$auxiliary, timm, (i32 size)),
				(!cast<MUBUF_Pseudo>(opcode # _IDXEN) VGPR_32:$vindex, SReg_128:$rsrc, SCSrc_b32:$soffset, (as_i16timm $offset),
				(extract_cpol $auxiliary), (extract_swz $auxiliary))
				>;

				def : GCNPat<
				(ld v4i32:$rsrc, i32:$vindex, i32:$voffset, i32:$soffset, timm:$offset,
				timm:$auxiliary, timm, (i32 size)),
				(!cast<MUBUF_Pseudo>(opcode # _BOTHEN)
				(REG_SEQUENCE VReg_64, VGPR_32:$vindex, sub0, VGPR_32:$voffset, sub1),
				SReg_128:$rsrc, SCSrc_b32:$soffset, (as_i16timm $offset),
				(extract_cpol $auxiliary), (extract_swz $auxiliary))
				>;
				}

				defm : MUBUF_LoadLDSIntrinsicPat<SIbuffer_load_lds, 4, "BUFFER_LOAD_DWORD_LDS">;
				defm : MUBUF_LoadLDSIntrinsicPat<SIbuffer_load_lds, 8, "BUFFER_LOAD_DWORDX2_LDS">;
				defm : MUBUF_LoadLDSIntrinsicPat<SIbuffer_load_lds, 12, "BUFFER_LOAD_DWORDX3_LDS">;
				defm : MUBUF_LoadLDSIntrinsicPat<SIbuffer_load_lds, 16, "BUFFER_LOAD_DWORDX4_LDS">;
				defm : MUBUF_LoadLDSIntrinsicPat<SIbuffer_load_lds, 1, "BUFFER_LOAD_UBYTE_LDS">;
				defm : MUBUF_LoadLDSIntrinsicPat<SIbuffer_load_lds, 2, "BUFFER_LOAD_USHORT_LDS">;
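The four patterns in `MUBUF_LoadLDSIntrinsicPat` together with the six `defm` lines amount to a two-step lookup: the byte size picks the opcode family, and the presence of `vindex` / `voffset` picks the addressing-mode suffix. A minimal C++ sketch of that lookup follows. The helper names (`ldsOpcodeBase`, `addrModeSuffix`, `selectLdsPseudo`) are hypothetical and are not part of the patch; "has vindex/voffset" here means the operand is not the constant 0 that the patterns match.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: byte size -> opcode family, as listed in the defm lines.
inline std::string ldsOpcodeBase(unsigned SizeBytes) {
  switch (SizeBytes) {
  case 1:  return "BUFFER_LOAD_UBYTE_LDS";
  case 2:  return "BUFFER_LOAD_USHORT_LDS";
  case 4:  return "BUFFER_LOAD_DWORD_LDS";
  case 8:  return "BUFFER_LOAD_DWORDX2_LDS";
  case 12: return "BUFFER_LOAD_DWORDX3_LDS";
  case 16: return "BUFFER_LOAD_DWORDX4_LDS";
  default: throw std::invalid_argument("unsupported LDS DMA size");
  }
}

// Hypothetical helper: which of the four GCNPat variants applies.
inline std::string addrModeSuffix(bool HasVIndex, bool HasVOffset) {
  if (HasVIndex && HasVOffset) return "_BOTHEN"; // vindex and voffset in a VReg_64
  if (HasVIndex)               return "_IDXEN";  // vindex only
  if (HasVOffset)              return "_OFFEN";  // voffset only
  return "_OFFSET";                              // neither; soffset + immediate
}

// Full pseudo name, e.g. "BUFFER_LOAD_DWORD_LDS_OFFEN".
inline std::string selectLdsPseudo(unsigned SizeBytes, bool HasVIndex,
                                   bool HasVOffset) {
  return ldsOpcodeBase(SizeBytes) + addrModeSuffix(HasVIndex, HasVOffset);
}
```

For instance, `selectLdsPseudo(4, false, true)` yields `BUFFER_LOAD_DWORD_LDS_OFFEN`, which corresponds to the `buffer_load_dword v0, s[0:3], 0 offen lds` check line in the test file.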

	multiclass MUBUF_StoreIntrinsicPat<SDPatternOperator name, ValueType vt,			multiclass MUBUF_StoreIntrinsicPat<SDPatternOperator name, ValueType vt,
	string opcode, ValueType memoryVt = vt> {			string opcode, ValueType memoryVt = vt> {
	defvar st = !if(!eq(memoryVt, vt), name, mubuf_intrinsic_store<name, memoryVt>);			defvar st = !if(!eq(memoryVt, vt), name, mubuf_intrinsic_store<name, memoryVt>);

	def : GCNPat<			def : GCNPat<
	(st vt:$vdata, v4i32:$rsrc, 0, 0, i32:$soffset, timm:$offset,			(st vt:$vdata, v4i32:$rsrc, 0, 0, i32:$soffset, timm:$offset,
	timm:$auxiliary, 0),			timm:$auxiliary, 0),
	(!cast<MUBUF_Pseudo>(opcode # _OFFSET_exact) getVregSrcForVT<vt>.ret:$vdata, SReg_128:$rsrc, SCSrc_b32:$soffset, (as_i16timm $offset),			(!cast<MUBUF_Pseudo>(opcode # _OFFSET_exact) getVregSrcForVT<vt>.ret:$vdata, SReg_128:$rsrc, SCSrc_b32:$soffset, (as_i16timm $offset),

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

		if (Attr.hasFnAttr(Attribute::ReadOnly)) {
ISD::INTRINSIC_W_CHAIN;		ISD::INTRINSIC_W_CHAIN;
Info.memVT = MVT::getVT(CI.getArgOperand(0)->getType());		Info.memVT = MVT::getVT(CI.getArgOperand(0)->getType());
Info.flags |= MachineMemOperand::MOLoad |		Info.flags |= MachineMemOperand::MOLoad |
MachineMemOperand::MOStore |		MachineMemOperand::MOStore |
MachineMemOperand::MODereferenceable;		MachineMemOperand::MODereferenceable;

// XXX - Should this be volatile without known ordering?		// XXX - Should this be volatile without known ordering?
Info.flags |= MachineMemOperand::MOVolatile;		Info.flags |= MachineMemOperand::MOVolatile;

		switch (IntrID) {
		default:
		break;
		case Intrinsic::amdgcn_raw_buffer_load_lds:
		Info.memVT = MVT::getVT(CI.getArgOperand(1)->getType()->
		getNonOpaquePointerElementType());
		arsenm: Should use the return / data type?
		rampitec (author): Changed to return. What do you mean by 'use data type'?
		arsenm: You're looking at the pointer element type instead of the return / data type, i.e. we would have i8/i16/i32 return types and you don't need to look at the pointer.
		arsenm: I just noticed there is no return type, so this is just introducing a dependency on typed pointers, which is a no-go. I don't actually see why we can't match these from the buffer intrinsic plus LDS access?
		rampitec (author): `LDS address = LDS_base + LDS_offset + inst_offset + (TIDinWave * 4)`. We do not have TIDinWave, certainly not after selection. Even before selection it is extremely problematic. Why is a typed pointer a no-go if that works?
		rampitec (author): On top of that, MEM_ADDR also depends on the TID. It is not the same address as a normal buffer_load would use with the same operands.
		rampitec (author): It does not return anything. This instruction does not have vdata. The only way to know the size is by looking at the overloaded LDS base pointer pointee.
		arsenm: Pointee types have been removed from the IR. If this really needs the type, it would need to use an attribute on the parameter to carry it, which may be new territory.
		rampitec (author): It does not really need the type, but it needs the size. I can add an immediate to the intrinsic and switch to void* for the LDS base.
		break;
		}
}		}
return true;		return true;
}		}
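The address formula quoted in the thread above explains why the LDS side cannot be pattern-matched from an ordinary buffer load plus an LDS store: each lane's destination depends on its position in the wave. A small C++ sketch of that arithmetic, using the term names verbatim from the comment; the function name `ldsDmaAddress` is hypothetical, and how each term is populated in hardware is not specified here.

```cpp
#include <cstdint>

// Hypothetical illustration of the formula from the review comment:
//   LDS address = LDS_base + LDS_offset + inst_offset + (TIDinWave * 4)
// Every lane writes one dword-sized slot, so consecutive lanes land
// 4 bytes apart regardless of the load size.
constexpr uint32_t ldsDmaAddress(uint32_t LdsBase, uint32_t LdsOffset,
                                 uint32_t InstOffset, uint32_t TidInWave) {
  return LdsBase + LdsOffset + InstOffset + TidInWave * 4;
}
```

With `LdsBase = 0`, `LdsOffset = 16` and `InstOffset = 8`, lane 0 writes to byte 24 and lane 3 to byte 36. The `TidInWave` term is the part the compiler has no value for, which is the point made in the comment.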

switch (IntrID) {		switch (IntrID) {
case Intrinsic::amdgcn_atomic_inc:		case Intrinsic::amdgcn_atomic_inc:
case Intrinsic::amdgcn_atomic_dec:		case Intrinsic::amdgcn_atomic_dec:
case Intrinsic::amdgcn_ds_ordered_add:		case Intrinsic::amdgcn_ds_ordered_add:
		case Intrinsic::amdgcn_struct_buffer_store_format: {
// Handle BUFFER_STORE_BYTE/SHORT overloaded intrinsics		// Handle BUFFER_STORE_BYTE/SHORT overloaded intrinsics
EVT VDataType = VData.getValueType().getScalarType();		EVT VDataType = VData.getValueType().getScalarType();
if (!IsD16 && !VDataVT.isVector() && EltType.getSizeInBits() < 32)		if (!IsD16 && !VDataVT.isVector() && EltType.getSizeInBits() < 32)
return handleByteShortBufferStores(DAG, VDataType, DL, Ops, M);		return handleByteShortBufferStores(DAG, VDataType, DL, Ops, M);

return DAG.getMemIntrinsicNode(Opc, DL, Op->getVTList(), Ops,		return DAG.getMemIntrinsicNode(Opc, DL, Op->getVTList(), Ops,
M->getMemoryVT(), M->getMemOperand());		M->getMemoryVT(), M->getMemOperand());
}		}
		case Intrinsic::amdgcn_raw_buffer_load_lds: {
		auto Offsets = splitBufferOffsets(Op.getOperand(4), DAG);
		auto *M = cast<MemSDNode>(Op);
		MachineMemOperand *MMO = M->getMemOperand();

		SDValue Ops[] = {
		Op.getOperand(0), // Chain
		Op.getOperand(2), // rsrc
		DAG.getConstant(0, DL, MVT::i32), // vindex
		Offsets.first, // voffset
		Op.getOperand(5), // soffset
		Offsets.second, // offset
		Op.getOperand(6), // cachepolicy, swizzled buffer
		arsenm: Ditto, verifier isn't enforcing this so shouldn't assert
		DAG.getTargetConstant(0, DL, MVT::i1), // idxen
		DAG.getTargetConstant(MMO->getSize(), DL, MVT::i32), // data byte size
		copyToM0(DAG, Chain, DL, Op.getOperand(3)).getValue(1) // Glue
		};

		updateBufferMMO(MMO, Ops[3], Ops[4], Ops[5]);

		arsenm: I don't see how / where this preserves the LDS bit
		rampitec (author): It has a different number of operands compared to SIbuffer_load, so it selects into the _LDS versions of the opcodes. In fact, after I removed the offset split (because we cannot do it on one pointer only) and dropped multi-dword support, I started thinking it might be better to drop SIbuffer_load_lds and its patterns and produce a MachineSDNode right here (like in D125279 for global load). It will not be so much code anymore, and I will be able to produce 2 separate memory operands.
		return DAG.getMemIntrinsicNode(AMDGPUISD::BUFFER_LOAD, DL,
		M->getVTList(), Ops, M->getMemoryVT(), MMO);
		}
case Intrinsic::amdgcn_end_cf:		case Intrinsic::amdgcn_end_cf:
return SDValue(DAG.getMachineNode(AMDGPU::SI_END_CF, DL, MVT::Other,		return SDValue(DAG.getMachineNode(AMDGPU::SI_END_CF, DL, MVT::Other,
Op->getOperand(2), Chain), 0);		Op->getOperand(2), Chain), 0);

default: {		default: {
if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =		if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =
AMDGPU::getImageDimIntrinsicInfo(IntrinsicID))		AMDGPU::getImageDimIntrinsicInfo(IntrinsicID))
return lowerImage(Op, ImageDimIntr, DAG, true);		return lowerImage(Op, ImageDimIntr, DAG, true);

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

		if (SOffset) {
BaseOps.push_back(SOffset);		BaseOps.push_back(SOffset);
else		else
Offset += SOffset->getImm();		Offset += SOffset->getImm();
}		}
// Get appropriate operand, and compute width accordingly.		// Get appropriate operand, and compute width accordingly.
DataOpIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vdst);		DataOpIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vdst);
if (DataOpIdx == -1)		if (DataOpIdx == -1)
DataOpIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vdata);		DataOpIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vdata);
		if (DataOpIdx == -1) { // LDS DMA
		Width = (*LdSt.memoperands_begin())->getSize();
		arsenm: If you're going to rely on the memory operand, the verifier needs to start enforcing that these have one memory operand (well, 2 actually, with the same sizes).
		rampitec (author): On second thought, it is better to just return false here. We cannot have a reasonable pointer here on either side anyway, and in fact even the 2 memory operands it should ideally have would be of different sizes for sub-dword operations. A load can be sub-dword, but the store is always extended to a dword.
		return true;
		}
Width = getOpSize(LdSt, DataOpIdx);		Width = getOpSize(LdSt, DataOpIdx);
return true;		return true;
}		}
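The author's reply above notes that the two ideal memory operands would not share a size for sub-dword operations: the load side may be 1 or 2 bytes, while the LDS store side is always extended to a dword. A minimal C++ sketch of that relationship, under the assumption that sizes of 4 bytes and above are stored unchanged; `LdsDmaWidths` and `ldsDmaWidths` are hypothetical names, not code from the patch.

```cpp
// Hypothetical pair of access widths for one LDS DMA operation.
struct LdsDmaWidths {
  unsigned LoadBytes;  // bytes read from the buffer per lane
  unsigned StoreBytes; // bytes written to LDS per lane
};

// Sub-dword loads (1 or 2 bytes) are extended to a full dword on the
// store side; dword and wider sizes are assumed to pass through unchanged.
constexpr LdsDmaWidths ldsDmaWidths(unsigned SizeBytes) {
  return {SizeBytes, SizeBytes < 4 ? 4u : SizeBytes};
}
```

This mismatch is why a single `Width` derived from one memory operand cannot describe both sides of the access.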

if (isMIMG(LdSt)) {		if (isMIMG(LdSt)) {
int SRsrcIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::srsrc);		int SRsrcIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::srsrc);
BaseOps.push_back(&LdSt.getOperand(SRsrcIdx));		BaseOps.push_back(&LdSt.getOperand(SRsrcIdx));
int VAddr0Idx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vaddr0);		int VAddr0Idx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vaddr0);

llvm/lib/Target/AMDGPU/SIInstrInfo.td

		def SDTBufferLoad : SDTypeProfile<1, 7,
SDTCisVT<1, v4i32>, // rsrc		SDTCisVT<1, v4i32>, // rsrc
SDTCisVT<2, i32>, // vindex(VGPR)		SDTCisVT<2, i32>, // vindex(VGPR)
SDTCisVT<3, i32>, // voffset(VGPR)		SDTCisVT<3, i32>, // voffset(VGPR)
SDTCisVT<4, i32>, // soffset(SGPR)		SDTCisVT<4, i32>, // soffset(SGPR)
SDTCisVT<5, i32>, // offset(imm)		SDTCisVT<5, i32>, // offset(imm)
SDTCisVT<6, i32>, // cachepolicy, swizzled buffer(imm)		SDTCisVT<6, i32>, // cachepolicy, swizzled buffer(imm)
SDTCisVT<7, i1>]>; // idxen(imm)		SDTCisVT<7, i1>]>; // idxen(imm)

		def SDTBufferLoadLDS : SDTypeProfile<0, 8,
		[SDTCisVT<0, v4i32>, // rsrc
		SDTCisVT<1, i32>, // vindex(VGPR)
		SDTCisVT<2, i32>, // voffset(VGPR)
		SDTCisVT<3, i32>, // soffset(SGPR)
		SDTCisVT<4, i32>, // offset(imm)
		SDTCisVT<5, i32>, // cachepolicy, swizzled buffer(imm)
		SDTCisVT<6, i1>, // idxen(imm)
		SDTCisVT<7, i32>]>; // data byte size

def SIbuffer_load : SDNode <"AMDGPUISD::BUFFER_LOAD", SDTBufferLoad,		def SIbuffer_load : SDNode <"AMDGPUISD::BUFFER_LOAD", SDTBufferLoad,
[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;		[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;
def SIbuffer_load_ubyte : SDNode <"AMDGPUISD::BUFFER_LOAD_UBYTE", SDTBufferLoad,		def SIbuffer_load_ubyte : SDNode <"AMDGPUISD::BUFFER_LOAD_UBYTE", SDTBufferLoad,
[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;		[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;
def SIbuffer_load_ushort : SDNode <"AMDGPUISD::BUFFER_LOAD_USHORT", SDTBufferLoad,		def SIbuffer_load_ushort : SDNode <"AMDGPUISD::BUFFER_LOAD_USHORT", SDTBufferLoad,
[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;		[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;
def SIbuffer_load_byte : SDNode <"AMDGPUISD::BUFFER_LOAD_BYTE", SDTBufferLoad,		def SIbuffer_load_byte : SDNode <"AMDGPUISD::BUFFER_LOAD_BYTE", SDTBufferLoad,
[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;		[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;
def SIbuffer_load_short: SDNode <"AMDGPUISD::BUFFER_LOAD_SHORT", SDTBufferLoad,		def SIbuffer_load_short: SDNode <"AMDGPUISD::BUFFER_LOAD_SHORT", SDTBufferLoad,
[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;		[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;
def SIbuffer_load_format : SDNode <"AMDGPUISD::BUFFER_LOAD_FORMAT", SDTBufferLoad,		def SIbuffer_load_format : SDNode <"AMDGPUISD::BUFFER_LOAD_FORMAT", SDTBufferLoad,
[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;		[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;
def SIbuffer_load_format_d16 : SDNode <"AMDGPUISD::BUFFER_LOAD_FORMAT_D16",		def SIbuffer_load_format_d16 : SDNode <"AMDGPUISD::BUFFER_LOAD_FORMAT_D16",
SDTBufferLoad,		SDTBufferLoad,
[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;		[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;
		def SIbuffer_load_lds : SDNode <"AMDGPUISD::BUFFER_LOAD", SDTBufferLoadLDS,
		[SDNPMemOperand, SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPInGlue]>;

def SDTBufferStore : SDTypeProfile<0, 8,		def SDTBufferStore : SDTypeProfile<0, 8,
[ // vdata		[ // vdata
SDTCisVT<1, v4i32>, // rsrc		SDTCisVT<1, v4i32>, // rsrc
SDTCisVT<2, i32>, // vindex(VGPR)		SDTCisVT<2, i32>, // vindex(VGPR)
SDTCisVT<3, i32>, // voffset(VGPR)		SDTCisVT<3, i32>, // voffset(VGPR)
SDTCisVT<4, i32>, // soffset(SGPR)		SDTCisVT<4, i32>, // soffset(SGPR)
SDTCisVT<5, i32>, // offset(imm)		SDTCisVT<5, i32>, // offset(imm)

llvm/lib/Target/AMDGPU/SIInstructions.td

		class TBufferLoadGenericInstruction : AMDGPUGenericInstruction {
let OutOperandList = (outs type0:$dst);		let OutOperandList = (outs type0:$dst);
let InOperandList = (ins type1:$rsrc, type2:$vindex, type2:$voffset,		let InOperandList = (ins type1:$rsrc, type2:$vindex, type2:$voffset,
type2:$soffset, untyped_imm_0:$offset, untyped_imm_0:$format,		type2:$soffset, untyped_imm_0:$offset, untyped_imm_0:$format,
untyped_imm_0:$cachepolicy, untyped_imm_0:$idxen);		untyped_imm_0:$cachepolicy, untyped_imm_0:$idxen);
let hasSideEffects = 0;		let hasSideEffects = 0;
let mayLoad = 1;		let mayLoad = 1;
}		}

		class BufferLoadLdsGenericInstruction : AMDGPUGenericInstruction {
		let OutOperandList = (outs);
		let InOperandList = (ins type1:$rsrc, type2:$vindex, type2:$voffset,
		type2:$soffset, untyped_imm_0:$offset,
		untyped_imm_0:$cachepolicy, untyped_imm_0:$idxen,
		untyped_imm_0:$size);
		let hasSideEffects = 0;
		let mayLoad = 1;
		let mayStore = 1;
		}

def G_AMDGPU_BUFFER_LOAD_UBYTE : BufferLoadGenericInstruction;		def G_AMDGPU_BUFFER_LOAD_UBYTE : BufferLoadGenericInstruction;
def G_AMDGPU_BUFFER_LOAD_SBYTE : BufferLoadGenericInstruction;		def G_AMDGPU_BUFFER_LOAD_SBYTE : BufferLoadGenericInstruction;
def G_AMDGPU_BUFFER_LOAD_USHORT : BufferLoadGenericInstruction;		def G_AMDGPU_BUFFER_LOAD_USHORT : BufferLoadGenericInstruction;
def G_AMDGPU_BUFFER_LOAD_SSHORT : BufferLoadGenericInstruction;		def G_AMDGPU_BUFFER_LOAD_SSHORT : BufferLoadGenericInstruction;
def G_AMDGPU_BUFFER_LOAD : BufferLoadGenericInstruction;		def G_AMDGPU_BUFFER_LOAD : BufferLoadGenericInstruction;
def G_AMDGPU_BUFFER_LOAD_FORMAT : BufferLoadGenericInstruction;		def G_AMDGPU_BUFFER_LOAD_FORMAT : BufferLoadGenericInstruction;
def G_AMDGPU_BUFFER_LOAD_FORMAT_D16 : BufferLoadGenericInstruction;		def G_AMDGPU_BUFFER_LOAD_FORMAT_D16 : BufferLoadGenericInstruction;
def G_AMDGPU_TBUFFER_LOAD_FORMAT : TBufferLoadGenericInstruction;		def G_AMDGPU_TBUFFER_LOAD_FORMAT : TBufferLoadGenericInstruction;
def G_AMDGPU_TBUFFER_LOAD_FORMAT_D16 : TBufferLoadGenericInstruction;		def G_AMDGPU_TBUFFER_LOAD_FORMAT_D16 : TBufferLoadGenericInstruction;
		def G_AMDGPU_BUFFER_LOAD_LDS : BufferLoadLdsGenericInstruction;

class BufferStoreGenericInstruction : AMDGPUGenericInstruction {		class BufferStoreGenericInstruction : AMDGPUGenericInstruction {
let OutOperandList = (outs);		let OutOperandList = (outs);
let InOperandList = (ins type0:$vdata, type1:$rsrc, type2:$vindex, type2:$voffset,		let InOperandList = (ins type0:$vdata, type1:$rsrc, type2:$vindex, type2:$voffset,
type2:$soffset, untyped_imm_0:$offset,		type2:$soffset, untyped_imm_0:$offset,
untyped_imm_0:$cachepolicy, untyped_imm_0:$idxen);		untyped_imm_0:$cachepolicy, untyped_imm_0:$idxen);
let hasSideEffects = 0;		let hasSideEffects = 0;
let mayStore = 1;		let mayStore = 1;

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.lds.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN
				; RUN: llc -global-isel -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN

				declare void @llvm.amdgcn.raw.buffer.load.lds.p3f32(<4 x i32>, float addrspace(3)* nocapture, i32, i32, i32)
				declare void @llvm.amdgcn.raw.buffer.load.lds.p3v2f32(<4 x i32>, <2 x float> addrspace(3)* nocapture, i32, i32, i32)
				declare void @llvm.amdgcn.raw.buffer.load.lds.p3v3f32(<4 x i32>, <3 x float> addrspace(3)* nocapture, i32, i32, i32)
				declare void @llvm.amdgcn.raw.buffer.load.lds.p3v4f32(<4 x i32>, <4 x float> addrspace(3)* nocapture, i32, i32, i32)
				declare void @llvm.amdgcn.raw.buffer.load.lds.p3i16(<4 x i32>, i16 addrspace(3)* nocapture, i32, i32, i32)
				declare void @llvm.amdgcn.raw.buffer.load.lds.p3f16(<4 x i32>, half addrspace(3)* nocapture, i32, i32, i32)
				declare void @llvm.amdgcn.raw.buffer.load.lds.p3i8(<4 x i32>, i8 addrspace(3)* nocapture, i32, i32, i32)

				define amdgpu_ps float @buffer_load_lds(<4 x i32> inreg %rsrc, float addrspace(3)* inreg %lds) {
				; GCN-LABEL: buffer_load_lds:
				; GCN: ; %bb.0: ; %main_body
				; GCN-NEXT: s_mov_b32 m0, s4
				; GCN-NEXT: buffer_load_dword off, s[0:3], 0 lds
				asroy: m0 holds the size of LDS; should we save the value of m0 before overwriting it, and write the value back before issuing ds_read?
				rampitec (author): DS_* do not read M0 since gfx9. These intrinsics are only available since gfx9. Moreover, on gfx8 and earlier, selection of DS opcodes takes care of M0 initialization right before the opcode.
				arsenm: Every user of m0 is supposed to set it itself, and we hopefully clean up the redundant rewrites. It's not something that's generally saved and restored per operation.
				ramjana: Just to be clear, is your expectation that the intrinsic user saves and restores m0 before calling the buffer_load lds intrinsic?
				; GCN-NEXT: buffer_load_dword off, s[0:3], 0 glc lds
				; GCN-NEXT: buffer_load_dword off, s[0:3], 0 slc lds
				; GCN-NEXT: v_mov_b32_e32 v0, s4
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: ds_read_b32 v0, v0
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: ; return to shader part epilog
				main_body:
				call void @llvm.amdgcn.raw.buffer.load.lds.p3f32(<4 x i32> %rsrc, float addrspace(3)* %lds, i32 0, i32 0, i32 0)
				call void @llvm.amdgcn.raw.buffer.load.lds.p3f32(<4 x i32> %rsrc, float addrspace(3)* %lds, i32 0, i32 0, i32 1)
				call void @llvm.amdgcn.raw.buffer.load.lds.p3f32(<4 x i32> %rsrc, float addrspace(3)* %lds, i32 0, i32 0, i32 2)
				%res = load float, float addrspace(3)* %lds
				ret float %res
				}
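The three calls in `buffer_load_lds` differ only in the last (auxiliary) argument, 0, 1 and 2, and the check lines show no modifier, `glc` and `slc` respectively. A short C++ sketch of that decoding, covering only the two bits this test exercises; `auxModifiers` is a hypothetical name, and any further bits of the auxiliary operand are deliberately left out.

```cpp
#include <string>

// Hypothetical decoder for the auxiliary operand as exercised by the test:
// bit 0 prints as "glc", bit 1 prints as "slc".
inline std::string auxModifiers(unsigned Aux) {
  std::string Mods;
  if (Aux & 1u)
    Mods += " glc";
  if (Aux & 2u)
    Mods += " slc";
  return Mods;
}
```

Appending the result to `buffer_load_dword off, s[0:3], 0` and then ` lds` reproduces the three instruction lines checked above.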

				define amdgpu_ps void @buffer_load_lds_imm_offset(<4 x i32> inreg %rsrc, float addrspace(3)* inreg %lds) {
				; GCN-LABEL: buffer_load_lds_imm_offset:
				; GCN: ; %bb.0: ; %main_body
				; GCN-NEXT: s_mov_b32 m0, s4
				; GCN-NEXT: buffer_load_dword off, s[0:3], 0 offset:2048 lds
				; GCN-NEXT: s_endpgm
				main_body:
				call void @llvm.amdgcn.raw.buffer.load.lds.p3f32(<4 x i32> %rsrc, float addrspace(3)* %lds, i32 2048, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @buffer_load_lds_v_offset(<4 x i32> inreg %rsrc, float addrspace(3)* inreg %lds, i32 %voffset) {
				; GCN-LABEL: buffer_load_lds_v_offset:
				; GCN: ; %bb.0: ; %main_body
				; GCN-NEXT: s_mov_b32 m0, s4
				; GCN-NEXT: buffer_load_dword v0, s[0:3], 0 offen lds
				; GCN-NEXT: s_endpgm
				main_body:
				call void @llvm.amdgcn.raw.buffer.load.lds.p3f32(<4 x i32> %rsrc, float addrspace(3)* %lds, i32 %voffset, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @buffer_load_lds_s_offset(<4 x i32> inreg %rsrc, float addrspace(3)* inreg %lds, i32 inreg %soffset) {
				; GCN-LABEL: buffer_load_lds_s_offset:
				; GCN: ; %bb.0: ; %main_body
				; GCN-NEXT: s_mov_b32 m0, s4
				; GCN-NEXT: buffer_load_dword off, s[0:3], s5 lds
				; GCN-NEXT: s_endpgm
				main_body:
				call void @llvm.amdgcn.raw.buffer.load.lds.p3f32(<4 x i32> %rsrc, float addrspace(3)* %lds, i32 0, i32 %soffset, i32 0)
				ret void
				}

				define amdgpu_ps void @buffer_load_lds_vs_offset(<4 x i32> inreg %rsrc, float addrspace(3)* inreg %lds, i32 %voffset, i32 inreg %soffset) {
				; GCN-LABEL: buffer_load_lds_vs_offset:
				; GCN: ; %bb.0: ; %main_body
				; GCN-NEXT: s_mov_b32 m0, s4
				; GCN-NEXT: buffer_load_dword v0, s[0:3], s5 offen lds
				; GCN-NEXT: s_endpgm
				main_body:
				call void @llvm.amdgcn.raw.buffer.load.lds.p3f32(<4 x i32> %rsrc, float addrspace(3)* %lds, i32 %voffset, i32 %soffset, i32 0)
				ret void
				}

				define amdgpu_ps void @buffer_load_lds_v2f32(<4 x i32> inreg %rsrc, <2 x float> addrspace(3)* inreg %lds) {
				; GCN-LABEL: buffer_load_lds_v2f32:
				; GCN: ; %bb.0: ; %main_body
				; GCN-NEXT: s_mov_b32 m0, s4
				; GCN-NEXT: buffer_load_dwordx2 off, s[0:3], 0 lds
				; GCN-NEXT: s_endpgm
				main_body:
				call void @llvm.amdgcn.raw.buffer.load.lds.p3v2f32(<4 x i32> %rsrc, <2 x float> addrspace(3)* %lds, i32 0, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @buffer_load_lds_v3f32(<4 x i32> inreg %rsrc, <3 x float> addrspace(3)* inreg %lds) {
				; GCN-LABEL: buffer_load_lds_v3f32:
				; GCN: ; %bb.0: ; %main_body
				; GCN-NEXT: s_mov_b32 m0, s4
				; GCN-NEXT: buffer_load_dwordx3 off, s[0:3], 0 lds
				; GCN-NEXT: s_endpgm
				main_body:
				call void @llvm.amdgcn.raw.buffer.load.lds.p3v3f32(<4 x i32> %rsrc, <3 x float> addrspace(3)* %lds, i32 0, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @buffer_load_lds_v4f32(<4 x i32> inreg %rsrc, <4 x float> addrspace(3)* inreg %lds) {
				; GCN-LABEL: buffer_load_lds_v4f32:
				; GCN: ; %bb.0: ; %main_body
				; GCN-NEXT: s_mov_b32 m0, s4
				; GCN-NEXT: buffer_load_dwordx4 off, s[0:3], 0 lds
				; GCN-NEXT: s_endpgm
				main_body:
				call void @llvm.amdgcn.raw.buffer.load.lds.p3v4f32(<4 x i32> %rsrc, <4 x float> addrspace(3)* %lds, i32 0, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @buffer_load_lds_i16(<4 x i32> inreg %rsrc, i16 addrspace(3)* inreg %lds) {
				; GCN-LABEL: buffer_load_lds_i16:
				; GCN: ; %bb.0: ; %main_body
				; GCN-NEXT: s_mov_b32 m0, s4
				; GCN-NEXT: buffer_load_ushort off, s[0:3], 0 offset:2048 lds
				; GCN-NEXT: s_endpgm
				main_body:
				call void @llvm.amdgcn.raw.buffer.load.lds.p3i16(<4 x i32> %rsrc, i16 addrspace(3)* %lds, i32 2048, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @buffer_load_lds_f16(<4 x i32> inreg %rsrc, half addrspace(3)* inreg %lds) {
				; GCN-LABEL: buffer_load_lds_f16:
				; GCN: ; %bb.0: ; %main_body
				; GCN-NEXT: s_mov_b32 m0, s4
				; GCN-NEXT: buffer_load_ushort off, s[0:3], 0 offset:2048 lds
				; GCN-NEXT: s_endpgm
				main_body:
				call void @llvm.amdgcn.raw.buffer.load.lds.p3f16(<4 x i32> %rsrc, half addrspace(3)* %lds, i32 2048, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @buffer_load_lds_i8(<4 x i32> inreg %rsrc, i8 addrspace(3)* inreg %lds) {
				; GCN-LABEL: buffer_load_lds_i8:
				; GCN: ; %bb.0: ; %main_body
				; GCN-NEXT: s_mov_b32 m0, s4
				; GCN-NEXT: buffer_load_ubyte off, s[0:3], 0 offset:2048 lds
				; GCN-NEXT: s_endpgm
				main_body:
				call void @llvm.amdgcn.raw.buffer.load.lds.p3i8(<4 x i32> %rsrc, i8 addrspace(3)* %lds, i32 2048, i32 0, i32 0)
				ret void
				}


Revision Contents

Diff 426850
