This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add intrinsics for tbuffer load and store
ClosedPublic

Authored by dstuttard on Mar 7 2017, 1:44 AM.

Details

Summary

An intrinsic already existed for llvm.SI.tbuffer.store.

This change adds tbuffer.load and re-implements the existing intrinsic as llvm.amdgcn.tbuffer.*.

Added CodeGen tests for the two new variants.
Left the original llvm.SI.tbuffer.store implementation in place to avoid issues with existing code.
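
For reference, a minimal usage sketch of the new intrinsics in the form this review converged on (the operand list changed during review, as discussed below; the dfmt/nfmt values are arbitrary examples):

    %vdata = call <4 x float> @llvm.amdgcn.tbuffer.load.v4f32(
                 <4 x i32> %rsrc, i32 %vindex, i32 %voffset, i32 %soffset,
                 i32 0, i32 14, i32 4, i1 0, i1 0)
    call void @llvm.amdgcn.tbuffer.store.v4f32(
                 <4 x float> %vdata, <4 x i32> %rsrc, i32 %vindex, i32 %voffset,
                 i32 %soffset, i32 0, i32 14, i32 4, i1 0, i1 0)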

Event Timeline

dstuttard created this revision.Mar 7 2017, 1:44 AM
arsenm added inline comments.Mar 7 2017, 4:14 PM
include/llvm/IR/IntrinsicsAMDGPU.td
481

This should be part of the mangled return type; I don't think there's any reason to add it here.
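
A sketch of the mangling mechanism in question: the element type and count ride on the .f32/.v2f32/.v4f32 suffix rather than on an operand (operand lists shown here in the final committed form):

    declare float       @llvm.amdgcn.tbuffer.load.f32(<4 x i32>, i32, i32, i32, i32, i32, i32, i1, i1)
    declare <2 x float> @llvm.amdgcn.tbuffer.load.v2f32(<4 x i32>, i32, i32, i32, i32, i32, i32, i1, i1)
    declare <4 x float> @llvm.amdgcn.tbuffer.load.v4f32(<4 x i32>, i32, i32, i32, i32, i32, i32, i1, i1)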

487–491

I don't think we can directly expose offen/idxen.

tfe definitely can't be exposed this way because it changes the register class of the output.

tstellar added inline comments.Mar 7 2017, 4:19 PM
include/llvm/IR/IntrinsicsAMDGPU.td
487–491

We need some solution to deal with swizzled addressing, because these have really complicated clamping rules. The easiest thing to do is to expose all fields so the user is responsible for making sure the operands are correct.

arsenm added inline comments.Mar 7 2017, 4:25 PM
include/llvm/IR/IntrinsicsAMDGPU.td
487–491

I was wondering if we could use fat pointers, and then have a constant argument hinting at the swizzle factor in the resource so we can match the addressing mode. Would that work?

tstellar added inline comments.Mar 7 2017, 4:26 PM
include/llvm/IR/IntrinsicsAMDGPU.td
487–491

I think so; you could also use different address spaces to distinguish swizzled from non-swizzled access.
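
Neither idea was implemented in this patch, but as a purely hypothetical sketch (the address-space numbers are invented here), swizzled and linear buffers could be told apart by pointer address space:

    ; hypothetical: addrspace(7) = linear buffer, addrspace(8) = swizzled buffer
    %a = load float, float addrspace(7)* %linear.ptr
    %b = load float, float addrspace(8)* %swizzled.ptr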

See inline comments.

Is this worth looking at for now if there is a different refactor already in progress?

include/llvm/IR/IntrinsicsAMDGPU.td
481

Isn't the issue that the 3-dword variant can't be generated, since we can only have 1-, 2-, and 4-dword return types?

487–491

I think I need to look at this again, in particular the fat pointers. This initial implementation was based on extending the pre-existing llvm.SI.tbuffer.store.
Matt has also pointed out that a refactor of this (for store) is already in progress. I guess it makes sense to wait for that as well?

arsenm added inline comments.Mar 8 2017, 12:31 PM
include/llvm/IR/IntrinsicsAMDGPU.td
481

We can have a 3x dword return type in the intrinsic. The only issue is that the DAG doesn't treat it as a legal machine type. I have patches to fix this, and we can work around it in codegen. Long term this won't be an issue with GlobalISel.
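
In other words (a sketch, reusing the operand layout the patch eventually settled on), the IR itself accepts a 3-element return; only DAG type legalization needs the workaround:

    ; valid IR; v3i32 just isn't treated as a legal machine type by the DAG yet
    %d = call <3 x i32> @llvm.amdgcn.tbuffer.load.v3i32(<4 x i32> %rsrc,
             i32 %vindex, i32 %voffset, i32 %soffset, i32 0, i32 14, i32 4,
             i1 0, i1 0)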

t-tye added a subscriber: t-tye.Mar 22 2017, 6:40 PM
tony-tye removed a subscriber: tony-tye.Mar 22 2017, 6:47 PM
dstuttard updated this revision to Diff 98287.May 9 2017, 7:57 AM

I've rewritten the implementation of tbuffer load and store and based it around buffer.load.format and buffer.store.format.

This results in a cleaner intrinsic that doesn't expose all the instruction fields (no enable bits, no tfe, and single index and offset fields that determine what to do based on what is passed in).

No support yet for the 3-dword variants, but this is a step along the way.

This doesn't include fat pointer support; that can be added later when this is rolled out more widely.

I've also removed the old llvm.SI.tbuffer.store implementation and replaced its uses in the lit tests (this part can be reverted if required).

tstellar added inline comments.May 9 2017, 8:02 AM
include/llvm/IR/IntrinsicsAMDGPU.td
494

Will these be able to support swizzled addressing with only one offset field?

dstuttard added inline comments.May 9 2017, 10:03 AM
include/llvm/IR/IntrinsicsAMDGPU.td
494

This implementation mirrors the buffer.load.format and buffer.store.format intrinsics, so presumably they would suffer from the same problem?

You can generate an instruction with both the index and offset enable bits set (and a VGPR offset and index), as well as an immediate offset, like this:

%offs.2  = add i32 %offs, 52
%vdata   = call <4 x i32> @llvm.amdgcn.tbuffer.load.v4i32(<4 x i32> %0, i32 %vindex, i32 %offs.2, i32 14, i32 4, i1 0, i1 0)

which results in an instruction like this:

    tbuffer_load_format_xyzw v[0:3], v[0:1], s[0:3], dfmt:14, nfmt:4, 0 idxen offen offset:52

Does that give you everything required for a swizzle?

arsenm added inline comments.May 9 2017, 7:08 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3452–3453

Replacing the uses in the test is fine, but we need to keep this around until Mesa is updated to use the new intrinsic.

dstuttard updated this revision to Diff 98467.May 10 2017, 8:44 AM

Reintroduce the legacy llvm.SI.tbuffer.store intrinsic

I've left the tests using the new implementation, but have included the old test for this intrinsic.

In the end the easiest way to do this was to re-insert the old code and rename it with "legacy" tags as appropriate.
Hopefully at some point this can be removed.

tstellar added inline comments.May 10 2017, 9:24 AM
include/llvm/IR/IntrinsicsAMDGPU.td
494

You should review the "Range Checking" section in the ISA docs for all generations. If you arbitrarily move offset values between soffset, inst_offset and vgpr_offset, you risk failing the range checks, which would turn the instruction into a nop.

I think the existing intrinsics are probably broken in the same way, but we need to come up with some solution here to avoid generating broken code.

I think the easiest thing to do is to expose all offset fields, inst_offset, vgpr_offset, and soffset through the intrinsic, and then have the backend not try to optimize the offsets.

We can always go in later and add compiler hints to let the compiler know when it's safe to move the offsets.

dstuttard updated this revision to Diff 98632.May 11 2017, 8:02 AM

As suggested by Tom Stellard (due to potential issues with range checking), I've changed the intrinsics to have explicit operands for vindex, voffset, soffset and offset. The backend no longer attempts to optimise by folding in any offsets it can spot in preceding instructions.

I've used separate vindex and voffset operands, which means the idxen and offen flags aren't required in the intrinsic itself (the compiler does the appropriate thing and folds the values into a reg_sequence when both are used). To me this seems a little cleaner.

A later change (again suggested by Tom) might be to re-enable some of these optimisations with
appropriate compiler hints.
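
A sketch of the resulting operand layout (matching the definition in IntrinsicsAMDGPU.td as of this diff, to the best of my reading):

    declare <4 x float> @llvm.amdgcn.tbuffer.load.v4f32(
      <4 x i32>,  ; rsrc (SGPR)
      i32,        ; vindex (VGPR)
      i32,        ; voffset (VGPR)
      i32,        ; soffset (SGPR)
      i32,        ; offset (immediate inst_offset)
      i32,        ; dfmt (immediate)
      i32,        ; nfmt (immediate)
      i1,         ; glc (immediate)
      i1)         ; slc (immediate)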

Any chance of another review of this change?
Does it look good to go now?
Thanks

arsenm edited edge metadata.May 23 2017, 10:24 AM

Needs some assembler tests for the new nfmt/dfmt parsing

include/llvm/IR/IntrinsicsAMDGPU.td
479

any_ty, so f32 is a valid type too

lib/Target/AMDGPU/SIISelLowering.cpp
3334–3337

This isn't the right place for this, although it is what other intrinsics are doing right now. As a follow-up patch it would be good to move the MMO creation into getTgtMemIntrinsic.

test/CodeGen/AMDGPU/llvm.amdgcn.tbuffer.load.ll
5

Should use GCN check prefix (and space after ;)

test/CodeGen/AMDGPU/llvm.amdgcn.tbuffer.store.ll
3

Ditto

Data can now be float or int.
I decided that since the load variant supports this, the store should too. This meant re-jigging the implementation slightly to use an approach more similar to load, so that "any" types work for the store as well (unless you can see a more efficient way to do it).
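
A sketch of the overloaded store forms this enables (same operand tail as the load; the dfmt/nfmt values are arbitrary examples):

    call void @llvm.amdgcn.tbuffer.store.f32(float %fdata, <4 x i32> %rsrc,
        i32 %vindex, i32 %voffset, i32 %soffset, i32 0, i32 14, i32 4, i1 0, i1 0)
    call void @llvm.amdgcn.tbuffer.store.v2i32(<2 x i32> %idata, <4 x i32> %rsrc,
        i32 %vindex, i32 %voffset, i32 %soffset, i32 0, i32 14, i32 4, i1 0, i1 0)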

Noted about the MMO creation; I'll look into a later patch for this.

Made the suggested changes to the tbuffer.load and tbuffer.store tests.
Added some more tests for the assembler etc. (see the extra tests in MC/AMDGPU and MC/Disassembler/AMDGPU).

Forgot to update the comment about float as an option for overloading.

arsenm added inline comments.May 25 2017, 3:21 AM
lib/Target/AMDGPU/BUFInstructions.td
91

You shouldn't need to change anything about the encoding for the legacy intrinsics; they should purely be an input issue. I don't see the need for a repeated MTBUF encoding class, so this seems to be an unnecessary rename?

1574–1577

Again, we don't need separate machine instruction or encoding definitions to support the old intrinsics.

lib/Target/AMDGPU/SIISelLowering.cpp
3521

You shouldn't need a separate node here. You can insert whatever code is necessary to convert to the new intrinsic/node operands here, such as bitcasting the types or inserting missing constants, etc.

See responses to your individual comments; it could be that I've missed something in the way this could be done.

I was reluctant to make big changes to the original implementation (other than renaming) as:

  • I wanted to get the change in relatively quickly
  • I didn't want to risk breaking something in the legacy support that would mean errors creeping into the Mesa implementation, for example

I think the implementation could be simplified if the SIISelLowering code for the legacy intrinsic did the transformation to the new style at that point (as you suggest), but the issue is that the new implementation doesn't support the same fields as the old one (TFE, IDXEN, OFFEN). It may be valid to ignore TFE without breaking anything, and it should be OK to ignore IDXEN and OFFEN, but is that definite?

lib/Target/AMDGPU/BUFInstructions.td
91

Unless I'm missing something, isn't the problem here that the pseudo instructions are using a different approach to representing the offset and idx enables (amongst others)?
I was wary of changing the old implementation too much, in order to preserve the old mechanism without any risk of breaking it.

lib/Target/AMDGPU/SIISelLowering.cpp
3521

Part of the issue is that the legacy intrinsic has TFE, which isn't present in the new one. If TFE is never used this could be done, but it would effectively ignore the field (unless the intrinsic is changed to remove it too).

We could get this lowering code to check that offen and idxen are correct given the parameters that have been passed in, and perhaps make sure that tfe is never used (and assert if it is).

I'm not sure this is necessarily the approach we should take, given that the only reason the legacy support is still there is to make sure that code using it doesn't break.

arsenm added inline comments.Jun 5 2017, 7:48 AM
lib/Target/AMDGPU/BUFInstructions.td
91

The pseudo instructions here are not related to the operands at all. We define pseudo and real encoding classes for each instruction encoding type because the instruction encoding changed in VI. The pseudo instructions allow codegen to ignore this detail, and the MC emission can then select the right real encoding opcode at the last possible moment. It is entirely disconnected from the intrinsics.

MUBUF instructions have all the combinations of idxen and offen enabled; this again isn't related to the input intrinsics. There really are this many different addressing-mode options that need to be represented with different instructions, and that isn't changing.

dstuttard added inline comments.Jun 6 2017, 8:58 AM
lib/Target/AMDGPU/BUFInstructions.td
91

I misunderstood your original comment; I took it to mean that the legacy and new implementations could be unified at this point.

This is the original implementation of the legacy intrinsic (with the name changed to indicate it is the old implementation), so I didn't think there was much point in tidying it up. However, if you think it is worthwhile I'll make the necessary changes (I'm inclined to agree that the legacy code should be shortened if there is an easy way to do so).

dstuttard updated this revision to Diff 102323.Jun 13 2017, 5:59 AM

Removed the bulk of the legacy implementation and now lower to the new form earlier

Some issues to note:

  1. tfe can't be used in the legacy intrinsic (asserts if this is detected). The old implementation wouldn't have worked if tfe was enabled anyway, so this isn't a loss of functionality.

  2. idxen and offen can't both be used. The new intrinsic supports this, but the legacy implementation wouldn't have worked in this mode anyway. Rather than re-write the legacy intrinsic to support 1 or 2 dword VAddr operands (which is what is required), I've just added an assert to catch this case. Again, this isn't a loss of functionality over the old intrinsic.

  3. The legacy intrinsic supports the 3-vec form (TBUFFER_STORE_FORMAT_XYZ), which I've also added in this updated implementation for the legacy intrinsic. The new intrinsic doesn't support it at the moment and it will have to be added at some point.

LGTM besides formatting issues

lib/Target/AMDGPU/SIISelLowering.cpp
3510

80 column limit (a few other places too)

3516

Variable names should follow the LLVM naming style.

dstuttard updated this revision to Diff 103325.Jun 21 2017, 1:50 AM

Corrected 80 column formatting and variable names

dstuttard updated this revision to Diff 103326.Jun 21 2017, 1:52 AM

Inadvertently included extra changes in last diff

arsenm accepted this revision.Jun 21 2017, 3:12 PM

LGTM

This revision is now accepted and ready to land.Jun 21 2017, 3:12 PM
This revision was automatically updated to reflect the committed changes.