Related tasks:
- SWDEV-240194
- SWDEV-309417
- SWDEV-334876
Differential D123693: Transform illegal intrinsics to V_ILLEGAL
Authored by Leonc on Apr 13 2022, 10:39 AM.
Event Timeline

Comment Actions

I think this is too focused. There are other image_sample_lz intrinsics, and all of them potentially need to be replaced if following this approach.

Comment Actions

There is also GlobalISel.
Comment Actions

Next steps:
Issues:
Comment Actions

Where's the rationale? Surely unsupported intrinsics should fail to select, and perhaps be diagnosed even earlier?

Comment Actions

Normally, yes. The issue is that our libraries contain code for all targets. We rely on dead-code elimination to remove calls to illegal intrinsics, but that doesn't happen at -O0.

Comment Actions
Why can't you put the code for different subtargets in different functions, each with an appropriate "target-cpu"= attribute?

Comment Actions

That's what I would do. I think there's some resistance to it, though, and it would probably require a lot more work.

Comment Actions

Well, I am resistant to selecting intrinsics on subtargets that don't support them :) How far are you planning to go with this approach? Do you expect to extend it to all AMDGPU intrinsics?

Comment Actions

Agreed. My initial approach was rejected for being too specific. Discussions since then have been about finding a solution for all illegal intrinsic calls. That's the plan unless I hear otherwise.

Comment Actions

The rationale is library code that looks like:

```
if (__ISA_version == 9008 || __ISA_version == 9010)
  global_atomic_fadd(p, v);
else
  generic_atomic_fadd(p, v);
```

At optimization levels above 0, the dead branch is eliminated, but at -O0 the dead code remains and the compiler attempts to generate an instruction for it. That instruction would never be executed, due to the condition, but it still needs to be generated. In these cases we don't have multilibs (target-specific libraries).

Comment Actions

Thanks @bcahoon. If there were another available approach that always worked, we would use it.

Comment Actions

Thank you for working on this. A debug build of PyTorch (BUILD_DEBUG_INFO=1) is also affected. Here is a creduce'd test case that crashes the ROCm 5.1.1 clang, which you could possibly include in the test cases (compile with -O0):

```
extern "C" __attribute__((device)) void __ockl_atomic_add_noret_f32(float *, float);

__attribute__((device)) void a(float *address, float b) {
  __ockl_atomic_add_noret_f32(address, b);
}
```
Comment Actions

Missing test cases. This should also include some assembler and disassembler tests for V_ILLEGAL.
Comment Actions

Right, but why can't global_atomic_fadd and generic_atomic_fadd be functions that just contain a call to the corresponding builtin, each with an appropriate "target-cpu"= attribute? I guess you would then need to do inlining in the backend, once you know which subtarget you are compiling for. Is that viable?
Comment Actions

The problem remains in the called function, even with the appropriate attributes. The function isn't deleted, and we still need to produce something for it. The current situation is that this works when the wrong subtarget happens to use the same encoding, but it breaks on targets that never had an equivalent instruction encoded.
Comment Actions

If "target-cpu" doesn't let you target different CPUs for different functions, then I don't understand what it does.

Comment Actions

For the relevant cases in the library, we key off the subtarget feature in "target-features". The problem is that by final codegen we've propagated the real CPU to the leaf function, which then artificially has incompatible "target-features". We would have to have some code fix up "target-cpu" to some random other device with compatible features.
Comment Actions

This could also use a disassembler test.
Comment Actions

LGTM except for the stray debug print. We should also probably clean up the encoding to rely on subtarget features instead of the generation check.
Comment Actions

Is there a generic way to do that without adding maintenance overhead? Or is it better that we don't implicitly support new subtargets/generations, in case they have different encodings of V_ILLEGAL?
Comment Actions

I think it may be easiest to create the ILLEGAL instruction here, rather than creating it after the call to lowerImage. That way, there don't have to be multiple copies of the code that creates ILLEGAL. For example:

```
MachineSDNode *NewNode =
    DAG.getMachineNode(AMDGPU::V_ILLEGAL, DL, MVT::Other, Op.getOperand(0));
return DAG.getMergeValues(
    {DAG.getUNDEF(Op.getValueType()), SDValue(NewNode, 0)}, DL);
```

The UNDEF value replaces the return result of the intrinsic, and the V_ILLEGAL replaces the chain result. It appears, though, that the V_ILLEGAL is removed if the intrinsic's chain result is not used anywhere. I'm not sure whether that is a bad thing, since it means it's not really needed?