This is an archive of the discontinued LLVM Phabricator instance.

Differential D142507

[AMDGPU] Split dot7 feature
ClosedPublic

Authored by rampitec on Jan 24 2023, 2:25 PM.

Details

Reviewers

foad
kzhuravl
b-sumner

Commits

rGdf0488369d32: [AMDGPU] Split dot7 feature

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rampitec created this revision. · Jan 24 2023, 2:25 PM

Herald added a project: Restricted Project. · Jan 24 2023, 2:25 PM

Herald added subscribers: kosarev, StephenFan, kerbowa and 6 others.

rampitec requested review of this revision. · Jan 24 2023, 2:25 PM

Herald added a project: Restricted Project. · Jan 24 2023, 2:25 PM

Herald added a subscriber: wdng.

arsenm added inline comments. · Jan 24 2023, 2:28 PM

clang/include/clang/Basic/BuiltinsAMDGPU.def
239	I have even less idea what these numbers mean now than I did before. This is also a bitcode compatibility break

rampitec added inline comments. · Jan 24 2023, 2:55 PM

clang/include/clang/Basic/BuiltinsAMDGPU.def
239	They actually never meant anything just because there is no system in the support matrix. I know this one will need simultaneous update of the device lib downstream.

Harbormaster completed remote builds in B209750: Diff 491913. · Jan 24 2023, 6:05 PM

foad added inline comments. · Jan 25 2023, 2:35 AM

llvm/lib/Target/AMDGPU/VOP3PInstructions.td
1083–1084	Is this because Real instructions copy predicates from the Pseudo? Seems like this could go in as a separate obvious cleanup.

arsenm added inline comments. · Jan 25 2023, 8:25 AM

clang/include/clang/Basic/BuiltinsAMDGPU.def
239	why not name these as just the exact instruction name?

rampitec added inline comments. · Jan 25 2023, 11:33 AM

clang/include/clang/Basic/BuiltinsAMDGPU.def
239	This is a legacy thing. When it first appeared it was a single instruction set. Changing it now completely will break a lot of stuff.

Split the cleanup NFCI.

rampitec added a parent revision: D142575: [AMDGPU] Remove predicates from real dot instructions. NFCI. · Jan 25 2023, 12:09 PM

rampitec marked an inline comment as done.

rampitec added inline comments.

llvm/lib/Target/AMDGPU/VOP3PInstructions.td
1083–1084	D142575

Harbormaster completed remote builds in B209948: Diff 492220. · Jan 25 2023, 1:16 PM

LGTM.

This revision is now accepted and ready to land. · Jan 26 2023, 2:51 AM

This revision was landed with ongoing or failed builds. · Jan 26 2023, 10:34 AM

Closed by commit rGdf0488369d32: [AMDGPU] Split dot7 feature (authored by rampitec).

This revision was automatically updated to reflect the committed changes.

rampitec marked an inline comment as done.

rampitec added a commit: rGdf0488369d32: [AMDGPU] Split dot7 feature.

Herald added a project: Restricted Project. · Jan 26 2023, 10:34 AM

Herald added a subscriber: cfe-commits.

Would it be possible to backport this to Clang 16?

If https://github.com/RadeonOpenCompute/ROCm-Device-Libs/commit/8dc779e19cbf2ccfd3307b60f7db57cf4203a5be makes it into ROCm 5.5 no distro would be able to build it with "vanilla" Clang 16, potentially causing pain for users that try to build ROCm 5.5 with a Clang from a package manager (a realistic scenario, considering that one may want to invest 5 min to build ROCm but not 40 min to build Clang). ROCm 5.5 will be the first release to officially support the 7900XT and 7900XTX, so not having this potentially causes issues for users with recent AMD hardware. (See https://github.com/RadeonOpenCompute/ROCm/issues/1880 for extensive, related discussion).

@jhuber6 This wouldn't exactly "solve" https://github.com/llvm/llvm-project/issues/60660, but I think this could also be a workaround (with potentially better user experience), as allowing users to build ROCm with regular Clang 16 prevents that deadlock where we can't build ROCm anymore. This is entirely based on speculation that ROCm 5.5 won't introduce other breakages before its release though, so I'd totally understand if this is not a satisfactory solution.

In D142507#4125940, @aaronmondal wrote:

> Would it be possible to backport this to Clang 16?

> If https://github.com/RadeonOpenCompute/ROCm-Device-Libs/commit/8dc779e19cbf2ccfd3307b60f7db57cf4203a5be makes it into ROCm 5.5 no distro would be able to build it with "vanilla" Clang 16, potentially causing pain for users that try to build ROCm 5.5 with a Clang from a package manager (a realistic scenario, considering that one may want to invest 5 min to build ROCm but not 40 min to build Clang). ROCm 5.5 will be the first release to officially support the 7900XT and 7900XTX, so not having this potentially causes issues for users with recent AMD hardware. (See https://github.com/RadeonOpenCompute/ROCm/issues/1880 for extensive, related discussion).

> @jhuber6 This wouldn't exactly "solve" https://github.com/llvm/llvm-project/issues/60660, but I think this could also be a workaround (with potentially better user experience), as allowing users to build ROCm with regular Clang 16 prevents that deadlock where we can't build ROCm anymore. This is entirely based on speculation that ROCm 5.5 won't introduce other breakages before its release though, so I'd totally understand if this is not a satisfactory solution.

It shall be complemented by the device-lib change in the corresponding release, so it is not that simple.

arsenm added inline comments. · Feb 14 2023, 11:02 AM

clang/include/clang/Basic/BuiltinsAMDGPU.def
239	Then why bother renaming this? We really need to stop breaking feature names

> It shall be complemented by the device-lib change in the corresponding release, so it is not that simple.

@rampitec I'm not sure I understand. Does this mean that this is breaking in a way that Clang 17 won't be able to build ROCm 5.4?

I thought it was like "we need D142507 to build device-libs after 8dc779e" and for older device libs we just fall back to some older behavior.

In D142507#4126864, @aaronmondal wrote:

> > It shall be complemented by the device-lib change in the corresponding release, so it is not that simple.

> @rampitec I'm not sure I understand. Does this mean that this is breaking in a way that Clang 17 won't be able to build ROCm 5.4?

> I thought it was like "we need D142507 to build device-libs after 8dc779e" and for older device libs we just fall back to some older behavior.

Since the feature is actually used by the device-lib it had to be updated in lock step with the compiler change, not after or before. That's what was done in the downstream.

Well, I can already feel the pain of distro maintainers having to build the next ROCm releases 😅

I wonder what the better course of action is here:

Port this patch to Clang 16 so that users with new hardware will be able to build ROCm 5.5, but make it impossible to build ROCm 5.4 and older with clang 16.

Don't port this patch and have a ~6 months gap during which users with the 7900 GPUs won't be able to build ROCm with a stable Clang version, requiring distro maintainers to use several toolchains and source-based distro users to use different compatibility patches for different ROCm releases. So basically when 8900 GPUs are announced, clang would support ROCm for 7900 GPUs 😅

Would there be a way to retain at least *some* backwards compatibility or version interoperability? For instance, via an #ifdef CLANG_VERSION_MAJOR in the device libs and an #ifdef INCOMPATIBLE_AMDGPU_INSTS in Clang?

This would obviously be very ugly, but it still seems better to me than locking out users (and more likely, ROCm contributors) from using 7900 GPUs if they are unable to build Clang themselves. Users already complain about how hard it is to build ROCm, and they also complain about the frequent breaking changes in Clang. I'm very much in favor of moving fast, but I'm worried that complete disregard for backwards compatibility like this with no clear upgrade path or fallback mechanism could cause a lot of frustration for users and distro maintainers.

Maybe there is some other, prettier way to solve this? 🥹
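
The version gate floated in this comment could look roughly like the sketch below. It is not code from ROCm-Device-Libs or Clang: fdot2FeatureFor, kFirstMajorWithSplit and FDOT2_TARGET_FEATURE are hypothetical names, and the cutoff of 17 assumes the split first ships in Clang 17 and is not backported. Only __clang_major__ is a real predefined Clang macro.

```cpp
#include <cassert>
#include <string>

// Assumption: the first Clang major release that requires "dot10-insts"
// for __builtin_amdgcn_fdot2. Adjust if the change is backported.
constexpr int kFirstMajorWithSplit = 17;

// Hypothetical helper: which target feature gates __builtin_amdgcn_fdot2
// for a given Clang major version.
std::string fdot2FeatureFor(int clangMajor) {
  return clangMajor >= kFirstMajorWithSplit ? "dot10-insts" : "dot7-insts";
}

// Compile-time variant of the same decision. A library could paste this
// macro wherever it names the feature in a target attribute string.
#if defined(__clang_major__) && __clang_major__ >= 17
#define FDOT2_TARGET_FEATURE "dot10-insts"
#else
#define FDOT2_TARGET_FEATURE "dot7-insts"
#endif
```

A gate like this only covers the library side. It does not help when an older compiler meets a library that already names the new feature unconditionally.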

In D142507#4127167, @aaronmondal wrote:

> Well, I can already feel the pain of distro maintainers having to build the next ROCm releases 😅

> I wonder what the better course of action is here:

> Port this patch to Clang 16 so that users with new hardware will be able to build ROCm 5.5, but make it impossible to build ROCm 5.4 and older with clang 16.

> Don't port this patch and have a ~6 months gap during which users with the 7900 GPUs won't be able to build ROCm with a stable Clang version, requiring distro maintainers to use several toolchains and source-based distro users to use different compatibility patches for different ROCm releases. So basically when 8900 GPUs are announced, clang would support ROCm for 7900 GPUs 😅

> Would there be a way to retain at least *some* backwards compatibility or version interoperability? For instance, via an #ifdef CLANG_VERSION_MAJOR in the device libs and an #ifdef INCOMPATIBLE_AMDGPU_INSTS in Clang?

> This would obviously be very ugly, but it still seems better to me than locking out users (and more likely, ROCm contributors) from using 7900 GPUs if they are unable to build Clang themselves. Users already complain about how hard it is to build ROCm, and they also complain about the frequent breaking changes in Clang. I'm very much in favor of moving fast, but I'm worried that complete disregard for backwards compatibility like this with no clear upgrade path or fallback mechanism could cause a lot of frustration for users and distro maintainers.

> Maybe there is some other, prettier way to solve this? 🥹

I cannot say there was much choice. The only real choice was to postpone the split and magnify the problem in the future. As for the ifdefs, this might be possible in the device-libs but I do not see how to do it in the Builtins.def.

> I cannot say there was much choice. The only real choice was to postpone the split and magnify the problem in the future. As for the ifdefs, this might be possible in the device-libs but I do not see how to do it in the Builtins.def.

Hmm maybe ifdefs in the device libs would also just delay the issue. Maybe it really is best to pull this change into Clang 16 and accept the fact that it's an unfortunate situation, but at least give users with very recent hardware the option to use a regular Clang to build ROCm. Realistically, those actually upgrading to Clang 16 early will also be those upgrading to ROCm 5.5 early and likely also be those most likely to have 7900 GPUs.

Somehow, telling users "if you have a new GPU you need new Clang + ROCm" and "if you want new ROCm for your old GPU you need to also upgrade Clang" sounds better to me than telling them "if you have a new GPU you are SOL unless you use binary releases or build the amd-llvm-fork" 😅

In D142507#4127275, @aaronmondal wrote:

> > I cannot say there was much choice. The only real choice was to postpone the split and magnify the problem in the future. As for the ifdefs, this might be possible in the device-libs but I do not see how to do it in the Builtins.def.

> Hmm maybe ifdefs in the device libs would also just delay the issue. Maybe it really is best to pull this change into Clang 16 and accept the fact that it's an unfortunate situation, but at least give users with very recent hardware the option to use a regular Clang to build ROCm. Realistically, those actually upgrading to Clang 16 early will also be those upgrading to ROCm 5.5 early and likely also be those most likely to have 7900 GPUs.

> Somehow, telling users "if you have a new GPU you need new Clang + ROCm" and "if you want new ROCm for your old GPU you need to also upgrade Clang" sounds better to me than telling them "if you have a new GPU you are SOL unless you use binary releases or build the amd-llvm-fork" 😅

In fact, pulling it into clang-16 does not automatically mean it should be the same in the rocm clang build... So this may be a way to go. @b-sumner do you have any objections to backporting this into clang-16?

@aaronmondal what exactly will the backport look like?

I think, unless conflicts arise, creating an issue similar to https://github.com/llvm/llvm-project/issues/60600 with the cherry-pick line set to this commit should be enough. (See also https://llvm.org/docs/GitHub.html).

In D142507#4127374, @aaronmondal wrote:

> I think, unless conflicts arise, creating an issue similar to https://github.com/llvm/llvm-project/issues/60600 with the cherry-pick line set to this commit should be enough. (See also https://llvm.org/docs/GitHub.html).

I believe it will need D142407 to be cherry-picked as well to apply cleanly. Otherwise I do not expect conflicts. So the c-p needs to go into release/16.x, right?
Let's wait for @b-sumner first anyway, he is maintaining device-lib.

In D142507#4127382, @rampitec wrote:

> In D142507#4127374, @aaronmondal wrote:

> > I think, unless conflicts arise, creating an issue similar to https://github.com/llvm/llvm-project/issues/60600 with the cherry-pick line set to this commit should be enough. (See also https://llvm.org/docs/GitHub.html).

> I believe it will need D142407 to be cherry-picked as well to apply cleanly. Otherwise I do not expect conflicts. So the c-p needs to go into release/16.x, right?
> Let's wait for @b-sumner first anyway, he is maintaining device-lib.

I have no objection to backporting this, but it may need to be accompanied with a device-libs patch, and I don't know where that patch would be checked in. The ROCm-Device-Libs in github certainly doesn't have a "clang-16" branch.

In D142507#4127421, @b-sumner wrote:

> I have no objection to backporting this, but it may need to be accompanied with a device-libs patch, and I don't know where that patch would be checked in. The ROCm-Device-Libs in github certainly doesn't have a "clang-16" branch.

My current understanding is the c-p will go into already forked clang-16, but not to rocm 5.4. So rocm device-libs will be accompanied by the older clang-16 w/o this and stay compatible. Someone building from scratch will use latest clang-16 and staging device-libs with this change. Do you think this will work?

> My current understanding is the c-p will go into already forked clang-16, but not to rocm 5.4. So rocm device-libs will be accompanied by the older clang-16 w/o this and stay compatible. Someone building from scratch will use latest clang-16 and staging device-libs with this change. Do you think this will work?

I wouldn't recommend it. I would patch whatever device libs are being built in association with clang-16, not staging. Staging device libs is only appropriate for the staging compiler. A hash of device libs from around the time that clang-16 stable released would probably be safe.

In D142507#4127505, @b-sumner wrote:

> > My current understanding is the c-p will go into already forked clang-16, but not to rocm 5.4. So rocm device-libs will be accompanied by the older clang-16 w/o this and stay compatible. Someone building from scratch will use latest clang-16 and staging device-libs with this change. Do you think this will work?

> I wouldn't recommend it. I would patch whatever device libs are being built in association with clang-16, not staging. Staging device libs is only appropriate for the staging compiler. A hash of device libs from around the time that clang-16 stable released would probably be safe.

In general the idea is that the compiler and device-libs should match. I guess the correct answer then is that users of clang-16 should use the rocm-5.4.x branch of the device libs?

Revision Contents

Path	Size
clang/include/clang/Basic/BuiltinsAMDGPU.def	2 lines
clang/lib/Basic/Targets/AMDGPU.cpp	4 lines
clang/test/CodeGenOpenCL/amdgpu-features.cl	36 lines
clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl	4 lines
llvm/lib/Target/AMDGPU/AMDGPU.td	19 lines
llvm/lib/Target/AMDGPU/GCNSubtarget.h	5 lines
llvm/lib/Target/AMDGPU/VOP3PInstructions.td	5 lines

Diff 492514

clang/include/clang/Basic/BuiltinsAMDGPU.def

(230 preceding lines not shown)
 TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_v2bf16, "V2sV2s*0V2s", "t", "gfx940-insts")
 TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2bf16, "V2sV2s*1V2s", "t", "gfx940-insts")
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2bf16, "V2sV2s*3V2s", "t", "gfx940-insts")

 //===----------------------------------------------------------------------===//
 // Deep learning builtins.
 //===----------------------------------------------------------------------===//

-TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dot7-insts")
+TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dot10-insts")
 TARGET_BUILTIN(__builtin_amdgcn_fdot2_f16_f16, "hV2hV2hh", "nc", "dot9-insts")
 TARGET_BUILTIN(__builtin_amdgcn_fdot2_bf16_bf16, "sV2sV2ss", "nc", "dot9-insts")
 TARGET_BUILTIN(__builtin_amdgcn_fdot2_f32_bf16, "fV2sV2sfIb", "nc", "dot9-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dot2-insts")
 TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dot2-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dot1-insts")
 TARGET_BUILTIN(__builtin_amdgcn_udot4, "UiUiUiUiIb", "nc", "dot7-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sudot4, "iIbiIbiiIb", "nc", "dot8-insts")
(152 following lines not shown)
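
The last argument of each TARGET_BUILTIN entry names the target feature that gates the builtin, which is why the rename has to move in lock step with any code that names the feature. The sketch below models that relationship in a deliberately simplified way. kRequiredFeature and isBuiltinAvailable are hypothetical names, the table holds only three entries copied from the hunk above, and Clang's real feature check is more involved than a single set lookup.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using FeatureSet = std::set<std::string>;

// Builtin -> required feature after the split, copied from the hunk above.
const std::map<std::string, std::string> kRequiredFeature = {
    {"__builtin_amdgcn_fdot2", "dot10-insts"},
    {"__builtin_amdgcn_udot4", "dot7-insts"},
    {"__builtin_amdgcn_sdot4", "dot1-insts"},
};

// Simplified availability check: the builtin is usable only when its
// single required feature is in the enabled set.
bool isBuiltinAvailable(const std::string &builtin, const FeatureSet &enabled) {
  auto it = kRequiredFeature.find(builtin);
  return it != kRequiredFeature.end() && enabled.count(it->second) > 0;
}
```

Code that still enables only dot7-insts keeps udot4 but loses fdot2 under this table, which is the kind of breakage the thread describes for mismatched compiler and device-libs versions.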

clang/lib/Basic/Targets/AMDGPU.cpp

(191 preceding lines not shown; enclosing context: if (isAMDGCN(getTriple())) {)
     case GK_GFX1101:
     case GK_GFX1100:
       IsWave32Capable = true;
       Features["ci-insts"] = true;
       Features["dot5-insts"] = true;
       Features["dot7-insts"] = true;
       Features["dot8-insts"] = true;
       Features["dot9-insts"] = true;
+      Features["dot10-insts"] = true;
       Features["dl-insts"] = true;
       Features["16-bit-insts"] = true;
       Features["dpp"] = true;
       Features["gfx8-insts"] = true;
       Features["gfx9-insts"] = true;
       Features["gfx10-insts"] = true;
       Features["gfx10-3-insts"] = true;
       Features["gfx11-insts"] = true;
       break;
     case GK_GFX1036:
     case GK_GFX1035:
     case GK_GFX1034:
     case GK_GFX1033:
     case GK_GFX1032:
     case GK_GFX1031:
     case GK_GFX1030:
       IsWave32Capable = true;
       Features["ci-insts"] = true;
       Features["dot1-insts"] = true;
       Features["dot2-insts"] = true;
       Features["dot5-insts"] = true;
       Features["dot6-insts"] = true;
       Features["dot7-insts"] = true;
+      Features["dot10-insts"] = true;
       Features["dl-insts"] = true;
       Features["16-bit-insts"] = true;
       Features["dpp"] = true;
       Features["gfx8-insts"] = true;
       Features["gfx9-insts"] = true;
       Features["gfx10-insts"] = true;
       Features["gfx10-3-insts"] = true;
       Features["s-memrealtime"] = true;
       Features["s-memtime-inst"] = true;
       break;
     case GK_GFX1012:
     case GK_GFX1011:
       Features["dot1-insts"] = true;
       Features["dot2-insts"] = true;
       Features["dot5-insts"] = true;
       Features["dot6-insts"] = true;
       Features["dot7-insts"] = true;
+      Features["dot10-insts"] = true;
       [[fallthrough]];
     case GK_GFX1013:
     case GK_GFX1010:
       IsWave32Capable = true;
       Features["dl-insts"] = true;
       Features["ci-insts"] = true;
       Features["16-bit-insts"] = true;
       Features["dpp"] = true;
(17 lines not shown; enclosing context: case GK_GFX908:)
       Features["dot6-insts"] = true;
       Features["mai-insts"] = true;
       [[fallthrough]];
     case GK_GFX906:
       Features["dl-insts"] = true;
       Features["dot1-insts"] = true;
       Features["dot2-insts"] = true;
       Features["dot7-insts"] = true;
+      Features["dot10-insts"] = true;
       [[fallthrough]];
     case GK_GFX90C:
     case GK_GFX909:
     case GK_GFX904:
     case GK_GFX902:
     case GK_GFX900:
       Features["gfx9-insts"] = true;
       [[fallthrough]];
(238 following lines not shown)
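
The hunk above relies on switch fallthrough: a newer GPU enters at its own case label and then accumulates the features of each older case below it, so adding dot10-insts under GK_GFX906 also enables it for GK_GFX908. The sketch below reproduces that pattern in isolation. GPUKind and defaultFeatures are hypothetical names, only lines visible in the hunk are modelled, and the chain is cut off with a break after the GFX900 case even though the real code falls through further.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class GPUKind { GFX900, GFX906, GFX908 };

// Collect default features by entering the switch at the requested GPU
// and falling through to every older case below it.
std::set<std::string> defaultFeatures(GPUKind kind) {
  std::set<std::string> features;
  switch (kind) {
  case GPUKind::GFX908:
    // Only the lines visible in the hunk; hidden lines set more features.
    features.insert("dot6-insts");
    features.insert("mai-insts");
    [[fallthrough]];
  case GPUKind::GFX906:
    features.insert("dl-insts");
    features.insert("dot1-insts");
    features.insert("dot2-insts");
    features.insert("dot7-insts");
    features.insert("dot10-insts"); // the line added by this revision
    [[fallthrough]];
  case GPUKind::GFX900:
    features.insert("gfx9-insts");
    break; // simplification: the real switch continues to older targets
  }
  return features;
}
```

With this shape, GFX900 gets only gfx9-insts, while GFX906 and GFX908 both pick up dot10-insts from a single inserted line.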

clang/test/CodeGenOpenCL/amdgpu-features.cl

(63 preceding lines not shown)
 // GFX801: "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX802: "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX803: "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX805: "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX810: "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX900: "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX902: "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX904: "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX908: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX908: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX909: "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX90A: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX90A: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX90C: "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX940: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX940: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX1010: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
-// GFX1011: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
+// GFX1011: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
-// GFX1012: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
+// GFX1012: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
 // GFX1013: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
-// GFX1030: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
+// GFX1030: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
-// GFX1031: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
+// GFX1031: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
-// GFX1032: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
+// GFX1032: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
-// GFX1033: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
+// GFX1033: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
-// GFX1034: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
+// GFX1034: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
-// GFX1035: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
+// GFX1035: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
-// GFX1036: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
+// GFX1036: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
-// GFX1100: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
+// GFX1100: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
-// GFX1101: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
+// GFX1101: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
	// GFX1102: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"			// GFX1102: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
	// GFX1103: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"			// GFX1103: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
	// GFX1103-W64: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize64"			// GFX1103-W64: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize64"

	kernel void test() {}			kernel void test() {}

clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl

	[...]
	#pragma OPENCL EXTENSION cl_khr_fp16 : enable			#pragma OPENCL EXTENSION cl_khr_fp16 : enable
	kernel void builtins_amdgcn_dl_insts_err(			kernel void builtins_amdgcn_dl_insts_err(
	global float *fOut, global int *siOut, global uint *uiOut,			global float *fOut, global int *siOut, global uint *uiOut,
	global short *sOut, global int *iOut, global half *hOut,			global short *sOut, global int *iOut, global half *hOut,
	half2 v2hA, half2 v2hB, float fC, half hC,			half2 v2hA, half2 v2hB, float fC, half hC,
	short2 v2ssA, short2 v2ssB, short sC, int siA, int siB, int siC,			short2 v2ssA, short2 v2ssB, short sC, int siA, int siB, int siC,
	ushort2 v2usA, ushort2 v2usB, uint uiA, uint uiB, uint uiC,			ushort2 v2usA, ushort2 v2usB, uint uiA, uint uiB, uint uiC,
	int A, int B, int C) {			int A, int B, int C) {
	fOut[0] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, false); // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot7-insts}}			fOut[0] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, false); // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot10-insts}}
	fOut[1] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, true); // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot7-insts}}			fOut[1] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, true); // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot10-insts}}

	hOut[0] = __builtin_amdgcn_fdot2_f16_f16(v2hA, v2hB, hC); // expected-error {{'__builtin_amdgcn_fdot2_f16_f16' needs target feature dot9-insts}}			hOut[0] = __builtin_amdgcn_fdot2_f16_f16(v2hA, v2hB, hC); // expected-error {{'__builtin_amdgcn_fdot2_f16_f16' needs target feature dot9-insts}}

	sOut[0] = __builtin_amdgcn_fdot2_bf16_bf16(v2ssA, v2ssB, sC); // expected-error {{'__builtin_amdgcn_fdot2_bf16_bf16' needs target feature dot9-insts}}			sOut[0] = __builtin_amdgcn_fdot2_bf16_bf16(v2ssA, v2ssB, sC); // expected-error {{'__builtin_amdgcn_fdot2_bf16_bf16' needs target feature dot9-insts}}

	fOut[3] = __builtin_amdgcn_fdot2_f32_bf16(v2ssA, v2ssB, fC, false); // expected-error {{'__builtin_amdgcn_fdot2_f32_bf16' needs target feature dot9-insts}}			fOut[3] = __builtin_amdgcn_fdot2_f32_bf16(v2ssA, v2ssB, fC, false); // expected-error {{'__builtin_amdgcn_fdot2_f32_bf16' needs target feature dot9-insts}}
	fOut[4] = __builtin_amdgcn_fdot2_f32_bf16(v2ssA, v2ssB, fC, true); // expected-error {{'__builtin_amdgcn_fdot2_f32_bf16' needs target feature dot9-insts}}			fOut[4] = __builtin_amdgcn_fdot2_f32_bf16(v2ssA, v2ssB, fC, true); // expected-error {{'__builtin_amdgcn_fdot2_f32_bf16' needs target feature dot9-insts}}

	[...]
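
The expected-error lines in this test pair each builtin with exactly one required target feature. A minimal standalone sketch of that check follows. The names `BuiltinInfo`, `Builtins`, and `checkBuiltin` are hypothetical and are not Clang's implementation. The table holds only the pairings visible in the new column of the test above.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical table entry: a builtin name and the one feature it requires.
struct BuiltinInfo {
  const char *Name;
  const char *RequiredFeature;
};

// Pairings taken from the expected-error lines (new column) of the test.
static const BuiltinInfo Builtins[] = {
    {"__builtin_amdgcn_fdot2", "dot10-insts"},
    {"__builtin_amdgcn_fdot2_f16_f16", "dot9-insts"},
    {"__builtin_amdgcn_fdot2_bf16_bf16", "dot9-insts"},
    {"__builtin_amdgcn_fdot2_f32_bf16", "dot9-insts"},
};

// Returns the diagnostic text when the required feature is missing, or an
// empty string when the builtin is usable (or is not in the table at all).
std::string checkBuiltin(const std::string &Name,
                         const std::set<std::string> &Enabled) {
  for (const BuiltinInfo &B : Builtins) {
    if (Name != B.Name)
      continue;
    if (Enabled.count(B.RequiredFeature))
      return "";
    return "'" + Name + "' needs target feature " + B.RequiredFeature;
  }
  return "";
}
```

Under this sketch, a target that enables only `dot7-insts` no longer satisfies `__builtin_amdgcn_fdot2`, which matches the change in the expected diagnostic from `dot7-insts` to `dot10-insts`.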

llvm/lib/Target/AMDGPU/AMDGPU.td

[...]	def FeatureDot6Insts : SubtargetFeature<"dot6-insts",
"HasDot6Insts",		"HasDot6Insts",
"true",		"true",
"Has v_dot4c_i32_i8 instruction"		"Has v_dot4c_i32_i8 instruction"
>;		>;

def FeatureDot7Insts : SubtargetFeature<"dot7-insts",		def FeatureDot7Insts : SubtargetFeature<"dot7-insts",
"HasDot7Insts",		"HasDot7Insts",
"true",		"true",
"Has v_dot2_f32_f16, v_dot4_u32_u8, v_dot8_u32_u4 instructions"		"Has v_dot4_u32_u8, v_dot8_u32_u4 instructions"
>;		>;

def FeatureDot8Insts : SubtargetFeature<"dot8-insts",		def FeatureDot8Insts : SubtargetFeature<"dot8-insts",
"HasDot8Insts",		"HasDot8Insts",
"true",		"true",
"Has v_dot4_i32_iu8, v_dot8_i32_iu4 instructions"		"Has v_dot4_i32_iu8, v_dot8_i32_iu4 instructions"
>;		>;

def FeatureDot9Insts : SubtargetFeature<"dot9-insts",		def FeatureDot9Insts : SubtargetFeature<"dot9-insts",
"HasDot9Insts",		"HasDot9Insts",
"true",		"true",
"Has v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16 instructions"		"Has v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16 instructions"
>;		>;

		def FeatureDot10Insts : SubtargetFeature<"dot10-insts",
		"HasDot10Insts",
		"true",
		"Has v_dot2_f32_f16 instruction"
		>;

def FeatureMAIInsts : SubtargetFeature<"mai-insts",		def FeatureMAIInsts : SubtargetFeature<"mai-insts",
"HasMAIInsts",		"HasMAIInsts",
"true",		"true",
"Has mAI instructions"		"Has mAI instructions"
>;		>;

def FeatureFP8Insts : SubtargetFeature<"fp8-insts",		def FeatureFP8Insts : SubtargetFeature<"fp8-insts",
"HasFP8Insts",		"HasFP8Insts",
[...]	def FeatureISAVersion9_0_6 : FeatureSet<
FeatureDsSrc2Insts,		FeatureDsSrc2Insts,
FeatureExtendedImageInsts,		FeatureExtendedImageInsts,
FeatureImageInsts,		FeatureImageInsts,
FeatureMadMacF32Insts,		FeatureMadMacF32Insts,
FeatureDLInsts,		FeatureDLInsts,
FeatureDot1Insts,		FeatureDot1Insts,
FeatureDot2Insts,		FeatureDot2Insts,
FeatureDot7Insts,		FeatureDot7Insts,
		FeatureDot10Insts,
FeatureSupportsSRAMECC,		FeatureSupportsSRAMECC,
FeatureImageGather4D16Bug]>;		FeatureImageGather4D16Bug]>;

def FeatureISAVersion9_0_8 : FeatureSet<		def FeatureISAVersion9_0_8 : FeatureSet<
[FeatureGFX9,		[FeatureGFX9,
HalfRate64Ops,		HalfRate64Ops,
FeatureFmaMixInsts,		FeatureFmaMixInsts,
FeatureLDSBankCount32,		FeatureLDSBankCount32,
FeatureDsSrc2Insts,		FeatureDsSrc2Insts,
FeatureExtendedImageInsts,		FeatureExtendedImageInsts,
FeatureImageInsts,		FeatureImageInsts,
FeatureMadMacF32Insts,		FeatureMadMacF32Insts,
FeatureDLInsts,		FeatureDLInsts,
FeatureDot1Insts,		FeatureDot1Insts,
FeatureDot2Insts,		FeatureDot2Insts,
FeatureDot3Insts,		FeatureDot3Insts,
FeatureDot4Insts,		FeatureDot4Insts,
FeatureDot5Insts,		FeatureDot5Insts,
FeatureDot6Insts,		FeatureDot6Insts,
FeatureDot7Insts,		FeatureDot7Insts,
		FeatureDot10Insts,
FeatureMAIInsts,		FeatureMAIInsts,
FeaturePkFmacF16Inst,		FeaturePkFmacF16Inst,
FeatureAtomicFaddNoRtnInsts,		FeatureAtomicFaddNoRtnInsts,
FeatureAtomicPkFaddNoRtnInsts,		FeatureAtomicPkFaddNoRtnInsts,
FeatureSupportsSRAMECC,		FeatureSupportsSRAMECC,
FeatureMFMAInlineLiteralBug,		FeatureMFMAInlineLiteralBug,
FeatureImageGather4D16Bug]>;		FeatureImageGather4D16Bug]>;

[...]	def FeatureISAVersion9_0_A : FeatureSet<
FeatureFmacF64Inst,		FeatureFmacF64Inst,
FeatureDot1Insts,		FeatureDot1Insts,
FeatureDot2Insts,		FeatureDot2Insts,
FeatureDot3Insts,		FeatureDot3Insts,
FeatureDot4Insts,		FeatureDot4Insts,
FeatureDot5Insts,		FeatureDot5Insts,
FeatureDot6Insts,		FeatureDot6Insts,
FeatureDot7Insts,		FeatureDot7Insts,
		FeatureDot10Insts,
Feature64BitDPP,		Feature64BitDPP,
FeaturePackedFP32Ops,		FeaturePackedFP32Ops,
FeatureMAIInsts,		FeatureMAIInsts,
FeaturePkFmacF16Inst,		FeaturePkFmacF16Inst,
FeatureAtomicFaddRtnInsts,		FeatureAtomicFaddRtnInsts,
FeatureAtomicFaddNoRtnInsts,		FeatureAtomicFaddNoRtnInsts,
FeatureAtomicPkFaddNoRtnInsts,		FeatureAtomicPkFaddNoRtnInsts,
FeatureImageInsts,		FeatureImageInsts,
[...]	def FeatureISAVersion9_4_0 : FeatureSet<
FeatureFmacF64Inst,		FeatureFmacF64Inst,
FeatureDot1Insts,		FeatureDot1Insts,
FeatureDot2Insts,		FeatureDot2Insts,
FeatureDot3Insts,		FeatureDot3Insts,
FeatureDot4Insts,		FeatureDot4Insts,
FeatureDot5Insts,		FeatureDot5Insts,
FeatureDot6Insts,		FeatureDot6Insts,
FeatureDot7Insts,		FeatureDot7Insts,
		FeatureDot10Insts,
Feature64BitDPP,		Feature64BitDPP,
FeaturePackedFP32Ops,		FeaturePackedFP32Ops,
FeatureMAIInsts,		FeatureMAIInsts,
FeatureFP8Insts,		FeatureFP8Insts,
FeaturePkFmacF16Inst,		FeaturePkFmacF16Inst,
FeatureAtomicFaddRtnInsts,		FeatureAtomicFaddRtnInsts,
FeatureAtomicFaddNoRtnInsts,		FeatureAtomicFaddNoRtnInsts,
FeatureAtomicPkFaddNoRtnInsts,		FeatureAtomicPkFaddNoRtnInsts,
[...]	!listconcat(FeatureGroup.GFX10_1_Bugs,
[FeatureGFX10,		[FeatureGFX10,
FeatureLDSBankCount32,		FeatureLDSBankCount32,
FeatureDLInsts,		FeatureDLInsts,
FeatureDot1Insts,		FeatureDot1Insts,
FeatureDot2Insts,		FeatureDot2Insts,
FeatureDot5Insts,		FeatureDot5Insts,
FeatureDot6Insts,		FeatureDot6Insts,
FeatureDot7Insts,		FeatureDot7Insts,
		FeatureDot10Insts,
FeatureNSAEncoding,		FeatureNSAEncoding,
FeatureNSAMaxSize5,		FeatureNSAMaxSize5,
FeatureWavefrontSize32,		FeatureWavefrontSize32,
FeatureScalarStores,		FeatureScalarStores,
FeatureScalarAtomics,		FeatureScalarAtomics,
FeatureScalarFlatScratchInsts,		FeatureScalarFlatScratchInsts,
FeatureGetWaveIdInst,		FeatureGetWaveIdInst,
FeatureMadMacF32Insts,		FeatureMadMacF32Insts,
FeatureDsSrc2Insts,		FeatureDsSrc2Insts,
FeatureLdsMisalignedBug,		FeatureLdsMisalignedBug,
FeatureSupportsXNACK,		FeatureSupportsXNACK,
FeatureBackOffBarrier])>;		FeatureBackOffBarrier])>;

def FeatureISAVersion10_1_2 : FeatureSet<		def FeatureISAVersion10_1_2 : FeatureSet<
!listconcat(FeatureGroup.GFX10_1_Bugs,		!listconcat(FeatureGroup.GFX10_1_Bugs,
[FeatureGFX10,		[FeatureGFX10,
FeatureLDSBankCount32,		FeatureLDSBankCount32,
FeatureDLInsts,		FeatureDLInsts,
FeatureDot1Insts,		FeatureDot1Insts,
FeatureDot2Insts,		FeatureDot2Insts,
FeatureDot5Insts,		FeatureDot5Insts,
FeatureDot6Insts,		FeatureDot6Insts,
FeatureDot7Insts,		FeatureDot7Insts,
		FeatureDot10Insts,
FeatureNSAEncoding,		FeatureNSAEncoding,
FeatureNSAMaxSize5,		FeatureNSAMaxSize5,
FeatureWavefrontSize32,		FeatureWavefrontSize32,
FeatureScalarStores,		FeatureScalarStores,
FeatureScalarAtomics,		FeatureScalarAtomics,
FeatureScalarFlatScratchInsts,		FeatureScalarFlatScratchInsts,
FeatureGetWaveIdInst,		FeatureGetWaveIdInst,
FeatureMadMacF32Insts,		FeatureMadMacF32Insts,
[...]	def FeatureISAVersion10_3_0 : FeatureSet<
FeatureGFX10_3Insts,		FeatureGFX10_3Insts,
FeatureLDSBankCount32,		FeatureLDSBankCount32,
FeatureDLInsts,		FeatureDLInsts,
FeatureDot1Insts,		FeatureDot1Insts,
FeatureDot2Insts,		FeatureDot2Insts,
FeatureDot5Insts,		FeatureDot5Insts,
FeatureDot6Insts,		FeatureDot6Insts,
FeatureDot7Insts,		FeatureDot7Insts,
		FeatureDot10Insts,
FeatureNSAEncoding,		FeatureNSAEncoding,
FeatureNSAMaxSize13,		FeatureNSAMaxSize13,
FeatureWavefrontSize32,		FeatureWavefrontSize32,
FeatureShaderCyclesRegister,		FeatureShaderCyclesRegister,
FeatureBackOffBarrier]>;		FeatureBackOffBarrier]>;

def FeatureISAVersion11_Common : FeatureSet<		def FeatureISAVersion11_Common : FeatureSet<
[FeatureGFX11,		[FeatureGFX11,
FeatureLDSBankCount32,		FeatureLDSBankCount32,
FeatureDLInsts,		FeatureDLInsts,
FeatureDot5Insts,		FeatureDot5Insts,
FeatureDot7Insts,		FeatureDot7Insts,
FeatureDot8Insts,		FeatureDot8Insts,
FeatureDot9Insts,		FeatureDot9Insts,
		FeatureDot10Insts,
FeatureNSAEncoding,		FeatureNSAEncoding,
FeatureNSAMaxSize5,		FeatureNSAMaxSize5,
FeatureWavefrontSize32,		FeatureWavefrontSize32,
FeatureShaderCyclesRegister,		FeatureShaderCyclesRegister,
FeatureArchitectedFlatScratch,		FeatureArchitectedFlatScratch,
FeatureAtomicFaddRtnInsts,		FeatureAtomicFaddRtnInsts,
FeatureAtomicFaddNoRtnInsts,		FeatureAtomicFaddNoRtnInsts,
FeatureFlatAtomicFaddF32Inst,		FeatureFlatAtomicFaddF32Inst,
[...]	def HasDot7Insts : Predicate<"Subtarget->hasDot7Insts()">,
AssemblerPredicate<(all_of FeatureDot7Insts)>;		AssemblerPredicate<(all_of FeatureDot7Insts)>;

def HasDot8Insts : Predicate<"Subtarget->hasDot8Insts()">,		def HasDot8Insts : Predicate<"Subtarget->hasDot8Insts()">,
AssemblerPredicate<(all_of FeatureDot8Insts)>;		AssemblerPredicate<(all_of FeatureDot8Insts)>;

def HasDot9Insts : Predicate<"Subtarget->hasDot9Insts()">,		def HasDot9Insts : Predicate<"Subtarget->hasDot9Insts()">,
AssemblerPredicate<(all_of FeatureDot9Insts)>;		AssemblerPredicate<(all_of FeatureDot9Insts)>;

		def HasDot10Insts : Predicate<"Subtarget->hasDot10Insts()">,
		AssemblerPredicate<(all_of FeatureDot10Insts)>;

def HasGetWaveIdInst : Predicate<"Subtarget->hasGetWaveIdInst()">,		def HasGetWaveIdInst : Predicate<"Subtarget->hasGetWaveIdInst()">,
AssemblerPredicate<(all_of FeatureGetWaveIdInst)>;		AssemblerPredicate<(all_of FeatureGetWaveIdInst)>;

def HasMAIInsts : Predicate<"Subtarget->hasMAIInsts()">,		def HasMAIInsts : Predicate<"Subtarget->hasMAIInsts()">,
AssemblerPredicate<(all_of FeatureMAIInsts)>;		AssemblerPredicate<(all_of FeatureMAIInsts)>;

def HasSMemRealTime : Predicate<"Subtarget->hasSMemRealTime()">,		def HasSMemRealTime : Predicate<"Subtarget->hasSMemRealTime()">,
AssemblerPredicate<(all_of FeatureSMemRealTime)>;		AssemblerPredicate<(all_of FeatureSMemRealTime)>;
[...]

llvm/lib/Target/AMDGPU/GCNSubtarget.h

[...]	protected:
bool HasDot2Insts = false;		bool HasDot2Insts = false;
bool HasDot3Insts = false;		bool HasDot3Insts = false;
bool HasDot4Insts = false;		bool HasDot4Insts = false;
bool HasDot5Insts = false;		bool HasDot5Insts = false;
bool HasDot6Insts = false;		bool HasDot6Insts = false;
bool HasDot7Insts = false;		bool HasDot7Insts = false;
bool HasDot8Insts = false;		bool HasDot8Insts = false;
bool HasDot9Insts = false;		bool HasDot9Insts = false;
		bool HasDot10Insts = false;
bool HasMAIInsts = false;		bool HasMAIInsts = false;
bool HasFP8Insts = false;		bool HasFP8Insts = false;
bool HasPkFmacF16Inst = false;		bool HasPkFmacF16Inst = false;
bool HasAtomicFaddRtnInsts = false;		bool HasAtomicFaddRtnInsts = false;
bool HasAtomicFaddNoRtnInsts = false;		bool HasAtomicFaddNoRtnInsts = false;
bool HasAtomicPkFaddNoRtnInsts = false;		bool HasAtomicPkFaddNoRtnInsts = false;
bool HasFlatAtomicFaddF32Inst = false;		bool HasFlatAtomicFaddF32Inst = false;
bool SupportsSRAMECC = false;		bool SupportsSRAMECC = false;
[...]	public:
bool hasDot8Insts() const {		bool hasDot8Insts() const {
return HasDot8Insts;		return HasDot8Insts;
}		}

bool hasDot9Insts() const {		bool hasDot9Insts() const {
return HasDot9Insts;		return HasDot9Insts;
}		}

		bool hasDot10Insts() const {
		return HasDot10Insts;
		}

bool hasMAIInsts() const {		bool hasMAIInsts() const {
return HasMAIInsts;		return HasMAIInsts;
}		}

bool hasFP8Insts() const {		bool hasFP8Insts() const {
return HasFP8Insts;		return HasFP8Insts;
}		}

[...]
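
To illustrate the effect of the split, here is a minimal standalone sketch of how a `target-features` string could map onto the two flags and gate the affected instructions. `MiniSubtarget`, `parseFeatures`, and `isAvailable` are hypothetical names used only for illustration. They are not part of `GCNSubtarget` or any LLVM API. The instruction-to-feature mapping follows the feature descriptions in AMDGPU.td above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for the two flags touched by the split.
struct MiniSubtarget {
  bool HasDot7Insts = false;
  bool HasDot10Insts = false;
};

// Parses a comma-separated feature string such as "+dot7-insts,+dot10-insts".
// Only the two features relevant to this change are recognized.
MiniSubtarget parseFeatures(const std::string &Features) {
  MiniSubtarget ST;
  std::istringstream In(Features);
  std::string Tok;
  while (std::getline(In, Tok, ',')) {
    if (Tok == "+dot7-insts")
      ST.HasDot7Insts = true;
    else if (Tok == "+dot10-insts")
      ST.HasDot10Insts = true;
  }
  return ST;
}

// After the split, v_dot2_f32_f16 is gated by dot10-insts, while
// v_dot4_u32_u8 and v_dot8_u32_u4 remain gated by dot7-insts.
bool isAvailable(const MiniSubtarget &ST, const std::string &Mnemonic) {
  if (Mnemonic == "v_dot2_f32_f16")
    return ST.HasDot10Insts;
  if (Mnemonic == "v_dot4_u32_u8" || Mnemonic == "v_dot8_u32_u4")
    return ST.HasDot7Insts;
  return false; // Instructions outside this sketch are not modeled.
}
```

With this model, a feature string that carries `+dot7-insts` but not `+dot10-insts` still enables the two integer dot instructions and disables `v_dot2_f32_f16`.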

llvm/lib/Target/AMDGPU/VOP3PInstructions.td

	[...]

	defm V_DOT2_I32_I16 : VOP3PInst<"v_dot2_i32_i16",			defm V_DOT2_I32_I16 : VOP3PInst<"v_dot2_i32_i16",
	VOP3P_Profile<VOP_I32_V2I16_V2I16_I32>, int_amdgcn_sdot2, 1>;			VOP3P_Profile<VOP_I32_V2I16_V2I16_I32>, int_amdgcn_sdot2, 1>;
	defm V_DOT2_U32_U16 : VOP3PInst<"v_dot2_u32_u16",			defm V_DOT2_U32_U16 : VOP3PInst<"v_dot2_u32_u16",
	VOP3P_Profile<VOP_I32_V2I16_V2I16_I32>, int_amdgcn_udot2, 1>;			VOP3P_Profile<VOP_I32_V2I16_V2I16_I32>, int_amdgcn_udot2, 1>;

	} // End SubtargetPredicate = HasDot2Insts			} // End SubtargetPredicate = HasDot2Insts

	let SubtargetPredicate = HasDot7Insts in {			let SubtargetPredicate = HasDot10Insts in

	defm V_DOT2_F32_F16 : VOP3PInst<"v_dot2_f32_f16",			defm V_DOT2_F32_F16 : VOP3PInst<"v_dot2_f32_f16",
	VOP3P_Profile<VOP_F32_V2F16_V2F16_F32, VOP3_REGULAR, /*HasDPP*/ 1>,			VOP3P_Profile<VOP_F32_V2F16_V2F16_F32, VOP3_REGULAR, /*HasDPP*/ 1>,
	AMDGPUfdot2, 1/*ExplicitClamp*/>;			AMDGPUfdot2, 1/*ExplicitClamp*/>;

				let SubtargetPredicate = HasDot7Insts in {
	defm V_DOT4_U32_U8 : VOP3PInst<"v_dot4_u32_u8",			defm V_DOT4_U32_U8 : VOP3PInst<"v_dot4_u32_u8",
	VOP3P_Profile<VOP_I32_I32_I32_I32, VOP3_PACKED>, int_amdgcn_udot4, 1>;			VOP3P_Profile<VOP_I32_I32_I32_I32, VOP3_PACKED>, int_amdgcn_udot4, 1>;
	defm V_DOT8_U32_U4 : VOP3PInst<"v_dot8_u32_u4",			defm V_DOT8_U32_U4 : VOP3PInst<"v_dot8_u32_u4",
	VOP3P_Profile<VOP_I32_I32_I32_I32, VOP3_PACKED>, int_amdgcn_udot8, 1>;			VOP3P_Profile<VOP_I32_I32_I32_I32, VOP3_PACKED>, int_amdgcn_udot8, 1>;

	} // End SubtargetPredicate = HasDot7Insts			} // End SubtargetPredicate = HasDot7Insts

	let SubtargetPredicate = HasDot1Insts in {			let SubtargetPredicate = HasDot1Insts in {
	[...]
	// but the opcode stayed the same so we need to put these in a			// but the opcode stayed the same so we need to put these in a
	// different DecoderNamespace to avoid the ambiguity.			// different DecoderNamespace to avoid the ambiguity.
	defm V_FMA_MIX_F32 : VOP3P_Real_vi <0x20>;			defm V_FMA_MIX_F32 : VOP3P_Real_vi <0x20>;
	defm V_FMA_MIXLO_F16 : VOP3P_Real_vi <0x21>;			defm V_FMA_MIXLO_F16 : VOP3P_Real_vi <0x21>;
	defm V_FMA_MIXHI_F16 : VOP3P_Real_vi <0x22>;			defm V_FMA_MIXHI_F16 : VOP3P_Real_vi <0x22>;
	}			}
	}			}


	defm V_DOT2_I32_I16 : VOP3P_Real_vi <0x26>;			defm V_DOT2_I32_I16 : VOP3P_Real_vi <0x26>;
	foad: Is this because Real instructions copy predicates from the Pseudo? Seems like this could go in as a separate obvious cleanup.
	rampitec: D142575
	defm V_DOT2_U32_U16 : VOP3P_Real_vi <0x27>;			defm V_DOT2_U32_U16 : VOP3P_Real_vi <0x27>;

	defm V_DOT2_F32_F16 : VOP3P_Real_vi <0x23>;			defm V_DOT2_F32_F16 : VOP3P_Real_vi <0x23>;
	defm V_DOT4_U32_U8 : VOP3P_Real_vi <0x29>;			defm V_DOT4_U32_U8 : VOP3P_Real_vi <0x29>;
	defm V_DOT8_U32_U4 : VOP3P_Real_vi <0x2b>;			defm V_DOT8_U32_U4 : VOP3P_Real_vi <0x2b>;

	defm V_DOT4_I32_I8 : VOP3P_Real_vi <0x28>;			defm V_DOT4_I32_I8 : VOP3P_Real_vi <0x28>;
	defm V_DOT8_I32_I4 : VOP3P_Real_vi <0x2a>;			defm V_DOT8_I32_I4 : VOP3P_Real_vi <0x2a>;
	[...]