This is an archive of the discontinued LLVM Phabricator instance.

Differential D82087

AMDGPU/clang: Add builtins for llvm.amdgcn.ballot
ClosedPublic

Authored by arsenm on Jun 18 2020, 5:35 AM.

Details

Reviewers

yaxunl
rampitec
b-sumner
foad
nhaehnle
sameerds

Summary

I wasn't sure what the best strategy was for the wave size
difference. I went for an explicit, enforced builtin for each. The
other option would be to just assume wave64, and IRGen the different
mangling + zext. I didn't see an obvious way to check the wave size
where builtins are emitted, and it might be beneficial to force you to
acknowledge the wave size difference? Or it might be an unnecessary
complication.

The behavior is also slightly odd when directly specifying
-target-feature to cc1 for +/- the size (since we have both positive
and negative forms of both sizes), but this is probably unimportant.

We're also still missing a predefined macro for the wave size, which
we probably need.

Diff Detail

Event Timeline

arsenm created this revision. Jun 18 2020, 5:35 AM

Herald added subscribers: kerbowa, t-tye, tpr and 4 others. Jun 18 2020, 5:35 AM

ping

sameerds added a subscriber: sameerds. Jul 8 2020, 8:34 PM

The documentation for HIP __ballot seems to indicate that the user does not have to explicitly specify the warp size. How is that achieved with these new builtins? Can this be captured in a lit test here?

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

clang/lib/Basic/Targets/AMDGPU.cpp
348	So the implication here is that wave32 is the preferred choice on newer architectures, and hence the default when available?

yaxunl added inline comments. Jul 9 2020, 5:45 AM

clang/lib/Basic/Targets/AMDGPU.cpp
353	what's the default wave front size in backend for gfx10* before this change?
clang/test/CodeGenOpenCL/amdgpu-features.cl
7	what happens if both +wavefrontsize32 and +wavefrontsize64 are specified?

In D82087#2140778, @sameerds wrote:

The documentation for HIP __ballot seems to indicate that the user does not have to explicitly specify the warp size. How is that achieved with these new builtins? Can this be captured in a lit test here?

This seems like a defect in the design to me

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

arsenm marked 2 inline comments as done. Jul 10 2020, 11:04 AM

arsenm added inline comments.

clang/lib/Basic/Targets/AMDGPU.cpp
348	Yes, this has always been the case
353	It's always been wave32

In D82087#2140778, @sameerds wrote:

The documentation for HIP __ballot seems to indicate that the user does not have to explicitly specify the warp size. How is that achieved with these new builtins? Can this be captured in a lit test here?

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

I think if the language interface insists on fixing the wave size, then I think the correct solution is to implement this in the header based on a wave size macro (which we're desperately missing). The library implementation should be responsible for inserting the extension to 64-bit for wave32

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

I think if the language interface insists on fixing the wave size, then I think the correct solution is to implement this in the header based on a wave size macro (which we're desperately missing). The library implementation should be responsible for inserting the extension to 64-bit for wave32

Not sure if the frontend should try to infer warpsize and the mask size, or even whether it can in all cases. But this can result in wrong behaviour when the program passes 32-bit mask but then gets compiled for a 64-bit mask. It's easy to say that the programmer must not assume a warp-size, but it would be useful if the language can altogether avoid the confusion.

In D82087#2144712, @arsenm wrote:

In D82087#2140778, @sameerds wrote:

The documentation for HIP __ballot seems to indicate that the user does not have to explicitly specify the warp size. How is that achieved with these new builtins? Can this be captured in a lit test here?

This seems like a defect in the design to me

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

I tend to agree. The HIP vote/ballot builtins are also missing a mask parameter, whose type needs to match the wavesize.

In D82087#2146170, @sameerds wrote:

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

I think if the language interface insists on fixing the wave size, then I think the correct solution is to implement this in the header based on a wave size macro (which we're desperately missing). The library implementation should be responsible for inserting the extension to 64-bit for wave32

Agree that FE should have a predefined macro for wave front size. Since it is per amdgpu target and not per language, it should be named as __amdgpu_wavefront_size__ or something similar, then it could be used by all languages.

Then we need to initialize warpSize in HIP header by this macro
https://github.com/ROCm-Developer-Tools/HIP/blob/386a0e0123d67b95b4c0ebb3ebcf1d1615758146/include/hip/hcc_detail/device_functions.h#L300

I tend to think we should define __ballot in the HIP header conditionally on the wavefront size so that the return type is correct for both wave32 and wave64 mode. We should assume normal HIP compilation always has -mcpu specified so that the wavefront size is known at compile time. If -mcpu is not specified, we should probably not define warpSize or __ballot.
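
The header-based approach proposed in this comment could look roughly like the sketch below. It is only an illustration: `SKETCH_WAVEFRONT_SIZE`, `lane_mask_t`, `warp_size` and `full_lane_mask` are hypothetical names, and the macro stands in for the predefined wavefront-size macro that the thread says is still missing. On a device build, a `__ballot` wrapper would presumably return `lane_mask_t` and call `__builtin_amdgcn_ballot_w32` or `__builtin_amdgcn_ballot_w64` accordingly.

```cpp
#include <cstdint>

// Hypothetical stand-in for a compiler-predefined wavefront-size macro.
// The review only proposes such a macro; the name and the default of 64
// are assumptions made so this sketch compiles on a host.
#ifndef SKETCH_WAVEFRONT_SIZE
#define SKETCH_WAVEFRONT_SIZE 64
#endif

// Pick the lane-mask type from the wave size, so a 32-bit mask is never
// silently used with a 64-lane wave (or the other way round).
#if SKETCH_WAVEFRONT_SIZE == 32
using lane_mask_t = std::uint32_t;
#elif SKETCH_WAVEFRONT_SIZE == 64
using lane_mask_t = std::uint64_t;
#else
#error "unsupported wavefront size"
#endif

// Compile-time wave size, initialized from the macro.
constexpr int warp_size = SKETCH_WAVEFRONT_SIZE;

// Mask with one bit set for every lane in the wave.
constexpr lane_mask_t full_lane_mask() {
  return static_cast<lane_mask_t>(~lane_mask_t{0});
}
```

Keying the type off a single macro means user code that stores ballot results in `lane_mask_t` stays correct when the same source is compiled for either wave size.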

A macro for wavefront size would make targeting gfx10 for openmp easier.

We currently use an int32_t for nvptx and an int64_t for amdgcn in various runtime function interfaces. I'd like to be able to set the latter based on said macro.

Can we land this? I'd like to use the new intrinsics as I don't understand the old ones.

Herald added a project: Restricted Project. Sep 17 2022, 4:43 PM

Herald added a subscriber: kosarev.

In D82087#3797883, @jdoerfert wrote:

Can we land this? I'd like to use the new intrinsics as I don't understand the old ones.

What do you think about using the two separate builtins, vs. one magic builtin that auto-changes the wavesize?

In D82087#3803464, @arsenm wrote:

In D82087#3797883, @jdoerfert wrote:

Can we land this? I'd like to use the new intrinsics as I don't understand the old ones.

What do you think about using the two separate builtins, vs. one magic builtin that auto-changes the wavesize?

The magic one would also change its return type, or always be i64 with high bits (zext or maybe sext or maybe copy of low), so less magic seems clearer
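
To make the ambiguity concrete, here is a small host-side sketch of the three widening choices mentioned above for a single builtin that always returns 64 bits. The function names are hypothetical and this models only the integer arithmetic, not the real builtin or intrinsic.

```cpp
#include <cstdint>

// Three ways a single 64-bit-returning ballot could widen a wave32 result.
// All names here are illustrative; none of them exist in clang.

// Zero-extend: the high 32 bits are always 0.
constexpr std::uint64_t widen_zext(std::uint32_t mask) {
  return mask;
}

// Sign-extend: the high 32 bits replicate bit 31 of the wave32 mask.
constexpr std::uint64_t widen_sext(std::uint32_t mask) {
  return (mask & 0x80000000u) ? (0xFFFFFFFF00000000ull | mask)
                              : std::uint64_t{mask};
}

// Copy the low half into the high half.
constexpr std::uint64_t widen_copy_low(std::uint32_t mask) {
  return (std::uint64_t{mask} << 32) | mask;
}
```

For a mask with lanes 0 and 31 set, the three variants give different 64-bit values, so code that counts set bits or compares against an all-lanes mask would behave differently depending on which convention was picked. Separate `_w32` and `_w64` builtins avoid having to pick one.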

Rebase. Use _w32/_w64 suffixes since some other wave specific builtins seem to have gone with that convention

Harbormaster completed remote builds in B204686: Diff 484980. Dec 22 2022, 5:02 PM

sameerds added inline comments. Dec 23 2022, 1:01 AM

clang/test/CodeGenOpenCL/amdgpu-features.cl
7	Shouldn't this be separately an error in itself? Is it tested elsewhere?

arsenm added inline comments. Dec 23 2022, 4:53 AM

clang/test/CodeGenOpenCL/amdgpu-features.cl
7	It looks like you end up with both features set by clang, and wave64 wins in codegen

This doesn't work correctly for unspecified wavesize for non-wave32 targets

Fix unknown target handling, diagnose some more of the errors

Harbormaster completed remote builds in B204774: Diff 485110. Dec 23 2022, 7:23 AM

sameerds accepted this revision. Dec 26 2022, 12:39 AM

sameerds added inline comments.

clang/lib/Basic/Targets/AMDGPU.cpp
353	I would have preferred this to be a separate change, just like the FIXME for diagnosing wavefrontsize32 on targets that don't support it. But not feeling strongly enough to block this change!

This revision is now accepted and ready to land. Dec 26 2022, 12:39 AM

And note that the change description is written in a first-person train of thought. Please do rewrite it!

f4bcd7f598331457cfe74e459b489d4098369511

Revision Contents

Path                                                         Size
clang/include/clang/Basic/BuiltinsAMDGPU.def                 21 lines
clang/lib/Basic/Targets/AMDGPU.cpp                           33 lines
clang/lib/CodeGen/CGBuiltin.cpp                              7 lines
clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl          6 lines
clang/test/CodeGenOpenCL/amdgpu-features.cl                  87 lines
clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl            7 lines
clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl           26 lines
clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl           26 lines
clang/test/OpenMP/amdgcn-attributes.cpp                      4 lines
clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl        17 lines
clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl        16 lines

Diff 485110

clang/include/clang/Basic/BuiltinsAMDGPU.def

 BUILTIN(__builtin_amdgcn_cubeid, "ffff", "nc")
 BUILTIN(__builtin_amdgcn_cubesc, "ffff", "nc")
 BUILTIN(__builtin_amdgcn_cubetc, "ffff", "nc")
 BUILTIN(__builtin_amdgcn_cubema, "ffff", "nc")
 BUILTIN(__builtin_amdgcn_s_sleep, "vIi", "n")
 BUILTIN(__builtin_amdgcn_s_incperflevel, "vIi", "n")
 BUILTIN(__builtin_amdgcn_s_decperflevel, "vIi", "n")
 BUILTIN(__builtin_amdgcn_s_setprio, "vIs", "n")
-BUILTIN(__builtin_amdgcn_uicmp, "WUiUiUiIi", "nc")
-BUILTIN(__builtin_amdgcn_uicmpl, "WUiWUiWUiIi", "nc")
-BUILTIN(__builtin_amdgcn_sicmp, "WUiiiIi", "nc")
-BUILTIN(__builtin_amdgcn_sicmpl, "WUiWiWiIi", "nc")
-BUILTIN(__builtin_amdgcn_fcmp, "WUiddIi", "nc")
-BUILTIN(__builtin_amdgcn_fcmpf, "WUiffIi", "nc")
 BUILTIN(__builtin_amdgcn_ds_swizzle, "iiIi", "nc")
 BUILTIN(__builtin_amdgcn_ds_permute, "iii", "nc")
 BUILTIN(__builtin_amdgcn_ds_bpermute, "iii", "nc")
 BUILTIN(__builtin_amdgcn_readfirstlane, "ii", "nc")
 BUILTIN(__builtin_amdgcn_readlane, "iii", "nc")
 BUILTIN(__builtin_amdgcn_fmed3f, "ffff", "nc")
 BUILTIN(__builtin_amdgcn_ds_faddf, "ff*3fIiIiIb", "n")
 BUILTIN(__builtin_amdgcn_ds_fminf, "ff*3fIiIiIb", "n")
 [14 unchanged lines not shown]
 BUILTIN(__builtin_amdgcn_msad_u8, "UiUiUiUi", "nc")
 BUILTIN(__builtin_amdgcn_sad_hi_u8, "UiUiUiUi", "nc")
 BUILTIN(__builtin_amdgcn_sad_u16, "UiUiUiUi", "nc")
 BUILTIN(__builtin_amdgcn_qsad_pk_u16_u8, "WUiWUiUiWUi", "nc")
 BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", "nc")
 BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc")

 //===----------------------------------------------------------------------===//
+// Ballot builtins.
+//===----------------------------------------------------------------------===//
+
+TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "Uib", "nc", "wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_ballot_w64, "LUib", "nc", "wavefrontsize64")
+
+// Deprecated intrinsics in favor of __builtin_amdgn_ballot_{w32|w64}
+BUILTIN(__builtin_amdgcn_uicmp, "WUiUiUiIi", "nc")
+BUILTIN(__builtin_amdgcn_uicmpl, "WUiWUiWUiIi", "nc")
+BUILTIN(__builtin_amdgcn_sicmp, "WUiiiIi", "nc")
+BUILTIN(__builtin_amdgcn_sicmpl, "WUiWiWiIi", "nc")
+BUILTIN(__builtin_amdgcn_fcmp, "WUiddIi", "nc")
+BUILTIN(__builtin_amdgcn_fcmpf, "WUiffIi", "nc")
+
+//===----------------------------------------------------------------------===//
 // CI+ only builtins.
 //===----------------------------------------------------------------------===//
 TARGET_BUILTIN(__builtin_amdgcn_s_dcache_inv_vol, "v", "n", "ci-insts")
 TARGET_BUILTIN(__builtin_amdgcn_buffer_wbinvl1_vol, "v", "n", "ci-insts")
 TARGET_BUILTIN(__builtin_amdgcn_ds_gws_sema_release_all, "vUi", "n", "ci-insts")
 TARGET_BUILTIN(__builtin_amdgcn_is_shared, "bvC*0", "nc", "flat-address-space")
 TARGET_BUILTIN(__builtin_amdgcn_is_private, "bvC*0", "nc", "flat-address-space")

clang/lib/Basic/Targets/AMDGPU.cpp

 //===--- AMDGPU.cpp - Implement AMDGPU target feature support -------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file implements AMDGPU TargetInfo objects.
 //
 //===----------------------------------------------------------------------===//

 #include "AMDGPU.h"
 #include "clang/Basic/Builtins.h"
 #include "clang/Basic/CodeGenOptions.h"
+#include "clang/Basic/Diagnostic.h"
 #include "clang/Basic/LangOptions.h"
 #include "clang/Basic/MacroBuilder.h"
 #include "clang/Basic/TargetBuiltins.h"

 using namespace clang;
 using namespace clang::targets;

 namespace clang {
 [149 unchanged lines not shown]

 ArrayRef<const char *> AMDGPUTargetInfo::getGCCRegNames() const {
   return llvm::makeArrayRef(GCCRegNames);
 }

 bool AMDGPUTargetInfo::initFeatureMap(
     llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags, StringRef CPU,
     const std::vector<std::string> &FeatureVec) const {
+  const bool IsNullCPU = CPU.empty();
+  bool IsWave32Capable = false;
+
   using namespace llvm::AMDGPU;

   // XXX - What does the member GPU mean if device name string passed here?
   if (isAMDGCN(getTriple())) {
     switch (llvm::AMDGPU::parseArchAMDGCN(CPU)) {
     case GK_GFX1103:
     case GK_GFX1102:
     case GK_GFX1101:
     case GK_GFX1100:
+      IsWave32Capable = true;
       Features["ci-insts"] = true;
       Features["dot1-insts"] = true;
       Features["dot5-insts"] = true;
       Features["dot6-insts"] = true;
       Features["dot7-insts"] = true;
       Features["dot8-insts"] = true;
       Features["dl-insts"] = true;
       Features["flat-address-space"] = true;
       Features["16-bit-insts"] = true;
       Features["dpp"] = true;
       Features["gfx8-insts"] = true;
       Features["gfx9-insts"] = true;
       Features["gfx10-insts"] = true;
       Features["gfx10-3-insts"] = true;
       Features["gfx11-insts"] = true;
       break;
     case GK_GFX1036:
     case GK_GFX1035:
     case GK_GFX1034:
     case GK_GFX1033:
     case GK_GFX1032:
     case GK_GFX1031:
     case GK_GFX1030:
+      IsWave32Capable = true;
       Features["ci-insts"] = true;
       Features["dot1-insts"] = true;
       Features["dot2-insts"] = true;
       Features["dot5-insts"] = true;
       Features["dot6-insts"] = true;
       Features["dot7-insts"] = true;
       Features["dl-insts"] = true;
       Features["flat-address-space"] = true;
 [11 unchanged lines not shown; context: case GK_GFX1011:]
       Features["dot1-insts"] = true;
       Features["dot2-insts"] = true;
       Features["dot5-insts"] = true;
       Features["dot6-insts"] = true;
       Features["dot7-insts"] = true;
       [[fallthrough]];
     case GK_GFX1013:
     case GK_GFX1010:
+      IsWave32Capable = true;
       Features["dl-insts"] = true;
       Features["ci-insts"] = true;
       Features["flat-address-space"] = true;
       Features["16-bit-insts"] = true;
       Features["dpp"] = true;
       Features["gfx8-insts"] = true;
       Features["gfx9-insts"] = true;
       Features["gfx10-insts"] = true;
 [80 unchanged lines not shown; context: if (isAMDGCN(getTriple())) {]
     case GK_R630:
     case GK_R600:
       break;
     default:
       llvm_unreachable("Unhandled GPU!");
     }
   }

-  return TargetInfo::initFeatureMap(Features, Diags, CPU, FeatureVec);
+  if (!TargetInfo::initFeatureMap(Features, Diags, CPU, FeatureVec))
+    return false;
+
+  // FIXME: Not diagnosing wavefrontsize32 on wave64 only targets.
+  const bool HaveWave32 =
+      (IsWave32Capable || IsNullCPU) && Features.count("wavefrontsize32");
+  const bool HaveWave64 = Features.count("wavefrontsize64");
+
+  // TODO: Should move this logic into TargetParser
+  if (HaveWave32 && HaveWave64) {
+    Diags.Report(diag::err_invalid_feature_combination)
+        << "'wavefrontsize32' and 'wavefrontsize64' are mutually exclusive";
+    return false;
+  }
+
+  // Don't assume any wavesize with an unknown subtarget.
+  if (!IsNullCPU) {
+    // Default to wave32 if available, or wave64 if not
+    if (!HaveWave32 && !HaveWave64) {
+      StringRef DefaultWaveSizeFeature =
+          IsWave32Capable ? "wavefrontsize32" : "wavefrontsize64";
+      Features.insert(std::make_pair(DefaultWaveSizeFeature, true));
+    }
+  }
+
+  return true;
 }

 void AMDGPUTargetInfo::fillValidCPUList(
     SmallVectorImpl<StringRef> &Values) const {
   if (isAMDGCN(getTriple()))
     llvm::AMDGPU::fillValidArchListAMDGCN(Values);
   else
     llvm::AMDGPU::fillValidArchListR600(Values);
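
The wave-size selection added to initFeatureMap can be summarized by the standalone sketch below. It is a simplified model rather than the clang code: `WaveSize` and `resolveWaveSize` are hypothetical names, and the two boolean feature arguments stand for "the feature key is present in the map", which is what `Features.count` checks in the patch.

```cpp
// Simplified, standalone model of the wave-size selection in the patch.
// WaveSize and resolveWaveSize are illustrative names, not clang APIs.
enum class WaveSize { Unspecified, W32, W64, Conflict };

constexpr WaveSize resolveWaveSize(bool isNullCPU, bool isWave32Capable,
                                   bool hasWave32Feature,
                                   bool hasWave64Feature) {
  // wavefrontsize32 only counts on wave32-capable targets or when no CPU
  // was given at all.
  const bool haveWave32 = (isWave32Capable || isNullCPU) && hasWave32Feature;
  const bool haveWave64 = hasWave64Feature;

  // Both sizes requested: the patch reports an error here.
  if (haveWave32 && haveWave64)
    return WaveSize::Conflict;
  if (haveWave32)
    return WaveSize::W32;
  if (haveWave64)
    return WaveSize::W64;

  // Don't assume any wave size with an unknown subtarget.
  if (isNullCPU)
    return WaveSize::Unspecified;

  // Default to wave32 if available, or wave64 if not. As in the FIXME,
  // a wavefrontsize32 request on a wave64-only target is not diagnosed;
  // this model just reports the inserted default.
  return isWave32Capable ? WaveSize::W32 : WaveSize::W64;
}
```

For example, a wave32-capable target with no explicit feature resolves to `WaveSize::W32`, while the same target with only `wavefrontsize64` requested resolves to `WaveSize::W64`.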

clang/lib/CodeGen/CGBuiltin.cpp

   case AMDGPU::BI__builtin_amdgcn_fracth:
     return emitUnaryBuiltin(*this, E, Intrinsic::amdgcn_fract);
   case AMDGPU::BI__builtin_amdgcn_lerp:
     return emitTernaryBuiltin(*this, E, Intrinsic::amdgcn_lerp);
   case AMDGPU::BI__builtin_amdgcn_ubfe:
     return emitTernaryBuiltin(*this, E, Intrinsic::amdgcn_ubfe);
   case AMDGPU::BI__builtin_amdgcn_sbfe:
     return emitTernaryBuiltin(*this, E, Intrinsic::amdgcn_sbfe);
+  case AMDGPU::BI__builtin_amdgcn_ballot_w32:
+  case AMDGPU::BI__builtin_amdgcn_ballot_w64: {
+    llvm::Type *ResultType = ConvertType(E->getType());
+    llvm::Value *Src = EmitScalarExpr(E->getArg(0));
+    Function *F = CGM.getIntrinsic(Intrinsic::amdgcn_ballot, { ResultType });
+    return Builder.CreateCall(F, { Src });
+  }
   case AMDGPU::BI__builtin_amdgcn_uicmp:
   case AMDGPU::BI__builtin_amdgcn_uicmpl:
   case AMDGPU::BI__builtin_amdgcn_sicmp:
   case AMDGPU::BI__builtin_amdgcn_sicmpl: {
     llvm::Value *Src0 = EmitScalarExpr(E->getArg(0));
     llvm::Value *Src1 = EmitScalarExpr(E->getArg(1));
     llvm::Value *Src2 = EmitScalarExpr(E->getArg(2));

clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl

This file was added.

+// RUN: not %clang_cc1 -triple amdgcn -target-feature +wavefrontsize32 -target-feature +wavefrontsize64 -o /dev/null %s 2>&1 | FileCheck %s
+// RUN: not %clang_cc1 -triple amdgcn -target-cpu gfx1103 -target-feature +wavefrontsize32 -target-feature +wavefrontsize64 -o /dev/null %s 2>&1 | FileCheck %s
+
+// CHECK: error: invalid feature combination: 'wavefrontsize32' and 'wavefrontsize64' are mutually exclusive
+
+kernel void test() {}

clang/test/CodeGenOpenCL/amdgpu-features.cl

 // REQUIRES: amdgpu-registered-target

 // Check that appropriate features are defined for every supported AMDGPU
 // "-target" and "-mcpu" options.

+// RUN: %clang_cc1 -triple amdgcn -S -emit-llvm -o - %s | FileCheck --check-prefix=NOCPU %s
+// RUN: %clang_cc1 -triple amdgcn -target-feature +wavefrontsize32 -S -emit-llvm -o - %s | FileCheck --check-prefix=NOCPU-WAVE32 %s
+// RUN: %clang_cc1 -triple amdgcn -target-feature +wavefrontsize64 -S -emit-llvm -o - %s | FileCheck --check-prefix=NOCPU-WAVE64 %s
+
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx600 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX600 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx601 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX601 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx602 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX602 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx700 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX700 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx701 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX701 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx702 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX702 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx703 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX703 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx704 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX704 %s
 [23 unchanged lines not shown]
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1034 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1034 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1035 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1035 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1036 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1036 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1100 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1100 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1101 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1101 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1102 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1102 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1103 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1103 %s

+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1103 -target-feature +wavefrontsize64 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1103-W64 %s
+
+// NOCPU-NOT: "target-features"
+// NOCPU-WAVE32: "target-features"="+wavefrontsize32"
+// NOCPU-WAVE64: "target-features"="+wavefrontsize64"
+
-// GFX600: "target-features"="+s-memtime-inst"
-// GFX601: "target-features"="+s-memtime-inst"
-// GFX602: "target-features"="+s-memtime-inst"
-// GFX700: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst"
-// GFX701: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst"
-// GFX702: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst"
-// GFX703: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst"
-// GFX704: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst"
-// GFX705: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst"
-// GFX801: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst"
-// GFX802: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst"
-// GFX803: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst"
-// GFX805: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst"
-// GFX810: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst"
-// GFX900: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"
-// GFX902: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"
-// GFX904: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"
-// GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot7-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"
-// GFX908: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+mai-insts,+s-memrealtime,+s-memtime-inst"
-// GFX909: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"
-// GFX90A: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+mai-insts,+s-memrealtime,+s-memtime-inst"
+// GFX600: "target-features"="+s-memtime-inst,+wavefrontsize64"
+// GFX601: "target-features"="+s-memtime-inst,+wavefrontsize64"
+// GFX602: "target-features"="+s-memtime-inst,+wavefrontsize64"
+// GFX700: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst,+wavefrontsize64"
+// GFX701: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst,+wavefrontsize64"
+// GFX702: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst,+wavefrontsize64"
+// GFX703: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst,+wavefrontsize64"
+// GFX704: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst,+wavefrontsize64"
+// GFX705: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst,+wavefrontsize64"
+// GFX801: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX802: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX803: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX805: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX810: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX900: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
	// GFX90C: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"			// GFX902: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
	// GFX940: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst"			// GFX904: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
	// GFX1010: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"			// GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot7-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
	// GFX1011: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"			// GFX908: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
	// GFX1012: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"			// GFX909: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
	// GFX1013: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"			// GFX90A: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
	// GFX1030: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"			// GFX90C: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
	// GFX1031: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"			// GFX940: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
	// GFX1032: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"			// GFX1010: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
	// GFX1033: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"			// GFX1011: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
	// GFX1034: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"			// GFX1012: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
	// GFX1035: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"			// GFX1013: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
	// GFX1036: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"			// GFX1030: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
	// GFX1100: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts"			// GFX1031: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
	// GFX1101: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts"			// GFX1032: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
	// GFX1102: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts"			// GFX1033: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
	// GFX1103: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts"			// GFX1034: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
				// GFX1035: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
				// GFX1036: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
				// GFX1100: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
				// GFX1101: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
				// GFX1102: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
				// GFX1103: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
				// GFX1103-W64: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize64"

	kernel void test() {}			kernel void test() {}

clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl

	[31 unchanged lines not shown]
	}			}

	// CHECK-LABEL: @test_groupstaticsize			// CHECK-LABEL: @test_groupstaticsize
	// CHECK: call i32 @llvm.amdgcn.groupstaticsize()			// CHECK: call i32 @llvm.amdgcn.groupstaticsize()
	void test_groupstaticsize(global uint* out)			void test_groupstaticsize(global uint* out)
	{			{
	*out = __builtin_amdgcn_groupstaticsize();			*out = __builtin_amdgcn_groupstaticsize();
	}			}

				// CHECK-LABEL: @test_ballot_wave32(
				// CHECK: call i32 @llvm.amdgcn.ballot.i32(i1 %{{.+}})
				void test_ballot_wave32(global uint* out, int a, int b)
				{
				*out = __builtin_amdgcn_ballot_w32(a == b);
				}

clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl

This file was added.

				// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -D__AMDGCN_WAVEFRONT_SIZE=32 -target-feature +wavefrontsize32 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s
				// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -target-cpu gfx1010 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s
				// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -target-cpu gfx1010 -target-feature +wavefrontsize32 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s
				// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -target-cpu gfx1100 -target-feature +wavefrontsize32 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s

				typedef unsigned int uint;


				// CHECK-LABEL: @test_ballot_wave32(
				// CHECK: call i32 @llvm.amdgcn.ballot.i32(i1 %{{.+}})
				void test_ballot_wave32(global uint* out, int a, int b)
				{
				*out = __builtin_amdgcn_ballot_w32(a == b);
				}

				// CHECK-LABEL: @test_ballot_wave32_target_attr(
				// CHECK: call i32 @llvm.amdgcn.ballot.i32(i1 %{{.+}})
				__attribute__((target("wavefrontsize32")))
				void test_ballot_wave32_target_attr(global uint* out, int a, int b)
				{
				*out = __builtin_amdgcn_ballot_w32(a == b);
				}

				#if __AMDGCN_WAVEFRONT_SIZE != 32
				#error Wrong wavesize detected
				#endif

clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl

This file was added.

				// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -target-feature +wavefrontsize64 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s
				// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -target-cpu gfx900 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s
				// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -target-cpu gfx900 -target-feature +wavefrontsize64 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s
				// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -target-cpu gfx1010 -target-feature +wavefrontsize64 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s
				// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -target-cpu gfx1100 -target-feature +wavefrontsize64 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s

				typedef unsigned long ulong;

				// CHECK-LABEL: @test_ballot_wave64(
				// CHECK: call i64 @llvm.amdgcn.ballot.i64(i1 %{{.+}})
				void test_ballot_wave64(global ulong* out, int a, int b)
				{
				*out = __builtin_amdgcn_ballot_w64(a == b);
				}

				// CHECK-LABEL: @test_ballot_wave64_target_attr(
				// CHECK: call i64 @llvm.amdgcn.ballot.i64(i1 %{{.+}})
				__attribute__((target("wavefrontsize64")))
				void test_ballot_wave64_target_attr(global ulong* out, int a, int b)
				{
				*out = __builtin_amdgcn_ballot_w64(a == b);
				}

				#if __AMDGCN_WAVEFRONT_SIZE != 64
				#error Wrong wavesize detected
				#endif

clang/test/OpenMP/amdgcn-attributes.cpp

	[27 unchanged lines not shown]
	}			}

	int callable(int x) {			int callable(int x) {
	// ALL-LABEL: @_Z8callablei(i32 noundef %x) #1			// ALL-LABEL: @_Z8callablei(i32 noundef %x) #1
	return x + 1;			return x + 1;
	}			}

	// DEFAULT: attributes #0 = { convergent noinline norecurse nounwind optnone "frame-pointer"="none" "kernel" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }			// DEFAULT: attributes #0 = { convergent noinline norecurse nounwind optnone "frame-pointer"="none" "kernel" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
	// CPU: attributes #0 = { convergent noinline norecurse nounwind optnone "frame-pointer"="none" "kernel" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" "uniform-work-group-size"="true" }			// CPU: attributes #0 = { convergent noinline norecurse nounwind optnone "frame-pointer"="none" "kernel" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" "uniform-work-group-size"="true" }
	// NOIEEE: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-ieee"="false" "frame-pointer"="none" "kernel" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }			// NOIEEE: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-ieee"="false" "frame-pointer"="none" "kernel" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
	// UNSAFEATOMIC: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-unsafe-fp-atomics"="true" "frame-pointer"="none" "kernel" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }			// UNSAFEATOMIC: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-unsafe-fp-atomics"="true" "frame-pointer"="none" "kernel" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }

	// DEFAULT: attributes #1 = { convergent mustprogress noinline nounwind optnone "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }			// DEFAULT: attributes #1 = { convergent mustprogress noinline nounwind optnone "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
	// CPU: attributes #1 = { convergent mustprogress noinline nounwind optnone "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" }			// CPU: attributes #1 = { convergent mustprogress noinline nounwind optnone "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" }
	// NOIEEE: attributes #1 = { convergent mustprogress noinline nounwind optnone "amdgpu-ieee"="false" "frame-pointer"="none" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }			// NOIEEE: attributes #1 = { convergent mustprogress noinline nounwind optnone "amdgpu-ieee"="false" "frame-pointer"="none" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
	// UNSAFEATOMIC: attributes #1 = { convergent mustprogress noinline nounwind optnone "amdgpu-unsafe-fp-atomics"="true" "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }			// UNSAFEATOMIC: attributes #1 = { convergent mustprogress noinline nounwind optnone "amdgpu-unsafe-fp-atomics"="true" "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }

clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl

This file was added.

				// RUN: %clang_cc1 -triple amdgcn-- -verify -S -o - %s
				// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx900 -verify -S -o - %s
				// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx900 -target-feature +wavefrontsize64 -verify -S -o - %s
				// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature +wavefrontsize64 -verify -S -o - %s
				// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature -wavefrontsize32 -verify -S -o - %s

				typedef unsigned int uint;

				void test_ballot_wave32(global uint* out, int a, int b) {
				*out = __builtin_amdgcn_ballot_w32(a == b); // expected-error {{'__builtin_amdgcn_ballot_w32' needs target feature wavefrontsize32}}
				}

				// FIXME: Should error for subtargets that don't support wave32
				__attribute__((target("wavefrontsize32")))
				void test_ballot_wave32_target_attr(global uint* out, int a, int b) {
				*out = __builtin_amdgcn_ballot_w32(a == b);
				}

clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl

This file was added.

				// RUN: %clang_cc1 -triple amdgcn-- -verify -S -o - %s
				// RUN: %clang_cc1 -triple amdgcn-- -target-feature +wavefrontsize32 -verify -S -o - %s
				// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature +wavefrontsize32 -verify -S -o - %s
				// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature -wavefrontsize64 -verify -S -o - %s
				// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s

				typedef unsigned long ulong;

				void test_ballot_wave64(global ulong* out, int a, int b) {
				*out = __builtin_amdgcn_ballot_w64(a == b); // expected-error {{'__builtin_amdgcn_ballot_w64' needs target feature wavefrontsize64}}
				}

				__attribute__((target("wavefrontsize64")))
				void test_ballot_wave64_target_attr(global ulong* out, int a, int b) {
				*out = __builtin_amdgcn_ballot_w64(a == b);
				}