This is an archive of the discontinued LLVM Phabricator instance.

lib/Basic/Targets.cpp
161	Unless you're planning to guarantee a 1:1 match to the functionality provided by NVIDIA's sm_32, it would be prudent to use some other value for the macro so the source code has a way to tell these GPUs apart. Another issue with this approach is that the typical use pattern for `__CUDA_ARCH__` is `#if __CUDA_ARCH__ >= XXX`. I don't expect that we'll always be able to maintain an order across GPU architectures among NVIDIA and AMD GPUs. Perhaps for HIP compilation it would make more sense to define `__CUDA_ARCH__` as 1 (this should serve as a legacy indication of device-side compilation) and define `__HIP_ARCH__` to indicate which AMD GPU we're compiling for without accidentally enabling something that was intended for NVIDIA's GPUs only.
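
The ordering concern in the comment above can be made concrete with a minimal host-side sketch. It models a guard such as `#if __CUDA_ARCH__ >= 300` as an ordinary function so that both candidate values can be evaluated in one program. The names `takesSm30Path`, `kAmdgcnValueInPatch`, `kAmdgcnValueProposed` and `checkGuardBehaviour` are hypothetical and exist only for this illustration; they are not part of the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for a guard such as `#if __CUDA_ARCH__ >= 300`,
// written as a function so it can be evaluated for several values on the host.
constexpr bool takesSm30Path(int cudaArchValue) {
  return cudaArchValue >= 300;
}

// Value the patch under review assigns to every amdgcn GPU.
constexpr int kAmdgcnValueInPatch = 320;
// Value proposed in the review: a bare "device compilation" marker.
constexpr int kAmdgcnValueProposed = 1;

// Exercises both cases; call it from main() to run the checks.
inline void checkGuardBehaviour() {
  // With 320, code written for NVIDIA sm_30 and newer is enabled on AMD too.
  assert(takesSm30Path(kAmdgcnValueInPatch));
  // With 1, every `>= XXX` feature guard of this shape stays disabled.
  assert(!takesSm30Path(kAmdgcnValueProposed));
}
```

Under this model, a value of 1 still lets `#ifdef __CUDA_ARCH__` style checks detect device compilation while keeping NVIDIA-specific feature guards off.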

yaxunl added inline comments. Apr 5 2018, 11:19 AM

lib/Basic/Targets.cpp
161	I think letting `__CUDA_ARCH__` == 1 for amdgcn is reasonable, and I can make that change. On the other hand, I think it may be difficult to define a `__HIP_ARCH__` that can sort mixed nvptx/amdgcn GPUs by capability. I do think a well-defined `__HIP_ARCH__` would be useful for users; we just need some further discussion of how to define it. For now, code specific to nvptx can continue to use `__CUDA_ARCH__`, and code specific to amdgcn can check the predefined amdgpu canonical names, e.g. `__gfx803__`.
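
A minimal sketch of the interim approach described above, assuming the target macros mentioned in this review (`__gfx803__`, `__AMDGCN__`, `__CUDA_ARCH__`) are the only signals available. The function names `selectedBackend` and `checkSelectedBackend` and the returned labels are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <string>

// Reports which branch the preprocessor selected for this translation unit.
inline std::string selectedBackend() {
#if defined(__gfx803__)
  // amdgcn-specific code keyed on the canonical GPU name.
  return "amdgcn-gfx803";
#elif defined(__AMDGCN__)
  // Any other amdgcn GPU.
  return "amdgcn-other";
#elif defined(__CUDA_ARCH__)
  // nvptx-specific code keeps using __CUDA_ARCH__.
  return "nvptx";
#else
  // None of the device macros is defined: ordinary host compilation.
  return "host";
#endif
}

// Sanity check: exactly one branch is always selected.
inline void checkSelectedBackend() {
  assert(!selectedBackend().empty());
}
```

When this is built with an ordinary host compiler, none of the device macros is defined, so the last branch is taken.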

tra added inline comments. Apr 5 2018, 11:28 AM

lib/Basic/Targets.cpp
161	OK.

yaxunl added inline comments. Apr 6 2018, 7:41 AM

lib/Basic/Targets.cpp

161	I asked Ben Sander whether we can define `__HIP_ARCH__`; here is his answer:

HIP targets a broader set of hardware than a single vendor, so additional flexibility in defining feature capability is required. The `__HIP_ARCH_*` macros provide a per-feature-granularity mechanism to query features. The code also tends to be clearer than an "if `__CUDA_ARCH__` > 3 ... assume some feature" check.

For example

// 32-bit Atomics:
#define __HIP_ARCH_HAS_GLOBAL_INT32_ATOMICS__ (1)
#define __HIP_ARCH_HAS_GLOBAL_FLOAT_ATOMIC_EXCH__ (1)
#define __HIP_ARCH_HAS_SHARED_INT32_ATOMICS__ (1)
#define __HIP_ARCH_HAS_SHARED_FLOAT_ATOMIC_EXCH__ (1)
#define __HIP_ARCH_HAS_FLOAT_ATOMIC_ADD__ (1)

// 64-bit Atomics:
#define __HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__ (1)
#define __HIP_ARCH_HAS_SHARED_INT64_ATOMICS__ (0)

// Doubles
#define __HIP_ARCH_HAS_DOUBLES__ (1)

// warp cross-lane operations:
#define __HIP_ARCH_HAS_WARP_VOTE__ (1)
#define __HIP_ARCH_HAS_WARP_BALLOT__ (1)
#define __HIP_ARCH_HAS_WARP_SHUFFLE__ (1)
#define __HIP_ARCH_HAS_WARP_FUNNEL_SHIFT__ (0)

// sync
#define __HIP_ARCH_HAS_THREAD_FENCE_SYSTEM__ (1)
#define __HIP_ARCH_HAS_SYNC_THREAD_EXT__ (0)

// misc
#define __HIP_ARCH_HAS_SURFACE_FUNCS__ (0)
#define __HIP_ARCH_HAS_3DGRID__ (1)
#define __HIP_ARCH_HAS_DYNAMIC_PARALLEL__ (0)
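
A minimal sketch of how code might consume such per-feature macros, assuming they expand to `(0)` or `(1)` as in the example list above. The fallback `#define`s only let the sketch compile outside a HIP toolchain; `Int64SharedReduce`, `chooseInt64SharedReduce`, `hasDoubles` and `checkFeatureQueries` are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Fallback definitions so the sketch compiles without HIP headers.
// The values mirror the example list; a real toolchain would supply them.
#ifndef __HIP_ARCH_HAS_DOUBLES__
#define __HIP_ARCH_HAS_DOUBLES__ (1)
#endif
#ifndef __HIP_ARCH_HAS_SHARED_INT64_ATOMICS__
#define __HIP_ARCH_HAS_SHARED_INT64_ATOMICS__ (0)
#endif

// Hypothetical strategy choice for a 64-bit reduction in shared memory.
enum class Int64SharedReduce { NativeAtomic, Fallback };

// Selects a strategy from the feature macro rather than from an
// architecture number comparison.
constexpr Int64SharedReduce chooseInt64SharedReduce() {
#if __HIP_ARCH_HAS_SHARED_INT64_ATOMICS__
  return Int64SharedReduce::NativeAtomic;
#else
  return Int64SharedReduce::Fallback;
#endif
}

// Exposes a feature macro as a compile-time boolean.
constexpr bool hasDoubles() { return __HIP_ARCH_HAS_DOUBLES__ != 0; }

// Checks the results for the fallback values defined above.
inline void checkFeatureQueries() {
  assert(hasDoubles());
  assert(chooseInt64SharedReduce() == Int64SharedReduce::Fallback);
}
```

Each query names the capability it depends on, so the guard stays meaningful whichever vendor's GPU is targeted.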

tra added inline comments. Apr 6 2018, 9:58 AM

lib/Basic/Targets.cpp
161	I assume that will be handled somewhere else: a different patch, in a different place. It looks like setting `__CUDA_ARCH__` to 1 is all that should be done here. While we're looking at this, is CUDA compatibility one of your goals? I.e., do you expect existing CUDA code to compile and work on AMD hardware? If not, then perhaps you don't need the `__CUDA_*__` defines at all.

yaxunl added inline comments. Apr 6 2018, 10:47 AM

lib/Basic/Targets.cpp
161	CUDA code needs to be translated to HIP code since the host API is different. In most cases the translation can be done automatically by a script. `__CUDA_ARCH__` cannot be translated automatically because it is not portable to non-nvptx devices; however, it is often used just to indicate device compilation. Therefore we still need to define it in HIP to indicate device compilation. This way, CUDA programs that use `__CUDA_ARCH__` only to check for device compilation can be translated automatically. If users want features associated with a specific `__CUDA_ARCH__`, they can manually modify the translated code to use the `__HIP_ARCH_HAS_*` macros.

tra added inline comments. Apr 6 2018, 11:03 AM

lib/Basic/Targets.cpp
161	It sounds like this translation is a one-time offline process and the HIP-mode compiler is not going to see any non-HIP code. If that's the case, I'm not quite sure I see the need for defining `__CUDA_ARCH__` in HIP mode: the translation process should have converted the CUDA-specific macro in the original code to its HIP equivalent, or gotten the user to port it to something HIP can deal with. The HIP programming guide also says that `__CUDA_ARCH__` is undefined by hcc.

yaxunl added inline comments. Apr 6 2018, 11:17 AM

lib/Basic/Targets.cpp
161	Sorry, I missed that. I will revert the change to the `__CUDA_ARCH__` macro and define `__HIP_DEVICE_COMPILE__` instead. Thanks.
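
A minimal sketch of a device-compilation check built on `__HIP_DEVICE_COMPILE__`, the macro named in the comment above. Treating "defined at all" as the signal is an assumption of this sketch, since the thread does not state what value the macro carries; `isDeviceCompilation` and `checkHostBuild` are hypothetical names.

```cpp
#include <cassert>

// True when this translation unit is being compiled for the device side.
// Assumption: the macro is defined only during HIP device compilation.
constexpr bool isDeviceCompilation() {
#if defined(__HIP_DEVICE_COMPILE__)
  return true;
#else
  return false;
#endif
}

// On an ordinary host build the macro is absent, so the check is false.
inline void checkHostBuild() {
#if !defined(__HIP_DEVICE_COMPILE__)
  assert(!isDeviceCompilation());
#endif
}
```

Code that previously used `#ifdef __CUDA_ARCH__` only to detect device compilation could test this macro instead, leaving `__CUDA_ARCH__` undefined for amdgcn targets.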

tra mentioned this in D45387: [CUDA] Revert defining __CUDA_ARCH__ for amdgcn targets. Apr 6 2018, 1:02 PM

yaxunl mentioned this in rL329584: [CUDA] Revert defining __CUDA_ARCH__ for amdgcn targets. Apr 9 2018, 8:47 AM

yaxunl mentioned this in rC329584: [CUDA] Revert defining __CUDA_ARCH__ for amdgcn targets.

Revision Contents

Path	Size
include/clang/Basic/Cuda.h	14 lines
lib/Basic/Cuda.cpp	83 lines
lib/Basic/Targets.h	4 lines
lib/Basic/Targets.cpp	55 lines
lib/Basic/Targets/AMDGPU.h	3 lines
lib/Basic/Targets/AMDGPU.cpp	5 lines
lib/Basic/Targets/NVPTX.cpp	44 lines
test/Driver/cuda-arch-translation.cu	39 lines

Diff 141056

include/clang/Basic/Cuda.h

Show All 40 Lines	enum class CudaArch {
SM_50,		SM_50,
SM_52,		SM_52,
SM_53,		SM_53,
SM_60,		SM_60,
SM_61,		SM_61,
SM_62,		SM_62,
SM_70,		SM_70,
SM_72,		SM_72,
		GFX600,
		GFX601,
		GFX700,
		GFX701,
		GFX702,
		GFX703,
		GFX704,
		GFX801,
		GFX802,
		GFX803,
		GFX810,
		GFX900,
		GFX902,
LAST,		LAST,
};		};
const char *CudaArchToString(CudaArch A);		const char *CudaArchToString(CudaArch A);

// The input should have the form "sm_20".		// The input should have the form "sm_20".
CudaArch StringToCudaArch(llvm::StringRef S);		CudaArch StringToCudaArch(llvm::StringRef S);

enum class CudaVirtualArch {		enum class CudaVirtualArch {
UNKNOWN,		UNKNOWN,
COMPUTE_20,		COMPUTE_20,
COMPUTE_30,		COMPUTE_30,
COMPUTE_32,		COMPUTE_32,
COMPUTE_35,		COMPUTE_35,
COMPUTE_37,		COMPUTE_37,
COMPUTE_50,		COMPUTE_50,
COMPUTE_52,		COMPUTE_52,
COMPUTE_53,		COMPUTE_53,
COMPUTE_60,		COMPUTE_60,
COMPUTE_61,		COMPUTE_61,
COMPUTE_62,		COMPUTE_62,
COMPUTE_70,		COMPUTE_70,
COMPUTE_72,		COMPUTE_72,
		COMPUTE_AMDGCN,
};		};
const char *CudaVirtualArchToString(CudaVirtualArch A);		const char *CudaVirtualArchToString(CudaVirtualArch A);

// The input should have the form "compute_20".		// The input should have the form "compute_20".
CudaVirtualArch StringToCudaVirtualArch(llvm::StringRef S);		CudaVirtualArch StringToCudaVirtualArch(llvm::StringRef S);

/// Get the compute_xx corresponding to an sm_yy.		/// Get the compute_xx corresponding to an sm_yy.
CudaVirtualArch VirtualArchForCudaArch(CudaArch A);		CudaVirtualArch VirtualArchForCudaArch(CudaArch A);

lib/Basic/Cuda.cpp

Show First 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	const char *CudaArchToString(CudaArch A) {
case CudaArch::SM_61:		case CudaArch::SM_61:
return "sm_61";		return "sm_61";
case CudaArch::SM_62:		case CudaArch::SM_62:
return "sm_62";		return "sm_62";
case CudaArch::SM_70:		case CudaArch::SM_70:
return "sm_70";		return "sm_70";
case CudaArch::SM_72:		case CudaArch::SM_72:
return "sm_72";		return "sm_72";
		case CudaArch::GFX600: // tahiti
		return "gfx600";
		case CudaArch::GFX601: // pitcairn, verde, oland,hainan
		return "gfx601";
		case CudaArch::GFX700: // kaveri
		return "gfx700";
		case CudaArch::GFX701: // hawaii
		return "gfx701";
		case CudaArch::GFX702: // 290,290x,R390,R390x
		return "gfx702";
		case CudaArch::GFX703: // kabini mullins
		return "gfx703";
		case CudaArch::GFX704: // bonaire
		return "gfx704";
		case CudaArch::GFX801: // carrizo
		return "gfx801";
		case CudaArch::GFX802: // tonga,iceland
		return "gfx802";
		case CudaArch::GFX803: // fiji,polaris10
		return "gfx803";
		case CudaArch::GFX810: // stoney
		return "gfx810";
		case CudaArch::GFX900: // vega, instinct
		return "gfx900";
		case CudaArch::GFX902: // TBA
		return "gfx902";
}		}
llvm_unreachable("invalid enum");		llvm_unreachable("invalid enum");
}		}

CudaArch StringToCudaArch(llvm::StringRef S) {		CudaArch StringToCudaArch(llvm::StringRef S) {
return llvm::StringSwitch<CudaArch>(S)		return llvm::StringSwitch<CudaArch>(S)
.Case("sm_20", CudaArch::SM_20)		.Case("sm_20", CudaArch::SM_20)
.Case("sm_21", CudaArch::SM_21)		.Case("sm_21", CudaArch::SM_21)
.Case("sm_30", CudaArch::SM_30)		.Case("sm_30", CudaArch::SM_30)
.Case("sm_32", CudaArch::SM_32)		.Case("sm_32", CudaArch::SM_32)
.Case("sm_35", CudaArch::SM_35)		.Case("sm_35", CudaArch::SM_35)
.Case("sm_37", CudaArch::SM_37)		.Case("sm_37", CudaArch::SM_37)
.Case("sm_50", CudaArch::SM_50)		.Case("sm_50", CudaArch::SM_50)
.Case("sm_52", CudaArch::SM_52)		.Case("sm_52", CudaArch::SM_52)
.Case("sm_53", CudaArch::SM_53)		.Case("sm_53", CudaArch::SM_53)
.Case("sm_60", CudaArch::SM_60)		.Case("sm_60", CudaArch::SM_60)
.Case("sm_61", CudaArch::SM_61)		.Case("sm_61", CudaArch::SM_61)
.Case("sm_62", CudaArch::SM_62)		.Case("sm_62", CudaArch::SM_62)
.Case("sm_70", CudaArch::SM_70)		.Case("sm_70", CudaArch::SM_70)
.Case("sm_72", CudaArch::SM_72)		.Case("sm_72", CudaArch::SM_72)
		.Case("gfx600", CudaArch::GFX600)
		.Case("gfx601", CudaArch::GFX601)
		.Case("gfx700", CudaArch::GFX700)
		.Case("gfx701", CudaArch::GFX701)
		.Case("gfx702", CudaArch::GFX702)
		.Case("gfx703", CudaArch::GFX703)
		.Case("gfx704", CudaArch::GFX704)
		.Case("gfx801", CudaArch::GFX801)
		.Case("gfx802", CudaArch::GFX802)
		.Case("gfx803", CudaArch::GFX803)
		.Case("gfx810", CudaArch::GFX810)
		.Case("gfx900", CudaArch::GFX900)
		.Case("gfx902", CudaArch::GFX902)
.Default(CudaArch::UNKNOWN);		.Default(CudaArch::UNKNOWN);
}		}

const char *CudaVirtualArchToString(CudaVirtualArch A) {		const char *CudaVirtualArchToString(CudaVirtualArch A) {
switch (A) {		switch (A) {
case CudaVirtualArch::UNKNOWN:		case CudaVirtualArch::UNKNOWN:
return "unknown";		return "unknown";
case CudaVirtualArch::COMPUTE_20:		case CudaVirtualArch::COMPUTE_20:
Show All 17 Lines	const char *CudaVirtualArchToString(CudaVirtualArch A) {
case CudaVirtualArch::COMPUTE_61:		case CudaVirtualArch::COMPUTE_61:
return "compute_61";		return "compute_61";
case CudaVirtualArch::COMPUTE_62:		case CudaVirtualArch::COMPUTE_62:
return "compute_62";		return "compute_62";
case CudaVirtualArch::COMPUTE_70:		case CudaVirtualArch::COMPUTE_70:
return "compute_70";		return "compute_70";
case CudaVirtualArch::COMPUTE_72:		case CudaVirtualArch::COMPUTE_72:
return "compute_72";		return "compute_72";
		case CudaVirtualArch::COMPUTE_AMDGCN:
		return "compute_amdgcn";
}		}
llvm_unreachable("invalid enum");		llvm_unreachable("invalid enum");
}		}

CudaVirtualArch StringToCudaVirtualArch(llvm::StringRef S) {		CudaVirtualArch StringToCudaVirtualArch(llvm::StringRef S) {
return llvm::StringSwitch<CudaVirtualArch>(S)		return llvm::StringSwitch<CudaVirtualArch>(S)
.Case("compute_20", CudaVirtualArch::COMPUTE_20)		.Case("compute_20", CudaVirtualArch::COMPUTE_20)
.Case("compute_30", CudaVirtualArch::COMPUTE_30)		.Case("compute_30", CudaVirtualArch::COMPUTE_30)
.Case("compute_32", CudaVirtualArch::COMPUTE_32)		.Case("compute_32", CudaVirtualArch::COMPUTE_32)
.Case("compute_35", CudaVirtualArch::COMPUTE_35)		.Case("compute_35", CudaVirtualArch::COMPUTE_35)
.Case("compute_37", CudaVirtualArch::COMPUTE_37)		.Case("compute_37", CudaVirtualArch::COMPUTE_37)
.Case("compute_50", CudaVirtualArch::COMPUTE_50)		.Case("compute_50", CudaVirtualArch::COMPUTE_50)
.Case("compute_52", CudaVirtualArch::COMPUTE_52)		.Case("compute_52", CudaVirtualArch::COMPUTE_52)
.Case("compute_53", CudaVirtualArch::COMPUTE_53)		.Case("compute_53", CudaVirtualArch::COMPUTE_53)
.Case("compute_60", CudaVirtualArch::COMPUTE_60)		.Case("compute_60", CudaVirtualArch::COMPUTE_60)
.Case("compute_61", CudaVirtualArch::COMPUTE_61)		.Case("compute_61", CudaVirtualArch::COMPUTE_61)
.Case("compute_62", CudaVirtualArch::COMPUTE_62)		.Case("compute_62", CudaVirtualArch::COMPUTE_62)
.Case("compute_70", CudaVirtualArch::COMPUTE_70)		.Case("compute_70", CudaVirtualArch::COMPUTE_70)
.Case("compute_72", CudaVirtualArch::COMPUTE_72)		.Case("compute_72", CudaVirtualArch::COMPUTE_72)
		.Case("compute_amdgcn", CudaVirtualArch::COMPUTE_AMDGCN)
.Default(CudaVirtualArch::UNKNOWN);		.Default(CudaVirtualArch::UNKNOWN);
}		}

CudaVirtualArch VirtualArchForCudaArch(CudaArch A) {		CudaVirtualArch VirtualArchForCudaArch(CudaArch A) {
switch (A) {		switch (A) {
case CudaArch::LAST:		case CudaArch::LAST:
break;		break;
case CudaArch::UNKNOWN:		case CudaArch::UNKNOWN:
Show All 20 Lines	CudaVirtualArch VirtualArchForCudaArch(CudaArch A) {
case CudaArch::SM_61:		case CudaArch::SM_61:
return CudaVirtualArch::COMPUTE_61;		return CudaVirtualArch::COMPUTE_61;
case CudaArch::SM_62:		case CudaArch::SM_62:
return CudaVirtualArch::COMPUTE_62;		return CudaVirtualArch::COMPUTE_62;
case CudaArch::SM_70:		case CudaArch::SM_70:
return CudaVirtualArch::COMPUTE_70;		return CudaVirtualArch::COMPUTE_70;
case CudaArch::SM_72:		case CudaArch::SM_72:
return CudaVirtualArch::COMPUTE_72;		return CudaVirtualArch::COMPUTE_72;
		case CudaArch::GFX600:
		case CudaArch::GFX601:
		case CudaArch::GFX700:
		case CudaArch::GFX701:
		case CudaArch::GFX702:
		case CudaArch::GFX703:
		case CudaArch::GFX704:
		case CudaArch::GFX801:
		case CudaArch::GFX802:
		case CudaArch::GFX803:
		case CudaArch::GFX810:
		case CudaArch::GFX900:
		case CudaArch::GFX902:
		return CudaVirtualArch::COMPUTE_AMDGCN;
}		}
llvm_unreachable("invalid enum");		llvm_unreachable("invalid enum");
}		}

CudaVersion MinVersionForCudaArch(CudaArch A) {		CudaVersion MinVersionForCudaArch(CudaArch A) {
switch (A) {		switch (A) {
case CudaArch::LAST:		case CudaArch::LAST:
break;		break;
Show All 12 Lines	CudaVersion MinVersionForCudaArch(CudaArch A) {
case CudaArch::SM_60:		case CudaArch::SM_60:
case CudaArch::SM_61:		case CudaArch::SM_61:
case CudaArch::SM_62:		case CudaArch::SM_62:
return CudaVersion::CUDA_80;		return CudaVersion::CUDA_80;
case CudaArch::SM_70:		case CudaArch::SM_70:
return CudaVersion::CUDA_90;		return CudaVersion::CUDA_90;
case CudaArch::SM_72:		case CudaArch::SM_72:
return CudaVersion::CUDA_91;		return CudaVersion::CUDA_91;
		case CudaArch::GFX600:
		case CudaArch::GFX601:
		case CudaArch::GFX700:
		case CudaArch::GFX701:
		case CudaArch::GFX702:
		case CudaArch::GFX703:
		case CudaArch::GFX704:
		case CudaArch::GFX801:
		case CudaArch::GFX802:
		case CudaArch::GFX803:
		case CudaArch::GFX810:
		case CudaArch::GFX900:
		case CudaArch::GFX902:
		return CudaVersion::CUDA_70;
}		}
llvm_unreachable("invalid enum");		llvm_unreachable("invalid enum");
}		}

CudaVersion MaxVersionForCudaArch(CudaArch A) {		CudaVersion MaxVersionForCudaArch(CudaArch A) {
switch (A) {		switch (A) {
case CudaArch::UNKNOWN:		case CudaArch::UNKNOWN:
return CudaVersion::UNKNOWN;		return CudaVersion::UNKNOWN;
case CudaArch::SM_20:		case CudaArch::SM_20:
case CudaArch::SM_21:		case CudaArch::SM_21:
		case CudaArch::GFX600:
		case CudaArch::GFX601:
		case CudaArch::GFX700:
		case CudaArch::GFX701:
		case CudaArch::GFX702:
		case CudaArch::GFX703:
		case CudaArch::GFX704:
		case CudaArch::GFX801:
		case CudaArch::GFX802:
		case CudaArch::GFX803:
		case CudaArch::GFX810:
		case CudaArch::GFX900:
		case CudaArch::GFX902:
return CudaVersion::CUDA_80;		return CudaVersion::CUDA_80;
default:		default:
return CudaVersion::LATEST;		return CudaVersion::LATEST;
}		}
}		}

} // namespace clang		} // namespace clang

lib/Basic/Targets.h

	Show All 10 Lines
	// from a target triple. Typically individual targets will need to include from			// from a target triple. Typically individual targets will need to include from
	// here in order to get these functions if required.			// here in order to get these functions if required.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_CLANG_LIB_BASIC_TARGETS_H			#ifndef LLVM_CLANG_LIB_BASIC_TARGETS_H
	#define LLVM_CLANG_LIB_BASIC_TARGETS_H			#define LLVM_CLANG_LIB_BASIC_TARGETS_H

				#include "clang/Basic/Cuda.h"
	#include "clang/Basic/LangOptions.h"			#include "clang/Basic/LangOptions.h"
	#include "clang/Basic/MacroBuilder.h"			#include "clang/Basic/MacroBuilder.h"
	#include "clang/Basic/TargetInfo.h"			#include "clang/Basic/TargetInfo.h"
	#include "llvm/ADT/StringRef.h"			#include "llvm/ADT/StringRef.h"

	namespace clang {			namespace clang {
	namespace targets {			namespace targets {

	Show All 14 Lines

	LLVM_LIBRARY_VISIBILITY			LLVM_LIBRARY_VISIBILITY
	void addMinGWDefines(const llvm::Triple &Triple, const clang::LangOptions &Opts,			void addMinGWDefines(const llvm::Triple &Triple, const clang::LangOptions &Opts,
	clang::MacroBuilder &Builder);			clang::MacroBuilder &Builder);

	LLVM_LIBRARY_VISIBILITY			LLVM_LIBRARY_VISIBILITY
	void addCygMingDefines(const clang::LangOptions &Opts,			void addCygMingDefines(const clang::LangOptions &Opts,
	clang::MacroBuilder &Builder);			clang::MacroBuilder &Builder);

				LLVM_LIBRARY_VISIBILITY
				void defineCudaArchMacro(CudaArch GPU, clang::MacroBuilder &Builder);
	} // namespace targets			} // namespace targets
	} // namespace clang			} // namespace clang
	#endif // LLVM_CLANG_LIB_BASIC_TARGETS_H			#endif // LLVM_CLANG_LIB_BASIC_TARGETS_H

lib/Basic/Targets.cpp

Show First 20 Lines • Show All 106 Lines • ▼ Show 20 Lines	if (Triple.isArch64Bit()) {
DefineStd(Builder, "WIN64", Opts);		DefineStd(Builder, "WIN64", Opts);
Builder.defineMacro("__MINGW64__");		Builder.defineMacro("__MINGW64__");
}		}
Builder.defineMacro("__MSVCRT__");		Builder.defineMacro("__MSVCRT__");
Builder.defineMacro("__MINGW32__");		Builder.defineMacro("__MINGW32__");
addCygMingDefines(Opts, Builder);		addCygMingDefines(Opts, Builder);
}		}

		void defineCudaArchMacro(CudaArch GPU, clang::MacroBuilder &Builder) {
		std::string CUDAArchCode = [GPU] {
		switch (GPU) {
		case CudaArch::LAST:
		break;
		case CudaArch::SM_20:
		return "200";
		case CudaArch::SM_21:
		return "210";
		case CudaArch::SM_30:
		return "300";
		case CudaArch::SM_32:
		return "320";
		case CudaArch::SM_35:
		return "350";
		case CudaArch::SM_37:
		return "370";
		case CudaArch::SM_50:
		return "500";
		case CudaArch::SM_52:
		return "520";
		case CudaArch::SM_53:
		return "530";
		case CudaArch::SM_60:
		return "600";
		case CudaArch::SM_61:
		return "610";
		case CudaArch::SM_62:
		return "620";
		case CudaArch::SM_70:
		return "700";
		case CudaArch::SM_72:
		return "720";
		case CudaArch::GFX600:
		case CudaArch::GFX601:
		case CudaArch::GFX700:
		case CudaArch::GFX701:
		case CudaArch::GFX702:
		case CudaArch::GFX703:
		case CudaArch::GFX704:
		case CudaArch::GFX801:
		case CudaArch::GFX802:
		case CudaArch::GFX803:
		case CudaArch::GFX810:
		case CudaArch::GFX900:
		case CudaArch::GFX902:
		return "320";
		case CudaArch::UNKNOWN:
		llvm_unreachable("unhandled Cuda/HIP Arch");
		}
		llvm_unreachable("unhandled Cuda/HIP Arch");
		}();
		Builder.defineMacro("__CUDA_ARCH__", CUDAArchCode);
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Driver code		// Driver code
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

TargetInfo *AllocateTarget(const llvm::Triple &Triple,		TargetInfo *AllocateTarget(const llvm::Triple &Triple,
const TargetOptions &Opts) {		const TargetOptions &Opts) {
llvm::Triple::OSType os = Triple.getOS();		llvm::Triple::OSType os = Triple.getOS();


lib/Basic/Targets/AMDGPU.h

//===--- AMDGPU.h - Declare AMDGPU target feature support -------- C++ --===//		//===--- AMDGPU.h - Declare AMDGPU target feature support -------- C++ --===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file declares AMDGPU TargetInfo objects.		// This file declares AMDGPU TargetInfo objects.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_LIB_BASIC_TARGETS_AMDGPU_H		#ifndef LLVM_CLANG_LIB_BASIC_TARGETS_AMDGPU_H
#define LLVM_CLANG_LIB_BASIC_TARGETS_AMDGPU_H		#define LLVM_CLANG_LIB_BASIC_TARGETS_AMDGPU_H

		#include "clang/Basic/Cuda.h"
#include "clang/Basic/TargetInfo.h"		#include "clang/Basic/TargetInfo.h"
#include "clang/Basic/TargetOptions.h"		#include "clang/Basic/TargetOptions.h"
#include "llvm/ADT/StringSet.h"		#include "llvm/ADT/StringSet.h"
#include "llvm/ADT/Triple.h"		#include "llvm/ADT/Triple.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"

namespace clang {		namespace clang {
namespace targets {		namespace targets {
▲ Show 20 Lines • Show All 144 Lines • ▼ Show 20 Lines	class LLVM_LIBRARY_VISIBILITY AMDGPUTargetInfo final : public TargetInfo {

GPUInfo parseGPUName(StringRef Name) const;		GPUInfo parseGPUName(StringRef Name) const;

GPUInfo GPU;		GPUInfo GPU;

static bool isAMDGCN(const llvm::Triple &TT) {		static bool isAMDGCN(const llvm::Triple &TT) {
return TT.getArch() == llvm::Triple::amdgcn;		return TT.getArch() == llvm::Triple::amdgcn;
}		}
		CudaArch GCN_Subarch;

public:		public:
AMDGPUTargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts);		AMDGPUTargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts);

void setAddressSpaceMap(bool DefaultIsPrivate);		void setAddressSpaceMap(bool DefaultIsPrivate);

void adjust(LangOptions &Opts) override;		void adjust(LangOptions &Opts) override;

▲ Show 20 Lines • Show All 140 Lines • ▼ Show 20 Lines	public:
void fillValidCPUList(SmallVectorImpl<StringRef> &Values) const override;		void fillValidCPUList(SmallVectorImpl<StringRef> &Values) const override;

bool setCPU(const std::string &Name) override {		bool setCPU(const std::string &Name) override {
if (getTriple().getArch() == llvm::Triple::amdgcn)		if (getTriple().getArch() == llvm::Triple::amdgcn)
GPU = parseAMDGCNName(Name);		GPU = parseAMDGCNName(Name);
else		else
GPU = parseR600Name(Name);		GPU = parseR600Name(Name);

		GCN_Subarch = StringToCudaArch(Name);
return GK_NONE != GPU.Kind;		return GK_NONE != GPU.Kind;
}		}

void setSupportedOpenCLOpts() override {		void setSupportedOpenCLOpts() override {
auto &Opts = getSupportedOpenCLOpts();		auto &Opts = getSupportedOpenCLOpts();
Opts.support("cl_clang_storage_class_specifiers");		Opts.support("cl_clang_storage_class_specifiers");
Opts.support("cl_khr_icd");		Opts.support("cl_khr_icd");


lib/Basic/Targets/AMDGPU.cpp

//===--- AMDGPU.cpp - Implement AMDGPU target feature support -------------===//		//===--- AMDGPU.cpp - Implement AMDGPU target feature support -------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements AMDGPU TargetInfo objects.		// This file implements AMDGPU TargetInfo objects.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
		#include "Targets.h"
#include "clang/Basic/Builtins.h"		#include "clang/Basic/Builtins.h"
#include "clang/Basic/LangOptions.h"		#include "clang/Basic/LangOptions.h"
#include "clang/Basic/MacroBuilder.h"		#include "clang/Basic/MacroBuilder.h"
#include "clang/Basic/TargetBuiltins.h"		#include "clang/Basic/TargetBuiltins.h"
#include "clang/Frontend/CodeGenOptions.h"		#include "clang/Frontend/CodeGenOptions.h"
#include "llvm/ADT/StringSwitch.h"		#include "llvm/ADT/StringSwitch.h"

using namespace clang;		using namespace clang;
▲ Show 20 Lines • Show All 235 Lines • ▼ Show 20 Lines

AMDGPUTargetInfo::AMDGPUTargetInfo(const llvm::Triple &Triple,		AMDGPUTargetInfo::AMDGPUTargetInfo(const llvm::Triple &Triple,
const TargetOptions &Opts)		const TargetOptions &Opts)
: TargetInfo(Triple),		: TargetInfo(Triple),
GPU(isAMDGCN(Triple) ? AMDGCNGPUs[0] : parseR600Name(Opts.CPU)) {		GPU(isAMDGCN(Triple) ? AMDGCNGPUs[0] : parseR600Name(Opts.CPU)) {
resetDataLayout(isAMDGCN(getTriple()) ? DataLayoutStringAMDGCN		resetDataLayout(isAMDGCN(getTriple()) ? DataLayoutStringAMDGCN
: DataLayoutStringR600);		: DataLayoutStringR600);
assert(DataLayout->getAllocaAddrSpace() == Private);		assert(DataLayout->getAllocaAddrSpace() == Private);
		GCN_Subarch = CudaArch::GFX803; // Default to fiji

setAddressSpaceMap(Triple.getOS() == llvm::Triple::Mesa3D ||		setAddressSpaceMap(Triple.getOS() == llvm::Triple::Mesa3D ||
!isAMDGCN(Triple));		!isAMDGCN(Triple));
UseAddrSpaceMapMangling = true;		UseAddrSpaceMapMangling = true;

// Set pointer width and alignment for target address space 0.		// Set pointer width and alignment for target address space 0.
PointerWidth = PointerAlign = DataLayout->getPointerSizeInBits();		PointerWidth = PointerAlign = DataLayout->getPointerSizeInBits();
if (getMaxPointerWidth() == 64) {		if (getMaxPointerWidth() == 64) {
Show All 28 Lines	void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts,
if (isAMDGCN(getTriple()))		if (isAMDGCN(getTriple()))
Builder.defineMacro("__AMDGCN__");		Builder.defineMacro("__AMDGCN__");
else		else
Builder.defineMacro("__R600__");		Builder.defineMacro("__R600__");

if (GPU.Kind != GK_NONE)		if (GPU.Kind != GK_NONE)
Builder.defineMacro(Twine("__") + Twine(GPU.CanonicalName) + Twine("__"));		Builder.defineMacro(Twine("__") + Twine(GPU.CanonicalName) + Twine("__"));

		if (Opts.CUDAIsDevice)
		defineCudaArchMacro(GCN_Subarch, Builder);

// TODO: __HAS_FMAF__, __HAS_LDEXPF__, __HAS_FP64__ are deprecated and will be		// TODO: __HAS_FMAF__, __HAS_LDEXPF__, __HAS_FP64__ are deprecated and will be
// removed in the near future.		// removed in the near future.
if (GPU.HasFMAF)		if (GPU.HasFMAF)
Builder.defineMacro("__HAS_FMAF__");		Builder.defineMacro("__HAS_FMAF__");
if (GPU.HasFastFMAF)		if (GPU.HasFastFMAF)
Builder.defineMacro("FP_FAST_FMAF");		Builder.defineMacro("FP_FAST_FMAF");
if (GPU.HasLDEXPF)		if (GPU.HasLDEXPF)
Builder.defineMacro("__HAS_LDEXPF__");		Builder.defineMacro("__HAS_LDEXPF__");
if (GPU.HasFP64)		if (GPU.HasFP64)
Builder.defineMacro("__HAS_FP64__");		Builder.defineMacro("__HAS_FP64__");
if (GPU.HasFastFMA)		if (GPU.HasFastFMA)
Builder.defineMacro("FP_FAST_FMA");		Builder.defineMacro("FP_FAST_FMA");
}		}
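
The per-GPU macro above is formed by wrapping the GPU's canonical name in double underscores, so a `gfx803` target yields `__gfx803__`. A minimal standalone sketch of that concatenation follows; `canonicalNameMacro` is a hypothetical name used only for illustration, and `std::string` stands in for the `Twine` concatenation in the diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of clang: mirrors the
// Twine("__") + Twine(GPU.CanonicalName) + Twine("__") expression
// in AMDGPUTargetInfo::getTargetDefines.
std::string canonicalNameMacro(const std::string &CanonicalName) {
  assert(!CanonicalName.empty() && "expected a GPU canonical name");
  return "__" + CanonicalName + "__";
}
```

Device code that needs an amdgcn-specific path can then test the resulting macro directly, for example `#if defined(__gfx803__)`, as suggested in the review thread.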

lib/Basic/Targets/NVPTX.cpp

...	return llvm::StringSwitch<bool>(Feature)
.Case("satom", GPU >= CudaArch::SM_60) // Atomics w/ scope.		.Case("satom", GPU >= CudaArch::SM_60) // Atomics w/ scope.
.Default(false);		.Default(false);
}		}

void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts,		void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts,
MacroBuilder &Builder) const {		MacroBuilder &Builder) const {
Builder.defineMacro("__PTX__");		Builder.defineMacro("__PTX__");
Builder.defineMacro("__NVPTX__");		Builder.defineMacro("__NVPTX__");
if (Opts.CUDAIsDevice) {		if (Opts.CUDAIsDevice)
// Set __CUDA_ARCH__ for the GPU specified.		defineCudaArchMacro(GPU, Builder);
std::string CUDAArchCode = [this] {
switch (GPU) {
case CudaArch::LAST:
break;
case CudaArch::UNKNOWN:
assert(false && "No GPU arch when compiling CUDA device code.");
return "";
case CudaArch::SM_20:
return "200";
case CudaArch::SM_21:
return "210";
case CudaArch::SM_30:
return "300";
case CudaArch::SM_32:
return "320";
case CudaArch::SM_35:
return "350";
case CudaArch::SM_37:
return "370";
case CudaArch::SM_50:
return "500";
case CudaArch::SM_52:
return "520";
case CudaArch::SM_53:
return "530";
case CudaArch::SM_60:
return "600";
case CudaArch::SM_61:
return "610";
case CudaArch::SM_62:
return "620";
case CudaArch::SM_70:
return "700";
case CudaArch::SM_72:
return "720";
}
llvm_unreachable("unhandled CudaArch");
}();
Builder.defineMacro("__CUDA_ARCH__", CUDAArchCode);
}
}		}
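
The body of the shared `defineCudaArchMacro` helper is not visible in this part of the diff. The sketch below is a standalone approximation built from the switch removed on the left-hand side: `Arch` is a stand-in for clang's `CudaArch` enum, `cudaArchMacroValue` is a hypothetical name, and the value `"1"` for `GFX803` is an assumption taken from the suggestion in the review thread, not something this revision confirms.

```cpp
#include <cassert>
#include <string>

// Stand-in for clang's CudaArch enum; only a subset of values is modelled.
enum class Arch {
  UNKNOWN, SM_20, SM_21, SM_30, SM_32, SM_35, SM_37, SM_50,
  SM_52, SM_53, SM_60, SM_61, SM_62, SM_70, SM_72, GFX803
};

// Hypothetical helper: returns the string a shared helper could pass to
// Builder.defineMacro("__CUDA_ARCH__", ...).
std::string cudaArchMacroValue(Arch A) {
  switch (A) {
  case Arch::UNKNOWN:
    assert(false && "No GPU arch when compiling CUDA device code.");
    return "";
  case Arch::SM_20: return "200";
  case Arch::SM_21: return "210";
  case Arch::SM_30: return "300";
  case Arch::SM_32: return "320";
  case Arch::SM_35: return "350";
  case Arch::SM_37: return "370";
  case Arch::SM_50: return "500";
  case Arch::SM_52: return "520";
  case Arch::SM_53: return "530";
  case Arch::SM_60: return "600";
  case Arch::SM_61: return "610";
  case Arch::SM_62: return "620";
  case Arch::SM_70: return "700";
  case Arch::SM_72: return "720";
  case Arch::GFX803:
    // Assumption: legacy "device-side compilation" marker for amdgcn,
    // as proposed in the review comments.
    return "1";
  }
  return ""; // Not reached for the enumerators above.
}
```

Keeping the mapping in one function lets both `NVPTXTargetInfo` and `AMDGPUTargetInfo` define `__CUDA_ARCH__` without duplicating the switch.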

ArrayRef<Builtin::Info> NVPTXTargetInfo::getTargetBuiltins() const {		ArrayRef<Builtin::Info> NVPTXTargetInfo::getTargetBuiltins() const {
return llvm::makeArrayRef(BuiltinInfo, clang::NVPTX::LastTSBuiltin -		return llvm::makeArrayRef(BuiltinInfo, clang::NVPTX::LastTSBuiltin -
Builtin::FirstTSBuiltin);		Builtin::FirstTSBuiltin);
}		}

test/Driver/cuda-arch-translation.cu

	// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=sm_60 %s 2>&1 \			// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=sm_60 %s 2>&1 \
	// RUN: | FileCheck -check-prefixes=COMMON,SM60 %s			// RUN: | FileCheck -check-prefixes=COMMON,SM60 %s
	// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=sm_61 %s 2>&1 \			// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=sm_61 %s 2>&1 \
	// RUN: | FileCheck -check-prefixes=COMMON,SM61 %s			// RUN: | FileCheck -check-prefixes=COMMON,SM61 %s
	// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=sm_62 %s 2>&1 \			// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=sm_62 %s 2>&1 \
	// RUN: | FileCheck -check-prefixes=COMMON,SM62 %s			// RUN: | FileCheck -check-prefixes=COMMON,SM62 %s
	// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=sm_70 %s 2>&1 \			// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=sm_70 %s 2>&1 \
	// RUN: | FileCheck -check-prefixes=COMMON,SM70 %s			// RUN: | FileCheck -check-prefixes=COMMON,SM70 %s
				// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=gfx600 %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=COMMON,GFX600 %s
				// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=gfx601 %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=COMMON,GFX601 %s
				// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=gfx700 %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=COMMON,GFX700 %s
				// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=gfx701 %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=COMMON,GFX701 %s
				// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=gfx702 %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=COMMON,GFX702 %s
				// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=gfx703 %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=COMMON,GFX703 %s
				// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=gfx704 %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=COMMON,GFX704 %s
				// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=gfx801 %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=COMMON,GFX801 %s
				// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=gfx802 %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=COMMON,GFX802 %s
				// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=gfx803 %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=COMMON,GFX803 %s
				// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=gfx810 %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=COMMON,GFX810 %s
				// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=gfx900 %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=COMMON,GFX900 %s
				// RUN: %clang -### -target x86_64-linux-gnu -c --cuda-gpu-arch=gfx902 %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=COMMON,GFX902 %s

	// COMMON: ptxas			// COMMON: ptxas
	// COMMON-SAME: -m64			// COMMON-SAME: -m64
	// COMMON: fatbinary			// COMMON: fatbinary

	// SM20:--image=profile=sm_20{{.*}}--image=profile=compute_20			// SM20:--image=profile=sm_20{{.*}}--image=profile=compute_20
	// SM21:--image=profile=sm_21{{.*}}--image=profile=compute_20			// SM21:--image=profile=sm_21{{.*}}--image=profile=compute_20
	// SM30:--image=profile=sm_30{{.*}}--image=profile=compute_30			// SM30:--image=profile=sm_30{{.*}}--image=profile=compute_30
	// SM32:--image=profile=sm_32{{.*}}--image=profile=compute_32			// SM32:--image=profile=sm_32{{.*}}--image=profile=compute_32
	// SM35:--image=profile=sm_35{{.*}}--image=profile=compute_35			// SM35:--image=profile=sm_35{{.*}}--image=profile=compute_35
	// SM37:--image=profile=sm_37{{.*}}--image=profile=compute_37			// SM37:--image=profile=sm_37{{.*}}--image=profile=compute_37
	// SM50:--image=profile=sm_50{{.*}}--image=profile=compute_50			// SM50:--image=profile=sm_50{{.*}}--image=profile=compute_50
	// SM52:--image=profile=sm_52{{.*}}--image=profile=compute_52			// SM52:--image=profile=sm_52{{.*}}--image=profile=compute_52
	// SM53:--image=profile=sm_53{{.*}}--image=profile=compute_53			// SM53:--image=profile=sm_53{{.*}}--image=profile=compute_53
	// SM60:--image=profile=sm_60{{.*}}--image=profile=compute_60			// SM60:--image=profile=sm_60{{.*}}--image=profile=compute_60
	// SM61:--image=profile=sm_61{{.*}}--image=profile=compute_61			// SM61:--image=profile=sm_61{{.*}}--image=profile=compute_61
	// SM62:--image=profile=sm_62{{.*}}--image=profile=compute_62			// SM62:--image=profile=sm_62{{.*}}--image=profile=compute_62
	// SM70:--image=profile=sm_70{{.*}}--image=profile=compute_70			// SM70:--image=profile=sm_70{{.*}}--image=profile=compute_70
				// GFX600:--image=profile=gfx600{{.*}}--image=profile=compute_amdgcn
				// GFX601:--image=profile=gfx601{{.*}}--image=profile=compute_amdgcn
				// GFX700:--image=profile=gfx700{{.*}}--image=profile=compute_amdgcn
				// GFX701:--image=profile=gfx701{{.*}}--image=profile=compute_amdgcn
				// GFX702:--image=profile=gfx702{{.*}}--image=profile=compute_amdgcn
				// GFX703:--image=profile=gfx703{{.*}}--image=profile=compute_amdgcn
				// GFX704:--image=profile=gfx704{{.*}}--image=profile=compute_amdgcn
				// GFX801:--image=profile=gfx801{{.*}}--image=profile=compute_amdgcn
				// GFX802:--image=profile=gfx802{{.*}}--image=profile=compute_amdgcn
				// GFX803:--image=profile=gfx803{{.*}}--image=profile=compute_amdgcn
				// GFX810:--image=profile=gfx810{{.*}}--image=profile=compute_amdgcn
				// GFX900:--image=profile=gfx900{{.*}}--image=profile=compute_amdgcn
				// GFX902:--image=profile=gfx902{{.*}}--image=profile=compute_amdgcn
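
The CHECK lines above pair each `--cuda-gpu-arch` value with a second "compute" profile: `sm_21` falls back to `compute_20`, every other `sm_XY` shown maps to `compute_XY`, and every `gfx*` value maps to `compute_amdgcn`. The sketch below restates that expectation as a standalone function. `expectedVirtualProfile` is a hypothetical name and this is not clang's driver code, only a model of what the test checks.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the "compute_*" profile that the CHECK lines
// in cuda-arch-translation.cu expect for a given --cuda-gpu-arch value.
std::string expectedVirtualProfile(const std::string &GpuArch) {
  assert(!GpuArch.empty() && "expected an arch name such as sm_60 or gfx803");
  // All AMD GPU names in the test share one virtual profile.
  if (GpuArch.rfind("gfx", 0) == 0)
    return "compute_amdgcn";
  // sm_21 is the one NVIDIA arch in the test whose profile differs.
  if (GpuArch == "sm_21")
    return "compute_20";
  // Remaining sm_XY values map to compute_XY.
  if (GpuArch.rfind("sm_", 0) == 0)
    return "compute_" + GpuArch.substr(3);
  return ""; // Unknown arch name: no expectation modelled.
}
```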

[CUDA] Add amdgpu sub archs (Closed, Public)

Diff 141056

include/clang/Basic/Cuda.h

lib/Basic/Cuda.cpp

lib/Basic/Targets.h

lib/Basic/Targets.cpp

lib/Basic/Targets/AMDGPU.h

lib/Basic/Targets/AMDGPU.cpp

lib/Basic/Targets/NVPTX.cpp

test/Driver/cuda-arch-translation.cu