This is an archive of the discontinued LLVM Phabricator instance.


Differential D106909

[clang] Add clang builtins support for gfx90a
ClosedPublic

Authored by gandhi21299 on Jul 27 2021, 12:55 PM.


Details

Reviewers

arsenm
yaxunl
rampitec
b-sumner
t-tye

Commits

rG39dac1f7f656: [clang] Add clang builtins support for gfx90a

Summary

Implement target builtins for gfx90a including fadd64, fadd32, add2h, max and min on various global, flat and ds address spaces for which intrinsics are already implemented.

@rampitec The compiler recommended that I add the global-noret target feature after I set it in BuiltinsAMDGPU.def. I am not sure what that means outside of BuiltinsAMDGPU.def, so I have changed it back to gfx90a-insts.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

gandhi21299 created this revision. Jul 27 2021, 12:55 PM

Herald added subscribers: kerbowa, jfb, tpr and 2 others. Jul 27 2021, 12:55 PM

gandhi21299 requested review of this revision. Jul 27 2021, 12:55 PM

Herald added a project: Restricted Project. Jul 27 2021, 12:55 PM

Herald added subscribers: cfe-commits, wdng.

Harbormaster completed remote builds in B116516: Diff 362142. Jul 27 2021, 2:31 PM

added more tests with supported combinations of address spaces
minor nits

Harbormaster completed remote builds in B116729: Diff 362433. Jul 28 2021, 11:53 AM

added scope argument
removed irrelevant tests and updated them
replaced constant by generic qualifier for two of the test functions

removed kernel from functions taking in __generic qualified addr

Herald added a project: Restricted Project. Jul 28 2021, 1:42 PM

Herald added subscribers: llvm-commits, foad, hiraditya.

eliminated scope argument, seemed irrelevant

Harbormaster completed remote builds in B116815: Diff 362555. Jul 28 2021, 6:09 PM

fixed tests by replacing checks for the functions with generic-qualified arguments.

Harbormaster completed remote builds in B116992: Diff 362804. Jul 29 2021, 9:59 AM

Passed PSDB

Missing sema test for rejecting the builtins on targets that don't support this

clang/include/clang/Basic/BuiltinsAMDGPU.def
201	"_2f16" looks weird to me. The instruction names call it "pk"
clang/lib/CodeGen/CGBuiltin.cpp
16251	Initializing this here is strange, sink down to the double case
clang/test/CodeGenOpenCL/builtins-fp-atomics.cl
112 ↗	(On Diff #362804)	If you're going to bother testing the ISA, is it worth testing rtn and no rtn versions?

yaxunl added inline comments. Jul 30 2021, 6:47 AM

clang/include/clang/Basic/BuiltinsAMDGPU.def
201	This is to have a consistent postfix naming convention, since the stem part is the same here. The postfix is for the argument type of the builtin function.

gandhi21299 added inline comments. Jul 30 2021, 8:30 AM

clang/lib/CodeGen/CGBuiltin.cpp
16251	You mean push it down after line 16119?
clang/test/CodeGenOpenCL/builtins-fp-atomics.cl
112 ↗	(On Diff #362804)	Sorry, what do you mean by rtn version?

arsenm added inline comments. Jul 30 2021, 9:02 AM

clang/include/clang/Basic/BuiltinsAMDGPU.def
201	Just a plain 2 isn't consistent either. The llvm type naming convention would add a v prefix, but the builtins should probably follow the instructions
clang/lib/CodeGen/CGBuiltin.cpp
16251	Yes
clang/test/CodeGenOpenCL/builtins-fp-atomics.cl
112 ↗	(On Diff #362804)	Most atomics can be optimized so that they don't return the in-memory value when that value is unused

gandhi21299 marked 7 inline comments as done. Jul 30 2021, 11:10 AM

gandhi21299 added inline comments.

clang/include/clang/Basic/BuiltinsAMDGPU.def
201	Yea, v2f16 looks reasonable.

addressed reviewers' feedback:

changed a builtin name,
corrected tests,
minor formatting nits

Harbormaster completed remote builds in B117232: Diff 363159. Jul 30 2021, 11:50 AM

refreshing patch

Harbormaster completed remote builds in B117253: Diff 363196. Jul 30 2021, 2:27 PM

Needs an IR test, a test for different supported targets, and a negative test for unsupported features.

clang/include/clang/Basic/BuiltinsAMDGPU.def
199	The correct attribute for this one is atomic-fadd-insts. In particular, it was first added in gfx908 and you would need to test it too.
205	Flat address space is 0.
210	This is available since gfx8. Attribute gfx8-insts.
clang/lib/CodeGen/CGBuiltin.cpp
16212	You do not need any of that code. You can directly map a builtin to intrinsic in the IntrinsicsAMDGPU.td.
clang/test/CodeGenOpenCL/builtins-fp-atomics.cl
112 ↗	(On Diff #362804)	Certainly yes, because global_atomic_add_f32 did not have return version on gfx908.

This revision now requires changes to proceed. Aug 2 2021, 2:53 PM

rampitec added a reviewer: b-sumner. Aug 2 2021, 2:54 PM

rampitec added a reviewer: t-tye.

gandhi21299 marked 3 inline comments as done. Aug 2 2021, 3:45 PM

gandhi21299 added inline comments.

clang/lib/CodeGen/CGBuiltin.cpp
16212	Sorry, I looked around for several days but I could not figure this out. Is there a concrete example?

rampitec added inline comments. Aug 2 2021, 3:46 PM

clang/lib/CodeGen/CGBuiltin.cpp
16212	Every instantiation of `GCCBuiltin` in the `IntrinsicsAMDGPU.td`.

arsenm added inline comments. Aug 2 2021, 3:47 PM

clang/lib/CodeGen/CGBuiltin.cpp
16212	This is not true if the intrinsic requires type mangling. GCCBuiltin is too simple to handle it

rampitec added inline comments. Aug 2 2021, 3:49 PM

clang/lib/CodeGen/CGBuiltin.cpp
16212	Yes, but these do not need it. All of these builtins are specific.

arsenm added inline comments. Aug 2 2021, 3:50 PM

clang/lib/CodeGen/CGBuiltin.cpp
16212	These intrinsics are all mangled based on the FP type

rampitec added inline comments. Aug 2 2021, 3:52 PM

clang/lib/CodeGen/CGBuiltin.cpp
16212	Ah, right. Intrinsics are mangled, builtins not. True. OK, this shall be code then.
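
The exchange above turns on one point: builtin names are fixed, while these intrinsics are overloaded, so the IR name carries a suffix per overloaded type. A minimal standalone sketch of that idea follows. It uses plain strings rather than the LLVM API; `mangleIntrinsicName` and `globalFaddName` are hypothetical helpers introduced only for illustration, and the suffix spelling is taken from the CHECK lines in this revision's gfx90a test (for example `.f64.p1f64.f64`).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical illustration only (not the LLVM API): an overloaded
// intrinsic's IR name is its base name plus one suffix per overloaded type.
std::string mangleIntrinsicName(const std::string &Base,
                                const std::vector<std::string> &OverloadTypes) {
  std::string Name = Base;
  for (const std::string &Ty : OverloadTypes)
    Name += "." + Ty;
  return Name;
}

// Assumed overload order, as seen in the test CHECK lines of this revision:
// result type, pointer type in address space 1, value type.
std::string globalFaddName(const std::string &FPTy) {
  return mangleIntrinsicName("llvm.amdgcn.global.atomic.fadd",
                             {FPTy, "p1" + FPTy, FPTy});
}
```

Calling `globalFaddName("f64")` and `globalFaddName("v2f16")` gives two different IR names for the same base intrinsic, which is why a single name-to-name mapping does not suffice for these builtins and the type has to be chosen in code.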

rampitec added inline comments. Aug 2 2021, 3:54 PM

clang/lib/CodeGen/CGBuiltin.cpp
16270	Should we map flags since we already have them?

gandhi21299 marked 5 inline comments as done. Aug 3 2021, 8:31 AM

gandhi21299 added inline comments.

clang/lib/CodeGen/CGBuiltin.cpp
16270	Do you mean the memory order flag?

gandhi21299 marked an inline comment as done. Aug 3 2021, 8:37 AM

@rampitec how do I handle the following?

builtins-fp-atomics.cl:38:10: error: '__builtin_amdgcn_global_atomic_fadd_f64' needs target feature atomic-fadd-insts
  *rtn = __builtin_amdgcn_global_atomic_fadd_f64(addr, x, memory_order_relaxed);
         ^

In D106909#2922567, @gandhi21299 wrote:

@rampitec how do I handle the following?

builtins-fp-atomics.cl:38:10: error: '__builtin_amdgcn_global_atomic_fadd_f64' needs target feature atomic-fadd-insts
  *rtn = __builtin_amdgcn_global_atomic_fadd_f64(addr, x, memory_order_relaxed);
         ^

It is f64, it needs gfx90a-insts. atomic-fadd-insts is for global f32.
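
The diagnostic quoted above comes from a per-builtin feature requirement. As a rough sketch of that gating, assuming a simple name-to-feature table: `requiredFeature` and `checkBuiltin` are hypothetical names, the three entries are copied from the TARGET_BUILTIN lines of Diff 364242, and the message shape mirrors the errors quoted in this review. This is not how Clang implements the check internally.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of the feature gate; entries copied from the
// TARGET_BUILTIN lines of this revision (Diff 364242).
const std::map<std::string, std::string> &requiredFeature() {
  static const std::map<std::string, std::string> Table = {
      {"__builtin_amdgcn_global_atomic_fadd_f64", "gfx90a-insts"},
      {"__builtin_amdgcn_global_atomic_fadd_f32", "gfx90a-insts"},
      {"__builtin_amdgcn_ds_atomic_fadd_f32", "gfx8-insts"},
  };
  return Table;
}

// Returns an empty string when the builtin is usable with the given target
// features, otherwise a message in the shape quoted in this review.
std::string checkBuiltin(const std::string &Builtin,
                         const std::set<std::string> &TargetFeatures) {
  auto It = requiredFeature().find(Builtin);
  if (It == requiredFeature().end())
    return ""; // Builtin not modeled in this sketch.
  if (TargetFeatures.count(It->second))
    return "";
  return "'" + Builtin + "' needs target feature " + It->second;
}
```

With an empty feature set, `checkBuiltin("__builtin_amdgcn_ds_atomic_fadd_f32", {})` yields the same text that the gfx7 negative test in this revision expects.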

clang/lib/CodeGen/CGBuiltin.cpp
16270	All 3: ordering, scope and volatile.

@rampitec what should I be testing exactly in the IR test?

In D106909#2923724, @gandhi21299 wrote:

@rampitec what should I be testing exactly in the IR test?

The produced call to the intrinsic. All of the tests there do that.

gandhi21299 added inline comments. Aug 3 2021, 3:52 PM

clang/lib/CodeGen/CGBuiltin.cpp
16270	Following the discussion, what change is required here?

rampitec added inline comments. Aug 3 2021, 3:53 PM

clang/lib/CodeGen/CGBuiltin.cpp
16270	Keep zeroes, drop immediate argument of the builtins.
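
The outcome of this thread is that the ds builtins take only an address and a value, while the emitted intrinsic call still carries three constant operands (ordering, scope and volatile, per the comment above). A textual sketch of that shape, assuming hypothetical names `FPType` and `emitDsFaddCall` and modeling the call as a string rather than through the LLVM IRBuilder:

```cpp
#include <cassert>
#include <string>

// Hypothetical textual model, not the LLVM IRBuilder API.
struct FPType {
  std::string IRName; // e.g. "float"
  std::string Suffix; // e.g. "f32"
};

std::string emitDsFaddCall(const FPType &Ty, const std::string &Addr,
                           const std::string &Val) {
  // The builtin supplies only the first two operands: address and value.
  std::string Call = "call " + Ty.IRName + " @llvm.amdgcn.ds.fadd." + Ty.Suffix;
  Call += "(" + Ty.IRName + " addrspace(3)* " + Addr + ", ";
  Call += Ty.IRName + " " + Val + ", ";
  // Ordering, scope and volatile are kept as constant zeroes.
  Call += "i32 0, i32 0, i1 false)";
  return Call;
}
```

For `{"float", "f32"}` the result has the same form as the CHECK line in the gfx8 test of this revision.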

updating test

gandhi21299 added inline comments. Aug 4 2021, 9:13 AM

clang/include/clang/Basic/BuiltinsAMDGPU.def
201	I tried adding the target feature gfx908-insts for this builtin, but the frontend complains that it should have target feature gfx90a-insts.

foad removed a subscriber: foad. Aug 4 2021, 9:17 AM

rampitec added inline comments. Aug 4 2021, 10:11 AM

clang/include/clang/Basic/BuiltinsAMDGPU.def
201	That was for global_atomic_fadd_f32, but as per discussion we are going to use builtin only starting from gfx90a because of the noret problem. Comments in the review are off their positions after multiple patch updates.
210	This needs tests with a gfx8 target and a negative test with gfx7.
clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
14	Use _f64 or _double in the test name.
32	Same here and in other double tests, use a suffix f64 or double.
68	'constant' is wrong. It is flat. Here and everywhere.
clang/test/CodeGenOpenCL/builtins-fp-atomics-unsupported-gfx908.cl
7 ↗	(On Diff #364152)	Need to check all other builtins too.

Harbormaster completed remote builds in B117938: Diff 364152. Aug 4 2021, 10:12 AM

gandhi21299 marked 9 inline comments as done. Aug 4 2021, 1:54 PM

gandhi21299 marked an inline comment as done.

added more negative tests
fixed some tests

Harbormaster completed remote builds in B117999: Diff 364242. Aug 4 2021, 2:34 PM

rampitec added inline comments. Aug 4 2021, 2:40 PM

clang/test/CodeGenOpenCL/builtins-fp-atomics-unsupported-gfx7.cl
8	Add new line.
clang/test/CodeGenOpenCL/unsupported-fadd2f16-gfx908.cl
1	Combine all of these gfx908 error tests into a single file. For example like in the builtins-amdgcn-dl-insts-err.cl. It is also better to rename these test filenames to follow the existing pattern: builtins-amdgcn-*-err.cl

gandhi21299 marked 2 inline comments as done. Aug 4 2021, 3:13 PM

combined tests into a single file
renamed tests for consistency

LGTM

This revision is now accepted and ready to land. Aug 4 2021, 3:17 PM

Thanks a lot for the review, I will merge this patch in :)

I will merge this patch in as soon as the builds are successful.

Harbormaster completed remote builds in B118018: Diff 364266. Aug 4 2021, 4:06 PM

Closed by commit rG39dac1f7f656: [clang] Add clang builtins support for gfx90a (authored by gandhi21299). Aug 5 2021, 1:08 AM

This revision was automatically updated to reflect the committed changes.

gandhi21299 added a commit: rG39dac1f7f656: [clang] Add clang builtins support for gfx90a.

Revision Contents

Path	Size
clang/include/clang/Basic/BuiltinsAMDGPU.def	13 lines
clang/lib/CodeGen/CGBuiltin.cpp	68 lines
clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx1030.cl	14 lines
clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx8.cl	14 lines
clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl	115 lines
clang/test/CodeGenOpenCL/builtins-fp-atomics-unsupported-gfx7.cl	7 lines
clang/test/CodeGenOpenCL/unsupported-fadd2f16-gfx908.cl	8 lines
clang/test/CodeGenOpenCL/unsupported-fadd32-gfx908.cl	8 lines
clang/test/CodeGenOpenCL/unsupported-fadd64-flat-gfx908.cl	8 lines
clang/test/CodeGenOpenCL/unsupported-fadd64-gfx908.cl	8 lines
clang/test/CodeGenOpenCL/unsupported-fmax64-flat-gfx908.cl	8 lines
clang/test/CodeGenOpenCL/unsupported-fmax64-gfx908.cl	8 lines
clang/test/CodeGenOpenCL/unsupported-fmin64-flat-gfx908.cl	8 lines
clang/test/CodeGenOpenCL/unsupported-fmin64-gfx908.cl	8 lines

Diff 364242

clang/include/clang/Basic/BuiltinsAMDGPU.def

	TARGET_BUILTIN(__builtin_amdgcn_perm, "UiUiUiUi", "nc", "gfx8-insts")

	//===----------------------------------------------------------------------===//
	// GFX9+ only builtins.
	//===----------------------------------------------------------------------===//

	TARGET_BUILTIN(__builtin_amdgcn_fmed3h, "hhhh", "nc", "gfx9-insts")

				TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f64, "dd*1d", "t", "gfx90a-insts")
				TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f32, "ff*1f", "t", "gfx90a-insts")
				TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2f16, "V2hV2h*1V2h", "t", "gfx90a-insts")
				TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fmin_f64, "dd*1d", "t", "gfx90a-insts")
				TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fmax_f64, "dd*1d", "t", "gfx90a-insts")

				TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_f64, "dd*0d", "t", "gfx90a-insts")
				TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fmin_f64, "dd*0d", "t", "gfx90a-insts")
				TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fmax_f64, "dd*0d", "t", "gfx90a-insts")

				TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f64, "dd*3d", "t", "gfx90a-insts")
				TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f32, "ff*3f", "t", "gfx8-insts")

	//===----------------------------------------------------------------------===//
	// Deep learning builtins.
	//===----------------------------------------------------------------------===//

	TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dot7-insts")
	TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dot2-insts")
	TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dot2-insts")
	TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dot1-insts")
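
The second argument of each new TARGET_BUILTIN entry is a prototype string: return type first, then parameters, with `*N` marking a pointer into address space N (1 for global, 0 for flat, 3 for ds in the entries above). The following sketch decodes only the letters that appear in those entries (`d`, `f`, `h`, the `V<count>` vector prefix and the `*<N>` pointer suffix). `decodePrototype` is a hypothetical helper and the output spelling is an illustrative, LLVM-like notation, not something Clang prints.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical decoder for the subset of prototype letters used by the new
// entries in this revision. The first element of the result is the return
// type; the remaining elements are the parameter types.
std::vector<std::string> decodePrototype(const std::string &Proto) {
  std::vector<std::string> Types;
  std::size_t I = 0;
  auto ReadDigits = [&]() {
    std::string Digits;
    while (I < Proto.size() &&
           std::isdigit(static_cast<unsigned char>(Proto[I])))
      Digits += Proto[I++];
    return Digits;
  };
  while (I < Proto.size()) {
    // Optional vector prefix: 'V' followed by the element count.
    std::string VecCount;
    if (Proto[I] == 'V') {
      ++I;
      VecCount = ReadDigits();
    }
    if (I >= Proto.size())
      break; // Malformed input: prefix without a base type.
    std::string Ty;
    switch (Proto[I++]) {
    case 'd': Ty = "double"; break;
    case 'f': Ty = "float"; break;
    case 'h': Ty = "half"; break;
    default:  Ty = "?"; break; // Letter outside the subset modeled here.
    }
    if (!VecCount.empty())
      Ty = "<" + VecCount + " x " + Ty + ">";
    // Optional pointer suffix: '*' followed by an address space number.
    if (I < Proto.size() && Proto[I] == '*') {
      ++I;
      std::string AS = ReadDigits();
      Ty += AS.empty() ? std::string("*") : " addrspace(" + AS + ")*";
    }
    Types.push_back(Ty);
  }
  return Types;
}
```

For example, `decodePrototype("dd*1d")` reads as a builtin returning `double` that takes a global `double` pointer and a `double` value, which matches how the global f64 builtins are called in the tests.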

clang/lib/CodeGen/CGBuiltin.cpp

case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: {
llvm::Value *Src3 = EmitScalarExpr(E->getArg(3));
llvm::Value *Src4 = EmitScalarExpr(E->getArg(4));
llvm::Function *F = CGM.getIntrinsic(Intrin, { Src1->getType() });
llvm::FunctionType *FTy = F->getFunctionType();
llvm::Type *PTy = FTy->getParamType(0);
Src0 = Builder.CreatePointerBitCastOrAddrSpaceCast(Src0, PTy);
return Builder.CreateCall(F, { Src0, Src1, Src2, Src3, Src4 });
}
		case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
		case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
		case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
		case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
		case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
		case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
		case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
		case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
		Intrinsic::ID IID;
		llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
		switch (BuiltinID) {
		case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
		ArgTy = llvm::Type::getFloatTy(getLLVMContext());
		IID = Intrinsic::amdgcn_global_atomic_fadd;
		break;
		case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
		ArgTy = llvm::FixedVectorType::get(
		llvm::Type::getHalfTy(getLLVMContext()), 2);
		IID = Intrinsic::amdgcn_global_atomic_fadd;
		break;
		case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
		IID = Intrinsic::amdgcn_global_atomic_fadd;
		break;
		case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
		IID = Intrinsic::amdgcn_global_atomic_fmin;
		break;
		case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
		IID = Intrinsic::amdgcn_global_atomic_fmax;
		break;
		case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
		IID = Intrinsic::amdgcn_flat_atomic_fadd;
		break;
		case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
		IID = Intrinsic::amdgcn_flat_atomic_fmin;
		break;
		case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
		IID = Intrinsic::amdgcn_flat_atomic_fmax;
		break;
		}
		llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
		llvm::Value *Val = EmitScalarExpr(E->getArg(1));
		llvm::Function *F =
		CGM.getIntrinsic(IID, {ArgTy, Addr->getType(), Val->getType()});
		return Builder.CreateCall(F, {Addr, Val});
		}
		case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f64:
		case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32: {
		Intrinsic::ID IID;
		llvm::Type *ArgTy;
		switch (BuiltinID) {
		case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32:
		ArgTy = llvm::Type::getFloatTy(getLLVMContext());
		IID = Intrinsic::amdgcn_ds_fadd;
		break;
		case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f64:
		ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
		IID = Intrinsic::amdgcn_ds_fadd;
		break;
		}
		llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
		llvm::Value *Val = EmitScalarExpr(E->getArg(1));
		llvm::Constant *ZeroI32 = llvm::ConstantInt::getIntegerValue(
		llvm::Type::getInt32Ty(getLLVMContext()), APInt(32, 0, true));
		llvm::Constant *ZeroI1 = llvm::ConstantInt::getIntegerValue(
		llvm::Type::getInt1Ty(getLLVMContext()), APInt(1, 0));
		llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy});
		return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1});
		}
case AMDGPU::BI__builtin_amdgcn_read_exec: {
CallInst *CI = cast<CallInst>(
EmitSpecialRegisterBuiltin(*this, E, Int64Ty, Int64Ty, NormalRead, "exec"));
CI->setConvergent();
return CI;
}
case AMDGPU::BI__builtin_amdgcn_read_exec_lo:
case AMDGPU::BI__builtin_amdgcn_read_exec_hi: {

clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx1030.cl

This file was added.

				// REQUIRES: amdgpu-registered-target
				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx1030 \
				// RUN: -S -o - %s
				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx1030 \
				// RUN: -S -o - %s | FileCheck -check-prefix=GFX1030 %s

				// CHECK-LABEL: test_ds_addf_local
				// CHECK: call float @llvm.amdgcn.ds.fadd.f32(float addrspace(3)* %{{.*}}, float %{{.*}},
				// GFX1030-LABEL: test_ds_addf_local$local
				// GFX1030: ds_add_rtn_f32
				void test_ds_addf_local(__local float *addr, float x){
				float *rtn;
				*rtn = __builtin_amdgcn_ds_atomic_fadd_f32(addr, x);
				}

clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx8.cl

This file was added.

				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx810 \
				// RUN: %s -S -emit-llvm -o - | FileCheck %s
				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx810 \
				// RUN: -S -o - %s | FileCheck -check-prefix=GFX8 %s

				// CHECK-LABEL: test_fadd_local
				// CHECK: call float @llvm.amdgcn.ds.fadd.f32(float addrspace(3)* %{{.*}}, float %{{.*}}, i32 0, i32 0, i1 false)
				// GFX8-LABEL: test_fadd_local$local:
				// GFX8: ds_add_rtn_f32 v2, v0, v1
				// GFX8: s_endpgm
				kernel void test_fadd_local(__local float *ptr, float val){
				float *res;
				*res = __builtin_amdgcn_ds_atomic_fadd_f32(ptr, val);
				}

clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl

This file was added.

				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx90a \
				// RUN: %s -S -emit-llvm -o - | FileCheck %s -check-prefix=CHECK

				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx90a \
				// RUN: -S -o - %s | FileCheck -check-prefix=GFX90A %s

				typedef half __attribute__((ext_vector_type(2))) half2;

				// CHECK-LABEL: test_global_add_f64
				// CHECK: call double @llvm.amdgcn.global.atomic.fadd.f64.p1f64.f64(double addrspace(1)* %{{.*}}, double %{{.*}})
				// GFX90A-LABEL: test_global_add_f64$local:
				// GFX90A: global_atomic_add_f64
				void test_global_add_f64(__global double *addr, double x) {
				double *rtn;
				*rtn = __builtin_amdgcn_global_atomic_fadd_f64(addr, x);
				}

				// CHECK-LABEL: test_global_add_half2
				// CHECK: call <2 x half> @llvm.amdgcn.global.atomic.fadd.v2f16.p1v2f16.v2f16(<2 x half> addrspace(1)* %{{.*}}, <2 x half> %{{.*}})
				// GFX90A-LABEL: test_global_add_half2
				// GFX90A: global_atomic_pk_add_f16 v2, v[0:1], v2, off glc
				void test_global_add_half2(__global half2 *addr, half2 x) {
				half2 *rtn;
				*rtn = __builtin_amdgcn_global_atomic_fadd_v2f16(addr, x);
				}

				// CHECK-LABEL: test_global_global_min_f64
				// CHECK: call double @llvm.amdgcn.global.atomic.fmin.f64.p1f64.f64(double addrspace(1)* %{{.*}}, double %{{.*}})
				// GFX90A-LABEL: test_global_global_min_f64$local
				// GFX90A: global_atomic_min_f64
				void test_global_global_min_f64(__global double *addr, double x){
				double *rtn;
				*rtn = __builtin_amdgcn_global_atomic_fmin_f64(addr, x);
				}

				// CHECK-LABEL: test_global_max_f64
				// CHECK: call double @llvm.amdgcn.global.atomic.fmax.f64.p1f64.f64(double addrspace(1)* %{{.*}}, double %{{.*}})
				// GFX90A-LABEL: test_global_max_f64$local
				// GFX90A: global_atomic_max_f64
				void test_global_max_f64(__global double *addr, double x){
				double *rtn;
				*rtn = __builtin_amdgcn_global_atomic_fmax_f64(addr, x);
				}

				// CHECK-LABEL: test_flat_add_local_f64
				// CHECK: call double @llvm.amdgcn.flat.atomic.fadd.f64.p3f64.f64(double addrspace(3)* %{{.*}}, double %{{.*}})
				// GFX90A-LABEL: test_flat_add_local_f64$local
				// GFX90A: ds_add_rtn_f64
				void test_flat_add_local_f64(__local double *addr, double x){
				double *rtn;
				*rtn = __builtin_amdgcn_flat_atomic_fadd_f64(addr, x);
				}

				// CHECK-LABEL: test_flat_global_add_f64
				// CHECK: call double @llvm.amdgcn.flat.atomic.fadd.f64.p1f64.f64(double addrspace(1)* %{{.*}}, double %{{.*}})
				// GFX90A-LABEL: test_flat_global_add_f64$local
				// GFX90A: global_atomic_add_f64
				void test_flat_global_add_f64(__global double *addr, double x){
				double *rtn;
				*rtn = __builtin_amdgcn_flat_atomic_fadd_f64(addr, x);
				}

				// CHECK-LABEL: test_flat_min_flat_f64
				// CHECK: call double @llvm.amdgcn.flat.atomic.fmin.f64.p0f64.f64(double* %{{.*}}, double %{{.*}})
				// GFX90A-LABEL: test_flat_min_flat_f64$local
				// GFX90A: flat_atomic_min_f64
				void test_flat_min_flat_f64(__generic double *addr, double x){
				double *rtn;
				*rtn = __builtin_amdgcn_flat_atomic_fmin_f64(addr, x);
				}

				// CHECK-LABEL: test_flat_global_min_f64
				// CHECK: call double @llvm.amdgcn.flat.atomic.fmin.f64.p1f64.f64(double addrspace(1)* %{{.*}}, double %{{.*}})
				// GFX90A: test_flat_global_min_f64$local
				// GFX90A: global_atomic_min_f64
				void test_flat_global_min_f64(__global double *addr, double x){
				double *rtn;
				*rtn = __builtin_amdgcn_flat_atomic_fmin_f64(addr, x);
				}

				// CHECK-LABEL: test_flat_max_flat_f64
				// CHECK: call double @llvm.amdgcn.flat.atomic.fmax.f64.p0f64.f64(double* %{{.*}}, double %{{.*}})
				// GFX90A-LABEL: test_flat_max_flat_f64$local
				// GFX90A: flat_atomic_max_f64
				void test_flat_max_flat_f64(__generic double *addr, double x){
				double *rtn;
				*rtn = __builtin_amdgcn_flat_atomic_fmax_f64(addr, x);
				}

				// CHECK-LABEL: test_flat_global_max_f64
				// CHECK: call double @llvm.amdgcn.flat.atomic.fmax.f64.p1f64.f64(double addrspace(1)* %{{.*}}, double %{{.*}})
				// GFX90A-LABEL: test_flat_global_max_f64$local
				// GFX90A: global_atomic_max_f64
				void test_flat_global_max_f64(__global double *addr, double x){
				double *rtn;
				*rtn = __builtin_amdgcn_flat_atomic_fmax_f64(addr, x);
				}

				// CHECK-LABEL: test_ds_add_local_f64
				// CHECK: call double @llvm.amdgcn.ds.fadd.f64(double addrspace(3)* %{{.*}}, double %{{.*}},
				// GFX90A: test_ds_add_local_f64$local
				// GFX90A: ds_add_rtn_f64
				void test_ds_add_local_f64(__local double *addr, double x){
				double *rtn;
				*rtn = __builtin_amdgcn_ds_atomic_fadd_f64(addr, x);
				}

				// CHECK-LABEL: test_ds_addf_local_f32
				// CHECK: call float @llvm.amdgcn.ds.fadd.f32(float addrspace(3)* %{{.*}}, float %{{.*}},
				// GFX90A-LABEL: test_ds_addf_local_f32$local
				// GFX90A: ds_add_rtn_f32
				void test_ds_addf_local_f32(__local float *addr, float x){
				float *rtn;
				*rtn = __builtin_amdgcn_ds_atomic_fadd_f32(addr, x);
				}

clang/test/CodeGenOpenCL/builtins-fp-atomics-unsupported-gfx7.cl

This file was added.

				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx700 \
				// RUN: %s -verify -S -o -

				kernel void test_fadd_local(__local float *ptr, float val){
				float *res;
				*res = __builtin_amdgcn_ds_atomic_fadd_f32(ptr, val); // expected-error{{'__builtin_amdgcn_ds_atomic_fadd_f32' needs target feature gfx8-insts}}
				}
				No newline at end of file

clang/test/CodeGenOpenCL/unsupported-fadd2f16-gfx908.cl

This file was added.

				// REQUIRES: amdgpu-registered-target
				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx908 \
				// RUN: -verify -S -o - %s
				typedef half __attribute__((ext_vector_type(2))) half2;
				void test_global_add_2f16(__global half2 *addr, half2 x) {
				half2 *rtn;
				*rtn = __builtin_amdgcn_global_atomic_fadd_v2f16(addr, x); // expected-error{{'__builtin_amdgcn_global_atomic_fadd_v2f16' needs target feature gfx90a-insts}}
				}

clang/test/CodeGenOpenCL/unsupported-fadd32-gfx908.cl

This file was added.

				// REQUIRES: amdgpu-registered-target
				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx908 \
				// RUN: -verify -S -o - %s

				void test_global_add_f32(__global double *addr, double x) {
				double *rtn;
				*rtn = __builtin_amdgcn_global_atomic_fadd_f32(addr, x); // expected-error{{'__builtin_amdgcn_global_atomic_fadd_f32' needs target feature gfx90a-insts}}
				}

clang/test/CodeGenOpenCL/unsupported-fadd64-flat-gfx908.cl

This file was added.

				// REQUIRES: amdgpu-registered-target
				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx908 \
				// RUN: -verify -S -o - %s

				void test_flat_add_f64(__global double *addr, double x) {
				double *rtn;
				*rtn = __builtin_amdgcn_flat_atomic_fadd_f64(addr, x); // expected-error{{'__builtin_amdgcn_flat_atomic_fadd_f64' needs target feature gfx90a-insts}}
				}

clang/test/CodeGenOpenCL/unsupported-fadd64-gfx908.cl

This file was added.

				// REQUIRES: amdgpu-registered-target
				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx908 \
				// RUN: -verify -S -o - %s

				void test_global_add_f64(__global double *addr, double x) {
				double *rtn;
				*rtn = __builtin_amdgcn_global_atomic_fadd_f64(addr, x); // expected-error{{'__builtin_amdgcn_global_atomic_fadd_f64' needs target feature gfx90a-insts}}
				}

clang/test/CodeGenOpenCL/unsupported-fmax64-flat-gfx908.cl

This file was added.

				// REQUIRES: amdgpu-registered-target
				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx908 \
				// RUN: -verify -S -o - %s

				void test_flat_max_f64(__global double *addr, double x) {
				double *rtn;
				*rtn = __builtin_amdgcn_flat_atomic_fmax_f64(addr, x); // expected-error{{'__builtin_amdgcn_flat_atomic_fmax_f64' needs target feature gfx90a-insts}}
				}

clang/test/CodeGenOpenCL/unsupported-fmax64-gfx908.cl

This file was added.

				// REQUIRES: amdgpu-registered-target
				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx908 \
				// RUN: -verify -S -o - %s

				void test_global_max_f64(__global double *addr, double x) {
				double *rtn;
				*rtn = __builtin_amdgcn_global_atomic_fmax_f64(addr, x); // expected-error{{'__builtin_amdgcn_global_atomic_fmax_f64' needs target feature gfx90a-insts}}
				}

clang/test/CodeGenOpenCL/unsupported-fmin64-flat-gfx908.cl

This file was added.

				// REQUIRES: amdgpu-registered-target
				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx908 \
				// RUN: -verify -S -o - %s

				void test_flat_min_f64(__global double *addr, double x) {
				double *rtn;
				*rtn = __builtin_amdgcn_flat_atomic_fmin_f64(addr, x); // expected-error{{'__builtin_amdgcn_flat_atomic_fmin_f64' needs target feature gfx90a-insts}}
				}

clang/test/CodeGenOpenCL/unsupported-fmin64-gfx908.cl

This file was added.

				// REQUIRES: amdgpu-registered-target
				// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx908 \
				// RUN: -verify -S -o - %s

				void test_global_min_f64(__global double *addr, double x) {
				double *rtn;
				*rtn = __builtin_amdgcn_global_atomic_fmin_f64(addr, x); // expected-error{{'__builtin_amdgcn_global_atomic_fmin_f64' needs target feature gfx90a-insts}}
				}
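
For illustration, rampitec's suggestion to fold the gfx908 error tests into one file might look roughly like the sketch below. The file name is hypothetical, and the function bodies and expected-error strings are copied unchanged from the per-file tests above; this has not been run through clang and is not necessarily the form that was finally committed.

```opencl
// Sketch only. Hypothetical merged file, e.g. builtins-fp-atomics-gfx908-err.cl
// (the name is an assumption, chosen to resemble the pattern the reviewer cites).
// REQUIRES: amdgpu-registered-target
// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu gfx908 \
// RUN: -verify -S -o - %s

typedef half __attribute__((ext_vector_type(2))) half2;

// Each function below was previously the only test in its own file.
void test_global_add_2f16(__global half2 *addr, half2 x) {
  half2 *rtn;
  *rtn = __builtin_amdgcn_global_atomic_fadd_v2f16(addr, x); // expected-error{{'__builtin_amdgcn_global_atomic_fadd_v2f16' needs target feature gfx90a-insts}}
}

void test_global_add_f32(__global double *addr, double x) {
  double *rtn;
  *rtn = __builtin_amdgcn_global_atomic_fadd_f32(addr, x); // expected-error{{'__builtin_amdgcn_global_atomic_fadd_f32' needs target feature gfx90a-insts}}
}

void test_flat_add_f64(__global double *addr, double x) {
  double *rtn;
  *rtn = __builtin_amdgcn_flat_atomic_fadd_f64(addr, x); // expected-error{{'__builtin_amdgcn_flat_atomic_fadd_f64' needs target feature gfx90a-insts}}
}

void test_global_add_f64(__global double *addr, double x) {
  double *rtn;
  *rtn = __builtin_amdgcn_global_atomic_fadd_f64(addr, x); // expected-error{{'__builtin_amdgcn_global_atomic_fadd_f64' needs target feature gfx90a-insts}}
}

void test_flat_max_f64(__global double *addr, double x) {
  double *rtn;
  *rtn = __builtin_amdgcn_flat_atomic_fmax_f64(addr, x); // expected-error{{'__builtin_amdgcn_flat_atomic_fmax_f64' needs target feature gfx90a-insts}}
}

void test_global_max_f64(__global double *addr, double x) {
  double *rtn;
  *rtn = __builtin_amdgcn_global_atomic_fmax_f64(addr, x); // expected-error{{'__builtin_amdgcn_global_atomic_fmax_f64' needs target feature gfx90a-insts}}
}

void test_flat_min_f64(__global double *addr, double x) {
  double *rtn;
  *rtn = __builtin_amdgcn_flat_atomic_fmin_f64(addr, x); // expected-error{{'__builtin_amdgcn_flat_atomic_fmin_f64' needs target feature gfx90a-insts}}
}

void test_global_min_f64(__global double *addr, double x) {
  double *rtn;
  *rtn = __builtin_amdgcn_global_atomic_fmin_f64(addr, x); // expected-error{{'__builtin_amdgcn_global_atomic_fmin_f64' needs target feature gfx90a-insts}}
}
```

A single file keeps one RUN line and one `-verify` pass for the whole gfx908 case, so a new unsupported builtin only needs one more function rather than another file.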

Diff 364242
