This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn
ClosedPublic

Authored by kzhuravl on Feb 16 2018, 2:46 PM.


Details

Reviewers

t-tye
b-sumner
scchan

Summary

- Expand GK_*s (e.g. GFX6 -> GFX600, GFX601, etc.)
  - This allows us to choose features correctly in some cases (for example, fast fmaf is available on gfx600, but not gfx601)
- Move HasFMAF, HasFP64, HasLDEXPF to GPUInfo tables
- Add HasFastFMA, HasFastFMAF to GPUInfo tables
- Add missing tests
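
To make the table idea concrete, here is a minimal, self-contained sketch of a per-GPU feature table with a name lookup. It is not the code from this patch: the type and function names only loosely follow the review, the two rows are illustrative, and the only value taken directly from the summary is that gfx600 has fast fmaf while gfx601 does not. The other columns are assumed to be true for both rows, in line with the old constructor (which set them unconditionally for amdgcn) and the reviewers' remarks.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical, simplified stand-in for the per-GPU table; not the patch's code.
enum GPUKind { GK_NONE, GK_GFX600, GK_GFX601 };

struct GPUInfo {
  const char *Name;
  GPUKind Kind;
  // FP32 group.
  bool HasFMAF;
  bool HasFastFMAF;
  bool HasLDEXPF;
  // FP64 group.
  bool HasFP64;
  bool HasFastFMA;
};

static const GPUInfo InvalidGPU = {"", GK_NONE, false, false, false, false, false};

static const GPUInfo AMDGCNNames[] = {
    // Name     Kind       HasFMAF HasFastFMAF HasLDEXPF HasFP64 HasFastFMA
    {"gfx600", GK_GFX600, true,   true,       true,     true,   true},
    {"gfx601", GK_GFX601, true,   false,      true,     true,   true},
};

// Looks up a row by processor name; unknown names map to InvalidGPU.
GPUInfo parseAMDGCNName(const std::string &Name) {
  const GPUInfo *Result =
      std::find_if(std::begin(AMDGCNNames), std::end(AMDGCNNames),
                   [&Name](const GPUInfo &GPU) { return Name == GPU.Name; });
  if (Result == std::end(AMDGCNNames))
    return InvalidGPU;
  return *Result;
}
```

Keeping one row per expanded kind is what lets two processors from the same generation differ in a single column.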

Diff Detail

Event Timeline

kzhuravl created this revision. Feb 16 2018, 2:46 PM

Herald added subscribers: tpr, dstuttard, yaxunl and 2 others. Feb 16 2018, 2:46 PM

t-tye added a subscriber: b-sumner. Feb 16 2018, 2:53 PM

t-tye added inline comments.

lib/Basic/Targets/AMDGPU.cpp
345–348	Do all amdgcn targets have fast FMA? @b-sumner can you clarify?

t-tye added a reviewer: b-sumner. Feb 16 2018, 2:54 PM

b-sumner added inline comments. Feb 16 2018, 3:18 PM

lib/Basic/Targets/AMDGPU.cpp
345–348	No. All targets that support double precision should report FAST_FMA. Only targets with full rate v_fma_f32 should report FAST_FMAF.
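
The rule in the comment above can be sketched as a small standalone function. This is only an illustration under stated assumptions: `FPFeatures` and `fastFMAMacros` are hypothetical names, and the real change emits macros through clang's macro builder rather than returning a list.

```cpp
#include <string>
#include <vector>

// Hypothetical feature flags; the field names mirror those discussed in the review.
struct FPFeatures {
  bool HasFP64;     // target supports double precision
  bool HasFastFMAF; // target has full rate v_fma_f32
};

// Returns the fast-FMA macro names a target would report under the rule above:
// FP_FAST_FMA follows double-precision support, FP_FAST_FMAF follows full rate
// single-precision FMA.
std::vector<std::string> fastFMAMacros(const FPFeatures &F) {
  std::vector<std::string> Macros;
  if (F.HasFP64)
    Macros.push_back("FP_FAST_FMA");
  if (F.HasFastFMAF)
    Macros.push_back("FP_FAST_FMAF");
  return Macros;
}
```

Under this rule the two macros are independent: a target with double precision but without full rate v_fma_f32 would report only FP_FAST_FMA.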

t-tye added inline comments. Feb 16 2018, 3:24 PM

lib/Basic/Targets/AMDGPU.cpp
345–348	It is unfortunate that clang does not have access to the processor features defined in the td files, which give the settings for each target.

t-tye added a reviewer: scchan. Feb 16 2018, 3:27 PM

t-tye requested changes to this revision. Feb 16 2018, 4:31 PM

t-tye added inline comments.

lib/Basic/Targets/AMDGPU.cpp
345–348	Now that the compiler knows the target it seems the clang options that specify fast_fma et al should be removed and the runtimes changed to not set them. The implementation of when fast fma[f] is present should match the amdgcn td files which have all gfx9 and some pre-gfx9 targets supporting fast fmaf (the ones that have full rate double precision).

This revision now requires changes to proceed. Feb 16 2018, 4:31 PM

Address review feedback

t-tye requested changes to this revision. Feb 23 2018, 3:03 PM

t-tye added inline comments.

lib/Basic/Targets/AMDGPU.cpp
159	What does this mean? Has it now been addressed by this patch?
256	To be consistent, should this be `if (isAMDGCN(getTriple()))`? Similar comment elsewhere.
282	This was incorrect in the old code. Only full rate FP64 gcn targets have fast FMAF.
lib/Basic/Targets/AMDGPU.h
88–91 ↗	(On Diff #135437)	Would it be better to position these at the beginning/end of the respective enumerations so it is more obvious that they must be updated when adding a new target?
98–102 ↗	(On Diff #135437)	Suggest reordering into logical groups of FP32 and FP64: bool HasFMAF; bool HasFastFMAF; bool HasLDEXPF; bool HasFP64; bool HasFastFMA;
108 ↗	(On Diff #135437)	Suggest adding a comment here which is a header for the columns to make it easier to check if the settings are right. For example: // Name Canonical Kind HasFMAF HasFP64 HasLDEXPF HasFastFMA HasFastFMAF
136 ↗	(On Diff #135437)	Same comment as above. Also, I think the fast_fma and fast_fmaf columns are reversed. All gcn has fast_fma, it is fast_fmaf that varies.
147 ↗	(On Diff #135437)	gfx702 should have true for fast fmaf. Does the TD file need correcting too?
325 ↗	(On Diff #135437)	I found the original operand order easier to read:-)

This revision now requires changes to proceed. Feb 23 2018, 3:03 PM

b-sumner added inline comments. Feb 23 2018, 3:10 PM

lib/Basic/Targets/AMDGPU.cpp
339–341	I'm not sure why this is here. No languages we support have this AFAIK. We should probably add a comment that this is deprecated and remove it in a year or so.

kzhuravl added inline comments. Feb 23 2018, 3:39 PM

lib/Basic/Targets/AMDGPU.cpp
159	This class has a member called GPU. We are using the CPU that is passed as an argument. "Has it now been addressed by this patch?" No.
lib/Basic/Targets/AMDGPU.h
136 ↗	(On Diff #135437)	fast_fmaf is the last column. I think those are in the correct order.
147 ↗	(On Diff #135437)	Not according to the TD files in our BE: https://github.com/llvm-mirror/llvm/blob/master/lib/Target/AMDGPU/AMDGPU.td#L545
325 ↗	(On Diff #135437)	I was trying to match the style above. But I can change it back.

t-tye added inline comments. Feb 23 2018, 4:58 PM

lib/Basic/Targets/AMDGPU.h
136 ↗	(On Diff #135437)	Agreed.
147 ↗	(On Diff #135437)	I believe the TD file is incorrect. gfx701 and 702 should both have fast fmaf.
325 ↗	(On Diff #135437)	Leave it in your new order which keeps it consistent as you point out. (For me they are all backwards:-) )

nhaehnle removed a subscriber: nhaehnle. Feb 25 2018, 7:36 AM

Address review feedback.

kzhuravl mentioned this in D43790: AMDGPU: Add fast fmaf feature to gfx702. Feb 26 2018, 3:11 PM

b-sumner added inline comments. Feb 26 2018, 3:57 PM

lib/Basic/Targets/AMDGPU.h
103 ↗	(On Diff #135997)	I guess the HasFastFMA is for simplicity? It is a synonym for HasFP64.

LGTM except for comment on deprecated macros.

lib/Basic/Targets/AMDGPU.cpp
338–339	@b-sumner which ones were you recommending deprecating? I would think FP64 and FMAF would be the ones that need to be kept. I thought it was HAS_FAST_FMA that is not specified by OpenCL.

This revision is now accepted and ready to land. Feb 26 2018, 9:58 PM

b-sumner added inline comments. Feb 27 2018, 5:41 AM

lib/Basic/Targets/AMDGPU.cpp
338–339	Sorry, I was referring to all of the __HAS_* macros.

rL326254

Revision Contents

Path	Size
lib/Basic/Targets/AMDGPU.cpp	5 lines
test/Driver/amdgpu-macros.cl	3 lines

Diff 134730

lib/Basic/Targets/AMDGPU.cpp

[...]
 ArrayRef<const char *> AMDGPUTargetInfo::getGCCRegNames() const {
   return llvm::makeArrayRef(GCCRegNames);
 }

 bool AMDGPUTargetInfo::initFeatureMap(
     llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags, StringRef CPU,
     const std::vector<std::string> &FeatureVec) const {

   // XXX - What does the member GPU mean if device name string passed here?
	t-tye [Done]: What does this mean? Has it now been addressed by this patch?
	kzhuravl [Done]: This class has a member called GPU. We are using the CPU that is passed as an argument. "Has it now been addressed by this patch?" No.
   if (getTriple().getArch() == llvm::Triple::amdgcn) {
     if (CPU.empty())
       CPU = "tahiti";

     switch (parseAMDGCNName(CPU).Kind) {
     case GK_GFX6:
     case GK_GFX7:
       break;
[...] AMDGPUTargetInfo::GPUInfo AMDGPUTargetInfo::parseAMDGCNName(StringRef Name) {

   if (Result == std::end(AMDGCNNames))
     return InvalidGPU;
   return *Result;
 }

 void AMDGPUTargetInfo::fillValidCPUList(
     SmallVectorImpl<StringRef> &Values) const {
   if (getTriple().getArch() == llvm::Triple::amdgcn)
	t-tye [Done]: To be consistent, should this be `if (isAMDGCN(getTriple()))`? Similar comment elsewhere.
     llvm::for_each(AMDGCNNames, [&Values](const GPUInfo &GPU) {
                    Values.emplace_back(GPU.Name);});
   else
     llvm::for_each(R600Names, [&Values](const GPUInfo &GPU) {
                    Values.emplace_back(GPU.Name);});
 }

 void AMDGPUTargetInfo::setAddressSpaceMap(bool DefaultIsPrivate) {
[... 9 lines ...]
 AMDGPUTargetInfo::AMDGPUTargetInfo(const llvm::Triple &Triple,
                                    const TargetOptions &Opts)
     : TargetInfo(Triple),
       GPU(isAMDGCN(Triple) ? AMDGCNNames[0] : parseR600Name(Opts.CPU)),
       hasFP64(false), hasFMAF(false), hasLDEXPF(false),
       AS(isGenericZero(Triple)) {
   if (getTriple().getArch() == llvm::Triple::amdgcn) {
     hasFP64 = true;
     hasFMAF = true;
	t-tye [Done]: This was incorrect in the old code. Only full rate FP64 gcn targets have fast FMAF.
     hasLDEXPF = true;
   }
   if (getTriple().getArch() == llvm::Triple::r600) {
     if (GPU.Kind == GK_EVERGREEN_DOUBLE_OPS || GPU.Kind == GK_CAYMAN) {
       hasFMAF = true;
     }
   }
   auto IsGenericZero = isGenericZero(Triple);
[... 39 lines ...] void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts,
   if (getTriple().getArch() == llvm::Triple::amdgcn)
     Builder.defineMacro("__AMDGCN__");
   else
     Builder.defineMacro("__R600__");

   if (GPU.Kind != GK_NONE)
     Builder.defineMacro(Twine("__") + Twine(GPU.CanonicalName) + Twine("__"));

   if (hasFMAF)
     Builder.defineMacro("__HAS_FMAF__");
	t-tye [Not Done]: @b-sumner which ones were you recommending deprecating? I would think FP64 and FMAF would be the ones that need to be kept. I thought it was HAS_FAST_FMA that is not specified by OpenCL.
	b-sumner [Not Done]: Sorry, I was referring to all of the __HAS_* macros.
   if (hasLDEXPF)
     Builder.defineMacro("__HAS_LDEXPF__");
	b-sumner [Done]: I'm not sure why this is here. No languages we support have this AFAIK. We should probably add a comment that this is deprecated and remove it in a year or so.
   if (hasFP64)
     Builder.defineMacro("__HAS_FP64__");

+  if (getTriple().getArch() == llvm::Triple::amdgcn) {
+    Builder.defineMacro("FP_FAST_FMA");
+    Builder.defineMacro("FP_FAST_FMAF");
+  }
	t-tye [Done]: Do all amdgcn targets have fast FMA? @b-sumner can you clarify?
	b-sumner [Done]: No. All targets that support double precision should report FAST_FMA. Only targets with full rate v_fma_f32 should report FAST_FMAF.
	t-tye [Done]: It is unfortunate that clang does not have access to the processor features defined in the td files, which give the settings for each target.
	t-tye [Done]: Now that the compiler knows the target it seems the clang options that specify fast_fma et al should be removed and the runtimes changed to not set them. The implementation of when fast fma[f] is present should match the amdgcn td files which have all gfx9 and some pre-gfx9 targets supporting fast fmaf (the ones that have full rate double precision).
 }

test/Driver/amdgpu-macros.cl

[...]
 // RUN: %clang -E -dM -target amdgcn -mcpu=stoney %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,GFX810 %s
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx900 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,GFX900 %s
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx902 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,GFX902 %s

 // ARCH-GCN-DAG: #define __AMD__ 1
 // ARCH-GCN-DAG: #define __AMDGPU__ 1
 // ARCH-GCN-DAG: #define __AMDGCN__ 1

+// ARCH-GCN-DAG: #define FP_FAST_FMA 1
+// ARCH-GCN-DAG: #define FP_FAST_FMAF 1

 // GFX600: #define __gfx600__ 1
 // GFX601: #define __gfx601__ 1
 // GFX700: #define __gfx700__ 1
 // GFX701: #define __gfx701__ 1
 // GFX702: #define __gfx702__ 1
 // GFX703: #define __gfx703__ 1
 // GFX704: #define __gfx704__ 1
 // GFX801: #define __gfx801__ 1
 // GFX802: #define __gfx802__ 1
 // GFX803: #define __gfx803__ 1
 // GFX810: #define __gfx810__ 1
 // GFX900: #define __gfx900__ 1
 // GFX902: #define __gfx902__ 1
