This is an archive of the discontinued LLVM Phabricator instance.

Differential D57527

Do not copy long double and 128-bit fp format from aux target for AMDGPU
ClosedPublic

Authored by yaxunl on Jan 31 2019, 10:47 AM.

Details

Reviewers

rjmccall

Commits

rG277e064bf529: Do not copy long double and 128-bit fp format from aux target for AMDGPU
rL352801: Do not copy long double and 128-bit fp format from aux target for AMDGPU
rC352801: Do not copy long double and 128-bit fp format from aux target for AMDGPU

Summary

rC352620 caused regressions because it copied the floating-point formats from
the aux target.

The floating-point format determines whether extended long double is supported:
long double is x86_fp80 on x86 but IEEE double on amdgcn.
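
As a rough illustration of the format difference described above, the significand width reported by std::numeric_limits can be used to tell which format a compiler uses for long double on the target it is compiling for. This is only a sketch: the helper names describeFormat and hostLongDoubleFormat are hypothetical and are not part of Clang.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical helper: maps a significand width in bits (as reported by
// std::numeric_limits<T>::digits) to a human-readable format name.
std::string describeFormat(int digits) {
  assert(digits > 0 && "significand width must be positive");
  switch (digits) {
  case 53:
    return "IEEE double";
  case 64:
    return "x87 80-bit extended";
  case 113:
    return "IEEE quad";
  default:
    return "other";
  }
}

// Format of long double for whatever target this file is compiled for.
std::string hostLongDoubleFormat() {
  return describeFormat(std::numeric_limits<long double>::digits);
}
```

On x86_64 Linux hostLongDoubleFormat() would typically report the x87 format, whereas a target that maps long double to IEEE double would report 53 significand bits.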

A pull request documents the usage of the long double type in the HIP programming guide:
https://github.com/ROCm-Developer-Tools/HIP/pull/890

Diff Detail

Repository: rC Clang

Event Timeline

yaxunl created this revision. Jan 31 2019, 10:47 AM

Okay, so you silently have an incompatible ABI for anything in the system headers that mentions long double. Do you have any plans to address or work around that, or is the hope that it just doesn't matter?

I feel like this should be a special case for AMDGPU rather than a general behavior with aux targets.

In D57527#1379065, @rjmccall wrote:

Okay, so you silently have an incompatible ABI for anything in the system headers that mentions long double. Do you have any plans to address or work around that, or is the hope that it just doesn't matter?

I feel like this should be a special case for AMDGPU rather than a general behavior with aux targets.

If the host does not pass long double to the device, we will be fine. So we need to diagnose long double kernel arguments. However, I'd like to do that in a separate patch, since we want to fix the regression first.

Since this may be a special case for AMDGPU, I will fix it in AMDGPUTargetInfo.
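
The idea of diagnosing long double kernel arguments can be sketched on the user side with a compile-time check. This is not the compiler diagnostic discussed in the review (that would live in Clang itself); it is a minimal C++17 illustration, and the names isLongDoubleArg, kernelArgsAreSafe, checkKernelSignature, okKernel and badKernel are all hypothetical.

```cpp
#include <type_traits>

// True if an argument type is long double once references and
// cv-qualifiers are stripped. Pointers to long double and structs that
// contain long double members are NOT detected by this simple check.
template <typename T>
constexpr bool isLongDoubleArg = std::is_same_v<std::decay_t<T>, long double>;

// True if none of the argument types is long double.
template <typename... Args>
constexpr bool kernelArgsAreSafe = !(isLongDoubleArg<Args> || ...);

// Inspects a kernel-like function signature at compile time.
template <typename... Args>
constexpr bool checkKernelSignature(void (*)(Args...)) {
  return kernelArgsAreSafe<Args...>;
}

// Stand-in "kernels" used only to exercise the check.
inline void okKernel(int, double *) {}
inline void badKernel(int, long double) {}

static_assert(checkKernelSignature(okKernel));
static_assert(!checkKernelSignature(badKernel));
```

As the later replies point out, arguments are only one way data crosses the host/device boundary, so a check like this covers the most direct case and nothing more.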

In D57527#1379088, @yaxunl wrote:

In D57527#1379065, @rjmccall wrote:

Okay, so you silently have an incompatible ABI for anything in the system headers that mentions long double. Do you have any plans to address or work around that, or is the hope that it just doesn't matter?

I feel like this should be a special case for AMDGPU rather than a general behavior with aux targets.

If the host does not pass long double to the device, we will be fine. So we need to diagnose long double kernel arguments. However, I'd like to do that in a separate patch, since we want to fix the regression first.

Okay. Do you also need to look for global structs and other ways that information might be passed? I suppose at some level you just have to document it as a danger and treat further diagnostics as QoI.

Since this may be a special case for AMDGPU, I will fix it in AMDGPUTargetInfo.

Alright. That should be as easy as saving the old value and restoring it after the overwrite.
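
The save-and-restore approach suggested here can be sketched in isolation. The types below are hypothetical stand-ins made up for illustration, not Clang's real TargetInfo; in particular the wholesale assignment in copyAuxTarget is an assumption about what "the overwrite" does.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not Clang's types.
enum class FPFormat { IEEEdouble, X87Extended, IEEEquad };

struct MockTargetInfo {
  unsigned LongWidth = 32; // example of a property that should be copied
  FPFormat DoubleFormat = FPFormat::IEEEdouble;
  FPFormat LongDoubleFormat = FPFormat::IEEEdouble;
  FPFormat Float128Format = FPFormat::IEEEquad;

  // Models a wholesale copy of type properties from the aux (host) target.
  void copyAuxTarget(const MockTargetInfo &Aux) { *this = Aux; }

  // Save the device's own formats, copy everything, then restore them.
  void setAuxTarget(const MockTargetInfo &Aux) {
    assert(DoubleFormat == Aux.DoubleFormat &&
           "formats shared by both targets must agree");
    const FPFormat SavedLongDouble = LongDoubleFormat;
    const FPFormat SavedFloat128 = Float128Format;
    copyAuxTarget(Aux);
    LongDoubleFormat = SavedLongDouble;
    Float128Format = SavedFloat128;
  }
};
```

With a host whose LongDoubleFormat is X87Extended, calling setAuxTarget on a default-constructed device object picks up the host's LongWidth but leaves the device's long double format as IEEE double.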

Fix in AMDGPUTargetInfo.

Herald added subscribers: t-tye, tpr, dstuttard and 4 others. Jan 31 2019, 12:23 PM

In D57527#1379159, @rjmccall wrote:

In D57527#1379088, @yaxunl wrote:

In D57527#1379065, @rjmccall wrote:

Okay, so you silently have an incompatible ABI for anything in the system headers that mentions long double. Do you have any plans to address or work around that, or is the hope that it just doesn't matter?

I feel like this should be a special case for AMDGPU rather than a general behavior with aux targets.

If the host does not pass long double to the device, we will be fine. So we need to diagnose long double kernel arguments. However, I'd like to do that in a separate patch, since we want to fix the regression first.

Okay. Do you also need to look for global structs and other ways that information might be passed? I suppose at some level you just have to document it as a danger and treat further diagnostics as QoI.

I created a pull request to document long double usage in HIP https://github.com/ROCm-Developer-Tools/HIP/pull/890

Explanatory comment, please. Otherwise LGTM.

In D57527#1379287, @rjmccall wrote:

Explanatory comment, please. Otherwise LGTM.

will do when committing.

This revision was not accepted when it landed; it landed in state Needs Review. Jan 31 2019, 1:57 PM

Closed by commit rC352801: Do not copy long double and 128-bit fp format from aux target for AMDGPU (authored by yaxunl).

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                            Size
lib/Basic/Targets/AMDGPU.cpp    11 lines
test/CodeGenCUDA/types.cu       10 lines

Diff 184604

lib/Basic/Targets/AMDGPU.cpp

   if (hasLDEXPF())
     Builder.defineMacro("__HAS_LDEXPF__");
   if (hasFP64())
     Builder.defineMacro("__HAS_FP64__");
   if (hasFastFMA())
     Builder.defineMacro("FP_FAST_FMA");
 }

 void AMDGPUTargetInfo::setAuxTarget(const TargetInfo *Aux) {
+  assert(HalfFormat == Aux->HalfFormat);
+  assert(FloatFormat == Aux->FloatFormat);
+  assert(DoubleFormat == Aux->DoubleFormat);
+
+  // On x86_64 long double is 80-bit extended precision format, which is
+  // not supported by AMDGPU. 128-bit floating point format is also not
+  // supported by AMDGPU. Therefore keep its own format for these two types.
+  auto SaveLongDoubleFormat = LongDoubleFormat;
+  auto SaveFloat128Format = Float128Format;
   copyAuxTarget(Aux);
+  LongDoubleFormat = SaveLongDoubleFormat;
+  Float128Format = SaveFloat128Format;
 }

test/CodeGenCUDA/types.cu

+// RUN: %clang_cc1 -triple amdgcn -aux-triple x86_64 -fcuda-is-device -emit-llvm %s -o - | FileCheck -check-prefix=DEV %s
+// RUN: %clang_cc1 -triple x86_64 -aux-triple amdgcn -emit-llvm %s -o - | FileCheck -check-prefix=HOST %s
+
+#include "Inputs/cuda.h"
+
+// HOST: @ld_host = global x86_fp80 0xK00000000000000000000
+long double ld_host;
+
+// DEV: @ld_device = addrspace(1) externally_initialized global double 0.000000e+00
+__device__ long double ld_device;