This is an archive of the discontinued LLVM Phabricator instance.

lib/Sema/Sema.cpp
1494–1496	(On Diff #187621)	Nit: I'd use a ternary operator or an explicit if/else here to indicate that CUDADiagIfDeviceCode and CUDADiagIfHostCode are treated the same way and that CUDADiagIfHostCode() is not a catch-all of some kind: return getLangOpts().CUDAIsDevice ? CUDADiagIfDeviceCode(Loc, DiagID) : CUDADiagIfHostCode(Loc, DiagID);
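
To make the intent of this nit concrete, here is a minimal stand-alone sketch of the dispatch order it asks for. ToyLangOpts, Route and routeDiag are hypothetical names made up for this illustration and are not Clang APIs; only the option names mirror the flags mentioned in the comment.

```cpp
#include <cassert>

// Hypothetical stand-in for the language options consulted by the dispatch.
struct ToyLangOpts {
  bool OpenMP;
  bool OpenMPIsDevice;
  bool CUDA;
  bool CUDAIsDevice;
};

// Hypothetical labels for the four possible diagnostic routes.
enum class Route { OpenMPDevice, CUDADevice, CUDAHost, Immediate };

// The ternary makes the two CUDA routes visibly symmetric: neither of them
// reads as a fallback for "everything else".
Route routeDiag(const ToyLangOpts &Opts) {
  if (Opts.OpenMP && Opts.OpenMPIsDevice)
    return Route::OpenMPDevice;
  if (Opts.CUDA)
    return Opts.CUDAIsDevice ? Route::CUDADevice : Route::CUDAHost;
  return Route::Immediate;
}
```

With this shape, a non-CUDA, non-OpenMP compilation still falls through to the immediate route, which is the only real catch-all.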

This revision is now accepted and ready to land. · Feb 20 2019, 2:06 PM

Closed by commit rL354593: [CUDA]Delayed diagnostics for the asm instructions. (authored by ABataev). · Feb 21 2019, 7:52 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. · Feb 21 2019, 7:52 AM

Herald added a subscriber: llvm-commits.

There's a new quirk we've run into after this patch landed. Consider this code:

int foo() {
  int prev;
  __asm__ __volatile__("mov %0, 0" : "=a" (prev)::);
  return prev;
}

When we compile for the device, the asm constraint is not valid for NVPTX, so we emit a delayed diagnostic and move on. The function is never code-gen'ed, so the diagnostic never shows up. So far so good.

Now we add -Werror -Wuninitialized and things break: because we bail out early, prev is left uninitialized and is reported as such.

$ bin/clang++ -c --cuda-gpu-arch=sm_35 asm.cu -nocudainc --cuda-device-only -Wuninitialized -Werror
asm.cu:4:10: error: variable 'prev' is uninitialized when used here [-Werror,-Wuninitialized]
  return prev;
         ^~~~
asm.cu:2:11: note: initialize the variable 'prev' to silence this warning
  int prev;
          ^
           = 0
1 error generated when compiling for sm_35.

I think this should also show up in the test case in this patch if you add -Wuninitialized.
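
The delayed-diagnostic behaviour described in this comment can be modelled with a small stand-alone sketch. This is a toy model under stated assumptions, not Clang's implementation: DelayedDiags, diag and markEmitted are hypothetical names, and functions are identified by plain strings.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model: a diagnostic raised inside a function is parked until that
// function is known to be emitted for the current compilation side.
struct DelayedDiags {
  std::map<std::string, std::vector<std::string>> Pending; // function -> messages
  std::set<std::string> Emitted;                           // known-emitted functions
  std::vector<std::string> Reported;                       // what the user sees

  // Report right away only if the function is already known to be emitted;
  // otherwise park the message.
  void diag(const std::string &Fn, const std::string &Msg) {
    if (Emitted.count(Fn))
      Reported.push_back(Msg);
    else
      Pending[Fn].push_back(Msg);
  }

  // Called once the function is known to be emitted: flush its parked messages.
  void markEmitted(const std::string &Fn) {
    Emitted.insert(Fn);
    auto It = Pending.find(Fn);
    if (It == Pending.end())
      return;
    for (const std::string &Msg : It->second)
      Reported.push_back(Msg);
    Pending.erase(It);
  }
};
```

In this model the invalid-constraint error for foo stays parked as long as foo is never marked as emitted. The -Wuninitialized warning in the report above is not routed through such a mechanism, which is why it fires immediately.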

Hi Artem, I think we can just delay emission of this warning to solve this problem.

I'm not sure we can always tell whether the warning is real or if it's the consequence of failing to parse inline asm.

E.g.:

namespace {
__host__ __device__ int a() {
  int prev;
  __asm__ __volatile__("mov %0, 0" : "=a" (prev)::);
  return prev;
}

__host__ __device__ int b() {
  int prev;
  return prev;
}

} // namespace

Ideally, we should always emit the uninitialized diagnostic for b but never for a, in both host and device compilation modes.
I think we may want to propagate the assignment from the inline asm statement: we may not know the meaning of the constraint, but we do know which argument gets used or modified by the asm statement. Perhaps we can construct a fake GCCAsmStmt but bail out before we attempt to validate the asm string.
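
As a rough illustration of why keeping the asm statement matters, here is a toy straight-line "use before write" check. It is a sketch under simplifying assumptions (no control flow, variables identified by name) and is unrelated to how Clang's analysis is actually implemented; ToyStmt, uninitializedUses, bodyA and bodyB are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical statement kinds for the toy analysis.
enum class Kind { Decl, Asm, Return };

struct ToyStmt {
  Kind K;
  std::vector<std::string> Defs; // variables written by the statement
  std::vector<std::string> Uses; // variables read by the statement
};

// Returns the variables that are read before any write. When KeepAsm is
// false, asm statements are dropped, modelling a bail-out that never builds
// the statement.
std::vector<std::string> uninitializedUses(const std::vector<ToyStmt> &Body,
                                           bool KeepAsm) {
  std::set<std::string> Initialized;
  std::vector<std::string> Bad;
  for (const ToyStmt &S : Body) {
    if (S.K == Kind::Asm && !KeepAsm)
      continue;
    for (const std::string &U : S.Uses)
      if (!Initialized.count(U))
        Bad.push_back(U);
    for (const std::string &D : S.Defs)
      Initialized.insert(D);
  }
  return Bad;
}

// a(): int prev; asm writes prev; return prev;
std::vector<ToyStmt> bodyA() {
  return {ToyStmt{Kind::Decl, {}, {}},
          ToyStmt{Kind::Asm, std::vector<std::string>{"prev"}, {}},
          ToyStmt{Kind::Return, {}, std::vector<std::string>{"prev"}}};
}

// b(): int prev; return prev;
std::vector<ToyStmt> bodyB() {
  return {ToyStmt{Kind::Decl, {}, {}},
          ToyStmt{Kind::Return, {}, std::vector<std::string>{"prev"}}};
}
```

With the asm statement kept, a() has no uninitialized use while b() still does. Dropping the statement makes a() look exactly like b(), which matches the spurious warning reported earlier in this thread.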

But it is going to be emitted for b() if b() is really used on the host or on the device. For a() the warning is going to be emitted only if it is really used on device, otherwise it is not.
Instead, we can try to do what we did before: construct a GCCAsmStmt object, just like you said. Which option do you prefer?

But it is going to be emitted for b() if b() is really used on the host or on the device.

Clang also emits the uninitialized warnings for b when it is not used -- as in the example above.
I'm OK with that as b is a valid function on both sides.

Suppressing uninitialized warning in this case would be wrong, IMO -- that would diverge from what clang would do if b didn't have __host__ __device__ attributes.

For a() the warning is going to be emitted only if it is really used on device, otherwise it is not.

Instead, we can try to do what we did before: construct GCCAsmStmt object, just like you said. What option do you prefer?

I think creating a GCCAsmStmt() is the right way to deal with this, as it gives the compiler the correct (well, as correct as we can at that point) info about the code, as opposed to giving the compiler broken pieces and trying to suppress the fallout.

Ok, will prepare a fix shortly.

Revision Contents

Path	Size
cfe/trunk/lib/Sema/	6 lines, 2 lines, 2 lines, 9 lines
cfe/trunk/test/SemaCUDA/asm_delayed_diags.cu	118 lines

Diff 187798

cfe/trunk/lib/Sema/Sema.cpp

@@ while (!Worklist.empty()) {
     }

     // C.Callee is now known-emitted, so we no longer need to maintain its list
     // of callees in DeviceCallGraph.
     S.DeviceCallGraph.erase(CGIt);
   }
 }

-Sema::DeviceDiagBuilder Sema::targetDiag(SourceLocation Loc,
-                                         unsigned DiagID) {
+Sema::DeviceDiagBuilder Sema::targetDiag(SourceLocation Loc, unsigned DiagID) {
   if (LangOpts.OpenMP && LangOpts.OpenMPIsDevice)
     return diagIfOpenMPDeviceCode(Loc, DiagID);
+  if (getLangOpts().CUDA)
+    return getLangOpts().CUDAIsDevice ? CUDADiagIfDeviceCode(Loc, DiagID)
+                                      : CUDADiagIfHostCode(Loc, DiagID);
   return DeviceDiagBuilder(DeviceDiagBuilder::K_Immediate, Loc, DiagID,
                            getCurFunctionDecl(), *this);
 }

 /// Looks through the macro-expansion chain for the given
 /// location, looking for a macro expansion with the given name.
 /// If one is found, returns true and sets the location to that
 /// expansion loc.

cfe/trunk/lib/Sema/SemaExprCXX.cpp

@@ Sema::ActOnCXXThrow(Scope *S, SourceLocation OpLoc, Expr *Ex) {

   return BuildCXXThrow(OpLoc, Ex, IsThrownVarInScope);
 }

 ExprResult Sema::BuildCXXThrow(SourceLocation OpLoc, Expr *Ex,
                                bool IsThrownVarInScope) {
   // Don't report an error if 'throw' is used in system headers.
   if (!getLangOpts().CXXExceptions &&
-      !getSourceManager().isInSystemHeader(OpLoc)) {
+      !getSourceManager().isInSystemHeader(OpLoc) && !getLangOpts().CUDA) {
     // Delay error emission for the OpenMP device code.
     targetDiag(OpLoc, diag::err_exceptions_disabled) << "throw";
   }

   // Exceptions aren't allowed in CUDA device code.
   if (getLangOpts().CUDA)
     CUDADiagIfDeviceCode(OpLoc, diag::err_cuda_device_exceptions)
         << "throw" << CurrentCUDATarget();

cfe/trunk/lib/Sema/SemaStmt.cpp

 }

 /// ActOnCXXTryBlock - Takes a try compound-statement and a number of
 /// handlers and creates a try statement from them.
 StmtResult Sema::ActOnCXXTryBlock(SourceLocation TryLoc, Stmt *TryBlock,
                                   ArrayRef<Stmt *> Handlers) {
   // Don't report an error if 'try' is used in system headers.
   if (!getLangOpts().CXXExceptions &&
-      !getSourceManager().isInSystemHeader(TryLoc)) {
+      !getSourceManager().isInSystemHeader(TryLoc) && !getLangOpts().CUDA) {
     // Delay error emission for the OpenMP device code.
     targetDiag(TryLoc, diag::err_exceptions_disabled) << "try";
   }

   // Exceptions aren't allowed in CUDA device code.
   if (getLangOpts().CUDA)
     CUDADiagIfDeviceCode(TryLoc, diag::err_cuda_device_exceptions)
         << "try" << CurrentCUDATarget();

cfe/trunk/lib/Sema/SemaStmtAsm.cpp

@@ StmtResult Sema::ActOnGCCAsmStmt(SourceLocation AsmLoc, bool IsSimple,
   StringLiteral *AsmString = cast<StringLiteral>(asmString);
   StringLiteral **Clobbers = reinterpret_cast<StringLiteral**>(clobbers.data());

   SmallVector<TargetInfo::ConstraintInfo, 4> OutputConstraintInfos;

   // The parser verifies that there is a string literal here.
   assert(AsmString->isAscii());

-  // If we're compiling CUDA file and function attributes indicate that it's not
-  // for this compilation side, skip all the checks.
-  if (!DeclAttrsMatchCUDAMode(getLangOpts(), getCurFunctionDecl())) {
-    GCCAsmStmt *NS = new (Context) GCCAsmStmt(
-        Context, AsmLoc, IsSimple, IsVolatile, NumOutputs, NumInputs, Names,
-        Constraints, Exprs.data(), AsmString, NumClobbers, Clobbers, RParenLoc);
-    return NS;
-  }
-
   for (unsigned i = 0; i != NumOutputs; i++) {
     StringLiteral *Literal = Constraints[i];
     assert(Literal->isAscii());

     StringRef OutputName;
     if (Names[i])
       OutputName = Names[i]->getName();

cfe/trunk/test/SemaCUDA/asm_delayed_diags.cu

				// RUN: %clang_cc1 -fsyntax-only -verify %s -DHOST -triple x86_64-unknown-linux-gnu
				// RUN: %clang_cc1 -fsyntax-only -verify %s -DHOST -DHOST_USED -triple x86_64-unknown-linux-gnu
				// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -verify %s -DDEVICE_NOT_USED -triple nvptx-unknown-cuda
				// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -verify %s -DDEVICE -triple nvptx-unknown-cuda
				// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -verify %s -DDEVICE -DDEVICE_USED -triple nvptx-unknown-cuda

				// REQUIRES: x86-registered-target
				// REQUIRES: nvptx-registered-target

				#if (defined(HOST) && !defined(HOST_USED)) || defined(DEVICE_NOT_USED)
				// expected-no-diagnostics
				#endif

				#include "Inputs/cuda.h"

				static __device__ __host__ void t1(int r) {
				__asm__("PR3908 %[lf] %[xx] %[li] %[r]"
				: [ r ] "+r"(r)
				: [ lf ] "mx"(0), [ li ] "mr"(0), [ xx ] "x"((double)(0)));
				}

				static __device__ __host__ unsigned t2(signed char input) {
				unsigned output;
				__asm__("xyz"
				: "=a"(output)
				: "0"(input));
				return output;
				}

				static __device__ __host__ double t3(double x) {
				register long double result;
				__asm __volatile("frndint"
				: "=t"(result)
				: "0"(x));
				return result;
				}

				static __device__ __host__ unsigned char t4(unsigned char a, unsigned char b) {
				unsigned int la = a;
				unsigned int lb = b;
				unsigned int bigres;
				unsigned char res;
				__asm__("0:\n1:\n"
				: [ bigres ] "=la"(bigres)
				: [ la ] "0"(la), [ lb ] "c"(lb)
				: "edx", "cc");
				res = bigres;
				return res;
				}

				static __device__ __host__ void t5(void) {
				__asm__ __volatile__(
				"finit"
				:
				:
				: "st", "st(1)", "st(2)", "st(3)",
				"st(4)", "st(5)", "st(6)", "st(7)",
				"fpsr", "fpcr");
				}

				typedef long long __m256i __attribute__((__vector_size__(32)));
				static __device__ __host__ void t6(__m256i *p) {
				__asm__ volatile("vmovaps %0, %%ymm0" ::"m"(*(__m256i *)p)
				: "ymm0");
				}

				static __device__ __host__ void t7(__m256i *p) {
				__asm__ volatile("vmovaps %0, %%ymm0" ::"m"(*(__m256i *)p)
				: "r0");
				}

				#ifdef DEVICE
				__device__ int m() {
				t1(0);
				t2(0);
				t3(0);
				t4(0, 0);
				t5();
				t6(0);
				#ifdef DEVICE_USED
				t7(0);
				#endif // DEVICE_USED
				return 0;
				}
				#endif // DEVICE

				#ifdef HOST
				__host__ int main() {
				t1(0);
				t2(0);
				t3(0);
				t4(0, 0);
				t5();
				t6(0);
				#ifdef HOST_USED
				t7(0);
				#endif // HOST_USED
				return 0;
				}
				#endif // HOST

				#if defined(HOST_USED)
				// expected-error@69 {{unknown register name 'r0' in asm}}
				// expected-note@96 {{called by 'main'}}
				#elif defined(DEVICE)
				// expected-error@19 {{invalid input constraint 'mx' in asm}}
				// expected-error@25 {{invalid output constraint '=a' in asm}}
				// expected-error@33 {{invalid output constraint '=t' in asm}}
				// expected-error@44 {{invalid output constraint '=la' in asm}}
				// expected-error@56 {{unknown register name 'st' in asm}}
				// expected-error@64 {{unknown register name 'ymm0' in asm}}
				// expected-note@74 {{called by 'm'}}
				// expected-note@75 {{called by 'm'}}
				// expected-note@76 {{called by 'm'}}
				// expected-note@77 {{called by 'm'}}
				// expected-note@78 {{called by 'm'}}
				// expected-note@79 {{called by 'm'}}
				#endif

[CUDA]Delayed diagnostics for the asm instructions. · Closed · Public
