This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Don't crash when trying to printf a non-scalar object.
Closed · Public

Authored by jlebar on Feb 10 2016, 2:40 PM.


Details

Reviewers

majnemer
rnk

Commits

rG9a2c0fbaf56a: [CUDA] Don't crash when trying to printf a non-scalar object.
rC260479: [CUDA] Don't crash when trying to printf a non-scalar object.
rL260479: [CUDA] Don't crash when trying to printf a non-scalar object.

Summary

We can't do the right thing, since there's no right thing to do, but at
least we can not crash the compiler.

Event Timeline

jlebar updated this revision to Diff 47533. Feb 10 2016, 2:40 PM

jlebar retitled this revision to "[CUDA] Don't crash when trying to printf a non-scalar object."

jlebar updated this object.

jlebar added reviewers: majnemer, rnk.

jlebar added subscribers: tra, jhen, cfe-commits.

Erasing an argument would only complicate the problem. I guess for consistency we need to match clang's behavior for regular C++ code; for optimized builds, it just seems to pass a NULL pointer instead.

Yeah, I have no idea what's the right thing to do here. We can always pass a null pointer, that's easy. David, Reid, do you know what is the correct behavior?

In D17103#349182, @jlebar wrote:

Yeah, I have no idea what's the right thing to do here. We can always pass a null pointer, that's easy. David, Reid, do you know what is the correct behavior?

I think we need to diagnose / reject this during semantic analysis (and then put a reasonable assert in the backend).

In D17103#349245, @hfinkel wrote:

I think we need to diagnose / reject this during semantic analysis (and then put a reasonable assert in the backend).

Two things.

a) That doesn't seem to be what we do in regular C++. It will happily let you pass a Struct in with only a warning.
b) At the moment, we don't have the capability to do a proper semantic analysis of this. The issue is, when doing sema checking of host device functions, we don't know whether the function will end up being codegen'ed for device. And the semantics of cuda are that it's OK to do things that are illegal in device mode from host device functions, so long as you never codegen those functions for the device.

We have a plan to address (b) (basically, when doing sema checking, buffer any errors we would emit if we were to codegen for device; then we can emit all those errors right before codegen), but it's a much bigger thing. Until then, we need to do *something* other than crash here, even if we add additional sema checking for plain device fns.
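The plan described here (buffer device-only errors during Sema, then emit them only for functions that actually get code-generated for the device) can be sketched as a small data structure. This is a hypothetical illustration: DeferredDeviceDiags, Defer and Flush are invented names, not clang APIs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: errors that only matter if a function is emitted for
// the device are buffered per function instead of being reported eagerly.
class DeferredDeviceDiags {
 public:
  // Record an error that is fatal only if Fn is code-generated for device.
  void Defer(const std::string &Fn, const std::string &Msg) {
    Pending[Fn].push_back(Msg);
  }

  // Called right before device codegen of Fn: returns (and forgets) every
  // error buffered for it. A host-device function that is never emitted
  // for the device never has its buffered errors reported.
  std::vector<std::string> Flush(const std::string &Fn) {
    auto It = Pending.find(Fn);
    if (It == Pending.end())
      return {};
    std::vector<std::string> Out = std::move(It->second);
    Pending.erase(It);
    return Out;
  }

 private:
  std::map<std::string, std::vector<std::string>> Pending;
};
```

The point of keying by function is that the decision to report is made at codegen time, when it is finally known which host-device functions are emitted for the device.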

Ultimately, Sema should be responsible for rejecting this, correct? In the meantime we can have CodeGen reject this and emit a null value to avoid crashing.

lib/CodeGen/CGCUDABuiltin.cpp
105	I assume this is what's asserting. Probably this code should do something like: if (Args[I].RV.isScalar()) { Arg = Args[I].RV.getScalarVal(); } else { ErrorUnsupported(E, "non-scalar variadic argument"); Arg = CGM.getNullValue(...); }
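rnk's suggested fallback can be modeled outside clang. In this hypothetical sketch, FakeArg stands in for a call argument's RValue and BuildArgValues for the loop that fills the printf_args buffer; a zero plays the role of the null constant, and the Errors vector plays the role of ErrorUnsupported. Unlike erasing the argument, substituting a placeholder keeps every remaining argument in the slot the format string expects.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a call argument: either a scalar value or an
// aggregate that cannot be placed directly into the vprintf buffer.
struct FakeArg {
  bool IsScalar;
  std::uint64_t ScalarVal;  // only meaningful when IsScalar is true
};

// Use scalar values as-is; for a non-scalar argument, record an
// "unsupported" error and substitute a zero placeholder so that argument
// positions still line up with the format string.
std::vector<std::uint64_t> BuildArgValues(const std::vector<FakeArg> &Args,
                                          std::vector<std::string> &Errors) {
  std::vector<std::uint64_t> Values;
  for (const FakeArg &A : Args) {
    if (A.IsScalar) {
      Values.push_back(A.ScalarVal);
    } else {
      Errors.push_back("non-scalar variadic argument");
      Values.push_back(0);  // analogous to a null constant of the slot type
    }
  }
  return Values;
}
```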

In D17103#349254, @jlebar wrote:

Two things.

a) That doesn't seem to be what we do in regular C++. It will happily let you pass a Struct in with only a warning.

Yes, but it also can be legally lowered and does not crash.

b) At the moment, we don't have the capability to do a proper semantic analysis of this. The issue is, when doing sema checking of host device functions, we don't know whether the function will end up being codegen'ed for device. And the semantics of cuda are that it's OK to do things that are illegal in device mode from host device functions, so long as you never codegen those functions for the device.

We have a plan to address (b) (basically, when doing sema checking, buffer any errors we would emit if we were to codegen for device; then we can emit all those errors right before codegen), but it's a much bigger thing. Until then, we need to do *something* other than crash here, even if we add additional sema checking for plain device fns.

Interesting dilemma. In the meantime, you can call CGM.ErrorUnsupported instead of removing arguments.

Ultimately, Sema should be responsible for rejecting this, correct?

I guess this is the part I'm unsure of. If it's legal to pass a struct to printf in regular C++ (seems to be?), I'd guess it should be legal in CUDA, too? I'm just not sure what it's supposed to do (in either case).

In D17103#349274, @jlebar wrote:

I guess this is the part I'm unsure of. If it's legal to pass a struct to printf in regular C++ (seems to be?), I'd guess it should be legal in CUDA, too? I'm just not sure what it's supposed to do (in either case).

Is this because PTX does not have a way to represent va_arg structs?

We do build up something that looks an awful lot like a va_arg struct in this function. (It's a struct with N members, one for each of the varargs.) Exactly what printf expects is not particularly carefully specified in the nvvm documentation.

If an arg to printf is non-scalar, we could pass the whole thing into the struct we build here, but that doesn't seem to be what regular C++ does (it seems to take the first 64 bits of the struct -- I have no idea if this is specified somewhere or just UB).
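To make "a struct with N members, one for each of the varargs" concrete: because the buffer is an ordinary (non-packed) struct, each argument lands at its natural alignment. The sketch below computes those offsets by hand for printf("%d %f %d", 1, 2.0f, 3), where the float is promoted to double by the default argument promotions. It assumes a 4-byte, 4-aligned int and an 8-byte, 8-aligned double; Slot, AlignTo and LayoutArgs are illustrative names, not clang code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size and alignment (in bytes) of one vararg slot.
struct Slot {
  std::size_t Size;
  std::size_t Align;
};

// Round Offset up to a multiple of Align (Align must be a power of two).
constexpr std::size_t AlignTo(std::size_t Offset, std::size_t Align) {
  return (Offset + Align - 1) & ~(Align - 1);
}

// Returns the byte offset of each member of a struct with one member per
// vararg, each placed at its natural alignment; the padded struct size is
// returned through TotalSize.
std::vector<std::size_t> LayoutArgs(const std::vector<Slot> &Slots,
                                    std::size_t &TotalSize) {
  std::vector<std::size_t> Offsets;
  std::size_t Offset = 0;
  std::size_t MaxAlign = 1;
  for (const Slot &S : Slots) {
    Offset = AlignTo(Offset, S.Align);
    Offsets.push_back(Offset);
    Offset += S.Size;
    if (S.Align > MaxAlign)
      MaxAlign = S.Align;
  }
  TotalSize = AlignTo(Offset, MaxAlign);  // tail padding
  return Offsets;
}
```

For int, double, int this gives offsets 0, 8 and 16 and a 24-byte buffer; the 4 bytes after the first int are padding.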

rnk added inline comments. Feb 10 2016, 4:09 PM

lib/CodeGen/CGCUDABuiltin.cpp
105	Under the assumption that the implementation of vprintf expects an old-school va_list byte array, then it's probably easier to implement this behavior rather than try to diagnose it. All you have to do is memcpy the bytes of the aggregate. You also might want to handle _Complex values.
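The memcpy approach rnk describes treats the argument buffer as raw bytes and copies each argument, aggregate or not, in at its natural alignment. A host-side sketch of that idea follows; it assumes vprintf reads arguments at natural alignment, which the NVVM documentation does not spell out, and ArgBuffer and ExampleStruct are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Example aggregate, shaped like the Struct in the test case.
struct ExampleStruct {
  std::int32_t x;
  std::int32_t y;
};

// Minimal "old-school" va_list-style byte buffer: each argument's bytes are
// memcpy'd in at an offset rounded up to the argument's alignment.
class ArgBuffer {
 public:
  template <typename T>
  std::size_t Append(const T &Value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable values can be memcpy'd");
    const std::size_t Offset =
        (Bytes.size() + alignof(T) - 1) / alignof(T) * alignof(T);
    Bytes.resize(Offset + sizeof(T));  // padding bytes are zero-filled
    std::memcpy(Bytes.data() + Offset, &Value, sizeof(T));
    return Offset;
  }

  // Reads back a value of type T stored at Offset.
  template <typename T>
  T Read(std::size_t Offset) const {
    T Value;
    std::memcpy(&Value, Bytes.data() + Offset, sizeof(T));
    return Value;
  }

  std::size_t Size() const { return Bytes.size(); }

 private:
  std::vector<unsigned char> Bytes;
};
```

Appending an int32_t and then an ExampleStruct places the struct at offset 4 and yields a 12-byte buffer; _Complex values would be handled the same way, as a two-element aggregate.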

In D17103#349280, @jlebar wrote:

We do build up something that looks an awful lot like a va_arg struct in this function. (It's a struct with N members, one for each of the varargs.) Exactly what printf expects is not particularly carefully specified in the nvvm documentation.

If an arg to printf is non-scalar, we could pass the whole thing into the struct we build here, but that doesn't seem to be what regular C++ does (it seems to take the first 64 bits of the struct -- I have no idea if this is specified somewhere or just UB).

It takes the first 64 bits of the struct in your example because the struct is only 64 bits in size (two 32-bit ints). If your example were:

$ cat /tmp/p.cpp 
#include <stdio.h>

struct Struct {
  int x;
  int y;
  int z;
  int w;
};

void PrintfNonScalar() {
  Struct S = { 1, 2, 3, 4 };
  printf("%d", S);
}

then you'd get:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [3 x i8] c"%d\00", align 1

; Function Attrs: nounwind uwtable
define void @_Z15PrintfNonScalarv() #0 {
  %1 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i64 8589934593, i64 17179869187)
  ret void
}

and so on. The target ABI code decides how to handle this (by coercing the types to a series of ints in this case).

If you were to do this on ppc64, for example, the target ABI code there does a slightly different thing:

target datalayout = "E-m:e-i64:64-n32:64"
target triple = "powerpc64-unknown-linux-gnu"

@.str = private unnamed_addr constant [3 x i8] c"%d\00", align 1

; Function Attrs: nounwind
define void @_Z15PrintfNonScalarv() #0 {
  %1 = tail call signext i32 (i8*, ...) @printf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), [2 x i64] [i64 4294967298, i64 12884901892])
  ret void
}
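The i64 constants in both IR listings are just the four 32-bit fields of S = {1, 2, 3, 4} packed two at a time. x86_64 and ppc64 differ in byte order (and in passing two separate i64 arguments versus one [2 x i64] array). The arithmetic can be checked with a few constexpr helpers; PackLittleEndian and PackBigEndian are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Little-endian target: the first field occupies the low 32 bits.
constexpr std::uint64_t PackLittleEndian(std::uint32_t First,
                                         std::uint32_t Second) {
  return static_cast<std::uint64_t>(First) |
         (static_cast<std::uint64_t>(Second) << 32);
}

// Big-endian target: the first field occupies the high 32 bits.
constexpr std::uint64_t PackBigEndian(std::uint32_t First,
                                      std::uint32_t Second) {
  return (static_cast<std::uint64_t>(First) << 32) |
         static_cast<std::uint64_t>(Second);
}

// Struct S = {1, 2, 3, 4} coerced to two i64 values.
static_assert(PackLittleEndian(1, 2) == 8589934593ULL, "x86_64, first i64");
static_assert(PackLittleEndian(3, 4) == 17179869187ULL, "x86_64, second i64");
static_assert(PackBigEndian(1, 2) == 4294967298ULL, "ppc64, first i64");
static_assert(PackBigEndian(3, 4) == 12884901892ULL, "ppc64, second i64");
```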

It looks like maybe you just need some more sophisticated code in NVPTXABIInfo in lib/CodeGen/TargetInfo.cpp to produce something the backend will accept?

OK, talked to Reid irl. Since this is just printf, not general varargs handling, the Simplest Thing That Could Possibly Work is to error-unsupported. Once we fix sema as described above, we can move the check there. Will update the patch, thanks everyone.

Error out with CGM.ErrorUnsupported when we receive a non-scalar arg.

lgtm

lib/CodeGen/CGCUDABuiltin.cpp
90	Doesn't printf return int? Maybe return RValue::get(llvm::ConstantInt::get(IntTy, 0))?

This revision is now accepted and ready to land. Feb 10 2016, 5:57 PM

Closed by commit rL260479: [CUDA] Don't crash when trying to printf a non-scalar object. (authored by jlebar). Feb 10 2016, 6:05 PM

This revision was automatically updated to reflect the committed changes.

jlebar marked an inline comment as done.

Revision Contents

Path                             Size
lib/CodeGen/CGCUDABuiltin.cpp    5 lines
test/CodeGenCUDA/printf.cu       9 lines

Diff 47533

lib/CodeGen/CGCUDABuiltin.cpp

@@ ... @@ CodeGenFunction::EmitCUDADevicePrintfCallExpr(const CallExpr *E,
   llvm::LLVMContext &Ctx = CGM.getLLVMContext();
 
   CallArgList Args;
   EmitCallArgs(Args,
                E->getDirectCallee()->getType()->getAs<FunctionProtoType>(),
                E->arguments(), E->getDirectCallee(),
                /* ParamsToSkip = */ 0);
 
+  // We don't know how to emit non-scalar varargs, so just remove them.
+  Args.erase(std::remove_if(Args.begin() + 1, Args.end(),
+                            [](const CallArg &A) { return !A.RV.isScalar(); }),
+             Args.end());
+
   // Construct and fill the args buffer that we'll pass to vprintf.
   llvm::Value *BufferPtr;
   if (Args.size() <= 1) {
     // If there are no args, pass a null pointer to vprintf.
     BufferPtr = llvm::ConstantPointerNull::get(llvm::Type::getInt8PtrTy(Ctx));
   } else {
     llvm::SmallVector<llvm::Type *, 8> ArgTypes;
     for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I)
       ArgTypes.push_back(Args[I].RV.getScalarVal()->getType());
     llvm::Type *AllocaTy = llvm::StructType::create(ArgTypes, "printf_args");
     llvm::Value *Alloca = CreateTempAlloca(AllocaTy);
 
     for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I) {
       llvm::Value *P = Builder.CreateStructGEP(AllocaTy, Alloca, I - 1);
       llvm::Value *Arg = Args[I].RV.getScalarVal();
       Builder.CreateAlignedStore(Arg, P, DL.getPrefTypeAlignment(Arg->getType()));
     }
     BufferPtr = Builder.CreatePointerCast(Alloca, llvm::Type::getInt8PtrTy(Ctx));
   }
 
   // Invoke vprintf and return.
   llvm::Function* VprintfFunc = GetVprintfDeclaration(CGM.getModule());
   return RValue::get(
       Builder.CreateCall(VprintfFunc, {Args[0].RV.getScalarVal(), BufferPtr}));
 }
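The first revision drops non-scalar varargs with the erase-remove idiom, starting at Args.begin() + 1 so the format string is never removed. A standalone model (ArgModel and DropNonScalarVarargs are hypothetical names) shows why the review objected that erasing an argument "would only complicate the problem": for printf("%d %d", S, 5), the 5 shifts into the slot the first %d reads, and the second %d has no argument at all.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a call argument; Scalar is false for aggregates like structs.
struct ArgModel {
  std::string Text;
  bool Scalar;
};

// Same erase-remove pattern as Diff 47533: remove every non-scalar vararg.
// Starts at begin() + 1 so the format string (index 0) is always kept;
// assumes Args is non-empty, as it is for any printf call.
void DropNonScalarVarargs(std::vector<ArgModel> &Args) {
  Args.erase(std::remove_if(Args.begin() + 1, Args.end(),
                            [](const ArgModel &A) { return !A.Scalar; }),
             Args.end());
}
```

After the call, only the format string and 5 remain, so the argument count no longer matches the two conversions in the format string.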

test/CodeGenCUDA/printf.cu

@@ ... @@
 __device__ bool foo();
 __device__ void CheckAllocaIsInEntryBlock() {
   // CHECK: alloca %printf_args
   // CHECK: call {{.*}} @_Z3foov()
   if (foo()) {
     printf("%d", 42);
   }
 }
+
+// Check that we don't crash when asked to printf a non-scalar arg.
+struct Struct {
+  int x;
+  int y;
+};
+__device__ void PrintfNonScalar() {
+  printf("%d", Struct());
+}