We can't do the right thing, since there's no right thing to do, but at
least we can not crash the compiler.
Details
Diff Detail
Event Timeline
Erasing an argument would only complicate the problem.
I guess for consistency we need to match clang's behavior for regular C++ code.
For optimized builds it just seems to pass NULL pointer instead.
Yeah, I have no idea what's the right thing to do here. We can always pass a null pointer, that's easy. David, Reid, do you know what is the correct behavior?
I think we need to diagnose / reject this during semantic analysis (and then put a reasonable assert in the backend).
Two things.
a) That doesn't seem to be what we do in regular C++. It will happily let you pass a Struct in with only a warning.
b) At the moment, we don't have the capability to do a proper semantic analysis of this. The issue is, when doing sema checking of host device functions, we don't know whether the function will end up being codegen'ed for device. And the semantics of cuda are that it's OK to do things that are illegal in device mode from host device functions, so long as you never codegen those functions for the device.
We have a plan to address (b) (basically, when doing sema checking, buffer any errors we would emit if we were to codegen for device; then we can emit all those errors right before codegen), but it's a much bigger thing. Until then, we need to do *something* other than crash here, even if we add additional sema checking for plain device fns.
Ultimately, Sema should be responsible for rejecting this, correct? In the meantime we can have CodeGen reject this and emit a null value to avoid crashing.
lib/CodeGen/CGCUDABuiltin.cpp | ||
---|---|---|
105 | I assume this is what's asserting. Probably this code should do something like: if (Args[I].RV.isScalar()) { Arg = Args[I].RV.getScalarVal(); } else { ErrorUnsupported(E, "non-scalar variadic argument"); Arg = CGM.getNullValue(...); } |
Yes, but it also can be legally lowered and does not crash.
b) At the moment, we don't have the capability to do a proper semantic analysis of this. The issue is, when doing sema checking of host device functions, we don't know whether the function will end up being codegen'ed for device. And the semantics of cuda are that it's OK to do things that are illegal in device mode from host device functions, so long as you never codegen those functions for the device.
We have a plan to address (b) (basically, when doing sema checking, buffer any errors we would emit if we were to codegen for device; then we can emit all those errors right before codegen), but it's a much bigger thing. Until then, we need to do *something* other than crash here, even if we add additional sema checking for plain device fns.
Interesting dilemma. In the mean time, you can call CGM.ErrorUnsupported instead of removing arguments.
Ultimately, Sema should be responsible for rejecting this, correct?
I guess this is the part I'm unsure of. If it's legal to pass a struct to printf in regular C++ (seems to be?), I'd guess it should be legal in CUDA, too? I'm just not sure what it's supposed to do (in either case).
I guess this is the part I'm unsure of. If it's legal to pass a struct to printf in regular C++ (seems to be?), I'd guess it should be legal in CUDA, too? I'm just not sure what it's supposed to do (in either case).
Is this because PTX does not have a way to represent va_arg structs?
We do build up something that looks an awful lot like a va_arg struct in this function. (It's a struct with N members, one for each of the varargs.) Exactly what printf expects is not particularly carefully specified in the nvvm documentation.
If an arg to printf is non-scalar, we could pass the whole thing into the struct we build here, but that doesn't seem to be what regular C++ does (it seems to take the first 64 bits of the struct -- I have no idea if this is specified somewhere or just UB).
lib/CodeGen/CGCUDABuiltin.cpp | ||
---|---|---|
105 | Under the assumption that the implementation of vprintf expects an old-school va_list byte array, then it's probably easier to implement this behavior rather than try to diagnose it. All you have to do is memcpy the bytes of the aggregate. You also might want to handle _Complex values. |
It takes the first 64 bits of the struct in your example because the struct is only 64 bits in size (two 32-bit ints). If you're example was:
$ cat /tmp/p.cpp #include <stdio.h> struct Struct { int x; int y; int z; int w; }; void PrintfNonScalar() { Struct S = { 1, 2, 3, 4 }; printf("%d", S); }
then you'd get:
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" @.str = private unnamed_addr constant [3 x i8] c"%d\00", align 1 ; Function Attrs: nounwind uwtable define void @_Z15PrintfNonScalarv() #0 { %1 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i64 8589934593, i64 17179869187) ret void }
and so on. The target ABI code decides how to handle this (by coercing the types to a series of ints in this case).
If you were to do this on ppc64, for example, the target ABI code there does a slightly different thing:
target datalayout = "E-m:e-i64:64-n32:64" target triple = "powerpc64-unknown-linux-gnu" @.str = private unnamed_addr constant [3 x i8] c"%d\00", align 1 ; Function Attrs: nounwind define void @_Z15PrintfNonScalarv() #0 { %1 = tail call signext i32 (i8*, ...) @printf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), [2 x i64] [i64 4294967298, i64 12884901892]) ret void }
it looks like maybe you just need some more sophisticated code in NVPTXABIInfo in lib/CodeGen/TargetInfo.cpp to produce something the backend will accept?
OK, talked to Reid irl. Since this is just printf, not general varargs handling, the Simplest Thing That Could Possibly Work is to error-unsupported. Once we fix sema as described above, we can move the check there. Will update the patch, thanks everyone.
lgtm
lib/CodeGen/CGCUDABuiltin.cpp | ||
---|---|---|
90 | Doesn't printf return int? Maybe return RValue::get(llvm::ConstantInt::get(IntTy, 0))? |
Doesn't printf return int? Maybe return RValue::get(llvm::ConstantInt::get(IntTy, 0))?