Adapted targetDiag for the CUDA and used for the delayed diagnostics in
asm constructs. Works for both host and device compilation sides.
Details
- Reviewers
tra jlebar - Commits
- rG3167b3035e8a: [CUDA]Delayed diagnostics for the asm instructions.
rC354671: [CUDA]Delayed diagnostics for the asm instructions.
rL354671: [CUDA]Delayed diagnostics for the asm instructions.
rG16d3e1a4d20a: [CUDA]Delayed diagnostics for the asm instructions.
rC354593: [CUDA]Delayed diagnostics for the asm instructions.
rL354593: [CUDA]Delayed diagnostics for the asm instructions.
Diff Detail
- Repository
- rL LLVM
Event Timeline
Thank you.
lib/Sema/Sema.cpp | ||
---|---|---|
1494–1496 ↗ | (On Diff #187621) | Nit: i'd use ternary op here or explicit if/else to indicate that CUDADiagIfDeviceCode/CUDADiagIfHostCode are treated the same and that CUDADiagIfHostCode() is *not* a catch-all of some kind. return getLangOpts().CUDAIsDevice ? CUDADiagIfDeviceCode(Loc, DiagID) : CUDADiagIfHostCode(Loc, DiagID) |
There's a new quirk we've ran into after this patch landed. Consider this code:
int foo() { int prev; __asm__ __volatile__("mov %0, 0" : "=a" (prev)::); return prev; }
When we compile for device, asm constraint is not valid for NVPTX, we emit delayed diag and move on. The function is never code-gen'ed so the diag never shows up. So far so good.
Now we add -Werror -Wininitialized and things break -- because we bail out early, prev is left uninitialized and is reported as such.
$ bin/clang++ -c --cuda-gpu-arch=sm_35 asm.cu -nocudainc --cuda-device-only -Wuninitialized -Werror asm.cu:4:10: error: variable 'prev' is uninitialized when used here [-Werror,-Wuninitialized] return prev; ^~~~ asm.cu:2:11: note: initialize the variable 'prev' to silence this warning int prev; ^ = 0 1 error generated when compiling for sm_35.
I think this should also show up in the test case in this patch, too, if you add -Wuninitialized
Hi Artem, I think we can just delay emission of this warning to solve this problem.
I'm not sure we can always tell whether the warning is real or if it's the consequence of failing to parse inline asm.
E.g.:
namespace { __host__ __device__ a() { int prev; __asm__ __volatile__("mov %0, 0" : "=a" (prev)::); return prev; } __host__ __device__ b() { int prev; return prev; } } //namespace
Ideally we should always emit uninitialized diagnostics for b, but never for a in both host and device compilation modes.
I think we may want to propagate assignment from the inline asm statement -- we may not know the meaning of the constraint, but we do know which argument gets used/modified by the asm statement. Perhaps we can construct a fake GCCAsmStmt but bail out before we attempt to validate the asm string.
But it is going to be emitted for b() if b() is really used on the host or on the device. For a() the warning is going to be emitted only if it is really used on device, otherwise it is not.
Instead, we can try to do what we did before: construct GCCAsmStmt object, just like you said. What option do you prefer?
E.g.:
namespace { __host__ __device__ a() { int prev; __asm__ __volatile__("mov %0, 0" : "=a" (prev)::); return prev; } __host__ __device__ b() { int prev; return prev; } } //namespaceIdeally we should always emit uninitialized diagnostics for b, but never for a in both host and device compilation modes.
I think we may want to propagate assignment from the inline asm statement -- we may not know the meaning of the constraint, but we do know which argument gets used/modified by the asm statement. Perhaps we can construct a fake GCCAsmStmt but bail out before we attempt to validate the asm string.But it is going to be emitted for b() if b() is really used on the host or on the device.
Clang also emits the uninitialized warnings for b when it is not used -- as in the example above.
I'm OK with that as b is a valid function on both sides.
Suppressing uninitialized warning in this case would be wrong, IMO -- that would diverge from what clang would do if b didn't have __host__ __device__ attributes.
For a() the warning is going to be emitted only if it is really used on device, otherwise it is not.
Instead, we can try to do what we did before: construct GCCAsmStmt object, just like you said. What option do you prefer?
I think creating a GCCAsmStmt() is the right way to deal with this as it gives compiler the correct (well, as correct as we can at that point) info about the code, as opposed to giving compiler broken pieces and trying to suppress the fallout.