The current pretty-printer emits OpenCL-style memory spaces specifiers: __device , __constant, and __shared.
The correct CUDA memory space specifiers are: __device__, __constant__, and __shared__:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#variable-memory-space-specifiers
Details
Diff Detail
- Repository
- rC Clang
Event Timeline
Can you add tests for this change? We typically have these in Misc by passing -ast-print.
I think it's not so easy to provide such tests for CUDA.
CUDA memory space specifiers are implemented via attributes, e.g. #define __shared__ __attribute__((shared)).
As a result of this, they are pretty-printed via a different code path.
In my example, I call Ctx.getAddrSpaceQualType(QT, LangAS::cuda_shared), which is then pretty-printed via the code above.
Any hints how to provide tests for this one?
Could this be tested using diagnostics that prints the type? Like in test/SemaOpenCL/address-spaces.cl.
Same problem here: The CUDA memory space specifiers are represented via attributes, e.g. CUDASharedAttr and only converted in CodeGen to LangAS::cuda_shared.
We would need a different frontend that annotates LangAS::cuda_shared.
If there's no way to trigger a different spelling that a user would see, why is this change needed?
There are external tools (e.g. hipacc) that generate Clang AST. This AST uses LangAS annotations and emits incorrect memory space specifiers for CUDA when pretty-printed.
That's good to know!
We would need a different frontend that annotates LangAS::cuda_shared.
Do you happen to know why this behaves the way it does? e.g., is the bug that the frontend is annotating incorrectly and forces codegen to work around it, and fixing the frontend to annotate properly lets us remove some workarounds and fixes your issue?
CUDA maps __shared__ internally also to __attribute__((shared)):
#define __annotate__(a) \ __attribute__((a)) #define __location__(a) \ __annotate__(a) ... #define __shared__ \ __location__(shared)
My guess is that Clang does it just the same way and only converts to LangAS::cuda_shared for code generation in GetGlobalVarAddressSpace:
https://clang.llvm.org/doxygen/CodeGenModule_8cpp_source.html#l03305
In contrast, OpenCL uses keywords that are mapped directly to LangAS::opencl_local etc.
I agree with the change itself... but it's quite annoying that such things can't be tested. :(
In D54258#1297191, @Anastasia wrote:
I agree with the change itself... but it's quite annoying that such things can't be tested. :(
Yes, that's a pity :(
Is there anything missing so that this can be merged?
Yeah, it seems like trying to change this in the frontend would also be a fair amount of work.
Is there anything missing so that this can be merged?
No, I think this truly is untestable without major work. Given how trivial this change is, I'm okay with it.
I apologize, I wasn't aware you needed this merged on your behalf. I normally would be happy to do so, but I'm currently traveling. Maybe @Anastasia or someone else can help, otherwise I will merge it when I get back to the office in two weeks.
Merging in two weeks is fine for me.
My assumption was that accepted patches are merged into upstream in a timely manner.
Maybe this is not how it works?
No worries. I have just committed this.
My assumption was that accepted patches are merged into upstream in a timely manner.
Maybe this is not how it works?
No this is a manual process. The author is expected to commit unless he/she doesn't have commit rights. Then it has to be requested explicitly.