Minor printf format correction. NVCC ignore those. Clang will give warning on these if debug is enabled.
The P64 macro casts its argument to an unsigned long long. Use %llu instead of %lld.
Chunk is defined as unsigned long long, shouldn't the modifier be %llu?
Same here, %llu?
For consistency with the rest of libomptarget, use the PRId64 macro to print an int64_t:
"dispatch init (static chunk) : num threads = %d, ub = %" PRId64 ","
Same here, PRId64 for ub, %llu for chunk.
size_t can be printed in a portable way with the %zu modifier. For the nvptx RTL this is not a problem, but in general size_t is not guaranteed to be a long int, so usage of %zu is preferred. This is what we use in the rest of libomptarget. Can you use %zu here as well for consistency?
I did not find any usage of it on the device side. Not confident on using it. We can remove the warning first, and improve the format later?
Ok, will also update the other two lld to llu.
Not sure what is this. I can not find this PRId64 example.
It's not giving a warning because in CUDA long long int happens to be 64 bits long, so you can use %lld to print an int64_t. This is not guaranteed to be true on every platform, therefore the portable way to print an int64_t is using the PRId64 macro. That's what we do in the base library.
My understanding of PRId64 is just a pretty format for lld, which I did not master yet, maybe you can help me to understand those two macros better.
But not sure why %lld can be wrong with long long int, this is standard c/c++ which cuda is based on?
long long int is printed via %lld, but the value you are printing here is not a long long int, it's an int64_t. int64_t is printed via PRId64. int64_t is not necessarily equal to long long int.
PRId64 is not a pretty format for long long int, it's a macro that expands to the correct modifier for int64_t. long long int just happens to be 64 bits long in CUDA, but it's not guaranteed to be 64 bits everywhere. You don't get any compiler warnings here because in CUDA int64_t is typedef-ed as long long int and, consequently, PRId64 expands to lld. That doesn't mean that int64_t can be treated as a long long int everywhere.
E.g. some day we may reuse this code for another GPU architecture (we had a discussion a while ago about creating a GPU-agnostic RTL which then nvptx/amdgcn/some_other_gpu will specialize). If int64_t is defined on that other architecture as long int for instance, then %lld will trigger a warning. If, instead, we use the PRId64 macro, it will expand to the correct ld automatically.
I checked again on the format you suggested. To use it, you need #include <inttypes.h> For example the macro was defined like this in my version of the inttypes.h
# if __WORDSIZE == 64 # define __PRI64_PREFIX "l" # define __PRIPTR_PREFIX "l" # else # define __PRI64_PREFIX "ll" # define __PRIPTR_PREFIX # endif # define PRId8 "d" # define PRId16 "d" # define PRId32 "d" # define PRId64 __PRI64_PREFIX "d"
But the header causes a compilation issue for cuda clang. I am not sure how to solve that problem.