The asm keyword is a GNU extension for C, so at present -fopenmp -std=c99
and similar fail to compile on nvptx (bug 51344).
Changing to __asm__ or __asm works for OpenMP; all three spellings appear to
work for CUDA. Suggesting __asm__ here because __asm is used by MSVC with a
different syntax, so __asm__ should make for better error diagnostics if the
header is passed to a compiler other than clang.
Should we also change volatile -> __volatile__ here and in other places in the file?
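
For reference, a minimal sketch of the spelling in question, assuming a
GCC-compatible compiler (clang or gcc) invoked with -std=c99. The helper name
barrier_roundtrip is hypothetical and is not taken from the header. The asm
body is left empty so the snippet does not depend on any target's instruction
set.

```c
/* Hypothetical helper, not part of the header under review.
   Build with, for example: clang -std=c99 -c example.c */
static int barrier_roundtrip(int value) {
  /* Empty asm statement with a "memory" clobber: it emits no instructions
     but stops the compiler from reordering memory accesses across it.
     The __asm__ spelling is accepted under -std=c99, whereas plain asm is
     not treated as a keyword in that mode. */
  __asm__ __volatile__("" : : : "memory");
  return value;
}
```

Note that volatile itself is a standard C keyword, so the __volatile__
spelling would be a consistency choice rather than something -std=c99
requires.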