In current Clang, on the OpenMP NVPTX toolchain, calls to math functions resolve to the host's math functions. For example, a call to sqrt() in a target region results in an LLVM IR call that looks like this:
call double @sqrt(double %1)
This patch makes math function calls in OpenMP NVPTX target regions resolve to the same math functions that CUDA code calls. For example, for sqrt we get:
call double @llvm.nvvm.sqrt.rn.d(double %1)
This is necessary for both correctness and performance.
This relies on an implementation detail of the particular variant of the header file that you assume all compilations will include. At best, this is a workaround for the real problem (attempting to use headers from machine X while targeting Y).
D50845 is dealing with the issue of headers for target code. Hopefully, they'll find a way to provide device-specific headers, so you don't rely on host headers being parseable during device-side compilation.