e0fb01e97b6b7d2fe66b17b36eeb98aa78c6e3bb broke some of our HIP projects: builds failed because "__bf16" wasn't allowed on the target. In those builds, the main target is AMDGPU (which doesn't have bf16) and the aux target is X86 (which does).
This implements a fix similar to D57369, but for bf16: it prevents Clang from diagnosing uses of bf16 when compiling heterogeneous applications.
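
To make the intended behavior concrete, here is a rough standalone model of the decision, not the actual Sema code. Every name in it (CompileModel, mainTargetHasBF16, isHeterogeneous, shouldDiagnoseBF16) is hypothetical and exists only for illustration. It assumes the rule is simply "diagnose only when the main target lacks bf16 and this is not a heterogeneous compile".

```cpp
// Hypothetical model of the check; none of these names are Clang APIs.
struct CompileModel {
  bool mainTargetHasBF16; // does the main target (e.g. AMDGPU) support bf16?
  bool isHeterogeneous;   // is there an aux target (e.g. X86 host) involved?
};

// Assumed rule: stay quiet in heterogeneous compiles, since shared headers
// may use bf16 in code that is only meaningful for the other target.
constexpr bool shouldDiagnoseBF16(const CompileModel &c) {
  return !c.mainTargetHasBF16 && !c.isHeterogeneous;
}

// The failing HIP case from the summary: main target without bf16,
// heterogeneous compile. With the relaxed rule there is no diagnostic.
static_assert(!shouldDiagnoseBF16({false, true}), "HIP device compile");
// A plain single-target compile without bf16 support is still diagnosed.
static_assert(shouldDiagnoseBF16({false, false}), "single-target compile");
```

Under this model the only behavior change is in the first case; single-target builds keep the existing diagnostic.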
Is there a particular reason we're singling out OpenMP here and not HIP/CUDA?