Recent Clang changes expose _bf16 types for SSE2-enabled host compilations and
that makes those types visible furing GPU-side compilation, where it currently
fails with Sema complaining that __bf16 is not supported.
Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's
should pose no issues, correctness-wise as we can always load/store them as 16-bit untyped values.
Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better
support for __bf16 on NVPTX going forward, anyways.
New line here.