Recent Clang changes expose __bf16 types for SSE2-enabled host compilations,
which makes those types visible during GPU-side compilation, where compilation
currently fails with Sema complaining that __bf16 is not supported.
Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's
enabled on the host should pose no correctness issues, as we can always
load/store such values as 16-bit untyped values.
Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better
support for __bf16 on NVPTX going forward anyway.
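
To illustrate the storage-only argument above, here is a minimal host-side C++ sketch. It is not the code from this patch: `Bf16Bits`, `copy_bf16`, `float_to_bf16_truncate`, and `bf16_to_float` are hypothetical names made up for the example, and it assumes `float` is a 32-bit IEEE-754 type. The point is only that moving such a value needs nothing beyond a 16-bit load and a 16-bit store, and any arithmetic happens after widening to another type.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a storage-only bf16 value: 16 raw bits, no arithmetic.
struct Bf16Bits {
  std::uint16_t bits;
};
static_assert(sizeof(Bf16Bits) == 2, "a bf16 value occupies exactly 16 bits");

// Moving a value around only needs an untyped 16-bit load and store;
// the bits are never interpreted as a number here.
inline void copy_bf16(const Bf16Bits *src, Bf16Bits *dst) {
  std::uint16_t raw;
  std::memcpy(&raw, src, sizeof(raw)); // 16-bit load
  std::memcpy(dst, &raw, sizeof(raw)); // 16-bit store
}

// Illustrative conversion: keep the upper 16 bits of the float encoding.
// This truncates rather than rounding to nearest; assumes 32-bit IEEE-754 float.
inline Bf16Bits float_to_bf16_truncate(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return Bf16Bits{static_cast<std::uint16_t>(u >> 16)};
}

// Widen back to float by placing the 16 bits in the upper half of the encoding.
inline float bf16_to_float(Bf16Bits b) {
  std::uint32_t u = static_cast<std::uint32_t>(b.bits) << 16;
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}
```

Any computation in this sketch would go through `bf16_to_float` first, which mirrors why a target without native bf16 arithmetic can still hold and move such values correctly.
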
Nice, this fixes the reg type for v2f16 from Float to Untyped. Looks like no test picked up on that before.