A useful step on the way to supporting full bfloat arithmetic is allowing code that immediately extends a bfloat before doing anything non-trivial, and truncates it back before storage.
This patch implements that by making sure we don't try any extload/truncstores and adding patterns for the relevant conversions.
I also had to ban GlobalISel from dealing with bfloat here because its type system only has s16, so it treats bfloat conversions as normal half conversions.
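
For reference, here is a minimal sketch of what the "extend, compute, truncate" shape means at the bit level. It is not the patch's lowering: the helper names (bf16_to_float, float_to_bf16, bf16_add) are hypothetical, and round-to-nearest-even on truncation is an assumption about the rounding mode. The only fact relied on is that a bfloat is the top 16 bits of an IEEE-754 binary32 value.

```cpp
#include <cstdint>
#include <cstring>

// Extend: widening bfloat to float is exact, just a 16-bit left shift.
float bf16_to_float(std::uint16_t b) {
    std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Truncate: narrow float to bfloat, assuming round-to-nearest-even.
std::uint16_t float_to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    // NaN: keep the sign and top payload bits, and force a quiet NaN.
    if ((bits & 0x7fffffffu) > 0x7f800000u)
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    // Add 0x7fff plus the lowest kept bit so that ties round to even.
    std::uint32_t rounding = 0x7fffu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + rounding) >> 16);
}

// The code shape described above: extend both operands,
// do the arithmetic in float, then truncate before storing.
std::uint16_t bf16_add(std::uint16_t a, std::uint16_t b) {
    return float_to_bf16(bf16_to_float(a) + bf16_to_float(b));
}
```

Doing the arithmetic in float means only the two conversions need bfloat-specific handling, which matches the scope described above.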