The main difference is that this preserves intermediate rounding steps,
which the other route doesn't. This aligns bfloat16 more with half
floats, which use this path on most targets.
I didn't understand what the difference was between these softening
approaches when I first added bfloat lowerings, would be nice if we only
had one of them.