We were still seeing occasional crashes with inline assembly blocks
using fp16/bf16 after my previous patches:
- https://reviews.llvm.org/rGff4027d152d0
- https://reviews.llvm.org/rG7d15212b8c0c
- https://reviews.llvm.org/rG20b2d11896d9
It turns out:
- The original two commits were wrong, and we should have always been choosing the SPR register class, not the HPR register class, so that LLVM's SelectionDAGBuilder correctly did the right splits/joins.
- The splitValueIntoRegisterParts/joinRegisterPartsIntoValue changes from rG20b2d11896d9 are still correct, even though they sometimes result in inefficient codegen of casts between fp16/bf16 and i32/f32 (which is visible in these tests).
This patch fixes crashes in getCopyToParts and when trying to select
(bf16 (bitconvert (fp16 ...))) dags when Neon is enabled.
This patch also adds support for passing fp16/bf16 values using the 'x'
constraint that is LLVM-specific. This should broadly match how we pass
with 't' and 'w', but with a different set of valid S registers.
Should f16 and bf16 be allowed for the x constraint too?