- User Since: May 20 2022, 10:29 AM (6 w, 2 d)
Fri, Jul 1
Ping! Any further thoughts on this change? To summarize the opens:
Wed, Jun 8
we need to make sure we still have an intrinsic that produces "scvtf h0, h0, #16" etc.
That particular example is easy, since there's only one instruction form that does the int16_t -> float16_t conversion.
For the cases where there are multiple choices, are you saying there should be a way to force a particular form, even if it is suboptimal? e.g. something to force generation of scvtf s0, w0, #10 instead of scvtf s0, s0, #10, even if the source is already in an FPR?
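For reference, a rough model of what either form computes may help frame the question. The sketch below assumes the default round-to-nearest mode, and `scvtf_fixed_model` is a hypothetical helper name, not an LLVM or ACLE API. It treats the 32-bit input as signed fixed-point with `fbits` fractional bits. As far as the arithmetic goes, `scvtf s0, w0, #10` and `scvtf s0, s0, #10` should produce the same value; they differ only in which register file holds the source.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (illustration only, not an LLVM or ACLE API).
// Models the value produced by "scvtf s0, <src>, #fbits": the signed 32-bit
// input is interpreted as fixed-point with `fbits` fractional bits.
// Assumes the default round-to-nearest rounding mode.
float scvtf_fixed_model(int32_t raw, int fbits) {
    // The int -> float conversion rounds once; scaling by 2^-fbits is exact.
    return std::ldexp(static_cast<float>(raw), -fbits);
}
```

With `fbits = 10`, a raw input of 1024 maps to 1.0 and -512 maps to -0.5, whether the input arrived in a GPR or an FPR.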
Mon, Jun 6
One of the existing tests (do_stuff in CodeGen/AArch64/arm64-fixed-point-scalar-cvt-dagcombine.ll) fails with this change, because we now generate an fmov followed by a gpr->fpr ucvtf, instead of the expected fpr->fpr ucvtf. Of course, other tests show that we now avoid an unnecessary fmov after this change.
Is there a preferred way to decide when to use the gpr->fpr or the fpr->fpr flavor of these instructions?
I see that AArch64AdvSIMDScalarPass converts certain GPR ops into AdvSIMD scalar ops when doing so would save on copies, which sounds similar to the problem I'd like to solve here. Would it be reasonable to leverage that pass for this purpose?