Generate movprfx for floating point convert zeroing pseudo operations
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for the patch, I think this is looking reasonable to my knowledge. I've spotted the cause of the issues you've mentioned.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | ||
---|---|---|
1677 | I believe this pattern should have the same tweak as discussed below -- SVEAllActive and FCVT_ZPmZ_HtoS => FCVT_ZPmZ_HtoS_UNDEF. | |
1682 | The change you're making is adding the condition that $Pg must be (SVEAllActive). This has the effect of:
To fix the unsigned cases below (line 1705), they also need the SVEAllActive constraint on $Pg and _UNDEF adding. |
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | ||
---|---|---|
1685 | I don't believe converting these patterns like this is ok, some of the intrinsics (such as @llvm.aarch64.sve.scvtf) can get lowered to these nodes without an all active predicate. With this patch I believe a case such as the below will either regress or fail to select. define <vscale x 8 x half> @scvtf_f16_i16(<vscale x 8 x half> %a, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %b) { %out = call <vscale x 8 x half> @llvm.aarch64.sve.scvtf.nxv8f16.nxv8i16(<vscale x 8 x half> %a,<vscale x 8 x i1> %pg, <vscale x 8 x i16> %b) ret <vscale x 8 x half> %out } declare <vscale x 8 x half> @llvm.aarch64.sve.scvtf.nxv8f16.nxv8i16(<vscale x 8 x half>, <vscale x 8 x i1>, <vscale x 8 x i16>) |
Getting there, thanks for the improvements. I've picked up on a few more things, but I think this is close to ready.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | ||
---|---|---|
1684 | This comment appears to have been dropped. However, I note that it appears to be misleading/wrong. To me, it reads 'convert floating point to integer', but the conversion is taking a signed integer as input and producing a floating point. | |
llvm/lib/Target/AArch64/SVEInstrFormats.td | ||
2596 | An ask, arising from the name vtint_extend: please rename this to vtint_wide and imm => imm_mask_wide. This is a narrowing rather than an extension. And the purpose of this type (and the related immediate imm) is to have a mask which selects only the desired bits from the input. Effectively, (as an example case), the patterns are producing a convert of an nxv2i16 to to an nxv2f16, but since there are only two of them, it's necessary to first put them in an nxv2f64 register. The extension operation translates from nxv2i16 to nxv2i64, for sign extension it's an sext, but for unsigned widening it masks out the higher bits with the and instruction. But since we're starting out with an nxv2i16 in the first place, and we know the result will only be used as an nxv2f16, we know those bits being masked won't be used and it can be dropped. | |
2605–2607 | (see later comment for rationale) | |
2627–2629 | Nit. Suggestion to reorder these. Typically we see multiclass { ... instructions ... ; ... patterns ... } or multiclass { ... instruction1 ... patterns1 ...; ... instructions2 ...; ... patterns2 ...; } but here they are interleaved (ins1, pat1, ins2, pat1, pat2). This could be subtly confusing, if someone is expecting the SVE_1_Op_Passthru_Round_Pat to correspond to the _UNDEF definition immediately above. More linebreaks also help here. | |
llvm/test/CodeGen/AArch64/sve-fcvt.ll | ||
1382 | It should be possible to exercise the relevant pattern without an explicit IR and, by analogy with ucvtf_stod_movprfx. The patterns of interest (involving and/sxtw, etc) exist to produce better code in the presence of unpacked types. |
Removed multiclass which matches patterns that can't arise from codegen as they are illegal operands in the ACLE
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | ||
---|---|---|
1685 | These patterns exist to improve the code quality for the unpacked types (e.g. <vscale x 2 x i16>), and those are not expressible by the user (via the intrinsics). The predicate is all true by construction -- via the IR instructions -- so we have realised that the _UNDEF forms are what is wanted here, and these patterns are only needed for the (SVEAllActive) case. |
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | ||
---|---|---|
1677 | Shouldn't we retain the original pattern in case $Pg is not all active? | |
llvm/test/CodeGen/AArch64/sve-fpext-load.ll | ||
50 | I see how this can be an improvement, but it's also a bit of a hammer. I would at least have expected a check somewhere to ask if a MOVPRFX is free/cheap, is it worth adding that? |
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | ||
---|---|---|
1677 | Ah I see this question was already answered by @peterwaller-arm. |
LGTM, as long as Sander agrees.
llvm/test/CodeGen/AArch64/sve-fpext-load.ll | ||
---|---|---|
50 | I think one key thing to note here is that this patch is just bringing these floating point converts in line with all of the other unary operations, for which this change was done some time ago. For now I think it's probably best to get this patch in and then later, if we deem it necessary, add machinery to only do this when appropriate for all appropriate operations, not just this one. |
Sure, I don't want to hold up this change, especially if we've already been going down this road as you said.
llvm/test/CodeGen/AArch64/sve-fpext-load.ll | ||
---|---|---|
50 | Thanks for the clarification. Yes, I'm happy with the machinery being added for all operations in a separate patch. |
I believe this pattern should have the same tweak as discussed below -- SVEAllActive and FCVT_ZPmZ_HtoS => FCVT_ZPmZ_HtoS_UNDEF.