This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Remove false register dependency for unary FP convert operations
ClosedPublic

Authored by MattDevereau on Jan 31 2022, 8:34 AM.

Diff Detail

Event Timeline

MattDevereau created this revision.Jan 31 2022, 8:34 AM
MattDevereau requested review of this revision.Jan 31 2022, 8:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2022, 8:34 AM

Note that the tests scvtf_stod_movprfx and ucvtf_stod_movprfx aren't working as expected.

efriedma retitled this revision from [AArch64][SVE] Remove fake register dependency for unary FP convert operations to [AArch64][SVE] Remove false register dependency for unary FP convert operations.Jan 31 2022, 3:12 PM

Thanks for the patch; this is looking reasonable as far as I can tell. I've spotted the cause of the issues you mentioned.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1686–1688

I believe this pattern should have the same tweak as discussed below -- SVEAllActive and FCVT_ZPmZ_HtoS => FCVT_ZPmZ_HtoS_UNDEF.

1692–1693

The change you're making is adding the condition that $Pg must be (SVEAllActive).

This has the effect of:

  • making these patterns more specific than the ones you are introducing in sve_fp_2op_p_zd{,r}, so 👍: they still match where they are needed, for example in llvm/test/CodeGen/AArch64/sve-fcvt.ll, to avoid introducing a sign extension.
  • if I'm not mistaken, (SVEAllActive):$Pg is the case where the _UNDEF pseudos are applicable. UNDEF here means that, since the predicate is all active, it's fine if the inactive lanes are undef (they are unused), so pseudo expansion can do its thing and materialize a movprfx.
    • Therefore, for these patterns you have modified, you want to introduce the UNDEF form: FCVT_ZPmZ_StoH => FCVT_ZPmZ_StoH_UNDEF. This fixes the scvtf_stod_movprfx case.
  • One thing to beware of is that you're introducing the constraint that $Pg is all true. This implies there are cases where these patterns matched before but won't any more. I believe this is OK, since floating-point conversion is necessarily unpredicated in the LangRef instructions, which causes the predicate to be all active. But we should check whether it's possible to specify a predicate through the aarch64.sve.* intrinsics (or the ACLE).
    • Please can you and the other reviewers consider for a moment whether this matters. My initial pass says 'no', but more time or knowledge may indicate 'yes'.

To fix the unsigned cases below (line 1705), they also need the SVEAllActive constraint on $Pg and the _UNDEF suffix added.
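A minimal sketch of the change being discussed (hypothetical node and operand names; the exact patterns in AArch64SVEInstrInfo.td may differ in detail):

```tablegen
// Hypothetical sketch, not the literal patch: constrain the predicate to
// (SVEAllActive) and select the _UNDEF pseudo, so the pseudo expander is
// free to treat inactive lanes as undef and materialize a movprfx.
def : Pat<(nxv4f32 (AArch64fcvte_mt (nxv4i1 (SVEAllActive):$Pg),
                                    nxv8f16:$Zs, nxv4f32:$Zd)),
          (FCVT_ZPmZ_HtoS_UNDEF $Zd, $Pg, $Zs)>;
```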

bsmith added inline comments.Feb 1 2022, 6:56 AM
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1695–1696

I don't believe converting these patterns like this is OK; some of the intrinsics (such as @llvm.aarch64.sve.scvtf) can be lowered to these nodes without an all-active predicate. With this patch I believe a case such as the one below will either regress or fail to select.

define <vscale x 8 x half> @scvtf_f16_i16(<vscale x 8 x half> %a, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %b) {
  %out = call <vscale x 8 x half> @llvm.aarch64.sve.scvtf.nxv8f16.nxv8i16(<vscale x 8 x half> %a, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %b)
  ret <vscale x 8 x half> %out
}

declare <vscale x 8 x half> @llvm.aarch64.sve.scvtf.nxv8f16.nxv8i16(<vscale x 8 x half>, <vscale x 8 x i1>, <vscale x 8 x i16>)

Added some missing tests and patterns for scvtf and ucvtf with non-ptrue predicates.

Getting there, thanks for the improvements. I've picked up on a few more things, but I think this is close to ready.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1694

This comment appears to have been dropped.

However, I note that it appears to be misleading/wrong. To me it reads 'convert floating point to integer', but the conversion takes a signed integer as input and produces a floating-point value.

llvm/lib/Target/AArch64/SVEInstrFormats.td
2618

An ask arising from the name vtint_extend: please rename it to vtint_wide, and imm => imm_mask_wide.

This is a narrowing rather than an extension, and the purpose of this type (and the related immediate imm) is to provide a mask which selects only the desired bits from the input. As an example, the patterns produce a convert of an nxv2i16 to an nxv2f16, but since there are only two elements, it's necessary to first put them in an nxv2i64 register. The extension operation translates from nxv2i16 to nxv2i64: for sign extension it's a sext, but for unsigned widening it masks out the higher bits with an and instruction. Since we start with an nxv2i16 in the first place, and we know the result will only be used as an nxv2f16, those masked-out bits won't be used and the mask can be dropped.
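To make the unpacked case concrete, a hypothetical test (not taken from the patch) that exercises this path: the i16 elements of the input sit in 64-bit containers, so codegen widens them (sext for signed, an and-mask for unsigned) before the 64-bit-element convert, and the mask is redundant when only the low half-precision result is consumed.

```llvm
; Hypothetical illustration: an unpacked <vscale x 2 x i16> conversion.
; Only the low 16 bits of each 64-bit lane are meaningful in the
; nxv2f16 result, so the unsigned masking `and` can be dropped.
define <vscale x 2 x half> @ucvtf_unpacked(<vscale x 2 x i16> %a) {
  %out = uitofp <vscale x 2 x i16> %a to <vscale x 2 x half>
  ret <vscale x 2 x half> %out
}
```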

2626–2628

(see later comment for rationale)

2649–2651

Nit. Suggestion to reorder these. Typically we see multiclass { ...instructions...; ...patterns... } or multiclass { ...instructions1...; ...patterns1...; ...instructions2...; ...patterns2...; }, but here they are interleaved (ins1, pat1, ins2, pat1, pat2). This could be subtly confusing if someone expects the SVE_1_Op_Passthru_Round_Pat to correspond to the _UNDEF definition immediately above. More line breaks would also help here.

llvm/test/CodeGen/AArch64/sve-fcvt.ll
1382

It should be possible to exercise the relevant pattern without an explicit IR `and`, by analogy with ucvtf_stod_movprfx. The patterns of interest (involving and/sxtw, etc.) exist to produce better code in the presence of unpacked types.

Removed a multiclass which matches patterns that can't arise from codegen, as they involve operands that are illegal in the ACLE.

peterwaller-arm added inline comments.Feb 3 2022, 6:49 AM
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1695–1696

These patterns exist to improve code quality for the unpacked types (e.g. <vscale x 2 x i16>), and those are not expressible by the user via the intrinsics. The predicate is all true by construction (via the IR instructions), so we have realised that the _UNDEF forms are what is wanted here, and these patterns are only needed for the (SVEAllActive) case.
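For reference, a sketch of the kind of IR meant by "all true by construction" (illustrative only): the plain sitofp/uitofp IR instructions carry no predicate, so lowering supplies an all-active one.

```llvm
; Illustrative: an unpredicated IR conversion. Lowering predicates the
; SVE scvtf with an all-active ptrue, so the (SVEAllActive) patterns
; and the _UNDEF pseudos apply.
define <vscale x 4 x float> @scvtf_f32_i32(<vscale x 4 x i32> %a) {
  %out = sitofp <vscale x 4 x i32> %a to <vscale x 4 x float>
  ret <vscale x 4 x float> %out
}
```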

sdesmalen added inline comments.Feb 3 2022, 7:32 AM
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1686–1688

Shouldn't we retain the original pattern in case $Pg is not all active?

llvm/test/CodeGen/AArch64/sve-fpext-load.ll
50

I see how this can be an improvement, but it's also a bit of a hammer. I would at least have expected a check somewhere asking whether a MOVPRFX is free/cheap; is it worth adding that?

sdesmalen added inline comments.Feb 3 2022, 7:48 AM
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1686–1688

Ah I see this question was already answered by @peterwaller-arm.

bsmith accepted this revision.Feb 3 2022, 9:01 AM

LGTM, as long as Sander agrees.

llvm/test/CodeGen/AArch64/sve-fpext-load.ll
50

I think one key thing to note here is that this patch is just bringing these floating point converts in line with all of the other unary operations, for which this change was done some time ago.

For now I think it's probably best to get this patch in and then later, if we deem it necessary, add machinery to only do this when appropriate for all appropriate operations, not just this one.

This revision is now accepted and ready to land.Feb 3 2022, 9:01 AM
sdesmalen accepted this revision.Feb 3 2022, 9:17 AM

Sure, I don't want to hold up this change, especially if we've already been going down this road as you said.

llvm/test/CodeGen/AArch64/sve-fpext-load.ll
50

Thanks for the clarification. Yes, I'm happy with the machinery being added for all operations in a separate patch.

This revision was landed with ongoing or failed builds.Feb 4 2022, 1:56 AM
This revision was automatically updated to reflect the committed changes.