This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Replace destructive operand of vector zeros with a bundled MOVPRFX instruction
ClosedPublic

Authored by Allen on Nov 28 2022, 7:41 PM.

Details

Summary

Replace unary instructions where the destructive operand is a vector of zeros
with a bundled MOVPRFX instruction, e.g:

Transform:
    %X0 = DUP_ZI_S 0, 0
    %X0 = FLOGB_ZPmZ_S X0, P0, X2
 into:
     X0 = MOVPRFX P0/z, X1  // doesn't introduce any fake register dependencies compare to X0 = MOVPRFX P0/z, X0
     X0 = FLOGB_ZPmZ_S X0, P0, X2
NOTE: This patch add a @earlyclobber constraint to PredOneOpPassthruPseudo to ensure safe register allocation for movprfx usage.

Depends on D105889

Diff Detail

Event Timeline

Allen created this revision.Nov 28 2022, 7:41 PM
Allen requested review of this revision.Nov 28 2022, 7:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 28 2022, 7:41 PM

hi @paulwalker-arm, would you please give me some suggestion, thanks

Matt added a subscriber: Matt.Nov 30 2022, 5:39 PM
paulwalker-arm added a comment.EditedDec 1 2022, 10:28 AM

hi @paulwalker-arm, would you please give me some suggestion, thanks

I'll investigate further but this looks like an old implementation that existed before the requirement to fix D105889. With D105889 in place we should be able to handle zeroing unary operations much like the existing experimental-zeroing support (i.e. within isel by creating _ZERO pseudo instructions that get expanded later), albeit the unary variants likely don't need to be experimental because we can better control their register allocation requirements.

Allen updated this revision to Diff 484509.Dec 21 2022, 2:49 AM
Allen edited the summary of this revision. (Show Details)

refactor with comment

Allen added a comment.Dec 21 2022, 3:04 AM

hi @paulwalker-arm, would you please give me some suggestion, thanks

I'll investigate further but this looks like an old implementation that existed before the requirement to fix D105889. With D105889 in place we should be able to handle zeroing unary operations much like the existing experimental-zeroing support (i.e. within isel by creating _ZERO pseudo instructions that get expanded later), albeit the unary variants likely don't need to be experimental because we can better control their register allocation requirements.

hi, @paulwalker-arm

Thanks for your advice. As instriction flogb and fneg have some different form in DAG ISEL,  so I only try to enable flogb in this patch. (fneg will be transformed into FNEG_MERGE_PASSTHRU)
nikic resigned from this revision.Feb 1 2023, 1:42 AM

(Not familiar with AArch64 / SVE)

@Allen: I've been out of the office throughout January, hence the sparse reviewing. I'll be back to normal from tomorrow and will start working my way through the backlog.

Allen added a comment.Feb 1 2023, 6:49 PM

@Allen: I've been out of the office throughout January, hence the sparse reviewing. I'll be back to normal from tomorrow and will start working my way through the backlog.

Thank you advance

Hi @Allen, again sorry for the delay. The patch looks mostly good but I think for the unary instructions the zeroing can be handled completely during isel with perhaps no changes to AArch64ExpandPseudoInsts.cpp necessary. Given the unary instructions have a dedicated operand for the inactive lanes I believe we can add a constraint to PredOneOpPassthruPseudo to ensure safe register allocation for movprfx usage. Something like:

let Constraints = !if(!eq(flags, FalseLanesZero), "$Zd = $Passthru,@earlyclobber $Zd", "");

This will ensure @passthru is allocated the same register as the destination whilst also being unique to the real input operand (i.e. the one containing the active lanes). The existing instruction expansion should emit:

movprfx	z0.h, p0/z, z1.h
flogb	z0.h, p0/m, z1.h

This is preferred over the output shown in sve2-intrinsics-fp-int-binary-logarithm-zeroing.ll because it doesn't introduce any fake register dependencies, which is something we've already fixed for the UNDEF variants.

What do you think?

llvm/lib/Target/AArch64/SVEInstrFormats.td
2915–2922

Are these changes required? sve_fp_2op_p_zd looks to already set DestructiveInstType accordingly.

llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm-zeroing.ll
3

I believe this is the default and thus not required?

Allen updated this revision to Diff 494794.Feb 3 2023, 11:14 PM
Allen removed a reviewer: nikic.

Handle review suggestions based on the current modification solution.

I agree new solution that doesn't affect AArch64ExpandPseudoInsts.cpp seems better, I'm still trying.

Allen marked 2 inline comments as done.Feb 3 2023, 11:15 PM
Allen added inline comments.
llvm/lib/Target/AArch64/SVEInstrFormats.td
2915–2922

Thanks, When I try all the modifications of multiclass sve2_fp_flogb, an assert error occurs when I execute test case sve2-intrinsics-fp-int-binary-logarithm-zeroing.ll. Of course, the condition let DestructiveInstType = flags in is unnecessary, I'll delete it.

#2  0x0000aaaaaceef5c8 in llvm::AArch64InstPrinter::printInstruction (this=0xaaaab2be5f70, MI=0xffffffffc1c8, Address=0, STI=..., O=...)
    at lib/Target/AArch64/AArch64GenAsmWriter.inc:16963
16963	  assert(Bits != 0 && "Cannot print this instruction.");
(gdb) l
16958	  auto MnemonicInfo = getMnemonic(MI);
16959	
16960	  O << MnemonicInfo.first;
16961	
16962	  uint64_t Bits = MnemonicInfo.second;
16963	  assert(Bits != 0 && "Cannot print this instruction.");                 ----------- here -----------------------
llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm-zeroing.ll
3

Done, thanks

Allen updated this revision to Diff 494796.Feb 4 2023, 1:47 AM
Allen marked 2 inline comments as done.
Allen edited the summary of this revision. (Show Details)

add @earlyclobber constraint according the comment, and revert the unnecessary changes to AArch64ExpandPseudoInsts.cpp

paulwalker-arm accepted this revision.Feb 6 2023, 9:09 AM

A couple of stylistic requests but otherwise looks good.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
3584–3586

We typically put these blocks just after the instructions they're pseudos for so can this block be placed just after defm FLOGB_ZPmZ?

llvm/lib/Target/AArch64/SVEInstrFormats.td
515

Can this be SVE_1_Op_PassthruZero_Pat? and moved further up so they follow the existing SVE_1_Op_PassthruUndef_.... classes.

This revision is now accepted and ready to land.Feb 6 2023, 9:09 AM
This revision was landed with ongoing or failed builds.Feb 6 2023, 7:04 PM
This revision was automatically updated to reflect the committed changes.
Allen added a comment.EditedFeb 6 2023, 7:19 PM

sorry, missing the above review, and I'll fix that.