Details

Reviewers

bsmith
paulwalker-arm
peterwaller-arm
efriedma
StephenFan
c-rhodes
spatel

Commits

rG39af4659f240: [AArch64][SVE] Replace destructive operand of vector zeros with a bundled…

Summary

Replace unary instructions whose destructive operand is a vector of zeros
with a bundled MOVPRFX instruction, e.g.:

Transform:
    %X0 = DUP_ZI_S 0, 0
    %X0 = FLOGB_ZPmZ_S X0, P0, X2
into:
    X0 = MOVPRFX P0/z, X1  // doesn't introduce any fake register dependencies, compared to X0 = MOVPRFX P0/z, X0
    X0 = FLOGB_ZPmZ_S X0, P0, X2

NOTE: This patch adds an @earlyclobber constraint to PredOneOpPassthruPseudo to ensure safe register allocation for MOVPRFX usage.

Depends on D105889
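The lane-level effect of the transform in the summary can be modelled in a few lines of C++. This is only a sketch under stated assumptions: the fixed lane count, the `Vec`/`Pred` aliases and the function names are hypothetical and are not part of LLVM, and the unary operation is passed in as a callback rather than being a real FLOGB. It shows that a zeroing MOVPRFX followed by a merging unary op leaves zeros in the inactive lanes, whichever register the prefix reads from.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-width model of a scalable vector and its predicate.
constexpr std::size_t kLanes = 4;
using Vec = std::array<std::int32_t, kLanes>;
using Pred = std::array<bool, kLanes>;

// Zeroing prefix (MOVPRFX Zd, Pg/z, Zn): active lanes copy src,
// inactive lanes become zero.
Vec movprfxZeroing(const Pred &pg, const Vec &src) {
  Vec dst{};
  for (std::size_t i = 0; i < kLanes; ++i)
    dst[i] = pg[i] ? src[i] : 0;
  return dst;
}

// Merging unary op (OP Zd, Pg/m, Zn): active lanes receive op(src),
// inactive lanes keep whatever dst already held.
template <typename Op>
Vec mergingUnary(Vec dst, const Pred &pg, const Vec &src, Op op) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if (pg[i])
      dst[i] = op(src[i]);
  return dst;
}

// The bundled pair. prefixSrc only fills lanes that the op then overwrites,
// so its value never reaches the result.
template <typename Op>
Vec zeroingUnary(const Pred &pg, const Vec &prefixSrc, const Vec &src, Op op) {
  return mergingUnary(movprfxZeroing(pg, prefixSrc), pg, src, op);
}
```

Because the prefix source never influences the result, the choice of which register MOVPRFX reads is free from a correctness point of view and can be made purely to avoid an extra register dependency.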

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Allen created this revision. Nov 28 2022, 7:41 PM

Herald added a project: Restricted Project. Nov 28 2022, 7:41 PM

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett.

Allen requested review of this revision. Nov 28 2022, 7:41 PM

Herald added a project: Restricted Project. Nov 28 2022, 7:41 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B199920: Diff 478442. Nov 28 2022, 8:27 PM

Hi @paulwalker-arm, could you please give me some suggestions? Thanks.

Matt added a subscriber: Matt. Nov 30 2022, 5:39 PM

In D138888#3959465, @Allen wrote:

hi @paulwalker-arm, would you please give me some suggestion, thanks

I'll investigate further but this looks like an old implementation that existed before the requirement to fix D105889. With D105889 in place we should be able to handle zeroing unary operations much like the existing experimental-zeroing support (i.e. within isel by creating _ZERO pseudo instructions that get expanded later), albeit the unary variants likely don't need to be experimental because we can better control their register allocation requirements.
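As a rough illustration of the "pseudo that gets expanded later" idea described above, the following C++ sketch turns a hypothetical zeroing pseudo into the two-instruction sequence as assembly text. The struct, its fields and the function name are assumptions made for this example; the real expansion works on MachineInstr objects inside AArch64ExpandPseudoInsts.cpp and is not reproduced here.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a zeroing unary pseudo after register allocation.
struct ZeroingUnaryPseudo {
  std::string mnemonic; // real instruction, e.g. "flogb"
  unsigned zd;          // destination Z register number
  unsigned pg;          // governing predicate register number
  unsigned zs;          // source Z register number
  char elt;             // element size suffix: 'h', 's' or 'd'
};

// Emit a zeroing MOVPRFX followed by the merging form of the instruction.
std::vector<std::string> expandZeroingUnary(const ZeroingUnaryPseudo &p) {
  const std::string sfx = std::string(".") + p.elt;
  const std::string zd = "z" + std::to_string(p.zd) + sfx;
  const std::string zs = "z" + std::to_string(p.zs) + sfx;
  const std::string pg = "p" + std::to_string(p.pg);
  return {
      "movprfx " + zd + ", " + pg + "/z, " + zs,
      p.mnemonic + " " + zd + ", " + pg + "/m, " + zs,
  };
}
```

For a pseudo with zd = 0, pg = 0, zs = 1 and element size 'h', this produces the same two lines that the added test checks for.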

refactor with comment

In D138888#3964294, @paulwalker-arm wrote:

In D138888#3959465, @Allen wrote:

hi @paulwalker-arm, would you please give me some suggestion, thanks

I'll investigate further but this looks like an old implementation that existed before the requirement to fix D105889. With D105889 in place we should be able to handle zeroing unary operations much like the existing experimental-zeroing support (i.e. within isel by creating _ZERO pseudo instructions that get expanded later), albeit the unary variants likely don't need to be experimental because we can better control their register allocation requirements.

Hi @paulwalker-arm,

Thanks for your advice. Because the flogb and fneg instructions take different forms during DAG ISel, I only try to enable flogb in this patch (fneg is transformed into FNEG_MERGE_PASSTHRU).

Harbormaster completed remote builds in B204345: Diff 484509. Dec 21 2022, 3:44 AM

Ping?

Allen added a reviewer: StephenFan. Jan 18 2023, 1:13 AM

Allen added reviewers: c-rhodes, spatel, nikic. Jan 31 2023, 6:20 PM

(Not familiar with AArch64 / SVE)

@Allen: I've been out of the office throughout January, hence the sparse reviewing. I'll be back to normal from tomorrow and will start working my way through the backlog.

In D138888#4096167, @paulwalker-arm wrote:

@Allen: I've been out of the office throughout January, hence the sparse reviewing. I'll be back to normal from tomorrow and will start working my way through the backlog.

Thank you in advance.

Hi @Allen, again sorry for the delay. The patch looks mostly good but I think for the unary instructions the zeroing can be handled completely during isel with perhaps no changes to AArch64ExpandPseudoInsts.cpp necessary. Given the unary instructions have a dedicated operand for the inactive lanes I believe we can add a constraint to PredOneOpPassthruPseudo to ensure safe register allocation for movprfx usage. Something like:

let Constraints = !if(!eq(flags, FalseLanesZero), "$Zd = $Passthru,@earlyclobber $Zd", "");

This will ensure $Passthru is allocated the same register as the destination whilst also being distinct from the real input operand (i.e. the one containing the active lanes). The existing instruction expansion should emit:

movprfx	z0.h, p0/z, z1.h
flogb	z0.h, p0/m, z1.h

This is preferred over the output shown in sve2-intrinsics-fp-int-binary-logarithm-zeroing.ll because it doesn't introduce any fake register dependencies, which is something we've already fixed for the UNDEF variants.

What do you think?
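The two halves of the suggested constraint string can be read as two simple conditions on the allocated registers. The C++ sketch below is a simplified model, not LLVM code: the struct and function names are hypothetical, and the predicate operand is left out because it lives in a different register class and cannot collide with $Zd.

```cpp
// Hypothetical register assignment for one unary passthru pseudo.
struct UnaryPseudoRegs {
  unsigned zd;       // destination ($Zd)
  unsigned passthru; // inactive-lane operand ($Passthru)
  unsigned zs;       // real input holding the active lanes ($Zs)
};

// "$Zd = $Passthru": the passthru operand is tied to the destination.
bool isTied(const UnaryPseudoRegs &r) { return r.zd == r.passthru; }

// "@earlyclobber $Zd": the destination must not share a register with any
// input that is not tied to it, which here leaves only $Zs.
bool respectsEarlyClobber(const UnaryPseudoRegs &r) { return r.zd != r.zs; }

// Both conditions together give an assignment that the MOVPRFX expansion
// can use without the prefix overwriting the real input.
bool isSafeForMovprfx(const UnaryPseudoRegs &r) {
  return isTied(r) && respectsEarlyClobber(r);
}
```

Under this model the assignment in the example above (z0 for both $Zd and $Passthru, z1 for $Zs) passes, whereas putting all three operands in z0 does not.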

llvm/lib/Target/AArch64/SVEInstrFormats.td
2915–2922	Are these changes required? `sve_fp_2op_p_zd` looks to already set `DestructiveInstType` accordingly.
llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm-zeroing.ll
3	I believe this is the default and thus not required?

Handle review suggestions based on the current modification solution.

I agree that the new solution, which doesn't affect AArch64ExpandPseudoInsts.cpp, seems better. I'm still working on it.

Allen marked 2 inline comments as done. Feb 3 2023, 11:15 PM

Allen added inline comments.

llvm/lib/Target/AArch64/SVEInstrFormats.td

2915–2922

Thanks. When I try all the modifications of multiclass sve2_fp_flogb, an assertion failure occurs when I run the test case sve2-intrinsics-fp-int-binary-logarithm-zeroing.ll. Of course, the condition `let DestructiveInstType = flags in` is unnecessary, so I'll delete it.

#2  0x0000aaaaaceef5c8 in llvm::AArch64InstPrinter::printInstruction (this=0xaaaab2be5f70, MI=0xffffffffc1c8, Address=0, STI=..., O=...)
    at lib/Target/AArch64/AArch64GenAsmWriter.inc:16963
16963	  assert(Bits != 0 && "Cannot print this instruction.");
(gdb) l
16958	  auto MnemonicInfo = getMnemonic(MI);
16959	
16960	  O << MnemonicInfo.first;
16961	
16962	  uint64_t Bits = MnemonicInfo.second;
16963	  assert(Bits != 0 && "Cannot print this instruction.");                 ----------- here -----------------------

llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm-zeroing.ll

Done, thanks

Harbormaster completed remote builds in B211843: Diff 494794. Feb 4 2023, 12:14 AM

Add the @earlyclobber constraint as suggested in the comment, and revert the unnecessary changes to AArch64ExpandPseudoInsts.cpp.

Harbormaster completed remote builds in B211845: Diff 494796. Feb 4 2023, 2:35 AM

A couple of stylistic requests but otherwise looks good.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
3584–3586	We typically put these blocks just after the instructions they're pseudos for so can this block be placed just after `defm FLOGB_ZPmZ`?
llvm/lib/Target/AArch64/SVEInstrFormats.td
515	Can this be `SVE_1_Op_PassthruZero_Pat`, and moved further up so that it follows the existing `SVE_1_Op_PassthruUndef_....` classes?

This revision is now accepted and ready to land. Feb 6 2023, 9:09 AM

This revision was landed with ongoing or failed builds. Feb 6 2023, 7:04 PM

Closed by commit rG39af4659f240: [AArch64][SVE] Replace destructive operand of vector zeros with a bundled… (authored by Allen).

This revision was automatically updated to reflect the committed changes.

Allen added a commit: rG39af4659f240: [AArch64][SVE] Replace destructive operand of vector zeros with a bundled….

Sorry, I missed the review comments above; I'll fix that.

Related link: https://reviews.llvm.org/D143459

Allen mentioned this in rGe80f461d99ed: [AArch64] Fix missing comment on D138888, NFC. Feb 7 2023, 6:00 PM

Diff 495349

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp

@@ 1,039 lines not shown @@ bool AArch64ExpandPseudo::expandMI(MachineBasicBlock &MBB,
                                    MachineBasicBlock::iterator &NextMBBI) {
   MachineInstr &MI = *MBBI;
   unsigned Opcode = MI.getOpcode();

   // Check if we can expand the destructive op
   int OrigInstr = AArch64::getSVEPseudoMap(MI.getOpcode());
   if (OrigInstr != -1) {
     auto &Orig = TII->get(OrigInstr);
-    if ((Orig.TSFlags & AArch64::DestructiveInstTypeMask)
-        != AArch64::NotDestructive) {
+    if ((Orig.TSFlags & AArch64::DestructiveInstTypeMask) !=
+        AArch64::NotDestructive) {
       return expand_DestructiveOp(MI, MBB, MBBI);
     }
   }

   switch (Opcode) {
   default:
     break;

@@ 460 lines not shown @@

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

@@ 3,533 lines not shown @@ let Predicates = [HasSVE2] in {
   def HISTSEG_ZZZ : sve2_hist_gen_segment<"histseg", int_aarch64_sve_histseg>;

   // SVE2 histogram generation (vector)
   defm HISTCNT_ZPzZZ : sve2_hist_gen_vector<"histcnt", int_aarch64_sve_histcnt>;
 } // End HasSVE2

 let Predicates = [HasSVE2orSME] in {
   // SVE2 floating-point base 2 logarithm as integer
-  defm FLOGB_ZPmZ : sve2_fp_flogb<"flogb", int_aarch64_sve_flogb>;
+  defm FLOGB_ZPmZ : sve2_fp_flogb<"flogb", "FLOGB_ZPZZ", int_aarch64_sve_flogb>;

   // SVE2 floating-point convert precision
   defm FCVTXNT_ZPmZ : sve2_fp_convert_down_odd_rounding_top<"fcvtxnt", "int_aarch64_sve_fcvtxnt">;
   defm FCVTX_ZPmZ : sve2_fp_convert_down_odd_rounding<"fcvtx", "int_aarch64_sve_fcvtx">;
   defm FCVTNT_ZPmZ : sve2_fp_convert_down_narrow<"fcvtnt", "int_aarch64_sve_fcvtnt">;
   defm FCVTLT_ZPmZ : sve2_fp_convert_up_long<"fcvtlt", "int_aarch64_sve_fcvtlt">;

   // SVE2 floating-point pairwise operations
@@ 25 lines not shown @@ let Predicates = [HasSVE2orSME] in {

   // SVE2 bitwise xor and rotate right by immediate
   defm XAR_ZZZI : sve2_int_rotate_right_imm<"xar", int_aarch64_sve_xar>;

   // SVE2 extract vector (immediate offset, constructive)
   def EXT_ZZI_B : sve2_int_perm_extract_i_cons<"ext">;
 } // End HasSVE2orSME

+let Predicates = [HasSVE2orSME, UseExperimentalZeroingPseudos] in {
+defm FLOGB_ZPZZ : sve2_fp_un_pred_zeroing_hsd<int_aarch64_sve_flogb>;
+} // End HasSVE2orSME, UseExperimentalZeroingPseudos
paulwalker-arm (inline, Not Done): We typically put these blocks just after the instructions they're pseudos for so can this block be placed just after `defm FLOGB_ZPmZ`?
+
 let Predicates = [HasSVE2] in {
   // SVE2 non-temporal gather loads
   defm LDNT1SB_ZZR_S : sve2_mem_gldnt_vs_32_ptrs<0b00000, "ldnt1sb", AArch64ldnt1s_gather_z, nxv4i8>;
   defm LDNT1B_ZZR_S : sve2_mem_gldnt_vs_32_ptrs<0b00001, "ldnt1b", AArch64ldnt1_gather_z, nxv4i8>;
   defm LDNT1SH_ZZR_S : sve2_mem_gldnt_vs_32_ptrs<0b00100, "ldnt1sh", AArch64ldnt1s_gather_z, nxv4i16>;
   defm LDNT1H_ZZR_S : sve2_mem_gldnt_vs_32_ptrs<0b00101, "ldnt1h", AArch64ldnt1_gather_z, nxv4i16>;
   defm LDNT1W_ZZR_S : sve2_mem_gldnt_vs_32_ptrs<0b01001, "ldnt1w", AArch64ldnt1_gather_z, nxv4i32>;

@@ 276 lines not shown @@

llvm/lib/Target/AArch64/SVEInstrFormats.td

@@ 506 lines not shown @@ : Pat<(vtd (op vt1:$Op1, vt2:$Op2, vt3:$Op3, (vt4 ImmTy:$Op4))),
       (inst $Op1, $Op2, $Op3, ImmTy:$Op4)>;

 def SVEDup0 : ComplexPattern<vAny, 0, "SelectDupZero", []>;
 def SVEDup0Undef : ComplexPattern<vAny, 0, "SelectDupZeroOrUndef", []>;

 let AddedComplexity = 1 in {
 class SVE_3_Op_Pat_SelZero<ValueType vtd, SDPatternOperator op, ValueType vt1,
                            ValueType vt2, ValueType vt3, Instruction inst>
 : Pat<(vtd (vtd (op vt1:$Op1, (vselect vt1:$Op1, vt2:$Op2, (SVEDup0)), vt3:$Op3))),
paulwalker-arm (inline, line 515, Not Done): Can this be `SVE_1_Op_PassthruZero_Pat`? and moved further up so they follow the existing `SVE_1_Op_PassthruUndef_....` classes.
       (inst $Op1, $Op2, $Op3)>;

 class SVE_3_Op_Pat_Shift_Imm_SelZero<ValueType vtd, SDPatternOperator op,
                                      ValueType vt1, ValueType vt2,
                                      Operand vt3, Instruction inst>
 : Pat<(vtd (op vt1:$Op1, (vselect vt1:$Op1, vt2:$Op2, (SVEDup0)), (i32 (vt3:$Op3)))),
       (inst $Op1, $Op2, vt3:$Op3)>;

+class SVE_2_Op_Pat_Zero<ValueType vtd, SDPatternOperator op, ValueType vt1,
+                        ValueType vt2, Instruction inst>
+: Pat<(vtd (op (vtd (SVEDup0)), vt1:$Op1, vt2:$Op2)),
+      (inst (IMPLICIT_DEF), $Op1, $Op2)>;
 }

 //
 // Common but less generic patterns.
 //

 class SVE_1_Op_AllActive_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,
                              Instruction inst, Instruction ptrue>
@@ 141 lines not shown @@ : SVEPseudo2Instr<name, 0>,
     let FalseLanes = flags;
   }
 }

 //
 // Pseudos for passthru operands
 //
 let hasNoSchedulingInfo = 1 in {
-  class PredOneOpPassthruPseudo<string name, ZPRRegOp zprty>
+  class PredOneOpPassthruPseudo<string name, ZPRRegOp zprty,
+                                FalseLanesEnum flags = FalseLanesNone>
   : SVEPseudo2Instr<name, 0>,
-    Pseudo<(outs zprty:$Zd), (ins zprty:$Passthru, PPR3bAny:$Pg, zprty:$Zs), []>;
+    Pseudo<(outs zprty:$Zd), (ins zprty:$Passthru, PPR3bAny:$Pg, zprty:$Zs), []> {
+    let FalseLanes = flags;
+    let Constraints = !if(!eq(flags, FalseLanesZero), "$Zd = $Passthru,@earlyclobber $Zd", "");
+  }
 }

 //===----------------------------------------------------------------------===//
 // SVE Predicate Misc Group
 //===----------------------------------------------------------------------===//

 class sve_int_pfalse<bits<6> opc, string asm>
 : I<(outs PPR8:$Pd), (ins),
@@ 2,207 lines not shown @@ multiclass sve_fp_2op_p_zd_HSD<bits<5> opc, string asm, SDPatternOperator op> {
   defm : SVE_1_Op_PassthruUndef_Pat<nxv8f16, op, nxv8i1, nxv8f16, !cast<Instruction>(NAME # _UNDEF_H)>;
   defm : SVE_1_Op_PassthruUndef_Pat<nxv4f16, op, nxv4i1, nxv4f16, !cast<Instruction>(NAME # _UNDEF_H)>;
   defm : SVE_1_Op_PassthruUndef_Pat<nxv2f16, op, nxv2i1, nxv2f16, !cast<Instruction>(NAME # _UNDEF_H)>;
   defm : SVE_1_Op_PassthruUndef_Pat<nxv4f32, op, nxv4i1, nxv4f32, !cast<Instruction>(NAME # _UNDEF_S)>;
   defm : SVE_1_Op_PassthruUndef_Pat<nxv2f32, op, nxv2i1, nxv2f32, !cast<Instruction>(NAME # _UNDEF_S)>;
   defm : SVE_1_Op_PassthruUndef_Pat<nxv2f64, op, nxv2i1, nxv2f64, !cast<Instruction>(NAME # _UNDEF_D)>;
 }

-multiclass sve2_fp_flogb<string asm, SDPatternOperator op> {
-  def _H : sve_fp_2op_p_zd<0b0011010, asm, ZPR16, ZPR16, ElementSizeH>;
-  def _S : sve_fp_2op_p_zd<0b0011100, asm, ZPR32, ZPR32, ElementSizeS>;
-  def _D : sve_fp_2op_p_zd<0b0011110, asm, ZPR64, ZPR64, ElementSizeD>;
+multiclass sve2_fp_flogb<string asm, string Ps, SDPatternOperator op> {
+  def _H : sve_fp_2op_p_zd<0b0011010, asm, ZPR16, ZPR16, ElementSizeH>,
+           SVEPseudo2Instr<Ps # _H, 1>;
+  def _S : sve_fp_2op_p_zd<0b0011100, asm, ZPR32, ZPR32, ElementSizeS>,
+           SVEPseudo2Instr<Ps # _S, 1>;
+  def _D : sve_fp_2op_p_zd<0b0011110, asm, ZPR64, ZPR64, ElementSizeD>,
+           SVEPseudo2Instr<Ps # _D, 1>;

paulwalker-arm (inline, Done): Are these changes required? `sve_fp_2op_p_zd` looks to already set `DestructiveInstType` accordingly.
Allen (inline, Done): Thanks. When I try all the modifications of multiclass sve2_fp_flogb, an assertion failure occurs when I run the test case sve2-intrinsics-fp-int-binary-logarithm-zeroing.ll: `assert(Bits != 0 && "Cannot print this instruction.")` in llvm::AArch64InstPrinter::printInstruction at lib/Target/AArch64/AArch64GenAsmWriter.inc:16963. Of course, the condition `let DestructiveInstType = flags in` is unnecessary, so I'll delete it.
   def : SVE_3_Op_Pat<nxv8i16, op, nxv8i16, nxv8i1, nxv8f16, !cast<Instruction>(NAME # _H)>;
   def : SVE_3_Op_Pat<nxv4i32, op, nxv4i32, nxv4i1, nxv4f32, !cast<Instruction>(NAME # _S)>;
   def : SVE_3_Op_Pat<nxv2i64, op, nxv2i64, nxv2i1, nxv2f64, !cast<Instruction>(NAME # _D)>;
 }

+multiclass sve2_fp_un_pred_zeroing_hsd<SDPatternOperator op> {
+  def _ZERO_H : PredOneOpPassthruPseudo<NAME # _H, ZPR16, FalseLanesZero>;
+  def _ZERO_S : PredOneOpPassthruPseudo<NAME # _S, ZPR32, FalseLanesZero>;
+  def _ZERO_D : PredOneOpPassthruPseudo<NAME # _D, ZPR64, FalseLanesZero>;
+
+  def : SVE_2_Op_Pat_Zero<nxv8i16, op, nxv8i1, nxv8f16, !cast<Pseudo>(NAME # _ZERO_H)>;
+  def : SVE_2_Op_Pat_Zero<nxv4i32, op, nxv4i1, nxv4f32, !cast<Pseudo>(NAME # _ZERO_S)>;
+  def : SVE_2_Op_Pat_Zero<nxv2i64, op, nxv2i1, nxv2f64, !cast<Pseudo>(NAME # _ZERO_D)>;
+}
+
 multiclass sve2_fp_convert_down_odd_rounding<string asm, string op> {
   def _DtoS : sve_fp_2op_p_zd<0b0001010, asm, ZPR64, ZPR32, ElementSizeD>;
   def : SVE_3_Op_Pat<nxv4f32, !cast<SDPatternOperator>(op # _f32f64), nxv4f32, nxv2i1, nxv2f64, !cast<Instruction>(NAME # _DtoS)>;
 }

 //===----------------------------------------------------------------------===//
 // SVE Floating Point Unary Operations - Unpredicated Group
 //===----------------------------------------------------------------------===//
@@ 6,967 lines not shown @@

llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm-zeroing.ll

This file was added.

+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 -mattr=+use-experimental-zeroing-pseudos < %s | FileCheck %s
+
paulwalker-arm (inline, Done): I believe this is the default and thus not required?
Allen (inline, Done): Done, thanks
+;
+; FLOGB
+;
+
+; NOTE: The %unused paramter ensures z0 is free, leading to a simpler test.
+define <vscale x 8 x i16> @flogb_f16(<vscale x 8 x i16> %unused, <vscale x 8 x i1> %pg, <vscale x 8 x half> %a) {
+; CHECK-LABEL: flogb_f16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movprfx z0.h, p0/z, z1.h
+; CHECK-NEXT: flogb z0.h, p0/m, z1.h
+; CHECK-NEXT: ret
+  %out = call <vscale x 8 x i16> @llvm.aarch64.sve.flogb.nxv8f16(<vscale x 8 x i16> zeroinitializer,
+                                                                 <vscale x 8 x i1> %pg,
+                                                                 <vscale x 8 x half> %a)
+  ret <vscale x 8 x i16> %out
+}
+
+define <vscale x 4 x i32> @flogb_f32(<vscale x 4 x i32> %unused, <vscale x 4 x i1> %pg, <vscale x 4 x float> %a) {
+; CHECK-LABEL: flogb_f32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movprfx z0.s, p0/z, z1.s
+; CHECK-NEXT: flogb z0.s, p0/m, z1.s
+; CHECK-NEXT: ret
+  %out = call <vscale x 4 x i32> @llvm.aarch64.sve.flogb.nxv4f32(<vscale x 4 x i32> zeroinitializer,
+                                                                 <vscale x 4 x i1> %pg,
+                                                                 <vscale x 4 x float> %a)
+  ret <vscale x 4 x i32> %out
+}
+
+define <vscale x 2 x i64> @flogb_f64(<vscale x 2 x i64> %unused, <vscale x 2 x i1> %pg, <vscale x 2 x double> %a) {
+; CHECK-LABEL: flogb_f64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movprfx z0.d, p0/z, z1.d
+; CHECK-NEXT: flogb z0.d, p0/m, z1.d
+; CHECK-NEXT: ret
+  %out = call <vscale x 2 x i64> @llvm.aarch64.sve.flogb.nxv2f64(<vscale x 2 x i64> zeroinitializer,
+                                                                 <vscale x 2 x i1> %pg,
+                                                                 <vscale x 2 x double> %a)
+  ret <vscale x 2 x i64> %out
+}
+
+declare <vscale x 8 x i16> @llvm.aarch64.sve.flogb.nxv8f16(<vscale x 8 x i16>, <vscale x 8 x i1>, <vscale x 8 x half>)
+declare <vscale x 4 x i32> @llvm.aarch64.sve.flogb.nxv4f32(<vscale x 4 x i32>, <vscale x 4 x i1>, <vscale x 4 x float>)
+declare <vscale x 2 x i64> @llvm.aarch64.sve.flogb.nxv2f64(<vscale x 2 x i64>, <vscale x 2 x i1>, <vscale x 2 x double>)

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Replace destructive operand of vector zeros with a bundled MOVPRFX instruction
Closed · Public
