This is an archive of the discontinued LLVM Phabricator instance.

Differential D147235

[AArch64] Remove redundant `mov 0` instruction for high 64-bits
ClosedPublic

Authored by jaykang10 on Mar 30 2023, 7:25 AM.


Details

Reviewers

dmgreen
samtebbs
efriedma

Commits

rG932911d6b10a: [AArch64] Remove redundant `mov 0` instruction for high 64-bits

Summary

GCC generates fewer instructions than LLVM for the intrinsic examples below.

#include <arm_neon.h>

float16x8_t test1(const float32x4_t a) {
    float16x4_t b = vcvt_f16_f32(a);
    return vcombine_f16(b, vdup_n_f16(0.0));
}

uint8x8_t test2(uint16_t *in, uint8x8_t *dst, uint8x8_t idx) {
    return vtbl1_u8(vshrn_n_u16(vld1q_u16(in), 4), idx); 
}

gcc output
test1:
        fcvtn   v0.4h, v0.4s 
        fmov    d0, d0
        ret

test2:
        ldr     q1, [x0]
        shrn    v1.8b, v1.8h, 4
        tbl     v0.8b, {v1.16b}, v0.8b 
        ret

llvm output
test1:                                  // @test1
        movi    d1, #0000000000000000
        fcvtn   v0.4h, v0.4s
        mov     v0.d[1], v1.d[0]
        ret

test2:                                  // @test2
        ldr     q1, [x0]
        movi    v2.2d, #0000000000000000
        shrn    v1.8b, v1.8h, #4
        mov     v1.d[1], v2.d[0]
        tbl     v0.8b, { v1.16b }, v0.8b
        ret

The fcvtn and shrn instructions implicitly zero the high 64 bits of the destination register, so no separate `mov 0` instruction is needed for the high 64 bits. GCC appears to have patterns for these cases. For example,

the GCC RTL pattern for the shrn in the test2 function:
(define_insn "aarch64_shrn<mode>_insn_le"
  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
        (vec_concat:<VNARROWQ2>
          (truncate:<VNARROWQ>
            (lshiftrt:VQN (match_operand:VQN 1 "register_operand" "w")
              (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_<vn_mode>")))
          (match_operand:<VNARROWQ> 3 "aarch64_simd_or_scalar_imm_zero")))]
  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
  "shrn\\t%0.<Vntype>, %1.<Vtype>, %2"
  [(set_attr "type" "neon_shift_imm_narrow_q")]
)

LLVM could also add TableGen patterns for them, as GCC does, but it may be better to handle them in the MIR peephole optimization pass, because the patterns share common sub-patterns and the pass can look across multiple basic blocks.
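The implicit zeroing that the summary relies on can be illustrated with a small standalone model. This is only a sketch: `VReg`, `shrn8b`, and `insD1FromD0` are hypothetical names invented here, not LLVM or ACLE APIs, and the shift amount is assumed to be in the range 1 to 8. It models a 128-bit vector register as two 64-bit halves and shows that inserting a zero into the high half after a 64-bit narrowing write changes nothing.

```cpp
#include <cstdint>

// Hypothetical model of a 128-bit vector register as two 64-bit halves.
struct VReg {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Models `shrn Vd.8b, Vn.8h, #shift`: each 16-bit lane is shifted right and
// truncated to 8 bits. The result is a 64-bit write, so the upper 64 bits of
// the destination are cleared.
VReg shrn8b(const VReg &src, unsigned shift) {
  std::uint64_t packed = 0;
  for (unsigned lane = 0; lane < 8; ++lane) {
    const std::uint64_t half = lane < 4 ? src.lo : src.hi;
    const std::uint16_t elem =
        static_cast<std::uint16_t>(half >> (16 * (lane % 4)));
    const std::uint8_t narrowed = static_cast<std::uint8_t>(elem >> shift);
    packed |= static_cast<std::uint64_t>(narrowed) << (8 * lane);
  }
  return VReg{packed, 0};
}

// Models `mov Vd.d[1], Vn.d[0]`: copies the low half of src into the high
// half of dst and leaves the low half of dst untouched.
VReg insD1FromD0(VReg dst, const VReg &src) {
  dst.hi = src.lo;
  return dst;
}
```

In this model, `insD1FromD0(shrn8b(x, 4), VReg{0, 0})` returns the same value as `shrn8b(x, 4)` for any `x`, which is why the `movi`/`mov` pair in the original LLVM output is redundant.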

With this patch, LLVM generates the output below.

llvm output
test1:                                  // @test1
        fcvtn   v0.4h, v0.4s
        ret

test2:                                  // @test2
        ldr     q1, [x0]
        shrn    v1.8b, v1.8h, #4
        tbl     v0.8b, { v1.16b }, v0.8b
        ret

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jaykang10 created this revision. Mar 30 2023, 7:25 AM

Herald added a project: Restricted Project. Mar 30 2023, 7:25 AM

Herald added subscribers: hiraditya, kristof.beyls.

jaykang10 requested review of this revision. Mar 30 2023, 7:25 AM

Herald added a subscriber: llvm-commits.

jaykang10 edited the summary of this revision. Mar 30 2023, 7:29 AM

dmgreen added inline comments. Mar 30 2023, 8:44 AM

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
618–619	Can we extend this to all the instructions that are like FCVTNv4i16/SHRNv8i8_shift? For example maybe these, which I think produce 64-bit results and are similar to the instructions you already have:
	RSHRNv2i32_shift
	RSHRNv4i16_shift
	RSHRNv8i8_shift
	SHRNv2i32_shift
	SHRNv4i16_shift
	SHRNv8i8_shift
	FCVTNv2i32
	FCVTNv4i16
	We might be able to get away with "Any instruction that defs a FPR64", but that might need more careful checking and there are quite a few of them. We should probably try and get these classes of instruction though, not just the exact sizes.
620	If this returns true directly, then isSetZeroHigh64bits won't be needed any more.
llvm/test/CodeGen/AArch64/implicitly-set-zero-high-64-bits.ll
8	We can probably remove all the `nofpclass(nan inf)` stuff
20	dst doesn't seem to be used.

Harbormaster completed remote builds in B222749: Diff 509669. Mar 30 2023, 8:52 AM

jaykang10 added inline comments. Mar 31 2023, 2:15 AM

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
618–619	> Can we extend this to all the instructions that are like FCVTNv4i16/SHRNv8i8_shift? For example maybe these, which I think produce 64bit results and are similar to the instructions you already have:
	Yep, they write the lower 64 bits and clear the high 64 bits, so I think we can add them. Let me add them.
	> We might be able to get away with "Any instruction that defs a FPR64", but that might need more careful checking and there are quite a few of them. We should probably try and get these classes of instruction though, not just the exact sizes.
	Yep, I agree with you. It is worth trying. After committing this patch, let's check it.
620	Yep, let me remove it.
llvm/test/CodeGen/AArch64/implicitly-set-zero-high-64-bits.ll
8	Yep, let me remove it.
20	You are right! Let me remove it.

jaykang10 updated this revision to Diff 509953. Mar 31 2023, 3:08 AM

Harbormaster completed remote builds in B222952: Diff 509953. Mar 31 2023, 6:20 AM

georges added a subscriber: georges. Mar 31 2023, 12:00 PM

Thanks. LGTM

This revision is now accepted and ready to land. Apr 3 2023, 1:13 AM

This revision was landed with ongoing or failed builds. Apr 3 2023, 2:59 AM

Closed by commit rG932911d6b10a: [AArch64] Remove redundant `mov 0` instruction for high 64-bits (authored by jaykang10).

This revision was automatically updated to reflect the committed changes.

jaykang10 added a commit: rG932911d6b10a: [AArch64] Remove redundant `mov 0` instruction for high 64-bits.

dmgreen mentioned this in D149616: [AArch64] Extend fp64 top zeroing peephole to all instructions. May 1 2023, 2:25 PM

dmgreen mentioned this in rG6e7840dd42d1: [AArch64] Extend fp64 top zeroing peephole to all instructions. May 5 2023, 9:27 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp	87 lines
llvm/test/CodeGen/AArch64/aarch64-neon-vector-insert-uaddlv.ll	48 lines
llvm/test/CodeGen/AArch64/implicitly-set-zero-high-64-bits.ll	34 lines
llvm/test/CodeGen/AArch64/peephole-insvigpr.mir	3 lines

Diff 510438

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp

Show All 40 Lines
//		//
// In cases where a source FPR is copied to a GPR in order to be copied		// In cases where a source FPR is copied to a GPR in order to be copied
// to a destination FPR, we can directly copy the values between the FPRs,		// to a destination FPR, we can directly copy the values between the FPRs,
// eliminating the use of the Integer unit. When we match a pattern of		// eliminating the use of the Integer unit. When we match a pattern of
// INSvi[X]gpr that is preceded by a chain of COPY instructions from a FPR		// INSvi[X]gpr that is preceded by a chain of COPY instructions from a FPR
// source, we use the INSvi[X]lane to replace the COPY & INSvi[X]gpr		// source, we use the INSvi[X]lane to replace the COPY & INSvi[X]gpr
// instructions.		// instructions.
//		//
		// 7. If MI sets zero for high 64-bits implicitly, remove `mov 0` for high
		// 64-bits. For example,
		//
		// %1:fpr64 = nofpexcept FCVTNv4i16 %0:fpr128, implicit $fpcr
		// %2:fpr64 = MOVID 0
		// %4:fpr128 = IMPLICIT_DEF
		// %3:fpr128 = INSERT_SUBREG %4:fpr128(tied-def 0), killed %2:fpr64, %subreg.dsub
		// %6:fpr128 = IMPLICIT_DEF
		// %5:fpr128 = INSERT_SUBREG %6:fpr128(tied-def 0), killed %1:fpr64, %subreg.dsub
		// %7:fpr128 = INSvi64lane %5:fpr128(tied-def 0), 1, killed %3:fpr128, 0
		// ==>
		// %1:fpr64 = nofpexcept FCVTNv4i16 %0:fpr128, implicit $fpcr
		// %6:fpr128 = IMPLICIT_DEF
		// %7:fpr128 = INSERT_SUBREG %6:fpr128(tied-def 0), killed %1:fpr64, %subreg.dsub
		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AArch64ExpandImm.h"		#include "AArch64ExpandImm.h"
#include "AArch64InstrInfo.h"		#include "AArch64InstrInfo.h"
#include "MCTargetDesc/AArch64AddressingModes.h"		#include "MCTargetDesc/AArch64AddressingModes.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineLoopInfo.h"		#include "llvm/CodeGen/MachineLoopInfo.h"

▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	struct AArch64MIPeepholeOpt : public MachineFunctionPass {
template <typename T>		template <typename T>
bool visitADDSSUBS(OpcodePair PosOpcs, OpcodePair NegOpcs, MachineInstr &MI);		bool visitADDSSUBS(OpcodePair PosOpcs, OpcodePair NegOpcs, MachineInstr &MI);

template <typename T>		template <typename T>
bool visitAND(unsigned Opc, MachineInstr &MI);		bool visitAND(unsigned Opc, MachineInstr &MI);
bool visitORR(MachineInstr &MI);		bool visitORR(MachineInstr &MI);
bool visitINSERT(MachineInstr &MI);		bool visitINSERT(MachineInstr &MI);
bool visitINSviGPR(MachineInstr &MI, unsigned Opc);		bool visitINSviGPR(MachineInstr &MI, unsigned Opc);
		bool visitINSvi64lane(MachineInstr &MI);
bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

StringRef getPassName() const override {		StringRef getPassName() const override {
return "AArch64 MI Peephole Optimization pass";		return "AArch64 MI Peephole Optimization pass";
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
▲ Show 20 Lines • Show All 465 Lines • ▼ Show 20 Lines	MachineInstr *INSvilaneMI =
.addImm(0);		.addImm(0);

LLVM_DEBUG(dbgs() << MI << " replace by:\n: " << *INSvilaneMI << "\n");		LLVM_DEBUG(dbgs() << MI << " replace by:\n: " << *INSvilaneMI << "\n");
(void)INSvilaneMI;		(void)INSvilaneMI;
MI.eraseFromParent();		MI.eraseFromParent();
return true;		return true;
}		}

		static bool is64bitDefwithZeroHigh64bit(MachineInstr *MI) {
		  // ToDo: check and add more MIs which set zero for high 64bits.
		  switch (MI->getOpcode()) {
		  default:
		    break;
		  case AArch64::FCVTNv2i32:
		  case AArch64::FCVTNv4i16:
		  case AArch64::RSHRNv2i32_shift:
		  case AArch64::RSHRNv4i16_shift:
		  case AArch64::RSHRNv8i8_shift:
		  case AArch64::SHRNv2i32_shift:
		  case AArch64::SHRNv4i16_shift:
		  case AArch64::SHRNv8i8_shift:
		    return true;
		  }

		  return false;
		}

		bool AArch64MIPeepholeOpt::visitINSvi64lane(MachineInstr &MI) {
		  // Check the MI for low 64-bits sets zero for high 64-bits implicitly.
		  // We are expecting below case.
		  //
		  //  %1:fpr64 = nofpexcept FCVTNv4i16 %0:fpr128, implicit $fpcr
		  //  %6:fpr128 = IMPLICIT_DEF
		  //  %5:fpr128 = INSERT_SUBREG %6:fpr128(tied-def 0), killed %1:fpr64, %subreg.dsub
		  //  %7:fpr128 = INSvi64lane %5:fpr128(tied-def 0), 1, killed %3:fpr128, 0
		  MachineInstr *Low64MI = MRI->getUniqueVRegDef(MI.getOperand(1).getReg());
		  if (Low64MI->getOpcode() != AArch64::INSERT_SUBREG)
		    return false;
		  Low64MI = MRI->getUniqueVRegDef(Low64MI->getOperand(2).getReg());
		  if (!is64bitDefwithZeroHigh64bit(Low64MI))
		    return false;

		  // Check there is `mov 0` MI for high 64-bits.
		  // We are expecting below cases.
		  //
		  //  %2:fpr64 = MOVID 0
		  //  %4:fpr128 = IMPLICIT_DEF
		  //  %3:fpr128 = INSERT_SUBREG %4:fpr128(tied-def 0), killed %2:fpr64, %subreg.dsub
		  //  %7:fpr128 = INSvi64lane %5:fpr128(tied-def 0), 1, killed %3:fpr128, 0
		  // or
		  //  %5:fpr128 = MOVIv2d_ns 0
		  //  %6:fpr64 = COPY %5.dsub:fpr128
		  //  %8:fpr128 = IMPLICIT_DEF
		  //  %7:fpr128 = INSERT_SUBREG %8:fpr128(tied-def 0), killed %6:fpr64, %subreg.dsub
		  //  %11:fpr128 = INSvi64lane %9:fpr128(tied-def 0), 1, killed %7:fpr128, 0
		  MachineInstr *High64MI = MRI->getUniqueVRegDef(MI.getOperand(3).getReg());
		  if (High64MI->getOpcode() != AArch64::INSERT_SUBREG)
		    return false;
		  High64MI = MRI->getUniqueVRegDef(High64MI->getOperand(2).getReg());
		  if (High64MI->getOpcode() == TargetOpcode::COPY)
		    High64MI = MRI->getUniqueVRegDef(High64MI->getOperand(1).getReg());
		  if (High64MI->getOpcode() != AArch64::MOVID &&
		      High64MI->getOpcode() != AArch64::MOVIv2d_ns)
		    return false;
		  if (High64MI->getOperand(1).getImm() != 0)
		    return false;

		  // Let's remove MIs for high 64-bits.
		  Register OldDef = MI.getOperand(0).getReg();
		  Register NewDef = MI.getOperand(1).getReg();
		  MRI->replaceRegWith(OldDef, NewDef);
		  MI.eraseFromParent();

		  return true;
		}

bool AArch64MIPeepholeOpt::runOnMachineFunction(MachineFunction &MF) {		bool AArch64MIPeepholeOpt::runOnMachineFunction(MachineFunction &MF) {
if (skipFunction(MF.getFunction()))		if (skipFunction(MF.getFunction()))
return false;		return false;

TII = static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());		TII = static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
TRI = static_cast<const AArch64RegisterInfo *>(		TRI = static_cast<const AArch64RegisterInfo *>(
MF.getSubtarget().getRegisterInfo());		MF.getSubtarget().getRegisterInfo());
MLI = &getAnalysis<MachineLoopInfo>();		MLI = &getAnalysis<MachineLoopInfo>();
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	for (MachineInstr &MI : make_early_inc_range(MBB)) {
Changed = visitINSviGPR(MI, AArch64::INSvi32lane);		Changed = visitINSviGPR(MI, AArch64::INSvi32lane);
break;		break;
case AArch64::INSvi16gpr:		case AArch64::INSvi16gpr:
Changed = visitINSviGPR(MI, AArch64::INSvi16lane);		Changed = visitINSviGPR(MI, AArch64::INSvi16lane);
break;		break;
case AArch64::INSvi8gpr:		case AArch64::INSvi8gpr:
Changed = visitINSviGPR(MI, AArch64::INSvi8lane);		Changed = visitINSviGPR(MI, AArch64::INSvi8lane);
break;		break;
		case AArch64::INSvi64lane:
		Changed = visitINSvi64lane(MI);
		break;
}		}
}		}
}		}

return Changed;		return Changed;
}		}

FunctionPass *llvm::createAArch64MIPeepholeOptPass() {		FunctionPass *llvm::createAArch64MIPeepholeOptPass() {
return new AArch64MIPeepholeOpt();		return new AArch64MIPeepholeOpt();
}		}
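The matching logic in visitINSvi64lane can be sketched outside LLVM on a toy instruction list. Everything below is an assumption made for illustration: `Op`, `Inst`, `Block`, `uniqueDef`, `low64Producer`, and `foldRedundantZeroInsert` are hypothetical names, not LLVM APIs. The sketch omits the COPY-from-MOVIv2d_ns variant that the patch also accepts, and unlike the patch it checks for missing definitions before dereferencing them.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified opcodes; these are not LLVM's real opcode values.
enum class Op { NarrowTo64, MovZero64, ImplicitDef, InsertLow64, InsLane1From0, Other };

struct Inst {
  Op op;
  int def;                // virtual register written, or -1 if none
  std::vector<int> uses;  // virtual registers read, in operand order
};

using Block = std::vector<Inst>;

// Returns the single instruction defining `reg`, or nullptr if there is not
// exactly one such instruction.
const Inst *uniqueDef(const Block &bb, int reg) {
  const Inst *found = nullptr;
  for (const Inst &inst : bb) {
    if (inst.def != reg)
      continue;
    if (found)
      return nullptr;
    found = &inst;
  }
  return found;
}

// Looks through `vec = InsertLow64(undef, value)` and returns the definition
// of `value`, or nullptr if the shape does not match.
const Inst *low64Producer(const Block &bb, int vecReg) {
  const Inst *ins = uniqueDef(bb, vecReg);
  if (!ins || ins->op != Op::InsertLow64 || ins->uses.size() != 2)
    return nullptr;
  return uniqueDef(bb, ins->uses[1]);
}

// If bb[idx] is `d = InsLane1From0(low, high)`, the 64-bit producer of `low`
// already zeroes the upper half, and `high` comes from a zero constant, then
// rewrite every use of `d` to `low` and erase the insert.
bool foldRedundantZeroInsert(Block &bb, std::size_t idx) {
  if (idx >= bb.size())
    return false;
  const Inst ins = bb[idx];  // copy, because bb is mutated below
  if (ins.op != Op::InsLane1From0 || ins.uses.size() != 2)
    return false;
  const Inst *low = low64Producer(bb, ins.uses[0]);
  const Inst *high = low64Producer(bb, ins.uses[1]);
  if (!low || low->op != Op::NarrowTo64)
    return false;
  if (!high || high->op != Op::MovZero64)
    return false;

  for (Inst &user : bb)
    for (int &reg : user.uses)
      if (reg == ins.def)
        reg = ins.uses[0];
  bb.erase(bb.begin() + static_cast<std::ptrdiff_t>(idx));
  return true;
}

// Mirrors the shape of the MIR example in the file header comment.
Block exampleBlock() {
  return {
      {Op::NarrowTo64, 1, {0}},        // %1 = narrowing op on %0
      {Op::MovZero64, 2, {}},          // %2 = 0
      {Op::ImplicitDef, 4, {}},        // %4 = undef
      {Op::InsertLow64, 3, {4, 2}},    // %3 = %2 placed in low half of %4
      {Op::ImplicitDef, 6, {}},        // %6 = undef
      {Op::InsertLow64, 5, {6, 1}},    // %5 = %1 placed in low half of %6
      {Op::InsLane1From0, 7, {5, 3}},  // %7 = %5 with lane 1 taken from %3
      {Op::Other, -1, {7}},            // some user of %7, such as a store
  };
}
```

As in the patch, only the lane insert is erased here. The zero constant and its INSERT_SUBREG stay in the block and become dead, which matches the peephole-insvigpr.mir expectations where the MOVID is still present after the pass.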

llvm/test/CodeGen/AArch64/aarch64-neon-vector-insert-uaddlv.ll

Show First 20 Lines • Show All 140 Lines • ▼ Show 20 Lines	entry:
%2 = uitofp <2 x i32> %1 to <2 x float>		%2 = uitofp <2 x i32> %1 to <2 x float>
store <2 x float> %2, ptr %0, align 8		store <2 x float> %2, ptr %0, align 8
ret void		ret void
}		}

define void @insert_vec_v6i64_uaddlv_from_v4i32(ptr %0) {		define void @insert_vec_v6i64_uaddlv_from_v4i32(ptr %0) {
; CHECK-LABEL: insert_vec_v6i64_uaddlv_from_v4i32:		; CHECK-LABEL: insert_vec_v6i64_uaddlv_from_v4i32:
; CHECK: ; %bb.0: ; %entry		; CHECK: ; %bb.0: ; %entry
; CHECK-NEXT: movi.2d v1, #0000000000000000		; CHECK-NEXT: movi.2d v0, #0000000000000000
; CHECK-NEXT: movi d0, #0000000000000000		; CHECK-NEXT: movi.2d v2, #0000000000000000
; CHECK-NEXT: movi.2d v3, #0000000000000000		; CHECK-NEXT: uaddlv.4s d1, v0
; CHECK-NEXT: uaddlv.4s d2, v1		; CHECK-NEXT: str d2, [x0, #16]
; CHECK-NEXT: str d3, [x0, #16]		; CHECK-NEXT: mov.d v0[0], v1[0]
; CHECK-NEXT: mov.d v1[0], v2[0]		; CHECK-NEXT: ucvtf.2d v0, v0
; CHECK-NEXT: ucvtf.2d v1, v1		; CHECK-NEXT: fcvtn v0.2s, v0.2d
; CHECK-NEXT: fcvtn v1.2s, v1.2d		; CHECK-NEXT: str q0, [x0]
; CHECK-NEXT: mov.d v1[1], v0[0]
; CHECK-NEXT: str q1, [x0]
; CHECK-NEXT: ret		; CHECK-NEXT: ret

entry:		entry:
%vaddlv = tail call i64 @llvm.aarch64.neon.uaddlv.i64.v4i32(<4 x i32> zeroinitializer)		%vaddlv = tail call i64 @llvm.aarch64.neon.uaddlv.i64.v4i32(<4 x i32> zeroinitializer)
%1 = insertelement <6 x i64> zeroinitializer, i64 %vaddlv, i64 0		%1 = insertelement <6 x i64> zeroinitializer, i64 %vaddlv, i64 0
%2 = uitofp <6 x i64> %1 to <6 x float>		%2 = uitofp <6 x i64> %1 to <6 x float>
store <6 x float> %2, ptr %0, align 8		store <6 x float> %2, ptr %0, align 8
ret void		ret void
Show All 16 Lines	entry:
%2 = uitofp <2 x i64> %1 to <2 x float>		%2 = uitofp <2 x i64> %1 to <2 x float>
store <2 x float> %2, ptr %0, align 8		store <2 x float> %2, ptr %0, align 8
ret void		ret void
}		}

define void @insert_vec_v5i64_uaddlv_from_v4i32(ptr %0) {		define void @insert_vec_v5i64_uaddlv_from_v4i32(ptr %0) {
; CHECK-LABEL: insert_vec_v5i64_uaddlv_from_v4i32:		; CHECK-LABEL: insert_vec_v5i64_uaddlv_from_v4i32:
; CHECK: ; %bb.0: ; %entry		; CHECK: ; %bb.0: ; %entry
; CHECK-NEXT: movi.2d v1, #0000000000000000		; CHECK-NEXT: movi.2d v0, #0000000000000000
; CHECK-NEXT: str wzr, [x0, #16]		; CHECK-NEXT: str wzr, [x0, #16]
; CHECK-NEXT: movi d0, #0000000000000000		; CHECK-NEXT: uaddlv.4s d1, v0
; CHECK-NEXT: uaddlv.4s d2, v1		; CHECK-NEXT: mov.d v0[0], v1[0]
; CHECK-NEXT: mov.d v1[0], v2[0]		; CHECK-NEXT: ucvtf.2d v0, v0
; CHECK-NEXT: ucvtf.2d v1, v1		; CHECK-NEXT: fcvtn v0.2s, v0.2d
; CHECK-NEXT: fcvtn v1.2s, v1.2d		; CHECK-NEXT: str q0, [x0]
; CHECK-NEXT: mov.d v1[1], v0[0]
; CHECK-NEXT: str q1, [x0]
; CHECK-NEXT: ret		; CHECK-NEXT: ret

entry:		entry:
%vaddlv = tail call i64 @llvm.aarch64.neon.uaddlv.i64.v4i32(<4 x i32> zeroinitializer)		%vaddlv = tail call i64 @llvm.aarch64.neon.uaddlv.i64.v4i32(<4 x i32> zeroinitializer)
%1 = insertelement <5 x i64> zeroinitializer, i64 %vaddlv, i64 0		%1 = insertelement <5 x i64> zeroinitializer, i64 %vaddlv, i64 0
%2 = uitofp <5 x i64> %1 to <5 x float>		%2 = uitofp <5 x i64> %1 to <5 x float>
store <5 x float> %2, ptr %0, align 8		store <5 x float> %2, ptr %0, align 8
ret void		ret void
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	entry:
%3 = uitofp <3 x i16> %2 to <3 x float>		%3 = uitofp <3 x i16> %2 to <3 x float>
store <3 x float> %3, ptr %0, align 8		store <3 x float> %3, ptr %0, align 8
ret void		ret void
}		}

define void @insert_vec_v16i64_uaddlv_from_v4i16(ptr %0) {		define void @insert_vec_v16i64_uaddlv_from_v4i16(ptr %0) {
; CHECK-LABEL: insert_vec_v16i64_uaddlv_from_v4i16:		; CHECK-LABEL: insert_vec_v16i64_uaddlv_from_v4i16:
; CHECK: ; %bb.0: ; %entry		; CHECK: ; %bb.0: ; %entry
		; CHECK-NEXT: movi.2d v0, #0000000000000000
; CHECK-NEXT: movi.2d v1, #0000000000000000		; CHECK-NEXT: movi.2d v1, #0000000000000000
; CHECK-NEXT: movi d0, #0000000000000000		; CHECK-NEXT: uaddlv.4h s2, v0
; CHECK-NEXT: movi.2d v2, #0000000000000000		; CHECK-NEXT: stp q0, q0, [x0, #32]
; CHECK-NEXT: uaddlv.4h s3, v1		; CHECK-NEXT: mov.s v1[0], v2[0]
; CHECK-NEXT: stp q1, q1, [x0, #32]		; CHECK-NEXT: ucvtf.2d v1, v1
; CHECK-NEXT: mov.s v2[0], v3[0]		; CHECK-NEXT: fcvtn v1.2s, v1.2d
; CHECK-NEXT: ucvtf.2d v2, v2		; CHECK-NEXT: stp q1, q0, [x0]
; CHECK-NEXT: fcvtn v2.2s, v2.2d
; CHECK-NEXT: mov.d v2[1], v0[0]
; CHECK-NEXT: stp q2, q1, [x0]
; CHECK-NEXT: ret		; CHECK-NEXT: ret

entry:		entry:
%vaddlv = tail call i32 @llvm.aarch64.neon.uaddlv.i32.v4i16(<4 x i16> zeroinitializer)		%vaddlv = tail call i32 @llvm.aarch64.neon.uaddlv.i32.v4i16(<4 x i16> zeroinitializer)
%1 = zext i32 %vaddlv to i64		%1 = zext i32 %vaddlv to i64
%2 = insertelement <16 x i64> zeroinitializer, i64 %1, i64 0		%2 = insertelement <16 x i64> zeroinitializer, i64 %1, i64 0
%3 = uitofp <16 x i64> %2 to <16 x float>		%3 = uitofp <16 x i64> %2 to <16 x float>
store <16 x float> %3, ptr %0, align 8		store <16 x float> %3, ptr %0, align 8

llvm/test/CodeGen/AArch64/implicitly-set-zero-high-64-bits.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
				; RUN: llc -verify-machineinstrs -o - %s -mtriple=aarch64-linux-gnu \| FileCheck %s

				declare <4 x i16> @llvm.aarch64.neon.vcvtfp2hf(<4 x float>) #2
				declare <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8>, <8 x i8>) #2

				define <8 x half> @test1(<4 x float> noundef %a) {
				; CHECK-LABEL: test1:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fcvtn v0.4h, v0.4s
				; CHECK-NEXT: ret
				entry:
				%vcvt_f16_f321.i = tail call <4 x i16> @llvm.aarch64.neon.vcvtfp2hf(<4 x float> %a)
				%0 = bitcast <4 x i16> %vcvt_f16_f321.i to <4 x half>
				%shuffle.i = shufflevector <4 x half> %0, <4 x half> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
				ret <8 x half> %shuffle.i
				}

				define <8 x i8> @test2(ptr nocapture noundef readonly %in, <8 x i8> noundef %idx) {
				; CHECK-LABEL: test2:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr q1, [x0]
				; CHECK-NEXT: shrn v1.8b, v1.8h, #4
				; CHECK-NEXT: tbl v0.8b, { v1.16b }, v0.8b
				; CHECK-NEXT: ret
				entry:
				%0 = load <8 x i16>, ptr %in, align 2
				%1 = lshr <8 x i16> %0, <i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4>
				%vshrn_n = trunc <8 x i16> %1 to <8 x i8>
				%vtbl1.i = shufflevector <8 x i8> %vshrn_n, <8 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
				%vtbl11.i = tail call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> %vtbl1.i, <8 x i8> %idx)
				ret <8 x i8> %vtbl11.i
				}
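Both tests build the 128-bit value with a `shufflevector` whose second operand is `zeroinitializer` and whose mask is the identity sequence, which is how "concatenate with zero" appears in IR. A minimal model of that selection rule is sketched below. `shuffleVector` and `concatWithZero` are hypothetical helper names, and the sketch assumes every mask index is smaller than twice the input length (no undef or poison mask elements).

```cpp
#include <array>
#include <cstddef>

// Models `shufflevector a, b, mask` for two N-element inputs: a mask index
// below N selects from a, and an index of N or more selects from b.
template <typename T, std::size_t N, std::size_t M>
std::array<T, M> shuffleVector(const std::array<T, N> &a,
                               const std::array<T, N> &b,
                               const std::array<std::size_t, M> &mask) {
  std::array<T, M> out{};
  for (std::size_t i = 0; i < M; ++i)
    out[i] = mask[i] < N ? a[mask[i]] : b[mask[i] - N];
  return out;
}

// The shape used by test1 and test2: identity mask 0..2N-1 with an all-zero
// second operand, so the low half is `low` and the high half is zero.
template <typename T, std::size_t N>
std::array<T, 2 * N> concatWithZero(const std::array<T, N> &low) {
  std::array<std::size_t, 2 * N> mask{};
  for (std::size_t i = 0; i < 2 * N; ++i)
    mask[i] = i;
  return shuffleVector(low, std::array<T, N>{}, mask);
}
```

With a 4-element input, `concatWithZero` yields an 8-element result whose last four elements are zero. That zero upper half is the part the narrowing instruction already provides, so the peephole can drop the explicit insert.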

llvm/test/CodeGen/AArch64/peephole-insvigpr.mir

Show First 20 Lines • Show All 73 Lines • ▼ Show 20 Lines	bb.0.entry:
; CHECK-NEXT: [[INSvi64lane:%[0-9]+]]:fpr128 = INSvi64lane [[MOVIv2d_ns]], 0, [[INSERT_SUBREG]], 0		; CHECK-NEXT: [[INSvi64lane:%[0-9]+]]:fpr128 = INSvi64lane [[MOVIv2d_ns]], 0, [[INSERT_SUBREG]], 0
; CHECK-NEXT: [[MOVID:%[0-9]+]]:fpr64 = MOVID 0		; CHECK-NEXT: [[MOVID:%[0-9]+]]:fpr64 = MOVID 0
; CHECK-NEXT: [[DEF1:%[0-9]+]]:fpr128 = IMPLICIT_DEF		; CHECK-NEXT: [[DEF1:%[0-9]+]]:fpr128 = IMPLICIT_DEF
; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:fpr128 = INSERT_SUBREG [[DEF1]], killed [[MOVID]], %subreg.dsub		; CHECK-NEXT: [[INSERT_SUBREG1:%[0-9]+]]:fpr128 = INSERT_SUBREG [[DEF1]], killed [[MOVID]], %subreg.dsub
; CHECK-NEXT: [[UCVTFv2f64_:%[0-9]+]]:fpr128 = nofpexcept UCVTFv2f64 killed [[INSvi64lane]], implicit $fpcr		; CHECK-NEXT: [[UCVTFv2f64_:%[0-9]+]]:fpr128 = nofpexcept UCVTFv2f64 killed [[INSvi64lane]], implicit $fpcr
; CHECK-NEXT: [[FCVTNv2i32_:%[0-9]+]]:fpr64 = nofpexcept FCVTNv2i32 killed [[UCVTFv2f64_]], implicit $fpcr		; CHECK-NEXT: [[FCVTNv2i32_:%[0-9]+]]:fpr64 = nofpexcept FCVTNv2i32 killed [[UCVTFv2f64_]], implicit $fpcr
; CHECK-NEXT: [[DEF2:%[0-9]+]]:fpr128 = IMPLICIT_DEF		; CHECK-NEXT: [[DEF2:%[0-9]+]]:fpr128 = IMPLICIT_DEF
; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:fpr128 = INSERT_SUBREG [[DEF2]], killed [[FCVTNv2i32_]], %subreg.dsub		; CHECK-NEXT: [[INSERT_SUBREG2:%[0-9]+]]:fpr128 = INSERT_SUBREG [[DEF2]], killed [[FCVTNv2i32_]], %subreg.dsub
; CHECK-NEXT: [[INSvi64lane1:%[0-9]+]]:fpr128 = INSvi64lane [[INSERT_SUBREG2]], 1, killed [[INSERT_SUBREG1]], 0
; CHECK-NEXT: [[COPY2:%[0-9]+]]:fpr64 = COPY [[MOVIv2d_ns]].dsub		; CHECK-NEXT: [[COPY2:%[0-9]+]]:fpr64 = COPY [[MOVIv2d_ns]].dsub
; CHECK-NEXT: STRDui killed [[COPY2]], [[COPY]], 2 :: (store (s64) into %ir.0 + 16)		; CHECK-NEXT: STRDui killed [[COPY2]], [[COPY]], 2 :: (store (s64) into %ir.0 + 16)
; CHECK-NEXT: STRQui killed [[INSvi64lane1]], [[COPY]], 0 :: (store (s128) into %ir.0, align 8)		; CHECK-NEXT: STRQui killed [[INSERT_SUBREG2]], [[COPY]], 0 :: (store (s128) into %ir.0, align 8)
; CHECK-NEXT: RET_ReallyLR		; CHECK-NEXT: RET_ReallyLR
%0:gpr64common = COPY $x0		%0:gpr64common = COPY $x0
%1:fpr128 = MOVIv2d_ns 0		%1:fpr128 = MOVIv2d_ns 0
%2:fpr64 = UADDLVv4i32v %1		%2:fpr64 = UADDLVv4i32v %1
%4:fpr128 = IMPLICIT_DEF		%4:fpr128 = IMPLICIT_DEF
%3:fpr128 = INSERT_SUBREG %4, killed %2, %subreg.dsub		%3:fpr128 = INSERT_SUBREG %4, killed %2, %subreg.dsub
%5:gpr64 = COPY %3.dsub		%5:gpr64 = COPY %3.dsub
%7:fpr128 = INSvi64gpr %1, 0, killed %5		%7:fpr128 = INSvi64gpr %1, 0, killed %5