This is an archive of the discontinued LLVM Phabricator instance.

Differential D59655

[AArch64] Split the neon.addp intrinsic into integer and fp variants
Closed, Public

Authored by aemerson on Mar 21 2019, 10:55 AM.

Details

Reviewers

paquette
eli.friedman
t.p.northover
efriedma

Commits

rGc10b24691a02: [AArch64] Split the neon.addp intrinsic into integer and fp variants.
rC356722: [AArch64] Split the neon.addp intrinsic into integer and fp variants.
rL356722: [AArch64] Split the neon.addp intrinsic into integer and fp variants.

Summary

This is the result of discussions on the list about how to deal with intrinsics that rely on codegen to disambiguate them solely through their integer/fp overloads. This causes problems for GlobalISel because some of that type information is lost during translation, whereas for other operations, such as IR instructions, the information is encoded in the instruction opcode.
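The loss of information described above can be illustrated with a small standalone model. This is only a sketch: `IRVectorType`, `LowLevelType`, `translate`, and `describe` are hypothetical names invented for illustration and are not LLVM APIs. It assumes, as the `<2 x s32>` operands in the deleted MIR test suggest, that the translated type records only lane count and element width.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an IR vector type: it still knows whether the
// elements are floating point.
struct IRVectorType {
  unsigned NumElts;
  unsigned EltBits;
  bool IsFloatingPoint;
};

// Hypothetical stand-in for a translated low-level type: lane count and
// element width only, with no int/fp flag.
struct LowLevelType {
  unsigned NumElts;
  unsigned EltBits;
  bool operator==(const LowLevelType &Other) const {
    return NumElts == Other.NumElts && EltBits == Other.EltBits;
  }
};

// Translation keeps the shape and drops the int/fp distinction.
LowLevelType translate(const IRVectorType &Ty) {
  assert(Ty.NumElts > 0 && "expected a non-empty vector");
  return LowLevelType{Ty.NumElts, Ty.EltBits};
}

// Prints the type in the "<N x sB>" style used by the MIR test in this review.
std::string describe(const LowLevelType &Ty) {
  return "<" + std::to_string(Ty.NumElts) + " x s" +
         std::to_string(Ty.EltBits) + ">";
}
```

In this model a `<2 x float>` and a `<2 x i32>` operand translate to the same value, so an intrinsic overloaded only on the element kind cannot be told apart afterwards.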

This patch changes clang to emit the new faddp intrinsic if the vector operands to the builtin have FP element types. LLVM IR AutoUpgrade has been taught to upgrade existing calls to aarch64.neon.addp with fp vector arguments, and we remove the workarounds introduced for GlobalISel in r355865.
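The selection rule on the clang side can be sketched without any LLVM dependency. The enum `PairwiseAdd` and the helpers `selectPairwiseAdd` and `intrinsicName` below are hypothetical and exist only for illustration; the intrinsic name strings are the ones that appear in the tests of this revision.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum standing in for the two intrinsic IDs.
enum class PairwiseAdd { Addp, Faddp };

// FP element types select the new faddp intrinsic; integers keep addp.
PairwiseAdd selectPairwiseAdd(bool ElementIsFloatingPoint) {
  return ElementIsFloatingPoint ? PairwiseAdd::Faddp : PairwiseAdd::Addp;
}

// Builds the overloaded intrinsic name for a type suffix such as "v2f32".
std::string intrinsicName(PairwiseAdd Kind, const std::string &TypeSuffix) {
  assert(!TypeSuffix.empty() && "expected a type suffix such as v2f32");
  const char *Base = Kind == PairwiseAdd::Faddp ? "llvm.aarch64.neon.faddp."
                                                : "llvm.aarch64.neon.addp.";
  return Base + TypeSuffix;
}
```

With this rule the int/fp decision is made once, while the element type is still known, and is then carried in the intrinsic name rather than in the operand types.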

This is a more permanent solution to PR40968.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aemerson created this revision. Mar 21 2019, 10:55 AM

Herald added subscribers: Petar.Avramovic, hiraditya, kristof.beyls, javed.absar. Mar 21 2019, 10:55 AM

Minor test tweak.

I've put up a langref change as a separate review: D59657

efriedma added a subscriber: efriedma. Mar 21 2019, 12:14 PM

efriedma added inline comments.

llvm/lib/IR/AutoUpgrade.cpp
Line 574: This code is weird... you're computing the types in two different ways. Also, missing a check for F->arg_size() (so we don't crash on invalid IR).

aemerson marked an inline comment as done. Mar 21 2019, 1:00 PM

aemerson added inline comments.

llvm/lib/IR/AutoUpgrade.cpp
Line 574: I'll consolidate the logic, but none of the other code here checks for IR validity. By the time we reach here the IR should be valid; we're just translating it to a newer version. I can put an assert anyway.

The IR at this

llvm/lib/IR/AutoUpgrade.cpp
Line 574: The IR during autoupgrade should be loosely "valid", to the point of passing the asm/bitcode parser, but we don't check the signature of intrinsics until later, so someone could write `declare <4 x float> @llvm.aarch64.neon.addp.v4f32()`.

Simplify logic and don't try to upgrade if IR is invalid.
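One way to picture "don't try to upgrade if IR is invalid" is the standalone sketch below. It is not the committed code: `ParamType`, `Declaration`, and `upgradeAddp` are hypothetical toy types and functions, and the bool-plus-out-parameter shape is only an assumption loosely modeled on how the diff assigns `NewFn` and returns `true`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy parameter type: only the two facts the check needs.
struct ParamType {
  bool IsVector;
  bool ElementIsFloatingPoint;
};

// Toy intrinsic declaration: name without the "llvm." prefix, plus parameters.
struct Declaration {
  std::string Name;
  std::vector<ParamType> Params;
};

// Returns true and sets NewName when the declaration should be upgraded.
bool upgradeAddp(const Declaration &D, std::string &NewName) {
  const std::string OldPrefix = "aarch64.neon.addp";
  if (D.Name.compare(0, OldPrefix.size(), OldPrefix) != 0)
    return false;
  // Malformed declaration, e.g. no parameters: leave it for later
  // verification instead of reading Params[0].
  if (D.Params.size() != 2)
    return false;
  const ParamType &First = D.Params[0];
  if (!First.IsVector || !First.ElementIsFloatingPoint)
    return false; // Integer addp keeps its name.
  NewName = "aarch64.neon.faddp" + D.Name.substr(OldPrefix.size());
  return true;
}
```

The parameter-count check comes before any access to the first parameter, which is the ordering efriedma's zero-argument `declare` example calls for.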

LGTM

This revision is now accepted and ready to land. Mar 21 2019, 2:04 PM

Closed by commit rL356722: [AArch64] Split the neon.addp intrinsic into integer and fp variants. (authored by aemerson). Mar 21 2019, 3:30 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: Size

clang/lib/CodeGen/CGBuiltin.cpp: 6 lines
clang/test/CodeGen/aarch64-neon-intrinsics.c: 6 lines
clang/test/CodeGen/aarch64-v8.2a-neon-intrinsics.c: 4 lines
llvm/include/llvm/IR/IntrinsicsAArch64.td: 1 line
llvm/lib/IR/AutoUpgrade.cpp: 11 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td: 2 lines
llvm/lib/Target/AArch64/AArch64LegalizerInfo.h: 3 lines
llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp: 24 lines
llvm/test/CodeGen/AArch64/GlobalISel/fallback-ambiguous-addp-intrinsic.mir: (no size listed)
llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir: 2 lines
llvm/test/CodeGen/AArch64/arm64-neon-add-pairwise.ll: 12 lines
llvm/test/CodeGen/AArch64/arm64-vadd.ll: 12 lines
llvm/test/CodeGen/AArch64/autoupgrade-aarch64-neon-addp-float.ll: 9 lines

Diff 191745

clang/lib/CodeGen/CGBuiltin.cpp

@@ ... @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
   };

   unsigned Int = LLVMIntrinsic;
   if ((Modifier & UnsignedAlts) && !Usgn)
     Int = AltLLVMIntrinsic;

   switch (BuiltinID) {
   default: break;
+  case NEON::BI__builtin_neon_vpadd_v:
+  case NEON::BI__builtin_neon_vpaddq_v:
+    // We don't allow fp/int overloading of intrinsics.
+    if (VTy->getElementType()->isFloatingPointTy())
+      Int = Intrinsic::aarch64_neon_faddp;
+    break;
   case NEON::BI__builtin_neon_vabs_v:
   case NEON::BI__builtin_neon_vabsq_v:
     if (VTy->getElementType()->isFloatingPointTy())
       return EmitNeonCall(CGM.getIntrinsic(Intrinsic::fabs, Ty), Ops, "vabs");
     return EmitNeonCall(CGM.getIntrinsic(LLVMIntrinsic, Ty), Ops, "vabs");
   case NEON::BI__builtin_neon_vaddhn_v: {
     llvm::VectorType *SrcTy =
         llvm::VectorType::getExtendedElementVectorType(VTy);

clang/test/CodeGen/aarch64-neon-intrinsics.c

@@ ... @@
 // CHECK: ret <2 x i32> [[VPADD_V2_I]]
 uint32x2_t test_vpadd_u32(uint32x2_t a, uint32x2_t b) {
   return vpadd_u32(a, b);
 }

 // CHECK-LABEL: @test_vpadd_f32(
 // CHECK: [[TMP0:%.*]] = bitcast <2 x float> %a to <8 x i8>
 // CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>
-// CHECK: [[VPADD_V2_I:%.*]] = call <2 x float> @llvm.aarch64.neon.addp.v2f32(<2 x float> %a, <2 x float> %b)
+// CHECK: [[VPADD_V2_I:%.*]] = call <2 x float> @llvm.aarch64.neon.faddp.v2f32(<2 x float> %a, <2 x float> %b)
 // CHECK: [[VPADD_V3_I:%.*]] = bitcast <2 x float> [[VPADD_V2_I]] to <8 x i8>
 // CHECK: ret <2 x float> [[VPADD_V2_I]]
 float32x2_t test_vpadd_f32(float32x2_t a, float32x2_t b) {
   return vpadd_f32(a, b);
 }

 // CHECK-LABEL: @test_vpaddq_s8(
 // CHECK: [[VPADDQ_V_I:%.*]] = call <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8> %a, <16 x i8> %b)
@@ ... @@
 // CHECK: ret <4 x i32> [[VPADDQ_V2_I]]
 uint32x4_t test_vpaddq_u32(uint32x4_t a, uint32x4_t b) {
   return vpaddq_u32(a, b);
 }

 // CHECK-LABEL: @test_vpaddq_f32(
 // CHECK: [[TMP0:%.*]] = bitcast <4 x float> %a to <16 x i8>
 // CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>
-// CHECK: [[VPADDQ_V2_I:%.*]] = call <4 x float> @llvm.aarch64.neon.addp.v4f32(<4 x float> %a, <4 x float> %b)
+// CHECK: [[VPADDQ_V2_I:%.*]] = call <4 x float> @llvm.aarch64.neon.faddp.v4f32(<4 x float> %a, <4 x float> %b)
 // CHECK: [[VPADDQ_V3_I:%.*]] = bitcast <4 x float> [[VPADDQ_V2_I]] to <16 x i8>
 // CHECK: ret <4 x float> [[VPADDQ_V2_I]]
 float32x4_t test_vpaddq_f32(float32x4_t a, float32x4_t b) {
   return vpaddq_f32(a, b);
 }

 // CHECK-LABEL: @test_vpaddq_f64(
 // CHECK: [[TMP0:%.*]] = bitcast <2 x double> %a to <16 x i8>
 // CHECK: [[TMP1:%.*]] = bitcast <2 x double> %b to <16 x i8>
-// CHECK: [[VPADDQ_V2_I:%.*]] = call <2 x double> @llvm.aarch64.neon.addp.v2f64(<2 x double> %a, <2 x double> %b)
+// CHECK: [[VPADDQ_V2_I:%.*]] = call <2 x double> @llvm.aarch64.neon.faddp.v2f64(<2 x double> %a, <2 x double> %b)
 // CHECK: [[VPADDQ_V3_I:%.*]] = bitcast <2 x double> [[VPADDQ_V2_I]] to <16 x i8>
 // CHECK: ret <2 x double> [[VPADDQ_V2_I]]
 float64x2_t test_vpaddq_f64(float64x2_t a, float64x2_t b) {
   return vpaddq_f64(a, b);
 }

 // CHECK-LABEL: @test_vqdmulh_s16(
 // CHECK: [[TMP0:%.*]] = bitcast <4 x i16> %a to <8 x i8>

clang/test/CodeGen/aarch64-v8.2a-neon-intrinsics.c

@@ ... @@
 // CHECK-LABEL: test_vmulxq_f16
 // CHECK: [[MUL:%.*]] = call <8 x half> @llvm.aarch64.neon.fmulx.v8f16(<8 x half> %a, <8 x half> %b)
 // CHECK: ret <8 x half> [[MUL]]
 float16x8_t test_vmulxq_f16(float16x8_t a, float16x8_t b) {
   return vmulxq_f16(a, b);
 }

 // CHECK-LABEL: test_vpadd_f16
-// CHECK: [[ADD:%.*]] = call <4 x half> @llvm.aarch64.neon.addp.v4f16(<4 x half> %a, <4 x half> %b)
+// CHECK: [[ADD:%.*]] = call <4 x half> @llvm.aarch64.neon.faddp.v4f16(<4 x half> %a, <4 x half> %b)
 // CHECK: ret <4 x half> [[ADD]]
 float16x4_t test_vpadd_f16(float16x4_t a, float16x4_t b) {
   return vpadd_f16(a, b);
 }

 // CHECK-LABEL: test_vpaddq_f16
-// CHECK: [[ADD:%.*]] = call <8 x half> @llvm.aarch64.neon.addp.v8f16(<8 x half> %a, <8 x half> %b)
+// CHECK: [[ADD:%.*]] = call <8 x half> @llvm.aarch64.neon.faddp.v8f16(<8 x half> %a, <8 x half> %b)
 // CHECK: ret <8 x half> [[ADD]]
 float16x8_t test_vpaddq_f16(float16x8_t a, float16x8_t b) {
   return vpaddq_f16(a, b);
 }

 // CHECK-LABEL: test_vpmax_f16
 // CHECK: [[MAX:%.*]] = call <4 x half> @llvm.aarch64.neon.fmaxp.v4f16(<4 x half> %a, <4 x half> %b)
 // CHECK: ret <4 x half> [[MAX]]

llvm/include/llvm/IR/IntrinsicsAArch64.td

@@ ... @@ let TargetPrefix = "aarch64", IntrProperties = [IntrNoMem] in {
   // Vector Min Across Lanes
   def int_aarch64_neon_sminv : AdvSIMD_1VectorArg_Int_Across_Intrinsic;
   def int_aarch64_neon_uminv : AdvSIMD_1VectorArg_Int_Across_Intrinsic;
   def int_aarch64_neon_fminv : AdvSIMD_1VectorArg_Float_Across_Intrinsic;
   def int_aarch64_neon_fminnmv : AdvSIMD_1VectorArg_Float_Across_Intrinsic;

   // Pairwise Add
   def int_aarch64_neon_addp : AdvSIMD_2VectorArg_Intrinsic;
+  def int_aarch64_neon_faddp : AdvSIMD_2VectorArg_Intrinsic;

   // Long Pairwise Add
   // FIXME: In theory, we shouldn't need intrinsics for saddlp or
   // uaddlp, but tblgen's type inference currently can't handle the
   // pattern fragments this ends up generating.
   def int_aarch64_neon_saddlp : AdvSIMD_1VectorArg_Expand_Intrinsic;
   def int_aarch64_neon_uaddlp : AdvSIMD_1VectorArg_Expand_Intrinsic;

llvm/lib/IR/AutoUpgrade.cpp

@@ ... @@ if (vstRegex.match(Name)) {
         NewFn = Intrinsic::getDeclaration(F->getParent(),
                                           StoreLaneInts[fArgs.size() - 5], Tys);
         return true;
       }
       if (Name == "aarch64.thread.pointer" || Name == "arm.thread.pointer") {
         NewFn = Intrinsic::getDeclaration(F->getParent(), Intrinsic::thread_pointer);
         return true;
       }
+      if (Name.startswith("aarch64.neon.addp")) {
+        VectorType *ArgTy = cast<VectorType>(F->arg_begin()->getType());
+        if (ArgTy->getElementType()->isFloatingPointTy()) {
+          auto fArgs = F->getFunctionType()->params();
[inline comment] efriedma: This code is weird... you're computing the types in two different ways. Also, missing a check for F->arg_size() (so we don't crash on invalid IR).
[inline comment] aemerson: I'll consolidate the logic, but none of the other code here checks for IR validity. By the time we reach here the IR should be valid, we're just translating it to a newer version. I can put an assert anyway.
[inline comment] efriedma: The IR during autoupgrade should be loosely "valid", to the point of passing the asm/bitcode parser, but we don't check the signature of intrinsics until later, so someone could write `declare <4 x float> @llvm.aarch64.neon.addp.v4f32()`.
+          Type *Tys[] = {fArgs[0], fArgs[1]};
+          NewFn = Intrinsic::getDeclaration(F->getParent(),
+                                            Intrinsic::aarch64_neon_faddp, Tys);
+          return true;
+        }
+        return false;
+      }
       break;
     }

     case 'c': {
       if (Name.startswith("ctlz.") && F->arg_size() == 1) {
         rename(F);
         NewFn = Intrinsic::getDeclaration(F->getParent(), Intrinsic::ctlz,
                                           F->arg_begin()->getType());

llvm/lib/Target/AArch64/AArch64InstrInfo.td

@@ ... @@
   def : Pat<(fabs (fsub VT:$Rn, VT:$Rm)), (!cast<Instruction>("FABD"#VT) VT:$Rn, VT:$Rm)>;
 }
 let Predicates = [HasNEON, HasFullFP16] in {
 foreach VT = [ v4f16, v8f16 ] in
   def : Pat<(fabs (fsub VT:$Rn, VT:$Rm)), (!cast<Instruction>("FABD"#VT) VT:$Rn, VT:$Rm)>;
 }
 defm FACGE : SIMDThreeSameVectorFPCmp<1,0,0b101,"facge",int_aarch64_neon_facge>;
 defm FACGT : SIMDThreeSameVectorFPCmp<1,1,0b101,"facgt",int_aarch64_neon_facgt>;
-defm FADDP : SIMDThreeSameVectorFP<1,0,0b010,"faddp",int_aarch64_neon_addp>;
+defm FADDP : SIMDThreeSameVectorFP<1,0,0b010,"faddp",int_aarch64_neon_faddp>;
 defm FADD : SIMDThreeSameVectorFP<0,0,0b010,"fadd", fadd>;
 defm FCMEQ : SIMDThreeSameVectorFPCmp<0, 0, 0b100, "fcmeq", AArch64fcmeq>;
 defm FCMGE : SIMDThreeSameVectorFPCmp<1, 0, 0b100, "fcmge", AArch64fcmge>;
 defm FCMGT : SIMDThreeSameVectorFPCmp<1, 1, 0b100, "fcmgt", AArch64fcmgt>;
 defm FDIV : SIMDThreeSameVectorFP<1,0,0b111,"fdiv", fdiv>;
 defm FMAXNMP : SIMDThreeSameVectorFP<1,0,0b000,"fmaxnmp", int_aarch64_neon_fmaxnmp>;
 defm FMAXNM : SIMDThreeSameVectorFP<0,0,0b000,"fmaxnm", fmaxnum>;
 defm FMAXP : SIMDThreeSameVectorFP<1,0,0b110,"fmaxp", int_aarch64_neon_fmaxp>;

llvm/lib/Target/AArch64/AArch64LegalizerInfo.h

@@ ... @@ public:

   bool legalizeCustom(MachineInstr &MI, MachineRegisterInfo &MRI,
                       MachineIRBuilder &MIRBuilder,
                       GISelChangeObserver &Observer) const override;

 private:
   bool legalizeVaArg(MachineInstr &MI, MachineRegisterInfo &MRI,
                      MachineIRBuilder &MIRBuilder) const;
-
-  bool legalizeIntrinsic(MachineInstr &MI, MachineRegisterInfo &MRI,
-                         MachineIRBuilder &MIRBuilder) const;
 };
 } // End llvm namespace.
 #endif

llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp

@@ ... @@ .fewerElementsIf(
           },
           [=](const LegalityQuery &Query) {
             LLT EltTy = Query.Types[0].getElementType();
             if (EltTy == s64)
               return std::make_pair(0, LLT::vector(2, 64));
             return std::make_pair(0, EltTy);
           });

-  // HACK: Check that the intrinsic isn't ambiguous.
-  // (See: https://bugs.llvm.org/show_bug.cgi?id=40968)
-  getActionDefinitionsBuilder(G_INTRINSIC)
-      .custom();
-
   getActionDefinitionsBuilder(G_PHI)
       .legalFor({p0, s16, s32, s64})
       .clampScalar(0, s16, s64)
       .widenScalarToNextPow2(0);

   getActionDefinitionsBuilder(G_BSWAP)
       .legalFor({s32, s64})
       .clampScalar(0, s16, s64)
@@ ... @@ bool AArch64LegalizerInfo::legalizeCustom(MachineInstr &MI,
                                           MachineIRBuilder &MIRBuilder,
                                           GISelChangeObserver &Observer) const {
   switch (MI.getOpcode()) {
   default:
     // No idea what to do.
     return false;
   case TargetOpcode::G_VAARG:
     return legalizeVaArg(MI, MRI, MIRBuilder);
-  case TargetOpcode::G_INTRINSIC:
-    return legalizeIntrinsic(MI, MRI, MIRBuilder);
   }

   llvm_unreachable("expected switch to return");
 }

-bool AArch64LegalizerInfo::legalizeIntrinsic(
-    MachineInstr &MI, MachineRegisterInfo &MRI,
-    MachineIRBuilder &MIRBuilder) const {
-  // HACK: Don't allow faddp/addp for now. We don't pass down the type info
-  // necessary to get this right today.
-  //
-  // It looks like addp/faddp is the only intrinsic that's impacted by this.
-  // All other intrinsics fully describe the required types in their names.
-  //
-  // (See: https://bugs.llvm.org/show_bug.cgi?id=40968)
-  const MachineOperand &IntrinOp = MI.getOperand(1);
-  if (IntrinOp.isIntrinsicID() &&
-      IntrinOp.getIntrinsicID() == Intrinsic::aarch64_neon_addp)
-    return false;
-  return true;
-}
-
 bool AArch64LegalizerInfo::legalizeVaArg(MachineInstr &MI,
                                          MachineRegisterInfo &MRI,
                                          MachineIRBuilder &MIRBuilder) const {
   MIRBuilder.setInstr(MI);
   MachineFunction &MF = MIRBuilder.getMF();
   unsigned Align = MI.getOperand(2).getImm();
   unsigned Dst = MI.getOperand(0).getReg();
   unsigned ListPtr = MI.getOperand(1).getReg();

llvm/test/CodeGen/AArch64/GlobalISel/fallback-ambiguous-addp-intrinsic.mir

This file was deleted.

# RUN: llc -mtriple aarch64-unknown-unknown -O0 -start-before=legalizer -pass-remarks-missed=gisel* %s -o - 2>&1 | FileCheck %s
#
# Check that we fall back on @llvm.aarch64.neon.addp and ensure that we get the
# correct instruction.
# https://bugs.llvm.org/show_bug.cgi?id=40968

--- |
  define <2 x float> @foo(<2 x float> %v1, <2 x float> %v2) {
  entry:
    %v3 = call <2 x float> @llvm.aarch64.neon.addp.v2f32(<2 x float> %v1, <2 x float> %v2)
    ret <2 x float> %v3
  }
  declare <2 x float> @llvm.aarch64.neon.addp.v2f32(<2 x float>, <2 x float>)
...
---
name: foo
alignment: 2
tracksRegLiveness: true
body: |
  bb.1.entry:
    liveins: $d0, $d1
    ; CHECK: remark:
    ; CHECK-SAME: unable to legalize instruction: %2:_(<2 x s32>) = G_INTRINSIC intrinsic(@llvm.aarch64.neon.addp), %0:_(<2 x s32>), %1:_(<2 x s32>)
    ; CHECK: faddp
    ; CHECK-NOT: addp
    %0:_(<2 x s32>) = COPY $d0
    %1:_(<2 x s32>) = COPY $d1
    %2:_(<2 x s32>) = G_INTRINSIC intrinsic(@llvm.aarch64.neon.addp), %0(<2 x s32>), %1(<2 x s32>)
    $d0 = COPY %2(<2 x s32>)
    RET_ReallyLR implicit $d0

...

llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir

@@ ... @@
 #
 # DEBUG-NEXT: G_BRCOND (opcode {{[0-9]+}}): 1 type index
 # DEBUG: .. the first uncovered type index: 1, OK
 #
 # DEBUG-NEXT: G_BRINDIRECT (opcode {{[0-9]+}}): 1 type index
 # DEBUG: .. the first uncovered type index: 1, OK
 #
 # DEBUG-NEXT: G_INTRINSIC (opcode {{[0-9]+}}): 0 type indices
-# DEBUG: .. type index coverage check SKIPPED: user-defined predicate detected
+# DEBUG: .. type index coverage check SKIPPED: no rules defined
 #
 # DEBUG-NEXT: G_INTRINSIC_W_SIDE_EFFECTS (opcode {{[0-9]+}}): 0 type indices
 # DEBUG: .. type index coverage check SKIPPED: no rules defined
 #
 # DEBUG-NEXT: G_ANYEXT (opcode {{[0-9]+}}): 2 type indices
 # DEBUG: .. the first uncovered type index: 2, OK
 #
 # DEBUG-NEXT: G_TRUNC (opcode {{[0-9]+}}): 2 type indices

llvm/test/CodeGen/AArch64/arm64-neon-add-pairwise.ll

@@ ... @@

 define <2 x i64> @test_addp_v2i64(<2 x i64> %lhs, <2 x i64> %rhs) {
 ; CHECK: test_addp_v2i64:
   %val = call <2 x i64> @llvm.aarch64.neon.addp.v2i64(<2 x i64> %lhs, <2 x i64> %rhs)
 ; CHECK: addp v0.2d, v0.2d, v1.2d
   ret <2 x i64> %val
 }

-declare <2 x float> @llvm.aarch64.neon.addp.v2f32(<2 x float>, <2 x float>)
-declare <4 x float> @llvm.aarch64.neon.addp.v4f32(<4 x float>, <4 x float>)
-declare <2 x double> @llvm.aarch64.neon.addp.v2f64(<2 x double>, <2 x double>)
+declare <2 x float> @llvm.aarch64.neon.faddp.v2f32(<2 x float>, <2 x float>)
+declare <4 x float> @llvm.aarch64.neon.faddp.v4f32(<4 x float>, <4 x float>)
+declare <2 x double> @llvm.aarch64.neon.faddp.v2f64(<2 x double>, <2 x double>)

 define <2 x float> @test_faddp_v2f32(<2 x float> %lhs, <2 x float> %rhs) {
 ; CHECK: test_faddp_v2f32:
-  %val = call <2 x float> @llvm.aarch64.neon.addp.v2f32(<2 x float> %lhs, <2 x float> %rhs)
+  %val = call <2 x float> @llvm.aarch64.neon.faddp.v2f32(<2 x float> %lhs, <2 x float> %rhs)
 ; CHECK: faddp v0.2s, v0.2s, v1.2s
   ret <2 x float> %val
 }

 define <4 x float> @test_faddp_v4f32(<4 x float> %lhs, <4 x float> %rhs) {
 ; CHECK: test_faddp_v4f32:
-  %val = call <4 x float> @llvm.aarch64.neon.addp.v4f32(<4 x float> %lhs, <4 x float> %rhs)
+  %val = call <4 x float> @llvm.aarch64.neon.faddp.v4f32(<4 x float> %lhs, <4 x float> %rhs)
 ; CHECK: faddp v0.4s, v0.4s, v1.4s
   ret <4 x float> %val
 }

 define <2 x double> @test_faddp_v2f64(<2 x double> %lhs, <2 x double> %rhs) {
 ; CHECK: test_faddp_v2f64:
-  %val = call <2 x double> @llvm.aarch64.neon.addp.v2f64(<2 x double> %lhs, <2 x double> %rhs)
+  %val = call <2 x double> @llvm.aarch64.neon.faddp.v2f64(<2 x double> %lhs, <2 x double> %rhs)
 ; CHECK: faddp v0.2d, v0.2d, v1.2d
   ret <2 x double> %val
 }

 define i32 @test_vaddv.v2i32(<2 x i32> %a) {
 ; CHECK-LABEL: test_vaddv.v2i32
 ; CHECK: addp {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.2s
   %1 = tail call i32 @llvm.aarch64.neon.saddv.i32.v2i32(<2 x i32> %a)
   ret i32 %1
 }

 declare i32 @llvm.aarch64.neon.saddv.i32.v2i32(<2 x i32>)

llvm/test/CodeGen/AArch64/arm64-vadd.ll

@@ ... @@
 declare <4 x i32> @llvm.aarch64.neon.addp.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
 declare <2 x i64> @llvm.aarch64.neon.addp.v2i64(<2 x i64>, <2 x i64>) nounwind readnone

 define <2 x float> @faddp_2s(<2 x float>* %A, <2 x float>* %B) nounwind {
 ;CHECK-LABEL: faddp_2s:
 ;CHECK: faddp.2s
   %tmp1 = load <2 x float>, <2 x float>* %A
   %tmp2 = load <2 x float>, <2 x float>* %B
-  %tmp3 = call <2 x float> @llvm.aarch64.neon.addp.v2f32(<2 x float> %tmp1, <2 x float> %tmp2)
+  %tmp3 = call <2 x float> @llvm.aarch64.neon.faddp.v2f32(<2 x float> %tmp1, <2 x float> %tmp2)
   ret <2 x float> %tmp3
 }

 define <4 x float> @faddp_4s(<4 x float>* %A, <4 x float>* %B) nounwind {
 ;CHECK-LABEL: faddp_4s:
 ;CHECK: faddp.4s
   %tmp1 = load <4 x float>, <4 x float>* %A
   %tmp2 = load <4 x float>, <4 x float>* %B
-  %tmp3 = call <4 x float> @llvm.aarch64.neon.addp.v4f32(<4 x float> %tmp1, <4 x float> %tmp2)
+  %tmp3 = call <4 x float> @llvm.aarch64.neon.faddp.v4f32(<4 x float> %tmp1, <4 x float> %tmp2)
   ret <4 x float> %tmp3
 }

 define <2 x double> @faddp_2d(<2 x double>* %A, <2 x double>* %B) nounwind {
 ;CHECK-LABEL: faddp_2d:
 ;CHECK: faddp.2d
   %tmp1 = load <2 x double>, <2 x double>* %A
   %tmp2 = load <2 x double>, <2 x double>* %B
-  %tmp3 = call <2 x double> @llvm.aarch64.neon.addp.v2f64(<2 x double> %tmp1, <2 x double> %tmp2)
+  %tmp3 = call <2 x double> @llvm.aarch64.neon.faddp.v2f64(<2 x double> %tmp1, <2 x double> %tmp2)
   ret <2 x double> %tmp3
 }

-declare <2 x float> @llvm.aarch64.neon.addp.v2f32(<2 x float>, <2 x float>) nounwind readnone
-declare <4 x float> @llvm.aarch64.neon.addp.v4f32(<4 x float>, <4 x float>) nounwind readnone
-declare <2 x double> @llvm.aarch64.neon.addp.v2f64(<2 x double>, <2 x double>) nounwind readnone
+declare <2 x float> @llvm.aarch64.neon.faddp.v2f32(<2 x float>, <2 x float>) nounwind readnone
+declare <4 x float> @llvm.aarch64.neon.faddp.v4f32(<4 x float>, <4 x float>) nounwind readnone
+declare <2 x double> @llvm.aarch64.neon.faddp.v2f64(<2 x double>, <2 x double>) nounwind readnone

 define <2 x i64> @uaddl_duprhs(<4 x i32> %lhs, i32 %rhs) {
 ; CHECK-LABEL: uaddl_duprhs
 ; CHECK-NOT: ext.16b
 ; CHECK: uaddl.2d
   %rhsvec.tmp = insertelement <2 x i32> undef, i32 %rhs, i32 0
   %rhsvec = insertelement <2 x i32> %rhsvec.tmp, i32 %rhs, i32 1

llvm/test/CodeGen/AArch64/autoupgrade-aarch64-neon-addp-float.ll

This file was added.

; RUN: opt -S < %s -mtriple=arm64 | FileCheck %s
declare <4 x float> @llvm.aarch64.neon.addp.v4f32(<4 x float>, <4 x float>)

; CHECK: call <4 x float> @llvm.aarch64.neon.faddp.v4f32
define <4 x float> @upgrade_aarch64_neon_addp_float(<4 x float> %a, <4 x float> %b) {
  %res = call <4 x float> @llvm.aarch64.neon.addp.v4f32(<4 x float> %a, <4 x float> %b)
  ret <4 x float> %res
}
