This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Known bits for AArch64ISD::DUP
ClosedPublic

Authored by dmgreen on Jun 19 2022, 10:55 AM.

Details

Summary

An AArch64ISD::DUP is just a splat, where the known bits for each lane are the same as the input. This teaches that to computeKnownBitsForTargetNode.

Problems arise for constants, though: a constant BUILD_VECTOR can be lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then turn back into a constant BUILD_VECTOR, leading to an infinite cycle. This is prevented by adding an isCanonicalConstantNode hook that blocks the conversion back into a BUILD_VECTOR.
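The splat property the summary describes can be modeled in a few lines. This is an illustrative stand-alone sketch, not LLVM's actual KnownBits class or the committed computeKnownBitsForTargetNode change; the struct and function names are invented for the example.

```cpp
#include <cstdint>
#include <vector>

// Minimal stand-in for llvm::KnownBits: a bit is "known" when it is set
// in exactly one of Zero/One. (Illustrative model, not LLVM's real API.)
struct KnownBits {
  uint64_t Zero = 0; // bits known to be 0
  uint64_t One = 0;  // bits known to be 1
};

// A DUP/splat copies one scalar into every lane, so each lane's known
// bits are simply the known bits of the scalar operand.
std::vector<KnownBits> knownBitsForSplat(const KnownBits &Scalar,
                                         unsigned NumLanes) {
  return std::vector<KnownBits>(NumLanes, Scalar);
}
```

In the real handler the per-lane result is of course a single KnownBits for the whole node; the vector here just makes the "same for every lane" point explicit.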

Diff Detail

Event Timeline

dmgreen created this revision.Jun 19 2022, 10:55 AM
dmgreen requested review of this revision.Jun 19 2022, 10:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 19 2022, 10:55 AM

Thanks for isCanonicalConstantNode - hopefully it will work for some similar cases we have on X86.

Would AArch64 benefit from overriding isSplatValueForTargetNode to handle AArch64ISD::DUP?

> Thanks for isCanonicalConstantNode - hopefully it will work for some similar cases we have on X86.
>
> Would AArch64 benefit from overriding isSplatValueForTargetNode to handle AArch64ISD::DUP?

Yeah that might be useful - thanks for the suggestion, I'll take a look. In the long run I think I would like to remove AArch64ISD::DUP and just use ISD::SPLAT_VECTOR for all vectors across AArch64. That would take some time though, and would hit the same problem of canonicalising into a BUILD_VECTOR.

RKSimon added inline comments.Jun 20 2022, 12:09 AM
llvm/include/llvm/CodeGen/TargetLowering.h
1866

Add doxygen comment

1867

Does this mean we can/should add ISD::SPLAT_VECTOR handling to TargetLowering::SimplifyDemandedBits ?

llvm/lib/Target/AArch64/AArch64ISelLowering.h
1157

We probably want to do something like this in case this gets extended in the future:

return Opc == AArch64ISD::DUP || TargetLoweringBase::isCanonicalConstantNode(Opc, VT);
dmgreen added inline comments.Jun 20 2022, 12:49 AM
llvm/include/llvm/CodeGen/TargetLowering.h
1867

Yeah that sounds good, but we will need to enable SimplifyDemandedBits for scalable vectors first, which is in a followup patch. Enabling ComputeKnownBits for SPLAT_VECTOR is a part of D128159.

dmgreen updated this revision to Diff 438278.Jun 20 2022, 12:50 AM

Update as per review comments - added a doxygen comment and called the base function from AArch64. Also moved the definition of the function closer to the other *ForTargetNode functions, and renamed it to isTargetCanonicalConstantNode.

RKSimon accepted this revision.Jun 20 2022, 1:16 AM

LGTM cheers

This revision is now accepted and ready to land.Jun 20 2022, 1:16 AM

> I'll take a look. In the long run I think I would like to remove AArch64ISD::DUP and just use ISD::SPLAT_VECTOR for all vectors across AArch64. That would take some time though, and would hit the same problem of canonicalising into a BUILD_VECTOR.

A while back I tried this and there were only a few places that relied on DAGCombine not messing with AArch64ISD::DUP in order to emit good code. I didn't push on with it as I was unsure how others would feel. Given the statement above I think I'll dig it out, because the easy first step is to restrict AArch64ISD::DUP to only fixed-length vectors. Please let me know if you'd rather I hold off.

RKSimon added inline comments.Jun 20 2022, 3:46 AM
llvm/include/llvm/CodeGen/TargetLowering.h
3790

After some testing to reuse this on X86, we might need to change this to take an SDValue Op instead.

X86 shares broadcasted constants across different vector widths (i.e. we broadcast to v8i32 and reuse it for v4i32 as well because we can freely access the bottom subvector) - which means we still have infinite loops from extract_subvector(broadcast_load(constant_pool), 0) patterns.

Not sure if any other targets do anything similar? If not, we can keep this as it is for now and I might need to tweak it when X86 support gets added.

> A while back I tried this and there were only a few places that relied on DAGCombine not messing with AArch64ISD::DUP in order to emit good code. I didn't push on with it as I was unsure how others would feel. Given the statement above I think I'll dig it out, because the easy first step is to restrict AArch64ISD::DUP to only fixed-length vectors. Please let me know if you'd rather I hold off.

Yeah - that does sound like a good incremental step.

llvm/include/llvm/CodeGen/TargetLowering.h
3790

SDValue sounds good to me - the EVT was always unused anyway and was just added in case it was useful. An SDValue sounds more general than that. I'll make that change as I commit it, feel free to adjust as you need in the future.

This revision was landed with ongoing or failed builds.Jun 20 2022, 11:12 AM
This revision was automatically updated to reflect the committed changes.

This breaks compilation for me, causing hangs when compiling some source files. Repro with https://martin.st/temp/rdopt-preproc.c, built with clang -target aarch64-w64-mingw32 -w -c -O3 rdopt-preproc.c. Previously this completed in ~5 seconds, now it doesn't terminate.

Thanks for the report, I'll take a look and get it fixed. I think it's due to a similar issue Simon mentioned, and our combining of half vectors into full vectors from D126449.

This patch seems to miss the case where the DUP's first operand is not an integer type.

The following example crashes

define void @dot_product(double %a) {
entry:
  %fadd = call double @llvm.vector.reduce.fadd.v3f64(double %a, <3 x double> <double 1.000000e+00, double 1.000000e+00, double 0.000000e+00>)
  %sqrt = call double @llvm.sqrt.f64(double %fadd)
  %insert = insertelement <3 x double> zeroinitializer, double %sqrt, i64 0
  %shuffle = shufflevector <3 x double> %insert, <3 x double> zeroinitializer, <3 x i32> zeroinitializer
  %mul = fmul <3 x double> %shuffle, <double 1.000000e+00, double 1.000000e+00, double 0.000000e+00>
  %shuffle.1 = extractelement <3 x double> %mul, i64 0
  %shuffle.2 = extractelement <3 x double> %mul, i64 1
  %cmp = fcmp ogt double %shuffle.2, 0.000000e+00
  br i1 %cmp, label %exit, label %bb.1

bb.1:
  %mul.2 = fmul double %shuffle.1, 0.000000e+00
  br label %exit

exit:
  ret void
}

declare double @llvm.sqrt.f64(double)
declare double @llvm.vector.reduce.fadd.v3f64(double, <3 x double>)

link: https://godbolt.org/z/dPc68eWfz
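The crash report suggests the known-bits handler can be reached with a floating-point DUP operand. One plausible shape for a guard, sketched as a standalone model (the names, the EltKind enum, and the bail-out strategy are assumptions for illustration, not the actual committed fix): known-bits reasoning only applies to integer lanes, so anything else should report "nothing known" rather than assert.

```cpp
#include <cstdint>
#include <optional>

// Illustrative stand-in for llvm::KnownBits (not the real API).
struct KnownBits {
  uint64_t Zero = 0, One = 0;
};

enum class EltKind { Integer, Float };

// Model of a guarded DUP handler: a floating-point splat operand (as in
// the repro above) is skipped instead of feeding the integer known-bits
// computation.
std::optional<KnownBits> knownBitsForDup(EltKind Kind,
                                         const KnownBits &Scalar) {
  if (Kind != EltKind::Integer)
    return std::nullopt; // bail out: nothing known for non-integer lanes
  return Scalar;         // every lane of the splat shares the scalar's bits
}
```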