This is an archive of the discontinued LLVM Phabricator instance.

[x86][AArch64] ask the target whether it has a vector blend instruction
Closed · Public

Authored by sebpop on Mar 5 2018, 1:27 PM.

Details

Reviewers

chandlerc
kristof.beyls
evandro
SjoerdMeijer

Commits

rGb4bd0a404fe2: [x86][aarch64] ask the backend whether it has a vector blend instruction
rL327132: [x86][aarch64] ask the backend whether it has a vector blend instruction

Summary

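The code that matches and produces more x86 vector blends (rewriting shuffles of splat build vectors into blend-style masks) was enabled for all architectures, even though the transform may pessimize code on architectures that do not provide a vector blend instruction. This patch adds a TargetLowering hook, hasVectorBlend(), which returns false by default and true on x86, and runs the splat-blending canonicalization in SelectionDAG only when the hook returns true.

Added an AArch64 test case to check that a VZIP (zip1) instruction is generated instead of byte moves.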

Diff Detail

Repository: rL LLVM

Event Timeline

sebpop created this revision. Mar 5 2018, 1:27 PM

Herald added subscribers: hiraditya, javed.absar, rengolin. Mar 5 2018, 1:27 PM

It LGTM, but I wonder how it affects other major targets, like PPC. It's probably a good idea to give them some time to ponder this change.

This doesn't seem to make any difference for SystemZ. Adding Hal, Kit, and Nemanja for PowerPC ...

We have fairly comprehensive handling for vector shuffles in the PPC back end so I don't think this affects us. As long as there are no lit failures for PPC, this can go ahead as far as I'm concerned.

This revision was not accepted when it landed; it landed in state Needs Review. Mar 9 2018, 6:31 AM

Closed by commit rL327132: [x86][aarch64] ask the backend whether it has a vector blend instruction (authored by spop).

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                    Size
llvm/trunk/include/llvm/CodeGen/TargetLowering.h        3 lines
llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp    50 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.h             2 lines
llvm/trunk/test/CodeGen/AArch64/aarch64-vuzp.ll         10 lines
llvm/trunk/test/CodeGen/AArch64/arm64-collect-loh.ll    4 lines

Diff 137740

llvm/trunk/include/llvm/CodeGen/TargetLowering.h

@@ ... @@ public:
 /// In other words, unless the target performs a post-isel load combining,
 /// this information should not be provided because it will generate more
 /// loads.
 virtual bool hasPairedLoad(EVT /*LoadedType*/,
 unsigned & /*RequiredAlignment*/) const {
 return false;
 }
 
+/// Return true if the target has a vector blend instruction.
+virtual bool hasVectorBlend() const { return false; }
+
 /// \brief Get the maximum supported factor for interleaved memory accesses.
 /// Default to be the minimum interleave factor: 2.
 virtual unsigned getMaxSupportedInterleaveFactor() const { return 2; }
 
 /// \brief Lower an interleaved load to target specific intrinsics. Return
 /// true on success.
 ///
 /// \p LI is the vector load instruction.
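The hook follows the usual TargetLowering pattern: a conservative default in the base class that a target opts into by overriding it. The self-contained sketch below models only that shape; `TargetLoweringModel`, `X86LoweringModel`, `AArch64LoweringModel` and `maybeCanonicalize` are hypothetical stand-ins, not LLVM classes, and the callback merely stands in for the guarded shuffle rewrite.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for TargetLowering: only the new hook is modeled.
class TargetLoweringModel {
public:
  virtual ~TargetLoweringModel() = default;
  // Conservative default: a target must opt in to blend-oriented rewrites.
  virtual bool hasVectorBlend() const { return false; }
};

// Hypothetical stand-in for X86TargetLowering, which opts in.
class X86LoweringModel : public TargetLoweringModel {
public:
  bool hasVectorBlend() const override { return true; }
};

// Hypothetical stand-in for a target that keeps the default (AArch64 adds
// no override in this patch).
class AArch64LoweringModel : public TargetLoweringModel {};

// Target-independent code consults the hook before running a blend-oriented
// canonicalization. `Rewrite` stands in for that rewrite; returns true if it ran.
template <typename RewriteFn>
bool maybeCanonicalize(const TargetLoweringModel &TLI, std::vector<int> &Mask,
                       RewriteFn Rewrite) {
  if (!TLI.hasVectorBlend())
    return false;  // leave the mask as-is for target-specific pattern matching
  Rewrite(Mask);
  return true;
}
```

Keeping the default at `false` means a new or unaware target never receives the x86-oriented mask rewrite unless it asks for it.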

llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

@@ ... @@ if (N1 == N2) {
 for (int i = 0; i != NElts; ++i)
 if (MaskVec[i] >= NElts) MaskVec[i] -= NElts;
 }
 
 // Canonicalize shuffle undef, v -> v, undef. Commute the shuffle mask.
 if (N1.isUndef())
 commuteShuffle(N1, N2, MaskVec);
 
+if (TLI->hasVectorBlend()) {
 // If shuffling a splat, try to blend the splat instead. We do this here so
 // that even when this arises during lowering we don't have to re-handle it.
 auto BlendSplat = [&](BuildVectorSDNode *BV, int Offset) {
 BitVector UndefElements;
 SDValue Splat = BV->getSplatValue(&UndefElements);
 if (!Splat)
 return;
 
 for (int i = 0; i < NElts; ++i) {
 if (MaskVec[i] < Offset || MaskVec[i] >= (Offset + NElts))
 continue;
 
 // If this input comes from undef, mark it as such.
 if (UndefElements[MaskVec[i] - Offset]) {
 MaskVec[i] = -1;
 continue;
 }
 
 // If we can blend a non-undef lane, use that instead.
 if (!UndefElements[i])
 MaskVec[i] = i + Offset;
 }
 };
 if (auto *N1BV = dyn_cast<BuildVectorSDNode>(N1))
 BlendSplat(N1BV, 0);
 if (auto *N2BV = dyn_cast<BuildVectorSDNode>(N2))
 BlendSplat(N2BV, NElts);
+}
 
 // Canonicalize all index into lhs, -> shuffle lhs, undef
 // Canonicalize all index into rhs, -> shuffle rhs, undef
 bool AllLHS = true, AllRHS = true;
 bool N2Undef = N2.isUndef();
 for (int i = 0; i != NElts; ++i) {
 if (MaskVec[i] >= NElts) {
 if (N2Undef)
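The now-guarded block rewrites the shuffle mask in place. The sketch below replays the same loop on plain integers so its effect can be checked by hand; `blendSplatModel` is a hypothetical name, and the splat operand is described only by its per-lane undef flags rather than by a `BuildVectorSDNode`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the BlendSplat lambda that now runs only when
// TLI->hasVectorBlend() is true. MaskVec uses the shuffle convention:
// [0, NElts) selects from the first operand, [NElts, 2*NElts) from the
// second, and -1 is undef. The splat operand is described only by its
// per-lane undef flags; its defined lanes are assumed to hold one value.
void blendSplatModel(std::vector<int> &MaskVec,
                     const std::vector<bool> &UndefElements, int Offset) {
  const int NElts = static_cast<int>(MaskVec.size());
  for (int i = 0; i < NElts; ++i) {
    // Skip lanes that read the other operand (or are already undef).
    if (MaskVec[i] < Offset || MaskVec[i] >= Offset + NElts)
      continue;
    // The selected splat lane is undef, so the result lane is undef too.
    if (UndefElements[MaskVec[i] - Offset]) {
      MaskVec[i] = -1;
      continue;
    }
    // Every defined splat lane holds the same value, so read lane i of the
    // splat instead: an in-place, blend-style index.
    if (!UndefElements[i])
      MaskVec[i] = i + Offset;
  }
}
```

Applied to the `vzipNoBlend` shuffle from the AArch64 test (second operand a splat of 0 whose lanes 4 to 7 are undef, so `Offset = 8`), the model turns `<0, 8, 1, 9, 2, 10, 3, 11>` into `<0, 9, 1, 11, 2, 10, 3, 11>`. For a splat the result is equivalent, but it is no longer a plain interleave of the two inputs.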

llvm/trunk/lib/Target/X86/X86ISelLowering.h

@@ ... @@ unsigned getNumRegistersForCallingConv(LLVMContext &Context,
 EVT VT) const override;
 
 bool isIntDivCheap(EVT VT, AttributeList Attr) const override;
 
 bool supportSwiftError() const override;
 
 StringRef getStackProbeSymbolName(MachineFunction &MF) const override;
 
+bool hasVectorBlend() const override { return true; }
+
 unsigned getMaxSupportedInterleaveFactor() const override { return 4; }
 
 /// \brief Lower interleaved load(s) into target specific
 /// instructions/intrinsics.
 bool lowerInterleavedLoad(LoadInst *LI,
 ArrayRef<ShuffleVectorInst *> Shuffles,
 ArrayRef<unsigned> Indices,
 unsigned Factor) const override;

llvm/trunk/test/CodeGen/AArch64/aarch64-vuzp.ll

@@ ... @@ entry:
 %x = bitcast i8* %p1 to <8 x i8>*
 %wide.vec = load <8 x i8>, <8 x i8>* %x, align 1
 %strided.vec = shufflevector <8 x i8> %wide.vec, <8 x i8> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
 %y = zext <4 x i8> %strided.vec to <4 x i32>
 %z = bitcast i32* %p2 to <4 x i32>*
 store <4 x i32> %y, <4 x i32>* %z, align 4
 ret void
 }
+
+; Check that this pattern is recognized as a VZIP and
+; that the vector blend transform does not scramble the pattern.
+; CHECK-LABEL: vzipNoBlend:
+; CHECK: zip1
+define <8 x i8> @vzipNoBlend(<8 x i8>* %A, <8 x i16>* %B) nounwind {
+%t = load <8 x i8>, <8 x i8>* %A
+%vzip = shufflevector <8 x i8> %t, <8 x i8> <i8 0, i8 0, i8 0, i8 0, i8 undef, i8 undef, i8 undef, i8 undef>, <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>
+ret <8 x i8> %vzip
+}
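The new test relies on the mask `<0, 8, 1, 9, 2, 10, 3, 11>` interleaving the low halves of both operands, which is what ZIP1 does. The hypothetical, simplified checker below (not the AArch64 backend's own matcher) makes the contrast concrete: the original mask matches, while `<0, 9, 1, 11, 2, 10, 3, 11>`, which is what the splat-blend rewrite in SelectionDAG.cpp produces for this shuffle, does not.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified ZIP1 mask check (not the AArch64 backend's matcher).
// For two NElts-lane operands, ZIP1 interleaves their low halves:
// lane 2*i reads element i of operand 0 and lane 2*i+1 reads element i of
// operand 1 (mask index i + NElts). Undef lanes (-1) match anything.
bool isZip1MaskModel(const std::vector<int> &M) {
  const int NElts = static_cast<int>(M.size());
  if (NElts == 0 || NElts % 2 != 0)
    return false;
  for (int i = 0; i < NElts / 2; ++i) {
    if (M[2 * i] != -1 && M[2 * i] != i)
      return false;
    if (M[2 * i + 1] != -1 && M[2 * i + 1] != i + NElts)
      return false;
  }
  return true;
}
```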

llvm/trunk/test/CodeGen/AArch64/arm64-collect-loh.ll

@@ ... @@
 ; a tuple register to appear in the lowering. Thus, the target
 ; cpu is required to have the problem reproduced.
 ; CHECK-LABEL: _uninterestingSub
 ; CHECK: [[LOH_LABEL0:Lloh[0-9]+]]:
 ; CHECK: adrp [[ADRP_REG:x[0-9]+]], [[CONSTPOOL:lCPI[0-9]+_[0-9]+]]@PAGE
 ; CHECK: [[LOH_LABEL1:Lloh[0-9]+]]:
 ; CHECK: ldr q[[IDX:[0-9]+]], {{\[}}[[ADRP_REG]], [[CONSTPOOL]]@PAGEOFF]
 ; The tuple comes from the next instruction.
-; CHECK-NEXT: tbl.16b v{{[0-9]+}}, { v{{[0-9]+}}, v{{[0-9]+}} }, v[[IDX]]
+; CHECK: ext.16b v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, #1
 ; CHECK: ret
 ; CHECK: .loh AdrpLdr [[LOH_LABEL0]], [[LOH_LABEL1]]
 define void @uninterestingSub(i8* nocapture %row) #0 {
 %tmp = bitcast i8* %row to <16 x i8>*
 %tmp1 = load <16 x i8>, <16 x i8>* %tmp, align 16
-%vext43 = shufflevector <16 x i8> <i8 undef, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0>, <16 x i8> %tmp1, <16 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>
+%vext43 = shufflevector <16 x i8> <i8 undef, i8 16, i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2>, <16 x i8> %tmp1, <16 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>
 %add.i.414 = add <16 x i8> zeroinitializer, %vext43
 store <16 x i8> %add.i.414, <16 x i8>* %tmp, align 16
 %add.ptr51 = getelementptr inbounds i8, i8* %row, i64 16
 %tmp2 = bitcast i8* %add.ptr51 to <16 x i8>*
 %tmp3 = load <16 x i8>, <16 x i8>* %tmp2, align 16
 %tmp4 = bitcast i8* undef to <16 x i8>*
 %tmp5 = load <16 x i8>, <16 x i8>* %tmp4, align 16
 %vext157 = shufflevector <16 x i8> %tmp3, <16 x i8> %tmp5, <16 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>
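The updated CHECK line expects `ext.16b ..., #1`: the mask `<1, 2, ..., 16>` takes 16 consecutive bytes of the concatenated operands starting at byte 1. The hypothetical, simplified matcher below (not the AArch64 backend's own code; it handles only the two-operand form) encodes that rule. As a worked data point, replaying the splat-blend loop from SelectionDAG.cpp on the old first operand (a splat of 0 with lane 0 undef, so `Offset = 0`) yields `<1, 1, 2, ..., 14, 16>`, which this model rejects.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified two-operand EXT matcher (not the AArch64 backend's).
// EXT #Imm takes NElts consecutive elements of concat(op0, op1) starting at
// element Imm, so mask lane i must equal i + Imm. Undef lanes (-1) match
// anything. Returns Imm, or -1 if M is not such a mask.
int extImmModel(const std::vector<int> &M) {
  const int NElts = static_cast<int>(M.size());
  bool Found = false;
  int Start = 0;
  for (int i = 0; i < NElts; ++i) {
    if (M[i] < 0)
      continue;  // undef lane
    const int Candidate = M[i] - i;  // window start implied by this lane
    if (!Found) {
      Start = Candidate;
      Found = true;
    } else if (Candidate != Start) {
      return -1;  // lanes disagree on the start: not a single window
    }
  }
  // Imm == 0 would just copy op0; require a real offset inside op0.
  if (!Found || Start < 1 || Start >= NElts)
    return -1;
  return Start;
}
```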