Details

Reviewers

RKSimon
zvi
delena
mkuper
DavidKreitzer

Commits

rG0e3ae305b657: Refactored X86InterleavedAccess into a class. NFCI.
rL288410: Refactored X86InterleavedAccess into a class. NFCI.

Summary

This change-set refactors the X86InterleavedAccess pass into a class, without any functional changes, to allow better code sharing.

Diff Detail

Repository: rL LLVM

Event Timeline

Farhana updated this revision to Diff 75878. Oct 26 2016, 7:20 AM

Farhana retitled this revision from to Re-factorization of X86InterleaveAccess into a class.

Farhana updated this object.

Farhana added reviewers: RKSimon, delena, mkuper, DavidKreitzer, zvi.

zvi added inline comments. Oct 27 2016, 7:36 AM

lib/Target/X86/X86InterleavedAccess.cpp
69 ↗	(On Diff #75878)	Not related to this patch, but is setting the original vector load's alignment for all 'decomposed' loads correct?
lib/Target/X86/X86InterleavedAccess.h
18 ↗	(On Diff #75878)	No need to include in the header. forward declaration is sufficient: class X86Subtarget;
19 ↗	(On Diff #75878)	Probably same as above, LLVM convention is to use forward declarations if reasonable.
53 ↗	(On Diff #75878)	Consider documenting this c-tor's arguments.
66 ↗	(On Diff #75878)	Does this method need to be public?
77 ↗	(On Diff #75878)	Does this method need to be public?
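The forward-declaration suggestion for lines 18 and 19 can be sketched as below. This is a minimal single-file illustration, not the patch itself: `Subtarget` and `AccessGroup` are hypothetical stand-ins for `X86Subtarget` and the new class, and the "header" and "source" halves are marked only by comments.

```cpp
// --- what would live in the header ---
// A forward declaration is enough here because the class below only stores
// a reference to Subtarget and never looks inside it.
class Subtarget; // hypothetical stand-in for X86Subtarget

class AccessGroup { // hypothetical stand-in for the new class
  const Subtarget &ST;

public:
  explicit AccessGroup(const Subtarget &S) : ST(S) {}
  bool isSupported() const; // defined where Subtarget is a complete type
};

// --- what would live in the .cpp file ---
// The full definition is needed only here, where members are accessed.
class Subtarget {
  bool AVX;

public:
  explicit Subtarget(bool HasAVX) : AVX(HasAVX) {}
  bool hasAVX() const { return AVX; }
};

bool AccessGroup::isSupported() const { return ST.hasAVX(); }
```

The header half compiles without the definition of `Subtarget`, which is the property the review comment is asking for.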

RKSimon added inline comments. Oct 27 2016, 7:59 AM

lib/Target/X86/X86InterleavedAccess.cpp
152 ↗	(On Diff #75878)	Isn't this a leak? Where is this deleted?

Thanks Simon and Zvi.

Here is the updated change-set that addresses your comments.

zvi added inline comments. Oct 28 2016, 2:01 AM

lib/Target/X86/X86InterleavedAccess.cpp
224 ↗	(On Diff #76113)	It would be better to instantiate it as a function-local variable. If you do want to allocate on the heap, please use std::unique_ptr or something similar to manage the object.
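The two options in this comment can be sketched as follows. `Group` is a hypothetical stand-in for the access-group object, and the `Live` counter exists only to make the object lifetime observable; neither is part of the patch.

```cpp
#include <memory>

// Hypothetical stand-in for the access-group object. The static counter
// tracks how many instances are currently alive.
struct Group {
  static int Live;
  Group() { ++Live; }
  ~Group() { --Live; }
  Group(const Group &) = delete;
  Group &operator=(const Group &) = delete;
  bool lower() { return true; }
};
int Group::Live = 0;

// Preferred form: a function-local object, destroyed on every return path.
bool lowerWithLocal() {
  Group Grp;
  return Grp.lower();
}

// If heap allocation is really wanted, std::unique_ptr owns the object and
// deletes it when the pointer goes out of scope.
bool lowerWithUniquePtr() {
  auto Grp = std::make_unique<Group>();
  return Grp->lower();
}
```

In both functions `Group::Live` is back to zero after the call, which is what a bare `new` without a matching `delete` would not guarantee.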

The update addresses Zvi's comment about declaring Grp as a function-local variable.

DavidKreitzer added inline comments. Oct 31 2016, 2:14 PM

lib/Target/X86/X86InterleavedAccess.cpp
26 ↗	(On Diff #76449)	elements --> element
53 ↗	(On Diff #76449)	This reads funny. Did you mean "into \p NumSubVectors sub vectors of type \p T."?
55 ↗	(On Diff #76449)	Please describe the return value in the comment. I assume it is supposed to indicate success? It seems odd to me that that would be necessary. I would expect isSupported() == true to imply that the transform is expected to succeed. Should this just return void instead?
64 ↗	(On Diff #76449)	It would be slightly clearer if you used the variable names here, i.e. Input-Vectors --> InputVectors OutputVectors --> TransposedVectors
68 ↗	(On Diff #76449)	Is this method really supposed to be restricted to 4x4 transpose? If so, your example should be a 4x4 transpose, not a 2x2 one. (Did you mean for the comment at the method definition to be here?) Trasposed --> Transposed
95 ↗	(On Diff #76449)	Don't duplicate the comments from the class definition unless they contain additional value (e.g. notes about the low level implementation).
116 ↗	(On Diff #76449)	Why are you passing VecInst, NumSubVectors, & SubVecTy as arguments to decompose? These are all easily accessible from the class members.
119 ↗	(On Diff #76449)	VecSize --> SubVecSize would be clearer

Thanks Dave, the update addresses your comment.

lib/Target/X86/X86InterleavedAccess.cpp
55 ↗	(On Diff #76449)	So the intent is to use it for breaking down any kind of instruction such as load, shuffle. Currently, load is only supported. Also there might be some other challenges where we were not able to create the dummy vectors and break-down the instruction evenly.
68 ↗	(On Diff #76449)	Yes.
116 ↗	(On Diff #76449)	So the intent is to use this function in a general way and use it to break down any kind of instruction. Currently, it is used for load instruction, but it will also be used to break down the long shuffle instruction in the strided-store pattern.

Please can you replace the uses of uint32_t with unsigned?

In D25986#590381, @RKSimon wrote:

Please can you replace the uses of uint32_t with unsigned?

Hi Simon,

I would think uint32_t would be preferred over unsigned because unsigned could lead to an incorrect result. Though I agree it will be safe to use unsigned here.

But in general is there a reason for wanting the size to change with the underlying architecture?

Replaces uint32_t with unsigned.

delena added inline comments. Nov 9 2016, 7:35 PM

lib/Target/X86/X86InterleavedAccess.cpp
34 ↗	(On Diff #77381)	const LoadInst* Inst;
50 ↗	(On Diff #77381)	Do you need to keep Builder inside? I assume it may be a local var inside function.
81 ↗	(On Diff #77381)	const Instruction *I
216 ↗	(On Diff #77381)	Why do you need to pass an empty builder here?
220 ↗	(On Diff #77381)	return Grp.isSupported() && Grp.lowerIntoOptimizedSequence();

This change-set addresses Elena's comments.

lib/Target/X86/X86InterleavedAccess.cpp
34 ↗	(On Diff #77381)	Inst cannot be a constant pointer because we are bit-casting its pointer operand. Inst can be either a load or a store, which is why I declared it as an Instruction instead of a LoadInst.
50 ↗	(On Diff #77381)	Yes, because Builder is used in different member functions and the different member functions might not have any idea about the insertion point. Having it as a member variable helps updating the insertion point automatically after each new instruction creation.
81 ↗	(On Diff #77381)	I think it's covered by my previous comment.
216 ↗	(On Diff #77381)	We are using this builder, with its insertion point set to the load instruction, in multiple member functions that have no idea about the central insertion point. Therefore, we need to have it as a member variable set to the central insertion point for the other member functions.
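The argument for keeping the builder as a member can be illustrated with a toy model. Everything here is hypothetical: `ToyBuilder` is not LLVM's `IRBuilder`, it only imitates the behaviour described above, where each newly created instruction lands before a fixed insertion point and the position advances automatically.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy builder: inserts new "instructions" before a fixed
// insertion point, in creation order. Not LLVM's IRBuilder.
class ToyBuilder {
  std::vector<std::string> &Block;
  std::size_t InsertPos;

public:
  ToyBuilder(std::vector<std::string> &B, std::size_t Pos)
      : Block(B), InsertPos(Pos) {}

  void create(const std::string &Name) {
    Block.insert(Block.begin() + static_cast<std::ptrdiff_t>(InsertPos), Name);
    ++InsertPos; // the next instruction goes right after this one
  }
};

// Hypothetical group class: both member functions share one builder and
// never need to know where the insertion point currently is.
class ToyGroup {
  ToyBuilder &Builder;

public:
  explicit ToyGroup(ToyBuilder &B) : Builder(B) {}
  void decompose() {
    Builder.create("load.0");
    Builder.create("load.1");
  }
  void transpose() { Builder.create("shuffle.0"); }
};
```

Calling `decompose()` and then `transpose()` on a block that starts as `{"wide.load", "use"}`, with the builder positioned at index 0, leaves the new instructions in creation order ahead of `wide.load`.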

delena added inline comments. Nov 10 2016, 10:57 PM

lib/Target/X86/X86InterleavedAccess.cpp
127 ↗	(On Diff #77569)	It is always "load" or you are thinking about any general case? You can call the function decomposeLoad.
141 ↗	(On Diff #77569)	Zvi is right. And fix indentation, please.
34 ↗	(On Diff #77381)	so "Instruction* Inst" is enough. The "const" is redundant here.

Farhana added inline comments. Nov 11 2016, 7:15 AM

lib/Target/X86/X86InterleavedAccess.cpp
127 ↗	(On Diff #77569)	Yes, that's the plan to use it in a general way, for decomposing any kind of instructions such as load, store, shuffle.
141 ↗	(On Diff #77569)	"And fix indentation, please" Instruction *NewLoad = Builder.CreateAlignedLoad(NewBasePtr, LI->getAlignment()); Are you talking about this? Indentation is not off here. I reran clang-format, everything remained as it is.
69 ↗	(On Diff #75878)	Sorry Zvi, I missed this comment earlier. Yes, that's correct.

In D25986#590787, @Farhana wrote:

In D25986#590381, @RKSimon wrote:

Please can you replace the uses of uint32_t with unsigned?

I would think uint32_t would be preferred over unsigned because unsigned could lead to an incorrect result. Though I agree it will be safe to use unsigned here.

But in general is there a reason for wanting the size to change with the underlying architecture?

In general we should match whatever the original return values was - Type::getVectorNumElements() returns unsigned so the NumSubVectors should probably match that.

I'm sorry but I messed up on the types of a couple of the other instances - CreateShuffleVector should take an ArrayRef<uint32_t> and DataLayout::getTypeSizeInBits returns a uint64_t
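The point about matching the callee's return type can be sketched as below. `typeSizeInBits` is a hypothetical stand-in for a size query that returns `uint64_t`, and the truncation case assumes a platform where `unsigned` is 32 bits wide.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a size query that returns uint64_t.
uint64_t typeSizeInBits(uint64_t NumElts, uint64_t EltBits) {
  return NumElts * EltBits;
}

// True if V survives a conversion to unsigned unchanged.
bool fitsInUnsigned(uint64_t V) {
  return V <= std::numeric_limits<unsigned>::max();
}
```

For the 256-bit vectors handled by this patch the value fits either way, which matches the remark that `unsigned` is safe here. Storing the result in a variable of the same type as the return value simply removes the need to reason about whether a narrowing conversion can lose bits.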

lib/Target/X86/X86InterleavedAccess.cpp
196 ↗	(On Diff #77569)	for (unsigned i = 0, e = Shuffles.size(); i != e ; ++i)

Farhana updated this revision to Diff 77650. Nov 11 2016, 12:17 PM

Hi Guys,

Is there anything else you want me to fix/clarify?

Farhana

LGTM. Thanks, Farhana!

X86InterleavedAccess.cpp
71 ↗	(On Diff #77650)	Minor comment: AFAIU you can pass an ArrayRef object by value efficiently, but I did see several occurrences in LLVM's source code with ArrayRefs passed by reference. Whatever you choose, maybe make this entire file consistent (line 82, for example, is inconsistent with line 71). I see that the overridden methods such as TargetLowering::lowerInterleavedLoad pass ArrayRef objects by value, so maybe this would be a tie-breaker for following this convention?

ArrayRefs are passed by value.
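Why passing by value is cheap can be shown with a minimal non-owning view. `IntView` is a hypothetical type written in the spirit of `llvm::ArrayRef`, not the real class: it holds only a pointer and a length, so copying it copies two words and never the underlying elements.

```cpp
#include <cstddef>

// Hypothetical minimal non-owning view, in the spirit of llvm::ArrayRef.
// It does not own the elements; the caller keeps the storage alive.
struct IntView {
  const int *Data;
  std::size_t Length;
};

// Taking the view by value copies only the pointer and the length.
int sum(IntView V) {
  int Total = 0;
  for (std::size_t i = 0, e = V.Length; i != e; ++i)
    Total += V.Data[i];
  return Total;
}
```

A by-reference parameter would add one more indirection without saving a meaningful copy, which is one reason to settle on by-value consistently across the file.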

Thanks, Farhana! LGTM.

LGTM

This revision is now accepted and ready to land. Nov 23 2016, 4:04 AM

Hi Farhana,
I have no further comments. This LGTM too.
-Dave

Closed by commit rL288410: Refactored X86InterleavedAccess into a class. NFCI. (authored by dlkreitz). Dec 1 2016, 12:06 PM

This revision was automatically updated to reflect the committed changes.

Diff 79965

llvm/trunk/lib/Target/X86/X86InterleavedAccess.cpp

	//===------- X86InterleavedAccess.cpp --------------===//			//===--------- X86InterleavedAccess.cpp ----------------------------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===--------------------------------------------------------------------===//
	//			///
	// This file contains the X86 implementation of the interleaved accesses			/// \file
	// optimization generating X86-specific instructions/intrinsics for interleaved			/// This file contains the X86 implementation of the interleaved accesses
	// access groups.			/// optimization generating X86-specific instructions/intrinsics for
	//			/// interleaved access groups.
	//===----------------------------------------------------------------------===//			///
				//===--------------------------------------------------------------------===//

	#include "X86ISelLowering.h"			#include "X86ISelLowering.h"
	#include "X86TargetMachine.h"			#include "X86TargetMachine.h"

	using namespace llvm;			using namespace llvm;

	/// Returns true if the interleaved access group represented by the shuffles			/// \brief This class holds necessary information to represent an interleaved
	/// is supported for the subtarget. Returns false otherwise.			/// access group and supports utilities to lower the group into
	static bool isSupported(const X86Subtarget &SubTarget,			/// X86-specific instructions/intrinsics.
	const LoadInst *LI,			/// E.g. A group of interleaving access loads (Factor = 2; accessing every
	const ArrayRef<ShuffleVectorInst *> &Shuffles,			/// other element)
	unsigned Factor) {			/// %wide.vec = load <8 x i32>, <8 x i32>* %ptr
				/// %v0 = shuffle <8 x i32> %wide.vec, <8 x i32> undef, <0, 2, 4, 6>
				/// %v1 = shuffle <8 x i32> %wide.vec, <8 x i32> undef, <1, 3, 5, 7>

				class X86InterleavedAccessGroup {
				/// \brief Reference to the wide-load instruction of an interleaved access
				/// group.
				Instruction *const Inst;

				/// \brief Reference to the shuffle(s), consumer(s) of the (load) 'Inst'.
				ArrayRef<ShuffleVectorInst *> Shuffles;

				/// \brief Reference to the starting index of each user-shuffle.
				ArrayRef<unsigned> Indices;

				/// \brief Reference to the interleaving stride in terms of elements.
				const unsigned Factor;

				/// \brief Reference to the underlying target.
				const X86Subtarget &Subtarget;

				const DataLayout &DL;

				IRBuilder<> &Builder;

				/// \brief Breaks down a vector \p 'Inst' of N elements into \p NumSubVectors
				/// sub vectors of type \p T. Returns true and the sub-vectors in
				/// \p DecomposedVectors if it decomposes the Inst, returns false otherwise.
				bool decompose(Instruction *Inst, unsigned NumSubVectors, VectorType *T,
				SmallVectorImpl<Instruction *> &DecomposedVectors);

				/// \brief Performs matrix transposition on a 4x4 matrix \p InputVectors and
				/// returns the transposed-vectors in \p TransposedVectors.
				/// E.g.
				/// InputVectors:
				/// In-V0 = p1, p2, p3, p4
				/// In-V1 = q1, q2, q3, q4
				/// In-V2 = r1, r2, r3, r4
				/// In-V3 = s1, s2, s3, s4
				/// OutputVectors:
				/// Out-V0 = p1, q1, r1, s1
				/// Out-V1 = p2, q2, r2, s2
				/// Out-V2 = p3, q3, r3, s3
				/// Out-V3 = P4, q4, r4, s4
				void transpose_4x4(ArrayRef<Instruction *> InputVectors,
				SmallVectorImpl<Value *> &TrasposedVectors);

				public:
				/// In order to form an interleaved access group X86InterleavedAccessGroup
				/// requires a wide-load instruction \p 'I', a group of interleaved-vectors
				/// \p Shuffs, reference to the first indices of each interleaved-vector
				/// \p 'Ind' and the interleaving stride factor \p F. In order to generate
				/// X86-specific instructions/intrinsics it also requires the underlying
				/// target information \p STarget.
				explicit X86InterleavedAccessGroup(Instruction *I,
				ArrayRef<ShuffleVectorInst *> Shuffs,
				ArrayRef<unsigned> Ind,
				const unsigned F,
				const X86Subtarget &STarget,
				IRBuilder<> &B)
				: Inst(I), Shuffles(Shuffs), Indices(Ind), Factor(F), Subtarget(STarget),
				DL(Inst->getModule()->getDataLayout()), Builder(B) {}

				/// \brief Returns true if this interleaved access group can be lowered into
				/// x86-specific instructions/intrinsics, false otherwise.
				bool isSupported() const;

				/// \brief Lowers this interleaved access group into X86-specific
				/// instructions/intrinsics.
				bool lowerIntoOptimizedSequence();
				};

	const DataLayout &DL = Shuffles[0]->getModule()->getDataLayout();			bool X86InterleavedAccessGroup::isSupported() const {
	VectorType *ShuffleVecTy = Shuffles[0]->getType();			VectorType *ShuffleVecTy = Shuffles[0]->getType();
	unsigned ShuffleVecSize = DL.getTypeSizeInBits(ShuffleVecTy);			uint64_t ShuffleVecSize = DL.getTypeSizeInBits(ShuffleVecTy);
	Type *ShuffleEltTy = ShuffleVecTy->getVectorElementType();			Type *ShuffleEltTy = ShuffleVecTy->getVectorElementType();

	if (DL.getTypeSizeInBits(LI->getType()) < Factor * ShuffleVecSize)			if (DL.getTypeSizeInBits(Inst->getType()) < Factor * ShuffleVecSize)
	return false;			return false;

	// Currently, lowering is supported for 64 bits on AVX.			// Currently, lowering is supported for 64 bits on AVX.
	if (!SubTarget.hasAVX() || ShuffleVecSize != 256 ||			if (!Subtarget.hasAVX() || ShuffleVecSize != 256 ||
	DL.getTypeSizeInBits(ShuffleEltTy) != 64 ||			DL.getTypeSizeInBits(ShuffleEltTy) != 64 || Factor != 4)
	Factor != 4)
	return false;			return false;

	return true;			return true;
	}			}

	/// \brief Lower interleaved load(s) into target specific instructions/			bool X86InterleavedAccessGroup::decompose(
	/// intrinsics. Lowering sequence varies depending on the vector-types, factor,			Instruction *VecInst, unsigned NumSubVectors, VectorType *SubVecTy,
	/// number of shuffles and ISA.			SmallVectorImpl<Instruction *> &DecomposedVectors) {
	/// Currently, lowering is supported for 4x64 bits with Factor = 4 on AVX.			Type *VecTy = VecInst->getType();
	bool X86TargetLowering::lowerInterleavedLoad(
	LoadInst *LI, ArrayRef<ShuffleVectorInst *> Shuffles,			assert(VecTy->isVectorTy() &&
	ArrayRef<unsigned> Indices, unsigned Factor) const {			DL.getTypeSizeInBits(VecTy) >=
	assert(Factor >= 2 && Factor <= getMaxSupportedInterleaveFactor() &&			DL.getTypeSizeInBits(SubVecTy) * NumSubVectors &&
	"Invalid interleave factor");			"Invalid Inst-size!!!");
	assert(!Shuffles.empty() && "Empty shufflevector input");			assert(VecTy->getVectorElementType() == SubVecTy->getVectorElementType() &&
	assert(Shuffles.size() == Indices.size() &&			"Element type mismatched!!!");
	"Unmatched number of shufflevectors and indices");

	if (!isSupported(Subtarget, LI, Shuffles, Factor))			if (!isa<LoadInst>(VecInst))
	return false;			return false;

	VectorType *ShuffleVecTy = Shuffles[0]->getType();			LoadInst *LI = cast<LoadInst>(VecInst);
				Type *VecBasePtrTy = SubVecTy->getPointerTo(LI->getPointerAddressSpace());
	Type *VecBasePtrTy = ShuffleVecTy->getPointerTo(LI->getPointerAddressSpace());

	IRBuilder<> Builder(LI);
	SmallVector<Instruction *, 4> NewLoads;
	SmallVector<Value *, 4> NewShuffles;
	NewShuffles.resize(Factor);

	Value *VecBasePtr =			Value *VecBasePtr =
	Builder.CreateBitCast(LI->getPointerOperand(), VecBasePtrTy);			Builder.CreateBitCast(LI->getPointerOperand(), VecBasePtrTy);

	// Generate 4 loads of type v4xT64			// Generate N loads of T type
	for (unsigned Part = 0; Part < Factor; Part++) {			for (unsigned i = 0; i < NumSubVectors; i++) {
	// TODO: Support inbounds GEP			// TODO: Support inbounds GEP
	Value *NewBasePtr =			Value *NewBasePtr = Builder.CreateGEP(VecBasePtr, Builder.getInt32(i));
	Builder.CreateGEP(VecBasePtr, Builder.getInt32(Part));
	Instruction *NewLoad =			Instruction *NewLoad =
	Builder.CreateAlignedLoad(NewBasePtr, LI->getAlignment());			Builder.CreateAlignedLoad(NewBasePtr, LI->getAlignment());
	NewLoads.push_back(NewLoad);			DecomposedVectors.push_back(NewLoad);
				}

				return true;
	}			}

				void X86InterleavedAccessGroup::transpose_4x4(
				ArrayRef<Instruction *> Matrix,
				SmallVectorImpl<Value *> &TransposedMatrix) {
				assert(Matrix.size() == 4 && "Invalid matrix size");
				TransposedMatrix.resize(4);

	// dst = src1[0,1],src2[0,1]			// dst = src1[0,1],src2[0,1]
	uint32_t IntMask1[] = {0, 1, 4, 5};			uint32_t IntMask1[] = {0, 1, 4, 5};
	ArrayRef<unsigned int> ShuffleMask = makeArrayRef(IntMask1, 4);			ArrayRef<uint32_t> Mask = makeArrayRef(IntMask1, 4);
	Value *IntrVec1 =			Value *IntrVec1 = Builder.CreateShuffleVector(Matrix[0], Matrix[2], Mask);
	Builder.CreateShuffleVector(NewLoads[0], NewLoads[2], ShuffleMask);			Value *IntrVec2 = Builder.CreateShuffleVector(Matrix[1], Matrix[3], Mask);
	Value *IntrVec2 =
	Builder.CreateShuffleVector(NewLoads[1], NewLoads[3], ShuffleMask);

	// dst = src1[2,3],src2[2,3]			// dst = src1[2,3],src2[2,3]
	uint32_t IntMask2[] = {2, 3, 6, 7};			uint32_t IntMask2[] = {2, 3, 6, 7};
	ShuffleMask = makeArrayRef(IntMask2, 4);			Mask = makeArrayRef(IntMask2, 4);
	Value *IntrVec3 =			Value *IntrVec3 = Builder.CreateShuffleVector(Matrix[0], Matrix[2], Mask);
	Builder.CreateShuffleVector(NewLoads[0], NewLoads[2], ShuffleMask);			Value *IntrVec4 = Builder.CreateShuffleVector(Matrix[1], Matrix[3], Mask);
	Value *IntrVec4 =
	Builder.CreateShuffleVector(NewLoads[1], NewLoads[3], ShuffleMask);

	// dst = src1[0],src2[0],src1[2],src2[2]			// dst = src1[0],src2[0],src1[2],src2[2]
	uint32_t IntMask3[] = {0, 4, 2, 6};			uint32_t IntMask3[] = {0, 4, 2, 6};
	ShuffleMask = makeArrayRef(IntMask3, 4);			Mask = makeArrayRef(IntMask3, 4);
	NewShuffles[0] = Builder.CreateShuffleVector(IntrVec1, IntrVec2, ShuffleMask);			TransposedMatrix[0] = Builder.CreateShuffleVector(IntrVec1, IntrVec2, Mask);
	NewShuffles[2] = Builder.CreateShuffleVector(IntrVec3, IntrVec4, ShuffleMask);			TransposedMatrix[2] = Builder.CreateShuffleVector(IntrVec3, IntrVec4, Mask);

	// dst = src1[1],src2[1],src1[3],src2[3]			// dst = src1[1],src2[1],src1[3],src2[3]
	uint32_t IntMask4[] = {1, 5, 3, 7};			uint32_t IntMask4[] = {1, 5, 3, 7};
	ShuffleMask = makeArrayRef(IntMask4, 4);			Mask = makeArrayRef(IntMask4, 4);
	NewShuffles[1] = Builder.CreateShuffleVector(IntrVec1, IntrVec2, ShuffleMask);			TransposedMatrix[1] = Builder.CreateShuffleVector(IntrVec1, IntrVec2, Mask);
	NewShuffles[3] = Builder.CreateShuffleVector(IntrVec3, IntrVec4, ShuffleMask);			TransposedMatrix[3] = Builder.CreateShuffleVector(IntrVec3, IntrVec4, Mask);

	for (unsigned i = 0; i < Shuffles.size(); i++) {
	unsigned Index = Indices[i];
	Shuffles[i]->replaceAllUsesWith(NewShuffles[Index]);
	}			}

				// Lowers this interleaved access group into X86-specific
				// instructions/intrinsics.
				bool X86InterleavedAccessGroup::lowerIntoOptimizedSequence() {
				SmallVector<Instruction *, 4> DecomposedVectors;
				VectorType *VecTy = Shuffles[0]->getType();
				// Try to generate target-sized register(/instruction).
				if (!decompose(Inst, Factor, VecTy, DecomposedVectors))
				return false;

				SmallVector<Value *, 4> TransposedVectors;
				// Perform matrix-transposition in order to compute interleaved
				// results by generating some sort of (optimized) target-specific
				// instructions.
				transpose_4x4(DecomposedVectors, TransposedVectors);

				// Now replace the unoptimized-interleaved-vectors with the
				// transposed-interleaved vectors.
				for (unsigned i = 0; i < Shuffles.size(); i++)
				Shuffles[i]->replaceAllUsesWith(TransposedVectors[Indices[i]]);

	return true;			return true;
	}			}

				// Lower interleaved load(s) into target specific instructions/
				// intrinsics. Lowering sequence varies depending on the vector-types, factor,
				// number of shuffles and ISA.
				// Currently, lowering is supported for 4x64 bits with Factor = 4 on AVX.
				bool X86TargetLowering::lowerInterleavedLoad(
				LoadInst *LI, ArrayRef<ShuffleVectorInst *> Shuffles,
				ArrayRef<unsigned> Indices, unsigned Factor) const {
				assert(Factor >= 2 && Factor <= getMaxSupportedInterleaveFactor() &&
				"Invalid interleave factor");
				assert(!Shuffles.empty() && "Empty shufflevector input");
				assert(Shuffles.size() == Indices.size() &&
				"Unmatched number of shufflevectors and indices");

				// Create an interleaved access group.
				IRBuilder<> Builder(LI);
				X86InterleavedAccessGroup Grp(LI, Shuffles, Indices, Factor, Subtarget,
				Builder);

				return Grp.isSupported() && Grp.lowerIntoOptimizedSequence();
				}
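The four shuffle masks used by transpose_4x4 in the diff above can be checked on plain arrays. This is a minimal sketch, not LLVM code: `shuffleVector` is a hypothetical helper that models the two-input shufflevector semantics (mask indices 0 to 3 select from the first input, 4 to 7 from the second), and `Vec4`, `Mask4`, and `transpose4x4` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<int, 4>;
using Mask4 = std::array<uint32_t, 4>;

// Models a two-input shuffle: index 0..3 picks from A, 4..7 picks from B.
Vec4 shuffleVector(const Vec4 &A, const Vec4 &B, const Mask4 &Mask) {
  Vec4 R{};
  for (unsigned i = 0; i != 4; ++i) {
    assert(Mask[i] < 8 && "mask index out of range");
    R[i] = Mask[i] < 4 ? A[Mask[i]] : B[Mask[i] - 4];
  }
  return R;
}

// Same mask sequence as transpose_4x4 in the diff, applied to int rows.
std::array<Vec4, 4> transpose4x4(const std::array<Vec4, 4> &M) {
  const Mask4 LowHalves = {{0, 1, 4, 5}};  // src1[0,1], src2[0,1]
  const Mask4 HighHalves = {{2, 3, 6, 7}}; // src1[2,3], src2[2,3]
  const Mask4 EvenElts = {{0, 4, 2, 6}};   // src1[0], src2[0], src1[2], src2[2]
  const Mask4 OddElts = {{1, 5, 3, 7}};    // src1[1], src2[1], src1[3], src2[3]

  Vec4 IntrVec1 = shuffleVector(M[0], M[2], LowHalves);
  Vec4 IntrVec2 = shuffleVector(M[1], M[3], LowHalves);
  Vec4 IntrVec3 = shuffleVector(M[0], M[2], HighHalves);
  Vec4 IntrVec4 = shuffleVector(M[1], M[3], HighHalves);

  std::array<Vec4, 4> T{};
  T[0] = shuffleVector(IntrVec1, IntrVec2, EvenElts);
  T[2] = shuffleVector(IntrVec3, IntrVec4, EvenElts);
  T[1] = shuffleVector(IntrVec1, IntrVec2, OddElts);
  T[3] = shuffleVector(IntrVec3, IntrVec4, OddElts);
  return T;
}
```

With rows p, q, r, s as input, the first stage yields (p1, p2, r1, r2), (q1, q2, s1, s2), (p3, p4, r3, r4), and (q3, q4, s3, s4); the second stage interleaves these pairs into the columns of the original matrix, matching the Out-V0 to Out-V3 example in the class comment.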

This is an archive of the discontinued LLVM Phabricator instance.

Re-factorization of X86InterleaveAccess into a class
ClosedPublic