
Details

Reviewers

RKSimon
zvi
delena
mkuper
DavidKreitzer

Commits

rG0e3ae305b657: Refactored X86InterleavedAccess into a class. NFCI.
rL288410: Refactored X86InterleavedAccess into a class. NFCI.

Summary

This change-set refactors the X86InterleavedAccess pass into a class, without any functional changes, in order to allow better code sharing.

Diff Detail

Event Timeline

Farhana updated this revision to Diff 75878. Oct 26 2016, 7:20 AM

Farhana retitled this revision from to Re-factorization of X86InterleaveAccess into a class.

Farhana updated this object.

Farhana added reviewers: RKSimon, delena, mkuper, DavidKreitzer, zvi.

zvi added inline comments. Oct 27 2016, 7:36 AM

lib/Target/X86/X86InterleavedAccess.cpp
141	Not related to this patch, but is setting the original vector load's alignment for all 'decomposed' loads correct?
lib/Target/X86/X86InterleavedAccess.h
18 ↗	(On Diff #75878)	No need to include this in the header. A forward declaration is sufficient: class X86Subtarget;
19 ↗	(On Diff #75878)	Probably same as above, LLVM convention is to use forward declarations if reasonable.
53 ↗	(On Diff #75878)	Consider documenting this c-tor's arguments.
66 ↗	(On Diff #75878)	Does this method need to be public?
77 ↗	(On Diff #75878)	Does this method need to be public?
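
As an illustration of the forward-declaration suggestion above, here is a minimal single-file sketch. The names Subtarget and AccessGroup are hypothetical stand-ins, not the classes from the patch; the point is only that a reference member compiles against an incomplete type, so the header does not need the full include.

```cpp
#include <cassert>

// --- what would live in the header ---
class Subtarget; // forward declaration: enough for references and pointers

class AccessGroup {
  const Subtarget &ST; // a reference member does not need the full definition

public:
  explicit AccessGroup(const Subtarget &S) : ST(S) {}
  bool isSupported() const; // defined where Subtarget is a complete type
};

// --- what would live in the .cpp, after including the real header ---
class Subtarget {
public:
  bool hasAVX() const { return true; } // hypothetical query
};

bool AccessGroup::isSupported() const { return ST.hasAVX(); }
```

Only the translation unit that actually calls into Subtarget needs to see its definition.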

RKSimon added inline comments. Oct 27 2016, 7:59 AM

lib/Target/X86/X86InterleavedAccess.cpp
217	Isn't this a leak? Where is this deleted?

Thanks Simon and Zvi.

Here is the updated change-set that addresses your comments.

zvi added inline comments. Oct 28 2016, 2:01 AM

lib/Target/X86/X86InterleavedAccess.cpp
217	It would be better to instantiate it as a function-local variable. If you do want to allocate on the heap, please use std::unique_ptr or something similar to manage the object.

The update addresses Zvi's comment about declaring Grp as a function-local variable.
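
The two options from the leak discussion can be sketched as follows. Group is a hypothetical stand-in that counts live instances, not the class from the patch; both variants release the object automatically when the function returns.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the access-group object; counts live instances.
struct Group {
  static int Live;
  Group() { ++Live; }
  ~Group() { --Live; }
  Group(const Group &) = delete;
  Group &operator=(const Group &) = delete;
  bool lower() { return true; }
};
int Group::Live = 0;

// Preferred in the review: a function-local object, destroyed on scope exit.
bool lowerWithLocal() {
  Group Grp;
  return Grp.lower();
}

// If heap allocation is really wanted, std::unique_ptr frees it automatically.
bool lowerWithUniquePtr() {
  auto Grp = std::make_unique<Group>();
  return Grp->lower();
}
```

A bare `new Group` without a matching `delete` would leave `Group::Live` at 1 after the call, which is the leak being questioned.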

DavidKreitzer added inline comments. Oct 31 2016, 2:14 PM

lib/Target/X86/X86InterleavedAccess.cpp
103	elements --> element
118	Why are you passing VecInst, NumSubVectors, & SubVecTy as arguments to decompose? These are all easily accessible from the class members.
121	VecSize --> SubVecSize would be clearer
130	This reads funny. Did you mean "into \p NumSubVectors sub vectors of type \p T."?
132	Please describe the return value in the comment. I assume it is supposed to indicate success? It seems odd to me that that would be necessary. I would expect isSupported() == true to imply that the transform is expected to succeed. Should this just return void instead?
141	It would be slightly clearer if you used the variable names here, i.e. Input-Vectors --> InputVectors OutputVectors --> TransposedVectors
145	Is this method really supposed to be restricted to 4x4 transpose? If so, your example should be a 4x4 transpose, not a 2x2 one. (Did you mean for the comment at the method definition to be here?) Trasposed --> Transposed
172	Don't duplicate the comments from the class definition unless they contain additional value (e.g. notes about the low level implementation).

Thanks Dave, the update addresses your comment.

lib/Target/X86/X86InterleavedAccess.cpp
118	So the intent is to use this function in a general way and use it to break down any kind of instruction. Currently, it is used for load instruction, but it will also be used to break down the long shuffle instruction in the strided-store pattern.
132	So the intent is to use it for breaking down any kind of instruction, such as load or shuffle. Currently, only load is supported. Also, there might be other cases where we are not able to create the dummy vectors and break down the instruction evenly.
145	Yes.

Please can you replace the uses of uint32_t with unsigned?

In D25986#590381, @RKSimon wrote:

Please can you replace the uses of uint32_t with unsigned?

Hi Simon,

I would think uint32_t would be preferred over unsigned because unsigned could lead to an incorrect result. Though I agree it will be safe to use unsigned here.

But in general is there a reason for wanting the size to change with the underlying architecture?

Replaces uint32_t with unsigned.

delena added inline comments. Nov 9 2016, 7:35 PM

lib/Target/X86/X86InterleavedAccess.cpp
34	const LoadInst* Inst;
50	Do you need to keep Builder inside? I assume it may be a local var inside function.
81	const Instruction *I
216	Why do you need to pass an empty builder here?
220	return Grp.isSupported() && Grp.lowerIntoOptimizedSequence();
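
The one-line return suggested for line 220 relies on short-circuit evaluation: lowering is only attempted when the group is supported. A minimal sketch, with ToyGroup as a hypothetical stand-in for the class in the patch:

```cpp
#include <cassert>

// Hypothetical group: records whether lowering was attempted.
struct ToyGroup {
  bool Supported;
  bool LoweringAttempted = false;

  bool isSupported() const { return Supported; }
  bool lowerIntoOptimizedSequence() {
    LoweringAttempted = true;
    return true;
  }
};

// && evaluates its right operand only if the left operand is true.
bool lower(ToyGroup &Grp) {
  return Grp.isSupported() && Grp.lowerIntoOptimizedSequence();
}
```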

This change-set addresses Elena's comments.

lib/Target/X86/X86InterleavedAccess.cpp
34	Inst cannot be a constant pointer because we are bit-casting its pointer operand. Inst can be either a load or a store, which is why I declared it as an Instruction instead of a LoadInst.
50	Yes, because Builder is used in different member functions and the different member functions might not have any idea about the insertion point. Having it as a member variable helps updating the insertion point automatically after each new instruction creation.
81	I think it's covered by my previous comment.
216	We are using this builder, with its insertion point set to the load instruction, in multiple member functions that have no idea about the central insertion point. Therefore, we need to have it as a member variable set to the central insertion point for the other member functions.
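
The argument for keeping the builder as a member can be sketched with a toy model. ToyBuilder and ToyGroup are hypothetical and do not use the IRBuilder API; they only show several member functions emitting through one shared builder so that the emission order is preserved without passing an insertion point around.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical builder: appends to a shared instruction list, so the
// "insertion point" advances with every emitted instruction.
struct ToyBuilder {
  std::vector<std::string> &Out;
  void emit(const std::string &Inst) { Out.push_back(Inst); }
};

class ToyGroup {
  ToyBuilder &Builder; // shared by all member functions

  void decompose() {
    Builder.emit("load");
    Builder.emit("load");
  }
  void transpose() { Builder.emit("shuffle"); }

public:
  explicit ToyGroup(ToyBuilder &B) : Builder(B) {}
  void lower() {
    decompose();
    transpose();
  }
};
```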

delena added inline comments. Nov 10 2016, 10:57 PM

lib/Target/X86/X86InterleavedAccess.cpp
34	so "Instruction* Inst" is enough. The "const" is redundant here.
127	It is always "load" or you are thinking about any general case? You can call the function decomposeLoad.
141	Zvi is right. And fix indentation, please.

Farhana added inline comments. Nov 11 2016, 7:15 AM

lib/Target/X86/X86InterleavedAccess.cpp
127	Yes, that's the plan to use it in a general way, for decomposing any kind of instructions such as load, store, shuffle.
141	Sorry Zvi, I missed this comment earlier. Yes, that's correct.
141	"And fix indentation, please" Instruction *NewLoad = Builder.CreateAlignedLoad(NewBasePtr, LI->getAlignment()); Are you talking about this? Indentation is not off here. I reran clang-format, everything remained as it is.
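
One way to reason about the alignment question on line 141 is to compute the alignment that is guaranteed at a given byte offset from an aligned base. The helper below is a hypothetical sketch, not code from the patch. Under the assumption that the sub-vectors are 32 bytes wide (the 4 x 64-bit case this patch supports), every piece starts at a multiple of 32 bytes, so the original alignment carries over unchanged as long as it does not exceed 32.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two that divides both Align and Offset, i.e. the alignment
// guaranteed at (Base + Offset) when Base is Align-aligned.
// Align must be a nonzero power of two; Offset is in bytes.
uint64_t alignmentAtOffset(uint64_t Align, uint64_t Offset) {
  uint64_t Bits = Align | Offset;
  return Bits & (~Bits + 1); // isolate the lowest set bit
}
```

For example, a 16-byte-aligned base stays 16-byte aligned at offset 32, while a 64-byte-aligned base is only guaranteed 32-byte alignment there.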

In D25986#590787, @Farhana wrote:

In D25986#590381, @RKSimon wrote:

Please can you replace the uses of uint32_t with unsigned?

I would think uint32_t would be preferred over unsigned because unsigned could lead to an incorrect result. Though I agree it will be safe to use unsigned here.

But in general is there a reason for wanting the size to change with the underlying architecture?

In general we should match whatever the original return value was: Type::getVectorNumElements() returns unsigned, so NumSubVectors should probably match that.

I'm sorry but I messed up on the types of a couple of the other instances - CreateShuffleVector should take an ArrayRef<uint32_t> and DataLayout::getTypeSizeInBits returns a uint64_t
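
The rule of matching the callee's return type can be illustrated with a small sketch. sizeInBits is a hypothetical stand-in for an API that returns uint64_t; it is not the DataLayout call itself, and the value is chosen only to make the truncation visible.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an API that reports a size in bits as uint64_t.
uint64_t sizeInBits() { return (uint64_t{1} << 32) + 256; }

// Storing the result in a narrower type silently drops the high bits.
uint32_t narrowed() { return static_cast<uint32_t>(sizeInBits()); }

// Matching the callee's return type keeps the full value.
uint64_t matched() { return sizeInBits(); }
```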

lib/Target/X86/X86InterleavedAccess.cpp
197	for (unsigned i = 0, e = Shuffles.size(); i != e ; ++i)
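
The loop form suggested above evaluates the size once instead of on every iteration. A minimal sketch, using a hypothetical CountingVec wrapper to make the number of size() calls observable:

```cpp
#include <cassert>
#include <vector>

// Hypothetical container wrapper that counts how often size() is called.
struct CountingVec {
  std::vector<int> Data;
  mutable unsigned SizeCalls = 0;

  unsigned size() const {
    ++SizeCalls;
    return static_cast<unsigned>(Data.size());
  }
  int operator[](unsigned i) const { return Data[i]; }
};

// size() is re-evaluated before every iteration.
int sumRecomputingSize(const CountingVec &V) {
  int Sum = 0;
  for (unsigned i = 0; i < V.size(); i++)
    Sum += V[i];
  return Sum;
}

// size() is evaluated once and cached in e.
int sumCachingSize(const CountingVec &V) {
  int Sum = 0;
  for (unsigned i = 0, e = V.size(); i != e; ++i)
    Sum += V[i];
  return Sum;
}
```

The cached form also signals to the reader that the container is not resized inside the loop.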

Farhana updated this revision to Diff 77650. Nov 11 2016, 12:17 PM

Hi Guys,

Is there anything else you want me to fix/clarify?

Farhana

LGTM. Thanks, Farhana!

X86InterleavedAccess.cpp
71 ↗	(On Diff #77650)	Minor comment: AFAIU you can pass an ArrayRef object by value efficiently, but I did see several occurrences in LLVM's source code with ArrayRefs passed by reference. Whatever you choose, maybe make this entire file consistent (line 82, for example, is inconsistent with line 71). I see that the overridden methods such as TargetLowering::lowerInterleavedLoad pass ArrayRef objects by value, so maybe this would be a tie-breaker for following this convention?

ArrayRefs are passed by value.
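
Why passing by value is cheap can be sketched with a minimal non-owning view. IntView is a hypothetical type in the spirit of llvm::ArrayRef, not its actual definition: it holds only a pointer and a length, so copying it never copies the elements.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal non-owning view: just a pointer and a length.
struct IntView {
  const int *Data;
  std::size_t Length;
};

// Taking the view by value copies two words; the elements stay shared
// with the caller's storage.
int sum(IntView V) {
  int Total = 0;
  for (std::size_t i = 0, e = V.Length; i != e; ++i)
    Total += V.Data[i];
  return Total;
}
```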

Thanks, Farhana! LGTM.

LGTM

This revision is now accepted and ready to land. Nov 23 2016, 4:04 AM

Hi Farhana,
I have no further comments. This LGTM too.
-Dave

Closed by commit rL288410: Refactored X86InterleavedAccess into a class. NFCI. (authored by dlkreitz). Dec 1 2016, 12:06 PM

This revision was automatically updated to reflect the committed changes.

Diff 78912

lib/Target/X86/X86InterleavedAccess.cpp

	//===------- X86InterleavedAccess.cpp --------------===//			//===--------- X86InterleavedAccess.cpp ----------------------------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===--------------------------------------------------------------------===//
	//			///
	// This file contains the X86 implementation of the interleaved accesses			/// \file
	// optimization generating X86-specific instructions/intrinsics for interleaved			/// This file contains the X86 implementation of the interleaved accesses
	// access groups.			/// optimization generating X86-specific instructions/intrinsics for
	//			/// interleaved access groups.
	//===----------------------------------------------------------------------===//			///
				//===--------------------------------------------------------------------===//

	#include "X86ISelLowering.h"			#include "X86ISelLowering.h"
	#include "X86TargetMachine.h"			#include "X86TargetMachine.h"

	using namespace llvm;			using namespace llvm;

	/// Returns true if the interleaved access group represented by the shuffles			/// \brief This class holds necessary information to represent an interleaved
	/// is supported for the subtarget. Returns false otherwise.			/// access group and supports utilities to lower the group into
	static bool isSupported(const X86Subtarget &SubTarget,			/// X86-specific instructions/intrinsics.
	const LoadInst *LI,			/// E.g. A group of interleaving access loads (Factor = 2; accessing every
	const ArrayRef<ShuffleVectorInst *> &Shuffles,			/// other element)
	unsigned Factor) {			/// %wide.vec = load <8 x i32>, <8 x i32>* %ptr
				/// %v0 = shuffle <8 x i32> %wide.vec, <8 x i32> undef, <0, 2, 4, 6>
				/// %v1 = shuffle <8 x i32> %wide.vec, <8 x i32> undef, <1, 3, 5, 7>

				class X86InterleavedAccessGroup {
				/// \brief Reference to the wide-load instruction of an interleaved access
				/// group.
				Instruction *const Inst;

				/// \brief Reference to the shuffle(s), consumer(s) of the (load) 'Inst'.
				ArrayRef<ShuffleVectorInst *> Shuffles;

				/// \brief Reference to the starting index of each user-shuffle.
				ArrayRef<unsigned> Indices;

				/// \brief Reference to the interleaving stride in terms of elements.
				const unsigned Factor;

				/// \brief Reference to the underlying target.
				const X86Subtarget &Subtarget;

				const DataLayout &DL;

				IRBuilder<> &Builder;

				/// \brief Breaks down a vector \p 'Inst' of N elements into \p NumSubVectors
				/// sub vectors of type \p T. Returns true and the sub-vectors in
				/// \p DecomposedVectors if it decomposes the Inst, returns false otherwise.
				bool decompose(Instruction *Inst, unsigned NumSubVectors, VectorType *T,
				SmallVectorImpl<Instruction *> &DecomposedVectors);

				/// \brief Performs matrix transposition on a 4x4 matrix \p InputVectors and
				/// returns the transposed-vectors in \p TransposedVectors.
				/// E.g.
				/// InputVectors:
				/// In-V0 = p1, p2, p3, p4
				/// In-V1 = q1, q2, q3, q4
				/// In-V2 = r1, r2, r3, r4
				/// In-V3 = s1, s2, s3, s4
				/// OutputVectors:
				/// Out-V0 = p1, q1, r1, s1
				/// Out-V1 = p2, q2, r2, s2
				/// Out-V2 = p3, q3, r3, s3
				/// Out-V3 = p4, q4, r4, s4
				void transpose_4x4(ArrayRef<Instruction *> InputVectors,
				SmallVectorImpl<Value *> &TransposedVectors);

				public:
				/// In order to form an interleaved access group X86InterleavedAccessGroup
				/// requires a wide-load instruction \p 'I', a group of interleaved-vectors
				/// \p Shuffs, reference to the first indices of each interleaved-vector
				/// \p 'Ind' and the interleaving stride factor \p F. In order to generate
				/// X86-specific instructions/intrinsics it also requires the underlying
				/// target information \p STarget.
				explicit X86InterleavedAccessGroup(Instruction *I,
				ArrayRef<ShuffleVectorInst *> Shuffs,
				ArrayRef<unsigned> Ind,
				const unsigned F,
				const X86Subtarget &STarget,
				IRBuilder<> &B)
				: Inst(I), Shuffles(Shuffs), Indices(Ind), Factor(F), Subtarget(STarget),
				DL(Inst->getModule()->getDataLayout()), Builder(B) {}

				/// \brief Returns true if this interleaved access group can be lowered into
				/// x86-specific instructions/intrinsics, false otherwise.
				bool isSupported() const;

				/// \brief Lowers this interleaved access group into X86-specific
				/// instructions/intrinsics.
				bool lowerIntoOptimizedSequence();
				};

	const DataLayout &DL = Shuffles[0]->getModule()->getDataLayout();			bool X86InterleavedAccessGroup::isSupported() const {
	VectorType *ShuffleVecTy = Shuffles[0]->getType();			VectorType *ShuffleVecTy = Shuffles[0]->getType();
	unsigned ShuffleVecSize = DL.getTypeSizeInBits(ShuffleVecTy);			uint64_t ShuffleVecSize = DL.getTypeSizeInBits(ShuffleVecTy);
	Type *ShuffleEltTy = ShuffleVecTy->getVectorElementType();			Type *ShuffleEltTy = ShuffleVecTy->getVectorElementType();

	if (DL.getTypeSizeInBits(LI->getType()) < Factor * ShuffleVecSize)			if (DL.getTypeSizeInBits(Inst->getType()) < Factor * ShuffleVecSize)
	return false;			return false;

	// Currently, lowering is supported for 64 bits on AVX.			// Currently, lowering is supported for 64 bits on AVX.
	if (!SubTarget.hasAVX() || ShuffleVecSize != 256 ||			if (!Subtarget.hasAVX() || ShuffleVecSize != 256 ||
	DL.getTypeSizeInBits(ShuffleEltTy) != 64 ||			DL.getTypeSizeInBits(ShuffleEltTy) != 64 || Factor != 4)
	Factor != 4)
	return false;			return false;

	return true;			return true;
	}			}

	/// \brief Lower interleaved load(s) into target specific instructions/			bool X86InterleavedAccessGroup::decompose(
	/// intrinsics. Lowering sequence varies depending on the vector-types, factor,			Instruction *VecInst, unsigned NumSubVectors, VectorType *SubVecTy,
	/// number of shuffles and ISA.			SmallVectorImpl<Instruction *> &DecomposedVectors) {
	/// Currently, lowering is supported for 4x64 bits with Factor = 4 on AVX.			Type *VecTy = VecInst->getType();
	bool X86TargetLowering::lowerInterleavedLoad(
	LoadInst *LI, ArrayRef<ShuffleVectorInst *> Shuffles,			assert(VecTy->isVectorTy() &&
	ArrayRef<unsigned> Indices, unsigned Factor) const {			DL.getTypeSizeInBits(VecTy) >=
	assert(Factor >= 2 && Factor <= getMaxSupportedInterleaveFactor() &&			DL.getTypeSizeInBits(SubVecTy) * NumSubVectors &&
	"Invalid interleave factor");			"Invalid Inst-size!!!");
	assert(!Shuffles.empty() && "Empty shufflevector input");			assert(VecTy->getVectorElementType() == SubVecTy->getVectorElementType() &&
	assert(Shuffles.size() == Indices.size() &&			"Element type mismatched!!!");
	"Unmatched number of shufflevectors and indices");

	if (!isSupported(Subtarget, LI, Shuffles, Factor))			if (!isa<LoadInst>(VecInst))
	return false;			return false;

	VectorType *ShuffleVecTy = Shuffles[0]->getType();			LoadInst *LI = cast<LoadInst>(VecInst);
				Type *VecBasePtrTy = SubVecTy->getPointerTo(LI->getPointerAddressSpace());
	Type *VecBasePtrTy = ShuffleVecTy->getPointerTo(LI->getPointerAddressSpace());

	IRBuilder<> Builder(LI);
	SmallVector<Instruction *, 4> NewLoads;
	SmallVector<Value *, 4> NewShuffles;
	NewShuffles.resize(Factor);

	Value *VecBasePtr =			Value *VecBasePtr =
	Builder.CreateBitCast(LI->getPointerOperand(), VecBasePtrTy);			Builder.CreateBitCast(LI->getPointerOperand(), VecBasePtrTy);

	// Generate 4 loads of type v4xT64			// Generate N loads of T type
	for (unsigned Part = 0; Part < Factor; Part++) {			for (unsigned i = 0; i < NumSubVectors; i++) {
	// TODO: Support inbounds GEP			// TODO: Support inbounds GEP
	Value *NewBasePtr =			Value *NewBasePtr = Builder.CreateGEP(VecBasePtr, Builder.getInt32(i));
	Builder.CreateGEP(VecBasePtr, Builder.getInt32(Part));
	Instruction *NewLoad =			Instruction *NewLoad =
	Builder.CreateAlignedLoad(NewBasePtr, LI->getAlignment());			Builder.CreateAlignedLoad(NewBasePtr, LI->getAlignment());
	NewLoads.push_back(NewLoad);			DecomposedVectors.push_back(NewLoad);
				}

				return true;
	}			}

				void X86InterleavedAccessGroup::transpose_4x4(
				ArrayRef<Instruction *> Matrix,
				SmallVectorImpl<Value *> &TransposedMatrix) {
				assert(Matrix.size() == 4 && "Invalid matrix size");
				TransposedMatrix.resize(4);

	// dst = src1[0,1],src2[0,1]			// dst = src1[0,1],src2[0,1]
	uint32_t IntMask1[] = {0, 1, 4, 5};			uint32_t IntMask1[] = {0, 1, 4, 5};
	ArrayRef<unsigned int> ShuffleMask = makeArrayRef(IntMask1, 4);			ArrayRef<uint32_t> Mask = makeArrayRef(IntMask1, 4);
	Value *IntrVec1 =			Value *IntrVec1 = Builder.CreateShuffleVector(Matrix[0], Matrix[2], Mask);
	Builder.CreateShuffleVector(NewLoads[0], NewLoads[2], ShuffleMask);			Value *IntrVec2 = Builder.CreateShuffleVector(Matrix[1], Matrix[3], Mask);
	Value *IntrVec2 =
	Builder.CreateShuffleVector(NewLoads[1], NewLoads[3], ShuffleMask);

	// dst = src1[2,3],src2[2,3]			// dst = src1[2,3],src2[2,3]
	uint32_t IntMask2[] = {2, 3, 6, 7};			uint32_t IntMask2[] = {2, 3, 6, 7};
	ShuffleMask = makeArrayRef(IntMask2, 4);			Mask = makeArrayRef(IntMask2, 4);
	Value *IntrVec3 =			Value *IntrVec3 = Builder.CreateShuffleVector(Matrix[0], Matrix[2], Mask);
	Builder.CreateShuffleVector(NewLoads[0], NewLoads[2], ShuffleMask);			Value *IntrVec4 = Builder.CreateShuffleVector(Matrix[1], Matrix[3], Mask);
	Value *IntrVec4 =
	Builder.CreateShuffleVector(NewLoads[1], NewLoads[3], ShuffleMask);

	// dst = src1[0],src2[0],src1[2],src2[2]			// dst = src1[0],src2[0],src1[2],src2[2]
	uint32_t IntMask3[] = {0, 4, 2, 6};			uint32_t IntMask3[] = {0, 4, 2, 6};
	ShuffleMask = makeArrayRef(IntMask3, 4);			Mask = makeArrayRef(IntMask3, 4);
	NewShuffles[0] = Builder.CreateShuffleVector(IntrVec1, IntrVec2, ShuffleMask);			TransposedMatrix[0] = Builder.CreateShuffleVector(IntrVec1, IntrVec2, Mask);
	NewShuffles[2] = Builder.CreateShuffleVector(IntrVec3, IntrVec4, ShuffleMask);			TransposedMatrix[2] = Builder.CreateShuffleVector(IntrVec3, IntrVec4, Mask);

	// dst = src1[1],src2[1],src1[3],src2[3]			// dst = src1[1],src2[1],src1[3],src2[3]
	uint32_t IntMask4[] = {1, 5, 3, 7};			uint32_t IntMask4[] = {1, 5, 3, 7};
	ShuffleMask = makeArrayRef(IntMask4, 4);			Mask = makeArrayRef(IntMask4, 4);
	NewShuffles[1] = Builder.CreateShuffleVector(IntrVec1, IntrVec2, ShuffleMask);			TransposedMatrix[1] = Builder.CreateShuffleVector(IntrVec1, IntrVec2, Mask);
	NewShuffles[3] = Builder.CreateShuffleVector(IntrVec3, IntrVec4, ShuffleMask);			TransposedMatrix[3] = Builder.CreateShuffleVector(IntrVec3, IntrVec4, Mask);

	for (unsigned i = 0; i < Shuffles.size(); i++) {
	unsigned Index = Indices[i];
	Shuffles[i]->replaceAllUsesWith(NewShuffles[Index]);
	}			}

				// Lowers this interleaved access group into X86-specific
				// instructions/intrinsics.
				bool X86InterleavedAccessGroup::lowerIntoOptimizedSequence() {
				SmallVector<Instruction *, 4> DecomposedVectors;
				VectorType *VecTy = Shuffles[0]->getType();
				// Try to generate target-sized register(/instruction).
				if (!decompose(Inst, Factor, VecTy, DecomposedVectors))
				return false;

				SmallVector<Value *, 4> TransposedVectors;
				// Perform matrix-transposition in order to compute interleaved
				// results by generating some sort of (optimized) target-specific
				// instructions.
				transpose_4x4(DecomposedVectors, TransposedVectors);

				// Now replace the unoptimized-interleaved-vectors with the
				// transposed-interleaved vectors.
				for (unsigned i = 0; i < Shuffles.size(); i++)
				Shuffles[i]->replaceAllUsesWith(TransposedVectors[Indices[i]]);

	return true;			return true;
	}			}

				// Lower interleaved load(s) into target specific instructions/
				// intrinsics. Lowering sequence varies depending on the vector-types, factor,
				// number of shuffles and ISA.
				// Currently, lowering is supported for 4x64 bits with Factor = 4 on AVX.
				bool X86TargetLowering::lowerInterleavedLoad(
				LoadInst *LI, ArrayRef<ShuffleVectorInst *> Shuffles,
				ArrayRef<unsigned> Indices, unsigned Factor) const {
				assert(Factor >= 2 && Factor <= getMaxSupportedInterleaveFactor() &&
				"Invalid interleave factor");
				assert(!Shuffles.empty() && "Empty shufflevector input");
				assert(Shuffles.size() == Indices.size() &&
				"Unmatched number of shufflevectors and indices");

				// Create an interleaved access group.
				IRBuilder<> Builder(LI);
				X86InterleavedAccessGroup Grp(LI, Shuffles, Indices, Factor, Subtarget,
				Builder);

				return Grp.isSupported() && Grp.lowerIntoOptimizedSequence();
				}
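
The four shuffle masks used by transpose_4x4 in the diff above can be checked with a standalone simulation. This is a sketch that models shufflevector on plain arrays rather than LLVM IR values; the helper names shuffle and transpose4x4 are assumptions made for illustration, while the masks and the pairing of operands are taken from the diff.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<int, 4>;

// Models IR shufflevector on two 4-element vectors: mask indices 0-3 select
// from A, indices 4-7 select from B.
Vec4 shuffle(const Vec4 &A, const Vec4 &B,
             const std::array<uint32_t, 4> &Mask) {
  Vec4 Out{};
  for (unsigned i = 0; i != 4; ++i)
    Out[i] = Mask[i] < 4 ? A[Mask[i]] : B[Mask[i] - 4];
  return Out;
}

// Same two-stage mask sequence as transpose_4x4 in the diff.
std::array<Vec4, 4> transpose4x4(const std::array<Vec4, 4> &M) {
  // Stage 1: gather the low halves and the high halves of rows 0/2 and 1/3.
  Vec4 I1 = shuffle(M[0], M[2], {0, 1, 4, 5});
  Vec4 I2 = shuffle(M[1], M[3], {0, 1, 4, 5});
  Vec4 I3 = shuffle(M[0], M[2], {2, 3, 6, 7});
  Vec4 I4 = shuffle(M[1], M[3], {2, 3, 6, 7});

  // Stage 2: interleave the even elements, then the odd elements.
  std::array<Vec4, 4> T{};
  T[0] = shuffle(I1, I2, {0, 4, 2, 6});
  T[2] = shuffle(I3, I4, {0, 4, 2, 6});
  T[1] = shuffle(I1, I2, {1, 5, 3, 7});
  T[3] = shuffle(I3, I4, {1, 5, 3, 7});
  return T;
}
```

With rows p, q, r, s as input, the first stage produces (p1, p2, r1, r2), (q1, q2, s1, s2), (p3, p4, r3, r4) and (q3, q4, s3, s4), and the second stage yields the four columns of the input matrix.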

This is an archive of the discontinued LLVM Phabricator instance.

Re-factorization of X86InterleaveAccess into a class
ClosedPublic
