
Details

Reviewers

RKSimon
zvi
delena
mkuper
DavidKreitzer

Commits

rG0e3ae305b657: Refactored X86InterleavedAccess into a class. NFCI.
rL288410: Refactored X86InterleavedAccess into a class. NFCI.

Summary

This change set refactors the X86InterleavedAccess pass into a class, without any functional changes, to allow better code sharing.

Diff Detail

Event Timeline

Farhana updated this revision to Diff 75878. Oct 26 2016, 7:20 AM

Farhana retitled this revision from to Re-factorization of X86InterleaveAccess into a class.

Farhana updated this object.

Farhana added reviewers: RKSimon, delena, mkuper, DavidKreitzer, zvi.

zvi added inline comments. Oct 27 2016, 7:36 AM

lib/Target/X86/X86InterleavedAccess.cpp
141	Not related to this patch, but is setting the original vector load's alignment for all 'decomposed' loads correct?
lib/Target/X86/X86InterleavedAccess.h
18 ↗	(On Diff #75878)	No need to include in the header. forward declaration is sufficient: class X86Subtarget;
19 ↗	(On Diff #75878)	Probably same as above, LLVM convention is to use forward declarations if reasonable.
53 ↗	(On Diff #75878)	Consider documenting this c-tor's arguments.
66 ↗	(On Diff #75878)	Does this method need to be public?
77 ↗	(On Diff #75878)	Does this method need to be public?

RKSimon added inline comments. Oct 27 2016, 7:59 AM

lib/Target/X86/X86InterleavedAccess.cpp
224	Isn't this a leak? Where is this deleted?

Thanks Simon and Zvi.

Here is the updated change-set that addresses your comments.

zvi added inline comments. Oct 28 2016, 2:01 AM

lib/Target/X86/X86InterleavedAccess.cpp
224	It would be better to instantiate as a function-local variable. If you do want to allocate on the heap, please use std::unique_ptr or something similar to manage the object.

The update addresses Zvi's comment about declaring Grp as a function-local variable.
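
The two options raised in the review can be sketched outside LLVM as follows. This is a minimal illustration only: Group is a hypothetical stand-in for X86InterleavedAccessGroup that models just the two methods named in the comments, and the Factor == 4 check is an assumed placeholder for the real support test.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for X86InterleavedAccessGroup; only the two
// methods discussed in the review are modelled.
struct Group {
  explicit Group(unsigned F) : Factor(F) {}
  bool isSupported() const { return Factor == 4; } // assumed placeholder check
  bool lowerIntoOptimizedSequence() { return true; }
  unsigned Factor;
};

// Option 1: function-local object. It is destroyed automatically on
// every return path, so there is nothing to leak.
bool lowerWithLocal(unsigned Factor) {
  Group Grp(Factor);
  return Grp.isSupported() && Grp.lowerIntoOptimizedSequence();
}

// Option 2: heap allocation owned by std::unique_ptr. The object is
// released when Grp goes out of scope; no explicit delete is needed.
bool lowerWithUniquePtr(unsigned Factor) {
  auto Grp = std::make_unique<Group>(Factor);
  return Grp->isSupported() && Grp->lowerIntoOptimizedSequence();
}
```

Both functions behave identically to a manual new/delete pair on the success path, but neither can leak if an early return is added later.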

DavidKreitzer added inline comments. Oct 31 2016, 2:14 PM

lib/Target/X86/X86InterleavedAccess.cpp
26	elements --> element
53	This reads funny. Did you mean "into \p NumSubVectors sub vectors of type \p T."?
55	Please describe the return value in the comment. I assume it is supposed to indicate success? It seems odd to me that that would be necessary. I would expect isSupported() == true to imply that the transform is expected to succeed. Should this just return void instead?
64	It would be slightly clearer if you used the variable names here, i.e. Input-Vectors --> InputVectors OutputVectors --> TransposedVectors
68	Is this method really supposed to be restricted to 4x4 transpose? If so, your example should be a 4x4 transpose, not a 2x2 one. (Did you mean for the comment at the method definition to be here?) Trasposed --> Transposed
95	Don't duplicate the comments from the class definition unless they contain additional value (e.g. notes about the low level implementation).
116	Why are you passing VecInst, NumSubVectors, & SubVecTy as arguments to decompose? These are all easily accessible from the class members.
119	VecSize --> SubVecSize would be clearer

Thanks Dave, the update addresses your comment.

lib/Target/X86/X86InterleavedAccess.cpp
55	So the intent is to use it for breaking down any kind of instruction such as load, shuffle. Currently, load is only supported. Also there might be some other challenges where we were not able to create the dummy vectors and break-down the instruction evenly.
68	Yes.
116	So the intent is to use this function in a general way and use it to break down any kind of instruction. Currently, it is used for load instruction, but it will also be used to break down the long shuffle instruction in the strided-store pattern.

Please can you replace the uses of uint32_t with unsigned?

In D25986#590381, @RKSimon wrote:

Please can you replace the uses of uint32_t with unsigned?

Hi Simon,

I would think uint32_t would be preferred over unsigned because unsigned could lead to an incorrect result. Though I agree it will be safe to use unsigned here.

But in general is there a reason for wanting the size to change with the underlying architecture?

Replaces uint32_t with unsigned.

delena added inline comments. Nov 9 2016, 7:35 PM

lib/Target/X86/X86InterleavedAccess.cpp
33	const LoadInst* Inst;
49	Do you need to keep Builder inside? I assume it may be a local var inside function.
80	const Instruction *I
223	Why do you need to pass an empty builder here?
227	return Grp.isSupported() && Grp.lowerIntoOptimizedSequence();

This change-set addresses Elena's comments.

lib/Target/X86/X86InterleavedAccess.cpp
33	Inst cannot be a constant pointer because we are bit-casting its pointer operand. Inst can be either a load or a store, which is why I declared it as an Instruction instead of a LoadInst.
49	Yes, because Builder is used in different member functions and the different member functions might not have any idea about the insertion point. Having it as a member variable helps updating the insertion point automatically after each new instruction creation.
80	I think it's covered by my previous comment.
223	We are using this builder with insertionpoint set to load instruction in multiple member functions where the member functions have no idea about the central insertion point. Therefore, we need to have it as a member variable set to the central insertion point for the other member functions.
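
The argument for keeping the builder as a member can be illustrated with a toy model. Everything below is hypothetical: ToyBuilder is not IRBuilder<>, it only mimics the idea of an insertion point that advances after each created instruction, and ToyGroup stands in for the class under review.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a builder: inserts "instructions" at a moving
// insertion point inside a block. Not the LLVM API.
struct ToyBuilder {
  std::vector<std::string> &Block;
  std::size_t InsertPt;
  ToyBuilder(std::vector<std::string> &B, std::size_t Pt)
      : Block(B), InsertPt(Pt) {}
  void create(const std::string &Inst) {
    Block.insert(Block.begin() + static_cast<std::ptrdiff_t>(InsertPt), Inst);
    ++InsertPt; // the next instruction goes right after this one
  }
};

// Toy stand-in for the access group: the member functions share one
// builder and never need to know where the insertion point is.
class ToyGroup {
  ToyBuilder &Builder;
  void decompose() { Builder.create("load.0"); Builder.create("load.1"); }
  void transpose() { Builder.create("shuffle.0"); Builder.create("shuffle.1"); }

public:
  explicit ToyGroup(ToyBuilder &B) : Builder(B) {}
  void lower() { decompose(); transpose(); }
};

// Builds a two-instruction block and lowers into it, inserting before
// the wide load (index 0).
std::vector<std::string> lowerToyBlock() {
  std::vector<std::string> Block = {"wide.load", "ret"};
  ToyBuilder Builder(Block, 0);
  ToyGroup Grp(Builder);
  Grp.lower();
  return Block;
}
```

In this sketch the new instructions end up in creation order ahead of the original wide load, without decompose() or transpose() taking an insertion point as a parameter.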

delena added inline comments. Nov 10 2016, 10:57 PM

lib/Target/X86/X86InterleavedAccess.cpp
33	so "Instruction* Inst" is enough. The "const" is redundant here.
127	Is it always "load", or are you thinking about a general case? You can call the function decomposeLoad.
141	Zvi is right. And fix indentation, please.

Farhana added inline comments. Nov 11 2016, 7:15 AM

lib/Target/X86/X86InterleavedAccess.cpp
127	Yes, that's the plan to use it in a general way, for decomposing any kind of instructions such as load, store, shuffle.
141	Sorry Zvi, I missed this comment earlier. Yes, that's correct.
141	"And fix indentation, please" Instruction *NewLoad = Builder.CreateAlignedLoad(NewBasePtr, LI->getAlignment()); Are you talking about this? Indentation is not off here. I reran clang-format, everything remained as it is.

In D25986#590787, @Farhana wrote:

In D25986#590381, @RKSimon wrote:

Please can you replace the uses of uint32_t with unsigned?

I would think uint32_t would be preferred over unsigned because unsigned could lead to an incorrect result. Though I agree it will be safe to use unsigned here.

But in general is there a reason for wanting the size to change with the underlying architecture?

In general we should match whatever the original return values was - Type::getVectorNumElements() returns unsigned so the NumSubVectors should probably match that.

I'm sorry but I messed up on the types of a couple of the other instances - CreateShuffleVector should take an ArrayRef<uint32_t> and DataLayout::getTypeSizeInBits returns a uint64_t
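
The "match the original return type" rule can be sketched as below. The two free functions are hypothetical stand-ins that only mirror the return types named in the comments above (unsigned for the element count, uint64_t for the size in bits); they are not the LLVM declarations.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins; only the return types follow the review.
unsigned getVectorNumElements() { return 4; }
uint64_t getTypeSizeInBits() { return 256; }

// The local variable uses the same type as the value it receives,
// so no implicit conversion takes place.
unsigned numSubVectors() {
  static_assert(
      std::is_same<decltype(getVectorNumElements()), unsigned>::value,
      "element count is expected to be unsigned");
  unsigned N = getVectorNumElements();
  return N;
}

// A 64-bit size stays 64-bit instead of being narrowed to unsigned.
uint64_t totalBits() {
  static_assert(std::is_same<decltype(getTypeSizeInBits()), uint64_t>::value,
                "size in bits is expected to be uint64_t");
  uint64_t Size = getTypeSizeInBits();
  return Size * numSubVectors();
}
```

The static_asserts are an optional extra for the sketch; they make a type change in the callee show up as a compile error at the use site.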

lib/Target/X86/X86InterleavedAccess.cpp
204	for (unsigned i = 0, e = Shuffles.size(); i != e ; ++i)

Farhana updated this revision to Diff 77650. Nov 11 2016, 12:17 PM

Hi Guys,

Is there anything else you want me to fix/clarify?

Farhana

LGTM. Thanks, Farhana!

X86InterleavedAccess.cpp
71 ↗	(On Diff #77650)	Minor comment: AFAIU you can pass an ArrayRef object by value efficiently, but I did see several occurrences in LLVM's source code with ArrayRefs passed by reference. Whatever you choose, maybe make this entire file consistent (line 82, for example, is inconsistent with line 71). I see that the overridden methods such as TargetLowering::lowerInterleavedLoad pass ArrayRef objects by value, so maybe this would be a tie-breaker for following this convention?

ArrayRefs are passed by value.
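
Why passing such a view by value is cheap can be seen from a minimal sketch. ArrayView below is a hypothetical type written for this illustration, loosely modelled on the idea of a non-owning pointer-plus-length view; it is not llvm::ArrayRef and omits almost all of its interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view: a pointer and a length, nothing else.
template <typename T> class ArrayView {
  const T *Data;
  std::size_t Length;

public:
  ArrayView(const std::vector<T> &V) : Data(V.data()), Length(V.size()) {}
  std::size_t size() const { return Length; }
  const T &operator[](std::size_t I) const { return Data[I]; }
};

// Taking the view by value copies only the pointer and the length;
// the underlying elements are never copied.
unsigned sumIndices(ArrayView<unsigned> Indices) {
  unsigned Sum = 0;
  for (std::size_t I = 0, E = Indices.size(); I != E; ++I)
    Sum += Indices[I];
  return Sum;
}

// Small driver so the sketch can be exercised without LLVM.
unsigned demoSum() {
  std::vector<unsigned> Indices = {0, 1, 2, 3};
  return sumIndices(Indices); // implicit conversion to the view
}
```

The loop also uses the "I != E" form suggested earlier in the review, which evaluates size() once.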

Thanks, Farhana! LGTM.

LGTM

This revision is now accepted and ready to land. Nov 23 2016, 4:04 AM

Hi Farhana,
I have no further comments. This LGTM too.
-Dave

Closed by commit rL288410: Refactored X86InterleavedAccess into a class. NFCI. (authored by dlkreitz). Dec 1 2016, 12:06 PM

This revision was automatically updated to reflect the committed changes.

Diff 76113

lib/Target/X86/X86InterleavedAccess.cpp

	//===------- X86InterleavedAccess.cpp --------------===//			//===--------- X86InterleavedAccess.cpp ----------------------------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===--------------------------------------------------------------------===//
	//			///
	// This file contains the X86 implementation of the interleaved accesses			/// \file
	// optimization generating X86-specific instructions/intrinsics for interleaved			/// This file contains the X86 implementation of the interleaved accesses
	// access groups.			/// optimization generating X86-specific instructions/intrinsics for
	//			/// interleaved access groups.
	//===----------------------------------------------------------------------===//			///
				//===--------------------------------------------------------------------===//

	#include "X86ISelLowering.h"			#include "X86ISelLowering.h"
	#include "X86TargetMachine.h"			#include "X86TargetMachine.h"

	using namespace llvm;			using namespace llvm;

	/// Returns true if the interleaved access group represented by the shuffles			/// \brief This class holds necessary information to represent an interleaved
	/// is supported for the subtarget. Returns false otherwise.			/// access group and supports utilities to lower the group into
	static bool isSupported(const X86Subtarget &SubTarget,			/// X86-specific instructions/intrinsics.
	const LoadInst *LI,			/// E.g. A group of interleaving access loads (Factor = 2; accessing every
	const ArrayRef<ShuffleVectorInst *> &Shuffles,			/// other elements)
				DavidKreitzer: elements --> element
	unsigned Factor) {			/// %wide.vec = load <8 x i32>, <8 x i32>* %ptr
				/// %v0 = shuffle <8 x i32> %wide.vec, <8 x i32> undef, <0, 2, 4, 6>
	const DataLayout &DL = Shuffles[0]->getModule()->getDataLayout();			/// %v1 = shuffle <8 x i32> %wide.vec, <8 x i32> undef, <1, 3, 5, 7>

				class X86InterleavedAccessGroup {
				/// \brief Reference to the wide-load instruction of an interleaved access
				/// group.
				delena: const LoadInst* Inst;
				Farhana: Inst cannot be a constant pointer because we are bit-casting its pointer operand. Inst can be either a load or a store, which is why I declared it as an Instruction instead of a LoadInst.
				delena: so "Instruction* Inst" is enough. The "const" is redundant here.
				Instruction *const Inst;

				/// \brief Reference to the shuffle(s), consumer(s) of the (load) 'Inst'.
				const ArrayRef<ShuffleVectorInst *> Shuffles;

				/// \brief Reference to the starting index of each user-shuffle.
				const ArrayRef<unsigned> Indices;

				/// \brief Reference to the interleaving stride in terms of elements.
				const unsigned Factor;

				/// \brief Reference to the underlying target.
				const X86Subtarget &Subtarget;

				const DataLayout &DL;

				delena: Do you need to keep Builder inside? I assume it may be a local var inside function.
				Farhana: Yes, because Builder is used in different member functions and the different member functions might not have any idea about the insertion point. Having it as a member variable helps updating the insertion point automatically after each new instruction creation.
				IRBuilder<> &Builder;

				/// \brief Breaks down a vector \p 'Inst' of N elements into \p NumSubVectors
				/// sub vectors of sub-vector of type \p T. Returns the sub-vectors in \p
				DavidKreitzer: This reads funny. Did you mean "into \p NumSubVectors sub vectors of type \p T."?
				/// DecomposedVectors
				bool decompose(Instruction *Inst, uint32_t NumSubVectors, VectorType *T,
				DavidKreitzer: Please describe the return value in the comment. I assume it is supposed to indicate success? It seems odd to me that that would be necessary. I would expect isSupported() == true to imply that the transform is expected to succeed. Should this just return void instead?
				Farhana: So the intent is to use it for breaking down any kind of instruction such as load, shuffle. Currently, load is only supported. Also there might be some other challenges where we were not able to create the dummy vectors and break-down the instruction evenly.
				SmallVectorImpl<Instruction *> &DecomposedVectors);

				/// \brief Performs matrix transposition on \p InputVectors and returns the
				/// transposed-vectors in \p TrasposedVectors.
				/// E.g.
				/// Input-Vectors:
				/// In-V0 = p1, p2
				/// In-V1 = q1, q2
				/// Output-Vectors:
				DavidKreitzer: It would be slightly clearer if you used the variable names here, i.e. Input-Vectors --> InputVectors OutputVectors --> TransposedVectors
				/// Out-V0 = p1, q1
				/// Out-V1 = p2, q2
				void transpose(const ArrayRef<Instruction *> &InputVectors,
				SmallVectorImpl<Value *> &TrasposedVectors);
				DavidKreitzer: Is this method really supposed to be restricted to 4x4 transpose? If so, your example should be a 4x4 transpose, not a 2x2 one. (Did you mean for the comment at the method definition to be here?) Trasposed --> Transposed
				Farhana: Yes.

				public:
				/// In order to form an interleaved access group X86InterleavedAccessGroup
				/// requires a wide-load instruction \p 'I', a group of interleaved-vectors
				/// \p Shuffs, reference to the first indices of each interleaved-vector
				/// \p 'Ind' and the interleaving stride factor \p F. In order to generate
				/// X86-specific instructions/intrinsics it also requires the underlying
				/// target information \p STarget.
				explicit X86InterleavedAccessGroup(Instruction *I,
				const ArrayRef<ShuffleVectorInst *> Shuffs,
				const ArrayRef<unsigned> Ind,
				const unsigned F,
				delena: const Instruction *I
				Farhana: I think it's covered by my previous comment.
				const X86Subtarget &STarget,
				IRBuilder<> &B)
				: Inst(I), Shuffles(Shuffs), Indices(Ind), Factor(F), Subtarget(STarget),
				DL(Inst->getModule()->getDataLayout()), Builder(B) {}

				/// \brief Returns true if this interleaved access group can be lowered into
				/// x86-specific instructions/intrinsics, false otherwise.
				bool isSupported() const;

				/// \brief Lowers this interleaved access group into X86-specific
				/// instructions/intrinsics.
				bool lowerIntoOptimizedSequence();
				};

				// Returns true if this interleaved access group is supported for the
				DavidKreitzer: Don't duplicate the comments from the class definition unless they contain additional value (e.g. notes about the low level implementation).
				// subtarget. Returns false otherwise.
				bool X86InterleavedAccessGroup::isSupported() const {
	VectorType *ShuffleVecTy = Shuffles[0]->getType();			VectorType *ShuffleVecTy = Shuffles[0]->getType();
	unsigned ShuffleVecSize = DL.getTypeSizeInBits(ShuffleVecTy);			unsigned ShuffleVecSize = DL.getTypeSizeInBits(ShuffleVecTy);
	Type *ShuffleEltTy = ShuffleVecTy->getVectorElementType();			Type *ShuffleEltTy = ShuffleVecTy->getVectorElementType();

	if (DL.getTypeSizeInBits(LI->getType()) < Factor * ShuffleVecSize)			if (DL.getTypeSizeInBits(Inst->getType()) < Factor * ShuffleVecSize)
	return false;			return false;

	// Currently, lowering is supported for 64 bits on AVX.			// Currently, lowering is supported for 64 bits on AVX.
	if (!SubTarget.hasAVX() || ShuffleVecSize != 256 ||			if (!Subtarget.hasAVX() || ShuffleVecSize != 256 ||
	DL.getTypeSizeInBits(ShuffleEltTy) != 64 ||			DL.getTypeSizeInBits(ShuffleEltTy) != 64 || Factor != 4)
	Factor != 4)
	return false;			return false;

	return true;			return true;
	}			}

	/// \brief Lower interleaved load(s) into target specific instructions/			// Breaks down a vector 'VecInst' of N elements into NumSubVectors sub vectors;
	/// intrinsics. Lowering sequence varies depending on the vector-types, factor,			// where each sub-vector of type VecTy.
	/// number of shuffles and ISA.			bool X86InterleavedAccessGroup::decompose(
	/// Currently, lowering is supported for 4x64 bits with Factor = 4 on AVX.			Instruction *VecInst, uint32_t NumSubVectors, VectorType *SubVecTy,
				DavidKreitzer: Why are you passing VecInst, NumSubVectors, & SubVecTy as arguments to decompose? These are all easily accessible from the class members.
				Farhana: So the intent is to use this function in a general way and use it to break down any kind of instruction. Currently, it is used for load instruction, but it will also be used to break down the long shuffle instruction in the strided-store pattern.
	bool X86TargetLowering::lowerInterleavedLoad(			SmallVectorImpl<Instruction *> &DecomposedVectors) {
	LoadInst *LI, ArrayRef<ShuffleVectorInst *> Shuffles,			Type *VecTy = VecInst->getType();
	ArrayRef<unsigned> Indices, unsigned Factor) const {			uint32_t VecSize = DL.getTypeSizeInBits(SubVecTy);
				DavidKreitzer: VecSize --> SubVecSize would be clearer
	assert(Factor >= 2 && Factor <= getMaxSupportedInterleaveFactor() &&
	"Invalid interleave factor");			assert(VecTy->isVectorTy() &&
	assert(!Shuffles.empty() && "Empty shufflevector input");			DL.getTypeSizeInBits(VecTy) >= VecSize * NumSubVectors &&
	assert(Shuffles.size() == Indices.size() &&			"Invalid Inst-size!!!");
	"Unmatched number of shufflevectors and indices");			assert(VecTy->getVectorElementType() == SubVecTy->getVectorElementType() &&
				"Element type mismatched!!!");

	if (!isSupported(Subtarget, LI, Shuffles, Factor))			if (!isa<LoadInst>(VecInst))
				delena: Is it always "load", or are you thinking about a general case? You can call the function decomposeLoad.
				Farhana: Yes, that's the plan to use it in a general way, for decomposing any kind of instructions such as load, store, shuffle.
	return false;			return false;

	VectorType *ShuffleVecTy = Shuffles[0]->getType();			LoadInst *LI = cast<LoadInst>(VecInst);
				Type *VecBasePtrTy = SubVecTy->getPointerTo(LI->getPointerAddressSpace());
	Type *VecBasePtrTy = ShuffleVecTy->getPointerTo(LI->getPointerAddressSpace());

	IRBuilder<> Builder(LI);
	SmallVector<Instruction *, 4> NewLoads;
	SmallVector<Value *, 4> NewShuffles;
	NewShuffles.resize(Factor);

	Value *VecBasePtr =			Value *VecBasePtr =
	Builder.CreateBitCast(LI->getPointerOperand(), VecBasePtrTy);			Builder.CreateBitCast(LI->getPointerOperand(), VecBasePtrTy);

	// Generate 4 loads of type v4xT64			// Generate N loads of T type
	for (unsigned Part = 0; Part < Factor; Part++) {			for (uint32_t i = 0; i < NumSubVectors; i++) {
	// TODO: Support inbounds GEP			// TODO: Support inbounds GEP
	Value *NewBasePtr =			Value *NewBasePtr = Builder.CreateGEP(VecBasePtr, Builder.getInt32(i));
	Builder.CreateGEP(VecBasePtr, Builder.getInt32(Part));
	Instruction *NewLoad =			Instruction *NewLoad =
	Builder.CreateAlignedLoad(NewBasePtr, LI->getAlignment());			Builder.CreateAlignedLoad(NewBasePtr, LI->getAlignment());
				zvi: Not related to this patch, but is setting the original vector load's alignment for all 'decomposed' loads correct?
				Farhana: Sorry Zvi, I missed this comment earlier. Yes, that's correct.
				delena: Zvi is right. And fix indentation, please.
				Farhana: "And fix indentation, please" Instruction *NewLoad = Builder.CreateAlignedLoad(NewBasePtr, LI->getAlignment()); Are you talking about this? Indentation is not off here. I reran clang-format, everything remained as it is.
	NewLoads.push_back(NewLoad);			DecomposedVectors.push_back(NewLoad);
	}			}

				return true;
				}

				// Performs matrix transposition and returns the transposed-vectors.
				// E.g.
				// Input-Vectors:
				// In-V0 = p1, p2
				// In-V1 = q1, q2
				// Output-Vectors:
				// Out-V0 = p1, q1
				// Out-V1 = p2, q2
				void X86InterleavedAccessGroup::transpose(
				const ArrayRef<Instruction *> &Matrix,
				SmallVectorImpl<Value *> &TransposedMatrix) {
				TransposedMatrix.resize(4);

	// dst = src1[0,1],src2[0,1]			// dst = src1[0,1],src2[0,1]
	uint32_t IntMask1[] = {0, 1, 4, 5};			uint32_t IntMask1[] = {0, 1, 4, 5};
	ArrayRef<unsigned int> ShuffleMask = makeArrayRef(IntMask1, 4);			ArrayRef<unsigned int> Mask = makeArrayRef(IntMask1, 4);
	Value *IntrVec1 =			Value *IntrVec1 = Builder.CreateShuffleVector(Matrix[0], Matrix[2], Mask);
	Builder.CreateShuffleVector(NewLoads[0], NewLoads[2], ShuffleMask);			Value *IntrVec2 = Builder.CreateShuffleVector(Matrix[1], Matrix[3], Mask);
	Value *IntrVec2 =
	Builder.CreateShuffleVector(NewLoads[1], NewLoads[3], ShuffleMask);

	// dst = src1[2,3],src2[2,3]			// dst = src1[2,3],src2[2,3]
	uint32_t IntMask2[] = {2, 3, 6, 7};			uint32_t IntMask2[] = {2, 3, 6, 7};
	ShuffleMask = makeArrayRef(IntMask2, 4);			Mask = makeArrayRef(IntMask2, 4);
	Value *IntrVec3 =			Value *IntrVec3 = Builder.CreateShuffleVector(Matrix[0], Matrix[2], Mask);
	Builder.CreateShuffleVector(NewLoads[0], NewLoads[2], ShuffleMask);			Value *IntrVec4 = Builder.CreateShuffleVector(Matrix[1], Matrix[3], Mask);
	Value *IntrVec4 =
	Builder.CreateShuffleVector(NewLoads[1], NewLoads[3], ShuffleMask);

	// dst = src1[0],src2[0],src1[2],src2[2]			// dst = src1[0],src2[0],src1[2],src2[2]
	uint32_t IntMask3[] = {0, 4, 2, 6};			uint32_t IntMask3[] = {0, 4, 2, 6};
	ShuffleMask = makeArrayRef(IntMask3, 4);			Mask = makeArrayRef(IntMask3, 4);
	NewShuffles[0] = Builder.CreateShuffleVector(IntrVec1, IntrVec2, ShuffleMask);			TransposedMatrix[0] = Builder.CreateShuffleVector(IntrVec1, IntrVec2, Mask);
	NewShuffles[2] = Builder.CreateShuffleVector(IntrVec3, IntrVec4, ShuffleMask);			TransposedMatrix[2] = Builder.CreateShuffleVector(IntrVec3, IntrVec4, Mask);

	// dst = src1[1],src2[1],src1[3],src2[3]			// dst = src1[1],src2[1],src1[3],src2[3]
	uint32_t IntMask4[] = {1, 5, 3, 7};			uint32_t IntMask4[] = {1, 5, 3, 7};
	ShuffleMask = makeArrayRef(IntMask4, 4);			Mask = makeArrayRef(IntMask4, 4);
	NewShuffles[1] = Builder.CreateShuffleVector(IntrVec1, IntrVec2, ShuffleMask);			TransposedMatrix[1] = Builder.CreateShuffleVector(IntrVec1, IntrVec2, Mask);
	NewShuffles[3] = Builder.CreateShuffleVector(IntrVec3, IntrVec4, ShuffleMask);			TransposedMatrix[3] = Builder.CreateShuffleVector(IntrVec3, IntrVec4, Mask);

	for (unsigned i = 0; i < Shuffles.size(); i++) {
	unsigned Index = Indices[i];
	Shuffles[i]->replaceAllUsesWith(NewShuffles[Index]);
	}			}

				// Lowers this interleaved access group into X86-specific
				// instructions/intrinsics.
				bool X86InterleavedAccessGroup::lowerIntoOptimizedSequence() {
				SmallVector<Instruction *, 4> DecomposedVectors;
				VectorType *VecTy = Shuffles[0]->getType();
				// Try to generate target-sized register(/instruction).
				if (!decompose(Inst, Factor, VecTy, DecomposedVectors))
				return false;

				SmallVector<Value *, 4> TransposedVectors;
				// Perform matrix-transposition in order to compute interleaved
				// results by generating some sort of (optimized) target-specific
				// instructions.
				transpose(DecomposedVectors, TransposedVectors);

				// Now replace the unoptimized-interleaved-vectors with the
				// transposed-interleaved vectors.
				for (unsigned i = 0; i < Shuffles.size(); i++)
				Shuffles[i]->replaceAllUsesWith(TransposedVectors[Indices[i]]);
				RKSimon: for (unsigned i = 0, e = Shuffles.size(); i != e ; ++i)

	return true;			return true;
	}			}

				// Lower interleaved load(s) into target specific instructions/
				// intrinsics. Lowering sequence varies depending on the vector-types, factor,
				// number of shuffles and ISA.
				// Currently, lowering is supported for 4x64 bits with Factor = 4 on AVX.
				bool X86TargetLowering::lowerInterleavedLoad(
				LoadInst *LI, ArrayRef<ShuffleVectorInst *> Shuffles,
				ArrayRef<unsigned> Indices, unsigned Factor) const {
				assert(Factor >= 2 && Factor <= getMaxSupportedInterleaveFactor() &&
				"Invalid interleave factor");
				assert(!Shuffles.empty() && "Empty shufflevector input");
				assert(Shuffles.size() == Indices.size() &&
				"Unmatched number of shufflevectors and indices");

				// Create an interleaved access group.
				IRBuilder<> Builder(LI);
				delena: Why do you need to pass an empty builder here?
				Farhana: We are using this builder with insertionpoint set to load instruction in multiple member functions where the member functions have no idea about the central insertion point. Therefore, we need to have it as a member variable set to the central insertion point for the other member functions.
				X86InterleavedAccessGroup *Grp = new X86InterleavedAccessGroup(
				RKSimon: Isn't this a leak? Where is this deleted?
				zvi: It would be better to instantiate as a function-local variable. If you do want to allocate on the heap, please use std::unique_ptr or something similar to manage the object.
				LI, Shuffles, Indices, Factor, Subtarget, Builder);

				bool IsLowered = false;
				delena: return Grp.isSupported() && Grp.lowerIntoOptimizedSequence();

				if (Grp->isSupported() && Grp->lowerIntoOptimizedSequence())
				IsLowered = true;

				delete Grp;
				return IsLowered;
				}
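
The four shuffle masks used in transpose() can be checked outside LLVM with a small simulation. This is a sketch under stated assumptions: Vec4, shuffle, transpose4x4 and sampleMatrix are hypothetical names introduced here, and shuffle only models the two-input shufflevector rule that mask indices 0-3 select from the first operand and 4-7 from the second.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<int, 4>;
using Mask4 = std::array<unsigned, 4>;

// Models a two-input shuffle: indices 0-3 pick from A, 4-7 pick from B.
Vec4 shuffle(const Vec4 &A, const Vec4 &B, const Mask4 &Mask) {
  Vec4 R{};
  for (unsigned I = 0; I != 4; ++I)
    R[I] = Mask[I] < 4 ? A[Mask[I]] : B[Mask[I] - 4];
  return R;
}

// Same mask sequence as the diff: two steps of two-input shuffles.
std::array<Vec4, 4> transpose4x4(const std::array<Vec4, 4> &M) {
  const Mask4 Lo = {{0, 1, 4, 5}};   // dst = src1[0,1],src2[0,1]
  const Mask4 Hi = {{2, 3, 6, 7}};   // dst = src1[2,3],src2[2,3]
  const Mask4 Even = {{0, 4, 2, 6}}; // dst = src1[0],src2[0],src1[2],src2[2]
  const Mask4 Odd = {{1, 5, 3, 7}};  // dst = src1[1],src2[1],src1[3],src2[3]

  Vec4 IntrVec1 = shuffle(M[0], M[2], Lo);
  Vec4 IntrVec2 = shuffle(M[1], M[3], Lo);
  Vec4 IntrVec3 = shuffle(M[0], M[2], Hi);
  Vec4 IntrVec4 = shuffle(M[1], M[3], Hi);

  std::array<Vec4, 4> T{};
  T[0] = shuffle(IntrVec1, IntrVec2, Even);
  T[2] = shuffle(IntrVec3, IntrVec4, Even);
  T[1] = shuffle(IntrVec1, IntrVec2, Odd);
  T[3] = shuffle(IntrVec3, IntrVec4, Odd);
  return T;
}

// Sample input: element value = 10 * row + column.
std::array<Vec4, 4> sampleMatrix() {
  return {{{{0, 1, 2, 3}},
           {{10, 11, 12, 13}},
           {{20, 21, 22, 23}},
           {{30, 31, 32, 33}}}};
}
```

With the sample input, row k of the result holds column k of the input, which is what the loop over Shuffles relies on when it replaces each shuffle with TransposedVectors[Indices[i]].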

This is an archive of the discontinued LLVM Phabricator instance.

Re-factorization of X86InterleaveAccess into a class
ClosedPublic
