This change makes it possible to restore instructions that were removed by the InstCombine pass, so that an AVX 256-bit shuffle operation can be vectorized.
Also, !dereferenceable_or_null metadata is now attached to loads of non-pointer types, but according to http://llvm.org/docs/LangRef.html#load-instruction this change looks correct to me:
"The optional !dereferenceable_or_null metadata must reference a single metadata name <deref_bytes_node> corresponding to a metadata node with one i64 entry. The existence of the !dereferenceable_or_null metadata on the instruction tells the optimizer that the value loaded is known to be either dereferenceable or null. The number of bytes known to be dereferenceable is specified by the integer value in the metadata node."
So, for this code:
__m256d vsht_d4_fold(const double* ptr) {
  __m256d foo = (__m256d){ ptr[0], ptr[1], ptr[2], ptr[3] };
  return __builtin_shufflevector( foo, foo, 0, 0, 2, 2 );
}
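As a side note (not part of the patch), the shuffle indices 0, 0, 2, 2 mean that only ptr[0] and ptr[2] actually reach the result, which is why InstCombine is able to drop two of the four scalar loads; a hypothetical scalar equivalent would be:

  #include <immintrin.h>

  /* Hypothetical equivalent of vsht_d4_fold: the shuffle duplicates
     elements 0 and 2, so ptr[1] and ptr[3] never affect the result. */
  __m256d vsht_d4_fold_equiv(const double* ptr) {
      return (__m256d){ ptr[0], ptr[0], ptr[2], ptr[2] };
  }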
Currently, after the InstCombine pass we have:
define <4 x double> @load_four_scalars_but_use_two(double* %ptr) #0 {
  %arrayidx2 = getelementptr inbounds double, double* %ptr, i64 2
  %ld0 = load double, double* %ptr, align 8
  %ld2 = load double, double* %arrayidx2, align 8
  %ins0 = insertelement <4 x double> undef, double %ld0, i32 0
  %ins2 = insertelement <4 x double> %ins0, double %ld2, i32 2
  %shuffle = shufflevector <4 x double> %ins2, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
  ret <4 x double> %shuffle
}
which results in this assembly output:
vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero
vmovsd 16(%rdi), %xmm1 # xmm1 = mem[0],zero
vinsertf128 $1, %xmm1, %ymm0, %ymm0
vmovddup %ymm0, %ymm0 # ymm0 = ymm0[0,0,2,2]
retq
After this change, we get the following output after InstCombine:
define <4 x double> @load_four_scalars_but_use_two(double* nocapture readonly %ptr) local_unnamed_addr #0 {
  %arrayidx2 = getelementptr inbounds double, double* %ptr, i64 2
  %ld0 = load double, double* %ptr, align 8, !dereferenceable_or_null !0
  %ld2 = load double, double* %arrayidx2, align 8, !dereferenceable_or_null !1
  %ins0 = insertelement <4 x double> undef, double %ld0, i32 0
  %ins2 = insertelement <4 x double> %ins0, double %ld2, i32 2
  %shuffle = shufflevector <4 x double> %ins2, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
  ret <4 x double> %shuffle
}
and the following IR after the SLP Vectorizer, which can now widen the two scalar loads into a single <4 x double> load:
define <4 x double> @load_four_scalars_but_use_two(double* nocapture readonly %ptr) local_unnamed_addr #0 {
  %arrayidx2 = getelementptr inbounds double, double* %ptr, i64 2
  %1 = getelementptr double, double* %ptr, i64 1
  %2 = getelementptr double, double* %ptr, i64 3
  %3 = bitcast double* %ptr to <4 x double>*
  %4 = load <4 x double>, <4 x double>* %3, align 8
  %5 = extractelement <4 x double> %4, i32 0
  %ins0 = insertelement <4 x double> undef, double %5, i32 0
  %6 = extractelement <4 x double> %4, i32 1
  %7 = insertelement <4 x double> %ins0, double %6, i64 1
  %8 = extractelement <4 x double> %4, i32 2
  %ins2 = insertelement <4 x double> %7, double %8, i32 2
  %9 = extractelement <4 x double> %4, i32 3
  %10 = insertelement <4 x double> %ins2, double %9, i64 3
  %shuffle = shufflevector <4 x double> %10, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
  ret <4 x double> %shuffle
}
which allows us to get this assembly output:
vmovddup (%rdi), %ymm0 # ymm0 = mem[0,0,2,2]
retq
You really shouldn't have global variables like this; they need to go and be embedded in the combine instead.