Diff 123854

docs/LangRef.rst


	(4,910 lines not shown)

	.. code-block:: llvm			.. code-block:: llvm

	%result = call i64 %binop(i64 %x, i64 %y), !callees !0			%result = call i64 %binop(i64 %x, i64 %y), !callees !0

	...			...
	!0 = !{i64 (i64, i64)* @add, i64 (i64, i64)* @sub}			!0 = !{i64 (i64, i64)* @add, i64 (i64, i64)* @sub}

				'``speculation.marker``' Metadata
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				reames: This needs a name change. We already have a dereferenceable attribute which marks arguments and return values as being globally dereferenceable. This is marking a memory location as being dereferenceable at a particular contextual location. Maybe: dereferenceable_offsets?

				``speculation.marker`` metadata must be attached to a load. It consists of
				a set of ``i64`` offsets indicating that the memory at the load's pointer
				address plus each offset is accessible for reading. The intent
				of this metadata is to keep track of dereferenceable memory locations after
				load operations to that memory were deleted, as it might be beneficial
				for future optimizations. Offsets are sorted; they do not indicate min/max
				offsets, but rather each individual offset proven to be dereferenceable.

				.. code-block:: llvm

				%ld1 = load double, double* %arrayidx1, align 8, !speculation.marker !0

				...
				!0 = !{i64 -1, i64 2}
				reames: More information on the format is needed. Specific missing pieces:
				- what do both fields mean?
				- i64 or any constant?
				- signed or unsigned?
				- is it a list?
				I'd also suggest requiring sorting for ease of access.
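
One possible reading of the operands, sketched in plain C++ so it builds without LLVM: each operand is an offset relative to the annotated load, counted in elements of the loaded type. That unit is an assumption drawn from the accompanying test, not something the proposed text states, and `readableIndices` is a hypothetical helper that does not exist in the patch. The sketch also assumes `%arrayidx1` addresses element 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the patch. Given the element index of the
// annotated load and the operands of its !speculation.marker node, return the
// element indices that the marker describes as readable.
std::vector<std::int64_t> readableIndices(std::int64_t loadIndex,
                                          const std::vector<std::int64_t> &marker) {
  std::vector<std::int64_t> indices;
  indices.reserve(marker.size());
  for (std::int64_t rel : marker)
    indices.push_back(loadIndex + rel); // each operand is relative to the load
  return indices;
}

// The documentation example: %ld1 is assumed to read element 1 and carries
// !{i64 -1, i64 2}, which under this reading describes elements 0 and 3.
void checkLangRefExample() {
  const std::vector<std::int64_t> expected{0, 3};
  assert(readableIndices(1, {-1, 2}) == expected);
}
```

Under this reading the operands are signed, which is one of the points the review comment asks the documentation to pin down.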

	'``unpredictable``' Metadata			'``unpredictable``' Metadata
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	``unpredictable`` metadata may be attached to any branch or switch			``unpredictable`` metadata may be attached to any branch or switch
	instruction. It can be used to express the unpredictability of control			instruction. It can be used to express the unpredictability of control
	flow. Similar to the llvm.expect intrinsic, it may be used to alter			flow. Similar to the llvm.expect intrinsic, it may be used to alter
	optimizations related to compare and branch instructions. The metadata			optimizations related to compare and branch instructions. The metadata
	is treated as a boolean value; if it exists, it signals that the branch			is treated as a boolean value; if it exists, it signals that the branch
	(9,612 lines not shown)

include/llvm/IR/LLVMContext.h

(96 lines not shown)	enum {
MD_align = 17, // "align"		MD_align = 17, // "align"
MD_loop = 18, // "llvm.loop"		MD_loop = 18, // "llvm.loop"
MD_type = 19, // "type"		MD_type = 19, // "type"
MD_section_prefix = 20, // "section_prefix"		MD_section_prefix = 20, // "section_prefix"
MD_absolute_symbol = 21, // "absolute_symbol"		MD_absolute_symbol = 21, // "absolute_symbol"
MD_associated = 22, // "associated"		MD_associated = 22, // "associated"
MD_callees = 23, // "callees"		MD_callees = 23, // "callees"
MD_irr_loop = 24, // "irr_loop"		MD_irr_loop = 24, // "irr_loop"
		MD_speculation_marker = 25, // "speculation.marker"
};		};

/// Known operand bundle tag IDs, which always have the same value. All		/// Known operand bundle tag IDs, which always have the same value. All
/// operand bundle tags that LLVM has special knowledge of are listed here.		/// operand bundle tags that LLVM has special knowledge of are listed here.
/// Additionally, this scheme allows LLVM to efficiently check for specific		/// Additionally, this scheme allows LLVM to efficiently check for specific
/// operand bundle tags without comparing strings.		/// operand bundle tags without comparing strings.
enum {		enum {
OB_deopt = 0, // "deopt"		OB_deopt = 0, // "deopt"
(236 lines not shown)

lib/IR/LLVMContext.cpp

(55 lines not shown)	std::pair<unsigned, StringRef> MDKinds[] = {
{MD_align, "align"},		{MD_align, "align"},
{MD_loop, "llvm.loop"},		{MD_loop, "llvm.loop"},
{MD_type, "type"},		{MD_type, "type"},
{MD_section_prefix, "section_prefix"},		{MD_section_prefix, "section_prefix"},
{MD_absolute_symbol, "absolute_symbol"},		{MD_absolute_symbol, "absolute_symbol"},
{MD_associated, "associated"},		{MD_associated, "associated"},
{MD_callees, "callees"},		{MD_callees, "callees"},
{MD_irr_loop, "irr_loop"},		{MD_irr_loop, "irr_loop"},
		{MD_speculation_marker, "speculation.marker"},
};		};

for (auto &MDKind : MDKinds) {		for (auto &MDKind : MDKinds) {
unsigned ID = getMDKindID(MDKind.second);		unsigned ID = getMDKindID(MDKind.second);
assert(ID == MDKind.first && "metadata kind id drifted");		assert(ID == MDKind.first && "metadata kind id drifted");
(void)ID;		(void)ID;
}		}

(274 lines not shown)

lib/Transforms/InstCombine/InstCombineInternal.h

(206 lines not shown)	case Intrinsic::ssub_with_overflow:
return OCF_SIGNED_SUB;		return OCF_SIGNED_SUB;
case Intrinsic::umul_with_overflow:		case Intrinsic::umul_with_overflow:
return OCF_UNSIGNED_MUL;		return OCF_UNSIGNED_MUL;
case Intrinsic::smul_with_overflow:		case Intrinsic::smul_with_overflow:
return OCF_SIGNED_MUL;		return OCF_SIGNED_MUL;
}		}
}		}

		typedef DenseMap<Value*, SmallDenseSet<uint64_t>> BasePtrInfoTy;
		typedef DenseMap<Value*, std::pair<Value*, uint64_t>> PtrInfoTy;

/// \brief The core instruction combiner logic.		/// \brief The core instruction combiner logic.
///		///
/// This class provides both the logic to recursively visit instructions and		/// This class provides both the logic to recursively visit instructions and
/// combine them.		/// combine them.
class LLVM_LIBRARY_VISIBILITY InstCombiner		class LLVM_LIBRARY_VISIBILITY InstCombiner
: public InstVisitor<InstCombiner, Instruction *> {		: public InstVisitor<InstCombiner, Instruction *> {
// FIXME: These members shouldn't be public.		// FIXME: These members shouldn't be public.
public:		public:
(20 lines not shown)	private:
DominatorTree &DT;		DominatorTree &DT;
const DataLayout &DL;		const DataLayout &DL;
const SimplifyQuery SQ;		const SimplifyQuery SQ;
OptimizationRemarkEmitter &ORE;		OptimizationRemarkEmitter &ORE;

// Optional analyses. When non-null, these can both be used to do better		// Optional analyses. When non-null, these can both be used to do better
// combining and will be updated to reflect any changes.		// combining and will be updated to reflect any changes.
LoopInfo *LI;		LoopInfo *LI;
		BasePtrInfoTy &Bases;
		PtrInfoTy &Ptrs;

bool MadeIRChange = false;		bool MadeIRChange = false;

public:		public:
InstCombiner(InstCombineWorklist &Worklist, BuilderTy &Builder,		InstCombiner(InstCombineWorklist &Worklist, BuilderTy &Builder,
bool MinimizeSize, bool ExpensiveCombines, AliasAnalysis *AA,		bool MinimizeSize, bool ExpensiveCombines, AliasAnalysis *AA,
AssumptionCache &AC, TargetLibraryInfo &TLI, DominatorTree &DT,		AssumptionCache &AC, TargetLibraryInfo &TLI, DominatorTree &DT,
OptimizationRemarkEmitter &ORE, const DataLayout &DL,		OptimizationRemarkEmitter &ORE, const DataLayout &DL,
LoopInfo *LI)		LoopInfo *LI, BasePtrInfoTy &Bases, PtrInfoTy &Ptrs)
: Worklist(Worklist), Builder(Builder), MinimizeSize(MinimizeSize),		: Worklist(Worklist), Builder(Builder), MinimizeSize(MinimizeSize),
ExpensiveCombines(ExpensiveCombines), AA(AA), AC(AC), TLI(TLI), DT(DT),		ExpensiveCombines(ExpensiveCombines), AA(AA), AC(AC), TLI(TLI), DT(DT),
DL(DL), SQ(DL, &TLI, &DT, &AC), ORE(ORE), LI(LI) {}		DL(DL), SQ(DL, &TLI, &DT, &AC), ORE(ORE), LI(LI),
		Bases(Bases), Ptrs(Ptrs) {}

/// \brief Run the combiner over the entire worklist until it is empty.		/// \brief Run the combiner over the entire worklist until it is empty.
///		///
/// \returns true if the IR is changed.		/// \returns true if the IR is changed.
bool run();		bool run();

AssumptionCache &getAssumptionCache() const { return AC; }		AssumptionCache &getAssumptionCache() const { return AC; }

(538 lines not shown)
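
The two typedefs added to this header carry the bookkeeping used by the rest of the patch: `PtrInfoTy` maps a pointer to its base and constant offset, and `BasePtrInfoTy` maps a base pointer to the offsets of loads that were deleted. A minimal model of that bookkeeping follows, assuming string names in place of `Value*` and standard containers in place of `DenseMap`/`SmallDenseSet` so it builds without LLVM. The function names are hypothetical, and unlike the patch the sketch stores signed offsets in an ordered set.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Stand-ins for the patch's BasePtrInfoTy and PtrInfoTy, keyed by value name.
using BasePtrInfo = std::map<std::string, std::set<std::int64_t>>;
using PtrInfo = std::map<std::string, std::pair<std::string, std::int64_t>>;

// Models the GEP hook: remember "ptr = base + index" and make sure the base
// itself is tracked at offset 0.
void recordGEP(PtrInfo &ptrs, const std::string &ptr, const std::string &base,
               std::int64_t index) {
  ptrs[ptr] = {base, index};
  ptrs.emplace(base, std::make_pair(base, std::int64_t{0})); // no-op if present
}

// Models the DCE hook: a load through `ptr` is about to be erased, so note its
// offset under the base pointer. Untracked pointers are ignored.
void recordDeletedLoad(BasePtrInfo &bases, const PtrInfo &ptrs,
                       const std::string &ptr) {
  auto it = ptrs.find(ptr);
  if (it == ptrs.end())
    return;
  bases[it->second.first].insert(it->second.second);
}

// Deleting the loads through %arrayidx1 and %arrayidx3 leaves {1, 3} recorded
// under %ptr.
void checkBookkeeping() {
  PtrInfo ptrs;
  BasePtrInfo bases;
  recordGEP(ptrs, "arrayidx1", "ptr", 1);
  recordGEP(ptrs, "arrayidx3", "ptr", 3);
  recordDeletedLoad(bases, ptrs, "arrayidx1");
  recordDeletedLoad(bases, ptrs, "arrayidx3");
  const std::set<std::int64_t> expected{1, 3};
  assert(bases.at("ptr") == expected);
}
```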

lib/Transforms/InstCombine/InstructionCombining.cpp

(75 lines not shown)
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/PassManager.h"		#include "llvm/IR/PassManager.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/Use.h"		#include "llvm/IR/Use.h"
#include "llvm/IR/User.h"		#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
		#include "llvm/IR/MDBuilder.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/CBindingWrapping.h"		#include "llvm/Support/CBindingWrapping.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/DebugCounter.h"		#include "llvm/Support/DebugCounter.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
(1,401 lines not shown)	Value *InstCombiner::SimplifyVectorOp(BinaryOperator &Inst) {
}		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::visitGetElementPtrInst(GetElementPtrInst &GEP) {		Instruction *InstCombiner::visitGetElementPtrInst(GetElementPtrInst &GEP) {
SmallVector<Value*, 8> Ops(GEP.op_begin(), GEP.op_end());		SmallVector<Value*, 8> Ops(GEP.op_begin(), GEP.op_end());

		// Here we have to avoid any GEPs with vector types or with
		// number of operands that is not equal to two or GEPs
		// in a loop, except it's origin is in the same basic block.
		if (!dyn_cast<VectorType>(Ops[0]->getType()) && GEP.getNumOperands() == 2 &&
		LI != nullptr) {
		bool ValidGEP = (LI->getLoopDepth(GEP.getParent()) == 0);
		Value *Ptr = cast<Value>(&GEP);
		if (!ValidGEP)
		if (Instruction *PtrInst = dyn_cast<Instruction>(Ptr))
		ValidGEP = PtrInst->getParent() == GEP.getParent();
		ConstantInt *C = dyn_cast<ConstantInt>(GEP.getOperand(1));
		if (ValidGEP && C != nullptr) {
		Value *BasePtr = GEP.getOperand(0);
		Ptrs[Ptr].first = BasePtr;
		Ptrs[Ptr].second = C->getZExtValue();
		if (Ptrs.find(BasePtr) == Ptrs.end()) {
		Ptrs[BasePtr].first = BasePtr;
		Ptrs[BasePtr].second = 0;
		}
		}
		RKSimon: clang-format this - your indentation is broken.
		}

if (Value *V = SimplifyGEPInst(GEP.getSourceElementType(), Ops,		if (Value *V = SimplifyGEPInst(GEP.getSourceElementType(), Ops,
SQ.getWithInstruction(&GEP)))		SQ.getWithInstruction(&GEP)))
return replaceInstUsesWith(GEP, V);		return replaceInstUsesWith(GEP, V);

Value *PtrOp = GEP.getOperand(0);		Value *PtrOp = GEP.getOperand(0);

// Eliminate unneeded casts for indices, and replace indices which displace		// Eliminate unneeded casts for indices, and replace indices which displace
// by multiples of a zero size type with zero.		// by multiples of a zero size type with zero.
(1,419 lines not shown)
bool InstCombiner::run() {		bool InstCombiner::run() {
while (!Worklist.isEmpty()) {		while (!Worklist.isEmpty()) {
Instruction *I = Worklist.RemoveOne();		Instruction *I = Worklist.RemoveOne();
if (I == nullptr) continue; // skip null values.		if (I == nullptr) continue; // skip null values.

// Check to see if we can DCE the instruction.		// Check to see if we can DCE the instruction.
if (isInstructionTriviallyDead(I, &TLI)) {		if (isInstructionTriviallyDead(I, &TLI)) {
DEBUG(dbgs() << "IC: DCE: " << *I << '\n');		DEBUG(dbgs() << "IC: DCE: " << *I << '\n');
		if (InsertElementInst *Insert = dyn_cast<InsertElementInst>(I))
		if (LoadInst *Load = dyn_cast<LoadInst>(Insert->getOperand(1))) {
		Value *Ptr = Load->getOperand(0);
		if (Ptrs.find(Ptr) != Ptrs.end()) {
		Value *BasePtr = Ptrs[Ptr].first;
		Bases[BasePtr].insert(Ptrs[Ptr].second);
		}
		}
eraseInstFromFunction(*I);		eraseInstFromFunction(*I);
++NumDeadInst;		++NumDeadInst;
MadeIRChange = true;		MadeIRChange = true;
continue;		continue;
}		}

if (!DebugCounter::shouldExecute(VisitCounter))		if (!DebugCounter::shouldExecute(VisitCounter))
continue;		continue;

// Instruction isn't dead, see if we can constant propagate it.		// Instruction isn't dead, see if we can constant propagate it.
if (!I->use_empty() &&		if (!I->use_empty() &&
(I->getNumOperands() == 0 || isa<Constant>(I->getOperand(0)))) {		(I->getNumOperands() == 0 || isa<Constant>(I->getOperand(0)))) {
if (Constant *C = ConstantFoldInstruction(I, DL, &TLI)) {		if (Constant *C = ConstantFoldInstruction(I, DL, &TLI)) {
DEBUG(dbgs() << "IC: ConstFold to: " << *C << " from: " << *I << '\n');		DEBUG(dbgs() << "IC: ConstFold to: " << *C << " from: " << *I << '\n');

// Add operands to the worklist.		// Add operands to the worklist.
replaceInstUsesWith(*I, C);		replaceInstUsesWith(*I, C);
++NumConstProp;		++NumConstProp;
if (isInstructionTriviallyDead(I, &TLI))		if (isInstructionTriviallyDead(I, &TLI))
eraseInstFromFunction(*I);		eraseInstFromFunction(*I);
MadeIRChange = true;		MadeIRChange = true;
continue;		continue;
}		}
}		}
		RKSimon: clang-format - braces and indentation are really bad.

// In general, it is possible for computeKnownBits to determine all bits in		// In general, it is possible for computeKnownBits to determine all bits in
// a value even when the operands are not all constants.		// a value even when the operands are not all constants.
Type *Ty = I->getType();		Type *Ty = I->getType();
if (ExpensiveCombines && !I->use_empty() && Ty->isIntOrIntVectorTy()) {		if (ExpensiveCombines && !I->use_empty() && Ty->isIntOrIntVectorTy()) {
KnownBits Known = computeKnownBits(I, /*Depth*/0, I);		KnownBits Known = computeKnownBits(I, /*Depth*/0, I);
if (Known.isConstant()) {		if (Known.isConstant()) {
Constant *C = ConstantInt::get(Ty, Known.getConstant());		Constant *C = ConstantInt::get(Ty, Known.getConstant());
(118 lines not shown)
/// This has a couple of tricks to make the code faster and more powerful. In		/// This has a couple of tricks to make the code faster and more powerful. In
/// particular, we constant fold and DCE instructions as we go, to avoid adding		/// particular, we constant fold and DCE instructions as we go, to avoid adding
/// them to the worklist (this significantly speeds up instcombine on code where		/// them to the worklist (this significantly speeds up instcombine on code where
/// many instructions are dead or constant). Additionally, if we find a branch		/// many instructions are dead or constant). Additionally, if we find a branch
/// whose condition is a known constant, we only visit the reachable successors.		/// whose condition is a known constant, we only visit the reachable successors.
static bool AddReachableCodeToWorklist(BasicBlock *BB, const DataLayout &DL,		static bool AddReachableCodeToWorklist(BasicBlock *BB, const DataLayout &DL,
SmallPtrSetImpl<BasicBlock *> &Visited,		SmallPtrSetImpl<BasicBlock *> &Visited,
InstCombineWorklist &ICWorklist,		InstCombineWorklist &ICWorklist,
const TargetLibraryInfo *TLI) {		const TargetLibraryInfo *TLI,
		BasePtrInfoTy &Bases,
		PtrInfoTy &Ptrs) {
bool MadeIRChange = false;		bool MadeIRChange = false;
SmallVector<BasicBlock*, 256> Worklist;		SmallVector<BasicBlock*, 256> Worklist;
Worklist.push_back(BB);		Worklist.push_back(BB);

SmallVector<Instruction*, 128> InstrsForInstCombineWorklist;		SmallVector<Instruction*, 128> InstrsForInstCombineWorklist;
DenseMap<Constant *, Constant *> FoldedConstants;		DenseMap<Constant *, Constant *> FoldedConstants;

do {		do {
BB = Worklist.pop_back_val();		BB = Worklist.pop_back_val();

// We have now visited this block! If we've already been here, ignore it.		// We have now visited this block! If we've already been here, ignore it.
if (!Visited.insert(BB).second)		if (!Visited.insert(BB).second)
continue;		continue;

for (BasicBlock::iterator BBI = BB->begin(), E = BB->end(); BBI != E; ) {		for (BasicBlock::iterator BBI = BB->begin(), E = BB->end(); BBI != E; ) {
Instruction *Inst = &*BBI++;		Instruction *Inst = &*BBI++;

// DCE instruction if trivially dead.		// DCE instruction if trivially dead.
if (isInstructionTriviallyDead(Inst, TLI)) {		if (isInstructionTriviallyDead(Inst, TLI)) {
++NumDeadInst;		++NumDeadInst;
DEBUG(dbgs() << "IC: DCE: " << *Inst << '\n');		DEBUG(dbgs() << "IC: DCE: " << *Inst << '\n');
		if (InsertElementInst *Insert = dyn_cast<InsertElementInst>(Inst))
		if (LoadInst *Load = dyn_cast<LoadInst>(Insert->getOperand(1))) {
		Value *Ptr = Load->getOperand(0);
		if (Ptrs.find(Ptr) != Ptrs.end()) {
		Value *BasePtr = Ptrs[Ptr].first;
		Bases[BasePtr].insert(Ptrs[Ptr].second);
		}
		}
salvageDebugInfo(*Inst);		salvageDebugInfo(*Inst);
Inst->eraseFromParent();		Inst->eraseFromParent();
MadeIRChange = true;		MadeIRChange = true;
continue;		continue;
}		}

// ConstantProp instruction if trivially constant.		// ConstantProp instruction if trivially constant.
if (!Inst->use_empty() &&		if (!Inst->use_empty() &&
(69 lines not shown)

/// \brief Populate the IC worklist from a function, and prune any dead basic		/// \brief Populate the IC worklist from a function, and prune any dead basic
/// blocks discovered in the process.		/// blocks discovered in the process.
///		///
/// This also does basic constant propagation and other forward fixing to make		/// This also does basic constant propagation and other forward fixing to make
/// the combiner itself run much faster.		/// the combiner itself run much faster.
static bool prepareICWorklistFromFunction(Function &F, const DataLayout &DL,		static bool prepareICWorklistFromFunction(Function &F, const DataLayout &DL,
TargetLibraryInfo *TLI,		TargetLibraryInfo *TLI,
InstCombineWorklist &ICWorklist) {		InstCombineWorklist &ICWorklist,
		BasePtrInfoTy &Bases,
		PtrInfoTy &Ptrs) {
bool MadeIRChange = false;		bool MadeIRChange = false;

// Do a depth-first traversal of the function, populate the worklist with		// Do a depth-first traversal of the function, populate the worklist with
// the reachable instructions. Ignore blocks that are not reachable. Keep		// the reachable instructions. Ignore blocks that are not reachable. Keep
// track of which blocks we visit.		// track of which blocks we visit.
SmallPtrSet<BasicBlock *, 32> Visited;		SmallPtrSet<BasicBlock *, 32> Visited;
MadeIRChange |=		MadeIRChange |=
AddReachableCodeToWorklist(&F.front(), DL, Visited, ICWorklist, TLI);		AddReachableCodeToWorklist(&F.front(), DL, Visited, ICWorklist, TLI,
		Bases, Ptrs);

// Do a quick scan over the function. If we find any blocks that are		// Do a quick scan over the function. If we find any blocks that are
// unreachable, remove any instructions inside of them. This prevents		// unreachable, remove any instructions inside of them. This prevents
// the instcombine code from having to deal with some bad special cases.		// the instcombine code from having to deal with some bad special cases.
for (BasicBlock &BB : F) {		for (BasicBlock &BB : F) {
if (Visited.count(&BB))		if (Visited.count(&BB))
continue;		continue;

(18 lines not shown)	static bool combineInstructionsOverFunction(
IRBuilder<TargetFolder, IRBuilderCallbackInserter> Builder(		IRBuilder<TargetFolder, IRBuilderCallbackInserter> Builder(
F.getContext(), TargetFolder(DL),		F.getContext(), TargetFolder(DL),
IRBuilderCallbackInserter([&Worklist, &AC](Instruction *I) {		IRBuilderCallbackInserter([&Worklist, &AC](Instruction *I) {
Worklist.Add(I);		Worklist.Add(I);
if (match(I, m_Intrinsic<Intrinsic::assume>()))		if (match(I, m_Intrinsic<Intrinsic::assume>()))
AC.registerAssumption(cast<CallInst>(I));		AC.registerAssumption(cast<CallInst>(I));
}));		}));

		BasePtrInfoTy Bases;
		PtrInfoTy Ptrs;

// Lower dbg.declare intrinsics otherwise their value may be clobbered		// Lower dbg.declare intrinsics otherwise their value may be clobbered
// by instcombiner.		// by instcombiner.
bool MadeIRChange = false;		bool MadeIRChange = false;
if (ShouldLowerDbgDeclare)		if (ShouldLowerDbgDeclare)
MadeIRChange = LowerDbgDeclare(F);		MadeIRChange = LowerDbgDeclare(F);

// Iterate while there is work to do.		// Iterate while there is work to do.
int Iteration = 0;		int Iteration = 0;
while (true) {		while (true) {
++Iteration;		++Iteration;
DEBUG(dbgs() << "\n\nINSTCOMBINE ITERATION #" << Iteration << " on "		DEBUG(dbgs() << "\n\nINSTCOMBINE ITERATION #" << Iteration << " on "
<< F.getName() << "\n");		<< F.getName() << "\n");

MadeIRChange |= prepareICWorklistFromFunction(F, DL, &TLI, Worklist);		MadeIRChange |= prepareICWorklistFromFunction(F, DL, &TLI, Worklist, Bases,
		Ptrs);

InstCombiner IC(Worklist, Builder, F.optForMinSize(), ExpensiveCombines, AA,		InstCombiner IC(Worklist, Builder, F.optForMinSize(), ExpensiveCombines, AA,
AC, TLI, DT, ORE, DL, LI);		AC, TLI, DT, ORE, DL, LI, Bases, Ptrs);
IC.MaxArraySizeForCombine = MaxArraySize;		IC.MaxArraySizeForCombine = MaxArraySize;

if (!IC.run())		if (!IC.run())
break;		break;
}		}

		bool SpeculativeLoad = false;
		for (auto x : Bases)
		if (!x.second.empty())
		SpeculativeLoad = true;

		if (SpeculativeLoad)
		for (Function::iterator BB = F.begin(), E = F.end(); BB != E; ++BB)
		for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; I++)
		if (LoadInst *Load = dyn_cast<LoadInst>(&*I)) {
		Value *Ptr = Load->getOperand(0);
		if (Ptrs.find(Ptr) != Ptrs.end()) {
		SmallVector<Metadata *, 2> Vals;
		SmallDenseSet<uint64_t> &Set = Bases[Ptrs[Ptr].first];
		int64_t LoadOffset = Ptrs[Ptr].second;
		MDBuilder MDB(Load->getContext());
		if (MDNode *MD = Load->getMetadata(LLVMContext::MD_speculation_marker))
		for (int i = 0, e = MD->getNumOperands(); i < e; i++) {
		ConstantInt *C =
		mdconst::dyn_extract<ConstantInt>(MD->getOperand(i));
		int64_t Diff = LoadOffset + (int64_t)C->getSExtValue();
		if (Set.count(Diff) == 0)
		Set.insert(Diff);
		}
		for (auto Off : Set) {
		int64_t Offset = (int64_t)Off;
		Constant *C = ConstantInt::get(Load->getContext(),
		APInt(64, Offset - LoadOffset));
		Vals.push_back(MDB.createConstant(C));
		}
		Load->setMetadata(LLVMContext::MD_speculation_marker,
		MDNode::get(Load->getContext(), Vals));
		}
		}

return MadeIRChange || Iteration > 1;		return MadeIRChange || Iteration > 1;
}		}

PreservedAnalyses InstCombinePass::run(Function &F,		PreservedAnalyses InstCombinePass::run(Function &F,
FunctionAnalysisManager &AM) {		FunctionAnalysisManager &AM) {
auto &AC = AM.getResult<AssumptionAnalysis>(F);		auto &AC = AM.getResult<AssumptionAnalysis>(F);
auto &DT = AM.getResult<DominatorTreeAnalysis>(F);		auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);		auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);
(75 lines not shown)
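
The loop added at the end of `combineInstructionsOverFunction` first folds any marker a load already carries back into the per-base set, then re-emits every known offset relative to that load. A rough model of that step in plain C++ follows. `mergeMarker` is a hypothetical name, and the ordered `std::set` is an assumption made here so the output comes out sorted; the patch itself iterates a `SmallDenseSet<uint64_t>`.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model of the final loop, not code from the patch.
//   proven:     offsets, relative to the base pointer, already known readable
//   loadOffset: offset of the surviving load relative to the same base
//   existing:   operands of a marker the load already carries
// Returns the operands of the new marker, relative to the load. `proven` is
// updated in place, as the shared per-base set is in the patch.
std::vector<std::int64_t> mergeMarker(std::set<std::int64_t> &proven,
                                      std::int64_t loadOffset,
                                      const std::vector<std::int64_t> &existing) {
  for (std::int64_t rel : existing)
    proven.insert(loadOffset + rel); // rebase old operands onto the base
  std::vector<std::int64_t> marker;
  marker.reserve(proven.size());
  for (std::int64_t off : proven)    // std::set iterates in ascending order
    marker.push_back(off - loadOffset);
  return marker;
}

// A load at offset 2 that already carries !{i64 -2} (base offset 0), merged
// with newly proven base offsets {1, 3}.
void checkMerge() {
  std::set<std::int64_t> proven{1, 3};
  const std::vector<std::int64_t> expected{-2, -1, 1};
  assert(mergeMarker(proven, 2, {-2}) == expected);
  assert(proven.count(0) == 1);
}
```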

test/ThinLTO/X86/lazyload_metadata.ll

	; Do setup work for all below tests: generate bitcode and combined index			; Do setup work for all below tests: generate bitcode and combined index
	; RUN: opt -module-summary %s -o %t.bc -bitcode-mdindex-threshold=0			; RUN: opt -module-summary %s -o %t.bc -bitcode-mdindex-threshold=0
	; RUN: opt -module-summary %p/Inputs/lazyload_metadata.ll -o %t2.bc -bitcode-mdindex-threshold=0			; RUN: opt -module-summary %p/Inputs/lazyload_metadata.ll -o %t2.bc -bitcode-mdindex-threshold=0
	; RUN: llvm-lto -thinlto-action=thinlink -o %t3.bc %t.bc %t2.bc			; RUN: llvm-lto -thinlto-action=thinlink -o %t3.bc %t.bc %t2.bc
	; REQUIRES: asserts			; REQUIRES: asserts

	; Check that importing @globalfunc1 does not trigger loading all the global			; Check that importing @globalfunc1 does not trigger loading all the global
	; metadata for @globalfunc2 and @globalfunc3			; metadata for @globalfunc2 and @globalfunc3

	; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \			; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \
	; RUN: -o /dev/null -stats \			; RUN: -o /dev/null -stats \
	; RUN: 2>&1 | FileCheck %s -check-prefix=LAZY			; RUN: 2>&1 | FileCheck %s -check-prefix=LAZY
	; LAZY: 55 bitcode-reader - Number of Metadata records loaded			; LAZY: 57 bitcode-reader - Number of Metadata records loaded
	; LAZY: 2 bitcode-reader - Number of MDStrings loaded			; LAZY: 2 bitcode-reader - Number of MDStrings loaded

	; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \			; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \
	; RUN: -o /dev/null -disable-ondemand-mds-loading -stats \			; RUN: -o /dev/null -disable-ondemand-mds-loading -stats \
	; RUN: 2>&1 | FileCheck %s -check-prefix=NOTLAZY			; RUN: 2>&1 | FileCheck %s -check-prefix=NOTLAZY
	; NOTLAZY: 64 bitcode-reader - Number of Metadata records loaded			; NOTLAZY: 66 bitcode-reader - Number of Metadata records loaded
	; NOTLAZY: 7 bitcode-reader - Number of MDStrings loaded			; NOTLAZY: 7 bitcode-reader - Number of MDStrings loaded


	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.11.0"			target triple = "x86_64-apple-macosx10.11.0"

	define void @globalfunc1(i32 %arg) {			define void @globalfunc1(i32 %arg) {
	%x = call i1 @llvm.type.test(i8* undef, metadata !"typeid1")			%x = call i1 @llvm.type.test(i8* undef, metadata !"typeid1")
	(31 lines not shown)

test/Transforms/InstCombine/speculation_marker.ll

This file was added.

				; RUN: opt < %s -instcombine -S | FileCheck %s

				define <4 x double> @foo(double* %ptr) {
				; CHECK-LABEL: @foo(
				reames: Just to comment: in your test case, we can prove that the loads post dominate the entry to the function and could update the argument with the existing dereferenceability attribute. This might be an alternate approach and separately worth implementation.
				; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[PTR:%.*]], i64 2
				; CHECK-NEXT: [[LD0:%.*]] = load double, double* [[PTR]], align 8, !speculation.marker !0
				; CHECK-NEXT: [[LD2:%.*]] = load double, double* [[ARRAYIDX2]], align 8, !speculation.marker !1
				; CHECK-NEXT: [[INS0:%.*]] = insertelement <4 x double> undef, double [[LD0]], i32 0
				; CHECK-NEXT: [[INS2:%.*]] = insertelement <4 x double> [[INS0]], double [[LD2]], i32 2
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x double> [[INS2]], <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
				; CHECK-NEXT: ret <4 x double> [[SHUFFLE]]
				;
				%arrayidx0 = getelementptr inbounds double, double* %ptr, i64 0
				%arrayidx1 = getelementptr inbounds double, double* %ptr, i64 1
				%arrayidx2 = getelementptr inbounds double, double* %ptr, i64 2
				%arrayidx3 = getelementptr inbounds double, double* %ptr, i64 3

				%ld0 = load double, double* %arrayidx0
				%ld1 = load double, double* %arrayidx1
				%ld2 = load double, double* %arrayidx2
				%ld3 = load double, double* %arrayidx3

				%ins0 = insertelement <4 x double> undef, double %ld0, i32 0
				%ins1 = insertelement <4 x double> %ins0, double %ld1, i32 1
				%ins2 = insertelement <4 x double> %ins1, double %ld2, i32 2
				%ins3 = insertelement <4 x double> %ins2, double %ld3, i32 3

				%shuffle = shufflevector <4 x double> %ins3, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
				ret <4 x double> %shuffle
				}

				define <4 x double> @bar(double* %ptr) {
				; CHECK-LABEL: @bar(
				; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds double, double* [[PTR:%.*]], i64 1
				; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[PTR]], i64 2
				; CHECK-NEXT: [[LD1:%.*]] = load double, double* [[ARRAYIDX1]], align 8, !speculation.marker !2
				; CHECK-NEXT: [[LD2:%.*]] = load double, double* [[ARRAYIDX2]], align 8, !speculation.marker !3
				; CHECK-NEXT: [[INS1:%.*]] = insertelement <4 x double> undef, double [[LD1]], i32 1
				; CHECK-NEXT: [[INS2:%.*]] = insertelement <4 x double> [[INS1]], double [[LD2]], i32 2
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x double> [[INS2]], <4 x double> undef, <4 x i32> <i32 1, i32 1, i32 2, i32 2>
				; CHECK-NEXT: ret <4 x double> [[SHUFFLE]]
				;
				%arrayidx0 = getelementptr inbounds double, double* %ptr, i64 0
				%arrayidx1 = getelementptr inbounds double, double* %ptr, i64 1
				%arrayidx2 = getelementptr inbounds double, double* %ptr, i64 2
				%arrayidx3 = getelementptr inbounds double, double* %ptr, i64 3

				%ld0 = load double, double* %arrayidx0
				%ld1 = load double, double* %arrayidx1
				%ld2 = load double, double* %arrayidx2
				%ld3 = load double, double* %arrayidx3

				%ins0 = insertelement <4 x double> undef, double %ld0, i32 0
				%ins1 = insertelement <4 x double> %ins0, double %ld1, i32 1
				%ins2 = insertelement <4 x double> %ins1, double %ld2, i32 2
				%ins3 = insertelement <4 x double> %ins2, double %ld3, i32 3

				%shuffle = shufflevector <4 x double> %ins3, <4 x double> undef, <4 x i32> <i32 1, i32 1, i32 2, i32 2>
				ret <4 x double> %shuffle
				}

				; CHECK: !0 = !{i64 1, i64 3}
				; CHECK: !1 = !{i64 -1, i64 1}
				; CHECK: !2 = !{i64 -1, i64 2}
				; CHECK: !3 = !{i64 -2, i64 1}
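
The four expected metadata nodes can be checked by hand. Writing D for the element indices of the loads that no longer appear in the CHECK lines, each surviving load at index k is expected to carry the set of d - k for d in D. This reading is inferred from the test expectations and the emission loop in the patch, not stated by the author.

```latex
\[
\begin{aligned}
\texttt{@foo}:\quad & D = \{1, 3\}, \\
 & \texttt{!0} = \{\, d - 0 : d \in D \,\} = \{1, 3\}, \\
 & \texttt{!1} = \{\, d - 2 : d \in D \,\} = \{-1, 1\}, \\
\texttt{@bar}:\quad & D = \{0, 3\}, \\
 & \texttt{!2} = \{\, d - 1 : d \in D \,\} = \{-1, 2\}, \\
 & \texttt{!3} = \{\, d - 2 : d \in D \,\} = \{-2, 1\}.
\end{aligned}
\]
```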

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fix PR21780 Expansion of 256 bit vector loads fails to fold into shuffles
Needs Revision, Public

