This is an archive of the discontinued LLVM Phabricator instance.

Differential D36928

[Polly][MatMul][WIP] Disable the Loop Vectorizer
Closed · Public

Authored by gareevroman on Aug 19 2017, 10:57 AM.

Details

Reviewers

grosser
Meinersbur
jdoerfert
bollu

Commits

rG0956a606ffb1: Disable the Loop Vectorizer in case of GEMM
rPLO311473: Disable the Loop Vectorizer in case of GEMM
rL311473: Disable the Loop Vectorizer in case of GEMM

Summary

The Loop Vectorizer can generate mis-optimized code (see https://bugs.llvm.org/show_bug.cgi?id=34245).

In the case of GEMM, we rely only on the SLP Vectorizer, not on the Loop Vectorizer. Consequently, we disable the Loop Vectorizer for the innermost loop by inserting mark nodes into the schedule tree and emitting the corresponding loop metadata.
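The metadata this change attaches can be pictured in textual LLVM IR. The C++ helper below is hypothetical (not part of Polly) and only assembles the text: the latch branch of the loop carries an `!llvm.loop` loop ID, which is a distinct, self-referential node whose other operand sets `llvm.loop.vectorize.enable` to false. The value and block names passed in are placeholders, and the sketch covers only the vectorizer-disabled case (the patch additionally concatenates the parallel-loop ID when the loop is parallel).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Polly): renders the latch branch of a loop
// and the loop-ID metadata that asks the Loop Vectorizer to skip the loop.
// !0 is the distinct, self-referential loop ID; !1 is the property
// llvm.loop.vectorize.enable = false.
std::string renderLatchWithVectorizerDisabled(const std::string &Cond,
                                              const std::string &Header,
                                              const std::string &Exit) {
  assert(!Cond.empty() && !Header.empty() && !Exit.empty());
  return "br i1 %" + Cond + ", label %" + Header + ", label %" + Exit +
         ", !llvm.loop !0\n"
         "!0 = distinct !{!0, !1}\n"
         "!1 = !{!\"llvm.loop.vectorize.enable\", i1 false}\n";
}
```

For example, `renderLatchWithVectorizerDisabled("polly.loop_cond", "polly.loop_header", "polly.loop_exit")` yields the three lines of IR above with those names filled in.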

P.S.: I haven't managed to insert the mark nodes before the AST for nodes, since isolation can produce if statements and, AFAIU, we aren't able to modify their children. For example, we can get the following:

// Mark node
if (…)
  for (…)

Consequently, we aren't able to modify the for loop while handling the mark node.
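The design choice can be modeled with a hypothetical, heavily simplified AST type (it does not use isl, and its if node has only a then-branch). Because the mark is placed as the body of the for node rather than above it, the code generator can decide whether to disable the Loop Vectorizer when it reaches the for node itself; this mirrors the `IsLoopVectorizerDisabled` helper that the patch adds to `IslNodeBuilder.cpp`. A mark placed above the loop may instead end up above an if produced by isolation, as in the example above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified model of an isl AST (not the isl API).
enum class NodeKind { For, If, Mark, User };

struct AstNode {
  NodeKind Kind;
  std::string MarkName;           // used only by Mark nodes
  std::unique_ptr<AstNode> Child; // loop body, then-branch, or marked subtree
};

std::unique_ptr<AstNode> makeNode(NodeKind Kind, std::unique_ptr<AstNode> Child,
                                  std::string MarkName = "") {
  return std::unique_ptr<AstNode>(
      new AstNode{Kind, std::move(MarkName), std::move(Child)});
}

// Same decision as the patch's IsLoopVectorizerDisabled(): a for node opts
// out of the Loop Vectorizer if its body is the "Loop Vectorizer Disabled"
// mark, so no if node generated by isolation has to be inspected.
bool isLoopVectorizerDisabled(const AstNode &For) {
  assert(For.Kind == NodeKind::For && "expected a for node");
  const AstNode *Body = For.Child.get();
  return Body != nullptr && Body->Kind == NodeKind::Mark &&
         Body->MarkName == "Loop Vectorizer Disabled";
}
```

With the mark as the loop body, the check succeeds; with the mark above an if that wraps the loop, the for node itself carries no mark and the check fails.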

Diff Detail

Repository: rL LLVM

Event Timeline

gareevroman created this revision. Aug 19 2017, 10:57 AM

Herald added a reviewer: bollu. Aug 19 2017, 10:57 AM

Herald added a subscriber: rengolin.

LGTM, besides the broken comment.

lib/CodeGen/IslNodeBuilder.cpp
Line 492 (on Diff #111842): This comment looks out of place?

grosser accepted this revision. Aug 19 2017, 11:07 AM

This revision is now accepted and ready to land. Aug 19 2017, 11:07 AM

I suggest that you wait on this until we understand the regression.

In D36928#846606, @hfinkel wrote:

I suggest that you wait on this until we understand the regression.

In part, it's possible that this is the wrong fix. Based on the bug report, it is possible that the problem is that the vectorizer is generating runtime checks. Maybe aliasing metadata would help. Maybe it's unrolling too much.
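To make the runtime-check concern concrete: when the Loop Vectorizer cannot prove that two pointers do not alias, it may version the loop, emitting an overlap test that selects between a vectorized copy and the original scalar loop. The C++ sketch below hand-writes that shape for a hypothetical `addArrays` kernel. It is only an analogy for what the vectorizer generates in IR, and the "vector" path is simply a 4-way unrolled loop standing in for SIMD code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel: Dst[i] += Src[i] for i in [0, N).
// Returns true if the fast ("vectorized") path was taken.
bool addArrays(float *Dst, const float *Src, std::size_t N) {
  // Runtime alias check: the byte ranges of Dst and Src must not overlap.
  // Compare as integers to avoid ordering unrelated pointers.
  auto D = reinterpret_cast<std::uintptr_t>(Dst);
  auto S = reinterpret_cast<std::uintptr_t>(Src);
  std::uintptr_t Bytes = N * sizeof(float);
  bool NoOverlap = D + Bytes <= S || S + Bytes <= D;

  std::size_t I = 0;
  if (NoOverlap) {
    // Stand-in for the vector body: 4 independent elements per iteration.
    for (; I + 4 <= N; I += 4) {
      Dst[I] += Src[I];
      Dst[I + 1] += Src[I + 1];
      Dst[I + 2] += Src[I + 2];
      Dst[I + 3] += Src[I + 3];
    }
  }
  // Scalar loop: the remainder, or every element if the check failed.
  for (; I < N; ++I)
    Dst[I] += Src[I];
  return NoOverlap;
}
```

When `Dst` overlaps `Src` (for example `Dst = C + 1`, `Src = C`), only the scalar loop preserves the original element-by-element semantics, which is why the check, and its cost, is needed unless alias information rules the overlap out.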

In D36928#846608, @hfinkel wrote:

In D36928#846606, @hfinkel wrote:

I suggest that you wait on this until we understand the regression.

In part, it's possible that this is the wrong fix. Based on the bug report, it is possible that the problem is that the vectorizer is generating runtime checks. Maybe aliasing metadata would help. Maybe it's unrolling too much.

Currently, when Polly detects and optimizes GEMM, only the SLP Vectorizer is needed out of the two LLVM vectorizers; the Loop Vectorizer usually does not change the code. Consequently, as far as I understand, this fix should not hurt anything.

To generate the BLIS micro-kernel [1], we apply tiling and unroll the two innermost loops. Subsequently, we use LICM to sink and hoist all stores and loads from the innermost loop, and apply the SLP Vectorizer to get a sequence of rank-1 updates.
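As a rough, hand-written analogy of the micro-kernel structure described above (not Polly's generated code), the sketch below keeps an Mr x Nr block of C in local accumulators, which is the effect of unrolling the two innermost loops and hoisting/sinking the loads and stores of C. Each iteration of the k loop then performs one rank-1 update (the outer product of a column of A and a row of B). The sizes `Mr = Nr = 4`, the row-major layout, and the function name are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t Mr = 4; // assumed register-block rows
constexpr std::size_t Nr = 4; // assumed register-block columns

// Computes C[I0.., J0..] += A[I0.., :] * B[:, J0..] for one Mr x Nr block,
// with row-major A (M x K), B (K x N), C (M x N). The caller guarantees the
// block lies inside C.
void microKernel(const double *A, const double *B, double *C, std::size_t N,
                 std::size_t K, std::size_t I0, std::size_t J0) {
  double Acc[Mr][Nr];
  // Loads of C hoisted out of the k loop.
  for (std::size_t I = 0; I < Mr; ++I)
    for (std::size_t J = 0; J < Nr; ++J)
      Acc[I][J] = C[(I0 + I) * N + J0 + J];

  for (std::size_t P = 0; P < K; ++P)
    // One rank-1 update: Acc += A[:, P] * B[P, :]. Once the I and J loops are
    // fully unrolled, this is straight-line code the SLP Vectorizer can pack.
    for (std::size_t I = 0; I < Mr; ++I)
      for (std::size_t J = 0; J < Nr; ++J)
        Acc[I][J] += A[(I0 + I) * K + P] * B[P * N + J0 + J];

  // Stores of C sunk below the k loop.
  for (std::size_t I = 0; I < Mr; ++I)
    for (std::size_t J = 0; J < Nr; ++J)
      C[(I0 + I) * N + J0 + J] = Acc[I][J];
}
```

The result matches a naive triple loop over the same block; only the order of memory traffic changes.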

I think that using the Loop Vectorizer instead of the SLP Vectorizer is a way to improve Polly's matrix multiplication optimization. However, as far as I understand, there is no way to make the Loop Vectorizer vectorize a loop and also sink and hoist the memory accesses. All available metadata are only optimization hints: the optimizer will only interleave and vectorize loops if it believes it is safe to do so. Consequently, this would probably require the involvement of the Loop Vectorizer maintainers.

P.S.: As far as I know, if we unroll only the innermost loop, the Loop Vectorizer does unroll and vectorize the second loop. However, it does not sink and hoist stores and loads. Also, I haven't tested how it vectorizes other forms of generalized matrix multiplication, for example, the shortest path problem, which can use the following kernel:

#define MIN(X, Y) (((X) < (Y)) ? (X) : (Y))

  for (i = 0; i < _PB_NI; i++)
    for (j = 0; j < _PB_NI; j++)
      for (k = 0; k < _PB_NI; k++)
        L[i][j] = MIN(L[i][j], W[i][k] + W[k][j]);

Refs.:

[1] - https://pdfs.semanticscholar.org/cb77/c2fdf8132f5e88f09b253a9fe01b65da7bc4.pdf
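The shortest-path kernel above is GEMM with the (+, ×) pair replaced by (min, +). A self-contained C++ version of exactly that loop nest is sketched below; the constant `NI` stands in for the PolyBench macro `_PB_NI`, and the fixed size and the use of `std::min` instead of the `MIN` macro are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t NI = 3; // stands in for PolyBench's _PB_NI

// Min-plus ("tropical") matrix product: the same i-j-k loop nest as GEMM,
// with multiplication replaced by + and accumulation replaced by min.
void minPlusUpdate(double L[NI][NI], const double W[NI][NI]) {
  for (std::size_t i = 0; i < NI; i++)
    for (std::size_t j = 0; j < NI; j++)
      for (std::size_t k = 0; k < NI; k++)
        L[i][j] = std::min(L[i][j], W[i][k] + W[k][j]);
}
```

With edge weights 0→1 = 1, 1→2 = 2 and 0→2 = 10 (zeros on the diagonal, a large value for missing edges) and `L` initialized to `W`, one call lowers `L[0][2]` from 10 to 3, the length of the path through node 1.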

Hi Hal,

I think this is conceptually the right approach. We currently generate code -- with explicit register unrolling -- and expect the SLP vectorizer to perform the vectorization. I believe communicating this information via explicit metadata is reasonable.

We may want to move towards using the LLVM loop vectorizer rather than the SLP vectorizer, but this requires changes both to the loop vectorizer and to our code generation strategy. We should certainly consider this, but I feel that these could be separate steps: 1) clarify current behavior and fix regressions, 2) expand the loop vectorizer, 3) change our code generation logic.

Roman, I think Hal is right that we should look into how to improve the loop vectorizer. It would be great if you could add more information to the bug report, such that others can -- independently of Polly -- understand which optimizations we need that the current loop vectorizer does not perform.

In D36928#847279, @grosser wrote:

Hi Hal,

I think this is conceptually the right approach. We currently generate code -- with explicit register unrolling -- and expect the SLP vectorizer to perform the vectorization. I believe communicating this information via explicit metadata is reasonable.

We may want to move towards using the LLVM loop vectorizer rather than the SLP vectorizer, but this requires changes both to the loop vectorizer and to our code generation strategy. We should certainly consider this, but I feel that these could be separate steps: 1) clarify current behavior and fix regressions, 2) expand the loop vectorizer, 3) change our code generation logic.

Okay, I understand. In that case, this is fine. If you're explicitly setting up for the SLP vectorizer, then we don't want the loop vectorizer to get in the way. However, at least in theory, this is probably suboptimal and we should figure out how to make it better.

Roman, I think Hal is right that we should look into how to improve the loop vectorizer. It would be great if you could add more information to the bug report, such that others can -- independently of Polly -- understand which optimizations we need that the current loop vectorizer does not perform.

Yes, please do.

In D36928#847135, @gareevroman wrote:

In D36928#846608, @hfinkel wrote:

In D36928#846606, @hfinkel wrote:

I suggest that you wait on this until we understand the regression.

In part, it's possible that this is the wrong fix. Based on the bug report, it is possible that the problem is that the vectorizer is generating runtime checks. Maybe aliasing metadata would help. Maybe it's unrolling too much.

Currently, when Polly detects and optimizes GEMM, only the SLP Vectorizer is needed out of the two LLVM vectorizers; the Loop Vectorizer usually does not change the code. Consequently, as far as I understand, this fix should not hurt anything.

To generate the BLIS micro-kernel [1], we apply tiling and unroll the two innermost loops. Subsequently, we use LICM to sink and hoist all stores and loads from the innermost loop, and apply the SLP Vectorizer to get a sequence of rank-1 updates.

I think that using the Loop Vectorizer instead of the SLP Vectorizer is a way to improve Polly's matrix multiplication optimization. However, as far as I understand, there is no way to make the Loop Vectorizer vectorize a loop and also sink and hoist the memory accesses. All available metadata are only optimization hints: the optimizer will only interleave and vectorize loops if it believes it is safe to do so.

FYI: This is not completely true. You can add 'llvm.mem.parallel_loop_access' metadata to make the vectorizer assume that vectorization is safe. Look at what Clang generates if you use #pragma clang loop vectorize(assume_safety) or #pragma omp simd. Using this metadata may allow you to set up the loop for efficient loop vectorization.

Consequently, this would probably require the involvement of the Loop Vectorizer maintainers.

P.S.: As far as I know, if we unroll only the innermost loop, the Loop Vectorizer does unroll and vectorize the second loop. However, it does not sink and hoist stores and loads. Also, I haven't tested how it vectorizes other forms of generalized matrix multiplication, for example, the shortest path problem, which can use the following kernel:

#define MIN(X, Y) (((X) < (Y)) ? (X) : (Y))

  for (i = 0; i < _PB_NI; i++)
    for (j = 0; j < _PB_NI; j++)
      for (k = 0; k < _PB_NI; k++)
        L[i][j] = MIN(L[i][j], W[i][k] + W[k][j]);

Refs.:

[1] - https://pdfs.semanticscholar.org/cb77/c2fdf8132f5e88f09b253a9fe01b65da7bc4.pdf
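The FYI above can be illustrated with a hypothetical kernel. `#pragma clang loop vectorize(assume_safety)` is a Clang extension that tells the vectorizer to assume the loop has no memory dependences that would make vectorization unsafe; per the comment, Clang expresses this with `llvm.mem.parallel_loop_access` metadata. The `__clang__` guard keeps other compilers from seeing the pragma, and the computed result is the same whether or not the loop is vectorized. The hint is only correct if the caller guarantees that `X` and `Y` do not overlap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel: Y[i] = A * X[i] + Y[i].
// Precondition (not checked): X and Y do not overlap.
void saxpyAssumeSafe(float A, const float *X, float *Y, std::size_t N) {
  assert((N == 0 || (X != nullptr && Y != nullptr)) && "null input");
#if defined(__clang__)
  // Clang-specific hint: vectorize without proving the absence of
  // loop-carried memory dependences.
#pragma clang loop vectorize(assume_safety)
#endif
  for (std::size_t I = 0; I < N; ++I)
    Y[I] = A * X[I] + Y[I];
}
```

Compiling such a loop with Clang and inspecting the emitted IR is one way to see the metadata the comment refers to.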

In D36928#847279, @grosser wrote:

Hi Hal,

I think this is conceptually the right approach. We currently generate code -- with explicit register unrolling -- and expect the SLP vectorizer to perform the vectorization. I believe communicating this information via explicit metadata is reasonable.

We may want to move towards using the LLVM loop vectorizer rather than the SLP vectorizer, but this requires changes both to the loop vectorizer and to our code generation strategy. We should certainly consider this, but I feel that these could be separate steps: 1) clarify current behavior and fix regressions, 2) expand the loop vectorizer, 3) change our code generation logic.

Roman, I think Hal is right that we should look into how to improve the loop vectorizer.

Yes, I think this is a good idea.

It would be great if you could add more information to the bug report, such that others can -- independently of Polly --

My understanding is that the bug report already contains the test case, which is independent of Polly. Sorry, I haven't managed to reduce it yet.

understand which optimizations we need that the current loop vectorizer does not perform.

It should probably be described in a separate bug report. I'll try to do that soon.

Closed by commit rL311473: Disable the Loop Vectorizer in case of GEMM (authored by romangareev). Aug 22 2017, 10:39 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

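Path (size of change)

polly/trunk/include/polly/CodeGen/IRBuilder.h (4 lines)
polly/trunk/include/polly/CodeGen/LoopGenerators.h (41 lines)
polly/trunk/lib/CodeGen/IRBuilder.cpp (28 lines)
polly/trunk/lib/CodeGen/IslNodeBuilder.cpp (27 lines)
polly/trunk/lib/CodeGen/LoopGenerators.cpp (6 lines)
polly/trunk/lib/Transform/ScheduleOptimizer.cpp (18 lines)
polly/trunk/test/ScheduleOptimizer/mat_mul_pattern_data_layout_2.ll (1 line)
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_12.ll (3 lines)
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_13.ll (2 lines)
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_14.ll (3 lines)
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_3.ll (1 line)
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_4.ll (1 line)
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_5.ll (3 lines)
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_6.ll (2 lines)
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_7.ll (1 line)
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_8.ll (1 line)
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_9.ll (1 line)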

Diff 112199

polly/trunk/include/polly/CodeGen/IRBuilder.h

@@ … @@ public:
   /// Remove the last added loop.
   void popLoop(bool isParallel);
 
   /// Annotate the new instruction @p I for all parallel loops.
   void annotate(llvm::Instruction *I);
 
   /// Annotate the loop latch @p B wrt. @p L.
-  void annotateLoopLatch(llvm::BranchInst *B, llvm::Loop *L,
-                         bool IsParallel) const;
+  void annotateLoopLatch(llvm::BranchInst *B, llvm::Loop *L, bool IsParallel,
+                         bool IsLoopVectorizerDisabled) const;
 
   /// Add alternative alias based pointers
   ///
   /// When annotating instructions with alias scope metadata, the right metadata
   /// is identified through the base pointer of the memory access. In some cases
   /// (e.g. OpenMP code generation), the base pointer of the memory accesses is
   /// not the original base pointer, but was changed when passing the original
   /// base pointer over a function boundary. This function allows to provide a
@@ … @@

polly/trunk/include/polly/CodeGen/LoopGenerators.h

@@ … @@
 class BasicBlock;
 } // namespace llvm
 
 namespace polly {
 using namespace llvm;
 
 /// Create a scalar do/for-style loop.
 ///
 /// @param LowerBound The starting value of the induction variable.
 /// @param UpperBound The upper bound of the induction variable.
-/// @param Stride The value by which the induction variable is incremented.
+/// @param Stride The value by which the induction variable
+/// is incremented.
 ///
 /// @param Builder The builder used to create the loop.
-/// @param P A pointer to the pass that uses this function. It is used
-/// to update analysis information.
+/// @param P A pointer to the pass that uses this function.
+/// It is used to update analysis information.
 /// @param LI The loop info for the current function
 /// @param DT The dominator tree we need to update
 /// @param ExitBlock The block the loop will exit to.
-/// @param Predicate The predicate used to generate the upper loop bound.
-/// @param Annotator This function can (optionally) take a ScopAnnotator which
+/// @param Predicate The predicate used to generate the upper loop
+/// bound.
+/// @param Annotator This function can (optionally) take
+/// a ScopAnnotator which
 /// annotates loops and alias information in the SCoP.
-/// @param Parallel If this loop should be marked parallel in the Annotator.
-/// @param UseGuard Create a guard in front of the header to check if the
-/// loop is executed at least once, otherwise just assume it.
+/// @param Parallel If this loop should be marked parallel in
+/// the Annotator.
+/// @param UseGuard Create a guard in front of the header to check if
+/// the loop is executed at least once, otherwise just
+/// assume it.
+/// @param LoopVectDisabled If the Loop vectorizer should be disabled for this
+/// loop.
 ///
 /// @return Value* The newly created induction variable for this loop.
 Value *createLoop(Value *LowerBound, Value *UpperBound, Value *Stride,
                   PollyIRBuilder &Builder, LoopInfo &LI, DominatorTree &DT,
                   BasicBlock *&ExitBlock, ICmpInst::Predicate Predicate,
                   ScopAnnotator *Annotator = NULL, bool Parallel = false,
-                  bool UseGuard = true);
+                  bool UseGuard = true, bool LoopVectDisabled = false);
 
 /// The ParallelLoopGenerator allows to create parallelized loops
 ///
 /// To parallelize a loop, we perform the following steps:
 /// o Generate a subfunction which will hold the loop body.
 /// o Create a struct to hold all outer values needed in the loop body.
 /// o Create calls to a runtime library to achieve the actual parallelism.
 /// These calls will spawn and join threads, define how the work (here the
@@ … @@

polly/trunk/lib/CodeGen/IRBuilder.cpp

@@ … @@ void ScopAnnotator::popLoop(bool IsParallel) {
   ActiveLoops.pop_back();
   if (!IsParallel)
     return;
 
   assert(!ParallelLoops.empty() && "Expected a parallel loop to pop");
   ParallelLoops.pop_back();
 }
 
-void ScopAnnotator::annotateLoopLatch(BranchInst *B, Loop *L,
-                                      bool IsParallel) const {
-  if (!IsParallel)
-    return;
-
-  assert(!ParallelLoops.empty() && "Expected a parallel loop to annotate");
-  MDNode *Ids = ParallelLoops.back();
-  MDNode *Id = cast<MDNode>(Ids->getOperand(Ids->getNumOperands() - 1));
-  B->setMetadata("llvm.loop", Id);
+void ScopAnnotator::annotateLoopLatch(BranchInst *B, Loop *L, bool IsParallel,
+                                      bool IsLoopVectorizerDisabled) const {
+  MDNode *MData = nullptr;
+
+  if (IsLoopVectorizerDisabled) {
+    SmallVector<Metadata *, 3> Args;
+    LLVMContext &Ctx = SE->getContext();
+    Args.push_back(MDString::get(Ctx, "llvm.loop.vectorize.enable"));
+    auto *FalseValue = ConstantInt::get(Type::getInt1Ty(Ctx), 0);
+    Args.push_back(ValueAsMetadata::get(FalseValue));
+    MData = MDNode::concatenate(MData, getID(Ctx, MDNode::get(Ctx, Args)));
+  }
+
+  if (IsParallel) {
+    assert(!ParallelLoops.empty() && "Expected a parallel loop to annotate");
+    MDNode *Ids = ParallelLoops.back();
+    MDNode *Id = cast<MDNode>(Ids->getOperand(Ids->getNumOperands() - 1));
+    MData = MDNode::concatenate(MData, Id);
+  }
+
+  B->setMetadata("llvm.loop", MData);
 }
 
 /// Get the pointer operand
 ///
 /// @param Inst The instruction to be analyzed.
 /// @return the pointer operand in case @p Inst is a memory access
 /// instruction and nullptr otherwise.
 static llvm::Value *getMemAccInstPointerOperand(Instruction *Inst) {
@@ … @@

polly/trunk/lib/CodeGen/IslNodeBuilder.cpp

@@ … @@ void IslNodeBuilder::createForVector(__isl_take isl_ast_node *For,
   IDToValue.erase(IDToValue.find(IteratorID));
   isl_id_free(IteratorID);
   isl_union_map_free(Schedule);
 
   isl_ast_node_free(For);
   isl_ast_expr_free(Iterator);
 }
 
+/// Restore the initial ordering of dimensions of the band node
+///
+/// In case the band node represents all the dimensions of the iteration
+/// domain, recreate the band node to restore the initial ordering of the
+/// dimensions.
+///
+/// @param Node The band node to be modified.
+/// @return The modified schedule node.
+namespace {
+bool IsLoopVectorizerDisabled(isl::ast_node Node) {
+  assert(isl_ast_node_get_type(Node.keep()) == isl_ast_node_for);
+  auto Body = Node.for_get_body();
+  if (isl_ast_node_get_type(Body.keep()) != isl_ast_node_mark)
+    return false;
+  auto Id = Body.mark_get_id();
+  if (!strcmp(Id.get_name().c_str(), "Loop Vectorizer Disabled"))
+    return true;
+  return false;
+}
+} // namespace
+
 void IslNodeBuilder::createForSequential(__isl_take isl_ast_node *For,
                                          bool KnownParallel) {
   isl_ast_node *Body;
   isl_ast_expr *Init, *Inc, *Iterator, *UB;
   isl_id *IteratorID;
   Value *ValueLB, *ValueUB, *ValueInc;
   Type *MaxType;
   BasicBlock *ExitBlock;
   Value *IV;
   CmpInst::Predicate Predicate;
   bool Parallel;
 
   Parallel = KnownParallel || (IslAstInfo::isParallel(For) &&
                                !IslAstInfo::isReductionParallel(For));
 
+  bool LoopVectorizerDisabled =
+      IsLoopVectorizerDisabled(isl::manage(isl_ast_node_copy(For)));
+
   Body = isl_ast_node_for_get_body(For);
 
   // isl_ast_node_for_is_degenerate(For)
   //
   // TODO: For degenerated loops we could generate a plain assignment.
   // However, for now we just reuse the logic for normal loops, which will
   // create a loop with a single iteration.
 
@@ … @@ void IslNodeBuilder::createForSequential(__isl_take isl_ast_node *For,
   if (MaxType != ValueInc->getType())
     ValueInc = Builder.CreateSExt(ValueInc, MaxType);
 
   // If we can show that LB <Predicate> UB holds at least once, we can
   // omit the GuardBB in front of the loop.
   bool UseGuardBB =
       !SE.isKnownPredicate(Predicate, SE.getSCEV(ValueLB), SE.getSCEV(ValueUB));
   IV = createLoop(ValueLB, ValueUB, ValueInc, Builder, LI, DT, ExitBlock,
-                  Predicate, &Annotator, Parallel, UseGuardBB);
+                  Predicate, &Annotator, Parallel, UseGuardBB,
+                  LoopVectorizerDisabled);
   IDToValue[IteratorID] = IV;
 
   create(Body);
 
   Annotator.popLoop(Parallel);
 
   IDToValue.erase(IDToValue.find(IteratorID));
 
@@ … @@

polly/trunk/lib/CodeGen/LoopGenerators.cpp

@@ … @@
 // iteration of the loop. After the loop has finished, we branch to ExitBB.
 // We expect the type of UB, LB, UB+Stride to be large enough for values that
 // UB may take throughout the execution of the loop, including the computation
 // of indvar + Stride before the final abort.
 Value *polly::createLoop(Value *LB, Value *UB, Value *Stride,
                          PollyIRBuilder &Builder, LoopInfo &LI,
                          DominatorTree &DT, BasicBlock *&ExitBB,
                          ICmpInst::Predicate Predicate,
-                         ScopAnnotator *Annotator, bool Parallel,
-                         bool UseGuard) {
+                         ScopAnnotator *Annotator, bool Parallel, bool UseGuard,
+                         bool LoopVectDisabled) {
   Function *F = Builder.GetInsertBlock()->getParent();
   LLVMContext &Context = F->getContext();
 
   assert(LB->getType() == UB->getType() && "Types of loop bounds do not match");
   IntegerType *LoopIVType = dyn_cast<IntegerType>(UB->getType());
   assert(LoopIVType && "UB is not integer?");
 
   BasicBlock *BeforeBB = Builder.GetInsertBlock();
@@ … @@ Value *polly::createLoop(Value *LB, Value *UB, Value *Stride,
   Stride = Builder.CreateZExtOrBitCast(Stride, LoopIVType);
   Value *IncrementedIV = Builder.CreateNSWAdd(IV, Stride, "polly.indvar_next");
   Value *LoopCondition =
       Builder.CreateICmp(Predicate, IncrementedIV, UB, "polly.loop_cond");
 
   // Create the loop latch and annotate it as such.
   BranchInst *B = Builder.CreateCondBr(LoopCondition, HeaderBB, ExitBB);
   if (Annotator)
-    Annotator->annotateLoopLatch(B, NewLoop, Parallel);
+    Annotator->annotateLoopLatch(B, NewLoop, Parallel, LoopVectDisabled);
 
   IV->addIncoming(IncrementedIV, HeaderBB);
   if (GuardBB)
     DT.changeImmediateDominator(ExitBB, GuardBB);
   else
     DT.changeImmediateDominator(ExitBB, HeaderBB);
 
   // The loop body should be added here.
@@ … @@

polly/trunk/lib/Transform/ScheduleOptimizer.cpp

Show First 20 Lines • Show All 987 Lines • ▼ Show 20 Lines	optimizeDataLayoutMatrMulPattern(isl::schedule_node Node, isl::map MapOldIndVar,
MicroKernelParamsTy MicroParams,		MicroKernelParamsTy MicroParams,
MacroKernelParamsTy MacroParams,		MacroKernelParamsTy MacroParams,
MatMulInfoTy &MMI) {		MatMulInfoTy &MMI) {
auto InputDimsId = MapOldIndVar.get_tuple_id(isl::dim::in);		auto InputDimsId = MapOldIndVar.get_tuple_id(isl::dim::in);
auto Stmt = static_cast<ScopStmt >(InputDimsId.get_user());		auto Stmt = static_cast<ScopStmt >(InputDimsId.get_user());

// Create a copy statement that corresponds to the memory access to the		// Create a copy statement that corresponds to the memory access to the
// matrix B, the second operand of the matrix multiplication.		// matrix B, the second operand of the matrix multiplication.
Node = Node.parent().parent().parent().parent().parent();		Node = Node.parent().parent().parent().parent().parent().parent();
Node = isl::manage(isl_schedule_node_band_split(Node.release(), 2)).child(0);		Node = isl::manage(isl_schedule_node_band_split(Node.release(), 2)).child(0);
auto AccRel = getMatMulAccRel(isl::manage(MapOldIndVar.copy()), 3, 7);		auto AccRel = getMatMulAccRel(isl::manage(MapOldIndVar.copy()), 3, 7);
unsigned FirstDimSize = MacroParams.Nc / MicroParams.Nr;		unsigned FirstDimSize = MacroParams.Nc / MicroParams.Nr;
unsigned SecondDimSize = MacroParams.Kc;		unsigned SecondDimSize = MacroParams.Kc;
unsigned ThirdDimSize = MicroParams.Nr;		unsigned ThirdDimSize = MicroParams.Nr;
auto *SAI = Stmt->getParent()->createScopArrayInfo(		auto *SAI = Stmt->getParent()->createScopArrayInfo(
MMI.B->getElementType(), "Packed_B",		MMI.B->getElementType(), "Packed_B",
{FirstDimSize, SecondDimSize, ThirdDimSize});		{FirstDimSize, SecondDimSize, ThirdDimSize});
Show All 36 Lines	NewStmt = Stmt->getParent()->addScopStmt(
OldAcc, MMI.A->getLatestAccessRelation(), Domain);		OldAcc, MMI.A->getLatestAccessRelation(), Domain);

// Restrict the domains of the copy statements to only execute when also its		// Restrict the domains of the copy statements to only execute when also its
// originating statement is executed.		// originating statement is executed.
ExtMap = ExtMap.set_tuple_id(isl::dim::out, DomainId);		ExtMap = ExtMap.set_tuple_id(isl::dim::out, DomainId);
ExtMap = ExtMap.intersect_range(Domain);		ExtMap = ExtMap.intersect_range(Domain);
ExtMap = ExtMap.set_tuple_id(isl::dim::out, NewStmt->getDomainId());		ExtMap = ExtMap.set_tuple_id(isl::dim::out, NewStmt->getDomainId());
Node = createExtensionNode(Node, ExtMap);		Node = createExtensionNode(Node, ExtMap);
return Node.child(0).child(0).child(0).child(0);		return Node.child(0).child(0).child(0).child(0).child(0);
}		}

/// Get a relation mapping induction variables produced by schedule		/// Get a relation mapping induction variables produced by schedule
/// transformations to the original ones.		/// transformations to the original ones.
///		///
/// @param Node The schedule node produced as the result of creation		/// @param Node The schedule node produced as the result of creation
/// of the BLIS kernels.		/// of the BLIS kernels.
/// @param MicroKernelParams, MacroKernelParams Parameters of the BLIS kernel		/// @param MicroKernelParams, MacroKernelParams Parameters of the BLIS kernel
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	isolateAndUnrollMatMulInnerLoops(isl::schedule_node Node,

isl::union_set IsolateOption =		isl::union_set IsolateOption =
getIsolateOptions(Prefix.add_dims(isl::dim::set, 3), 3);		getIsolateOptions(Prefix.add_dims(isl::dim::set, 3), 3);
isl::ctx Ctx = Node.get_ctx();		isl::ctx Ctx = Node.get_ctx();
isl::union_set AtomicOption = getAtomicOptions(Ctx);		isl::union_set AtomicOption = getAtomicOptions(Ctx);
isl::union_set Options = IsolateOption.unite(AtomicOption);		isl::union_set Options = IsolateOption.unite(AtomicOption);
Options = Options.unite(getUnrollIsolatedSetOptions(Ctx));		Options = Options.unite(getUnrollIsolatedSetOptions(Ctx));
Node = Node.band_set_ast_build_options(Options);		Node = Node.band_set_ast_build_options(Options);
Node = Node.parent().parent();		Node = Node.parent().parent().parent();
IsolateOption = getIsolateOptions(Prefix, 3);		IsolateOption = getIsolateOptions(Prefix, 3);
Options = IsolateOption.unite(AtomicOption);		Options = IsolateOption.unite(AtomicOption);
Node = Node.band_set_ast_build_options(Options);		Node = Node.band_set_ast_build_options(Options);
Node = Node.child(0).child(0);		Node = Node.child(0).child(0).child(0);
return Node;		return Node;
}		}

/// Mark @p BasePtr with "Inter iteration alias-free" mark node.		/// Mark @p BasePtr with "Inter iteration alias-free" mark node.
///		///
/// @param Node The child of the mark node to be inserted.		/// @param Node The child of the mark node to be inserted.
/// @param BasePtr The pointer to be marked.		/// @param BasePtr The pointer to be marked.
/// @return The modified isl_schedule_node.		/// @return The modified isl_schedule_node.
static isl::schedule_node markInterIterationAliasFree(isl::schedule_node Node,		static isl::schedule_node markInterIterationAliasFree(isl::schedule_node Node,
llvm::Value *BasePtr) {		llvm::Value *BasePtr) {
if (!BasePtr)		if (!BasePtr)
return Node;		return Node;

auto Id =		auto Id =
isl::id::alloc(Node.get_ctx(), "Inter iteration alias-free", BasePtr);		isl::id::alloc(Node.get_ctx(), "Inter iteration alias-free", BasePtr);
return Node.insert_mark(Id).child(0);		return Node.insert_mark(Id).child(0);
}		}

		/// Insert "Loop Vectorizer Disabled" mark node.
		///
		/// @param Node The child of the mark node to be inserted.
		/// @return The modified isl_schedule_node.
		static isl::schedule_node markLoopVectorizerDisabled(isl::schedule_node Node) {
		auto Id = isl::id::alloc(Node.get_ctx(), "Loop Vectorizer Disabled", nullptr);
		return Node.insert_mark(Id).child(0);
		}

/// Restore the initial ordering of dimensions of the band node		/// Restore the initial ordering of dimensions of the band node
///		///
/// In case the band node represents all the dimensions of the iteration		/// In case the band node represents all the dimensions of the iteration
/// domain, recreate the band node to restore the initial ordering of the		/// domain, recreate the band node to restore the initial ordering of the
/// dimensions.		/// dimensions.
///		///
/// @param Node The band node to be modified.		/// @param Node The band node to be modified.
/// @return The modified schedule node.		/// @return The modified schedule node.
(42 lines not shown)
isl::schedule_node ScheduleTreeOptimizer::optimizeMatMulPattern(
Node = createMicroKernel(Node, MicroKernelParams);		Node = createMicroKernel(Node, MicroKernelParams);
if (MacroKernelParams.Mc == 1 || MacroKernelParams.Nc == 1 ||		if (MacroKernelParams.Mc == 1 || MacroKernelParams.Nc == 1 ||
MacroKernelParams.Kc == 1)		MacroKernelParams.Kc == 1)
return Node;		return Node;
auto MapOldIndVar = getInductionVariablesSubstitution(Node, MicroKernelParams,		auto MapOldIndVar = getInductionVariablesSubstitution(Node, MicroKernelParams,
MacroKernelParams);		MacroKernelParams);
if (!MapOldIndVar)		if (!MapOldIndVar)
return Node;		return Node;
		Node = markLoopVectorizerDisabled(Node.parent()).child(0);
Node = isolateAndUnrollMatMulInnerLoops(Node, MicroKernelParams);		Node = isolateAndUnrollMatMulInnerLoops(Node, MicroKernelParams);
return optimizeDataLayoutMatrMulPattern(Node, MapOldIndVar, MicroKernelParams,		return optimizeDataLayoutMatrMulPattern(Node, MapOldIndVar, MicroKernelParams,
MacroKernelParams, MMI);		MacroKernelParams, MMI);
}		}

bool ScheduleTreeOptimizer::isMatrMultPattern(isl::schedule_node Node,		bool ScheduleTreeOptimizer::isMatrMultPattern(isl::schedule_node Node,
const Dependences *D,		const Dependences *D,
MatMulInfoTy &MMI) {		MatMulInfoTy &MMI) {
(295 lines not shown)

polly/trunk/test/ScheduleOptimizer/mat_mul_pattern_data_layout_2.ll

	(36 lines not shown)
	; CHECK-NEXT: for (int c3 = 96 * c2; c3 <= 96 * c2 + 95; c3 += 1)			; CHECK-NEXT: for (int c3 = 96 * c2; c3 <= 96 * c2 + 95; c3 += 1)
	; CHECK-NEXT: for (int c5 = 256 * c1; c5 <= min(1022, 256 * c1 + 255); c5 += 1)			; CHECK-NEXT: for (int c5 = 256 * c1; c5 <= min(1022, 256 * c1 + 255); c5 += 1)
	; CHECK-NEXT: CopyStmt_1(c3, 0, c5);			; CHECK-NEXT: CopyStmt_1(c3, 0, c5);
	; CHECK-NEXT: // 1st level tiling - Points			; CHECK-NEXT: // 1st level tiling - Points
	; CHECK-NEXT: // Register tiling - Tiles			; CHECK-NEXT: // Register tiling - Tiles
	; CHECK-NEXT: for (int c3 = 0; c3 <= 131; c3 += 1)			; CHECK-NEXT: for (int c3 = 0; c3 <= 131; c3 += 1)
	; CHECK-NEXT: for (int c4 = 0; c4 <= 23; c4 += 1)			; CHECK-NEXT: for (int c4 = 0; c4 <= 23; c4 += 1)
	; CHECK-NEXT: for (int c5 = 0; c5 <= min(255, -256 * c1 + 1022); c5 += 1) {			; CHECK-NEXT: for (int c5 = 0; c5 <= min(255, -256 * c1 + 1022); c5 += 1) {
				; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points			; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: {			; CHECK-NEXT: {
	; CHECK-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3, 256 * c1 + c5);			; CHECK-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 1, 256 * c1 + c5);			; CHECK-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 1, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 2, 256 * c1 + c5);			; CHECK-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 2, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 3, 256 * c1 + c5);			; CHECK-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 3, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 4, 256 * c1 + c5);			; CHECK-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 4, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 5, 256 * c1 + c5);			; CHECK-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 5, 256 * c1 + c5);
	(81 lines not shown)

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_12.ll

	(20 lines not shown)
	; CHECK-NEXT: for (int c5 = 512 * c1; c5 <= min(1019, 512 * c1 + 511); c5 += 1)			; CHECK-NEXT: for (int c5 = 512 * c1; c5 <= min(1019, 512 * c1 + 511); c5 += 1)
	; CHECK-NEXT: CopyStmt_1(c3, 0, c5);			; CHECK-NEXT: CopyStmt_1(c3, 0, c5);
	; CHECK-NEXT: // 1st level tiling - Points			; CHECK-NEXT: // 1st level tiling - Points
	; CHECK-NEXT: // Register tiling - Tiles			; CHECK-NEXT: // Register tiling - Tiles
	; CHECK-NEXT: {			; CHECK-NEXT: {
	; CHECK-NEXT: for (int c3 = 0; c3 <= 30; c3 += 1) {			; CHECK-NEXT: for (int c3 = 0; c3 <= 30; c3 += 1) {
	; CHECK-NEXT: for (int c4 = 0; c4 <= min(47, -48 * c2 + 126); c4 += 1)			; CHECK-NEXT: for (int c4 = 0; c4 <= min(47, -48 * c2 + 126); c4 += 1)
	; CHECK-NEXT: for (int c5 = 0; c5 <= min(511, -512 * c1 + 1019); c5 += 1) {			; CHECK-NEXT: for (int c5 = 0; c5 <= min(511, -512 * c1 + 1019); c5 += 1) {
				; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points			; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: {			; CHECK-NEXT: {
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3, 512 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3, 512 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 1, 512 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 1, 512 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 2, 512 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 2, 512 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 3, 512 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 3, 512 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 4, 512 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 4, 512 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 5, 512 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 5, 512 * c1 + c5);
	(246 lines not shown)
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4 + 7, 32 * c3 + 28, 512 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4 + 7, 32 * c3 + 28, 512 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4 + 7, 32 * c3 + 29, 512 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4 + 7, 32 * c3 + 29, 512 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4 + 7, 32 * c3 + 30, 512 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4 + 7, 32 * c3 + 30, 512 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4 + 7, 32 * c3 + 31, 512 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4 + 7, 32 * c3 + 31, 512 * c1 + c5);
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: if (c2 == 2)			; CHECK-NEXT: if (c2 == 2)
	; CHECK-NEXT: for (int c5 = 0; c5 <= min(511, -512 * c1 + 1019); c5 += 1) {			; CHECK-NEXT: for (int c5 = 0; c5 <= min(511, -512 * c1 + 1019); c5 += 1) {
				; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points			; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: for (int c6 = 0; c6 <= 3; c6 += 1)			; CHECK-NEXT: for (int c6 = 0; c6 <= 3; c6 += 1)
	; CHECK-NEXT: for (int c7 = 0; c7 <= 31; c7 += 1)			; CHECK-NEXT: for (int c7 = 0; c7 <= 31; c7 += 1)
	; CHECK-NEXT: Stmt_for_body6(c6 + 1016, 32 * c3 + c7, 512 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(c6 + 1016, 32 * c3 + c7, 512 * c1 + c5);
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: for (int c4 = 0; c4 <= min(47, -48 * c2 + 127); c4 += 1)			; CHECK-NEXT: for (int c4 = 0; c4 <= min(47, -48 * c2 + 127); c4 += 1)
	; CHECK-NEXT: for (int c5 = 0; c5 <= min(511, -512 * c1 + 1019); c5 += 1) {			; CHECK-NEXT: for (int c5 = 0; c5 <= min(511, -512 * c1 + 1019); c5 += 1) {
				; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points			; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: for (int c6 = 0; c6 <= min(7, -384 * c2 - 8 * c4 + 1019); c6 += 1)			; CHECK-NEXT: for (int c6 = 0; c6 <= min(7, -384 * c2 - 8 * c4 + 1019); c6 += 1)
	; CHECK-NEXT: for (int c7 = 0; c7 <= 27; c7 += 1)			; CHECK-NEXT: for (int c7 = 0; c7 <= 27; c7 += 1)
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4 + c6, c7 + 992, 512 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4 + c6, c7 + 992, 512 * c1 + c5);
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	(47 lines not shown)

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_13.ll

	(22 lines not shown)
	; CHECK-NEXT: for (int c5 = 307 * c1; c5 <= min(1999, 307 * c1 + 306); c5 += 1)			; CHECK-NEXT: for (int c5 = 307 * c1; c5 <= min(1999, 307 * c1 + 306); c5 += 1)
	; CHECK-NEXT: CopyStmt_1(c3, 0, c5);			; CHECK-NEXT: CopyStmt_1(c3, 0, c5);
	; CHECK-NEXT: // 1st level tiling - Points			; CHECK-NEXT: // 1st level tiling - Points
	; CHECK-NEXT: // Register tiling - Tiles			; CHECK-NEXT: // Register tiling - Tiles
	; CHECK-NEXT: {			; CHECK-NEXT: {
	; CHECK-NEXT: for (int c3 = 0; c3 <= min(255, -256 * c0 + 332); c3 += 1)			; CHECK-NEXT: for (int c3 = 0; c3 <= min(255, -256 * c0 + 332); c3 += 1)
	; CHECK-NEXT: for (int c4 = 0; c4 <= 15; c4 += 1)			; CHECK-NEXT: for (int c4 = 0; c4 <= 15; c4 += 1)
	; CHECK-NEXT: for (int c5 = 0; c5 <= min(306, -307 * c1 + 1999); c5 += 1) {			; CHECK-NEXT: for (int c5 = 0; c5 <= min(306, -307 * c1 + 1999); c5 += 1) {
				; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points			; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: {			; CHECK-NEXT: {
	; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4, 1536 * c0 + 6 * c3, 307 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4, 1536 * c0 + 6 * c3, 307 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4, 1536 * c0 + 6 * c3 + 1, 307 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4, 1536 * c0 + 6 * c3 + 1, 307 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4, 1536 * c0 + 6 * c3 + 2, 307 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4, 1536 * c0 + 6 * c3 + 2, 307 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4, 1536 * c0 + 6 * c3 + 3, 307 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4, 1536 * c0 + 6 * c3 + 3, 307 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4, 1536 * c0 + 6 * c3 + 4, 307 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4, 1536 * c0 + 6 * c3 + 4, 307 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4, 1536 * c0 + 6 * c3 + 5, 307 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4, 1536 * c0 + 6 * c3 + 5, 307 * c1 + c5);
	(21 lines not shown)
	; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4 + 4, 1536 * c0 + 6 * c3 + 3, 307 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4 + 4, 1536 * c0 + 6 * c3 + 3, 307 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4 + 4, 1536 * c0 + 6 * c3 + 4, 307 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4 + 4, 1536 * c0 + 6 * c3 + 4, 307 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4 + 4, 1536 * c0 + 6 * c3 + 5, 307 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4 + 4, 1536 * c0 + 6 * c3 + 5, 307 * c1 + c5);
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: if (c0 == 1)			; CHECK-NEXT: if (c0 == 1)
	; CHECK-NEXT: for (int c4 = 0; c4 <= 15; c4 += 1)			; CHECK-NEXT: for (int c4 = 0; c4 <= 15; c4 += 1)
	; CHECK-NEXT: for (int c5 = 0; c5 <= min(306, -307 * c1 + 1999); c5 += 1) {			; CHECK-NEXT: for (int c5 = 0; c5 <= min(306, -307 * c1 + 1999); c5 += 1) {
				; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points			; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: for (int c6 = 0; c6 <= 4; c6 += 1)			; CHECK-NEXT: for (int c6 = 0; c6 <= 4; c6 += 1)
	; CHECK-NEXT: for (int c7 = 0; c7 <= 1; c7 += 1)			; CHECK-NEXT: for (int c7 = 0; c7 <= 1; c7 += 1)
	; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4 + c6, c7 + 1998, 307 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(80 * c2 + 5 * c4 + c6, c7 + 1998, 307 * c1 + c5);
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	(47 lines not shown)

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_14.ll

	; RUN: opt %loadPolly -polly-import-jscop -polly-opt-isl \			; RUN: opt %loadPolly -polly-import-jscop -polly-opt-isl \
	; RUN: -polly-target-throughput-vector-fma=1 \			; RUN: -polly-target-throughput-vector-fma=1 \
	; RUN: -polly-target-latency-vector-fma=8 \			; RUN: -polly-target-latency-vector-fma=8 \
	; RUN: -polly-target-1st-cache-level-associativity=8 \			; RUN: -polly-target-1st-cache-level-associativity=8 \
	; RUN: -polly-target-2nd-cache-level-associativity=8 \			; RUN: -polly-target-2nd-cache-level-associativity=8 \
	; RUN: -polly-target-1st-cache-level-size=32768 \			; RUN: -polly-target-1st-cache-level-size=32768 \
	; RUN: -polly-target-vector-register-bitwidth=256 \			; RUN: -polly-target-vector-register-bitwidth=256 \
	; RUN: -polly-target-2nd-cache-level-size=262144 \			; RUN: -polly-target-2nd-cache-level-size=262144 \
	; RUN: -polly-import-jscop-postfix=transformed -polly-codegen -S < %s \			; RUN: -polly-import-jscop-postfix=transformed -polly-codegen -S < %s \
	; RUN: | FileCheck %s			; RUN: | FileCheck %s
	;			;
	; Check that we do not create different alias sets for locations represented by			; Check that we do not create different alias sets for locations represented by
	; different raw pointers.			; different raw pointers.
	;			;
				; Also check that we disable the Loop Vectorizer.
				;
	; CHECK-NOT: !76 = distinct !{!76, !5, !"second level alias metadata"}			; CHECK-NOT: !76 = distinct !{!76, !5, !"second level alias metadata"}
				; CHECK: !{!"llvm.loop.vectorize.enable", i1 false}
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-unknown"			target triple = "x86_64-unknown-unknown"

	define void @kernel_gemm(i32 %ni, i32 %nj, i32 %nk, [1024 x double]* %A, [1024 x double]* %B, [1024 x double]* %C, double* %C1) {			define void @kernel_gemm(i32 %ni, i32 %nj, i32 %nk, [1024 x double]* %A, [1024 x double]* %B, [1024 x double]* %C, double* %C1) {
	entry:			entry:
	br label %entry.split			br label %entry.split

	(41 lines not shown)

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_3.ll

	(94 lines not shown)
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c3 = 96 * c2; c3 <= 96 * c2 + 95; c3 += 1)			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c3 = 96 * c2; c3 <= 96 * c2 + 95; c3 += 1)
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c5 = 256 * c1; c5 <= 256 * c1 + 255; c5 += 1)			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c5 = 256 * c1; c5 <= 256 * c1 + 255; c5 += 1)
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: CopyStmt_1(c3, 0, c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: CopyStmt_1(c3, 0, c5);
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Points			; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Points
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: // Register tiling - Tiles			; EXTRACTION-OF-MACRO-KERNEL-NEXT: // Register tiling - Tiles
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c3 = 0; c3 <= 131; c3 += 1)			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c3 = 0; c3 <= 131; c3 += 1)
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c4 = 0; c4 <= 23; c4 += 1)			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c4 = 0; c4 <= 23; c4 += 1)
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c5 = 0; c5 <= 255; c5 += 1) {			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c5 = 0; c5 <= 255; c5 += 1) {
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: // Loop Vectorizer Disabled
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: // Register tiling - Points			; EXTRACTION-OF-MACRO-KERNEL-NEXT: // Register tiling - Points
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: {			; EXTRACTION-OF-MACRO-KERNEL-NEXT: {
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 1, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 1, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 2, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 2, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 3, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 3, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 4, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 4, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 5, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_Copy_0(96 * c2 + 4 * c4, 8 * c3 + 5, 256 * c1 + c5);
	(79 lines not shown)

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_4.ll

	(30 lines not shown)
	; PATTERN-MATCHING-OPTS-NEXT: for (int c3 = 96 * c2; c3 <= min(1023, 96 * c2 + 95); c3 += 1)			; PATTERN-MATCHING-OPTS-NEXT: for (int c3 = 96 * c2; c3 <= min(1023, 96 * c2 + 95); c3 += 1)
	; PATTERN-MATCHING-OPTS-NEXT: for (int c4 = 256 * c1; c4 <= 256 * c1 + 255; c4 += 1)			; PATTERN-MATCHING-OPTS-NEXT: for (int c4 = 256 * c1; c4 <= 256 * c1 + 255; c4 += 1)
	; PATTERN-MATCHING-OPTS-NEXT: CopyStmt_1(c3, c4, 0);			; PATTERN-MATCHING-OPTS-NEXT: CopyStmt_1(c3, c4, 0);
	; PATTERN-MATCHING-OPTS-NEXT: // 1st level tiling - Points			; PATTERN-MATCHING-OPTS-NEXT: // 1st level tiling - Points
	; PATTERN-MATCHING-OPTS-NEXT: // Register tiling - Tiles			; PATTERN-MATCHING-OPTS-NEXT: // Register tiling - Tiles
	; PATTERN-MATCHING-OPTS-NEXT: for (int c3 = 0; c3 <= 127; c3 += 1)			; PATTERN-MATCHING-OPTS-NEXT: for (int c3 = 0; c3 <= 127; c3 += 1)
	; PATTERN-MATCHING-OPTS-NEXT: for (int c4 = 0; c4 <= min(23, -24 * c2 + 255); c4 += 1)			; PATTERN-MATCHING-OPTS-NEXT: for (int c4 = 0; c4 <= min(23, -24 * c2 + 255); c4 += 1)
	; PATTERN-MATCHING-OPTS-NEXT: for (int c5 = 0; c5 <= 255; c5 += 1) {			; PATTERN-MATCHING-OPTS-NEXT: for (int c5 = 0; c5 <= 255; c5 += 1) {
				; PATTERN-MATCHING-OPTS-NEXT: // Loop Vectorizer Disabled
	; PATTERN-MATCHING-OPTS-NEXT: // Register tiling - Points			; PATTERN-MATCHING-OPTS-NEXT: // Register tiling - Points
	; PATTERN-MATCHING-OPTS-NEXT: {			; PATTERN-MATCHING-OPTS-NEXT: {
	; PATTERN-MATCHING-OPTS-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 256 * c1 + c5, 8 * c3);			; PATTERN-MATCHING-OPTS-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 256 * c1 + c5, 8 * c3);
	; PATTERN-MATCHING-OPTS-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 256 * c1 + c5, 8 * c3 + 1);			; PATTERN-MATCHING-OPTS-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 256 * c1 + c5, 8 * c3 + 1);
	; PATTERN-MATCHING-OPTS-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 256 * c1 + c5, 8 * c3 + 2);			; PATTERN-MATCHING-OPTS-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 256 * c1 + c5, 8 * c3 + 2);
	; PATTERN-MATCHING-OPTS-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 256 * c1 + c5, 8 * c3 + 3);			; PATTERN-MATCHING-OPTS-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 256 * c1 + c5, 8 * c3 + 3);
	; PATTERN-MATCHING-OPTS-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 256 * c1 + c5, 8 * c3 + 4);			; PATTERN-MATCHING-OPTS-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 256 * c1 + c5, 8 * c3 + 4);
	; PATTERN-MATCHING-OPTS-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 256 * c1 + c5, 8 * c3 + 5);			; PATTERN-MATCHING-OPTS-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 256 * c1 + c5, 8 * c3 + 5);
	(77 lines not shown)

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_5.ll

	(50 lines not shown)
	; CHECK-NEXT: CopyStmt_1(c3, 0, c5);			; CHECK-NEXT: CopyStmt_1(c3, 0, c5);
	; CHECK-NEXT: // 1st level tiling - Points			; CHECK-NEXT: // 1st level tiling - Points
	; CHECK-NEXT: // Register tiling - Tiles			; CHECK-NEXT: // Register tiling - Tiles
	; CHECK-NEXT: {			; CHECK-NEXT: {
	; CHECK-NEXT: if (ni >= 96 * c2 + 4)			; CHECK-NEXT: if (ni >= 96 * c2 + 4)
	; CHECK-NEXT: for (int c3 = 0; c3 <= min(255, -256 * c0 + nj / 8 - 1); c3 += 1) {			; CHECK-NEXT: for (int c3 = 0; c3 <= min(255, -256 * c0 + nj / 8 - 1); c3 += 1) {
	; CHECK-NEXT: for (int c4 = 0; c4 <= min(23, -24 * c2 + ni / 4 - 1); c4 += 1)			; CHECK-NEXT: for (int c4 = 0; c4 <= min(23, -24 * c2 + ni / 4 - 1); c4 += 1)
	; CHECK-NEXT: for (int c5 = 0; c5 <= min(255, nk - 256 * c1 - 1); c5 += 1) {			; CHECK-NEXT: for (int c5 = 0; c5 <= min(255, nk - 256 * c1 - 1); c5 += 1) {
				; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points			; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: {			; CHECK-NEXT: {
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 2048 * c0 + 8 * c3, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 2048 * c0 + 8 * c3, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 2048 * c0 + 8 * c3 + 1, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 2048 * c0 + 8 * c3 + 1, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 2048 * c0 + 8 * c3 + 2, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 2048 * c0 + 8 * c3 + 2, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 2048 * c0 + 8 * c3 + 3, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 2048 * c0 + 8 * c3 + 3, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 2048 * c0 + 8 * c3 + 4, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 2048 * c0 + 8 * c3 + 4, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 2048 * c0 + 8 * c3 + 5, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 2048 * c0 + 8 * c3 + 5, 256 * c1 + c5);
	(22 lines not shown)
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 2048 * c0 + 8 * c3 + 4, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 2048 * c0 + 8 * c3 + 4, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 2048 * c0 + 8 * c3 + 5, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 2048 * c0 + 8 * c3 + 5, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 2048 * c0 + 8 * c3 + 6, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 2048 * c0 + 8 * c3 + 6, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 2048 * c0 + 8 * c3 + 7, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 2048 * c0 + 8 * c3 + 7, 256 * c1 + c5);
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: if (96 * c2 + 95 >= ni)			; CHECK-NEXT: if (96 * c2 + 95 >= ni)
	; CHECK-NEXT: for (int c5 = 0; c5 <= min(255, nk - 256 * c1 - 1); c5 += 1) {			; CHECK-NEXT: for (int c5 = 0; c5 <= min(255, nk - 256 * c1 - 1); c5 += 1) {
				; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points			; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: for (int c6 = 0; c6 < ni % 4; c6 += 1)			; CHECK-NEXT: for (int c6 = 0; c6 < ni % 4; c6 += 1)
	; CHECK-NEXT: for (int c7 = 0; c7 <= 7; c7 += 1)			; CHECK-NEXT: for (int c7 = 0; c7 <= 7; c7 += 1)
	; CHECK-NEXT: Stmt_for_body6(-((ni + 4) % 4) + ni + c6, 2048 * c0 + 8 * c3 + c7, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(-((ni + 4) % 4) + ni + c6, 2048 * c0 + 8 * c3 + c7, 256 * c1 + c5);
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: if (96 * c2 + 3 >= ni || (2048 * c0 + 2047 >= nj && nj % 8 >= 1))			; CHECK-NEXT: if (96 * c2 + 3 >= ni || (2048 * c0 + 2047 >= nj && nj % 8 >= 1))
	; CHECK-NEXT: for (int c3 = 0; c3 <= min(255, -256 * c0 + (nj - 1) / 8); c3 += 1)			; CHECK-NEXT: for (int c3 = 0; c3 <= min(255, -256 * c0 + (nj - 1) / 8); c3 += 1)
	; CHECK-NEXT: if (96 * c2 + 3 >= ni || 2048 * c0 + 8 * c3 + 7 >= nj)			; CHECK-NEXT: if (96 * c2 + 3 >= ni || 2048 * c0 + 8 * c3 + 7 >= nj)
	; CHECK-NEXT: for (int c4 = 0; c4 <= min(23, -24 * c2 + (ni - 1) / 4); c4 += 1)			; CHECK-NEXT: for (int c4 = 0; c4 <= min(23, -24 * c2 + (ni - 1) / 4); c4 += 1)
	; CHECK-NEXT: if ((ni >= 96 * c2 + 4 && 2048 * c0 + 8 * c3 + 7 >= nj) || 1)			; CHECK-NEXT: if ((ni >= 96 * c2 + 4 && 2048 * c0 + 8 * c3 + 7 >= nj) || 1)
	; CHECK-NEXT: for (int c5 = 0; c5 <= min(255, nk - 256 * c1 - 1); c5 += 1) {			; CHECK-NEXT: for (int c5 = 0; c5 <= min(255, nk - 256 * c1 - 1); c5 += 1) {
				; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points			; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: for (int c6 = 0; c6 <= min(3, ni - 96 * c2 - 4 * c4 - 1); c6 += 1)			; CHECK-NEXT: for (int c6 = 0; c6 <= min(3, ni - 96 * c2 - 4 * c4 - 1); c6 += 1)
	; CHECK-NEXT: for (int c7 = 0; c7 <= min(7, nj - 2048 * c0 - 8 * c3 - 1); c7 += 1)			; CHECK-NEXT: for (int c7 = 0; c7 <= min(7, nj - 2048 * c0 - 8 * c3 - 1); c7 += 1)
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + c6, 2048 * c0 + 8 * c3 + c7, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + c6, 2048 * c0 + 8 * c3 + c7, 256 * c1 + c5);
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	(83 lines not shown)

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_6.ll

	(47 lines not shown)
	; CHECK-NEXT: for (int c5 = 256 * c1; c5 <= min(1019, 256 * c1 + 255); c5 += 1)			; CHECK-NEXT: for (int c5 = 256 * c1; c5 <= min(1019, 256 * c1 + 255); c5 += 1)
	; CHECK-NEXT: CopyStmt_1(c3, 0, c5);			; CHECK-NEXT: CopyStmt_1(c3, 0, c5);
	; CHECK-NEXT: // 1st level tiling - Points			; CHECK-NEXT: // 1st level tiling - Points
	; CHECK-NEXT: // Register tiling - Tiles			; CHECK-NEXT: // Register tiling - Tiles
	; CHECK-NEXT: {			; CHECK-NEXT: {
	; CHECK-NEXT: for (int c3 = 0; c3 <= 126; c3 += 1)			; CHECK-NEXT: for (int c3 = 0; c3 <= 126; c3 += 1)
	; CHECK-NEXT: for (int c4 = 0; c4 <= min(23, -24 * c2 + 254); c4 += 1)			; CHECK-NEXT: for (int c4 = 0; c4 <= min(23, -24 * c2 + 254); c4 += 1)
	; CHECK-NEXT: for (int c5 = 0; c5 <= min(255, -256 * c1 + 1019); c5 += 1) {			; CHECK-NEXT: for (int c5 = 0; c5 <= min(255, -256 * c1 + 1019); c5 += 1) {
				; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points			; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: {			; CHECK-NEXT: {
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 1, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 1, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 2, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 2, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 3, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 3, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 4, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 4, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 5, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 5, 256 * c1 + c5);
	(22 lines not shown)
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 8 * c3 + 4, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 8 * c3 + 4, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 8 * c3 + 5, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 8 * c3 + 5, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 8 * c3 + 6, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 8 * c3 + 6, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 8 * c3 + 7, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + 3, 8 * c3 + 7, 256 * c1 + c5);
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: for (int c4 = 0; c4 <= min(23, -24 * c2 + 254); c4 += 1)			; CHECK-NEXT: for (int c4 = 0; c4 <= min(23, -24 * c2 + 254); c4 += 1)
	; CHECK-NEXT: for (int c5 = 0; c5 <= min(255, -256 * c1 + 1019); c5 += 1) {			; CHECK-NEXT: for (int c5 = 0; c5 <= min(255, -256 * c1 + 1019); c5 += 1) {
				; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points			; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: for (int c6 = 0; c6 <= 3; c6 += 1)			; CHECK-NEXT: for (int c6 = 0; c6 <= 3; c6 += 1)
	; CHECK-NEXT: for (int c7 = 0; c7 <= 3; c7 += 1)			; CHECK-NEXT: for (int c7 = 0; c7 <= 3; c7 += 1)
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + c6, c7 + 1016, 256 * c1 + c5);			; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4 + c6, c7 + 1016, 256 * c1 + c5);
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: }			; CHECK-NEXT: }
	(57 lines not shown)

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_7.ll

	(30 lines not shown)
	; CHECK-NEXT: for (int c3 = 128 * c2; c3 <= 128 * c2 + 127; c3 += 1)
	; CHECK-NEXT: for (int c5 = 384 * c1; c5 <= min(1023, 384 * c1 + 383); c5 += 1)
	; CHECK-NEXT: CopyStmt_1(c3, 0, c5);
	; CHECK-NEXT: // 1st level tiling - Points
	; CHECK-NEXT: // Register tiling - Tiles
	; CHECK-NEXT: for (int c3 = 0; c3 <= 127; c3 += 1)
	; CHECK-NEXT: for (int c4 = 0; c4 <= 15; c4 += 1)
	; CHECK-NEXT: for (int c5 = 0; c5 <= min(383, -384 * c1 + 1023); c5 += 1) {
+	; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: {
	; CHECK-NEXT: Stmt_for_body6(128 * c2 + 8 * c4, 8 * c3, 384 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(128 * c2 + 8 * c4, 8 * c3 + 1, 384 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(128 * c2 + 8 * c4, 8 * c3 + 2, 384 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(128 * c2 + 8 * c4, 8 * c3 + 3, 384 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(128 * c2 + 8 * c4, 8 * c3 + 4, 384 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(128 * c2 + 8 * c4, 8 * c3 + 5, 384 * c1 + c5);
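In the updated CHECK lines, the `// Loop Vectorizer Disabled` comment is the first thing printed inside the body of the innermost `c5` loop, i.e. the mark node is that loop's immediate child. One way a code generator could exploit this placement is to inspect each `for` node's body and, if it is a mark with this name, attach loop metadata that turns the Loop Vectorizer off. The sketch below models only that check on a hypothetical miniature AST; `Kind`, `Node`, `make` and `isLoopVectorizerDisabled` are illustrative names, not isl's or Polly's API, and this is not the code from this diff.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, heavily simplified AST node kinds (not isl's API).
enum class Kind { For, Mark, Block, User };

struct Node {
  Kind kind;
  std::string name;                        // mark id, loop iterator or statement name
  std::vector<std::shared_ptr<Node>> kids; // For/Mark: exactly one child (the body)
};

using NodePtr = std::shared_ptr<Node>;

NodePtr make(Kind k, std::string name, std::vector<NodePtr> kids = {}) {
  return std::make_shared<Node>(Node{k, std::move(name), std::move(kids)});
}

// Returns true if the body of the for node `loop` is a mark node named
// "Loop Vectorizer Disabled", i.e. the generated loop should be annotated
// so that the Loop Vectorizer skips it.
bool isLoopVectorizerDisabled(const Node &loop) {
  assert(loop.kind == Kind::For && "expected a for node");
  if (loop.kids.size() != 1)
    return false;
  const Node &body = *loop.kids.front();
  return body.kind == Kind::Mark && body.name == "Loop Vectorizer Disabled";
}
```

Only the loop directly wrapping the mark is affected; the enclosing `c4` and `c3` loops are left alone. Which metadata Polly actually emits is not shown in this excerpt; LLVM's LangRef documents `llvm.loop.vectorize.enable` (with a false operand) as one loop hint that disables vectorization.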

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_8.ll

	; CHECK-NEXT: for (int c3 = 96 * c2; c3 <= min(1023, 96 * c2 + 95); c3 += 1)
	; CHECK-NEXT: for (int c5 = 256 * c1; c5 <= 256 * c1 + 255; c5 += 1)
	; CHECK-NEXT: CopyStmt_1(c3, 0, c5);
	; CHECK-NEXT: // 1st level tiling - Points
	; CHECK-NEXT: // Register tiling - Tiles
	; CHECK-NEXT: for (int c3 = 0; c3 <= 127; c3 += 1)
	; CHECK-NEXT: for (int c4 = 0; c4 <= min(23, -24 * c2 + 255); c4 += 1)
	; CHECK-NEXT: for (int c5 = 0; c5 <= 255; c5 += 1) {
+	; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: {
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 1, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 2, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 3, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 4, 256 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(96 * c2 + 4 * c4, 8 * c3 + 5, 256 * c1 + c5);

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_9.ll

	; CHECK-NEXT: for (int c3 = 384 * c2; c3 <= min(1023, 384 * c2 + 383); c3 += 1)
	; CHECK-NEXT: for (int c5 = 512 * c1; c5 <= 512 * c1 + 511; c5 += 1)
	; CHECK-NEXT: CopyStmt_1(c3, 0, c5);
	; CHECK-NEXT: // 1st level tiling - Points
	; CHECK-NEXT: // Register tiling - Tiles
	; CHECK-NEXT: for (int c3 = 0; c3 <= 31; c3 += 1)
	; CHECK-NEXT: for (int c4 = 0; c4 <= min(47, -48 * c2 + 127); c4 += 1)
	; CHECK-NEXT: for (int c5 = 0; c5 <= 511; c5 += 1) {
+	; CHECK-NEXT: // Loop Vectorizer Disabled
	; CHECK-NEXT: // Register tiling - Points
	; CHECK-NEXT: {
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3, 512 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 1, 512 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 2, 512 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 3, 512 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 4, 512 * c1 + c5);
	; CHECK-NEXT: Stmt_for_body6(384 * c2 + 8 * c4, 32 * c3 + 5, 512 * c1 + c5);
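The innermost `c5` loop, whose body now begins with the mark, walks one block of the k-dimension. In the `_7` hunk its bound is `min(383, -384 * c1 + 1023)`, so when K = 1024 is split into blocks of Kc = 384 the last block is a shorter tail. The sketch below evaluates that bound in the generalized form `min(Kc - 1, -Kc * c1 + K - 1)`; `innerTripCount` and `totalKIterations` are hypothetical helpers, and the range of `c1` is assumed to be `0 .. ceil(K / Kc) - 1` because the enclosing loop is elided in this view.

```cpp
#include <algorithm>
#include <cassert>

// Trip count of the innermost c5 loop for k-block index c1, using the bound
// from the CHECK lines: c5 = 0 .. min(Kc - 1, -Kc * c1 + K - 1).
int innerTripCount(int K, int Kc, int c1) {
  return std::min(Kc - 1, -Kc * c1 + K - 1) + 1;
}

// Sums the trip counts over all k-blocks c1 = 0 .. ceil(K / Kc) - 1;
// the result should equal K, i.e. every k index is visited exactly once.
int totalKIterations(int K, int Kc) {
  int total = 0;
  for (int c1 = 0; Kc * c1 <= K - 1; ++c1)
    total += innerTripCount(K, Kc, c1);
  return total;
}
```

For K = 1024 and Kc = 384 the trip counts are 384, 384 and 256, summing to 1024. When Kc divides K, `-Kc * c1 + K - 1 >= Kc - 1` holds for every valid `c1`, so the `min` always selects `Kc - 1`; that is consistent with the constant bounds `c5 <= 255` in `_8` and `c5 <= 511` in `_9`.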

Revision Contents

Diff 112199
