This is an archive of the discontinued LLVM Phabricator instance.

[Polly] Change the loop order of micro and macro kernels
ClosedPublic

Authored by gareevroman on Oct 16 2016, 12:59 AM.


Details

Reviewers

Meinersbur
grosser
jdoerfert

Commits

rG8babe1a21609: The order of the loops defines the data reused in the BLIS implementation of…
rPLO289806: The order of the loops defines the data reused in the BLIS implementation of
rL289806: The order of the loops defines the data reused in the BLIS implementation of

Summary

The order of the loops defines the data reused in the BLIS implementation of gemm ([1]). In particular, elements of the matrix B, the second operand of matrix multiplication, are reused between iterations of the innermost loop. To keep the reused data in cache, only elements of matrix A, the first operand of matrix multiplication, should be evicted during an iteration of the innermost loop. To provide such a cache replacement policy, elements of the matrix A can, in particular, be loaded first and, consequently, be least-recently-used.

In our case, matrices are stored in row-major order rather than the column-major order used in the BLIS implementation ([1]). One way to address this is to change the order of the loops of the loop nest accordingly. However, this makes elements of the matrix A the ones reused in the innermost loop and, consequently, requires loading elements of the matrix B first. The LLVM vectorizer always generates loads from the matrix A before loads from the matrix B, so we cannot provide this. Consequently, we only change the BLIS micro kernel and the computation of its parameters instead. In particular, reused elements of the matrix B are successively multiplied by specific elements of the matrix A.

Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
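The loop order described in the summary can be sketched as follows. This is a rough illustration only, not the code Polly generates: the function name `gemmMicroKernelOrder`, the flat row-major storage, and the requirement that the tile sizes divide the matrix dimensions are assumptions made here. The tile sizes Mr = 4 and Nr = 8 are the ones visible in the CHECK lines of the tests. For a fixed k, the Nr elements of a row of B are reused while they are multiplied by Mr successive elements of A.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Register-tile sizes as they appear in the tests of this revision.
constexpr std::size_t Mr = 4;
constexpr std::size_t Nr = 8;

// Computes C += A * B for row-major matrices stored in flat vectors:
// A is M x K, B is K x N, C is M x N.
// Assumption for this sketch: Mr divides M and Nr divides N.
void gemmMicroKernelOrder(const std::vector<double> &A,
                          const std::vector<double> &B,
                          std::vector<double> &C, std::size_t M,
                          std::size_t N, std::size_t K) {
  assert(M % Mr == 0 && N % Nr == 0);
  for (std::size_t j0 = 0; j0 < N; j0 += Nr)    // tiles over columns of B
    for (std::size_t i0 = 0; i0 < M; i0 += Mr)  // tiles over rows of A
      for (std::size_t k = 0; k < K; ++k)
        // Micro kernel: rank-1 update of an Mr x Nr block of C.
        // The row segment B[k][j0 .. j0 + Nr - 1] is reused for every i.
        for (std::size_t i = 0; i < Mr; ++i) {
          const double a = A[(i0 + i) * K + k];
          for (std::size_t j = 0; j < Nr; ++j)
            C[(i0 + i) * N + j0 + j] += a * B[k * N + j0 + j];
        }
}
```

The result is the same as that of the plain i, j, k loop nest; only the order in which elements of A and B are touched changes.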

Diff Detail

Event Timeline

gareevroman updated this revision to Diff 74785. Oct 16 2016, 12:59 AM

gareevroman retitled this revision from to [Polly] Change the loop order of micro and macro kernels.

gareevroman updated this object.

gareevroman added reviewers: grosser, Meinersbur, jdoerfert.

gareevroman added a subscriber: pollydev.

gareevroman updated this object. Oct 16 2016, 1:02 AM

gareevroman added a child revision: D25655: [Polly] Restrict ranges of extension maps. Oct 16 2016, 6:12 AM

Hi Roman,

the general direction looks fine to me. Two questions:

Can you add comments that explain why this interchange is done?
How does this impact performance on your system?

In D25653#572527, @grosser wrote:

Hi Roman,

the general direction looks fine to me. Two questions:

Hi Tobias,

thanks for the comments!

Can you add comments that explain why this interchange is done?

Maybe we could extend the comments of ScheduleTreeOptimizer::optimizeMatMulPattern. I've done it in the new version of the patch.

How does this impact performance on your system?

I've added this information to the summary. This patch helps to attain 40.45% of theoretical peak. However, determining optimal values of the Nc parameter will help to attain 77.74% of theoretical peak.

Hi Roman,

sorry for the delay. I think the documentation clearly needs to be more extensive here. At the very least you should include the comment from the commit message in the source code. More importantly, it would be good to explain precisely what [1] assumes, what the LLVM vectorizer does, how this differs and with what changes to the algorithm in [1] we take care of this difference.

Best,
Tobias

Hi Roman,

sorry for the delay. I think the documentation clearly needs to be more extensive here. At the very least you should include the comment from the commit message in the source code. More importantly, it would be good to explain precisely what [1] assumes, what the LLVM vectorizer does, how this differs and with what changes to the algorithm in [1] we take care of this difference.

Hi Tobias,

thanks for the comments! I've tried to address them.

LGTM.

grosser accepted this revision. Dec 15 2016, 3:49 AM

grosser edited edge metadata.

This revision is now accepted and ready to land. Dec 15 2016, 3:49 AM

Closed by commit rL289806: The order of the loops defines the data reused in the BLIS implementation of (authored by romangareev). Dec 15 2016, 3:58 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

lib/

Transform/

ScheduleOptimizer.cpp

60 lines

test/

ScheduleOptimizer/

mat_mul_pattern_data_layout.ll

50 lines

pattern-matching-based-opts_3.ll

209 lines

Diff 74785

lib/Transform/ScheduleOptimizer.cpp

(532 lines not shown)
PartialSchedule = isl_multi_union_pw_aff_set_union_pw_aff(		PartialSchedule = isl_multi_union_pw_aff_set_union_pw_aff(
PartialSchedule, FirstDim, PartialScheduleSecondDim);		PartialSchedule, FirstDim, PartialScheduleSecondDim);
Node = isl_schedule_node_delete(Node);		Node = isl_schedule_node_delete(Node);
Node = isl_schedule_node_insert_partial_schedule(Node, PartialSchedule);		Node = isl_schedule_node_insert_partial_schedule(Node, PartialSchedule);
return Node;		return Node;
}		}

__isl_give isl_schedule_node *ScheduleTreeOptimizer::createMicroKernel(		__isl_give isl_schedule_node *ScheduleTreeOptimizer::createMicroKernel(
__isl_take isl_schedule_node *Node, MicroKernelParamsTy MicroKernelParams) {		__isl_take isl_schedule_node *Node, MicroKernelParamsTy MicroKernelParams) {
return applyRegisterTiling(Node, {MicroKernelParams.Mr, MicroKernelParams.Nr},		applyRegisterTiling(Node, {MicroKernelParams.Mr, MicroKernelParams.Nr}, 1);
1);		Node = isl_schedule_node_parent(isl_schedule_node_parent(Node));
		Node = permuteBandNodeDimensions(Node, 0, 1);
		return isl_schedule_node_child(isl_schedule_node_child(Node, 0), 0);
}		}

__isl_give isl_schedule_node *ScheduleTreeOptimizer::createMacroKernel(		__isl_give isl_schedule_node *ScheduleTreeOptimizer::createMacroKernel(
__isl_take isl_schedule_node *Node, MacroKernelParamsTy MacroKernelParams) {		__isl_take isl_schedule_node *Node, MacroKernelParamsTy MacroKernelParams) {
assert(isl_schedule_node_get_type(Node) == isl_schedule_node_band);		assert(isl_schedule_node_get_type(Node) == isl_schedule_node_band);
if (MacroKernelParams.Mc == 1 && MacroKernelParams.Nc == 1 &&		if (MacroKernelParams.Mc == 1 && MacroKernelParams.Nc == 1 &&
MacroKernelParams.Kc == 1)		MacroKernelParams.Kc == 1)
return Node;		return Node;
Node = tileNode(		Node = tileNode(
Node, "1st level tiling",		Node, "1st level tiling",
{MacroKernelParams.Mc, MacroKernelParams.Nc, MacroKernelParams.Kc}, 1);		{MacroKernelParams.Mc, MacroKernelParams.Nc, MacroKernelParams.Kc}, 1);
Node = isl_schedule_node_parent(isl_schedule_node_parent(Node));		Node = isl_schedule_node_parent(isl_schedule_node_parent(Node));
Node = permuteBandNodeDimensions(Node, 1, 2);		Node = permuteBandNodeDimensions(Node, 1, 2);
		Node = permuteBandNodeDimensions(Node, 0, 2);
return isl_schedule_node_child(isl_schedule_node_child(Node, 0), 0);		return isl_schedule_node_child(isl_schedule_node_child(Node, 0), 0);
}		}

/// Get parameters of the BLIS micro kernel.		/// Get parameters of the BLIS micro kernel.
///		///
/// We choose the Mr and Nr parameters of the micro kernel to be large enough		/// We choose the Mr and Nr parameters of the micro kernel to be large enough
/// such that no stalls caused by the combination of latencies and dependencies		/// such that no stalls caused by the combination of latencies and dependencies
/// are introduced during the updates of the resulting matrix of the matrix		/// are introduced during the updates of the resulting matrix of the matrix
(40 lines not shown)
getMacroKernelParams(const MicroKernelParamsTy &MicroKernelParams) {		getMacroKernelParams(const MicroKernelParamsTy &MicroKernelParams) {
// for determination of the parameters should be used.		// for determination of the parameters should be used.
if (!(MicroKernelParams.Mr > 0 && MicroKernelParams.Nr > 0 &&		if (!(MicroKernelParams.Mr > 0 && MicroKernelParams.Nr > 0 &&
CacheLevelSizes.size() >= 2 && CacheLevelAssociativity.size() >= 2 &&		CacheLevelSizes.size() >= 2 && CacheLevelAssociativity.size() >= 2 &&
CacheLevelSizes[0] > 0 && CacheLevelSizes[1] > 0 &&		CacheLevelSizes[0] > 0 && CacheLevelSizes[1] > 0 &&
CacheLevelAssociativity[0] > 2 && CacheLevelAssociativity[1] > 2))		CacheLevelAssociativity[0] > 2 && CacheLevelAssociativity[1] > 2))
return {1, 1, 1};		return {1, 1, 1};
int Cbr = floor(		int Cbr = floor(
(CacheLevelAssociativity[0] - 1) /		(CacheLevelAssociativity[0] - 1) /
(1 + static_cast<double>(MicroKernelParams.Mr) / MicroKernelParams.Nr));		(1 + static_cast<double>(MicroKernelParams.Nr) / MicroKernelParams.Mr));
int Kc = (Cbr * CacheLevelSizes[0]) /		int Kc = (Cbr * CacheLevelSizes[0]) /
(MicroKernelParams.Nr * CacheLevelAssociativity[0] * 8);		(MicroKernelParams.Mr * CacheLevelAssociativity[0] * 8);
double Cac = static_cast<double>(MicroKernelParams.Mr * Kc * 8 *		double Cac = static_cast<double>(Kc * 8 * CacheLevelAssociativity[1]) /
CacheLevelAssociativity[1]) /
CacheLevelSizes[1];		CacheLevelSizes[1];
double Cbc = static_cast<double>(MicroKernelParams.Nr * Kc * 8 *		double Cbc = static_cast<double>(Kc * 8 * CacheLevelAssociativity[1]) /
CacheLevelAssociativity[1]) /
CacheLevelSizes[1];		CacheLevelSizes[1];
int Mc = floor(MicroKernelParams.Mr / Cac);		int Mc = floor((CacheLevelAssociativity[1] - 2) / Cac);
int Nc =		int Nc = floor(1 / Cbc);
floor((MicroKernelParams.Nr * (CacheLevelAssociativity[1] - 2)) / Cbc);
return {Mc, Nc, Kc};		return {Mc, Nc, Kc};
}		}

/// Identify a memory access through the shape of its memory access relation.		/// Identify a memory access through the shape of its memory access relation.
///		///
/// Identify the unique memory access in @p Stmt, that has an access relation		/// Identify the unique memory access in @p Stmt, that has an access relation
/// equal to @p ExpectedAccessRelation.		/// equal to @p ExpectedAccessRelation.
///		///
(230 lines not shown)
if (!MemAccessA || !MemAccessB) {		if (!MemAccessA || !MemAccessB) {
isl_map_free(MapOldIndVar);		isl_map_free(MapOldIndVar);
return Node;		return Node;
}		}
Node = isl_schedule_node_parent(isl_schedule_node_parent(Node));		Node = isl_schedule_node_parent(isl_schedule_node_parent(Node));
Node = isl_schedule_node_parent(isl_schedule_node_parent(Node));		Node = isl_schedule_node_parent(isl_schedule_node_parent(Node));
Node = isl_schedule_node_parent(Node);		Node = isl_schedule_node_parent(Node);
Node = isl_schedule_node_child(isl_schedule_node_band_split(Node, 2), 0);		Node = isl_schedule_node_child(isl_schedule_node_band_split(Node, 2), 0);
auto *AccRel =		auto *AccRel =
getMatMulAccRel(isl_map_copy(MapOldIndVar), MacroParams.Kc, 3, 6);		getMatMulAccRel(isl_map_copy(MapOldIndVar), MacroParams.Kc, 3, 7);
unsigned FirstDimSize = MacroParams.Mc * MacroParams.Kc / MicroParams.Mr;		unsigned FirstDimSize = MacroParams.Nc * MacroParams.Kc / MicroParams.Nr;
unsigned SecondDimSize = MicroParams.Mr;		unsigned SecondDimSize = MicroParams.Nr;
auto *SAI = Stmt->getParent()->createScopArrayInfo(		auto *SAI = Stmt->getParent()->createScopArrayInfo(
MemAccessA->getElementType(), "Packed_A", {FirstDimSize, SecondDimSize});		MemAccessB->getElementType(), "Packed_B", {FirstDimSize, SecondDimSize});
AccRel = isl_map_set_tuple_id(AccRel, isl_dim_out, SAI->getBasePtrId());		AccRel = isl_map_set_tuple_id(AccRel, isl_dim_out, SAI->getBasePtrId());
auto *OldAcc = MemAccessA->getAccessRelation();		auto *OldAcc = MemAccessB->getAccessRelation();
MemAccessA->setNewAccessRelation(AccRel);		MemAccessB->setNewAccessRelation(AccRel);
auto *ExtMap =		auto *ExtMap =
getMatMulExt(Stmt->getIslCtx(), MacroParams.Mc, 0, MacroParams.Kc);		getMatMulExt(Stmt->getIslCtx(), 0, MacroParams.Nc, MacroParams.Kc);
ExtMap = isl_map_project_out(ExtMap, isl_dim_in, 1, 1);		isl_map_move_dims(ExtMap, isl_dim_out, 0, isl_dim_in, 0, 1);
		isl_map_move_dims(ExtMap, isl_dim_in, 2, isl_dim_out, 0, 1);
		ExtMap = isl_map_project_out(ExtMap, isl_dim_in, 2, 1);
auto *Domain = Stmt->getDomain();		auto *Domain = Stmt->getDomain();
auto *NewStmt = Stmt->getParent()->addScopStmt(		auto *NewStmt = Stmt->getParent()->addScopStmt(
OldAcc, MemAccessA->getAccessRelation(), isl_set_copy(Domain));		OldAcc, MemAccessB->getAccessRelation(), isl_set_copy(Domain));
ExtMap = isl_map_set_tuple_id(ExtMap, isl_dim_out, NewStmt->getDomainId());		ExtMap = isl_map_set_tuple_id(ExtMap, isl_dim_out, NewStmt->getDomainId());
Node = createExtensionNode(Node, ExtMap);		Node = createExtensionNode(Node, ExtMap);
Node = isl_schedule_node_child(Node, 0);		Node = isl_schedule_node_child(Node, 0);
AccRel = getMatMulAccRel(MapOldIndVar, MacroParams.Kc, 4, 7);		AccRel = getMatMulAccRel(MapOldIndVar, MacroParams.Kc, 4, 6);
FirstDimSize = MacroParams.Nc * MacroParams.Kc / MicroParams.Nr;		FirstDimSize = MacroParams.Mc * MacroParams.Kc / MicroParams.Mr;
SecondDimSize = MicroParams.Nr;		SecondDimSize = MicroParams.Mr;
SAI = Stmt->getParent()->createScopArrayInfo(		SAI = Stmt->getParent()->createScopArrayInfo(
MemAccessB->getElementType(), "Packed_B", {FirstDimSize, SecondDimSize});		MemAccessA->getElementType(), "Packed_A", {FirstDimSize, SecondDimSize});
AccRel = isl_map_set_tuple_id(AccRel, isl_dim_out, SAI->getBasePtrId());		AccRel = isl_map_set_tuple_id(AccRel, isl_dim_out, SAI->getBasePtrId());
OldAcc = MemAccessB->getAccessRelation();		OldAcc = MemAccessA->getAccessRelation();
MemAccessB->setNewAccessRelation(AccRel);		MemAccessA->setNewAccessRelation(AccRel);
ExtMap = getMatMulExt(Stmt->getIslCtx(), 0, MacroParams.Nc, MacroParams.Kc);		ExtMap = getMatMulExt(Stmt->getIslCtx(), MacroParams.Mc, 0, MacroParams.Kc);
isl_map_move_dims(ExtMap, isl_dim_out, 0, isl_dim_in, 1, 1);		isl_map_move_dims(ExtMap, isl_dim_out, 0, isl_dim_in, 0, 1);
isl_map_move_dims(ExtMap, isl_dim_in, 2, isl_dim_out, 0, 1);		isl_map_move_dims(ExtMap, isl_dim_in, 2, isl_dim_out, 0, 1);
NewStmt = Stmt->getParent()->addScopStmt(		NewStmt = Stmt->getParent()->addScopStmt(
OldAcc, MemAccessB->getAccessRelation(), Domain);		OldAcc, MemAccessA->getAccessRelation(), Domain);
ExtMap = isl_map_set_tuple_id(ExtMap, isl_dim_out, NewStmt->getDomainId());		ExtMap = isl_map_set_tuple_id(ExtMap, isl_dim_out, NewStmt->getDomainId());
Node = createExtensionNode(Node, ExtMap);		Node = createExtensionNode(Node, ExtMap);
Node = isl_schedule_node_child(isl_schedule_node_child(Node, 0), 0);		Node = isl_schedule_node_child(isl_schedule_node_child(Node, 0), 0);
return isl_schedule_node_child(isl_schedule_node_child(Node, 0), 0);		return isl_schedule_node_child(isl_schedule_node_child(Node, 0), 0);
}		}

/// Get a relation mapping induction variables produced by schedule		/// Get a relation mapping induction variables produced by schedule
/// transformations to the original ones.		/// transformations to the original ones.
(343 lines not shown)
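As a worked example of the new parameter computation, the sketch below repeats the arithmetic from the right-hand side of the getMacroKernelParams diff for a two-level cache hierarchy. The struct and function names are hypothetical and exist only for this illustration. The inputs used in the usage note (Mr = 4, Nr = 8, 8-way caches of 32768 and 262144 bytes) are taken from the RUN and CHECK lines of the tests in this revision.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for MicroKernelParamsTy and MacroKernelParamsTy.
struct MicroKernelParams {
  int Mr;
  int Nr;
};
struct MacroKernelParams {
  int Mc;
  int Nc;
  int Kc;
};

// Same formulas as the new side of the diff; 8 is the element size in bytes
// that the diff hard-codes.
MacroKernelParams computeMacroKernelParams(MicroKernelParams Micro,
                                           int L1Size, int L1Assoc,
                                           int L2Size, int L2Assoc) {
  assert(Micro.Mr > 0 && Micro.Nr > 0 && L1Size > 0 && L2Size > 0 &&
         L1Assoc > 2 && L2Assoc > 2);
  const int ElemBytes = 8;
  int Cbr = static_cast<int>(std::floor(
      (L1Assoc - 1) / (1 + static_cast<double>(Micro.Nr) / Micro.Mr)));
  int Kc = (Cbr * L1Size) / (Micro.Mr * L1Assoc * ElemBytes);
  double Cac = static_cast<double>(Kc * ElemBytes * L2Assoc) / L2Size;
  double Cbc = static_cast<double>(Kc * ElemBytes * L2Assoc) / L2Size;
  int Mc = static_cast<int>(std::floor((L2Assoc - 2) / Cac));
  int Nc = static_cast<int>(std::floor(1 / Cbc));
  return {Mc, Nc, Kc};
}
```

Calling `computeMacroKernelParams({4, 8}, 32768, 8, 262144, 8)` gives Cbr = 2, Kc = 256, Cac = Cbc = 0.0625, and therefore Mc = 96 and Nc = 16. The packed array sizes follow from these: Nc * Kc / Nr = 512 rows for Packed_B and Mc * Kc / Mr = 6144 rows for Packed_A, which agrees with the updated CHECK lines in mat_mul_pattern_data_layout.ll.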

test/ScheduleOptimizer/mat_mul_pattern_data_layout.ll

	; RUN: opt %loadPolly -polly-opt-isl -polly-pattern-matching-based-opts=true -polly-target-througput-vector-fma=1 -polly-target-latency-vector-fma=8 -polly-target-cache-level-associativity=8,8 -polly-target-cache-level-sizes=32768,262144 -polly-optimized-scops < %s 2>&1 | FileCheck %s			; RUN: opt %loadPolly -polly-opt-isl -polly-pattern-matching-based-opts=true -polly-target-througput-vector-fma=1 -polly-target-latency-vector-fma=8 -polly-target-cache-level-associativity=8,8 -polly-target-cache-level-sizes=32768,262144 -polly-optimized-scops < %s 2>&1 | FileCheck %s
	;			;
	; /* C := alpha*A*B + beta*C */			; /* C := alpha*A*B + beta*C */
	; for (i = 0; i < _PB_NI; i++)			; for (i = 0; i < _PB_NI; i++)
	; for (j = 0; j < _PB_NJ; j++)			; for (j = 0; j < _PB_NJ; j++)
	; {			; {
	; C[i][j] *= beta;			; C[i][j] *= beta;
	; for (k = 0; k < _PB_NK; ++k)			; for (k = 0; k < _PB_NK; ++k)
	; C[i][j] += alpha * A[i][k] * B[k][j];			; C[i][j] += alpha * A[i][k] * B[k][j];
	; }			; }
	;			;
	; CHECK: double Packed_A[ { [] -> [(1024)] } ][ { [] -> [(4)] } ]; // Element size 8			; CHECK: double Packed_B[ { [] -> [(512)] } ][ { [] -> [(8)] } ]; // Element size 8
	; CHECK: double Packed_B[ { [] -> [(3072)] } ][ { [] -> [(8)] } ]; // Element size 8			; CHECK-NEXT: double Packed_A[ { [] -> [(6144)] } ][ { [] -> [(4)] } ]; // Element size 8
	;			;
	; CHECK: { Stmt_Copy_0[i0, i1, i2] -> MemRef_arg6[i0, i2] };			; CHECK: { Stmt_Copy_0[i0, i1, i2] -> MemRef_arg6[i0, i2] };
	; CHECK: new: { Stmt_Copy_0[i0, i1, i2] -> Packed_A[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 4*floor((-i0 + o1)/4) = -i0 + o1 and 0 <= o1 <= 3 and -3 + i0 - 16*floor((i0)/16) <= 4*floor((o0)/256) <= i0 - 16*floor((i0)/16) };			; CHECK-NEXT: new: { Stmt_Copy_0[i0, i1, i2] -> Packed_A[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 4*floor((-i0 + o1)/4) = -i0 + o1 and 0 <= o1 <= 3 and -3 + i0 - 96*floor((i0)/96) <= 4*floor((o0)/256) <= i0 - 96*floor((i0)/96) };
	;			;
	; CHECK: { Stmt_Copy_0[i0, i1, i2] -> MemRef_arg7[i2, i1] };			; CHECK: { Stmt_Copy_0[i0, i1, i2] -> MemRef_arg7[i2, i1] };
	; CHECK: new: { Stmt_Copy_0[i0, i1, i2] -> Packed_B[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 8*floor((-i1 + o1)/8) = -i1 + o1 and 0 <= o1 <= 7 and -7 + i1 - 96*floor((i1)/96) <= 8*floor((o0)/256) <= i1 - 96*floor((i1)/96) };			; CHECK-NEXT: new: { Stmt_Copy_0[i0, i1, i2] -> Packed_B[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 8*floor((-i1 + o1)/8) = -i1 + o1 and 0 <= o1 <= 7 and -7 + i1 - 16*floor((i1)/16) <= 8*floor((o0)/256) <= i1 - 16*floor((i1)/16) };
	;			;
	; CHECK: CopyStmt_0			; CHECK: CopyStmt_0
	; CHECK: Domain :=			; CHECK-NEXT: Domain :=
	; CHECK: { CopyStmt_0[i0, i1, i2] : 0 <= i0 <= 1055 and 0 <= i1 <= 1055 and 0 <= i2 <= 1023 };			; CHECK-NEXT: { CopyStmt_0[i0, i1, i2] : 0 <= i0 <= 1055 and 0 <= i1 <= 1055 and 0 <= i2 <= 1023 };
	; CHECK: Schedule :=			; CHECK-NEXT: Schedule :=
	; CHECK: ;			; CHECK-NEXT: ;
	; CHECK: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK: null;			; CHECK-NEXT: null;
	; CHECK: new: { CopyStmt_0[i0, i1, i2] -> Packed_A[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 4*floor((-i0 + o1)/4) = -i0 + o1 and 0 <= o1 <= 3 and -3 + i0 - 16*floor((i0)/16) <= 4*floor((o0)/256) <= i0 - 16*floor((i0)/16) };			; CHECK-NEXT: new: { CopyStmt_0[i0, i1, i2] -> Packed_B[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 8*floor((-i1 + o1)/8) = -i1 + o1 and 0 <= o1 <= 7 and -7 + i1 - 16*floor((i1)/16) <= 8*floor((o0)/256) <= i1 - 16*floor((i1)/16) };
	; CHECK: ReadAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK: null;			; CHECK-NEXT: null;
	; CHECK: new: { CopyStmt_0[i0, i1, i2] -> MemRef_arg6[i0, i2] };			; CHECK-NEXT: new: { CopyStmt_0[i0, i1, i2] -> MemRef_arg7[i2, i1] };
	; CHECK: CopyStmt_1			; CHECK-NEXT: CopyStmt_1
	; CHECK: Domain :=			; CHECK-NEXT: Domain :=
	; CHECK: { CopyStmt_1[i0, i1, i2] : 0 <= i0 <= 1055 and 0 <= i1 <= 1055 and 0 <= i2 <= 1023 };			; CHECK-NEXT: { CopyStmt_1[i0, i1, i2] : 0 <= i0 <= 1055 and 0 <= i1 <= 1055 and 0 <= i2 <= 1023 };
	; CHECK: Schedule :=			; CHECK-NEXT: Schedule :=
	; CHECK: ;			; CHECK-NEXT: ;
	; CHECK: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK: null;			; CHECK-NEXT: null;
	; CHECK: new: { CopyStmt_1[i0, i1, i2] -> Packed_B[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 8*floor((-i1 + o1)/8) = -i1 + o1 and 0 <= o1 <= 7 and -7 + i1 - 96*floor((i1)/96) <= 8*floor((o0)/256) <= i1 - 96*floor((i1)/96) };			; CHECK-NEXT: new: { CopyStmt_1[i0, i1, i2] -> Packed_A[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 4*floor((-i0 + o1)/4) = -i0 + o1 and 0 <= o1 <= 3 and -3 + i0 - 96*floor((i0)/96) <= 4*floor((o0)/256) <= i0 - 96*floor((i0)/96) };
	; CHECK: ReadAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK: null;			; CHECK-NEXT: null;
	; CHECK: new: { CopyStmt_1[i0, i1, i2] -> MemRef_arg7[i2, i1] };			; CHECK-NEXT: new: { CopyStmt_1[i0, i1, i2] -> MemRef_arg6[i0, i2] };
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-unknown"			target triple = "x86_64-unknown-unknown"

	define internal void @kernel_gemm(i32 %arg, i32 %arg1, i32 %arg2, double %arg3, double %arg4, [1056 x double]* %arg5, [1024 x double]* %arg6, [1056 x double]* %arg7) #0 {			define internal void @kernel_gemm(i32 %arg, i32 %arg1, i32 %arg2, double %arg3, double %arg4, [1056 x double]* %arg5, [1024 x double]* %arg6, [1056 x double]* %arg7) #0 {
	bb:			bb:
	br label %bb8			br label %bb8

	(42 lines not shown)
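The new Packed_B access relation in the CHECK lines above can be read as a closed-form index computation. The sketch below is one way to spell it out, assuming non-negative induction variables and the tile sizes Nr = 8, Nc = 16, Kc = 256 that appear in the relation; the function name `packedBIndex` is made up for this illustration and is not part of Polly.

```cpp
#include <cassert>
#include <utility>

// Tile sizes as they appear in the new Packed_B access relation.
constexpr long Nr = 8;
constexpr long Nc = 16;
constexpr long Kc = 256;

// Maps the iteration (i1, i2), which reads B[i2][i1] (MemRef_arg7[i2, i1]),
// to the element Packed_B[o0][o1] selected by the access relation.
// Assumption for this sketch: i1 and i2 are non-negative.
std::pair<long, long> packedBIndex(long i1, long i2) {
  assert(i1 >= 0 && i2 >= 0);
  // o1 is the position inside a register tile of width Nr.
  long o1 = i1 % Nr;
  // o0 selects the register tile inside the macro tile, then the k offset.
  long o0 = Kc * ((i1 % Nc) / Nr) + i2 % Kc;
  return {o0, o1};
}
```

With these sizes o0 stays below Nc * Kc / Nr = 512 and o1 below 8, which matches the declared shape of Packed_B in the updated test.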

test/ScheduleOptimizer/pattern-matching-based-opts_3.ll

	; RUN: opt %loadPolly -polly-opt-isl -polly-pattern-matching-based-opts=true -polly-target-througput-vector-fma=1 -polly-target-latency-vector-fma=8 -analyze -polly-ast < %s 2>&1 | FileCheck %s			; RUN: opt %loadPolly -polly-opt-isl -polly-pattern-matching-based-opts=true -polly-target-througput-vector-fma=1 -polly-target-latency-vector-fma=8 -analyze -polly-ast < %s 2>&1 | FileCheck %s
	; RUN: opt %loadPolly -polly-opt-isl -polly-pattern-matching-based-opts=true -polly-target-througput-vector-fma=1 -polly-target-latency-vector-fma=8 -analyze -polly-ast -polly-target-cache-level-associativity=8,8 -polly-target-cache-level-sizes=32768,262144 < %s 2>&1 | FileCheck %s --check-prefix=EXTRACTION-OF-MACRO-KERNEL			; RUN: opt %loadPolly -polly-opt-isl -polly-pattern-matching-based-opts=true -polly-target-througput-vector-fma=1 -polly-target-latency-vector-fma=8 -analyze -polly-ast -polly-target-cache-level-associativity=8,8 -polly-target-cache-level-sizes=32768,262144 < %s 2>&1 | FileCheck %s --check-prefix=EXTRACTION-OF-MACRO-KERNEL
	;			;
	; /* C := alpha*A*B + beta*C */			; /* C := alpha*A*B + beta*C */
	; for (i = 0; i < _PB_NI; i++)			; for (i = 0; i < _PB_NI; i++)
	; for (j = 0; j < _PB_NJ; j++)			; for (j = 0; j < _PB_NJ; j++)
	; {			; {
	; C[i][j] *= beta;			; C[i][j] *= beta;
	; for (k = 0; k < _PB_NK; ++k)			; for (k = 0; k < _PB_NK; ++k)
	; C[i][j] += alpha * A[i][k] * B[k][j];			; C[i][j] += alpha * A[i][k] * B[k][j];
	; }			; }
	;			;
	; CHECK: {			; CHECK: {
	; CHECK: // 1st level tiling - Tiles			; CHECK-NEXT: // 1st level tiling - Tiles
	; CHECK: for (int c0 = 0; c0 <= 32; c0 += 1)			; CHECK-NEXT: for (int c0 = 0; c0 <= 32; c0 += 1)
	; CHECK: for (int c1 = 0; c1 <= 32; c1 += 1) {			; CHECK-NEXT: for (int c1 = 0; c1 <= 32; c1 += 1) {
	; CHECK: // 1st level tiling - Points			; CHECK-NEXT: // 1st level tiling - Points
	; CHECK: for (int c2 = 0; c2 <= 31; c2 += 1)			; CHECK-NEXT: for (int c2 = 0; c2 <= 31; c2 += 1)
	; CHECK: for (int c3 = 0; c3 <= 31; c3 += 1)			; CHECK-NEXT: for (int c3 = 0; c3 <= 31; c3 += 1)
	; CHECK: Stmt_bb14(32 * c0 + c2, 32 * c1 + c3);			; CHECK-NEXT: Stmt_bb14(32 * c0 + c2, 32 * c1 + c3);
	; CHECK: }			; CHECK-NEXT: }
	; CHECK: // Register tiling - Tiles			; CHECK-NEXT: // Register tiling - Tiles
	; CHECK: for (int c0 = 0; c0 <= 263; c0 += 1)			; CHECK-NEXT: for (int c0 = 0; c0 <= 131; c0 += 1)
	; CHECK: for (int c1 = 0; c1 <= 131; c1 += 1)			; CHECK-NEXT: for (int c1 = 0; c1 <= 263; c1 += 1)
	; CHECK: for (int c2 = 0; c2 <= 1023; c2 += 1) {			; CHECK-NEXT: for (int c2 = 0; c2 <= 1023; c2 += 1) {
	; CHECK: // Register tiling - Points			; CHECK-NEXT: // Register tiling - Points
	; CHECK: // 1st level tiling - Tiles			; CHECK-NEXT: // 1st level tiling - Tiles
	; CHECK: // 1st level tiling - Points			; CHECK-NEXT: // 1st level tiling - Points
	; CHECK: {			; CHECK-NEXT: {
	; CHECK: Stmt_bb24(4 * c0, 8 * c1, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0, c2);
	; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 1, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 1, c2);
	; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 2, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 2, c2);
	; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 3, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 3, c2);
	; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 4, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 4, c2);
	; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 5, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 5, c2);
	; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 6, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 6, c2);
	; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 7, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 7, c2);
	; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0, c2);
	; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 1, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 1, c2);
	; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 2, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 2, c2);
	; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 3, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 3, c2);
	; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 4, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 4, c2);
	; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 5, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 5, c2);
	; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 6, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 6, c2);
	; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 7, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 7, c2);
	; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0, c2);
	; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 1, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 1, c2);
	; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 2, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 2, c2);
	; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 3, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 3, c2);
	; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 4, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 4, c2);
	; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 5, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 5, c2);
	; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 6, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 6, c2);
	; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 7, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 7, c2);
	; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0, c2);
	; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 1, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 1, c2);
	; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 2, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 2, c2);
	; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 3, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 3, c2);
	; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 4, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 4, c2);
	; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 5, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 5, c2);
	; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 6, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 6, c2);
	; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 7, c2);			; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 7, c2);
	; CHECK: }			; CHECK-NEXT: }
	; CHECK: }			; CHECK-NEXT: }
	; CHECK: }			; CHECK-NEXT: }
	;			;
	; EXTRACTION-OF-MACRO-KERNEL: // 1st level tiling - Tiles
	; EXTRACTION-OF-MACRO-KERNEL: for (int c0 = 0; c0 <= 65; c0 += 1)
	; EXTRACTION-OF-MACRO-KERNEL: for (int c1 = 0; c1 <= 3; c1 += 1)
	; EXTRACTION-OF-MACRO-KERNEL: for (int c2 = 0; c2 <= 10; c2 += 1) {
	; EXTRACTION-OF-MACRO-KERNEL: // 1st level tiling - Points
	; EXTRACTION-OF-MACRO-KERNEL: // Register tiling - Tiles
	; EXTRACTION-OF-MACRO-KERNEL: for (int c3 = 0; c3 <= 3; c3 += 1)
	; EXTRACTION-OF-MACRO-KERNEL: for (int c4 = 0; c4 <= 11; c4 += 1)
	; EXTRACTION-OF-MACRO-KERNEL: for (int c5 = 0; c5 <= 255; c5 += 1) {
	; EXTRACTION-OF-MACRO-KERNEL: // Register tiling - Points
	; EXTRACTION-OF-MACRO-KERNEL: // 1st level tiling - Tiles
	; EXTRACTION-OF-MACRO-KERNEL: // 1st level tiling - Points
	; EXTRACTION-OF-MACRO-KERNEL: {			; EXTRACTION-OF-MACRO-KERNEL: {
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Tiles
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 1, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c0 = 0; c0 <= 32; c0 += 1)
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 2, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c1 = 0; c1 <= 32; c1 += 1) {
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 3, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Points
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 4, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c2 = 0; c2 <= 31; c2 += 1)
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 5, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c3 = 0; c3 <= 31; c3 += 1)
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 6, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb14(32 * c0 + c2, 32 * c1 + c3);
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 7, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: }
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Tiles
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 1, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c0 = 0; c0 <= 65; c0 += 1)
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 2, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c1 = 0; c1 <= 3; c1 += 1)
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 3, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c2 = 0; c2 <= 10; c2 += 1) {
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 4, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Points
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 5, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: // Register tiling - Tiles
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 6, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c3 = 0; c3 <= 1; c3 += 1)
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 7, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c4 = 0; c4 <= 23; c4 += 1)
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c5 = 0; c5 <= 255; c5 += 1) {
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 1, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: // Register tiling - Points
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 2, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Tiles
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 3, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Points
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 4, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: {
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 5, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 6, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 1, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 7, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 2, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 3, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 1, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 4, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 2, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 5, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 3, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 6, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 4, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 7, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 5, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 6, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 1, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 7, 256 * c1 + c5);			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 2, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: }			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 3, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: }			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 4, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: }			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 5, 256 * c1 + c5);
	; EXTRACTION-OF-MACRO-KERNEL: }			; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 6, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 7, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 1, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 2, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 3, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 4, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 5, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 6, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 7, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 1, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 2, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 3, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 4, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 5, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 6, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 7, 256 * c1 + c5);
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: }
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: }
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: }
				; EXTRACTION-OF-MACRO-KERNEL-NEXT: }
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-unknown"			target triple = "x86_64-unknown-unknown"

	define internal void @kernel_gemm(i32 %arg, i32 %arg1, i32 %arg2, double %arg3, double %arg4, [1056 x double]* %arg5, [1024 x double]* %arg6, [1056 x double]* %arg7) #0 {			define internal void @kernel_gemm(i32 %arg, i32 %arg1, i32 %arg2, double %arg3, double %arg4, [1056 x double]* %arg5, [1024 x double]* %arg6, [1056 x double]* %arg7) #0 {
	bb:			bb:
	br label %bb8			br label %bb8
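
For readers of the diff, the fully unrolled block on the new (right-hand) side of the EXTRACTION-OF-MACRO-KERNEL checks can be read as a compact loop nest. The following is only a sketch in C that restates those CHECK lines: the helper `stmt_bb24`, the function `macro_kernel`, the counters, and the iterators `r` and `c` are hypothetical names introduced for illustration and are not part of Polly or of the test file. The loop bounds and index expressions are copied from the expected output above.

```c
#include <stdio.h>

/* Hypothetical bookkeeping: number of statement instances executed and
 * the largest index seen in each dimension. */
static long instance_count = 0;
static int max_i = -1, max_j = -1, max_k = -1;

/* Hypothetical stand-in for one instance of Stmt_bb24(i, j, k). */
static void stmt_bb24(int i, int j, int k) {
    instance_count++;
    if (i > max_i) max_i = i;
    if (j > max_j) max_j = j;
    if (k > max_k) max_k = k;
}

/* Compact form of the new expected schedule for fixed outer iterators
 * c0, c1, c2 (the "1st level tiling" loops are not modelled here). */
static void macro_kernel(int c0, int c1, int c2) {
    /* Register tiling - Tiles */
    for (int c3 = 0; c3 <= 1; c3 += 1)
        for (int c4 = 0; c4 <= 23; c4 += 1)
            for (int c5 = 0; c5 <= 255; c5 += 1) {
                /* Register tiling - Points: the 4 x 8 block that the
                 * CHECK lines list fully unrolled (32 statements). */
                for (int r = 0; r < 4; r += 1)
                    for (int c = 0; c < 8; c += 1)
                        stmt_bb24(96 * c2 + 4 * c4 + r,
                                  16 * c0 + 8 * c3 + c,
                                  256 * c1 + c5);
            }
}
```

Calling `macro_kernel(0, 0, 0)` visits first indices 0 to 95, second indices 0 to 15, and third indices 0 to 255. On the old (left-hand) side, the visible statements instead use `16 * c0 + 4 * c3 + r` as the first index and `96 * c2 + 8 * c4 + c` as the second.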



Revision Contents

Diff 74785

lib/Transform/ScheduleOptimizer.cpp

test/ScheduleOptimizer/mat_mul_pattern_data_layout.ll

test/ScheduleOptimizer/pattern-matching-based-opts_3.ll
