This is an archive of the discontinued LLVM Phabricator instance.

Differential D33138

[Polly][WIP] Make the pattern matching work with modified memory accesses
Closed, Public

Authored by gareevroman on May 12 2017, 10:10 AM.

Details

Reviewers

grosser
Meinersbur
jdoerfert
bollu

Commits

rG750374181b0e: Make the pattern matching work with modified memory accesses
rPLO308494: Make the pattern matching work with modified memory accesses
rL308494: Make the pattern matching work with modified memory accesses

Summary

Some optimizations (e.g., DeLICM) can modify memory accesses (e.g., change their MemoryKind). Consequently, the pattern matching should take this into account.
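A minimal sketch of the difference between the original and the "latest" view of an access, which is what the diff switches to (isArrayKind()/getAccessRelation() become isLatestArrayKind()/getLatestAccessRelation()). `ToyAccess` and `Kind` are hypothetical, heavily simplified stand-ins, not Polly's MemoryAccess; in particular, the sketch stores the new kind explicitly, whereas Polly presumably derives it from the array that the new access relation refers to.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified memory kinds (Polly's real enum has more cases).
enum class Kind { Array, Value, PHI };

// Hypothetical stand-in for a memory access: the original state plus an
// optional replacement installed by a later pass such as DeLICM.
struct ToyAccess {
  Kind OrigKind;
  std::string OrigRel;               // original access relation, as text
  std::optional<Kind> NewKind;       // set when a pass redirects the access
  std::optional<std::string> NewRel; // set together with NewKind

  // Looks only at the original access, like isArrayKind().
  bool isArrayKind() const { return OrigKind == Kind::Array; }

  // Prefers the modified state when there is one, like isLatestArrayKind().
  bool isLatestArrayKind() const {
    return NewKind.value_or(OrigKind) == Kind::Array;
  }

  // Prefers the modified relation, like getLatestAccessRelation().
  const std::string &getLatestAccessRelation() const {
    return NewRel ? *NewRel : OrigRel;
  }
};
```

For a read that starts out as a PHI (scalar) access and is redirected to MemRef_C[i0, i1], as with the read of C in the test case discussed in this review, isArrayKind() is false while isLatestArrayKind() is true, so a matcher that consults only the original kind would skip that access.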

Diff Detail

Repository: rL LLVM

Event Timeline

gareevroman created this revision. May 12 2017, 10:10 AM

gareevroman added inline comments.

lib/Transform/ScheduleOptimizer.cpp
676 ↗	(On Diff #98793)	Maybe there is a better way to examine the corresponding non-partial memory accesses.
test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
1 ↗	(On Diff #98793)	The test case should be simplified. Also, if DeLICM does not preserve the order of memory accesses, the test won't pass.

The isLatestArrayKind() part is clear, but what does the isMatMulOperandAcc part do?

test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
2 ↗	(On Diff #98793)	`-polly-delicm-overapproximate-writes`, but no `-polly-delicm` in the command line?

In D33138#753860, @Meinersbur wrote:

The isLatestArrayKind() part is clear, but what does the isMatMulOperandAcc part do?

AFAIU, DeLICM can produce partial memory accesses (e.g., [Bsize2, Asize2, Asize1, tmp18_pre, tmp11] -> { Stmt_KernelStmt[i0, i1, i2] -> MemRef_tmp12[i0, i1] : 0 <= i0 < Asize1 and 0 <= i1 < Bsize2 and 0 <= i2 < Asize2 };). Consequently, the corresponding isl maps will contain additional constraints (e.g., 0 <= i0 < Asize1 and 0 <= i1 < Bsize2 and 0 <= i2 < Asize2) along with o0 = i0 and o1 = i1, which we check using isMatMulOperandAcc. This part drops the additional constraints, since we aren't interested in checking them.

I'm not sure whether the corresponding copy statements handle partial memory accesses correctly (I'll try to check it soon). However, the patch should already help to test the detection.
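A minimal sketch of the idea described in this reply, under a hypothetical representation (`AffineAccess`, `isOperandShape`) in which the index function and the bounds of a partial access are stored separately. In an isl map both are just constraints, which is why dropping them there is harder to get right, as the review later points out. Under this assumption, checking the "o0 = i0 and o1 = i1" shape while ignoring the bounds is straightforward.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical affine access S[in] -> M[out]: output dimension D equals
// sum_J Coeff[D][J] * in[J] + Const[D]. Bounds that restrict where a partial
// access applies (such as "0 <= i0 < Asize1") are stored separately.
struct AffineAccess {
  std::vector<std::vector<int>> Coeff; // [output dim][input dim]
  std::vector<int> Const;              // [output dim]
  std::vector<int> Lower, Upper;       // per-input-dim bounds; may be empty
};

// True if the access has the shape M[in[First], in[Second]], i.e.
// "o0 = i_First and o1 = i_Second".
bool isOperandShape(const AffineAccess &A, std::size_t First,
                    std::size_t Second) {
  if (A.Coeff.size() != 2 || A.Const.size() != 2)
    return false;
  const std::size_t Wanted[2] = {First, Second};
  for (std::size_t D = 0; D < 2; ++D) {
    if (A.Const[D] != 0 || Wanted[D] >= A.Coeff[D].size())
      return false;
    // Exactly one unit coefficient, at the wanted input dimension.
    for (std::size_t J = 0; J < A.Coeff[D].size(); ++J)
      if (A.Coeff[D][J] != (J == Wanted[D] ? 1 : 0))
        return false;
  }
  // A.Lower and A.Upper are intentionally not inspected.
  return true;
}
```

Adding bounds to an access leaves the answer unchanged, which is the behaviour the patch aims for with partial accesses.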

test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
2 ↗	(On Diff #98793)	Thanks. It should be added.

Hi Roman,

I just tried to run the following command on polybench 3.2 and could not get the pattern-based optimizations to work:

clang linear-algebra/kernels/gemm/gemm.c -O3 -DPOLYBENCH_TIME -I utilities/ -mllvm -polly -mllvm -polly-tiling=true -mllvm -polly-position=before-vectorizer -mllvm -polly-enable-delicm -mllvm -debug-only=polly-delicm -mllvm -polly-delicm-overapproximate-writes -mllvm -debug-only=polly-ast -mllvm -polly=true -fno-vectorize -fno-inline -mllvm -polly-enable-simplify utilities/polybench.c

using

git-svn-id: https://llvm.org/svn/llvm-project/polly/trunk@302926 91177308-0d34-0410-b5e6-96231b3b80d8

as well as:

[Polly][WIP] Make the pattern matching work with modified memory accesses
Differential Revision: https://reviews.llvm.org/D33138

[Polly][Simplify] Remove writes that are overwritten.
Differential Revision: https://reviews.llvm.org/D33142

Any idea what might be missing?

Best,
Tobias

lib/Transform/ScheduleOptimizer.cpp
676 ↗	(On Diff #98793)	Instead of inspecting the constraints, we could build a model of what the memory access should look like and then compare the memory access map with this model.
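A sketch of the model-based alternative suggested above. It uses brute-force enumeration over a small box domain instead of isl; `ToyRelation`, `Iter`, `Elem` and `matchesModel` are hypothetical. The access is compared with the model S[i0, i1, i2] -> M[i_First, i_Second] only on points of the statement domain, so redundant bounds in a partial access do not affect the answer, while an access that covers only part of the domain is rejected. With isl, one would presumably intersect both maps with the statement domain and compare them with isl_map_is_equal.

```cpp
#include <array>
#include <cassert>
#include <functional>

// Hypothetical toy model of a statement S with domain 0 <= i0, i1, i2 < N.
using Iter = std::array<int, 3>;
using Elem = std::array<int, 2>;

// An access: where it applies (Defined) and which 2-D element it touches
// (Map). This stands in for an isl map; it is not isl.
struct ToyRelation {
  std::function<bool(const Iter &)> Defined;
  std::function<Elem(const Iter &)> Map;
};

// Compares the access with the model S[i0, i1, i2] -> M[i_First, i_Second]
// on every point of the statement domain.
bool matchesModel(const ToyRelation &Acc, int N, int First, int Second) {
  Iter I{};
  for (I[0] = 0; I[0] < N; ++I[0])
    for (I[1] = 0; I[1] < N; ++I[1])
      for (I[2] = 0; I[2] < N; ++I[2]) {
        if (!Acc.Defined(I))
          return false; // the access does not cover the whole domain
        if (Acc.Map(I) != Elem{I[First], I[Second]})
          return false; // the access touches a different element
      }
  return true;
}
```

An access that spells out the domain bounds explicitly (compare the "new:" read relation with its 0 <= i0 <= 1023 ... constraints) still matches the model, because only its behaviour on the domain is compared.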

Hi Roman,

what is missing here?

Update the revision.

Herald added a reviewer: bollu. Jun 18 2017, 7:24 AM

In D33138#754070, @grosser wrote:

Any idea what might be missing?

Hi Tobias,

The only problem I see here is the order of the memory accesses. For some reason, the access to the matrix B is the last one. The pattern matching only detects the pattern if the write access is the last memory access.

Could you try to run it without -polly-enable-simplify? It helps in my case.
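The following sketch, with hypothetical types (`ToyAcc`, `findsWriteToC`), mimics the loop in containsMatrMult from the diff to show why the order matters. It walks a statement's accesses from the last one backwards (stopping before the first access, as the original loop does), skips non-array accesses, and accepts only if the first array access it meets is the write to C[i][j]. What happens when no array access is found is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flattened view of one access of a statement.
struct ToyAcc {
  bool IsWrite;
  bool IsArray; // stands in for isLatestArrayKind()
  bool IsCOfIJ; // stands in for "accesses C[i][j]" (isMatMulOperandAcc)
};

bool findsWriteToC(const std::vector<ToyAcc> &Accs) {
  if (Accs.size() <= 1)
    return false;
  // Like the original loop, the first access (Pos == 0) is never visited.
  for (std::size_t Pos = Accs.size() - 1; Pos > 0; --Pos) {
    const ToyAcc &A = Accs[Pos];
    if (!A.IsArray)
      continue; // scalar accesses are skipped
    return A.IsWrite && A.IsCOfIJ;
  }
  return false; // assumption: no array access found means no match
}
```

With the read of B placed after the write to C, as in the reported case, the last array access is a read and the check fails; a trailing scalar access, by contrast, is skipped.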

lib/Transform/ScheduleOptimizer.cpp
676 ↗	(On Diff #98793)	AFAIU, in case FirstPos and SecondPos are unknown (e.g., in containsMatrMult), we would have to build and compare a separate model for every pair of unknown dimensions: for example, S(i0, i1, k)->M(i0, i1) and S(i0, i1, k)->M(i1, i0) if the loop nest contains three loops and, in general, P(n - 1, 2) models, where n is the number of dimensions of the loop nest. I propose to leave that discussion for now.
767 ↗	(On Diff #102968)	Domains of write and read accesses modified by DeLICM can be different. For example, the write and read accesses to the matrix C from the attached test case are, respectively:
MustWriteAccess := [Reduction Type: NONE] [Scalar: 0] { Stmt_for_body6[i0, i1, i2] -> MemRef_C[i0, i1] };
ReadAccess := [Reduction Type: NONE] [Scalar: 1] { Stmt_for_body6[i0, i1, i2] -> MemRef1__phi[] };
new: { Stmt_for_body6[i0, i1, i2] -> MemRef_C[i0, i1] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and 0 <= i2 <= 1023 };
Only the new read relation carries explicit bounds on i0, i1 and i2.
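As a rough check of the P(n - 1, 2) count in the reply to the comment on line 676, the sketch below enumerates the candidate models for an n-deep loop nest. It assumes, purely for illustration, that the innermost dimension is the reduction loop k and never appears in the model's output; under that assumption it reproduces the two models given for three loops and (n - 1)(n - 2) models in general. `candidateModels` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Enumerates candidate models S(...) -> M(a, b) for an N-deep loop nest,
// where (a, b) ranges over ordered pairs of distinct dimensions other than
// the innermost one (assumed to be the reduction loop k).
std::vector<std::string> candidateModels(int N) {
  auto Name = [N](int D) {
    return D == N - 1 ? std::string("k") : "i" + std::to_string(D);
  };
  std::string In; // e.g. "i0, i1, k" for N = 3
  for (int D = 0; D < N; ++D)
    In += (D > 0 ? ", " : "") + Name(D);
  std::vector<std::string> Models;
  for (int A = 0; A < N - 1; ++A)
    for (int B = 0; B < N - 1; ++B)
      if (A != B)
        Models.push_back("S(" + In + ")->M(" + Name(A) + ", " + Name(B) + ")");
  return Models;
}
```

For N = 3 this yields S(i0, i1, k)->M(i0, i1) and S(i0, i1, k)->M(i1, i0); for N = 4 it yields six models.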

Hi Roman,

let's really try to get this in. I have a couple of comments, but nothing major.

PS: Can you try to upload your patches with full context?

lib/Transform/ScheduleOptimizer.cpp
678 ↗	(On Diff #102968)	Dropping constraints just like this is not a good idea, as the result very much depends on the representation of the map. Are you sure this is safe in all situations? Also, this looks as if this is just a workaround for what is achieved with: https://reviews.llvm.org/D35237 If I revert all these changes, your test case still passes! Should we consequently not get D35237 in and remove these changes?
767 ↗	(On Diff #102968)	I think we should check that the access relations are equal and rely on D35237 to remove the problematic constraints. Would that work? Also, I really do not understand the MemAccessPtr != MMI.WriteToC part. Can you briefly explain why the old approach does not work any more?
test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
4 ↗	(On Diff #102968)	We commonly try to limit the scope of test cases as much as possible. Hence, I would suggest not including DeLICM here, but rather using -polly-export-jscop after DeLICM to get a transformation file we want to apply (use -polly-use-llvm-names to get the right names) and then applying it with -polly-import-jscop. This will also allow us to already remove the access constraints the same way these constraints will be removed with https://reviews.llvm.org/D35237. Unfortunately, it seems the JSON import does not allow this at the moment. It would be great to see this fixed: https://bugs.llvm.org/show_bug.cgi?id=33807 But I am not sure that this should really hold back this patch.
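To illustrate the concern about representation dependence raised for line 678, here is a toy example (hypothetical types `Piece` and `ToyMap`, not isl) in which two equal relations are stored with different numbers of pieces, loosely analogous to the basic maps of an isl map. A syntactic check similar in spirit to `isl_map_n_basic_map(AccMap) == 1`, which the diff uses, gives different answers for them, while a semantic comparison does not.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy relation i -> A[i], stored as a union of pieces; each
// piece covers an inclusive range of i.
struct Piece {
  int Lo, Hi;
};
using ToyMap = std::vector<Piece>;

// Syntactic check: depends on how the relation happens to be split.
bool hasSinglePiece(const ToyMap &M) { return M.size() == 1; }

// Semantic check: do both relations contain the same points of [Lo, Hi]?
bool sameOn(const ToyMap &A, const ToyMap &B, int Lo, int Hi) {
  auto Covers = [](const ToyMap &M, int I) {
    for (const Piece &P : M)
      if (P.Lo <= I && I <= P.Hi)
        return true;
    return false;
  };
  for (int I = Lo; I <= Hi; ++I)
    if (Covers(A, I) != Covers(B, I))
      return false;
  return true;
}
```

Both toy maps below describe the same relation on 0..1023, yet only one of them passes the single-piece test.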

grosser requested changes to this revision. Jul 16 2017, 2:51 AM

This revision now requires changes to proceed. Jul 16 2017, 2:51 AM

PS: Can you try to upload your patches with full context?

Sure. Thanks. Just missed that one.

Dropping constraints just like this is not a good idea, as the result very much depends on the representation of the map. Are you sure this is safe in all situations?

Also, this looks as if this is just a workaround for what is achieved with:

https://reviews.llvm.org/D35237

If I revert all these changes your test case still passes! Should we consequently not get D35237 in and remove these changes?

Yes, it was a temporary workaround.

I've tried to simplify the revision and make it independent.

Nice!

This revision is now accepted and ready to land. Jul 19 2017, 9:33 AM

Closed by commit rL308494: Make the pattern matching work with modified memory accesses (authored by romangareev). Jul 19 2017, 10:02 AM

This revision was automatically updated to reflect the committed changes.

Meinersbur added inline comments. Jul 19 2017, 1:15 PM

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
1–4	This line pipes binary data to FileCheck. You can use `-disable-output` to avoid this. If using `-debug`, you also need a `REQUIRES: asserts` line; in non-assert builds, there is no `-debug` switch. I would very much prefer that we not use `-debug` in regression tests at all, because of the problems above (and, as the name implies, it is meant for debugging; it makes adding additional debug output difficult, since some tests may fail because of it). Could we find another way to check whether a matrix multiplication has been detected? We could:
- use `-analyze`;
- use `-pass-remarks-analysis` (potentially with YAML output);
- check for some characteristic output in the IR after codegen;
- check `-stats` (e.g., number of detected matmul patterns).

gareevroman added inline comments. Jul 22 2017, 5:13 AM

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
1–4	I like the fourth option. I can try to implement it, if there are no objections.

Meinersbur added inline comments. Jul 22 2017, 9:50 AM

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
1–4	That would be great.

Revision Contents

- polly/trunk/lib/Transform/ScheduleOptimizer.cpp (20 lines)
- polly/trunk/test/ScheduleOptimizer/kernel_gemm___%for.cond1.preheader---%for.end18.jscop.transformed (46 lines)
- polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_11.ll (52 lines)

Diff 107329

polly/trunk/lib/Transform/ScheduleOptimizer.cpp

@@ 682 lines not shown @@
 ///
 /// @param MemAccess The memory access to be checked.
 /// @param MMI Parameters of the matrix multiplication operands.
 /// @return True in case the memory access represents the read access
 /// to a non-scalar operand of the matrix multiplication and
 /// false, otherwise.
 static bool isMatMulNonScalarReadAccess(MemoryAccess *MemAccess,
                                         MatMulInfoTy &MMI) {
-  if (!MemAccess->isArrayKind() || !MemAccess->isRead())
+  if (!MemAccess->isLatestArrayKind() || !MemAccess->isRead())
     return false;
-  isl_map *AccMap = MemAccess->getAccessRelation();
+  isl_map *AccMap = MemAccess->getLatestAccessRelation();
   if (isMatMulOperandAcc(AccMap, MMI.i, MMI.j) && !MMI.ReadFromC &&
       isl_map_n_basic_map(AccMap) == 1) {
     MMI.ReadFromC = MemAccess;
     isl_map_free(AccMap);
     return true;
   }
   if (isMatMulOperandAcc(AccMap, MMI.i, MMI.k) && !MMI.A &&
       isl_map_n_basic_map(AccMap) == 1) {
@@ 36 lines not shown @@ static bool containsOnlyMatrMultAcc(__isl_keep isl_map *PartialSchedule,
   auto *MapI = permuteDimensions(isl_map_copy(PartialSchedule), isl_dim_out,
                                  MMI.i, OutDimNum - 1);
   auto *MapJ = permuteDimensions(isl_map_copy(PartialSchedule), isl_dim_out,
                                  MMI.j, OutDimNum - 1);
   auto *MapK = permuteDimensions(isl_map_copy(PartialSchedule), isl_dim_out,
                                  MMI.k, OutDimNum - 1);
   for (auto *MemA = Stmt->begin(); MemA != Stmt->end() - 1; MemA++) {
     auto *MemAccessPtr = *MemA;
-    if (MemAccessPtr->isArrayKind() && MemAccessPtr != MMI.WriteToC &&
+    if (MemAccessPtr->isLatestArrayKind() && MemAccessPtr != MMI.WriteToC &&
         !isMatMulNonScalarReadAccess(MemAccessPtr, MMI) &&
         !(MemAccessPtr->isStrideZero(isl_map_copy(MapI)) &&
           MemAccessPtr->isStrideZero(isl_map_copy(MapJ)) &&
           MemAccessPtr->isStrideZero(isl_map_copy(MapK)))) {
       isl_map_free(MapI);
       isl_map_free(MapJ);
       isl_map_free(MapK);
       return false;
@@ 75 lines not shown @@ static bool containsMatrMult(__isl_keep isl_map *PartialSchedule,
                              const Dependences *D, MatMulInfoTy &MMI) {
   auto *InputDimsId = isl_map_get_tuple_id(PartialSchedule, isl_dim_in);
   auto *Stmt = static_cast<ScopStmt *>(isl_id_get_user(InputDimsId));
   isl_id_free(InputDimsId);
   if (Stmt->size() <= 1)
     return false;
   for (auto *MemA = Stmt->end() - 1; MemA != Stmt->begin(); MemA--) {
     auto *MemAccessPtr = *MemA;
-    if (!MemAccessPtr->isArrayKind())
+    if (!MemAccessPtr->isLatestArrayKind())
       continue;
     if (!MemAccessPtr->isWrite())
       return false;
-    auto *AccMap = MemAccessPtr->getAccessRelation();
+    auto *AccMap = MemAccessPtr->getLatestAccessRelation();
     if (isl_map_n_basic_map(AccMap) != 1 ||
         !isMatMulOperandAcc(AccMap, MMI.i, MMI.j)) {
       isl_map_free(AccMap);
       return false;
     }
     isl_map_free(AccMap);
     MMI.WriteToC = MemAccessPtr;
     break;
@@ 276 lines not shown @@ static __isl_give isl_schedule_node *optimizeDataLayoutMatrMulPattern(
   auto *AccRel = getMatMulAccRel(isl_map_copy(MapOldIndVar), 3, 7);
   unsigned FirstDimSize = MacroParams.Nc / MicroParams.Nr;
   unsigned SecondDimSize = MacroParams.Kc;
   unsigned ThirdDimSize = MicroParams.Nr;
   auto *SAI = Stmt->getParent()->createScopArrayInfo(
       MMI.B->getElementType(), "Packed_B",
       {FirstDimSize, SecondDimSize, ThirdDimSize});
   AccRel = isl_map_set_tuple_id(AccRel, isl_dim_out, SAI->getBasePtrId());
-  auto *OldAcc = MMI.B->getAccessRelation();
+  auto *OldAcc = MMI.B->getLatestAccessRelation();
   MMI.B->setNewAccessRelation(AccRel);
   auto *ExtMap =
       isl_map_project_out(isl_map_copy(MapOldIndVar), isl_dim_out, 2,
                           isl_map_dim(MapOldIndVar, isl_dim_out) - 2);
   ExtMap = isl_map_reverse(ExtMap);
   ExtMap = isl_map_fix_si(ExtMap, isl_dim_out, MMI.i, 0);
   auto *Domain = Stmt->getDomain();

   // Restrict the domains of the copy statements to only execute when also its
   // originating statement is executed.
   auto *DomainId = isl_set_get_tuple_id(Domain);
   auto *NewStmt = Stmt->getParent()->addScopStmt(
-      OldAcc, MMI.B->getAccessRelation(), isl_set_copy(Domain));
+      OldAcc, MMI.B->getLatestAccessRelation(), isl_set_copy(Domain));
   ExtMap = isl_map_set_tuple_id(ExtMap, isl_dim_out, isl_id_copy(DomainId));
   ExtMap = isl_map_intersect_range(ExtMap, isl_set_copy(Domain));
   ExtMap = isl_map_set_tuple_id(ExtMap, isl_dim_out, NewStmt->getDomainId());
   Node = createExtensionNode(Node, ExtMap);

   // Create a copy statement that corresponds to the memory access
   // to the matrix A, the first operand of the matrix multiplication.
   Node = isl_schedule_node_child(Node, 0);
   AccRel = getMatMulAccRel(isl_map_copy(MapOldIndVar), 4, 6);
   FirstDimSize = MacroParams.Mc / MicroParams.Mr;
   ThirdDimSize = MicroParams.Mr;
   SAI = Stmt->getParent()->createScopArrayInfo(
       MMI.A->getElementType(), "Packed_A",
       {FirstDimSize, SecondDimSize, ThirdDimSize});
   AccRel = isl_map_set_tuple_id(AccRel, isl_dim_out, SAI->getBasePtrId());
-  OldAcc = MMI.A->getAccessRelation();
+  OldAcc = MMI.A->getLatestAccessRelation();
   MMI.A->setNewAccessRelation(AccRel);
   ExtMap = isl_map_project_out(MapOldIndVar, isl_dim_out, 3,
                                isl_map_dim(MapOldIndVar, isl_dim_out) - 3);
   ExtMap = isl_map_reverse(ExtMap);
   ExtMap = isl_map_fix_si(ExtMap, isl_dim_out, MMI.j, 0);
-  NewStmt = Stmt->getParent()->addScopStmt(OldAcc, MMI.A->getAccessRelation(),
-                                           isl_set_copy(Domain));
+  NewStmt = Stmt->getParent()->addScopStmt(
+      OldAcc, MMI.A->getLatestAccessRelation(), isl_set_copy(Domain));

   // Restrict the domains of the copy statements to only execute when also its
   // originating statement is executed.
   ExtMap = isl_map_set_tuple_id(ExtMap, isl_dim_out, DomainId);
   ExtMap = isl_map_intersect_range(ExtMap, Domain);
   ExtMap = isl_map_set_tuple_id(ExtMap, isl_dim_out, NewStmt->getDomainId());
   Node = createExtensionNode(Node, ExtMap);
   Node = isl_schedule_node_child(isl_schedule_node_child(Node, 0), 0);
@@ 460 lines not shown @@

polly/trunk/test/ScheduleOptimizer/kernel_gemm___%for.cond1.preheader---%for.end18.jscop.transformed

{
  "arrays" : [
    {
      "name" : "MemRef_B",
      "sizes" : [ "*", "1024" ],
      "type" : "double"
    },
    {
      "name" : "MemRef_C",
      "sizes" : [ "*", "1024" ],
      "type" : "double"
    },
    {
      "name" : "New_MemRef_A",
      "sizes" : [ "1024", "1024" ],
      "type" : "double"
    }
  ],
  "context" : "{ : }",
  "name" : "%for.cond1.preheader---%for.end18",
  "statements" : [
    {
      "accesses" : [
        {
          "kind" : "read",
          "relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_B[i2, i1] }"
        },
        {
          "kind" : "read",
          "relation" : "{ Stmt_for_body6[i0, i1, i2] -> New_MemRef_A[i0, i2] }"
        },
        {
          "kind" : "read",
          "relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_C[i0, i1] }"
        },
        {
          "kind" : "write",
          "relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_C[i0, i1] }"
        }
      ],
      "domain" : "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and 0 <= i2 <= 1023 }",
      "name" : "Stmt_for_body6",
      "schedule" : "{ Stmt_for_body6[i0, i1, i2] -> [i0, i1, i2] }"
    }
  ]
}

polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_11.ll

; RUN: opt %loadPolly -polly-import-jscop \
; RUN: -polly-import-jscop-postfix=transformed -polly -polly-delicm \
; RUN: -polly-delicm-overapproximate-writes -polly-pattern-matching-based-opts \
; RUN: -polly-opt-isl -debug < %s 2>&1 | FileCheck %s
;
; Check that the pattern matching detects the matrix multiplication pattern
; in case scalar memory accesses were replaced by accesses to newly created
; arrays.
;
; CHECK: The matrix multiplication pattern was detected
;
define void @kernel_gemm(i32 %ni, i32 %nj, i32 %nk, double %A, [1024 x double]* %B, [1024 x double]* %C) {
entry:
  br label %entry.split

entry.split: ; preds = %entry
  br label %for.cond1.preheader

for.cond1.preheader: ; preds = %for.inc16, %entry.split
  %indvars.iv35 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next36, %for.inc16 ]
  br label %for.cond4.preheader

for.cond4.preheader: ; preds = %for.inc13, %for.cond1.preheader
  %indvars.iv32 = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next33, %for.inc13 ]
  br label %for.body6

for.body6: ; preds = %for.body6, %for.cond4.preheader
  %indvars.iv = phi i64 [ 0, %for.cond4.preheader ], [ %indvars.iv.next, %for.body6 ]
  %arrayidx8 = getelementptr inbounds [1024 x double], [1024 x double]* %B, i64 %indvars.iv, i64 %indvars.iv32
  %tmp = load double, double* %arrayidx8, align 8
  %mul = fmul double %tmp, %A
  %arrayidx12 = getelementptr inbounds [1024 x double], [1024 x double]* %C, i64 %indvars.iv35, i64 %indvars.iv32
  %tmp1 = load double, double* %arrayidx12, align 8
  %add = fadd double %tmp1, %mul
  store double %add, double* %arrayidx12, align 8
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, 1024
  br i1 %exitcond, label %for.body6, label %for.inc13

for.inc13: ; preds = %for.body6
  %indvars.iv.next33 = add nuw nsw i64 %indvars.iv32, 1
  %exitcond34 = icmp ne i64 %indvars.iv.next33, 1024
  br i1 %exitcond34, label %for.cond4.preheader, label %for.inc16

for.inc16: ; preds = %for.inc13
  %indvars.iv.next36 = add nuw nsw i64 %indvars.iv35, 1
  %exitcond37 = icmp ne i64 %indvars.iv.next36, 1024
  br i1 %exitcond37, label %for.cond1.preheader, label %for.end18

for.end18: ; preds = %for.inc16
  ret void
}
