This is an archive of the discontinued LLVM Phabricator instance.

[Polly][WIP] Make the pattern matching work with modified memory accesses
ClosedPublic

Authored by gareevroman on May 12 2017, 10:10 AM.

Details

Summary

Some optimizations (e.g., DeLICM) can modify memory accesses (e.g., change their MemoryKind). Consequently, the pattern matching should take it into the account.

Diff Detail

Repository
rL LLVM

Event Timeline

gareevroman created this revision.May 12 2017, 10:10 AM
gareevroman added inline comments.
lib/Transform/ScheduleOptimizer.cpp
676 ↗(On Diff #98793)

Maybe there is a better way to examine the corresponding non-partial memory accesses.

test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
1 ↗(On Diff #98793)

The test case should be simplified. Also, in case the order of memory accesses is not preserved by DeLICM, it won’t be passed

Meinersbur edited edge metadata.May 12 2017, 2:37 PM

The isLatestArrayKind() part is clear, but what does the isMatMulOperandAcc part do?

test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
2 ↗(On Diff #98793)

-polly-delicm-overapproximate-writes, but no -polly-delicm in the command line?

The isLatestArrayKind() part is clear, but what does the isMatMulOperandAcc part do?

AFAIU, DeLICM can produce partial memory accesses (e.g., [Bsize2, Asize2, Asize1, tmp18_pre, tmp11] -> { Stmt_KernelStmt[i0, i1, i2] -> MemRef_tmp12[i0, i1] : 0 <= i0 < Asize1 and 0 <= i1 < Bsize2 and 0 <= i2 < Asize2 };). Consequently, the corresponding isl maps will contain additional constrains (e.g., 0 <= i0 < Asize1 and 0 <= i1 < Bsize2 and 0 <= i2 < Asize2) along with o0 = i0 and o1 = i1 that we check using isMatMulOperandAcc. This part drops the additional constrains since we aren't interested in checking them.

I'm not sure whether the corresponding copy statements handle partial memory accesses correctly (I'll try to check it soon). However, the patch should already help to test the detection.

test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
2 ↗(On Diff #98793)

Thanks. It should be added.

grosser edited edge metadata.May 13 2017, 3:08 AM

Hi Roman,

I just tried to run the following command on polybench 3.2 and could not get the pattern based optimizations to work:

clang linear-algebra/kernels/gemm/gemm.c -O3 -DPOLYBENCH_TIME -I utilities/ -mllvm -polly -mllvm -polly-tiling=true -mllvm -polly-position=before-vectorizer -mllvm -polly-enable-delicm -mllvm -debug-only=polly-delicm -mllvm -polly-delicm-overapproximate-writes -mllvm -debug-only=polly-ast -mllvm -polly=true -fno-vectorize -fno-inline -mllvm -polly-enable-simplify utilities/polybench.c

using

git-svn-id: https://llvm.org/svn/llvm-project/polly/trunk@302926 91177308-0d34-0410-b5e6-96231b3b80d8

as well as:

[Polly][WIP] Make the pattern matching work with modified memory accesses
Differential Revision: https://reviews.llvm.org/D33138

[Polly][Simplify] Remove writes that are overwritten.
Differential Revision: https://reviews.llvm.org/D33142

Any idea what might be missing?

Best,
Tobias

lib/Transform/ScheduleOptimizer.cpp
676 ↗(On Diff #98793)

Instead of inspecting the constraints, we could build a model of how the memory access should look like and then compare the memory access map with this model.

Hi Roman,

what is missing here?

Update the revision.

gareevroman marked 3 inline comments as done.Jun 18 2017, 7:26 AM

Hi Roman,

I just tried to run the following command on polybench 3.2 and could not get the pattern based optimizations to work:

clang linear-algebra/kernels/gemm/gemm.c -O3 -DPOLYBENCH_TIME -I utilities/ -mllvm -polly -mllvm -polly-tiling=true -mllvm -polly-position=before-vectorizer -mllvm -polly-enable-delicm -mllvm -debug-only=polly-delicm -mllvm -polly-delicm-overapproximate-writes -mllvm -debug-only=polly-ast -mllvm -polly=true -fno-vectorize -fno-inline -mllvm -polly-enable-simplify utilities/polybench.c

using

git-svn-id: https://llvm.org/svn/llvm-project/polly/trunk@302926 91177308-0d34-0410-b5e6-96231b3b80d8

as well as:

[Polly][WIP] Make the pattern matching work with modified memory accesses
Differential Revision: https://reviews.llvm.org/D33138

[Polly][Simplify] Remove writes that are overwritten.
Differential Revision: https://reviews.llvm.org/D33142

Any idea what might be missing?

Best,
Tobias

Hi Tobias,

The only problem I see here is the order of memory access. For some reason, the access to the matrix B is last one. The pattern matching should detect it, if the write access is the last memory access.

Could you try to run it without -polly-enable-simplify? It helps in my case.

lib/Transform/ScheduleOptimizer.cpp
676 ↗(On Diff #98793)

AFAIU, in case the FirstPos and the SecondPos are unknown (e.g., containsMatrMult), we would have to build and compare a separate model for every pair of unknown dimensions (e.g. S(i0, i1, k)->M(i0, i1) and S(i0, i1, k)->M(i1, i0) in case the loop nest contains three loops and P(n - 1, 2), where n is the number of dimensions of the loop nest in general case). I propose to leave that discussion for know.

767 ↗(On Diff #102968)

Domains of write and read accesses modified by DeLICM can be different. For example, write and read accesses to the matrix C from the attached test case are, respectively:

MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
  { Stmt_for_body6[i0, i1, i2] -> MemRef_C[i0, i1] };

ReadAccess := [Reduction Type: NONE] [Scalar: 1]
  { Stmt_for_body6[i0, i1, i2] -> MemRef1__phi[] };
new: { Stmt_for_body6[i0, i1, i2] -> MemRef_C[i0, i1] : 0 <= i0 <= 1023 and 0 <= i1 <=   1023 and 0 <= i2 <= 1023 };

Hi Roman,

let's really try to get this in. I have a couple of comments, but nothing major.

PS: Can you try to upload your patches with full context?

lib/Transform/ScheduleOptimizer.cpp
678 ↗(On Diff #102968)

Dropping constraints just like this is not a good idea, as the result very much depends on the representation of the map. Are you sure this is save in all situations.

Also, this looks as if this is just a workaround for what is achieved with:

https://reviews.llvm.org/D35237

If I revert all these changes your test case still passes! Should we consequently not get D35237 in and remove these changes?

767 ↗(On Diff #102968)

I think we should check that the access relations are equal and rely on D35237 to remove the problematic constraints. Would that work?

Also, I really do not understand the MemAccessPtr != MMI.WriteToC part. Can you briefly explain why the old approach does not work any more?

test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
4 ↗(On Diff #102968)

We commonly try to limit the scop of test case as much as possible. Hence, I would suggest to not include DELICM here, but rather to use -polly-export-jscop after DELICM to get a transformation file we want to apply (use -polly-use-llvm-names to get the right names) and to then apply it with -polly-import-jscop.

This will also allow us to already remove the access constraints the same way these constraints will be removed with https://reviews.llvm.org/D35237.

Unfortunately, it seems the JSON import does not allow this at the moment. Would be great to see this fixed: https://bugs.llvm.org/show_bug.cgi?id=33807
But not sure that this should really hold back this patch.

grosser requested changes to this revision.Jul 16 2017, 2:51 AM
This revision now requires changes to proceed.Jul 16 2017, 2:51 AM
gareevroman edited edge metadata.
gareevroman edited the summary of this revision. (Show Details)

PS: Can you try to upload your patches with full context?

Sure. Thanks. Just missed that one.

Dropping constraints just like this is not a good idea, as the result very much depends on the representation of the map. Are you sure this is save in all situations.

Also, this looks as if this is just a workaround for what is achieved with:

https://reviews.llvm.org/D35237

If I revert all these changes your test case still passes! Should we consequently not get D35237 in and remove these changes?

Yes, it was a temporary workaround.

I've tried to simplify the revision and make it independent.

grosser accepted this revision.Jul 19 2017, 9:33 AM

Nice!

This revision is now accepted and ready to land.Jul 19 2017, 9:33 AM
This revision was automatically updated to reflect the committed changes.
Meinersbur added inline comments.Jul 19 2017, 1:15 PM
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
1–4

This line pipes binary data to FileCheck. You can use -disable-output to avoid this.

If using -debug, you also need a REQUIRES: asserts line. In non-assert builds, there is no -debug switch.

I would very much like if we do not -debug in regression tests at all, because of the problems above (and as the name implies, it's meant for debugging; It makes adding additional debug-output difficult, some test may fail because of it). Could we find another way to detect if a matrix-multiplication has been detected. We could use

  1. -analyze
  2. -pass-remarks-analysis (potentially with YAML output)
  3. Check for some characteristic output in the IR after codegen.
  4. Check -stats (e.g. Number of detected matmul patterns)
gareevroman added inline comments.Jul 22 2017, 5:13 AM
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
1–4

I like the fourth option. I can try to implement it, if there are no objections.

Meinersbur added inline comments.Jul 22 2017, 9:50 AM
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_11.ll
1–4

That would be great.