Some optimizations (e.g., DeLICM) can modify memory accesses (e.g., change their MemoryKind). Consequently, the pattern matching should take it into the account.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Transform/ScheduleOptimizer.cpp | ||
---|---|---|
676 ↗ | (On Diff #98793) | Maybe there is a better way to examine the corresponding non-partial memory accesses. |
test/ScheduleOptimizer/pattern-matching-based-opts_11.ll | ||
1 ↗ | (On Diff #98793) | The test case should be simplified. Also, in case the order of memory accesses is not preserved by DeLICM, it won’t be passed |
The isLatestArrayKind() part is clear, but what does the isMatMulOperandAcc part do?
test/ScheduleOptimizer/pattern-matching-based-opts_11.ll | ||
---|---|---|
2 ↗ | (On Diff #98793) | -polly-delicm-overapproximate-writes, but no -polly-delicm in the command line? |
AFAIU, DeLICM can produce partial memory accesses (e.g., [Bsize2, Asize2, Asize1, tmp18_pre, tmp11] -> { Stmt_KernelStmt[i0, i1, i2] -> MemRef_tmp12[i0, i1] : 0 <= i0 < Asize1 and 0 <= i1 < Bsize2 and 0 <= i2 < Asize2 };). Consequently, the corresponding isl maps will contain additional constrains (e.g., 0 <= i0 < Asize1 and 0 <= i1 < Bsize2 and 0 <= i2 < Asize2) along with o0 = i0 and o1 = i1 that we check using isMatMulOperandAcc. This part drops the additional constrains since we aren't interested in checking them.
I'm not sure whether the corresponding copy statements handle partial memory accesses correctly (I'll try to check it soon). However, the patch should already help to test the detection.
test/ScheduleOptimizer/pattern-matching-based-opts_11.ll | ||
---|---|---|
2 ↗ | (On Diff #98793) | Thanks. It should be added. |
Hi Roman,
I just tried to run the following command on polybench 3.2 and could not get the pattern based optimizations to work:
clang linear-algebra/kernels/gemm/gemm.c -O3 -DPOLYBENCH_TIME -I utilities/ -mllvm -polly -mllvm -polly-tiling=true -mllvm -polly-position=before-vectorizer -mllvm -polly-enable-delicm -mllvm -debug-only=polly-delicm -mllvm -polly-delicm-overapproximate-writes -mllvm -debug-only=polly-ast -mllvm -polly=true -fno-vectorize -fno-inline -mllvm -polly-enable-simplify utilities/polybench.c
using
git-svn-id: https://llvm.org/svn/llvm-project/polly/trunk@302926 91177308-0d34-0410-b5e6-96231b3b80d8
as well as:
[Polly][WIP] Make the pattern matching work with modified memory accesses Differential Revision: https://reviews.llvm.org/D33138 [Polly][Simplify] Remove writes that are overwritten. Differential Revision: https://reviews.llvm.org/D33142
Any idea what might be missing?
Best,
Tobias
lib/Transform/ScheduleOptimizer.cpp | ||
---|---|---|
676 ↗ | (On Diff #98793) | Instead of inspecting the constraints, we could build a model of how the memory access should look like and then compare the memory access map with this model. |
Hi Tobias,
The only problem I see here is the order of memory access. For some reason, the access to the matrix B is last one. The pattern matching should detect it, if the write access is the last memory access.
Could you try to run it without -polly-enable-simplify? It helps in my case.
lib/Transform/ScheduleOptimizer.cpp | ||
---|---|---|
676 ↗ | (On Diff #98793) | AFAIU, in case the FirstPos and the SecondPos are unknown (e.g., containsMatrMult), we would have to build and compare a separate model for every pair of unknown dimensions (e.g. S(i0, i1, k)->M(i0, i1) and S(i0, i1, k)->M(i1, i0) in case the loop nest contains three loops and P(n - 1, 2), where n is the number of dimensions of the loop nest in general case). I propose to leave that discussion for know. |
767 ↗ | (On Diff #102968) | Domains of write and read accesses modified by DeLICM can be different. For example, write and read accesses to the matrix C from the attached test case are, respectively: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0] { Stmt_for_body6[i0, i1, i2] -> MemRef_C[i0, i1] }; ReadAccess := [Reduction Type: NONE] [Scalar: 1] { Stmt_for_body6[i0, i1, i2] -> MemRef1__phi[] }; new: { Stmt_for_body6[i0, i1, i2] -> MemRef_C[i0, i1] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and 0 <= i2 <= 1023 }; |
Hi Roman,
let's really try to get this in. I have a couple of comments, but nothing major.
PS: Can you try to upload your patches with full context?
lib/Transform/ScheduleOptimizer.cpp | ||
---|---|---|
678 ↗ | (On Diff #102968) | Dropping constraints just like this is not a good idea, as the result very much depends on the representation of the map. Are you sure this is save in all situations. Also, this looks as if this is just a workaround for what is achieved with: https://reviews.llvm.org/D35237 If I revert all these changes your test case still passes! Should we consequently not get D35237 in and remove these changes? |
767 ↗ | (On Diff #102968) | I think we should check that the access relations are equal and rely on D35237 to remove the problematic constraints. Would that work? Also, I really do not understand the MemAccessPtr != MMI.WriteToC part. Can you briefly explain why the old approach does not work any more? |
test/ScheduleOptimizer/pattern-matching-based-opts_11.ll | ||
4 ↗ | (On Diff #102968) | We commonly try to limit the scop of test case as much as possible. Hence, I would suggest to not include DELICM here, but rather to use -polly-export-jscop after DELICM to get a transformation file we want to apply (use -polly-use-llvm-names to get the right names) and to then apply it with -polly-import-jscop. This will also allow us to already remove the access constraints the same way these constraints will be removed with https://reviews.llvm.org/D35237. Unfortunately, it seems the JSON import does not allow this at the moment. Would be great to see this fixed: https://bugs.llvm.org/show_bug.cgi?id=33807 |
PS: Can you try to upload your patches with full context?
Sure. Thanks. Just missed that one.
Dropping constraints just like this is not a good idea, as the result very much depends on the representation of the map. Are you sure this is save in all situations.
Also, this looks as if this is just a workaround for what is achieved with:
If I revert all these changes your test case still passes! Should we consequently not get D35237 in and remove these changes?
Yes, it was a temporary workaround.
I've tried to simplify the revision and make it independent.
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_11.ll | ||
---|---|---|
1–4 | This line pipes binary data to FileCheck. You can use -disable-output to avoid this. If using -debug, you also need a REQUIRES: asserts line. In non-assert builds, there is no -debug switch. I would very much like if we do not -debug in regression tests at all, because of the problems above (and as the name implies, it's meant for debugging; It makes adding additional debug-output difficult, some test may fail because of it). Could we find another way to detect if a matrix-multiplication has been detected. We could use
|
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_11.ll | ||
---|---|---|
1–4 | I like the fourth option. I can try to implement it, if there are no objections. |
polly/trunk/test/ScheduleOptimizer/pattern-matching-based-opts_11.ll | ||
---|---|---|
1–4 | That would be great. |
This line pipes binary data to FileCheck. You can use -disable-output to avoid this.
If using -debug, you also need a REQUIRES: asserts line. In non-assert builds, there is no -debug switch.
I would very much like if we do not -debug in regression tests at all, because of the problems above (and as the name implies, it's meant for debugging; It makes adding additional debug-output difficult, some test may fail because of it). Could we find another way to detect if a matrix-multiplication has been detected. We could use