[Polly][MatMul] Make MatMul detection independent of internal isl representations.
ClosedPublic

Authored by annanay25 on Aug 8 2017, 5:06 AM.

Details

Summary

The pattern recognition for MatMul is restrictive.

The number of "disjuncts" in the isl_map containing constraint information was previously required to be 1 (as per isl_coalesce - which should ideally produce a domain map with a single disjunct, but does not under some circumstances).

This was changed and made more flexible.

Diff Detail

Repository
rL LLVM
annanay25 created this revision.Aug 8 2017, 5:06 AM

Nice. Could you possibly add a test case?

lib/Transform/ScheduleOptimizer.cpp
650 ↗(On Diff #110173)

Can you pull the MemAccess->getStatement()->getDomain() out?

Meinersbur added inline comments.Aug 8 2017, 7:00 AM
lib/Transform/ScheduleOptimizer.cpp
570–576 ↗(On Diff #110173)

[Suggestion] I think we don't need that check that there are exactly 2 constraints per basic_map anymore.

584–607 ↗(On Diff #110173)

[Suggestion] I think it is possible to make a loop for this:

int FirstDims[] = {0,0,1,1,2,2};
int SecondDims[] = {1,2,2,0,0,1};
for (int i = 0; i <= 6; i+=1)
    PossibleMatMul[i] = Universe.equate(isl::dim::in, FirstDims[i], isl::dim::out, 0).equate(isl::dim::in, SecondDims[i], isl::dim::out, 1);
609 ↗(On Diff #110173)

[Style] remove commented code.

The pattern recognition for MatMul is restrictive.

The number of "disjuncts" in the isl_map containing constraint information was previously required to be 1 (as per isl_coalesce - which should ideally produce a domain map with a single disjunct, but does not under some circumstances).

Could you please show an example of such an isl_map?

lib/Transform/ScheduleOptimizer.cpp
604 ↗(On Diff #110173)

Could we just drop the constraints not involving the output dimensions:

auto *NewAccMap = isl_map_drop_constraints_not_involving_dims(

isl_map_copy(AccMap), isl_dim_out, 0, 2);

and investigate the new map, using the old code?

if (NewAccMap.foreach_basic_map(Lambda) != isl::stat::ok || DimInPos[0] < 0 ||

Meinersbur added inline comments.Aug 8 2017, 8:14 AM
lib/Transform/ScheduleOptimizer.cpp
604 ↗(On Diff #110173)

This is not about constraints of uninvolved dims.

isl does not have unique representation for representing the same sets. An example are the sets

{ Stmt[0] -> MemRef[0]; Stmt[1] -> MemRef[1]; Stmt[2] -> MemRef[2] }
{ Stmt[i0] -> MemRef[i0] : 0 <= i0 < 3 }

which both represent the same logical set. There is isl_map_coalesce which unfortunately does not guaranttee to get to any cannonical form.

It is more robust to use operations that operate on logical sets instead to investigate the internal structure.

Nice. Could you possibly add a test case?

Yes, will do. Thanks!

lib/Transform/ScheduleOptimizer.cpp
584–607 ↗(On Diff #110173)

Aha! Thanks! Thought I could individually comment the values of FirstPos and SecondPos for each domain.

604 ↗(On Diff #110173)

Thanks for this discussion. I would also point to my post on the isl-development group for more information (and an example) on this.

Link: https://groups.google.com/forum/#!topic/isl-development/iURXrsSVC28

609 ↗(On Diff #110173)

Oops.

650 ↗(On Diff #110173)

Will do.

annanay25 added inline comments.Aug 8 2017, 10:22 AM
lib/Transform/ScheduleOptimizer.cpp
584–607 ↗(On Diff #110173)

@Meinersbur , should I change it to the loop according to your suggestion? Or should I leave the comments behind for readability?

gareevroman added inline comments.Aug 8 2017, 10:28 AM
lib/Transform/ScheduleOptimizer.cpp
604 ↗(On Diff #110173)

This is not about constraints of uninvolved dims.

isl does not have unique representation for representing the same sets. An example are the sets

{ Stmt[0] -> MemRef[0]; Stmt[1] -> MemRef[1]; Stmt[2] -> MemRef[2] }
{ Stmt[i0] -> MemRef[i0] : 0 <= i0 < 3 }

which both represent the same logical set. There is isl_map_coalesce which unfortunately does not guaranttee to get to any cannonical form.

Right. Dropping the constraints would work only if Stmt[0] -> MemRef[0]; is represented as Stmt[i0] -> MemRef[o0] : o0 = i0 and i0 = 1

P.S.: Btw, I'm not sure that it's safe to use optimizeDataLayoutMatrMulPattern in case A and B have constraints, since we build the corresponding AccessRelation from scratch and don't take old constraints into the account. Furthermore, probably, it'd require to modify the isolation algorithm.

Meinersbur added inline comments.Aug 8 2017, 10:40 AM
lib/Transform/ScheduleOptimizer.cpp
584–607 ↗(On Diff #110173)

This is up to you, I am ok with both versions. The more important is to have a test case such that we can commit this.

For the long term, I think we should avoid code duplication and make single loop for this and the following code:

// Permutations of the three matmul loops: 3! = 6 possibilities.
int FirstDims[] = {0,0,1,1,2,2};
int SecondDims[] = {1,2,2,0,0,1};

for (int i = 0; i <= 6; i+=1) {
    auto PossibleMatMul = Universe.equate(isl::dim::in, FirstDims[i], isl::dim::out, 0).equate(isl::dim::in, SecondDims[i], isl::dim::out, 1);
    if (computeMatchedDimensions(AccMap, PossibleMatMul, FirstDims[i], SecondDims[i], FirstPos, SecondPos)
      return true;
}
return false;

Some minor speed-up might be possible:

int FirstDims[] = {0,0,1,1,2,2};
int SecondDims[] = {1,2,2,0,0,1};
for (int i = 0; i <= 6; i+=1) {
    if (FirstPos >= 0 && FirstPos != FirstDims[i])
      continue;
    if (SecondPos >= 0 && SecondPos != SecondDims[i])
      continue;

    auto PossibleMatMul = Universe.equate(isl::dim::in, FirstDims[i], isl::dim::out, 0).equate(isl::dim::in, SecondDims[i], isl::dim::out, 1);
    if (computeMatchedDimensions(AccMap, PossibleMatMul, FirstDims[i], SecondDims[i], FirstPos, SecondPos)
      return true;
}
return false;
annanay25 added inline comments.Aug 9 2017, 2:50 AM
lib/Transform/ScheduleOptimizer.cpp
570–576 ↗(On Diff #110173)

@Meinersbur ;
IMO it is necessary to ensure that there are exactly 2 out-dims. In the sense that, the access is of the form -

Stmt[i0, i1, i2] -> MemRef[o0, o1]

and should have two constraints, one each for o0 and o1. Ex:
o0 = i0 and o1 = i1

604 ↗(On Diff #110173)

I am not familiar with the usage of optimizeDataLayoutMatrMulPattern function. If you think it is a cleaner way of doing this I will look into it, thanks!

Meinersbur added inline comments.Aug 9 2017, 9:34 AM
lib/Transform/ScheduleOptimizer.cpp
570–576 ↗(On Diff #110173)

isl_map_is_equal will test whether "o0 = i0" and "o1 = i1" are true.

Having exactly two constraints is neither a sufficient nor necessary condition. E.g.
{ Stmt[i0, i1, i2] -> MemRef[o0, o1] : o0 = i0 and i1 = 5 and o1= 5 }
fulfills the condition, and
[p0] -> { Stmt[i0, i1, i2] -> MemRef[o0, o1] : o0 = i0 and i1 = p0 }
does not.

annanay25 added inline comments.Aug 9 2017, 9:47 AM
lib/Transform/ScheduleOptimizer.cpp
570–576 ↗(On Diff #110173)

Yes, okay I agree.

When I remove this check and run check-polly , I get a failing test case (polly/test/ScheduleOptimizer/2012-04-16-Trivially-vectorizable-loops.ll) with AccMap as follows -

{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[o0] : o0 = 1536i0 + i1 }

This is because of the Universe.equate(isl::dim::in, FirstDims[i], isl::dim::out, 0).equate(isl::dim::in, SecondDims[i], isl::dim::out, 1).

@Meinersbur ;

So the reason I haven't been able to come up with a test case is

(lldb) po isl_map_dump(AccMap.ptr)                                                                                                                                                
{ Stmt_loop_body_reduction[i0, i1, i2] -> MemRef_retval[o0, o1] : (i2 = 783 and o0 = i0 and o1 = i1 and i1 <= 1 - i0 and i0 >= 0 and i0 <= 1 and i1 >= 0 and i1 <= 1) or (i0 = 1 and i1 = 1 and i2 = 783 and o0 = 1 and o1 = 1) }
(lldb) po isl_map_dump(PossibleMatMul.ptr)                                                                                                                                           
{ Stmt_loop_body_reduction[i0, i1, i2] -> MemRef_retval[o0, o1] : o0 = i0 and o1 = i1 and i0 >= 0 and i0 <= 1 and i1 >= 0 and i1 <= 1 and i2 >= 0 and i2 <= 783 }

These two maps are not detected as equal - AccMap.is_equal(PossibleMatMul) is false.

Tried removing extra dims / retaining constraints having only two input dims / etc.

Reason: o0 = 1 and o1 = 1 and i0 = 1 and i1 = 1 != o0 = i0 and o1 = i1.

annanay25 marked 17 inline comments as done.Aug 11 2017, 5:08 AM
annanay25 updated this revision to Diff 110692.Aug 11 2017, 5:11 AM

[todo] Test case.

annanay25 retitled this revision from [Polly][MatMul] Make MatMul detection independent of internal isl representations. to [Polly][WIP][MatMul] Make MatMul detection independent of internal isl representations..Aug 11 2017, 5:11 AM

Can you add a test case using -polly-import=jscop?

lib/Transform/ScheduleOptimizer.cpp
537 ↗(On Diff #110692)

nice!

562 ↗(On Diff #110692)

[Compiler error] RedAccMap is not defined anywhere. Did you mean AccMap?

annanay25 updated this revision to Diff 110868.Aug 13 2017, 4:51 AM
annanay25 marked 2 inline comments as done.
annanay25 retitled this revision from [Polly][WIP][MatMul] Make MatMul detection independent of internal isl representations. to [Polly][MatMul] Make MatMul detection independent of internal isl representations..

Added test case.

Addressed @Meinersbur's comments.

Removed test case dependency on DeLICM.

Meinersbur accepted this revision.Aug 20 2017, 2:29 PM
This revision is now accepted and ready to land.Aug 20 2017, 2:29 PM
This revision was automatically updated to reflect the committed changes.
gareevroman added inline comments.Aug 21 2017, 12:00 AM
lib/Transform/ScheduleOptimizer.cpp
604 ↗(On Diff #110173)

Sorry, for some reason, I haven't received this comment. However, optimizeDataLayoutMatrMulPattern is used by Polly. I'll try to check and, probably, update optimizeDataLayoutMatrMulPattern, when I have time.

P.S.: Btw, I'm not sure that it's safe to use optimizeDataLayoutMatrMulPattern in case A and B have constraints, since we build the corresponding AccessRelation from scratch and don't take old constraints into the account. Furthermore, probably, it'd require to modify the isolation algorithm.

@Meinersbur are you sure that it's always safe to use this detection?

Meinersbur added inline comments.Aug 21 2017, 6:00 AM
lib/Transform/ScheduleOptimizer.cpp
604 ↗(On Diff #110173)

Can you elaborate on your concerns? The check should do the same as before, except not requiring inspection of the isl_map's internal representation, thus also accepting internal representations that are equivalent.

If you mean constraints on parameters or domains, then we'd have the same problems as before, because such constraints are gist'ed away for the access relation.