[Polly][MatMul] Make MatMul detection independent of internal isl representations.
Needs ReviewPublic

Authored by annanay25 on Tue, Aug 8, 5:06 AM.

Details

Summary

The pattern recognition for MatMul is restrictive.

The number of "disjuncts" in the isl_map containing constraint information was previously required to be 1 (as per isl_coalesce - which should ideally produce a domain map with a single disjunct, but does not under some circumstances).

This was changed and made more flexible.

Diff Detail

annanay25 created this revision.Tue, Aug 8, 5:06 AM

Nice. Could you possibly add a test case?

lib/Transform/ScheduleOptimizer.cpp
595

Can you pull the MemAccess->getStatement()->getDomain() out?

Meinersbur added inline comments.Tue, Aug 8, 7:00 AM
lib/Transform/ScheduleOptimizer.cpp
535–541

[Suggestion] I think we don't need that check that there are exactly 2 constraints per basic_map anymore.

547–570

[Suggestion] I think it is possible to make a loop for this:

int FirstDims[] = {0,0,1,1,2,2};
int SecondDims[] = {1,2,2,0,0,1};
for (int i = 0; i <= 6; i+=1)
    PossibleMatMul[i] = Universe.equate(isl::dim::in, FirstDims[i], isl::dim::out, 0).equate(isl::dim::in, SecondDims[i], isl::dim::out, 1);
572

[Style] remove commented code.

The pattern recognition for MatMul is restrictive.

The number of "disjuncts" in the isl_map containing constraint information was previously required to be 1 (as per isl_coalesce - which should ideally produce a domain map with a single disjunct, but does not under some circumstances).

Could you please show an example of such an isl_map?

lib/Transform/ScheduleOptimizer.cpp
606

Could we just drop the constraints not involving the output dimensions:

auto *NewAccMap = isl_map_drop_constraints_not_involving_dims(

isl_map_copy(AccMap), isl_dim_out, 0, 2);

and investigate the new map, using the old code?

if (NewAccMap.foreach_basic_map(Lambda) != isl::stat::ok || DimInPos[0] < 0 ||

Meinersbur added inline comments.Tue, Aug 8, 8:14 AM
lib/Transform/ScheduleOptimizer.cpp
606

This is not about constraints of uninvolved dims.

isl does not have unique representation for representing the same sets. An example are the sets

{ Stmt[0] -> MemRef[0]; Stmt[1] -> MemRef[1]; Stmt[2] -> MemRef[2] }
{ Stmt[i0] -> MemRef[i0] : 0 <= i0 < 3 }

which both represent the same logical set. There is isl_map_coalesce which unfortunately does not guaranttee to get to any cannonical form.

It is more robust to use operations that operate on logical sets instead to investigate the internal structure.

Nice. Could you possibly add a test case?

Yes, will do. Thanks!

lib/Transform/ScheduleOptimizer.cpp
547–570

Aha! Thanks! Thought I could individually comment the values of FirstPos and SecondPos for each domain.

572

Oops.

595

Will do.

606

Thanks for this discussion. I would also point to my post on the isl-development group for more information (and an example) on this.

Link: https://groups.google.com/forum/#!topic/isl-development/iURXrsSVC28

annanay25 added inline comments.Tue, Aug 8, 10:22 AM
lib/Transform/ScheduleOptimizer.cpp
547–570

@Meinersbur , should I change it to the loop according to your suggestion? Or should I leave the comments behind for readability?

gareevroman added inline comments.Tue, Aug 8, 10:28 AM
lib/Transform/ScheduleOptimizer.cpp
606

This is not about constraints of uninvolved dims.

isl does not have unique representation for representing the same sets. An example are the sets

{ Stmt[0] -> MemRef[0]; Stmt[1] -> MemRef[1]; Stmt[2] -> MemRef[2] }
{ Stmt[i0] -> MemRef[i0] : 0 <= i0 < 3 }

which both represent the same logical set. There is isl_map_coalesce which unfortunately does not guaranttee to get to any cannonical form.

Right. Dropping the constraints would work only if Stmt[0] -> MemRef[0]; is represented as Stmt[i0] -> MemRef[o0] : o0 = i0 and i0 = 1

P.S.: Btw, I'm not sure that it's safe to use optimizeDataLayoutMatrMulPattern in case A and B have constraints, since we build the corresponding AccessRelation from scratch and don't take old constraints into the account. Furthermore, probably, it'd require to modify the isolation algorithm.

Meinersbur added inline comments.Tue, Aug 8, 10:40 AM
lib/Transform/ScheduleOptimizer.cpp
547–570

This is up to you, I am ok with both versions. The more important is to have a test case such that we can commit this.

For the long term, I think we should avoid code duplication and make single loop for this and the following code:

// Permutations of the three matmul loops: 3! = 6 possibilities.
int FirstDims[] = {0,0,1,1,2,2};
int SecondDims[] = {1,2,2,0,0,1};

for (int i = 0; i <= 6; i+=1) {
    auto PossibleMatMul = Universe.equate(isl::dim::in, FirstDims[i], isl::dim::out, 0).equate(isl::dim::in, SecondDims[i], isl::dim::out, 1);
    if (computeMatchedDimensions(AccMap, PossibleMatMul, FirstDims[i], SecondDims[i], FirstPos, SecondPos)
      return true;
}
return false;

Some minor speed-up might be possible:

int FirstDims[] = {0,0,1,1,2,2};
int SecondDims[] = {1,2,2,0,0,1};
for (int i = 0; i <= 6; i+=1) {
    if (FirstPos >= 0 && FirstPos != FirstDims[i])
      continue;
    if (SecondPos >= 0 && SecondPos != SecondDims[i])
      continue;

    auto PossibleMatMul = Universe.equate(isl::dim::in, FirstDims[i], isl::dim::out, 0).equate(isl::dim::in, SecondDims[i], isl::dim::out, 1);
    if (computeMatchedDimensions(AccMap, PossibleMatMul, FirstDims[i], SecondDims[i], FirstPos, SecondPos)
      return true;
}
return false;
annanay25 added inline comments.Wed, Aug 9, 2:50 AM
lib/Transform/ScheduleOptimizer.cpp
535–541

@Meinersbur ;
IMO it is necessary to ensure that there are exactly 2 out-dims. In the sense that, the access is of the form -

Stmt[i0, i1, i2] -> MemRef[o0, o1]

and should have two constraints, one each for o0 and o1. Ex:
o0 = i0 and o1 = i1

606

I am not familiar with the usage of optimizeDataLayoutMatrMulPattern function. If you think it is a cleaner way of doing this I will look into it, thanks!

Meinersbur added inline comments.Wed, Aug 9, 9:34 AM
lib/Transform/ScheduleOptimizer.cpp
535–541

isl_map_is_equal will test whether "o0 = i0" and "o1 = i1" are true.

Having exactly two constraints is neither a sufficient nor necessary condition. E.g.
{ Stmt[i0, i1, i2] -> MemRef[o0, o1] : o0 = i0 and i1 = 5 and o1= 5 }
fulfills the condition, and
[p0] -> { Stmt[i0, i1, i2] -> MemRef[o0, o1] : o0 = i0 and i1 = p0 }
does not.

annanay25 added inline comments.Wed, Aug 9, 9:47 AM
lib/Transform/ScheduleOptimizer.cpp
535–541

Yes, okay I agree.

When I remove this check and run check-polly , I get a failing test case (polly/test/ScheduleOptimizer/2012-04-16-Trivially-vectorizable-loops.ll) with AccMap as follows -

{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[o0] : o0 = 1536i0 + i1 }

This is because of the Universe.equate(isl::dim::in, FirstDims[i], isl::dim::out, 0).equate(isl::dim::in, SecondDims[i], isl::dim::out, 1).

@Meinersbur ;

So the reason I haven't been able to come up with a test case is

(lldb) po isl_map_dump(AccMap.ptr)                                                                                                                                                
{ Stmt_loop_body_reduction[i0, i1, i2] -> MemRef_retval[o0, o1] : (i2 = 783 and o0 = i0 and o1 = i1 and i1 <= 1 - i0 and i0 >= 0 and i0 <= 1 and i1 >= 0 and i1 <= 1) or (i0 = 1 and i1 = 1 and i2 = 783 and o0 = 1 and o1 = 1) }
(lldb) po isl_map_dump(PossibleMatMul.ptr)                                                                                                                                           
{ Stmt_loop_body_reduction[i0, i1, i2] -> MemRef_retval[o0, o1] : o0 = i0 and o1 = i1 and i0 >= 0 and i0 <= 1 and i1 >= 0 and i1 <= 1 and i2 >= 0 and i2 <= 783 }

These two maps are not detected as equal - AccMap.is_equal(PossibleMatMul) is false.

Tried removing extra dims / retaining constraints having only two input dims / etc.

Reason: o0 = 1 and o1 = 1 and i0 = 1 and i1 = 1 != o0 = i0 and o1 = i1.

annanay25 marked 17 inline comments as done.Fri, Aug 11, 5:08 AM
annanay25 updated this revision to Diff 110692.Fri, Aug 11, 5:11 AM

[todo] Test case.

annanay25 retitled this revision from [Polly][MatMul] Make MatMul detection independent of internal isl representations. to [Polly][WIP][MatMul] Make MatMul detection independent of internal isl representations..Fri, Aug 11, 5:11 AM

Can you add a test case using -polly-import=jscop?

lib/Transform/ScheduleOptimizer.cpp
539

nice!

564

[Compiler error] RedAccMap is not defined anywhere. Did you mean AccMap?

annanay25 updated this revision to Diff 110868.Sun, Aug 13, 4:51 AM
annanay25 marked 2 inline comments as done.
annanay25 retitled this revision from [Polly][WIP][MatMul] Make MatMul detection independent of internal isl representations. to [Polly][MatMul] Make MatMul detection independent of internal isl representations..

Added test case.

Addressed @Meinersbur's comments.

Removed test case dependency on DeLICM.