The current identification of a SCoP statement that implement a matrix multiplication does not help to identify different permutations of loops that contain it and check for dependencies, which can prevent it from being optimized. It also requires external determination of the operands of the matrix multiplication. This patch contains the implementation of a new algorithm that helps to avoid these issues.
Details
- Reviewers
Meinersbur grosser jdoerfert - Commits
- rG98075fe18152: A new algorithm for identification of a SCoP statement that implement a matrix…
rPLO293890: A new algorithm for identification of a SCoP statement that implement a matrix
rL293890: A new algorithm for identification of a SCoP statement that implement a matrix
Diff Detail
Event Timeline
I've found out that test/ScheduleOptimizer/pattern-matching-based-opts.ll, test/ScheduleOptimizer/pattern-matching-based-opts_2.ll, test/ScheduleOptimizer/pattern-matching-based-opts_3.ll contain code that produced array accesses, which should be delinearized to be identified by the algorithm. Since, as far as I understand, they are delinearized in common case, these test cases were modified. Should we commit it in a separate patch?
I did not review the soundness of the detection algorithm.
include/polly/ScheduleOptimizer.h | ||
---|---|---|
24–70 | Consider moving these into the polly namespace to not pollute the global namespace in file that include this header. This should also make the polly:: part unnecessary. | |
lib/Transform/ScheduleOptimizer.cpp | ||
497 | Consider making DimInPos an unsigned or i an int. I'd prefer the latter. | |
512 | This is the same \brief as for isMatMulOperandConstraint. Can you make them more specific? | |
526 | Please consider making i an unsigned or store the result of isl_local_space_dim in a temporary int. | |
568 | As a general rule, please consider using int (or another signed type if you need more bits) for loop induction variables. This allows the compiler to do more optimizations, static analyzers warn you that a looparound could occur nad ubsan to warn you if a looparound does oocur. | |
618 | Map = isl_map_set_tuple_id(Map, DimType, DimId); | |
650 | Please add braces | |
651–655 | Maybe store (*MemA) in a variable? | |
669–672 | k must also be at position Pos. | |
676 | I don't understand this description of Pos | |
694 | Using isl_set_plain_get_val_if_fixed makes the function a best-effort algorithm. isl_set_plain_get_val_if_fixed can return NULL and I don't see code handling this case. | |
723–724 | This could use foreachEltWithBreak and with isMatMulOperandBasicMap made a lambda. | |
731 | Consider using something more specific than check. | |
745 | Consider storing (*MemA) to a variable. | |
781 | Not sure whether using a unicode symbol (→) in source code is a good idea | |
782–784 |
Did you mean There are no ... except ... ? | |
811–817 | The last access could also be an arbitrary scalar access. Not sure whether on relying on the access' order is such a good idea. Could you use that last non-scalar access instead? These should be in-order. | |
1115–1116 | Consider declaring DimOutNum as int. |
Hi Roman,
here some more comments.
lib/Transform/ScheduleOptimizer.cpp | ||
---|---|---|
626 | doesn -> does | |
671 | correspondS to the dependenCY produced by | |
779 | relations: MA2, MA3, and MD4 represent | |
test/ScheduleOptimizer/pattern-matching-based-opts.ll | ||
65 | I don't think these attributes are needed, are they? | |
test/ScheduleOptimizer/pattern-matching-based-opts_2.ll | ||
67 | Drop these. |
Thanks for comments! I've updated the patch according to them.
P.S.: I've also tried to fix possible issues related to usage of ScheduleTreeOptimizer::createMacroKernel with band nodes that have more than three dimensions.
This is the same \brief as for isMatMulOperandConstraint. Can you make them more specific?
Right. Probably there is a duplication of code. I've tried to factor out the checking of coefficients of input and output dimensions.
This could use foreachEltWithBreak and with isMatMulOperandBasicMap made a lambda.
Wouldn't this require to capture variable to pass DimInPos and also rewrite isMatMulOperandBasicMap, using IslPtrs?
It seems that it wouldn't make the code more readable. Furthermore, according to my discussion with Tobias, probably it wouldn't good for the style of ScheduleOptimizer.cpp, if some of its functions used raw pointers and others used something else.
The last access could also be an arbitrary scalar access. Not sure whether on relying on the access' order is such a good idea. Could you use that last non-scalar access instead? These should be in-order.
I think we could do it.
It would make capturing FirstPost and SecondPos a lot easier as they don't need to be aggregated into a DimInPos and casted around. However, I agree that we may not want to introduce an IslPtr style here only because of this.
Michael
include/polly/ScheduleOptimizer.h | ||
---|---|---|
75 | I don't know why this is not also in the polly namespace. However, this is an unrelated change. | |
157–161 | multiplication | |
lib/Transform/ScheduleOptimizer.cpp | ||
659 | Another unicode character |
Hi Michael,
Thanks for comments! I've tried to address them, fix issues related to permuteDimensions and also simplify the identification.
It would make capturing FirstPost and SecondPos a lot easier as they don't need to be aggregated into a DimInPos and casted around. However, I agree that we may not want to introduce an IslPtr style here only because of this.
OK. I'll fix it when everybody agrees to introduce an IslPtr style in ScheduleTreeOptimizer.
I don't know why this is not also in the polly namespace. However, this is an unrelated change.
OK. I'll try to fix it soon.
I don't really know how this new algorithm is better than the current one. Unless Tobias has a remark regarding that, I'd trust you and accept the change.
test/ScheduleOptimizer/pattern-matching-based-opts_3.ll | ||
---|---|---|
193 | Probably Tobias meant to drop this one as well. |
Consider moving these into the polly namespace to not pollute the global namespace in file that include this header. This should also make the polly:: part unnecessary.