The patch optimizes the tiling of parallel loops with static bounds if the number of loop iterations is an integer multiple of the tile size.

# Details

# Diff Detail

- Repository
- rG LLVM Github Monorepo

### Event Timeline

mlir/test/Dialect/SCF/parallel-loop-tiling.mlir | ||
---|---|---|

46–53 | Please avoid using VAL_<running_number> - instead C0, C3, C0_1, etc. for readability. A substitution at this point. |

mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp | ||
---|---|---|

56 | Why do you dislike the | |

78 | This optimization makes sense here, as it is hard to recover this out of the generated affine min expression. But the comment could explain this better. Something like: If we statically know the size of the outer loops iteration and it is divisible by the tiling factor, we can use a static bound for the inner loop. Otherwise, we have to dynamically compute the bound for each iteration of the outer loop. | |

126 | I would rather have this as a separate canonicalization pattern that removes parallel loops with trip-count 1. |

- I started from the fixed tiling pass and tried to apply minimal changes (drop the minimum operation if the loop sizes is a multiple of the tile size)
- I added a canonicalization to drop single iteration loops
- I use CX / VX instead of VAL_X

I may missunderstand the thing but in mlir/test/Transforms/parallel-loop-collapsing.mlir the computation of I3 directly uses NEW_I0 and multiplies it by 10 and adds 9. Since NEW_I0 takes the values 0 and 1 this means we get the value 19 for I3 but we have the loop range 9 to 11 in the original loop. I guess NEW_I0 should be divided by two before computing the index I3?

Yes, you are right, there is a div missing there. Do you want to fix it or file a bug?

Also, it would be super awesome if you could split this into two, one adding the canonicalization and one changing the tiling.

mlir/lib/Dialect/SCF/SCF.cpp | ||
---|---|---|

738 ↗ | (On Diff #272002) | You also need to check whether |

mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp | ||

86 | Why not zip over Value lowerBound, upperBound, originalStep, newStep, index; std::tie... | |

102 | You have assigned names to these |

Ok I try to split the stuff.

Regarding the collapse I will have a look at it next week.

- separated the loop tiling patch from the canonicalization
- addressed Stephan's comments

mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp | ||
---|---|---|

90 | Please rephrase this: "size of the outer loops iteration" isn't meaningful. Also, you are only considering the case of lb, ub, step being constant, which isn't the same of trip counts being statically known/constant (for eg., %i = %N to %N + 16). |

- addressed the review comments
- use ceil division by the step to compute the number of loop iterations (to support the cases where the number of loop iterations is known but not a multiple of the step)
- adapted one of the test cases to test this scenario
- removed some unnecessary headers