This patch optimizes parallel loop tiling when the loop bounds are statically known. It performs two optimizations: 1) it skips the bound computation for the inner loop when the number of loop iterations is a multiple of the tile size, since no partial tile can occur, and 2) it removes loop dimensions that execute only a single iteration. The latter is implemented as a canonicalization pattern.
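
The two checks can be illustrated with a minimal sketch in plain Python. This is not the code from the patch: `LoopDim`, `trip_count`, `needs_inner_bound_computation`, and `drop_single_iteration_dims` are hypothetical names chosen for illustration, and the sketch assumes constant integer bounds with a positive step and an exclusive upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopDim:
    """One dimension of a parallel loop with constant bounds (hypothetical model)."""
    lower: int
    upper: int  # exclusive upper bound (assumption)
    step: int   # assumed to be positive


def trip_count(dim: LoopDim) -> int:
    """Number of iterations of the dimension, rounding up for a partial last step."""
    if dim.upper <= dim.lower:
        return 0
    return (dim.upper - dim.lower + dim.step - 1) // dim.step


def needs_inner_bound_computation(dim: LoopDim, tile_size: int) -> bool:
    """Return True if the last tile may be partial.

    When the trip count is a multiple of the tile size, every tile is full,
    so the inner loop can use the tile size directly as its bound.
    """
    return trip_count(dim) % tile_size != 0


def drop_single_iteration_dims(dims: list[LoopDim]) -> list[LoopDim]:
    """Keep only the dimensions that run a number of iterations other than one."""
    return [d for d in dims if trip_count(d) != 1]
```

In this sketch, a dimension covering `[0, 64)` with step 1 and tile size 16 needs no inner bound computation, while `[0, 70)` does because the last tile holds only 6 iterations. An actual rewrite that removes a single-iteration dimension would presumably also have to replace uses of its induction variable with the lower bound, which this sketch does not model.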