Add the ForLoopBoundSpecialization pass, which splits an scf.for loop into a "main loop" whose iteration space is evenly divided by the step, and an scf.if that handles the last (partial) iteration ("loop peeling").
This transformation is useful for vectorization and loop tiling. E.g., when vectorizing loads/stores, programs will spend most of their time in the main loop, in which only unmasked loads/stores are used. Only in the last iteration (scf.if) are the slower masked loads/stores used.
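For illustration, the bound arithmetic behind the split can be sketched in plain C++. This is not the pass implementation: `splitBound`, `runPeeled`, and `PeelStats` are hypothetical names, and the sketch assumes a positive step. It only shows that the main loop runs full steps and that at most one partial iteration remains.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the pass): the largest bound <= ub such
// that `step` evenly divides the range [lb, bound).
int64_t splitBound(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && "sketch assumes a positive step");
  if (ub <= lb)
    return ub; // empty iteration space, nothing to split
  return ub - (ub - lb) % step;
}

struct PeelStats {
  int64_t fullIterations = 0;    // main-loop iterations (no mask needed)
  int64_t partialIterations = 0; // 0 or 1: the peeled last iteration
  int64_t elements = 0;          // total elements covered
};

// Models the peeled form: a main loop up to the split bound, followed by a
// guarded remainder that plays the role of the scf.if.
PeelStats runPeeled(int64_t lb, int64_t ub, int64_t step) {
  PeelStats stats;
  const int64_t split = splitBound(lb, ub, step);

  // Main loop: every iteration covers exactly `step` elements.
  for (int64_t i = lb; i < split; i += step) {
    ++stats.fullIterations;
    stats.elements += step;
  }

  // Peeled remainder: runs at most once and covers fewer than `step` elements.
  if (split < ub) {
    ++stats.partialIterations;
    stats.elements += ub - split;
  }
  return stats;
}
```

With lb = 0, ub = 10, step = 4, the split bound is 8: the main loop runs twice on full steps and the remainder handles the last 2 elements.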
Subsequent commits will apply this transformation in the SparseDialect and in Linalg's loop tiling.