Keep the original data type of integer do-variables
for structured loops. When the do-variable's data type
is an integer type shorter than IndexType, processing
the do-variable separately from the DoLoop's iteration index
avoids the type casts between the two, which can make backend
optimizations easier.
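The difference between the two loop shapes can be sketched in plain C++. This is only an illustration under stated assumptions, not Flang's lowering code: the function names, the summing loop body, and the 0-based `std::vector` standing in for the Fortran array are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shape 1: the 32-bit do-variable is recomputed from the wide loop index
// by a narrowing cast on every iteration.
inline std::int64_t sumCastFromIndex(const std::vector<std::int64_t> &a,
                                     std::int32_t n) {
  std::int64_t sum = 0;
  for (std::int64_t index = 2; index <= static_cast<std::int64_t>(n) - 1; ++index) {
    std::int32_t j = static_cast<std::int32_t>(index); // narrowing cast
    sum += a[static_cast<std::size_t>(j - 1)];
  }
  return sum;
}

// Shape 2: the do-variable lives in its own 32-bit type and is stepped
// alongside the wide index, so no cast from the index is needed.
inline std::int64_t sumSeparateVariable(const std::vector<std::int64_t> &a,
                                        std::int32_t n) {
  std::int64_t sum = 0;
  std::int32_t j = 2;
  for (std::int64_t index = 2; index <= static_cast<std::int64_t>(n) - 1; ++index) {
    sum += a[static_cast<std::size_t>(j - 1)];
    j += 1; // step the do-variable in its original type
  }
  return sum;
}

// Usage: both shapes visit the same elements and agree on the result.
inline void checkLoopShapesAgree() {
  const std::vector<std::int64_t> a{10, 20, 30, 40, 50};
  assert(sumCastFromIndex(a, 5) == sumSeparateVariable(a, 5));
}
```

Both functions compute the same value. The second one simply never derives `j` from the 64-bit index, which is the property the text describes.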
For example,
  do i = 2, n-1
    do j = 2, n-1
      ... = a(j-1, i)
    end do
  end do
If the value of 'j' is computed by casting the DoLoop's iteration
index to 'i32', then Flang will produce the following LLVM IR:
  %1 = trunc i64 %iter_index to i32
  %2 = sub i32 %1, 1
  %3 = sext i32 %2 to i64
LLVM's InstCombine may try to get rid of the sign extension,
and may transform this into:
  %1 = shl i64 %iter_index, 32
  %2 = add i64 %1, -4294967296
  %3 = ashr exact i64 %2, 32
The extra computations for the element address applied on top
of this awkward pattern confuse the LLVM vectorizer, so
it does not recognize the unit-stride access of 'a'.
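To see that the two IR sequences above compute the same value, each one can be mirrored in C++. This is a minimal sketch: the function names are hypothetical, and the code assumes two's-complement integer conversions and an arithmetic right shift on signed values (guaranteed since C++20, and the usual behavior of GCC and Clang before that).

```cpp
#include <cassert>
#include <cstdint>

// Mirrors: trunc i64 -> i32, sub i32 1, sext i32 -> i64.
constexpr std::int64_t viaTruncSubSext(std::int64_t iterIndex) {
  std::uint32_t truncated = static_cast<std::uint32_t>(iterIndex); // trunc
  std::uint32_t decremented = truncated - 1u; // sub, wraps like LLVM's i32
  return static_cast<std::int64_t>(
      static_cast<std::int32_t>(decremented)); // sext
}

// Mirrors: shl i64 32, add i64 -4294967296, ashr i64 32.
constexpr std::int64_t viaShlAddAshr(std::int64_t iterIndex) {
  std::uint64_t shifted = static_cast<std::uint64_t>(iterIndex) << 32; // shl
  std::uint64_t adjusted = shifted - (std::uint64_t{1} << 32); // add -2^32
  return static_cast<std::int64_t>(adjusted) >> 32;            // ashr
}

// Runtime check that both forms agree for every index in [first, last].
inline void checkFormsAgree(std::int64_t first, std::int64_t last) {
  for (std::int64_t i = first; i <= last; ++i)
    assert(viaTruncSubSext(i) == viaShlAddAshr(i));
}
```

The second form keeps everything in 64 bits by moving the low 32 bits into the high half, subtracting there, and shifting back. It is equivalent, but it no longer looks like a plain "index minus one", which is the shape that hinders later address analysis.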
Measured performance improvements on SPEC CPU2000@IceLake:
  168.wupwise: 11.96%
  171.swim:    11.22%
  172.mgrid:   56.38%
  178.galgel:   7.29%
  301.apsi:     8.32%
// Do loop block argument holding the current value