Hi Hal,
Please find attached a patch that enables interchange of loops containing reductions. The logic to detect reductions and inductions is borrowed from the loop vectorizer code.
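For context, a reduction here is a value accumulated across iterations of a loop through an associative operator. In the matrix multiplication below, once A[i][j] is held in a register, the innermost k loop carries a scalar sum like the one sketched here. This is a minimal illustration only; the function name `dot_reduction` and its parameters are hypothetical and are not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical illustration: the innermost k loop of the matrix
 * multiplication after A[i][j] has been promoted to a scalar.
 * `acc` is the reduction variable: each iteration reads the value
 * produced by the previous iteration and combines it with '+'. */
int dot_reduction(const int *b_row, const int *c_col, size_t n,
                  size_t stride, int init)
{
    int acc = init;                 /* start value (the old A[i][j]) */
    for (size_t k = 0; k < n; k++)
        acc += b_row[k] * c_col[k * stride];  /* loop-carried update */
    return acc;                     /* stored back to A[i][j] */
}
```

Here `c_col` points at the first element of a column of C stored row-major, so `stride` is the row length.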
With this change, we are now able to interchange matrix multiplication code such as the one below:

for (int i = 1; i < 2048; i++)
  for (int j = 1; j < 2048; j++)
    for (int k = 1; k < 2048; k++)
      A[i][j] += B[i][k] * C[k][j];

into:

for (int k = 1; k < 2048; k++)
  for (int i = 1; i < 2048; i++)
    for (int j = 1; j < 2048; j++)
      A[i][j] += B[i][k] * C[k][j];

The interchanged loop nest now gets vectorized.
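A minimal sketch of the two loop orders, assuming small square int matrices (N = 4 rather than 2048, with indices starting at 0) so it runs quickly; the names `matmul_ijk` and `matmul_kij` are hypothetical. In the k-i-j order the innermost j loop walks A[i][j] and C[k][j] with unit stride while B[i][k] is invariant, and the innermost loop no longer carries a dependence through A[i][j], which is what makes it amenable to vectorization. For each A[i][j] the additions still happen in increasing k order, so the interchange does not reorder the summation.

```c
#define N 4  /* illustrative size, not the 2048 used above */

/* Original order: i, j, k. The innermost k loop is a reduction
 * into A[i][j]. */
void matmul_ijk(int A[N][N], int B[N][N], int C[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                A[i][j] += B[i][k] * C[k][j];
}

/* Interchanged order: k, i, j. The innermost j loop accesses
 * A[i][j] and C[k][j] contiguously; B[i][k] is loop-invariant. */
void matmul_kij(int A[N][N], int B[N][N], int C[N][N])
{
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                A[i][j] += B[i][k] * C[k][j];
}
```

Both functions accumulate into A, so A should be zero-initialized (or hold the desired starting values) before either call.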
We observe a ~3x improvement in execution time for the above code.
Please let me know your thoughts on this.
Thanks and Regards
Karthik Bhat