There is an error in the determination of whether the loop is tightly nested. 
Therefore, the execution result may not be correct.
So I added a check to see if there are instructions between nested loops.
- Example
#include <stdio.h>
static   long  int i,j,k=3,m=3;
unsigned long  int x[50];
long double a[50][50];
int main() {
for (i=0; i<50; i++) for (j=0; j<50; j++) a[i][j]=i; for (i=0; i<50; i++) x[i]=1;
j=0;
for(i=40;i>1;i--) {
  k=20;
  while(k++<40) x[k]=x[j]%40;
  a[i][i]+=a[i][j]+a[j][i];
  for(m=9;;) if (--m<5) break;
}
for (i=0; i<50; i++) for (j=0; j<50; j++)
  printf("a:%ld %5.5Le \n",i,a[i][j]);
  return 0;
}*) Specified options are "-Ofast -mllvm -enable-loopinterchange=true -flegacy-pass-manager".
The difference between the results when loop interchange is enabled and disabled is shown below.
$ clang -O2 -flegacy-pass-manager -mllvm -enable-loopinterchange=true -Rpass=loop-interchange bug1362_tightlynested.c -o a_error.out bug1362_tightlynested.c:13:3: remark: Loop interchanged with enclosing loop. [-Rpass=loop-interchange] while(k++<40) x[k]=x[j]%40; ^ $ clang -O2 -flegacy-pass-manager -mllvm -enable-loopinterchange=false -Rpass=loop-interchange bug1362_tightlynested.c -o a_ok.out $ ./a_error.out > a_error.res $ ./a_ok.out > a_ok.res $ diff a_ok.res a_error.res 103c103 < a:2 4.00000e+00 --- > a:2 4.20000e+01 154c154 < a:3 6.00000e+00 --- > a:3 6.30000e+01 205c205 < a:4 8.00000e+00 --- > a:4 8.40000e+01 : Omit the following.
Is there any underlying reason that we selectively allow these instructions in InnerLoopPreheader, for example why zext? I'm wondering if there are specific reasons or is it an ad-hoc solution for now? The test program you provided does not seem to have these instructions in particular.
I have the same question for InnerLoopExit.