The original loop looks like this:
%15 = …
LOOP:
  %26 = PHI %15, %31
  …
  STORE -> [%26 + 32, %26 + 32 + 64]   <- 64 is the access size
  …
  STORE -> [%26 + 0, %26 + 0 + 64]
  …
  STORE -> [%26 + 62, %26 + 62 + 64]
  …
  STORE -> [%26 + 94, %26 + 94 + 64]
  …
  %31 = ADD %26, 124
LOOP_END
The software-pipelined loop looks like this:
%15 = …
%234 = COPY %15
PROLOG:
  …
  STORE -> [%234 + 32, %234 + 32 + 64]
  …
  STORE -> [%234 + 0, %234 + 0 + 64]
  …
  STORE -> [%234 + 62, %234 + 62 + 64]
  …
  %227 = ADD %234, 124
KERNEL:
  %291 = PHI %227, %241
  %292 = PHI %15, %291
  …
  %303 = COPY %291
  %241 = ADD %303, 124
  …
  STORE -> [%303 + 32, %303 + 32 + 64]
  …
  STORE -> [%303 + 0, %303 + 0 + 64]
  …
  STORE -> [%292 + 94, %292 + 94 + 64]   <- !!!
  …
  STORE -> [%303 + 62, %303 + 62 + 64]
  …
KERNEL_END
EPILOG:
  %299 = PHI %15, %303
  STORE -> [%299 + 94, %299 + 94 + 64]
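In the kernel, the store to [%292 + 94, %292 + 94 + 64] belongs to the previous iteration, but it is now scheduled after the store to [%303 + 0, %303 + 0 + 64] of the next iteration. Because the pointer advances by 124 per iteration and the access size is 64, the two ranges overlap, so the new content written to [offset 0, offset 0 + 64] in the next iteration is overwritten by the write to [offset 94, offset 94 + 64] from the previous iteration. In the original loop the order is the opposite. So a loop-carried dependency can also exist between two store instructions.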
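To make the overlap concrete, here is a small sketch that replays both store orders on a byte map and compares the result. It is only an illustration: the stride (124), access size (64), and offsets are taken from the listings above, while the half-open range interpretation, the function names, and the (iteration, offset) tagging are my own assumptions and have nothing to do with the pipeliner's actual implementation.

```python
STRIDE = 124       # from "%31 = ADD %26, 124"
ACCESS_SIZE = 64   # from the "<- 64 is the access size" note

def overlaps(off_prev, off_next, stride=STRIDE, size=ACCESS_SIZE):
    """True if the store at off_prev in iteration i touches a byte that the
    store at off_next in iteration i+1 also touches (ranges taken as half-open)."""
    lo_prev, hi_prev = off_prev, off_prev + size
    lo_next, hi_next = off_next + stride, off_next + stride + size
    return lo_prev < hi_next and lo_next < hi_prev

def run(schedule):
    """Replay stores in order; each byte remembers its last writer."""
    mem = {}
    for iteration, offset in schedule:
        start = iteration * STRIDE + offset
        for byte in range(start, start + ACCESS_SIZE):
            mem[byte] = (iteration, offset)
    return mem

def original_schedule(n):
    # Program order of the four stores in the original loop body.
    return [(i, off) for i in range(n) for off in (32, 0, 62, 94)]

def pipelined_schedule(n):
    # PROLOG: first three stores of iteration 0.
    sched = [(0, 32), (0, 0), (0, 62)]
    # KERNEL: the offset-94 store of iteration i-1 sits between
    # the offset-0 and offset-62 stores of iteration i.
    for i in range(1, n):
        sched += [(i, 32), (i, 0), (i - 1, 94), (i, 62)]
    # EPILOG: the remaining offset-94 store of the last iteration.
    sched.append((n - 1, 94))
    return sched

if __name__ == "__main__":
    print(overlaps(94, 0))    # offset 94 (iteration i) vs offset 0 (iteration i+1)
    before = run(original_schedule(2))
    after = run(pipelined_schedule(2))
    print(before[130], after[130])   # last writer of byte 130 in each order
```

With two iterations, byte 130 is last written by the offset-0 store of iteration 1 in the original order, but by the offset-94 store of iteration 0 in the pipelined order, which is exactly the reordering flagged with `!!!` above.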
The diff should not be relative to the previously uploaded patch.