For the MSVC compiler, we moved the heavy lifting of the loop collapse feature into the runtime.
For rectangular loops, __kmpc_process_loop_nest_rectang calculates the total number of iterations, so that the loop nest can then be processed as a single 'openmp for' loop. __kmpc_calc_original_ivs_rectang calculates the original IVs from the overall IV of the new for loop.
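The rectangular case can be sketched as follows. This is only an illustration of the idea, not the runtime's code: the loop_bounds struct and the helpers trip_count, total_iterations and recover_ivs are hypothetical names, and the sketch assumes inclusive upper bounds and positive steps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one level of the nest:
   for (iv = lb; iv <= ub; iv += step), with step > 0. */
typedef struct {
  int64_t lb, ub, step;
} loop_bounds;

/* Number of iterations of a single level. */
static uint64_t trip_count(const loop_bounds *b) {
  assert(b->step > 0);
  return b->ub < b->lb ? 0 : (uint64_t)((b->ub - b->lb) / b->step) + 1;
}

/* Total iterations of the collapsed nest: the product of the trip counts. */
static uint64_t total_iterations(const loop_bounds *nest, int n) {
  uint64_t total = 1;
  for (int i = 0; i < n; ++i)
    total *= trip_count(&nest[i]);
  return total;
}

/* Recover the original IVs from the flat IV of the collapsed loop.
   The innermost level varies fastest, as in the original nest. */
static void recover_ivs(const loop_bounds *nest, int n, uint64_t flat,
                        int64_t *ivs) {
  for (int i = n - 1; i >= 0; --i) {
    uint64_t trip = trip_count(&nest[i]);
    assert(trip > 0); /* only valid for a non-empty nest */
    ivs[i] = nest[i].lb + (int64_t)(flat % trip) * nest[i].step;
    flat /= trip;
  }
}
```

With this shape, the collapsed loop runs a flat IV from 0 to total_iterations(...) - 1 and calls recover_ivs at the start of each iteration (or each chunk) to get back the values the loop body expects.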
For non-rectangular loops, __kmpc_for_collapsed_init returns, on each thread, a chunk to execute, expressed in terms of the original IVs. The loops are therefore rewritten to look roughly like this (example with <=):
    fetch iLBnew, iUBnew, jA0new, jUBnew, kA0new, kUBnew for the chunk;
    jA1new = kA1new = 0;
    for (i = iLBnew; i <= iUBnew; i += iStep) {
      for (j = i * jA1new + jA0new; j <= i * jB1 + jB0; j += jStep) {
        if ((i >= iUBnew) && (j > jUBnew)) goto done;
        for (k = j * kA1new + kA0new; k <= j * kB1 + kB0; k += kStep) {
          if ((i >= iUBnew) && (j >= jUBnew) && (k > kUBnew)) goto done;
          LOOP BODY
        }
        kA0new = kA0; kA1new = kA1;
      }
      jA0new = jA0; jA1new = jA1;
    }
    done:
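To make the chunk handling concrete, here is a two-level version of the same scheme as a runnable sketch. The chunk2 and inner_bounds structs and the run_chunk function are hypothetical and only mirror the variable names used above; the sketch assumes unit steps and records visits in a caller-provided array in place of the loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk descriptor for a 2-deep nest (unit steps assumed). */
typedef struct {
  long iLBnew, iUBnew; /* first and last outer IV covered by the chunk */
  long jA0new;         /* inner start on the chunk's first outer iteration */
  long jUBnew;         /* inner end on the chunk's last outer iteration */
} chunk2;

/* Original inner bounds: j runs from i*jA1 + jA0 to i*jB1 + jB0. */
typedef struct {
  long jA0, jA1, jB0, jB1;
} inner_bounds;

/* Runs one chunk. Instead of a loop body, bumps visits[i * width + j]
   (when visits is non-NULL). Returns the number of iterations executed. */
static long run_chunk(const chunk2 *c, const inner_bounds *b, int *visits,
                      long width) {
  long count = 0;
  long jA0new = c->jA0new, jA1new = 0; /* chunk-specific start, first row only */
  assert(c->iLBnew <= c->iUBnew);
  for (long i = c->iLBnew; i <= c->iUBnew; ++i) {
    for (long j = i * jA1new + jA0new; j <= i * b->jB1 + b->jB0; ++j) {
      /* On the last outer iteration, stop once past the chunk's end. */
      if (i >= c->iUBnew && j > c->jUBnew)
        goto done;
      if (visits != NULL)
        ++visits[i * width + j];
      ++count;
    }
    /* Later rows start from the original lower bound. */
    jA0new = b->jA0;
    jA1new = b->jA1;
  }
done:
  return count;
}
```

For a triangular nest (j from 0 to i, i from 0 to 3), a chunk ending at (i=2, j=0) followed by a chunk starting at (i=2, j=1) should together cover all 10 iterations exactly once, which is the property any chunking algorithm plugged in here needs to preserve.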
I expect this will make it easier to experiment with different implementations of non-rectangular loop collapse, e.g. picking different algorithms for triangular loops, or for cases where many threads are available.
Why are we adding more ordinal numbers? I thought they weren't necessary anymore.