Added codegen for scan directives in parallel for regions.
Emits the code for the directive with inscan reductions.
Original code:
#pragma omp parallel for reduction(inscan, op : ...) for() { <input phase>; #pragma omp scan (in)exclusive(...) <scan phase> }
is transformed to something:
#pragma omp parallel { size num_iters = <num_iters>; <type> buffer[num_iters]; #pragma omp for for (i: 0..<num_iters>) { <input phase>; buffer[i] = red; } #pragma omp barrier for (int k = 0; k != ceil(log2(num_iters)); ++k) for (size cnt = last_iter; cnt >= pow(2, k); --k) buffer[i] op= buffer[i-pow(2,k)]; #pragma omp for for (0..<num_iters>) { red = InclusiveScan ? buffer[i] : buffer[i-1]; <scan phase>; } }
This looks pretty much like D81658, right? Could we avoid the duplication and maybe use a templated function (or something else)?