From the standard: A doacross loop nest is a loop nest that has cross-iteration
dependence. An iteration is dependent on one or more lexicographically earlier
iterations. The ordered clause parameter on a loop directive identifies the
loop(s) associated with the doacross loop nest.
The init/fini routines allocate/free doacross buffer(s) for each loop for each thread.
The wait routine waits for a flag designated by the dependence vector. The post routine sets the flag designated by current iteration vector.
We use a similar technique of shared buffer indices that covers up to 7 nowait loops executed simultaneously by different threads (number 7 has no real meaning, just heuristic value).
Also, the size of structures are kept intact via reducing dummy arrays.
This needs to be put into the OpenMP runtime library in order for the compiler
team to develop the compiler side of the implementation.