Make ICVs work across the board
I think we want a per-warp array, since we can reasonably synchronize at warp granularity. The first draft had no preallocated array at all, but it is hard to create one properly at runtime if you have no idea which threads are running.
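To make the per-warp layout concrete, here is a minimal host-side sketch. The names `WarpSize`, `MaxThreadsPerTeam`, `ThreadState`, and `PerWarpThreadStates` are hypothetical, and the limits are placeholder values, not the ones the runtime uses.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical limits; the real values depend on the target.
constexpr uint32_t WarpSize = 32;
constexpr uint32_t MaxThreadsPerTeam = 1024;
constexpr uint32_t MaxWarpsPerTeam = MaxThreadsPerTeam / WarpSize;

// Hypothetical per-thread ICV state.
struct ThreadState {
  uint32_t NThreadsVar = 0;
  uint32_t LevelVar = 0;
};

// One preallocated row of slots per warp. A null slot means the thread
// has no private state yet and still uses the team-wide ICV values.
struct PerWarpThreadStates {
  std::array<std::array<ThreadState *, WarpSize>, MaxWarpsPerTeam> Slots{};

  // Map a thread id within the team to its slot: row = warp, column = lane.
  ThreadState *&slot(uint32_t ThreadId) {
    assert(ThreadId < MaxThreadsPerTeam && "thread id out of range");
    return Slots[ThreadId / WarpSize][ThreadId % WarpSize];
  }
};
```

With this layout, all slots a warp touches sit in one row, so any coordination needed to populate them could stay within that warp.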
This scheme needs to be generalized to the other "setters".
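One possible way to generalize, sketched under assumptions: a single helper parameterized by a pointer to member, so every setter shares the same "create the private state on first write" path. `ICVState`, `Thread`, `setICV`, and `getICV` are hypothetical names, and `std::unique_ptr` stands in for whatever allocation the runtime would actually use.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical set of ICVs; the real runtime tracks more of them.
struct ICVState {
  uint32_t NThreadsVar = 1;
  uint32_t RunSchedVar = 0;
};

// A thread reads the shared team state until it writes an ICV itself.
struct Thread {
  const ICVState *TeamState = nullptr;
  std::unique_ptr<ICVState> Local; // created lazily on first write
};

// Generic setter: copy the team state on first write, then update the member.
template <typename T, typename V>
void setICV(Thread &Th, T ICVState::*Member, V Value) {
  assert(Th.TeamState && "thread must belong to a team");
  if (!Th.Local)
    Th.Local = std::make_unique<ICVState>(*Th.TeamState);
  (*Th.Local).*Member = static_cast<T>(Value);
}

// Generic getter: prefer the private copy, fall back to the team state.
template <typename T>
T getICV(const Thread &Th, T ICVState::*Member) {
  const ICVState &S = Th.Local ? *Th.Local : *Th.TeamState;
  return S.*Member;
}
```

A call such as `setICV(Th, &ICVState::NThreadsVar, 4)` then covers any ICV without a dedicated setter for each one.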
Conceptually this should be the other way around: omp_get_thread_num should call this function. However, we should also make the names actually match, so that the implementation of omp_XXX lives in __kmpc_XXX.
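The intended direction of the call could look roughly like the sketch below. `__kmpc_get_thread_num` is a hypothetical entry point used only to illustrate the omp_XXX / __kmpc_XXX naming pattern, `CurrentThreadId` is a stand-in for the runtime's real thread id query, and everything is wrapped in a `sketch` namespace to avoid clashing with real runtime symbols.

```cpp
#include <cassert>

namespace sketch {
// Stand-in for however the runtime determines the current thread id
// (assumption; set by hand here for illustration).
int CurrentThreadId = 0;

// Hypothetical runtime entry point that holds the actual implementation.
int __kmpc_get_thread_num() {
  assert(CurrentThreadId >= 0 && "thread id must be non-negative");
  return CurrentThreadId;
}

// User-facing API: a thin wrapper whose name mirrors the entry point.
int omp_get_thread_num() { return __kmpc_get_thread_num(); }
} // namespace sketch
```

Keeping the wrapper trivial means the user-facing name and the runtime entry point cannot drift apart in behavior.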
If we can get rid of the static allocation here (basically the array) and use a dynamic one instead, it would help us a lot in separating the common part from the target-dependent part.