For a distribute parallel for construct, the compiler emits a call to __kmpc_for_static_init_*, which in turn calls __kmp_for_static_init. When no chunk size is specified in the dist_schedule clause, the latter function does not set the stride correctly: *pstride is left equal to 1. This is harmless for plain parallel for-loops with no chunk size, but it is wrong for distribute parallel for-loops, where it causes chunks of the distributed loop to be handed out multiple times to multiple teams. The fix is to set *pstride to a value at least as large as the loop trip count.
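The effect of the stride can be illustrated with a small standalone model. This is only a sketch, not the runtime's code: static_init_model, max_executions, TRIP_COUNT and NUM_TEAMS are hypothetical names, and the driver loop assumes the generated code advances the bounds by the returned stride until the lower bound passes the last iteration.

```c
#include <assert.h>
#include <string.h>

#define TRIP_COUNT 8 /* hypothetical loop trip count */
#define NUM_TEAMS 2  /* hypothetical number of teams */

/* Hypothetical stand-in for the unchunked static split: team `tid` gets one
   contiguous block, and the stride is whatever the caller passes in. */
static void static_init_model(int tid, int nteams, int trip, int stride_value,
                              int *plower, int *pupper, int *pstride) {
    assert(nteams > 0 && trip > 0);
    int block = (trip + nteams - 1) / nteams; /* ceiling division */
    *plower = tid * block;
    *pupper = *plower + block - 1;
    if (*pupper > trip - 1)
        *pupper = trip - 1;
    *pstride = stride_value;
}

/* Assumed shape of the generated driver loop: run the chunk [lb, ub], then
   advance both bounds by the stride until lb passes the last iteration.
   Returns the largest number of times any single iteration was executed. */
static int max_executions(int stride_value) {
    int hits[TRIP_COUNT];
    memset(hits, 0, sizeof hits);

    for (int tid = 0; tid < NUM_TEAMS; ++tid) {
        int lb, ub, st;
        static_init_model(tid, NUM_TEAMS, TRIP_COUNT, stride_value,
                          &lb, &ub, &st);
        while (lb <= TRIP_COUNT - 1) {
            int last = ub < TRIP_COUNT - 1 ? ub : TRIP_COUNT - 1;
            for (int i = lb; i <= last; ++i)
                hits[i]++;
            lb += st;
            ub += st;
        }
    }

    int max = 0;
    for (int i = 0; i < TRIP_COUNT; ++i)
        if (hits[i] > max)
            max = hits[i];
    return max;
}
```

In this model, max_executions(1) reports that some iterations run more than once, because each team keeps sliding its block forward by one. With max_executions(TRIP_COUNT), the first advance already moves the lower bound past the end of the loop, so every iteration runs exactly once.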
Details
Diff Detail
- Repository
- rL LLVM