This change improves the way threads are spread across cores when OMP_PROC_BIND=spread is set and no unusual affinity masks are in use.
Repository: rL LLVM
Event Timeline
Hi Paul,
Can you be more specific about how this "improves the way threads are spread across cores"? The OMP spec is very specific on exactly how the spread algorithm should work.
Thanks!
Terry
Let me illustrate how the current algorithm works on a 96-core (dual-CPU) machine (stars are threads):
 1 [*_______________________________________________________________________________________________]
 2 [*_______________________________________________*_______________________________________________]
 3 [*_______________________________*_______________________________*_______________________________]
 4 [*_______________________*_______________________*_______________________*_______________________]
 5 [*___________________*__________________*__________________*__________________*__________________]
 6 [*_______________*_______________*_______________*_______________*_______________*_______________]
 7 [*_____________*_____________*_____________*_____________*_____________*____________*____________]
 8 [*___________*___________*___________*___________*___________*___________*___________*___________]
 9 [*__________*__________*__________*__________*__________*__________*_________*_________*_________]
10 [*_________*_________*_________*_________*_________*_________*________*________*________*________]
11 [*________*________*________*________*________*________*________*________*_______*_______*_______]
12 [*_______*_______*_______*_______*_______*_______*_______*_______*_______*_______*_______*_______]
13 [*_______*______*_______*______*_______*______*_______*______*_______*______*______*______*______]
14 [*______*______*______*______*______*______*______*______*______*______*______*______*_____*_____]
15 [*______*_____*______*_____*______*_____*______*_____*______*_____*______*_____*_____*_____*_____]
16 [*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____]
17 [*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*____*____*____*____*____*____]
18 [*_____*____*____*_____*____*____*_____*____*____*_____*____*____*_____*____*____*_____*____*____]
19 [*_____*____*____*____*____*____*____*____*____*____*____*____*____*____*____*____*____*____*____]
20 [*____*____*____*____*____*____*____*____*____*____*____*____*____*____*____*____*___*___*___*___]
21 [*____*____*____*____*____*____*____*____*____*____*____*____*___*___*___*___*___*___*___*___*___]
22 [*____*___*____*___*____*___*____*___*____*___*____*___*____*___*____*___*___*___*___*___*___*___]
23 [*____*___*___*___*___*____*___*___*___*___*____*___*___*___*___*____*___*___*___*___*___*___*___]
24 [*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___]
25 [*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*__*__*__*__]
26 [*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*__*__*__*__*__*__*__*__]
27 [*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*__*__*__*__*__*__*__*__*__*__*__*__]
28 [*___*__*___*__*___*__*___*__*___*__*___*__*___*__*___*__*___*__*___*__*___*__*___*__*__*__*__*__]
29 [*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*__*__]
30 [*___*__*__*__*__*___*__*__*__*__*___*__*__*__*__*___*__*__*__*__*___*__*__*__*__*___*__*__*__*__]
31 [*___*__*__*__*__*__*__*__*__*__*___*__*__*__*__*__*__*__*__*__*___*__*__*__*__*__*__*__*__*__*__]
32 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__]
33 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*_*_]
34 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*_*_*_*_*_]
35 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*_*_*_*_*_*_*_*_]
36 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*_*_*_*_*_*_*_*_*_*_*_]
37 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_]
38 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_]
39 [*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*_*_*_]
40 [*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*_*_*_*_*_*_*_*_]
41 [*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_]
42 [*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*_*_*_*_*_*_]
43 [*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*_*_*_]
44 [*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*_*_*_*_]
45 [*__*_*_*_*_*_*_*__*_*_*_*_*_*_*__*_*_*_*_*_*_*__*_*_*_*_*_*_*__*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_]
46 [*__*_*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_*_*_*_]
47 [*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_]
48 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_]
49 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**]
50 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_****]
51 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_******]
52 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_********]
53 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**********]
54 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_************]
55 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**************]
56 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_****************]
57 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_******************]
58 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_********************]
59 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**********************]
60 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_************************]
61 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**************************]
62 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_****************************]
63 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_******************************]
64 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_*]
65 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_****]
66 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_*******]
67 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**********]
68 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_*************]
69 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_****************]
70 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_*******************]
71 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**********************]
72 [*_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_**]
73 [*_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_******]
74 [*_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_**********]
75 [*_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_**************]
76 [*_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_******************]
77 [*_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****]
78 [*_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_*********]
79 [*_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_**************]
80 [*_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_****]
81 [*_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_**********]
82 [*_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_****************]
83 [*_******_******_******_******_******_******_******_******_******_******_******_******_**********]
84 [*_*******_*******_*******_*******_*******_*******_*******_*******_*******_*******_*******_******]
85 [*_*******_*******_*******_*******_*******_*******_*******_*******_*******_*******_**************]
86 [*_********_********_********_********_********_********_********_********_********_*************]
87 [*_*********_*********_*********_*********_*********_*********_*********_*********_**************]
88 [*_***********_***********_***********_***********_***********_***********_***********_**********]
89 [*_************_************_************_************_************_************_****************]
90 [*_***************_***************_***************_***************_***************_**************]
91 [*_******************_******************_******************_******************_******************]
92 [*_***********************_***********************_***********************_**********************]
93 [*_*******************************_*******************************_******************************]
94 [*_***********************************************_**********************************************]
95 [*_**********************************************************************************************]
96 [************************************************************************************************]
As you can see, as the number of threads grows, there is a large imbalance between the two CPUs (the second CPU gets more threads to run).
I wouldn't call that a good spread.
By comparison, my code gives the following results:
 1 [*_______________________________________________________________________________________________] 1 / 96 : spacing = 97.00000, first = 1, second = 0, abs(diff) = 1, cnt = 1, cnt_ok = 1
 2 [*_______________________________________________*_______________________________________________] 2 / 96 : spacing = 48.50000, first = 1, second = 1, abs(diff) = 0, cnt = 2, cnt_ok = 1
 3 [*_______________________________*_______________________________*_______________________________] 3 / 96 : spacing = 32.33333, first = 2, second = 1, abs(diff) = 1, cnt = 3, cnt_ok = 1
 4 [*_______________________*_______________________*_______________________*_______________________] 4 / 96 : spacing = 24.25000, first = 2, second = 2, abs(diff) = 0, cnt = 4, cnt_ok = 1
 5 [*__________________*__________________*___________________*__________________*__________________] 5 / 96 : spacing = 19.40000, first = 3, second = 2, abs(diff) = 1, cnt = 5, cnt_ok = 1
 6 [*_______________*_______________*_______________*_______________*_______________*_______________] 6 / 96 : spacing = 16.16667, first = 3, second = 3, abs(diff) = 0, cnt = 6, cnt_ok = 1
 7 [*____________*_____________*_____________*_____________*_____________*_____________*____________] 7 / 96 : spacing = 13.85714, first = 4, second = 3, abs(diff) = 1, cnt = 7, cnt_ok = 1
 8 [*___________*___________*___________*___________*___________*___________*___________*___________] 8 / 96 : spacing = 12.12500, first = 4, second = 4, abs(diff) = 0, cnt = 8, cnt_ok = 1
 9 [*_________*__________*__________*__________*_________*__________*__________*__________*_________] 9 / 96 : spacing = 10.77778, first = 5, second = 4, abs(diff) = 1, cnt = 9, cnt_ok = 1
10 [*________*_________*_________*________*_________*_________*________*_________*_________*________] 10 / 96 : spacing = 9.70000, first = 5, second = 5, abs(diff) = 0, cnt = 10, cnt_ok = 1
11 [*_______*________*________*________*________*_______*________*________*________*________*_______] 11 / 96 : spacing = 8.81818, first = 6, second = 5, abs(diff) = 1, cnt = 11, cnt_ok = 1
12 [*_______*_______*_______*_______*_______*_______*_______*_______*_______*_______*_______*_______] 12 / 96 : spacing = 8.08333, first = 6, second = 6, abs(diff) = 0, cnt = 12, cnt_ok = 1
13 [*______*______*_______*______*_______*______*_______*______*_______*______*_______*______*______] 13 / 96 : spacing = 7.46154, first = 7, second = 6, abs(diff) = 1, cnt = 13, cnt_ok = 1
14 [*_____*______*______*______*______*______*______*______*______*______*______*______*______*_____] 14 / 96 : spacing = 6.92857, first = 7, second = 7, abs(diff) = 0, cnt = 14, cnt_ok = 1
15 [*_____*_____*______*_____*______*_____*______*_____*______*_____*______*_____*______*_____*_____] 15 / 96 : spacing = 6.46667, first = 8, second = 7, abs(diff) = 1, cnt = 15, cnt_ok = 1
16 [*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____] 16 / 96 : spacing = 6.06250, first = 8, second = 8, abs(diff) = 0, cnt = 16, cnt_ok = 1
17 [*____*_____*_____*____*_____*_____*____*_____*_____*_____*____*_____*_____*____*_____*_____*____] 17 / 96 : spacing = 5.70588, first = 9, second = 8, abs(diff) = 1, cnt = 17, cnt_ok = 1
18 [*____*____*_____*____*____*_____*____*_____*____*____*_____*____*_____*____*____*_____*____*____] 18 / 96 : spacing = 5.38889, first = 9, second = 9, abs(diff) = 0, cnt = 18, cnt_ok = 1
19 [*____*____*____*____*____*____*____*____*____*_____*____*____*____*____*____*____*____*____*____] 19 / 96 : spacing = 5.10526, first = 10, second = 9, abs(diff) = 1, cnt = 19, cnt_ok = 1
20 [*___*____*____*____*____*____*___*____*____*____*____*____*____*___*____*____*____*____*____*___] 20 / 96 : spacing = 4.85000, first = 10, second = 10, abs(diff) = 0, cnt = 20, cnt_ok = 1
21 [*___*____*___*____*____*___*____*___*____*____*___*____*____*___*____*___*____*____*___*____*___] 21 / 96 : spacing = 4.61905, first = 11, second = 10, abs(diff) = 1, cnt = 21, cnt_ok = 1
22 [*___*___*____*___*____*___*___*____*___*____*___*___*____*___*____*___*___*____*___*____*___*___] 22 / 96 : spacing = 4.40909, first = 11, second = 11, abs(diff) = 0, cnt = 22, cnt_ok = 1
23 [*___*___*___*___*____*___*___*___*___*____*___*___*___*____*___*___*___*___*____*___*___*___*___] 23 / 96 : spacing = 4.21739, first = 12, second = 11, abs(diff) = 1, cnt = 23, cnt_ok = 1
24 [*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___] 24 / 96 : spacing = 4.04167, first = 12, second = 12, abs(diff) = 0, cnt = 24, cnt_ok = 1
25 [*__*___*___*___*___*___*___*___*__*___*___*___*___*___*___*___*__*___*___*___*___*___*___*___*__] 25 / 96 : spacing = 3.88000, first = 13, second = 12, abs(diff) = 1, cnt = 25, cnt_ok = 1
26 [*__*___*___*__*___*___*___*__*___*___*___*__*___*___*__*___*___*___*__*___*___*___*__*___*___*__] 26 / 96 : spacing = 3.73077, first = 13, second = 13, abs(diff) = 0, cnt = 26, cnt_ok = 1
27 [*__*___*__*___*__*___*___*__*___*__*___*___*__*___*__*___*___*__*___*__*___*___*__*___*__*___*__] 27 / 96 : spacing = 3.59259, first = 14, second = 13, abs(diff) = 1, cnt = 27, cnt_ok = 1
28 [*__*__*___*__*___*__*___*__*___*__*___*__*___*__*__*___*__*___*__*___*__*___*__*___*__*___*__*__] 28 / 96 : spacing = 3.46429, first = 14, second = 14, abs(diff) = 0, cnt = 28, cnt_ok = 1
29 [*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__] 29 / 96 : spacing = 3.34483, first = 15, second = 14, abs(diff) = 1, cnt = 29, cnt_ok = 1
30 [*__*__*__*__*___*__*__*__*___*__*__*__*___*__*__*__*__*___*__*__*__*___*__*__*__*___*__*__*__*__] 30 / 96 : spacing = 3.23333, first = 15, second = 15, abs(diff) = 0, cnt = 30, cnt_ok = 1
31 [*__*__*__*__*__*__*__*___*__*__*__*__*__*__*__*___*__*__*__*__*__*__*__*___*__*__*__*__*__*__*__] 31 / 96 : spacing = 3.12903, first = 16, second = 15, abs(diff) = 1, cnt = 31, cnt_ok = 1
32 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__] 32 / 96 : spacing = 3.03125, first = 16, second = 16, abs(diff) = 0, cnt = 32, cnt_ok = 1
33 [*_*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_] 33 / 96 : spacing = 2.93939, first = 17, second = 16, abs(diff) = 1, cnt = 33, cnt_ok = 1
34 [*_*__*__*__*__*__*_*__*__*__*__*__*__*_*__*__*__*__*__*__*_*__*__*__*__*__*__*_*__*__*__*__*__*_] 34 / 96 : spacing = 2.85294, first = 17, second = 17, abs(diff) = 0, cnt = 34, cnt_ok = 1
35 [*_*__*__*__*_*__*__*__*_*__*__*__*__*_*__*__*__*_*__*__*__*_*__*__*__*__*_*__*__*__*_*__*__*__*_] 35 / 96 : spacing = 2.77143, first = 18, second = 17, abs(diff) = 1, cnt = 35, cnt_ok = 1
36 [*_*__*__*_*__*__*_*__*__*_*__*__*__*_*__*__*_*__*__*_*__*__*_*__*__*__*_*__*__*_*__*__*_*__*__*_] 36 / 96 : spacing = 2.69444, first = 18, second = 18, abs(diff) = 0, cnt = 36, cnt_ok = 1
37 [*_*__*_*__*__*_*__*_*__*__*_*__*__*_*__*_*__*__*_*__*__*_*__*_*__*__*_*__*__*_*__*_*__*__*_*__*_] 37 / 96 : spacing = 2.62162, first = 19, second = 18, abs(diff) = 1, cnt = 37, cnt_ok = 1
38 [*_*__*_*__*_*__*_*__*_*__*__*_*__*_*__*_*__*_*__*__*_*__*_*__*_*__*_*__*__*_*__*_*__*_*__*_*__*_] 38 / 96 : spacing = 2.55263, first = 19, second = 19, abs(diff) = 0, cnt = 38, cnt_ok = 1
39 [*_*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*_] 39 / 96 : spacing = 2.48718, first = 20, second = 19, abs(diff) = 1, cnt = 39, cnt_ok = 1
40 [*_*_*__*_*__*_*_*__*_*__*_*__*_*_*__*_*__*_*__*_*_*__*_*__*_*__*_*_*__*_*__*_*__*_*_*__*_*__*_*_] 40 / 96 : spacing = 2.42500, first = 20, second = 20, abs(diff) = 0, cnt = 40, cnt_ok = 1
41 [*_*_*__*_*_*__*_*_*__*_*__*_*_*__*_*_*__*_*_*__*_*__*_*_*__*_*_*__*_*_*__*_*__*_*_*__*_*_*__*_*_] 41 / 96 : spacing = 2.36585, first = 21, second = 20, abs(diff) = 1, cnt = 41, cnt_ok = 1
42 [*_*_*_*__*_*_*__*_*_*__*_*_*__*_*_*_*__*_*_*__*_*_*__*_*_*__*_*_*_*__*_*_*__*_*_*__*_*_*__*_*_*_] 42 / 96 : spacing = 2.30952, first = 21, second = 21, abs(diff) = 0, cnt = 42, cnt_ok = 1
43 [*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_] 43 / 96 : spacing = 2.25581, first = 22, second = 21, abs(diff) = 1, cnt = 43, cnt_ok = 1
44 [*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_] 44 / 96 : spacing = 2.20455, first = 22, second = 22, abs(diff) = 0, cnt = 44, cnt_ok = 1
45 [*_*_*_*_*_*_*__*_*_*_*_*_*__*_*_*_*_*_*_*__*_*_*_*_*_*__*_*_*_*_*_*_*__*_*_*_*_*_*__*_*_*_*_*_*_] 45 / 96 : spacing = 2.15556, first = 23, second = 22, abs(diff) = 1, cnt = 45, cnt_ok = 1
46 [*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_] 46 / 96 : spacing = 2.10870, first = 23, second = 23, abs(diff) = 0, cnt = 46, cnt_ok = 1
47 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_] 47 / 96 : spacing = 2.06383, first = 24, second = 23, abs(diff) = 1, cnt = 47, cnt_ok = 1
48 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_] 48 / 96 : spacing = 2.02083, first = 24, second = 24, abs(diff) = 0, cnt = 48, cnt_ok = 1
49 [**_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*] 49 / 96 : spacing = 1.97959, first = 25, second = 24, abs(diff) = 1, cnt = 49, cnt_ok = 1
50 [**_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*] 50 / 96 : spacing = 1.94000, first = 25, second = 25, abs(diff) = 0, cnt = 50, cnt_ok = 1
51 [**_*_*_*_*_*_*_*_*_**_*_*_*_*_*_*_*_*_**_*_*_*_*_*_*_*_*_**_*_*_*_*_*_*_*_*_**_*_*_*_*_*_*_*_*_*] 51 / 96 : spacing = 1.90196, first = 26, second = 25, abs(diff) = 1, cnt = 51, cnt_ok = 1
52 [**_*_*_*_*_*_**_*_*_*_*_*_**_*_*_*_*_*_*_**_*_*_*_*_*_**_*_*_*_*_*_*_**_*_*_*_*_*_**_*_*_*_*_*_*] 52 / 96 : spacing = 1.86538, first = 26, second = 26, abs(diff) = 0, cnt = 52, cnt_ok = 1
53 [**_*_*_*_**_*_*_*_*_**_*_*_*_*_**_*_*_*_*_**_*_*_*_*_**_*_*_*_*_**_*_*_*_*_**_*_*_*_*_**_*_*_*_*] 53 / 96 : spacing = 1.83019, first = 27, second = 26, abs(diff) = 1, cnt = 53, cnt_ok = 1
54 [**_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*] 54 / 96 : spacing = 1.79630, first = 27, second = 27, abs(diff) = 0, cnt = 54, cnt_ok = 1
55 [**_*_*_**_*_*_**_*_*_**_*_*_**_*_*_*_**_*_*_**_*_*_**_*_*_**_*_*_*_**_*_*_**_*_*_**_*_*_**_*_*_*] 55 / 96 : spacing = 1.76364, first = 28, second = 27, abs(diff) = 1, cnt = 55, cnt_ok = 1
56 [**_*_**_*_*_**_*_*_**_*_**_*_*_**_*_*_**_*_*_**_*_**_*_*_**_*_*_**_*_*_**_*_**_*_*_**_*_*_**_*_*] 56 / 96 : spacing = 1.73214, first = 28, second = 28, abs(diff) = 0, cnt = 56, cnt_ok = 1
57 [**_*_**_*_**_*_*_**_*_**_*_**_*_*_**_*_**_*_**_*_*_**_*_**_*_**_*_*_**_*_**_*_**_*_*_**_*_**_*_*] 57 / 96 : spacing = 1.70175, first = 29, second = 28, abs(diff) = 1, cnt = 57, cnt_ok = 1
58 [**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_*] 58 / 96 : spacing = 1.67241, first = 29, second = 29, abs(diff) = 0, cnt = 58, cnt_ok = 1
59 [**_**_*_**_*_**_*_**_*_**_**_*_**_*_**_*_**_*_**_**_*_**_*_**_*_**_*_**_**_*_**_*_**_*_**_*_**_*] 59 / 96 : spacing = 1.64407, first = 30, second = 29, abs(diff) = 1, cnt = 59, cnt_ok = 1
60 [**_**_*_**_**_*_**_*_**_**_*_**_**_*_**_*_**_**_*_**_**_*_**_*_**_**_*_**_**_*_**_*_**_**_*_**_*] 60 / 96 : spacing = 1.61667, first = 30, second = 30, abs(diff) = 0, cnt = 60, cnt_ok = 1
61 [**_**_**_*_**_**_*_**_**_*_**_**_**_*_**_**_*_**_**_*_**_**_*_**_**_**_*_**_**_*_**_**_*_**_**_*] 61 / 96 : spacing = 1.59016, first = 31, second = 30, abs(diff) = 1, cnt = 61, cnt_ok = 1
62 [**_**_**_**_*_**_**_**_*_**_**_**_**_*_**_**_**_*_**_**_**_*_**_**_**_**_*_**_**_**_*_**_**_**_*] 62 / 96 : spacing = 1.56452, first = 31, second = 31, abs(diff) = 0, cnt = 62, cnt_ok = 1
63 [**_**_**_**_**_**_*_**_**_**_**_**_**_*_**_**_**_**_**_**_*_**_**_**_**_**_**_*_**_**_**_**_**_*] 63 / 96 : spacing = 1.53968, first = 32, second = 31, abs(diff) = 1, cnt = 63, cnt_ok = 1
64 [**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_*] 64 / 96 : spacing = 1.51562, first = 32, second = 32, abs(diff) = 0, cnt = 64, cnt_ok = 1
65 [***_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**] 65 / 96 : spacing = 1.49231, first = 33, second = 32, abs(diff) = 1, cnt = 65, cnt_ok = 1
66 [***_**_**_**_**_**_**_***_**_**_**_**_**_**_**_***_**_**_**_**_**_**_**_***_**_**_**_**_**_**_**] 66 / 96 : spacing = 1.46970, first = 33, second = 33, abs(diff) = 0, cnt = 66, cnt_ok = 1
67 [***_**_**_**_***_**_**_**_***_**_**_**_***_**_**_**_**_***_**_**_**_***_**_**_**_***_**_**_**_**] 67 / 96 : spacing = 1.44776, first = 34, second = 33, abs(diff) = 1, cnt = 67, cnt_ok = 1
68 [***_**_***_**_**_***_**_**_***_**_**_***_**_**_***_**_**_***_**_**_***_**_**_***_**_**_***_**_**] 68 / 96 : spacing = 1.42647, first = 34, second = 34, abs(diff) = 0, cnt = 68, cnt_ok = 1
69 [***_**_***_**_***_**_***_**_***_**_***_**_***_**_**_***_**_***_**_***_**_***_**_***_**_***_**_**] 69 / 96 : spacing = 1.40580, first = 35, second = 34, abs(diff) = 1, cnt = 69, cnt_ok = 1
70 [***_***_**_***_**_***_***_**_***_**_***_***_**_***_**_***_***_**_***_**_***_***_**_***_**_***_**] 70 / 96 : spacing = 1.38571, first = 35, second = 35, abs(diff) = 0, cnt = 70, cnt_ok = 1
71 [***_***_***_**_***_***_***_**_***_***_***_**_***_***_**_***_***_***_**_***_***_***_**_***_***_**] 71 / 96 : spacing = 1.36620, first = 36, second = 35, abs(diff) = 1, cnt = 71, cnt_ok = 1
72 [***_***_***_***_***_***_***_***_**_***_***_***_***_***_***_***_**_***_***_***_***_***_***_***_**] 72 / 96 : spacing = 1.34722, first = 36, second = 36, abs(diff) = 0, cnt = 72, cnt_ok = 1
73 [****_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***] 73 / 96 : spacing = 1.32877, first = 37, second = 36, abs(diff) = 1, cnt = 73, cnt_ok = 1
74 [****_***_***_***_****_***_***_***_***_****_***_***_***_****_***_***_***_***_****_***_***_***_***] 74 / 96 : spacing = 1.31081, first = 37, second = 37, abs(diff) = 0, cnt = 74, cnt_ok = 1
75 [****_***_****_***_****_***_***_****_***_****_***_***_****_***_****_***_***_****_***_****_***_***] 75 / 96 : spacing = 1.29333, first = 38, second = 37, abs(diff) = 1, cnt = 75, cnt_ok = 1
76 [****_****_***_****_****_***_****_***_****_****_***_****_****_***_****_***_****_****_***_****_***] 76 / 96 : spacing = 1.27632, first = 38, second = 38, abs(diff) = 0, cnt = 76, cnt_ok = 1
77 [****_****_****_****_****_****_***_****_****_****_****_****_****_***_****_****_****_****_****_***] 77 / 96 : spacing = 1.25974, first = 39, second = 38, abs(diff) = 1, cnt = 77, cnt_ok = 1
78 [*****_****_****_****_****_****_****_****_****_*****_****_****_****_****_****_****_****_****_****] 78 / 96 : spacing = 1.24359, first = 39, second = 39, abs(diff) = 0, cnt = 78, cnt_ok = 1
79 [*****_****_*****_****_****_*****_****_*****_****_****_*****_****_*****_****_****_*****_****_****] 79 / 96 : spacing = 1.22785, first = 40, second = 39, abs(diff) = 1, cnt = 79, cnt_ok = 1
80 [*****_*****_*****_****_*****_*****_****_*****_*****_*****_****_*****_*****_****_*****_*****_****] 80 / 96 : spacing = 1.21250, first = 40, second = 40, abs(diff) = 0, cnt = 80, cnt_ok = 1
81 [******_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****] 81 / 96 : spacing = 1.19753, first = 41, second = 40, abs(diff) = 1, cnt = 81, cnt_ok = 1
82 [******_*****_******_*****_******_*****_******_*****_******_*****_******_*****_******_*****_*****] 82 / 96 : spacing = 1.18293, first = 41, second = 41, abs(diff) = 0, cnt = 82, cnt_ok = 1
83 [******_******_******_******_******_******_******_******_******_******_******_******_******_*****] 83 / 96 : spacing = 1.16867, first = 42, second = 41, abs(diff) = 1, cnt = 83, cnt_ok = 1
84 [*******_******_*******_******_*******_******_*******_******_*******_******_*******_******_******] 84 / 96 : spacing = 1.15476, first = 42, second = 42, abs(diff) = 0, cnt = 84, cnt_ok = 1
85 [********_*******_*******_*******_*******_*******_*******_*******_*******_*******_*******_*******] 85 / 96 : spacing = 1.14118, first = 43, second = 42, abs(diff) = 1, cnt = 85, cnt_ok = 1
86 [********_********_********_********_********_*******_********_********_********_********_*******] 86 / 96 : spacing = 1.12791, first = 43, second = 43, abs(diff) = 0, cnt = 86, cnt_ok = 1
87 [*********_*********_*********_********_*********_*********_********_*********_*********_********] 87 / 96 : spacing = 1.11494, first = 44, second = 43, abs(diff) = 1, cnt = 87, cnt_ok = 1
88 [**********_**********_**********_**********_*********_**********_**********_**********_*********] 88 / 96 : spacing = 1.10227, first = 44, second = 44, abs(diff) = 0, cnt = 88, cnt_ok = 1
89 [************_***********_***********_***********_***********_***********_***********_***********] 89 / 96 : spacing = 1.08989, first = 45, second = 44, abs(diff) = 1, cnt = 89, cnt_ok = 1
90 [*************_*************_*************_*************_*************_*************_************] 90 / 96 : spacing = 1.07778, first = 45, second = 45, abs(diff) = 0, cnt = 90, cnt_ok = 1
91 [****************_***************_***************_***************_***************_***************] 91 / 96 : spacing = 1.06593, first = 46, second = 45, abs(diff) = 1, cnt = 91, cnt_ok = 1
92 [*******************_******************_*******************_******************_******************] 92 / 96 : spacing = 1.05435, first = 46, second = 46, abs(diff) = 0, cnt = 92, cnt_ok = 1
93 [************************_***********************_***********************_***********************] 93 / 96 : spacing = 1.04301, first = 47, second = 46, abs(diff) = 1, cnt = 93, cnt_ok = 1
94 [********************************_*******************************_*******************************] 94 / 96 : spacing = 1.03191, first = 47, second = 47, abs(diff) = 0, cnt = 94, cnt_ok = 1
95 [************************************************_***********************************************] 95 / 96 : spacing = 1.02105, first = 48, second = 47, abs(diff) = 1, cnt = 95, cnt_ok = 1
96 [************************************************************************************************] 96 / 96 : spacing = 1.01042, first = 48, second = 48, abs(diff) = 0, cnt = 96, cnt_ok = 1
The difference between the numbers of threads on the two CPUs (abs(diff)) is never greater than 1 (never greater than 2 if master_place is greater than 0).
So, in summary, I think what you are doing is creating the place partitions so that the ceil(P/T)-sized partitions are evenly spread among the floor(P/T)-sized partitions (where P is the number of places and T is num_threads). Is that right?
This looks good, and I don't see any problem with it with respect to the OpenMP spec. Can you add some descriptive comments to the code to make it clearer?
Johnny, does it look okay to you?
Thanks!
Terry