
OMP_PROC_BIND: better spread
Closed, Public

Authored by pawosm01 on Aug 9 2017, 4:19 AM.

Details

Summary

This change improves the way threads are spread across cores when OMP_PROC_BIND=spread is set and no unusual affinity masks are in use.

Diff Detail

Repository
rL LLVM

Event Timeline

pawosm01 created this revision. Aug 9 2017, 4:19 AM
jlpeyton edited edge metadata. Aug 9 2017, 10:25 AM

Can you give a small example of how this would differ from the previous code?

Hi Paul,
Can you be more specific about how this "improves the way threads are spread across cores"? The OMP spec is very specific on exactly how the spread algorithm should work.
Thanks!
Terry

Let me illustrate how the current algorithm works on a 96-core (dual-CPU) machine. Each row is labelled with its thread count, and stars mark the threads:

 1 [*_______________________________________________________________________________________________]
 2 [*_______________________________________________*_______________________________________________]
 3 [*_______________________________*_______________________________*_______________________________]
 4 [*_______________________*_______________________*_______________________*_______________________]
 5 [*___________________*__________________*__________________*__________________*__________________]
 6 [*_______________*_______________*_______________*_______________*_______________*_______________]
 7 [*_____________*_____________*_____________*_____________*_____________*____________*____________]
 8 [*___________*___________*___________*___________*___________*___________*___________*___________]
 9 [*__________*__________*__________*__________*__________*__________*_________*_________*_________]
10 [*_________*_________*_________*_________*_________*_________*________*________*________*________]
11 [*________*________*________*________*________*________*________*________*_______*_______*_______]
12 [*_______*_______*_______*_______*_______*_______*_______*_______*_______*_______*_______*_______]
13 [*_______*______*_______*______*_______*______*_______*______*_______*______*______*______*______]
14 [*______*______*______*______*______*______*______*______*______*______*______*______*_____*_____]
15 [*______*_____*______*_____*______*_____*______*_____*______*_____*______*_____*_____*_____*_____]
16 [*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____]
17 [*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*____*____*____*____*____*____]
18 [*_____*____*____*_____*____*____*_____*____*____*_____*____*____*_____*____*____*_____*____*____]
19 [*_____*____*____*____*____*____*____*____*____*____*____*____*____*____*____*____*____*____*____]
20 [*____*____*____*____*____*____*____*____*____*____*____*____*____*____*____*____*___*___*___*___]
21 [*____*____*____*____*____*____*____*____*____*____*____*____*___*___*___*___*___*___*___*___*___]
22 [*____*___*____*___*____*___*____*___*____*___*____*___*____*___*____*___*___*___*___*___*___*___]
23 [*____*___*___*___*___*____*___*___*___*___*____*___*___*___*___*____*___*___*___*___*___*___*___]
24 [*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___]
25 [*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*__*__*__*__]
26 [*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*__*__*__*__*__*__*__*__]
27 [*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*__*__*__*__*__*__*__*__*__*__*__*__]
28 [*___*__*___*__*___*__*___*__*___*__*___*__*___*__*___*__*___*__*___*__*___*__*___*__*__*__*__*__]
29 [*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*__*__]
30 [*___*__*__*__*__*___*__*__*__*__*___*__*__*__*__*___*__*__*__*__*___*__*__*__*__*___*__*__*__*__]
31 [*___*__*__*__*__*__*__*__*__*__*___*__*__*__*__*__*__*__*__*__*___*__*__*__*__*__*__*__*__*__*__]
32 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__]
33 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*_*_]
34 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*_*_*_*_*_]
35 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*_*_*_*_*_*_*_*_]
36 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*_*_*_*_*_*_*_*_*_*_*_]
37 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_]
38 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_]
39 [*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*_*_*_]
40 [*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*_*_*_*_*_*_*_*_]
41 [*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_]
42 [*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*__*_*_*_*_*_*_*_*_]
43 [*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*_*_*_]
44 [*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*_*_*_*_]
45 [*__*_*_*_*_*_*_*__*_*_*_*_*_*_*__*_*_*_*_*_*_*__*_*_*_*_*_*_*__*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_]
46 [*__*_*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_*_*_*_]
47 [*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_]
48 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_]
49 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**]
50 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_****]
51 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_******]
52 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_********]
53 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**********]
54 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_************]
55 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**************]
56 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_****************]
57 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_******************]
58 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_********************]
59 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**********************]
60 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_************************]
61 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**************************]
62 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_****************************]
63 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_******************************]
64 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_*]
65 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_****]
66 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_*******]
67 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**********]
68 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_*************]
69 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_****************]
70 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_*******************]
71 [*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**********************]
72 [*_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_**]
73 [*_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_******]
74 [*_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_**********]
75 [*_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_**************]
76 [*_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_******************]
77 [*_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****]
78 [*_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_*********]
79 [*_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_****_**************]
80 [*_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_****]
81 [*_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_**********]
82 [*_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_****************]
83 [*_******_******_******_******_******_******_******_******_******_******_******_******_**********]
84 [*_*******_*******_*******_*******_*******_*******_*******_*******_*******_*******_*******_******]
85 [*_*******_*******_*******_*******_*******_*******_*******_*******_*******_*******_**************]
86 [*_********_********_********_********_********_********_********_********_********_*************]
87 [*_*********_*********_*********_*********_*********_*********_*********_*********_**************]
88 [*_***********_***********_***********_***********_***********_***********_***********_**********]
89 [*_************_************_************_************_************_************_****************]
90 [*_***************_***************_***************_***************_***************_**************]
91 [*_******************_******************_******************_******************_******************]
92 [*_***********************_***********************_***********************_**********************]
93 [*_*******************************_*******************************_******************************]
94 [*_***********************************************_**********************************************]
95 [*_**********************************************************************************************]
96 [************************************************************************************************]

As you can see, as the number of threads grows, a large imbalance develops between the two CPUs (the second CPU gets more threads to run).
I wouldn't call that a good spread.
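
The rows in the diagram above can be reproduced with a short sketch. This is a reconstruction inferred from the diagram itself, not the runtime's source code. It assumes that every partition gets floor(P/T) places, that the P mod T leftover places are handed out one at a time to every gap-th partition (gap = T div (P mod T), starting with the first partition), and that places 0 to 47 belong to the first CPU. All function names are hypothetical.

```python
def old_spread_starts(n_places, n_threads):
    """First place of each thread, as inferred from the diagram (assumption).

    Only the n_threads <= n_places case with master place 0 is modelled.
    """
    size = n_places // n_threads           # base partition size, floor(P/T)
    rem = n_places % n_threads             # leftover places to hand out
    gap = n_threads // rem if rem else 1   # every gap-th partition grows by one
    gap_ct = gap                           # so the first partition gets an extra place
    starts, place = [], 0
    for _ in range(n_threads):
        starts.append(place)
        width = size
        if rem and gap_ct == gap:
            width += 1
            rem -= 1
            gap_ct = 0
        gap_ct += 1
        place += width
    return starts


def render(n_places, starts):
    """Draw one diagram row: '*' where a thread starts, '_' elsewhere."""
    occupied = set(starts)
    return "".join("*" if p in occupied else "_" for p in range(n_places))


def per_cpu(n_places, starts):
    """Count threads on each CPU, assuming each CPU owns half of the places."""
    half = n_places // 2
    first = sum(1 for s in starts if s < half)
    return first, len(starts) - first


if __name__ == "__main__":
    for t in (5, 49, 63, 64):
        starts = old_spread_starts(96, t)
        print(t, render(96, starts), per_cpu(96, starts))
```

Under these assumptions, 63 threads end up split 24 / 39 between the two CPUs, which is the kind of imbalance visible in the second half of the diagram.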

Compared to the current algorithm, my code gives the following results:

 1 [*_______________________________________________________________________________________________] 1 / 96 : spacing = 97.00000, first = 1, second = 0, abs(diff) = 1, cnt = 1, cnt_ok = 1
 2 [*_______________________________________________*_______________________________________________] 2 / 96 : spacing = 48.50000, first = 1, second = 1, abs(diff) = 0, cnt = 2, cnt_ok = 1
 3 [*_______________________________*_______________________________*_______________________________] 3 / 96 : spacing = 32.33333, first = 2, second = 1, abs(diff) = 1, cnt = 3, cnt_ok = 1
 4 [*_______________________*_______________________*_______________________*_______________________] 4 / 96 : spacing = 24.25000, first = 2, second = 2, abs(diff) = 0, cnt = 4, cnt_ok = 1
 5 [*__________________*__________________*___________________*__________________*__________________] 5 / 96 : spacing = 19.40000, first = 3, second = 2, abs(diff) = 1, cnt = 5, cnt_ok = 1
 6 [*_______________*_______________*_______________*_______________*_______________*_______________] 6 / 96 : spacing = 16.16667, first = 3, second = 3, abs(diff) = 0, cnt = 6, cnt_ok = 1
 7 [*____________*_____________*_____________*_____________*_____________*_____________*____________] 7 / 96 : spacing = 13.85714, first = 4, second = 3, abs(diff) = 1, cnt = 7, cnt_ok = 1
 8 [*___________*___________*___________*___________*___________*___________*___________*___________] 8 / 96 : spacing = 12.12500, first = 4, second = 4, abs(diff) = 0, cnt = 8, cnt_ok = 1
 9 [*_________*__________*__________*__________*_________*__________*__________*__________*_________] 9 / 96 : spacing = 10.77778, first = 5, second = 4, abs(diff) = 1, cnt = 9, cnt_ok = 1
10 [*________*_________*_________*________*_________*_________*________*_________*_________*________] 10 / 96 : spacing = 9.70000, first = 5, second = 5, abs(diff) = 0, cnt = 10, cnt_ok = 1
11 [*_______*________*________*________*________*_______*________*________*________*________*_______] 11 / 96 : spacing = 8.81818, first = 6, second = 5, abs(diff) = 1, cnt = 11, cnt_ok = 1
12 [*_______*_______*_______*_______*_______*_______*_______*_______*_______*_______*_______*_______] 12 / 96 : spacing = 8.08333, first = 6, second = 6, abs(diff) = 0, cnt = 12, cnt_ok = 1
13 [*______*______*_______*______*_______*______*_______*______*_______*______*_______*______*______] 13 / 96 : spacing = 7.46154, first = 7, second = 6, abs(diff) = 1, cnt = 13, cnt_ok = 1
14 [*_____*______*______*______*______*______*______*______*______*______*______*______*______*_____] 14 / 96 : spacing = 6.92857, first = 7, second = 7, abs(diff) = 0, cnt = 14, cnt_ok = 1
15 [*_____*_____*______*_____*______*_____*______*_____*______*_____*______*_____*______*_____*_____] 15 / 96 : spacing = 6.46667, first = 8, second = 7, abs(diff) = 1, cnt = 15, cnt_ok = 1
16 [*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____*_____] 16 / 96 : spacing = 6.06250, first = 8, second = 8, abs(diff) = 0, cnt = 16, cnt_ok = 1
17 [*____*_____*_____*____*_____*_____*____*_____*_____*_____*____*_____*_____*____*_____*_____*____] 17 / 96 : spacing = 5.70588, first = 9, second = 8, abs(diff) = 1, cnt = 17, cnt_ok = 1
18 [*____*____*_____*____*____*_____*____*_____*____*____*_____*____*_____*____*____*_____*____*____] 18 / 96 : spacing = 5.38889, first = 9, second = 9, abs(diff) = 0, cnt = 18, cnt_ok = 1
19 [*____*____*____*____*____*____*____*____*____*_____*____*____*____*____*____*____*____*____*____] 19 / 96 : spacing = 5.10526, first = 10, second = 9, abs(diff) = 1, cnt = 19, cnt_ok = 1
20 [*___*____*____*____*____*____*___*____*____*____*____*____*____*___*____*____*____*____*____*___] 20 / 96 : spacing = 4.85000, first = 10, second = 10, abs(diff) = 0, cnt = 20, cnt_ok = 1
21 [*___*____*___*____*____*___*____*___*____*____*___*____*____*___*____*___*____*____*___*____*___] 21 / 96 : spacing = 4.61905, first = 11, second = 10, abs(diff) = 1, cnt = 21, cnt_ok = 1
22 [*___*___*____*___*____*___*___*____*___*____*___*___*____*___*____*___*___*____*___*____*___*___] 22 / 96 : spacing = 4.40909, first = 11, second = 11, abs(diff) = 0, cnt = 22, cnt_ok = 1
23 [*___*___*___*___*____*___*___*___*___*____*___*___*___*____*___*___*___*___*____*___*___*___*___] 23 / 96 : spacing = 4.21739, first = 12, second = 11, abs(diff) = 1, cnt = 23, cnt_ok = 1
24 [*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___*___] 24 / 96 : spacing = 4.04167, first = 12, second = 12, abs(diff) = 0, cnt = 24, cnt_ok = 1
25 [*__*___*___*___*___*___*___*___*__*___*___*___*___*___*___*___*__*___*___*___*___*___*___*___*__] 25 / 96 : spacing = 3.88000, first = 13, second = 12, abs(diff) = 1, cnt = 25, cnt_ok = 1
26 [*__*___*___*__*___*___*___*__*___*___*___*__*___*___*__*___*___*___*__*___*___*___*__*___*___*__] 26 / 96 : spacing = 3.73077, first = 13, second = 13, abs(diff) = 0, cnt = 26, cnt_ok = 1
27 [*__*___*__*___*__*___*___*__*___*__*___*___*__*___*__*___*___*__*___*__*___*___*__*___*__*___*__] 27 / 96 : spacing = 3.59259, first = 14, second = 13, abs(diff) = 1, cnt = 27, cnt_ok = 1
28 [*__*__*___*__*___*__*___*__*___*__*___*__*___*__*__*___*__*___*__*___*__*___*__*___*__*___*__*__] 28 / 96 : spacing = 3.46429, first = 14, second = 14, abs(diff) = 0, cnt = 28, cnt_ok = 1
29 [*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__*___*__*__] 29 / 96 : spacing = 3.34483, first = 15, second = 14, abs(diff) = 1, cnt = 29, cnt_ok = 1
30 [*__*__*__*__*___*__*__*__*___*__*__*__*___*__*__*__*__*___*__*__*__*___*__*__*__*___*__*__*__*__] 30 / 96 : spacing = 3.23333, first = 15, second = 15, abs(diff) = 0, cnt = 30, cnt_ok = 1
31 [*__*__*__*__*__*__*__*___*__*__*__*__*__*__*__*___*__*__*__*__*__*__*__*___*__*__*__*__*__*__*__] 31 / 96 : spacing = 3.12903, first = 16, second = 15, abs(diff) = 1, cnt = 31, cnt_ok = 1
32 [*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__] 32 / 96 : spacing = 3.03125, first = 16, second = 16, abs(diff) = 0, cnt = 32, cnt_ok = 1
33 [*_*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_*__*__*__*__*__*__*__*__*__*__*__*__*__*__*__*_] 33 / 96 : spacing = 2.93939, first = 17, second = 16, abs(diff) = 1, cnt = 33, cnt_ok = 1
34 [*_*__*__*__*__*__*_*__*__*__*__*__*__*_*__*__*__*__*__*__*_*__*__*__*__*__*__*_*__*__*__*__*__*_] 34 / 96 : spacing = 2.85294, first = 17, second = 17, abs(diff) = 0, cnt = 34, cnt_ok = 1
35 [*_*__*__*__*_*__*__*__*_*__*__*__*__*_*__*__*__*_*__*__*__*_*__*__*__*__*_*__*__*__*_*__*__*__*_] 35 / 96 : spacing = 2.77143, first = 18, second = 17, abs(diff) = 1, cnt = 35, cnt_ok = 1
36 [*_*__*__*_*__*__*_*__*__*_*__*__*__*_*__*__*_*__*__*_*__*__*_*__*__*__*_*__*__*_*__*__*_*__*__*_] 36 / 96 : spacing = 2.69444, first = 18, second = 18, abs(diff) = 0, cnt = 36, cnt_ok = 1
37 [*_*__*_*__*__*_*__*_*__*__*_*__*__*_*__*_*__*__*_*__*__*_*__*_*__*__*_*__*__*_*__*_*__*__*_*__*_] 37 / 96 : spacing = 2.62162, first = 19, second = 18, abs(diff) = 1, cnt = 37, cnt_ok = 1
38 [*_*__*_*__*_*__*_*__*_*__*__*_*__*_*__*_*__*_*__*__*_*__*_*__*_*__*_*__*__*_*__*_*__*_*__*_*__*_] 38 / 96 : spacing = 2.55263, first = 19, second = 19, abs(diff) = 0, cnt = 38, cnt_ok = 1
39 [*_*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*__*_*_] 39 / 96 : spacing = 2.48718, first = 20, second = 19, abs(diff) = 1, cnt = 39, cnt_ok = 1
40 [*_*_*__*_*__*_*_*__*_*__*_*__*_*_*__*_*__*_*__*_*_*__*_*__*_*__*_*_*__*_*__*_*__*_*_*__*_*__*_*_] 40 / 96 : spacing = 2.42500, first = 20, second = 20, abs(diff) = 0, cnt = 40, cnt_ok = 1
41 [*_*_*__*_*_*__*_*_*__*_*__*_*_*__*_*_*__*_*_*__*_*__*_*_*__*_*_*__*_*_*__*_*__*_*_*__*_*_*__*_*_] 41 / 96 : spacing = 2.36585, first = 21, second = 20, abs(diff) = 1, cnt = 41, cnt_ok = 1
42 [*_*_*_*__*_*_*__*_*_*__*_*_*__*_*_*_*__*_*_*__*_*_*__*_*_*__*_*_*_*__*_*_*__*_*_*__*_*_*__*_*_*_] 42 / 96 : spacing = 2.30952, first = 21, second = 21, abs(diff) = 0, cnt = 42, cnt_ok = 1
43 [*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_*__*_*_*_] 43 / 96 : spacing = 2.25581, first = 22, second = 21, abs(diff) = 1, cnt = 43, cnt_ok = 1
44 [*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_*__*_*_*_*_] 44 / 96 : spacing = 2.20455, first = 22, second = 22, abs(diff) = 0, cnt = 44, cnt_ok = 1
45 [*_*_*_*_*_*_*__*_*_*_*_*_*__*_*_*_*_*_*_*__*_*_*_*_*_*__*_*_*_*_*_*_*__*_*_*_*_*_*__*_*_*_*_*_*_] 45 / 96 : spacing = 2.15556, first = 23, second = 22, abs(diff) = 1, cnt = 45, cnt_ok = 1
46 [*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_] 46 / 96 : spacing = 2.10870, first = 23, second = 23, abs(diff) = 0, cnt = 46, cnt_ok = 1
47 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*__*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_] 47 / 96 : spacing = 2.06383, first = 24, second = 23, abs(diff) = 1, cnt = 47, cnt_ok = 1
48 [*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_] 48 / 96 : spacing = 2.02083, first = 24, second = 24, abs(diff) = 0, cnt = 48, cnt_ok = 1
49 [**_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*] 49 / 96 : spacing = 1.97959, first = 25, second = 24, abs(diff) = 1, cnt = 49, cnt_ok = 1
50 [**_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*_**_*_*_*_*_*_*_*_*_*_*_*_*_*_*_*] 50 / 96 : spacing = 1.94000, first = 25, second = 25, abs(diff) = 0, cnt = 50, cnt_ok = 1
51 [**_*_*_*_*_*_*_*_*_**_*_*_*_*_*_*_*_*_**_*_*_*_*_*_*_*_*_**_*_*_*_*_*_*_*_*_**_*_*_*_*_*_*_*_*_*] 51 / 96 : spacing = 1.90196, first = 26, second = 25, abs(diff) = 1, cnt = 51, cnt_ok = 1
52 [**_*_*_*_*_*_**_*_*_*_*_*_**_*_*_*_*_*_*_**_*_*_*_*_*_**_*_*_*_*_*_*_**_*_*_*_*_*_**_*_*_*_*_*_*] 52 / 96 : spacing = 1.86538, first = 26, second = 26, abs(diff) = 0, cnt = 52, cnt_ok = 1
53 [**_*_*_*_**_*_*_*_*_**_*_*_*_*_**_*_*_*_*_**_*_*_*_*_**_*_*_*_*_**_*_*_*_*_**_*_*_*_*_**_*_*_*_*] 53 / 96 : spacing = 1.83019, first = 27, second = 26, abs(diff) = 1, cnt = 53, cnt_ok = 1
54 [**_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*_**_*_*_*] 54 / 96 : spacing = 1.79630, first = 27, second = 27, abs(diff) = 0, cnt = 54, cnt_ok = 1
55 [**_*_*_**_*_*_**_*_*_**_*_*_**_*_*_*_**_*_*_**_*_*_**_*_*_**_*_*_*_**_*_*_**_*_*_**_*_*_**_*_*_*] 55 / 96 : spacing = 1.76364, first = 28, second = 27, abs(diff) = 1, cnt = 55, cnt_ok = 1
56 [**_*_**_*_*_**_*_*_**_*_**_*_*_**_*_*_**_*_*_**_*_**_*_*_**_*_*_**_*_*_**_*_**_*_*_**_*_*_**_*_*] 56 / 96 : spacing = 1.73214, first = 28, second = 28, abs(diff) = 0, cnt = 56, cnt_ok = 1
57 [**_*_**_*_**_*_*_**_*_**_*_**_*_*_**_*_**_*_**_*_*_**_*_**_*_**_*_*_**_*_**_*_**_*_*_**_*_**_*_*] 57 / 96 : spacing = 1.70175, first = 29, second = 28, abs(diff) = 1, cnt = 57, cnt_ok = 1
58 [**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_**_*_*] 58 / 96 : spacing = 1.67241, first = 29, second = 29, abs(diff) = 0, cnt = 58, cnt_ok = 1
59 [**_**_*_**_*_**_*_**_*_**_**_*_**_*_**_*_**_*_**_**_*_**_*_**_*_**_*_**_**_*_**_*_**_*_**_*_**_*] 59 / 96 : spacing = 1.64407, first = 30, second = 29, abs(diff) = 1, cnt = 59, cnt_ok = 1
60 [**_**_*_**_**_*_**_*_**_**_*_**_**_*_**_*_**_**_*_**_**_*_**_*_**_**_*_**_**_*_**_*_**_**_*_**_*] 60 / 96 : spacing = 1.61667, first = 30, second = 30, abs(diff) = 0, cnt = 60, cnt_ok = 1
61 [**_**_**_*_**_**_*_**_**_*_**_**_**_*_**_**_*_**_**_*_**_**_*_**_**_**_*_**_**_*_**_**_*_**_**_*] 61 / 96 : spacing = 1.59016, first = 31, second = 30, abs(diff) = 1, cnt = 61, cnt_ok = 1
62 [**_**_**_**_*_**_**_**_*_**_**_**_**_*_**_**_**_*_**_**_**_*_**_**_**_**_*_**_**_**_*_**_**_**_*] 62 / 96 : spacing = 1.56452, first = 31, second = 31, abs(diff) = 0, cnt = 62, cnt_ok = 1
63 [**_**_**_**_**_**_*_**_**_**_**_**_**_*_**_**_**_**_**_**_*_**_**_**_**_**_**_*_**_**_**_**_**_*] 63 / 96 : spacing = 1.53968, first = 32, second = 31, abs(diff) = 1, cnt = 63, cnt_ok = 1
64 [**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_*_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_*] 64 / 96 : spacing = 1.51562, first = 32, second = 32, abs(diff) = 0, cnt = 64, cnt_ok = 1
65 [***_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**_**] 65 / 96 : spacing = 1.49231, first = 33, second = 32, abs(diff) = 1, cnt = 65, cnt_ok = 1
66 [***_**_**_**_**_**_**_***_**_**_**_**_**_**_**_***_**_**_**_**_**_**_**_***_**_**_**_**_**_**_**] 66 / 96 : spacing = 1.46970, first = 33, second = 33, abs(diff) = 0, cnt = 66, cnt_ok = 1
67 [***_**_**_**_***_**_**_**_***_**_**_**_***_**_**_**_**_***_**_**_**_***_**_**_**_***_**_**_**_**] 67 / 96 : spacing = 1.44776, first = 34, second = 33, abs(diff) = 1, cnt = 67, cnt_ok = 1
68 [***_**_***_**_**_***_**_**_***_**_**_***_**_**_***_**_**_***_**_**_***_**_**_***_**_**_***_**_**] 68 / 96 : spacing = 1.42647, first = 34, second = 34, abs(diff) = 0, cnt = 68, cnt_ok = 1
69 [***_**_***_**_***_**_***_**_***_**_***_**_***_**_**_***_**_***_**_***_**_***_**_***_**_***_**_**] 69 / 96 : spacing = 1.40580, first = 35, second = 34, abs(diff) = 1, cnt = 69, cnt_ok = 1
70 [***_***_**_***_**_***_***_**_***_**_***_***_**_***_**_***_***_**_***_**_***_***_**_***_**_***_**] 70 / 96 : spacing = 1.38571, first = 35, second = 35, abs(diff) = 0, cnt = 70, cnt_ok = 1
71 [***_***_***_**_***_***_***_**_***_***_***_**_***_***_**_***_***_***_**_***_***_***_**_***_***_**] 71 / 96 : spacing = 1.36620, first = 36, second = 35, abs(diff) = 1, cnt = 71, cnt_ok = 1
72 [***_***_***_***_***_***_***_***_**_***_***_***_***_***_***_***_**_***_***_***_***_***_***_***_**] 72 / 96 : spacing = 1.34722, first = 36, second = 36, abs(diff) = 0, cnt = 72, cnt_ok = 1
73 [****_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***_***] 73 / 96 : spacing = 1.32877, first = 37, second = 36, abs(diff) = 1, cnt = 73, cnt_ok = 1
74 [****_***_***_***_****_***_***_***_***_****_***_***_***_****_***_***_***_***_****_***_***_***_***] 74 / 96 : spacing = 1.31081, first = 37, second = 37, abs(diff) = 0, cnt = 74, cnt_ok = 1
75 [****_***_****_***_****_***_***_****_***_****_***_***_****_***_****_***_***_****_***_****_***_***] 75 / 96 : spacing = 1.29333, first = 38, second = 37, abs(diff) = 1, cnt = 75, cnt_ok = 1
76 [****_****_***_****_****_***_****_***_****_****_***_****_****_***_****_***_****_****_***_****_***] 76 / 96 : spacing = 1.27632, first = 38, second = 38, abs(diff) = 0, cnt = 76, cnt_ok = 1
77 [****_****_****_****_****_****_***_****_****_****_****_****_****_***_****_****_****_****_****_***] 77 / 96 : spacing = 1.25974, first = 39, second = 38, abs(diff) = 1, cnt = 77, cnt_ok = 1
78 [*****_****_****_****_****_****_****_****_****_*****_****_****_****_****_****_****_****_****_****] 78 / 96 : spacing = 1.24359, first = 39, second = 39, abs(diff) = 0, cnt = 78, cnt_ok = 1
79 [*****_****_*****_****_****_*****_****_*****_****_****_*****_****_*****_****_****_*****_****_****] 79 / 96 : spacing = 1.22785, first = 40, second = 39, abs(diff) = 1, cnt = 79, cnt_ok = 1
80 [*****_*****_*****_****_*****_*****_****_*****_*****_*****_****_*****_*****_****_*****_*****_****] 80 / 96 : spacing = 1.21250, first = 40, second = 40, abs(diff) = 0, cnt = 80, cnt_ok = 1
81 [******_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****_*****] 81 / 96 : spacing = 1.19753, first = 41, second = 40, abs(diff) = 1, cnt = 81, cnt_ok = 1
82 [******_*****_******_*****_******_*****_******_*****_******_*****_******_*****_******_*****_*****] 82 / 96 : spacing = 1.18293, first = 41, second = 41, abs(diff) = 0, cnt = 82, cnt_ok = 1
83 [******_******_******_******_******_******_******_******_******_******_******_******_******_*****] 83 / 96 : spacing = 1.16867, first = 42, second = 41, abs(diff) = 1, cnt = 83, cnt_ok = 1
84 [*******_******_*******_******_*******_******_*******_******_*******_******_*******_******_******] 84 / 96 : spacing = 1.15476, first = 42, second = 42, abs(diff) = 0, cnt = 84, cnt_ok = 1
85 [********_*******_*******_*******_*******_*******_*******_*******_*******_*******_*******_*******] 85 / 96 : spacing = 1.14118, first = 43, second = 42, abs(diff) = 1, cnt = 85, cnt_ok = 1
86 [********_********_********_********_********_*******_********_********_********_********_*******] 86 / 96 : spacing = 1.12791, first = 43, second = 43, abs(diff) = 0, cnt = 86, cnt_ok = 1
87 [*********_*********_*********_********_*********_*********_********_*********_*********_********] 87 / 96 : spacing = 1.11494, first = 44, second = 43, abs(diff) = 1, cnt = 87, cnt_ok = 1
88 [**********_**********_**********_**********_*********_**********_**********_**********_*********] 88 / 96 : spacing = 1.10227, first = 44, second = 44, abs(diff) = 0, cnt = 88, cnt_ok = 1
89 [************_***********_***********_***********_***********_***********_***********_***********] 89 / 96 : spacing = 1.08989, first = 45, second = 44, abs(diff) = 1, cnt = 89, cnt_ok = 1
90 [*************_*************_*************_*************_*************_*************_************] 90 / 96 : spacing = 1.07778, first = 45, second = 45, abs(diff) = 0, cnt = 90, cnt_ok = 1
91 [****************_***************_***************_***************_***************_***************] 91 / 96 : spacing = 1.06593, first = 46, second = 45, abs(diff) = 1, cnt = 91, cnt_ok = 1
92 [*******************_******************_*******************_******************_******************] 92 / 96 : spacing = 1.05435, first = 46, second = 46, abs(diff) = 0, cnt = 92, cnt_ok = 1
93 [************************_***********************_***********************_***********************] 93 / 96 : spacing = 1.04301, first = 47, second = 46, abs(diff) = 1, cnt = 93, cnt_ok = 1
94 [********************************_*******************************_*******************************] 94 / 96 : spacing = 1.03191, first = 47, second = 47, abs(diff) = 0, cnt = 94, cnt_ok = 1
95 [************************************************_***********************************************] 95 / 96 : spacing = 1.02105, first = 48, second = 47, abs(diff) = 1, cnt = 95, cnt_ok = 1
96 [************************************************************************************************] 96 / 96 : spacing = 1.01042, first = 48, second = 48, abs(diff) = 0, cnt = 96, cnt_ok = 1

The difference between the number of threads on each CPU (abs(diff)) is never greater than 1 (or never greater than 2 if master_place is greater than 0).
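
The placement in this second diagram is consistent with a simple rule that can be read off the spacing column: spacing = (P + 1) / T, with thread i starting at place floor(i * spacing). The sketch below is inferred from that output rather than taken from the patch. It covers only master place 0 and T <= P, uses integer arithmetic in place of the floating-point spacing, and assumes places 0 to 47 belong to the first CPU. The function names are hypothetical.

```python
def new_spread_starts(n_places, n_threads):
    """First place of each thread: floor(i * (P + 1) / T), in integer arithmetic.

    Inferred from the 'spacing' values in the output above (assumption),
    for master place 0 and n_threads <= n_places only.
    """
    return [i * (n_places + 1) // n_threads for i in range(n_threads)]


def per_cpu(n_places, starts):
    """Count threads on each CPU, assuming each CPU owns half of the places."""
    half = n_places // 2
    first = sum(1 for s in starts if s < half)
    return first, len(starts) - first


if __name__ == "__main__":
    worst = 0
    for t in range(1, 97):
        first, second = per_cpu(96, new_spread_starts(96, t))
        worst = max(worst, abs(first - second))
    print(new_spread_starts(96, 5))      # [0, 19, 38, 58, 77]
    print("largest abs(diff):", worst)   # largest abs(diff): 1
```

For 5 threads this gives starting places 0, 19, 38, 58, 77, matching row 5 of the diagram, and over all thread counts from 1 to 96 the per-CPU difference stays at 0 or 1.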

pawosm01 updated this revision to Diff 110521. Aug 9 2017, 11:46 PM

The code couldn't be built everywhere with that assertion.

So I think, in summary, what you are doing is creating the place partitions with the ceil(P/T)-sized partitions evenly spread amongst the floor(P/T)-sized partitions (where P is the number of places and T is the number of threads). Is that right?

This looks good, and I don't see any problem with this with respect to the OMP spec. Can you add some descriptive comments in the code to make it clearer?

Johnny, does it look okay to you?

Thanks!
Terry

This comment was removed by tlwilmar.
pawosm01 updated this revision to Diff 110603. Aug 10 2017, 10:01 AM

Some words of explanation added, as requested.

jlpeyton accepted this revision. Aug 10 2017, 1:33 PM

Looks good to me.

This revision is now accepted and ready to land. Aug 10 2017, 1:33 PM
This revision was automatically updated to reflect the committed changes.