This patch improves the performance of the full runtime mode by moving the
number-of-threads counter to shared memory. It also allows to save
Can you make this comment clearer? What is the first parallel region and what are the other parallel regions? I suppose you mean L1 parallel vs nested?
There was another patch instead of this one.