This patch allows workers to have a global memory stack managed by the runtime. It is needed for completeness and consistency with the globalization policy: if a worker-side variable escapes its current context, it needs to be globalized.
Until now, only the master thread was allowed to have such a stack. These global values can now potentially be shared amongst workers if the semantics of the OpenMP program require it.
Details
Diff Detail
- Repository
- rOMP OpenMP
- Build Status
Buildable 16221 Build 16221: arc lint + arc unit
Event Timeline
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu | ||
---|---|---|
42–44 | Remove this change; it addressed a bug that has already been fixed. I have pushed the fix upstream. |
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu | ||
---|---|---|
42–44 | Was there a bug fix for this published on Phabricator before I posted it in this patch? |
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu | ||
---|---|---|
42–44 | There was no revision on Phabricator for this bug. I had the fix ready from clang-ykt but hadn't pushed it to trunk. We don't need to call `__popc()`; just compare against 0. |
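To illustrate the point about `__popc()`: lines 42–44 are not shown here, so the following is only a host-side sketch under the assumption that the check asks whether any bit of a 32-bit mask is set. The names `popcount32`, `isEmptyViaPopc`, `isEmptyViaCompare` and `checkEquivalent` are hypothetical, and `popcount32` merely stands in for CUDA's `__popc()`.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical host-side stand-in for CUDA's __popc(): counts the set bits.
inline int popcount32(std::uint32_t Mask) {
  return static_cast<int>(std::bitset<32>(Mask).count());
}

// Emptiness check written with a population count (assumed original form).
inline bool isEmptyViaPopc(std::uint32_t Mask) { return popcount32(Mask) == 0; }

// Equivalent check that only compares against 0: a word has zero set bits
// exactly when the word itself is 0.
inline bool isEmptyViaCompare(std::uint32_t Mask) { return Mask == 0; }

// Sanity helper: both forms must agree for the given mask.
inline void checkEquivalent(std::uint32_t Mask) {
  assert(isEmptyViaPopc(Mask) == isEmptyViaCompare(Mask));
}
```

If the original code needed the actual number of set bits rather than a zero test, this equivalence would not apply.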
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu | ||
---|---|---|
429 | Can you just use an `if` here? Since `NewSize` does not change value if `DefaultSlotSize <= NewSize`, it's clearer to say `if (DefaultSlotSize > NewSize) NewSize = DefaultSlotSize;` | |
503–509 | I think you can speed things up here. Since you plan to remove the next slot (`free(Tail->Next)`), there is no point in setting new values for the fields of that slot, which will be removed right away. You only need to do this for the very last slot (the one which is statically allocated). The two "//Extra" assignments look redundant in this sense. Also, the loop condition can be simplified. Since there is the statically allocated first slot, `Tail` will always point to some slot, it can never be NULL, and all you need to check for is `while(Tail->Prev)`. Finally, `Tail->Next = 0` can also be deferred until the end of the loop because the slot `Tail` points to may also be removed in the next iteration. So I think an equivalent (in terms of functionality) version would be: `while(Tail->Prev) { Tail = Tail->Prev; free(Tail->Next); } Tail->Next=0;` |
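The two suggestions above can be tried out on the host with a minimal sketch. The `Slot` layout, `clampSlotSize`, `appendSlot` and `releaseDynamicSlots` are hypothetical names introduced for illustration; only the `if` statement and the `while(Tail->Prev)` loop come from the review comments, and the real slot structure in `data_sharing.cu` may carry more fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical slot layout: only the list links matter for this sketch.
struct Slot {
  Slot *Prev;
  Slot *Next;
};

// Size clamp written as a plain `if` (comment on line 429):
// NewSize is only changed when it is smaller than the default.
inline std::size_t clampSlotSize(std::size_t NewSize,
                                 std::size_t DefaultSlotSize) {
  if (DefaultSlotSize > NewSize)
    NewSize = DefaultSlotSize;
  return NewSize;
}

// Hypothetical helper: link a malloc'ed slot after Tail and return the new
// tail. On allocation failure the list is left unchanged.
inline Slot *appendSlot(Slot *Tail) {
  assert(Tail && "the statically allocated first slot must exist");
  Slot *S = static_cast<Slot *>(std::malloc(sizeof(Slot)));
  if (!S)
    return Tail;
  S->Prev = Tail;
  S->Next = nullptr;
  Tail->Next = S;
  return S;
}

// Simplified cleanup (comment on lines 503-509): walk back from Tail, free
// each dynamically allocated slot, and reset Next once, on the first slot.
// Assumes every slot except the first was obtained from malloc.
inline Slot *releaseDynamicSlots(Slot *Tail) {
  assert(Tail && "Tail always points to some slot");
  while (Tail->Prev) {
    Tail = Tail->Prev;
    std::free(Tail->Next);
  }
  Tail->Next = nullptr;
  return Tail;
}
```

No field of a slot is written just before that slot is freed, which is the redundancy the comment points out; the only store is the final `Tail->Next` reset on the surviving first slot.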