This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][libomptarget] Enable globalization for workers
ClosedPublic

Authored by gtbercea on Mar 14 2018, 11:43 AM.

Details

Summary

This patch allows worker to have a global memory stack managed by the runtime. This patch is needed for completeness and consistency with the globalization policy: if a worker-side variable escapes the current context it then needs to be globalized.
Until now, only the master thread was allowed to have such a stack. These global values can now potentially be shared amongst workers if the semantics of the OpenMP program require it.

Diff Detail

Repository
rOMP OpenMP

Event Timeline

gtbercea created this revision.Mar 14 2018, 11:43 AM
grokos added inline comments.Mar 14 2018, 12:23 PM
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
42–44

Remove this change, that was a bug which has been fixed already. I have pushed it upstream.

gtbercea added inline comments.Mar 14 2018, 1:58 PM
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
42–44

Was there a bug fix for this published on fabricator prior to me posting it in this patch?

grokos added inline comments.Mar 14 2018, 2:03 PM
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
42–44

There was no revision on the Phabricator for this bug. I had the fix ready from clang-ykt but I hadn't pushed it onto the trunk. We don't need to call __popc(), just compare against 0.

gtbercea updated this revision to Diff 138562.EditedMar 15 2018, 8:28 AM

Rebase patch.

gtbercea updated this revision to Diff 138567.Mar 15 2018, 8:35 AM

Rebase patch.

gtbercea marked an inline comment as done.Mar 15 2018, 8:35 AM
gtbercea updated this revision to Diff 138720.Mar 16 2018, 8:57 AM

Choose correct default size depending on slot type: worker or master.

gtbercea updated this revision to Diff 138981.Mar 19 2018, 12:33 PM

Share FrameP with all threads in the same warp.

gtbercea updated this revision to Diff 139016.Mar 19 2018, 3:55 PM

Always pop current frame from slot.

gtbercea updated this revision to Diff 139017.Mar 19 2018, 3:57 PM

Fix comment.

grokos added inline comments.Mar 20 2018, 11:17 AM
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
429

Can you just use and if here? Since NewSize does not change value if DefaultSlotSize <= NewSize, it's clearer to say

if (DefaultSlotSize > NewSize)
  NewSize = DefaultSlotSize;
503–509

I think you can speed things up here. Since you plan to remove the next slot (free(Tail->Next)), there is no point in setting new values for the fields of that slot which will be removed right away. You only need to do this for the very final slot (the one which is statically allocated). The two "//Extra" assignments look redundant in this sense. Also, the loop condition can be simplified. Since there is the statically allocated first slot, Tail will always point to some slot, it can never be NULL, and all you need to check for is while(Tail->Prev). Finally, Tail->Next = 0 can also be deferred until the end of the loop because the slot Tail points to may also be removed in the next iteration. So I think an equivalent (in terms of functionality) version would be:

while(Tail->Prev) {
  Tail = Tail->Prev;
  free(Tail->Next);
}
Tail->Next=0;
gtbercea updated this revision to Diff 139173.Mar 20 2018, 11:31 AM

Address comments.

gtbercea marked 2 inline comments as done.Mar 20 2018, 11:32 AM
grokos accepted this revision.Mar 20 2018, 11:45 AM

LGTM.

This revision is now accepted and ready to land.Mar 20 2018, 11:45 AM
This revision was automatically updated to reflect the committed changes.