This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][libomptarget] Enable multiple frames per global memory slot
ClosedPublic

Authored by gtbercea on Mar 14 2018, 8:03 AM.

Event Timeline

gtbercea created this revision.Mar 14 2018, 8:03 AM
gtbercea updated this revision to Diff 138363.Mar 14 2018, 8:20 AM

Add init of tail pointer.

grokos added inline comments.Mar 14 2018, 3:19 PM
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
428

So here we are only inspecting the very next slot hoping that it will be large enough to accommodate our request. In case the next slot is not big enough, could there be slots after the next which are suitable? If this scenario is possible, then why are we only inspecting the very next slot and delete everything thereafter if it's not big enough?

504–513

I think this loop will delete all slots apart from the very first (the last iteration will be when Tail points to the first slot and we just deallocate Tail->next). Don't we want to delete the first slot as well?

gtbercea marked an inline comment as done.Mar 14 2018, 4:06 PM
gtbercea added inline comments.
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
428

It's definitely do-able but chances of that logic being applied are small. We expect the vast majority of requests to fit the default size of the the slot. For data that is larger than the default case it is vastly more likely that there's no next slot and we just create an entirely new slot.
There is also the issue of space: keeping around a list that occasionally has completely empty slots will be wasteful (you could argue that since it's a linked list we can just delete those nodes and redo the links). What I can do is as I delete the "next" node I can check if the data fits, if it does I can just place it there.

504–513

That's intentional. The head of the list is a statically allocated shared memory node so that one we don't need to call free on.

grokos added inline comments.Mar 14 2018, 7:54 PM
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
428

Yes, that's what I had in mind. We are traversing the list anyway, so checking before deleting comes for free - it doesn't raise the complexity.

gtbercea marked an inline comment as done.Mar 15 2018, 8:21 AM
gtbercea updated this revision to Diff 138560.Mar 15 2018, 8:22 AM

New algorithm for adding data into an existing slot.

gtbercea marked an inline comment as done.Mar 15 2018, 8:25 AM
gtbercea added inline comments.
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
428

I made the change (for now).

gtbercea updated this revision to Diff 138565.Mar 15 2018, 8:32 AM

Rebase patch.

grokos accepted this revision.Mar 15 2018, 8:34 AM

Looks good.

This revision is now accepted and ready to land.Mar 15 2018, 8:34 AM
This revision was automatically updated to reflect the committed changes.