To save on calls to malloc, this patch enables the re-use of pre-allocated global memory slots.
- Repository
- rOMP OpenMP
Event Timeline
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu | ||
---|---|---|
408 | So here we inspect only the very next slot, hoping that it will be large enough to accommodate our request. If the next slot is not big enough, could there be suitable slots after it? If that scenario is possible, why do we inspect only the very next slot and delete everything after it when it is not big enough? | |
498–513 | I think this loop will delete all slots apart from the very first (the last iteration will be when Tail points to the first slot and we just deallocate Tail->next). Don't we want to delete the first slot as well? |
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu | ||
---|---|---|
408 | It's definitely doable, but the chances of that logic being exercised are small. We expect the vast majority of requests to fit the default slot size. For data larger than the default size, it is far more likely that there is no next slot and we simply create an entirely new one. | |
498–513 | That's intentional. The head of the list is a statically allocated shared memory node, so we don't need to call free on it. |
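
To make the cleanup behaviour discussed for lines 498–513 concrete, here is a minimal host-side C++ sketch, not the actual device code. The `Slot` layout and the helpers `appendSlot` and `releaseDynamicSlots` are hypothetical names chosen for illustration; the real structure in `data_sharing.cu` may differ. The sketch assumes a doubly linked list whose head is statically allocated and must therefore never be passed to `free`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical slot layout, assumed for illustration only.
struct Slot {
  Slot *Next;
  Slot *Prev;
  std::size_t Size;
};

// Appends a heap-allocated slot after Tail and returns the new tail.
Slot *appendSlot(Slot *Tail, std::size_t Size) {
  Slot *S = static_cast<Slot *>(std::malloc(sizeof(Slot)));
  assert(S && "malloc failed");
  S->Next = nullptr;
  S->Prev = Tail;
  S->Size = Size;
  Tail->Next = S;
  return S;
}

// Walks backwards from Tail and frees each node's successor.
// The final iteration runs with Tail at the head and frees Head->Next,
// so the head itself (assumed statically allocated) is never freed.
// Returns the number of slots released.
int releaseDynamicSlots(Slot *Tail) {
  int Freed = 0;
  while (Tail) {
    if (Tail->Next) {
      std::free(Tail->Next);
      Tail->Next = nullptr;
      ++Freed;
    }
    Tail = Tail->Prev;
  }
  return Freed;
}
```

With a static head followed by three dynamically allocated slots, calling `releaseDynamicSlots` on the tail would release exactly the three dynamic slots and leave the head in place with an empty `Next` pointer.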
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu | ||
---|---|---|
408 | Yes, that's what I had in mind. We are traversing the list anyway, so checking before deleting comes for free - it doesn't raise the complexity. |
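
The "check before deleting" idea for line 408 could look roughly like the host-side C++ sketch below. It is an illustration under assumptions rather than the code in the patch: `ReusableSlot`, `makeSlot`, `acquireSlot`, and the `Mallocs` counter are hypothetical names, and the list is simplified to a singly linked one. While walking the slots after the current one, each undersized slot is freed and the first slot that is large enough is reused; a new slot is allocated only if none fits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical, simplified slot layout (singly linked), for illustration only.
struct ReusableSlot {
  ReusableSlot *Next;
  std::size_t Size;
};

// Allocates a detached slot of the given size.
ReusableSlot *makeSlot(std::size_t Size) {
  auto *S = static_cast<ReusableSlot *>(std::malloc(sizeof(ReusableSlot)));
  assert(S && "malloc failed");
  S->Next = nullptr;
  S->Size = Size;
  return S;
}

// Returns a slot placed directly after Current that holds at least
// Required bytes. Undersized slots met during the walk are freed; the
// first slot that fits is reused. If no slot fits, a new one is
// allocated and Mallocs is incremented.
ReusableSlot *acquireSlot(ReusableSlot *Current, std::size_t Required,
                          int &Mallocs) {
  ReusableSlot *Candidate = Current->Next;
  while (Candidate && Candidate->Size < Required) {
    ReusableSlot *TooSmall = Candidate;
    Candidate = Candidate->Next; // read the link before freeing the node
    std::free(TooSmall);
  }
  if (!Candidate) {
    Candidate = makeSlot(Required);
    ++Mallocs;
  }
  Current->Next = Candidate; // relink past any freed slots
  return Candidate;
}
```

The walk visits each slot at most once, which is the same traversal a delete-everything strategy would perform, so the extra size check does not change the asymptotic cost.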
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu | ||
---|---|---|
408 | I made the change (for now). |