This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][libomptarget] Add runtime function for pushing coalesced global records
ClosedPublic

Authored by gtbercea on Oct 11 2018, 8:20 AM.

Details

Summary

In the case of coalesced global records, we need to push the exact data size passed in. This patch fixes this by outlining the common functionality of the previous push function and by adding a separate entry point for coalesced pushes. The pop function remains unchanged.

Diff Detail

Repository
rOMP OpenMP

Event Timeline

gtbercea created this revision.Oct 11 2018, 8:20 AM
grokos accepted this revision.Oct 11 2018, 8:41 AM

Looks good.

This revision is now accepted and ready to land.Oct 11 2018, 8:41 AM

I guess this will break the case when the DataSize passed to __kmpc_data_sharing_push_stack() needs additional alignment: With this change it is handled in data_sharing_push_stack_common() but __kmpc_data_sharing_push_stack() will determine PushSize without the adjustment and do the final pointer arithmetic.

I guess this will break the case when the DataSize passed to __kmpc_data_sharing_push_stack() needs additional alignment: With this change it is handled in data_sharing_push_stack_common() but __kmpc_data_sharing_push_stack() will determine PushSize without the adjustment and do the final pointer arithmetic.

Why DataSize might require an additional alignment? The DataSize must be already aligned.

gtbercea updated this revision to Diff 169252.Oct 11 2018, 10:56 AM

Ensure PushSize is multiple of 8 bytes.

I guess this will break the case when the DataSize passed to __kmpc_data_sharing_push_stack() needs additional alignment: With this change it is handled in data_sharing_push_stack_common() but __kmpc_data_sharing_push_stack() will determine PushSize without the adjustment and do the final pointer arithmetic.

Why DataSize might require an additional alignment? The DataSize must be already aligned.

It's required in __kmpc_data_sharing_push_stack() for some cases, please see D52655.
Another problem with this patch is that __kmpc_data_sharing_push_stack() is adding the lane offset to the return value of omptarget_nvptx_SimpleThreadPrivateContext::Allocate(). AFAICS this will break lastprivate which requires the same buffer for all threads.

Anyway, you must do what you think to be correct.

I guess this will break the case when the DataSize passed to __kmpc_data_sharing_push_stack() needs additional alignment: With this change it is handled in data_sharing_push_stack_common() but __kmpc_data_sharing_push_stack() will determine PushSize without the adjustment and do the final pointer arithmetic.

Why DataSize might require an additional alignment? The DataSize must be already aligned.

It's required in __kmpc_data_sharing_push_stack() for some cases, please see D52655.

It is another problem. The problem is that the StackP pointer is unaligned after all the pointer maths. The DataSize itself should not be aligned.

Another problem with this patch is that __kmpc_data_sharing_push_stack() is adding the lane offset to the return value of omptarget_nvptx_SimpleThreadPrivateContext::Allocate(). AFAICS this will break lastprivate which requires the same buffer for all threads.

Anyway, you must do what you think to be correct.

Yes, there must be an additional check size_t PushSize = (isRuntimeUninitialized() || IsMasterThread()) ? DataSize : WARPSIZE * DataSize;

gtbercea updated this revision to Diff 169273.Oct 11 2018, 12:32 PM

Simply call to common push function.

ABataev added inline comments.Oct 24 2018, 6:25 AM
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
389

This must be removed

458

Also, must be removed

464

It is a very bad idea to have something like this without atomic instructions.
Also, for writing, you need to use atomic instructions (+, possibly, volatile data type). Otherwise, it leads to undefined behavior and problems during optimization.

gtbercea updated this revision to Diff 171988.Oct 31 2018, 11:52 AM
Address comments.
gtbercea marked 3 inline comments as done.Oct 31 2018, 11:53 AM
ABataev added inline comments.Oct 31 2018, 11:57 AM
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
463

Do you realy need the loop here? Seems to me it is better to have it in else branch of the if (IsWarpMaster) statement

gtbercea updated this revision to Diff 171993.Oct 31 2018, 12:35 PM

Move while on else branch.

gtbercea marked an inline comment as done.Oct 31 2018, 12:36 PM
ABataev added inline comments.Oct 31 2018, 12:45 PM
libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu
159 ↗(On Diff #171993)

Do you really need to remove this?

libomptarget/deviceRTLs/nvptx/src/supporti.h
191 ↗(On Diff #171993)

Same, do you really need to remove this?

gtbercea updated this revision to Diff 172007.Oct 31 2018, 1:37 PM
Reinstate assert.
gtbercea marked an inline comment as done.Oct 31 2018, 1:38 PM
gtbercea marked an inline comment as done.Oct 31 2018, 1:39 PM
gtbercea added inline comments.
libomptarget/deviceRTLs/nvptx/src/supporti.h
191 ↗(On Diff #171993)

Yes, leaving this in will lead to the whole malloc to be optimized out after inlining.

gtbercea marked an inline comment as done.Oct 31 2018, 1:39 PM
This revision was automatically updated to reflect the committed changes.