This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][AMDGPU] Use DS_Max_Warp_Number instead of WARPSIZE
ClosedPublic

Authored by pdhaliwal on Sep 3 2020, 5:10 AM.

Details

Summary

The size of worker_rootS should have been DS_Max_Warp_Number.
This reduces memory usage by deviceRTL on AMDGPU from around 2.3GB
to around 770MB.

Diff Detail

Event Timeline

pdhaliwal created this revision.Sep 3 2020, 5:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 3 2020, 5:10 AM
pdhaliwal requested review of this revision.Sep 3 2020, 5:10 AM

This can't break nvptx as WARPSIZE == DS_Max_Warp_Number.

Can you share your reasoning for amdgcn? The control flow is difficult to follow in this area

jdoerfert requested changes to this revision.Sep 3 2020, 8:25 AM

Other places need to be updated too: data_sharing_init_stack_common for one.

Do we actually use this array? What happens if you make it a single element and update the function above?

This revision now requires changes to proceed.Sep 3 2020, 8:25 AM

Other places need to be updated too: data_sharing_init_stack_common for one.

Do we actually use this array? What happens if you make it a single element and update the function above?

IIRC this was only used in the old data sharing scheme for nested parallelism on the GPU, see also http://lists.llvm.org/pipermail/openmp-dev/2018-September/002160.html. This seems to be gone since D83349, AFAICT the arrays are only initialized (via the various code paths) but never used.

Other places need to be updated too: data_sharing_init_stack_common for one.

Do we actually use this array? What happens if you make it a single element and update the function above?

IIRC this was only used in the old data sharing scheme for nested parallelism on the GPU, see also http://lists.llvm.org/pipermail/openmp-dev/2018-September/002160.html. This seems to be gone since D83349, AFAICT the arrays are only initialized (via the various code paths) but never used.

In that case, we should remove it.

JonChesterfield accepted this revision.Sep 3 2020, 4:09 PM

It might be dead. Difficult to tell. The control flow being spread across codegen and the runtime obfuscates things.

I've convinced myself that the change proposed here is right, in that the access to the data structure are based on warp id, which is bounded by DS_Max_Warp_Number. On the basis that it's definitely a NFC for nvptx and reduces an object from 2.4gb to 600mb on amdgcn without loss of functionality, I say we accept this as-is. It's a step in a good direction. If we can determine that all this stuff is dynamically dead and delete it, great. This patch won't obstruct that.

Somewhat related - __kmpc_spmd_kernel_init is always passed '0' for RequiresDataSharing, so the code in that function which writes to DataSharingState is dead. So we should drop the always-zero argument and remove the dead code. It's probably worth doing another pass over codegen looking for functions that are only ever called with the same constant and working through the dead code elimination that falls out.

Only places where it was accessed are here and here. Jon's observation is correct. The maximum number of threads on both amdgcn and nvptx is 1024. However, on amdgcn, wave size is 64 and so maximum number of waves can be 16 and on nvptx, the warp size is 32 and maximum number of warps is 32.

And, this array is not completely dead yet as generic mode uses this for locals allocation whenever compiler is unable to properly detect if the method is being executed in generic mode or spmd mode (atleast on amdgcn).

@jdoerfert thanks for noticing, our local fork was already using DS_Max_Warp_Number in data_sharing_init_stack_common, I will change this here as well. This will not impact nvptx.

pdhaliwal updated this revision to Diff 289869.Sep 3 2020, 10:51 PM

Updated data_sharing_stack_init_common

This revision is now accepted and ready to land.Sep 4 2020, 3:43 PM
This revision was landed with ongoing or failed builds.Sep 7 2020, 2:15 AM
This revision was automatically updated to reflect the committed changes.