[libomptarget] Specialize amdgpu devicertl on wave size for gfx10
Needs Review · Public

Authored by JonChesterfield on Aug 19 2021, 12:00 PM.

Details

Summary

Use 32 bit arithmetic instead of relying on llvm to recognise
that the high half of various uint64_t values is zero for wave32 code.

Performance optimisation only. Relies on D108380 and D108391.
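
As a rough illustration of the idea in the summary (not the code in this patch), lane masks can be given a type that depends on the wave size, so that wave32 code does 32 bit arithmetic directly instead of depending on the optimiser to prove the high half of a `uint64_t` is zero. All names here (`LaneMask`, `lanemask_t`, `lanes_below`, `popcount`, `rank_in_wave`) are hypothetical and chosen for the sketch only.

```cpp
#include <cstdint>

// Hypothetical sketch: none of these names come from the deviceRTL sources.
// Select the lane mask width from the wave size at compile time.
template <unsigned WaveSize> struct LaneMask;
template <> struct LaneMask<32> { using type = uint32_t; };
template <> struct LaneMask<64> { using type = uint64_t; };

template <unsigned WaveSize>
using lanemask_t = typename LaneMask<WaveSize>::type;

// Mask with one bit set for every lane strictly below `lane`.
// Requires lane < WaveSize.
template <unsigned WaveSize>
constexpr lanemask_t<WaveSize> lanes_below(unsigned lane) {
  using T = lanemask_t<WaveSize>;
  return static_cast<T>((T{1} << lane) - 1);
}

// Portable population count; a real device runtime would use a builtin.
template <typename T>
constexpr unsigned popcount(T v) {
  unsigned n = 0;
  for (; v != 0; v &= v - 1)
    ++n;
  return n;
}

// Position of `lane` among the active lanes of the wave.
// For WaveSize == 32 every operation here is 32 bits wide.
template <unsigned WaveSize>
constexpr unsigned rank_in_wave(lanemask_t<WaveSize> active, unsigned lane) {
  return popcount<lanemask_t<WaveSize>>(active & lanes_below<WaveSize>(lane));
}
```

With this shape, `rank_in_wave<32>` never widens to 64 bits, while `rank_in_wave<64>` keeps the existing behaviour.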

Event Timeline

JonChesterfield requested review of this revision. Aug 19 2021, 12:00 PM
Herald added a project: Restricted Project.

Low priority, posting it so I don't forget about it. Would remove the only reviewer but phab automatically re-adds you.

JonChesterfield planned changes to this revision. Aug 26 2021, 2:12 AM

Only useful after D108708 has landed.

openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip
144–149

This should be a 'runtime' switch on gridvalues; will update the patch.
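
A minimal sketch of what a 'runtime' switch on the wave size could look like, assuming a table of per-target values is available to the code. The `GridValues` struct, its `warp_size` field, and the helper names below are hypothetical stand-ins for illustration, not the actual gridvalues interface.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-target grid values table.
struct GridValues {
  unsigned warp_size; // 32 or 64 on amdgpu in this sketch
};

// Portable population counts; a device runtime would use builtins instead.
inline unsigned popcount32(uint32_t v) {
  unsigned n = 0;
  for (; v != 0; v &= v - 1)
    ++n;
  return n;
}

inline unsigned popcount64(uint64_t v) {
  unsigned n = 0;
  for (; v != 0; v &= v - 1)
    ++n;
  return n;
}

// Count active lanes, taking the 32 bit path when the table reports wave32.
// In wave32 mode the high half of the mask is assumed to be zero, so
// truncating it loses nothing.
inline unsigned count_active(const GridValues &gv, uint64_t mask) {
  if (gv.warp_size == 32)
    return popcount32(static_cast<uint32_t>(mask));
  return popcount64(mask);
}
```

If the wave size is a compile time constant after inlining, the branch folds away and the result should match a compile time specialisation.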

JonChesterfield added inline comments.
openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h
52

As a performance optimisation, this is probably in the noise.

However, it will eliminate all the warp32 vs wave64 differences in the deviceRTL, making gfx10 a useful datapoint when debugging code that works on nvptx and fails on amdgpu. That is, if gfx10 works, it suggests the bug is in wave size. If it fails, it suggests the bug is not in wave size.