This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] gfx10 atomic optimizer changes.
ClosedPublic

Authored by foad on Aug 2 2019, 3:33 AM.

Details

Summary

Add support for gfx10, where all DPP operations are confined to work
within a single row of 16 lanes, and wave32.

Event Timeline

foad created this revision. Aug 2 2019, 3:33 AM
Herald added a project: Restricted Project. Aug 2 2019, 3:33 AM

I'm happy to split this up if the reviewers would like. There are a few NFC changes I could apply first, and/or I could try to split the wave32 changes out from the gfx10 DPP changes.

arsenm added inline comments. Aug 5 2019, 7:32 AM
llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
289

I think it would end up shorter, with less line wrapping, if you got the declaration for the update_dpp intrinsic once and reused it in all of these places.

293

I'm trying to avoid explicit getGeneration checks everywhere and to keep them all in the Subtarget.

foad marked an inline comment as done. Aug 5 2019, 7:55 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
293

You mean I should define and use some more specific properties like hasDPPBroadcasts and hasDPPWavefrontShifts?

foad updated this revision to Diff 213552. Aug 6 2019, 2:31 AM

Add new hasDPPBroadcasts and hasDPPWavefrontShifts.
Use CreateCall instead of CreateIntrinsic in new helper functions.
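The idea behind the new predicates can be sketched as follows: callers ask the subtarget about a specific capability instead of comparing generations themselves. This is a standalone model and not the LLVM source. SubtargetModel, Generation, ScanStrategy and chooseScanStrategy are hypothetical names, and the predicate bodies are an assumption based on the summary's statement that gfx10 confines DPP operations to a single row.

```cpp
// Hypothetical, simplified generation list for this model only.
enum class Generation { GFX8, GFX9, GFX10 };

// Stand-in for a subtarget class; not the real GCNSubtarget.
class SubtargetModel {
public:
  SubtargetModel(Generation gen, bool hasDPP) : gen_(gen), hasDPP_(hasDPP) {}

  bool hasDPP() const { return hasDPP_; }

  // Assumption: broadcasts and whole-wavefront shifts are only available
  // before gfx10. The generation comparison lives here and nowhere else.
  bool hasDPPBroadcasts() const {
    return hasDPP_ && gen_ < Generation::GFX10;
  }
  bool hasDPPWavefrontShifts() const {
    return hasDPP_ && gen_ < Generation::GFX10;
  }

private:
  Generation gen_;
  bool hasDPP_;
};

// Hypothetical strategies a pass might pick between.
enum class ScanStrategy { NoDPP, WavefrontDPP, RowDPPWithCrossRowStep };

// The caller never looks at the generation, only at capabilities.
ScanStrategy chooseScanStrategy(const SubtargetModel& st) {
  if (!st.hasDPP()) {
    return ScanStrategy::NoDPP;
  }
  if (st.hasDPPBroadcasts() && st.hasDPPWavefrontShifts()) {
    return ScanStrategy::WavefrontDPP;
  }
  return ScanStrategy::RowDPPWithCrossRowStep;
}
```

Keeping the generation comparison inside the subtarget means a later target with different DPP capabilities would only need the predicates updated, not every call site.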

foad marked 2 inline comments as done. Aug 6 2019, 2:32 AM
arsenm accepted this revision. Aug 18 2019, 8:21 AM

LGTM

This revision is now accepted and ready to land. Aug 18 2019, 8:21 AM
This revision was automatically updated to reflect the committed changes.