Add support for gfx10, where all DPP operations are confined to work
within a single row of 16 lanes, and wave32.
Details
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
I'm happy to split this up if the reviewers would like. There are a few NFC changes I could apply first, and/or I could try to split the wave32 changes out from the gfx10 DPP changes.
llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp | ||
---|---|---|
289 | I think it would end up being shorter/less line wrapping if you separately got the declaration for the update_dpp intrinsic and reused it in all of these places | |
293 | I'm trying to avoid explicit getGeneration checks everywhere and to keep them all in the Subtarget. |
llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp | ||
---|---|---|
293 | You mean I should define and use some more specific properties like hasDPPBroadcasts and hasDPPWavefrontShifts? |
Add new hasDPPBroadcasts and hasDPPWavefrontShifts.
Use CreateCall instead of CreateIntrinsic in new helper functions.
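To illustrate the reviewer's point about keeping generation checks inside the Subtarget, here is a minimal, self-contained sketch. It does not use the real LLVM classes: `SubtargetSketch`, `Generation`, `ScanStrategy` and `pickScanStrategy` are hypothetical stand-ins, and the exact conditions behind `hasDPPBroadcasts` and `hasDPPWavefrontShifts` are assumed from the summary (on gfx10, DPP operations stay within a row of 16 lanes), not copied from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the hardware generations; not the LLVM enum.
enum class Generation { GFX8, GFX9, GFX10 };

// Hypothetical stand-in for the subtarget class. The only place that
// inspects the generation is inside these feature predicates.
class SubtargetSketch {
  Generation Gen;
  bool HasDPP;

public:
  SubtargetSketch(Generation G, bool DPP) : Gen(G), HasDPP(DPP) {}

  bool hasDPP() const { return HasDPP; }

  // Assumption: broadcasts cross 16-lane rows, so they are modelled as
  // unavailable from gfx10 onwards.
  bool hasDPPBroadcasts() const {
    return HasDPP && Gen < Generation::GFX10;
  }

  // Assumption: whole-wavefront shifts also cross rows, so they get the
  // same condition in this sketch.
  bool hasDPPWavefrontShifts() const {
    return HasDPP && Gen < Generation::GFX10;
  }
};

// Hypothetical pass-side decision. Note that it asks about features only
// and never calls a generation query itself.
enum class ScanStrategy { None, RowOnly, CrossRow };

ScanStrategy pickScanStrategy(const SubtargetSketch &ST) {
  if (!ST.hasDPP())
    return ScanStrategy::None;
  if (ST.hasDPPBroadcasts() && ST.hasDPPWavefrontShifts())
    return ScanStrategy::CrossRow;
  // DPP is present but confined to a single row of 16 lanes.
  return ScanStrategy::RowOnly;
}
```

With this shape, the pass only needs to change if a new feature predicate is introduced, not whenever a new generation is added.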