[AMDGPU] Add an AMDGPU specific atomic optimizer.
Needs ReviewPublic

Authored by sheredom on Wed, Sep 12, 2:36 AM.

Details

Reviewers
nhaehnle
arsenm
Summary

This commit adds a new IR level pass to the AMDGPU backend to perform atomic optimizations. It works by:

  • Running through a function and finding atomicrmw add/sub or uses of the atomic buffer intrinsics for add/sub.
  • If all arguments except the value to be added/subtracted are uniform, record the value to be optimized.
  • Run through the atomic operations we can optimize and, depending on whether the value is uniform/divergent use wavefront wide operations (DPP in the divergent case) to calculate the total amount to be atomically added/subtracted.
  • Then let only a single lane of each wavefront perform the atomic operation, reducing the total number of atomic operations in flight.
  • Lastly we recombine the result from the single lane to each lane of the wavefront, and calculate our individual lanes offset into the final result.

Diff Detail

sheredom created this revision.Wed, Sep 12, 2:36 AM
sheredom updated this revision to Diff 165041.Wed, Sep 12, 2:44 AM

Update to include tip LLVM changes (DivergenceAnalysis -> LegacyDivergenceAnalysis, AMDGPUAS is become just an enum now).

Should include tests running only the IR pass

lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
58

This can be dropped

61–62

This needs work, but should preserve the dominator tree

177

No getPrimitiveSizeInBits

208

All the consts are in the wrong place

213–214

I'm pretty sure there's a utility function that deals with what you're trying to do, that will also handle the DominatorTree updates for this

220

You pretty much should never use getPrimitiveSizeInBits directly. Use the DataLayout

230–235

You need to mark this call site as convergent

244–245

CreateIntrinsic

295–296

Can you use enums to derive these values

368–379

The canonical way to do this in IR is lshr x, 32 and truncate

arsenm added inline comments.Thu, Sep 13, 5:11 AM
lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
380

Checking a different value is suspicious

arsenm added inline comments.Thu, Sep 13, 5:14 AM
test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll
72

Missing -LABELs

sheredom added inline comments.Thu, Sep 13, 6:11 AM
lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
58

You mean I don't need to include a getPassName at all? I was just including it for consistency with the other AMDGPU level passes (a bunch at least have the method overridden).

208

I ran clang-format -i -style=file AMDGPUAtomicOptimizer.cpp and this is what it did!

244–245

I can't use the IRBuilder's CreateIntrinsic because it will then try and mangle the amdgcn_mbcnt_lo like llvm.amdgcn.mbcnt.lo.i32, whereas we expect the name to be sans-type-suffix.

nhaehnle added inline comments.Fri, Sep 14, 1:52 AM
lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
244–245

Ugh, this is a bug in CreateIntrinsic. It does:

Function *Fn = Intrinsic::getDeclaration(M, ID, { Args.front()->getType() });

... when in this particular case, the type vector should be empty.

There aren't that many users of CreateIntrinsic -- can you just fix CreateIntrinsic so that it takes an ArrayRef<Type*> like Intrinsic::getDeclaration? (As a separate preparatory patch obviously.)

sheredom added inline comments.Fri, Sep 14, 2:00 AM
lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
244–245

Just to be clear - do you want me to get the IRBuilder fix in first? Or is it fine to let this patch in as is, and then fixup during the IRBuilder change?

nhaehnle added inline comments.Fri, Sep 14, 2:22 AM
lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
244–245

I think we should do it right the first time.

sheredom added inline comments.Fri, Sep 14, 4:36 AM
lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
244–245
sheredom updated this revision to Diff 165490.Fri, Sep 14, 6:25 AM

Added an interaction with the DominatorTree, so that if its present in the PassManager we can preserve + update it.

sheredom marked 8 inline comments as done.Fri, Sep 14, 6:26 AM
sheredom marked an inline comment as done.

This still needs to be adjusted for D52087, and I have one minor comment, but mostly LGTM.

lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
340

The last two arguments are defaults, so you can just leave them out.

arsenm added inline comments.Mon, Sep 17, 6:56 AM
lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
58

Yes. We should be deleting them. I'm not sure why they exist in the first place - they are redundant with the name passed in the macro

arsenm added inline comments.Mon, Sep 17, 6:58 AM
lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
139–140

You can use I.getPointerOperand/getValOperand