This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add llvm.amdgcn.softwqm intrinsic
ClosedPublic

Authored by critson on Jul 18 2019, 10:05 AM.

Details

Summary

Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm only if there is other WQM computation in the shader.

Diff Detail

Event Timeline

critson created this revision.Jul 18 2019, 10:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2019, 10:05 AM
arsenm added inline comments.Jul 18 2019, 10:16 AM
lib/Target/AMDGPU/SIISelLowering.cpp
5955–5959 ↗(On Diff #210619)

Is there some reason you can't just handle this with an instruction pattern?

critson marked an inline comment as done.Jul 19 2019, 3:25 AM
critson added inline comments.
lib/Target/AMDGPU/SIISelLowering.cpp
5955–5959 ↗(On Diff #210619)

For the same reason as llvm.amdgcn.wqm, we don't specify the input and output types.
Happy to be corrected, but I don't think there is a way to have a single instruction pattern covering all types.

Have you checked that this actually fixes the reported CTS failure?

IIRC the CTS failure was essentially due to a shader of the form:

derivative calculation here
subgroup operation here

The derivative calculation enables WQM, but then we may leave WQM again for the subgroup operations which is unexpected (since helper lanes are expected to participate). So softwqm needs to seed WQM requirements, but only if there is at least one hard wqm requirement in the shader.

arsenm added inline comments.Jul 19 2019, 6:43 AM
lib/Target/AMDGPU/SIISelLowering.cpp
5955–5959 ↗(On Diff #210619)

It's easier to directly select than to enumerate all the possible types. I would still expect all of these direct-to-machine-node intrinsics to be handled in AMDGPUISelDAGToDAG

critson updated this revision to Diff 210865.Jul 19 2019, 11:04 AM

Add missing code in SI Fix SGPR copies.

Have you checked that this actually fixes the reported CTS failure?

Yes, with the associated (minimal) frontend changes this fixes the CTS failure.

The derivative calculation enables WQM, but then we may leave WQM again for the subgroup operations which is unexpected (since helper lanes are expected to participate). So softwqm needs to seed WQM requirements, but only if there is at least one hard wqm requirement in the shader.

While my understanding of "seed requirements" means "for the whole shader", this code does what you expect.
If there are any hard WQM requirements for the shader, then all softwqm instructions (and their dependencies) are marked WQM.

Okay thanks, I see the logic now.

lib/Target/AMDGPU/SIISelLowering.cpp
5955–5959 ↗(On Diff #210619)

You mean adding an AMDGPUDAGToDAGISel::SelectINTRINSIC_WO_CHAIN and lowering the softwqm intrinsic there? That does make sense to me.

lib/Target/AMDGPU/SIInstructions.td
114

s/wcm/wqm/

critson updated this revision to Diff 211099.Jul 22 2019, 7:41 AM

Move opcode selection to AMDGPUISelDAGToDAG.
Fix typo in comment.

critson marked an inline comment as done.Jul 22 2019, 7:44 AM

I've moved the selection to AMDGPUISelDAGToDAG.
If this code is appropriate I will submit a follow change to move the selection for llvm.amdgcn.wqm and llvm.amdgcn.wwm as well.

nhaehnle accepted this revision.Jul 24 2019, 1:27 AM

LGTM. Followup for WQM/WWM sounds good to me as well.

This revision is now accepted and ready to land.Jul 24 2019, 1:27 AM
This revision was automatically updated to reflect the committed changes.