This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Refactor exp instructions
ClosedPublic

Authored by arsenm on Aug 16 2015, 1:44 PM.

Details

Summary

Structure the definitions a bit more like the other classes.

The main change here is to split EXP with the done bit set
to a separate opcode, so we can set hasSideEffects = 1
if the done bit is in use since this has the special constraint
that it should be the last exp in the shader.

Previously all exp instructions were inferred to have unmodeled
side effects.

Diff Detail

Event Timeline

arsenm updated this revision to Diff 32254.Aug 16 2015, 1:44 PM
arsenm retitled this revision from to AMDGPU: Refactor exp instructions.
arsenm updated this object.
arsenm added a reviewer: tstellarAMD.
arsenm added a subscriber: llvm-commits.
mareko added a subscriber: mareko.Aug 17 2015, 4:13 AM

DONE doesn't always mean it's the last EXP in the shader. It's only true for pixel shaders. For vertex shaders, it means it's the last Position export. Param vertex shader exports are unaffected and can be moved arbitrarily.

I think the DONE bit should be added at one of the last passes of compilation.

Especially we want to be free at Scheduling step to move around the exp instructions and have them the order we want.
I guess the export may take some time, so earlier the exp, the best.

We can also introduce in the SI scheduler some way to incitate to have the position exports first for vertex shaders.

I'm see cannot select error for EXPORT_DONE with this test case:

define void @main([9 x <16 x i8>] addrspace(2)* byval, [17 x <16 x i8>] addrspace(2)* byval, [17 x <4 x i32>] addrspace(2)* byval, [34 x <8 x i32>] addrspace(2)* byval, float inreg, i32 inreg, <2 x i32>, <2 x i32>, <2 x i32>, <3 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, float, float, float, float, float, float, i32, float, float) #0 {
main_body:

call void @llvm.SI.export(i32 0, i32 1, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0)
ret void

}

declare void @llvm.SI.export(i32, i32, i32, i32, i32, i32, i32, i32, i32)

attributes #0 = { "ShaderType"="0" "enable-no-nans-fp-math"="true" }

Please ignore what I said. I've talked with a Catalyst guy and he said we should emit exports as late in the shader as possible. The export memory can be allocated in the first export instruction and it limits the number of waves that can be run from that point. It's okay to reorder exports, but please don't emit them too early.

Please ignore what I said. I've talked with a Catalyst guy and he said we should emit exports as late in the shader as possible. The export memory can be allocated in the first export instruction and it limits the number of waves that can be run from that point. It's okay to reorder exports, but please don't emit them too early.

This patch shouldn't effect the scheduling of exports much if any

tstellarAMD requested changes to this revision.Dec 1 2015, 8:46 AM
tstellarAMD edited edge metadata.
This revision now requires changes to proceed.Dec 1 2015, 8:46 AM
arsenm added a comment.Dec 1 2015, 1:14 PM

What needs to be done on this? The cannot select error is a frontend bug

arsenm updated this revision to Diff 63997.Jul 14 2016, 10:01 AM
arsenm edited edge metadata.

Update for trunk. Set mayLoad = 1 for EXP_DONE instead of hasSideEffects

arsenm updated this revision to Diff 64002.Jul 14 2016, 10:18 AM
arsenm edited edge metadata.

Change integer type

Could you briefly describe what effect this will have on behavior? (e.g. scheduling)

Update for trunk. Set mayLoad = 1 for EXP_DONE instead of hasSideEffects

Could you explain why ? To me it looks like hasSideEffects is the only way to guarantee the exp with the done bit is last.

Btw you forgot to update the comments in the part "Export opcodes" to reflect that change.

Update for trunk. Set mayLoad = 1 for EXP_DONE instead of hasSideEffects

Could you explain why ? To me it looks like hasSideEffects is the only way to guarantee the exp with the done bit is last.

Btw you forgot to update the comments in the part "Export opcodes" to reflect that change.

If exp_done is reading from unknown memory, it can't be reordered before the stores to unknown memory

Could you briefly describe what effect this will have on behavior? (e.g. scheduling)

I don't think it should have that much impact on scheduling, there might be some changes around the done part

Update for trunk. Set mayLoad = 1 for EXP_DONE instead of hasSideEffects

Could you explain why ? To me it looks like hasSideEffects is the only way to guarantee the exp with the done bit is last.

Btw you forgot to update the comments in the part "Export opcodes" to reflect that change.

I see some regressions with the mayLoad bit set for exp_done, but without it I see minimal changes: https://ghostbin.com/paste/b6ypo

I can drop the separate bit setting part of this patch for now and revisit it later

arsenm updated this revision to Diff 78118.Nov 15 2016, 6:12 PM

Update for trunk

arsenm updated this revision to Diff 78205.Nov 16 2016, 10:01 AM

Reorder operands

The TODOs need to be addressed, and for the FIXMEs in the intrinsic definition, I'd say we're better off just adding a real llvm.amdgcn.export (+ .done?) intrinsic in the future. Some more inline comments, but overall I think this patch is fine.

lib/Target/AMDGPU/AMDGPUInstrInfo.td
264–266

SDNPMayLoad instead of SDNPSideEffect for consistency.

lib/Target/AMDGPU/SIInstrInfo.td
680–681

s/hasSideEffects/mayLoad/

The TODOs need to be addressed, and for the FIXMEs in the intrinsic definition, I'd say we're better off just adding a real llvm.amdgcn.export (+ .done?) intrinsic in the future. Some more inline comments, but overall I think this patch is fine.

The only TODOs I see left are addressed in the D12068

FYI, I'm working on a pass that moves all EXP instructions to the end of shaders after the machine scheduler. We don't want any scheduling optimizations for those.

arsenm updated this revision to Diff 78556.Nov 18 2016, 11:00 AM
arsenm marked 2 inline comments as done.

Address comments, merge changes from other patch

arsenm updated this revision to Diff 78558.Nov 18 2016, 11:18 AM

Attach right patch. Bit inputs are temporarily printed as -1 until the next patch which fixes the printing

nhaehnle accepted this revision.Nov 21 2016, 1:42 AM
nhaehnle added a reviewer: nhaehnle.

LGTM

arsenm accepted this revision.Dec 5 2016, 12:35 PM
arsenm added a reviewer: arsenm.

r288695 with waitcnt insert fix

tstellarAMD accepted this revision.Dec 5 2016, 1:47 PM
tstellarAMD edited edge metadata.
This revision is now accepted and ready to land.Dec 5 2016, 1:47 PM
arsenm closed this revision.Dec 5 2016, 2:18 PM