This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions
ClosedPublic

Authored by tstellarAMD on Jan 26 2016, 1:58 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

tstellarAMD retitled this revision from to AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions.
tstellarAMD updated this object.
tstellarAMD added a reviewer: arsenm.
tstellarAMD added a subscriber: llvm-commits.
arsenm edited edge metadata.Jan 26 2016, 7:14 PM

Thanks for finally doing this. Does this fix infinite loops? I didn't see one in the new tests, and I think there are a couple of XFAILed ones right now.

I think we should put an assert somewhere that the immediate for the branch target isn't overflowing. I'm worried that we could end up hitting that limit someday.

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
42 ↗(On Diff #46051)

Extra space before (

lib/Target/AMDGPU/SIFixSGPRLiveRanges.cpp
113 ↗(On Diff #46051)

Indentation looks off

115 ↗(On Diff #46051)

You should use TII->isSOPP(*I) here.

I think there should be an assert that I->isBranch()

lib/Target/AMDGPU/SIISelLowering.cpp
1168 ↗(On Diff #46051)

It seems like there should be an assert or something that this is actually a branch intrinsic

1169 ↗(On Diff #46051)

Typo leaglize

1172–1177 ↗(On Diff #46051)

The SetCC in the if should be folded into a single assert, or this whole block be put under an #ifndef NDEBUG block

lib/Target/AMDGPU/SIInstrInfo.cpp
2913–2914 ↗(On Diff #46051)

I think this could use a better name. How about something like
addSCCDefUsersToVALUWorklist?

2915–2916 ↗(On Diff #46051)

We should probably have a verifier check for this

2924–2925 ↗(On Diff #46051)

Why is this limited to branches? What if you had another user like a s_cselect? I would expect any user to need processing

lib/Target/AMDGPU/SIInstrInfo.td
252 ↗(On Diff #46051)

Extra space

855 ↗(On Diff #46051)

Commented out code

lib/Target/AMDGPU/SIInstructions.td
465 ↗(On Diff #46051)

Looks unnecessary

3118–3119 ↗(On Diff #46051)

This looks like the workaround for a tablegen bug with multiple outputs to reg_sequence. Maybe note that in a FIXME?

lib/Target/AMDGPU/SIRegisterInfo.cpp
413 ↗(On Diff #46051)

This should be the last in the array. The more common types should be first

test/CodeGen/AMDGPU/si-annotate-cf.ll
43–46 ↗(On Diff #46051)

Why this test change and the others like it?

test/CodeGen/AMDGPU/uniform-cfg.ll
362 ↗(On Diff #46051)

I think there should be a test where an icmp is the branch condition in a different block (there might be on here that I missed). One with the icmp used as the local branch's condition and another block's as well.

There are a few other perverse cases that might be nice to test, like branch condition on inline asm output.

mareko added a subscriber: mareko.Jan 27 2016, 3:10 AM

Is it this hardware bug handled properly?

Description: VCCZ bit corrupted by SMRD
Scope: SI+CI
Solution: Anytime a Scalar Memory read is issued, the shader compiler must assume that VCCZ is no longer valid AND cannot be loaded with a new value until all of the scalar memory reads have completed and returned their data, as guaranteed by inserting an S_WAITCNT LGKM_CNT(0) before issuing an instruction which updates VCCZ.

tstellarAMD marked 8 inline comments as done.Jan 29 2016, 9:11 AM
tstellarAMD added inline comments.
lib/Target/AMDGPU/SIFixSGPRLiveRanges.cpp
115 ↗(On Diff #46051)

You mean the assert should be inside the branch?

lib/Target/AMDGPU/SIInstrInfo.cpp
2924–2925 ↗(On Diff #46051)

If we don't limit this to branch instructions, then any instructions in the block which read scc will be moved to VALU, which we don't want. For most instructions the decision to move them to the VALU is based on explicit uses. This function needs to handle the cases where changing and implicit uses would force the instruction to go to the VALU.

test/CodeGen/AMDGPU/si-annotate-cf.ll
43–46 ↗(On Diff #46051)

Most of the tests with branching assume divergent branches, but with these changes most branches are uniform, so I forced branches to be divergent in places where the test assumes divergence.

arsenm added inline comments.Jan 29 2016, 9:08 PM
lib/Target/AMDGPU/SIFixSGPRLiveRanges.cpp
115 ↗(On Diff #46051)

Actually since this is from getFirstTerminator, I think the MachineVerifier would already catch this so it's not important

lib/Target/AMDGPU/SIInstrInfo.cpp
2924–2925 ↗(On Diff #46051)

But if the SCC def instruction was moved, any non-branch users would also need to be moved. We don't want all SCC users, just the ones until the next SCC def which is what this loop does. I don't think being implicit avoids the problem of needing to move the users.

Also I think this function should be only called for isDef() below. I think this works because the def check is still done for the original def instruction, but is more work to follow. This will end up getting called twice if you happen to have an addc to branch on overflow which has both a use and a def, and it will end up moving more instructions from the use.

The loop should also probably start at SCCDefInst + 1 also.

tstellarAMD marked 7 inline comments as done.Feb 5 2016, 4:12 PM
arsenm added a comment.Feb 5 2016, 4:45 PM

Did you mean to attach a new patch?

tstellarAMD edited edge metadata.

Clean-up moving SCC users to VALU.

arsenm accepted this revision.Feb 9 2016, 3:24 PM
arsenm edited edge metadata.

LGTM

This revision is now accepted and ready to land.Feb 9 2016, 3:24 PM
This revision was automatically updated to reflect the committed changes.