This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Do not clobber SCC in SIWholeQuadMode
ClosedPublic

Authored by nhaehnle on Jul 10 2016, 7:03 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

nhaehnle updated this revision to Diff 63426.Jul 10 2016, 7:03 AM
nhaehnle retitled this revision from to AMDGPU: Do not clobber SCC in SIWholeQuadMode.
nhaehnle updated this object.
nhaehnle added a subscriber: llvm-commits.
arsenm edited edge metadata.Jul 13 2016, 8:26 AM

Is WQM needed for correctness or is it just an optimization?

lib/Target/AMDGPU/SIWholeQuadMode.cpp
510–511 ↗(On Diff #63426)

These can be combined with findRegisterUseOperandIdx

551 ↗(On Diff #63426)

This function is kind of complicated to special case track SCC liveness. Can you use LivePhysRegs or check the LiveIntervals to simplify it (I think there was some issue with tracking physical registers with LIS last time I tried to do it, but I don't remember)

566–569 ↗(On Diff #63426)

The Clean = true can be hoisted to the initializer

572–573 ↗(On Diff #63426)

range loop

574 ↗(On Diff #63426)

Maybe this should get the read operand and see if it is undef?

578–579 ↗(On Diff #63426)

Should check if it's dead?

597 ↗(On Diff #63426)

I think these should use MI's DebugLoc

605–606 ↗(On Diff #63426)

Insert through LIS

609 ↗(On Diff #63426)

This can just return Restore

621 ↗(On Diff #63426)

Should set DebugLoc

638 ↗(On Diff #63426)

Should set DebugLoc

675 ↗(On Diff #63426)

Capitalize

test/CodeGen/AMDGPU/wqm.ll
423 ↗(On Diff #63426)

I don't see check lines for the cmp + select restore pattern here. Where is the scc def? This test also probably needs a comment

429 ↗(On Diff #63426)

CHECK-NEXT for s_cbranch_scc?

Can you post an MI dump example of when this happens? It seems like this isn't something you should need to handle

nhaehnle updated this revision to Diff 64251.Jul 17 2016, 6:14 AM
nhaehnle marked 4 inline comments as done.
nhaehnle edited edge metadata.

Rewrite in terms of LiveIntervals. (LivePhysRegs complains about remaining
virtual registers.)

Not sure about those DebugLocs. I'm not very familiar with their semantics,
but there isn't usually one particular instruction that can be pinned down
as the reason for inserting a new instruction (for example, when a new
instruction is added at the end of a basic block).

WQM is needed for correctness of pixel shaders. It must be enabled for all calculations that feed into derivative calculations (explicit ones via DS and implicit ones via image sample instructions), and it must be disabled for all stores.

test/CodeGen/AMDGPU/wqm.ll
423 ↗(On Diff #63426)

That's because I didn't actually manage to concoct a test where the pattern is unavoidable. Usually, the SCC def ends up right before the SCC use, so that the WQM instruction can just be moved around it.

I've tested the pattern with some artificial hacks in the code (in the latest version by setting First == Last in prepareInsertion).

The problem is that I cannot prove that it will never be needed. For example, it might be needed if the machine scheduler makes unusual decisions about moving store instructions between the SCC def and the SCC use.

Can you post an MI dump example of when this happens? It seems like this isn't something you should need to handle

The added test case in an example (I actually encountered this out in the real world in a piglit test.)

As you can see from the excerpt at http://paste.ubuntu.com/19764805/, the most natural place to insert the s_wqm_b64 would be between the s_cmp_lt and the s_cbranch. What the new code does instead is to maintain a range of places where the s_wqm_b64 could be inserted (based on the correctness constraints for WQM), and then finds a place in that range where SCC isnt' live.

nhaehnle updated this revision to Diff 64855.Jul 21 2016, 4:19 AM

Subsequent work revealed that there can be inconsistencies with SCC, so fix
those.

arsenm added inline comments.Jul 21 2016, 3:36 PM
include/llvm/CodeGen/LiveIntervalAnalysis.h
395–400 ↗(On Diff #64855)

This should probably be a separate patch

lib/Target/AMDGPU/SIWholeQuadMode.cpp
531–535 ↗(On Diff #64855)

I think this should use -1 instead of 1, and use compare != 0 to be more canonical

531–535 ↗(On Diff #64855)

Maybe you should just emit COPY and move this expansion to copyPhysReg for scc

nhaehnle updated this revision to Diff 65495.Jul 26 2016, 4:17 AM
nhaehnle marked 3 inline comments as done.
  • rebase
  • use COPY from/to SCC for save
  • separate removeRegUnit into its own patch
nhaehnle updated this revision to Diff 70365.Sep 6 2016, 2:26 AM

It turns out that r280589 introduced a piglit regression that needs this
change to be applied already now rather than as part of later WQM rework.

I had originally planned this after D22195, but with the recent changes to
control flow lowering that other patch is currently not ready to land / be
adapted, so I adjusted this SCC fix patch so that it applies on current SVN
trunk.

arsenm added inline comments.Sep 6 2016, 10:04 AM
lib/Target/AMDGPU/SIWholeQuadMode.cpp
424 ↗(On Diff #70365)

MI.isTransient() might be more accurate. INLINEASM for example will not be removed

nhaehnle updated this revision to Diff 70725.Sep 8 2016, 10:38 AM

Use MI.isTransient()

arsenm accepted this revision.Sep 9 2016, 11:22 AM
arsenm edited edge metadata.

LGTM

This revision is now accepted and ready to land.Sep 9 2016, 11:22 AM
This revision was automatically updated to reflect the committed changes.