This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] refactor WQM pass in preparation for WWM (NFCI)
ClosedPublic

Authored by cwabbott on Jul 17 2017, 5:48 PM.

Details

Summary

Right now, the WQM pass conflates two different things when tracking the
Needs of an instruction:

  1. Needs can be StateWQM, which is propagated to other instructions, and

means that this instruction (and everything it depends on) must be
calculated in WQM.

  1. Needs can be StateExact, which is not propagated to other

instructions, and means that this instruction must not be calculated in
WQM and WQM-ness must not be propagated past this instruction.

This works now because there are only two different states, but in the
future we want to be able to express things like "calculate this in WQM,
but please disable WWM and don't propagate it" (to implement
@llvm.amdgcn.set.inactive). In order to do this, we need to split the
per-instruction Needs field in two: a new Needs field, which can only
contain StateWQM (and in the future, StateWWM) and is propagated to
sources, and a Disables field, which can also contain just StateWQM or
nothing for now.

We keep the per-block tracking the same for now, by translating
Needs/Disables to the old representation with only StateWQM or
StateExact. The other place that needs special handling is when we
emit the state transitions. We could just translate back to the old
representation there as well, which we almost do, but instead of 0 as a
placeholder value for "any state," we explicitly or together all the
states an instruction is allowed to be in. This lets us refactor the
code in preparation for WWM, where we'll need to be able to handle
things like "this instruction must be in Exact or WQM, but not WWM."

Event Timeline

cwabbott created this revision.Jul 17 2017, 5:48 PM
tpr edited edge metadata.Jul 19 2017, 9:23 AM

Looks good to me, but will leave to Nicolai to review.

nhaehnle accepted this revision.Jul 20 2017, 2:36 PM

One minor comment. Either way, LGTM.

lib/Target/AMDGPU/SIWholeQuadMode.cpp
228–236

I think you can mask out the Flag first.

This revision is now accepted and ready to land.Jul 20 2017, 2:36 PM
cwabbott updated this revision to Diff 108375.Jul 26 2017, 3:47 PM

Mask out the flag first in markInstruction() to simplify the code a little and
avoid processing instructions unnecessarily once WWM is enabled.

cwabbott marked an inline comment as done.Jul 26 2017, 3:47 PM
cwabbott updated this revision to Diff 108710.Jul 28 2017, 2:07 PM
  • Fixed style issues.
  • Rebased on using setDesc() in lowerCopyInstrs()
  • Reworked logic for determining safe points to transition where SCC isn't live, to make using s_or_saveexec_b64 for entering WWM possible.
  • Added tests for WWM across multiple basic blocks and correct SCC handling. For the latter, I couldn't figure out how to make an IR test that exercised that path, so I made a MIR test for it.
cwabbott updated this revision to Diff 108720.Jul 28 2017, 3:01 PM

Fix WWM SCC test to actually test what it's supposed to, and fix a bug caught by it. Whoops.

Whoops, I accidentally squashed this with D35524 while rebasing. Those changes were meant to be applied to that.

cwabbott updated this revision to Diff 108734.Jul 28 2017, 3:48 PM

Reset to previous version. Whoops.

This revision was automatically updated to reflect the committed changes.