Whole Wavefront Mode (WWM) is required for implementing wavefront
reductions in non-uniform control flow, where the inactive lanes are
used to propagate intermediate results and therefore need to be
enabled. WWM must be propagated to uses (unless they are explicitly
marked as exact) so that they also propagate intermediate results
correctly.

The analysis and exec mask manipulation are done in the WQM pass,
since other, non-WWM instructions may be mixed in with the WWM
instructions. We would like to avoid the overhead of switching back
and forth where we can, and only the WQM pass has the information
needed to do so.

For simplicity, WWM is entirely block-local: a block is never in WWM
on entry or exit, and WWM is not propagated to block inputs/outputs.
This means that computations involving WWM cannot involve control
flow, but we only ever plan to use WWM for a few limited purposes
(none of which involve control flow) anyway.
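
To illustrate why the inactive lanes have to be enabled, here is a
minimal sketch in Python that simulates a shift-and-add reduction over
an 8-lane wavefront. It is a simplified model, not the hardware
semantics or the code in this commit: the names dpp_shift_add and
reduce_sum are hypothetical, a disabled lane is assumed to simply keep
its old value, and inactive lanes are assumed to be seeded with the
identity (0).

```python
def dpp_shift_add(values, exec_mask, shift):
    """One reduction step: each enabled lane adds the value from the lane
    `shift` positions below it. Disabled lanes keep their old value
    (simplifying assumption; real DPP behaviour has more cases)."""
    out = list(values)
    for lane in range(len(values)):
        if exec_mask[lane] and lane >= shift:
            out[lane] = values[lane] + values[lane - shift]
    return out


def reduce_sum(values, active, whole_wavefront):
    """Sum the values of the active lanes; the result ends up in the last lane.

    With whole_wavefront=True every lane is enabled for the intermediate
    steps (the WWM case). Otherwise only the originally active lanes run.
    """
    # Inactive lanes contribute the identity element of the reduction.
    lanes = [v if a else 0 for v, a in zip(values, active)]
    exec_mask = [True] * len(lanes) if whole_wavefront else list(active)
    shift = 1
    while shift < len(lanes):
        lanes = dpp_shift_add(lanes, exec_mask, shift)
        shift *= 2
    return lanes[-1]


values = [1, 2, 3, 4, 5, 6, 7, 8]
active = [True, False, True, True, False, False, True, True]
expected = sum(v for v, a in zip(values, active) if a)

# All lanes enabled: partial sums flow through the inactive lanes.
assert reduce_sum(values, active, whole_wavefront=True) == expected
# Only active lanes enabled: partial sums parked in inactive lanes are lost.
assert reduce_sum(values, active, whole_wavefront=False) != expected
```

In this model the non-WWM run drops a contribution because the partial
sum that should pass through a disabled lane is never written there,
which is the situation WWM is meant to avoid.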

Right now, the only way to specify WWM is through a pseudo operand on
DPP instructions (added in a separate change). This commit also adds
support for WQM on DPP instructions through wqm_ctrl.