This allows the scheduler to do a better job because it doesn't see the
exec mask manipulation instructions inserted by SIWholeQuadMode, which
act as scheduling barriers. In particular, it can do a better job of
grouping image instructions with implicit derivatives.
WORK IN PROGRESS: some tests are still failing, and the generated code
may not be correct.