- With that explicit exec mask manipulation, we may clobber global VGPRs during SGPR spilling or reloading. For instance, the following pseudocode illustrates such clobbering:
  v[62:63] = def(...);
  spill(v[62:63], stack.0);
  for (loop.cond) {
    ... = use(v[62:63]);
    if (branch.cond) {
      ... reuse of v[62:63];
      ... SGPR reload through v62 or v63;
      v[62:63] = reload(stack.0);
      // At this point, v[62:63] is clobbered if branch.cond doesn't
      // cover lanes 0 and 1.
    }
  }
- Regarding the concerns in the original patch, we should not need to worry about SGPR spills and reloads executing under different exec masks. Since the IR has been de-SSA-ed from the original SSA form, a def is guaranteed to dominate all of its uses, so the point where a value is spilled always dominates the point where that value is reloaded. The exec mask at the reload point is therefore guaranteed to be a subset of the exec mask at the spill point. As long as the SGPR is broadcast to a VGPR at the spill point and v_readfirstlane is used to read the SGPR back from the VGPR at the reload point, the original SGPR value is always recovered under that exec mask.
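  A minimal sketch of that spill/reload pairing, in the same pseudocode style as the example above; the spilled SGPR s4 and the scratch VGPR v62 are illustrative choices, not taken from the patch:

    // Spill point, exec = E_spill: broadcast the SGPR into all active lanes.
    v_mov_b32 v62, s4            // every active lane of v62 now holds s4

    // ... control flow; exec can only shrink, so at the reload point
    // exec = E_reload, a subset of E_spill ...

    // Reload point: read the value back from the first active lane. That
    // lane was also active at the spill point, so the original s4 value
    // is recovered.
    v_readfirstlane_b32 s4, v62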
I don't think we need this; these instructions have already been lowered to v_readfirstlane by the time we try to optimize away the skip jump.