- With that explicit exec mask manipulation, we may clobber global VGPRs during SGPR spilling or reloading. For instance, the following pseudocode illustrates such clobbering:
  v[62:63] = def(...);
  spill(v[62:63], stack.0);
  for (loop.cond) {
    ... = use(v[62:63]);
    if (branch.cond) {
      ... reuse of v[62:63];
      ... SGPR reload through v62 or v63;
      v[62:63] = reload(stack.0);
      // At this point, v[62:63] is clobbered if branch.cond doesn't
      // cover lanes 0 and 1.
    }
  }
- Regarding the concerns raised on the original patch, we need not worry about the exec masks differing between SGPR spills and reloads. Because the IR is translated out of (deSSA-ed from) the original SSA form, a def is guaranteed to dominate all of its uses, so the point where a value is spilled always dominates the points where it is reloaded. The exec mask at a reload point is guaranteed to be a subset of the exec mask at the corresponding spill point. As long as the SGPR is broadcast to the VGPR at the spill point and v_readfirstlane is used to read the SGPR back from the VGPR at the reload point, the original SGPR value is always recovered regardless of the exec mask at the reload point: the first active lane at the reload was also active at the spill, so it holds the broadcast value.
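To make the clobbering scenario in the first item concrete, here is a small single-wave model in Python. It is a hypothetical sketch, not LLVM code: a VGPR is a list of per-lane values, an exec mask is a set of active lane indices, only v62 is modeled, and the SGPR reload is assumed to enable lanes 0 and 1 through explicit exec manipulation, as the pseudocode comment suggests. The names `write_vgpr` and `clobbered_lanes` are illustrative only.

```python
# Hypothetical single-wave model (not LLVM code): a VGPR is a list of
# per-lane values and an exec mask is a set of active lane indices.
WAVE_SIZE = 8  # assumption: a small wave for readability; real waves have 32 or 64 lanes


def write_vgpr(vgpr, exec_mask, values):
    """Write values[lane] into vgpr, but only for lanes enabled in exec_mask."""
    for lane in exec_mask:
        vgpr[lane] = values[lane]


def clobbered_lanes(branch_lanes):
    """Return the lanes outside the branch whose live value of v62 was lost."""
    all_lanes = set(range(WAVE_SIZE))
    original = [100 + lane for lane in range(WAVE_SIZE)]

    v62 = list(original)  # v[62:63] = def(...): each lane holds a live value
    stack0 = list(v62)    # spill(v[62:63], stack.0) with all lanes active

    # Inside `if (branch.cond)`: only branch_lanes are active.
    write_vgpr(v62, branch_lanes, [-1] * WAVE_SIZE)  # ... reuse of v[62:63]

    # SGPR reload through v62: the explicit exec manipulation (assumed here to
    # enable lanes 0 and 1) ignores the branch exec mask.
    write_vgpr(v62, {0, 1}, [-2] * WAVE_SIZE)

    # v[62:63] = reload(stack.0): restores only the lanes active in the branch.
    write_vgpr(v62, branch_lanes, stack0)

    # Lanes that skipped the branch still need the original def in the next
    # loop iteration (`... = use(v[62:63])`).
    return sorted(lane for lane in all_lanes - branch_lanes if v62[lane] != original[lane])


print(clobbered_lanes({2, 3, 4, 5}))  # [0, 1]
print(clobbered_lanes({0, 1, 2, 3}))  # []
```

The second call shows that the values survive only when branch.cond happens to cover lanes 0 and 1, which the compiler cannot assume in general.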
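The subset argument in the second item can be checked with a similar model. This is again a hypothetical sketch rather than the patch itself: `spill_sgpr`, `reload_sgpr`, and `holds_for_all_subsets` are illustrative names, the spill broadcasts the SGPR into every lane active at the spill point, lanes never written are modeled as `None`, and v_readfirstlane is modeled as reading the lowest active lane (assuming the reload exec mask is non-empty).

```python
from itertools import combinations

# Hypothetical single-wave model (not LLVM code) of the spill/reload scheme:
# broadcast at the spill point, v_readfirstlane at the reload point.
WAVE_SIZE = 8  # assumption: a small wave for readability


def spill_sgpr(sgpr_value, spill_exec):
    """Broadcast the SGPR to a VGPR and store the active lanes to a stack slot.

    Lanes inactive at the spill point are never written, modeled as None.
    """
    slot = [None] * WAVE_SIZE
    for lane in spill_exec:
        slot[lane] = sgpr_value  # every active lane receives the same value
    return slot


def reload_sgpr(slot, reload_exec):
    """Load the active lanes of the slot into a VGPR, then v_readfirstlane.

    Assumes reload_exec is non-empty; the lowest active lane is read.
    """
    vgpr = [None] * WAVE_SIZE
    for lane in reload_exec:
        vgpr[lane] = slot[lane]
    return vgpr[min(reload_exec)]


def holds_for_all_subsets(sgpr_value, spill_exec):
    """Check every non-empty reload exec mask that is a subset of spill_exec."""
    slot = spill_sgpr(sgpr_value, spill_exec)
    lanes = sorted(spill_exec)
    return all(
        reload_sgpr(slot, set(subset)) == sgpr_value
        for k in range(1, len(lanes) + 1)
        for subset in combinations(lanes, k)
    )


print(holds_for_all_subsets(42, {2, 3, 5}))         # True
print(reload_sgpr(spill_sgpr(42, {2, 3}), {0, 2}))  # None: lane 0 was inactive at the spill
```

The last call shows why the dominance-derived subset property matters: if a reload could run with a lane that was inactive at the spill, v_readfirstlane might read a lane that never received the broadcast value.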
clang-tidy: warning: invalid case style for variable 'i' [readability-identifier-naming]
not useful
clang-tidy: warning: invalid case style for variable 'e' [readability-identifier-naming]
not useful