When requesting a scalar value for a uniform VPDef, we can always return
lane 0. This avoids inserting unnecessary instructions to duplicate the
uniform value across lanes.
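A minimal sketch of the idea, assuming a hypothetical per-lane scalar storage keyed by (def, lane). The names `Def`, `ScalarState`, `set_scalar` and `get_scalar` are illustrative only and are not the VPlan API. A uniform def only ever needs its lane-0 entry, so any lane request is redirected there instead of materializing per-lane copies.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Def:
    """Hypothetical stand-in for a VPDef: a name plus a uniformity flag."""
    name: str
    is_uniform: bool


@dataclass
class ScalarState:
    """Hypothetical per-lane scalar storage, keyed by (def, lane)."""
    scalars: dict = field(default_factory=dict)

    def set_scalar(self, d: Def, lane: int, value) -> None:
        self.scalars[(d, lane)] = value

    def get_scalar(self, d: Def, lane: int):
        # A uniform def has the same value in every lane, so only lane 0
        # is stored; redirect any lane request there instead of emitting
        # code to duplicate the value across lanes.
        if d.is_uniform:
            lane = 0
        return self.scalars[(d, lane)]


# Usage: a uniform value stored once in lane 0 serves requests for any lane.
state = ScalarState()
step = Def("step", is_uniform=True)
state.set_scalar(step, 0, 4)
print(state.get_scalar(step, 3))  # prints 4, read from the lane-0 slot
```

Note that in this model the lane-0 slot is shared between "lane 0 of a scalarized divergent def" and "the single copy of a uniform def", which is exactly the overlap the comment below is concerned about.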
Not blocking this review, but I think it's bug-prone to mix up lane 0 of scalarized divergent values with truly uniform values that can be kept in a single scalar. Possible examples:
  top-loop:
    if %iv % VF != 0:
      inner-loop:
        %iv = [ 0, inner.ph ], [ %iv.next, inner.latch ] ; Uniform, but lane 0 doesn't make much sense since it is masked out.
        ...
        divergent exit condition

  bb:
    %sel = select i1 %divergent, 42, %divergent.def ; divergent in general
    use %sel
    br i1 %divergent, label %uni.use.bb, label %bb2
  uni.use.bb:
    %uni.phi = phi [ %sel, %bb ] ; "Conditionally" uniform - all active lanes have the same uniform value
    ; Long compute chain based on %uni.phi that we'd like to keep on a single scalar
In the latter case, the correct extract for the uniform value would be from the first *active* lane, not from lane 0. I also believe it's very easy to make a mistake if the same storage is used both for the scalarized parts of divergent values and for truly uniform values that should be kept in a single scalar def/register.
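To make the failure mode concrete, here is a toy model (plain Python, not VPlan or LLVM code) of a VF = 4 vector in which masked-out lanes hold undefined contents, modelled as `None` in place of undef/poison. The helpers `extract_lane0` and `extract_first_active` are hypothetical names; the sketch only shows that, for a value like `%uni.phi`, the two extracts disagree whenever lane 0 is inactive.

```python
from typing import List, Optional


def extract_lane0(values: List[Optional[int]]) -> Optional[int]:
    """Naive extract: always read lane 0, regardless of the mask."""
    return values[0]


def extract_first_active(values: List[Optional[int]],
                         mask: List[bool]) -> Optional[int]:
    """Read the first lane whose mask bit is set; None if no lane is active."""
    for value, active in zip(values, mask):
        if active:
            return value
    return None


# VF = 4. Lanes 0 and 2 are masked out, so their contents are undefined
# (modelled as None). The value is uniform across all *active* lanes.
mask = [False, True, False, True]
uni_phi = [None, 42, None, 42]

print(extract_lane0(uni_phi))               # None: lane 0 is inactive
print(extract_first_active(uni_phi, mask))  # 42
```

A real lowering would compute the first active lane from the mask at run time rather than loop in the compiler; the loop here just stands in for that selection.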
To summarize: I think it's possible to implement everything correctly by repurposing the lane-0 storage for uniform values, but in the future, once we try to implement more complex optimizations, it may lead to unexpected confusion and omissions that could cause silent miscompiles (e.g. extracting undef values from lane 0 instead of extracting the required uniform value from the first active lane).