This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] VPTransformState::get() can always return lane 0 for uniforms.
Needs RevisionPublic

Authored by fhahn on Nov 15 2020, 9:31 AM.

Details

Summary

When requesting a scalar value for a uniform VPDef, we can always return
lane 0. This can avoid unnecessary inserting some unncessary instructions
to duplicate the uniform value across lanes.

Diff Detail

Event Timeline

fhahn created this revision.Nov 15 2020, 9:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 15 2020, 9:31 AM
fhahn requested review of this revision.Nov 15 2020, 9:31 AM
reames accepted this revision.Nov 22 2020, 4:32 PM
reames added a subscriber: reames.

LGTM - though, please keep in mind I'm not fully familiar with this code. You may want to wait for another reviewer.

This revision is now accepted and ready to land.Nov 22 2020, 4:32 PM
reames requested changes to this revision.Nov 22 2020, 4:44 PM

Actually, LGTM revoked. This apparently depends on D91500 (which isn't marked in the metadata), and as commented there I don't have context.

This revision now requires changes to proceed.Nov 22 2020, 4:44 PM

Not blocking this review, but I think it's bug-prone to mix lane 0 of scalarized divergent values and truly uniform values that can be kept on a single scalar. Possible examples:

top-loop:
  if %iv % VF != 0:
    inner-loop:
      %iv = [ 0, inner.ph ], [ %iv.next, inner.latch ] ; Uniform, but lane0 doesn't make much sense since it masked out.
      ...
      divergent exit condition
bb:
  %sel = select i1 %divergent, 42, %divergent.def ; divergent in general
  use %sel
  br i1 %divergent, label %uni.use.bb, label %bb2

uni.use.bb:
  %uni.phi = phi [ %sel, %bb ] ; "Conditionally" uniform - all active lanes have the same uniform value
  ; Long compute chain based on %uni.phi that we'd like to keep on a single scalar

In the latter case the correct extract for the uniform value would be from the first *active* lane, not from the lane 0. And I believe it's very easy to make a mistake if the same data storage is used for both scalarized parts of divergent values and for really uniform values that should be kept on a single scalar def/register.

To summarize - I think it's possible to implement everything correctly by repurposing lane0 storage for keeping uniform values, but it might lead (in future, once we try to implement more complex/complicated optimizations) to unexpected confusions and omissions that might lead to silent miscompiles (e.g. extracting undef values from lane0 instead of extracting required uniform values from the first active lane).