This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Fix for shuffle to vector extend for non power 2 vectors
ClosedPublic

Authored by dstuttard on Jul 11 2017, 3:26 AM.

Details

Summary

See https://llvm.org/PR33743 for more details

It seems that for non-power of 2 vector sizes, the algorithm can produce
non-matching sizes for input and result causing an assert.

This usually isn't a problem as the isAnyExtend check will weed these out, but
in some cases (most often with lots of undefined values for the mask indices) it
can pass this check for non power of 2 vectors.

Adding in an extra check that ensures that bit size will match for the result
and input (as required)

Event Timeline

dstuttard created this revision.Jul 11 2017, 3:26 AM
dstuttard added a subscriber: llvm-commits.
dstuttard added a subscriber: RKSimon.

Not sure who is an appropriate reviewer for this, adding @RKSimon as he seems to have made several changes in the same area.

RKSimon edited edge metadata.

Any chance that you can bugpoint reduce the test case?

It's already bugpoint reduced - I tried trimming it further by hand but it hid the issue (unsurprisingly).

OK, I'm happy with the fix if the amdgpu guys are with the test case.

arsenm edited edge metadata.Aug 1 2017, 11:53 AM

Can you reduce the test any further?

Can you reduce the test any further?

It took quite a while to get it to this point :)
Attempts to reduce it further just stopped the problem occurring. If it is a big problem I can take another look.

I don't have time to finish this off, but this further reduced version of the test case appears to still show the same issue - please can you confirm (and possibly see if you further reduce it)?

define amdgpu_ps void @main() local_unnamed_addr {
.entry:
  br i1 undef, label %bb12, label %bb

bb:
  %tmp = tail call float @llvm.fma.f32(float undef, float 2.000000e+00, float -1.000000e+00)
  %tmp2 = bitcast float %tmp to i32
  %__llpc_global_proxy_r5.12.vec.insert = insertelement <4 x i32> undef, i32 %tmp2, i32 3
  %tmp3 = shufflevector <4 x i32> %__llpc_global_proxy_r5.12.vec.insert, <4 x i32> undef, <3 x i32> <i32 undef, i32 undef, i32 1>
  %tmp4 = bitcast <3 x i32> %tmp3 to <3 x float>
  %a2.i123 = extractelement <3 x float> %tmp4, i32 2
  %v2.i101 = fmul float %a2.i123, %a2.i123
  %tmp5 = bitcast float %v2.i101 to i32
  %__llpc_global_proxy_r2.0.vec.insert196 = insertelement <4 x i32> undef, i32 %tmp5, i32 0
  br label %bb12

bb12:
  %__llpc_global_proxy_r2.0 = phi <4 x i32> [ %__llpc_global_proxy_r2.0.vec.insert196, %bb ], [ undef, %.entry ]
  %tmp6 = shufflevector <4 x i32> %__llpc_global_proxy_r2.0, <4 x i32> undef, <3 x i32> <i32 1, i32 2, i32 3>
  %tmp7 = bitcast <3 x i32> %tmp6 to <3 x float>
  %a0.i = extractelement <3 x float> %tmp7, i32 0
  ret void
}

declare float @llvm.fma.f32(float, float, float)

@dstuttard Are you still looking at this?

@dstuttard Are you still looking at this?

Yes - haven't had time recently - I'll be looking again this week.

dstuttard updated this revision to Diff 116955.Sep 28 2017, 3:01 AM

Cutting down the test size and rebasing

Test now reduced further - thanks @RKSimon - confirmed that it still exhibits the problem and that the fix still fixes it.
I managed to remove a couple more lines from the cut-down version you provided.

@arsenm are you happy with the size of the test now?

RKSimon accepted this revision.Sep 30 2017, 3:05 AM

LGTM

This revision is now accepted and ready to land.Sep 30 2017, 3:05 AM

@dstuttard Are you ready to commit this?

@dstuttard Are you ready to commit this?

Yes - @arsenm are you happy with the size of the test now? I'll commit this tomorrow my time unless you object.

This revision was automatically updated to reflect the committed changes.