This is an archive of the discontinued LLVM Phabricator instance.

Stop instcombining propagating wider shufflevector arguments to predecessors.
ClosedPublic

Authored by sheredom on Sep 26 2018, 6:36 AM.

Details

Summary

instcombine would propagate shufflevector insts that had wider output vectors onto predecessors, which would sometimes push undef's onto the divisor of a div/rem and result in bad codegen.

I've fixed this by just banning propagating shufflevector back if the result of the shufflevector is wider than the input vectors.

Diff Detail

Event Timeline

sheredom created this revision.Sep 26 2018, 6:36 AM

I found a case where instcombine would introduce an undef into the divisor of a div/rem instruction, which would in turn cause that instruction to be simplified away because 'if any vector component is undef, the whole div/rem is undef.'

I haven't stepped through any of the tests, but before we do that we need to correct this statement. The description only applies to integer ops, not FP. So we need to remove FP div/rem from this patch and/or show how those ops are causing different problems (and that would likely be a different patch).

Please minimize the tests that demonstrate the bug (don't need 'spir func' at least), and commit them to trunk with the current (buggy) output using the script at utils/update_test_checks.py to auto-generate the complete CHECK lines.

Just to be clear you want me to commit the tests in a separate commit with the bad output first?

I'll remove the spir_func, but everything else in the tests is requied (instcombine will only attempt the opt if it sees the insertelement instructions as the input to the div/rem from what I can tell).

I don't have commit access to LLVM's trunk (long time user, finally able to be a contributor), do you want me to just file another review for just the test case?

Just to be clear you want me to commit the tests in a separate commit with the bad output first?

Yes - that way we'll have evidence of the bogus output, so if the code patch that fixes those bugs gets reverted for some reason, we'll know that we've reintroduced potential miscompiles.

I'll remove the spir_func, but everything else in the tests is requied (instcombine will only attempt the opt if it sees the insertelement instructions as the input to the div/rem from what I can tell).

Sounds good - I couldn't tell just from looking how we trigger the problem.

I don't have commit access to LLVM's trunk (long time user, finally able to be a contributor), do you want me to just file another review for just the test case?

Great - welcome! Yes, if you don't have commit rights yet, just post the test file in another review, and I'll commit it on your behalf.

tpr added a subscriber: tpr.Sep 26 2018, 8:03 AM
sheredom updated this revision to Diff 167150.Sep 26 2018, 8:47 AM

Removed the fdiv/frem as they weren't currently inst simplifying the bad behaviour, and used the utils script to update the test case.

lebedev.ri added inline comments.Sep 26 2018, 12:35 PM
test/Transforms/InstCombine/stop_bad_undef_propagation.ll
3

You need to rebase this diff ontop of the D52556, so this diff actually shows the diff for the tests, too.

spatel added inline comments.Sep 26 2018, 1:34 PM
test/Transforms/InstCombine/stop_bad_undef_propagation.ll
3

I checked in the baseline at rL343140, so this can be rebased against trunk now. You'll want to regenerate the assertions using the script again because I made some cosmetic changes to the tests when I committed.

sheredom updated this revision to Diff 167251.Sep 27 2018, 2:07 AM

Rebased on tip trunk.

sheredom marked 2 inline comments as done.Sep 27 2018, 2:17 AM

I haven't actually stepped through the code changes yet, so let me know if I'm missing something. I'm just looking at the tests a bit closer now.

The problem is independent of any FP ops, so we can kill the int-to-fp-cast ops and the trailing fmul in all cases, and we still show the bug.
Going further, we don't even need the final shuffle in any of these tests?

define <3 x i32> @add(i32 %y, i32 %z) {
  %i0 = insertelement <2 x i32> undef, i32 %y, i32 0
  %i1 = insertelement <2 x i32> %i0, i32 %z, i32 1
  %a = udiv <2 x i32> %i1, <i32 255, i32 255>
  %ext = shufflevector <2 x i32> %a, <2 x i32> undef, <3 x i32> <i32 0, i32 1, i32 undef>
  ret <3 x i32> %ext
}

I haven't actually stepped through the code changes yet, so let me know if I'm missing something. I'm just looking at the tests a bit closer now.

The problem is independent of any FP ops, so we can kill the int-to-fp-cast ops and the trailing fmul in all cases, and we still show the bug.
Going further, we don't even need the final shuffle in any of these tests?

define <3 x i32> @add(i32 %y, i32 %z) {
  %i0 = insertelement <2 x i32> undef, i32 %y, i32 0
  %i1 = insertelement <2 x i32> %i0, i32 %z, i32 1
  %a = udiv <2 x i32> %i1, <i32 255, i32 255>
  %ext = shufflevector <2 x i32> %a, <2 x i32> undef, <3 x i32> <i32 0, i32 1, i32 undef>
  ret <3 x i32> %ext
}

And I gave away the punchline in that test name. :)
This is where I was heading, and I've now checked. Let's change the div to a seemingly safe 'add' instead:

define <3 x i32> @add(i32 %y, i32 %z) {
  %i0 = insertelement <2 x i32> undef, i32 %y, i32 0
  %i1 = insertelement <2 x i32> %i0, i32 %z, i32 1
  %a = add <2 x i32> %i1, <i32 255, i32 255>
  %ext = shufflevector <2 x i32> %a, <2 x i32> undef, <3 x i32> <i32 0, i32 1, i32 undef>
  ret <3 x i32> %ext
}
$ ./opt -instcombine badshuf.ll -S
define <3 x i32> @add(<3 x i32> %x, i32 %y, i32 %z) {
  %1 = insertelement <3 x i32> undef, i32 %y, i32 0
  %2 = insertelement <3 x i32> %1, i32 %z, i32 1
  %3 = add <3 x i32> %2, <i32 255, i32 255, i32 undef>
  ret <3 x i32> %3
}

That's probably not a desired canonicalization for any op. Yes, we eliminated a shuffle, but we created wider vector ops to make that happen, and that's not something we should do without a cost model.
So I think there's a simpler fix for the problem as demonstrated - don't call CanEvaluateShuffled() if this is shuffle that widens its inputs. There might still be a need for preventing undefs with div/rem, but we need a different test to show that.

sheredom updated this revision to Diff 167436.Sep 28 2018, 1:48 AM

Changed the check to not ever push wider shufflevector stuff back onto predecessor instructions as per spatel's suggestion!

lebedev.ri added inline comments.Sep 28 2018, 1:56 AM
lib/Transforms/InstCombine/InstCombineVectorOps.cpp
874

I think this should to into lib/IR/Instructions.cpp, somewhere around the ShuffleVectorInst::isIdentityMask()?
(isWideningMask())

test/Transforms/InstCombine/stop_bad_undef_propagation.ll
4

Is it intentional that the test coverage changes?

sheredom added inline comments.Sep 28 2018, 1:59 AM
test/Transforms/InstCombine/stop_bad_undef_propagation.ll
4

Aye - @spatel noted that what I was testing was actually a specific part of a wider problem (that we shouldn't be widening shuffle vector predecessors), so I decided to reduce the test case as a result. With my change any predecessor of a shuffle vector that widens the input will not be optimized in this fashion.

sheredom updated this revision to Diff 167446.Sep 28 2018, 2:41 AM

Move the shuffle widens check into Instructions.h, and call it increasesLength to match the existing changesLength call that was already on ShuffleVectorInst.

sheredom marked an inline comment as done.Sep 28 2018, 2:43 AM
sheredom added inline comments.
lib/Transforms/InstCombine/InstCombineVectorOps.cpp
874

So I moved it into Instructions.h instead, and called it increasesLength because there was an existing method changesLength on ShuffleVectorInst so I thought better to match that!

sheredom retitled this revision from Stop instcombining introducing undef's in div/rem instructions. to Stop instcombining propagating wider shufflevector arguments to predecessors..Sep 28 2018, 6:12 AM
sheredom edited the summary of this revision. (Show Details)
sheredom set the repository for this revision to rL LLVM.
lebedev.ri added inline comments.Sep 28 2018, 6:17 AM
include/llvm/IR/Instructions.h
2460
/// Examples: shufflevector <4 x n> A, <4 x n> B, <1,2,3>
///           shufflevector <4 x n> A, <4 x n> B, <1,2,3,4,5>
sheredom marked an inline comment as done.Sep 28 2018, 6:27 AM
sheredom added inline comments.
include/llvm/IR/Instructions.h
2460

You want me to change the comment of the function I didn't touch? I generally try and avoid doing that, but if you think it is worth it I can!

lebedev.ri added inline comments.Sep 28 2018, 6:35 AM
include/llvm/IR/Instructions.h
2460

Yes, i think that comment for a slightly related function should be updated.

spatel added inline comments.Sep 28 2018, 6:42 AM
lib/Transforms/InstCombine/InstCombineVectorOps.cpp
900–902

Leave formatting changes out of this patch, so we're not distracting from the real diff.

922

Leave formatting changes out of this patch, so we're not distracting from the real diff.

942

Leave formatting changes out of this patch, so we're not distracting from the real diff.

1468

The length check should be before CanEvaluateShuffled() to be more efficient (CanEval... is recursive and potentially costly).

sheredom updated this revision to Diff 167472.Sep 28 2018, 7:21 AM

Fix latest review comments.

sheredom marked 10 inline comments as done.Sep 28 2018, 7:22 AM
spatel accepted this revision.Sep 28 2018, 7:28 AM

Thanks - LGTM. As before, I'd prefer to have the baseline tests in place before the patch. Let me know if I should commit on your behalf.

This revision is now accepted and ready to land.Sep 28 2018, 7:28 AM

Thanks - LGTM. As before, I'd prefer to have the baseline tests in place before the patch. Let me know if I should commit on your behalf.

Yes please!

This revision was automatically updated to reflect the committed changes.