Page MenuHomePhabricator

[InstCombine] Invert `add A, sext(B) --> sub A, zext(B)` canonicalization (to `sub A, zext B -> add A, sext B`)
ClosedPublic

Authored by lebedev.ri on Dec 5 2019, 6:21 AM.

Details

Summary

D68408 proposes to greatly improve our negation sinking abilities.
But in current canonicalization, we produce sub A, zext(B),
which we will consider non-canonical and try to sink that negation,
undoing the existing canonicalization.
So unless we explicitly stop producing previous canonicalization,
we will have two conflicting folds, and will end up endlessly looping.

This inverts canonicalization, and adds back the obvious fold
that we'd miss:

It is obvious that @ossfuzz_9880() / @lshr_out_of_range()/@ashr_out_of_range() (oss-fuzz 4871)
are no longer folded as much, though it isn't obvious whether those failures are important?

Diff Detail

Event Timeline

lebedev.ri created this revision.Dec 5 2019, 6:21 AM
spatel added a comment.Dec 5 2019, 8:10 AM

The select transform is 1 that I thought about but never got around to implementing. We should have at least 1 minimal test for that pattern, but I don't see that in the diffs?

Did you confirm that codegen is equal or better for these cases (apart from the fuzzer tests - I agree that those are not important)? I think we have DAGCombiner reversals for this transform, but some targets that seem like they would benefit have not enabled the TLI hook.

lebedev.ri marked an inline comment as done.Dec 5 2019, 8:16 AM

The select transform is 1 that I thought about but never got around to implementing. We should have at least 1 minimal test for that pattern, but I don't see that in the diffs?

See llvm-project/llvm/test/Transforms/InstCombine/add.ll.

llvm/test/Transforms/InstCombine/add.ll
4–44 ↗(On Diff #232330)

@spatel i believe this is the test coverage for select transform

Did you confirm that codegen is equal or better for these cases?
I think we have DAGCombiner reversals for this transform, but some targets that seem like they would benefit have not enabled the TLI hook.

I didn't yet. So, as far as i can tell, these are all the interesting cases:
(i'm looking at @t0_new_canon vs @t1_old_canon since that is the only change in this patch)
https://godbolt.org/z/vbb25Q - no regression for x86
https://godbolt.org/z/Htg7m5 - aarch64 also looks ok?
https://godbolt.org/z/xmP6JV - arm good?
https://godbolt.org/z/vNrNJ8 - thumb good?

So i'd say everything is already covered by backend undo folds?
Let me know if i'm missing the point here?

(apart from the fuzzer tests - I agree that those are not important)

Good to know.

spatel added a comment.Dec 5 2019, 9:00 AM

Did you confirm that codegen is equal or better for these cases?
I think we have DAGCombiner reversals for this transform, but some targets that seem like they would benefit have not enabled the TLI hook.

I didn't yet. So, as far as i can tell, these are all the interesting cases:
(i'm looking at @t0_new_canon vs @t1_old_canon since that is the only change in this patch)
https://godbolt.org/z/vbb25Q - no regression for x86
https://godbolt.org/z/Htg7m5 - aarch64 also looks ok?
https://godbolt.org/z/xmP6JV - arm good?
https://godbolt.org/z/vNrNJ8 - thumb good?

So i'd say everything is already covered by backend undo folds?
Let me know if i'm missing the point here?

Scalar looks same all-around. Vector shows some potential diffs:
https://godbolt.org/z/y3E-mb

If I'm seeing it correctly, we always do better on the typical case where the bool vector is produced by a compare, but we might do worse if we don't have that cmp and don't have AssertSext knowledge.

Did you confirm that codegen is equal or better for these cases?
I think we have DAGCombiner reversals for this transform, but some targets that seem like they would benefit have not enabled the TLI hook.

I didn't yet. So, as far as i can tell, these are all the interesting cases:
(i'm looking at @t0_new_canon vs @t1_old_canon since that is the only change in this patch)
https://godbolt.org/z/vbb25Q - no regression for x86
https://godbolt.org/z/Htg7m5 - aarch64 also looks ok?
https://godbolt.org/z/xmP6JV - arm good?
https://godbolt.org/z/vNrNJ8 - thumb good?

So i'd say everything is already covered by backend undo folds?
Let me know if i'm missing the point here?

Scalar looks same all-around. Vector shows some potential diffs:
https://godbolt.org/z/y3E-mb

If I'm seeing it correctly, we always do better on the typical case where the bool vector is produced by a compare, but we might do worse if we don't have that cmp and don't have AssertSext knowledge.

So the comment is that the undo fold needs to be adjusted first, to fire for non-cmp i1 vectors on aarch64 and powerpc64le?

Scalar looks same all-around. Vector shows some potential diffs:
https://godbolt.org/z/y3E-mb

If I'm seeing it correctly, we always do better on the typical case where the bool vector is produced by a compare, but we might do worse if we don't have that cmp and don't have AssertSext knowledge.

So the comment is that the undo fold needs to be adjusted first, to fire for non-cmp i1 vectors on aarch64 and powerpc64le?

spatel accepted this revision.Dec 5 2019, 9:30 AM

Scalar looks same all-around. Vector shows some potential diffs:
https://godbolt.org/z/y3E-mb

If I'm seeing it correctly, we always do better on the typical case where the bool vector is produced by a compare, but we might do worse if we don't have that cmp and don't have AssertSext knowledge.

So the comment is that the undo fold needs to be adjusted first, to fire for non-cmp i1 vectors on aarch64 and powerpc64le?

I don't think that's necessary as a preliminary step because we're improving what should be the common patterns (the examples that include the cmp).
LGTM.

This revision is now accepted and ready to land.Dec 5 2019, 9:30 AM

Scalar looks same all-around. Vector shows some potential diffs:
https://godbolt.org/z/y3E-mb

If I'm seeing it correctly, we always do better on the typical case where the bool vector is produced by a compare, but we might do worse if we don't have that cmp and don't have AssertSext knowledge.

So the comment is that the undo fold needs to be adjusted first, to fire for non-cmp i1 vectors on aarch64 and powerpc64le?

I don't think that's necessary as a preliminary step because we're improving what should be the common patterns (the examples that include the cmp).
LGTM.

Ah, okay, good point :)
Thank you for the review.

This revision was automatically updated to reflect the committed changes.