This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] add (sext i1 X), 1 --> zext (not i1 X)
ClosedPublic

Authored by spatel on Apr 11 2017, 9:03 AM.

Details

Summary

Besides better codegen, the motivation is to be able to canonicalize this pattern in IR (currently we don't) knowing that the backend is prepared for that.

This may also allow removing code for special constant cases in DAGCombiner::foldSelectOfConstants() that was added in D30180.
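For reference, the arithmetic behind the fold: sign-extending an i1 gives 0 or -1, so adding 1 yields 1 or 0, which is exactly the zero-extension of the inverted bit. A scalar model in Python (illustrative only, not LLVM code):

```python
def sext_i1(x):
    """Sign-extend an i1 (0 or 1) to a wider integer: 0 -> 0, 1 -> -1."""
    return -x

def zext_i1(x):
    """Zero-extend an i1: 0 -> 0, 1 -> 1."""
    return x

# add (sext i1 X), 1  ==  zext (not i1 X), for both i1 values
for x in (0, 1):
    assert sext_i1(x) + 1 == zext_i1(1 - x)  # 1 - x is 'not' on i1
```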

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Apr 11 2017, 9:03 AM
arsenm added a subscriber: arsenm.Apr 11 2017, 9:44 AM
arsenm added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1885 (On Diff #94837)

!LegalOperations check first? Also why not check if the xor/zext are legal?

spatel added inline comments.Apr 11 2017, 4:26 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1885 (On Diff #94837)

No good reason - just lazy.

Although if we add the TLI checks, I don't think we can also hoist the !LegalOperations check. I'll upload a new patch with the extra checks.

spatel updated this revision to Diff 94907.Apr 11 2017, 4:27 PM

Patch updated:
Check legality of the new ops, so the transform can fire post-legalization too.

efriedma edited edge metadata.Apr 18 2017, 12:21 PM

I'm not sure this is consistently beneficial; particularly for vectors, if the operand is a comparison (or something derived from a comparison), sign-extending it could be free.

> I'm not sure this is consistently beneficial; particularly for vectors, if the operand is a comparison (or something derived from a comparison), sign-extending it could be free.

Hmm...any ideas how to limit in that case? I was assuming that since we have:

// select Cond, 0, 1 --> zext (!Cond)

...this also makes sense.
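The quoted fold rests on the same observation: selecting between the constants 0 and 1 is just the zero-extension of the inverted condition. A minimal scalar sketch in Python (the `select` helper is hypothetical):

```python
def select(cond, t, f):
    """Scalar model of ISD::SELECT: pick t if cond is true, else f."""
    return t if cond else f

for cond in (0, 1):
    # select Cond, 0, 1 --> zext (!Cond)
    assert select(cond, 0, 1) == 1 - cond  # 1 - cond is zext of the inverted i1
```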

If we have a compare op, we should be able to fold the 'not' op introduced here directly into the compare predicate. Or in the case of a crippled ISA like SSE that lacks inverted predicates, we might be able to fold the 'not' into the zext/mask...but as I think we can see in the SSE test already included, we're missing that fold. I.e., instead of:

movaps {{.*#+}} xmm1 = [1,1,1,1]
xorps %xmm1, %xmm0
andps %xmm1, %xmm0

We should have done:

pandn	[1,1,1,1], %xmm0
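The two sequences are lane-wise equivalent, since only the low bit of each lane survives the final mask: `(x ^ 1) & 1` equals `~x & 1`. A per-lane model in Python (the helper names are made up for illustration):

```python
MASK32 = 0xFFFFFFFF  # one 32-bit vector lane

def xor_then_and(lane):
    """Current codegen: xorps with [1,1,1,1], then andps with [1,1,1,1]."""
    return (lane ^ 1) & 1

def pandn(lane):
    """pandn computes ~src1 & src2 per lane; here src2 is the constant 1."""
    return (~lane & MASK32) & 1

for lane in (0x0, 0x1, 0xFFFFFFFF, 0xDEADBEEF):
    assert xor_then_and(lane) == pandn(lane)
```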

We can clean up after legalization either way, I guess. I'm more concerned about the test coverage; passing i1 vectors as arguments to a function isn't really representative of how i1 vectors are used in practice.

spatel updated this revision to Diff 96580.Apr 25 2017, 9:27 AM

Patch updated:
Added more vector tests and rebased test diffs after:
https://reviews.llvm.org/rL300725
https://reviews.llvm.org/rL300763
https://reviews.llvm.org/rL300772

I made changes to the DAG's simplifyDemandedBits and then got distracted with related IR transforms...
The demanded-bits changes helped x86 vector codegen in the way I expected, but I think the ARM regressions show that we're missing some generic folds.

Let me know if there are other tests we should have to expose these folds.

The getNOT call on ARM returns:
v4i1 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1>
(the 'v4i1' is backed by 32-bit constants, and they are 1, not -1)...which then does not match the definition for TLI.isConstTrueVal(). Something similar happens with x86 too, but we catch the not(setcc) pattern post-legalization. That fails on ARM because there are size-changing ops obfuscating the pattern:

      t50: v4i32 = setcc t16, t19, seteq:ch
    t51: v4i16 = truncate t50
    t49: v4i16 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1>
  t52: v4i16 = xor t51, t49
t53: v4i32 = any_extend t52

I'm not sure your description is the full story about the ARM code for cmpgt_sext_inc_vec; it looks like the following gets simplified for AVX2 (might want to include this as a testcase):

define <4 x i64> @cmpgt_sext_inc_vec(<4 x i64> %x, <4 x i64> %y) {
  %cmp = icmp sgt <4 x i64> %x, %y
  %ext = sext <4 x i1> %cmp to <4 x i64>
  %add = add <4 x i64> %ext, <i64 1, i64 1, i64 1, i64 1>
  ret <4 x i64> %add
}

A testcase for something like "((a != b) & (c != d)) + 1" might also be interesting.
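Per lane, the suggested pattern still reduces to the same fold: the AND of two all-ones/zero compare masks is itself such a mask, so adding 1 gives the zero-extension of the inverted combined condition. A scalar sketch in Python (illustrative, not LLVM code):

```python
def sext_cmp(ne):
    """Vector compares produce an all-ones mask for true: i1 1 -> -1."""
    return -ne

for a, b, c, d in [(1, 1, 2, 3), (1, 2, 2, 2), (1, 2, 3, 4), (5, 5, 6, 6)]:
    m = sext_cmp(a != b) & sext_cmp(c != d)  # still an all-ones/zero mask
    # ((a != b) & (c != d)) + 1 --> zext (not ((a != b) & (c != d)))
    assert m + 1 == (0 if (a != b and c != d) else 1)
```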

spatel updated this revision to Diff 96742.Apr 26 2017, 8:12 AM

Patch updated:
No code changes again, but added tests with:
rL301362 / rL301412
And improved 'true' detection with:
rL301408

I think all tests are improvements or neutral now. Let me know if I missed the intent on the new tests or if there's any other interesting pattern to look at. I was expecting ARM to match to 'vbic', but I'm not sure if that's better in practice than what we see here?

efriedma accepted this revision.Apr 26 2017, 10:49 AM

I think you've covered the interesting cases; LGTM.

> I was expecting ARM to match to 'vbic'

For sext_inc_vec? It's doing the xor in the wrong width for it to match. Try changing the return type of sext_inc_vec to <4 x i64>, and you'll see essentially the same thing on AVX2.

This revision is now accepted and ready to land.Apr 26 2017, 10:49 AM

>> I was expecting ARM to match to 'vbic'
>
> For sext_inc_vec? It's doing the xor in the wrong width for it to match. Try changing the return type of sext_inc_vec to <4 x i64>, and you'll see essentially the same thing on AVX2.

Ah, my NEON literacy is even worse than my SSE. I overlooked the differing widths.

This revision was automatically updated to reflect the committed changes.