This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] add (sext i1 X), 1 --> zext (not i1 X)
Closed, Public

Authored by spatel on Apr 11 2017, 9:03 AM.

Details

Summary

Besides better codegen, the motivation is to be able to canonicalize this pattern in IR (we currently don't), knowing that the backend is prepared for it.

This may also allow removing code for special constant cases in DAGCombiner::foldSelectOfConstants() that was added in D30180.
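
For reference, the transform in IR terms looks roughly like this (a minimal sketch; the function names are illustrative and not taken from the patch's tests):

define i32 @sext_inc(i1 %x) {
  ; before: add (sext i1 X), 1 -- sext of true is -1, so the add yields 0 or 1
  %ext = sext i1 %x to i32
  %add = add i32 %ext, 1
  ret i32 %add
}

define i32 @sext_inc_canonical(i1 %x) {
  ; after: zext (not i1 X) -- inverting the bit and zero-extending gives the same 0/1
  %not = xor i1 %x, true
  %ext = zext i1 %not to i32
  ret i32 %ext
}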

Diff Detail

Event Timeline

spatel created this revision. Apr 11 2017, 9:03 AM
arsenm added a subscriber: arsenm. Apr 11 2017, 9:44 AM
arsenm added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1885

!LegalOperations check first? Also why not check if the xor/zext are legal?

spatel added inline comments. Apr 11 2017, 4:26 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1885

No good reason - just lazy.

Although if we add the TLI checks, I don't think we can also hoist !LegalOps? I'll upload a new patch with the extra checks.

spatel updated this revision to Diff 94907. Apr 11 2017, 4:27 PM

Patch updated:
Check legality of the new ops, so the transform can fire post-legalization too.

efriedma edited edge metadata. Apr 18 2017, 12:21 PM

I'm not sure this is consistently beneficial; particularly for vectors, if the operand is a comparison (or something derived from a comparison), sign-extending it could be free.

> I'm not sure this is consistently beneficial; particularly for vectors, if the operand is a comparison (or something derived from a comparison), sign-extending it could be free.

Hmm...any ideas how to limit in that case? I was assuming that since we have:

// select Cond, 0, 1 --> zext (!Cond)

...this also makes sense.

If we have a compare op, we should be able to fold the 'not' op introduced here directly into the compare predicate. Or, in the case of a crippled ISA like SSE that lacks inverted predicates, we might be able to fold the 'not' into the zext/mask...but as I think we can see in the SSE test already included, we're missing that fold. I.e., instead of:

movaps {{.*#+}} xmm1 = [1,1,1,1]
xorps %xmm1, %xmm0
andps %xmm1, %xmm0

We should have done:

pandn	[1,1,1,1], %xmm0

We can clean up after legalization either way, I guess. I'm more concerned about the test coverage; passing i1 vectors as arguments to a function isn't really representative of how i1 vectors are used in practice.
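
For reference, the "fold the 'not' into the compare predicate" point above is just the usual predicate inversion; in IR terms it is roughly (a sketch with illustrative names, not a test from this patch):

define <4 x i32> @not_of_cmp(<4 x i32> %a, <4 x i32> %b) {
  ; xor of the compare result with true...
  %cmp = icmp sgt <4 x i32> %a, %b
  %not = xor <4 x i1> %cmp, <i1 true, i1 true, i1 true, i1 true>
  %ext = zext <4 x i1> %not to <4 x i32>
  ret <4 x i32> %ext
}

define <4 x i32> @not_of_cmp_folded(<4 x i32> %a, <4 x i32> %b) {
  ; ...is the same as comparing with the inverted predicate
  %cmp = icmp sle <4 x i32> %a, %b
  %ext = zext <4 x i1> %cmp to <4 x i32>
  ret <4 x i32> %ext
}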

spatel updated this revision to Diff 96580. Apr 25 2017, 9:27 AM

Patch updated:
Added more vector tests and rebased test diffs after:
https://reviews.llvm.org/rL300725
https://reviews.llvm.org/rL300763
https://reviews.llvm.org/rL300772

I made changes to the DAG's SimplifyDemandedBits and then got distracted with related IR transforms...
The demanded-bits changes helped x86 vector codegen in the way I expected, but I think the ARM regressions show that we're missing some generic folds.

Let me know if there are other tests we should have to expose these folds.

The getNOT call on ARM returns:
v4i1 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1>
(the 'v4i1' is backed by 32-bit constants, and they are 1, not -1), which then does not match the definition for TLI.isConstTrueVal(). Something similar happens with x86 too, but there we catch the not(setcc) pattern post-legalization. That fails on ARM because size-changing ops obfuscate the pattern:

      t50: v4i32 = setcc t16, t19, seteq:ch
    t51: v4i16 = truncate t50
    t49: v4i16 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1>
  t52: v4i16 = xor t51, t49
t53: v4i32 = any_extend t52

I'm not sure your description is the full story about the ARM code for cmpgt_sext_inc_vec; it looks like the following gets simplified for AVX2 (might want to include this as a testcase):

define <4 x i64> @cmpgt_sext_inc_vec(<4 x i64> %x, <4 x i64> %y) {
  %cmp = icmp sgt <4 x i64> %x, %y
  %ext = sext <4 x i1> %cmp to <4 x i64>
  %add = add <4 x i64> %ext, <i64 1, i64 1, i64 1, i64 1>
  ret <4 x i64> %add
}

A testcase for something like "((a != b) & (c != d)) + 1" might also be interesting.
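
For illustration, such a test could look something like this (a sketch only; the function name and types are assumptions, not the test that was actually committed):

define <4 x i32> @cmp_ne_and_inc_vec(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i32> %d) {
  ; ((a != b) & (c != d)) + 1 in the sext/add form under discussion
  %cmp1 = icmp ne <4 x i32> %a, %b
  %cmp2 = icmp ne <4 x i32> %c, %d
  %and = and <4 x i1> %cmp1, %cmp2
  %ext = sext <4 x i1> %and to <4 x i32>
  %add = add <4 x i32> %ext, <i32 1, i32 1, i32 1, i32 1>
  ret <4 x i32> %add
}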

spatel updated this revision to Diff 96742. Apr 26 2017, 8:12 AM

Patch updated:
No code changes again, but added tests with:
rL301362 / rL301412
And improved 'true' detection with:
rL301408

I think all tests are improvements or neutral now. Let me know if I missed the intent on the new tests or if there's any other interesting pattern to look at. I was expecting ARM to match to 'vbic', but I'm not sure if that's better in practice than what we see here?

efriedma accepted this revision. Apr 26 2017, 10:49 AM

I think you've covered the interesting cases; LGTM.

> I was expecting ARM to match to 'vbic'

For sext_inc_vec? It's doing the xor in the wrong width for it to match. Try changing the return type of sext_inc_vec to <4 x i64>, and you'll see essentially the same thing on AVX2.

This revision is now accepted and ready to land. Apr 26 2017, 10:49 AM

>> I was expecting ARM to match to 'vbic'

> For sext_inc_vec? It's doing the xor in the wrong width for it to match. Try changing the return type of sext_inc_vec to <4 x i64>, and you'll see essentially the same thing on AVX2.

Ah, my NEON literacy is even worse than my SSE. I overlooked the differing widths.
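
For reference, the experiment described above is roughly the following (a sketch; the original sext_inc_vec signature is assumed here to take a <4 x i1> argument):

define <4 x i64> @sext_inc_vec_wide(<4 x i1> %x) {
  ; with a wider result type, the 'not' of the i1 vector is done in a narrower
  ; width than the i64 result, so (per the discussion above) the and-not
  ; (vbic/andnps) pattern doesn't form
  %ext = sext <4 x i1> %x to <4 x i64>
  %add = add <4 x i64> %ext, <i64 1, i64 1, i64 1, i64 1>
  ret <4 x i64> %add
}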

This revision was automatically updated to reflect the committed changes.