spatel (Sanjay Patel)
User

Projects

User does not belong to any projects.

User Details

User Since
May 22 2014, 1:24 PM (169 w, 1 d)

Recent Activity

Yesterday

spatel added inline comments to D36498: [InstCombine] Teach foldSelectICmpAnd to recognize a (icmp slt trunc X, 0) and (icmp sgt trunc X, -1) as equivalent to an and with the sign bit of the truncated type.
Fri, Aug 18, 3:32 PM
spatel committed rL311193: fix typos in comments; NFC.
fix typos in comments; NFC
Fri, Aug 18, 1:30 PM
spatel accepted D36679: [InstCombine] Added support for: trunc(ashr(mul(sext(...), sext(...))) -> ashr(mul(...)).

Thanks for breaking this up - LGTM.
See inline for a couple of nits.

Fri, Aug 18, 9:25 AM
spatel updated subscribers of D33987: [MergeICmps][WIP] Initial MergeICmps prototype.

Note: memcmp expansion for x86 was reduced with https://reviews.llvm.org/rL308986 because it caused a perf regression in PR33914 ( https://bugs.llvm.org/show_bug.cgi?id=33914 ).
There's a proposal to revert the expansion for x86 entirely in PR34032 ( https://bugs.llvm.org/show_bug.cgi?id=34032 ) because of a related perf regression and compile-time explosion in CGP (PR33900 - https://bugs.llvm.org/show_bug.cgi?id=33900 ). IMO, we need to solve the CGP addressing-mode bug anyway, but just so everyone is aware - memcmp expansion for x86 is still shaky at this point.

Fri, Aug 18, 8:08 AM

Thu, Aug 17

spatel added a comment to D36840: [DAG] convert vector select-of-constants to logic/math.

Patch updated:
Remove the general case fold which actually isn't a win if vselect is fast. We only convert the vselect when there's a clear optimization now.

Thu, Aug 17, 4:38 PM
spatel updated the diff for D36840: [DAG] convert vector select-of-constants to logic/math.

Patch updated:
Remove the general case fold which actually isn't a win if vselect is fast. We only convert the vselect when there's a clear optimization now.

Thu, Aug 17, 4:36 PM
spatel added a comment to D36840: [DAG] convert vector select-of-constants to logic/math.

On Thu, Aug 17, 2017 at 3:52 PM, <escha@apple.com> wrote:

btw, to note, blendv *is* a “fast op” on most arches; what’s one extra uop is the fact that the V form takes 3 inputs. iirc, it’s the same reason that ADC, CMOV, etc can’t be 1 uop (2 inputs + flags). the forms that don’t take 3 inputs are usually 1 uop.
Thu, Aug 17, 3:54 PM
spatel added reviewers for D36840: [DAG] convert vector select-of-constants to logic/math: escha, zvi, DavidKreitzer, aaboud, aivchenk.

Adding some more x86 experts.

Thu, Aug 17, 1:56 PM
spatel added a comment to D36840: [DAG] convert vector select-of-constants to logic/math.

> For x86, blendv* is always a multi-uop / multi-cycle instruction according to Agner's docs

Are you sure?

Bulldozer, Piledriver, Ryzen, and Skylake seem to list PBLENDVB and BLENDVPS as 1 uop.

Thu, Aug 17, 1:29 PM
spatel created D36840: [DAG] convert vector select-of-constants to logic/math.
Thu, Aug 17, 11:46 AM
spatel committed rL311103: [x86] add tests for vector select-of-constants; NFC.
[x86] add tests for vector select-of-constants; NFC
Thu, Aug 17, 10:08 AM
spatel committed rL311099: [PowerPC] add tests for vector select-of-constants; NFC.
[PowerPC] add tests for vector select-of-constants; NFC
Thu, Aug 17, 10:04 AM
spatel added a comment to D36711: [X86] Combining CMOVs with [ANY,SIGN,ZERO]_EXTEND for cases where CMOV has constant arguments.

Thanks for the updates. I have no other suggestions.

Thu, Aug 17, 6:29 AM

Wed, Aug 16

spatel added a comment to D36711: [X86] Combining CMOVs with [ANY,SIGN,ZERO]_EXTEND for cases where CMOV has constant arguments.

I checked in the tests from cmov-promotion.ll at rL311052, so we'd see the baseline codegen. Can you rebase after that?

Wed, Aug 16, 4:06 PM
spatel committed rL311052: [x86] add cmov promotion tests for D36711; NFC.
[x86] add cmov promotion tests for D36711; NFC
Wed, Aug 16, 3:51 PM
spatel accepted D36781: [InstCombine] Make folding (X >s -1) ? C1 : C2 --> ((X >>s 31) & (C2 - C1)) + C1 support splat vectors.

LGTM.

Wed, Aug 16, 1:43 PM
spatel added inline comments to D36498: [InstCombine] Teach foldSelectICmpAnd to recognize a (icmp slt trunc X, 0) and (icmp sgt trunc X, -1) as equivalent to an and with the sign bit of the truncated type.
Wed, Aug 16, 1:26 PM
spatel accepted D36784: [InstCombine] Teach canEvaluateTruncated to handle arithmetic shift (including those with vector splat shift amount).

Just to make sure I'm understanding: the multiply patch (D36679) is independent of this?

Wed, Aug 16, 11:44 AM
spatel added inline comments to D36679: [InstCombine] Added support for: trunc(ashr(mul(sext(...), sext(...))) -> ashr(mul(...)).
Wed, Aug 16, 9:01 AM
spatel committed rL311009: [DemandedBits] simplify call; NFC.
[DemandedBits] simplify call; NFC
Wed, Aug 16, 7:29 AM

Tue, Aug 15

spatel added a comment to D36679: [InstCombine] Added support for: trunc(ashr(mul(sext(...), sext(...))) -> ashr(mul(...)).

Can we have the ashr fix as a separate patch before the mul patch? If yes, make tests for just that diff and split this up?

Tue, Aug 15, 4:00 PM
spatel added inline comments to D36763: [InstCombine] Teach canEvaluateZExtd and canEvaluateTruncated to handle vector shifts with splat shift amount.
Tue, Aug 15, 3:38 PM
spatel accepted D36763: [InstCombine] Teach canEvaluateZExtd and canEvaluateTruncated to handle vector shifts with splat shift amount.

LGTM with change of the test diffs so this one goes in first.

Tue, Aug 15, 3:37 PM
spatel added a comment to D36763: [InstCombine] Teach canEvaluateZExtd and canEvaluateTruncated to handle vector shifts with splat shift amount.

Can you commit this one first with at least one test independent of those in D36498 to avoid the extra inst regressions seen there?

Tue, Aug 15, 3:29 PM
spatel added a comment to D36498: [InstCombine] Teach foldSelectICmpAnd to recognize a (icmp slt trunc X, 0) and (icmp sgt trunc X, -1) as equivalent to an and with the sign bit of the truncated type.

The comment about multi-uses of the compare in D36711 made me wonder what we're doing here. Should we check one-use before trying to transform?

Tue, Aug 15, 2:59 PM
spatel added a comment to D36711: [X86] Combining CMOVs with [ANY,SIGN,ZERO]_EXTEND for cases where CMOV has constant arguments.

Thank you for your comments, I updated the patch to address them:

  1. Done
  2. MCU target does not have CMOV instruction so it is always converted into test+branch. I firstly cared about it - that's the reason why it is here
Tue, Aug 15, 2:43 PM
spatel added inline comments to D36498: [InstCombine] Teach foldSelectICmpAnd to recognize a (icmp slt trunc X, 0) and (icmp sgt trunc X, -1) as equivalent to an and with the sign bit of the truncated type.
Tue, Aug 15, 2:22 PM
spatel added a comment to D36679: [InstCombine] Added support for: trunc(ashr(mul(sext(...), sext(...))) -> ashr(mul(...)).

I think all of the non-multiply tests are simplified after rL310942. Can you double-check that? If so, you can remove those tests from this patch. Feel free to add them as an independent NFC commit if they cover some case that isn't covered already.

Tue, Aug 15, 11:55 AM
spatel committed rL310942: [InstCombine] sink sext after ashr.
[InstCombine] sink sext after ashr
Tue, Aug 15, 11:26 AM
spatel committed rL310935: [InstCombine] add tests for sext+ashr; NFC.
[InstCombine] add tests for sext+ashr; NFC
Tue, Aug 15, 10:43 AM
spatel accepted D36743: [InstCombine] Added support for (X >>s C) << C --> X & (-1 << C).

LGTM.

Tue, Aug 15, 9:37 AM
spatel added inline comments to D36743: [InstCombine] Added support for (X >>s C) << C --> X & (-1 << C).
Tue, Aug 15, 8:45 AM
spatel added a comment to D36711: [X86] Combining CMOVs with [ANY,SIGN,ZERO]_EXTEND for cases where CMOV has constant arguments.

It's worth noting that we could convert any select of constants to math ops. I was planning a follow-up to https://reviews.llvm.org/rL310717 that would do that, but there might be cases where we would prefer to select cmov? If not, that would make this patch obsolete.

Tue, Aug 15, 8:25 AM
spatel added a comment to D36711: [X86] Combining CMOVs with [ANY,SIGN,ZERO]_EXTEND for cases where CMOV has constant arguments.

It's worth noting that we could convert any select of constants to math ops. I was planning a follow-up to https://reviews.llvm.org/rL310717 that would do that, but there might be cases where we would prefer to select cmov? If not, that would make this patch obsolete.

Tue, Aug 15, 8:19 AM
spatel added inline comments to D36711: [X86] Combining CMOVs with [ANY,SIGN,ZERO]_EXTEND for cases where CMOV has constant arguments.
Tue, Aug 15, 8:11 AM
spatel added a comment to D36498: [InstCombine] Teach foldSelectICmpAnd to recognize a (icmp slt trunc X, 0) and (icmp sgt trunc X, -1) as equivalent to an and with the sign bit of the truncated type.

This patch is really just making InstCombine self consistent. We currently optimize this case differently depending on whether i8 is legal in datalayout.

define i32 @test71(i32 %x) {
; CHECK-LABEL: @test71(
; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[X:%.*]], 6
; CHECK-NEXT: [[TMP2:%.*]] = and i32 [[TMP1]], 2
; CHECK-NEXT: [[TMP3:%.*]] = or i32 [[TMP2]], 40
; CHECK-NEXT: ret i32 [[TMP3]]
;

%1 = and i32 %x, 128
%2 = icmp eq i32 %1, 0
%3 = select i1 %2, i32 40, i32 42
ret i32 %3

}

If we want to remove foldSelectICmpAnd that's a different question.

Tue, Aug 15, 7:26 AM

Mon, Aug 14

spatel accepted D36646: [InstSimplify] Teach decomposeBitTestICmp to handle non-canonical compares.

LGTM.

Mon, Aug 14, 3:06 PM
spatel added a comment to D36646: [InstSimplify] Teach decomposeBitTestICmp to handle non-canonical compares.

Make sure I'm seeing this correctly: this patch should contain additional tests for ule/uge; ult/ugt should be added to the step before this (D36593)?

Mon, Aug 14, 11:04 AM
spatel added a comment to D36593: [InstSimplify][InstCombine] Modify the interface of decomposeBitTestICmp and use it in the InstSimplify.

Wait - this does have a testable change, right? You can check the ult/ugt cases from InstSimplify with this:
http://rise4fun.com/Alive/XAV

Mon, Aug 14, 10:56 AM
spatel accepted D36593: [InstSimplify][InstCombine] Modify the interface of decomposeBitTestICmp and use it in the InstSimplify.

LGTM.

Mon, Aug 14, 10:55 AM
spatel added a comment to D36498: [InstCombine] Teach foldSelectICmpAnd to recognize a (icmp slt trunc X, 0) and (icmp sgt trunc X, -1) as equivalent to an and with the sign bit of the truncated type.

I've been operating under the assumption that we want to transform in the opposite direction in IR. Ie, preserve and probably create more select-of-constants (see https://reviews.llvm.org/D24480 which I am planning to abandon). We should be able to reduce selects to logic/math in the DAG where it makes sense (eg, https://reviews.llvm.org/rL310717 ). Some reasons to prefer select were noted in:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html

Mon, Aug 14, 10:14 AM
spatel committed rL310849: [x86] fold the mask op on 8- and 16-bit rotates.
[x86] fold the mask op on 8- and 16-bit rotates
Mon, Aug 14, 8:56 AM
spatel closed D36644: [x86] fold the mask op on 8- and 16-bit rotates by committing rL310849: [x86] fold the mask op on 8- and 16-bit rotates.
Mon, Aug 14, 8:56 AM
spatel added inline comments to D36679: [InstCombine] Added support for: trunc(ashr(mul(sext(...), sext(...))) -> ashr(mul(...)).
Mon, Aug 14, 8:49 AM
spatel committed rL310842: [BDCE] reduce scope of an assert (PR34179).
[BDCE] reduce scope of an assert (PR34179)
Mon, Aug 14, 8:14 AM

Sun, Aug 13

spatel accepted D36439: [InstCombine] Simplify and merge FoldOrWithConstants/FoldXorWithConstants.

LGTM - thanks!

Sun, Aug 13, 4:27 PM
spatel added inline comments to D36439: [InstCombine] Simplify and merge FoldOrWithConstants/FoldXorWithConstants.
Sun, Aug 13, 3:05 PM
spatel added inline comments to D36439: [InstCombine] Simplify and merge FoldOrWithConstants/FoldXorWithConstants.
Sun, Aug 13, 10:53 AM

Sat, Aug 12

spatel committed rL310779: [BDCE] clear poison generators after turning a value into zero (PR33695….
[BDCE] clear poison generators after turning a value into zero (PR33695…
Sat, Aug 12, 9:42 AM
spatel closed D36592: [BDCE] clear poison generators after turning a value into zero (PR33695, PR34037) by committing rL310779: [BDCE] clear poison generators after turning a value into zero (PR33695….
Sat, Aug 12, 9:42 AM
spatel added a comment to D35035: [InstCombine] Prevent memcpy generation for small data size.

@spatel: I don't see a reason why we can't (or shouldn't) try to do common-prefix elimination for the memcmp intrinsic. It certainly seems to be better to me to preserve the intrinsics in your case as they should be easier to reason about. That's kind of my question for here too -- why does the expansion allow better code?

Sat, Aug 12, 8:22 AM
spatel updated the diff for D36592: [BDCE] clear poison generators after turning a value into zero (PR33695, PR34037).

Patch updated:
Replace the assume deletion with a comment about why it can't happen. :)

Sat, Aug 12, 7:58 AM
spatel updated the diff for D36592: [BDCE] clear poison generators after turning a value into zero (PR33695, PR34037).

Patch updated - sorry, the last diff was not the one I intended to upload. This is NFC from the last rev, but:

  1. Includes Hal's comment about range metadata.
  2. Has CHECK lines for the poison_on_call_user_is_ok() test.
Sat, Aug 12, 7:33 AM
spatel created D36644: [x86] fold the mask op on 8- and 16-bit rotates.
Sat, Aug 12, 7:18 AM
spatel added a comment to D35035: [InstCombine] Prevent memcpy generation for small data size.

What is the advantage of expanding the memcpy intrinsic in InstCombine vs doing it later in the target-specific code?

Sat, Aug 12, 7:03 AM

Fri, Aug 11

spatel committed rL310770: [x86] add tests for rotate left/right with masked shifter; NFC.
[x86] add tests for rotate left/right with masked shifter; NFC
Fri, Aug 11, 3:41 PM
spatel committed rL310767: [x86] regenerate test checks, add 64-bit run; NFC.
[x86] regenerate test checks, add 64-bit run; NFC
Fri, Aug 11, 3:06 PM
spatel added inline comments to D35035: [InstCombine] Prevent memcpy generation for small data size.
Fri, Aug 11, 1:38 PM
spatel updated the diff for D36592: [BDCE] clear poison generators after turning a value into zero (PR33695, PR34037).

Patch updated - mostly just cutting and pasting the code Hal posted:

  1. Make isExternallyVisible() check the calculated demanded bits rather than a list of opcodes.
  2. Add a check for and delete llvm.assume.
Fri, Aug 11, 12:32 PM
spatel committed rL310717: [x86] use more shift or LEA for select-of-constants (2nd try).
[x86] use more shift or LEA for select-of-constants (2nd try)
Fri, Aug 11, 8:45 AM
spatel closed D35340: [x86] use more shift or LEA for select-of-constants by committing rL310717: [x86] use more shift or LEA for select-of-constants (2nd try).
Fri, Aug 11, 8:45 AM
spatel updated the diff for D36592: [BDCE] clear poison generators after turning a value into zero (PR33695, PR34037).

Patch updated:

  1. Use a loop (worklist) rather than recursion to avoid blowing up the stack.
  2. Add an exclusion list based on opcodes to stop the search.
  3. Add a test to show that the exclusion list works for a call inst.
  4. Add a TODO comment with Nuno's suggestion from the bug report to make this better.
Fri, Aug 11, 8:05 AM

Thu, Aug 10

spatel created D36592: [BDCE] clear poison generators after turning a value into zero (PR33695, PR34037).
Thu, Aug 10, 2:29 PM
spatel added a comment to D35035: [InstCombine] Prevent memcpy generation for small data size.

This patch as uploaded makes even less sense to me than before:

  1. There are no tests to show the diff.
  2. You've changed the code rather than an assertion in an unrelated test to show a functional difference?
Thu, Aug 10, 8:51 AM
spatel committed rL310611: [InstCombine] add memcpy expansion tests with potential DL dependency; NFC.
[InstCombine] add memcpy expansion tests with potential DL dependency; NFC
Thu, Aug 10, 8:38 AM
spatel committed rL310603: [InstCombine] regenerate test checks; NFC.
[InstCombine] regenerate test checks; NFC
Thu, Aug 10, 8:08 AM
spatel committed rL310598: [InstCombine] regenerate test checks, add comments; NFC.
[InstCombine] regenerate test checks, add comments; NFC
Thu, Aug 10, 7:52 AM
spatel accepted D36505: [InstCombine] Make (X|C1)^C2 -> X^(C1^C2) iff X&~C1 == 0 work for splat vectors.

LGTM.

Thu, Aug 10, 7:23 AM

Wed, Aug 9

spatel committed rL310510: [SimplifyCFG] remove checks for crasher test from r310481.
[SimplifyCFG] remove checks for crasher test from r310481
Wed, Aug 9, 11:57 AM
spatel committed rL310509: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc….
[InstCombine] narrow rotate left/right patterns to eliminate zext/trunc…
Wed, Aug 9, 11:38 AM
spatel closed D36395: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046) by committing rL310509: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc….
Wed, Aug 9, 11:38 AM
spatel updated the diff for D35340: [x86] use more shift or LEA for select-of-constants.

Patch updated:
As shown in PR34097 ( https://bugs.llvm.org/show_bug.cgi?id=34097 ), I failed to account for the subtract overflow case, so add a check for that.

Wed, Aug 9, 10:09 AM
spatel committed rL310490: [x86] add more tests for select-of-constants; NFC.
[x86] add more tests for select-of-constants; NFC
Wed, Aug 9, 8:57 AM
spatel updated the diff for D36395: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046).

Patch updated - no functional changes intended from the previous rev, but cleaner:

  1. Use isa<VectorType> in the assert and the callers' type check to make it clearer that vectors are allowed.
  2. Use m_SpecificInt in pattern matches to reduce code.
Wed, Aug 9, 6:24 AM
spatel added a comment to D36395: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046).
/// \brief Match a specific integer value or vector with all elements equal to
/// the value.
inline specific_intval m_SpecificInt(uint64_t V) { return specific_intval(V); }
Wed, Aug 9, 6:21 AM

Tue, Aug 8

spatel added inline comments to D36395: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046).
Tue, Aug 8, 4:50 PM

Mon, Aug 7

spatel updated the diff for D36395: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046).

Patch updated:

  1. Add an assert for the shouldChangeType() guard that's in the caller function.
  2. Replace the explicit match for zext of the shift value with a call to MaskedValueIsZero().
  3. Remove the MaskedValueIsZero() check for the shift amount and generate our own masks to make the transform safe:

http://rise4fun.com/Alive/Qzm

  1. Add test to exercise #2 and #3.
Mon, Aug 7, 2:27 PM
spatel added a comment to D36395: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046).

I spent a bit of time playing with the generated for for various implementations of rotate for 16-bit values. It looks like this transform actually makes things worse; e.g. on ARM the transformed version compiles to one more instruction. Granted, that's probably not important; I think the optimal lowering for a 16-bit rotate on ARM actually involves lowering it to a 32-bit rotate and fixing up the result, and that's probably a transform which doesn't make sense until legalization.

Mon, Aug 7, 2:04 PM
spatel added a comment to D36395: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046).

Also, on x86, it looks like we're missing an important pattern here: we generate the sequence andb $15, %cl; rolw %cl, %ax.

Mon, Aug 7, 12:58 PM
spatel accepted D36384: [InstCombine] Support (X | C1) & C2 --> (X & C2^(C1&C2)) | (C1&C2) for vector splats.

LGTM.

Mon, Aug 7, 10:11 AM
spatel reopened D35340: [x86] use more shift or LEA for select-of-constants.

Reopening - the patch was reverted at rL310264 because it caused/exposed failures in test-suite jpeg tests:
https://bugs.llvm.org/show_bug.cgi?id=34097
https://bugs.llvm.org/show_bug.cgi?id=34105

Mon, Aug 7, 8:59 AM
spatel committed rL310264: [x86] revert r310208 to investigate test-suite failures (PR34105 / PR34097) .
[x86] revert r310208 to investigate test-suite failures (PR34105 / PR34097)
Mon, Aug 7, 8:48 AM
spatel retitled D36395: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046) from [InstCombine] narrow rotate left/patterns to eliminate zext/trunc (PR34046) to [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046).
Mon, Aug 7, 7:57 AM
spatel created D36395: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046).
Mon, Aug 7, 7:44 AM

Sun, Aug 6

spatel committed rL310208: [x86] use more shift or LEA for select-of-constants.
[x86] use more shift or LEA for select-of-constants
Sun, Aug 6, 9:28 AM
spatel closed D35340: [x86] use more shift or LEA for select-of-constants by committing rL310208: [x86] use more shift or LEA for select-of-constants.
Sun, Aug 6, 9:27 AM

Sat, Aug 5

spatel committed rL310181: [InstCombine] refactor trunc(binop) transforms; NFCI.
[InstCombine] refactor trunc(binop) transforms; NFCI
Sat, Aug 5, 8:20 AM

Fri, Aug 4

spatel committed rL310141: [InstCombine] narrow truncated add/sub/mul with constant.
[InstCombine] narrow truncated add/sub/mul with constant
Fri, Aug 4, 3:31 PM
spatel committed rL310122: [InstCombine] add vector tests for truncated math; NFC.
[InstCombine] add vector tests for truncated math; NFC
Fri, Aug 4, 1:39 PM
spatel committed rL310101: [InstCombine] auto-generate test checks; NFC.
[InstCombine] auto-generate test checks; NFC
Fri, Aug 4, 12:30 PM
spatel committed rL310060: [InstCombine] narrow lshr with constant.
[InstCombine] narrow lshr with constant
Fri, Aug 4, 8:43 AM
spatel accepted D36240: [InstCombine] Remove the (not (sext)) case from foldBoolSextMaskToSelect and inline the remaining code to match visitOr.

LGTM.

Fri, Aug 4, 7:46 AM

Thu, Aug 3

spatel committed rL309946: [NewGVN] fix typos; NFC.
[NewGVN] fix typos; NFC
Thu, Aug 3, 8:19 AM
spatel committed rL309945: [BDCE] add tests to show invalid/incomplete transforms.
[BDCE] add tests to show invalid/incomplete transforms
Thu, Aug 3, 8:08 AM

Wed, Aug 2

spatel accepted D36214: [InstCombine] Remove explicit code for folding (xor(zext(cmp)), 1) and (xor(sext(cmp)), -1) to ext(!cmp)..

LGTM.

Wed, Aug 2, 1:04 PM
spatel added a comment to D36214: [InstCombine] Remove explicit code for folding (xor(zext(cmp)), 1) and (xor(sext(cmp)), -1) to ext(!cmp)..

Please add the sext test to this patch. I assume a zext variant already exists - if not, we could use one of those too.
That way we'll be sure that the intended folds are still happening via other mechanisms.

Wed, Aug 2, 12:14 PM
spatel accepted D36234: [InstCombine] Support sext in foldLogicCastConstant.

LGTM. I think it makes sense for the same reason as the zext case.

Wed, Aug 2, 12:02 PM
spatel added a comment to D36214: [InstCombine] Remove explicit code for folding (xor(zext(cmp)), 1) and (xor(sext(cmp)), -1) to ext(!cmp)..

Double-check to make sure, but I think the sext variant isn't handled if we remove this block (sadly, there's no regression test for it):

Wed, Aug 2, 10:18 AM
spatel updated subscribers of D36213: [InstCombine] Remove check for sext of vector icmp from shouldOptimizeCast.

I pushed 'test7' through llc for x86 and PPC64LE, and no problems. But then I tried AArch64 and ARM, and they went nuts whether it was an 'xor' or an 'and':

Wed, Aug 2, 9:19 AM
spatel added a comment to D35182: [InstCombine] remove one-use restriction for not (cmp P, A, B) --> cmp P', A, B.

The C code with the global variables never creates a 'not' operation and never combines the 4 compares with 0. For your modified version, it looks like something combined the two compares with A into a single compare and the xor and with this patch instcombine undoes that. Isn't that moving in the opposite direction of PR27431?

Wed, Aug 2, 4:39 AM