spatel (Sanjay Patel)
User

Projects

User does not belong to any projects.

User Details

User Since
May 22 2014, 1:24 PM (161 w, 2 d)

Recent Activity

Fri, Jun 23

spatel committed rL306139: [x86] fix value types for SBB transform (PR33560).
[x86] fix value types for SBB transform (PR33560)
Fri, Jun 23, 11:43 AM
spatel committed rL306114: [x86] auto-generate complete checks; NFC.
[x86] auto-generate complete checks; NFC
Fri, Jun 23, 8:30 AM
spatel committed rL306113: [x86] auto-generate complete checks; NFC.
[x86] auto-generate complete checks; NFC
Fri, Jun 23, 8:23 AM
spatel committed rL306110: [x86] remove overridden target settings in test; NFC.
[x86] remove overridden target settings in test; NFC
Fri, Jun 23, 8:07 AM
spatel committed rL306109: [x86] rename test file and auto-generate complete checks; NFC.
[x86] rename test file and auto-generate complete checks; NFC
Fri, Jun 23, 7:59 AM

Thu, Jun 22

spatel committed rL306072: [x86] add/sub (X==0) --> sbb(cmp X, 1).
[x86] add/sub (X==0) --> sbb(cmp X, 1)
Thu, Jun 22, 4:48 PM
spatel committed rL306064: [x86] add more tests for select --> sbb transform; NFC.
[x86] add more tests for select --> sbb transform; NFC
Thu, Jun 22, 3:17 PM
spatel committed rL306040: [x86] add/sub (X==0) --> sbb(neg X).
[x86] add/sub (X==0) --> sbb(neg X)
Thu, Jun 22, 11:12 AM
spatel committed rL306032: [x86] add tests for select --> sbb transform; NFC.
[x86] add tests for select --> sbb transform; NFC
Thu, Jun 22, 10:02 AM
spatel committed rL306011: [InstCombine] reverse bitcast + bitwise-logic canonicalization (PR33138).
[InstCombine] reverse bitcast + bitwise-logic canonicalization (PR33138)
Thu, Jun 22, 8:47 AM
spatel closed D33517: [InstCombine] reverse bitcast + bitwise-logic canonicalization (PR33138) by committing rL306011: [InstCombine] reverse bitcast + bitwise-logic canonicalization (PR33138).
Thu, Jun 22, 8:47 AM
spatel committed rL306008: [InstCombine] add peekThroughBitcast() helper; NFC.
[InstCombine] add peekThroughBitcast() helper; NFC
Thu, Jun 22, 8:28 AM
spatel accepted D34184: [InstCombine] Teach foldSelectICmpAndOr to recognize (select (icmp slt (trunc (X)), 0), Y, (or Y, C2)) .

LGTM. See inline for a nit.

Thu, Jun 22, 6:25 AM
spatel accepted D34498: [InstCombine] Add one use checks to or/and->xnor folding.

LGTM - although the test file was not updated here in the uploaded patch?

Thu, Jun 22, 6:11 AM

Wed, Jun 21

spatel committed rL305936: [CGP, memcmp] replace CreateZextOrTrunc with CreateZext because it can never….
[CGP, memcmp] replace CreateZextOrTrunc with CreateZext because it can never…
Wed, Jun 21, 11:21 AM
spatel committed rL305935: [CGP] fix variables to be unsigned in memcmp expansion.
[CGP] fix variables to be unsigned in memcmp expansion
Wed, Jun 21, 11:07 AM
spatel committed rL305931: [x86] set the datalayout to match the RUN line triple; NFC.
[x86] set the datalayout to match the RUN line triple; NFC
Wed, Jun 21, 10:07 AM
spatel committed rL305914: [x86] fix formatting; NFC.
[x86] fix formatting; NFC
Wed, Jun 21, 7:28 AM
spatel accepted D34437: [InstCombine] Don't let folding (select (icmp eq (and X, C1), 0), Y, (or Y, C2)) create more instructions than it removes.

LGTM. Don't forget to remove the 'TODO' comments from the test file.

Wed, Jun 21, 6:20 AM

Tue, Jun 20

spatel created D34416: [CGP] eliminate a sub instruction in memcmp expansion.
Tue, Jun 20, 12:52 PM
spatel added a comment to D34071: [CGP, PowerPC] try to constant fold before creating loads for memcmp expansion .

For those following the progress for x86, I enabled the smallest sizes with:
https://reviews.llvm.org/rL305802

Tue, Jun 20, 10:14 AM
spatel committed rL305802: [x86] enable CGP memcmp() expansion for 2/4/8 byte sizes.
[x86] enable CGP memcmp() expansion for 2/4/8 byte sizes
Tue, Jun 20, 8:59 AM
spatel committed rL305793: [InstCombine] fix code/test comments for r305792; NFC.
[InstCombine] fix code/test comments for r305792; NFC
Tue, Jun 20, 5:46 AM
spatel committed rL305792: [InstCombine] try to canonicalize xor-of-icmps to and-of-icmps.
[InstCombine] try to canonicalize xor-of-icmps to and-of-icmps
Tue, Jun 20, 5:41 AM
spatel closed D33342: [InstCombine] try to canonicalize xor-of-icmps to and-of-icmps by committing rL305792: [InstCombine] try to canonicalize xor-of-icmps to and-of-icmps.
Tue, Jun 20, 5:41 AM

Mon, Jun 19

spatel added a comment to D33342: [InstCombine] try to canonicalize xor-of-icmps to and-of-icmps.

Sorry about the delayed review.

Mon, Jun 19, 1:58 PM
spatel accepted D34350: Add tests to document current InstCombine behavior for clamp pattern..

LGTM.

Mon, Jun 19, 1:22 PM
spatel committed rL305734: [CGP, PowerPC] try to constant fold before creating loads for memcmp expansion.
[CGP, PowerPC] try to constant fold before creating loads for memcmp expansion
Mon, Jun 19, 12:49 PM
spatel closed D34071: [CGP, PowerPC] try to constant fold before creating loads for memcmp expansion by committing rL305734: [CGP, PowerPC] try to constant fold before creating loads for memcmp expansion.
Mon, Jun 19, 12:49 PM
spatel added a comment to D33342: [InstCombine] try to canonicalize xor-of-icmps to and-of-icmps.

Ping * 3.

Mon, Jun 19, 11:23 AM
spatel added reviewers for D34071: [CGP, PowerPC] try to constant fold before creating loads for memcmp expansion : hfinkel, echristo.
Mon, Jun 19, 11:21 AM
spatel accepted D34308: [InstCombine] Cleanup some duplicated one use checks.

LGTM.

Mon, Jun 19, 8:49 AM
spatel updated the diff for D34336: [x86] transform vector inc/dec to use -1 constant (PR33483).

Patch updated (no functional changes):

  1. Use APInt::isOneValue().
  2. Change comment about potential zero-latency implementation of pcmpeq - although that does seem possible for some uarch to implement if it wanted to. :)
Mon, Jun 19, 8:36 AM
spatel added a comment to D34336: [x86] transform vector inc/dec to use -1 constant (PR33483).

Here's a test loop:

Mon, Jun 19, 8:09 AM
spatel added a comment to D34336: [x86] transform vector inc/dec to use -1 constant (PR33483).

vpternlog does not have any idiom recognition.

For pcmpeq I think intel only avoids the dependency but still executes it. What does AMD do?

Confirmed with Agner's docs - Jaguar/Bulldozer/Ryzen all avoid input register dependencies for PCMPEQ/PCMPGT/PSUB/XOR/ANDN simd instructions when the two inputs are the same. They still create+execute uops (unlike move elimination) - but as integer ops can often go down most simd pipes and are low latency they are very unlikely to cause a performance regression.

Mon, Jun 19, 8:00 AM

Sun, Jun 18

spatel created D34336: [x86] transform vector inc/dec to use -1 constant (PR33483).
Sun, Jun 18, 3:53 PM
spatel added a comment to D34242: [InstCombine] canonicalize icmp predicate feeding select.

If the goal of this is to expose more opportunities for congruent expressions maybe GVN itself should do this canonicalization, maybe? Some algorithms perform, e.g., reassociation to catch more cases.
See, e.g.
http://www.sable.mcgill.ca/~hendren/621/karthikhandouts.pdf

Sun, Jun 18, 3:27 PM
spatel committed rL305656: [x86] specify triples and auto-generate complete checks; NFC.
[x86] specify triples and auto-generate complete checks; NFC
Sun, Jun 18, 2:49 PM
spatel committed rL305655: [x86] specify triples and auto-generate complete checks; NFC.
[x86] specify triples and auto-generate complete checks; NFC
Sun, Jun 18, 2:43 PM
spatel committed rL305654: [x86] specify triple and auto-generate checks; NFC.
[x86] specify triple and auto-generate checks; NFC
Sun, Jun 18, 2:31 PM
spatel committed rL305646: x86] adjust test constants to maintain coverage; NFC.
x86] adjust test constants to maintain coverage; NFC
Sun, Jun 18, 7:46 AM
spatel committed rL305645: [x86] adjust test constants to maintain coverage; NFC.
[x86] adjust test constants to maintain coverage; NFC
Sun, Jun 18, 7:24 AM
spatel committed rL305644: [x86] adjust test constants to maintain coverage; NFC.
[x86] adjust test constants to maintain coverage; NFC
Sun, Jun 18, 7:02 AM

Fri, Jun 16

spatel accepted D34162: [InstCombine] Set correct insertion point for selects generated while folding phis.

LGTM.

Fri, Jun 16, 12:42 PM
spatel added a comment to D34162: [InstCombine] Set correct insertion point for selects generated while folding phis.
In D34162#782594, @anna wrote:

Should we go ahead with this fix, and come up with cost model for vector constants?

Fri, Jun 16, 11:54 AM
spatel added a comment to D34162: [InstCombine] Set correct insertion point for selects generated while folding phis.

I don't think we even need the cmp:

Fri, Jun 16, 9:21 AM
spatel added a comment to D34162: [InstCombine] Set correct insertion point for selects generated while folding phis.

The code change looks right, but the test is not minimized, and it's also not checking the more complicated case of a phi with >2 incoming values.

Fri, Jun 16, 9:03 AM
spatel added a comment to D34071: [CGP, PowerPC] try to constant fold before creating loads for memcmp expansion .

Ping.

Fri, Jun 16, 7:39 AM
spatel added a comment to D33186: [InstCombine] Canonicalize clamp of float types to minmax in fast mode..

@a.elovikov - I didn't find your name in bugzilla; you might be interested in:
https://bugs.llvm.org/show_bug.cgi?id=33467

Fri, Jun 16, 7:13 AM

Thu, Jun 15

spatel accepted D34244: [InstCombine] Fold (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 | K2)) == (K1 | K2) if K1 and K2 are a 1-bit mask.

LGTM.

Thu, Jun 15, 3:56 PM
spatel accepted D33406: PR28129 expand vector oparation to an IR constant..

LGTM.

Thu, Jun 15, 3:49 PM
spatel added a comment to D33517: [InstCombine] reverse bitcast + bitwise-logic canonicalization (PR33138).

Ping * 3.

Thu, Jun 15, 12:27 PM
spatel added a comment to D34244: [InstCombine] Fold (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 | K2)) == (K1 | K2) if K1 and K2 are a 1-bit mask.

Will do.

Can I change foldOrOfIcmps to take CxtI by reference so we discourage anyone from passing nullptr.

Thu, Jun 15, 11:52 AM
spatel added a comment to D34184: [InstCombine] Teach foldSelectICmpAndOr to recognize (select (icmp slt (trunc (X)), 0), Y, (or Y, C2)) .

I think you're right. The 'icmp' and the 'or' pieces aren't checked for single use. In the existing code we do reuse the 'and'. In my new code i did make sure the trunc only had one use so that putting back an 'and' doesn't increase anything. So in the existing code we potentially create an 'or' and possibly a 'shift' and only delete a 'select'. And the really bad case we create an 'xor', an 'or', and a 'shift' while only removing a 'select'. This is kinda weird to handle because we don't always create a shift and don't always create an xor.

Thu, Jun 15, 11:43 AM
spatel added a comment to D34244: [InstCombine] Fold (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 | K2)) == (K1 | K2) if K1 and K2 are a 1-bit mask.

Nice! I was wondering if that was missing.

Thu, Jun 15, 11:36 AM
spatel added a comment to D34184: [InstCombine] Teach foldSelectICmpAndOr to recognize (select (icmp slt (trunc (X)), 0), Y, (or Y, C2)) .

The enhancement looks right, but I'm worried that the existing transform can increase the instruction count if any of the pieces in the pattern have more than one use. Are we missing m_OneUse() in some/all of the matchers?

Thu, Jun 15, 10:59 AM
spatel added reviewers for D33186: [InstCombine] Canonicalize clamp of float types to minmax in fast mode.: majnemer, efriedma, craig.topper.

Adding more potential reviewers.

Thu, Jun 15, 10:35 AM
spatel added inline comments to D34230: [InstCombine] Handle (iszero(A & K1) | iszero(A & K2)) -> (A & (K1 | K2)) != (K1 | K2) when the one of the Ands is commuted relative to the other.
Thu, Jun 15, 10:33 AM
spatel accepted D34230: [InstCombine] Handle (iszero(A & K1) | iszero(A & K2)) -> (A & (K1 | K2)) != (K1 | K2) when the one of the Ands is commuted relative to the other.

LGTM. See inline for a possible follow-up and nit.

Thu, Jun 15, 10:31 AM
spatel added a comment to D33406: PR28129 expand vector oparation to an IR constant..

Functionally, I think this is correct and complete now. See inline for some nits.

Thu, Jun 15, 9:53 AM
spatel created D34242: [InstCombine] canonicalize icmp predicate feeding select.
Thu, Jun 15, 9:32 AM
spatel committed rL305474: [InstCombine] auto-generate complete checks; NFC.
[InstCombine] auto-generate complete checks; NFC
Thu, Jun 15, 8:14 AM

Wed, Jun 14

spatel committed rL305414: [x86] avoid unnecessary shuffle mask math in combineX86ShufflesRecursively().
[x86] avoid unnecessary shuffle mask math in combineX86ShufflesRecursively()
Wed, Jun 14, 1:37 PM
spatel added a comment to D33186: [InstCombine] Canonicalize clamp of float types to minmax in fast mode..

If there are no objections, I would add the regression tests with the current output as a preliminary step, so we document the current behavior.

Wed, Jun 14, 10:10 AM
spatel closed D34174: [x86] replace div/rem with shift/mask for shuffle combining by committing rL305398: [x86] replace div/rem with shift/mask for better shuffle combining perf.
Wed, Jun 14, 10:01 AM
spatel committed rL305398: [x86] replace div/rem with shift/mask for better shuffle combining perf.
[x86] replace div/rem with shift/mask for better shuffle combining perf
Wed, Jun 14, 10:01 AM
spatel accepted D34165: [ValueTracking] Correct early out in computeKnownBitsFromOperator to work with non power of 2 bit widths.

LGTM.

Wed, Jun 14, 9:33 AM
spatel added a comment to D34174: [x86] replace div/rem with shift/mask for shuffle combining.

This makes lots of sense to me. Honestly, fixing the multiplies might be worth it -- if you can benchmark >1% improvement, I might do it. I'm thinking about the architectures where integer multiplies are *very* slow (atom at least, also some ARM chips)

Wed, Jun 14, 9:24 AM
spatel committed rL305388: [MathExtras] fix documentation comments; NFC.
[MathExtras] fix documentation comments; NFC
Wed, Jun 14, 7:29 AM

Tue, Jun 13

spatel added a comment to D34165: [ValueTracking] Correct early out in computeKnownBitsFromOperator to work with non power of 2 bit widths.

The analysis looks right to me - this won't work for non-power-of-2 widths, and there's nothing restricting the possible widths coming into here AFAICT.

Tue, Jun 13, 3:23 PM
spatel created D34174: [x86] replace div/rem with shift/mask for shuffle combining.
Tue, Jun 13, 2:49 PM
spatel added a comment to D33185: Fix m_[Ord|Unord][FMin|FMax] matchers to correctly match ordering..

@spatel Yes he does. He sent me a mail asking for me to commit it for him. Do you want to do it or do you want me to?

Tue, Jun 13, 10:06 AM
spatel added a comment to D33185: Fix m_[Ord|Unord][FMin|FMax] matchers to correctly match ordering..

Thanks! Do you need someone to commit this for you?

Tue, Jun 13, 9:01 AM

Mon, Jun 12

spatel committed rL305243: fix typos/formatting; NFC.
fix typos/formatting; NFC
Mon, Jun 12, 3:35 PM
spatel added inline comments to D34056: Tail merge size.
Mon, Jun 12, 11:00 AM
spatel committed rL305208: [utils] remove ability to generate llc check lines from update_test_checks.py.
[utils] remove ability to generate llc check lines from update_test_checks.py
Mon, Jun 12, 10:45 AM
spatel committed rL305206: [x86] regenerate checks with update_llc_test_checks.py.
[x86] regenerate checks with update_llc_test_checks.py
Mon, Jun 12, 10:32 AM
spatel committed rL305202: [x86] regenerate checks with update_llc_test_checks.py.
[x86] regenerate checks with update_llc_test_checks.py
Mon, Jun 12, 10:06 AM
spatel accepted D33185: Fix m_[Ord|Unord][FMin|FMax] matchers to correctly match ordering..

Sorry for the delay. Can this bug be exposed with an IR transform or a codegen test?

I am not aware of such cases. I've faced that when working on D33186 but later decided to work with fast math only there. But I believe that this is still useful to fix. At least someone else won't be surprised by it.

Mon, Jun 12, 9:39 AM
spatel added a comment to D33185: Fix m_[Ord|Unord][FMin|FMax] matchers to correctly match ordering..

Sorry for the delay. Can this bug be exposed with an IR transform or a codegen test?

Mon, Jun 12, 9:06 AM
spatel committed rL305192: [DAG] add helper to bind memop chains; NFCI.
[DAG] add helper to bind memop chains; NFCI
Mon, Jun 12, 7:42 AM
spatel closed D33649: [DAG] add helper to bind memop chains; NFCI by committing rL305192: [DAG] add helper to bind memop chains; NFCI.
Mon, Jun 12, 7:42 AM
spatel committed rL305190: [InstCombine] lshr (sext iM X to iN), N-M --> zext (ashr X, min(N-M, M-1)) to iN.
[InstCombine] lshr (sext iM X to iN), N-M --> zext (ashr X, min(N-M, M-1)) to iN
Mon, Jun 12, 7:24 AM
spatel added a comment to D33649: [DAG] add helper to bind memop chains; NFCI.

The patch was approved with the requirement to use "getChain()", but IIUC, we can't do that here. Can someone confirm if I understood that correctly?

Mon, Jun 12, 6:29 AM

Sun, Jun 11

spatel added inline comments to D34071: [CGP, PowerPC] try to constant fold before creating loads for memcmp expansion .
Sun, Jun 11, 9:22 PM
spatel committed rL305171: [x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32….
[x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32…
Sun, Jun 11, 2:19 PM
spatel closed D33938: [x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load by committing rL305171: [x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32….
Sun, Jun 11, 2:19 PM
spatel added a comment to D33938: [x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load.

A couple of notes for reference:

  1. There's another potential case for trying harder to recognize a splat mask in PR32007:

https://bugs.llvm.org/show_bug.cgi?id=32007

Sun, Jun 11, 10:17 AM

Fri, Jun 9

spatel created D34071: [CGP, PowerPC] try to constant fold before creating loads for memcmp expansion .
Fri, Jun 9, 4:40 PM
spatel committed rL305131: [PowerPC] add memcmp test with one constant operand and equality cmp; NFC.
[PowerPC] add memcmp test with one constant operand and equality cmp; NFC
Fri, Jun 9, 4:15 PM
spatel committed rL305129: [CGP] add a reference to DataLayout in MemCmpExpansion; NFCI.
[CGP] add a reference to DataLayout in MemCmpExpansion; NFCI
Fri, Jun 9, 4:01 PM
spatel added a comment to D33342: [InstCombine] try to canonicalize xor-of-icmps to and-of-icmps.

Ping * 2.

Fri, Jun 9, 12:16 PM
spatel committed rL305081: [SimplifyLibCalls] fix formatting; NFC.
[SimplifyLibCalls] fix formatting; NFC
Fri, Jun 9, 7:22 AM
spatel committed rL305080: [ValueTracking] fix typo; NFC.
[ValueTracking] fix typo; NFC
Fri, Jun 9, 7:21 AM

Thu, Jun 8

spatel committed rL305011: [CGP, x86] add tests for potential memcmp expansion; NFC.
[CGP, x86] add tests for potential memcmp expansion; NFC
Thu, Jun 8, 1:41 PM
spatel committed rL305008: fix formatting; NFC.
fix formatting; NFC
Thu, Jun 8, 1:00 PM
spatel committed rL305007: [CGP] don't expand a memcmp with nobuiltin attribute.
[CGP] don't expand a memcmp with nobuiltin attribute
Thu, Jun 8, 12:48 PM
spatel closed D34043: [CGP] don't expand a memcmp with nobuiltin attribute by committing rL305007: [CGP] don't expand a memcmp with nobuiltin attribute.
Thu, Jun 8, 12:47 PM
spatel created D34043: [CGP] don't expand a memcmp with nobuiltin attribute.
Thu, Jun 8, 11:35 AM
spatel added a comment to D33517: [InstCombine] reverse bitcast + bitwise-logic canonicalization (PR33138).

Ping * 2.

Thu, Jun 8, 11:01 AM