spatel (Sanjay Patel)
User

Projects

User does not belong to any projects.

User Details

User Since
May 22 2014, 1:24 PM (191 w, 2 d)

Recent Activity

Today

spatel added a comment to D42032: [LLVM][PASSES][InstCombine] Fix (a + a + ...) / a cases.

Tests committed at:
https://reviews.llvm.org/rL323043

Sat, Jan 20, 8:20 AM
spatel committed rL323043: [InstCombine] add baseline tests for (X << Y) / X -> 1 << Y; NFC.
[InstCombine] add baseline tests for (X << Y) / X -> 1 << Y; NFC
Sat, Jan 20, 8:15 AM
spatel added a comment to D42032: [LLVM][PASSES][InstCombine] Fix (a + a + ...) / a cases.

I also prefer to check all of the tests in first to show the current output of trunk. That way if your code commit is reverted for some reason, at least the tests will survive and show what was lost by the revert.

check all of the tests in

check<..>in == commit

Sat, Jan 20, 8:07 AM
spatel accepted D42265: [X86] Add an override of targetShrinkDemandedConstant to limit the damage that shrinkdemandedbits can do to zext_in_reg operations.

LGTM.

Sat, Jan 20, 8:04 AM

Yesterday

spatel added a comment to D42032: [LLVM][PASSES][InstCombine] Fix (a + a + ...) / a cases.

This looks right, but see inline comments for some small improvements. Do you have commit access?

No, unfortunately.

Fri, Jan 19, 3:28 PM
spatel created D42323: [DAGCombiner] filter out denorm inputs when calculating sqrt estimate (PR34994).
Fri, Jan 19, 3:18 PM
spatel committed rL323003: [x86] add tests for sqrt estimate that should respect denorms; NFC (PR34994).
[x86] add tests for sqrt estimate that should respect denorms; NFC (PR34994)
Fri, Jan 19, 2:50 PM
spatel added a comment to D42265: [X86] Add an override of targetShrinkDemandedConstant to limit the damage that shrinkdemandedbits can do to zext_in_reg operations.

Seems like a good change. I was afraid of getting regressions from something like this, but the loosening of the tablegen matcher seems to be handling those cases.

Fri, Jan 19, 11:07 AM
spatel committed rL322961: [x86] add RUN line and auto-generate checks.
[x86] add RUN line and auto-generate checks
Fri, Jan 19, 9:11 AM
spatel committed rL322960: [x86] regenerate complete checks; NFC.
[x86] regenerate complete checks; NFC
Fri, Jan 19, 9:07 AM
spatel committed rL322957: [x86] shrink 'and' immediate values by setting the high bits (PR35907).
[x86] shrink 'and' immediate values by setting the high bits (PR35907)
Fri, Jan 19, 8:40 AM
spatel closed D42088: [x86] shrink 'and' immediate values by setting the high bits (PR35907).
Fri, Jan 19, 8:40 AM
spatel committed rL322955: [InstSimplify] use m_Specific and commutative matcher to reduce code; NFCI.
[InstSimplify] use m_Specific and commutative matcher to reduce code; NFCI
Fri, Jan 19, 8:14 AM
spatel added a comment to D42032: [LLVM][PASSES][InstCombine] Fix (a + a + ...) / a cases.

This looks right, but see inline comments for some small improvements. Do you have commit access?

Fri, Jan 19, 8:05 AM
spatel committed rC322950: [CodeGenCXX] annotate a GEP to a derived class with 'inbounds' (PR35909).
[CodeGenCXX] annotate a GEP to a derived class with 'inbounds' (PR35909)
Fri, Jan 19, 7:19 AM
spatel committed rL322950: [CodeGenCXX] annotate a GEP to a derived class with 'inbounds' (PR35909).
[CodeGenCXX] annotate a GEP to a derived class with 'inbounds' (PR35909)
Fri, Jan 19, 7:16 AM
spatel closed D42249: [CodeGenCXX] annotate a GEP to a derived class with 'inbounds' (PR35909).
Fri, Jan 19, 7:16 AM

Thu, Jan 18

spatel committed rL322907: [InstSimplify] regenerate checks and add tests for commutes; NFC.
[InstSimplify] regenerate checks and add tests for commutes; NFC
Thu, Jan 18, 3:13 PM
spatel updated the diff for D42249: [CodeGenCXX] annotate a GEP to a derived class with 'inbounds' (PR35909).

Patch updated:

  1. Removed comment that didn't add value.
  2. Added test with no sanitizing, reduced from PR35909.
Thu, Jan 18, 1:37 PM
spatel added a comment to D42032: [LLVM][PASSES][InstCombine] Fix (a + a + ...) / a cases.

This patch raises a question that I don't know the answer to. We're missing the more general fold (divisor is not constant) in instcombine:

Thu, Jan 18, 11:37 AM
spatel added inline comments to D42032: [LLVM][PASSES][InstCombine] Fix (a + a + ...) / a cases.
Thu, Jan 18, 11:12 AM
spatel accepted D39958: [InstCombine] Make foldSelectOpOp able to handle two-operand getelementptr.

LGTM.

Thu, Jan 18, 10:31 AM
spatel created D42249: [CodeGenCXX] annotate a GEP to a derived class with 'inbounds' (PR35909).
Thu, Jan 18, 10:07 AM
spatel added inline comments to D42088: [x86] shrink 'and' immediate values by setting the high bits (PR35907).
Thu, Jan 18, 8:02 AM
spatel updated the diff for D42088: [x86] shrink 'and' immediate values by setting the high bits (PR35907).

Patch updated:

  1. Use DAG.MaskedValueIsZero() to make the code cleaner.
  2. Move the check for shrinkability ahead of the more expensive call to compute known bits.
  3. Check for a -1 'and' mask. In this case, replace and return. Don't create an unnecessary constant and don't select out-of-turn. Allow the normal selection sequence to continue.
Thu, Jan 18, 7:52 AM
spatel committed rL322855: [TargetLowering] add punctuation for readability; NFC.
[TargetLowering] add punctuation for readability; NFC
Thu, Jan 18, 7:28 AM

Wed, Jan 17

spatel added inline comments to D42090: [DAGCombiner] Add a DAG combine to turn a splat build_vector where the splat elemnt is a bitcast from a vector type into a concat_vector.
Wed, Jan 17, 12:49 PM
spatel added a comment to D39958: [InstCombine] Make foldSelectOpOp able to handle two-operand getelementptr.

I added the tests with current output in rL322733. Can you rebase here so we just see the diffs (please use the script at utils/update_test_checks.py to regenerate the check lines)? I think we'll be good then.

Wed, Jan 17, 11:11 AM
spatel committed rL322733: [InstCombine] add baseline tests for D39958; NFC.
[InstCombine] add baseline tests for D39958; NFC
Wed, Jan 17, 11:07 AM
spatel added a comment to D41608: [InstCombine] Missed optimization in math expression: aggressive optimization with pow.

pow(a, x) * a * a * a * a emits to

define double @pow_ab_x_aaaa_fast(double %a, double %x) {
  %1 = call fast double @llvm.pow.f64(double %a, double %x)
  %2 = fmul fast double %a, %a
  %3 = fmul fast double %2, %2
  %mul4 = fmul fast double %3, %1
  ret double %mul4
}

I don't see obvious ways to fold these instructions. I Would appreciate if somebody could help me with this.

Wed, Jan 17, 10:54 AM
spatel accepted D42144: [utils] Make .cfi_startproc optional for powerpc.

LGTM - feel free to make any script changes to improve PPC handling.

Wed, Jan 17, 10:41 AM
spatel accepted D42086: [X86] Teach LowerBUILD_VECTOR to recognize pair-wise splats of 32-bit elements and use a 64-bit broadcast.

LGTM.

Wed, Jan 17, 10:38 AM
spatel added a comment to D42088: [x86] shrink 'and' immediate values by setting the high bits (PR35907).

Here's an example of the SimplifyDemandedBits failure that avoids the preliminary div transform in the existing test:

Wed, Jan 17, 9:50 AM
spatel added inline comments to D42090: [DAGCombiner] Add a DAG combine to turn a splat build_vector where the splat elemnt is a bitcast from a vector type into a concat_vector.
Wed, Jan 17, 7:12 AM
spatel committed rL322662: [InstCombine] fix demanded-bits propagation for zext/trunc.
[InstCombine] fix demanded-bits propagation for zext/trunc
Wed, Jan 17, 6:40 AM
spatel committed rL322660: [InstCombine] add test to show hole in demanded bits; NFC.
[InstCombine] add test to show hole in demanded bits; NFC
Wed, Jan 17, 6:29 AM

Tue, Jan 16

spatel added a comment to D42088: [x86] shrink 'and' immediate values by setting the high bits (PR35907).

I wonder if we should do this in PreprocessISelDAG so the killing off of the AND when the mask is all 1s doesn't seem quite so weird. Right now when it happens the Select call ends up doing selection on the input to the AND.

Tue, Jan 16, 12:14 PM
spatel added a comment to D42090: [DAGCombiner] Add a DAG combine to turn a splat build_vector where the splat elemnt is a bitcast from a vector type into a concat_vector.

Can you upload again with context?

Tue, Jan 16, 8:41 AM
spatel added inline comments to D42086: [X86] Teach LowerBUILD_VECTOR to recognize pair-wise splats of 32-bit elements and use a 64-bit broadcast.
Tue, Jan 16, 8:27 AM

Mon, Jan 15

spatel created D42088: [x86] shrink 'and' immediate values by setting the high bits (PR35907).
Mon, Jan 15, 2:25 PM
spatel committed rL322523: [x86] add tests to show missed constant shrinking (PR35907); NFC.
[x86] add tests to show missed constant shrinking (PR35907); NFC
Mon, Jan 15, 1:59 PM
spatel committed rL322522: [x86] regenerate test checks; NFC.
[x86] regenerate test checks; NFC
Mon, Jan 15, 1:34 PM
spatel committed rL322521: [x86] regenerate test checks; NFC.
[x86] regenerate test checks; NFC
Mon, Jan 15, 1:30 PM
spatel committed rL322519: [x86] regenerate test checks; NFC.
[x86] regenerate test checks; NFC
Mon, Jan 15, 1:24 PM
spatel added a comment to D41599: [X86] Lowering X86 avx512 sqrt intrinsics to IR - LLVM.

Simon, is there anything else you think that is needed to be changed before accepting the revision?
Thanks

I'm still a little worried about this - it can create a lot more bit differences in results than previous other intrinsics where we've replaced with generic implementations - I guess _mm_div_ps already does this to an extent (and other fadd/fsub/fmul cases via re-association etc.).

Maybe I'm just being a little over cautious, but at very least I'd like to see D41168 update the intrinsic documentation to explain that -ffast-math may result in rsqrt+nr codgen under some circumstances - it still says that (v)sqrtps will be generated.

Mon, Jan 15, 7:37 AM

Sun, Jan 14

spatel committed rL322457: [x86] auto-generate complete checks; NFC.
[x86] auto-generate complete checks; NFC
Sun, Jan 14, 9:49 AM
spatel committed rL322456: [InstSimplify] fix code comments; NFC.
[InstSimplify] fix code comments; NFC
Sun, Jan 14, 8:01 AM

Sat, Jan 13

spatel committed rL322439: [InstSimplify] fold implied null ptr check (PR35790).
[InstSimplify] fold implied null ptr check (PR35790)
Sat, Jan 13, 7:46 AM

Fri, Jan 12

spatel committed rL322411: [InstSimplify] add tests for implied ptr cmp with null (PR35790); NFC.
[InstSimplify] add tests for implied ptr cmp with null (PR35790); NFC
Fri, Jan 12, 2:18 PM
spatel updated subscribers of D41283: [InstCombine] Missed optimization in math expression: tan(a) * cos(a) == sin(a).

Why doesn't tan have an LLVM intrinsic like sin/cos?

I don't know

Fri, Jan 12, 10:10 AM
spatel added a comment to D41283: [InstCombine] Missed optimization in math expression: tan(a) * cos(a) == sin(a).

Why doesn't tan have an LLVM intrinsic like sin/cos?

Fri, Jan 12, 9:50 AM

Thu, Jan 11

spatel committed rL322327: [InstSimplify] fold implied cmp with zero (PR35790).
[InstSimplify] fold implied cmp with zero (PR35790)
Thu, Jan 11, 3:28 PM
spatel committed rL322323: [InstSimplify] add tests for implied cmp with zero (PR35790); NFC.
[InstSimplify] add tests for implied cmp with zero (PR35790); NFC
Thu, Jan 11, 2:49 PM
spatel updated subscribers of D41286: [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x).

For reference, @bkramer improved (thanks!) the attribute propagation:
rL322284
rL322285

Thu, Jan 11, 8:01 AM
spatel committed rL322283: [ValueTracking] recognize min/max-of-min/max with notted ops (PR35875).
[ValueTracking] recognize min/max-of-min/max with notted ops (PR35875)
Thu, Jan 11, 7:15 AM
spatel committed rL322281: [InstCombine] add min3-with-nots test (PR35875); NFC.
[InstCombine] add min3-with-nots test (PR35875); NFC
Thu, Jan 11, 6:55 AM

Wed, Jan 10

spatel committed rL322238: [AArch64] add tests for notted variants of min/max; NFC.
[AArch64] add tests for notted variants of min/max; NFC
Wed, Jan 10, 3:33 PM
spatel accepted D41286: [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x).

I changed the brute force float checker to test 1/tan(x), and it matches cos(x)/sin(x) in all cases on macOS 10.13. This is also true on Ubuntu 17.10 x86-64.

Wed, Jan 10, 12:45 PM
spatel added a comment to D41353: [InstCombine] Adjusting bswap pattern to the new masked shl canonization.

If I understand the relationship correctly between the patches, this patch would move an existing failure of bswap matching to a different place. Shouldn't we improve the matching instead? Ie, I was expecting an independent patch from D41233 here.

Wed, Jan 10, 10:54 AM
spatel committed rL322206: [InstCombine] add test to show missed bswap; NFC.
[InstCombine] add test to show missed bswap; NFC
Wed, Jan 10, 10:48 AM

Tue, Jan 9

spatel committed rL322104: [InstCombine] weaken assertions for icmp folds (PR35846).
[InstCombine] weaken assertions for icmp folds (PR35846)
Tue, Jan 9, 10:57 AM
spatel committed rL322087: [SelectionDAG] lower math intrinsics to finite version of libcalls when….
[SelectionDAG] lower math intrinsics to finite version of libcalls when…
Tue, Jan 9, 7:42 AM
spatel closed D41338: [CodeGen] lower math intrinsics to finite version of libcalls when possible (PR35672).
Tue, Jan 9, 7:42 AM

Mon, Jan 8

spatel updated subscribers of D41286: [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x).

log(pow(x, y)) -> y*log(x)

Mon, Jan 8, 3:23 PM
spatel added a comment to D41286: [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x).

If you have a numerical explanation of why this patch can go in, I'll be happy to accept,

Mon, Jan 8, 2:46 PM
spatel added inline comments to D41286: [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x).
Mon, Jan 8, 1:34 PM
spatel added inline comments to D41286: [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x).
Mon, Jan 8, 11:55 AM
spatel added inline comments to D41286: [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x).
Mon, Jan 8, 10:57 AM
spatel committed rL322016: [ValueTracking] remove overzealous assert.
[ValueTracking] remove overzealous assert
Mon, Jan 8, 10:32 AM
spatel added a comment to D41338: [CodeGen] lower math intrinsics to finite version of libcalls when possible (PR35672).

I noticed that we can expose the availability bug via constant propagation, so I split that part of the patch off here:
rL322010

Mon, Jan 8, 9:48 AM
spatel committed rL322010: [TargetLibraryInfo] fix finite mathlib function availability.
[TargetLibraryInfo] fix finite mathlib function availability
Mon, Jan 8, 9:39 AM
spatel added inline comments to D41286: [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x).
Mon, Jan 8, 8:05 AM
spatel committed rL321998: [InstCombine] fold min/max tree with common operand (PR35717).
[InstCombine] fold min/max tree with common operand (PR35717)
Mon, Jan 8, 7:06 AM
spatel closed D41603: [InstCombine] fold min/max tree with common operand (PR35717).
Mon, Jan 8, 7:06 AM
spatel added a comment to D41603: [InstCombine] fold min/max tree with common operand (PR35717).

If I'm right about the easy FP cases, I'd like to see those handled in follow-up. Regardless, this LGTM.

Mon, Jan 8, 7:00 AM

Sat, Jan 6

spatel committed rL321936: [InstCombine] relax use constraint for min/max (~a, ~b) --> ~min/max(a, b).
[InstCombine] relax use constraint for min/max (~a, ~b) --> ~min/max(a, b)
Sat, Jan 6, 9:35 AM
spatel committed rL321935: [InstCombine] add more tests for max(~a, ~b) and PR35834; NFC.
[InstCombine] add more tests for max(~a, ~b) and PR35834; NFC
Sat, Jan 6, 9:16 AM
spatel committed rL321934: [x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325).
[x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325)
Sat, Jan 6, 8:17 AM
spatel closed D41714: [x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325).
Sat, Jan 6, 8:17 AM

Fri, Jan 5

spatel committed rL321882: [InstCombine] add folds for min(~a, b) --> ~max(a, b).
[InstCombine] add folds for min(~a, b) --> ~max(a, b)
Fri, Jan 5, 11:02 AM
spatel added inline comments to D41286: [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x).
Fri, Jan 5, 10:43 AM

Thu, Jan 4

spatel updated the diff for D41714: [x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325).

Patch updated:
Add a RUN to the IR test for expansion so we see exactly how the IR differs between the old and new settings. I just did that for x64 because x32 would be the same in most cases.

Thu, Jan 4, 12:48 PM
spatel added inline comments to D41714: [x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325).
Thu, Jan 4, 12:42 PM
spatel committed rL321801: [InstCombine] safely create a constant of the right type (PR35794).
[InstCombine] safely create a constant of the right type (PR35794)
Thu, Jan 4, 6:33 AM

Wed, Jan 3

spatel updated the diff for D41603: [InstCombine] fold min/max tree with common operand (PR35717).

Patch updated:
I adjusted the use checks to be '<='. This accounts for non-canonical min/max patterns as seen in the motivating test (the compare operands are not the same as the select operands). So now this patch should be the last step towards solving PR35717.

Wed, Jan 3, 2:33 PM
spatel created D41714: [x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325).
Wed, Jan 3, 1:31 PM
spatel committed rL321756: [ExpandMemcmp] rename variables and add hook to override pref for number of….
[ExpandMemcmp] rename variables and add hook to override pref for number of…
Wed, Jan 3, 12:04 PM
spatel added a comment to D41338: [CodeGen] lower math intrinsics to finite version of libcalls when possible (PR35672).

Ping.

Wed, Jan 3, 9:44 AM
spatel added a comment to D41659: Implementing missing trigonometric optimizations.

Thanks for your patch!

Usually it is easier to review smaller patches. Splitting it up into 4 patches for the 4 different cases you implemented would make it slightly easier to reason about the 4 unrelated transformation in isolation.

Wed, Jan 3, 6:49 AM
spatel accepted D41381: [InstSimplify] Missed optimization in math expression: squashing exp(log), log(exp).

LGTM - thanks for the clean-up.

Wed, Jan 3, 6:26 AM

Tue, Jan 2

spatel accepted D41381: [InstSimplify] Missed optimization in math expression: squashing exp(log), log(exp).

LGTM - see inline comments for some potential improvements.

Tue, Jan 2, 2:28 PM
spatel committed rL321673: [AArch64] fix typos in comments; NFC.
[AArch64] fix typos in comments; NFC
Tue, Jan 2, 1:05 PM
spatel committed rL321672: [ValueTracking] recognize min/max of min/max patterns.
[ValueTracking] recognize min/max of min/max patterns
Tue, Jan 2, 12:58 PM
spatel committed rL321668: [AArch64] add tests for min/max of min/max (PR35717); NFC.
[AArch64] add tests for min/max of min/max (PR35717); NFC
Tue, Jan 2, 12:18 PM
spatel committed rL321656: [x86] allow pairs of PCMPEQ for vector-sized integer equality comparisons….
[x86] allow pairs of PCMPEQ for vector-sized integer equality comparisons…
Tue, Jan 2, 8:39 AM
spatel closed D41618: [x86] allow pairs of PCMPEQ for vector-sized integer equality comparisons (PR33325).
Tue, Jan 2, 8:39 AM
spatel added a comment to D41381: [InstSimplify] Missed optimization in math expression: squashing exp(log), log(exp).

What is the expected result for this kind of test (please add at least one test like this)?

define double @exp_log_partly_fast(double %a) {
  %1 = call double @llvm.log.f64(double %a)          ; strict
  %2 = call fast double @llvm.exp.f64(double %1)
  ret double %2
}
Tue, Jan 2, 8:08 AM
spatel updated the diff for D41618: [x86] allow pairs of PCMPEQ for vector-sized integer equality comparisons (PR33325).

Patch updated:
No functional difference from previous draft, but improved code comments and variable name.

Tue, Jan 2, 7:16 AM
spatel added inline comments to D41618: [x86] allow pairs of PCMPEQ for vector-sized integer equality comparisons (PR33325).
Tue, Jan 2, 7:15 AM

Mon, Jan 1

spatel updated subscribers of D41618: [x86] allow pairs of PCMPEQ for vector-sized integer equality comparisons (PR33325).

cc @craig.topper and @hfinkel -- This might be an interesting case for the x86 preferred vector width effort (D41341) even for 128/256-bit vectors. Here, we're taking scalar code that is or would be produced by memcmp expansion and translating it to vectors based on the available ISA, but without accounting for the preferred vector width. I think we should add that predicate as a follow-up to prevent producing vector code if the user has requested we avoid it. There's no AVX512-specific codegen here or in memcmp expansion, so that's not a danger (yet).

Mon, Jan 1, 9:43 AM