spatel (Sanjay Patel)
User

Projects

User does not belong to any projects.

User Details

User Since
May 22 2014, 1:24 PM (230 w, 14 h)

Recent Activity

Tue, Oct 16

spatel added a reverting commit for rL344609: [InstCombine] try harder to form select from logic ops: rL344612: revert rL344609: [InstCombine] try harder to form select from logic ops.
Tue, Oct 16, 8:28 AM
spatel committed rL344612: revert rL344609: [InstCombine] try harder to form select from logic ops.
revert rL344609: [InstCombine] try harder to form select from logic ops
Tue, Oct 16, 8:28 AM
spatel committed rL344610: [InstCombine] make sure type is integer before calling ComputeNumSignBits.
[InstCombine] make sure type is integer before calling ComputeNumSignBits
Tue, Oct 16, 7:46 AM
spatel committed rL344609: [InstCombine] try harder to form select from logic ops.
[InstCombine] try harder to form select from logic ops
Tue, Oct 16, 7:37 AM

Mon, Oct 15

spatel committed rL344559: [InstCombine] add tests for bitwise logic --> select; NFC.
[InstCombine] add tests for bitwise logic --> select; NFC
Mon, Oct 15, 2:45 PM
spatel committed rL344541: [SelectionDAG] allow FP binops in SimplifyDemandedVectorElts.
[SelectionDAG] allow FP binops in SimplifyDemandedVectorElts
Mon, Oct 15, 11:07 AM
spatel closed D52912: [SelectionDAG] allow FP binops in SimplifyDemandedVectorElts.
Mon, Oct 15, 11:07 AM
spatel added a comment to D52912: [SelectionDAG] allow FP binops in SimplifyDemandedVectorElts.

Trying to avoid regressions with improved undef propagation:
rL343940
rL343942
rL343945
rL344525
rL344528
rL344534

Mon, Oct 15, 10:46 AM
spatel added a comment to D53236: [SelectionDAG] swap select_cc operands to enable folding.

Is the motivating case integer or FP?
I'm asking because we have a canonicalization for integer cmp+sel for the IR in these tests, but we're missing the corresponding FP transform.
If we add the FP canonicalization in IR, would there still be a need for this backend patch? Ie, is something generating this select code in the DAG itself?

Mon, Oct 15, 10:07 AM
spatel committed rL344534: [DAGCombiner] allow undef elts in vector fmul matching.
[DAGCombiner] allow undef elts in vector fmul matching
Mon, Oct 15, 9:56 AM
spatel committed rL344532: [DAGCombiner] refactor folds for fadd (fmul X, -2.0), Y; NFCI.
[DAGCombiner] refactor folds for fadd (fmul X, -2.0), Y; NFCI
Mon, Oct 15, 9:49 AM
spatel committed rL344531: [AArch64] add tests for fmul x, -2.0 with undef elts; NFC.
[AArch64] add tests for fmul x, -2.0 with undef elts; NFC
Mon, Oct 15, 9:46 AM
spatel committed rL344528: [DAGCombiner] allow undef elts in vector fma matching.
[DAGCombiner] allow undef elts in vector fma matching
Mon, Oct 15, 8:58 AM
spatel committed rL344527: [x86] add tests for fma with undef elts; NFC.
[x86] add tests for fma with undef elts; NFC
Mon, Oct 15, 8:49 AM
spatel committed rL344525: [DAGCombiner] allow undef elts in vector fma matching.
[DAGCombiner] allow undef elts in vector fma matching
Mon, Oct 15, 8:40 AM
spatel committed rL344523: [x86] add tests for fma with undef elts; NFC.
[x86] add tests for fma with undef elts; NFC
Mon, Oct 15, 8:31 AM
spatel added a comment to D53268: [X86] WIP: Stop promoting and/or/xor/andn to vXi64..

I think this is the right way to go. It's a blob of patterns, but they're stamped out in a regular form.
When the logic ops are promoted, we might have to deal with them separately as noted in D51553.

Mon, Oct 15, 8:21 AM
spatel added a comment to D52177: [InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A.

Um, in terms of fixes, I guess our alternatives are make this instcombine dependant on the target, or try to undo it somehow in the backend. My understanding is that the first option is not generally done in instcombine. Can someone give me some history there? Was it something that was decided as a rule, or just something that never came up. (I guess also in this case, not supporting this fold because it happens to cause the extra subs in a random test isn't a great reason to not do it).

Mon, Oct 15, 7:28 AM

Sun, Oct 14

spatel committed rL344476: [InstCombine] combine a shuffle and an extract subvector shuffle .
[InstCombine] combine a shuffle and an extract subvector shuffle
Sun, Oct 14, 8:27 AM
spatel closed D53037: [InstCombine] combine a shuffle and an extract subvector shuffle .
Sun, Oct 14, 8:27 AM
spatel added a comment to D53201: [DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016).

FYI, the post legalize types DAG combine doesn’t run if all the types in the DAG were legal and nothing was changed by legalize types. So gating a combine on LegalTypes can really mean a transform doesn’t run until after vector op legalization(again the DAG combine after this doesn’t run if nothing changed) or LegalizeDAG.

Sun, Oct 14, 7:50 AM

Sat, Oct 13

spatel added a comment to D53205: [FPEnv][NFCI] Convert more BinaryOperator::isFNeg(...) to m_FNeg(...).

I suspect that none of these changes are actually 'NFC'.
Example:
rL344458

Sat, Oct 13, 9:29 AM
spatel committed rL344458: [InstCombine] fix complexity canonicalization with fake unary vector ops.
[InstCombine] fix complexity canonicalization with fake unary vector ops
Sat, Oct 13, 9:19 AM
spatel committed rL344456: [InstCombine] add tests for operand complexity canonicalization; NFC.
[InstCombine] add tests for operand complexity canonicalization; NFC
Sat, Oct 13, 9:05 AM
spatel updated the diff for D53201: [DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016).

Patch updated:

  1. Add "LegalTypes" as the first predicate for trying this transform to make it less likely that any weird types invalidate the later assumptions.
  2. Add an assert that the scalar bitwidth is a multiple of the vector element bitwidth (scalar_to_vector size must be a multiple, and bitcast can't change size).
  3. Add a test that at least starts with a weird type in IR.
Sat, Oct 13, 8:07 AM
spatel added inline comments to D53201: [DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016).
Sat, Oct 13, 8:02 AM

Fri, Oct 12

spatel added inline comments to D53037: [InstCombine] combine a shuffle and an extract subvector shuffle .
Fri, Oct 12, 10:51 AM
spatel committed rL344361: [x86] add and use fast horizontal vector math subtarget feature.
[x86] add and use fast horizontal vector math subtarget feature
Fri, Oct 12, 9:43 AM
spatel closed D53095: [x86] add and use fast horizontal vector math subtarget feature.
Fri, Oct 12, 9:43 AM
spatel created D53201: [DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016).
Fri, Oct 12, 9:21 AM
spatel committed rL344353: [AArch64][x86] add tests for trunc disguised as vector ops (PR39016); NFC.
[AArch64][x86] add tests for trunc disguised as vector ops (PR39016); NFC
Fri, Oct 12, 8:24 AM

Thu, Oct 11

spatel committed rL344320: [DAGCombiner] rearrange extract_element+bitcast fold; NFC.
[DAGCombiner] rearrange extract_element+bitcast fold; NFC
Thu, Oct 11, 4:58 PM
spatel committed rL344303: [x86] add tests for extract_element; NFC.
[x86] add tests for extract_element; NFC
Thu, Oct 11, 3:06 PM
spatel committed rL344301: [x86] regenerate CHECKs; NFC.
[x86] regenerate CHECKs; NFC
Thu, Oct 11, 2:46 PM
spatel committed rL344255: [DAGCombiner] move comment closer to the corresponding code; NFC .
[DAGCombiner] move comment closer to the corresponding code; NFC
Thu, Oct 11, 9:09 AM
spatel accepted D52729: [InstCombine] Fix SimplifyLibCalls erasing an instruction while IC still had references to it.

LGTM

Thu, Oct 11, 7:48 AM

Wed, Oct 10

spatel accepted D53036: [InstCombine] Demand bits of UMin.

LGTM - see inline for a couple of nits.

Wed, Oct 10, 2:39 PM
spatel added inline comments to D53033: [InstCombine] Demand bits of UMAX.
Wed, Oct 10, 2:25 PM
spatel accepted D53033: [InstCombine] Demand bits of UMAX.

LGTM

Wed, Oct 10, 2:24 PM
spatel added a comment to D53036: [InstCombine] Demand bits of UMin.

Luckily, it appears that Alive can do countLeadingOnes too (although I don't see it anywhere in the sources I have).
https://rise4fun.com/Alive/O9i

Huh https://github.com/nunoplopes/alive/search?utf8=%E2%9C%93&q=countLeadingOnes&type=
CC @nlopes

Wed, Oct 10, 2:19 PM
spatel updated the diff for D53037: [InstCombine] combine a shuffle and an extract subvector shuffle .

Patch updated:
Added an assert to (hopefully) make this transform clearer.

Wed, Oct 10, 2:03 PM
spatel committed rL344181: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try.
[InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try
Wed, Oct 10, 1:52 PM
spatel committed rL344178: revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization.
revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
Wed, Oct 10, 1:42 PM
spatel created D53095: [x86] add and use fast horizontal vector math subtarget feature.
Wed, Oct 10, 10:29 AM
spatel added inline comments to D53037: [InstCombine] combine a shuffle and an extract subvector shuffle .
Wed, Oct 10, 9:02 AM
spatel committed rL344141: [x86] allow single source horizontal op matching (PR39195).
[x86] allow single source horizontal op matching (PR39195)
Wed, Oct 10, 6:43 AM
spatel closed D52997: [x86] allow single source horizontal op matching (PR39195).
Wed, Oct 10, 6:43 AM

Tue, Oct 9

spatel accepted D44548: [DAGCombiner] Expand combining of FP logical operations to sign-setting FP operations.

Nit: it would be nicer to check in the PPC tests with baseline asm as a preliminary step. That way, we'd just see the asm diff there too.

Tue, Oct 9, 2:33 PM
spatel committed rL344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization.
[InstCombine] reverse 'trunc X to <N x i1>' canonicalization
Tue, Oct 9, 2:28 PM
spatel closed D52747: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization.
Tue, Oct 9, 2:27 PM
spatel accepted D52934: [FPEnv] PatternMatcher support for checking FNEG ignoring signed zeros.

LGTM

Tue, Oct 9, 2:01 PM
spatel added a comment to D52997: [x86] allow single source horizontal op matching (PR39195).

I don't see it as a big problem if we start "regressing" this particular case on Jaguar.

Ok - that's not the impression I got reading PR39195. @dyung - how important is this pattern?

I will defer to Andrea's expertise in this area. I was under the impression that generating the horizontal add/sub instructions was preferred for these cases (especially for btver2), but if that is not the case, I can update our test to not expect it to be generated.

Tue, Oct 9, 1:40 PM
spatel added a reviewer for D53037: [InstCombine] combine a shuffle and an extract subvector shuffle : efriedma.
Tue, Oct 9, 1:10 PM
spatel created D53037: [InstCombine] combine a shuffle and an extract subvector shuffle .
Tue, Oct 9, 1:10 PM
spatel committed rL344067: [InstCombine] add tests for extract subvector shuffles; NFC.
[InstCombine] add tests for extract subvector shuffles; NFC
Tue, Oct 9, 11:39 AM
spatel committed rL344059: [AArch64][x86] add tests for bitcasted fnabs; NFC.
[AArch64][x86] add tests for bitcasted fnabs; NFC
Tue, Oct 9, 10:22 AM
spatel added a comment to D44548: [DAGCombiner] Expand combining of FP logical operations to sign-setting FP operations.

How about enabling the hook for PPC before adding to the generic DAG combine...
Right now, you'll get something horrible for normal fabs (and fneg?) without this patch, right?

	xscvdpspn 0, 1
	xxsldwi 0, 0, 0, 3
	mfvsrwz 3, 0
	rlwinm 3, 3, 0, 1, 31
	mtvsrd 0, 3
	xxsldwi 0, 0, 0, 1
	xscvspdpn 1, 0

Yes. Although that is the worst case (since PPC f32 isn't 32 bits in a register).
No doubt, the PPC-specific portion of this needs to go in and can certainly go in first. I can commit this as the PPC portion first (with the fnabs test missing). And then the generic part (adding the fnabs test).
The question is do I need to split this into two patches and post for review or can I just commit it as two separate patches (or maybe just commit the PPC-portion and post the generic portion for review)?

Tue, Oct 9, 10:14 AM
spatel committed rL344056: [InstCombine] make helper function 'static'; NFC.
[InstCombine] make helper function 'static'; NFC
Tue, Oct 9, 8:31 AM
spatel added a comment to D44548: [DAGCombiner] Expand combining of FP logical operations to sign-setting FP operations.

How about enabling the hook for PPC before adding to the generic DAG combine...
Right now, you'll get something horrible for normal fabs (and fneg?) without this patch, right?

Tue, Oct 9, 7:58 AM
spatel added inline comments to D52729: [InstCombine] Fix SimplifyLibCalls erasing an instruction while IC still had references to it.
Tue, Oct 9, 7:46 AM
spatel added a comment to D52747: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization.

Now that D52964 has landed - is there anything stopping this?

Tue, Oct 9, 7:20 AM
spatel committed rL344048: [x86] use demanded bits to simplify masked store codegen.
[x86] use demanded bits to simplify masked store codegen
Tue, Oct 9, 7:06 AM
spatel closed D52964: [x86] use demanded bits to simplify masked store codegen.
Tue, Oct 9, 7:06 AM
spatel added a comment to D52964: [x86] use demanded bits to simplify masked store codegen.

LGTM - the AVX1 regression needs SIGN_EXTEND_VECTOR_INREG support adding to SimplifyDemandedBits which I intend to do as a follow up patch

Tue, Oct 9, 6:38 AM
spatel added a comment to D52997: [x86] allow single source horizontal op matching (PR39195).

I don't see it as a big problem if we start "regressing" this particular case on Jaguar.

Tue, Oct 9, 5:18 AM

Mon, Oct 8

spatel committed rL343997: [DAGCombiner] simplify code for fmul with constant fold; NFCI.
[DAGCombiner] simplify code for fmul with constant fold; NFCI
Mon, Oct 8, 2:19 PM
spatel created D52997: [x86] allow single source horizontal op matching (PR39195).
Mon, Oct 8, 1:03 PM
spatel committed rL343994: [x86] add tests for phaddd/phaddw; NFC.
[x86] add tests for phaddd/phaddw; NFC
Mon, Oct 8, 12:50 PM
spatel committed rL343989: [x86] make horizontal binop matching clearer; NFCI.
[x86] make horizontal binop matching clearer; NFCI
Mon, Oct 8, 11:11 AM
spatel committed rL343975: [x86] add hadd test with no undefs, remove duplicate tests; NFC.
[x86] add hadd test with no undefs, remove duplicate tests; NFC
Mon, Oct 8, 9:26 AM
spatel committed rL343974: [x86] simplify hadd tests; NFC.
[x86] simplify hadd tests; NFC
Mon, Oct 8, 8:58 AM
spatel added inline comments to D52934: [FPEnv] PatternMatcher support for checking FNEG ignoring signed zeros.
Mon, Oct 8, 8:00 AM
spatel updated the diff for D52964: [x86] use demanded bits to simplify masked store codegen.

Patch updated:
Removed unnecessary depth param (copy-paste remnant).

Mon, Oct 8, 6:47 AM
spatel committed rL343965: [x86] add 16 missed hadd patterns (PR39195); NFC.
[x86] add 16 missed hadd patterns (PR39195); NFC
Mon, Oct 8, 5:56 AM

Sun, Oct 7

spatel added a comment to D44548: [DAGCombiner] Expand combining of FP logical operations to sign-setting FP operations.

I'm trying to improve vector-demanded-elements folds in D52912 which means we need to do better at recognizing patterns with undefs.

Sun, Oct 7, 9:41 AM
spatel committed rL343945: [DAGCombiner] allow undef elts in vector fadd matching.
[DAGCombiner] allow undef elts in vector fadd matching
Sun, Oct 7, 9:33 AM
spatel committed rL343944: [x86] add vector fadd with undef elts test; NFC.
[x86] add vector fadd with undef elts test; NFC
Sun, Oct 7, 9:31 AM
spatel committed rL343943: [x86] remove redundant tests; NFC.
[x86] remove redundant tests; NFC
Sun, Oct 7, 9:16 AM
spatel committed rL343942: [DAGCombiner] allow undefs when matching vector splats for fmul folds.
[DAGCombiner] allow undefs when matching vector splats for fmul folds
Sun, Oct 7, 9:09 AM
spatel committed rL343941: [x86] add vector fmul with undef elts tests; NFC .
[x86] add vector fmul with undef elts tests; NFC
Sun, Oct 7, 9:03 AM
spatel committed rL343940: [DAGCombiner] allow undef elts in vector fabs/fneg matching.
[DAGCombiner] allow undef elts in vector fabs/fneg matching
Sun, Oct 7, 8:34 AM
spatel committed rL343939: [DAGCombiner] shorten code for bitcast+fabs fold; NFC.
[DAGCombiner] shorten code for bitcast+fabs fold; NFC
Sun, Oct 7, 8:20 AM
spatel committed rL343938: [x86] add tests for FP logic folding for vectors with undefs; NFC .
[x86] add tests for FP logic folding for vectors with undefs; NFC
Sun, Oct 7, 8:08 AM
spatel added a comment to D52934: [FPEnv] PatternMatcher support for checking FNEG ignoring signed zeros.

This patch is not 'NFC'. I added a test that should show an improvement at rL343936.

Sun, Oct 7, 7:50 AM
spatel committed rL343936: [InstSimplify] add vector test for fneg+fdiv; NFC.
[InstSimplify] add vector test for fneg+fdiv; NFC
Sun, Oct 7, 7:48 AM
spatel added inline comments to D52934: [FPEnv] PatternMatcher support for checking FNEG ignoring signed zeros.
Sun, Oct 7, 7:39 AM
spatel updated the diff for D52964: [x86] use demanded bits to simplify masked store codegen.

Patch updated:
Fixed whitespace diff.

Sun, Oct 7, 6:53 AM

Sat, Oct 6

spatel added a dependent revision for D52964: [x86] use demanded bits to simplify masked store codegen: D52747: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization.
Sat, Oct 6, 7:33 AM
spatel added a dependency for D52747: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization: D52964: [x86] use demanded bits to simplify masked store codegen.
Sat, Oct 6, 7:33 AM
spatel created D52964: [x86] use demanded bits to simplify masked store codegen.
Sat, Oct 6, 7:30 AM
spatel committed rL343920: [x86] add test for masked store with extra shift op; NFC.
[x86] add test for masked store with extra shift op; NFC
Sat, Oct 6, 7:13 AM
spatel added a comment to D52747: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization.

@craig.topper The final codegen from the updated IR in masked_load_store.ll regresses due to masked stores not making use of only requiring the MSB of the mask vector (lots of SIGN_EXTEND_INREG etc.) - X86ISelLowering's combineMaskedStore only handles the PCMPGT case, how tricky would it be to replace it with a general SimplifyDemandedBits calls?

Sat, Oct 6, 6:03 AM

Fri, Oct 5

spatel added inline comments to D52912: [SelectionDAG] allow FP binops in SimplifyDemandedVectorElts.
Fri, Oct 5, 2:43 PM
spatel added inline comments to D52912: [SelectionDAG] allow FP binops in SimplifyDemandedVectorElts.
Fri, Oct 5, 2:35 PM
spatel updated the diff for D52912: [SelectionDAG] allow FP binops in SimplifyDemandedVectorElts.

Patch updated - once again, no code changes, but rebased after:
rL343882
...so now we don't have any more sched test diffs.
I was completely missing the point of those tests (to check both reg-reg and reg-mem variants of the opcodes).

Fri, Oct 5, 1:49 PM
spatel committed rL343882: [x86] make blend tests resistant to demanded elements improvements; NFC.
[x86] make blend tests resistant to demanded elements improvements; NFC
Fri, Oct 5, 1:28 PM
spatel updated the diff for D52912: [SelectionDAG] allow FP binops in SimplifyDemandedVectorElts.

Patch updated - no code changes, but:
Rebased after rL343858, so less noise from the SSE1/2 sched tests.

Fri, Oct 5, 11:07 AM
spatel committed rL343865: [SelectionDAG] allow undefs when matching splat constants.
[SelectionDAG] allow undefs when matching splat constants
Fri, Oct 5, 10:44 AM
spatel committed rL343863: [x86] add test for (X - 0.0) vector with undef elts; NFC.
[x86] add test for (X - 0.0) vector with undef elts; NFC
Fri, Oct 5, 10:38 AM
spatel committed rL343855: [x86] regenerate full checks; NFC.
[x86] regenerate full checks; NFC
Fri, Oct 5, 7:58 AM