spatel (Sanjay Patel)
User

Projects

User does not belong to any projects.
User Since
May 22 2014, 1:24 PM (148 w, 5 d)

Recent Activity

Yesterday

spatel added a comment to D31426: [DAGCombine] Combine shuffle of splat with multiple uses.

This seems unnecessarily complicated. Why can't we just match any shuffle of a splat before we get here and simplify it?

Tue, Mar 28, 4:14 PM
spatel committed rL298954: [DAGCombiner] reduce code duplication with local variables; NFCI.
[DAGCombiner] reduce code duplication with local variables; NFCI
Tue, Mar 28, 3:58 PM
spatel committed rL298950: [DAG] fix formatting; NFC.
[DAG] fix formatting; NFC
Tue, Mar 28, 3:37 PM
spatel committed rL298949: [DAGCombiner] remove redundant conditions and duplicated code; NFCI.
[DAGCombiner] remove redundant conditions and duplicated code; NFCI
Tue, Mar 28, 3:35 PM
spatel accepted D31165: [SDAG] Add AllowContract to SNodeFlags.

@spatel, @arsenm, this is a trivial patch to expose the new FMF to SDAG. Since you guys looked at the rest of the LLVM patches in the set, would you mind reviewing this too?

Tue, Mar 28, 3:03 PM
spatel committed rL298944: [DAGCombiner] rename variables in foldAndOfSetCCs for easier reading; NFCI.
[DAGCombiner] rename variables in foldAndOfSetCCs for easier reading; NFCI
Tue, Mar 28, 2:53 PM
spatel committed rL298940: [DAGCombiner] clean up foldAndOfSetCCs; NFCI.
[DAGCombiner] clean up foldAndOfSetCCs; NFCI
Tue, Mar 28, 1:40 PM
spatel committed rL298938: [DAGCombiner] add helper function for and-of-setcc folds; NFC.
[DAGCombiner] add helper function for and-of-setcc folds; NFC
Tue, Mar 28, 1:11 PM
spatel created D31428: [x86] fix AVX FP cmp intrinsic documentation (PR28110).
Tue, Mar 28, 10:59 AM
spatel committed rL298933: [x86] use VPMOVMSK to replace memcmp libcalls for 32-byte equality.
[x86] use VPMOVMSK to replace memcmp libcalls for 32-byte equality
Tue, Mar 28, 10:36 AM
spatel committed rL298926: [x86] add separate check prefix for SSE; NFC.
[x86] add separate check prefix for SSE; NFC
Tue, Mar 28, 9:08 AM
spatel committed rL298918: [x86] add AVX2 run to show 256-bit opportunity; NFC.
[x86] add AVX2 run to show 256-bit opportunity; NFC
Tue, Mar 28, 6:59 AM

Mon, Mar 27

spatel accepted D31164: [IR] Add AllowContract to FastMathFlags.

Thanks! LGTM.

Mon, Mar 27, 4:10 PM
spatel added inline comments to D31403: [SDAG] Deal with deleted node in PromoteIntShiftOp.
Mon, Mar 27, 1:04 PM
spatel added inline comments to D31164: [IR] Add AllowContract to FastMathFlags.
Mon, Mar 27, 12:37 PM

Sat, Mar 25

spatel accepted D31347: [X86][SSE] Generalised CMP+AND1 combine to ZERO/ALLBITS+MASK.

LGTM. I'd make the VSRAI signbits part a follow-up commit for the sake of minimalism and bug bisection (if needed), but if you think it works better as one piece, that's ok too.

Sat, Mar 25, 9:22 AM
spatel committed rL298775: [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality.
[x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality
Sat, Mar 25, 9:18 AM
spatel closed D31290: [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality by committing rL298775: [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality.
Sat, Mar 25, 9:17 AM

Fri, Mar 24

spatel updated the diff for D31290: [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality.

Patch updated:

  1. Added 32-bit target testing in rL298744
  2. Don't use hasSSE2() in the x86 override - that won't work if we're in soft-float mode (nice catch!).
  3. Add TODO comment to handle 64-bit type on x86 32-bit target.
Fri, Mar 24, 3:45 PM
spatel committed rL298744: [x86] add 32-bit RUN for better memcmp coverage; NFC.
[x86] add 32-bit RUN for better memcmp coverage; NFC
Fri, Mar 24, 3:22 PM
spatel added inline comments to D31347: [X86][SSE] Generalised CMP+AND1 combine to ZERO/ALLBITS+MASK.
Fri, Mar 24, 2:43 PM
spatel updated the diff for D31290: [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality.

Patch updated:
On 2nd thought, that EVT/MVT argument makes no sense. The returned type from the hook is always going to be an MVT because it will be a supported type in order to be fast. Using MVT makes the code a bit cleaner since we don't have to pass a context around for those.

Fri, Mar 24, 2:26 PM
spatel updated the diff for D31290: [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality.

Patch updated:
Have the TLI hook return the preferred operand (load) type for a given bitwidth, so we don't have to cycle through all of those when transforming the memcmp().

Fri, Mar 24, 2:05 PM
spatel added inline comments to D31290: [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality.
Fri, Mar 24, 1:42 PM
spatel updated the diff for D31290: [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality.

Patch updated:
Check all of the 16-byte simple value types before giving up.

Fri, Mar 24, 10:49 AM
spatel added inline comments to D31290: [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality.
Fri, Mar 24, 9:56 AM
spatel accepted D31115: [InstCombine] Provide a way to calculate KnownZero/One for Add/Sub in SimplifyDemandedUseBits without recursing into ComputeKnownBits.

LGTM.

Fri, Mar 24, 7:12 AM

Thu, Mar 23

spatel added inline comments to D31290: [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality.
Thu, Mar 23, 3:36 PM
spatel added inline comments to D31290: [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality.
Thu, Mar 23, 2:32 PM
spatel added inline comments to D31290: [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality.
Thu, Mar 23, 1:40 PM
spatel added a comment to D31037: [EarlyCSE] hoist div/rem when sibling op exists (PR31028).

Ping.

Thu, Mar 23, 9:27 AM
spatel created D31290: [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality.
Thu, Mar 23, 9:17 AM
spatel committed rL298611: [x86] add memcmp tests, remove run.
[x86] add memcmp tests, remove run
Thu, Mar 23, 8:50 AM

Wed, Mar 22

spatel committed rL298553: [x86] improve tests, add tests, auto-generate checks; NFC.
[x86] improve tests, add tests, auto-generate checks; NFC
Wed, Mar 22, 3:51 PM
spatel committed rL298520: [InstCombine] canonicalize insertelement of scalar constant ahead of….
[InstCombine] canonicalize insertelement of scalar constant ahead of…
Wed, Mar 22, 10:23 AM
spatel closed D31196: [InstCombine] fold insertelement of scalar constant into vector of constants by committing rL298520: [InstCombine] canonicalize insertelement of scalar constant ahead of….
Wed, Mar 22, 10:23 AM
spatel added a comment to D31182: [InstCombine] fadd double (sitofp x), y check that the promotion is valid .

Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

It's difficult to tell. Even if FP addition is cheaper the second combine also saves us one sitofp conversion. It should be also taken into account.

These combines are buggy and there is a simple fix. Since I don't have a good understanding on the profitability of these combines I decided to go ahead with the fix rather then removing them for example.

Wed, Mar 22, 9:21 AM

Tue, Mar 21

spatel updated subscribers of D31120: [InstCombine] Teach SimplifyDemandedUseBits that adding or subtractings 0s from every bit below the highest demanded bit can be simplified.

In D31094, @efriedma suggested a debug counter to make this visible in a test.

Tue, Mar 21, 4:30 PM
spatel accepted D31119: [InstCombine] Teach SimplifyDemandedUseBits to shrink Constants on the left side of subtracts.

http://rise4fun.com/Alive/nI0
I had to use Alive to convince myself those tests are right. :)
LGTM.

Tue, Mar 21, 4:23 PM
spatel added inline comments to D31115: [InstCombine] Provide a way to calculate KnownZero/One for Add/Sub in SimplifyDemandedUseBits without recursing into ComputeKnownBits.
Tue, Mar 21, 3:55 PM
spatel updated the diff for D31196: [InstCombine] fold insertelement of scalar constant into vector of constants.

Patch updated as suggested by Eli:
Canonicalize an insert of a constant before an insert of a variable. There was already a test for this (but did nothing before), so I added a comment to explain.

Tue, Mar 21, 3:14 PM
spatel added a comment to D31196: [InstCombine] fold insertelement of scalar constant into vector of constants.

Alternatively, we could perform this transform unconditionally: transform (insertelement (insertelement, X, Y, IdxC1), C, IdxC2) -> (insertelement (insertelement X, C, IdxC2), Y, IdxC1), ignoring the actual value of X.

Tue, Mar 21, 2:08 PM
spatel committed rL298432: [InstCombine] regenerate checks; NFC.
[InstCombine] regenerate checks; NFC
Tue, Mar 21, 1:27 PM
spatel created D31196: [InstCombine] fold insertelement of scalar constant into vector of constants.
Tue, Mar 21, 10:58 AM
spatel added inline comments to D31164: [IR] Add AllowContract to FastMathFlags.
Tue, Mar 21, 8:14 AM
spatel added reviewers for D31182: [InstCombine] fadd double (sitofp x), y check that the promotion is valid : hfinkel, jmolloy.

Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

Tue, Mar 21, 7:44 AM
spatel committed rL298377: [InstCombine] auto-generate better checks; NFC.
[InstCombine] auto-generate better checks; NFC
Tue, Mar 21, 7:17 AM
spatel committed rL298376: [x86] use PMOVMSK for vector-sized equality comparisons.
[x86] use PMOVMSK for vector-sized equality comparisons
Tue, Mar 21, 7:03 AM
spatel closed D31156: [x86] use PMOVMSK for vector-sized equality comparisons by committing rL298376: [x86] use PMOVMSK for vector-sized equality comparisons.
Tue, Mar 21, 7:02 AM

Mon, Mar 20

spatel created D31156: [x86] use PMOVMSK for vector-sized equality comparisons.
Mon, Mar 20, 3:57 PM
spatel committed rL298317: [x86] add tests for setcc of i128/i256; NFC.
[x86] add tests for setcc of i128/i256; NFC
Mon, Mar 20, 3:28 PM
spatel accepted D31091: [InstCombine] Print a debug message when we constant fold an operand during worklist creation.

LGTM.

Mon, Mar 20, 8:36 AM

Fri, Mar 17

spatel committed rL298166: [x86] regenerate checks; NFC.
[x86] regenerate checks; NFC
Fri, Mar 17, 4:37 PM
spatel committed rL298164: [x86] regenerate checks; NFC.
[x86] regenerate checks; NFC
Fri, Mar 17, 4:36 PM
spatel accepted D31094: [BuildLibCalls] emitPutChar should infer its function attributes.

fputc and fputs check the type of one of the arguments before calling inferLibFuncAttributes. So its not always unconditional.

Fri, Mar 17, 4:14 PM
spatel added a comment to D31094: [BuildLibCalls] emitPutChar should infer its function attributes.

Can we do better than repeating this in every emit* function? Is there ever a case where we don't want to infer the attrs?

Fri, Mar 17, 1:54 PM
spatel committed rL298118: [x86] clean up setcc with negated operand transform and add missing test; NFCI.
[x86] clean up setcc with negated operand transform and add missing test; NFCI
Fri, Mar 17, 1:42 PM
spatel committed rL298107: [x86] avoid adc/sbb assert when both sides of add are zexted (PR32316).
[x86] avoid adc/sbb assert when both sides of add are zexted (PR32316)
Fri, Mar 17, 10:39 AM

Thu, Mar 16

spatel committed rL297989: [InstCombine] avoid breaking up bitcasted vector min/max patterns (PR32306).
[InstCombine] avoid breaking up bitcasted vector min/max patterns (PR32306)
Thu, Mar 16, 1:54 PM
spatel committed rL297986: [InstCombine] add tests for PR32306 and missed min/max canonicalization; NFC.
[InstCombine] add tests for PR32306 and missed min/max canonicalization; NFC
Thu, Mar 16, 1:43 PM
spatel created D31037: [EarlyCSE] hoist div/rem when sibling op exists (PR31028).
Thu, Mar 16, 9:58 AM

Wed, Mar 15

spatel committed rL297886: [EarlyCSE] reduce indent; NFCI.
[EarlyCSE] reduce indent; NFCI
Wed, Mar 15, 2:11 PM
spatel committed rL297846: Cyle -> Cycle; NFCI.
Cyle -> Cycle; NFCI
Wed, Mar 15, 8:49 AM
spatel committed rL297836: [Target] fix typo; NFC.
[Target] fix typo; NFC
Wed, Mar 15, 7:13 AM
spatel added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

hfinkel, could you help me? First of all could you give me a link(s) to any doc(s) related to our MCSchedModel except sources?
Next, I was told that ResourceCycles here:
===============================
class ProcWriteResources<list<ProcResourceKind> resources> {

list<ProcResourceKind> ProcResources = resources;
list<int> ResourceCycles = [];
int Latency = 1;
int NumMicroOps = 1;

================================
could be used as Throughput of the given instruction. Is it right? Does it mean I could include it in generated comment as well? If YES I suppose it should be the max of the Cycles, right?

Wed, Mar 15, 7:11 AM

Tue, Mar 14

spatel committed rL297762: [DAG] vector div/rem with any zero element in divisor is undef.
[DAG] vector div/rem with any zero element in divisor is undef
Tue, Mar 14, 11:19 AM
spatel closed D30826: [DAG] vector div/rem with any zero element in divisor is undef by committing rL297762: [DAG] vector div/rem with any zero element in divisor is undef.
Tue, Mar 14, 11:18 AM
spatel committed rL297755: [InstCombine] improve readability; NFCI.
[InstCombine] improve readability; NFCI
Tue, Mar 14, 10:39 AM
spatel committed rL297747: [InstCombine] consolidate rem tests and update checks; NFC.
[InstCombine] consolidate rem tests and update checks; NFC
Tue, Mar 14, 9:40 AM
spatel committed rL297746: [InstCombine] regenerate checks; NFC.
[InstCombine] regenerate checks; NFC
Tue, Mar 14, 9:29 AM

Mon, Mar 13

spatel added a comment to D30826: [DAG] vector div/rem with any zero element in divisor is undef.

I agree it's weird that there are two functions for constant folding... but you can leave it for now, I guess.

Maybe check for null or undef, rather than just null?

Mon, Mar 13, 4:16 PM
spatel added a comment to D30910: [SimplifyCFG] allow speculation of div/rem when sibling op exists (PR31028).

Every CPU can lower divrem to something cheaper than a separate div and rem; that seems fine.

My one concern here is that for a target without a hardware divrem for the width in question, if the rem is inside the conditional, you're speculating a multiply. This is fine if it gets lowered to a hardware instruction, but might be problematic if it gets lowered to a libcall.

Mon, Mar 13, 4:09 PM
spatel created D30910: [SimplifyCFG] allow speculation of div/rem when sibling op exists (PR31028).
Mon, Mar 13, 2:18 PM
spatel committed rL297660: [SimplifyCFG] move tests for PR31028 from CGP.
[SimplifyCFG] move tests for PR31028 from CGP
Mon, Mar 13, 1:11 PM
spatel committed rL297629: [CGP] add tests for PR31028; NFC.
[CGP] add tests for PR31028; NFC
Mon, Mar 13, 8:57 AM
spatel added a comment to D27114: Preserve nonnull metadata on Loads through SROA & mem2reg..

Hal's earlier comment said we don't need to hold this up for clang, but for reference that fix is proposed now:
D30806

Mon, Mar 13, 8:09 AM
spatel updated the diff for D30826: [DAG] vector div/rem with any zero element in divisor is undef.

Patch updated:

  1. The code duplication was already bugging me, and we know it's only going to grow over time, so I'm proposing a single place to house the undef checking. Fixing the duplication between FoldConstantArithmetic and FoldConstantVectorArithmetic is another step.
  2. Fixed to check for zero or undef elements in the build vector.
Mon, Mar 13, 7:46 AM

Sun, Mar 12

spatel committed rL297588: [x86] these aren't the undefs you're looking for (PR32176).
[x86] these aren't the undefs you're looking for (PR32176)
Sun, Mar 12, 12:27 PM
spatel closed D30834: [x86] these aren't the undefs you're looking for (PR32176) by committing rL297588: [x86] these aren't the undefs you're looking for (PR32176).
Sun, Mar 12, 12:27 PM
spatel committed rL297586: [x86] don't blindly transform SETB into SBB.
[x86] don't blindly transform SETB into SBB
Sun, Mar 12, 11:41 AM
spatel closed D30611: [x86] don't blindly transform SETB into SBB by committing rL297586: [x86] don't blindly transform SETB into SBB.
Sun, Mar 12, 11:41 AM
spatel accepted D30866: [X86] Recognize AVX2 gather instructions during lowering so we can modify the source input when the mask is all ones.

LGTM. I don't know if the force-to-zero is important either, but since we're generally happy to add xors to avoid partial reg dependencies, that seems fine to me.

Sun, Mar 12, 9:34 AM

Fri, Mar 10

spatel added a comment to D30834: [x86] these aren't the undefs you're looking for (PR32176).

Have you ran the tests all the way through to assembly and made sure we don't regress? If we do regress, I wouldn't hold up fixing this, but we should at least have bugs for what breaks.

Fri, Mar 10, 4:51 PM
spatel added a comment to D29841: [X86][SSE] Improve extraction of elements from v16i8 (pre-SSE41).

LGTM.

Fri, Mar 10, 10:52 AM
spatel created D30834: [x86] these aren't the undefs you're looking for (PR32176).
Fri, Mar 10, 9:26 AM
spatel created D30826: [DAG] vector div/rem with any zero element in divisor is undef.
Fri, Mar 10, 6:53 AM

Thu, Mar 9

spatel committed rL297433: [x86] add tests for vec div/rem with 0 element in divisor; NFC.
[x86] add tests for vec div/rem with 0 element in divisor; NFC
Thu, Mar 9, 5:07 PM
spatel committed rL297411: [InstSimplify] allow folds for bool vector div/rem.
[InstSimplify] allow folds for bool vector div/rem
Thu, Mar 9, 2:08 PM
spatel committed rL297409: [ConstantFold] vector div/rem with any zero element in divisor is undef.
[ConstantFold] vector div/rem with any zero element in divisor is undef
Thu, Mar 9, 12:54 PM
spatel committed rL297407: [InstSimplify] add tests for vector constant folding div/rem-by-0; NFC.
[InstSimplify] add tests for vector constant folding div/rem-by-0; NFC
Thu, Mar 9, 12:43 PM
spatel updated the diff for D30611: [x86] don't blindly transform SETB into SBB.

Patch updated (no code changes, but...):

  1. Changed phantom test diffs to code comments to better track the cases where this transform still fires.
  2. Rebased - we got more diffs from rL297249 and less from rL297404.
Thu, Mar 9, 11:52 AM
spatel added inline comments to D30611: [x86] don't blindly transform SETB into SBB.
Thu, Mar 9, 11:28 AM
spatel committed rL297390: [InstSimplify] vector div/rem with any zero element in divisor is undef.
[InstSimplify] vector div/rem with any zero element in divisor is undef
Thu, Mar 9, 8:33 AM
spatel closed D30665: [InstSimplify] vector div/rem with any zero element in divisor is undef by committing rL297390: [InstSimplify] vector div/rem with any zero element in divisor is undef.
Thu, Mar 9, 8:33 AM
spatel committed rL297384: [DAG] recognize div/rem by 0 as undef before trying constant folding.
[DAG] recognize div/rem by 0 as undef before trying constant folding
Thu, Mar 9, 7:14 AM
spatel closed D30741: [DAG] recognize div/rem by 0 as undef before trying constant folding by committing rL297384: [DAG] recognize div/rem by 0 as undef before trying constant folding.
Thu, Mar 9, 7:14 AM

Wed, Mar 8

spatel updated the diff for D30741: [DAG] recognize div/rem by 0 as undef before trying constant folding.

Patch updated:
I didn't notice that an extra x86 test (for https://bugs.llvm.org/show_bug.cgi?id=30693) was failing with this change because it has div-by-0.
I can't tell if that test is useful or not any more ( cc'ing @zvi ).

Wed, Mar 8, 9:44 AM
spatel committed rL297296: [x86] regenerate checks; NFC.
[x86] regenerate checks; NFC
Wed, Mar 8, 9:32 AM
spatel updated the diff for D30741: [DAG] recognize div/rem by 0 as undef before trying constant folding.

Patch updated:
Changing the definition of isConstantOrConstantVector() to include undef may have unintended consequences (there's a test in test/CodeGen/ARM/select.ll that will trigger the assert in foldBinOpIntoSelect()), so just assert that an undef result from DAG.getNode() is ok for now.

Wed, Mar 8, 9:12 AM
spatel created D30741: [DAG] recognize div/rem by 0 as undef before trying constant folding.
Wed, Mar 8, 8:36 AM