
spatel (Sanjay Patel)
User

Projects

User does not belong to any projects.

User Details

User Since
May 22 2014, 1:24 PM (248 w, 1 d)

Recent Activity

Yesterday

spatel committed rG973143ab79f3: [CGP] add tests for uaddo increment/decrement; NFC (authored by spatel).
Fri, Feb 22, 3:19 PM
spatel committed rL354699: [CGP] add tests for uaddo increment/decrement; NFC.
Fri, Feb 22, 3:19 PM
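The uaddo increment/decrement cases rest on two simple overflow identities. The sketch below is a minimal C illustration, not the tests themselves, and it assumes 32-bit operands. `x + 1` carries out only when the wrapped sum is 0. A decrement written as `x + 0xFFFFFFFF` carries out for every `x` except 0. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference carry check: an unsigned add wrapped iff the truncated sum is
   smaller than the first operand. */
static bool uadd_overflows(uint32_t a, uint32_t b) {
    return (uint32_t)(a + b) < a;
}

/* Increment: x + 1 carries out only for x == UINT32_MAX, which is exactly
   when the wrapped sum is 0. */
static bool inc_overflows(uint32_t x) {
    return (uint32_t)(x + 1u) == 0u;
}

/* Decrement written as an add of all-ones (x + 0xFFFFFFFF): as an unsigned
   add, this carries out for every x except 0. */
static bool dec_as_add_overflows(uint32_t x) {
    return x != 0u;
}
```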
spatel committed rGffe1cf5e9283: [CGP] move overflow intrinsic insertion to common location; NFCI (authored by spatel).
Fri, Feb 22, 12:21 PM
spatel committed rL354689: [CGP] move overflow intrinsic insertion to common location; NFCI.
Fri, Feb 22, 12:21 PM
spatel added inline comments to D58197: [x86] vectorize more cast ops in lowering to avoid register file transfers.
Fri, Feb 22, 7:55 AM · Restricted Project
spatel committed rGa9e289174a1c: [x86] allow narrowing of vector UINT_TO_FP (authored by spatel).
Fri, Feb 22, 7:53 AM
spatel committed rL354675: [x86] allow narrowing of vector UINT_TO_FP.
Fri, Feb 22, 7:53 AM
spatel committed rG1baf7896cc2b: [x86] simplify code in combineExtractSubvector; NFC (authored by spatel).
Fri, Feb 22, 7:29 AM
spatel committed rL354674: [x86] simplify code in combineExtractSubvector; NFC.
Fri, Feb 22, 7:27 AM
spatel added inline comments to D58197: [x86] vectorize more cast ops in lowering to avoid register file transfers.
Fri, Feb 22, 6:38 AM · Restricted Project

Thu, Feb 21

spatel updated the diff for D58282: [x86] scalarize extract element 0 of FP math.

Patch updated:

  1. Added TODO comment about handling non-zero extract index.
  2. Reformatted switch.
Thu, Feb 21, 4:03 PM · Restricted Project
spatel added inline comments to D58282: [x86] scalarize extract element 0 of FP math.
Thu, Feb 21, 4:00 PM · Restricted Project
spatel committed rG234a5e8ea422: [x86] vectorize more cast ops in lowering to avoid register file transfers (authored by spatel).
Thu, Feb 21, 12:41 PM
spatel committed rL354619: [x86] vectorize more cast ops in lowering to avoid register file transfers.
Thu, Feb 21, 12:40 PM
spatel closed D58197: [x86] vectorize more cast ops in lowering to avoid register file transfers.
Thu, Feb 21, 12:40 PM · Restricted Project
spatel added a comment to D47735: [DAGCombiner] Create rotates more aggressively.

Are you opposed to having this done in the DAG combiner?

Thu, Feb 21, 12:10 PM · Restricted Project
spatel created D58521: [DAGCombiner] allow truncation of binops after legalization if desirable.
Thu, Feb 21, 10:59 AM · Restricted Project
spatel committed rGba5ee817e9b5: [DAGCombiner] prevent infinite looping by truncating 'and' (PR40793) (authored by spatel).
Thu, Feb 21, 8:02 AM
spatel committed rL354594: [DAGCombiner] prevent infinite looping by truncating 'and' (PR40793).
Thu, Feb 21, 8:01 AM
spatel committed rGd2886876763c: [x86] regenerate checks; NFC (authored by spatel).
Thu, Feb 21, 7:31 AM
spatel committed rL354589: [x86] regenerate checks; NFC.
Thu, Feb 21, 7:31 AM

Wed, Feb 20

spatel added a comment to D47735: [DAGCombiner] Create rotates more aggressively.

One goal was to be able to generate a rol-and-accumulate instruction (on Hexagon), specifically for the accumulate operation being | (see f11 in rotate.ll). For the following C code, we still don't generate it:

unsigned blah(unsigned s, unsigned x) {
  return s | (x << 27) | (x >> 5);
}
Wed, Feb 20, 3:42 PM · Restricted Project
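The following C sketch shows the identity behind that comment. It assumes a 32-bit `unsigned`, written here as `uint32_t`. `(x << 27) | (x >> 5)` is a rotate-left by 27, so `blah` is an OR-accumulate of `s` with `rotl(x, 27)`. `rotl32` and `blah_as_rotate` are hypothetical helpers for illustration, not LLVM code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rotate a 32-bit value left by n bits. Masking the
   right-shift amount keeps both shifts in range even when n == 0. */
static uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* The function from the comment, assuming a 32-bit unsigned type. */
static uint32_t blah(uint32_t s, uint32_t x) {
    return s | (x << 27) | (x >> 5);
}

/* Same value with the rotate spelled out: s is OR-accumulated into
   rotl(x, 27). */
static uint32_t blah_as_rotate(uint32_t s, uint32_t x) {
    return s | rotl32(x, 27);
}
```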
spatel committed rG198cc305e985: [CGP] match a special-case of unsigned subtract overflow (authored by spatel).
Wed, Feb 20, 1:23 PM
spatel committed rL354519: [CGP] match a special-case of unsigned subtract overflow.
Wed, Feb 20, 1:23 PM
spatel added a comment to D58197: [x86] vectorize more cast ops in lowering to avoid register file transfers.

Ping.

Wed, Feb 20, 12:03 PM · Restricted Project
spatel committed rG8d91faa48138: [CGP][x86] add tests for usubo special-case; NFC (authored by spatel).
Wed, Feb 20, 7:41 AM
spatel committed rL354475: [CGP][x86] add tests for usubo special-case; NFC.
Wed, Feb 20, 7:40 AM
spatel committed rG68171e3cd689: [InstSimplify] use any-zero matcher for fcmp folds (authored by spatel).
Wed, Feb 20, 6:34 AM
spatel committed rL354467: [InstSimplify] use any-zero matcher for fcmp folds.
Wed, Feb 20, 6:33 AM
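The "any-zero" idea relies on +0.0 and -0.0 comparing equal, so a compare against either constant gives the same result. The C sketch below uses `fabs` to show this. C's `<` behaves like an ordered less-than, so it is false when an operand is NaN. The sketch illustrates only the floating-point facts, not the InstSimplify matcher, and the function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* +0.0 and -0.0 compare equal, so comparing against either zero constant
   yields the same answer for every input, NaN included. */
static bool fabs_lt_pos_zero(double x) { return fabs(x) < 0.0; }
static bool fabs_lt_neg_zero(double x) { return fabs(x) < -0.0; }

/* Both compares are always false: fabs never returns a value below zero,
   and an ordered '<' with a NaN operand is false. */
static bool fabs_lt_zero_folded(double x) {
    (void)x;
    return false;
}
```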

Tue, Feb 19

spatel added a reviewer for D58412: [X86] Remove FeatureSlowIncDec from Sandy Bridge and later Intel Core CPUs: chandlerc.

Should we change the generic model too, then, since most CPUs prefer inc/dec?
I'm guessing that's a lot more test diffs, so it's fine to make it another patch; I just want to know if that's the way to think about the generic model.

A related question: should "pentium4", which is clang's default CPU for 32-bit Linux builds, have more tuning flags on it?

Tue, Feb 19, 5:53 PM · Restricted Project
spatel committed rG5fefb02e274b: [InstCombine] regenerate test checks; NFC (authored by spatel).
Tue, Feb 19, 5:25 PM
spatel committed rL354420: [InstCombine] regenerate test checks; NFC.
Tue, Feb 19, 5:24 PM
spatel committed rG49f97395abb4: Revert "[InstSimplify] use any-zero matcher for fcmp folds" (authored by spatel).
Tue, Feb 19, 4:23 PM
spatel added a reverting change for rG058bb8351351: [InstSimplify] use any-zero matcher for fcmp folds: rG49f97395abb4: Revert "[InstSimplify] use any-zero matcher for fcmp folds".
Tue, Feb 19, 4:23 PM
spatel committed rL354408: Revert "[InstSimplify] use any-zero matcher for fcmp folds".
Tue, Feb 19, 4:23 PM
spatel added a comment to D58412: [X86] Remove FeatureSlowIncDec from Sandy Bridge and later Intel Core CPUs.

Should we change the generic model too, then, since most CPUs prefer inc/dec?
I'm guessing that's a lot more test diffs, so it's fine to make it another patch; I just want to know if that's the way to think about the generic model.

Tue, Feb 19, 4:14 PM · Restricted Project
spatel committed rG058bb8351351: [InstSimplify] use any-zero matcher for fcmp folds (authored by spatel).
Tue, Feb 19, 4:10 PM
spatel committed rL354406: [InstSimplify] use any-zero matcher for fcmp folds.
Tue, Feb 19, 4:09 PM
spatel committed rG9cf04addf39c: [InstSimplify] add vector tests for fcmp+fabs; NFC (authored by spatel).
Tue, Feb 19, 3:58 PM
spatel committed rL354404: [InstSimplify] add vector tests for fcmp+fabs; NFC.
Tue, Feb 19, 3:57 PM
spatel abandoned D58359: [Analysis] fold load of untouched alloca to undef.

Abandoning. Added a sentence to the LangRef here:
rL354394

Tue, Feb 19, 2:39 PM · Restricted Project
spatel committed rGb6bc11d4067d: [LangRef] add to description of alloca instruction (authored by spatel).
Tue, Feb 19, 2:37 PM
spatel committed rL354394: [LangRef] add to description of alloca instruction.
Tue, Feb 19, 2:36 PM
spatel committed rGc1e018431795: [InstCombine] reduce even more unsigned saturated add with 'not' op (authored by spatel).
Tue, Feb 19, 2:14 PM
spatel committed rL354393: [InstCombine] reduce even more unsigned saturated add with 'not' op.
Tue, Feb 19, 2:14 PM
spatel committed rGdcb93c0ddacd: [InstCombine] rearrange saturated add folds; NFC (authored by spatel).
Tue, Feb 19, 1:46 PM
spatel committed rL354384: [InstCombine] rearrange saturated add folds; NFC.
Tue, Feb 19, 1:46 PM
spatel added a comment to D58359: [Analysis] fold load of untouched alloca to undef.

Yes, we fold loads from alloca to undef. (In GVN like you mention, but also in mem2reg.) LangRef should state that memory allocated with alloca is uninitialized, and that loading from uninitialized memory produces undef; if either of those is missing, a patch to fix it is welcome.

That said, this seems like the wrong direction, in terms of where we want to perform this sort of optimization. Most functions have more than one basic block, and you won't catch any of those cases. EarlyCSE/GVN/NewGVN should be able to handle this case, and similar cases where there's more than one basic block. Also, FindAvailablePtrLoadStore is basically only used from two places: JumpThreading, and InstCombine. InstCombine really shouldn't be doing this sort of scan, and JumpThreading obviously only triggers in functions with more than one BB.

Tue, Feb 19, 12:00 PM · Restricted Project
spatel added inline comments to D58329: [ValueTracking] Known bits support for unsigned saturating add/sub.
Tue, Feb 19, 7:00 AM · Restricted Project
spatel added a comment to D58361: [x86] allow more 128-bit extract+shufps formation to avoid 256-bit shuffles.

Thanks, Peter. 'HasFastVariableShuffle' is a heuristic, so we're never going to get it right all the time, but it sounds like we should leave this particular bit of logic as-is and try to chisel out smaller patterns/transforms for sure wins.

Tue, Feb 19, 5:27 AM · Restricted Project

Mon, Feb 18

spatel created D58361: [x86] allow more 128-bit extract+shufps formation to avoid 256-bit shuffles.
Mon, Feb 18, 4:23 PM · Restricted Project
spatel committed rGd8b4efcb6b4a: [CGP] form usub with overflow from sub+icmp (authored by spatel).
Mon, Feb 18, 3:35 PM
spatel committed rL354298: [CGP] form usub with overflow from sub+icmp.
Mon, Feb 18, 3:35 PM
spatel closed D57789: [CGP] form usub with overflow from sub+icmp.
Mon, Feb 18, 3:35 PM · Restricted Project
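For unsigned operands, `a - b` wraps exactly when `a < b`. That is why a sub plus an icmp of the same operands can be replaced by a single unsigned subtract-with-overflow. The C sketch below checks that equivalence against the GCC/Clang `__builtin_sub_overflow` builtin. The struct and function names are hypothetical, and this is not the CodeGenPrepare implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type mirroring the {result, overflow} pair of an
   unsigned subtract-with-overflow. */
struct usub_result {
    uint32_t diff;
    bool overflow;
};

/* Source pattern: a plain subtract plus a separate compare of the same
   operands. For unsigned values, a - b wraps exactly when a < b. */
static struct usub_result usub_from_sub_icmp(uint32_t a, uint32_t b) {
    struct usub_result r;
    r.diff = a - b;
    r.overflow = a < b;
    return r;
}

/* Reference using the GCC/Clang checked-arithmetic builtin. */
static struct usub_result usub_builtin(uint32_t a, uint32_t b) {
    struct usub_result r;
    r.overflow = __builtin_sub_overflow(a, b, &r.diff);
    return r;
}
```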
spatel created D58359: [Analysis] fold load of untouched alloca to undef.
Mon, Feb 18, 2:50 PM · Restricted Project
spatel added a comment to D57789: [CGP] form usub with overflow from sub+icmp.

Ping.

Mon, Feb 18, 8:54 AM · Restricted Project
spatel committed rGfff628274d46: [x86] split more v8f32/v8i32 shuffles in lowering (authored by spatel).
Mon, Feb 18, 8:46 AM
spatel committed rL354279: [x86] split more v8f32/v8i32 shuffles in lowering.
Mon, Feb 18, 8:46 AM
spatel closed D58181: [x86] split more v8f32/v8i32 shuffles in lowering.
Mon, Feb 18, 8:46 AM · Restricted Project
spatel committed rG8a35d339c92a: Revert "[InstCombine] reduce even more unsigned saturated add with 'not' op" (authored by spatel).
Mon, Feb 18, 8:04 AM
spatel added a reverting change for rG079b610c29b4: [InstCombine] reduce even more unsigned saturated add with 'not' op: rG8a35d339c92a: Revert "[InstCombine] reduce even more unsigned saturated add with 'not' op".
Mon, Feb 18, 8:04 AM
spatel committed rL354277: Revert "[InstCombine] reduce even more unsigned saturated add with 'not' op".
Mon, Feb 18, 8:03 AM
spatel committed rG079b610c29b4: [InstCombine] reduce even more unsigned saturated add with 'not' op (authored by spatel).
Mon, Feb 18, 7:25 AM
spatel committed rL354276: [InstCombine] reduce even more unsigned saturated add with 'not' op.
Mon, Feb 18, 7:21 AM

Sun, Feb 17

spatel committed rG92b5b195dbaf: [InstCombine] add even more tests for unsigned saturated add; NFC (authored by spatel).
Sun, Feb 17, 12:02 PM
spatel committed rL354236: [InstCombine] add even more tests for unsigned saturated add; NFC.
Sun, Feb 17, 12:02 PM
spatel committed rGb341ee7071c1: [InstCombine] reduce more unsigned saturated add with 'not' op (authored by spatel).
Sun, Feb 17, 8:49 AM
spatel committed rL354224: [InstCombine] reduce more unsigned saturated add with 'not' op.
Sun, Feb 17, 8:48 AM
spatel committed rGdb02293d9dd5: [InstCombine] add more tests for unsigned saturated add; NFC (authored by spatel).
Sun, Feb 17, 8:47 AM
spatel committed rL354223: [InstCombine] add more tests for unsigned saturated add; NFC.
Sun, Feb 17, 8:47 AM
spatel committed rGbee207354271: [InstCombine] reduce unsigned saturated add with 'not' op (authored by spatel).
Sun, Feb 17, 7:59 AM
spatel committed rL354221: [InstCombine] reduce unsigned saturated add with 'not' op.
Sun, Feb 17, 7:58 AM
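The following hedged C sketch shows one source-level form of the 'not' pattern, using 32-bit operands. Because `~y == UINT32_MAX - y`, the compare `x > ~y` is true exactly when `x + y` wraps. The select in `uadd_sat_not` is therefore an unsigned saturating add. The exact IR forms these commits match are not shown here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: clamp to UINT32_MAX when the 32-bit sum wraps. */
static uint32_t uadd_sat_ref(uint32_t x, uint32_t y) {
    uint32_t sum = x + y;
    return sum < x ? UINT32_MAX : sum;
}

/* Form with a 'not' operand: (uint32_t)~y == UINT32_MAX - y, so x > ~y is
   true exactly when x + y would exceed UINT32_MAX. */
static uint32_t uadd_sat_not(uint32_t x, uint32_t y) {
    return x > (uint32_t)~y ? UINT32_MAX : x + y;
}
```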
spatel committed rG3e1193743c43: [InstCombine] add tests for unsigned saturated add; NFC (authored by spatel).
Sun, Feb 17, 7:09 AM
spatel committed rL354219: [InstCombine] add tests for unsigned saturated add; NFC.
Sun, Feb 17, 7:09 AM
spatel accepted D58006: [SelectionDAG] Extract [US]MULO expansion into TL method; NFC.

LGTM

Sun, Feb 17, 6:51 AM · Restricted Project
spatel added a comment to D58006: [SelectionDAG] Extract [US]MULO expansion into TL method; NFC.

The change looks ok, but can you describe here, or point to a discussion about, how this fits in the overall plan for overflow op codegen? I.e., what exactly is enabled by moving this code?

Sun, Feb 17, 6:30 AM · Restricted Project
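For context on what a [US]MULO expansion computes, here is a minimal C sketch of one common strategy: widen the multiply and check whether the product fits. It is not necessarily the expansion TargetLowering uses, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* UMULO by widening: multiply in 64 bits; overflow iff the high half is set. */
static bool umulo32(uint32_t a, uint32_t b, uint32_t *lo) {
    uint64_t wide = (uint64_t)a * b;
    *lo = (uint32_t)wide;
    return (wide >> 32) != 0u;
}

/* SMULO by widening: overflow iff the exact 64-bit product is outside the
   int32_t range. */
static bool smulo32(int32_t a, int32_t b, int32_t *lo) {
    int64_t wide = (int64_t)a * b;
    /* Truncate through uint32_t; converting an out-of-range value back to
       int32_t is implementation-defined in C (GCC and Clang wrap). */
    *lo = (int32_t)(uint32_t)wide;
    return wide < INT32_MIN || wide > INT32_MAX;
}
```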

Sat, Feb 16

spatel added a comment to D57789: [CGP] form usub with overflow from sub+icmp.

Independent of this patch, but just so everyone's aware - there's currently no consistency in the way we transform to the overflow intrinsics. We may transform to sadd.with.overflow as an IR canonicalization (no target checks):
https://godbolt.org/z/2ajU23

Sat, Feb 16, 7:09 AM · Restricted Project

Fri, Feb 15

spatel updated the diff for D58181: [x86] split more v8f32/v8i32 shuffles in lowering.

Patch updated:
Restrict the change to targets without fast-variable-shuffle.

Fri, Feb 15, 12:47 PM · Restricted Project
spatel added a comment to D58181: [x86] split more v8f32/v8i32 shuffles in lowering.

Would it be better to stage this in two parts?
I can add a check for fast-variable-shuffle, so we get the clear improvements first. Then a follow-up can remove that check and see if that causes any real-world fallout.

Fri, Feb 15, 9:36 AM · Restricted Project
spatel updated the diff for D58282: [x86] scalarize extract element 0 of FP math.

Patch updated:
Add llvm_unreachable so we don't accidentally return without a value.

Fri, Feb 15, 9:01 AM · Restricted Project
spatel committed rG8a2b543a1333: [InstCombine] fix crash while trying to narrow a binop of shuffles (PR40734) (authored by spatel).
Fri, Feb 15, 8:32 AM
spatel committed rL354144: [InstCombine] fix crash while trying to narrow a binop of shuffles (PR40734).
Fri, Feb 15, 8:32 AM
spatel created D58282: [x86] scalarize extract element 0 of FP math.
Fri, Feb 15, 6:45 AM · Restricted Project

Thu, Feb 14

spatel committed rG0b2dca9f8302: [x86] add tests for extractelement of FP; NFC (authored by spatel).
Thu, Feb 14, 3:18 PM
spatel committed rL354077: [x86] add tests for extractelement of FP; NFC.
Thu, Feb 14, 3:18 PM
spatel added a comment to D58210: [SelectionDAGLegalize] Improve promotion of CTLZ.

@jonpa - IIRC, some targets loop infinitely if we always allow the generic trunc fold in DAGCombiner after legalization, but you might be able to constrain it using the existing TLI hooks:
IsDesirableToPromoteOp()
isTypeDesirableForOp()

Thu, Feb 14, 2:30 PM
spatel added inline comments to D58242: Teach instcombine about remaining idempotent atomicrmw types.
Thu, Feb 14, 10:02 AM · Restricted Project
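An atomicrmw is idempotent when its operand is the identity for its operation. Memory is then unchanged, and only the old value is observed. The C11 `<stdatomic.h>` sketch below shows such identity operands. Which operations D58242 actually covers is not listed here, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Each read-modify-write uses the identity operand for its operation, so the
   stored value never changes; every call just returns the current value. */
static uint32_t idempotent_rmws(_Atomic uint32_t *p) {
    uint32_t a = atomic_fetch_add(p, 0u);
    uint32_t b = atomic_fetch_sub(p, 0u);
    uint32_t c = atomic_fetch_or(p, 0u);
    uint32_t d = atomic_fetch_xor(p, 0u);
    uint32_t e = atomic_fetch_and(p, UINT32_MAX);
    /* Single-threaded here, so all five observe the same value. */
    assert(a == b && b == c && c == d && d == e);
    return e;
}
```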
spatel added a comment to D57921: [DAG] Cleanup unused node in SimplifySelectCC..

Can you describe what's happening within SimplifySetCC for the affected test?

Thu, Feb 14, 6:59 AM · Restricted Project
spatel added reviewers for D58210: [SelectionDAGLegalize] Improve promotion of CTLZ: craig.topper, nikic, RKSimon, efriedma.

Adding more potential reviewers for legalization changes. I'm not sure how much combining/optimization we want to do in here.

Thu, Feb 14, 6:42 AM
spatel accepted D51216: Fix IRBuilder.CreateFCmp(X, X) misfolding.

LGTM - have you requested commit access?

Thu, Feb 14, 6:08 AM · Restricted Project
spatel added inline comments to D58197: [x86] vectorize more cast ops in lowering to avoid register file transfers.
Thu, Feb 14, 6:03 AM · Restricted Project

Wed, Feb 13

spatel added a comment to D51216: Fix IRBuilder.CreateFCmp(X, X) misfolding.

Some IR tests added here:
rL353992
I think that's better than IRBuilder tests. Feel free to add other predicates if you think they are needed, but I think those 4 give us good coverage.

Wed, Feb 13, 3:38 PM · Restricted Project
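`fcmp X, X` can't simply be folded to a constant because of NaN. C's `==` on doubles acts like an ordered-equal compare and is false when `x` is NaN. `!=` acts like unordered-not-equal and is true for NaN. The C sketch below shows that behavior when compiled without fast-math flags. The helper names are hypothetical, and the sketch does not reproduce the IR tests.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* C's == on doubles behaves like an ordered-equal compare (fcmp oeq):
   false when an operand is NaN, so x == x is not always true. */
static bool self_oeq(double x) { return x == x; }

/* C's != behaves like unordered-not-equal (fcmp une): true for NaN. */
static bool self_une(double x) { return x != x; }

/* C99 isunordered() is true only when an operand is NaN (like fcmp uno). */
static bool self_uno(double x) { return isunordered(x, x); }
```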
spatel committed rGd05ba496bc55: [ConstProp] add IR tests to show miscompiles; NFC (authored by spatel).
Wed, Feb 13, 3:31 PM
spatel committed rL353992: [ConstProp] add IR tests to show miscompiles; NFC.
Wed, Feb 13, 3:31 PM
spatel added inline comments to D58197: [x86] vectorize more cast ops in lowering to avoid register file transfers.
Wed, Feb 13, 1:43 PM · Restricted Project
spatel created D58197: [x86] vectorize more cast ops in lowering to avoid register file transfers.
Wed, Feb 13, 11:20 AM · Restricted Project
spatel created D58181: [x86] split more v8f32/v8i32 shuffles in lowering.
Wed, Feb 13, 7:30 AM · Restricted Project
spatel accepted D51215: Fix misfolding of IRBuilder.CreateICmp(int_ty X, bitcast (float_ty Y) to int_ty).

LGTM

Wed, Feb 13, 6:02 AM · Restricted Project
spatel added inline comments to D51215: Fix misfolding of IRBuilder.CreateICmp(int_ty X, bitcast (float_ty Y) to int_ty).
Wed, Feb 13, 5:33 AM · Restricted Project

Tue, Feb 12

spatel added a comment to D51215: Fix misfolding of IRBuilder.CreateICmp(int_ty X, bitcast (float_ty Y) to int_ty).

This took some work, but I have an IR test case that should show the bug:
rL353883

Tue, Feb 12, 1:54 PM · Restricted Project