Page MenuHomePhabricator

spatel (Sanjay Patel)
User

Projects

User does not belong to any projects.

User Details

User Since
May 22 2014, 1:24 PM (237 w, 6 d)

Recent Activity

Today

spatel added a comment to D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads..

Unrelated diffs got uploaded?

Thu, Dec 13, 9:39 AM
spatel committed rL349058: [DAGCombiner] after simplifying demanded elements of vector operand of extract….
[DAGCombiner] after simplifying demanded elements of vector operand of extract…
Thu, Dec 13, 9:08 AM
spatel closed D55655: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract.
Thu, Dec 13, 9:08 AM
spatel reopened D55655: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract.

Reverted at rL349056 because:

Thu, Dec 13, 8:38 AM
spatel added a reverting change for rL349051: [DAGCombiner] after simplifying demanded elements of vector operand of extract…: rL349056: revert rL349051: [DAGCombiner] after simplifying demanded elements of vector….
Thu, Dec 13, 8:36 AM
spatel committed rL349056: revert rL349051: [DAGCombiner] after simplifying demanded elements of vector….
revert rL349051: [DAGCombiner] after simplifying demanded elements of vector…
Thu, Dec 13, 8:36 AM
spatel committed rL349051: [DAGCombiner] after simplifying demanded elements of vector operand of extract….
[DAGCombiner] after simplifying demanded elements of vector operand of extract…
Thu, Dec 13, 7:47 AM
spatel closed D55655: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract.
Thu, Dec 13, 7:47 AM
spatel created D55655: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract.
Thu, Dec 13, 7:26 AM
spatel added a comment to D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads..

Sounds like everyone is in agreement about the overall direction. There are just a few inline comments/questions to answer, and then we should be good to go.

Thu, Dec 13, 6:01 AM

Yesterday

spatel updated the diff for D55604: [AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924).

Patch updated:
Added check and test for non-power-of-2 type.

Wed, Dec 12, 3:46 PM
spatel added inline comments to D55604: [AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924).
Wed, Dec 12, 3:14 PM
spatel updated the diff for D55604: [AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924).

Patch updated:

  1. Add hasOneUse() checks and test, so we're not converting to funnel shift if there's potential for perf regression.
  2. I went ahead and committed the PhaseOrdering test since it shows a bug independent of this patch (rL348980).
Wed, Dec 12, 2:18 PM
spatel committed rL348980: [PhaseOrdering] add test for funnel shift (rotate); NFC.
[PhaseOrdering] add test for funnel shift (rotate); NFC
Wed, Dec 12, 2:17 PM
spatel added inline comments to D55604: [AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924).
Wed, Dec 12, 1:27 PM
spatel added inline comments to D55604: [AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924).
Wed, Dec 12, 11:57 AM
spatel committed rL348946: [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA.
[x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA
Wed, Dec 12, 10:03 AM
spatel closed D55494: [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA.
Wed, Dec 12, 10:03 AM
spatel accepted D55365: [CodeGen] Allow mempcy/memset to generate small overlapping stores..

LGTM

Wed, Dec 12, 9:38 AM
spatel created D55604: [AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924).
Wed, Dec 12, 8:49 AM
spatel committed rL348933: [AggressiveInstCombine] add tests for rotates with branch; NFC.
[AggressiveInstCombine] add tests for rotates with branch; NFC
Wed, Dec 12, 7:31 AM

Tue, Dec 11

spatel requested changes to D55365: [CodeGen] Allow mempcy/memset to generate small overlapping stores..

Unsetting 'Approved' until we have a fix for the bug.

Tue, Dec 11, 3:46 PM
spatel reopened D55365: [CodeGen] Allow mempcy/memset to generate small overlapping stores..

Reopening - this patch was reverted at rL348844 because it broke an ARM test.

Tue, Dec 11, 3:45 PM
spatel updated the diff for D55515: [x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859).

Patch updated:
Add tests for vsetcc simplification edge cases. These are currently folded early in generic DAGCombiner via SimplifySetCC().

Tue, Dec 11, 9:11 AM
spatel committed rL348862: [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927).
[InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927)
Tue, Dec 11, 8:41 AM
spatel closed D55529: [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927).
Tue, Dec 11, 8:41 AM
spatel updated the diff for D55494: [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA.

Patch updated:
Rebased after cleanup in rL348845 and rL348851. This should be functionally equivalent, but with less cruft. There's a 'TODO' comment for 32-bit target based on the comments here.

Tue, Dec 11, 7:56 AM
spatel committed rL348851: [x86] clean up code for converting 16-bit ops to LEA; NFC.
[x86] clean up code for converting 16-bit ops to LEA; NFC
Tue, Dec 11, 7:32 AM
spatel committed rL348845: [x86] remove dead code for 16-bit LEA formation; NFC.
[x86] remove dead code for 16-bit LEA formation; NFC
Tue, Dec 11, 6:10 AM

Mon, Dec 10

spatel updated the diff for D55529: [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927).

Patch updated:
Fix code comment to better explain what we're doing.

Mon, Dec 10, 3:32 PM
spatel added inline comments to D55529: [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927).
Mon, Dec 10, 3:27 PM
spatel updated the summary of D55529: [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927).
Mon, Dec 10, 3:12 PM
spatel created D55529: [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927).
Mon, Dec 10, 3:09 PM
spatel added a comment to D55515: [x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859).

Are there any tests for the 0/max cases? (I guess they're probably just constant folded away anyway?)

Mon, Dec 10, 2:49 PM
spatel edited reviewers for D55169: [ConstantFolding] Handle leading zero-length elements in load folding, added: efriedma, hfinkel, rnk, dblaikie; removed: spatel.

I don't have a good understanding of the potential edge cases with aggregates, so adding some other potential reviewers.

Mon, Dec 10, 2:41 PM
spatel committed rL348800: [InstCombine] add tests for movmsk (PR39927) NFC.
[InstCombine] add tests for movmsk (PR39927) NFC
Mon, Dec 10, 1:50 PM
spatel added inline comments to D55494: [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA.
Mon, Dec 10, 12:15 PM
spatel accepted D55182: InstCombine: Scalarize single use icmp/fcmp.

LGTM - but I think there's a bug in cheapToScalarize() that I didn't notice before. Please put a FIXME note in there.

Mon, Dec 10, 11:39 AM
spatel added inline comments to D55494: [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA.
Mon, Dec 10, 11:31 AM
spatel updated the diff for D55494: [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA.

Patch updated:
Determine op size based on the opcode parameter (no need to change the function signature).

Mon, Dec 10, 11:30 AM
spatel added inline comments to D55515: [x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859).
Mon, Dec 10, 9:44 AM
spatel committed rL348776: [x86] fix formatting; NFC.
[x86] fix formatting; NFC
Mon, Dec 10, 9:29 AM
spatel created D55515: [x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859).
Mon, Dec 10, 8:57 AM
spatel committed rL348769: [x86] add tests for LowerVSETCC with min/max; NFC.
[x86] add tests for LowerVSETCC with min/max; NFC
Mon, Dec 10, 8:34 AM
spatel added a comment to D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads..

I just looked over the codegen changes so far, but I want to add some more knowledgeable x86 hackers to have a look too. There are 2 concerns:

  1. Are there any known uarch problems with overlapping loads?
  2. Are there any known uarch problems with unaligned accesses (either scalar or SSE)?

    If there's any perf data (either nano-benchmarks or full apps) to support the changes, that would be nice to see. This reminds me of PR33329: https://bugs.llvm.org/show_bug.cgi?id=33329 (can we close that now?)

Let's not lose sight of the big picture here. If uarch problems exist, are they *worse* than the cost of calling memcmp()? In other words, is the likely register spills, function call overhead, and dynamic algorithm selection (given the constantness of the size parameter is lost) worth it?

Mon, Dec 10, 7:39 AM

Sun, Dec 9

spatel created D55494: [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA.
Sun, Dec 9, 7:14 AM
spatel committed rL348723: [x86] regenerate test checks; NFC.
[x86] regenerate test checks; NFC
Sun, Dec 9, 6:51 AM
spatel committed rL348722: [x86] don't try to convert add with undef operands to LEA.
[x86] don't try to convert add with undef operands to LEA
Sun, Dec 9, 6:45 AM
spatel closed D54710: [x86] don't try to convert add with undef operands to LEA.
Sun, Dec 9, 6:45 AM

Sat, Dec 8

spatel accepted D55365: [CodeGen] Allow mempcy/memset to generate small overlapping stores..

LGTM - the code change itself is just removing a hack.

Sat, Dec 8, 8:50 AM
spatel committed rL348706: [DAGCombiner] re-enable truncation of binops.
[DAGCombiner] re-enable truncation of binops
Sat, Dec 8, 8:10 AM
spatel added a reviewer for D55365: [CodeGen] Allow mempcy/memset to generate small overlapping stores.: RKSimon.
Sat, Dec 8, 7:46 AM
spatel committed rL348705: [x86] add 32-bit RUN for tests and test with opaque constants; NFC.
[x86] add 32-bit RUN for tests and test with opaque constants; NFC
Sat, Dec 8, 7:37 AM

Fri, Dec 7

spatel created D55448: [DAGCombiner] allow hoisting vector bitwise logic ahead of truncates.
Fri, Dec 7, 11:28 AM
spatel committed rL348627: [DAGCombiner] split trunc from extend in hoistLogicOpWithSameOpcodeHands; NFC.
[DAGCombiner] split trunc from extend in hoistLogicOpWithSameOpcodeHands; NFC
Fri, Dec 7, 10:54 AM
spatel committed rL348604: [DAGCombiner] disable truncation of binops by default.
[DAGCombiner] disable truncation of binops by default
Fri, Dec 7, 7:51 AM
spatel committed rL348597: [DAGCombiner] remove explicit calls to AddToWorkList; NFCI.
[DAGCombiner] remove explicit calls to AddToWorkList; NFCI
Fri, Dec 7, 7:04 AM

Thu, Dec 6

spatel committed rL348552: [DAGCombiner] use root SDLoc for all nodes created by logic fold.
[DAGCombiner] use root SDLoc for all nodes created by logic fold
Thu, Dec 6, 4:05 PM
spatel committed rL348550: [DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCI.
[DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCI
Thu, Dec 6, 3:57 PM
spatel committed rL348547: [DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFC.
[DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFC
Thu, Dec 6, 3:42 PM
spatel committed rL348534: [DAGCombiner] don't group bswap with casts in logic hoisting fold.
[DAGCombiner] don't group bswap with casts in logic hoisting fold
Thu, Dec 6, 2:13 PM
spatel committed rL348530: [x86] add test for vector bitwise-logic-of-bswaps; NFC.
[x86] add test for vector bitwise-logic-of-bswaps; NFC
Thu, Dec 6, 1:59 PM
spatel committed rL348523: [DAGCombiner] reduce indent; NFC.
[DAGCombiner] reduce indent; NFC
Thu, Dec 6, 12:06 PM
spatel committed rL348518: [DAGCombiner] don't hoist logic op if operands have other uses, part 2.
[DAGCombiner] don't hoist logic op if operands have other uses, part 2
Thu, Dec 6, 11:22 AM
spatel committed rL348516: [PowerPC] add tests for hoisting bitwise logic; NFC.
[PowerPC] add tests for hoisting bitwise logic; NFC
Thu, Dec 6, 11:09 AM
spatel committed rL348508: [DAGCombiner] don't hoist logic op if operands have other uses.
[DAGCombiner] don't hoist logic op if operands have other uses
Thu, Dec 6, 10:19 AM
spatel committed rL348506: [x86] add test for hoistLogicOpWithSameOpcodeHands with extra uses; NFC.
[x86] add test for hoistLogicOpWithSameOpcodeHands with extra uses; NFC
Thu, Dec 6, 10:09 AM
spatel committed rL348501: [DAGCombiner] refactor function that hoists bitwise logic; NFCI.
[DAGCombiner] refactor function that hoists bitwise logic; NFCI
Thu, Dec 6, 9:11 AM
spatel added reviewers for D54839: [CodeGen] Enhance machine PHIs optimization: aemerson, qcolombet, atrick.

(Adding more potential reviewers for MI changes)

Thu, Dec 6, 6:26 AM

Wed, Dec 5

spatel added a comment to D55182: InstCombine: Scalarize single use icmp/fcmp.
  1. There should be a test that shows the benefit of using 'isCheapToScalarize()'. IIUC, that would involve a sequence with multiple binops and/or compares between an insertelement and the trailing extract.
Wed, Dec 5, 3:26 PM
spatel committed rL348423: [InstCombine] remove dead code from visitExtractElement.
[InstCombine] remove dead code from visitExtractElement
Wed, Dec 5, 3:12 PM
spatel committed rL348418: [InstCombine] reduce duplication in visitExtractElementInst; NFC.
[InstCombine] reduce duplication in visitExtractElementInst; NFC
Wed, Dec 5, 2:00 PM
spatel committed rL348417: [InstCombine] add/move tests for extractelement; NFC.
[InstCombine] add/move tests for extractelement; NFC
Wed, Dec 5, 2:00 PM
spatel added reviewers for D55297: [DemandedBits][BDCE] Support vectors of integers: RKSimon, hfinkel.
Wed, Dec 5, 9:57 AM
spatel added a comment to D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads..

The change makes the impacted values 2.5 - 3x as fast.

Wed, Dec 5, 9:35 AM
spatel committed rL348383: [DAGCombiner] don't try to extract a fraction of a vector binop and crash….
[DAGCombiner] don't try to extract a fraction of a vector binop and crash…
Wed, Dec 5, 9:13 AM
spatel committed rL348367: [InstCombine] simplify icmps with same operands based on dominating cmp.
[InstCombine] simplify icmps with same operands based on dominating cmp
Wed, Dec 5, 7:07 AM
spatel added reviewers for D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.: andreadb, craig.topper, pcordes, RKSimon.

I just looked over the codegen changes so far, but I want to add some more knowledgeable x86 hackers to have a look too. There are 2 concerns:

  1. Are there any known uarch problems with overlapping loads?
  2. Are there any known uarch problems with unaligned accesses (either scalar or SSE)?
Wed, Dec 5, 6:40 AM

Tue, Dec 4

spatel added a comment to D54710: [x86] don't try to convert add with undef operands to LEA.
This seems to me to be a bug in ProcessImplicitDefs, for some reasoning killing the IMPLICIT_DEF even though there's a subreg use. GlobalISel MIR output looks fine to me.
Tue, Dec 4, 5:07 PM
spatel committed rL348312: [InstCombine] add tests for implied simplifications; NFC.
[InstCombine] add tests for implied simplifications; NFC
Tue, Dec 4, 2:29 PM
spatel added a comment to D54710: [x86] don't try to convert add with undef operands to LEA.

Ping * 2.

Tue, Dec 4, 2:22 PM
spatel committed rL348292: [CmpInstAnalysis] fix function signature for ICmp code to predicate; NFC.
[CmpInstAnalysis] fix function signature for ICmp code to predicate; NFC
Tue, Dec 4, 10:57 AM
spatel committed rL348284: [InstCombine] rearrange foldICmpWithDominatingICmp; NFC.
[InstCombine] rearrange foldICmpWithDominatingICmp; NFC
Tue, Dec 4, 9:48 AM
spatel accepted D54698: [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467).

LGTM. We can handle any of the TODO items as follow-ups.

Tue, Dec 4, 9:42 AM
spatel committed rL348274: [InstCombine] auto-generate full checks for icmp overflow tests; NFC.
[InstCombine] auto-generate full checks for icmp overflow tests; NFC
Tue, Dec 4, 7:45 AM
spatel committed rL348273: [InstCombine] add helper for icmp with dominator; NFC.
[InstCombine] add helper for icmp with dominator; NFC
Tue, Dec 4, 7:38 AM
spatel committed rL348270: [InstCombine] auto-generate full checks for icmp dominator tests; NFC.
[InstCombine] auto-generate full checks for icmp dominator tests; NFC
Tue, Dec 4, 7:03 AM

Mon, Dec 3

spatel added a reviewer for D54698: [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467): fabiang.

I just see a few inline nits. Adding Fabian in case he has any feedback.

Mon, Dec 3, 5:57 PM
spatel committed rL348195: [DAGCombiner] narrow truncated vector binops when legal.
[DAGCombiner] narrow truncated vector binops when legal
Mon, Dec 3, 2:00 PM
spatel closed D55126: [DAGCombiner] narrow truncated vector binops when legal.
Mon, Dec 3, 2:00 PM
spatel committed rL348191: [InstCombine] fix undef propagation bug with shuffle+binop.
[InstCombine] fix undef propagation bug with shuffle+binop
Mon, Dec 3, 1:18 PM
spatel committed rL348175: [InstCombine] rearrange shuffle+binop fold; NFC.
[InstCombine] rearrange shuffle+binop fold; NFC
Mon, Dec 3, 11:56 AM
spatel committed rL348173: [InstCombine] add tests for shuffle+binop fold; NFC.
[InstCombine] add tests for shuffle+binop fold; NFC
Mon, Dec 3, 11:44 AM
spatel committed rL348151: [SimplifyCFG] add tests for cross block compare folding; NFC.
[SimplifyCFG] add tests for cross block compare folding; NFC
Mon, Dec 3, 8:58 AM
spatel closed D54994: [simplifycfg][NFC] add tests for cross block compare instructions fold and optimization..
Mon, Dec 3, 8:58 AM
spatel committed rL348149: [CmpInstAnalysis] fix formatting; NFC.
[CmpInstAnalysis] fix formatting; NFC
Mon, Dec 3, 7:51 AM
spatel added a comment to D55182: InstCombine: Scalarize single use icmp/fcmp.

I think this is the right change for IR (I'm trying to do something similar with D50992 but that's been held up while we try to get the backend in shape). .

Mon, Dec 3, 7:12 AM

Sun, Dec 2

spatel added a comment to D54827: [simplifycfg] Handle 3 sequential branches while the first two can infer the third one..

I've been looking at this:
rL347868
rL347896
rL348088
...and I'm still not exactly sure how we should fix it, but I'm confident that what this patch is proposing is not the right answer. We don't want to create a dummy instruction only to analyze and delete it.
My best guess is that we can handle this case using a small enhancement to ValueTracking plus a minimal change to the callers in InstCombine and/or SimplifyCFG. These are hacks vs. a proper solution that uses a dominator tree and a more significant fix to CVP, but it's what we already do in both InstCombine and SimplifyCFG, so there's really no additional cost for optimizing simple cases like this.

Sun, Dec 2, 9:27 AM
spatel committed rL348090: [SelectionDAG] fold constant with undef vector per element.
[SelectionDAG] fold constant with undef vector per element
Sun, Dec 2, 5:51 AM
spatel committed rL348089: [DAGCombiner] guard against an oversized shift crash.
[DAGCombiner] guard against an oversized shift crash
Sun, Dec 2, 5:37 AM