
Yesterday

spatel committed rG9bcb0766ebe2: [x86] add tests for vector cmps; NFC (authored by spatel).
[x86] add tests for vector cmps; NFC
Mon, Mar 25, 3:10 PM
spatel committed rL356959: [x86] add tests for vector cmps; NFC.
[x86] add tests for vector cmps; NFC
Mon, Mar 25, 3:07 PM
spatel added a comment to D59710: [SLP] remove lower limit for forming reduction patterns.

It really requires some additional improvements, because I see a lot of regressions for the cmp instructions. I think you should first try to vectorize cmp instructions using the horizontal reductions, and only if that is unsuccessful try to vectorize the operands of the instruction itself.

Mon, Mar 25, 1:08 PM · Restricted Project
spatel updated the diff for D59777: [x86] improve AVX lowering of vector zext.

Patch updated:

  1. Don't match undef lanes unless they exist on both halves of the mask.
  2. New test to show that difference.
Mon, Mar 25, 11:04 AM · Restricted Project
spatel committed rGf49e33e252c4: [x86] add another vector zext test; NFC (authored by spatel).
[x86] add another vector zext test; NFC
Mon, Mar 25, 10:54 AM
spatel committed rL356930: [x86] add another vector zext test; NFC.
[x86] add another vector zext test; NFC
Mon, Mar 25, 10:53 AM
spatel accepted D59363: [SelectionDAG] Add icmp UNDEF handling to SelectionDAG::FoldSetCC.

LGTM

Mon, Mar 25, 10:22 AM · Restricted Project
spatel added inline comments to D59777: [x86] improve AVX lowering of vector zext.
Mon, Mar 25, 9:55 AM · Restricted Project
spatel created D59777: [x86] improve AVX lowering of vector zext.
Mon, Mar 25, 9:33 AM · Restricted Project
spatel committed rG76c1ef3d07b0: [x86] add tests for vector zext; NFC (authored by spatel).
[x86] add tests for vector zext; NFC
Mon, Mar 25, 8:55 AM
spatel committed rL356914: [x86] add tests for vector zext; NFC.
[x86] add tests for vector zext; NFC
Mon, Mar 25, 8:55 AM
spatel accepted D59696: [CGP] Build the DominatorTree lazily.

LGTM

Mon, Mar 25, 8:54 AM · Restricted Project

Sun, Mar 24

spatel added a comment to D59363: [SelectionDAG] Add icmp UNDEF handling to SelectionDAG::FoldSetCC.

The x86 regression test changes to preserve behavior look good. Pre-commit those, so we're just left with actual diffs here?

Sun, Mar 24, 7:22 AM · Restricted Project
spatel committed rG7d676dfd86fa: [x86] improve the default expansion of uaddsat/usubsat (authored by spatel).
[x86] improve the default expansion of uaddsat/usubsat
Sun, Mar 24, 6:55 AM
spatel committed rL356855: [x86] improve the default expansion of uaddsat/usubsat.
[x86] improve the default expansion of uaddsat/usubsat
Sun, Mar 24, 6:54 AM
spatel closed D59006: [x86] improve the default expansion of uaddsat/usubsat.
Sun, Mar 24, 6:54 AM · Restricted Project

Sat, Mar 23

spatel committed rG2e92846d365a: [x86] reduce code duplication; NFC (authored by spatel).
[x86] reduce code duplication; NFC
Sat, Mar 23, 8:00 AM
spatel committed rL356836: [x86] reduce code duplication; NFC.
[x86] reduce code duplication; NFC
Sat, Mar 23, 8:00 AM
spatel added a comment to D59669: [x86] use movmsk when extracting multiple lanes of a vector compare (PR39665).

The more I look at the motivating patterns, the less hopeful I am that we can get optimal code by delaying the transforms until SDAG.

Sat, Mar 23, 6:35 AM · Restricted Project

Fri, Mar 22

spatel added inline comments to D59696: [CGP] Build the DominatorTree lazily.
Fri, Mar 22, 4:13 PM · Restricted Project
spatel accepted D59662: [X86] Use xmm registers to implement 64-bit popcnt on 32-bit targets if possible if popcnt instruction is not available.

LGTM

Fri, Mar 22, 1:11 PM · Restricted Project
spatel created D59710: [SLP] remove lower limit for forming reduction patterns.
Fri, Mar 22, 12:35 PM · Restricted Project
spatel committed rGa0aaa11afca8: [SLP] fix variables names in test; NFC (authored by spatel).
[SLP] fix variables names in test; NFC
Fri, Mar 22, 11:34 AM
spatel committed rL356790: [SLP] fix variables names in test; NFC.
[SLP] fix variables names in test; NFC
Fri, Mar 22, 11:33 AM
spatel accepted D59473: [ValueTracking] Avoid redundant known bits calculation in computeOverflowForSignedAdd().
Fri, Mar 22, 9:58 AM · Restricted Project
spatel added a comment to D59473: [ValueTracking] Avoid redundant known bits calculation in computeOverflowForSignedAdd().

LGTM

Fri, Mar 22, 9:57 AM · Restricted Project
spatel committed rG221081e3652b: [x86] auto-generate complete test checks; NFC (authored by spatel).
[x86] auto-generate complete test checks; NFC
Fri, Mar 22, 8:33 AM
spatel committed rG0893351c1ca9: [x86] auto-generate complete test checks; NFC (authored by spatel).
[x86] auto-generate complete test checks; NFC
Fri, Mar 22, 8:33 AM
spatel committed rGf39494e79559: [x86] auto-generate complete checks for test; NFC (authored by spatel).
[x86] auto-generate complete checks for test; NFC
Fri, Mar 22, 8:33 AM
spatel committed rG61e2333acb22: [x86] add 'nounwind' to tests to reduce noise; NFC (authored by spatel).
[x86] add 'nounwind' to tests to reduce noise; NFC
Fri, Mar 22, 8:33 AM
spatel committed rL356763: [x86] auto-generate complete test checks; NFC.
[x86] auto-generate complete test checks; NFC
Fri, Mar 22, 8:33 AM
spatel committed rL356762: [x86] auto-generate complete test checks; NFC.
[x86] auto-generate complete test checks; NFC
Fri, Mar 22, 8:33 AM
spatel committed rL356761: [x86] add 'nounwind' to tests to reduce noise; NFC.
[x86] add 'nounwind' to tests to reduce noise; NFC
Fri, Mar 22, 8:33 AM
spatel committed rL356760: [x86] auto-generate complete checks for test; NFC.
[x86] auto-generate complete checks for test; NFC
Fri, Mar 22, 8:33 AM

Thu, Mar 21

spatel added reviewers for D59669: [x86] use movmsk when extracting multiple lanes of a vector compare (PR39665): nlopes, regehr, efriedma.
Thu, Mar 21, 3:45 PM · Restricted Project
spatel added inline comments to D59669: [x86] use movmsk when extracting multiple lanes of a vector compare (PR39665).
Thu, Mar 21, 3:09 PM · Restricted Project
spatel created D59669: [x86] use movmsk when extracting multiple lanes of a vector compare (PR39665).
Thu, Mar 21, 2:32 PM · Restricted Project
spatel added a comment to D59006: [x86] improve the default expansion of uaddsat/usubsat.

This LG, but I'm not sure I understand how this is related to D59066?
Here, we clearly end up with no select in ASM.
But in D59066 we expand to this pattern.
So is there something else that is able to do the transform that we do manually in D59066?
Should D59066 be doing something else to simply trigger the existing transform?

Thu, Mar 21, 11:17 AM · Restricted Project
spatel committed rG0760758fed77: [x86] add tests with movmsk potential (PR39665); NFC (authored by spatel).
[x86] add tests with movmsk potential (PR39665); NFC
Thu, Mar 21, 10:57 AM
spatel committed rL356691: [x86] add tests with movmsk potential (PR39665); NFC.
[x86] add tests with movmsk potential (PR39665); NFC
Thu, Mar 21, 10:57 AM
spatel committed rGd47eac59efb1: [CodeGenPrepare] limit formation of overflow intrinsics (PR41129) (authored by spatel).
[CodeGenPrepare] limit formation of overflow intrinsics (PR41129)
Thu, Mar 21, 6:58 AM
spatel committed rL356665: [CodeGenPrepare] limit formation of overflow intrinsics (PR41129).
[CodeGenPrepare] limit formation of overflow intrinsics (PR41129)
Thu, Mar 21, 6:58 AM
spatel closed D59602: [CodeGenPrepare] limit formation of overflow intrinsics (PR41129).
Thu, Mar 21, 6:57 AM · Restricted Project
spatel accepted D59630: [InstCombine] Don't transform ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) if either zext or OP has another use..

LGTM - could you add a TODO comment here about using m_APInt() instead of m_ConstantInt()...there's no reason to limit this to scalars AFAICT.

Thu, Mar 21, 6:41 AM · Restricted Project

Wed, Mar 20

spatel updated the diff for D59006: [x86] improve the default expansion of uaddsat/usubsat.

Patch updated:
We improved the generic expansion slightly with D59066. That leaves customization for x86 which is required because umin/umax are custom lowered even if we don't actually have the instructions pmaxud/pmaxuq. That's not a generic lowering problem; that's an x86 problem.

Wed, Mar 20, 3:43 PM · Restricted Project
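
For readers following the uaddsat/usubsat thread, here is a minimal sketch of a min/max-based expansion of unsigned saturating add and subtract. It assumes the identities `uaddsat(a, b) = umin(a, ~b) + b` and `usubsat(a, b) = umax(a, b) - b`. The function names, the `bits` parameter, and the use of Python integers are illustrative only; this is not the code from D59066 or D59006.

```python
def uaddsat(a, b, bits=8):
    """Unsigned saturating add via umin: umin(a, ~b) + b."""
    mask = (1 << bits) - 1
    not_b = ~b & mask            # ~b is the largest value that can be added to b without wrapping
    return min(a, not_b) + b     # clamping a to ~b makes the add saturate at mask


def usubsat(a, b, bits=8):
    """Unsigned saturating subtract via umax: umax(a, b) - b."""
    return max(a, b) - b         # if a < b, this becomes b - b == 0


# Reference definitions written with an explicit clamp, for comparison.
def uaddsat_ref(a, b, bits=8):
    return min(a + b, (1 << bits) - 1)


def usubsat_ref(a, b, bits=8):
    return max(a - b, 0)
```

Both forms avoid a compare-and-select on the result, which is why a target without native unsigned min/max instructions may want a different lowering.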
spatel created D59602: [CodeGenPrepare] limit formation of overflow intrinsics (PR41129).
Wed, Mar 20, 10:10 AM · Restricted Project
spatel committed rGa2250e923b39: [CGP] fix formatting; NFC (authored by spatel).
[CGP] fix formatting; NFC
Wed, Mar 20, 9:48 AM
spatel committed rL356572: [CGP] fix formatting; NFC.
[CGP] fix formatting; NFC
Wed, Mar 20, 9:48 AM
spatel committed rGd1ce455f7b6d: [CGP] convert chain of 'if' to 'switch'; NFC (authored by spatel).
[CGP] convert chain of 'if' to 'switch'; NFC
Wed, Mar 20, 8:54 AM
spatel committed rL356566: [CGP] convert chain of 'if' to 'switch'; NFC.
[CGP] convert chain of 'if' to 'switch'; NFC
Wed, Mar 20, 8:54 AM
spatel committed rGfb44f99b73bd: [CGP][x86] add tests for usubo regression (PR41129); NFC (authored by spatel).
[CGP][x86] add tests for usubo regression (PR41129); NFC
Wed, Mar 20, 8:03 AM
spatel committed rL356559: [CGP][x86] add tests for usubo regression (PR41129); NFC.
[CGP][x86] add tests for usubo regression (PR41129); NFC
Wed, Mar 20, 8:02 AM

Tue, Mar 19

spatel accepted D59471: [InstCombine] Fold add nuw + uadd.with.overflow.

Logic looks fine, so I won't hold it up, but seems better to not duplicate code for sibling transforms?

Tue, Mar 19, 11:09 AM · Restricted Project
spatel committed rG5b820323ca11: [InstCombine] fold logic-of-nan-fcmps (PR41069) (authored by spatel).
[InstCombine] fold logic-of-nan-fcmps (PR41069)
Tue, Mar 19, 9:41 AM
spatel committed rL356471: [InstCombine] fold logic-of-nan-fcmps (PR41069).
[InstCombine] fold logic-of-nan-fcmps (PR41069)
Tue, Mar 19, 9:38 AM
spatel committed rG423b9583065c: [InstCombine] add FMF to tests for extra coverage; NFC (authored by spatel).
[InstCombine] add FMF to tests for extra coverage; NFC
Tue, Mar 19, 6:41 AM
spatel committed rL356453: [InstCombine] add FMF to tests for extra coverage; NFC.
[InstCombine] add FMF to tests for extra coverage; NFC
Tue, Mar 19, 6:38 AM
spatel accepted D59541: [InstSimplify] SimplifyICmpInst - icmp eq/ne %X, undef -> undef.

LGTM

Tue, Mar 19, 6:29 AM · Restricted Project

Mon, Mar 18

spatel added a comment to D59522: [X86] Don't avoid folding multiple use sign extended 8-bit immediate into instructions under optsize..

Seems good, but let Simon have a look too in case there's some uarch concern that I'm not aware of.

Mon, Mar 18, 5:40 PM · Restricted Project
spatel accepted D59386: [ValueTracking] ConstantRange based overflow detection for unsigned add/sub.

LGTM

Mon, Mar 18, 3:38 PM · Restricted Project
spatel added a comment to D59378: [InstCombine] Prevent icmp transform that can cause inf loop if part of min/max.

@spatel I've created D59506 for the InstSimplify change.

Mon, Mar 18, 2:07 PM · Restricted Project
spatel added a comment to D59506: [ValueTracking][InstSimplify] Support min/max selects in computeConstantRange().

If we want to be conservative for compile-time, we could use the simple pattern matchers (m_SMax...) rather than the heavier ValueTracking call. But we don't have that option currently for abs/nabs.
It seems like we've accomplished the improvement for almost no extra cost though, so that's probably a moot point now.

Mon, Mar 18, 1:41 PM · Restricted Project
spatel added inline comments to D59363: [SelectionDAG] Add icmp UNDEF handling to SelectionDAG::FoldSetCC.
Mon, Mar 18, 12:36 PM · Restricted Project
spatel committed rG08b5e68ef673: [InstCombine] add/adjust test for NaN checks; NFC (authored by spatel).
[InstCombine] add/adjust test for NaN checks; NFC
Mon, Mar 18, 10:36 AM
spatel committed rL356383: [InstCombine] add/adjust test for NaN checks; NFC.
[InstCombine] add/adjust test for NaN checks; NFC
Mon, Mar 18, 10:36 AM
spatel added inline comments to D59363: [SelectionDAG] Add icmp UNDEF handling to SelectionDAG::FoldSetCC.
Mon, Mar 18, 8:09 AM · Restricted Project
spatel added a comment to D59378: [InstCombine] Prevent icmp transform that can cause inf loop if part of min/max.

I'd prefer that we IR simplify our way out of this infinite loop instead of looking the other way though. I.e., can we get this in instsimplify using a ConstantRange?

That should be relatively simple to do, we just need to support constant range calculation for min/max flavor selects in computeConstantRange().

One thing I didn't mention is that this opportunity was exposed only by InstCombine's instruction sinking. I captured the code after sinking to create the test case.

Also, is it reasonable to assume prior passes will transform the code to avoid triggering possible infinite loops in later passes like this? Unless you mean move part of this transformation into InstSimplify and out of InstCombine?

Mon, Mar 18, 8:07 AM · Restricted Project
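
As a rough illustration of the constant-range idea raised in the D59378 thread, the sketch below computes an inclusive range for a min/max of an unknown value and a constant. It assumes the simple bounds `smax(x, C) in [C, SMAX]`, `smin(x, C) in [SMIN, C]`, `umax(x, C) in [C, UMAX]`, and `umin(x, C) in [0, C]`. The function `minmax_range` and its string `kind` argument are hypothetical; this is not LLVM's `computeConstantRange()`.

```python
def minmax_range(kind, c, bits=8):
    """Return inclusive (lo, hi) bounds for kind(x, c) with x unknown."""
    smin = -(1 << (bits - 1))
    smax = (1 << (bits - 1)) - 1
    umax = (1 << bits) - 1
    if kind == "smax":
        return (c, smax)     # result is never below the constant
    if kind == "smin":
        return (smin, c)     # result is never above the constant
    if kind == "umax":
        return (c, umax)
    if kind == "umin":
        return (0, c)
    raise ValueError(f"unknown min/max kind: {kind}")
```

With such a range available, a compare of the min/max result against a constant outside the range can be folded to a constant true or false by a simplification step.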
spatel added a comment to rL356338: [InstCombine] canonicalize rotate right by constant to rotate left.

Any particular reason to limit this to rotates only? This should be valid for funnel shifts in general.

No, I was just focused on that x86 diff, so didn't generalize. I'll do that in a follow-up.

Mon, Mar 18, 7:31 AM
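
The rotate and funnel-shift relationship discussed in this thread can be checked with a small model. The sketch assumes the usual funnel-shift semantics: concatenate `x` (high) and `y` (low), shift by the amount modulo the bit width, and take the high half for `fshl` or the low half for `fshr`. Under that assumption, `fshr(x, y, c)` equals `fshl(x, y, bits - c)` whenever `c` is nonzero modulo the bit width; a rotate is the special case `x == y`. The helper names and the 8-bit default are illustrative, not LLVM code.

```python
def fshl(x, y, s, bits=8):
    """Funnel shift left: high half of (x:y) << (s mod bits)."""
    mask = (1 << bits) - 1
    s %= bits
    concat = ((x & mask) << bits) | (y & mask)
    return ((concat << s) >> bits) & mask


def fshr(x, y, s, bits=8):
    """Funnel shift right: low half of (x:y) >> (s mod bits)."""
    mask = (1 << bits) - 1
    s %= bits
    concat = ((x & mask) << bits) | (y & mask)
    return (concat >> s) & mask


def rotl(x, s, bits=8):
    """Rotate left is a funnel shift left with both inputs equal."""
    return fshl(x, x, s, bits)


def rotr(x, s, bits=8):
    """Rotate right is a funnel shift right with both inputs equal."""
    return fshr(x, x, s, bits)
```

A shift amount of zero is the one case where the general funnel-shift identity does not hold in this model (`fshr` returns `y`, `fshl` returns `x`), though for rotates both sides return `x`.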
spatel committed rG6063393536cc: [InstCombine] allow general vector constants for funnel shift to shift… (authored by spatel).
[InstCombine] allow general vector constants for funnel shift to shift…
Mon, Mar 18, 7:28 AM
spatel committed rL356372: [InstCombine] allow general vector constants for funnel shift to shift….
[InstCombine] allow general vector constants for funnel shift to shift…
Mon, Mar 18, 7:27 AM
spatel committed rG84de8a30a05a: [InstCombine] extend rotate-left-by-constant canonicalization to funnel shift (authored by spatel).
[InstCombine] extend rotate-left-by-constant canonicalization to funnel shift
Mon, Mar 18, 7:10 AM
spatel committed rL356369: [InstCombine] extend rotate-left-by-constant canonicalization to funnel shift.
[InstCombine] extend rotate-left-by-constant canonicalization to funnel shift
Mon, Mar 18, 7:10 AM
spatel committed rGd7f153932246: [InstCombine] add funnel shift tests with arbitrary constants; NFC (authored by spatel).
[InstCombine] add funnel shift tests with arbitrary constants; NFC
Mon, Mar 18, 6:35 AM
spatel committed rL356367: [InstCombine] add funnel shift tests with arbitrary constants; NFC.
[InstCombine] add funnel shift tests with arbitrary constants; NFC
Mon, Mar 18, 6:34 AM

Sun, Mar 17

spatel added a comment to rL356338: [InstCombine] canonicalize rotate right by constant to rotate left.

Any particular reason to limit this to rotates only? This should be valid for funnel shifts in general.

Sun, Mar 17, 2:45 PM
spatel committed rGb3bcd9577181: [InstCombine] canonicalize rotate right by constant to rotate left (authored by spatel).
[InstCombine] canonicalize rotate right by constant to rotate left
Sun, Mar 17, 12:08 PM
spatel committed rL356338: [InstCombine] canonicalize rotate right by constant to rotate left.
[InstCombine] canonicalize rotate right by constant to rotate left
Sun, Mar 17, 12:07 PM
spatel committed rGa3a2f9424e0c: [InstCombine] add tests for rotate by constant using funnel intrinsics; NFC (authored by spatel).
[InstCombine] add tests for rotate by constant using funnel intrinsics; NFC
Sun, Mar 17, 11:54 AM
spatel committed rL356337: [InstCombine] add tests for rotate by constant using funnel intrinsics; NFC.
[InstCombine] add tests for rotate by constant using funnel intrinsics; NFC
Sun, Mar 17, 11:49 AM
spatel added a comment to D57247: Simply operands of masked stores and scatters based on demanded elements.

I added the tests with baseline checks; please rebase after rL356283.
This seems ok to me, but let's double-check with @RKSimon about the refactoring options.

Sun, Mar 17, 9:55 AM · Restricted Project
spatel added reviewers for D59378: [InstCombine] Prevent icmp transform that can cause inf loop if part of min/max: lebedev.ri, nikic.

Please check in the test that provides the missing coverage as an NFC preliminary patch.

Sun, Mar 17, 9:35 AM · Restricted Project
spatel accepted D57372: Demanded elements support for masked.load and masked.gather.

LGTM

Sun, Mar 17, 9:11 AM · Restricted Project
spatel added a comment to D59386: [ValueTracking] ConstantRange based overflow detection for unsigned add/sub.

Hmm, this looks reasonable to me, @spatel?
Are there any concerns about the performance impact of using ConstantRange() here?
Though I do think it does pull its weight.

Sun, Mar 17, 8:44 AM · Restricted Project
spatel committed rG6a6e808b699b: [TargetLowering] improve the default expansion of uaddsat/usubsat (authored by spatel).
[TargetLowering] improve the default expansion of uaddsat/usubsat
Sun, Mar 17, 7:57 AM
spatel committed rL356332: [TargetLowering] improve the default expansion of uaddsat/usubsat.
[TargetLowering] improve the default expansion of uaddsat/usubsat
Sun, Mar 17, 7:57 AM
spatel closed D59066: [TargetLowering] improve the default expansion of uaddsat/usubsat.
Sun, Mar 17, 7:57 AM · Restricted Project

Fri, Mar 15

spatel committed rG052d1b7b66a7: [InstCombine] add tests for logic of NaN fcmps; NFC (authored by spatel).
[InstCombine] add tests for logic of NaN fcmps; NFC
Fri, Mar 15, 11:15 AM
spatel committed rL356287: [InstCombine] add tests for logic of NaN fcmps; NFC.
[InstCombine] add tests for logic of NaN fcmps; NFC
Fri, Mar 15, 11:13 AM
spatel committed rGa70c9d49af4f: [InstCombine] add tests for masked store/scatter; NFC (authored by spatel).
[InstCombine] add tests for masked store/scatter; NFC
Fri, Mar 15, 11:06 AM
spatel committed rL356283: [InstCombine] add tests for masked store/scatter; NFC.
[InstCombine] add tests for masked store/scatter; NFC
Fri, Mar 15, 10:59 AM
spatel accepted D57468: Strengthen handling of GEPs and generic calls for all undef lanes.

LGTM

Fri, Mar 15, 10:21 AM · Restricted Project
spatel added a comment to D57789: [CGP] form usub with overflow from sub+icmp.

This patch causes a 5% regression in one of our eigen benchmarks on Haswell.

The problem arises when it combines the CMP in a hot block with a SUB in a cold block into a single SUB in the hot block. On a two-address architecture like x86, if the operand of the CMP has other uses, an extra COPY is needed before the original CMP, so there is one more instruction in the hot block.

Another patch, r355823, papered over the problem in our code, but it didn't fix the root cause.

The regression is only observed on Haswell; it doesn't impact Skylake.

Fri, Mar 15, 6:40 AM · Restricted Project

Thu, Mar 14

spatel committed rG2c9275a79008: [CGP] add another bailout for degenerate code (PR41064) (authored by spatel).
[CGP] add another bailout for degenerate code (PR41064)
Thu, Mar 14, 4:16 PM
spatel committed rL356218: [CGP] add another bailout for degenerate code (PR41064).
[CGP] add another bailout for degenerate code (PR41064)
Thu, Mar 14, 4:15 PM
spatel accepted D59193: [ConstantRange] Add overflow check helpers.

LGTM

Thu, Mar 14, 3:53 PM · Restricted Project
spatel committed rG38f07b1966a9: [InstCombine] remove duplicate tests (authored by spatel).
[InstCombine] remove duplicate tests
Thu, Mar 14, 12:41 PM
spatel committed rL356195: [InstCombine] remove duplicate tests.
[InstCombine] remove duplicate tests
Thu, Mar 14, 12:40 PM
spatel committed rGde1d5d367599: [InstCombine] canonicalize funnel shift constant shift amount to be modulo… (authored by spatel).
[InstCombine] canonicalize funnel shift constant shift amount to be modulo…
Thu, Mar 14, 12:22 PM
spatel committed rG6e86216531e2: [InstCombine] add tests for funnel shift constant shift amount mod bitwidth; NFC (authored by spatel).
[InstCombine] add tests for funnel shift constant shift amount mod bitwidth; NFC
Thu, Mar 14, 12:21 PM
spatel committed rL356192: [InstCombine] canonicalize funnel shift constant shift amount to be modulo….
[InstCombine] canonicalize funnel shift constant shift amount to be modulo…
Thu, Mar 14, 12:21 PM