Today

spatel planned changes to D56875: [DAGCombiner] narrow vector binop with 2 insert subvector operands.

Actually, this patch isn't correct as-is. We can't insert into an undef base vector because that isn't valid for every opcode: at minimum, 'xor undef, undef --> 0' (not undef).
I know I avoided or fixed a similar problem in IR at some point.
We need to actually compute the constant vector for the specified binop's opcode.

Mon, Jan 21, 11:14 AM
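A toy model of the narrowing identity may help here. It is a sketch that assumes plain std::vector lanes stand in for SelectionDAG vectors; the names (applyBinOp, insertSubvector, wideForm, narrowForm) are hypothetical, not DAGCombiner's API. The point is that the narrowed form's base lanes come from folding the binop over both base vectors, rather than being assumed undef.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: a "vector" is just a list of 32-bit lanes.
using Vec = std::vector<uint32_t>;

enum class BinOp { Add, And, Xor };

// Apply a binop lane-by-lane (both operands must have the same length).
Vec applyBinOp(BinOp op, const Vec &a, const Vec &b) {
  assert(a.size() == b.size());
  Vec r(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    switch (op) {
    case BinOp::Add: r[i] = a[i] + b[i]; break;
    case BinOp::And: r[i] = a[i] & b[i]; break;
    case BinOp::Xor: r[i] = a[i] ^ b[i]; break;
    }
  }
  return r;
}

// Model of insert_subvector: copy sub into base starting at lane idx.
Vec insertSubvector(Vec base, const Vec &sub, std::size_t idx) {
  assert(idx + sub.size() <= base.size());
  for (std::size_t i = 0; i < sub.size(); ++i)
    base[idx + i] = sub[i];
  return base;
}

// Wide form: binop (insert_subvector B1, X, idx), (insert_subvector B2, Y, idx).
Vec wideForm(BinOp op, const Vec &b1, const Vec &x, const Vec &b2,
             const Vec &y, std::size_t idx) {
  return applyBinOp(op, insertSubvector(b1, x, idx),
                    insertSubvector(b2, y, idx));
}

// Narrow form: insert_subvector (binop B1, B2), (binop X, Y), idx.
// The base lanes are the binop folded over both base vectors; they cannot
// simply be treated as undef (e.g. xor undef, undef --> 0).
Vec narrowForm(BinOp op, const Vec &b1, const Vec &x, const Vec &b2,
               const Vec &y, std::size_t idx) {
  Vec foldedBase = applyBinOp(op, b1, b2);
  return insertSubvector(foldedBase, applyBinOp(op, x, y), idx);
}
```

With equal bases and xor, the base lanes of the narrowed result are 0, which is exactly the case an undef base would get wrong.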
spatel added a comment to D56864: [x86] vectorize cast ops in lowering to avoid register file transfers.

Do we have an SSE2/AVX1 cvtdq2pd test case?

Mon, Jan 21, 10:52 AM
spatel added a comment to D56281: [DAGCombiner] reduce buildvec of zexted extracted element to shuffle.

Filed a blocking bug for the 8.0 release:
https://bugs.llvm.org/show_bug.cgi?id=40394

Mon, Jan 21, 10:09 AM
spatel added a comment to D56281: [DAGCombiner] reduce buildvec of zexted extracted element to shuffle.

I see the problem now: I forgot to verify that the build vector and the source vector of the extract element are the same size. Should have a fix committed soon.

Mon, Jan 21, 9:48 AM
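A simplified model of the missing size check, assuming a little-endian lane layout and a build vector with a single zext'd extracted element (the other lanes undef). The helper name zextExtractToShuffleMask is hypothetical and this is not the patch's code. Mask entries follow LLVM's shuffle convention: -1 is undef, [0, N) selects from the source, and [N, 2N) selects from a zero vector. If the build vector and the source vector differ in total width, the mask length no longer matches the shuffle type's element count, which is the condition the assertion in getVectorShuffle() reports.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical sketch: build_vector with lane dstLane = zext(extract_elt(Src, extIdx))
// and all other lanes undef, rewritten as bitcast(shuffle(Src, ZeroVec, Mask)).
std::optional<std::vector<int>>
zextExtractToShuffleMask(unsigned numDstElts, unsigned dstBits,
                         unsigned numSrcElts, unsigned srcBits,
                         unsigned dstLane, unsigned extIdx) {
  // The size check: both vectors must have the same total width, or the
  // mask built below would not match the element count of the shuffle type.
  if (numDstElts * dstBits != numSrcElts * srcBits)
    return std::nullopt;
  if (dstBits % srcBits != 0 || dstLane >= numDstElts || extIdx >= numSrcElts)
    return std::nullopt;
  unsigned ratio = dstBits / srcBits;
  std::vector<int> mask(numSrcElts, -1);  // start with all lanes undef
  // Low part of the widened lane receives the extracted element...
  mask[dstLane * ratio] = static_cast<int>(extIdx);
  // ...and the high parts must be zero (taken from the zero vector operand).
  for (unsigned i = 1; i < ratio; ++i)
    mask[dstLane * ratio + i] = static_cast<int>(numSrcElts);
  assert(mask.size() == numSrcElts);
  return mask;
}
```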
spatel committed rL351754: [AArch64] add more tests for buildvec to shuffle transform; NFC.
Mon, Jan 21, 9:47 AM
spatel committed rL351753: [DAGCombiner] fix crash when converting build vector to shuffle.
Mon, Jan 21, 9:30 AM
spatel added a comment to D56281: [DAGCombiner] reduce buildvec of zexted extracted element to shuffle.

Hello. I am investigating a crash and this assertion failure on AArch64:

lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1548: llvm::SDValue llvm::SelectionDAG::getVectorShuffle(llvm::EVT, const llvm::SDLoc&, llvm::SDValue, llvm::SDValue, llvm::ArrayRef<int>): Assertion `VT.getVectorNumElements() == Mask.size() && "Must have the same number of vector elements as mask elements!"' failed.
Mon, Jan 21, 9:26 AM
spatel added inline comments to D41342: [InstCombine] Missed optimization in math expression: simplify calls exp functions.
Mon, Jan 21, 7:13 AM

Fri, Jan 18

spatel committed rL351590: [x86] add more movmsk tests; NFC.
Fri, Jan 18, 12:46 PM
spatel committed rL351557: [x86] simplify code for SDValue.getOperand(); NFC.
Fri, Jan 18, 7:59 AM
spatel added a comment to D56864: [x86] vectorize cast ops in lowering to avoid register file transfers.

We still seem to be missing many x86_64 cases?

Fri, Jan 18, 6:55 AM

Thu, Jan 17

spatel accepted D56355: [InstCombine] Simplify cttz/ctlz + icmp ugt/ult into mask check.

LGTM - see inline comment for minor improvement.

Thu, Jan 17, 3:42 PM
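For reference, the bit identities behind turning cttz/ctlz + icmp ugt/ult into a mask check can be sketched for i32, assuming the "zero is defined" flavor of the intrinsics (count of 32 for x == 0). This is an illustration of the underlying equivalences, not the InstCombine code; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reference counts for 32-bit values; both return 32 for x == 0.
unsigned cttz32(uint32_t x) {
  unsigned n = 0;
  while (n < 32 && !(x & (1u << n))) ++n;
  return n;
}
unsigned ctlz32(uint32_t x) {
  unsigned n = 0;
  while (n < 32 && !(x & (0x80000000u >> n))) ++n;
  return n;
}

// cttz(x) >u C  <=>  the low C+1 bits are all zero   (0 <= C <= 31)
bool cttzUgtAsMask(uint32_t x, unsigned c) {
  uint32_t lowMask = static_cast<uint32_t>((uint64_t{1} << (c + 1)) - 1);
  return (x & lowMask) == 0;
}

// cttz(x) <u C  <=>  some bit among the low C bits is set   (0 <= C <= 32)
bool cttzUltAsMask(uint32_t x, unsigned c) {
  uint32_t lowMask = static_cast<uint32_t>((uint64_t{1} << c) - 1);
  return (x & lowMask) != 0;
}

// ctlz(x) >u C  <=>  the high C+1 bits are all zero   (0 <= C <= 31)
bool ctlzUgtAsMask(uint32_t x, unsigned c) {
  uint32_t highMask = ~static_cast<uint32_t>((uint64_t{1} << (31 - c)) - 1);
  return (x & highMask) == 0;
}

// Compare each mask form against the reference count for every valid C.
bool maskFormsAgree(uint32_t x) {
  for (unsigned c = 0; c <= 31; ++c) {
    if (cttzUgtAsMask(x, c) != (cttz32(x) > c)) return false;
    if (ctlzUgtAsMask(x, c) != (ctlz32(x) > c)) return false;
  }
  for (unsigned c = 0; c <= 32; ++c)
    if (cttzUltAsMask(x, c) != (cttz32(x) < c)) return false;
  return true;
}
```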
spatel created D56875: [DAGCombiner] narrow vector binop with 2 insert subvector operands.
Thu, Jan 17, 12:20 PM
spatel added inline comments to D56864: [x86] vectorize cast ops in lowering to avoid register file transfers.
Thu, Jan 17, 9:37 AM
spatel created D56864: [x86] vectorize cast ops in lowering to avoid register file transfers.
Thu, Jan 17, 9:31 AM
spatel added a comment to D56784: [X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle.

The double-shift cases look good, but I'm skeptical about the triple-shift. Wouldn't those always be better with an 'and' mask followed by shift? We reduce the dependent chain of vector ops and instruction count for the cost of a speculatable constant pool load.

I did consider that, but then we would contradict the "3 op limit" that applies on older machines (such as pre-SSSE3) before we use "variable" shuffle masks, which include AND masks.

Thu, Jan 17, 6:35 AM
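The trade-off discussed above can be checked with a byte-level emulation of a 128-bit register (byte 0 least significant), assuming only the masking part of the shuffle matters. Two byte shifts zero one end; three dependent shifts zero both ends; a single PAND does the same but needs a constant-pool mask. The shift sequences and helper names are illustrative choices, not LLVM's lowering code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Byte-level emulation of an XMM register (byte 0 = least significant).
using Xmm = std::array<uint8_t, 16>;

// PSLLDQ: shift the whole register left by n bytes, zero-filling low bytes.
Xmm pslldq(const Xmm &v, unsigned n) {
  Xmm r{};
  for (unsigned i = 0; i < 16; ++i)
    r[i] = (i >= n) ? v[i - n] : 0;
  return r;
}

// PSRLDQ: shift the whole register right by n bytes, zero-filling high bytes.
Xmm psrldq(const Xmm &v, unsigned n) {
  Xmm r{};
  for (unsigned i = 0; i < 16; ++i)
    r[i] = (i + n < 16) ? v[i + n] : 0;
  return r;
}

// PAND with a constant mask that keeps bytes [lo, hi) and zeroes the rest.
Xmm andMask(const Xmm &v, unsigned lo, unsigned hi) {
  Xmm r{};
  for (unsigned i = 0; i < 16; ++i)
    r[i] = (i >= lo && i < hi) ? v[i] : 0;
  return r;
}

// Double shift: zero the top k bytes without a constant-pool load.
Xmm zeroHighDoubleShift(const Xmm &v, unsigned k) {
  return psrldq(pslldq(v, k), k);
}

// Triple shift: zero the low a bytes and the top b bytes (a + b <= 16).
// Three dependent ops versus one PAND plus a mask load.
Xmm zeroBothEndsTripleShift(const Xmm &v, unsigned a, unsigned b) {
  return pslldq(psrldq(pslldq(v, b), a + b), a);
}

// Test input with distinct nonzero bytes 1..16.
Xmm sampleBytes() {
  Xmm v{};
  for (unsigned i = 0; i < 16; ++i) v[i] = static_cast<uint8_t>(i + 1);
  return v;
}
```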

Wed, Jan 16

spatel added a comment to D56784: [X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle.

The double-shift cases look good, but I'm skeptical about the triple-shift. Wouldn't those always be better with an 'and' mask followed by shift? We reduce the dependent chain of vector ops and instruction count for the cost of a speculatable constant pool load.

Wed, Jan 16, 2:43 PM
spatel created D56796: [DAGCombiner][x86] add transform/hook to vectorize: cast(extract V, Y).
Wed, Jan 16, 10:10 AM
spatel committed rL351354: [x86] add tests for extracted scalar casts (PR39974); NFC.
Wed, Jan 16, 8:15 AM
spatel committed rL351346: [x86] lower shuffle of extracts to AVX2 vperm instructions.
Wed, Jan 16, 6:19 AM
spatel closed D56756: [x86] lower shuffle of extracts to AVX2 vperm instructions.
Wed, Jan 16, 6:19 AM
spatel added inline comments to D56756: [x86] lower shuffle of extracts to AVX2 vperm instructions.
Wed, Jan 16, 6:04 AM

Tue, Jan 15

spatel created D56756: [x86] lower shuffle of extracts to AVX2 vperm instructions.
Tue, Jan 15, 4:09 PM
spatel committed rL351198: [DAGCombiner] reduce buildvec of zexted extracted element to shuffle.
Tue, Jan 15, 8:15 AM
spatel closed D56281: [DAGCombiner] reduce buildvec of zexted extracted element to shuffle.
Tue, Jan 15, 8:15 AM
spatel added inline comments to D56281: [DAGCombiner] reduce buildvec of zexted extracted element to shuffle.
Tue, Jan 15, 7:47 AM

Mon, Jan 14

spatel updated the diff for D56281: [DAGCombiner] reduce buildvec of zexted extracted element to shuffle.

Patch updated:
No code changes, but rebased after rL351103 so we get more vector zext'ing codegen.

Mon, Jan 14, 1:58 PM
spatel accepted D56679: [InstCombine] Don't undo 0 - (X * Y) canonicalization when combining subs.

LGTM

Mon, Jan 14, 1:16 PM
spatel added a comment to D56679: [InstCombine] Don't undo 0 - (X * Y) canonicalization when combining subs.

Are we sure that we want to break the cycle here and not at the other end?
Does this result in the optimal IR?

Mon, Jan 14, 12:10 PM
spatel committed rL351093: [x86] lower extracted add/sub to horizontal vector math.
Mon, Jan 14, 10:48 AM
spatel updated the diff for D56281: [DAGCombiner] reduce buildvec of zexted extracted element to shuffle.

Patch updated:
Added TODO comment for handling ISD::ANY_EXTEND.

Mon, Jan 14, 10:18 AM
spatel added inline comments to D56281: [DAGCombiner] reduce buildvec of zexted extracted element to shuffle.
Mon, Jan 14, 8:31 AM

Sat, Jan 12

spatel committed rL351010: [LoopVectorizer] give more advice in remark about failure to vectorize call.
Sat, Jan 12, 7:32 AM
spatel closed D56551: [LoopVectorizer] give more advice in remark about failure to vectorize call.
Sat, Jan 12, 7:32 AM
spatel added a comment to D56551: [LoopVectorizer] give more advice in remark about failure to vectorize call.

This LGTM too, just adding mtcw wondering if these extra checks for more accurate reporting are worth placing under allowExtraAnalysis(); and/or if TLI->isFunctionVectorizable() shouldn't be the one informing the cause of its failure when returning false.

Sat, Jan 12, 7:24 AM
spatel committed rL351008: [DAGCombiner] fold insert_subvector of insert_subvector.
Sat, Jan 12, 7:17 AM
spatel closed D56604: [DAGCombiner] fold insert_subvector of insert_subvector.
Sat, Jan 12, 7:17 AM

Fri, Jan 11

spatel created D56604: [DAGCombiner] fold insert_subvector of insert_subvector.
Fri, Jan 11, 8:46 AM
spatel updated the diff for D56551: [LoopVectorizer] give more advice in remark about failure to vectorize call.

Patch updated:

  1. Added an FP-type constraint to the mathlib check (no point suggesting FP flags if it's not an FP call).
  2. Changed remark text to include clang-specific flags (and suggest/hope that users can translate those to their actual front-end options if this isn't a clang-based invocation).
Fri, Jan 11, 7:23 AM
spatel committed rL350928: [x86] allow insert/extract when matching horizontal ops.
Fri, Jan 11, 6:31 AM
spatel added inline comments to D56551: [LoopVectorizer] give more advice in remark about failure to vectorize call.
Fri, Jan 11, 6:13 AM

Thu, Jan 10

spatel updated the diff for D56551: [LoopVectorizer] give more advice in remark about failure to vectorize call.

Patch updated:

  1. Try to distinguish a vectorizable libcall from an arbitrary call (I don't see an exact mapping, but "hasOptimizedCodeGen()" looks close).
  2. Add tests to show that we correctly differentiate the 2 cases.
Thu, Jan 10, 2:37 PM
spatel accepted D55950: [ConstantFolding] Fold undef for integer intrinsics.

LGTM

Thu, Jan 10, 2:33 PM
spatel updated the diff for D56281: [DAGCombiner] reduce buildvec of zexted extracted element to shuffle.

Patch updated:
I reduced the scope of this patch to handle only a build vector of undefs plus one non-undef element. If x86 isn't lowering the other cases optimally, then other targets may not be either, even if there's no regression test evidence of that.

Thu, Jan 10, 10:58 AM
spatel added inline comments to D55950: [ConstantFolding] Fold undef for integer intrinsics.
Thu, Jan 10, 10:19 AM
spatel created D56551: [LoopVectorizer] give more advice in remark about failure to vectorize call.
Thu, Jan 10, 9:33 AM
spatel committed rL350846: [Docs] fix typo, adjust text order.
Thu, Jan 10, 9:06 AM
spatel committed rL350845: [Docs] add note to avoid 'errno' for better vectorization (PR40265).
Thu, Jan 10, 9:01 AM
spatel committed rL350844: [DAGCombiner] simplify code; NFC.
Thu, Jan 10, 8:51 AM
spatel accepted D56309: [X86] Simplify the BRCOND handling for FCMP_UNE.

LGTM

Thu, Jan 10, 8:29 AM
spatel abandoned D56490: [x86] minimal fix for horizontal binop matching for 256-bit vectors (PR40243).

Abandoning: the original patch was already reviewed, so I went ahead and committed that plus a fix similar to the one this patch implements.
That fixes all of the h-op miscompile problems that I am aware of.
I did eventually convince myself that the last section is safe based on the output of existing regression tests.

Thu, Jan 10, 7:40 AM
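One detail that any 256-bit horizontal-op matcher has to respect is that AVX VHADDPS works independently on each 128-bit half, so the results from the two sources interleave per half instead of forming one full-width sequence. The sketch below models that documented instruction behavior with plain arrays; it is not a reconstruction of the specific PR40243 miscompile, and the helper names are hypothetical.

```cpp
#include <array>
#include <cassert>

using V8f = std::array<float, 8>;

// 256-bit VHADDPS: the 128-bit HADDPS pattern {a0+a1, a2+a3, b0+b1, b2+b3}
// applied separately to each 128-bit half.
V8f vhaddps256(const V8f &a, const V8f &b) {
  V8f r{};
  for (int half = 0; half < 2; ++half) {
    int o = half * 4;
    r[o + 0] = a[o + 0] + a[o + 1];
    r[o + 1] = a[o + 2] + a[o + 3];
    r[o + 2] = b[o + 0] + b[o + 1];
    r[o + 3] = b[o + 2] + b[o + 3];
  }
  return r;
}

// What a naive "full-width" reading would expect:
// {a0+a1, a2+a3, a4+a5, a6+a7, b0+b1, b2+b3, b4+b5, b6+b7}.
V8f naiveFullWidthHadd(const V8f &a, const V8f &b) {
  V8f r{};
  for (int i = 0; i < 4; ++i) {
    r[i] = a[2 * i] + a[2 * i + 1];
    r[4 + i] = b[2 * i] + b[2 * i + 1];
  }
  return r;
}
```

A matcher that assumes the naive layout for 256-bit vectors would pick the wrong lanes, which is one way horizontal-op matching can miscompile.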
spatel committed rL350830: [x86] fix remaining miscompile bug in horizontal binop matching (PR40243).
Thu, Jan 10, 7:31 AM
spatel committed rL350826: [x86] fix horizontal binop matching for 256-bit vectors (PR40243).
Thu, Jan 10, 7:08 AM
spatel closed D56450: [x86] fix horizontal binop matching for 256-bit vectors (PR40243).
Thu, Jan 10, 7:08 AM

Wed, Jan 9

spatel added a comment to D36650: [X86] WIP support narrowing operations when only a subvector is demanded.

We did get semi-generic vector narrowing of binops with:
D53784
D54392

Wed, Jan 9, 9:56 AM
spatel committed rL350745: [x86] use 'nounwind' to remove test noise; NFC.
Wed, Jan 9, 9:33 AM
spatel added a comment to D56460: [X86] Disable DomainReassignment pass when AVX512BW is disabled to avoid injecting VK32/VK64 references into the MachineIR.

I still don't know much about AVX512, but the code change is trivially safer. Could you upload the patch with full context for better future searchability?

Wed, Jan 9, 7:09 AM
spatel created D56490: [x86] minimal fix for horizontal binop matching for 256-bit vectors (PR40243).
Wed, Jan 9, 6:53 AM

Tue, Jan 8

spatel committed rL350674: [InstCombine] remove stale comments; NFC.
Tue, Jan 8, 2:55 PM
spatel committed rL350672: [InstCombine] canonicalize another raw IR rotate pattern to funnel shift.
Tue, Jan 8, 2:45 PM
spatel added inline comments to D56450: [x86] fix horizontal binop matching for 256-bit vectors (PR40243).
Tue, Jan 8, 1:40 PM
spatel created D56450: [x86] fix horizontal binop matching for 256-bit vectors (PR40243).
Tue, Jan 8, 12:54 PM
spatel committed rL350646: [x86] add tests for PR40243; NFC.
Tue, Jan 8, 11:19 AM

Mon, Jan 7

spatel added a comment to D55935: [X86][SSE] Canonicalize OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y)).

Would we be better off doing a generic DAGCombine into the optimal xor-and-xor (aka masked merge) pattern?
(x & C) | (y & ~C) --> ((x ^ y) & C) ^ y --> ((y ^ x) & ~C) ^ x

So do we agree that adding support for this additional pattern only makes sense if one of X or Y is being loaded (once)? It doesn't seem to matter if C is being reused or not.

Mon, Jan 7, 9:05 AM
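The masked-merge identities quoted above can be verified exhaustively over 8-bit values. Because every operation is bitwise, the per-bit argument carries over to any width; the 8-bit type is only chosen so the exhaustive loop stays small. Function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// (x & C) | (y & ~C): the and/andn/or form (on x86, AND + ANDNP + OR).
uint8_t maskedMergeOr(uint8_t x, uint8_t y, uint8_t c) {
  return static_cast<uint8_t>((x & c) | (y & ~c));
}

// ((x ^ y) & C) ^ y: the xor-and-xor form; no inverted mask is needed.
uint8_t maskedMergeXor(uint8_t x, uint8_t y, uint8_t c) {
  return static_cast<uint8_t>(((x ^ y) & c) ^ y);
}

// ((y ^ x) & ~C) ^ x: the equivalent form with the roles of x and y swapped.
uint8_t maskedMergeXorSwapped(uint8_t x, uint8_t y, uint8_t c) {
  return static_cast<uint8_t>(((y ^ x) & ~c) ^ x);
}

// Exhaustively compare all three forms over every 8-bit x, y, and C.
bool maskedMergeFormsAgree() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      for (unsigned c = 0; c < 256; ++c) {
        uint8_t ref = maskedMergeOr(static_cast<uint8_t>(x), static_cast<uint8_t>(y),
                                    static_cast<uint8_t>(c));
        if (maskedMergeXor(static_cast<uint8_t>(x), static_cast<uint8_t>(y),
                           static_cast<uint8_t>(c)) != ref ||
            maskedMergeXorSwapped(static_cast<uint8_t>(x), static_cast<uint8_t>(y),
                                  static_cast<uint8_t>(c)) != ref)
          return false;
      }
  return true;
}
```

Where a bit of C is set, the xor form yields x ^ y ^ y = x; where it is clear, it yields y, which matches the and/or form bit for bit.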
spatel committed rL350533: [x86] add more tests for LowerToHorizontalOp(); NFC.
Mon, Jan 7, 8:13 AM

Sun, Jan 6

spatel committed rL350496: [x86] explicitly set cost of integer add/sub.
Sun, Jan 6, 8:27 AM

Fri, Jan 4

spatel committed rL350430: [x86] add tests for potential horizontal vector ops; NFC.
Fri, Jan 4, 12:18 PM
spatel accepted D55786: [InstCombine] Relax cttz/ctlz with select on zero.

LGTM - please modify or add a test to use vector types, so we have some coverage for that possibility.

Fri, Jan 4, 10:02 AM
spatel committed rL350421: [x86] lower extracted fadd/fsub to horizontal vector math; 2nd try.
Fri, Jan 4, 9:52 AM
spatel closed D56011: [x86] lower extracted fadd/fsub to horizontal vector math.
Fri, Jan 4, 9:51 AM
spatel committed rL350419: [InstCombine] reduce raw IR narrowing rotate patterns to funnel shift.
Fri, Jan 4, 9:42 AM
spatel updated the diff for D56011: [x86] lower extracted fadd/fsub to horizontal vector math.

Patch updated:
Rebased after cost model enhancements in rL350403. So now this patch is once again independent of any IR diffs.

Fri, Jan 4, 9:05 AM
spatel updated the diff for D56011: [x86] lower extracted fadd/fsub to horizontal vector math.

Patch updated:
Updated cost model changes - for SSE2 (P4), the default 2-cycle throughput cost matches Agner.
For SSE1 (P3), the default 2-cycle throughput cost does not match a P3 implementation (but hopefully nobody cares at this point).

Fri, Jan 4, 8:07 AM
spatel added inline comments to D56011: [x86] lower extracted fadd/fsub to horizontal vector math.
Fri, Jan 4, 7:50 AM
spatel updated the diff for D56011: [x86] lower extracted fadd/fsub to horizontal vector math.

Patch updated:
Add cost model overrides for FADD/FSUB to preserve existing behavior.
I suspect that these should be adjusted, and we should include more opcodes to avoid the default trap, but that's another patch. This maintains the current state, so no test diffs outside of codegen.

Fri, Jan 4, 7:25 AM

Thu, Jan 3

spatel updated the diff for D56011: [x86] lower extracted fadd/fsub to horizontal vector math.

Patch updated:
Include cost model and SLP test changes. Should we be overriding the cost of scalar FADD/FSUB, so we don't have these diffs?

Thu, Jan 3, 5:11 PM
spatel planned changes to D56011: [x86] lower extracted fadd/fsub to horizontal vector math.
Thu, Jan 3, 4:29 PM
spatel reopened D56011: [x86] lower extracted fadd/fsub to horizontal vector math.

Reopening - I reverted with rL350373.
I didn't realize the custom legalization would affect the cost models used by the vectorizers; there are potentially IR test changes that go with this.

Thu, Jan 3, 4:29 PM
spatel committed rL350373: revert r350369: [x86] lower extracted fadd/fsub to horizontal vector math.
Thu, Jan 3, 4:05 PM
spatel committed rL350369: [x86] lower extracted fadd/fsub to horizontal vector math.
Thu, Jan 3, 3:20 PM
spatel closed D56011: [x86] lower extracted fadd/fsub to horizontal vector math.
Thu, Jan 3, 3:20 PM
spatel added a comment to D56011: [x86] lower extracted fadd/fsub to horizontal vector math.

LGTM. Please can you add AVX512 tests to haddsub-undef.ll? We also need AVX-SLOW/AVX-FAST common prefixes.

Thu, Jan 3, 3:18 PM
spatel committed rL350364: [x86] add 512-bit vector tests for horizontal ops; NFC.
Thu, Jan 3, 3:00 PM
spatel committed rL350362: [x86] add AVX512 runs for horizontal ops; NFC.
Thu, Jan 3, 2:46 PM
spatel committed rL350358: [x86] remove dead CHECK lines from test file; NFC.
Thu, Jan 3, 2:34 PM
spatel committed rL350357: [x86] split tests for FP and integer horizontal math.
Thu, Jan 3, 2:30 PM
spatel committed rL350356: [x86] add common FileCheck prefix to reduce assert duplication; NFC.
Thu, Jan 3, 2:16 PM
spatel committed rL350354: [DAGCombiner][x86] scalarize binop followed by extractelement.
Thu, Jan 3, 1:35 PM
spatel closed D55722: [DAGCombiner] scalarize binop followed by extractelement.
Thu, Jan 3, 1:35 PM
spatel created D56281: [DAGCombiner] reduce buildvec of zexted extracted element to shuffle.
Thu, Jan 3, 11:18 AM
spatel added inline comments to D56011: [x86] lower extracted fadd/fsub to horizontal vector math.
Thu, Jan 3, 10:11 AM
spatel committed rL350338: [x86] add tests for buildvector with extracted element; NFC.
Thu, Jan 3, 9:59 AM
spatel added a comment to D50941: [DAGCombiner] unwrap truncated subtraction operand for rol generation in matchRotateSub.

The motivating cases in the regression tests here, as well as PR15880 and PR32023 ( https://bugs.llvm.org/show_bug.cgi?id=32023 ), were fixed more generally with rL348706, so I think this can be abandoned.

Thu, Jan 3, 7:40 AM

Wed, Jan 2

spatel updated the diff for D55722: [DAGCombiner] scalarize binop followed by extractelement.

Patch updated:
Avoid controversy by adding a default-off TLI hook for this transform. Enable it for x86, so those diffs are similar to the previous rev of the patch. No ARM changes now, and also no SystemZ changes (probably better to fix the BYTE_MASK transform first?).

Wed, Jan 2, 12:54 PM
spatel updated the diff for D56011: [x86] lower extracted fadd/fsub to horizontal vector math.

Patch updated:

  1. Ease the extra-uses constraint; we might still be able to avoid a shuffle/extract.
  2. Allow 512-bit vectors.
  3. Add a TODO for other extract index pairs.
Wed, Jan 2, 9:14 AM
spatel added inline comments to D56011: [x86] lower extracted fadd/fsub to horizontal vector math.
Wed, Jan 2, 9:00 AM
spatel committed rL350221: [x86] add more tests for potential horizontal ops; NFC.
Wed, Jan 2, 8:40 AM
spatel accepted D56199: [X86] Support SHLD/SHRD masked shift-counts (PR34641).

LGTM

Wed, Jan 2, 7:45 AM
spatel accepted D55975: [X86] Remove x86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time.

LGTM

Wed, Jan 2, 7:35 AM

Tue, Jan 1

spatel committed rL350199: [InstCombine] canonicalize raw IR rotate patterns to funnel shift.
Tue, Jan 1, 1:55 PM
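As background for the rotate-to-funnel-shift canonicalizations in this feed, here is a sketch of why a rotate is a funnel shift with both inputs equal. fshl32 models the llvm.fshl.i32 semantics described in the LangRef (concatenate a:b, shift left by the amount modulo 32, keep the high half); rawRotl32 is one common masked "raw IR" rotate idiom, chosen for illustration. Exactly which raw patterns rL350199, rL350419, and rL350672 match is not stated here.

```cpp
#include <cassert>
#include <cstdint>

// Model of llvm.fshl.i32(a, b, s): concatenate a:b (a in the high half),
// shift left by s modulo 32, and keep the high 32 bits.
uint32_t fshl32(uint32_t a, uint32_t b, uint32_t s) {
  s %= 32;
  if (s == 0) return a;  // avoid the undefined C++ shift by 32
  return (a << s) | (b >> (32 - s));
}

// A raw rotate-left idiom: masking both shift amounts keeps every shift
// in range, so s == 0 (and s >= 32) is handled without undefined behavior.
uint32_t rawRotl32(uint32_t x, uint32_t s) {
  return (x << (s & 31)) | (x >> ((32 - s) & 31));
}

// Check rotl(x, s) == fshl(x, x, s) for shift amounts 0..63.
bool rotateMatchesFunnelShift(uint32_t x) {
  for (uint32_t s = 0; s < 64; ++s)
    if (rawRotl32(x, s) != fshl32(x, x, s)) return false;
  return true;
}
```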