This is an archive of the discontinued LLVM Phabricator instance.

Differential D142602

[X86] Expand transform (icmp eq/ne (ABS A), C) -> (and/or (icmp eq/ne A, C), (icmp eq/ne A, -C))
Abandoned · Public

Authored by goldstein.w.n on Jan 25 2023, 10:14 PM.

Details

Reviewers

pengfei
RKSimon

Summary

This also makes sense if A is a vector type with i64 elements and the
target doesn't have AVX512 but does have AVX2/SSE4.1 (for ymm/xmm respectively).

In that case ABS expands to 3 instructions, `blendv(A, sub(set0,
A))`, so it's better to just transform to the version with fewer/faster
instructions.
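
The rewrite in the title can be sanity-checked outside LLVM. The following is a minimal sketch, not code from the patch: it models 8-bit lanes with a wrapping `abs` (assumed to match `ISD::ABS`, where `abs(INT_MIN)` stays `INT_MIN`) and compares both forms exhaustively for non-negative `C`. All helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Wrapping negate on an 8-bit lane: -INT8_MIN wraps back to INT8_MIN.
static int8_t wrapNeg(int8_t V) {
  return static_cast<int8_t>(
      static_cast<uint8_t>(0u - static_cast<uint8_t>(V)));
}

// Wrapping abs on an 8-bit lane (assumed ISD::ABS semantics).
static int8_t wrapAbs(int8_t V) { return V < 0 ? wrapNeg(V) : V; }

// Original form: icmp eq (ABS A), C
static bool absForm(int8_t A, int8_t C) { return wrapAbs(A) == C; }

// Rewritten form: (icmp eq A, C) | (icmp eq A, -C)
static bool logicForm(int8_t A, int8_t C) {
  return A == C || A == wrapNeg(C);
}

// Counts inputs on which the two forms disagree, over every A and every
// non-negative C. Negative C is skipped on purpose in this sketch: abs(A)
// never yields a negative value other than INT8_MIN, while A == C can
// still be true for such a C.
static int countMismatches() {
  int Mismatches = 0;
  for (int C = 0; C <= INT8_MAX; ++C)
    for (int A = INT8_MIN; A <= INT8_MAX; ++A)
      if (absForm(static_cast<int8_t>(A), static_cast<int8_t>(C)) !=
          logicForm(static_cast<int8_t>(A), static_cast<int8_t>(C)))
        ++Mismatches;
  return Mismatches;
}
```

The `ne` variant follows by negating both sides (De Morgan), which is why the combine pairs `eq` with `or` and `ne` with `and`.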

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

goldstein.w.n created this revision. Jan 25 2023, 10:14 PM

Herald added a project: Restricted Project. Jan 25 2023, 10:14 PM

Herald added subscribers: pengfei, hiraditya.

goldstein.w.n requested review of this revision. Jan 25 2023, 10:14 PM

Herald added a project: Restricted Project. Jan 25 2023, 10:14 PM

Herald added a subscriber: llvm-commits.

goldstein.w.n added a parent revision: D142601: [DAGCombiner]: Add transform (and/or (icmp eq/ne (A, C)), (icmp eq/ne (A, -C))) -> (icmp eq/ne (ABS A), ABS(C)). Jan 25 2023, 10:14 PM

goldstein.w.n added reviewers: pengfei, RKSimon.

Should this be merged with D142345?
Also, this has a habit of causing extra constants to manifest, so maybe it's not worth it? Or it could be guarded behind a check that the necessary constant nodes already exist. Seems worth it, but maybe not.

Harbormaster completed remote builds in B210041: Diff 492333. Jan 25 2023, 11:02 PM

Rebase

Harbormaster completed remote builds in B210520: Diff 492979. Jan 27 2023, 10:02 PM

Rebase

Harbormaster completed remote builds in B210631: Diff 493117. Jan 29 2023, 1:06 PM

ping.

RKSimon added inline comments. Feb 5 2023, 9:18 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
53450	SSE41 implies SSE2

RKSimon added inline comments. Feb 5 2023, 9:20 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
53461	Doesn't this need to apply to the vector cases as well?

goldstein.w.n added inline comments. Feb 5 2023, 9:29 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
53461	> Doesn't this need to apply to the vector cases as well?
	No, for the scalar case we have a little hack to handle `(X == Pow2C || X == -Pow2C)`, which is the only case where this is really worth it. For the vector case, if there is no fast `abs`, it's preferable for an arbitrary `C` to do `X == C || X == -C` as opposed to `Abs(X) == Abs(C)`.
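
The reply above does not spell out the scalar hack. One bit trick that fits the description, assumed here rather than quoted from `DAGCombiner`, is `((X + C) & ~(C + C)) == 0` for a power-of-two `C` other than `INT_MIN`. A minimal sketch that checks it exhaustively on 8-bit values (function names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Two-compare form: X == C || X == -C, evaluated on raw 8-bit patterns.
static bool twoCompares(uint8_t X, uint8_t C) {
  uint8_t NegC = static_cast<uint8_t>(0u - C);
  return X == C || X == NegC;
}

// Single-compare form: ((X + C) & ~(C + C)) == 0, truncated to 8 bits.
static bool addAndMask(uint8_t X, uint8_t C) {
  uint8_t Sum = static_cast<uint8_t>(X + C);
  uint8_t Mask = static_cast<uint8_t>(~(C + C));
  return (Sum & Mask) == 0;
}

// Returns true if both forms agree for every X and every power-of-two C
// below the sign bit (1, 2, ..., 64), i.e. C != 0 and C != INT8_MIN.
static bool pow2TrickHolds() {
  for (unsigned Shift = 0; Shift < 7; ++Shift) {
    uint8_t C = static_cast<uint8_t>(1u << Shift);
    for (unsigned X = 0; X <= 0xFFu; ++X)
      if (twoCompares(static_cast<uint8_t>(X), C) !=
          addAndMask(static_cast<uint8_t>(X), C))
        return false;
  }
  return true;
}
```

The power-of-two restriction matters in this sketch: with `C = 3` the masked form also accepts `X = 1`, which the two-compare form rejects.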

goldstein.w.n marked an inline comment as done. Feb 5 2023, 9:37 AM

Remove unnecessary sse2 check + fixup commit

Harbormaster completed remote builds in B211946: Diff 494930. Feb 5 2023, 10:54 AM

ping

RKSimon added inline comments. Feb 11 2023, 9:36 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
53461	In which case, could we move the isConstOrConstSplat check inside the isScalarInteger test and replace the -CInt with getNode() / FoldConstantArithmetic() ?

goldstein.w.n added inline comments. Feb 11 2023, 2:17 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
53461	What do you mean by this? As in, don't go through APInt to get the value of C/-C? Is there an issue with how it is now?

pengfei added inline comments. Feb 11 2023, 7:59 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
53465	whereas
53465	The comment is not clear to me. Can you refactor it?
53466	is
llvm/test/CodeGen/X86/icmp-abs-C-vec.ll
104–107	I doubt if this is beneficial. The transform neither reduces instructions nor improves throughput, but it introduces extra memory load. WDYT?

goldstein.w.n added inline comments. Feb 11 2023, 10:37 PM

llvm/test/CodeGen/X86/icmp-abs-C-vec.ll
104–107	It's not a lot more memory, only 8 more bytes for the broadcast. If the new constant micro-fused with the vpcmp then it would be +32 bytes but save a true instruction. Also note `vblendvpd` is 2 uops, not 1. But I see the point. I think it would generally make sense: in a loop the load can be hoisted, in which case vpcmpeq + vpor is better than vpsub + vblendvpd, but granted not by much. We could make this transform happen only if -C already exists as a node in the DAG; do you think that is preferable?

RKSimon added inline comments. Feb 12 2023, 3:35 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
53461	As it is now it will only work for RHS with uniform/splat values - but there doesn't appear to be any reason for the vector case not to work for non-uniform cases
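
To illustrate the point about non-uniform constants: the identity is applied per lane, so nothing in it depends on the lanes of `C` being equal. A minimal sketch on a hypothetical 4 x i64 model (plain arrays, not LLVM types), assuming non-negative constants:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4i64 = std::array<int64_t, 4>;
using V4i1 = std::array<bool, 4>;

// Wrapping negate, computed in unsigned arithmetic to avoid signed overflow.
static int64_t wrapNeg(int64_t V) {
  return static_cast<int64_t>(UINT64_C(0) - static_cast<uint64_t>(V));
}

// Lane-wise icmp eq (ABS A), C.
static V4i1 absEq(const V4i64 &A, const V4i64 &C) {
  V4i1 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = (A[I] < 0 ? wrapNeg(A[I]) : A[I]) == C[I];
  return R;
}

// Lane-wise (icmp eq A, C) | (icmp eq A, -C). The negated constant is built
// per lane, so C may hold a different value in every lane.
static V4i1 eqOrEqNeg(const V4i64 &A, const V4i64 &C) {
  V4i1 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = A[I] == C[I] || A[I] == wrapNeg(C[I]);
  return R;
}
```

In DAG terms this would correspond to negating a constant build vector element by element instead of negating a single splat value; how that is best expressed in the combine is left open here.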

Also I'm curious how well this could work on pre-SSSE3 codegen - where we don't have PABS instructions at all

In D142602#4121033, @RKSimon wrote:

> Also I'm curious how well this could work on pre-SSSE3 codegen - where we don't have PABS instructions at all

Pre-SSSE3, abs gets the following codegen:

0000000000000000 <abs_v2i64>:
   0:	66 0f 6f c8          	movdqa %xmm0,%xmm1
   4:	66 0f 72 e1 1f       	psrad  $0x1f,%xmm1
   9:	66 0f 70 c9 f5       	pshufd $0xf5,%xmm1,%xmm1
   e:	66 0f ef c1          	pxor   %xmm1,%xmm0
  12:	66 0f fb c1          	psubq  %xmm1,%xmm0
  16:	c3                   	ret    
  17:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  1e:	00 00 

0000000000000020 <abs_v4i32>:
  20:	66 0f 6f c8          	movdqa %xmm0,%xmm1
  24:	66 0f 72 e1 1f       	psrad  $0x1f,%xmm1
  29:	66 0f ef c1          	pxor   %xmm1,%xmm0
  2d:	66 0f fa c1          	psubd  %xmm1,%xmm0
  31:	c3                   	ret    
  32:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
  39:	00 00 00 
  3c:	0f 1f 40 00          	nopl   0x0(%rax)

0000000000000040 <abs_v8i16>:
  40:	66 0f ef c9          	pxor   %xmm1,%xmm1
  44:	66 0f f9 c8          	psubw  %xmm0,%xmm1
  48:	66 0f ee c1          	pmaxsw %xmm1,%xmm0
  4c:	c3                   	ret    
  4d:	0f 1f 00             	nopl   (%rax)

0000000000000050 <abs_v16i8>:
  50:	66 0f ef c9          	pxor   %xmm1,%xmm1
  54:	66 0f f8 c8          	psubb  %xmm0,%xmm1
  58:	66 0f da c1          	pminub %xmm1,%xmm0
  5c:	c3                   	ret

Probably makes sense for i64, but for the rest I will only do this if the `-C` node already exists.
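
The SSE2 sequences above boil down to three scalar idioms, applied per lane. The following sketch restates them in portable C++ as a reading aid; it is an interpretation of the disassembly with hypothetical function names, not code from LLVM:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// v4i32 (psrad $31; pxor; psubd): M is all ones for negative X, else zero,
// and (X ^ M) - M negates X exactly when M is all ones.
static uint32_t absViaSignMask32(uint32_t X) {
  uint32_t M = 0u - (X >> 31);
  return (X ^ M) - M;
}

// v2i64 (psrad $31; pshufd [1,1,3,3]; pxor; psubq): the sign mask of the
// high dword is copied over both halves to form a 64-bit mask.
static uint64_t absViaSignMask64(uint64_t X) {
  uint32_t HiMask = 0u - static_cast<uint32_t>(X >> 63);
  uint64_t M = (static_cast<uint64_t>(HiMask) << 32) | HiMask;
  return (X ^ M) - M;
}

// v8i16 (pxor; psubw; pmaxsw): signed max of X and 0 - X.
static int16_t absViaSMax16(int16_t X) {
  uint16_t NegBits = static_cast<uint16_t>(0u - static_cast<uint16_t>(X));
  int16_t Neg = static_cast<int16_t>(NegBits);
  return std::max(X, Neg);
}

// v16i8 (pxor; psubb; pminub): unsigned min of X and 0 - X.
static uint8_t absViaUMin8(uint8_t X) {
  uint8_t Neg = static_cast<uint8_t>(0u - X);
  return std::min(X, Neg);
}
```

The i64 case needs the extra shuffle because SSE2 has no 64-bit arithmetic shift, which is consistent with i64 being the most expensive of the four sequences.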

In D142602#4121358, @goldstein.w.n wrote:

Probably makes sense for i64 but for the rest will make this only if the node already exists.

For the time being, think I'm going to drop this commit, doesn't seem to be much gain. Might revisit later.

goldstein.w.n abandoned this revision. May 12 2023, 3:16 PM

Revision Contents

Path	Size
llvm/lib/Target/X86/X86ISelLowering.cpp	50 lines
llvm/test/CodeGen/X86/icmp-abs-C-vec.ll	137 lines

Diff 494930

llvm/lib/Target/X86/X86ISelLowering.cpp

[32,752 lines not shown]
       // cmpne(trunc(x),0) --> cmpne(x,0)
       // iff x upper bits are zero.
       // TODO: Add support for RHS to be truncate as well?
       if (LHS.getOpcode() == ISD::TRUNCATE &&
           LHS.getOperand(0).getScalarValueSizeInBits() >= 32 &&
           isNullConstant(RHS) && !DCI.isBeforeLegalize()) {
         EVT SrcVT = LHS.getOperand(0).getValueType();
         APInt UpperBits = APInt::getBitsSetFrom(SrcVT.getScalarSizeInBits(),
                                                 OpVT.getScalarSizeInBits());
         const TargetLowering &TLI = DAG.getTargetLoweringInfo();
         if (DAG.MaskedValueIsZero(LHS.getOperand(0), UpperBits) &&
             TLI.isTypeLegal(LHS.getOperand(0).getValueType()))
           return DAG.getSetCC(DL, VT, LHS.getOperand(0),
                               DAG.getConstant(0, DL, SrcVT), CC);
       }
+    }

-      // With C as a power of 2 and C != 0 and C != INT_MIN:
-      //    icmp eq Abs(X) C ->
-      //        (icmp eq A, C) | (icmp eq A, -C)
-      //    icmp ne Abs(X) C ->
-      //        (icmp ne A, C) & (icmp ne A, -C)
-      // Both of these patterns can be better optimized in
-      // DAGCombiner::foldAndOrOfSETCC. Note this only applies for scalar
-      // integers which is checked above.
-      if (LHS.getOpcode() == ISD::ABS && LHS.hasOneUse()) {
-        if (auto *C = dyn_cast<ConstantSDNode>(RHS)) {
-          const APInt &CInt = C->getAPIntValue();
-          // We can better optimize this case in DAGCombiner::foldAndOrOfSETCC.
-          if (CInt.isPowerOf2() && !CInt.isMinSignedValue()) {
-            SDValue BaseOp = LHS.getOperand(0);
-            SDValue SETCC0 = DAG.getSetCC(DL, VT, BaseOp, RHS, CC);
-            SDValue SETCC1 = DAG.getSetCC(
-                DL, VT, BaseOp, DAG.getConstant(-CInt, DL, OpVT), CC);
-            return DAG.getNode(CC == ISD::SETEQ ? ISD::OR : ISD::AND, DL, VT,
-                               SETCC0, SETCC1);
-          }
-        }
-      }
-    }
+    if (OpVT.isInteger() && LHS.getOpcode() == ISD::ABS && LHS.hasOneUse()) {
+      if (auto *C = isConstOrConstSplat(RHS)) {
+        const APInt &CInt = C->getAPIntValue();
+        bool ConvertToLogicOpOfSETCC = false;
+        if (OpVT.isVector() && OpVT.getVectorElementType() == MVT::i64 &&
+            !Subtarget.hasAVX512()) {
+          // If ABS(vNxi64) requires avx512 even for xmm/ymm wereas SETCC/ALU
+          // are available with (sse2/sse4.1)/avx2. If ABS it not available,
+          // check if SETCC/ALU are, and if so, fold.
+          if (OpVT.getSizeInBits() == 128)
+            ConvertToLogicOpOfSETCC = Subtarget.hasSSE41();
+          else if (OpVT.getSizeInBits() == 256)
+            ConvertToLogicOpOfSETCC = Subtarget.hasAVX2();
+        } else if (OpVT.isScalarInteger()) {
+          // With C as a power of 2 and C != 0 and C != INT_MIN:
+          //    icmp eq Abs(X) C ->
+          //        (icmp eq A, C) | (icmp eq A, -C)
+          //    icmp ne Abs(X) C ->
+          //        (icmp ne A, C) & (icmp ne A, -C)
+          // We can better optimize this case in DAGCombiner::foldAndOrOfSETCC.
+          ConvertToLogicOpOfSETCC =
+              CInt.isPowerOf2() && !CInt.isMinSignedValue();
+        }
+
+        if (ConvertToLogicOpOfSETCC) {
+          SDValue BaseOp = LHS.getOperand(0);
+          SDValue SETCC0 = DAG.getSetCC(DL, VT, BaseOp, RHS, CC);
+          SDValue SETCC1 = DAG.getSetCC(DL, VT, BaseOp,
+                                        DAG.getConstant(-CInt, DL, OpVT), CC);
+          return DAG.getNode(CC == ISD::SETEQ ? ISD::OR : ISD::AND, DL, VT,
+                             SETCC0, SETCC1);
+        }
+      }
+    }
   }

   if (VT.isVector() && VT.getVectorElementType() == MVT::i1 &&
       (CC == ISD::SETNE || CC == ISD::SETEQ || ISD::isSignedIntSetCC(CC))) {
     // Using temporaries to avoid messing up operand ordering for later
[4,402 lines not shown]
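
The gating logic in the new code can be read in isolation. The sketch below restates it as a standalone predicate; `Features`, `TypeInfo` and `shouldSplitAbsCompare` are hypothetical stand-ins for `Subtarget`, `OpVT` and the `ConvertToLogicOpOfSETCC` flag, not LLVM APIs, and `C` is passed as the raw bit pattern of the constant.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the subtarget feature queries used in the diff.
struct Features {
  bool SSE41;
  bool AVX2;
  bool AVX512;
};

// Hypothetical stand-in for the operand type; ElementBits is assumed to be
// in the range 1..64.
struct TypeInfo {
  bool IsVector;
  unsigned ElementBits;
  unsigned TotalBits;
};

static bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Mirrors the decision that sets ConvertToLogicOpOfSETCC in the diff.
static bool shouldSplitAbsCompare(const Features &ST, const TypeInfo &VT,
                                  uint64_t C) {
  if (VT.IsVector && VT.ElementBits == 64 && !ST.AVX512) {
    // No native 64-bit vector abs below AVX512: split only when a 64-bit
    // vector compare is available at this width.
    if (VT.TotalBits == 128)
      return ST.SSE41;
    if (VT.TotalBits == 256)
      return ST.AVX2;
    return false;
  }
  if (!VT.IsVector) {
    // Scalar: only powers of two other than the sign bit (INT_MIN).
    uint64_t SignBit = UINT64_C(1) << (VT.ElementBits - 1);
    return isPowerOf2(C) && C != SignBit;
  }
  return false;
}
```

Read this way, the vector branch never inspects `C` at all, while the scalar branch depends only on `C`; that asymmetry is what the inline discussion about splat versus non-uniform constants revolves around.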

llvm/test/CodeGen/X86/icmp-abs-C-vec.ll

[94 lines not shown]
 ; AVX512-NEXT: vpcmpeqq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm0, %k1
 ; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
 ; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
 ; AVX512-NEXT: vzeroupper
 ; AVX512-NEXT: retq
 ;
 ; AVX2-LABEL: illegal_abs_to_eq_or:
 ; AVX2: # %bb.0:
-; AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
-; AVX2-NEXT: vpsubq %ymm0, %ymm1, %ymm1
-; AVX2-NEXT: vblendvpd %ymm0, %ymm1, %ymm0, %ymm0
 ; AVX2-NEXT: vpbroadcastq {{.*#+}} ymm1 = [129,129,129,129]
+; AVX2-NEXT: vpbroadcastq {{.*#+}} ymm2 = [18446744073709551487,18446744073709551487,18446744073709551487,18446744073709551487]
+; AVX2-NEXT: vpcmpeqq %ymm2, %ymm0, %ymm2
 ; AVX2-NEXT: vpcmpeqq %ymm1, %ymm0, %ymm0
+; AVX2-NEXT: vpor %ymm2, %ymm0, %ymm0
 ; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm1
 ; AVX2-NEXT: vpackssdw %xmm1, %xmm0, %xmm0
 ; AVX2-NEXT: vzeroupper
 ; AVX2-NEXT: retq
 ;
 ; SSE41-LABEL: illegal_abs_to_eq_or:
 ; SSE41: # %bb.0:
-; SSE41-NEXT: movdqa %xmm0, %xmm2
-; SSE41-NEXT: pxor %xmm3, %xmm3
-; SSE41-NEXT: pxor %xmm4, %xmm4
-; SSE41-NEXT: psubq %xmm0, %xmm4
-; SSE41-NEXT: blendvpd %xmm0, %xmm4, %xmm2
-; SSE41-NEXT: psubq %xmm1, %xmm3
-; SSE41-NEXT: movdqa %xmm1, %xmm0
-; SSE41-NEXT: blendvpd %xmm0, %xmm3, %xmm1
-; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [129,129]
-; SSE41-NEXT: pcmpeqq %xmm0, %xmm1
-; SSE41-NEXT: pcmpeqq %xmm0, %xmm2
-; SSE41-NEXT: packssdw %xmm1, %xmm2
-; SSE41-NEXT: movdqa %xmm2, %xmm0
+; SSE41-NEXT: movdqa {{.*#+}} xmm2 = [129,129]
+; SSE41-NEXT: movdqa {{.*#+}} xmm3 = [18446744073709551487,18446744073709551487]
+; SSE41-NEXT: movdqa %xmm1, %xmm4
+; SSE41-NEXT: pcmpeqq %xmm3, %xmm4
+; SSE41-NEXT: pcmpeqq %xmm2, %xmm1
+; SSE41-NEXT: por %xmm4, %xmm1
+; SSE41-NEXT: pcmpeqq %xmm0, %xmm3
+; SSE41-NEXT: pcmpeqq %xmm2, %xmm0
+; SSE41-NEXT: por %xmm3, %xmm0
+; SSE41-NEXT: packssdw %xmm1, %xmm0
 ; SSE41-NEXT: retq
 ;
 ; SSE2-LABEL: illegal_abs_to_eq_or:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: movdqa %xmm0, %xmm2
 ; SSE2-NEXT: psrad $31, %xmm2
 ; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]
 ; SSE2-NEXT: pxor %xmm2, %xmm0
[22 lines not shown]
 ; AVX512: # %bb.0:
 ; AVX512-NEXT: vpabsq %ymm0, %ymm0
 ; AVX512-NEXT: vpbroadcastq {{.*#+}} ymm1 = [129,129,129,129]
 ; AVX512-NEXT: vpcmpeqq %ymm1, %ymm0, %ymm0
 ; AVX512-NEXT: retq
 ;
 ; AVX2-LABEL: illegal_abs_to_eq_or_sext:
 ; AVX2: # %bb.0:
-; AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
-; AVX2-NEXT: vpsubq %ymm0, %ymm1, %ymm1
-; AVX2-NEXT: vblendvpd %ymm0, %ymm1, %ymm0, %ymm0
 ; AVX2-NEXT: vpbroadcastq {{.*#+}} ymm1 = [129,129,129,129]
+; AVX2-NEXT: vpbroadcastq {{.*#+}} ymm2 = [18446744073709551487,18446744073709551487,18446744073709551487,18446744073709551487]
+; AVX2-NEXT: vpcmpeqq %ymm2, %ymm0, %ymm2
 ; AVX2-NEXT: vpcmpeqq %ymm1, %ymm0, %ymm0
+; AVX2-NEXT: vpor %ymm2, %ymm0, %ymm0
 ; AVX2-NEXT: retq
 ;
 ; SSE41-LABEL: illegal_abs_to_eq_or_sext:
 ; SSE41: # %bb.0:
-; SSE41-NEXT: movdqa %xmm0, %xmm2
-; SSE41-NEXT: pxor %xmm3, %xmm3
-; SSE41-NEXT: pxor %xmm4, %xmm4
-; SSE41-NEXT: psubq %xmm1, %xmm4
-; SSE41-NEXT: movdqa %xmm1, %xmm0
-; SSE41-NEXT: blendvpd %xmm0, %xmm4, %xmm1
-; SSE41-NEXT: psubq %xmm2, %xmm3
-; SSE41-NEXT: movdqa %xmm2, %xmm0
-; SSE41-NEXT: blendvpd %xmm0, %xmm3, %xmm2
-; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [129,129]
-; SSE41-NEXT: pcmpeqq %xmm0, %xmm2
-; SSE41-NEXT: pcmpeqq %xmm0, %xmm1
-; SSE41-NEXT: movdqa %xmm2, %xmm0
+; SSE41-NEXT: movdqa {{.*#+}} xmm2 = [129,129]
+; SSE41-NEXT: movdqa {{.*#+}} xmm3 = [18446744073709551487,18446744073709551487]
+; SSE41-NEXT: movdqa %xmm0, %xmm4
+; SSE41-NEXT: pcmpeqq %xmm3, %xmm4
+; SSE41-NEXT: pcmpeqq %xmm2, %xmm0
+; SSE41-NEXT: por %xmm4, %xmm0
+; SSE41-NEXT: pcmpeqq %xmm1, %xmm3
+; SSE41-NEXT: pcmpeqq %xmm2, %xmm1
+; SSE41-NEXT: por %xmm3, %xmm1
 ; SSE41-NEXT: retq
 ;
 ; SSE2-LABEL: illegal_abs_to_eq_or_sext:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: movdqa %xmm1, %xmm2
 ; SSE2-NEXT: psrad $31, %xmm2
 ; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]
 ; SSE2-NEXT: pxor %xmm2, %xmm1
[24 lines not shown]
 ; AVX512-NEXT: vpcmpneqq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm0, %k1
 ; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
 ; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
 ; AVX512-NEXT: vzeroupper
 ; AVX512-NEXT: retq
 ;
 ; AVX2-LABEL: illegal_abs_to_ne_and:
 ; AVX2: # %bb.0:
-; AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
-; AVX2-NEXT: vpsubq %ymm0, %ymm1, %ymm1
-; AVX2-NEXT: vblendvpd %ymm0, %ymm1, %ymm0, %ymm0
 ; AVX2-NEXT: vpbroadcastq {{.*#+}} ymm1 = [129,129,129,129]
-; AVX2-NEXT: vpcmpeqq %ymm1, %ymm0, %ymm0
-; AVX2-NEXT: vpcmpeqd %ymm1, %ymm1, %ymm1
-; AVX2-NEXT: vpxor %ymm1, %ymm0, %ymm0
+; AVX2-NEXT: vpcmpeqq %ymm1, %ymm0, %ymm1
+; AVX2-NEXT: vpbroadcastq {{.*#+}} ymm2 = [18446744073709551487,18446744073709551487,18446744073709551487,18446744073709551487]
+; AVX2-NEXT: vpcmpeqq %ymm2, %ymm0, %ymm0
+; AVX2-NEXT: vpcmpeqd %ymm2, %ymm2, %ymm2
+; AVX2-NEXT: vpxor %ymm2, %ymm0, %ymm0
+; AVX2-NEXT: vpandn %ymm0, %ymm1, %ymm0
 ; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm1
 ; AVX2-NEXT: vpackssdw %xmm1, %xmm0, %xmm0
 ; AVX2-NEXT: vzeroupper
 ; AVX2-NEXT: retq
 ;
 ; SSE41-LABEL: illegal_abs_to_ne_and:
 ; SSE41: # %bb.0:
-; SSE41-NEXT: movdqa %xmm0, %xmm2
-; SSE41-NEXT: pxor %xmm3, %xmm3
-; SSE41-NEXT: pxor %xmm4, %xmm4
-; SSE41-NEXT: psubq %xmm0, %xmm4
-; SSE41-NEXT: blendvpd %xmm0, %xmm4, %xmm2
-; SSE41-NEXT: psubq %xmm1, %xmm3
-; SSE41-NEXT: movdqa %xmm1, %xmm0
-; SSE41-NEXT: blendvpd %xmm0, %xmm3, %xmm1
-; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [129,129]
-; SSE41-NEXT: pcmpeqq %xmm0, %xmm1
-; SSE41-NEXT: pcmpeqd %xmm3, %xmm3
-; SSE41-NEXT: pxor %xmm3, %xmm1
+; SSE41-NEXT: movdqa {{.*#+}} xmm2 = [129,129]
+; SSE41-NEXT: movdqa %xmm1, %xmm3
+; SSE41-NEXT: pcmpeqq %xmm2, %xmm3
+; SSE41-NEXT: movdqa {{.*#+}} xmm4 = [18446744073709551487,18446744073709551487]
+; SSE41-NEXT: pcmpeqq %xmm4, %xmm1
+; SSE41-NEXT: pcmpeqd %xmm5, %xmm5
+; SSE41-NEXT: pxor %xmm5, %xmm1
+; SSE41-NEXT: pandn %xmm1, %xmm3
 ; SSE41-NEXT: pcmpeqq %xmm0, %xmm2
-; SSE41-NEXT: pxor %xmm3, %xmm2
-; SSE41-NEXT: packssdw %xmm1, %xmm2
+; SSE41-NEXT: pcmpeqq %xmm4, %xmm0
+; SSE41-NEXT: pxor %xmm5, %xmm0
+; SSE41-NEXT: pandn %xmm0, %xmm2
+; SSE41-NEXT: packssdw %xmm3, %xmm2
 ; SSE41-NEXT: movdqa %xmm2, %xmm0
 ; SSE41-NEXT: retq
 ;
 ; SSE2-LABEL: illegal_abs_to_ne_and:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: movdqa %xmm0, %xmm2
 ; SSE2-NEXT: psrad $31, %xmm2
 ; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]
[27 lines not shown]
 ; AVX512-NEXT: vpabsq %ymm0, %ymm0
 ; AVX512-NEXT: vpbroadcastq {{.*#+}} ymm1 = [129,129,129,129]
 ; AVX512-NEXT: vpcmpeqq %ymm1, %ymm0, %ymm0
 ; AVX512-NEXT: vpternlogq $15, %ymm0, %ymm0, %ymm0
 ; AVX512-NEXT: retq
 ;
 ; AVX2-LABEL: illegal_abs_to_ne_and_sext:
 ; AVX2: # %bb.0:
-; AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
-; AVX2-NEXT: vpsubq %ymm0, %ymm1, %ymm1
-; AVX2-NEXT: vblendvpd %ymm0, %ymm1, %ymm0, %ymm0
 ; AVX2-NEXT: vpbroadcastq {{.*#+}} ymm1 = [129,129,129,129]
-; AVX2-NEXT: vpcmpeqq %ymm1, %ymm0, %ymm0
-; AVX2-NEXT: vpcmpeqd %ymm1, %ymm1, %ymm1
-; AVX2-NEXT: vpxor %ymm1, %ymm0, %ymm0
+; AVX2-NEXT: vpcmpeqq %ymm1, %ymm0, %ymm1
+; AVX2-NEXT: vpbroadcastq {{.*#+}} ymm2 = [18446744073709551487,18446744073709551487,18446744073709551487,18446744073709551487]
+; AVX2-NEXT: vpcmpeqq %ymm2, %ymm0, %ymm0
+; AVX2-NEXT: vpcmpeqd %ymm2, %ymm2, %ymm2
+; AVX2-NEXT: vpxor %ymm2, %ymm0, %ymm0
+; AVX2-NEXT: vpandn %ymm0, %ymm1, %ymm0
 ; AVX2-NEXT: retq
 ;
 ; SSE41-LABEL: illegal_abs_to_ne_and_sext:
 ; SSE41: # %bb.0:
-; SSE41-NEXT: movdqa %xmm0, %xmm2
-; SSE41-NEXT: pxor %xmm3, %xmm3
-; SSE41-NEXT: pxor %xmm4, %xmm4
-; SSE41-NEXT: psubq %xmm1, %xmm4
-; SSE41-NEXT: movdqa %xmm1, %xmm0
-; SSE41-NEXT: blendvpd %xmm0, %xmm4, %xmm1
-; SSE41-NEXT: psubq %xmm2, %xmm3
-; SSE41-NEXT: movdqa %xmm2, %xmm0
-; SSE41-NEXT: blendvpd %xmm0, %xmm3, %xmm2
-; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [129,129]
-; SSE41-NEXT: pcmpeqq %xmm0, %xmm2
-; SSE41-NEXT: pcmpeqd %xmm3, %xmm3
-; SSE41-NEXT: pxor %xmm3, %xmm2
-; SSE41-NEXT: pcmpeqq %xmm0, %xmm1
-; SSE41-NEXT: pxor %xmm3, %xmm1
-; SSE41-NEXT: movdqa %xmm2, %xmm0
+; SSE41-NEXT: movdqa {{.*#+}} xmm2 = [129,129]
+; SSE41-NEXT: movdqa %xmm0, %xmm3
+; SSE41-NEXT: pcmpeqq %xmm2, %xmm3
+; SSE41-NEXT: movdqa {{.*#+}} xmm4 = [18446744073709551487,18446744073709551487]
+; SSE41-NEXT: pcmpeqq %xmm4, %xmm0
+; SSE41-NEXT: pcmpeqd %xmm5, %xmm5
+; SSE41-NEXT: pxor %xmm5, %xmm0
+; SSE41-NEXT: pandn %xmm0, %xmm3
+; SSE41-NEXT: pcmpeqq %xmm1, %xmm2
+; SSE41-NEXT: pcmpeqq %xmm4, %xmm1
+; SSE41-NEXT: pxor %xmm5, %xmm1
+; SSE41-NEXT: pandn %xmm1, %xmm2
+; SSE41-NEXT: movdqa %xmm3, %xmm0
+; SSE41-NEXT: movdqa %xmm2, %xmm1
 ; SSE41-NEXT: retq
 ;
 ; SSE2-LABEL: illegal_abs_to_ne_and_sext:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: movdqa %xmm1, %xmm2
 ; SSE2-NEXT: psrad $31, %xmm2
 ; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]
 ; SSE2-NEXT: pxor %xmm2, %xmm1
[819 lines not shown]
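
The large constant in the new check lines is simply `-129` printed as an unsigned 64-bit value, i.e. the `-C` operand for `C = 129`. A quick sketch to confirm the arithmetic (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Returns the unsigned 64-bit bit pattern of -C (two's complement).
static constexpr uint64_t negatedBits(uint64_t C) { return UINT64_C(0) - C; }

// 2^64 - 129, checked at compile time.
static_assert(negatedBits(129) == UINT64_C(18446744073709551487),
              "-129 as an unsigned 64-bit value");
```

This is the extra constant the reviewers refer to when weighing the added load against the removed `vpsubq` + `vblendvpd` pair.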