This is an archive of the discontinued LLVM Phabricator instance.

Paths

- docs/LangRef.rst
- lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
- test/CodeGen/X86/absdiff_128.ll
- test/CodeGen/X86/absdiff_256.ll
- test/CodeGen/X86/absdiff_expand.ll

Differential D11678

[CodeGen] Fixes absdiff intrinsic: LangRef doc/test case improvement and corresponding code change
Closed, Public

Authored by ashahid on Jul 31 2015, 5:19 AM.


Details

Reviewers

mzolotukhin
jmolloy
hfinkel

Commits

rG13f1dfdf2ead: Codegen: Fix llvm.*absdiff semantic.
rL248483: Codegen: Fix llvm.*absdiff semantic.

Summary

This patch fixes the condition code of the comparison used in the expansion of the *absdiff* intrinsics.

The LangRef documentation is updated to reflect this change.

The test case is split into two files based on data width (128/256 bit), and the tests are updated to be less fragile.


Event Timeline

• ashahid updated this revision to Diff 31118. Jul 31 2015, 5:19 AM

• ashahid retitled this revision to [CodeGen] Fixes *absdiff* intrinsic: LangRef doc/test case improvement and corresponding code change.

• ashahid updated this object.

• ashahid added reviewers: mzolotukhin, jmolloy, hfinkel.

• ashahid set the repository for this revision to rL LLVM.

• ashahid added a subscriber: llvm-commits.

Hi Shahid,

Please find some comments inline:

docs/LangRef.rst
10387–10390	What's the difference between `llvm.uabsdiff` and `llvm.sabsdiff` then?
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
737	AFAIU, this should be `ISD::SETLE`.
test/CodeGen/X86/absdiff_128.ll
150–152	This `CHECK-DAG` doesn't make much sense, since it's limited by `CHECK` and `CHECK-NEXT` on both sides. Moreover, I think the right way to make the tests less brittle is not to check for everything, but just to look for key instructions. For example, we definitely expect to see `psubd`; then, maybe after several other instructions, we want to see `pcmpgt`; then we want to see `pand`, `pandn`, and `por`. Thus, I'd write this test something like this:
    CHECK: psubd
    CHECK: pcmpgt
    CHECK-DAG: pand  // BTW, why do you have two `pandn` here?
    CHECK-DAG: pandn
    CHECK: por
    CHECK: ret
206	If we don't want to match any specific register here, we need to get rid of comments `# xmm5 = xmm4...` too.
test/CodeGen/X86/absdiff_256.ll
33	This is still fragile. Imagine that the register allocator, for some strange reason, begins to use `xmm5` instead of `xmm6` and vice versa: this test will immediately fail. Also, if you want to match `pxor %xmmN, %xmmN`, the correct way to write the regexp for it would be:
    pxor [[SOMENAME:%xmm[0-9]+]], [[SOMENAME]]
This will ensure that `pxor` operates on the same register.
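FileCheck's `[[NAME:regex]]` definition followed by a later `[[NAME]]` use on the same line works much like a regex capture group plus a backreference. As a rough analogue only (this uses C++ `std::regex` ECMAScript syntax, not FileCheck, and `matchesSelfXor` is a hypothetical helper), the sketch shows why the suggested pattern accepts a register XORed with itself but rejects two different registers:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Rough analogue of the FileCheck line
//   CHECK: pxor [[SOMENAME:%xmm[0-9]+]], [[SOMENAME]]
// Capture group 1 plays the role of the variable SOMENAME; \1 is its reuse.
// The trailing \b stops "%xmm1" from matching the prefix of "%xmm10".
bool matchesSelfXor(const std::string& asmLine) {
  static const std::regex pattern(R"(pxor\s+(%xmm[0-9]+),\s+\1\b)");
  return std::regex_search(asmLine, pattern);
}
```

For example, `matchesSelfXor("pxor %xmm3, %xmm3")` is true, while `"pxor %xmm5, %xmm6"` is rejected, which is exactly the property the inline comment asks the test to enforce.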

This revision now requires changes to proceed. Jul 31 2015, 1:26 PM

• ashahid added inline comments. Aug 2 2015, 12:16 AM

docs/LangRef.rst
10387–10390	The difference is the presence of the NSW flag in the case of llvm.sabsdiff.
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
737	My bad... intention was to use Tmp1 instead of Tmp2. I will use the proper variable names to reflect the operations.
test/CodeGen/X86/absdiff_128.ll
150–152	Thanks, this is very important input. For ISD::SETGE, X86 swaps the operands; consequently, in the context of VSELECT it uses two `pandn` instructions.
206	Ok
test/CodeGen/X86/absdiff_256.ll
33	Ok
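The `pand`/`pandn`/`por` sequence discussed above is the classic SSE2 way to implement a vector select without a blend instruction: with a lane mask `m` that is all-ones or all-zeros, `select(m, a, b) = (m & a) | (~m & b)`. The sketch below models a single 32-bit lane in plain C++ (hypothetical helpers, not LLVM code); when only the inverted mask `n = ~m` is available, for instance after a compare whose operands were swapped, the and-not term simply moves to the other operand:

```cpp
#include <cassert>
#include <cstdint>

// One 32-bit lane of an SSE2-style select. The mask lane must be all-ones
// (select a) or all-zeros (select b), as produced by pcmpgtd/pcmpeqd.
uint32_t blendLane(uint32_t mask, uint32_t a, uint32_t b) {
  assert(mask == 0u || mask == 0xFFFFFFFFu);
  uint32_t fromA = mask & a;   // pand
  uint32_t fromB = ~mask & b;  // pandn computes (~mask) & b
  return fromA | fromB;        // por
}

// The same select expressed with the inverted mask invMask == ~mask.
uint32_t blendLaneInverted(uint32_t invMask, uint32_t a, uint32_t b) {
  assert(invMask == 0u || invMask == 0xFFFFFFFFu);
  return (~invMask & a) | (invMask & b);
}
```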

Hi James,

That is right; for UINT_MAX the current comparison will not be correct.

By a larger data type, do you mean promoting the given type to a wider one (e.g. MVT::i32 to MVT::i64) and then doing the expansion?

Regards,
Shahid

From: James Molloy [mailto:james@jamesmolloy.co.uk]
Sent: Monday, August 03, 2015 1:02 AM
To: reviews+D11678+public+e92bec0f352bb617@reviews.llvm.org; Shahid, Asghar-ahmad; james.molloy@arm.com; hfinkel@anl.gov; mzolotukhin@apple.com
Cc: llvm-commits@cs.uiuc.edu
Subject: Re: [PATCH] D11678: [CodeGen] Fixes *absdiff* intrinsic: LangRef doc/test case improvement and corresponding code change

I think uabsdiff needs to be expanded using a larger data type. If we have uabsdiff(uint_max, uint_max) , a signed comparison won't return the right result unless the bitwidth is expanded, right?

James
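To make the concern concrete, here is a scalar model of the expansion described in this revision's LangRef text (`icmp sge %sub, zeroinitializer` followed by a `select`), applied to one unsigned 8-bit lane. The 8-bit width is an assumption chosen for brevity; the `uint_max` case for i32 follows the same pattern. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// sub, signed "sub >= 0" test, then select sub or -sub, all in the original
// 8-bit width (arithmetic wraps modulo 2^8).
uint8_t naiveUAbsDiff(uint8_t a, uint8_t b) {
  uint8_t sub = static_cast<uint8_t>(a - b);
  uint8_t neg = static_cast<uint8_t>(0 - sub);
  // Reinterpret as signed (modular on mainstream compilers; guaranteed in C++20).
  bool isPos = static_cast<int8_t>(sub) >= 0;
  return isPos ? sub : neg;
}

// Exact unsigned absolute difference: compare the operands, not the difference.
uint8_t exactUAbsDiff(uint8_t a, uint8_t b) {
  return a >= b ? static_cast<uint8_t>(a - b) : static_cast<uint8_t>(b - a);
}
```

With `a = 255` and `b = 0`, the 8-bit difference has its top bit set, so the signed test treats it as negative and the naive version returns 1 instead of 255. Small differences such as `(1, 3)` are unaffected, which is why the bug only shows up near the top of the unsigned range.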

mzolotukhin added inline comments. Aug 3 2015, 12:55 PM

docs/LangRef.rst
10387–10390	I still don't think it's correct. NSW is just a hint to optimizers; it doesn't add any additional logic. It does assert that the expression won't overflow, but the operations we execute are still the same. That is, currently the only difference between the signed and unsigned versions is that for the signed version we could get undefined behavior in some cases. This is clearly incorrect, because we should get different results without undefined behavior in some cases (e.g. for `<-1,-1,-1,-1>` and `<1,1,1,1>` it should give `<254,254,254,254>` for `uabsdiff.v4i8` and `<2,2,2,2>` for `sabsdiff.v4i8`). What really should be the difference, as far as I understand, is the condition code in the comparison:
    %ispos = icmp sge <4 x i32> %sub, zeroinitializer
As far as I understand, we should use `uge` for the unsigned case and `sge` for the signed case.
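The example values in the comment can be checked with a small reference model of the two intended results for `v4i8`. This sketch compares the operands directly (unsigned for `uabsdiff`, signed for `sabsdiff`) instead of applying a predicate to `%sub`, so it models the intended values rather than any particular expansion; the type alias and function names are hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4I8 = std::array<uint8_t, 4>;  // raw 8-bit lane bit patterns

// uabsdiff: each lane is read as an unsigned value in [0, 255].
V4I8 uabsdiffV4i8(const V4I8& a, const V4I8& b) {
  V4I8 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = static_cast<uint8_t>(a[i] >= b[i] ? a[i] - b[i] : b[i] - a[i]);
  return r;
}

// sabsdiff: each lane is read as a signed value in [-128, 127].
V4I8 sabsdiffV4i8(const V4I8& a, const V4I8& b) {
  V4I8 r{};
  for (int i = 0; i < 4; ++i) {
    int x = static_cast<int8_t>(a[i]);
    int y = static_cast<int8_t>(b[i]);
    r[i] = static_cast<uint8_t>(x >= y ? x - y : y - x);
  }
  return r;
}
```

The same bit pattern `0xFF` is 255 when read unsigned and -1 when read signed, so the two intrinsics must disagree on it (254 versus 2) even though neither result overflows.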

Updated the patch to define the behavior of llvm.uabsdiff intrinsic. The corresponding doc, code & test cases are updated accordingly.

Ping !

bruno added a subscriber: bruno. Aug 18 2015, 6:13 AM

bruno added inline comments.

docs/LangRef.rst
10352	Space after the dot
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
744	Remove the curly braces

Updated the patch for Bruno's comments.

Hey there... ping.

Hi Shahid,

Thanks for working on this! Please find some questions/comments below:

docs/LangRef.rst
10352–10354	Please specify what happens if the result overflows (e.g. `llvm.sabsdiff.v4i8(<4 x i8> <-128, -128, -128, -128>, <4 x i8> <127, 127, 127, 127>)`).
10359	While we are here, could you please fix the typo here? (space before 'it' and capitalize the first letter)
test/CodeGen/X86/absdiff_256.ll
10	A single `CHECK-DAG` between two `CHECK` statements has no effect (it works as a plain `CHECK`). Please fix that.
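For the overflow case raised above, the exact value of `sabsdiff` for the i8 lanes -128 and 127 is 255, which lies outside the signed i8 range [-128, 127]. The minimal sketch below (hypothetical helpers, assuming the check is done by computing in `int`, where it cannot wrap) reports which lanes are affected:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <optional>

// Exact |a - b| for signed 8-bit lanes, computed in int so it cannot wrap.
int exactSAbsDiff(int8_t a, int8_t b) {
  return std::abs(static_cast<int>(a) - static_cast<int>(b));
}

// Returns the result only if it is representable as a signed i8; an empty
// optional marks lanes whose exact result does not fit.
std::optional<int8_t> sabsdiffI8Checked(int8_t a, int8_t b) {
  int r = exactSAbsDiff(a, b);
  if (r > std::numeric_limits<int8_t>::max())
    return std::nullopt;
  return static_cast<int8_t>(r);
}
```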

• ashahid added inline comments. Aug 26 2015, 1:13 AM

docs/LangRef.rst
10352–10354	Thanks for the catch, in this case the behavior is undefined and targets can define their own behavior. Does this make sense?

Hi Shahid,

Please see my replies below:

Thanks,
Michael

docs/LangRef.rst
10352–10354	I think that totally makes sense, but we need to explicitly state that in the documentation.
test/CodeGen/X86/absdiff_256.ll
2	`CHECK` is the default prefix, so you don't need to specify it.

I'm not a fan of the way the tests are written: they seem both too brittle and too strict.

FWIW, I'd start by using utils/update_llc_test_checks.py, and then refining away the non-ABI-specified registers into regexes.

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
724	Can you define these at the point of initialization instead?
731	Again, please initialize variables when defining them. Given line 745, NVT is unnecessary, I think. The block at line 736 could just set VT instead.
733–734	This logic is fishy, IMO: it doesn't make sense for targets to set the element type promotion rule, and asking for the promoted vector type might get us a widened version. What about using: VT.widenIntegerVectorElementType(*DAG.getContext()); which sounds like what you want, given that there's no other sensible thing the target could do.
lib/Target/X86/X86ISelLowering.cpp
318–323 ↗	(On Diff #32814)	This will become unnecessary if you use EVT::widenIntegerVectorElementType above.
test/CodeGen/X86/absdiff_128.ll
12–13	The a-d seems too restrictive (what about, say, %r9d?), and '+' seems unnecessary.
test/CodeGen/X86/absdiff_256.ll
6	Why test v16i32? IMHO the output is huge without providing more coverage. Should this be v16i16 or something?

• ashahid added inline comments. Aug 27 2015, 2:03 AM

test/CodeGen/X86/absdiff_128.ll
12–13	Thanks for the input. Do you mean I should use explicit register names for ABI-specified registers? For example, if we expect the ABI registers EAX and ECX here, should I specify %eax and %ecx?

Hi Shahid,

Ahmed is actually suggesting the opposite - your regexes are *too*
restrictive. Consider relaxing them instead:

[[SRC:%.*]]

James


Updated the patch for:

- result overflow documentation
- test case fixes
- refactoring of the expansion of the llvm.*absdiff intrinsics.

Ping !!

Ping ...

Hi Mikhail & others,

Please review; I have been waiting for responses for a long time.

Regards,
Shahid

Hi Shahid,

The logic in LangRef looks fine to me, but I'd prefer someone else to review the SDAG part.

Michael

Thanks for the changes!

I'm still uncomfortable with the tests. The explicit CHECK-NEXT approach would apply much better to the smaller tests in _128.ll than the big v16i16 one, I think. Regexes would also help make the tests less brittle while keeping them as useful.

Also, should we merge the two test files together? IMO, the _128/_256 distinction isn't very useful.

test/CodeGen/X86/absdiff_128.ll
67–68	These lines could be simplified into:
    ; CHECK: movzbl
    ; CHECK: movzbl
Ditto for others, e.g. pextrw below.
test/CodeGen/X86/absdiff_256.ll
26–28	Why do these need CHECK-DAG?

Hi Ahmed,

Thanks for the comments. I will do the needful.

Regarding merging the tests: IMO, even if it is not useful now, keeping _256.ll as a placeholder is not a bad idea.

Regards,
Shahid

test/CodeGen/X86/absdiff_256.ll
26–28	This is for the case where psubw and pxor come in a different order. Doesn't that qualify for CHECK-DAG?

Updated the tests to make them less brittle.

Ping !!

hfinkel added inline comments. Sep 18 2015, 9:26 AM

docs/LangRef.rst
10351	I'd phrase this differently, and say that the intermediate calculations are computed using infinitely-precise unsigned arithmetic. That's similar to the language we used for inbounds GEPs.
10354	We cannot have target-defined behavior for target-independent IR intrinsics. You can say only that the result is undefined. If an application wants to rely on target-specific overflow behavior, then it will need to directly use some target-specific intrinsics, inline asm, etc.
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
747	Should this be ISD::SETGE for both the signed and unsigned case?

Hi Hal,

Thanks for your comments, will update the doc accordingly.
See my other response inline.

Regards,
Shahid

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
747	Here ISD::SETGE is for signed case only, unsigned case is handled with early exit.

hfinkel added inline comments. Sep 20 2015, 5:50 AM

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
747	Okay, I see, you're using the TRUNCATE above.
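The exchange above says the unsigned case is handled separately and involves a TRUNCATE. One way to see why a single signed `>= 0` test (in the spirit of `ISD::SETGE`) can then serve the unsigned case is to widen first: zero-extend the operands into a wider type, subtract, select on the sign of the difference, and truncate back. The scalar sketch below illustrates this for 8-bit lanes widened to 16 bits and checks all 65,536 input pairs against a direct operand comparison. It is a hedged illustration under those assumptions, not the committed `ExpandABSDIFF` code, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Unsigned absdiff for an 8-bit lane, computed by widening to 16 bits:
// zero-extend, subtract, pick sub or -sub with a signed ">= 0" test, truncate.
uint8_t uabsdiffViaWidening(uint8_t a, uint8_t b) {
  int16_t wa = static_cast<int16_t>(a);         // zero-extension: a in [0, 255]
  int16_t wb = static_cast<int16_t>(b);
  int16_t sub = static_cast<int16_t>(wa - wb);  // in [-255, 255], cannot wrap
  int16_t neg = static_cast<int16_t>(-sub);
  int16_t absVal = sub >= 0 ? sub : neg;        // signed compare suffices here
  return static_cast<uint8_t>(absVal);          // truncate back to 8 bits
}

// Exhaustively compares against a direct unsigned comparison of the operands.
bool wideningMatchesReference() {
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b) {
      int expected = a >= b ? a - b : b - a;
      if (uabsdiffViaWidening(static_cast<uint8_t>(a),
                              static_cast<uint8_t>(b)) != expected)
        return false;
    }
  return true;
}
```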

docs/LangRef.rst updated to incorporate Hal's comments

LGTM.

Closed by commit rL248483: Codegen: Fix llvm.*absdiff semantic. (authored by ashahid). Sep 24 2015, 3:36 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

- docs/LangRef.rst (4 lines)
- lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp (4 lines)
- test/CodeGen/X86/absdiff_128.ll (221 lines)
- test/CodeGen/X86/absdiff_256.ll (56 lines)
- test/CodeGen/X86/absdiff_expand.ll

Diff 31118

docs/LangRef.rst


[... 10,342 unchanged lines ...]
 .. code-block:: llvm

 declare <4 x integer> @llvm.uabsdiff.v4i32(<4 x integer> %a, <4 x integer> %b)


 Overview:
 """""""""

 The ``llvm.uabsdiff`` intrinsic returns a vector result of the absolute difference of the two operands,
 treating them both as unsigned integers.

 The ``llvm.sabsdiff`` intrinsic returns a vector result of the absolute difference of the two operands,
 treating them both as signed integers.

 .. note::

 These intrinsics are primarily used during the code generation stage of compilation.
 They are generated by compiler passes such as the Loop and SLP vectorizers.it is not
 recommended for users to create them manually.

 Arguments:
 """"""""""

 Both intrinsics take two integer of the same bitwidth.

 Semantics:
 """"""""""

 The expression::

 call <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32> %a, <4 x i32> %b)

 is equivalent to::

 %sub = sub <4 x i32> %a, %b
-%ispos = icmp ugt <4 x i32> %sub, <i32 -1, i32 -1, i32 -1, i32 -1>
+%ispos = icmp sge <4 x i32> %sub, zeroinitializer
 %neg = sub <4 x i32> zeroinitializer, %sub
 %1 = select <4 x i1> %ispos, <4 x i32> %sub, <4 x i32> %neg

 Similarly the expression::

 call <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32> %a, <4 x i32> %b)

 is equivalent to::

 %sub = sub nsw <4 x i32> %a, %b
-%ispos = icmp sgt <4 x i32> %sub, <i32 -1, i32 -1, i32 -1, i32 -1>
+%ispos = icmp sge <4 x i32> %sub, zeroinitializer
 %neg = sub nsw <4 x i32> zeroinitializer, %sub
 %1 = select <4 x i1> %ispos, <4 x i32> %sub, <4 x i32> %neg


 Half Precision Floating Point Intrinsics
 ----------------------------------------

 For most target platforms, half precision floating point is a
 storage-only format. This means that it is a dense encoding (in memory)
 but does not support computation in the format.
[... 961 unchanged lines ...]

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp

[... 715 unchanged lines ...]
   case ISD::SABSDIFF:
     return ExpandABSDIFF(Op);
   default:
     return DAG.UnrollVectorOp(Op.getNode());
   }
 }

 SDValue VectorLegalizer::ExpandABSDIFF(SDValue Op) {
   SDLoc dl(Op);
   SDValue Tmp1, Tmp2, Tmp3, Tmp4;
   EVT VT = Op.getValueType();
   SDNodeFlags Flags;
   Flags.setNoSignedWrap(Op->getOpcode() == ISD::SABSDIFF);

   Tmp2 = Op.getOperand(0);
   Tmp3 = Op.getOperand(1);
   Tmp1 = DAG.getNode(ISD::SUB, dl, VT, Tmp2, Tmp3, &Flags);
   Tmp2 =
       DAG.getNode(ISD::SUB, dl, VT, DAG.getConstant(0, dl, VT), Tmp1, &Flags);
   Tmp4 = DAG.getNode(
       ISD::SETCC, dl,
       TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT), Tmp2,
-      DAG.getConstant(0, dl, VT),
-      DAG.getCondCode(Op->getOpcode() == ISD::SABSDIFF ? ISD::SETLT
-                                                       : ISD::SETULT));
+      DAG.getConstant(0, dl, VT), DAG.getCondCode(ISD::SETGE));
   Tmp1 = DAG.getNode(ISD::VSELECT, dl, VT, Tmp4, Tmp1, Tmp2);
   return Tmp1;
 }

 SDValue VectorLegalizer::ExpandSELECT(SDValue Op) {
   // Lower a select instruction where the condition is a scalar and the
   // operands are vectors. Lower this select to VSELECT and implement it
   // using XOR AND OR. The selector bit is broadcasted.
   EVT VT = Op.getValueType();
   SDLoc DL(Op);

   SDValue Mask = Op.getOperand(0);
   SDValue Op1 = Op.getOperand(1);
   SDValue Op2 = Op.getOperand(2);

   assert(VT.isVector() && !Mask.getValueType().isVector()
          && Op1.getValueType() == Op2.getValueType() && "Invalid type");

[... 295 unchanged lines ...]

test/CodeGen/X86/absdiff_128.ll

This file was added.

				; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s -check-prefix=CHECK

				declare <4 x i8> @llvm.uabsdiff.v4i8(<4 x i8>, <4 x i8>)

				define <4 x i8> @test_uabsdiff_v4i8_expand(<4 x i8> %a1, <4 x i8> %a2) {
				; CHECK-LABEL: test_uabsdiff_v4i8_expand
				; CHECK-DAG: psubd %xmm1, [[SRC:%xmm[0-9]+]]
				; CHECK: pxor
				; CHECK-DAG: pxor %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: psubd [[SRC]], [[DST]]
				; CHECK-DAG: pcmpgtd %xmm3, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pcmpeqd %xmm1, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: pxor [[SRC]], [[DST]]
				; CHECK-DAG: pandn %xmm0, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pandn %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: por [[SRC]], [[DST]]
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: retq

				%1 = call <4 x i8> @llvm.uabsdiff.v4i8(<4 x i8> %a1, <4 x i8> %a2)
				ret <4 x i8> %1
				}

				declare <4 x i8> @llvm.sabsdiff.v4i8(<4 x i8>, <4 x i8>)

				define <4 x i8> @test_sabsdiff_v4i8_expand(<4 x i8> %a1, <4 x i8> %a2) {
				; CHECK-LABEL: test_sabsdiff_v4i8_expand
				; CHECK-DAG: psubd %xmm1, [[SRC:%xmm[0-9]+]]
				; CHECK: pxor
				; CHECK-DAG: pxor %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: psubd [[SRC]], [[DST]]
				; CHECK-DAG: pcmpgtd %xmm3, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pcmpeqd %xmm1, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: pxor [[SRC]], [[DST]]
				; CHECK-DAG: pandn %xmm0, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pandn %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: por [[SRC]], [[DST]]
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: retq

				%1 = call <4 x i8> @llvm.sabsdiff.v4i8(<4 x i8> %a1, <4 x i8> %a2)
				ret <4 x i8> %1
				}

				declare <8 x i8> @llvm.sabsdiff.v8i8(<8 x i8>, <8 x i8>)

				define <8 x i8> @test_sabsdiff_v8i8_expand(<8 x i8> %a1, <8 x i8> %a2) {
				; CHECK-LABEL: test_sabsdiff_v8i8_expand
				; CHECK-DAG: psubw %xmm1, [[SRC:%xmm[0-9]+]]
				; CHECK: pxor
				; CHECK-DAG: pxor %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: psubw [[SRC]], [[DST]]
				; CHECK-DAG: pcmpgtw %xmm3, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pcmpeqd %xmm1, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: pxor [[SRC]], [[DST]]
				; CHECK-DAG: pandn %xmm0, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pandn %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: por [[SRC]], [[DST]]
				; CHECK-NEXT: movdq
				; CHECK-NEXT: retq
				%1 = call <8 x i8> @llvm.sabsdiff.v8i8(<8 x i8> %a1, <8 x i8> %a2)
				ret <8 x i8> %1
				}

				declare <16 x i8> @llvm.uabsdiff.v16i8(<16 x i8>, <16 x i8>)

				define <16 x i8> @test_uabsdiff_v16i8_expand(<16 x i8> %a1, <16 x i8> %a2) {
				; CHECK-LABEL: test_uabsdiff_v16i8_expand
				; CHECK-DAG: psubb %xmm1, [[SRC:%xmm[0-9]+]]
				; CHECK: pxor
				; CHECK-DAG: pxor %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: psubb [[SRC]], [[DST]]
				; CHECK-DAG: pcmpgtb %xmm3, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pcmpeqd %xmm1, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: pxor [[SRC]], [[DST]]
				; CHECK-DAG: pandn %xmm0, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pandn %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: por [[SRC]], [[DST]]
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: retq
				%1 = call <16 x i8> @llvm.uabsdiff.v16i8(<16 x i8> %a1, <16 x i8> %a2)
				ret <16 x i8> %1
				}

				declare <8 x i16> @llvm.uabsdiff.v8i16(<8 x i16>, <8 x i16>)

				define <8 x i16> @test_uabsdiff_v8i16_expand(<8 x i16> %a1, <8 x i16> %a2) {
				; CHECK-LABEL: test_uabsdiff_v8i16_expand
				; CHECK-DAG: psubw %xmm1, [[SRC:%xmm[0-9]+]]
				; CHECK: pxor
				; CHECK-DAG: pxor %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: psubw [[SRC]], [[DST]]
				; CHECK-DAG: pcmpgtw %xmm3, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pcmpeqd %xmm1, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: pxor [[SRC]], [[DST]]
				; CHECK-DAG: pandn %xmm0, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pandn %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: por [[SRC]], [[DST]]
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: retq
				%1 = call <8 x i16> @llvm.uabsdiff.v8i16(<8 x i16> %a1, <8 x i16> %a2)
				ret <8 x i16> %1
				}

				declare <8 x i16> @llvm.sabsdiff.v8i16(<8 x i16>, <8 x i16>)

				define <8 x i16> @test_sabsdiff_v8i16_expand(<8 x i16> %a1, <8 x i16> %a2) {
				; CHECK-LABEL: test_sabsdiff_v8i16_expand
				; CHECK-DAG: psubw %xmm1, [[SRC:%xmm[0-9]+]]
				; CHECK: pxor
				; CHECK-DAG: pxor %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: psubw [[SRC]], [[DST]]
				; CHECK-DAG: pcmpgtw %xmm3, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pcmpeqd %xmm1, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: pxor [[SRC]], [[DST]]
				; CHECK-DAG: pandn %xmm0, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pandn %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: por [[SRC]], [[DST]]
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: retq
				%1 = call <8 x i16> @llvm.sabsdiff.v8i16(<8 x i16> %a1, <8 x i16> %a2)
				ret <8 x i16> %1
				}

				declare <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32>, <4 x i32>)

				define <4 x i32> @test_sabsdiff_v4i32_expand(<4 x i32> %a1, <4 x i32> %a2) {
				; CHECK-LABEL: test_sabsdiff_v4i32_expand
				; CHECK-DAG: psubd %xmm1, [[SRC:%xmm[0-9]+]]
				; CHECK: pxor
				; CHECK-DAG: pxor %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: psubd [[SRC]], [[DST]]
				; CHECK-DAG: pcmpgtd %xmm3, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pcmpeqd %xmm1, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: pxor [[SRC]], [[DST]]
				; CHECK-DAG: pandn %xmm0, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pandn %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: por [[SRC]], [[DST]]
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: retq
				%1 = call <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32> %a1, <4 x i32> %a2)
				ret <4 x i32> %1
				}

				declare <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32>, <4 x i32>)

				define <4 x i32> @test_uabsdiff_v4i32_expand(<4 x i32> %a1, <4 x i32> %a2) {
				; CHECK-LABEL: test_uabsdiff_v4i32_expand
				; CHECK-DAG: psubd %xmm1, [[SRC:%xmm[0-9]+]]
				; CHECK: pxor
				; CHECK-DAG: pxor %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: psubd [[SRC]], [[DST]]
				; CHECK-DAG: pcmpgtd %xmm3, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pcmpeqd %xmm1, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: pxor [[SRC]], [[DST]]
				; CHECK-DAG: pandn %xmm0, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pandn %xmm3, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: por [[SRC]], [[DST]]
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: retq
				%1 = call <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32> %a1, <4 x i32> %a2)
				ret <4 x i32> %1
				}

				declare <2 x i32> @llvm.sabsdiff.v2i32(<2 x i32>, <2 x i32>)

				define <2 x i32> @test_sabsdiff_v2i32_expand(<2 x i32> %a1, <2 x i32> %a2) {
				; CHECK-LABEL: test_sabsdiff_v2i32_expand
				; CHECK: psubq
				; CHECK-NEXT: pxor
				; CHECK-NEXT: psubq
				; CHECK-NEXT: movdqa .LCPI{{[0-9_]*[0-9]}}(%rip), [[SRC:%xmm[0-9]+]] # xmm1 = [2147483648,0,2147483648,0]
				; CHECK-NEXT: movdqa %xmm2, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: pxor [[SRC]], [[DST]]
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: pcmpgtd
				; CHECK-NEXT: pshufd {{.*}} # xmm5 = xmm4[0,0,2,2]
				; CHECK-NEXT: pcmpeqd
				; CHECK-NEXT: pshufd {{.*}} # xmm1 = xmm3[1,1,3,3]
				; CHECK-NEXT: pand
				; CHECK-NEXT: pshufd {{.*}} # xmm3 = xmm4[1,1,3,3]
				; CHECK-NEXT: por
				; CHECK-NEXT: pcmpeqd
				; CHECK-NEXT: pxor
				; CHECK-NEXT: pandn
				; CHECK-NEXT: pandn
				; CHECK-NEXT: por
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: retq
				%1 = call <2 x i32> @llvm.sabsdiff.v2i32(<2 x i32> %a1, <2 x i32> %a2)
				ret <2 x i32> %1
				}

				declare <2 x i64> @llvm.sabsdiff.v2i64(<2 x i64>, <2 x i64>)

				define <2 x i64> @test_sabsdiff_v2i64_expand(<2 x i64> %a1, <2 x i64> %a2) {
				; CHECK-LABEL: test_sabsdiff_v2i64_expand
				; CHECK: psubq
				; CHECK-NEXT: pxor
				; CHECK-NEXT: psubq
				; CHECK-NEXT: movdqa .LCPI{{[0-9_]*[0-9]}}(%rip), [[SRC:%xmm[0-9]+]] # xmm1 = [2147483648,0,2147483648,0]
				; CHECK-NEXT: movdqa %xmm2, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: pxor [[SRC]], [[DST]]
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: pcmpgtd
				; CHECK-NEXT: pshufd {{.*}} # xmm5 = xmm4[0,0,2,2]
				mzolotukhin: If we don't want to match any specific register here, we need to get rid of the comments `# xmm5 = xmm4...` too.
				ashahid: Ok
				; CHECK-NEXT: pcmpeqd
				; CHECK-NEXT: pshufd {{.*}} # xmm1 = xmm3[1,1,3,3]
				; CHECK-NEXT: pand
				; CHECK-NEXT: pshufd {{.*}} # xmm3 = xmm4[1,1,3,3]
				; CHECK-NEXT: por
				; CHECK-NEXT: pcmpeqd
				; CHECK-NEXT: pxor
				; CHECK-NEXT: pandn
				; CHECK-NEXT: pandn
				; CHECK-NEXT: por
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: retq
				%1 = call <2 x i64> @llvm.sabsdiff.v2i64(<2 x i64> %a1, <2 x i64> %a2)
				ret <2 x i64> %1
				}
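mzolotukhin's suggestion above relies on plain `CHECK` lines matching in order while skipping whatever the scheduler or register allocator emits in between. The C++ sketch below is a deliberately simplified model of that behaviour, not FileCheck's implementation: it treats every key as a literal substring and ignores `CHECK-NEXT`, `CHECK-DAG`, regexes and variables. The helper name `matchesInOrder` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: a simplified model of a run of plain CHECK lines.
// Each key must occur somewhere after the end of the previous key's match;
// any text in between is ignored. Keys are literal substrings, so "pand"
// also matches inside "pandn", just as a plain substring pattern would.
bool matchesInOrder(const std::string& output,
                    const std::vector<std::string>& keys) {
  std::string::size_type pos = 0;
  for (const std::string& key : keys) {
    std::string::size_type found = output.find(key, pos);
    if (found == std::string::npos)
      return false;  // key missing, or present only before the previous match
    pos = found + key.size();
  }
  return true;
}
```

With the keys from the review (`psubd`, `pcmpgt`, `pand`, `por`, `ret`), an instruction sequence like the one checked in the deleted `test_sabsdiff_v4i32_expand` passes no matter which `%xmm` registers are chosen, while the out-of-order pair `por`, `pcmpgt` fails.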

test/CodeGen/X86/absdiff_256.ll

This file was added.

				; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s -check-prefix=CHECK

				mzolotukhin: `CHECK` is the default prefix, so you don't need to specify it.
				declare <16 x i32> @llvm.sabsdiff.v16i32(<16 x i32>, <16 x i32>)

				define <16 x i32> @test_sabsdiff_v16i32_expand(<16 x i32> %a1, <16 x i32> %a2) {
				; CHECK-LABEL: test_sabsdiff_v16i32_expand
				ab: Why test v16i32? IMHO the output is huge without providing more coverage. Should this be v16i16 or something?
				; CHECK: movdqa
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: psubd
				mzolotukhin: A single `CHECK-DAG` between two `CHECK` statements has no effect (it works as a plain `CHECK`). Please fix that.
				; CHECK-NEXT: pxor
				; CHECK-NEXT: pxor
				; CHECK-DAG: psubd %xmm1, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pxor %xmm0, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: pcmpgtd [[SRC]], [[DST]]
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: pandn
				; CHECK-NEXT: pcmpeqd
				; CHECK-NEXT: pxor
				; CHECK-NEXT: pandn
				; CHECK-NEXT: por
				; CHECK-DAG: psubd %xmm5, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pxor %xmm4, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: psubd [[SRC]], [[DST]]
				; CHECK-NEXT: pxor
				; CHECK-NEXT: pcmpgtd
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: pandn
				ab: Why do these need CHECK-DAG?
				ashahid: This is for the case where psubw and pxor come in a different order. Doesn't that qualify for CHECK-DAG?
				; CHECK-NEXT: pxor
				; CHECK-NEXT: pandn
				; CHECK-NEXT: por
				; CHECK-DAG: psubd %xmm6, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pxor %xmm5, [[DST:%xmm[0-9]+]]
				mzolotukhin: This is still fragile. Imagine that the register allocator, for some strange reason, begins to use `xmm5` instead of `xmm6` and vice versa: this test will immediately fail. Also, if you want to match `pxor %xmmN, %xmmN`, the correct way to write the regexp for it would be `pxor [[SOMENAME:%xmm[0-9]+]], [[SOMENAME]]`. This ensures that `pxor` operates on the same register.
				ashahid: Ok
				; CHECK-NEXT: psubd [[SRC]], [[DST]]
				; CHECK-NEXT: pxor
				; CHECK-NEXT: pcmpgtd
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: pandn
				; CHECK-NEXT: pxor
				; CHECK-NEXT: pandn
				; CHECK-NEXT: por
				; CHECK-DAG: psubd %xmm7, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pxor %xmm2, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: psubd [[SRC]], [[DST]]
				; CHECK-NEXT: pcmpgtd
				; CHECK-NEXT: movdqa
				; CHECK-DAG: pandn %xmm8, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pxor %xmm10, [[DST:%xmm[0-9]+]]
				; CHECK-NEXT: pandn
				; CHECK-NEXT: por [[SRC]], [[DST]]
				; CHECK-NEXT: movdqa
				; CHECK-NEXT: retq
				%1 = call <16 x i32> @llvm.sabsdiff.v16i32(<16 x i32> %a1, <16 x i32> %a2)
				ret <16 x i32> %1
				}
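The `[[SOMENAME:%xmm[0-9]+]], [[SOMENAME]]` form suggested above captures a register name once and then requires the identical text again. As a rough analogy only (this is not how FileCheck is implemented), an ECMAScript regex backreference `\1` expresses the same constraint; `isSelfPxor` is a hypothetical name used just for this sketch.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Analogy only: FileCheck's [[NAME:regex]] captures text and a later [[NAME]]
// must repeat exactly that text. A backreference (\1) behaves much the same
// way, so this regex accepts "pxor %xmmN, %xmmN" only when both operands name
// the same register.
bool isSelfPxor(const std::string& line) {
  static const std::regex selfPxor(R"(pxor (%xmm[0-9]+), \1)");
  // regex_match requires the whole line to match, so "pxor %xmm1, %xmm10" is
  // rejected instead of matching the "%xmm1" prefix of "%xmm10".
  return std::regex_match(line, selfPxor);
}
```

A register-zeroing idiom such as `pxor %xmm4, %xmm4` is accepted, while `pxor %xmm4, %xmm5` is not, independent of which register number the allocator picks.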

test/CodeGen/X86/absdiff_expand.ll

This file was deleted.

	; RUN: llc -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s -check-prefix=CHECK

	declare <4 x i8> @llvm.uabsdiff.v4i8(<4 x i8>, <4 x i8>)

	define <4 x i8> @test_uabsdiff_v4i8_expand(<4 x i8> %a1, <4 x i8> %a2) {
	; CHECK-LABEL: test_uabsdiff_v4i8_expand
	; CHECK: psubd %xmm1, %xmm0
	; CHECK-NEXT: pxor %xmm1, %xmm1
	; CHECK-NEXT: psubd %xmm0, %xmm1
	; CHECK-NEXT: movdqa .LCPI{{[0-9_]*}}
	; CHECK-NEXT: movdqa %xmm1, %xmm3
	; CHECK-NEXT: pxor %xmm2, %xmm3
	; CHECK-NEXT: pcmpgtd %xmm3, %xmm2
	; CHECK-NEXT: pand %xmm2, %xmm0
	; CHECK-NEXT: pandn %xmm1, %xmm2
	; CHECK-NEXT: por %xmm2, %xmm0
	; CHECK-NEXT: retq

	%1 = call <4 x i8> @llvm.uabsdiff.v4i8(<4 x i8> %a1, <4 x i8> %a2)
	ret <4 x i8> %1
	}

	declare <4 x i8> @llvm.sabsdiff.v4i8(<4 x i8>, <4 x i8>)

	define <4 x i8> @test_sabsdiff_v4i8_expand(<4 x i8> %a1, <4 x i8> %a2) {
	; CHECK-LABEL: test_sabsdiff_v4i8_expand
	; CHECK: psubd %xmm1, %xmm0
	; CHECK-NEXT: pxor %xmm1, %xmm1
	; CHECK-NEXT: pxor %xmm2, %xmm2
	; CHECK-NEXT: psubd %xmm0, %xmm2
	; CHECK-NEXT: pcmpgtd %xmm2, %xmm1
	; CHECK-NEXT: pand %xmm1, %xmm0
	; CHECK-NEXT: pandn %xmm2, %xmm1
	; CHECK-NEXT: por %xmm1, %xmm0
	; CHECK-NEXT: retq

	%1 = call <4 x i8> @llvm.sabsdiff.v4i8(<4 x i8> %a1, <4 x i8> %a2)
	ret <4 x i8> %1
	}


	declare <8 x i8> @llvm.sabsdiff.v8i8(<8 x i8>, <8 x i8>)

	define <8 x i8> @test_sabsdiff_v8i8_expand(<8 x i8> %a1, <8 x i8> %a2) {
	; CHECK-LABEL: test_sabsdiff_v8i8_expand
	; CHECK: psubw %xmm1, %xmm0
	; CHECK-NEXT: pxor %xmm1, %xmm1
	; CHECK-NEXT: pxor %xmm2, %xmm2
	; CHECK-NEXT: psubw %xmm0, %xmm2
	; CHECK-NEXT: pcmpgtw %xmm2, %xmm1
	; CHECK-NEXT: pand %xmm1, %xmm0
	; CHECK-NEXT: pandn %xmm2, %xmm1
	; CHECK-NEXT: por %xmm1, %xmm0
	; CHECK-NEXT: retq
	%1 = call <8 x i8> @llvm.sabsdiff.v8i8(<8 x i8> %a1, <8 x i8> %a2)
	ret <8 x i8> %1
	}

	declare <16 x i8> @llvm.uabsdiff.v16i8(<16 x i8>, <16 x i8>)

	define <16 x i8> @test_uabsdiff_v16i8_expand(<16 x i8> %a1, <16 x i8> %a2) {
	; CHECK-LABEL: test_uabsdiff_v16i8_expand
	; CHECK: psubb %xmm1, %xmm0
	; CHECK-NEXT: pxor %xmm1, %xmm1
	; CHECK-NEXT: psubb %xmm0, %xmm1
	; CHECK-NEXT: movdqa .LCPI{{[0-9_]*}}
	; CHECK-NEXT: movdqa %xmm1, %xmm3
	; CHECK-NEXT: pxor %xmm2, %xmm3
	; CHECK-NEXT: pcmpgtb %xmm3, %xmm2
	; CHECK-NEXT: pand %xmm2, %xmm0
	; CHECK-NEXT: pandn %xmm1, %xmm2
	; CHECK-NEXT: por %xmm2, %xmm0
	; CHECK-NEXT: retq
	%1 = call <16 x i8> @llvm.uabsdiff.v16i8(<16 x i8> %a1, <16 x i8> %a2)
	ret <16 x i8> %1
	}

	declare <8 x i16> @llvm.uabsdiff.v8i16(<8 x i16>, <8 x i16>)

	define <8 x i16> @test_uabsdiff_v8i16_expand(<8 x i16> %a1, <8 x i16> %a2) {
	; CHECK-LABEL: test_uabsdiff_v8i16_expand
	; CHECK: psubw %xmm1, %xmm0
	; CHECK-NEXT: pxor %xmm1, %xmm1
	; CHECK-NEXT: psubw %xmm0, %xmm1
	; CHECK-NEXT: movdqa .LCPI{{[0-9_]*}}
	; CHECK-NEXT: movdqa %xmm1, %xmm3
	; CHECK-NEXT: pxor %xmm2, %xmm3
	; CHECK-NEXT: pcmpgtw %xmm3, %xmm2
	; CHECK-NEXT: pand %xmm2, %xmm0
	; CHECK-NEXT: pandn %xmm1, %xmm2
	; CHECK-NEXT: por %xmm2, %xmm0
	; CHECK-NEXT: retq
	%1 = call <8 x i16> @llvm.uabsdiff.v8i16(<8 x i16> %a1, <8 x i16> %a2)
	ret <8 x i16> %1
	}

	declare <8 x i16> @llvm.sabsdiff.v8i16(<8 x i16>, <8 x i16>)

	define <8 x i16> @test_sabsdiff_v8i16_expand(<8 x i16> %a1, <8 x i16> %a2) {
	; CHECK-LABEL: test_sabsdiff_v8i16_expand
	; CHECK: psubw %xmm1, %xmm0
	; CHECK-NEXT: pxor %xmm1, %xmm1
	; CHECK-NEXT: pxor %xmm2, %xmm2
	; CHECK-NEXT: psubw %xmm0, %xmm2
	; CHECK-NEXT: pcmpgtw %xmm2, %xmm1
	; CHECK-NEXT: pand %xmm1, %xmm0
	; CHECK-NEXT: pandn %xmm2, %xmm1
	; CHECK-NEXT: por %xmm1, %xmm0
	; CHECK-NEXT: retq
	%1 = call <8 x i16> @llvm.sabsdiff.v8i16(<8 x i16> %a1, <8 x i16> %a2)
	ret <8 x i16> %1
	}

	declare <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32>, <4 x i32>)

	define <4 x i32> @test_sabsdiff_v4i32_expand(<4 x i32> %a1, <4 x i32> %a2) {
	; CHECK-LABEL: test_sabsdiff_v4i32_expand
	; CHECK: psubd %xmm1, %xmm0
	; CHECK-NEXT: pxor %xmm1, %xmm1
	; CHECK-NEXT: pxor %xmm2, %xmm2
	; CHECK-NEXT: psubd %xmm0, %xmm2
	; CHECK-NEXT: pcmpgtd %xmm2, %xmm1
	; CHECK-NEXT: pand %xmm1, %xmm0
	; CHECK-NEXT: pandn %xmm2, %xmm1
	; CHECK-NEXT: por %xmm1, %xmm0
	; CHECK-NEXT: retq
	%1 = call <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32> %a1, <4 x i32> %a2)
	ret <4 x i32> %1
	}

	declare <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32>, <4 x i32>)

	define <4 x i32> @test_uabsdiff_v4i32_expand(<4 x i32> %a1, <4 x i32> %a2) {
	; CHECK-LABEL: test_uabsdiff_v4i32_expand
	; CHECK: psubd %xmm1, %xmm0
	; CHECK-NEXT: pxor %xmm1, %xmm1
	; CHECK-NEXT: psubd %xmm0, %xmm1
	; CHECK-NEXT: movdqa .LCPI{{[0-9_]*}}
	; CHECK-NEXT: movdqa %xmm1, %xmm3
	; CHECK-NEXT: pxor %xmm2, %xmm3
	; CHECK-NEXT: pcmpgtd %xmm3, %xmm2
	; CHECK-NEXT: pand %xmm2, %xmm0
	; CHECK-NEXT: pandn %xmm1, %xmm2
	; CHECK-NEXT: por %xmm2, %xmm0
	; CHECK-NEXT: retq
	%1 = call <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32> %a1, <4 x i32> %a2)
	ret <4 x i32> %1
	}

	declare <2 x i32> @llvm.sabsdiff.v2i32(<2 x i32>, <2 x i32>)

	define <2 x i32> @test_sabsdiff_v2i32_expand(<2 x i32> %a1, <2 x i32> %a2) {
	; CHECK-LABEL: test_sabsdiff_v2i32_expand
	; CHECK: psubq %xmm1, %xmm0
	; CHECK-NEXT: pxor %xmm1, %xmm1
	; CHECK-NEXT: psubq %xmm0, %xmm1
	; CHECK-NEXT: movdqa .LCPI{{[0-9_]*}}
	; CHECK-NEXT: movdqa %xmm1, %xmm3
	; CHECK-NEXT: pxor %xmm2, %xmm3
	; CHECK-NEXT: movdqa %xmm2, %xmm4
	; CHECK-NEXT: pcmpgtd %xmm3, %xmm4
	; CHECK-NEXT: pshufd $160, %xmm4, %xmm5 # xmm5 = xmm4[0,0,2,2]
	; CHECK-NEXT: pcmpeqd %xmm2, %xmm3
	; CHECK-NEXT: pshufd $245, %xmm3, %xmm2 # xmm2 = xmm3[1,1,3,3]
	; CHECK-NEXT: pand %xmm5, %xmm2
	; CHECK-NEXT: pshufd $245, %xmm4, %xmm3 # xmm3 = xmm4[1,1,3,3]
	; CHECK-NEXT: por %xmm2, %xmm3
	; CHECK-NEXT: pand %xmm3, %xmm0
	; CHECK-NEXT: pandn %xmm1, %xmm3
	; CHECK-NEXT: por %xmm3, %xmm0
	; CHECK-NEXT: retq
	%1 = call <2 x i32> @llvm.sabsdiff.v2i32(<2 x i32> %a1, <2 x i32> %a2)
	ret <2 x i32> %1
	}

	declare <2 x i64> @llvm.sabsdiff.v2i64(<2 x i64>, <2 x i64>)

	define <2 x i64> @test_sabsdiff_v2i64_expand(<2 x i64> %a1, <2 x i64> %a2) {
	; CHECK-LABEL: test_sabsdiff_v2i64_expand
	; CHECK: psubq %xmm1, %xmm0
	; CHECK-NEXT: pxor %xmm1, %xmm1
	; CHECK-NEXT: psubq %xmm0, %xmm1
	; CHECK-NEXT: movdqa .LCPI{{[0-9_]*}}
	; CHECK-NEXT: movdqa %xmm1, %xmm3
	; CHECK-NEXT: pxor %xmm2, %xmm3
	; CHECK-NEXT: movdqa %xmm2, %xmm4
	; CHECK-NEXT: pcmpgtd %xmm3, %xmm4
	; CHECK-NEXT: pshufd $160, %xmm4, %xmm5 # xmm5 = xmm4[0,0,2,2]
	; CHECK-NEXT: pcmpeqd %xmm2, %xmm3
	; CHECK-NEXT: pshufd $245, %xmm3, %xmm2 # xmm2 = xmm3[1,1,3,3]
	; CHECK-NEXT: pand %xmm5, %xmm2
	; CHECK-NEXT: pshufd $245, %xmm4, %xmm3 # xmm3 = xmm4[1,1,3,3]
	; CHECK-NEXT: por %xmm2, %xmm3
	; CHECK-NEXT: pand %xmm3, %xmm0
	; CHECK-NEXT: pandn %xmm1, %xmm3
	; CHECK-NEXT: por %xmm3, %xmm0
	; CHECK-NEXT: retq
	%1 = call <2 x i64> @llvm.sabsdiff.v2i64(<2 x i64> %a1, <2 x i64> %a2)
	ret <2 x i64> %1
	}

	declare <16 x i32> @llvm.sabsdiff.v16i32(<16 x i32>, <16 x i32>)

	define <16 x i32> @test_sabsdiff_v16i32_expand(<16 x i32> %a1, <16 x i32> %a2) {
	; CHECK-LABEL: test_sabsdiff_v16i32_expand
	; CHECK: psubd %xmm4, %xmm0
	; CHECK-NEXT: pxor %xmm8, %xmm8
	; CHECK-NEXT: pxor %xmm9, %xmm9
	; CHECK-NEXT: psubd %xmm0, %xmm9
	; CHECK-NEXT: pxor %xmm4, %xmm4
	; CHECK-NEXT: pcmpgtd %xmm9, %xmm4
	; CHECK-NEXT: pand %xmm4, %xmm0
	; CHECK-NEXT: pandn %xmm9, %xmm4
	; CHECK-NEXT: por %xmm4, %xmm0
	; CHECK-NEXT: psubd %xmm5, %xmm1
	; CHECK-NEXT: pxor %xmm4, %xmm4
	; CHECK-NEXT: psubd %xmm1, %xmm4
	; CHECK-NEXT: pxor %xmm5, %xmm5
	; CHECK-NEXT: pcmpgtd %xmm4, %xmm5
	; CHECK-NEXT: pand %xmm5, %xmm1
	; CHECK-NEXT: pandn %xmm4, %xmm5
	; CHECK-NEXT: por %xmm5, %xmm1
	; CHECK-NEXT: psubd %xmm6, %xmm2
	; CHECK-NEXT: pxor %xmm4, %xmm4
	; CHECK-NEXT: psubd %xmm2, %xmm4
	; CHECK-NEXT: pxor %xmm5, %xmm5
	; CHECK-NEXT: pcmpgtd %xmm4, %xmm5
	; CHECK-NEXT: pand %xmm5, %xmm2
	; CHECK-NEXT: pandn %xmm4, %xmm5
	; CHECK-NEXT: por %xmm5, %xmm2
	; CHECK-NEXT: psubd %xmm7, %xmm3
	; CHECK-NEXT: pxor %xmm4, %xmm4
	; CHECK-NEXT: psubd %xmm3, %xmm4
	; CHECK-NEXT: pcmpgtd %xmm4, %xmm8
	; CHECK-NEXT: pand %xmm8, %xmm3
	; CHECK-NEXT: pandn %xmm4, %xmm8
	; CHECK-NEXT: por %xmm8, %xmm3
	; CHECK-NEXT: retq
	%1 = call <16 x i32> @llvm.sabsdiff.v16i32(<16 x i32> %a1, <16 x i32> %a2)
	ret <16 x i32> %1
	}
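To see why the sequence checked in `test_sabsdiff_v4i32_expand` yields the absolute difference, the C++ sketch below replays it lane by lane, reading the AT&T operands as source first, destination last. It is an illustrative emulation under stated assumptions (four 32-bit lanes, wrap-around subtraction like `psubd`), not the lowering code in LegalizeVectorOps.cpp; `sabsdiffV4i32` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;  // one 128-bit register as four 32-bit lanes

// Replays the instructions checked in test_sabsdiff_v4i32_expand.
// Lanes are kept as uint32_t so subtraction wraps like psubd instead of
// invoking signed-overflow undefined behaviour.
V4 sabsdiffV4i32(const V4& a, const V4& b) {
  V4 r{};
  for (int i = 0; i < 4; ++i) {
    uint32_t d = a[i] - b[i];  // psubd %xmm1, %xmm0: d = a - b
    uint32_t n = 0u - d;       // pxor + psubd %xmm0, %xmm2: n = 0 - d
    // pcmpgtd %xmm2, %xmm1: all-ones where 0 > n (signed compare), else 0
    uint32_t mask = (0 > static_cast<int32_t>(n)) ? 0xFFFFFFFFu : 0u;
    // pand / pandn / por: keep d where the mask is set, n elsewhere
    r[i] = (d & mask) | (n & ~mask);
  }
  return r;
}
```

The mask is set exactly when `0 - d` is negative, that is when `d > 0`, so each lane ends up as `d` or `-d`, whichever is non-negative (assuming `a - b` does not overflow). For example, a lane with `a = 3, b = 8` comes back as 5, and equal inputs give 0.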
