Download Raw Diff

Details

Reviewers

mzolotukhin
jmolloy
hfinkel

Commits

rG13f1dfdf2ead: Codegen: Fix llvm.*absdiff semantic.
rL248483: Codegen: Fix llvm.*absdiff semantic.

Summary

This patch fixes the condition code for 'compare' llvm IR in the expansion of *absdiff* intrinsic.

LangRef doc is updated to reflect this change.

Test case is divided into two, based on the data width(128/256 bit). Also updated the tests to make it non-fragile.

Diff Detail

Repository: rL LLVM

Event Timeline

• ashahid updated this revision to Diff 31118.Jul 31 2015, 5:19 AM

• ashahid retitled this revision from to [CodeGen] Fixes *absdiff* intrinsic: LangRef doc/test case improvement and corresponding code change.

• ashahid updated this object.

• ashahid added reviewers: mzolotukhin, jmolloy, hfinkel.

• ashahid set the repository for this revision to rL LLVM.

• ashahid added a subscriber: llvm-commits.

Hi Shahid,

Please find some comments inline:

docs/LangRef.rst
10387–10390 ↗	(On Diff #31118)	What's the difference between `llvm.uabsdiff` and `llvm.sabsdiff` then?
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
737 ↗	(On Diff #31118)	AFAIU, this should be `ISD::SETLE`.
test/CodeGen/X86/absdiff_128.ll
150–152 ↗	(On Diff #31118)	This `CHECK-DAG` doesn't make much sense, since it's limited by `CHECK` and `CHECK-NEXT` from both sides. Moreover, I think the right way to make the tests less bristle is to not check for everything, but just look for key instructions. For example, we definitely expect to see `psubd`, then, maybe after several other instructions, we want to see `pcmpgt`, then we want to see `pand`, `pandn`, and `por`. Thus, I'd write this test something like this: CHECK: psubd CHECK: pcmpgt CHECK-DAG: pand // BTW, why do you have two `pandn` here? CHECK-DAG: pandn CHECK: por CHECK: ret
206 ↗	(On Diff #31118)	If we don't want to match any specific register here, we need to get rid of comments `# xmm5 = xmm4...` too.
test/CodeGen/X86/absdiff_256.ll
33 ↗	(On Diff #31118)	This is still fragile. Imagine that register allocator for some strange reason begins to use `xmm5` instead of `xmm6` and vice versa - this test will immediately fail. Also, if you want to match `pxor %xmmN, %xmmN`, the correct way to write the regexp for it would be: pxor [[SOMENAME:%xmm[0-9]+]], [[SOMENAME]] This will ensure that `pxor` operates on the same register.

This revision now requires changes to proceed.Jul 31 2015, 1:26 PM

• ashahid added inline comments.Aug 2 2015, 12:16 AM

docs/LangRef.rst
10387–10390 ↗	(On Diff #31118)	The difference is the presence of NSW flag in case of llvm.sabsdiff.
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
737 ↗	(On Diff #31118)	My bad... intention was to use Tmp1 instead of Tmp2. I will use the proper variable names to reflect the operations.
test/CodeGen/X86/absdiff_128.ll
150–152 ↗	(On Diff #31118)	Thanks, this is a very important input. For ISD::SETGE, X86 swaps the operand, consequently, in context of VSELECT it uses two "pandn".
206 ↗	(On Diff #31118)	Ok
test/CodeGen/X86/absdiff_256.ll
33 ↗	(On Diff #31118)	Ok

Hi James,

That is right, for Uint_max the current comparison will not be proper.

With larger data type do you mean promoting the given data type to larger type ex: MVT:i32 to MVT:i64
And then doing the expansion?

Regards,
Shahid

From: James Molloy [mailto:james@jamesmolloy.co.uk]
Sent: Monday, August 03, 2015 1:02 AM
To: reviews+D11678+public+e92bec0f352bb617@reviews.llvm.org; Shahid, Asghar-ahmad; james.molloy@arm.com; hfinkel@anl.gov; mzolotukhin@apple.com
Cc: llvm-commits@cs.uiuc.edu
Subject: Re: [PATCH] D11678: [CodeGen] Fixes *absdiff* intrinsic: LangRef doc/test case improvement and corresponding code change

I think uabsdiff needs to be expanded using a larger data type. If we have uabsdiff(uint_max, uint_max) , a signed comparison won't return the right result unless the bitwidth is expanded, right?

James

mzolotukhin added inline comments.Aug 3 2015, 12:55 PM

docs/LangRef.rst
10387–10390 ↗	(On Diff #31118)	I still don't think it's correct. NSW is just a hint to optimizers, but it doesn't add any additional logic. It does assert that the expression won't overflow, but the operations we execute are still the same. That is, currently the only difference between signed and unsigned version is that for signed version we could get an undefined behavior in some cases. This is clearly incorrect, because we should get different results without undefined behavior in some cases (e.g. `<-1,-1,-1,-1>` and `<1,1,1,1>` - it should give `<254,254,254,254>` for `uabsdiff.v4i8` and `<2,2,2,2>` for `sabsdiff.v4i8`). What really should be the difference, as far is I understand, is condition code in the comparison: %ispos = icmp sge <4 x i32> %sub, zeroinitializer As far as I understand, we should use `uge` for unsigned and `sge` for signed case.

Updated the patch to define the behavior of llvm.uabsdiff intrinsic. The corresponding doc, code & test cases are updated accordingly.

Ping !

bruno added a subscriber: bruno.Aug 18 2015, 6:13 AM

bruno added inline comments.

docs/LangRef.rst
10713 ↗	(On Diff #31646)	Space after the dot
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
747 ↗	(On Diff #31646)	Remove the curly braces

Updated the patch for Bruno's comments.

Hey there... ping.

Hi Shahid,

Thanks for working on this! Please find some questions/comments below:

docs/LangRef.rst
10750–10751 ↗	(On Diff #32814)	Please specify what happens if the result overflows (e.g. `llvm.sabsdiff.v4i8(<4 x i32> <-128, -128, -128, -128>, <4 x i32> <127, 127, 127, 127>)`).
10756 ↗	(On Diff #32814)	While we are here, could you please fix the typo here? (space before 'it' and capitalize the first letter)
test/CodeGen/X86/absdiff_256.ll
9 ↗	(On Diff #32814)	A single `CHECK-DAG` between two `CHECK` statements has no effect (it works as a plain `CHECK`). Please fix that.

• ashahid added inline comments.Aug 26 2015, 1:13 AM

docs/LangRef.rst
10750–10751 ↗	(On Diff #32814)	Thanks for the catch, in this case the behavior is undefined and targets can define their own behavior. Does this make sense?

Hi Shahid,

Please see my replies below:

Thanks,
Michael

docs/LangRef.rst
10750–10751 ↗	(On Diff #32814)	I think that totally makes sense, but we need to explicitly state that in the documentation.
test/CodeGen/X86/absdiff_256.ll
1 ↗	(On Diff #32814)	`CHECK` is the default prefix, so you don't need to specify it.

I'm not a fan of the way the tests are written: they seem both too brittle and too strict.

FWIW, I'd start by using utils/update_llc_test_checks.py, and then refining away the non-ABI-specified registers into regexes.

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
726 ↗	(On Diff #32814)	Can you define these at the point of initialization instead?
735 ↗	(On Diff #32814)	Again, please initialize variables when defining them. Given line 745, NVT is unnecessary, I think. The block at l736 could just set VT instead.
737–738 ↗	(On Diff #32814)	This logic is fishy, IMO: it doesn't make sense for targets to set the element type promotion rule, and asking for the promoted vector type might get us a widened version. What about using: VT.widenIntegerVectorElementType(*DAG.getContext()); which sounds like what you want, given that there's no other sensible thing the target could do.
lib/Target/X86/X86ISelLowering.cpp
318–323 ↗	(On Diff #32814)	This will become unnecessary if you use EVT::widenIntegerVectorElementType above.
test/CodeGen/X86/absdiff_128.ll
11–12 ↗	(On Diff #32814)	The a-d seems too restrictive (what about, say, %r9d?), and '+' seems unnecessary.
test/CodeGen/X86/absdiff_256.ll
5 ↗	(On Diff #32814)	Why test v16i32? IMHO the output is huge without providing more coverage. Should this be v16i16 or something?

• ashahid added inline comments.Aug 27 2015, 2:03 AM

test/CodeGen/X86/absdiff_128.ll
11–12 ↗	(On Diff #32814)	Thanks for the input, do you mean to use explicit register name in case of ABI specific register. For example, here if we are expecting EAX & ECX abi registers respectively, I should specify %eax & %ecx?

Hi Shahid,

Ahmed is actually suggesting the opposite - your regexes are *too*
restrictive. Consider relaxing them instead:

[[SRC:%.*]]

James

msg-12305-464.txt162 BDownload

In D11678#234108, @llvm-commits wrote:

Hi Shahid,

Ahmed is actually suggesting the opposite - your regexes are *too*
restrictive. Consider relaxing them instead:

[[SRC:%.*]]

James

msg-12305-464.txt162 BDownload

Updated the patch for

result overflow documentation
Test case fixes
Refactoring of the expansion of llvm@absdiff intrinsic.

Ping !!

Ping ...

Hi Mikhail & others,

Please review, waiting for the responses for long time.

Regards,
Shahid

Hi Shahid,

The logic in LangRef looks fine to me, but I'd prefer someone else to review the SDAG part.

Michael

Thanks for the changes!

I'm still uncomfortable with the tests. The explicit CHECK-NEXT approach would apply much better to the smaller tests in _128.ll than the big v16i16 one, I think. Regexes would also help make the tests less brittle while keeping them as useful.

Also, should we merge the two test files together? IMO, the _128/_256 distinction isn't very useful.

test/CodeGen/X86/absdiff_128.ll
66–67 ↗	(On Diff #33309)	These lines could be simplified into: ; CHECK: movzbl ; CHECK: movzbl Ditto for others, e.g. pextrw below.
test/CodeGen/X86/absdiff_256.ll
25–27 ↗	(On Diff #33309)	Why do these need CHECK-DAG?

Hi Ahmed,

Thanks for the comments. I will do the needful.

Regarding the test merging, IMO, even if it is not useful now, having a place holder as _256.ll is not a bad idea.

Regards,
Shahid

test/CodeGen/X86/absdiff_256.ll
25–27 ↗	(On Diff #33309)	This is for the case where psubw and pxor comes in different order. Doesn't it qualify for CHECK-DAG?

Updated the tests to make it less brittle.

Ping !!

hfinkel added inline comments.Sep 18 2015, 9:26 AM

docs/LangRef.rst
10873 ↗	(On Diff #34417)	I'd phrase this differently, and say that the intermediate calculations are computed using infinitely-precise unsigned arithmetic. That's similar to the language we used for inbounds GEPs.
10879 ↗	(On Diff #34417)	We cannot have target-defined behavior for target-independent IR intrinsics. You can say only that the result is undefined. If an application wants to rely on target-specific overflow behavior, then it will need to directly use some target-specific intrinsics, inline asm, etc.
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
747 ↗	(On Diff #34417)	Should this be ISD::SETGE for both the signed and unsigned case?

Hi Hal,

Thanks for your comments, will update the doc accordingly.
See other response inlined.

Regards,
Shahid

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
747 ↗	(On Diff #34417)	Here ISD::SETGE is for signed case only, unsigned case is handled with early exit.

hfinkel added inline comments.Sep 20 2015, 5:50 AM

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
747 ↗	(On Diff #34417)	Okay, I see, you're using the TRUNCATE above.

docs/LangRef.rst updated to incorporate Hal's comments

LGTM.

Closed by commit rL248483: Codegen: Fix llvm.*absdiff semantic. (authored by • ashahid). · Explain WhySep 24 2015, 3:36 AM

This revision was automatically updated to reflect the committed changes.

Diff 35601

llvm/trunk/docs/LangRef.rst

This file is larger than 256 KB, so syntax highlighting is disabled by default.

	Show First 20 Lines • Show All 10,914 Lines • ▼ Show 20 Lines
	.. code-block:: llvm			.. code-block:: llvm

	declare <4 x integer> @llvm.uabsdiff.v4i32(<4 x integer> %a, <4 x integer> %b)			declare <4 x integer> @llvm.uabsdiff.v4i32(<4 x integer> %a, <4 x integer> %b)


	Overview:			Overview:
	"""""""""			"""""""""

	The ``llvm.uabsdiff`` intrinsic returns a vector result of the absolute difference of			The ``llvm.uabsdiff`` intrinsic returns a vector result of the absolute difference
	the two operands, treating them both as unsigned integers.			of the two operands, treating them both as unsigned integers. The intermediate
				calculations are computed using infinitely precise unsigned arithmetic. The final
				result will be truncated to the given type.

	The ``llvm.sabsdiff`` intrinsic returns a vector result of the absolute difference of			The ``llvm.sabsdiff`` intrinsic returns a vector result of the absolute difference of
	the two operands, treating them both as signed integers.			the two operands, treating them both as signed integers. If the result overflows, the
				behavior is undefined.

	.. note::			.. note::

	These intrinsics are primarily used during the code generation stage of compilation.			These intrinsics are primarily used during the code generation stage of compilation.
	They are generated by compiler passes such as the Loop and SLP vectorizers.it is not			They are generated by compiler passes such as the Loop and SLP vectorizers. It is not
	recommended for users to create them manually.			recommended for users to create them manually.

	Arguments:			Arguments:
	""""""""""			""""""""""

	Both intrinsics take two integer of the same bitwidth.			Both intrinsics take two integer of the same bitwidth.

	Semantics:			Semantics:
	""""""""""			""""""""""

	The expression::			The expression::

	call <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32> %a, <4 x i32> %b)			call <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32> %a, <4 x i32> %b)

	is equivalent to::			is equivalent to::

	%sub = sub <4 x i32> %a, %b			%1 = zext <4 x i32> %a to <4 x i64>
	%ispos = icmp ugt <4 x i32> %sub, <i32 -1, i32 -1, i32 -1, i32 -1>			%2 = zext <4 x i32> %b to <4 x i64>
	%neg = sub <4 x i32> zeroinitializer, %sub			%sub = sub <4 x i64> %1, %2
	%1 = select <4 x i1> %ispos, <4 x i32> %sub, <4 x i32> %neg			%trunc = trunc <4 x i64> to <4 x i32>

	Similarly the expression::			and the expression::

	call <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32> %a, <4 x i32> %b)			call <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32> %a, <4 x i32> %b)

	is equivalent to::			is equivalent to::

	%sub = sub nsw <4 x i32> %a, %b			%sub = sub nsw <4 x i32> %a, %b
	%ispos = icmp sgt <4 x i32> %sub, <i32 -1, i32 -1, i32 -1, i32 -1>			%ispos = icmp sge <4 x i32> %sub, zeroinitializer
	%neg = sub nsw <4 x i32> zeroinitializer, %sub			%neg = sub nsw <4 x i32> zeroinitializer, %sub
	%1 = select <4 x i1> %ispos, <4 x i32> %sub, <4 x i32> %neg			%1 = select <4 x i1> %ispos, <4 x i32> %sub, <4 x i32> %neg


	Half Precision Floating Point Intrinsics			Half Precision Floating Point Intrinsics
	----------------------------------------			----------------------------------------

	For most target platforms, half precision floating point is a			For most target platforms, half precision floating point is a
	▲ Show 20 Lines • Show All 993 Lines • Show Last 20 Lines

llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp

Show First 20 Lines • Show All 717 Lines • ▼ Show 20 Lines	case ISD::SABSDIFF:
return ExpandABSDIFF(Op);		return ExpandABSDIFF(Op);
default:		default:
return DAG.UnrollVectorOp(Op.getNode());		return DAG.UnrollVectorOp(Op.getNode());
}		}
}		}

SDValue VectorLegalizer::ExpandABSDIFF(SDValue Op) {		SDValue VectorLegalizer::ExpandABSDIFF(SDValue Op) {
SDLoc dl(Op);		SDLoc dl(Op);
SDValue Tmp1, Tmp2, Tmp3, Tmp4;		SDValue Op0 = Op.getOperand(0);
		SDValue Op1 = Op.getOperand(1);
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
SDNodeFlags Flags;
Flags.setNoSignedWrap(Op->getOpcode() == ISD::SABSDIFF);

Tmp2 = Op.getOperand(0);		// For unsigned intrinsic, promote the type to handle unsigned overflow.
Tmp3 = Op.getOperand(1);		bool isUabsdiff = (Op->getOpcode() == ISD::UABSDIFF);
Tmp1 = DAG.getNode(ISD::SUB, dl, VT, Tmp2, Tmp3, &Flags);		if (isUabsdiff) {
Tmp2 =		VT = VT.widenIntegerVectorElementType(*DAG.getContext());
DAG.getNode(ISD::SUB, dl, VT, DAG.getConstant(0, dl, VT), Tmp1, &Flags);		Op0 = DAG.getNode(ISD::ZERO_EXTEND, dl, VT, Op0);
Tmp4 = DAG.getNode(		Op1 = DAG.getNode(ISD::ZERO_EXTEND, dl, VT, Op1);
ISD::SETCC, dl,		}
TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT), Tmp2,
DAG.getConstant(0, dl, VT),		SDNodeFlags Flags;
DAG.getCondCode(Op->getOpcode() == ISD::SABSDIFF ? ISD::SETLT		Flags.setNoSignedWrap(!isUabsdiff);
: ISD::SETULT));		SDValue Sub = DAG.getNode(ISD::SUB, dl, VT, Op0, Op1, &Flags);
Tmp1 = DAG.getNode(ISD::VSELECT, dl, VT, Tmp4, Tmp1, Tmp2);		if (isUabsdiff)
return Tmp1;		return DAG.getNode(ISD::TRUNCATE, dl, Op.getValueType(), Sub);

		SDValue Cmp =
		DAG.getNode(ISD::SETCC, dl, TLI.getSetCCResultType(DAG.getDataLayout(),
		*DAG.getContext(), VT),
		Sub, DAG.getConstant(0, dl, VT), DAG.getCondCode(ISD::SETGE));
		SDValue Neg = DAG.getNode(ISD::SUB, dl, VT, DAG.getConstant(0, dl, VT), Sub, &Flags);
		return DAG.getNode(ISD::VSELECT, dl, VT, Cmp, Sub, Neg);
}		}

SDValue VectorLegalizer::ExpandSELECT(SDValue Op) {		SDValue VectorLegalizer::ExpandSELECT(SDValue Op) {
// Lower a select instruction where the condition is a scalar and the		// Lower a select instruction where the condition is a scalar and the
// operands are vectors. Lower this select to VSELECT and implement it		// operands are vectors. Lower this select to VSELECT and implement it
// using XOR AND OR. The selector bit is broadcasted.		// using XOR AND OR. The selector bit is broadcasted.
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
SDLoc DL(Op);		SDLoc DL(Op);
▲ Show 20 Lines • Show All 305 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/X86/absdiff_128.ll

				; RUN: llc < %s -mtriple=x86_64-unknown-unknown \| FileCheck %s

				declare <4 x i8> @llvm.uabsdiff.v4i8(<4 x i8>, <4 x i8>)

				define <4 x i8> @test_uabsdiff_v4i8_expand(<4 x i8> %a1, <4 x i8> %a2) {
				; CHECK-LABEL: test_uabsdiff_v4i8_expand
				; CHECK: pshufd
				; CHECK: movd
				; CHECK: subl
				; CHECK: punpckldq
				; CHECK-DAG: movd %xmm1, [[SRC:%.*]]
				; CHECK-DAG: movd %xmm0, [[DST:%.*]]
				; CHECK: subl [[SRC]], [[DST]]
				; CHECK: movd
				; CHECK: pshufd
				; CHECK: movd
				; CHECK: punpckldq
				; CHECK: movdqa
				; CHECK: retq

				%1 = call <4 x i8> @llvm.uabsdiff.v4i8(<4 x i8> %a1, <4 x i8> %a2)
				ret <4 x i8> %1
				}

				declare <4 x i8> @llvm.sabsdiff.v4i8(<4 x i8>, <4 x i8>)

				define <4 x i8> @test_sabsdiff_v4i8_expand(<4 x i8> %a1, <4 x i8> %a2) {
				; CHECK-LABEL: test_sabsdiff_v4i8_expand
				; CHECK: psubd
				; CHECK: pcmpgtd
				; CHECK: pcmpeqd
				; CHECK: pxor
				; CHECK-DAG: psubd {{%xmm[0-9]+}}, [[SRC1:%xmm[0-9]+]]
				; CHECK-DAG: pandn {{%xmm[0-9]+}}, [[SRC2:%xmm[0-9]+]]
				; CHECK-DAG: pandn [[SRC1]], [[DST:%xmm[0-9]+]]
				; CHECK: por [[SRC2]], [[DST]]
				; CHECK: retq

				%1 = call <4 x i8> @llvm.sabsdiff.v4i8(<4 x i8> %a1, <4 x i8> %a2)
				ret <4 x i8> %1
				}

				declare <8 x i8> @llvm.sabsdiff.v8i8(<8 x i8>, <8 x i8>)

				define <8 x i8> @test_sabsdiff_v8i8_expand(<8 x i8> %a1, <8 x i8> %a2) {
				; CHECK-LABEL: test_sabsdiff_v8i8_expand
				; CHECK: psubw
				; CHECK: pcmpgtw
				; CHECK: pcmpeqd
				; CHECK: pxor
				; CHECK-DAG: psubw {{%xmm[0-9]+}}, [[SRC1:%xmm[0-9]+]]
				; CHECK-DAG: pandn {{%xmm[0-9]+}}, [[SRC2:%xmm[0-9]+]]
				; CHECK-DAG: pandn [[SRC1]], [[DST:%xmm[0-9]+]]
				; CHECK: por [[SRC2]], [[DST]]
				; CHECK: retq

				%1 = call <8 x i8> @llvm.sabsdiff.v8i8(<8 x i8> %a1, <8 x i8> %a2)
				ret <8 x i8> %1
				}

				declare <16 x i8> @llvm.uabsdiff.v16i8(<16 x i8>, <16 x i8>)

				define <16 x i8> @test_uabsdiff_v16i8_expand(<16 x i8> %a1, <16 x i8> %a2) {
				; CHECK-LABEL: test_uabsdiff_v16i8_expand
				; CHECK: movd
				; CHECK: movzbl
				; CHECK: movzbl
				; CHECK: subl
				; CHECK: punpcklbw
				; CHECK: retq

				%1 = call <16 x i8> @llvm.uabsdiff.v16i8(<16 x i8> %a1, <16 x i8> %a2)
				ret <16 x i8> %1
				}

				declare <8 x i16> @llvm.uabsdiff.v8i16(<8 x i16>, <8 x i16>)

				define <8 x i16> @test_uabsdiff_v8i16_expand(<8 x i16> %a1, <8 x i16> %a2) {
				; CHECK-LABEL: test_uabsdiff_v8i16_expand
				; CHECK: pextrw
				; CHECK: pextrw
				; CHECK: subl
				; CHECK: punpcklwd
				; CHECK: retq

				%1 = call <8 x i16> @llvm.uabsdiff.v8i16(<8 x i16> %a1, <8 x i16> %a2)
				ret <8 x i16> %1
				}

				declare <8 x i16> @llvm.sabsdiff.v8i16(<8 x i16>, <8 x i16>)

				define <8 x i16> @test_sabsdiff_v8i16_expand(<8 x i16> %a1, <8 x i16> %a2) {
				; CHECK-LABEL: test_sabsdiff_v8i16_expand
				; CHECK: psubw
				; CHECK: pcmpgtw
				; CHECK: pcmpeqd
				; CHECK: pxor
				; CHECK-DAG: psubw {{%xmm[0-9]+}}, [[SRC1:%xmm[0-9]+]]
				; CHECK-DAG: pandn {{%xmm[0-9]+}}, [[SRC2:%xmm[0-9]+]]
				; CHECK-DAG: pandn [[SRC1]], [[DST:%xmm[0-9]+]]
				; CHECK: por [[SRC2]], [[DST]]
				; CHECK: retq

				%1 = call <8 x i16> @llvm.sabsdiff.v8i16(<8 x i16> %a1, <8 x i16> %a2)
				ret <8 x i16> %1
				}

				declare <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32>, <4 x i32>)

				define <4 x i32> @test_sabsdiff_v4i32_expand(<4 x i32> %a1, <4 x i32> %a2) {
				; CHECK-LABEL: test_sabsdiff_v4i32_expand
				; CHECK: psubd
				; CHECK: pcmpgtd
				; CHECK: pcmpeqd
				; CHECK: pxor
				; CHECK-DAG: psubd {{%xmm[0-9]+}}, [[SRC1:%xmm[0-9]+]]
				; CHECK-DAG: pandn {{%xmm[0-9]+}}, [[SRC2:%xmm[0-9]+]]
				; CHECK-DAG: pandn [[SRC1]], [[DST:%xmm[0-9]+]]
				; CHECK: por [[SRC2]], [[DST]]
				; CHECK: retq
				%1 = call <4 x i32> @llvm.sabsdiff.v4i32(<4 x i32> %a1, <4 x i32> %a2)
				ret <4 x i32> %1
				}

				declare <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32>, <4 x i32>)

				define <4 x i32> @test_uabsdiff_v4i32_expand(<4 x i32> %a1, <4 x i32> %a2) {
				; CHECK-LABEL: test_uabsdiff_v4i32_expand
				; CHECK: pshufd
				; CHECK: movd
				; CHECK: subl
				; CHECK: punpckldq
				; CHECK-DAG: movd %xmm1, [[SRC:%.*]]
				; CHECK-DAG: movd %xmm0, [[DST:%.*]]
				; CHECK: subl [[SRC]], [[DST]]
				; CHECK: movd
				; CHECK: pshufd
				; CHECK: movd
				; CHECK: punpckldq
				; CHECK: movdqa
				; CHECK: retq

				%1 = call <4 x i32> @llvm.uabsdiff.v4i32(<4 x i32> %a1, <4 x i32> %a2)
				ret <4 x i32> %1
				}

				declare <2 x i32> @llvm.sabsdiff.v2i32(<2 x i32>, <2 x i32>)

				define <2 x i32> @test_sabsdiff_v2i32_expand(<2 x i32> %a1, <2 x i32> %a2) {
				; CHECK-LABEL: test_sabsdiff_v2i32_expand
				; CHECK: psubq
				; CHECK: pcmpgtd
				; CHECK: pcmpeqd
				; CHECK: pxor
				; CHECK-DAG: psubq {{%xmm[0-9]+}}, [[SRC1:%xmm[0-9]+]]
				; CHECK-DAG: pandn {{%xmm[0-9]+}}, [[SRC2:%xmm[0-9]+]]
				; CHECK-DAG: pandn [[SRC1]], [[DST:%xmm[0-9]+]]
				; CHECK: por [[SRC2]], [[DST]]
				; CHECK: retq

				%1 = call <2 x i32> @llvm.sabsdiff.v2i32(<2 x i32> %a1, <2 x i32> %a2)
				ret <2 x i32> %1
				}

				declare <2 x i64> @llvm.sabsdiff.v2i64(<2 x i64>, <2 x i64>)

				define <2 x i64> @test_sabsdiff_v2i64_expand(<2 x i64> %a1, <2 x i64> %a2) {
				; CHECK-LABEL: test_sabsdiff_v2i64_expand
				; CHECK: psubq
				; CHECK: pcmpgtd
				; CHECK: pcmpeqd
				; CHECK: pxor
				; CHECK-DAG: psubq {{%xmm[0-9]+}}, [[SRC1:%xmm[0-9]+]]
				; CHECK-DAG: pandn {{%xmm[0-9]+}}, [[SRC2:%xmm[0-9]+]]
				; CHECK-DAG: pandn [[SRC1]], [[DST:%xmm[0-9]+]]
				; CHECK: por [[SRC2]], [[DST]]
				; CHECK: retq

				%1 = call <2 x i64> @llvm.sabsdiff.v2i64(<2 x i64> %a1, <2 x i64> %a2)
				ret <2 x i64> %1
				}

llvm/trunk/test/CodeGen/X86/absdiff_256.ll

				; RUN: llc < %s -mtriple=x86_64-unknown-unknown \| FileCheck %s

				declare <16 x i16> @llvm.sabsdiff.v16i16(<16 x i16>, <16 x i16>)

				define <16 x i16> @test_sabsdiff_v16i16_expand(<16 x i16> %a1, <16 x i16> %a2) {
				; CHECK-LABEL: test_sabsdiff_v16i16_expand:
				; CHECK: # BB#0:
				; CHECK: psubw
				; CHECK: pxor
				; CHECK: pcmpgtw
				; CHECK: movdqa
				; CHECK: pandn
				; CHECK: pxor
				; CHECK: psubw
				; CHECK: pcmpeqd
				; CHECK: pxor
				; CHECK: pandn
				; CHECK: por
				; CHECK: pcmpgtw
				; CHECK-DAG: psubw {{%xmm[0-9]+}}, [[SRC:%xmm[0-9]+]]
				; CHECK-DAG: pxor {{%xmm[0-9]+}}, [[DST:%xmm[0-9]+]]
				; CHECK: pandn [[SRC]], [[DST]]
				; CHECK: por
				; CHECK: movdqa
				; CHECK: retq
				%1 = call <16 x i16> @llvm.sabsdiff.v16i16(<16 x i16> %a1, <16 x i16> %a2)
				ret <16 x i16> %1
				}

This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Fixes absdiff intrinsic: LangRef doc/test case improvement and corresponding code change
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 35601

llvm/trunk/docs/LangRef.rst

llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp

llvm/trunk/test/CodeGen/X86/absdiff_128.ll

llvm/trunk/test/CodeGen/X86/absdiff_256.ll

This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Fixes *absdiff* intrinsic: LangRef doc/test case improvement and corresponding code changeClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 35601

llvm/trunk/docs/LangRef.rst

llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp

llvm/trunk/test/CodeGen/X86/absdiff_128.ll

llvm/trunk/test/CodeGen/X86/absdiff_256.ll

[CodeGen] Fixes absdiff intrinsic: LangRef doc/test case improvement and corresponding code change
ClosedPublic