This is an archive of the discontinued LLVM Phabricator instance.

Add Integer Saturation Intrinsics.
Needs Review, Public

Authored by ab on Jan 14 2015, 2:02 PM.

Details

Reviewers

echristo
weimingz

Summary

I'm not sure the C-ish examples in the documentation are desirable, but I've added a few examples to make the behavior clear enough.

See the llvmdev thread. As described there:
This patch introduces a new family of intrinsics, for integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed). Quoting the added documentation:

%r = call i32 @llvm.ssat.i32(i32 %x, i32 %n)

is equivalent to the expression min(max(x, -2^(n-1)), 2^(n-1)-1), itself
implementable as the following IR:

%min_sint_n = i32 ... ; the min. signed integer of bitwidth n, -2^(n-1)
%max_sint_n = i32 ... ; the max. signed integer of bitwidth n, 2^(n-1)-1
%0 = icmp slt i32 %x, %min_sint_n
%1 = select i1 %0, i32 %min_sint_n, i32 %x
%2 = icmp sgt i32 %1, %max_sint_n
%r = select i1 %2, i32 %max_sint_n, i32 %1
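
As a cross-check of the formula above, here is a minimal C++ sketch of the signed case. It is a reference model under stated assumptions, not code from the patch: the name ssat32 is hypothetical, the input is fixed at 32 bits, and n is assumed to satisfy 0 < n < 32.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model of @llvm.ssat.i32 (the name is ours, not the patch's).
// Saturates x to the range of an n-bit signed integer, with 0 < n < 32.
int32_t ssat32(int32_t x, unsigned n) {
  assert(n > 0 && n < 32 && "bitwidth must be non-zero and smaller than 32");
  // 2^(n-1)-1, computed in unsigned arithmetic to avoid signed overflow.
  const int32_t max_sint_n = static_cast<int32_t>((uint32_t{1} << (n - 1)) - 1);
  // -2^(n-1)
  const int32_t min_sint_n = -max_sint_n - 1;
  // Mirrors the icmp slt / select / icmp sgt / select chain.
  const int32_t lo = (x < min_sint_n) ? min_sint_n : x;
  return (lo > max_sint_n) ? max_sint_n : lo;
}
```

With this model, ssat32(189, 8) gives 127 and ssat32(-189, 8) gives -128, while ssat32(189, 9) leaves 189 unchanged because it already fits in 9 signed bits.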

Diff Detail

Event Timeline

ab updated this revision to Diff 18181. Jan 14 2015, 2:02 PM

ab updated this revision to Diff 18182.

ab retitled this revision to Add Integer Saturation Intrinsics.

ab updated this object.

ab edited the test plan for this revision.

ab mentioned this in D6977: [CodeGen] Add legalization for Integer Saturation Intrinsics. Jan 14 2015, 2:04 PM

ab added a subscriber: Unknown Object (MLST). Jan 14 2015, 2:09 PM

Do we need to remove the llvm.arm.usat with this patch?

FYI, D8371 would greatly benefit from this patch. I'm adding Eric, Weiming and myself to the review list, as the parties interested in this change going in. Please add whoever else would be good to have in the reviewers list.

cheers,
--renato

Probably we'll have to rebase the whole thing. Let's get that one sorted first, then we see what's left. I have a feeling that it'll be a lot less code from your part.

Hi Ahmed,

This looks clean enough for me, and it matches the ARM semantics. I believe front-ends could generate it directly, but also it could become the end-result of an optimization pass looking for that pattern in IR, right?

cheers,
--renato

In D6976#142319, @rengolin wrote:

Hi Ahmed,

This looks clean enough for me, and it matches the ARM semantics. I believe front-ends could generate it directly, but also it could become the end-result of an optimization pass looking for that pattern in IR, right?

Right, my goal was to have an InstCombine that recognizes this.

However, on the RFC thread there were some concerns with adding intrinsics, and honestly I can see where they're coming from (you'd need to teach the various optimizers about those intrinsics). We can do all this at the SelectionDAG level, with a CodeGenPrepare fixup to get the select/icmp closer together. So, why would we add special-case intrinsics if we can express this in fairly simple canonical IR? We do the same for min/max, and we don't have intrinsics for those.
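
To make the "fairly simple canonical IR" point concrete: the saturation is just a max followed by a min. Below is a minimal C++ sketch of the unsigned case, min(max(x, 0), 2^n-1), with the input interpreted as signed as the proposed documentation states. The name usat32 and the fixed 32-bit width are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of @llvm.usat.i32 written as a plain min/max clamp.
// x is interpreted as signed; the result fits in n unsigned bits, 0 < n < 32.
int32_t usat32(int32_t x, unsigned n) {
  assert(n > 0 && n < 32 && "bitwidth must be non-zero and smaller than 32");
  // 2^n - 1; for n <= 31 this still fits in a signed 32-bit integer.
  const int32_t max_uint_n = static_cast<int32_t>((uint32_t{1} << n) - 1);
  return std::min(std::max(x, int32_t{0}), max_uint_n);
}
```

For example, usat32(345, 8) gives 255, usat32(345, 9) gives 345, and usat32(-1, 8) gives 0, since a negative input clamps to the unsigned minimum.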

Anyway, I'll try to rebase this soon-ish and put my other patches up; sorry I couldn't get to this sooner!

In D6976#142095, @weimingz wrote:

Do we need to remove the llvm.arm.usat with this patch?

For now I'd rather keep the ARM intrinsics, we can deprecate and autoupgrade down the road.

-Ahmed

My problem with implementing this on the ARM side is that the code won't be useful anywhere else.

Maybe a better approach would be to have them just as special DAG nodes, created by a special DAG pass that only runs for selected back-ends, instead of an IR intrinsic or an ARM-specific late pass.

I'm assuming the semantics of such a pass would be very similar on other back-ends...

Hi,

I'm interested in these intrinsics for X86. Do you plan to push them as target independent?

Thanks.

Elena

rengolin resigned from this revision. May 13 2016, 7:14 AM

rengolin removed a reviewer: rengolin.

tkn added a subscriber: tkn. Nov 17 2016, 7:08 PM

Revision Contents

Path                             Size
docs/LangRef.rst                 145 lines
include/llvm/IR/Intrinsics.td    7 lines
lib/IR/Verifier.cpp              13 lines
test/Verifier/saturation.ll      28 lines

Diff 18182

docs/LangRef.rst


	Examples:
	"""""""""

	.. code-block:: llvm

	%r2 = call float @llvm.fmuladd.f32(float %a, float %b, float %c) ; yields float:r2 = (a * b) + c

				Saturating Arithmetic Intrinsics
				--------------------------------

				'``llvm.usat.*``' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Syntax:
				"""""""

				This is an overloaded intrinsic. You can use llvm.usat on any integer
				bit width, or on any vector with integer elements. Not all targets
				support all bit widths or vector types, however.

				::

				declare i16 @llvm.usat.i16(i16 %x, i32 %n)
				declare i32 @llvm.usat.i32(i32 %x, i32 %n)
				declare i64 @llvm.usat.i64(i64 %x, i32 %n)
				declare <2 x i32> @llvm.usat.v2i32(<2 x i32> %x, i32 %n)

				Overview:
				"""""""""

				The '``llvm.usat.*``' intrinsic functions represent unsigned saturation
				to an integer of a specified bitwidth. That is, if the input is greater
				than the maximally representable unsigned integer for the specified bitwidth,
				that maximum integer is returned. Conversely, if the input is less than
				the minimally representable unsigned integer for the specified bitwidth, that
				minimum integer is returned. Otherwise, the input integer is returned.

				Arguments:
				""""""""""

				The '``llvm.usat.*``' intrinsics each take two arguments: the input integer,
				x, and the bitwidth to saturate to, n. The second argument, the bitwidth,
				has to be a non-zero constant value, smaller than the first argument's integer
				size in bits.

				Semantics:
				""""""""""

				The expression:

				.. code-block:: llvm

				%0 = call i32 @llvm.usat.i32(i32 %x, i32 %n)

				is equivalent to the expression min(max(x, 0), 2^n-1), itself implementable as
				the following IR:

				.. code-block:: llvm

				%min_uint_n = i32 ... ; the min. unsigned integer of bitwidth n, 0
				%max_uint_n = i32 ... ; the max. unsigned integer of bitwidth n, 2^n-1
				%0 = icmp slt i32 %x, %min_uint_n
				%1 = select i1 %0, i32 %min_uint_n, i32 %x
				%2 = icmp sgt i32 %1, %max_uint_n
				%3 = select i1 %2, i32 %max_uint_n, i32 %1

				The input integer is interpreted as a signed integer, i.e., all the
				comparisons are signed. The output is an unsigned integer that can be
				truncated to an integer of the specified bitwidth without loss of
				information.

				Examples:
				"""""""""

				.. code-block:: llvm

				%0 = call i32 @llvm.usat.i32(i32 345, i32 8) ; yields i32:255
				%1 = call i32 @llvm.usat.i32(i32 345, i32 9) ; yields i32:345
				%2 = call i32 @llvm.usat.i32(i32 256, i32 8) ; yields i32:255
				%3 = call i32 @llvm.usat.i32(i32 -1, i32 8) ; yields i32:0

				'``llvm.ssat.*``' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Syntax:
				"""""""

				This is an overloaded intrinsic. You can use llvm.ssat on any integer
				bit width, or on any vector with integer elements. Not all targets
				support all bit widths or vector types, however.

				::

				declare i16 @llvm.ssat.i16(i16 %x, i32 %n)
				declare i32 @llvm.ssat.i32(i32 %x, i32 %n)
				declare i64 @llvm.ssat.i64(i64 %x, i32 %n)
				declare <2 x i32> @llvm.ssat.v2i32(<2 x i32> %x, i32 %n)

				Overview:
				"""""""""

				The '``llvm.ssat.*``' intrinsic functions represent signed saturation
				to an integer of a specified bitwidth. That is, if the input is greater
				than the maximally representable signed integer for the specified bitwidth,
				that maximum integer is returned. Conversely, if the input is less than
				the minimally representable signed integer for the specified bitwidth, that
				minimum integer is returned. Otherwise, the input integer is returned.

				Arguments:
				""""""""""

				The '``llvm.ssat.*``' intrinsics each take two arguments: the input integer,
				x, and the bitwidth to saturate to, n. The second argument, the bitwidth,
				has to be a non-zero constant value, smaller than the first argument's integer
				size in bits.

				Semantics:
				""""""""""

				The expression:

				.. code-block:: llvm

				%r = call i32 @llvm.ssat.i32(i32 %x, i32 %n)

				is equivalent to the expression min(max(x, -2^(n-1)), 2^(n-1)-1), itself
				implementable as the following IR:

				.. code-block:: llvm

				%min_sint_n = i32 ... ; the min. signed integer of bitwidth n, -2^(n-1)
				%max_sint_n = i32 ... ; the max. signed integer of bitwidth n, 2^(n-1)-1
				%0 = icmp slt i32 %x, %min_sint_n
				%1 = select i1 %0, i32 %min_sint_n, i32 %x
				%2 = icmp sgt i32 %1, %max_sint_n
				%r = select i1 %2, i32 %max_sint_n, i32 %1

				The input integer is interpreted as a signed integer, i.e., all the
				comparisons are signed. The output is a signed integer that can be
				truncated to an integer of the specified bitwidth without loss of
				information.

				Examples:
				"""""""""

				.. code-block:: llvm

				%0 = call i32 @llvm.ssat.i32(i32 189, i32 8) ; yields i32:127
				%1 = call i32 @llvm.ssat.i32(i32 189, i32 9) ; yields i32:189
				%2 = call i32 @llvm.ssat.i32(i32 -189, i32 8) ; yields i32:-128
				%3 = call i32 @llvm.ssat.i32(i32 -1, i32 8) ; yields i32:-1

	Half Precision Floating Point Intrinsics
	----------------------------------------

	For most target platforms, half precision floating point is a
	storage-only format. This means that it is a dense encoding (in memory)
	but does not support computation in the format.

	This means that code must first load the half-precision floating point

include/llvm/IR/Intrinsics.td


	def int_smul_with_overflow : Intrinsic<[llvm_anyint_ty, llvm_i1_ty],
	[LLVMMatchType<0>, LLVMMatchType<0>],
	[IntrNoMem]>;
	def int_umul_with_overflow : Intrinsic<[llvm_anyint_ty, llvm_i1_ty],
	[LLVMMatchType<0>, LLVMMatchType<0>],
	[IntrNoMem]>;

				//===------------------------ Saturation Intrinsics -----------------------===//
				//
				let Properties = [IntrNoMem] in {
				def int_ssat : Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>, llvm_i32_ty]>;
				def int_usat : Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>, llvm_i32_ty]>;
				}

	//===------------------------- Memory Use Markers -------------------------===//
	//
	def int_lifetime_start : Intrinsic<[],
	[llvm_i64_ty, llvm_ptr_ty],
	[IntrReadWriteArgMem, NoCapture<1>]>;
	def int_lifetime_end : Intrinsic<[],
	[llvm_i64_ty, llvm_ptr_ty],
	[IntrReadWriteArgMem, NoCapture<1>]>;

lib/IR/Verifier.cpp

void Verifier::visitIntrinsicFunctionCall(Intrinsic::ID ID, CallInst &CI) {
default:
break;
case Intrinsic::ctlz: // llvm.ctlz
case Intrinsic::cttz: // llvm.cttz
Assert1(isa<ConstantInt>(CI.getArgOperand(1)),
"is_zero_undef argument of bit counting intrinsics must be a "
"constant int", &CI);
break;
		case Intrinsic::usat:
		case Intrinsic::ssat: {
		ConstantInt *BitC = dyn_cast<ConstantInt>(CI.getArgOperand(1));
		Assert1(BitC,
		"bitwidth argument of saturation intrinsics must be a constant int",
		&CI);
		uint64_t Bit = BitC->getZExtValue();
		Assert1(Bit < CI.getArgOperand(0)->getType()->getScalarSizeInBits() &&
		Bit > 0,
		"bitwidth argument of saturation intrinsics must be larger than "
		"zero, and smaller than the bitwidth of the first (value) argument",
		&CI);
		} break;
case Intrinsic::dbg_declare: { // llvm.dbg.declare
Assert1(CI.getArgOperand(0) && isa<MetadataAsValue>(CI.getArgOperand(0)),
"invalid llvm.dbg.declare intrinsic call 1", &CI);
} break;
case Intrinsic::memcpy:
case Intrinsic::memmove:
case Intrinsic::memset:
Assert1(isa<ConstantInt>(CI.getArgOperand(3)),

test/Verifier/saturation.ll

This file was added.

				; RUN: not llvm-as %s -o /dev/null 2>&1 | FileCheck %s

				declare i32 @llvm.usat.i32(i32, i32)
				declare i32 @llvm.ssat.i32(i32, i32)

				define void @f(i32 %x, i32 %n) {
				entry:
				; CHECK: bitwidth argument of saturation intrinsics must be a constant int
				; CHECK-NEXT: @llvm.usat.i32
				call i32 @llvm.usat.i32(i32 %x, i32 %n)

				; CHECK: bitwidth argument of saturation intrinsics must be a constant int
				; CHECK-NEXT: @llvm.ssat.i32
				call i32 @llvm.ssat.i32(i32 %x, i32 %n)

				; CHECK: bitwidth argument of saturation intrinsics must be larger than zero, and smaller than the bitwidth of the first (value) argument
				; CHECK-NEXT: @llvm.ssat.i32
				call i32 @llvm.ssat.i32(i32 %x, i32 0)

				; CHECK: bitwidth argument of saturation intrinsics must be larger than zero, and smaller than the bitwidth of the first (value) argument
				; CHECK-NEXT: @llvm.ssat.i32
				call i32 @llvm.ssat.i32(i32 %x, i32 32)

				; CHECK: bitwidth argument of saturation intrinsics must be larger than zero, and smaller than the bitwidth of the first (value) argument
				; CHECK-NEXT: @llvm.ssat.i32
				call i32 @llvm.ssat.i32(i32 %x, i32 -1)
				ret void
				}
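
The last test case passes -1 as the bitwidth. The Verifier change reads the constant with getZExtValue(), so an i32 -1 is seen as 4294967295 and fails the upper-bound check. A minimal stand-alone C++ sketch of that range check follows; the helper names isValidSatBitwidth and zextI32 are hypothetical, and the code models the condition only, not the Verifier itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the range check: the bitwidth must be larger than
// zero and smaller than the bit width of the value argument.
bool isValidSatBitwidth(uint64_t bit, unsigned valueBits) {
  return bit > 0 && bit < valueBits;
}

// Models zero-extending a 32-bit constant to 64 bits.
uint64_t zextI32(int32_t c) { return static_cast<uint32_t>(c); }

// Replays the constant-bitwidth cases from the test file for an i32 value.
void checkTestCases() {
  assert(!isValidSatBitwidth(zextI32(0), 32));   // zero is rejected
  assert(!isValidSatBitwidth(zextI32(32), 32));  // equal to the value width
  assert(!isValidSatBitwidth(zextI32(-1), 32));  // -1 zero-extends to 2^32-1
  assert(isValidSatBitwidth(zextI32(8), 32));    // an accepted bitwidth
}
```

The two non-constant cases in the test are caught earlier, by the constant-int assertion, and never reach this range check.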