This is an archive of the discontinued LLVM Phabricator instance.

Differential D55720

[Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic
ClosedPublic

Authored by leonardchan on Dec 14 2018, 2:19 PM.

Details

Reviewers

ebevhan
bjope
craig.topper
RKSimon

Commits

rZORG1579f571b3d1: [Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic
rG1579f571b3d1: [Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic
rG0bada7ce6c12: [Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic
rL361289: [Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic

Summary

Add an intrinsic that takes two signed integers, with their common scale provided as the third argument, and performs fixed point multiplication on them. The result is saturated: it is clamped between the largest and smallest values representable in the bit width of the first two operands.
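The semantics in the summary can be modeled in a few lines of C++. This is a hypothetical reference sketch, not code from the patch. The function names are made up, it only handles 1- to 32-bit operands (using 64-bit intermediates so the product is exact), it assumes `scale < width`, and it rounds toward negative infinity, which is only one of the roundings the intrinsic permits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model of llvm.smul.fix / llvm.smul.fix.sat for
// operand widths of 1..32 bits. Operands are passed as already sign-extended
// int64_t values; the exact product fits in 64 bits because |a|, |b| <= 2^31.
// Rounding is toward negative infinity (arithmetic right shift). Right-shifting
// a negative value is arithmetic on mainstream compilers and guaranteed since C++20.

// Largest and smallest signed values representable in `width` bits.
int64_t maxSigned(unsigned width) { return (int64_t{1} << (width - 1)) - 1; }
int64_t minSigned(unsigned width) { return -(int64_t{1} << (width - 1)); }

// Keep the low `width` bits of `v` and sign-extend them (models `trunc`).
int64_t truncToWidth(int64_t v, unsigned width) {
  uint64_t mask = (uint64_t{1} << width) - 1;
  uint64_t sign = uint64_t{1} << (width - 1);
  uint64_t low = static_cast<uint64_t>(v) & mask;
  return static_cast<int64_t>(low ^ sign) - static_cast<int64_t>(sign);
}

// Exact product shifted right by the scale (rounds toward -infinity).
int64_t scaledProduct(int64_t a, int64_t b, unsigned scale, unsigned width) {
  assert(width >= 1 && width <= 32 && "sketch supports 1..32-bit operands");
  assert(scale < width && "sketch assumes scale < width");
  assert(a >= minSigned(width) && a <= maxSigned(width));
  assert(b >= minSigned(width) && b <= maxSigned(width));
  return (a * b) >> scale;
}

// Non-saturating: an out-of-range result wraps around, like the
// sext/mul/ashr/trunc expansion.
int64_t smulFix(int64_t a, int64_t b, unsigned scale, unsigned width) {
  return truncToWidth(scaledProduct(a, b, scale, width), width);
}

// Saturating: the result is clamped to the signed range of `width` bits.
int64_t smulFixSat(int64_t a, int64_t b, unsigned scale, unsigned width) {
  int64_t r = scaledProduct(a, b, scale, width);
  if (r > maxSigned(width))
    return maxSigned(width);
  if (r < minSigned(width))
    return minSigned(width);
  return r;
}
```

For example, `smulFixSat(2, 4, 0, 4)` clamps 8 to 7, whereas `smulFix(2, 4, 0, 4)` wraps around to -8.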

This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

leonardchan created this revision. Dec 14 2018, 2:19 PM

Herald added a subscriber: hiraditya. Dec 14 2018, 2:19 PM

Nothing sticks out to me, so I think it looks good. Hard to tell if there are any sneaky edge cases in the lowering steps, though.

Maybe you should rebase this on top of the unsigned patch, since they're touching all the same places. Or are you waiting for it to land?

In D55720#1336355, @ebevhan wrote:

Nothing sticks out to me, so I think it looks good. Hard to tell if there are any sneaky edge cases in the lowering steps, though.

Maybe you should rebase this on top of the unsigned patch, since they're touching all the same places. Or are you waiting for it to land?

Yeah, I figure it would be better to submit these as separate patches since they're still technically independent of each other, and ideally that makes them easier to review.

@bjope @craig.topper @RKSimon Any comments on this patch?

RKSimon added inline comments. Jan 9 2019, 2:05 PM

llvm/test/CodeGen/X86/smul_fix_sat.ll
45	nounwind

leonardchan updated this revision to Diff 182085. Jan 16 2019, 9:52 AM

leonardchan marked an inline comment as done.

Should https://reviews.llvm.org/D56987 be a parent for this? Then you'd need to rebase getExpandedFixedPointMultiplication since that has changed into converting into MUL when scale is zero (that is not valid for saturation).

leonardchan added a parent revision: D56987: [Intrinsic] Expand SMULFIX to MUL, MULH[US], or [US]MUL_LOHI on vector arguments. Jan 24 2019, 8:16 AM

leonardchan removed a parent revision: D56987: [Intrinsic] Expand SMULFIX to MUL, MULH[US], or [US]MUL_LOHI on vector arguments.

leonardchan added a child revision: D56987: [Intrinsic] Expand SMULFIX to MUL, MULH[US], or [US]MUL_LOHI on vector arguments.

In D55720#1364944, @bjope wrote:

Should https://reviews.llvm.org/D56987 be a parent for this? Then you'd need to rebase getExpandedFixedPointMultiplication since that has changed into converting into MUL when scale is zero (that is not valid for saturation).

Rebased

RKSimon added inline comments. Jan 30 2019, 1:08 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5767	(style) Do an early out to reduce indentation: `if (!Saturating) return Result;`

Updated and rebased

Please rebase after D55625 lands

Herald added a project: Restricted Project. Feb 1 2019, 5:42 AM

Updated and rebased

RKSimon added inline comments. Feb 4 2019, 10:39 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5711–5729	You've changed the logic to let non-vector cases fall through, which leads to UNDEFs for scale == 0 cases.

leonardchan updated this revision to Diff 185101. Feb 4 2019, 11:18 AM

leonardchan marked an inline comment as done.

leonardchan added a parent revision: D57836: [Intrinsic] Unsigned Fixed Point Saturation Multiplication Intrinsic. Feb 6 2019, 12:26 PM

bjope added inline comments. Feb 7 2019, 12:02 AM

llvm/lib/CodeGen/TargetLoweringBase.cpp
626	I'm not sure how to do this when overriding for a specific target. In our case we want it to be legal, but only when the scale is 15 (and VT is i16 or i24) or the scale is 31 (and VT is i32 or i40). Is there some easy solution for that? Setting it to legal/custom for any scale might be seen as an indication for optimizers that it is OK to introduce these operations for any scale. This is however a general comment, also for the already pushed non-saturating versions. So it isn't anything that you need to deal with in this patch. But we might need a better solution in the long term.

ebevhan added inline comments. Feb 7 2019, 12:33 AM

llvm/lib/CodeGen/TargetLoweringBase.cpp
626	That's what the `isSupportedFixedPointOperation` hook in TargetLowering is for.

bjope added inline comments. Feb 7 2019, 12:40 AM

llvm/test/CodeGen/X86/smul_fix_sat.ll
3	The expansion is quite complicated (the splitting in four parts and detecting overflow etc.). Isn't there a risk that X86 is a typical target that will try to find more optimal solutions and maybe also make SMULFIXSAT legal? Then this test case might not really verify the expand code any longer? On the other hand, these test cases are just gibberish to me anyway. I can't tell from looking at the checks that DAGTypeLegalizer::ExpandIntRes_MULFIX is doing the right thing. And it would not really help to use another target. Are there perhaps other ways to test DAGTypeLegalizer, such as unit tests? One thing that probably can be done quite easily is to add a bunch of tests using constant operands, verifying that DAGCombiner will constant fold to the expected result after having expanded into legal operations (somehow making sure that DAGCombiner does not constant fold the SMULFIXSAT before it has been expanded; I guess someone will add such DAGCombines sooner or later). That way you might be able to get coverage for all paths through DAGTypeLegalizer::ExpandIntRes_MULFIX. Maybe this test should be in a separate test file.
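The overflow detection an expansion has to perform can be illustrated in isolation. After forming the exact double-width product and shifting it right by the scale, the value fits in 32 bits exactly when every bit from bit 31 upward is a copy of the sign bit. The sketch below uses a hypothetical function name and is not the algorithm in `DAGTypeLegalizer::ExpandIntRes_MULFIX` (which splits the operation into parts); it simply applies that test to 32-bit operands with a 64-bit product.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sketch: saturating signed fixed point multiply of two 32-bit
// values. Assumes 0 <= scale < 32 and that right-shifting a negative value is
// arithmetic (guaranteed since C++20, universal on mainstream compilers).
int32_t smulFixSat32(int32_t a, int32_t b, unsigned scale) {
  assert(scale < 32 && "sketch assumes scale < 32");
  int64_t prod = int64_t{a} * int64_t{b};  // exact 64-bit product
  int64_t shifted = prod >> scale;         // rounds toward -infinity
  // `shifted` fits in an int32 iff bits 31..63 all equal the sign bit,
  // i.e. iff shifted >> 31 is 0 (non-negative) or -1 (negative).
  int64_t high = shifted >> 31;
  if (high > 0)
    return std::numeric_limits<int32_t>::max();  // positive overflow
  if (high < -1)
    return std::numeric_limits<int32_t>::min();  // negative overflow
  return static_cast<int32_t>(shifted);
}
```

With scale 31 (a Q31 format), `smulFixSat32(INT32_MIN, INT32_MIN, 31)` models -1.0 x -1.0, whose exact result 1.0 is not representable and therefore saturates to the maximum value.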

bjope added inline comments. Feb 7 2019, 1:13 AM

llvm/lib/CodeGen/TargetLoweringBase.cpp
626	Ah, yes! And that is updated in this patch. Just me being blind (in combination with some amnesia).
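The per-scale restriction bjope describes (legal only for scale 15 with i16/i24 and scale 31 with i32/i40) can be expressed as a simple predicate. The sketch below is hypothetical and standalone: it uses a plain bit width instead of LLVM's `EVT`, a made-up `Action` enum instead of `LegalizeAction`, and invented function names. In LLVM the equivalent logic would sit in a target's override of the `isSupportedFixedPointOperation` hook, which the TargetLowering.h hunk of this diff calls as `isSupportedFixedPointOperation(Op, VT, Scale)`.

```cpp
#include <cassert>

// Hypothetical target policy: fixed point multiplication is natively
// supported only for scale 15 with 16- or 24-bit operands and for scale 31
// with 32- or 40-bit operands.
bool isSupportedScale(unsigned bitWidth, unsigned scale) {
  assert(bitWidth > 0 && "bit width must be positive");
  switch (bitWidth) {
  case 16:
  case 24:
    return scale == 15;
  case 32:
  case 40:
    return scale == 31;
  default:
    return false;
  }
}

// Mirrors the shape of getFixedPointOperationAction in the diff: the requested
// action only stands if the scale is supported; otherwise fall back to Expand.
enum class Action { Legal, Custom, Expand };

Action fixedPointAction(Action requested, unsigned bitWidth, unsigned scale) {
  return isSupportedScale(bitWidth, scale) ? requested : Action::Expand;
}
```

Because an unsupported scale falls back to expansion, optimizers that query the action would not treat the operation as cheap for arbitrary scales, which is the concern raised in the inline comment.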

RKSimon added inline comments. Feb 7 2019, 4:54 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5699	Why drop the assert? Why not just add ISD::SMULFIXSAT tests?
5712–5729	I think you need something like this here (please double check my logic):

    if (VT.isVector() && !isOperationLegalOrCustom(ISD::SMULO, VT) &&
        !(!Saturating && isOperationLegalOrCustom(ISD::MUL, VT)))
      return SDValue(); // unroll

And that will let you avoid the return SDValue() below by always defaulting to a scalar ISD::MUL/ISD::SMULO that legalization can handle.

RKSimon mentioned this in rL353546: [TargetLowering] Use ISD::FSHR in expandFixedPointMul. Feb 8 2019, 10:57 AM

RKSimon mentioned this in rGeb6a47a46274: [TargetLowering] Use ISD::FSHR in expandFixedPointMul.

RKSimon added inline comments. Feb 8 2019, 11:05 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5712–5729	I've committed rL353546 which /should/ mean that the scale==0 case is now safe to drop through.

leonardchan updated this revision to Diff 186055. Feb 8 2019, 3:16 PM

leonardchan marked 6 inline comments as done.

Fixed mistake related to saturation during expansion. When we promote the operand and result type widths, this also changes the saturation width and affects the min/max values we compare against. This is easily solved by also shifting one of the operands after extension.
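The promotion problem and its fix can be checked with a small numeric model. This is a hypothetical sketch, not the code in LegalizeIntegerTypes.cpp. It promotes an i4 operation to i8 using plain integers and compares a naive promotion with one where the first operand is shifted left by the width difference (which operand the patch shifts is an assumption here). The final arithmetic right shift is this sketch's way of returning to the narrow range.

```cpp
#include <cassert>
#include <cstdint>

// Saturating signed fixed point multiply at `width` bits (1..32), rounding
// toward -infinity. Operands are sign-extended int64_t values.
int64_t smulFixSatN(int64_t a, int64_t b, unsigned scale, unsigned width) {
  assert(width >= 1 && width <= 32 && scale < width);
  const int64_t maxV = (int64_t{1} << (width - 1)) - 1;
  const int64_t minV = -(int64_t{1} << (width - 1));
  assert(a >= minV && a <= maxV && b >= minV && b <= maxV);
  int64_t r = (a * b) >> scale;
  return r > maxV ? maxV : (r < minV ? minV : r);
}

// Naive promotion: compute at the wide width, then truncate to the narrow
// width. Saturation now happens at the wide bounds, so a narrow overflow wraps.
int64_t naivePromoted(int64_t a, int64_t b, unsigned scale, unsigned narrow,
                      unsigned wide) {
  int64_t r = smulFixSatN(a, b, scale, wide);
  uint64_t mask = (uint64_t{1} << narrow) - 1;
  uint64_t sign = uint64_t{1} << (narrow - 1);
  uint64_t low = static_cast<uint64_t>(r) & mask;
  return static_cast<int64_t>(low ^ sign) - static_cast<int64_t>(sign);
}

// Promotion with one operand shifted left by (wide - narrow): the wide
// saturation bounds now correspond to the narrow ones, and an arithmetic
// right shift by the same amount brings the result back to the narrow range.
int64_t shiftedPromoted(int64_t a, int64_t b, unsigned scale, unsigned narrow,
                        unsigned wide) {
  assert(narrow >= 1 && narrow <= wide);
  unsigned diff = wide - narrow;
  // a << diff, written as a multiply to avoid left-shifting a negative value.
  int64_t aWide = a * (int64_t{1} << diff);
  return smulFixSatN(aWide, b, scale, wide) >> diff;
}
```

For `7 x 2` at scale 0, the naive i8 result 14 truncates to -2 in i4, while the shifted version saturates to 7 as the i4 operation should.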

Oh I forgot to submit these inline comments

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5712–5729	I thought we still want to allow vectors to pass to calls to `MUL` and `SMULO`? Wouldn't this scalarize when we disallow vectors even if `MUL` and `SMULO` are legal?
llvm/test/CodeGen/X86/smul_fix_sat.ll
3	Yeah, I can see how these tests are hard to read. I wasn't aware of other ways this could be tested besides making sure the codegen is the same each time. I have my own scripts with different cases to verify the output is correct, but wasn't sure of any existing, widely used method of "taking my IR, running it, and verifying the results". Testing with constant operands seems to produce better looking tests for non-saturating multiplication:

    define i4 @func() {
    ; X64-LABEL: func:
    ; X64: # %bb.0:
    ; X64-NEXT: movb $3, %al
    ; X64-NEXT: retq
      %tmp = call i4 @llvm.smul.fix.i4(i4 3, i4 2, i32 1)
      ret i4 %tmp
    }

where we can immediately tell the result is 3, but there's still branching in the saturating case:

    define i4 @func2() {
    ; X64-LABEL: func2:
    ; X64: # %bb.0:
    ; X64-NEXT: xorl %eax, %eax
    ; X64-NEXT: testb %al, %al
    ; X64-NEXT: movb $127, %cl
    ; X64-NEXT: jg .LBB1_2
    ; X64-NEXT: # %bb.1:
    ; X64-NEXT: movb $3, %cl
    ; X64-NEXT: .LBB1_2:
    ; X64-NEXT: movb $-1, %al
    ; X64-NEXT: negb %al
    ; X64-NEXT: movb $-128, %al
    ; X64-NEXT: jl .LBB1_4
    ; X64-NEXT: # %bb.3:
    ; X64-NEXT: movl %ecx, %eax
    ; X64-NEXT: .LBB1_4:
    ; X64-NEXT: retq
      %tmp = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 2, i32 1)
      ret i4 %tmp
    }

so we can't get something as straightforward as with non-saturating multiplication.

Herald added a subscriber: jdoerfert. Feb 20 2019, 5:58 PM

leonardchan updated this revision to Diff 187955. Feb 22 2019, 11:26 AM

leonardchan marked an inline comment as done.

leonardchan added inline comments.

llvm/test/CodeGen/X86/smul_fix_sat.ll
3	@bjope I added another test file that covers the saturation branches in ExpandIntRes_MULFIX using constant operands, although this doesn't seem to produce anything more readable than with variable operands.

bjope added inline comments. Feb 22 2019, 4:27 PM

llvm/test/CodeGen/X86/smul_fix_sat.ll
3	Maybe it doesn't fold due to lack of constant folding for SMUL_LOHI (at least not for x86). What a pity. I tried running the test using -mtriple=x86_64--, which at least produces code that is easier to map to the expansion. I also tried some other targets:
  -mtriple=ppc32 => looks like we get some constant folding here
  -mtriple=ppc64 => asserts in llvm::SelectionDAG::transferDbgValues
  -mtriple=hexagon => asserts in llvm::SelectionDAG::transferDbgValues
  -mtriple=systemz => asserts in llvm::SelectionDAG::transferDbgValues
  -mtriple=sparc => LLVM ERROR: Cannot select: t42: i32,i32 = addcarry t41:1, Constant:i32<0>, t90:1
(FWIW, no idea if the asserts and LLVM ERROR are actually related to your patch.)

*ping*

llvm/test/CodeGen/X86/smul_fix_sat.ll
3	Updated the test to use `-mtriple=x86_64-linux` and it looks a lot more readable.

bjope added inline comments. Mar 7 2019, 8:10 AM

llvm/test/CodeGen/X86/smul_fix_sat.ll
3	Have you looked at the problem with asserts in llvm::SelectionDAG::transferDbgValues? It happens when expanding smulfixsat, so something seems to be broken regarding the legalization (depending on target used).

leonardchan added a parent revision: D59119: [SelectionDAG] Check legality for ADDCARRY in expandMUL_LOHI. Mar 7 2019, 5:03 PM

leonardchan removed a parent revision: D59119: [SelectionDAG] Check legality for ADDCARRY in expandMUL_LOHI.

leonardchan added a child revision: D59119: [SelectionDAG] Check legality for ADDCARRY in expandMUL_LOHI.

leonardchan updated this revision to Diff 189803. Mar 7 2019, 5:11 PM

leonardchan marked an inline comment as done.

leonardchan added inline comments.

llvm/test/CodeGen/X86/smul_fix_sat.ll
3	For `addcarry`, the problem seems to be that `ISD::ADDCARRY` is not supported on some 32 bit targets. The fix for this is just a check in `expandMUL_LOHI` to see if this operation is legal (https://reviews.llvm.org/D59119). For `llvm::SelectionDAG::transferDbgValues`, this is because `expandFixedPointMul` returns an empty `SDValue()` to indicate this function failed due to some unsupported operation (most likely `ISD::SMULO`). I imagine the simplest solution for this is to just `report_fatal_error` since we do not have other operations we can use to perform saturation multiplication.

Sorry for the holdup.

@bjope D61411 addresses the LLVM_ERROR from expanding ADDCARRY, so now we can compile the test for all triples you brought up before. Also updated the tests to reflect these changes, and am still able to confirm on my end that the intrinsic produces the correct results.

leonardchan marked 2 inline comments as done. May 9 2019, 4:05 PM

leonardchan added inline comments.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5712–5729	Dropped and can confirm this works for my tests

bjope added inline comments. May 10 2019, 12:07 PM

llvm/docs/LangRef.rst
13380	I think this part about unspecified rounding direction either needs to be explained more thoroughly somewhere, or we need to find a way to make it possible to specify it. Since the same problem already exists for smul.fix and umul.fix, this shouldn't necessarily be a stopper for this patch. But I think it poses some problems that there sometimes are two correct results. Are we, for example, supposed to prohibit constant folding (allowing backend targets to implement a specific rounding scheme)? Then I guess we can't implement the promotion/legalization part either, at least not without specifying which rounding scheme the legalization will use. It would be weird to prohibit constant folding in the first place, while at the same time legalization into generic ISD operations might result in DAG combiner actually ending up constant folding the expression.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
4504	Is this newline by mistake? Seems unrelated to the patch.

Ka-Ka added a subscriber: Ka-Ka. May 13 2019, 1:56 PM

leonardchan marked 4 inline comments as done. May 14 2019, 12:51 PM

leonardchan added inline comments.

llvm/docs/LangRef.rst
13380	Hmm. I'm starting to regret not specifying an argument for rounding when making these intrinsics. Would there be any large consequences for changing the intrinsics to accept a 4th argument for rounding?
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
4504	Accidental newline

leonardchan updated this revision to Diff 199498. May 14 2019, 12:52 PM

leonardchan marked an inline comment as done.

bjope added inline comments. May 15 2019, 7:35 AM

llvm/docs/LangRef.rst
13330	Do we need a new "chapter" for this? Maybe we can just continue the "Fixed Point Arithmetic Intrinsics" chapter here, and skip the general description below that only refers to "Fixed Point Arithmetic Intrinsics" and "Saturation Arithmetic". If needed, we can add the "(see Saturation Arithmetic)" somewhere in the semantic description of llvm.smul.fix.sat.* instead.
13380	Had a short discussion with @ebevhan about this (offline). Adding the 4th argument for rounding would make things clearer (avoiding "unspecified"), but would also make things more complicated (how many rounding modes should be supported? do we need to support folding/promotion/lowering etc. for all different kinds of rounding modes? how do we verify all those modes?). So despite my comment above, we think that the way forward is to keep the solution with "unspecified" for now (we already have it for the non-saturating intrinsics). But to avoid confusion when people are reading the LangRef and looking at the code etc., we probably want to describe what "rounding direction is unspecified" means somewhere (for example in the introduction about "Fixed Point Arithmetic Intrinsics"), explaining things like:
  - different optimizations (and legalization) in the pipeline are free to do the rounding in whatever direction they want to (but I think the accuracy of the result still should be well-defined, so we need to say something about that?)
  - KnownBits/ValueTracking can't assume the direction of the rounding.
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2718	Even if rounding is unspecified, I believe this code is implementing some kind of rounding scheme. Should we perhaps say something about this in the function header? It can be of help when looking at the code in the future, both to understand what the intention was with the original algorithm and to understand the expected result when looking at test results etc. Or for some target to understand why a "legal"/"custom" lowering gives a different result compared to "expand".

leonardchan updated this revision to Diff 199655. May 15 2019, 12:18 PM

leonardchan marked 5 inline comments as done.

leonardchan added inline comments.

llvm/docs/LangRef.rst
13330	We probably don't need this. Especially since the fixed point section comes after the saturation section.
13380	I added more detail to the overview explaining the default expansion for multiplication and rounding to say that targets should specify their own hooks if they care about rounding and optimizations/legalizations should be performed based off that hook. Let me know if there's something else important that should be added.
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2718	Added
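The accuracy requirement discussed in this thread can be made concrete. With raw operands A and B at scale s, the exact result is A*B / 2^s units of 2^-s, and the LangRef text added by this patch requires the error to be less than one unit, i.e. |R * 2^s - A*B| < 2^s for a returned raw value R. The minimal sketch below uses hypothetical helper names to show that both rounding toward negative infinity (the arithmetic shift used in the LangRef expansion example) and round-half-up satisfy this, using the 1.5 x -1.5 and 7.5 x 0.5 examples from the LangRef text.

```cpp
#include <cassert>
#include <cstdint>

// `prod` is the exact product of the two raw (scaled-integer) operands and
// `scale` is their shared scale; both helpers return a result in units of
// 2^-scale.

// Round toward negative infinity (arithmetic right shift).
int64_t roundDown(int64_t prod, unsigned scale) {
  assert(scale < 62);
  return prod >> scale;
}

// Round to nearest, ties toward positive infinity.
int64_t roundHalfUp(int64_t prod, unsigned scale) {
  assert(scale < 62);
  if (scale == 0)
    return prod;
  return (prod + (int64_t{1} << (scale - 1))) >> scale;
}

// True if `result` is within one unit of precision of the exact value,
// i.e. |result - prod / 2^scale| < 1, checked without division as
// |result * 2^scale - prod| < 2^scale.
bool withinOneUnit(int64_t result, int64_t prod, unsigned scale) {
  assert(scale < 62);
  int64_t err = result * (int64_t{1} << scale) - prod;
  if (err < 0)
    err = -err;
  return err < (int64_t{1} << scale);
}
```

For 1.5 x -1.5 at scale 1 (raw product -9), `roundDown` gives -5 and `roundHalfUp` gives -4, matching the "-5 (or -4)" example; a result of -6 fails `withinOneUnit`. For 7.5 x 0.5 (raw product 15), the two helpers give 7 and 8, matching "7 (or 8)".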

LGTM! (if all comments from other reviewers have been taken care of) (maybe you should wait another day to see if anyone else objects, but I think this patch has been open for a long time, so there has been plenty of time for comments already)

Btw, I'm still having a hard time reviewing the X86 test cases with very long results, and a little bit worried that those just will give annoying churn when doing unrelated patches in the future, rather than help out detecting problems related to smul.fix. I currently have no more ideas on how to improve that.

Hopefully I'll be able to run some runtime comparison tests between X86 and our OOT target when this has landed (and when I've adapted our target to use these new intrinsics using "legal" lowering and not "expand").

This revision is now accepted and ready to land. May 17 2019, 8:31 AM

In D55720#1506687, @bjope wrote:

LGTM! (if all comments from other reviewers have been taken care of) (maybe you should wait another day to see if anyone else objects, but I think this patch has been open for a long time, so there has been plenty of time for comments already)

Btw, I'm still having a hard time reviewing the X86 test cases with very long results, and a little bit worried that those just will give annoying churn when doing unrelated patches in the future, rather than help out detecting problems related to smul.fix. I currently have no more ideas on how to improve that.

Hopefully I'll be able to run some runtime comparison tests between X86 and our OOT target when this has landed (and when I've adapted our target to use these new intrinsics using "legal" lowering and not "expand").

Thanks. Unless anyone has other comments, I'll attempt to commit this start of next week.

Closed by commit rL361289: [Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic (authored by leonardchan). May 21 2019, 12:14 PM

This revision was automatically updated to reflect the committed changes.

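Revision Contents

Path                                                       Size
llvm/docs/LangRef.rst                                      95 lines
llvm/include/llvm/CodeGen/ISDOpcodes.h                     5 lines
llvm/include/llvm/CodeGen/TargetLowering.h                 1 line
llvm/include/llvm/IR/Intrinsics.td                         6 lines
llvm/include/llvm/Target/TargetSelectionDAG.td             1 line
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp              2 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp     142 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp        1 line
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp      2 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp      8 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp       2 lines
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp           56 lines
llvm/lib/CodeGen/TargetLoweringBase.cpp                    1 line
llvm/lib/IR/Verifier.cpp                                   13 lines
llvm/test/CodeGen/X86/smul_fix_sat.ll                      739 lines
llvm/test/CodeGen/X86/smul_fix_sat_constants.ll            101 lines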

Diff 199655

llvm/docs/LangRef.rst

@@ ... @@

 A fixed point number represents a real data type for a number that has a fixed
 number of digits after a radix point (equivalent to the decimal point '.').
 The number of digits after the radix point is referred to as the ``scale``. These
 are useful for representing fractional values to a specific precision. The
 following intrinsics perform fixed point arithmetic operations on 2 operands
 of the same scale, specified as the third argument.

+The ``llvm.*mul.fix`` family of intrinsic functions represents a multiplication
+of fixed point numbers through scaled integers. Therefore, fixed point
+multiplication can be represented as
+
+::
+
+      %result = call i4 @llvm.smul.fix.i4(i4 %a, i4 %b, i32 %scale)
+      =>
+      %a2 = sext i4 %a to i8
+      %b2 = sext i4 %b to i8
+      %mul = mul nsw i8 %a2, %b2
+      %scale2 = trunc i32 %scale to i8
+      %r = ashr i8 %mul, %scale2  ; this is for a target rounding down towards negative infinity
+      %result = trunc i8 %r to i4
+
+For each of these functions, if the result cannot be represented exactly with
+the provided scale, the result is rounded. Rounding is unspecified since the
+preferred rounding may vary between targets; a target can specify it through
+a target hook. Different pipelines should legalize or optimize this using the
+rounding specified by this hook if it is provided. Operations like constant
+folding, instruction combining, KnownBits, and ValueTracking should also use
+this hook, if provided, and not assume the direction of rounding. A rounded
+result must always be within one unit of precision of the true result. That
+is, the error between the returned result and the true result must be less
+than 1/2^(scale).


 '``llvm.smul.fix.*``' Intrinsics
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Syntax
 """""""

 This is an overloaded intrinsic. You can use ``llvm.smul.fix``
@@ ... @@ .. code-block:: llvm

       %res = call i4 @llvm.umul.fix.i4(i4 3, i4 2, i32 0)  ; %res = 6 (2 x 3 = 6)
       %res = call i4 @llvm.umul.fix.i4(i4 3, i4 2, i32 1)  ; %res = 3 (1.5 x 1 = 1.5)

       ; The result in the following could be rounded down to 3.5 or up to 4
       %res = call i4 @llvm.umul.fix.i4(i4 15, i4 1, i32 1)  ; %res = 7 (or 8) (7.5 x 0.5 = 3.75)


+'``llvm.smul.fix.sat.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.smul.fix.sat``
+on any integer bit width or vectors of integers.
+
+::
+
+      declare i16 @llvm.smul.fix.sat.i16(i16 %a, i16 %b, i32 %scale)
+      declare i32 @llvm.smul.fix.sat.i32(i32 %a, i32 %b, i32 %scale)
+      declare i64 @llvm.smul.fix.sat.i64(i64 %a, i64 %b, i32 %scale)
+      declare <4 x i32> @llvm.smul.fix.sat.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
+
+Overview
+"""""""""
+
+The '``llvm.smul.fix.sat``' family of intrinsic functions performs signed
+fixed point saturation multiplication on 2 arguments of the same scale.
+
+Arguments
+""""""""""
+
+The arguments (``%a`` and ``%b``) and the result may be of integer types of any
+bit width, but they must have the same bit width. ``%a`` and ``%b`` are the two
+values that will undergo signed fixed point multiplication. The argument
+``%scale`` represents the scale of both operands, and must be a constant
+integer.
+
+Semantics:
+""""""""""
+
+This operation performs fixed point multiplication on the 2 arguments of a
+specified scale. The result will also be returned in the same scale specified
+in the third argument.
+
+If the result value cannot be precisely represented in the given scale, the
+value is rounded up or down to the closest representable value. The rounding
+direction is unspecified.
+
+The maximum value this operation can clamp to is the largest signed value
+representable by the bit width of the first 2 arguments. The minimum value is the
+smallest signed value representable by this bit width.
+
+
+Examples
+"""""""""
+
+.. code-block:: llvm
+
+      %res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 2, i32 0)  ; %res = 6 (2 x 3 = 6)
+      %res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 2, i32 1)  ; %res = 3 (1.5 x 1 = 1.5)
+      %res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 -2, i32 1)  ; %res = -3 (1.5 x -1 = -1.5)
+
+      ; The result in the following could be rounded up to -2 or down to -2.5
+      %res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 -3, i32 1)  ; %res = -5 (or -4) (1.5 x -1.5 = -2.25)
+
+      ; Saturation
+      %res = call i4 @llvm.smul.fix.sat.i4(i4 7, i4 2, i32 0)  ; %res = 7
+      %res = call i4 @llvm.smul.fix.sat.i4(i4 7, i4 7, i32 2)  ; %res = 7 (1.75 x 1.75 = 3.0625 -> clamped to 1.75)
+      %res = call i4 @llvm.smul.fix.sat.i4(i4 -8, i4 5, i32 2)  ; %res = -8 (-2 x 1.25 = -2.5 -> clamped to -2)
+      %res = call i4 @llvm.smul.fix.sat.i4(i4 -8, i4 -2, i32 1)  ; %res = 7 (-4 x -1 = 4 -> clamped to 3.5)
+
+      ; Scale can affect the saturation result
+      %res = call i4 @llvm.smul.fix.sat.i4(i4 2, i4 4, i32 0)  ; %res = 7 (2 x 4 -> clamped to 7)
+      %res = call i4 @llvm.smul.fix.sat.i4(i4 2, i4 4, i32 1)  ; %res = 4 (1 x 2 = 2)
+

 Specialised Arithmetic Intrinsics
 ---------------------------------

 '``llvm.canonicalize.*``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Syntax:
 """""""

llvm/include/llvm/CodeGen/ISDOpcodes.h

Show First 20 Lines • Show All 272 Lines • ▼ Show 20 Lines	enum NodeType {

/// RESULT = [US]MULFIX(LHS, RHS, SCALE) - Perform fixed point multiplication on		/// RESULT = [US]MULFIX(LHS, RHS, SCALE) - Perform fixed point multiplication on
/// 2 integers with the same width and scale. SCALE represents the scale of		/// 2 integers with the same width and scale. SCALE represents the scale of
/// both operands as fixed point numbers. This SCALE parameter must be a		/// both operands as fixed point numbers. This SCALE parameter must be a
/// constant integer. A scale of zero is effectively performing		/// constant integer. A scale of zero is effectively performing
/// multiplication on 2 integers.		/// multiplication on 2 integers.
SMULFIX, UMULFIX,		SMULFIX, UMULFIX,

		/// Same as the corresponding unsaturated fixed point instructions, but the
		/// result is clamped between the min and max values representable by the
		/// bits of the first 2 operands.
		SMULFIXSAT,

/// Simple binary floating point operators.		/// Simple binary floating point operators.
FADD, FSUB, FMUL, FDIV, FREM,		FADD, FSUB, FMUL, FDIV, FREM,

/// Constrained versions of the binary floating point operators.		/// Constrained versions of the binary floating point operators.
/// These will be lowered to the simple operators before final selection.		/// These will be lowered to the simple operators before final selection.
/// They are used to limit optimizations while the DAG is being		/// They are used to limit optimizations while the DAG is being
/// optimized.		/// optimized.
STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV, STRICT_FREM,		STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV, STRICT_FREM,
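The ISD comment above defines SMULFIXSAT only in words. As a plain C++ sketch (not LLVM code), the following models the result for i32 operands under stated assumptions: the exact product is formed in 64 bits, shifted right by the scale with rounding toward negative infinity (the behaviour documented for the default expansion; rounding is otherwise unspecified), and clamped to the i32 range. `smulFixSatRef` and `floorShift` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Floor of P / 2^Scale, i.e. rounding toward negative infinity. Written so it
// does not depend on how >> treats negative values before C++20.
int64_t floorShift(int64_t P, unsigned Scale) {
  if (P >= 0)
    return P >> Scale;
  int64_t Div = int64_t{1} << Scale;
  return -((-P + Div - 1) >> Scale);
}

// Hypothetical reference model of SMULFIXSAT for i32 operands that share the
// same Scale (number of fractional bits).
int32_t smulFixSatRef(int32_t LHS, int32_t RHS, unsigned Scale) {
  assert(Scale < 32 && "a signed i32 scale must be below the bit width");
  // The exact product of two i32 values always fits in 64 bits.
  int64_t Product = int64_t{LHS} * int64_t{RHS};
  int64_t Scaled = floorShift(Product, Scale);
  // Clamp to the range representable by the operands' bit width.
  if (Scaled > std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max();
  if (Scaled < std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(Scaled);
}
```

For instance, with a scale of 31 (Q0.31), -1.0 * -1.0 saturates to INT32_MAX, and with a scale of 1, -1.5 * 0.5 = -0.75 comes out as -1.0. With a scale of 0, 65536 * 65536 wraps to 0 in i32, so the sign of a wrapped product alone cannot tell which bound to clamp to; the direction follows from the signs of the two operands.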

llvm/include/llvm/CodeGen/TargetLowering.h

Show First 20 Lines • Show All 849 Lines • ▼ Show 20 Lines	LegalizeAction getFixedPointOperationAction(unsigned Op, EVT VT,

// This operation is supported in this type but may only work on specific		// This operation is supported in this type but may only work on specific
// scales.		// scales.
bool Supported;		bool Supported;
switch (Op) {		switch (Op) {
default:		default:
llvm_unreachable("Unexpected fixed point operation.");		llvm_unreachable("Unexpected fixed point operation.");
case ISD::SMULFIX:		case ISD::SMULFIX:
		case ISD::SMULFIXSAT:
case ISD::UMULFIX:		case ISD::UMULFIX:
Supported = isSupportedFixedPointOperation(Op, VT, Scale);		Supported = isSupportedFixedPointOperation(Op, VT, Scale);
break;		break;
}		}

return Supported ? Action : Expand;		return Supported ? Action : Expand;
}		}


llvm/include/llvm/IR/Intrinsics.td

	def int_smul_fix : Intrinsic<[llvm_anyint_ty],			def int_smul_fix : Intrinsic<[llvm_anyint_ty],
	[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],			[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
	[IntrNoMem, IntrSpeculatable, Commutative, ImmArg<2>]>;			[IntrNoMem, IntrSpeculatable, Commutative, ImmArg<2>]>;

	def int_umul_fix : Intrinsic<[llvm_anyint_ty],			def int_umul_fix : Intrinsic<[llvm_anyint_ty],
	[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],			[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
	[IntrNoMem, IntrSpeculatable, Commutative, ImmArg<2>]>;			[IntrNoMem, IntrSpeculatable, Commutative, ImmArg<2>]>;

				//===------------------- Fixed Point Saturation Arithmetic Intrinsics ----------------===//
				//
				def int_smul_fix_sat : Intrinsic<[llvm_anyint_ty],
				[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
				[IntrNoMem, IntrSpeculatable, Commutative, ImmArg<2>]>;

	//===------------------------- Memory Use Markers -------------------------===//			//===------------------------- Memory Use Markers -------------------------===//
	//			//
	def int_lifetime_start : Intrinsic<[],			def int_lifetime_start : Intrinsic<[],
	[llvm_i64_ty, llvm_anyptr_ty],			[llvm_i64_ty, llvm_anyptr_ty],
	[IntrArgMemOnly, NoCapture<1>, ImmArg<0>]>;			[IntrArgMemOnly, NoCapture<1>, ImmArg<0>]>;
	def int_lifetime_end : Intrinsic<[],			def int_lifetime_end : Intrinsic<[],
	[llvm_i64_ty, llvm_anyptr_ty],			[llvm_i64_ty, llvm_anyptr_ty],
	[IntrArgMemOnly, NoCapture<1>, ImmArg<0>]>;			[IntrArgMemOnly, NoCapture<1>, ImmArg<0>]>;

llvm/include/llvm/Target/TargetSelectionDAG.td

Show First 20 Lines • Show All 385 Lines • ▼ Show 20 Lines	def umax : SDNode<"ISD::UMAX" , SDTIntBinOp,
[SDNPCommutative, SDNPAssociative]>;		[SDNPCommutative, SDNPAssociative]>;

def saddsat : SDNode<"ISD::SADDSAT" , SDTIntBinOp, [SDNPCommutative]>;		def saddsat : SDNode<"ISD::SADDSAT" , SDTIntBinOp, [SDNPCommutative]>;
def uaddsat : SDNode<"ISD::UADDSAT" , SDTIntBinOp, [SDNPCommutative]>;		def uaddsat : SDNode<"ISD::UADDSAT" , SDTIntBinOp, [SDNPCommutative]>;
def ssubsat : SDNode<"ISD::SSUBSAT" , SDTIntBinOp>;		def ssubsat : SDNode<"ISD::SSUBSAT" , SDTIntBinOp>;
def usubsat : SDNode<"ISD::USUBSAT" , SDTIntBinOp>;		def usubsat : SDNode<"ISD::USUBSAT" , SDTIntBinOp>;

def smulfix : SDNode<"ISD::SMULFIX" , SDTIntScaledBinOp, [SDNPCommutative]>;		def smulfix : SDNode<"ISD::SMULFIX" , SDTIntScaledBinOp, [SDNPCommutative]>;
		def smulfixsat : SDNode<"ISD::SMULFIXSAT", SDTIntScaledBinOp, [SDNPCommutative]>;
def umulfix : SDNode<"ISD::UMULFIX" , SDTIntScaledBinOp, [SDNPCommutative]>;		def umulfix : SDNode<"ISD::UMULFIX" , SDTIntScaledBinOp, [SDNPCommutative]>;

def sext_inreg : SDNode<"ISD::SIGN_EXTEND_INREG", SDTExtInreg>;		def sext_inreg : SDNode<"ISD::SIGN_EXTEND_INREG", SDTExtInreg>;
def sext_invec : SDNode<"ISD::SIGN_EXTEND_VECTOR_INREG", SDTExtInvec>;		def sext_invec : SDNode<"ISD::SIGN_EXTEND_VECTOR_INREG", SDTExtInvec>;
def zext_invec : SDNode<"ISD::ZERO_EXTEND_VECTOR_INREG", SDTExtInvec>;		def zext_invec : SDNode<"ISD::ZERO_EXTEND_VECTOR_INREG", SDTExtInvec>;

def abs : SDNode<"ISD::ABS" , SDTIntUnaryOp>;		def abs : SDNode<"ISD::ABS" , SDTIntUnaryOp>;
def bitreverse : SDNode<"ISD::BITREVERSE" , SDTIntUnaryOp>;		def bitreverse : SDNode<"ISD::BITREVERSE" , SDTIntUnaryOp>;

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

Show First 20 Lines • Show All 1,128 Lines • ▼ Show 20 Lines	#endif
case ISD::SADDSAT:		case ISD::SADDSAT:
case ISD::UADDSAT:		case ISD::UADDSAT:
case ISD::SSUBSAT:		case ISD::SSUBSAT:
case ISD::USUBSAT: {		case ISD::USUBSAT: {
Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));		Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
break;		break;
}		}
case ISD::SMULFIX:		case ISD::SMULFIX:
		case ISD::SMULFIXSAT:
case ISD::UMULFIX: {		case ISD::UMULFIX: {
unsigned Scale = Node->getConstantOperandVal(2);		unsigned Scale = Node->getConstantOperandVal(2);
Action = TLI.getFixedPointOperationAction(Node->getOpcode(),		Action = TLI.getFixedPointOperationAction(Node->getOpcode(),
Node->getValueType(0), Scale);		Node->getValueType(0), Scale);
break;		break;
}		}
case ISD::MSCATTER:		case ISD::MSCATTER:
Action = TLI.getOperationAction(Node->getOpcode(),		Action = TLI.getOperationAction(Node->getOpcode(),
▲ Show 20 Lines • Show All 2,145 Lines • ▼ Show 20 Lines	case ISD::ROTR:
break;		break;
case ISD::SADDSAT:		case ISD::SADDSAT:
case ISD::UADDSAT:		case ISD::UADDSAT:
case ISD::SSUBSAT:		case ISD::SSUBSAT:
case ISD::USUBSAT:		case ISD::USUBSAT:
Results.push_back(TLI.expandAddSubSat(Node, DAG));		Results.push_back(TLI.expandAddSubSat(Node, DAG));
break;		break;
case ISD::SMULFIX:		case ISD::SMULFIX:
		case ISD::SMULFIXSAT:
case ISD::UMULFIX:		case ISD::UMULFIX:
Results.push_back(TLI.expandFixedPointMul(Node, DAG));		Results.push_back(TLI.expandFixedPointMul(Node, DAG));
break;		break;
case ISD::ADDCARRY:		case ISD::ADDCARRY:
case ISD::SUBCARRY: {		case ISD::SUBCARRY: {
SDValue LHS = Node->getOperand(0);		SDValue LHS = Node->getOperand(0);
SDValue RHS = Node->getOperand(1);		SDValue RHS = Node->getOperand(1);
SDValue Carry = Node->getOperand(2);		SDValue Carry = Node->getOperand(2);

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

Show First 20 Lines • Show All 143 Lines • ▼ Show 20 Lines	#endif
case ISD::ADDCARRY:		case ISD::ADDCARRY:
case ISD::SUBCARRY: Res = PromoteIntRes_ADDSUBCARRY(N, ResNo); break;		case ISD::SUBCARRY: Res = PromoteIntRes_ADDSUBCARRY(N, ResNo); break;

case ISD::SADDSAT:		case ISD::SADDSAT:
case ISD::UADDSAT:		case ISD::UADDSAT:
case ISD::SSUBSAT:		case ISD::SSUBSAT:
case ISD::USUBSAT: Res = PromoteIntRes_ADDSUBSAT(N); break;		case ISD::USUBSAT: Res = PromoteIntRes_ADDSUBSAT(N); break;
case ISD::SMULFIX:		case ISD::SMULFIX:
		case ISD::SMULFIXSAT:
case ISD::UMULFIX: Res = PromoteIntRes_MULFIX(N); break;		case ISD::UMULFIX: Res = PromoteIntRes_MULFIX(N); break;
case ISD::ABS: Res = PromoteIntRes_ABS(N); break;		case ISD::ABS: Res = PromoteIntRes_ABS(N); break;

case ISD::ATOMIC_LOAD:		case ISD::ATOMIC_LOAD:
Res = PromoteIntRes_Atomic0(cast<AtomicSDNode>(N)); break;		Res = PromoteIntRes_Atomic0(cast<AtomicSDNode>(N)); break;

case ISD::ATOMIC_LOAD_ADD:		case ISD::ATOMIC_LOAD_ADD:
case ISD::ATOMIC_LOAD_SUB:		case ISD::ATOMIC_LOAD_SUB:
▲ Show 20 Lines • Show All 505 Lines • ▼ Show 20 Lines	SDValue Result =
DAG.getNode(Opcode, dl, PromotedType, Op1Promoted, Op2Promoted);		DAG.getNode(Opcode, dl, PromotedType, Op1Promoted, Op2Promoted);
return DAG.getNode(ShiftOp, dl, PromotedType, Result, ShiftAmount);		return DAG.getNode(ShiftOp, dl, PromotedType, Result, ShiftAmount);
}		}

SDValue DAGTypeLegalizer::PromoteIntRes_MULFIX(SDNode *N) {		SDValue DAGTypeLegalizer::PromoteIntRes_MULFIX(SDNode *N) {
// Can just promote the operands then continue with operation.		// Can just promote the operands then continue with operation.
SDLoc dl(N);		SDLoc dl(N);
SDValue Op1Promoted, Op2Promoted;		SDValue Op1Promoted, Op2Promoted;
if (N->getOpcode() == ISD::SMULFIX) {		bool Signed =
		N->getOpcode() == ISD::SMULFIX || N->getOpcode() == ISD::SMULFIXSAT;
		if (Signed) {
Op1Promoted = SExtPromotedInteger(N->getOperand(0));		Op1Promoted = SExtPromotedInteger(N->getOperand(0));
Op2Promoted = SExtPromotedInteger(N->getOperand(1));		Op2Promoted = SExtPromotedInteger(N->getOperand(1));
} else {		} else {
Op1Promoted = ZExtPromotedInteger(N->getOperand(0));		Op1Promoted = ZExtPromotedInteger(N->getOperand(0));
Op2Promoted = ZExtPromotedInteger(N->getOperand(1));		Op2Promoted = ZExtPromotedInteger(N->getOperand(1));
}		}
		EVT OldType = N->getOperand(0).getValueType();
EVT PromotedType = Op1Promoted.getValueType();		EVT PromotedType = Op1Promoted.getValueType();
		unsigned DiffSize =
		PromotedType.getScalarSizeInBits() - OldType.getScalarSizeInBits();

		bool Saturating = N->getOpcode() == ISD::SMULFIXSAT;
		if (Saturating) {
		// Promoting the operand and result values changes the saturation width,
		// which extends the values that we clamp to on saturation. This is resolved
		// by shifting one of the operands left by DiffSize, which also shifts the
		// result we compare against, and then shifting the result back.
		EVT ShiftTy = TLI.getShiftAmountTy(PromotedType, DAG.getDataLayout());
		Op1Promoted = DAG.getNode(ISD::SHL, dl, PromotedType, Op1Promoted,
		DAG.getConstant(DiffSize, dl, ShiftTy));
		SDValue Result = DAG.getNode(N->getOpcode(), dl, PromotedType, Op1Promoted,
		Op2Promoted, N->getOperand(2));
		unsigned ShiftOp = Signed ? ISD::SRA : ISD::SRL;
		return DAG.getNode(ShiftOp, dl, PromotedType, Result,
		DAG.getConstant(DiffSize, dl, ShiftTy));
		}
return DAG.getNode(N->getOpcode(), dl, PromotedType, Op1Promoted, Op2Promoted,		return DAG.getNode(N->getOpcode(), dl, PromotedType, Op1Promoted, Op2Promoted,
N->getOperand(2));		N->getOperand(2));
}		}
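PromoteIntRes_MULFIX cannot simply sign-extend the operands for SMULFIXSAT, because the operation would then saturate at the promoted type's bounds. The plain C++ sketch below, assuming an i8 operation promoted to i32, checks that shifting one operand left by the width difference, saturating at i32, and shifting the result back arithmetically gives the same answer as saturating directly at i8, for every input and every valid signed scale. All function names are hypothetical, and rounding toward negative infinity is assumed throughout.

```cpp
#include <cassert>
#include <cstdint>

// Floor of P / 2^Scale (round toward negative infinity).
int64_t floorShift(int64_t P, unsigned Scale) {
  if (P >= 0)
    return P >> Scale;
  int64_t Div = int64_t{1} << Scale;
  return -((-P + Div - 1) >> Scale);
}

int64_t clampTo(int64_t V, int64_t Min, int64_t Max) {
  return V < Min ? Min : (V > Max ? Max : V);
}

// Reference: SMULFIXSAT evaluated directly at i8.
int8_t smulFixSatI8(int8_t LHS, int8_t RHS, unsigned Scale) {
  int64_t Scaled = floorShift(int64_t{LHS} * RHS, Scale);
  return static_cast<int8_t>(clampTo(Scaled, INT8_MIN, INT8_MAX));
}

// SMULFIXSAT evaluated at i32.
int32_t smulFixSatI32(int32_t LHS, int32_t RHS, unsigned Scale) {
  int64_t Scaled = floorShift(int64_t{LHS} * RHS, Scale);
  return static_cast<int32_t>(clampTo(Scaled, INT32_MIN, INT32_MAX));
}

// Promotion in the style of PromoteIntRes_MULFIX: sign-extend both operands,
// shift the first left by DiffSize, saturate at i32, then shift back with an
// arithmetic (flooring) right shift.
int8_t promotedSMulFixSat(int8_t LHS, int8_t RHS, unsigned Scale) {
  const unsigned DiffSize = 32 - 8;
  int32_t Op1 = int32_t{LHS} * (int32_t{1} << DiffSize); // LHS << 24, no UB
  int32_t Op2 = RHS;                                     // sign extension
  int32_t Wide = smulFixSatI32(Op1, Op2, Scale);
  return static_cast<int8_t>(floorShift(Wide, DiffSize));
}

// Promotion without the pre-shift: saturation happens at the i32 bounds and
// the result is then truncated back to i8.
int8_t naivePromotedSMulFixSat(int8_t LHS, int8_t RHS, unsigned Scale) {
  int32_t Low = smulFixSatI32(LHS, RHS, Scale) & 0xFF;
  return static_cast<int8_t>(Low >= 128 ? Low - 256 : Low);
}

// Exhaustively compare the promoted form with the direct i8 result.
bool promotionMatchesReference() {
  for (int A = -128; A <= 127; ++A)
    for (int B = -128; B <= 127; ++B)
      for (unsigned Scale = 0; Scale < 8; ++Scale)
        if (promotedSMulFixSat(static_cast<int8_t>(A), static_cast<int8_t>(B),
                               Scale) !=
            smulFixSatI8(static_cast<int8_t>(A), static_cast<int8_t>(B), Scale))
          return false;
  return true;
}
```

`naivePromotedSMulFixSat(-128, -128, 7)` shows the failure mode the pre-shift avoids: -1.0 * -1.0 should saturate to 127 in Q0.7, but without the shift the i32 result 128 is truncated back to -128.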

SDValue DAGTypeLegalizer::PromoteIntRes_SADDSUBO(SDNode *N, unsigned ResNo) {		SDValue DAGTypeLegalizer::PromoteIntRes_SADDSUBO(SDNode *N, unsigned ResNo) {
if (ResNo == 1)		if (ResNo == 1)
return PromoteIntRes_Overflow(N);		return PromoteIntRes_Overflow(N);

▲ Show 20 Lines • Show All 431 Lines • ▼ Show 20 Lines	bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
case ISD::SUBCARRY: Res = PromoteIntOp_ADDSUBCARRY(N, OpNo); break;		case ISD::SUBCARRY: Res = PromoteIntOp_ADDSUBCARRY(N, OpNo); break;

case ISD::FRAMEADDR:		case ISD::FRAMEADDR:
case ISD::RETURNADDR: Res = PromoteIntOp_FRAMERETURNADDR(N); break;		case ISD::RETURNADDR: Res = PromoteIntOp_FRAMERETURNADDR(N); break;

case ISD::PREFETCH: Res = PromoteIntOp_PREFETCH(N, OpNo); break;		case ISD::PREFETCH: Res = PromoteIntOp_PREFETCH(N, OpNo); break;

case ISD::SMULFIX:		case ISD::SMULFIX:
		case ISD::SMULFIXSAT:
case ISD::UMULFIX: Res = PromoteIntOp_MULFIX(N); break;		case ISD::UMULFIX: Res = PromoteIntOp_MULFIX(N); break;

case ISD::FPOWI: Res = PromoteIntOp_FPOWI(N); break;		case ISD::FPOWI: Res = PromoteIntOp_FPOWI(N); break;

case ISD::VECREDUCE_ADD:		case ISD::VECREDUCE_ADD:
case ISD::VECREDUCE_MUL:		case ISD::VECREDUCE_MUL:
case ISD::VECREDUCE_AND:		case ISD::VECREDUCE_AND:
case ISD::VECREDUCE_OR:		case ISD::VECREDUCE_OR:
▲ Show 20 Lines • Show All 546 Lines • ▼ Show 20 Lines	#endif
case ISD::USUBO: ExpandIntRes_UADDSUBO(N, Lo, Hi); break;		case ISD::USUBO: ExpandIntRes_UADDSUBO(N, Lo, Hi); break;
case ISD::UMULO:		case ISD::UMULO:
case ISD::SMULO: ExpandIntRes_XMULO(N, Lo, Hi); break;		case ISD::SMULO: ExpandIntRes_XMULO(N, Lo, Hi); break;

case ISD::SADDSAT:		case ISD::SADDSAT:
case ISD::UADDSAT:		case ISD::UADDSAT:
case ISD::SSUBSAT:		case ISD::SSUBSAT:
case ISD::USUBSAT: ExpandIntRes_ADDSUBSAT(N, Lo, Hi); break;		case ISD::USUBSAT: ExpandIntRes_ADDSUBSAT(N, Lo, Hi); break;

case ISD::SMULFIX:		case ISD::SMULFIX:
		case ISD::SMULFIXSAT:
case ISD::UMULFIX: ExpandIntRes_MULFIX(N, Lo, Hi); break;		case ISD::UMULFIX: ExpandIntRes_MULFIX(N, Lo, Hi); break;

case ISD::VECREDUCE_ADD:		case ISD::VECREDUCE_ADD:
case ISD::VECREDUCE_MUL:		case ISD::VECREDUCE_MUL:
case ISD::VECREDUCE_AND:		case ISD::VECREDUCE_AND:
case ISD::VECREDUCE_OR:		case ISD::VECREDUCE_OR:
case ISD::VECREDUCE_XOR:		case ISD::VECREDUCE_XOR:
case ISD::VECREDUCE_SMAX:		case ISD::VECREDUCE_SMAX:
▲ Show 20 Lines • Show All 981 Lines • ▼ Show 20 Lines
}		}

void DAGTypeLegalizer::ExpandIntRes_ADDSUBSAT(SDNode *N, SDValue &Lo,		void DAGTypeLegalizer::ExpandIntRes_ADDSUBSAT(SDNode *N, SDValue &Lo,
SDValue &Hi) {		SDValue &Hi) {
SDValue Result = TLI.expandAddSubSat(N, DAG);		SDValue Result = TLI.expandAddSubSat(N, DAG);
SplitInteger(Result, Lo, Hi);		SplitInteger(Result, Lo, Hi);
}		}

		/// This performs an expansion of the integer result for a fixed point
		/// multiplication. The default expansion rounds down towards negative
		/// infinity, though targets that care about rounding should specify a target
		/// hook for rounding and provide their own expansion or lowering of fixed
		/// point multiplication to be consistent with that rounding.
void DAGTypeLegalizer::ExpandIntRes_MULFIX(SDNode *N, SDValue &Lo,		void DAGTypeLegalizer::ExpandIntRes_MULFIX(SDNode *N, SDValue &Lo,
		bjope: Even if rounding is unspecified, I believe this code is implementing some kind of rounding scheme. Should we perhaps say something about this in the function header? It can be of help when looking at the code in the future, both to understand what the intention was with the original algorithm and to understand the expected result when looking at test results etc. Or for some target to understand why a "legal"/"custom" lowering gives a different result compared to "expand".
		leonardchan (Author): Added
SDValue &Hi) {		SDValue &Hi) {
assert(
(N->getOpcode() == ISD::SMULFIX || N->getOpcode() == ISD::UMULFIX) &&
"Expected operand to be signed or unsigned fixed point multiplication");

SDLoc dl(N);		SDLoc dl(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
		unsigned VTSize = VT.getScalarSizeInBits();
SDValue LHS = N->getOperand(0);		SDValue LHS = N->getOperand(0);
SDValue RHS = N->getOperand(1);		SDValue RHS = N->getOperand(1);
uint64_t Scale = N->getConstantOperandVal(2);		uint64_t Scale = N->getConstantOperandVal(2);
		bool Saturating = N->getOpcode() == ISD::SMULFIXSAT;
		EVT BoolVT =
		TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);
		SDValue Zero = DAG.getConstant(0, dl, VT);
if (!Scale) {		if (!Scale) {
SDValue Result = DAG.getNode(ISD::MUL, dl, VT, LHS, RHS);		SDValue Result;
		if (!Saturating) {
		Result = DAG.getNode(ISD::MUL, dl, VT, LHS, RHS);
		} else {
		Result = DAG.getNode(ISD::SMULO, dl, DAG.getVTList(VT, BoolVT), LHS, RHS);
		SDValue Product = Result.getValue(0);
		SDValue Overflow = Result.getValue(1);

		APInt MinVal = APInt::getSignedMinValue(VTSize);
		APInt MaxVal = APInt::getSignedMaxValue(VTSize);
		SDValue SatMin = DAG.getConstant(MinVal, dl, VT);
		SDValue SatMax = DAG.getConstant(MaxVal, dl, VT);
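		// The exact product is negative iff exactly one operand is negative. The
		// sign of the wrapped Product cannot be used for this (e.g. 16 * 16 wraps to
		// 0 in i8), so derive the saturation direction from the operands instead.
		SDValue Xor = DAG.getNode(ISD::XOR, dl, VT, LHS, RHS);
		SDValue ProdNeg = DAG.getSetCC(dl, BoolVT, Xor, Zero, ISD::SETLT);
		Result = DAG.getSelect(dl, VT, ProdNeg, SatMin, SatMax);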
		Result = DAG.getSelect(dl, VT, Overflow, Result, Product);
		}
SplitInteger(Result, Lo, Hi);		SplitInteger(Result, Lo, Hi);
return;		return;
}		}

EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);		EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
SDValue LL, LH, RL, RH;		SDValue LL, LH, RL, RH;
GetExpandedInteger(LHS, LL, LH);		GetExpandedInteger(LHS, LL, LH);
GetExpandedInteger(RHS, RL, RH);		GetExpandedInteger(RHS, RL, RH);
SmallVector<SDValue, 4> Result;		SmallVector<SDValue, 4> Result;

bool Signed = N->getOpcode() == ISD::SMULFIX;		bool Signed = (N->getOpcode() == ISD::SMULFIX ||
		N->getOpcode() == ISD::SMULFIXSAT);
unsigned LoHiOp = Signed ? ISD::SMUL_LOHI : ISD::UMUL_LOHI;		unsigned LoHiOp = Signed ? ISD::SMUL_LOHI : ISD::UMUL_LOHI;
if (!TLI.expandMUL_LOHI(LoHiOp, VT, dl, LHS, RHS, Result, NVT, DAG,		if (!TLI.expandMUL_LOHI(LoHiOp, VT, dl, LHS, RHS, Result, NVT, DAG,
TargetLowering::MulExpansionKind::OnlyLegalOrCustom,		TargetLowering::MulExpansionKind::OnlyLegalOrCustom,
LL, LH, RL, RH)) {		LL, LH, RL, RH)) {
report_fatal_error("Unable to expand MUL_FIX using MUL_LOHI.");		report_fatal_error("Unable to expand MUL_FIX using MUL_LOHI.");
return;		return;
}		}

unsigned VTSize = VT.getScalarSizeInBits();
unsigned NVTSize = NVT.getScalarSizeInBits();		unsigned NVTSize = NVT.getScalarSizeInBits();
		assert((VTSize == NVTSize * 2) && "Expected the new value type to be half "
		"the size of the current value type");
EVT ShiftTy = TLI.getShiftAmountTy(NVT, DAG.getDataLayout());		EVT ShiftTy = TLI.getShiftAmountTy(NVT, DAG.getDataLayout());

// Shift whole amount by scale.		// Shift whole amount by scale.
SDValue ResultLL = Result[0];		SDValue ResultLL = Result[0];
SDValue ResultLH = Result[1];		SDValue ResultLH = Result[1];
SDValue ResultHL = Result[2];		SDValue ResultHL = Result[2];
SDValue ResultHH = Result[3];		SDValue ResultHH = Result[3];

		SDValue SatMax, SatMin;
		SDValue NVTZero = DAG.getConstant(0, dl, NVT);
		SDValue NVTNeg1 = DAG.getConstant(-1, dl, NVT);
		EVT BoolNVT =
		TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), NVT);

// After getting the multiplication result in 4 parts, we need to perform a		// After getting the multiplication result in 4 parts, we need to perform a
// shift right by the amount of the scale to get the result in that scale.		// shift right by the amount of the scale to get the result in that scale.
// Let's say we multiply 2 64 bit numbers. The resulting value can be held in		// Let's say we multiply 2 64 bit numbers. The resulting value can be held in
// 128 bits that are cut into 4 32-bit parts:		// 128 bits that are cut into 4 32-bit parts:
//		//
// HH HL LH LL		// HH HL LH LL
// |---32---|---32---|---32---|---32---|		// |---32---|---32---|---32---|---32---|
// 128 96 64 32 0		// 128 96 64 32 0
Show All 12 Lines	if (Scale < NVTSize) {
SDValue SRLAmnt = DAG.getConstant(Scale, dl, ShiftTy);		SDValue SRLAmnt = DAG.getConstant(Scale, dl, ShiftTy);
SDValue SHLAmnt = DAG.getConstant(NVTSize - Scale, dl, ShiftTy);		SDValue SHLAmnt = DAG.getConstant(NVTSize - Scale, dl, ShiftTy);
Lo = DAG.getNode(ISD::SRL, dl, NVT, ResultLL, SRLAmnt);		Lo = DAG.getNode(ISD::SRL, dl, NVT, ResultLL, SRLAmnt);
Lo = DAG.getNode(ISD::OR, dl, NVT, Lo,		Lo = DAG.getNode(ISD::OR, dl, NVT, Lo,
DAG.getNode(ISD::SHL, dl, NVT, ResultLH, SHLAmnt));		DAG.getNode(ISD::SHL, dl, NVT, ResultLH, SHLAmnt));
Hi = DAG.getNode(ISD::SRL, dl, NVT, ResultLH, SRLAmnt);		Hi = DAG.getNode(ISD::SRL, dl, NVT, ResultLH, SRLAmnt);
Hi = DAG.getNode(ISD::OR, dl, NVT, Hi,		Hi = DAG.getNode(ISD::OR, dl, NVT, Hi,
DAG.getNode(ISD::SHL, dl, NVT, ResultHL, SHLAmnt));		DAG.getNode(ISD::SHL, dl, NVT, ResultHL, SHLAmnt));

		// We cannot overflow past HH when multiplying 2 ints of size VTSize, so the
		// highest bit of HH determines saturation direction in the event of
		// saturation.
		// The number of overflow bits we can check is VTSize - Scale + 1 (we
		// include the sign bit). If these top bits are > 0, then we overflowed past
		// the max value. If these top bits are < -1, then we overflowed past the
		// min value. Otherwise, we did not overflow.
		if (Saturating) {
		unsigned OverflowBits = VTSize - Scale + 1;
		assert(OverflowBits <= VTSize && OverflowBits > NVTSize &&
		"Extent of overflow bits must start within HL");
		SDValue HLHiMask = DAG.getConstant(
		APInt::getHighBitsSet(NVTSize, OverflowBits - NVTSize), dl, NVT);
		SDValue HLLoMask = DAG.getConstant(
		APInt::getLowBitsSet(NVTSize, VTSize - OverflowBits), dl, NVT);

		// HH > 0 or HH == 0 && HL > HLLoMask
		SDValue HHPos = DAG.getSetCC(dl, BoolNVT, ResultHH, NVTZero, ISD::SETGT);
		SDValue HHZero = DAG.getSetCC(dl, BoolNVT, ResultHH, NVTZero, ISD::SETEQ);
		SDValue HLPos =
		DAG.getSetCC(dl, BoolNVT, ResultHL, HLLoMask, ISD::SETUGT);
		SatMax = DAG.getNode(ISD::OR, dl, BoolNVT, HHPos,
		DAG.getNode(ISD::AND, dl, BoolNVT, HHZero, HLPos));

		// HH < -1 or HH == -1 && HL < HLHiMask
		SDValue HHNeg = DAG.getSetCC(dl, BoolNVT, ResultHH, NVTNeg1, ISD::SETLT);
		SDValue HHNeg1 = DAG.getSetCC(dl, BoolNVT, ResultHH, NVTNeg1, ISD::SETEQ);
		SDValue HLNeg =
		DAG.getSetCC(dl, BoolNVT, ResultHL, HLHiMask, ISD::SETULT);
		SatMin = DAG.getNode(ISD::OR, dl, BoolNVT, HHNeg,
		DAG.getNode(ISD::AND, dl, BoolNVT, HHNeg1, HLNeg));
		}
} else if (Scale == NVTSize) {		} else if (Scale == NVTSize) {
// If the scales are equal, Lo and Hi are ResultLH and ResultHL,		// If the scales are equal, Lo and Hi are ResultLH and ResultHL,
// respectively. Avoid shifting to prevent undefined behavior.		// respectively. Avoid shifting to prevent undefined behavior.
Lo = ResultLH;		Lo = ResultLH;
Hi = ResultHL;		Hi = ResultHL;

		// We overflow max if HH > 0 or HH == 0 && HL sign is negative.
		// We overflow min if HH < -1 or HH == -1 && HL sign is 0.
		if (Saturating) {
		SDValue HHPos = DAG.getSetCC(dl, BoolNVT, ResultHH, NVTZero, ISD::SETGT);
		SDValue HHZero = DAG.getSetCC(dl, BoolNVT, ResultHH, NVTZero, ISD::SETEQ);
		SDValue HLNeg = DAG.getSetCC(dl, BoolNVT, ResultHL, NVTZero, ISD::SETLT);
		SatMax = DAG.getNode(ISD::OR, dl, BoolNVT, HHPos,
		DAG.getNode(ISD::AND, dl, BoolNVT, HHZero, HLNeg));

		SDValue HHNeg = DAG.getSetCC(dl, BoolNVT, ResultHH, NVTNeg1, ISD::SETLT);
		SDValue HHNeg1 = DAG.getSetCC(dl, BoolNVT, ResultHH, NVTNeg1, ISD::SETEQ);
		SDValue HLPos = DAG.getSetCC(dl, BoolNVT, ResultHL, NVTZero, ISD::SETGE);
		SatMin = DAG.getNode(ISD::OR, dl, BoolNVT, HHNeg,
		DAG.getNode(ISD::AND, dl, BoolNVT, HHNeg1, HLPos));
		}
} else if (Scale < VTSize) {		} else if (Scale < VTSize) {
// If the scale is instead less than the old VT size, but greater than or		// If the scale is instead less than the old VT size, but greater than or
// equal to the expanded VT size, the first part of the result (ResultLL) is		// equal to the expanded VT size, the first part of the result (ResultLL) is
// no longer a part of Lo because it would be scaled out anyway. Instead we		// no longer a part of Lo because it would be scaled out anyway. Instead we
// can start shifting right from the fourth part (ResultHH) to the second		// can start shifting right from the fourth part (ResultHH) to the second
// part (ResultLH), and ResultLH will be the new Lo.		// part (ResultLH), and ResultLH will be the new Lo.
SDValue SRLAmnt = DAG.getConstant(Scale - NVTSize, dl, ShiftTy);		SDValue SRLAmnt = DAG.getConstant(Scale - NVTSize, dl, ShiftTy);
SDValue SHLAmnt = DAG.getConstant(VTSize - Scale, dl, ShiftTy);		SDValue SHLAmnt = DAG.getConstant(VTSize - Scale, dl, ShiftTy);
Lo = DAG.getNode(ISD::SRL, dl, NVT, ResultLH, SRLAmnt);		Lo = DAG.getNode(ISD::SRL, dl, NVT, ResultLH, SRLAmnt);
Lo = DAG.getNode(ISD::OR, dl, NVT, Lo,		Lo = DAG.getNode(ISD::OR, dl, NVT, Lo,
DAG.getNode(ISD::SHL, dl, NVT, ResultHL, SHLAmnt));		DAG.getNode(ISD::SHL, dl, NVT, ResultHL, SHLAmnt));
Hi = DAG.getNode(ISD::SRL, dl, NVT, ResultHL, SRLAmnt);		Hi = DAG.getNode(ISD::SRL, dl, NVT, ResultHL, SRLAmnt);
Hi = DAG.getNode(ISD::OR, dl, NVT, Hi,		Hi = DAG.getNode(ISD::OR, dl, NVT, Hi,
DAG.getNode(ISD::SHL, dl, NVT, ResultHH, SHLAmnt));		DAG.getNode(ISD::SHL, dl, NVT, ResultHH, SHLAmnt));

		// This is similar to the case when we saturate if Scale < NVTSize, but we
		// only need to check HH.
		if (Saturating) {
		unsigned OverflowBits = VTSize - Scale + 1;
		SDValue HHHiMask = DAG.getConstant(
		APInt::getHighBitsSet(NVTSize, OverflowBits), dl, NVT);
		SDValue HHLoMask = DAG.getConstant(
		APInt::getLowBitsSet(NVTSize, NVTSize - OverflowBits), dl, NVT);

		SatMax = DAG.getSetCC(dl, BoolNVT, ResultHH, HHLoMask, ISD::SETGT);
		SatMin = DAG.getSetCC(dl, BoolNVT, ResultHH, HHHiMask, ISD::SETLT);
		}
} else if (Scale == VTSize) {		} else if (Scale == VTSize) {
assert(		assert(
!Signed &&		!Signed &&
"Only unsigned types can have a scale equal to the operand bit width");		"Only unsigned types can have a scale equal to the operand bit width");

Lo = ResultHL;		Lo = ResultHL;
Hi = ResultHH;		Hi = ResultHH;
} else {		} else {
llvm_unreachable("Expected the scale to be less than or equal to the width "		llvm_unreachable("Expected the scale to be less than or equal to the width "
"of the operands");		"of the operands");
}		}

		if (Saturating) {
		APInt LHMax = APInt::getSignedMaxValue(NVTSize);
		APInt LLMax = APInt::getAllOnesValue(NVTSize);
		APInt LHMin = APInt::getSignedMinValue(NVTSize);
		Hi = DAG.getSelect(dl, NVT, SatMax, DAG.getConstant(LHMax, dl, NVT), Hi);
		Hi = DAG.getSelect(dl, NVT, SatMin, DAG.getConstant(LHMin, dl, NVT), Hi);
		Lo = DAG.getSelect(dl, NVT, SatMax, DAG.getConstant(LLMax, dl, NVT), Lo);
		Lo = DAG.getSelect(dl, NVT, SatMin, NVTZero, Lo);
		}
}		}
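To see why the checks on HH and HL in the expansion work, it helps to scale the problem down. The following plain C++ sketch assumes a hypothetical signed i16 operation expanded into i8 halves (VTSize = 16, NVTSize = 8), so the full product fits in 32 bits and splits into 8-bit HH, HL, LH and LL parts. `partFlags` applies overflow tests modelled on the three signed scale ranges, and `refFlags` computes the same answer directly by flooring the product by 2^Scale and comparing against the i16 range. In the Scale == NVTSize range the minimum test uses a signed HL >= 0 rather than HL > 0: with HH == -1, an HL of exactly 0 also has a clear sign bit, so the result overflows the minimum. All names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reinterpret the low 8 bits of V as a signed byte.
int toS8(uint32_t V) {
  V &= 0xFF;
  return V >= 128 ? static_cast<int>(V) - 256 : static_cast<int>(V);
}

// Mask with the low N bits set (N <= 8 here).
uint32_t lowBits(unsigned N) { return (1u << N) - 1; }

struct SatFlags {
  bool Max; // result exceeds INT16_MAX
  bool Min; // result is below INT16_MIN
};

// Overflow tests on the 8-bit parts HH|HL|LH|LL of a 32-bit product, modelled
// on the three signed scale ranges (VTSize = 16, NVTSize = 8).
SatFlags partFlags(int32_t Product, unsigned Scale) {
  assert(Scale >= 1 && Scale < 16 && "signed scale 0 is handled separately");
  uint32_t U = static_cast<uint32_t>(Product);
  int HH = toS8(U >> 24);          // HH compared as a signed value
  uint32_t HLu = (U >> 16) & 0xFF; // HL compared as an unsigned value
  int HLs = toS8(U >> 16);         // HL compared as a signed value
  SatFlags F{false, false};
  if (Scale < 8) {
    // The VTSize - Scale + 1 overflow bits span all of HH and the top of HL.
    uint32_t HLLoMask = lowBits(Scale - 1);
    uint32_t HLHiMask = 0xFF & ~HLLoMask;
    F.Max = HH > 0 || (HH == 0 && HLu > HLLoMask);
    F.Min = HH < -1 || (HH == -1 && HLu < HLHiMask);
  } else if (Scale == 8) {
    // The overflow bits are HH plus the sign bit of HL.
    F.Max = HH > 0 || (HH == 0 && HLs < 0);
    F.Min = HH < -1 || (HH == -1 && HLs >= 0);
  } else {
    // All overflow bits lie inside HH.
    int HHLoMask = static_cast<int>(lowBits(Scale - 9));
    int HHHiMask = toS8(0xFF & ~lowBits(Scale - 9));
    F.Max = HH > HHLoMask;
    F.Min = HH < HHHiMask;
  }
  return F;
}

// Direct answer: floor(Product / 2^Scale) compared with the i16 range.
SatFlags refFlags(int32_t Product, unsigned Scale) {
  int64_t P = Product;
  int64_t Div = int64_t{1} << Scale;
  int64_t Q = P >= 0 ? P / Div : -((-P + Div - 1) / Div);
  return {Q > INT16_MAX, Q < INT16_MIN};
}

// Compare both on a spread of i16 operands, including edge values.
bool expansionFlagsMatch() {
  std::vector<int> Samples = {-32768, -32767, -16384, -1024, -256, -255, -129,
                              -128,   -1,     0,      1,     127,  128,  255,
                              256,    1024,   16384,  32766, 32767};
  for (int V = -32768; V <= 32767; V += 251)
    Samples.push_back(V);
  for (int A : Samples)
    for (int B : Samples) {
      int32_t Product = A * B; // |A * B| <= 2^30, so this cannot overflow
      for (unsigned Scale = 1; Scale < 16; ++Scale) {
        SatFlags Got = partFlags(Product, Scale);
        SatFlags Want = refFlags(Product, Scale);
        if (Got.Max != Want.Max || Got.Min != Want.Min)
          return false;
      }
    }
  return true;
}
```

The two agree because, after an arithmetic shift by Scale, the result fits in VTSize bits exactly when the top VTSize - Scale + 1 bits of the full product (the overflow bits named in the comments) are all equal.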

void DAGTypeLegalizer::ExpandIntRes_SADDSUBO(SDNode *Node,		void DAGTypeLegalizer::ExpandIntRes_SADDSUBO(SDNode *Node,
SDValue &Lo, SDValue &Hi) {		SDValue &Lo, SDValue &Hi) {
SDValue LHS = Node->getOperand(0);		SDValue LHS = Node->getOperand(0);
SDValue RHS = Node->getOperand(1);		SDValue RHS = Node->getOperand(1);
SDLoc dl(Node);		SDLoc dl(Node);


llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp

Show First 20 Lines • Show All 426 Lines • ▼ Show 20 Lines	SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
case ISD::FCANONICALIZE:		case ISD::FCANONICALIZE:
case ISD::SADDSAT:		case ISD::SADDSAT:
case ISD::UADDSAT:		case ISD::UADDSAT:
case ISD::SSUBSAT:		case ISD::SSUBSAT:
case ISD::USUBSAT:		case ISD::USUBSAT:
Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));		Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
break;		break;
case ISD::SMULFIX:		case ISD::SMULFIX:
		case ISD::SMULFIXSAT:
case ISD::UMULFIX: {		case ISD::UMULFIX: {
unsigned Scale = Node->getConstantOperandVal(2);		unsigned Scale = Node->getConstantOperandVal(2);
Action = TLI.getFixedPointOperationAction(Node->getOpcode(),		Action = TLI.getFixedPointOperationAction(Node->getOpcode(),
Node->getValueType(0), Scale);		Node->getValueType(0), Scale);
break;		break;
}		}
case ISD::FP_ROUND_INREG:		case ISD::FP_ROUND_INREG:
Action = TLI.getOperationAction(Node->getOpcode(),		Action = TLI.getOperationAction(Node->getOpcode(),

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

Show First 20 Lines • Show All 177 Lines • ▼ Show 20 Lines	#endif
case ISD::SADDO:		case ISD::SADDO:
case ISD::USUBO:		case ISD::USUBO:
case ISD::SSUBO:		case ISD::SSUBO:
case ISD::UMULO:		case ISD::UMULO:
case ISD::SMULO:		case ISD::SMULO:
R = ScalarizeVecRes_OverflowOp(N, ResNo);		R = ScalarizeVecRes_OverflowOp(N, ResNo);
break;		break;
case ISD::SMULFIX:		case ISD::SMULFIX:
		case ISD::SMULFIXSAT:
case ISD::UMULFIX:		case ISD::UMULFIX:
R = ScalarizeVecRes_MULFIX(N);		R = ScalarizeVecRes_MULFIX(N);
break;		break;
}		}

// If R is null, the sub-method took care of registering the result.		// If R is null, the sub-method took care of registering the result.
if (R.getNode())		if (R.getNode())
SetScalarizedVector(SDValue(N, ResNo), R);		SetScalarizedVector(SDValue(N, ResNo), R);
▲ Show 20 Lines • Show All 772 Lines • ▼ Show 20 Lines	#endif
case ISD::SADDO:		case ISD::SADDO:
case ISD::USUBO:		case ISD::USUBO:
case ISD::SSUBO:		case ISD::SSUBO:
case ISD::UMULO:		case ISD::UMULO:
case ISD::SMULO:		case ISD::SMULO:
SplitVecRes_OverflowOp(N, ResNo, Lo, Hi);		SplitVecRes_OverflowOp(N, ResNo, Lo, Hi);
break;		break;
case ISD::SMULFIX:		case ISD::SMULFIX:
		case ISD::SMULFIXSAT:
case ISD::UMULFIX:		case ISD::UMULFIX:
SplitVecRes_MULFIX(N, Lo, Hi);		SplitVecRes_MULFIX(N, Lo, Hi);
break;		break;
}		}

// If Lo/Hi is null, the sub-method took care of registering results etc.		// If Lo/Hi is null, the sub-method took care of registering results etc.
if (Lo.getNode())		if (Lo.getNode())
SetSplitVector(SDValue(N, ResNo), Lo, Hi);		SetSplitVector(SDValue(N, ResNo), Lo, Hi);

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp


Show First 20 Lines • Show All 6,273 Lines • ▼ Show 20 Lines	SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, unsigned Intrinsic) {
case Intrinsic::umul_fix: {		case Intrinsic::umul_fix: {
SDValue Op1 = getValue(I.getArgOperand(0));		SDValue Op1 = getValue(I.getArgOperand(0));
SDValue Op2 = getValue(I.getArgOperand(1));		SDValue Op2 = getValue(I.getArgOperand(1));
SDValue Op3 = getValue(I.getArgOperand(2));		SDValue Op3 = getValue(I.getArgOperand(2));
setValue(&I, DAG.getNode(FixedPointIntrinsicToOpcode(Intrinsic), sdl,		setValue(&I, DAG.getNode(FixedPointIntrinsicToOpcode(Intrinsic), sdl,
Op1.getValueType(), Op1, Op2, Op3));		Op1.getValueType(), Op1, Op2, Op3));
return nullptr;		return nullptr;
}		}
		case Intrinsic::smul_fix_sat: {
		SDValue Op1 = getValue(I.getArgOperand(0));
		SDValue Op2 = getValue(I.getArgOperand(1));
		SDValue Op3 = getValue(I.getArgOperand(2));
		setValue(&I, DAG.getNode(ISD::SMULFIXSAT, sdl, Op1.getValueType(), Op1, Op2,
		Op3));
		return nullptr;
		}
case Intrinsic::stacksave: {		case Intrinsic::stacksave: {
SDValue Op = getRoot();		SDValue Op = getRoot();
Res = DAG.getNode(		Res = DAG.getNode(
ISD::STACKSAVE, sdl,		ISD::STACKSAVE, sdl,
DAG.getVTList(TLI.getPointerTy(DAG.getDataLayout()), MVT::Other), Op);		DAG.getVTList(TLI.getPointerTy(DAG.getDataLayout()), MVT::Other), Op);
setValue(&I, Res);		setValue(&I, Res);
DAG.setRoot(Res.getValue(1));		DAG.setRoot(Res.getValue(1));
return nullptr;		return nullptr;

llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

Show First 20 Lines • Show All 294 Lines • ▼ Show 20 Lines	#endif
case ISD::SHL_PARTS: return "shl_parts";		case ISD::SHL_PARTS: return "shl_parts";
case ISD::SRA_PARTS: return "sra_parts";		case ISD::SRA_PARTS: return "sra_parts";
case ISD::SRL_PARTS: return "srl_parts";		case ISD::SRL_PARTS: return "srl_parts";

case ISD::SADDSAT: return "saddsat";		case ISD::SADDSAT: return "saddsat";
case ISD::UADDSAT: return "uaddsat";		case ISD::UADDSAT: return "uaddsat";
case ISD::SSUBSAT: return "ssubsat";		case ISD::SSUBSAT: return "ssubsat";
case ISD::USUBSAT: return "usubsat";		case ISD::USUBSAT: return "usubsat";

case ISD::SMULFIX: return "smulfix";		case ISD::SMULFIX: return "smulfix";
		case ISD::SMULFIXSAT: return "smulfixsat";
case ISD::UMULFIX: return "umulfix";		case ISD::UMULFIX: return "umulfix";

// Conversion operators.		// Conversion operators.
case ISD::SIGN_EXTEND: return "sign_extend";		case ISD::SIGN_EXTEND: return "sign_extend";
case ISD::ZERO_EXTEND: return "zero_extend";		case ISD::ZERO_EXTEND: return "zero_extend";
case ISD::ANY_EXTEND: return "any_extend";		case ISD::ANY_EXTEND: return "any_extend";
case ISD::SIGN_EXTEND_INREG: return "sign_extend_inreg";		case ISD::SIGN_EXTEND_INREG: return "sign_extend_inreg";
case ISD::ANY_EXTEND_VECTOR_INREG: return "any_extend_vector_inreg";		case ISD::ANY_EXTEND_VECTOR_INREG: return "any_extend_vector_inreg";

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Show First 20 Lines • Show All 4,495 Lines • ▼ Show 20 Lines	bool TargetLowering::expandMUL_LOHI(unsigned Opcode, EVT VT, SDLoc dl,
if (!MakeMUL_LOHI(LH, RL, Lo, Hi, false))		if (!MakeMUL_LOHI(LH, RL, Lo, Hi, false))
return false;		return false;

SDValue Zero = DAG.getConstant(0, dl, HiLoVT);		SDValue Zero = DAG.getConstant(0, dl, HiLoVT);
EVT BoolType = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);		EVT BoolType = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);

bool UseGlue = (isOperationLegalOrCustom(ISD::ADDC, VT) &&		bool UseGlue = (isOperationLegalOrCustom(ISD::ADDC, VT) &&
isOperationLegalOrCustom(ISD::ADDE, VT));		isOperationLegalOrCustom(ISD::ADDE, VT));
if (UseGlue)		if (UseGlue)
		bjope: Is this newline by mistake? Seems unrelated to the patch.
		leonardchan (author): Accidental newline.
Next = DAG.getNode(ISD::ADDC, dl, DAG.getVTList(VT, MVT::Glue), Next,		Next = DAG.getNode(ISD::ADDC, dl, DAG.getVTList(VT, MVT::Glue), Next,
Merge(Lo, Hi));		Merge(Lo, Hi));
else		else
Next = DAG.getNode(ISD::ADDCARRY, dl, DAG.getVTList(VT, BoolType), Next,		Next = DAG.getNode(ISD::ADDCARRY, dl, DAG.getVTList(VT, BoolType), Next,
Merge(Lo, Hi), DAG.getConstant(0, dl, BoolType));		Merge(Lo, Hi), DAG.getConstant(0, dl, BoolType));

SDValue Carry = Next.getValue(1);		SDValue Carry = Next.getValue(1);
Result.push_back(DAG.getNode(ISD::TRUNCATE, dl, HiLoVT, Next));		Result.push_back(DAG.getNode(ISD::TRUNCATE, dl, HiLoVT, Next));
Context: if (Opcode == ISD::UADDSAT) {
Result = DAG.getSelect(dl, VT, SumNeg, SatMax, SatMin);		Result = DAG.getSelect(dl, VT, SumNeg, SatMax, SatMin);
return DAG.getSelect(dl, VT, Overflow, Result, SumDiff);		return DAG.getSelect(dl, VT, Overflow, Result, SumDiff);
}		}
}		}

SDValue		SDValue
TargetLowering::expandFixedPointMul(SDNode *Node, SelectionDAG &DAG) const {		TargetLowering::expandFixedPointMul(SDNode *Node, SelectionDAG &DAG) const {
assert((Node->getOpcode() == ISD::SMULFIX ||		assert((Node->getOpcode() == ISD::SMULFIX ||
Node->getOpcode() == ISD::UMULFIX) &&		Node->getOpcode() == ISD::UMULFIX ||
"Expected opcode to be SMULFIX or UMULFIX.");		Node->getOpcode() == ISD::SMULFIXSAT) &&
		RKSimon: Why drop the assert? Why not just add ISD::SMULFIXSAT tests?
		"Expected a fixed point multiplication opcode");

SDLoc dl(Node);		SDLoc dl(Node);
SDValue LHS = Node->getOperand(0);		SDValue LHS = Node->getOperand(0);
SDValue RHS = Node->getOperand(1);		SDValue RHS = Node->getOperand(1);
EVT VT = LHS.getValueType();		EVT VT = LHS.getValueType();
unsigned Scale = Node->getConstantOperandVal(2);		unsigned Scale = Node->getConstantOperandVal(2);
		bool Saturating = Node->getOpcode() == ISD::SMULFIXSAT;
		EVT BoolVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);
		unsigned VTSize = VT.getScalarSizeInBits();

// [us]mul.fix(a, b, 0) -> mul(a, b)
if (!Scale) {		if (!Scale) {
if (VT.isVector() && !isOperationLegalOrCustom(ISD::MUL, VT))		// [us]mul.fix(a, b, 0) -> mul(a, b)
return SDValue();		if (!Saturating && isOperationLegalOrCustom(ISD::MUL, VT)) {
return DAG.getNode(ISD::MUL, dl, VT, LHS, RHS);		return DAG.getNode(ISD::MUL, dl, VT, LHS, RHS);
}		} else if (Saturating && isOperationLegalOrCustom(ISD::SMULO, VT)) {
		SDValue Result =
		DAG.getNode(ISD::SMULO, dl, DAG.getVTList(VT, BoolVT), LHS, RHS);
		SDValue Product = Result.getValue(0);
		SDValue Overflow = Result.getValue(1);
		SDValue Zero = DAG.getConstant(0, dl, VT);

unsigned VTSize = VT.getScalarSizeInBits();		APInt MinVal = APInt::getSignedMinValue(VTSize);
bool Signed = Node->getOpcode() == ISD::SMULFIX;		APInt MaxVal = APInt::getSignedMaxValue(VTSize);
		SDValue SatMin = DAG.getConstant(MinVal, dl, VT);
		SDValue SatMax = DAG.getConstant(MaxVal, dl, VT);
		SDValue ProdNeg = DAG.getSetCC(dl, BoolVT, Product, Zero, ISD::SETLT);
		Result = DAG.getSelect(dl, VT, ProdNeg, SatMax, SatMin);
		return DAG.getSelect(dl, VT, Overflow, Result, Product);
		}
		RKSimon: You've changed the logic to let non-vector cases fall through, which leads to UNDEFs for scale == 0 cases.
		RKSimon: I think you need something like this here (please double check my logic): `if (VT.isVector() && !isOperationLegalOrCustom(ISD::SMULO, VT) && !(!Saturating && isOperationLegalOrCustom(ISD::MUL, VT))) return SDValue(); // unroll`. That will let you avoid the `return SDValue()` below by always defaulting to a scalar ISD::MUL/ISD::SMULO that legalization can handle.
		leonardchan (author): I thought we still want to allow vectors to pass to calls to `MUL` and `SMULO`? Wouldn't this scalarize when we disallow vectors, even if `MUL` and `SMULO` are legal?
		RKSimon: I've committed rL353546, which /should/ mean that the scale==0 case is now safe to drop through.
		leonardchan (author): Dropped, and I can confirm this works for my tests.
		}
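For reference, a standalone sketch of what the saturating scale == 0 path is meant to compute, namely a saturating signed 32-bit multiply. The function name `smulSat32` is hypothetical and not part of the patch; a 64-bit product stands in for the overflow flag of ISD::SMULO. The sketch picks the saturation bound from the operand signs. Taking it from the sign of the wrapped product, as the `ProdNeg` select above does, can choose the wrong bound: for i32, 65536 * 65536 wraps to 0, which is not negative, so `SatMin` would be selected even though the true product is positive.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical sketch (not LLVM code): saturating signed 32-bit
// multiplication, i.e. the intended result of smul.fix.sat(A, B, 0).
// The exact 64-bit product replaces the overflow flag of ISD::SMULO.
constexpr int32_t smulSat32(int32_t A, int32_t B) {
  const int64_t Wide = static_cast<int64_t>(A) * B;
  if (Wide > std::numeric_limits<int32_t>::max() ||
      Wide < std::numeric_limits<int32_t>::min()) {
    // An overflowing product is negative exactly when the operand signs
    // differ, which (A ^ B) < 0 tests. The sign of the wrapped 32-bit
    // product is not reliable here (65536 * 65536 wraps to 0).
    return (A ^ B) < 0 ? std::numeric_limits<int32_t>::min()
                       : std::numeric_limits<int32_t>::max();
  }
  return static_cast<int32_t>(Wide);
}
```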

		bool Signed =
		Node->getOpcode() == ISD::SMULFIX || Node->getOpcode() == ISD::SMULFIXSAT;
assert(((Signed && Scale < VTSize) || (!Signed && Scale <= VTSize)) &&		assert(((Signed && Scale < VTSize) || (!Signed && Scale <= VTSize)) &&
"Expected scale to be less than the number of bits if signed or at "		"Expected scale to be less than the number of bits if signed or at "
"most the number of bits if unsigned.");		"most the number of bits if unsigned.");
assert(LHS.getValueType() == RHS.getValueType() &&		assert(LHS.getValueType() == RHS.getValueType() &&
"Expected both operands to be the same type");		"Expected both operands to be the same type");

// Get the upper and lower bits of the result.		// Get the upper and lower bits of the result.
SDValue Lo, Hi;		SDValue Lo, Hi;
Context: if (Scale == VTSize)
// Result is just the top half since we'd be shifting by the width of the		// Result is just the top half since we'd be shifting by the width of the
// operand.		// operand.
return Hi;		return Hi;

// The result will need to be shifted right by the scale since both operands		// The result will need to be shifted right by the scale since both operands
// are scaled. The result is given to us in 2 halves, so we only want part of		// are scaled. The result is given to us in 2 halves, so we only want part of
// both in the result.		// both in the result.
EVT ShiftTy = getShiftAmountTy(VT, DAG.getDataLayout());		EVT ShiftTy = getShiftAmountTy(VT, DAG.getDataLayout());
return DAG.getNode(ISD::FSHR, dl, VT, Hi, Lo,		SDValue Result = DAG.getNode(ISD::FSHR, dl, VT, Hi, Lo,
DAG.getConstant(Scale, dl, ShiftTy));		DAG.getConstant(Scale, dl, ShiftTy));
		RKSimon: (style) Do an early out to reduce indentation: `if (!Saturating) return Result;`
		if (!Saturating)
		return Result;

		unsigned OverflowBits = VTSize - Scale + 1; // +1 for the sign
		SDValue HiMask =
		DAG.getConstant(APInt::getHighBitsSet(VTSize, OverflowBits), dl, VT);
		SDValue LoMask = DAG.getConstant(
		APInt::getLowBitsSet(VTSize, VTSize - OverflowBits), dl, VT);
		APInt MaxVal = APInt::getSignedMaxValue(VTSize);
		APInt MinVal = APInt::getSignedMinValue(VTSize);

		Result = DAG.getSelectCC(dl, Hi, LoMask,
		DAG.getConstant(MaxVal, dl, VT), Result,
		ISD::SETGT);
		return DAG.getSelectCC(dl, Hi, HiMask,
		DAG.getConstant(MinVal, dl, VT), Result,
		ISD::SETLT);
}		}
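The two `getSelectCC` calls at the end of `expandFixedPointMul` can be checked with a short derivation. The notation is ours, not the patch's: N is `VTSize`, S is `Scale` (assumed 1 <= S < N on this path), P is the exact 2N-bit signed product, and Hi = floor(P / 2^N) is its signed upper half. Read as signed N-bit values, `LoMask` is 2^(S-1) - 1 and `HiMask` is -2^(S-1). Using floor(x) <= k - 1 iff x < k and floor(x) >= k iff x >= k for any integer k, the shifted result fits in N signed bits exactly when Hi lies between the two masks:

$$
\left\lfloor \frac{P}{2^{S}} \right\rfloor \le 2^{N-1} - 1
\iff P < 2^{N+S-1}
\iff \mathrm{Hi} = \left\lfloor \frac{P}{2^{N}} \right\rfloor \le 2^{S-1} - 1 = \mathrm{LoMask}
$$

$$
\left\lfloor \frac{P}{2^{S}} \right\rfloor \ge -2^{N-1}
\iff P \ge -2^{N+S-1}
\iff \mathrm{Hi} = \left\lfloor \frac{P}{2^{N}} \right\rfloor \ge -2^{S-1} = \mathrm{HiMask}
$$

So saturating to `MaxVal` when Hi > LoMask (`SETGT`) and to `MinVal` when Hi < HiMask (`SETLT`) is exact. For S = 2 the bounds are 1 and -2, which matches the `cmpl $1` and `cmpl $-2` instructions in the i32 CHECK lines of smul_fix_sat.ll.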

bool TargetLowering::expandMULO(SDNode *Node, SDValue &Result,		bool TargetLowering::expandMULO(SDNode *Node, SDValue &Result,
SDValue &Overflow, SelectionDAG &DAG) const {		SDValue &Overflow, SelectionDAG &DAG) const {
SDLoc dl(Node);		SDLoc dl(Node);
EVT VT = Node->getValueType(0);		EVT VT = Node->getValueType(0);
EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);		EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);
SDValue LHS = Node->getOperand(0);		SDValue LHS = Node->getOperand(0);

llvm/lib/CodeGen/TargetLoweringBase.cpp

Context: for (MVT VT : MVT::all_valuetypes()) {
setOperationAction(ISD::ABS, VT, Expand);		setOperationAction(ISD::ABS, VT, Expand);
setOperationAction(ISD::FSHL, VT, Expand);		setOperationAction(ISD::FSHL, VT, Expand);
setOperationAction(ISD::FSHR, VT, Expand);		setOperationAction(ISD::FSHR, VT, Expand);
setOperationAction(ISD::SADDSAT, VT, Expand);		setOperationAction(ISD::SADDSAT, VT, Expand);
setOperationAction(ISD::UADDSAT, VT, Expand);		setOperationAction(ISD::UADDSAT, VT, Expand);
setOperationAction(ISD::SSUBSAT, VT, Expand);		setOperationAction(ISD::SSUBSAT, VT, Expand);
setOperationAction(ISD::USUBSAT, VT, Expand);		setOperationAction(ISD::USUBSAT, VT, Expand);
setOperationAction(ISD::SMULFIX, VT, Expand);		setOperationAction(ISD::SMULFIX, VT, Expand);
		setOperationAction(ISD::SMULFIXSAT, VT, Expand);
		bjope: I'm not sure how to do this when overriding for a specific target. In our case we want it to be legal, but only when the scale is 15 (and VT is i16 or i24) or the scale is 31 (and VT is i32 or i40). Is there some easy solution for that? Setting it to legal/custom for any scale might be seen by optimizers as an indication that it is OK to introduce these operations for any scale. This is, however, a general comment that also applies to the already pushed non-saturating versions, so it isn't anything that you need to deal with in this patch. But we might need a better solution in the long term.
		ebevhan: That's what the `isSupportedFixedPointOperation` hook in TargetLowering is for.
		bjope: Ah, yes! And that is updated in this patch. Just me being blind (combined with some amnesia).
setOperationAction(ISD::UMULFIX, VT, Expand);		setOperationAction(ISD::UMULFIX, VT, Expand);

// Overflow operations default to expand		// Overflow operations default to expand
setOperationAction(ISD::SADDO, VT, Expand);		setOperationAction(ISD::SADDO, VT, Expand);
setOperationAction(ISD::SSUBO, VT, Expand);		setOperationAction(ISD::SSUBO, VT, Expand);
setOperationAction(ISD::UADDO, VT, Expand);		setOperationAction(ISD::UADDO, VT, Expand);
setOperationAction(ISD::USUBO, VT, Expand);		setOperationAction(ISD::USUBO, VT, Expand);
setOperationAction(ISD::SMULO, VT, Expand);		setOperationAction(ISD::SMULO, VT, Expand);
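Following ebevhan's pointer, the per-scale restriction bjope describes would be expressed in a target's `isSupportedFixedPointOperation` override rather than in the `setOperationAction` table, which is keyed only by opcode and value type. Below is a minimal sketch of the predicate such an override might evaluate. The name `supportsFixedPointMul` is hypothetical, and it takes plain integers where a real override would work with LLVM's value types.

```cpp
// Hypothetical predicate (not LLVM API): a target that supports fixed point
// multiplication only for scale 15 on 16- and 24-bit values and scale 31 on
// 32- and 40-bit values, as described in the review comment above.
constexpr bool supportsFixedPointMul(unsigned BitWidth, unsigned Scale) {
  switch (BitWidth) {
  case 16:
  case 24:
    return Scale == 15;
  case 32:
  case 40:
    return Scale == 31;
  default:
    return false; // Any other width is left to the generic expansion.
  }
}
```

Every other width/scale combination returns false, so it would fall back to the generic expansion instead of being treated as legal.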

llvm/lib/IR/Verifier.cpp

Context: Assert(Op1->getType()->isIntOrIntVectorTy(),
"first operand of [us][add|sub]_sat must be an int type or vector "		"first operand of [us][add|sub]_sat must be an int type or vector "
"of ints");		"of ints");
Assert(Op2->getType()->isIntOrIntVectorTy(),		Assert(Op2->getType()->isIntOrIntVectorTy(),
"second operand of [us][add|sub]_sat must be an int type or vector "		"second operand of [us][add|sub]_sat must be an int type or vector "
"of ints");		"of ints");
break;		break;
}		}
case Intrinsic::smul_fix:		case Intrinsic::smul_fix:
		case Intrinsic::smul_fix_sat:
case Intrinsic::umul_fix: {		case Intrinsic::umul_fix: {
Value *Op1 = Call.getArgOperand(0);		Value *Op1 = Call.getArgOperand(0);
Value *Op2 = Call.getArgOperand(1);		Value *Op2 = Call.getArgOperand(1);
Assert(Op1->getType()->isIntOrIntVectorTy(),		Assert(Op1->getType()->isIntOrIntVectorTy(),
"first operand of [us]mul_fix must be an int type or vector "		"first operand of [us]mul_fix[_sat] must be an int type or vector "
"of ints");		"of ints");
Assert(Op2->getType()->isIntOrIntVectorTy(),		Assert(Op2->getType()->isIntOrIntVectorTy(),
"second operand of [us]mul_fix must be an int type or vector "		"second operand of [us]mul_fix[_sat] must be an int type or vector "
"of ints");		"of ints");

auto *Op3 = cast<ConstantInt>(Call.getArgOperand(2));		auto *Op3 = cast<ConstantInt>(Call.getArgOperand(2));
Assert(Op3->getType()->getBitWidth() <= 32,		Assert(Op3->getType()->getBitWidth() <= 32,
"third argument of [us]mul_fix must fit within 32 bits");		"third argument of [us]mul_fix[_sat] must fit within 32 bits");

if (ID == Intrinsic::smul_fix) {		if (ID == Intrinsic::smul_fix || ID == Intrinsic::smul_fix_sat) {
Assert(		Assert(
Op3->getZExtValue() < Op1->getType()->getScalarSizeInBits(),		Op3->getZExtValue() < Op1->getType()->getScalarSizeInBits(),
"the scale of smul_fix must be less than the width of the operands");		"the scale of smul_fix[_sat] must be less than the width of the operands");
} else {		} else {
Assert(Op3->getZExtValue() <= Op1->getType()->getScalarSizeInBits(),		Assert(Op3->getZExtValue() <= Op1->getType()->getScalarSizeInBits(),
"the scale of umul_fix must be less than or equal to the width of "		"the scale of umul_fix[_sat] must be less than or equal to the width of "
"the operands");		"the operands");
}		}
break;		break;
}		}
};		};
}		}

/// Carefully grab the subprogram from a local scope.		/// Carefully grab the subprogram from a local scope.

llvm/test/CodeGen/X86/smul_fix_sat.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64
				; RUN: llc < %s -mtriple=i686 -mattr=cmov | FileCheck %s --check-prefix=X86
				bjope: The expansion is quite complicated (splitting into four parts, detecting overflow, etc.). Isn't there a risk that X86 is a typical target that will try to find more optimal solutions and maybe also make SMULFIXSAT legal? Then this test case might no longer verify the expand code. On the other hand, these test cases are just gibberish to me anyway: I can't tell from looking at the checks that DAGTypeLegalizer::ExpandIntRes_MULFIX is doing the right thing, and using another target would not really help. Are there perhaps other ways to test DAGTypeLegalizer, such as unit tests? One thing that can probably be done quite easily is to add a bunch of tests using constant operands, verifying that DAGCombiner constant folds them to the expected result after they have been expanded into legal operations (somehow making sure that DAGCombiner does not constant fold the SMULFIXSAT before it has been expanded; I guess someone will add such DAGCombines sooner or later). That way you might be able to get coverage for all paths through DAGTypeLegalizer::ExpandIntRes_MULFIX. Maybe these tests should be in a separate test file.
				leonardchan (author): Yeah, I can see how these tests are hard to read. I wasn't aware of ways to test this other than making sure the codegen stays the same each time. I have my own scripts with different cases to verify the output is correct, but I wasn't aware of any existing, widely used method of "taking my IR, running it, and verifying the results". Testing with constant operands seems to produce better-looking tests for non-saturating multiplication:
				    define i4 @func() {
				    ; X64-LABEL: func:
				    ; X64: # %bb.0:
				    ; X64-NEXT: movb $3, %al
				    ; X64-NEXT: retq
				      %tmp = call i4 @llvm.smul.fix.i4( i4 3, i4 2 , i32 1)
				      ret i4 %tmp
				    }
				where we can immediately tell the result is 3, but there's still branching in the saturating case:
				    define i4 @func2() {
				    ; X64-LABEL: func2:
				    ; X64: # %bb.0:
				    ; X64-NEXT: xorl %eax, %eax
				    ; X64-NEXT: testb %al, %al
				    ; X64-NEXT: movb $127, %cl
				    ; X64-NEXT: jg .LBB1_2
				    ; X64-NEXT: # %bb.1:
				    ; X64-NEXT: movb $3, %cl
				    ; X64-NEXT: .LBB1_2:
				    ; X64-NEXT: movb $-1, %al
				    ; X64-NEXT: negb %al
				    ; X64-NEXT: movb $-128, %al
				    ; X64-NEXT: jl .LBB1_4
				    ; X64-NEXT: # %bb.3:
				    ; X64-NEXT: movl %ecx, %eax
				    ; X64-NEXT: .LBB1_4:
				    ; X64-NEXT: retq
				      %tmp = call i4 @llvm.smul.fix.sat.i4( i4 3, i4 2 , i32 1)
				      ret i4 %tmp
				    }
				so we can't get something as straightforward as with the non-saturating version.
				leonardchan (author): @bjope I added another test file that covers the saturation branches in ExpandIntRes_MULFIX using constant operands, although this doesn't seem to produce anything more readable than with variable operands.
				bjope: Maybe it doesn't fold due to lack of constant folding for SMUL_LOHI (at least not for x86). What a pity. I tried running the test using -mtriple=x86_64--, which at least produces code that is easier to map to the expansion. I also tried some other targets:
				- -mtriple=ppc32 => looks like we get some constant folding here
				- -mtriple=ppc64 => asserts in llvm::SelectionDAG::transferDbgValues
				- -mtriple=hexagon => asserts in llvm::SelectionDAG::transferDbgValues
				- -mtriple=systemz => asserts in llvm::SelectionDAG::transferDbgValues
				- -mtriple=sparc => LLVM ERROR: Cannot select: t42: i32,i32 = addcarry t41:1, Constant:i32<0>, t90:1
				(FWIW, I have no idea whether the asserts and the LLVM ERROR are actually related to your patch.)
				leonardchan (author): Updated the test to use `-mtriple=x86_64-linux`, and it looks a lot more readable.
				bjope: Have you looked at the problem with the asserts in llvm::SelectionDAG::transferDbgValues? It happens when expanding smulfixsat, so something seems to be broken in the legalization (depending on the target used).
				leonardchan (author):
				- For `addcarry`, the problem seems to be that `ISD::ADDCARRY` is not supported on some 32-bit targets. The fix for this is just a check in `expandMUL_LOHI` to see if this operation is legal (https://reviews.llvm.org/D59119).
				- For `llvm::SelectionDAG::transferDbgValues`, this is because `expandFixedPointMul` returns an empty `SDValue()` to indicate that the function failed due to some unsupported operation (most likely `ISD::SMULO`). I imagine the simplest solution for this is to just `report_fatal_error`, since we do not have other operations we can use to perform saturating multiplication.

				declare i4 @llvm.smul.fix.sat.i4 (i4, i4, i32)
				declare i32 @llvm.smul.fix.sat.i32 (i32, i32, i32)
				declare i64 @llvm.smul.fix.sat.i64 (i64, i64, i32)
				declare <4 x i32> @llvm.smul.fix.sat.v4i32(<4 x i32>, <4 x i32>, i32)

				define i32 @func(i32 %x, i32 %y) nounwind {
				; X64-LABEL: func:
				; X64: # %bb.0:
				; X64-NEXT: movslq %esi, %rax
				; X64-NEXT: movslq %edi, %rcx
				; X64-NEXT: imulq %rax, %rcx
				; X64-NEXT: movq %rcx, %rax
				; X64-NEXT: shrq $32, %rax
				; X64-NEXT: shrdl $2, %eax, %ecx
				; X64-NEXT: cmpl $1, %eax
				; X64-NEXT: movl $2147483647, %edx # imm = 0x7FFFFFFF
				; X64-NEXT: cmovlel %ecx, %edx
				; X64-NEXT: cmpl $-2, %eax
				; X64-NEXT: movl $-2147483648, %eax # imm = 0x80000000
				; X64-NEXT: cmovgel %edx, %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: func:
				; X86: # %bb.0:
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: imull {{[0-9]+}}(%esp)
				; X86-NEXT: shrdl $2, %edx, %eax
				; X86-NEXT: cmpl $1, %edx
				; X86-NEXT: movl $2147483647, %ecx # imm = 0x7FFFFFFF
				; X86-NEXT: cmovgl %ecx, %eax
				; X86-NEXT: cmpl $-2, %edx
				; X86-NEXT: movl $-2147483648, %ecx # imm = 0x80000000
				; X86-NEXT: cmovll %ecx, %eax
				; X86-NEXT: retl
				%tmp = call i32 @llvm.smul.fix.sat.i32(i32 %x, i32 %y, i32 2);
				ret i32 %tmp;
				}

				define i64 @func2(i64 %x, i64 %y) nounwind {
				; X64-LABEL: func2:
				; X64: # %bb.0:
				RKSimon: nounwind
				; X64-NEXT: movq %rdi, %rax
				; X64-NEXT: imulq %rsi
				; X64-NEXT: shrdq $2, %rdx, %rax
				; X64-NEXT: cmpq $1, %rdx
				; X64-NEXT: movabsq $9223372036854775807, %rcx # imm = 0x7FFFFFFFFFFFFFFF
				; X64-NEXT: cmovgq %rcx, %rax
				; X64-NEXT: cmpq $-2, %rdx
				; X64-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
				; X64-NEXT: cmovlq %rcx, %rax
				; X64-NEXT: retq
				;
				; X86-LABEL: func2:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: pushl %ebx
				; X86-NEXT: pushl %edi
				; X86-NEXT: pushl %esi
				; X86-NEXT: subl $8, %esp
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
				; X86-NEXT: movl %ecx, %eax
				; X86-NEXT: mull %esi
				; X86-NEXT: movl %edx, %edi
				; X86-NEXT: movl %eax, %ebx
				; X86-NEXT: movl %ecx, %eax
				; X86-NEXT: mull {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl %edx, %ebp
				; X86-NEXT: addl %ebx, %ebp
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
				; X86-NEXT: adcl $0, %edi
				; X86-NEXT: movl %ebx, %eax
				; X86-NEXT: imull %esi
				; X86-NEXT: movl %edx, %ecx
				; X86-NEXT: movl %eax, %esi
				; X86-NEXT: movl %ebx, %eax
				; X86-NEXT: mull {{[0-9]+}}(%esp)
				; X86-NEXT: addl %ebp, %eax
				; X86-NEXT: adcl %edi, %edx
				; X86-NEXT: adcl $0, %ecx
				; X86-NEXT: addl %esi, %edx
				; X86-NEXT: adcl $0, %ecx
				; X86-NEXT: movl %edx, %esi
				; X86-NEXT: subl {{[0-9]+}}(%esp), %esi
				; X86-NEXT: movl %ecx, %edi
				; X86-NEXT: sbbl $0, %edi
				; X86-NEXT: testl %ebx, %ebx
				; X86-NEXT: cmovnsl %ecx, %edi
				; X86-NEXT: cmovnsl %edx, %esi
				; X86-NEXT: movl %esi, %ecx
				; X86-NEXT: subl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl %edi, %ebp
				; X86-NEXT: sbbl $0, %ebp
				; X86-NEXT: cmpl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: cmovnsl %edi, %ebp
				; X86-NEXT: cmovnsl %esi, %ecx
				; X86-NEXT: testl %ebp, %ebp
				; X86-NEXT: setg %bh
				; X86-NEXT: sete {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Spill
				; X86-NEXT: cmpl $1, %ecx
				; X86-NEXT: seta %bl
				; X86-NEXT: movl %ecx, %edx
				; X86-NEXT: shldl $30, %eax, %edx
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
				; X86-NEXT: shldl $30, %esi, %eax
				; X86-NEXT: andb {{[-0-9]+}}(%e{{[sb]}}p), %bl # 1-byte Folded Reload
				; X86-NEXT: orb %bh, %bl
				; X86-NEXT: testb %bl, %bl
				; X86-NEXT: movl $2147483647, %esi # imm = 0x7FFFFFFF
				; X86-NEXT: cmovnel %esi, %edx
				; X86-NEXT: movl $-1, %esi
				; X86-NEXT: cmovnel %esi, %eax
				; X86-NEXT: cmpl $-1, %ebp
				; X86-NEXT: setl %bl
				; X86-NEXT: sete %bh
				; X86-NEXT: cmpl $-2, %ecx
				; X86-NEXT: setb %cl
				; X86-NEXT: andb %bh, %cl
				; X86-NEXT: xorl %esi, %esi
				; X86-NEXT: orb %bl, %cl
				; X86-NEXT: cmovnel %esi, %eax
				; X86-NEXT: movl $-2147483648, %ecx # imm = 0x80000000
				; X86-NEXT: cmovnel %ecx, %edx
				; X86-NEXT: addl $8, %esp
				; X86-NEXT: popl %esi
				; X86-NEXT: popl %edi
				; X86-NEXT: popl %ebx
				; X86-NEXT: popl %ebp
				; X86-NEXT: retl
				%tmp = call i64 @llvm.smul.fix.sat.i64(i64 %x, i64 %y, i32 2);
				ret i64 %tmp;
				}

				define i4 @func3(i4 %x, i4 %y) nounwind {
				; X64-LABEL: func3:
				; X64: # %bb.0:
				; X64-NEXT: shlb $4, %sil
				; X64-NEXT: sarb $4, %sil
				; X64-NEXT: shlb $4, %dil
				; X64-NEXT: movsbl %dil, %eax
				; X64-NEXT: movsbl %sil, %ecx
				; X64-NEXT: imull %eax, %ecx
				; X64-NEXT: movl %ecx, %eax
				; X64-NEXT: shrb $2, %al
				; X64-NEXT: shrl $8, %ecx
				; X64-NEXT: movl %ecx, %edx
				; X64-NEXT: shlb $6, %dl
				; X64-NEXT: orb %al, %dl
				; X64-NEXT: movzbl %dl, %eax
				; X64-NEXT: cmpb $1, %cl
				; X64-NEXT: movl $127, %edx
				; X64-NEXT: cmovlel %eax, %edx
				; X64-NEXT: cmpb $-2, %cl
				; X64-NEXT: movl $128, %eax
				; X64-NEXT: cmovgel %edx, %eax
				; X64-NEXT: sarb $4, %al
				; X64-NEXT: # kill: def $al killed $al killed $eax
				; X64-NEXT: retq
				;
				; X86-LABEL: func3:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %al
				; X86-NEXT: shlb $4, %al
				; X86-NEXT: sarb $4, %al
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: shlb $4, %cl
				; X86-NEXT: movsbl %cl, %ecx
				; X86-NEXT: movsbl %al, %eax
				; X86-NEXT: imull %ecx, %eax
				; X86-NEXT: movb %ah, %cl
				; X86-NEXT: shlb $6, %cl
				; X86-NEXT: shrb $2, %al
				; X86-NEXT: orb %cl, %al
				; X86-NEXT: movzbl %al, %ecx
				; X86-NEXT: cmpb $1, %ah
				; X86-NEXT: movl $127, %edx
				; X86-NEXT: cmovlel %ecx, %edx
				; X86-NEXT: cmpb $-2, %ah
				; X86-NEXT: movl $128, %eax
				; X86-NEXT: cmovgel %edx, %eax
				; X86-NEXT: sarb $4, %al
				; X86-NEXT: # kill: def $al killed $al killed $eax
				; X86-NEXT: retl
				%tmp = call i4 @llvm.smul.fix.sat.i4(i4 %x, i4 %y, i32 2);
				ret i4 %tmp;
				}

				define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind {
				; X64-LABEL: vec:
				; X64: # %bb.0:
				; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm1[3,1,2,3]
				; X64-NEXT: movd %xmm2, %eax
				; X64-NEXT: cltq
				; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm0[3,1,2,3]
				; X64-NEXT: movd %xmm2, %ecx
				; X64-NEXT: movslq %ecx, %rdx
				; X64-NEXT: imulq %rax, %rdx
				; X64-NEXT: movq %rdx, %rcx
				; X64-NEXT: shrq $32, %rcx
				; X64-NEXT: shrdl $2, %ecx, %edx
				; X64-NEXT: cmpl $1, %ecx
				; X64-NEXT: movl $2147483647, %eax # imm = 0x7FFFFFFF
				; X64-NEXT: cmovgl %eax, %edx
				; X64-NEXT: cmpl $-2, %ecx
				; X64-NEXT: movl $-2147483648, %ecx # imm = 0x80000000
				; X64-NEXT: cmovll %ecx, %edx
				; X64-NEXT: movd %edx, %xmm2
				; X64-NEXT: pshufd {{.*#+}} xmm3 = xmm1[2,3,0,1]
				; X64-NEXT: movd %xmm3, %edx
				; X64-NEXT: movslq %edx, %rdx
				; X64-NEXT: pshufd {{.*#+}} xmm3 = xmm0[2,3,0,1]
				; X64-NEXT: movd %xmm3, %esi
				; X64-NEXT: movslq %esi, %rsi
				; X64-NEXT: imulq %rdx, %rsi
				; X64-NEXT: movq %rsi, %rdx
				; X64-NEXT: shrq $32, %rdx
				; X64-NEXT: shrdl $2, %edx, %esi
				; X64-NEXT: cmpl $1, %edx
				; X64-NEXT: cmovgl %eax, %esi
				; X64-NEXT: cmpl $-2, %edx
				; X64-NEXT: cmovll %ecx, %esi
				; X64-NEXT: movd %esi, %xmm3
				; X64-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1]
				; X64-NEXT: movd %xmm1, %edx
				; X64-NEXT: movslq %edx, %rdx
				; X64-NEXT: movd %xmm0, %esi
				; X64-NEXT: movslq %esi, %rsi
				; X64-NEXT: imulq %rdx, %rsi
				; X64-NEXT: movq %rsi, %rdx
				; X64-NEXT: shrq $32, %rdx
				; X64-NEXT: shrdl $2, %edx, %esi
				; X64-NEXT: cmpl $1, %edx
				; X64-NEXT: cmovgl %eax, %esi
				; X64-NEXT: cmpl $-2, %edx
				; X64-NEXT: cmovll %ecx, %esi
				; X64-NEXT: movd %esi, %xmm2
				; X64-NEXT: pshufd {{.*#+}} xmm1 = xmm1[1,1,2,3]
				; X64-NEXT: movd %xmm1, %edx
				; X64-NEXT: movslq %edx, %rdx
				; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
				; X64-NEXT: movd %xmm0, %esi
				; X64-NEXT: movslq %esi, %rsi
				; X64-NEXT: imulq %rdx, %rsi
				; X64-NEXT: movq %rsi, %rdx
				; X64-NEXT: shrq $32, %rdx
				; X64-NEXT: shrdl $2, %edx, %esi
				; X64-NEXT: cmpl $1, %edx
				; X64-NEXT: cmovgl %eax, %esi
				; X64-NEXT: cmpl $-2, %edx
				; X64-NEXT: cmovll %ecx, %esi
				; X64-NEXT: movd %esi, %xmm0
				; X64-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]
				; X64-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
				; X64-NEXT: movdqa %xmm2, %xmm0
				; X64-NEXT: retq
				;
				; X86-LABEL: vec:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: pushl %ebx
				; X86-NEXT: pushl %edi
				; X86-NEXT: pushl %esi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: imull {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, %ecx
				; X86-NEXT: shrdl $2, %edx, %ecx
				; X86-NEXT: cmpl $1, %edx
				; X86-NEXT: movl $2147483647, %ebp # imm = 0x7FFFFFFF
				; X86-NEXT: cmovgl %ebp, %ecx
				; X86-NEXT: cmpl $-2, %edx
				; X86-NEXT: movl $-2147483648, %esi # imm = 0x80000000
				; X86-NEXT: cmovll %esi, %ecx
				; X86-NEXT: movl %edi, %eax
				; X86-NEXT: imull {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, %edi
				; X86-NEXT: shrdl $2, %edx, %edi
				; X86-NEXT: cmpl $1, %edx
				; X86-NEXT: cmovgl %ebp, %edi
				; X86-NEXT: cmpl $-2, %edx
				; X86-NEXT: cmovll %esi, %edi
				; X86-NEXT: movl %ebx, %eax
				; X86-NEXT: imull {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, %ebx
				; X86-NEXT: shrdl $2, %edx, %ebx
				; X86-NEXT: cmpl $1, %edx
				; X86-NEXT: cmovgl %ebp, %ebx
				; X86-NEXT: cmpl $-2, %edx
				; X86-NEXT: cmovll %esi, %ebx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: imull {{[0-9]+}}(%esp)
				; X86-NEXT: shrdl $2, %edx, %eax
				; X86-NEXT: cmpl $1, %edx
				; X86-NEXT: cmovgl %ebp, %eax
				; X86-NEXT: cmpl $-2, %edx
				; X86-NEXT: cmovll %esi, %eax
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movl %eax, 12(%edx)
				; X86-NEXT: movl %ebx, 8(%edx)
				; X86-NEXT: movl %edi, 4(%edx)
				; X86-NEXT: movl %ecx, (%edx)
				; X86-NEXT: movl %edx, %eax
				; X86-NEXT: popl %esi
				; X86-NEXT: popl %edi
				; X86-NEXT: popl %ebx
				; X86-NEXT: popl %ebp
				; X86-NEXT: retl $4
				%tmp = call <4 x i32> @llvm.smul.fix.sat.v4i32(<4 x i32> %x, <4 x i32> %y, i32 2);
				ret <4 x i32> %tmp;
				}

				; With a scale of 0, these reduce to an overflow-checked integer multiplication that saturates on overflow
				define i32 @func4(i32 %x, i32 %y) nounwind {
				; X64-LABEL: func4:
				; X64: # %bb.0:
				; X64-NEXT: movl %edi, %ecx
				; X64-NEXT: imull %esi, %ecx
				; X64-NEXT: xorl %eax, %eax
				; X64-NEXT: testl %ecx, %ecx
				; X64-NEXT: setns %al
				; X64-NEXT: addl $2147483647, %eax # imm = 0x7FFFFFFF
				; X64-NEXT: imull %esi, %edi
				; X64-NEXT: cmovnol %edi, %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: func4:
				; X86: # %bb.0:
				; X86-NEXT: pushl %esi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movl %eax, %esi
				; X86-NEXT: imull %edx, %esi
				; X86-NEXT: xorl %ecx, %ecx
				; X86-NEXT: testl %esi, %esi
				; X86-NEXT: setns %cl
				; X86-NEXT: addl $2147483647, %ecx # imm = 0x7FFFFFFF
				; X86-NEXT: imull %edx, %eax
				; X86-NEXT: cmovol %ecx, %eax
				; X86-NEXT: popl %esi
				; X86-NEXT: retl
				%tmp = call i32 @llvm.smul.fix.sat.i32(i32 %x, i32 %y, i32 0);
				ret i32 %tmp;
				}

				define i64 @func5(i64 %x, i64 %y) {
				; X64-LABEL: func5:
				; X64: # %bb.0:
				; X64-NEXT: movq %rdi, %rax
				; X64-NEXT: imulq %rsi, %rax
				; X64-NEXT: xorl %ecx, %ecx
				; X64-NEXT: testq %rax, %rax
				; X64-NEXT: setns %cl
				; X64-NEXT: movabsq $9223372036854775807, %rax # imm = 0x7FFFFFFFFFFFFFFF
				; X64-NEXT: addq %rcx, %rax
				; X64-NEXT: imulq %rsi, %rdi
				; X64-NEXT: cmovnoq %rdi, %rax
				; X64-NEXT: retq
				;
				; X86-LABEL: func5:
				; X86: # %bb.0:
				; X86-NEXT: pushl %edi
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: pushl %esi
				; X86-NEXT: .cfi_def_cfa_offset 12
				; X86-NEXT: pushl %eax
				; X86-NEXT: .cfi_def_cfa_offset 16
				; X86-NEXT: .cfi_offset %esi, -12
				; X86-NEXT: .cfi_offset %edi, -8
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
				; X86-NEXT: movl $0, (%esp)
				; X86-NEXT: movl %esp, %edi
				; X86-NEXT: pushl %edi
				; X86-NEXT: .cfi_adjust_cfa_offset 4
				; X86-NEXT: pushl %esi
				; X86-NEXT: .cfi_adjust_cfa_offset 4
				; X86-NEXT: pushl %edx
				; X86-NEXT: .cfi_adjust_cfa_offset 4
				; X86-NEXT: pushl %ecx
				; X86-NEXT: .cfi_adjust_cfa_offset 4
				; X86-NEXT: pushl %eax
				; X86-NEXT: .cfi_adjust_cfa_offset 4
				; X86-NEXT: calll __mulodi4
				; X86-NEXT: addl $20, %esp
				; X86-NEXT: .cfi_adjust_cfa_offset -20
				; X86-NEXT: xorl %ecx, %ecx
				; X86-NEXT: testl %edx, %edx
				; X86-NEXT: setns %cl
				; X86-NEXT: addl $2147483647, %ecx # imm = 0x7FFFFFFF
				; X86-NEXT: movl %edx, %esi
				; X86-NEXT: sarl $31, %esi
				; X86-NEXT: cmpl $0, (%esp)
				; X86-NEXT: cmovnel %esi, %eax
				; X86-NEXT: cmovnel %ecx, %edx
				; X86-NEXT: addl $4, %esp
				; X86-NEXT: .cfi_def_cfa_offset 12
				; X86-NEXT: popl %esi
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: popl %edi
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%tmp = call i64 @llvm.smul.fix.sat.i64(i64 %x, i64 %y, i32 0);
				ret i64 %tmp;
				}

				define i4 @func6(i4 %x, i4 %y) nounwind {
				; X64-LABEL: func6:
				; X64: # %bb.0:
				; X64-NEXT: movl %edi, %eax
				; X64-NEXT: shlb $4, %sil
				; X64-NEXT: sarb $4, %sil
				; X64-NEXT: shlb $4, %al
				; X64-NEXT: # kill: def $al killed $al killed $eax
				; X64-NEXT: imulb %sil
				; X64-NEXT: seto %cl
				; X64-NEXT: xorl %edx, %edx
				; X64-NEXT: testb %al, %al
				; X64-NEXT: setns %dl
				; X64-NEXT: addl $127, %edx
				; X64-NEXT: movzbl %al, %eax
				; X64-NEXT: testb %cl, %cl
				; X64-NEXT: cmovnel %edx, %eax
				; X64-NEXT: sarb $4, %al
				; X64-NEXT: # kill: def $al killed $al killed $eax
				; X64-NEXT: retq
				;
				; X86-LABEL: func6:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: shlb $4, %cl
				; X86-NEXT: sarb $4, %cl
				; X86-NEXT: movb {{[0-9]+}}(%esp), %al
				; X86-NEXT: shlb $4, %al
				; X86-NEXT: imulb %cl
				; X86-NEXT: seto %dl
				; X86-NEXT: xorl %ecx, %ecx
				; X86-NEXT: testb %al, %al
				; X86-NEXT: setns %cl
				; X86-NEXT: addl $127, %ecx
				; X86-NEXT: movzbl %al, %eax
				; X86-NEXT: testb %dl, %dl
				; X86-NEXT: cmovnel %ecx, %eax
				; X86-NEXT: sarb $4, %al
				; X86-NEXT: # kill: def $al killed $al killed $eax
				; X86-NEXT: retl
				%tmp = call i4 @llvm.smul.fix.sat.i4(i4 %x, i4 %y, i32 0);
				ret i4 %tmp;
				}

				define <4 x i32> @vec2(<4 x i32> %x, <4 x i32> %y) nounwind {
				; X64-LABEL: vec2:
				; X64: # %bb.0:
				; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm1[1,1,2,3]
				; X64-NEXT: movd %xmm2, %ecx
				; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm0[1,1,2,3]
				; X64-NEXT: movd %xmm2, %r8d
				; X64-NEXT: movl %r8d, %edx
				; X64-NEXT: imull %ecx, %edx
				; X64-NEXT: xorl %esi, %esi
				; X64-NEXT: testl %edx, %edx
				; X64-NEXT: setns %sil
				; X64-NEXT: addl $2147483647, %esi # imm = 0x7FFFFFFF
				; X64-NEXT: imull %ecx, %r8d
				; X64-NEXT: cmovol %esi, %r8d
				; X64-NEXT: movd %xmm1, %edx
				; X64-NEXT: movd %xmm0, %ecx
				; X64-NEXT: movl %ecx, %esi
				; X64-NEXT: imull %edx, %esi
				; X64-NEXT: xorl %edi, %edi
				; X64-NEXT: testl %esi, %esi
				; X64-NEXT: setns %dil
				; X64-NEXT: addl $2147483647, %edi # imm = 0x7FFFFFFF
				; X64-NEXT: imull %edx, %ecx
				; X64-NEXT: cmovol %edi, %ecx
				; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm1[2,3,0,1]
				; X64-NEXT: movd %xmm2, %edx
				; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,3,0,1]
				; X64-NEXT: movd %xmm2, %esi
				; X64-NEXT: movl %esi, %edi
				; X64-NEXT: imull %edx, %edi
				; X64-NEXT: xorl %eax, %eax
				; X64-NEXT: testl %edi, %edi
				; X64-NEXT: setns %al
				; X64-NEXT: addl $2147483647, %eax # imm = 0x7FFFFFFF
				; X64-NEXT: imull %edx, %esi
				; X64-NEXT: cmovol %eax, %esi
				; X64-NEXT: pshufd {{.*#+}} xmm1 = xmm1[3,1,2,3]
				; X64-NEXT: movd %xmm1, %r9d
				; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,1,2,3]
				; X64-NEXT: movd %xmm0, %edx
				; X64-NEXT: movl %edx, %edi
				; X64-NEXT: imull %r9d, %edi
				; X64-NEXT: xorl %eax, %eax
				; X64-NEXT: testl %edi, %edi
				; X64-NEXT: setns %al
				; X64-NEXT: addl $2147483647, %eax # imm = 0x7FFFFFFF
				; X64-NEXT: imull %r9d, %edx
				; X64-NEXT: cmovol %eax, %edx
				; X64-NEXT: movd %edx, %xmm0
				; X64-NEXT: movd %esi, %xmm1
				; X64-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
				; X64-NEXT: movd %ecx, %xmm0
				; X64-NEXT: movd %r8d, %xmm2
				; X64-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
				; X64-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
				; X64-NEXT: retq
				;
				; X86-LABEL: vec2:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: pushl %ebx
				; X86-NEXT: pushl %edi
				; X86-NEXT: pushl %esi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movl %ecx, %esi
				; X86-NEXT: imull %edx, %esi
				; X86-NEXT: xorl %eax, %eax
				; X86-NEXT: testl %esi, %esi
				; X86-NEXT: setns %al
				; X86-NEXT: addl $2147483647, %eax # imm = 0x7FFFFFFF
				; X86-NEXT: imull %edx, %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: cmovol %eax, %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
				; X86-NEXT: movl %edx, %edi
				; X86-NEXT: imull %esi, %edi
				; X86-NEXT: xorl %eax, %eax
				; X86-NEXT: testl %edi, %edi
				; X86-NEXT: setns %al
				; X86-NEXT: addl $2147483647, %eax # imm = 0x7FFFFFFF
				; X86-NEXT: imull %esi, %edx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
				; X86-NEXT: cmovol %eax, %edx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
				; X86-NEXT: movl %esi, %ebx
				; X86-NEXT: imull %edi, %ebx
				; X86-NEXT: xorl %eax, %eax
				; X86-NEXT: testl %ebx, %ebx
				; X86-NEXT: setns %al
				; X86-NEXT: addl $2147483647, %eax # imm = 0x7FFFFFFF
				; X86-NEXT: imull %edi, %esi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
				; X86-NEXT: cmovol %eax, %esi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %edi, %ebp
				; X86-NEXT: imull %eax, %ebp
				; X86-NEXT: xorl %ebx, %ebx
				; X86-NEXT: testl %ebp, %ebp
				; X86-NEXT: setns %bl
				; X86-NEXT: addl $2147483647, %ebx # imm = 0x7FFFFFFF
				; X86-NEXT: imull %eax, %edi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: cmovol %ebx, %edi
				; X86-NEXT: movl %ecx, 12(%eax)
				; X86-NEXT: movl %edx, 8(%eax)
				; X86-NEXT: movl %esi, 4(%eax)
				; X86-NEXT: movl %edi, (%eax)
				; X86-NEXT: popl %esi
				; X86-NEXT: popl %edi
				; X86-NEXT: popl %ebx
				; X86-NEXT: popl %ebp
				; X86-NEXT: retl $4
				%tmp = call <4 x i32> @llvm.smul.fix.sat.v4i32(<4 x i32> %x, <4 x i32> %y, i32 0);
				ret <4 x i32> %tmp;
				}

				define i64 @func7(i64 %x, i64 %y) nounwind {
				; X64-LABEL: func7:
				; X64: # %bb.0:
				; X64-NEXT: movq %rdi, %rax
				; X64-NEXT: imulq %rsi
				; X64-NEXT: shrdq $32, %rdx, %rax
				; X64-NEXT: cmpq $2147483647, %rdx # imm = 0x7FFFFFFF
				; X64-NEXT: movabsq $9223372036854775807, %rcx # imm = 0x7FFFFFFFFFFFFFFF
				; X64-NEXT: cmovgq %rcx, %rax
				; X64-NEXT: cmpq $-2147483648, %rdx # imm = 0x80000000
				; X64-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
				; X64-NEXT: cmovlq %rcx, %rax
				; X64-NEXT: retq
				;
				; X86-LABEL: func7:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: pushl %ebx
				; X86-NEXT: pushl %edi
				; X86-NEXT: pushl %esi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
				; X86-NEXT: movl %ecx, %eax
				; X86-NEXT: mull {{[0-9]+}}(%esp)
				; X86-NEXT: movl %edx, %edi
				; X86-NEXT: movl %eax, %ebx
				; X86-NEXT: movl %ecx, %eax
				; X86-NEXT: mull {{[0-9]+}}(%esp)
				; X86-NEXT: addl %edx, %ebx
				; X86-NEXT: adcl $0, %edi
				; X86-NEXT: movl %esi, %eax
				; X86-NEXT: imull {{[0-9]+}}(%esp)
				; X86-NEXT: movl %edx, %ebp
				; X86-NEXT: movl %eax, %ecx
				; X86-NEXT: movl %esi, %eax
				; X86-NEXT: mull {{[0-9]+}}(%esp)
				; X86-NEXT: addl %ebx, %eax
				; X86-NEXT: adcl %edi, %edx
				; X86-NEXT: adcl $0, %ebp
				; X86-NEXT: addl %ecx, %edx
				; X86-NEXT: adcl $0, %ebp
				; X86-NEXT: movl %edx, %ecx
				; X86-NEXT: subl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl %ebp, %esi
				; X86-NEXT: sbbl $0, %esi
				; X86-NEXT: cmpl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: cmovnsl %ebp, %esi
				; X86-NEXT: cmovnsl %edx, %ecx
				; X86-NEXT: movl %ecx, %edx
				; X86-NEXT: subl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movl %esi, %edi
				; X86-NEXT: sbbl $0, %edi
				; X86-NEXT: cmpl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: cmovnsl %esi, %edi
				; X86-NEXT: cmovnsl %ecx, %edx
				; X86-NEXT: testl %edx, %edx
				; X86-NEXT: setg %cl
				; X86-NEXT: sets %ch
				; X86-NEXT: testl %edi, %edi
				; X86-NEXT: setg %bl
				; X86-NEXT: sete %bh
				; X86-NEXT: andb %ch, %bh
				; X86-NEXT: orb %bl, %bh
				; X86-NEXT: movl $2147483647, %esi # imm = 0x7FFFFFFF
				; X86-NEXT: cmovnel %esi, %edx
				; X86-NEXT: movl $-1, %esi
				; X86-NEXT: cmovnel %esi, %eax
				; X86-NEXT: cmpl $-1, %edi
				; X86-NEXT: setl %ch
				; X86-NEXT: sete %bl
				; X86-NEXT: andb %cl, %bl
				; X86-NEXT: xorl %esi, %esi
				; X86-NEXT: orb %ch, %bl
				; X86-NEXT: cmovnel %esi, %eax
				; X86-NEXT: movl $-2147483648, %ecx # imm = 0x80000000
				; X86-NEXT: cmovnel %ecx, %edx
				; X86-NEXT: popl %esi
				; X86-NEXT: popl %edi
				; X86-NEXT: popl %ebx
				; X86-NEXT: popl %ebp
				; X86-NEXT: retl
				%tmp = call i64 @llvm.smul.fix.sat.i64(i64 %x, i64 %y, i32 32);
				ret i64 %tmp;
				}

				define i64 @func8(i64 %x, i64 %y) nounwind {
				; X64-LABEL: func8:
				; X64: # %bb.0:
				; X64-NEXT: movq %rdi, %rax
				; X64-NEXT: imulq %rsi
				; X64-NEXT: shrdq $63, %rdx, %rax
				; X64-NEXT: movabsq $4611686018427387903, %rcx # imm = 0x3FFFFFFFFFFFFFFF
				; X64-NEXT: cmpq %rcx, %rdx
				; X64-NEXT: movabsq $9223372036854775807, %rcx # imm = 0x7FFFFFFFFFFFFFFF
				; X64-NEXT: cmovgq %rcx, %rax
				; X64-NEXT: movabsq $-4611686018427387904, %rcx # imm = 0xC000000000000000
				; X64-NEXT: cmpq %rcx, %rdx
				; X64-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
				; X64-NEXT: cmovlq %rcx, %rax
				; X64-NEXT: retq
				;
				; X86-LABEL: func8:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: pushl %ebx
				; X86-NEXT: pushl %edi
				; X86-NEXT: pushl %esi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
				; X86-NEXT: movl %ecx, %eax
				; X86-NEXT: mull {{[0-9]+}}(%esp)
				; X86-NEXT: movl %edx, %edi
				; X86-NEXT: movl %eax, %ebx
				; X86-NEXT: movl %ecx, %eax
				; X86-NEXT: mull {{[0-9]+}}(%esp)
				; X86-NEXT: movl %edx, %ebp
				; X86-NEXT: addl %ebx, %ebp
				; X86-NEXT: adcl $0, %edi
				; X86-NEXT: movl %esi, %eax
				; X86-NEXT: imull {{[0-9]+}}(%esp)
				; X86-NEXT: movl %edx, %ebx
				; X86-NEXT: movl %eax, %ecx
				; X86-NEXT: movl %esi, %eax
				; X86-NEXT: mull {{[0-9]+}}(%esp)
				; X86-NEXT: addl %ebp, %eax
				; X86-NEXT: adcl %edi, %edx
				; X86-NEXT: adcl $0, %ebx
				; X86-NEXT: addl %ecx, %edx
				; X86-NEXT: adcl $0, %ebx
				; X86-NEXT: movl %edx, %ecx
				; X86-NEXT: subl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl %ebx, %esi
				; X86-NEXT: sbbl $0, %esi
				; X86-NEXT: cmpl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: cmovnsl %ebx, %esi
				; X86-NEXT: cmovnsl %edx, %ecx
				; X86-NEXT: movl %ecx, %edi
				; X86-NEXT: subl {{[0-9]+}}(%esp), %edi
				; X86-NEXT: movl %esi, %ebx
				; X86-NEXT: sbbl $0, %ebx
				; X86-NEXT: cmpl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: cmovnsl %esi, %ebx
				; X86-NEXT: cmovnsl %ecx, %edi
				; X86-NEXT: movl %ebx, %edx
				; X86-NEXT: shldl $1, %edi, %edx
				; X86-NEXT: shrdl $31, %edi, %eax
				; X86-NEXT: cmpl $1073741823, %ebx # imm = 0x3FFFFFFF
				; X86-NEXT: movl $2147483647, %ecx # imm = 0x7FFFFFFF
				; X86-NEXT: cmovgl %ecx, %edx
				; X86-NEXT: movl $-1, %ecx
				; X86-NEXT: cmovgl %ecx, %eax
				; X86-NEXT: xorl %ecx, %ecx
				; X86-NEXT: cmpl $-1073741824, %ebx # imm = 0xC0000000
				; X86-NEXT: cmovll %ecx, %eax
				; X86-NEXT: movl $-2147483648, %ecx # imm = 0x80000000
				; X86-NEXT: cmovll %ecx, %edx
				; X86-NEXT: popl %esi
				; X86-NEXT: popl %edi
				; X86-NEXT: popl %ebx
				; X86-NEXT: popl %ebp
				; X86-NEXT: retl
				%tmp = call i64 @llvm.smul.fix.sat.i64(i64 %x, i64 %y, i32 63);
				ret i64 %tmp;
				}
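The CHECK lines in smul_fix_sat.ll are easier to read against a scalar model of what the intrinsic computes. The C++ below is a minimal sketch, not code from the patch. The function name `smul_fix_sat_ref` is hypothetical, the model only handles widths up to 32 bits, and it assumes inexact results round toward negative infinity (the LangRef leaves the rounding direction unspecified; flooring is what an arithmetic right shift gives). It follows the summary's description: multiply at full precision, shift right by the scale, then clamp to the smallest and largest values representable in the operand width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model of llvm.smul.fix.sat for widths 1..32.
// Assumptions (not taken from the patch): operands arrive sign-extended in
// std::int32_t, 0 <= scale < width, and an inexact result is rounded toward
// negative infinity.
std::int32_t smul_fix_sat_ref(std::int32_t x, std::int32_t y, unsigned scale,
                              unsigned width) {
  assert(width >= 1 && width <= 32 && scale < width);
  // The full product of two 32-bit values always fits in 64 bits.
  std::int64_t product = static_cast<std::int64_t>(x) * y;
  // Each operand carries `scale` fractional bits, so the product carries
  // 2 * scale; shifting right by `scale` restores the operands' scale.
  // >> on a negative value is an arithmetic shift (guaranteed since C++20).
  std::int64_t scaled = product >> scale;
  // Clamp to the range of a signed `width`-bit integer.
  const std::int64_t max = (std::int64_t{1} << (width - 1)) - 1;
  const std::int64_t min = -max - 1;
  if (scaled > max) return static_cast<std::int32_t>(max);
  if (scaled < min) return static_cast<std::int32_t>(min);
  return static_cast<std::int32_t>(scaled);
}
```

For example, `smul_fix_sat_ref(-8, 7, 0, 4)` models an i4 call at scale 0 like the one in func6: the product -56 is clamped to -8, the i4 minimum. Keeping the product in 64 bits means the clamp compares true values rather than wrapped ones.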

llvm/test/CodeGen/X86/smul_fix_sat_constants.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64

				; Verify expansion by using constant values. We just want to cover all the paths laid out by ExpandIntRes_MULFIX.

				declare i4 @llvm.smul.fix.sat.i4 (i4, i4, i32)
				declare i32 @llvm.smul.fix.sat.i32 (i32, i32, i32)
				declare i64 @llvm.smul.fix.sat.i64 (i64, i64, i32)
				declare <4 x i32> @llvm.smul.fix.sat.v4i32(<4 x i32>, <4 x i32>, i32)
				declare { i64, i1 } @llvm.smul.with.overflow.i64(i64, i64)

				define i64 @func() nounwind {
				; X64-LABEL: func:
				; X64: # %bb.0:
				; X64-NEXT: movl $2, %ecx
				; X64-NEXT: movl $3, %eax
				; X64-NEXT: imulq %rcx
				; X64-NEXT: shrdq $2, %rdx, %rax
				; X64-NEXT: cmpq $1, %rdx
				; X64-NEXT: movabsq $9223372036854775807, %rcx # imm = 0x7FFFFFFFFFFFFFFF
				; X64-NEXT: cmovgq %rcx, %rax
				; X64-NEXT: cmpq $-2, %rdx
				; X64-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
				; X64-NEXT: cmovlq %rcx, %rax
				; X64-NEXT: retq
				%tmp = call i64 @llvm.smul.fix.sat.i64(i64 3, i64 2, i32 2);
				ret i64 %tmp;
				}

				define i64 @func2() nounwind {
				; X64-LABEL: func2:
				; X64: # %bb.0:
				; X64-NEXT: movl $3, %eax
				; X64-NEXT: imulq $2, %rax, %rcx
				; X64-NEXT: xorl %edx, %edx
				; X64-NEXT: testq %rcx, %rcx
				; X64-NEXT: setns %dl
				; X64-NEXT: movabsq $9223372036854775807, %rcx # imm = 0x7FFFFFFFFFFFFFFF
				; X64-NEXT: addq %rdx, %rcx
				; X64-NEXT: imulq $2, %rax, %rax
				; X64-NEXT: cmovoq %rcx, %rax
				; X64-NEXT: retq
				%tmp = call i64 @llvm.smul.fix.sat.i64(i64 3, i64 2, i32 0);
				ret i64 %tmp;
				}

				define i64 @func3() nounwind {
				; X64-LABEL: func3:
				; X64: # %bb.0:
				; X64-NEXT: movabsq $9223372036854775807, %rcx # imm = 0x7FFFFFFFFFFFFFFF
				; X64-NEXT: movl $2, %edx
				; X64-NEXT: movq %rcx, %rax
				; X64-NEXT: imulq %rdx
				; X64-NEXT: shrdq $2, %rdx, %rax
				; X64-NEXT: cmpq $1, %rdx
				; X64-NEXT: cmovgq %rcx, %rax
				; X64-NEXT: cmpq $-2, %rdx
				; X64-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
				; X64-NEXT: cmovlq %rcx, %rax
				; X64-NEXT: retq
				%tmp = call i64 @llvm.smul.fix.sat.i64(i64 9223372036854775807, i64 2, i32 2);
				ret i64 %tmp;
				}

				define i64 @func4() nounwind {
				; X64-LABEL: func4:
				; X64: # %bb.0:
				; X64-NEXT: movabsq $9223372036854775807, %rcx # imm = 0x7FFFFFFFFFFFFFFF
				; X64-NEXT: movl $2, %edx
				; X64-NEXT: movq %rcx, %rax
				; X64-NEXT: imulq %rdx
				; X64-NEXT: shrdq $32, %rdx, %rax
				; X64-NEXT: cmpq $2147483647, %rdx # imm = 0x7FFFFFFF
				; X64-NEXT: cmovgq %rcx, %rax
				; X64-NEXT: cmpq $-2147483648, %rdx # imm = 0x80000000
				; X64-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
				; X64-NEXT: cmovlq %rcx, %rax
				; X64-NEXT: retq
				%tmp = call i64 @llvm.smul.fix.sat.i64(i64 9223372036854775807, i64 2, i32 32);
				ret i64 %tmp;
				}

				define i64 @func5() nounwind {
				; X64-LABEL: func5:
				; X64: # %bb.0:
				; X64-NEXT: movabsq $9223372036854775807, %rcx # imm = 0x7FFFFFFFFFFFFFFF
				; X64-NEXT: movl $2, %edx
				; X64-NEXT: movq %rcx, %rax
				; X64-NEXT: imulq %rdx
				; X64-NEXT: shrdq $63, %rdx, %rax
				; X64-NEXT: movabsq $4611686018427387903, %rsi # imm = 0x3FFFFFFFFFFFFFFF
				; X64-NEXT: cmpq %rsi, %rdx
				; X64-NEXT: cmovgq %rcx, %rax
				; X64-NEXT: movabsq $-4611686018427387904, %rcx # imm = 0xC000000000000000
				; X64-NEXT: cmpq %rcx, %rdx
				; X64-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
				; X64-NEXT: cmovlq %rcx, %rax
				; X64-NEXT: retq
				%tmp = call i64 @llvm.smul.fix.sat.i64(i64 9223372036854775807, i64 2, i32 63);
				ret i64 %tmp;
				}
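In the X64 output for nonzero scales (func7 and func8 in smul_fix_sat.ll, and func, func3, func4 and func5 here), `imulq` leaves the high 64 bits of the 128-bit product in %rdx, and the result saturates when %rdx is greater than 2^(scale-1) - 1 or less than -2^(scale-1): 1 and -2 for scale 2, 0x7FFFFFFF and 0x80000000 for scale 32, 0x3FFFFFFFFFFFFFFF and 0xC000000000000000 for scale 63. The hypothetical C++ sketch below mirrors that check to show why it is equivalent to clamping the shifted product. It assumes a compiler that provides `__int128` (a GCC/Clang extension), covers only scales 1 to 63 (scale 0 goes through the overflow-flag path seen in func2), and is not the expansion code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch (not the patch's code) of the i64 saturation check for
// llvm.smul.fix.sat with 1 <= scale <= 63. Requires __int128, a GCC/Clang
// extension on 64-bit targets; >> on a negative __int128 is assumed to be an
// arithmetic shift, as it is on those compilers.
std::int64_t smul_fix_sat_i64(std::int64_t x, std::int64_t y, unsigned scale) {
  assert(scale >= 1 && scale <= 63);
  // Full 128-bit product; `hi` is what imulq leaves in %rdx.
  const __int128 product = static_cast<__int128>(x) * y;
  const std::int64_t hi = static_cast<std::int64_t>(product >> 64);
  // The shifted result overflows i64 exactly when
  //   product >= 2^(63 + scale), i.e. hi > 2^(scale - 1) - 1, or
  //   product < -2^(63 + scale), i.e. hi < -2^(scale - 1).
  const std::int64_t limit = std::int64_t{1} << (scale - 1);
  if (hi > limit - 1) return INT64_MAX;  // cf. cmpq + cmovgq
  if (hi < -limit) return INT64_MIN;     // cf. cmpq + cmovlq
  // In range: bits [scale, scale + 63] of the product, as shrdq computes.
  return static_cast<std::int64_t>(product >> scale);
}
```

With the constant inputs from this file, the sketch returns 1 for func, 0x3FFFFFFFFFFFFFFF for func3, 0xFFFFFFFF for func4 and 1 for func5. None of them saturate: 0x7FFFFFFFFFFFFFFF × 2 is 2^64 - 2, whose high half is 0.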

Revision Contents

Diff 199655