This is an archive of the discontinued LLVM Phabricator instance.

[Intrinsic] Signed Fixed Point Multiplication Intrinsic
ClosedPublic

Authored by leonardchan on Nov 19 2018, 12:28 PM.

Details

Summary

Add an intrinsic that takes 2 signed integers with the scale of them provided as the third argument and performs fixed point multiplication on them.

This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics.

Diff Detail

Repository
rL LLVM

Event Timeline

leonardchan created this revision.Nov 19 2018, 12:28 PM

Need to update LangRef.rst which I think we also missed for the saturating intrinsics.

llvm/include/llvm/CodeGen/ISDOpcodes.h
278 ↗(On Diff #174669)

This is says it saturates but I didn't see that implemented in the expansion function.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
625 ↗(On Diff #174669)

Get rid of Op1 and Op2 and just do GetPromotedInteger(N->getOperand(0))

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5093 ↗(On Diff #174669)

Shift amount constants should get their type from getShiftAmountTy.

5096 ↗(On Diff #174669)

This is really an OR isn't it? DAG combiner will turn it into that so might as well just use OR.

5097 ↗(On Diff #174669)

No need for an else after a return.

llvm/lib/IR/Verifier.cpp
4531 ↗(On Diff #174669)

argumenr->argument

4532 ↗(On Diff #174669)

I think you need to check that Op3 is a ConstantInt as well. And that it fits in 32 bits.

leonardchan marked 7 inline comments as done.

Need to update LangRef.rst which I think we also missed for the saturating intrinsics.

Funny, I actually just uploaded a patch for the LangRef docs on the other intrinsics (https://reviews.llvm.org/D54729). Also added the docs for this intrinsic in this patch.

llvm/include/llvm/CodeGen/ISDOpcodes.h
278 ↗(On Diff #174669)

My bad, this intrinsic doesn't perform saturation.

bjope added inline comments.Nov 20 2018, 4:29 AM
llvm/docs/LangRef.rst
12604 ↗(On Diff #174703)

I'm not sure if/how the auto-vectorizers are aware that only the first two operands should be vectorized and that the third operand should stay scalar. So I guess it is unlikely that we ever get any vector operands at the moment (except for handwritten lit tests).

So I guess this should work for now (after all it is simpler than having a vector of scales). But perhaps we need to allow the scale to be given as a vector as well in the future to support vectorization (shl is for example similar, but afaik it will get the shift counts as a vector when being vectorized).

llvm/include/llvm/CodeGen/TargetLowering.h
801 ↗(On Diff #174703)

Shouldn't this say "scale" instead of "saturation bit width"?

819 ↗(On Diff #174703)

Same as above, isn't the check about "scales" and not "saturation widths"?

llvm/include/llvm/IR/Intrinsics.td
742 ↗(On Diff #174703)

nit: In the langref you are using "Saturation Arithmetic Intrinsics" and "Fixed Point Arithemetic Instrinsics" as two separate chapters. And here we put all of them into one category "Fixed Point Intrinsics". I'm not sure if the saturating arithmetics should be in a fixed point category.

Anyway, when adding a int_smul_fix_sat later I guess it will be in the "Fixed Point Arithemetic Instrinsics" in the langref (even if it also is saturating). So maybe it is better to not having this split into two categories after all.

RKSimon added inline comments.Nov 20 2018, 5:33 AM
llvm/test/CodeGen/X86/smul_fix.ll
3 ↗(On Diff #174703)

You should be able to drop -mcpu, also please can you use -check-prefix=X64 and -check-prefix=X86 ?

9 ↗(On Diff #174703)

Add nounwind to all the tests to reduce stack codegen?

ebevhan added inline comments.Nov 20 2018, 6:31 AM
llvm/docs/LangRef.rst
12628 ↗(On Diff #174703)

typo "up"

12629 ↗(On Diff #174703)

This doc doesn't say what happens on overflow. Truncation/wraparound? Or should we consider it undefined?

If it says that the rounding is dependent on the target, will the rounding mode be target-configurable and exposed through TLI/TTI? We essentially can't touch these intrinsics (constant folding, optimization, even legalization) otherwise.

Looking at the legalization sequence, it will definitely lower these into a round-down form. If a target has some legal operations which round up and some non-legal operations, then the legal ones will round up and the non-legal ones will round down. That's sort of messy.

It might be safer to say that the rounding is indeterminate, but that's even worse for optimization.

llvm/include/llvm/CodeGen/ISDOpcodes.h
278 ↗(On Diff #174703)

The intrinsic docs in the LangRef mentions that the last value must be constant, but this comment doesn't.

llvm/include/llvm/CodeGen/TargetLowering.h
3785 ↗(On Diff #174703)

This comment clashes with the assertions in the function. It doesn't take vectors.

llvm/include/llvm/Target/TargetSelectionDAG.td
383 ↗(On Diff #174703)

It's marked SDTIntBinOp, but is supposed to have three input operands. I think these nodes might need a new SDT.

Also, multiplication is obviously commutative, but I don't know if SDNPCommutative works on nodes that have anything except two operands. It might not have an effect at all. Someone who knows more about DAG might have more info on that.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2543 ↗(On Diff #174703)

Maybe I'm missing something here, but isn't this just a normal, expanded multiplication?

2544 ↗(On Diff #174703)

If we hit this return, doesn't this mean that legalization failed? Do we want to catch this here?

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5101 ↗(On Diff #174703)

Could use a couple comments explaining what we're doing with the values/SRL/SHL.

Does this work if MULHS in VT is of dubious legality?

craig.topper added inline comments.Nov 20 2018, 10:16 AM
llvm/include/llvm/Target/TargetSelectionDAG.td
383 ↗(On Diff #174703)

Tablegen should use the first two operands as the commutative ones. It used to be an error or an assertion. But I changed it sometime in the last year or so to make it work for FMA in X86.

craig.topper added inline comments.Nov 20 2018, 10:18 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5101 ↗(On Diff #174703)

Ideally we'd use MUL_LOHI if the target supports it. That should allow X86 to use a single multiply instruction in the test cases.

leonardchan marked 15 inline comments as done.
leonardchan added inline comments.
llvm/docs/LangRef.rst
12629 ↗(On Diff #174703)

Overflow I was going to leave as undefined behavior. Added this into the docs.

By target dependent, I meant that whatever legal operation a target could replace this intrinsic with has the choice of rounding up or down. I'm not sure how bad this would be for optimization, but I imagine rounding isn't something that needs to be touched immediately since with this intrinsic, I'm just attempting to match what the standard says. I think one of the conclusions we also came up to in the long thread on llvm-dev was that, for now, we don't need to do more than what's necessary to implement the spec.

I changed the docs though to say rounding is indeterminable, as the spec says.

llvm/include/llvm/IR/Intrinsics.td
742 ↗(On Diff #174703)

Split into 2 categories. Was also planning on putting the saturating fixed point ones under "Fixed Point Intrinsics".

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2543 ↗(On Diff #174703)

This should actually be an expansion of SMUL_LOHI since we need the high bits when scaling down the result after multiplication. Updated the code, but one thing I found was that expandMUL_LOHI depends on ADDC but does not have a check for it. When rerunning the tests, it seems that X86 does not support 32 bit ADDE (and ADDC).

LLVM ERROR: Cannot select: t81: i32,glue = adde t121, t44:1, t80:1

so I removed these from the lit tests, but if we want to support expansion of this, I'm not sure if the solution is to add a check into expandMUL_LOHI to check isOperationLegalOrCustom(ISD::ADDC/E, VT) and if it is not legal, write out the expanded version. I found a comment in ExpandIntRes_ADDSUB() saying that there currently does not appear to be a way of generating a value of MTV::Glue if we were to expand the result.

2544 ↗(On Diff #174703)

Added a fatal error report here.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5101 ↗(On Diff #174703)

Added checks to see if we can use MULHS or SMUL_LOHI.

ebevhan added inline comments.Nov 21 2018, 1:37 AM
llvm/docs/LangRef.rst
12629 ↗(On Diff #174703)

Okay. If we claim that it's indeterminate I guess we can still fold however we like, but the results would seem a bit inconsistent.

12635 ↗(On Diff #174858)

I think "operation" might be clearer than "conversion". Or don't mention it at all and just say "It is undefined behavior if the source value does not fit within the range of the fixed point type."

The rounding section is also a bit wordy. "If the result value cannot be precisely represented in the given scale, the value is rounded up or down to the closest representable value. The rounding direction is unspecified."

Also, my choice of the word "indeterminate" was a bit unfortunate, I think the rest of the LangRef uses "unspecified".

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2543 ↗(On Diff #174703)

It's a bit odd that x86 doesn't support ADDC/ADDE. Those nodes are deprecated in favor of ADDCARRY, so expandMUL_LOHI should probably be making expansions with ADDCARRY instead.

@craig.topper probably knows more about this.

llvm/test/CodeGen/X86/smul_fix.ll
28 ↗(On Diff #174858)

Interesting that the 32-bit target produces better code in most cases.

leonardchan marked 6 inline comments as done.

I think @eli.friedman might know more about the ADDE/ADDC vs ADDCARRY than I do.

Do we have examples of real hardware that implements this sort of instruction?

Right now this intrinsic looks like its just reimplementing what you would get if you just emitted this IR

%a = sext iX %x to i2X
%b = sext iX %y to i2X
%c = mul i2X %a, %b
%d = lshr i2X %c, PRECISION
%e = trunc i2X %d to iX

So I'm not sure if there's a need for an intrinsic unless that pattern is difficult to match to an instruction.

Do we have examples of real hardware that implements this sort of instruction?

Right now this intrinsic looks like its just reimplementing what you would get if you just emitted this IR

%a = sext iX %x to i2X
%b = sext iX %y to i2X
%c = mul i2X %a, %b
%d = lshr i2X %c, PRECISION
%e = trunc i2X %d to iX

So I'm not sure if there's a need for an intrinsic unless that pattern is difficult to match to an instruction.

I think @ebevhan and @bjope may have hardware with fixed point instructions

Do we have examples of real hardware that implements this sort of instruction?

Right now this intrinsic looks like its just reimplementing what you would get if you just emitted this IR

%a = sext iX %x to i2X
%b = sext iX %y to i2X
%c = mul i2X %a, %b
%d = lshr i2X %c, PRECISION
%e = trunc i2X %d to iX

So I'm not sure if there's a need for an intrinsic unless that pattern is difficult to match to an instruction.

I think @ebevhan and @bjope may have hardware with fixed point instructions

Yes, our HW support various kinds of fixed point multiplication (such as llvm.smul.fix.i32(i32 %x, i32 %y, i32 31) and llvm.smul.fix.i24(i24 %x, i24 %y, i32 15)). We already have an intrinsic like llvm.smul.fix downstream.

For our downstream target we keep the intrinsic all the way to ISel. When compiling for other targets we have so far used an IR pass that transform the intrinsic to regular IR early in opt. I'm afraid that if we do that kind of expansion also for our target, we would introduce mul/lshr using non-legal types like i64 or i48. If we can't successfully match the IR back to a native instruction we will get really bad performance by doing the multiplication using the non-legal types. I'm also afraid that the pattern matching easily can be broken, for example by an IR pass that sinks the trunc to a later BB.

efriedma added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2543 ↗(On Diff #174703)

Yes, ADDC/ADDE are deprecated, and only work on targets which specifically implement lowering support, so there are a lot of targets where they simply can't be used (either because the target uses ADDCARRY instead, or because the operation simply doesn't exist on the target). expandMUL_LOHI is only used in very few places, though, so I guess we haven't hit this issue before.

There are a few different ways to expand an ADD; see DAGTypeLegalizer::ExpandIntRes_ADDSUB. Ideally, we should just generate ADDCARRY and let legalization take care of the rest, but I don't think ADDCARRY legalization is currently implemented. Probably not hard to implement, though. (Maybe keep the existing ADDC/ADDE code for targets which use them.)

leonardchan marked an inline comment as done.Nov 28 2018, 5:47 PM
leonardchan added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2543 ↗(On Diff #174703)

I changed expandMUL_LOHI to use ADDCARRY if ADDC and ADDE are not available, and the corresponding expansion in the DAGLegalizer.

*ping* Does anyone have any more comments on this patch?

@craig.topper @RKSimon @efriedma Any more comments on this patch?

You might need PromoteIntOp_SMULFIX for certain targets, like 64-bit RISCV.

llvm/docs/LangRef.rst
12830 ↗(On Diff #175795)

"The rounding direction is unspecified" seems scary to me... is there really no standard for rounding these operations?

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
624 ↗(On Diff #175795)

Do you need to sign-extend here?

3290 ↗(On Diff #175795)

Maybe worth adding a comment that the operand that needs to be expanded must be the third operand, since the other two operands have the same type as the result.

And actually, I'm not sure how you would hit this in practice; it seems unusual to split a boolean value.

ebevhan added inline comments.Dec 4 2018, 3:39 AM
llvm/docs/LangRef.rst
12830 ↗(On Diff #175795)

The Embedded-C standard says that the rounding direction is implementation-defined, and is allowed to be indeterminable.

I also think it's a bit unfortunate that we can't express the rounding direction, since it inhibits optimization, but locking it down to either rounding up or down will make things annoying for any target with operations that round the other direction.

Saying that it's unspecified lets us properly legalize this to something sensible while allowing targets to implement it with legal operations if they have them.

For what it's worth, the legalization will make it round down (toward negative infinity).

bjope added inline comments.Dec 4 2018, 6:57 AM
llvm/docs/LangRef.rst
12830 ↗(On Diff #175795)

Maybe it should say "target specific" instead? And then the legalization (and any constant folding etc) could use some target transform hook to decide in which direction to do the rounding (I guess it could be set to up/down/any).

Does "unspecified" mean that ConstantFolding use one strategy for rounding and legalization another strategy? Or should we only allow constant folding when rounding isn't a problem (such as multiply by zero)?

Things could still be annoying for a target that does care about rounding. Or maybe it isn't allowed for a target to decide what the implementation-defined behavior should be? Or should it be the same for all targets?

A first implementation could of course go for only supporting "unspecified" (if it is clear what kind of constant folding and other optimizations that are allowed). If a target needs a "target specific" behavior, then I assume the code could be sprinkled with hooks at a later stage.

leonardchan marked 11 inline comments as done.

You might need PromoteIntOp_SMULFIX for certain targets, like 64-bit RISCV.

Added, although RISCV doesn't seem to support [US]MUL_LOHI, so I couldn't add a test for that.

llvm/docs/LangRef.rst
12830 ↗(On Diff #175795)

Maybe it should say "target specific" instead? And then the legalization (and any constant folding etc) could use some target transform hook to decide in which direction to do the rounding (I guess it could be set to up/down/any).

I would be ok with adding this later. For now I was focusing on laying down the basic code necessary for just using this intrinsic.

Does "unspecified" mean that ConstantFolding use one strategy for rounding and legalization another strategy? Or should we only allow constant folding when rounding isn't a problem (such as multiply by zero)?

I initially had it as "target specific" but changed it to "unspecified" since the rest of the LangRef doc seems to use this term for describing either implementation-defined, or undefined, behavior.

Things could still be annoying for a target that does care about rounding. Or maybe it isn't allowed for a target to decide what the implementation-defined behavior should be? Or should it be the same for all targets?
A first implementation could of course go for only supporting "unspecified" (if it is clear what kind of constant folding and other optimizations that are allowed). If a target needs a "target specific" behavior, then I assume the code could be sprinkled with hooks at a later stage.

The hooks are a good idea. I figure for optimizations, depending on what's supported (up/down/any), InstCombine and others can use the hook to determine rounding direction.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
624 ↗(On Diff #175795)

Yup

3290 ↗(On Diff #175795)

Oh. You're right, this shouldn't have happened in the first place. When expanding to use ADDCARRY in expandMUL_LOHI, I accidentally created the constant with a VT that was the same as the operands instead of an MTV::i1.

Fixed this, and this method is no longer necessary.

efriedma added inline comments.Dec 4 2018, 3:45 PM
llvm/docs/LangRef.rst
12830 ↗(On Diff #175795)

Would it make sense to make the rounding mode a parameter to the intrinsic? That way, frontends can choose the semantics they want, and the semantics are clear throughout the optimization pipeline. clang could choose the rounding mode in a target-specific manner if necessary.

leonardchan marked an inline comment as done.Dec 4 2018, 3:56 PM
leonardchan added inline comments.
llvm/docs/LangRef.rst
12830 ↗(On Diff #175795)

The only thing that I think would be of concern is just having too many parameters, or having an extra intrinsic to dictate rounding direction. Other than this, I have no other strong opinions regarding this.

I imagine though that having an extra parameter/intrinsic to specify rounding would imply that a particular target has the option of specifying rounding in both directions, when really a target would likely just have native rounding in one direction (although this is just an assumption).

*ping* @craig.topper @RKSimon Any more comments on this patch?

craig.topper added inline comments.Dec 5 2018, 1:03 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2568 ↗(On Diff #176703)

Just get N->getConstantOperandVal(2)?

2599 ↗(On Diff #176703)

This result is only used by one of the 2 ifs below. Should it be pulled into that if.

Can you add a comment to describe what's happening here?

leonardchan marked 2 inline comments as done.
leonardchan marked an inline comment as done.Dec 5 2018, 5:21 PM
leonardchan added inline comments.
llvm/docs/LangRef.rst
12830 ↗(On Diff #175795)

@ebevhan @bjope Do you think an extra parameter/intrinsic is worth adding also in this patch?

craig.topper added inline comments.Dec 6 2018, 11:14 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2592 ↗(On Diff #176874)

if Scale is grater than NVTSize then this number is negative.

2607 ↗(On Diff #176874)

Why is ResultLL being shifted right twice? I don't think I understand that.

2624 ↗(On Diff #176874)

If Scale == NVTSize then the result is exactly in ResultHL and ResultLH, but nothing can prevent at least 1 bit of ResultHH from being used here. The shift amount for the ResultHH shift must be between 0 and NVTSize-1 to not be undefined.

leonardchan marked 5 inline comments as done.
leonardchan added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2607 ↗(On Diff #176874)

My bad. It shouldn't have shifted twice.

2624 ↗(On Diff #176874)

Done. Added test case for this also.

ebevhan added inline comments.Dec 7 2018, 1:15 AM
llvm/docs/LangRef.rst
12830 ↗(On Diff #175795)

It's definitely handy to be able to express the rounding of the operations in the intrinsic, but I think some of the downsides are:

  • The number of parameters grows, making the intrinsic more unwieldy.
  • There are other roundings than just up and down; rounding towards or away from zero (the former of which I think the fixed-point division ought to try and always follow) are also possibilities.
  • We need to support legalization of every rounding mode we add to the intrinsic. Then again, if we add a hook we need to do the same anyway.
  • The legalization checks need to check both rounding and scale.

And as you say, it's more likely that the rounding of a specific target should be the same across the board rather than be configurable per operation, if it has support for fixed-point.

ebevhan added inline comments.Dec 7 2018, 1:33 AM
llvm/docs/LangRef.rst
12830 ↗(On Diff #175795)

Ignore what I said about the division rounding; I was totally off mark there.

I think we can just keep it unspecified for now and if the need arises, add rounding mode with a hook.

Use getConstantOperandVal where possible

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1132 ↗(On Diff #177032)

unsigned Scale = Node->getConstantOperandVal(2);

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
418 ↗(On Diff #177032)

unsigned Scale = Node->getConstantOperandVal(2);

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5210 ↗(On Diff #177032)

unsigned Scale = Node->getConstantOperandVal(2);

leonardchan marked 9 inline comments as done.

*ping* Any more comments on this patch?

This revision is now accepted and ready to land.Dec 11 2018, 4:34 PM
This revision was automatically updated to reflect the committed changes.
lebedev.ri added inline comments.
llvm/trunk/docs/LangRef.rst
12820–12829

To someone unfamiliar with what fixed-point math is, this is somewhat vague.

Would it please be possible to reword this with some more details?
Overview section in https://llvm.org/docs/LangRef.html#llvm-fshl-intrinsic is a nice example of it would look best.

In particular, it isn't all that obvious how scale interacts with lhs/rhs.
My current guess:

%r = call i4 @llvm.smul.fix.i4(i4 %a, i4 %b, i32 %c)
  =>
%a2 = sext i4 %a to i8
%b2 = sext i4 %b to i8
%mul = mul nsw nuw i8 %a, %b
%c2 = trunc i32 %c to i8
%scale = ashr i8 %mul, i8 %c ; does not convey the randomness of rounding though
%r = trunc i8 %scale to i4
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2019, 8:18 AM