This is an archive of the discontinued LLVM Phabricator instance.

[Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic
ClosedPublic

Authored by leonardchan on Dec 14 2018, 2:19 PM.

Details

Summary

Add an intrinsic that takes 2 signed integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands.

This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics.

Diff Detail

Repository
rL LLVM

Event Timeline

leonardchan created this revision.Dec 14 2018, 2:19 PM

Nothing sticks out to me, so I think it looks good. Hard to tell if there are any sneaky edge cases in the lowering steps, though.

Maybe you should rebase this on top of the unsigned patch, since they're touching all the same places. Or are you waiting for it to land?

Nothing sticks out to me, so I think it looks good. Hard to tell if there are any sneaky edge cases in the lowering steps, though.

Maybe you should rebase this on top of the unsigned patch, since they're touching all the same places. Or are you waiting for it to land?

Yeah, I figure it would be better to submit these as separate patches since they're still technically independent of each other and ideally makes it easier for it to review.

@bjope @craig.topper @RKSimon Any comments on this patch?

RKSimon added inline comments.Jan 9 2019, 2:05 PM
llvm/test/CodeGen/X86/smul_fix_sat.ll
45

nounwind

leonardchan marked an inline comment as done.

Should https://reviews.llvm.org/D56987 be a parent for this? Then you'd need to rebase getExpandedFixedPointMultiplication since that has changed into converting into MUL when scale is zero (that is not valid for saturation).

Should https://reviews.llvm.org/D56987 be a parent for this? Then you'd need to rebase getExpandedFixedPointMultiplication since that has changed into converting into MUL when scale is zero (that is not valid for saturation).

Rebased

RKSimon added inline comments.Jan 30 2019, 1:08 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5490

(style) Do an early out to reduce indentation

if (!Saturating)
  return Result;
leonardchan marked an inline comment as done.

Updated and rebased

Please rebase after D55625 lands

Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2019, 5:42 AM

Updated and rebased

RKSimon added inline comments.Feb 4 2019, 10:39 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5427–5446

You've changed the logic to let non-vector cases to fall through, which leads to UNDEFs for scale == 0 cases.

leonardchan marked an inline comment as done.
bjope added inline comments.Feb 7 2019, 12:02 AM
llvm/lib/CodeGen/TargetLoweringBase.cpp
627

I'm not sure how to do this when overriding for a specific target. In our case we want it to be legal, but only when the scale is 15 (and VT is i16 or i24) or the scale is 31 (and VT is i32 or i40). Is there some easy solution for that? Setting it to legal/custom for any scale might be seen as an indication for optimizers that it is OK to introduce these operations for any scale.

This is however a general comment, also for the already pushed non-saturating versions. So it isn't anything that you need to deal with in this patch. But we might need a better solution in the long term.

ebevhan added inline comments.Feb 7 2019, 12:33 AM
llvm/lib/CodeGen/TargetLoweringBase.cpp
627

That's what the isSupportedFixedPointOperation hook in TargetLowering is for.

bjope added inline comments.Feb 7 2019, 12:40 AM
llvm/test/CodeGen/X86/smul_fix_sat.ll
2

The expansion is quite complicated (the splitting in four parts and detecting overflow etc).
Isn't there a risk that X86 is a typical target that will try to find more optimal solutions and maybe also make SMULFIXSAT legal? Then this test case might not really verify the expand code any longer?

On the other hand, these test cases are just jibberish to me anyway. I can't tell from looking at the checks that DAGTypeLegalizer::ExpandIntRes_MULFIX is doing the right thing. And it would not really help if using another target. Are there perhaps other ways to test DAGTypeLegalizer, such as unit tests?

One thing that probably can be done quite easily is to a bunch of tests using constant operands. Verifying that DAGCombiner will constant fold to the expected result after having expanded into legal operations (somehow making sure that DAGCombiner do not constant fold the SMULFIXSAT before it has been expanded, I guess someone will add such DAGCombines sooner or later). That way you might be able to get coverage for all paths through DAGTypeLegalizer::ExpandIntRes_MULFIX. Maybe this test should be in a separate test file.

bjope added inline comments.Feb 7 2019, 1:13 AM
llvm/lib/CodeGen/TargetLoweringBase.cpp
627

Ah, yes! And that is updated in this patch. Just me being blind (in combination with a some amnesia).

RKSimon added inline comments.Feb 7 2019, 4:54 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5420

Why drop the assert? Why not just add ISD::SMULFIXSAT tests?

5428

I think you need something like this here (please double check my logic):

if (VT.isVector() && !isOperationLegalOrCustom(ISD::SMULO, VT) &&
    !(!Saturating && isOperationLegalOrCustom(ISD::MUL, VT)))
  return SDValue(); // unroll

And that will let you avoid the return SDValue() below by always defaulting to a scalar ISD::MUL/ISD::SMULO that legalization can handle.

RKSimon added inline comments.Feb 8 2019, 11:05 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5428

I've committed rL353546 which /should/ mean that the scale==0 case is now safe to drop through.

leonardchan marked 6 inline comments as done.

Fixed mistake related to saturation during expansion. When we promote the operand and result type widths, this also changes the saturation width and affects the min/max values we compare against. This is easily solved by also shifting one of the operands after extension.

Oh I forgot to submit these inline comments

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5428

I thought we still want to allow vectors to pass to calls to MUL and SMULO? Wouldn't this scalarize when we disallow vectors even if MUL and SMULO are legeal?

llvm/test/CodeGen/X86/smul_fix_sat.ll
2

Yeah I can see how these tests are hard to read. I wasn't aware of other ways this could be tested other than making sure the codegen is the same each time. I have my own scripts with different cases to verify the output is correct, but wasn't sure of any existing widely used method of "taking my IR, running it, and verify the results".

Testing with constant operands seems to produce better looking tests for non-saturating multiplication:

define i4 @func() {
; X64-LABEL: func:
; X64:       # %bb.0:
; X64-NEXT:    movb $3, %al 
; X64-NEXT:    retq
  %tmp = call i4 @llvm.smul.fix.i4( i4 3, i4 2 , i32 1)
  ret i4 %tmp
}

where we can immediately tell the result is 3, but there's still branching in the saturating case:

define i4 @func2() {
; X64-LABEL: func2:
; X64:       # %bb.0:
; X64-NEXT:    xorl %eax, %eax
; X64-NEXT:    testb %al, %al 
; X64-NEXT:    movb $127, %cl 
; X64-NEXT:    jg .LBB1_2
; X64-NEXT:  # %bb.1:
; X64-NEXT:    movb $3, %cl 
; X64-NEXT:  .LBB1_2:
; X64-NEXT:    movb $-1, %al 
; X64-NEXT:    negb %al 
; X64-NEXT:    movb $-128, %al 
; X64-NEXT:    jl .LBB1_4
; X64-NEXT:  # %bb.3:
; X64-NEXT:    movl %ecx, %eax
; X64-NEXT:  .LBB1_4:
; X64-NEXT:    retq
  %tmp = call i4 @llvm.smul.fix.sat.i4( i4 3, i4 2 , i32 1)
  ret i4 %tmp
}

so we can't get something as straightforward as with non-saturating.

leonardchan marked an inline comment as done.
leonardchan added inline comments.
llvm/test/CodeGen/X86/smul_fix_sat.ll
2

@bjope I added another test file that covers the saturation branches in ExpandIntRes_MULFIX using constant operands, although this doesn't seem to produce anything more readable than with variable operands.

bjope added inline comments.Feb 22 2019, 4:27 PM
llvm/test/CodeGen/X86/smul_fix_sat.ll
2

Maybe it doesn't fold due to lack of constant folding for SMUL_LOHI (at least not for x86). What a pity.

I tried running the test using -mtriple=x86_64--, that at least produce code that is easier to map to the expansion.

I also tried some other targets:

-mtriple=ppc32  => looks like we get some constant folding here
-mtriple=ppc64 => asserts in llvm::SelectionDAG::transferDbgValues
-mtriple=hexagon => asserts in llvm::SelectionDAG::transferDbgValues
-mtriple=systemz => asserts in llvm::SelectionDAG::transferDbgValues
-mtriple=sparc => LLVM ERROR: Cannot select: t42: i32,i32 = addcarry t41:1, Constant:i32<0>, t90:1

(FWIW, no idea if the asserts and LLVM ERROR actually is related to your patch)

leonardchan marked an inline comment as done.

*ping*

llvm/test/CodeGen/X86/smul_fix_sat.ll
2

Updated the test to use -mtriple=x86_64-linuxand it looks a lot more readable.

bjope added inline comments.Mar 7 2019, 8:10 AM
llvm/test/CodeGen/X86/smul_fix_sat.ll
2

Have you looked at the problem with asserts in llvm::SelectionDAG::transferDbgValues?
It happens when expanding smulfixsat, so something seems to be broken regarding the legalization (depending on target used).

leonardchan marked an inline comment as done.
leonardchan added inline comments.
llvm/test/CodeGen/X86/smul_fix_sat.ll
2
  • For addcarry, the problem seems to be that ISD::ADDCARRY is not supported on some 32 bit targets. The fix for this is just a check in expandMUL_LOHI to see if this operation is legal (https://reviews.llvm.org/D59119).
  • For llvm::SelectionDAG::transferDbgValues, this is because expandFixedPointMul returns an empty SDValue() to indicate this function failed due to some unsupported operation (most likely ISD::SMULO). I imagine the simplest solution for this is to just report_fatal_error since we do not have other operations we can use to perform saturation multiplication.

Sorry for the holdup.

@bjope D61411 addresses the LLVM_ERROR from expanding ADDCARRY, so now we can compile the test for all triples you brought up before. Also updated the tests to reflect these changes, and am still able to confirm on my end that the intrinsic produces the correct results.

leonardchan marked 2 inline comments as done.May 9 2019, 4:05 PM
leonardchan added inline comments.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
5428

Dropped and can confirm this works for my tests

bjope added inline comments.May 10 2019, 12:07 PM
llvm/docs/LangRef.rst
13235

I think this part about unspecified rounding direction either need to be explained more thoroughly somewhere, or we need to find a way to make it possible to specify it. Since the same problem already exist for smul.fix and umul.fix this shouldn't neccessarily be a stopper for this patch,. But I think it poses some problems that there sometimes are two correct results.

Are we for example supposed to prohibit constant folding (allowing backend targets implement a specific rounding scheme)?
Then I guess we can't implement the promotion/legalization part either, at least not without specifying which rounding scheme the legalization will use. It would be weird to prohibit constant folding in the first place, while at the same time legalization into generic ISD operations might result in DAG combiner actually ending up constant folding the expression.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
4215

Is this newline by mistake? Seems unrelated to the patch.

Ka-Ka added a subscriber: Ka-Ka.May 13 2019, 1:56 PM
leonardchan marked 4 inline comments as done.May 14 2019, 12:51 PM
leonardchan added inline comments.
llvm/docs/LangRef.rst
13235

Hmm. I'm starting to regret not specifying an argument for rounding when making these intrinsics. Would there be any large consequences for changing the intrinsics to accept a 4th argument for rounding?

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
4215

Accidental newline

leonardchan marked an inline comment as done.
bjope added inline comments.May 15 2019, 7:35 AM
llvm/docs/LangRef.rst
13185

Do we need a new "chapter" for this?

Maybe we can just continue the "Fixed Point Arithmetic Intrinsics" chapter here, and skip the general description below that only refers to "Fixed Point Arithmetic Intrinsics" and "Saturation Arithmetic".

If needed, we can add the "(see Saturation Arithmetic)" somewhere in the semantic description of llvm.smul.fix.sat.* instead.

13235

Had a short discussion with @ebevhan about this (offline).

Adding the 4th argument for rounding would

  • make things more clear (avoiding "unspecified")
  • make things more complicated (how many rounding modes should be supported? do we need to support folding/promotion/lowering etc for all different kinds of rounding modes? how do we verify all those modes?)

So despite my comment above, we think that the way forward is to keep the solution with "unspecified" for now (we already got it for the non-saturating intrinsics).

But to avoid confusion when people are reading the LangRef and looking at the code etc. we probably want to describe what "rounding direction is unspecified" means somewhere (for example in the introduction about "Fixed Point Arithmetic Intrinsics"). Explaining things like:

  • different optimizations (and legalization) in the pipeline are free to do the rounding in whatever direction they want to (but I think the accuracy of the result still should be well-defined so we need to say something about that?)
  • KnownBits/ValueTracking can't assume the direction of the rounding.
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2604

Even if rounding is unspecified, I believe this code is implementing some kind of rounding scheme. Should we perhaps say something about this in the function header. It can be at help when looking at the code in the future. Both to understand what the intention was with the original algorithm. And to understand the expected result when looking at test results etc. Or for some target to understand why a "legal"/"custom" lowering gives different result compared to "expand".

leonardchan marked 5 inline comments as done.
leonardchan added inline comments.
llvm/docs/LangRef.rst
13185

We probably don't need this. Especially since the fixed point section comes after the saturation section.

13235

I added more detail to the overview explaining the default expansion for multiplication and rounding to say that targets should specify their own hooks if they care about rounding and optimizations/legalizations should be performed based off that hook. Let me know if there's something else important that should be added.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2604

Added

bjope accepted this revision.May 17 2019, 8:31 AM

LGTM! (if all comments from other reviewers has been taken care of) (maybe you should wait another day to see if anyone else object, but I think this patch has been open for a long time so there have been plenty of time for comments already)

Btw, I'm still having a hard time reviewing the X86 test cases with very long results, and a little bit worried that those just will give annoying churn when doing unrelated patches in the future, rather than help out detecting problems related to smul.fix. I currently have no more ideas on how to improve that.

Hopefully I'll be able to run some runtime comparison tests between X86 and our OOT target when this has landed (and when I've adapted our target to use these new intrinsics using "legal" lowering and not "expand").

This revision is now accepted and ready to land.May 17 2019, 8:31 AM

LGTM! (if all comments from other reviewers has been taken care of) (maybe you should wait another day to see if anyone else object, but I think this patch has been open for a long time so there have been plenty of time for comments already)

Btw, I'm still having a hard time reviewing the X86 test cases with very long results, and a little bit worried that those just will give annoying churn when doing unrelated patches in the future, rather than help out detecting problems related to smul.fix. I currently have no more ideas on how to improve that.

Hopefully I'll be able to run some runtime comparison tests between X86 and our OOT target when this has landed (and when I've adapted our target to use these new intrinsics using "legal" lowering and not "expand").

Thanks. Unless anyone has other comments, I'll attempt to commit this start of next week.

This revision was automatically updated to reflect the committed changes.