This is an archive of the discontinued LLVM Phabricator instance.

[Intrinsic] Add fixed point saturating division intrinsics.
ClosedPublic

Authored by ebevhan on Dec 16 2019, 7:32 AM.

Details

Summary

This patch adds intrinsics and ISelDAG nodes for signed
and unsigned fixed-point division:

llvm.sdiv.fix.sat.*
llvm.udiv.fix.sat.*

These intrinsics perform scaled, saturating division
on two integers or vectors of integers. They are
required for the implementation of the Embedded-C
fixed-point arithmetic in Clang.

Diff Detail

Event Timeline

ebevhan created this revision.Dec 16 2019, 7:32 AM
ebevhan updated this revision to Diff 234066.Dec 16 2019, 8:00 AM

No need to do the promotion in DAGBuilder for unsigned saturating 0-scale operations, they cannot experience overflow.

Ka-Ka added a subscriber: Ka-Ka.Dec 16 2019, 11:16 PM
ebevhan updated this revision to Diff 234498.Dec 18 2019, 3:26 AM

When promoting to a legal type, we must shift up the value for saturating divisions.

ebevhan updated this revision to Diff 236803.Jan 8 2020, 5:57 AM

Rebased.

LGTM for the code so far. Just want to confirm on my side that the codegen works as intended also.

bjope added inline comments.Jan 14 2020, 12:08 AM
llvm/docs/LangRef.rst
14368

Typo: divison/division

ebevhan updated this revision to Diff 239558.Jan 22 2020, 6:01 AM

Fixed typo.

ebevhan marked an inline comment as done.Jan 22 2020, 6:08 AM
leonardchan added a comment.EditedJan 22 2020, 3:16 PM

It seems that I run into LLVM ERROR: Unsupported library call operation! if I call llvm.sdiv.fix.sat.i64 with a scale of 63:

define i64 @sdiv_sat_i64_s63(i64 %x, i64 %y) {
  %tmp = call i64 @llvm.sdiv.fix.sat.i64(i64 %x, i64 %y, i32 63)
  ret i64 %tmp;
}

This seems to be the only issue I could find from running the code.

It seems that I run into LLVM ERROR: Unsupported library call operation! if I call llvm.sdiv.fix.sat.i64 with a scale of 63:

define i64 @sdiv_sat_i64_s63(i64 %x, i64 %y) {
  %tmp = call i64 @llvm.sdiv.fix.sat.i64(i64 %x, i64 %y, i32 63)
  ret i64 %tmp;
}

This seems to be the only issue I could find from running the code.

Ah, that's unfortunate... It's a consequence of being forced to handle the integer division overflow case gracefully.

With an i64, we get a sext+shl and an i65 after building. During the initial promotion we try to promote to i128. That gives us 63 bits of redundant sign (128-65), but in order to handle the case of MSB / ALL1 (MIN / -EPS) without invoking undefined behavior in the integer division, we require an extra bit. Since we don't have that bit to spare, we end up not being able to do the operation in i128. Then we get an i256 sdiv and there's no codegen for that.

I'm unsure what to do to make this look good. It would be convenient to skip the operation altogether in the overflow case, but we can't generate control flow here. I could check for the overflow case and fudge the inputs/result to prevent the overflow, but that feels sort of wrong. Might be the only viable option, though.

leonardchan added a comment.EditedJan 23 2020, 3:45 PM

Ah, that's unfortunate... It's a consequence of being forced to handle the integer division overflow case gracefully.

With an i64, we get a sext+shl and an i65 after building. During the initial promotion we try to promote to i128. That gives us 63 bits of redundant sign (128-65), but in order to handle the case of MSB / ALL1 (MIN / -EPS) without invoking undefined behavior in the integer division, we require an extra bit. Since we don't have that bit to spare, we end up not being able to do the operation in i128. Then we get an i256 sdiv and there's no codegen for that.

I'm unsure what to do to make this look good. It would be convenient to skip the operation altogether in the overflow case, but we can't generate control flow here. I could check for the overflow case and fudge the inputs/result to prevent the overflow, but that feels sort of wrong. Might be the only viable option, though.

Ah I see. Could you clarify though on how handling MIN / -EPS invokes undefined behavior?

Also, if handling this UB requires an extra bit, it seems odd that we would add one a second time. I could be wrong, but isn't the reason for adding the 1st extra bit (i64 -> i65) is just to force the operands to be promoted? If so, then by the time we check LHSLead + RHSTrail < Scale + (unsigned)(Saturating && Signed)) in expandFixedPointDiv, we would've already gotten the extra bit from the initial promotion. I feel like we should only need to promote once when doing fixed point division.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
897

nit: align the arguments

Ah I see. Could you clarify though on how handling MIN / -EPS invokes undefined behavior?

It's the same as if we had MIN / -1 in integer arithmetic. The result is MAX+1, so we get overflow, which is UB. The same case happens for our lowering of scaled division, since if we have MIN / -EPS (-EPS is -1 in the integer representation), during widening/promotion we may potentially shift up the LHS by the amount widened, giving us MIN / -EPS again, which causes overflow when we do the integer division.

This doesn't matter for non-saturating division since the operation overflows there as well, but for saturating division we need to check the result and saturate, but the div instruction will cause an exception so we never get to that stage.

Also, if handling this UB requires an extra bit, it seems odd that we would add one a second time. I could be wrong, but isn't the reason for adding the 1st extra bit (i64 -> i65) is just to force the operands to be promoted? If so, then by the time we check LHSLead + RHSTrail < Scale + (unsigned)(Saturating && Signed)) in expandFixedPointDiv, we would've already gotten the extra bit from the initial promotion.

Yes, we add the extra bit to force promotion, but we have to shift up and consume that bit in order to preserve the saturating behavior (SelectionDAGBuilder.cpp:5477) so we don't win any leading sign bits there. We only get those when we do the promotion, but we are one bit short there because of the pre-promotion, so we end up having to widen again. expandFixedPointDiv (during promotion) and earlyExpandDIVFIX do not need to do this extra shifting since they handle saturation directly.

The premature promotion really makes fixed-point division overly complicated.

I feel like we should only need to promote once when doing fixed point division.

You are right in that the pre-promotion causes us to lose out on a bit. It would maybe be possible to skip the aforementioned one-bit shift, but then we have to tell promotion that we need one extra bit of saturation, and I don't know how to effectively propagate that information.

leonardchan accepted this revision.Feb 20 2020, 11:57 AM

Ah I see. Could you clarify though on how handling MIN / -EPS invokes undefined behavior?

It's the same as if we had MIN / -1 in integer arithmetic. The result is MAX+1, so we get overflow, which is UB. The same case happens for our lowering of scaled division, since if we have MIN / -EPS (-EPS is -1 in the integer representation), during widening/promotion we may potentially shift up the LHS by the amount widened, giving us MIN / -EPS again, which causes overflow when we do the integer division.

This doesn't matter for non-saturating division since the operation overflows there as well, but for saturating division we need to check the result and saturate, but the div instruction will cause an exception so we never get to that stage.

Also, if handling this UB requires an extra bit, it seems odd that we would add one a second time. I could be wrong, but isn't the reason for adding the 1st extra bit (i64 -> i65) is just to force the operands to be promoted? If so, then by the time we check LHSLead + RHSTrail < Scale + (unsigned)(Saturating && Signed)) in expandFixedPointDiv, we would've already gotten the extra bit from the initial promotion.

Yes, we add the extra bit to force promotion, but we have to shift up and consume that bit in order to preserve the saturating behavior (SelectionDAGBuilder.cpp:5477) so we don't win any leading sign bits there. We only get those when we do the promotion, but we are one bit short there because of the pre-promotion, so we end up having to widen again. expandFixedPointDiv (during promotion) and earlyExpandDIVFIX do not need to do this extra shifting since they handle saturation directly.

The premature promotion really makes fixed-point division overly complicated.

I feel like we should only need to promote once when doing fixed point division.

You are right in that the pre-promotion causes us to lose out on a bit. It would maybe be possible to skip the aforementioned one-bit shift, but then we have to tell promotion that we need one extra bit of saturation, and I don't know how to effectively propagate that information.

Thanks for the clarification. Yeah, this is unfortunate. Not being able to do s.63 saturating multiplication kind of leaves a bad taste in my mouth, but for the purposes of implementing the standard, *I don't think* there's an instance where division for the proposed types can result in this (at least on a typical desktop processor). The main issues I can see down the road are frontend related, like if we perhaps will want to add something like long long _Fract that has a signed 64 bit width, or other frontends want to have types of varying scale and width. For now though, this still seems to work for our purposes and I also can't think of anything better sooo...LGTM.

This revision is now accepted and ready to land.Feb 20 2020, 11:57 AM

Not being able to do s.63 saturating multiplication kind of leaves a bad taste in my mouth, but for the purposes of implementing the standard, *I don't think* there's an instance where division for the proposed types can result in this (at least on a typical desktop processor). The main issues I can see down the road are frontend related, like if we perhaps will want to add something like long long _Fract that has a signed 64 bit width, or other frontends want to have types of varying scale and width.

Yes, it's pretty unfortunate that it's not as general as it could be. The largest (default) scale that Clang will output for the existing fixed-point types is 32, so I don't think it will hit the limit there. The long long _Fract case is compelling but we'll see if we ever get there.

This revision was automatically updated to reflect the committed changes.