This is an archive of the discontinued LLVM Phabricator instance.

[X86] Custom lower ISD::FROUND with SSE4.1 to avoid a libcall.
ClosedPublic

Authored by craig.topper on Jan 29 2020, 12:01 AM.

Details

Summary

ISD::FROUND is defined to round to nearest with ties rounding
away from 0. This mode isn't supported in hardware on X86.

But as long as we aren't compiling with trapping math, we can
emulate this with floor(X + copysign(nextafter(0.5, 0.0), X)).

We have to use nextafter to avoid some corner cases that adding
0.5 would have. For example, if X is nextafter(0.5, 0.0) it should
round to 0.0, but adding 0.5 would need one extra bit of mantissa
than can be stored so it rounds to 1.0. Adding nextafter(0.5, 0.0)
instead will just increase the exponent by 1 and leave the mantissa
as all 1s. This would be nextafter(1.0, 0.0) which will floor to 0.0.

Techically this requires -fno-trapping-math which isn't our default.
But if we care about exceptions we should be using constrained
intrinsics. Constrained intrinsics would use STRICT_FROUND which
won't go through this code.

Fixes PR42195.

Event Timeline

craig.topper created this revision.Jan 29 2020, 12:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 29 2020, 12:01 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
spatel accepted this revision.Jan 29 2020, 6:53 AM

LGTM

llvm/lib/Target/X86/X86ISelLowering.cpp
20460

Include at least part of the text from this patch description as a block comment for this function, so:

/// ISD::FROUND is defined to round to nearest with ties rounding
/// away from 0. This mode isn't supported in hardware on X86.
/// But as long as we aren't compiling with trapping math, we can
/// emulate this with floor(X + copysign(nextafter(0.5, 0.0), X)).
/// ...
llvm/test/CodeGen/X86/vec_round.ll
9

This test/file was added with:
rL188048
...to show we wouldn't crash?
But it doesn't add value now that we have more thorough tests. I'd delete it.

This revision is now accepted and ready to land.Jan 29 2020, 6:53 AM
This revision was automatically updated to reflect the committed changes.
rtrieu added a subscriber: rtrieu.Jun 8 2021, 7:07 PM

I ran across a llvm_unreachable that points to this commit. Repro instructions below:

test.ii

double compare1(double x, double y) { return ((int)x< y) ? x : y; }
double compare2(double x, double y) { return y != 0.0 ? y : x; }

int compareint(int x, int y, int z) { return (x < y) ? y : (z < x) ? z : x; }

class C {
public:
  C(double arg) { 
    constexpr int k1 = -(1 << 23); 
    constexpr int k2 = (1 << 23) - 1;
    array[0] = compareint(arg, k1, k2);
  }
  
  char array[3];
};
extern "C" double round(double);
constexpr double kEight = 8;

C create(int b) {
  double d1 = b * kEight;
  double d2 = round(d1);
  double d3 = compare1(0.0, d2);
  double d4 = compare2(0.0, d3);
  return C(d4);
}

void loop(int* b, C *ptr, long j) {
  for (int i = 0; i < j; ++i)
    ptr[i] = create(b[i]);
}

clang command:

clang \
"-cc1" \
"-triple" "x86_64-unknown-linux-gnu" \
"-emit-obj" \
"-target-cpu" "x86-64" \
"-target-feature" "+avx" \
"-target-feature" "+avx2" \
"-target-feature" "+avx512f" \
"-O3" \
"-vectorize-loops" \
"-x" "c++" "test.ii"

output:

PromoteIntegerResult #0: t177: v16i24 = X86ISD::VTRUNCS t216

Do not know how to promote this operator!
UNREACHABLE executed at llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:54!

0.	Program arguments: clang -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -target-cpu x86-64 -target-feature +avx -target-feature +avx2 -target-feature +avx512f -O3 -vectorize-loops -x c++ test.ii
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module 'test.ii'.
4.	Running pass 'X86 DAG->DAG Instruction Selection' on function '@_Z4loopPiP1Cl'

Looks like some combine is incorrectly producing X86ISD::VTRUNCS? Probably just exposed by this patch.

craig.topper added a comment.EditedJun 8 2021, 10:18 PM

I ran across a llvm_unreachable that points to this commit. Repro instructions below:

test.ii

double compare1(double x, double y) { return ((int)x< y) ? x : y; }
double compare2(double x, double y) { return y != 0.0 ? y : x; }

int compareint(int x, int y, int z) { return (x < y) ? y : (z < x) ? z : x; }

class C {
public:
  C(double arg) { 
    constexpr int k1 = -(1 << 23); 
    constexpr int k2 = (1 << 23) - 1;
    array[0] = compareint(arg, k1, k2);
  }
  
  char array[3];
};
extern "C" double round(double);
constexpr double kEight = 8;

C create(int b) {
  double d1 = b * kEight;
  double d2 = round(d1);
  double d3 = compare1(0.0, d2);
  double d4 = compare2(0.0, d3);
  return C(d4);
}

void loop(int* b, C *ptr, long j) {
  for (int i = 0; i < j; ++i)
    ptr[i] = create(b[i]);
}

clang command:

clang \
"-cc1" \
"-triple" "x86_64-unknown-linux-gnu" \
"-emit-obj" \
"-target-cpu" "x86-64" \
"-target-feature" "+avx" \
"-target-feature" "+avx2" \
"-target-feature" "+avx512f" \
"-O3" \
"-vectorize-loops" \
"-x" "c++" "test.ii"

output:

PromoteIntegerResult #0: t177: v16i24 = X86ISD::VTRUNCS t216

Do not know how to promote this operator!
UNREACHABLE executed at llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:54!

0.	Program arguments: clang -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -target-cpu x86-64 -target-feature +avx -target-feature +avx2 -target-feature +avx512f -O3 -vectorize-loops -x c++ test.ii
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module 'test.ii'.
4.	Running pass 'X86 DAG->DAG Instruction Selection' on function '@_Z4loopPiP1Cl'

This should fix it

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index b89e1674..90babf3 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -45273,7 +45273,8 @@ static SDValue combineTruncateWithSat(SDValue In, EVT VT, const SDLoc &DL,
 
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   if (TLI.isTypeLegal(InVT) && InVT.isVector() && SVT != MVT::i1 &&
-      Subtarget.hasAVX512() && (InSVT != MVT::i16 || Subtarget.hasBWI())) {
+      Subtarget.hasAVX512() && (InSVT != MVT::i16 || Subtarget.hasBWI()) &&
+      (SVT == MVT::i32 || SVT == MVT::i16 || SVT == MVT::i8)) {
     unsigned TruncOpc = 0;
     SDValue SatVal;
     if (auto SSatVal = detectSSatPattern(In, VT)) {

Posted here https://reviews.llvm.org/D103940

rtrieu added a comment.Jun 9 2021, 2:10 PM

Thanks for the quick fix. I've verified that it fixes my full test case.