[X86] Custom lower ISD::FROUND with SSE4.1 to avoid a libcall.

ISD::FROUND is defined to round to nearest with ties rounding
away from 0. This mode isn't supported in hardware on X86.
But as long as we aren't compiling with trapping math, we can
emulate this with trunc(X + copysign(nextafter(0.5, 0.0), X)).
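
As a rough scalar illustration of that formula (not the SelectionDAG
lowering itself), the sketch below uses truncation toward zero for the
final step, since the offset carries the sign of X. The helper name
round_half_away is hypothetical, and the sketch assumes float arithmetic
is evaluated in IEEE single precision with round-to-nearest-even, as on
x86-64 with SSE.

```cpp
#include <cmath>

// Hypothetical helper, for illustration only: rounds to nearest with ties
// away from zero using an add followed by a truncation.
inline float round_half_away(float x) {
  // Largest float strictly less than 0.5.
  const float almost_half = std::nextafter(0.5f, 0.0f);
  // Give the offset the sign of x so the bias always points away from zero.
  const float biased = x + std::copysign(almost_half, x);
  // Truncate toward zero, a rounding mode the SSE4.1 ROUND* instructions offer.
  return std::trunc(biased);
}
```

For ordinary inputs this should agree with std::round, for example
round_half_away(2.5f) and std::round(2.5f) both yield 3.0f.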

We have to use nextafter to avoid some corner cases that adding
0.5 would have. For example, if X is nextafter(0.5, 0.0) it should
round to 0.0, but adding 0.5 would need one more bit of mantissa
than can be stored, so the sum rounds to 1.0. Adding nextafter(0.5, 0.0)
instead will just increase the exponent by 1 and leave the mantissa
as all 1s. This would be nextafter(1.0, 0.0), which will truncate
to 0.0.
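
The corner case can be checked with a small scalar experiment. This is
only a sketch: the CornerCase struct and corner_case function are
hypothetical names, and it assumes IEEE single-precision float arithmetic
with round-to-nearest-even (no excess precision).

```cpp
#include <cmath>

// Hypothetical record of the two candidate sums for X = nextafter(0.5, 0.0).
struct CornerCase {
  float input;                 // largest float below 0.5
  float sum_with_half;         // X + 0.5
  float sum_with_almost_half;  // X + nextafter(0.5, 0.0)
};

inline CornerCase corner_case() {
  const float x = std::nextafter(0.5f, 0.0f);
  CornerCase c;
  c.input = x;
  // The exact sum lies halfway between two floats, so it rounds up to 1.0.
  c.sum_with_half = x + 0.5f;
  // The exact sum is representable: it is the largest float below 1.0.
  c.sum_with_almost_half = x + x;
  return c;
}
```

Under those assumptions, sum_with_half compares equal to 1.0f, while
sum_with_almost_half compares equal to std::nextafter(1.0f, 0.0f) and
truncates to 0.0f.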

Technically this requires -fno-trapping-math, which isn't our default.
But if we care about exceptions we should be using constrained
intrinsics. Constrained intrinsics would use STRICT_FROUND, which
won't go through this code.

Fixes PR42195.

Differential Revision: https://reviews.llvm.org/D73607