⚙ D141074 [X86] Avoid converting u64 to f32 using x87 on Windows

icedrocket created this revision. · Jan 5 2023, 10:28 AM

Herald added a project: Restricted Project. · Jan 5 2023, 10:28 AM

Herald added subscribers: pengfei, hiraditya.

icedrocket requested review of this revision. · Jan 5 2023, 10:28 AM

Herald added a project: Restricted Project. · Jan 5 2023, 10:28 AM

Herald added a subscriber: llvm-commits.

icedrocket added a reviewer: craig.topper. · Jan 5 2023, 10:30 AM

Herald added a subscriber: StephenFan. · Jan 5 2023, 10:30 AM

icedrocket edited the summary of this revision. · Jan 5 2023, 10:33 AM

icedrocket added a project: Restricted Project. · Jan 5 2023, 10:37 AM

icedrocket removed a project: Restricted Project. · Jan 5 2023, 10:45 AM

icedrocket updated this revision to Diff 486631. · Jan 5 2023, 10:47 AM

Please upload all patches with full context (-U99999)
Tests missing

icedrocket updated this revision to Diff 486634. · Jan 5 2023, 10:57 AM

only on Windows 32-bit

Is the default precision control different for 64-bit?

In D141074#4029537, @lebedev.ri wrote:

only on Windows 32-bit

Is the default precision control different for 64-bit?

No, but I think we use a different algorithm with a 64-bit cvtsi2ss on 64-bit targets so we don't end up using x87.

In D141074#4029541, @craig.topper wrote:

No, but I think we use a different algorithm with a 64-bit cvtsi2ss on 64-bit targets so we don't end up using x87.

I'm mainly asking whether this needs to check for the 32-bit windows specifically, not just that it is windows. Guess not.

In D141074#4029544, @lebedev.ri wrote:

I'm mainly asking whether this needs to check for the 32-bit windows specifically, not just that it is windows. Guess not.

I tested it on my machine and the precision control is 53-bit even on Windows 64-bit.

Harbormaster completed remote builds in B205957: Diff 486634. · Jan 5 2023, 1:04 PM

pengfei added inline comments. · Jan 5 2023, 7:12 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
21389	What to do if user changes the default precision? Besides, our downstream offers an option to help users set a higher precision.

What to do if user changes the default precision?

If users change the precision control to 64-bit, we can allow the use of x87 instructions by adding a compiler option. However, this is probably only for Windows 32-bit, and I don't know if it's appropriate to add a new option because there aren't that many users on that platform.

Besides, our downstream offers an option to help users set a higher precision.

I'm sorry, but I'm not sure exactly what that option is. Is it an option to increase x87 precision, or an option to operate on floats at higher precision like double?

Use library calls only when SSE is enabled. If SSE is disabled and x87 is enabled, the x87 implementation is used regardless of precision.

In D141074#4031117, @icedrocket wrote:

What to do if user changes the default precision?

If users change the precision control to 64-bit, we can allow the use of x87 instructions by adding a compiler option. However, this is probably only for Windows 32-bit, and I don't know if it's appropriate to add a new option because there aren't that many users on that platform.

Besides, our downstream offers an option to help users set a higher precision.

I'm sorry, but I'm not sure exactly what that option is. Is it an option to increase x87 precision, or an option to operate on floats at higher precision like double?

It's /Qpc80, which exists for compatibility with ICC: https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/optimization-and-programming/intel-c-compiler-classic-math-library/use-the-intel-c-compiler-classic-math-library.html
This is the intended design on Windows, especially for 32-bit.

icedrocket updated this revision to Diff 486846. · Jan 6 2023, 6:05 AM

icedrocket updated this revision to Diff 486869. · Jan 6 2023, 7:10 AM

Harbormaster completed remote builds in B206115: Diff 486869. · Jan 6 2023, 8:38 AM

Add the pc80 option. There may be other suitable candidate names for this option, but I used the name from Intel ICC. The current implementation only applies to conversions from 64-bit integers to floating point.

In D141074#4031999, @icedrocket wrote:

Add the pc80 option. There may be other suitable candidate names for this option, but I used the name from Intel ICC. The current implementation only applies to conversions from 64-bit integers to floating point.

The ICC option injects code into main to set the PC. (That more or less ignores that global constructors run before main.)

In D141074#4032006, @craig.topper wrote:

The ICC option injects code into main to set the PC. (That more or less ignores that global constructors run before main.)

The option I added just tells the compiler that the precision control will always be set to 64-bit. And it's still a draft implementation of the candidate concept.

craig.topper retitled this revision from Avoid converting 64-bit integers to floating point using x87 on Windows to [X86] Avoid converting 64-bit integers to floating point using x87 on Windows. · Jan 6 2023, 10:45 AM

Harbormaster completed remote builds in B206152: Diff 486923. · Jan 6 2023, 11:01 AM

icedrocket updated this revision to Diff 486959. · Jan 6 2023, 12:16 PM

Harbormaster completed remote builds in B206180: Diff 486959. · Jan 6 2023, 1:10 PM

On Windows 32-bit, the calling convention uses x87, so it's almost impossible to avoid using x87. So rather than avoiding the use of x87, I think it's better to add an option to inject code like an ICC option. To add this option to all LLVM-based compilers, the code must be injected into the LLVM IR. This will be done in the process of parsing or generating the LLVM IR. However, this requires refactoring a lot of code.

I don't think it's a good idea to use the same name without doing the same thing as ICC. I would consider open-sourcing our implementation if people think it's useful, though I may not do it immediately since it relies on an internal framework.
Can we just emit a warning to remind users that the result may not be precise on 32-bit Windows? I think the current implementation still has two problems: 1) The result is not identical to MSVC's, which may introduce more problems than it solves, since MSVC is the dominant compiler on Windows. 2) The generated binary relies on compiler-rt, but I don't think all users link compiler-rt. Forcing a library call may cause their builds to fail.

I think the original version of the diff is reasonable and can be merged.

I don't think the concerns that we lower something to a libcall are sufficiently grounded.
We already lower many things to a libcall. Doing so for one more thing is not the end of the world,
assuming the compiler-rt exists for that configuration in the first place.

In D141074#4037440, @lebedev.ri wrote:

I think the original version of the diff is reasonable and can be merged.

I don't think the concerns that we lower something to a libcall are sufficiently grounded.
We already lower many things to a libcall. Doing so for one more thing is not the end of the world,
assuming the compiler-rt exists for that configuration in the first place.

I've written code that adds a warning that the results may be incorrect if x87 is used on Windows 32-bit. Would it be a good idea to add this code too?

Does the libcall not produce the right answer? That sounds like a bug.
I think this patch should *just* be about falling back to the libcall.

If SSE2 is available, we only need to use library calls to convert between 64-bit integers and floating point. However, if only x87 is available, precision issues are unavoidable. What I meant was to just use the x87 implementation and print a warning in this case.

In D141074#4037831, @icedrocket wrote:

If SSE2 is available, we only need to use library calls to convert between 64-bit integers and floating point. However, if only x87 is available, precision issues are unavoidable. In this case, it might be better to just use the x87 implementation and print a warning.

Hopefully if you're not using long double, there shouldn't be any major precision issues with x87. The issue with the unsigned conversion is that it assumes we can represent a 64-bit integer in the mantissa of an x87 register without incurring any rounding.

In D141074#4037831, @icedrocket wrote:

If SSE2 is available, we only need to use library calls to convert between 64-bit integers and floating point. However, if only x87 is available, precision issues are unavoidable. What I meant was to just use the x87 implementation and print a warning in this case.

Why are they unavoidable?

In D141074#4037854, @lebedev.ri wrote:

Why are they unavoidable?

As long as the precision control is low, x87 has low precision.

In D141074#4037902, @icedrocket wrote:

As long as the precision control is low, x87 has low precision.

If the precision control is set to more than 24-bit precision, x87 actually has more precision than SSE for fp32.

icedrocket updated this revision to Diff 487594. · Jan 9 2023, 4:02 PM

Please split clang change into a separate review.

icedrocket updated this revision to Diff 487596. · Jan 9 2023, 4:08 PM

craig.topper added inline comments. · Jan 9 2023, 4:13 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
21389	Where did 56 come from? The precision control settings are 24, 53, and 64.
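
For reference, the precision control (PC) field occupies bits 8 and 9 of the x87 control word, and only three of its four encodings are defined. A minimal Python sketch of decoding it follows. The helper name `precision_bits` is made up for illustration, and the two sample control words (0x037F as the value left by FINIT, 0x027F as the value usually reported for the MSVC runtime default) are assumptions rather than something taken from this patch.

```python
# Significand width selected by the x87 PC field (control word bits 8-9).
# The encoding 0b01 is reserved, which is why only 24, 53 and 64 are valid.
PC_FIELD_TO_BITS = {0b00: 24, 0b10: 53, 0b11: 64}


def precision_bits(control_word: int) -> int:
    """Return the significand width (in bits) selected by an x87 control word."""
    field = (control_word >> 8) & 0b11
    if field not in PC_FIELD_TO_BITS:
        raise ValueError("reserved precision-control encoding")
    return PC_FIELD_TO_BITS[field]


print(precision_bits(0x037F))  # FINIT default (assumed sample value)
print(precision_bits(0x027F))  # assumed Windows/MSVC runtime default
```

Under these assumptions there is no encoding that would yield a 56-bit significand.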

Fix the comments

icedrocket marked an inline comment as done. · Jan 9 2023, 4:20 PM

Harbormaster completed remote builds in B206649: Diff 487598. · Jan 9 2023, 5:19 PM

icedrocket edited the summary of this revision. · Jan 9 2023, 6:22 PM

icedrocket updated this revision to Diff 487649. · Jan 9 2023, 7:34 PM

Tests?

Harbormaster completed remote builds in B206682: Diff 487649. · Jan 9 2023, 8:42 PM

The test keeps failing, but Diff 486869 has already built and tested successfully. The only thing that has changed is the comments.

Change to original version of diff mentioned by @lebedev.ri.

Harbormaster completed remote builds in B206984: Diff 488063. · Jan 11 2023, 12:13 AM

RKSimon added reviewers: pengfei, RKSimon, rnk. · Jan 12 2023, 1:35 PM

Can we add a lit test to llvm/test/CodeGen/X86 for this with a Windows triple?

icedrocket updated this revision to Diff 489212. · Jan 14 2023, 1:38 AM

Harbormaster completed remote builds in B207788: Diff 489212. · Jan 14 2023, 2:25 AM

The main reason I posted a review was because the conversion results from u64 and i64 to f32 did not match, causing problems.

I think using the x87 implementation of UINT_TO_FP for the i64 to f32 conversion as well is an alternative solution that does not compromise performance on Windows 32-bit. This will produce the same results as MSVC and solve the problem that prompted me to post this review. There may be a very slight drop in precision, but it shouldn't be a major issue.

icedrocket updated this revision to Diff 489219. · Jan 14 2023, 3:21 AM

I've checked and MSVC's implementation dynamically checks if AVX-512 is available and uses vcvtuqq2pd if available. Therefore, it seems that the result depends on whether the system supports AVX-512 or not.

Harbormaster completed remote builds in B207795: Diff 489219. · Jan 14 2023, 4:00 AM

icedrocket updated this revision to Diff 489224. · Jan 14 2023, 4:02 AM

Harbormaster completed remote builds in B207800: Diff 489224. · Jan 14 2023, 4:43 AM

icedrocket updated this revision to Diff 489261. · Jan 14 2023, 8:23 AM

Harbormaster completed remote builds in B207828: Diff 489261. · Jan 14 2023, 6:37 PM

The tests keep failing, but it's all RISC-V related, so the diff probably isn't wrong.

I'm not sure why the result depends on the x87 control word, but in any case it is possible that this issue is caused by the rounding mode. The conversion from f64 to f32 is typically done with round-to-nearest, and rounding the u64 directly can give a different result than rounding the f64 obtained from the u64. We should round from the u64, not from the result of the u64 to f64 conversion.

In D141074#4054355, @icedrocket wrote:

I'm not sure why the result depends on the x87 control word, but in any case it is possible that this issue is caused by the rounding mode. The conversion from f64 to f32 is typically done with round-to-nearest, and rounding the u64 directly can give a different result than rounding the f64 obtained from the u64. We should round from the u64, not from the result of the u64 to f64 conversion.

The algorithm we use for u64 to f32 looks like this

bitcast u64 to i64
convert the i64 to 80-bit fp using the FILD instruction (this conversion does not depend on the value of PC).
if bit 63 was set in the integer, this conversion produces a negative number.
if bit 63 was set in the integer, add 18446744073709551616 to the negative floating point value to give it the correct positive value. If bit 63 wasn’t set, add 0.0. With PC=64 this shouldn't result in any rounding.
round the 80-bit fp value to f32 using FST.

The bug occurs because the addition of 18446744073709551616 ends up rounding unexpectedly because of PC=53.
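
The steps above can be modelled with exact integer arithmetic to see where the extra rounding comes from. The following is a rough Python sketch, not the code LLVM emits: `round_to_bits` and `u64_to_f32_x87` are hypothetical helper names, the x87 add is modelled as "compute the exact sum, then round to the PC width with ties to even", and the f32 result is returned as an exact integer so the two outcomes can be compared directly.

```python
def round_to_bits(n: int, bits: int) -> int:
    """Round a non-negative integer to `bits` significant bits, ties to even."""
    shift = n.bit_length() - bits
    if shift <= 0:
        return n  # already fits, no rounding needed
    q, r = divmod(n, 1 << shift)
    half = 1 << (shift - 1)
    if r > half or (r == half and (q & 1)):
        q += 1
    return q << shift


def u64_to_f32_x87(n: int, pc_bits: int) -> int:
    """Model the inlined x87 sequence; returns the f32 value as an exact integer."""
    assert 0 <= n < 2**64
    bit63 = n >> 63
    as_i64 = n - 2**64 if bit63 else n      # bitcast u64 -> i64
    x = as_i64                              # FILD: exact, independent of PC
    addend = 2**64 if bit63 else 0          # correction constant, or 0.0
    x = round_to_bits(x + addend, pc_bits)  # FADD: result is rounded to PC
    return round_to_bits(x, 24)             # FST: round to f32


n = 18014397972611071  # the value from the summary's example, as quoted in this thread
print(u64_to_f32_x87(n, 64))  # single rounding
print(u64_to_f32_x87(n, 53))  # rounded to 53 bits first, then to 24 bits
```

In this model the value is 2^54 - 2^29 - 1. With PC=64 it rounds once, down to 18014397435740160. With PC=53 the add first rounds it up to 2^54 - 2^29, which is then an exact tie for f32 and rounds up to 18014398509481984.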

In D141074#4054359, @craig.topper wrote:

The algorithm we use for u64 to f32 looks like this

bitcast u64 to i64
convert the i64 to 80-bit fp using the FILD instruction (this conversion does not depend on the value of PC).
if bit 63 was set in the integer, this conversion produces a negative number.
if bit 63 was set in the integer, add 18446744073709551616 to the negative floating point value to give it the correct positive value. With PC=64 this shouldn't result in any rounding.
round the 80-bit fp value to f32 using FST.

The bug occurs because the addition of 18446744073709551616 ends up rounding unexpectedly because of PC=53.

The value I tried to convert in the summary's example code is 18014397972611071, which is representable in the i64 range, so bit 63 is not set.

In D141074#4054383, @icedrocket wrote:

The value I tried to convert in the summary's example code is 18014397972611071, which is representable in the i64 range, so bit 63 is not set.

If bit 63 isn’t set, we add 0.0, which still triggers rounding to 53 bits on the add.

In D141074#4054355, @icedrocket wrote:

I'm not sure why the result depends on the x87 control word, but in any case it is possible that this issue is caused by the rounding mode. The conversion from f64 to f32 is typically done with round-to-nearest, and rounding the u64 directly can give a different result than rounding the f64 obtained from the u64. We should round from the u64, not from the result of the u64 to f64 conversion.

Initially, I suspected that LLVM might have similar issues as MSVC. The results from the two different implementations were identical, which confused me a bit.

icedrocket updated this revision to Diff 489420. · Jan 15 2023, 9:08 PM

Harbormaster completed remote builds in B207959: Diff 489420. · Jan 15 2023, 10:06 PM

icedrocket updated this revision to Diff 489429. · Jan 15 2023, 10:11 PM

Harbormaster completed remote builds in B207963: Diff 489429. · Jan 15 2023, 10:50 PM

RKSimon added inline comments. · Jan 16 2023, 2:00 AM

llvm/test/CodeGen/X86/uint64-to-float.ll
4	add win64 test coverage as well to check we're not doing this there

icedrocket updated this revision to Diff 489489. · Jan 16 2023, 3:46 AM

icedrocket marked an inline comment as done.

icedrocket updated this revision to Diff 489494. · Jan 16 2023, 3:56 AM

Harbormaster completed remote builds in B208010: Diff 489494. · Jan 16 2023, 4:29 AM

LGTM

This revision is now accepted and ready to land. · Jan 17 2023, 9:11 AM

This change in its current form does look reasonable to me.
Thanks

@craig.topper, could you please commit this change? I don't have permission to commit.

icedrocket retitled this revision from [X86] Avoid converting 64-bit integers to floating point using x87 on Windows to [X86] Avoid converting u64 to f32 using x87 on Windows. · Jan 17 2023, 11:37 PM

icedrocket edited the summary of this revision.

In D141074#4061168, @icedrocket wrote:

@craig.topper, could you please commit this change? I don't have permission to commit.

@icedrocket Please can you provide your email address for the commit message?

In D141074#4061556, @RKSimon wrote:

@icedrocket Please can you provide your email address for the commit message?

114203630+icedrocket@users.noreply.github.com

Could someone commit this change?

This revision was landed with ongoing or failed builds. · Jan 18 2023, 11:14 PM

Closed by commit rGa6e3027db7eb: [X86] Avoid converting u64 to f32 using x87 on Windows (authored by icedrocket, committed by craig.topper).

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rGa6e3027db7eb: [X86] Avoid converting u64 to f32 using x87 on Windows.

this is causing us to get undefined symbol errors for ___floatundisf because we don't link in builtins on win32. not 100% sure on this, but I don't think we're expected to link against builtins on win32

https://bugs.chromium.org/p/chromium/issues/detail?id=1408807
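
For context, the missing symbol is the runtime builtin that performs the u64 to f32 conversion in software. The following Python sketch shows roughly what such a routine has to compute, namely a single round-to-nearest-even step straight from the integer to a binary32 bit pattern. It is not the compiler-rt source, and the function name `u64_to_f32_bits` is invented for this illustration.

```python
import struct


def u64_to_f32_bits(n: int) -> int:
    """Return the IEEE-754 binary32 bit pattern for an unsigned 64-bit integer."""
    assert 0 <= n < 2**64
    if n == 0:
        return 0
    exp = n.bit_length() - 1            # unbiased exponent of the leading bit
    if exp <= 23:
        mant = n << (23 - exp)          # fits in 24 bits, conversion is exact
    else:
        shift = exp - 23
        mant = n >> shift               # keep the top 24 bits
        rem = n & ((1 << shift) - 1)    # bits that were shifted out
        half = 1 << (shift - 1)
        if rem > half or (rem == half and (mant & 1)):
            mant += 1                   # round to nearest, ties to even
            if mant == 1 << 24:         # rounding carried into the next binade
                mant >>= 1
                exp += 1
    return ((exp + 127) << 23) | (mant & 0x7FFFFF)


bits = u64_to_f32_bits(18014397972611071)
print(hex(bits), struct.unpack("<f", struct.pack("<I", bits))[0])
```

Because the integer is rounded only once, the result does not depend on the x87 precision control, which is why the libcall avoids the problem discussed in this review.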

In D141074#4066869, @aeubanks wrote:

this is causing us to get undefined symbol errors for ___floatundisf because we don't link in builtins on win32. not 100% sure on this, but I don't think we're expected to link against builtins on win32

https://bugs.chromium.org/p/chromium/issues/detail?id=1408807

Same problem was injected into Halide. Are we expected to link in the builtins now? Halide is willing to follow Chrome's lead, but this seems like a surprising change.

I'm going to revert this patch. I'm going to see if we can change the precision control around the problematic fadd in the inlined sequence.
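
The idea of switching the precision control only around the add can be sketched as follows. This is an illustrative Python model under stated assumptions, not the code from the follow-up patch: the control word is a plain integer, the add is modelled as rounding the exact sum to the PC width, and all names (`round_to_bits`, `fadd_under_cw`, `convert_with_pc_override`) are hypothetical.

```python
PC_MASK = 0x0300                                # control word bits 8-9
PC_BITS = {0x0000: 24, 0x0200: 53, 0x0300: 64}  # 0x0100 is reserved


def round_to_bits(n: int, bits: int) -> int:
    """Round a non-negative integer to `bits` significant bits, ties to even."""
    shift = n.bit_length() - bits
    if shift <= 0:
        return n
    q, r = divmod(n, 1 << shift)
    half = 1 << (shift - 1)
    if r > half or (r == half and (q & 1)):
        q += 1
    return q << shift


def fadd_under_cw(exact_sum: int, cw: int) -> int:
    """Model an x87 add whose exact result is rounded per the PC field of cw."""
    return round_to_bits(exact_sum, PC_BITS[cw & PC_MASK])


def convert_with_pc_override(n: int, cw: int):
    """Return (f32 value as an exact integer, control word seen by the caller)."""
    saved = cw                   # remember the caller's control word
    cw = cw | PC_MASK            # force PC to 64-bit for the add only
    x = fadd_under_cw(n, cw)     # exact for any u64 at 64-bit precision
    cw = saved                   # restore the caller's control word
    return round_to_bits(x, 24), cw


print(convert_with_pc_override(18014397972611071, 0x027F))
```

In this model the caller's control word is unchanged on return, and the conversion rounds only once even when the ambient precision is 53-bit.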

craig.topper added a reverting change: rG1692dff0b33c: Revert "[X86] Avoid converting u64 to f32 using x87 on Windows". · Jan 19 2023, 9:36 PM

craig.topper mentioned this in D142178: [X86] Change precision control to FP80 during u64->fp32 conversion on Windows.. · Jan 19 2023, 10:31 PM

craig.topper mentioned this in rG928a1764d6bd: [X86][WIP] Change precision control to FP80 during u64->fp32 conversion on…. · Jan 20 2023, 12:34 AM

craig.topper mentioned this in rG11fb09ec0afa: [X86] Change precision control to FP80 during u64->fp32 conversion on Windows.. · Feb 6 2023, 7:35 AM