This is an archive of the discontinued LLVM Phabricator instance.

[X86] Avoid converting u64 to f32 using x87 on Windows
ClosedPublic

Authored by icedrocket on Jan 5 2023, 10:28 AM.

Details

Summary

The code below currently prints less accurate values only on Windows 32-bit. On Windows, the default precision control on x87 is only 53-bit, and FADD triggers rounding with that precision, so the final result may be less accurate. This revision avoids less accurate conversions by using library calls instead.

#include <stdio.h>
#include <stdint.h>

int main() {
    int64_t n = 0b0000000000111111111111111111111111011111111111111111111111111111;
    printf("%lld, %.0f, %.0f", n, (float)n, (float)(uint64_t)n);

    return 0;
}

Diff Detail

Event Timeline

icedrocket created this revision.Jan 5 2023, 10:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2023, 10:28 AM
icedrocket requested review of this revision.Jan 5 2023, 10:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2023, 10:28 AM
icedrocket edited the summary of this revision. (Show Details)Jan 5 2023, 10:33 AM
icedrocket added a project: Restricted Project.Jan 5 2023, 10:37 AM
icedrocket removed a project: Restricted Project.Jan 5 2023, 10:45 AM

Please upload all patches with full context (-U99999)
Tests missing

only on Windows 32-bit

Is the default precision control different for 64-bit?

only on Windows 32-bit

Is the default precision control different for 64-bit?

No, but I think we use a different algorithm with a 64-bit cvtsi2ss on 64-bit targets so we don't end up using x87.

only on Windows 32-bit

Is the default precision control different for 64-bit?

No, but I think we use a different algorithm with a 64-bit cvtsi2ss on 64-bit targets so we don't end up using x87.

I'm mainly asking whether this needs to check for the 32-bit windows specifically, not just that it is windows. Guess not.

icedrocket added a comment.EditedJan 5 2023, 12:39 PM

only on Windows 32-bit

Is the default precision control different for 64-bit?

No, but I think we use a different algorithm with a 64-bit cvtsi2ss on 64-bit targets so we don't end up using x87.

I'm mainly asking whether this needs to check for the 32-bit windows specifically, not just that it is windows. Guess not.

I tested it on my machine and the precision control is 53-bit even on Windows 64-bit.

pengfei added inline comments.Jan 5 2023, 7:12 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
21389

What to do if user changes the default precision? Besides, our down stream offers option to help user to set higher percision.

This comment was removed by icedrocket.
icedrocket added a comment.EditedJan 6 2023, 4:51 AM

What to do if user changes the default precision?

If users change the precision control to 64-bit, we can allow the use of x87 instructions by adding a compiler option. However, this is probably only for Windows 32-bit, and I don't know if it's appropriate to add a new option because there aren't that many users on that platform.

Besides, our down stream offers option to help user to set higher percision.

I'm sorry, but I'm not sure exactly what that option is. Is it an option to increase x87 precision, or an option to operate on floats at higher precision like double?

icedrocket updated this revision to Diff 486837.Jan 6 2023, 5:43 AM

Use library calls only when SSE is enabled. If SSE is disabled and x87 is enabled, the x87 implementation is used regardless of precision.

What to do if user changes the default precision?

If users change the precision control to 64-bit, we can allow the use of x87 instructions by adding a compiler option. However, this is probably only for Windows 32-bit, and I don't know if it's appropriate to add a new option because there aren't that many users on that platform.

Besides, our down stream offers option to help user to set higher percision.

I'm sorry, but I'm not sure exactly what that option is. Is it an option to increase x87 precision, or an option to operate on floats at higher precision like double?

It's /Qpc80 that to be compatible with ICC https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/optimization-and-programming/intel-c-compiler-classic-math-library/use-the-intel-c-compiler-classic-math-library.html
It's intended design for Windows, especially for 32-bits.

icedrocket updated this revision to Diff 486846.Jan 6 2023, 6:05 AM
icedrocket updated this revision to Diff 486869.Jan 6 2023, 7:10 AM

Add the pc80 option. There may be other suitable candidate names for this option, but I used the name from Intel ICC. The current implementation only applies to conversions from 64-bit integers to floating point.

Add the pc80 option. There may be other suitable candidate names for this option, but I used the name from Intel ICC. The current implementation only applies to conversions from 64-bit integers to floating point.

The icc options injects code into main to set the PC. (Kind of ignoring that global constructors that run before main exist).

icedrocket marked an inline comment as done.EditedJan 6 2023, 10:34 AM

Add the pc80 option. There may be other suitable candidate names for this option, but I used the name from Intel ICC. The current implementation only applies to conversions from 64-bit integers to floating point.

The icc options injects code into main to set the PC. (Kind of ignoring that global constructors that run before main exist).

The option I added just tells the compiler that the precision control will always be set to 64-bit. And it's still a draft implementation of the candidate concept.

craig.topper retitled this revision from Avoid converting 64-bit integers to floating point using x87 on Windows to [X86] Avoid converting 64-bit integers to floating point using x87 on Windows.Jan 6 2023, 10:45 AM
icedrocket added a comment.EditedJan 6 2023, 1:22 PM

On Windows 32-bit, the calling convention uses x87, so it's almost impossible to avoid using x87. So rather than avoiding the use of x87, I think it's better to add an option to inject code like an ICC option. To add this option to all LLVM-based compilers, the code must be injected into the LLVM IR. This will be done in the process of parsing or generating the LLVM IR. However, this requires refactoring a lot of code.

This comment was removed by icedrocket.

I think it's not a good idea to use the same name while not doing the same thing as ICC. I considered to open source our implementation if people thinks it's useful, though I may not do it immediately since it relys on some internal framework.
Can we just emit a warning to remind user the result may not be precise on 32-bit Windows. I think the current implementation still has two problems: 1) The result is not identical to MSVC's, which may introduce more problem than it solves since MSVC is the dominant compiler on Windows. 2) The generated binary relys on compiler-rt, but I think not all user will use compiler-rt. Force to generate library call may result in compiler fail to them.

I think the original version of the diff is reasonable and can be merged.

I don't think the concerns that we lower something to a libcall are sufficiently grounded.
We already lower many things to a libcall. Doing so for one more thing is not the end of the world,
assuming the compiler-rt exists for that configuration in the first place.

icedrocket added a comment.EditedJan 9 2023, 2:27 PM

I think the original version of the diff is reasonable and can be merged.

I don't think the concerns that we lower something to a libcall are sufficiently grounded.
We already lower many things to a libcall. Doing so for one more thing is not the end of the world,
assuming the compiler-rt exists for that configuration in the first place.

I've written code that adds a warning that the results may be incorrect if x87 is used on Windows 32-bit. Would it be a good idea to add this code too?

Does the libcall not produce the right answer? That sounds like a bug.
I think this patch should *just* be about falling back to the libcall.

icedrocket added a comment.EditedJan 9 2023, 2:55 PM

If SSE2 is available, we only need to use library calls to convert between 64-bit integers and floating point. However, if only x87 is available, precision issues are unavoidable. What I meant was to just use the x87 implementation and print a warning in this case.

If SSE2 is available, we only need to use library calls to convert between 64-bit integers and floating point. However, if only x87 is available, precision issues are unavoidable. In this case, it might be better to just use the x87 implementation and print a warning.

Hopefully if you're not using long double, there shouldn't be any major precision issues with x87. The issue with the unsigned conversion is assuming we could represent a 64-bit integer in the mantissa of an x87 register without occurring any rounding.

If SSE2 is available, we only need to use library calls to convert between 64-bit integers and floating point. However, if only x87 is available, precision issues are unavoidable. What I meant was to just use the x87 implementation and print a warning in this case.

Why are they unavoidable?

If SSE2 is available, we only need to use library calls to convert between 64-bit integers and floating point. However, if only x87 is available, precision issues are unavoidable. What I meant was to just use the x87 implementation and print a warning in this case.

Why are they unavoidable?

As long as the precision control is low, x87 has low precision.

If SSE2 is available, we only need to use library calls to convert between 64-bit integers and floating point. However, if only x87 is available, precision issues are unavoidable. What I meant was to just use the x87 implementation and print a warning in this case.

Why are they unavoidable?

As long as the precision control is low, x87 has low precision.

If the precision control is set to more than 24-bit precision, x87 actually has more precision than SSE for fp32.

icedrocket updated this revision to Diff 487594.Jan 9 2023, 4:02 PM

Please split clang change into a separate review.

icedrocket updated this revision to Diff 487596.Jan 9 2023, 4:08 PM
craig.topper added inline comments.Jan 9 2023, 4:13 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
21389

Where did 56 come from? The precision control settings are 24, 53, and 64.

This comment was removed by icedrocket.
icedrocket updated this revision to Diff 487598.Jan 9 2023, 4:18 PM

Fix the comments

icedrocket marked an inline comment as done.Jan 9 2023, 4:20 PM
icedrocket edited the summary of this revision. (Show Details)Jan 9 2023, 6:22 PM
icedrocket updated this revision to Diff 487649.Jan 9 2023, 7:34 PM
icedrocket added a comment.EditedJan 9 2023, 8:49 PM

The test keeps failing, but Diff 486869 has already built and tested successfully. The only thing that has changed is the comments.

This comment was removed by icedrocket.

Change to original version of diff mentioned by @lebedev.ri.

Can we add lit test to llvm/test/CodeGen/X86 for this with a Windows triple?

icedrocket added a comment.EditedJan 14 2023, 2:34 AM

The main reason I posted a review was because the conversion results from u64 and i64 to f32 did not match, causing problems.

I think using the x87 implementation of UINT_TO_FP also in i64 to f32 conversion is an alternative solution without compromising performance on Windows 32-bit. This will produces same results as MSVC and solves the problem that prompted me to post this review. There may be a very slight drop in precision, but it shouldn't be a major issue.

icedrocket added a comment.EditedJan 14 2023, 3:28 AM

I've checked and MSVC's implementation dynamically checks if AVX-512 is available and uses vcvtuqq2pd if available. Therefore, it seems that the result depends on whether the system supports AVX-512 or not.

The tests keep failing, but it's all RISC-V related, so the diff probably isn't wrong.

icedrocket added a comment.EditedJan 14 2023, 11:57 PM

I'm not sure why the result depends on the x87 control word, but anyway, there is a possibility that this issue is because of the rounding mode. The conversion from f64 to f32 is typically done using round to nearest, and the target to round can differ between u64 and u64 to f64. We should round over u64, not u64 to f64.

craig.topper added a comment.EditedJan 15 2023, 12:21 AM

I'm not sure why the result depends on the x87 control word, but anyway, there is a possibility that this issue is because of the rounding mode. The conversion from f64 to f32 is typically done using round to nearest, and the target to round can differ between u64 and u64 to f64. We should round over u64, not u64 to f64.

The algorithm we use for u64 to f32 looks like this

bitcast u64 to i64
convert to i64 to 80-bit fp using FILD instruction (this conversion does not depend on the value of PC).
if bits 63 was set in the integer this conversion produces a negative number.
if bit 63 was set in the integer, add 18446744073709551616 to the negative floating point value to give it the correct positive value. If bit 63 wasn’t set, add 0.0. With PC=64 this shouldn't result in any rounding.
round 80-bit fp value to f32 using FST.

The bug occurs because the addition of 18446744073709551616 ends up rounding unexpectedly because of PC=53.

icedrocket added a comment.EditedJan 15 2023, 1:29 AM

I'm not sure why the result depends on the x87 control word, but anyway, there is a possibility that this issue is because of the rounding mode. The conversion from f64 to f32 is typically done using round to nearest, and the target to round can differ between u64 and u64 to f64. We should round over u64, not u64 to f64.

The algorithm we use for u64 to f32 looks like this

bitcast u64 to i64
convert to i64 to 80-bit fp using FILD instruction (this conversion does not depend on the value of PC).
if bits 63 was set in the integer this conversion produces a negative number.
if bit 63 was set in the integer, add 18446744073709551616 to the negative floating point value to give it the correct positive value. With PC=64 this shouldn't result in any rounding.
round 80-bit fp value to f32 using FST.

The bug occurs because the addition of 18446744073709551616 ends up rounding unexpectedly because of PC=53.

The value I tried to convert in summary's example code is 18014397972611071, which is representable in i64 range and bit 63 is not set.

I'm not sure why the result depends on the x87 control word, but anyway, there is a possibility that this issue is because of the rounding mode. The conversion from f64 to f32 is typically done using round to nearest, and the target to round can differ between u64 and u64 to f64. We should round over u64, not u64 to f64.

The algorithm we use for u64 to f32 looks like this

bitcast u64 to i64
convert to i64 to 80-bit fp using FILD instruction (this conversion does not depend on the value of PC).
if bits 63 was set in the integer this conversion produces a negative number.
if bit 63 was set in the integer, add 18446744073709551616 to the negative floating point value to give it the correct positive value. With PC=64 this shouldn't result in any rounding.
round 80-bit fp value to f32 using FST.

The bug occurs because the addition of 18446744073709551616 ends up rounding unexpectedly because of PC=53.

The value I tried to convert in summary's example code is 18014397972611071, which is representable in i64 range and bit 63 is not set.

If bit 63 isn’t set we add 0.0 which still triggers the rounding to 53 bits on the add.

icedrocket added a comment.EditedJan 15 2023, 3:47 AM

I'm not sure why the result depends on the x87 control word, but anyway, there is a possibility that this issue is because of the rounding mode. The conversion from f64 to f32 is typically done using round to nearest, and the target to round can differ between u64 and u64 to f64. We should round over u64, not u64 to f64.

Initially, I suspected that LLVM might have similar issues as MSVC. The results from the two different implementations were identical, which confused me a bit.

RKSimon added inline comments.Jan 16 2023, 2:00 AM
llvm/test/CodeGen/X86/uint64-to-float.ll
4

add win64 test coverage as well to check we're not doing this there

icedrocket marked an inline comment as done.
This revision is now accepted and ready to land.Jan 17 2023, 9:11 AM
lebedev.ri accepted this revision.Jan 17 2023, 9:15 AM

This change in it's current form does look reasonable to me.
Thanks

icedrocket added a comment.EditedJan 17 2023, 10:14 PM

@craig.topper, could you please commit this change? I don't have permission to commit.

icedrocket retitled this revision from [X86] Avoid converting 64-bit integers to floating point using x87 on Windows to [X86] Avoid converting u64 to f32 using x87 on Windows.Jan 17 2023, 11:37 PM
icedrocket edited the summary of this revision. (Show Details)
icedrocket edited the summary of this revision. (Show Details)

@craig.topper, could you please commit this change? I don't have permission to commit.

@icedrocket Please can you provide your email address for the commit message?

@craig.topper, could you please commit this change? I don't have permission to commit.

@icedrocket Please can you provide your email address for the commit message?

114203630+icedrocket@users.noreply.github.com

Could someone commit this change?

This revision was landed with ongoing or failed builds.Jan 18 2023, 11:14 PM
This revision was automatically updated to reflect the committed changes.

this is causing us to get undefined symbol errors for ___floatundisf because we don't link in builtins on win32. not 100% sure on this, but I don't think we're expected to link against builtins on win32

https://bugs.chromium.org/p/chromium/issues/detail?id=1408807

srj added a subscriber: srj.Jan 19 2023, 3:42 PM

this is causing us to get undefined symbol errors for ___floatundisf because we don't link in builtins on win32. not 100% sure on this, but I don't think we're expected to link against builtins on win32

https://bugs.chromium.org/p/chromium/issues/detail?id=1408807

Same problem was injected into Halide. Are we expected to link in the builtins now? Halide is willing to follow Chrome's lead, but this seems like a surprising change.

I'm going to revert this patch. I'm going to see if we can change the precision control around the problematic fadd in the inlined sequence.