This is an archive of the discontinued LLVM Phabricator instance.

[X86] Change precision control to FP80 during u64->fp32 conversion on Windows.
ClosedPublic

Authored by craig.topper on Jan 19 2023, 10:31 PM.

Details

Summary

This is an alternative to D141074 to fix the problem by adjusting
the precision control dynamically.

Posting for early review so we can do some testing of this solution.

Diff Detail

Event Timeline

craig.topper created this revision.Jan 19 2023, 10:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 19 2023, 10:31 PM
craig.topper requested review of this revision.Jan 19 2023, 10:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 19 2023, 10:31 PM
This revision was not accepted when it landed; it landed in state Needs Review.Jan 20 2023, 12:34 AM
This revision was automatically updated to reflect the committed changes.
craig.topper reopened this revision.Jan 20 2023, 12:41 AM

Committed by accident

I tested the patch and it seems to have fixed the issue.

This looks a good way to solve the problem.

llvm/lib/Target/X86/X86ISelLowering.h
895–896

This should be in the same line.

icedrocket added inline comments.Jan 23 2023, 6:13 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
37379–37402

What about just set value to 0x37f instead of applying bitwise OR to original value? We can save two registers by doing this, and only need two bytes of memory to hold the static value.

This comment was removed by icedrocket.
icedrocket added inline comments.Jan 23 2023, 10:03 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
37379–37402

Ah, there is still the possibility of an exception being thrown due to a stack overflow. What if the original value's rounding mode is not round to nearest?

craig.topper added inline comments.Jan 23 2023, 10:45 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
37379–37402

The compiler is managing the stack so there shouldn't be any overflow unless someone uses assembly or something to add things to the stack that the compiler doesn't know about.

Rounding mode doesn't matter because the fadd is not supposed to round. Any integer than can be created should fit perfectly in an 80 bit FP.

icedrocket added inline comments.Jan 24 2023, 12:15 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
37379–37402

Yes, and FADD doesn't seem to cause a stack overflow, because it doesn't push on the stack. I forgot that the modified control word only applies to FADD. So, as long as we only use FP80_ADD on LowerUINT_TO_FP, no exception will be thrown.

Use the load version of fadd when possible. Only handling the case of loading an f32 value.

craig.topper retitled this revision from [X86][WIP] Change precision control to FP80 during u64->fp32 conversion on Windows. to [X86] Change precision control to FP80 during u64->fp32 conversion on Windows..Jan 29 2023, 10:14 PM
craig.topper edited the summary of this revision. (Show Details)
icedrocket added a comment.EditedFeb 2 2023, 8:25 AM

Could you update the diff? The current diff is outdated and cannot be applied to main branch automatically.

icedrocket added inline comments.Feb 2 2023, 10:05 AM
llvm/lib/Target/X86/X86ISelLowering.h
895–896

Is this comment change intended? Other than this, everything looks good.

llvm/test/CodeGen/X86/uint64-to-float.ll
64

I checked the assembly generated by clang and it seems that fadds is split into fld and fadd.

craig.topper added inline comments.Feb 2 2023, 10:07 AM
llvm/lib/Target/X86/X86ISelLowering.h
895–896

not sure what happened there. I'll fix

Remove unrelated comment change. The line is longer than 80 columns, but isn't near code this is touching.

craig.topper added inline comments.Feb 2 2023, 10:16 AM
llvm/test/CodeGen/X86/uint64-to-float.ll
64

That's weird. Do you have a C file you can share?

icedrocket added inline comments.Feb 2 2023, 10:44 AM
llvm/test/CodeGen/X86/uint64-to-float.ll
64

The file is same as the summary's code in D141074. I think that there is no actual fadds instruction in x87 and end up split into two instructions.

Remove unrelated comment change. The line is longer than 80 columns, but isn't near code this is touching.

clang-format test failed. We need to fix the comment.

Remove unrelated comment change. The line is longer than 80 columns, but isn't near code this is touching.

clang-format test failed. We need to fix the comment.

I'll fix in a separate pre-commit. It's a distraction for this review.

Rebase after fixing 80 column on trunk

Fix warning about ISD and X86ISD being different enums for a conditional operator.

craig.topper added inline comments.Feb 2 2023, 12:53 PM
llvm/test/CodeGen/X86/uint64-to-float.ll
64

Did you check the assembly without optimizations enabled? Folding the load into the fadd is only done with optimizations enabled.

icedrocket added inline comments.Feb 2 2023, 10:14 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
22066

Should we only apply this to the conversion to f32? Conversion to f64 might also have precision issues though I can't prove it.

llvm/test/CodeGen/X86/uint64-to-float.ll
64

You're right. I tested again with another code and it works as you mentioned.

icedrocket added inline comments.Feb 4 2023, 4:04 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
22066

LowerUINT_TO_FP_i64 also uses FP64 addition. If u64 to f64 conversion is problematic because FP80 addition is not used, then implementations using SSE2 will also be problematic. So unless we find a broken case, I think it's better to leave it alone.

icedrocket added a comment.EditedFeb 4 2023, 7:02 AM

I wrote some code to verify that u64 to f64 conversion using FILD+FADD with 53-bit precision is accurate. I tested it with 10^12 cases and the results are: u64 to f32 conversion failed frequently, but u64 to f64 conversion does not failed.

https://reviews.llvm.org/F26366621
https://reviews.llvm.org/F26366622

This revision is now accepted and ready to land.Feb 4 2023, 7:03 AM

I wrote some code to verify that u64 to f64 conversion using FILD+FADD with 53-bit precision is accurate. I tested it with 10^12 cases and the results are: u64 to f32 conversion failed frequently, but u64 to f64 conversion does not failed.

https://reviews.llvm.org/F26366621
https://reviews.llvm.org/F26366622

The u64->f64 conversion can only fail if PC is set to single precision. The original bug we are fixing is that with PC set to double precision we round from fp80 to fp64 then to fp32. If PC is set to single precision we have a lot more problems with f64.