Fixes assertion failures when using fptosi or fptoui with f80 and AVX512. The operation actions need to be left as Custom, not Legal. The legality for f32/f64 is handled by having FP_TO_INTHelper do nothing for those cases.
Suppresses use of the MSVC ftol2 library function for fptoui i64. That function performs a conversion to *signed* i64, so the results were incorrect for source values >= 2^63. I didn't rip out the now-dead references to ftol2, but that can be done in a small follow-up change set if desired.
Implements an inline sequence for fptoui i64 on 32-bit X86. This is mostly in FP_TO_INTHelper, replacing the ftol2 usage on Windows and the calls to fixuns{sf,df,xf}di on non-Windows targets. Improves performance by 6X under SSE3, 3X otherwise.
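For readers unfamiliar with the problem, here is a minimal C++ sketch of one common way to build an unsigned 64-bit conversion out of a signed one. It is not taken from the patch and may differ from the sequence FP_TO_INTHelper actually emits. The names fp_to_u64 and check_fp_to_u64 are hypothetical. The idea is to compare against 2^63, subtract 2^63 when the input is at or above it so that the signed conversion stays in range, and then restore the top bit with an XOR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; not LLVM code.
// Assumes 0 <= x < 2^64 (anything else is out of range for the result type).
std::uint64_t fp_to_u64(double x) {
    const double two63 = 9223372036854775808.0;  // 2^63, exactly representable
    if (x < two63) {
        // In range for a signed conversion, so use it directly.
        return static_cast<std::uint64_t>(static_cast<std::int64_t>(x));
    }
    // x >= 2^63: shift into signed range, convert, then set bit 63 again.
    // The subtraction is exact because x and 2^63 are within a factor of 2.
    std::int64_t low = static_cast<std::int64_t>(x - two63);
    return static_cast<std::uint64_t>(low) ^ (std::uint64_t{1} << 63);
}

// Small self-check covering both sides of the 2^63 threshold.
void check_fp_to_u64() {
    assert(fp_to_u64(3.0) == 3u);
    assert(fp_to_u64(9223372036854775808.0) == (std::uint64_t{1} << 63));
    // Largest double below 2^64 is 2^64 - 2048.
    assert(fp_to_u64(18446744073709549568.0) == 18446744073709549568ull);
}
```

A plain signed conversion alone would be out of range for the last two inputs, which is the failure mode described above for ftol2.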
Could we use ANY_EXTEND for High32? We will shift out the top 32 bits anyway.