Page MenuHomePhabricator

[X86] -- Fix fptoui i64 conversions for IA32 (performance and correctness)
ClosedPublic

Authored by mbodart on Jul 17 2015, 4:30 PM.

Details

Summary

Fixes assertion failures when using fptosi or fptoui with f80 and AVX512. The operation actions need to be left as Custom, not Legal. The legality for f32/f64 is handled by having FP_TO_INTHelper do nothing for those cases.

Suppresses use of the MSVC ftol2 library function for fptoui i64. That function performs a conversion to *signed* i64, so the results were incorrect for source values >= 2^63. I didn't rip out the now dead references to ftol2, but that can be done with a small followup change set if desired.

Implements an inline sequence for fptoui i64 for 32-bit X86. This is mostly in FP_TO_INTHelper, replacing the ftol2 usage on Windows, and replacing the calls to fixuns{sf,df,xf}di for non-windows. Improves performance by 6X under SSE3, 3X otherwise.

Diff Detail

Repository
rL LLVM

Event Timeline

mbodart updated this revision to Diff 30048.Jul 17 2015, 4:30 PM
mbodart retitled this revision from to [X86] -- Fix fptoui i64 conversions for IA32 (performance and correctness).
mbodart updated this object.
mbodart added a subscriber: llvm-commits.

Hi Michael,

Just a ping to remind you of this review request.

I selected you as you were involved in the original ftol2 introduction, which I am disabling.

If you have suggestions for other reviewers, please pass them along.

thanks,

  • mitch

Per Michael Spencer's suggestion, added Nadav as a reviewer.

Also adding Michael Kuperstein, who consulted on these changes.

mkuper added inline comments.Aug 1 2015, 11:41 PM
lib/Target/X86/X86ISelLowering.cpp
26320 ↗(On Diff #30048)

I'm ok with just deleting the FTOL code immediately, unless anyone objects.
(Can be a follow-up patch)

test/CodeGen/X86/scalar-fp-to-i64.ll
17 ↗(On Diff #30048)

Can you use -mattr here instead of an explicit -mcpu?

test/CodeGen/X86/win_ftol2.ll
4 ↗(On Diff #30048)

What happened to this RUN line?

majnemer added inline comments.
lib/Target/X86/X86ISelLowering.cpp
26320 ↗(On Diff #30048)

I think it makes sense to remove it.

mbodart added inline comments.Aug 2 2015, 10:21 PM
lib/Target/X86/X86ISelLowering.cpp
26320 ↗(On Diff #30048)

I'm fine with removing it, but need a little guidance as to the depth of removal.

Certainly isIntegerTypeFTOL and isTargetFTOL, and their use in FP_TO_INTHelper, can be removed.

But further deletion of the WIN_FTOL instructions in the .td files would prevent any future use of that library routine. How can I be certain that isn't being used, or won't be needed?

test/CodeGen/X86/scalar-fp-to-i64.ll
17 ↗(On Diff #30048)

I see both -mcpu and -mattr being used for skx and avx512 testing in lit tests.
-mcpu seemed simpler and more "bullet proof" if you will in that it captures
all relevant attributes (sse, sse2, ... avx, skx).

But if -mattr is the preferred method, I can certainly change to that.

What is the BKM?

test/CodeGen/X86/win_ftol2.ll
4 ↗(On Diff #30048)

Oops, forgot to reset it before submitting the review.
I was playing around with different ways to disable this test.
It's left as RUN in my current version.

mbodart updated this revision to Diff 31699.Aug 10 2015, 11:31 AM

I uploaded a new revision, removing all usage of isTargetFTOL and isIntegerTypeFTOL, which I think is sufficient for this change set. I left the WIN_FTOL instructions, as conceivably those could still be used for conversion to signed int64, though they aren't currently.

I'm still waiting on a recommendation as to whether lit tests should lean towards using -mattr instead of -mcpu.
Please let me know your preference, and whether that applies to all RUN lines. Specifically, what would be the appropriate -mattr replacement for -mcpu=generic?

thanks!
Mitch Bodart

majnemer added inline comments.Aug 10 2015, 11:55 AM
lib/Target/X86/X86ISelLowering.cpp
12399 ↗(On Diff #31699)

Could we use ANY_EXTEND for High32? We will shift out the top 32-bits anyway.

12696 ↗(On Diff #31699)

Please format the return onto its own line.

mbodart updated this revision to Diff 31750.Aug 10 2015, 5:38 PM

Applied fixes for David's recent comments (using ANY_EXTEND, and formatting).

mbodart added inline comments.Aug 10 2015, 5:41 PM
lib/Target/X86/X86ISelLowering.cpp
12399 ↗(On Diff #31750)

Yes, thanks for the suggestion!

Fixed in latest upload.

12697 ↗(On Diff #31750)

Fixed, both here and up above in LowerFP_TO_SINT, where this was copied from.

mbodart updated this revision to Diff 32324.Aug 17 2015, 10:33 AM

Updated new test to use -mattr instead of -mcpu, per MK's recommendation.

With that I believe all review comments have been addressed.

Can I please get an LGTM?

thanks,
Mitch

mkuper accepted this revision.Aug 25 2015, 12:21 AM
mkuper edited edge metadata.

LGTM

This revision is now accepted and ready to land.Aug 25 2015, 12:21 AM

I'll rip out the rest of the FTOL stuff in a follow-up commit, no reason to leave dead code lying around.

This revision was automatically updated to reflect the committed changes.