This is an archive of the discontinued LLVM Phabricator instance.

[X86][AVX512DQ] Use packed instructions for scalar FP<->i64 conversions on 32-bit targets (PR31630)
ClosedPublic

Authored by craig.topper on Feb 18 2018, 6:49 AM.

Details

Summary

As i64 types are not legal on 32-bit targets, insert these into a suitable zero vector and use the packed vXi64<->FP conversion instructions instead.

Diff Detail

Event Timeline

RKSimon created this revision.Feb 18 2018, 6:49 AM
RKSimon added a reviewer: DavidKreitzer.

Why a DAG combine and not Custom type legalization?

lib/Target/X86/X86ISelLowering.cpp
52

Why bother with the AVX512 check here if we're just going to check DQI inside?

70

Do we need the zeros or could this be scalar_to_vector?

craig.topper commandeered this revision.May 14 2018, 12:35 PM
craig.topper edited reviewers, added: RKSimon; removed: craig.topper.

Commandeering from Simon to finish this up.

Move everything into type legalization lowering.

delena added inline comments.May 14 2018, 1:55 PM
lib/Target/X86/X86ISelLowering.cpp
16065

This case is the same for SINT and UINT. May be put them in a function?

25275

VecVT = MVT::getVectorVT(VT, NumElts);

25279

Why do you need to insert into zero vector? Can you insert to undef?

test/CodeGen/X86/scalar-fp-to-i64.ll
541

Can the memory operand be folded here?
VCVTTPD2UQQ ymm1 {k1}{z},ymm2/m256/m64bcst

craig.topper added inline comments.May 14 2018, 2:49 PM
lib/Target/X86/X86ISelLowering.cpp
25279

I think so. I asked the same question before I commandeered it. It's probably no worse than the widening with undef we do for v2f32 legalization.

test/CodeGen/X86/scalar-fp-to-i64.ll
541

We'd have to detect the load and the possibilty of folding it during this lowering code. Or we'd have to use undef for the upper elts and add a DAG combine to turn insert into undef into a broadcast if its foldable.

Create helper function to share between SINT_TO_FP and UINT_TO_FP. Use scalar_to_vector instead of inserting into a zero vector.

delena accepted this revision.May 14 2018, 10:58 PM

Or we'd have to use undef for the upper elts and add a DAG combine to turn insert into undef into a broadcast if its foldable.

I think it is simpler than load folding in td. You can open a bugzilla ticket for possible optimization.

This revision is now accepted and ready to land.May 14 2018, 10:58 PM
RKSimon added inline comments.May 15 2018, 2:01 AM
lib/Target/X86/X86ISelLowering.cpp
25279

In the original patch I was just trying to be very sure there wasn't anything in the other source elements that could cause fp exceptions/overflow flags etc.

Could you please add full context to the patch, Craig?

lib/Target/X86/X86ISelLowering.cpp
16048

I suspect we also want to use this vector sequence for unsigned i64 conventions on 64-bit.

25279

I think the only possible side effect from the other source element being undef is raising "inexact". Do we care?

Thanks, Craig.

lib/Target/X86/X86ISelLowering.cpp
25279

Forget what I wrote. I was thinking of the INT-->FP case. For the FP-->INT case, why wouldn't we need to worry about raising spurious exceptions?

Also, I'm probably missing something, but it looks like this code is expected to kick in for both 32-bit and 64-bit and both signed and unsigned FP-->i64. Is that intentional? For the 64-bit signed case, I would think we would prefer CVTTSx2SI.

Use zeros for other elements for fp->int to avoid spurious invalid exceptions. Since these sorts of scalar conversions are common there's a chance someone could notice the spurious exceptions and complain. We seem to have generated the same code in the test cases either way. Since we ultimatley had to load two i32s into an xmm register. We need use a movd/movss that 0 the upper bits anyway.

Also addeded an assert for !Subtarget.is64Bit() to the fp->int conversion. That code is in ReplaceNodeResults is a hook for the type legalizer to legalize the result type so we should only get there when i64 isn't legal.

craig.topper requested review of this revision.May 15 2018, 5:43 PM
DavidKreitzer accepted this revision.May 16 2018, 5:44 AM

Looks good, Craig. And thanks for the explanation on the 64-bit cases. I had forgotten about the VCVTTSx2USI & VCVTUSI2Sx instructions, which we appear to be generating already.

This revision is now accepted and ready to land.May 16 2018, 5:44 AM
RKSimon accepted this revision.May 16 2018, 7:32 AM

LGTM - thanks Craig!

This revision was automatically updated to reflect the committed changes.