This is an archive of the discontinued LLVM Phabricator instance.

[X86] Updates to X86 backend for f16 promotion
ClosedPublic

Authored by pirama on Apr 17 2015, 4:48 PM.

Details

Summary

r235215 adds support for f16 to be considered as a load/store type and
promote f16 operations to f32.

This patch has miscellaneous fixes for the X86 backend so all f16
operations are handled:

  1. Set loadextaction for f16 vectors to expand.
  2. Handle FP_EXTEND in a switch statement when handling v2f32
  3. Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or

(store (SINT_TO_FP )) to a FILD.

Tests included.

Diff Detail

Repository
rL LLVM

Event Timeline

pirama updated this revision to Diff 23977.Apr 17 2015, 4:48 PM
pirama retitled this revision from to [X86] Updates to X86 backend for f16 promotion.
pirama updated this object.
pirama edited the test plan for this revision. (Show Details)
pirama added reviewers: ab, srhines, delena.
pirama added a subscriber: Unknown Object (MLST).
pirama updated this revision to Diff 24067.Apr 20 2015, 3:01 PM

Minor update to test.

ab added inline comments.Apr 28 2015, 11:39 AM
lib/Target/X86/X86ISelLowering.cpp
750–753 ↗(On Diff #24067)

A longstanding item on my todo list is to properly support F16C, where this is legal, and we don't even need to scalarize vector conversions. I'm fine with doing that separately though, that might be tricky.

17317–17321 ↗(On Diff #24067)

Why isn't this in the FP_TO_UINT block? Either way, please add uitofp/fptoui tests.

17367–17373 ↗(On Diff #24067)

Does v2f32 even hit this? Without this patch it should trigger the default unreachable, why would that change?

pirama added inline comments.Apr 28 2015, 7:39 PM
lib/Target/X86/X86ISelLowering.cpp
17317–17321 ↗(On Diff #24067)

This block is needed only for FP_TO_SINT with i64 return. OperationAction is set to custom around line 217:

setOperationAction(ISD::FP_TO_SINT     , MVT::i64  , Custom);
setOperationAction(ISD::SINT_TO_FP     , MVT::i64  , Custom);

I'll add a test for uitofp and fptoui.

17367–17373 ↗(On Diff #24067)

Around line 924 in this file:

setOperationAction(ISD::FP_EXTEND,          MVT::v2f32, Custom);

is what necessitates this. v2f32 will be a valuetype of an FP_EXTEND node only when the source type is smaller (i.e. v2f16). I am not sure why the OperationAction is set as above, but it triggers the unreachable error for FP_EXTEND from v2f16. The included test for vec4 extend fails because vec4 is split during legalization. Now that I think about it, I should add a vec2 test so this test doesn't rot if vec4 is legal in the future.

delena added inline comments.Apr 30 2015, 1:36 AM
test/CodeGen/X86/half.ll
89 ↗(On Diff #24067)

This instruction converts and stores the value. Could you, please, show the store in CHECK, somthing like (%rdi)

157 ↗(On Diff #24067)

Why double to half can't go through the chain (double -> float ->half). I'm just asking..

pirama updated this revision to Diff 24764.Apr 30 2015, 12:13 PM
  • Added UINT tests
  • Added checks for STOREs
pirama added inline comments.Apr 30 2015, 12:21 PM
test/CodeGen/X86/half.ll
103 ↗(On Diff #24764)

The uitofp generates two cvttss2si instructions along with a bunch of other instructions. I don't really understand what's going. Here's the exact sequence for '-f16c':

.cfi_def_cfa_offset 16
movzwl  (%rdi), %edi
callq   __gnu_h2f_ieee
movss   .LCPI9_0(%rip), %xmm1   # xmm1 = mem[0],zero,zero,zero
movaps  %xmm0, %xmm2
subss   %xmm1, %xmm2
cvttss2si       %xmm2, %rax
movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
xorq    %rax, %rcx
cvttss2si       %xmm0, %rax
ucomiss %xmm1, %xmm0
cmovaeq %rcx, %rax
popq    %rdx
retq

If you think this is optimal, I'll add CHECKs for the movabs, xor etc.

157 ↗(On Diff #24067)

A chained conversion through float is done if unsafe math is enabled (See around line 3458 in lib/CodeGen/SelectionDAG/LegalizeDAG.cpp). Interestingly, half -> double goes through an intermediate float (this is done in the 'case ISD::FP16_TO_FP' just above.

I am not sure why double -> float -> half is considered a precision loss, while the reverse chain isn't.

ab added inline comments.Apr 30 2015, 1:37 PM
test/CodeGen/X86/half.ll
157 ↗(On Diff #24067)

The problem is double rounding: the first round can expose a tie to be broken by the second round, which wouldn't happen with single-step rounding.
For instance, consider "1.49" with round-to-even. Rounded directly to 1 digit gets you 1. Rounding first to 2 digits gets you 1.5, and if you round that again to 1 digit, you get 2, a different result.

The reverse is fine, since you're basically adding trailing zero bits to the mantissa. For 1.49, that would be akin to turning 1.49 into 1.49000. You can first do 1.490, then 1.49000: the result is the same.

ab added inline comments.Apr 30 2015, 3:15 PM
lib/Target/X86/X86ISelLowering.cpp
17367–17373 ↗(On Diff #24067)

How about removing that line, and removing the FP_EXTEND block as well? Both seem pretty pointless IMO.

test/CodeGen/X86/half.ll
103 ↗(On Diff #24764)

Looks like when we don't have a native FP_TO_UINT, we try to expand it using FP_TO_SINT. I understand it as, the intuition being, when:

i64 fptosi (f32 a)

fits in 63 bits, we can just return that. If it doesn't, we can assume it has to be in [2^63; 2^64[ (otherwise the result is undefined), and we can first do:

a -= (f32)(2 << 63)

equivalent in spirit to something like "(a & ~(2 << 63))", if a were an i64.
Then you can just to:

(or (i64 fptosi (f32 a)), (2 << 63))

So, the code looks good. The fact that we use cvttss2si can be confusing, so it's probably best to match more of that logic.

pirama added inline comments.Apr 30 2015, 5:42 PM
lib/Target/X86/X86ISelLowering.cpp
17367–17373 ↗(On Diff #24067)

Removing the operation action causes test/CodeGen/X86/pr11334.ll to fail. For this test, DAGTypeLegalizer::WidenVectorOperand calls CustomLowerNode based on the operand's value type which eventually causes the v2f32 to be expanded to a v4f32 and use a VFPEXT (in LowerFP_EXTEND).

I understood that operation action is always defined for the value type of a node and not on its operand's types. But, looks like that's not the case.

test/CodeGen/X86/half.ll
103 ↗(On Diff #24764)

Aah, that makes sense. Thanks for the explanation!

pirama updated this revision to Diff 24818.May 1 2015, 10:36 AM

Stricter matching for test_fptoui_i64

ab accepted this revision.May 7 2015, 3:28 PM
ab edited edge metadata.

Looking at this again, the tests should be stricter, I think. Operands, perhaps -NEXT (on X86, you shouldn't have the scheduling/allocation problems you had last time; ARM is .. special), at the very least for the scalar tests.

With that, LGTM, thanks!

lib/Target/X86/X86ISelLowering.cpp
17517 ↗(On Diff #24818)

Just "fallthrough" ?

17367–17373 ↗(On Diff #24067)

Ah, that's nasty. (yes, the operation action is keyed off of different types depending on who's asking..)

Your change is fine then, I don't have any better idea right now.

test/CodeGen/X86/half.ll
119 ↗(On Diff #24818)

'g' -> 'q'

This revision is now accepted and ready to land.May 7 2015, 3:28 PM
pirama updated this revision to Diff 25362.May 8 2015, 1:01 PM
pirama edited edge metadata.
  • Make scalar tests stricter
  • Use the correct tag, CHECK-F16C, in the tests (both old and new) instead of CHECK-FP16. (Nasty, yes)

Ahmed, Elena, I have made the scalar tests stricter. This update was slightly non-trivial. I'll wait for another review instead of pushing based on the prior LGTM.

ab added a comment.May 11 2015, 9:33 AM

Stricter tests indeed, go ahead. Thanks!

-Ahmed

test/CodeGen/X86/half.ll
1–2 ↗(On Diff #25362)

You might want to wrap both of these lines, they're starting to get pretty unwieldy!

pirama updated this revision to Diff 25481.May 11 2015, 10:17 AM

Split 'RUN:' lines in test

This revision was automatically updated to reflect the committed changes.