This is an archive of the discontinued LLVM Phabricator instance.

[X86] Teach how to custom lower double-to-half conversions under fast-math.
ClosedPublic

Authored by andreadb on Feb 23 2015, 8:27 AM.

Details

Summary

This patch teaches the backend how to custom lower a 'fp_to_fp16' node that performs a double-to-half conversion.

Under fast-math, if the target has F16C, the backend can expand a double-to-half conversion into a double-to-float conversion immediately followed by a float-to-half conversion. Before this patch, a double-to-half conversion was always expanded into a library call even under fast-math.

Example:
\code
define zeroext i16 @func(double %d) #0 {
entry:

%0 = tail call i16 @llvm.convert.to.fp16.f64(double %d)
ret i16 %0

}

attributes #0 = { "unsafe-fp-math=true" "use-soft-float"="false" }
\code end

Before this patch (with -mattr=+f16c), the conversion from double to fp16 was expanded into a library call to function '__truncdfhf2'.

With this patch, the double-to-half conversion is now expanded into the sequence:

vcvtsd2ss %xmm0, %xmm0, %xmm0
vcvtps2ph $0, %xmm0, %xmm0

Note that this patch also handles 'long double'-to-half conversions.

This patch doesn't add custom lowering rules for 'fp16_to_fp' dag nodes. The reason why we don't need those rules is because LegalizeDAG (see around lines 3532:3546) already knows how to expand a half-to-double conversion into a 'FP16_TO_FP' plus 'FP_EXTEND'.

Please let me know if ok to submit.

Thanks!
Andrea

Diff Detail

Repository
rL LLVM

Event Timeline

andreadb updated this revision to Diff 20515.Feb 23 2015, 8:27 AM
andreadb retitled this revision from to [X86] Teach how to custom lower double-to-half conversions under fast-math..
andreadb updated this object.
andreadb edited the test plan for this revision. (Show Details)
andreadb added reviewers: qcolombet, ab, grosbach.
andreadb added a subscriber: Unknown Object (MLST).
ab edited edge metadata.Feb 23 2015, 9:37 AM

Should this be in the target-independent legalizer instead, like FP16_TO_FP? Only with UnsafeFPMath, of course (I'm a bit uncomfortable with that, but I don't find it shocking, and I see there's precedent.)

Otherwise, the change seems reasonable, thanks!

-Ahmed

test/CodeGen/X86/fastmath-float-half-conversion.ll
17 ↗(On Diff #20515)

Not very important, but add a check for fp80->fp32, perhaps?

andreadb updated this revision to Diff 20524.Feb 23 2015, 11:12 AM
andreadb edited edge metadata.

Hi Ahmed,

Thanks for the review!
Here is a updated patch. As you suggested, I moved the expansion of FP_TO_FP16 into the target independed dag legalizer.
I also updated the test adding extra CHECK lines for the 'long double-to-float' conversion.

The 'long double-to-float conversion' is currently implemented by the sequence: fldt+fstps+movss.
Basically, the long double in input to the function is firstly pushed onto the top of the x87 FPU register stack, and then 'popped' from the FPU stack onto the stack again. Finally, it is loaded as a float using a zero extending movss. The code sequence looks a bit redundant but it works :-)

Please let me know what you think.

Thanks!
-Andrea

ab added a comment.Feb 23 2015, 11:41 AM

Hi Ahmed,

Thanks for the review!
Here is a updated patch. As you suggested, I moved the expansion of FP_TO_FP16 into the target independed dag legalizer.

If f16->f32 isn't legal as well, we just turned one libcall into two, no?

I also updated the test adding extra CHECK lines for the 'long double-to-float' conversion.

The 'long double-to-float conversion' is currently implemented by the sequence: fldt+fstps+movss.
Basically, the long double in input to the function is firstly pushed onto the top of the x87 FPU register stack, and then 'popped' from the FPU stack onto the stack again. Finally, it is loaded as a float using a zero extending movss. The code sequence looks a bit redundant but it works :-)

I expected something like that, thanks for adding it!

-Ahmed

Please let me know what you think.

Thanks!
-Andrea

In D7832#128423, @ab wrote:

Hi Ahmed,

Thanks for the review!
Here is a updated patch. As you suggested, I moved the expansion of FP_TO_FP16 into the target independed dag legalizer.

If f16->f32 isn't legal as well, we just turned one libcall into two, no?

Right, it doesn't make sense to expand that node if f16->f32 is not legal.
We would still generate a single libcall (for the f16->f32 conversion). However, we would get a extra (v)cvtsd2ss for the double-half conversion.
I will add a check to make sure that the f16->f32 is 'legalOrCustom' before attempting to expand the double-half conversion.

I will upload a new version patch.

I also updated the test adding extra CHECK lines for the 'long double-to-float' conversion.

The 'long double-to-float conversion' is currently implemented by the sequence: fldt+fstps+movss.
Basically, the long double in input to the function is firstly pushed onto the top of the x87 FPU register stack, and then 'popped' from the FPU stack onto the stack again. Finally, it is loaded as a float using a zero extending movss. The code sequence looks a bit redundant but it works :-)

I expected something like that, thanks for adding it!

-Ahmed

Please let me know what you think.

Thanks!
-Andrea

andreadb updated this revision to Diff 20531.Feb 23 2015, 12:23 PM

Uploaded a new version of the patch.
This time, we avoid expanding a 'fp_to_fp16' if the float-half is not legal (or custom) for the target.
I also added a extra RUN line in my new test to verify the codegen if the target doesn't have F16C.

Please let me know what you think.

Thanks again for your time.
Andrea

ab accepted this revision.Feb 23 2015, 1:11 PM
ab edited edge metadata.

LGTM now, thanks!

-Ahmed

This revision is now accepted and ready to land.Feb 23 2015, 1:11 PM
This revision was automatically updated to reflect the committed changes.

Thanks Ahmed.
Committed revision 230276.