This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR
ClosedPublic

Authored by RKSimon on May 23 2016, 9:27 AM.

Details

Summary

Both the (V)CVTDQ2PD(Y) (i32 to f64) and (V)CVTPS2PD(Y) (f32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics.

This patch removes the clang builtins and their use in the sse2/avx headers - a future patch will deal with removing the llvm intrinsics, but that will require a bit more work.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 58106.May 23 2016, 9:27 AM
RKSimon retitled this revision from to [X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR.
RKSimon updated this object.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: cfe-commits.
mkuper edited edge metadata.May 23 2016, 10:23 AM

Thanks, Simon!
This looks right, but we may lose some end-to-end tests, since right now we have a clang-level test that checks the builtin is lowered to the intrinsic, and (hopefully) a CG-level test that the intrinsic is lowered to the correct instruction.
Do you know if there are already CG tests that check we correctly lower these IR patterns to CVTPS2PD, etc? If not, could you add them?

lib/Headers/emmintrin.h
390 ↗(On Diff #58106)

It looks like there's a missing paren after the first __v4sf.
How does the test compile? Or am I misreading?

This looks right, but we may lose some end-to-end tests, since right now we have a clang-level test that checks the builtin is lowered to the intrinsic, and (hopefully) a CG-level test that the intrinsic is lowered to the correct instruction.
Do you know if there are already CG tests that check we correctly lower these IR patterns to CVTPS2PD, etc? If not, could you add them?

I do have the relevant changes for llvm\test\CodeGen\X86\sse2-intrinsics-fast-isel.ll and llvm\test\CodeGen\X86\avx-intrinsics-fast-isel.ll (I spent most of last week adding them all.....). Do you want me to setup a separate llvm patch for review? I'm not ready to do the rest of the llvm work (removal of the llvm intrinsics / auto-upgrade etc.). but the fast-isel changes are very simple.

lib/Headers/emmintrin.h
390 ↗(On Diff #58106)

Sorry, that's me 'fixing' clang-format which I stupidly forgot to run until just before submission.

This looks right, but we may lose some end-to-end tests, since right now we have a clang-level test that checks the builtin is lowered to the intrinsic, and (hopefully) a CG-level test that the intrinsic is lowered to the correct instruction.
Do you know if there are already CG tests that check we correctly lower these IR patterns to CVTPS2PD, etc? If not, could you add them?

I do have the relevant changes for llvm\test\CodeGen\X86\sse2-intrinsics-fast-isel.ll and llvm\test\CodeGen\X86\avx-intrinsics-fast-isel.ll (I spent most of last week adding them all.....). Do you want me to setup a separate llvm patch for review? I'm not ready to do the rest of the llvm work (removal of the llvm intrinsics / auto-upgrade etc.). but the fast-isel changes are very simple.

Sorry, I didn't intend to imply the rest of the llvm work is necessary for this to go in. Just that I'd be happier with this patch knowing that we have a regression test for doing the (shuffle + fpext, say) lowering correctly. I didn't even mean fast-isel, only the DAG.

RKSimon updated this revision to Diff 58146.May 23 2016, 1:27 PM
RKSimon edited edge metadata.

Sorry, I didn't intend to imply the rest of the llvm work is necessary for this to go in. Just that I'd be happier with this patch knowing that we have a regression test for doing the (shuffle + fpext, say) lowering correctly. I didn't even mean fast-isel, only the DAG.

The fast-isel tests are the most self contained (and are useful to show the non-optimized codegen for every intrinsic in the headers). I can submit them now if you wish.

Presumably, the fast-isel lowering of the IR pattern is already correct, and in any case, it isn't affected by this patch.
I just want to make sure we don't regress the optimized DAG codegen - that is, it still produces the instruction we'd expect from the intrinsic (or something at least as good).

Presumably, the fast-isel lowering of the IR pattern is already correct, and in any case, it isn't affected by this patch.
I just want to make sure we don't regress the optimized DAG codegen - that is, it still produces the instruction we'd expect from the intrinsic (or something at least as good).

The existing llvm\test\CodeGen\X86\vec_fpext.ll and llvm\test\CodeGen\X86\vec_int_to_fp.ll already demonstrate the correct optimized DAG codegen using the same IR as output in the clang\test\CodeGen\*-builtins.c here.

Also, the aim is to keep the llvm\test\CodeGen\X86\*-intrinsics-fast-isel.ll tests in sync with the llvm\tools\clang\test\CodeGen\*-builtins.c equivalents.

mkuper accepted this revision.May 23 2016, 3:05 PM
mkuper edited edge metadata.

The existing llvm\test\CodeGen\X86\vec_fpext.ll and llvm\test\CodeGen\X86\vec_int_to_fp.ll already demonstrate the correct optimized DAG codegen using the same IR as output in the clang\test\CodeGen\*-builtins.c here.

That's what I meant by "Do you know if there are already CG tests that check we correctly lower these IR patterns", sorry I wasn't more clear.
This LGTM.

This revision is now accepted and ready to land.May 23 2016, 3:05 PM
This revision was automatically updated to reflect the committed changes.