This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] CVTPH2PS Vector Demanded Elements + Constant Folding
ClosedPublic

Authored by RKSimon on Sep 9 2015, 8:00 AM.

Details

Summary

Improved InstCombine support for CVTPH2PS (F16C half 2 float conversion):

<4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16>) - only uses the bottom 4 i16 elements for the conversion.

Added constant folding support.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 34338.Sep 9 2015, 8:00 AM
RKSimon retitled this revision from to [InstCombine] CVTPH2PS Vector Demanded Elements + Constant Folding.
RKSimon updated this object.
RKSimon added reviewers: majnemer, ab, andreadb.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
ab accepted this revision.Sep 10 2015, 11:50 AM
ab edited edge metadata.

LGTM

lib/Transforms/InstCombine/InstCombineCalls.cpp
846–857 ↗(On Diff #34338)

I couldn't help but notice that this is very similar to the ppc case above. All else being equal, can you keep them together?

Plus, the ph2ps case is similar to ss2si below!

test/Transforms/InstCombine/x86-f16c.ll
3–4 ↗(On Diff #34338)

Unnecessary attributes?

This revision is now accepted and ready to land.Sep 10 2015, 11:50 AM

Thanks Ahmed, I'll look into doing something similar for CVTPS2PH in a future patch, although I'm a little concerned about matching all the (compile-time) rounding modes.

lib/Transforms/InstCombine/InstCombineCalls.cpp
846–857 ↗(On Diff #34338)

Yes there is a lot of code duplication going in this file - splitting off common code into helper functions should be quite straightforward.

test/Transforms/InstCombine/x86-f16c.ll
3–4 ↗(On Diff #34338)

Just matching what is done in other tests - I'll strip them off.

ab added inline comments.Sep 10 2015, 1:25 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
846–857 ↗(On Diff #34338)

Right: in case I wasn't clear, I'm not saying you should refactor the entire function for this patch, just to move the added ph2ps case block between storeu and cvtss2si.

This revision was automatically updated to reflect the committed changes.