[AArch64] optimise v4f16 FCMPs to utilise vector instructions

Authored by carwil on Jan 5 2018, 9:22 AM.



Improves code generation for v4f16 FCMP instructions when FullFP16 is not supported, by generating FCVTL(s) rather than a longer series of FCVTs.
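
For readers unfamiliar with the idea, the following is a minimal C++ sketch of the semantics being relied on, not the code from this patch. It assumes an ordered less-than comparison as the example predicate. The names `halfToFloat`, `fcmpOltV4F16`, `V4F16` and `V4I16` are hypothetical and do not appear in LLVM. Each operand is widened as a whole four-lane vector (the role FCVTL plays), the comparison is done on the f32 lanes, and the result is returned as a 16-bit lane mask. Because every half-precision value is exactly representable in single precision, the widened comparison gives the same answer as a native f16 comparison would.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical types for illustration: four f16 lanes stored as raw bit
// patterns, and a four-lane 16-bit comparison mask.
using V4F16 = std::array<std::uint16_t, 4>;
using V4I16 = std::array<std::uint16_t, 4>;

// Hypothetical helper: decode an IEEE 754 binary16 bit pattern to float.
// The conversion is exact, since every half value fits in a float.
float halfToFloat(std::uint16_t h) {
  const std::uint32_t exp = (h >> 10) & 0x1Fu;
  const std::uint32_t man = h & 0x3FFu;
  float mag;
  if (exp == 0) {
    // Zero or subnormal: value is man * 2^-24.
    mag = std::ldexp(static_cast<float>(man), -24);
  } else if (exp == 31) {
    // Infinity (mantissa zero) or NaN (mantissa non-zero).
    mag = man ? std::numeric_limits<float>::quiet_NaN()
              : std::numeric_limits<float>::infinity();
  } else {
    // Normal: value is (1024 + man) * 2^(exp - 25).
    mag = std::ldexp(static_cast<float>(man + 1024u),
                     static_cast<int>(exp) - 25);
  }
  return (h & 0x8000u) ? -mag : mag;
}

// Sketch of a v4f16 ordered less-than compare without native f16 arithmetic.
V4I16 fcmpOltV4F16(const V4F16 &lhs, const V4F16 &rhs) {
  // Step 1: widen all four lanes of each operand (models one FCVTL each).
  std::array<float, 4> wl{}, wr{};
  for (std::size_t i = 0; i < 4; ++i) {
    wl[i] = halfToFloat(lhs[i]);
    wr[i] = halfToFloat(rhs[i]);
  }
  // Step 2: compare in f32 and produce an all-ones or all-zeros 16-bit lane.
  // Comparisons involving NaN are false, so those lanes stay zero.
  V4I16 mask{};
  for (std::size_t i = 0; i < 4; ++i)
    mask[i] = static_cast<std::uint16_t>(wl[i] < wr[i] ? 0xFFFFu : 0x0000u);
  return mask;
}
```

Calling `fcmpOltV4F16` with two `V4F16` values returns a mask whose lanes are `0xFFFF` where the left lane is less than the right one and `0` elsewhere. The per-lane loops here stand in for single vector instructions on the target.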

Event Timeline

carwil created this revision. Jan 5 2018, 9:22 AM
SjoerdMeijer accepted this revision. Jan 8 2018, 2:03 AM

Thanks, looks good to me. Just a few nits inlined, no need for another review.

7296 ↗(On Diff #128760)

Nit: perhaps a "TODO remark" here that v8f16 could be optimised as well but is a bit more complicated?

7303 ↗(On Diff #128760)

Nit: newline not necessary?

7305 ↗(On Diff #128760)

Coding style nit: you don't need the brackets for the else-clause (you can check the coding style with clang-format)

This revision is now accepted and ready to land. Jan 8 2018, 2:03 AM
carwil updated this revision to Diff 129997. Jan 16 2018, 10:50 AM
carwil marked 3 inline comments as done.

Improved code formatting based on review.

carwil updated this revision to Diff 130869. Jan 22 2018, 5:40 AM
carwil updated this revision to Diff 130874. Jan 22 2018, 6:18 AM

Missing colon

This revision was automatically updated to reflect the committed changes.