This is an archive of the discontinued LLVM Phabricator instance.

[ARM] FP16 VSEL codegen follow up
ClosedPublic

Authored by SjoerdMeijer on Apr 3 2018, 5:37 AM.

Details

Summary

This is a follow up of rL327695, to instruction select
more variants of VSELGT and VSELGE, for which it is necessary
to custom lower SELECT.

More work is required in this area, which will be addressed soon:

  • more variants need to be regression tested, but this depends on the next point.
  • LowerConstantFP need to be adjusted for fp16 values.

Diff Detail

Repository
rL LLVM

Event Timeline

SjoerdMeijer created this revision.Apr 3 2018, 5:37 AM
olista01 added inline comments.Apr 3 2018, 5:58 AM
lib/Target/ARM/ARMISelLowering.cpp
4529 ↗(On Diff #140765)

Does this need to check Subtarget->hasFullFP16()?

test/CodeGen/ARM/fp16-instructions.ll
769 ↗(On Diff #140765)

What code actually gets emitted here? If it is correct but sub-optimal we should be testing it, otherwise this patch shouldn't be committed util the generated code is correct.

778 ↗(On Diff #140765)

This only checks the HARDFP-FULLFP16 case, should it also be checking the others? Also, should we check the VCMP instruction, as well as the VSEL?

Thanks for checking!
I have guarded the lowering with HasFullFP16, and have temporarily removed the files, because at the moment they don't add anything. They just generate some vcmpe.f16, which we already tests. I will put these tests back when LowerConstantFP has been fixed and these cases also lower to VSELs.

This patch now changes 4 DAG nodes, but doesn't touch any tests for most of them. Are these changes NFC, and properly tested elsewhere? If not, this needs more tests.

Ah, yes, should have explained that a bit better. This is related to your earlier comment about:

Does this need to check Subtarget->hasFullFP16()?

Instead of doing the f16 and hasFullFP16() check inside function LowerSELECT_CC, I have guarded custom lowering of the Node with the hasFullFP16 check. While doing this, I spotted the custom lowering of SETCC, SELECT, SELECT_CC and BR_CC for f16, and I think it is more correct to guard these custom lowerings with hasFullFP16 as well; that's what we should be doing. Thus, this is indeed a non-functional change. And they should already be regression tested. BR_CC is for example added in D43508 and has tests, and file fp16-instructions.ll should test the others too.

I forgot to address one of your earlier comments, i.e. the one about adding more test cases.

olista01 added inline comments.Apr 5 2018, 10:07 AM
test/CodeGen/ARM/fp16-instructions.ll
827 ↗(On Diff #141170)

Why are we generating more code for this case than the others?

Why are we generating more code for this case than the others?

There was some interaction between using "undef" as an operand and this "unordered or equal" operator. I have changed the test case to make it more intuitive (and realistic). The generated code is identical to using float, except the .f16 and .f32 differences of course.

olista01 accepted this revision.Apr 11 2018, 2:16 AM

LGTM, thanks.

This revision is now accepted and ready to land.Apr 11 2018, 2:16 AM
This revision was automatically updated to reflect the committed changes.