This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Narrowing half-precision lowering to supported CCs
ClosedPublic

Authored by plotfi on Jun 24 2020, 2:12 AM.

Details

Summary

I had noticed an assert in the ARM backend on builds of the Swift stdlib. I've since reduced the offending IR and reported my findings to the folks on D75169. I suspect that the changes in D75169 were meant mainly for the AAPCS calling conventions and indeed they do work fine with them, but with fastcc and swiftcc I noticed there was a crash.

I also noticed that the line that checks the calling convention in D75169 is only checking to see that there is a calling convention, not specifically checking that it is a supported calling convention. So this patch does just that, and includes a regression test too.

Diff Detail

Event Timeline

plotfi created this revision.Jun 24 2020, 2:12 AM

Hi @plotfi,

I believe simply limiting the calling conventions might be a bit tricky. At this point the method might get a calling convention id that does not reflect the one required for argument lowering (see ARMTargetLowering::getEffectiveCallingConv in ARMISelLowering.cpp).
One option to get around this would be to use that same method to determine the effective calling convention, but it might not be easy to check whether or not the function beeing handled is variadic.

Also, I did not expect to see a difference in the resulting parts' types between getCopyToParts and ARMTargetLowering::splitValueIntoRegisterParts, only in how the final type is achieved. I'm investigating this further to see what's missing.

Taking a further look to the differences in the DAG, the resulting type is actually the same.
Both result in an f32 value, but get to it in different ways as expected.

With ARMTargetLowering::splitValueIntoRegisterParts:

t22: f32 = bitcast t21
  t21: i32 = any_extend t20
    t20: i16 = bitcast t5
      t5: f16 = extract_vector_elt t3, Constant:i32<0>
        t3: v8f16,v8f16 = merge_values t2, t2
          t2: v8f16 = BUILD_VECTOR ConstantFP:f16<APFloat(0)>, ConstantFP:f16<APFloat(0)>, ConstantFP:f16<APFloat(0)>, ConstantFP:f16<APFloat(0)>, ConstantFP:f16<APFloat(0)>, ConstantFP:f
16<APFloat(0)>, ConstantFP:f16<APFloat(0)>, ConstantFP:f16<APFloat(0)>

Without ARMTargetLowering::splitValueIntoRegisterParts:

t20: f32 = fp_extend t5
  t5: f16 = extract_vector_elt t3, Constant:i32<0>
    t3: v8f16,v8f16 = merge_values t2, t2
      t2: v8f16 = BUILD_VECTOR ConstantFP:f16<APFloat(0)>, ConstantFP:f16<APFloat(0)>, ConstantFP:f16<APFloat(0)>, ConstantFP:f16<APFloat(0)>, ConstantFP:f16<APFloat(0)>, ConstantFP:f16<A
PFloat(0)>, ConstantFP:f16<APFloat(0)>, ConstantFP:f16<APFloat(0)>

I'm taking a deeper look into the assertion failure stack to figure out what's causing it.

It seems most of the argument lowering convertions get simplifyed on the regular case, when not using fastcc.
When using it, though, type legalization ends up trying to handle a i16 = bitcast ConstantFP:f16<APFloat(0)> node, turning it into an i32 <- f32 operation and getting lost while trying to create the new constant.

Avoiding the splitting should do the trick for fastcc, as it has no need to conform with AAPCS. I'm not sure about swiftcc, though, as it is expected to comply with AAPCS-VFP on ios platforms according to LLVM's language reference manual.

llvm/lib/Target/ARM/ARMISelLowering.cpp
4159 ↗(On Diff #272940)

CallingConv::ARM_AAPCS_VFP and CallingConv::C would also need to be included here.

plotfi marked an inline comment as done.Jun 24 2020, 11:06 AM

It seems most of the argument lowering convertions get simplifyed on the regular case, when not using fastcc.
When using it, though, type legalization ends up trying to handle a i16 = bitcast ConstantFP:f16<APFloat(0)> node, turning it into an i32 <- f32 operation and getting lost while trying to create the new constant.

Avoiding the splitting should do the trick for fastcc, as it has no need to conform with AAPCS. I'm not sure about swiftcc, though, as it is expected to comply with AAPCS-VFP on ios platforms according to LLVM's language reference manual.

llvm/lib/Target/ARM/ARMISelLowering.cpp
4159 ↗(On Diff #272940)

I believe I tried ARM_AAPCS_VFP and it asserted. Lemme confirm.

It seems most of the argument lowering convertions get simplifyed on the regular case, when not using fastcc.
When using it, though, type legalization ends up trying to handle a i16 = bitcast ConstantFP:f16<APFloat(0)> node, turning it into an i32 <- f32 operation and getting lost while trying to create the new constant.

Avoiding the splitting should do the trick for fastcc, as it has no need to conform with AAPCS. I'm not sure about swiftcc, though, as it is expected to comply with AAPCS-VFP on ios platforms according to LLVM's language reference manual.

I gave the following a try, and it also asserts:

target datalayout = "e-m:o-p:32:32-Fi8-f64:32:64-v64:32:64-v128:32:128-a:0:32-n32-S32"
target triple = "thumbv7s-apple-ios7.0.0"

define arm_aapcs_vfpcc { <8 x half>, <8 x half> } @f() {
  ret { <8 x half>, <8 x half> } zeroinitializer
}

I've posted D82552 fixing the assertion failure on the type legalization's bitcast handling. Once it lands, it should be ok to include arm_aapcs_fvpcc and swiftcc on this patch.

plotfi updated this revision to Diff 274251.Jun 29 2020, 2:56 PM

Updated diff to only handle fastcc

@pratlucas I've updated this diff to only exclude fastcc. How does this look to you? Can I land this for now, as it does not interfere with your D82552 diff?

pratlucas accepted this revision.Jul 2 2020, 8:36 AM

@plotfi This should not interfere D82552.
I believe you can land this, LGTM.

This revision is now accepted and ready to land.Jul 2 2020, 8:36 AM
plotfi updated this revision to Diff 275158.Jul 2 2020, 10:24 AM
plotfi marked an inline comment as done.

Churning diff (no change) to see if Harbormaster will pass.

plotfi updated this revision to Diff 275163.Jul 2 2020, 10:36 AM

Simplifying diff into a oneliner.

Harbormaster completed remote builds in B62715: Diff 275163.
plotfi updated this revision to Diff 276786.Jul 9 2020, 11:33 AM

D82552 appears to have also fixed fastcc, so I will only land the test case.

plotfi updated this revision to Diff 276787.Jul 9 2020, 11:34 AM
This revision was automatically updated to reflect the committed changes.