This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Improve lowering of insert_vector_elt with 0.0 consts.
ClosedPublic

Authored by fhahn on Oct 26 2020, 11:02 AM.

Details

Summary

When moving 0.0 into a float vector, we can use to vi*gpr variants of
INS. I am not sure if we can easily express this in the tablegen
descriptions, because INS*vi*gpr is only defined for integer vectors and
I am not sure how to convert things there.

This patch extends LowerINSERT_VECTOR_ELT to bitcast the input float
vector to an integer vector, apply the insertion and bitcast the result
back.

This way, we can piggy-back on the matching for the integer variants.

Diff Detail

Event Timeline

fhahn created this revision.Oct 26 2020, 11:02 AM
fhahn requested review of this revision.Oct 26 2020, 11:02 AM

We could turn this into a more general combine, We use fmov from a GPR to materialize fp constants in other cases. But maybe just zero is fine to start.

I am not sure if we can easily express this in the tablegen descriptions, because INS*vi*gpr is only defined for integer vectors and I am not sure how to convert things there.

MachineInstrs don't distinguish between integer and float vectors; there's only V64/V128 register classes. So the types only matter for the inputs to tablegen patterns, not the outputs. You can write a pattern that takes a float vector as an input and produces an INSvi32gpr without any casting.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9257 ↗(On Diff #300742)

I think isZero is true for -0.0?

fhahn updated this revision to Diff 301087.Oct 27 2020, 1:05 PM

Updated to make sure we do not do this transform for -0.0, thanks!

We could turn this into a more general combine, We use fmov from a GPR to materialize fp constants in other cases. But maybe just zero is fine to start.

So for other FP constants that have values that can be cheaply matrialized into GPRs? That would be good. I guess that would be helpful in all contexts where we can use GPRs directly instead of FPRs. Is there a way to match something like that generically?

Happy to look into that, but would prefer to start just with the 0.0 case.

I am not sure if we can easily express this in the tablegen descriptions, because INS*vi*gpr is only defined for integer vectors and I am not sure how to convert things there.

MachineInstrs don't distinguish between integer and float vectors; there's only V64/V128 register classes. So the types only matter for the inputs to tablegen patterns, not the outputs. You can write a pattern that takes a float vector as an input and produces an INSvi32gpr without any casting.

Thanks! I think I was running into some ambiguity with the patterns, not sure exactly what was going on. Would you prefer the tablegen version?

fhahn added a comment.Oct 27 2020, 1:05 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9257 ↗(On Diff #300742)

Indeed, thanks for spotting this!

We could turn this into a more general combine, We use fmov from a GPR to materialize fp constants in other cases. But maybe just zero is fine to start.

So for other FP constants that have values that can be cheaply matrialized into GPRs? That would be good. I guess that would be helpful in all contexts where we can use GPRs directly instead of FPRs. Is there a way to match something like that generically?

For values that aren't literals, there would be a bitcast or something like that.

For literals, the rule for whether we use a GPR is complicated; see AArch64TargetLowering::isFPImmLegal. But once we get to isel, the actual lowering is pretty simple; see bitcast_fpimm_to_i32.

Happy to look into that, but would prefer to start just with the 0.0 case.

I am not sure if we can easily express this in the tablegen descriptions, because INS*vi*gpr is only defined for integer vectors and I am not sure how to convert things there.

MachineInstrs don't distinguish between integer and float vectors; there's only V64/V128 register classes. So the types only matter for the inputs to tablegen patterns, not the outputs. You can write a pattern that takes a float vector as an input and produces an INSvi32gpr without any casting.

Thanks! I think I was running into some ambiguity with the patterns, not sure exactly what was going on. Would you prefer the tablegen version?

I'm a little worried this version will make other combines more complicated; I'd prefer the TableGen version if it isn't too hard.

fhahn updated this revision to Diff 301225.Oct 28 2020, 3:57 AM

Thanks! I think I was running into some ambiguity with the patterns, not sure exactly what was going on. Would you prefer the tablegen version?

I'm a little worried this version will make other combines more complicated; I'd prefer the TableGen version if it isn't too hard.

Thanks, I figured out what was going wrong with the tablegen version initially. The tablegen patterns now work for f32 and f64, but for some reason the matching does not work for f16.

Could this be related to fpimm0 not matching 0.0 for f16? It prints ConstantFP:f16<APFloat(0)> instead of ConstantFP:f32<0.00000> for f32 constants.

but for some reason the matching does not work for f16

I think isFPImmLegal is false, so we're forcing a constant pool. I think you can make the testcase work with +fullfp16?

fhahn updated this revision to Diff 301401.Oct 28 2020, 1:01 PM

but for some reason the matching does not work for f16

I think isFPImmLegal is false, so we're forcing a constant pool. I think you can make the testcase work with +fullfp16?

Yeah, that was the issue. Adding +fullfp16 solved the issue. Updated the tests. It's probably not worth trying to improve f16 codegen without +fullfp16.

This revision is now accepted and ready to land.Oct 28 2020, 2:05 PM
This revision was landed with ongoing or failed builds.Oct 28 2020, 2:35 PM
This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Oct 28 2020, 2:54 PM

Thanks for the review!