This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][GISel] Additional FPExt vector lowering
ClosedPublic

Authored by dmgreen on Jul 18 2023, 7:54 AM.

Details

Summary

Similar to D155311, this adds lowering for more vector cases for FPExt

Diff Detail

Event Timeline

dmgreen created this revision.Jul 18 2023, 7:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2023, 7:54 AM
dmgreen requested review of this revision.Jul 18 2023, 7:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2023, 7:54 AM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm accepted this revision.Jul 18 2023, 8:21 AM

If you also test bfloat you'll find it miscompiles

This revision is now accepted and ready to land.Jul 18 2023, 8:21 AM
tschuett added inline comments.
llvm/test/CodeGen/AArch64/fpext.ll
204 ↗(On Diff #541535)

What is the trick behind the difference?

arsenm added inline comments.Jul 18 2023, 11:28 AM
llvm/test/CodeGen/AArch64/fpext.ll
204 ↗(On Diff #541535)

the dag widened the vector and gisel scalarized it

tschuett added inline comments.Jul 18 2023, 11:31 AM
llvm/test/CodeGen/AArch64/fpext.ll
204 ↗(On Diff #541535)

I missed the *3*.

tschuett added inline comments.Jul 18 2023, 11:37 AM
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
545

.clampMinNumElements(0, s32, 2)
.clampMaxNumElements(0, s32, 4)
.clampMinNumElements(0, s64, 1)
.clampMaxNumElements(0, s64, 2)

?

and
.widenScalarOrEltToNextPow2(0, /*Min */)
. moreElementsToNextPow2(0 , /*Min*/);

If you also test bfloat you'll find it miscompiles

Yeah. My understanding is that either the s16 type needs to distinguish between different float representations, or the G_FPEXT operation would need to, either with a different G_ opcode or some sort of flag.

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
545

The result type of an fpext needs to be a v4s32 or v2s64, to match a fcvtl. There are no v2f16->v2f32 instructions (unless you count the lower 2 lanes of a v4f16->v4f32 fcvtl, and it would seem better to legalize in the legalize step so we only have to deal with legal operation later, and reuse all the tablegen patterns).

llvm/test/CodeGen/AArch64/fpext.ll
204 ↗(On Diff #541535)

Yeah the 3 is awkward. This comes from the expansion, adding undef lanes that are not yet cleaned up properly. It should be possible to improve it, and we can get back to the same codegen as SDAG

This revision was landed with ongoing or failed builds.Jul 23 2023, 8:58 AM
This revision was automatically updated to reflect the committed changes.
dmgreen updated this revision to Diff 545440.Jul 30 2023, 7:26 AM

Update and rebase.

^ Oops. That was somehow attached to the wrong patch.