Similar to D155311, this adds lowering for more vector cases of FPExt.
llvm/test/CodeGen/AArch64/fpext.ll | ||
---|---|---|
204 ↗ | (On Diff #541535) | What is the trick behind the difference? |
llvm/test/CodeGen/AArch64/fpext.ll | ||
---|---|---|
204 ↗ | (On Diff #541535) | SelectionDAG widened the vector, whereas GlobalISel scalarized it. |
llvm/test/CodeGen/AArch64/fpext.ll | ||
---|---|---|
204 ↗ | (On Diff #541535) | I missed the *3*. |
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp | ||
---|---|---|
545 | .clampMinNumElements(0, s32, 2) ? |
Yeah. My understanding is that either the s16 type needs to distinguish between different float representations, or the G_FPEXT operation would need to, either with a different G_ opcode or some sort of flag.
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp | ||
---|---|---|
545 | The result type of an fpext needs to be v4s32 or v2s64 to match an fcvtl. There are no v2f16->v2f32 instructions (unless you count the lower 2 lanes of a v4f16->v4f32 fcvtl). It would seem better to legalize in the legalizer step, so that we only have to deal with legal operations later and can reuse all the tablegen patterns. | |
llvm/test/CodeGen/AArch64/fpext.ll | ||
---|---|---|
204 ↗ | (On Diff #541535) | Yeah, the 3 is awkward. It comes from the expansion adding undef lanes that are not yet cleaned up properly. It should be possible to improve this and get back to the same codegen as SDAG. |
.clampMinNumElements(0, s32, 2)
.clampMaxNumElements(0, s32, 4)
.clampMinNumElements(0, s64, 1)
.clampMaxNumElements(0, s64, 2)
?
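
To make the effect of the suggested clamps concrete, here is a minimal toy model in plain C++. It is only a sketch and does not use LLVM's API: `VecTy`, `clampNumElements`, and `clampFPExtResult` are hypothetical names invented for illustration, and lane counts are treated as plain integers. The real legalizer rules select legalization actions (and split over-wide vectors into several pieces) rather than returning a single type.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a vector type: element width in bits and lane count.
// This is NOT llvm::LLT; it exists only for this illustration.
struct VecTy {
  unsigned EltBits;
  unsigned NumElts;
};

// Toy model of a clampMinNumElements/clampMaxNumElements pair for one element
// width: types with a different element width pass through unchanged,
// otherwise the lane count is forced into the range [Min, Max].
inline VecTy clampNumElements(VecTy Ty, unsigned EltBits, unsigned Min,
                              unsigned Max) {
  assert(Min >= 1 && Min <= Max && "invalid clamp range");
  if (Ty.EltBits != EltBits)
    return Ty;
  return VecTy{Ty.EltBits, std::min(std::max(Ty.NumElts, Min), Max)};
}

// Applies the four clamps from the suggestion above to the result type
// (type index 0) of an fpext: s32 results to 2..4 lanes, s64 results to
// 1..2 lanes.
inline VecTy clampFPExtResult(VecTy Ty) {
  Ty = clampNumElements(Ty, /*EltBits=*/32, /*Min=*/2, /*Max=*/4);
  Ty = clampNumElements(Ty, /*EltBits=*/64, /*Min=*/1, /*Max=*/2);
  return Ty;
}
```

Under this model, an 8-lane s32 result is capped at 4 lanes and a 4-lane s64 result at 2 lanes, which are the result shapes fcvtl produces. A 3-lane s32 result is left at 3 lanes by the clamps alone, so rounding it up to 4 would need a separate rule.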