This is an archive of the discontinued LLVM Phabricator instance.

Differential D155601

[AArch64][GISel] Additional FPExt vector lowering
Closed, Public

Authored by dmgreen on Jul 18 2023, 7:54 AM.

Details

Reviewers

aemerson
arsenm
paquette
Petar.Avramovic

Commits

rG6edc9a766213: [AArch64][GISel] Additional FPExt vector lowering

Summary

Similar to D155311, this adds lowering for more vector cases of FPExt.

Event Timeline

dmgreen created this revision. (Jul 18 2023, 7:54 AM)

Herald added a project: Restricted Project. (Jul 18 2023, 7:54 AM)

Herald added subscribers: hiraditya, kristof.beyls.

dmgreen requested review of this revision. (Jul 18 2023, 7:54 AM)

Herald added a subscriber: wdng.

dmgreen added a child revision: D155311: [AArch64][GISel] Additional FPTrunc vector lowering. (Jul 18 2023, 7:55 AM)

Harbormaster completed remote builds in B246224: Diff 541535. (Jul 18 2023, 7:55 AM)

arsenm added a comment.

If you also test bfloat you'll find it miscompiles.

This revision is now accepted and ready to land. (Jul 18 2023, 8:21 AM)

tschuett added a subscriber: tschuett. (Jul 18 2023, 11:27 AM)

tschuett added inline comments.

llvm/test/CodeGen/AArch64/fpext.ll
204	What is the trick behind the difference?

arsenm added inline comments. (Jul 18 2023, 11:28 AM)

llvm/test/CodeGen/AArch64/fpext.ll
204	the dag widened the vector and gisel scalarized it

tschuett added inline comments. (Jul 18 2023, 11:31 AM)

llvm/test/CodeGen/AArch64/fpext.ll
204	I missed the 3.

tschuett added inline comments. (Jul 18 2023, 11:37 AM)

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
545	.clampMinNumElements(0, s32, 2)
	.clampMaxNumElements(0, s32, 4)
	.clampMinNumElements(0, s64, 1)
	.clampMaxNumElements(0, s64, 2)
	?

and
.widenScalarOrEltToNextPow2(0, /*Min*/)
.moreElementsToNextPow2(0, /*Min*/);

dmgreen added a comment.

In D155601#4510788, @arsenm wrote:

If you also test bfloat you'll find it miscompiles

Yeah. My understanding is that either the s16 type needs to distinguish between different float representations, or the G_FPEXT operation would need to, either with a different G_ opcode or some sort of flag.
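To illustrate the ambiguity described above, here is a minimal standalone sketch (not LLVM code) showing that the same 16 bits decode to different values depending on whether they are read as IEEE half or as bfloat. The helper names `decodeHalf` and `decodeBFloat` are hypothetical and exist only for this illustration. A type that only records "16-bit scalar" cannot tell which of the two conversions an extend should perform.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode 16 bits as IEEE-754 binary16
// (1 sign bit, 5 exponent bits, 10 mantissa bits).
float decodeHalf(std::uint16_t bits) {
  int sign = (bits >> 15) & 1;
  int exp = (bits >> 10) & 0x1F;
  int mant = bits & 0x3FF;
  float value;
  if (exp == 0)
    value = std::ldexp(static_cast<float>(mant), -24); // zero or subnormal
  else if (exp == 31)
    value = mant ? NAN : INFINITY;
  else
    value = std::ldexp(static_cast<float>(mant | 0x400), exp - 25);
  return sign ? -value : value;
}

// Hypothetical helper: decode 16 bits as bfloat16, which is the upper
// half of an IEEE-754 binary32 (1 sign, 8 exponent, 7 mantissa bits).
float decodeBFloat(std::uint16_t bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float value;
  std::memcpy(&value, &wide, sizeof(value));
  return value;
}
```

For example, the bit pattern 0x3C00 is 1.0 when read as half but 0.0078125 when read as bfloat, so extending it with the half rules gives the wrong result for a bfloat input.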

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
545	The result type of an fpext needs to be a v4s32 or v2s64 to match an fcvtl. There are no v2f16->v2f32 instructions (unless you count the lower 2 lanes of a v4f16->v4f32 fcvtl), and it would seem better to legalize in the legalize step so we only have to deal with legal operations later, and can reuse all the tablegen patterns.

llvm/test/CodeGen/AArch64/fpext.ll
204	Yeah the 3 is awkward. This comes from the expansion, adding `undef` lanes that are not yet cleaned up properly. It should be possible to improve it, and we can get back to the same codegen as SDAG.
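The effect of the `clampNumElements(0, v4s32, v4s32)` and `clampNumElements(0, v2s64, v2s64)` rules discussed in the inline comment on AArch64LegalizerInfo.cpp can be sketched with a toy model. This is not the LLVM API: `Action` and `clampAction` are hypothetical names, and the model assumes only that a clamp whose minimum and maximum are the same count widens narrower vectors and splits wider ones.

```cpp
#include <cassert>

// Toy model only; these names are not part of LLVM.
enum class Action { None, MoreElements, FewerElements };

// Models a clamp on the result type where the minimum and maximum element
// counts are the same value (legalElts), as in the rules from the patch.
Action clampAction(unsigned numElts, unsigned legalElts) {
  if (numElts < legalElts)
    return Action::MoreElements;  // pad up to legalElts lanes
  if (numElts > legalElts)
    return Action::FewerElements; // split into legalElts-sized pieces
  return Action::None;            // already the fcvtl-friendly width
}
```

Under this model a 2-lane or 3-lane s32 result is widened to 4 lanes, and an 8-lane s32 result is split into 4-lane pieces, which is consistent with the `fcvtl v0.4s, v0.4h` instructions in the GlobalISel check lines of fpext.ll. Type pairs that are still not legal after clamping, such as v2f16 to v2f64, presumably fall through to the trailing `.scalarize(0)`.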

This revision was landed with ongoing or failed builds. (Jul 23 2023, 8:58 AM)

Closed by commit rG6edc9a766213: [AArch64][GISel] Additional FPExt vector lowering (authored by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rG6edc9a766213: [AArch64][GISel] Additional FPExt vector lowering.

dmgreen added a child revision: D156171: [AArch64][GlobalISel] G_FMINNUM and G_FMAXNUM vector lowering. (Jul 30 2023, 7:01 AM)

Update and rebase.

Harbormaster completed remote builds in B249051: Diff 545440. (Jul 30 2023, 7:26 AM)

^ Oops. That was somehow attached to the wrong patch.

dmgreen mentioned this in D157679: [AArch64][GISel] Extend lowering for fp round intrinsics. (Aug 14 2023, 1:18 AM)

Revision Contents

Path	Size
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp	3 lines
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp	4 lines
llvm/test/CodeGen/AArch64/fpext.ll	234 lines

Diff 541535

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp

@@ LegalizerHelper::moreElementsVector(MachineInstr &MI, unsigned TypeIdx,
   }
   case TargetOpcode::G_TRUNC: {
     Observer.changingInstr(MI);
     moreElementsVectorSrc(MI, MoreTy, 1);
     moreElementsVectorDst(MI, MoreTy, 0);
     Observer.changedInstr(MI);
     return Legalized;
   }
-  case TargetOpcode::G_FPTRUNC: {
+  case TargetOpcode::G_FPTRUNC:
+  case TargetOpcode::G_FPEXT: {
     if (TypeIdx != 0)
       return UnableToLegalize;
     Observer.changingInstr(MI);
     LLT SrcTy = LLT::fixed_vector(
         MoreTy.getNumElements(),
         MRI.getType(MI.getOperand(1).getReg()).getElementType());
     moreElementsVectorSrc(MI, SrcTy, 1);
     moreElementsVectorDst(MI, MoreTy, 0);
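The `SrcTy` computation in the LegalizerHelper.cpp hunk above pairs the element count of the widened result type with the element type of the original source, so that source and destination keep the same number of lanes. A minimal sketch of that rule follows, using a toy `VecTy` struct rather than LLVM's `LLT`; both `VecTy` and `widenedSrcTy` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Toy stand-in for a fixed vector type: lane count and lane size in bits.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
};

// Mirrors the shape of the SrcTy computation in the hunk: take the lane
// count from the widened result type and the lane size from the source.
VecTy widenedSrcTy(VecTy moreTy, VecTy srcTy) {
  return VecTy{moreTy.NumElts, srcTy.EltBits};
}
```

For a v3s16 to v3s32 extend widened to a v4s32 result, this yields a v4s16 source, which matches the {v4s32, v4s16} pair listed as legal for G_FPEXT in AArch64LegalizerInfo.cpp.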

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp

@@ getActionDefinitionsBuilder(G_FPTRUNC)
           {{s16, s32}, {s16, s64}, {s32, s64}, {v4s16, v4s32}, {v2s32, v2s64}})
       .clampNumElements(0, v4s16, v4s16)
       .clampNumElements(0, v2s32, v2s32)
       .scalarize(0);
 
   getActionDefinitionsBuilder(G_FPEXT)
       .legalFor(
           {{s32, s16}, {s64, s16}, {s64, s32}, {v4s32, v4s16}, {v2s64, v2s32}})
-      .clampMaxNumElements(0, s64, 2);
+      .clampNumElements(0, v4s32, v4s32)
+      .clampNumElements(0, v2s64, v2s64)
+      .scalarize(0);
 
   // Conversions
   getActionDefinitionsBuilder({G_FPTOSI, G_FPTOUI})
       .legalForCartesianProduct({s32, s64, v2s64, v4s32, v2s32})
       .widenScalarToNextPow2(0)
       .clampScalar(0, s32, s64)
       .widenScalarToNextPow2(1)
       .clampScalar(1, s32, s64);

llvm/test/CodeGen/AArch64/fpext.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
				; RUN: llc -mtriple=aarch64-none-eabi -global-isel=0 -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-SD
				; RUN: llc -mtriple=aarch64-none-eabi -global-isel=1 -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-GI

				define double @fpext_f32_f64(float %a) {
				; CHECK-LABEL: fpext_f32_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fcvt d0, s0
				; CHECK-NEXT: ret
				entry:
				%c = fpext float %a to double
				ret double %c
				}

				define double @fpext_f16_f64(half %a) {
				; CHECK-LABEL: fpext_f16_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fcvt d0, h0
				; CHECK-NEXT: ret
				entry:
				%c = fpext half %a to double
				ret double %c
				}

				define float @fpext_f16_f32(half %a) {
				; CHECK-LABEL: fpext_f16_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fcvt s0, h0
				; CHECK-NEXT: ret
				entry:
				%c = fpext half %a to float
				ret float %c
				}

				define <2 x double> @fpext_v2f32_v2f64(<2 x float> %a) {
				; CHECK-LABEL: fpext_v2f32_v2f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fcvtl v0.2d, v0.2s
				; CHECK-NEXT: ret
				entry:
				%c = fpext <2 x float> %a to <2 x double>
				ret <2 x double> %c
				}

				define <3 x double> @fpext_v3f32_v3f64(<3 x float> %a) {
				; CHECK-SD-LABEL: fpext_v3f32_v3f64:
				; CHECK-SD: // %bb.0: // %entry
				; CHECK-SD-NEXT: fcvtl v3.2d, v0.2s
				; CHECK-SD-NEXT: fcvtl2 v2.2d, v0.4s
				; CHECK-SD-NEXT: // kill: def $d2 killed $d2 killed $q2
				; CHECK-SD-NEXT: fmov d0, d3
				; CHECK-SD-NEXT: ext v1.16b, v3.16b, v3.16b, #8
				; CHECK-SD-NEXT: // kill: def $d1 killed $d1 killed $q1
				; CHECK-SD-NEXT: ret
				;
				; CHECK-GI-LABEL: fpext_v3f32_v3f64:
				; CHECK-GI: // %bb.0: // %entry
				; CHECK-GI-NEXT: mov s1, v0.s[2]
				; CHECK-GI-NEXT: fcvtl v0.2d, v0.2s
				; CHECK-GI-NEXT: fcvt d2, s1
				; CHECK-GI-NEXT: mov d1, v0.d[1]
				; CHECK-GI-NEXT: // kill: def $d0 killed $d0 killed $q0
				; CHECK-GI-NEXT: ret
				entry:
				%c = fpext <3 x float> %a to <3 x double>
				ret <3 x double> %c
				}

				define <4 x double> @fpext_v4f32_v4f64(<4 x float> %a) {
				; CHECK-SD-LABEL: fpext_v4f32_v4f64:
				; CHECK-SD: // %bb.0: // %entry
				; CHECK-SD-NEXT: fcvtl2 v1.2d, v0.4s
				; CHECK-SD-NEXT: fcvtl v0.2d, v0.2s
				; CHECK-SD-NEXT: ret
				;
				; CHECK-GI-LABEL: fpext_v4f32_v4f64:
				; CHECK-GI: // %bb.0: // %entry
				; CHECK-GI-NEXT: mov d1, v0.d[1]
				; CHECK-GI-NEXT: fcvtl v0.2d, v0.2s
				; CHECK-GI-NEXT: fcvtl v1.2d, v1.2s
				; CHECK-GI-NEXT: ret
				entry:
				%c = fpext <4 x float> %a to <4 x double>
				ret <4 x double> %c
				}

				define <2 x double> @fpext_v2f16_v2f64(<2 x half> %a) {
				; CHECK-SD-LABEL: fpext_v2f16_v2f64:
				; CHECK-SD: // %bb.0: // %entry
				; CHECK-SD-NEXT: // kill: def $d0 killed $d0 def $q0
				; CHECK-SD-NEXT: mov h1, v0.h[1]
				; CHECK-SD-NEXT: fcvt d0, h0
				; CHECK-SD-NEXT: fcvt d1, h1
				; CHECK-SD-NEXT: mov v0.d[1], v1.d[0]
				; CHECK-SD-NEXT: ret
				;
				; CHECK-GI-LABEL: fpext_v2f16_v2f64:
				; CHECK-GI: // %bb.0: // %entry
				; CHECK-GI-NEXT: fmov x8, d0
				; CHECK-GI-NEXT: fmov s0, w8
				; CHECK-GI-NEXT: mov h1, v0.h[1]
				; CHECK-GI-NEXT: fcvt d0, h0
				; CHECK-GI-NEXT: fcvt d1, h1
				; CHECK-GI-NEXT: mov v0.d[1], v1.d[0]
				; CHECK-GI-NEXT: ret
				entry:
				%c = fpext <2 x half> %a to <2 x double>
				ret <2 x double> %c
				}

				define <3 x double> @fpext_v3f16_v3f64(<3 x half> %a) {
				; CHECK-LABEL: fpext_v3f16_v3f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
				; CHECK-NEXT: mov h1, v0.h[1]
				; CHECK-NEXT: mov h2, v0.h[2]
				; CHECK-NEXT: fcvt d0, h0
				; CHECK-NEXT: fcvt d1, h1
				; CHECK-NEXT: fcvt d2, h2
				; CHECK-NEXT: ret
				entry:
				%c = fpext <3 x half> %a to <3 x double>
				ret <3 x double> %c
				}

				define <4 x double> @fpext_v4f16_v4f64(<4 x half> %a) {
				; CHECK-SD-LABEL: fpext_v4f16_v4f64:
				; CHECK-SD: // %bb.0: // %entry
				; CHECK-SD-NEXT: // kill: def $d0 killed $d0 def $q0
				; CHECK-SD-NEXT: mov h1, v0.h[1]
				; CHECK-SD-NEXT: mov h2, v0.h[3]
				; CHECK-SD-NEXT: mov h3, v0.h[2]
				; CHECK-SD-NEXT: fcvt d0, h0
				; CHECK-SD-NEXT: fcvt d4, h1
				; CHECK-SD-NEXT: fcvt d2, h2
				; CHECK-SD-NEXT: fcvt d1, h3
				; CHECK-SD-NEXT: mov v0.d[1], v4.d[0]
				; CHECK-SD-NEXT: mov v1.d[1], v2.d[0]
				; CHECK-SD-NEXT: ret
				;
				; CHECK-GI-LABEL: fpext_v4f16_v4f64:
				; CHECK-GI: // %bb.0: // %entry
				; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
				; CHECK-GI-NEXT: mov s1, v0.s[1]
				; CHECK-GI-NEXT: mov h2, v0.h[1]
				; CHECK-GI-NEXT: fcvt d0, h0
				; CHECK-GI-NEXT: mov h3, v1.h[1]
				; CHECK-GI-NEXT: fcvt d2, h2
				; CHECK-GI-NEXT: fcvt d1, h1
				; CHECK-GI-NEXT: fcvt d3, h3
				; CHECK-GI-NEXT: mov v0.d[1], v2.d[0]
				; CHECK-GI-NEXT: mov v1.d[1], v3.d[0]
				; CHECK-GI-NEXT: ret
				entry:
				%c = fpext <4 x half> %a to <4 x double>
				ret <4 x double> %c
				}

				define <2 x float> @fpext_v2f16_v2f32(<2 x half> %a) {
				; CHECK-SD-LABEL: fpext_v2f16_v2f32:
				; CHECK-SD: // %bb.0: // %entry
				; CHECK-SD-NEXT: fcvtl v0.4s, v0.4h
				; CHECK-SD-NEXT: // kill: def $d0 killed $d0 killed $q0
				; CHECK-SD-NEXT: ret
				;
				; CHECK-GI-LABEL: fpext_v2f16_v2f32:
				; CHECK-GI: // %bb.0: // %entry
				; CHECK-GI-NEXT: fmov x8, d0
				; CHECK-GI-NEXT: fmov s0, w8
				; CHECK-GI-NEXT: mov h1, v0.h[1]
				; CHECK-GI-NEXT: mov v0.h[1], v1.h[0]
				; CHECK-GI-NEXT: mov v0.h[2], v0.h[0]
				; CHECK-GI-NEXT: mov v0.h[3], v0.h[0]
				; CHECK-GI-NEXT: fcvtl v0.4s, v0.4h
				; CHECK-GI-NEXT: // kill: def $d0 killed $d0 killed $q0
				; CHECK-GI-NEXT: ret
				entry:
				%c = fpext <2 x half> %a to <2 x float>
				ret <2 x float> %c
				}

				define <3 x float> @fpext_v3f16_v3f32(<3 x half> %a) {
				; CHECK-SD-LABEL: fpext_v3f16_v3f32:
				; CHECK-SD: // %bb.0: // %entry
				; CHECK-SD-NEXT: fcvtl v0.4s, v0.4h
				; CHECK-SD-NEXT: ret
				;
				; CHECK-GI-LABEL: fpext_v3f16_v3f32:
				; CHECK-GI: // %bb.0: // %entry
				; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
				; CHECK-GI-NEXT: mov h1, v0.h[1]
				; CHECK-GI-NEXT: mov h2, v0.h[2]
				; CHECK-GI-NEXT: mov v0.h[1], v1.h[0]
				; CHECK-GI-NEXT: mov v0.h[2], v2.h[0]
				; CHECK-GI-NEXT: mov v0.h[3], v0.h[0]
				; CHECK-GI-NEXT: fcvtl v0.4s, v0.4h
				; CHECK-GI-NEXT: mov s1, v0.s[1]
				; CHECK-GI-NEXT: mov s2, v0.s[2]
				; CHECK-GI-NEXT: mov v0.s[1], v1.s[0]
				; CHECK-GI-NEXT: mov v0.s[2], v2.s[0]
				; CHECK-GI-NEXT: mov v0.s[3], v0.s[0]
				; CHECK-GI-NEXT: ret
				entry:
				%c = fpext <3 x half> %a to <3 x float>
				ret <3 x float> %c
				}

				define <4 x float> @fpext_v4f16_v4f32(<4 x half> %a) {
				; CHECK-LABEL: fpext_v4f16_v4f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fcvtl v0.4s, v0.4h
				; CHECK-NEXT: ret
				entry:
				%c = fpext <4 x half> %a to <4 x float>
				ret <4 x float> %c
				}

				define <8 x float> @fpext_v8f16_v8f32(<8 x half> %a) {
				; CHECK-SD-LABEL: fpext_v8f16_v8f32:
				; CHECK-SD: // %bb.0: // %entry
				; CHECK-SD-NEXT: fcvtl2 v1.4s, v0.8h
				; CHECK-SD-NEXT: fcvtl v0.4s, v0.4h
				; CHECK-SD-NEXT: ret
				;
				; CHECK-GI-LABEL: fpext_v8f16_v8f32:
				; CHECK-GI: // %bb.0: // %entry
				; CHECK-GI-NEXT: mov d1, v0.d[1]
				; CHECK-GI-NEXT: fcvtl v0.4s, v0.4h
				; CHECK-GI-NEXT: fcvtl v1.4s, v1.4h
				; CHECK-GI-NEXT: ret
				entry:
				%c = fpext <8 x half> %a to <8 x float>
				ret <8 x float> %c
				}