This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Do not fuse VADD and VMUL, continued (2/2)
Closed, Public

Authored by SjoerdMeijer on Oct 16 2018, 1:59 AM.

Details

Reviewers

samparker
t.p.northover
dmgreen
javed.absar

Commits

rG64cfb74a61e0: [ARM] Do not fuse VADD and VMUL, continued (2/2)
rL344683: [ARM] Do not fuse VADD and VMUL, continued (2/2)

Summary

This is patch 2/2, following up on D53314, to prevent fusing of mul + add sequences.

Event Timeline

SjoerdMeijer created this revision. Oct 16 2018, 1:59 AM

Herald added a reviewer: javed.absar. Oct 16 2018, 1:59 AM

Herald added subscribers: chrib, kristof.beyls.

SjoerdMeijer retitled this revision from "ARM] Do not fuse VADD and VMUL, continued (2/2)" to "[ARM] Do not fuse VADD and VMUL, continued (2/2)". Oct 16 2018, 2:00 AM

SjoerdMeijer mentioned this in D53314: [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2).

samparker added inline comments. Oct 16 2018, 6:22 AM

test/CodeGen/ARM/fusedMAC.ll
Line 10: should the f64 version not also be tested?

Herald added a subscriber: nhaehnle. Oct 16 2018, 6:22 AM

> should the f64 version not also be tested?

Both the M4 and M33 have only single-precision VFP support, so f64 operations result in EABI calls and I don't think there's much to test here. I have cleaned up the RUN line in the test case, though, as we only need the -mcpu=cortex-m33 part and not the other attributes.
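
To make the point about f64 concrete, here is a minimal C++ sketch of the same mul + add shape that fusedMACTest1 and fusedMACTest2 use in IR. The function names are hypothetical. The assumption, based on the comment above, is that on a core with a single-precision-only FPU the double version is lowered to run-time helper calls (the ARM run-time ABI defines helpers such as `__aeabi_dmul` and `__aeabi_dadd`), so there is no VFP mul/add pair for the selector to fuse or keep apart; only the float version exercises the changed predicate on those cores.

```cpp
#include <cassert>

// f64 mul + add: on a single-precision-only FPU this is expected to become
// library calls rather than vmul.f64 / vadd.f64 (assumption, see lead-in).
double mulAddF64(double a, double b, double c) { return a * b + c; }

// f32 mul + add: this is the shape the DONT-FUSE checks are about.
float mulAddF32(float a, float b, float c) { return a * b + c; }
```

Compiling both functions for one of these cores and inspecting the assembly would be the way to confirm the lowering; the sketch itself does not depend on it.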

Good point. Would it be worth adding a test for the M7 though? We seem to be a little lacking in our m-class FP tests.

test/CodeGen/ARM/fusedMAC.ll
Line 2: this is a funny looking triple for the M33

> Would it be worth adding a test for the M7 though? We seem to be a little lacking in our m-class FP tests.

Yep, agreed. Added tests for the M7 and M4.

> this is a funny looking triple for the M33

Oops, copy-paste, but now fixed.

nhaehnle removed a subscriber: nhaehnle. Oct 16 2018, 8:18 AM

Thanks, LGTM. One bonus question: are the fused operations fast on the M7?

This revision is now accepted and ready to land. Oct 16 2018, 8:19 AM

Cheers

> With one bonus question, are the fused operations fast on the M7..?

:-)

At first glance, it looks like we should give the M7 a similar treatment.
But I will double-check, run benchmarks, and follow up if necessary (which, again, looks very likely).

Closed by commit rL344683: [ARM] Do not fuse VADD and VMUL, continued (2/2) (authored by SjoerdMeijer). Oct 17 2018, 3:07 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/ARM/ARMInstrInfo.td	6 lines
test/CodeGen/ARM/fusedMAC.ll	9 lines

Diff 169826

lib/Target/ARM/ARMInstrInfo.td

 let RecomputePerFunction = 1 in {

 def UseFPVMLx: Predicate<"((Subtarget->useFPVMLx() &&"
 " !TM.Options.AllowFPOpFusion == FPOpFusion::Fast) ||"
 "MF->getFunction().optForMinSize())">;
 }
 def UseMulOps : Predicate<"Subtarget->useMulOps()">;

 // Prefer fused MAC for fp mul + add over fp VMLA / VMLS if they are available.
-// But only select them if more precision in FP computation is allowed.
+// But only select them if more precision in FP computation is allowed, and when
+// they are not slower than a mul + add sequence.
 // Do not use them for Darwin platforms.
 def UseFusedMAC : Predicate<"(TM.Options.AllowFPOpFusion =="
 " FPOpFusion::Fast && "
 " Subtarget->hasVFP4()) && "
-"!Subtarget->isTargetDarwin()">;
+"!Subtarget->isTargetDarwin() &&"
+"Subtarget->useFPVMLx()">;

 def HasFastVGETLNi32 : Predicate<"!Subtarget->hasSlowVGETLNi32()">;
 def HasSlowVGETLNi32 : Predicate<"Subtarget->hasSlowVGETLNi32()">;

 def HasFastVDUP32 : Predicate<"!Subtarget->hasSlowVDUP32()">;
 def HasSlowVDUP32 : Predicate<"Subtarget->hasSlowVDUP32()">;

 def UseVMOVSR : Predicate<"Subtarget->preferVMOVSR() ||"
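
The UseFusedMAC change above can be read as a plain boolean function. The following is a minimal C++ sketch of the predicate before and after the patch, not the actual LLVM code: `SubtargetModel` and both function names are hypothetical stand-ins, and the `FPOpFusion` enum is declared locally so the example is self-contained. It assumes that cores treated as having slow fused operations report `useFPVMLx()` as false, which is what the DONT-FUSE RUN lines for the Cortex-M4 and Cortex-M33 suggest.

```cpp
#include <cassert>

// Local stand-in for the fusion mode the predicate string compares against.
enum class FPOpFusion { Fast, Standard, Strict };

// Hypothetical model of the subtarget queries used by the predicate.
struct SubtargetModel {
  bool HasVFP4;        // models Subtarget->hasVFP4()
  bool IsTargetDarwin; // models Subtarget->isTargetDarwin()
  bool UseFPVMLx;      // models Subtarget->useFPVMLx()
};

// UseFusedMAC as it reads before this patch.
bool useFusedMACBefore(FPOpFusion Fusion, const SubtargetModel &ST) {
  return (Fusion == FPOpFusion::Fast && ST.HasVFP4) && !ST.IsTargetDarwin;
}

// UseFusedMAC after this patch: the extra useFPVMLx() term turns fused
// selection off on subtargets where fusing is not considered profitable.
bool useFusedMACAfter(FPOpFusion Fusion, const SubtargetModel &ST) {
  return useFusedMACBefore(Fusion, ST) && ST.UseFPVMLx;
}
```

With a model such as `{true, false, false}` (VFP4 present, not Darwin, `useFPVMLx()` false) and fast contraction, the old predicate holds and the new one does not, which matches the switch from `vfma.f32` to `vmul.f32` + `vadd.f32` that the test checks for.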

test/CodeGen/ARM/fusedMAC.ll

 ; RUN: llc < %s -mtriple=armv7-eabi -mattr=+neon,+vfp4 -fp-contract=fast | FileCheck %s
+; RUN: llc < %s -mtriple=arm-arm-eabi -mcpu=cortex-m7 -fp-contract=fast | FileCheck %s
+; RUN: llc < %s -mtriple=arm-arm-eabi -mcpu=cortex-m4 -fp-contract=fast | FileCheck %s -check-prefix=DONT-FUSE
+; RUN: llc < %s -mtriple=arm-arm-eabi -mcpu=cortex-m33 -fp-contract=fast | FileCheck %s -check-prefix=DONT-FUSE

 ; Check generated fused MAC and MLS.

 define double @fusedMACTest1(double %d1, double %d2, double %d3) {
 ;CHECK-LABEL: fusedMACTest1:
 ;CHECK: vfma.f64
 %1 = fmul double %d1, %d2
 %2 = fadd double %1, %d3
 ret double %2
 }

 define float @fusedMACTest2(float %f1, float %f2, float %f3) {
 ;CHECK-LABEL: fusedMACTest2:
 ;CHECK: vfma.f32
+
+;DONT-FUSE-LABEL: fusedMACTest2:
+;DONT-FUSE: vmul.f32
+;DONT-FUSE-NEXT: vadd.f32
+
 %1 = fmul float %f1, %f2
 %2 = fadd float %1, %f3
 ret float %2
 }

 define double @fusedMACTest3(double %d1, double %d2, double %d3) {
 ;CHECK-LABEL: fusedMACTest3:
 ;CHECK: vfms.f64
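
The two outcomes that fusedMACTest2 distinguishes (`vfma.f32` versus `vmul.f32` followed by `vadd.f32`) are not numerically identical, which is why fusion is tied to `-fp-contract=fast` in the first place. The following is a minimal C++ sketch of that difference on the host, assuming IEEE-754 binary32 `float` with round-to-nearest-even. The function names are hypothetical and the code is independent of LLVM.

```cpp
#include <cassert>
#include <cmath>

// Unfused: the product is rounded to float before the add, as with
// vmul.f32 followed by vadd.f32. The volatile store keeps the compiler
// from contracting the two operations into one fused operation.
float mulThenAdd(float a, float b, float c) {
  volatile float product = a * b;
  return product + c;
}

// Fused: a single rounding at the end, as with vfma.f32.
float fusedMulAdd(float a, float b, float c) {
  return std::fma(a, b, c);
}
```

For example, with `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, the exact product is `1 + 2^-11 + 2^-24`. The unfused version rounds the product to `1 + 2^-11` and returns 0, while the fused version keeps the low bit and returns `2^-24`.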