This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Select lower fsub,fabs pattern to fabd on AArch64
Closed, Public

Authored by karthikthecool on Dec 28 2014, 11:48 PM.

Details

Reviewers

t.p.northover
jmolloy

Summary

Hi,
Similar to http://reviews.llvm.org/D6781, we can select the fsub, fabs pattern as fabd on AArch64.
This patch adds pattern matching in the .td file to handle this.
For example, for the code below:

float a[4],b[4],c[4];
void fabd_test() {
  a[0] = fabs(b[0]-c[0]);
  a[1] = fabs(b[1]-c[1]);
  a[2] = fabs(b[2]-c[2]);
  a[3] = fabs(b[3]-c[3]);
}

gcc produces a single

fabd	v0.4s, v1.4s, v0.4s

instead of

fsub	v0.4s, v0.4s, v1.4s
fabs	v0.4s, v0.4s

which clang previously produced. With this patch, clang lowers fsub followed by fabs to a single fabd.
The same applies to the scalar forms of fabd.

Please let me know if this is good to commit.

Thanks and Regards
Karthik Bhat
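
The rewrite described in the summary can be modeled in plain C. This is only a sketch of the per-lane arithmetic, not the backend code: the function names `sub_then_abs`, `fabd_model`, and `forms_agree` are hypothetical, and the model assumes at most 4 lanes, matching the `4s` example.

```c
#include <math.h>
#include <stddef.h>

/* Two-step form: what an fsub followed by an fabs computes per lane. */
void sub_then_abs(float *dst, const float *n, const float *m, size_t lanes) {
    for (size_t i = 0; i < lanes; ++i) {
        float diff = n[i] - m[i]; /* fsub */
        dst[i] = fabsf(diff);     /* fabs */
    }
}

/* Single-step model of fabd: |n[i] - m[i]| per lane. */
void fabd_model(float *dst, const float *n, const float *m, size_t lanes) {
    for (size_t i = 0; i < lanes; ++i)
        dst[i] = fabsf(n[i] - m[i]);
}

/* Returns 1 when both forms agree on every lane (assumes lanes <= 4). */
int forms_agree(const float *n, const float *m, size_t lanes) {
    float two_step[4], fused[4];
    if (lanes > 4)
        return 0;
    sub_then_abs(two_step, n, m, lanes);
    fabd_model(fused, n, m, lanes);
    for (size_t i = 0; i < lanes; ++i)
        if (two_step[i] != fused[i])
            return 0;
    return 1;
}
```

Calling `fabd_model` with `lanes` set to 1 corresponds to the scalar case mentioned above.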

Diff Detail

Repository: rL LLVM

Event Timeline

karthikthecool updated this revision to Diff 17665. Dec 28 2014, 11:48 PM

karthikthecool retitled this revision from to [AArch64] Select lower fsub,fabs pattern to fabd on AArch64.

karthikthecool updated this object.

karthikthecool edited the test plan for this revision.

karthikthecool added reviewers: t.p.northover, jmolloy.

karthikthecool set the repository for this revision to rL LLVM.

karthikthecool added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Dec 28 2014, 11:48 PM

Again, same comment as for the other patch.

FABD Vd.<T>, Vn.<T>, Vm.<T>
Subtracts the elements of Vm from the corresponding
elements of Vn, and places the absolute values of the results in the elements of Vd

Rm and Rn should swap places, I think.
I was also wondering whether we should use a more generic name for the testcase, so that it covers all vector arithmetic instructions involving abs.
I would like to know what James and Tim think about this.
LGTM otherwise.

Hi Jyoti,
Thanks for the input. Yes, you are correct: the registers in the matched pattern for FABD need to be swapped. I was confused by the manual's wording. I have updated the patch.
Please let me know if you have any other comments.
Thanks and Regards
Karthik Bhat

I would use a more generic name for the testcase, as mentioned before. You could probably combine the sabd and fabd testcases into a single generic file, something like arm64-neon-simd-vabs.ll. For the scalar versions you should find an appropriate place, though.

We should probably check for complete machine instructions in the test output rather than just the mnemonic, since we have added different patterns to handle the various data types.

Could you please modify the tests to check these?

[PS] I would still get a go-ahead from one of the reviewers before committing.

Hi Karthik,
To clarify further: the reason for combining the sabd and fabd testcases into a single generic file is that it lets us accommodate more patterns built around abs arithmetic.
Another reason is that adding more test files means running llc once per file, which increases test time.
I hope this sounds reasonable.

Hi Jyoti,
I have updated the test cases as per the review comments to check the exact instruction being generated.
I have also merged the test case with the one from D6781.
Please let me know if you have any other comments or if it is good to commit.
Thanks and Regards
Karthik Bhat

Hi Karthik,

This looks fine to me. I'd have written the test slightly differently, taking function parameters rather than globals as it makes the resulting code smaller and easier to read, but it's fine as-is.

Cheers,

James
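
At the C source level, the parameter-taking shape James describes might look like the sketch below. This is an assumption for illustration, not the committed test: the name `fabd_test_params` is hypothetical, it uses `fabsf` rather than the `fabs` call in the summary, and whether a given compiler emits a single `fabd` for it is not something this sketch asserts.

```c
#include <math.h>

/* Hypothetical variant of fabd_test from the summary that takes its
 * operands as parameters instead of reading and writing globals. */
void fabd_test_params(float *restrict a,
                      const float *restrict b,
                      const float *restrict c) {
    /* Four lanes, matching the float[4] arrays in the summary. */
    for (int i = 0; i < 4; ++i)
        a[i] = fabsf(b[i] - c[i]);
}
```

Passing the operands in avoids the address computations for the globals, which is the source of the extra instructions James mentions.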

This revision is now accepted and ready to land. Jan 5 2015, 2:50 AM

Thanks James. Committed as r225169 after modifying the test case as per your comment.
Thanks!

Revision Contents

Path                                            Size
lib/Target/AArch64/AArch64InstrInfo.td          12 lines
test/CodeGen/AArch64/arm64-neon-simd-vabs.ll    81 lines

Diff 17789

lib/Target/AArch64/AArch64InstrInfo.td

Context not available.
	BinOpFrag<(or node:$LHS, (vnot node:$RHS))> >;
	defm ORR : SIMDLogicalThreeVector<0, 0b10, "orr", or>;

		def : Pat<(v2f32 (fabs (fsub V64:$Rn, V64:$Rm))),
		(FABDv2f32 V64:$Rn, V64:$Rm)>;
		def : Pat<(v4f32 (fabs (fsub V128:$Rn, V128:$Rm))),
		(FABDv4f32 V128:$Rn, V128:$Rm)>;
		def : Pat<(v2f64 (fabs (fsub V128:$Rn, V128:$Rm))),
		(FABDv2f64 V128:$Rn, V128:$Rm)>;

	def : Pat<(AArch64bsl (v8i8 V64:$Rd), V64:$Rn, V64:$Rm),
	(BSLv8i8 V64:$Rd, V64:$Rn, V64:$Rm)>;
	def : Pat<(AArch64bsl (v4i16 V64:$Rd), V64:$Rn, V64:$Rm),
Context not available.
	defm USQADD : SIMDTwoScalarBHSDTied< 1, 0b00011, "usqadd",
	int_aarch64_neon_usqadd>;

		def : Pat<(f32 (fabs (fsub FPR32:$Rn, FPR32:$Rm))),
		(FABD32 FPR32:$Rn, FPR32:$Rm)>;
		def : Pat<(f64 (fabs (fsub FPR64:$Rn, FPR64:$Rm))),
		(FABD64 FPR64:$Rn, FPR64:$Rm)>;

	def : Pat<(AArch64neg (v1i64 V64:$Rn)), (NEGv1i64 V64:$Rn)>;

	def : Pat<(v1i64 (int_aarch64_neon_fcvtas (v1f64 FPR64:$Rn))),
Context not available.

test/CodeGen/AArch64/arm64-neon-simd-vabs.ll

				; RUN: llc -mtriple=aarch64-none-linux-gnu < %s | FileCheck %s
				target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-none-linux-gnu"

				@a = common global [4 x float] zeroinitializer
				@b = common global [4 x float] zeroinitializer
				@c = common global [4 x float] zeroinitializer
				; CHECK: test_v4f32
				; CHECK: fabd v0.4s, v0.4s, v1.4s
				declare <4 x float> @llvm.fabs.v4f32(<4 x float>)
				define void @test_v4f32(){
				%1 = load <4 x float>* bitcast ([4 x float]* @b to <4 x float>*)
				%2 = load <4 x float>* bitcast ([4 x float]* @c to <4 x float>*)
				%3 = fsub <4 x float> %1, %2
				%4 = call <4 x float> @llvm.fabs.v4f32(<4 x float> %3)
				store <4 x float> %4, <4 x float>* bitcast ([4 x float]* @a to <4 x float>*)
				ret void
				}

				@d = common global [2 x float] zeroinitializer
				@e = common global [2 x float] zeroinitializer
				@f = common global [2 x float] zeroinitializer
				; CHECK: test_v2f32
				; CHECK: fabd v0.2s, v0.2s, v1.2s
				declare <2 x float> @llvm.fabs.v2f32(<2 x float>)
				define void @test_v2f32(){
				%1 = load <2 x float>* bitcast ([2 x float]* @e to <2 x float>*)
				%2 = load <2 x float>* bitcast ([2 x float]* @f to <2 x float>*)
				%3 = fsub <2 x float> %1, %2
				%4 = call <2 x float> @llvm.fabs.v2f32(<2 x float> %3)
				store <2 x float> %4, <2 x float>* bitcast ([2 x float]* @d to <2 x float>*)
				ret void
				}

				@g = common global [2 x double] zeroinitializer
				@h = common global [2 x double] zeroinitializer
				@i = common global [2 x double] zeroinitializer
				; CHECK: test_v2f64
				; CHECK: fabd v0.2d, v0.2d, v1.2d
				declare <2 x double> @llvm.fabs.v2f64(<2 x double>)
				define void @test_v2f64(){
				%1 = load <2 x double>* bitcast ([2 x double]* @g to <2 x double>*)
				%2 = load <2 x double>* bitcast ([2 x double]* @h to <2 x double>*)
				%3 = fsub <2 x double> %1, %2
				%4 = call <2 x double> @llvm.fabs.v2f64(<2 x double> %3)
				store <2 x double> %4, <2 x double>* bitcast ([2 x double]* @i to <2 x double>*)
				ret void
				}

				@j = common global float 0.000000e+00
				@k = common global float 0.000000e+00
				@l = common global float 0.000000e+00
				; CHECK: test_fabd32
				; CHECK: fabd s0, s0, s1
				declare float @fabsf(float)
				define void @test_fabd32(){
				%1 = load float* @j
				%2 = load float* @k
				%3 = fsub float %1, %2
				%fabsf = tail call float @fabsf(float %3) #0
				store float %fabsf, float* @l
				ret void
				}

				@n = common global double 0.000000e+00
				@o = common global double 0.000000e+00
				@m = common global double 0.000000e+00
				; CHECK: test_fabd64
				; CHECK: fabd d0, d0, d1
				declare double @fabs(double)
				define void @test_fabd64() {
				%1 = load double* @n
				%2 = load double* @o
				%3 = fsub double %1, %2
				%4 = tail call double @fabs(double %3) #0
				store double %4, double* @m
				ret void
				}

				attributes #0 = { nounwind readnone "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }