This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Select lower sub,abs pattern to sabd on AArch64
ClosedPublic

Authored by karthikthecool on Dec 26 2014, 7:38 AM.

Details

Summary

Hi,
The below code -

int c[4],b[4],a[4];
void fn() {
  c[0] = abs(a[0]-b[0]);
  c[1] = abs(a[1]-b[1]);
  c[2] = abs(a[2]-b[2]);
  c[3] = abs(a[3]-b[3]);
}

is compiled into -

fn:                                     // @fn
// BB#0:

adrp x8, a
add x8, x8, :lo12:a
adrp x9, b
add x9, x9, :lo12:b
ldr q0, [x8]
ldr q1, [x9]
sub v0.4s, v0.4s, v1.4s
abs v0.4s, v0.4s
adrp x8, c
add x8, x8, :lo12:c
str q0, [x8]
ret
The sequence-

sub	v0.4s, v0.4s, v1.4s
abs	v0.4s, v0.4s

can further be lowered to a single instruction on AArch64-

sabd	v0.4s, v0.4s, v1.4s

This patch pattern matches the same in .td file to generate sabd instruction.
Please let me know if this is good to commit.

Thanks and Regards
Karthik Bhat

Diff Detail

Repository
rL LLVM

Event Timeline

karthikthecool retitled this revision from to [AArch64] Select lower sub,abs pattern to sabd on AArch64.
karthikthecool updated this object.
karthikthecool edited the test plan for this revision. (Show Details)
karthikthecool set the repository for this revision to rL LLVM.
karthikthecool added a subscriber: Unknown Object (MLST).

Update patch as SABD subtracts the second vector operand with the first and places the absolute value of difference in destination.
The pattern in the patch was incorrectly subtracting the first vector opearand with the second.
Updated the patch to generate the correct instruction. i.e.

sub	v0.4s, v0.4s, v1.4s
abs	v0.4s, v0.4s

can further be lowered to a single instruction on AArch64-

sabd	v0.4s, v1.4s, v0.4s and *NOT* sabd	v0.4s, v0.4s, v1.4s

Please let me know if this is good to commit.
Thanks and Regards
Karthik Bhat

jyoti.allur added a subscriber: jyoti.allur.EditedDec 29 2014, 6:28 AM

Hi Karthik,
Your original diff had it right the first time.

sub v0.4s, v0.4s, v1.4s => v0.4s - v1.4s and result of substract stored to v0.4s
abs v0.4s, v0.4s

Shouldn't it be SABDv4i32 V128:$Rn, V128:$Rm ?

Thanks Jyoti.. Yes you are right i got confused by the wording in the instruction manual. The 1st version of the patch was correct. Reverting back to the same.
Please let me know if you have any other comments.
Thanks and Regards
Karthik Bhat

A generic name for the testcase would be better.
Need to add checks for complete machine instructions in the test output rather than just pneumonic, since we have added different patterns to handle various types of data.
Could you please modify to test these?
While above changes are needed, i would prefer we get a go from either of reviewers before committing.

Hi Jyoti,
Updated the test cases as per reveiw comments to checke the exact instruction being generated.
Merged test case with D6791.
Please let me know if you have any other comments or if it is good to commit.
Thanks and Regards
Karthik Bhat

jmolloy accepted this revision.Jan 5 2015, 2:53 AM
jmolloy edited edge metadata.

Hi Karthik,

This also looks fine to me. As I mentioned on your previous Phab revision, I'd have written the testcase using function parameters rather than globals as it's a little easier to read. Perhaps that'd be worth thinking about for future revisions.

LGTM.

James

This revision is now accepted and ready to land.Jan 5 2015, 2:53 AM
karthikthecool closed this revision.Jan 5 2015, 5:15 AM

Hi James,
Thanks for the input. It makes sense to have function arguments instead of global variable. Submitted as r225165 after modifying test case as per comments.
Thanks!