This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Improve codegen of store lane instructions by avoiding GPR usage
ClosedPublic

Authored by ab on Nov 10 2014, 3:09 PM.

Details

Summary

EDIT: I should note that all this only applies to base+offset addressing modes.

We used to generate stuff like

	umov.b	w8, v0[2]
	strb	w8, [x0, x1]

because the STR*ro* patterns were preferred to ST1*.
Instead, generate:

	add	x8, x0, x1
	st1.b	{ v0 }[2], [x8]

This patch increases the ST1* AddedComplexity to achieve that.

Thanks!
-Ahmed

Diff Detail

Event Timeline

ab updated this revision to Diff 16011.Nov 10 2014, 3:09 PM
ab retitled this revision from to [AArch64] Improve codegen of store lane instructions by avoiding GPR usage.
ab updated this object.
ab edited the test plan for this revision. (Show Details)
ab added a reviewer: t.p.northover.
ab added a subscriber: Unknown Object (MLST).
ab updated this object.Nov 10 2014, 3:13 PM
ab updated this object.Nov 11 2014, 10:15 AM
ab added a comment.Dec 1 2014, 12:58 PM

Ping

(Thanks for the reviews, Tim!)

-Ahmed

ab added a comment.Dec 22 2014, 1:33 PM

Another ping for review!

Thanks,
-Ahmed

jmolloy requested changes to this revision.Dec 23 2014, 2:35 AM
jmolloy added a reviewer: jmolloy.
jmolloy added a subscriber: jmolloy.

Hi Ahmed,

This patch would be better split into two. The first one seems fine to me.

For the second one; my local tinkering has suggested that you shouldn't need the "Non0" stuff at all.

You want to raise ST1Lane's AddedComplexity from 15 to 19, because the integer STRWro also has AddedComplexity 15 (and it matches more nodes in the DAG so has an overall higher complexity). This is fine, although it does expose AddedComplexity to be a very blunt hammer.

But if you do that you end up with suboptimal code for lane 0:

add	x8, x0, w1, sxtw #2
st1	{ v0.s }[0], [x8]

(which is different from the code snippet you pasted earlier. I get the above solely from changing the AddedComplexity from 15 to 19 on the ST1Lane patterns)

So surely all you need to do is make the AddedComplexity of your new Lane0 patterns "19". This is because your new patterns will match at least one more node than the ST1 patterns (because they match a RO addr mode).

This way, you're just telling SDAG "Here is a new pattern, it has the same AddedComplexity as the other one - choose at your will". Which is better than telling it it can never pick an ST1Lane for an index 0.

Does that sound sensible?

Cheers,

James

This revision now requires changes to proceed.Dec 23 2014, 2:35 AM
ab added a comment.Dec 23 2014, 11:33 AM

Why yes.. yes indeed, it makes lots of sense!

I split the lane0 parts into D6772; both patches are much simpler now.

Thanks a lot for the review!
-Ahmed

jmolloy accepted this revision.Jan 5 2015, 2:31 AM
jmolloy edited edge metadata.

This looks good to me, thanks for making the changes!

James

This revision is now accepted and ready to land.Jan 5 2015, 2:31 AM
This revision was automatically updated to reflect the committed changes.