Page MenuHomePhabricator

[PowerPC] Avoid scalarization of vector truncate

Authored by RolandF on Jan 9 2019, 11:17 AM.



The PowerPC code generator currently scalarizes vector truncates that would fit in a vector register, resulting in vector extracts, scalar operations, and vector merges. This patch custom lowers a vector truncate that would fit in a register to a vector shuffle instead.

Event Timeline

RolandF created this revision.Jan 9 2019, 11:17 AM
RolandF updated this revision to Diff 180946.Jan 9 2019, 3:28 PM

Fix comment.

nemanjai accepted this revision.Jan 25 2019, 2:58 PM

LGTM. I've had a pretty close look at it and am going to approve it. However, I realized that I was the only reviewer on it so I'll ask you to give it a couple of days to give at least @hfinkel and @jsji (but others as well) a chance to have a look if they are interested.

6826 ↗(On Diff #186059)

This may be misleading to the reader as it suggests that the byte order within an element in the register is different on little endian systems. It might be clearer to use LLVM-like notation for vectors and write these as:

<MSB1| LSB1, MSB2|LSB2,   uu,   uu,   uu,   uu,   uu,   uu> to
<LSB1, LSB2,      u, u, u, u, u, u, u, u, u, u, u, u, u, u>

<  uu,   uu,   uu,   uu,   uu,   uu, MSB2|LSB2, MSB1| LSB1> to
<u, u, u, u, u, u, u, u, u, u, u, u,      u, u, LSB2, LSB1>
This revision is now accepted and ready to land.Jan 25 2019, 2:58 PM
jsji accepted this revision.Jan 28 2019, 1:29 PM

LGTM. Thanks for exploiting this! Some minor comments.

644 ↗(On Diff #186059)

Maybe add a comment here about why only these 5 target VT are supported?

6834 ↗(On Diff #186059)

Comment a little misleading here?
v4f64 is also a legal PPC QPX vector. Maybe we should limit it to be "legal Altivec vector" here?

6855 ↗(On Diff #186059)

Maybe we can just use WideNumElts + 1 here?

2 ↗(On Diff #186059)

Why we require pwr9 here? I think this should apply to pwr8 and below as well?

16 ↗(On Diff #186059)

Can we add this new testcase in a NFC patch first, then show ONLY the difference caused by this opt here?

It will be great for others to see that this patch reduced the number of instructions from 33 to 1!

18 ↗(On Diff #186059)

Maybe we should either add the check to xxswapd in BE ?
Or else why not just remove all CHECK-BE? As they are all identical to CHECK?

nemanjai added inline comments.Jan 28 2019, 2:03 PM
16 ↗(On Diff #186059)

I think if we're doing this, it would probably be nice to see the entire codegen. Just produce the checks using utils/

RolandF marked an inline comment as done.Feb 6 2019, 2:47 PM
RolandF added inline comments.
18 ↗(On Diff #186059)

LowerTRUNCATEVector has an if statement checking LE or BE, so even though we want the result to be the same in the end we have to test both LE and BE to test the whole function.

jsji added inline comments.Feb 7 2019, 8:25 AM
18 ↗(On Diff #186059)

Yes, we need to test both BE/LE, but no need to have CHECK-BE/CHECK-LE prefixes if the results are the same?

RolandF marked an inline comment as done.Feb 7 2019, 3:42 PM
RolandF added inline comments.
18 ↗(On Diff #186059)

Oh, okay, I see what you mean now. I could have saved a bunch of check lines.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2019, 3:42 PM
RolandF updated this revision to Diff 186059.Feb 8 2019, 3:28 PM

Update diff to show test changes and respond to comments.

This revision was automatically updated to reflect the committed changes.