Page MenuHomePhabricator

[RISCV] Turn splat shuffles of vector loads into scalar loads and a splat.
AbandonedPublic

Authored by craig.topper on Tue, Mar 23, 9:06 PM.

Details

Summary

VectorCombine turns splats of scalar loads into a vector load and
splat if it can determine that reading extract bytes won't page
fault, isn't volatile or atomic, etc. I'm not sure how useful this
is for us. It's especially annoying because vrgather.vi has a
early clobber constraint so this can force us to use an extra
register. If there happen to be splats of neighboring scalar
loads, VectorCombine seems to create multiple vector loads starting
from the address of each scalar.

This patch reverses the transform by turning it back into a scalar
load. I'm also peeking through concat_vectors because I saw
VectorCombine pad with undefs if it was too close to the end of
an array to load the full vector size.

Diff Detail

Event Timeline

craig.topper created this revision.Tue, Mar 23, 9:06 PM
craig.topper requested review of this revision.Tue, Mar 23, 9:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptTue, Mar 23, 9:06 PM
Herald added a subscriber: MaskRay. · View Herald Transcript

How difficult would it be to opt out of this vector combine in the first place? I can imagine that this may benefit other architectures.

How difficult would it be to opt out of this vector combine in the first place? I can imagine that this may benefit other architectures.

It looks like the VectorCombine code only considers the cost of the insert+load that it is replacing. It doesn't look at the shuffle use. So it compares the cost of vector load to the cost of a load+insert. That by itself might be profitable. But once you consider that insert+shuffle is one instruction for us, then reversing the transform makes some sense.

HsiangKai added inline comments.Wed, Mar 31, 7:22 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1415

'the' is duplicated.

Remove duplicated "the" from comment.

One could argue that this introduces coupling between the scalar register bank and the vector register bank. But I presume the simpler scalar load makes up for that loss of decoupling.

Reassuring myself here: VectorCombine cannot do this for scalables, can it?

One could argue that this introduces coupling between the scalar register bank and the vector register bank. But I presume the simpler scalar load makes up for that loss of decoupling.

Reassuring myself here: VectorCombine cannot do this for scalables, can it?

The VectorCombine code only works on fixed vectors

craig.topper abandoned this revision.Mon, Apr 19, 9:18 PM

I have a much better idea for this.