This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Turn splat shuffles of vector loads into scalar loads and a splat.
AbandonedPublic

Authored by craig.topper on Mar 23 2021, 9:06 PM.

Details

Summary

VectorCombine turns splats of scalar loads into a vector load and
splat if it can determine that reading extract bytes won't page
fault, isn't volatile or atomic, etc. I'm not sure how useful this
is for us. It's especially annoying because vrgather.vi has a
early clobber constraint so this can force us to use an extra
register. If there happen to be splats of neighboring scalar
loads, VectorCombine seems to create multiple vector loads starting
from the address of each scalar.

This patch reverses the transform by turning it back into a scalar
load. I'm also peeking through concat_vectors because I saw
VectorCombine pad with undefs if it was too close to the end of
an array to load the full vector size.

Diff Detail

Event Timeline

craig.topper created this revision.Mar 23 2021, 9:06 PM
craig.topper requested review of this revision.Mar 23 2021, 9:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2021, 9:06 PM
Herald added a subscriber: MaskRay. · View Herald Transcript

How difficult would it be to opt out of this vector combine in the first place? I can imagine that this may benefit other architectures.

How difficult would it be to opt out of this vector combine in the first place? I can imagine that this may benefit other architectures.

It looks like the VectorCombine code only considers the cost of the insert+load that it is replacing. It doesn't look at the shuffle use. So it compares the cost of vector load to the cost of a load+insert. That by itself might be profitable. But once you consider that insert+shuffle is one instruction for us, then reversing the transform makes some sense.

HsiangKai added inline comments.Mar 31 2021, 7:22 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1347

'the' is duplicated.

Remove duplicated "the" from comment.

One could argue that this introduces coupling between the scalar register bank and the vector register bank. But I presume the simpler scalar load makes up for that loss of decoupling.

Reassuring myself here: VectorCombine cannot do this for scalables, can it?

One could argue that this introduces coupling between the scalar register bank and the vector register bank. But I presume the simpler scalar load makes up for that loss of decoupling.

Reassuring myself here: VectorCombine cannot do this for scalables, can it?

The VectorCombine code only works on fixed vectors

craig.topper abandoned this revision.Apr 19 2021, 9:18 PM

I have a much better idea for this.