This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Add shuffles as an option for lowering BUILD_VECTOR
ClosedPublic

Authored by tlively on Apr 6 2021, 10:30 PM.

Details

Summary

When lowering a BUILD_VECTOR SDNode, we choose among various possible vector
creation instructions in an attempt to minimize the total number of instructions
used. We previously considered using swizzles, consts, and splats, and this
patch adds shuffles as well. A common pattern that now lowers to shuffles is
when two 64-bit vectors are concatenated. Previously, concatenations generally
lowered to sequences of extract_lane and replace_lane instructions when they
could have been a single shuffle.

Event Timeline

tlively created this revision. Apr 6 2021, 10:30 PM
tlively requested review of this revision. Apr 6 2021, 10:30 PM
Herald added a project: Restricted Project. Apr 6 2021, 10:30 PM
srj added a subscriber: srj. Apr 7 2021, 10:08 AM
srj added a comment. Apr 7 2021, 2:20 PM

This definitely improves codegen for Halide substantially.

dschuff added inline comments. Apr 7 2021, 5:10 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1695

Unrelated to this CL but what exactly is the difference between a swizzle and a shuffle?

dschuff accepted this revision. Apr 7 2021, 5:16 PM

Otherwise the code looks good though

This revision is now accepted and ready to land. Apr 7 2021, 5:16 PM
aheejin added inline comments. Apr 8 2021, 5:55 AM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1640

The source vector must not have more lanes than the dest.

Why? Shuffles can only include half of the elements anyway, no?

1713–1731

Does this handle the case where it is most beneficial to use the same vector as both sources?

1737–1739

Can we have some more comments on why we prefer this order?

llvm/test/CodeGen/WebAssembly/simd-concat.ll
76

I might be mistaken, but isn't this a loss of data? v2i32 is a full 128-bit vector, and the result of shufflevector contains both whole vectors, but the result of i8x16.shuffle contains only half of them.

tlively added inline comments. Apr 8 2021, 5:11 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1640

If the source vector has more lanes than the destination, then those lanes would be narrower. The shufflevector SDNode uses indices based on the wider lanes of the destination type, so it cannot express the extension and use of smaller lanes in one of the source vectors.

In contrast, if the source vector has fewer lanes than the destination, then those lanes would be wider. The extract_vector_elt operand to the BUILD_VECTOR node would therefore be doing an implicit truncate, and by scaling up the indices for the smaller destination lanes, the truncated portions of the wider source lanes can be correctly pulled in.

I'll expand on this comment a bit.
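
The index scaling described above can be illustrated with a small sketch. This is not code from the patch: `scaledMaskIndex` and its parameters are hypothetical names, and the sketch assumes only what the comment states, namely that a source with fewer, wider lanes has its lane indices multiplied up into destination-lane units, with indices for the second shuffle operand offset by the destination lane count.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not taken from the patch: translate a lane index of a
// source vector into a shuffle mask index expressed in destination lanes.
std::size_t scaledMaskIndex(std::size_t srcLane, std::size_t srcLaneCount,
                            std::size_t dstLaneCount, bool fromSecondSource) {
  // The source may have fewer (wider) lanes than the destination, never more.
  assert(srcLaneCount != 0 && srcLaneCount <= dstLaneCount);
  assert(dstLaneCount % srcLaneCount == 0);
  assert(srcLane < srcLaneCount);
  // Each wide source lane spans `scale` destination lanes; the first of them
  // holds the low (truncated) portion because Wasm lanes are little-endian.
  const std::size_t scale = dstLaneCount / srcLaneCount;
  // Mask indices for the second shuffle operand start at dstLaneCount.
  return srcLane * scale + (fromSecondSource ? dstLaneCount : 0);
}
```

For instance, lane 1 of a two-lane source feeding a four-lane destination maps to mask index 2, or to 6 when that source is the second shuffle operand.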

1695

In a single sentence: Shuffling uses a static array of indices to draw lanes from two source vectors while swizzling uses the lanes of one source vector as indices into a second source vector.
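
To make the distinction concrete, here is a minimal scalar model of the two WebAssembly SIMD operations. It is only a sketch for illustration: `V128`, `shuffle`, and `swizzle` are names chosen here, not LLVM or Wasm APIs. The behavior modeled is that `i8x16.shuffle` takes sixteen immediate indices in the range 0 to 31 that select from the concatenation of its two operands, while `i8x16.swizzle` reads its indices from a second vector at run time and produces 0 for any index of 16 or more.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V128 = std::array<std::uint8_t, 16>;

// Model of i8x16.shuffle: `mask` stands in for the instruction's immediate.
// Indices 0-15 select bytes of `a`, indices 16-31 select bytes of `b`.
V128 shuffle(const V128 &a, const V128 &b, const V128 &mask) {
  V128 out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    assert(mask[i] < 32 && "shuffle indices must be in [0, 32)");
    out[i] = mask[i] < 16 ? a[mask[i]] : b[mask[i] - 16];
  }
  return out;
}

// Model of i8x16.swizzle: the indices are the lanes of a runtime vector and
// select bytes of the single data operand `a`; out-of-range indices give 0.
V128 swizzle(const V128 &a, const V128 &indices) {
  V128 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = indices[i] < 16 ? a[indices[i]] : 0;
  return out;
}
```

In this model a shuffle can interleave bytes from two vectors with a fixed pattern, whereas a swizzle can apply a data-dependent permutation but only to one vector.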

1713–1731

The shuffle can draw an arbitrary number of lanes from each source, so it is never necessary to use the same source for both operands. That being said, if there is only one available source, ShuffleSrc2 will be set to undef here and will later be made a copy of the first operand, for lack of any better vector to use there.

1737–1739

It's more or less arbitrary, but now that I'm thinking about it, it would probably be better to prefer the simpler/smaller operations like splat over the more complex operations like shuffles and swizzles. I'll change this order in a follow-up PR and add a comment there.

llvm/test/CodeGen/WebAssembly/simd-concat.ll
76

v2i32 is only 64 bits, but it is represented in Wasm using the low 32 bits of each lane in an i64x2 vector. The i8x16.shuffle here pulls in those low 32 bits from each lane and leaves out the unused high 32 bits.
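
The byte selection described above can be worked out with a short sketch. The helper name `concatV2I32Mask` is hypothetical, and the mask is derived from the description in this comment (little-endian lanes, low 32 bits of each 64-bit lane) rather than copied from the test file.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: byte mask for concatenating two v2i32 values, each of
// which is held in the low 32 bits of the lanes of an i64x2 vector.
std::array<std::uint8_t, 16> concatV2I32Mask() {
  constexpr std::size_t SrcLaneBytes = 8; // i64 lanes in each source
  constexpr std::size_t DstLaneBytes = 4; // i32 lanes in the result
  constexpr std::size_t LanesPerSrc = 2;
  std::array<std::uint8_t, 16> mask{};
  std::size_t out = 0;
  for (std::size_t src = 0; src < 2; ++src)
    for (std::size_t lane = 0; lane < LanesPerSrc; ++lane)
      for (std::size_t byte = 0; byte < DstLaneBytes; ++byte)
        // Bytes of the second operand are numbered 16-31. Only the first four
        // bytes (the low 32 bits) of each 64-bit source lane are selected.
        mask[out++] = static_cast<std::uint8_t>(src * 16 +
                                                lane * SrcLaneBytes + byte);
  assert(out == mask.size());
  return mask;
}
```

Under these assumptions the mask selects bytes 0-3 and 8-11 of the first operand followed by bytes 16-19 and 24-27 of the second, so bytes 4-7 and 12-15 of each source never reach the result.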

tlively updated this revision to Diff 336277. Apr 8 2021, 5:20 PM
  • Improve comment, fix name
aheejin accepted this revision. Apr 9 2021, 6:10 AM
aheejin added inline comments.
llvm/test/CodeGen/WebAssembly/simd-concat.ll
76

Ah right, I mistook this for v4i32. Sorry.