When lowering a BUILD_VECTOR SDNode, we choose among various possible vector
creation instructions in an attempt to minimize the total number of instructions
used. We previously considered using swizzles, consts, and splats, and this
patch adds shuffles as well. A common pattern that now lowers to shuffles is
when two 64-bit vectors are concatenated. Previously, concatenations generally
lowered to sequences of extract_lane and replace_lane instructions when they
could have been a single shuffle.
Repository: rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp | ||
---|---|---|
1695 | Unrelated to this CL but what exactly is the difference between a swizzle and a shuffle? |
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp | ||
---|---|---|
1640 | Why? Shuffles can only include half of the elements anyway, no? | |
1713–1731 | Does this handle the case where it is most beneficial to use the same vector as both sources? | |
1737–1739 | Can we have some more comments on why we prefer this order? | |
llvm/test/CodeGen/WebAssembly/simd-concat.ll | ||
76 | I might be mistaken, but isn't this a loss of data? v2i32 is a full 128-bit vector, and the result of shufflevector contains both whole vectors, but the result of i8x16.shuffle contains only half of them. |
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp | ||
---|---|---|
1640 | If the source vector has more lanes than the destination, then those lanes would be narrower. The shufflevector SDNode uses indices based on the wider lanes of the destination type, so it cannot express the extension and use of smaller lanes in one of the source vectors. In contrast, if the source vector has fewer lanes than the destination, then those lanes would be wider. The extract_vector_elt operand to the BUILD_VECTOR node would therefore be doing an implicit truncate, and by scaling up the indices for the smaller destination lanes, the truncated portions of the wider source lanes can be correctly pulled in. I'll expand on this comment a bit. | |
1695 | In a single sentence: Shuffling uses a static array of indices to draw lanes from two source vectors while swizzling uses the lanes of one source vector as indices into a second source vector. | |
1713–1731 | The shuffle can draw an arbitrary number of lanes from each source, so it is never necessary to use the same source for both operands. That being said, if there is only one available source, ShuffleSrc2 will be set to undef here and will be later made a copy of the first operand, for lack of any better vector to use there. | |
1737–1739 | It's more or less arbitrary, but now that I'm thinking about it, it would probably be better to prefer the simpler/smaller operations like splat over the more complex operations like shuffles and swizzles. I'll change this order in a follow-up PR and add a comment there. | |
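To make the shuffle/swizzle distinction from the 1695 reply concrete, here is a minimal scalar model of the two Wasm instructions. It is a sketch for illustration only, not code from this patch: `V128`, `modelShuffle`, and `modelSwizzle` are hypothetical names, and the shuffle model assumes every mask entry is in the range 0..31, as the instruction's immediates must be.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a 128-bit Wasm vector viewed as 16 bytes.
using V128 = std::array<std::uint8_t, 16>;

// Model of i8x16.shuffle: the mask is a static array of indices.
// Entries 0..15 select a byte from `a`, entries 16..31 select from `b`.
// Assumption: every mask entry is < 32.
V128 modelShuffle(const V128 &a, const V128 &b, const V128 &mask) {
  V128 out{};
  for (int i = 0; i < 16; ++i) {
    std::uint8_t idx = mask[i];
    out[i] = idx < 16 ? a[idx] : b[idx - 16];
  }
  return out;
}

// Model of i8x16.swizzle: the indices are the runtime lanes of a second
// vector. Out-of-range indices (>= 16) produce a zero byte.
V128 modelSwizzle(const V128 &a, const V128 &indices) {
  V128 out{};
  for (int i = 0; i < 16; ++i) {
    std::uint8_t idx = indices[i];
    out[i] = idx < 16 ? a[idx] : static_cast<std::uint8_t>(0);
  }
  return out;
}
```

In this model the shuffle takes two data vectors plus a constant mask, while the swizzle takes one data vector and one vector of indices, which is why a BUILD_VECTOR whose lanes come from fixed positions of up to two sources is a candidate for a shuffle.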
llvm/test/CodeGen/WebAssembly/simd-concat.ll | ||
76 | v2i32 is only 64 bits, but it is represented in Wasm using the low 32 bits of each lane in an i64x2 vector. The i8x16.shuffle here pulls in those low 32 bits from each lane and leaves the unused high 32 bits behind. |
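The representation described in this reply, together with the index scaling mentioned in the 1640 reply, can be sketched as a scalar model. This is an illustration under stated assumptions rather than the lowering code itself: `packV2I32`, `concatMask`, `concatV2I32`, and the other helper names are hypothetical, byte order is little-endian as in Wasm, and the mask is derived from the explanation above rather than copied from the test file.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a 128-bit Wasm vector viewed as 16 bytes.
using V128 = std::array<std::uint8_t, 16>;

// Write a 32-bit value as 4 little-endian bytes starting at byteOffset.
void storeLE32(V128 &v, int byteOffset, std::uint32_t x) {
  for (int k = 0; k < 4; ++k)
    v[byteOffset + k] = static_cast<std::uint8_t>((x >> (8 * k)) & 0xFF);
}

// Read 4 little-endian bytes starting at byteOffset as a 32-bit value.
std::uint32_t loadLE32(const V128 &v, int byteOffset) {
  std::uint32_t x = 0;
  for (int k = 0; k < 4; ++k)
    x |= static_cast<std::uint32_t>(v[byteOffset + k]) << (8 * k);
  return x;
}

// A v2i32 held in an i64x2-shaped vector: each element occupies the low
// 32 bits of a 64-bit lane. The high halves are filled with 0xFF here to
// show that they never reach the result.
V128 packV2I32(std::uint32_t e0, std::uint32_t e1) {
  V128 v{};
  v.fill(0xFF);
  storeLE32(v, 0, e0);
  storeLE32(v, 8, e1);
  return v;
}

// Scale 64-bit source lane indices down to byte indices. Destination lane
// d (32 bits wide) takes the low 4 bytes of source lane d, where lanes
// 0..1 live in the first operand and lanes 2..3 in the second.
V128 concatMask() {
  V128 mask{};
  for (int dst = 0; dst < 4; ++dst)
    for (int k = 0; k < 4; ++k)
      mask[dst * 4 + k] = static_cast<std::uint8_t>(dst * 8 + k);
  return mask;
}

// Same byte-shuffle model as i8x16.shuffle; mask entries assumed < 32.
V128 modelShuffle(const V128 &a, const V128 &b, const V128 &mask) {
  V128 out{};
  for (int i = 0; i < 16; ++i)
    out[i] = mask[i] < 16 ? a[mask[i]] : b[mask[i] - 16];
  return out;
}

// Concatenate two v2i32 values into one v4i32 with a single shuffle.
V128 concatV2I32(const V128 &a, const V128 &b) {
  return modelShuffle(a, b, concatMask());
}
```

Under these assumptions the mask works out to bytes 0..3, 8..11, 16..19, and 24..27, so the four 32-bit elements land in consecutive lanes and the padding bytes are dropped.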
llvm/test/CodeGen/WebAssembly/simd-concat.ll | ||
---|---|---|
76 | Ah right, I mistook this for v4i32. Sorry. |