Implements custom lowering logic that finds the optimal value for the
initial splat of the vector and either uses it or uses v128.const if
it is available and if it would produce smaller code. This logic
replaces large TableGen ISEL patterns that would lower all non-splat
BUILD_VECTORs into a splat followed by a fixed number of replace_lane
instructions. This CL fixes PR39685.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
The tests are a WIP, but I thought I'd get this uploaded before they are done because there is a fair amount of relatively complex code to review. I manually verified that all the tests behave as expected.
Sorry for the delay! Some nits and questions.
- How should undef elements be initialized? This patch looks it doesn't care about which number they are initialized with, whereas we initialize them with 0 in scalars. Is this OK?
lib/Target/WebAssembly/WebAssemblyISelLowering.cpp | ||
---|---|---|
1137 ↗ | (On Diff #182184) | How about V.getOpcode() == ISD::Constant || V.getOpcode() == ISD::ConstantFP ? ConstantSDNode and ConstantFPSDNode's definitions seem to include ISD::TargetConstant and ISD::TargetConstantFP, whose definitions sound like we shouldn't do any optimizations on them. |
1144 ↗ | (On Diff #182184) | Variable names should start with an uppercase letter. I know it looks especially weird for loop index variables ;( But anyway.. the same for all other for loops too. |
1144 ↗ | (On Diff #182184) | How about for (const SDValue &Op : Op.op_values()) ? For the other for loops too |
1147 ↗ | (On Diff #182184) | This is not being used anywhere. Were you gonna use it for something? Otherwise we can delete this? |
1169 ↗ | (On Diff #182184) | Can all these byte calculations change if we take LEB encoding into account? |
1173 ↗ | (On Diff #182184) | In which case you do splat for non-const arguments after using v128.const? |
1174 ↗ | (On Diff #182184) | For replace_lane, why is it 2? I guess 2 bytes for the opcode and a byte for immediate lane index, so 3? |
1184 ↗ | (On Diff #182184) | Why is this LaneConstBytes other than LaneDynBytes? Isn't LaneDynBytes the bytes needed to replace_lane instruction? |
test/CodeGen/WebAssembly/simd-build-vector.ll | ||
116 ↗ | (On Diff #182184) | Looking at the code above, I wonder how we ended up with a local.get from an unassigned local here for all undef case. Actually it works beautifully, given that unassigned locals are zero-initialized by default and it is even less space than splat 0. Maybe this can be an optimization in which we try to replace all v128.const 0, ..., 0 and splat 0 with local.get n where n being an assigned local. Just a thought and not relevant to this CL. |
test/CodeGen/WebAssembly/simd-build-vector.ll | ||
---|---|---|
116 ↗ | (On Diff #182184) | Hmm, it seems ExplicitLocals pass turns all unstackified registers to locals in which undefined registers become local.get from unassigned locals. And I don't think the current SIMD backend would prefer v128.const 0, ..., 0 over splat 0 anyway, right? So sorry, nevermind. :) |
All the inline comments disappeared because I switched to using the monorepo, which has different paths.
- How should undef elements be initialized? This patch looks it doesn't care about which number they are initialized with, whereas we initialize them with 0 in scalars. Is this OK?
Undef elements can be initialized with anything. We use zero here and elsewhere because there's no better default.
lib/Target/WebAssembly/WebAssemblyISelLowering.cpp | ||
---|---|---|
1169 ↗ | (On Diff #182184) | Yes, LEB encoding can make any SIMD opcode arbitrarily long. I'm trying to use the smallest encoding in these calculations on the assumption that tools will always prefer the smallest encoding. |
1173 ↗ | (On Diff #182184) | After using v128.const, everything else would be a replace_lane, not a splat. This value is also used for the initial splat when v128.const is not used, though. |
1174 ↗ | (On Diff #182184) | Yes, good catch. |
1184 ↗ | (On Diff #182184) | I assume (for a lack of better information) that all dynamic values are already on the stack, but all constants need to be materialized immediately before they are used. That means that for constant replace_lanes I count the replace_lane instruction and also the constant lane value. |
I assume (for a lack of better information) that all dynamic values are already on the stack, but all constants need to be materialized immediately before they are used. That means that for constant replace_lanes I count the replace_lane instruction and also the constant lane value.
I guess for dynamic splats and consts we should add bytes for opcode for local.get and local number immediate? Because whatever that value is, most likely that value would not be on the top of the stack. For example, for dynamic replace_lanes, we start the code sequence with splat for v128.const, and by the time we need replace_lane, not the dynamic value but the result of v128.const or splat will be on the top of the stack. Also when you begin the sequence with a splat with dynamic value, that value might be on top of the stack, but more likely it's not.
We talked in person and I realized I forgot replace_lane also takes a vector argument. Thanks!