The shift amount operand can be provided in a general purpose register so sink it. Patterns have been added for lshr and ashr where we use a negated shift value for with a 'shift left' instruction.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/ARM/ARMInstrMVE.td | ||
---|---|---|
4072 | I think all of this should be happening in a DAG combine, to give the negate a chance to fold into something else. It's generally not a good idea to create multiple outputs in a tablegen pattern. Ideally thy should be one/multi inputs to one output, with the other optimisations happening in dag combine to allow extra folding to happen. I guess ideally this would happen much earlier, like in instcombine to allow all that folding goodness. Not sure how to make that happen sensibly though. |
llvm/lib/Target/ARM/ARMInstrMVE.td | ||
---|---|---|
4072 | I thought that these negations were generated by us in lowering/combining? I haven't looked much into the lowering code, but I really would hope that we could remove some of the custom nodes... So I would be very hesitant to introduce yet more custom nodes, to get around the custom nodes that they've already introduced! I also don't see why more c++ would be better than having all the related patterns here, in a concise readable format. |
llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
14746 | This bit looks like a good change on its own, if you want to split that out ( with the tests) and commit it separately. | |
llvm/lib/Target/ARM/ARMInstrMVE.td | ||
4072 | I'm not sure which custom nodes you mean. There was a SPLATVECTOR added recently, which could replace ARMvdup. That would be good to use if it works enough yet (when I looked it seemed like it might only work for scalables, not sure). Shifts are a bit more awkward though, with the negated conditions. If you can make them work then that sounds useful, but I have a memory of trying that initially but eventually just using the custom nodes, the same as neon. My understanding is that, at least in isel, tablegen is considered only as the last step. There is no folding that happens after it, on machine instructions. (Which would be useful in places, but sounds like a lot of work, having to reimplement it for all targets). So the graph going into lowering needs to already be optimised. Unlike with legalising you can't just make a mess and have it cleaned up after. If you generate a shift _and_ a rsb, that's what you end up with even if the rsb can be folded into something else. We just need to give it that chance to be folded and that can only happen in dag combine. So I think there should be a (vdup(neg)) -> (neg(vdup)) combine. It shouldn't be too much code, as far as I understand. It might even be less than all this tablegen. |
Nice one. Yeah, that was what I meant. LGTM
llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
11748 | We have a IsZeroVector in here now, which may help simplify some of this. |
We have a IsZeroVector in here now, which may help simplify some of this.