This is an archive of the discontinued LLVM Phabricator instance.

[ARM][MVE] Sink vector shift operand
ClosedPublic

Authored by samparker on Nov 29 2019, 12:55 AM.

Details

Summary

The shift amount operand can be provided in a general purpose register so sink it. Patterns have been added for lshr and ashr where we use a negated shift value for with a 'shift left' instruction.

Diff Detail

Event Timeline

samparker created this revision.Nov 29 2019, 12:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 29 2019, 12:55 AM
This revision is now accepted and ready to land.Nov 29 2019, 4:27 AM
dmgreen added inline comments.Nov 30 2019, 5:31 AM
llvm/lib/Target/ARM/ARMInstrMVE.td
4072

I think all of this should be happening in a DAG combine, to give the negate a chance to fold into something else.

It's generally not a good idea to create multiple outputs in a tablegen pattern. Ideally thy should be one/multi inputs to one output, with the other optimisations happening in dag combine to allow extra folding to happen.

I guess ideally this would happen much earlier, like in instcombine to allow all that folding goodness. Not sure how to make that happen sensibly though.

samparker marked an inline comment as done.Dec 2 2019, 5:02 AM
samparker added inline comments.
llvm/lib/Target/ARM/ARMInstrMVE.td
4072

I thought that these negations were generated by us in lowering/combining? I haven't looked much into the lowering code, but I really would hope that we could remove some of the custom nodes... So I would be very hesitant to introduce yet more custom nodes, to get around the custom nodes that they've already introduced! I also don't see why more c++ would be better than having all the related patterns here, in a concise readable format.

dmgreen added inline comments.Dec 3 2019, 12:19 AM
llvm/lib/Target/ARM/ARMISelLowering.cpp
14746

This bit looks like a good change on its own, if you want to split that out ( with the tests) and commit it separately.

llvm/lib/Target/ARM/ARMInstrMVE.td
4072

I'm not sure which custom nodes you mean. There was a SPLATVECTOR added recently, which could replace ARMvdup. That would be good to use if it works enough yet (when I looked it seemed like it might only work for scalables, not sure). Shifts are a bit more awkward though, with the negated conditions. If you can make them work then that sounds useful, but I have a memory of trying that initially but eventually just using the custom nodes, the same as neon.

My understanding is that, at least in isel, tablegen is considered only as the last step. There is no folding that happens after it, on machine instructions. (Which would be useful in places, but sounds like a lot of work, having to reimplement it for all targets). So the graph going into lowering needs to already be optimised. Unlike with legalising you can't just make a mess and have it cleaned up after.

If you generate a shift _and_ a rsb, that's what you end up with even if the rsb can be folded into something else. We just need to give it that chance to be folded and that can only happen in dag combine.

So I think there should be a (vdup(neg)) -> (neg(vdup)) combine. It shouldn't be too much code, as far as I understand. It might even be less than all this tablegen.

samparker updated this revision to Diff 232503.Dec 6 2019, 12:32 AM

Now flipping the vdup and sub to reuse existing patterns.

dmgreen accepted this revision.Dec 6 2019, 1:21 AM

Nice one. Yeah, that was what I meant. LGTM

llvm/lib/Target/ARM/ARMISelLowering.cpp
11748

We have a IsZeroVector in here now, which may help simplify some of this.

samparker marked an inline comment as done.Dec 11 2019, 11:36 PM
This revision was automatically updated to reflect the committed changes.