If the LSTP instruction is inserted with an element count low enough to immediately predicate some lanes as false, this can have unintended effects on any subsequent MVE instructions in the preheader.
Looks good. Nits aside, I have one inline question asking you to explain why we are doing this, and I'd like to read the answer first.
nit: at first I thought this was a typo, but of course it isn't. Perhaps: LSTP -> [W|D]LSTP
Can you be a bit more specific, i.e. under exactly what conditions can't we insert it, and why are we doing this?
nit: perhaps cannotInsertLSTPBetween -> cannotInsertWDLSTPBetween
Thanks, perfectly clear, LGTM.
For bonus points, if you have ideas, perhaps add a TODO on how we could win some of the reverted loops back? Or is that next on your list to look at? From a quick look at one of the changed tests (I could be wrong), I see some loop start instructions sitting in the middle of the block for no good reason. Is there some low-hanging fruit there?
Thanks. Yes, that's next on my TODO list (why would I want to do anything else?!). What I'd really like to do is simplify the logic, remove all the code that moves instructions around, and just place the [D|W]LSTP as the last instruction in the preheader. We'll see how that goes...