As with other targets, set the throughput cost of control-flow instructions to free so that we don't miss out on vectorization opportunities.
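The change amounts to a small policy in the target's control-flow cost hook. The following is a minimal sketch of that policy in isolation. It assumes a reciprocal-throughput cost kind and a flag for whether the vector extension (MVE here) is enabled. `CostKind`, `cf_instr_cost` and the default cost of 1 are hypothetical stand-ins, not LLVM's `TargetTransformInfo` API.

```python
from enum import Enum, auto


class CostKind(Enum):
    """Hypothetical stand-in for the kinds of cost a target can be asked for."""
    RECIP_THROUGHPUT = auto()
    LATENCY = auto()
    CODE_SIZE = auto()


# Opcodes this sketch treats as control flow: branches, phis and returns.
CONTROL_FLOW_OPCODES = frozenset({"br", "phi", "ret"})


def cf_instr_cost(opcode: str, kind: CostKind, has_vector_ext: bool,
                  default_cost: int = 1) -> int:
    """Cost of a control-flow instruction under the policy sketched here.

    Throughput queries on a subtarget with vector support return 0, so the
    vectorizer is not put off by branches and phis; every other query keeps
    the (assumed) default cost.
    """
    if opcode not in CONTROL_FLOW_OPCODES:
        raise ValueError(f"not a control-flow opcode: {opcode!r}")
    if kind is CostKind.RECIP_THROUGHPUT and has_vector_ext:
        return 0
    return default_cost
```

With this policy, `cf_instr_cost("br", CostKind.RECIP_THROUGHPUT, has_vector_ext=True)` returns 0, the same query with `has_vector_ext=False` returns 1, and code-size queries are unaffected.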
I kind of think this should be the default (plus it's perhaps a little strange for -march=thumbv8.1-m to have a different branch cost from -march=thumbv8.1-m+mve).
But it does fix some regressions, so LGTM!
If I build up the will, I'll take a look at the vectorizer and figure out how it's using these costs... I'm assuming it's the cost of the phi that causes the problems; since a phi represents a register, making it 'free' sounds counter-intuitive.
Do you mean it's the cost of a phi that is altering things? They certainly sound like they should be free most of the time. Even for code size I would expect them to be folded away a lot of the time. You just have to get the inputs to share a register, after all.
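The point about phis folding away can be illustrated with a toy model of out-of-SSA lowering. A phi becomes one copy on each incoming edge, and a copy disappears whenever the incoming value was assigned the same register as the phi. This is a minimal sketch under that simplified model. The function name and register names are illustrative only.

```python
from typing import Sequence


def copies_after_phi_elimination(phi_reg: str, incoming_regs: Sequence[str]) -> int:
    """Count the register moves left once a phi is lowered out of SSA form.

    Lowering turns the phi into one copy per incoming edge (incoming value
    -> phi register). A copy vanishes when the register allocator has given
    the incoming value the same register as the phi, i.e. it was coalesced.
    """
    return sum(1 for reg in incoming_regs if reg != phi_reg)
```

A phi in `q0` whose incoming values are both in `q0` costs nothing after lowering. If one input lives in `q1`, a single move remains.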
> Do you mean it's the cost of a phi that is altering things?
I believe so: I figure there are plenty more of them than branches and returns, so they are more likely to affect the vectorizer. I think register usage should still carry some cost, because registers are an important physical resource, but I suspect that the vectorizer is taking the scalar cost and multiplying it by the vectorization factor, even though we're likely to still only use a single register.
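The suspected multiplication can be made concrete with a toy per-lane cost comparison. This sketch only models the hypothesis in the comment. It is not LLVM's LoopVectorize cost code, and the function names, costs and loop shape are all assumed for illustration. `scale_phis_by_vf=True` charges each vector phi VF times the scalar phi cost, and `False` charges it once as a single vector register.

```python
from typing import Sequence


def scalar_cost_per_iteration(op_costs: Sequence[int], num_phis: int,
                              phi_cost: int) -> int:
    """Cost of one scalar iteration: the loop's operations plus its phis."""
    return sum(op_costs) + num_phis * phi_cost


def vector_cost_per_lane(vector_op_costs: Sequence[int], num_phis: int,
                         phi_cost: int, vf: int, scale_phis_by_vf: bool) -> float:
    """Cost of one vector iteration divided by the vf lanes it covers.

    With scale_phis_by_vf=True each vector phi is charged vf times the scalar
    phi cost (the suspected behaviour). With False it is charged once,
    since a vector phi still occupies a single vector register.
    """
    phi_multiplier = vf if scale_phis_by_vf else 1
    total = sum(vector_op_costs) + num_phis * phi_cost * phi_multiplier
    return total / vf
```

Consider a loop with a load, an add and a store (cost 1 each in both forms), two phis and VF 4. The scalar iteration costs 5. With a phi cost of 1, the suspected accounting gives 2.75 per lane against 1.25 for the single-register accounting. With a phi cost of 0 both give 0.75, which is one way a zero cost could hide such an issue rather than fix it.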
Registers are free unless you go over a limit, and I wouldn't expect these individual cost functions to be the ones that try to guess whether we're over that limit.
The thing about multiplying by the vector factor sounds like it might well be an issue. I would guess no-one would have run into that in the past if it was.
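One reading of "registers are free unless you go over a limit" is that per-instruction costs ignore registers entirely. Under that reading, a single plan-level check rejects candidates whose peak number of live vector values exceeds the register file. This is a minimal sketch under that assumption. `plan_cost`, its parameters and the budget of 8 used in the example are illustrative, not any compiler's actual interface.

```python
from typing import Optional, Sequence


def plan_cost(instr_costs: Sequence[int], max_live_vector_values: int,
              num_vector_regs: int) -> Optional[int]:
    """Total cost of a candidate plan, or None if it would run out of registers.

    Individual instruction costs (phis included) say nothing about register
    pressure; the register limit is checked once, for the plan as a whole.
    """
    if max_live_vector_values > num_vector_regs:
        return None  # over the limit: expect spills, reject this plan
    return sum(instr_costs)
```

For instance, `plan_cost([1, 0, 1, 0], 5, 8)` returns 2, with the zeros standing in for free phis and branches. Raising the peak live count to 9 returns `None`.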