This removes the restriction that only Thumb2 targets enable runtime loop unrolling, allowing it for Thumb1-only cores as well. The existing T2 heuristics are used (for the time being) to control when and how unrolling is performed.
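As a rough illustration of the gating change described above, here is a minimal sketch. It is not the code from the patch: `SubtargetModel`, its fields, and both functions are hypothetical names invented for this example, and the real check in the ARM backend may be shaped differently.

```cpp
// Hypothetical model of a subtarget. These names are illustrative only and
// are not LLVM's actual ARMSubtarget API.
struct SubtargetModel {
  bool IsThumb;    // target executes Thumb code
  bool HasThumb2;  // false for a Thumb1-only core
};

// Assumed shape of the old gate: runtime unrolling only on Thumb2 targets.
constexpr bool runtimeUnrollEnabledBefore(const SubtargetModel &ST) {
  return ST.IsThumb && ST.HasThumb2;
}

// Assumed shape of the new gate: the Thumb2 restriction is dropped, so
// Thumb1-only cores qualify as well. The unrolling heuristics themselves
// are unchanged and are not modelled here.
constexpr bool runtimeUnrollEnabledAfter(const SubtargetModel &ST) {
  return ST.IsThumb;
}
```

Under this model, a Thumb1-only core (`IsThumb` set, `HasThumb2` clear) is rejected by the old gate and accepted by the new one, while Thumb2 targets behave the same either way.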
Event Timeline
Just a query on the context of this work: this wasn't enabled at that time because of some regressions. How does that look now? Does this work rely on some fixes to address that, or has the picture changed?
I'm not sure exactly why T1 unrolling wasn't enabled in the past. I think it was causing more trouble than it was worth and, since it was not a focus at the time, it was dropped fairly early. The extra tuning done for T2 since then should have helped keep T1 from regressing too.
As with any change like this, some things are better and a few things are worse. In general the performance looks good though (I would not have suggested it if it didn't!). I've made a few minor changes elsewhere, but they were fairly generic, not v6m specific. There is probably more we could get out of it by tuning in places, and I was contemplating whether to tune that now or to get this in and work from there. All the geomeans of the benchmarks I've run are looking healthy though.