[ARM] Allow v6m runtime loop unrolling
Closed · Public

Authored by dmgreen on Mar 30 2021, 6:30 AM.

Details

Summary

This removes the restriction that only Thumb2 targets enable runtime loop unrolling, allowing it for Thumb1-only cores as well. The existing T2 heuristics are used (for the time being) to control when and how unrolling is performed.
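
For readers unfamiliar with the term, runtime loop unrolling applies to loops whose trip count is only known at run time. The following is a minimal hand-written sketch of the idea, not the code this patch produces: the function names and the unroll factor of 4 are assumptions chosen for illustration, and the loops the compiler actually emits are shaped by the target heuristics mentioned above.

```cpp
#include <cstddef>

// Reference loop: the trip count n is only known at run time.
int sum_rolled(const int *a, std::size_t n) {
  int s = 0;
  for (std::size_t i = 0; i < n; ++i)
    s += a[i];
  return s;
}

// Illustrative equivalent after unrolling by 4 (factor is an assumption).
// The main loop handles four elements per iteration, which cuts the
// number of compare-and-branch steps; a remainder loop handles the
// leftover 0 to 3 elements.
int sum_unrolled4(const int *a, std::size_t n) {
  int s = 0;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    s += a[i];
    s += a[i + 1];
    s += a[i + 2];
    s += a[i + 3];
  }
  for (; i < n; ++i) // remainder iterations
    s += a[i];
  return s;
}
```

Both functions return the same result for any n; the trade-off is fewer branches per element against larger code size, which is why the heuristics that decide when to unroll matter on small cores.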

Event Timeline

dmgreen created this revision. Mar 30 2021, 6:30 AM
dmgreen requested review of this revision. Mar 30 2021, 6:30 AM
Herald added a project: Restricted Project. Mar 30 2021, 6:30 AM

Just a query on the context of this work: this wasn't enabled at that time because of some regressions. How does that look now? Does this work rely on some fixes to address that, or has the picture changed?

I'm not sure exactly why T1 unrolling wasn't enabled in the past. I think it was causing more trouble than it was worth and, not being a focus at the time, was dropped fairly early. The extra tuning done for T2 since then would also have helped keep T1 from regressing.

As with any change like this, some things are better and a few things are worse. In general the performance looks good though (I would not have suggested it if it didn't!). I've made a few minor changes elsewhere, but they were fairly generic, not v6m specific. There is probably more that we could get out of it by tuning in places, and I was contemplating whether to try and tune that now or to get this in and work from there. All the geomeans of the benchmarks I've run are looking healthy though.

SjoerdMeijer accepted this revision. Apr 1 2021, 2:39 AM

Nice one, thanks.

This revision is now accepted and ready to land. Apr 1 2021, 2:39 AM
This revision was automatically updated to reflect the committed changes.