This change set enables the post-RA scheduler for pentium4 and SSE3-class CPUs.
The intent is that a vanilla "clang -m32 -O2" compilation, with no specific
-march setting, gets post-RA scheduling. This has demonstrated significant
performance improvements when the code is run on Silvermont, with
essentially neutral performance on AVX2 systems.
Silvermont highlights include improvements of over 20%
in 456.hmmer and several EEMBC benchmarks, and improvements of
4 to 15% in over a dozen other industry benchmarks.
There were only a few regressions, in the 2 to 4% range.
On an AVX2 system, performance was generally flat, with
a balance of small gains and losses (in the 2 to 4% range).
The key scheduling improvement is that loads get separated
from their users.
As coded, the change should not affect "clang -m64" compilations.
With -m32, clang currently defaults to -mcpu=pentium4,
so I changed the scheduling model for pentium4 from
GenericModel to a new GenericPostRAModel (same properties,
but with the PostRAScheduler enabled).
As clang's default CPU could theoretically change, I also
made the same change for the "nearby" CPUs pentium-m, pentium4m,
prescott and nocona. Arguably yonah should be in this set
as well, for consistency, but I'd like to get feedback on
whether this is an OK approach overall first, as changing yonah
will affect many lit tests. I suspect most of these older CPUs
are no longer used in practice, but I don't know for certain.
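
For reviewers who want to picture the model change described above, here is
a minimal TableGen sketch, not the actual diff. It assumes the existing
GenericModel properties are moved into a shared base class so that both
models can reuse them. The class name GenericX86Model is hypothetical and
used only for illustration. PostRAScheduler is the existing
SchedMachineModel field, and the other GenericModel properties are left out
here on purpose.

```
// Sketch only. GenericX86Model is a hypothetical name for a shared base
// class; the existing GenericModel properties would be carried over
// unchanged (not reproduced here).
class GenericX86Model : SchedMachineModel {
  let PostRAScheduler = 0;
}

// Existing generic model: same properties as before, no post-RA scheduling.
def GenericModel : GenericX86Model;

// New model: identical to GenericModel except that the post-RA
// scheduler is enabled.
def GenericPostRAModel : GenericX86Model {
  let PostRAScheduler = 1;
}
```

Under this sketch, the pentium4, pentium-m, pentium4m, prescott and nocona
processor definitions would name GenericPostRAModel as their scheduling
model in place of GenericModel, with their feature lists left as they are.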