These changes introduce a local stack symbol table ordering phase that gives every target a chance to order its stack symbols the way it prefers.
X86 heuristics are added to order the symbols for better code size and locality. The default behavior for all other targets is to leave the order untouched; any other target can apply its own ordering simply by providing target-specific heuristics.
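The following C++ sketch illustrates a per-target ordering hook: a default that keeps the original order, and an x86-style heuristic. It assumes (this is not stated above) that the x86 heuristic favors objects with many uses per byte and places them nearest the base register, so that their offsets are more likely to fit in a 1-byte displacement (-128..127) instead of a 4-byte one. `StackObject`, `orderDefault`, `orderByDensity`, `dispBytes`, and `totalDispBytes` are hypothetical names, not LLVM APIs. Alignment and the encodings where a zero displacement can be omitted are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one local stack object (not LLVM's API).
struct StackObject {
  int id;
  uint64_t size;      // size in bytes, must be > 0
  uint64_t useCount;  // number of instructions referencing the object
};

// Default hook: leave the order untouched, as other targets do.
std::vector<StackObject> orderDefault(std::vector<StackObject> objs) {
  return objs;
}

// Hypothetical x86-style heuristic: highest density (uses per byte) first,
// so frequently referenced objects get the smallest offsets.
std::vector<StackObject> orderByDensity(std::vector<StackObject> objs) {
  for (const StackObject &o : objs)
    assert(o.size > 0 && "comparator below requires positive sizes");
  std::stable_sort(objs.begin(), objs.end(),
                   [](const StackObject &a, const StackObject &b) {
                     // a.useCount / a.size > b.useCount / b.size, computed
                     // by cross-multiplication to avoid floating point.
                     return a.useCount * b.size > b.useCount * a.size;
                   });
  return objs;
}

// Displacement bytes an x86 memory operand needs for a given offset
// (simplified: disp8 if it fits in a signed byte, otherwise disp32).
int dispBytes(int64_t offset) {
  return (offset >= -128 && offset <= 127) ? 1 : 4;
}

// Total displacement bytes when objects are laid out contiguously from
// offset 0 in the given order and every use pays for its displacement.
uint64_t totalDispBytes(const std::vector<StackObject> &objs) {
  uint64_t total = 0;
  int64_t offset = 0;
  for (const StackObject &o : objs) {
    total += o.useCount * static_cast<uint64_t>(dispBytes(offset));
    offset += static_cast<int64_t>(o.size);
  }
  return total;
}
```

For example, with a 200-byte object used once followed by a 4-byte object used ten times, the default order costs 41 displacement bytes under this simplified model, while the density order moves the small, hot object first and costs 11.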
As an example, here are some CPU2000 code size improvements that I measured (mileage may vary depending on the options used). Numbers are the percentage reduction in the sum of the text sizes of all objects:
Benchmark	Reduction
177.mesa	1.81%
179.art	0.50%
183.equake	1.78%
188.ammp	3.75%
164.gzip	0.79%
175.vpr	0.85%
176.gcc	0.70%
181.mcf	0.26%
186.crafty	0.47%
197.parser	0.29%
252.eon	2.47%
253.perl	0.47%
254.gap	0.55%
255.vortex	1.14%
256.bzip2	0.64%
300.twolf	0.88%
total	1.24%
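The metric used in the table, the percentage reduction in the summed text size of a set of objects, can be computed as in this minimal sketch. `textSizeReductionPercent` is a hypothetical helper, and any per-object sizes fed to it in an example are placeholders, not the measurements above.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Percentage reduction in summed text size. `before` and `after` hold
// per-object text sizes in bytes; only their sums matter.
double textSizeReductionPercent(const std::vector<uint64_t> &before,
                                const std::vector<uint64_t> &after) {
  const double sumBefore = std::accumulate(before.begin(), before.end(), 0.0);
  const double sumAfter = std::accumulate(after.begin(), after.end(), 0.0);
  assert(sumBefore > 0.0 && "baseline text size must be non-zero");
  return 100.0 * (sumBefore - sumAfter) / sumBefore;
}
```

Because the sizes are summed before taking the ratio, larger objects weigh more in the result than smaller ones.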
I measured performance on CPU2000, CPU2006, EEMBC, and a few other benchmarks, and it was essentially flat, although on average these changes should also improve data locality.
Many lit tests broke because new offsets were assigned to local symbols. I fixed these by disabling the optimization in those tests (rather than updating them with the new offsets) to avoid additional flakiness whenever the heuristics are tuned.

Please clang-format this.