This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Rearrange SizeReduction when using -Oz
ClosedPublic

Authored by NickGuy on Jun 24 2020, 12:44 AM.

Details

Summary

Move the Thumb2SizeReduce pass to before IfConversion when optimising
for minimal code size.

Running the Thumb2SizeReduction pass before IfConversion allows T1 instructions
to propagate to the final output, rather than the ifConverter modifying T2
instructions and preventing them from being reduced later.

This change does introduce a regression regarding execution time, so it's only
applied when optimising for size.

Running the LLVM Test Suite with this change produces a geomean
difference of -0.1% for the size..text metric.

LLVM Test suite results:
Tests: 310
Metric: size..text

ProgramInitialModifieddiff
test-suite.../Applications/sgefa/sgefa.test61526096-0.9%
test-suite...ks/Prolangs-C/agrep/agrep.test2410023884-0.9%
test-suite.../Benchmarks/Stanford/Perm.test524520-0.8%
test-suite...rks/FreeBench/mason/mason.test12321224-0.6%
test-suite.../Trimaran/enc-rc4/enc-rc4.test656652-0.6%
test-suite...itBench/uudecode/uudecode.test664660-0.6%
test-suite...Benchmarks/Stanford/Oscar.test13441336-0.6%
test-suite...ks/Shootout/Shootout-hash.test13601352-0.6%
test-suite...nch/fourinarow/fourinarow.test43364312-0.6%
test-suite...ch/g721/g721encode/encode.test38163796-0.5%
test-suite...itBench/uuencode/uuencode.test796792-0.5%
test-suite.../Benchmarks/Dhrystone/dry.test820816-0.5%
test-suite...count/automotive-bitcount.test18241816-0.4%
test-suite...marks/Olden/bisort/bisort.test18321824-0.4%
test-suite...CI_Purple/SMG2000/smg2000.test8990489512-0.4%

Geomean difference -0.1%

InitialModifieddiff
count310.000000310.000000310.000000
mean19649.87096819629.354839-0.000601
std48795.05345048742.7405880.001539
min292.000000292.000000-0.009103
25%1090.0000001090.000000-0.000648
50%2770.0000002772.0000000.000000
75%10185.00000010185.0000000.000000
max324040.000000323540.0000000.003704

Diff Detail

Event Timeline

NickGuy created this revision.Jun 24 2020, 12:44 AM
NickGuy edited the summary of this revision. (Show Details)Jun 24 2020, 1:08 AM

Sounds interesting. I've wanted to do the same thing for different reasons in the past too.

Can you update with full context? -U999999

NickGuy updated this revision to Diff 273002.Jun 24 2020, 5:38 AM
NickGuy edited the summary of this revision. (Show Details)

Reworded/improved summary, and included full patch context

This sounds good to me. And my results agree with your results that this is a nice improvement.

Ideally I think it would be better to do this all the time, not just for minsize. But my understanding from trying this a long time ago was that would mess up a lot of the schedulers we have that would take quite a bit of work to fix. It might be worth fixing them in the long run (with the way cortex-m cores can dual issue), but for codesize alone that shouldn't block you.

llvm/lib/Target/ARM/ARMTargetMachine.cpp
525–526

Can you combine this comment with the "in v8" one above.

llvm/test/CodeGen/ARM/t2-shrink-ldrpost.ll
32

Can you leave in a comment that says this function shouldn't produce a ldm. And maybe one above that says that we should produce a ldm. It sounds useful to keep that around for future reference.

But my understanding from trying this a long time ago was that would mess up a lot of the schedulers we have that would take quite a bit of work to fix.

You mean, the scheduling models don't handle Thumb1 instructions well? Or there an issue with the way the actual CPUs handle Thumb1 instructions?

You mean, the scheduling models don't handle Thumb1 instructions well?

Yes. Only certain pairs of 16-bit instructions can be dual issued, but we don't know if instructions are Thumb1 or not. So there's room for improvement here.

You mean, the scheduling models don't handle Thumb1 instructions well? Or there an issue with the way the actual CPUs handle Thumb1 instructions?

The models do not handle thumb1 instructions well because they have not come up when scheduling in the past. You can (I presume) always treat them just like the equivalent thumb2 instruction and get similar results, but it would take time to get right.

The cores can often dual issue certain combinations of thumb1 instructions, so properly scheduling them would be useful. We currently, especially pre-ra, have to try and guess at what might become a thumb1 instruction. For older cores this wasn't super interesting due to the exact instructions that could be dual issued but newer cores are always getting better. It is on my list to potentially do something about this, if I can find enough cases of it going wrong to make it look promising, seeing as it's only post-ra that we can easily fix.

NickGuy updated this revision to Diff 273686.Jun 26 2020, 5:21 AM
NickGuy marked 2 inline comments as done.

Addressing inline comments

dmgreen accepted this revision.Jun 29 2020, 1:22 AM

LGTM

llvm/lib/Target/ARM/ARMTargetMachine.cpp
521–523

Feel free to make this into more of a sentence too, with Capitalization and a full stop.

This revision is now accepted and ready to land.Jun 29 2020, 1:22 AM
NickGuy updated this revision to Diff 274722.Jul 1 2020, 2:05 AM
NickGuy marked an inline comment as not done.

Updates some comments. NFC when compared to prior diffs

NickGuy marked an inline comment as done.Jul 1 2020, 2:05 AM
This revision was automatically updated to reflect the committed changes.