Standard -O0 IR and assembly can be hard to follow because, without
mem2reg, every local variable lives on the stack, and the resulting
loads and stores clutter the output and obscure the instructions of
actual interest (for example, when debugging a crash in unoptimised
code from the disassembly). It is therefore useful to be able to force
mem2reg to run even at -O0 to clean up most of those stack loads and
stores. There are also Clang CodeGen tests in the tree that explicitly
run mem2reg on the output to make the CHECK lines more readable, which
currently requires manually passing -disable-O0-optnone and piping to
opt; a flag that supports this directly makes those tests less clunky.
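
As a rough illustration of the clutter described above, consider the
small function below. The file name sum.c and the function sum_to are
hypothetical and not taken from the tree. The pipeline in the comment
is a sketch of the manual workaround, assuming a toolchain whose opt
accepts the new pass manager spelling -passes=mem2reg.

```c
/*
 * sum.c (hypothetical example file).
 *
 * At -O0, Clang gives the parameter n and the locals total and i a
 * stack slot (alloca) each. Every read becomes a load and every update
 * becomes a store, so the loop body is mostly stack traffic.
 *
 * Running mem2reg promotes those slots to SSA values, with phi nodes
 * for total and i at the loop header, leaving the add and the compare
 * as the visible work.
 *
 * Sketch of the manual pipeline (assumed invocation):
 *
 *   clang -O0 -Xclang -disable-O0-optnone -S -emit-llvm sum.c -o - \
 *     | opt -S -passes=mem2reg
 */
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i)
        total += i;
    return total;
}
```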

Whilst optimising for speed is not the primary purpose of this patch,
it does, as you might expect, give an easy and significant reduction in
code size: compiling Clang itself with the option enabled gives a ~12%
decrease on macOS/arm64, which likely also significantly improves the
running time of the test suite compared with a plain Debug build. On
GNU/Linux/amd64 the decrease is less pronounced, at about 4%, likely
because many instructions there can take one memory operand directly
and so do not pay for a separate load or store as they would on a
load-store architecture.