This fully fixes PR24804 and should return some serious sanity to the
loop pass pipeline. I've added comments to try to prevent folks from
accidentally breaking this, and there is now a test to catch changes
here as well.

However, this is a very substantial change. We now rely on
loop-simplifycfg and loop-instsimplify to do the mid-loop-pass cleanup.
These actually work with the loop pass pipeline but are nowhere near as
powerful as simplifycfg and instcombine. I've added a direct run of
those two immediately after the loop pipeline to try to make sure we
adequately clean up any cruft produced, but this isn't the same as
running them in the middle.

Overall, I *strongly* suspect this is a net win. It is the model we
want. If there are regressions, the correct fix will just be to enhance
loop-simplifycfg and loop-instsimplify (potentially making
a loop-instcombine variant if needed) until they catch the necessary
cases.

However, I don't actually have a good loop-heavy benchmark suite, so I'd
really appreciate it if folks who do could benchmark this change and see
what happens. If there are serious regressions, I can even take a pretty
naive stab at enhancing the two passes to be more powerful. But I'd like
to see if this is actually already enough to handle the real cases we
have today.

There is one test I had to update because it used -O1, and after this
change we actually avoid forming memset in the awkward position that it
wanted to test for (instead we pretty much completely nuke all the
code, so I'm personally happier with the end result). But to keep the
test relevant, I've taken the particular memset pattern formed
previously and put it directly into the test case to make sure we don't
somehow perturb it with GMR.

I'd add a reference to the PR in parentheses at the end, because
comments like "bad things can happen" tend to stay around forever with
no one remembering what they refer to.