The old behavior could cause arbitrarily bad memory usage in the DAG
combiner if there was heavy traffic of adding nodes already on the
worklist to it. This commit switches the DAG combine worklist to work
the same way as the instcombine worklist where we null-out removed
entries and only add new entries to the worklist. This results in
subtle, frustrating churn in the particular order in which DAG combines
are applied which causes a number of minor regressions where we fail to
match a pattern previously matched by accident. AFAICT, all of these
should be using AddToWorklist to directly or should be written in a less
brittle way. None of the changes seem drastically bad, and a few of the
changes seem distinctly better.
A major change required to make this work is to significantly harden the
way in which the DAG combiner handle nodes which become dead
(zero-uses). Previously, we relied on the ability to "priority-bump"
them on the combine worklist to achieve recursive deletion of these
nodes and ensure that the frontier of remaining live nodes all were
added to the worklist. Instead, I've introduced a routine to just
implement that precise logic with no indirection. It is a significantly
simpler operation than that of the combiner worklist proper. I suspect
this will also fix some other problems with the combiner.
Note that I have *NO IDEA* what the changes on any architecture other than x86
really imply, please check these for your target! I just don't know how to
evaluate them. I've literally transcribed the test case changes necessary to
pass, but it may be more useful to patch in this change and compare A/B to
understand the differences for a particular test case.
I think the x86 changes are really minor and uninteresting, but the avx512 at
least is hiding a "regression" (but the test case is just noise, not testing
some performance invariant) that might be looked into. Not sure if any of the
others impact specific "important" code paths, but they didn't look terribly
interesting to me, or the changes were really minor.
However, maybe this entire approach is just deeply flawed? What do folks think,
is this worthwhile?