This is tackling the same issue as in http://reviews.llvm.org/D12096. Reassociate is currently unable to simplify expressions such as (2 * b - (5 * a - 3 * b))
As David Majnemer pointed out, running reassociate twice did simplify the same.
Redoing the intermediate instructions created while breaking up a subtract (Negating) can open up more opportunities for reassociation and in this case simplifies the above expression to 5 * (b - a)
Details
Diff Detail
Event Timeline
This approach looks reasonable to me but I'd appreciate it if @dberlin or @chandlerc could take a look.
Thanks for working on this, Aditya. I tend to agree with David; I much prefer this solution over the InstCombine equivalent. I added a few minor nits, but overall this looks good.
lib/Transforms/Scalar/Reassociate.cpp | ||
---|---|---|
886–890 | Perhaps, /// Also add intermediate instructions to the redo list that are modified while pushing the negates through adds. These will be revisited to see if additional opportunities have been exposed. | |
927 | open up -> expose | |
2110 | Perhaps something like: // If the negate was simplified, revisit the users to see if we can reassociate further. | |
2133 | Perhaps something like: // If the negate was simplified, revisit the users to see if we can reassociate further. | |
test/Transforms/Reassociate/reassoc-intermediate-fnegs.ll | ||
2 ↗ | (On Diff #33149) | Please use the CHECK-LABEL directive. |
17 ↗ | (On Diff #33149) | CHECK-LABEL: |
Thanks Chad. I will fix the comments shortly.
I think I might have found another convergence issue with reassociate. I expect when the pass finishes, running reassociate again should not make any changes (the output should already be in canonicalized form - please correct me if this is not a valid expectation). It currently takes three runs of reassociate for the output to converge while running secondary.ll (above change). Should this also be tackled in this change?
I think this should be the goal, but, as you're finding out, this isn't reality. IIRC, a similar question was asked and I believe David provided a similar comment. The approach you're taking seems to be moving us closer and that's a good thing. We just need to make sure we're not going overboard; we should only revisit things that have changed and only when that change is likely to expose other opportunities.
I see. I'll try to see what it takes for the above case to converge in one iteration.
The reason I asked is because I see reduction in instruction count (assembly) in several tests when I change the pass pipeline to have 2 reassociates(consecutive) vs just one. I will try and narrow down the patterns/cases which the second reassociate is exposing and/or cases which instcombine is missing.
First, you can get it to converge in O(size of the largest SCC of the
expressions being evaluated). Right now, reassociate does not look
through phi nodes, so that size will be 1 :)
Thus, it is possible to converge in one iteration with the right ordering.
Second, rather than spend lots of time on that, i would suggest other
approaches, Jingyu recently suggested a global reassociation algorithm
(https://docs.google.com/document/d/1momWzKFf4D6h8H3YlfgKQ3qeZy5ayvMRh6yR-Xn2hUE/edit#heading=h.pc7256itmioz)
(These are extensions of the existing n-ary reassociate)
This is likely a much better approach than trying to make the local
reassociation pass that reprocess things repeatedly, or even once,
because the *output* will be better :)
In particular,
A. I expect what is there now will not fixpoint in all cases, you'd
have to fix some things.
B. We already know the heuristics the local reassociation uses are
not only "bad for CSE" in a lot of cases, we know they are optimally
bad in a lot of cases
(IE it will transform things into the least canonical form).
For example:
; foo(a + c);
; foo((a + (b + c));
Reassociate on both:
RAIn: add i32 [ %a, #3] [ %c, #5]
RAOut: add i32 [ %c, #5] [ %a, #3]
and
for the second:
RAIn: add i32 [ %a, #3] [ %b, #4] [ %c, #5]
RAOut: add i32 [ %c, #5] [ %b, #4] [ %a, #3]
(IE c+ a, (c + b) + a)
The longer the expression, the worse it gets.
It is not possible to fix this without a global view of the
expressions, because you need to know "how i i pair the expressions
the last time i saw them" so you can pair them the same way.
Local reassociate, being a local algorithm, only has info about the
current expression chain, and thus, can't do something like this.
TL; DR While this patch seems great, doing a lot of work on local
reassociate is probably a mistake. Even if you get it to converge in
one iteration (which should be possible given processing), it'll still
give not great results in a lot of cases.
This was also all discussed a few times on the mailing list, if you
look back over threads mentioning reassociate.
Thanks Daniel. I definitely missed that conversation on the mailing list and I see the drawbacks of the local reassociate. I'll try seeing what little needs to be done to converge in the couple cases that I have (one above) and a superficial look at why the second reassociate improves codegen.
Is there already an implementation for global reassociate?
Ping? Was there any further feedback on this? We're seeing pretty horrible regressions caused by this.
I'm confused ;-)
The last status i saw was: "I'll try seeing what little needs to be done to
converge in the couple cases that I have (one above) and a superficial look
at why the second reassociate improves codegen."
I saw no update on that ;-)
and
"Is there already an implementation for global reassociate?"
Which i missed.
The answer to this is yes". N-Ary reassociate is already in tree, and
making it "better" shouldn't be that hard (if it turns out to be hard,
great, let's hack up local reassociate if we have to)
Would it be reasonable to stage that investigation? AFAICT the changes already proposed (with Chad's comments incorporated) are a strict improvement, and I at least am seeing pretty severe performance regressions from their absence. Would you be alright with going ahead and landing those?
and
"Is there already an implementation for global reassociate?"Which i missed.
The answer to this is yes". N-Ary reassociate is already in tree, and
making it "better" shouldn't be that hard (if it turns out to be hard,
great, let's hack up local reassociate if we have to)
For my use case, at least, N-ary reassociation is not really appropriate as I also heavily depend on reassociation of floating point arithmetic. So, I'm stuck with local reassociation, and the problem described here is making today's LLVM integer factors worse than last year's for me.
Thanks Owen. Sorry about not updating this. This patch caused some regressions on some tests and I hadn't fully figured out/isolated the regression.
What kinds of regressions? Can you replicate these by running Reassociate twice in an otherwise-standard pipeline? We really should get this figured out.
Danny, are you proposing that we extend N-ary reassociation to work on floating-point values?
There were both improvements in instruction count as well as reduction with this patch on our internal test suite.
Modifying our pass pipeline to do reassociate twice resulted in some differences in Instruction count. On investigating why the second reassociate made a difference, I found that sometimes, when we revisit instructions, valid instruction tree roots don't get simplified (for eg factorize) when they get revisited before dead instructions as there are additional uses(false).
This patch tries to erase dead instructions before we try and redo the instructions. This improves codegen slightly (lesser instruction count) and takes Reassociate pass closer to being Idempotent.
lib/Transforms/Scalar/Reassociate.cpp | ||
---|---|---|
613 | I assume this condition was removed because it will always be true, correct? If so, please commit this in isolation. | |
1931 | How about: Remove dead instructions and if any operands are trivially dead add them to | |
1934 | Ins -> Insts | |
1936 | Can we use a for loop to loop over all the Instruction operands, rather than pushing/poping each operand onto/off of a SetVector? | |
2277 | Please add a period. Comments should be written with proper capitalization, punctuation, etc. | |
2279 | How about something like: Iterate over all instructions to be reevaluated and remove trivially dead | |
2291 | Please add a period. | |
2295 | Please don't add extra curly brackets. |
lib/Transforms/Scalar/Reassociate.cpp | ||
---|---|---|
613 | Yes - I'll remove it and commit it separately |
LGTM once the minor nits have been fixed.
lib/Transforms/Scalar/Reassociate.cpp | ||
---|---|---|
186 | To conform to the surrounding coding style, would you mind adding argument names? | |
test/Transforms/Reassociate/factorize-again.ll | ||
2 | You can probably drop this comment. | |
4 | Please use a CHECK-LABEL directive. | |
27 | Drop comment | |
28 | Remove dead "#0" |
Aditya,
Once committed (r256773) please be sure to close the Phabricator review.
Chad
To conform to the surrounding coding style, would you mind adding argument names?