This is an archive of the discontinued LLVM Phabricator instance.

[Reassociate]: Add intermediate subtract instructions created while negating to be redone later for more reassociate opportunities
ClosedPublic

Authored by aditya_nandakumar on Aug 25 2015, 4:21 PM.

Details

Summary

This is tackling the same issue as in http://reviews.llvm.org/D12096. Reassociate is currently unable to simplify expressions such as (2 * b - (5 * a - 3 * b))
As David Majnemer pointed out, running reassociate twice did simplify the same.
Redoing the intermediate instructions created while breaking up a subtract (Negating) can open up more opportunities for reassociation and in this case simplifies the above expression to 5 * (b - a)

Diff Detail

Repository
rL LLVM

Event Timeline

aditya_nandakumar retitled this revision from to [Reassociate]: Add intermediate subtract instructions created while negating to be redone later for more reassociate opportunities.
aditya_nandakumar updated this object.
aditya_nandakumar set the repository for this revision to rL LLVM.
majnemer edited edge metadata.

This approach looks reasonable to me but I'd appreciate it if @dberlin or @chandlerc could take a look.

Thanks for working on this, Aditya. I tend to agree with David; I much prefer this solution over the InstCombine equivalent. I added a few minor nits, but overall this looks good.

lib/Transforms/Scalar/Reassociate.cpp
896

Perhaps,

/// Also add intermediate instructions to the redo list that are modified while pushing the negates through adds. These will be revisited to see if additional opportunities have been exposed.

936

open up -> expose

2106

Perhaps something like:

// If the negate was simplified, revisit the users to see if we can reassociate further.
2130

Perhaps something like:

// If the negate was simplified, revisit the users to see if we can reassociate further.

test/Transforms/Reassociate/reassoc-intermediate-fnegs.ll
2

Please use the CHECK-LABEL directive.

17

CHECK-LABEL:

Thanks Chad. I will fix the comments shortly.
I think I might have found another convergence issue with reassociate. I expect when the pass finishes, running reassociate again should not make any changes (the output should already be in canonicalized form - please correct me if this is not a valid expectation). It currently takes three runs of reassociate for the output to converge while running secondary.ll (above change). Should this also be tackled in this change?

I expect when the pass finishes, running reassociate again should not make any changes (the output should already be in canonicalized form - please correct me if this is not a valid expectation).

I think this should be the goal, but, as you're finding out, this isn't reality. IIRC, a similar question was asked and I believe David provided a similar comment. The approach you're taking seems to be moving us closer and that's a good thing. We just need to make sure we're not going overboard; we should only revisit things that have changed and only when that change is likely to expose other opportunities.

I see. I'll try to see what it takes for the above case to converge in one iteration.
The reason I asked is because I see reduction in instruction count (assembly) in several tests when I change the pass pipeline to have 2 reassociates(consecutive) vs just one. I will try and narrow down the patterns/cases which the second reassociate is exposing and/or cases which instcombine is missing.

dberlin edited edge metadata.Aug 26 2015, 1:34 PM
dberlin added a subscriber: dberlin.

First, you can get it to converge in O(size of the largest SCC of the
expressions being evaluated). Right now, reassociate does not look
through phi nodes, so that size will be 1 :)
Thus, it is possible to converge in one iteration with the right ordering.

Second, rather than spend lots of time on that, i would suggest other
approaches, Jingyu recently suggested a global reassociation algorithm
(https://docs.google.com/document/d/1momWzKFf4D6h8H3YlfgKQ3qeZy5ayvMRh6yR-Xn2hUE/edit#heading=h.pc7256itmioz)

(These are extensions of the existing n-ary reassociate)

This is likely a much better approach than trying to make the local
reassociation pass that reprocess things repeatedly, or even once,
because the *output* will be better :)

In particular,
A. I expect what is there now will not fixpoint in all cases, you'd
have to fix some things.
B. We already know the heuristics the local reassociation uses are
not only "bad for CSE" in a lot of cases, we know they are optimally
bad in a lot of cases

(IE it will transform things into the least canonical form).

For example:

; foo(a + c);
; foo((a + (b + c));

Reassociate on both:

RAIn: add i32 [ %a, #3] [ %c, #5]
RAOut: add i32 [ %c, #5] [ %a, #3]

and
for the second:
RAIn: add i32 [ %a, #3] [ %b, #4] [ %c, #5]
RAOut: add i32 [ %c, #5] [ %b, #4] [ %a, #3]

(IE c+ a, (c + b) + a)

The longer the expression, the worse it gets.

It is not possible to fix this without a global view of the
expressions, because you need to know "how i i pair the expressions
the last time i saw them" so you can pair them the same way.

Local reassociate, being a local algorithm, only has info about the
current expression chain, and thus, can't do something like this.

TL; DR While this patch seems great, doing a lot of work on local
reassociate is probably a mistake. Even if you get it to converge in
one iteration (which should be possible given processing), it'll still
give not great results in a lot of cases.

This was also all discussed a few times on the mailing list, if you
look back over threads mentioning reassociate.

Thanks Daniel. I definitely missed that conversation on the mailing list and I see the drawbacks of the local reassociate. I'll try seeing what little needs to be done to converge in the couple cases that I have (one above) and a superficial look at why the second reassociate improves codegen.
Is there already an implementation for global reassociate?

Ping? Was there any further feedback on this? We're seeing pretty horrible regressions caused by this.

I'm confused ;-)

The last status i saw was: "I'll try seeing what little needs to be done to
converge in the couple cases that I have (one above) and a superficial look
at why the second reassociate improves codegen."

I saw no update on that ;-)

and
"Is there already an implementation for global reassociate?"

Which i missed.

The answer to this is yes". N-Ary reassociate is already in tree, and
making it "better" shouldn't be that hard (if it turns out to be hard,
great, let's hack up local reassociate if we have to)

The last status i saw was: "I'll try seeing what little needs to be done to
converge in the couple cases that I have (one above) and a superficial look
at why the second reassociate improves codegen."

I saw no update on that ;-)

Would it be reasonable to stage that investigation? AFAICT the changes already proposed (with Chad's comments incorporated) are a strict improvement, and I at least am seeing pretty severe performance regressions from their absence. Would you be alright with going ahead and landing those?

and
"Is there already an implementation for global reassociate?"

Which i missed.

The answer to this is yes". N-Ary reassociate is already in tree, and
making it "better" shouldn't be that hard (if it turns out to be hard,
great, let's hack up local reassociate if we have to)

For my use case, at least, N-ary reassociation is not really appropriate as I also heavily depend on reassociation of floating point arithmetic. So, I'm stuck with local reassociation, and the problem described here is making today's LLVM integer factors worse than last year's for me.

Thanks Owen. Sorry about not updating this. This patch caused some regressions on some tests and I hadn't fully figured out/isolated the regression.

Thanks Owen. Sorry about not updating this. This patch caused some regressions on some tests and I hadn't fully figured out/isolated the regression.

What kinds of regressions? Can you replicate these by running Reassociate twice in an otherwise-standard pipeline? We really should get this figured out.

Danny, are you proposing that we extend N-ary reassociation to work on floating-point values?

There were both improvements in instruction count as well as reduction with this patch on our internal test suite.

aditya_nandakumar edited edge metadata.
aditya_nandakumar removed rL LLVM as the repository for this revision.

Modifying our pass pipeline to do reassociate twice resulted in some differences in Instruction count. On investigating why the second reassociate made a difference, I found that sometimes, when we revisit instructions, valid instruction tree roots don't get simplified (for eg factorize) when they get revisited before dead instructions as there are additional uses(false).
This patch tries to erase dead instructions before we try and redo the instructions. This improves codegen slightly (lesser instruction count) and takes Reassociate pass closer to being Idempotent.

mcrosier added inline comments.Dec 23 2015, 7:13 AM
lib/Transforms/Scalar/Reassociate.cpp
623

I assume this condition was removed because it will always be true, correct? If so, please commit this in isolation.

1943

How about:

Remove dead instructions and if any operands are trivially dead add them to
Insts so they will be removed as well.

1946

Ins -> Insts

1948

Can we use a for loop to loop over all the Instruction operands, rather than pushing/poping each operand onto/off of a SetVector?

2275

Please add a period. Comments should be written with proper capitalization, punctuation, etc.

2277

How about something like:

Iterate over all instructions to be reevaluated and remove trivially dead
instructions. If any operand of the trivially dead instruction becomes
dead mark it for deletion as well. Continue this process until all trivially
dead instructions have been removed.

2279

Please don't add extra curly brackets.

2289

Please add a period.

aditya_nandakumar marked 7 inline comments as done.Dec 23 2015, 11:09 AM
aditya_nandakumar added inline comments.
lib/Transforms/Scalar/Reassociate.cpp
623

Yes - I'll remove it and commit it separately

Updated based on feedback.

mcrosier accepted this revision.Dec 29 2015, 7:53 AM
mcrosier added a reviewer: mcrosier.

LGTM once the minor nits have been fixed.

lib/Transforms/Scalar/Reassociate.cpp
198

To conform to the surrounding coding style, would you mind adding argument names?

test/Transforms/Reassociate/factorize-again.ll
2 ↗(On Diff #43550)

You can probably drop this comment.

4 ↗(On Diff #43550)

Please use a CHECK-LABEL directive.

27 ↗(On Diff #43550)

Drop comment

28 ↗(On Diff #43550)

Remove dead "#0"

This revision is now accepted and ready to land.Dec 29 2015, 7:53 AM

Committed in r256773.

mcrosier closed this revision.Jan 5 2016, 8:39 AM

Aditya,
Once committed (r256773) please be sure to close the Phabricator review.

Chad