Page MenuHomePhabricator

[InstCombine] Teach canEvaluateTruncated and EvaluateInDifferentType to handle expression tree with multi-used nodes.
AbandonedPublic

Authored by aaboud on Aug 27 2017, 8:30 AM.

Details

Summary

Added support for patterns like this: http://rise4fun.com/Alive/85d
Where truncate expression contains nodes with multi usages, however all these usages are dominated by the same truncate instruction.

Solution: Instead of returning false when hitting such a node with more than one usage, simply record all the usages that were not visited yet, so they can be checked later against all visited nodes.

Diff Detail

Event Timeline

aaboud updated this revision to Diff 112830.Aug 27 2017, 8:30 AM
aaboud created this revision.

fixed some typos.

craig.topper added inline comments.Aug 27 2017, 9:18 AM
lib/Transforms/InstCombine/InstCombineCasts.cpp
706

I think a better way to name this is like this

canEvaluateTruncated->canEvaluateTruncatedImpl
canEvaluateTruncatedWrapper->canEvaluateTruncated

I think the Impl name will be more obvious that it shouldn't be called directly.

test/Transforms/InstCombine/cast.ll
1370

tranc->trunc

aaboud marked 2 inline comments as done.Aug 29 2017, 3:40 AM

Thanks Craig for the comments.
Will upload an updated patch soon.

lib/Transforms/InstCombine/InstCombineCasts.cpp
706

I thought to do that, but then tried to reduce the number of changed lines :)
I will update the patch accordingly.

aaboud updated this revision to Diff 113054.Aug 29 2017, 3:40 AM
aaboud marked an inline comment as done.

Fixed typo and naming according to Craig comments.

spatel edited edge metadata.

I didn't look closely at the implementation, but extending instcombine with visitation/evaluation maps may cross the boundary of what instcombine should handle. Adding some more potential reviewers.

I didn't look closely at the implementation, but extending instcombine with visitation/evaluation maps may cross the boundary of what instcombine should handle. Adding some more potential reviewers.

We could implement this by passing a clean set of visitedInsts/ToVisitInsts per call to these functions, instead of having a member instance in InstCombine.
I do not see why it is much different that passing the demandedBits map and update it recursively!

The only thing that we need to be aware of regarding these two functions I modified, is that they have no "depth" guard, like many others!

I do not see why it is much different that passing the demandedBits map and update it recursively!

That may be true, but there's also an argument that doing demandedBits analysis within instcombine already went over the line. :)
There have been a few recent llvm-dev threads about the responsibilities and limits of various passes. Here's one:
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113212.html

I don't have enough experience with this, so I'll defer to others for their guidance. Just thought it was worth mentioning.

dberlin resigned from this revision.Aug 29 2017, 8:16 AM
aaboud added a comment.Sep 5 2017, 1:32 AM

So, how do you wish to proceed from here?
Do you think that such optimization should be moved to separate new pass? Though it will be doing very similar thing as InstCombine, just will catch more cases that it does today?

Please, share with me your opinion on this optimization.

Thanks,
Amjad

lsaba added a subscriber: lsaba.Sep 5 2017, 1:54 AM
spatel added a comment.Sep 5 2017, 8:00 AM

So, how do you wish to proceed from here?
Do you think that such optimization should be moved to separate new pass? Though it will be doing very similar thing as InstCombine, just will catch more cases that it does today?

Please, share with me your opinion on this optimization.

Not sure if the question is for me, but since I raised the doubt...yes, a separate pass that just does evaluation in a narrower type (a subset of demanded bits analysis?) seems like a reasonable thing to implement. Another option is to extend BDCE since that's also using demanded bits. Either way, that would let us lift this task out of instcombine where it is likely costing more than is necessary because instcombine runs so many times. Note that I raised a similar vector possibility for myself in https://reviews.llvm.org/D37236 (vector demanded elements).

That patch does suggest another option. As you've noted, we already do demanded bits analysis (and it even has some allowance for multiple uses) in instcombine, so why doesn't that work? The big fix is to extract that whole thing out of instcombine as its own pass so we can better control it, but (yet another) quick fix is to adjust demanded bits to catch these cases?

As I said, I don't have enough experience to say what's best. Post to llvm-dev to get more opinions?

aaboud abandoned this revision.Sep 27 2017, 6:23 AM

New approach was uploaded to D38313.