This is an archive of the discontinued LLVM Phabricator instance.

[NFC] Improve performances of Value::getSingleUndroppableUse
AbandonedPublic

Authored by Tyker on Mar 27 2020, 6:57 AM.

Details

Summary

since f1a9efabcb9bc37b663b0e03ed3d5a5aa7cc055e,
Value::getSingleUndroppableUse is used in the hot path of InstCombine.

Diff Detail

Event Timeline

Tyker created this revision.Mar 27 2020, 6:57 AM
Herald added a project: Restricted Project. · View Herald Transcript
Tyker added a comment.EditedMar 27 2020, 7:05 AM

i was not able to get reliable time measurement for a whole compiler run.
but in a profiler this patched replaced a +0.16% cycles in Use::getUser() and +0.08% time in getSingleUndroppableUse for a +0.06% cycles in InstCombiner::run.

jdoerfert added inline comments.Mar 27 2020, 8:21 AM
llvm/include/llvm/IR/Use.h
152

There is a comment needed here explaining what this encodes.

I'm seeing a 0.1-0.2% regression with this change. Probably the added cost of the pointer tagging outweighs the benefit we see for the single user of getSingleUndroppableUse() right now. If this API is going to be used more heavily in the future, the picture will of course change. Hard for me to say whether it makes sense to make this change at this time.

I do wonder whether there's a middle ground here: Rather than adding a tag to Use, we could use a bit of subclass data on CallInst to indicate whether the call is droppable. That would reduce User->isDroppable() to a compare and mask check, and we could at least inline it.

llvm/include/llvm/IR/User.h
227

This sentence is cut off.

I'm also not sure about changing Use, @nikic suggestion with marking it in the callinst sounds interesting.

Tyker updated this revision to Diff 253334.EditedMar 28 2020, 5:53 AM

I do wonder whether there's a middle ground here: Rather than adding a tag to Use, we could use a bit of subclass data on CallInst to indicate whether the call is droppable. That would reduce User->isDroppable() to a compare and mask check, and we could at least inline it.

from profiling i saw that most of the cost of getSingleUndroppableUse comes from Use::getUser (+0.16%) and not from User::isDroppable (+0.05%). so i don't think this would make the difference we want.

with this revision i moved the tag to the Prev pointer because it is accessed much less than Val and it already has a tag so i just added a bit to the tag.
maybe this revision has less overhead.

by the way the compiler seems to be consistantly spending more than 1% of its cycle in Use::getUser. and i am wondering if way-marking is worth the memory saving.

nikic added a comment.Mar 28 2020, 7:50 AM

I do wonder whether there's a middle ground here: Rather than adding a tag to Use, we could use a bit of subclass data on CallInst to indicate whether the call is droppable. That would reduce User->isDroppable() to a compare and mask check, and we could at least inline it.

from profiling i saw that most of the cost of getSingleUndroppableUse comes from Use::getUser (+0.16%) and not from User::isDroppable (+0.05%). so i don't think this would make the difference we want.

with this revision i moved the tag to the Prev pointer because it is accessed much less than Val and it already has a tag so i just added a bit to the tag.
maybe this revision has less overhead.

Indeed, I'm seeing a 0.1% improvement with this version, which is about the size of the original regression.

It's not clear to me that this is actually safe though. On 32-bit platforms, wouldn't Use only be 4-aligned? Or does LLVM use a special allocator that provides stronger alignment guarantees?

Tyker added a comment.EditedMar 28 2020, 8:08 AM

I do wonder whether there's a middle ground here: Rather than adding a tag to Use, we could use a bit of subclass data on CallInst to indicate whether the call is droppable. That would reduce User->isDroppable() to a compare and mask check, and we could at least inline it.

from profiling i saw that most of the cost of getSingleUndroppableUse comes from Use::getUser (+0.16%) and not from User::isDroppable (+0.05%). so i don't think this would make the difference we want.

with this revision i moved the tag to the Prev pointer because it is accessed much less than Val and it already has a tag so i just added a bit to the tag.
maybe this revision has less overhead.

Indeed, I'm seeing a 0.1% improvement with this version, which is about the size of the original regression.

It's not clear to me that this is actually safe though. On 32-bit platforms, wouldn't Use only be 4-aligned? Or does LLVM use a special allocator that provides stronger alignment guarantees?

i don't think its safe either for 32-bits systems. we could over align Use to 8 to fix this for 32-bit without affecting 64-bit ?

Tyker updated this revision to Diff 253420.Mar 29 2020, 7:45 AM

overalined the 2 fields the Use::Prev can point to.

Tyker abandoned this revision.Apr 17 2020, 3:47 AM

since D77144 has landed i don't think this is needed anymore