Since f1a9efabcb9bc37b663b0e03ed3d5a5aa7cc055e,
Value::getSingleUndroppableUse is used in the hot path of InstCombine.
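For context, getSingleUndroppableUse walks the use list and looks at the user of each use. Below is a simplified model of that loop. `UserNode`, `UseSlot` and `singleUndroppableUse` are hypothetical stand-ins for User, Use and the real member function, and in LLVM the user is obtained through Use::getUser() rather than read from a field:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a User.
struct UserNode {
  bool Droppable = false; // true if this user's use of the value can be dropped
};

// Hypothetical stand-in for a Use.
struct UseSlot {
  UserNode *User; // in LLVM this is what Use::getUser() has to recover
};

// Returns the only use whose user is not droppable, or nullptr if there are
// zero or several such uses. Each inspected use costs one user lookup, which
// is why the cost of getUser() dominates this loop.
inline UseSlot *singleUndroppableUse(std::vector<UseSlot> &Uses) {
  UseSlot *Result = nullptr;
  for (UseSlot &U : Uses) {
    if (U.User->Droppable)
      continue;
    if (Result)
      return nullptr; // a second undroppable use was found
    Result = &U;
  }
  return Result;
}
```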
Repository: rG LLVM Github Monorepo
I was not able to get reliable timing measurements for a whole compiler run.
But in a profiler, this patch replaced +0.16% of cycles in Use::getUser() and +0.08% in getSingleUndroppableUse with +0.06% of cycles in InstCombiner::run.
llvm/include/llvm/IR/Use.h, line 156: A comment is needed here explaining what this encodes.
I'm seeing a 0.1-0.2% regression with this change. Probably the added cost of the pointer tagging outweighs the benefit we see for the single user of getSingleUndroppableUse() right now. If this API is going to be used more heavily in the future, the picture will of course change. Hard for me to say whether it makes sense to make this change at this time.
I do wonder whether there's a middle ground here: Rather than adding a tag to Use, we could use a bit of subclass data on CallInst to indicate whether the call is droppable. That would reduce User->isDroppable() to a compare and mask check, and we could at least inline it.
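A minimal sketch of that middle ground, assuming a spare bit is available in a 16-bit subclass-data field. `CallLike`, `kDroppableBit` and `setDroppable` are hypothetical names, and the real CallInst layout is not modeled:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a call-like User with 16 bits of subclass data.
class CallLike {
public:
  // The query is a load, a mask and a compare, so it inlines trivially.
  bool isDroppable() const { return (SubclassData & kDroppableBit) != 0; }

  // Sets or clears only the droppable bit, leaving other bits untouched.
  void setDroppable(bool D) {
    if (D)
      SubclassData |= kDroppableBit;
    else
      SubclassData &= static_cast<uint16_t>(~kDroppableBit);
  }

private:
  static constexpr uint16_t kDroppableBit = 1u << 15; // assumed to be unused
  uint16_t SubclassData = 0;
};
```

The flag would have to be set when the call is created, so the check no longer needs to look at the callee.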
llvm/include/llvm/IR/User.h, line 227: This sentence is cut off.
I'm also not sure about changing Use; @nikic's suggestion of marking it in the CallInst sounds interesting.
From profiling, I saw that most of the cost of getSingleUndroppableUse comes from Use::getUser (+0.16%), not from User::isDroppable (+0.05%), so I don't think this would make the difference we want.
With this revision, I moved the tag to the Prev pointer, because it is accessed much less often than Val and already carries a tag, so I just added a bit to that tag.
This revision may have less overhead.
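The idea of widening an existing 2-bit tag by one bit can be sketched as follows. `TaggedPrev` and its methods are hypothetical names. In LLVM itself this would normally be expressed with `llvm::PointerIntPair`, which checks the number of free low bits through `PointerLikeTypeTraits`. The key assumption is that every pointee is at least 8-byte aligned, because three low bits must be zero:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tagged pointer: bits 0-1 hold the existing tag, bit 2 holds
// the new flag, and the remaining bits hold the pointer itself.
template <typename T> class TaggedPrev {
  static constexpr uintptr_t kTagMask = 0x3; // existing 2-bit tag
  static constexpr uintptr_t kFlagBit = 0x4; // new bit: needs 8-byte alignment
  static constexpr uintptr_t kLowMask = kTagMask | kFlagBit;
  uintptr_t Bits = 0;

public:
  // Stores a new pointer while preserving the current tag and flag.
  void setPointer(T *P) {
    auto Raw = reinterpret_cast<uintptr_t>(P);
    assert((Raw & kLowMask) == 0 && "pointee must be at least 8-byte aligned");
    Bits = Raw | (Bits & kLowMask);
  }
  T *getPointer() const { return reinterpret_cast<T *>(Bits & ~kLowMask); }

  void setTag(unsigned Tag) { Bits = (Bits & ~kTagMask) | (Tag & kTagMask); }
  unsigned getTag() const { return static_cast<unsigned>(Bits & kTagMask); }

  void setFlag(bool F) { Bits = F ? (Bits | kFlagBit) : (Bits & ~kFlagBit); }
  bool getFlag() const { return (Bits & kFlagBit) != 0; }
};
```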
By the way, the compiler seems to consistently spend more than 1% of its cycles in Use::getUser, and I am wondering whether way-marking is worth the memory savings.
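To put the memory side of that trade-off in numbers, here is a rough sketch comparing a Use-like record that recovers its user through way-marking against one that simply stores a user pointer. The struct names are hypothetical, the fields only mirror the Val/Next/Prev discussion above, and the sketch says nothing about cache effects:

```cpp
#include <cassert>
#include <cstdint>

struct ValueStub; // opaque stand-in for Value

// Hypothetical layout: the user is found by walking way-marks encoded in Prev.
struct WaymarkedUse {
  ValueStub *Val;     // the used value
  WaymarkedUse *Next; // next use of the same value
  uintptr_t Prev;     // tagged pointer carrying way-mark bits
};

// Hypothetical layout: the user is stored directly, so getUser() is one load.
struct UseWithUserPtr {
  ValueStub *Val;
  UseWithUserPtr *Next;
  uintptr_t Prev;
  void *User;
};

// One extra pointer per operand: 24 -> 32 bytes on a typical 64-bit host,
// 12 -> 16 bytes on a typical 32-bit host.
static_assert(sizeof(UseWithUserPtr) == sizeof(WaymarkedUse) + sizeof(void *),
              "storing the user costs exactly one pointer per use");
```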
Indeed, I'm seeing a 0.1% improvement with this version, which is about the size of the original regression.
It's not clear to me that this is actually safe though. On 32-bit platforms, wouldn't Use only be 4-aligned? Or does LLVM use a special allocator that provides stronger alignment guarantees?
I don't think it's safe for 32-bit systems either. Could we over-align Use to 8 to fix this on 32-bit without affecting 64-bit?
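A minimal sketch of the over-alignment idea, using a hypothetical `OverAlignedUse` with three pointer-sized fields. It assumes the tagged pointer always refers to the start of such an object: `alignas(8)` constrains object addresses only, and a pointer to a field at a non-zero offset inherits just that field's alignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Use, over-aligned so every object address has
// its low three bits clear.
struct alignas(8) OverAlignedUse {
  void *Val;
  OverAlignedUse *Next;
  uintptr_t Prev;
};

static_assert(alignof(OverAlignedUse) >= 8, "objects are at least 8-byte aligned");
// On a 32-bit host the 12 bytes of fields are padded to 16; on a 64-bit host
// the size stays 24 because pointers are already 8-byte aligned there.
static_assert(sizeof(OverAlignedUse) % 8 == 0, "size is a multiple of the alignment");

// True if the address leaves three low bits free for tagging.
inline bool hasThreeFreeLowBits(const void *P) {
  return (reinterpret_cast<uintptr_t>(P) & 0x7) == 0;
}
```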
clang-format: please reformat the code