Value::getSingleUndroppableUse is used in the hot path of InstCombine.
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/workspace/BETA_amd64_debian_testing_clang8/llvm-project/build/bin/opt < /mnt/disks/ssd0/agent/workspace/BETA_amd64_debian_testing_clang8/llvm-project/llvm/test/CodeGen/NVPTX/loop-vectorize.ll -O3 -S | /mnt/disks/ssd0/agent/workspace/BETA_amd64_debian_testing_clang8/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/BETA_amd64_debian_testing_clang8/llvm-project/llvm/test/CodeGen/NVPTX/loop-vectorize.ll
Script: -- : 'RUN: at line 1'; c:\ws\workspace\amd64_windows_vs2017\llvm-project\build\bin\opt.exe < C:\ws\workspace\amd64_windows_vs2017\llvm-project\llvm\test\CodeGen\NVPTX\loop-vectorize.ll -O3 -S | c:\ws\workspace\amd64_windows_vs2017\llvm-project\build\bin\filecheck.exe C:\ws\workspace\amd64_windows_vs2017\llvm-project\llvm\test\CodeGen\NVPTX\loop-vectorize.ll
Note: Google Test filter = ConnectionFileDescriptorTest.TCPGetURIv6 [==========] Running 1 test from 1 test case. [----------] Global test environment set-up.
i was not able to get reliable time measurement for a whole compiler run.
but in a profiler this patched replaced a +0.16% cycles in Use::getUser() and +0.08% time in getSingleUndroppableUse for a +0.06% cycles in InstCombiner::run.
I'm seeing a 0.1-0.2% regression with this change. Probably the added cost of the pointer tagging outweighs the benefit we see for the single user of getSingleUndroppableUse() right now. If this API is going to be used more heavily in the future, the picture will of course change. Hard for me to say whether it makes sense to make this change at this time.
I do wonder whether there's a middle ground here: Rather than adding a tag to Use, we could use a bit of subclass data on CallInst to indicate whether the call is droppable. That would reduce User->isDroppable() to a compare and mask check, and we could at least inline it.
This sentence is cut off.
from profiling i saw that most of the cost of getSingleUndroppableUse comes from Use::getUser (+0.16%) and not from User::isDroppable (+0.05%). so i don't think this would make the difference we want.
with this revision i moved the tag to the Prev pointer because it is accessed much less than Val and it already has a tag so i just added a bit to the tag.
maybe this revision has less overhead.
by the way the compiler seems to be consistantly spending more than 1% of its cycle in Use::getUser. and i am wondering if way-marking is worth the memory saving.
Indeed, I'm seeing a 0.1% improvement with this version, which is about the size of the original regression.
It's not clear to me that this is actually safe though. On 32-bit platforms, wouldn't Use only be 4-aligned? Or does LLVM use a special allocator that provides stronger alignment guarantees?