This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare] Speed up placeDbgValues, NFC
AbandonedPublic

Authored by vsk on Feb 14 2020, 1:16 PM.

Details

Reviewers
jmorse
aprantl
rnk
Summary

For a file in WebKit, this brings the time spent in CodeGenPrepare from
10 minutes to 50 milliseconds. The reduction comes from using a cache of
visited instructions to speed up block-local dominance queries.

Testing: I built LNT using the Os-g cmake cache with & without this
patch, then diffed the object files to verify there was no binary diff
(and check-llvm of course).

rdar://59446577

Diff Detail

Event Timeline

vsk created this revision.Feb 14 2020, 1:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2020, 1:16 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
rnk accepted this revision.Feb 14 2020, 1:26 PM

lgtm

llvm/lib/CodeGen/CodeGenPrepare.cpp
7273

I guess by the same reasoning this is still useful even if we had D51664. Although, if no reordering is required because all defs are before uses, D51664 would still help, and I think that is the common case. I assume the webkit case is a long BB of:

def A ; dbgval(A) ; def B ; dbgval(B)...
This revision is now accepted and ready to land.Feb 14 2020, 1:26 PM
vsk marked an inline comment as done.Feb 14 2020, 2:33 PM
vsk added inline comments.
llvm/lib/CodeGen/CodeGenPrepare.cpp
7273

I believe D51664 makes this patch unnecessary. I grabbed the NumDbgValueMoved statistic from the WebKit test case: it was 0. This patch might still improve performance if CGP were to encounter lots of weird dbg.value use-before-def. But I think we want to fix those bugs earlier in the pipeline anyway.

Should we just hold off on landing this, or put it in as a stopgap until D51664 is merged (and file a PR for it to be reverted afterwards)?

rnk added inline comments.Feb 14 2020, 3:14 PM
llvm/lib/CodeGen/CodeGenPrepare.cpp
7273

I guess in the long run it would be best to not have this complexity here. I think the numbering invalidation logic could use whatever MLIR does to avoid invalidating the numbering too often, and then this code can remain simple.

So, I agree we should either land this as a stop gap with a plan to revert, or wait. I plan to leave the RFC alone until next Thursday, and hopefully land D51664 then. Given that timeline, you can either land or wait.

I suspect this will not be the only code in LLVM that can be simplified after the instruction numbering lands.

jmorse accepted this revision.Feb 17 2020, 6:02 AM

LGTM landed whichever way; am I right in thinking that this approach will still be faster than the domtree query even if the block is very large? (On account of the pointer set lookup being ordered, instead of walking through each instruction in a block).

Longer term, placeDbgValues should be entirely eliminated anyway, IIRC it's only still around because CodeGenPrepare doesn't try to update variable locations when it optimises things.

vsk added a comment.Feb 17 2020, 1:41 PM

LGTM landed whichever way;

I'm hoping we can just land D51664 and avoid making this code more complex.

am I right in thinking that this approach will still be faster than the domtree query even if the block is very large? (On account of the pointer set lookup being ordered, instead of walking through each instruction in a block).

For very large basic blocks, I think this patch might make placeDbgValues a little faster, because the set lookup is theoretically O(1) whereas Instruction::comesBefore is O(n). The worst-case bound is arguably the wrong thing to look at though because a) we'd cache the instruction order, and b) (more speculative) we don't expect to reorder very many debug intrinsics within a block? Note that not all reordering invalidates the instruction order because Reid made removeFromParent() preserve the order.

rnk added a comment.Feb 18 2020, 3:31 PM

I think for the case of reordering two existing instructions, the ordering will be invalidated. So, after D51664, this is still O(n^2) in the case where every other instruction is a dbg.value that needs sinking. However, that's a pretty pathological input, and I think we could address that use case in a general way by leaving holes in the numbering, rather than by keeping this patch around.

vsk added a comment.Feb 18 2020, 3:52 PM
In D74642#1881690, @rnk wrote:

I think for the case of reordering two existing instructions, the ordering will be invalidated. So, after D51664, this is still O(n^2) in the case where every other instruction is a dbg.value that needs sinking. However, that's a pretty pathological input, and I think we could address that use case in a general way by leaving holes in the numbering, rather than by keeping this patch around.

Agreed (point "b" in my last comment was my hand-waving about this being a fairly hard-to-hit case). I don't expect it to turn up, but if it does I'll look at poking holes in the instruction ordering.

vsk abandoned this revision.Feb 18 2020, 3:52 PM