This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Limit the size of DIExpressions that we will salvage up to
ClosedPublic

Authored by StephenTozer on Sep 23 2021, 7:11 AM.

Details

Summary

Fixes: https://bugs.llvm.org/show_bug.cgi?id=51841

This patch places an arbitrary limit on the size of DIExpressions that we will produce via salvaging, for performance reasons. This helps to fix a performance issue observed in the bug above, in which debug values would be salvaged hundreds of times, producing expressions with over 1000 elements and causing the compiler to hang. Limiting the size of debug values that we will produce to 128 largely fixes this issue.

It should be noted that this is not the only issue related to the bug above; InstCombine is also quadratically growing the number of debug intrinsics with each block it moves through, resulting in the number of intrinsics growing from 4561 to 70710 despite the fact that not a single one of the new intrinsics adds useful debug info. There are potential fixes for this, but I believe the problem of unlimited salvaging will still be an issue even if we smarten up InstCombine, as we can still legitimately salvage thousands of intrinsics thousands of times each before we determine that the resulting intrinsics provide no useful info. As such, I think this patch is probably a necessity that we can aim to reduce our dependence on with further changes.

Diff Detail

Event Timeline

StephenTozer created this revision.Sep 23 2021, 7:11 AM
StephenTozer requested review of this revision.Sep 23 2021, 7:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 23 2021, 7:11 AM
spatel edited reviewers, added: nikic; removed: spatel.Sep 23 2021, 7:31 AM

Thanks for working on this! I don't know anything about debug info/salvaging implementation, so I can't review the patch, but I hope someone else will have a look.

Could this reduce compile time in this config:
https://llvm-compile-time-tracker.com/?config=NewPM-ReleaseLTO-g&stat=instructions ?

If so, it would be good to know that number before committing - maybe tune the MaxExpressionSize setting for most benefit based on those benchmarks.

spatel added a subscriber: spatel.Sep 23 2021, 7:32 AM

(would need test coverage - which might help answer my main question: What sort of large expressions are observed in practice? Could they be simplified, rather than dropped? (I think it's probably still right/good to have a limit, but if hitting the limit is often reflective of some suboptimal representational choice in how the dwarf expression is formed, that could be optimized down to something more compact - that's something to be aware of/keep an eye on probably))

(would need test coverage - which might help answer my main question: What sort of large expressions are observed in practice? Could they be simplified, rather than dropped? (I think it's probably still right/good to have a limit, but if hitting the limit is often reflective of some suboptimal representational choice in how the dwarf expression is formed, that could be optimized down to something more compact - that's something to be aware of/keep an eye on probably))

In the bug above, the limit is hit due to this loop:

for (*f = 4; *f <= 33; *f += 1)
  for (b = -27; b <= 48; b++) {
    int16_t g;
    int32_t *i = e;
    (c -= 10) % (g = c ^= i != 0);
    a &= b;
  }

This loop is ultimately unrolled and optimized into a chain of ~4000 add and xor instructions, for which every xor has an associated debug value:

%sub.29.5 = add nsw i32 %xor.29.4, -10, !dbg !51
%xor.29.5 = xor i32 %sub.29.5, 1, !dbg !52
call void @llvm.dbg.value(metadata i32 %xor.29.5, metadata !31, metadata !DIExpression(DW_OP_LLVM_convert, 32, DW_ATE_unsigned, DW_OP_LLVM_convert, 16, DW_ATE_unsigned, DW_OP_stack_value)), !dbg !50
%sub.29.6 = add nsw i32 %xor.29.5, -10, !dbg !51
%xor.29.6 = xor i32 %sub.29.6, 1, !dbg !52
call void @llvm.dbg.value(metadata i32 %xor.29.6, metadata !31, metadata !DIExpression(DW_OP_LLVM_convert, 32, DW_ATE_unsigned, DW_OP_LLVM_convert, 16, DW_ATE_unsigned, DW_OP_stack_value)), !dbg !5
etc...

InstCombine shuffles these instructions around, and due to the way that we handle lost debug values this results in us trying to salvage back down this chain of instructions. This is what we would actually want to do in a lot of real cases, it just happens that in this case there is a prohibitive number of instructions we would have to salvage through in order to recover the debug value. I would definitely be in favour of implementing const-folding in expressions as a means of preventing us from hitting the limit, but it doesn't seem possible in this case.

For what it's worth , in this specific case the salvaged dbg.values are also useless since they form into a huge consecutive block with no instructions in-between - but 1) I think the IR could be modified such that this wasn't the case for at least some of the intrinsics, and 2) without making a much larger change to InstCombine, I don't think we can know that the intrinsics will be useless before we've finished the salvaging.

(would need test coverage - which might help answer my main question: What sort of large expressions are observed in practice? Could they be simplified, rather than dropped? (I think it's probably still right/good to have a limit, but if hitting the limit is often reflective of some suboptimal representational choice in how the dwarf expression is formed, that could be optimized down to something more compact - that's something to be aware of/keep an eye on probably))

In the bug above, the limit is hit due to this loop:

for (*f = 4; *f <= 33; *f += 1)
  for (b = -27; b <= 48; b++) {
    int16_t g;
    int32_t *i = e;
    (c -= 10) % (g = c ^= i != 0);
    a &= b;
  }

This loop is ultimately unrolled and optimized into a chain of ~4000 add and xor instructions, for which every xor has an associated debug value:

%sub.29.5 = add nsw i32 %xor.29.4, -10, !dbg !51
%xor.29.5 = xor i32 %sub.29.5, 1, !dbg !52
call void @llvm.dbg.value(metadata i32 %xor.29.5, metadata !31, metadata !DIExpression(DW_OP_LLVM_convert, 32, DW_ATE_unsigned, DW_OP_LLVM_convert, 16, DW_ATE_unsigned, DW_OP_stack_value)), !dbg !50
%sub.29.6 = add nsw i32 %xor.29.5, -10, !dbg !51
%xor.29.6 = xor i32 %sub.29.6, 1, !dbg !52
call void @llvm.dbg.value(metadata i32 %xor.29.6, metadata !31, metadata !DIExpression(DW_OP_LLVM_convert, 32, DW_ATE_unsigned, DW_OP_LLVM_convert, 16, DW_ATE_unsigned, DW_OP_stack_value)), !dbg !5
etc...

InstCombine shuffles these instructions around, and due to the way that we handle lost debug values this results in us trying to salvage back down this chain of instructions. This is what we would actually want to do in a lot of real cases, it just happens that in this case there is a prohibitive number of instructions we would have to salvage through in order to recover the debug value. I would definitely be in favour of implementing const-folding in expressions as a means of preventing us from hitting the limit, but it doesn't seem possible in this case.

For what it's worth , in this specific case the salvaged dbg.values are also useless since they form into a huge consecutive block with no instructions in-between - but 1) I think the IR could be modified such that this wasn't the case for at least some of the intrinsics, and 2) without making a much larger change to InstCombine, I don't think we can know that the intrinsics will be useless before we've finished the salvaging.

Fair enough - thanks for bearing with me!

StephenTozer updated this revision to Diff 379481.EditedOct 13 2021, 11:36 AM

Added test.

jmorse accepted this revision.Oct 14 2021, 3:56 AM

LGTM, with naming suggestion inline

llvm/lib/Transforms/Utils/Local.cpp
3–1

IMO, "Valid" is the wrong term to use here as it suggests a correctness problem. Could I suggest "IsDesirableSalvageExpr"?

This revision is now accepted and ready to land.Oct 14 2021, 3:56 AM

Thanks for working on this! I don't know anything about debug info/salvaging implementation, so I can't review the patch, but I hope someone else will have a look.

Could this reduce compile time in this config:
https://llvm-compile-time-tracker.com/?config=NewPM-ReleaseLTO-g&stat=instructions ?

If so, it would be good to know that number before committing - maybe tune the MaxExpressionSize setting for most benefit based on those benchmarks.

I've tested a few different versions on the compile time tracker, but I'm afraid I can't see any measurable difference - probably because most programs will never hit this limit to begin with, so it makes little difference performance-wise where we draw the line. With respect to the reported broken file, the most significant performance impact of lengthening the allowed expressions is increasing the size of the intermediate file (so memory usage, and disk space if the intermediate file is saved). In terms of performance, there doesn't seem to be a huge difference, but 128 seems a good medium to draw it at. We could go down to 64, since most programs will never hit that with a legitimate expression anyway, but 128 is already low enough to make the pathological cases manageable, so we have enough leeway to support potentially very large expressions.

Thanks for working on this! I don't know anything about debug info/salvaging implementation, so I can't review the patch, but I hope someone else will have a look.

Could this reduce compile time in this config:
https://llvm-compile-time-tracker.com/?config=NewPM-ReleaseLTO-g&stat=instructions ?

If so, it would be good to know that number before committing - maybe tune the MaxExpressionSize setting for most benefit based on those benchmarks.

I've tested a few different versions on the compile time tracker, but I'm afraid I can't see any measurable difference - probably because most programs will never hit this limit to begin with, so it makes little difference performance-wise where we draw the line. With respect to the reported broken file, the most significant performance impact of lengthening the allowed expressions is increasing the size of the intermediate file (so memory usage, and disk space if the intermediate file is saved). In terms of performance, there doesn't seem to be a huge difference, but 128 seems a good medium to draw it at. We could go down to 64, since most programs will never hit that with a legitimate expression anyway, but 128 is already low enough to make the pathological cases manageable, so we have enough leeway to support potentially very large expressions.

Sounds good. Thanks for checking.

This revision was landed with ongoing or failed builds.Oct 15 2021, 9:35 AM
This revision was automatically updated to reflect the committed changes.