This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Drop DBG_VALUE_LISTs with an excessive number of debug operands
ClosedPublic

Authored by StephenTozer on Apr 27 2021, 9:10 AM.

Details

Summary

This patch addresses a crash that has been observed from rG791930d74087, although the underlying error was introduced in patch rG7d0cafba962c. Currently, LiveDebugVariables asserts that no DBG_VALUE_LIST can have 64 or more debug operands. This assertion was added on the assumption that this limit was high enough that it would almost certainly be hit only as the result of an earlier compiler error, but a relatively simple reproducer has been found that triggers this assertion; this was noted in the latest review comments for D91722.

This patch fixes this issue by removing the assertion, and instead dropping the DBG_VALUE_LISTs. It may technically be possible to keep them, but we should make sure that such complex DBG_VALUE_LISTs will not introduce significant performance issues first. There may also be issues with the use of IntervalMap in LiveDebugVariables; the reason for the current limit of <64 operands is that supporting any greater number of operands would increase the size of the DbgVariableValue class, which is kept as small as possible to be used as the value in an IntervalMap.

Diff Detail

Event Timeline

StephenTozer created this revision.Apr 27 2021, 9:10 AM
StephenTozer requested review of this revision.Apr 27 2021, 9:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2021, 9:10 AM

This presumably needs a test case.

But also: The DWARF we're generating here is pretty heinous - is the intent that this is a short term fix, but some attempt to reign in these /massive/ expressions is on your/someone's TODO list?

Hmm, I guess maybe the DWARF expression size isn't the fault of the DWARF location generation itself, instead maybe it's a function of some weird/bad codegen? We're spilling all these values to the stack even at -O3? 248 byte stack frame for this function?

llvm/lib/CodeGen/LiveDebugVariables.cpp
121–122

make_unique, perhaps?

This presumably needs a test case.

Indeed, I'm working on right now - it just needs to be reduced significantly, any "organic" reproducer for this results in an excessively large test file (~1000 lines with the simplest .cpp reproducer).

But also: The DWARF we're generating here is pretty heinous - is the intent that this is a short term fix, but some attempt to reign in these /massive/ expressions is on your/someone's TODO list?

I think right now it's a bit of an open question as to what we do with these. It does leave us in slightly unfamiliar territory w.r.t debug info; usually our concern is just recovering as much debug info as we can, and we usually only intentionally remove it either when we need to due to an optimization, or when some debug info is redundant/unused. I think we can do some testing to determine what should be considered the limit (if any), and/or use a heuristic to find debug information that can be dropped without causing much impact to an end-user.

Hmm, I guess maybe the DWARF expression size isn't the fault of the DWARF location generation itself, instead maybe it's a function of some weird/bad codegen? We're spilling all these values to the stack even at -O3? 248 byte stack frame for this function?

While this is a weird case, it seems to be legitimate as far as I can tell - we have a variable with at least one use, and a value that is the sum of 64 (or more) function calls. I may be missing something (I'm not a CodeGen expert by any means), but it seems as though the DBG_VALUE_LIST itself is correct - we've 64 different values and our variable is known to be the sum of all of them. It's possible that there is a failure to efficiently optimize here, as we have the result of every single function call either in a register or on the stack and then sum them, rather than reusing a single register/stack slot for each call and calculating a rolling sum, but the code is still correct and LiveDebugVariables shouldn't crash upon encountering it.

This presumably needs a test case.

Indeed, I'm working on right now - it just needs to be reduced significantly, any "organic" reproducer for this results in an excessively large test file (~1000 lines with the simplest .cpp reproducer).

Hmm - not sure I follow. The LLVM IR for the reproduce @rupprecht gave seems to be about half that (~460 lines) - in any case, one workaround could be to include a test-only flag for the limit and set the limit artificially low for testing.

But also: The DWARF we're generating here is pretty heinous - is the intent that this is a short term fix, but some attempt to reign in these /massive/ expressions is on your/someone's TODO list?

I think right now it's a bit of an open question as to what we do with these. It does leave us in slightly unfamiliar territory w.r.t debug info; usually our concern is just recovering as much debug info as we can, and we usually only intentionally remove it either when we need to due to an optimization, or when some debug info is redundant/unused. I think we can do some testing to determine what should be considered the limit (if any), and/or use a heuristic to find debug information that can be dropped without causing much impact to an end-user.

Yeah, fair

Hmm, I guess maybe the DWARF expression size isn't the fault of the DWARF location generation itself, instead maybe it's a function of some weird/bad codegen? We're spilling all these values to the stack even at -O3? 248 byte stack frame for this function?

While this is a weird case, it seems to be legitimate as far as I can tell - we have a variable with at least one use, and a value that is the sum of 64 (or more) function calls. I may be missing something (I'm not a CodeGen expert by any means), but it seems as though the DBG_VALUE_LIST itself is correct - we've 64 different values and our variable is known to be the sum of all of them. It's possible that there is a failure to efficiently optimize here, as we have the result of every single function call either in a register or on the stack and then sum them, rather than reusing a single register/stack slot for each call and calculating a rolling sum, but the code is still correct and LiveDebugVariables shouldn't crash upon encountering it.

Sorry, I should clarify: I mean the actual assembly generated for this seems inefficient - even when building without debug info. It uses a whole lot of stack space, when it could use one register and add to it between each call. (@Carrot - is this of any interest to you, maybe? Check the codegen for https://reviews.llvm.org/D91722#2718316 - each call has its value placed on a separate stack slot, then after all the calls each stack value is added together/onto ebx - rather than adding onto a callee save register after each call, perhaps? (I think that's what GCC does))
Though fixing the codegen of course wouldn't guarantee the debug info never grows this large - so fixing the growth is important even if someone changes the codegen to be less problematic in this specific case.

Hmm - not sure I follow. The LLVM IR for the reproduce @rupprecht gave seems to be about half that (~460 lines) - in any case, one workaround could be to include a test-only flag for the limit and set the limit artificially low for testing.

Yeah, I was just planning on producing a test case that starts as late in the pipeline as possible; the MIR prior to beginning register allocation is 977 lines, from a 351 line LLVM IR file. If it's better to use the IR file for this (which may be reasonable given the size difference), then that simplifies things - although it can still probably be reduced further. If that doesn't work, then using a test flag might be the way to go.

Hmm - not sure I follow. The LLVM IR for the reproduce @rupprecht gave seems to be about half that (~460 lines) - in any case, one workaround could be to include a test-only flag for the limit and set the limit artificially low for testing.

Yeah, I was just planning on producing a test case that starts as late in the pipeline as possible; the MIR prior to beginning register allocation is 977 lines, from a 351 line LLVM IR file. If it's better to use the IR file for this (which may be reasonable given the size difference), then that simplifies things - although it can still probably be reduced further. If that doesn't work, then using a test flag might be the way to go.

Ah, right - I haven't dabbled much with the MIR test cases, I can see how that easily doubles the size of the test, etc. I'd personally be OK with a plain IR case for something like this, perhaps. (@aprantl @JDevlieghere any thoughts on this?)

Add reduced test case, use std::make_unique to initialize unique_ptrs.

Oh, right. I just realized this test doesn't have to be especially realistic because it's only about the length of the expression - not how we got such a long expression in the first place?

So perhaps we could include a comment describing the real-ish situation, but then make the test case really simple/obvious? - it doesn't need any IR instructions really, and can have an expression that's simple/uninteresting in some way? Like adding together the constant 1 repeatedly?

Oh, right. I just realized this test doesn't have to be especially realistic because it's only about the length of the expression - not how we got such a long expression in the first place?

So perhaps we could include a comment describing the real-ish situation, but then make the test case really simple/obvious? - it doesn't need any IR instructions really, and can have an expression that's simple/uninteresting in some way? Like adding together the constant 1 repeatedly?

Yes, that would work - I've checked that as well, and there can be an LL test case of just over 30 lines (although to be more specific, they have to be different constants - identical values will be coalesced just above the assert in question). One thing is that I suspect we might have change it back to the register form in the future however; on the topic of reducing the size of expressions, DIExpression optimization has come up before. The simplest forms of this to implement would be removal of no-ops subexpressions (like "multiply by 1), and simplifying constant subexpressions. This would cause an issue with the test later on, at which point we'd revert back to the test currently in this patch.

That being said, we don't have to bend over backwards to accomodate future potential changes; I can just add a note to the test pointing to this review to ensure that if it breaks later as the result of such a change being added, whoever picks it up doesn't have to go back to square 1 on finding a reproducer.

Simplify test case by using constants in the DIArgList instead of SSA values.

dblaikie accepted this revision.Apr 27 2021, 5:33 PM

Oh, right. I just realized this test doesn't have to be especially realistic because it's only about the length of the expression - not how we got such a long expression in the first place?

So perhaps we could include a comment describing the real-ish situation, but then make the test case really simple/obvious? - it doesn't need any IR instructions really, and can have an expression that's simple/uninteresting in some way? Like adding together the constant 1 repeatedly?

Yes, that would work - I've checked that as well, and there can be an LL test case of just over 30 lines (although to be more specific, they have to be different constants - identical values will be coalesced just above the assert in question). One thing is that I suspect we might have change it back to the register form in the future however; on the topic of reducing the size of expressions, DIExpression optimization has come up before. The simplest forms of this to implement would be removal of no-ops subexpressions (like "multiply by 1), and simplifying constant subexpressions. This would cause an issue with the test later on, at which point we'd revert back to the test currently in this patch.

Guess you could change it to an MIR test case now that it can be further reduced like this - but up to you, I wouldn't insist on it.

If there's some way to make an irreducable expression, that'd be handy, but nothing comes immediately to mind... perhaps a parameter that's a pointer to an array and an expression that adds many elements from that array together? Not sure. If there's something reasonably easy to do there it would be nice if the test was more reliable - but I wouldn't spend ages on it.

That being said, we don't have to bend over backwards to accomodate future potential changes; I can just add a note to the test pointing to this review to ensure that if it breaks later as the result of such a change being added, whoever picks it up doesn't have to go back to square 1 on finding a reproducer.

Yeah, some comments/breadcrumbs should suffice.

This revision is now accepted and ready to land.Apr 27 2021, 5:33 PM

Hope the other reviewers on this take a look despite my signing off on it - I think it's an interesting question of what limits (if any) we place on DWARF expressions if we they do correctly describe the variable location but it happens to be long/expensive to do so.