This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Refactor how unrecoverable debug line parsing errors work
AbandonedPublic

Authored by jhenderson on Jun 4 2020, 8:00 AM.

Details

Summary

Previously, unrecoverable errors were passed back to the debug line parse() caller, who would then pass the error to a callback. This meant that such errors were inconsistently printed in the output between verbose and non-verbose output: for verbose output, they'd be printed after the output (which is printed at the end of the parse() function) whilst non-verbose output was printed later on, after the error callback was run. This patch moves the error callback usage to inside the parse() function.

Depends on D81102.

Diff Detail

Event Timeline

jhenderson created this revision.Jun 4 2020, 8:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 4 2020, 8:00 AM

Redirecting conversation from D81102:

This is the sort of "let's try to curate where errors print out relative to output" changes/direction I think are pretty unfortunate - being able to return errors rather than passing in error handlers seems like a better general coding paradigm where possible (while the callback system for warnings is great, and callbacks for errors are sometimes necessary - when there is some ability to print an error and continue) - returning errors seems clearer/more obvious/simpler control flow (there's no ability to accidentally print an error and continue because you forgot the error handler doesn't terminate execution/need to separately return? and what value do you return if you need to communicate to the caller they need to stop too etc)...

The problem is that in the common case, most tools are doing the same thing with both recoverable and unrecoverable errors, it's just done at different times. This means that some error messages appear in the middle of parsing the table (before the terminating blank line), whilst others appear after the prologue has been printed, after the blank line, and seemingly with the next table. This is the incosistency and potential confusion I'm trying to avoid here.

I have a patch for this already, but essentially all that it does is change this function to return void, having passed the error to a separate UnrecoverableErrorHandler (which already exists in higher-level code). The change is NFC from the point of view of what gets parsed - the same handler will still get called allowing the client to stop or do whatever etc. The only difference is when the callback is called, and since it's a callback, the client can still handle the error as it did before. In practice, they both just get manifested as warnings most of the time, so the timing difference looks weird to most clients.

Whilst I get your point about simpler control flow, we are currently in a slightly weird hybrid situation where sometimes we use a callback and other times we return the error, so in some ways, I think this idea simplifies the interface.

I think we're going in a less elegant direction by supporting more dumping/printing while parsing, rather than less - making these sort of error handling V dumping complexities.

What about breaking up DWARFDebugLine::LineTable::parse's implementation - so a caller could explicitly parse the prologue & decide what to do with that error, rather than having parse responsible for dumping things - then call to parse the rest of the table (where I wouldn't mind revisiting the dumping+parsing mix there too but can save that for another day/refactor/review/etc).

I think my underlying thesis here is that mixing parsing and dumping is a bit problematic/difficult to maintain well (though I don't have a perfect answer for the verbose line table dumping, for instance - I imagine the realistic solution there is a callback-based API for the raw/verbose line table, and implementing the processed line table on top of that API - possibly inserting an adapter of sorts there when you want to retrieve the processed line table while at the same time dumping the verbose details) and that fatal errors are probably tidier handled via return values than via a callback since they should be stopping control flow anyway. (at some point they're not quite fatal - fatal within one line table contribution, but still allow the parser to continue reading the next contribution - at that API level you'd change to a callback/"non-fatal" concept - it's fatal to the line table contribution, but non-fatal to the line table section as a whole)

But if you/other folks feel this patch is the right direction, I won't stand in the way of it -just my 2c.

Redirecting conversation from D81102:

This is the sort of "let's try to curate where errors print out relative to output" changes/direction I think are pretty unfortunate - being able to return errors rather than passing in error handlers seems like a better general coding paradigm where possible (while the callback system for warnings is great, and callbacks for errors are sometimes necessary - when there is some ability to print an error and continue) - returning errors seems clearer/more obvious/simpler control flow (there's no ability to accidentally print an error and continue because you forgot the error handler doesn't terminate execution/need to separately return? and what value do you return if you need to communicate to the caller they need to stop too etc)...

The problem is that in the common case, most tools are doing the same thing with both recoverable and unrecoverable errors, it's just done at different times. This means that some error messages appear in the middle of parsing the table (before the terminating blank line), whilst others appear after the prologue has been printed, after the blank line, and seemingly with the next table. This is the incosistency and potential confusion I'm trying to avoid here.

I have a patch for this already, but essentially all that it does is change this function to return void, having passed the error to a separate UnrecoverableErrorHandler (which already exists in higher-level code). The change is NFC from the point of view of what gets parsed - the same handler will still get called allowing the client to stop or do whatever etc. The only difference is when the callback is called, and since it's a callback, the client can still handle the error as it did before. In practice, they both just get manifested as warnings most of the time, so the timing difference looks weird to most clients.

Whilst I get your point about simpler control flow, we are currently in a slightly weird hybrid situation where sometimes we use a callback and other times we return the error, so in some ways, I think this idea simplifies the interface.

I think we're going in a less elegant direction by supporting more dumping/printing while parsing, rather than less - making these sort of error handling V dumping complexities.

What about breaking up DWARFDebugLine::LineTable::parse's implementation - so a caller could explicitly parse the prologue & decide what to do with that error, rather than having parse responsible for dumping things - then call to parse the rest of the table (where I wouldn't mind revisiting the dumping+parsing mix there too but can save that for another day/refactor/review/etc).

I think my underlying thesis here is that mixing parsing and dumping is a bit problematic/difficult to maintain well (though I don't have a perfect answer for the verbose line table dumping, for instance - I imagine the realistic solution there is a callback-based API for the raw/verbose line table, and implementing the processed line table on top of that API - possibly inserting an adapter of sorts there when you want to retrieve the processed line table while at the same time dumping the verbose details) and that fatal errors are probably tidier handled via return values than via a callback since they should be stopping control flow anyway. (at some point they're not quite fatal - fatal within one line table contribution, but still allow the parser to continue reading the next contribution - at that API level you'd change to a callback/"non-fatal" concept - it's fatal to the line table contribution, but non-fatal to the line table section as a whole)

But if you/other folks feel this patch is the right direction, I won't stand in the way of it -just my 2c.

llvm::DWARFDebugLine::Prologue::parse has been called in three places. llvm::DWARFDebugLine::LineTable::parse is called in three places. Splitting llvm::DWARFDebugLine::LineTable::parse into two (parsePrologue and parseBody) seems fine to me.

labath added a comment.Jun 8 2020, 5:36 AM

Redirecting conversation from D81102:

This is the sort of "let's try to curate where errors print out relative to output" changes/direction I think are pretty unfortunate - being able to return errors rather than passing in error handlers seems like a better general coding paradigm where possible (while the callback system for warnings is great, and callbacks for errors are sometimes necessary - when there is some ability to print an error and continue) - returning errors seems clearer/more obvious/simpler control flow (there's no ability to accidentally print an error and continue because you forgot the error handler doesn't terminate execution/need to separately return? and what value do you return if you need to communicate to the caller they need to stop too etc)...

The problem is that in the common case, most tools are doing the same thing with both recoverable and unrecoverable errors, it's just done at different times. This means that some error messages appear in the middle of parsing the table (before the terminating blank line), whilst others appear after the prologue has been printed, after the blank line, and seemingly with the next table. This is the incosistency and potential confusion I'm trying to avoid here.

I have a patch for this already, but essentially all that it does is change this function to return void, having passed the error to a separate UnrecoverableErrorHandler (which already exists in higher-level code). The change is NFC from the point of view of what gets parsed - the same handler will still get called allowing the client to stop or do whatever etc. The only difference is when the callback is called, and since it's a callback, the client can still handle the error as it did before. In practice, they both just get manifested as warnings most of the time, so the timing difference looks weird to most clients.

Whilst I get your point about simpler control flow, we are currently in a slightly weird hybrid situation where sometimes we use a callback and other times we return the error, so in some ways, I think this idea simplifies the interface.

I think we're going in a less elegant direction by supporting more dumping/printing while parsing, rather than less - making these sort of error handling V dumping complexities.

What about breaking up DWARFDebugLine::LineTable::parse's implementation - so a caller could explicitly parse the prologue & decide what to do with that error, rather than having parse responsible for dumping things - then call to parse the rest of the table (where I wouldn't mind revisiting the dumping+parsing mix there too but can save that for another day/refactor/review/etc).

I think my underlying thesis here is that mixing parsing and dumping is a bit problematic/difficult to maintain well (though I don't have a perfect answer for the verbose line table dumping, for instance - I imagine the realistic solution there is a callback-based API for the raw/verbose line table, and implementing the processed line table on top of that API - possibly inserting an adapter of sorts there when you want to retrieve the processed line table while at the same time dumping the verbose details) and that fatal errors are probably tidier handled via return values than via a callback since they should be stopping control flow anyway. (at some point they're not quite fatal - fatal within one line table contribution, but still allow the parser to continue reading the next contribution - at that API level you'd change to a callback/"non-fatal" concept - it's fatal to the line table contribution, but non-fatal to the line table section as a whole)

But if you/other folks feel this patch is the right direction, I won't stand in the way of it -just my 2c.

I also think it would be good to separate the (verbose) dumping from the parsing code. The way we did that for location list dumping is that the parser was a function taking a callback -- Error visitLocationList(uint64_t *Offset, function_ref<bool(const DWARFLocationEntry &)> Callback). I think a similar approach would work here, except that the callback argument would be Expected<Opcode>, because we can have "recoverable" errors when parsing opcodes (there are no such errors for location lists). Fatal errors would be reported via the return value. When dumping, once can pass in a callback which prints out the raw opcodes.

Doing something similar for line tables would be sort of simpler, because we don't need to support two very different encodings (debug_loc, debug_loclists), but it would also be more complicated because line tables are more complicated. But if we can do a small change like splitting prologue parsing that would move us in that direction, then I think we should do it.

One of my other pending patches is going to address truncated standard opcodes, similar to what D80797 does for extended opcodes. Unfortunately, because standard opcodes have no dedicated length field (unlike extended ones), it's not possible to resume parsing of the current table, if there's a problem reading the data. In all likelihood of course, it doesn't matter (the only errors are likely to be running off the table end, but technically an overflowing ULEB could also cause an error), but really this failure should be reported as an "unrecoverable" error, to indicate that table parsing is incomplete. This means that unrecoverable errors will be encountered when parsing the body as well as the prologue, and as such, I don't think splitting the prologue and body parsing into two separate functions will help solve the issue. I'll give it some thought and look at the .debug_loc code to consider alternatives. I do find parsing and printing simultaneously to be a little less than ideal, I must say.

I've spotted another example of where the current error handling technique doesn't work well combined with verbose output - when there is an error with the address size, the error will be reported mid line, which means that if the streams are interleaved, you'll get something like 0x0000002f: 00 DW_LNE_set_addresswarning: ... (0x00000008). There are potentially similar problems for other opcodes too. It could be possible to delay sending these errors to the handler until the full line has been printed, but I'm not particularly happy with the code changes required for that. This further illustrates that making the verbose printing independent of the parsing somehow might be preferable.

jhenderson abandoned this revision.Sep 16 2020, 2:19 AM

I've addressed the main issues I aimed to resolve in other patches. There may be some issues with odd output of error messages when the output is interleaved, but I have other priorities to work on at this point. If someone wants to work on this furhter, it's probably best done in a new review.