This is an archive of the discontinued LLVM Phabricator instance.

[DWARF] Fix handling of .debug_aranges with -g1
ClosedPublic

Authored by ayermolo on Oct 31 2022, 5:42 PM.

Details

Summary

Old behavior was to add to .debug_aranges only when we create a DIE. As the
result we could end up in situation where DW_AT_ranges have addresses that are
not in .debug_aranges. This has caused issues for LLDB: D136395.

Changed it to add addresses to .debug_aranges even when DIE is not created.

Diff Detail

Event Timeline

ayermolo created this revision.Oct 31 2022, 5:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 31 2022, 5:42 PM
ayermolo requested review of this revision.Oct 31 2022, 5:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 31 2022, 5:42 PM

Sounds reasonable to me, but I will let the llvm experts weigh in here. Even if we don't add a DW_TAG_subprogram, we might have line table addresses that are associated with source locations in a compile unit and we should map those addresses to the compile unit right?

This is the sort of thing I kind of wanted to avoid when we were talking about adding the aranges verifier - fixing bugs in aranges when instead we should be fixing bugs in consumers to not depend on aranges.

But anyway - if you really want/need aranges, perhaps a deeper fix would be good? We could emit aranges as CU ranges + data ranges instead of keeping an entirely separate list that can get out of sync? (& while we're at it, probably move the data ranges to the CU - rather than having to map them back from another independent data structure in DwarfDebug?)

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
2264

We're already in a DwarfDebug function, no need to retrieve the DwarfDebug object some other way.

Oh, also:

This has caused issues for LLDB

The debug_aranges verifier is mentioned as support for this statement, but I don't seee any discussion of LLDB in that thread - what's the connection to LLDB here? Could we fix LLDB to ignore aranges & just use CU ranges (which presumably it already does if aranges aren't present)?

This is the sort of thing I kind of wanted to avoid when we were talking about adding the aranges verifier - fixing bugs in aranges when instead we should be fixing bugs in consumers to not depend on aranges.

But anyway - if you really want/need aranges, perhaps a deeper fix would be good? We could emit aranges as CU ranges + data ranges instead of keeping an entirely separate list that can get out of sync? (& while we're at it, probably move the data ranges to the CU - rather than having to map them back from another independent data structure in DwarfDebug?)

I totally agree... Trying to see if we can migrate internal tool from using aranges. Although considering this is a bit of a "special case" with -g1, might be OK just let it be for that tool.

You have a good point. @clayborg can we just change lldb to ignore this section?

This is the sort of thing I kind of wanted to avoid when we were talking about adding the aranges verifier - fixing bugs in aranges when instead we should be fixing bugs in consumers to not depend on aranges.

But anyway - if you really want/need aranges, perhaps a deeper fix would be good? We could emit aranges as CU ranges + data ranges instead of keeping an entirely separate list that can get out of sync? (& while we're at it, probably move the data ranges to the CU - rather than having to map them back from another independent data structure in DwarfDebug?)

I totally agree... Trying to see if we can migrate internal tool from using aranges. Although considering this is a bit of a "special case" with -g1, might be OK just let it be for that tool.

You have a good point. @clayborg can we just change lldb to ignore this section?

This is the first time I have been aware that .debug_aranges was so unreliable when we identified the root cause of this issue. I am totally fine and I will modify LLDB to ignore this section when it can. If we have valid indexes like .debug_names though, it would be a shame, if each compile unit doesn't have a DW_AT_ranges, to have to manually index the DWARF just to figure out the aranges for a compile unit, so I might let LLDB rely on this section if it is available, but only if the compile unit has no valid ranges.

We can modify every individual tool to work around the compiler producing a .debug_aranges section that doesn't do its job. That is certainly one way to fix this issue. I am also fine with giving up on .debug_aranges. In fact we should remove it completely from the compiler, as it isn't producing a section that any tool can rely on. Why are we even making it if we aren't going to do it right? And what is the with definition of doing it right, I think we have seen there are differences between what gcc and clang will produce. Part of this is the DWARF spec not . So yes I am fine with removing this section being used at all in LLDB. That still doesn't mean all of the other tools currently relying on it are going to work. So lets just stop the compiler from producing it all together, then no tools have to be modified so just magically know this section isn't reliable.

I would also suggest that someone on the LLVM side make equivalent changes to make sure no one tries to use the .debug_aranges for the LLVM symbolizer in the DWARF code for LLVM, or for any DWARF code that does address lookups.

I'll just point out that there are other debuggers/consumers of these sections besides lldb. I mostly just watch these code-changes with my bag of popcorn and pass them along to my debugger engineers.

This is the sort of thing I kind of wanted to avoid when we were talking about adding the aranges verifier - fixing bugs in aranges when instead we should be fixing bugs in consumers to not depend on aranges.

But anyway - if you really want/need aranges, perhaps a deeper fix would be good? We could emit aranges as CU ranges + data ranges instead of keeping an entirely separate list that can get out of sync? (& while we're at it, probably move the data ranges to the CU - rather than having to map them back from another independent data structure in DwarfDebug?)

I totally agree... Trying to see if we can migrate internal tool from using aranges. Although considering this is a bit of a "special case" with -g1, might be OK just let it be for that tool.

You have a good point. @clayborg can we just change lldb to ignore this section?

This is the first time I have been aware that .debug_aranges was so unreliable when we identified the root cause of this issue. I am totally fine and I will modify LLDB to ignore this section when it can. If we have valid indexes like .debug_names though, it would be a shame, if each compile unit doesn't have a DW_AT_ranges, to have to manually index the DWARF just to figure out the aranges for a compile unit, so I might let LLDB rely on this section if it is available, but only if the compile unit has no valid ranges.

I agree in the abstract, but are there any producers you know of that don't produce CU ranges (& also do produce aranges) that this would be intended to support?

We can modify every individual tool to work around the compiler producing a .debug_aranges section that doesn't do its job. That is certainly one way to fix this issue. I am also fine with giving up on .debug_aranges. In fact we should remove it completely from the compiler, as it isn't producing a section that any tool can rely on. Why are we even making it if we aren't going to do it right? And what is the with definition of doing it right, I think we have seen there are differences between what gcc and clang will produce. Part of this is the DWARF spec not . So yes I am fine with removing this section being used at all in LLDB. That still doesn't mean all of the other tools currently relying on it are going to work. So lets just stop the compiler from producing it all together, then no tools have to be modified so just magically know this section isn't reliable.

I'd expect a lot of folks are "getting by" with what it is today and removing it entirely would be a regression for them. Perhaps they are having problems they don't realize - or perhaps it's "good enough" in that the bugs don't hurt them too much, or maybe don't hurt them at all (if they never use -gmlt + -garanges, for instance, maybe that's OK). Maybe we could thread the needle and remove aranges for gmlt, since gmlt is so focussed on debug info size, that might be worthwhile & anyone who comes complaining that their tool that relied on aranges is broken or slow we can say "well, the aranges in that case weren't correct anyway, so we've helped you, really".

Maybe that's a step along the deprecation/removal of aranges.

I'll just point out that there are other debuggers/consumers of these sections besides lldb. I mostly just watch these code-changes with my bag of popcorn and pass them along to my debugger engineers.

Any ideas if your debugger relies on aranges? Do you also maintain a clang toolchain, does it default to emitting aranges (this isn't the default in trunk/upstream, but no doubt some people might change it for their situation/compatibility/whatever)?

ayermolo edited the summary of this revision. (Show Details)Jan 24 2023, 2:54 PM
ayermolo updated this revision to Diff 491929.Jan 24 2023, 3:02 PM

addressed comment

ayermolo marked an inline comment as done.Jan 24 2023, 3:04 PM

Well this completely fell through the cracks. :facepalm
@dblaikie do you want to be an official reviewer for this?

Well this completely fell through the cracks. :facepalm
@dblaikie do you want to be an official reviewer for this?

Sure, don't mind formal or informal.

Was waiting on a few questions I asked @clayborg in my last comment. (perhaps the most practical one: Perhaps we could address this issue by removing support for aranges+gmlt instead of fixing this bug, as a step towards deprecating/removing aranges - though I'm not totally opposed to fixing the bug, just somewhat averse to supporting this use case where I think a bit more pressure on folks still using it to instead migrate to CU ranges might be good)

Well this completely fell through the cracks. :facepalm
@dblaikie do you want to be an official reviewer for this?

Sure, don't mind formal or informal.

Was waiting on a few questions I asked @clayborg in my last comment. (perhaps the most practical one: Perhaps we could address this issue by removing support for aranges+gmlt instead of fixing this bug, as a step towards deprecating/removing aranges - though I'm not totally opposed to fixing the bug, just somewhat averse to supporting this use case where I think a bit more pressure on folks still using it to instead migrate to CU ranges might be good)

Thank you. I'll ping Greg if he hasn't replied when he is back.
As for removing aranges, one of our internal symbolization tools relies on aranges currently. I don't think it's in the cards for that to change in short/medium future. :(

This is the sort of thing I kind of wanted to avoid when we were talking about adding the aranges verifier - fixing bugs in aranges when instead we should be fixing bugs in consumers to not depend on aranges.

But anyway - if you really want/need aranges, perhaps a deeper fix would be good? We could emit aranges as CU ranges + data ranges instead of keeping an entirely separate list that can get out of sync? (& while we're at it, probably move the data ranges to the CU - rather than having to map them back from another independent data structure in DwarfDebug?)

I totally agree... Trying to see if we can migrate internal tool from using aranges. Although considering this is a bit of a "special case" with -g1, might be OK just let it be for that tool.

You have a good point. @clayborg can we just change lldb to ignore this section?

This is the first time I have been aware that .debug_aranges was so unreliable when we identified the root cause of this issue. I am totally fine and I will modify LLDB to ignore this section when it can. If we have valid indexes like .debug_names though, it would be a shame, if each compile unit doesn't have a DW_AT_ranges, to have to manually index the DWARF just to figure out the aranges for a compile unit, so I might let LLDB rely on this section if it is available, but only if the compile unit has no valid ranges.

I agree in the abstract, but are there any producers you know of that don't produce CU ranges (& also do produce aranges) that this would be intended to support?

I don't try to guess what compilers people will use and then feed to LLDB. I just know I have seen it all over the years. Any even if we fix current tools to produce new and better output that is well suited to the debugger, we still need to be able to load anything we are given from the past.

We can modify every individual tool to work around the compiler producing a .debug_aranges section that doesn't do its job. That is certainly one way to fix this issue. I am also fine with giving up on .debug_aranges. In fact we should remove it completely from the compiler, as it isn't producing a section that any tool can rely on. Why are we even making it if we aren't going to do it right? And what is the with definition of doing it right, I think we have seen there are differences between what gcc and clang will produce. Part of this is the DWARF spec not . So yes I am fine with removing this section being used at all in LLDB. That still doesn't mean all of the other tools currently relying on it are going to work. So lets just stop the compiler from producing it all together, then no tools have to be modified so just magically know this section isn't reliable.

I'd expect a lot of folks are "getting by" with what it is today and removing it entirely would be a regression for them. Perhaps they are having problems they don't realize - or perhaps it's "good enough" in that the bugs don't hurt them too much, or maybe don't hurt them at all (if they never use -gmlt + -garanges, for instance, maybe that's OK). Maybe we could thread the needle and remove aranges for gmlt, since gmlt is so focussed on debug info size, that might be worthwhile & anyone who comes complaining that their tool that relied on aranges is broken or slow we can say "well, the aranges in that case weren't correct anyway, so we've helped you, really".

Maybe that's a step along the deprecation/removal of aranges.

By "gmlt" you mean "-g1"? LLDB is fine if this section is not here, so removing it anytime we can get away with it is fine.

The main issue is if we produce it, it should be reliable IMHO. AS you say, many people are relying on it, so anything we can do to fix it so that it is as correct as possible is a good thing.

This is the sort of thing I kind of wanted to avoid when we were talking about adding the aranges verifier - fixing bugs in aranges when instead we should be fixing bugs in consumers to not depend on aranges.

But anyway - if you really want/need aranges, perhaps a deeper fix would be good? We could emit aranges as CU ranges + data ranges instead of keeping an entirely separate list that can get out of sync? (& while we're at it, probably move the data ranges to the CU - rather than having to map them back from another independent data structure in DwarfDebug?)

I totally agree... Trying to see if we can migrate internal tool from using aranges. Although considering this is a bit of a "special case" with -g1, might be OK just let it be for that tool.

You have a good point. @clayborg can we just change lldb to ignore this section?

This is the first time I have been aware that .debug_aranges was so unreliable when we identified the root cause of this issue. I am totally fine and I will modify LLDB to ignore this section when it can. If we have valid indexes like .debug_names though, it would be a shame, if each compile unit doesn't have a DW_AT_ranges, to have to manually index the DWARF just to figure out the aranges for a compile unit, so I might let LLDB rely on this section if it is available, but only if the compile unit has no valid ranges.

I agree in the abstract, but are there any producers you know of that don't produce CU ranges (& also do produce aranges) that this would be intended to support?

I don't try to guess what compilers people will use and then feed to LLDB. I just know I have seen it all over the years. Any even if we fix current tools to produce new and better output that is well suited to the debugger, we still need to be able to load anything we are given from the past.

I think it'd be unfortunate to retain support for scenarios we have no evidence of - and in this particular case it'd just be worse performance to ignore aranges in the absence of CU ranges, not outright unsupported.

We can modify every individual tool to work around the compiler producing a .debug_aranges section that doesn't do its job. That is certainly one way to fix this issue. I am also fine with giving up on .debug_aranges. In fact we should remove it completely from the compiler, as it isn't producing a section that any tool can rely on. Why are we even making it if we aren't going to do it right? And what is the with definition of doing it right, I think we have seen there are differences between what gcc and clang will produce. Part of this is the DWARF spec not . So yes I am fine with removing this section being used at all in LLDB. That still doesn't mean all of the other tools currently relying on it are going to work. So lets just stop the compiler from producing it all together, then no tools have to be modified so just magically know this section isn't reliable.

I'd expect a lot of folks are "getting by" with what it is today and removing it entirely would be a regression for them. Perhaps they are having problems they don't realize - or perhaps it's "good enough" in that the bugs don't hurt them too much, or maybe don't hurt them at all (if they never use -gmlt + -garanges, for instance, maybe that's OK). Maybe we could thread the needle and remove aranges for gmlt, since gmlt is so focussed on debug info size, that might be worthwhile & anyone who comes complaining that their tool that relied on aranges is broken or slow we can say "well, the aranges in that case weren't correct anyway, so we've helped you, really".

Maybe that's a step along the deprecation/removal of aranges.

By "gmlt" you mean "-g1"?

Yep.

LLDB is fine if this section is not here, so removing it anytime we can get away with it is fine.

The main issue is if we produce it, it should be reliable IMHO. AS you say, many people are relying on it, so anything we can do to fix it so that it is as correct as possible is a good thing.

That's my suggestion, though - one way we could address the bug with -g1+-garanges is to not support that combination/reject it/not emit aranges under -g1.

But if you've really got tools that ignore CU ranges that are significantly harder to change than fixing clang/llvm here - I guess we can fix llvm... But I'd /really/ encourage folks to push towards removing dependence on aranges whenever possible/practical.

If the only time clang is not emitting .debug_aranges correctly is when we use -g1, then fixing that issue (by either fixing or removing .debug_arnages) would be fine by me. That being said, we have a clang that has been producing bad .debug_aranges, so unless we fix LLDB it won't fix the issue for binaries already out in the wild.

If the only time clang is not emitting .debug_aranges correctly is when we use -g1, then fixing that issue (by either fixing or removing .debug_arnages) would be fine by me. That being said, we have a clang that has been producing bad .debug_aranges, so unless we fix LLDB it won't fix the issue for binaries already out in the wild.

What sort of fix for LLDB are you picturing? One way to fix LLDB would be to teach it to ignore .debug_aranges - which would hurt perf on inputs with correct .debug_aranges but no CU ranges, but we don't seem to have anyone clamouring for support for that scenario.

If the only time clang is not emitting .debug_aranges correctly is when we use -g1, then fixing that issue (by either fixing or removing .debug_arnages) would be fine by me. That being said, we have a clang that has been producing bad .debug_aranges, so unless we fix LLDB it won't fix the issue for binaries already out in the wild.

What sort of fix for LLDB are you picturing? One way to fix LLDB would be to teach it to ignore .debug_aranges - which would hurt perf on inputs with correct .debug_aranges but no CU ranges, but we don't seem to have anyone clamouring for support for that scenario.

If I did a fix I would:

  • use CU ranges always if they are available
  • fall back to .debug_aranges if there
  • do expensive manual parsing for ranges

If the only time clang is not emitting .debug_aranges correctly is when we use -g1, then fixing that issue (by either fixing or removing .debug_arnages) would be fine by me. That being said, we have a clang that has been producing bad .debug_aranges, so unless we fix LLDB it won't fix the issue for binaries already out in the wild.

What sort of fix for LLDB are you picturing? One way to fix LLDB would be to teach it to ignore .debug_aranges - which would hurt perf on inputs with correct .debug_aranges but no CU ranges, but we don't seem to have anyone clamouring for support for that scenario.

If I did a fix I would:

  • use CU ranges always if they are available
  • fall back to .debug_aranges if there
  • do expensive manual parsing for ranges

That still doesn't address the existing broken aranges though, yeah? So to make LLDB robust to existing debug info, the only correct thing would be to ignore aranges - and we don't have any known examples where this hurts performance anyway (no known aranges-but-no-cu-ranges emitters), so seems like a pretty painless thing to do?

To me it sounds it's easier to just fix it in llvm, no?

To me it sounds it's easier to just fix it in llvm, no?

Though that doesn't address the concern @clayborg that lldb will still misbehave on existing compiled code?

To me it sounds it's easier to just fix it in llvm, no?

Though that doesn't address the concern @clayborg that lldb will still misbehave on existing compiled code?

Which is ok, this is to fix the compiler. I can make separate LLDB changes to address the reality of the situation.

To me it sounds it's easier to just fix it in llvm, no?

Though that doesn't address the concern @clayborg that lldb will still misbehave on existing compiled code?

Which is ok, this is to fix the compiler. I can make separate LLDB changes to address the reality of the situation.

What sort of change to LLDB are you thinking of?

To me it sounds it's easier to just fix it in llvm, no?

Though that doesn't address the concern @clayborg that lldb will still misbehave on existing compiled code?

Which is ok, this is to fix the compiler. I can make separate LLDB changes to address the reality of the situation.

What sort of change to LLDB are you thinking of?

using DW_AT_ranges instead of .debug_aranges if they are available.

To me it sounds it's easier to just fix it in llvm, no?

Though that doesn't address the concern @clayborg that lldb will still misbehave on existing compiled code?

Which is ok, this is to fix the compiler. I can make separate LLDB changes to address the reality of the situation.

What sort of change to LLDB are you thinking of?

using DW_AT_ranges instead of .debug_aranges if they are available.

So then LLDB wouldn't use this .debug_aranges data this patch fixes, right? So could we remove this .debug_aranges data instead?

To me it sounds it's easier to just fix it in llvm, no?

Though that doesn't address the concern @clayborg that lldb will still misbehave on existing compiled code?

Which is ok, this is to fix the compiler. I can make separate LLDB changes to address the reality of the situation.

What sort of change to LLDB are you thinking of?

using DW_AT_ranges instead of .debug_aranges if they are available.

So then LLDB wouldn't use this .debug_aranges data this patch fixes, right? So could we remove this .debug_aranges data instead?

Older LLDB versions which are out in the wild and will exist until the end of time will use the .debug_aranges. So I would vote to fix the compiler and do it with this patch.

dblaikie accepted this revision.Jan 31 2023, 10:47 AM

To me it sounds it's easier to just fix it in llvm, no?

Though that doesn't address the concern @clayborg that lldb will still misbehave on existing compiled code?

Which is ok, this is to fix the compiler. I can make separate LLDB changes to address the reality of the situation.

What sort of change to LLDB are you thinking of?

using DW_AT_ranges instead of .debug_aranges if they are available.

So then LLDB wouldn't use this .debug_aranges data this patch fixes, right? So could we remove this .debug_aranges data instead?

Older LLDB versions which are out in the wild and will exist until the end of time will use the .debug_aranges. So I would vote to fix the compiler and do it with this patch.

They won't use them if we remove them. Old debug info in old lldb will always be broken, can't help there. Yes, we could fix or remove aranges and that'd fix new code in new or old lldb and we could change the priority (CU ranges first, then aranges) or remove aranges support from lldb and that'd fix new lldb with old and new code.

I'd prefer we move in a direction away from aranges.

But sure, feel free to make this fix - but I'd really encourage you to revisit your use of/need for aranges.

This revision is now accepted and ready to land.Jan 31 2023, 10:47 AM
This revision was automatically updated to reflect the committed changes.

Thank you for reviewing and feedback.