This is an archive of the discontinued LLVM Phabricator instance.

Fix unwind failures when PC points beyond the end of a function
ClosedPublic

Authored by uweigand on Apr 11 2016, 11:25 AM.

Details

Summary

RegisterContextLLDB::InitializeNonZerothFrame already has code to attempt
to detect and handle the case where the PC points beyond the end of a
function, but there are certain cases where this doesn't work correctly.

In fact, there are *two* different places where this detection is attempted,
and the failure is in fact a result of an unfortunate interaction between
those two separate attempts.

First, the ResolveSymbolContextForAddress routine is called with the
resolve_tail_call_address flag set to true. This causes the routine
to internally accept a PC pointing beyond the end of a function, and
still resolving the PC to that function symbol.

Second, the InitializeNonZerothFrame routine itself maintains a
"decr_pc_and_recompute_addr_range" flag and, if that turns out to
be true, itself decrements the PC by one and searches again for
a symbol at that new PC value.

Both approaches correctly identify the symbol associated with the PC.
However, the problem is now that later on, we also need to find the
DWARF CFI record associated with the PC. This is done in the
RegisterContextLLDB::GetFullUnwindPlanForFrame routine, and uses
the "m_current_offset_backed_up_one" member variable.

However, that variable only actually contains the PC "backed up by
one" if the *second* approach above was taken. If the function was
already identified via the first approach above, that member variable
is *not* backed up by one but simply points to the original PC.
This in turn causes GetEHFrameUnwindPlan to not correctly identify
the DWARF CFI record associated with the PC.

Now, in many cases, if the first method had to back up the PC by one,
we *still* use the second method too, because of this piece of code:

// Or if we're in the middle of the stack (and not "above" an asynchronous event like sigtramp),
// and our "current" pc is the start of a function...
if (m_sym_ctx_valid
    && GetNextFrame()->m_frame_type != eTrapHandlerFrame
    && GetNextFrame()->m_frame_type != eDebuggerFrame
    && addr_range.GetBaseAddress().IsValid()
    && addr_range.GetBaseAddress().GetSection() == m_current_pc.GetSection()
    && addr_range.GetBaseAddress().GetOffset() == m_current_pc.GetOffset())
{
    decr_pc_and_recompute_addr_range = true;
}

In many cases, when the PC is one beyond the end of the current function,
it will indeed then be exactly at the start of the next function. But this
is not always the case, e.g. if there happens to be alignment padding
between the end of one function and the start of the next.

In those cases, we may sucessfully look up the function symbol via
ResolveSymbolContextForAddress, but *not* set decr_pc_and_recompute_addr_range,
and therefore fail to find the correct DWARF CFI record.

A very simple fix for this problem is to just never use the first method.
Call ResolveSymbolContextForAddress with resolve_tail_call_address set
to false, which will cause it to fail if the PC is beyond the end of
the current function; or else, identify the next function if the PC
is also at the start of the next function. In either case, we will
then set the decr_pc_and_recompute_addr_range variable and back up the
PC anyway, but this time also find the correct DWARF CFI.

Diff Detail

Repository
rL LLVM

Event Timeline

uweigand updated this revision to Diff 53290.Apr 11 2016, 11:25 AM
uweigand retitled this revision from to Fix unwind failures when PC points beyond the end of a function.
uweigand updated this object.
uweigand added a reviewer: jasonmolenda.
uweigand added a subscriber: lldb-commits.
uweigand updated this revision to Diff 53951.Apr 15 2016, 2:42 PM
uweigand added reviewers: clayborg, tberghammer.

Add fix for a related problem that still caused unwind failures on SystemZ.

The ResolveSymbolContextForAddress sometimes returns a "symbol" with empty
name. This turns out to be an ELF section symbol. Now, usually those get type
eSymbolTypeInvalid. However, there is code in ObjectFileELF::ParseSymbols
that tries to change the type of invalid symbols to eSymbolTypeCode or
eSymbolTypeData if the symbol lies within the code or data section.

Unfortunately, this check also hits the symbol for the code section
itself, which is then marked as eSymbolTypeCode. While the size of
the section symbol is 0 according to the ELF file, LLDB considers
this size invalid and attempts to figure out the "correct" size.
Depending on how this goes, we may end up with a symbol that overlays
part of the code section, even outside areas covered by real function
symbols.

Therefore, if we call ResolveSymbolContextForAddress with PC pointing
beyond the end of a function, we may get this bogus section symbol.
This again means InitializeNonZerothFrame thinks we have a valid PC,
but then we don't find any unwind info for it.

The fix for this problem seems to me to simply always leave ELF section
symbols as type eSymbolTypeInvalid.

clayborg accepted this revision.Apr 15 2016, 3:07 PM
clayborg edited edge metadata.

I am OK with this, but Jason Molenda needs to be OK as well. Seems like we should have a function on the right object that would return the PC that we should use when looking up symbols for a given stack frame so we don't mess this up at any point.

This revision is now accepted and ready to land.Apr 15 2016, 3:07 PM
jasonmolenda accepted this revision.Apr 19 2016, 8:40 PM
jasonmolenda edited edge metadata.

Hi Ulrich, sorry for the delayed reply, I wanted to look at this function again.

This function has gone through some modifications over its lifetime. As you correctly point out, we're doing this back-up-by-one thing in two places and that's not needed.

I think you should remove the resolve_tail_call_address argument altogether around line 473 - take out the variable, the comment, everything. Module::ResolveSymbolContextForAddress defaults this argument to 'false' if it isn't provided.

Please commit when you're ready, the details of this are up to you.

tberghammer accepted this revision.Apr 20 2016, 3:28 AM
tberghammer edited edge metadata.

I think having an explicit value for resolve_tail_call_address with a comment is valuable (not sure how much details we need in the comment) so when the next person looking into related unwinding issue he/she will know that changing resolve_tail_call_address to true is a bad idea (even if it fixes the problem they are debugging) and will break some usecase

This revision was automatically updated to reflect the committed changes.

Given Tamas' update, I've now checked in the original version, including comment. If you do want me to take out the comment after all, please let me know and I'll do so.