This is an archive of the discontinued LLVM Phabricator instance.

Fix PC adjustment in StackFrame::GetSymbolContext
ClosedPublic

Authored by JosephTremoulet on Jul 19 2019, 7:27 AM.

Details

Summary

Update StackFrame::GetSymbolContext to mirror the logic in
RegisterContextLLDB::InitializeNonZerothFrame that knows not to do the
pc decrement when the given frame is a signal trap handler frame or the
parent of one, because the pc may not follow a call in these frames.
Accomplish this by adding a behaves_like_zeroth_frame field to
lldb_private::StackFrame, set to true for the zeroth frame, for
signal handler frames, and for parents of signal handler frames.

Also add logic to propagate the signal handler flag from UnwindPlan to
the FrameType on the RegisterContextLLDB it generates, and factor out a
helper to resolve symbol and address range for a RegisterContextLLDB now
that we have it in four places.

Diff Detail

Repository
rL LLVM

Event Timeline

Herald added a project: Restricted Project. · View Herald Transcript
JosephTremoulet marked 2 inline comments as done.Jul 19 2019, 7:42 AM
JosephTremoulet added inline comments.
lldb/source/Plugins/Process/Utility/RegisterContextLLDB.cpp
1760 ↗(On Diff #210831)

I'm a bit ambivalent about adding this part -- the downside is that it's not concretely helping today, because if an eh_frame record has the 'S' flag in its augmentation (which is what unwind_Plan->GetUnwindPlanForSignalTrap() reports), we'll have already decremented pc and generated unwind plans based on the prior function when initializing the register context. But the upside is that this connects the dots between the otherwise-unused bit on the unwind plan and the frame type, and will be in place should we subsequently add code to try the second function's unwind plan as a fallback.

lldb/source/Target/StackFrame.cpp
297 ↗(On Diff #210831)

I've verified that this fixes the issue with the function name in the stack trace mentioned in D63667, but I haven't figured a way to add a test to that effect without putting a brittle assumption about particular trap handler names in the test.

@jasonmolenda @clayborg ping. To clarify, the issue this fixes is, using test functionalities/signal/handle-abrt as an example:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

void handler(int sig)
{
    printf("Set a breakpoint here.\n");
    exit(0);
}

void abort_caller() {
    abort();
}

int main()
{
    if (signal(SIGABRT, handler) == SIG_ERR)
    {
        perror("signal");
        return 1;
    }

    abort_caller();
    return 2;
}

When stopped at the breakpoint in "handler", backtrace currently gives this:

(lldb) bt
* thread #1, name = 'sigtest', stop reason = breakpoint 1.1
  * frame #0: 0x0000000000400651 sigtest`handler(sig=6) at main.c:7
    frame #1: 0x00007ffff7a424b0 libc.so.6`___lldb_unnamed_symbol1$$libc.so.6 + 1    <----- the symbol context of this frame is the issue
    frame #2: 0x00007ffff7a42428 libc.so.6`__GI_raise(sig=6) at raise.c:54
    frame #3: 0x00007ffff7a4402a libc.so.6`__GI_abort at abort.c:89
    frame #4: 0x000000000040066e sigtest`abort_caller() at main.c:12
    frame #5: 0x00000000004006a2 sigtest`main at main.c:23
    frame #6: 0x00007ffff7a2d830 libc.so.6`__libc_start_main(main=(sigtest`main at main.c:16), argc=1, argv=0x00007fffffffe328, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffe318) at libc-start.c:291
    frame #7: 0x0000000000400579 sigtest`_start + 41

the pc of frame 1 is actually on the first byte of the relevant function, not following a call:

(lldb) frame select 1
frame #1: 0x00007ffff7a424b0 libc.so.6`___lldb_unnamed_symbol1$$libc.so.6 + 1
libc.so.6`__restore_rt:
->  0x7ffff7a424b0 <+0>: movq   $0xf, %rax
    0x7ffff7a424b7 <+7>: syscall 
    0x7ffff7a424b9 <+9>: nopl   (%rax)

libc.so.6`__GI___libc_sigaction:
    0x7ffff7a424c0 <+0>: subq   $0xd0, %rsp
(lldb) disassemble -s '$pc - 2'
    0x7ffff7a424ae:       addb   %al, (%rax)
libc.so.6`__restore_rt:
->  0x7ffff7a424b0 <+0>:  movq   $0xf, %rax
    0x7ffff7a424b7 <+7>:  syscall 
    0x7ffff7a424b9 <+9>:  nopl   (%rax)

libc.so.6`__GI___libc_sigaction:
    0x7ffff7a424c0 <+0>:  subq   $0xd0, %rsp
    0x7ffff7a424c7 <+7>:  testq  %rsi, %rsi
    0x7ffff7a424ca <+10>: movq   %rdx, %r8

With this fix, we instead get this backtrace:

(lldb) bt
* thread #1, name = 'sigtest', stop reason = breakpoint 1.1
  * frame #0: 0x0000000000400651 sigtest`handler(sig=6) at main.c:7
    frame #1: 0x00007ffff7a424b0 libc.so.6`__restore_rt                <-------------- correct symbol context here
    frame #2: 0x00007ffff7a42428 libc.so.6`__GI_raise(sig=6) at raise.c:54
    frame #3: 0x00007ffff7a4402a libc.so.6`__GI_abort at abort.c:89
    frame #4: 0x000000000040066e sigtest`abort_caller() at main.c:12
    frame #5: 0x00000000004006a2 sigtest`main at main.c:23
    frame #6: 0x00007ffff7a2d830 libc.so.6`__libc_start_main(main=(sigtest`main at main.c:16), argc=1, argv=0x00007fffffffe328, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffe318) at libc-start.c:291
    frame #7: 0x0000000000400579 sigtest`_start + 41
clayborg requested changes to this revision.Jul 31 2019, 7:10 AM

I will let Jason comment on the unwind specifics since this is his area. I caught a few other things that need to be cleaned up.

lldb/include/lldb/Target/Unwind.h
40 ↗(On Diff #210831)

initialize with value just in case.

lldb/source/Plugins/Process/Utility/RegisterContextLLDB.cpp
153–155 ↗(On Diff #210831)

See comment for RegisterContextLLDB::TryResolveSymbolContextAndAddressRange, but if we change as suggested this will become:

m_sym_ctx_valid = m_current_pc.ResolveFunctionScope(m_sym_ctx, nullptr);
430–432 ↗(On Diff #210831)

See comment for RegisterContextLLDB::TryResolveSymbolContextAndAddressRange, but if we change as suggested this will become:

m_sym_ctx_valid = m_current_pc.ResolveFunctionScope(m_sym_ctx, nullptr);
486–487 ↗(On Diff #210831)

See comment for RegisterContextLLDB::TryResolveSymbolContextAndAddressRange, but if we change as suggested this will become:

m_sym_ctx_valid = m_current_pc.ResolveFunctionScope(m_sym_ctx, &addr_range);
595–624 ↗(On Diff #210831)

This function doesn't belong in RegisterContextLLDB. It think lldb_private::Address would be a better place:

/// if "addr_range_ptr" is not NULL, then fill in with the address range of the function.
bool lldb_private::Address::ResolveFunctionScope(lldb_private::SymbolContext &sym_ctx, lldb_private::AddressRange *addr_range_ptr) {
  constexpr bool resolve_tail_call_address = false;
  constexpr SymbolContextItem resolve_scope = eSymbolContextFunction | eSymbolContextSymbol;
  if (!CalculateSymbolContext(&sym_ctx, resolve_scope))
   return false;
  if (!addr_range_ptr)
    return true;
  return sym_ctx.GetAddressRange(resolve_scope, 0, false, *addr_range_ptr);  
}
1760 ↗(On Diff #210831)

I will let Jason comment on this one.

lldb/source/Target/StackFrameList.cpp
453 ↗(On Diff #210831)

init with value

670 ↗(On Diff #210831)

init with value

This revision now requires changes to proceed.Jul 31 2019, 7:10 AM
labath added a subscriber: labath.Jul 31 2019, 8:16 AM

I haven't looked at the implementation, but I did run into the problem you are fixing in the past, and I am happy that someone is finally implementing this. I just wanted to say that for testing, I think you should be able to create some hand-written assembly that creates the kind of stack frames and unwind info you need to trigger this. You can look at existing tests in lldb/lit/Unwind for examples.

JosephTremoulet marked an inline comment as done.Jul 31 2019, 8:51 AM
JosephTremoulet added inline comments.
lldb/source/Plugins/Process/Utility/RegisterContextLLDB.cpp
595–624 ↗(On Diff #210831)

Sure, will update. Just to double-check a couple points:

  1. I'll actually want to pass &addr_range at all three callsites, rather than passing nullptr at two of them, correct? (at the callsite on 154 to preserve the addr_range previously defined on 175, and at the callsite on 431to preserve the addr_range previously defined on 470)
  2. I see you removed the resolve_scope & resolved_scope check. Am I correct that that was intentional and it's redundant with the checks in GetAddressRange that require function or symbol to be non-null, or is there something more subtle going on here?

Thanks.

clayborg added inline comments.Jul 31 2019, 9:00 AM
lldb/source/Plugins/Process/Utility/RegisterContextLLDB.cpp
595–624 ↗(On Diff #210831)

For 1) above, it seems like you were making a new local variable just so you could pass it. If it is actually used somewhere, then yes, do include it. It seemed like you were just making a new local for no reason.

For 2) above: You can restore the resolve_scope & resolved_scope if needed. Can't remember if we will return eSymbolContextModule if we fail to find eSymbolContextFunction or eSymbolContextSymbol. So problably safest to restore that so we don't change functionality

  • Move resolution helper from RegisterContextLLDB to lldb_private::Address
  • Defensively initialize out arguments

Code looks fine now. Now Jason will need to chime in with comments on the new unwind functionality.

JosephTremoulet marked 8 inline comments as done.Jul 31 2019, 10:09 AM

I haven't looked at the implementation, but I did run into the problem you are fixing in the past, and I am happy that someone is finally implementing this. I just wanted to say that for testing, I think you should be able to create some hand-written assembly that creates the kind of stack frames and unwind info you need to trigger this. You can look at existing tests in lldb/lit/Unwind for examples.

Thanks for the tip! I was indeed able to copy one of those and adapt it to test this.

labath added a comment.Aug 1 2019, 3:57 AM

I haven't looked at the implementation, but I did run into the problem you are fixing in the past, and I am happy that someone is finally implementing this. I just wanted to say that for testing, I think you should be able to create some hand-written assembly that creates the kind of stack frames and unwind info you need to trigger this. You can look at existing tests in lldb/lit/Unwind for examples.

Thanks for the tip! I was indeed able to copy one of those and adapt it to test this.

Awesome. Thanks for adding the test. The test looks great.

jasonmolenda accepted this revision.Aug 1 2019, 3:04 PM

This looks good, this is in line with what we discussed, thanks for taking it on! Sorry for the delay at looking this over, it has been a little crazy this week.

lldb/source/Plugins/Process/Utility/RegisterContextLLDB.cpp
1760 ↗(On Diff #210831)

Yeah, this was my impression of the S augmentation flag in the eh_frame too -- we can't really use it in lldb today without forcing a scan of eh_frame entries the first time we unwind a function from that Module, and that would be unfortunate. But I like to see the flag being parsed and recorded; at some point in the future we may find a good way to use it.

1744 ↗(On Diff #212642)

This is more a personal preference thing, but I would explicitly call out that we're addressing a problem on a platform where a trap handler is "added" to the stack by the kernel, but it hasn't actually been called and executed the way a normal function is, the PC is sitting on the first instruction of the function, so backing the pc up by 1 points us to the wrong function.

The kernel is presenting us with a stack frame that doesn't follow the normal ABI and execution models, and we're hacking around that. It's perfectly OK to do that, but it's a specific enough thing we're addressing here, I think it'll be helpful to Future Us to name it explicitly.

lldb/source/Plugins/Process/Utility/RegisterContextLLDB.h
125 ↗(On Diff #212642)

This is very much a personal preference, but I might call this something like UpdateTrapHandlerFlagFromUnwindPlan. I'm fine with this name if you find it clearer.

  • Expand comment about return trampolines
  • rename PropagateTrapHandlerFlag -> PropagateTrapHandlerFlagFromUnwindPlan
JosephTremoulet marked 2 inline comments as done.Aug 2 2019, 9:43 AM
This revision was not accepted when it landed; it landed in state Needs Review.Aug 2 2019, 9:57 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2019, 9:57 AM

Hey Joseph,

The test trap_frame_sym_ctx is failing on GreenDragon. Can you please have a look?

http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/32449/testReport/junit/LLDB/Unwind/trap_frame_sym_ctx_test/

Hey Joseph,

The test trap_frame_sym_ctx is failing on GreenDragon. Can you please have a look?

http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/32449/testReport/junit/LLDB/Unwind/trap_frame_sym_ctx_test/

Sorry about that. Hopefully fixed by r367706.

Thanks for the heads-up, though I'm surprised I didn't see one from the bot on either email or IRC; is it not sending them out, or am I just not looking in the right places?

Hey Joseph,

The test trap_frame_sym_ctx is failing on GreenDragon. Can you please have a look?

http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/32449/testReport/junit/LLDB/Unwind/trap_frame_sym_ctx_test/

Sorry about that. Hopefully fixed by r367706.

Thanks, the bot is green again :-)

Thanks for the heads-up, though I'm surprised I didn't see one from the bot on either email or IRC; is it not sending them out, or am I just not looking in the right places?

Interesting. I'll check with the infrastructure team.

JosephTremoulet marked an inline comment as done.Aug 7 2019, 12:58 PM
JosephTremoulet added inline comments.
lldb/source/Plugins/Process/Utility/RegisterContextLLDB.cpp
1760 ↗(On Diff #210831)

Yeah, this was my impression of the S augmentation flag in the eh_frame too -- we can't really use it in lldb today without forcing a scan of eh_frame entries the first time we unwind a function from that Module

"Thinking out lout" a bit... we could consider doing the check only when the pc is at the very start of a function -- in that case, either it's one of these signal frames where it's worth paying the price to check eh_frame, or it's a call at the end of the previous function... am I understanding correctly that the non-signal case is specifically when a noreturn non-tail call gets laid out at the end of a function and there's no alignment padding between it and the next function? I'd naively think that's an uncommon case, is there reason to believe otherwise? I don't think this will be a high enough priority for me to actually investigate any time soon, but if I were going to investigate, do we have some standard test suite for assessing that sort of thing (either statically how often that code pattern happens or dynamically what slowdown is observed from a particular lldb source change)?

Thanks!