This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Add additional validation on return address in 'thread step-out'
ClosedPublic

Authored by mossberg on Dec 11 2019, 11:33 AM.

Details

Summary

During the 'thread step-out' command, check that the memory we are about to place a breakpoint in is executable. Previously, if the current function had a nonstandard stack layout/ABI, and had a valid data pointer in the location where the return address is usually located, data corruption would occur when the breakpoint was written. This could lead to an incorrectly reported crash or silent corruption of the program's state. Now, if the above check fails, the command safely aborts.

Further discussion:

  • This patch doesn't include a unit test -- I'd be happy to add one, but would appreciate guidance on how to do so. This is my first time working with the lldb codebase.
  • I wasn't sure if it was necessary to first check the log pointer before using it. Some parts of the code do this, and some don't.
  • Should we print out the return address in the log line?

Also, I don't have commit access, so I will need some help landing it when it's ready.

Diff Detail

Event Timeline

mossberg created this revision.Dec 11 2019, 11:33 AM
clayborg requested changes to this revision.Dec 11 2019, 12:59 PM
clayborg added a subscriber: clayborg.
clayborg added inline comments.
lldb/source/Target/ThreadPlanStepOut.cpp
127–129

This will fail for JIT'ed code. Best to ask the process for memory region information since it will usually be complete mappings for the process, even if we have no file on disk for the region in question:

uint32_t permissions = 0;
const addr_t return_pc = return_address.GetLoadAddr(&m_thread.GetProcess()->GetTarget());
if (!m_thread.GetProcess()->GetLoadAddressPermissions(return_pc, permissions)) {
  LLDB_LOGF(log, "No permissions for PC.");
  return;
} else if (!(permissions & ePermissionsExecutable)) {
  LLDB_LOGF(log, "Return address did not point to executable memory.");
  return;
}

The issue is we don't always have an object file for stuff we debug. If we rely on having an object file, like to require sections as your solution did, then this will fail for any JIT'ed code or any code that we weren't able to find object files for (remote debugging where we might not have all files).

This revision now requires changes to proceed.Dec 11 2019, 12:59 PM
mossberg updated this revision to Diff 233442.Dec 11 2019, 1:37 PM

Updated to get permissions information from the Process rather than the Section.

mossberg marked 2 inline comments as done.Dec 11 2019, 1:39 PM
mossberg added inline comments.
lldb/source/Target/ThreadPlanStepOut.cpp
127–129

Thanks @clayborg! Updated the patch.

mossberg marked an inline comment as done.Dec 11 2019, 1:48 PM
mossberg added a comment.EditedDec 11 2019, 1:51 PM

Here are a few screenshots of what the bug looks like:

Memory corruption leading to a "crash". In this case, a NULL pointer in memory was changed to a pointer to 0xcc.

Silent memory corruption of normal program data. (In this case a variable that is ultimately returned from main)

mossberg edited the summary of this revision. (Show Details)Dec 11 2019, 2:07 PM

Something I noticed while updating the patch was that the GetLoadedAddressPermissions call would succeed, even if passed an address that is obviously not mapped. In my test case I placed an int 0x22 where the return address would be, expecting the validation to fail at the GetLoadAddressPermissions call because 0x22 is not mapped. However, it only ended up failing when the permissions (which were empty) were checked for an execute bit. It seems to me like this might be another bug, but I'm not sure.

mossberg updated this revision to Diff 233466.Dec 11 2019, 4:02 PM

Remove lingering Section.h include

Please do print out the return address in the log. That will be helpful.

The LLDB_LOG* macros check the log for you. They are convenient if the thing you are going to log happen in one step. If you need to do some more logic when gathering up the log message and don't want to stuff it all in the arguments to the macro, you can use "if (log)..." Also the LLDB_LOG* macros were added fairly recently so there may be some places that haven't been converted to use them.

@clayborg: do you know under what circumstances we won't be able to find the permissions for an address? This patch would reject setting the breakpoint in that situation, so I'd like to be clear when that would happen.

mossberg updated this revision to Diff 233470.Dec 11 2019, 4:34 PM

Output return address in logging

Current output.

I'm not sure how easy it is to do that here, but it would certainly be nice to include these error messages in the actual error output from the "finish" command so that they are visible even without logging enabled.

As for the test, you should be able to do something similar to the tests in the test/Shell/Unwind folder (e.g. eh-frame-dwarf-unwind.test). I.e., you could write an assembly function which uses a non-standard frame layout, but do *not* describe that layout via .eh_frame. Then, stop in that function and try to step out...

Something I noticed while updating the patch was that the GetLoadedAddressPermissions call would succeed, even if passed an address that is obviously not mapped. In my test case I placed an int 0x22 where the return address would be, expecting the validation to fail at the GetLoadAddressPermissions call because 0x22 is not mapped. However, it only ended up failing when the permissions (which were empty) were checked for an execute bit. It seems to me like this might be another bug, but I'm not sure.

Are you sure there is nothing mapped at that address? I'm not a darwin expert, but I have a vague knowledge that the darwin loader (or some other component of the system) actually maps a couple of pages of unreadable memory around the address zero...

mossberg edited the summary of this revision. (Show Details)Dec 12 2019, 10:38 AM

I'm not sure how easy it is to do that here, but it would certainly be nice to include these error messages in the actual error output from the "finish" command so that they are visible even without logging enabled.

Agreed!

As for the test, you should be able to do something similar to the tests in the test/Shell/Unwind folder (e.g. eh-frame-dwarf-unwind.test). I.e., you could write an assembly function which uses a non-standard frame layout, but do *not* describe that layout via .eh_frame. Then, stop in that function and try to step out...

Thanks, will look into this.

Are you sure there is nothing mapped at that address? I'm not a darwin expert, but I have a vague knowledge that the darwin loader (or some other component of the system) actually maps a couple of pages of unreadable memory around the address zero...

Here's how I checked:

(lldb) fin
 SynthesizeTailCallFrames: can't find previous function
 Return address (0x22) did not point to executable memory.
 Discarding thread plans for thread tid = 0x1bd7b6, up to 0x7fe2f165a220
error: Could not create return address breakpoint.
(lldb) image list -a 0x22
error: Couldn't find module containing address: 0x22.
$ vmmap 94213 0x22
0x22 is not in any region.  Bytes before following region: 4294967262

I'm not sure how easy it is to do that here, but it would certainly be nice to include these error messages in the actual error output from the "finish" command so that they are visible even without logging enabled.

That's a good idea, and should be easy to do.

Since you can't really fail in constructors, the way the ThreadPlans work is that you make a thread plan, and if making it fails you leave a note to yourself that construction failed. Then when you go to queue the thread plan, Thread::QueueThreadPlan calls ValidatePlan on the new plan, and if that returns false the plan is unshipped. ValidatePlan takes a Stream argument as well, so the plans can write error messages. That's how you see the "Could not create return address breakpoint" in this patch. Finally, if QueueThreadPlan fails, CommandObjectThread::CommandObjectThreadStepWithTypeAndScope copies the Status message into the command result.

So just add a std::string to ThreadPlanStepOut, and cons up your error message there. Then in ThreadPlanStepOut::ValidatePlan, instead of doing:

if (m_return_bp_id == LLDB_INVALID_BREAK_ID) {
  if (error)
    error->PutCString("Could not create return address breakpoint.");
  return false;
}

Write the error string you made in the constructor.

It's a good idea to log it as well, since the logs don't get command output so having it there will make logs easier to read.

As for the test, you should be able to do something similar to the tests in the test/Shell/Unwind folder (e.g. eh-frame-dwarf-unwind.test). I.e., you could write an assembly function which uses a non-standard frame layout, but do *not* describe that layout via .eh_frame. Then, stop in that function and try to step out...

Something I noticed while updating the patch was that the GetLoadedAddressPermissions call would succeed, even if passed an address that is obviously not mapped. In my test case I placed an int 0x22 where the return address would be, expecting the validation to fail at the GetLoadAddressPermissions call because 0x22 is not mapped. However, it only ended up failing when the permissions (which were empty) were checked for an execute bit. It seems to me like this might be another bug, but I'm not sure.

Are you sure there is nothing mapped at that address? I'm not a darwin expert, but I have a vague knowledge that the darwin loader (or some other component of the system) actually maps a couple of pages of unreadable memory around the address zero...

Yes the whole first 32 bits of the address space get mapped with no access. That was done to catch any 32 ->64 bit address handling goofs when macOS was going from 32 to 64 bit architectures.

So just add a std::string to ThreadPlanStepOut, and cons up your error message there. Then in ThreadPlanStepOut::ValidatePlan, instead of doing:

if (m_return_bp_id == LLDB_INVALID_BREAK_ID) {
  if (error)
    error->PutCString("Could not create return address breakpoint.");
  return false;
}

Write the error string you made in the constructor.

It's a good idea to log it as well, since the logs don't get command output so having it there will make logs easier to read.

Ok, I see that ThreadPlanCallFunction.cpp seems to do this. I'll plan to model mine after it.

mossberg updated this revision to Diff 233673.Dec 12 2019, 1:33 PM

Output error messages to console in addition to logging.

I wasn't sure, but decided to unconditionally output the Could not create return address breakpoint. message for UX reasons. That message provides a bit of context for these new, lower level error messages.

Implementation is based on the one in ThreadPlanCallFunction.cpp.
I also slightly altered the error message for the "permissions" not found case to make both error messages more consistent.

Still working on adding the unit test, but I thought I'd submit this for early feedback.

Screenshots with new error messages.

(I artificially forced this branch to execute, still unable to make it execute at runtime)

I replaced my 0x22 example with 0x200000000, which is unmapped *and* outside the low 32 bit range. GetLoadAddressPermissions still seems to succeed.


Screenshots with new error messages.

(I artificially forced this branch to execute, still unable to make it execute at runtime)

I replaced my 0x22 example with 0x200000000, which is unmapped *and* outside the low 32 bit range. GetLoadAddressPermissions still seems to succeed.

That sounds like a bug then.


I believe it is ok for permissions to succeed as long as they don't return invalid permissions. For any address outside any mapped ranges, we should have zero as the permissions. Looking up address mappings for zero will return

[0x00000000 - 0x100000000) ---

no permisssions are represented as "---". Then you can take the end address and look it up, and continue. So as long as we aren't getting bogus values back, we are good.

I believe it is ok for permissions to succeed as long as they don't return invalid permissions. For any address outside any mapped ranges, we should have zero as the permissions. Looking up address mappings for zero will return

[0x00000000 - 0x100000000) ---

no permisssions are represented as "---". Then you can take the end address and look it up, and continue. So as long as we aren't getting bogus values back, we are good.

Then what does the bool return mean?

I believe it is ok for permissions to succeed as long as they don't return invalid permissions. For any address outside any mapped ranges, we should have zero as the permissions. Looking up address mappings for zero will return

[0x00000000 - 0x100000000) ---

no permisssions are represented as "---". Then you can take the end address and look it up, and continue. So as long as we aren't getting bogus values back, we are good.

Then what does the bool return mean?

If there is no memory map info in the process plug-in at all, then false will be returned.

I believe it is ok for permissions to succeed as long as they don't return invalid permissions. For any address outside any mapped ranges, we should have zero as the permissions. Looking up address mappings for zero will return

[0x00000000 - 0x100000000) ---

no permisssions are represented as "---". Then you can take the end address and look it up, and continue. So as long as we aren't getting bogus values back, we are good.

Where is that example output from? I don't see that kind of formatting in image list or from vmmap.

I believe it is ok for permissions to succeed as long as they don't return invalid permissions. For any address outside any mapped ranges, we should have zero as the permissions. Looking up address mappings for zero will return

[0x00000000 - 0x100000000) ---

no permisssions are represented as "---". Then you can take the end address and look it up, and continue. So as long as we aren't getting bogus values back, we are good.

Then what does the bool return mean?

If there is no memory map info in the process plug-in at all, then false will be returned.

That means to answer the question "did GetLoadAddressPermissions return valid permissions for load address 0xABCD" I have to check both the return value and if any of the permission values are unknown? That seems like an awkward API.

I believe it is ok for permissions to succeed as long as they don't return invalid permissions. For any address outside any mapped ranges, we should have zero as the permissions. Looking up address mappings for zero will return

[0x00000000 - 0x100000000) ---

no permisssions are represented as "---". Then you can take the end address and look it up, and continue. So as long as we aren't getting bogus values back, we are good.

Then what does the bool return mean?

If there is no memory map info in the process plug-in at all, then false will be returned.

That means to answer the question "did GetLoadAddressPermissions return valid permissions for load address 0xABCD" I have to check both the return value and if any of the permission values are unknown? That seems like an awkward API.

That it is. Not sure if I added it, but blame shows Kate Stone from the great code convention checkin. Would be fine to fix it and update any call sites. Looks like it could be turned into:

bool Process::LoadAddressHasAllPermissions(lldb::addr_t load_addr, uint32_t permissions);

The permissions is now an input parameter. The only other client is RegisterContextLLDB and all call sites do things like:

process->GetLoadAddressPermissions(current_pc_addr, permissions) && (permissions & ePermissionsExecutable) == 0)

or

if (process.GetLoadAddressPermissions(candidate, permissions) &&permissions & lldb::ePermissionsExecutable)

So it would be easy to fix. Please do!

mossberg updated this revision to Diff 233708.EditedDec 12 2019, 4:35 PM

Add test.

I wasn't sure where to put it, so I left it in the Unwind/ directory since the call-asm.c infrastructure is already there, and this fix is *kind of* vaguely related to unwinding. Please let me know if there's a better location.

mossberg edited the summary of this revision. (Show Details)Dec 12 2019, 4:40 PM

I noticed some other tests specify the OS's they apply to-- I wasn't sure if there were any restrictions I should put on this one. I only tested on MacOS, and I imagine it should work on Linux. Not sure about Windows.

The test looks good. I think it should run fine on linux, and there's a decent chance it might work on windows too. It's fine to let check it in this way, and we can add some skips later if bots start complaining.

As for the location, /maybe/ ExecControl would be a better place for it but it's a bit hard to say because it has only two tests inside. I'd probably but it there, but leaving it in Unwind isn't the end of the world either. We can do the move later once the test structure crystallizes better. I wouldn't worry about duplicating call-asm.c though, as it's super-simple...

lldb/source/Target/ThreadPlanStepOut.cpp
137–138

I'd probably remove the log statements now that we have the real error output.

mossberg marked an inline comment as done.Dec 13 2019, 9:09 AM
mossberg added inline comments.
lldb/source/Target/ThreadPlanStepOut.cpp
137–138

I included both based on @jingham's feedback here: https://reviews.llvm.org/D71372#1782270

mossberg added a comment.EditedDec 15 2019, 1:29 PM

I wanted to also mention that this patch won't address buggy behavior if, for example, the stub takes a function pointer on the stack (vs a normal data pointer). In this case, the executable check will succeed, and the breakpoint will be written, but to a potentially arbitrary code address. This may or may not lead to the debugger "randomly" breaking out later in a debug session if or when that code happens to be executed (I've reproduced this, but also seen some very confusing behavior where in some situations the breakpoint logging will say a breakpoint is written, but it doesn't seem like this was actually the case. The presence of instructions that manipulate the stack pointer seemed to affect this also..)

I wanted to also mention that this patch won't address buggy behavior if, for example, the stub takes a function pointer on the stack (vs a normal data pointer). In this case, the executable check will succeed, and the breakpoint will be written, but to a potentially arbitrary code address. This may or may not lead to the debugger "randomly" breaking out later in a debug session if or when that code happens to be executed (I've reproduced this, but also seen some very confusing behavior where in some situations the breakpoint logging will say a breakpoint is written, but it doesn't seem like this was actually the case. The presence of instructions that manipulate the stack pointer seemed to affect this also..)

That's perfectly fine. This is all best effort -- if someone really wants us to do something bogus here, he'll always find a way. And OTOH, if he really wants us to do the right thing for functions like these, he should write unwind info for them.

lldb/source/Target/ThreadPlanStepOut.cpp
137–138

Yeah, sorry, I missed that part. Jim is the thread plan master, so lets do what he thinks best.

mossberg marked 2 inline comments as done.Dec 16 2019, 1:40 PM

I believe all the comments should be addressed at this point; thanks very much for your reviews so far.

This looks fine to me. We should let Greg have a final chance to weigh in and then I'll check it in.

This revision is now accepted and ready to land.Dec 19 2019, 2:03 PM

I committed this (2a42a5a2f4144cd99812ad0d230480f94a1d1c92) but I forgot to attribute the patch to Mark in the commit message. My apologies!

This revision was automatically updated to reflect the committed changes.

Hey, Mark... The test you added is failing on the Debian bots, e.g.:

http://lab.llvm.org:8011/builders/lldb-x86_64-debian/builds/1903

Could you take a look?

Took a quick look, and I think the good news is that the failure looks fairly superficial.

clang-10: warning: argument unused during compilation: '-fmodules-cache-path=/home/worker/lldb-x86_64-debian/lldb-x86_64-debian/build/lldb-test-build.noindex/module-cache-clang/lldb-shell' [-Wunused-command-line-argument]
/home/worker/lldb-x86_64-debian/lldb-x86_64-debian/llvm-project/lldb/test/Shell/Unwind/thread-step-out-ret-addr-check.test:10:10: error: CHECK: expected string not found in input

  1. CHECK: Breakpoint 1: where = {{.*}}`nonstandard_stub

The failing check was the one merely to see if the breakpoint was set, which I don't believe is related to the actual code changes that were made. Will continue to look. I also will also look at D71784.

ted added a subscriber: ted.EditedJan 8 2020, 12:31 PM

I've got another failure case for this. If the remote gdbserver doesn't implement qMemoryRegionInfo or qXfer:memory-map:read, thread step-out will fail.

error: Could not create return address breakpoint. Return address (0x5bc0) permissions not found.

That comes from this code:

if (!m_thread.GetProcess()->GetLoadAddressPermissions(m_return_addr,
                                                      permissions)) {
  m_constructor_errors.Printf("Return address (0x%" PRIx64
                              ") permissions not found.",
                              m_return_addr);
  LLDB_LOGF(log, "ThreadPlanStepOut(%p): %s", static_cast<void *>(this),
            m_constructor_errors.GetData());
  return;

GetLoadAddressPermissions does this:

Status error(GetMemoryRegionInfo(load_addr, range_info));
if (!error.Success())
  return false;

and ProcessGDBRemote::GetMemoryRegionInfo will send a qMemoryRegionInfo. If that fails, it will try qXfer::memory-map:read. qMemoryRegionInfo isn't required:

----------------------------------------------------------------------
"qMemoryRegionInfo:<addr>"

BRIEF
Get information about the address range that contains "<addr>"

PRIORITY TO IMPLEMENT
Medium. This is nice to have, but it isn't necessary.

Many embedded stubs do not implement this, which means thread step-out won't work with them anymore. Even if they did, many embedded operating systems don't have memory permissions.

How should we fix this?

labath added a comment.Jan 9 2020, 2:15 AM
In D71372#1810687, @ted wrote:

I've got another failure case for this. If the remote gdbserver doesn't implement qMemoryRegionInfo or qXfer:memory-map:read, thread step-out will fail.
....

That's a good point Ted. I think we should give targets which don't support fetching permissions the benefit of the doubt, and treat all memory as potentially executable. Would removing the return statement from the if(!GetLoadAddressPermissions) branch solve your problem? If so, can you whip up a patch for that?

ted added a comment.Jan 9 2020, 2:29 PM
In D71372#1810687, @ted wrote:

I've got another failure case for this. If the remote gdbserver doesn't implement qMemoryRegionInfo or qXfer:memory-map:read, thread step-out will fail.
....

That's a good point Ted. I think we should give targets which don't support fetching permissions the benefit of the doubt, and treat all memory as potentially executable. Would removing the return statement from the if(!GetLoadAddressPermissions) branch solve your problem? If so, can you whip up a patch for that?

Removing the return statement fixes the issue. I'll put up a patch. Keeping the m_constructor_errors.Printf line doesn't cause a failure; it might be useful to keep that in case the breakpoint can't be created for other reasons. What do you think?

In D71372#1813142, @ted wrote:
In D71372#1810687, @ted wrote:

I've got another failure case for this. If the remote gdbserver doesn't implement qMemoryRegionInfo or qXfer:memory-map:read, thread step-out will fail.
....

That's a good point Ted. I think we should give targets which don't support fetching permissions the benefit of the doubt, and treat all memory as potentially executable. Would removing the return statement from the if(!GetLoadAddressPermissions) branch solve your problem? If so, can you whip up a patch for that?

Removing the return statement fixes the issue. I'll put up a patch. Keeping the m_constructor_errors.Printf line doesn't cause a failure; it might be useful to keep that in case the breakpoint can't be created for other reasons. What do you think?

I don't care much either way.. Since you have this kind of a target around, you can judge whether printing this error/warning after each "finish" would be useful or just annoying. Another possibility would be to don't print the error to the command output, but still emit something into the log...

In D71372#1813142, @ted wrote:
In D71372#1810687, @ted wrote:

I've got another failure case for this. If the remote gdbserver doesn't implement qMemoryRegionInfo or qXfer:memory-map:read, thread step-out will fail.
....

That's a good point Ted. I think we should give targets which don't support fetching permissions the benefit of the doubt, and treat all memory as potentially executable. Would removing the return statement from the if(!GetLoadAddressPermissions) branch solve your problem? If so, can you whip up a patch for that?

Removing the return statement fixes the issue. I'll put up a patch. Keeping the m_constructor_errors.Printf line doesn't cause a failure; it might be useful to keep that in case the breakpoint can't be created for other reasons. What do you think?

I don't care much either way.. Since you have this kind of a target around, you can judge whether printing this error/warning after each "finish" would be useful or just annoying. Another possibility would be to don't print the error to the command output, but still emit something into the log...

There's no message printed after each finish; I think the message will only be printed if there's another error. That's fine by me.

I've uploaded the patch in D72513.