Page MenuHomePhabricator

Unwind: Add a stack scanning mechanism to support win32 unwinding
ClosedPublic

Authored by labath on Aug 23 2019, 3:46 AM.

Details

Summary

Windows unwinding is weird. The unwind rules do not (always) describe
the precise layout of the stack, but rather expect the debugger to scan
the stack for something which looks like a plausible return address, and
the unwind based on that. The reason this works somewhat reliably is
because the the unwinder also has access to the frame sizes of the
functions on the stack. This allows it (in most cases) to skip function
pointers in local variables or function arguments, which could otherwise
be mistaken for return addresses.

Implementing this kind of unwind mechanism in lldb was a bit challenging
because we expect to be able to statically describe (in the UnwindPlan)
structure, the layout of the stack for any given instruction. Giving a
precise desription of this is not possible, because it requires
correlating information from two functions -- the pushed arguments to a
function are considered a part of the callers stack frame, and their
size needs to be considered when unwinding the caller, but they are only
present in the unwind entry of the callee. The callee may end up being
in a completely different module, or it may not even be possible to
determine it statically (indirect calls).

This patch implements this functionality by introducing a couple of new
APIs:
SymbolFile::GetParameterStackSize - return the amount of stack space

taken up by parameters of this function.

SymbolFile::GetOwnFrameSize - the size of this function's frame. This

excludes the parameters, but includes stuff like local variables and
spilled registers.

These functions are then used by the unwinder to compute the estimated
location of the return address. This address is not always exact,
because the stack may contain some additional values -- for instance, if
we're getting ready to call a function then the stack will also contain
partially set up arguments, but we will not know their size because we
haven't called the function yet. For this reason the unwinder will crawl
up the stack from the return address position, and look for something
that looks like a possible return address. Currently, we assume that
something is a valid return address if it ends up pointing to an
executable section.

All of this logic kicks in when the UnwindPlan sets the value of CFA as
"isHeuristicallyDetected", which is also the final new API here. Right
now, only SymbolFileBreakpad implements these APIs, but in the future
SymbolFilePDB will use them too.

Diff Detail

Repository
rL LLVM

Event Timeline

labath created this revision.Aug 23 2019, 3:46 AM
labath updated this revision to Diff 218317.Sep 2 2019, 2:43 AM
labath updated this revision to Diff 218891.Sep 5 2019, 4:28 AM

Rebase on master. Should be ready for review now.

labath retitled this revision from WIP/Breakpad: Add support for stack unwinding on windows to Unwind: Add a stack scanning mechanism to support win32 unwinding.Sep 5 2019, 4:30 AM
labath edited the summary of this revision. (Show Details)
labath added reviewers: clayborg, jasonmolenda, amccarth.
labath set the repository for this revision to rLLDB LLDB.
labath added subscribers: lldb-commits, rnk, markmentovai.
Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2019, 4:30 AM

I don't have any specific code comments, but I do have a couple general questions and points to consider.

  1. isFoundHeuristically is very generic. It's true that it's a heuristic approach, but it's a very specific heuristic. Might there be other heuristic approaches in the future? Should we name it something more specific like isRaSearch?
  1. max_iterations means how many stack positions the heuristic will scan before giving up, right? Are there any alignment issues here? Should we assert that the return address hint is a multiple of the stack alignment?
  1. The 100 for max_iterations is probably fine, but I wonder if there's a way to determine a more specific limit without just guessing. What things could be on the stack between the hint and the actual return address? It seems like only arguments for a call that the current function is preparing to make. The standard says that the actual number of parameters is implementation-defined, but that it suggests a minimum of 256. Should max_iterations be 256? Is there much risk in making it bigger than it needs to be?
  1. Is checking for executable permission slow? Would it be worth doing some culling or caching? I imagine a lot of non-return address values on the stack will be simple small numbers, like 0 or 1, which, for Windows, would never be a valid executable address.
include/lldb/Symbol/SymbolFile.h
254 ↗(On Diff #218891)

Typos:

paramenters -> parameters
callers -> caller's

labath added a comment.Sep 6 2019, 4:18 AM

Thanks for the quick reply. My responses are inline:

I don't have any specific code comments, but I do have a couple general questions and points to consider.

  1. isFoundHeuristically is very generic. It's true that it's a heuristic approach, but it's a very specific heuristic. Might there be other heuristic approaches in the future? Should we name it something more specific like isRaSearch?

I wanted to avoid a specific name, because I though this approach might be useful elsewhere too. For instance, I can imagine doing the same kind of search as some sort of a last ditch attempt when we lack *any* kind of unwind information. However, thinking about it now, when I ignore the fact that the pdb construct is called rasearch, the name still pretty well describes what this "heuristic" does, so it is probably better to name it that way.

  1. max_iterations means how many stack positions the heuristic will scan before giving up, right? Are there any alignment issues here? Should we assert that the return address hint is a multiple of the stack alignment?

I don't believe we're doing anything like that elsewhere in the unwinder, and x86 is pretty forgiving about memory access alignment, so if through some mishap, the function does end up with a misaligned stack, there is actually a good chance it would still work, and so we should be able to unwind from it.

  1. The 100 for max_iterations is probably fine, but I wonder if there's a way to determine a more specific limit without just guessing. What things could be on the stack between the hint and the actual return address? It seems like only arguments for a call that the current function is preparing to make. The standard says that the actual number of parameters is implementation-defined, but that it suggests a minimum of 256. Should max_iterations be 256? Is there much risk in making it bigger than it needs to be?

I don't think there's much risk in making this bigger. If everything goes according to plan, then we should get the return address only after a couple of iterations (most likely 1). Even if something goes wrong, then we will probably still return before completing all iterations, except that will find the return address a couple of frames higher (which is probably still better than giving no return address at all). So yeah, if the standard (which standard is that? C?) says 256, then that's probably the best number we can go with.

It's hard to say what can be on the stack except the function arguments. In theory it can be anything. The compiler could push any number of intermediate values to stash some temporary values. However, current compilers tend to avoid that, and instead prefer to allocate the maximum number of stack slots in the prologue.

That said, this made me realize that the win32 unwind format does allow one to express that the amount of stack taken up by the temporaries will be different at different points within the function. However, the API I am introducing here do not. It would probably be better to pass in an address in this case, so that one can return different values for different points within the function.

  1. Is checking for executable permission slow? Would it be worth doing some culling or caching? I imagine a lot of non-return address values on the stack will be simple small numbers, like 0 or 1, which, for Windows, would never be a valid executable address.

The check should be pretty fast. The way it's implemented now, it only consults lldb's view of the process, which is all readily available (the loaded sections are in the target's section load list, and the executable bit is in the section object). I have considered going through the "memory region info" infrastructure, which consults the OS to get a list of mapped pages. That would be a bit slower (though this information is also cached), but it would have the advantage of detecting executable code which does not belong to any (known) module. It wasn't clear to me whether that is worth the trouble though..

labath updated this revision to Diff 219071.Sep 6 2019, 5:28 AM
labath marked an inline comment as done.
labath edited the summary of this revision. (Show Details)
  • s/FoundHeuristically/RaSearch
  • max_iterations:=256
  • tweak the handling of "own frame size" instead of having the unwinder ask for it, it is now embedded directly into the unwind plan. This automatically makes it possible to have different frame sizes at different points in the function, and also reduces the api surface a bit. The "parameter size" is still in a separate function, because that belongs to a different frame/function/module.

So made a "sbt" command in a Python module as a way to get a backtrace when our backtracing fails because Android almost never makes it back to the first frame for any stacks. This code just grabs the current SP, looks up the memory region using the SP, and then it uses this as the address range (current SP back to the base of the stack) to scan for any pointers with an address aligned address that falls into executable sections. What it doesn't do, is to try and backup one instruction from the potential RA values to see if there is a branch to the current function. This avoids having to know anything about stack parameter sizes and just scan the stack for all executable memory region pointers, and then ask each architecture to backup one instruction from each proposed RA and see if there is a function call to the current function. For ARM this is easy: backup by 4 for ARM and 2 or 4 for Thumb and see if there is a branch and link. For x64, it would be trickier, but it might be as easy as backup up by 1 + address size and looking for a callq. If we can successfully do this, we could really improve stack backtracing for all architectures and OSs. Thoughts?

clayborg added inline comments.Sep 6 2019, 7:25 AM
source/Plugins/Process/Utility/RegisterContextLLDB.cpp
1875 ↗(On Diff #219071)

WE should probably be using the memory region permissions using:

bool Process::GetLoadAddressPermissions(lldb::addr_t load_addr, uint32_t &permissions);

Why? You might have a stack going through JIT'ed code that might not have a section. But the memory region info will know about the permissions.

jasonmolenda accepted this revision.Sep 12 2019, 12:39 PM

Sorry for taking so long to look at this one -- it looks like a good solution to this.

Greg's idea of an additional heuristic when walking the stack and having a POSSIBLE return address, of backing up the RA and seeing if there is a call/jump instruction there is an interesting one that lldb could use in scenarios where the stack walk is going poorly and it's trying to decide whether to fall back to the architectural default unwind plan or whatever. I don't think I'd hold off on landing this to add that heuristic, but it's an interesting thing, we could add an instruction recognizer to the ABI plugin given a RA which knows how to do that, and see about using it in the unwinder.

source/Plugins/Process/Utility/RegisterContextLLDB.cpp
1875 ↗(On Diff #219071)

Checking memory permissions is an interesting addition here, but not every gdb remote serial protocol stub support the memory permission packets, so I might try both. RegisterContextLLDB::InitializeNonZerothFrame does this, for instance - it checks if there is a Module that contains the address (it could check the permissions of that section like Pavel's patch does here, too) and then it calls GetLoadAddressPermissions to see if the memory is executable.

This revision is now accepted and ready to land.Sep 12 2019, 12:39 PM
amccarth accepted this revision.Sep 12 2019, 1:07 PM

My concerns are satisfied.

Thanks for the review and sorry for the delay (I was OOO). The idea to use Process::GetLoadAddressPermissions makes sense, both from consistency and correctness perspectives. Unfortunately it does not work "out of the box" because minidump core files (my primary use case for this) do not currently provide enough information through the memory region info api. It tries pretty hard to achieve that, but in the case of regular (not "full memory dumps") windows minidump, all we have is the "MemoryList" stream, which only contains a couple of memory regions, and it does not include the areas covered by loaded modules (which is where PCs should point). If this is to work, we'd need to extend this memory parsing code to also take into account the module memory ranges (from the ModuleList stream).

@clayborg, you're the one who wrote the memory region parsing code IIRC. Does that sound OK to you ?

The disassembling idea also sounds interesting, but I am afraid it's going to be very useful for my main use case. The main use case for breakpad symbol files is for the cases where one does not have the original object file (and when one has the original object file, it probably also has proper debug and unwind info). It _might_ be more interesting once we get around to PECOFF unwinding, as it also uses raSearch (on i386), and one is more likely to have the original file to disassemble there, but even then, I'd first check what other windows debuggers do, as might be best to just follow their lead. So, for now, I'd like to skip the disassembling logic.

Thanks for the review and sorry for the delay (I was OOO). The idea to use Process::GetLoadAddressPermissions makes sense, both from consistency and correctness perspectives. Unfortunately it does not work "out of the box" because minidump core files (my primary use case for this) do not currently provide enough information through the memory region info api. It tries pretty hard to achieve that, but in the case of regular (not "full memory dumps") windows minidump, all we have is the "MemoryList" stream, which only contains a couple of memory regions, and it does not include the areas covered by loaded modules (which is where PCs should point). If this is to work, we'd need to extend this memory parsing code to also take into account the module memory ranges (from the ModuleList stream).

@clayborg, you're the one who wrote the memory region parsing code IIRC. Does that sound OK to you ?

I am not sure if we can infer anything about permissions from the module ranges. I believe this range contains everything (.text, .data, .bss, etc), so it has a mixture of permissions?

If we can find a way to use the module base of image, we should only parse the module ranges if the /proc/<pid>/maps is not in a minidump file as that is the best source for mapped ranges.

The disassembling idea also sounds interesting, but I am afraid it's going to be very useful for my main use case. The main use case for breakpad symbol files is for the cases where one does not have the original object file (and when one has the original object file, it probably also has proper debug and unwind info). It _might_ be more interesting once we get around to PECOFF unwinding, as it also uses raSearch (on i386), and one is more likely to have the original file to disassemble there, but even then, I'd first check what other windows debuggers do, as might be best to just follow their lead. So, for now, I'd like to skip the disassembling logic.

Thanks for the review and sorry for the delay (I was OOO). The idea to use Process::GetLoadAddressPermissions makes sense, both from consistency and correctness perspectives. Unfortunately it does not work "out of the box" because minidump core files (my primary use case for this) do not currently provide enough information through the memory region info api. It tries pretty hard to achieve that, but in the case of regular (not "full memory dumps") windows minidump, all we have is the "MemoryList" stream, which only contains a couple of memory regions, and it does not include the areas covered by loaded modules (which is where PCs should point). If this is to work, we'd need to extend this memory parsing code to also take into account the module memory ranges (from the ModuleList stream).

@clayborg, you're the one who wrote the memory region parsing code IIRC. Does that sound OK to you ?

I am not sure if we can infer anything about permissions from the module ranges. I believe this range contains everything (.text, .data, .bss, etc), so it has a mixture of permissions?

If we can find a way to use the module base of image, we should only parse the module ranges if the /proc/<pid>/maps is not in a minidump file as that is the best source for mapped ranges.

If we're able to find the actual module that got mapped at this address, then we could use the sections of that module to reconstruct the permissions fairly accurately. If we can't find that module (which will unfortunately be the most common case for me), then we can't determine the actual permissions, but we can still differentiate between a piece of memory being definitely(*) not executable (if there is no module loaded at that address) and it being _potentially_ executable (if there is a module there). Given the data we have, I think that is the best we can do.

The MemoryRegion class already kind of supports this, because it stores the permissions in a ternary enum ("yes", "no", "don't know"), but the Process::GetLoadAddressPermissions interprets the permissions strictly, and treats "don't know" as "no". So, one way to handle that would be to return "don't know" for these memory ranges and then change the GetLoadAddressPermissions interface so that we're able to ask the question "is this address potentially executable?" (**). Or we could dumb the current check down even more, and just check whether there is _a_ module at the given address (similar to the first step of InitializeNonZerothFrame), and completely avoid permission checking. For an initial implementation, I would be happy with a super simple heuristic like that. WDYT?

(*) "definitively" might be too strong a word, as there might be some executable memory mapped there that we don't know about, but that is fairly unlikely. OTOH, an address belonging to a module is quite likely to be executable.

(**) Of course, all of this would kick in only when we don't have a source which can provide an exhaustive list of memory regions and their permissions. linux /proc/pid/maps is one such source, and I also believe the MemoryInfoList stream can be considered to be exhaustive.

labath updated this revision to Diff 222129.Fri, Sep 27, 4:46 AM

Offline, @markmentovai pointed out that crashpad does actually produce the
MemoryInfoList stream, which means there is a fairly large class of use cases
where checking the memory permissions will work correctly. Therefore, I am
updating this code to work with GetLoadAddressPermissions (and adding a
MemoryInfoList stream to the test case), as suggested by @clayborg. I'll create
a separate patch for handling the case of Windows-generated
non-full-memory-dumps which do not include the permissions.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFri, Sep 27, 5:11 AM