This is an archive of the discontinued LLVM Phabricator instance.

Fix backtrace of noreturn functions situated at the end of a module
ClosedPublic

Authored by labath on Apr 13 2017, 8:51 AM.

Details

Summary

When a call instruction is the last instruction in a function, the
backtrace PC will point past the end of the function. We already had
special code to handle that, but we did not handle the case where the PC
ends up outside of the bounds of the module containing the function,
which is a situation that occured in TestNoreturnUnwind on android for
some arch/compiler combinations.

I fix this by adding an argument to Address resolution code which states
that we are ok with addresses pointing to the end of a module/section to
resolve to that module/section.

I create a reproducible test case for this situation by hand-crafting an
executable which has a noreturn function at the end of a module.

Diff Detail

Repository
rL LLVM

Event Timeline

labath created this revision.Apr 13 2017, 8:51 AM

Jim, Jason, could one of you take a look at this please?

This part of the code is not really my specialty, so I'd appreciate your thoughts on this.

jasonmolenda edited edge metadata.Apr 20 2017, 2:54 PM

Sorry Pavel, I kept meaning to look at this patch this week but hadn't yet. I will later today. When I read over your description of the problem, it sounded like a good fix - haven't looked at the code yet.

Hi Pavel, I'd document the new flag in include/lldb/Core/Address.h where we have documentation for the other flags being used.

It seems like we're fixing this a little indirectly, and I'm not sure it's the best approach. I want to make sure I understand the situation correctly.

Say we have an object file with three sections

s1 0..99 (offset 0, size 100)
s2 100..199 (offset 100, size 100)
s3 200.299 (offset 200, size 100)

We have a noreturn function whose last instruction ends on the last byte of s2, so the saved return address is offset 200. Your change makes it so that when we say "is 200 contained within 100..199", we will say yes, it is. This gets us the correct Section for our symbol context and when we back up the pc value by one (the "decr_pc_and_recompute_addr_range = true" bit in RegisterContextLLDB::InitializeNonZerothFrame) so when we back up the offset within the section by 1, we're good.

The current behavior is that we pick the wrong section (s3) and somehow backing up the pc isn't working.

Maybe we can accomplish the same fix by looking at the if (decr_pc_and_recompute_addr_range) block of code? Is decr_pc_and_recompute_addr_range not being set for this case? Are we not correctly backing up the pc into the Section boundary correctly?

I have access to a linux machine so I can play with this myself, but it will take me a while to set up lldb on that system, could you let me know if you looked at this approach? I know RegisterContextLLDB is complicated and it's easy to miss things - or it may be that you tried this approach and it didn't look possible. (and I haven't had to touch this code in a few years so I may have forgotten about some "new Section is the same as the old Section" sanity check or something in there... I don't see it right now, but I may have missed it.)

Thank you for looking at this. My response is below.

Hi Pavel, I'd document the new flag in include/lldb/Core/Address.h where we have documentation for the other flags being used.

It seems like we're fixing this a little indirectly, and I'm not sure it's the best approach. I want to make sure I understand the situation correctly.

Say we have an object file with three sections

s1 0..99 (offset 0, size 100)
s2 100..199 (offset 100, size 100)
s3 200.299 (offset 200, size 100)

We have a noreturn function whose last instruction ends on the last byte of s2, so the saved return address is offset 200. Your change makes it so that when we say "is 200 contained within 100..199", we will say yes, it is. This gets us the correct Section for our symbol context and when we back up the pc value by one (the "decr_pc_and_recompute_addr_range = true" bit in RegisterContextLLDB::InitializeNonZerothFrame) so when we back up the offset within the section by 1, we're good.

The reason that decr_pc_and_recompute_addr_range is not working is because we don't find any section (or module) at all. In the object file you mention above, this would correspond to having a noreturn function at the end of s3, and then the space after s3 being empty. In this case we don't even reach the decr_pc_and_recompute_addr_range part, because we abort due to a missing module.

In my first version of this patch did something different -- I modified RegisterContextLLDB to retry the module search by decrementing the pc by one if the original search did not find any module. However, that did not seem completely clean either, as I had to do the same dance twice (StackFrame::GetFrameCodeAddress needed to do the same dance as well to display the symbol name correctly). I can go back to that implementation if you think that is better.

I have access to a linux machine so I can play with this myself, but it will take me a while to set up lldb on that system, could you let me know if you looked at this approach? I know RegisterContextLLDB is complicated and it's easy to miss things - or it may be that you tried this approach and it didn't look possible. (and I haven't had to touch this code in a few years so I may have forgotten about some "new Section is the same as the old Section" sanity check or something in there... I don't see it right now, but I may have missed it.)

You don't need a linux machine to debug this -- you should be able to open the core file in the test from any system (last time I tried, I had problems downloading binary files from phabricator, so I am going to also send them to you directly, just in case). You could probably also create a darwin executable reproducing this using assembly similar to the one in the test.

source/Plugins/Process/Utility/RegisterContextLLDB.cpp
339 ↗(On Diff #95139)

Here we don't get a valid module and we abort the search.

labath added a comment.May 5 2017, 6:09 AM

Jason, any thoughts on my comments above?

labath updated this revision to Diff 101555.Jun 6 2017, 6:50 AM

Add documentation.

labath added a comment.Jun 6 2017, 6:53 AM

I'd like to commit this this week. If you have any objections to how I implemented this, let me know, so I can adjust the approach.

jasonmolenda accepted this revision.Jun 7 2017, 9:36 PM

Pavel, my apologies for not following up on this, we had a big push to get ready for a beta release/conference this week, but it wasn't fair to leave you hanging with this change. Please commit. I'd like to play with this a little when I get free time (hahahaha) to see if I can't isolate the change a little more, but it'll be on me to play with the corefile test case on my own time instead of blocking you.

This revision is now accepted and ready to land.Jun 7 2017, 9:36 PM
labath added a comment.Jun 8 2017, 5:30 AM

Thanks for the review. I'm not entirely proud of how I implemented this, but I hope it's not too ugly either. The reason I opted for this approach is that I needed to implement the "if the module is null then decrement the address and recompute" logic in two places. RegisterContextLLDB (computes the backtrace) and StackFrame (computes the symbol name to display). This avoids the duplication, but adds another argument to a bunch of functions. I don't really have an idea on how to make this better, but maybe you will think of something.

This revision was automatically updated to reflect the committed changes.