Page MenuHomePhabricator

Add new ABI callback to provide fallback unwind register locations
ClosedPublic

Authored by uweigand on Apr 11 2016, 11:28 AM.

Details

Summary

If the UnwindPlan did not identify how to unwind the stack pointer
register, LLDB currently assumes it can determine to caller's SP
from the current frame's CFA. This is true on most platforms
where CFA is by definition equal to the incoming SP at function
entry.

However, on the s390x target, we instead define the CFA to equal
the incoming SP plus an offset of 160 bytes. This is because
our ABI defines that the caller has to provide a register save
area of size 160 bytes. This area is allocated by the caller,
but is considered part of the callee's stack frame, and therefore
the CFA is defined as pointing to the top of this area.

In order to make this work on s390x, this patch introduces a new
ABI callback GetFallbackRegisterLocation that provides platform-
specific fallback register locations for unwinding. The existing
code to handle SP unwinding as well as volatile registers is moved
into the default implementation of that ABI callback, to allow
targets where that implementation is incorrect to override it.

This patch in itself is a no-op for all existing platforms.
But it is a pre-requisite for adding s390x support.

Diff Detail

Repository
rL LLVM

Event Timeline

uweigand updated this revision to Diff 53292.Apr 11 2016, 11:28 AM
uweigand retitled this revision from to Add new ABI callback to return CFA offset.
uweigand updated this object.
uweigand added reviewers: jasonmolenda, clayborg.
uweigand added a subscriber: lldb-commits.
clayborg edited edge metadata.Apr 11 2016, 1:37 PM

I am not sure why this offset of 160 isn't represented in the unwind info. I guess the OS unwinder that uses the EH frame info just knows that it must adjust the CFA? That seems like a hack. It seems like the unwind info should be fixed and the OS level unwinder should be fixed so that the info can be used correctly without every debugger or tool that has to parse this know about such a hack? Do you have control over the OS and are you able to fix this? That seems like the better fix.

I am not sure why this offset of 160 isn't represented in the unwind info. I guess the OS unwinder that uses the EH frame info just knows that it must adjust the CFA? That seems like a hack. It seems like the unwind info should be fixed and the OS level unwinder should be fixed so that the info can be used correctly without every debugger or tool that has to parse this know about such a hack? Do you have control over the OS and are you able to fix this? That seems like the better fix.

The offset of 160 is represented in the unwind info when computing the CFA itself. This change is about unwinding the SP register when no explicit unwind info for SP is present. In this case, the LLDB unwinder assumes it should set SP to the CFA, which is where the problem comes in.

Now, on s390, most frames actually have explicit unwind instructions for SP, so this special case is not even triggered. Only for leaf functions without a stack frame can we ever see the scenario that there are no SP unwind instructions. In those cases, the correct result on s390 is to simply leave SP unchanged (since the function did not in fact allocate a frame).

However, due to the special case in the LLDB unwinder, SP is not left unchanged, but set to the CFA (i.e. SP + 160 in this case). This is wrong. There are two possible ways to fix this:

  • Disable the special case of setting SP to CFA on s390. Instead, when there is no unwind info for SP, just leave it unchanged (just like any other call-saved register). This solution is implemented in the platform unwinder (libgcc) and also in GDB.
  • Fix the special case of setting SP to CFA by actually taking the CFA bias into account. This is what this patch implements for LLDB.

I've implemented the second method for LLDB since it actually seemed a bit more general to me (describes more precisely what actually happens on the platform), and also it seemed to fit nicely into the ABI scheme.

If you prefer, I can implement the first method instead. However, since we cannot unconditionally disable the special case of setting SP to CFA (other platforms depend on that -- which is arguably the real hack to begin with), we would still need some ABI method to tell the unwinder whether or not the special case is needed.

There's no real way to "fix the unwind info", since this is not a new platform and binaries containing this unwind info have been in the field for more than 15 years. In any case, it is not clear that there is anything to "fix" -- a binary containing no unwind info for SP to indicate that SP does not change isn't really "wrong" IMO.

I am not sure why this offset of 160 isn't represented in the unwind info. I guess the OS unwinder that uses the EH frame info just knows that it must adjust the CFA? That seems like a hack. It seems like the unwind info should be fixed and the OS level unwinder should be fixed so that the info can be used correctly without every debugger or tool that has to parse this know about such a hack? Do you have control over the OS and are you able to fix this? That seems like the better fix.

The offset of 160 is represented in the unwind info when computing the CFA itself. This change is about unwinding the SP register when no explicit unwind info for SP is present. In this case, the LLDB unwinder assumes it should set SP to the CFA, which is where the problem comes in.

This is usually represented by the CIE (Common Information Entry) which is pointed to from the FDE (Frame Description Entry). The CIE has the initial state that all FDEs can share. Seems like there should be an entry for the SP that says the rule to unwind it is "CFA+160"?

Now, on s390, most frames actually have explicit unwind instructions for SP, so this special case is not even triggered. Only for leaf functions without a stack frame can we ever see the scenario that there are no SP unwind instructions. In those cases, the correct result on s390 is to simply leave SP unchanged (since the function did not in fact allocate a frame).

However, due to the special case in the LLDB unwinder, SP is not left unchanged, but set to the CFA (i.e. SP + 160 in this case). This is wrong. There are two possible ways to fix this:

  • Disable the special case of setting SP to CFA on s390. Instead, when there is no unwind info for SP, just leave it unchanged (just like any other call-saved register). This solution is implemented in the platform unwinder (libgcc) and also in GDB.
  • Fix the special case of setting SP to CFA by actually taking the CFA bias into account. This is what this patch implements for LLDB.

There seems to be magic happening here. It seems like the CIE or FDE should describe how to unwind the SP when things are tricky. We do have the notion that registers that have no rule can have a default rule applied to them. Currently this is only used for callee saved registers and for any such registers they rule defaults to "the registers is in the register itself". For volatile registers, their default rule is "the register is not available". This is the part where I get fuzzy with respect to how the OS unwinder works: does it have this same notion of default rules for registers? If so, we should be implementing the same thing for LLDB's EH frame parser.

So a third solution is to allow the ABI plug-in to specify the default rules for registers when they aren't specified. I think the ABI plug-in is where we get the fact that a register is volatile or not...

I've implemented the second method for LLDB since it actually seemed a bit more general to me (describes more precisely what actually happens on the platform), and also it seemed to fit nicely into the ABI scheme.

If you prefer, I can implement the first method instead. However, since we cannot unconditionally disable the special case of setting SP to CFA (other platforms depend on that -- which is arguably the real hack to begin with), we would still need some ABI method to tell the unwinder whether or not the special case is needed.

There's no real way to "fix the unwind info", since this is not a new platform and binaries containing this unwind info have been in the field for more than 15 years. In any case, it is not clear that there is anything to "fix" -- a binary containing no unwind info for SP to indicate that SP does not change isn't really "wrong" IMO.

So I think implementing the default rule for unspecified registers more completely will help us solve this problem correctly. Let me know what you think?

I am not sure why this offset of 160 isn't represented in the unwind info. I guess the OS unwinder that uses the EH frame info just knows that it must adjust the CFA? That seems like a hack. It seems like the unwind info should be fixed and the OS level unwinder should be fixed so that the info can be used correctly without every debugger or tool that has to parse this know about such a hack? Do you have control over the OS and are you able to fix this? That seems like the better fix.

The offset of 160 is represented in the unwind info when computing the CFA itself. This change is about unwinding the SP register when no explicit unwind info for SP is present. In this case, the LLDB unwinder assumes it should set SP to the CFA, which is where the problem comes in.

This is usually represented by the CIE (Common Information Entry) which is pointed to from the FDE (Frame Description Entry). The CIE has the initial state that all FDEs can share. Seems like there should be an entry for the SP that says the rule to unwind it is "CFA+160"?

Well, there could be, just like there could be a default "same register" rule for call-saved registers so we wouldn't have to apply ABI knowledge. However, none of that has been done historically on any DWARF platform I know of, mostly because this would duplicate information known via the ABI into every object file and thus waste space ... In any case, the current state is as it is, we have to handle object files that currently exist.

Now, on s390, most frames actually have explicit unwind instructions for SP, so this special case is not even triggered. Only for leaf functions without a stack frame can we ever see the scenario that there are no SP unwind instructions. In those cases, the correct result on s390 is to simply leave SP unchanged (since the function did not in fact allocate a frame).

However, due to the special case in the LLDB unwinder, SP is not left unchanged, but set to the CFA (i.e. SP + 160 in this case). This is wrong. There are two possible ways to fix this:

  • Disable the special case of setting SP to CFA on s390. Instead, when there is no unwind info for SP, just leave it unchanged (just like any other call-saved register). This solution is implemented in the platform unwinder (libgcc) and also in GDB.
  • Fix the special case of setting SP to CFA by actually taking the CFA bias into account. This is what this patch implements for LLDB.

There seems to be magic happening here. It seems like the CIE or FDE should describe how to unwind the SP when things are tricky. We do have the notion that registers that have no rule can have a default rule applied to them. Currently this is only used for callee saved registers and for any such registers they rule defaults to "the registers is in the register itself". For volatile registers, their default rule is "the register is not available". This is the part where I get fuzzy with respect to how the OS unwinder works: does it have this same notion of default rules for registers? If so, we should be implementing the same thing for LLDB's EH frame parser.

Actually, there are three different default rules: the ones for call-saved and call-clobbered registers you mention, and the special rule for the SP. The platform unwinder does not care about volatile registers (because when you dispatch an exception, the catch point by definition must not care about the contents of those), but it does implement the "same register" rule for call-saved registers, and it does implement the special rule for SP on the platforms that need it.

So a third solution is to allow the ABI plug-in to specify the default rules for registers when they aren't specified. I think the ABI plug-in is where we get the fact that a register is volatile or not...

Yes, that would be a solution, in fact this is the exact solution I implemented for GDB back in the days ...

Right now, if RegisterContextLLDB::SavedLocationForRegister finds no unwind info for a register, it implements (itself) the three default rules, in this order: if it is SP, use the CFA rule; otherwise, if the ABI says the register is volatile, use the "not available rule"; otherwise, say "register not found", which the caller interprets as "same as in the next frame".

One option to fix the s390x problem would be to instead of hard-coding the SP rule and the volatile check here, always call an ABI callback to define a "fallback register rule" by returning a UnwindPlan::Row::RegisterLocation. The ABI could return a "isCFAPlusOffset" fallback rule for the SP if appropriate, or it could return a "undefined" fallback rule for volatile registers.

Is this what you were thinking about? I can try to come up with a patch along those lines.

Yes, that plan sounds great.

uweigand updated this revision to Diff 53502.Apr 12 2016, 5:36 PM
uweigand retitled this revision from Add new ABI callback to return CFA offset to Add new ABI callback to provide fallback unwind register locations.
uweigand updated this object.
uweigand edited edge metadata.

OK, here's an implementation of the new approach. In addition to the changes discussed, I had to add handling of undefined register locations -- this might now theoretically also trigger when CFI contains an explicit DW_CFA_undefined, but it should actually do the right thing then as well.

Everything still works on SystemZ using this approach, and I've also tested on Intel with no regressions.

jasonmolenda edited edge metadata.Apr 12 2016, 5:51 PM

Thanks for doing this work Ulrich, the assumption that SP==CFA was mine, I should have put this in the ABI from the start.

I worry a little about the change to RegisterContextLLDB::SavedLocationForRegister() where you've moved the ABI is-volatile check into ABI::GetFallbackRegisterLocation. If this function can't find an unwind rule for a non-volatile register, it will return UnwindLLDB::RegisterSearchResult::eRegisterNotFound and UnwindLLDB will continue searching for a definition down the stack.

You're changing the call to ABI::RegisterIsVolatile to unwindplan_regloc.IsUndefined(), assuming that GetFallbackRegisterLocation set it to undefined. But could a register have an undefined state from the eh_frame rules? I'm not sure what that would indicate - but presumably this function doesn't use that register and lldb should continue its search.

I may not be correct about this point - but what do you think about this possibility?

Just to be clear, my concern is we get an register location which is "undefined" (I've never seen it, I don't know what it means) and now SavedLocationForRegister() will say "this register is volatile, not looking any further" when it used to say "register not found in this stack frame" and the unwinder would keep looking down the stack for a save location.

My understanding of the eh_frame is that "undefined" means the value of the register is not recoverable in the current frame (default for volatile registers) while the meaning of "same" is that this frame haven't messed with the register so we should continue looking for its value down the stack. Based on this I think this change is doing the right thing and our previous implementation is the one what is incorrect in some case.

The relevant information from the dwarf spec (from section 6.4.1):

undefined: A register that has this rule has no recoverable value in the previous frame. (By convention, it is not preserved by a callee.)
same value: This register has not been modified from the previous frame. (By convention, it is preserved by the callee, but the callee has not modified it.)

Additionally I run this code through TestStandartUnwind on ARM what is a basic stress test for the unwinding system and it detected no issues.

Just to be clear, my concern is we get an register location which is "undefined" (I've never seen it, I don't know what it means) and now SavedLocationForRegister() will say "this register is volatile, not looking any further" when it used to say "register not found in this stack frame" and the unwinder would keep looking down the stack for a save location.

Note that the default state of a register location if no CFI was found at all is "unspecified", which my patch leaves unchanged. A register location can only be "undefined" if it was explicitly set to that state. The only way this can happen, except for my new code in the GetFallbackRegisterLocation, is as a result of a DW_CFA_undefined entry in DWARF CFI.

As Tamas points out, that DWARF opcode is defined to mean the register is not recoverable in this frame, i.e. treating it as "volatile" should do exactly the right thing. (Note that this may be mostly theoretical, since I've not seen any compiler ever in fact issue DW_CFA_undefined ...)

This is was I was briefly alluding to in my original comment "this might now theoretically also trigger when CFI contains an explicit DW_CFA_undefined, but it should actually do the right thing then as well."

clayborg accepted this revision.Apr 13 2016, 10:42 AM
clayborg edited edge metadata.

Is the SystemZ ABI not in this patch? Looks good otherwise as long as Jason Molenda is OK with the patch.

This revision is now accepted and ready to land.Apr 13 2016, 10:42 AM

Is the SystemZ ABI not in this patch? Looks good otherwise as long as Jason Molenda is OK with the patch.

The SystemZ ABI is in D18978, which I just re-uploaded.

jasonmolenda accepted this revision.Apr 13 2016, 2:34 PM
jasonmolenda edited edge metadata.

Hi Tamas & Ulrich. I'm good with this patch, thanks for double checking that detail for me.

This revision was automatically updated to reflect the committed changes.