This is an archive of the discontinued LLVM Phabricator instance.

Use eAddressClassCode for address lookup for opcodes
ClosedPublic

Authored by tberghammer on Sep 2 2015, 8:16 AM.

Details

Summary

Use eAddressClassCode for address lookup for opcodes

Forcing eAddressClassCode if we are looking for an opcode address
is required because of the following edge case on arm:

bx <addr> Non-tail call in a no return function
[data-pool] Marked with $d mapping symbol

The return address of the function call will point to the data pool but
we have to treat it as code so the StackFrame can calculate the symbols
correctly.

Diff Detail

Event Timeline

tberghammer updated this revision to Diff 33811.Sep 2 2015, 8:16 AM
tberghammer retitled this revision from to Introduce new address class eAddressClassDataIntermixedCode.
tberghammer updated this object.
tberghammer added a reviewer: clayborg.
tberghammer added a subscriber: lldb-commits.
clayborg edited edge metadata.Sep 2 2015, 10:53 AM

Changing all $d symbols to always say they are eAddressClassDataIntermixedCode is wrong because the symbols in the .data section now would be marked as eAddressClassDataIntermixedCode.

To clarify a few things, lets say we have the following code:

0x1000: bx <addr> Non-tail call in a no return function
0x1004: [data-pool] Marked with $d mapping symbol

Should just claim that 0x1000 is eAddressClassCode and that 0x1004 is eAddressClassData. We don't need a new eAddressClassDataIntermixedCode for this, it should just say eAddressClassData for 0x1004.

For return addresses we that are on the stack on in the LR, we should actually be sanitizing them before we start doing lookups. So the return address would be 0x1005 in this case you are talking about right? Maybe we always get the address class of the return address and check if it is eAddressClassData. If it is, we know something is wrong since the return address can't go to data and we work around this by recognizing that fact.

I would rather not just say that all data is eAddressClassDataIntermixedCode for ARM and Thumb, that seems like the wrong fix.

clayborg requested changes to this revision.Sep 2 2015, 10:54 AM
clayborg edited edge metadata.

Marking as needing changes due to above comments.

This revision now requires changes to proceed.Sep 2 2015, 10:54 AM

Changing all $d symbols to always say they are eAddressClassDataIntermixedCode is wrong because the symbols in the .data section now would be marked as eAddressClassDataIntermixedCode.

We only use the m_address_class_map for the code section (when the address class of the section is eAddressClassCode) so we won't mark the symbols in the .data section as eAddressClassDataIntermixedCode.

To clarify a few things, lets say we have the following code:

0x1000: bx <addr> Non-tail call in a no return function
0x1004: [data-pool] Marked with $d mapping symbol

Should just claim that 0x1000 is eAddressClassCode and that 0x1004 is eAddressClassData. We don't need a new eAddressClassDataIntermixedCode for this, it should just say eAddressClassData for 0x1004.

This is the current implementation (before this change).

For return addresses we that are on the stack on in the LR, we should actually be sanitizing them before we start doing lookups. So the return address would be 0x1005 in this case you are talking about right? Maybe we always get the address class of the return address and check if it is eAddressClassData. If it is, we know something is wrong since the return address can't go to data and we work around this by recognizing that fact.

It is incorrect in some sense if we see that the return address points to a data section but in the described case this is the best what LLDB can say (and it is true that if the function return, then it will try to execute some code from the data segment), and from this address the unwinding code can continue without any issue (assuming lr is saved somewhere).

I would rather not just say that all data is eAddressClassDataIntermixedCode for ARM and Thumb, that seems like the wrong fix.

What is your opinion about changing GetOpcodeLoadAddress and GetCallableLoadAddress to always mask out the LSB on arm/thumb and never return LLDB_INVALID_ADDRESS? It would fix the issue and will be (mostly) consistent with the other architectures where we don't do any checking, but it will drop a slight safety net for the case when something went seriously wrong and we really ended up in a data section (we don't have this check for other architectures).

I would leave everything as is (no eAddressClassDataIntermixedCode), but I would change the code to use:

target->GetOpcodeLoadAddress (return_load_addr, eAddressClassCode);

We don't need to lookup the address class type when determining return addresses. Then this should just work.

The main reason I don't want stuff changing by adding eAddressClassDataIntermixedCode is if we are single stepping over 0x1000:

0x1000: bx <addr> Non-tail call in a no return function
0x1004: [data-pool] Marked with $d mapping symbol

We might set a breakpoint at 0x1004, but if we determine that 0x1004 is data, we won't set a breakpoint there... So there are cases where we want to know the truth about address 1004. But in the case of the LR or any return address, we need to just assume it is a code address instead of figuring out what it actually is.

So:

target->GetOpcodeLoadAddress (return_load_addr, eAddressClassCode);

Would get used in the stack backtracing code that is currently looking up the real address class of the return address...

tberghammer updated this revision to Diff 33937.Sep 3 2015, 5:48 AM
tberghammer retitled this revision from Introduce new address class eAddressClassDataIntermixedCode to Use eAddressClassCode for address lookup for opcodes.
tberghammer updated this object.
tberghammer edited edge metadata.

Address review comment

clayborg requested changes to this revision.Sep 3 2015, 11:48 AM
clayborg edited edge metadata.

Did you check who is all calling this? Is it only places that know that an address is code? It seems like we have might have different clients expecting different things out of this function call. For the stack backtracing code, we want to force the address to be code. Others might want to know if it is data. One idea is to add a bool parameter like "bool address_is_always_code". If this is true we call:

code_addr = target->GetOpcodeLoadAddress (code_addr, address_is_always_code ? eAddressClassCode : GetAddressClass());
This revision now requires changes to proceed.Sep 3 2015, 11:48 AM
tberghammer updated this revision to Diff 34038.Sep 4 2015, 9:11 AM
tberghammer edited edge metadata.

Updated the change based on the comments.

I don't fully agree with restricting the user from setting breakpoint in non-code locations because if LLDB classified a section incorrectly (e.g. haven't found the SO file for it) the user might want to still set a breakpoint there. In general I would like to make it possible to set a breakpoint at any address (even on un-aligned ones) but warn the user that it might be incorrect.

jingham added a subscriber: jingham.Sep 4 2015, 9:21 AM

If the user says "break set -a <SOME ADDRESS>" I have no problem with our setting the breakpoint there even if we don't think it is a terribly good idea. But if lldb is converting any other specification to an address, it should always move past data in text. The failure modes if you aren't careful about this are really confusing: "Why was the first value in my enum whatever the trap instruction is on your platform..." etc... If allowing the former makes it hard to do the latter, the latter should have priority.

Jim

I agree that we want to enable it only in very special cases when the user really know what he/she wants (probably pass in a --force flag). Anyway, it isn't implemented with this change and I don't expect it to be implemented in the near future.

clayborg accepted this revision.Sep 4 2015, 10:05 AM
clayborg edited edge metadata.

We really do need to restrict this for single stepping purposes. If the thread plans that single step and set breakpoints for stepping think they should place a breakpoint on 0x1004 if the example below:

0x1000: bx <addr> Non-tail call in a no return function
0x1004: [data-pool] Marked with $d mapping symbol

You will change the data with the software breakpoint instruction and change the flow of your program incorrectly. I do agree we should have a "force" option to allow this to be done by the user, but we need to do due diligence to make sure we don't do this in LLDB code.

Your updated changes look good though.

This revision is now accepted and ready to land.Sep 4 2015, 10:05 AM
This revision was automatically updated to reflect the committed changes.