This is an archive of the discontinued LLVM Phabricator instance.

Use eAddressClassCode for address lookup for opcodes
Closed, Public

Authored by tberghammer on Sep 2 2015, 8:16 AM.

Details

Reviewers

Commits

rG25b9f7ebd382: Use eAddressClassCode for address lookup for opcodes for stack frames
rLLDB246958: Use eAddressClassCode for address lookup for opcodes for stack frames
rL246958: Use eAddressClassCode for address lookup for opcodes for stack frames

Summary

Use eAddressClassCode for address lookup for opcodes

Forcing eAddressClassCode if we are looking for an opcode address
is required because of the following edge case on arm:

bx <addr> Non-tail call in a no return function
[data-pool] Marked with $d mapping symbol

The return address of the function call will point to the data pool but
we have to treat it as code so the StackFrame can calculate the symbols
correctly.
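
The edge case can be illustrated with a minimal, self-contained sketch. The types and function names below are hypothetical stand-ins, not LLDB's declarations, and the addresses 0x1000 and 0x1004 are made up for illustration. The sketch assumes that data addresses are rejected as invalid on ARM/Thumb and that bit zero is cleared for code addresses:

```cpp
#include <cstdint>

// Hypothetical stand-ins for LLDB's types; not the real declarations.
enum class AddressClass { Invalid, Unknown, Code, CodeAlternateISA, Data, Debug, Runtime };

constexpr std::uint64_t kInvalidAddress = UINT64_MAX;

// Simplified ARM/Thumb model: data and debug addresses have no opcode
// address; for every other class bit zero (the Thumb bit) is cleared.
constexpr std::uint64_t OpcodeLoadAddress(std::uint64_t addr, AddressClass addr_class) {
    switch (addr_class) {
    case AddressClass::Data:
    case AddressClass::Debug:
        return kInvalidAddress;
    default:
        return addr & ~1ull;
    }
}

// Toy classifier for the example layout: the branch sits at 0x1000 and a
// $d mapping symbol marks the data pool that starts at 0x1004.
constexpr AddressClass ClassifyExample(std::uint64_t addr) {
    return (addr & ~1ull) >= 0x1004 ? AddressClass::Data : AddressClass::Code;
}
```

Under these assumptions, looking up the class of a return address of 0x1005 classifies it as data and the opcode lookup fails, while passing the code class explicitly yields 0x1004, which a stack frame can then use for symbol lookup.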

Diff Detail

Event Timeline

tberghammer updated this revision to Diff 33811. Sep 2 2015, 8:16 AM

tberghammer retitled this revision from to Introduce new address class eAddressClassDataIntermixedCode.

tberghammer updated this object.

tberghammer added a reviewer: clayborg.

tberghammer added a subscriber: lldb-commits.

Herald added a subscriber: aemerson. Sep 2 2015, 8:16 AM

Changing all $d symbols to always say they are eAddressClassDataIntermixedCode is wrong because the symbols in the .data section now would be marked as eAddressClassDataIntermixedCode.

To clarify a few things, let's say we have the following code:

0x1000: bx <addr> Non-tail call in a no return function
0x1004: [data-pool] Marked with $d mapping symbol

We should just claim that 0x1000 is eAddressClassCode and that 0x1004 is eAddressClassData. We don't need a new eAddressClassDataIntermixedCode for this; it should just say eAddressClassData for 0x1004.

For return addresses that are on the stack or in the LR, we should actually be sanitizing them before we start doing lookups. So the return address would be 0x1005 in the case you are talking about, right? Maybe we always get the address class of the return address and check if it is eAddressClassData. If it is, we know something is wrong, since the return address can't go to data, and we work around this by recognizing that fact.

I would rather not just say that all data is eAddressClassDataIntermixedCode for ARM and Thumb; that seems like the wrong fix.

Marking as needing changes due to above comments.

This revision now requires changes to proceed. Sep 2 2015, 10:54 AM

In D12556#238457, @clayborg wrote:

> Changing all $d symbols to always say they are eAddressClassDataIntermixedCode is wrong because the symbols in the .data section now would be marked as eAddressClassDataIntermixedCode.

We only use the m_address_class_map for the code section (when the address class of the section is eAddressClassCode), so we won't mark the symbols in the .data section as eAddressClassDataIntermixedCode.

> To clarify a few things, let's say we have the following code:
>
> 0x1000: bx <addr> Non-tail call in a no return function
> 0x1004: [data-pool] Marked with $d mapping symbol
>
> We should just claim that 0x1000 is eAddressClassCode and that 0x1004 is eAddressClassData. We don't need a new eAddressClassDataIntermixedCode for this; it should just say eAddressClassData for 0x1004.

This is the current implementation (before this change).

> For return addresses that are on the stack or in the LR, we should actually be sanitizing them before we start doing lookups. So the return address would be 0x1005 in the case you are talking about, right? Maybe we always get the address class of the return address and check if it is eAddressClassData. If it is, we know something is wrong, since the return address can't go to data, and we work around this by recognizing that fact.

It is incorrect in some sense if we see that the return address points to a data section, but in the described case this is the best that LLDB can say (and it is true that if the function returns, it will try to execute code from the data segment), and from this address the unwinding code can continue without any issue (assuming lr is saved somewhere).

> I would rather not just say that all data is eAddressClassDataIntermixedCode for ARM and Thumb; that seems like the wrong fix.

What is your opinion about changing GetOpcodeLoadAddress and GetCallableLoadAddress to always mask out the LSB on arm/thumb and never return LLDB_INVALID_ADDRESS? It would fix the issue and would be (mostly) consistent with the other architectures, where we don't do any checking, but it would drop a slight safety net for the case when something went seriously wrong and we really ended up in a data section (we don't have this check for other architectures).
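
The point about m_address_class_map being consulted only for code sections can be sketched as follows. This is a simplified model with hypothetical names, not the ObjectFileELF implementation: it assumes the map is keyed by the address of each mapping symbol and that a class applies from its symbol up to the next one.

```cpp
#include <cstdint>
#include <map>

enum class AddressClass { Unknown, Code, CodeAlternateISA, Data };

// Hypothetical model of a per-file mapping-symbol table: each key is the
// address of a mapping symbol ($a, $t, $x, $d), and the value applies
// until the next key.
using AddressClassMap = std::map<std::uint64_t, AddressClass>;

// section_class is what the containing section alone implies. The map is
// consulted only when the section is code, so symbols in .data keep the
// class of their section.
inline AddressClass ClassifyAddress(const AddressClassMap &map,
                                    AddressClass section_class,
                                    std::uint64_t file_addr) {
    if (section_class != AddressClass::Code)
        return section_class;
    auto ub = map.upper_bound(file_addr);
    if (ub == map.begin())
        return section_class; // no mapping symbol at or before file_addr
    --ub;
    return ub->second;
}
```

With entries for 0x1000 (code) and 0x1004 (data), an address inside the literal pool is classified as data only when its section is a code section; for any other section the map is never read.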

I would leave everything as is (no eAddressClassDataIntermixedCode), but I would change the code to use:

target->GetOpcodeLoadAddress (return_load_addr, eAddressClassCode);

We don't need to look up the address class type when determining return addresses. Then this should just work.

The main reason I don't want to change things by adding eAddressClassDataIntermixedCode is the case where we are single stepping over 0x1000:

0x1000: bx <addr> Non-tail call in a no return function
0x1004: [data-pool] Marked with $d mapping symbol

We might set a breakpoint at 0x1004, but if we determine that 0x1004 is data, we won't set a breakpoint there... So there are cases where we want to know the truth about address 0x1004. But in the case of the LR or any return address, we need to just assume it is a code address instead of figuring out what it actually is.

So:

target->GetOpcodeLoadAddress (return_load_addr, eAddressClassCode);

Would get used in the stack backtracing code that is currently looking up the real address class of the return address...

Address review comment

Did you check who all is calling this? Is it only places that know that an address is code? It seems like we might have different clients expecting different things out of this function call. For the stack backtracing code, we want to force the address to be code. Others might want to know if it is data. One idea is to add a bool parameter like "bool address_is_always_code". With that parameter we would call:

code_addr = target->GetOpcodeLoadAddress (code_addr, address_is_always_code ? eAddressClassCode : GetAddressClass());

This revision now requires changes to proceed. Sep 3 2015, 11:48 AM

Updated the change based on the comments.

I don't fully agree with restricting the user from setting breakpoints in non-code locations, because if LLDB classified a section incorrectly (e.g. it hasn't found the SO file for it) the user might still want to set a breakpoint there. In general I would like to make it possible to set a breakpoint at any address (even unaligned ones) but warn the user that it might be incorrect.

If the user says "break set -a <SOME ADDRESS>" I have no problem with our setting the breakpoint there even if we don't think it is a terribly good idea. But if lldb is converting any other specification to an address, it should always move past data in text. The failure modes if you aren't careful about this are really confusing: "Why was the first value in my enum whatever the trap instruction is on your platform..." etc... If allowing the former makes it hard to do the latter, the latter should have priority.

Jim

I agree that we want to enable it only in very special cases when the user really knows what he/she wants (probably by passing in a --force flag). Anyway, it isn't implemented with this change and I don't expect it to be implemented in the near future.

We really do need to restrict this for single stepping purposes. If the thread plans that single step and set breakpoints for stepping think they should place a breakpoint on 0x1004 in the example below:

0x1000: bx <addr> Non-tail call in a no return function
0x1004: [data-pool] Marked with $d mapping symbol

You will change the data with the software breakpoint instruction and change the flow of your program incorrectly. I do agree we should have a "force" option to allow this to be done by the user, but we need to do due diligence to make sure we don't do this in LLDB code.
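
The distinction drawn here, between breakpoints LLDB derives itself and ones the user explicitly forces, could be expressed as a small policy check. The function below is purely hypothetical (it is not an LLDB API, and no "force" option is implemented by this change); it only illustrates the rule under discussion.

```cpp
enum class AddressClass { Unknown, Code, CodeAlternateISA, Data };

// Hypothetical policy check, not an LLDB API. Breakpoints that the debugger
// computes on its own (for example while single stepping) must not land on
// data intermixed with code, because the trap opcode would overwrite the
// data. An explicit, forced user request may override that.
constexpr bool MayInsertSoftwareBreakpoint(AddressClass addr_class,
                                           bool user_forced) {
    if (addr_class == AddressClass::Data)
        return user_forced;
    return true;
}
```

In the example layout, a stepping plan asking about 0x1004 would be refused, while a forced request for the same address would be allowed.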

Your updated changes look good though.

This revision is now accepted and ready to land. Sep 4 2015, 10:05 AM

Closed by commit rL246958: Use eAddressClassCode for address lookup for opcodes for stack frames (authored by tberghammer). Sep 7 2015, 2:59 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                              Size
include/lldb/lldb-enumerations.h                  1 line
source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp   4 lines
source/Target/Target.cpp                          2 lines

Diff 33811

include/lldb/lldb-enumerations.h

(hunk context: enum FrameComparison)
 //----------------------------------------------------------------------
 enum AddressClass
 {
     eAddressClassInvalid,
     eAddressClassUnknown,
     eAddressClassCode,
     eAddressClassCodeAlternateISA,
     eAddressClassData,
+    eAddressClassDataIntermixedCode,
     eAddressClassDebug,
     eAddressClassRuntime
 };

 //----------------------------------------------------------------------
 // File Permissions
 //
 // Designed to mimic the unix file permission bits so they can be

source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp

(hunk context: for (i = 0; i < num_symbols; ++i))
             {
                 // $b[.<any>]* - marks a THUMB BL instruction sequence
                 // $t[.<any>]* - marks a THUMB instruction sequence
                 m_address_class_map[symbol.st_value] = eAddressClassCodeAlternateISA;
             }
             else if (symbol_name_ref == "$d" || symbol_name_ref.startswith("$d."))
             {
                 // $d[.<any>]* - marks a data item sequence (e.g. lit pool)
-                m_address_class_map[symbol.st_value] = eAddressClassData;
+                m_address_class_map[symbol.st_value] = eAddressClassDataIntermixedCode;
             }
         }
         continue;
     }
 }
 else if (arch.GetMachine() == llvm::Triple::aarch64)
 {
     if (symbol.getBinding() == STB_LOCAL && symbol_name && symbol_name[0] == '$')
     {
         // These are reserved for the specification (e.g.: mapping
         // symbols). We don't want to add them to the symbol table.

         if (symbol_type == eSymbolTypeCode)
         {
             llvm::StringRef symbol_name_ref(symbol_name);
             if (symbol_name_ref == "$x" || symbol_name_ref.startswith("$x."))
             {
                 // $x[.<any>]* - marks an A64 instruction sequence
                 m_address_class_map[symbol.st_value] = eAddressClassCode;
             }
             else if (symbol_name_ref == "$d" || symbol_name_ref.startswith("$d."))
             {
                 // $d[.<any>]* - marks a data item sequence (e.g. lit pool)
-                m_address_class_map[symbol.st_value] = eAddressClassData;
+                m_address_class_map[symbol.st_value] = eAddressClassDataIntermixedCode;
             }
         }

         continue;
     }
 }

 if (arch.GetMachine() == llvm::Triple::arm)

source/Target/Target.cpp

(hunk context: case llvm::Triple::thumb:)
         case eAddressClassDebug:
             return LLDB_INVALID_ADDRESS;

         case eAddressClassUnknown:
         case eAddressClassInvalid:
         case eAddressClassCode:
         case eAddressClassCodeAlternateISA:
         case eAddressClassRuntime:
+        case eAddressClassDataIntermixedCode:
             // Check if bit zero it no set?
             if ((code_addr & 1ull) == 0)
             {
                 // Bit zero isn't set, check if the address is a multiple of 2?
                 if (code_addr & 2ull)
                 {
                     // The address is a multiple of 2 so it must be thumb, set bit zero
                     code_addr |= 1ull;

(29 lines not shown; hunk context: case llvm::Triple::thumb:)
         case eAddressClassDebug:
             return LLDB_INVALID_ADDRESS;

         case eAddressClassInvalid:
         case eAddressClassUnknown:
         case eAddressClassCode:
         case eAddressClassCodeAlternateISA:
         case eAddressClassRuntime:
+        case eAddressClassDataIntermixedCode:
             opcode_addr &= ~(1ull);
             break;
         }
         break;

     default:
         break;
     }