This is an archive of the discontinued LLVM Phabricator instance.

[LLDB] [DWARF] Strip out the thumb bit from addresses on ARM
Needs Review · Public

Authored by mstorsjo on Nov 29 2019, 12:49 AM.

Details

Summary

Windows on ARM is always thumb, and contrary to ELF, there's no per-symbol flag to indicate arm vs thumb mode. Therefore, the linkers (both Microsoft's link.exe and lld) set the thumb bit in all relocations pointing to code sections.

In ELF, the thumb bit is only set for symbols marked as %function, that is, it isn't set in local labels used when constructing the dwarf debug info and line tables.

As the Windows linkers set the thumb bit in all code addresses, we have to cope with this and strip out the lowest bit from all code addresses if the architecture is ARM/Thumb.
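
For reference, the core of the fixup is just clearing bit 0 of code addresses when the module's architecture is ARM/Thumb; a minimal sketch of the operation (the helper name is illustrative only, not an interface added by this patch):

// Illustrative only: strip the thumb bit from a code address when the
// architecture is ARM/Thumb, since the Windows linkers set bit 0 in all
// relocations pointing at code sections.
static uint64_t StripThumbBit(const lldb_private::ArchSpec &arch,
                              uint64_t code_addr) {
  llvm::Triple::ArchType a = arch.GetTriple().getArch();
  if (a == llvm::Triple::arm || a == llvm::Triple::thumb)
    return code_addr & ~1ull;
  return code_addr;
}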

There's already precedent for this in DWARFCallFrameInfo::GetFDEIndex; this extends the same concept to DWARF debug info and line tables. I'm not sure if I managed to get all spots covered here, but I tried to look for all cases of DW_AT_low_pc/DW_AT_high_pc and find the best place to handle it, so that it has to be handled in as few places as possible.

This is admittedly not very pretty, improvement suggestions are welcome.

Diff Detail

Event Timeline

mstorsjo created this revision.Nov 29 2019, 12:49 AM
Herald added a project: Restricted Project.

Yeah, this is going to be tricky... I don't really know what's the right way to do this, but here are the thoughts I have on this so far:

  • The new DWARFDebugInfoEntry member is going to substantially increase our memory footprint -- that's a non-starter
  • I don't like the inconsistency where all addresses are demangled in the DWARF code, except the line tables, which are in the "generic" code
  • I think we ought to have some kind of a utility function to hold the logic for the &~1 stuff. There is something like that in Architecture::GetOpcodeLoadAddress. The Architecture class is mostly independent of a target, so we may be able to create one instance for each module or symbol file, but that feels quite heavy. I might even go for putting something like that into the ArchSpec class. (The raison d'être of the Architecture class was to evict some heavy code out of ArchSpec -- this was ArchitectureArm::OverrideStopInfo. That one is definitely too heavy, so I still don't think it belongs in ArchSpec, but the &~1 thingy seems like it could find a place there.)
  • is this behavior reproducible with lld? If so, I think it'd make the test more readable to run llvm-mc+lld as a part of the test
  • the address byte size would best be handled separately

Yeah, this is going to be tricky... I don't really know what's the right way to do this, but here are the thoughts I have on this so far:

  • The new DWARFDebugInfoEntry member is going to substantially increase our memory footprint -- that's a non-starter

Ok, I feared that class was one where the size is critical yeah...

  • I don't like the inconsistency where all addresses are demangled in the DWARF code, except the line tables, which are in the "generic" code

Yup. The reason for doing it like this was that I tried to find the narrowest interface where I could catch all of them.

  • I think we ought to have some kind of a utility function to hold the logic for the &~1 stuff. There is something like that in Architecture::GetOpcodeLoadAddress. The Architecture class is mostly independent of a target, so we may be able to create one instance for each module or symbol file, but that feels quite heavy. I might even go for putting something like that into the ArchSpec class. (The raison d'être of the Architecture class was to evict some heavy code out of ArchSpec -- this was ArchitectureArm::OverrideStopInfo. That one is definitely too heavy, so I still don't think it belongs in ArchSpec, but the &~1 thingy seems like it could find a place there.)
  • is this behavior reproducible with lld? If so, I think it'd make the test more readable to run llvm-mc+lld as a part of the test

Yes it is. Ok, that's fine with me.

  • the address byte size would best be handled separately

On second thought, yes, this should be trivial to split out and make a similar test with inspecting line tables of an i386 binary.

mstorsjo updated this revision to Diff 231517.Nov 29 2019, 4:12 AM
mstorsjo edited the summary of this revision. (Show Details)

Converted the test to .s form, using lld, split out the ObjectFile changes to D70848 (which this change now depends on for the tests).

Yeah, this is going to be tricky... I don't really know what's the right way to do this, but here are the thoughts I have on this so far:

  • The new DWARFDebugInfoEntry member is going to substantially increase our memory footprint -- that's a non-starter

Ok, I feared that class was one where the size is critical yeah...

  • I don't like the inconsistency where all addresses are demangled in the DWARF code, except the line tables, which are in the "generic" code

Yup. The reason for doing it like this was that I tried to find the narrowest interface where I could catch all of them.

Is there any corresponding place in generic code where one could do the same operation on the addresses that come from pc_lo/pc_hi ranges from .debug_info and .debug_aranges (not familiar with this section and when it's generated etc), if that would end up touching fewer places? The existing precedent in DWARFCallFrameInfo::GetFDEIndex is dwarf specific, but in generic code outside of the dwarf symbolfile plugin.

labath added a subscriber: dblaikie.

Throwing some additional dwarf people in here...

Yeah, this is going to be tricky... I don't really know what's the right way to do this, but here are the thoughts I have on this so far:

  • The new DWARFDebugInfoEntry member is going to substantially increase our memory footprint -- that's a non-starter

Ok, I feared that class was one where the size is critical yeah...

  • I don't like the inconsistency where all addresses are demangled in the DWARF code, except the line tables, which are in the "generic" code

Yup. The reason for doing it like this was that I tried to find the narrowest interface where I could catch all of them.

Is there any corresponding place in generic code where one could do the same operation on the addresses that come from pc_lo/pc_hi ranges from .debug_info and .debug_aranges (not familiar with this section and when it's generated etc), if that would end up touching fewer places? The existing precedent in DWARFCallFrameInfo::GetFDEIndex is dwarf specific, but in generic code outside of the dwarf symbolfile plugin.

debug_aranges is a sort of a lookup table for speeding up address->compile unit searches. llvm does not generate it by default, and I think the reason is that you can usually get the same kind of data from the DW_AT_ranges attribute of the compile unit. I don't think you would be able to handle that in generic code, as debug_aranges does not make its way into generic code.

Overall, I think this needs to be handled in DWARF code. That may even make sense, since PDB may not suffer from the same problem (does it?)

TBH, the m_addr_mask member issue is not that hard to resolve -- all DWARFDebugInfoEntry functions get a DWARFUnit argument (otherwise they wouldn't be able to do anything). Theoretically, we could store the mask there.

However, the question on my mind is what this says about the llvm<->lldb dwarf parser convergence (one of our long term goals is to delete the lldb dwarf parser altogether, leaving just the high level lldb-specific stuff). Adding mask handling code low down in the dwarf parser would go against that goal, as there is no equivalent llvm functionality.

One possible solution would be to try to add equivalent llvm functionality -- it sounds like something like this is needed there too, as all llvm tools (I don't know if there's anything besides llvm-symbolizer) will probably not work without it. (If they do, it would be interesting to figure out how they manage that.)

Another possibility might be to implement this in the higher levels of the dwarf code (the parts that are likely to stay in lldb forever)...

dblaikie added a subscriber: rnk.Dec 2 2019, 7:41 AM

Throwing some additional dwarf people in here...

Yeah, this is going to be tricky... I don't really know what's the right way to do this, but here are the thoughts I have on this so far:

  • The new DWARFDebugInfoEntry member is going to substantially increase our memory footprint -- that's a non-starter

Ok, I feared that class was one where the size is critical yeah...

  • I don't like the inconsistency where all addresses are demangled in the DWARF code, except the line tables, which are in the "generic" code

Yup. The reason for doing it like this was that I tried to find the narrowest interface where I could catch all of them.

Is there any corresponding place in generic code where one could do the same operation on the addresses that come from pc_lo/pc_hi ranges from .debug_info and .debug_aranges (not familiar with this section and when it's generated etc), if that would end up touching fewer places? The existing precedent in DWARFCallFrameInfo::GetFDEIndex is dwarf specific, but in generic code outside of the dwarf symbolfile plugin.

debug_aranges is a sort of a lookup table for speeding up address->compile unit searches. llvm does not generate it by default, and I think the reason is that you can usually get the same kind of data from the DW_AT_ranges attribute of the compile unit. I don't think you would be able to handle that in generic code, as debug_aranges does not make its way into generic code.

Overall, I think this needs to be handled in DWARF code. That may even make sense, since PDB may not suffer from the same problem (does it?)

TBH, the m_addr_mask member issue is not that hard to resolve -- all DWARFDebugInfoEntry functions get a DWARFUnit argument (otherwise they wouldn't be able to do anything). Theoretically, we could store the mask there.

However, the question on my mind is what this says about the llvm<->lldb dwarf parser convergence (one of our long term goals is to delete the lldb dwarf parser altogether, leaving just the high level lldb-specific stuff). Adding mask handling code low down in the dwarf parser would go against that goal, as there is no equivalent llvm functionality.

One possible solution would be to try to add equivalent llvm functionality -- it sounds like something like this is needed there too, as all llvm tools (I don't know if there's anything besides llvm-symbolizer) will probably not work without it. (If they do, it would be interesting to figure out how they manage that.)

Yeah, I don't know much about the ARM tagged pointer stuff, but if the tags don't appear in the return addresses in a stack trace & thus the addresses you'd pass to llvm-symbolizer, then I think it'd make sense to implement it somewhere in LLVM's libDebugInfoDWARF (& yeah, don't know about PDB either, perhaps @rnk does). If there is no common code between debug_aranges and other address parsing, perhaps there should be: a filter of some kind that could be used for all addresses being read?

So some background on how address masks are handled in LLDB:

Typically, the way we have tried to take care of the extra Thumb bit for ARM is by using:

lldb::addr_t Address::GetCallableLoadAddress(Target *target, bool is_indirect = false) const;
lldb::addr_t GetOpcodeLoadAddress(Target *target, AddressClass addr_class = AddressClass::eInvalid) const;

The first will add the extra bit to an address if needed. The latter will strip the bit if needed. This does require a target though and the target uses the "Architecture" class for ARM to do the work of using the mask. Not sure if we want to try to get an architecture class and use that here for stripping the bit instead of using an address mask?

Also, any lldb_private::Address can be asked for its address class:

AddressClass Address::GetAddressClass() const;

This will return "eCode" for ARM code and "eCodeAlternateISA" for Thumb code. This is resolved by the ObjectFile::GetAddressClass:

/// Get the address type given a file address in an object file.
///
/// Many binary file formats know what kind of data each address refers
/// to. This is primarily for ARM binaries, though it can be applied to
/// any executable file format that supports different opcode types within
/// the same binary. ARM binaries support having both ARM and Thumb within
/// the same executable container.
///
/// \return
///     The address class for the given file address.
virtual AddressClass GetAddressClass(lldb::addr_t file_addr);

So currently no code in LLDB tries to undo the address masks that might be in the object file or debug info; we take care of it after the fact. Early in the ARM days there used to be extra symbols added to the symbol table with names like "$a" for ARM, "$t" for Thumb and "$d" for data. There would be multiple of these symbols in an ordered vector of symbols that would create a CPU map. This was before the branch instruction would look at bit zero. Later ARM architectures started using bit zero to indicate which mode to put the processor in, instead of using explicit "branch to arm" or "branch to thumb" instructions. When this new scheme came out, bit zero started showing up in symbol tables. So the current code allows for either the old style (a CPU map in the symbol table with $a, $t and $d symbols) or the new style (bit zero set and no CPU map).

lldb/include/lldb/Symbol/LineTable.h
333

Might be nice to let the line table parse itself first, and then in a post-production step clean up all the addresses? Maybe:

void LineTable::Finalize(Architecture *arch);

Then we let the architecture plug-in handle any stripping using:

lldb::addr_t Architecture::GetOpcodeLoadAddress(lldb::addr_t addr, AddressClass addr_class) const;

The ArchitectureArm plugin does this:

addr_t ArchitectureArm::GetOpcodeLoadAddress(addr_t opcode_addr,
                                             AddressClass addr_class) const {
  switch (addr_class) {
  case AddressClass::eData:
  case AddressClass::eDebug:
    return LLDB_INVALID_ADDRESS;
  default: break;
  }
  return opcode_addr & ~(1ull);
}
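
A sketch of how the Finalize pass suggested above could look (hypothetical; LineTable has no Finalize today, and this assumes the entries live in m_entries with a file_addr member as in LineTable.h):

// Hypothetical post-production step: once the line table is fully parsed,
// let the Architecture plug-in strip the thumb bit from every entry.
void LineTable::Finalize(Architecture *arch) {
  if (!arch)
    return;
  for (Entry &entry : m_entries)
    entry.file_addr =
        arch->GetOpcodeLoadAddress(entry.file_addr, AddressClass::eCode);
}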
lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.cpp
19–20

Use Architecture plug-in instead of hard coded mask.

lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.h
53

See comment in LineTable.h above.

lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
41–42

Use Architecture plug-in instead of hard coded mask.

lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
258

Use Architecture plug-in instead of hard coded mask.

273

ditto...

lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h
44–45

remove

190

Use Architecture plug-in instead of hard coded mask.

lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
62

remove

159

remove

726

post produce with Architecture plug-in as mentioned in LineTable.h?

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
4006–4014

remove

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
321

Use Architecture plug-in instead of hard coded mask.

lldb/source/Symbol/LineTable.cpp
23–27

Use Architecture plug-in instead of hard coded mask or post produce using Architecture plug-in.

39–40

Use Architecture plug-in instead of hard coded mask or post produce using Architecture plug-in.

So the main goal here is to let the Architecture plug-in handle the address masking in all places and make sure there is no code like:

if (ArchSpec arch = m_objfile_sp->GetArchitecture()) {
  if (arch.GetTriple().getArch() == llvm::Triple::arm ||
      arch.GetTriple().getArch() == llvm::Triple::thumb)
    return ~1ull;
}

anywhere in the codebase. Otherwise, when we add a new architecture that requires bit stripping/adding, we end up having to change many locations in the code (two of which were added with this patch). So use the Architecture plug-in to do this, and then we just need to edit the Architecture plug-in code for each architecture.
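
The call sites would then ideally reduce to something like this sketch (FixupCodeAddress is an illustrative name, not part of the patch; how the DWARF code obtains the Architecture instance is exactly what the rest of this thread discusses):

// Sketch of the preferred shape: all per-architecture knowledge stays in
// the Architecture plug-in, and the generic/DWARF code just asks it to
// sanitize the address, instead of checking the triple inline.
static lldb::addr_t FixupCodeAddress(lldb::addr_t addr,
                                     const lldb_private::Architecture *arch) {
  if (arch)
    return arch->GetOpcodeLoadAddress(addr, AddressClass::eCode);
  return addr;
}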

amccarth resigned from this revision.Dec 2 2019, 11:36 AM

I'm following along, but I don't think I have enough domain knowledge to qualify as a reviewer.

mstorsjo marked an inline comment as done.Dec 2 2019, 12:48 PM

debug_aranges is a sort of a lookup table for speeding up address->compile unit searches. llvm does not generate it by default, and I think the reason is that you can usually get the same kind of data from the DW_AT_ranges attribute of the compile unit.

Ok, thanks for the explanation!

On the topic of debug_aranges (but unrelated to this thread), I've been notified about a project, drmingw, a postmortem debugger for the mingw environment (I think), which stumbles on code built by llvm exactly due to the lack of debug_aranges: https://github.com/jrfonseca/drmingw/issues/42#issuecomment-526850588 They even speculate that it would almost be mandated by the dwarf spec.

Overall, I think this needs to be handled in DWARF code. That may even make sense, since PDB may not suffer from the same problem (does it?)

Not sure actually, I'd have to check. (FWIW, llvm doesn't support generating codeview debug info for PDB for ARM, only x86 and aarch64, but I could check with MSVC tools.)

lldb/include/lldb/Symbol/LineTable.h
333

Thanks for all the pointers and the explanations! I'll try to have a look into all of this in a few days.

I don't want to add noise to this, but there's a related change that I'd like to upstream soon -- ARMv8.3 pointer authentication bits in addresses. With the ARMv8.3 ISA, higher bits that aren't used for addressing can be used to sign & authenticate pointers that are saved in memory, where they may be overwritten via buffer over-runs et al, to verify that they have not been modified before they are used. I hope to put together a patchset for upstreaming the changes I made to lldb to support this soon.

This is similar to the 0th bit set for the lr on AArch32 code, to indicate that the caller function is thumb; when lldb reads the lr value, it has to strip the 0th bit before it does a symbol lookup to find the caller. With PAC bits, lldb needs to do the same -- clear or set the top bits of the address before symbol lookup.

However, the PAC bits give us two complications that the thumb bit does not have. First, the number of bits used for ptrauth is variable; depending on how many page tables the operating system is using for processes, we need to strip more or fewer bits, so we need access to a Process object to determine this at runtime. Second, people may need to debug issues with their PAC bits, so we want to print the real unadulterated pointer value, and if it resolves to a symbol address, we print the stripped value and the symbol. For instance, where we might show the lr value like this for register read:

lr = 0x00000001bfe03b48  libdispatch.dylib`dispatch_mig_server + 272

once authentication bits have been added, we'll show:

lr = 0x4051f781bfe03b48 (0x00000001bfe03b48) libdispatch.dylib`dispatch_mig_server + 272

I'm not sure we should try to handle both the thumb bit and pac bits in the same mechanism, but it's close enough that I wanted to bring it up.

I currently have my method in the ABI (my method name is poorly chosen, but I'd go with StripAuthenticationBits right now if I were putting it in the ABI) because I have a weak pointer to the Process from there.

I also added a SBValue::GetValueAsAddress API similar to GetValueAsUnsigned, which runs the value through the strip method. At the API level, script writers may want to show the pointer value with PAC bits and/or the actual pointer value, depending on what they are doing.

I have calls to this ABI method scattered across lldb, although unlike the thumb bit, it won't show up in the DWARF at all.

I haven't thought about moving this StripAuthenticationBits method into Architecture - the callers would all need to provide a Process*, but that's how everyone gets an ABI already, so it might be straightforward to do that.
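
A rough sketch of what such a strip might boil down to (StripAuthenticationBits is just the name floated above, not an existing method; the number of address bits would be queried from the Process at runtime, and high/kernel addresses would additionally need sign extension):

// Hypothetical: clear the pointer authentication bits above the number of
// bits actually used for virtual addressing. A real implementation would
// ask the Process how many address bits are in use and would also handle
// sign-extended (high/kernel) addresses.
static lldb::addr_t StripAuthenticationBits(lldb::addr_t addr,
                                            uint32_t num_addr_bits) {
  if (num_addr_bits == 0 || num_addr_bits >= 64)
    return addr;
  lldb::addr_t mask = (lldb::addr_t(1) << num_addr_bits) - 1;
  return addr & mask;
}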

debug_aranges is a sort of a lookup table for speeding up address->compile unit searches. llvm does not generate it by default, and I think the reason is that you can usually get the same kind of data from the DW_AT_ranges attribute of the compile unit.

Ok, thanks for the explanation!

On the topic of debug_aranges (but unrelated to this thread), I've been notified about a project, drmingw, a postmortem debugger for the mingw environment (I think), which stumbles on code built by llvm exactly due to the lack of debug_aranges: https://github.com/jrfonseca/drmingw/issues/42#issuecomment-526850588 They even speculate that it would almost be mandated by the dwarf spec.

Yeah, I've had some conversations with folks on a couple of different projects who have been curious about Clang/LLVM's lack of debug_aranges - my general take on it (as one of the people involved in turning off debug_aranges by default in LLVM's output - it can still be enabled by -gdwarf-aranges) is that they add a significant amount of object size for little benefit compared to the ranges on CU DIEs themselves. Clients who read input without aranges for a given CU should look in the CU DIE to see if there are ranges there - it should add very little overhead to consumers & save some significant debug info size (especially for split-dwarf object sizes, where the relocations still exist & take up a fair bit of space - especially with compression enabled, where relocations are not compressed).

So some background on how address masks are handled in LLDB:

Typically, the way we have tried to take care of the extra Thumb bit for ARM is by using:

lldb::addr_t Address::GetCallableLoadAddress(Target *target, bool is_indirect = false) const;
lldb::addr_t GetOpcodeLoadAddress(Target *target, AddressClass addr_class = AddressClass::eInvalid) const;

The first will add the extra bit to an address if needed. The latter will strip the bit if needed. This does require a target though and the target uses the "Architecture" class for ARM to do the work of using the mask. Not sure if we want to try to get an architecture class and use that here for stripping the bit instead of using an address mask?

So, where do I get hold of an Architecture* object instance from within the relevant contexts (within SymbolFileDWARF, where we have a reference to the object file)? Also within source/Symbol/DWARFCallFrameInfo.cpp where we have existing code that does the same.

labath added a comment.Dec 3 2019, 6:47 AM

So some background on how address masks are handled in LLDB:

Typically, the way we have tried to take care of the extra Thumb bit for ARM is by using:

lldb::addr_t Address::GetCallableLoadAddress(Target *target, bool is_indirect = false) const;
lldb::addr_t GetOpcodeLoadAddress(Target *target, AddressClass addr_class = AddressClass::eInvalid) const;

The first will add the extra bit to an address if needed. The latter will strip the bit if needed. This does require a target though and the target uses the "Architecture" class for ARM to do the work of using the mask. Not sure if we want to try to get an architecture class and use that here for stripping the bit instead of using an address mask?

So, where do I get hold of an Architecture* object instance from within the relevant contexts (within SymbolFileDWARF, where we have a reference to the object file)? Also within source/Symbol/DWARFCallFrameInfo.cpp where we have existing code that does the same.

Well... that's the tricky part (which I've tried to allude to in D70840#1763639). Currently the only thing holding an architecture object is the target class, but since one of the goals of the SymbolFile stuff is to be independent of any particular target, you cannot easily get hold of that. That is why I tried to steer this towards having this in the ArchSpec class. If we don't want that, we'll probably have to create one new Architecture instance local to each Module object...

So some background on how address masks are handled in LLDB:

Typically, the way we have tried to take care of the extra Thumb bit for ARM is by using:

lldb::addr_t Address::GetCallableLoadAddress(Target *target, bool is_indirect = false) const;
lldb::addr_t GetOpcodeLoadAddress(Target *target, AddressClass addr_class = AddressClass::eInvalid) const;

The first will add the extra bit to an address if needed. The latter will strip the bit if needed. This does require a target though and the target uses the "Architecture" class for ARM to do the work of using the mask. Not sure if we want to try to get an architecture class and use that here for stripping the bit instead of using an address mask?

So, where do I get hold of an Architecture* object instance from within the relevant contexts (within SymbolFileDWARF, where we have a reference to the object file)? Also within source/Symbol/DWARFCallFrameInfo.cpp where we have existing code that does the same.

Well... that's the tricky part (which I've tried to allude to in D70840#1763639). Currently the only thing holding an architecture object is the target class, but since one of the goals of the SymbolFile stuff is to be independent of any particular target, you cannot easily get hold of that. That is why I tried to steer this towards having this in the ArchSpec class. If we don't want that, we'll probably have to create one new Architecture instance local to each Module object...

Ah, sorry for overlooking those parts of your comment. @clayborg - what do you think of @labath's suggestions here?

So some background on how address masks are handled in LLDB:

Typically, the way we have tried to take care of the extra Thumb bit for ARM is by using:

lldb::addr_t Address::GetCallableLoadAddress(Target *target, bool is_indirect = false) const;
lldb::addr_t GetOpcodeLoadAddress(Target *target, AddressClass addr_class = AddressClass::eInvalid) const;

The first will add the extra bit to an address if needed. The latter will strip the bit if needed. This does require a target though and the target uses the "Architecture" class for ARM to do the work of using the mask. Not sure if we want to try to get an architecture class and use that here for stripping the bit instead of using an address mask?

So, where do I get hold of an Architecture* object instance from within the relevant contexts (within SymbolFileDWARF, where we have a reference to the object file)? Also within source/Symbol/DWARFCallFrameInfo.cpp where we have existing code that does the same.

Well... that's the tricky part (which I've tried to allude to in D70840#1763639). Currently the only thing holding an architecture object is the target class, but since one of the goals of the SymbolFile stuff is to be independent of any particular target, you cannot easily get hold of that. That is why I tried to steer this towards having this in the ArchSpec class. If we don't want that, we'll probably have to create one new Architecture instance local to each Module object...

Ah, sorry for overlooking those parts of your comment. @clayborg - what do you think of @labath's suggestions here?

Ping @clayborg - can you comment on @labath's suggestions?

  • I think we ought to have some kind of a utility function to hold the logic for the &~1 stuff. There is something like that in Architecture::GetOpcodeLoadAddress. The Architecture class is mostly independent of a target, so we may be able to create one instance for each module or symbol file, but that feels quite heavy. I might even go for putting something like that into the ArchSpec class. (The raison d'être of the Architecture class was to evict some heavy code out of ArchSpec -- this was ArchitectureArm::OverrideStopInfo. That one is definitely too heavy, so I still don't think it belongs in ArchSpec, but the &~1 thingy seems like it could find a place there.)

I'm trying to dig into this now even if @clayborg hasn't commented on your suggestions.

This was moved out from the Target class in D48623/rG7aa9e7bc5739f468143b7b97060a9fbbfb94c6d2. In addition to GetOpcodeLoadAddress and GetCallableLoadAddress, there's also GetBreakableLoadAddress in ArchitectureMips, which is a rather big piece of code as well, so I guess moving that one can't be motivated, but as you write, the &~1 bit in itself might be ok.

If we add a GetOpcodeLoadAddress in ArchSpec, which does &~1 on mips and arm - should we try to call this from the Architecture plugins (we don't have an ArchSpec stored there right now), or just see it as tolerable and justifiable duplication? The existing GetOpcodeLoadAddress takes an AddressClass as well; should we keep that, or just make a plain GetOpcodeLoadAddress that assumes that the provided address is a code address?

  • I think we ought to have some kind of a utility function to hold the logic for the &~1 stuff. There is something like that in Architecture::GetOpcodeLoadAddress. The Architecture class is mostly independent of a target, so we may be able to create one instance for each module or symbol file, but that feels quite heavy. I might even go for putting something like that into the ArchSpec class. (The raison d'être of the Architecture class was to evict some heavy code out of ArchSpec -- this was ArchitectureArm::OverrideStopInfo. That one is definitely too heavy, so I still don't think it belongs in ArchSpec, but the &~1 thingy seems like it could find a place there.)

I'm trying to dig into this now even if @clayborg hasn't commented on your suggestions.

This was moved out from the Target class in D48623/rG7aa9e7bc5739f468143b7b97060a9fbbfb94c6d2. In addition to GetOpcodeLoadAddress and GetCallableLoadAddress, there's also GetBreakableLoadAddress in ArchitectureMips, which is a rather big piece of code as well, so I guess moving that one can't be motivated, but as you write, the &~1 bit in itself might be ok.

Yes GetBreakableLoadAddress is definitely too big (and a huge hack too)...

If we add a GetOpcodeLoadAddress in ArchSpec, which does &~1 on mips and arm - should we try to call this from the Architecture plugins (we don't have an ArchSpec stored there right now), or just see it as tolerable and justifiable duplication?

I think it's fine to have an ArchSpec member in the "Architecture" plugin, and then implement the forwarding in the base class. As an alternative, we could look at the existing callers of this function, and see if they have an ArchSpec around so they could call this directly.

The existing GetOpcodeLoadAddress takes an AddressClass as well; should we keep that, or just make a plain GetOpcodeLoadAddress that assumes that the provided address is a code address?

I /think/ keeping the address class argument would be less confusing, but I am open to other ideas too...
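
For concreteness, the addition being discussed could look roughly like this (hypothetical; ArchSpec has no GetOpcodeLoadAddress today, and the handled architectures here just mirror the existing ArchitectureArm/ArchitectureMips plug-ins):

// Hypothetical ArchSpec counterpart of Architecture::GetOpcodeLoadAddress:
// keep the data/debug handling of the ArchitectureArm version shown above,
// and strip bit 0 for the architectures that use it as an ISA marker.
lldb::addr_t ArchSpec::GetOpcodeLoadAddress(lldb::addr_t addr,
                                            AddressClass addr_class) const {
  switch (addr_class) {
  case AddressClass::eData:
  case AddressClass::eDebug:
    return LLDB_INVALID_ADDRESS;
  default:
    break;
  }
  switch (GetTriple().getArch()) {
  case llvm::Triple::arm:
  case llvm::Triple::thumb:
  case llvm::Triple::mips:
  case llvm::Triple::mipsel:
  case llvm::Triple::mips64:
  case llvm::Triple::mips64el:
    return addr & ~1ull;
  default:
    return addr;
  }
}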

mstorsjo updated this revision to Diff 233586.Dec 12 2019, 5:37 AM
mstorsjo edited the summary of this revision. (Show Details)

Added ArchSpec::GetOpcodeLoadAddress() (didn't refactor Architecture to use it yet, so this is still duplicated code with the ArchitectureArm and ArchitectureMips plugins, but posting for feedback).

Changed the existing DWARFCallFrameInfo to use it for clearing the lowest bit, and added a Finalize pass to both LineTable and DWARFDebugAranges for fixing it up.

DWARFDebugInfoEntry still has the calls scattered around, as there's many different call paths, and once parsed, the data ends up in many different places.

Sorry for the delay. I have had medical issues going on for the past month.

The hard part about Architecture vs ArchSpec is that the architecture of the target doesn't always exactly match the arch of each module. But I agree that we need to be able to do more with ArchSpec-level classes.

If we can get the Architecture from inside the ArchSpec.cpp file only, we could always work through the ArchSpec class (and possibly rename it to Architecture after getting rid of the old Architecture?). There is nothing that seems target specific about the current lldb_private::Architecture class.

So my idea would be:
1 - add all virtual functions to from lldb_private::Architecture as normal members of ArchSpec.
2 - have ArchSpec grab the lldb_private::Architecture using info in the ArchSpec when it is needed for the methods added in step 1, and still call through to the lldb_private::Architecture plug-ins
3 - update all code that was directly accessing lldb_private::Architecture from the target and have them get the ArchSpec from the target instead and use that to make the same calls
4 - remove "Architecture *GetArchitecturePlugin()" from Target.h

My reasoning is there is no reason we need to access lldb_private::Architecture from a target only.

So my idea would be:
1 - add all virtual functions to from lldb_private::Architecture as normal members of ArchSpec.
2 - have ArchSpec grab the lldb_private::Architecture using info in the ArchSpec when it is needed for the methods added in step 1, and still call through to the lldb_private::Architecture plug-ins

So ArchSpec would have an Architecture* member which is lazily loaded when needed?

A couple details come to mind:

  • This would have to be a shared_ptr, to keep ArchSpec easily copyable
  • As those methods where this is used are const, should the shared_ptr Architecture be made mutable? Or skip the const on those methods?
  • There are a few places (in DWARFDebugInfoEntry) where I fetch a new copy of an ArchSpec (almost) each time I use it. In those cases, we'd end up with a new lazy load of the plugin each time (if the originating object hasn't needed to load the plugin yet). So perhaps it's best to always load it (like in ArchSpec::SetTriple and similar ones), so that there's always an initialized shared_ptr that can be copied along cheaply?

I am strongly opposed to ArchSpec owning an Architecture object. The latter is far more complicated -- it reads bytes from target memory and disassembles them -- whereas ArchSpec just returns a bunch of constants. If anything, it should be the other way around. That way the code which just fiddles with triples does not need to depend on the whole world.

Having an Architecture instance in the Module object (to ensure it's independent of the target) seems /ok/, though I am not really convinced that it is needed, as the &~1 logic seems like a good candidate for ArchSpec (which already knows a lot about arm vs. thumb, etc.).

@clayborg - can you follow up on @labath's reply here?

And irrespectively if the ArchSpec vs Architecture design, can you (either of you) comment on the updated form of the patch?

jdoerfert resigned from this revision.Dec 19 2019, 8:50 AM

I am strongly opposed to ArchSpec owning an Architecture object. The latter is far more complicated -- it reads bytes from target memory and disassembles them -- whereas ArchSpec just returns a bunch of constants. If anything, it should be the other way around. That way the code which just fiddles with triples does not need to depend on the whole world.

I know part of this re-org was to keep the ArchSpec class from accessing any Target stuff, to keep each directory of source code from accessing stuff in another tree. So from that perspective I can see your objection. I don't think splitting this up adds anything at all to the code other than achieving this separation, though. It makes sense to ask an architecture to do architecture specific things. Not sure why they need to be split, IMHO.

Having an Architecture instance in the Module object (to ensure it's independent of the target) seems /ok/, though I am not really convinced that it is needed, as the &~1 logic seems like a good candidate for ArchSpec (which already knows a lot about arm vs. thumb, etc.).

I would rather keep all architecture specific code in the Architecture plug-ins somehow. It keeps code modifications clearer, as only one architecture-specific file changes. If we inline stuff into ArchSpec, we end up with a big switch statement in the core and everyone adds their code to the same function.

So not sure what the right solution is here. I still like folding the Architecture into ArchSpec myself, but I am open to any and all opinions.

The main reason I don't want the Architecture class in ArchSpec is to keep the dependency graph of lldb-server clean (it's not very clean to begin with, but at least I want to avoid making it worse). lldb-server should not know anything about the Target class (that's a liblldb concept), but it has plenty of reasons to play around with ArchSpecs. If ArchSpec contained an Architecture instance, then lldb-server would end up transitively (Architecture::GetBreakableLoadAddress and friends) depending on Target. That's why I think we need two repositories for architecture-specific code. ArchSpec could be the home for the shared liblldb+lldb-server stuff (which will end up being low level code only, because lldb-server is low-level). The home for the second stuff might as well be the Architecture class. The &~1 logic seems sufficiently low-level to fall into the first category -- even though lldb-server does not seem to need it right now, it's not hard to imagine it needing that in the future.

A secondary issue is the inheritance. ArchSpec is currently a simple, (mostly) trivially copyable class. Introducing inheritance would complicate that. It's true that virtual dispatch can be cleaner than a switch, but I think this only applies if the logic inside the switch cases is complicated. And even then the code can be made relatively clean with a bit of care (and llvm does not seem to be afraid of switches in general). In this particular case, I think that the logic is actually so simple that the virtual dispatch would only obscure what is going on.

If, in the future, we end up needing to put some more complicated code here, we can revisit this decision (in that case, I'd probably argue for creating some sort of a different entity to hold this functionality), but for now, I don't see a reason to complicate the ArchSpec design on account of this.

I can see how, from the perspective of someone maintaining a downstream target, having everything architecture-specific in a single place would be appealing, but I wouldn't want to put this objective too high on the priority list. Otherwise, I fear that this entity (whatever it would end up being called) will end up being a central nexus for everything in lldb. For example there's a lot of arm-specific code in ObjectFileELF. Some of that could be restructured to be independent of elf, but doing that for everything would be tricky, and so we may end up with everything depending on ELF.

And irrespectively if the ArchSpec vs Architecture design, can you (either of you) comment on the updated form of the patch?

The code still seems somewhat schizophrenic to me. :/ The line tables are fixed up super late, but DW_AT_low_pc is adjusted very early. The line table adjustment happens even after sorting, which means the fixup could alter the sort order. It probably wouldn't matter in practice, as everything would just get decremented by one, but it still seems like a bad design. And adjusting the low_pc so early will complicate the move to the llvm dwarf parser.

I think I'd most prefer some middle ground where the fixup happens after the lowest extraction layers are finished, but before the data hits the "generic" code. It's possible that no such place exists right now, but it might be possible to create something with a bit of refactoring...

And irrespectively if the ArchSpec vs Architecture design, can you (either of you) comment on the updated form of the patch?

The code still seems somewhat schizophrenic to me. :/ The line tables are fixed up super late,

This bit was based on suggestions by @clayborg here; it could be moved earlier as well, there aren't many restrictions on where that one is done.

but DW_AT_low_pc is adjusted very early. The line table adjustment happens even after sorting, which means the fixup could alter the sort order. It probably wouldn't matter in practice, as everything would just get decremented by one, but it still seems like a bad design. And adjusting the low_pc so early will complicate the move to the llvm dwarf parser.

I think I'd most prefer some middle ground where the fixup happens after the lowest extraction layers are finished, but before the data hits the "generic" code. It's possible that no such place exists right now, but it might be possible to create something with a bit of refactoring...

The main issue with DW_AT_low_pc/DW_AT_high_pc is that they're used in many different places - it's not just a straightforward process of "read data from storage into intermediate representations", but there are different accessor methods that all end up going to the original source and fetching the data. GetDIENamesAndRanges is one of the methods that reads data and outputs arrays and ranges, and those could be handled later, after extraction. GetAttributeAddressRange is one that is used in different contexts, where it might be possible to move the fixing up to a higher layer. But then there are cases like LookupAddress which seems to fetch the raw data from storage, just to do if ((lo_pc <= address) && (address < hi_pc)) to see if the info for the desired address is found.

But I guess it could be moved a few steps further out at least; for LookupAddress it could be done at the very end after fetching all the addresses, before doing the comparison, for GetAttributeAddressRange (which also is used by LookupAddress) it could be done right before returning, and with GetDIENamesAndRanges it could even be done by the caller.

The problem with the interface of the DWARFDebugInfoEntry class is that there are a lot of public methods that provide access to data both at a high and a low level of abstraction. Currently not all of the public methods are called from outside of the object, but the current implementation (where it's done immediately after loading values) was an attempt to safeguard all possible access patterns, even the ones not currently exercised. If we only do it in certain places, we could later end up with new code accessing the lower level methods, bypassing the fixups.

And irrespectively if the ArchSpec vs Architecture design, can you (either of you) comment on the updated form of the patch?

The code still seems somewhat schizophrenic to me. :/ The line tables are fixed up super late, but DW_AT_low_pc is adjusted very early. The line table adjustment happens even after sorting, which means the fixup could alter the sort order. It probably wouldn't matter in practice, as everything would just get decremented by one, but it still seems like a bad design. And adjusting the low_pc so early will complicate the move to the llvm dwarf parser.

I think I'd most prefer some middle ground where the fixup happens after the lowest extraction layers are finished, but before the data hits the "generic" code. It's possible that no such place exists right now, but it might be possible to create something with a bit of refactoring...

I tried to revisit this a bit now. Thanks to D72920, some of the more problematic cases went away, and I tried to trace all callers of the relevant methods and move the fixups into them. Now the DWARFDebugInfoEntry class is no longer touched at all. I also tried to move the fixups to before sorting.

mstorsjo updated this revision to Diff 240118.Jan 24 2020, 1:12 AM

Hopefully this is a bit less schizophrenic now. It's still quite RFC'y and many of the fixup loops could probably still be refactored, if someone suggests where to place the shared code.

Yes, I was keeping this problem in mind when I was working on that patch. :) I believe more work could be done to reduce the number of places that parse DW_AT_low/high_pc, but I haven't gotten around to that yet..

The thing that I am not sure we have fully explored is whether there is any need for this &~1 business in the llvm dwarf code. For instance, what should llvm::DWARFUnit::getSubroutineForAddress return if you pass it an address that is equal to the actual memory address of the start of the thumb function, but the relevant DW_AT_low_pc contains address|1? Answering that might give us an indication on what is the layer at which this fixup should be applied.

Yes, I was keeping this problem in mind when I was working on that patch. :) I believe more work could be done to reduce the number of places that parse DW_AT_low/high_pc, but I haven't gotten around to that yet..

Thanks :-)

Ok - this patch at least shows which codepaths I think are touching it at the moment. In addition to DW_AT_low/high_pc, there's also the case of DW_AT_ranges; I think all relevant paths that access that are taken care of here.

One downside to moving the fixups further out from the parsing, is that it easily becomes more brittle. If a new codepath accesses some getter without doing the fixup afterwards, we'd end up with the same hard-to-diagnose brokenness again.

The thing that I am not sure we have fully explored is whether there is any need for this &~1 business in the llvm dwarf code. For instance, what should llvm::DWARFUnit::getSubroutineForAddress return if you pass it an address that is equal to the actual memory address of the start of the thumb function, but the relevant DW_AT_low_pc contains address|1? Answering that might give us an indication on what is the layer at which this fixup should be applied.

From a quick check at the code there, for that particular case, either it'd be done in llvm::DWARFUnit::updateAddressDieMap or in llvm::DWARFDie::getAddressRanges. Since all levels of the API are public, the fix does become safer the closer to parsing it is, but one also generally has less context to work with in those cases.

I think the most narrow interface to apply the fix in would be DWARFDebugInfoEntry::GetAttributeAddressRanges (plus DWARFDebugInfoEntry::GetAttributeAddressRange which is used a few places internally in DWARFDebugInfoEntry) and DWARFDebugInfoEntry::GetDIENamesAndRanges (plus line tables); if only applied there, I think it would cover all the cases I'm fixing up at the moment...

mstorsjo updated this revision to Diff 240292.Jan 24 2020, 2:31 PM

I tried doing the other alternative; now we primarily touch up the values in DWARFDebugInfoEntry::GetAttributeAddressRange(s) and DWARFDebugInfoEntry::GetDIENamesAndRanges. This variant feels fairly concise and consistent.

clayborg added inline comments.Jan 27 2020, 1:09 PM
lldb/include/lldb/Expression/DWARFExpression.h
142 ↗(On Diff #240292)

Might be better as two functions? One to get the CU address and func address:

lldb_private::Address GetCUAddress();
lldb_private::Address GetFuncAddress();

The DWARFExpression has a Module when the location comes from DWARF and this can be used to get the arch and sanitize the address by calling GetOpcodeLoadAddress on the address before returning it.

lldb/include/lldb/Utility/RangeMap.h
247–250 ↗(On Diff #240292)

Remove if we make DWARFRangeList handle this as mentioned in inline comment

lldb/source/Expression/DWARFExpression.cpp
102–106 ↗(On Diff #240292)

Convert to:

lldb_private::Address DWARFExpression::GetCUAddress();
lldb_private::Address DWARFExpression::GetFuncAddress();
lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
396–399

What about asking the DWARFRangeList to fix all code addresses? This code would become:

ranges.GetOpcodeAddresses(arch);
400–404

Remove if we add:

lldb_private::Address DWARFExpression::GetCUAddress();
lldb_private::Address DWARFExpression::GetFuncAddress();
779–782

Call new DWARFRangeList::GetOpcodeAddresses?

ranges.GetOpcodeAddresses(arch);
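
A sketch of what the suggested range-list helper could do (GetOpcodeAddresses is only proposed in the comments above and doesn't exist; the real DWARFRangeList is a RangeVector, simplified here to a vector of base/size pairs):

// Hypothetical helper corresponding to ranges.GetOpcodeAddresses(arch):
// strip the ISA bit from the base address of every range in the list.
static void GetOpcodeAddresses(
    std::vector<std::pair<lldb::addr_t, lldb::addr_t>> &ranges, // (base, size)
    const lldb_private::Architecture &arch) {
  for (auto &range : ranges)
    range.first =
        arch.GetOpcodeLoadAddress(range.first, AddressClass::eCode);
}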
mstorsjo marked 2 inline comments as done.Jan 27 2020, 1:33 PM
mstorsjo added inline comments.
lldb/include/lldb/Expression/DWARFExpression.h
142 ↗(On Diff #240292)

Oh, if we'd make DWARFExpression::SetLocationListAddresses do the sanitization, then we don't need to add the getters at all - that'd save a lot of extra changes as well.

lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
396–399

Yeah, I was thinking of some refactoring like that. I would have gone for a method that mutates the current range, e.g. ranges.FixOpcodeAddresses(arch), but would you prefer it to be something that returns a new sanitized range instead?

The thing that I am not sure we have fully explored is whether there is any need for this &~1 business in the llvm dwarf code. For instance, what should llvm::DWARFUnit::getSubroutineForAddress return if you pass it an address that is equal to the actual memory address of the start of the thumb function, but the relevant DW_AT_low_pc contains address|1? Answering that might give us an indication on what is the layer at which this fixup should be applied.

From a quick check at the code there, for that particular case, either it'd be done in llvm::DWARFUnit::updateAddressDieMap or in llvm::DWARFDie::getAddressRanges. Since all levels of the API are public, the fix does become safer the closer to parsing it is, but one also generally has less context to work with in those cases.

Right, your answer implicitly assumes (and I do agree with you) that the answer is that getSubroutineForAddress should return the function that address is in. That is not unreasonable -- I deliberately chose the most high-level api I could find, to make arguing that case easy. But that case should be argued nonetheless. For example, a counterargument to that might be that one should only use valid PC register values as inputs to this function (so an even value would not be a valid input here, because the function is thumb), and in that case, there would be no fixups needed. This position could be further strengthened by the fact that the llvm dwarf parser is also used by tools (e.g., llvm-dwarfdump) which want a fairly exact view of the data in the file, without too much smartness, so there may need to be some kind of an api which would provide the "raw" value.

This is something that probably should be discussed with the llvm dwarf owners too (though most of them are on this review already).

However, if we assume that we can agree that this logic should be implemented at a fairly low level in the dwarf parsing code (so that e.g., when we switch to the llvm parser, it will do this thing for us), then this patch still seems quite "schizophrenic" to me, because the line table and location list fixups happen pretty high up. The location list case is particularly interesting here because we use a high level api (llvm::DWARFLocationExpression) to retrieve the location list, which may be reasonably expected to return "fixed" addresses, and so this fix should happen in llvm. (The line tables also use llvm code for parsing, but we access them using a very low-level api, so it's likely this check would have to be done on the lldb side even if llvm provided such a functionality -- but this could happen in SymbolFileDWARF::ParseLineTable instead of in the LineTable class).

So, at the end of the day, I guess what I am saying is that we should send a small RFC to the llvm debug info folks to clarify what should the behavior of the llvm classes be for this kind of dwarf. If the conclusion is that this should be implemented inside these classes (and not by their users), then we should implement it there, and then have the lldb code mirror that. This is the approach we've been trying to do for a while now when modifying or implementing new dwarf features in lldb, but this is probably the first case of wanting to implement a feature in lldb where there is no existing llvm counterpart. (In contrast the DWZ feature is also blocked in part due to concerns about increasing llvm divergence, but there I'm pretty sure the answer is that DWZ should not be implemented in llvm, and the lldb parts should be written in a way that does not require low-level dwarf changes.)

If you want, I can send the initial email to start the discussion, but I can't commit to writing any of the patches. :(

Right, your answer implicitly assumes (and I do agree with you) that the answer is that getSubroutineForAddress should return the function that address is in. That is not unreasonable -- I deliberately chose the most high-level api I could find, to make arguing that case easy. But that case should be argued nonetheless. For example, a counterargument to that might be that one should only use valid PC register values as inputs to this function (so an even value would not be a valid input here, because the function is thumb), and in that case, there would be no fixups needed. This position could be further strengthened by the fact that the llvm dwarf parser is also used by tools (e.g., llvm-dwarfdump) which want a fairly exact view of the data in the file, without too much smartness, so there may need to be some kind of an api which would provide the "raw" value.

I guess the selection of what kind of fixups to apply could be controlled via whatever object controls the fixup. In these patches, it's lldb's ArchSpec - the low level dwarf routines don't have any ArchSpec of their own but use the one given to them to fix up the addresses.

This is something that probably should be discussed with the llvm dwarf owners too (though most of them are on this review already).

However, if we assume that we can agree that this logic should be implemented at a fairly low level in the dwarf parsing code (so that e.g., when we switch to the llvm parser, it will do this thing for us),

I guess the main question here, following from my comment above, is who controls the fixup, and how, on the llvm side, where there's (to my knowledge) no GetOpcodeLoadAddress that can abstract away the fixup? IIRC the dwarf code itself is happily unaware of what architecture it is describing.

then this patch still seems quite "schizophrenic" to me, because the line table and location list fixups happen pretty high up. The location list case is particularly interesting here because we use a high level api (llvm::DWARFLocationExpression) to retrieve the location list, which may be reasonably expected to return "fixed" addresses, and so this fix should happen in llvm.

FWIW, as @clayborg said that DWARFExpression owns a Module in these cases, and thus should have access to an ArchSpec, I could just slip the fixup into DWARFExpression::SetLocationListAddresses. The reason I was messing around there was primarily to avoid sprinkling fixups in many codepaths in DWARFDebugInfoEntry::GetDIENamesAndRanges that call this method.

(The line tables also use llvm code for parsing, but we access them using a very low-level api, so it's likely this check would have to be done on the lldb side even if llvm provided such a functionality -- but this could happen in SymbolFileDWARF::ParseLineTable instead of in the LineTable class).

Sure, I'll look into moving the fixups there.

So, at the end of the day, I guess what I am saying is that we should send a small RFC to the llvm debug info folks to clarify what should the behavior of the llvm classes be for this kind of dwarf. If the conclusion is that this should be implemented inside these classes (and not by their users), then we should implement it there, and then have the lldb code mirror that. This is the approach we've been trying to do for a while now when modifying or implementing new dwarf features in lldb, but this is probably the first case of wanting to implement a feature in lldb where there is no existing llvm counterpart. (In contrast the DWZ feature is also blocked in part due to concerns about increasing llvm divergence, but there I'm pretty sure the answer is that DWZ should not be implemented in llvm, and the lldb parts should be written in a way that does not require low-level dwarf changes.)

If you want, I can send the initial email to start the discussion, but I can't commit to writing any of the patches. :(

If you can do that, that would be very much appreciated!

This is something that probably should be discussed with the llvm dwarf owners too (though most of them are on this review already).

However, if we assume that we can agree that this logic should be implemented at a fairly low level in the dwarf parsing code (so that e.g., when we switch to the llvm parser, it will do this thing for us),

I guess the main question here, following from my comment above, is who controls the fixup, and how, on the llvm side, where there's (to my knowledge) no GetOpcodeLoadAddress that can abstract away the fixup? IIRC the dwarf code itself is happily unaware of what architecture it is describing.

Yes, that is the trickiest part. Dwarf is architecture-agnostic for the most part (which makes this behavior of the windows linker very unfortunate), but the llvm DWARFContext already knows the architecture of the file it is describing (DWARFContext::getArch). The fixup could key off of that, but this will need careful consideration.
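
A sketch of what keying off the context's architecture might look like on the llvm side (this is not existing llvm behavior, just an illustration; DWARFContext::getArch is the getter mentioned above, assumed here to return an llvm::Triple::ArchType):

// Hypothetical fixup in llvm's DWARF code, gated on the architecture the
// DWARFContext describes. Only ARM/Thumb would get the bit-0 strip.
static uint64_t fixupCodeAddress(llvm::DWARFContext &Ctx, uint64_t Addr) {
  llvm::Triple::ArchType Arch = Ctx.getArch();
  if (Arch == llvm::Triple::arm || Arch == llvm::Triple::thumb)
    return Addr & ~uint64_t(1);
  return Addr;
}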

then this patch still seems quite "schizophrenic" to me, because the line table and location list fixups happen pretty high up. The location list case is particularly interesting here because we use a high level api (llvm::DWARFLocationExpression) to retrieve the location list, which may be reasonably expected to return "fixed" addresses, and so this fix should happen in llvm.

FWIW, as @clayborg said, DWARFExpression owns a Module in these cases, and thus should have access to an ArchSpec, so I could just slip the fixup into DWARFExpression::SetLocationListAddresses. The reason I was messing around there was primarily to avoid sprinkling fixups in many codepaths in DWARFDebugInfoEntry::GetDIENamesAndRanges that call this method.

Here, I wasn't referring to SetLocationListAddresses, but rather to the addresses you get from the location list section (DWARFExpression::GetLocationExpression). I would expect you need to fix those too (but you don't do that in this patch). As for SetLocationListAddresses, I would still consider that high level, as the addresses pass through several high-level functions before landing there. If we go with the low-level approach, I'd expect this to happen somewhere near DWARFUnit::GetBaseAddress and DWARFDIE::getLowPC (an imaginary function inspired by llvm::DWARFDie::getLowAndHighPC)
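As an illustration of that "low level" placement, here is a self-contained sketch with stand-in types. None of these names are real lldb API; in lldb the is_arm decision would come from the module's ArchSpec:

```cpp
// Illustrative stand-ins only: one masking helper that the lowest-level DWARF
// accessors (CU base address, DW_AT_low_pc/high_pc, location list bounds)
// would all funnel their raw addresses through.
#include <cstdint>
#include <optional>

struct DwarfAddressFixer {
  bool is_arm = false; // in lldb this would be derived from the ArchSpec

  uint64_t Fix(uint64_t addr) const {
    // Mask out the Thumb mode bit so the value matches the opcode address.
    return is_arm ? (addr & ~uint64_t(1)) : addr;
  }
};

// An imaginary GetLowPC-style accessor (in the spirit of
// llvm::DWARFDie::getLowAndHighPC) that applies the fixup before any higher
// layer sees the value.
std::optional<uint64_t> GetLowPC(std::optional<uint64_t> raw_low_pc,
                                 const DwarfAddressFixer &fixer) {
  if (!raw_low_pc)
    return std::nullopt;
  return fixer.Fix(*raw_low_pc);
}
```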

(The line tables also use llvm code for parsing, but we access them using a very low-level api, so it's likely this check would have to be done on the lldb side even if llvm provided such functionality -- but this could happen in SymbolFileDWARF::ParseLineTable instead of in the LineTable class).

Sure, I'll look into moving the fixups there.

I wouldn't bother with that too much until we figure out the overall strategy here.

If you want, I can send the initial email to start the discussion, but I can't commit to writing any of the patches. :(

If you can do that, that would be very much appreciated!

OK, I'll try to get the ball rolling there.

clayborg added inline comments.Jan 28 2020, 10:42 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
399

Either way, I am not picky.

Speaking to having the LLVM layer strip things out, this should be an option if it does get added there. If we have a stripped binary and we have no symbols or any other CPU map, and DWARF is the only way to tell if a function was ARM or Thumb by looking at bit zero, then we need to somehow be able to get this raw information.

Speaking to having the LLVM layer strip things out, this should be an option if it does get added there. If we have a stripped binary and we have no symbols or any other CPU map, and DWARF is the only way to tell if a function was ARM or Thumb by looking at bit zero, then we need to somehow be able to get this raw information.

I would expect there will still be some way to get the raw data, because the dumping tools would want to get it.

However, if I understand things correctly, this would not be a very useful source of information for us -- on windows arm, all debug info addresses will have the thumb bit set (because everything is thumb), and everywhere else the debug info will have bit 0 cleared, regardless of whether the function is thumb or not...

I did not send out any emails for this, but I did talk about this with @dblaikie when I was in the US last week. My take from that is that we should start by "fixing" llvm-symbolizer so that it works reasonably in these cases. I'll use this godbolt link to illustrate what I mean.

If we take a look at the deref function in that link, and try to pass the address of the first instruction ldr r0, [r0] to llvm-symbolizer, then we will get something reasonable in the non-windows case, but we will get nothing in the windows case, because the debug info would claim that the function starts halfway into that opcode. Fixing the tool so that it returns the same thing everywhere sounds reasonable, and since it does both line table and debug_info lookups, this will already answer the main questions we have about where this logic should be implemented. Then we can just copy that in lldb.

Now someone might argue that looking up the address of the ldr opcode is wrong, because that is not the actual pc value, and that we should look up the address of ldr+1. In that case we can point them to the next function (call_noreturn), which ends with a call to a noreturn function. Now if we get a crash in does_not_return(), the address we will get from the backtrace will be one byte after the last opcode of call_noreturn (bl does_not_return()). This means that in the non-windows case, it will not resolve correctly to call_noreturn even if we apply the usual trick of subtracting 1 from the address to bring it into the range of the calling function. Either way, there's some fixup that needs to happen somewhere...
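An approximate reconstruction of the two functions being described (the exact source in the godbolt example may differ; names are taken from the description above), built for a Thumb-only target such as Windows on ARM with optimizations enabled:

```cpp
// With optimizations, deref starts with "ldr r0, [r0]", and the final
// instruction of call_noreturn is the "bl" generated for does_not_return().
[[noreturn]] void does_not_return();

int deref(int *p) {
  return *p; // first instruction of the function: ldr r0, [r0]
}

void call_noreturn() {
  does_not_return(); // this call's bl is the last opcode of the function
}
```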

I did not send out any emails for this, but I did talk about this with @dblaikie when I was in the US last week. My take from that is that we should start by "fixing" llvm-symbolizer so that it works reasonably in these cases. I'll use this godbolt link to illustrate what I mean.

If we take a look at the deref function in that link, and try to pass the address of the first instruction ldr r0, [r0] to llvm-symbolizer, then we will get something reasonable in the non-windows case, but we will get nothing in the windows case, because the debug info would claim that the function starts halfway into that opcode. Fixing the tool so that it returns the same thing everywhere sounds reasonable, and since it does both line table and debug_info lookups, this will already answer the main questions we have about where this logic should be implemented. Then we can just copy that in lldb.

Thanks for keeping track of this! I'll look into fixing this at some suitable level for llvm-symbolizer for further discussion.

Now someone might argue that looking up the address of the ldr opcode is wrong, because that is not the actual pc value, and that we should look up the address of ldr+1. In that case we can point them to the next function (call_noreturn), which ends with a call to a noreturn function. Now if we get a crash in does_not_return(), the address we will get from the backtrace will be one byte after the last opcode of call_noreturn (bl does_not_return()). This means that in the non-windows case, it will not resolve correctly to call_noreturn even if we apply the usual trick of subtracting 1 from the address to bring it into the range of the calling function.

Hmm... I presume this issue is present already (and doesn't really change by normalizing the line tables and ranges on windows to match linux). I'm not familiar with "the usual trick of subtracting 1 from the address to bring it into the range of the calling function", but wouldn't something like GetOpcodeLoadAddress(pc) - 1 always bring you into the range of the calling function, given a return address?

Now someone might argue that looking up the address of the ldr opcode is wrong, because that is not the actual pc value, and that we should look up the address of ldr+1. In that case we can point them to the next function (call_noreturn), which ends with a call to a noreturn function. Now if we get a crash in does_not_return(), the address we will get from the backtrace will be one byte after the last opcode of call_noreturn (bl does_not_return()). This means that in the non-windows case, it will not resolve correctly to call_noreturn even if we apply the usual trick of subtracting 1 from the address to bring it into the range of the calling function.

Hmm... I presume this issue is present already (and doesn't really change by normalizing the line tables and ranges on windows to match linux). I'm not familiar with "the usual trick of subtracting 1 from the address to bring it into the range of the calling function", but wouldn't something like GetOpcodeLoadAddress(pc) - 1 always bring you into the range of the calling function, given a return address?

Lldb uses this when unwinding specifically to deal with the "call as a last instruction" problem, and I'm pretty sure it's not the only tool doing that. This is even described in the DWARF spec (non-normative text):

In most cases the return address is in the same context as the calling address, but that need not be the case, especially if the producer knows in some way the call never will return. The context of the ’return address’ might be on a different line, in a different lexical block, or past the end of the calling subroutine. If a consumer were to assume that it was in the same context as the calling address, the virtual unwind might fail.

For architectures with constant-length instructions where the return address immediately follows the call instruction, a simple solution is to subtract the length of an instruction from the return address to obtain the calling instruction. For architectures with variable-length instructions (for example, x86), this is not possible. However, subtracting 1 from the return address, although not guaranteed to provide the exact calling address, generally will produce an address within the same context as the calling address, and that usually is sufficient.

Using GetOpcodeLoadAddress(pc) - 1 would work, but this implies that the caller should be passing in "load" addresses, which is in conflict with the premise I have made at the beginning of the paragraph that one should be passing in "code" addresses here. (My attempt at proof by contradiction.)
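As a toy illustration of the difference (the addresses are made up, and none of this is lldb API): given a Thumb return address one byte past the final bl, stripping the mode bit first and then subtracting 1 lands back inside the calling function, while subtracting 1 alone does not:

```cpp
// Toy example: call_noreturn occupies [0x8000, 0x8010) and the return
// address after its final "bl" is 0x8010 with the Thumb bit set, i.e. 0x8011.
#include <cassert>
#include <cstdint>

// GetOpcodeLoadAddress-style masking followed by the usual -1 adjustment.
static uint64_t strip_then_adjust(uint64_t return_pc) {
  return (return_pc & ~uint64_t(1)) - 1;
}

int main() {
  const uint64_t func_low = 0x8000, func_high = 0x8010; // [low, high)
  const uint64_t return_pc = 0x8011;

  // Subtracting 1 alone yields 0x8010, which is just past the function...
  assert(!(return_pc - 1 >= func_low && return_pc - 1 < func_high));
  // ...while stripping the mode bit first yields 0x800f, which is inside it.
  const uint64_t adjusted = strip_then_adjust(return_pc);
  assert(adjusted >= func_low && adjusted < func_high);
  return 0;
}
```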

Now someone might argue that looking up the address of the ldr opcode is wrong, because that is not the actual pc value, and that we should look up the address of ldr+1. In that case we can point them to the next function (call_noreturn), which ends with a call to a noreturn function. Now if we get a crash in does_not_return(), the address we will get from the backtrace will be one byte after the last opcode of call_noreturn (bl does_not_return()). This means that in the non-windows case, it will not resolve correctly to call_noreturn even if we apply the usual trick of subtracting 1 from the address to bring it into the range of the calling function.

Hmm... I presume this issue is present already (and doesn't really change by normalizing the line tables and ranges on windows to match linux). I'm not familiar with "the usual trick of subtracting 1 from the address to bring it into the range of the calling function", but wouldn't something like GetOpcodeLoadAddress(pc) - 1 always bring you into the range of the calling function, given a return address?

Lldb uses this when unwinding specifically to deal with the "call as a last instruction" problem, and I'm pretty sure it's not the only tool doing that. This is even described in the DWARF spec (non-normative text):

In most cases the return address is in the same context as the calling address, but that need not be the case, especially if the producer knows in some way the call never will return. The context of the ’return address’ might be on a different line, in a different lexical block, or past the end of the calling subroutine. If a consumer were to assume that it was in the same context as the calling address, the virtual unwind might fail.

For architectures with constant-length instructions where the return address immediately follows the call instruction, a simple solution is to subtract the length of an instruction from the return address to obtain the calling instruction. For architectures with variable-length instructions (for example, x86), this is not possible. However, subtracting 1 from the return address, although not guaranteed to provide the exact calling address, generally will produce an address within the same context as the calling address, and that usually is sufficient.

Using GetOpcodeLoadAddress(pc) - 1 would work, but this implies that the caller should be passing in "load" addresses, which is in conflict with the premise I have made at the beginning of the paragraph that one should be passing in "code" addresses here. (My attempt at proof by contradiction.)

Ah, I see - so this is yet another argument for making things work with "load" addresses - good then.