This is an archive of the discontinued LLVM Phabricator instance.

Fix lookup of symbols with the same address range but different binding
ClosedPublic

Authored by jankratochvil on Jun 19 2019, 3:54 AM.

Details

Summary

This fixes a failing testcase on Fedora 30 x86_64 (regression Fedora 29->30):

PASS:
./bin/lldb ./lldb-test-build.noindex/functionalities/unwind/noreturn/TestNoreturnUnwind.test_dwarf/a.out -o 'settings set symbols.enable-external-lookup false' -o r -o bt -o quit
  * frame #0: 0x00007ffff7aa6e75 libc.so.6`__GI_raise + 325
    frame #1: 0x00007ffff7a91895 libc.so.6`__GI_abort + 295
    frame #2: 0x0000000000401140 a.out`func_c at main.c:12:2
    frame #3: 0x000000000040113a a.out`func_b at main.c:18:2
    frame #4: 0x0000000000401134 a.out`func_a at main.c:26:2
    frame #5: 0x000000000040112e a.out`main(argc=<unavailable>, argv=<unavailable>) at main.c:32:2
    frame #6: 0x00007ffff7a92f33 libc.so.6`__libc_start_main + 243
    frame #7: 0x000000000040106e a.out`_start + 46

vs.

FAIL - unrecognized abort() function:
./bin/lldb ./lldb-test-build.noindex/functionalities/unwind/noreturn/TestNoreturnUnwind.test_dwarf/a.out -o 'settings set symbols.enable-external-lookup false' -o r -o bt -o quit
  * frame #0: 0x00007ffff7aa6e75 libc.so.6`.annobin_raise.c + 325
    frame #1: 0x00007ffff7a91895 libc.so.6`.annobin_loadmsgcat.c_end.unlikely + 295
    frame #2: 0x0000000000401140 a.out`func_c at main.c:12:2
    frame #3: 0x000000000040113a a.out`func_b at main.c:18:2
    frame #4: 0x0000000000401134 a.out`func_a at main.c:26:2
    frame #5: 0x000000000040112e a.out`main(argc=<unavailable>, argv=<unavailable>) at main.c:32:2
    frame #6: 0x00007ffff7a92f33 libc.so.6`.annobin_libc_start.c + 243
    frame #7: 0x000000000040106e a.out`.annobin_init.c.hot + 46

The extra ELF symbols are there due to Annobin (I did not investigate why this problem happened specifically since F-30 and not since F-28).
It is due to:

Symbol table '.dynsym' contains 2361 entries:
Valu e          Size Type   Bind   Vis     Name
0000000000022769   5 FUNC   LOCAL  DEFAULT _nl_load_domain.cold
000000000002276e   0 NOTYPE LOCAL  HIDDEN  .annobin_abort.c.unlikely
...
000000000002276e   0 NOTYPE LOCAL  HIDDEN  .annobin_loadmsgcat.c_end.unlikely
...
000000000002276e   0 NOTYPE LOCAL  HIDDEN  .annobin_textdomain.c_end.unlikely
000000000002276e 548 FUNC   GLOBAL DEFAULT abort
000000000002276e 548 FUNC   GLOBAL DEFAULT abort@@GLIBC_2.2.5
000000000002276e 548 FUNC   LOCAL  DEFAULT __GI_abort
0000000000022992   0 NOTYPE LOCAL  HIDDEN  .annobin_abort.c_end.unlikely

GDB has some more complicated preferences between overlapping and/or sharing address symbols, I have made here so far the most simple fix for this case.

Diff Detail

Repository
rLLDB LLDB

Event Timeline

jankratochvil created this revision.Jun 19 2019, 3:54 AM

The fix seems reasonable to me. There's just one thing I don't get: Why was it necessary to change the iteration order here?
IIUC, the real business happens on line 904, where you add the if (entry[i].base == entry[i+1].base) break; check. I could be missing something, but it seems to me like that thing would work no matter what the iteration order is...

lldb/lit/SymbolFile/sizeless-symbol.test
1

Are you sure this will work whatever the host triple happens to be ? (you can try it out by manually forcing a couple of random targets with the -target argument). I have a feeling the .type, .size, etc. directives are a feature of elf-targetting assemblers. If that doesn't work you can always force a specific triple with the -target command.

4

As you're fixing this bug by changing the way symbol size is computed, it would be nice to also put (and check it's output) a "image dump symtab" command here. That would also make it clear what are the assumptions the following "lookup" commands are operating under.

So I am fine with symbols having zero size being in the symbol table. I would be fine not changing anything in the sorting and leaving some symbols with zero size, we just need to fix:

Symbol *Symtab::FindSymbolAtFileAddress(addr_t file_addr);

To ignore zero sized symbols when it finds them _if_ there is another symbol that has a size for that address. Wouldn't that fix the issue here?

my fix assumes we would have zero sized symbols first in the m_file_addr_to_index map, and the symbols with the same size would be ordered after the ones with zero size. Also, I am assuming when we do:

const FileRangeToIndexMap::Entry *entry = m_file_addr_to_index.FindEntryStartsAt(file_addr);

that it finds the first symbol that starts with "file_addr". If this isn't the case we might need to be able to back up a few symbols to ensure we get consistent results

So I am fine with symbols having zero size being in the symbol table. I would be fine not changing anything in the sorting and leaving some symbols with zero size, we just need to fix:

Symbol *Symtab::FindSymbolAtFileAddress(addr_t file_addr);

To ignore zero sized symbols when it finds them _if_ there is another symbol that has a size for that address. Wouldn't that fix the issue here?

You are assuming here that the symbols have size zero at the time we are performing the lookup. If I understand correctly what is going on, the problem here is that the code munging the symbol table (InitAddressIndexes), will set these symbols to have non-zero size. This is what this patch is trying to avoid.

The reason we are fiddling with the size of the symbols is because there are valid instances of symbols not having a size (usually coming from hand-written assembly, where one just doesn't bother to add the .size directive). However, it certainly seems like we shouldn't be doing that if there is another symbol at the same address, and this symbol has the size set correctly...

So I am fine with symbols having zero size being in the symbol table. I would be fine not changing anything in the sorting and leaving some symbols with zero size, we just need to fix:

Symbol *Symtab::FindSymbolAtFileAddress(addr_t file_addr);

To ignore zero sized symbols when it finds them _if_ there is another symbol that has a size for that address. Wouldn't that fix the issue here?

You are assuming here that the symbols have size zero at the time we are performing the lookup. If I understand correctly what is going on, the problem here is that the code munging the symbol table (InitAddressIndexes), will set these symbols to have non-zero size. This is what this patch is trying to avoid.

I am saying to leave symbols with zero size as is _if_ there is another symbol that does have a valid size with the same start address.

The reason we are fiddling with the size of the symbols is because there are valid instances of symbols not having a size (usually coming from hand-written assembly, where one just doesn't bother to add the .size directive). However, it certainly seems like we shouldn't be doing that if there is another symbol at the same address, and this symbol has the size set correctly...

Exactly. The best solution in my mind is:

  • leave all symbols with sizes as is
  • only set a symbol size if there is no other symbol at that address that didn't originally have a size
  • maybe leave zero sizes symbols out of the lookup map if they have no sizes after doing the two steps above
jankratochvil marked 4 inline comments as done.Jun 27 2019, 12:25 PM

Why was it necessary to change the iteration order here?

No, it was an accidental leftover.

Thanks for the review.

jankratochvil added inline comments.Jun 27 2019, 12:25 PM
lldb/lit/SymbolFile/sizeless-symbol.test
1

Yes, you are right.

4

I only hope the image dump symtab output ordering is stable.

jankratochvil marked 2 inline comments as done.

I am saying to leave symbols with zero size as is _if_ there is another symbol that does have a valid size with the same start address.

This is what this patch does.

Exactly. The best solution in my mind is:

  • leave all symbols with sizes as is

This is what this patch does.

  • only set a symbol size if there is no other symbol at that address that didn't originally have a size

This is what this patch does.

  • maybe leave zero sizes symbols out of the lookup map if they have no sizes after doing the two steps above

I did not change this, IIUC it is already solved in existing code because InitAddressIndexes will expand any such zero-sized symbols first and it already uses the sized symbol in preference to the sizeless one during lookup.

labath accepted this revision.Jul 1 2019, 2:11 AM

Looks good to me, but I'd suggest waiting a while to give @clayborg a chance to respond...

lldb/lit/SymbolFile/sizeless-symbol.test
4

I'm pretty sure these are printed in the same order as present in .symtab. I'd be more worried about the order in which the symbols get emitted into the symtab in the first place. Theoretically this might change due to some patches to the assembler, but it should not change spuriously...

11

Nit: Please line up this table by inserting a couple of spaces between ":" and "Index".

lldb/source/Symbol/Symtab.cpp
906

Please clang-format this.

This revision is now accepted and ready to land.Jul 1 2019, 2:11 AM
jankratochvil marked 5 inline comments as done.Jul 1 2019, 3:05 AM
jankratochvil added inline comments.
lldb/lit/SymbolFile/sizeless-symbol.test
11

Great, I did not expect it will still match with the added whitespaces.

lldb/source/Symbol/Symtab.cpp
906

Sorry I forgot to run the clang-format.

jankratochvil marked 2 inline comments as done.
clayborg accepted this revision.Jul 1 2019, 7:08 AM
clayborg added inline comments.
lldb/lit/SymbolFile/sizeless-symbol.test
6

Symbols are ordered in the same order as they appear in the symbol table. On Darwin we coalesce symbols, so not all symbol are always in the symbol table, but we don't do this for ELF.

jankratochvil marked 2 inline comments as done.Jul 1 2019, 7:30 AM
jankratochvil added inline comments.
lldb/lit/SymbolFile/sizeless-symbol.test
6

OK, great; thanks for the review.

This revision was automatically updated to reflect the committed changes.
jankratochvil marked an inline comment as done.
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2019, 7:32 AM
omjavaid reopened this revision.Nov 8 2019, 6:11 AM
omjavaid added a subscriber: omjavaid.

Hi Jan,

This change has introduced a regression in 32bit arm linux testsuite with around 60 new test failures.

Here are list of tests which are failing with this change. Mostly tests are failing when an expression is being evaluated. I think we should immediately revert this change from 9.x branch as well as master.

It.py

lldb-api :: commands/expression/argument_passing_restrictions/TestArgumentPassingRestrictions.py
lldb-api :: commands/expression/call-overridden-method/TestCallOverriddenMethod.py
lldb-api :: commands/expression/call-restarts/TestCallThatRestarts.py
lldb-api :: commands/expression/char/TestExprsChar.py
lldb-api :: commands/expression/class_template_specialization_empty_pack/TestClassTemplateSpecializationParametersHandling.py
lldb-api :: commands/expression/context-object/TestContextObject.py
lldb-api :: commands/expression/dont_allow_jit/TestAllowJIT.py
lldb-api :: commands/expression/expr-in-syscall/TestExpressionInSyscall.py
lldb-api :: commands/expression/inline-namespace/TestInlineNamespace.py
lldb-api :: commands/expression/no-deadlock/TestExprDoesntBlock.py
lldb-api :: commands/expression/persistent_types/TestNestedPersistentTypes.py
lldb-api :: commands/expression/persistent_types/TestPersistentTypes.py
lldb-api :: commands/expression/pr35310/TestExprsBug35310.py
lldb-api :: commands/expression/radar_9531204/TestPrintfAfterUp.py
lldb-api :: commands/expression/radar_9673664/TestExprHelpExamples.py
lldb-api :: commands/expression/rdar44436068/Test128BitsInteger.py
lldb-api :: commands/expression/save_jit_objects/TestSaveJITObjects.py
lldb-api :: commands/expression/static-initializers/TestStaticInitializers.py
lldb-api :: commands/expression/test/TestExprs.py
lldb-api :: commands/expression/timeout/TestCallWithTimeout.py
lldb-api :: commands/expression/unwind_expression/TestUnwindExpression.py
lldb-api :: commands/expression/xvalue/TestXValuePrinting.py
lldb-api :: commands/register/register/register_command/TestRegisters.py
lldb-api :: commands/source/info/TestSourceInfo.py
lldb-api :: functionalities/breakpoint/require_hw_breakpoints/TestRequireHWBreakpoints.py
lldb-api :: functionalities/data-formatter/data-formatter-cpp/TestDataFormatterCpp.py
lldb-api :: functionalities/data-formatter/data-formatter-smart-array/TestDataFormatterSmartArray.py
lldb-api :: functionalities/data-formatter/data-formatter-synthval/TestDataFormatterSynthVal.py
lldb-api :: functionalities/load_unload/TestLoadUnload.py
lldb-api :: functionalities/load_using_paths/TestLoadUsingPaths.py
lldb-api :: functionalities/memory/find/TestMemoryFind.py
lldb-api :: functionalities/process_group/TestChangeProcessGroup.py
lldb-api :: functionalities/show_location/TestShowLocationDwarf5.py
lldb-api :: functionalities/signal/handle-abrt/TestHandleAbort.py
lldb-api :: functionalities/thread/num_threads/TestNumThreads.py
lldb-api :: lang/c/anonymous/TestAnonymous.py
lldb-api :: lang/c/bitfields/TestBitfields.py
lldb-api :: lang/c/enum_types/TestEnumTypes.py
lldb-api :: lang/c/function_types/TestFunctionTypes.py
lldb-api :: lang/c/shared_lib/TestSharedLib.py
lldb-api :: lang/c/strings/TestCStrings.py
lldb-api :: lang/c/struct_types/TestStructTypes.py
lldb-api :: lang/cpp/auto/TestCPPAuto.py
lldb-api :: lang/cpp/call-function/TestCallCPPFunction.py
lldb-api :: lang/cpp/chained-calls/TestCppChainedCalls.py
lldb-api :: lang/cpp/class-template-parameter-pack/TestClassTemplateParameterPack.py
lldb-api :: lang/cpp/extern_c/TestExternCSymbols.py
lldb-api :: lang/cpp/global_operators/TestCppGlobalOperators.py
lldb-api :: lang/cpp/lambdas/TestLambdas.py
lldb-api :: lang/cpp/llvm-style/TestLLVMStyle.py
lldb-api :: lang/cpp/namespace/TestNamespace.py
lldb-api :: lang/cpp/namespace/TestNamespaceLookup.py
lldb-api :: lang/cpp/namespace_conflicts/TestNamespaceConflicts.py
lldb-api :: lang/cpp/namespace_definitions/TestNamespaceDefinitions.py
lldb-api :: lang/cpp/operators/TestCppOperators.py
lldb-api :: lang/cpp/overloaded-functions/TestOverloadedFunctions.py
lldb-api :: lang/cpp/rvalue-references/TestRvalueReferences.py
lldb-api :: lang/cpp/scope/TestCppScope.py
lldb-api :: lang/cpp/static_methods/TestCPPStaticMethods.py
lldb-api :: lang/cpp/symbols/TestSymbols.py
lldb-api :: lang/cpp/template-function/TestTemplateFunctions.py
lldb-api :: lang/cpp/template/TestTemplateArgs.py
lldb-api :: lang/cpp/this/TestCPPThis.py
lldb-api :: lang/cpp/trivial_abi/TestTrivialABI.py
lldb-api :: lang/cpp/unicode-literals/TestUnicodeLiterals.py
lldb-api :: lang/cpp/virtual/TestVirtual.py
lldb-api :: python_api/sbvalue_persist/TestSBValuePersist.py
lldb-api :: python_api/thread/TestThreadAPI.py
lldb-api :: python_api/value/TestValueAPI.py
This revision is now accepted and ready to land.Nov 8 2019, 6:11 AM
omjavaid requested changes to this revision.Nov 8 2019, 6:11 AM
This revision now requires changes to proceed.Nov 8 2019, 6:11 AM

I am going to investigate it this week. Feel free to revert it but some such patch is definitely needed for all recent Fedoras, RHEL-8 and CentOS-8 (due to annobin on all those).

This revision was not accepted when it landed; it landed in state Needs Revision.Nov 12 2019, 6:05 AM
This revision was automatically updated to reflect the committed changes.
omjavaid reopened this revision.Nov 12 2019, 6:06 AM

@omjavaid I do not have the regression reproducible, could you provide more info?
Having on both hosts trunk 54a9b4c02ff57e9847e0c501578e51db6f73d3be having applied your D69904, having reverted your revert (=reapplied) of this my D63540.
Running on Fedora 30 armv7l:

ssh -L 1234:localhost:1234 -L 1235:localhost:1235 arm03-packager00.cloud.fedoraproject.org
~/redhat/llvm-monorepo-clangassert/bin/lldb-server platform --listen "*:1234" --server --min-gdbserver-port 1235 --max-gdbserver-port 1236

On Fedora 30 x86_64:
./bin/lldb-dotest --arch=arm --platform-name=remote-linux --platform-url=connect://localhost:1234 -f TestSourceInfo -v -t

UNSUPPORTED: LLDB (/home/jkratoch/redhat/llvm-monorepo-clangassert/bin/clang-10-arm) :: test_dsym (lldbsuite.test.lldbtest.TestSourceInfo) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/jkratoch/redhat/llvm-monorepo-clangassert/bin/clang-10-arm) :: test_dwarf (lldbsuite.test.lldbtest.TestSourceInfo)
PASS: LLDB (/home/jkratoch/redhat/llvm-monorepo-clangassert/bin/clang-10-arm) :: test_dwo (lldbsuite.test.lldbtest.TestSourceInfo)
UNSUPPORTED: LLDB (/home/jkratoch/redhat/llvm-monorepo-clangassert/bin/clang-10-arm) :: test_gmodules (lldbsuite.test.lldbtest.TestSourceInfo) (test case does not fall in any category of interest for this run) 
----------------------------------------------------------------------
Ran 4 tests in 0.802s
RESULT: PASSED (2 passes, 0 failures, 0 errors, 2 skipped, 0 expected failures, 0 unexpected successes)

It is still the same (no arm32 regression) with 16bdcc809c72c639a2888b6b859dca88453e3c28 and this patch reapplied.

It is still the same (no arm32 regression) with 16bdcc809c72c639a2888b6b859dca88453e3c28 and this patch reapplied.

Hi Jan,

This issue manifests itself only when lldb test suite is being run on 32-bit arm host. I think we dont run complete testsuite when we test via lldb-dotest with lldb-server cross-compiled for Arm 32-bit.

I am trying to find a reasonable way to reproduce this issue. I ll get back to you with more details.

Thanks!

Getting ld.lld: error: failed to open ../../../../bin/clang-10: Cannot allocate memory when trying to build lldb natively on arm32 (the same error happens both for clang and for lldb). Apparently memory is not a problem but the linker runs out of its 32-bit address space.
I am aware of the cross-compiling lldb possibility but I haven't tried that yet. Do you have some simple instructions how to cross-compile lldb for arm32 on x86_64 host?

@omjavaid In the end I built a native armv7l-unknown-linux-gnueabihf lldb using:

cmake ../llvm-monorepo/llvm/ -DCMAKE_BUILD_TYPE=Release  -DLLVM_ENABLE_PROJECTS="lldb;clang;lld"  -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_BUILD_LLVM_DYLIB

Built it from trunk: 16bdcc809c72c639a2888b6b859dca88453e3c28
But it does not stop even in a trivial breakpoint:

$ echo 'int main(){}'|gcc -x c - -g;./bin/lldb -o 'b main' -o r ./a.out
(lldb) target create "./a.out"
Current executable set to '/home/fedora/jankratochvil/jkratoch/redhat/llvm-monorepo-clangassertdyn/a.out' (arm).
(lldb) b main
Breakpoint 1: where = a.out`main + 12 at <stdin>:1:1, address = 0x000103dc
(lldb) r
Process 12146 exited with status = 0 (0x00000000) 

Process 12146 launched: '/home/fedora/jankratochvil/jkratoch/redhat/llvm-monorepo-clangassertdyn/a.out' (arm)
(lldb) q

Are there some off-trunk patches needed?

kernel-5.2.9-200.fc30.armv7hl

@omjavaid I guess it depends on some system library - what OS vendor/release do you have the regression reproducible on?
Still not reproducible for me. Trying bfbbf0aba81a84da8b53d4d159d080e77ad8ee70 with applied D70155 and applied/unapplied this D63540 on Fedora 30 armv7l and it has no difference in testsuite (except for the added/removed testcase of this D63540).
In both cases I get these:

Unexpected Passing Tests (5):
    lldb-api :: functionalities/archives/TestBSDArchives.py
    lldb-api :: functionalities/inferior-assert/TestInferiorAssert.py
    lldb-api :: functionalities/thread/concurrent_events/TestConcurrentTwoWatchpointsOneSignal.py
    lldb-api :: linux/builtin_trap/TestBuiltinTrap.py
    lldb-api :: linux/thread/create_during_instruction_step/TestCreateDuringInstructionStep.py
Failing Tests (103):
    lldb-api :: commands/expression/argument_passing_restrictions/TestArgumentPassingRestrictions.py
    lldb-api :: commands/expression/call-function/TestCallStdStringFunction.py
    lldb-api :: commands/expression/call-function/TestCallStopAndContinue.py
    lldb-api :: commands/expression/call-function/TestCallUserDefinedFunction.py
    lldb-api :: commands/expression/call-overridden-method/TestCallOverriddenMethod.py
    lldb-api :: commands/expression/call-restarts/TestCallThatRestarts.py
    lldb-api :: commands/expression/char/TestExprsChar.py
    lldb-api :: commands/expression/class_template_specialization_empty_pack/TestClassTemplateSpecializationParametersHandling.py
    lldb-api :: commands/expression/context-object/TestContextObject.py
    lldb-api :: commands/expression/dont_allow_jit/TestAllowJIT.py
    lldb-api :: commands/expression/entry-bp/TestExprEntryBP.py
    lldb-api :: commands/expression/expr-in-syscall/TestExpressionInSyscall.py
    lldb-api :: commands/expression/formatters/TestFormatters.py
    lldb-api :: commands/expression/inline-namespace/TestInlineNamespace.py
    lldb-api :: commands/expression/no-deadlock/TestExprDoesntBlock.py
    lldb-api :: commands/expression/persistent_types/TestNestedPersistentTypes.py
    lldb-api :: commands/expression/persistent_types/TestPersistentTypes.py
    lldb-api :: commands/expression/pr35310/TestExprsBug35310.py
    lldb-api :: commands/expression/radar_9531204/TestPrintfAfterUp.py
    lldb-api :: commands/expression/radar_9673664/TestExprHelpExamples.py
    lldb-api :: commands/expression/rdar44436068/Test128BitsInteger.py
    lldb-api :: commands/expression/static-initializers/TestStaticInitializers.py
    lldb-api :: commands/expression/test/TestExprs.py
    lldb-api :: commands/expression/timeout/TestCallWithTimeout.py
    lldb-api :: commands/expression/unwind_expression/TestUnwindExpression.py
    lldb-api :: commands/expression/xvalue/TestXValuePrinting.py
    lldb-api :: commands/register/register/register_command/TestRegisters.py
    lldb-api :: commands/watchpoints/hello_watchlocation/TestWatchLocation.py
    lldb-api :: commands/watchpoints/multi_watchpoint_slots/TestWatchpointMultipleSlots.py
    lldb-api :: commands/watchpoints/watchpoint_size/TestWatchpointSizes.py
    lldb-api :: functionalities/breakpoint/require_hw_breakpoints/TestRequireHWBreakpoints.py
    lldb-api :: functionalities/data-formatter/data-formatter-cpp/TestDataFormatterCpp.py
    lldb-api :: functionalities/data-formatter/data-formatter-synthval/TestDataFormatterSynthVal.py
    lldb-api :: functionalities/load_unload/TestLoadUnload.py
    lldb-api :: functionalities/load_using_paths/TestLoadUsingPaths.py
    lldb-api :: functionalities/memory/find/TestMemoryFind.py
    lldb-api :: functionalities/process_group/TestChangeProcessGroup.py
    lldb-api :: functionalities/return-value/TestReturnValue.py
    lldb-api :: functionalities/show_location/TestShowLocationDwarf5.py
    lldb-api :: functionalities/thread/num_threads/TestNumThreads.py
    lldb-api :: lang/c/bitfields/TestBitfields.py
    lldb-api :: lang/c/function_types/TestFunctionTypes.py
    lldb-api :: lang/c/shared_lib/TestSharedLib.py
    lldb-api :: lang/c/strings/TestCStrings.py
    lldb-api :: lang/c/struct_types/TestStructTypes.py
    lldb-api :: lang/cpp/auto/TestCPPAuto.py
    lldb-api :: lang/cpp/call-function/TestCallCPPFunction.py
    lldb-api :: lang/cpp/chained-calls/TestCppChainedCalls.py
    lldb-api :: lang/cpp/class-template-parameter-pack/TestClassTemplateParameterPack.py
    lldb-api :: lang/cpp/extern_c/TestExternCSymbols.py
    lldb-api :: lang/cpp/global_operators/TestCppGlobalOperators.py
    lldb-api :: lang/cpp/lambdas/TestLambdas.py
    lldb-api :: lang/cpp/llvm-style/TestLLVMStyle.py
    lldb-api :: lang/cpp/namespace/TestNamespace.py
    lldb-api :: lang/cpp/namespace/TestNamespaceLookup.py
    lldb-api :: lang/cpp/namespace_conflicts/TestNamespaceConflicts.py
    lldb-api :: lang/cpp/namespace_definitions/TestNamespaceDefinitions.py
    lldb-api :: lang/cpp/operators/TestCppOperators.py
    lldb-api :: lang/cpp/overloaded-functions/TestOverloadedFunctions.py
    lldb-api :: lang/cpp/rvalue-references/TestRvalueReferences.py
    lldb-api :: lang/cpp/scope/TestCppScope.py
    lldb-api :: lang/cpp/static_methods/TestCPPStaticMethods.py
    lldb-api :: lang/cpp/symbols/TestSymbols.py
    lldb-api :: lang/cpp/template-function/TestTemplateFunctions.py
    lldb-api :: lang/cpp/template/TestTemplateArgs.py
    lldb-api :: lang/cpp/this/TestCPPThis.py
    lldb-api :: lang/cpp/trivial_abi/TestTrivialABI.py
    lldb-api :: lang/cpp/unicode-literals/TestUnicodeLiterals.py
    lldb-api :: lang/cpp/virtual/TestVirtual.py
    lldb-api :: python_api/thread/TestThreadAPI.py
    lldb-api :: tools/lldb-server/TestLldbGdbServer.py
    lldb-shell :: ObjectFile/ELF/PT_LOAD-overlap-PT_INTERP.yaml
    lldb-shell :: ObjectFile/ELF/PT_LOAD-overlap-PT_TLS.yaml
    lldb-shell :: ObjectFile/ELF/PT_LOAD-overlap-section.yaml
    lldb-shell :: ObjectFile/ELF/PT_LOAD.yaml
    lldb-shell :: ObjectFile/ELF/PT_TLS-overlap-PT_LOAD.yaml
    lldb-shell :: ObjectFile/ELF/minidebuginfo-set-and-hit-breakpoint.test
    lldb-shell :: Reproducer/Functionalities/TestDataFormatter.test
    lldb-shell :: Reproducer/Functionalities/TestImageList.test
    lldb-shell :: Reproducer/Functionalities/TestStepping.test
    lldb-shell :: Reproducer/TestDump.test
    lldb-shell :: Reproducer/TestGDBRemoteRepro.test
    lldb-shell :: Reproducer/TestRelativePath.test
    lldb-shell :: Reproducer/TestReuseDirectory.test
    lldb-shell :: Reproducer/TestSynchronous.test
    lldb-shell :: Reproducer/TestWorkingDir.test
    lldb-shell :: SymbolFile/DWARF/anon_class_w_and_wo_export_symbols.ll
    lldb-shell :: SymbolFile/DWARF/clang-ast-from-dwarf-unamed-and-anon-structs.cpp
    lldb-shell :: SymbolFile/DWARF/debug-types-expressions.test
    lldb-shell :: SymbolFile/NativePDB/ast-functions.cpp
    lldb-shell :: SymbolFile/NativePDB/ast-methods.cpp
    lldb-shell :: SymbolFile/NativePDB/ast-types.cpp
    lldb-shell :: SymbolFile/NativePDB/bitfields.cpp
    lldb-shell :: SymbolFile/NativePDB/break-by-function.cpp
    lldb-shell :: SymbolFile/NativePDB/break-by-line.cpp
    lldb-shell :: SymbolFile/NativePDB/function-types-builtins.cpp
    lldb-shell :: SymbolFile/NativePDB/function-types-classes.cpp
    lldb-shell :: SymbolFile/NativePDB/global-classes.cpp
    lldb-shell :: SymbolFile/NativePDB/globals-bss.cpp
    lldb-shell :: SymbolFile/NativePDB/globals-fundamental.cpp
    lldb-shell :: SymbolFile/NativePDB/nested-types.cpp
    lldb-shell :: SymbolFile/NativePDB/source-list.cpp
    lldb-shell :: SymbolFile/NativePDB/tag-types.cpp

Hi Jan,

I have done further investigation on this with new ampere arm server. This problem manifests itself on all kind of Debian flavors and is not related to underlying libraries. I have tested LLDB on Arm 32bit host with Ubuntu Xenial, Bionic and Debian Buster all show same problem.

Moreover this problem seems to be emerging from clang expression parser generating symbols while evaluating certain types of expressions. These symbols are rejected by expression interpreter. LLDB symtab classes are utilized for expressions as well so it is a highly likely problem so I would suggest you to investigate further on these lines

For example:
./bin/lldb-dotest -p TestVirtual.py -v -t

test_virtual_madness_dwarf (TestVirtual.CppVirtualMadness) fails with following error

runCmd: expression a_as_A->a()
runCmd failed!
error: Can't run the expression locally: Interpreter doesn't handle one of the expression's opcodes

Hi Jan,

I have done further investigation on this with new ampere arm server. This problem manifests itself on all kind of Debian flavors and is not related to underlying libraries. I have tested LLDB on Arm 32bit host with Ubuntu Xenial, Bionic and Debian Buster all show same problem.

Moreover this problem seems to be emerging from clang expression parser generating symbols while evaluating certain types of expressions. These symbols are rejected by expression interpreter. LLDB symtab classes are utilized for expressions as well so it is a highly likely problem so I would suggest you to investigate further on these lines

For example:
./bin/lldb-dotest -p TestVirtual.py -v -t

test_virtual_madness_dwarf (TestVirtual.CppVirtualMadness) fails with following error

runCmd: expression a_as_A->a()
runCmd failed!
error: Can't run the expression locally: Interpreter doesn't handle one of the expression's opcodes

Hi Omair, is it possible that Process::CanJIT() is false? I'm not super experienced with the expression engine in lldb, but this sounds like the llvm IR interpretation can't handle the expression (expected, in this case) and lldb isn't able to JIT the IR into code and run it.

The one thing that IR interpreter can't do is call external functions. When I'm trying to debug something like this, I'll go as basic as possible -- something like "p (int)isdigit('1')" which requires that the expression be jitted, and debug from there. Maybe this is specific to this C++ test case, but is it possible that all function calls are failing?

FYI I am investigating it. I have found 2019-09-26-raspbian-buster.zip really can run in chroot on Fedora 31 armv7hl so I should be able to reproduce it. It is building LLDB now, that takes about 6 hours.

@omjavaid FYI the JIT error happens for me on 2019-09-26-raspbian-buster even without this my patch. I will try to debug that.

@omjavaid Do you still see any regression of this patch after D71498?

@jankratochvil I have verified both D71498 and D71514. They do not fix the issues introduced by D63540.

jankratochvil planned changes to this revision.Dec 18 2019, 9:35 AM
jankratochvil updated this revision to Diff 234672.EditedDec 19 2019, 12:39 AM

Changing the size of symbols turned out to be too invasive. Let's keep it intact.
Rather choose the best symbols from those which have the same address range.
Currently LLDB chooses randomly any symbol from those so make it more deterministic. That should have no negative effect.
The STB_WEAK support there is not really needed, I was just surprised weak symbols are not supported for ELF - they were supported only for Mach-O (N_WEAK_REF).

jankratochvil retitled this revision from Fix lookup of symbols at the same address with no size vs. size to Fix lookup of symbols with the same address range but different binding.Dec 19 2019, 12:41 AM

Defining some sort of a preference based on symbol type seems like a good idea, but I don't think this is a good way to implement it. If I read this patch correctly, then this for example means that the "less global" symbols will not be reported through the Symtab::ForEachSymbolContainingFileAddress API, which seems like a bad thing.

I'm also not happy that this is supposed to be a replacement for the size setting patch, as I believe (and I think we've agreed on that while reviewing the original patch) that *not* fiddling with the sizes of those symbols is a good thing. I don't think reverting a patch that has been sitting in the tree for several months because it "breaks expression evaluation" is a good idea, when that patch has nothing to do with expression evaluation, the failure only happens on a target which has a lot of failures anyway, and the failure only happens on some machine that the original author does not have access to. (I should've said this a while ago, but I was hoping that this problem would be resolved easily..)

Anyway, I think @jankratochvil has done far more than could be expected of any committer to diagnose that problem, and I think it should be up to @omjavaid to explain out how the expression evaluation failures relate to sizeless symbols. At that point, we can re-evaluate whether our decision to *not* fiddle with size of these symbols was correct.

lldb/test/Shell/SymbolFile/Inputs/symbol-binding.s
10–11 ↗(On Diff #234672)

I found these names pretty confusing. I'd suggest something like case1_local+case1_global, case2_local+case2_weak, etc.

jankratochvil planned changes to this revision.Dec 19 2019, 3:50 AM

then this for example means that the "less global" symbols will not be reported through the Symtab::ForEachSymbolContainingFileAddress API, which seems like a bad thing.

I agree, I was not aware of this function. Still sure the prioritization could be moved to Symtab::FindSymbolContainingFileAddress.

I'm also not happy that this is supposed to be a replacement for the size setting patch, as I believe (and I think we've agreed on that while reviewing the original patch) that *not* fiddling with the sizes of those symbols is a good thing. I don't think reverting a patch that has been sitting in the tree for several months because it "breaks expression evaluation" is a good idea,

OK, I will try to recheck how to fix the arm32 issue with the previous patch.

@jankratochvil @labath

Some new information below:

ProcessGDBRemote::DoAllocateMemory fails when it is called for expression evaluation preparation. This happens when both lldb and lldb-server are running on 32-bit arm host (various distros verified). If we debug using lldb-server running remotely and lldb is hosted locally on an x86-64 machine then problem does not appear.

Further more this does appear to be related to symbols size zeroing as failure happens due to failing InferiorCallMmap resulting in failure of ProcessGDBRemote::DoAllocateMemory. I am not aware of the exact problem but size zeroing somewhere causes corruption in thread plan prepared for expression eval with mmap.

I have been trying to debug this issue but actual hardware does not have enough resources to support a debug symbol build and all I have is logging and thats why have not reached a conclusion so far. But I am willing to further collaborate to get this issue fixed as it is important for lldb on arm 32 and could be a problem on other architectures which are not being tested actively right now.

Below is a log for your reference below if you can point out anything that I can go look out for specifically. PS: your might see some extra log messages which i have added for debugging here n there to see actual point of failure.

Correct - log ) https://pastebin.ubuntu.com/p/8Z4wB5W6fM/

Failing - log ) https://pastebin.ubuntu.com/p/g6ngkqS7p3/

Thank you very much Omair. This is very useful information, and it does explain a few thing (while raising other questions). The thing that would help now is seeing the difference in the symbols and their properties in the area around the mmap function. Comparing the output of "image dump symtab lib.so" before and after the change could provide interesting clues.

Another thing which I'd be interested in seeing is the gdb-remote packet log. From the existing logs, it's clear that the mmap call does not complete due to a signal. However, observing how lldb actually sets up the call could tell us a lot about what went wrong (my current bet is on confusing the arm/thumb instruction set).

Some new observations:

With this approach of not setting size of a non-last zero-sized symbol we somehow mange to zero out size of init (_init symbol) and .text (_start symbol).

To build and evaluate a.out echo -e '#include <unistd.h>\nint main(void){\nsync();return 0;}'|./bin/clang -g -x c -;./bin/lldb -o 'file ./a.out' -o 'b main' -o r -o 'p (void)sync()'

Please see objdump of a.out here https://pastebin.ubuntu.com/p/N5krm5TWFV/

Please see diff of symtab dump here: https://pastebin.ubuntu.com/p/zwmC9cVTGc/

symtab dump of a.out here with above patch applied: https://pastebin.ubuntu.com/p/Fs4yyG82s5/
symtab dump of a.out here without above patch applied: https://pastebin.ubuntu.com/p/TCKXVjnGsH/

The difference (with the previous size-non-setting patch) is:

-lldb             <  14> send packet: $Z0,102f0,2#
+lldb             <  14> send packet: $Z0,102f0,4#

That is because GetAddressClass fails to recognized 0x102f0 as a code address:

PASS:
(lldb) p (void)sync()
GetAddressClass:0x102f1
GetAddressClass:0x102f1=(null) ValueIsAddress=1 section_type=1
GetAddressClass:0x96040
GetAddressClass:0x96040=__mmap ValueIsAddress=1 section_type=1
GetAddressClass:0x102f1
GetAddressClass:0x102f1=(null) ValueIsAddress=1 section_type=1
GetAddressClass:0x102f0
GetAddressClass:0x102f0=(null) ValueIsAddress=1 section_type=1
...

FAIL:
(lldb) p (void)sync()
GetAddressClass:0x102f1
GetAddressClass:0x102f1=_start ValueIsAddress=1 section_type=1
GetAddressClass:0x96040
GetAddressClass:0x96040=__mmap ValueIsAddress=1 section_type=1
GetAddressClass:0x102f1
GetAddressClass:0x102f1=_start ValueIsAddress=1 section_type=1
GetAddressClass:0x102f0
...

That is due to:

symtab.fail:[   11]     12     Invalid         0x00000000000102f0                    0x0000000000000000 0x00000003 
symtab.fail:[   66]     99   X Code            0x00000000000102f0                    0x0000000000000030 0x00000012 _start
symtab.pass:[   11]     12     Invalid         0x00000000000102f0                    0x0000000000000030 0x00000003 
symtab.pass:[   66]     99   X Code            0x00000000000102f0                    0x0000000000000030 0x00000012 _start

The difference is in the 'Invalid' symbol which is:

Num:    Value  Size Type    Bind   Vis      Ndx Name
 12: 000102f0     0 SECTION LOCAL  DEFAULT   12

Apparently ARM32 relies on that section symbol to have proper size. I do not see how Symtab::InitAddressIndexes could handle STT_SECTION in a special way as that is ELF-type specific Symbol characteristics:

uint32_t m_flags; // A copy of the flags from the original symbol table, the
                  // ObjectFile plug-in can interpret these

I did not debug it more as I think LLDB should have the symbol binding type preference anyway and then this problem disappears and we can keep status quo of this symbols size issue.

jankratochvil marked an inline comment as done.Dec 29 2019, 12:04 PM

Defining some sort of a preference based on symbol type seems like a good idea, but I don't think this is a good way to implement it. If I read this patch correctly, then this for example means that the "less global" symbols will not be reported through the Symtab::ForEachSymbolContainingFileAddress API, which seems like a bad thing.

Is this update OK now? Thanks for the review.

I'm also not happy that this is supposed to be a replacement for the size setting patch, as I believe (and I think we've agreed on that while reviewing the original patch) that *not* fiddling with the sizes of those symbols is a good thing.

I agree although I think this new "bindings priority" patch is also a good thing on its own and it is easier to implement. IMHO we can then continue with the "sizeless" patch.

labath added a comment.Jan 6 2020, 8:15 AM

Thanks Jan and Omair for the investigation. It does seem like there is some more work to be done on arm/thumb in elf code in lldb, as it seems to me that things are working now more-or-less accidentally.

The good news is that this should not require any fundamental changes since the address class determination happens in ObjectFile::GetAddressClass, we'd probably just need to tweak it's ELF implementation somehow (not fully clear how yet).

But anyway, I do agree that this can be done seperately from the binding prioritization change. As for the change itself, I think we're getting closer, but I am still not convinced by the extra RangeMap::Sort overload. Passing different sort objects to different sort invocations could cause changes done by one Sort call to be undone by further Symtab additions (and/or trigger assertions in the IsSorted function). It seems to me that the sort order should be a property of the RangeMap object, set when the object is created, similar to how std::map and friends do this thing...

jankratochvil planned changes to this revision.Jan 6 2020, 8:37 AM

Passing different sort objects to different sort invocations could cause changes done by one Sort call to be undone by further Symtab additions (and/or trigger assertions in the IsSorted function). It seems to me that the sort order should be a property of the RangeMap object, set when the object is created, similar to how std::map and friends do this thing...

I did not like it much myself but I did not realize when it could really cause a problem, thanks for the example. I will try to find some way.

Found some way.

labath added a comment.Jan 8 2020, 2:53 AM

I think this is pretty close. The two things I'd change are:

  • make the RangeMap constructor take a const Compare & instead of a template pack. The std containers do the same, and I don't see a reason to diverge..
  • make Compare operate only on the "data" field of the map. The RangeMap methods assume the entries are sorted in a certain way, so changing that in the comparator does not seem useful. The std::sort call can take a lambda or something to first compare the ranges and then the data fields...

It might also be nice to split this off into a separate patch with a unit test and all...

jankratochvil planned changes to this revision.Jan 8 2020, 3:02 AM
  • make the RangeMap constructor take a const Compare & instead of a template pack. The std containers do the same, and I don't see a reason to diverge..

Done. I have mistaken it as optimization but that does not apply in this case.

  • make Compare operate only on the "data" field of the map. The RangeMap methods assume the entries are sorted in a certain way, so changing that in the comparator does not seem useful. The std::sort call can take a lambda or something to first compare the ranges and then the data fields...

It might also be nice to split this off into a separate patch with a unit test and all...

Done in D72460.

This patch is now on top of D72460.

omjavaid accepted this revision.Jan 12 2020, 4:36 PM

This doesnt cause any regressions on 32-bit arm testsuite. Tested on Ubuntu Xenial and Bionic 32bit arm host docker container. I hope this fixes all the redhat issues as well. Thank you so much Jan for all the effort on this and Thanks to Pavel for the reviews and assistance.

This revision is now accepted and ready to land.Jan 12 2020, 4:36 PM
labath accepted this revision.Jan 13 2020, 2:26 AM

This looks great now. Thanks for persisting.

lldb/include/lldb/Symbol/Symtab.h
159–161 ↗(On Diff #237103)
jankratochvil marked 2 inline comments as done.Jan 13 2020, 3:04 AM
jankratochvil added inline comments.
lldb/include/lldb/Symbol/Symtab.h
159–161 ↗(On Diff #237103)

Thanks for catching it. It is a leftover from val = X of previous version of the patch.

This revision was automatically updated to reflect the committed changes.
jankratochvil marked an inline comment as done.