This is an archive of the discontinued LLVM Phabricator instance.

Fix "process load/unload" on android
ClosedPublic

Authored by tberghammer on Jul 23 2015, 10:07 AM.

Details

Summary

Fix "process load/unload" on android

On android the symbols exposed by libdl (dlopen, dlclose, dlerror)
prefixed by "__dl_". This change moves the handling of process
load/unload to the platform object and override it for android to
handle the special prefix.

Diff Detail

Repository
rL LLVM

Event Timeline

tberghammer retitled this revision from to Fix "process load/unload" on android.
tberghammer updated this object.
tberghammer added a reviewer: sivachandra.
tberghammer added a subscriber: lldb-commits.
srhines added inline comments.Jul 23 2015, 2:10 PM
include/lldb/Target/Platform.h
872 ↗(On Diff #30500)

Some platforms prefix the ... Let those platforms specify ...

Improve comment based on review

tberghammer marked an inline comment as done.Jul 24 2015, 7:23 AM
sivachandra edited edge metadata.Jul 27 2015, 9:42 AM

I am not really familiar with this area. OK from my side, but you might want clayborg to take a look at this.

clayborg requested changes to this revision.Jul 27 2015, 11:32 AM
clayborg edited edge metadata.

We should add support to lldb::SBPlatform and lldb_private::Platform where we get the shared library extension from the platform and also fix the logic as noted in the inlined comments.

test/functionalities/load_unload/TestLoadUnload.py
201–204 ↗(On Diff #30573)

It would be great to add something to lldb::SBPlatform and lldb_private::Platform that allows us to get the shared library extension from the platform:

class SBPlatform
{
    const char *
    GetSharedLibraryExtension();
};

Then this code can become:

dylibName = "libloadunload" + platform.GetSharedLibraryExtension()
206–209 ↗(On Diff #30573)

This code should be:

if lldb.remote_platform:
    dylibPath = os.path.join(shlib_dir, dylibName)
else:
    dylibPath = dylibName
This revision now requires changes to proceed.Jul 27 2015, 11:32 AM

The other alternative is to have lldb_private::Platform have a new function:

class Platform
{
    virtual uint32_t
    LoadImage (lldb_private::Process *process, const FileSpec &image_spec, Error &error);

    virtual Error
    UnLoadImage (lldb_private::Process *process, uint32_t image_token);
}

Then we move the current code from Process::LoadImage() and Process::UnLoadImage() into PlatformPOSIX, and have new versions for PlatformAndroid.

The second platform route would probably be better because it would allow for Windows to load/unload images if we ever move to lldb-server on windows.

sivachandra resigned from this revision.Oct 19 2015, 10:34 AM
sivachandra removed a reviewer: sivachandra.
tberghammer updated this revision to Diff 41490.Dec 1 2015, 4:31 AM
tberghammer updated this object.
tberghammer edited edge metadata.

Address review comments

tberghammer updated this revision to Diff 41491.Dec 1 2015, 4:33 AM
tberghammer edited edge metadata.
tberghammer added inline comments.
test/functionalities/load_unload/TestLoadUnload.py
201–204 ↗(On Diff #30573)

It is a fairly unrelated change. Will create a separate CL for it

206–209 ↗(On Diff #30573)

Done

clayborg accepted this revision.Dec 1 2015, 9:53 AM
clayborg edited edge metadata.

Looks fine.

This revision is now accepted and ready to land.Dec 1 2015, 9:53 AM
This revision was automatically updated to reflect the committed changes.

It seems like an awful lot of logic is duplicated between the POSIX & Android versions of LoadImage when all that really changes is the name of the function you are calling. Is it possible to centralize this some more?

labath added a subscriber: labath.Dec 2 2015, 10:40 AM

It seems like an awful lot of logic is duplicated between the POSIX & Android versions of LoadImage when all that really changes is the name of the function you are calling. Is it possible to centralize this some more?

We've been discussing this offline. I believe Tamas is working on resolving that.

It seems like an awful lot of logic is duplicated between the POSIX & Android versions of LoadImage when all that really changes is the name of the function you are calling. Is it possible to centralize this some more?

We've been discussing this offline. I believe Tamas is working on resolving that.

I uploaded D15183 to fix the issue

nitesh.jain added inline comments.
lldb/trunk/source/Plugins/Platform/Android/PlatformAndroid.cpp
389

Hi Tamas,

When I run "process load libloadunload_a.so --install" its fail with error Couldn't lookup symbols: dl_dlerror, dl_dlopen.

Symbol table contains 13 entries:

Num:    Value  Size Type    Bind   Vis      Ndx Name
  0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
  1: 00000304     5 FUNC    GLOBAL DEFAULT    5 android_dlopen_ext
  2: 00000309     5 FUNC    GLOBAL DEFAULT    5 android_get_LD_LIBRARY_PA
  3: 0000030e     5 FUNC    GLOBAL DEFAULT    5 android_update_LD_LIBRARY
  4: 00000313     5 FUNC    GLOBAL DEFAULT    5 dl_iterate_phdr
  5: 00000318     5 FUNC    GLOBAL DEFAULT    5 **dladdr**
  6: 0000031d     5 FUNC    GLOBAL DEFAULT    5 **dlclose**
  7: 00000322     5 FUNC    GLOBAL DEFAULT    5 **dlerror**
  8: 00000327     5 FUNC    GLOBAL DEFAULT    5 **dlopen**
  9: 0000032c     5 FUNC    GLOBAL DEFAULT    5 dlsym
 10: 00002000     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
 11: 00002000     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
 12: 00002000     0 NOTYPE  GLOBAL DEFAULT  ABS _end

The symbols are not prefix with "__dl_" . We are using SDK version 25 for MIP64r6 target.

generic_mips64:/ # getprop ro.build.version.sdk
25

Even we are not seeing prefix "__dl_" added to X86-64 dynamic symbol
Symbol table '.dynsym' contains 12 entries:

Num:    Value          Size Type    Bind   Vis      Ndx Name
  0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
  1: 0000000000000442     6 FUNC    GLOBAL DEFAULT    7 dl_iterate_phdr@@LIBC
  2: 000000000000043c     6 FUNC    GLOBAL DEFAULT    7 android_dlopen_ext@@LIBC
  3: 000000000000044e     6 FUNC    GLOBAL DEFAULT    7 dlclose@@LIBC
  4: 0000000000000448     6 FUNC    GLOBAL DEFAULT    7 dladdr@@LIBC
  5: 000000000000045a     6 FUNC    GLOBAL DEFAULT    7 dlopen@@LIBC
  6: 0000000000000454     6 FUNC    GLOBAL DEFAULT    7 dlerror@@LIBC
  7: 0000000000000460     6 FUNC    GLOBAL DEFAULT    7 dlsym@@LIBC
  8: 0000000000002000     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
  9: 0000000000002000     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
 10: 0000000000002000     0 NOTYPE  GLOBAL DEFAULT  ABS _end
 11: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LIBC

Symbol table '.symtab' contains 14 entries:

Num:    Value          Size Type    Bind   Vis      Ndx Name
  0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
  1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS tmp-platform.c
  2: 0000000000001ee0   288 OBJECT  LOCAL  HIDDEN    10 _DYNAMIC
  3: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LIBC
  4: 0000000000000442     6 FUNC    GLOBAL DEFAULT    7 dl_iterate_phdr
  5: 000000000000043c     6 FUNC    GLOBAL DEFAULT    7 android_dlopen_ext
  6: 000000000000044e     6 FUNC    GLOBAL DEFAULT    7 dlclose
  7: 0000000000000448     6 FUNC    GLOBAL DEFAULT    7 dladdr
  8: 000000000000045a     6 FUNC    GLOBAL DEFAULT    7 dlopen

Hi Nitesh,

The way dlopen, dlclose and dl_phdr_iterate is implemented on android is pretty strange until (and including) SDK 25. The trick is that these functions are implemented inside "/system/bin/linker" with actual function names prefixed with "__dl_". The linker is doing some magic (creating a fake shared library entry at startup) to resolve these symbols when a library is loaded. This magic isn't understand by LLDB so it tries to lookup the symbols in the linker binary itself based on there actual function names. The actual libdl.so file located on the device isn't used for anything (it is there just for some compatibility reasons).

To help me investigate, can you do the followings and attach the output to this thread:

  • Run "target modules list" in LLDB
  • Run "target modules lookup -n dlopen" in LLDB
  • Run "target modules lookup -n __dl_dlopen" in LLDB
  • Download "/system/bin/linker" from the device and dump the ".symtab" section of it

Random guesses for what can cause the problem (will be able to more specific based on the above data):

  • LLDB fails to detect or load "/system/bin/linker" in the shared library list
  • /system/bin/linker doesn't contain a ".symtab" section (e.g. stripped)
  • You are using a version of android what have an API 26 /system/bin/linker what had some changes breaking LLDB (Pavel fixed it but the fix only applies for API 26+)
labath added inline comments.Jul 28 2017, 11:32 AM
lldb/trunk/source/Plugins/Platform/Android/PlatformAndroid.cpp
389

Is it possible you are using a preview version of android sdk? These had the new dlopen symbols even though the official android sdk's still used the __dl_dlopen symbols?

It would be nice to auto detect the names correctly?

lldb/trunk/source/Plugins/Platform/Android/PlatformAndroid.cpp
389

Maybe search for a symbol named "dl_open" first, then fall back to "dl_open" if that symbol is available, and supply as an argument for "expr.Printf(" by adding a "%s" in place of the "dl_dlopen? call? Then it can work for both?

const char *dl_open_func_name = nullptr;
SymbolContextList matching_symbols;
std::vector<const char *> dl_open_names = { "__dl_open", "dl_open" };
const char *dl_open_name = nullptr;
for (auto name: dl_open_names) {
  if (process->GetTarget().GetImages().FindFunctionSymbols(ConstString(name), eFunctionNameTypeFull, matching_symbols))
    dl_open_name = name;
    break;
  }
}

Hi Tamas,

Run "target modules list"

(lldb) target modules list
[ 0] E6A47987-0000-0000-0000-000000000000 /export/tmp/nin/LLVM/llvm/tools/lldb/packages/Python/lldbsuite/test/functionalities/data-formatter/data-formatter-stl/libcxx/multiset/a.out
[ 1] C2298D9F-41E0-1ED4-C9BF-D5B416CBE38F /home/nin/.lldb/module_cache/remote-android/.cache/C2298D9F-41E0-1ED4-C9BF-D5B416CBE38F/libstdc++.so
[ 2] D178E966-9342-8A56-CE34-35C9359886A4 /home/nin/.lldb/module_cache/remote-android/.cache/D178E966-9342-8A56-CE34-35C9359886A4/libm.so
[ 3] 6F226DD0-EFC1-AC63-647D-374F43AB9CC6 /home/nin/.lldb/module_cache/remote-android/.cache/6F226DD0-EFC1-AC63-647D-374F43AB9CC6/libc.so
[ 4] DF63F23A-2C91-BA01-EC7D-7CCA4FA1A55D /home/nin/.lldb/module_cache/remote-android/.cache/DF63F23A-2C91-BA01-EC7D-7CCA4FA1A55D/libdl.so
[ 5] 0466B0A0-00B3-88DA-55F1-1DA6D42D2C36 /home/nin/.lldb/module_cache/remote-android/.cache/0466B0A0-00B3-88DA-55F1-1DA6D42D2C36/ld-android.so
[ 6] 3FD33902-2A3B-6F33-0890-CDF6A434045E /home/nin/.lldb/module_cache/remote-android/.cache/3FD33902-2A3B-6F33-0890-CDF6A434045E/linker64
[ 7] D738489C-460B-25A5-B3B1-0D9BBC7CB8D6 /home/nin/.lldb/module_cache/remote-android/.cache/D738489C-460B-25A5-B3B1-0D9BBC7CB8D6/libnetd_client.so
[ 8] 84239230-A38E-687A-7EC5-FA40CD52D25F /home/nin/.lldb/module_cache/remote-android/.cache/84239230-A38E-687A-7EC5-FA40CD52D25F/libcutils.so
[ 9] 38725D15-41E9-AB20-70A0-16012C5B9DAD /home/nin/.lldb/module_cache/remote-android/.cache/38725D15-41E9-AB20-70A0-16012C5B9DAD/libc++.so
[ 10] 72DDCA8E-D724-7089-17EF-0703F4CA12E4 /home/nin/.lldb/module_cache/remote-android/.cache/72DDCA8E-D724-7089-17EF-0703F4CA12E4/liblog.so

Run "target modules lookup -n dlopen"

(lldb) target modules lookup -n dlopen
1 match found in /home/nin/.lldb/module_cache/remote-android/.cache/DF63F23A-2C91-BA01-EC7D-7CCA4FA1A55D/libdl.so:

Address: libdl.so[0x00000000000010a0] (libdl.so..text + 352)
Summary: libdl.so`dlopen

target modules lookup -n __dl_dlopen

(lldb) target modules lookup -n __dl_dlopen
(lldb)

I have attach the output of /system/bin/linker64.

.

The android version am using for testing is "O".

generic_mips64:/ # getprop ro.build.version.release
O

Thanks for all of the data. Based on this I think you are using a preview version of android O what reports SDK 25 but the linker works the way an SDK 26 system linker would do.

I think the proper fix for the problem is to do something like what Greg suggested to detect the correct function name based on the list of available symbols instead of based on the SDK version. I will try to create a CL to fix it in the next couple of days (unless you are happy to do it), but not sure when I will have time for that and it is a bit tricky to test it.

Until then you can try to do either update to a newer version of the system image what reports SDK 26 (and make sure you are using a sufficiently new LLDB) or locally hack the "PlatformAndroid::LoadImage" function to use the SDK 26 version in case of SDK 25 as well (possibly with a check for ro.build.version.release)

Thanks for all of the data. Based on this I think you are using a preview version of android O what reports SDK 25 but the linker works the way an SDK 26 system linker would do.

I think the proper fix for the problem is to do something like what Greg suggested to detect the correct function name based on the list of available symbols instead of based on the SDK version. I will try to create a CL to fix it in the next couple of days (unless you are happy to do it), but not sure when I will have time for that and it is a bit tricky to test it.

Until then you can try to do either update to a newer version of the system image what reports SDK 26 (and make sure you are using a sufficiently new LLDB) or locally hack the "PlatformAndroid::LoadImage" function to use the SDK 26 version in case of SDK 25 as well (possibly with a check for ro.build.version.release)

Thanks. As suggested by Greg will submit a patch for that