Breakpad creates minidump files that can a module loaded multiple times. We found that when a process mmap's the object file for a library, this can confuse breakpad into creating multiple modules in the module list. This patch fixes the GetFilteredModules() to check the linux maps for permissions and use the one that has execute permissions. Typically when people mmap a file into memory they don't map it as executable. This helps people to correctly load minidump files for post mortem analysis.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
If I correctly understand what this is doing, then I don't think it's a good idea. The base of an (elf) shared library does not have to be mapped executable. These are the mappings I get for a trivial hello world program (no mmapping of libraries or anything) on my linux machine:
00400000-00401000 r--p 00000000 fd:01 2838574 /tmp/l/a.out 00401000-00402000 r-xp 00001000 fd:01 2838574 /tmp/l/a.out 00402000-00403000 r--p 00002000 fd:01 2838574 /tmp/l/a.out 00403000-00404000 r--p 00002000 fd:01 2838574 /tmp/l/a.out 00404000-00405000 rw-p 00003000 fd:01 2838574 /tmp/l/a.out 020fb000-0211c000 rw-p 00000000 00:00 0 [heap] 7fe4f5d87000-7fe4f5da9000 r--p 00000000 fd:01 2738932 /lib64/libc-2.31.so 7fe4f5da9000-7fe4f5ef3000 r-xp 00022000 fd:01 2738932 /lib64/libc-2.31.so 7fe4f5ef3000-7fe4f5f3d000 r--p 0016c000 fd:01 2738932 /lib64/libc-2.31.so 7fe4f5f3d000-7fe4f5f41000 r--p 001b5000 fd:01 2738932 /lib64/libc-2.31.so 7fe4f5f41000-7fe4f5f43000 rw-p 001b9000 fd:01 2738932 /lib64/libc-2.31.so ...
Here, the correct base of a.out is 0x00400000 and the libc base is 0x7fe4f5d87000. But this patch would make them be detected as 0x00401000 and 0x7fe4f5da9000, respectively.
This behavior is controlled by the -z (no)separate-code. My machine has separate-code as default, but that setting may not be universal, so you may need to pass this flag explicitly to reproduce this. For reference, these are the mappings I get when compiling a.out with -z noseparate-code (libc mappings remain unchanged, of course):
00400000-00401000 r-xp 00000000 fd:01 2838574 /tmp/l/a.out 00401000-00402000 r--p 00000000 fd:01 2838574 /tmp/l/a.out 00402000-00403000 rw-p 00001000 fd:01 2838574 /tmp/l/a.out
It sounds like we need a better heuristic. How about "the number of consecutive mappings with the same name"? User mmapping code is likely going to map the library in a single chunk, but the dynamic linker will typically create multiple mappings (even a trivial executable can have five), so it seems like picking the longest sequence could work...
It actually will do the right thing here. We only prefer the executable mapping for multiple entries in the minidump modules list. The minidump will have only one load address per module and that address will always be the first mapping in the linux proc maps file.
So the only entry for "/tmp/l/a.out" in the minidump would be the one for 00400000, and the only mapping for /lib64/libc-2.31.so would be 7fe4f5d87000.
If we have multiple module entries _and_ if you have separate code, the old behavior will continue and it will prefer the mapping with the lowest address because neither of the binaries would have the base address marked as executable. If we have multiple entries for the exact same path, but with different addresses, we will prefer the one that _is_ marked executable correctly (which happens on Android for us all the time, can could easily happen on Linux with "-z no-separate-code".
This behavior is controlled by the -z (no)separate-code. My machine has separate-code as default, but that setting may not be universal, so you may need to pass this flag explicitly to reproduce this. For reference, these are the mappings I get when compiling a.out with -z noseparate-code (libc mappings remain unchanged, of course):
00400000-00401000 r-xp 00000000 fd:01 2838574 /tmp/l/a.out 00401000-00402000 r--p 00000000 fd:01 2838574 /tmp/l/a.out 00402000-00403000 rw-p 00001000 fd:01 2838574 /tmp/l/a.outIt sounds like we need a better heuristic. How about "the number of consecutive mappings with the same name"? User mmapping code is likely going to map the library in a single chunk, but the dynamic linker will typically create multiple mappings (even a trivial executable can have five), so it seems like picking the longest sequence could work...
So this heuristic will work like it did before for "-z separate-code", and will fix multiple mappings if "-z no-separate-code" is used. So this is only a win in my book since we only check the start address for a shared library _and_ we only check for the execute part if we have multiple module entries for the same file.
Added a better algorithm for detecting if an address from the module list for linux processes with valid /proc/<pid>/maps file is executable. We now search consective addresses that match the path to the module for any that are marked as executable. Added tests to test if the mmap'ed file comes first or second, and also a test for "-z separate-code" when the first mapping for an executable isn't executable, but a subsequent mapping with a matching path is.