Adds a fallback mode to procmaps when the symbolizer
fails to locate a module for a given address by using
dl_iterate_phdr.
Details
- Reviewers
kubamracek rnk vitalybuka eugenis - Commits
- rGdaf210f7b664: Add support for custom loaders to the sanitizer symbolizer
rGe2aa5b2ace43: Add support for custom loaders to the sanitizer symbolizer
rGb9a32d470af2: Add support for custom loaders to the sanitizer symbolizer
rCRT314713: Add support for custom loaders to the sanitizer symbolizer
rCRT314671: Add support for custom loaders to the sanitizer symbolizer
rCRT314431: Add support for custom loaders to the sanitizer symbolizer
rL314713: Add support for custom loaders to the sanitizer symbolizer
rL314671: Add support for custom loaders to the sanitizer symbolizer
rL314431: Add support for custom loaders to the sanitizer symbolizer
Diff Detail
- Repository
- rL LLVM
Event Timeline
I don't understand. What are these custom loaders that fail to list all modules? What prevents FindModuleForAddress from going into infinite recursion if an address is not found even in proc/maps?
modules_.enableFallbackInit() will only return true once (if use_fallback_init_ is already enabled, it returns false), so there shouldn't be an infinite recursion anywhere.
The issue that this is trying to solve is that the Linux LoadedModules uses dl_iterate_phdr by default. This is an api function of the system loader, so it works fine for libraries loaded by the system loader, but won't find any libraries not loaded by the system loader. If you use a custom loader (for example, Faulty.lib) to load some of your libraries, they won't appear in dl_iterate_phdr, because the system loader didn't load them. They are still loaded in the memory map though, so they will appear in procmaps.
ping - I'm open to alternate ways of dealing with this, but it's pretty important for our use case (and presumably affects anyone else using faulty or other custom loaders).
lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc | ||
---|---|---|
191 ↗ | (On Diff #113137) | Does this branch need to set modules_fresh_ to false to trigger reinitialization in the recursive call? This function got super confusing and difficult to reason about. Could you rewrite it without recursion? The real logic behind this is very simple - if dl_iterate_phdr fails, use procmaps. There is nothing recursive about it. |
lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc | ||
---|---|---|
191 ↗ | (On Diff #113137) | Yeah, that makes sense. I mainly went for recursion because that's the way the existing fallback case works (I actually wonder whether that needs to be recursive, maybe I can change that one too). Will refactor. |
This implementation never returns to dl_iterate_phdr once it finds the first module lookup error. This is not great, because proc/maps does not provide reliable base address for DSOs (see the comment in MemoryMappingLayout::DumpListOfModules).
I suggest keeping the fallback list of modules in parallel with the primary list, and always trying the latter first.
lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc | ||
---|---|---|
175 ↗ | (On Diff #114738) | consider early exit here and below: if (module) return module; |
188 ↗ | (On Diff #114738) | This will do extra work on platforms where fallbackInit == init. Move it under smth like if (hasFallback()) ? |
lib/sanitizer_common/sanitizer_symbolizer.h | ||
---|---|---|
155 ↗ | (On Diff #115233) | Please make sure that this does not allocate memory unless it's necessary. |
lib/sanitizer_common/sanitizer_common.h | ||
---|---|---|
723 ↗ | (On Diff #115670) | Is not this backwards? |
This caused a hang on one of the linux buildbots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/4146
I'm not sure why this would affect only that buildbot, as it seems like a pretty generic one. I don't see any linux issues locally. My only idea is potentially that initialized could introduce a race condition leading to double-initialization, but I don't understand how two threads could be using the ListOfModules at the same time (or why that would cause a hang instead of data corruption/leaking). I can try introducing a mutex there if it seems problematic.
Actually, I think I'll try splitting this up into a few smaller commits to diagnose where the issue is. There are a few pieces that seem NFC/low risk, so hopefully merging those will help figure out which of the remaining parts actually broke the buildbot.
Once it looks like all the buildbots are happy with those two NFC revisions, I'll merge the NoCtor part of this change (which I suspect is the problematic part).
Interestingly, the buildbot also looks happy with the NoCtor change (rL314533): http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/4167. Must be that the fallbackInit causes issues somehow.
Interestingly, this small change still causes a hang somewhere on the gcc sanitizer buildbot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/4222. Reverted again.
#0 atomic_exchange<__sanitizer::atomic_uint32_t> () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang.h:68 #1 Lock () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux.cc:620 #2 0x00000000004dcf82 in GenericScopedLock () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_mutex.h:187 #3 SymbolizePC () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:76 #4 0x00000000004dba18 in Print () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc:35 #5 0x00000000004c38ef in AsanCheckFailed () at /code/llvm-project/compiler-rt/lib/asan/asan_rtl.cc:69 #6 0x00000000004d8910 in CheckFailed () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cc:79 #7 0x00000000004d44d8 in MemoryMappingLayout () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_procmaps_common.cc:80 #8 0x00000000004daad4 in procmapsInit () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cc:477 #9 fallbackInit () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cc:496 #10 0x00000000004dd34a in RefreshModules () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:168 #11 FindModuleForAddress () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:186 #12 0x00000000004dcf98 in FindModuleNameAndOffsetForAddress () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:157 #13 SymbolizePC () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:81 #14 0x00000000004dba18 in Print () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc:35 #15 0x00000000004d92c4 in ReportDeadlySignalImpl () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_libcdep.cc:242 #16 ReportDeadlySignal () at /code/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_libcdep.cc:255 #17 0x00000000004c0225 in ~ScopedInErrorReport () at /code/llvm-project/compiler-rt/lib/asan/asan_report.cc:138 #18 0x00000000004bffb2 in ReportDeadlySignal () at /code/llvm-project/compiler-rt/lib/asan/asan_report.cc:211 #19 0x00000000004bf9a3 in AsanOnDeadlySignal () at /code/llvm-project/compiler-rt/lib/asan/asan_posix.cc:39 #20 <signal handler called> #21 0x00000000004eabe0 in main () at /code/llvm-project/compiler-rt/test/asan/TestCases/Linux/sanbox_read_proc_self_maps_test.cc:26
Hope it helps.
aha - looks like that test removes access to proc/maps, which then causes reading procmaps to fail. Not sure why that failure wasn't reproducing locally, but I'll try to figure out a way to handle that case.
Actually, I'm pretty sure that fixing this is as easy as allowing the fallback cache for the LoadedModule list.