This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt][hwasan] Add __hwasan_library_loaded/unloaded to hwasan_interface.h
AcceptedPublic

Authored by leonardchan on Sep 13 2022, 11:56 AM.

Details

Summary

These are interface functions because a platform that supports instrumenting globals needs a dynamic loader to call these to register globals correctly.

Diff Detail

Event Timeline

leonardchan created this revision.Sep 13 2022, 11:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 13 2022, 11:56 AM
Herald added subscribers: Enna1, dberris. · View Herald Transcript
leonardchan requested review of this revision.Sep 13 2022, 11:56 AM
leonardchan retitled this revision from [compiler-rt][hwasan] Move __hwasan_library_loaded/unloaded to hwasan_interface.h to [compiler-rt][hwasan] Add __hwasan_library_loaded/unloaded to hwasan_interface.h.
leonardchan edited the summary of this revision. (Show Details)
leonardchan edited the summary of this revision. (Show Details)
vitalybuka accepted this revision.Sep 14 2022, 2:17 PM

we will probably search for a different solution with glibc

This revision is now accepted and ready to land.Sep 14 2022, 2:17 PM

BTW, @mcgrathr, any hints what can be done for GLIBC, taking into account that we likely need to support older versions.

For the general case, not only glibc but probably all preexisting system libraries that don't have special hwasan support built in, I don't think you can rely on anything to hook into here. If you really needed to, *maybe* you could have dlopen and dlclose interceptors, but that seems really unlikely to go well. Bottom line, you probably have to find a way on those systems to have vanilla init and fini hook behavior (init_array, preinit_array, fini_array, etc.) be sufficient. It's challenging because you really need to come in before any code that could access a global is run, and with a DSO overriding a symbol in one of its DT_NEEDED DSOs the order of init hooks getting called could make it impossible to be in the right place.

So you might actually need to resort to essentially a dl_iterate_phdr-based check (using the dlpi_adds/dlpi_subs counts to optimize unnecessary re-checks) eagerly done after dlclose. And for dlopen you might actually be SOL unless you can do something with the LD_AUDIT interface. Yeah, that's probably what you have to do, but it's going to be pretty painful. It's probably not feasible without the hwasan runtime being a DSO itself, unless you do some really nasty stuff in the compiler driver to fudge link switches. The DSO or executable needs to have a DT_AUDIT for a separate special DSO that will get callbacks. LLD doesn't even support the switches to create that entry, so there's a lot to make all that happen. Not to mention it's a DSO load, so it has all the RUNPATH requirements of any other DSO. It's a lot.

compiler-rt/include/sanitizer/hwasan_interface.h
46

I suspected that one of the reasons this wasn't exposed originally is because it's ELF-specific (and as written here, even specific to the GNU spellings like ElfW).

Even if it has only been supported for GNU-compatible ELF platforms so far, there's no reason hwasan shouldn't eventually support non-ELF systems or non-GNU ELF systems where this signature or spelling thereof is problematic.

EllfW(Addr) here has no particular value and you could as well just use uintptr_t.
Likewise, ElfW(Half) is just uint16_t (in all variants of ELF) and there's no particular value to passing it with that type instead of something more usual like size_t.. In fact, using Half/uint16_t here breaks some cases that can otherwise work in theory (PN_XNUM cases).

It might be safest to make the phdr argument just const void* and just comment that this is used as an array of ELF Phdr structs. Then you don't need to rely on <link.h> at all.

51

Both of these, but especially the unloaded hook, need to specify clearly when they need to be called relative to other aspects of loading/unloading like relocation, init hooks, and unmapping. I think the requirements are:

  • loaded: After the module has been fully loaded and relocated, but before any of its code (e.g. initializers) has run.
  • unloaded: After the module's code will no longer be run, and either before or after its segments have been unmapped.

But we should verify and make the specified constraints precise. (For example it's important for the implementation to know whether the unloaded hook *requires* that the code already be unmapped, or doesn't care either way. One reason for this is that ordinarily the phdr array may well be part of what gets unmapped, so if this said it should be called after unmapping, the implementation would have to do extra gymnastics.)

leonardchan marked an inline comment as done.Oct 13 2022, 3:21 PM

Updated the function signatures. Also updated comments to make them less ELF-specific.

compiler-rt/include/sanitizer/hwasan_interface.h
51

Updated with comments adding restrictions. I think the loaded restrictions are more straightforward. For unloaded, I just put must be called *before* unmapping. Unloading before seems more straightforward to me and I think this is what all current users of this function do anyway. I also can't think of a reason for why it would be desirable to unload after unmapping. I imagine if we explicitly say "before" right now, we won't lose anything if we decide to say "before or after" later. @vitalybuka Thoughts?