This is an archive of the discontinued LLVM Phabricator instance.

Minimal runtime for UBSan.
ClosedPublic

Authored by eugenis on Aug 16 2017, 5:29 PM.

Details

Summary

An implementation of ubsan runtime library suitable for use in production.

Minimal attack surface.

  • No stack traces.
  • Definitely no C++ demangling.
  • No UBSAN_OPTIONS=log_file=/path (very suid-unfriendly). And no UBSAN_OPTIONS in general.
  • as simple as possible

Minimal CPU and RAM overhead.

  • Source locations unnecessary in the presence of (split) debug info.
  • Values and types (as in A+B overflows T) can be reconstructed from register/stack dumps, once you know what type of error you are looking at.
  • above two items save 3% binary size.

When UBSan is used with -ftrap-function=abort, sometimes it is hard to reason about failures. This library replaces abort with a slightly more informative message without much extra overhead. Since ubsan interface in not stable, this code must reside in compiler-rt.

  • Tests pending ***

Diff Detail

Repository
rL LLVM

Event Timeline

eugenis created this revision.Aug 16 2017, 5:29 PM
emaste added a subscriber: emaste.Aug 16 2017, 6:37 PM
pcc added inline comments.Aug 17 2017, 3:48 PM
clang/lib/Driver/ToolChains/CommonArgs.cpp
548 ↗(On Diff #111439)

What would happen if I compile with asan+ubsan+minimal runtime? Does that combination make sense?

compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cc
18 ↗(On Diff #111439)

This won't work on Windows I think, so you'll probably want to disable building the runtime on Windows.

24 ↗(On Diff #111439)

There's a function type mismatch here I believe. If the idea is to reduce binary size, I'd imagine that we'd want to avoid passing the source location on the clang side.

eugenis added inline comments.Aug 17 2017, 4:00 PM
clang/lib/Driver/ToolChains/CommonArgs.cpp
548 ↗(On Diff #111439)

Not right now. The driver would not accept asan + minimal-runtime - see CompatibleWithMinimalRuntime. We may come up with a hardened runtime library for ASan in the future; I expect it to include minimal-ubsan runtime and then needsAsanRt() would be true and needsUbsanRt() would be false.

This is all hypothetical, I don't plan to work on "hardened asan" in the near future.

compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cc
24 ↗(On Diff #111439)

Good point. Completely forgot about that :)

Have you explored the possibility of reducing the overhead of the regular runtime to an acceptable level?

  • If it's too big, we can disable any heavy weight features you identify at compile-time.
  • If it allocates or prints too much, we can add in a custom allocator or printing strategy.
  • If it requires too much metadata to be inserted into user programs, we can devise a new, smaller encoding for the metadata. Or simply teach the runtime handlers to accept nullptr. This should also achieve the binary size savings you mentioned.

I have a strong preference towards having a single codebase for the runtime. An entirely separate runtime would impose a higher testing and maintenance burden. If having a separate runtime is really the only path forward, I would like to learn more about why the current runtime cannot be adapted to suit your use cases.

compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cc
24 ↗(On Diff #111439)

I imagine that deduplication needs to be supported by a minimal runtime, to prevent e.g a tight for loop from spamming the UB log. How do you plan on accomplishing that without source locations -- perhaps by emitting a bitmap?

Good questions.

The primary motivation is security. I want this to be something that I can be confident does not add any new vulnerabilities to the program. The current implementation demangles potentially untrusted strings, writes to arbitrary files, and in general contains too much code of mixed quality.

In D36810#844863, @vsk wrote:

Have you explored the possibility of reducing the overhead of the regular runtime to an acceptable level?

  • If it's too big, we can disable any heavy weight features you identify at compile-time.

Not worried about that.

  • If it allocates or prints too much, we can add in a custom allocator or printing strategy.
  • If it requires too much metadata to be inserted into user programs, we can devise a new, smaller encoding for the metadata. Or simply teach the runtime handlers to accept nullptr. This should also achieve the binary size savings you mentioned.

Both metadata and extra code size to set up the call frame. UBSan checks are very common, and each instruction counts. If we are going to allow nullptr in the arguments, might as well implement _minimal variants.

I have a strong preference towards having a single codebase for the runtime. An entirely separate runtime would impose a higher testing and maintenance burden. If having a separate runtime is really the only path forward, I would like to learn more about why the current runtime cannot be adapted to suit your use cases.

I'd love to have a single code base, but it creates lots of unnecessary complications. I thought of doing some sort of template or preprocessor magic to reuse the ubsan handlers, stripping out things like suppressions, argument processing, fancy output (demangling in particular, but also logging), UBSAN_OPTIONS handling when building as "minimal runtime". I feel the result would be a terrible mess.

The minimal runtime does not really need much testing - it's only 50 lines of code. It would be good to add something to ensure that the set of handlers matches the full runtime.

compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cc
24 ↗(On Diff #111439)

Primary mode for this runtime, as a security hardening tool, would be abort-on-error. I even considered skipping the non-abort variants of the handlers.

For the sake of simplicity, we could do this with a plain array of caller PCs which each handler will do a linear scan of, and probably not even reallocate it once it's full but just give up and print "error limit reached".

vsk added a comment.Aug 18 2017, 10:53 AM

Good questions.

The primary motivation is security. I want this to be something that I can be confident does not add any new vulnerabilities to the program. The current implementation demangles potentially untrusted strings, writes to arbitrary files, and in general contains too much code of mixed quality.

In D36810#844863, @vsk wrote:

Have you explored the possibility of reducing the overhead of the regular runtime to an acceptable level?

  • If it's too big, we can disable any heavy weight features you identify at compile-time.

Not worried about that.

  • If it allocates or prints too much, we can add in a custom allocator or printing strategy.
  • If it requires too much metadata to be inserted into user programs, we can devise a new, smaller encoding for the metadata. Or simply teach the runtime handlers to accept nullptr. This should also achieve the binary size savings you mentioned.

Both metadata and extra code size to set up the call frame. UBSan checks are very common, and each instruction counts. If we are going to allow nullptr in the arguments, might as well implement _minimal variants.

I see, yes, this is a good reason to introduce the _minimal variants.

I have a strong preference towards having a single codebase for the runtime. An entirely separate runtime would impose a higher testing and maintenance burden. If having a separate runtime is really the only path forward, I would like to learn more about why the current runtime cannot be adapted to suit your use cases.

I'd love to have a single code base, but it creates lots of unnecessary complications. I thought of doing some sort of template or preprocessor magic to reuse the ubsan handlers, stripping out things like suppressions, argument processing, fancy output (demangling in particular, but also logging), UBSAN_OPTIONS handling when building as "minimal runtime". I feel the result would be a terrible mess.

The minimal runtime does not really need much testing - it's only 50 lines of code. It would be good to add something to ensure that the set of handlers matches the full runtime.

Fair enough. Having some way to enforce parity between the regular/minimal handlers would alleviate most of my concerns. Thanks for the explanation!

compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cc
24 ↗(On Diff #111439)

Sgtm, this reduces the amount of inserted metadata further.

ubsan and suitable on production? I don't understand this phrasing, sounds to me like oxymoron.

vsk added a comment.Aug 18 2017, 11:06 AM

ubsan and suitable on production? I don't understand this phrasing, sounds to me like oxymoron.

It's been done for a while already :). Though, AFAIK, without a runtime component.

kubamracek added inline comments.Aug 18 2017, 11:39 AM
compiler-rt/lib/ubsan_minimal/CMakeLists.txt
63 ↗(On Diff #111439)

Not an objection for this particular patch, but I noticed the amount of CMake boilerplate code necessary for the bring-up of a new sanitizer tool is very high (and this is even without macOS support). Is there a way to refactor and reduce that?

eugenis updated this revision to Diff 112625.Aug 24 2017, 3:43 PM

Now with tests and better diagnostics in the driver.

Disabled minimal runtime with CFI because cross-dso cfi does not know how to handle an error without diagnostic data.
Minimal runtime is not realy necessary with cross-dso cfi, because the stack trace very clearly points to the problem.

eugenis added inline comments.Aug 24 2017, 3:48 PM
compiler-rt/lib/ubsan_minimal/CMakeLists.txt
63 ↗(On Diff #111439)

I've removed some of the code that is unnecessary for this runtime. I'm not sure about factoring large parts of it out. It feels like there are a lot of things in this code that might be different in one sanitizer or the other.

pcc added inline comments.Aug 24 2017, 7:22 PM
clang/lib/CodeGen/CGExpr.cpp
2713 ↗(On Diff #112625)

Now that we have three of these I think it would be more readable to change this to:

if (CheckInfo.Version && !MinimalRuntime)
  FnName += "_v" + llvm::utostr(CheckInfo.Version);

and similarly below.

compiler-rt/lib/CMakeLists.txt
37 ↗(On Diff #112625)

Instead of adding this here it looks like the new way is to add the runtime to ALL_SANITIZERS.

eugenis updated this revision to Diff 112734.Aug 25 2017, 1:25 PM

address pcc's comments

clang/lib/CodeGen/CGExpr.cpp
2713 ↗(On Diff #112625)

Done. Note that the new code is using std::string concatenation and not Twine - as I understand, Twine can not be used across multiple statements.

compiler-rt/lib/CMakeLists.txt
37 ↗(On Diff #112625)

OK, this can be done because nothing else depends on ubsan-minimal (unlike ubsan, stats and lsan above).

Also, this will require a clean build to pick up the new sanitizer, because COMPILER_RT_SANITIZERS_TO_BUILD is already expanded to the list that does not include ubsan_minimal in cmake cache. Re-running cmake with just this one flag (=all) in the existing build directory also works.

pcc accepted this revision.Aug 25 2017, 1:55 PM

LGTM

compiler-rt/lib/CMakeLists.txt
37 ↗(On Diff #112625)

COMPILER_RT_SANITIZERS_TO_BUILD is already expanded to the list that does not include ubsan_minimal in cmake cache

That doesn't seem right. I think that in a separate change we should change the default value to all.

compiler-rt/lib/ubsan_minimal/CMakeLists.txt
17 ↗(On Diff #112734)

I don't think you need to test for this now that the inclusion of this file is conditional on COMPILER_RT_HAS_UBSAN_MINIMAL.

This revision is now accepted and ready to land.Aug 25 2017, 1:55 PM
vsk added inline comments.Aug 25 2017, 1:57 PM
compiler-rt/cmake/config-ix.cmake
549 ↗(On Diff #112734)

I think all of this works on Darwin without issue.

compiler-rt/lib/ubsan_minimal/CMakeLists.txt
17 ↗(On Diff #112734)

This 'if Apple' branch doesn't seem necessary. We're not doing any add_weak_symbols() magic, because there's no ___ubsan_default_options.

compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cc
45 ↗(On Diff #112734)

Leave out the parameter name to avoid "Wunused"-style diagnostics?

53 ↗(On Diff #112734)

Printing out the return address here would aid debugging. It's extra complexity but would be worth it when triaging bugs, or when using the minimal runtime in low-level software.

compiler-rt/test/ubsan_minimal/lit.common.cfg
32 ↗(On Diff #112734)

We should be able to just include Darwin here too.

eugenis added inline comments.Aug 25 2017, 6:20 PM
compiler-rt/lib/CMakeLists.txt
37 ↗(On Diff #112625)

Good point, r311824.

compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cc
53 ↗(On Diff #112734)

Even if it's just an absolute address, w/o module base? It's does not give you much in the presence of ASLR.

vsk added inline comments.Aug 25 2017, 6:24 PM
compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cc
53 ↗(On Diff #112734)

This would still be helpful while debugging xnu, or firmware which runs without ASLR. While debugging the kernel you can print out the slide offset for the text segment.

vitalybuka accepted this revision.Aug 26 2017, 11:44 AM
vitalybuka added inline comments.
compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cc
29 ↗(On Diff #112734)

Could this be at line 20
if (sz == kMaxCallerPcs - 1) message("ubsan: too many errors\n");
if (sz >= kMaxCallerPcs - 1) return false;

eugenis updated this revision to Diff 112958.Aug 28 2017, 1:40 PM
eugenis marked 3 inline comments as done.

Addressed comments.
Removed shared ubsan runtime on Android support in the driver - it should be done in a separate commit,
and probably under a flag (something like -shared-libubsan, similar to gcc's -static-libubsan and clang's -shared-libasan).

eugenis added inline comments.Aug 28 2017, 1:42 PM
compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cc
29 ↗(On Diff #112734)

No, that way "too many errors" message may be printed multiple times.

Added one more check to the loop above: if we read a nullptr value from caller_pcs[i], it means that another thread has bumped caller_pcs_sz but has not written the pc value into the new element yet.

53 ↗(On Diff #112734)

I've added a FIXME for now.

compiler-rt/test/ubsan_minimal/lit.common.cfg
32 ↗(On Diff #112734)

Would you like me to add Darwin to COMPILER_RT_HAS_UBSAN_MINIMAL in config-ix.cmake, too? I don't have a machine to test it though.

vsk added inline comments.Aug 28 2017, 1:45 PM
compiler-rt/test/ubsan_minimal/lit.common.cfg
32 ↗(On Diff #112734)

That would be great, I can fix the Darwin bots post-commit if there's any issue.

eugenis updated this revision to Diff 112963.Aug 28 2017, 1:50 PM

enable on Darwin

eugenis marked an inline comment as done.Aug 28 2017, 1:50 PM
vitalybuka accepted this revision.Aug 28 2017, 1:51 PM
vitalybuka added inline comments.
compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cc
23 ↗(On Diff #112958)

This comment probably is unnecessary.
I was asking about early exit, not about "too many errors" in particular.

vsk accepted this revision.Aug 28 2017, 1:53 PM

Thanks for working on this!

This revision was automatically updated to reflect the committed changes.

This broke Mac build, as expected. I've disabled ubsan-minimal on Darwin in r312036.