This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] Keep track of heap allocated regex states
Needs ReviewPublic

Authored by timshen on Oct 25 2017, 3:54 PM.

Details

Summary

Build abstraction on regex's allocation

This fixes a fuzzer crasher from a huge input (provided by Marshall), which seems to be a stackoverflow during destruction. However, I can't reproduce Marshall's crasher even before the change.

No new tests are added, but all old tests pass. I didn't really run valgrind on tests to check for memory leak, as the new allocation is basically correct by construction (only one call site uses new, no explicit delete calls).

Event Timeline

timshen created this revision.Oct 25 2017, 3:54 PM
timshen updated this revision to Diff 120334.Oct 25 2017, 4:04 PM

Add an assertion to __push.

mclow.lists edited edge metadata.Oct 25 2017, 6:55 PM

A couple of notes.

  • This change means that <regex> now requires C++11 (the new __push function w/ the varargs). I don't know how important that is; but I'm pretty sure libc++ currently provides <regex> in C++03 mode.
  • This is an ABI change; existing code that was compiled with the "old" <regex> implementation will not interoperate with this.
  • I think that there may be some exception-safety issues in __push; if the push_back throws, I think we leak.
libcxx/include/regex
4644

Why auto here?

How about unique_ptr<_NodeType> instead, and then

__storage_.__push(__ret.get());
return __ret.release()

I can confirm that with this patch the (large) regex that used to cause a stack overflow does not any more.

A couple of notes.

Sorry for the oversights, when a C++11(-only :) contributor doesn't care about ABI stability, nor exceptions, he contributes naive code. :P I'll fix them.

  • This change means that <regex> now requires C++11 (the new __push function w/ the varargs). I don't know how important that is; but I'm pretty sure libc++ currently provides <regex> in C++03 mode.
  • This is an ABI change; existing code that was compiled with the "old" <regex> implementation will not interoperate with this.

Can you give an example on what would go wrong, while it's not expected to go wrong?

  • I think that there may be some exception-safety issues in __push; if the push_back throws, I think we leak.
timshen updated this revision to Diff 120711.Oct 27 2017, 3:38 PM

Remove the uses of variadic template and auto.

I'm not sure of how to implement this without a ABI change (the addition of __storage_).

What we need to do here is to here is to add a new macro _LIBCPP_ABI_REGEX_MEMORY in <__config> (around line 89) and then make these changes available only when this macro is defined.

By default - we get no change.
If _LIBCPP_ABI_REGEX_MEMORY is defined, we get the new behavior.

Then we can start adding other ABI changes as well.