Page MenuHomePhabricator

[LibFuzzer] Reimplement how the optional user functions are called.
ClosedPublic

Authored by delcypher on May 27 2016, 11:36 AM.

Details

Summary

[LibFuzzer] Reimplement how the optional user functions are called.

The motivation for this change is to fix linking issues on OSX.
However this only partially fixes linking issues (the uninstrumented
tests and a few others won't succesfully link yet).

This change introduces a struct of function pointers
(`fuzzer::ExternalFuntions`) which when initialised will point to the
optional functions if they are available. Currently these
`LLVMFuzzerInitialize and LLVMFuzzerCustomMutator` functions.

Two implementations of `fuzzer::ExternalFunctions` constructor are
provided one for Linux and one for OSX.

The OSX implementation uses `dlsym()` because the prior implementation
using weak symbols does not work unless the additional flags are passed
to the linker.

The Linux implementation continues to use weak symbols because the
`dlsym()` approach does not work unless additional flags are passed
to the linker.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@kcc : I've uploaded a work in progress patch so you can take a look to see if I'm going in the right direction with this. One thing I'm not very happy about is FuzzerMain.cpp. The implementation of main(int argc, char **argv) needs to be able to optionally call LLVMFuzzerInitialize but also the Fuzzer class needs access to bunch of other functions. So currently they both independently create a fuzzer::ExternalFunctions and call Init(). This is wasteful but I wanted to avoid changing the interface of fuzzer::fuzzerDriver and the constructor of Fuzzer which would be required if I wanted to propagate EF through the layers into an instance of the Fuzzer class. Any thoughts on how you would like this handled?

kcc added a comment.May 31 2016, 2:03 PM

Ah, yea, you need to pass &argc and &argv...
Just change the interface of FuzzerDriver to accept int* and char*** -- it's not a part of public interface any more, easy to change.

In D20741#444862, @kcc wrote:

Ah, yea, you need to pass &argc and &argv...
Just change the interface of FuzzerDriver to accept int* and char*** -- it's not a part of public interface any more, easy to change.

Ok great. I will go ahead and do that.

aizatsky edited edge metadata.May 31 2016, 3:06 PM

I solved this in chromium tree by specifying additional linker arguments:

"-Wl,-U,_LLVMFuzzerCustomMutator",
"-Wl,-U,_LLVMFuzzerInitialize",

https://code.google.com/p/chromium/codesearch#chromium/src/testing/libfuzzer/BUILD.gn&l=29

Is there a reason we shouldn't do the same here without changing libfuzzer code?

Any reason *not* to change libFuzzer to avoid that?
The original intention of the code in libFuzzer (according to the comment) was having them "weak_external". Having to modify the user linker flag is not nice.

kcc added a comment.May 31 2016, 3:14 PM

Is there a reason we shouldn't do the same here without changing libfuzzer code?

The build command line for libFuzzer should remain very simple and the same across platforms.
It's easy to achieve, so we should just do it.

delcypher updated this revision to Diff 59155.May 31 2016, 5:03 PM
delcypher edited edge metadata.

Try to use dlsym() rather than passing flags to the linker on OSX. The result is not very satisfying.

delcypher added a comment.EditedMay 31 2016, 5:10 PM

@kcc:

The build command line for libFuzzer should remain very simple and the same across platforms.
It's easy to achieve, so we should just do it.

Okay here's the result. I'm not very happy with. It seems that on Linux to get this to work the linker has to be passed -export-dynamic (otherwise dlsym() can't see the symbols we want to get) and told to link against the dl library for this to work.

A little bit ironically I don't have to tell the Darwin linker to do anything special in this patch. So in my original patch I had to tell the Darwin linker to do extra work and in this patch I have to tell GNU ld to do extra work. So now this patch causes hassle under Linux :(

Perhaps a good middle ground would be to make the Linux implementation of ExternalFunctions::Init() use weak symbols (as a way of getting the addresses to assign to the function pointers) that way no special linking is required on either platform. Doing this though will either require some ifdefs in FuzzerExtFunctions.cpp or require two different versions FuzzerExtFunctionsLinux.cpp and FuzzerExtFunctionsDarwin.cpp.

Thoughts?

delcypher updated this revision to Diff 59160.May 31 2016, 5:14 PM

Upload correct patch (Sorry the old one was wrong).

kcc added a comment.May 31 2016, 5:23 PM

No extra linker flags, please.
No ifdefs, other than for an entire file, please.
I am ok with this approach:
Create FuzzerExtFunctionsWeak.cpp and FuzzerExtFunctionsDlsym.cpp,
have weak functions in the first one and dlsym in the second one.
Have a file scope ifdefs in these files (#if LIBFUZZER_LINUX in the first one, #if LIBFUZZER_APPLE in the second one)

lib/Fuzzer/FuzzerInternal.h
475 ↗(On Diff #59160)

Does it have to be passed as a parameter?
Why not just this?
ExternalFunctions EF;

lib/Fuzzer/test/FuzzerUnittest.cpp
417 ↗(On Diff #59160)

I wouldn't worry too much, dlsym is not that slow.
And if you follow my other comment you won't need any of this.

In D20741#445133, @kcc wrote:

No extra linker flags, please.
No ifdefs, other than for an entire file, please.
I am ok with this approach:
Create FuzzerExtFunctionsWeak.cpp and FuzzerExtFunctionsDlsym.cpp,
have weak functions in the first one and dlsym in the second one.
Have a file scope ifdefs in these files (#if LIBFUZZER_LINUX in the first one, #if LIBFUZZER_APPLE in the second one)

There is nothing specific to Linux or Apple in these file, having such a ifdef does not seem right to me. You can handle the different in the CMake (linking one or the other for instance).

kcc added a comment.May 31 2016, 5:31 PM
You can handle the different in the CMake  (linking one or the other for instance).

No.
The build command line for libFuzzer outside of LLVM build system should remain what it is now

Sure, but ifdef linux and ifdef Apple are still not correct, so back to the whiteboard...

kcc added a comment.May 31 2016, 5:36 PM

Sure, but ifdef linux and ifdef Apple are still not correct,

Maybe not.
Although: weak is not properly supported on Mac, and dlsym does not work easy enough for linux in this case.
So maybe yes.

so back to the whiteboard...

No objections :)

Talked to Nick about the linker on Darwin, and he's confirming that we shouldn't rely on weak external, or even on weak override as it is too easy to get wrong.
He's suggesting that the most reliable way is to handle this at the API level, i.e. something like:

int MyTestInput(const uint8_t *data, size_t size);

struct FuzzerAPI {
  int (*TestOneInput)(const uint8_t *data, size_t size);
//  ... other overridable APIs
};
extern "C" void LLVMFuzzerRegistration(FuzzerAPI *API) {
  API->TestOneInput = MyTestInput;
}

That adds a few lines of boilerplate on the client side...

kcc added a comment.May 31 2016, 5:56 PM

No. The user interface needs to remain as is.

@kcc @mehdi_amini

so back to the whiteboard...

No objections :)

So am I going with @kcc 's proposal?

I think it's probably fine for now but I don't think the goal of having a really simple compilation command line for LibFuzzer is sustainable long term (especially if someone decides to add support for Windows). It may be necessary in the future to divorce LibFuzzer from the LLVM build system and give LibFuzzer its own build system to do the work of compiling for different platforms (instead of having a simple command line to build the library).

lib/Fuzzer/FuzzerInternal.h
475 ↗(On Diff #59160)

That is also fine. As a rule of thumb I try to avoid passing structs by value when possible but if you would rather it was passed by value I can do that.

lib/Fuzzer/test/FuzzerUnittest.cpp
417 ↗(On Diff #59160)

Which comment are you referring to? Changing to passing the struct by value won't change having to call ExternalFunctions::Init(). I can drop this comment if you'd like though.

kcc added a comment.May 31 2016, 6:04 PM

I think it's probably fine for now but I don't think the goal of having a really simple compilation command line for LibFuzzer is sustainable long term

I am not convinced. Worked well so far. We have a very similar code structure for the sanitizers, btw,
and they, at least asan, are ported to more platforms than the rest of LLVM is.

In D20741#445216, @kcc wrote:

I think it's probably fine for now but I don't think the goal of having a really simple compilation command line for LibFuzzer is sustainable long term

I am not convinced. Worked well so far. We have a very similar code structure for the sanitizers, btw,
and they, at least asan, are ported to more platforms than the rest of LLVM is.

But aren't the sanitizers a little different? They are part of compiler-rt which AFAIK is always built using clang (or maybe gcc?). LibFuzzer however could conceivably be built with MSVC (which is a very different beast) one day. Of course the code that needs to be instrumented has to be built with clang so my argument here is a little weak.

In D20741#445216, @kcc wrote:

I think it's probably fine for now but I don't think the goal of having a really simple compilation command line for LibFuzzer is sustainable long term

I am not convinced. Worked well so far.

Well actually I'm not sure it did work that well: nobody could follow the doc and build on OSX.

We have a very similar code structure for the sanitizers, btw,
and they, at least asan, are ported to more platforms than the rest of LLVM is.

Any other precedent on relying on the user to provide/override some APIs in any sanitizer?

In D20741#445216, @kcc wrote:

I think it's probably fine for now but I don't think the goal of having a really simple compilation command line for LibFuzzer is sustainable long term

I am not convinced. Worked well so far.

Well actually I'm not sure it did work that well: nobody could follow the doc and build on OSX.

LibFuzzer is not supported on OS X, so there was no expectation that following the instructions would work.

Simple user interface (command line and API) should be the highest priority here. I believe using dlsym on the Mac and weak linking on Linux is the best option. (Sanitizers implement platform specific behavior as well.)

In D20741#445216, @kcc wrote:

I think it's probably fine for now but I don't think the goal of having a really simple compilation command line for LibFuzzer is sustainable long term

I am not convinced. Worked well so far.

Well actually I'm not sure it did work that well: nobody could follow the doc and build on OSX.

LibFuzzer is not supported on OS X, so there was no expectation that following the instructions would work.

The doc only mentions Windows as unsupported because "The sanitizer coverage support does not work on Windows either as of 01/2015" (see http://llvm.org/pre-releases/3.8.0/rc2/docs/LibFuzzer.html#q-what-about-windows-then-the-fuzzer-contains-code-that-does-not-build-on-windows).
So it seem to me to be a reasonable expectation to have it build on most platform that can build LLVM.

delcypher updated this revision to Diff 59173.May 31 2016, 9:11 PM
delcypher retitled this revision from [LibFuzzer] Fix weak linking issues building tests under OSX. to [LibFuzzer] Reimplement how the optional user functions are called..
delcypher updated this object.

Use different implementation for initializing fuzzer::ExternalFunctions on Linux and OSX

delcypher marked 6 inline comments as done.May 31 2016, 9:12 PM
mehdi_amini added inline comments.May 31 2016, 9:21 PM
lib/Fuzzer/FuzzerExtFunctionsDlsym.cpp
15 ↗(On Diff #59173)

Again, there is nothing Apple specific here AFAIK, please rename.

@kcc : This is my attempt at implementing what you wanted. Please review when you have time.

delcypher added inline comments.May 31 2016, 9:36 PM
lib/Fuzzer/FuzzerExtFunctionsDlsym.cpp
15 ↗(On Diff #59173)

The comments above explain why this is the implementation we want to use on Apple platforms. Although this code compiles on Linux it does not work as intended unless an extra flag is passed to the linker by the client when linking against LibFuzzer. If we assume for the moment that we want LibFuzzer to be used with a platform's default linking behavior then this implementation is specific to the behavior of the Darwin linker (and possibly other non GNU linkers). If later it turns out we also need to use this implementation for other platforms (e.g. the *BSDs) then the #ifdef can be appropriately modified in the future.

If I had it my way I wouldn't use #ifdefs and would tell CMake to compile the appropriate file based on the host platform but that option is not available to me.

lib/Fuzzer/FuzzerExtFunctionsDlsym.cpp
15 ↗(On Diff #59173)

This code is currently only executed on "Apple" platforms so, in IMHO, spelling out which platform this code will be running on is better for readability than using some abstract name.

On the other hand, should we use the names consistent with what is declared in lib/sanitizer_common/sanitizer_platform.h? Ex: LIBFUZZER_MAC instead if LIBFUZZER_APPLE... LIBFUZZER_MAC is a horrible name (for including OSX and iOS), but consistency might be more important.

mehdi_amini added inline comments.May 31 2016, 10:00 PM
lib/Fuzzer/FuzzerExtFunctionsDlsym.cpp
15 ↗(On Diff #59173)

None of what you're writing here is justifying the naming IMO, so again it should be something like:

#ifdef LIBFUZZER_USE_DLOPEN
...
#endif

I'm more flexible on how you get the default, but I'd rather write it in a way that can be overridden through a build setting, i.e. something along these lines (haven't checked the syntax):

#ifndef LIBFUZZER_USE_DLOPEN
  #if LIBFUZZER_LINUX
    #define LIBFUZZER_USE_WEAKLINK
  #else
    #define LIBFUZZER_USE_DLOPEN
  #endif
#else
  #ifdef LIBFUZZER_USE_WEAKLINK
    #error "Can't set both LIBFUZZER_USE_DLOPEN and LIBFUZZER_USE_WEAKLINK"
  #endif
#endif
mehdi_amini added inline comments.May 31 2016, 10:02 PM
lib/Fuzzer/FuzzerExtFunctionsDlsym.cpp
15 ↗(On Diff #59173)

(I was replying to delcypher and not Anna as the email thread suggests).

zaks.anna added inline comments.May 31 2016, 10:14 PM
lib/Fuzzer/FuzzerExtFunctionsDlsym.cpp
15 ↗(On Diff #59173)

I find this much less readable.

mehdi_amini added inline comments.May 31 2016, 10:25 PM
lib/Fuzzer/FuzzerExtFunctionsDlsym.cpp
15 ↗(On Diff #59173)

You're saying that

#ifdef LIBFUZZER_USE_DLOPEN
...
#endif

is less readable than

#ifdef LIBFUZZER_APPLE
...
#endif

I'm missing your point here.

kcc added inline comments.May 31 2016, 10:45 PM
lib/Fuzzer/FuzzerExtFunctionsDlsym.cpp
15 ↗(On Diff #59173)

Either is fine for me here.
directly using #ifdef *_APPLE and *_LINUX is shorter.
Using a proxy macros *_USE_* shows the intent more clear but adds clatter.

Decide between yourselves. :)

kcc added inline comments.May 31 2016, 10:51 PM
lib/Fuzzer/FuzzerDriver.cpp
422 ↗(On Diff #59173)

move all of this into the other FuzzerDriver.
This one is an atavism of my poor design choice in the past, let's not resurrect it.

lib/Fuzzer/FuzzerInternal.h
475 ↗(On Diff #59173)

I did not suggest to pass it by value.
Can't it be created inside Fuzzer::Fuzzer with a default CTOR?

zaks.anna added inline comments.Jun 1 2016, 9:15 AM
lib/Fuzzer/FuzzerExtFunctionsDlsym.cpp
15 ↗(On Diff #59173)

You're saying that
#ifdef LIBFUZZER_USE_DLOPEN
...
#endif
is less readable than
#ifdef LIBFUZZER_APPLE
...
#endif
I'm missing your point here.

Yes, it would be more difficult to understand if code is executed on the given platform or not as you are introducing a level of indirection.

I'd rather write it in a way that can be overridden through a build setting,

Overriding a build setting would add yet another level of indirection. Debugging this won't be fun especially given all the different ways in which this could be built by various parties.

Unless there is a real reason we want to override it, I'd rather keep things simple.

mehdi_amini added inline comments.Jun 1 2016, 10:20 AM
lib/Fuzzer/FuzzerExtFunctionsDlsym.cpp
15 ↗(On Diff #59173)

I disagree with all of what you wrote above Anna.
But at the same time I don't work on sanitizers usually so I won't weight any more on this.

delcypher added inline comments.Jun 1 2016, 11:59 AM
lib/Fuzzer/FuzzerDriver.cpp
422 ↗(On Diff #59173)

A problem with your suggestion is it makes calling the LLVMFuzzerInitialize(int *argc, char ***argv) function difficult because we don't have the argv anymore (inside static int FuzzerDriver(const std::vector<std::string> &Args, UserCallback Callback, fuzzer::ExternalFunctions EF) ). We can emulate argc using Args.size() but argv is more of pain because we have std::vector<std::string> not std::vector<const char*>.

Also if the intention of LLVMFuzzerInitialize(int *argc, char ***argv) is to allow the command line arguments to be changed then what you're proposing doesn't work because the user needs to make the modifications before we create the std::vector<std::string> not after we create it.

lib/Fuzzer/FuzzerInternal.h
475 ↗(On Diff #59173)

We can create an instance of ExternalFunctions inside the Fuzzer class in the default constructor but then we have the problem of also needing the call LLVMFuzzerInitialize which requires an instance of ExternalFunctions and that has to be done before we create the Fuzzer class.

So either we need to make two instances of ExternalFunctions and initialize both of them (one inside the Fuzzer class and one outside) OR a single instance of ExternalFunctions which we initialize and pass to the Fuzzer class.

I can do either. Decide which you'd prefer and I will implement that.

kcc added inline comments.Jun 1 2016, 12:44 PM
lib/Fuzzer/FuzzerDriver.cpp
422 ↗(On Diff #59173)

Can you simply merge these two functions leaving just this one?
FuzzerDriver(int *argc, char ***argv, UserCallback Callback);
using std::vector there is unnecessary.

lib/Fuzzer/FuzzerExtFunctions.h
26 ↗(On Diff #59173)

We still have a duplication between here and FuzzerExtFunction.def
Can you use a bit more macro tricks to eliminate FuzzerExtFunction.def?

lib/Fuzzer/FuzzerInternal.h
475 ↗(On Diff #59173)

I'd prefer to have two instances of EF. That's a tiny bit slower at startup but leaves the code simpler.

delcypher updated this revision to Diff 59323.Jun 1 2016, 6:04 PM
delcypher updated this object.
  • Remove static int FuzzerDriver(const std::vector<std::string> &Args, UserCallback Callback).
  • Initialize a new ExternalFunctions inside the Fuzzer class.
  • Rename INIT_FUNC macro to EXT_FUNC to better reflect how its being used.
  • Use INIT_FUNC macro to add function pointer declaration to the ExternalFunctions macro to reduce redundancy.
delcypher marked 5 inline comments as done.Jun 1 2016, 6:11 PM
delcypher added inline comments.
lib/Fuzzer/FuzzerExtFunctions.h
27 ↗(On Diff #59323)

I've done this. One place where there is technically still duplication is in FuzzerExtFunctionsWeak.cpp with the declaration of the weak functions but I'd rather leave that how it is currently because I don't want too much macro magic.

lib/Fuzzer/FuzzerExtFunctionsDlsym.cpp
16 ↗(On Diff #59323)

In interest of a simple implementation I'm sticking with the LIBFUZZER_APPLE and LIBFUZZER_LINUX for now. We can revisit this in the future if it really becomes a problem.

kcc added a comment.Jun 1 2016, 6:17 PM

we are converging! :)

lib/Fuzzer/FuzzerExtFunctions.def
19 ↗(On Diff #59323)

s/ ,false/, false

lib/Fuzzer/FuzzerExtFunctions.h
22 ↗(On Diff #59323)

Why not ExternalFunctions::ExternalFunctions?

lib/Fuzzer/FuzzerExtFunctionsDlsym.cpp
48 ↗(On Diff #59323)

} // namespace fuzzer

lib/Fuzzer/FuzzerExtFunctionsWeak.cpp
21 ↗(On Diff #59323)

can you also use FuzzerExtFunctions.def here?

lib/Fuzzer/FuzzerMain.cpp
12 ↗(On Diff #59323)

do you still need this?

lib/Fuzzer/test/FuzzerUnittest.cpp
4 ↗(On Diff #59323)

do you still need this?

delcypher marked an inline comment as done.Jun 1 2016, 6:59 PM
delcypher added inline comments.
lib/Fuzzer/FuzzerExtFunctions.h
22 ↗(On Diff #59323)

My original thinking here is that you might not want to initialize immediately after allocating memory for an instance of ExternalFunctions. My worry is that if ExternalFunctions was ever created in some early start up code (i.e. before main() is called) then we might not want to call dlsym() at this time.

This is quite a narrow use case (that's not even required right now) so perhaps it would be better to just use a normal class initializer?

lib/Fuzzer/FuzzerExtFunctionsWeak.cpp
21 ↗(On Diff #59323)

Doing so would reduce clarity but I can if you want.

lib/Fuzzer/FuzzerMain.cpp
12 ↗(On Diff #59323)

Nope. Good catch.

lib/Fuzzer/test/FuzzerUnittest.cpp
4 ↗(On Diff #59323)

Nope. Good catch.

delcypher updated this revision to Diff 59334.Jun 1 2016, 7:28 PM
  • Add closing namespace comments
  • Fix minor white space issue
  • Drop unused includes
delcypher marked 8 inline comments as done.Jun 1 2016, 7:30 PM
delcypher added inline comments.Jun 1 2016, 7:32 PM
lib/Fuzzer/FuzzerExtFunctions.h
23 ↗(On Diff #59334)

@kcc This (not using ExternalFunctions::ExternalFunctions) is the only issue you raised that I haven't tackled. What would you like to do here?

kcc added inline comments.Jun 1 2016, 7:56 PM
lib/Fuzzer/FuzzerExtFunctions.def
19 ↗(On Diff #59334)

80 chars per line

lib/Fuzzer/FuzzerExtFunctions.h
23 ↗(On Diff #59334)

Use a CTOR

delcypher updated this revision to Diff 59335.Jun 1 2016, 8:58 PM
delcypher updated this object.
  • Remove 80 column violation in FuzzerExtFunctions.def
  • Use CTOR for ExternalFunctions rather than explicit Init() function.
kcc accepted this revision.Jun 1 2016, 9:15 PM
kcc edited edge metadata.

LGTM with 3 nits.
Ok to commit once you fix them (assuming all tests still work on Linux)
Thanks!

lib/Fuzzer/FuzzerExtFunctionsDlsym.cpp
1 ↗(On Diff #59335)

Fix the file name in the comment

lib/Fuzzer/FuzzerExtFunctionsWeak.cpp
1 ↗(On Diff #59335)

fix file name

lib/Fuzzer/FuzzerLoop.cpp
150 ↗(On Diff #59335)

Why do you need this now?

This revision is now accepted and ready to land.Jun 1 2016, 9:15 PM
delcypher updated this revision to Diff 59336.Jun 1 2016, 9:45 PM
delcypher edited edge metadata.
  • Fix incorrect filenames in comments
  • Remove unnecessary initialization in initializer list.

@kcc

In D20741#446582, @kcc wrote:

LGTM with 3 nits.
Ok to commit once you fix them (assuming all tests still work on Linux)
Thanks!

Nits fixed. I've rebased and the tests have broken under Linux. Looks like I need to upgrade to newer compiler-rt due to the introduction of a dependence on __sanitizer_print_memory_profile (r271465). I'm upgrading now and if tests pass after building a newer clang with the latest compiler-rt then I'll commit this.

kcc added a comment.Jun 1 2016, 10:23 PM

Looks like I need to upgrade to newer compiler-rt

You always should.

This revision was automatically updated to reflect the committed changes.