This is an archive of the discontinued LLVM Phabricator instance.

Extract and expose FuzzerMonitor C interface.
AbandonedPublic

Authored by aarongreen on Jan 12 2021, 9:38 AM.

Details

Summary

This change surfaces the static Fuzzer callbacks used to handle errors, in anticipation of them being called by a platform-specific IPC library to create cross-process fuzzers.

It also adds a PID parameter to process-specific diagnostic functions used by those callbacks, e.g. PrintStackTrace. Currently, this parameter is always 0, which indicates the local process.

This is change 9 of (at least) 18 for cross-process fuzzing support.

Diff Detail

Event Timeline

aarongreen created this revision.Jan 12 2021, 9:38 AM
aarongreen requested review of this revision.Jan 12 2021, 9:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2021, 9:38 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
aarongreen updated this revision to Diff 321775.Feb 5 2021, 8:41 AM

Interface change LGTM.

For the PID stuff, what is the plan for process-specific reports? To send a signal to the specified process to get a stack trace/memory profile?

compiler-rt/lib/fuzzer/FuzzerInternal.h
86–90

Should we use pid_t?

104

Why do we pass a PID to a function that returns a PID?

compiler-rt/lib/fuzzer/FuzzerLoop.cpp
238
305

PID is unused.

aarongreen updated this revision to Diff 329128.Mar 8 2021, 1:37 PM
aarongreen marked 2 inline comments as done.

Interface change LGTM.

For the PID stuff, what is the plan for process-specific reports? To send a signal to the specified process to get a stack trace/memory profile?

That seemed the simplest way forward, yes. I realize I'm punting a bit on the question of where exactly the stack trace and memory profile get printed as it's OS-specific. As an example, for Linux, I assume the remote process is started by the engine process and inherits its stdout/stderr. The PrintStack/PrintMemoryProfile calls are not performance sensitive and should be synchronous, so this would put the process details in the "right" place. I recognize cases where Printf isn't going to stdout, but to a fuzz-N.log might need more attention.

compiler-rt/lib/fuzzer/FuzzerInternal.h
86–90

I'd be happy to, but ::fuzzer::GetPid() returns unsigned long. I'm willing to slip a change under this one that changes that everywhere, although I expect some spots will need a little more massaging (since pid_t is signed).

Would you prefer this?

104

The idea here is if you pass 0, you get the current PID, and otherwise you get whatever you pass. That way the error callbacks can have a default PID=0 that is replaced when called from a remote process, but isn't if called from the current process.

But there's things I don't like about it on reflection: a) it's private and doesn't touch the object, so it doesn't need to be a method, and b) both its name and function clearly lead to confusion. Since it's only used 3 times, I'm replacing it with PID ? PID : GetPid() (where GetPid here is ::fuzzer::GetPid from FuzzerUtil.h).

compiler-rt/lib/fuzzer/FuzzerLoop.cpp
305

This is true, but at the risk of violating YAGNI, I'd like to pass along the identifier from the remote process to match the other remotely-callable error callbacks, and in case the implementation of LeakCallback changes in the future. Having the remote PID in the remote interface seems prudent to me, but WDYT?

That seemed the simplest way forward, yes. I realize I'm punting a bit on the question of where exactly the stack trace and memory profile get printed as it's OS-specific. As an example, for Linux, I assume the remote process is started by the engine process and inherits its stdout/stderr. The PrintStack/PrintMemoryProfile calls are not performance sensitive and should be synchronous, so this would put the process details in the "right" place. I recognize cases where Printf isn't going to stdout, but to a fuzz-N.log might need more attention.

Sorry if I'm misunderstanding the design. But how will we prevent stack trace skew from signals being delivered asynchronously?

i.e. We need a stack trace *now*, so we send a signal to that process. But by the time that process receives the signal, the IP is somewhere else, maybe a different function. So our stack trace is wrong.

compiler-rt/lib/fuzzer/FuzzerInternal.h
86–90

pid_t seems more appropriate to me, but this is a nit.

compiler-rt/lib/fuzzer/FuzzerLoop.cpp
305

Sounds good.

aarongreen added a comment.EditedMar 10 2021, 9:03 AM

Sorry if I'm misunderstanding the design. But how will we prevent stack trace skew from signals being delivered asynchronously?

i.e. We need a stack trace *now*, so we send a signal to that process. But by the time that process receives the signal, the IP is somewhere else, maybe a different function. So our stack trace is wrong.

You're correct. I've addressed this in both the implementations I've done so far, but I haven't stipulated it in the interface and I probably need to. On both Linux and Fuchsia, if an error callback is invoked, the calling thread in the remote process takes over handling responses from the engine process. It then sits in a loop handling any requests that come in until the connection is reset, at which point it exits. In this way, the code never returns from the stack frame above the error callback, and the stack trace is correct (modulo missing the actual call to, e.g., fuzzer::Fuzzer::LeakCallback). I'll add that requirement to the comments in FuzzerMonitor.h; if an implementer fails to do so, it should be obvious when the llvm-lit tests fail.

Sorry if I'm misunderstanding the design. But how will we prevent stack trace skew from signals being delivered asynchronously?

i.e. We need a stack trace *now*, so we send a signal to that process. But by the time that process receives the signal, the IP is somewhere else, maybe a different function. So our stack trace is wrong.

You're correct. I've addressed this in both the implementations I've done so far, but I haven't stipulated it in the interface and I probably need to. On both Linux and Fuchsia, if an error callback is invoked, the calling thread in the remote process takes over handling responses from the engine process. It then sits in a loop handling any requests that come in until the connection is reset, at which point it exits. In this way, the code never returns from the stack frame above the error callback, and the stack trace is correct (modulo missing the actual call to, e.g., fuzzer::Fuzzer::LeakCallback). I'll add that requirement to the comments in FuzzerMonitor.h; if an implementer fails to do so, it should be obvious when the llvm-lit tests fail.

Please add the comments, and I'll review this again.

aarongreen abandoned this revision.Sep 1 2021, 9:01 AM

Multiprocess fuzzing will not be supported by the libFuzzer maintainers. Fuchsia has implemented a new approach with their Component Fuzzing Framework (RFC-117).