This is an archive of the discontinued LLVM Phabricator instance.

[libFuzzer] Add a command-line option for tracing mutation of corpus inputs in the dot graph format.
ClosedPublic

Authored by dokyungs on Aug 25 2020, 11:10 AM.

Details

Summary

This patch adds a new command-line option -mutation_graph_file=FILE for
debugging purposes, which traces how corpus inputs evolve during a fuzzing
run. For each new input that is added to the corpus, a new vertex corresponding
to the added input, as well as a new edge that connects its base input to itself
are written to the given file. Each vertex is labeled with the filename of the
input, and each edge is labeled with the mutation sequence that led to the input
w.r.t. its base input.

The format of the mutation graph file is the dot file format. Once prepended and
appended with "graph {" and "}", respectively, the graph becomes a valid dot
file and can be visualized.

Diff Detail

Event Timeline

dokyungs created this revision.Aug 25 2020, 11:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2020, 11:10 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
dokyungs requested review of this revision.Aug 25 2020, 11:10 AM
hctim added inline comments.Aug 25 2020, 5:42 PM
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
473

Isn't std::string a more appropriate type here? We're not creating a Unit to dump, we're creating a string vertex to dump.

475

In the cases where we have a base input, we don't need to create a blank vertex, right? So shouldn't this be conditional based on not having a base input?

477

why the "V" prefixes (is there something like having a decimal at the start of the name invalid?)

if so, can we escape with quotes instead?

478

nit - instead of using operator+ inside of append, break each concatenation into its own append (and same elsewhere).

Especially for more-than-two strings being appended each time, creating the temporary is expensive: https://quick-bench.com/q/wHhQi64BxEKJyv2m08OzQ1Yzy9g

(i.e. a + b + c requires a temporary to store the result of a + b, which can be expensive)

480–482

U += newVertex; ?

492–494

same here, U += newEdge?

compiler-rt/test/fuzzer/mutation-graph.test
10

Could you grab the vertex (perhaps the one that does the edge from H -> Hi) and explicitly enter the SHA's for that mutation here? Probably don't want to include the type of the mutation (in case that changes over time), but the testcase contents should be the same.

dokyungs updated this revision to Diff 289860.Sep 3 2020, 9:42 PM

Addressed comments.

dokyungs marked 7 inline comments as done.Sep 3 2020, 9:52 PM
dokyungs added inline comments.
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
473

Yes. std::string sounds much more appropriate. The patch now uses std::string.

475

BaseII being null means that II is seed input. So we should always create a vertex both for seed input and non-seed input, but we don't need any edge for the seed input. That's why the edge addition below is conditioned on BaseII.

477

Escaped with quotes.

478

Thanks for the benchmark! I changed every operator+ to .append.

492–494

I am appending to a single output string for both vertex and edge. How does it look now?

compiler-rt/test/fuzzer/mutation-graph.test
10

I am now grabbing two vertex * edge pairs. The test still doesn't check, however, from what input "H" and "Hi" was found, because there are many candidates (e.g., "H" -> "Hx" -> "Hi" or "H" -> "HHH" -> "Hi", etc.) if we give a random seed.

Still I give seed=1 here to avoid the cases where libFuzzer finds Hi in a single mutation, although it's very unlikely.

hctim accepted this revision.Sep 8 2020, 1:21 PM

Mostly just some nits - other than that LGTM.

compiler-rt/lib/fuzzer/FuzzerFlags.def
77–78

"internal flag. Saves a graph (in DOT format) to mutation_graph_file. The graph contains a vertex for each input that has unique coverage; directed edges are provided between parents and children where the child has unique coverage, and are recorded with the type of mutation that caused the child."

compiler-rt/lib/fuzzer/FuzzerIO.cpp
86

Shouldn't mode "a" be sufficient here?

compiler-rt/lib/fuzzer/FuzzerLoop.cpp
487–489

nit - collapse these "\" -> \""

491–492

nit - again collapse

This revision is now accepted and ready to land.Sep 8 2020, 1:21 PM
dokyungs updated this revision to Diff 290586.Sep 8 2020, 2:34 PM
dokyungs marked 6 inline comments as done.

Addressed comments.

dokyungs marked 4 inline comments as done.Sep 8 2020, 2:35 PM
dokyungs added inline comments.
compiler-rt/lib/fuzzer/FuzzerIO.cpp
86

Yes!

morehouse accepted this revision.Sep 8 2020, 5:18 PM

This is a great patch, thank you DK!

compiler-rt/lib/fuzzer/FuzzerFlags.def
77

I think this is not an "internal flag", since it needs to be passed by the user.

compiler-rt/lib/fuzzer/FuzzerIO.cpp
81

s/c_str/data

dokyungs updated this revision to Diff 290627.Sep 8 2020, 7:51 PM
dokyungs marked an inline comment as done.

Addressed comments.

dokyungs retitled this revision from [libFuzzer] Add an internal flag for tracing mutation of corpus inputs in the dot graph format. to [libFuzzer] Add a command-line option for tracing mutation of corpus inputs in the dot graph format..Sep 8 2020, 7:55 PM
dokyungs edited the summary of this revision. (Show Details)
dokyungs marked 2 inline comments as done.Sep 8 2020, 8:28 PM
dokyungs added inline comments.
compiler-rt/lib/fuzzer/FuzzerFlags.def
77

Removed. Thanks!

compiler-rt/lib/fuzzer/FuzzerIO.cpp
81

Done.

hctim accepted this revision.Sep 8 2020, 8:34 PM

LGTM

This revision was landed with ongoing or failed builds.Sep 8 2020, 8:42 PM
This revision was automatically updated to reflect the committed changes.
dokyungs marked 2 inline comments as done.