This change allows corpus merging to be performed in two steps. This is useful
when the user wants to address the following two points simultaneously:
- Get trustworthy incremental stats for the coverage and corpus size changes when adding new corpus units.
- Make sure shorter units are preferred when two or more units give the same unique signal (equivalent to the REDUCE logic).
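The size preference in the second point can be pictured with a small greedy model. The sketch below is a simplified illustration in Python, not libFuzzer's actual merge code: the select_units helper and the (name, data, features) tuples are hypothetical, with "features" standing in for the signal a unit produces.

```python
def select_units(units):
    """Greedy model: visit units from shortest to longest and keep a unit
    only if it contributes at least one feature not covered so far.

    `units` is a list of (name, data, features) tuples; all three fields
    are hypothetical stand-ins used for illustration only.
    """
    covered = set()
    kept = []
    # Sort by size first, then by name so that ties are resolved deterministically.
    for name, data, features in sorted(units, key=lambda u: (len(u[1]), u[0])):
        new_features = features - covered
        if new_features:
            kept.append(name)
            covered |= new_features
    return kept


units = [
    ("long", b"AAAAAAAA", {1, 2}),   # same signal as "short", but larger
    ("short", b"AA", {1, 2}),
    ("other", b"AAAA", {3}),
]
print(select_units(units))  # ['short', 'other']
```

Because the shortest unit is visited first, "long" adds nothing new by the time it is considered and is dropped.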
This solution was brainstormed together with @kcc; hopefully it looks good to
other people too. The proposed use case scenario:
- We have a fuzz_target binary and an existing_corpus directory.
- We do fuzzing and write new units into the new_corpus directory.
- We want to merge the new corpus into the existing corpus and satisfy the points mentioned above.
- We create an empty directory merged_corpus and run the first merge step:
./fuzz_target -merge=1 -merge_control_file=MCF ./merged_corpus ./existing_corpus
this provides the initial stats for existing_corpus, e.g. from the output:
MERGE-OUTER: 3 new files with 11 new features added; 11 new coverage edges
- We recreate the merged_corpus directory and run the second merge step:
./fuzz_target -merge=1 -merge_control_file=MCF ./merged_corpus ./existing_corpus ./new_corpus
this provides the final stats for the merged corpus, e.g. from the output:
MERGE-OUTER: 6 new files with 14 new features added; 14 new coverage edges
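The two steps above could be scripted roughly as follows. This is a minimal sketch in Python, not part of the change itself: run_merge_step, parse_merge_stats and incremental_stats are hypothetical helper names, the regular expression assumes the summary line has exactly the format quoted above, and stdout and stderr are captured together so that no assumption is made about which stream carries the summary.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Assumes the summary line format shown in the example output above.
MERGE_OUTER_RE = re.compile(
    r"MERGE-OUTER: (\d+) new files with (\d+) new features added; "
    r"(\d+) new coverage edges"
)


def run_merge_step(fuzz_target, control_file, merged_dir, input_dirs):
    """Recreate merged_dir, run one merge step, and return the combined output."""
    shutil.rmtree(merged_dir, ignore_errors=True)
    Path(merged_dir).mkdir(parents=True)
    cmd = [fuzz_target, "-merge=1", f"-merge_control_file={control_file}",
           merged_dir, *input_dirs]
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True, check=True)
    return result.stdout


def parse_merge_stats(output):
    """Return (files, features, edges) from the last MERGE-OUTER summary line."""
    matches = MERGE_OUTER_RE.findall(output)
    if not matches:
        raise ValueError("no MERGE-OUTER summary line found")
    return tuple(int(value) for value in matches[-1])


def incremental_stats(step1_output, step2_output):
    """Difference between the second-step totals and the first-step totals."""
    first = parse_merge_stats(step1_output)
    second = parse_merge_stats(step2_output)
    return tuple(after - before for before, after in zip(first, second))


# Using the two example summary lines quoted above instead of a real run:
step1 = "MERGE-OUTER: 3 new files with 11 new features added; 11 new coverage edges"
step2 = "MERGE-OUTER: 6 new files with 14 new features added; 14 new coverage edges"
print(incremental_stats(step1, step2))  # (3, 3, 3)
```

With a real binary, the two outputs would come from calling run_merge_step first with ["./existing_corpus"] and then with ["./existing_corpus", "./new_corpus"], passing the same control file both times. The resulting tuple is the net change in merged corpus size, features and coverage edges attributable to new_corpus.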
Alternative solutions to this approach are:
A) Store precise coverage information for every unit (not only unique signal).
B) Execute the same two steps without reusing the control file.
Both would be suboptimal: (A) would impose extra disk load and (B) extra CPU
load, which is costly given the quadratic complexity in the worst case.
Tested on Linux, Mac, Windows.