This is an archive of the discontinued LLVM Phabricator instance.

Differential D66107

[libFuzzer] Make -merge=1 to reuse coverage information from the control file.
ClosedPublic

Authored by Dor1s on Aug 12 2019, 1:06 PM.

Details

Reviewers

morehouse
metzman
hctim
kcc

Commits

rL371620: [libFuzzer] Make -merge=1 to reuse coverage information from the control file.
rGf054067f276d: [libFuzzer] Make -merge=1 to reuse coverage information from the control file.
rCRT371620: [libFuzzer] Make -merge=1 to reuse coverage information from the control file.

Summary

This change allows performing corpus merging in two steps. This is useful when
the user wants to address the following two points simultaneously:

Get trustworthy incremental stats for the coverage and corpus size changes when adding new corpus units.
Make sure the shorter units will be preferred when two or more units give the same unique signal (equivalent to the REDUCE logic).

This solution was brainstormed together with @kcc; hopefully it looks good to
other people too. The proposed use case scenario:

We have a fuzz_target binary and existing_corpus directory.
We do fuzzing and write new units into the new_corpus directory.
We want to merge the new corpus into the existing corpus and satisfy the points mentioned above.
We create an empty directory merged_corpus and run the first merge step:

./fuzz_target -merge=1 -merge_control_file=MCF ./merged_corpus ./existing_corpus

this provides the initial stats for existing_corpus, e.g. from the output:

MERGE-OUTER: 3 new files with 11 new features added; 11 new coverage edges

We recreate merged_corpus directory and run the second merge step:

./fuzz_target -merge=1 -merge_control_file=MCF ./merged_corpus ./existing_corpus ./new_corpus

this provides the final stats for the merged corpus, e.g. from the output:

MERGE-OUTER: 6 new files with 14 new features added; 14 new coverage edges
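
The incremental stats are the difference between the two summary lines above. A minimal sketch of how a caller might compute them from captured merge logs follows. `MergeStats`, `MergeDelta`, `ParseMergeStats` and `Subtract` are hypothetical names introduced for illustration and are not part of libFuzzer; the only thing taken from this revision is the format of the `MERGE-OUTER` summary line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Counters from the final "MERGE-OUTER: N new files with ..." summary line.
struct MergeStats {
  size_t Files = 0;
  size_t Features = 0;
  size_t Edges = 0;
};

// Signed differences between two merge steps. The file count may shrink if
// smaller units displace larger ones, hence the signed type (assumption).
struct MergeDelta {
  long long Files;
  long long Features;
  long long Edges;
};

// Hypothetical helper, not part of libFuzzer: finds the summary line in a
// captured merge log and extracts its three counters. Returns false if no
// summary line is present.
bool ParseMergeStats(const std::string &Log, MergeStats *Out) {
  assert(Out != nullptr);
  const std::string Marker = "MERGE-OUTER: ";
  for (size_t Pos = Log.find(Marker); Pos != std::string::npos;
       Pos = Log.find(Marker, Pos + Marker.size())) {
    size_t Files = 0, Features = 0, Edges = 0;
    int Matched = std::sscanf(
        Log.c_str() + Pos,
        "MERGE-OUTER: %zu new files with %zu new features added; "
        "%zu new coverage edges",
        &Files, &Features, &Edges);
    if (Matched == 3) {
      Out->Files = Files;
      Out->Features = Features;
      Out->Edges = Edges;
      return true;
    }
  }
  return false;
}

// Stats of the second merge step minus stats of the first one.
MergeDelta Subtract(const MergeStats &Step1, const MergeStats &Step2) {
  auto Diff = [](size_t Before, size_t After) {
    return static_cast<long long>(After) - static_cast<long long>(Before);
  };
  return MergeDelta{Diff(Step1.Files, Step2.Files),
                    Diff(Step1.Features, Step2.Features),
                    Diff(Step1.Edges, Step2.Edges)};
}
```

With the two example outputs from this summary, the difference is 3 files, 3 features and 3 coverage edges contributed by new_corpus.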

Alternative solutions to this approach are:

A) Store precise coverage information for every unit (not only unique signal).
B) Execute the same two steps without reusing the control file.

Either of these would be suboptimal as it would impose an extra disk or CPU load
respectively, which is bad given the quadratic complexity in the worst case.

Tested on Linux, Mac, Windows.

Diff Detail

Repository

rCRT Compiler Runtime

Build Status

Buildable 38013
Build 38012: arc lint + arc unit

Event Timeline

Dor1s created this revision. Aug 12 2019, 1:06 PM

Herald added projects: Restricted Project, Restricted Project. Aug 12 2019, 1:06 PM

Herald added subscribers: Restricted Project, mgrang, delcypher.

fix a typo

Hi everyone,

This CL is a proof-of-concept to start the discussion. If you approve the approach and/or suggest any improvements, I'll go ahead and polish it + update the tests. Please take a look :)

Context: https://github.com/google/clusterfuzz/pull/815#issuecomment-520087538

Harbormaster completed remote builds in B36615: Diff 214695. Aug 12 2019, 1:10 PM

Harbormaster completed remote builds in B36617: Diff 214697.

Actually updated the test to prove to everyone (including myself) that this works.

Harbormaster completed remote builds in B36621: Diff 214705. Aug 12 2019, 2:25 PM

Dor1s edited the summary of this revision. Aug 12 2019, 2:31 PM

Dor1s edited the summary of this revision. Aug 12 2019, 2:50 PM

Dor1s edited the summary of this revision.

Update the test a bit more

Harbormaster completed remote builds in B36625: Diff 214714. Aug 12 2019, 2:52 PM

Friendly ping :) Feedback on the description would be the most important at this point, as I feel like I can improve the code a bit more. But if you could check out the code, that would also be great. Note there are at least two TODOs that I'll address before merging. It's still a draft, even though it works.

A few high-level thoughts/questions:

What do you plan to use the improved merge stats for?
With this change, -merge seems to no longer mean we're "merging" the input directories into the output directory, since existing files in the output can be deleted.
The change seems a bit complex. I wonder if there's a way to use the existing "greedy selection" approach to make things simpler. Maybe if we combine the output corpus SizedFile vector into the input corpus SizedFile vector before sorting and writing the initial control file.

Thanks for looking into making this change. It should be very useful for CF.
I'll try to take a look again tomorrow morning with fresh eyes.

lib/fuzzer/FuzzerDefs.h
22	Nit: before `<vector>`.
lib/fuzzer/FuzzerDriver.cpp
499 ↗	(On Diff #214714)	I'm a bit confused why WriteToOutputCorpus isn't going to truncate testcases (which I'm pretty sure we don't want). I know your CL didn't introduce this, but do you know if this is the case?
lib/fuzzer/FuzzerFork.cpp
224 ↗	(On Diff #214714)	How about doing the FilesToReplace loop inside of CrashResistantMerge instead so it doesn't need to be repeated and so that `CrashResistantMerge` doesn't need to be passed more arguments.
lib/fuzzer/FuzzerMerge.cpp
186	I'm not sure this approach with signatures works the same way reduction works within a fuzzing run (i.e. the way I think we probably want). I'll have to think about this more by tomorrow, since I haven't 100% thought this through and the code dealing with this during fuzzing is tricky. As I understand it, this implementation only replaces initial corpus elements with new ones that are smaller and are exactly the same in coverage. However, during fuzzing corpus elements are replaced if a new corpus element is found that is smaller and covers the features for which the initial element is the smallest testcase, i.e. testcases can be replaced by smaller testcases that aren't exactly equivalent. Example: Unit A is the smallest element/unit covering feature X. A only covers feature X and no other feature. Then we find Unit B which is smaller than Unit A. B covers feature X and is the only (or smallest) unit that covers Y. During fuzzing we will delete A and add B since B is now the smallest unit covering X. Here, if A were an initial testcase (primary/output corpus) and B a new one (secondary/input corpus), I think we don't delete A. Is my example correct? If so, then there may be a problem if there is an initial Unit C that is already covering Y: I don't think B will get added to the merged corpus, since it isn't a pure reduction of any testcase.
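
The A/B/C example above can be replayed against a simplified size-ordered greedy selection, along the lines of the "greedy selection" idea raised earlier in this thread. The sketch below is an illustration only and not libFuzzer's `Merger::Merge`: it assumes all units are treated as one pool (as if the output directory were empty), and `UnitInfo` and `GreedySelect` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// A corpus unit reduced to what matters for selection (hypothetical type).
struct UnitInfo {
  std::string Name;
  size_t Size;
  std::set<uint32_t> Features;
};

// Visits units smallest first and keeps a unit only if it contributes at
// least one feature that no previously kept unit covers.
std::vector<std::string> GreedySelect(std::vector<UnitInfo> Units) {
  std::stable_sort(Units.begin(), Units.end(),
                   [](const UnitInfo &A, const UnitInfo &B) {
                     return A.Size < B.Size;
                   });
  std::set<uint32_t> Covered;
  std::vector<std::string> Kept;
  for (const UnitInfo &U : Units) {
    bool AddsNewFeature = false;
    for (uint32_t F : U.Features)
      if (Covered.insert(F).second)
        AddsNewFeature = true;
    if (AddsNewFeature)
      Kept.push_back(U.Name);
  }
  return Kept;
}
```

With X = 1 and Y = 2, feeding in A (size 10, {X}), B (size 4, {X, Y}) and C (size 8, {Y}) keeps only B, because B is visited first and already covers both features. An exact-signature comparison would not reach that result, since B's feature set equals neither A's nor C's.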

Guys, thanks a lot for the feedback! Some answers below, I'll get back to the code soon.

In D66107#1628236, @morehouse wrote:

A few high-level thoughts/questions:

What do you plan to use the improved merge stats for?

In various ClusterFuzz instances we're tracking the stats for every fuzzer run. It's important as we juggle a bunch of different so-called strategies, and now we have an automated logic that auto-assigns probabilities to the strategies based on the stats. Basically, we must have trustworthy stats w.r.t. new coverage / corpus changes, and relying on libFuzzer's merge is our best call. Otherwise, we do a lot of hacky parsing and still have problems with stats in certain cases :(

Outside of CF, I think it's a reasonable feature when a user merges two or more corpuses, they would see whether that gave them any new coverage, etc.

With this change, -merge seems to no longer mean we're "merging" the input directories into the output directory, since existing files in the output can be deleted.

That's true... However, -merge is also supposed to be used for corpus minimization, and in this case it'll be doing an even better job.

The change seems a bit complex. I wonder if there's a way to use the existing "greedy selection" approach to make things simpler. Maybe if we combine the output corpus SizedFile vector into the input corpus SizedFile vector before sorting and writing the initial control file.

I didn't think it was possible as my perception of the REDUCE logic seemed to be wrong. Now, with your and Jonathan's comments, it feels like this should be possible to achieve through the existing greedy selection. Even if I have to loop through some files twice (not sure if that'll be needed, just speculating), that should be feasible, as it all happens in memory and doesn't invoke the target function.

lib/fuzzer/FuzzerDriver.cpp
499 ↗	(On Diff #214714)	Not sure I understood what exactly you meant. If you're asking about the contents of the unit (in memory), then I think it's not getting truncated here as the `MaxLen` has been applied earlier as well (e.g. when reading the inputs). If you're asking about truncating the actual corpus units on disk, I think it doesn't happen because this function calculates the filename based on the contents (through a hash), i.e. truncated file contents would be given a different filename.
lib/fuzzer/FuzzerFork.cpp
224 ↗	(On Diff #214714)	Hm, sounds like an option. Thanks!
lib/fuzzer/FuzzerMerge.cpp
186	Wow, that's a great point! I was thinking that the units should give exactly the same coverage. If that's not the case, I should be able to achieve what we need without introducing the signature stuff, and it'll likely look better :) Thanks for pointing this out!

I would prefer to not introduce this complexity.
For periodic pruning we can use an empty dir, like you describe.
For stats, we can use the overall corpus size (in bytes and in files).

In D66107#1633546, @kcc wrote:

I would prefer to not introduce this complexity.
For periodic pruning we can use an empty dir, like you describe.
For stats, we can use the overall corpus size (in bytes and in files).

Sorry, I didn't get a chance to rewrite this in a better way yet.

The problem with an empty dir is that we don't have stats for the existing corpus. In order to get those, we'd need to do an extra ./fuzzer -runs=0 ... execution for the current working corpus. And of course parse the logs yet again, calculate the difference, etc.

It is not necessary in some cases, but whenever we use corpus subset strategy or an arbitrary -max_len value, we do not get the correct information about the current coverage. Value profiling strategy is another trouble maker if we continue to calculate coverage on the user side.

Implement another solution brainstormed with @kcc

Harbormaster completed remote builds in B37674: Diff 218523. Sep 3 2019, 12:55 PM

Dor1s retitled this revision from "[libFuzzer] Improve -merge= process to account for REDUCED corpus units." to "[libFuzzer] Make -merge=1 to reuse coverage information from the control file." Sep 3 2019, 1:58 PM

Dor1s edited the summary of this revision.

Herald added a subscriber: JDevlieghere. Sep 3 2019, 1:58 PM

Dor1s edited the summary of this revision. Sep 3 2019, 1:58 PM

Hey @morehouse and @metzman,

This is a new approach that @kcc and I agreed on (Kostya hasn't seen the implementation yet though).

The main idea is that we run the merge twice (without and with the new corpus). To avoid wasting cycles, coverage information is re-used when the control file is provided by the user (and it's possible to reuse it).

Please take a look.
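
For readers skimming the thread, the way the outer merge process treats a user-supplied control file in this approach can be condensed into a small decision function. This is a sketch that mirrors the branch structure of `CrashResistantMerge` in this revision, assuming the control file has already parsed successfully; `MergePlan` and `PlanFromControlFile` are hypothetical names that do not exist in libFuzzer.

```cpp
#include <cassert>
#include <cstddef>

// What the outer merge process does with a valid, non-empty control file.
enum class MergePlan {
  NothingToDo,             // Merge already completed for exactly these inputs.
  RestartReusingCoverage,  // Inputs changed: new control file, old coverage kept.
  ResumeInterrupted        // A previous merge was interrupted: continue it.
};

// FilesInControlFile and FirstNotProcessedFile correspond to M.Files.size()
// and M.FirstNotProcessedFile; InputFiles corresponds to
// OldCorpus.size() + NewCorpus.size().
MergePlan PlanFromControlFile(size_t FilesInControlFile,
                              size_t FirstNotProcessedFile,
                              size_t InputFiles) {
  // CrashResistantMerge returns early when there is nothing to merge.
  assert(InputFiles > 0);
  if (FirstNotProcessedFile >= FilesInControlFile) {
    if (FilesInControlFile == InputFiles)
      return MergePlan::NothingToDo;
    return MergePlan::RestartReusingCoverage;
  }
  return MergePlan::ResumeInterrupted;
}
```

In the two-step scenario, the first step leaves a control file with 3 files, all processed. The second step passes 7 input files, so the counts differ and the merge restarts while reusing the recorded coverage.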

We recreate merged_corpus directory and run the second merge step:

./fuzz_target -merge=1 -merge_control_file=MCF ./new_corpus ./existing_corpus ./new_corpus

Should this be ./fuzz_target -merge=1 -merge_control_file=MCF ./merged_corpus ./existing_corpus ./new_corpus?

In D66107#1660264, @metzman wrote:

We recreate merged_corpus directory and run the second merge step:

./fuzz_target -merge=1 -merge_control_file=MCF ./new_corpus ./existing_corpus ./new_corpus

Should this be ./fuzz_target -merge=1 -merge_control_file=MCF ./merged_corpus ./existing_corpus ./new_corpus?

Oh, right :( my bad, fixing the description now!

Dor1s edited the summary of this revision. Sep 6 2019, 6:46 AM

LGTM with one nit, also asking Matt for a second review.

lib/fuzzer/FuzzerDefs.h
195	any reason to do this? we had the special types for Vector/Set due to the problem which we don't have anymore (we solved it by using a privatized STL)

Dor1s marked an inline comment as done. Sep 10 2019, 1:03 PM

Dor1s added inline comments.

lib/fuzzer/FuzzerDefs.h
195	Not really, I just thought I would follow that pattern of pre-defining STL container types using `fuzzer_allocator`. I'm fine to get rid of it, if you prefer.

Dor1s marked an inline comment as done. Sep 10 2019, 3:04 PM

Dor1s added inline comments.

lib/fuzzer/FuzzerDefs.h
195	Discussed offline. I'll remove this change after Matt's review.

morehouse accepted this revision. Sep 10 2019, 3:14 PM

morehouse added inline comments.

lib/fuzzer/FuzzerDefs.h
195	Please remove it.
lib/fuzzer/FuzzerMerge.cpp
344	Nit: let's omit the else for less nesting since the if short-circuits

This revision is now accepted and ready to land. Sep 10 2019, 3:14 PM

Address review comments

Harbormaster completed remote builds in B38013: Diff 219707. Sep 11 2019, 7:08 AM

Dor1s added inline comments. Sep 11 2019, 7:09 AM

lib/fuzzer/FuzzerDefs.h
195	Done
lib/fuzzer/FuzzerMerge.cpp
344	Done

Closed by commit rGf054067f276d: [libFuzzer] Make -merge=1 to reuse coverage information from the control file. (authored by Dor1s). Sep 11 2019, 7:11 AM

This revision was automatically updated to reflect the committed changes.

The new test is failing for me because CHECK1 is not satisfied. Instead the line says MERGE-OUTER: 3 new files with 12 new features added; 11 new coverage edges (instead of 11 new features). I'm currently investigating what's wrong here, let me know if you have an idea.

In D66107#1666528, @Hahnfeld wrote:

The new test is failing for me because CHECK1 is not satisfied. Instead the line says MERGE-OUTER: 3 new files with 12 new features added; 11 new coverage edges (instead of 11 new features). I'm currently investigating what's wrong here, let me know if you have an idea.

Thanks for letting me know. I'm trying on the latest ToT now. Maybe some instrumentation bits changed while I was working on this.

Hm, doesn't fail for me, but I guess the feature detection might be platform-dependent to some extent, so I'm fine with replacing the number of the features with a regex. Do you want to upload a change, or should I?

In D66107#1666569, @Dor1s wrote:

Hm, doesn't fail for me, but I guess the feature detection might be platform-dependent to some extent, so I'm fine with replacing the number of the features with a regex. Do you want to upload a change, or should I?

I think the bots are also green, so it might be just related to how I build Clang (with libc++, for example). I'm half way through building ToT with GCC, that should give insight whether it's related to my system or my configuration.

Just curious: So the number of new features and coverage edges does not need to be the same? I also have "15 new features" for CHECK2.

In D66107#1666574, @Hahnfeld wrote:

In D66107#1666569, @Dor1s wrote:

Hm, doesn't fail for me, but I guess the feature detection might be platform-dependent to some extent, so I'm fine with replacing the number of the features with a regex. Do you want to upload a change, or should I?

I think the bots are also green, so it might be just related to how I build Clang (with libc++, for example). I'm half way through building ToT with GCC, that should give insight whether it's related to my system or my configuration.

Actually, some ARM bots got broken, so I've uploaded a fix: https://reviews.llvm.org/D67458

Just curious: So the number of new features and coverage edges does not need to be the same? I also have "15 new features" for CHECK2.

Yes, features include coverage edges + additional signal (e.g. value profiling), that's why the number is different and why number of features can differ between platforms.

In D66107#1666702, @Dor1s wrote:

In D66107#1666574, @Hahnfeld wrote:

I think the bots are also green, so it might be just related to how I build Clang (with libc++, for example). I'm half way through building ToT with GCC, that should give insight whether it's related to my system or my configuration.

Actually, some ARM bots got broken, so I've uploaded a fix: https://reviews.llvm.org/D67458

Just curious: So the number of new features and coverage edges does not need to be the same? I also have "15 new features" for CHECK2.

Yes, features include coverage edges + additional signal (e.g. value profiling), that's why the number is different and why number of features can differ between platforms.

Thanks for the explanation and the fix :)

Revision Contents

Path                               Size
lib/fuzzer/FuzzerDefs.h            5 lines
lib/fuzzer/FuzzerMerge.cpp         73 lines
test/fuzzer/merge.test             2 lines
test/fuzzer/merge_two_step.test    31 lines

Diff 219707

lib/fuzzer/FuzzerDefs.h

[... 9 unchanged lines hidden ...]

#ifndef LLVM_FUZZER_DEFS_H		#ifndef LLVM_FUZZER_DEFS_H
#define LLVM_FUZZER_DEFS_H		#define LLVM_FUZZER_DEFS_H

#include <cassert>		#include <cassert>
#include <cstddef>		#include <cstddef>
#include <cstdint>		#include <cstdint>
#include <cstring>		#include <cstring>
		#include <memory>
		#include <set>
#include <string>		#include <string>
#include <vector>		#include <vector>
#include <set>
#include <memory>

// Platform detection.		// Platform detection.
#ifdef __linux__		#ifdef __linux__
#define LIBFUZZER_APPLE 0		#define LIBFUZZER_APPLE 0
#define LIBFUZZER_FUCHSIA 0		#define LIBFUZZER_FUCHSIA 0
#define LIBFUZZER_LINUX 1		#define LIBFUZZER_LINUX 1
#define LIBFUZZER_NETBSD 0		#define LIBFUZZER_NETBSD 0
#define LIBFUZZER_FREEBSD 0		#define LIBFUZZER_FREEBSD 0
[... 156 unchanged lines hidden; context: template<typename T> ...]
};		};

template<typename T>		template<typename T>
using Vector = std::vector<T, fuzzer_allocator<T>>;		using Vector = std::vector<T, fuzzer_allocator<T>>;

template<typename T>		template<typename T>
using Set = std::set<T, std::less<T>, fuzzer_allocator<T>>;		using Set = std::set<T, std::less<T>, fuzzer_allocator<T>>;

typedef Vector<uint8_t> Unit;		typedef Vector<uint8_t> Unit;
typedef Vector<Unit> UnitVector;		typedef Vector<Unit> UnitVector;
typedef int (*UserCallback)(const uint8_t *Data, size_t Size);		typedef int (*UserCallback)(const uint8_t *Data, size_t Size);

int FuzzerDriver(int argc, char **argv, UserCallback Callback);		int FuzzerDriver(int argc, char **argv, UserCallback Callback);

uint8_t *ExtraCountersBegin();		uint8_t *ExtraCountersBegin();
uint8_t *ExtraCountersEnd();		uint8_t *ExtraCountersEnd();
void ClearExtraCounters();		void ClearExtraCounters();

extern bool RunningUserCallback;		extern bool RunningUserCallback;

} // namespace fuzzer		} // namespace fuzzer

#endif // LLVM_FUZZER_DEFS_H		#endif // LLVM_FUZZER_DEFS_H

lib/fuzzer/FuzzerMerge.cpp

[... 13 unchanged lines hidden ...]
#include "FuzzerInternal.h"		#include "FuzzerInternal.h"
#include "FuzzerTracePC.h"		#include "FuzzerTracePC.h"
#include "FuzzerUtil.h"		#include "FuzzerUtil.h"

#include <fstream>		#include <fstream>
#include <iterator>		#include <iterator>
#include <set>		#include <set>
#include <sstream>		#include <sstream>
		#include <unordered_set>

namespace fuzzer {		namespace fuzzer {

bool Merger::Parse(const std::string &Str, bool ParseCoverage) {		bool Merger::Parse(const std::string &Str, bool ParseCoverage) {
std::istringstream SS(Str);		std::istringstream SS(Str);
return Parse(SS, ParseCoverage);		return Parse(SS, ParseCoverage);
}		}

[... 147 unchanged lines hidden; context: for (auto Fe: Cur) { ...]
}		}
}		}
if (FoundNewFeatures)		if (FoundNewFeatures)
NewFiles->push_back(Files[i].Name);		NewFiles->push_back(Files[i].Name);
for (auto Cov : Files[i].Cov)		for (auto Cov : Files[i].Cov)
if (InitialCov.find(Cov) == InitialCov.end())		if (InitialCov.find(Cov) == InitialCov.end())
NewCov->insert(Cov);		NewCov->insert(Cov);
}		}
return NewFeatures->size();		return NewFeatures->size();
}		}

Set<uint32_t> Merger::AllFeatures() const {		Set<uint32_t> Merger::AllFeatures() const {
Set<uint32_t> S;		Set<uint32_t> S;
for (auto &File : Files)		for (auto &File : Files)
S.insert(File.Features.begin(), File.Features.end());		S.insert(File.Features.begin(), File.Features.end());
return S;		return S;
}		}
[... 22 unchanged lines hidden; context: void Fuzzer::CrashResistantMergeInternalStep(const std::string &CFPath) { ...]
Set<const TracePC::PCTableEntry *> AllPCs;		Set<const TracePC::PCTableEntry *> AllPCs;
for (size_t i = M.FirstNotProcessedFile; i < M.Files.size(); i++) {		for (size_t i = M.FirstNotProcessedFile; i < M.Files.size(); i++) {
Fuzzer::MaybeExitGracefully();		Fuzzer::MaybeExitGracefully();
auto U = FileToVector(M.Files[i].Name);		auto U = FileToVector(M.Files[i].Name);
if (U.size() > MaxInputLen) {		if (U.size() > MaxInputLen) {
U.resize(MaxInputLen);		U.resize(MaxInputLen);
U.shrink_to_fit();		U.shrink_to_fit();
}		}
std::ostringstream StartedLine;
// Write the pre-run marker.		// Write the pre-run marker.
OF << "STARTED " << i << " " << U.size() << "\n";		OF << "STARTED " << i << " " << U.size() << "\n";
OF.flush(); // Flush is important since Command::Execute may crash.		OF.flush(); // Flush is important since Command::Execute may crash.
// Run.		// Run.
TPC.ResetMaps();		TPC.ResetMaps();
ExecuteCallback(U.data(), U.size());		ExecuteCallback(U.data(), U.size());
// Collect coverage. We are iterating over the files in this order:		// Collect coverage. We are iterating over the files in this order:
// * First, files in the initial corpus ordered by size, smallest first.		// * First, files in the initial corpus ordered by size, smallest first.
[... 22 unchanged lines hidden; context: TPC.ForEachObservedPC([&](const TracePC::PCTableEntry *TE) { ...]
OF << " " << TPC.PCTableEntryIdx(TE);		OF << " " << TPC.PCTableEntryIdx(TE);
});		});
OF << "\n";		OF << "\n";
OF.flush();		OF.flush();
}		}
PrintStatsWrapper("DONE ");		PrintStatsWrapper("DONE ");
}		}

static void WriteNewControlFile(const std::string &CFPath,		static size_t WriteNewControlFile(const std::string &CFPath,
const Vector<SizedFile> &OldCorpus,		const Vector<SizedFile> &OldCorpus,
const Vector<SizedFile> &NewCorpus) {		const Vector<SizedFile> &NewCorpus,
RemoveFile(CFPath);		const Vector<MergeFileInfo> &KnownFiles) {
std::ofstream ControlFile(CFPath);		std::unordered_set<std::string> FilesToSkip;
ControlFile << (OldCorpus.size() + NewCorpus.size()) << "\n";		for (auto &SF: KnownFiles)
ControlFile << OldCorpus.size() << "\n";		FilesToSkip.insert(SF.Name);

		Vector<std::string> FilesToUse;
		auto MaybeUseFile = [=, &FilesToUse](std::string Name) {
		if (FilesToSkip.find(Name) == FilesToSkip.end())
		FilesToUse.push_back(Name);
		};
for (auto &SF: OldCorpus)		for (auto &SF: OldCorpus)
ControlFile << SF.File << "\n";		MaybeUseFile(SF.File);
		auto FilesToUseFromOldCorpus = FilesToUse.size();
for (auto &SF: NewCorpus)		for (auto &SF: NewCorpus)
ControlFile << SF.File << "\n";		MaybeUseFile(SF.File);

		RemoveFile(CFPath);
		std::ofstream ControlFile(CFPath);
		ControlFile << FilesToUse.size() << "\n";
		ControlFile << FilesToUseFromOldCorpus << "\n";
		for (auto &FN: FilesToUse)
		ControlFile << FN << "\n";

if (!ControlFile) {		if (!ControlFile) {
Printf("MERGE-OUTER: failed to write to the control file: %s\n",		Printf("MERGE-OUTER: failed to write to the control file: %s\n",
CFPath.c_str());		CFPath.c_str());
exit(1);		exit(1);
}		}

		return FilesToUse.size();
}		}

// Outer process. Does not call the target code and thus should not fail.		// Outer process. Does not call the target code and thus should not fail.
void CrashResistantMerge(const Vector<std::string> &Args,		void CrashResistantMerge(const Vector<std::string> &Args,
const Vector<SizedFile> &OldCorpus,		const Vector<SizedFile> &OldCorpus,
const Vector<SizedFile> &NewCorpus,		const Vector<SizedFile> &NewCorpus,
Vector<std::string> *NewFiles,		Vector<std::string> *NewFiles,
const Set<uint32_t> &InitialFeatures,		const Set<uint32_t> &InitialFeatures,
Set<uint32_t> *NewFeatures,		Set<uint32_t> *NewFeatures,
const Set<uint32_t> &InitialCov,		const Set<uint32_t> &InitialCov,
Set<uint32_t> *NewCov,		Set<uint32_t> *NewCov,
const std::string &CFPath,		const std::string &CFPath,
bool V /*Verbose*/) {		bool V /*Verbose*/) {
if (NewCorpus.empty() && OldCorpus.empty()) return; // Nothing to merge.		if (NewCorpus.empty() && OldCorpus.empty()) return; // Nothing to merge.
size_t NumAttempts = 0;		size_t NumAttempts = 0;
		Vector<MergeFileInfo> KnownFiles;
if (FileSize(CFPath)) {		if (FileSize(CFPath)) {
VPrintf(V, "MERGE-OUTER: non-empty control file provided: '%s'\n",		VPrintf(V, "MERGE-OUTER: non-empty control file provided: '%s'\n",
CFPath.c_str());		CFPath.c_str());
Merger M;		Merger M;
std::ifstream IF(CFPath);		std::ifstream IF(CFPath);
if (M.Parse(IF, /*ParseCoverage=*/false)) {		if (M.Parse(IF, /*ParseCoverage=*/true)) {
VPrintf(V, "MERGE-OUTER: control file ok, %zd files total,"		VPrintf(V, "MERGE-OUTER: control file ok, %zd files total,"
" first not processed file %zd\n",		" first not processed file %zd\n",
M.Files.size(), M.FirstNotProcessedFile);		M.Files.size(), M.FirstNotProcessedFile);
if (!M.LastFailure.empty())		if (!M.LastFailure.empty())
VPrintf(V, "MERGE-OUTER: '%s' will be skipped as unlucky "		VPrintf(V, "MERGE-OUTER: '%s' will be skipped as unlucky "
"(merge has stumbled on it the last time)\n",		"(merge has stumbled on it the last time)\n",
M.LastFailure.c_str());		M.LastFailure.c_str());
if (M.FirstNotProcessedFile >= M.Files.size()) {		if (M.FirstNotProcessedFile >= M.Files.size()) {
		// Merge has already been completed with the given merge control file.
		if (M.Files.size() == OldCorpus.size() + NewCorpus.size()) {
VPrintf(		VPrintf(
V, "MERGE-OUTER: nothing to do, merge has been completed before\n");		V,
		"MERGE-OUTER: nothing to do, merge has been completed before\n");
exit(0);		exit(0);
}		}

		// Number of input files likely changed, start merge from scratch, but
		// reuse coverage information from the given merge control file.
		VPrintf(
		V,
		"MERGE-OUTER: starting merge from scratch, but reusing coverage "
		"information from the given control file\n");
		KnownFiles = M.Files;
		} else {
		// There is a merge in progress, continue.
NumAttempts = M.Files.size() - M.FirstNotProcessedFile;		NumAttempts = M.Files.size() - M.FirstNotProcessedFile;
		}
} else {		} else {
VPrintf(V, "MERGE-OUTER: bad control file, will overwrite it\n");		VPrintf(V, "MERGE-OUTER: bad control file, will overwrite it\n");
}		}
}		}

if (!NumAttempts) {		if (!NumAttempts) {
// The supplied control file is empty or bad, create a fresh one.		// The supplied control file is empty or bad, create a fresh one.
NumAttempts = OldCorpus.size() + NewCorpus.size();		VPrintf(V, "MERGE-OUTER: "
VPrintf(V, "MERGE-OUTER: %zd files, %zd in the initial corpus\n",		"%zd files, %zd in the initial corpus, %zd processed earlier\n",
NumAttempts, OldCorpus.size());		OldCorpus.size() + NewCorpus.size(), OldCorpus.size(),
WriteNewControlFile(CFPath, OldCorpus, NewCorpus);		KnownFiles.size());
		NumAttempts = WriteNewControlFile(CFPath, OldCorpus, NewCorpus, KnownFiles);
}		}

// Execute the inner process until it passes.		// Execute the inner process until it passes.
// Every inner process should execute at least one input.		// Every inner process should execute at least one input.
Command BaseCmd(Args);		Command BaseCmd(Args);
BaseCmd.removeFlag("merge");		BaseCmd.removeFlag("merge");
BaseCmd.removeFlag("fork");		BaseCmd.removeFlag("fork");
BaseCmd.removeFlag("collect_data_flow");		BaseCmd.removeFlag("collect_data_flow");
[... 20 unchanged lines hidden; context: void CrashResistantMerge(const Vector<std::string> &Args, ...]
VPrintf(V, "MERGE-OUTER: the control file has %zd bytes\n",		VPrintf(V, "MERGE-OUTER: the control file has %zd bytes\n",
(size_t)IF.tellg());		(size_t)IF.tellg());
IF.seekg(0, IF.beg);		IF.seekg(0, IF.beg);
M.ParseOrExit(IF, true);		M.ParseOrExit(IF, true);
IF.close();		IF.close();
VPrintf(V,		VPrintf(V,
"MERGE-OUTER: consumed %zdMb (%zdMb rss) to parse the control file\n",		"MERGE-OUTER: consumed %zdMb (%zdMb rss) to parse the control file\n",
M.ApproximateMemoryConsumption() >> 20, GetPeakRSSMb());		M.ApproximateMemoryConsumption() >> 20, GetPeakRSSMb());

		M.Files.insert(M.Files.end(), KnownFiles.begin(), KnownFiles.end());
M.Merge(InitialFeatures, NewFeatures, InitialCov, NewCov, NewFiles);		M.Merge(InitialFeatures, NewFeatures, InitialCov, NewCov, NewFiles);
VPrintf(V, "MERGE-OUTER: %zd new files with %zd new features added; "		VPrintf(V, "MERGE-OUTER: %zd new files with %zd new features added; "
"%zd new coverage edges\n",		"%zd new coverage edges\n",
NewFiles->size(), NewFeatures->size(), NewCov->size());		NewFiles->size(), NewFeatures->size(), NewCov->size());
}		}

} // namespace fuzzer		} // namespace fuzzer
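
The filtering that the new `WriteNewControlFile` performs can be read in isolation as follows. This is a standalone sketch using plain standard-library containers instead of libFuzzer's `Vector`, `SizedFile` and `MergeFileInfo`; `FilesToRun` and `SelectFilesToRun` are hypothetical names, and the sketch does not write any control file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Files the inner merge process still has to execute (hypothetical type).
struct FilesToRun {
  std::vector<std::string> Names;
  size_t FromOldCorpus = 0;  // How many leading entries come from the old corpus.
};

// Skips every file whose coverage is already recorded in the previous
// control file (KnownFiles), keeping old-corpus files ahead of new ones.
FilesToRun SelectFilesToRun(const std::vector<std::string> &OldCorpus,
                            const std::vector<std::string> &NewCorpus,
                            const std::vector<std::string> &KnownFiles) {
  std::unordered_set<std::string> Skip(KnownFiles.begin(), KnownFiles.end());
  FilesToRun Result;
  for (const std::string &Name : OldCorpus)
    if (Skip.find(Name) == Skip.end())
      Result.Names.push_back(Name);
  Result.FromOldCorpus = Result.Names.size();
  for (const std::string &Name : NewCorpus)
    if (Skip.find(Name) == Skip.end())
      Result.Names.push_back(Name);
  assert(Result.FromOldCorpus <= Result.Names.size());
  return Result;
}
```

For the scenario in merge_two_step.test (empty output directory, three known files from T1, four files in T2), this leaves four files for the inner process to run.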

test/fuzzer/merge.test

	CHECK: BINGO

	RUN: %cpp_compiler %S/FullCoverageSetTest.cpp -o %t-FullCoverageSetTest			RUN: %cpp_compiler %S/FullCoverageSetTest.cpp -o %t-FullCoverageSetTest

	RUN: rm -rf %t/T0 %t/T1 %t/T2			RUN: rm -rf %t/T0 %t/T1 %t/T2
	RUN: mkdir -p %t/T0 %t/T1 %t/T2			RUN: mkdir -p %t/T0 %t/T1 %t/T2
	RUN: echo F..... > %t/T0/1			RUN: echo F..... > %t/T0/1
	RUN: echo .U.... > %t/T0/2			RUN: echo .U.... > %t/T0/2
	RUN: echo ..Z... > %t/T0/3			RUN: echo ..Z... > %t/T0/3

	[... 57 unchanged lines hidden ...]

test/fuzzer/merge_two_step.test

This file was added.

				RUN: %cpp_compiler %S/FullCoverageSetTest.cpp -o %t-FullCoverageSetTest

				RUN: rm -rf %t/T0 %t/T1 %t/T2
				RUN: mkdir -p %t/T0 %t/T1 %t/T2
				RUN: echo F..... > %t/T1/1
				RUN: echo .U.... > %t/T1/2
				RUN: echo ..Z... > %t/T1/3

				# T1 has 3 elements, T0 is empty.
				RUN: rm -f %t/MCF
				RUN: %run %t-FullCoverageSetTest -merge=1 -merge_control_file=%t/MCF %t/T0 %t/T1 2>&1 | FileCheck %s --check-prefix=CHECK1
				CHECK1: MERGE-OUTER: 3 files, 0 in the initial corpus
				CHECK1: MERGE-OUTER: 3 new files with 11 new features added; 11 new coverage edges

				RUN: echo ...Z.. > %t/T2/1
				RUN: echo ....E. > %t/T2/2
				RUN: echo .....R > %t/T2/3
				RUN: echo F..... > %t/T2/a

				RUN: rm -rf %t/T0
				RUN: mkdir -p %t/T0

				# T1 has 3 elements, T2 has 4 elements, T0 is empty.
				RUN: %run %t-FullCoverageSetTest -merge=1 -merge_control_file=%t/MCF %t/T0 %t/T1 %t/T2 2>&1 | FileCheck %s --check-prefix=CHECK2
				CHECK2: MERGE-OUTER: non-empty control file provided
				CHECK2: MERGE-OUTER: control file ok, 3 files total, first not processed file 3
				CHECK2: MERGE-OUTER: starting merge from scratch, but reusing coverage information from the given control file
				CHECK2: MERGE-OUTER: 7 files, 0 in the initial corpus, 3 processed earlier
				CHECK2: MERGE-INNER: using the control file
				CHECK2: MERGE-INNER: 4 total files; 0 processed earlier; will process 4 files now
				CHECK2: MERGE-OUTER: 6 new files with 14 new features added; 14 new coverage edges