This is an archive of the discontinued LLVM Phabricator instance.

Greedy set cover implementation of `Merger::Merge`
ClosedPublic

Authored by arisKoutsou on Jul 1 2021, 7:33 AM.

Details

Summary

Extend the existing single-pass algorithm for Merger::Merge with an algorithm that gives better results. The new implementation is enabled with the set_cover_merge=1 flag.

This greedy set cover implementation gives a substantially smaller final corpus (40%-80% fewer test cases) while preserving the same features/coverage. At the same time, the execution-time penalty is not that significant (+50% for a ~1M-file corpus and far less for smaller corpora). These results were obtained by comparing several targets with corpora of varying size.

Change Merger::CrashResistantMergeInternalStep to collect all features from each file, not just the unique ones. This is needed for the set cover algorithm to work correctly. The implementation of the algorithm in Merger::SetCoverMerge uses a bitvector to store the features covered by a file while performing the pass. Collisions while indexing the bitvector are ignored, just as the fuzzer itself ignores them.
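The greedy pass can be sketched roughly as follows. This is a simplified illustration, not the actual FuzzerMerge.cpp code: `FileInfo`, `GreedySetCover`, and the exact tie-breaking rule are hypothetical stand-ins, though `kFeatureSetSize = 1 << 21` matches the bitvector size discussed in the review.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Size of the coverage bitvector; features are indexed modulo this value,
// and collisions are tolerated, as in libFuzzer's feature set.
constexpr size_t kFeatureSetSize = 1 << 21;

struct FileInfo {
  size_t Size;                    // file size in bytes (for tie-breaking)
  std::vector<uint32_t> Features; // raw feature values from the control file
};

// Repeatedly pick the file covering the most not-yet-covered features;
// on a tie, prefer the smaller file. Stop when no file adds coverage.
std::vector<size_t> GreedySetCover(const std::vector<FileInfo> &Files) {
  std::vector<bool> Covered(kFeatureSetSize, false);
  std::vector<size_t> Chosen;
  std::vector<size_t> Remaining;
  for (size_t I = 0; I < Files.size(); I++)
    Remaining.push_back(I);
  while (!Remaining.empty()) {
    size_t Best = SIZE_MAX;
    size_t BestGain = 0;
    for (size_t I : Remaining) {
      size_t Gain = 0;
      for (uint32_t F : Files[I].Features)
        if (!Covered[F % kFeatureSetSize])
          Gain++;
      if (Gain > BestGain ||
          (Gain && Gain == BestGain && Files[I].Size < Files[Best].Size)) {
        Best = I;
        BestGain = Gain;
      }
    }
    if (BestGain == 0)
      break; // no remaining file covers anything new
    for (uint32_t F : Files[Best].Features)
      Covered[F % kFeatureSetSize] = true;
    Chosen.push_back(Best);
    Remaining.erase(std::find(Remaining.begin(), Remaining.end(), Best));
  }
  return Chosen;
}
```

The loop terminates because every iteration either covers at least one new feature or breaks out, which is why counting covered features consistently matters (see the infinite-loop discussion later in the thread).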

Diff Detail

Event Timeline

arisKoutsou created this revision.Jul 1 2021, 7:33 AM
arisKoutsou requested review of this revision.Jul 1 2021, 7:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2021, 7:33 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
kcc added a comment.Jul 12 2021, 12:43 PM

Thanks for the change!
Indeed, the current single-pass merge is far from perfect, and it's nice to see your numbers.

Some high-level notes:

  • not sure about the flag syntax (-merge=2), I'd prefer something more descriptive. Maybe add an extra flag -set_cover_merge=1?
  • The current single-pass algorithm is also greedy (just less so :)), so maybe use a different name to distinguish the new algorithm
  • please try to follow the coding style e.g. with respect to {} around single-statement blocks.
  • please add a unit test and a .lit test

I'll let Matt make another round of review.

I understand that this merge algorithm decreases the number of inputs by taking the most feature-rich inputs first. Does this lead to larger average input sizes in the merged corpus?

And what's the effect on total corpus bytes (du -hs)?

compiler-rt/lib/fuzzer/FuzzerMerge.cpp
198

Please document the merge algorithm in a function comment here.

233

Given the multiple passes over Remaining, is this sort useful anymore?

317

Would a set be a better data structure for Remaining? Then we wouldn't need to do a linear lookup on every erase.
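The suggested data-structure change boils down to the erase pattern; a hypothetical comparison (these helper names are not from the patch):

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Erasing from a std::vector requires a linear scan to locate the element.
void EraseFromVector(std::vector<size_t> &Remaining, size_t Idx) {
  Remaining.erase(std::find(Remaining.begin(), Remaining.end(), Idx));
}

// Erasing from a std::set is a logarithmic lookup, no scan needed.
void EraseFromSet(std::set<size_t> &Remaining, size_t Idx) {
  Remaining.erase(Idx);
}
```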

arisKoutsou edited the summary of this revision.Aug 23 2021, 7:24 AM

Changes:

  • Renamed all occurrences of MergeGreedy to SetCoverMerge.
  • Added a new flag called set_cover_merge. It defaults to 0; when set to 1, the new set_cover_merge implementation is used to merge corpora instead of the standard merge.
  • Some code-style changes.
  • Added a unit test, based on the Merger::Merge unit test.
  • Added a lit test based on the merge.test. Results of merge and set_cover_merge are different in some cases, as expected.
  • Changed the Remaining variable in SetCoverMerge() from std::vector to a std::set.
  • Addressed some inaccuracies in the algorithm, mostly regarding the first corpus (features in the first corpus are now considered already covered, matching how merge=1 works).
  • Removed the initial sorting of files on size/features. Instead, for files with the same number of features, break ties by choosing the smaller one in size.
arisKoutsou updated this revision to Diff 368107.EditedAug 23 2021, 7:47 AM
arisKoutsou edited the summary of this revision.

Does this lead to larger average input sizes in the merged corpus?

@morehouse It will lead to a larger average size, since the algorithm prefers files with many features (which are probably larger in many cases). On the other hand, since the final corpus is smaller in file count, the sum of the file sizes in bytes may end up smaller than with the existing merge algorithm.

For example, consider minimizing a 70K testcase corpus for a harness of the jq target. The following workflow is performed in a container (image: https://hub.docker.com/r/ariskoutsou/jq-libfuzzer) containing the updated -set_cover_merge=1 implementation along with the target harness:

root@dd166043164c:/src/harness# mkdir 0
root@dd166043164c:/src/harness# ./fuzzer_jq -detect_leaks=0 -close_fd_mask=3 -set_cover_merge=1 0 corpus
...
MERGE-OUTER: the control file has 48616105 bytes
MERGE-OUTER: consumed 34Mb (185Mb rss) to parse the control file
MERGE-OUTER: 83 new files with 3174 new features added; 1116 new coverage edges
...
root@dd166043164c:/src/harness# find 0 -ls | awk '{sum += $7; n++;} END {print sum/n;}'
1068.81
root@dd166043164c:/src/harness# du -sh 0
356K    0
root@dd166043164c:/src/harness# rm 0/*
root@dd166043164c:/src/harness# ./fuzzer_jq -detect_leaks=0 -close_fd_mask=3 -merge=1 0 corpus
...
MERGE-OUTER: the control file has 4184437 bytes
MERGE-OUTER: consumed 5Mb (94Mb rss) to parse the control file
MERGE-OUTER: 161 new files with 3174 new features added; 1293 new coverage edges
...
root@dd166043164c:/src/harness# find 0 -ls | awk '{sum += $7; n++;} END {print sum/n;}'
675.296
root@dd166043164c:/src/harness# du -sh 0/
676K    0/

Here, even though the average file size is noticeably larger (~1068 bytes vs. ~675 bytes), the total corpus size is smaller. With the set_cover_merge flag the minimized corpus size is 356K, while with merge it is 676K.

arisKoutsou updated this revision to Diff 368133.EditedAug 23 2021, 9:51 AM
  • Change .lit test input to better illustrate the scenario where 2 files cover all features.
morehouse added inline comments.Aug 25 2021, 12:45 PM
compiler-rt/lib/fuzzer/FuzzerFork.cpp
324–325
compiler-rt/lib/fuzzer/FuzzerMerge.cpp
358

Could this continue cause an infinite loop? i.e. when Remaining is empty

compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp
953

IIUC, this test doesn't quite do what's intended.

We choose input C because it has the most unique features, but after that A only has {0} as a unique feature while B has {4, 5}. So we do in fact choose B, but not because it is smaller.

compiler-rt/test/fuzzer/set_cover_merge.test
49

I think we would get the same results with -merge. Perhaps we should make some feature-poor inputs smaller so that -merge would pick those first, while -set_cover_merge picks the feature-rich ones.

  • Add comments for argument names when passing argument values.
  • Change AllFeatures. Calculate all unique features by considering Feature % kFeatureSetSize as the feature value.
  • Change continue to break statement in main loop. Add assertions to highlight the condition that exits the loop.
  • Remove checking for feature-less files since we are not removing features from any MergeFileInfo objects.
  • Update tests, add testcase for feature collision on the bitvector.
compiler-rt/lib/fuzzer/FuzzerMerge.cpp
358

It wouldn't cause an infinite loop when Remaining is empty because the while condition (CoveredSize != AllFeatures.size()) would be false. On the other hand, I noticed that there would be a problem if a feature had a value greater than 1 << 21, which is the size of the bitvector. In that case, there could be an infinite loop because the while condition would never become false. I am addressing this problem in the latest patch.
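The fix can be illustrated with a small standalone helper (a hypothetical sketch, not the patch itself): if the loop's exit condition counts raw feature values while the bitvector only ever covers `Feature % kFeatureSetSize`, two colliding features make the covered count permanently fall short of the target. Counting features after the same modulo reduction keeps both sides consistent:

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

constexpr uint32_t kFeatureSetSize = 1 << 21;

// Count distinct features after modulo reduction -- the value the main
// loop's exit condition must compare against, since the bitvector can
// only ever mark reduced indices as covered.
size_t CountCoverableFeatures(const std::vector<uint32_t> &Features) {
  std::set<uint32_t> Reduced;
  for (uint32_t F : Features)
    Reduced.insert(F % kFeatureSetSize);
  return Reduced.size();
}
```

Two features that collide modulo the bitvector size count as one coverable feature, so the loop condition can actually be satisfied.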

compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp
953

Correct, I will adjust the features correctly so that we can test breaking ties between files with equal number of features.

compiler-rt/test/fuzzer/set_cover_merge.test
49

Should I also include a test of -merge=1 in this source file to highlight the difference in behavior?

morehouse added inline comments.Sep 3 2021, 1:58 PM
compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp
953

The comment above is confusing me now. I *think* what's happening is that B is picked first (over A since B is smaller), leaving A with {0, 3} unique features and C with {1, 2, 4} unique features. Then C gets picked since it has more features.

We still end up with C and B in the set, but the comment's explanation of how this happens is wrong.

compiler-rt/test/fuzzer/set_cover_merge.test
49

Yes, please do.

  • Add -merge=1 test in set_cover_merge.test for comparison with -set_cover_merge=1.
compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp
953

As far as I can see here, the format of the control file is:

FT <current_file_index> <feature_1> <feature_2> ... <feature_n>

So, in our case, the feature sets are:

A = {3, 5, 6}
B = {4, 5, 6}
C = {1, 2, 3, 4}
D = {1}

Since the set cover algorithm chooses the set that covers the maximum number of previously uncovered features, the file chosen in the first iteration is C: Covered is empty and C has 4 features while all the other files have fewer. In the next iteration, A's uncovered features are {5, 6} and B's uncovered features are {5, 6}, so we break the tie by selecting the smaller file, which is B. Finally, all features are covered, so we can exit.
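The selection order described above can be checked with a small standalone simulation. The feature sets come from the discussion; the file sizes (B smaller than A, as the tie-break requires) and the helper names are assumptions for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <vector>

struct File {
  std::string Name;
  size_t Size;
  std::set<int> Features;
};

// Greedy set cover: pick the file with the most uncovered features,
// breaking ties by smaller size, until no file adds new coverage.
std::vector<std::string> Trace(const std::vector<File> &Files) {
  std::set<int> Covered;
  std::vector<std::string> Order;
  while (true) {
    const File *Best = nullptr;
    size_t BestGain = 0;
    for (const File &F : Files) {
      std::set<int> New;
      std::set_difference(F.Features.begin(), F.Features.end(),
                          Covered.begin(), Covered.end(),
                          std::inserter(New, New.begin()));
      if (New.size() > BestGain ||
          (BestGain > 0 && New.size() == BestGain && F.Size < Best->Size)) {
        Best = &F;
        BestGain = New.size();
      }
    }
    if (!Best || BestGain == 0)
      break; // already-chosen files have zero gain, so no re-selection
    Covered.insert(Best->Features.begin(), Best->Features.end());
    Order.push_back(Best->Name);
  }
  return Order;
}
```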

morehouse accepted this revision.Sep 7 2021, 8:47 AM

LGTM

compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp
953

Ah, yep you're right. Thanks for explaining.

This revision is now accepted and ready to land.Sep 7 2021, 8:47 AM
  • Fix set_cover_merge.test to not produce flaky results based on file listing order.
This revision was landed with ongoing or failed builds.Sep 7 2021, 9:43 AM
This revision was automatically updated to reflect the committed changes.