Divide the corpus into N parts according to seed size. The jobs then execute the corpora in turn: job one executes the corpus with the smallest seeds, job two a relatively larger one, and so on up to job N, which executes the corpus with the largest seeds. In other words, each job picks some seeds from corpus 1, corpus 2, ..., corpus N, then again from corpus 1, corpus 2, ..., corpus N, and so on.
That is, allocate more energy to small seeds, trigger the common paths earlier, and prefer to keep small seeds.
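For illustration, here is a minimal standalone sketch of the grouping and round-robin selection just described; the names (Seed, GroupBySize, PickSeedsForJob, NumCorpuses) and the exact selection policy are simplifying assumptions, not the actual patch code:

```cpp
// Illustrative sketch of size-based corpus grouping; not the actual patch.
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

struct Seed { std::string Path; size_t Size; };

// Sort seeds by size and split them into NumCorpuses contiguous groups,
// so group 0 holds the smallest seeds and group N-1 the largest.
std::vector<std::vector<Seed>> GroupBySize(std::vector<Seed> Seeds,
                                           size_t NumCorpuses) {
  std::sort(Seeds.begin(), Seeds.end(),
            [](const Seed &A, const Seed &B) { return A.Size < B.Size; });
  std::vector<std::vector<Seed>> Groups(NumCorpuses);
  for (size_t I = 0; I < Seeds.size(); I++)
    Groups[I * NumCorpuses / Seeds.size()].push_back(Seeds[I]);
  return Groups;
}

// Each job walks the groups from smallest to largest, taking a few random
// seeds from each group, then wraps around to the first group again.
std::vector<Seed> PickSeedsForJob(const std::vector<std::vector<Seed>> &Groups,
                                  size_t SeedsPerGroup, size_t Rounds,
                                  std::mt19937 &Rng) {
  std::vector<Seed> Chosen;
  for (size_t R = 0; R < Rounds; R++)
    for (const auto &G : Groups) {
      if (G.empty())
        continue;
      for (size_t K = 0; K < SeedsPerGroup; K++)
        Chosen.push_back(G[Rng() % G.size()]);
    }
  return Chosen;
}
```

Because every job repeatedly visits the small-seed groups first, small seeds get proportionally more energy, which matches the intent described above.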
In my experiments, the bug-finding rate is greatly accelerated, coverage is greatly increased (comparable to the improvement from entropic), and the newly generated interesting seeds are very small.
Thanks for the patch! Would you mind sharing the experimental data/results you obtained for this patch?
Additionally, could you submit this patch to FuzzBench for an independent evaluation?
Thanks,
Matt
Hello, how can I share the experiment data?
It seems that FuzzBench does not accept this parallel mode evaluation.
The patched version's data is in the average directory; the data in the libfuzzer directory is from the original libFuzzer.
Thanks.
Thanks for sharing your data. Took a quick look and seems promising.
I would like to try this on FuzzBench before accepting the patch though. FuzzBench has a very nice experimental framework for evaluating changes like this.
It seems that FuzzBench does not accept this parallel mode evaluation.
I talked to @metzman who manages FuzzBench. Sounds like you're correct, FuzzBench uses only one worker process in fork mode. @metzman said we could probably run a special experiment with more workers to evaluate this patch.
Another approach that might be worth doing is to make the patch effective even for a single worker. For example, maybe we randomly pick from a subset of the corpus for that single worker.
Also, I'm curious how the number of fork-mode workers affects efficacy. I can imagine with lots of workers that this patch could perform much worse. Specifically if we have a small number of corpus elements per worker, the crossover mutation becomes quite limited.
compiler-rt/lib/fuzzer/FuzzerFork.cpp

Line 143: Does keeping this skew within each subset of the corpus help? Would a uniform-random approach change efficacy?

Line 150: So we would still use the current strategy on the rare occasion when we get an m index past the end of Files. I imagine there are always some cases where the original strategy will be beneficial, at least for the crossover mutation. Maybe we could make the probability of using the current strategy more explicit. Something like:

  if (RandInt(100) < 90)
    // Use new strategy
  else
    // Use old strategy
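To make that suggestion concrete, here is a small self-contained sketch of an explicit 90/10 split between a new grouped pick and the original skew-towards-last pick; the helper names, the use of <random>, and the exact skew formula are illustrative assumptions rather than libFuzzer's actual implementation:

```cpp
// Sketch of an explicit probability split between two seed-selection
// strategies; names and the 90% threshold are illustrative only.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>

// Original-style pick: index biased towards the most recently added files
// (taking the square root of a uniform draw skews the result towards N).
// Assumes N > 0.
size_t PickSkewedTowardsLast(size_t N, std::mt19937 &Rng) {
  std::uniform_real_distribution<double> U(0.0, 1.0);
  size_t Idx = static_cast<size_t>(std::sqrt(U(Rng)) * N);
  return std::min(Idx, N - 1);
}

// New-style pick: choose uniformly inside one size-based group [Begin, End).
// Assumes End > Begin.
size_t PickFromGroup(size_t Begin, size_t End, std::mt19937 &Rng) {
  return Begin + Rng() % (End - Begin);
}

size_t PickSeedIndex(size_t CorpusSize, size_t GroupBegin, size_t GroupEnd,
                     std::mt19937 &Rng) {
  if (Rng() % 100 < 90)                               // 90%: grouped strategy
    return PickFromGroup(GroupBegin, GroupEnd, Rng);
  return PickSkewedTowardsLast(CorpusSize, Rng);      // 10%: original strategy
}
```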
Also, the description states:
Divide the corpus into N parts according to seed size.
Is it really according to size? IIUC when there are multiple worker processes, any new coverage they have simply gets appended to Files. So Files is not necessarily sorted by size.
Yes. Grouping the corpus has two advantages: it reduces the repetitive work of each process, and it lets small seeds trigger the same paths earlier. Triggering new paths earlier is the key reason for the improved efficiency, so Files should be sorted by size.
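As a point of reference, a minimal way to sort a list of seed paths by on-disk size might look like the following; this uses std::filesystem purely for illustration rather than libFuzzer's own I/O helpers:

```cpp
// Sketch: order seed file paths by on-disk size, smallest first.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

void SortFilesBySize(std::vector<std::string> &Files) {
  namespace fs = std::filesystem;
  // Pair each path with its size once, so every file is stat'ed only once.
  std::vector<std::pair<std::uintmax_t, std::string>> Sized;
  Sized.reserve(Files.size());
  for (const auto &F : Files) {
    std::error_code EC;
    std::uintmax_t Size = fs::file_size(F, EC);
    Sized.emplace_back(EC ? 0 : Size, F);  // unreadable files sort first
  }
  std::sort(Sized.begin(), Sized.end());   // sorts by size, then by path
  for (size_t I = 0; I < Files.size(); I++)
    Files[I] = std::move(Sized[I].second);
}
```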
OK, thanks for your work.
There are a few points I would like to share:
- The effect of grouping the corpus (distributing energy per sub-corpus) may be similar to the effect of entropic (which distributes energy per seed) on some targets, but there are also differences. So you should set -entropic=0 when you evaluate them. Of course, enabling -entropic and -NumCorpuses at the same time also has a certain effect. If you are interested, you can test four configurations:
  (1) -entropic=0, NumCorpuses=1
  (2) -entropic=1, NumCorpuses=1
  (3) -entropic=0, NumCorpuses=N (I set 30, but other values are possible; I think N should change dynamically with the total number of seeds)
  (4) -entropic=1, NumCorpuses=30
- I set -fork=30, -NumCorpuses=30, and -entropic=0 in my evaluation. The -fork value need not be equal to the -NumCorpuses value, though, because each job executes each corpus in turn from small to large.
- According to point 2, it should also work well in single-core mode: a single process executes each corpus in turn from small to large. For in-process libFuzzer, though, frequent interaction with the filesystem brings additional overhead, so the approach is still better suited to energy scheduling in parallel fuzzing, where each child process maintains the same coverage bitmap in a timely way.
- The degree of parallelism depends on how quickly you want results. I think using more groups, with -fork equal to -NumCorpuses, works much better; the jobs can then be regarded as traversing all corpora in one loop. If that is not the case, the corpora toward the back will not be fully tested, because the merged results of the earlier jobs are written back to the filesystem (including any large seeds generated by the small jobs) and taken out again, which has some negative effects.
- I am sorry, I haven't tried it with -workers.
- Can you share the official results with me?
- Thanks for your work once again!
- Sorry, I want to know whether 'Files' is necessarily sorted by size in the trunk version; is it different from https://github.com/Dor1s/libfuzzer-workshop? The shared data was collected with https://github.com/Dor1s/libfuzzer-workshop, while the new evaluation data was collected on the trunk version, and it also seems promising.
- I have a question. When the task is created for the first time, 'Files' is sorted strictly by size, but seeds with new coverage are appended to Files afterwards, which means the newer seeds may not stay sorted by size. In that case my theory might not hold, yet the experimental data show that the average seed size of the main corpus stays very small while coverage is higher. From a certain angle this method may give small seeds more energy, but I cannot quite explain the phenomenon. Can you help me?
- Wouldn't it be better if Files were always sorted strictly by size? Corpus grouping effectively adds N more positions in the whole corpus from which seeds are drawn, so Rand->SkewTowardsLast(Files.size()) may no longer be appropriate, since it tends to pick the newly added seeds.
- How could Files be kept strictly sorted by size? I only started reading the libFuzzer code in the last two weeks and I am not very familiar with it yet (see the first sketch after this list).
- I think my grouping method can prevent the original method from getting stuck in a local optimum. If you sort strictly by size, interesting seeds derived from a given sub-corpus will, with high probability, fall back into that same sub-corpus. Then I could design a selection algorithm such as Thompson sampling or multi-armed bandits to find the global optimum (see the second sketch after this list). If Files is not sorted by size, the grouping is just a simple partition of the corpus and the effect is similar to entropic.
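On the question of how Files could be kept strictly sorted, one simple option is to insert each newly found seed at its sorted position instead of appending it at the end. A minimal sketch under that assumption (again using std::filesystem and a hypothetical InsertSorted helper, not libFuzzer's actual code):

```cpp
// Sketch: keep a size-ordered corpus list sorted when a new seed is added.
#include <algorithm>
#include <cstdint>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

struct SizedFile {
  std::uintmax_t Size;
  std::string Path;
};

// Insert NewPath so Files stays ordered by size (smallest first).
void InsertSorted(std::vector<SizedFile> &Files, const std::string &NewPath) {
  std::error_code EC;
  std::uintmax_t Size = std::filesystem::file_size(NewPath, EC);
  if (EC)
    Size = 0;  // treat unreadable files as empty for ordering purposes
  SizedFile Entry{Size, NewPath};
  auto Pos = std::upper_bound(
      Files.begin(), Files.end(), Entry,
      [](const SizedFile &A, const SizedFile &B) { return A.Size < B.Size; });
  Files.insert(Pos, Entry);
}
```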
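And for the selection idea mentioned above, here is a rough sketch of a bandit over corpus groups; it uses a simple epsilon-greedy rule rather than Thompson sampling, and the reward signal (for example, new edges covered per group) is an assumption for illustration:

```cpp
// Sketch: epsilon-greedy bandit that learns which corpus group pays off.
#include <cstddef>
#include <random>
#include <vector>

class GroupBandit {
public:
  GroupBandit(size_t NumGroups, double Epsilon = 0.1)
      : Counts(NumGroups, 0), Values(NumGroups, 0.0), Epsilon(Epsilon) {}

  // Explore a random group with probability Epsilon; otherwise exploit the
  // group with the best average reward observed so far.
  size_t PickGroup(std::mt19937 &Rng) {
    std::uniform_real_distribution<double> U(0.0, 1.0);
    if (U(Rng) < Epsilon)
      return Rng() % Counts.size();
    size_t Best = 0;
    for (size_t I = 1; I < Values.size(); I++)
      if (Values[I] > Values[Best])
        Best = I;
    return Best;
  }

  // Feed back the reward observed after fuzzing seeds from Group,
  // e.g. the number of new edges covered. Maintains a running mean.
  void Update(size_t Group, double Reward) {
    Counts[Group]++;
    Values[Group] += (Reward - Values[Group]) / Counts[Group];
  }

private:
  std::vector<size_t> Counts;
  std::vector<double> Values;
  double Epsilon;
};
```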
Hello.
Due to the time zone difference, I think our communication is a bit inefficient. Can we arrange a convenient time for you to focus on the discussion?
We use CST.
Thanks.
At this point I am not convinced this patch will provide benefit for the default use case when -entropic=1. I am hesitant to add complexity to the code for unsure benefit.
If you request a FuzzBench experiment to get some data on this, and the results look good, then I'll be willing to invest more time into reviewing this patch.
Please CC me on the FuzzBench pull request, so I can make sure we are evaluating this properly.
Hello,
FuzzBench doesn't accept parallel-mode testing.
I will share my complete experiment data in the future.
I redesigned the algorithm and did a complete long-term evaluation myself, with very good results. It performs very well with either -entropic=0 or -entropic=1, and -fork mode now performs better than parallel fuzzing mode. Please see D105084, which contains the detailed data. Thanks a lot.
clang-format: please reformat the code