Tracking issue: https://github.com/google/oss-fuzz/issues/331
Details
Diff Detail
Event Timeline
Kostya, below is what I'm getting with this code:
Dictionary: 27 entries INFO: Seed: 2599778320 INFO: Loaded 2 modules (22862 guards): [0x7fc343e70160, 0x7fc343e81294), [0x5f98a0, 0x5feca4), Loading corpus dir: ../libpng_read_fuzzer Loaded 1024/2611 files from ../libpng_read_fuzzer Loaded 2048/2611 files from ../libpng_read_fuzzer Started dictionary minimization. Need to perform 140940 tests. Initial corpus unit: \x89PNG\x0d\x0a\x1a\x0a\x00\x00\x00\x0dIHDR\x00\x00\x00%\x00\x00\x00\x01\x01\x00\x00\x00\x00\x00\x00a\x00\x00\x00\x000IDAThCcB\x00\x14zTXt\x1e\x8c\x1e\x1e\x1e\x1e\x1a\x1e\x1e\x1e\x1e\x1e\xe1\x90`NG\x89cP\x1e`NG\x89cP\x1eG\x89cP\x1e\x1eia\x00\xffA\xc0 Initial Result: 2, Features: 11200 11208, Coverage: 753 Dictionary unit: \x89PNG\x0d\x0a\x1a\x0a Modified corpus unit: v\xaf\xb1\xb8\xf2\xf5\xe5\xf5\x00\x00\x00\x0dIHDR\x00\x00\x00%\x00\x00\x00\x01\x01\x00\x00\x00\x00\x00\x00a\x00\x00\x00\x000IDAThCcB\x00\x14zTXt\x1e\x8c\x1e\x1e\x1e\x1e\x1a\x1e\x1e\x1e\x1e\x1e\xe1\x90`NG\x89cP\x1e`NG\x89cP\x1eG\x89cP\x1e\x1eia\x00\xffA\xc0 Modified Result: 2, Features: 11200 11208, Coverage: 754 Dictionary unit: IDAT Modified corpus unit: \x89PNG\x0d\x0a\x1a\x0a\x00\x00\x00\x0dIHDR\x00\x00\x00%\x00\x00\x00\x01\x01\x00\x00\x00\x00\x00\x00a\x00\x00\x00\x000\xb6\xbb\xbe\xabhCcB\x00\x14zTXt\x1e\x8c\x1e\x1e\x1e\x1e\x1a\x1e\x1e\x1e\x1e\x1e\xe1\x90`NG\x89cP\x1e`NG\x89cP\x1eG\x89cP\x1e\x1eia\x00\xffA\xc0 Modified Result: 0, Features:, Coverage: 759 Dictionary unit: IHDR Modified corpus unit: \x89PNG\x0d\x0a\x1a\x0a\x00\x00\x00\x0d\xb6\xb7\xbb\xad\x00\x00\x00%\x00\x00\x00\x01\x01\x00\x00\x00\x00\x00\x00a\x00\x00\x00\x000IDAThCcB\x00\x14zTXt\x1e\x8c\x1e\x1e\x1e\x1e\x1a\x1e\x1e\x1e\x1e\x1e\xe1\x90`NG\x89cP\x1e`NG\x89cP\x1eG\x89cP\x1e\x1eia\x00\xffA\xc0 Modified Result: 2, Features: 11200 11208, Coverage: 759 Dictionary unit: zTXt Modified corpus unit: \x89PNG\x0d\x0a\x1a\x0a\x00\x00\x00\x0dIHDR\x00\x00\x00%\x00\x00\x00\x01\x01\x00\x00\x00\x00\x00\x00a\x00\x00\x00\x000IDAThCcB\x00\x14\x85\xab\xa7\x8b\x1e\x8c\x1e\x1e\x1e\x1e\x1a\x1e\x1e\x1e\x1e\x1e\xe1\x90`NG\x89cP\x1e`NG\x89cP\x1eG\x89cP\x1e\x1eia\x00\xffA\xc0 Modified Result: 0, Features:, Coverage: 760 ###### Useless dictionary elements. ###### "\x89PNG\x0d\x0a\x1a\x0a" # Score: -1 "IDAT" # Score: 1 "IHDR" # Score: -1 "zTXt" # Score: 1 ###### End of useless dictionary elements. ###### Dictionary minimization suceeded
As you can see, initial corpus unit gives the following stats:
Initial Result: 2, Features: 11200 11208, Coverage: 753
If I replace PNG signature in the beginning with a trash, it still gives:
Modified Result: 2, Features: 11200 11208, Coverage: 754
Which is totally wrong. I do not obtain coverage info correctly. What could be a good way to distinguish a difference between two corpus units?
ah...
TPC.CollectFeatures is destructive...
For performance reasons I had to do Counters[i] = 0; inside it, so you can only call it once after each execution and Fuzzer::RunOne already does that.
So, one way to "fix" your code is to call ExecuteCallback and then call TPC.CollectFeatures.
Another way, (maybe) is to fix TPC.CollectFeatures so that it does not reset Counters. (I'll look into this too)
in t297783 I removed Counters[i] = 0; -- try again.
(also removed some other stale code)
Yeah, this is much better now! Thank you!!! Look, really important elements for PNG dictionary get a pretty high score now:
"\x89PNG\x0d\x0a\x1a\x0a" # Score: 2589, Used: 2589 "IDAT" # Score: 1323, Used: 1649 "IEND" # Score: -297, Used: 319 "IHDR" # Score: 2527, Used: 2545 "PLTE" # Score: 132, Used: 266 "bKGD" # Score: 18, Used: 82 "cHRM" # Score: 139, Used: 189 "fRAc" # Score: -17, Used: 47 "gAMA" # Score: 232, Used: 368 "gIFg" # Score: -1, Used: 43 "gIFt" # Score: -21, Used: 49 "gIFx" # Score: 0, Used: 10 "hIST" # Score: -9, Used: 47 "iCCP" # Score: 182, Used: 450 "iTXt" # Score: -30, Used: 64 "oFFs" # Score: -74, Used: 122 "pCAL" # Score: -12, Used: 124 "pHYs" # Score: 2, Used: 66 "sBIT" # Score: 39, Used: 97 "sCAL" # Score: -4, Used: 42 "sPLT" # Score: -16, Used: 62 "sRGB" # Score: 64, Used: 118 "sTER" # Score: 4, Used: 26 "tEXt" # Score: 83, Used: 121 "tIME" # Score: -6, Used: 42 "tRNS" # Score: 73, Used: 173 "zTXt" # Score: 458, Used: 612
or some entries from net_dns_record_fuzzer dictionary:
"\x00\x00" # Score: 316, Used: 182 "\x00\x01" # Score: 139, Used: 122 "\x00\x02" # Score: 0, Used: 9 "\x00\x03" # Score: 136, Used: 89 "\x00\x04" # Score: -37, Used: 37 "\x00\x05" # Score: 3, Used: 3 "\x00\x08" # Score: 2, Used: 1 "\x00\x0c" # Score: 4, Used: 2 "\x00\x10" # Score: -11, Used: 20 "\x00\x1c" # Score: 22, Used: 68 "\x00 " # Score: 99, Used: 63 "\x000" # Score: 2, Used: 1 "\x03foo\x00" # Score: 2, Used: 1 "\x00\x01\x00\x01" # Score: 2, Used: 1 "\x00\x05\x00\x01" # Score: 1, Used: 2 "\x03foo\x00\x00\x01\x00\x01" # Score: 2, Used: 1 "\x00\x05\x00\x01\x00\x00\x00\xff" # Score: 2, Used: 1 "A.B.C.D" # Score: -6, Used: 6 "0" # Score: -119, Used: 125 "M." # Score: -11, Used: 11 "IN," # Score: -1, Used: 1 "TTL" # Score: -15, Used: 15 "[" # Score: 59, Used: 46 "MB" # Score: -1, Used: 1 "MX" # Score: -1, Used: 1 "use" # Score: -1, Used: 1 "to" # Score: -1, Used: 1 "TELNET," # Score: 2, Used: 1 "F." # Score: -55, Used: 64 "A" # Score: -126, Used: 135 "\"A" # Score: 115, Used: 80 "a" # Score: -75, Used: 81 "RFC-953." # Score: -1, Used: 1
- rename MinimizeDictionary -> AnalyzeDictionary
- replace all occurrences of a dictionary unit inside corpus testcase
- add two points to score if dictionary unit seems to be useful
Kostya, what do you think about this approach in general? I guess that you don't want to add new flags and new auxiliary features to libFuzzer, but I don't see a way to minimize dictionaries without runtime analysis and coverage comparison.
I like how aggressive this approach is against "useless" units in dictionary. I even had to use Scores[i] += 2 instead of ++Scores[i] for cases when dictionary unit looks "useful". Otherwise it's too aggressive, e.g.:
PNG fuzzer:
###### Useless dictionary elements. ###### "IEND" # Score: -297, Used: 319 "fRAc" # Score: -1, Used: 47 "gIFt" # Score: -9, Used: 49 "iTXt" # Score: -20, Used: 64 "oFFs" # Score: -68, Used: 122 ###### End of useless dictionary elements. ######
or prtime_fuzzer:
###### Useless dictionary elements. ###### "." # Score: -33, Used: 129 "Z" # Score: -70, Used: 90 "Mar" # Score: -2, Used: 10 "Apr" # Score: -6, Used: 12 "Jun" # Score: -1, Used: 9 "Nov" # Score: -3, Used: 7 ###### End of useless dictionary elements. ######
With += 2 it looks better, e.g. PNG:
###### Useless dictionary elements. ###### "IEND" # Score: -286, Used: 319 "oFFs" # Score: -41, Used: 122 ###### End of useless dictionary elements. ######
prtime:
###### Useless dictionary elements. ###### "Z" # Score: -60, Used: 90 "Apr" # Score: -3, Used: 12 "Nov" # Score: -1, Used: 7 ###### End of useless dictionary elements. ######
I'm sure this approach does not love IEND tag because existing libpng fuzzer almost never reaches end of the input where IEND chunk could make sense. Anyway, those units are from manually compiled dictionaries, and we do not want to minimize them. We need to minimize auto-collected dictionaries. And for them this approach suggests to remove lots of trashy units.
I don't know if this approach is useful or not. The only way to know is to try it more :)
Adding experimental flags like this is fine, as long as the implementation does not make things more complex.
This one doesn't.
Since this is an highly experimental feature I don't insist on adding tests at this moment,
but unless there are tests the feature may break at any time.
lib/Fuzzer/FuzzerDriver.cpp | ||
---|---|---|
363 | Do you need this here? | |
372 | not sure if we care about efficiency here. If we do, std::vector (declared outside of the loop, with .clear() call here) might be a better choice | |
377 | ditto | |
379 | just size_t, and use < instead of != |
I do not have commit access. May I ask you please to land the CL if you won't request more changes?
LGTM, I'll commit
lib/Fuzzer/FuzzerDriver.cpp | ||
---|---|---|
394 | {some perfectionism} |
Thank you very much!
lib/Fuzzer/FuzzerDriver.cpp | ||
---|---|---|
394 | Hm, but how? I could write a separate function that loops through the Data and modify it when substring is found, but I'm not sure that it would be faster, since here I also iterate through the Data only once. |
Do you need this here?
Fuzzer::ExecuteCallback does this already.