Extend the existing single-pass algorithm for Merger::Merge with an algorithm that gives better results. This new implementation can be used with a new set_cover_merge=1 flag.
This greedy set cover implementation gives a substantially smaller final corpus (40%-80% less testcases) while preserving the same features/coverage. At the same time, the execution time penalty is not that significant (+50% for ~1M corpus files and far less for smaller corpora). These results were obtained by comparing several targets with varying size corpora.
Change Merger::CrashResistantMergeInternalStep to collect all features from each file and not just unique ones. This is needed for the set cover algorithm to work correctly. The implementation of the algorithm in Merger::SetCoverMerge uses a bitvector to store features that are covered by a file while performing the pass. Collisions while indexing the bitvector are ignored similarly to the fuzzer.