This is an archive of the discontinued LLVM Phabricator instance.

ClamAV: Copy zlib into ClamAV benchmark
ClosedPublic

Authored by MatzeB on Apr 13 2017, 2:00 PM.

Details

Summary

Avoid the external zlib dependency in ClamAV by copying zlib-1.2.11
source into the benchmark.

External dependencies are problematic in benchmarks because:

  • They are not compiled with the same compiler/flags as the rest of the benchmarks.
  • They are an additional burden to set up when running the test-suite. While zlib is a popular and ubiquitous library, it is still sometimes missing in cross-compilation settings.
  • Having no external dependencies simplifies the build system (see the sketch after this list).
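For illustration, the switch amounts to roughly the following at the CMake level. This is a minimal sketch only: the CLAMAV_SOURCES variable and the zlib/ subdirectory name are assumptions, and the actual test-suite build uses its own helper macros rather than plain add_executable.

# Before: pick up whatever zlib the host or toolchain happens to provide.
# find_package(ZLIB REQUIRED)
# target_link_libraries(clamscan ${ZLIB_LIBRARIES})

# After: compile the bundled zlib-1.2.11 sources with the same compiler
# and flags as the rest of the benchmark, so no external library is needed.
file(GLOB ZLIB_SOURCES ${CMAKE_CURRENT_SOURCE_DIR}/zlib/*.c)
add_executable(clamscan ${CLAMAV_SOURCES} ${ZLIB_SOURCES})
target_include_directories(clamscan PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/zlib)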

This will unfortunately increase the overall compile time of ClamAV and may therefore disrupt the history of CTMark data.

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB created this revision. Apr 13 2017, 2:00 PM

Avoid the external zlib dependency in ClamAV by copying zlib-1.2.11
source into the benchmark.

External dependencies are problematic in benchmarks because:

  • They are not compiled with the same compiler/flags as the rest of the benchmarks.
  • They are an additional burden to set up when running the test-suite. While zlib is a popular and ubiquitous library, it is still sometimes missing in cross-compilation settings.
  • Having no external dependencies simplifies the build system.

I agree that it's a good thing to have fewer external dependencies. However, I wonder whether the alternative of ripping out the parts of ClamAV that need the zlib library would also work as a way to get rid of this external dependency?
That would probably mean converting the test inputs so they are no longer compressed?
If a non-trivial amount of time is spent in zlib for ClamAV and we want to retain benchmarking of the zlib code, maybe it would be better to add zlib (with appropriate test input) as a separate benchmark in the test-suite?
We have modified other programs in the test-suite before to make them more useful for compiler performance evaluation and tracking, so I think that doing this would not go beyond what we've done before.
My guess is that performance tracking would be slightly easier if zlib was tested separately.

This will unfortunately increase the overall compile time of ClamAV and may therefore disrupt the history of CTMark data.

I think it's pretty accepted that if we have a good reason to change the test-suite, we'll do so even if it breaks historical comparison.
The only way to do guaranteed like-for-like comparisons is to keep the version of the test-suite stable in your experiments.

If a non-trivial amount of time is spent in zlib for ClamAV and we want to retain benchmarking of the zlib code, maybe it would be better to add zlib (with appropriate test input) as a separate benchmark in the test-suite?

There is already spec2000/164.gzip, which has pretty much the same inflate/deflate code as zlib. Also, this patch is not about adding a new benchmark but simply about getting rid of an external dependency!

We have modified other programs in the test-suite before to make them more useful for compiler performance evaluation and tracking, so I think that doing this would not go beyond what we've done before.
My guess is that performance tracking would be slightly easier if zlib was tested separately.

This isn't a good benchmark either way with the current inputs (~800 KB, of which ~210 KB are gzip-compressed and hence go through zlib):

$ llvm-lit MultiSource/Applications/ClamAV/clamscan.test
...
exec_time: 0.0897

We should probably remove it from the BENCHMARKING_ONLY set (but in another patch).
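For concreteness, that follow-up might amount to a guard like the one below. This is only a guess at its shape; the option name TEST_SUITE_BENCHMARKING_ONLY and the exact place of the guard in the test-suite's CMake are assumptions.

# Hypothetical follow-up patch: only build ClamAV when the full test set
# is requested, so it drops out of benchmarking-only runs.
if(NOT TEST_SUITE_BENCHMARKING_ONLY)
  add_subdirectory(ClamAV)
endif()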

With this only being a useful correctness/compile-time test today, I'd vote to stay with the copied-zlib solution, as that is easier to do and less invasive to the source code.

If a non-trivial amount of time is spent in zlib for ClamAV and we want to retain benchmarking of the zlib code, maybe it would be better to add zlib (with appropriate test input) as a separate benchmark in the test-suite?

There is already spec2000/164.gzip, which has pretty much the same inflate/deflate code as zlib. Also, this patch is not about adding a new benchmark but simply about getting rid of an external dependency!

Good point.

We have modified other programs in the test-suite before to make them more useful for compiler performance evaluation and tracking, so I think that doing this would not go beyond what we've done before.
My guess is that performance tracking would be slightly easier if zlib was tested separately.

This isn't a good benchmark either way with the current inputs (~800 KB, of which ~210 KB are gzip-compressed and hence go through zlib):

$ llvm-lit MultiSource/Applications/ClamAV/clamscan.test
...
exec_time: 0.0897

We should probably remove it from the BENCHMARKING_ONLY set (but in another patch).

With this only being a useful correctness/compile-time test today, I'd vote to stay with the copied-zlib solution, as that is easier to do and less invasive to the source code.

OK, I'm happy that the option to avoid zlib by changing the source code was considered and rejected; no more objections from my side for this to go in.
I'm assuming that if more programs in the test-suite come to rely on zlib in the future, we might have to restructure this a bit so that one copy of zlib can be shared? Anyway, that's a problem to solve in the future if it ever appears.
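For what it's worth, such sharing could look roughly like the sketch below. Everything here is hypothetical: no test_suite_zlib target exists, and the names are made up for illustration.

# Hypothetical: build the bundled zlib once as a static library that any
# benchmark needing it can link against...
add_library(test_suite_zlib STATIC ${ZLIB_SOURCES})
target_include_directories(test_suite_zlib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/zlib)

# ...instead of each benchmark carrying and compiling its own copy.
target_link_libraries(clamscan test_suite_zlib)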

On removing this from the BENCHMARKING_ONLY set: I don't think I've seen ClamAV be a test so noisy that the current methods in LNT for dealing with noisy programs can't handle it.
But I also don't remember any performance delta that only ClamAV flagged up. In summary, I'd keep ClamAV in the benchmarking set, but I don't feel strongly about it being removed if that was better for others.

On removing this from the BENCHMARKING_ONLY set: I don't think I've seen ClamAV be a test so noisy that the current methods in LNT for dealing with noisy programs can't handle it.
But I also don't remember any performance delta that only ClamAV flagged up. In summary, I'd keep ClamAV in the benchmarking set, but I don't feel strongly about it being removed if that was better for others.

I personally filter out all benchmarks with a runtime shorter than 0.6s (test-suite/util/compare.py --filter-short), so I ignore ClamAV for performance measurements anyway. I am happy to leave it in the benchmarking set.

This revision was automatically updated to reflect the committed changes.