This patch adds a BilateralFiltering kernel to "MicroBenchmarks/ImageProcessing".
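For readers unfamiliar with the technique, here is a minimal sketch of a textbook bilateral filter on a square grayscale image. It is not the code from this patch: the function name `bilateralFilter`, the parameters `radius`, `sigmaSpace`, and `sigmaRange`, and the border handling (out-of-bounds neighbours are skipped) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the patch. Filters an n x n grayscale
// image stored row-major in `in` and returns the filtered image.
std::vector<double> bilateralFilter(const std::vector<double> &in, int n,
                                    int radius, double sigmaSpace,
                                    double sigmaRange) {
  assert(n > 0 && in.size() == static_cast<std::size_t>(n) * n);
  std::vector<double> out(in.size());
  for (int y = 0; y < n; ++y) {
    for (int x = 0; x < n; ++x) {
      const double center = in[y * n + x];
      double sum = 0.0, norm = 0.0;
      for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
          const int yy = y + dy, xx = x + dx;
          // Assumed border policy: ignore neighbours outside the image.
          if (yy < 0 || yy >= n || xx < 0 || xx >= n)
            continue;
          const double v = in[yy * n + xx];
          // Spatial weight falls off with distance from the centre pixel.
          const double spatial =
              std::exp(-(dx * dx + dy * dy) / (2.0 * sigmaSpace * sigmaSpace));
          // Range weight falls off with intensity difference, which is what
          // preserves edges.
          const double range = std::exp(-((v - center) * (v - center)) /
                                        (2.0 * sigmaRange * sigmaRange));
          const double w = spatial * range;
          sum += w * v;
          norm += w;
        }
      }
      // norm >= 1 because the centre pixel always contributes weight 1.
      out[y * n + x] = sum / norm;
    }
  }
  return out;
}
```

The four nested loops over a dense, regular iteration space are the kind of structure a polyhedral optimizer such as Polly can analyze, although the early `continue` for border pixels may limit what it can do with this particular sketch.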
Runtime
Without Polly: Total 0m5.542s
Benchmark | Time | CPU | Iterations |
---|---|---|---|
BENCHMARK_BILATERAL_FILTER/16/2 | 40 us | 40 us | 17339 |
BENCHMARK_BILATERAL_FILTER/16/4 | 131 us | 131 us | 5361 |
BENCHMARK_BILATERAL_FILTER/32/2 | 164 us | 164 us | 4267 |
BENCHMARK_BILATERAL_FILTER/32/4 | 632 us | 632 us | 1109 |
BENCHMARK_BILATERAL_FILTER/64/2 | 700 us | 700 us | 1000 |
BENCHMARK_BILATERAL_FILTER/64/4 | 3286 us | 3285 us | 213 |
With Polly: Total 0m5.577s
Benchmark | Time | CPU | Iterations |
---|---|---|---|
BENCHMARK_BILATERAL_FILTER/16/2 | 41 us | 41 us | 16945 |
BENCHMARK_BILATERAL_FILTER/16/4 | 134 us | 134 us | 5248 |
BENCHMARK_BILATERAL_FILTER/32/2 | 168 us | 168 us | 4175 |
BENCHMARK_BILATERAL_FILTER/32/4 | 647 us | 647 us | 1071 |
BENCHMARK_BILATERAL_FILTER/64/2 | 716 us | 716 us | 978 |
BENCHMARK_BILATERAL_FILTER/64/4 | 2963 us | 2963 us | 236 |
This small difference in runtime persists across repeated runs. For inputs (32/4) and (64/4), Polly always performed better.
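To put the difference in numbers, a small helper can compute the relative change between corresponding rows of the two tables. The function name `relativeChangePercent` is hypothetical and not part of the patch. For the 64/4 row, going from 3286 us to 2963 us is a reduction of roughly 9.8%.

```cpp
#include <cassert>

// Hypothetical helper, not part of the patch.
// Returns the relative runtime change in percent when going from the
// build without Polly to the build with Polly. A negative result means
// the Polly build was faster.
double relativeChangePercent(double withoutPollyUs, double withPollyUs) {
  assert(withoutPollyUs > 0.0);
  return (withPollyUs - withoutPollyUs) / withoutPollyUs * 100.0;
}
```

Calling `relativeChangePercent(3286.0, 2963.0)` with the 64/4 timings from the tables above gives a value close to -9.8.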