Added a Dilate kernel to the image-processing benchmarks, using the benchmark library.
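For reviewers unfamiliar with the operation, dilation replaces each pixel with the maximum value in its neighborhood. The following is only a minimal sketch and not the kernel from this differential: the function name `dilate3x3`, the 3x3 square neighborhood, the row-major 8-bit grayscale layout, and the skip-out-of-bounds border handling are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the kernel added by this differential):
// 3x3 grayscale dilation on a row-major image. Each output pixel is the
// maximum of the input pixels in its 3x3 neighborhood; neighbors that
// fall outside the image are skipped.
std::vector<std::uint8_t> dilate3x3(const std::vector<std::uint8_t> &in,
                                    int height, int width) {
  assert(in.size() == static_cast<std::size_t>(height) *
                          static_cast<std::size_t>(width));
  std::vector<std::uint8_t> out(in.size(), 0);
  for (int i = 0; i < height; ++i) {
    for (int j = 0; j < width; ++j) {
      std::uint8_t best = 0;
      for (int di = -1; di <= 1; ++di) {
        for (int dj = -1; dj <= 1; ++dj) {
          const int y = i + di;
          const int x = j + dj;
          if (y < 0 || y >= height || x < 0 || x >= width)
            continue; // neighbor lies outside the image
          best = std::max(best, in[static_cast<std::size_t>(y) * width + x]);
        }
      }
      out[static_cast<std::size_t>(i) * width + j] = best;
    }
  }
  return out;
}
```

The loop nest has affine bounds and array accesses, which is the general shape of code that polyhedral optimizers such as Polly are designed to analyze.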
Runtime:
Without Polly
Benchmark                           Time           CPU Iterations
BENCHMARK_DILATE/128/128          142 us        142 us       4942
BENCHMARK_DILATE/256/256          568 us        568 us       1064
BENCHMARK_DILATE/512/512         2334 us       2334 us        290
BENCHMARK_DILATE/1024/1024       9745 us       9745 us         72
With Polly
Benchmark                           Time           CPU Iterations
BENCHMARK_DILATE/128/128           40 us         40 us      13423
BENCHMARK_DILATE/256/256          184 us        184 us       3616
BENCHMARK_DILATE/512/512         1073 us       1073 us        600
BENCHMARK_DILATE/1024/1024       8595 us       8595 us         80
Note: This differential should be applied after D49339, which contains the required common functions such as readImage and writeImage.
This call is made to warm up the cache.
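The warm-up idea can be sketched independently of the benchmark library as follows. This is an illustration only, not the code in this differential: `timeWithWarmup` is a hypothetical helper, and it uses `std::chrono` in place of the benchmark library's own timing loop.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `kernel` once untimed (the warm-up call), then
// returns the mean wall-clock time per timed iteration in microseconds.
template <typename Kernel>
double timeWithWarmup(Kernel &&kernel, std::size_t iterations) {
  kernel(); // warm-up call: touches the buffers before timing starts

  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; ++i)
    kernel();
  const auto stop = std::chrono::steady_clock::now();

  const std::chrono::duration<double, std::micro> elapsed = stop - start;
  return iterations == 0 ? 0.0
                         : elapsed.count() / static_cast<double>(iterations);
}
```

Running the kernel once before the timed loop keeps first-touch costs (cold caches, page faults on freshly allocated buffers) out of the reported numbers.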