This is an archive of the discontinued LLVM Phabricator instance.

[test-suite] Update output image dimensions in the Harris benchmark
ClosedPublic

Authored by Meinersbur on May 25 2021, 8:48 AM.

Details

Summary

On AIX, I realized that the output image matrix of the harris benchmark did not match the one produced on Linux.

For instance, on Linux, the image matrix started off as:

$ more output.txt
00000000000000000000000000000000000000000000000000000000000000 . . .

But on AIX, we were getting:

$ more output.txt
25525525525500000000002552552552552552552552552552550000000000 . . .

This patch updates the dimensions of the output image in the harris benchmark in order to
resolve the issue of printing the image on AIX.

Diff Detail

Repository
rT test-suite

Event Timeline

amyk requested review of this revision.May 25 2021, 8:48 AM
amyk created this revision.

image is meant to be a pointer to a (HEIGHT+2)*(WIDTH+2) image, where each "pixel" is a represented as a grayscale float (4 bytes) value. sizeof(float*) * (2 + HEIGHT) * (2 + WIDTH) allocates a pointer-size for each pixel (8 bytes on a 64-bit system), i.e. typically twice as much. However, the data is not supposed to be a 2-dimensional array of pointers.

I cannot claim there is no out-of-bounds array access in the program, but the size computation is not the issue. Did you try to run it under MemorySanitizer or Valgrind?

image is meant to be a pointer to a (HEIGHT+2)*(WIDTH+2) image, where each "pixel" is a represented as a grayscale float (4 bytes) value. sizeof(float*) * (2 + HEIGHT) * (2 + WIDTH) allocates a pointer-size for each pixel (8 bytes on a 64-bit system), i.e. typically twice as much. However, the data is not supposed to be a 2-dimensional array of pointers.

I agree the size change is not the right fix.

I cannot claim there is no out-of-bounds array access in the program, but the size computation is not the issue. Did you try to run it under MemorySanitizer or Valgrind?

What I can say is that using calloc seems to fix the issue:

-      malloc(sizeof(float) * (2 + HEIGHT) * (2 + WIDTH));
+      std::calloc((2 + HEIGHT) * (2 + WIDTH), sizeof(float));

That does cast suspicion on the correctness of the source program.

The problem is fairly obvious once found.

The kernel only initializes part of the output image (most obviously not the initial rows/columns):

for (int _i0 = 2; (_i0 < height); _i0++) {
  for (int _i1 = 2; (_i1 < width); _i1++) {
    outputImg[_i0][_i1] =

The check output does print from the initial rows/columns:

void printImage(int height, int width, Image &arr,
                int dummy) {
  std::ofstream myfile;
  myfile.open("output.txt");
  for (int i = 0; i < height - 2; i++) {
    for (int j = 0; j < width - 2; j++) {
      if (arr[i][j] < 0) {
if (argc == 2) {
  printImage(HEIGHT + 2, WIDTH + 2, *imageOutput, sum);
} else {
  printImage(HEIGHT + 2, WIDTH + 2, *imageOutput, -1);
}

Likewise, the "sum" is calculated using additional rows and columns not initialized by the kernel:

for (int i = 0; i < height + 2; i++) {
  for (int j = 0; j < width + 2; j++) {
    sum = (sum + 1) & (int)(*imageOutput)[i][j];
  }
}

Thank you @hubert.reinterpretcast for the analysis.

@amyk are you going to update that patch? Either bey use of calloc, or (preferably) fixing the array bounds of what is being checked.

The kernel only initializes part of the output image (most obviously not the initial rows/columns):

for (int _i0 = 2; (_i0 < height); _i0++) {
  for (int _i1 = 2; (_i1 < width); _i1++) {
    outputImg[_i0][_i1] =

Should this be

for (int _i0 = 1; (_i0 < height + 1); _i0++) {
  for (int _i1 = 1; (_i1 < width + 1); _i1++) {
    outputImg[_i0][_i1] =

(i.e. a 1-pixel margin left/right/top/bottom instead a 2-pixel margin left/top)? Alternatively, the output image doesn't even need the margin.

I remember having reviewed the original patch, seem I didn't pay enough attention.

amyk added a comment.May 27 2021, 11:52 PM

Thanks @hubert.reinterpretcast and @Meinersbur! I was also further investigating this and I can confirm that changing the malloc calls to calloc resolves the issue for the harris benchmark.

I noticed that this change (a 1-pixel margin left/right/top/bottom), still did not provide the correct result.

for (int _i0 = 1; (_i0 < height + 1); _i0++) {
  for (int _i1 = 1; (_i1 < width + 1); _i1++) {
    outputImg[_i0][_i1] =

However, you did mention that the output image does not require a margin. Did you mean the following?

diff --git a/MicroBenchmarks/harris/harrisKernel.cpp b/MicroBenchmarks/harris/harrisKernel.cpp
index 0fb9c894..09d8aacc 100644
--- a/MicroBenchmarks/harris/harrisKernel.cpp
+++ b/MicroBenchmarks/harris/harrisKernel.cpp
@@ -103,8 +103,8 @@ void harrisKernel(
     }
   }

-  for (int _i0 = 2; (_i0 < height); _i0++) {
-    for (int _i1 = 2; (_i1 < width); _i1++) {
+  for (int _i0 = 0; (_i0 < height); _i0++) {
+    for (int _i1 = 0; (_i1 < width); _i1++) {
       outputImg[_i0][_i1] =
           (det[_i0][_i1] - ((0.04f * trace[_i0][_i1]) * trace[_i0][_i1]));
     }

If so, then this change works to produce the correct output on AIX, as well.

And yes, I will be updating the patch - either by calloc or the above change (if that is preferred).

And yes, I will be updating the patch - either by calloc or the above change (if that is preferred).

Instead of calloc, we can use new:

diff --git a/MicroBenchmarks/harris/main.cpp b/MicroBenchmarks/harris/main.cpp
index 6d43c930..f9833588 100644
--- a/MicroBenchmarks/harris/main.cpp
+++ b/MicroBenchmarks/harris/main.cpp
@@ -66,4 +66,3 @@ void BENCHMARK_HARRIS(benchmark::State &state) {
   float(*imageOutput)[2 + HEIGHT][2 + WIDTH];
-  imageOutput = (float(*)[2 + HEIGHT][2 + WIDTH])
-      malloc(sizeof(float) * (2 + HEIGHT) * (2 + WIDTH));
+  imageOutput = new float[1][2 + HEIGHT][2 + WIDTH]();

@@ -129,3 +128,3 @@ void BENCHMARK_HARRIS(benchmark::State &state) {

-  free((void *)imageOutput);
+  delete [] imageOutput;
   free((void *)image);
@@ -157,4 +156,3 @@ int main(int argc, char *argv[]) {
   float(*imageOutput)[2 + HEIGHT][2 + WIDTH];
-  imageOutput = (float(*)[2 + HEIGHT][2 + WIDTH])
-      malloc(sizeof(float) * (2 + HEIGHT) * (2 + WIDTH));
+  imageOutput = new float[1][2 + HEIGHT][2 + WIDTH]();

@@ -213,3 +211,3 @@ int main(int argc, char *argv[]) {
   free((void *)image);
-  free((void *)imageOutput);
+  delete [] imageOutput;
   return 0;

However, you did mention that the output image does not require a margin. Did you mean the following?

diff --git a/MicroBenchmarks/harris/harrisKernel.cpp b/MicroBenchmarks/harris/harrisKernel.cpp
index 0fb9c894..09d8aacc 100644
--- a/MicroBenchmarks/harris/harrisKernel.cpp
+++ b/MicroBenchmarks/harris/harrisKernel.cpp
@@ -103,8 +103,8 @@ void harrisKernel(
     }
   }

-  for (int _i0 = 2; (_i0 < height); _i0++) {
-    for (int _i1 = 2; (_i1 < width); _i1++) {
+  for (int _i0 = 0; (_i0 < height); _i0++) {
+    for (int _i1 = 0; (_i1 < width); _i1++) {
       outputImg[_i0][_i1] =
           (det[_i0][_i1] - ((0.04f * trace[_i0][_i1]) * trace[_i0][_i1]));
     }

I don't have an overview on what Harris edge detection does with trace whether trace[0][0], trace[0][1] etc contains useful information. The output should also be a WIDTH*HEIGHT-sized image, i.e.

- float(*imageOutput)[2 + HEIGHT][2 + WIDTH];
+ float(*imageOutput)[HEIGHT][WIDTH];

It seems that outputImg[WIDTH], outputImg[WIDTH+1][ etc is ignored anyway. Same with trace[WIDTH], trace[WIDTH+1]`, etc.

amyk added a comment.May 31 2021, 6:54 AM

Instead of calloc, we can use new:

Thanks Hubert. I can con confirm that new also works for this test and we can possibly move forward with this, too.

I don't have an overview on what Harris edge detection does with trace whether trace[0][0], trace[0][1] etc contains useful information. The output should also be a WIDTH*HEIGHT-sized image, i.e.

- float(*imageOutput)[2 + HEIGHT][2 + WIDTH];
+ float(*imageOutput)[HEIGHT][WIDTH];

It seems that outputImg[WIDTH], outputImg[WIDTH+1][ etc is ignored anyway. Same with trace[WIDTH], trace[WIDTH+1]`, etc.

@Meinersbur Sorry - just a clarification question. From your comment, did you mean that imageOutput/trace/det should be [HEIGHT][WIDTH] since the WIDTH and WIDTH+1, etc. is ignored anyway?

amyk added a comment.May 31 2021, 8:23 AM

However, you did mention that the output image does not require a margin. Did you mean the following?

diff --git a/MicroBenchmarks/harris/harrisKernel.cpp b/MicroBenchmarks/harris/harrisKernel.cpp
index 0fb9c894..09d8aacc 100644
--- a/MicroBenchmarks/harris/harrisKernel.cpp
+++ b/MicroBenchmarks/harris/harrisKernel.cpp
@@ -103,8 +103,8 @@ void harrisKernel(
     }
   }

-  for (int _i0 = 2; (_i0 < height); _i0++) {
-    for (int _i1 = 2; (_i1 < width); _i1++) {
+  for (int _i0 = 0; (_i0 < height); _i0++) {
+    for (int _i1 = 0; (_i1 < width); _i1++) {
       outputImg[_i0][_i1] =
           (det[_i0][_i1] - ((0.04f * trace[_i0][_i1]) * trace[_i0][_i1]));
     }

I don't have an overview on what Harris edge detection does with trace whether trace[0][0], trace[0][1] etc contains useful information. The output should also be a WIDTH*HEIGHT-sized image, i.e.

- float(*imageOutput)[2 + HEIGHT][2 + WIDTH];
+ float(*imageOutput)[HEIGHT][WIDTH];

It seems that outputImg[WIDTH], outputImg[WIDTH+1][ etc is ignored anyway. Same with trace[WIDTH], trace[WIDTH+1]`, etc.

Additionally, do you mean that we should be doing this change, along with this change too:

diff --git a/MicroBenchmarks/harris/harrisKernel.cpp b/MicroBenchmarks/harris/harrisKernel.cpp
index 0fb9c894..09d8aacc 100644
--- a/MicroBenchmarks/harris/harrisKernel.cpp
+++ b/MicroBenchmarks/harris/harrisKernel.cpp
@@ -103,8 +103,8 @@ void harrisKernel(
     }
   }

-  for (int _i0 = 2; (_i0 < height); _i0++) {
-    for (int _i1 = 2; (_i1 < width); _i1++) {
+  for (int _i0 = 0; (_i0 < height); _i0++) {
+    for (int _i1 = 0; (_i1 < width); _i1++) {
       outputImg[_i0][_i1] =
           (det[_i0][_i1] - ((0.04f * trace[_i0][_i1]) * trace[_i0][_i1]));
     }

If we decide not to go with the new/delete or calloc approach?

@Meinersbur Sorry - just a clarification question. From your comment, did you mean that imageOutput/trace/det should be [HEIGHT][WIDTH] since the WIDTH and WIDTH+1, etc. is ignored anyway?

Would be nice. However, for a minimal fix, just not printing the irrelevant border would be sufficient.

amyk updated this revision to Diff 351063.Jun 9 2021, 10:09 PM
amyk retitled this revision from [test-suite] Update the malloc calls in the Harris benchmark to [test-suite] Update output image dimensions in the Harris benchmark.
amyk edited the summary of this revision. (Show Details)
  • Update the dimensions of outputImg and the loop indices for outputImg in the harris kernel

@Meinersbur It's possible that I may have misunderstood your comment, but could you please take another look at this patch?

Meinersbur accepted this revision.Jun 10 2021, 12:05 AM

LGTM, thank you

MicroBenchmarks/harris/harrisKernel.cpp
6

[nit] remove parens around HEIGHT

There wasn't really a reason for parens before, but now its even more obvious

MicroBenchmarks/harris/main.cpp
32

[nit] remove parens around HEIGHT

This revision is now accepted and ready to land.Jun 10 2021, 12:05 AM
amyk added a comment.Jun 21 2021, 6:56 AM

LGTM, thank you

Thanks for reviewing. My apologies in the delay on getting to this patch. I wanted to address the review comment this morning - upon realizing that I was not testing this patch with the latest revision of test-suite, it looks like this patch no longer addresses the issue on AIX.
It seems since the D101475 revision, the harris benchmark will fail on AIX even with this change.

The output image from this patch is slightly different with this patch (as seen in D103092-output.txt, although it still looks a more similar to the reference output when comparing with the output without this change.

$ diff reference-output.out D103092-output.out
10,12c10,13
< 0000000033330000003333000000 . . . 
< 0000000033330000003333000000 . . .
< 0000000033330000003333000000 . . . 
---
> 33000000333300000033330000003333 . . . 
> 33000000333300000033330000003333 . . . 
> 33000000333300000033330000003333 . . . 
> 33000000000000000000000000000000 . . . 

So, I think if I update the array bounds, this patch needs more work. As a side note, doing the new/delete or calloc approach will still produce the correct output image.

If the sizes of the output changes a mismatch with the old reference output is expected (the removed border contains undefined values anyway). In that case, just replace the reference-output.out with D103092-output.txt.

amyk added a comment.Jun 21 2021, 11:35 PM

If the sizes of the output changes a mismatch with the old reference output is expected (the removed border contains undefined values anyway). In that case, just replace the reference-output.out with D103092-output.txt.

If it's okay, I just wanted to double check if my understanding is clear: your suggestion is to add a new reference output file for AIX to reflect these changes, right? (Since the Linux result still matches the original reference output)

Also, ever since the D101475 revision (updating the benchmark dependency of Microbenchmarks), the different output image is produced even just by printing the margin like so:

diff --git a/MicroBenchmarks/harris/harrisKernel.cpp b/MicroBenchmarks/harris/harrisKernel.cpp
index 0fb9c894..09d8aacc 100644
--- a/MicroBenchmarks/harris/harrisKernel.cpp
+++ b/MicroBenchmarks/harris/harrisKernel.cpp
@@ -103,8 +103,8 @@ void harrisKernel(
     }
   }

-  for (int _i0 = 2; (_i0 < height); _i0++) {
-    for (int _i1 = 2; (_i1 < width); _i1++) {
+  for (int _i0 = 0; (_i0 < height); _i0++) {
+    for (int _i1 = 0; (_i1 < width); _i1++) {
       outputImg[_i0][_i1] =
           (det[_i0][_i1] - ((0.04f * trace[_i0][_i1]) * trace[_i0][_i1]));
     }

Without changing the dimensions of outputImg to [HEIGHT][WIDTH]. Is adding a new reference output still okay in this situation?
I just wanted to make sure since your previous comment stated "If the sizes of the output changes a mismatch with the old reference output is expected", but the output is different even without changing the dimensions of outputImg.

Meinersbur requested changes to this revision.Jun 25 2021, 12:02 PM

If it's okay, I just wanted to double check if my understanding is clear: your suggestion is to add a new reference output file for AIX to reflect these changes, right? (Since the Linux result still matches the original reference output)

How can the Linux output still be the same? harrisKernel assigns:

for (int _i0 = 2; (_i0 < height); _i0++) {
  for (int _i1 = 2; (_i1 < width); _i1++) {
    outputImg[_i0][_i1] = ...

printImage accesses:

for (int i = 0; i < height - 2; i++) {
  for (int j = 0; j < width - 2; j++) {
     ... arr[i][j] ...

There is a gap of the indices 0 and 1 that is never assigned, and the last two indices never printed. Output of Linux and AIX must be the same. It seems that the platform difference between Linux and AIX is that on Linux the memory is somehow initialized to zero. Please don't rely on that on either platform. There seems to be an off-by-two error here. Print all the values that have been assigned in harrisKernel, and only those.

This revision now requires changes to proceed.Jun 25 2021, 12:02 PM
amyk added a comment.Jul 2 2021, 11:07 AM

How can the Linux output still be the same? harrisKernel assigns:

for (int _i0 = 2; (_i0 < height); _i0++) {
  for (int _i1 = 2; (_i1 < width); _i1++) {
    outputImg[_i0][_i1] = ...

printImage accesses:

for (int i = 0; i < height - 2; i++) {
  for (int j = 0; j < width - 2; j++) {
     ... arr[i][j] ...

There is a gap of the indices 0 and 1 that is never assigned, and the last two indices never printed. Output of Linux and AIX must be the same. It seems that the platform difference between Linux and AIX is that on Linux the memory is somehow initialized to zero. Please don't rely on that on either platform. There seems to be an off-by-two error here. Print all the values that have been assigned in harrisKernel, and only those.

I apologize, I misunderstood your previous comment - I agree that the output of Linux and AIX should be the same.

In the current patch, harrisKernel doesn't assign outputImg starting at 2, since I changed it to start at 0:

for (int _i0 = 0; (_i0 < height); _i0++) {
  for (int _i1 = 0; (_i1 < width); _i1++) {
    outputImg[_i0][_i1] =
        (det[_i0][_i1] - ((0.04f * trace[_i0][_i1]) * trace[_i0][_i1]));
  }
}

Moreover, unless I am missing something, just by looking at the code, it looks like printImage does actually print the last two indices, despite going up to height - 2 and width - 2 because the arguments of height and width to printImage are HEIGHT + 2 and WIDTH + 2:

if (argc == 2) {
  printImage(HEIGHT + 2, WIDTH + 2, *imageOutput, sum);
} else {
  printImage(HEIGHT + 2, WIDTH + 2, *imageOutput, -1);
}
amyk added a comment.Jul 2 2021, 11:08 AM

Since the revision of updating the benchmark dependency of Microbenchmarks affected this change I made to harris, I tried to redo this patch and noticed something.
If I start off with the change of making imageOutput to have the size of [HEIGHT][WIDTH] instead of [HEIGHT + 2][WIDTH + 2], the image output I get on Linux and AIX is almost the same, except for the first line that is printed:

$ diff AIX_output.txt Linux_output.txt
1c1
< 0000002552552552552552552552552552550000000000255 . . . 
> 0000000000000000000000000000000000000000000000000 . . . 

So I think this relates to the height of the image. Updating harrisKernel to assign outputImg from 0 until height resolves the issue of the first line being different.

However, I found that the I can no longer change the width to also start from 0.

-  for (int _i0 = 2; (_i0 < height); _i0++) {
+  for (int _i0 = 0; (_i0 < height); _i0++) {
     for (int _i1 = 2; (_i1 < width); _i1++) {
       outputImg[_i0][_i1] =
           (det[_i0][_i1] - ((0.04f * trace[_i0][_i1]) * trace[_i0][_i1]));
     }
   }
 }

It looks like currently, width needs to stay the same, or there is more variation in the lines that are produced.

$ diff AIX_output.txt Linux_output.txt
10,13c10,12
< 3300000033330000003333000000333300000033330000003333000000 . . . 
< 3300000033330000003333000000333300000033330000003333000000 . . . 
< 3300000033330000003333000000333300000033330000003333000000 . . .
< 3300000000000000000000000000000000000000000000000000000000 . . . 
---
> 0000000033330000003333000000333300000033330000003333000000 . . .
> 0000000033330000003333000000333300000033330000003333000000 . . . 
> 0000000033330000003333000000333300000033330000003333000000 . . .
. . . 

I can update the patch based on this change that I've observed.

amyk updated this revision to Diff 356225.Jul 2 2021, 11:12 AM

Update patch to:

  • change imageOutput to have a size of [HEIGHT][WIDTH].
  • Remove unnecessary parentheses around HEIGHT.
  • Update harrisKernel to assign height from 0.

This updated revision reflects the change that is discussed in the comment above.

  • Fix use of unitialized memory in MicroBenchmarks/harris

Since we did not seem to advance here, I took the initiative and fixed the test according to what I had in mind.

Changes:

  • Increase border size of input image to 2 in each direction
  • Remove unused border of Sxx,Syy,Sxz,det,trace
  • Normalize loop bounds to start with 0
  • Remove superfluous parentheses

Valgind memcheck, AddressSanitizer and MemorySanitizer(*)-clean

(*) msan still complains about use of initialized memory in Google Benchmark

  • Remove some more unnecessary parens
amyk added a comment.Jul 11 2021, 11:43 PM

Since we did not seem to advance here, I took the initiative and fixed the test according to what I had in mind.

Changes:

  • Increase border size of input image to 2 in each direction
  • Remove unused border of Sxx,Syy,Sxz,det,trace
  • Normalize loop bounds to start with 0
  • Remove superfluous parentheses

Valgind memcheck, AddressSanitizer and MemorySanitizer(*)-clean

(*) msan still complains about use of initialized memory in Google Benchmark

Hi, I'd like to apologize for previously misunderstanding your review comments earlier, and sorry for a lot of back and forth on this patch.
Nonetheless, thank you very much for updating the patch, I greatly appreciate it.

I've applied this patch and this also runs successfully on AIX.

Sorry if this is a silly question, but may I ask behind the justification of increasing the border size for the input image?

Sorry if this is a silly question, but may I ask behind the justification of increasing the border size for the input image?

Computing Ix and Iy each uses 1 neighbor in east/west respectively north/south direction from the original image, which requires the input to be +2 larger than the output. In application where input/output must be the same size, it is common to mirror the pixels along the image borders. Here, we just ask initCheckboardImage to create a larger image.

Similarly computing Iyy, Sxy, Sxx, Syy requires a larger border from Ix and Iy. Both accumulate to a +4 for both stencils.

amyk added a comment.Jul 12 2021, 4:55 PM

Sorry if this is a silly question, but may I ask behind the justification of increasing the border size for the input image?

Computing Ix and Iy each uses 1 neighbor in east/west respectively north/south direction from the original image, which requires the input to be +2 larger than the output. In application where input/output must be the same size, it is common to mirror the pixels along the image borders. Here, we just ask initCheckboardImage to create a larger image.

Similarly computing Iyy, Sxy, Sxx, Syy requires a larger border from Ix and Iy. Both accumulate to a +4 for both stencils.

Ah, I see. Thanks a lot for the explanation.
I would also like to ask would it make sense for you to commandeer this revision since you've updated the patch?

Meinersbur commandeered this revision.Jul 12 2021, 5:18 PM
Meinersbur edited reviewers, added: amyk; removed: Meinersbur.

OK. I took over the patch. Please try it on AIX and LGTM it if you think it is good.

amyk accepted this revision.Jul 14 2021, 11:18 AM

OK. I took over the patch. Please try it on AIX and LGTM it if you think it is good.

I've tried this on AIX and the output I'm getting matches the reference output. Thank you again for your assistance on this patch.
LGTM.

This revision is now accepted and ready to land.Jul 14 2021, 11:18 AM