This is an archive of the discontinued LLVM Phabricator instance.

Reland "[mlir] add support for verification in integration tests"
ClosedPublic

Authored by gysit on Feb 12 2021, 4:31 AM.

Details

Summary

The patch extends the runner utils by verification methods that compare two memrefs. The methods compare the content of the two memrefs and print success if the data is identical up to a small numerical error. The methods are meant to simplify the development of integration tests that compare the results against a reference implementation (cf. the updates to the linalg matmul integration tests).

Originally landed in 5fa893c (https://reviews.llvm.org/D96326) and reverted in dd719fd due to a Windows build failure.

Changes:

  • Remove the max function that requires the "algorithm" header on Windows
  • Eliminate the truncation warning in the float specialization of verifyElem by using a float constant

Diff Detail

Event Timeline

gysit created this revision.Feb 12 2021, 4:31 AM
gysit requested review of this revision.Feb 12 2021, 4:31 AM
aartbik added inline comments.Feb 12 2021, 11:38 AM
mlir/include/mlir/ExecutionEngine/RunnerUtils.h
219

Please document what verify means in this case (returning true/false, not verify-failing)

mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul.mlir
91–100

did you verify that such a CHECK never has false positives or negatives?
I often see a lot of stray numbers in the output, and just want to make sure this it not too brittle

gysit added inline comments.Feb 12 2021, 12:39 PM
mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul.mlir
91–100

Thank you for the comments.

Are you referring to the numerical aspect of the verification? Or to the single character check?

I verified for a few test cases that I get errors / no errors when I expect them (that's the reason why I introduced the ^ and $). The numerics should be fine for the use cases I have seen so far. The epsilon I chose is quite small compared to the machine epsilon meaning the verification should be sensitive. I believe that is what we want for now.

The best choice for unit testing is of course working with integers. That avoids potential numerical issues.

aartbik added inline comments.Feb 12 2021, 1:49 PM
mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul.mlir
91–100

I was indeed referring to the single character check (just because the output sometimes contains timestamps etc.). I may be overly pedantic but I remember being surprised in the past that a digit-based check actually passed when I did not expect it.

I noticed the ^ and $ so I suspect you gave it some thought, but just wanted to double check. That is why I would prefer some harder check on output (like PASS/FAIL or return code), but if this works, I am fine.

Thanks for confirming.

gysit updated this revision to Diff 323529.Feb 12 2021, 11:53 PM
  • address comments aartbik's comments
  • return error by value
gysit marked 2 inline comments as done.Feb 12 2021, 11:57 PM
gysit added inline comments.
mlir/integration_test/Dialect/Linalg/CPU/benchmark_matmul.mlir
91–100

I understand the concern. But actually returning the number of errors is sometimes really useful. E.g. if you may be able to see that the values of one column row are entirely wrong. It is still possible to extend the test with an scf.if error != 0 then printFail to be on the safer side.

lgtm, perhaps also ask @mehdi_amini to have a quick look, since he is busy refactoring our runner utils

In general I'm a bit worried that this directory is not well structured, but this addition is in line with the kind of things already in this file so I don't feel like there's a strong reason to hold on this.

Kayjukh added inline comments.Feb 14 2021, 1:42 AM
mlir/include/mlir/ExecutionEngine/RunnerUtils.h
248

nit: Could be condensed as return (delta <= epsilon * std::abs(expected));

288

This loop "feels wrong" to me because of the two occurrences of the errors variable, even though it seems to be correct when taking the time to think about it. If I'm not mistaken you could remove the errors argument to the function and always start from zero. This line would then become errors += verify(os, ..., strides + 1);, which makes more sense to me.

306

Would it make more sense to return a boolean indicating whether or not there were errors and update a reference argument accordingly? Otherwise, the fact that this function returns -1 if there are no errors should be clearly documented somewhere, as it can easily be overlooked.

gysit updated this revision to Diff 323604.Feb 14 2021, 3:55 AM
  • addressed Kayjukh's comments
gysit marked 2 inline comments as done.Feb 14 2021, 3:58 AM
gysit added inline comments.
mlir/include/mlir/ExecutionEngine/RunnerUtils.h
288

True this was confusion. I am now using a separate printCounter passed by reference. Another option would be to return the error count by reference but I currently prefer having a return value on all verify methods.

306

I went for documenting the -1 since I like the additional info provided by the return value.

Kayjukh accepted this revision.Feb 14 2021, 10:07 AM

LGTM. Since @aartbik and @mehdi_amini seem to agree, I think you could land this patch.

This revision is now accepted and ready to land.Feb 14 2021, 10:07 AM
This revision was automatically updated to reflect the committed changes.
gysit marked an inline comment as done.