This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Add update_mlir_test_checks.py
Needs Review (Public)

Authored by stephenneuendorffer on Jul 15 2020, 6:50 PM.

Details

Summary

mlir/utils/generate-test-checks.py is somewhat braindead about updating
tests. Recent improvements allow it to modify existing tests, but it still
requires writing a verbose command line that duplicates information
already present in the RUN line of the test. This script follows the pattern
used in the rest of LLVM, with some MLIR-syntax-specific tweaks so that it
generates tests the way generate-test-checks.py does. I expect that this is not
perfect and could/should be refactored to reuse more of UpdateTestChecks/common.py.
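
As a rough illustration of the intended workflow, here is a sketch of a test before and after an update, assuming the script reads the RUN line and inserts CHECK lines above each function. The capture names and CHECK placement are only modeled on generate-test-checks.py conventions, not a guarantee of what this script emits, and the op spellings depend on the MLIR version:

  // RUN: mlir-opt %s | FileCheck %s

  // Before updating, the test contains only the RUN line and the IR below.
  // After updating, CHECK lines roughly like these would be inserted:
  // CHECK-LABEL: func @add_one(
  // CHECK:         %[[VAL_0:.*]] = constant 1 : i32
  // CHECK:         %[[VAL_1:.*]] = addi %{{.*}}, %[[VAL_0]] : i32
  // CHECK:         return %[[VAL_1]] : i32
  func @add_one(%arg0: i32) -> i32 {
    %c1 = constant 1 : i32
    %0 = addi %arg0, %c1 : i32
    return %0 : i32
  }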

Diff Detail

Event Timeline

mehdi_amini added inline comments. Jul 15 2020, 8:37 PM
llvm/utils/update_mlir_test_checks.py
29

Something that I believe is important is to not just have blindly generated CHECKs, but to have a human edit the tests (documenting, minimizing the number of checks, etc.).

Can we have the script add a paragraph mentioning this in the actual generated test (with a link to the testing guide, etc.)?

stephenneuendorffer marked an inline comment as done. Jul 16 2020, 9:35 AM
stephenneuendorffer added inline comments.
llvm/utils/update_mlir_test_checks.py
29

Conceivably...but I'm guessing that we'd end up with the same paragraph in lots of files. I tend to use this to auto-update tests in a mechanical way that is easy to verify but hard to do by hand. I'm specifically trying to avoid opening every file when updating lots of tests, so I think adding a bunch of text to each file that is expected to be removed would be a bad idea. What if we just emitted a boilerplate message about usage and norms when the script is run?

It's kind of confusing that this is in llvm/utils while generate-test-checks.py is in mlir/utils. Is there a plan to reconcile this difference?

mehdi_amini added inline comments. Jul 16 2020, 12:10 PM
llvm/utils/update_mlir_test_checks.py
29

Conceivably...but I'm guessing that we'd end up with the same paragraph in lots of files.

I mean: since the user is supposed to significantly edit the output of the tool by hand anyway, removing the paragraph doesn't seem like much extra work.

The challenge is to avoid output from the script that is just good enough to be checked in as-is but that leads to actually bad tests.

It's kind of confusing that this is in llvm/utils while generate-test-checks.py is in mlir/utils. Is there a plan to reconcile this difference?

It exists in llvm/utils because there are a bunch of other similar scripts there (for llc, mir, ll, etc.). I'm open to suggestions on how to reconcile this. Personally, I would replace generate-test-checks.py with this, because I think it's easier to use in most cases, but sometimes generate-test-checks.py might be better. We could, perhaps, add an option to this that doesn't modify the test but just prints the generated checks on standard output, which would enable both usage models.

stephenneuendorffer marked an inline comment as done. Jul 16 2020, 12:31 PM
stephenneuendorffer added inline comments.
llvm/utils/update_mlir_test_checks.py
29

Personally, I think we'll get better tests if we get people to focus on making good tests rather than on tedious manual editing. I think that this is more about the input of the tests than the format of the check lines. I think forcing people to 'manually edit significantly' should definitely not be the goal.

There seems to be a lot of hardcoding in here related to func. Given that func is not even the only function-like operation in upstream MLIR, and that for several dialects (existing and upcoming) func isn't even used at all, what is the story for those users?

llvm/utils/update_mlir_test_checks.py
29

Forcing lots of manual edits is not the goal, but a more important goal is not to have "diff" tests. IMO, having a large number of users edit tests manually is much better than having one person deal with the fallout of updating hundreds of tests that broke because they were testing something unnecessary. It spreads out the maintenance effort and makes evolution easier. (Speaking from having had to deal with this a lot in practice.)

There seems to be a lot of hardcoding in here related to func. Given that func is not even the only function-like operation in upstream MLIR, and that for several dialects (existing and upcoming) func isn't even used at all, what is the story for those users?

I've been scratching my head a bit about how to build tools like this, which need to understand the custom op formats. I'm also thinking of things like automatic syntax highlighting in emacs and other editors. One possibility is that this should actually be an MLIR-based tool using a different syntax that parses the comments as ops. It would then be able to query operations and apply rules for where test boundaries should be, so instead of hardcoding 'func' as a string, the tool would query for FunctionLikeInterface. There are other things hardcoded in the regexps here too. The best thing I can say at the moment is that the tool just won't work in such cases.
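
To make that concrete, here is a sketch of the kind of input that would presumably not be handled, assuming the script keys on a bare 'func @...' to find function boundaries (the gpu dialect spelling below may vary slightly across MLIR versions):

  // RUN: mlir-opt %s | FileCheck %s

  // There is no top-level 'func' here, so a splitter keyed on that literal
  // string would not find a function-like boundary for gpu.func.
  // CHECK-LABEL: gpu.module @kernels
  // CHECK:         gpu.func @kernel
  gpu.module @kernels {
    gpu.func @kernel(%arg0: f32) kernel {
      gpu.return
    }
  }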

[Drive by] tests?

[Drive by] tests?

Yeah, that would be a good idea.

silvas added inline comments. Jul 16 2020, 9:39 PM
llvm/utils/update_mlir_test_checks.py
3

mlir_opt -> mlir-opt

17

This could use an MLIR-ification

29

+1 to what River says. Encouraging some testing thoughtfulness would be really great.

If we really want to be evil we can insert some random garbage characters in the generated test checks 😈 so that people will have to manually run the test / look through the checks / test failures to remove them. And of course, the easiest way to fix such deliberately mangled tests is to delete whole CHECK lines, types, and other stuff, which is exactly what we want them to be doing; "the more I delete, the less I have to audit for random garbage inserted by the update_mlir_test_checks script".

Less evilly, I think that Mehdi's idea of generating a "test best practices summary + link to complete best practices doc" sounds good. It could look like:

PLEASE make sure that you trim down what is being checked by this test to the minimum that gives decent coverage. That will make life easier for people maintaining the test in the future.
- Does your test need to check all operations? If not, then remove the unnecessary CHECK lines! 
- Does your test need to check the types? If not, then remove the types from the CHECK lines!
- Does your test need to check all attributes of the ops? If not, then use {{.*}} to skip them!
- Does your test really need to check that SSA values are passed correctly (rather than just the presence of an op)? If not, then remove the %[[FOO:.*]] captures from the CHECK lines!

For more info on testing best practices see <link to more thorough docs>
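
To make the contrast concrete, here is a purely hypothetical before/after for one small function; neither set of checks is output from this script, and the op spellings depend on the MLIR version:

  // RUN: mlir-opt %s | FileCheck %s --check-prefix=VERBOSE
  // RUN: mlir-opt %s | FileCheck %s --check-prefix=TRIMMED

  func @clamp_to_zero(%arg0: i32) -> i32 {
    %c0 = constant 0 : i32
    %0 = cmpi "slt", %arg0, %c0 : i32
    %1 = select %0, %c0, %arg0 : i32
    return %1 : i32
  }

  // Generator-style checks: every op, type, and SSA value is pinned down.
  // VERBOSE-LABEL: func @clamp_to_zero(
  // VERBOSE-SAME:    %[[ARG0:.*]]: i32
  // VERBOSE:         %[[C0:.*]] = constant 0 : i32
  // VERBOSE:         %[[CMP:.*]] = cmpi "slt", %[[ARG0]], %[[C0]] : i32
  // VERBOSE:         %[[SEL:.*]] = select %[[CMP]], %[[C0]], %[[ARG0]] : i32
  // VERBOSE:         return %[[SEL]] : i32

  // Hand-trimmed checks: only what the test is actually about.
  // TRIMMED-LABEL: func @clamp_to_zero
  // TRIMMED:         select
  // TRIMMED-NEXT:    return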

Btw, does this script throw away any user edits? (haven't looked at the code very much). If it does, then users will probably get bored of re-trimming-down the autogenerated stuff no matter how much encouragement we give them.

292

why is this code commented out?

302

stray commented out code?

stephenneuendorffer marked an inline comment as done. Jul 17 2020, 12:05 AM
stephenneuendorffer added inline comments.
llvm/utils/update_mlir_test_checks.py
29

Can you help me out, because I don't really buy the "making it hard" part. If people aren't looking at their tests, and reviewers aren't looking at the tests either, then I don't think we can help them.

It also seems to me that minimal CHECK lines come from minimal input. I'd rather have consistent tests (i.e. automated, with good formatting) that are easy to read and update, even if more verbose, than CHECK lines that are absolutely minimal but poorly documented about what they are testing, or missing so much context that they are impossible to debug when they fail. Can you suggest some good 'heavily minimized' tests, so we can see how much difference there is between what is automatically generated and what is human-minimal? I'd like to encourage thought to be put into tests, but temper it with some practicality.

Yes, the script throws away user edits. I think that to avoid doing that, we'd need something significantly smarter than a bunch of regexes (along the lines of an MLIR dialect for CHECK lines).

[Drive by] tests?

Yeah, that would be a good idea.

Sorry, I should have used more words. There is a way to add lit tests for these scripts now; see llvm/test/tools/UpdateTestChecks. :)

mehdi_amini added inline comments. Jul 17 2020, 5:02 PM
llvm/utils/update_mlir_test_checks.py
29

It also seems to me that minimal CHECK lines come from minimal input.

I think that is only half of it. Minimizing the input isn't enough to minimize the CHECK lines.
What would the script generate here, for instance: https://github.com/llvm/llvm-project/blob/master/mlir/test/Transforms/sccp.mlir ?

silvas added inline comments. Jul 20 2020, 11:56 AM
llvm/utils/update_mlir_test_checks.py
29

There are a couple of reasons why we might want to build a bit of hand-holding into the script:

  1. Not all MLIR development happens upstream in MLIR core and is subject to the same level of review
  2. Not all MLIR development is being done by people with extensive experience with FileCheck tests/best practices

I definitely sympathize with the idea that minimized check lines can result in tests that are difficult to update and understand (and that require a lot more staring-at to absorb their documentation value). While it is easier for test maintainers to update tests if they just need to run the script, that also loses a lot of the value of the test (what if the test is now actually "failing"?). Except for crashes, simply updating the checks blindly with a script will let bugs slip through.

I think the idea is that a properly minimized test (both the IR and the set of things it is checking) won't be broken by practically any changes other than ones that truly break it (or that are merely cosmetic, like op syntax changes), and those need manual attention anyway.

silvas resigned from this revision. Oct 29 2020, 3:07 PM