This is an archive of the discontinued LLVM Phabricator instance.

FileCheck [2/12]: Stricter parsing of -D option
ClosedPublic

Authored by thopre on Apr 7 2019, 4:17 PM.

Details

Summary

This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch gives earlier and better
diagnostics for the -D option.

Prior to this change, parsing of -D option was very loose: it assumed
that the part on the left of the equal sign was a valid variable name.
This commit adds logic to ensure that this is the case and gives
diagnostic when it is not, making it clear that the issue came from a
command-line option error. This is achieved by sharing the variable
parsing code into a new function ParseVariable.

Copyright:

  • Linaro (changes up to diff 183612 of revision D55940)
  • GraphCore (changes in later versions of revision D55940 and in new revision created off D55940)

Event Timeline

thopre created this revision.Apr 7 2019, 4:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2019, 4:17 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
thopre updated this revision to Diff 194151.Apr 8 2019, 8:25 AM

Use camel casing for new functions

arsenm added a subscriber: arsenm.Apr 11 2019, 12:32 PM
arsenm added inline comments.
llvm/lib/Support/FileCheck.cpp
58–60

I think this should be split into a separate isValidVarNameStart function or something. I would also invert this to be a whitelist. We probably also need to at least disallow $, @, +, - and :

tra removed a subscriber: tra.Apr 11 2019, 1:16 PM
thopre updated this revision to Diff 194779.Apr 11 2019, 3:56 PM

Rebase and fix Optional usage

thopre marked an inline comment as done.Apr 12 2019, 2:02 AM
thopre added inline comments.
llvm/lib/Support/FileCheck.cpp
58–60

I'm not sure what you mean by invert to a whitelist. Do you mean:

Name[0] == '_' || isalpha(Name[0])

? Regarding $, @, +, - and : I believe they are already rejected by the line above checking for isalnum or '_', isn't it?

Wouldn't it be preferable to put the command-line option validation in the driver, rather than here?
The division of responsibilities between the driver and the library is really not clear-cut; normally I'd expect the driver to do option parsing/validation and the library to simply accept the list. (I understand that FileCheck currently isn't always organized that way, but it's something I'd like to see happen.)

thopre updated this revision to Diff 194865.Apr 12 2019, 6:31 AM
thopre marked an inline comment as done.
  • Fix variable parsing logic
  • Add unit testing

Wouldn't it be preferable to put the command-line option validation in the driver, rather than here?
The division of responsibilities between the driver and the library is really not clear-cut; normally I'd expect the driver to do option parsing/validation and the library to simply accept the list. (I understand that FileCheck currently isn't always organized that way, but it's something I'd like to see happen.)

If the library was only used by the FileCheck executable I would agree but since it is used as a library by some of the tests I believe it makes sense to test here. Some of the code could go away if the library was receiving global defines as a vector of pair of StringRef but the validation of variable name would still be necessary IMO.

Wouldn't it be preferable to put the command-line option validation in the driver, rather than here?
The division of responsibilities between the driver and the library is really not clear-cut; normally I'd expect the driver to do option parsing/validation and the library to simply accept the list. (I understand that FileCheck currently isn't always organized that way, but it's something I'd like to see happen.)

If the library was only used by the FileCheck executable I would agree but since it is used as a library by some of the tests I believe it makes sense to test here. Some of the code could go away if the library was receiving global defines as a vector of pair of StringRef but the validation of variable name would still be necessary IMO.

IMO this is not a user-facing API. I think it's entirely reasonable to trust the unittests to call the API correctly, and have the user-facing driver validate the user-provided input. If you want extra checks in the library in Asserts mode, that would be fine.

Wouldn't it be preferable to put the command-line option validation in the driver, rather than here?
The division of responsibilities between the driver and the library is really not clear-cut; normally I'd expect the driver to do option parsing/validation and the library to simply accept the list. (I understand that FileCheck currently isn't always organized that way, but it's something I'd like to see happen.)

If the library was only used by the FileCheck executable I would agree but since it is used as a library by some of the tests I believe it makes sense to test here. Some of the code could go away if the library was receiving global defines as a vector of pair of StringRef but the validation of variable name would still be necessary IMO.

IMO this is not a user-facing API. I think it's entirely reasonable to trust the unittests to call the API correctly, and have the user-facing driver validate the user-provided input. If you want extra checks in the library in Asserts mode, that would be fine.

Actually, while this parsing code can easily be moved to the FileCheck executable by assuming the name is well formed, follow-on commits eventually allow to define a numeric variable from the command-line to a numeric expression so the parsing code is reused to avoid a lot of code duplication and the parsing code expect a SourceMgr to print the diagnostic to.

Therefore, if we want to remove the diagnostic from that function we need to remove the ability to define a numeric variable from a numeric expression (which allows stuff like -D#NUMVAR1=10 -D#NUMVAR2=NUMVAR1+4. How useful do you think this ability is?

jhenderson added inline comments.Apr 15 2019, 4:34 AM
llvm/lib/Support/FileCheck.cpp
27

Here and elsewhere below, I don't believe the style has changed yet and this should be upper-case 'C' (also 'I', 'E' etc below).

171

Does this print the actual specified string in the error message?

178

DefSepIdx doesn't read well to me (as in I'm not sure what is meant by it at first glance). Perhaps a slightly more verbose name would be clearer.

183

Lower-case 'i' I think is more in keeping with elsewhere, so I don't think you should change this.

188–189

Nit: is relaxed -> is a relaxed

1419

Keep the StringRef here.

1420

Use a StringRef here.

llvm/unittests/Support/FileCheckTest.cpp
155–163

These should probably be a separate test since they test a different function.

170

Here and below, it might be worthwhile using VarName.size() (with appropriate - 1 as required) to make the test more robust, in case the input name changes ever.

thopre updated this revision to Diff 195372.Apr 16 2019, 7:07 AM
thopre marked 7 inline comments as done.
  • Fix one letter variable casing
  • Do not capitalize error message
  • Split new unit tests in 2 separate functions
thopre updated this revision to Diff 195374.Apr 16 2019, 7:15 AM
thopre marked 2 inline comments as done.

Express expected result in unit test in terms of variable size

Therefore, if we want to remove the diagnostic from that function we need to remove the ability to define a numeric variable from a numeric expression (which allows stuff like -D#NUMVAR1=10 -D#NUMVAR2=NUMVAR1+4. How useful do you think this ability is?

I have no idea how useful this would be, but you're making a reasonable argument that in general the library needs to be where the validation occurs, because the acceptable syntax is determined by the library.
Just make it clear to the user where the problem is and what needs fixing.

thopre added inline comments.Apr 16 2019, 8:04 AM
llvm/lib/Support/FileCheck.cpp
171

It'll print the line, followed by a caret pointing to the variable followed by the error message. I thought mentioning the variable name in the error message was superfluous given the caret. Do you want me to include it?

1419

Should I rather use std::string instead? I need to prefix each string with "-D" (see DiagCmdline = Prefix + CmdlineDef) which cannot be done with a reference. Is there anything wrong with using std::string?

1420

Likewise, I need this to be std::string to allow for the concatenation on next line.

jhenderson added inline comments.Apr 16 2019, 8:47 AM
llvm/lib/Support/FileCheck.cpp
171

No, as long as it is clear from the context, there's no need.

188–189

Stricter -> A stricter

1419

Concatenation of StringRefs is supported via Twines. You can choose to make DiagCmdline a Twine itself, or do .str() on the result of the concatenation to get the final std::string a bit earlier, I don't really mind.

1420

See above.

probinson added inline comments.Apr 16 2019, 8:49 AM
llvm/lib/Support/FileCheck.cpp
1420

Why does the library know the name of the option passed to the driver? That's not a good separation of concerns.

thopre updated this revision to Diff 195525.Apr 17 2019, 2:22 AM
thopre marked 8 inline comments as done.
  • Make diagnostic for command-line global define parsing more neutral wrt. user of the FileCheck library
  • Use StringRef in the loop for command-line global define parsing
  • Fix wording of comment for @LINE parsing

I have only one more minor comment. Happy for this to go in with or without it, but @probinson should give the final thumbs up.

llvm/test/FileCheck/defines.txt
38

The line and column numbers like a bit weird to me, but isn't a big deal. If there's a way to not have them though, it would be better. Alternatively, if they have some actual meaning, please have a test case that produces different values for them.

thopre marked an inline comment as done.Apr 17 2019, 4:03 AM
thopre added inline comments.
llvm/test/FileCheck/defines.txt
38

Not have them in the output or in this expected text?

If the former yes currently it's not meaningful because the current code create a separate buffer with same name (Global defines) for each definition, so all definition would be a line 1, and the name would be a column 1 (I think it's good to test that bit since it shows we point to the variable name rather than the value).
I could change the code to have one line per definition which could be useful if a variable is defined twice from the command-line (i.e. -DFOO=10 -DFOO=12). Alternatively I could make the definition one after the other on the same line, perhaps separated by space (i.e. output would be FOO=10 FOO=12 with the caret pointing to the right FOO). Both of these approaches will require to loop over definition twice (once to construct the buffer, the second to parse it).

If the latter, I cannot introduce variations because the name will always be first but I think it's worth testing that the location is accurate.

thopre updated this revision to Diff 195578.Apr 17 2019, 8:31 AM

Enumerate global defines in diagnostic

thopre updated this revision to Diff 195752.Apr 18 2019, 8:36 AM

Remove debugging assert and add comment

thopre updated this revision to Diff 196254.Apr 23 2019, 8:18 AM
  • Add unit test for updated -D parsing
  • Move unit tests to be in the same order as in FileCheck.h and FileCheck.cpp
probinson added inline comments.Apr 24 2019, 7:39 AM
llvm/lib/Support/FileCheck.cpp
47

A whole lot of this loop appears to be handling just the optional initial $/@ and then the initial isValidVarNameStart test. I'd think it would be more readable to factor out the handling of the initial one or two characters, then loop over the rest.

probinson accepted this revision.Apr 24 2019, 7:50 AM

LGTM after the comment on the parseVariable loop is addressed.

This revision is now accepted and ready to land.Apr 24 2019, 7:50 AM
thopre updated this revision to Diff 197088.Apr 29 2019, 4:26 AM

Extract prefix characters checks out of loop in parseVariable.

thopre marked an inline comment as done.Apr 29 2019, 4:27 AM
jhenderson accepted this revision.Apr 29 2019, 5:33 AM

LGTM, with a couple of nits.

llvm/lib/Support/FileCheck.cpp
1410

"which definition a given diagnostic corresponds to."

1413

You don't need = std::string() here.

llvm/test/FileCheck/defines.txt
38

I was referring to the output, but I think this looks somewhat better.

thopre updated this revision to Diff 197096.Apr 29 2019, 6:29 AM
thopre marked 2 inline comments as done.

Fix english in comment in defineCmdlineVariables and remove unecessary call to String() constructor

This revision was automatically updated to reflect the committed changes.