Page MenuHomePhabricator

[clang-tidy] Linting .rst documentation
Needs ReviewPublic

Authored by MyDeveloperDay on Dec 10 2018, 10:09 AM.

Details

Summary

(retitled from Looking for feedback on add_new_check.py functionality)
This is really a first revision to get some feedback from code owners about where a change like this should be added (if at all)

Recently while submitting a patch D55433: [clang-tidy] Adding a new modernize use nodiscard checker and looking at other similar reviews D48866: [clang-tidy] Add incorrect-pointer-cast checker,D54222: [clang-tidy] Add a check to detect returning static locals in public headers , I noticed a number of times my review was commented on with "Please use 80 characters limit." in .rst files.

as clang-format doesn't automatically correct these files its really easy for these comments to creep back in in subsequent revisions, potentially resulting in more comments and more revisions.

We could add validation of the .rst files as a step that could be performed prior to submitting a patch to help get reviews ready, this would reduce the repetition that key reviewers have to perform for each new person who ventures to submit something.

We could either put such a validation step directly into add_new_check.py

e.g.
add_new_check.py -validate checker_module checker_name

or as a new python file (which is what I include here mainly for brevity)

validate_check.py checker_module checker_name

This would then validate the line length of .rst files, and determine if the line could be filled to closer to 80 characters by including the first word from next line

Included in this patch, are the changes found by running this with

validate_check.py modernize-use-noexcept

It checks both docs/ReleaseNotes.rst and the specified checker in docs/clang-tidy/checks/module_name-checker_name.rst

and emits outputs like....

e.g.

Checking ..\docs\clang-tidy\checks\modernize-use-noexcept.rst...
Line 32 maximize 80 characters by joining:'[Users can use :option:`ReplacementString` to specify a macro to use
]'
and
'[instead]


Line 33 maximize 80 characters by joining:'[instead of ``noexcept``.  This is useful when maintaining source code
]'
and
'[that]

Diff Detail

Event Timeline

MyDeveloperDay added a project: Restricted Project.
MyDeveloperDay edited the summary of this revision. (Show Details)

Hi,

i think the idea is good to have more tools that do style checks for us.
Plugging them into the normal test-suite is maybe a good idea, as otherwise everyone forgets to use them ;)

But I feel @alexfh or @aaron.ballman opinions should be taken here, too.

Thank you for great idea! I think running this script should be part of build, like running Clang-format in Polly.

MyDeveloperDay marked an inline comment as done.Dec 10 2018, 11:50 AM
MyDeveloperDay added inline comments.
docs/ReleaseNotes.rst
205 ↗(On Diff #177547)

There are some places where it doesn't work so well, like line 203, it simply can't be cut down and its actually 81 characters

JonasToth added inline comments.Dec 10 2018, 11:53 AM
docs/ReleaseNotes.rst
205 ↗(On Diff #177547)

Yes, but you should be able to detect such a situation. If you have this functionality it i think it would be good to go.

Other common problems which will be great to detect with this script:

  • Double spaces
  • Empty lines at end of file
  • Ensure same length of headers titles and markup.

Other common problems which will be great to detect with this script:

  • Double spaces
  • Empty lines at end of file
  • Ensure same length of headers titles and markup.

It seems you also picked me up on

  • double multiple adjacent blank lines...

I would also expect

  • trailing white-space to perhaps be a comment review comment

Other common problems which will be great to detect with this script:

  • Ensure same length of headers titles and markup.

Could you explain what you mean by this? I'm assuming you mean that when you see a title with --- or === or ^^^ then you expect the line about to be the exact same length, is that correct?

Example
-------

Added new checks to help reduce review comments in documentation

  • check for unnecessary empty lines at the end of the rst files
  • check for double spaces
  • check for extraneous multiple blank lines
  • check for extraneous trailing whitespace
  • ensure markup title and markup characters --- === ^^^ match in lengh
  • exclude lines containing <clang-tidy
  • reformat message to look like a compiler warning

in looking closer at an example of the output..

$ ./validate_check.py readability avoid-const-params-in-decls
Checking ..\docs\ReleaseNotes.rst...
warning: line 27 is in excess of 80 characters (81) : the latest release, please see the `Clang Web Site <https://clang.llvm.org>`_ or
...
warning: line 31 maximize 80 characters by joining:'[Note that if you are reading this file from a Subversion checkout or the]' and '[main...]

warning: line 32 maximize 80 characters by joining:'[main Clang web page, this document applies to the *next* release, not]' and '[the...]

warning: line 33 maximize 80 characters by joining:'[the current one. To see the release notes for a specific release, please]' and '[see...]

warning: line 81 maximize 80 characters by joining:'[  floating-point context, and recommends the use of a function that]' and '[returns...]

warning: line 162 maximize 80 characters by joining:'[  ``namespace a { namespace b { ... }}`` and offers change to]' and '[syntax...]

warning: line 167 is in excess of 80 characters (81) :   Detects usage of the deprecated member types of ``std::ios_base`` and replaces
...
warning: line 173 trailing whitespace
Checking ..\docs\clang-tidy\checks\readability-avoid-const-params-in-decls.rst...
warning: line 16 contains double spaces
warning: line 17 contains double spaces
You are ready to review!

I realize i have a couple of out by 1 errors on the line_num and the length not excluding the newline, which I'll address in the next patch

Add capability to check all the checks in one go (suitable for a build step)

Example output from running with validate_check.py --check-all

MyDeveloperDay updated this revision to Diff 177697.EditedDec 11 2018, 5:44 AM

Reduce false positives

Other common problems which will be great to detect with this script:

  • Ensure same length of headers titles and markup.

Could you explain what you mean by this? I'm assuming you mean that when you see a title with --- or === or ^^^ then you expect the line about to be the exact same length, is that correct?

Example
-------

Yes, you understood correctly.

Other common problems which will be great to detect with this script:

  • Ensure same length of headers titles and markup.

Could you explain what you mean by this? I'm assuming you mean that when you see a title with --- or === or ^^^ then you expect the line about to be the exact same length, is that correct?

Example
-------

Yes, you understood correctly.

This should make you happy!

Checking .\../docs/clang-tidy/checks\hicpp-no-assembler.rst...
warning: line 4 title and markup non matching lengths
.. title:: clang-tidy - hicpp-no-assembler

hicpp-no-assembler
===================

Check for assembler statements. No fix is offered.

Other common problems which will be great to detect with this script:

  • Ensure same length of headers titles and markup.

Could you explain what you mean by this? I'm assuming you mean that when you see a title with --- or === or ^^^ then you expect the line about to be the exact same length, is that correct?

Example
-------

Yes, you understood correctly.

This should make you happy!

Checking .\../docs/clang-tidy/checks\hicpp-no-assembler.rst...
warning: line 4 title and markup non matching lengths
.. title:: clang-tidy - hicpp-no-assembler

hicpp-no-assembler
===================

Check for assembler statements. No fix is offered.

Great! Thank you for script improvements!

Neat script, I think we could even plug that into arc diff. AFAIK it can run linters

clang-tidy/validate_check.py
22

Please align the string better.

48

Please make the comments here grammatically correct, too.

64

using parens in python is not necessary. if not in_code_block: is enough. Same at all other places. I would only use parens if you mix or and and.

71

We had cases were code-block is indented, sometimes its even necessary if you have a nested code-block in an option field or so. I think that should be considered.

93

we don't count higher? :)
How about a little regex close to r"^\d+\. .*"? Not sure if that is actually correct now, but please try it out ;)

103

What about \n?

121

That condition is quite hard to read. Could make a little function with a good name that takes the necessary parameters and returns a boolean instead?

123

please add space around the second >

148

please align that line better

164

Does that allow multiple arguments too? It would be great to just provide a list.

Maybe nargs=+, required=False, default = [None,] (not sure if default would be required) would fit better?

169

Does it need to be a new check? I think you could drop that.

180

Please add Both module and check... to make it clear they have to be set both.

190

please return 0 on success and 1 or custom error codes on failure.

202

sys.exit(main()), CI could always validate and use the return code for success.

Address review comments

MyDeveloperDay marked 12 inline comments as done.Dec 12 2018, 10:44 AM

I think will be good idea to rename script to validate_documentation.py. Or may be lint?

JonasToth added inline comments.Dec 13 2018, 3:12 AM
clang-tidy/validate_check.py
103

Nit: missing whitespace after comma, you can use yapf or autopep8 to format your python code, too.

MyDeveloperDay retitled this revision from [clang-tidy] Looking for feedback on add_new_check.py functionality to [clang-tidy] Linting .rst documentation.Dec 13 2018, 1:56 PM
MyDeveloperDay edited the summary of this revision. (Show Details)

I think will be good idea to rename script to validate_documentation.py. Or may be lint?

I guess this really turned into this is sphinx linter, for which perhaps something like that already exists, however I think unless I put a full parser in it would never really cover everything without too many exceptions...

I noticed that recently the static analyzer team also suggested adding their documentation in sphinx format D54429: [analyzer] Creating standard Sphinx documentation by @dkrupp , not wanting to criticize them but it didn't take me long to discover places where they also broke some of the rules like double empty adjacent lines and misaligned markup headers etc..

So then I looking inside clang/docs.... and again it was easy to find multiple places where the file had minor issues

$ ./validate_check.py --no-maximize --rst ../../../docs/ReleaseNotes.rst
Checking ..\..\..\docs\ReleaseNotes.rst...
warning: line 115 multiple blank lines
warning: line 157 multiple blank lines
warning: line 184 multiple blank lines
warning: line 234 title and markup non matching lengths
warning: line 236 multiple blank lines
warning: line 240 multiple blank lines
warning: line 258 multiple blank lines
warning: line 266 multiple blank lines
warning: line 280 is in excess of 80 characters (82) :   was extended. One more type of issues is caught - implicit integer sign change.

so then I headed over to llvm/docs and I see, you guess it, more cases at that level too

Checking ..\..\..\..\..\docs\ReleaseNotes.rst...
warning: line 13 multiple blank lines
warning: line 18 contains double spaces
warning: line 18 is in excess of 80 characters (82) : release 8.0.0.  Here we describe the status of LLVM, including major improvements
...
warning: line 20 contains double spaces
warning: line 31 contains double spaces
warning: line 59 multiple blank lines
warning: line 65 multiple blank lines
warning: line 71 multiple blank lines
warning: line 83 title and markup non matching lengths
warning: line 88 title and markup non matching lengths
warning: line 95 multiple blank lines
warning: line 96 multiple blank lines
warning: line 100 multiple blank lines
warning: line 109 multiple blank lines
warning: line 117 contains double spaces

its almost as if this script could live as a tool on its own not just for clang-tidy, and whilst a few formatting errors aren't anything to write home about, it perhaps constitutes of "review comments" that went unmissed, and maybe its something some people might worry about.

Even without the rule that tries to maximize the 80 characters per line (which I added a -no-maximize switch to allow it to be turned off) and running over the full LLVM tree

$ find . -name '*.rst' -exec ./tools/clang/tools/extra/clang-tidy/validate_check.py --no-maximize --rst {} \; | grep warning | wc -l

14172

Gave 14,000+ warnings across the entire LLVM tree

6231 - line is in excess of 80 characters
4774 - line contains double spaces
1528 - line multiple blank lines
1078 - line trailing whitespace
484 - title and markup non matching lengths
81 - empty lines at the end of the file (almost 16% of all .rst files)

Of course some of these lines might have multiple errors...

I'm not sure if anyone is interested in such a linter or if something already exists? perhjaps I should be forwarding this to someone who looks after llvm/utils/lint, but the last person to make any changes over there was in 2009

I think will be good idea to discuss wider application for this script in llvm-dev. If LLVM use Clng-format, why not to use this script?

MyDeveloperDay planned changes to this revision.Jan 9 2019, 2:26 AM

Addressing some formatting review comments

  • add ability to run with --rst to perform on a single .rst file (useful for validating docs not in clang-tidy)
  • add ability to ignore the 80 character rulles (with -no-maximize), (some of the other documentation has lots of this violations and we might want to fix the obvious checks first)
MyDeveloperDay marked an inline comment as done.Jan 10 2019, 2:09 AM
clang-tidy/validate_check.py
37

@MyDeveloperDay did you consider using the fairly standard docutils package to avoid the manual parsing of rst file? Something along

python
import docutils.nodes
import docutils.parsers.rst
import docutils.utils
def parse_rst(path):
    with open(path) as fd:
        text = fd.read()
        parser = docutils.parsers.rst.Parser()
        components = docutils.parsers.rst.Parser,
        settings = docutils.frontend.OptionParser(components=components).get_default_values()
        document = docutils.utils.new_document('<rst-doc>', settings=settings)
        parser.parse(text, document)
        return document

class MyVisitor(docutils.nodes.NodeVisitor):
   def visit_literal_block(self, node):
     print(node.raw_source)  # do stuff here
   def unknown_visit(self, node):
    pass

looks more resilient to me...

MyDeveloperDay marked an inline comment as done.Jan 14 2019, 1:16 AM
MyDeveloperDay added inline comments.
clang-tidy/validate_check.py
37

I'll take a look that looks like a good suggestion, especially if it gives us access to the original text

A few questions:

  1. why do we want to lint just clang-tidy docs and not all LLVM's .rst files?
  2. did you try the existing tools to lint rst, for example https://github.com/twolfson/restructuredtext-lint?
  3. maybe there are some automated formatters for rst out there?

A few questions:

  1. why do we want to lint just clang-tidy docs and not all LLVM's .rst files?
  2. did you try the existing tools to lint rst, for example https://github.com/twolfson/restructuredtext-lint?
  3. maybe there are some automated formatters for rst out there?

@alexfh, All completely valid questions...to be honest at first I was just trying to reduce the number of code review comments that were "remove the double blank line", "keep the file to 80 chars" that kept coming back, maybe it comes with the "tidy" territory, but I noticed the reviews are really hot on whitespace even in the documentation (I'm fine with that!) . To be honest i'm too much of a newbie to go diving in elsewhere in LLVM, and the thought of suggesting some other tool to the toolchain was beyond my experience.

I was really just trying to build was a "pre-review" build step..to reduce the number of times I ended up going around on the reviews...but also more to try to save the normal clang-tidy reviewers some typing.. It was only later I realized that this could impact the rest of the LLVM rst files and so its clear my solution would need to be less clang-tidy centric..

Possibly other rst lint tools is a better way to go, so I'm in two minds as to whether to just abandon this, however I still use my script extensively to double check every time I make a patch, so its useful at least for me. When I see reviews from others I often bring down their code and run the script and use it to help "review" their change.

To be honest it might even be possible to build a review bot that could automatically add review comments to the rst files, I know I've done something similar with an another Phabricator instance. (I used a herald rule to post an http message to a CI system on review creation which would fetch the review diff and then run a linter over the patch and then submit the review comments, back into the review moments after the review was created, no one took offense because it was a bot user, and generally the review comments prevented the normal reviewers from having to repeat themselves.).. something like this that just said "run-clang-format" if the -output-replacements-xml file was empty could probably save people a ton of "please run clang-format on this" type comments.

Ultimately I see from the "pre-review" script I've put in place for myself that it save me time and time again, ensuring everything is clang-formatted, building the sphinix documentation, creating the patch with full context, linting the rst files, running the unit test and even running clang-tidy itself with some readability rules on the code I'm submitting, means I generally catch issues 80% of the time. Given that my first commit caused @JonasToth to have to make 2 or 3 other commits just to correct my work because I didn't know what else would break in the lab, meant I didn't want to make those mistakes again.

I see a lot of documentation about how to handle reviews and using Phabricator (I'm a massive Phabricator fan), I just wonder how many review comments are the same every time, as new people come and go, it must be a continuous education exercise for those dedicated reviewers like yourself and others to TEACH people the "ways" and have to justify why its done like that to every new developer that comes along with their own style ..all of which feels like it could be automated to de-personalize those aspects of the review process and make the review about the "approach" and not about the tidyness or conforming to the way its been decided.

A few questions:

  1. why do we want to lint just clang-tidy docs and not all LLVM's .rst files?

Proposal was made on llvm-dev, but it didn't attract interest.

A few questions:

  1. why do we want to lint just clang-tidy docs and not all LLVM's .rst files?

Proposal was made on llvm-dev, but it didn't attract interest.

Maybe I'll just put the script over on github in a clang-utils project so its easier to download...and then Abandon this review here....then we can always point people at it if they find it useful.

A few questions:

  1. why do we want to lint just clang-tidy docs and not all LLVM's .rst files?

Proposal was made on llvm-dev, but it didn't attract interest.

Maybe I'll just put the script over on github in a clang-utils project so its easier to download...and then Abandon this review here....then we can always point people at it if they find it useful.

Not a fan of that. I would rather "sneak" this script (or similar functionality) in LLVMs repo, even if its just in clang-tidy. It can then be promoted way easier to other places.
That the proposal did not attract interest might have different reasons and should not be seen as general rejection in my opinion.

You said you already did something like this in another case: I would be very interested of having this. If I do arc diff master to upload a patch there is a linting-stage, how could this be hooked in there?
I did read a bit but ended up with you can extend phabricator with some php, which I did not follow further.

MyDeveloperDay added a comment.EditedJan 25 2019, 2:54 AM

Phabrciator can use Herald rules to react to a change in a revision

If that is a gobal rule it can run a Harbormaster build plan, I can't tell if this will work (because I don't have permission on the LLVM phabricator) but someone has already set up a arc lint + arc unit build plan

it may be possible to use that, if thats the case we may just be able to set up the validation of .rst files are as a arc linter in .arcconfig (without needing people to use arc to submit the reviews)

alternatively Harbormaster can make an HTTP request to a CI system, passing information about the review PHID, its then actually very simple json and wget or curl type calls to talk back to Phabricator to insert review comments

see https://reviews.llvm.org/conduit/method/differential.createinline/

So you could imagine, someone makes a review, and a CI system pulls the contents, runs some simple checks (like are you clang-formatted) then inserts a review comment back into the reivew..

This means you don't really need to hack in php (although I've customized our own phabricator its very easy as it has an extensibility mechanism)

Phabricator is very powerful, we are only really scratching the surface.. take a look here at some of the things you can do...

https://reviews.llvm.org/conduit/

To be honest it would probably best if we made contact with whoever is the administrator of Phabricator (do we know who that is?), because there are some very simple things that can be added to make life a little easier.. (for example like using the bugtraq.logregex and bugtraq.url settings to automatically turn Bugzilla PR numbers in messages into hyperlinks) (http://www.howtobuildsoftware.com/index.php/how-do/bFAT/issue-tracking-phabricator-link-to-issue-page-from-within-phabricator)

Phabrciator can use Herald rules to react to a change in a revision

If that is a gobal rule it can run a Harbormaster build plan, I can't tell if this will work (because I don't have permission on the LLVM phabricator) but someone has already set up a arc lint + arc unit build plan

it may be possible to use that, if thats the case we may just be able to set up the validation of .rst files are as a arc linter in .arcconfig (without needing people to use arc to submit the reviews)

alternatively Harbormaster can make an HTTP request to a CI system, passing information about the review PHID, its then actually very simple json and wget or curl type calls to talk back to Phabricator to insert review comments

see https://reviews.llvm.org/conduit/method/differential.createinline/

So you could imagine, someone makes a review, and a CI system pulls the contents, runs some simple checks (like are you clang-formatted) then inserts a review comment back into the reivew..

This means you don't really need to hack in php (although I've customized our own phabricator its very easy as it has an extensibility mechanism)

Phabricator is very powerful, we are only really scratching the surface.. take a look here at some of the things you can do...

https://reviews.llvm.org/conduit/

To be honest it would probably best if we made contact with whoever is the administrator of Phabricator (do we know who that is?), because there are some very simple things that can be added to make life a little easier.. (for example like using the bugtraq.logregex and bugtraq.url settings to automatically turn Bugzilla PR numbers in messages into hyperlinks) (http://www.howtobuildsoftware.com/index.php/how-do/bFAT/issue-tracking-phabricator-link-to-issue-page-from-within-phabricator)

Fancy :)
I know that @sammccall is an admin, at least the profile says so :)
That would be a cool way to start using clang-tidy in LLVM, as its easier to enforece new code to be clean instead of fixing everything to not drown in warnings.

Following on from the discussion over in D49116: Setup clang-format as an Arcanist linter

if we added the validate_check.py into llvm/utils and created the following in .arclint in (llvm/tools/clang/tools/extra I'm not using the gitmonrepo yet)

.arclint

{
  "linters": {
    "validate-rst": {
      "type": "script-and-regex",
      "script-and-regex.script": "../../../../utils/validate_check.py --rst ",
      "script-and-regex.regex": "/^(?P<severity>warning|error): line (?P<line>\\d+) (?P<message>.*)$/m",
      "include": [
        "(\\.(rst)$)"
      ]
    }
  }
}

Then running arclint (after makes some breaking changes to .rst files) would give

I think using the monorepo this line would have to change

"script-and-regex.script": "../../../../utils/validate_check.py --rst ",

to

"script-and-regex.script": "../llvm/utils/validate_check.py --rst ",
"script-and-regex.script": "../../../../utils/validate_check.py --rst ",

The question now is: should we split the rst linting from other aspects? Imho yes and put the .rst-linting into LLVM space. We can still have a local .arclint-file to force only the CTE ppl into using it.
But in principle I would like to have that.
The monorepo-vs-non-monorepo is still interesting as we can specify both correct paths, can we?

The question now is: should we split the rst linting from other aspects? Imho yes and put the .rst-linting into LLVM space. We can still have a local .arclint-file to force only the CTE ppl into using it.

I'd keep the rst linting separate from the other linters, the "include" part will ensure only the .rst files in the review will be checked.

The monorepo-vs-non-monorepo is still interesting as we can specify both correct paths, can we?

No we can only have one path, which would need it to be different in monorepo-vs-non-monorepo, but this would be true for the clang-format linter suggested in D49116 too, so that linter will only work for the main llvm repo, (not clang or CTE)

The .arclint file will only work in CTE with the same .arclint file (copied into CTE) if both the .arclint file and the scripts are put into a new utils directory in tools/extra to match that of llvm (and if the clang repo wanted the same again it would need to be duplicate)

alternatively the path is given in an absolute form, but then your expecting everyone to have it in the same place likely outside the llvm tree (which isn't a great idea)

A more unified .arclint could be the following, if each repo had its own utils..

{
  "linters": {
    "validate-rst": {
      "type": "script-and-regex",
      "script-and-regex.script": "utils/validate_check.py --rst ",
      "script-and-regex.regex": "/^(?P<severity>warning|error): line (?P<line>\\d+) (?P<message>.*)$/m",
      "include": [
        "(\\.(rst)$)"
      ]
    }
  }
}

Its not a huge leap of faith to imagine a clang-tidy linter, in a similar fashion..

"clang-tidy": {
  "type": "script-and-regex",
  "script-and-regex.script": "clang-tidy $0.....",
  "include": [
    "(\\.(cpp)$)"
  ]
}

The question now is: should we split the rst linting from other aspects? Imho yes and put the .rst-linting into LLVM space. We can still have a local .arclint-file to force only the CTE ppl into using it.

I'd keep the rst linting separate from the other linters, the "include" part will ensure only the .rst files in the review will be checked.

The monorepo-vs-non-monorepo is still interesting as we can specify both correct paths, can we?

No we can only have one path, which would need it to be different in monorepo-vs-non-monorepo, but this would be true for the clang-format linter suggested in D49116 too, so that linter will only work for the main llvm repo, (not clang or CTE)

The .arclint file will only work in CTE with the same .arclint file (copied into CTE) if both the .arclint file and the scripts are put into a new utils directory in tools/extra to match that of llvm (and if the clang repo wanted the same again it would need to be duplicate)

LLVM decided on the monorepo, so we don't need to find a solution for the non-monorepo, as we can expect that all devs will use the monorepo eventually.
The utils folder in LLVM is meant for such kind of scripts, so we should aim there, at least use a utils/ directory in one of the subprojects.
clang-tidy can be expected to be installed on the system though. (what happens if the command fails for some reason, will it be an error?)