Page MenuHomePhabricator

[clang-tidy] Linting .rst documentation
Needs ReviewPublic

Authored by MyDeveloperDay on Dec 10 2018, 10:09 AM.

Details

Summary

(retitled from Looking for feedback on add_new_check.py functionality)
This is really a first revision to get some feedback from code owners about where a change like this should be added (if at all)

Recently while submitting a patch D55433: [clang-tidy] Adding a new modernize use nodiscard checker and looking at other similar reviews D48866: [clang-tidy] Add incorrect-pointer-cast checker,D54222: [clang-tidy] Add a check to detect returning static locals in public headers , I noticed a number of times my review was commented on with "Please use 80 characters limit." in .rst files.

as clang-format doesn't automatically correct these files its really easy for these comments to creep back in in subsequent revisions, potentially resulting in more comments and more revisions.

We could add validation of the .rst files as a step that could be performed prior to submitting a patch to help get reviews ready, this would reduce the repetition that key reviewers have to perform for each new person who ventures to submit something.

We could either put such a validation step directly into add_new_check.py

e.g.
add_new_check.py -validate checker_module checker_name

or as a new python file (which is what I include here mainly for brevity)

validate_check.py checker_module checker_name

This would then validate the line length of .rst files, and determine if the line could be filled to closer to 80 characters by including the first word from next line

Included in this patch, are the changes found by running this with

validate_check.py modernize-use-noexcept

It checks both docs/ReleaseNotes.rst and the specified checker in docs/clang-tidy/checks/module_name-checker_name.rst

and emits outputs like....

e.g.

Checking ..\docs\clang-tidy\checks\modernize-use-noexcept.rst...
Line 32 maximize 80 characters by joining:'[Users can use :option:`ReplacementString` to specify a macro to use
]'
and
'[instead]


Line 33 maximize 80 characters by joining:'[instead of ``noexcept``.  This is useful when maintaining source code
]'
and
'[that]

Diff Detail

Event Timeline

MyDeveloperDay added a project: Restricted Project.
MyDeveloperDay edited the summary of this revision. (Show Details)

Hi,

i think the idea is good to have more tools that do style checks for us.
Plugging them into the normal test-suite is maybe a good idea, as otherwise everyone forgets to use them ;)

But I feel @alexfh or @aaron.ballman opinions should be taken here, too.

Thank you for great idea! I think running this script should be part of build, like running Clang-format in Polly.

MyDeveloperDay marked an inline comment as done.Dec 10 2018, 11:50 AM
MyDeveloperDay added inline comments.
docs/ReleaseNotes.rst
205 ↗(On Diff #177547)

There are some places where it doesn't work so well, like line 203, it simply can't be cut down and its actually 81 characters

JonasToth added inline comments.Dec 10 2018, 11:53 AM
docs/ReleaseNotes.rst
205 ↗(On Diff #177547)

Yes, but you should be able to detect such a situation. If you have this functionality it i think it would be good to go.

Other common problems which will be great to detect with this script:

  • Double spaces
  • Empty lines at end of file
  • Ensure same length of headers titles and markup.

Other common problems which will be great to detect with this script:

  • Double spaces
  • Empty lines at end of file
  • Ensure same length of headers titles and markup.

It seems you also picked me up on

  • double multiple adjacent blank lines...

I would also expect

  • trailing white-space to perhaps be a comment review comment

Other common problems which will be great to detect with this script:

  • Ensure same length of headers titles and markup.

Could you explain what you mean by this? I'm assuming you mean that when you see a title with --- or === or ^^^ then you expect the line about to be the exact same length, is that correct?

Example
-------

Added new checks to help reduce review comments in documentation

  • check for unnecessary empty lines at the end of the rst files
  • check for double spaces
  • check for extraneous multiple blank lines
  • check for extraneous trailing whitespace
  • ensure markup title and markup characters --- === ^^^ match in lengh
  • exclude lines containing <clang-tidy
  • reformat message to look like a compiler warning

in looking closer at an example of the output..

$ ./validate_check.py readability avoid-const-params-in-decls
Checking ..\docs\ReleaseNotes.rst...
warning: line 27 is in excess of 80 characters (81) : the latest release, please see the `Clang Web Site <https://clang.llvm.org>`_ or
...
warning: line 31 maximize 80 characters by joining:'[Note that if you are reading this file from a Subversion checkout or the]' and '[main...]

warning: line 32 maximize 80 characters by joining:'[main Clang web page, this document applies to the *next* release, not]' and '[the...]

warning: line 33 maximize 80 characters by joining:'[the current one. To see the release notes for a specific release, please]' and '[see...]

warning: line 81 maximize 80 characters by joining:'[  floating-point context, and recommends the use of a function that]' and '[returns...]

warning: line 162 maximize 80 characters by joining:'[  ``namespace a { namespace b { ... }}`` and offers change to]' and '[syntax...]

warning: line 167 is in excess of 80 characters (81) :   Detects usage of the deprecated member types of ``std::ios_base`` and replaces
...
warning: line 173 trailing whitespace
Checking ..\docs\clang-tidy\checks\readability-avoid-const-params-in-decls.rst...
warning: line 16 contains double spaces
warning: line 17 contains double spaces
You are ready to review!

I realize i have a couple of out by 1 errors on the line_num and the length not excluding the newline, which I'll address in the next patch

Add capability to check all the checks in one go (suitable for a build step)

Example output from running with validate_check.py --check-all

MyDeveloperDay updated this revision to Diff 177697.EditedDec 11 2018, 5:44 AM

Reduce false positives

Other common problems which will be great to detect with this script:

  • Ensure same length of headers titles and markup.

Could you explain what you mean by this? I'm assuming you mean that when you see a title with --- or === or ^^^ then you expect the line about to be the exact same length, is that correct?

Example
-------

Yes, you understood correctly.

Other common problems which will be great to detect with this script:

  • Ensure same length of headers titles and markup.

Could you explain what you mean by this? I'm assuming you mean that when you see a title with --- or === or ^^^ then you expect the line about to be the exact same length, is that correct?

Example
-------

Yes, you understood correctly.

This should make you happy!

Checking .\../docs/clang-tidy/checks\hicpp-no-assembler.rst...
warning: line 4 title and markup non matching lengths
.. title:: clang-tidy - hicpp-no-assembler

hicpp-no-assembler
===================

Check for assembler statements. No fix is offered.

Other common problems which will be great to detect with this script:

  • Ensure same length of headers titles and markup.

Could you explain what you mean by this? I'm assuming you mean that when you see a title with --- or === or ^^^ then you expect the line about to be the exact same length, is that correct?

Example
-------

Yes, you understood correctly.

This should make you happy!

Checking .\../docs/clang-tidy/checks\hicpp-no-assembler.rst...
warning: line 4 title and markup non matching lengths
.. title:: clang-tidy - hicpp-no-assembler

hicpp-no-assembler
===================

Check for assembler statements. No fix is offered.

Great! Thank you for script improvements!

Neat script, I think we could even plug that into arc diff. AFAIK it can run linters

clang-tidy/validate_check.py
22

Please align the string better.

48

Please make the comments here grammatically correct, too.

64

using parens in python is not necessary. if not in_code_block: is enough. Same at all other places. I would only use parens if you mix or and and.

71

We had cases were code-block is indented, sometimes its even necessary if you have a nested code-block in an option field or so. I think that should be considered.

93

we don't count higher? :)
How about a little regex close to r"^\d+\. .*"? Not sure if that is actually correct now, but please try it out ;)

103

What about \n?

121

That condition is quite hard to read. Could make a little function with a good name that takes the necessary parameters and returns a boolean instead?

123

please add space around the second >

148

please align that line better

164

Does that allow multiple arguments too? It would be great to just provide a list.

Maybe nargs=+, required=False, default = [None,] (not sure if default would be required) would fit better?

169

Does it need to be a new check? I think you could drop that.

180

Please add Both module and check... to make it clear they have to be set both.

190

please return 0 on success and 1 or custom error codes on failure.

202

sys.exit(main()), CI could always validate and use the return code for success.

Address review comments

MyDeveloperDay marked 12 inline comments as done.Dec 12 2018, 10:44 AM

I think will be good idea to rename script to validate_documentation.py. Or may be lint?

JonasToth added inline comments.Dec 13 2018, 3:12 AM
clang-tidy/validate_check.py
103

Nit: missing whitespace after comma, you can use yapf or autopep8 to format your python code, too.

MyDeveloperDay retitled this revision from [clang-tidy] Looking for feedback on add_new_check.py functionality to [clang-tidy] Linting .rst documentation.Dec 13 2018, 1:56 PM
MyDeveloperDay edited the summary of this revision. (Show Details)

I think will be good idea to rename script to validate_documentation.py. Or may be lint?

I guess this really turned into this is sphinx linter, for which perhaps something like that already exists, however I think unless I put a full parser in it would never really cover everything without too many exceptions...

I noticed that recently the static analyzer team also suggested adding their documentation in sphinx format D54429: [analyzer] Creating standard Sphinx documentation by @dkrupp , not wanting to criticize them but it didn't take me long to discover places where they also broke some of the rules like double empty adjacent lines and misaligned markup headers etc..

So then I looking inside clang/docs.... and again it was easy to find multiple places where the file had minor issues

$ ./validate_check.py --no-maximize --rst ../../../docs/ReleaseNotes.rst
Checking ..\..\..\docs\ReleaseNotes.rst...
warning: line 115 multiple blank lines
warning: line 157 multiple blank lines
warning: line 184 multiple blank lines
warning: line 234 title and markup non matching lengths
warning: line 236 multiple blank lines
warning: line 240 multiple blank lines
warning: line 258 multiple blank lines
warning: line 266 multiple blank lines
warning: line 280 is in excess of 80 characters (82) :   was extended. One more type of issues is caught - implicit integer sign change.

so then I headed over to llvm/docs and I see, you guess it, more cases at that level too

Checking ..\..\..\..\..\docs\ReleaseNotes.rst...
warning: line 13 multiple blank lines
warning: line 18 contains double spaces
warning: line 18 is in excess of 80 characters (82) : release 8.0.0.  Here we describe the status of LLVM, including major improvements
...
warning: line 20 contains double spaces
warning: line 31 contains double spaces
warning: line 59 multiple blank lines
warning: line 65 multiple blank lines
warning: line 71 multiple blank lines
warning: line 83 title and markup non matching lengths
warning: line 88 title and markup non matching lengths
warning: line 95 multiple blank lines
warning: line 96 multiple blank lines
warning: line 100 multiple blank lines
warning: line 109 multiple blank lines
warning: line 117 contains double spaces

its almost as if this script could live as a tool on its own not just for clang-tidy, and whilst a few formatting errors aren't anything to write home about, it perhaps constitutes of "review comments" that went unmissed, and maybe its something some people might worry about.

Even without the rule that tries to maximize the 80 characters per line (which I added a -no-maximize switch to allow it to be turned off) and running over the full LLVM tree

$ find . -name '*.rst' -exec ./tools/clang/tools/extra/clang-tidy/validate_check.py --no-maximize --rst {} \; | grep warning | wc -l

14172

Gave 14,000+ warnings across the entire LLVM tree

6231 - line is in excess of 80 characters
4774 - line contains double spaces
1528 - line multiple blank lines
1078 - line trailing whitespace
484 - title and markup non matching lengths
81 - empty lines at the end of the file (almost 16% of all .rst files)

Of course some of these lines might have multiple errors...

I'm not sure if anyone is interested in such a linter or if something already exists? perhjaps I should be forwarding this to someone who looks after llvm/utils/lint, but the last person to make any changes over there was in 2009

I think will be good idea to discuss wider application for this script in llvm-dev. If LLVM use Clng-format, why not to use this script?

MyDeveloperDay planned changes to this revision.Wed, Jan 9, 2:26 AM

Addressing some formatting review comments

  • add ability to run with --rst to perform on a single .rst file (useful for validating docs not in clang-tidy)
  • add ability to ignore the 80 character rulles (with -no-maximize), (some of the other documentation has lots of this violations and we might want to fix the obvious checks first)
MyDeveloperDay marked an inline comment as done.Thu, Jan 10, 2:09 AM
clang-tidy/validate_check.py
37

@MyDeveloperDay did you consider using the fairly standard docutils package to avoid the manual parsing of rst file? Something along

python
import docutils.nodes
import docutils.parsers.rst
import docutils.utils
def parse_rst(path):
    with open(path) as fd:
        text = fd.read()
        parser = docutils.parsers.rst.Parser()
        components = docutils.parsers.rst.Parser,
        settings = docutils.frontend.OptionParser(components=components).get_default_values()
        document = docutils.utils.new_document('<rst-doc>', settings=settings)
        parser.parse(text, document)
        return document

class MyVisitor(docutils.nodes.NodeVisitor):
   def visit_literal_block(self, node):
     print(node.raw_source)  # do stuff here
   def unknown_visit(self, node):
    pass

looks more resilient to me...

MyDeveloperDay marked an inline comment as done.Mon, Jan 14, 1:16 AM
MyDeveloperDay added inline comments.
clang-tidy/validate_check.py
37

I'll take a look that looks like a good suggestion, especially if it gives us access to the original text