Page MenuHomePhabricator

[Wdocumentation][RFC] Improve identifier's of \param
Changes PlannedPublic

Authored by Mordante on Dec 28 2019, 11:16 AM.

Details

Summary

This fixes several issues with the processing of identifiers of a \param command:

Like Doxygen allow multiple identifiers in one \param command

For example:

/// \param foo, bar The description of foo and bar.
void fun(int foo, int bar);

Was interpreted as:

  • a single identifier foo,
  • a description of bar The description of foo and bar.

It will now be interpreted as:

  • two identifiers foo and bar
  • a description of The description of foo and bar.

Likewise:

/// \param foo,bar The description of foo and bar.
void fun(int foo, int bar);

Was interpreted as:

  • a single identifier foo,bar
  • a description of The description of foo and bar.

It will now be interpreted as:

  • two identifiers foo and bar
  • a description of The description of foo and bar.

Improved identifier detection

/// \param 42foo The description.
void fun(int foo);

The 42foo is not a valid identifier and will give a warning.

/// \param foo# The description.
void fun(int foo);

Will also trigger a warning since the identifier contains invalid characters.

Improved typo correction of variadic arguments

For example:

/// \param .. The description.
void fun(...);

Will suggest to change the identifier .. to .... This will only happen if the function is variadic and the variadic argument has no documentation.

Warn about undocumented function arguments

If after the typo correction function arguments are not documented yet there will be a warning about these arguments. This only happens if at least one \param comment is used.

/// \param foo The description of foo and bar.
void fun(int foo, int bar);

This will warn about the undocumented bar.

/// fun's arguments are not documented
void fun(int foo, int bar);

This will /not/ warn about the undocumented foo and bar.

Questions

The undocumented function argument is part of -Wdocumentation. Should it remain here or be part of its own -Wdocumentation-undocumented-arguments?

The clang-doc has minimal changes to support the new feature. @juliehockett anything you want to improve. If so should these changes be in this patch or a follow-up patch?

TODOs

  • The \tparam can be improved in a similar fashion. This will be done in a separate patch.
  • The identifier detection could try a bit harder to extract the identifier. It could detect and propose fixits for references, pointers, and arrays. For example it could offer a fixit for &*identifier[...]. This will be done in a separate patch.

Fixes PR14335: [-Wdocumentation] Multiple identifiers (comma,separated) are not recognized on \param

Diff Detail

Event Timeline

Mordante created this revision.Dec 28 2019, 11:16 AM

Unit tests: unknown.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt

The merge bot noticed I forgot to update clang's unittests. Will look at that.

Mordante updated this revision to Diff 235514.Dec 29 2019, 11:01 AM

Fixes the compilation issues with the clang unit tests.

Unit tests: unknown.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt

Thank you for the patch!

clang/docs/ReleaseNotes.rst
68

validation *of*

360

Need backticks around \param. Also a spurious "p" at the end of the line.

clang/include/clang-c/Documentation.h
383

Please don't modify existing APIs in libclang -- it provides a stable API and ABI, and what has shipped, can't be changed. New functionality has to be exposed as new functions, while old functions should be kept working to the extent possible. It means that the resulting API can be subpar, but oh well, a stable ABI is a contract of libclang.

clang/include/clang/AST/Comment.h
715

"Indices of the parameters referenced by this command in the function declaration."

763

setParamIndices

781

Users can already call getArgText, do we need a second function that does the same?

793

Ditto, duplicated API.

797

I'd call it getArgument and define it in BlockCommandComment.

804

Why does vararg have an invalid index? Previously it was considered valid.

clang/include/clang/AST/CommentSema.h
111

s/NameArg/NameArgs/

clang/include/clang/Basic/DiagnosticCommentKinds.td
113

This warning should be in a separate group. -Wdocumentation contains generally useful, non-controversial stuff (that would have been an error if documentation comments were a part of the language). Warning about every undocumented parameter is only useful for certain style guides.

(Digression: And even then, I personally question the value of a style rule that requires every parameter (or every public API, or every file, or every whatever else) to be documented. Unless there's a tech writer on the team, in response to a such rule engineers are highly likely just to write useless comments like "\param frobnicator The Frobnicator necessary for this call." Engineers are already doing the best they can when writing comments. The choice we have is not between "no comments for some parameters" and "good comments for every parameter because the warning reminded you", it is between "no comments for some parameters" and "placeholder stuff that people will write to make the warning go away".)

I'd also suggest to implement this functionality in a separate patch.

clang/lib/AST/Comment.cpp
376

Previously, it was necessary for the index to be valid. Why relax the contract? (This change also makes things in consistent with, say, TParam).

clang/lib/AST/CommentParser.cpp
362

Adding lexIdentifier() seems too complicated to me (it is also not *really* an identifier, because we are allowing "..."). I'd stick to consuming characters until a comma or a space, that's a lot simpler. If we lexed an invalid identifier, it won't match any function parameter, and we will get a warning at that point anyway.

clang/lib/AST/CommentSema.cpp
742

Why double-backslash?

744

Stale comment (SmallVector)?

745

If ParamVarDocs is a vector of SourceRanges, the comment shouldn't say that it contains AST nodes.

758

Here's one reason why I find the fallback in getParamName to be questionable -- with the fallback, the function does different things depending on the phase of processing. Sometimes it returns the original name as written (before the index is set), and sometimes it returns the name from the function declaration (which can be different, as far as I understand). I'd much prefer to call getParamNameAsWritten here.

852

I'm not sure about this erase call. With this change, once a typo claims an identifier, it is gone no matter how bad the typo was. Even if there's a further, closer, typo, that identifier won't be suggested anymore.

866

Use cast<> instead.

cast<> = dyn_cast<> + assert

1121

I'm not sure I like the heuristic.

// \param foo.bar Something.
void foo(int foo_bar, ...);

Here we would suggest to correct foo.bar to ....

Instead, I'd suggest to throw a "..." identifier into the typo corrector if the function is variadic, and let the edit distance decide.

clang/lib/AST/JSONNodeDumper.cpp
1586

Despite what the old code does (I didn't review it when it was submitted), I think it is more useful to dump the parameter name as written, not as found in the function declaration.

clang/lib/Index/CommentToXML.cpp
745

Please also update the XML schema in llvm-project/clang/bindings/xml/comment-xml-schema.rng.

748

Why do we need an extra "<Identifiers>" wrapping the list of parameters?

clang/test/Sema/warn-documentation.cpp
268

Seems redundant to say that the identifier should be valid. Also it is not just any identifier, we can be more specific. WDYT about "a parameter name should be specified after the '\param' command"?

276

I don't think it makes sense to distinguish "not a valid identifier" and "not properly terminated". From the user's perspective, the issue is the same.

389

Would be nice to also add a test for '\param a, a Aaa.'

clang/unittests/AST/CommentParser.cpp
183

s/ParamName/ParamNames/

Mordante marked 14 inline comments as done.Jan 18 2020, 11:51 AM

Thanks for the review!

I only answered to the larger issues to get a better feeling of the direction the patch should go. Once that's clear I'll make a new patch and also address the smaller issues.

clang/include/clang-c/Documentation.h
383

I thought I had read this API was allowed to change, but required adding information to the release notes. (I can't find it quickly.)
I'll undo the changes to the existing functions and add new functions instead.

clang/include/clang/AST/Comment.h
781

I thought it made sense to offer these helper functions. getParamName seems clearer than getArgText. Of course it's also possible to add documentation instead of a helper function. What do you prefer?

Note: This function is the replacement of getParamNameAsWritten, so I can also reinstate its former name.

804

Before a lot of code first tested whether the index was valid and then depending on the vararg state do a different codepath. I thought it was very confusing that if the function returned true it was no guarantee it was safe to access DeclInfo::ParamVars[Idx].

clang/include/clang/Basic/DiagnosticCommentKinds.td
113

I'll move it to its own patch.

clang/lib/AST/Comment.cpp
376

IMO it makes it easier on the caller site. If it wants the name of the variable in the current FullComment context they can simply call this function. Then they don't need to validate whether the index is valid or not. (I intend to do a similar patch for TParam.)

clang/lib/AST/CommentSema.cpp
758

The difference is now in the FullComment argument instead of the name. But it seems that is not clear.

852

True, but I thought it was not allowed for fixits to give invalid hints. Using the same correction twice will result in new warnings. What do you think?

866

Thanks for the hint!

1121

This heuristic works at the moment. The lexIdentifier will not accept foo.bar. However since you requested to remove the lexIdentifier your approach makes more sense.

clang/lib/AST/JSONNodeDumper.cpp
1586

You want the behaviour for all AST dumpers? (I assume yes)

clang/lib/Index/CommentToXML.cpp
748

What would you propose instead? No "<Indentifiers>" and just a list of "<identifier>" ?

clang/test/Sema/warn-documentation.cpp
268

Agreed.

276

I'm not entirely sure. However after I remove the lexIdentifier 9aaa will become a valid identifier. So I'll remove this warning.

gribozavr2 added inline comments.Jan 22 2020, 1:23 AM
clang/include/clang-c/Documentation.h
383

I thought I had read this API was allowed to change

It would be interesting to find that doc. As far as I understand, libclang has a strict API & ABI stability rule.

I'll undo the changes to the existing functions and add new functions instead.

Thanks!

clang/include/clang/AST/Comment.h
781

I prefer using getArgText. It works uniformly for all block commands.

clang/lib/AST/Comment.cpp
376

The issue is that it is unclear what kind of name they get -- the as written name, or the mapped one.

clang/lib/AST/CommentSema.cpp
852

True, but I thought it was not allowed for fixits to give invalid hints.

Fixits are allowed to be incorrect. Any fixit can be incorrect semantically (the user actually intended something different), even if it perfectly fixes the C++ and Doxygen issues.

Using the same correction twice will result in new warnings.

I don't think users will apply all fixits without subsequent review. In an IDE setting, users pick and choose fixits.

Also, not reusing the same correction twice does not guarantee that the fixes are correct; it only ensures that we won't have a new warning about duplicated parameter names.

clang/lib/AST/JSONNodeDumper.cpp
1586

Yes, I'd prefer it everywhere.

clang/lib/Index/CommentToXML.cpp
748

Yep, "<Parameter>" can directly contain multiple "<Identifier>" tags.

Mordante marked 2 inline comments as done.Jan 26 2020, 8:40 AM

So if I understand correctly:

  • getParamNameAsWritten will become getArgText
  • The getParamName will do the translation from the name in the documentation to the name in the current function declaration. If the parameter index is invalid the function will fail (with an assertion error) and not fallback to call getArgText.
clang/include/clang-c/Documentation.h
383

I thought I had read this API was allowed to change

It would be interesting to find that doc. As far as I understand, libclang has a strict API & ABI stability rule.

My interpretation of http://llvm.org/docs/DeveloperPolicy.html#c-api-changes gave me this impression.

So if I understand correctly:

  • getParamNameAsWritten will become getArgText
  • The getParamName will do the translation from the name in the documentation to the name in the current function declaration. If the parameter index is invalid the function will fail (with an assertion error) and not fallback to call getArgText.

Yes, that would be my preference. Thanks!

clang/include/clang-c/Documentation.h
383

I see. Index.h explains the policy for libclang: https://github.com/llvm/llvm-project/blob/master/clang/include/clang-c/Index.h#L32-L33

If you feel like doing so, feel free to submit a patch for the developer policy to clarify libclang stability guarantees.

Mordante planned changes to this revision.Feb 16 2020, 9:00 AM
Mordante marked an inline comment as done.
Mordante added inline comments.
clang/lib/AST/CommentSema.cpp
1121

Any suggestion how to do this? The SimpleTypoCorrector expects a NamedDecl, but AFAIK ... isn't a NamedDecl.

Thanks for working on this, I'd like to see this being fixed.

clang/include/clang-c/Documentation.h
383

I think @gribozavr2 is correct here, the chapter in the developer policy refers to the C API of LLVM, not Clang. The Clang documentation also says that libclang should be stable (http://clang.llvm.org/docs/Tooling.html#libclang).