Page MenuHomePhabricator

[clang] fix incorrect Wdocumentation fix-its
Changes PlannedPublic

Authored by AlexanderLanin on Feb 19 2020, 3:20 PM.

Details

Summary

Wdocumentation no longer provides fix-its which map multiple param comments to the same parameter.

This commit fixes:

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptFeb 19 2020, 3:20 PM
Mordante added a subscriber: gribozavr2.

Thanks for your patch. I also would like to get better fixits for the arguments.

I would also like to hear @gribozavr2's input.

clang/lib/AST/CommentSema.cpp
739

I don't feel the name Mapped conveys what the variable does. IMO Corrections would be a better name.

752

Please don't use curly braces for single line if and else statements (also at other locations).
It makes no sense to test this condition every iteration. Please move this test out of the loop.

763

I don't like adding duplicates here, and removed them in setDuplicatesToNullptr. This is inefficient.
Maybe add a mapping between CorrectedParamIndex and the first position in Mapped. Then when a duplicate is found the first position can be set to nullptr and the new position not added.

929–930

Please capitalize mapping.
I don't feel the name mapping conveys what the variable does. IMO Corrections would be a better name.
Please don't use auto when the type is not clear.

clang/test/Sema/warn-documentation-fixits.cpp
37

worthen -> worsen

llvm/unittests/ADT/StringRefTest.cpp
539

I don't mind this, but was it intentionally?

AlexanderLanin marked 3 inline comments as done.Feb 20 2020, 2:45 PM
AlexanderLanin added inline comments.
clang/lib/AST/CommentSema.cpp
752

I didn't touch this section that much. I extracted one of the ifs from the loop, but apparantely not the other... anyway I can certainly fix this.

763

I fully agree on that. However I went too far with cleanup and now I'm not quite sure what to do. To avoid adding another parallel review here, I'll just point you to https://github.com/AlexanderLanin/llvm-project/commit/d043f49089ebe3b0d1b4d8df02fbd0b3985b4d18 if thats fine? Just to figure out whether to abandon this change as it is here and switch to the other one or not. I ended up reducing resolveParamCommandIndexes to less then a dozen (hopefully readable) lines. Adding and removing is resolved with a MappedCounter, since other approaches failed to recognize triplicates.
If that looks fine in general I would replace this review. I personally like the smaller understandable functional methods, if that doesn't violate the general rules/ideas where the code should go. Otherwise I'll apply your findings here.

llvm/unittests/ADT/StringRefTest.cpp
539

I know it's slightly off topic, but not quite worth it's own commit. I was trying to figure out why 'ThisOneHasBeenRemovedButIsStillDocumented' was auto fixed to 'aaa' and my first guess was edit_distance doesn't correctly handle removed chars. As the code itself is not really readable for mere mortals and such a test was missing I added it here.

Turns out it was a special rule for if only one \\param, then don't check edit_distance. This is now slightly modified into if only one \\param and only one ParmVarDecl, then don't check edit_distance

Mordante marked 2 inline comments as done.Feb 22 2020, 10:35 AM
Mordante added inline comments.
clang/lib/AST/CommentSema.cpp
763

I think it makes sense to abandon this review and start a new one. As far as I know we don't have a policy regarding the size of functions. Personally I prefer smaller functions. Before putting up the new review, please address the comments in this review. Please have a look at coding standard https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable regarding the usage of auto.

I'm still not really fond of the algorithm used to remove the duplicates. I'll give it some thought to see whether I can come up with a better suggestion.

llvm/unittests/ADT/StringRefTest.cpp
539

Oke, just wanted to make sure it was intentionally.

AlexanderLanin retitled this revision from [clang] fewer incorrect Wdocumentation fix-its to [clang] fix incorrect Wdocumentation fix-its.
AlexanderLanin edited the summary of this revision. (Show Details)

Slightly more code changed, assuming it's better understandable now. Review findings incorporated.

@gribozavr2 what's your opinion of this patch?

Sorry it took a while.

Do you want to continue this patch or switch to the other one as you mentioned before?

Some other remarks:

  • In the tests please include the exact issues mentioned in PR43755 and PR43808.
  • I'd also like the patch to be a bit more robust regarding the fix-its. @gribozavr2 also offered some suggestions:
    • Take all \params and function parameters in account when looking for a solution and find the best total solution.
    • If the edit distance is too large don't use that suggestion.
    • Since most people tend to document \params in the same order as the function parameters you can also take that into account.

Can you add a few more complex test cases, for example something like:

/**
 * \param aabb X
 * \param abbb Y
 */
void foo(int NotMe, int bbbb);

Which comment will be attached to bbbb?

Thanks for the feedback!!

Do you want to continue this patch or switch to the other one as you mentioned before?

I had already updated this pull request to incorporate improvements from the github link. Sorry I didn't make that clear.
It's doesn't have those very small functions. That is in order to keep most of the not so easy understandable mapping vectors in small local scope.
Also to avoid some redundancy like adding all mappings and then removing the duplicates.

In the tests please include the exact issues mentioned in PR43755 and PR43808.

Both reports are from me and I'm quite certain I have them covered in tests. I will add a reference to the exact test cases.

Take all \params and function parameters in account when looking for a solution and find the best total solution.

Currently all unmapped parameters are taken into account.
I cannot imagine a use case where taking the mapped ones into account could influence the outcome.
Also I personally don't like breaking any existing correct documentation.
Do you have some example?

If the edit distance is too large don't use that suggestion.

Shall I remove the 1:1 exception which disregards edit distance?
The one from fixSinceThereIsOnlyOneOption?
I tried to not modify any not bug related behavior and this was there before.

Since most people tend to document \params in the same order as the function parameters you can also take that into account.

Sounds reasonable, but it might not be trivial to find an appropriate heuristic weighting this against edit distance.
I would like to exclude it from this patch and attempt that in the next one.

Can you add a few more complex test cases, for example something like:

Sure! I missed that one. I'll add some more.

More tests added to prevent further regressions. It's quite difficult to figure out tests which might fail, but now there is a whole bunch of them so it looks good.
Added explicit references from tests to the two bug reports.

If there is anything else I should fix, I'd gladly do so!

If there is nothing else please commit it as I cannot.
Alexander Lanin <alex@lanin.de>

Take all \params and function parameters in account when looking for a solution and find the best total solution.

Currently all unmapped parameters are taken into account.
I cannot imagine a use case where taking the mapped ones into account could influence the outcome.
Also I personally don't like breaking any existing correct documentation.
Do you have some example?

For example what happens in this case:

/**
 * \param aaaa Should match aaa
 * \param aaab  Should match aab
 * \param aaac  Should match aac
 */
void foo(int aaa, int aab, int aac);

And in the almost similar case:

/**
 * \param aaaa Should match aaa
 * \param aaab  Should match aab
 * \param aaac  Should match aac
 */
void foo(int aac, int aab, int aaa);

Does it give the same final result or does the order of the comments and parameters influence the outcome?

I would like to have a closer look at the algorithm you're using, I hope to find time for that later this week.

clang/lib/AST/CommentSema.cpp
738

Please capitalize the i and e.

787

Since the auto type is a pointer, please use a * to make it clear.

793

Please don't use auto unless the type is clear.

798

Is it intended that the assignment ParmVarDeclToParamCommandComment[PCC->getParamIndex()] = PCC; is executed? If so please add some comment.

814

Please capitalize the i and e.

824

Since the auto type is a pointer, please use a * to make it clear.

854

Please capitalize the i and e.

897

Please capitalize the i and e.

clang/test/Sema/warn-documentation-fixits.cpp
76

Why use a different test case than in the bug report? I really would like to see this patch fixes that case.

AlexanderLanin planned changes to this revision.Apr 10 2020, 2:47 PM
AlexanderLanin marked 10 inline comments as done.

I need to have a look why the mentioned test case is failing. I'll also add the two examples you provided to the next upload.

clang/lib/AST/CommentSema.cpp
738

Sorry such stuff should not require a manual review. I fixed this and similar issues by running clang-tidy --checks="readability-identifier-naming" -p=/home/alex/llvm/build CommentSema.cpp --fix

clang/test/Sema/warn-documentation-fixits.cpp
76

You are totally right. Added test case; It's failing; needs analysis. I thought I had that fixed. Will check ASAP.

AlexanderLanin marked an inline comment as done.
  • Add test cases exactly as in bug reports.
  • Test case from https://bugs.llvm.org/show_bug.cgi?id=43808 triggered a second bug due to the short parameter names used. Removed incorrect quick exit in SimpleTypoCorrector (MinPossibleEditDistance).
  • The two requested cases don't work since 'aaab' is as close to 'aaa' as it is to 'aab' (one char). Nothing is changed as there is no best match available. Instead two new test cases with aaab, aaac and aaad added (test3to3_allAlmost + test3to3_allAlmost_rev).

*ping*
Last change was over 3 months ago. Should this be merged or abandoned?
It doesn't solve everything. It doesn't improve everything. But it's a fix to two specific bugs.
Is someone missing as a reviewer to get this accepted/rejected any faster?

AlexanderLanin updated this revision to Diff 297541.EditedMon, Oct 12, 3:51 AM

Rebased and stored at https://github.com/AlexanderLanin/llvm-project/tree/Wdocumentation-fix-its so I have a branch next time I need to rebase :-)

Still looking for someone to approve and to commit this change.

AlexanderLanin planned changes to this revision.Mon, Oct 12, 4:56 AM

Added one missing expect statement in this revision.

Could someone please review and optionally approve and to commit this change?
That would be great!
It's been open for a while now ;-)

I think the patch looks better, but I still would like to get @gribozavr2 's opinion.

I'm not convinced the new approach is better in all cases, so I'm somewhat concerned about regressions from the current behaviour. Did you test this patch on a codebase to see the difference between the previous and new behaviour?

For the next time when providing a patch, please do the cleanups in a separate patch, that makes reviewing the real changes a bit easier.

clang/lib/AST/CommentSema.cpp
22

Is the debug output really that interesting after the code has been committed?

756

In general Clang doesn't use top level const often, please remove them from the function argument.

759

Please use auto * in range-based for loops.

760

Please use auto * since the type is already on the right hand side.

779

I would suggest to rename the function. The current name doesn't match what the function does. For example findUndocumentedParams.

787

I would suggest to make the name of the variable it bit shorter to improve readability.

790

Please use const auto *

828

Please use const auto*

859

Please reduce the number of blank lines in the hunk below.

873

I'm not sure whether that's the best behaviour. Can't we test which of the two suggestions is the best and then keep the suggestion there?

917

Please don't use braces on if's with one substatement. Maybe move the comment before the if.

clang/test/Sema/warn-documentation-fixits.cpp
81

Here I actually would like to get the fixit. No longer showing the argc when typing ArgC feels not good to me.

91

Here there's one undocumented parameter left and one \param why not match them?

96

why test for ZZZZZZ ?

137

Please reduce the amount of whitespace.

149

Here I would expect the cnt to be matched with argc.

AlexanderLanin planned changes to this revision.Sat, Oct 17, 12:33 PM
AlexanderLanin marked 9 inline comments as done.
AlexanderLanin added inline comments.
clang/lib/AST/CommentSema.cpp
22

There are a few debug print statements left in code below. They are certainly not interesting until someone tries to debug the code ;-)
Shall I remove this line and those llvm::dbgs() statements?

756

Sorry I was not familiar with that convention. I've removed const which affects only the definition and not the declaration (e.g. I left in case of const ref).

759

Sure, I'll fix it! It even felt a little strange to me. I tried to follow the guidelines which say to skip the type if it's already obvious from context. I though the exact type is not obvious since it's not getBlockContentComment(). But it's close enough :-)

779

Thanks, that should have been obvious from the comment 'Returns all ParmVarDecl without a comment'.

873

Sounds like a very reasonable improvement idea.
However on first glance it would require quite some changes starting with SimpleTypoCorrector/correctTypoInParmVarReference even returning the BestEditDistance.
Would it make more sense to address them in a follow up patch?

clang/test/Sema/warn-documentation-fixits.cpp
81

They both have the same edit distance of two mismatching chars. There is no logic in StringRef::edit_distance which would say 'c' is closer to 'C' then it is to 'V'.
I guess that would be a nice follow up improvement e.g. via another flag to edit_distance bool CaseIsNotVeryImportant.
edit_distance is used by ~23 files and I guess several of them could benefit from that new flag.

Note: at the moment both are fixed to 'argc' which is worse then not fixing at all.

96

Copy paste error. There is a ZZZZZZ before and after this test case.

149

This is currently not detected. Here is the minimal scenario:

/// \param a This one is correctly documented.
/// \param c Total mismatch. Fix it since it's only one comment to only one parameter ?!
int oneUnmatchedCommentToOneParameter(int a, int b);

Unfortunately I disagree. Sorry. Auto fixing that would be a no go for me personally. Because in my personal experience it's a quite common use case that one parameter is removed (but not the documentation) and then a different *unrelated* parameter is introduced (with no documentation). Just moving the old documentation over to this new parameter would hide/create a significant problem. Without 'fixing' it's at least obvious that there is a problem there.
I've not encountered this problem in real life with only one parameter, so I'm ok with fixSinceThereIsOnlyOneOption behavior.

AlexanderLanin marked 2 inline comments as done and an inline comment as not done.Sat, Oct 17, 12:48 PM
AlexanderLanin added inline comments.
clang/test/Sema/warn-documentation-fixits.cpp
149

Ok, that's not what is implemented: It does get fixed. So I was contemplating on what that means. Any code change, including or especially the automatic ones, do go through a code review. So it 'should' be ok to 'fix' stuff. That's what we have reviews for anyway.
I'm not too happy about it due to reasons above, but ok.

In this case it doesn't get suggested/fixed immediately on the first pass. Only after arg[] is fixed to arg it will be detected on subsequent compilation. It's not ideal, but good enough?!

AlexanderLanin marked an inline comment as not done.Sun, Oct 18, 3:36 PM

I'm not convinced the new approach is better in all cases, so I'm somewhat concerned about regressions from the current behaviour. Did you test this patch on a codebase to see the difference between the previous and new behaviour?

There are no warnings in llvm or ccache, all I found is a low quality codebase at work which does produce 1007 unique Wdocumentation warnings (see wc command).

It *seems* I can run this via:

mkdir -p build-clang-D74877 && cd build-clang-D74877
CC=/home/alex/llvm/build-D74877/bin/clang CXX=/home/alex/llvm/build-D74877/bin/clang++ cmake .. -DCMAKE_CXX_FLAGS="-Wdocumentation -Wno-documentation-deprecated-sync" -GNinja
CC=/home/alex/llvm/build-D74877/bin/clang CXX=/home/alex/llvm/build-D74877/bin/clang++ cmake .. -DCMAKE_CXX_FLAGS="-Wdocumentation -Wno-documentation-deprecated-sync -fsyntax-only" -GNinja
ninja -j1 -k10000 > warnings.txt
grep "\[-Wdocumentation\]" warnings.txt | grep -v "empty paragraph passed to" | grep -v "^\[" | sort | uniq | wc -l
cd ..

mkdir -p build-clang-master && cd build-clang-master
CC=/home/alex/llvm/build-master/bin/clang CXX=/home/alex/llvm/build-master/bin/clang++ cmake .. -DCMAKE_CXX_FLAGS="-Wdocumentation -Wno-documentation-deprecated-sync" -GNinja
CC=/home/alex/llvm/build-master/bin/clang CXX=/home/alex/llvm/build-master/bin/clang++ cmake .. -DCMAKE_CXX_FLAGS="-Wdocumentation -Wno-documentation-deprecated-sync -fsyntax-only" -GNinja
ninja -j1 -k10000 > warnings.txt
grep "\[-Wdocumentation\]" warnings.txt | grep -v "empty paragraph passed to" | grep -v "^\[" | sort | uniq | wc -l
cd ..

# .. in order to leave git
grep -v "^\[" build-clang-master/warnings.txt > ../clang-master-warnings.txt
grep -v "^\[" build-clang-D74877/warnings.txt > ../clang-D74877-warnings.txt

git diff ../clang-master-warnings.txt ../clang-D74877-warnings.txt

I do not know how to apply suggested fixes for an easier diff.
I could run clang-tidy with some inappropriate check, so that it will fix all the Wdocumentation warnings only. But that takes forever.

Anyway based on the approach above I found two additional regressions. Testing with real code seems to be kind of important ;-)

  • Reported column numbers have been changed. Apparently this cannot be checked by expected-warning and expected-note, so I added another FileCheck pass with a few select test cases.
  • All warnings are reported first and then all the notes. Apparently this is not checked by expected-warning and expected-note. This will probably also require to be checked by FileCheck directly.

I'll further analyze the diff ASAP (have to fix the second problem first so that the diff gets smaller).
But that experience was kind of enough to get the following idea:
I would like to create two different patches:

  • One with all the new tests (modified to pass based on current code). It should be trivial to review.
  • One with the new code and the required small modifications to tests.

Because at the moment lots of new test cases are added and it's not clear what is even different now with this patch.
That seems to be the best way to ensure all changes are clearly visible within test cases.

What do you think of that @Mordante?

I do not know how to apply suggested fixes for an easier diff.
I could run clang-tidy with some inappropriate check, so that it will fix all the Wdocumentation warnings only. But that takes forever.

Would it be feasible to run it on a part of the clang codebase just to get a feeling, or does that also take too long?

Anyway based on the approach above I found two additional regressions. Testing with real code seems to be kind of important ;-)

  • Reported column numbers have been changed. Apparently this cannot be checked by expected-warning and expected-note, so I added another FileCheck pass with a few select test cases.

Maybe first look why the columns have changed and whether it's a real issue.

  • All warnings are reported first and then all the notes. Apparently this is not checked by expected-warning and expected-note. This will probably also require to be checked by FileCheck directly.

AFAIK other diagnostic tests don't test the in which the diagnostics are emitted. So I wonder whether this is really required.

I'll further analyze the diff ASAP (have to fix the second problem first so that the diff gets smaller).
But that experience was kind of enough to get the following idea:
I would like to create two different patches:

  • One with all the new tests (modified to pass based on current code). It should be trivial to review.
  • One with the new code and the required small modifications to tests.

I think if would be first good to figure out why this changes and then see whether it's a real issue. If it's an issue it needs to be fixed, else the separate patches would be preferred.

clang/lib/AST/CommentSema.cpp
22

Yes please. When needed again for debugging it's trivial to add the wanted debug statements.

873

Yes if it's not trivial a separate patch would be preferable.

clang/test/Sema/warn-documentation-fixits.cpp
81

Yes maybe if the edit distance is the same it would be a good idea to use the case insensitive match. But I agree this can be done in a followup patch.

149

Would it be easy to only show the note on the first pass without the fixit? Then the user still gets a hint, but not the fixit.

AlexanderLanin marked 6 inline comments as done.Wed, Oct 21, 3:25 PM

Anyway based on the approach above I found two additional regressions. Testing with real code seems to be kind of important ;-)

  • Reported column numbers have been changed. Apparently this cannot be checked by expected-warning and expected-note, so I added another FileCheck pass with a few select test cases.

Maybe first look why the columns have changed and whether it's a real issue.

  • All warnings are reported first and then all the notes. Apparently this is not checked by expected-warning and expected-note. This will probably also require to be checked by FileCheck directly.

AFAIK other diagnostic tests don't test the in which the diagnostics are emitted. So I wonder whether this is really required.

I think if would be first good to figure out why this changes and then see whether it's a real issue. If it's an issue it needs to be fixed, else the separate patches would be preferred.

I think those unintentional changes are an issue. See where ^ points and how the readability of the messages decreases due to their order:

TimeFormatTypes.h:171:14: warning: parameter 'paramGpsWeek:' not found in the function declaration [-Wdocumentation]
        /// @param paramGpsWeek: weeks till 6.1.1980
             ^     ~~~~~~~~~~~~~
TimeFormatTypes.h:172:14: warning: parameter 'paramGpsTimeOfWeek:' not found in the function declaration [-Wdocumentation]
        /// @param paramGpsTimeOfWeek: seconds in current week
             ^     ~~~~~~~~~~~~~~~~~~~
TimeFormatTypes.h:173:14: warning: parameter 'paramLeapSeconds:' not found in the function declaration [-Wdocumentation]
        /// @param paramLeapSeconds: leap seconds, default 0
             ^     ~~~~~~~~~~~~~~~~~
TimeFormatTypes.h:171:14: note: did you mean 'paramGpsWeek'?
        /// @param paramGpsWeek: weeks till 6.1.1980
             ^     ~~~~~~~~~~~~~
                   paramGpsWeek
TimeFormatTypes.h:172:14: note: did you mean 'paramGpsTimeOfWeek'?
        /// @param paramGpsTimeOfWeek: seconds in current week
             ^     ~~~~~~~~~~~~~~~~~~~
                   paramGpsTimeOfWeek
TimeFormatTypes.h:173:14: note: did you mean 'paramLeapSeconds'?
        /// @param paramLeapSeconds: leap seconds, default 0
             ^     ~~~~~~~~~~~~~~~~~
                   paramLeapSeconds

I'll fix this shortly.

clang/test/Sema/warn-documentation-fixits.cpp
91

Because at the time this is executed there are technically two parameters unmatched. One has a fix-it suggestion.

149

Love the idea, will look into it!