run-clang-tidy.py is the parallel executor for clang-tidy. Due to the
common header-inclusion problem in C++/C, diagnostics that are emitted in
headers (usually in class declarations) are repeated every time the
corresponding header is included. This results in a *VERY* high amount of
spam and renders the output basically useless for bigger projects.
With this patch run-clang-tidy.py gets a new option (off by default) that
enables deduplication of all emitted diagnostics. This is achieved by parsing
the diagnostic output of each clang-tidy invocation, identifying each warning
or error, and collecting all lines up to the next occurrence of a warning or
error. The collected diagnostic is hashed and stored in a set. A diagnostic
is only emitted if its hash is not in the set already.
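The mechanism can be sketched roughly like this (the function names are
hypothetical, not the ones from the actual patch):

```python
import hashlib

def split_diagnostics(output):
    """Split clang-tidy output into chunks, one per warning/error.

    A new chunk starts at every line containing 'warning:' or 'error:';
    the following lines (source snippet, caret, notes) belong to that chunk.
    """
    chunks = []
    current = []
    for line in output.splitlines():
        if "warning:" in line or "error:" in line:
            if current:
                chunks.append("\n".join(current))
            current = [line]
        elif current:
            current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

def deduplicate(output, seen_hashes):
    """Yield only diagnostics whose hash is not yet in seen_hashes."""
    for chunk in split_diagnostics(output):
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            yield chunk
```

Feeding the output of every clang-tidy invocation through deduplicate() with
one shared set means a diagnostic repeated across translation units is printed
only on its first occurrence.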
Numbers to show the issue
I am currently setting up a buildbot that runs clang-tidy over real-world projects, and some of the experience below comes from there. I reproduced one specific case for this test; it is not made up and is not even the worst I have seen.
Running clang-tidy's misc module over llvm/lib:
/fast_data2/llvm/tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
  -checks=-*,misc-* \
  -header-filter=".*" \
  -clang-tidy-binary /fast_data2/llvm/build_clang_fast/bin/clang-tidy \
  -fix \
  lib/ \
  2>/fast_data2/rct_dedup_lib.err.misc \
  1>/fast_data2/rct_dedup_lib.out.misc
produces over 300MB of diagnostic output. The run-clang-tidy.py script consumes up to 0.8%*32GB of RAM on my machine.
373K Nov  5 22:48 rct_lib.err.misc
306M Nov  5 22:48 rct_lib.out.misc
Doing the same analysis but with -deduplication enabled results in 5.4MB of diagnostic output (two orders of magnitude less!) and run-clang-tidy.py only consumes up to 0.5%*32GB of RAM.
373K Nov  5 23:13 rct_dedup_lib.err.misc
5,4M Nov  5 23:13 rct_dedup_lib.out.misc
The difference in RAM usage of run-clang-tidy.py seems suspicious, as one would expect the deduplication bookkeeping to need more RAM than simply printing everything out.
It might be a memory leak in the script or some other effect; to my surprise, we are better off deduplicating. I did not measure run-time differences, but I suspect they improve as well, since piping hundreds of MB through stdout in Python is probably slow.
I found multiple checks that are specifically prone to producing *A LOT* of spam, e.g. bugprone-macro-parentheses. Statistics from my buildbot show that the spammy checks easily produced 100x the output they needed to (consistent with the findings in the llvm/lib example).
Running modules with spam-prone checks over the whole of LLVM resulted in log output on the order of gigabytes. I could not measure more precisely because my buildbot refused to serve the full log files.
I cross-checked the deduplicated output against a grep "warning: " | sort | uniq -c | sort -n -r pipeline over the log files: every diagnostic in the deduplicated output occurred exactly once.
The hashing is done with SHA-256, which is considered secure, so no collisions are expected. For this use case even MD5 might be viable, but htop showed
all 16 cores of my machine fully loaded, so there does not seem to be a performance problem from slow hashing or similar (the parsing is done within the lock, so there is no parallelization there!).
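Since run-clang-tidy.py runs clang-tidy from worker threads, the hash-set lookup and the printing have to happen under one shared lock. A minimal sketch of that pattern (names are hypothetical, not the actual patch code):

```python
import hashlib
import threading

# Shared state for all worker threads.
seen_hashes = set()
output_lock = threading.Lock()

def emit_deduplicated(diagnostic):
    """Print a diagnostic only once across all workers.

    The set lookup/update and the print happen inside the lock, so this
    part of the work is serialized rather than parallelized.
    """
    digest = hashlib.sha256(diagnostic.encode("utf-8")).hexdigest()
    with output_lock:
        if digest in seen_hashes:
            return False
        seen_hashes.add(digest)
        print(diagnostic)
        return True
```

Computing the SHA-256 digest before taking the lock keeps the serialized section as short as possible; only the set membership test and the print are guarded.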