Include checker name in Static Analyzer PLIST output. It is useful to include the checker name in the plist output of the Static Analyzer, because it makes it easier for 3rd party tools to consume the plist. If checker_name is not a suitable key, a search and replace in the diff should work.
I was working on tools that made the static analyzer more user-friendly.
One of the features was to mark which reports are new in a revision. To
identify reports, we calculated a hash of the environment of the final
position of the report and the name of the checker. We found it more
robust to use the checker names because they are less likely to change.
(Changing either type or category would mean that all of the hashes would
change for that checker and our tool would report all of its output as
new.) We also stored the reports in a database, and the name of the
checker was a natural filtering/sorting criterion.
Another example: there is a web frontend to view the reports that are stored in the database. The user can jump to the documentation of a specific checker, and the frontend generates the link to the documentation based on the name of the checker that generated the report.
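To make the hashing scheme described above concrete, here is a minimal sketch in Python. It is not the tool's actual code: the function name `report_hash`, the `context` parameter, and the reading of "environment" as the source lines surrounding the report's final position are all assumptions made for illustration.

```python
import hashlib


def report_hash(source_lines, line_no, checker_name, context=1):
    """Hypothetical report identifier: checker name plus nearby source text.

    source_lines: list of source lines for the file (assumed input shape).
    line_no: 1-based line of the report's final position.
    context: number of lines taken on each side (an assumed notion of
             "environment"; the real tool may define it differently).
    """
    lo = max(0, line_no - 1 - context)
    hi = min(len(source_lines), line_no + context)
    # Strip whitespace so that pure re-indentation does not change the hash.
    env = [line.strip() for line in source_lines[lo:hi]]
    payload = "\n".join([checker_name] + env)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()
```

Because the hash depends on the surrounding text rather than on the line number, a report that merely moves down the file keeps its identity, while a change to the checker name (or to whatever string is hashed in its place) makes every report from that checker look new.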
I don't believe the checker name should be used for bug identification. The checker names are an implementation detail. The bug message/name and category are better for this. If we think that we might be changing the names of categories, we might come up with some kind of a stable ID.
For example, this patch, which is currently in review (http://reviews.llvm.org/D6178), will move the implementation of a set of warnings from one place/checker to another.
I'd suggest using only the bug message and its location. (We should already do that in the CmpRuns script.)
What do you think?
The checker names are not purely implementation details, because the user refers to them when determining which checkers she wants to run. I also think that the plist itself is an implementation detail: users are not supposed to read it, but to consume it with tools.
When you mention location, do you mean the environment? In a new snapshot of the source code the locations may differ, and we do not want the comparison to show reports that were merely relocated.
If we identify bugs using messages, then once a message changes (because of some rewording, or just because more information was added to it), the comparisons will show the reports that have that message as new. I think the checker names are still a bit more stable.
I was working on a web viewer for reports generated by the static analyzer. One of the features of the web viewer was the ability to jump to the documentation of the checker that reported a specific issue. We had several design rules that we wanted to validate, and the checker documentation contained the design rule, to make it easier for developers to mark false positives. Of course it is possible to generate the links to the documentation based on the message of the report, but that involves more work to both implement and maintain, since one checker can report multiple messages.
All in all, we found it useful to have this information in the plist output, but it is also possible that the information is redundant for others. We needed a unique ID for the checkers and we used their names. Do you think that categories should be used as unique IDs?
It seems wrong to me to associate the bug descriptions with checker names. I think BugType is more appropriate. Also, I don't think the checker name should be part of identifying the issue.
That said, we do not have a very good design for issue tracking, and we do not guarantee that the category, the bug type, the message, or the checker name will stay fixed. I'll talk to Jordan and Ted to see what they think about this.
Looks like our issue identification is only referring to the issue location (see utils/analyzer/CmpRuns.py).
Jordan has pointed out that a similar discussion occurred when the getCheckName method was added to the diagnostic. There are several write-ups from Ted and Jordan about issue tracking there as well. (See the "Future directions for the analyzer" thread on cfe-dev as well as the review comments for the patch that added the getCheckName method. It's about a year old.)
Note that we do not guarantee that CheckName will never change. (At some point, we will want to clean up our naming of the checks.) We also do not guarantee that the CheckName will be the same as the checker name (these should be the names of bug types, not of the implementation module). On the other hand, we do not expect to continuously change these.
Since the getCheckName is already part of the diagnostic, it's fine to add it to the plist output. The name of the field does need to change to check_name. (This highlights that these are not guaranteed to be checker names.)
What do you think?
If you are OK with this, could you rename the field? I'll then commit the patch.
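For third-party tools, consuming the renamed field could look roughly like the sketch below. It uses Python's standard `plistlib`; the helper name `checks_by_name` and the `"<unknown>"` fallback are hypothetical, and the sketch assumes the field ends up as `check_name` on each entry of the top-level `diagnostics` array, as proposed in this review.

```python
import plistlib


def checks_by_name(plist_bytes):
    """Count diagnostics per check name in an analyzer plist (sketch).

    Assumes each entry under "diagnostics" may carry a "check_name" key,
    which is the field name proposed in this review. Entries without it
    (for example, output from older analyzer builds) are grouped under
    a placeholder.
    """
    data = plistlib.loads(plist_bytes)
    counts = {}
    for diag in data.get("diagnostics", []):
        name = diag.get("check_name", "<unknown>")
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Since the check name is not guaranteed to stay stable or to match the checker's name, a consumer written this way should treat it as a grouping or filtering key rather than as a permanent identifier.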