Folks, I deleted some code.
Some of it was... Perl code.
Not sure if anybody noticed, but scan-build didn't deduplicate reports correctly. Its algorithm was extremely primitive: basically, it hashed each HTML file and deleted files with a duplicate hash. This obviously didn't work for cross-file reports (i.e., the bug path starts in the main file but ends in a header), which we intended to deduplicate by the end-of-path location, because the main file name is included in the report and therefore affects the MD5. D42269 doesn't help either. Long story short, it didn't honor the modern advancements in IssueHash at all.
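To make the failure mode concrete, here is a minimal Python sketch of content-hash deduplication. It is not the Perl that lived in scan-build; the report layout, the `render_report` helper and the file names are all made up for illustration. The only point is that two reports for the same bug in a header hash differently once the main file name is embedded in the report text.

```python
import hashlib

def render_report(main_file, bug_location, message):
    """Hypothetical stand-in for an HTML report; real reports contain much more."""
    return (
        f"<!-- MAINFILE {main_file} -->\n"
        f"<h1>{message}</h1>\n"
        f"<p>Bug path ends at {bug_location}</p>\n"
    )

def dedup_by_content(reports):
    """Keep the first report for each distinct MD5 of the full report text."""
    seen = set()
    kept = []
    for text in reports:
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(text)
    return kept

# Same bug in the same header, reached from two different main files.
same_bug_reports = [
    render_report("a.cpp", "util.h:10", "Null dereference"),
    render_report("b.cpp", "util.h:10", "Null dereference"),
]

# Both reports survive, because the embedded main file name changes the hash.
survivors = dedup_by_content(same_bug_reports)
```

Byte-identical reports are still collapsed by this scheme, so it only catches duplicates emitted for the same main file.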
I could make scan-build read the issue hash, but I think my solution is even more elegant, so hear me out. I put the issue hash into the report filename instead, replacing the random section. When clang tries to emit a duplicate report, it'll simply fail because a file with that name is already present. Moreover, as per tradition, I reduce the issue hash to the first 6 characters, so bug report filenames look exactly like before (report-123abc.html), except now they're automatically stable and deterministic! Such truncation is, of course, entirely cosmetic and absolutely unnecessary, but I think it's pretty cool.
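A rough Python sketch of the idea, not the clang implementation: the filename is derived from the issue hash and the file is created in exclusive mode, so a second attempt to emit the same issue fails instead of producing a duplicate. `report_filename`, `emit_report` and the hash values are hypothetical names and data chosen for the example.

```python
import os
import tempfile

def report_filename(issue_hash):
    """Build a name like report-123abc.html from the first 6 hash characters."""
    return f"report-{issue_hash[:6]}.html"

def emit_report(out_dir, issue_hash, html):
    """Write the report; return False if one with that name already exists."""
    path = os.path.join(out_dir, report_filename(issue_hash))
    try:
        # Mode "x" is exclusive creation: it raises if the file already exists.
        with open(path, "x", encoding="utf-8") as f:
            f.write(html)
    except FileExistsError:
        return False
    return True

def demo():
    """Emit one issue twice plus a different one; return results and files."""
    with tempfile.TemporaryDirectory() as out_dir:
        issue = "123abc0f9e8d7c6b5a4f3e2d1c0b9a87"  # made-up issue hash
        other = "fedcba9876543210fedcba9876543210"  # made-up issue hash
        results = [
            emit_report(out_dir, issue, "<html>seen from a.cpp</html>"),
            emit_report(out_dir, issue, "<html>seen from b.cpp</html>"),
            emit_report(out_dir, other, "<html>another bug</html>"),
        ]
        return results, sorted(os.listdir(out_dir))
```

In this sketch, two distinct issues whose hashes share the first 6 characters would also be treated as duplicates; that is a property of the sketch, and the description above does not say how the actual patch handles such a case.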
The flag -analyzer-config stable-report-filename=true now becomes redundant because report filenames are stable anyway. (In fact, they weren't actually stable before the patch even with the flag, because they depended on the race between clang invocations to emit the reports; I changed it to also include a snippet of the issue hash, in place of the race-y index.) That said, I'm conflicted about removing the flag entirely, because it also produces more verbose and readable filenames that people seem to enjoy. I think we should enable it by default instead, as soon as we make sure it doesn't produce extremely long filenames.
scan-build --generate-index-only no longer does deduplication, as seen in the updated test "rebuild_index". If deduplication is desired, it can typically be achieved with a simple
find . -name '*.html' -exec mv "{}" . \;
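For anyone who'd rather not rely on `mv` overwriting files, the same flatten-by-name idea can be sketched in Python. Like the command above, this assumes that duplicate reports in different subdirectories end up with identical filenames; `flatten_reports` is a hypothetical helper, not part of scan-build. Unlike `mv`, which overwrites, it keeps the first copy it encounters.

```python
import os
import shutil
import tempfile

def flatten_reports(root):
    """Move every *.html under root into root itself, dropping name clashes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.samefile(dirpath, root):
            continue  # files already at the top level stay where they are
        for name in filenames:
            if not name.endswith(".html"):
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(root, name)
            if os.path.exists(dst):
                os.remove(src)  # same name: treated as a duplicate report
            else:
                shutil.move(src, dst)
    return sorted(n for n in os.listdir(root) if n.endswith(".html"))

def demo_flatten():
    """Build a throwaway tree with one duplicated report name and flatten it."""
    with tempfile.TemporaryDirectory() as root:
        layout = [
            ("run1", "report-123abc.html"),
            ("run2", "report-123abc.html"),
            ("run2", "report-fedcba.html"),
        ]
        for sub, name in layout:
            os.makedirs(os.path.join(root, sub), exist_ok=True)
            with open(os.path.join(root, sub, name), "w", encoding="utf-8") as f:
                f.write("<html></html>")
        return flatten_reports(root)
```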
I don't frequently see stringstreams in the codebase.
Why don't you use the llvm alternative?