This is an archive of the discontinued LLVM Phabricator instance.

Add an option -analyzer-config stable-report-filename=true to produce more stable file names of scan-build reports.
ClosedPublic

Authored by sylvestre.ledru on May 14 2014, 6:42 AM.

Details

Summary

With the option '-analyzer-config stable-report-filename=true', instead of report-XXXXXX.html, generate report-<filename>-<function, method name>-<function position>-<id>.html. (id = i++ for several issues found in the same function/method)

Diff Detail

Event Timeline

sylvestre.ledru retitled this revision from to Update the file names of scan-build reports..
sylvestre.ledru updated this object.
sylvestre.ledru edited the test plan for this revision. (Show Details)
Jean-Daniel edited edge metadata.May 15 2014, 12:55 AM

Couldn't the "AnalyzerOptions" member be declared const ?

sylvestre.ledru updated this object.
sylvestre.ledru edited edge metadata.

Add const to the AnalyzerOptions declaration

Fix a mistake with arc.

jordan_rose edited edge metadata.May 15 2014, 8:07 AM

I'm not sure this is that much better—do we guarantee that diagnostics are always reported in the same order between two runs? I'm not sure we do. And there's also a race here if two analyzer processes are analyzing two files with the same name, or if they end up reporting issues in a common header.

I'd be okay with putting the file and declaration name in before the random string, though.

lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
79

Might as well be const from here all the way down.

I agree, this patch is not perfect. I do not guarantee that the file name will remain the same....
However, the generated file is "source name + function/method name", in most of the cases, we won't have too many issues in the same function.
Maybe I could add the type of error in the filename. This will decrease the risk of collision (but not fix it completely)

I'd be okay with putting the file and declaration name in before the random string, though.

That would not fix my problem. I am running daily scan-build on LLVM/Clang (http://buildd-clang.debian.net/scan-build/) + Firefox but I cannot share the URL with others because they are not stable enough (because of the random aspect)...

lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
79

Right, I will fix that.

If you want, I can keep the current behavior and instead add a option --no-random-URL or -no-random-filename-report. That won't impact scan-build and you will make me (and my users) happy :p

I think my point is that your URLs still won't be stable unless your build is non-parallel and the analyzer is deterministic, but that is a good use case.

If the analyzer is non-deterministic in the output order of bugs from a single TU, I think we want to fix that, but if the source is changing, you could introduce bugs in between other bugs and shift the numbers down. Maybe you could put the line number in as well?

I'm not quite sure how to stop multiple TUs from reporting the same bugs. For that you'd need to use some kind of issue hash, like CmpRuns.py does.

OK. I was expecting clang analyzer to process them in the same order of reading... Sorry! :)

I will add the line number to the file name.

sylvestre.ledru edited edge metadata.

With the line number + the const propagated

It's still possible to have problems with this (if there are multiple issues occurring on the same line), but mostly I'd think you wouldn't want to use the line number as a stable URL. What if someone inserts a comment at the top of the file?

(This is why the hash in the plist output is based on "lines from the start of the function body", when possible. Definitely not perfect either, but slightly more resilient.)

I guess a lot of this depends on what you are using this for. Do you want report URLs to stay stable even when the source code changes? Do you want to try to deduplicate when different main files produce the same diagnostic? Do you want to be able to handle multiple reports on the same line?

(One way to handle the last is to include the checker name in the report somehow, possibly via hash. It's less likely that the same line will contain two issues reported by the same checker, though not impossible.)

Also, if you are not already, please check that scan-view continues to work as well as scan-build. I don't foresee any problems since you're keeping the "report-" prefix, but just in case.

lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
281

declName may contain non-valid URL characters. The name of an overloaded operator function is the operator. Objective-C methods probably don't include the parent class in the name, but they could: -[FooBarClass fooBar].

I'd think you wouldn't want to use the line number as a stable URL. What if someone inserts a comment at the top of the file?

That is why I didn't add the line at the first try (I added it because you asked :) or maybe you were talking about "lines from the start of the function body"

Do you want report URLs to stay stable even when the source code changes?

Yes, at least for a few weeks until a bug is fixed and the report disappears.

Do you want to try to deduplicate when different main files produce the same diagnostic? Do you want to be able to handle multiple reports on the same line?

It is not my current goals but maybe during the GSoC.

Also, if you are not already, please check that scan-view continues to work as well as scan-build. I don't foresee any problems since you're keeping the "report-" prefix, but just in case.

Yep, I did. It works fine!

lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
281

OK. I am not familiar with Objective-C. I will have a look on how improve the filename.

I'd think you wouldn't want to use the line number as a stable URL. What if someone inserts a comment at the top of the file?

That is why I didn't add the line at the first try (I added it because you asked :) or maybe you were talking about "lines from the start of the function body"

Yeah, I'm not sure what to say about that. Sequentially numbering bugs is a problem anyway if someone else introduces a new bug. I guess you'll have to pick your poison.

I'd like to keep this new behavior off by default for now until either Ted or Anna decides that it's enough superior to the current behavior that it's worth changing, but I think it's good enough to take under a flag now, once you've decided which problem you're going to live with.

OK. I will implement these changes!

lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
281

Do you know if we have a method for this or should I implement it?

jordan_rose added inline comments.May 22 2014, 9:57 AM
lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
281

To do what, mangle an arbitrary string into an identifier, or URL-safe string? Sorry, I can't remember seeing one, though there's a lot in llvm/Support and llvm/ADT that I eventually discover. You could also just leave the special characters in and make sure they're escaped when scan-build references them by URL.

sylvestre.ledru retitled this revision from Update the file names of scan-build reports. to Add an option --stable-report-filename to produce more stable file names of scan-build reports..
sylvestre.ledru updated this object.

Can I commit that ? :)

Jordan, is that OK with you? Thanks

Sorry, rather than adding a new option to Clang, please use the existing -analyzer-option API.

lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
73

Nitpick: The trailing _ isn't very LLVMish. Better to just shorten the input parameter to something like "Opts".

133

Why not SmallString here?

existing -analyzer-option API. I cannot find a reference to this.
Do you mean using the same mechanism as -analyzer-eagerly-assume (for example) or using -analyzer-config ?

If you are talking about the first, where is it defined? Thanks

lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
73

_ removed. I will keep AnalyzerOpts if you don't mind, as consistency with PlistDiagnostics.cpp

133

Same here. As consistency with PlistDiagnostics in which a std::string is used. I don't mind updating the two files if you prefer.

Yes, sorry, -analyzer-config, not -analyzer-option.

(There's been a lot going on these past two weeks.)

lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
133

Yes, please. No need to incur extra random heap allocations, even if it doesn't make such a difference in this case.

sylvestre.ledru retitled this revision from Add an option --stable-report-filename to produce more stable file names of scan-build reports. to Add an option -analyzer-config stable-report-filename=true to produce more stable file names of scan-build reports..
sylvestre.ledru updated this object.

Uses -analyzer-config (way better indeed)
Uses SmallString

This is looking good! Thanks for going through these revisions. One more major change, to avoid stomping on the same file.

include/clang/StaticAnalyzer/Core/AnalyzerOptions.h
365–369

Please note which is the default here. Also, you're missing a period on the first line.

lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
132–133

Funny indentation here.

289–301

This is race-y: what if you have two analyzer instances running at once? You should use the F_Excl flag for llvm::sys::fs::openFileForWrite.

Also, please put spaces around =.

This is race-y: what if you have two analyzer instances running at once? You should use the F_Excl flag for llvm::sys::fs::openFileForWrite.

Are you sure this can happen ? I had the feeling that the analyzer is run a on a file as a whole and since I am writing a file with a file in the name, I cannot have collision.

If you confirm, I will fix that.

If the issue is in a header file, multiple implementation files might include that header file. Better safe than sorry, anyway.

Oh, headers, right! Thanks for confirmation.

Same player try again: With llvm::sys::fs::F_Excl ;)

No, just put the open() in the loop, and skip the exists() check. That's faster anyway—less disk access.

Uh, you should probably check that the error code is actually "the file already exists" rather than something completely unrelated, like "you don't have permission to write to this directory".

Sorry. Long week. I am doing dumb things :)

I updated the code to this:

EC = llvm::sys::fs::openFileForWrite(Model.str(), FD, llvm::sys::fs::OpenFlags::F_RW | llvm::sys::fs::F_Excl);
if (EC && EC != std::errc::file_exists) {
    llvm::errs() << "warning: could not create file '" << Model.str()
                 << "': " << EC.message() << '\n';
    return;
}

Work for me

jordan_rose accepted this revision.Jun 13 2014, 1:00 PM
jordan_rose edited edge metadata.

Okay, this one looks good. Last few comments, but no need for another round of review. Thanks, Sylvestre!

lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
294

Inconsistent here about whether or not you want to say "OpenFlags". Also, remember the 80-column limit in LLVM.

lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
418

Ah, that's why. In this case the std::string is better because it avoids a copy.

In the long run we should be eliminating getAsString in favor of streaming to raw_ostreams, but that shouldn't block your patch.

This revision is now accepted and ready to land.Jun 13 2014, 1:00 PM

Commited as revision 210970

I implemented the 80 column change + backout of the smallstring change + remove of openflags.
Thanks again!