This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][scan-build-py] Do not remove `-m*` flags from compilation database.
ClosedPublic

Authored by rizsotto.mailinglist on Feb 20 2016, 6:34 PM.

Details

Summary

Got report from users, that filtering out -m* causes problem. (https://github.com/rizsotto/Bear/issues/115) This fix also remove the -W flags filtering, but that's intended too.

Diff Detail

Event Timeline

rizsotto.mailinglist retitled this revision from to [analyzer][scan-build-py] Do not remove `-m*` flags from compilation database..
rizsotto.mailinglist updated this object.
zaks.anna edited edge metadata.Feb 20 2016, 9:58 PM

This fix also remove the -W flags filtering, but that's intended too.

Why were those ignored before and why are you adding them back in?

  • -m* Perl implementation scan-build takes it in. i took it out, and got error reports that is actually needed for many cases.
  • -W* Perl implementation scan-build takes only the warning suppress flags -Wno-*, but says " We don't care about extra warnings, but we should suppress ones that we don't want to see." in the comment.

after this change the general policy will be to take in everything as it was. the exceptions are only these cases:

  • the ignored group comes from the Perl implementation. i personally don't see good reason to do that, but for backward compatibility (to pass the comparison test).
  • the preprocessor parameter group. (-MD and friends.) which are generating dependency files for makefiles. i got a report earlier when build scripts has it, and it caused duplicate entries for the same module in the compilation database. (one with these flags, one without.) when these flags are removed it fixes the duplicate entry problem.
  • the linker options. when the compiler is called to do compilation and linking, that shall create entry for the compilation database. (eg.: clang client.c -lapi -L../interface ) but only compiler flags are useful, so linker flags can be ignored.

the ignored group comes from the Perl implementation. i personally don't see good reason to do that, but
for backward compatibility (to pass the comparison test).

I am apprehensive about removing the options ignored by the Perl scan-build because I assume there were reasons to add them in the first place. (And that script went through a lot of user projects.) For example, some options are gcc-only and should not be passed to the clang compiler, but there are other reasons as well.

I am not sure why the extra warnings were ignored. The only reason I can think of is that ignoring warnings would bring down the compilation time; and make scan-build faster. Is there a reason to have those warnings on? Do we expect people to look at the warnings produced during analysis?

If we do want to make changes in the options that scan-build passes, we should make the changes to the Perl scan-build as well. That way both scan-builds are in sync and we will get more testing and less unexpected issues once we switch over. What do you think?

hey Anna, thanks for your reply!

I am apprehensive about removing the options ignored by the Perl scan-build because I assume there were reasons to add them in the first place. (And that script went through a lot of user projects.) For example, some options are gcc-only and should not be passed to the clang compiler, but there are other reasons as well.

would be glad to hear about those other reasons! i'll put into comment lines, and it'll be documented somewhere. anyway, could you point out from the list, which are the gcc only?

I am not sure why the extra warnings were ignored. The only reason I can think of is that ignoring warnings would bring down the compilation time; and make scan-build faster. Is there a reason to have those warnings on? Do we expect people to look at the warnings produced during analysis?

i come from a different angle. compilation database is not only for the static analyzer run. let's put everything into it, what was in the build process. give power to the people! :) (maybe it's an implementation problem, and filtering extra flags shall go to some place else, where the actual SA execution takes place.)

If we do want to make changes in the options that scan-build passes, we should make the changes to the Perl scan-build as well. That way both scan-builds are in sync and we will get more testing and less unexpected issues once we switch over. What do you think?

i think to keep the perl and python implementation in sync, is a big (and not so fruitful) effort...

  • i did compare the two implementation with SATestBuild.py tools, so i do consider it the required compatibility guaranteed. would abandon the perl one and focus to the new one.
  • the two implementation -which flags to take- is just way different: perl ignores by default and takes what it needs. python takes by default and ignores what does not needed.
  • the perl implementation is a long and hard to read for loop. when i had question related why it collects _linker_ flags, even Ted were forget the reasons. i consider the perl implementation is a guidance rather than a reference.

so, how to proceed from here? :) shall i split the argument filtering one for the compilation database, and the other for static analyzer run? i think that's a meaningful step to solve the problem of polluted compilation database and keep compatibility. (as to postpone the painful decision later. ;))

in the meanwhile you can provide some explanation which flags are ignored for what reason. could you do that for me?

i come from a different angle. compilation database is not only for the static analyzer run. let's put
everything into it, what was in the build process. give power to the people! :) (maybe it's an
implementation problem, and filtering extra flags shall go to some place else, where the actual SA
execution takes place.)

This is a very good point! Indeed, the filtering is specific to the static analyzer and should happen only when the compilation database is used for static analysis.

the perl implementation is a long and hard to read for loop. when i had question related why it collects
_linker_ flags, even Ted were forget the reasons. i consider the perl implementation is a guidance rather
than a reference.

I do not know why certain options are ignored and I wish it was documented better. The only way to figure that out now is to research each one of them. However, I think it's dangerous to remove the ones we are not sure about; they were added for a reason. For example, I added this one based on a user bug report:

Author: Anna Zaks <ganna@apple.com>
Date:   Fri Jan 6 01:54:02 2012 +0000
   [analyzer] Skip --serialize-diagnostic when running scan-build.    
   Otherwise, the analyzer will try to analyze the serialized diagnostic
   file as if it were a source file.

I think to keep the perl and python implementation in sync, is a big (and not so fruitful) effort... would
abandon the perl one and focus to the new one.

I think we cannot drop the old scan-build until we start releasing the python scan-build and convert a certain amount of users to it. Until that happens, we need to keep them in sync. We do not need to implement new features in the Perl scan-build, but when we are making small changes that could introduce subtle bugs (such as tweaking the options passed), both should be updated. That way, we will get more testing for those options and make the transition easier.

Here are the action items for releasing the Python scan-build that I am aware of. Many of them are blocked on our team:

  • Fix the xcodebuild issue on the Mac.
  • Tweak the English in help/documentation.
  • Add more projects to the analyzer builedbot and start testing both scan-builds with it; making sure they produce the same analyzer results.
  • Update the scripts to include the new scan-build in the static analyzer distribution on the analyzer website.

We should also build the new scan-build (or bear) as part cmake and integrate it's tests into lit.

dcoughlin edited edge metadata.Feb 24 2016, 12:04 PM

i think to keep the perl and python implementation in sync, is a big (and not so fruitful) effort...

  • i did compare the two implementation with SATestBuild.py tools, so i do consider it the required compatibility guaranteed. would abandon the perl one and focus to the new one.

We can't in good conscience tell our users to switch to scan-build-py until we know that it works for them! This includes Windows (where it is untested), Darwin (where we know it doesn't work yet with xcodebuild), and Linux (including folks who use clang for static analysis but gcc for compilation). Until scan-build-py supports these environments we can't abandon the perl implementation and do need to make parallel bug fixes in each when they are discovered. This *is* a burden -- but until the scan-build-py initial implementation is complete we have to do it.

While scan-build-py is still being finished, we should at a minimum set up a build bot to compare the results of the perl and python implementations to guarantee continued compatibility between the two versions and catch unintended regressions. For Darwin, we can use the green dragon buildbot http://lab.llvm.org:8080/green/job/StaticAnalyzerBenchmarks/ but ideally this would be done for Linux and Windows as well. Once we have compatibility in these environments we can drop the perl implementation.

It would also be good to add lit-based tests for key new bug fixes (dropping flags, etc.) and make sure these are tested as part of the normal regression tests for scan-build and scan-build-py.

rizsotto.mailinglist edited edge metadata.

dear Devin and Anna,

i've updated the patch. it might be bigger than it suppose to be, and might need to change the ticket title. ;)

  • it's mostly about the split of the flag filtering (one for the compilation database, one for the SA run.)
  • added a few test cases around it.
  • renamed the command module to compilation since it encapsulate the knowledge about a compilation command.
  • the runer module, which is responsible to run the SA is also restructured. the flow got simplified, the set_analyzer_output and create_commands method are gone. the steps which are taken to run the SA are more clear.
  • cleaned up the functional test cases. (no imports from parent directories.)

about the backward compatibility: this change shall not break any compatibilities with the perl version. if it does, that's not intended.

about the remaining tasks to release scan-build-py.

  • started to rewrite the functional tests onto lit.
    • my plan to keep the unit tests as python unit tests, but make a lit hook to run them as part of the check.
    • also want to run pep8 against the sources with a lit hook.
  • can set up a build bot (probably as a docker image) to run the SATestBuild on linux against different sources.
    • found the SATest* hard to use with a build bot. (does stop at the first project which differs, instead of going through all.) will solve it anyhow.
    • would you run the docker image as part of your build infrastructure?

regards,
Laszlo

This looks good to me, although I think it is important to explain, at a high level, why we're ignoring flags when we do so. This doesn't have to be done on a per-flag basis. It is enough to say "We ignore linker flags here because ...". In some cases it may be just for compatibility with the perl scan-build -- if so, we can say that. Writing down this justification will help future maintainers decide whether a flag should be ignored or not.

I ran this on an internal suite of non-Xcode projects and on those projects, at least, it has the same behavior as the perl scan-build. Thank you for making it compatible!

  • my plan to keep the unit tests as python unit tests, but make a lit hook to run them as part of the check.

This sounds great to me. I think adding some end-to-end tests should also be an eventual goal. We can use these end-to-end tests to qualify scan-build-py when we replace the perl version.

  • also want to run pep8 against the sources with a lit hook.

This sounds reasonable, but we should make sure that you don't need pep8 to run the normal tests so we don't add an additional dependency for testing scan-build.

can set up a build bot (probably as a docker image) to run the SATestBuild on linux against different sources.
found the SATest* hard to use with a build bot. (does stop at the first project which differs, instead of going through all.) will solve it anyhow.
would you run the docker image as part of your build infrastructure?

Our (Apple's) bot infrastructure http://lab.llvm.org:8080/green/ is Darwin-only. However, various groups provide buildslaves on different platforms for the LLVM Buildbot Infrastructure http://llvm.org/docs/HowToAddABuilder.html We could ask on cfe-dev whether they would be willing to run scan-build tests.

tools/scan-build-py/libscanbuild/compilation.py
15 ↗(On Diff #50559)

I think it is important to explain here why this module removes these flags. For example:

# Linker options are ignored because <insert justification here>.

This is important to explain so that when someone is teaching scan-build about a new flag they can decide whether it should be included here.

49 ↗(On Diff #50559)

split() is also the name of the method to separate a string, right?

tools/scan-build-py/libscanbuild/runner.py
23 ↗(On Diff #50559)

I think it is import to explain why and how this is different than IGNORED_FLAGS in compilation.py.

27 ↗(On Diff #50559)

You can add note saying that these flags are ignored for compatibility with the perl implementation of scan-build and may not be needed any more.

rizsotto.mailinglist marked 4 inline comments as done.Apr 16 2016, 8:35 AM

thanks Devin for the review. your comments are addressed in the new commit.

dcoughlin accepted this revision.Apr 18 2016, 11:40 AM
dcoughlin edited edge metadata.
This revision is now accepted and ready to land.Apr 18 2016, 11:40 AM
Eugene.Zelenko added a subscriber: Eugene.Zelenko.

Committed in r266726.