This is an archive of the discontinued LLVM Phabricator instance.

Add scan-build python implementation
ClosedPublic

Authored by rizsotto.mailinglist on May 8 2015, 4:25 AM.

Details

Summary

it was mentioned on the static analyzer's open project page, there is a desire to rewrite the existing Perl implementation to Python. beside that there were also a point "Do a better job interposing on a compilation" which I did prototype and test a solution.

here are the major characteristic of the rewrite:

  • it works/tested on python 2.7, 3.x,
  • it has no dependencies other than standard python modules,
  • it differs from the current perl implementation in functionality,
    • does not generate not-used-attribute files, since analyzer does not report it;
    • does not generate parser reject report, since analyzer does not report it;
    • does not check '-isysroot' uniqness;
    • does execute 'clang' binary only if it needed (one time less than current perl);
    • does not copy pre-compiled header (.ghc) files into report directory;
    • does create compilation database during the analysis;
    • does cover build systems which do not respect 'CC' or 'CXX' environment variables;
    • analyzer is run only after the compilation database were created (not during the build);
    • does filter requested sub-directories from the final report;
    • the invocation of the script changed a little (tried to minimize the change as much as i could)
  • it was tested on Linux, FreeBSD and OS X,
  • it has functional and unit tests around,
  • it is a self contained python package,
    • it's decomposed into multiple modules;
  • it has more documentation inside than the original implementation.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Will it be installed with rest of Clang?

Other useful extension will be support for Clang-tidy and Include What You Use. As workaround, I create compiler wrappers scripts which I pass via CC and CXX.

I think will be nice to have possibility to control output content.

When I run scan-build with all checkers enabled on my project, I got ~ 100 Gb of HTML mostly because of deadcode.DeadStores. My solution was to create CC and CXX wrappers which run Clang-tidy with Analyzer only checkers (producing messages with relevant line only).

I think will be nice to have possibility to control output content.

When I run scan-build with all checkers enabled on my project, I got ~ 100 Gb of HTML mostly because of deadcode.DeadStores. My solution was to create CC and CXX wrappers which run Clang-tidy with Analyzer only checkers (producing messages with relevant line only).

Sorry Eugene, I don't see your point here. You can enable/disable checkers with command line flags. (So you can add --disable-checker deadcode.DeadStores to scan-build without going into the wrapper business.) Anyway, these patches suppose to create a new implementation of scan-build. How is related to your problem?

I think will be nice to have possibility to control output content.

When I run scan-build with all checkers enabled on my project, I got ~ 100 Gb of HTML mostly because of deadcode.DeadStores. My solution was to create CC and CXX wrappers which run Clang-tidy with Analyzer only checkers (producing messages with relevant line only).

Sorry Eugene, I don't see your point here. You can enable/disable checkers with command line flags. (So you can add --disable-checker deadcode.DeadStores to scan-build without going into the wrapper business.) Anyway, these patches suppose to create a new implementation of scan-build. How is related to your problem?

I just wanted new script to be better then previous implementation.

I don't think this is the place to discuss about that. Please use the cfe mailing list

zaks.anna edited edge metadata.Jun 26 2015, 10:22 AM

I think will be nice to have possibility to control output content.
When I run scan-build with all checkers enabled on my project, I got ~ 100 Gb of HTML mostly because of deadcode.DeadStores. My solution was to create
CC and CXX wrappers which run Clang-tidy with Analyzer only checkers (producing messages with relevant line only).

Eugene,

These are very valid points and should be considered for future improvements to scan-build. However, these are outside of the scope of this initial patch, which is to match the existing functionality and allow for better build system interposition.

The bloat you get with the HTML output is a known limitation; for example, we produce one HTML file per bug report instead of sharing the same HTML between several bug reports that get reported on the same file. One thing in keep in mind is that the static analyzer contains many path-sensitive checks. When those are reported, you get a pull path through your code, where the problem occurs. You will not get the same experience when consuming reports through clang tidy.

I'm wondering about the status of this, since I've not heard much in the past few months. Is this patch still progressing? (I hope so, I would really love to see us drop our reliance on Perl!)

rizsotto.mailinglist edited edge metadata.

findings from compare from older Perl implementation

the latest update has the following improvements:

  • makes it work without installation.
  • pass the SATestBuild.py test harness against different projects with --strictness 2 arguments. (coreutils-8.23, gawk-4.1.1, make-4.0, openssl-1.0.0s, patch-2.7.4, tar-1.28) these minor adjustments were needed.
    • get silents when it runs against configure or autogen.
    • implements --override-compiler which enforce to use compiler wrappers.
    • a couple of minor fixes to get rid of the differences.
  • small number of improvement for compilation database generation (mainly shell escaping when that's needed). which were learnt form my other project Bear.

Would you mind re-uploading this patch as a diff against upstream trunk with full context?

Would you mind re-uploading this patch as a diff against upstream trunk with full context?

i'm not sure i do understand what do you ask. i wish i could upload these changes as a single patch, but don't know is that possible?

If you you're familiar with git, I'm asking you to: squash, rebase, and git diff -U999, then upload the resulting patch.

If you use svn, I'm asking you to: svn diff -r$(FirstCommit):$(LastCommit) --diff-cmd=diff -x -U999, and upload that.

The reason to squash all your patches together is that partial diffs from a patch series don't work well with Phabricator's "Revision Update History" tool. Also, it makes it hard to pull out a patch that I can apply on my local tree to try it out.

The reason to squash all your patches together is that partial diffs from a patch series don't work well with Phabricator's "Revision Update History" tool. Also, it makes it hard to pull out a patch that I can apply on my local tree to try it out.

thanks @jroelofs, will take a try on the weekend... my thought was that it might be more confusing to send squashed commits, since i can't remove the old patches... anyway, here you can find its own repository, till it's not merged to clang... https://github.com/rizsotto/Beye

dkrupp added a subscriber: dkrupp.Nov 16 2015, 5:01 AM

update with full context.

Thanks for re-uploading!

tools/scan-build-py/README.md
40

Mind adding a CMakeLists.txt to drive these from the clang build itself?

tools/scan-build-py/libscanbuild/__init__.py
8

I think most of this block comment belongs in a new file: clang/docs/ScanBuild.rst

tools/scan-build-py/tests/functional/src/build/Makefile
2

It'd probably be a good idea to structure these test inputs into their own "projects" so that support for more build system can be added later.

I'd suggest something like:

tools/scan-build-py/tests/functional/
    simple_makefile
        src
            clean-one.c
            broken-two.c
            clean-one.c
            clean-two.c
        build
            Makefile
    cmake_makefiles
        src
            ...
        build
            CMakeLists.txt
    cmake_ninja
        src
            ...
        build
            CMakeLists.txt
tools/scan-build-py/tests/unit/__init__.py
17

Mind hooking these up so that a LIT test harness can run them too? We use LIT to test pretty much everything else in the llvm project, and it'd be a shame to have a tool with a completely different way of running tests...

Thanks for re-uploading!

thanks for your comments! :)

most of your comments are about to embed it more into the clang build, test infrastructure. i think these are valid points!

i would like to have these code in the repository first and to do -the work you've proposed- afterwards. i might need help for some of those too. (like using lint to run python unit tests.)

jroelofs accepted this revision.Nov 18 2015, 12:38 PM
jroelofs added a reviewer: jroelofs.

Sounds reasonable.... and I can help with setting up the CMake & LIT goop once you've got it committed.

FWIW, I was able to get a simple example working by doing:

$ mkdir -p $build/lib/python2.7/site-packages
$ python setup.py build
$ PYTHONPATH=$build/lib/python2.7/site-packages python setup.py install --prefix=$build
$ cd path/to/example
$ PYTHONPATH=$build/lib/python2.7/site-packages $build/bin/scan-build --override-compiler --use-cc $build/bin/clang --use-analyzer $build/bin/clang make

I wasn't able to get the libear interceptor to work on my Darwin machine, but that can be debugged later.

This revision is now accepted and ready to land.Nov 18 2015, 12:38 PM

Hi Laszlo, thanks for the update!

Some high-level questions/comments (and various small things I noticed inline).

  • I tried running scan-build in interposition mode (i.e., uncommenting out #"$SCRIPT_DIR/analyze-build" $@ in scan-build) and got python compile errors. When you did your testing to compare output with the old scan-build, did you use this mode?
  • When I run scan-build in intercept-build mode on openssl-1.0 on OS X I get compile errors. Is this one of the projects you tested with? When I run in it in analyze-build mode I get diffs with CmpRuns.py -- it looks like might be because some paths are absolute. What kind of fidelity with the old scan-build do expect at this point?
  • What is the point of intercept-cc/intercept-c++? I thought libear didn't need to set CC/CXX environmental variables.
  • Do all the files in this patch need to be committed into the clang repo? There seem to be some extraneous files (I've asked inline about the ones that don't seem necessary.)
  • I found it difficult to understand how all the modules and command-line tools fit together. Can the module names be made more harmonious with the command-line tool names(the scripts in bin). It is not at all obvious which modules belong to which tools, etc. I think it would be good to document this in the code, as well.
tools/scan-build-py/.travis.yml
1 ↗(On Diff #40276)

Is this file needed in clang trunk?

tools/scan-build-py/CHANGES.txt
1 ↗(On Diff #40276)

Is this one needed too?

tools/scan-build-py/MANIFEST.in
1 ↗(On Diff #40276)

How about this one? Is it needed in clang trunk?

tools/scan-build-py/README.md
71

This should probably point to llvm bugzilla. https://llvm.org/bugs/enter_bug.cgi?product=clang

tools/scan-build-py/bin/scan-build
14

These should be controlled by flags rather than by commenting the required behavior in or out.

I also think analyze-build should be the default behavior, to maintain compatibility with the existing scan-build.

tools/scan-build-py/libear/__init__.py
2

How does this file fit into the overall build picture? Will this file go away once scan-build-py is built with the common clang cmake?

80

Maybe "Toolset" would be a more descriptive name than "Context"?

100

I gather the intent is that subclasses will override to provide their own versions of these methods? If so, these methods need to be documented so that people adding new support for additional platforms know what they should provide in their subclasses.

If there are reasonable defaults (for example., [] for dl_libraries), those should be returned here rather than pass. If not, these should probably raise an exception indicating they must be implemented rather than silently doing nothing.

152

Why is this a generator? There is no functionality after yield.

167

What does this do? Why is it hard-coded?

225

Why is this a generator?

263

Why is this a generator?

tools/scan-build-py/libscanbuild/clang.py
34

I think it would be more accurate to say that this method returns the front-end invocation that would be executed as a result of the given driver invocation.

tools/scan-build-py/libscanbuild/command.py
21

I think it would be good to document the keys and meaning of the returned dictionary. Or perhaps it would be better represented as class?

24

I think it would good to document what the value in this mapping means (number of expected parameters). I realize ccc-analyzer in the original scan-build is similarly un-documented, but we should do better here!

Also: should this include all the arguments IgnoredOptionMap in ccc-analyzer? It is missing `-u' and adds '-g'. Or are these changes intentional?

99

Typo "fille".

124

This method name should probably start with "is_"

tools/scan-build-py/libscanbuild/driver.py
1 ↗(On Diff #40276)

Why is this file called "driver"?

34 ↗(On Diff #40276)

Should this be 'intercept-build'?

67 ↗(On Diff #40276)

I think this error message can be improved. Perhaps "Unexpected error running intercept-build"?

105 ↗(On Diff #40276)

Typo "internt".

107 ↗(On Diff #40276)

This needs documentation.

tools/scan-build-py/libscanbuild/intercept.py
99

What is the purpose of setting CC/CXX environmental variables here? Doesn't libear intercept calls to the compiler? Why is intercept-cc/cxx needed?

tools/scan-build-py/libscanbuild/interposition.py
36 ↗(On Diff #40276)

Trying to access target_dir.name here causes a python error.

94 ↗(On Diff #40276)

It would be good to document that this is essentially the "main" of analyze-cc and document what it does.

tools/scan-build-py/libscanbuild/runner.py
22

This needs documentation on what opts can/should contain.

64

If it is hard to make sure that opts has the right dictionary keys, maybe it might be better to create a value class with methods?

114

What is the point of "continuation"?

tools/scan-build-py/libscanbuild/shell.py
2

I'm surprised there is not a library to do this.

Also, maybe a better name for this module is "shell_escaping".

tools/scan-build-py/setup.py
1 ↗(On Diff #40276)

Is this needed in the clang repo?

rizsotto.mailinglist marked 13 inline comments as done.Nov 20 2015, 8:42 AM

thanks Devin for your comments. made some changes already (will upload it tonight after some tests).

tools/scan-build-py/CHANGES.txt
1 ↗(On Diff #40276)

in order to make this code a standalone python tool tool, we need this file. (see llvm/utils/lit directory for example.)

tools/scan-build-py/MANIFEST.in
1 ↗(On Diff #40276)

in order to make this code a standalone python tool tool, we need this file. (see llvm/utils/lit directory for example.)

tools/scan-build-py/libear/__init__.py
2

this is quiet confusing me. previously you were asking make it work without installation. this file makes it possible to compile the ear library compiled before the build runs to use as preloaded library. the thing which is not needed is the CMakefile actually.

100

now rise NotImplementedError runtime exception.

167

this is added to mimic cmake behaviour. it is used in the config.h.in file.

tools/scan-build-py/libscanbuild/command.py
21

now documented when create the return value. (creating class would not bring much to the kitchen i think.)

24

-u is part of ignored linker flags. (see a few line above)
-g is added to mimic the ccc-analyzer results.
comment about key, value is added.

tools/scan-build-py/libscanbuild/driver.py
1 ↗(On Diff #40276)

any recommendation? it was the only entry point before the interposition was introduced. so it was the driver of the libscanbuild library.

34 ↗(On Diff #40276)

can be anything, but would make it rhyme with the module name...

67 ↗(On Diff #40276)

this line is printed as:

intercept-build: ERROR: Something unexpected had happened.
(and the stack-trace)

because the logging formating. so, 'intercept-build' and 'error' will be part of the message anyway.

tools/scan-build-py/libscanbuild/intercept.py
99

documentation added. this branch is used when preload does not work. (eg.: windows builds or failed to compile libear for unknown reasons.)

tools/scan-build-py/libscanbuild/runner.py
64

require decorator added to ensure the keys. (create class would be bloat in my view.)

114

this module use continuation to pass the current state and the method to call. i find this abstraction much better than create class methods (or a single method as it in the Perl implementation). Specially better if you want to test these methods!

tools/scan-build-py/libscanbuild/shell.py
2

the system library package called shlex and it has only the decode method. and failed on many of the real life tests. (in previous version i used that.)

i don't find shell_escaping a _better_ name. specially when i think how the method names would be: shell_escaping.encode. i think shell.encode is more descriptive.

tools/scan-build-py/setup.py
1 ↗(On Diff #40276)

in order to make this code a standalone python tool tool, we need this file. (see llvm/utils/lit directory for example.)

jroelofs added inline comments.Nov 20 2015, 8:59 AM
tools/scan-build-py/libear/__init__.py
2

I think the best way forward would be to teach the CMake build how to run setup.py. Then this would work both with and without installation, and it'd use the same code paths for both.

tools/scan-build-py/libscanbuild/driver.py
67 ↗(On Diff #40276)

Is there a pythonic way of doing llvm crash handlers? I.e. the "here's the steps to reproduce this, a stack trace, and a bug report url" things that clang spits out.

Thanks Laszlo!

Is there a more descriptive name than "intercept-build" (I realize scan-build is pretty general too). It seems to me the point of the intercept-build tool is to generate the compilation database. I think it would be helpful if the tool name indicated that rather than the *method* used to to generate the database. I don't have any great suggestions: "compilation-database-build", "log-build", "log-compilation-build", ...

A couple more comments inline.

tools/scan-build-py/MANIFEST.in
1 ↗(On Diff #40276)

Can you explain why this is needed? We didn't need one for the perl scan-build nor for SATestBuild.py/SATestAdd.py?

A difference between lit and scan-build is that scan-build is not a standalone tool. Are you envisioning users installing scan-build without installing clang?

tools/scan-build-py/libear/__init__.py
2

previously you were asking make it work without installation.

Yes. It is very important to support that workflow since many users have multiple versions of clang on their systems at the same time.

this file makes it possible to compile the ear library compiled before the build runs to use as preloaded library.

Shouldn't this be done as part of the build process and not as part of installation?

Can the interception library (eventually) be built as part of the global CMake build? It seems quite fragile to have custom build logic in a python file.

2

jroelofs wrote:

I think the best way forward would be to teach the CMake build how to run setup.py

What is the benefit of running setup.py and treating scan-build as a python package? It seems to me that the fact that scan-build is written in python should be an implementation detail. Most importantly, installing scan-build shouldn't muck up the user's python environment.

In my opinion, it would be more user-friendly to install into the new scan-build install hierarchy that jroelofs recently added. This way all the scan-build stuff is under one directory that the user can blow away rather than hidden deep in the bowels of some python environment. This makes it much easier to ensure that scan-build and clang are in sync and would prevent the user from accidentally installing scan-build into a virtualenv (which doesn't make sense because clang doesn't live there).

tools/scan-build-py/libscanbuild/driver.py
1 ↗(On Diff #40276)

How about naming the files containing entry points after the name of the command-line tool that that invokes the entry point? Or moving the entry-point code to the command-line tool so only shared code is in modules? As it stands, it is hard to tell what code is called by what tool.

tools/scan-build-py/libscanbuild/intercept.py
99

Is there an indication to the user that preload didn't work? Do you think there should be one? (Or is this an implementation detail that users do not need to be aware of?)

tools/scan-build-py/libscanbuild/runner.py
114

It would be better to decompose larger tasks into individually testable subtasks and test the output of each of these individually. There is no need for continuations here. Please remove them.

tools/scan-build-py/libscanbuild/shell.py
2

I found it confusing that the shell module didn't call the shell. You module comment says: "This module implements basic shell escaping/unescaping methods." Please use a name for the file that communicates that intent. It doesn't haven to be "shell_escaping", but "shell" is not descriptive enough. This is important so that someone who is unfamiliar with the code base can quickly tell whether the file is relevant to the task at hand.

rizsotto.mailinglist marked 3 inline comments as done.Dec 1 2015, 12:41 PM

updated comments before upload a new diff.

tools/scan-build-py/MANIFEST.in
1 ↗(On Diff #40276)

yes, did set it up already. see https://pypi.python.org/pypi/scan-build

tools/scan-build-py/libscanbuild/driver.py
67 ↗(On Diff #40276)

this crash/exception is not a clang crash. it's more like this program's fault. clang crash reports are recorded already in crash report files (and linked into the html report file).

tools/scan-build-py/libscanbuild/intercept.py
99

i think the user shall be notified when he requests it (emit debug level log now). but otherwise shall be hidden how this thing is working internally. (plenty of documentation explain internals in case if user want to know more.)

rizsotto.mailinglist edited edge metadata.

update to the latest comments. notable changes:

  • executables have distinct function now
  • modules are renamed to fit with the executable names
  • interposition of analyzer execution is merged into a single executable
  • documentation update
  • more tests were run against:
    • tested independent python package installation
    • tested on OSX and FreeBSD too
    • tested against different build systems
    • tests were added to check different interception modes

Thanks Laszlo.

I'm still not convinced that all the python package stuff is needed because scan-build is distributed with clang (see my question inline).

Also, what do you think about renaming intercept-build to "log-build" or some of the other alternatives I proposed above? I think it is important for the name of the executable to communicate its purpose.

tools/scan-build-py/MANIFEST.in
2 ↗(On Diff #41556)

I see that at https://pypi.python.org/pypi/scan-build it lists clang as a prerequisite. But since scan-build-py will now be distributed as part of clang I don't understand what the point of the standalone tool is. Can you explain why this is needed?

tools/scan-build-py/libscanbuild/analyze.py
132

Typo: 'intrepose'

tools/scan-build-py/libscanbuild/driver.py
67 ↗(On Diff #40276)

Maybe this should ask the user to file a bug against scan-build? Can it point to the bugzilla and tell the user what information, files, etc. they should attach to the bug report to help us reproduce the problem? Just saying "Something unexpected happened" is not as user-friendly as it could be because it does not tell the user that the problem is not their fault nor what they should do about it.

tools/scan-build-py/libscanbuild/runner.py
22

What I was asking for here is documentation about the 'opts' parameter. Where does 'opts' come from? Is it json input from a compilation database? What keys should it contain, what is meaning of their values, etc.

It is important to at least document the expected structure of the dictionary. This documentation will help people understand your design and make sure they don't accidentally break some invariant when fixing bugs or adding features in the future.

jroelofs added inline comments.Dec 5 2015, 8:42 AM
tools/scan-build-py/libscanbuild/driver.py
67 ↗(On Diff #40276)

Oh, I don't mean to handle clang crashes... I mean for handling python crashes. I meant to draw an analogy to a feature that clang has: when it crashes, it prints a stack trace, and some steps to reproduce the problem. I was wondering if a similar thing existed in python, for when a python app crashes.

rizsotto.mailinglist marked an inline comment as done.Dec 9 2015, 8:07 AM

Also, what do you think about renaming intercept-build to "log-build" or some of the other alternatives I proposed above? I think it is important for the name of the executable to communicate its purpose.

i do think differently. :) first, i think you can rename executables to whatever you like. previously i made this functionality in a tool called Bear. within 4 years got more than 100 bug reports, improvement ideas, but nobody did care about the executable name. second, i'm not a native speaker, but to me 'intercept compiler calls from build' can be shortened to 'intercept-build'. third, if i rename the executable from intercept-build, shall rename the intercept module too? and to be honest, the executable name scan-build just as bad as intercept-build. :) if i did not convinced you, give me a name and i will rename to it.

tools/scan-build-py/MANIFEST.in
2 ↗(On Diff #41556)

thought to have more distribution channel is better. but if that so confusing, i've deleted it.

tools/scan-build-py/libscanbuild/driver.py
67 ↗(On Diff #40276)

good point. added some implementation. if you don't like the text, please suggest one instead and will replace.

tools/scan-build-py/libscanbuild/runner.py
23

the comment says exactly the value coming from the compilation database. anyway added more explanations. if you have something in mind, please write the text and will copy it.

Also, what do you think about renaming intercept-build to "log-build" or some of the other alternatives I proposed above? I think it is important for the name of the executable to communicate its purpose.

... and to be honest, the executable name scan-build just as bad as intercept-build. :)

There are many things about the original scan-build that are bad. As you have seen, the perl-based scan-build is poorly named, poorly factored, insufficiently documented and poorly tested. This has made it very difficult to fix bugs, support new platforms, and extend scan-build's functionality -- and is exactly the reason why we are so excited about a scan-build reimplementation. With this reimplementation, we have the opportunity to do so much better! If we live up to the high standards in the rest of the clang/llvm codebase about naming, factoring, documentation, and testing then the new scan-build will be much easier to maintain and can serve as a solid foundation to add new features.

tools/scan-build-py/README.md
85

With respect to dynamic library-interposition still not working on OS X, the System Integrity Protection in OS X 10.11 should not prevent interposition on build tools so in theory intercept-build should work. I'll look into it.

tools/scan-build-py/libscanbuild/runner.py
24

Ok. If I create a patch with additional documentation for these fields, would you be willing to take a look at it to make sure the comments are correct?

Hi Laszlo,

Here are some comments from me.

Should we be worried about the name conflicts (between old scan-build and this tool) during rollout? I think it would be beneficial to rename the tools, but let's discuss the names later. (If we integrate Codecheck, that will affect naming too.)

The libear library should be built as part of clang cmake build; looks like @jroelofs offered to do this.
We should also integrate your tests into lit, which is the way we test all of the other tools in clang.

tools/scan-build-py/README.md
2

What's this? Please, remove.

7

I do not think you need to mention the old scan-build. I'd just describe the tool. Here is a slightly modified copy and paste from scan-build:

"A package designed to wrap a build so that all calls to gcc/clang are intercepted and logged into a compilation database [2] and/or piped to the clang static analyzer. Includes intercept-build tool, which logs the build, as well as scan-build tool, which logs the build and runs the clang static analyzer on it."

I have a bunch of comments about other parts of the documentation. However, I can rewrite parts of it as a separate commit. (It will be more efficient than going back and forth.)

60

This is a good place to point out how each mode can be activated. (Which parameters should be used.)

86

Would be good to find out which specific binaries used by the build we are not allowed to interpose on. It would be very unfortunate if this does not work with the System Integrity Protection feature, which is turned on by default.

115

Please, refer to the LICENSE.TXT file like you do in the other files.

tools/scan-build-py/bin/analyze-c++
2

What calls this script?

tools/scan-build-py/bin/analyze-cc
15

It is hard to figure out/search which functions actually get called from the top level tools. Could you rename all of the public functions so that they have unique names? For example, "wrapper" -> "scan_build_wrapper". This would greatly improve searchability!

tools/scan-build-py/libear/ear.c
407

It will be hard for the users to interpret these error messages; especially if the tool is widely distributed. (Can be addressed after the initial commit.)

tools/scan-build-py/libscanbuild/__init__.py
12

Not sure if this comment helps to understand the functionality, please, remove.

tools/scan-build-py/libscanbuild/analyze.py
56
tools/scan-build-py/libscanbuild/intercept.py
72

What does post_processing do? Looks like it allows to append to the compilation database and more.. Could you add a comment or rename the function?

288

Do you explain what the "filter" means somewhere visible to the user? When is filter useful and when it is not?

tools/scan-build-py/libscanbuild/report.py
7

"to generate the "cover" report" -> "to generate index.html for the report"

10

"a final HTML "cover" report" -> "index.html"

Hi Laszlo,

tools/scan-build-py/libear/ear.c
282

This is not the recommended way of interposing on Darwin. All you need to do is provide your function, which can call the function you are interposing on and tell the linker about it as described in this example:

http://www.opensource.apple.com/source/dyld/dyld-97.1/include/mach-o/dyld-interposing.h

You would only need to set DYLD_INSERT_LIBRARIES and would not need to require flat namespace. We should not be setting DYLD_FORCE_FLAT_NAMESPACE; some of the problems it causes are described on this thread:

https://mikeash.com/pyblog/friday-qa-2009-01-30-code-injection.html

compiler-rt (sanitizers) also intercept on all platforms including Windows (see interception.h); that code is very platform dependent.

rizsotto.mailinglist marked 7 inline comments as done.Dec 10 2015, 3:48 AM

thanks for the comments.

tools/scan-build-py/bin/analyze-c++
2

sorry, don't get the question

tools/scan-build-py/libscanbuild/runner.py
24

sure

tools/scan-build-py/libear/ear.c
282

okay, since i don't have access for OSX machines can you help me to make it right?

zaks.anna added inline comments.Dec 10 2015, 10:34 AM
tools/scan-build-py/bin/analyze-c++
2

Where/How is analyze-c++ used?

tools/scan-build-py/bin/analyze-cc
14

I believe this is an important point. Fixing this would greatly improve readability!

tools/scan-build-py/libear/ear.c
282

Ok. Will do.

rizsotto.mailinglist marked an inline comment as done.Dec 10 2015, 1:06 PM
rizsotto.mailinglist added inline comments.
tools/scan-build-py/bin/analyze-c++
2

this is the compiler wrapper which runs the real compiler + the static analyzer. (you guys were call it interposition mode) there is 'analyze-c++' and 'analyze-cc' for C++ and C compilers. it is used from the libscanbuild.analyze.main method.

dcoughlin added inline comments.Dec 10 2015, 1:13 PM
tools/scan-build-py/libear/ear.c
140

The issue I am seeing with library-interposition on OS X seems to be caused by eagerly stashing *_NSGetEnviron() in environ. This causes processes that modify their environment before execv'ing to drop the modified environments because interposition forwards execv to call_execve() with the stale environment.

To avoid this, you can replace all references to environ on Darwin with (*_NSGetEnviron()).

tools/scan-build-py/libear/ear.c
142

this call just sets up the environ variables for the bear_capture_env_t method, two lines down. that call saves the relevant variables... so, if before execv'ing somebody modify the variables we still have it saved. (by the way Scons is the tool which does this.) and when call_execve called it recovers from the saved environment.

there is a test against it. (tests/functional/cases/test_create_cdb.py:53)

The issue I am seeing with library-interposition on OS X [...]

can you give me details what did you see? what was the test you were running?

zaks.anna added inline comments.Dec 10 2015, 4:46 PM
tools/scan-build-py/README.md
87

This is very unfortunate!

We should call out that library interposition is "not supported on OS X (unless System Integrity Protection feature is turned off)" and return an error if people are trying to use it (and System Integrity Protection feature is turned on).

tools/scan-build-py/bin/analyze-build
18

Please rename 'main'.

tools/scan-build-py/bin/analyze-c++
4

I searched the code and did not see it being called. By looking back at the previous revision I see that that libscanbuild.analyze.main used to call 'analyze-cxx' not 'analyze-c++'. Looks like you've also fixed the same bug with 'intercept-c++'.

Is this something that could/would be caught by the tests?

tools/scan-build-py/bin/analyze-cc
15

Could you rename all of the public functions, not just this one? I am talking about the other wrapper and several main functions used as the entry points in the scripts.

tools/scan-build-py/bin/intercept-build
18

Please rename 'main'.

tools/scan-build-py/bin/intercept-c++
15

Please rename 'wrapper'.

tools/scan-build-py/bin/intercept-cc
15

Please rename 'wrapper'.

tools/scan-build-py/bin/scan-build
18

Please rename 'main'.

dcoughlin added inline comments.Dec 11 2015, 10:19 AM
tools/scan-build-py/README.md
86

My comment about SIP on 10.11 not preventing library interposition on build tools is not correct. My apologies for the bad information. SIP does not prevent DYLD_INSERT_LIBRARIES on the build tools themselves but it *does* prevent interposition on the build tool shims in /usr/bin. These are thin shims that call the appropriate tools elsewhere and it is these shims that typical build scripts call. SIP also prevents library interposition on perl/python/etc. -- which are commonly used to drive build scripts. (For example, openssl's build script uses make to call perl to call cc). I think on Darwin we should use your wrapper-based approach (intercept-c/intercept-c++) instead. In my tests on openssl it produces the same (modulo ordering of entries) compilation database as running with library-interposition on a machine with SIP turned off.

tools/scan-build-py/libear/ear.c
142

You also later use 'environ' in execv(). But at that point the environment stashed in environ is different than the environment given to children. So in execv() rather than using 'environ' you on Darwin you would need to use '*_NSGetEnviron()'. (The behavior of 'environ' on Darwin in dylibs is different than in main executables, which is why _NSGetEnviron() exists). In any event, if only support using wrapper-based interposition (intercept-build-cc, etc.) on Darwin then perhaps we don't need to deal with this.

there is a test against it. (tests/functional/cases/test_create_cdb.py:53)

Interesting. Have you run the tests on OS X? Can you tell me how to run them?

tools/scan-build-py/libscanbuild/intercept.py
144

I think it would be better to use the fall-back compiler-wrapper interposition on darwin. As I noted above (apologies again for the bad information) System Integrity Protection will make library-based interposition unreliable for build scripts that employ restricted executables.

rizsotto.mailinglist marked 6 inline comments as done.Dec 11 2015, 3:00 PM
rizsotto.mailinglist added inline comments.
tools/scan-build-py/README.md
87

comment changed. please hint me with some code how can the script detect SIP status. (or postpone the check implementation after the patch is accepted)

tools/scan-build-py/bin/analyze-c++
4

test harness contains only C files, that was the reason this could have gone unnoticed. added specific check for it.

previous upload is broken, sorry for that

tools/scan-build-py/libear/ear.c
143

to run the full test set

PATH=$(pwd)/bin:$PATH python -m unittest discover

to run that specific test

PATH=$(pwd)/bin:$PATH python -m unittest -v tests.functional.cases.test_create_cdb.CompilationDatabaseTest.test_successful_build_on_empty_env

to more about run tests

https://docs.python.org/2/library/unittest.html

143

my understanding on the _NSGetEnviron is, that it shall be used when the library is during the load process. later when the build process calls execv the load process is over, and environ variable is available. an earlier version of this code had a get_environ method, which were either return the environ variable or called the _NSGetEnviron. then i made this change and the tests were still passing, so i don't see where your issue is coming from. please tell me what kind of test you run against it to find it as problem. would like to add it to the test suite.

dcoughlin added inline comments.Dec 12 2015, 1:27 PM
tools/scan-build-py/libear/ear.c
143

Aaah, I had an ancient libscanbuild in my global site-packages, which seemed to cause unittest discover to not work.

When running the above on your latest patch on a machine without SIP I get.

test_successful_build_on_empty_env (tests.functional.cases.test_create_cdb.CompilationDatabaseTest) ... ERROR

======================================================================
ERROR: test_successful_build_on_empty_env (tests.functional.cases.test_create_cdb.CompilationDatabaseTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/functional/cases/test_create_cdb.py", line 58, in test_successful_build_on_empty_env
    'env', '-'] + make)
  File "tests/functional/cases/__init__.py", line 38, in silent_check_call
    return subprocess.check_call(cmd, *args, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['intercept-build', '--cdb', '/var/folders/mq/46lsd1kx65v5702dzrkgzdlr0000gn/T/scan-build-test-cKXza1/cdb.json', 'env', '-', 'make', 'SRCDIR=tests/functional/src', 'OBJDIR=/var/folders/mq/46lsd1kx65v5702dzrkgzdlr0000gn/T/scan-build-test-cKXza1', '-f', 'tests/functional/src/build/Makefile', 'CC=clang', 'build_regular']' returned non-zero exit status 2

----------------------------------------------------------------------
Ran 1 test in 0.554s

FAILED (errors=1)

This goes away if you use my suggested *_NSGetEnviron() fix above.
After applying that fix, all but 6 test past on OS X without SIP. The remaining 6 failures are due to your use of mknod() in tests, which requires superuser privileges on OS X:

======================================================================
ERROR: test_interposition_cxx_works (tests.functional.cases.test_from_cmd.RunAnalyzerTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Volumes/Data/Clangs/OpenSourceGit/clang/tools/scan-build-py/tests/functional/cases/test_from_cmd.py", line 102, in test_interposition_cxx_works
    self.compile_empty_source_file(tmpdir, True))
  File "/Volumes/Data/Clangs/OpenSourceGit/clang/tools/scan-build-py/tests/functional/cases/test_from_cmd.py", line 87, in compile_empty_source_file
    os.mknod(src_file)
OSError: [Errno 1] Operation not permitted

Is there a more portable way to create an empty file? Perhaps open a file for writing and then closing it?

With SIP I see different failures:

workzilla:scan-build-py dcoughlin$ PATH=$(pwd)/bin:$PATH python -m unittest -v tests.functional.cases.test_create_cdb.CompilationDatabaseTest.test_successful_build_on_empty_env
test_successful_build_on_empty_env (tests.functional.cases.test_create_cdb.CompilationDatabaseTest) ... FAIL

======================================================================
FAIL: test_successful_build_on_empty_env (tests.functional.cases.test_create_cdb.CompilationDatabaseTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/functional/cases/test_create_cdb.py", line 60, in test_successful_build_on_empty_env
    self.assertEqual(5, self.count_entries(result))
AssertionError: 5 != 0

----------------------------------------------------------------------
Ran 1 test in 1.069s

FAILED (failures=1)

Running the entire test suite with SIP yields:

FFFFF...EEFFE..................FEEF.EE...............................................
...
Ran 85 tests in 15.688s

FAILED (failures=9, errors=7)

I tried changing intercept.py to always use compiler wrappers on Darwin, but some tests were still failing (tests.functional.cases.test_exec_anatomy.ExecAnatomyTest, tests.functional.cases.test_create_cdb.CompilationDatabaseTest). Do you expect these tests to pass with compiler wrappers? Do they pass for you with compiler wrappers? Or is this some Darwin-specific issue?

kimgr added a subscriber: kimgr.Dec 13 2015, 12:06 AM

Comment at: tools/scan-build-py/bin/analyze-cc:14
@@ +13,2 @@
+from libscanbuild.analyze import wrapper

+sys.exit(wrapper(False))

It is hard to figure out/search which functions actually get called from the top level tools. Could you rename all of the public functions so that they
have unique names? For example, "wrapper" -> "scan_build_wrapper". This would greatly improve searchability!

A nice pattern in Python is to avoid importing individual
functions/classes, and instead use the module name as a namespace,
e.g.

from libscanbuild import analyze
sys.exit(analyze.wrapper(False))

I don't know if that addresses your concern? I haven't looked at the
code in more detail.

  • Kim
tools/scan-build-py/libear/ear.c
143

thanks Devin, these are interesting findings. however i managed to access to an OS X machine today, have no chance to turn on-and-off the SIP (it was most probably off). and it passed all the tests without any change.

$ uname -v
Darwin Kernel Version 13.4.0: Wed Mar 18 16:20:14 PDT 2015; root:xnu-2422.115.14~1/RELEASE_X86_64

anyway, i did replaced the mknod call. and also implemented a way to call _NSGetEnviron rather than access to environment variable. (tests are still passing on this machine.)

and the ExecAnatomyTest is about to capture the different exec calls by the libear library. would not make sense with compiler wrappers.

Thanks for adding bear_get_environment() to handle the environment weirdness on Darwin.

tools/scan-build-py/libscanbuild/intercept.py
147

Can you change this to default to compiler-wrapper interposition on Darwin with a command-line flag to force library-based interposition?

Library-based interposition will fail silently if SIP is enabled, so this should be detected when that flag is passed so the user is informed. You can detect whether SIP is enabled on Darwin by checking whether (1) there is a binary called 'csrutil' in the path and, if so, (2) whether the output of executing 'csrutil status' contains 'System Integrity Protection status: enabled'.

rgov added a subscriber: rgov.Dec 20 2015, 12:13 AM

I tried out intercept-build and found it very easy to use. I tried running it on projects that compile with both make and xcodebuild and it worked for both. Thanks for contributing this tool.

sorry for the delay, hard to get free time these days. ;)

tools/scan-build-py/libscanbuild/intercept.py
147

ok, implemented the SIP check. (also added SELinux check, which behaves the same as SIP.)

about the default behavior: scan-build command has the other compiler-wrapper (run compiler & analyzer) and flags can turn this module on. intercept-build default is the library-based and flag can turn the compiler-wrapper interposition on... i would keep it this way. (and also not make platform dependent switches into the code.)

Hi Laszlo,

I've run scan-build-py with both environment-variable-based and library-based interposition on our open source benchmark suite and it looks like it is in great shape on Darwin! There are still some remaining issues with xcodebuild, but I will help fix these after scan-build-py is committed.

Please apply for commit access (the instructions are at http://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access), update the commit message, and commit.

Thanks for all your hard work on this!

tools/scan-build-py/libscanbuild/intercept.py
262

Great! Thanks for checking for this.

tools/scan-build-py/libscanbuild/runner.py
43

Thanks for documenting these.

dcoughlin accepted this revision.Jan 6 2016, 5:12 PM
dcoughlin added a reviewer: dcoughlin.

I like the standalone bear tool. should it really be embedded into clang? Will the standalone tool still be maintained?

tools/scan-build-py/libear/ear.c
2

why should these be added in these source files when no other file in LLVM/Clang has it.

Some LLVM/Clang coding practices review comments.

tools/scan-build-py/tests/functional/exec/main.c
116

You are assigning a const literal to a non-const pointer which is dangerous. It seems to me you should use 'const char *' instead. personally I like 'const char * const' better but as far as I know that is against normal coding practices in LLVM/Clang.

123

I would suggest that you remove the semicolon here and add it in the macro instead. the macro usage will be more "function-like" then.

129

according to llvm naming conventions, variable names start with uppercase => File, Compiler, Argv, Envp

134

according to llvm naming conventions, expectedOut and createSource would be better names

zaks.anna accepted this revision.Jan 11 2016, 5:43 PM
zaks.anna edited edge metadata.

Laszlo,

I am very excited about having the new and much improved scan-build in tree! It will serve as a solid foundation for moving forward.

Thank you for all your hard work!
Anna.

Thank you guys for your review and help! The latest patch is commited. Let's do the next steps too. ;)

@danielmarjamaki the standalone Bear will be maintained. and your comments will also be considered.

dcoughlin closed this revision.Jan 12 2016, 4:25 PM

This was committed in r257533. Thanks Laszlo!