Page MenuHomePhabricator

Add scan-build python implementation
ClosedPublic

Authored by rizsotto.mailinglist on May 8 2015, 4:25 AM.

Details

Summary

it was mentioned on the static analyzer's open project page, there is a desire to rewrite the existing Perl implementation to Python. beside that there were also a point "Do a better job interposing on a compilation" which I did prototype and test a solution.

here are the major characteristic of the rewrite:

  • it works/tested on python 2.7, 3.x,
  • it has no dependencies other than standard python modules,
  • it differs from the current perl implementation in functionality,
    • does not generate not-used-attribute files, since analyzer does not report it;
    • does not generate parser reject report, since analyzer does not report it;
    • does not check '-isysroot' uniqness;
    • does execute 'clang' binary only if it needed (one time less than current perl);
    • does not copy pre-compiled header (.ghc) files into report directory;
    • does create compilation database during the analysis;
    • does cover build systems which do not respect 'CC' or 'CXX' environment variables;
    • analyzer is run only after the compilation database were created (not during the build);
    • does filter requested sub-directories from the final report;
    • the invocation of the script changed a little (tried to minimize the change as much as i could)
  • it was tested on Linux, FreeBSD and OS X,
  • it has functional and unit tests around,
  • it is a self contained python package,
    • it's decomposed into multiple modules;
  • it has more documentation inside than the original implementation.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

update to the latest comments. notable changes:

  • executables have distinct function now
  • modules are renamed to fit with the executable names
  • interposition of analyzer execution is merged into a single executable
  • documentation update
  • more tests were run against:
    • tested independent python package installation
    • tested on OSX and FreeBSD too
    • tested against different build systems
    • tests were added to check different interception modes

Thanks Laszlo.

I'm still not convinced that all the python package stuff is needed because scan-build is distributed with clang (see my question inline).

Also, what do you think about renaming intercept-build to "log-build" or some of the other alternatives I proposed above? I think it is important for the name of the executable to communicate its purpose.

tools/scan-build-py/MANIFEST.in
2 ↗(On Diff #41556)

I see that at https://pypi.python.org/pypi/scan-build it lists clang as a prerequisite. But since scan-build-py will now be distributed as part of clang I don't understand what the point of the standalone tool is. Can you explain why this is needed?

tools/scan-build-py/libscanbuild/analyze.py
132

Typo: 'intrepose'

tools/scan-build-py/libscanbuild/driver.py
67 ↗(On Diff #40276)

Maybe this should ask the user to file a bug against scan-build? Can it point to the bugzilla and tell the user what information, files, etc. they should attach to the bug report to help us reproduce the problem? Just saying "Something unexpected happened" is not as user-friendly as it could be because it does not tell the user that the problem is not their fault nor what they should do about it.

tools/scan-build-py/libscanbuild/runner.py
22

What I was asking for here is documentation about the 'opts' parameter. Where does 'opts' come from? Is it json input from a compilation database? What keys should it contain, what is meaning of their values, etc.

It is important to at least document the expected structure of the dictionary. This documentation will help people understand your design and make sure they don't accidentally break some invariant when fixing bugs or adding features in the future.

jroelofs added inline comments.Dec 5 2015, 8:42 AM
tools/scan-build-py/libscanbuild/driver.py
67 ↗(On Diff #40276)

Oh, I don't mean to handle clang crashes... I mean for handling python crashes. I meant to draw an analogy to a feature that clang has: when it crashes, it prints a stack trace, and some steps to reproduce the problem. I was wondering if a similar thing existed in python, for when a python app crashes.

rizsotto.mailinglist marked an inline comment as done.Dec 9 2015, 8:07 AM

Also, what do you think about renaming intercept-build to "log-build" or some of the other alternatives I proposed above? I think it is important for the name of the executable to communicate its purpose.

i do think differently. :) first, i think you can rename executables to whatever you like. previously i made this functionality in a tool called Bear. within 4 years got more than 100 bug reports, improvement ideas, but nobody did care about the executable name. second, i'm not a native speaker, but to me 'intercept compiler calls from build' can be shortened to 'intercept-build'. third, if i rename the executable from intercept-build, shall rename the intercept module too? and to be honest, the executable name scan-build just as bad as intercept-build. :) if i did not convinced you, give me a name and i will rename to it.

tools/scan-build-py/MANIFEST.in
2 ↗(On Diff #41556)

thought to have more distribution channel is better. but if that so confusing, i've deleted it.

tools/scan-build-py/libscanbuild/driver.py
67 ↗(On Diff #40276)

good point. added some implementation. if you don't like the text, please suggest one instead and will replace.

tools/scan-build-py/libscanbuild/runner.py
23

the comment says exactly the value coming from the compilation database. anyway added more explanations. if you have something in mind, please write the text and will copy it.

Also, what do you think about renaming intercept-build to "log-build" or some of the other alternatives I proposed above? I think it is important for the name of the executable to communicate its purpose.

... and to be honest, the executable name scan-build just as bad as intercept-build. :)

There are many things about the original scan-build that are bad. As you have seen, the perl-based scan-build is poorly named, poorly factored, insufficiently documented and poorly tested. This has made it very difficult to fix bugs, support new platforms, and extend scan-build's functionality -- and is exactly the reason why we are so excited about a scan-build reimplementation. With this reimplementation, we have the opportunity to do so much better! If we live up to the high standards in the rest of the clang/llvm codebase about naming, factoring, documentation, and testing then the new scan-build will be much easier to maintain and can serve as a solid foundation to add new features.

tools/scan-build-py/README.md
85

With respect to dynamic library-interposition still not working on OS X, the System Integrity Protection in OS X 10.11 should not prevent interposition on build tools so in theory intercept-build should work. I'll look into it.

tools/scan-build-py/libscanbuild/runner.py
24

Ok. If I create a patch with additional documentation for these fields, would you be willing to take a look at it to make sure the comments are correct?

Hi Laszlo,

Here are some comments from me.

Should we be worried about the name conflicts (between old scan-build and this tool) during rollout? I think it would be beneficial to rename the tools, but let's discuss the names later. (If we integrate Codecheck, that will affect naming too.)

The libear library should be built as part of clang cmake build; looks like @jroelofs offered to do this.
We should also integrate your tests into lit, which is the way we test all of the other tools in clang.

tools/scan-build-py/README.md
2

What's this? Please, remove.

7

I do not think you need to mention the old scan-build. I'd just describe the tool. Here is a slightly modified copy and paste from scan-build:

"A package designed to wrap a build so that all calls to gcc/clang are intercepted and logged into a compilation database [2] and/or piped to the clang static analyzer. Includes intercept-build tool, which logs the build, as well as scan-build tool, which logs the build and runs the clang static analyzer on it."

I have a bunch of comments about other parts of the documentation. However, I can rewrite parts of it as a separate commit. (It will be more efficient than going back and forth.)

60

This is a good place to point out how each mode can be activated. (Which parameters should be used.)

86

Would be good to find out which specific binaries used by the build we are not allowed to interpose on. It would be very unfortunate if this does not work with the System Integrity Protection feature, which is turned on by default.

115

Please, refer to the LICENSE.TXT file like you do in the other files.

tools/scan-build-py/bin/analyze-c++
2

What calls this script?

tools/scan-build-py/bin/analyze-cc
15

It is hard to figure out/search which functions actually get called from the top level tools. Could you rename all of the public functions so that they have unique names? For example, "wrapper" -> "scan_build_wrapper". This would greatly improve searchability!

tools/scan-build-py/libear/ear.c
407

It will be hard for the users to interpret these error messages; especially if the tool is widely distributed. (Can be addressed after the initial commit.)

tools/scan-build-py/libscanbuild/__init__.py
12

Not sure if this comment helps to understand the functionality, please, remove.

tools/scan-build-py/libscanbuild/analyze.py
56
tools/scan-build-py/libscanbuild/intercept.py
72

What does post_processing do? Looks like it allows to append to the compilation database and more.. Could you add a comment or rename the function?

288

Do you explain what the "filter" means somewhere visible to the user? When is filter useful and when it is not?

tools/scan-build-py/libscanbuild/report.py
7

"to generate the "cover" report" -> "to generate index.html for the report"

10

"a final HTML "cover" report" -> "index.html"

Hi Laszlo,

tools/scan-build-py/libear/ear.c
282

This is not the recommended way of interposing on Darwin. All you need to do is provide your function, which can call the function you are interposing on and tell the linker about it as described in this example:

http://www.opensource.apple.com/source/dyld/dyld-97.1/include/mach-o/dyld-interposing.h

You would only need to set DYLD_INSERT_LIBRARIES and would not need to require flat namespace. We should not be setting DYLD_FORCE_FLAT_NAMESPACE; some of the problems it causes are described on this thread:

https://mikeash.com/pyblog/friday-qa-2009-01-30-code-injection.html

compiler-rt (sanitizers) also intercept on all platforms including Windows (see interception.h); that code is very platform dependent.

rizsotto.mailinglist marked 7 inline comments as done.Dec 10 2015, 3:48 AM

thanks for the comments.

tools/scan-build-py/bin/analyze-c++
2

sorry, don't get the question

tools/scan-build-py/libscanbuild/runner.py
24

sure

tools/scan-build-py/libear/ear.c
283

okay, since i don't have access for OSX machines can you help me to make it right?

zaks.anna added inline comments.Dec 10 2015, 10:34 AM
tools/scan-build-py/bin/analyze-c++
3

Where/How is analyze-c++ used?

tools/scan-build-py/bin/analyze-cc
15

I believe this is an important point. Fixing this would greatly improve readability!

tools/scan-build-py/libear/ear.c
283

Ok. Will do.

rizsotto.mailinglist marked an inline comment as done.Dec 10 2015, 1:06 PM
rizsotto.mailinglist added inline comments.
tools/scan-build-py/bin/analyze-c++
3

this is the compiler wrapper which runs the real compiler + the static analyzer. (you guys were call it interposition mode) there is 'analyze-c++' and 'analyze-cc' for C++ and C compilers. it is used from the libscanbuild.analyze.main method.

dcoughlin added inline comments.Dec 10 2015, 1:13 PM
tools/scan-build-py/libear/ear.c
141

The issue I am seeing with library-interposition on OS X seems to be caused by eagerly stashing *_NSGetEnviron() in environ. This causes processes that modify their environment before execv'ing to drop the modified environments because interposition forwards execv to call_execve() with the stale environment.

To avoid this, you can replace all references to environ on Darwin with (*_NSGetEnviron()).

tools/scan-build-py/libear/ear.c
142

this call just sets up the environ variables for the bear_capture_env_t method, two lines down. that call saves the relevant variables... so, if before execv'ing somebody modify the variables we still have it saved. (by the way Scons is the tool which does this.) and when call_execve called it recovers from the saved environment.

there is a test against it. (tests/functional/cases/test_create_cdb.py:53)

The issue I am seeing with library-interposition on OS X [...]

can you give me details what did you see? what was the test you were running?

zaks.anna added inline comments.Dec 10 2015, 4:46 PM
tools/scan-build-py/README.md
87

This is very unfortunate!

We should call out that library interposition is "not supported on OS X (unless System Integrity Protection feature is turned off)" and return an error if people are trying to use it (and System Integrity Protection feature is turned on).

tools/scan-build-py/bin/analyze-build
18

Please rename 'main'.

tools/scan-build-py/bin/analyze-c++
4

I searched the code and did not see it being called. By looking back at the previous revision I see that that libscanbuild.analyze.main used to call 'analyze-cxx' not 'analyze-c++'. Looks like you've also fixed the same bug with 'intercept-c++'.

Is this something that could/would be caught by the tests?

tools/scan-build-py/bin/analyze-cc
15

Could you rename all of the public functions, not just this one? I am talking about the other wrapper and several main functions used as the entry points in the scripts.

tools/scan-build-py/bin/intercept-build
18

Please rename 'main'.

tools/scan-build-py/bin/intercept-c++
15

Please rename 'wrapper'.

tools/scan-build-py/bin/intercept-cc
15

Please rename 'wrapper'.

tools/scan-build-py/bin/scan-build
18

Please rename 'main'.

dcoughlin added inline comments.Dec 11 2015, 10:19 AM
tools/scan-build-py/README.md
86

My comment about SIP on 10.11 not preventing library interposition on build tools is not correct. My apologies for the bad information. SIP does not prevent DYLD_INSERT_LIBRARIES on the build tools themselves but it *does* prevent interposition on the build tool shims in /usr/bin. These are thin shims that call the appropriate tools elsewhere and it is these shims that typical build scripts call. SIP also prevents library interposition on perl/python/etc. -- which are commonly used to drive build scripts. (For example, openssl's build script uses make to call perl to call cc). I think on Darwin we should use your wrapper-based approach (intercept-c/intercept-c++) instead. In my tests on openssl it produces the same (modulo ordering of entries) compilation database as running with library-interposition on a machine with SIP turned off.

tools/scan-build-py/libear/ear.c
142

You also later use 'environ' in execv(). But at that point the environment stashed in environ is different than the environment given to children. So in execv() rather than using 'environ' you on Darwin you would need to use '*_NSGetEnviron()'. (The behavior of 'environ' on Darwin in dylibs is different than in main executables, which is why _NSGetEnviron() exists). In any event, if only support using wrapper-based interposition (intercept-build-cc, etc.) on Darwin then perhaps we don't need to deal with this.

there is a test against it. (tests/functional/cases/test_create_cdb.py:53)

Interesting. Have you run the tests on OS X? Can you tell me how to run them?

tools/scan-build-py/libscanbuild/intercept.py
144

I think it would be better to use the fall-back compiler-wrapper interposition on darwin. As I noted above (apologies again for the bad information) System Integrity Protection will make library-based interposition unreliable for build scripts that employ restricted executables.

rizsotto.mailinglist marked 6 inline comments as done.Dec 11 2015, 3:00 PM
rizsotto.mailinglist added inline comments.
tools/scan-build-py/README.md
87

comment changed. please hint me with some code how can the script detect SIP status. (or postpone the check implementation after the patch is accepted)

tools/scan-build-py/bin/analyze-c++
4

test harness contains only C files, that was the reason this could have gone unnoticed. added specific check for it.

previous upload is broken, sorry for that

tools/scan-build-py/libear/ear.c
143

to run the full test set

PATH=$(pwd)/bin:$PATH python -m unittest discover

to run that specific test

PATH=$(pwd)/bin:$PATH python -m unittest -v tests.functional.cases.test_create_cdb.CompilationDatabaseTest.test_successful_build_on_empty_env

to more about run tests

https://docs.python.org/2/library/unittest.html

143

my understanding on the _NSGetEnviron is, that it shall be used when the library is during the load process. later when the build process calls execv the load process is over, and environ variable is available. an earlier version of this code had a get_environ method, which were either return the environ variable or called the _NSGetEnviron. then i made this change and the tests were still passing, so i don't see where your issue is coming from. please tell me what kind of test you run against it to find it as problem. would like to add it to the test suite.

dcoughlin added inline comments.Dec 12 2015, 1:27 PM
tools/scan-build-py/libear/ear.c
143

Aaah, I had an ancient libscanbuild in my global site-packages, which seemed to cause unittest discover to not work.

When running the above on your latest patch on a machine without SIP I get.

test_successful_build_on_empty_env (tests.functional.cases.test_create_cdb.CompilationDatabaseTest) ... ERROR

======================================================================
ERROR: test_successful_build_on_empty_env (tests.functional.cases.test_create_cdb.CompilationDatabaseTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/functional/cases/test_create_cdb.py", line 58, in test_successful_build_on_empty_env
    'env', '-'] + make)
  File "tests/functional/cases/__init__.py", line 38, in silent_check_call
    return subprocess.check_call(cmd, *args, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['intercept-build', '--cdb', '/var/folders/mq/46lsd1kx65v5702dzrkgzdlr0000gn/T/scan-build-test-cKXza1/cdb.json', 'env', '-', 'make', 'SRCDIR=tests/functional/src', 'OBJDIR=/var/folders/mq/46lsd1kx65v5702dzrkgzdlr0000gn/T/scan-build-test-cKXza1', '-f', 'tests/functional/src/build/Makefile', 'CC=clang', 'build_regular']' returned non-zero exit status 2

----------------------------------------------------------------------
Ran 1 test in 0.554s

FAILED (errors=1)

This goes away if you use my suggested *_NSGetEnviron() fix above.
After applying that fix, all but 6 test past on OS X without SIP. The remaining 6 failures are due to your use of mknod() in tests, which requires superuser privileges on OS X:

======================================================================
ERROR: test_interposition_cxx_works (tests.functional.cases.test_from_cmd.RunAnalyzerTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Volumes/Data/Clangs/OpenSourceGit/clang/tools/scan-build-py/tests/functional/cases/test_from_cmd.py", line 102, in test_interposition_cxx_works
    self.compile_empty_source_file(tmpdir, True))
  File "/Volumes/Data/Clangs/OpenSourceGit/clang/tools/scan-build-py/tests/functional/cases/test_from_cmd.py", line 87, in compile_empty_source_file
    os.mknod(src_file)
OSError: [Errno 1] Operation not permitted

Is there a more portable way to create an empty file? Perhaps open a file for writing and then closing it?

With SIP I see different failures:

workzilla:scan-build-py dcoughlin$ PATH=$(pwd)/bin:$PATH python -m unittest -v tests.functional.cases.test_create_cdb.CompilationDatabaseTest.test_successful_build_on_empty_env
test_successful_build_on_empty_env (tests.functional.cases.test_create_cdb.CompilationDatabaseTest) ... FAIL

======================================================================
FAIL: test_successful_build_on_empty_env (tests.functional.cases.test_create_cdb.CompilationDatabaseTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/functional/cases/test_create_cdb.py", line 60, in test_successful_build_on_empty_env
    self.assertEqual(5, self.count_entries(result))
AssertionError: 5 != 0

----------------------------------------------------------------------
Ran 1 test in 1.069s

FAILED (failures=1)

Running the entire test suite with SIP yields:

FFFFF...EEFFE..................FEEF.EE...............................................
...
Ran 85 tests in 15.688s

FAILED (failures=9, errors=7)

I tried changing intercept.py to always use compiler wrappers on Darwin, but some tests were still failing (tests.functional.cases.test_exec_anatomy.ExecAnatomyTest, tests.functional.cases.test_create_cdb.CompilationDatabaseTest). Do you expect these tests to pass with compiler wrappers? Do they pass for you with compiler wrappers? Or is this some Darwin-specific issue?

kimgr added a subscriber: kimgr.Dec 13 2015, 12:06 AM

Comment at: tools/scan-build-py/bin/analyze-cc:14
@@ +13,2 @@
+from libscanbuild.analyze import wrapper

+sys.exit(wrapper(False))

It is hard to figure out/search which functions actually get called from the top level tools. Could you rename all of the public functions so that they
have unique names? For example, "wrapper" -> "scan_build_wrapper". This would greatly improve searchability!

A nice pattern in Python is to avoid importing individual
functions/classes, and instead use the module name as a namespace,
e.g.

from libscanbuild import analyze
sys.exit(analyze.wrapper(False))

I don't know if that addresses your concern? I haven't looked at the
code in more detail.

  • Kim
tools/scan-build-py/libear/ear.c
143

thanks Devin, these are interesting findings. however i managed to access to an OS X machine today, have no chance to turn on-and-off the SIP (it was most probably off). and it passed all the tests without any change.

$ uname -v
Darwin Kernel Version 13.4.0: Wed Mar 18 16:20:14 PDT 2015; root:xnu-2422.115.14~1/RELEASE_X86_64

anyway, i did replaced the mknod call. and also implemented a way to call _NSGetEnviron rather than access to environment variable. (tests are still passing on this machine.)

and the ExecAnatomyTest is about to capture the different exec calls by the libear library. would not make sense with compiler wrappers.

Thanks for adding bear_get_environment() to handle the environment weirdness on Darwin.

tools/scan-build-py/libscanbuild/intercept.py
147

Can you change this to default to compiler-wrapper interposition on Darwin with a command-line flag to force library-based interposition?

Library-based interposition will fail silently if SIP is enabled, so this should be detected when that flag is passed so the user is informed. You can detect whether SIP is enabled on Darwin by checking whether (1) there is a binary called 'csrutil' in the path and, if so, (2) whether the output of executing 'csrutil status' contains 'System Integrity Protection status: enabled'.

rgov added a subscriber: rgov.Dec 20 2015, 12:13 AM

I tried out intercept-build and found it very easy to use. I tried running it on projects that compile with both make and xcodebuild and it worked for both. Thanks for contributing this tool.

sorry for the delay, hard to get free time these days. ;)

tools/scan-build-py/libscanbuild/intercept.py
147

ok, implemented the SIP check. (also added SELinux check, which behaves the same as SIP.)

about the default behavior: scan-build command has the other compiler-wrapper (run compiler & analyzer) and flags can turn this module on. intercept-build default is the library-based and flag can turn the compiler-wrapper interposition on... i would keep it this way. (and also not make platform dependent switches into the code.)

Hi Laszlo,

I've run scan-build-py with both environment-variable-based and library-based interposition on our open source benchmark suite and it looks like it is in great shape on Darwin! There are still some remaining issues with xcodebuild, but I will help fix these after scan-build-py is committed.

Please apply for commit access (the instructions are at http://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access), update the commit message, and commit.

Thanks for all your hard work on this!

tools/scan-build-py/libscanbuild/intercept.py
261

Great! Thanks for checking for this.

tools/scan-build-py/libscanbuild/runner.py
42

Thanks for documenting these.

dcoughlin accepted this revision.Jan 6 2016, 5:12 PM
dcoughlin added a reviewer: dcoughlin.

I like the standalone bear tool. should it really be embedded into clang? Will the standalone tool still be maintained?

tools/scan-build-py/libear/ear.c
1

why should these be added in these source files when no other file in LLVM/Clang has it.

Some LLVM/Clang coding practices review comments.

tools/scan-build-py/tests/functional/exec/main.c
115

You are assigning a const literal to a non-const pointer which is dangerous. It seems to me you should use 'const char *' instead. personally I like 'const char * const' better but as far as I know that is against normal coding practices in LLVM/Clang.

122

I would suggest that you remove the semicolon here and add it in the macro instead. the macro usage will be more "function-like" then.

128

according to llvm naming conventions, variable names start with uppercase => File, Compiler, Argv, Envp

133

according to llvm naming conventions, expectedOut and createSource would be better names

zaks.anna accepted this revision.Jan 11 2016, 5:43 PM
zaks.anna edited edge metadata.

Laszlo,

I am very excited about having the new and much improved scan-build in tree! It will serve as a solid foundation for moving forward.

Thank you for all your hard work!
Anna.

Thank you guys for your review and help! The latest patch is commited. Let's do the next steps too. ;)

@danielmarjamaki the standalone Bear will be maintained. and your comments will also be considered.

dcoughlin closed this revision.Jan 12 2016, 4:25 PM

This was committed in r257533. Thanks Laszlo!