Page MenuHomePhabricator

[scan-build-py] compilation module rewrite

Authored by rizsotto.mailinglist on Mar 25 2017, 2:22 AM.



It's a chunk from D26390

Represent compilation as object, which can be hashed or checked for equality. Also responsible to create different views (mainly the compilation database entry view) and for the construction from different sources (from compilation database, or from command execution).

(Will provide usage of the new methods in next PR, but that would have been a much bigger change.)

Diff Detail

Event Timeline

JonasToth added inline comments.

This sounds not like english. Remove for ?


Thanks Jonas, will fix it.

Any other comment guys? Would like to merge this week and continue with other PRs which depends on this.

dcoughlin edited edge metadata.Apr 5 2017, 8:38 PM

Can you explain why you need structural equality ("two Commands are the same if their fields have the same values") rather than reference equality ("two Commands are the same if they have the same address")?


This seems fragile. How are you going to test to make sure your hash and equality functions stay in sync. (It would be easy to add a property to one and not the other).

Can you the result of 'as_dict' be used for hashing and equality instead? Are there attributes that you don't want to contribute to object identity?

I want to store Compilation in set to filter out duplicates. (That requires __hash__ and (__eq__ or __cmp__)) New objects will be created from files, so structural equality is needed. (No chance that reference equality would be enough.)
Thanks for the review, will upload a new diff soon.


Ok, will try with the as_dict method for equality. That seems to be a good option, since there is no attribute which would not contribute.

Fixed review comments.

george.karpenkov requested changes to this revision.Jun 19 2017, 1:35 PM

Hi @rizsotto, sorry for a long turnaround time. Overall I like this diff! I've left a number of comments on minor coding issues, and the grammar used in comments (sorry for those, but otherwise they are often hard to parse).
Could you write a few sentences clarifying the big picture here? E.g. what is the overall benefit of representing the compilation task as a comparable and hashable object? Does it mean that writing unit-tests is easier?

I see that you want to put usages into the next PR, but without those it is somewhat hard to review the code. Maybe you could show the next PR as well?



"This module is responsible for parsing a compiler invocation"?


I think "Map of ignored compiler option for the creation of a compilation database" is easier to parse.


Again "This map is used in _split_command method, which classifies the parameters and ignores the selected ones" seems easier to read.


"Please note that other parameters might be ignored as well"


"Option names are mapped to the number of following arguments which should be skipped"


Full stop after comments to be consistent with others.


Full stop after comment.


full stop after comment


Impressive list! Does it have a standard source?


Why use a set if we never do membership queries? I think a tuple would work much better, ditto for COMPILER_PATTERNS_CC


I think Devin has suggested to use as_dict here as well (e.g. hash(str(as_dict()))).
Depending on how often hashing is used, string construction might be very slow, so a lesser evil might be keeping those in sync manually.


Overall comment: rather then using strings 'c' and 'c++' inline everywhere, can we define appropriate globals (e.g. C_COMPILER = 'c') and use those instead?


"From a compilation database entry"


Great job on using Sphinx comments!
However, here an elsewhere, could we also specify the type of params explicitly using :type? That would make methods much easier to read, as often it's hard to figure out the expected type of the parameter just from looking at the method and without knowing the invocations.


If you are using class methods, it is usually better to use the @classmethod decorator.
Then you get cls as a first argument, and can use cls._split_command instead.

This is better than using Compilation._split_command, as it is more resilient to renames, which can be very painful in Python.


If iter_from_execution method is only used here, I think it would make more sense to inline it to from_db_entry.
Otherwise it seems very strange that we have a generic method supporting a number of sources per a compile command, yet it is only used in one place where the number of sources is asserted to be zero.


As per the previous commend, please change to @classmethod.


As per the previous commend, please change to @classmethod.


"whether the command is a compiler call"


Why not just return COMPILER_PATTERN_WRAPPER.match(cmd)?


I would change the comment to simply list is not empty, since you can always use slice indexing.


"Additionally, a wrapper can wrap another wrapper"


Why the compiler is set to 'c' if wrapper is used?


Please specify :type explicitly for command, cc, and cxx


IIRC, Python logging does not automatically print the parameter name of where the logging command was invoked.
Thus just from the message input was: it might be hard to decode where the message is actually coming from.
I think a more verbose prefix would be better.



  1. Please take regexp compilation out of a loop and into a global.
  2. What is the source for this list of skippable options? The perl version?

"look like a filename"


Again, better to take regexp into a global.


again, more fine-grained logging message could be more useful.


CompilationDatabase is not used or tested in this pull request.


Do we want to append to a database, or would it make more sense to override it (-w)?


"Returns 'c' or 'c++' on match" or "when matches".


Why is the parameter called force if the classify_source parameter is called c_compiler?

This revision now requires changes to proceed.Jun 19 2017, 1:35 PM

@george.karpenkov thanks for your comments. I totally lost interest to continue this job, due to lack of feedback. Will use your catches, suggestions on my upstream repository. Thanks again!

Hi @rizsotto, of course it's up to you, but if you still wish to continue, I am happy to review your patches, and I have time for doing so.
Apologies for the lack of feedback before, it happened as everyone was completely booked out.

@rizsotto I think that if the goal is a more testable and reliable wrapper system in Python, than your contributions are very valuable.