This is an archive of the discontinued LLVM Phabricator instance.

Differential D31365

[scan-build-py] compilation module rewrite
Abandoned (Public)

Authored by rizsotto.mailinglist on Mar 25 2017, 2:22 AM.

Details

Reviewers

dcoughlin
jroelofs
akhiljaggarwal
zaks.anna
george.karpenkov

Summary

It's a chunk from D26390

Represent a compilation as an object which can be hashed and checked for equality. The object is also responsible for creating different views (mainly the compilation database entry view) and for construction from different sources (from a compilation database, or from a command execution).

(Usage of the new methods will come in the next PR; including it here would have made this a much bigger change.)
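
For readers unfamiliar with the "compilation database entry view", the following is a minimal sketch of reading one entry. It assumes the Clang JSON compilation database layout (`directory`, `file`, and either a `command` string or an `arguments` list). The function name `normalize_entry` is hypothetical, and `shlex.split` stands in for the patch's own `decode` helper, so this is an illustration rather than the code under review.

```python
import os
import shlex


def normalize_entry(entry):
    """Return (directory, absolute source path, argument list) for one entry.

    Hypothetical helper: it assumes every entry has 'directory' and 'file',
    plus either a 'command' string or an 'arguments' list.
    """
    directory = os.path.normpath(entry['directory'])
    # shlex.split is a stand-in for libscanbuild.shell.decode (assumption).
    arguments = (shlex.split(entry['command'])
                 if 'command' in entry else list(entry['arguments']))
    source = entry['file']
    if not os.path.isabs(source):
        # Relative file names are resolved against the entry's directory.
        source = os.path.normpath(os.path.join(directory, source))
    return directory, source, arguments


example = {'directory': '/work/project',
           'file': 'src/main.c',
           'command': 'cc -c -DNDEBUG src/main.c'}
directory, source, arguments = normalize_entry(example)
```

Resolving the source path against `directory` mirrors what the `Compilation` constructor in the diff does, which is what lets two entries that name the same file in different ways compare as equal.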

Event Timeline

rizsotto.mailinglist created this revision. Mar 25 2017, 2:22 AM

JonasToth added a subscriber: JonasToth. Mar 25 2017, 3:00 AM

JonasToth added inline comments.

tools/scan-build-py/libscanbuild/compilation.py
6	This sounds not like english. Remove `for` ?

rizsotto.mailinglist added inline comments. Mar 27 2017, 1:40 AM

tools/scan-build-py/libscanbuild/compilation.py
6	Thanks Jonas, will fix it.

JonasToth removed a subscriber: JonasToth. Mar 27 2017, 2:44 AM

Any other comments, guys? I would like to merge this week and continue with the other PRs which depend on this.

Can you explain why you need structural equality ("two Commands are the same if their fields have the same values") rather than reference equality ("two Commands are the same if they have the same address")?

tools/scan-build-py/libscanbuild/compilation.py
100	This seems fragile. How are you going to test to make sure your hash and equality functions stay in sync? (It would be easy to add a property to one and not the other.) Can the result of 'as_dict' be used for hashing and equality instead? Are there attributes that you don't want to contribute to object identity?

I want to store Compilation objects in a set to filter out duplicates. (That requires __hash__ and (__eq__ or __cmp__).) New objects will be created from files, so structural equality is needed. (There is no chance that reference equality would be enough.)
Thanks for the review, will upload a new diff soon.

tools/scan-build-py/libscanbuild/compilation.py
100	Ok, will try with the `as_dict` method for equality. That seems to be a good option, since there is no attribute which would not contribute.
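
One way to address the "stay in sync" concern raised above is to derive both `__eq__` and `__hash__` from a single key. The sketch below is an illustration, not the code from the patch: the class is a simplified stand-in, `_key` is a hypothetical helper, and a tuple is used instead of the `as_dict` result because a dictionary (and the flag list inside it) is not hashable.

```python
class Compilation(object):
    """Simplified stand-in for the class under review (assumption)."""

    def __init__(self, compiler, flags, source, directory):
        self.compiler = compiler
        self.flags = list(flags)
        self.source = source
        self.directory = directory

    def _key(self):
        # Single source of truth for identity: an attribute added here
        # takes part in both __eq__ and __hash__ automatically.
        return (self.compiler, tuple(self.flags), self.source, self.directory)

    def __eq__(self, other):
        return isinstance(other, Compilation) and self._key() == other._key()

    def __ne__(self, other):
        # Needed on Python 2, harmless on Python 3.
        return not self == other

    def __hash__(self):
        return hash(self._key())


first = Compilation('c', ['-O2'], '/src/a.c', '/src')
second = Compilation('c', ['-O2'], '/src/a.c', '/src')
unique = {first, second}  # structurally equal objects collapse to one
```

With this layout, two objects built independently from the same file content land in the same set slot, which is the duplicate filtering the author describes.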

Fixed review comments.

ping

whisperity added a subscriber: whisperity. Apr 18 2017, 3:52 AM

ping

Hi @rizsotto, sorry for a long turnaround time. Overall I like this diff! I've left a number of comments on minor coding issues, and the grammar used in comments (sorry for those, but otherwise they are often hard to parse).
Could you write a few sentences clarifying the big picture here? E.g. what is the overall benefit of representing the compilation task as a comparable and hashable object? Does it mean that writing unit-tests is easier?

I see that you want to put usages into the next PR, but without those it is somewhat hard to review the code. Maybe you could show the next PR as well?

Thanks!

tools/scan-build-py/libscanbuild/compilation.py
6	"This module is responsible for parsing a compiler invocation"?
18	I think "Map of ignored compiler option for the creation of a compilation database" is easier to parse.
19	Again "This map is used in `_split_command` method, which classifies the parameters and ignores the selected ones" seems easier to read.
20	"Please note that other parameters might be ignored as well"
23	"Option names are mapped to the number of following arguments which should be skipped"
55	Full stop after comments to be consistent with others.
58	Full stop after comment.
66	full stop after comment
72	Impressive list! Does it have a standard source?
73	Why use a set if we never do membership queries? I think a tuple would work much better, ditto for COMPILER_PATTERNS_CC
93	I think Devin has suggested to use `as_dict` here as well (e.g. `hash(str(as_dict()))`). Depending on how often hashing is used, string construction might be very slow, so a lesser evil might be keeping those in sync manually.
107	Overall comment: rather than using strings `'c'` and `'c++'` inline everywhere, can we define appropriate globals (e.g. `C_COMPILER = 'c'`) and use those instead?
118	"From a compilation database entry"
120	Great job on using Sphinx comments! However, here and elsewhere, could we also specify the type of params explicitly using `:type`? That would make methods much easier to read, as often it's hard to figure out the expected type of the parameter just from looking at the method and without knowing the invocations.
126	If you are using class methods, it is usually better to use the @classmethod decorator. Then you get cls as a first argument, and can use cls._split_command instead. This is better than using Compilation._split_command, as it is more resilient to renames, which can be very painful in Python.
126	If `iter_from_execution` method is only used here, I think it would make more sense to inline it to `from_db_entry`. Otherwise it seems very strange that we have a generic method supporting a number of sources per a compile command, yet it is only used in one place where the number of sources is asserted to be zero.
130	As per the previous comment, please change to `@classmethod`.
150	As per the previous comment, please change to `@classmethod`.
152	"whether the command is a compiler call"
161	Why not just `return COMPILER_PATTERN_WRAPPER.match(cmd)`?
171	I would change the comment to simply `list is not empty`, since you can always use slice indexing.
176	"Additionally, a wrapper can wrap another wrapper"
179	Why the compiler is set to `'c'` if wrapper is used?
194	Please specify `:type` explicitly for `command`, `cc`, and `cxx`
196	IIRC, Python logging does not automatically print the parameter name of where the logging command was invoked. Thus just from the message `input was:` it might be hard to decode where the message is actually coming from. I think a more verbose prefix would be better.
216	Nice!
217	Please take regexp compilation out of a loop and into a global. What is the source for this list of skippable options? The perl version?
219	"look like a filename"
223	Again, better to take regexp into a global.
228	again, more fine-grained logging message could be more useful.
233	CompilationDatabase is not used or tested in this pull request.
237	Do we want to append to a database, or would it make more sense to override it (`-w`)?
296	"Returns 'c' or 'c++' on match" or "when matches".
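
The `@classmethod` suggestion in the comments on lines 126, 130 and 150 can be sketched as follows. The class and method names here are hypothetical and the logic is deliberately reduced to wrapper stripping; the point is only that `cls` removes the hard-coded class name from the recursive call.

```python
import re

# Wrapper names as listed in the diff.
WRAPPER = re.compile(r'^(distcc|ccache)$')


class Parser(object):
    """Hypothetical class, used only to show the decorator's effect."""

    @classmethod
    def split(cls, command):
        # 'cls' is the class the method was looked up on, so the recursive
        # call needs no hard-coded class name and survives a rename.
        if command and WRAPPER.match(command[0]):
            return cls.split(command[1:])
        return cls.classify(command)

    @classmethod
    def classify(cls, command):
        return ('compiler', command) if command else None


class TracingParser(Parser):
    """A subclass override is picked up by the inherited split()."""

    @classmethod
    def classify(cls, command):
        return ('traced', command) if command else None
```

With `@staticmethod` and an explicit `Parser.classify(...)` call, the subclass override would be bypassed; with `cls` it is honoured.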
tools/scan-build-py/tests/unit/test_compilation.py
184	Why is the parameter called `force` if the `classify_source` parameter is called `c_compiler`?
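
On the logging comments for lines 196 and 228: besides a more verbose message prefix, the standard `logging.Formatter` accepts the `%(funcName)s` and `%(name)s` record attributes, so the origin of a message can be added by configuration. The sketch below assumes a logger named `libscanbuild.compilation` and a simplified `split_command`; both are illustrative choices, not taken from the patch.

```python
import io
import logging

logger = logging.getLogger('libscanbuild.compilation')  # assumed logger name


def split_command(command):
    # Simplified stand-in: the message itself also names its origin.
    logger.debug('split_command input: %s', command)
    return command[1:]


# %(funcName)s is a standard LogRecord attribute, so the formatter can add
# the calling function even when the message text does not mention it.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(name)s:%(funcName)s: %(message)s'))
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

split_command(['cc', '-c', 'main.c'])
output = stream.getvalue()
```

Whether the prefix lives in the message or in the formatter is a design choice: the formatter keeps call sites short, while an explicit prefix survives whatever logging configuration the caller installs.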

This revision now requires changes to proceed. Jun 19 2017, 1:35 PM

@george.karpenkov thanks for your comments. I have totally lost interest in continuing this job, due to the lack of feedback. I will use your catches and suggestions in my upstream repository. Thanks again!

Hi @rizsotto, of course it's up to you, but if you still wish to continue, I am happy to review your patches, and I have time for doing so.
Apologies for the lack of feedback before, it happened as everyone was completely booked out.

@rizsotto I think that if the goal is a more testable and reliable wrapper system in Python, then your contributions are very valuable.

Revision Contents

Path	Size
tools/scan-build-py/libscanbuild/compilation.py	265 lines
tools/scan-build-py/tests/unit/test_compilation.py	243 lines

Diff 94511

tools/scan-build-py/libscanbuild/compilation.py

# -*- coding: utf-8 -*-		# -*- coding: utf-8 -*-
# The LLVM Compiler Infrastructure		# The LLVM Compiler Infrastructure
#		#
# This file is distributed under the University of Illinois Open Source		# This file is distributed under the University of Illinois Open Source
# License. See LICENSE.TXT for details.		# License. See LICENSE.TXT for details.
""" This module is responsible for to parse a compiler invocation. """		""" This module is responsible to parse a compiler invocation. """
		JonasToth: This sounds not like english. Remove `for` ?
		rizsotto.mailinglist (author): Thanks Jonas, will fix it.
		george.karpenkov: "This module is responsible for parsing a compiler invocation"?

import re		import re
import os		import os
import collections		import collections
		import logging
		import json
		from libscanbuild import Execution
		from libscanbuild.shell import decode

__all__ = ['split_command', 'classify_source', 'compiler_language']		__all__ = ['split_command', 'classify_source', 'compiler_language']

# Ignored compiler options map for compilation database creation.		# Ignored compiler options map for compilation database creation.
		george.karpenkov: I think "Map of ignored compiler option for the creation of a compilation database" is easier to parse.
# The map is used in `split_command` method. (Which does ignore and classify		# The map is used in `_split_command` method. (Which does ignore and classify
		george.karpenkov: Again "This map is used in `_split_command` method, which classifies the parameters and ignores the selected ones" seems easier to read.
# parameters.) Please note, that these are not the only parameters which		# parameters.) Please note, that these are not the only parameters which
		george.karpenkov: "Please note that other parameters might be ignored as well"
# might be ignored.		# might be ignored.
#		#
# Keys are the option name, value number of options to skip		# Keys are the option name, value number of options to skip
		george.karpenkov: "Option names are mapped to the number of following arguments which should be skipped"
IGNORED_FLAGS = {		IGNORED_FLAGS = {
# compiling only flag, ignored because the creator of compilation		# compiling only flag, ignored because the creator of compilation
# database will explicitly set it.		# database will explicitly set it.
'-c': 0,		'-c': 0,
# preprocessor macros, ignored because would cause duplicate entries in		# preprocessor macros, ignored because would cause duplicate entries in
# the output (the only difference would be these flags). this is actual		# the output (the only difference would be these flags). this is actual
# finding from users, who suffered longer execution time caused by the		# finding from users, who suffered longer execution time caused by the
# duplicates.		# duplicates.
# ... 15 unchanged lines of IGNORED_FLAGS not shown in the diff view ...
'-l': 1,		'-l': 1,
'-L': 1,		'-L': 1,
'-u': 1,		'-u': 1,
'-z': 1,		'-z': 1,
'-T': 1,		'-T': 1,
'-Xlinker': 1		'-Xlinker': 1
}		}

# Known C/C++ compiler executable name patterns		# Known C/C++ compiler wrapper name patterns
		george.karpenkov: Full stop after comments to be consistent with others.
COMPILER_PATTERNS = frozenset([		COMPILER_PATTERN_WRAPPER = re.compile(r'^(distcc|ccache)$')
re.compile(r'^(intercept-|analyze-|)c(c|\+\+)$'),
re.compile(r'^([^-]*-)*[mg](cc|\+\+)(-\d+(\.\d+){0,2})?$'),		# Known C compiler executable name patterns
		george.karpenkov: Full stop after comment.
re.compile(r'^([^-]*-)*clang(\+\+)?(-\d+(\.\d+){0,2})?$'),		COMPILER_PATTERNS_CC = frozenset([
re.compile(r'^llvm-g(cc|\+\+)$'),		re.compile(r'^(|i|mpi)cc$'),
		re.compile(r'^([^-]*-)*[mg]cc(-\d+(\.\d+){0,2})?$'),
		re.compile(r'^([^-]*-)*clang(-\d+(\.\d+){0,2})?$'),
		re.compile(r'^(g|)xlc$'),
])		])

		# Known C++ compiler executable name patterns
		george.karpenkov: full stop after comment
		COMPILER_PATTERNS_CXX = frozenset([
		re.compile(r'^(c\+\+|cxx|CC)$'),
		re.compile(r'^([^-]*-)*[mg]\+\+(-\d+(\.\d+){0,2})?$'),
		re.compile(r'^([^-]*-)*clang\+\+(-\d+(\.\d+){0,2})?$'),
		re.compile(r'^(icpc|mpiCC|mpicxx|mpic\+\+)$'),
		re.compile(r'^(g|)xl(C|c\+\+)$'),
		george.karpenkov: Impressive list! Does it have a standard source?
		])
		george.karpenkov: Why use a set if we never do membership queries? I think a tuple would work much better, ditto for COMPILER_PATTERNS_CC

def split_command(command):		CompilationCommand = collections.namedtuple(
""" Returns a value when the command is a compilation, None otherwise.		'CompilationCommand', ['compiler', 'flags', 'files'])

The value on success is a named tuple with the following attributes:

files: list of source files		class Compilation:
flags: list of compile options		def __init__(self, compiler, flags, source, directory):
compiler: string value of 'c' or 'c++' """		""" Constructor for a single compilation.

# the result of this method		This method just normalize the paths and initialize values. """
result = collections.namedtuple('Compilation',
['compiler', 'flags', 'files'])		self.compiler = compiler
result.compiler = compiler_language(command)		self.flags = flags
result.flags = []		self.directory = os.path.normpath(directory)
result.files = []		self.source = source if os.path.isabs(source) else \
		os.path.normpath(os.path.join(self.directory, source))

		def __hash__(self):
		return hash((self.compiler, self.source, self.directory,
		':'.join(self.flags)))
		george.karpenkov: I think Devin has suggested to use `as_dict` here as well (e.g. `hash(str(as_dict()))`). Depending on how often hashing is used, string construction might be very slow, so a lesser evil might be keeping those in sync manually.

		def __eq__(self, other):
		return vars(self) == vars(other)

		def as_dict(self):
		""" This method dumps the object attributes into a dictionary. """

		dcoughlin: This seems fragile. How are you going to test to make sure your hash and equality functions stay in sync? (It would be easy to add a property to one and not the other.) Can the result of 'as_dict' be used for hashing and equality instead? Are there attributes that you don't want to contribute to object identity?
		rizsotto.mailinglist (author): Ok, will try with the `as_dict` method for equality. That seems to be a good option, since there is no attribute which would not contribute.
		return vars(self)

		def as_db_entry(self):
		""" This method creates a compilation database entry. """

		relative = os.path.relpath(self.source, self.directory)
		compiler = 'cc' if self.compiler == 'c' else 'c++'
		george.karpenkov: Overall comment: rather than using strings `'c'` and `'c++'` inline everywhere, can we define appropriate globals (e.g. `C_COMPILER = 'c'`) and use those instead?
		return {
		'file': relative,
		'arguments': [compiler, '-c'] + self.flags + [relative],
		'directory': self.directory
		}

		@staticmethod
		def from_db_entry(entry):
		""" Parser method for compilation entry.

		From compilation database entry it creates the compilation object.
		george.karpenkov: "From a compilation database entry"

		:param entry: the compilation database entry
		george.karpenkov: Great job on using Sphinx comments! However, here and elsewhere, could we also specify the type of params explicitly using `:type`? That would make methods much easier to read, as often it's hard to figure out the expected type of the parameter just from looking at the method and without knowing the invocations.
		:return: a single compilation object """

		command = decode(entry['command']) if 'command' in entry else \
		entry['arguments']
		execution = Execution(cmd=command, cwd=entry['directory'], pid=0)
		entries = list(Compilation.iter_from_execution(execution))
		george.karpenkov: If you are using class methods, it is usually better to use the @classmethod decorator. Then you get cls as a first argument, and can use cls._split_command instead. This is better than using Compilation._split_command, as it is more resilient to renames, which can be very painful in Python.
		george.karpenkov: If `iter_from_execution` method is only used here, I think it would make more sense to inline it to `from_db_entry`. Otherwise it seems very strange that we have a generic method supporting a number of sources per a compile command, yet it is only used in one place where the number of sources is asserted to be zero.
		assert len(entries) == 1
		return entries[0]

		@staticmethod
		george.karpenkov: As per the previous comment, please change to `@classmethod`.
		def iter_from_execution(execution, cc='cc', cxx='c++'):
		""" Generator method for compilation entries.

		From a single compiler call it can generate zero or more entries.

		:param execution: executed command and working directory
		:param cc: user specified C compiler name
		:param cxx: user specified C++ compiler name
		:return: stream of CompilationDbEntry objects """

		candidate = Compilation._split_command(execution.cmd, cc, cxx)
		for source in (candidate.files if candidate else []):
		result = Compilation(directory=execution.cwd,
		source=source,
		compiler=candidate.compiler,
		flags=candidate.flags)
		if os.path.isfile(result.source):
		yield result

		@staticmethod
		george.karpenkov: As per the previous comment, please change to `@classmethod`.
		def _split_compiler(command, cc, cxx):
		""" A predicate to decide the command is a compiler call or not.
		george.karpenkov: "whether the command is a compiler call"

		:param command: the command to classify
		:param cc: user specified C compiler name
		:param cxx: user specified C++ compiler name
		:return: None if the command is not a compilation, or a tuple
		(compiler_language, rest of the command) otherwise """

		def is_wrapper(cmd):
		return True if COMPILER_PATTERN_WRAPPER.match(cmd) else False
		george.karpenkov: Why not just `return COMPILER_PATTERN_WRAPPER.match(cmd)`?

		def is_c_compiler(cmd):
		return os.path.basename(cc) == cmd or \
		any(pattern.match(cmd) for pattern in COMPILER_PATTERNS_CC)

		def is_cxx_compiler(cmd):
		return os.path.basename(cxx) == cmd or \
		any(pattern.match(cmd) for pattern in COMPILER_PATTERNS_CXX)

		if command: # not empty list will allow to index '0' and '1:'
		george.karpenkov: I would change the comment to simply `list is not empty`, since you can always use slice indexing.
		executable = os.path.basename(command[0])
		parameters = command[1:]
		# 'wrapper' 'parameters' and
		# 'wrapper' 'compiler' 'parameters' are valid.
		# plus, a wrapper can wrap wrapper too.
		george.karpenkov: "Additionally, a wrapper can wrap another wrapper"
		if is_wrapper(executable):
		result = Compilation._split_compiler(parameters, cc, cxx)
		return ('c', parameters) if result is None else result
		george.karpenkov: Why the compiler is set to `'c'` if wrapper is used?
		# and 'compiler' 'parameters' is valid.
		elif is_c_compiler(executable):
		return 'c', parameters
		elif is_cxx_compiler(executable):
		return 'c++', parameters
		return None

		@staticmethod
		def _split_command(command, cc, cxx):
		""" Returns a value when the command is a compilation, None otherwise.

		:param command: the command to classify
		:param cc: user specified C compiler name
		:param cxx: user specified C++ compiler name
		:return: stream of CompilationCommand objects """
		george.karpenkov: Please specify `:type` explicitly for `command`, `cc`, and `cxx`

		logging.debug('input was: %s', command)
		george.karpenkov: IIRC, Python logging does not automatically print the parameter name of where the logging command was invoked. Thus just from the message `input was:` it might be hard to decode where the message is actually coming from. I think a more verbose prefix would be better.
# quit right now, if the program was not a C/C++ compiler		# quit right now, if the program was not a C/C++ compiler
if not result.compiler:		compiler_and_arguments = Compilation._split_compiler(command, cc, cxx)
		if compiler_and_arguments is None:
return None		return None

		# the result of this method
		result = CompilationCommand(compiler=compiler_and_arguments[0],
		flags=[],
		files=[])
# iterate on the compile options		# iterate on the compile options
args = iter(command[1:])		args = iter(compiler_and_arguments[1])
for arg in args:		for arg in args:
# quit when compilation pass is not involved		# quit when compilation pass is not involved
if arg in {'-E', '-S', '-cc1', '-M', '-MM', '-###'}:		if arg in {'-E', '-S', '-cc1', '-M', '-MM', '-###'}:
return None		return None
# ignore some flags		# ignore some flags
elif arg in IGNORED_FLAGS:		elif arg in IGNORED_FLAGS:
count = IGNORED_FLAGS[arg]		count = IGNORED_FLAGS[arg]
for _ in range(count):		for _ in range(count):
next(args)		next(args)
		george.karpenkov: Nice!
elif re.match(r'^-(l|L|Wl,).+', arg):		elif re.match(r'^-(l|L|Wl,).+', arg):
		george.karpenkov: 1) Please take regexp compilation out of a loop and into a global. 2) What is the source for this list of skippable options? The perl version?
pass		pass
# some parameters could look like filename, take as compile option		# some parameters could look like filename, take as compile option
		george.karpenkov: "look like a filename"
elif arg in {'-D', '-I'}:		elif arg in {'-D', '-I'}:
result.flags.extend([arg, next(args)])		result.flags.extend([arg, next(args)])
# parameter which looks source file is taken...		# parameter which looks source file is taken...
elif re.match(r'^[^-].+', arg) and classify_source(arg):		elif re.match(r'^[^-].+', arg) and classify_source(arg):
		george.karpenkov: Again, better to take regexp into a global.
result.files.append(arg)		result.files.append(arg)
# and consider everything else as compile option.		# and consider everything else as compile option.
else:		else:
result.flags.append(arg)		result.flags.append(arg)
		logging.debug('output is: %s', result)
		george.karpenkov: again, more fine-grained logging message could be more useful.
# do extra check on number of source files		# do extra check on number of source files
return result if result.files else None		return result if result.files else None


		class CompilationDatabase:
		george.karpenkov: CompilationDatabase is not used or tested in this pull request.
		@staticmethod
		def save(filename, iterator):
		entries = [entry.as_db_entry() for entry in iterator]
		with open(filename, 'w+') as handle:
		george.karpenkov: Do we want to append to a database, or would it make more sense to override it (`-w`)?
		json.dump(entries, handle, sort_keys=True, indent=4)

		@staticmethod
		def load(filename):
		with open(filename, 'r') as handle:
		for entry in json.load(handle):
		yield Compilation.from_db_entry(entry)


def classify_source(filename, c_compiler=True):		def classify_source(filename, c_compiler=True):
""" Return the language from file name extension. """		""" Classify source file names and returns the presumed language,
		based on the file name extension.

		:param filename: the source file name
		:param c_compiler: indicate that the compiler is a C compiler,
		:return: the language from file name extension. """

mapping = {		mapping = {
'.c': 'c' if c_compiler else 'c++',		'.c': 'c' if c_compiler else 'c++',
'.i': 'c-cpp-output' if c_compiler else 'c++-cpp-output',		'.i': 'c-cpp-output' if c_compiler else 'c++-cpp-output',
'.ii': 'c++-cpp-output',		'.ii': 'c++-cpp-output',
'.m': 'objective-c',		'.m': 'objective-c',
'.mi': 'objective-c-cpp-output',		'.mi': 'objective-c-cpp-output',
'.mm': 'objective-c++',		'.mm': 'objective-c++',
'.mii': 'objective-c++-cpp-output',		'.mii': 'objective-c++-cpp-output',
'.C': 'c++',		'.C': 'c++',
'.cc': 'c++',		'.cc': 'c++',
'.CC': 'c++',		'.CC': 'c++',
'.cp': 'c++',		'.cp': 'c++',
'.cpp': 'c++',		'.cpp': 'c++',
'.cxx': 'c++',		'.cxx': 'c++',
'.c++': 'c++',		'.c++': 'c++',
'.C++': 'c++',		'.C++': 'c++',
'.txx': 'c++'		'.txx': 'c++'
}		}

__, extension = os.path.splitext(os.path.basename(filename))		__, extension = os.path.splitext(os.path.basename(filename))
return mapping.get(extension)		return mapping.get(extension)


		# Bellow this line, only temporary declarations for backward compatibility. #

		def split_command(command):
		""" Returns a value when the command is a compilation, None otherwise.

		The value on success is a named tuple with the following attributes:

		files: list of source files
		flags: list of compile options
		compiler: string value of 'c' or 'c++' """

		# the result of this method
		return Compilation._split_command(command, "cc", "c++")


def compiler_language(command):		def compiler_language(command):
""" A predicate to decide the command is a compiler call or not.		""" A predicate to decide the command is a compiler call or not.

Returns 'c' or 'c++' when it match. None otherwise. """		Returns 'c' or 'c++' when it match. None otherwise. """
		george.karpenkov: "Returns 'c' or 'c++' on match" or "when matches".

cplusplus = re.compile(r'^(.+)(\+\+)(-.+|)$')		language_and_arguments = Compilation._split_compiler(command, "cc", "c++")
		if language_and_arguments is None:
if command:
executable = os.path.basename(command[0])
if any(pattern.match(executable) for pattern in COMPILER_PATTERNS):
return 'c++' if cplusplus.match(executable) else 'c'
return None		return None

		return language_and_arguments[0]
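
Two of the review comments on this file (a tuple instead of a `frozenset` for the pattern lists, and hoisting the regular expressions out of the argument loop) can be illustrated together. The sketch below is an assumption-laden illustration, not the revised patch: `LINKER_FLAG`, `is_c_compiler` and `drop_linker_flags` are hypothetical names, and the patterns are written with the `|` and `*` characters that the rendered diff appears to have lost.

```python
import re

# Compiled once at import time instead of on every loop iteration.
LINKER_FLAG = re.compile(r'^-(l|L|Wl,).+')

# A tuple is enough here: the patterns are only iterated, never looked up.
COMPILER_PATTERNS_CC = (
    re.compile(r'^(|i|mpi)cc$'),
    re.compile(r'^([^-]*-)*[mg]cc(-\d+(\.\d+){0,2})?$'),
    re.compile(r'^([^-]*-)*clang(-\d+(\.\d+){0,2})?$'),
    re.compile(r'^(g|)xlc$'),
)


def is_c_compiler(executable):
    """True when the executable name matches a known C compiler pattern."""
    return any(p.match(executable) for p in COMPILER_PATTERNS_CC)


def drop_linker_flags(arguments):
    """Remove joined linker flags such as -lm, -L/usr/lib or -Wl,option."""
    return [arg for arg in arguments if not LINKER_FLAG.match(arg)]
```

Note that `re.match` with a string pattern already consults the `re` module's internal cache, so the gain from hoisting is mostly readability and a single obvious place to audit the patterns.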

tools/scan-build-py/tests/unit/test_compilation.py

	# -*- coding: utf-8 -*-			# -*- coding: utf-8 -*-
	# The LLVM Compiler Infrastructure			# The LLVM Compiler Infrastructure
	#			#
	# This file is distributed under the University of Illinois Open Source			# This file is distributed under the University of Illinois Open Source
	# License. See LICENSE.TXT for details.			# License. See LICENSE.TXT for details.

	import libscanbuild.compilation as sut			import libscanbuild.compilation as sut
	import unittest			import unittest


	class CompilerTest(unittest.TestCase):			class CompilerTest(unittest.TestCase):

	def test_is_compiler_call(self):			def assert_c_compiler(self, command, cc='nope', cxx='nope++'):
	self.assertIsNotNone(sut.compiler_language(['clang']))			value = sut.Compilation._split_compiler(command, cc, cxx)
	self.assertIsNotNone(sut.compiler_language(['clang-3.6']))			self.assertIsNotNone(value)
	self.assertIsNotNone(sut.compiler_language(['clang++']))			self.assertEqual(value[0], 'c')
	self.assertIsNotNone(sut.compiler_language(['clang++-3.5.1']))
	self.assertIsNotNone(sut.compiler_language(['cc']))			def assert_cxx_compiler(self, command, cc='nope', cxx='nope++'):
	self.assertIsNotNone(sut.compiler_language(['c++']))			value = sut.Compilation._split_compiler(command, cc, cxx)
	self.assertIsNotNone(sut.compiler_language(['gcc']))			self.assertIsNotNone(value)
	self.assertIsNotNone(sut.compiler_language(['g++']))			self.assertEqual(value[0], 'c++')
	self.assertIsNotNone(sut.compiler_language(['/usr/local/bin/gcc']))
	self.assertIsNotNone(sut.compiler_language(['/usr/local/bin/g++']))			def assert_not_compiler(self, command):
	self.assertIsNotNone(sut.compiler_language(['/usr/local/bin/clang']))			value = sut.Compilation._split_compiler(command, 'nope', 'nope')
	self.assertIsNotNone(			self.assertIsNone(value)
	sut.compiler_language(['armv7_neno-linux-gnueabi-g++']))
				def test_compiler_call(self):
	self.assertIsNone(sut.compiler_language([]))			self.assert_c_compiler(['cc'])
	self.assertIsNone(sut.compiler_language(['']))			self.assert_cxx_compiler(['CC'])
	self.assertIsNone(sut.compiler_language(['ld']))			self.assert_cxx_compiler(['c++'])
	self.assertIsNone(sut.compiler_language(['as']))			self.assert_cxx_compiler(['cxx'])
	self.assertIsNone(sut.compiler_language(['/usr/local/bin/compiler']))
				def test_clang_compiler_call(self):
				self.assert_c_compiler(['clang'])
				self.assert_c_compiler(['clang-3.6'])
				self.assert_cxx_compiler(['clang++'])
				self.assert_cxx_compiler(['clang++-3.5.1'])

				def test_gcc_compiler_call(self):
				self.assert_c_compiler(['gcc'])
				self.assert_cxx_compiler(['g++'])

				def test_intel_compiler_call(self):
				self.assert_c_compiler(['icc'])
				self.assert_cxx_compiler(['icpc'])

				def test_aix_compiler_call(self):
				self.assert_c_compiler(['xlc'])
				self.assert_cxx_compiler(['xlc++'])
				self.assert_cxx_compiler(['xlC'])
				self.assert_c_compiler(['gxlc'])
				self.assert_cxx_compiler(['gxlc++'])

				def test_open_mpi_compiler_call(self):
				self.assert_c_compiler(['mpicc'])
				self.assert_cxx_compiler(['mpiCC'])
				self.assert_cxx_compiler(['mpicxx'])
				self.assert_cxx_compiler(['mpic++'])

				def test_compiler_call_with_path(self):
				self.assert_c_compiler(['/usr/local/bin/gcc'])
				self.assert_cxx_compiler(['/usr/local/bin/g++'])
				self.assert_c_compiler(['/usr/local/bin/clang'])

				def test_cross_compiler_call(self):
				self.assert_cxx_compiler(['armv7_neno-linux-gnueabi-g++'])

				def test_compiler_wrapper_call(self):
				self.assert_c_compiler(['distcc'])
				self.assert_c_compiler(['distcc', 'cc'])
				self.assert_cxx_compiler(['distcc', 'c++'])
				self.assert_c_compiler(['ccache'])
				self.assert_c_compiler(['ccache', 'cc'])
				self.assert_cxx_compiler(['ccache', 'c++'])

				def test_non_compiler_call(self):
				self.assert_not_compiler([])
				self.assert_not_compiler([''])
				self.assert_not_compiler(['ld'])
				self.assert_not_compiler(['as'])
				self.assert_not_compiler(['/usr/local/bin/compiler'])

				def test_specific_compiler_call(self):
				self.assert_c_compiler(['nope'], cc='nope')
				self.assert_c_compiler(['./nope'], cc='nope')
				self.assert_c_compiler(['/path/nope'], cc='nope')
				self.assert_cxx_compiler(['nope++'], cxx='nope++')
				self.assert_cxx_compiler(['./nope++'], cxx='nope++')
				self.assert_cxx_compiler(['/path/nope++'], cxx='nope++')

				def assert_arguments_equal(self, expected, command):
				value = sut.Compilation._split_compiler(command, 'nope', 'nope')
				self.assertIsNotNone(value)
				self.assertEqual(expected, value[1])

				def test_argument_split(self):
				arguments = ['-c', 'file.c']
				self.assert_arguments_equal(arguments, ['distcc'] + arguments)
				self.assert_arguments_equal(arguments, ['distcc', 'cc'] + arguments)
				self.assert_arguments_equal(arguments, ['distcc', 'c++'] + arguments)
				self.assert_arguments_equal(arguments, ['ccache'] + arguments)
				self.assert_arguments_equal(arguments, ['ccache', 'cc'] + arguments)
				self.assert_arguments_equal(arguments, ['ccache', 'c++'] + arguments)


	class SplitTest(unittest.TestCase):			class SplitTest(unittest.TestCase):

	def test_detect_cxx_from_compiler_name(self):			def assert_compilation(self, command):
	def test(cmd):			result = sut.Compilation._split_command(command, 'nope', 'nope')
	result = sut.split_command([cmd, '-c', 'src.c'])			self.assertIsNotNone(result)
	self.assertIsNotNone(result, "wrong input for test")
	return result.compiler == 'c++'			def assert_non_compilation(self, command):
				result = sut.Compilation._split_command(command, 'nope', 'nope')
	self.assertFalse(test('cc'))			self.assertIsNone(result)
	self.assertFalse(test('gcc'))
	self.assertFalse(test('clang'))

	self.assertTrue(test('c++'))
	self.assertTrue(test('g++'))
	self.assertTrue(test('g++-5.3.1'))
	self.assertTrue(test('clang++'))
	self.assertTrue(test('clang++-3.7.1'))
	self.assertTrue(test('armv7_neno-linux-gnueabi-g++'))

	def test_action(self):			def test_action(self):
	self.assertIsNotNone(sut.split_command(['clang', 'source.c']))			self.assert_compilation(['clang', 'source.c'])
	self.assertIsNotNone(sut.split_command(['clang', '-c', 'source.c']))			self.assert_compilation(['clang', '-c', 'source.c'])
	self.assertIsNotNone(sut.split_command(['clang', '-c', 'source.c',			self.assert_compilation(['clang', '-c', 'source.c', '-MF', 'a.d'])
	'-MF', 'a.d']))
				self.assert_non_compilation(['clang', '-E', 'source.c'])
	self.assertIsNone(sut.split_command(['clang', '-E', 'source.c']))			self.assert_non_compilation(['clang', '-c', '-E', 'source.c'])
	self.assertIsNone(sut.split_command(['clang', '-c', '-E', 'source.c']))			self.assert_non_compilation(['clang', '-c', '-M', 'source.c'])
	self.assertIsNone(sut.split_command(['clang', '-c', '-M', 'source.c']))			self.assert_non_compilation(['clang', '-c', '-MM', 'source.c'])
	self.assertIsNone(
	sut.split_command(['clang', '-c', '-MM', 'source.c']))			def assert_source_files(self, expected, command):
				result = sut.Compilation._split_command(command, 'nope', 'nope')
				self.assertIsNotNone(result)
				self.assertEqual(expected, result.files)

	def test_source_file(self):			def test_source_file(self):
	def test(expected, cmd):			self.assert_source_files(['src.c'], ['clang', 'src.c'])
	self.assertEqual(expected, sut.split_command(cmd).files)			self.assert_source_files(['src.c'], ['clang', '-c', 'src.c'])
				self.assert_source_files(['src.C'], ['clang', '-x', 'c', 'src.C'])
				self.assert_source_files(['src.cpp'], ['clang++', '-c', 'src.cpp'])
				self.assert_source_files(['s1.c', 's2.c'],
				['clang', '-c', 's1.c', 's2.c'])
				self.assert_source_files(['s1.c', 's2.c'],
				['cc', 's1.c', 's2.c', '-ldp', '-o', 'a.out'])
				self.assert_source_files(['src.c'],
				['clang', '-c', '-I', './include', 'src.c'])
				self.assert_source_files(['src.c'],
				['clang', '-c', '-I', '/opt/inc', 'src.c'])
				self.assert_source_files(['src.c'],
				['clang', '-c', '-Dconfig=file.c', 'src.c'])

	test(['src.c'], ['clang', 'src.c'])			self.assert_non_compilation(['cc', 'this.o', 'that.o', '-o', 'a.out'])
	test(['src.c'], ['clang', '-c', 'src.c'])			self.assert_non_compilation(['cc', 'this.o', '-lthat', '-o', 'a.out'])
	test(['src.C'], ['clang', '-x', 'c', 'src.C'])
	test(['src.cpp'], ['clang++', '-c', 'src.cpp'])
	test(['s1.c', 's2.c'], ['clang', '-c', 's1.c', 's2.c'])
	test(['s1.c', 's2.c'], ['cc', 's1.c', 's2.c', '-ldep', '-o', 'a.out'])
	test(['src.c'], ['clang', '-c', '-I', './include', 'src.c'])
	test(['src.c'], ['clang', '-c', '-I', '/opt/me/include', 'src.c'])
	test(['src.c'], ['clang', '-c', '-D', 'config=file.c', 'src.c'])

	self.assertIsNone(
	sut.split_command(['cc', 'this.o', 'that.o', '-o', 'a.out']))
	self.assertIsNone(
	sut.split_command(['cc', 'this.o', '-lthat', '-o', 'a.out']))

	def test_filter_flags(self):			def assert_flags(self, expected, flags):
	def test(expected, flags):
	command = ['clang', '-c', 'src.c'] + flags			command = ['clang', '-c', 'src.c'] + flags
	self.assertEqual(expected, sut.split_command(command).flags)			result = sut.Compilation._split_command(command, 'nope', 'nope')
				self.assertIsNotNone(result)
				self.assertEqual(expected, result.flags)

				def test_filter_flags(self):

	def same(expected):			def same(expected):
	test(expected, expected)			self.assert_flags(expected, expected)

	def filtered(flags):			def filtered(flags):
	test([], flags)			self.assert_flags([], flags)

	same([])			same([])
	same(['-I', '/opt/me/include', '-DNDEBUG', '-ULIMITS'])			same(['-I', '/opt/me/include', '-DNDEBUG', '-ULIMITS'])
	same(['-O', '-O2'])			same(['-O', '-O2'])
	same(['-m32', '-mmms'])			same(['-m32', '-mmms'])
	same(['-Wall', '-Wno-unused', '-g', '-funroll-loops'])			same(['-Wall', '-Wno-unused', '-g', '-funroll-loops'])

	filtered([])			filtered([])
	filtered(['-lclien', '-L/opt/me/lib', '-L', '/opt/you/lib'])			filtered(['-lclien', '-L/opt/me/lib', '-L', '/opt/you/lib'])
	filtered(['-static'])			filtered(['-static'])
	filtered(['-MD', '-MT', 'something'])			filtered(['-MD', '-MT', 'something'])
	filtered(['-MMD', '-MF', 'something'])			filtered(['-MMD', '-MF', 'something'])


	class SourceClassifierTest(unittest.TestCase):			class SourceClassifierTest(unittest.TestCase):

				def assert_non_source(self, filename):
				result = sut.classify_source(filename)
				self.assertIsNone(result)

				def assert_c_source(self, filename, force):
				result = sut.classify_source(filename, force)
				george.karpenkov: Why is the parameter called `force` if the `classify_source` parameter is called `c_compiler`?
				self.assertEqual('c', result)

				def assert_cxx_source(self, filename, force):
				result = sut.classify_source(filename, force)
				self.assertEqual('c++', result)
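
To illustrate the naming question raised in the inline comment on `assert_c_source`, here is a minimal sketch in which the test helpers use the same parameter name as the function they exercise. The `classify_source` below is a hypothetical stand-in, not the libscanbuild implementation: its extension tables, its case-sensitive matching and the `True` default for `c_compiler` are assumptions inferred from the test expectations in this diff.

```python
import os
import unittest

# Hypothetical extension tables; the real mapping may differ.
COMPILER_DEPENDENT = {'.c'}  # language follows the invoking compiler
ALWAYS_CXX = {'.cxx', '.c++', '.cpp'}


def classify_source(filename, c_compiler=True):
    """Return 'c', 'c++', or None when the file is not a source file."""
    extension = os.path.splitext(filename)[1]
    if extension in COMPILER_DEPENDENT:
        return 'c' if c_compiler else 'c++'
    if extension in ALWAYS_CXX:
        return 'c++'
    return None


class SourceClassifierSketch(unittest.TestCase):
    # The helper parameter mirrors the name used by classify_source.
    def assert_c_source(self, filename, c_compiler):
        self.assertEqual('c', classify_source(filename, c_compiler))

    def assert_cxx_source(self, filename, c_compiler):
        self.assertEqual('c++', classify_source(filename, c_compiler))

    def test_sources(self):
        self.assert_c_source('file.c', True)
        self.assert_cxx_source('file.c', False)
        self.assert_cxx_source('file.cpp', True)
```

With matching names, a call such as `assert_cxx_source('file.c', False)` reads as "a `.c` file handed to a non-C compiler", which is harder to see when the argument is called `force`.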

	def test_sources(self):			def test_sources(self):
	self.assertIsNone(sut.classify_source('file.o'))			self.assert_non_source('file.o')
	self.assertIsNone(sut.classify_source('file.exe'))			self.assert_non_source('file.exe')
	self.assertIsNone(sut.classify_source('/path/file.o'))			self.assert_non_source('/path/file.o')
	self.assertIsNone(sut.classify_source('clang'))			self.assert_non_source('clang')

	self.assertEqual('c', sut.classify_source('file.c'))			self.assert_c_source('file.c', True)
	self.assertEqual('c', sut.classify_source('./file.c'))			self.assert_cxx_source('file.c', False)
	self.assertEqual('c', sut.classify_source('/path/file.c'))
	self.assertEqual('c++', sut.classify_source('file.c', False))			self.assert_cxx_source('file.cxx', True)
	self.assertEqual('c++', sut.classify_source('./file.c', False))			self.assert_cxx_source('file.cxx', False)
	self.assertEqual('c++', sut.classify_source('/path/file.c', False))			self.assert_cxx_source('file.c++', True)
				self.assert_cxx_source('file.c++', False)
				self.assert_cxx_source('file.cpp', True)
				self.assert_cxx_source('file.cpp', False)

				self.assert_c_source('/path/file.c', True)
				self.assert_c_source('./path/file.c', True)
				self.assert_c_source('../path/file.c', True)
				self.assert_c_source('/file.c', True)
				self.assert_c_source('./file.c', True)