This is an archive of the discontinued LLVM Phabricator instance.

[scan-build-py] create decorator for compiler wrapper methods
ClosedPublic

Authored by rizsotto.mailinglist on Jan 29 2017, 12:39 AM.

Details

Summary

It's a chunk from D26390

Compiler wrapper code currently has duplicated code, when the real compiler is invoked. This change introduce a Python decorator to get rid of the duplicated code.

Diff Detail

Event Timeline

dcoughlin edited edge metadata.Jan 30 2017, 2:16 PM

While I support abstracting common functionality out to reduce duplication, I don't understand the benefit of using a decorator here as compared to more traditional methods of abstraction.

What is the benefit of the declarative decorator syntax here vs. a more imperative "with ... as" or a generator? It seems to me that using decorators here obscures the key components of the actual behavior of the function (calling the compiler, calling the analyzer) and makes it much harder to understand what the function is supposed to do.

tools/scan-build-py/libscanbuild/__init__.py
79

The LLVM style guidelines call for comments be prose, so they should be capitalized and with punctuation. See http://llvm.org/docs/CodingStandards.html#commenting

What is the benefit of the declarative decorator syntax here vs. a more imperative "with ... as" or a generator? It seems to me that using decorators here obscures the key components of the actual behavior of the function (calling the compiler, calling the analyzer) and makes it much harder to understand what the function is supposed to do.

Not sure I understand the question. What are you proposing here?

The three basic primitives are 'call the compiler', 'do wrapper functionality'. A compiler wrapper has to do both. So a wrapper needs to 'combine' these two. (It's the third)
I think that decorator is a pretty good combinator for such. (I agree the decorator currently doing 'call the compiler' and the 'combine'. It might be split up.)
But I don't see how with ... as would do a better job here. Could you present it? (What I see that would be just more code, but less expressive.)
Also don't see how generator fits better here. (For a stream of a zero or one element list? Not sure why that would be easier to understand.)

I think decorator is doing great here because it's not only provide the 'call the compiler' but also implements the 'combine'. In both of you recommendation I see the 'combine' would be part of the 'do wrapper functionality'. (I might be wrong, so please provide an example.)

jroelofs edited edge metadata.Feb 22 2017, 10:42 AM

I don't know enough about decorators to comment.

Laszlo,

Since scan-build-py is in tree and will hopefully eventually be distributed with clang, it will often be clang/llvm developers who reproduce, triage, and ultimately fix reported issues with scan-build-py. For this reason, it is very important that scan-build-py be easy for clang developers to understand and maintain. These maintainers work primarily in C++ and will be unfamiliar with the scan-build-py code base -- so it will be essential to make scan-build-py easy for newcomers to the code base to understand. As Jordan Rose wrote on cfe-dev when initially discussing scan-build-py, it should "easy for someone without too much experience with Python to be able to walk up and change some piece of it".

More specifically, this means that there is great value in being explicit about control flow. In particular it should be as easy as possible for newcomers to determine at a call site what functionality is invoked, find the callee, and then repeat that process for relevant call sites in the callee.

For example:

performTaskA()

...

def performTaskA():
  performSubtask1()
  performSubtask2()
  performSubtask3()

This makes it possible for newcomers trying to triage a bug to see a call to performTaskA(), search for it in the codebase to find its definition, and determine which, if any, of the subtasks might be responsible for the bug.

Structuring the code in this way makes it possible for people unfamiliar with the codebase to determine what parts of the codebase they need to understand at the moment and what parts they can safely ignore.

Using decorators for arbitrary function composition obscures the component subtasks:

@subtask1
@subtask2
def performTaskA():
  performSubtask3()

and makes it much harder for newcomers to scan-build-py (especially those who are C++ developers) to understand what performTaskA() does and what its inputs and outputs are.

When deciding which abstractions to use, I would encourage you to imagine you are a newcomer to the codebase trying to fix a bug. If the newcomer would need to understand what subTask1 and subTask2 do (or their inputs and outputs) to understand the high-level behavior of performTaskA() then it would be better to make the control flow explicit.

Thanks Devin your explanation. I think decorators are kind of higher-order functions. Functions which are taking one-or-more functions as argument. It's a pretty powerful abstraction. And I don't think it's hard to understand them... But let's say, they are complex and scan-build-py shall not have any!

Then I'm asking you to be a bit more concrete. How else should it be? Please show me how this commit would be better.

performTaskA()

...

def performTaskA():
  performSubtask1()
  performSubtask2()
  performSubtask3()

This example does not help me, because this code is not like this. It's not like call performSubtask1, ignore it's result, call performSubtask2 ignore it's result, etc... It's more like this:

def analyzer_wrapper():
  xxx = retrieve_parameters()
  configure_wrapper(xxx)
  result = call_real_compiler(xxx)
  try:
    do_things(result, xxx)
  expect:
    report_error()
  finally:
    return result

What my decorator was doing, it took do_things as parameter. Let's not do that!

solution 1: copy-paste the same method, change one line. (easy)
solution 2: create a base class which does this, and calls do_things which is implemented by inherited child classes. (heavy-weight, not sure will be more easier to understand)
solution 3: ???

Thanks Devin your explanation. I think decorators are kind of higher-order functions. Functions which are taking one-or-more functions as argument. It's a pretty powerful abstraction.

I agree. Decorators are not needed here though, and the surface syntax obscures the intent with no benefit.

And I don't think it's hard to understand them... But let's say, they are complex and scan-build-py shall not have any!

Laszlo, it is fine to use decorators when it makes sense to do so and when they don't make it harder to understand the intent and behavior of the program.

This example does not help me, because this code is not like this. It's not like call performSubtask1, ignore it's result, call performSubtask2 ignore it's result, etc...

The fact that there are inputs and outputs makes it all the more important for the code to make it clear about what is going on! Making it explicit what the inputs and outs are for operations will make it easier for newcomers to understand the code.

It's more like this:

def analyzer_wrapper():
  xxx = retrieve_parameters()
  configure_wrapper(xxx)
  result = call_real_compiler(xxx)
  try:
    do_things(result, xxx)
  expect:
    report_error()
  finally:
    return result

What my decorator was doing, it took do_things as parameter. Let's not do that!

Taking 'do_things' as a true parameter is fine as long the argument is explicit at the call site. Remember, the goal here is to make it easier for someone new to the codebase follow the code.

def wrap_compilation(wrapper_function):
  xxx = retrieve_parameters()
  configure_wrapper(xxx)
  real_result = call_real_compiler(xxx)
  try:
    wrapped_result = wrapper_function(real_result, xxx)
  except:
    report_error()
  finally:
    return wrapped_result

Explicitly passing wrapper_function as a lambda at the call site to wrap_compilation, for example, is a pattern that clang developers would easily be able follow since it is used throughout the clang/llvm codebase. Unfortunately, Python's syntactic affordances for dealing with lambdas are not very good, which is why I suggested using a 'with' at the call site with a generator. Something like:

@contextmanager
 def wrap_compilation():
   xxx = retrieve_parameters()
   configure_wrapper(xxx)
   real_result = call_real_compiler(xxx)
   try:
     wrapped_result = yield (real_result, xxx)
   except:
     report_error()
   finally:
     return wrapped_result
``

Thanks Devin to clarify it in details. Before your comments were too high level, abstract to me. I prefer concrete code examples in reviews... So, thanks again, really appreciated.

Taking 'do_things' as a true parameter is fine as long the argument is explicit at the call site. Remember, the goal here is to make it easier for someone new to the codebase follow the code.

def wrap_compilation(wrapper_function):
  xxx = retrieve_parameters()
  configure_wrapper(xxx)
  real_result = call_real_compiler(xxx)
  try:
    wrapped_result = wrapper_function(real_result, xxx)
  except:
    report_error()
  finally:
    return wrapped_result

Minor issue, but wrapped_result shall not go out. The wrapper must always return what the compiler was returning.

How the caller side looks like this case?

def run_analyzer_after_compilation(result, parameters):
  pass  # implement the 

def analyzer_wrapper():
  return wrap_compilation(run_analyzer_after_compilation()

Explicitly passing wrapper_function as a lambda at the call site to wrap_compilation, for example, is a pattern that clang developers would easily be able follow since it is used throughout the clang/llvm codebase. Unfortunately, Python's syntactic affordances for dealing with lambdas are not very good, which is why I suggested using a 'with' at the call site with a generator. Something like:

@contextmanager
 def wrap_compilation():
   xxx = retrieve_parameters()
   configure_wrapper(xxx)
   real_result = call_real_compiler(xxx)
   try:
     wrapped_result = yield (real_result, xxx)
   except:
     report_error()
   finally:
     return wrapped_result
``

PyCharm warning: "Python versions < 3.3 do not allow 'return' with argument inside generator." or Python 2.7 error message "SyntaxError: 'return' with argument inside generator"

I got your concerns, but on this language decorator is the right tool to solve issues like this. Or we go with the no-decorator syntax, but doing exactly the same thing. What do you say?

PyCharm warning: "Python versions < 3.3 do not allow 'return' with argument inside generator." or Python 2.7 error message "SyntaxError: 'return' with argument inside generator"

That's unfortunate!

I got your concerns, but on this language decorator is the right tool to solve issues like this. Or we go with the no-decorator syntax, but doing exactly the same thing. What do you say?

If we can't figure out a way to abstract the functionality while keeping it clear then we shouldn't abstract it. It is more important to have the code be understandable than abstracted.

I got your concerns, but on this language decorator is the right tool to solve issues like this. Or we go with the no-decorator syntax, but doing exactly the same thing. What do you say?

If we can't figure out a way to abstract the functionality while keeping it clear then we shouldn't abstract it. It is more important to have the code be understandable than abstracted.

Devin, please don't abstract! :) Can you be more specific?

solution1: current patch, with decorator. (you don't like it)
solution2: generator approach (python does not like it)
solution3:

def wrap_compilation(wrapper_function):
  xxx = retrieve_parameters()
  configure_wrapper(xxx)
  result = call_real_compiler(xxx)
  try:
    wrapper_function(real_result, xxx)
  except:
    report_error()
  finally:
    return result

...

def run_analyzer_after_compilation(result, parameters):
  pass  # implement it

@command_entry_point
def analyzer_wrapper():
  return wrap_compilation(run_analyzer_after_compilation()

solution4: ???

change decorator to a simple higher order function. call is explicit now.

dcoughlin accepted this revision.Mar 3 2017, 8:00 AM

Thanks Laszlo, this looks fantastic. I appreciate your patience!

tools/scan-build-py/libscanbuild/__init__.py
220

This is great!

This revision is now accepted and ready to land.Mar 3 2017, 8:00 AM
This revision was automatically updated to reflect the committed changes.