This is an archive of the discontinued LLVM Phabricator instance.

[scan-build-py] move argument parsing into separate module
ClosedPublic

Authored by rizsotto.mailinglist on Mar 3 2017, 8:22 PM.

Details

Summary

It's a chunk from D26390

It gets rid of duplicate codes for command line parsing. Introduce validation for each command. And program entry points are no longer receives parameters from commands.

Diff Detail

Repository
rL LLVM

Event Timeline

dcoughlin edited edge metadata.Mar 5 2017, 2:11 PM

Thanks for the upstreaming the patch! Comments inline.

Also: please make comments consistently into prose, including proper capitalization.

tools/scan-build-py/libscanbuild/analyze.py
67 ↗(On Diff #90566)

This is pretty confusing because the 'analyze()' function doesn't actually analyze anything. (Same with 'scan') See my comments below for suggestions on how to make this more clear.

tools/scan-build-py/libscanbuild/arguments.py
6 ↗(On Diff #90566)

I think "command line interface related duties" is pretty content free. Can you be more specific? How about: "This module parses and validates arguments for command-line interfaces."

44 ↗(On Diff #90566)

As a general rule, if the name of a function doesn't have anything in common with the concepts mentioned in the description of the function then it should probably be renamed. How about 'parse_analyze_args'? (And a similar renaming for 'intercept' and 'scan'?)

59 ↗(On Diff #90566)

As a general rule, if three functions do different things but have the same comment describing their functionality then the comment needs to be rewritten to be more precise. How about: "Parse and validate command-line arguments for scan-build." or something similar? (And the analogous rewriting for analyzer() and intercept()?)

78 ↗(On Diff #90566)

I think you can make your intent more clear with something like:

if arg.plugins == None:
     arg.plugins = []
79 ↗(On Diff #90566)

If you are really worried about future violations of this comment, it would be better to extract the checks in question into a well-named function so that any added checks are guaranteed to only be executed after the input is normalized.

When possible it is better to rely on well-structured code to avoid future pitfalls rather than leaving a land mine and hoping future contributors read and follow the comments.

130 ↗(On Diff #90566)

The name of this function makes it sound like it is analyzing a parser, but if I understand correctly its purpose is to create a parser.

You could make this more clear by renaming the function to something like 'create_analyze_parser'.

With a name that mentions what the function does you could then avoid needing to use jargon like "factory method" in the description. How about "Creates a parser for command-line arguments to 'analyze'"? (And the analogous for 'intercept'?)

143 ↗(On Diff #90566)

Is this correct? I think it is the opposite, since the intercept approach loses coverage for build systems that generate/modify source code in place.

What are the scenarios under which 'intercept' and then 'analyze' has better coverage than intertwined building and analysis?

As Laszlo noted in in D30600, capitalizing while upstreaming is making diffing with his out-of-tree sources more noisy. We can hold off on consistent capitalization until later. But we do need comments in prose, otherwise we can't tell what the intent of the upstreamed patch is and so can't review it properly.

Thanks Devin for the comments. I'm happy to have your improvement suggestions crafted, scratched. It helps me to understand what part is debated... See my replies inlined.

tools/scan-build-py/libscanbuild/analyze.py
67 ↗(On Diff #90566)

Good point, will fix that.

tools/scan-build-py/libscanbuild/arguments.py
6 ↗(On Diff #90566)

Thanks for the suggestion, will use those words. I agree it's better to be more explicit.

44 ↗(On Diff #90566)

Good point... Remember, I was hesitating between to parse_and_validate_analyze_build_args and something longer. Then I was choose the one word approach... How about analyze_build_arguments, scan_build_arguments and intercept_build_arguments? My point is, it not only parses, but parse-normalize-validate for command X. But at the end, it gives the parsed-normalized-validated arguments.

59 ↗(On Diff #90566)

Sure, will fix it.

78 ↗(On Diff #90566)

Sure.

79 ↗(On Diff #90566)

To be honest, I would write test against this worry. But that is a lot of work. (Requires to write a dummy analyzer plugin.) And it would not only test this code, but clang too, so that would be a bit fragile test.

Might create a normalize_analyze_arguments function which is called before the validation. And this method would renamed to validate_analyze_arguments. How does that sound for you?

130 ↗(On Diff #90566)

Good points. create_analyze_parser and create_intercept_parser sound good to me. Will do update the pydoc strings too.

143 ↗(On Diff #90566)

If you check it, I was not change any argument help text. This is how it was before in the analyze module. If you want to fix the documentation shall we do it in a separate PR?

But as far as I can recall, I was trying to refer that intercept library covers more build system than the wrapper approach. Might be debatable that statement. Might be better write this information somewhere else... Is it a blocker for this PR or can we discuss in another thread?

dcoughlin added inline comments.Mar 6 2017, 8:56 AM
tools/scan-build-py/libscanbuild/arguments.py
44 ↗(On Diff #90566)

The problem with analyze_build_arguments is that it starts with an action ("analyze") that is not what the function does. This means that at the call site someone reading that code may think that the function analyzes the build arguments -- but it doesn't.

In my opinion it is fine to use "parse" as the action and lump in validation and normalization into that action-- most parsers, including clang's, do some amount of validation and normalization as part of parsing so to me this would not be unexpected. But if you're worried about confusion then the longer, more descriptive parse_and_validate action seems fine to me as well. It is better to be clear at the call site than terse.

79 ↗(On Diff #90566)

This sounds great.

143 ↗(On Diff #90566)

It is not a blocker -- we can take to another thread.

fixed review comments.

dcoughlin accepted this revision.Mar 8 2017, 7:17 AM

Looks great!

This revision is now accepted and ready to land.Mar 8 2017, 7:17 AM
This revision was automatically updated to reflect the committed changes.

Thanks Devin. I forgot to add the arguments.py module at the commit, so this does not show it. But made another commit which has it. Sorry.