This is an archive of the discontinued LLVM Phabricator instance.

[llvm-symbolizer] Switch command line parsing from llvm::cl to OptTable
ClosedPublic

Authored by MaskRay on Jul 9 2020, 10:53 PM.

Details

Summary

for the advantage outlined by D83639 ([OptTable] Support grouped short options)

Some behavior changes:

  • -i={0,false} is removed. Use --no-inlines instead.
  • --demangle={0,false} is removed. Use --no-demangle instead
  • -untag-addresses={0,false} is removed. Use --no-untag-addresses instead

Added a higher level API OptTable::parseArgs which handles optional
initial options populated from an environment variable, expands response
files recursively, and parses options.

Diff Detail

Event Timeline

MaskRay created this revision.Jul 9 2020, 10:53 PM
Herald added a project: Restricted Project. · View Herald Transcript
MaskRay edited the summary of this revision. (Show Details)Jul 9 2020, 10:54 PM

-f is a GNU option, so we should keep that for compatibility. Is there a reason it's being removed?
... actually, the patch says -f is dropped, but from the tablegen, it looks like it's still there?

LGTM to the general idea of replacing cl::opt with tablegen options for simple command line tools. Will take a look when the grouped option issue is addressed.

MaskRay updated this revision to Diff 277276.Jul 12 2020, 12:25 AM
MaskRay retitled this revision from WIP: [llvm-symbolizer] Switch command line parsing from llvm::cl to OptTable to [llvm-symbolizer] Switch command line parsing from llvm::cl to OptTable.
MaskRay edited the summary of this revision. (Show Details)

Ready for review.

-f is a GNU option, so we should keep that for compatibility. Is there a reason it's being removed?
... actually, the patch says -f is dropped, but from the tablegen, it looks like it's still there?

Fixed. A subsequent patch can drop -f= and -e= which are not conventional. This patch tries hard to stick with the current behavior to prevent surprise.

Honestly, I'm not convinced by the change, looking at this patch, at least not without much more stuff being added to common library code. In particular, I don't like how significant code needs adding in this patch to handle a number of things that were previously handled by the CommandLine code, for example help handling, type validation, and unknown argument detection.

In high-level terms, I don't think I really care either way which command line processing option is used, if it is straightforward to use, so I'm not opposed to a switchover, but at the moment it seems to be significantly complicating things, not simplifying, for relatively little gain.

(P.S. Don't forget to update documentation).

llvm/test/tools/llvm-symbolizer/untag-addresses.test
5

This isn't mentioned in the description, but probably should be.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
75–76

Is this redundant now? You've added code that looks like it prints it.

MaskRay updated this revision to Diff 278847.Jul 17 2020, 10:52 AM
MaskRay marked 2 inline comments as done.
MaskRay edited the summary of this revision. (Show Details)

Rebase after OptTable change was in.

MaskRay added a comment.EditedJul 17 2020, 11:03 AM

Honestly, I'm not convinced by the change, looking at this patch, at least not without much more stuff being added to common library code. In particular, I don't like how significant code needs adding in this patch to handle a number of things that were previously handled by the CommandLine code, for example help handling, type validation, and unknown argument detection.

I think the library complexity is OK - 50 lines in OptTable.cpp which can be shared with other utilities.

In high-level terms, I don't think I really care either way which command line processing option is used, if it is straightforward to use, so I'm not opposed to a switchover, but at the moment it seems to be significantly complicating things, not simplifying, for relatively little gain.

cl::ParseCommandLineOptions has a baked-in interface, with a bunch of customizations the user may or may not want to accept. For example, -bool=0 & -bool=false may feel exotic for users coming from GNU getopt_long. --flag & --no-flag requires explicit position comparison.

OptTable appears to be a lighter interface with less customization. It does not call Tbl.findNearest, so the tool needs to do it by itself. Other than the few boilerplate lines I don't find the complexity is significant. We could move spelling checking to OptTable if more utilities want to use OptTable.

(P.S. Don't forget to update documentation).

After the switchover, the interface very closely matches the previous behavior. I find that there is not a need for a change to docs/CommandGuide/llvm-symbolizer.rst

-untag-addresses={0,false} is not documented (this looks to me a debug option). We can do it in a separate change.

MaskRay updated this revision to Diff 278868.Jul 17 2020, 11:31 AM

Add --no-inlines to docs/CommandGuide/llvm-symbolizer.rst

Sorry, I've been off for the past two working days, and had a bucket load of other reviews to look at too.

Honestly, I'm not convinced by the change, looking at this patch, at least not without much more stuff being added to common library code. In particular, I don't like how significant code needs adding in this patch to handle a number of things that were previously handled by the CommandLine code, for example help handling, type validation, and unknown argument detection.

I think the library complexity is OK - 50 lines in OptTable.cpp which can be shared with other utilities.

Right, I wasn't referring to the library complexity, I was referring to the fact that this change significantly increases the amount of code in llvm-symbolizer.cpp to parse the command-line, for only minimal benefit.

In high-level terms, I don't think I really care either way which command line processing option is used, if it is straightforward to use, so I'm not opposed to a switchover, but at the moment it seems to be significantly complicating things, not simplifying, for relatively little gain.

cl::ParseCommandLineOptions has a baked-in interface, with a bunch of customizations the user may or may not want to accept. For example, -bool=0 & -bool=false may feel exotic for users coming from GNU getopt_long. --flag & --no-flag requires explicit position comparison.

OptTable appears to be a lighter interface with less customization. It does not call Tbl.findNearest, so the tool needs to do it by itself. Other than the few boilerplate lines I don't find the complexity is significant. We could move spelling checking to OptTable if more utilities want to use OptTable.

You seem to be throwing out all the good things to simplify a couple of things though. Here is a (possibly non-exhaustive) list of things that were previously handled by the library, and now llvm-symbolizer has to do itself:

  1. Response file usage.
  2. Spell-checking of option names.
  3. Argument type checking and conversion (e.g. to integer, enum etc).
  4. Integer argument signedness checking.
  5. Unknown argument handling.
  6. Options specified via environment variables.
  7. Help text printing.

Most, if not all of these are things that would be useful in other tools too, and should be therefore in the library, ideally on by default. For example, there are multiple places that want to use response files (LLD, llvm-objcopy etc) which have very similar, or even identical code to achieve that. I accept that there are some things in that list, where the default behaviour may need to differ (e.g. -h should not print help in llvm-readelf), but I reckon most of these are fairly rare.

It seems like there are two main things that you want to change. 1) To not support =0/=false style arguments, and 2) to not have to explicitly compare positions of --[no-]option examples. Would a different approach resolve this much more simply: add a new type cl::flag or similar, which implicitly adds a "no" prefix version, and turns off the =0 etc parsing, and handles the last-one-wins approach itself? If you don't think this is useful, then I'd expect to see the OptTable code expanded to address my above points.

(P.S. Don't forget to update documentation).

After the switchover, the interface very closely matches the previous behavior. I find that there is not a need for a change to docs/CommandGuide/llvm-symbolizer.rst

-untag-addresses={0,false} is not documented (this looks to me a debug option). We can do it in a separate change.

I actually asked to get that documented some time after the original review landed (see D65769). I wasn't referring to this though, I was referring to the new --no-inlines option, at the time (that's been added now). Also, there are several inline examples in the doc which use -i=0, which would still need updating.

You seem to be throwing out all the good things to simplify a couple of things though. Here is a (possibly non-exhaustive) list of things that were previously handled by the library, and now llvm-symbolizer has to do itself:

  1. Response file usage.
  2. Spell-checking of option names.
  3. Argument type checking and conversion (e.g. to integers, enum, etc).
  4. Integer argument signedness checking.
  5. Unknown argument handling.
  6. Options specified via environment variables.
  7. Help text printing.

When looking at refactoring, I would emphasis direction over how close it is to some "ultimate" form. Global variables + llvm::cl isn't really sustainable, and the patch proposes to migrate away from it. To us, a community dealing with languages, it is quite natural to solve the problem of command-line parsing with a language. So I like the proposed solution.

The patch has implemented the features mentioned above, meets the minimal bar for refactoring -- not breaking anything, including user experience. They may not be implemented in some ideal places, but the progress models separation of concern. If we find that some demands are common enough, we can put them in OptTable API later.

You seem to be throwing out all the good things to simplify a couple of things though. Here is a (possibly non-exhaustive) list of things that were previously handled by the library, and now llvm-symbolizer has to do itself:

  1. Response file usage.
  2. Spell-checking of option names.
  3. Argument type checking and conversion (e.g. to integers, enum, etc).
  4. Integer argument signedness checking.
  5. Unknown argument handling.
  6. Options specified via environment variables.
  7. Help text printing.

When looking at refactoring, I would emphasis direction over how close it is to some "ultimate" form. Global variables + llvm::cl isn't really sustainable, and the patch proposes to migrate away from it. To us, a community dealing with languages, it is quite natural to solve the problem of command-line parsing with a language. So I like the proposed solution.

The patch has implemented the features mentioned above, meets the minimal bar for refactoring -- not breaking anything, including user experience. They may not be implemented in some ideal places, but the progress models separation of concern. If we find that some demands are common enough, we can put them in OptTable API later.

I think your minimal refactoring bar is perhaps different to mine - I would argue that a refactor that makes the code more complicated is not a good idea.

I'm not arguing against moving to use OptTable in general, but I think every single one of my examples there already has at least one, and most more than one, other place that has the same or closely related requirement. We should improve the library first before trying to refactor this code to use it.

I think your minimal refactoring bar is perhaps different to mine - I would argue that a refactor that makes the code more complicated is not a good idea.

I'm not arguing against moving to use OptTable in general, but I think every single one of my examples there already has at least one, and most more than one, other place that has the same or closely related requirement. We should improve the library first before trying to refactor this code to use it.

I see some complicated code this time and made a few suggestions. However, to me, duplicated functionality does not imply complicated code. "Complicated" means that the code itself is difficult to comprehend, with or without some other place that has similar pieces. I'm fine with duplicating logic that preexists elsewhere during refactoring because we often need to look at more than one copies of the work to figure out a proper abstraction.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
227

Uses of NewArgv finish here. Please consider moving these steps into a function.

232

Please consider moving this into a function with a name that suggests the intent.

288

For example, you may move these into a function called "decideHowToPrintFunctions," then you don't need to immediately show how to do that in code.

298

Line 250 seems to have similar logic. Please consider moving the functionality into a function.

MaskRay updated this revision to Diff 282122.Jul 30 2020, 9:31 PM
MaskRay marked 4 inline comments as done.

Address comments

MaskRay updated this revision to Diff 282125.Jul 30 2020, 9:36 PM

Expand some auto to opt::Arg

LGTM.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
257–261

You might have forgotten to remove this line.

MaskRay updated this revision to Diff 282128.Jul 30 2020, 10:29 PM
MaskRay marked an inline comment as done.

Delete an unused variable

jhenderson requested changes to this revision.Jul 31 2020, 12:12 AM

I've already said my piece. I'm not prepared to accept this as-is until some effort is made to move the logic into a library, like it was before, so that the size and complexity of the tool's command-line parsing code is similar to what it was before.

I think your minimal refactoring bar is perhaps different to mine - I would argue that a refactor that makes the code more complicated is not a good idea.

I'm not arguing against moving to use OptTable in general, but I think every single one of my examples there already has at least one, and most more than one, other place that has the same or closely related requirement. We should improve the library first before trying to refactor this code to use it.

I see some complicated code this time and made a few suggestions. However, to me, duplicated functionality does not imply complicated code. "Complicated" means that the code itself is difficult to comprehend, with or without some other place that has similar pieces. I'm fine with duplicating logic that preexists elsewhere during refactoring because we often need to look at more than one copies of the work to figure out a proper abstraction.

There are many factors to complexity. Clearly a solution that is easy to understand but takes up 2/3/4 etc times as many lines of code is more complex than an equivalently simple to understand single line of code. Beyond that, using library interfaces which already do all the work is also simpler than writing the code out by hand, as long as the library interface is simple enough, and I believe the cl::opt one is fairly straightforward. Using libraries is also significantly more maintainable, and ensures that library improvements and fixes are picked up in the tool without having to be copied to every place doing the same thing. This change is going in the opposite direction on all counts as things stand.

This revision now requires changes to proceed.Jul 31 2020, 12:12 AM

I've already said my piece. I'm not prepared to accept this as-is until some effort is made to move the logic into a library, like it was before, so that the size and complexity of the tool's command-line parsing code is similar to what it was before.

I see some complicated code this time and made a few suggestions. However, to me, duplicated functionality does not imply complicated code. "Complicated" means that the code itself is difficult to comprehend, with or without some other place that has similar pieces. I'm fine with duplicating logic that preexists elsewhere during refactoring because we often need to look at more than one copies of the work to figure out a proper abstraction.

There are many factors to complexity. Clearly a solution that is easy to understand but takes up 2/3/4 etc times as many lines of code is more complex than an equivalently simple to understand single line of code. Beyond that, using library interfaces which already do all the work is also simpler than writing the code out by hand, as long as the library interface is simple enough, and I believe the cl::opt one is fairly straightforward. Using libraries is also significantly more maintainable, and ensures that library improvements and fixes are picked up in the tool without having to be copied to every place doing the same thing. This change is going in the opposite direction on all counts as things stand.

I thought you were talking about the complication aspect, now I understand that you are talking about complexity, thus the net of relations in code. I would say it would be ideal if every refactoring can maintain low complexity while building a better abstraction. But, as you may know, perfection is the enemy of progress. I doubt designing and implementing new features in OptTable are the uttermost important thing we should work on right now and the thing that motivates this change.

If we adopt this change, then the next person will have a basis that does not involve global variables to work on. Then he or she can do a survey on existing demand in other LLVM projects to tell if we want these features to appear in libraries. I deem this picture as progress. Telling contributors "to get your patch landed, you need to work on a problem that I have only a list of feature titles, entirely outside the component that I'm reviewing" often goes the opposite direction on maintaining sustainability in an open-source project.

I thought you were talking about the complication aspect, now I understand that you are talking about complexity, thus the net of relations in code. I would say it would be ideal if every refactoring can maintain low complexity while building a better abstraction. But, as you may know, perfection is the enemy of progress. I doubt designing and implementing new features in OptTable are the uttermost important thing we should work on right now and the thing that motivates this change.

I'm not sure I understand your distinction between complication and complexity. Perfection may well be the enemy of progress, but my point is that in its current state, I don't think this is progress. The global variables aren't causing any real issues (and indeed, they are effectively still there, just slightly hidden behind the tablegen abstraction). The issue is a lack of functionality in the cl::opt library for "last-one-wins" type functionality and similarly missing support for "--no-" type flags. Neither of these are exactly pressing issues, since there can already be implemented in fairly straightforward ways using the existing code. However, I'm willing to support changes to make them more natural, as long as they are net improvements. I think this change in its current form is one step forward (making the flags more natural) and about eight steps back (see list of things that the code now has to do itself which it didn't before).

If we adopt this change, then the next person will have a basis that does not involve global variables to work on. Then he or she can do a survey on existing demand in other LLVM projects to tell if we want these features to appear in libraries. I deem this picture as progress. Telling contributors "to get your patch landed, you need to work on a problem that I have only a list of feature titles, entirely outside the component that I'm reviewing" often goes the opposite direction on maintaining sustainability in an open-source project.

In this case, there are multiple clients already out there that require some or all of the features I've already listed. LLD and llvm-objcopy are just two examples. @MaskRay and I are already familiar with both code bases, so it's not like it's an arduous task to figure out that there is need for these features. LLVM is at its core a series of libraries, with various tools implemented on top of them. If we don't work to make sure as much useful code as possible is in those libraries (useful in the sense of there being more than one tool using it), then the wheel will be reinvented over-and-over again with all the corresponding costs and maintenance burden. For example, there have been issues in response file handling in one tool, because it was implemented differently there to others (I forget exactly which one and don't have the time right now to find the reference) - if these had all used a single version of code contained in a library, then the problem would never have occurred.

If on the other hand, we were to accept this patch in its current form, there's no requirement or indeed any real likelihood anybody will come along and add the missing functionality at some point, because the code already does the job it needs to. That leaves this code in a worse state than it currently is in.

MaskRay updated this revision to Diff 282252.Jul 31 2020, 9:53 AM
MaskRay edited the summary of this revision. (Show Details)

Add a higher level API OptTable::parseArgs

Many thanks to @lichray's feedback!

  1. Response file usage.
  2. Spell-checking of option names.
  3. Argument type checking and conversion (e.g. to integer, enum etc).
  4. Integer argument signedness checking.
  5. Unknown argument handling.
  6. Options specified via environment variables.
  7. Help text printing.

1, 2, 5 and 6 are handled by the new high-level API OptTable::parseArgs.

For 7, I think printHelp is as simple as if we handle it in the library.

For 3 and 4, they are handled by the same function parseIntArg and the underlying llvm::to_integer.
We could save some code by adding a new class representing the option type to .td files.
The appropriate abstraction requires more thoughts and probably needs some a few more cases
for us to form an opinion.

FWIW, llvm-symbolizer.cpp is 17 lines shorter than the previous revision.

Thanks. This looks much better to me. I hope to see other users for the new higher-level API in due course!

llvm/docs/CommandGuide/llvm-symbolizer.rst
222–223

I believe this option no longer exists?

llvm/include/llvm/Option/OptTable.h
236

argc -> Argc, argv -> Argv to silence the linter (same throughout).

llvm/lib/Option/OptTable.cpp
516

Nit: I might add a question mark to the end of this message, since it's a question.

bd1976llvm added inline comments.
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
272

tab?

MaskRay marked 3 inline comments as done.Aug 3 2020, 9:59 AM
MaskRay added inline comments.
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
272

No. New Phabricator displays this when it detects that the indentation changes.

MaskRay updated this revision to Diff 282661.Aug 3 2020, 9:59 AM

Address comments

jhenderson accepted this revision.Aug 4 2020, 12:52 AM

Looks good, but this probably needs a release note to highlight the change in how to disable certain options, in case anybody was relying on the old behaviour.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
203

Argc, Argv here too.

This revision is now accepted and ready to land.Aug 4 2020, 12:52 AM
MaskRay updated this revision to Diff 282933.Aug 4 2020, 8:45 AM

argc -> Argc

MaskRay updated this revision to Diff 282935.Aug 4 2020, 8:51 AM

Add some MetaVarName<>

This revision was landed with ongoing or failed builds.Aug 4 2020, 8:53 AM
This revision was automatically updated to reflect the committed changes.

@MaskRay http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/33877/ is failing with the error:

$ "c:\ps4-buildslave2\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\build\bin\llvm-symbolizer.exe" "-obj=c:\ps4-buildslave2\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm-project\llvm\test\tools\llvm-symbolizer\pdb/Inputs/test.exe" "-demangle=false"
# command stderr:
error: unknown argument '-demangle=false'

@MaskRay http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/33877/ is failing with the error:

$ "c:\ps4-buildslave2\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\build\bin\llvm-symbolizer.exe" "-obj=c:\ps4-buildslave2\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm-project\llvm\test\tools\llvm-symbolizer\pdb/Inputs/test.exe" "-demangle=false"
# command stderr:
error: unknown argument '-demangle=false'

Fixing

@MaskRay http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/33877/ is failing with the error:

$ "c:\ps4-buildslave2\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\build\bin\llvm-symbolizer.exe" "-obj=c:\ps4-buildslave2\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm-project\llvm\test\tools\llvm-symbolizer\pdb/Inputs/test.exe" "-demangle=false"
# command stderr:
error: unknown argument '-demangle=false'

Fixing

Fixed in 0729a772806e5ae38603c164c2f60e5e9f9e65e5. I don't have Windows or HAVE_DIA_SDK to test it, though.

rupprecht added inline comments.Aug 10 2020, 11:25 AM
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
287

This is documented as a flag, why was it removed?

MaskRay added inline comments.Aug 10 2020, 3:36 PM
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
287

It was unintentional. I think we may need to add/rename the negative option to --no-use-symbol-table. Some investigation is needed to create a test for it. I think previously it was probably a debug-purpose option, so there is no use-symbol-table=false test.

Dor1s added a subscriber: Dor1s.Sep 2 2020, 5:26 PM

Fangrui, what made you think it's fine to remove documented options and break compatibility for the users of the tool?

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
36

@MaskRay you silently removed this option and didn't even update the documentation: https://llvm.org/docs/CommandGuide/llvm-symbolizer.html#cmdoption-llvm-symbolizer-use-symbol-table

Dor1s added inline comments.Sep 2 2020, 5:28 PM
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
287

Please avoid landing changes like this in future. Either do the investigation and make sure everything is fixed (i.e. documentation updated, no compatibility changes), or do not remove stuff silently like this.

MaskRay added a comment.EditedSep 2 2020, 5:41 PM

Please avoid landing changes like this in future. Either do the investigation and make sure everything is fixed (i.e. documentation updated, no compatibility changes), or do not remove stuff silently like this.

I have mentioned that (nearly one month ago) it was my omission and I apologized for that.

Fangrui, what made you think it's fine to remove documented options and break compatibility for the users of the tool?

I don't think the option is properly documented, though. This is a large refactoring and it is difficult to ensure every option is good. I had tried very hard as you can probably see from the history of this patch. I shall also mention that this option is a completed untested feature so I somehow missed it. To be more constructive, do you have a use case where this option or its inverse is needed?

@Dor1s Sent D87067.

I don't know anyone uses --use-symbol-table or --no-use-symbol-table, so I will not add them. When we do so, we should ensure proper test coverage.

Dor1s added a comment.Sep 9 2020, 12:55 PM

To be more constructive, do you have a use case where this option or its inverse is needed?

Yes, removal of this argument did break one pretty large system used by different projects / companies. We had to investigate and hotfix it on our end, but my point here that in general it's better not to break compatibility of the existing interfaces (doesn't matter whether we're talking about a cmd utility or an API), unless there's a very strong reason to do so.