This is an archive of the discontinued LLVM Phabricator instance.

Move option parsing out of the Args class
ClosedPublic

Authored by labath on Feb 27 2018, 3:50 PM.

Details

Summary

The args class is used in plenty of places (a lot of them in the lower lldb
layers) for representing a list of arguments, and most of these places don't
care about option parsing. Moving the option parsing out of the class removes
the largest external dependency (there are a couple more, but these are in
static functions), and brings us closer to being able to move it to the
Utility module).

The new home for these functions is the Options class, which was already used
as an argument to the parse calls, so this just inverts the dependency between
the two.

The functions are themselves are mainly just copied -- the biggest functional
change I've made to them is to avoid modifying the input Args argument (getopt
likes to permute the argument vector), as it was weird to have another class
reorder the entries in Args class. So now the functions don't modify the input
arguments, and (for those where it makes sense) return a new Args vector
instead. I've also made the addition of a "fake arg0" (required for getopt
compatibility) an implementation detail rather than a part of interface.

While doing that I noticed that ParseForCompletion function was recording the
option indexes in the shuffled vector, but then the consumer was looking up the
entries in the unshuffled one. This manifested itself as us not being able to
complete "watchpoint set variable foo --" (because getopt would move "foo" to
the end). Surprisingly all other completions (e.g. "watchpoint set variable foo
--w") were not affected by this. However, I couldn't find a comprehensive test
for command argument completion, so I consolidated the existing tests and added
a bunch of new ones.

Diff Detail

Repository
rL LLVM

Event Timeline

labath created this revision.Feb 27 2018, 3:50 PM

I don't have any objections to the contents of this patch per se. But I wonder if having to do all this work to separate the uses of Args from the options parser so we don't drag it in in some low level uses doesn't rather mean the Args class was not appropriate for use at that level, when what they really meant was just a collection of strings. For instance, we use the Args class to hold arguments to pass to running processes. That's convenient when we parse them from commands into run-args, but hardly essential to the function of passing a collection of strings to posix_spawnp or its like.

I haven't gone through and audited all the uses you are trying to separate out from the options part of Args, but if possible it would be cleaner to have the class that's supposed to be cheek to jowl with the command line parsing, and another to store a list of strings.

I don't have any objections to the contents of this patch per se. But I wonder if having to do all this work to separate the uses of Args from the options parser so we don't drag it in in some low level uses doesn't rather mean the Args class was not appropriate for use at that level, when what they really meant was just a collection of strings. For instance, we use the Args class to hold arguments to pass to running processes. That's convenient when we parse them from commands into run-args, but hardly essential to the function of passing a collection of strings to posix_spawnp or its like.

I haven't gone through and audited all the uses you are trying to separate out from the options part of Args, but if possible it would be cleaner to have the class that's supposed to be cheek to jowl with the command line parsing, and another to store a list of strings.

I was thinking about that as well, but eventually I chose this approach. The reason for that is that there is already functionality in the Args class that is useful for posix_spawn and friends, and that's the ability to turn itself into an argv vector. So, the new class couldn't just be a vector<string>, but it would need some additional smartness. So after implementing that, I think I would need to find a way to rip that code out of the Args class (maybe by making the new class a member of Args). So the end result may end up being very similar, we would just reach in it a different way.

The other reason I want to move this out is the single responsibility principle. Right now, I can identify about 4 responsibilities of the Args class:

  • being a representation of a list of arguments (I put it as a separate item because of the argv requirement)
  • parsing a string into a list of arguments
  • parsing a list of arguments into options
  • parsing a string into random other objects (via various static functions)

Now we probably don't want to go all out and split this into 4 classes, but I figured the first two items are enough for a single class (one could even argue the two items are actually a single thing). I think parsing a string into args and parsing args into options are two sufficiently complicated algorithms that makes sense to keep them separate.

The thing I'm not 100% clear on is whether the Options class is the best home for these functions. The reason I chose this at the end (instead of e,g, putting it in a new class) was because I saw this pattern in the CommandObject:

options->NotifyOptionParsingStarting(&exe_ctx);
...
options->Parse(...);
...
options->NotifyOptionParsingFinished(&exe_ctx);

It seemed to be that having these functions in the Options class would open up possibilities for simplifying the Options interface by folding the Notify functions into the Parse call.

I still worry a bit because there's another unstated responsibility for Args which is that even though it is going to get used at a very high level in lldb it has to NOT depend on anything you don't want lldb-server to depend on. That seems like a more slippery responsibility, and one that's worth stating explicitly by making the part of Args that gets used in lldb-server its own class. But on the implementation wins principle, as long as this doesn't worry you, I'm content.

BTW, most the string -> int/address/whatever conversion functions don't belong in Args at all. It makes these fairly useful functions hard to find, and we should have some string conversion utility to hold all these convenience functions. There are a few (like picking an enum option string from a set of enums) that belong more properly in options. But most of them have no real relation to Args. But that's an orthogonal issue.

zturner added inline comments.Feb 28 2018, 11:06 AM
include/lldb/Interpreter/Options.h
123–126 ↗(On Diff #136168)

It appears that all of these could be static functions. Can we do that?

I still worry a bit because there's another unstated responsibility for Args which is that even though it is going to get used at a very high level in lldb it has to NOT depend on anything you don't want lldb-server to depend on.

This will be enforced by moving the class (once I get rid of the extra static functions like you mentioned) into the Utility module. Nothing in the Utility module can depend on anything outside of that module.

include/lldb/Interpreter/Options.h
123–126 ↗(On Diff #136168)

They can't be. All of them access the this object. If you look at the original functions, they were taking an Options& as an argument and Args as this. These have that inverted.

It would be nice if we could eventually get rid of the need to pass in a platform and execution context here, but that's work for another day.

include/lldb/Interpreter/Options.h
123–126 ↗(On Diff #136168)

I originally searched for m_ and didn't find anything so assumed they could be static. But it looks like they are calling member functions, which is why I didn't see it. It's too bad it can't even be const, given that it returns a copy of the args. Seems like an awkward interface, maybe future cleanup can try to tackle that though. Anyway, ignore my comment.

labath added inline comments.Feb 28 2018, 11:40 AM
include/lldb/Interpreter/Options.h
123–126 ↗(On Diff #136168)

The returning of args is kind of a side effect of the function. The return value contains the arguments that could not be parsed into options (more precisely, the positional arguments). The main effect of the function is that it populates the Options object with the options that have been parsed. (I should probably add that to the comment).

Okay, that sounds good then. Will you enforce the rule about the Utilities directory socially or by some mechanism?

Okay, that sounds good then. Will you enforce the rule about the Utilities directory socially or by some mechanism?

If you mean the rule that Utility can't depend on anything else, I think it's enforced implicitly. because we have a UtilityTests unit test that only links against Utility and nothing else. If anything else gets brought in, this binary will fail to link.

That said, a while back I wrote a python script (checked in under lldb/scripts) which dumps the set of header dependencies for every project. It would perhaps be a useful exercise to write a lit test that enforces this by dumping the deps of Utility and ensuring that they're empty.

Okay, that sounds good then. Will you enforce the rule about the Utilities directory socially or by some mechanism?

You will get a link error because the Utility unittest executable will have missing symbols. I'm not sure how xcode builds the unit test executables; if it goes through cmake you will definitely get it locally; if not, at the very latest you will get it from one of the bots.

labath updated this revision to Diff 136355.Feb 28 2018, 12:11 PM

Add a slightly longer comment to the Parse function operation.

The separate gtest directories don't get build separately the way Todd set the Xcode project up. The tests get built into one combo .a file that links to liblldbcore.a, and then into the test binary. So the Xcode build wouldn't see that failure.

Since you are building .a files not dylibs from Utility, the UtilityTest won't show a failure for a class that has a dependency that the tests binary doesn't use, so this sort of thing could creep in. Another reason to be vigilant about the test coverage.

Once your whole project is done, lldb-server shouldn't build if something it uses from Utility uses something not in Utility (or in some other .a file that lldb-server depends on). That doesn't help altogether, since macOS only builds the platform part of lldb-server, so the macOS builds won't check usage in the other parts of lldb-server. But the bots should be good enough.

labath added a comment.Mar 8 2018, 7:40 AM

Is there anything else you wanted me to do here?

jingham accepted this revision.Mar 8 2018, 10:20 AM

Nope.

This revision is now accepted and ready to land.Mar 8 2018, 10:20 AM
This revision was automatically updated to reflect the committed changes.