This is an archive of the discontinued LLVM Phabricator instance.

Remove Args::m_argv
Needs RevisionPublic

Authored by zturner on Sep 26 2016, 10:16 PM.

Details

Reviewers
clayborg
Summary

The Args class was trying to maintain two parallel argument vectors. One that owned the memory and contained std::strings, and one that contained const char *'s that could be used similar to how argv is used.

But dealing with pointers to pointers is unsafe and should be avoided wherever possible. Furthermore, there doesn't seem to be much point in maintaining a second copy of the array, as it only serves to complicate the parsing logic since the two have to be kept in sync with each other, and really, the master copy has everything you need, including the null terminated strings.

So I removed m_argv entirely. The Args class will also no longer vend a const char**, and instead you can pass it a std::vector<const char*>& and it will fill it with null terminated argument strings. This can be used exactly like a const char** by writing &args[0], but now you have to explicitly opt into this, and by default you just get a nice safe container that can be used for example in ranged based fors. And the logic is simplified as well since the two arrays don't have to be kept in sync with each other.

All tests pass on Windows.

Diff Detail

Event Timeline

zturner updated this revision to Diff 72601.Sep 26 2016, 10:16 PM
zturner retitled this revision from to Remove Args::m_argv.
zturner updated this object.
zturner added a reviewer: clayborg.
zturner added subscribers: lldb-commits, Restricted Project.

Ignore the changes to FileSpec, it seems the two CLs I was working on got mixed together.

labath added a subscriber: labath.Sep 27 2016, 3:44 AM
labath added inline comments.
include/lldb/Interpreter/Args.h
150–159

I think doxygen will complain about @return in a function returning void.

BTW, with the NRVO, move semantics and everything, is there any reason to keep using the return-through-reference-argument pattern? It would be a lot more natural if this indeed _was_ a return value.

amccarth added inline comments.
include/lldb/Interpreter/Args.h
159

Perhaps this should actually return the vector instead of void, and rely on the return value optimization or, at worst, vector move. It would be easier to use and it would let you initialize variables as you declare them, (which means you could use it in constructor intializers and you could declare the vector as const). It also would eliminate the ambiguity of what happens if the args already contains some values (e.g., the caller might expect it to append rather than replace).

source/API/SBCommandInterpreter.cpp
119

Here's an example where, if GetArgumentVector actually returned the result, you could simplify the delcaration as:

const auto argv = command.GetArgumentVector();
source/Host/common/FileSpec.cpp
251 ↗(On Diff #72601)

Should name be const? As in:

for (const auto &name : name_list) { ... }
source/Interpreter/Args.cpp
40–41

I think your change eliminated the need for this comment. If not, it at least needs an update.

261

clang-format didn't do a great job wrapping the long lines in this comment. Can you fix it up?

411

This is an example where the ambiguity of whether GetArgumentVector appends or not could mislead the caller into doing the wrong this. This looks plausible (and more efficient):

std::vector<const char *> args;
args.push_back("dummy");
GetArgumentVector(args);

If GetArgumentVector returned the vector, then this ambiguity wouldn't exist.

clayborg requested changes to this revision.Sep 27 2016, 10:17 AM
clayborg edited edge metadata.

See inlined comments.

source/Interpreter/Args.cpp
264–265

Remove this if, it is no longer needed. Just return m_args.size()

269–272

Convert this over to return StringRef?

286–288

Remove this comment. The whole point off this function is to get an argument vector that is NULL terminated. Feel free to rename the function as desired to indicate this, but that is the whole point of this function. Feel free to document it if needed as well.

327

Is there a conversion operator that converts StringRef to std::string somewhere? How is this working? Iterators being detected?

373–376

Should we change this to an assignment operator? If so, rename "other" to "rhs".

source/Interpreter/CommandInterpreter.cpp
2049–2050

If we do the operator = above, then:

cmd_args = new_args;
2059–2060

If we do the operator = above, then:

cmd_args = new_args;
source/Interpreter/CommandObject.cpp
119

Why was this removed?

source/Interpreter/OptionValueArgs.cpp
33

args = argv;

source/Interpreter/OptionValueArray.cpp
151

args = argv;

source/Target/ProcessInfo.cpp
94

What not have a SetArguments that takes a "const char **" and does this internally? It is no less safe and makes the API more usable

This revision now requires changes to proceed.Sep 27 2016, 10:17 AM
zturner added inline comments.Sep 27 2016, 10:33 AM
source/Interpreter/Args.cpp
269–272

Yes, that is the next item on my list. But this function is used EVERYWHERE, so that will be a very large CL and should be done orthogonally.

286–288

There are two ways we use these argument vectors. One is in a call to a function which takes an argc and an argv. In that case the null terminator is unnecessary because you're specifying the argc explicitly. Another is in a call to a function which you do not specify an argc, and in tht case it requires it to be null terminated.

The point of this function isn't "return me something that's null terminated", it's "return me something I can use like an argv". That doesn't require null termination. And in fact, if you look at the callsites I fixed up, not a single one depends on the null termination. Everyone just needs to know the size. And for that you can call result.size(). If you add the null terminator, then you have to write result.size()-1, which is unnecessarily confusing. I believe it provides the most logical behavior as it is currently written.

327

Yes, StringRef has an operator std::string().

373–376

I thought about it, but it seems like a minor stylistic issue. I think it could go either way. I'm fine adding it or not.

source/Interpreter/CommandObject.cpp
119

Because it's a little weird to do it at this high of a level. This requires internal knowledge of the workings of Args::ParseOptions(), which itself calls OptionParser::Parse(), It's strange for a function this high up to require such low-level knowledge of the workings of a function. So instead, I moved this behavior into Args::ParseOptions. If you check that function, you will see that it injects this argument into the beginning.

This is nice because now it never modifies the internal state of the Args itself, but only the copy that gets passed to getopt_long_only.

source/Target/ProcessInfo.cpp
94

I would eventually like to get rid of all of these const char*'s. For example, this function's signature could itself be changed to take an ArrayRef<const char *>. Then we can keep raising this conversion higher and higher up. I actually want to make it more awkward to use unsafe types like const char **. So by adding the const char ** overload to Args, it doesn't encourage people to consider the alternative ArrayRef. On the other hand, this is similar to StringRef, in that once you have the ArrayRef<const char *> through all levels, you will almost never need this function, and calls like this will disappear as the ArrayRefs are being passed all the way down.

Ok, so just add the boolean arg to Args::GetArgumentVector and we are good to go.

source/Interpreter/Args.cpp
270–272

Ok, sounds good, for a later CL then.

286–288

Remove the comment. Even in "main(int argc, const char **argv)" the "argv" is null terminated. So all unix variants expect the null termination. The correct solution is to add a boolean argument:

std::vector<const char *> Args::GetArgumentVector(bool null_terminate = true) const {

Then it is clear.

source/Interpreter/CommandObject.cpp
119

Sounds good. I hate the fact that we have to do this. We might be able to play with the getopt_long globals to work around this and not have to add it, but as long as you took it into account I am fine.

zturner added inline comments.Sep 27 2016, 10:44 AM
source/Interpreter/Args.cpp
286–288

Even in main though, the null is not counted in the argc, right? So as written it will conform to the semantics of main. argc is the number of non-null pointers, but the null is still there anyway. If you add the boolean argument and pass true, it will give you back something whose argc (i.e. result.size()) is one too high.

zturner added inline comments.Sep 27 2016, 10:52 AM
source/Interpreter/Args.cpp
286–288

BTW, llvm::SmallString uses this same trick to return a null terminated string without modifying the size. Look at the implementation of SmallString::c_str() in SmallString.h