This is an archive of the discontinued LLVM Phabricator instance.

[Tooling] Improve handling of CL-style options
ClosedPublic

Authored by hamzasood on Aug 27 2018, 2:05 PM.

Details

Summary

This patch fixes the handling of clang-cl options in InterpolatingCompilationDatabase.
They were previously ignored completely, which led to a lot of bugs:

  • Additional options were being added with the wrong syntax. E.g. a file was specified as C++ by adding -x c++, which causes an error in CL mode.
  • The args were parsed and then rendered, which means that the aliasing information was lost. E.g. /W3 was rendered to -Wall, which in CL mode means -Weverything.
  • CL options were ignored when checking things like -std=, so a lot of logic was being bypassed.

Diff Detail

Event Timeline

hamzasood created this revision.Aug 27 2018, 2:05 PM
rnk added inline comments.Aug 27 2018, 2:16 PM
lib/Tooling/InterpolatingCompilationDatabase.cpp
182–184

I think you need to do this up front, otherwise you'll parse some args in non-cl mode and some in cl mode.

hamzasood updated this revision to Diff 162762.Aug 27 2018, 3:27 PM

Thanks, I've changed it so that the driver mode is calculated first.
I've also extended the unit test to include the various --driver-mode combinations.

rnk added inline comments.Aug 27 2018, 5:05 PM
lib/Tooling/InterpolatingCompilationDatabase.cpp
136

We support being called "CL.exe", but with our new VS integration, I don't think it matters to check for this case anymore. We may remove it over time.

140

Can you not keep using ParseArgs? It takes FlagsToInclude and FlagsToExclude, which you can compute outside the loop now.

156–169

I think the old for loop was nicer. I don't think this code needs to change, you should be able to use ParseArgs with the extra flag args.

hamzasood marked an inline comment as done.Aug 28 2018, 1:49 AM
hamzasood added inline comments.
lib/Tooling/InterpolatingCompilationDatabase.cpp
136

Should I add the check anyway?

140

Good point, I forgot to move the flags out of the loop when moving the —driver-mode check.

But I don’t think ParseArgs is suitable (see the other comment response).

156–169

The problem with the old loop is that we lose aliasing information (e.g. /W3 becomes -Wall). While ParseOneArg also performs alias conversion, it gives us indices for each individual argument. The last line of the new loop copies arguments by using that index information to read from OldArgs, which gives us the original spelling.

Hi, sorry about overlooking cl mode flags when adding this, I was blissfully unaware that compile_commands.json could use that syntax :-)

Out of curiosity, are you using this with clangd or some other tool? I'm sure there are places where clangd injects unixy flags...
Will take a look and try to understand.

Be aware of D51314 which is fixing some nasty performance pitfalls of InterpolatingCompilationDatabase (it should be logic neutral, but moves around the parsing code).

Could you reupload the patch with context?

hamzasood updated this revision to Diff 162872.Aug 28 2018, 8:15 AM
  • Re-uploaded with full context.

Yeah, I encountered these issues while using clangd on Windows. I haven't run into any clangd issues after applying this patch, but admittedly my usage is very basic (pretty much just code completion). It may well be subtly broken in other places.

Thanks for fixing this!
Mostly just picky style comments.
In particular, I know that some of the other maintainers here are just as ignorant of the cl driver as I am, and I want to make sure that it's still possible to follow the logic and debug unrelated problems without needing to know too much about it.

If some of the style bits is too annoying or unclear, happy to do some of it myself as a followup, let me know!

lib/Tooling/InterpolatingCompilationDatabase.cpp
124

nit: a two-value enum would be clearer and allow for terser variable names (Mode)

156

would you mind reverting this change and just wrapping the old Argv in an InputArgList?
I'm not sure the lifetime comments and std::transform aid readability here.

156–169

Makes sense, please add a comment summarizing. ("We don't use ParseArgs, as we must pass through the exact spelling of uninteresting args. Re-rendering is lossy for clang-cl options, e.g. /W3 -> -Wall")

175

these names are somewhat confusing - "End" neither marks the end of the loop nor the end of the current item (as it's initialized to Start).
What about:

for (unsigned Pos = 1; Pos < OldArgs.size();) {
  ...
  unsigned OldPos = Pos;
  Arg = ParseOneArg(..., Pos);
  ...
  NewArgs.insert(NewArgs.end(), &OldArgs[OldPos], &OldArgs[Pos]);
}
178

This seems a bit verbose.

First, do things actually break if we just use the default 0/0 masks? We're not trying to interpret all the flags, just a few and pass the rest through.

Failing that, it seems clearer to just use a ternary to initialize Included/Excluded, or inline them completely.
(Please also drop the extra scope here).

186

Arg.reset(OptTable->ParseOneArg...)

194

can we inline this and just || all the options together here?

197

please keep the mentions of -x etc even if not comprehensive - it's hard to precisely+tersely describe these flags in prose.

203

ditto --std

205

prefer *either* optional<> or allowing the function to return lang_unspecified, but not both. (There's no way a user can *explicitly* pass a flag saying the language is unspecified, right?)

225

as above, please revert comment

231

can we unify as addTypeFlag(TargetType, Mode, Result.CommandLine)?

258

Why do we need to take IsClMode into account while reading flags?
How would OPT__SLASH_Fa get parsed if we weren't in CL mode? Would we care?

(and below)

267

parseTypeArg?

283

parseStandardArg?

unittests/Tooling/CompilationDatabaseTest.cpp
652

I'm not sure the parameterized test is the right model here.
The INSTANTIATE_TEST_P and friends makes the test fixture pretty complicated, and failure messages hard to relate to input conditions.
Then the actual tests have a bunch of assertions conditional on which mode we're in.
Finally, we don't actually test any mixed-mode database.

Could we write this a little more directly?:

  • pass the driver mode flag to add() in the normal way, in the Flags param
  • require callers to "add" to pass "clang" or "clang-cl". (It's OK that this makes existing cases a bit more verbose)
  • explicitly test the clang-cl and --driver-mode cases we care about, rather than running every assertion in every configuration

e.g.

TEST_F(InterpolateTest, ClangCL) {
  add("foo.cc", "clang");
  add("bar.cc", "clang-cl");
  add("baz.cc", "clang --driver-mode=clang-cl");

  EXPECT_EQ(getCommand("a/bar.h"), "clang-cl -D a/baz.cc /TP");
}

Thanks for the detailed feedback; I’ll try to get a new patch up in a few hours.

However I disagree with some of them (see replies).

lib/Tooling/InterpolatingCompilationDatabase.cpp
124

The advantage of a Boolean is that it makes the checks simpler. E.g. if (isCL) instead of if (mode == DriverMode::CL) or something.

Happy to change it though if you still disagree.

156

The change was more about safety than readability. The old approach left an InputArgList of dangling pointers after moving the new args into the cmd object. This way there’s no way to accidentally access deallocated memory by using the InputArgList.

As above, happy to change if you still disagree.

178

Theoretically it could break without the flags. We need to recognise input files and strip them from the command line. If someone on a UNIX platform has an input file called /W3, that’d instead be interpreted as a warning flag and it’ll be left in. Likewise for any file in the root directory with the same spelling as a CL-only argument.

But yeah, I’ll inline the flags with a ternary.

205

Kind of: -std=hello_there is parsed as an unspecified language. We still want to strip the flag in this case, which won’t be possible if we also use the unspecified value to denote a lack of an -std flag.

231

To clarify, you mean extract this into a helper function?

258

It was an optimisation attempt in that we can do 5 less comparisons in the case where we know that there aren’t going to be any CL flags. Maybe I was trying too hard to micro-optimise...

unittests/Tooling/CompilationDatabaseTest.cpp
652

The intent was to make sure that the existing cases (that shouldn’t depend on the driver mode) don’t break, which admittedly happened while I was working on the patch.

Most of the tests aren’t conditional on the mode, and I think it’s important that they’re tested in all configurations. The parameterisation also lets us test as clang-cl, clang-cl —driver-mode=cl, and clang —driver-mode=cl. Which are all valid ways of using CL mode.

hamzasood updated this revision to Diff 163098.Aug 29 2018, 8:24 AM

I've addressed most of your comments and resolved the merge conflicts introduced in r340838.

hamzasood marked 10 inline comments as done.Aug 29 2018, 8:29 AM
hamzasood added inline comments.
lib/Tooling/InterpolatingCompilationDatabase.cpp
124

Also, there're more than just 2 driver modes. But we only care about cl and not cl.

156

I've restructured it so hopefully this is less of a concern.

unittests/Tooling/CompilationDatabaseTest.cpp
652

To clarify: the parameterisation runs the test 6 times. The isTestingCL() check narrows that down to 3.

sammccall accepted this revision.Sep 3 2018, 9:17 AM

Thanks again for fixing this.
Still a few places I feel the code could be simpler, but will let you decide how to deal with them.
I would greatly appreciate removing the parameterized tests, as that's the place where I feel least confident I can understand exactly what the intended behavior is.

lib/Tooling/InterpolatingCompilationDatabase.cpp
124

I do find if (mode == DriverMode::CL) much clearer.

"CL" isn't a particularly clear name for people who don't deal with windows much, and "!isCL" isn't a great name for the GCC-compatible driver.

156

I think a comment // ArgList is no longer valid would suffice, or storing the ArgList in an Optional and resetting it.

In principle the restructuring seems fine, but it introduced new issues: the boundaries and data flow between the constructor and processInputCommandLine is unclear. (It reads from parameters which are derived from Cmd.CommandLine and overwrites Cmd.CommandLine itself, along with other state. Its name doesn't help clarify what its responsibility is.

If you want to keep this function separate, I'd suggest:

  • make it static to avoid confusion about state while the constructor is running
  • call it parseCommandLine to reflect the processing it's doing
  • return tuple<vector<String>, DriverMode, LangStandard, Optional<Type>>
  • call it as std::tie(Cmd.CommandLine, Mode, ...) = parseCommandLine(...)

But it also seems fine as part of the constructor.

214

please remove premature optimizations (SmallVector, reserve) - this function is not (any more) performance-critical

216

simple loop is more readable than transform()

231

Right - you already have the helper function toCLflag, I'd suggest extending/renaming it so it covers both CL and GCC and fits the call site more conveniently.

256

can we just inline this ("--driver-mode") like we do with other specific we need in their string form (std etc)?

260
if (S.consume_front(...))
  return S == "cl";
unittests/Tooling/CompilationDatabaseTest.cpp
652

Understood. I don't think the extra test coverage here is worth the infrastructure complexity, and would strongly prefer to add a handful of tests covering driver mode interactions instead.

This revision is now accepted and ready to land.Sep 3 2018, 9:17 AM
This revision was automatically updated to reflect the committed changes.

Thanks for the help with this.
I reverted most of my changes (including the parameterised tests) before committing; I think I implemented all of your suggestions.