Page MenuHomePhabricator

[Tooling] Improve handling of CL-style options

Authored by hamzasood on Aug 27 2018, 2:05 PM.



This patch fixes the handling of clang-cl options in InterpolatingCompilationDatabase.
They were previously ignored completely, which led to a lot of bugs:

  • Additional options were being added with the wrong syntax. E.g. a file was specified as C++ by adding -x c++, which causes an error in CL mode.
  • The args were parsed and then rendered, which means that the aliasing information was lost. E.g. /W3 was rendered to -Wall, which in CL mode means -Weverything.
  • CL options were ignored when checking things like -std=, so a lot of logic was being bypassed.

Diff Detail


Event Timeline

hamzasood created this revision.Aug 27 2018, 2:05 PM
rnk added inline comments.Aug 27 2018, 2:16 PM
182–184 ↗(On Diff #162743)

I think you need to do this up front, otherwise you'll parse some args in non-cl mode and some in cl mode.

hamzasood updated this revision to Diff 162762.Aug 27 2018, 3:27 PM

Thanks, I've changed it so that the driver mode is calculated first.
I've also extended the unit test to include the various --driver-mode combinations.

rnk added inline comments.Aug 27 2018, 5:05 PM
136 ↗(On Diff #162762)

We support being called "CL.exe", but with our new VS integration, I don't think it matters to check for this case anymore. We may remove it over time.

140 ↗(On Diff #162762)

Can you not keep using ParseArgs? It takes FlagsToInclude and FlagsToExclude, which you can compute outside the loop now.

156–169 ↗(On Diff #162762)

I think the old for loop was nicer. I don't think this code needs to change, you should be able to use ParseArgs with the extra flag args.

hamzasood marked an inline comment as done.Aug 28 2018, 1:49 AM
hamzasood added inline comments.
136 ↗(On Diff #162762)

Should I add the check anyway?

140 ↗(On Diff #162762)

Good point, I forgot to move the flags out of the loop when moving the —driver-mode check.

But I don’t think ParseArgs is suitable (see the other comment response).

156–169 ↗(On Diff #162762)

The problem with the old loop is that we lose aliasing information (e.g. /W3 becomes -Wall). While ParseOneArg also performs alias conversion, it gives us indices for each individual argument. The last line of the new loop copies arguments by using that index information to read from OldArgs, which gives us the original spelling.

Hi, sorry about overlooking cl mode flags when adding this, I was blissfully unaware that compile_commands.json could use that syntax :-)

Out of curiosity, are you using this with clangd or some other tool? I'm sure there are places where clangd injects unixy flags...
Will take a look and try to understand.

Be aware of D51314 which is fixing some nasty performance pitfalls of InterpolatingCompilationDatabase (it should be logic neutral, but moves around the parsing code).

Could you reupload the patch with context?

hamzasood updated this revision to Diff 162872.Aug 28 2018, 8:15 AM
  • Re-uploaded with full context.

Yeah, I encountered these issues while using clangd on Windows. I haven't run into any clangd issues after applying this patch, but admittedly my usage is very basic (pretty much just code completion). It may well be subtly broken in other places.

Thanks for fixing this!
Mostly just picky style comments.
In particular, I know that some of the other maintainers here are just as ignorant of the cl driver as I am, and I want to make sure that it's still possible to follow the logic and debug unrelated problems without needing to know too much about it.

If some of the style bits is too annoying or unclear, happy to do some of it myself as a followup, let me know!

156–169 ↗(On Diff #162762)

Makes sense, please add a comment summarizing. ("We don't use ParseArgs, as we must pass through the exact spelling of uninteresting args. Re-rendering is lossy for clang-cl options, e.g. /W3 -> -Wall")

124 ↗(On Diff #162872)

nit: a two-value enum would be clearer and allow for terser variable names (Mode)

156 ↗(On Diff #162872)

would you mind reverting this change and just wrapping the old Argv in an InputArgList?
I'm not sure the lifetime comments and std::transform aid readability here.

175 ↗(On Diff #162872)

these names are somewhat confusing - "End" neither marks the end of the loop nor the end of the current item (as it's initialized to Start).
What about:

for (unsigned Pos = 1; Pos < OldArgs.size();) {
  unsigned OldPos = Pos;
  Arg = ParseOneArg(..., Pos);
  NewArgs.insert(NewArgs.end(), &OldArgs[OldPos], &OldArgs[Pos]);
178 ↗(On Diff #162872)

This seems a bit verbose.

First, do things actually break if we just use the default 0/0 masks? We're not trying to interpret all the flags, just a few and pass the rest through.

Failing that, it seems clearer to just use a ternary to initialize Included/Excluded, or inline them completely.
(Please also drop the extra scope here).

186 ↗(On Diff #162872)


194 ↗(On Diff #162872)

can we inline this and just || all the options together here?

197 ↗(On Diff #162872)

please keep the mentions of -x etc even if not comprehensive - it's hard to precisely+tersely describe these flags in prose.

203 ↗(On Diff #162872)

ditto --std

205 ↗(On Diff #162872)

prefer *either* optional<> or allowing the function to return lang_unspecified, but not both. (There's no way a user can *explicitly* pass a flag saying the language is unspecified, right?)

225 ↗(On Diff #162872)

as above, please revert comment

231 ↗(On Diff #162872)

can we unify as addTypeFlag(TargetType, Mode, Result.CommandLine)?

258 ↗(On Diff #162872)

Why do we need to take IsClMode into account while reading flags?
How would OPT__SLASH_Fa get parsed if we weren't in CL mode? Would we care?

(and below)

267 ↗(On Diff #162872)


283 ↗(On Diff #162872)


652 ↗(On Diff #162872)

I'm not sure the parameterized test is the right model here.
The INSTANTIATE_TEST_P and friends makes the test fixture pretty complicated, and failure messages hard to relate to input conditions.
Then the actual tests have a bunch of assertions conditional on which mode we're in.
Finally, we don't actually test any mixed-mode database.

Could we write this a little more directly?:

  • pass the driver mode flag to add() in the normal way, in the Flags param
  • require callers to "add" to pass "clang" or "clang-cl". (It's OK that this makes existing cases a bit more verbose)
  • explicitly test the clang-cl and --driver-mode cases we care about, rather than running every assertion in every configuration


TEST_F(InterpolateTest, ClangCL) {
  add("", "clang");
  add("", "clang-cl");
  add("", "clang --driver-mode=clang-cl");

  EXPECT_EQ(getCommand("a/bar.h"), "clang-cl -D a/ /TP");

Thanks for the detailed feedback; I’ll try to get a new patch up in a few hours.

However I disagree with some of them (see replies).

124 ↗(On Diff #162872)

The advantage of a Boolean is that it makes the checks simpler. E.g. if (isCL) instead of if (mode == DriverMode::CL) or something.

Happy to change it though if you still disagree.

156 ↗(On Diff #162872)

The change was more about safety than readability. The old approach left an InputArgList of dangling pointers after moving the new args into the cmd object. This way there’s no way to accidentally access deallocated memory by using the InputArgList.

As above, happy to change if you still disagree.

178 ↗(On Diff #162872)

Theoretically it could break without the flags. We need to recognise input files and strip them from the command line. If someone on a UNIX platform has an input file called /W3, that’d instead be interpreted as a warning flag and it’ll be left in. Likewise for any file in the root directory with the same spelling as a CL-only argument.

But yeah, I’ll inline the flags with a ternary.

205 ↗(On Diff #162872)

Kind of: -std=hello_there is parsed as an unspecified language. We still want to strip the flag in this case, which won’t be possible if we also use the unspecified value to denote a lack of an -std flag.

231 ↗(On Diff #162872)

To clarify, you mean extract this into a helper function?

258 ↗(On Diff #162872)

It was an optimisation attempt in that we can do 5 less comparisons in the case where we know that there aren’t going to be any CL flags. Maybe I was trying too hard to micro-optimise...

652 ↗(On Diff #162872)

The intent was to make sure that the existing cases (that shouldn’t depend on the driver mode) don’t break, which admittedly happened while I was working on the patch.

Most of the tests aren’t conditional on the mode, and I think it’s important that they’re tested in all configurations. The parameterisation also lets us test as clang-cl, clang-cl —driver-mode=cl, and clang —driver-mode=cl. Which are all valid ways of using CL mode.

hamzasood updated this revision to Diff 163098.Aug 29 2018, 8:24 AM

I've addressed most of your comments and resolved the merge conflicts introduced in r340838.

hamzasood marked 10 inline comments as done.Aug 29 2018, 8:29 AM
hamzasood added inline comments.
124 ↗(On Diff #162872)

Also, there're more than just 2 driver modes. But we only care about cl and not cl.

156 ↗(On Diff #162872)

I've restructured it so hopefully this is less of a concern.

652 ↗(On Diff #162872)

To clarify: the parameterisation runs the test 6 times. The isTestingCL() check narrows that down to 3.

sammccall accepted this revision.Sep 3 2018, 9:17 AM

Thanks again for fixing this.
Still a few places I feel the code could be simpler, but will let you decide how to deal with them.
I would greatly appreciate removing the parameterized tests, as that's the place where I feel least confident I can understand exactly what the intended behavior is.

124 ↗(On Diff #162872)

I do find if (mode == DriverMode::CL) much clearer.

"CL" isn't a particularly clear name for people who don't deal with windows much, and "!isCL" isn't a great name for the GCC-compatible driver.

156 ↗(On Diff #162872)

I think a comment // ArgList is no longer valid would suffice, or storing the ArgList in an Optional and resetting it.

In principle the restructuring seems fine, but it introduced new issues: the boundaries and data flow between the constructor and processInputCommandLine is unclear. (It reads from parameters which are derived from Cmd.CommandLine and overwrites Cmd.CommandLine itself, along with other state. Its name doesn't help clarify what its responsibility is.

If you want to keep this function separate, I'd suggest:

  • make it static to avoid confusion about state while the constructor is running
  • call it parseCommandLine to reflect the processing it's doing
  • return tuple<vector<String>, DriverMode, LangStandard, Optional<Type>>
  • call it as std::tie(Cmd.CommandLine, Mode, ...) = parseCommandLine(...)

But it also seems fine as part of the constructor.

231 ↗(On Diff #162872)

Right - you already have the helper function toCLflag, I'd suggest extending/renaming it so it covers both CL and GCC and fits the call site more conveniently.

141 ↗(On Diff #163098)

please remove premature optimizations (SmallVector, reserve) - this function is not (any more) performance-critical

143 ↗(On Diff #163098)

simple loop is more readable than transform()

192 ↗(On Diff #163098)

can we just inline this ("--driver-mode") like we do with other specific we need in their string form (std etc)?

196 ↗(On Diff #163098)
if (S.consume_front(...))
  return S == "cl";
652 ↗(On Diff #162872)

Understood. I don't think the extra test coverage here is worth the infrastructure complexity, and would strongly prefer to add a handful of tests covering driver mode interactions instead.

This revision is now accepted and ready to land.Sep 3 2018, 9:17 AM
This revision was automatically updated to reflect the committed changes.

Thanks for the help with this.
I reverted most of my changes (including the parameterised tests) before committing; I think I implemented all of your suggestions.