This is an archive of the discontinued LLVM Phabricator instance.

Clangd: Preserve target flags in system includes extractor
AbandonedPublic

Authored by cpsauer on Nov 23 2022, 12:31 AM.

Details

Summary

This preserves --target and -target flags in the system includes extractor, which is needed for cross-compiling to, e.g. Android, since the target flags can influence the default include paths.

Note that, like isysroot (which is handled incorrectly above and elsewhere), the target flag doesn't fit cleanly into the ArgsToPreserve abstraction and does indeed have a different number of - in its = and non = forms. (ref) There are plenty of bugs in this file, but this is an incremental improvement.

For more context, please see https://github.com/clangd/clangd/issues/1378

Thanks for all you do, wonderful LLVM folks :)

Diff Detail

Event Timeline

cpsauer created this revision.Nov 23 2022, 12:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 23 2022, 12:31 AM
Herald added a subscriber: arphaman. · View Herald Transcript
cpsauer requested review of this revision.Nov 23 2022, 12:31 AM

My biggest concern here is propagating more information here without including it in the cache key.
Originally the computation was strictly using the cache key as input, but started using the command-line flags in D73453. @kadircet any extra context on when/why this is safe?

cpsauer added a comment.EditedNov 23 2022, 1:47 AM

Sam, my read, too, is that the memoizing design isn't safe--also that the cache-key issue is preexisting.

I had the same reaction reading through this file after spotting problems in the log. That's what spawned https://github.com/clangd/clangd/issues/1378.

Any chance I could get you (and others) to quickly read through that issue if you haven't already? (The relevant section to this part: "If we think sysroots, targets, and the other flags enumerated effect the system includes, we'd better include them as part of the memoization key.")

There are sadly *lots* of problems in this file that leap out on a quick read. Nathan and I were thinking, though, that we'd should post this incremental fix for review rather than getting bogged down in trying to fix multiple things atomically.

[Also, if this does get approved, I'd love it if the approver could land it for me!]

Sam, my read, too, is that the memoizing design isn't safe--also that the key bug is preexisting.
Nathan and I were thinking, though, that we'd should post this incremental fix for review rather than getting bogged down in trying to fix multiple things atomically.

Sure. At first glance the design looks like it's been changed in a way that's broken, but maybe there's some deeper reason that it's safe. That reason may or may not also apply to -target (e.g. if -target could plausibly differ across a project but other flags couldn't). I wanted to understand whether/why it's broken today before concluding it's safe to break it further. Probably @kadircet is the best person to make a call here.

I had the same reaction reading through this file after spotting problems in the log. That's what spawned https://github.com/clangd/clangd/issues/1378.

Any chance I could get you (and others) to quickly read through that issue if you haven't already? (The relevant section to this part: "If we think sysroots, targets, and the other flags enumerated effect the system includes, we'd better include them as part of the memoization key.") )

The discussion in this bug makes sense to me. I agree with the need for memoization, and the handling of -isysroot indeed looks dodgy and could probably be fixed.
Framework support sounds important for Mac, and we'd be happy to take a contribution from a Mac person who can explain the issues there.
Reusing clang's arg parser is tricky: it's a large change, it's often more complex than naive string-bashing, and it's also significantly slower.

Unfortunately at the moment I'm not really able to spend work time on improving these things, beyond reviewing patches.
(The system include extractor is mostly useful for things that don't build with stock clang, which at $employer is a minority).

Makes sense! Thanks for your time, Sam, and for being great, as always.

cpsauer added a comment.EditedNov 23 2022, 2:29 AM

I suppose, if it ever might help make the case with said employer (Google, right?) the specific broken use case that's motivating this is Android (and also usage from Bazel/Blaze and by Google projects).

Sam, my read, too, is that the memoizing design isn't safe--also that the key bug is preexisting.
Nathan and I were thinking, though, that we'd should post this incremental fix for review rather than getting bogged down in trying to fix multiple things atomically.

Sure. At first glance the design looks like it's been changed in a way that's broken, but maybe there's some deeper reason that it's safe. That reason may or may not also apply to -target (e.g. if -target could plausibly differ across a project but other flags couldn't). I wanted to understand whether/why it's broken today before concluding it's safe to break it further. Probably @kadircet is the best person to make a call here.

Given that a "project" is a fuzzy concept -- where e.g. someone in a cross-compilation scenario may choose to group host tools and target tools together in the same project -- surely -isysroot, -nostdinc etc. could also potentially differ across a project (but it's probably not common).

I think the simple answer here is that not including those flags in the key was an oversight, albeit a low-impact one.

I'd say taking this change as-is would be a net improvement (but we should also fix the oversight, now that we're aware of it, in a follow-up change).

My only suggestion regarding this patch would be to amend this test to check for the preservation of the target flag as well.

Sorry for noticing this so late. Yes the initial reason we left sysroot and std/builtin-inc related flags were exactly that, in theory they could vary but in practice they were applied to all or none of the project.

Regarding this change itself, surely preserving target makes sense and similar to sysroot and others, i'd actually expect it to be the same across all files in practice. But they're more likely to change without a clangd restart (e.g. because user invokes build in a different configuration). So i think they should be part of the cache key nevertheless. My main question about this change is, you seem to be using a clang driver underneath. Why is clang driver we have in clangd not able to infer the same search paths etc? Is the clang provided in android ndk a custom clang? can you provide logs in the issue you've mentioned above, especially the compile flags retrieved from CDB and cc1 produced by it?

Regarding the mishandling of -isysroot<dir> vs -isysroot <dir> (they still both have a single -, but expecting an equals in between is a mistake apparently), surely we should fix that.

Sorry for being so slow this time myself, guys. Took me a while to dig myself out of a deep inbox hole after Thanksgiving (US holiday). Really appreciate you all and your friendly responsiveness, especially as I get up to speed here.

cpsauer updated this revision to Diff 481141.Dec 7 2022, 7:19 PM

@nridge, I took a shot at amending the test. Thanks for the pointer! Please me know if that looks good to you!
The host-target cross-compile situation you hypothesize is exactly what caused me to notice this. Super common with Bazel.

@kadircet, you were asking about what's requiring the use of query-driver--I'll reply over in the issue (as you suggest), keeping everything in one place.

nridge resigned from this revision.Dec 12 2022, 1:44 AM

@nridge, I took a shot at amending the test. Thanks for the pointer! Please me know if that looks good to you!

The test changes look good, thanks.

I'm going to leave the approval to Kadir because his comment sounded like he would maybe prefer for https://github.com/clangd/clangd/issues/1403 (note, it has a patch at D139277) to be fixed before merging this.

Sweet. Thanks again, Nathan.

Guys, lmk how you'd like to go from here. If you approve; feel free to copy in/land as part of the other change if that would save time.
(I don't have commit access anyway.)

It seems this patch can break GCC toolchains support

This patch expects that we never get target-related options in CommandLine passed to extractSystemIncludesAndTarget() for GCC toolchain. But seems this is not always true because of https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp#L253

In other words, with this patch we preserve --target=..., but this option can appear from https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp#L253 (but not from compile_commands.json). As the result of it, we can call GCC toolchain with --target=.. option at trying to query-driver and this call will fail (because GCC has no --target option)

cpsauer added a subscriber: nridge.EditedDec 27 2022, 8:59 PM

Ooh interesting. Thanks for your careful look, @ArcsinX.

To check my understanding: Sounds like, more generally, commands get massaged into clang argument form before they're passed into the query-driver machinery, which can break break calls through to other compilers?

So far that hadn't affected things, but probably we should be relayering things to operate on raw commands for query driver?
@nridge, I know you'd fixed a bunch of problems by relayering in the past. Any thoughts on this one?

More generally folks more experienced in the codebase: Any thoughts on how we should do this?

In other words, with this patch we preserve --target=..., but this option can appear from https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp#L253

Huh, I had no idea we had logic to add a --target flag there. (Nor that gcc does not recognize this option.)

@cpsauer just to double-check, in your case the --target flag is explicitly written in the compile_commands.json, and the idea is that in such case we want to pass it to the system include extractor (since it's safe to assume the compiler supports the flag in that case), but not in the case where we add it?

probably we should be relayering things to operate on raw commands for query driver?

Not quite raw commands: the fix for https://github.com/clangd/clangd/issues/1173 was to make sure the inferred language (e.g. -x c++) is added before query-driver, and the fix for https://github.com/clangd/clangd/issues/1089 was to make sure that edits from the config file are applied before query-driver (because Compiler can be overridden there).

But I think we could achieve the intended effect by removing the call to inferTargetAndDriverMode() from GlobalCompilationDatabase.cpp, and instead performing the equivalent logic in CommandMangler, after invoking the SystemIncludeExtractor?

Any thoughts on whether that would be reasonable / whether it would break anything else, are welcome.

@nridge, yep, confirming: For Android, --target is being added explicitly as part of the command and we'll need to pass through explicit (but not implicit) --target flags to the system includes extractor to get the correct paths.

cpsauer updated this revision to Diff 489208.Jan 14 2023, 1:37 AM

@nridge, I took a shot at the change you suggested. Confirming that this what you were thinking of, removing inferTargetAndDriverMode from database load and replacing it with a call to the underlying addTargetAndModeForProgramName in the CommandMangler after the SystemIncludeExtractor is invoked?
Note that I also saw a parallel call to inferTargetAndDriverMode in JSONCompilationDatabasePlugin, which seemed like it might cause the same problem, so I eliminated the call in there, too. I then culled the dead code, since there weren't other call sites for the addTargetAndModeForProgramName wrappings. Could I ask you to sanity check that? I know that's a more general tooling library; I was having trouble determining whether other tooling needed the target inference or not.
Also, @ArcsinX, could I get your take? Do you think this would fix the layering issue you raised?

We're definitely stretching my (currently quite limited) understanding of how clangd's implementation fits together, so I think I'll need to ask for help/guidance if that looks wrong.

Thanks again, guys!
-CS

Herald added a project: Restricted Project. · View Herald TranscriptJan 14 2023, 1:37 AM
cpsauer updated this revision to Diff 489214.Jan 14 2023, 1:50 AM
cpsauer updated this revision to Diff 489216.Jan 14 2023, 2:17 AM
cpsauer updated this revision to Diff 489217.Jan 14 2023, 2:51 AM
cpsauer updated this revision to Diff 489317.Jan 14 2023, 2:21 PM

(If anyone knows what's going on with the (unrelated seeming?) testing timeouts please do say.)

@nridge, I took a shot at the change you suggested. Confirming that this what you were thinking of, removing inferTargetAndDriverMode from database load and replacing it with a call to the underlying addTargetAndModeForProgramName in the CommandMangler after the SystemIncludeExtractor is invoked?

Yup, that's what I had in mind. Thanks for writing that up!

Note that I also saw a parallel call to inferTargetAndDriverMode in JSONCompilationDatabasePlugin, which seemed like it might cause the same problem, so I eliminated the call in there, too.

Oh, huh, I didn't realize that JSONCompilationDatabasePlugin has this behaviour baked in.

I would err on the side of leaving that call site as is.

My thinking is as follows:

  • The general purpose of the compilation database utilities in libTooling is to produce command lines that will be fed to the clang driver library.
  • The clang driver does support --target, and in some cases having that flag present is important.
  • For clangd's call site, we're not removing the logic to add the --target flag, we're just moving it from an ealier stage of the pipeline to a later stage of the pipeline, because our pipeline contains a specialized step (SystemIncludeExtractor) that may involve executing a gcc driver with (parts of the) the command.
  • But for other tools built on libTooling that use JSONCompilationDatabasePlugin, this change would be removing the --target logic altogether, which could regress behaviour.

I then culled the dead code, since there weren't other call sites for the addTargetAndModeForProgramName wrappings.

Reagrdless of what we decide about JSONCompilationDatabasePlugin, we probably shouldn't remove clang::tooling::inferTargetAndDriverMode(), as that's a public libTooling API that may be used by external projects that use libTooling as a library.

Trass3r removed a subscriber: Trass3r.Jan 19 2023, 1:01 AM

Yay. Thanks, Nathan, for guiding, replying, and just generally being so kind, great, and thorough.

I'll back out the plugin part of the changes changes momentarily, then.
Just to confirm: Sounds like you're not concerned, then, that the JSONCompilationDatabasePlugin will get invoked and the results then passed to a SystemIncludeExtractor-enabled mangler?
(I'd seen some fallback calls into the plugin registry in GlobalCompilationDatabase, hence the original change, but, lacking broader context, I was having trouble tracing the expected control flow.)

cpsauer updated this revision to Diff 491592.Jan 23 2023, 7:43 PM

Sounds like you're not concerned, then, that the JSONCompilationDatabasePlugin will get invoked and the results then passed to a SystemIncludeExtractor-enabled mangler?
(I'd seen some fallback calls into the plugin registry in GlobalCompilationDatabase, hence the original change, but, lacking broader context, I was having trouble tracing the expected control flow.)

I did initially overlook the mentioned fallback codepath, thanks for catching that. However, it looks like it's only taken if we've already explicitly checked for compile_commands.json and didn't find one, so JsonCompilationDatabasePlugin won't find anything either (i.e. the fallback is for potential other plugins).

As for consumers of the plugin registry outside of clangd, we can assume they won't use SystemIncludeExtractor which is currently a clangd-internal component. (There has been an idea floated about upstreaming SystemIncludeExtractor from clangd to libTooling, but it remains to be determined how that would look at the API level, and we can make adjustments as necessary if/when that happens.)

Anyways, +1 from me on the TargetAndMode related changes.

Thanks, Nathan. Makes sense; sounds good.

@kadircet, I think this one is ready for you, whenever you want it. Lmk how you'd like to proceed.

@kadircet, friendly check in, since I got reminded of this: How would you like to proceed from here?

@kadircet, could I get your review on this one?

Sorry I was waiting for D139277 to land, but it seems to be stuck so sent out D146941. Let's take a look at this again after that one lands

This can be closed, as the change has been made in https://github.com/llvm/llvm-project/pull/65824

cpsauer abandoned this revision.Sep 11 2023, 2:02 PM