This is an archive of the discontinued LLVM Phabricator instance.

Fix issues with using clangd with C++ modules
Changes PlannedPublic

Authored by kuganv on May 4 2022, 2:21 AM.

Details

Summary

Fixes following incompatibility issues with c++ modules
(clangd with c++ modules results in assertion failure and pch_langopt_mismatch #1105)

1:
Preamble generated by clangd is generated with WriteCommentListToPCH = false;
However, if we are using precompiled modules that have comments in the
serialised AST, following sanity check in the clangd fails.

Sanity check that the comment does not come from the PCH. We choose to not
write them into PCH, because they are racy and slow to load.
assert(!Ctx.getSourceManager().isLoadedSourceLocation(RC->getBeginLoc()));

In this patch when we are generating Preamble with
WriteCommentListToPCH, we do not load the comments from AST.

2:
If we have modules that are built with RetainCommentsFromSystemHeaders
difference, that wouldn prevent modules from getting loaded. Since this
difference is not critical that should be stopping modules from being loaded,
this patch changes it to be a COMPATIBLE_LANGOPT.

Diff Detail

Event Timeline

kuganv created this revision.May 4 2022, 2:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2022, 2:21 AM
kuganv requested review of this revision.May 4 2022, 2:21 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 4 2022, 2:21 AM
kuganv edited the summary of this revision. (Show Details)May 4 2022, 2:44 AM
kuganv edited the summary of this revision. (Show Details)May 4 2022, 2:46 AM
kuganv retitled this revision from Fixes following incompatibility issues with c++ modules to Fix issues with using clangd with C++ modules.

To back up a bit...

Until now I don't think we've had a serious effort/plan for supporting modules in clangd, or AFAIK in clang tools in general.
To the extent that modules work, it's a side-effect of how we use clang's driver + compilation database, and inherit behavior by default.
So we probably need a bit of discussion of what configurations should be supported, how they should be configured & interact with the build system etc.

If I understand right (not a modules expert) in a build system:

  1. modules can be produced by the build system as a separate step, and passed to build actions (e.g. -fprebuilt-module-path or -fmodule-file)
  2. or they can be built on demand by compile steps that use them in a clang-private cache dir (-fimplicit-modules)
  3. or we can have a build with no modules at all

If I understand right, you want to support a build system that does 1.
This makes sense (I suspect we should support 2 as well one day, but we can call it out of scope for now)

In clangd we'll see the compile commands used by the build system, via e.g. compile_commands.json, and again we have several strategies:

  • we can consume modules from the build system
  • we can build the needed modules in some clangd-private storage
  • we can disable modular compilation, and parse headers textually

I think this patch intends to support strategy 1 (for build systems that produce modules).

My concern here is that this is fragile:

  • the PCM format is not stable, clangd can only consume module files if the build system is using clang and that clang is built from the exact same revision. This is hard to achieve for most users. (And compile_commands.json having module-related flags in it a signal that it is true)
  • clangd may now/later parse headers in different modes, skip function bodies etc. The COMPATIBLE_LANGOPT changes hint at the problems we can run into trying to reuse modules - it seems likely we'll run into issues later (compatibility, but maybe also performance?)
  • if the modules are produced by the build system, clangd doesn't get a chance to examine the source code or AST directly. How does the dynamic index work on modular code? How do we know its include structure?

There may well be good reasons to handle modules this way, but let's work out the implications before this ends up in production! :-)

kuganv added a comment.EditedMay 4 2022, 8:48 AM

Thanks a lot for the detailed comment. Yes, you are right in that we are using build system (buck) to build the modules and provide as part of the CDB. This is fairly cheap when we hit the buck cache that is also shared by others.

You are right in that the PCM format is not stable and clangd can only consume modules if the build system uses the exact same version of the clang. We do have checks in ASTReader that validates this. I don’t see anyway we can relax this unless we formalise AST and guarantee full compatibility. For our use, we expect to use the exact same version of clang as clangd for building the PCM.

As for different modes, unless I am missing something, we should only care about modes that we explicitly set in clangd. We should expect the modules in CDB should match the options provided by CDB. We could relax some these if they difference does not matter.

My main goal of this patch is to enable middles so that we can test is more widely. I understand that we would find more issues that may need fixing,

kuganv added a comment.May 7 2022, 8:34 AM

@sammccall, What are the first steps that is needed for supporting modules with clangd. From what I see, options related to handling comments that are specifically added to the compile options by clangd needs handling. This RFC diff is on that note.

AFIK, since, clangd relies on clang proper for compilation action, I thought the module semantics would work as in normal clang compilation. I could be completely wrong here and missing some details.

I am happy to look into any other requirements and iterate the diff to get it to where we want it to be acceptable by the community. Any feedback welcome.

nridge added a subscriber: nridge.May 8 2022, 8:01 PM

Hi Kugan, I apologize in advance for a frustrating and long-winded answer...

There's no modules support in clangd in large part because we don't know where to start.

  • clangd devs so far are not experts in modules and didn't have a burning use case, I suspect you can help a lot or both counts!
  • we expect using modules to have very high implementation and maintenance costs, so it's a big bet:
    • they are similar to preambles, and preambles greatly constrain and complicate most clangd features
    • ... but they have many more permutations than preambles, so are harder to reason about and test, we will have more types of bugs...
    • these are being added on, unlike preambles which were added early and much of our infrastructure was designed around (TUScheduler, index)
  • there's no established pattern for tools working with modularized build systems yet (c.f. compile_commands.json 5 years ago)

There's huge potential benefits here and I'm excited you're working on it! But because of the high costs, we need to make sure this is going to improve things substantially for many users. The more that a design assumes version-lock, PCMs are guaranteed present, host compiler flags are clang-compatible (clang/gcc/msvc all use diferent modules flags!), even explicitly modularized code, the narrower the slice of users there are.

It's tempting to start down the path of trying to fix the configuration that almost-works. We can/should definitely support experimentation! (The COMPATIBLE_LANGOPT change here seems fine to me). But even if the first steps are cheap, going down too far down the wrong path can be expensive. My biggest fear is to end up supporting increasingly costly fixes to an accidentally-supported mode that will never really work right.
But I should worry less and talk more - looking forward to a video call on thursday, I'm sure we'll find a good way forward!

I think first steps are pretty high level, to understand:

  • what the goals are (e.g. performance, easier integration with build system, work with modules syntax features, ...)
  • environment and constraints (e.g. how your build system works, how clangd interacts with headers today)
  • alternatives for achieving goals (externally built modules, internally built modules, some hybrid where modules are optionally provided, non-modules solutions...)
nridge added a comment.May 9 2022, 6:22 PM

I think first steps are pretty high level, to understand:

  • what the goals are (e.g. performance, easier integration with build system, work with modules syntax features, ...)

I'll start by suggesting (what seems to me is) a modest goal: a C++ project that wishes to use / start using Modules should be able to do so without breaking their developers' ability to use clangd to work on the project.

That is, clangd should be able to understand Modular code somehow; performance may not be optimal (e.g. clangd may have to parse modular headers from source rather than consuming a compiled module artifact), and it may require some extra configuration (e.g. developers may have to add things to .clangd to get the setup to work) -- those are areas where we can consider improvements (e.g. using a compiled module artifact if one is available) in the future.

kuganv planned changes to this revision.May 17 2022, 11:52 AM