This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Decouple importing the std C++ module from the way the program is compiled
ClosedPublic

Authored by teemperor on Sep 19 2019, 5:24 AM.

Details

Summary

At the moment, when trying to import the std module in LLDB, we look at the imported modules used in the compiled program
and try to infer the Clang configuration we need from the DWARF module-import. That was the initial idea but turned out to
cause a few problems or inconveniences:

  • It requires that users compile their programs with C++ modules. Given how experimental C++ modules are makes this feature inaccessible

for many users. Also it means that people can't just get the benefits of this feature for free when we activate it by default
(and we can't just close all the associated bug reports).

  • Relying on DWARF's imported module tags (that are only emitted by default on macOS) means this can only be used when using DWARF (and with -glldb on Linux).
  • We essentially hardcoded the C standard library paths on some platforms (Linux) or just couldn't support this feature on other platforms (macOS).

This patch drops the whole idea of looking at the imported module DWARF tags and instead just uses the support files of the compilation unit.
If we look at the support files and see file paths that indicate where the C standard library and libc++ are, we can just create the module
configuration this information. This fixes all the problems above which means we can enable all the tests now on Linux, macOS and with other debug information
than what we currently had. The only debug information specific code is now the iteration over external type module when -gmodules is used (as std and also the
Darwin module are their own external type module with their own files).

The meat of this patch is the CppModuleConfiguration which looks at the file paths from the compilation unit and then figures out the include paths
based on those paths. It's quite conservative in that it only enables modules if we find a single C library and single libc++ library. It's still missing some
test mode where we try to compile an expression before we actually activate the config for the user (which probably also needs some caching mechanism),
but for now it works and makes the feature usable.

Diff Detail

Repository
rL LLVM

Event Timeline

teemperor created this revision.Sep 19 2019, 5:24 AM

I understand the motivation, just a few questions about the approach:

  • Can we make it such that *if* the program was compiled with -gmodules LLDB is using the DWARF module info as authoritative source instead of using this heuristic?
  • Is there a path towards supporting the following scenario?
    1. User compiles a program that contains @import Foo; using a custom -DBAR define, which affects how Foo is built on the command line.
    2. LLDB reads the fact the Foo was imported with -DBAR from DWARF and makes it available in the expression evaluator.

This patch is *only* for getting the std module working on as many projects as possible. The initial plan about using -gmodules and then looking at the imported modules as expressed in DWARF is still what we will do for modules in general (and it will then have precedence over this approach if user's have fully modularized and migrated their projects to -gmodules). So clear yes to both of your questions. -gmodules support will be added when I get around to update D61606, but getting this into a state where we can ship it is the priority for now :)

aprantl added inline comments.Sep 20 2019, 12:53 PM
lldb/include/lldb/Symbol/CompileUnit.h
228 ↗(On Diff #220840)

What does "external module" mean in this context? Can we be specific about lldb::Module vs clang::Module?

lldb/source/Plugins/ExpressionParser/Clang/ClangUserExpression.cpp
442 ↗(On Diff #220840)

if you use LLDB_LOG(log, "[C++ module config] %s ", msg.c_str()); we only pay the cost for concatenating if logging is on.

lldb/source/Plugins/ExpressionParser/Clang/CppModuleConfiguration.cpp
1 ↗(On Diff #220840)

the -*- C++ -*- only makes sense for .h files were it is ambiguous whether they contain C or C++ code.

46 ↗(On Diff #220840)

llvm::sys::path::append to avoid hardcoding path separators?

65 ↗(On Diff #220840)
if (!std::all_of(supported_files.begin(), supported_files.end(), analyzeFile))
  return;

?

68 ↗(On Diff #220840)

sys::path::append

lldb/source/Plugins/ExpressionParser/Clang/CppModuleConfiguration.h
28 ↗(On Diff #220840)

micro-optimization: may use less space if the two bools are grouped together

teemperor marked 6 inline comments as done.Sep 23 2019, 5:35 AM

Thanks for the quick review!

lldb/include/lldb/Symbol/CompileUnit.h
228 ↗(On Diff #220840)

Rather lldb::Module. Well, it refers to the external type modules we get with -gmodules (i.e. what we store in SymbolFileDWRF::m_external_type_modules), so it is kind of both in this case.

lldb/source/Plugins/ExpressionParser/Clang/CppModuleConfiguration.cpp
46 ↗(On Diff #220840)

I would rather have this converted to POSIX style and work with that. Otherwise this whole function needs to be path-sep independent (including escaping the regex and what not). And that ends up being a lot of boilerplate to make this OS-specific code OS-agnostic.

teemperor updated this revision to Diff 221307.Sep 23 2019, 5:35 AM
  • Addressed feedback.
aprantl accepted this revision.Sep 23 2019, 5:40 PM
This revision is now accepted and ready to land.Sep 23 2019, 5:40 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 24 2019, 3:09 AM