Page MenuHomePhabricator

Add ability to import std module into expression parser to improve C++ debugging

Authored by teemperor on Feb 12 2019, 8:05 AM.



This patch is the MVP version of importing the std module into the expression parser to improve C++ debugging.

What happens in this patch is that we inject a @import std into our expression source code. We also
modify our internal Clang instance for parsing this expression to work with modules and debug info
at the same time (which is the main change in terms of LOC). We implicitly build the std module on the first use. The
C++ include paths for building are extracted from the debug info, which means that this currently only
works if the program is compiled with -glldb -fmodules and uses the std module. The C include paths
are currently specified by LLDB.

I enabled the tests currently only for libc++ and Linux because I could test this locally. I'll enable the tests
for other platforms once this has landed and doesn't break any bots (and I implemented the platform-specific
C include paths for them).

With this patch we can now:

  • Build a libc++ as a module and import it into the expression parser.
  • Read from the module while also referencing declarations from the debug info. E.g. std::abs(local_variable).

What doesn't work (yet):

  • Merging debug info and C++ module declarations. E.g. std::vector<CustomClass> doesn't work.
  • Pretty much anything that involves the ASTImporter and templated code. As the ASTImporter is used for saving the result declaration, this means that we can't

call yet any function that returns a non-trivial type.

  • Use libstdc++ for this, as it requires multiple include paths and Clang only emits one include path per module. Also libstdc++ doesn't support Clang modules without patches.

Diff Detail


Event Timeline

teemperor created this revision.Feb 12 2019, 8:05 AM
teemperor planned changes to this revision.Feb 12 2019, 8:06 AM

All the changes regarding extracting module include paths are probably superseded by Adrian's patch (which does the same thing nicer). So all the code related to GetModuleIncludes will be removed from this patch most likely.

teemperor updated this revision to Diff 186478.Feb 12 2019, 8:15 AM
  • Cleaned up setup method.
shafik added inline comments.Feb 12 2019, 9:42 PM
1 ↗(On Diff #186478)

Looks like you are relying on iostream to bring in cstdlib which is not guaranteed.

444 ↗(On Diff #186478)


labath added a subscriber: labath.Feb 12 2019, 10:42 PM

I don't know much about modules so these are just two comments I noticed from a quick glance:

  • using both virtual and override on a method (like ExternalASTSourceWrapper does) is overkill. Drop virtual.
  • Returning /usr/include from GetSystemIncludeDirectoriesForLanguage is going to be wrong for remote sessions. The platform instance should have a "sysroot" path stored somewhere (from platform select foo --sysroot=/bar), so it would be good to at least honor that. It might even be good to pass the sysroot as the --sysroot parameter to clang.
267 ↗(On Diff #186478)

return directories by value?

teemperor updated this revision to Diff 187026.Feb 15 2019, 8:44 AM
  • Rebase on top of Adrian's patch (Thanks!)
  • Addressed the feedback from Shafik/Pavel.
teemperor marked 2 inline comments as done.Feb 15 2019, 8:47 AM

@labath Thanks for the hint with the sysroot, but I'm not really sure what's the best way to test this. Making a whole fake sysroot that can be used to compile a std module seems overkill, and not sure if symlinking or so is a good approach either. I'm open to suggestions.

1 ↗(On Diff #186478)

We actually always import the whole std module, not only the submodule that is used in the program. (Partly because we don't use the submodule visibility feature and partly because I don't see any disadvantage in doing that).

aprantl added inline comments.Feb 15 2019, 9:20 AM
3 ↗(On Diff #187026)

Makefile.rules defines MANDATORY_MODULE_BUILD_FLAGS for this. Can you instead add -fimplicit-module-maps to that variable and use it here?

teemperor marked 2 inline comments as done.Feb 17 2019, 4:31 AM
teemperor added inline comments.
3 ↗(On Diff #187026)

Well, we would need to add -glldb (needed on Linux at least, where this is not default) and -fimplicit-module-maps to that flag (and maybe even -fcxx-modules on macOS, but I'm not sure about that). Also I'm not sure if we want to have -gmodules in the same flag set as the importing of std should also work without. What about a separate MANDATORY_CXXMODULE_BUILD_FLAGS that we use in all the tests?

@labath Thanks for the hint with the sysroot, but I'm not really sure what's the best way to test this. Making a whole fake sysroot that can be used to compile a std module seems overkill, and not sure if symlinking or so is a good approach either. I'm open to suggestions.

How much of the sysroot would you actually need to recreate? Would a modulemap file and a one or two simple headers be enough?

Since we're simply testing here that the prefix gets prepended, I am not too worried if we just don't test that. What would make this super-interesting for me is if it would enable testing of this feature without a hard dependency on having libc++ installed on their system (a lot of people on linux still don't have libc++ by default, or may have an older non-modularized version), so if the "sysroot" test could be made to run on those systems, it would be awesome. I am not sure though what would be needed to make that happen. I suspect this could be pretty tricky, since our test suite is not really prepared for that. The most tractable path may be via something like a core file. Like, open a core file, while specifying the sysroot, import @std, then try to display something, which would not be possible without it.

aprantl added inline comments.Feb 18 2019, 9:24 AM
3 ↗(On Diff #187026)

That works for me. The bit I really care about is the -fmodules-cache-path. If we don't set it an incremental bot may blow up when upstream clang changes the serialization format.

teemperor updated this revision to Diff 189086.Mar 3 2019, 8:32 AM
  • Added sysroot test.
  • Moved build flags into MANDATORY_CXXMODULE_BUILD_FLAGS make variable.
teemperor updated this revision to Diff 189089.Mar 3 2019, 9:32 AM
  • Remove libc++ dependency from sysroot test.
  • Small format fix for sysroot Makefile
aprantl added inline comments.Mar 4 2019, 9:00 AM
283 ↗(On Diff #189089)


291 ↗(On Diff #189089)

$(MANDATORY_MODULE_BUILD_CFLAG) -cfxx-modules -glldb ?
Why is -fimplicit-module-maps only on the C++ case?

teemperor updated this revision to Diff 189160.Mar 4 2019, 10:02 AM
  • Refactored build flags.
  • Fixed doxygen comment.

Yeah, the implicit module map flag wasn't actually necessary for the tests, thanks! For the Makefile, I had to create a utility variable as -gmodules shouldn't be necessary for our tests, so the new flags are not really a superset of the normal module flags.

ormris removed a subscriber: ormris.Mar 4 2019, 10:07 AM
aprantl added inline comments.Mar 4 2019, 10:24 AM
35 ↗(On Diff #189160)

what's that for?

teemperor marked an inline comment as done.Mar 4 2019, 10:36 AM
teemperor added inline comments.
35 ↗(On Diff #189160)

I think that's how we set our executable as the target? It's frankly cargo-culted setup code that we use in a few hundred other tests.

Thank you for writing the test. I think it's really great that we're able to test something like this without depending on the actual c++ library existing and supporting modules. (I think we should have more tests like that).

55 ↗(On Diff #189160)

ArrayRef<std::string> ?

257 ↗(On Diff #189160)

I'm wondering why have you needed to do this (and whether we can figure out another solution to that).

254 ↗(On Diff #189160)

Shouldn't this use the platform of the currently executed target (instead of the host platform)?

273 ↗(On Diff #189160)


479–489 ↗(On Diff #189160)

all of this can boil down to something like:
LLDB_LOG(log, "Found module in compile unit: {0} - include dir: {1}", llvm::make_range(m.path.begin(), m.path.end()), m.search_path);

(i'd recommend using LLDB_LOG elsewhere too, but here the benefit is really obvious).

534–539 ↗(On Diff #189160)


teemperor marked an inline comment as done.Mar 4 2019, 10:54 AM
teemperor added inline comments.
257 ↗(On Diff #189160)

Yeah I should have added this as a comment to the test. Our sysroot system has only a few dummy headers, so we can resolve the #include <sys/prctl.h> in test_common.h. The removed -I flags are actually unintentional, I'll revert removing them.

aprantl added inline comments.Mar 4 2019, 10:59 AM
35 ↗(On Diff #189160)

lldbutil.run_break_set_by_file_and_line should be all you need. It has a.out as a default argument. So I don't think you need this line or the one defining exe, or the dictionary=... argument to build().

labath added inline comments.Mar 4 2019, 12:09 PM
257 ↗(On Diff #189160)

Ah, cool. I suspected something like that. In that case I suggest also renaming the flag to NO_TEST_COMMON_H or something. (We should probably replace the force-include by an actual include some day.)

teemperor marked an inline comment as done.Mar 4 2019, 12:34 PM
teemperor added inline comments.
35 ↗(On Diff #189160)

When I remove this line, I actually get AssertionError: False is not True : Expecting 1 locations, got 0 from the lldbutil.run_break_set_by_file_and_line.

I can't remember if I added the dictionary= argument for a specific reason or if that was just a side effect when debugging tests. Doesn't seem to break anything though, so I'll remove it.

aprantl added inline comments.Mar 4 2019, 1:41 PM
35 ↗(On Diff #189160)

Interesting. Looks like there are several very similarly named helper functions in lldbutil. There are several testcases that only use run_to_source/line_breakpoint without running file` manually, so perhaps those functions do more work than run_break_set_by_file_and_line. Since the line number comes from a regex anyway, I'd recommend using `run_to_source_breakpoint instead and also deleting the regex matching from setUp().

teemperor updated this revision to Diff 189995.Mar 9 2019, 2:56 PM
  • Addressed Pavel's comments (thanks!).
  • Reworked test setup code (thanks Adrian!)
  • Rebased the patch.

Thanks Raphael. Apart from some inline nits, I have no further comments from my side, but hopefully someone with more familiarity with modules and clang ASTs can give this the final pass.

13–16 ↗(On Diff #189995)

It looks like these includes are no longer needed.

279 ↗(On Diff #189995)

const std::string & avoids a copy

480–481 ↗(On Diff #189995)

I would actually say that make_range is better than join here, because it avoids constructing a temporary string. So, I'd just remove the FIXME here. (BTW, if you want to control the separator string, there's already a syntax for that: {0:$[.]})

teemperor updated this revision to Diff 190054.Mar 11 2019, 2:47 AM
  • Addressed Pavel's comments.

This looks mostly good now.

19 ↗(On Diff #190054)

Do you understand why the "dwarf" is necessary?

428 ↗(On Diff #190054)

std::for_each() or for (auto *S : Sources)?

teemperor marked an inline comment as done.Mar 11 2019, 10:23 AM
teemperor added inline comments.
19 ↗(On Diff #190054)

The test relies on having the module imports with their include directories listed in the debug info, which I think only dwarf supports for now? It fails for dwo at the moment and I guess there is no equivalent in PDB (can't test this though).

aprantl accepted this revision.Mar 11 2019, 12:28 PM
aprantl added inline comments.
19 ↗(On Diff #190054)

The debug info *format* should not have any semantic effect on the contents, so you probably found some kind of bug. At least dwarf and dwo should behave the same on Linux, as should dsym and dwarf on Darwin. PDB actually can have different contents because it's completely different; skippin gthat makes sense to me. Feel free to investigate this separately though.

This revision is now accepted and ready to land.Mar 11 2019, 12:28 PM
  • Rebased patch.
  • Modernized for loops in ASTUtils.
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2019, 10:10 AM