This is an archive of the discontinued LLVM Phabricator instance.

Testbed and skeleton of a new expression parser
ClosedPublic

Authored by spyffe on Nov 28 2016, 4:55 PM.

Details

Summary

LLVM's JIT is now the foundation of dynamic-compilation features for many languages. Clang also has low-level support for dynamic compilation (ASTImporter and ExternalASTSource, notably). How the compiler is set up for dynamic parsing is generally left up to individual clients, for example LLDB's C/C++/Objective-C expression parser and the ROOT project.

Although this arrangement offers external clients the flexibility to implement dynamic features as they see fit, the lack of an in-tree client means that subtle bugs can be introduced that cause regressions in the external clients but aren't caught by tests (or users) until much later. LLDB for example regularly encounters complicated ODR violation scenarios where it is not immediately clear who is at fault.

I propose a simple expression parser be added to Clang. I aim to have it encompass two main features:

  • It should be able to look up external declarations from a variety of sources (e.g., from previous dynamic compilations, from modules, or from DWARF) and have clear conflict resolution rules with easily understood errors. This functionality will be supported by in-tree tests.
  • It should work hand in hand with the LLVM JIT to resolve the locations of external declarations so that e.g. variables can be redeclared and (for high-performance applications like DTrace) external variables can be accessed directly from the registers where they reside.

I have attached a tester that parses a sequence of source files and then uses them as source data for an expression. External references are resolved using an ExternalASTSource that responds to name queries using an ASTImporter. This is the setup that LLDB uses, and the motivating reason for MinimalImport in ASTImporter.

Over time my intention is to make this more complete and support the many scenarios that LLDB can support. I also want to identify places where LLDB does the wrong thing and we can make this functionality more correct, hopefully eliminating frustrating and opaque errors. I also want to spot where Clang might get things wrong, and be able to point Clang developers to an easy-to-reproduce, in-tree test that doesn't require external projects or patches. Finally, I want to identify how we can make this functionality as generic as possible for the current clients besides LLDB, and for potential future clients.

Diff Detail

Repository
rL LLVM

Event Timeline

spyffe updated this revision to Diff 79475.Nov 28 2016, 4:55 PM
spyffe retitled this revision from to Testbed and skeleton of a new expression parser .
spyffe updated this object.
spyffe set the repository for this revision to rL LLVM.
spyffe added a subscriber: cfe-commits.

This seems like a great idea. btw, do you know about Cling (https://root.cern.ch/cling)?

a.sidorin edited edge metadata.Nov 29 2016, 9:05 AM

That's excellent. Thank you for this work, Sean!

tools/clang-import-test/CMakeLists.txt
19 ↗(On Diff #79475)

Please add a newline here.

tools/clang-import-test/clang-import-test.cpp
106 ↗(On Diff #79475)

How about something like this:

StringRef Remain(LineBegin, Buffer->getBufferEnd() - LineBegin);
size_t EndPos = Remain.find_first_of("\r\n");
StringRef Line = (EndPos == StringRef::npos) ? Remain : StringRef(LineBegin, EndPos);
llvm::errs() << Line << "\n";
llvm::errs().indent(LocColumn) << "^\n";

?

115 ↗(On Diff #79475)

Why do we use fprintf instead of llvm::errs()?

130 ↗(On Diff #79475)

SmallString? So, you will not need to push_back.

143 ↗(On Diff #79475)

No need in c_str() here.

152 ↗(On Diff #79475)

Please add a newline here.

162 ↗(On Diff #79475)

ForwardImporters[ImportCI] = ...llvm::make_unique<>?

179 ↗(On Diff #79475)

SmallVector?

181 ↗(On Diff #79475)

In LLVM code, qualifiers for llvm-style casts are not commonly used.

222 ↗(On Diff #79475)

Extra spaces.

236 ↗(On Diff #79475)

[](const std::string &s)

238 ↗(On Diff #79475)

ClangArgv.begin(), ClangArgv.end()

327 ↗(On Diff #79475)

// end namespace

This seems like a great idea. btw, do you know about Cling (https://root.cern.ch/cling)?

Hal, thank you for your interest! Yes, Cling is what I was referring to when I mentioned the ROOT project. Vassil Vassilev, the developer of Cling, is also on the reviewer list. I had the pleasure of talking to Vassil a few years ago at a LLVM Developer Meeting and the idea of creating common infrastructure between LLDB and Cling has been rolling around in the back of my head since then.

spyffe marked 11 inline comments as done.Nov 29 2016, 10:51 AM

Thank you, Alex! I've responded in a few places inline below, and will update this patch momentarily.

tools/clang-import-test/clang-import-test.cpp
106 ↗(On Diff #79475)

I'm going to think about this a little more. Personally I find the loop more readable.
That said StringRef instead of std::string seems an obvious win:

llvm::StringRef LineString(LineBegin, LineEnd - LineBegin);
152 ↗(On Diff #79475)

I just ran clang-format on the whole file. Sorry for the noise.

222 ↗(On Diff #79475)

Sigh. Yes, I had a preliminary implementation of this in mind, but decided to split that out into a separate patch.

238 ↗(On Diff #79475)
.../llvm/tools/clang/tools/clang-import-test/clang-import-test.cpp:236:44: error: no viable conversion from 'iterator' (aka '__wrap_iter<const char **>') to 'const char *const *'
  CompilerInvocation::CreateFromArgs(*Inv, ClangArgv.begin(), ClangArgv.end(),
                                           ^~~~~~~~~~~~~~~~~
.../llvm/tools/clang/include/clang/Frontend/CompilerInvocation.h:139:49: note: passing argument to parameter 'ArgBegin' here
                             const char* const *ArgBegin,
                                                ^
spyffe updated this revision to Diff 79605.Nov 29 2016, 10:52 AM
spyffe edited edge metadata.
spyffe marked 2 inline comments as done.

Updated to reflect Aleksei's comments.

vsk added a subscriber: vsk.Nov 29 2016, 7:07 PM

Thanks for working on this! I'm happy with the direction of this patch. It makes sense that clang would have a tool to help test AST reconstruction from 'external' sources. As you pointed out, it provides a good avenue for clients of the clang AST's to get better test coverage for their custom serialization formats. I haven't been paying much attention to the discussion about the clangDebuggerSupport library, but it sounds like your work will ultimately depend on it. Is that correct?

I've left some lower-level comments inline.

At a higher level, I'm concerned about the amount of covered-but-untested code this patch introduces. Since there are no CHECK lines, it's hard for me to verify that this tool is doing the right thing. What we really want to test right away is that the tool can "dump" the correct definition for struct S (since this implies that the importing went OK). A good way to go about this would be to add a hidden "debug" cl::opt, dump all imported struct decls when in -debug mode, and then add some CHECK lines for the expected struct decl.

Along with this change, I suggest stripping out a fair amount of code for the initial commit (probably PrintSourceForLocation, and maybe anything related to LogLookups).

tools/clang-import-test/clang-import-test.cpp
164 ↗(On Diff #79605)

I think it's more idiomatic to say /*MinimalImport=*/true in clang.

282 ↗(On Diff #79605)

This might as well be constructed as a StringRef from the get-go, since it's eventually converted into one.

302 ↗(On Diff #79605)

I suggest making this Expected<std::unique_ptr<CompilerInstance>> Parse(..., ArrayRef<std::unique_ptr<CompilerInstance>>). This way, there's no way to mistake CI for an input param, there's no need for an extra step to convert std::unique_ptr<CompilerInstance> to CompilerInstance *, and it's harder to drop an error from Parse without logging/handling it.

238 ↗(On Diff #79475)

This should resolve it, but it seems less readable: &*args.cbegin(), &*args.cend(). I'd leave this as-is..

vsk added a subscriber: friss.Nov 29 2016, 7:08 PM
spyffe marked 3 inline comments as done.Nov 30 2016, 11:04 AM

Thank you for your review, Vedant! I will update the patch to reflect your comments in a moment.

I'm concerned about the amount of covered-but-untested code this patch introduces. Since there are no CHECK lines, it's hard for me to verify that this tool is doing the right thing.

What you're observing is that this is very much a skeleton. What it can do right now is let the parser know that a struct S exists, but not what its contents are. That's why the test is so simple, too. As soon as the lexical lookup machinery exists, we'll be able to add tests accessing fields etc. and make sure everything is copacetic.

Along with this change, I suggest stripping out a fair amount of code for the initial commit (probably PrintSourceForLocation, and maybe anything related to LogLookups).

That's fair. As a low-level testing tool I'd like to make sure that I have a logging mechanism later on that allows tests to verify that the compiler made the right queries during a parse; that said, I can add these functions back in when tests require them.

tools/clang-import-test/clang-import-test.cpp
302 ↗(On Diff #79605)

Ah yes, this is a pattern I hadn't internalized yet. Thanks for the reminder.

spyffe updated this revision to Diff 79807.Nov 30 2016, 1:51 PM
spyffe marked an inline comment as done.

Updated to reflect Vedant's comments.
Also ensured 100% test coverage, by removing handling for errors that will never happen and by adding a few new tests.

vsk added a comment.Nov 30 2016, 2:10 PM

Looks good. Apart from some nitpicks, I'm happy with this. This isn't really my area so it'd be good to get an explicit lgtm from one of the reviewers.

tools/clang-import-test/CMakeLists.txt
7 ↗(On Diff #79807)

@beanz recently went on a crusade to add dependencies to intrinsics_gen to a bunch of stuff. Chances are, this tool probably depends on intrinsics_gen, so I'd look into that and add the dep if needed.

tools/clang-import-test/clang-import-test.cpp
306 ↗(On Diff #79807)

This transform smells a little strange. I'd rather see ArrayRef<std::unique_ptr<...>> used everywhere, through a typedef/using-decl if necessary.

310 ↗(On Diff #79807)

There are redundant parens around your calls to toString(): I think these should be dropped.

v.g.vassilev edited edge metadata.Dec 1 2016, 1:18 PM

Thanks for working on this Sean!

Since long time we are planning to propose a patch moving some parts of cling (eg. libInterpreter) mainline.

Now I should have a little more time to do this and it'd be great if we can share some code between cling and lldb. This testing infrastructure is an ideal candidate for this.

Could you move the Build* functions in a utility class/namespace. I need something very similar for my ongoing work on the libInterpreter patch and I'd like to reuse that part of the patch.

spyffe updated this revision to Diff 81661.Dec 15 2016, 2:18 PM
spyffe edited edge metadata.
spyffe marked 2 inline comments as done.

Applied Vassil and Vedant's comments. I will commit this soon.

spyffe updated this revision to Diff 81803.Dec 16 2016, 3:22 PM

Updated CMakeFiles to fix dependencies.

  • Fixed dependencies on gen_intrinsics
  • Added a testsuite dependency on clang-import-test
This revision was automatically updated to reflect the committed changes.

Sorry for being late :(

cfe/trunk/tools/clang-import-test/clang-import-test.cpp
100

I still think indent() method is more evident here and it doesn't require memory allocation.

126

No str() is needed here, raw_ostream works well even with non null-terminated StringRefs.

304

Broken indentation.

spyffe updated this revision to Diff 81991.Dec 19 2016, 1:33 PM
spyffe marked 3 inline comments as done.

Applied Aleksei's comments, and integrated all the CMakeFiles fixes required to make the bots happy.

Looks good now, thanks!

a.sidorin reopened this revision.Dec 21 2016, 9:18 AM
a.sidorin accepted this revision.
a.sidorin edited edge metadata.
This revision is now accepted and ready to land.Dec 21 2016, 9:18 AM
This revision was automatically updated to reflect the committed changes.