This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] A basic implementation of compiling cxx grammar at build time.
ClosedPublic

Authored by hokein on May 16 2022, 1:02 AM.

Details

Summary

The main idea is to compile the cxx grammar at build time, and construct
the core pieces (Grammar, LRTable) of the pseudoparse based on the compiled
data sources.

This is a tiny implementation, which is good for start:

  • defines how the public API should look like;
  • integrates the cxx grammar compilation workflow with the cmake system.
  • onlynonterminal symbols of the C++ grammar are compiled, anything else are still doing the real compilation work at runtime, we can opt-in more bits in the future;
  • splits the monolithic clangPsuedo library for better layering;

Diff Detail

Event Timeline

hokein created this revision.May 16 2022, 1:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 16 2022, 1:02 AM
Herald added a subscriber: mgorny. · View Herald Transcript
hokein requested review of this revision.May 16 2022, 1:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 16 2022, 1:02 AM

This is a revised version of D125231, based on our discussion. It has a narrow scope: only nonterminals of the grammar are compiled; and it mainly focuses on interfaces;

Few initial comments...

clang-tools-extra/pseudo/gen/Main.cpp
8

missing description

clang-tools-extra/pseudo/include/clang-pseudo/cxx/cxx.h
35

enum class? I suspect having to cast these explicitly is worth it if using them is rare.

Otherwise we'll end up with rules in the same namespace as symbols, later.

37

we've got {cxx, Cxx} in filenames.
We should be consistent, and per LLVM style I think CXX is correct?

clang-tools-extra/pseudo/lib/CMakeLists.txt
3

We do have a layering relationship here, and a requirement to keep the "grammar" dependencies small - should we move it into a subdirectory?

45

why is this not in cxx/CMakeLists.txt?

clang-tools-extra/pseudo/lib/cxx/cxx.cpp
1

nit: interfaces

sammccall added inline comments.May 18 2022, 5:42 AM
clang-tools-extra/pseudo/CMakeLists.txt
1

I think these variables shared across CMakeLists.txt files generally add more confusion than value, it doesn't seem to be needed here - can we use relative paths instead?

clang-tools-extra/pseudo/gen/cxx_gen.cmake
1

why is this a textually included *.cmake file that adds rules, instead of a CMakeLists.txt file or a *.cmake file that provides functions?

clang-tools-extra/pseudo/include/CMakeLists.txt
1

it seems to me the build rules for files that end up in the include/ directory should go here?

hokein updated this revision to Diff 430504.May 18 2022, 2:16 PM
hokein marked 8 inline comments as done.

Address comments:

  • split to a grammar subdirectory
  • remove cxx_gen.cmake, move it to include/CMakeLists.txt
  • rename cxx -> CXX
hokein updated this revision to Diff 430506.May 18 2022, 2:18 PM

format fix

clang-tools-extra/pseudo/CMakeLists.txt
1

removed

clang-tools-extra/pseudo/include/CMakeLists.txt
1

sounds a good idea, move the cxx_gen.cmake to this file.

clang-tools-extra/pseudo/lib/CMakeLists.txt
3

split it to a grammar subdirectory.

sammccall accepted this revision.May 23 2022, 8:51 AM
sammccall added inline comments.
clang-tools-extra/pseudo/lib/cxx/CXX.cpp
17 ↗(On Diff #430506)

this is worth a try, but I think (some versions of?) MSVC don't like long string literals.

Stackoverflow says 2k per "chunk" ("one" "two" "three") and 64k total.
Integer arrays can be larger...

So let's start with simple readable options, and make them progressively uglier if we hit limits.

22 ↗(On Diff #430506)

static Grammar &G = *Grammar::parseBNF(...).release();
to avoid destruction

28 ↗(On Diff #430506)

similarly &Table = *new LRTable(buildSLR(...));

clang-tools-extra/pseudo/lib/grammar/CMakeLists.txt
3 ↗(On Diff #430506)

a comment why we want minimal deps here?

This revision is now accepted and ready to land.May 23 2022, 8:51 AM
hokein updated this revision to Diff 431746.May 24 2022, 12:07 PM
hokein marked 3 inline comments as done.

rebase, and emit string chunks rather than a long raw string literal (to make msvc happy)

clang-tools-extra/pseudo/lib/cxx/CXX.cpp
17 ↗(On Diff #430506)

ah, good point. And you're right -- the previous version raw string literal doesn't compile with MSVC v19 <source>(397): error C2026: string too big, trailing characters truncated. Changing to string chunks seems to work.

smeenai added a subscriber: smeenai.EditedMay 25 2022, 9:37 AM

This breaks cross-compilation, because we're building pseudo-gen for the target but then trying to run it on the build machine. You'll need to do something like TableGen does (see https://github.com/llvm/llvm-project/blob/main/llvm/cmake/modules/CrossCompile.cmake and the uses of build_native_tool). That being said, I'm trying to figure out why our builds are attempting to build this in the first place, since as far as I can tell nothing external depends on it yet.

EDIT: Ah, it's because we're building an umbrella target (clang-libraries), so the library targets introduced here will be naturally folded into it (because of add_clang_library). Could we please get cross-compilation fixed here, and revert if it'll take a while?

I'll see if I can fix it quickly, else will revert (there are a couple of fixes stacked on top already, so revert isn't quite clean)

I'll see if I can fix it quickly, else will revert (there are a couple of fixes stacked on top already, so revert isn't quite clean)

Thank you! I'll have some time to look at this a bit later myself as well, and it should hopefully be a quick fix, so we can try to avoid the revert :)

I'll see if I can fix it quickly, else will revert (there are a couple of fixes stacked on top already, so revert isn't quite clean)

Thank you! I'll have some time to look at this a bit later myself as well, and it should hopefully be a quick fix, so we can try to avoid the revert :)

D126397