This patch introduces the skeleton of the constexpr interpreter,
capable of evaluating a simple constexpr functions consisting of
if statements. The interpreter is described in more detail in the
RFC. Further patches will add more features.
Bigcheese jfb rsmith RKSimon rnk
- rG950b70dcc7e3: [Clang Interpreter] Initial patch for the constexpr interpreter
rC371834: [Clang Interpreter] Initial patch for the constexpr interpreter
rL371834: [Clang Interpreter] Initial patch for the constexpr interpreter
rG32f82c9cbaf3: [Clang Interpreter] Initial patch for the constexpr interpreter
rC370839: [Clang Interpreter] Initial patch for the constexpr interpreter
rL370839: [Clang Interpreter] Initial patch for the constexpr interpreter
rG8327fed9475a: [Clang Interpreter] Initial patch for the constexpr interpreter
rL370636: [Clang Interpreter] Initial patch for the constexpr interpreter
rC370636: [Clang Interpreter] Initial patch for the constexpr interpreter
rGafcb3de11726: [Clang Interpreter] Initial patch for the constexpr interpreter
rC370584: [Clang Interpreter] Initial patch for the constexpr interpreter
rL370584: [Clang Interpreter] Initial patch for the constexpr interpreter
rGd4c1002e0bbb: [Clang Interpreter] Initial patch for the constexpr interpreter
rC370531: [Clang Interpreter] Initial patch for the constexpr interpreter
rL370531: [Clang Interpreter] Initial patch for the constexpr interpreter
rGa55909505497: [Clang Interpreter] Initial patch for the constexpr interpreter
rC370476: [Clang Interpreter] Initial patch for the constexpr interpreter
rL370476: [Clang Interpreter] Initial patch for the constexpr interpreter
This patch introduces the skeleton of the constexpr interpreter,
|647–662 ↗||(On Diff #213277)|
This is the only case where such a switch occurs - there is another for zero initialisation, but it's quite different. A def which would capture the little common functionality would be quite ugly, so I would like to avoid it.
As for other integers, there won't be support for any. There are two more branches for the fixed-point catch-all fallback.
|40 ↗||(On Diff #213277)|
The entry point is through the emitter which needs to call into the code generator, things don't really work if the emitter is a field.
|245–249 ↗||(On Diff #213277)|
I haven't written things up yet since I changed quite a few things over the past few weeks. I am wrapping up support for unions and will produce a more detailed document afterwards. In the meantime, I have documented this field.
The old path-based approach is slow for most operations, while the new pointer-based approach is fast for the most common ones.
For read/write/update operations, it is enough to check the pointer and at most a single descriptor to ensure whether the operation
can be performed. For some operations, such as some pointer comparisons, in the worst case the old-style path can be reproduced by
following the pointers produced through the getBase method, allowing us to implement everything.
Extern objects are handled by allocating memory for them, allowing pointers to those regions to exist, but marking the actual storage
as extern/uninitialized and preventing reads/writes to that storage.
Some details of corner cases are not clear at this point, but the fact that we can reproduce full paths and have space to store any special
bits gives me the confidence that we can converge to a full implementation.
I want to stress again the fact that C++ has an insane number of features and I am focusing on implementing as many of them, leaving a
path towards future optimisations open. I do not want to clobber initial patches with optimisations that would increase complexity - the
few things we do with direct local reads/writes and improved loops already yield significant speedupds.
I think we're past the point of large-scale structural comments that are better addressed before the first commit, and it'd be much better to make incremental improvements from here.
Please go ahead and commit this, and we can iterate in-tree from now on. Thanks!
|40–74 ↗||(On Diff #214828)|
I think these three all belong in PartialDiagnostic rather than in OptionalDiagnostic. (It doesn't make sense that an OptionalDiagnostic can format an APSInt but a PartialDiagnostic cannot.)
|839 ↗||(On Diff #214828)|
const -> constant
|25 ↗||(On Diff #214828)|
|536 ↗||(On Diff #213277)|
As noted, the evaluation should be treated as non-constant in this case. This is "evaluate for overflow" mode, in which we want to keep evaluating past non-constant operations (and merely skip sub-evaluations that we can't process because they have non-constant values). We will need a bytecode representation for "non-constant evaluation". We should use that here.
|40 ↗||(On Diff #213277)|
I think the fact that the entry point to ByteCodeGen is in the Emitter base class is the cause of this design complexity (the inheritance, virtual functions, and layering cycle between the code generator and the emitter). For example, looking at ByteCodeEmitter, we have:
(And the same thing happens in EvalEmitter.) That to me suggests an opportunity to improve the layering:
No need to do this right away.
|85 ↗||(On Diff #214828)|
Please can you pick a different name for this, that's more different from the AccessKinds enum? AccessOp or something like that maybe?
|178–221 ↗||(On Diff #214828)|
Please document what the Args mean.
Seems like this patch introduced a cyclic dependency between Basic and AST when building with LLVM_ENABLE_MODULES=On:
While building module 'Clang_AST' imported from llvm/lldb/include/lldb/Symbol/ClangUtil.h:14: While building module 'Clang_Basic' imported from llvm/llvm/../clang/include/clang/AST/Stmt.h:18: In file included from <module-includes>:73: clang/include/clang/Basic/OptionalDiagnostic.h:17:10: fatal error: cyclic dependency in module 'Clang_AST': Clang_AST -> Clang_Basic -> Clang_AST #include "clang/AST/APValue.h"
This patch broke my build as well (discovered by runnin git bisect), command line
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_COMPILER=/usr/local/bin/clang -DCMAKE_ASM_COMPILER=/usr/local/bin/clang -DCMAKE_CXX_COMPILER=/usr/local/bin/clang++ -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DBUILD_SHARED_LIBS=ON -DLLVM_USE_SPLIT_DWARF=ON -DLLVM_OPTIMIZED_TABLEGEN=ON -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=ARC -DLLVM_ENABLE_PROJECTS=clang -DLLVM_BUILD_TESTS=ON -DLLVM_BUILD_DOCS=ON -DLLVM_ENABLE_WERROR=ON -H/redacted/git/llvm-project/llvm -B/tmp/llvm-project_dbg_compiled-with-clang -GNinja
ninja -C /tmp/llvm-project_dbg_compiled-with-clang
/usr/bin/ld: tools/clang/lib/AST/Interp/CMakeFiles/obj.clangInterp.dir/Block.cpp.o:(.data+0x0): undefined reference to `llvm::EnableABIBreakingChecks'
/usr/bin/ld: tools/clang/lib/AST/Interp/CMakeFiles/obj.clangInterp.dir/ByteCodeEmitter.cpp.o: in function `clang::interp::ByteCodeEmitter::emitAdd(clang::interp::PrimType, clang::interp::SourceInfo const&)':
/usr/bin/ld: tools/clang/lib/AST/Interp/CMakeFiles/obj.clangInterp.dir/ByteCodeEmitter.cpp.o: in function `clang::interp::ByteCodeEmitter::emitAddOffset(clang::interp::PrimType, clang::interp::SourceInfo const&)':
Shared library builds seem to be broken indeed. I tried fixing by adding Support and clangAST as dependencies for clangInterp, but that creates a cyclic dependency between clangAST <-> clangInterp. Which makes me wonder whether clangInterp should be a separate library or be part of clangAST?
And that was even pointed out several days ago before the patch got relanded last two times:
@nand Please don't reland this patch until these concerns are addressed.
Thanks for identifying these issues - I fixed the cycle detected by modules since then, however I wasn't aware of the issue with shared libs until now.
MSVC builds are seeing template instantiation issues with this patch: http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/27972/steps/build/logs/warnings%20%2824%29
Merged clangInterp and clangAST into a single library while keeping Interp in a separate folder.
This is required since clangInterp needs access to the AST, but the intry points to the interpreter
are also in AST.
Does anyone have any suggestions on how to fix the MSVC problems? I am trying to get access to a Windows machine, but it's not guaranteed.
I don't have an actionable alternative suggestion but this seems like counter-correct layering, trying to paper over the issue instead of fixing it.
The existing evaluator is not a separate library to start with. Given the tangling between ExprConstants.cpp and the AST nodes, I don't really see any elegant way of dealing with this problem.
An example of the problem is FieldDecl::getBitWidthValue.
@nand The MSVC warnings are self explanatory - you've declared a number of methods (visitIndirectMember, emitConv and getPtrConstFn) but not provided definitions, as they're on template classes MSVC complains, a lot.
I am providing definitions in the C++ file - the problem is that they are not available in the header before the extern declaration. The methods are available at the site of the extern definition.
gcc and clang accept this, so does Visual Studio 2019. This feels like an incorrect implementation of extern templates in Visual Studio?
I see two ways to proceed: move everything into a header (would like to avoid this) or silence the warning on VC++ (not great either).
Is there a better way? Which option is less bad from these two?
I think I'm missing something - for instance when I search the patch for "visitIndirectMember", all I get is this in ByteCodeExprGen.h:
bool visitIndirectMember(const BinaryOperator *E);
I don't see it in ByteCodeExprGen.cpp
Totally missed that - thanks for noticing. I must have forgotten to remove stuff from the header since clang/gcc don't warn about it.
I'll get hold of a Windows machine soon-ish and I'll make sure to fix this problem.
If you remove those 3 declarations from the headers and update the patch I can test it on my windows machines for you.
So I'm running into issues with this patch, specifically the line annotated below. I'm trying to compile our CHERI-LLVM fork as a native pure-capability CHERI binary (which eventually will mean compiling the Morello-LLVM fork of our CHERI-LLVM to run as a native pure-capability CHERI binary on Arm's prototype Morello board when it ships late this year/early next year), and I have LLVM (other than Orc JIT which is a whole beast and can't work without in-tree backend support) and LLD working (albeit only very lightly tested), and if I stub out that single function I am able to get Clang itself compiling and working (at least, well enough to compile a hello world C program all the way down to an executable), so this is the only technical blocker that I know of, which is rather frustrating.
CHERI capabilities can be used to implement C language pointers, providing strict fine-grained spatial safety and pointer provenance guarantees (i.e. it's not just that pointers have bounds, you also can't go and synthesise a pointer with the bounds you want if you don't already have one for that region), plus heap temporal safety on an experimental branch. This is done using tagged memory, where every capability-sized word in memory has a single tag bit indicating whether it has a valid capability or not. This means that, although you can memcpy capabilities around, they must remain aligned, otherwise their tags will become cleared and no longer be dereferenceable.
So, for us, that line _has_ to be able to be written as auto Punned = *(uintptr_t *)Ptr; (although it can be written in equivalent ways), as byte swapping uintptr_t is not something we can ever support (though with endianness::native you could at least make the templated version specialised to not call swapByteOrder, though the one taking a function argument can't be, but why bother going through all that indirection), nor is storing pointers to unaligned addresses if you want to be able to use them as pointers later.
I know nothing about this recently(ish)-added Clang constexpr interpreter, but it seems like this should be a solvable problem with some changes to the design to be less lax about alignment. Is there any documentation you could provide on how CodePtr is meant to work? "Pointer into the code segment." isn't exactly helpful, especially given it's actually storing pointers, which is not what you normally find in code segments when talking instead about compiled code. Do you have ideas for how this problem can be avoided?
^^^ this line ^^^
CodePtr points into the bytecode emitted by the byte code compiler. In some instances, pointers to auxiliary data structures are embedded into the byte code, such as functions or AST nodes which contain information relevant to the execution of the instruction.
Would it help if instead of encoding pointers, the byte code encoded some integers mapped to the original objects?
I've read through the code and have slightly more understanding now. It seems there are several options:
- Keep the pointers somewhere on the side and put an integer in the byte code, like you suggest
- Pad values in the byte code to their natural alignment in general (and ensure the underlying std::vector<char> gets its storage allocated at an aligned boundary / use a different container), though this can get a little weird as the amount of padding between consecutive arguments varies depending on where you are (unless you force realignment to the max alignment at the start of a new opcode)
- Make the byte code be an array of uintptr_t instead of packing it like is done currently, with care needed on ILP32; that can either just use uint64_t instead and we declare CHERI unsupported for 32-bit architectures (which is unlikely to be a problem as you probably want a 64-bit virtual address space if the doubling pointer size, with 64-bit CHERI capabilities on 32-bit VA systems being only for embedded use) or you can split 64-bit integers into two 32-bit integers and treat them as two arguments
1 works but feels ugly. 2 or 3 would be my preference, and mirror how "normal" interpreters work, though those might split the code and data so they can keep the opcodes as, say, 32-bit integers, but the stack full of native word/pointer slots; my inclination is that 3 is the best option as it looks like the simplest. How do you feel about each of those? Is memory overhead from not packing values a concern?
Hm, though I see the "store an ID" pattern is common for dynamic things and this should be quite rare, so maybe that is indeed the right approach, mirroring something like getOrCreateGlobal?