Page MenuHomePhabricator

[Clang Interpreter] Initial patch for the constexpr interpreter
ClosedPublic

Authored by nand on Jul 3 2019, 10:30 AM.

Details

Summary

This patch introduces the skeleton of the constexpr interpreter,
capable of evaluating a simple constexpr functions consisting of
if statements. The interpreter is described in more detail in the
RFC. Further patches will add more features.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
nand added inline comments.Aug 13 2019, 7:05 AM
clang/lib/AST/Interp/ByteCodeGen.cpp
647–662 ↗(On Diff #213277)

This is the only case where such a switch occurs - there is another for zero initialisation, but it's quite different. A def which would capture the little common functionality would be quite ugly, so I would like to avoid it.

As for other integers, there won't be support for any. There are two more branches for the fixed-point catch-all fallback.

clang/lib/AST/Interp/ByteCodeGen.h
40 ↗(On Diff #213277)

The entry point is through the emitter which needs to call into the code generator, things don't really work if the emitter is a field.

clang/lib/AST/Interp/Pointer.h
245–249 ↗(On Diff #213277)

I haven't written things up yet since I changed quite a few things over the past few weeks. I am wrapping up support for unions and will produce a more detailed document afterwards. In the meantime, I have documented this field.

nand added a comment.Aug 13 2019, 7:18 AM

The old path-based approach is slow for most operations, while the new pointer-based approach is fast for the most common ones.
For read/write/update operations, it is enough to check the pointer and at most a single descriptor to ensure whether the operation
can be performed. For some operations, such as some pointer comparisons, in the worst case the old-style path can be reproduced by
following the pointers produced through the getBase method, allowing us to implement everything.

Extern objects are handled by allocating memory for them, allowing pointers to those regions to exist, but marking the actual storage
as extern/uninitialized and preventing reads/writes to that storage.

Some details of corner cases are not clear at this point, but the fact that we can reproduce full paths and have space to store any special
bits gives me the confidence that we can converge to a full implementation.

I want to stress again the fact that C++ has an insane number of features and I am focusing on implementing as many of them, leaving a
path towards future optimisations open. I do not want to clobber initial patches with optimisations that would increase complexity - the
few things we do with direct local reads/writes and improved loops already yield significant speedupds.

I think we're past the point of large-scale structural comments that are better addressed before the first commit, and it'd be much better to make incremental improvements from here.

Please go ahead and commit this, and we can iterate in-tree from now on. Thanks!

clang/include/clang/Basic/OptionalDiagnostic.h
40–74 ↗(On Diff #214828)

I think these three all belong in PartialDiagnostic rather than in OptionalDiagnostic. (It doesn't make sense that an OptionalDiagnostic can format an APSInt but a PartialDiagnostic cannot.)

clang/include/clang/Driver/Options.td
839 ↗(On Diff #214828)

const -> constant

clang/lib/AST/Interp/ByteCodeGen.cpp
25 ↗(On Diff #214828)

using llvm::Expected;

536 ↗(On Diff #213277)

As noted, the evaluation should be treated as non-constant in this case. This is "evaluate for overflow" mode, in which we want to keep evaluating past non-constant operations (and merely skip sub-evaluations that we can't process because they have non-constant values). We will need a bytecode representation for "non-constant evaluation". We should use that here.

clang/lib/AST/Interp/ByteCodeGen.h
40 ↗(On Diff #213277)

I think the fact that the entry point to ByteCodeGen is in the Emitter base class is the cause of this design complexity (the inheritance, virtual functions, and layering cycle between the code generator and the emitter). For example, looking at ByteCodeEmitter, we have:

  • ByteCodeEmitter::compileFunc is the entry point, and is the only function that calls compile
  • ByteCodeEmitter::compile is a virtual function that calls into ByteCodeGen
  • otherwise, ByteCodeGen only calls into ByteCodeEmitter and ByteCodeEmitter never calls into ByteCodeGen

(And the same thing happens in EvalEmitter.) That to me suggests an opportunity to improve the layering:

  • move ByteCodeEmitter::compileFunc out of ByteCodeEmitter into a standalone function that creates a ByteCodeGen<ByteCodeEmitter> and calls compile on it
  • remove the virtual functions and store the Emitter as a member of ByteCodeGen

No need to do this right away.

85 ↗(On Diff #214828)

Please can you pick a different name for this, that's more different from the AccessKinds enum? AccessOp or something like that maybe?

clang/lib/AST/Interp/Opcodes.td
178–221 ↗(On Diff #214828)

Please document what the Args mean.

nand updated this revision to Diff 217419.Aug 27 2019, 9:07 AM

refactoring + implemented changes

nand updated this revision to Diff 217428.Aug 27 2019, 10:12 AM

deactivated tests

nand updated this revision to Diff 217448.Aug 27 2019, 11:30 AM

replaced llvm::make_unique with std::make_unique

This revision was not accepted when it landed; it landed in state Needs Review.Aug 30 2019, 8:02 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 30 2019, 8:02 AM

Seems like this patch introduced a cyclic dependency between Basic and AST when building with LLVM_ENABLE_MODULES=On:

While building module 'Clang_AST' imported from llvm/lldb/include/lldb/Symbol/ClangUtil.h:14:
While building module 'Clang_Basic' imported from llvm/llvm/../clang/include/clang/AST/Stmt.h:18:
In file included from <module-includes>:73:
clang/include/clang/Basic/OptionalDiagnostic.h:17:10: fatal error: cyclic dependency in module 'Clang_AST': Clang_AST -> Clang_Basic -> Clang_AST
#include "clang/AST/APValue.h"

This patch broke my build as well (discovered by runnin git bisect), command line

cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_COMPILER=/usr/local/bin/clang -DCMAKE_ASM_COMPILER=/usr/local/bin/clang -DCMAKE_CXX_COMPILER=/usr/local/bin/clang++ -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DBUILD_SHARED_LIBS=ON -DLLVM_USE_SPLIT_DWARF=ON -DLLVM_OPTIMIZED_TABLEGEN=ON -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=ARC -DLLVM_ENABLE_PROJECTS=clang -DLLVM_BUILD_TESTS=ON -DLLVM_BUILD_DOCS=ON -DLLVM_ENABLE_WERROR=ON -H/redacted/git/llvm-project/llvm -B/tmp/llvm-project_dbg_compiled-with-clang -GNinja
ninja -C /tmp/llvm-project_dbg_compiled-with-clang

Error:

/usr/bin/ld: tools/clang/lib/AST/Interp/CMakeFiles/obj.clangInterp.dir/Block.cpp.o:(.data+0x0): undefined reference to `llvm::EnableABIBreakingChecks'
/usr/bin/ld: tools/clang/lib/AST/Interp/CMakeFiles/obj.clangInterp.dir/ByteCodeEmitter.cpp.o: in function `clang::interp::ByteCodeEmitter::emitAdd(clang::interp::PrimType, clang::interp::SourceInfo const&)':
/usr/bin/ld: tools/clang/lib/AST/Interp/CMakeFiles/obj.clangInterp.dir/ByteCodeEmitter.cpp.o: in function `clang::interp::ByteCodeEmitter::emitAddOffset(clang::interp::PrimType, clang::interp::SourceInfo const&)':

...

svenvh added a subscriber: svenvh.Sep 4 2019, 3:40 AM

Shared library builds seem to be broken indeed. I tried fixing by adding Support and clangAST as dependencies for clangInterp, but that creates a cyclic dependency between clangAST <-> clangInterp. Which makes me wonder whether clangInterp should be a separate library or be part of clangAST?

Shared library builds seem to be broken indeed. I tried fixing by adding Support and clangAST as dependencies for clangInterp, but that creates a cyclic dependency between clangAST <-> clangInterp. Which makes me wonder whether clangInterp should be a separate library or be part of clangAST?

Arrived at same conclusions, reverted in rL370874.

Shared library builds seem to be broken indeed. I tried fixing by adding Support and clangAST as dependencies for clangInterp, but that creates a cyclic dependency between clangAST <-> clangInterp. Which makes me wonder whether clangInterp should be a separate library or be part of clangAST?

Arrived at same conclusions, reverted in rL370874.

And that was even pointed out several days ago before the patch got relanded last two times:

Seems like this patch introduced a cyclic dependency between Basic and AST when building with LLVM_ENABLE_MODULES=On:

While building module 'Clang_AST' imported from llvm/lldb/include/lldb/Symbol/ClangUtil.h:14:
While building module 'Clang_Basic' imported from llvm/llvm/../clang/include/clang/AST/Stmt.h:18:
In file included from <module-includes>:73:
clang/include/clang/Basic/OptionalDiagnostic.h:17:10: fatal error: cyclic dependency in module 'Clang_AST': Clang_AST -> Clang_Basic -> Clang_AST
#include "clang/AST/APValue.h"

@nand Please don't reland this patch until these concerns are addressed.

nand reopened this revision.Sep 4 2019, 4:12 AM

Thanks for identifying these issues - I fixed the cycle detected by modules since then, however I wasn't aware of the issue with shared libs until now.

RKSimon requested changes to this revision.Sep 4 2019, 4:40 AM
RKSimon added a subscriber: RKSimon.
This revision now requires changes to proceed.Sep 4 2019, 4:40 AM
nand updated this revision to Diff 218681.Sep 4 2019, 6:47 AM

Merged clangInterp and clangAST into a single library while keeping Interp in a separate folder.
This is required since clangInterp needs access to the AST, but the intry points to the interpreter
are also in AST.

nand added a comment.Sep 4 2019, 6:49 AM

Does anyone have any suggestions on how to fix the MSVC problems? I am trying to get access to a Windows machine, but it's not guaranteed.

Merged clangInterp and clangAST into a single library while keeping Interp in a separate folder.
This is required since clangInterp needs access to the AST, but the intry points to the interpreter
are also in AST.

I don't have an actionable alternative suggestion but this seems like counter-correct layering, trying to paper over the issue instead of fixing it.

nand added a comment.Sep 4 2019, 6:58 AM

The existing evaluator is not a separate library to start with. Given the tangling between ExprConstants.cpp and the AST nodes, I don't really see any elegant way of dealing with this problem.

An example of the problem is FieldDecl::getBitWidthValue.

nand added a reviewer: rnk.Sep 4 2019, 11:25 AM
nand added a subscriber: rnk.

@jfb suggested I add @rnk: I was wondering if you have any suggestions on how to fix the msvc warnings/errors? I'd be grateful if I had some feedback on what features of C++ I should avoid using and what I should replace them with.

@nand The MSVC warnings are self explanatory - you've declared a number of methods (visitIndirectMember, emitConv and getPtrConstFn) but not provided definitions, as they're on template classes MSVC complains, a lot.

nand added a comment.Sep 9 2019, 2:45 PM

I am providing definitions in the C++ file - the problem is that they are not available in the header before the extern declaration. The methods are available at the site of the extern definition.
gcc and clang accept this, so does Visual Studio 2019. This feels like an incorrect implementation of extern templates in Visual Studio?

I see two ways to proceed: move everything into a header (would like to avoid this) or silence the warning on VC++ (not great either).
Is there a better way? Which option is less bad from these two?

I am providing definitions in the C++ file - the problem is that they are not available in the header before the extern declaration. The methods are available at the site of the extern definition.
gcc and clang accept this, so does Visual Studio 2019. This feels like an incorrect implementation of extern templates in Visual Studio?

I see two ways to proceed: move everything into a header (would like to avoid this) or silence the warning on VC++ (not great either).
Is there a better way? Which option is less bad from these two?

I think I'm missing something - for instance when I search the patch for "visitIndirectMember", all I get is this in ByteCodeExprGen.h:

bool visitIndirectMember(const BinaryOperator *E);

I don't see it in ByteCodeExprGen.cpp

nand added a comment.Sep 10 2019, 10:55 AM

Totally missed that - thanks for noticing. I must have forgotten to remove stuff from the header since clang/gcc don't warn about it.
I'll get hold of a Windows machine soon-ish and I'll make sure to fix this problem.
Thanks!

Totally missed that - thanks for noticing. I must have forgotten to remove stuff from the header since clang/gcc don't warn about it.
I'll get hold of a Windows machine soon-ish and I'll make sure to fix this problem.
Thanks!

If you remove those 3 declarations from the headers and update the patch I can test it on my windows machines for you.

nand updated this revision to Diff 219660.Sep 10 2019, 11:33 PM

removed declarations with no definitions

nand updated this revision to Diff 219785.Sep 11 2019, 1:20 PM

reset accidentally changed tests

Thanks - this builds cleanly on MSVC now

nand added a comment.Sep 12 2019, 4:10 AM

Thanks for looking into the problem - sorry for the delay!

Thanks for looking into the problem - sorry for the delay!

No problem, you still need a specialist to approve the patch though

nand updated this revision to Diff 219965.Sep 12 2019, 12:02 PM

Added a FIXME to reflect the intention to move the interpreter to its own library

jfb accepted this revision.Sep 12 2019, 3:19 PM

Sounds like this is ready to land again! Thanks for fixing everything.

RKSimon accepted this revision.Sep 13 2019, 1:41 AM

Accepting this to remove the block

This revision is now accepted and ready to land.Sep 13 2019, 1:41 AM
This revision was automatically updated to reflect the committed changes.
ormris added a subscriber: ormris.Sep 23 2019, 2:49 PM
ormris removed a subscriber: ormris.Sep 24 2019, 10:26 AM
jrtc27 added a subscriber: jrtc27.EditedFeb 16 2021, 3:20 PM

So I'm running into issues with this patch, specifically the line annotated below. I'm trying to compile our CHERI-LLVM fork as a native pure-capability CHERI[1] binary (which eventually will mean compiling the Morello-LLVM fork of our CHERI-LLVM to run as a native pure-capability CHERI binary on Arm's prototype Morello[2] board when it ships late this year/early next year), and I have LLVM (other than Orc JIT which is a whole beast and can't work without in-tree backend support) and LLD working (albeit only very lightly tested), and if I stub out that single function I am able to get Clang itself compiling and working (at least, well enough to compile a hello world C program all the way down to an executable), so this is the only technical blocker that I know of, which is rather frustrating.

CHERI capabilities can be used to implement C language pointers, providing strict fine-grained spatial safety and pointer provenance guarantees (i.e. it's not just that pointers have bounds, you also can't go and synthesise a pointer with the bounds you want if you don't already have one for that region), plus heap temporal safety on an experimental branch. This is done using tagged memory, where every capability-sized word in memory has a single tag bit indicating whether it has a valid capability or not. This means that, although you can memcpy capabilities around, they must remain aligned, otherwise their tags will become cleared and no longer be dereferenceable.

So, for us, that line _has_ to be able to be written as auto Punned = *(uintptr_t *)Ptr; (although it can be written in equivalent ways), as byte swapping uintptr_t is not something we can ever support (though with endianness::native you could at least make the templated version specialised to not call swapByteOrder, though the one taking a function argument can't be, but why bother going through all that indirection), nor is storing pointers to unaligned addresses if you want to be able to use them as pointers later.

I know nothing about this recently(ish)-added Clang constexpr interpreter, but it seems like this should be a solvable problem with some changes to the design to be less lax about alignment. Is there any documentation you could provide on how CodePtr is meant to work? "Pointer into the code segment." isn't exactly helpful, especially given it's actually storing pointers, which is not what you normally find in code segments when talking instead about compiled code. Do you have ideas for how this problem can be avoided?

[1] http://cheri-cpu.org/
[2] https://developer.arm.com/architectures/cpu-architecture/a-profile/morello

cfe/trunk/lib/AST/Interp/Source.h
69

^^^ this line ^^^

nand added a comment.Feb 17 2021, 12:17 AM

CodePtr points into the bytecode emitted by the byte code compiler. In some instances, pointers to auxiliary data structures are embedded into the byte code, such as functions or AST nodes which contain information relevant to the execution of the instruction.

Would it help if instead of encoding pointers, the byte code encoded some integers mapped to the original objects?

CodePtr points into the bytecode emitted by the byte code compiler. In some instances, pointers to auxiliary data structures are embedded into the byte code, such as functions or AST nodes which contain information relevant to the execution of the instruction.

Would it help if instead of encoding pointers, the byte code encoded some integers mapped to the original objects?

I've read through the code and have slightly more understanding now. It seems there are several options:

  1. Keep the pointers somewhere on the side and put an integer in the byte code, like you suggest
  1. Pad values in the byte code to their natural alignment in general (and ensure the underlying std::vector<char> gets its storage allocated at an aligned boundary / use a different container), though this can get a little weird as the amount of padding between consecutive arguments varies depending on where you are (unless you force realignment to the max alignment at the start of a new opcode)
  1. Make the byte code be an array of uintptr_t instead of packing it like is done currently, with care needed on ILP32; that can either just use uint64_t instead and we declare CHERI unsupported for 32-bit architectures (which is unlikely to be a problem as you probably want a 64-bit virtual address space if the doubling pointer size, with 64-bit CHERI capabilities on 32-bit VA systems being only for embedded use) or you can split 64-bit integers into two 32-bit integers and treat them as two arguments

1 works but feels ugly. 2 or 3 would be my preference, and mirror how "normal" interpreters work, though those might split the code and data so they can keep the opcodes as, say, 32-bit integers, but the stack full of native word/pointer slots; my inclination is that 3 is the best option as it looks like the simplest. How do you feel about each of those? Is memory overhead from not packing values a concern?

CodePtr points into the bytecode emitted by the byte code compiler. In some instances, pointers to auxiliary data structures are embedded into the byte code, such as functions or AST nodes which contain information relevant to the execution of the instruction.

Would it help if instead of encoding pointers, the byte code encoded some integers mapped to the original objects?

I've read through the code and have slightly more understanding now. It seems there are several options:

  1. Keep the pointers somewhere on the side and put an integer in the byte code, like you suggest
  1. Pad values in the byte code to their natural alignment in general (and ensure the underlying std::vector<char> gets its storage allocated at an aligned boundary / use a different container), though this can get a little weird as the amount of padding between consecutive arguments varies depending on where you are (unless you force realignment to the max alignment at the start of a new opcode)
  1. Make the byte code be an array of uintptr_t instead of packing it like is done currently, with care needed on ILP32; that can either just use uint64_t instead and we declare CHERI unsupported for 32-bit architectures (which is unlikely to be a problem as you probably want a 64-bit virtual address space if the doubling pointer size, with 64-bit CHERI capabilities on 32-bit VA systems being only for embedded use) or you can split 64-bit integers into two 32-bit integers and treat them as two arguments

1 works but feels ugly. 2 or 3 would be my preference, and mirror how "normal" interpreters work, though those might split the code and data so they can keep the opcodes as, say, 32-bit integers, but the stack full of native word/pointer slots; my inclination is that 3 is the best option as it looks like the simplest. How do you feel about each of those? Is memory overhead from not packing values a concern?

Hm, though I see the "store an ID" pattern is common for dynamic things and this should be quite rare, so maybe that is indeed the right approach, mirroring something like getOrCreateGlobal?

CodePtr points into the bytecode emitted by the byte code compiler. In some instances, pointers to auxiliary data structures are embedded into the byte code, such as functions or AST nodes which contain information relevant to the execution of the instruction.

Would it help if instead of encoding pointers, the byte code encoded some integers mapped to the original objects?

I've read through the code and have slightly more understanding now. It seems there are several options:

  1. Keep the pointers somewhere on the side and put an integer in the byte code, like you suggest
  1. Pad values in the byte code to their natural alignment in general (and ensure the underlying std::vector<char> gets its storage allocated at an aligned boundary / use a different container), though this can get a little weird as the amount of padding between consecutive arguments varies depending on where you are (unless you force realignment to the max alignment at the start of a new opcode)
  1. Make the byte code be an array of uintptr_t instead of packing it like is done currently, with care needed on ILP32; that can either just use uint64_t instead and we declare CHERI unsupported for 32-bit architectures (which is unlikely to be a problem as you probably want a 64-bit virtual address space if the doubling pointer size, with 64-bit CHERI capabilities on 32-bit VA systems being only for embedded use) or you can split 64-bit integers into two 32-bit integers and treat them as two arguments

1 works but feels ugly. 2 or 3 would be my preference, and mirror how "normal" interpreters work, though those might split the code and data so they can keep the opcodes as, say, 32-bit integers, but the stack full of native word/pointer slots; my inclination is that 3 is the best option as it looks like the simplest. How do you feel about each of those? Is memory overhead from not packing values a concern?

Hm, though I see the "store an ID" pattern is common for dynamic things and this should be quite rare, so maybe that is indeed the right approach, mirroring something like getOrCreateGlobal?

https://reviews.llvm.org/D97606 implements this.