This is an archive of the discontinued LLVM Phabricator instance.

[Clang Interpreter] Initial patch for the constexpr interpreter
ClosedPublic

Authored by nand on Jul 3 2019, 10:30 AM.

Details

Summary

This patch introduces the skeleton of the constexpr interpreter,
capable of evaluating a simple constexpr functions consisting of
if statements. The interpreter is described in more detail in the
RFC. Further patches will add more features.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
rsmith added inline comments.Jul 26 2019, 5:03 PM
clang/lib/AST/ExprVM/InterpFrame.h
89–90 ↗(On Diff #208520)

Separate allocation for each frame seems wasteful. Have you considered using a slab-allocated stack or similar? (Eg, allocate a large block up front and allocate the locals from that, and allocate more blocks if the first one is exhausted; you can cache the blocks on the vm::Context for reuse with no extra round-trips to the heap.)

clang/lib/AST/ExprVM/Opcodes.inc
1 ↗(On Diff #208520)

This file is hardcoding too much of its #includers' implementation details. Please find a way to improve that. Generally we prefer

enum Opcode : uint32_t {
#define OPCODE(Name) EO_ ## Name,
#include "Opcodes.inc"
};

with this in Opcodes.inc:

OPCODE(Jmp)
OPCODE(JT)
OPCODE(JN)
// ...
#undef OPCODE

over the style you're using here, as it makes the point of use much more readable, and allows reuse of the same set of macros rather than having a distinct one for each use.

1 ↗(On Diff #208520)

This file also needs a *lot* more documentation strings than it currently has. We need to document what all the opcodes are, what effects they have on the interpreter state, and so on.

344–392 ↗(On Diff #208520)

Opcode names should be formatted following our normal coding conventions: use UpperCamelCase here rather than ALL_CAPS.

clang/lib/AST/ExprVM/Pointer.h
30 ↗(On Diff #208520)

How is this going to work for subobjects? We can't generate the full space of all possible descriptors, as that would be unmanageable for large arrays of complicated types.

36–39 ↗(On Diff #208520)

Please use a distinct type for representing size-in-the-interpreter, to avoid confusion and bugs when dealing with code that must manage both compile-time sizes and runtime sizes. And please use CharUnits for representing sizes-at-runtime.

40–49 ↗(On Diff #208520)

Consider packing these 5 members into 4 bytes.

clang/lib/AST/ExprVM/Program.h
37 ↗(On Diff #208520)

Consider adding a type-safe wrapper for this.

105 ↗(On Diff #208520)

If CodePtr is a plain pointer, using a single flat vector here seems dangerous; you'll have invalidation issues if you ever trigger creation of a Function while evaluating, which seems like something that will happen in practice, eg:

constexpr int f();
constexpr int g() { return f(); }
constexpr int f() { return 0; } // or imported from a module at this point
constexpr int k = g(); // evaluation of this call triggers generation of code for f
106–107 ↗(On Diff #208520)

Using a (presumably very large) std::map for this would presumably be very time- and space-inefficient. Do you expect this to be dense? (Would a flat array of SourceInfo work better?)

clang/lib/AST/ExprVM/Type.h
24–34 ↗(On Diff #208520)

As previously, please follow coding guidelines wrt naming here.

51–53 ↗(On Diff #208520)

What do you expect to use this for? If you need to encode a pointer, would it make sense to have a distinct primitive type for that?

64–75 ↗(On Diff #208520)

Injecting T into this seems too implicit to me. Consider instead accepting the name of a function-like macro and invoking it on the type. Eg:

#define RETURN_SIZEOF(T) return sizeof(T);
TYPE_SWITCH(Type, RETURN_SIZEOF);

Also please wrap the expansion in do { ... } while (0) to require the user of the macro to provide a semicolon afterwards.

nand added a comment.Jul 29 2019, 10:03 AM

How do you intend to represent pointers cast to integer types? Allocating 64 bits of state for a 64-bit integer is insufficient to model that case.

Is this ever going to be allowed in constexpr? If that is the case, we'll add a separate type for all integers which are large enough to hold a pointer, a tagged union indicating whether the value is a number or a pointer. This would add significant overhead, but I don't see any other way which can correctly diagnose UB when casting a random integer to a pointer.

jfb added a comment.Jul 29 2019, 11:06 AM

How do you intend to represent pointers cast to integer types? Allocating 64 bits of state for a 64-bit integer is insufficient to model that case.

Is this ever going to be allowed in constexpr? If that is the case, we'll add a separate type for all integers which are large enough to hold a pointer, a tagged union indicating whether the value is a number or a pointer. This would add significant overhead, but I don't see any other way which can correctly diagnose UB when casting a random integer to a pointer.

Most integers won't be in that category, you can therefore speculate that fact when emitting bytecode, and throw away byte code when the assumption turns out to be wrong (and re-generate the more expensive byte code).

uenoku added a subscriber: uenoku.Jul 29 2019, 11:11 AM

How do you intend to represent pointers cast to integer types? Allocating 64 bits of state for a 64-bit integer is insufficient to model that case.

Is this ever going to be allowed in constexpr?

It's not sufficient for this to handle only the things that are allowed in constant expressions; you also need to allow the things that are allowed by Clang's current constant evaluator, which includes this case. There are also constructs that allow arbitrary constant folding within the context of constant expression evaluation (such as a __builtin_constant_p conditional). So yes, you need to deal with this.

If that is the case, we'll add a separate type for all integers which are large enough to hold a pointer, a tagged union indicating whether the value is a number or a pointer. This would add significant overhead, but I don't see any other way which can correctly diagnose UB when casting a random integer to a pointer.

These cases are likely to be rare enough that separately-allocated storage for this case could work. You'll need at least one bit somewhere to track whether you're in the "pointer cast to integer" case, though.

(You also need to be able to distinguish between an integer value and an uninitialized integer and an out-of-lifetime integer, so you'll presumably need some side-table to track the state of subobjects anyway.)

nand marked 10 inline comments as done.Jul 29 2019, 1:01 PM

We can add a separate integer type which tracks all the additional information required by __builtin_constant_p and compile all integers to it in this context. A later patch added an APInt fallback to the interpreter if an integral cannot be mapped to a type supported by the VM - this mechanism could be used to implement the fallback for contexts which cast pointers to integers.

clang/lib/AST/ExprVM/Compiler.h
125 ↗(On Diff #208520)

I've removed the size field since it's not going to be used, but left the structure since it will gain other fields in future patches.

bruno added a subscriber: bruno.Jul 31 2019, 12:01 PM
nand marked 40 inline comments as done.Jul 31 2019, 2:09 PM

I have applied most of the changes you suggested to my HEAD which had significantly more functionality, including a replacement of Opcodes.in with TableGen.
Quite a few of your concerns are answered by the features I have added between submitting the patch and now. The interpreter now stands at ~6k LOC.

Should I update the diff with an interpreter trimmed down to the functionality of the current diff (~3k LOC) or post the full interpreter now? What would be easier to review?

clang/lib/AST/ExprVM/Compiler.cpp
88 ↗(On Diff #208520)

Overflow happens here if a function takes more than a few million arguments. I think clang would die before it gets here.

488–503 ↗(On Diff #208520)

I'd like to do this in a future diff, once we add support for more constructs which discard their results.

clang/lib/AST/ExprVM/InterpFrame.h
89–90 ↗(On Diff #208520)

Yes, I considered allocating this on the stack - I just don't want to add that complexity to the interpreter yet. I like to avoid changing this until it becomes a noticeable performance problem.

clang/lib/AST/ExprVM/Pointer.h
30 ↗(On Diff #208520)

Subobjects will have pointers to descriptors embedded into their data - a pointer inside a block can follow that chain of descriptors up to the root.

36–39 ↗(On Diff #208520)

I've added an alias to be used for the sizes of object as determined by the interpreter - CharUnits will be used when interfacing with APValue.

40–49 ↗(On Diff #208520)

I'd like to avoid packing stuff for now, makes it harder to change things later and it's not a performance problem yet.

clang/lib/AST/ExprVM/Program.h
105 ↗(On Diff #208520)

This wouldn't have been a problem, but the compilation of default constructors caused issues. Now the compiler can recursively invoke itself.

106–107 ↗(On Diff #208520)

Replaced this with a dense array + binary search.

clang/lib/AST/ExprVM/Type.h
51–53 ↗(On Diff #208520)

Removed this type.

64–75 ↗(On Diff #208520)

I'd rather have a documented implicit than use nested macros...

nand updated this revision to Diff 213277.Aug 4 2019, 8:22 PM
nand marked 10 inline comments as done.

Implemented suggestions from @rsmith.

Since I had ~30 patches on top of the code in the diff, I applied the changes on those
and removed some functionality to produce this updated diff. This update uses the
byte code generator parameterised by a template emitter: the LinkEmitter emits and
links bytecode, while the EvalEmitter directly evaluates opcodes. The interpreter,
evaluator and most opcode-related code is now generated from Opcodes.td through
TableGen. Since the feedback included several references to pointers, I have included
some pointer functionality as well.

I hope the size of this diff is manageable - if really required, I could further trim
it down. Since this is a new subproject which does not introduce intrusive changes and
there are a significant number of other commits depending on this patch, I would like
to avoid that.

nand retitled this revision from [ConstExprPreter][WIP] Initial patch for the constexpr interpreter to [Clang Interpreter] Initial patch for the constexpr interpreter.Aug 4 2019, 8:23 PM

I would like to better understand the big picture for descriptors, pointers, and so on. I'm not yet seeing how the pieces are going to fit together and not frequently require expensive operations. For example, pointer arithmetic requires determining the array bound of the pointee; pointer subtraction requires determining whether two pointers point into the same array; pointer comparisons require determining the divergence point in the access paths between two pointers. These path-based operations were easy in the old representation, and seem to be difficult in the new representation, so I want to make sure that they've been properly considered. It's also not clear to me how you'll model pointers that have lost their designator (through constant folding), where you have a runtime offset but no compile-time offset, or pointers that point to objects whose values are not part of the evaluation (for example, extern globals), or pointers that have a designator but no base (for __builtin_object_size handling).

This choice of representation also seems to make it really easy for a minor bug in the interpreter to turn into undefined behavior during compilation. Have you given any thought to how that might be avoided or mitigated?

clang/include/clang/Basic/LangOptions.def
291–294 ↗(On Diff #213277)

"Clang" is redundant here (what else could it be?). If we're calling the new thing "interp" internally, then I guess "EnableInterp" and "ForceInterp" seem OK, but given that this is just scaffolding until the new interpreter is done, maybe "EnableNewInterp" and "ForceNewInterp" would be better. Suggestion:

BENIGN_LANGOPT(EnableNewInterp, 1, 0,
               "enable the experimental new constexpr interpreter")
BENIGN_LANGOPT(ForceNewInterp, 1, 0,
               "force the use of the experimental new constexpr interpreter")
clang/include/clang/Driver/Options.td
836–839 ↗(On Diff #213277)

On reflection, let's put "new-" in here too, to match -fexperimental-new-pass-manager (so -fexperimental-new-constexpr-interpreter). (As above, this flag name should not contain the string "clang".)

clang/lib/AST/CMakeLists.txt
85–87 ↗(On Diff #213277)

Please keep these in alphabetical order.

clang/lib/AST/ExprConstant.cpp
55–56 ↗(On Diff #213277)

Please keep these in alphabetical order.

5039–5040 ↗(On Diff #213277)

Missing a check for ForceClangInterp here.

12135–12136 ↗(On Diff #213277)

Missing a check for ForceClangInterp here.

12362–12364 ↗(On Diff #213277)

Isn't this in the wrong place? If interp succeeds, I think we always want to stop; it's only if interp fails / bails that ForceClangInterp should matter. Or did I misunderstand what that flag is for?

clang/lib/AST/ExprVM/Pointer.h
30 ↗(On Diff #208520)

I don't think I understand what you're suggesting here. If I have, for example:

struct X { char c; };
X x[1000000];

... are you saying you would embed 1000000 descriptors into the representation of x? That seems extremely wasteful to me. Presumably the goal should be to get as close to allocating 1000000 bytes of storage for x as possible. I think we can probably get down to 1000000 bytes for the c members plus perhaps 1 bit for each X object (to track whether it's within its lifetime) and 2 bits for each char object (to track whether it's within its lifetime and whether it's initialized), for a 1.375x storage overhead in this case. But I don't see how you get there from this representation.

clang/lib/AST/Interp/ByteCodeGen.cpp
54 ↗(On Diff #213277)

I think emitDestruction would probably be a better name, since this needs to encompass things that are not destructors (such as invalidating pointers/references that point into the destroyed objects).

99 ↗(On Diff #213277)

Doc comment please. "Block" means two different things in Clang (compound statements and the Apple Blocks extension) and a reader needs to know which one.

119 ↗(On Diff #213277)

I don't know what this means. Declarators don't typically introduce a scope (for lifetime purposes).

Do you mean "for the initializer of a top-level variable declaration" or something like that?

145–153 ↗(On Diff #213277)

No else after return, please:

if (auto *Exp = dyn_cast<Expr>(S)) {
  if (Exp->getType()->isVoidType())
    return this->Visit(Exp);
  return discard(Exp);
}
return this->bail(S);
161 ↗(On Diff #213277)

No auto here, and please don't shadow clang::Stmt with a local variable.

186 ↗(On Diff #213277)

No auto here, please.

206 ↗(On Diff #213277)

No auto here, please.

217 ↗(On Diff #213277)

No auto here, please.

221 ↗(On Diff #213277)

No auto here, please.

229–230 ↗(On Diff #213277)

Especially no auto here. Even someone familiar with the rest of the clang codebase won't be able to guess these types. In particular, as a reader, I'm wondering if this is some type provided by Emitter or whether it's some type provided by ByteCodeGen, and what the contract is between this class and its template argument with regard to this type. Spelling this as typename Emitter::Label would help substantially.

I'm not going to point out any more uses, but generally if you only use auto when the type is obvious from the spelling of the initializer (eg, when the initializer is dyn_cast<T>(...)), that would be fine.

256 ↗(On Diff #213277)

(This uses of auto like this are also fine; it's obvious this is an iterator and we don't need to know more than that.)

352–381 ↗(On Diff #213277)

Have you considered how you will support Clang's current constant folding of && and || expressions where the left-hand side is non-constant but the right hand side is a false or true constant (respectively)?

536 ↗(On Diff #213277)

We can, but should we?

I would think the only time we can reach this case is when we're constant-evaluating an expression within a function and that expression names a function parameter, and in that case, the expression should be treated as non-constant. We should have a more direct way to represent "if you get here, the evaluation is non-constant, with this reason" in the bytecode.

546 ↗(On Diff #213277)

The above ParmVarDecl case can fall through into here, because a ParmVarDecl is a VarDecl. Is that what you intended?

Do you actually need to treat locals and parameters separately? This would seem simpler if you could model them as being the same thing.

600–601 ↗(On Diff #213277)

Using an && here rather than a nested if would help readability a little.

605–611 ↗(On Diff #213277)

Please try to find some way to refactor this so that there's not so many nested scopes; it makes it hard to follow the control flow within the function.

647–662 ↗(On Diff #213277)

I'd like to see this generated from a .def file, rather than enumerating the various currently-supported integer types in many different places. The goal should be that adding support for a new integer type / size (say we start supporting a target with 48 bit integers) should require making changes in only one place. Case in point: you're currently missing support for 128-bit integers, which Clang does support across many targets.

718–720 ↗(On Diff #213277)

These {} arguments are completely opaque here. Please add some kind of a clue as to what you're passing in here. (Can you use SourceInfo()?)

clang/lib/AST/Interp/ByteCodeGen.h
40 ↗(On Diff #213277)

Does Emitter need to be a base class rather than a member?

111–113 ↗(On Diff #213277)

Is there any reason these need to be virtual? It looks like you would never call these from a context where the concrete type is unknown. (These are the only implementations of the pure virtual functions they override.)

clang/lib/AST/Interp/Descriptor.h
29–31 ↗(On Diff #213277)

Parameter names here would help explain what these functions are supposed to do -- particularly for BlockMoveFn (which char* is which?), but also for the bool parameter of BlockCtorFn.

62 ↗(On Diff #213277)

scoped -> scope

clang/lib/AST/Interp/EvalEmitter.cpp
71–75 ↗(On Diff #213277)

It seems like a problem that an unsupported construct in a !isActive() context causes us to fail evaluation.

105–111 ↗(On Diff #213277)

Can this happen? I would expect not: if we can't cope with backwards jumps here, I wouldn't expect us to be able to cope with function calls / returns.

If this can happen, we need to skip anything that happens after the return rather than keeping on evaluating it.

clang/lib/AST/Interp/EvalEmitter.h
69 ↗(On Diff #213277)

I can't find any calls to this, and I'm not sure what it's for. Is it necessary?

112–114 ↗(On Diff #213277)

I'm curious how much this approach costs us, compared to allowing the emitter to tell ByteCodeGen whether to emit both arms of a ?: (or if not, telling it which one to follow) and similarly for && / ||. There's the constant overhead of checking isActive() on each action, plus the cost of the calls into EvalEmitter while "evaluating" the unselected operand. (Eg, false ? some-very-large-expression : 0 would be evaluated much faster if we could tell ByteCodeGen not to bother with the unselected operand.)

(This approach doesn't support statement-expressions, because there can be backwards jumps in those, so I think ?:/&&/|| are pretty much the only things this supports, right?)

clang/lib/AST/Interp/Interp.h
213–222 ↗(On Diff #213277)

This is missing lots of necessary checks; I think your chosen model for pointers makes these checks quite hard to implement. The validity of pointer comparisons in constant evaluation depends on the point of divergence of the two access paths. For example:

struct A { int n; };
struct B : A { int m; } b;
B ba[10];
constexpr bool x = ba[0].n < &ba[0].m; // error, not constant, divergence point is base class
constexpr bool y = ba[0].n < &ba[1].m; // ok, divergence point is array indexing

Also, access control must be taken into account:

struct A {
  int x;
private:
  int y;
  constexpr bool q(A a) { return &a.x < &a.y; } // not constant, x and y have different access
};
218–219 ↗(On Diff #213277)

On what basis do you assume that the orders of the offsets in the compile-time representation has any relation to the correct order for a pointer comparison? Do you intend to guarantee this when doing struct layout for the interpreter? If so, please include a comment here explaining that.

clang/lib/AST/Interp/InterpFrame.cpp
25–36 ↗(On Diff #213277)

It seems wasteful to me to do this initialization work when entering the frame rather than when reaching the declaration of the variable in question. Do we need to do it early like this?

How does this deal with loops? I would have expected that we'd re-initialize a loop scope each time we enter the loop body, so that we don't confuse variables from one iteration with variables from another.

clang/lib/AST/Interp/LinkEmitter.h
32 ↗(On Diff #213277)

Should this be called ByteCodeEmitter if it emits byte code?

clang/lib/AST/Interp/Opcodes.td
1 ↗(On Diff #213277)

This still needs to be expanded to actually describe what the opcodes do (that is, their effects on the stack and any side-effects they cause). Something simple like:

// [RetVal] -> [], returns RetVal
def Ret : Opcode {
// [A, B] -> [B - A]
def Sub : AluRealOpcode;

would be enough.

clang/lib/AST/Interp/Pointer.h
69–78 ↗(On Diff #213277)

If we're allocating one of these for every storage allocation in the evaluation, packing these bools into the spare bits in the pointers seems worthwhile.

245–249 ↗(On Diff #213277)

What is this for? I think we need a design document here describing the intended model for pointers, references, descriptors, and so on -- preferably something that can be included in the internals manual.

251–261 ↗(On Diff #213277)

You will also need to track at least:

  • Whether this is a one-past-the-end pointer
  • (Information equivalent to) which union members were accessed on the path to this point

I think everything else can be reconstructed from the type and offset.

nand updated this revision to Diff 214828.Aug 13 2019, 7:05 AM
nand marked 60 inline comments as done.

Implemented @zygoloid's suggestions

nand added a comment.Aug 13 2019, 7:05 AM

updated diff, implemented requested changes

clang/include/clang/Basic/LangOptions.def
291–294 ↗(On Diff #213277)

The goal is to evaluate everything that is constant - I would like to avoid using constexpr specifically in the name.

clang/lib/AST/ExprConstant.cpp
5039–5040 ↗(On Diff #213277)

This is intended - if the interpreter bails and ForceInterp is set, we emit a diagnostic in Context, promoting the error to a failure.

clang/lib/AST/Interp/ByteCodeGen.cpp
352–381 ↗(On Diff #213277)

Same as with statement expressions in the evaluating emitter - some sort of closure which compiles and executes when triggered.

536 ↗(On Diff #213277)

We absolutely need this to emit the warning:

constexpr int callee_ptr(int *beg, int *end) {
  const int x = 2147483647;
  *beg = x + x; // expected-warning {{overflow in expression; result is -2 with type 'int'}}\
                // expected-note 3{{value 4294967294 is outside the range of representable values of type 'int'}}
  return *beg;
}
546 ↗(On Diff #213277)

Refactored + fixed this by moving stuff to separate functions.

clang/lib/AST/Interp/EvalEmitter.cpp
71–75 ↗(On Diff #213277)

Currently the interpreter cannot work side-by-side with the interpreter since reading from APValue is not supported at all... Since I am unsure whether I ever want to implement an APValue -> Interpreter mapping, I do not mind this, as long as diagnostic pinpoint the next feature to be implemented.

105–111 ↗(On Diff #213277)

Theoretically not, but it helped me catch a few bugs in the past, so I would like to keep it conditional.

clang/lib/AST/Interp/EvalEmitter.h
69 ↗(On Diff #213277)

The conditional operator will require it to handle the fallthrough from the false branch to the following expression.

112–114 ↗(On Diff #213277)

I'd like to revisit this at a later point in time - for now I think predication is the simplest option and I would like to stick with the simple options until the interpreter is feature-complete.

Statement expressions would not be predicated, the current plan is to generate a closure-like method for them.
Another idea would be to revert to bytecode generation whenever the landing pad of a potentially backwards jump is encountered - these spots are quite well defined.

clang/lib/AST/Interp/Interp.h
213–222 ↗(On Diff #213277)

I don't think there's a zero-cost way around this. We'll need to track more metadata to check whether fields are public/private and traverse the pointer chain to find divergence points. The InlineDescriptors might have a few spare bits could be used to cache "tags" which classify pointers into comparable classes? At this point, it is quite hard to decide what case to make fast at what cost.

218–219 ↗(On Diff #213277)

Yes, the layout of fields in the interpreter is supposed to match the target's record layout.

clang/lib/AST/Interp/InterpFrame.cpp
25–36 ↗(On Diff #213277)

We can optimise this later, avoiding the need to create blocks for local primitives which never have pointers taken. If a local does not have a pointer generated to it using GetLocalPtr, the block is not needed at all and zero-initialisation using memset should do the job.

clang/lib/AST/Interp/Opcodes.td
1 ↗(On Diff #213277)

I was a bit afraid of documenting everything and then changing everything, but things seem to be quite stable now, so I will add some documentation.

clang/lib/AST/Interp/Pointer.h
69–78 ↗(On Diff #213277)

It's definitely on the TODO list - I would like to avoid such optimisations for now, until enough features are available to run sizeable benchmarks.

251–261 ↗(On Diff #213277)

It's on the TODO list - I am aiming for C++11 conformance ASAP and this will be required.

nand added inline comments.Aug 13 2019, 7:05 AM
clang/lib/AST/Interp/ByteCodeGen.cpp
647–662 ↗(On Diff #213277)

This is the only case where such a switch occurs - there is another for zero initialisation, but it's quite different. A def which would capture the little common functionality would be quite ugly, so I would like to avoid it.

As for other integers, there won't be support for any. There are two more branches for the fixed-point catch-all fallback.

clang/lib/AST/Interp/ByteCodeGen.h
40 ↗(On Diff #213277)

The entry point is through the emitter which needs to call into the code generator, things don't really work if the emitter is a field.

clang/lib/AST/Interp/Pointer.h
245–249 ↗(On Diff #213277)

I haven't written things up yet since I changed quite a few things over the past few weeks. I am wrapping up support for unions and will produce a more detailed document afterwards. In the meantime, I have documented this field.

nand added a comment.Aug 13 2019, 7:18 AM

The old path-based approach is slow for most operations, while the new pointer-based approach is fast for the most common ones.
For read/write/update operations, it is enough to check the pointer and at most a single descriptor to ensure whether the operation
can be performed. For some operations, such as some pointer comparisons, in the worst case the old-style path can be reproduced by
following the pointers produced through the getBase method, allowing us to implement everything.

Extern objects are handled by allocating memory for them, allowing pointers to those regions to exist, but marking the actual storage
as extern/uninitialized and preventing reads/writes to that storage.

Some details of corner cases are not clear at this point, but the fact that we can reproduce full paths and have space to store any special
bits gives me the confidence that we can converge to a full implementation.

I want to stress again the fact that C++ has an insane number of features and I am focusing on implementing as many of them, leaving a
path towards future optimisations open. I do not want to clobber initial patches with optimisations that would increase complexity - the
few things we do with direct local reads/writes and improved loops already yield significant speedupds.

I think we're past the point of large-scale structural comments that are better addressed before the first commit, and it'd be much better to make incremental improvements from here.

Please go ahead and commit this, and we can iterate in-tree from now on. Thanks!

clang/include/clang/Basic/OptionalDiagnostic.h
40–74 ↗(On Diff #214828)

I think these three all belong in PartialDiagnostic rather than in OptionalDiagnostic. (It doesn't make sense that an OptionalDiagnostic can format an APSInt but a PartialDiagnostic cannot.)

clang/include/clang/Driver/Options.td
839 ↗(On Diff #214828)

const -> constant

clang/lib/AST/Interp/ByteCodeGen.cpp
25 ↗(On Diff #214828)

using llvm::Expected;

536 ↗(On Diff #213277)

As noted, the evaluation should be treated as non-constant in this case. This is "evaluate for overflow" mode, in which we want to keep evaluating past non-constant operations (and merely skip sub-evaluations that we can't process because they have non-constant values). We will need a bytecode representation for "non-constant evaluation". We should use that here.

clang/lib/AST/Interp/ByteCodeGen.h
40 ↗(On Diff #213277)

I think the fact that the entry point to ByteCodeGen is in the Emitter base class is the cause of this design complexity (the inheritance, virtual functions, and layering cycle between the code generator and the emitter). For example, looking at ByteCodeEmitter, we have:

  • ByteCodeEmitter::compileFunc is the entry point, and is the only function that calls compile
  • ByteCodeEmitter::compile is a virtual function that calls into ByteCodeGen
  • otherwise, ByteCodeGen only calls into ByteCodeEmitter and ByteCodeEmitter never calls into ByteCodeGen

(And the same thing happens in EvalEmitter.) That to me suggests an opportunity to improve the layering:

  • move ByteCodeEmitter::compileFunc out of ByteCodeEmitter into a standalone function that creates a ByteCodeGen<ByteCodeEmitter> and calls compile on it
  • remove the virtual functions and store the Emitter as a member of ByteCodeGen

No need to do this right away.

85 ↗(On Diff #214828)

Please can you pick a different name for this, that's more different from the AccessKinds enum? AccessOp or something like that maybe?

clang/lib/AST/Interp/Opcodes.td
178–221 ↗(On Diff #214828)

Please document what the Args mean.

nand updated this revision to Diff 217419.Aug 27 2019, 9:07 AM

refactoring + implemented changes

nand updated this revision to Diff 217428.Aug 27 2019, 10:12 AM

deactivated tests

nand updated this revision to Diff 217448.Aug 27 2019, 11:30 AM

replaced llvm::make_unique with std::make_unique

This revision was not accepted when it landed; it landed in state Needs Review.Aug 30 2019, 8:02 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 30 2019, 8:02 AM

Seems like this patch introduced a cyclic dependency between Basic and AST when building with LLVM_ENABLE_MODULES=On:

While building module 'Clang_AST' imported from llvm/lldb/include/lldb/Symbol/ClangUtil.h:14:
While building module 'Clang_Basic' imported from llvm/llvm/../clang/include/clang/AST/Stmt.h:18:
In file included from <module-includes>:73:
clang/include/clang/Basic/OptionalDiagnostic.h:17:10: fatal error: cyclic dependency in module 'Clang_AST': Clang_AST -> Clang_Basic -> Clang_AST
#include "clang/AST/APValue.h"

This patch broke my build as well (discovered by runnin git bisect), command line

cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_COMPILER=/usr/local/bin/clang -DCMAKE_ASM_COMPILER=/usr/local/bin/clang -DCMAKE_CXX_COMPILER=/usr/local/bin/clang++ -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DBUILD_SHARED_LIBS=ON -DLLVM_USE_SPLIT_DWARF=ON -DLLVM_OPTIMIZED_TABLEGEN=ON -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=ARC -DLLVM_ENABLE_PROJECTS=clang -DLLVM_BUILD_TESTS=ON -DLLVM_BUILD_DOCS=ON -DLLVM_ENABLE_WERROR=ON -H/redacted/git/llvm-project/llvm -B/tmp/llvm-project_dbg_compiled-with-clang -GNinja
ninja -C /tmp/llvm-project_dbg_compiled-with-clang

Error:

/usr/bin/ld: tools/clang/lib/AST/Interp/CMakeFiles/obj.clangInterp.dir/Block.cpp.o:(.data+0x0): undefined reference to `llvm::EnableABIBreakingChecks'
/usr/bin/ld: tools/clang/lib/AST/Interp/CMakeFiles/obj.clangInterp.dir/ByteCodeEmitter.cpp.o: in function `clang::interp::ByteCodeEmitter::emitAdd(clang::interp::PrimType, clang::interp::SourceInfo const&)':
/usr/bin/ld: tools/clang/lib/AST/Interp/CMakeFiles/obj.clangInterp.dir/ByteCodeEmitter.cpp.o: in function `clang::interp::ByteCodeEmitter::emitAddOffset(clang::interp::PrimType, clang::interp::SourceInfo const&)':

...

svenvh added a subscriber: svenvh.Sep 4 2019, 3:40 AM

Shared library builds seem to be broken indeed. I tried fixing by adding Support and clangAST as dependencies for clangInterp, but that creates a cyclic dependency between clangAST <-> clangInterp. Which makes me wonder whether clangInterp should be a separate library or be part of clangAST?

Shared library builds seem to be broken indeed. I tried fixing by adding Support and clangAST as dependencies for clangInterp, but that creates a cyclic dependency between clangAST <-> clangInterp. Which makes me wonder whether clangInterp should be a separate library or be part of clangAST?

Arrived at same conclusions, reverted in rL370874.

Shared library builds seem to be broken indeed. I tried fixing by adding Support and clangAST as dependencies for clangInterp, but that creates a cyclic dependency between clangAST <-> clangInterp. Which makes me wonder whether clangInterp should be a separate library or be part of clangAST?

Arrived at same conclusions, reverted in rL370874.

And that was even pointed out several days ago before the patch got relanded last two times:

Seems like this patch introduced a cyclic dependency between Basic and AST when building with LLVM_ENABLE_MODULES=On:

While building module 'Clang_AST' imported from llvm/lldb/include/lldb/Symbol/ClangUtil.h:14:
While building module 'Clang_Basic' imported from llvm/llvm/../clang/include/clang/AST/Stmt.h:18:
In file included from <module-includes>:73:
clang/include/clang/Basic/OptionalDiagnostic.h:17:10: fatal error: cyclic dependency in module 'Clang_AST': Clang_AST -> Clang_Basic -> Clang_AST
#include "clang/AST/APValue.h"

@nand Please don't reland this patch until these concerns are addressed.

nand reopened this revision.Sep 4 2019, 4:12 AM

Thanks for identifying these issues - I fixed the cycle detected by modules since then, however I wasn't aware of the issue with shared libs until now.

RKSimon requested changes to this revision.Sep 4 2019, 4:40 AM
RKSimon added a subscriber: RKSimon.
This revision now requires changes to proceed.Sep 4 2019, 4:40 AM
nand updated this revision to Diff 218681.Sep 4 2019, 6:47 AM

Merged clangInterp and clangAST into a single library while keeping Interp in a separate folder.
This is required since clangInterp needs access to the AST, but the intry points to the interpreter
are also in AST.

nand added a comment.Sep 4 2019, 6:49 AM

Does anyone have any suggestions on how to fix the MSVC problems? I am trying to get access to a Windows machine, but it's not guaranteed.

Merged clangInterp and clangAST into a single library while keeping Interp in a separate folder.
This is required since clangInterp needs access to the AST, but the intry points to the interpreter
are also in AST.

I don't have an actionable alternative suggestion but this seems like counter-correct layering, trying to paper over the issue instead of fixing it.

nand added a comment.Sep 4 2019, 6:58 AM

The existing evaluator is not a separate library to start with. Given the tangling between ExprConstants.cpp and the AST nodes, I don't really see any elegant way of dealing with this problem.

An example of the problem is FieldDecl::getBitWidthValue.

nand added a reviewer: rnk.Sep 4 2019, 11:25 AM
nand added a subscriber: rnk.

@jfb suggested I add @rnk: I was wondering if you have any suggestions on how to fix the msvc warnings/errors? I'd be grateful if I had some feedback on what features of C++ I should avoid using and what I should replace them with.

@nand The MSVC warnings are self explanatory - you've declared a number of methods (visitIndirectMember, emitConv and getPtrConstFn) but not provided definitions, as they're on template classes MSVC complains, a lot.

nand added a comment.Sep 9 2019, 2:45 PM

I am providing definitions in the C++ file - the problem is that they are not available in the header before the extern declaration. The methods are available at the site of the extern definition.
gcc and clang accept this, so does Visual Studio 2019. This feels like an incorrect implementation of extern templates in Visual Studio?

I see two ways to proceed: move everything into a header (would like to avoid this) or silence the warning on VC++ (not great either).
Is there a better way? Which option is less bad from these two?

I am providing definitions in the C++ file - the problem is that they are not available in the header before the extern declaration. The methods are available at the site of the extern definition.
gcc and clang accept this, so does Visual Studio 2019. This feels like an incorrect implementation of extern templates in Visual Studio?

I see two ways to proceed: move everything into a header (would like to avoid this) or silence the warning on VC++ (not great either).
Is there a better way? Which option is less bad from these two?

I think I'm missing something - for instance when I search the patch for "visitIndirectMember", all I get is this in ByteCodeExprGen.h:

bool visitIndirectMember(const BinaryOperator *E);

I don't see it in ByteCodeExprGen.cpp

nand added a comment.Sep 10 2019, 10:55 AM

Totally missed that - thanks for noticing. I must have forgotten to remove stuff from the header since clang/gcc don't warn about it.
I'll get hold of a Windows machine soon-ish and I'll make sure to fix this problem.
Thanks!

Totally missed that - thanks for noticing. I must have forgotten to remove stuff from the header since clang/gcc don't warn about it.
I'll get hold of a Windows machine soon-ish and I'll make sure to fix this problem.
Thanks!

If you remove those 3 declarations from the headers and update the patch I can test it on my windows machines for you.

nand updated this revision to Diff 219660.Sep 10 2019, 11:33 PM

removed declarations with no definitions

nand updated this revision to Diff 219785.Sep 11 2019, 1:20 PM

reset accidentally changed tests

Thanks - this builds cleanly on MSVC now

nand added a comment.Sep 12 2019, 4:10 AM

Thanks for looking into the problem - sorry for the delay!

Thanks for looking into the problem - sorry for the delay!

No problem, you still need a specialist to approve the patch though

nand updated this revision to Diff 219965.Sep 12 2019, 12:02 PM

Added a FIXME to reflect the intention to move the interpreter to its own library

jfb accepted this revision.Sep 12 2019, 3:19 PM

Sounds like this is ready to land again! Thanks for fixing everything.

RKSimon accepted this revision.Sep 13 2019, 1:41 AM

Accepting this to remove the block

This revision is now accepted and ready to land.Sep 13 2019, 1:41 AM
This revision was automatically updated to reflect the committed changes.
ormris added a subscriber: ormris.Sep 23 2019, 2:49 PM
ormris removed a subscriber: ormris.Sep 24 2019, 10:26 AM
jrtc27 added a subscriber: jrtc27.EditedFeb 16 2021, 3:20 PM

So I'm running into issues with this patch, specifically the line annotated below. I'm trying to compile our CHERI-LLVM fork as a native pure-capability CHERI[1] binary (which eventually will mean compiling the Morello-LLVM fork of our CHERI-LLVM to run as a native pure-capability CHERI binary on Arm's prototype Morello[2] board when it ships late this year/early next year), and I have LLVM (other than Orc JIT which is a whole beast and can't work without in-tree backend support) and LLD working (albeit only very lightly tested), and if I stub out that single function I am able to get Clang itself compiling and working (at least, well enough to compile a hello world C program all the way down to an executable), so this is the only technical blocker that I know of, which is rather frustrating.

CHERI capabilities can be used to implement C language pointers, providing strict fine-grained spatial safety and pointer provenance guarantees (i.e. it's not just that pointers have bounds, you also can't go and synthesise a pointer with the bounds you want if you don't already have one for that region), plus heap temporal safety on an experimental branch. This is done using tagged memory, where every capability-sized word in memory has a single tag bit indicating whether it has a valid capability or not. This means that, although you can memcpy capabilities around, they must remain aligned, otherwise their tags will become cleared and no longer be dereferenceable.

So, for us, that line _has_ to be able to be written as auto Punned = *(uintptr_t *)Ptr; (although it can be written in equivalent ways), as byte swapping uintptr_t is not something we can ever support (though with endianness::native you could at least make the templated version specialised to not call swapByteOrder, though the one taking a function argument can't be, but why bother going through all that indirection), nor is storing pointers to unaligned addresses if you want to be able to use them as pointers later.

I know nothing about this recently(ish)-added Clang constexpr interpreter, but it seems like this should be a solvable problem with some changes to the design to be less lax about alignment. Is there any documentation you could provide on how CodePtr is meant to work? "Pointer into the code segment." isn't exactly helpful, especially given it's actually storing pointers, which is not what you normally find in code segments when talking instead about compiled code. Do you have ideas for how this problem can be avoided?

[1] http://cheri-cpu.org/
[2] https://developer.arm.com/architectures/cpu-architecture/a-profile/morello

cfe/trunk/lib/AST/Interp/Source.h
69

^^^ this line ^^^

nand added a comment.Feb 17 2021, 12:17 AM

CodePtr points into the bytecode emitted by the byte code compiler. In some instances, pointers to auxiliary data structures are embedded into the byte code, such as functions or AST nodes which contain information relevant to the execution of the instruction.

Would it help if instead of encoding pointers, the byte code encoded some integers mapped to the original objects?

CodePtr points into the bytecode emitted by the byte code compiler. In some instances, pointers to auxiliary data structures are embedded into the byte code, such as functions or AST nodes which contain information relevant to the execution of the instruction.

Would it help if instead of encoding pointers, the byte code encoded some integers mapped to the original objects?

I've read through the code and have slightly more understanding now. It seems there are several options:

  1. Keep the pointers somewhere on the side and put an integer in the byte code, like you suggest
  1. Pad values in the byte code to their natural alignment in general (and ensure the underlying std::vector<char> gets its storage allocated at an aligned boundary / use a different container), though this can get a little weird as the amount of padding between consecutive arguments varies depending on where you are (unless you force realignment to the max alignment at the start of a new opcode)
  1. Make the byte code be an array of uintptr_t instead of packing it like is done currently, with care needed on ILP32; that can either just use uint64_t instead and we declare CHERI unsupported for 32-bit architectures (which is unlikely to be a problem as you probably want a 64-bit virtual address space if the doubling pointer size, with 64-bit CHERI capabilities on 32-bit VA systems being only for embedded use) or you can split 64-bit integers into two 32-bit integers and treat them as two arguments

1 works but feels ugly. 2 or 3 would be my preference, and mirror how "normal" interpreters work, though those might split the code and data so they can keep the opcodes as, say, 32-bit integers, but the stack full of native word/pointer slots; my inclination is that 3 is the best option as it looks like the simplest. How do you feel about each of those? Is memory overhead from not packing values a concern?

CodePtr points into the bytecode emitted by the byte code compiler. In some instances, pointers to auxiliary data structures are embedded into the byte code, such as functions or AST nodes which contain information relevant to the execution of the instruction.

Would it help if instead of encoding pointers, the byte code encoded some integers mapped to the original objects?

I've read through the code and have slightly more understanding now. It seems there are several options:

  1. Keep the pointers somewhere on the side and put an integer in the byte code, like you suggest
  1. Pad values in the byte code to their natural alignment in general (and ensure the underlying std::vector<char> gets its storage allocated at an aligned boundary / use a different container), though this can get a little weird as the amount of padding between consecutive arguments varies depending on where you are (unless you force realignment to the max alignment at the start of a new opcode)
  1. Make the byte code be an array of uintptr_t instead of packing it like is done currently, with care needed on ILP32; that can either just use uint64_t instead and we declare CHERI unsupported for 32-bit architectures (which is unlikely to be a problem as you probably want a 64-bit virtual address space if the doubling pointer size, with 64-bit CHERI capabilities on 32-bit VA systems being only for embedded use) or you can split 64-bit integers into two 32-bit integers and treat them as two arguments

1 works but feels ugly. 2 or 3 would be my preference, and mirror how "normal" interpreters work, though those might split the code and data so they can keep the opcodes as, say, 32-bit integers, but the stack full of native word/pointer slots; my inclination is that 3 is the best option as it looks like the simplest. How do you feel about each of those? Is memory overhead from not packing values a concern?

Hm, though I see the "store an ID" pattern is common for dynamic things and this should be quite rare, so maybe that is indeed the right approach, mirroring something like getOrCreateGlobal?

CodePtr points into the bytecode emitted by the byte code compiler. In some instances, pointers to auxiliary data structures are embedded into the byte code, such as functions or AST nodes which contain information relevant to the execution of the instruction.

Would it help if instead of encoding pointers, the byte code encoded some integers mapped to the original objects?

I've read through the code and have slightly more understanding now. It seems there are several options:

  1. Keep the pointers somewhere on the side and put an integer in the byte code, like you suggest
  1. Pad values in the byte code to their natural alignment in general (and ensure the underlying std::vector<char> gets its storage allocated at an aligned boundary / use a different container), though this can get a little weird as the amount of padding between consecutive arguments varies depending on where you are (unless you force realignment to the max alignment at the start of a new opcode)
  1. Make the byte code be an array of uintptr_t instead of packing it like is done currently, with care needed on ILP32; that can either just use uint64_t instead and we declare CHERI unsupported for 32-bit architectures (which is unlikely to be a problem as you probably want a 64-bit virtual address space if the doubling pointer size, with 64-bit CHERI capabilities on 32-bit VA systems being only for embedded use) or you can split 64-bit integers into two 32-bit integers and treat them as two arguments

1 works but feels ugly. 2 or 3 would be my preference, and mirror how "normal" interpreters work, though those might split the code and data so they can keep the opcodes as, say, 32-bit integers, but the stack full of native word/pointer slots; my inclination is that 3 is the best option as it looks like the simplest. How do you feel about each of those? Is memory overhead from not packing values a concern?

Hm, though I see the "store an ID" pattern is common for dynamic things and this should be quite rare, so maybe that is indeed the right approach, mirroring something like getOrCreateGlobal?

https://reviews.llvm.org/D97606 implements this.