
[Clang Interpreter] Initial patch for the constexpr interpreter
Needs Review · Public

Authored by nand on Jul 3 2019, 10:30 AM.



This patch introduces the skeleton of the constexpr interpreter,
capable of evaluating simple constexpr functions consisting of
if statements. The interpreter is described in more detail in the
RFC. Further patches will add more features.

Event Timeline

There are a very large number of changes, so older changes are hidden.
rsmith added inline comments.Jul 26 2019, 5:03 PM
1 ↗(On Diff #208520)

This file also needs a *lot* more documentation strings than it currently has. We need to document what all the opcodes are, what effects they have on the interpreter state, and so on.

40–49 ↗(On Diff #208520)

Consider packing these 5 members into 4 bytes.

37 ↗(On Diff #208520)

Consider adding a type-safe wrapper for this.

105 ↗(On Diff #208520)

If CodePtr is a plain pointer, using a single flat vector here seems dangerous; you'll have invalidation issues if you ever trigger creation of a Function while evaluating, which seems like something that will happen in practice, eg:

constexpr int f();
constexpr int g() { return f(); }
constexpr int f() { return 0; } // or imported from a module at this point
constexpr int k = g(); // evaluation of this call triggers generation of code for f
106–107 ↗(On Diff #208520)

Using a (presumably very large) std::map for this would presumably be very time- and space-inefficient. Do you expect this to be dense? (Would a flat array of SourceInfo work better?)

24–34 ↗(On Diff #208520)

As previously, please follow coding guidelines wrt naming here.

51–53 ↗(On Diff #208520)

What do you expect to use this for? If you need to encode a pointer, would it make sense to have a distinct primitive type for that?

64–75 ↗(On Diff #208520)

Injecting T into this seems too implicit to me. Consider instead accepting the name of a function-like macro and invoking it on the type. Eg:

#define RETURN_SIZEOF(T) return sizeof(T);

Also please wrap the expansion in do { ... } while (0) to require the user of the macro to provide a semicolon afterwards.
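The two suggestions above can be sketched together. The dispatcher macro below (DISPATCH_TYPE) is a hypothetical illustration of accepting a function-like macro by name and wrapping the expansion in do { ... } while (0); it is not code from the patch:

```cpp
#include <cassert>
#include <cstddef>

// User-provided function-like macro, passed to the dispatcher by name.
#define RETURN_SIZEOF(T) return sizeof(T)

// Hypothetical dispatcher: expands the given macro on a type and wraps the
// result in do { ... } while (0), so callers must supply the semicolon.
#define DISPATCH_TYPE(Macro)                                                   \
  do {                                                                         \
    Macro(int);                                                                \
  } while (0)

static std::size_t sizeOfInt() {
  DISPATCH_TYPE(RETURN_SIZEOF); // trailing semicolon is now required
}
```

The do/while wrapper makes the macro behave like a single statement, so it composes safely with unbraced if/else.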

nand added a comment.Mon, Jul 29, 10:03 AM

How do you intend to represent pointers cast to integer types? Allocating 64 bits of state for a 64-bit integer is insufficient to model that case.

Is this ever going to be allowed in constexpr? If that is the case, we'll add a separate type for all integers which are large enough to hold a pointer, a tagged union indicating whether the value is a number or a pointer. This would add significant overhead, but I don't see any other way which can correctly diagnose UB when casting a random integer to a pointer.

jfb added a comment.Mon, Jul 29, 11:06 AM

How do you intend to represent pointers cast to integer types? Allocating 64 bits of state for a 64-bit integer is insufficient to model that case.

Is this ever going to be allowed in constexpr? If that is the case, we'll add a separate type for all integers which are large enough to hold a pointer, a tagged union indicating whether the value is a number or a pointer. This would add significant overhead, but I don't see any other way which can correctly diagnose UB when casting a random integer to a pointer.

Most integers won't be in that category, you can therefore speculate that fact when emitting bytecode, and throw away byte code when the assumption turns out to be wrong (and re-generate the more expensive byte code).

uenoku added a subscriber: uenoku.Mon, Jul 29, 11:11 AM

How do you intend to represent pointers cast to integer types? Allocating 64 bits of state for a 64-bit integer is insufficient to model that case.

Is this ever going to be allowed in constexpr?

It's not sufficient for this to handle only the things that are allowed in constant expressions; you also need to allow the things that are allowed by Clang's current constant evaluator, which includes this case. There are also constructs that allow arbitrary constant folding within the context of constant expression evaluation (such as a __builtin_constant_p conditional). So yes, you need to deal with this.

If that is the case, we'll add a separate type for all integers which are large enough to hold a pointer, a tagged union indicating whether the value is a number or a pointer. This would add significant overhead, but I don't see any other way which can correctly diagnose UB when casting a random integer to a pointer.

These cases are likely to be rare enough that separately-allocated storage for this case could work. You'll need at least one bit somewhere to track whether you're in the "pointer cast to integer" case, though.

(You also need to be able to distinguish between an integer value and an uninitialized integer and an out-of-lifetime integer, so you'll presumably need some side-table to track the state of subobjects anyway.)
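One way to combine the two points above is a small tagged value for pointer-sized integers. The sketch below is purely illustrative (names and layout are hypothetical, not from the patch): a discriminant records whether the bits are a plain number, a pointer cast to an integer, or not a usable value at all.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tagged representation for integers wide enough to hold a
// pointer. The Kind field also covers the uninitialized/out-of-lifetime
// states mentioned above.
struct PtrSizedInt {
  enum class Kind : uint8_t { Number, CastPointer, Uninitialized, OutOfLifetime };
  Kind K = Kind::Uninitialized;
  uint64_t Bits = 0;            // the integer value, or the pointer's bits
  const void *Origin = nullptr; // original pointee when K == CastPointer

  bool isUsableAsNumber() const { return K == Kind::Number; }
};

// Casting an integer back to a pointer is only meaningful if the value
// originated from a pointer; anything else must be diagnosed as UB.
static bool mustDiagnoseIntToPtrCast(const PtrSizedInt &V) {
  return V.K != PtrSizedInt::Kind::CastPointer;
}
```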

nand marked 10 inline comments as done.Mon, Jul 29, 1:01 PM

We can add a separate integer type which tracks all the additional information required by __builtin_constant_p and compile all integers to it in this context. A later patch added an APInt fallback to the interpreter if an integral cannot be mapped to a type supported by the VM - this mechanism could be used to implement the fallback for contexts which cast pointers to integers.

125 ↗(On Diff #208520)

I've removed the size field since it's not going to be used, but left the structure since it will gain other fields in future patches.

bruno added a subscriber: bruno.Wed, Jul 31, 12:01 PM
nand marked 40 inline comments as done.Wed, Jul 31, 2:09 PM

I have applied most of the changes you suggested to my HEAD, which has significantly more functionality, including a TableGen-based replacement.
Quite a few of your concerns are answered by the features I have added between submitting the patch and now. The interpreter now stands at ~6k LOC.

Should I update the diff with an interpreter trimmed down to the functionality of the current diff (~3k LOC) or post the full interpreter now? What would be easier to review?

88 ↗(On Diff #208520)

Overflow happens here if a function takes more than a few million arguments. I think clang would die before it gets here.

488–503 ↗(On Diff #208520)

I'd like to do this in a future diff, once we add support for more constructs which discard their results.

89–90 ↗(On Diff #208520)

Yes, I considered allocating this on the stack - I just don't want to add that complexity to the interpreter yet. I'd like to avoid changing this until it becomes a noticeable performance problem.

30 ↗(On Diff #208520)

Subobjects will have pointers to descriptors embedded into their data - a pointer inside a block can follow that chain of descriptors up to the root.

36–39 ↗(On Diff #208520)

I've added an alias to be used for the sizes of objects as determined by the interpreter - CharUnits will be used when interfacing with APValue.

40–49 ↗(On Diff #208520)

I'd like to avoid packing things for now; it makes them harder to change later, and it's not a performance problem yet.

105 ↗(On Diff #208520)

This wouldn't have been a problem, but the compilation of default constructors caused issues. Now the compiler can recursively invoke itself.

106–107 ↗(On Diff #208520)

Replaced this with a dense array + binary search.
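The "dense array + binary search" replacement can be sketched as follows; the types and names here are illustrative stand-ins, not the patch's actual SourceInfo machinery. Records are kept sorted by bytecode offset and a lookup finds the record covering a given program counter:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical source-mapping entry: each record applies from its Offset up
// to the next entry's Offset.
struct SourceEntry {
  uint32_t Offset;   // bytecode offset the entry starts at
  uint32_t SourceId; // stand-in for the real source information
};

// Binary search for the entry covering PC: find the first entry starting
// strictly after PC, then step back one.
static const SourceEntry *findSource(const std::vector<SourceEntry> &Table,
                                     uint32_t PC) {
  auto It = std::upper_bound(
      Table.begin(), Table.end(), PC,
      [](uint32_t P, const SourceEntry &E) { return P < E.Offset; });
  if (It == Table.begin())
    return nullptr; // PC precedes all entries
  return &*std::prev(It);
}
```

Compared to a std::map, the flat sorted vector has no per-node allocation and much better locality, at the cost of requiring the table to be built in (or sorted into) offset order.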

51–53 ↗(On Diff #208520)

Removed this type.

64–75 ↗(On Diff #208520)

I'd rather have a documented implicit than use nested macros...

nand updated this revision to Diff 213277.Sun, Aug 4, 8:22 PM
nand marked 10 inline comments as done.

Implemented suggestions from @rsmith.

Since I had ~30 patches on top of the code in the diff, I applied the changes on those
and removed some functionality to produce this updated diff. This update uses the
byte code generator parameterised by a template emitter: the LinkEmitter emits and
links bytecode, while the EvalEmitter directly evaluates opcodes. The interpreter,
evaluator and most opcode-related code is now generated through
TableGen. Since the feedback included several references to pointers, I have included
some pointer functionality as well.

I hope the size of this diff is manageable - if really required, I could further trim
it down. Since this is a new subproject which does not introduce intrusive changes and
there are a significant number of other commits depending on this patch, I would like
to avoid that.

nand retitled this revision from [ConstExprPreter][WIP] Initial patch for the constexpr interpreter to [Clang Interpreter] Initial patch for the constexpr interpreter.Sun, Aug 4, 8:23 PM

I would like to better understand the big picture for descriptors, pointers, and so on. I'm not yet seeing how the pieces are going to fit together and not frequently require expensive operations. For example, pointer arithmetic requires determining the array bound of the pointee; pointer subtraction requires determining whether two pointers point into the same array; pointer comparisons require determining the divergence point in the access paths between two pointers. These path-based operations were easy in the old representation, and seem to be difficult in the new representation, so I want to make sure that they've been properly considered. It's also not clear to me how you'll model pointers that have lost their designator (through constant folding), where you have a runtime offset but no compile-time offset, or pointers that point to objects whose values are not part of the evaluation (for example, extern globals), or pointers that have a designator but no base (for __builtin_object_size handling).

This choice of representation also seems to make it really easy for a minor bug in the interpreter to turn into undefined behavior during compilation. Have you given any thought to how that might be avoided or mitigated?


"Clang" is redundant here (what else could it be?). If we're calling the new thing "interp" internally, then I guess "EnableInterp" and "ForceInterp" seem OK, but given that this is just scaffolding until the new interpreter is done, maybe "EnableNewInterp" and "ForceNewInterp" would be better. Suggestion:

BENIGN_LANGOPT(EnableNewInterp, 1, 0,
               "enable the experimental new constexpr interpreter")
BENIGN_LANGOPT(ForceNewInterp, 1, 0,
               "force the use of the experimental new constexpr interpreter")

On reflection, let's put "new-" in here too, to match -fexperimental-new-pass-manager (so -fexperimental-new-constexpr-interpreter). (As above, this flag name should not contain the string "clang".)


Please keep these in alphabetical order.


Please keep these in alphabetical order.


Missing a check for ForceClangInterp here.


Missing a check for ForceClangInterp here.


Isn't this in the wrong place? If interp succeeds, I think we always want to stop; it's only if interp fails / bails that ForceClangInterp should matter. Or did I misunderstand what that flag is for?

30 ↗(On Diff #208520)

I don't think I understand what you're suggesting here. If I have, for example:

struct X { char c; };
X x[1000000];

... are you saying you would embed 1000000 descriptors into the representation of x? That seems extremely wasteful to me. Presumably the goal should be to get as close to allocating 1000000 bytes of storage for x as possible. I think we can probably get down to 1000000 bytes for the c members plus perhaps 1 bit for each X object (to track whether it's within its lifetime) and 2 bits for each char object (to track whether it's within its lifetime and whether it's initialized), for a 1.375x storage overhead in this case. But I don't see how you get there from this representation.
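The arithmetic behind the 1.375x figure can be checked directly; this is a standalone sketch of the computation, not interpreter code:

```cpp
#include <cassert>

// 1,000,000 X objects, each holding one char member: 1,000,000 bytes of
// payload, plus 1 lifetime bit per X and 2 bits per char (lifetime +
// initialized), rounded here by packing the bits 8 to a byte.
constexpr long N = 1000000;
constexpr long DataBytes = N * 1;        // storage for the char members
constexpr long XBits = N * 1;            // 1 bit per X object
constexpr long CharBits = N * 2;         // 2 bits per char object
constexpr long TotalBytes = DataBytes + (XBits + CharBits) / 8;

static_assert(TotalBytes == 1375000, "1.375x the ideal 1,000,000 bytes");
```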


I think emitDestruction would probably be a better name, since this needs to encompass things that are not destructors (such as invalidating pointers/references that point into the destroyed objects).


Doc comment please. "Block" means two different things in Clang (compound statements and the Apple Blocks extension) and a reader needs to know which one.


I don't know what this means. Declarators don't typically introduce a scope (for lifetime purposes).

Do you mean "for the initializer of a top-level variable declaration" or something like that?


No else after return, please:

if (auto *Exp = dyn_cast<Expr>(S)) {
  if (Exp->getType()->isVoidType())
    return this->Visit(Exp);
  return discard(Exp);
}
return this->bail(S);

No auto here, and please don't shadow clang::Stmt with a local variable.


No auto here, please.


No auto here, please.


No auto here, please.


No auto here, please.


Especially no auto here. Even someone familiar with the rest of the clang codebase won't be able to guess these types. In particular, as a reader, I'm wondering if this is some type provided by Emitter or whether it's some type provided by ByteCodeGen, and what the contract is between this class and its template argument with regard to this type. Spelling this as typename Emitter::Label would help substantially.

I'm not going to point out any more uses, but generally if you only use auto when the type is obvious from the spelling of the initializer (eg, when the initializer is dyn_cast<T>(...)), that would be fine.


(Uses of auto like this are also fine; it's obvious this is an iterator and we don't need to know more than that.)


Have you considered how you will support Clang's current constant folding of && and || expressions where the left-hand side is non-constant but the right hand side is a false or true constant (respectively)?
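The folding being asked about looks like this; the function names are illustrative, not from the patch. The left-hand side is non-constant, but the constant right-hand side forces the overall result:

```cpp
#include <cassert>

// Even though lhs is unknown, the result is fully determined by the
// right-hand side, so Clang's current evaluator can fold the expression.
static int foldAndFalse(int lhs) { return lhs && false; } // always 0
static int foldOrTrue(int lhs) { return lhs || true; }    // always 1
```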


We can, but should we?

I would think the only time we can reach this case is when we're constant-evaluating an expression within a function and that expression names a function parameter, and in that case, the expression should be treated as non-constant. We should have a more direct way to represent "if you get here, the evaluation is non-constant, with this reason" in the bytecode.


The above ParmVarDecl case can fall through into here, because a ParmVarDecl is a VarDecl. Is that what you intended?

Do you actually need to treat locals and parameters separately? This would seem simpler if you could model them as being the same thing.


Using an && here rather than a nested if would help readability a little.


Please try to find some way to refactor this so that there's not so many nested scopes; it makes it hard to follow the control flow within the function.


I'd like to see this generated from a .def file, rather than enumerating the various currently-supported integer types in many different places. The goal should be that adding support for a new integer type / size (say we start supporting a target with 48 bit integers) should require making changes in only one place. Case in point: you're currently missing support for 128-bit integers, which Clang does support across many targets.
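The .def-file pattern being suggested can be sketched with an X-macro; in a real setup the type list would live in its own IntegerTypes.def file, and the names here are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Single source of truth for the supported integer types: adding a new
// width means adding exactly one line here.
#define INTEGER_TYPES(M)                                                       \
  M(Sint8, int8_t)                                                             \
  M(Sint16, int16_t)                                                           \
  M(Sint32, int32_t)                                                           \
  M(Sint64, int64_t)

// Every consumer expands the same list, so the enum and the switch below
// can never get out of sync.
enum class PrimType {
#define ENUM_ENTRY(Name, CType) Name,
  INTEGER_TYPES(ENUM_ENTRY)
#undef ENUM_ENTRY
};

static unsigned sizeOf(PrimType T) {
  switch (T) {
#define SIZE_CASE(Name, CType)                                                 \
  case PrimType::Name:                                                         \
    return sizeof(CType);
    INTEGER_TYPES(SIZE_CASE)
#undef SIZE_CASE
  }
  return 0;
}
```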


These {} arguments are completely opaque here. Please add some kind of a clue as to what you're passing in here. (Can you use SourceInfo()?)


Does Emitter need to be a base class rather than a member?


Is there any reason these need to be virtual? It looks like you would never call these from a context where the concrete type is unknown. (These are the only implementations of the pure virtual functions they override.)


Parameter names here would help explain what these functions are supposed to do -- particularly for BlockMoveFn (which char* is which?), but also for the bool parameter of BlockCtorFn.


scoped -> scope


It seems like a problem that an unsupported construct in a !isActive() context causes us to fail evaluation.


Can this happen? I would expect not: if we can't cope with backwards jumps here, I wouldn't expect us to be able to cope with function calls / returns.

If this can happen, we need to skip anything that happens after the return rather than keeping on evaluating it.


I can't find any calls to this, and I'm not sure what it's for. Is it necessary?


I'm curious how much this approach costs us, compared to allowing the emitter to tell ByteCodeGen whether to emit both arms of a ?: (or if not, telling it which one to follow) and similarly for && / ||. There's the constant overhead of checking isActive() on each action, plus the cost of the calls into EvalEmitter while "evaluating" the unselected operand. (Eg, false ? some-very-large-expression : 0 would be evaluated much faster if we could tell ByteCodeGen not to bother with the unselected operand.)

(This approach doesn't support statement-expressions, because there can be backwards jumps in those, so I think ?:/&&/|| are pretty much the only things this supports, right?)


This is missing lots of necessary checks; I think your chosen model for pointers makes these checks quite hard to implement. The validity of pointer comparisons in constant evaluation depends on the point of divergence of the two access paths. For example:

struct A { int n; };
struct B : A { int m; } b;
B ba[10];
constexpr bool x = &ba[0].n < &ba[0].m; // error, not constant, divergence point is base class
constexpr bool y = &ba[0].n < &ba[1].m; // ok, divergence point is array indexing

Also, access control must be taken into account:

struct A {
  int x;
private:
  int y;
public:
  constexpr bool q(A a) { return &a.x < &a.y; } // not constant, x and y have different access
};
On what basis do you assume that the order of the offsets in the compile-time representation has any relation to the correct order for a pointer comparison? Do you intend to guarantee this when doing struct layout for the interpreter? If so, please include a comment here explaining that.


It seems wasteful to me to do this initialization work when entering the frame rather than when reaching the declaration of the variable in question. Do we need to do it early like this?

How does this deal with loops? I would have expected that we'd re-initialize a loop scope each time we enter the loop body, so that we don't confuse variables from one iteration with variables from another.

32 ↗(On Diff #213277)

Should this be called ByteCodeEmitter if it emits byte code?


This still needs to be expanded to actually describe what the opcodes do (that is, their effects on the stack and any side-effects they cause). Something simple like:

// [RetVal] -> [], returns RetVal
def Ret : Opcode;
// [A, B] -> [B - A]
def Sub : AluRealOpcode;

would be enough.


If we're allocating one of these for every storage allocation in the evaluation, packing these bools into the spare bits in the pointers seems worthwhile.
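The trick being suggested relies on pointer alignment: a pointer aligned to at least 4 bytes always has two zero low bits, which can hold two boolean flags (LLVM packages this as llvm::PointerIntPair). A hand-rolled sketch, with illustrative names:

```cpp
#include <cassert>
#include <cstdint>

// Stores a 4-byte-aligned pointer plus two bools in a single word.
class PtrAndFlags {
  uintptr_t Value = 0;

public:
  void set(void *P, bool A, bool B) {
    uintptr_t Raw = reinterpret_cast<uintptr_t>(P);
    assert((Raw & 0x3) == 0 && "pointer must be 4-byte aligned");
    Value = Raw | (uintptr_t(A) << 1) | uintptr_t(B);
  }
  void *pointer() const {
    return reinterpret_cast<void *>(Value & ~uintptr_t(0x3));
  }
  bool flagA() const { return Value & 0x2; }
  bool flagB() const { return Value & 0x1; }
};
```

For a structure allocated once per storage allocation, this saves the padding that separate bool members would otherwise cost.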


What is this for? I think we need a design document here describing the intended model for pointers, references, descriptors, and so on -- preferably something that can be included in the internals manual.


You will also need to track at least:

  • Whether this is a one-past-the-end pointer
  • (Information equivalent to) which union members were accessed on the path to this point

I think everything else can be reconstructed from the type and offset.

nand updated this revision to Diff 214828.Tue, Aug 13, 7:05 AM
nand marked 60 inline comments as done.

Implemented @zygoloid's suggestions

nand added a comment.Tue, Aug 13, 7:05 AM

updated diff, implemented requested changes


The goal is to evaluate everything that is constant - I would like to avoid using constexpr specifically in the name.


This is intended - if the interpreter bails and ForceInterp is set, we emit a diagnostic in Context, promoting the error to a failure.


Same as with statement expressions in the evaluating emitter - some sort of closure which compiles and executes when triggered.


We absolutely need this to emit the warning:

constexpr int callee_ptr(int *beg, int *end) {
  const int x = 2147483647;
  *beg = x + x; // expected-warning {{overflow in expression; result is -2 with type 'int'}}\
                // expected-note 3{{value 4294967294 is outside the range of representable values of type 'int'}}
  return *beg;
}

Refactored + fixed this by moving stuff to separate functions.


This is the only case where such a switch occurs - there is another for zero initialisation, but it's quite different. A def which would capture the little common functionality would be quite ugly, so I would like to avoid it.

As for other integers, there won't be support for any. There are two more branches for the fixed-point catch-all fallback.


The entry point is through the emitter which needs to call into the code generator, things don't really work if the emitter is a field.


Currently the new interpreter cannot work side-by-side with the current evaluator, since reading from APValue is not supported at all... Since I am unsure whether I ever want to implement an APValue -> Interpreter mapping, I do not mind this, as long as diagnostics pinpoint the next feature to be implemented.


Theoretically not, but it helped me catch a few bugs in the past, so I would like to keep it conditional.


The conditional operator will require it to handle the fallthrough from the false branch to the following expression.


I'd like to revisit this at a later point in time - for now I think predication is the simplest option and I would like to stick with the simple options until the interpreter is feature-complete.

Statement expressions would not be predicated, the current plan is to generate a closure-like method for them.
Another idea would be to revert to bytecode generation whenever the landing pad of a potentially backwards jump is encountered - these spots are quite well defined.


I don't think there's a zero-cost way around this. We'll need to track more metadata to check whether fields are public/private and traverse the pointer chain to find divergence points. The InlineDescriptors might have a few spare bits that could be used to cache "tags" classifying pointers into comparable classes. At this point, it is quite hard to decide which cases to make fast and at what cost.


Yes, the layout of fields in the interpreter is supposed to match the target's record layout.


We can optimise this later, avoiding the need to create blocks for local primitives whose address is never taken. If no pointer to a local is ever generated through GetLocalPtr, the block is not needed at all and zero-initialisation using memset should do the job.


I was a bit afraid of documenting everything and then changing everything, but things seem to be quite stable now, so I will add some documentation.


It's definitely on the TODO list - I would like to avoid such optimisations for now, until enough features are available to run sizeable benchmarks.


I haven't written things up yet since I changed quite a few things over the past few weeks. I am wrapping up support for unions and will produce a more detailed document afterwards. In the meantime, I have documented this field.


It's on the TODO list - I am aiming for C++11 conformance ASAP and this will be required.

nand added a comment.Tue, Aug 13, 7:18 AM

The old path-based approach is slow for most operations, while the new pointer-based approach is fast for the most common ones.
For read/write/update operations, it is enough to check the pointer and at most a single descriptor to ensure whether the operation
can be performed. For some operations, such as some pointer comparisons, in the worst case the old-style path can be reproduced by
following the pointers produced through the getBase method, allowing us to implement everything.
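The path reconstruction described above can be sketched as a walk up the getBase chain; the InterpPtr type and field names below are hypothetical stand-ins for the patch's actual pointer representation:

```cpp
#include <cassert>
#include <vector>

// Hypothetical pointer: each one records the enclosing subobject and its
// offset within it, so the old-style access path can be rebuilt on demand.
struct InterpPtr {
  const InterpPtr *Base = nullptr; // enclosing subobject, null at the root
  unsigned Offset = 0;             // offset within the base

  const InterpPtr *getBase() const { return Base; }
};

// Walk leaf-to-root collecting offsets, then reverse so the result reads
// root-to-leaf like a designator.
static std::vector<unsigned> reconstructPath(const InterpPtr *P) {
  std::vector<unsigned> Path;
  for (; P; P = P->getBase())
    Path.push_back(P->Offset);
  return {Path.rbegin(), Path.rend()};
}
```

Common operations never pay for this walk; only the slow cases (such as the hard pointer comparisons discussed earlier) fall back to it.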

Extern objects are handled by allocating memory for them, allowing pointers to those regions to exist, but marking the actual storage
as extern/uninitialized and preventing reads/writes to that storage.

Some details of corner cases are not clear at this point, but the fact that we can reproduce full paths and have space to store any special
bits gives me the confidence that we can converge to a full implementation.

I want to stress again the fact that C++ has an insane number of features and I am focusing on implementing as many of them as possible, leaving a
path towards future optimisations open. I do not want to clutter initial patches with optimisations that would increase complexity - the
few things we do with direct local reads/writes and improved loops already yield significant speedups.