diff --git a/clang/docs/ConstantInterpreter.rst b/clang/docs/ConstantInterpreter.rst --- a/clang/docs/ConstantInterpreter.rst +++ b/clang/docs/ConstantInterpreter.rst @@ -44,21 +44,28 @@ * ``PT_Ptr`` - Pointer type, defined in ``"Pointer.h"``. + Pointer type, defined in ``"Pointer.h"``. A pointer can be null, reference interpreter-allocated memory (``BlockPointer``), or point to an address which can be derived but not accessed (``ExternPointer``). * ``PT_FnPtr`` - Function pointer type, can also be a null function pointer. Defined in ``"Pointer.h"``. + Function pointer type, can also be a null function pointer. Defined in ``"FnPointer.h"``. * ``PT_MemPtr`` - Member pointer type, can also be a null member pointer. Defined in ``"Pointer.h"`` + Member pointer type, can also be a null member pointer. Defined in ``"MemberPointer.h"``. + +* ``PT_VoidPtr`` + + Void pointer type, can be used for round-trip casts. Represented as the union of all pointer types which can be cast to ``void *``. Defined in ``"VoidPointer.h"``. + +* ``PT_ObjCBlockPtr`` + + Pointer type for Objective-C blocks. Defined in ``"ObjCBlockPointer.h"``. Composite types --------------- -The interpreter distinguishes two kinds of composite types: arrays and records. Unions are represented as records, except a single field can be marked as active. The contents of inactive fields are kept until they -are reactivated and overwritten. +The interpreter distinguishes two kinds of composite types: arrays and records (structs and classes). Unions are represented as records, except that at most a single field can be marked as active. The contents of inactive fields are kept until they are reactivated and overwritten. Complex numbers (``_Complex``) and vectors (``__attribute__((vector_size(16)))``) are treated as arrays. Bytecode Execution @@ -85,8 +92,6 @@ * ``IsStatic`` indicates whether the block has static duration in the interpreter, i.e. it is not a local in a frame.
-* ``IsExtern`` indicates that the block was created for an extern and the storage cannot be read or written. - * ``DeclID`` identifies each global declaration (it is set to an invalid and irrelevant value for locals) in order to prevent illegal writes and reads involving globals and temporaries with static storage duration. Static blocks are never deallocated, but local ones might be deallocated even when there are live pointers to them. Pointers are only valid as long as the blocks they point to are valid, so a block with pointers to it whose lifetime ends is kept alive until all pointers to it go out of scope. Since the frame is destroyed on function exit, such blocks are turned into a ``DeadBlock`` and copied to storage managed by the interpreter itself, not the frame. Reads and writes to these blocks are illegal and cause an appropriate diagnostic to be emitted. When the last pointer goes out of scope, dead blocks are also deallocated. @@ -97,7 +102,7 @@ * **DtorFn**: invokes the destructors of non-trivial objects. * **MoveFn**: moves a block to dead storage. -Non-static blocks track all the pointers into them through an intrusive doubly-linked list, this is required in order to adjust all pointers when transforming a block into a dead block. +Non-static blocks track all the pointers into them through an intrusive doubly-linked list, which is required to adjust all pointers when a block is transformed into a dead block. Descriptors ----------- @@ -110,13 +115,13 @@ * **Arrays of primitives** - An array of primitives contains a pointer to an ``InitMap`` storage as its first field: the initialisation map is a bit map indicating all elements of the array which were initialised. If the pointer is null, no elements were initialised, while a value of ``(InitMap)-1`` indicates that the object was fully initialised. when all fields are initialised, the map is deallocated and replaced with that token.
+ An array of primitives contains a pointer to an ``InitMap`` storage as its first field: the initialisation map is a bit map indicating all elements of the array which were initialised. If the pointer is null, no elements were initialised, while a value of ``(InitMap*)-1`` indicates that the object was fully initialised. When all elements are initialised, the map is deallocated and replaced with that token. Array elements are stored sequentially, without padding, after the pointer to the map. * **Arrays of composites and records** - Each element in an array of composites is preceded by an ``InlineDescriptor``. Descriptors and elements are stored sequentially in the block. Records are laid out identically to arrays of composites: each field and base class is preceded by an inline descriptor. The ``InlineDescriptor`` has the following field: +  Each element in an array of composites is preceded by an ``InlineDescriptor``, which stores the attributes specific to that element or field rather than to the whole allocation site. Descriptors and elements are stored sequentially in the block. Records are laid out identically to arrays of composites: each field and base class is preceded by an inline descriptor. The ``InlineDescriptor`` has the following fields: * **Offset**: byte offset into the array or record, used to step back to the parent array or record. * **IsConst**: flag indicating if the field is const-qualified. @@ -130,7 +135,26 @@ Pointers -------- -Pointers track a ``Pointee``, the block to which they point or ``nullptr`` for null pointers, along with a ``Base`` and an ``Offset``. The base identifies the innermost field, while the offset points to an array element relative to the base (including one-past-end pointers). Most subobject the pointer points to in block, while the offset identifies the array element the pointer points to. These two fields allow all pointers to be uniquely identified and disambiguated.
+Pointers, implemented in ``Pointer.h``, are represented as a tagged union of the following kinds. Some of these may not yet be available in upstream ``clang``. + + * **BlockPointer**: references memory allocated and managed by the interpreter. It is the only pointer kind which allows dereferencing in the interpreter. + * **ExternPointer**: points to memory which can be addressed but not read by the interpreter. Like an lvalue ``APValue``, it tracks a declaration and a path of fields and indices into that allocation. + * **TargetPointer**: represents a target address derived from a base address through pointer arithmetic, such as ``((int *)0x100)[20]``. Null pointers are target pointers with a zero offset. + * **TypeInfoPointer**: tracks information for the opaque type returned by ``typeid``. + * **InvalidPointer**: a dummy pointer created by an invalid operation, allowing the interpreter to continue execution. It does not allow pointer arithmetic or dereferencing. + +Besides the previously mentioned union, several other pointer kinds have their own types: + + * **ObjCBlockPointer** tracks Objective-C blocks + * **FnPointer** tracks functions and lazily caches their compiled version + * **MemberPointer** tracks C++ object members + +Void pointers, which can be built by casting any of the aforementioned pointers, are implemented as a union of all pointer types. The ``BitCast`` opcode is responsible for performing all legal conversions between these types and primitive integers. + +BlockPointer +~~~~~~~~~~~~ + +Block pointers track a ``Pointee``, the block to which they point, along with a ``Base`` and an ``Offset``. The base points to the innermost field or array containing the referenced object, while the offset identifies the referenced array element or field relative to the base (including one-past-end pointers).
 These two fields allow all pointers to be uniquely identified, disambiguated and characterised. As an example, consider the following structure: @@ -164,9 +188,37 @@ a |&a.b.x &a.y &a.c |&a.c[0].a |&a.c[1].a | &a.b &a.c[0] &a.c[1] &a.z -The ``Base`` offset of all pointers points to the start of a field or an array and is preceded by an inline descriptor (unless ``Base == 0``, pointing to the root). All the relevant attributes can be read from either the inline descriptor or the descriptor of the block. +The ``Base`` offset of all pointers points to the start of a field or an array and is preceded by an inline descriptor (unless ``Base`` is zero, in which case it points to the root). All the relevant attributes can be read from either the inline descriptor or the descriptor of the block. + + +Array elements are identified by the ``Offset`` field of pointers. For composite arrays it points past the element's inline descriptor, while for primitive arrays it points to the start of the element data. In both cases, the ``Offset`` is the position from which primitives can be read. As an example, ``a.c + 1`` would have the same base as ``a.c`` since it is an element of ``a.c``, but its offset would point to ``&a.c[1]``. The array-to-pointer decay operation adjusts a pointer to an array (where the offset is equal to the base) to a pointer to the first element. + +ExternPointer +~~~~~~~~~~~~~ + +Extern pointers can be formed to symbols which cannot be read in constant expressions. An extern pointer consists of a base declaration, along with a path designating a subobject, similar to the ``LValuePath`` of an ``APValue``. Extern pointers can be converted to block pointers if the underlying variable is defined after the pointer is created, as is the case in the following example: + +.. 
code-block:: cpp + + extern const int a; + constexpr const int *p = &a; + const int a = 5; + static_assert(*p == 5, "x"); + +TargetPointer +~~~~~~~~~~~~~ + +While null pointer arithmetic and integer-to-pointer conversions are banned in constant expressions, some expressions on target offsets must be folded, replicating the behaviour of the ``offsetof`` builtin. Target pointers are characterised by three offsets (a field offset, an array offset and a base offset), along with a descriptor specifying the type the pointer is supposed to refer to. Array indexing adjusts the array offset, while the field offset is adjusted when a pointer to a member is created. Casting an integer to a pointer sets the value of the base offset. As a special case, null pointers are target pointers with all offsets set to 0. + +TypeInfoPointer +~~~~~~~~~~~~~~~ + +``TypeInfoPointer`` tracks two types: the type assigned to ``std::type_info`` and the type which was passed to ``typeid``. + +InvalidPointer +~~~~~~~~~~~~~~ -Array elements are identified by the ``Offset`` field of pointers, pointing to past the inline descriptors for composites and before the actual data in the case of primitive arrays. The ``Offset`` points to the offset where primitives can be read from. As an example, ``a.c + 1`` would have the same base as ``a.c`` since it is an element of ``a.c``, but its offset would point to ``&a.c[1]``. The ``*`` operation narrows the scope of the pointer, adjusting the base to ``&a.c[1]``. The reverse operator, ``&``, expands the scope of ``&a.c[1]``, turning it into ``a.c + 1``. When a one-past-end pointer is narrowed, its offset is set to ``-1`` to indicate that it is an invalid value (expanding returns the past-the-end pointer). As a special case, narrowing ``&a.c`` results in ``&a.c[0]``. The `narrow` and `expand` methods can be used to follow the chain of equivalent pointers.
+Such pointers are built by operations which cannot generate valid pointers, allowing the interpreter to continue execution after emitting a warning. Inspecting such a pointer stops execution. TODO ==== @@ -174,20 +226,18 @@ Missing Language Features ------------------------- -* Definition of externs must override previous declaration * Changing the active field of unions -* Union copy constructors -* ``typeid`` * ``volatile`` * ``__builtin_constant_p`` -* ``std::initializer_list`` -* lambdas -* range-based for loops -* ``vector_size`` * ``dynamic_cast`` +* ``new`` and ``delete`` +* Fixed-point numbers and arithmetic on complex numbers +* Several builtin functions, including string operations and ``__builtin_bit_cast`` +* Continue-after-failure: a form of exception handling at the bytecode level should be implemented to allow execution to resume. As an example, argument evaluation should resume after the computation of an argument fails. +* Pointer-to-integer conversions +* Lazy descriptors: the interpreter creates a ``Record`` and a ``Descriptor`` when it encounters a type; those for types which are not yet defined should be created lazily when required Known Bugs ---------- -* Pointer comparison for equality needs to narrow/expand pointers * If execution fails, memory storing APInts and APFloats is leaked when the stack is cleared