Index: docs/LangRef.rst =================================================================== --- docs/LangRef.rst +++ docs/LangRef.rst @@ -4581,68 +4581,69 @@ :ref:`Representation` talks about the metadata encoding of various entities. -It is always possible to trace any TBAA node to a "root" TBAA node (details -in the :ref:`Representation` section). TBAA -nodes with different roots have an unknown aliasing relationship, and LLVM -conservatively infers ``MayAlias`` between them. The rules mentioned in -this section only pertain to TBAA nodes living under the same root. - .. _tbaa_node_semantics: Semantics """"""""" The TBAA metadata system, referred to as "struct path TBAA" (not to be -confused with ``tbaa.struct``), consists of the following high level -concepts: *Type Descriptors*, further subdivided into scalar type -descriptors and struct type descriptors; and *Access Tags*. +confused with ``tbaa.struct``), consists of three high-level concepts: +*Type Descriptors*, *Field Descriptors* and *Access Tags*. **Type descriptors** describe the type system of the higher level language -being compiled. **Scalar type descriptors** describe types that do not -contain other types. Each scalar type has a parent type, which must also -be a scalar type or the TBAA root. Via this parent relation, scalar types -within a TBAA root form a tree. **Struct type descriptors** denote types -that contain a sequence of other type descriptors, at known offsets. These -contained type descriptors can either be struct type descriptors themselves -or scalar type descriptors. - -**Access tags** are metadata nodes attached to load and store instructions. -Access tags use type descriptors to describe the *location* being accessed -in terms of the type system of the higher level language. Access tags are -tuples consisting of a base type, an access type and an offset. The base -type is a scalar type descriptor or a struct type descriptor, the access -type is a scalar type descriptor, and the offset is a constant integer. - -The access tag ``(BaseTy, AccessTy, Offset)`` can describe one of two -things: - - * If ``BaseTy`` is a struct type, the tag describes a memory access (load - or store) of a value of type ``AccessTy`` contained in the struct type - ``BaseTy`` at offset ``Offset``. - - * If ``BaseTy`` is a scalar type, ``Offset`` must be 0 and ``BaseTy`` and - ``AccessTy`` must be the same; and the access tag describes a scalar - access with scalar type ``AccessTy``. - -We first define an ``ImmediateParent`` relation on ``(BaseTy, Offset)`` -tuples this way: - - * If ``BaseTy`` is a scalar type then ``ImmediateParent(BaseTy, 0)`` is - ``(ParentTy, 0)`` where ``ParentTy`` is the parent of the scalar type as - described in the TBAA metadata. ``ImmediateParent(BaseTy, Offset)`` is - undefined if ``Offset`` is non-zero. - - * If ``BaseTy`` is a struct type then ``ImmediateParent(BaseTy, Offset)`` - is ``(NewTy, NewOffset)`` where ``NewTy`` is the type contained in - ``BaseTy`` at offset ``Offset`` and ``NewOffset`` is ``Offset`` adjusted - to be relative within that inner type. - -A memory access with an access tag ``(BaseTy1, AccessTy1, Offset1)`` -aliases a memory access with an access tag ``(BaseTy2, AccessTy2, -Offset2)`` if either ``(BaseTy1, Offset1)`` is reachable from ``(Base2, -Offset2)`` via the ``Parent`` relation or vice versa. +being compiled. Each type descriptor has a parent, which must also +be a type descriptor or the TBAA root. Via this parent relation, types +within a TBAA root form a tree. TBAA metadata provides no aliasing +information about types with different roots. + +In addition to the parent node, each type descriptor contains the size of the +denoted type, its identifier and an optional sequence of field descriptors. +Type identifiers are arbitrary ``Metadata`` nodes whose purpose is to guarantee +that different types are represented with different descriptors, even if +otherwise they are identical. For example, two C structures of the same layout +can be differentiated with metadata strings containing their tag names. + +**Field descriptors** are used to describe layout of structure types. Each +field descriptor contains the size of the field, its offset and type. + +**Access tags** are metadata nodes attached to various instructions, such as +loads and stores, that are used to describe the *location* being accessed by +these instructions in terms of the higher level language. + +Every access tag consists of a base type, an access type, an offset, a size and +an optional immutability flag. An access tag with different base and access +types represents an access to a field of an object of the base type. In this +case the offset is the offset of the field within the base object. Otherwise, +the base and access types are the same type and the tag denotes an access to +an object of that type. + +Access sizes together with type sizes encoded in type descriptors make it +possible to represent accesses to arbitrary parts of objects. Thus, if the +size of an access is larger that the size of its access type, then it is +considered to be an access to a range of array elements of that access type +that fits the size of the access. For example, if the size of an access is +28 and the access type is ``int`` of size 4, then it is an access to all or +any of the elements of an ``int[7]`` array. + +The immutability flag, when present, specifies whether the access tag states +that the location being accessed is "constant" (meaning +``pointsToConstantMemory`` should return true; see +`other useful AliasAnalysis methods `_.) + +Two accesses alias if: -As a concrete example, the type descriptor graph for the following program + * the base types of the accesses are the same type and the ranges of accessed + bytes within that type overlap, or + + * the base type of one access is the type of a *direct* or *indirect* field of + the other base type and the ranges of accessed bytes within that field type + overlap. A field is a direct field of a type if the descriptor of that + field is part of the field sequence of that type's descriptor. Then, + a field is an indirect field of a type if it is a direct or indirect field + of one of the direct fields of that type. + + +For example, for the following code: .. code-block:: c @@ -4654,73 +4655,123 @@ struct Outer { float f; // offset 0 double d; // offset 4 - struct Inner inner_a; // offset 12 + struct Inner inner; // offset 12 + }; + + union U { + int i; // offset 0 + struct Inner inner; // offset 0 }; - void f(struct Outer* outer, struct Inner* inner, float* f, int* i, char* c) { - outer->f = 0; // tag0: (OuterStructTy, FloatScalarTy, 0) - outer->inner_a.i = 0; // tag1: (OuterStructTy, IntScalarTy, 12) - outer->inner_a.f = 0.0; // tag2: (OuterStructTy, IntScalarTy, 16) - *f = 0.0; // tag3: (FloatScalarTy, FloatScalarTy, 0) + void f(struct Outer* outer, struct Inner* inner, union U* u, + float* f, int* i) { + *i = 0; // tag1: (BaseType=Int, AccessType=Int, + // Offset=0, Size=4) + *f = 0.0; // tag2: (BaseType=Float, AccessType=Float, + // Offset=0, Size=4) + inner->i = 0; // tag3: (BaseType=Inner, AccessType=Int, + // Offset=0, Size=4) + inner->f = 0; // tag4: (BaseType=Inner, AccessType=Float, + // Offset=4, Size=4) + outer->f = 0; // tag5: (BaseType=Outer, AccessType=Float, + // Offset=0, Size=4) + outer->inner.i = 0; // tag6: (BaseType=Outer, AccessType=Int, + // Offset=12, Size=4) + outer->inner.f = 0.0; // tag7: (BaseType=Outer, AccessType=Int, + // Offset=16, Size=4) + + // Accesses to union members are not supported yet, and conservatively + // represented as accesses to character arrays. Note that in C and C++, + // character types can be used to access any objects. + u->i = 0; // tag8: (BaseType=Char, AccessType=Char, + // Offset=0, Size=4) } -is (note that in C and C++, ``char`` can be used to access any arbitrary -type): +the type descriptor graph is: .. code-block:: text - Root = "TBAA Root" - CharScalarTy = ("char", Root, 0) - FloatScalarTy = ("float", CharScalarTy, 0) - DoubleScalarTy = ("double", CharScalarTy, 0) - IntScalarTy = ("int", CharScalarTy, 0) - InnerStructTy = {"Inner" (IntScalarTy, 0), (FloatScalarTy, 4)} - OuterStructTy = {"Outer", (FloatScalarTy, 0), (DoubleScalarTy, 4), - (InnerStructTy, 12)} + Root = "TBAA Root" + Char = (Parent=Root, Size=1, Id="char") + Float = (Parent=Char, Size=4, Id="float") + Double = (Parent=Char, Size=8, Id="double") + Int = (Parent=Char, Size=4, Id="int") + Inner = {Parent=Char, Size=8, Id= "Inner", + (Type=Int, Offset=0, Size=4), + (Type=Float, Offset=4, Size=4)} + Outer = {Parent=Char, Size=20, Id="Outer", + (Type=Float, Offset=0, Size=4), + (Type=Double, Offset=4, Size=8), + (Type=Inner, Offset=12, Size=8)} + +The accesses ``inner->f`` and ``outer->inner.f`` alias, because the base type +of the first access (``Inner``) is also the type of a field of the second +base type (``Outer``) and both refer to the same member within that field. +In contrast, accesses ``*i`` and ``*f`` do not alias as they have +different base types and neither of the base types has fields. -with (e.g.) ``ImmediateParent(OuterStructTy, 12)`` = ``(InnerStructTy, -0)``, ``ImmediateParent(InnerStructTy, 0)`` = ``(IntScalarTy, 0)``, and -``ImmediateParent(IntScalarTy, 0)`` = ``(CharScalarTy, 0)``. +Accesses ``inner->i`` and ``inner->f`` do not alias either, because they +refer to non-overlapping fields within the same base type. .. _tbaa_node_representation: Representation """""""""""""" -The root node of a TBAA type hierarchy is an ``MDNode`` with 0 operands or -with exactly one ``MDString`` operand. +The root node of a TBAA type hierarchy is an ``MDNode`` with no operands: -Scalar type descriptors are represented as an ``MDNode`` s with two -operands. The first operand is an ``MDString`` denoting the name of the -struct type. LLVM does not assign meaning to the value of this operand, it -only cares about it being an ``MDString``. The second operand is an -``MDNode`` which points to the parent for said scalar type descriptor, -which is either another scalar type descriptor or the TBAA root. Scalar -type descriptors can have an optional third argument, but that must be the -constant integer zero. - -Struct type descriptors are represented as ``MDNode`` s with an odd number -of operands greater than 1. The first operand is an ``MDString`` denoting -the name of the struct type. Like in scalar type descriptors the actual -value of this name operand is irrelevant to LLVM. After the name operand, -the struct type descriptors have a sequence of alternating ``MDNode`` and -``ConstantInt`` operands. With N starting from 1, the 2N - 1 th operand, -an ``MDNode``, denotes a contained field, and the 2N th operand, a -``ConstantInt``, is the offset of the said contained field. The offsets -must be in non-decreasing order. - -Access tags are represented as ``MDNode`` s with either 3 or 4 operands. -The first operand is an ``MDNode`` pointing to the node representing the -base type. The second operand is an ``MDNode`` pointing to the node -representing the access type. The third operand is a ``ConstantInt`` that -states the offset of the access. If a fourth field is present, it must be -a ``ConstantInt`` valued at 0 or 1. If it is 1 then the access tag states -that the location being accessed is "constant" (meaning -``pointsToConstantMemory`` should return true; see `other useful -AliasAnalysis methods `_). The TBAA root of -the access type and the base type of an access tag must be the same, and -that is the TBAA root of the access tag. +.. code-block:: llvm + + !0 = !{} + +or exactly one ``MDString`` operand: + +.. code-block:: llvm + + !0 = !{ !"TBAA Root" } + +Type descriptors are ``MDNode`` tuples with at least three operands: + +.. code-block:: llvm + + !1 = !{ !0, i64 1, !"char" } + !2 = !{ !1, i64 4, !"float" } + !3 = !{ !1, i64 8, !"double" } + !4 = !{ !1, i64 4, !"int" } + !5 = !{ !1, i64 8, !"Inner", !4, i64 0, i64 4, !2, i64 4, i64 4 } + !6 = !{ !1, i64 20, !"Outer", !2, i64 0, i64 4, !3, i64 4, i64 8, !5, i64 12, i64 8 } + +The first operand is the ``MDNode`` of the parent descriptor or the TBAA root. + +The second operand is a ``ConstantInt`` node representing the size of the type +in bytes. + +The third operand is a ``Metadata`` node denoting the identifier of the type. +LLVM does not assign meaning to the value of this operand. + +Other operands are optional. They represent field descriptors so that each +descriptor encoded as a group of three operands. The first operand in a group +is the ``MDNode`` of the field's type descriptor. The second and third operands +are ``ConstantInt`` nodes that denote respectively the offset and size, +in bytes, of the field. Note that the offsets in field descriptors must be in +non-decreasing order. + +Access tags are represented as ``MDNode`` tuples with either 4 or 5 operands: + +.. code-block:: llvm + + !9 = !{ !6, !4, i64 16, i64 4 } ; tag7 + !7 = !{ !4, !4, i64 0, i64 4, i64 1 } ; An immutable access. + +The first and seconds operands are the ``MDNode`` s of the base and access type +descriptors respectively. + +The third and fourth operands are ``ConstantInt`` nodes that state the offset +and size of the access. Both are specified in bytes. + +The fifth operand is the optional immutability flag. It must be a +``ConstantInt`` valued at 0 (for mutable accesses) or 1 (for immutable ones). '``tbaa.struct``' Metadata ^^^^^^^^^^^^^^^^^^^^^^^^^^