Index: docs/LangRef.rst =================================================================== --- docs/LangRef.rst +++ docs/LangRef.rst @@ -4406,32 +4406,206 @@ implement typical C/C++ TBAA, but it can also be used to implement custom alias analysis behavior for other languages. -The current metadata format is very simple. TBAA metadata nodes have up -to three fields, e.g.: +The description of LLVM's TBAA system is broken into two parts. The first +part, :ref:`Semantics` talks about the high level +algorithm, and the second part, +:ref:`Representation` talks about how the various +entities are literally encoded in the IR as metadata nodes and about some +syntactic aspects of the scheme. -.. code-block:: llvm +It is always possible to trace any TBAA node to a "root" TBAA node (details +in the :ref:`Representation` section). TBAA +nodes with different roots have an unknown aliasing relationship, and LLVM +conservatively infers ``MayAlias`` between them. The rules mentioned in +this section only pertain to TBAA nodes living under the same root. + +.. _tbaa_node_semantics: + +Semantics +""""""""" + +The TBAA metadata system, referred to as "struct path TBAA" (not to be +confused with ``tbaa.struct``), consists of the following high level +concepts: *Type Descriptors*, further subdivided into scalar type +descriptors and struct type descriptors; and *Access Tags*. + +*Type descriptors* describe the type system of the higher level language +being compiled. + +*Struct type descriptors* denote types that contain a sequence of other +type descriptors, at known offsets. These other type descriptors can +either be struct type descriptors themselves or scalar type descriptors. + +*Scalar type descriptors* are a subset of struct type descriptors that +contain only one type at offset 0, which is either itself a scalar type +descriptor or the TBAA root. The latter case denotes a scalar type +descriptor that does not contain any other type. + +*Access tags* are metadata nodes attached to load and store instructions. +Access tags use type descriptors to describe the *location* being accessed +in terms of the type system of the higher level language. Access tags are +tuples consisting of a base type, an access type and an offset. The base +type is a scalar type descriptor or a struct type descriptor, the access +type is a scalar type descriptor, and the offset is a constant integer. + +The access tag, ``(BaseTy, AccessTy, Offset)`` from a load or store +instruction can be used to compute the set of aliasing access tags by this +algorithm: + +.. code-block:: text + + GetParent(ScalarTy): + (Name, Parent) = ScalarTy + return Parent + + TraverseIntoField(StructTy, Offset); + Let Field be the last (by increasing offset) field in StructTy that + begins before or on Offset, and NextOffset be the offset of said + Field. + + return (Field, Offset - NextOffset) + + + GetAliasingLocations(AccessTag): + (BaseTy, AccessTy, Offset) = AccessTag + AliasingSet = EmptySet + do + AliasingSet = AliasingSet Union { (BaseTy, AccessTy, Offset) } + if (BaseTy is a Scalar Type) + Assert Offset == 0 + BaseTy = GetParent(BaseTy) + else + (BaseTy, Offset) = TraverseIntoField(BaseTy, Offset) + while Next is not Root + return AliasingSet + + +Note that we do not use the ``AccessTy`` component in the algorithm above. +Today it is used to update TBAA metadata (when transforming the IR requires +it), but not as part of the aliasing query. + +For the updating logic to work, the metadata has to be structured in a way +that for every access tag ``(BaseTy, AccessTy, Offset)``, +``GetAliasingLocations((BaseTy, AccessTy, Offset))`` contains ``(AccessTy, +AccessTy, 0)``. + + +Two memory operations with access tags ``TagA`` and ``TagB`` may alias only +if either ``(TagB.BaseTy, _, TagB.Offset)`` belongs to +``GetAliasingLocations(TagA)`` or ``(TagA.BaseTy, _, TagA.Offset)`` belongs +to ``GetAliasingLocations(TagB)``, where ``_`` stands for any access type. + + +As a concrete example, consider: + +.. code-block:: c + + struct Inner { + int i; // offset 0 + float f; // offset 4 + }; + + struct Outer { + float f; // offset 0 + double d; // offset 4 + struct Inner inner_a; // offset 12 + }; + + void f(struct Outer* outer, struct Inner* inner, float* f, int* i, char* c) { + outer->f = 0; // tag0: (OuterStructTy, FloatScalarTy, 0) + outer->d = 0.0; // tag1: (OuterStructTy, DoubleScalarTy, 4) + outer->inner_a.i = 0; // tag2: (OuterStructTy, IntScalarTy, 12) + outer->inner_a.f = 0.0; // tag3: (OuterStructTy, IntScalarTy, 16) + inner->i = 0; // tag4: (InnerStructTy, IntScalarTy, 0) + inner->f = 0.0; // tag5: (InnerStructTy, FloatScalarTy, 4) + *f = 0.0; // tag6: (FloatScalarTy, FloatScalarTy, 0) + *i = 0; // tag7: (IntScalarTy, IntScalarTy, 0) + *c = 0; // tag8: (CharScalarTy, CharScalarTy, 0) + } + +With the type system descriptor graph as (note that in C and C++, a +``char`` can be used to access into any type): + +.. code-block:: text + + Root = "TBAA Root" + CharScalarTy = ("char", Root, 0) + FloatScalarTy = ("float", CharScalarTy, 0) + DoubleScalarTy = ("double", CharScalarTy, 0) + IntScalarTy = ("int", CharScalarTy, 0) + InnerStructTy = {"Inner" (IntScalarTy, 0), (FloatScalarTy, 4)} + OuterStructTy = {"Outer", (FloatScalarTy, 0), (DoubleScalarTy, 4), + (InnerStructTy, 12)} + + +``GetAliasingLocations(tag3)`` is ``{tag3, tag4, tag7, (CharScalarTy, +IntScalarTy, 0)}``. This means that other than ``tag3`` itself, ``tag3`` +aliases ``tag4``, ``tag7`` and ``tag8``, reflecting the fact that +``inner``, ``i`` and ``c`` could have been set to point into the containing +``outer`` structure. Alternatively, ``tag6`` does not alias ``tag3`` since +neither is ``(FloatScalarTy, _, 0)`` in ``GetAliasingLocations(tag3)``, nor +is ``(OuterStructTy, _, 16)`` in ``GetAliasingLocations(tag6)`` ``=`` +``{tag6, (CharScalarTy, FloatScalarTy, 0)}``. + +.. _tbaa_node_representation: + +Representation +"""""""""""""" + +The root node of a TBAA type hierarchy is an ``MDNode`` with 0 operands or +with exactly one ``MDString`` operand. + +Struct type descriptors are represented as ``MDNode`` s with an odd number +of operands greater than 1. The first operand is an ``MDString`` which, +like for scalar type descriptors, is for ergonomics -- LLVM only cares that +it finds an ``MDString`` there. After this the name, the struct type +descriptors have a sequence of alternating ``MDNode`` and ``ConstantInt`` +operands. The 2N + 1 th operand, an ``MDNode`` denotes a contained field, +and the 2N + 1 th operand, a ``ConstantInt`` is the offset of the said +constant field. The offsets must be in non-decreasing order. We can +determine the TBAA root a struct type descriptor belongs to by recursing +into any of its operands till we see a scalar type descriptor, at which +point the rule for finding the TBAA root for scalar type descriptor +applies. It should not matter which operand we recurse into, since they +should all lead to the same root. + +Scalar type descriptors, being a proper subset of struct type descriptors +have the same representation as struct type descriptors in general. +However, there is one shorthand for making the representation of scalar +type descriptors easier: an ``MDNode`` with two operands, the first operand +being an ``MDString`` and the second operand being another ``MDNode`` is +treated as having a constant integer 0 as an implicit third operand. We +can determine the TBAA root a scalar type descriptor belongs to by +recursing into this second operand till we see a root node. + +The example above, reified into metadata nodes, will look like: + +.. code-block:: text - !0 = !{ !"an example type tree" } - !1 = !{ !"int", !0 } - !2 = !{ !"float", !0 } - !3 = !{ !"const float", !2, i64 1 } - -The first field is an identity field. It can be any value, usually a -metadata string, which uniquely identifies the type. The most important -name in the tree is the name of the root node. Two trees with different -root node names are entirely disjoint, even if they have leaves with -common names. - -The second field identifies the type's parent node in the tree, or is -null or omitted for a root node. A type is considered to alias all of -its descendants and all of its ancestors in the tree. Also, a type is -considered to alias all types in other trees, so that bitcode produced -from multiple front-ends is handled conservatively. - -If the third field is present, it's an integer which if equal to 1 -indicates that the type is "constant" (meaning + !0 = !{!"TBAA Root"} ;; Root + !1 = !{!"char", !0, i64 0} ;; CharScalarTy + ;; Possible shorthand: !1 = !{!"char", !0} + !2 = !{!"float", !0, i64 0} ;; FloatScalarTy + ;; Possible shorthand: !2 = !{!"float", !0} + !3 = !{!"double", !0, i64 0} ;; DoubleScalarTy + ;; Possible shorthand: !3 = !{!"double", !0} + !4 = !{!"int", !0, i64 0} ;; IntScalarTy + ;; Possible shorthand: !4 = !{!"int", !0} + !5 = !{!"Inner", !4, i64 0, !2, i64 4} ;; InnerStructTy + !6 = !{!"Outer", !2, i64 0, !3, i64 4, !5, 12} + + +Access tags are represented as ``MDNode`` s with either 3 or 4 operands. +The first operand is an ``MDNode`` pointing to the node representing the +base type. The second operand is an ``MDNode`` pointing to the node +representing the access type. The third operand is a ``ConstantInt`` that +states the offset of the access. If a fourth field is present, it must be +a ``ConstantInt`` valued at 0 or 1. If it is 1 then the access tag states +that the location being accessed is "constant" (meaning ``pointsToConstantMemory`` should return true; see `other useful -AliasAnalysis methods `_). +AliasAnalysis methods `_). The TBAA root of +the access type and the base type of an access tag must be the same, and +that is the TBAA root of the access tag. '``tbaa.struct``' Metadata ^^^^^^^^^^^^^^^^^^^^^^^^^^