The basic idea behind this patch is that since in strict aliasing mode all accesses to union members require their outermost enclosing union objects to be specified explicitly, then for a couple given accesses to union members of the form
p->a.b.c... q->x.y.z...
it is known they can only alias if both p and q point to the same union type and offset ranges of members a.b.c... and x.y.z.. overlap. Note that the actual types of the members do not matter.
Specifically, in this patch we do the following:
- Make unions to be valid TBAA base access types. This enables generation of TBAA type descriptors for unions.
- Encode union types as structures with a single member of a special "union member" type. Currently we do not encode information about sizes of types, but conceptually such union members are considered to be of the size of the whole union.
- Encode accesses to direct and indirect union members, including member arrays, as accesses to these special members. All accesses to members of a union thus get the same offset, which is the offset of the union they are part of. This means the existing LLVM TBAA machinery is able to handle such accesses with no changes.
While this is already an improvement comparing to the current situation, that is, representing all union accesses as may-alias ones, there are further changes planned to complete the support for unions. One of them is storing information about access sizes so we can distinct accesses to non-overlapping union members, including accesses to different elements of member arrays. Another change is encoding type sizes in order to make it possible to compute offsets within constant-indexed array elements. These enhancements will be addressed with separate patches.