This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Generate TBAA type descriptors in a more reliable manner
ClosedPublic

Authored by kosarev on Nov 13 2017, 1:44 AM.

Details

Reviewers

rjmccall
hfinkel

Commits

rG5d9d32e82096: [CodeGen] Generate TBAA type descriptors in a more reliable manner
rC318752: [CodeGen] Generate TBAA type descriptors in a more reliable manner
rL318752: [CodeGen] Generate TBAA type descriptors in a more reliable manner

Summary

This patch introduces a couple of helper functions that make it possible to handle the caching logic in a single place.

Diff Detail

Repository: rL LLVM

Event Timeline

kosarev created this revision. Nov 13 2017, 1:44 AM

rjmccall added inline comments. Nov 13 2017, 10:41 AM

lib/CodeGen/CodeGenTBAA.cpp
267 ↗	(On Diff #122618)	The main danger with persisting this kind of reference is that DenseMap doesn't actually guarantee stability, so if there are recursive calls that can grow the data structure, the reference will become dangling. I think your patch is okay for the previous two functions, although I'd like you to check that, but in this function I'm confident that it's wrong, because you definitely recursively call getBaseTypeInfo below.
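The hazard described here can be illustrated without LLVM. The sketch below is an assumption-laden stand-in: `FlatCache`, `slotAddress`, `growUntilReallocation` and `slotMovesOnGrowth` are hypothetical names, and a `std::vector` plays the role of any map whose storage moves when it grows (it is not `llvm::DenseMap`). It only compares slot addresses as integers, so nothing stale is ever dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a map with contiguous storage (not llvm::DenseMap).
struct FlatCache {
  std::vector<std::pair<int, int>> Slots;

  // Returns the value slot for Key, appending a zero-initialized one if absent.
  int &operator[](int Key) {
    for (auto &Slot : Slots)
      if (Slot.first == Key)
        return Slot.second;
    Slots.emplace_back(Key, 0);
    return Slots.back().second;
  }
};

// Address of the slot for Key, kept as an integer so it is never dereferenced.
std::uintptr_t slotAddress(FlatCache &Cache, int Key) {
  return reinterpret_cast<std::uintptr_t>(&Cache[Key]);
}

// Inserts fresh keys until the backing vector has to reallocate.
void growUntilReallocation(FlatCache &Cache, int FirstFreshKey) {
  const std::size_t OldCapacity = Cache.Slots.capacity();
  for (int Key = FirstFreshKey; Cache.Slots.capacity() == OldCapacity; ++Key)
    (void)Cache[Key];
}

// True if the slot for key 1 moved after unrelated insertions. A reference
// taken before those insertions would be dangling at that point.
bool slotMovesOnGrowth() {
  FlatCache Cache;
  const std::uintptr_t Before = slotAddress(Cache, 1);
  growUntilReallocation(Cache, 2); // plays the role of the recursive calls
  const std::uintptr_t After = slotAddress(Cache, 1);
  return Before != After;
}
```

In the patch under review, the recursive call to getBaseTypeInfo is what performs the "unrelated insertions", which is why a reference obtained before that call cannot be trusted afterwards.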

Indeed, DenseMap invalidates iterators on insertion. But then even the "technically correct" part of these changes makes things more fragile, while my original concern was reliability, not performance. I particularly don't like the repeated cache assignments.

What if we add a set of helper functions whose only purpose is to produce new nodes so we can handle cache-related things in a single place? Something like this:

llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
  // ... Whatever we currently do in getTypeInfo(), except accesses to MetadataCache ...
}

llvm::MDNode *CodeGenTBAA::getTypeInfo(QualType QTy) {
  ...
  const Type *Ty = Context.getCanonicalType(QTy).getTypePtr();
  if (llvm::MDNode *N = MetadataCache[Ty])
    return N;

  return MetadataCache[Ty] = getTypeInfoHelper(Ty);
}

If this is undesirable for any reason, then I think I had better abandon this diff and perhaps just add a comment explaining that we intentionally look up the same key twice.
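The proposed split between a node-producing helper and a caching wrapper can be sketched in isolation. Everything below is hypothetical scaffolding rather than clang code: `ToyType`, `ToyNode` and `ToyTBAA` stand in for the AST type, the metadata node and CodeGenTBAA, and `std::unordered_map` stands in for the metadata cache. The wrapper stores the helper's result in a local before indexing the map again. This matters because, before C++17, the two operands of `=` in `Cache[Ty] = helper(Ty)` may be evaluated in either order, so the slot reference could be obtained before the helper grows the map.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the AST type and the metadata node.
struct ToyType {
  std::string Name;
  std::vector<const ToyType *> Fields;
};
struct ToyNode {
  std::string Name;
  std::vector<const ToyNode *> Members;
};

class ToyTBAA {
  std::unordered_map<const ToyType *, const ToyNode *> Cache;
  std::vector<std::unique_ptr<ToyNode>> Storage; // owns every node created

  // Produces a new node and never touches Cache directly. It may recurse
  // through getTypeInfo(), which is allowed to insert into Cache.
  const ToyNode *getTypeInfoHelper(const ToyType *Ty) {
    auto Node = std::make_unique<ToyNode>();
    Node->Name = Ty->Name;
    for (const ToyType *Field : Ty->Fields)
      Node->Members.push_back(getTypeInfo(Field));
    Storage.push_back(std::move(Node));
    return Storage.back().get();
  }

public:
  // The only place that reads or writes Cache.
  const ToyNode *getTypeInfo(const ToyType *Ty) {
    if (const ToyNode *N = Cache[Ty])
      return N;
    // Finish the (possibly recursive) helper call first, then index the map.
    const ToyNode *Node = getTypeInfoHelper(Ty);
    return Cache[Ty] = Node;
  }

  std::size_t cacheSize() const { return Cache.size(); }
};
```

With this shape the key is still looked up twice on a miss, but the second lookup happens only after all recursive insertions are done, so no reference into the map is held across them.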

I think having a primary function that handles the caching logic makes some sense. I think there might be some cases that intentionally don't cache their normal result, though, so it might be harder than you think. Up to you whether you want to continue.

Reworked to use helper functions to separate producing metadata nodes from other code.

In D39953#929144, @rjmccall wrote:

I think there might be some cases that intentionally don't cache their normal result, though, so it might be harder than you think.

My understanding is that conceptually every canonical type has a single corresponding type node; we shall never return different nodes for the same type. With the updated patch we cache nodes for all types, including different versions of char. This is supposed to be an improvement saving us some execution time.
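A small sketch of the claim above, under stated assumptions: `ToyCharCache` is a hypothetical class, type names are plain strings, and a single shared string plays the role of the char node. Because the wrapper caches whatever the helper returns, including a node shared by several types, each type reaches the helper at most once and always gets the same node back.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical cache in which every char-like type maps to one shared node.
struct ToyCharCache {
  const std::string CharNode = "omnipotent char"; // the shared node
  std::unordered_map<std::string, const std::string *> Cache;
  int HelperCalls = 0; // counts how often a node had to be produced

  // Node producer: in this toy, every type resolves to the shared char node.
  const std::string *helper(const std::string &TypeName) {
    (void)TypeName;
    ++HelperCalls;
    return &CharNode;
  }

  // Caching wrapper: the helper runs only on a cache miss.
  const std::string *get(const std::string &TypeName) {
    auto It = Cache.find(TypeName);
    if (It != Cache.end())
      return It->second;
    const std::string *Node = helper(TypeName);
    Cache.emplace(TypeName, Node);
    return Node;
  }
};
```

Asking for "char", then "unsigned char", then "char" again runs the helper twice, not three times, and all three results are the same pointer.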

Okay, looks good.

This revision is now accepted and ready to land. Nov 20 2017, 5:05 PM

Closed by commit rL318752: [CodeGen] Generate TBAA type descriptors in a more reliable manner (authored by kosarev). Nov 21 2017, 3:18 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                      Size
cfe/trunk/lib/CodeGen/CodeGenTBAA.h       8 lines
cfe/trunk/lib/CodeGen/CodeGenTBAA.cpp     99 lines

Diff 123754

cfe/trunk/lib/CodeGen/CodeGenTBAA.h

[... 137 lines not shown ...] bool CollectFields(uint64_t BaseOffset,
                      SmallVectorImpl<llvm::MDBuilder::TBAAStructField> &Fields,
                      bool MayAlias);

   /// A wrapper function to create a scalar type. For struct-path aware TBAA,
   /// the scalar type has the same format as the struct type: name, offset,
   /// pointer to another node in the type DAG.
   llvm::MDNode *createTBAAScalarType(StringRef Name, llvm::MDNode *Parent);

+  /// getTypeInfoHelper - An internal helper function to generate metadata used
+  /// to describe accesses to objects of the given type.
+  llvm::MDNode *getTypeInfoHelper(const Type *Ty);
+
+  /// getBaseTypeInfoHelper - An internal helper function to generate metadata
+  /// used to describe accesses to objects of the given base type.
+  llvm::MDNode *getBaseTypeInfoHelper(const Type *Ty);
+
 public:
   CodeGenTBAA(ASTContext &Ctx, llvm::LLVMContext &VMContext,
               const CodeGenOptions &CGO,
               const LangOptions &Features,
               MangleContext &MContext);
   ~CodeGenTBAA();

   /// getTypeInfo - Get metadata used to describe accesses to objects of the
[... 70 lines not shown ...]

cfe/trunk/lib/CodeGen/CodeGenTBAA.cpp

[... 101 lines not shown ...] if (const RecordType *TTy = QTy->getAs<RecordType>()) {
     // RD can be struct, union, class, interface or enum.
     // For now, we only handle struct and class.
     if (RD->isStruct() || RD->isClass())
       return true;
   }
   return false;
 }

-llvm::MDNode *CodeGenTBAA::getTypeInfo(QualType QTy) {
-  // At -O0 or relaxed aliasing, TBAA is not emitted for regular types.
-  if (CodeGenOpts.OptimizationLevel == 0 || CodeGenOpts.RelaxedAliasing)
-    return nullptr;
-
-  // If the type has the may_alias attribute (even on a typedef), it is
-  // effectively in the general char alias class.
-  if (TypeHasMayAlias(QTy))
-    return getChar();
-
-  // We need this function to not fall back to returning the "omnipotent char"
-  // type node for aggregate and union types. Otherwise, any dereference of an
-  // aggregate will result into the may-alias access descriptor, meaning all
-  // subsequent accesses to direct and indirect members of that aggregate will
-  // be considered may-alias too.
-  // TODO: Combine getTypeInfo() and getBaseTypeInfo() into a single function.
-  if (isValidBaseType(QTy))
-    return getBaseTypeInfo(QTy);
-
-  const Type *Ty = Context.getCanonicalType(QTy).getTypePtr();
-  if (llvm::MDNode *N = MetadataCache[Ty])
-    return N;
-
+llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
   // Handle builtin types.
   if (const BuiltinType *BTy = dyn_cast<BuiltinType>(Ty)) {
     switch (BTy->getKind()) {
     // Character types are special and can alias anything.
     // In C++, this technically only includes "char" and "unsigned char",
     // and not "signed char". In C, it includes all three. For now,
     // the risk of exploiting this detail in C++ seems likely to outweigh
     // the benefit.
[... 14 lines not shown ...] case BuiltinType::ULongLong:
       return getTypeInfo(Context.LongLongTy);
     case BuiltinType::UInt128:
       return getTypeInfo(Context.Int128Ty);

     // Treat all other builtin types as distinct types. This includes
     // treating wchar_t, char16_t, and char32_t as distinct from their
     // "underlying types".
     default:
-      return MetadataCache[Ty] =
-        createTBAAScalarType(BTy->getName(Features), getChar());
+      return createTBAAScalarType(BTy->getName(Features), getChar());
     }
   }

   // C++1z [basic.lval]p10: "If a program attempts to access the stored value of
   // an object through a glvalue of other than one of the following types the
   // behavior is undefined: [...] a char, unsigned char, or std::byte type."
   if (Ty->isStdByteType())
-    return MetadataCache[Ty] = getChar();
+    return getChar();

   // Handle pointers and references.
   // TODO: Implement C++'s type "similarity" and consider dis-"similar"
   // pointers distinct.
   if (Ty->isPointerType() || Ty->isReferenceType())
-    return MetadataCache[Ty] = createTBAAScalarType("any pointer",
-                                                    getChar());
+    return createTBAAScalarType("any pointer", getChar());

   // Enum types are distinct types. In C++ they have "underlying types",
   // however they aren't related for TBAA.
   if (const EnumType *ETy = dyn_cast<EnumType>(Ty)) {
     // In C++ mode, types have linkage, so we can rely on the ODR and
     // on their mangled names, if they're external.
     // TODO: Is there a way to get a program-wide unique name for a
     // decl with local linkage or no linkage?
     if (!Features.CPlusPlus || !ETy->getDecl()->isExternallyVisible())
-      return MetadataCache[Ty] = getChar();
+      return getChar();

     SmallString<256> OutName;
     llvm::raw_svector_ostream Out(OutName);
     MContext.mangleTypeName(QualType(ETy, 0), Out);
-    return MetadataCache[Ty] = createTBAAScalarType(OutName, getChar());
+    return createTBAAScalarType(OutName, getChar());
   }

   // For now, handle any other kind of type conservatively.
-  return MetadataCache[Ty] = getChar();
+  return getChar();
+}
+
+llvm::MDNode *CodeGenTBAA::getTypeInfo(QualType QTy) {
+  // At -O0 or relaxed aliasing, TBAA is not emitted for regular types.
+  if (CodeGenOpts.OptimizationLevel == 0 || CodeGenOpts.RelaxedAliasing)
+    return nullptr;
+
+  // If the type has the may_alias attribute (even on a typedef), it is
+  // effectively in the general char alias class.
+  if (TypeHasMayAlias(QTy))
+    return getChar();
+
+  // We need this function to not fall back to returning the "omnipotent char"
+  // type node for aggregate and union types. Otherwise, any dereference of an
+  // aggregate will result into the may-alias access descriptor, meaning all
+  // subsequent accesses to direct and indirect members of that aggregate will
+  // be considered may-alias too.
+  // TODO: Combine getTypeInfo() and getBaseTypeInfo() into a single function.
+  if (isValidBaseType(QTy))
+    return getBaseTypeInfo(QTy);
+
+  const Type *Ty = Context.getCanonicalType(QTy).getTypePtr();
+  if (llvm::MDNode *N = MetadataCache[Ty])
+    return N;
+
+  // Note that the following helper call is allowed to add new nodes to the
+  // cache, which invalidates all its previously obtained iterators. So we
+  // first generate the node for the type and then add that node to the cache.
+  llvm::MDNode *TypeNode = getTypeInfoHelper(Ty);
+  return MetadataCache[Ty] = TypeNode;
 }

 TBAAAccessInfo CodeGenTBAA::getVTablePtrAccessInfo() {
   return TBAAAccessInfo(createTBAAScalarType("vtable pointer", getRoot()));
 }

 bool
 CodeGenTBAA::CollectFields(uint64_t BaseOffset,
[... 47 lines not shown ...] CodeGenTBAA::getTBAAStructInfo(QualType QTy) {
   SmallVector<llvm::MDBuilder::TBAAStructField, 4> Fields;
   if (CollectFields(0, QTy, Fields, TypeHasMayAlias(QTy)))
     return MDHelper.createTBAAStructNode(Fields);

   // For now, handle any other kind of type conservatively.
   return StructMetadataCache[Ty] = nullptr;
 }

-llvm::MDNode *CodeGenTBAA::getBaseTypeInfo(QualType QTy) {
-  if (!isValidBaseType(QTy))
-    return nullptr;
-
-  const Type *Ty = Context.getCanonicalType(QTy).getTypePtr();
-  if (llvm::MDNode *N = BaseTypeMetadataCache[Ty])
-    return N;
-
-  if (const RecordType *TTy = QTy->getAs<RecordType>()) {
+llvm::MDNode *CodeGenTBAA::getBaseTypeInfoHelper(const Type *Ty) {
+  if (auto *TTy = dyn_cast<RecordType>(Ty)) {
     const RecordDecl *RD = TTy->getDecl()->getDefinition();

     const ASTRecordLayout &Layout = Context.getASTRecordLayout(RD);
     SmallVector <std::pair<llvm::MDNode*, uint64_t>, 4> Fields;
     unsigned idx = 0;
     for (RecordDecl::field_iterator i = RD->field_begin(),
          e = RD->field_end(); i != e; ++i, ++idx) {
       QualType FieldQTy = i->getType();
[... 9 lines not shown ...] if (auto *TTy = dyn_cast<RecordType>(Ty)) {
     if (Features.CPlusPlus) {
       // Don't use the mangler for C code.
       llvm::raw_svector_ostream Out(OutName);
       MContext.mangleTypeName(QualType(Ty, 0), Out);
     } else {
       OutName = RD->getName();
     }
     // Create the struct type node with a vector of pairs (offset, type).
-    return BaseTypeMetadataCache[Ty] =
-      MDHelper.createTBAAStructTypeNode(OutName, Fields);
+    return MDHelper.createTBAAStructTypeNode(OutName, Fields);
   }

-  return BaseTypeMetadataCache[Ty] = nullptr;
+  return nullptr;
+}
+
+llvm::MDNode *CodeGenTBAA::getBaseTypeInfo(QualType QTy) {
+  if (!isValidBaseType(QTy))
+    return nullptr;
+
+  const Type *Ty = Context.getCanonicalType(QTy).getTypePtr();
+  if (llvm::MDNode *N = BaseTypeMetadataCache[Ty])
+    return N;
+
+  // Note that the following helper call is allowed to add new nodes to the
+  // cache, which invalidates all its previously obtained iterators. So we
+  // first generate the node for the type and then add that node to the cache.
+  llvm::MDNode *TypeNode = getBaseTypeInfoHelper(Ty);
+  return BaseTypeMetadataCache[Ty] = TypeNode;
 }

 llvm::MDNode *CodeGenTBAA::getAccessTagInfo(TBAAAccessInfo Info) {
   if (Info.isMayAlias())
     Info = TBAAAccessInfo(getChar());

   if (!Info.AccessType)
     return nullptr;
[... 40 lines not shown ...]