This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Generate TBAA type descriptors in a more reliable manner
ClosedPublic

Authored by kosarev on Nov 13 2017, 1:44 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

kosarev created this revision.Nov 13 2017, 1:44 AM
rjmccall added inline comments.Nov 13 2017, 10:41 AM
lib/CodeGen/CodeGenTBAA.cpp
267 ↗(On Diff #122618)

The main danger with persisting this kind of reference is that DenseMap doesn't actually guarantee stability, so if there are recursive calls that can grow the data structure, the reference will become dangling. I think your patch is okay for the previous two functions, although I'd like you to check that, but in this function I'm confident that it's wrong, because you definitely recursively call getBaseTypeInfo below.

Indeed, DenseMap invalidates iterators on insertion. But then even the "technically correct" part of these changes make things more fragile while my original concern was reliability and not performance. I particularly don't like the repeating cache assignments.

What if we add a set of helper functions whose only purpose is to produce new nodes so we can handle cache-related things in a single place? Something like this:

llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(llvm::Type *Type) {
  ... Whatever we currently do in getTypeInfo(), except accesses to MetadataCache ...
}

llvm::MDNode *CodeGenTBAA::getTypeInfo(QualType QTy) {
  ...
  const Type *Ty = Context.getCanonicalType(QTy).getTypePtr();
  if (llvm::MDNode *N = MetadataCache[Ty])
    return N;

  return MetadataCache[Ty] = getTypeInfoHelper(Ty);
}

If for any reasons it is undesirable, then I think I better abandon this diff. Maybe just add a comment explaining that we lookup twice for the same key intentionally.

rjmccall edited edge metadata.Nov 17 2017, 2:33 PM

I think having a primary function that handles the caching logic makes some sense. I think there might be some cases that intentionally don't cache their normal result, though, so it might be harder than you think. Up to you whether you want to continue.

kosarev updated this revision to Diff 123563.Nov 20 2017, 5:16 AM
kosarev retitled this revision from [CodeGen] Do not lookup for cached TBAA metadata nodes twice to [CodeGen] Generate TBAA type descriptors in a more reliable manner.
kosarev edited the summary of this revision. (Show Details)

Reworked to use helper functions to separate producing metadata nodes from other code.

I think there might be some cases that intentionally don't cache their normal result, though, so it might be harder than you think.

My understanding is that conceptually every canonical type has a single corresponding type node; we shall never return different nodes for the same type. With the updated patch we cache nodes for all types, including different versions of char. This is supposed to be an improvement saving us some execution time.

rjmccall accepted this revision.Nov 20 2017, 5:05 PM

Okay, looks good.

This revision is now accepted and ready to land.Nov 20 2017, 5:05 PM
This revision was automatically updated to reflect the committed changes.