
[Metadata] Add a resize/reserve capability for MDNodes
Needs Review · Public

Authored by wolfgangp on Apr 27 2022, 11:49 AM.

Details

Reviewers
dexonsmith
Summary

This patch is a starting point for what's discussed in D118416: adding a resize capability to MDNodes.

We distinguish between small (< 16 operands) and large MDNodes, with the latter having their operands allocated in hung-off storage. Temporary and distinct MDNodes can be resized to a larger operand capacity (they can't be made smaller); to that end, a pointer to separately allocated hung-off storage is kept in the location of the co-allocated operands. This means that resizable MDNodes with 0 operands ('tiny') need some extra space allocated for the pointer, which is not necessary for uniqued MDNodes with 0 operands. To distinguish the two tiny node types we use the concepts 'tiny resizable' and 'tiny fixed' MDNodes (in addition to large and small). Note that this extra complexity could be eliminated if we're OK with allocating extra space even for uniqued tiny nodes. Only 2-3% of MDNodes seem to have 0 operands, so maybe this would not be a big deal.
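To make the allocation rule above concrete, here is a toy sketch of the co-allocated prefix each node kind would need; the sizes and the `prefixBytes` helper are purely illustrative, not the patch's actual layout:

```cpp
#include <cstddef>

// Illustrative sizes only, not the patch's actual layout.
constexpr size_t OpSize = 8;      // stand-in for sizeof(MDOperand)
constexpr size_t PtrSize = 8;     // pointer to hung-off storage
constexpr size_t SmallLimit = 16; // small/large boundary from the summary

// Bytes co-allocated in front of a node created with NumOps operands.
constexpr size_t prefixBytes(size_t NumOps, bool Resizable) {
  if (NumOps >= SmallLimit)         // large: just a pointer to hung-off storage
    return PtrSize;
  if (NumOps == 0)                  // tiny: fixed nodes need no prefix at all;
    return Resizable ? PtrSize : 0; // resizable ones still reserve the pointer slot
  return NumOps * OpSize;           // small: operands co-allocated in front
}
```

This is where the 'tiny resizable' vs 'tiny fixed' distinction shows up: only the resizable zero-operand node pays for the pointer slot.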

The patch does not (yet) address the O(N) issue that arises from handling ModuleFlags of type "Append" during LTO, though it does fix the memory issue mentioned in 51893. For that we'd presumably want to scan all participating modules and reserve enough space for the operands from the start. This will be addressed in a later patch.

I didn't add a push_back() interface because there wasn't a client for it, but it can be easily added if needed. I named the resize method 'reserve' because we don't support shrinking anything.

For now, only MDTuples can be resized.

AFAICT, there is no effect on bitcode reading or writing.

Diff Detail

Event Timeline

wolfgangp created this revision.Apr 27 2022, 11:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2022, 11:49 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
wolfgangp requested review of this revision.Apr 27 2022, 11:49 AM

Thanks for working on this! I think this will be a big improvement.

I have a few detailed suggestions inline (probably way too detailed, but I'm not sure I'll be super responsive in the next couple of weeks and didn't want to block you; don't feel you need to match my exact suggestions if you have a better idea).

I suggest splitting into multiple patches:

  1. Prep: Fix MDOperand(MDOperand&&) (etc.) so that std::move() is efficient.
  2. Prep: Refactor MDNode to clean up current allocation a bit, and to front-load NFC changes to keep (3) as clean as possible. (See operator delete comment for details.)
  3. The heart of this patch: make MDTuple resizable with a resize() API.
  4. Optional: Add the append() APIs (not sure they're needed but seem nice to have).
  5. Follow-up: improve module append flags, using (3) and/or (4).

For (3), the heart of this patch, a couple of specific comments to call out:

  • I think the API should be protected in MDNode and only exposed as public in MDTuple (for now; matching the assertion).
  • Might be cleaner to offload growth/capacity logic to SmallVector or similar (when we're in hung-off territory); see MDNode::reserve() for details.

> In order to distinguish the two tiny node types we use the concepts 'tiny resizable' and 'tiny fixed' MDNodes (in addition to large and small). Note that this extra complexity could be eliminated if we're OK with allocating extra space even for uniqued tiny nodes. Only 2-3% of MDNodes seem to have 0 operands, so maybe this would not be a big deal.

I think it's okay to allocate some extra space. But I think we can factor three modes ("fixed", "dynamic:small", and "dynamic:large") without too much complexity. "fixed" and "dynamic:small" can have identical layout; just one is allowed to resize and the other isn't.
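A minimal sketch of the three modes described above (stand-in names, not the patch's API); the point is that "fixed" and "dynamic:small" differ only in the resize permission, not in layout:

```cpp
// Sketch of the three storage modes described above (names hypothetical).
enum class StorageKind {
  Fixed,        // co-allocated operands, never resized (e.g. uniqued nodes)
  DynamicSmall, // co-allocated like Fixed, but allowed to grow
  DynamicLarge  // operands live in separately allocated (hung-off) storage
};

// Fixed and DynamicSmall share a layout; only the resize permission differs.
bool canResize(StorageKind K) { return K != StorageKind::Fixed; }
bool isHungOff(StorageKind K) { return K == StorageKind::DynamicLarge; }
```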

> I named the resize method 'reserve' because we don't support shrinking anything.

That seems like an arbitrary restriction. Why not shrink?

I feel like it would be simpler to *only* have resize() (not reserve or append, although those could be built on top I suppose).

  • If resize grows, it adds nullptr arguments. The MDNodeTest.Print unit test confirms my memory that nullptr is a valid operand value.
  • If resize shrinks, it drops the refs. The allocation never shrinks, but I don't see why we'd arbitrarily restrict it not to be allowed.
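A toy model of the resize() semantics proposed in the two bullets above (std::vector standing in for the node's operand storage; names hypothetical): growing appends null operands, shrinking drops references, and the allocation itself never shrinks.

```cpp
#include <cstddef>
#include <vector>

// Toy model of the proposed resize() semantics (not the real MDNode).
struct Node {
  std::vector<const void *> Ops; // nullptr is a valid operand value

  void resize(size_t N) {
    if (N > Ops.size())
      Ops.resize(N, nullptr); // grow: new slots hold nullptr
    else
      Ops.resize(N);          // shrink: trailing references are dropped,
                              // but the allocation itself stays put
  }
};
```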
llvm/include/llvm/IR/Metadata.h
950

I think(?) it's technically UB (or at least an extension) to use [1] for an array that's bigger. I suggest leaving this out, but adding an accessor that does the math:

MutableArrayRef<MDOperand> getOperands() {
  return makeMutableArrayRef(reinterpret_cast<MDOperand *>(this + 1), NumOperands);
}
ArrayRef<MDOperand> getOperands() const {
  return const_cast<HungOffOperandStorage *>(this)->getOperands();
}

and you can add a static assert that the alignment will be okay (if it fails some day we can add alignas noise).
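A self-contained sketch of that accessor-plus-static_assert pattern, using stand-in types (`Operand`, `Storage`, `allocate` are hypothetical, not the real LLVM classes):

```cpp
#include <cstddef>
#include <new>

// Stand-in types showing the suggested pattern: operands live directly
// after the header, reached via an accessor instead of a [1] array.
struct Operand { void *Ptr = nullptr; };

struct Storage {
  size_t NumOperands; // size_t keeps Storage at least Operand-aligned here
  Operand *begin() { return reinterpret_cast<Operand *>(this + 1); }
  Operand *end() { return begin() + NumOperands; }
};
static_assert(alignof(Storage) >= alignof(Operand),
              "operands allocated after the header would be misaligned");

// Allocate a Storage followed by N default-constructed operands.
Storage *allocate(size_t N) {
  void *Mem = ::operator new(sizeof(Storage) + N * sizeof(Operand));
  Storage *S = new (Mem) Storage{N};
  for (Operand *O = S->begin(), *E = S->end(); O != E; ++O)
    new (O) Operand();
  return S;
}
```

If the static_assert ever fails on some target, an `alignas` on the header fixes it, as suggested above.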

953–960

It's not obvious to me what the layout is. Can you add a comment above Info that spells it out clearly? Bitfield syntax (// unsigned Name : Size;) would be clear, I think. (I'm not totally convinced the abstraction improves readability or reduces bugs, but I'm not against it if you think it helps.)

969–972

Usually we do sizes as size_t, except when we're optimizing storage. There's no storage here though.

1032–1060

These all feel very private to me, not protected. (Also, I'm really not sure all this boilerplate is better than a normal bitfield.)

1040–1042

Elsewhere in LLVM, co-allocated storage is usually called "small". I suggest using the word "small" instead of "coalloc".

1129–1131

I think this should be protected and only enabled on MDTuple for now. Callers can do a cast<MDTuple>()... although, with the append() API, is there even a reason to need this? I don't see any uses except in tests, and the tests could just call append().

1294–1300

I think append should be protected and only exposed in MDTuple, for now. Most of the node kinds have a fixed operand count and calling append() would corrupt them.
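A sketch of the "protected in the base, re-exported in the subclass" pattern suggested here (stand-in class names, not the real API): the base keeps the operation inaccessible so fixed-arity node kinds can't be corrupted, and only the tuple subclass makes it public.

```cpp
#include <cstddef>
#include <vector>

// Stand-in classes illustrating the suggested access pattern.
class NodeBase {
  std::vector<int> Ops;

protected:
  void appendImpl(int V) { Ops.push_back(V); } // hidden from arbitrary nodes

public:
  size_t getNumOperands() const { return Ops.size(); }
};

class Tuple : public NodeBase {
public:
  using NodeBase::appendImpl; // only tuples expose append publicly
};
```

A caller holding a `NodeBase *` cannot append; it must first `cast<>` to the tuple type, matching the suggestion above.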

1389

Is an explicit capacity really useful/needed? I don't see any uses of this parameter except tests. It might be useful in tests to *access* the capacity (to confirm growth doubles or whatever), but if tests want to influence the capacity seems like they could just use append().

llvm/lib/IR/Metadata.cpp
569

It's a bit sketchy calling anything on N during delete (hence the msan suppression)... it seems even dodgier to call a member function that we can't even see.

Maybe as a prep commit we can refine MDNode's layout to fix the msan issue by moving the first 64-bits ahead of the MDNode. In MDNode, something like:

class MDNode {
  struct Header {
    unsigned NumOperands;
    unsigned NumUnresolved = 0;

    static constexpr size_t getAllocSize(unsigned NumOps) {
      return sizeof(MDOperand) * NumOps + sizeof(Header);
    }
    void *getAlloc() {
      return reinterpret_cast<char *>(this + 1) - getAllocSize(NumOperands);
    }

    MutableArrayRef<MDOperand> operands();
    ArrayRef<MDOperand> operands() const;
    explicit Header(unsigned NumOperands);
    ~Header();
  };
  Header &getHeader() {
    return *(reinterpret_cast<Header *>(this) - 1);
  }
  const Header &getHeader() const {
    return *(reinterpret_cast<const Header *>(this) - 1);
  }

Header will be co-allocated immediately before MDNode and will contain everything necessary to understand its layout.

When allocating:

void *Mem = malloc(Header::getAllocSize(NumOps) + Size);
Header &H = *new (reinterpret_cast<char *>(Mem) + OpsAlloc) Header(NumOps); // OpsAlloc == sizeof(MDOperand) * NumOps
return reinterpret_cast<void *>(&H + 1);

Here, when deleting:

void MDNode::operator delete(void *Mem) {
  Header *H = reinterpret_cast<Header *>(Mem) - 1;
  Mem = H->getAlloc();
  H->~Header(); // destroys operands
  ::operator delete(Mem);
}

This prep commit would be almost NFC (no functionality change) -- except for fixing the MSan thing and dropping the suppression -- and you could make other refactorings in it to remove NFC noise from this patch, making this one easier to review.

For example, you can structure Header helper functions to be similar to what this commit will land with HungOffOperandStorage; maybe you can share the implementations... or maybe not, but it could simplify the code a bit if you did.

571–572

This logic doesn't look quite right to me. If there's a capacity of 2 and only 1 op, then I think we'll have this in memory:

struct {
  MDOperand Op0;
  char Op1[sizeof(MDOperand)]; // allocated capacity, never constructed
  MDNode Node;
};

I think you need to skip over Op1 to avoid destroying something that hasn't been constructed, and I'm not seeing how that happens...

But if you make the change I suggested above in a prep commit, using Header::ops_begin() and Header::ops_end(), then you won't even have to modify this loop here.

587–589

Ah, just spotted this late. Glad to see it, but seems like the related APIs should *only* show up in the public interface of MDTuple.

596–597

(I was a bit confused by NumOps - 1 here at first, but then I looked at HungOffOperandStorage and figured it out. I suggested a refactoring there that would make this just NumOps... a bit simpler I think.)

I'm not seeing the exponential resizing that I think we want. I think we want std::max(NumOps, OldOpSize * 2) or something like that, in order to grow. (Maybe I missed it and it's elsewhere? See, for example, SmallVectorBase::grow().)

There should also be an assertion for overflow if we go over uint32_t.
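The growth policy and overflow check described above might look like this (a hypothetical helper, not the patch's actual code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sketch of the exponential-growth policy discussed above: grow to the
// larger of the requested size and double the old capacity, asserting
// that the operand count still fits in uint32_t.
uint32_t nextCapacity(uint32_t Requested, uint32_t OldCapacity) {
  uint64_t Doubled = static_cast<uint64_t>(OldCapacity) * 2;
  uint64_t NewCap = std::max<uint64_t>(Requested, Doubled);
  assert(NewCap <= UINT32_MAX && "operand capacity overflows uint32_t");
  return static_cast<uint32_t>(NewCap);
}
```

Doubling keeps repeated single-operand appends amortized linear instead of quadratic.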

I'm wondering if it'd be better to defer most of the "growth" logic (except the first grow) to a battle-hardened vector class (for now, SmallVector). Here's the layout I'm thinking of:

// Pseudo-code, using template parameters for runtime things...
template <class Size> struct MDOperandCharArray {
  alignas(alignof(MDOperand)) char Buffer[sizeof(MDOperand) * Size];
};
template <size_t InitNumOps, bool IsDynamic> struct Layout {
  using LargeStorageVector = SmallVector<MDOperand, 0>;

  constexpr size_t NumOpsFitInVector = sizeof(LargeStorageVector) / sizeof(MDOperand);
  static_assert(NumOpsFitInVector * sizeof(MDOperand) == sizeof(LargeStorageVector));

  constexpr size_t MaxSmallSize = 15;
  constexpr bool InitLarge = InitNumOps > MaxSmallSize;
  constexpr bool NeedsLarge = InitLarge || IsDynamic;
  constexpr size_t MinSmallSize = NeedsLarge ? NumOpsFitInVector : 0;

  size_t SmallSize = MinSmallSize <= InitNumOps && InitNumOps <= MaxSmallSize ? InitNumOps : MinSmallSize;
  MDOperandCharArray<SmallSize> Ops;
  Header H;
  MDNode N;
  MDNodeSubclassTail Tail;
};

For dynamic nodes that start with 0 or 1 ops, Layout is bigger than what you have (for 64-bit, minimum small size of 2). But otherwise I think it's similar.

(Long-term, could add a (generic) TinyVector to ADT that is pointer-sized, and it could be used here to shrink the local allocation.)

Growing/shrinking would look like:

void Header::resize(size_t NumOps) {
  assert(IsDynamic);
  if (operands().size() == NumOps)
    return;

  if (!isSmall())
    getLarge().resize(NumOps);
  else if (NumOps <= SmallSize)
    resizeSmall(NumOps);
  else
    resizeSmallToLarge(NumOps);
}
void Header::resizeSmallToLarge(size_t NumOps) {
  assert(isSmall());
  assert(NumOps > SmallSize);
  LargeStorageVector NewOps(NumOps);
  llvm::move(operands(), NewOps.begin());
  resizeSmall(0);
  new (getLargePtr()) LargeStorageVector(std::move(NewOps));
  setToLarge(); // Sets the bit in Header that says large storage.
}
void Header::resizeSmall(size_t NumOps) {
  MutableArrayRef<MDOperand> ExistingOps = operands();
  assert(isSmall());
  assert(NumOps <= SmallSize);
  assert(NumOps != ExistingOps.size());
  int NumNew = (int)NumOps - (int)ExistingOps.size();
  MDOperand *O = ExistingOps.end();
  if (NumNew > 0) {
    for (int I = 0, E = NumNew; I != E; ++I)
      new (O++) MDOperand();
  } else {
    for (int I = 0, E = NumNew; I != E; --I)
      (--O)->~MDOperand();
  }
  setSmallNumOps(NumOps); // Records NumOps in Header.
  assert(O == ExistingOps.begin() + NumOps);
}
void *Header::getLargePtr() {
  // Store it close by Header so it's more likely to be in the same cache line.
  static_assert(alignof(LargeStorageVector) <= alignof(Header));
  return reinterpret_cast<char *>(this) - sizeof(LargeStorageVector);
}
void *Header::getSmallPtr() {
  static_assert(alignof(MDOperand) <= alignof(Header));
  return reinterpret_cast<char *>(this) - sizeof(MDOperand) * SmallSize;
}
LargeStorageVector &Header::getLarge() {
  assert(!isSmall());
  return *reinterpret_cast<LargeStorageVector *>(getLargePtr());
}
MutableArrayRef<MDOperand> Header::operands() {
  if (!isSmall())
    return getLarge();
  return makeMutableArrayRef(reinterpret_cast<MDOperand *>(getSmallPtr()), SmallNumOps);
}
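A self-contained toy version of the small-to-large transition sketched above, with std::vector standing in for SmallVector (all names are hypothetical; the real code also needs the header bookkeeping elided here):

```cpp
#include <cstddef>
#include <new>
#include <vector>

struct Op { void *Ptr = nullptr; }; // stand-in for MDOperand

class Node {
  static constexpr size_t SmallSize = 4;
  alignas(Op) unsigned char Small[sizeof(Op) * SmallSize];
  std::vector<Op> *Large = nullptr; // "hung-off" storage once we outgrow Small
  size_t NumSmallOps = 0;

  Op *smallOp(size_t I) { return reinterpret_cast<Op *>(Small) + I; }

public:
  Node() = default;
  ~Node() { resize(0); delete Large; }
  size_t size() const { return Large ? Large->size() : NumSmallOps; }

  void resize(size_t N) {
    if (Large) { Large->resize(N); return; } // large: defer to the vector
    if (N <= SmallSize) {                    // small -> small
      for (; NumSmallOps < N; ++NumSmallOps) new (smallOp(NumSmallOps)) Op();
      for (; NumSmallOps > N; --NumSmallOps) smallOp(NumSmallOps - 1)->~Op();
      return;
    }
    // small -> large: move existing ops into separately allocated storage.
    auto *V = new std::vector<Op>(N);
    for (size_t I = 0; I < NumSmallOps; ++I) (*V)[I] = *smallOp(I);
    for (; NumSmallOps > 0; --NumSmallOps) smallOp(NumSmallOps - 1)->~Op();
    Large = V;
  }
};
```

Once the node has gone large, all further growth and shrinking is handled by the battle-hardened vector, which is the point of the suggestion above.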
607–610

What does the move constructor do? Can we fix it in a prep commit?

953

I think we usually use a C-style cast for integer casts (i.e., (size_t)Capacity). I don't feel strongly though if you prefer this; maybe check to see if it's documented in the LLVM coding style guide?

llvm/lib/Linker/IRMover.cpp
1413–1422

This is simpler if you can assume MDTuple, since you don't need to clone and replace. You can just make a new one without worrying about other fields.

(IIRC, clone() has some overhead by being temporary to start...)

1418

Not much point in the flag being uniqued if it's pointing at something that isn't uniqued. I wonder if this should be MDTuple::getDistinct() (leave it as-is if it's already the right kind).

1444–1462

Can we delay this usage to a follow-up patch? I'd rather add the IR feature first (since it's non-trivial) and then add the adoption.

It is useful to see it at the same time to understand usage. Phabricator has a feature to link two reviews together (edit related revisions -> child/parent).

wolfgangp added inline comments.May 3 2022, 11:15 AM
llvm/include/llvm/IR/Metadata.h
953–960

I'm not sure it helps either. I'm always a bit uncertain about how much the availability of abstractions constitutes an encouragement to use them. In this case, I guess it makes the derivation of properties like max values from bitfields a bit easier, but it's not that hard to get it right anyway. Otherwise it feels like abstraction for abstraction's sake. I'd be happy to get rid of it if you agree.

1389

I had envisioned a slightly different usage model in that clients would figure out in advance how many operands they need and reserve the space first. With the automatic expansion done by resize() I was afraid of too much waste, given how much space large LTO links are already using, for example.

llvm/lib/IR/Metadata.cpp
571–572

Hmm, all the allocated operands should have been constructed in operator new. In this scheme the idea was that operator new allocates the capacity and the constructor assigns the actual operands, of which there may be fewer than the capacity. Either way, all allocated operand slots should be properly constructed.

In any case, if we don't separate capacity and number of actual assigned operands, this is academic.

607–610

The move constructor copies the operand to the newly allocated slot and does retrack(). Optionally, it could also null out the old location (or call the destructor on it).
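A toy sketch of the move behavior being discussed (hypothetical class; the real MDOperand must also retarget use-list tracking via retrack(), which is elided here): stealing the pointer and nulling the source makes the moved-from destructor a no-op, which is what makes std::move cheap.

```cpp
#include <utility>

// Toy move-only operand: the move constructor takes over the pointer and
// empties the source, so destroying the source does no extra work.
class Operand {
  void *MD = nullptr;

public:
  Operand() = default;
  explicit Operand(void *P) : MD(P) {}
  Operand(Operand &&Other) noexcept : MD(Other.MD) {
    Other.MD = nullptr; // leave the source empty; its destructor is a no-op
  }
  void *get() const { return MD; }
};
```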

llvm/lib/Linker/IRMover.cpp
1444–1462

Yeah, I'll follow the sequence you suggested up at the top.

dexonsmith added inline comments.May 3 2022, 11:51 AM
llvm/include/llvm/IR/Metadata.h
953–960

SGTM!

1389

With automatic expansion it'll be a strict improvement in memory usage over what it was before (previously, all previous sizes would still be around) and it changes a quadratic operation to amortized linear (calling Metadata::retrack() isn't cheap). I think it's a better starting point.

If a profile shows that the automatically-expanded nodes are too expensive in practice, could add a shrink_to_fit() later.

llvm/lib/IR/Metadata.cpp
571–572

Ah, as soon as I saw the word "capacity" I assumed there would be storage where the operands had not been constructed (similar to a vector). I do think automatic expansion is the right default for resizable storage.

607–610

Not obvious to me how this is different, but I think if there's an improvement to make in the move constructor we should fix it, and this (or SmallVector<MDOperand>) should just use it.