This is an archive of the discontinued LLVM Phabricator instance.

[MsgPack] New MsgPackDocument class
ClosedPublic

Authored by tpr on Jan 21 2019, 9:28 AM.

Details

Summary

A class that exposes a simple in-memory representation of a document of
MsgPack objects, that can be read from and written to MsgPack, read from
and written to YAML, and inspected and modified in memory. This is
intended to be a lighter-weight (in terms of memory allocations)
replacement for MsgPackTypes.

Two subsequent changes will:

  1. switch AMDGPU HSA metadata to using MsgPackDocument instead of MsgPackTypes;
  2. add MsgPack AMDGPU PAL metadata via MsgPackDocument.

Change-Id: Ie15a054831d5a6467c5867c064c8f8f6b80270e1

Diff Detail

Event Timeline

tpr created this revision.Jan 21 2019, 9:28 AM
tpr added a subscriber: dstuttard.

Probably best to have someone else from AMDGPU land review this for your uses there - I'm just giving some high level at-a-glance stylistic coverage here.

include/llvm/BinaryFormat/MsgPackDocument.h
237

Any chance of using unique_ptrs in this data structure so manual cleanup isn't required?

Or change it to a vector of std::string?

lib/BinaryFormat/MsgPackDocument.cpp
30

Prefer non-member overloads when possible - it ensures equal conversion handling for the left and right operands. (the operator can still be implemented as a friend, so using all the implementation details, etc)

73–77

Could you use insert (& check the bool in the pair returned from insert to determine if it was newly inserted, then do the expensive (getDocument()->getNode()) initialization only in that case) instead of find+lookup? That way the code doesn't do two map lookups in the new case, only one.

tpr updated this revision to Diff 183148.Jan 23 2019, 12:07 PM

V2: Addressed review comments.

tpr marked 3 inline comments as done.Jan 23 2019, 12:08 PM
scott.linder accepted this revision.Feb 1 2019, 11:42 AM

LGTM, sorry for the delay in reviewing this.

This changes how tags are handled and removes the ability to "ignore tags" completely, but it achieves the same result of not littering the YAML with annotations in the usual case. I can see some corner cases where a stray tag will now appear in the YAML where it would not have before, but I think your solution is cleaner.

This revision is now accepted and ready to land.Feb 1 2019, 11:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2019, 11:42 AM
This revision was automatically updated to reflect the committed changes.
dblaikie added inline comments.Jun 23 2020, 3:27 PM
llvm/trunk/include/llvm/BinaryFormat/MsgPackDocument.h
123–132 ↗(On Diff #190466)

Excuse the delayed review - found this while investigating other issues. I believe this code is invalid & is an aliasing violation (taking a pointer to ArrayDocNode (similarly with MapDocNode) when it's not pointing to an object of that type).

Can you please fix this code to not do that?

scott.linder added inline comments.Jun 24 2020, 12:44 PM
llvm/trunk/include/llvm/BinaryFormat/MsgPackDocument.h
123–132 ↗(On Diff #190466)

Is the issue that in the implementations:

void DocNode::convertToArray() { *this = getDocument()->getArrayNode(); }

and

ArrayDocNode getArrayNode() {
  auto N = DocNode(&KindAndDocs[size_t(Type::Array)]);
  Arrays.push_back(std::unique_ptr<DocNode::ArrayTy>(new DocNode::ArrayTy));
  N.Array = Arrays.back().get();
  return N.getArray();
}

As even though the object pointed to by this is destructed and a new ArrayDocNode is copy-constructed over it, *that* ArrayDocNode is not actually ever constructed. A DocNode is just "type punned" into an ArrayDocNode.

Would changing the definition to:

ArrayDocNode getArrayNode() {
  auto N = ArrayDocNode(&KindAndDocs[size_t(Type::Array)]);
  Arrays.push_back(std::unique_ptr<DocNode::ArrayTy>(new DocNode::ArrayTy));
  N.Array = Arrays.back().get();
  return N;
}

And the analogous change for MapDocNode avoid the UB?

dblaikie added inline comments.Jun 26 2020, 7:02 PM
llvm/trunk/include/llvm/BinaryFormat/MsgPackDocument.h
123–132 ↗(On Diff #190466)

Hey, thanks for chiming in!

I don't believe the proposed change would address the UB.

Essentially, it's important for C++ that the compiler can assume that the type of an object doesn't change when a function call is made:

some_type var;
var.func();
// var is of type 'some_type' here

While it is valid (but pretty tricky/best avoided) to have 'func()' destroy 'var', placement-new some other type in 'var's memory, then destroy that, then construct a new 'some_type' where 'var' is - that doesn't seem like it's the use case you're trying to get to here, so I don't believe that small exception helps you here.

*this = getArrayNode() doesn't change the type of *this, so it's still an aliasing violation to reinterpret_cast this to an ArrayDocNode*, because it's still not an ArrayDocNode* - the assignment in convertToArray calls DocNode's operator=(const DocNode&) - it's slicing, just taking the DocNode part of the ArrayDocNode and assigning it to *this- it doesn't change the dynamic type of *this.

Anymore than:

base b;
derived d;
b = d;

b is still of type base, not of type derived.

scott.linder added inline comments.Jun 26 2020, 8:14 PM
llvm/trunk/include/llvm/BinaryFormat/MsgPackDocument.h
123–132 ↗(On Diff #190466)

Thank you for breaking it down! That makes perfect sense in retrospect.