This is an archive of the discontinued LLVM Phabricator instance.

[MsgPack] MsgPackDocument::readFromBlob now merges
ClosedPublic

Authored by tpr on May 9 2020, 7:02 AM.

Details

Summary

The readFromBlob method can now be used to read MsgPack into a Document
that already contains something, merging the two. There is a new Merger
argument to readFromBlob, a callback function to resolve conflicts.

Change-Id: Icf3e959217fe33cd907a41516c0386aef2847c0c

Diff Detail

Event Timeline

tpr created this revision.May 9 2020, 7:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2020, 7:02 AM
tpr updated this revision to Diff 263022.May 9 2020, 9:53 AM

V2: Pass map key to merger function.

tpr updated this revision to Diff 263158.May 11 2020, 6:44 AM

V3: Removed unneeded nullptr arg.

Dinesh is working on merging metadata in ROCm CompilerSupport using MsgPackDocument, and it fell out that merging YAML metadata was also handled transparently. What is the advantage to doing this via a function pointer passed to readFromBlob? It doesn't seem to save any copies as you still need to be able to compare the nodes; is it just a matter of readability? Would it be reasonable to also add a callback for fromYAML?

foad added a subscriber: foad.May 12 2020, 8:41 AM
foad added inline comments.
llvm/include/llvm/BinaryFormat/MsgPackDocument.h
384–385

Can't the body of this default merger be just llvm_unreachable();?

tpr added a comment.May 12 2020, 11:43 AM

Dinesh is working on merging metadata in ROCm CompilerSupport using MsgPackDocument, and it fell out that merging YAML metadata was also handled transparently. What is the advantage to doing this via a function pointer passed to readFromBlob? It doesn't seem to save any copies as you still need to be able to compare the nodes; is it just a matter of readability? Would it be reasonable to also add a callback for fromYAML?

I too tried doing it by reading the second blob into a MsgPackDocument and merging it into the first one from there. But I think that does involve more copies/allocation/etc because a map in the second document would actually get created as a std::map before having its elements merged into the corresponding map in the first document.

I guess the same thing as I have done could be done for the YAML reading code.

tpr marked an inline comment as done.May 12 2020, 11:44 AM
tpr added inline comments.
llvm/include/llvm/BinaryFormat/MsgPackDocument.h
384–385

Probably, although that is a change in behavior from before, which I'd rather not do as I think I am not the only user of it. (Also, even if we did do that, it should be "return false" so that the readFromBlob call returns false for failure.)

foad added inline comments.May 12 2020, 12:18 PM
llvm/include/llvm/BinaryFormat/MsgPackDocument.h
384–385

I don't get it. Surely in existing use cases, DestNode is nil so the merger will not be called. And if it *was* called, there's no point in it returning true unless it actually copies some data from SrcNode to DestNode, otherwise it'll claim to have done a merge but actually it ignored the new incoming data and left the document exactly as it was (i.e. empty). What am I missing?

tpr marked an inline comment as done.May 13 2020, 12:34 AM
tpr added inline comments.
llvm/include/llvm/BinaryFormat/MsgPackDocument.h
384–385

It's a bit of a corner case, but, in the code before, I think you could have a map containing two entries of the same name, both maps, and it would merge them together. Having two entries of the same name is probably not legal MsgPack in some sense.

In D79671#2032032, @tpr wrote:

Dinesh is working on merging metadata in ROCm CompilerSupport using MsgPackDocument, and it fell out that merging YAML metadata was also handled transparently. What is the advantage to doing this via a function pointer passed to readFromBlob? It doesn't seem to save any copies as you still need to be able to compare the nodes; is it just a matter of readability? Would it be reasonable to also add a callback for fromYAML?

I too tried doing it by reading the second blob into a MsgPackDocument and merging it into the first one from there. But I think that does involve more copies/allocation/etc because a map in the second document would actually get created as a std::map before having its elements merged into the corresponding map in the first document.

I guess the same thing as I have done could be done for the YAML reading code.

I see now, it seems like in the new version we still create the empty std::map and std::array, but as we identify them as conflicting immediately afterwards we never populate them. So we still have the same number of dead objects floating around, but we avoid populating them and then just copying out their contents, which saves us allocations and copies.

The other concern I have is that this may make it difficult to do semantic merges which need more context than the current node being merged. I think this is OK for HSA currently, because we just need to fail when we come across mismatching version nodes.

llvm/include/llvm/BinaryFormat/MsgPackDocument.h
377

Nil is a valid map key in MsgPack, so this is ambiguous; can we use an Empty entry, or just use an Optional<DocNode> where None represents the non-map case?

384–385

Isn't the behavior already changing? In the old code any existing document is discarded and you essentially parse a fresh document to replace it. In the new code the default is effectively inverted, and if you don't do anything special in your Merger you get the existing document with only distinct bits added in from the newly parsed one. If we are completely changing the semantics anyway it seems reasonable to fix odd bits like this.

Alternatively we could just have Merger be Optional and retain the behavior completely if it is not present.

llvm/lib/BinaryFormat/MsgPackDocument.cpp
156

Shouldn't this check for Empty? Why is Nil special concerning merges?

Is this because we represent an empty document as a Nil node? I didn't notice that in the original review, but I think we need to represent an empty document in a way that is distinct from one of the possible documents (in this case we can't distinguish an empty document from a document containing a single Nil node).

dineshkb-amd added inline comments.May 15 2020, 5:07 PM
llvm/include/llvm/BinaryFormat/MsgPackDocument.h
385

Can we consider *DestNode = SrcNode if DestNode and SrcNode are of same type to be closer to old behavior.

llvm/lib/BinaryFormat/MsgPackDocument.cpp
140–142

One of the logical cases to consider for mergers involving ArrayNodes is that the "Merger" callback provider can append Node elements to DestNode. e.g. lets say the document looking like

Root{"K1" : [ {"P" : p1} ] } is merged with Root{"K1" : ["P" : p2} ] } where {} represents a map and [] represents an array. The expected result is Root{"K1" : [ {"P" : p1}, {"P" : p2} ] }

This is one of the cases when we deal with merging metadata in ROCm CompilerSupport. Can this be achieved with current callback if so shouldn't the Array index be modified to the number of elements added?

tpr updated this revision to Diff 264574.EditedMay 18 2020, 3:26 AM
V4: Addressed review comments:
    * Separated the idea of an empty node from a nil node
    * Made default merge behavior to fail on any conflict
    * Array merging now appends
tpr marked 6 inline comments as done.May 18 2020, 3:30 AM
tpr added inline comments.
llvm/include/llvm/BinaryFormat/MsgPackDocument.h
384–385

OK, I've gone with Jay's suggestion of the default being to fail on any conflict.

llvm/lib/BinaryFormat/MsgPackDocument.cpp
140–142

Modified to append when merging arrays.

156

Good point. I have now separated the idea of an empty node from a nil one.

tpr marked 3 inline comments as done.May 18 2020, 3:57 AM

With the nil/empty change, there is a change in behavior in that doing a map lookup "map[key]" with a key not previously in the map creates a new map entry that is empty rather than nil. That is more correct, but I have just discovered some code in LLPC relying on the old behavior. I can fix the LLPC code, but is the change in behavior going to cause any HSA problems?

tpr updated this revision to Diff 265483.May 21 2020, 5:54 AM

V5: API change so client can choose whether array merging does merge or append.

In D79671#2041363, @tpr wrote:

With the nil/empty change, there is a change in behavior in that doing a map lookup "map[key]" with a key not previously in the map creates a new map entry that is empty rather than nil. That is more correct, but I have just discovered some code in LLPC relying on the old behavior. I can fix the LLPC code, but is the change in behavior going to cause any HSA problems?

Sorry, I didn't notice this right away. Thank you for checking, I don't believe we have anywhere that relies on the old behavior, we don't really have any cases where a nil node is valid in HSA metadata.

scott.linder accepted this revision.May 21 2020, 7:33 AM

LGTM, thank you for bearing with us taking a while to review. I left a small request but feel free to commit without updating the review.

llvm/include/llvm/BinaryFormat/MsgPackDocument.h
43

Small nit, could this be moved into the enum in MsgPackReader.h with a similar comment?

This revision is now accepted and ready to land.May 21 2020, 7:33 AM
tpr marked 2 inline comments as done.May 21 2020, 12:41 PM
tpr added inline comments.
llvm/include/llvm/BinaryFormat/MsgPackDocument.h
43

Will do, thanks.

This revision was automatically updated to reflect the committed changes.
tpr marked an inline comment as done.