This is an archive of the discontinued LLVM Phabricator instance.

[BinaryFormat] MessagePack reader/writer
ClosedPublic

Authored by scott.linder on Mar 13 2018, 8:00 AM.

Details

Summary

Add support for reading and writing MessagePack, a binary object serialization format which aims to be more compact than text formats like JSON or YAML. The specification can be found at https://github.com/msgpack/msgpack/blob/master/spec.md

AMD is defining our next code object metadata format in terms of MessagePack, and the linked patches reflect that use.

Diff Detail

Repository
rL LLVM

Event Timeline

scott.linder created this revision.Mar 13 2018, 8:00 AM

@chandlerc Have you had a chance to look at the patch? Do you know anyone else who would be interested?

I haven't really had time to look at this, but it seems somewhat unfortunate to add yet another format like this. =[ I guess I'm surprised that AMD is looking at baking this into their object file format. I'm not sure it makes sense to really consider landing this until the usage of it is at least available for review as well. While having the patches separate is good, I think it would be important to see the actual planned usage before adding all this code.

@dblaikie may have comments about the layering and the unittesting.

dblaikie added inline comments.Mar 26 2018, 3:12 PM
include/llvm/Support/MsgPackWriter.h
32 ↗(On Diff #138192)

This looks to be a layering violation - BinaryFormat depends on Support, not the other way around. So Support can't/shouldn't include BinaryFormat headers. (similar/same issue with the implementations in lib/Support)

unittests/Support/MsgPackReaderTest.cpp
17–18 ↗(On Diff #138192)

This could be a struct if it only has public members and public inheritance

22 ↗(On Diff #138192)

Presumably this could be removed/implicit.

43 ↗(On Diff #138192)

What makes these 32 values necessary/sufficient/interesting?

54–78 ↗(On Diff #138192)

A test like this doesn't seem to benefit much from having 3 tests inside it (they don't share anything) - might as well split them into 3 separate TEST_Fs? (also this would give them distinct names that might help provide the motivation for each variation at a glance)

Similarly in the rest of these tests and in the Writer tests too.

56 ↗(On Diff #138192)

std::string's 'assign' member might be more appropriate/direct

I haven't really had time to look at this, but it seems somewhat unfortunate to add yet another format like this. =[ I guess I'm surprised that AMD is looking at baking this into their object file format. I'm not sure it makes sense to really consider landing this until the usage of it is at least available for review as well. While having the patches separate is good, I think it would be important to see the actual planned usage before adding all this code.

@dblaikie may have comments about the layering and the unittesting.

Thank you for taking a look and adding subscribers; I didn't know who might have interest.

I understand the aversion to merging unused code, I just wanted to get feedback on the possibility as soon as possible. As for supporting another format I also understand the reluctance, but the MessagePack specification seems to be fairly static, and the code implementing it is relatively concise and straightforward. I do not think this code will have a significant impact on maintenance.

I am beginning work now on code for the AMD object file which will use MessagePack, so I should have more context available for this patch soon.

include/llvm/Support/MsgPackWriter.h
32 ↗(On Diff #138192)

Do you have thoughts on the best approach to fixing the layering? BinaryFormat seems like the correct place for the MessagePack constants, but I don't see why it can't be moved. Similarly, Support seems like the correct place for the MessagePack (de)serializer, but perhaps there is somewhere else it could live, possibly in BinaryFormat.

I really appreciate you taking a look, so thank you. I will address what I can and think of how best to move things to preserve layering.

dblaikie added inline comments.Mar 27 2018, 9:43 AM
include/llvm/Support/MsgPackWriter.h
32 ↗(On Diff #138192)

No strong opinions on what the right layering is here - I guess if it's reasonable to have it up in BinaryFormat that's probably better than having it in Support. (usual "least scope" kind of approach - not always the right thing, but at least a vague guidance)

scott.linder retitled this revision from [Support][RFC] MessagePack reader/writer to [BinaryFormat][RFC] MessagePack reader/writer.

Addressed feedback wrt. layering and testing. Moved library code from Support to BinaryFormat.

Clean up some comments and UB.

scott.linder retitled this revision from [BinaryFormat][RFC] MessagePack reader/writer to [BinaryFormat] MessagePack reader/writer.Jul 13 2018, 2:23 PM
scott.linder edited the summary of this revision. (Show Details)

Ping; the use in AMDGPU's backend is now up on Phabricator.

Rebase and ping

Testing is verbose, but probably suitable given the API involved.

Layering - looks about right, though if this is only used by one place, it should just go there rather than in a more common library like this.

unittests/BinaryFormat/MsgPackWriterTest.cpp
17 ↗(On Diff #158006)

You can drop the 'public' here, since it's the default/assumed in a struct.

17 ↗(On Diff #158006)

LLVM doesn't generally use global scope operator (the leading '::' in '::testing::Test') unless needed - and I hope it's not needed here.

548–549 ↗(On Diff #158006)

You can probably collapse all code like this:

std::string s;
s.assign(num, char);

to this:

std::string s(num, char);
549 ↗(On Diff #158006)

What's the static_cast here for? I'd have thought the usual arithmetic conversions would've made UINT8_MAX + 1 work correctly here without the cast?

552–553 ↗(On Diff #158006)

Wonder if it makes more sense to change these two lines (& all similar lines in these tests) to:

EXPECT_EQ(OStream.str(), std::string("...", 4) + s);
577 ↗(On Diff #158006)

Probably write this as:

MemoryBufferRef Ref(s, "");

Might actually even be better to not name this and roll it into its use below:

MPWriter.writeExt(0x02, MemoryBufferRef(s, ""));

(similar feedback for other similar code in this patch/tests)

599–600 ↗(On Diff #158006)

Looks slightly convoluted, compared to:

MemoryBuffeRef Ref(s, "");

or is there a reason to write it the way it is rather than that ^ ?

scott.linder marked 6 inline comments as done.Aug 1 2018, 12:05 PM
scott.linder added inline comments.
unittests/BinaryFormat/MsgPackWriterTest.cpp
549 ↗(On Diff #158006)

The static_cast may not be required here, but it is for e.g. TestWriteUInt64Min because UINT32_MAX + 1 wraps (and I believe it is required for UINT16_MAX + 1 as well). I added the cast in all the analogous tests to be consistent, but I can remove the UINT8 one if it is cleaner.

Address feedback

I have the library in BinaryFormat because it is used outside of the AMDGPU backend in tools (currently just llvm-readobj) in https://reviews.llvm.org/D48179

dblaikie accepted this revision.Aug 22 2018, 10:56 AM

Looks good - thanks!

This revision is now accepted and ready to land.Aug 22 2018, 10:56 AM
This revision was automatically updated to reflect the committed changes.
vitalybuka added inline comments.
llvm/trunk/lib/BinaryFormat/MsgPackWriter.cpp
91

This upsets ubsan

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/22688/steps/check-llvm%20ubsan/logs/stdio

- Testing: 27458 tests, 64 threads --
Testing: 0 
FAIL: LLVM-Unit :: BinaryFormat/./BinaryFormatTests/MsgPackWriter.TestWriteFloat64 (1375 of 27458)
******************** TEST 'LLVM-Unit :: BinaryFormat/./BinaryFormatTests/MsgPackWriter.TestWriteFloat64' FAILED ********************
Note: Google Test filter = MsgPackWriter.TestWriteFloat64
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from MsgPackWriter
[ RUN      ] MsgPackWriter.TestWriteFloat64
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/BinaryFormat/MsgPackWriter.cpp:91:32: runtime error: -2.28999e+226 is outside the range of representable values of type 'float'
    #0 0x46e045 in llvm::msgpack::Writer::write(double) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/BinaryFormat/MsgPackWriter.cpp:91:32
    #1 0x455ace in MsgPackWriter_TestWriteFloat64_Test::TestBody() /b/sanitizer-x86_64-linux-fast/build/llvm/unittests/BinaryFormat/MsgPackWriterTest.cpp:218:12
    #2 0x4a726f in testing::Test::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:2474:5
    #3 0x4a80d5 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:2656:11
    #4 0x4a8bd2 in testing::TestCase::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:2774:28
    #5 0x4b0503 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:4649:43
    #6 0x4afeb6 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:4257:10
    #7 0x49f524 in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46
    #8 0x49f524 in main /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/UnitTestMain/TestMain.cpp:51
    #9 0x7fe190b362e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
    #10 0x411ae9 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/unittests/BinaryFormat/BinaryFormatTests+0x411ae9)
kristina reopened this revision.EditedAug 22 2018, 6:29 PM
kristina added a subscriber: kristina.

Causes a regression with modular builds here (All other parallel build tasks failed on different parts of building the BinaryFormat module, this is just the first one):

[168/3291] Building CXX object lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/Dwarf.cpp.o
FAILED: lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/Dwarf.cpp.o
/usr/local/sdk/llvm-8.0.4078/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/BinaryFormat -I/SourceCache/llvm-trunk-8.0.4079/lib/BinaryFormat -Iinclude -I/SourceCache/llvm-trunk-8.0.4079/include -O3 -march=skylake -dwarf-4 -rtlib=compiler-rt -Wno-unused-command-line-argument -stdlib=libc++ -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -fmodules -fmodules-cache-path=/o/org.llvm.caches/llvm-8.0/4079/module.cache -Xclang -fmodules-local-submodule-visibility -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -flto=thin -O3    -UNDEBUG  -fno-exceptions -fno-rtti -MD -MT lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/Dwarf.cpp.o -MF lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/Dwarf.cpp.o.d -o lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/Dwarf.cpp.o -c /SourceCache/llvm-trunk-8.0.4079/lib/BinaryFormat/Dwarf.cpp
While building module 'LLVM_BinaryFormat' imported from /SourceCache/llvm-trunk-8.0.4079/lib/BinaryFormat/Dwarf.cpp:14:
In file included from <module-includes>:6:
/SourceCache/llvm-trunk-8.0/4079/include/llvm/BinaryFormat/MsgPack.h:30:1: error: redundant #include of module 'LLVM_BinaryFormat.MsgPack' appears within namespace 'llvm::msgpack::FirstByte' [-Wmodules-import-nested-redundant]
#include "llvm/BinaryFormat/MsgPack.def"
^
/SourceCache/llvm-trunk-8.0/4079/include/llvm/BinaryFormat/MsgPack.h:28:1: note: namespace 'llvm::msgpack::FirstByte' begins here
namespace FirstByte {
^
/SourceCache/llvm-trunk-8.0/4079/include/llvm/BinaryFormat/MsgPack.h:42:1: error: redundant #include of module 'LLVM_BinaryFormat.MsgPack' appears within namespace 'llvm::msgpack::FixBits' [-Wmodules-import-nested-redundant]
#include "llvm/BinaryFormat/MsgPack.def"
^
/SourceCache/llvm-trunk-8.0/4079/include/llvm/BinaryFormat/MsgPack.h:40:1: note: namespace 'llvm::msgpack::FixBits' begins here
namespace FixBits {
^
/SourceCache/llvm-trunk-8.0/4079/include/llvm/BinaryFormat/MsgPack.h:55:1: error: redundant #include of module 'LLVM_BinaryFormat.MsgPack' appears within namespace 'llvm::msgpack::FixBitsMask' [-Wmodules-import-nested-redundant]
#include "llvm/BinaryFormat/MsgPack.def"
^
/SourceCache/llvm-trunk-8.0/4079/include/llvm/BinaryFormat/MsgPack.h:53:1: note: namespace 'llvm::msgpack::FixBitsMask' begins here
namespace FixBitsMask {
^
/SourceCache/llvm-trunk-8.0/4079/include/llvm/BinaryFormat/MsgPack.h:64:1: error: redundant #include of module 'LLVM_BinaryFormat.MsgPack' appears within namespace 'llvm::msgpack::FixMax' [-Wmodules-import-nested-redundant]
#include "llvm/BinaryFormat/MsgPack.def"
^
/SourceCache/llvm-trunk-8.0/4079/include/llvm/BinaryFormat/MsgPack.h:62:1: note: namespace 'llvm::msgpack::FixMax' begins here
namespace FixMax {
^
/SourceCache/llvm-trunk-8.0/4079/include/llvm/BinaryFormat/MsgPack.h:74:1: error: redundant #include of module 'LLVM_BinaryFormat.MsgPack' appears within namespace 'llvm::msgpack::FixLen' [-Wmodules-import-nested-redundant]
#include "llvm/BinaryFormat/MsgPack.def"
^
/SourceCache/llvm-trunk-8.0/4079/include/llvm/BinaryFormat/MsgPack.h:72:1: note: namespace 'llvm::msgpack::FixLen' begins here
namespace FixLen {
^
/SourceCache/llvm-trunk-8.0/4079/include/llvm/BinaryFormat/MsgPack.h:87:1: error: redundant #include of module 'LLVM_BinaryFormat.MsgPack' appears within namespace 'llvm::msgpack::FixMin' [-Wmodules-import-nested-redundant]
#include "llvm/BinaryFormat/MsgPack.def"
^
/SourceCache/llvm-trunk-8.0/4079/include/llvm/BinaryFormat/MsgPack.h:85:1: note: namespace 'llvm::msgpack::FixMin' begins here
namespace FixMin {
^
/SourceCache/llvm-trunk-8.0/4079/lib/BinaryFormat/Dwarf.cpp:14:10: fatal error: could not build module 'LLVM_BinaryFormat'
#include "llvm/BinaryFormat/Dwarf.h"
 ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
7 errors generated.
This revision is now accepted and ready to land.Aug 22 2018, 6:29 PM

Causes a regression with modular builds here:

[168/3291] Building CXX object lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/Dwarf.cpp.o
FAILED: lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/Dwarf.cpp.o
/usr/local/sdk/llvm-8.0.4078/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/BinaryFormat -I/SourceCache/llvm-trunk-8.0.4079/lib/BinaryFormat -Iinclude -I/SourceCache/llvm-trunk-8.0.4079/include -O3 -march=skylake -dwarf-4 -rtlib=compiler-rt -Wno-unused-command-line-argument -stdlib=libc++ -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -fmodules -fmodules-cache-path=/o/org.llvm.caches/llvm-8.0/4079/module.cache -Xclang -fmodules-local-submodule-visibility -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -flto=thin -O3    -UNDEBUG  -fno-exceptions -fno-rtti -MD -MT lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/Dwarf.cpp.o -MF lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/Dwarf.cpp.o.d -o lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/Dwarf.cpp.o -c /SourceCache/llvm-trunk-8.0.4079/lib/BinaryFormat/Dwarf.cpp
While building module 'LLVM_BinaryFormat' imported from /SourceCache/llvm-trunk-8.0.4079/lib/BinaryFormat/Dwarf.cpp:14:
In file included from <module-includes>:6:
/SourceCache/llvm-trunk-8.0.4079/include/llvm/BinaryFormat/MsgPack.h:30:1: error: redundant #include of module 'LLVM_BinaryFormat.MsgPack' appears within namespace 'llvm::msgpack::FirstByte' [-Wmodules-import-nested-redundant]
#include "llvm/BinaryFormat/MsgPack.def"
^
/SourceCache/llvm-trunk-8.0.4079/include/llvm/BinaryFormat/MsgPack.h:28:1: note: namespace 'llvm::msgpack::FirstByte' begins here
namespace FirstByte {
^
/SourceCache/llvm-trunk-8.0.4079/include/llvm/BinaryFormat/MsgPack.h:42:1: error: redundant #include of module 'LLVM_BinaryFormat.MsgPack' appears within namespace 'llvm::msgpack::FixBits' [-Wmodules-import-nested-redundant]
#include "llvm/BinaryFormat/MsgPack.def"
^
/SourceCache/llvm-trunk-8.0.4079/include/llvm/BinaryFormat/MsgPack.h:40:1: note: namespace 'llvm::msgpack::FixBits' begins here
namespace FixBits {
^
/SourceCache/llvm-trunk-8.0.4079/include/llvm/BinaryFormat/MsgPack.h:55:1: error: redundant #include of module 'LLVM_BinaryFormat.MsgPack' appears within namespace 'llvm::msgpack::FixBitsMask' [-Wmodules-import-nested-redundant]
#include "llvm/BinaryFormat/MsgPack.def"

If you're maintaining/watching the modules buildbot, this is probably the sort of thing worth fixing forward yourself - I don't think everyone's aware of/maintaining the modules build but those of us who do can fix things up easily enough - adding the .def file to the include/llvm/module.modulemap file along with all the other .def files that are already there.

kristina requested changes to this revision.Aug 22 2018, 6:56 PM

Don't have commit access but this should be revised to update the modulemap.

module LLVM_BinaryFormat {
    requires cplusplus
    umbrella "BinaryFormat" module * { export * }
    textual header "BinaryFormat/Dwarf.def"
    // -snip-
    textual header "BinaryFormat/WasmRelocs.def"
    textual header "BinaryFormat/MsgPack.def"
}

Modular build passes after that.

This revision now requires changes to proceed.Aug 22 2018, 6:56 PM

Don't have commit access but this should be revised to update the modulemap.

module LLVM_BinaryFormat {
    requires cplusplus
    umbrella "BinaryFormat" module * { export * }
    textual header "BinaryFormat/Dwarf.def"
    // -snip-
    textual header "BinaryFormat/WasmRelocs.def"
    textual header "BinaryFormat/MsgPack.def"
}

Modular build passes after that.

Ah, no worries - committed in r340506 - thanks!

If you're maintaining/watching the modules buildbot, this is probably the sort of thing worth fixing forward yourself - I don't think everyone's aware of/maintaining the modules build but those of us who do can fix things up easily enough - adding the .def file to the include/llvm/module.modulemap file along with all the other .def files that are already there.

Yes though unfortunately, since I lack commit access I can't commit minor changes like that, hence my request for revision.

Thank you both! I will try to keep modules in mind in the future.

Scott

kristina accepted this revision.Aug 22 2018, 7:12 PM

Don't have commit access but this should be revised to update the modulemap.

module LLVM_BinaryFormat {
    requires cplusplus
    umbrella "BinaryFormat" module * { export * }
    textual header "BinaryFormat/Dwarf.def"
    // -snip-
    textual header "BinaryFormat/WasmRelocs.def"
    textual header "BinaryFormat/MsgPack.def"
}

Modular build passes after that.

Ah, no worries - committed in r340506 - thanks!

Thank you! Buildbot for next one (numbering is odd, I know, /o/org.llvm.caches/llvm-8.0/4080) passed.

This revision is now accepted and ready to land.Aug 22 2018, 7:12 PM
kristina resigned from this revision.Aug 22 2018, 7:44 PM
scott.linder closed this revision.Aug 24 2018, 9:24 AM

Modular build and UB fixes (r340507) are committed.

anna added a subscriber: anna.Mar 31 2020, 12:49 PM

Hi,

It looks like there's some UB in this code. Pls see this failure: https://reviews.llvm.org/D76140#1953201
I'm not sure how to go about fixing the MsgPackReader code.
@scott.linder Could you PTAL?

anna added a comment.Mar 31 2020, 2:00 PM

Hi,

It looks like there's some UB in this code. Pls see this failure: https://reviews.llvm.org/D76140#1953201
I'm not sure how to go about fixing the MsgPackReader code.
@scott.linder Could you PTAL?

Pls ignore the above comment. It was infact a bug in my change.