This is an archive of the discontinued LLVM Phabricator instance.

Differential D122069

[Object] Add binary format for bundling offloading metadata
ClosedPublic

Authored by jhuber6 on Mar 19 2022, 8:20 AM.

Details

Reviewers

jdoerfert
JonChesterfield
grokos
ABataev
ronlieb
tianshilei1992

Commits

rGe471ba3d0122: [Object] Add binary format for bundling offloading metadata

Summary

We need to embed certain metadata along with a binary image when we wish
to perform a device-linking job on it. Previously this metadata was
embedded in the section name of the data itself. This worked, but made
adding new metadata very difficult and didn't work if the user did any
sort of section linking.

This patch introduces a custom binary format for bundling offloading
metadata with a device object file. This binary format is fundamentally
a simple string map table with some additional data and an embedded
image. I decided to use a custom format rather than using an existing
format (ELF, JSON, etc) because of the specialty use-case of this. We
need a simple binary format that can be concatenated without requiring
other external dependencies.

This extension will make it easier to extend the linker wrapper's
capabilities with whatever data is necessary. Eventually this will allow
us to remove all the external arguments passed to the linker wrapper and
embed it directly in the host's linker so device linking behaves exactly
like host linking.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision.Mar 19 2022, 8:20 AM

Herald added a project: Restricted Project.Mar 19 2022, 8:20 AM

Herald added subscribers: ormris, dexonsmith, hiraditya, mgorny.

jhuber6 requested review of this revision.Mar 19 2022, 8:20 AM

Herald added projects: Restricted Project, Restricted Project.Mar 19 2022, 8:20 AM

Herald added subscribers: llvm-commits, cfe-commits, sstefan1.

jhuber6 added inline comments.Mar 19 2022, 8:21 AM

clang/include/clang/Basic/Offloading.h
1 ↗	(On Diff #416702)	Not sure if this should live in Clang or LLVM.

Harbormaster completed remote builds in B155202: Diff 416702.Mar 19 2022, 8:46 AM

Update alignment to worst-case padding of 16. I'm not sure if there's a better solution for preventing the linker from adding padding between these sections if they are combined.

Harbormaster completed remote builds in B155212: Diff 416713.Mar 19 2022, 11:59 AM

Fix test after increasing alignment.

Harbormaster completed remote builds in B155216: Diff 416719.Mar 19 2022, 1:40 PM

Changing to add alignment to the global variable. Should ensure that the section
alignment is correct.

Harbormaster completed remote builds in B155238: Diff 416749.Mar 19 2022, 5:36 PM

I decided to use a custom format rather than using an existing format (ELF, JSON, etc) because of the specialty use-case of this.

Will we ever need to process the files with tools built with a different LLVM version? E.g clang and lld may be built separately from different LLVM trees.
If so, then maintaining compatibility of the binary format will become more painful over time, compared to using json.

Considering that we already have json parsing and serialization in LLVM, I don't see much benefit growing yet another on-disk format. AFAICT, JSON should do the job encoding relevant data just fine and would need less maintenance long term.

clang/include/clang/Basic/Offloading.h
87 ↗	(On Diff #416749)	On-disk format could use more comments.
87–104 ↗	(On Diff #416749)	Given that it's an on-disk format I think these should be explicitly packed and carry a comment that it's an on-disk format which needs extra caution during future changes.
90 ↗	(On Diff #416749)	What does `Size` cover and what units does it use -- bytes, number of Entries, something else?
95–96 ↗	(On Diff #416749)	Should we use the matching enum types here? We're using 16-bit enums, so it would save us on casts when we use those fields and would make it harder to set a wrong value by mistake.
97 ↗	(On Diff #416749)	Do these flags have defined meaning?
98 ↗	(On Diff #416749)	What are the offsets relative to?

In D122069#3397373, @tra wrote:

I decided to use a custom format rather than using an existing format (ELF, JSON, etc) because of the specialty use-case of this.

Will we ever need to process the files with tools built with a different LLVM version? E.g clang and lld may be built separately from different LLVM trees.
If so, then maintaining compatibility of the binary format will become more painful over time, compared to using json.

As it stands now, this is only used by the linker-wrapper which is all clang. LLD is only called as part of the host link job so it's impossible to mix versions. That being said, eventually I want this functionality to be performed by the linker itself without needing the extra wrapper so it'll be possible in the future. I added the "Version" field specifically to emit a warning or error in the future if we change this. Even if we used JSON or some other form it would be a similar story if these get de-synced because we'd have different entries and keys. We could add to the struct without breaking the ABI since everything uses absolute offsets.

Considering that we already have json parsing and serialization in LLVM, I don't see much benefit growing yet another on-disk format. AFAICT, JSON should do the job encoding relevant data just fine and would need less maintenance long term.

I was definitely thinking about a lot of alternatives to making yet another on-disk binary format. I had a few discussions with @JonChesterfield on the subject. There were two things that put me off of using JSON primarily, correct me if I have any misconceptions.

This is fundamentally a very thin metadata wrapper around a binary image. AFAIK if you want to stream binary data to a JSON you need to use a base64 or similar encoding, which makes the files bigger and adds some extra work. It's not a deal-breaker, but it's somewhat of a turn-off.
There could be multiple of these contained in a single section, I primarily wanted something with a known size and easy to identify magic bytes to find where these live in the section. I wasn't sure how nicely this would interact with the JSON parser. It could also be done, but I wasn't sure how much utility there was.

clang/include/clang/Basic/Offloading.h
87 ↗	(On Diff #416749)	Noted, will add more if we settle on this format.
87–104 ↗	(On Diff #416749)	What do you mean by explicitly packed? And I added the "Version" field in the header so we can warn on old versions.
90 ↗	(On Diff #416749)	Size of the entire file, so you can do `Data[Header->Size]` and potentially get the next one in memory.
95–96 ↗	(On Diff #416749)	Yeah, the casts are annoying but I elected to be explicit with the size constraints in the header itself. Wasn't really sure which was better.
97 ↗	(On Diff #416749)	They're unused for now, I was planning on making it a bitfield so we could pass things like whether or not this file contains debug information or is optimized, kind of like fatbinary.
98 ↗	(On Diff #416749)	Absolute from the start of the file, can comment.
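
The point about `Size` and the magic bytes can be illustrated with a small standalone sketch. It is not code from this patch: the `Header` struct below is an assumed mirror of the on-disk header, and `findBinaries` and `makeBinary` are hypothetical helper names. The sketch assumes the binaries sit back to back in one buffer, each padded to its own `Size`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed mirror of the on-disk header discussed in this review.
struct Header {
  uint8_t Magic[4];
  uint32_t Version;
  uint64_t Size;        // Size in bytes of the whole binary, header included.
  uint64_t EntryOffset; // Offset of the metadata entry from the binary start.
  uint64_t EntrySize;
};

static const uint8_t kMagic[4] = {0x10, 0xFF, 0x10, 0xAD};

// Hypothetical helper: return the start offset of every binary stored back to
// back in Data. Stops at the first position without the magic bytes or with a
// size that does not fit in the remaining buffer.
std::vector<size_t> findBinaries(const std::vector<char> &Data) {
  std::vector<size_t> Offsets;
  size_t Pos = 0;
  while (Data.size() - Pos >= sizeof(Header)) {
    Header H;
    // Copy instead of casting so unaligned input is still safe to read.
    std::memcpy(&H, Data.data() + Pos, sizeof(Header));
    if (std::memcmp(H.Magic, kMagic, sizeof(kMagic)) != 0)
      break;
    if (H.Size < sizeof(Header) || H.Size > Data.size() - Pos)
      break;
    Offsets.push_back(Pos);
    Pos += H.Size; // The next binary, if any, starts right after this one.
  }
  return Offsets;
}

// Hypothetical helper for experiments: a header-only binary padded with zeros
// to Size bytes. Size must be at least sizeof(Header).
std::vector<char> makeBinary(uint64_t Size) {
  Header H = {{0x10, 0xFF, 0x10, 0xAD}, 1, Size, sizeof(Header), 0};
  std::vector<char> Out(Size, 0);
  std::memcpy(Out.data(), &H, sizeof(Header));
  return Out;
}
```

Concatenating `makeBinary(48)` and `makeBinary(64)` and passing the result to `findBinaries` should report offsets 0 and 48, which is the "known size plus magic bytes" property described above.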

In D122069#3397560, @jhuber6 wrote:

I was definitely thinking about a lot of alternatives to making yet another on-disk binary format. I had a few discussions with @JonChesterfield on the subject. There were two things that put me off of using JSON primarily, correct me if I have any misconceptions.

This is fundamentally a very thin metadata wrapper around a binary image. AFAIK if you want to stream binary data to a JSON you need to use a base64 or similar encoding, which makes the files bigger and adds some extra work.

I didn't realize that the content of the file is part of your packaging scheme. I interpreted "embed certain metadata along with a binary image" as literally keeping the binaries as binaries and just adding a small blob with additional metadata. It was that data I meant to encode as JSON. Encoding the offload binaries themselves as JSON would indeed be wasteful.

We could keep the header as a binary (never changes on-disk format) and use JSON representation for the array of the entries (0-terminated string or string + length stored in the header) which also has fixed (as in version-agnostic) on-disk format, though of variable length.
Versioning still has to be dealt with, but now it would be independent of the on-disk format. The variable length of JSON is both a plus and a minus. On the positive side is that content is open. Some tool may add whatever is relevant for its own use, comments, provenance info, checksum, etc.
The downside is that it has variable length, so it would have to be written after the image binary. We would also need to deal with potential errors parsing JSON.

I don't have a strong preference either way. I think JSON may have a few minor benefits, but the proposed binary format has the advantage of simplicity. We can always switch to JSON-encoded entries later by bumping the header version.

clang/include/clang/Basic/Offloading.h
87–104 ↗	(On Diff #416749)	`__attribute__((packed))`. Otherwise you depend on assumed natural alignment and that is target-dependent, IIRC.
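
One way to guard against the target-dependent layout concern, independent of whether the structs are marked packed, is to pin the expected layout with compile-time checks. The sketch below is an illustration and not part of the patch. The structs are assumed mirrors of the `Header` and `Entry` layout from the final revision, with the enum fields written as plain `uint16_t`, and the expected sizes and offsets are derived from those field widths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Assumed mirror of the on-disk header (field order as in the final diff).
struct Header {
  uint8_t Magic[4];
  uint32_t Version;
  uint64_t Size;
  uint64_t EntryOffset;
  uint64_t EntrySize;
};

// Assumed mirror of the on-disk entry; enums are shown as their 16-bit type.
struct Entry {
  uint16_t TheImageKind;
  uint16_t TheOffloadKind;
  uint32_t Flags;
  uint64_t StringOffset;
  uint64_t NumStrings;
  uint64_t ImageOffset;
  uint64_t ImageSize;
};

// offsetof is only well defined for standard-layout types.
static_assert(std::is_standard_layout<Header>::value, "Header must be POD-like");
static_assert(std::is_standard_layout<Entry>::value, "Entry must be POD-like");

// Every field already sits at a multiple of its own size, so no padding is
// expected. A build on a target that disagrees fails here instead of silently
// writing an incompatible file.
static_assert(sizeof(Header) == 32, "unexpected Header size");
static_assert(offsetof(Header, Version) == 4, "unexpected Version offset");
static_assert(offsetof(Header, Size) == 8, "unexpected Size offset");
static_assert(offsetof(Header, EntryOffset) == 16, "unexpected EntryOffset");
static_assert(offsetof(Header, EntrySize) == 24, "unexpected EntrySize");

static_assert(sizeof(Entry) == 40, "unexpected Entry size");
static_assert(offsetof(Entry, Flags) == 4, "unexpected Flags offset");
static_assert(offsetof(Entry, StringOffset) == 8, "unexpected StringOffset");
static_assert(offsetof(Entry, ImageSize) == 32, "unexpected ImageSize offset");
```

These checks catch an unexpected layout but do not change it, whereas `__attribute__((packed))` forces one. Byte order is a separate concern that neither approach addresses.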

In D122069#3397867, @tra wrote:

In D122069#3397560, @jhuber6 wrote:

I was definitely thinking about a lot of alternatives to making yet another on-disk binary format. I had a few discussions with @JonChesterfield on the subject. There were two things that put me off of using JSON primarily, correct me if I have any misconceptions.

This is fundamentally a very thin metadata wrapper around a binary image. AFAIK if you want to stream binary data to a JSON you need to use a base64 or similar encoding, which makes the files bigger and adds some extra work.

I didn't realize that the content of the file is part of your packaging scheme. I interpreted "embed certain metadata along with a binary image" as literally keeping the binaries as binaries and just adding a small blob with additional metadata. It was that data I meant to encode as JSON. Encoding the offload binaries themselves as JSON would indeed be wasteful.

We could keep the header as a binary (never changes on-disk format) and use JSON representation for the array of the entries (0-terminated string or string + length stored in the header) which also has fixed (as in version-agnostic) on-disk format, though of variable length.
Versioning still has to be dealt with, but now it would be independent of the on-disk format. The variable length of JSON is both a plus and a minus. On the positive side is that content is open. Some tool may add whatever is relevant for its own use, comments, provenance info, checksum, etc.
The downside is that it has variable length, so it would have to be written after the image binary. We would also need to deal with potential errors parsing JSON.

I don't have a strong preference either way. I think JSON may have a few minor benefits, but the proposed binary format has the advantage of simplicity. We can always switch to JSON-encoded entries later by bumping the header version.

I like the idea of keeping the header. We could add an additional field to the header for the size of the entry, and I feel like we'd be pretty future-proof if we want to change stuff. I think using this binary format for now is sufficient as long as we keep open the possibility of upgrading it to something more complex.

I'm also not sure if I should extend this as a binary format inheriting from LLVM's Binary class. It would be a minimal amount of work but I'm not sure if this use-case warrants extending this to broader LLVM.

Add more comments and an entry size field to the header.

Harbormaster completed remote builds in B155516: Diff 417129.Mar 21 2022, 5:27 PM

Updating to use path instead of generic cmdline, makes it a lot easier to pass it. Also just adding a reserved field in case I want to add the cmdline back.

Harbormaster completed remote builds in B155907: Diff 417685.Mar 23 2022, 11:26 AM

Splitting this out into a patch for the format. Adding a unit test and changing
strings to now be an arbitrary string map. Hopefully the move to LLVM proper
won't draw ire for creating another binary format in LLVM.

jhuber6 retitled this revision from [Clang] Add binary format for bundling offloading metadata to [Object] Add binary format for bundling offloading metadata.Mar 25 2022, 8:07 AM

jhuber6 edited the summary of this revision.

Harbormaster completed remote builds in B156290: Diff 418227.Mar 25 2022, 8:37 AM

Changing test, uniform_int_distribution doesn't support char or uint8_t according to the standard.
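
For context, `std::uniform_int_distribution` is only specified for `short`, `int`, `long`, `long long` and their unsigned counterparts, so instantiating it with `char` or `uint8_t` is not guaranteed to work. A minimal sketch of the usual workaround follows. It assumes only the standard library, and `randomBytes` is a hypothetical helper name, not something from the test in this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: fill a buffer with pseudo-random bytes. The values are
// drawn from a 16-bit distribution restricted to [0, 255] and then narrowed,
// because an 8-bit result type is not permitted by the standard.
std::vector<uint8_t> randomBytes(size_t Count, uint32_t Seed) {
  std::mt19937 Rng(Seed);
  std::uniform_int_distribution<uint16_t> ByteDist(0, 255);
  std::vector<uint8_t> Bytes(Count);
  for (uint8_t &B : Bytes)
    B = static_cast<uint8_t>(ByteDist(Rng));
  return Bytes;
}
```

A fixed seed is used here so a failing run can be reproduced. The test in the diff seeds from `std::random_device` instead.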

Harbormaster completed remote builds in B156309: Diff 418255.Mar 25 2022, 10:28 AM

Added some reviewers. I'd much prefer this used an existing binary format, DIY is prone to errors and maintenance hassles down the road. Don't care as much about which format as about it being one with an existing, tested implementation and ideally existing inspection tools.

In D122069#3413960, @JonChesterfield wrote:

Added some reviewers. I'd much prefer this used an existing binary format, DIY is prone to errors and maintenance hassles down the road. Don't care as much about which format as about it being one with an existing, tested implementation and ideally existing inspection tools.

I'm not married to the idea, and worst case scenario we can replace it with something else in the future. I'd like to get something like this working so I can finally make the new driver the default, so I'll just outline the problem and some of the potential solutions.

The issue is that we need to store some metadata along with a binary image so we know how to handle it at a later date. Currently we shove this in the name of the ELF section and parse it there, but that's not ideal because more metadata will be needed in the future and it prevents us from doing things like relocatable linking or other merging (e.g. we want to store both an sm_70 and sm_35 image in the same file). So we want a binary format that can store some strings and other data. I'll just go over a brief overview of the options:

YAML / JSON

+ Ubiquitous simple text format for encoding object data
+ In-tree implementation
x Requires encoding to store the device image
x Will need binary padding and size calculation to make sure these merge properly in a section

Protocol Buffers

+ Well-tested
+ Implicit appending, no additional code required to handle merged sections.
x Out-of-tree, requires external dependencies to build and maintain in the future. No other use in Clang / LLVM

ELF

+ Ubiquitous tooling. Object extraction and copying for free
+ Simple key-value storage
x Difficult to calculate size, will need to figure out the size of the buffer and write it in later so we can read multiple appended sections.
x Difficult to create. The Elf object writer is completely tied to the MC backend. YAML2ELF would require base64 or similar again

MSGPACK

+ Exists in-tree in some form and well tested
+ Supports key-value storage
x Doesn't know its size, will need to add padding and a size field

Custom format

+ Relatively simple implementation that solves this specific problem
x No existing tooling support, more error prone

I decided to go with the custom format because it was the simplest to get working for a proof of concept to solve the problem I was immediately facing. I think ELF would be the next best if someone could suggest a way to write the data and get the size. MSGPACK seems to be @JonChesterfield's preferred method because it has a lot of use at AMD; it would work as long as we can figure out its size and get alignment. Let me know what suggestions you have because I really want to move forward with this.

jhuber6 added a child revision: D122683: [OpenMP] Use new offloading binary when embedding offloading images.Mar 29 2022, 1:46 PM

Hey @jhuber6 , as discussed in multi-company meeting, I think that we will need at least an arch field somewhere in this. We would like to create multi-arch binaries so that runtime can load the compatible one on its own.
You may even consider using TargetID Format to store the list of archs.

In D122069#3416694, @saiislam wrote:

Hey @jhuber6 , as discussed in multi-company meeting, I think that we will need at least an arch field somewhere in this. We would like to create multi-arch binaries so that runtime can load the compatible one on its own.
You may even consider using TargetID Format to store the list of archs.

The binary format contains a string map along with some integer fields. I have the getArch() function in the binary that just extracts the value of the "arch" key. This makes it easy to add some arbitrary data, so I was planning on simply adding a "features" key as well. Then we can extract the associated image features and decide what to do with the image. I haven't thought of a good solution for allowing multiple compatible architectures, maybe a comma-separated list of architectures. You can see the proposed usage right now in D122683 but more will be added.
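
The comma-separated idea floated above could look roughly like the following. This is purely a sketch of that suggestion and not something the patch implements. `splitArchList` and `supportsArch` are hypothetical names, and exact string equality is assumed as the compatibility rule.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a value such as "sm_70,sm_35" into its parts.
// Empty parts (for example from a trailing comma) are skipped.
std::vector<std::string> splitArchList(const std::string &List) {
  std::vector<std::string> Archs;
  size_t Begin = 0;
  while (Begin <= List.size()) {
    size_t End = List.find(',', Begin);
    if (End == std::string::npos)
      End = List.size();
    if (End > Begin)
      Archs.push_back(List.substr(Begin, End - Begin));
    Begin = End + 1;
  }
  return Archs;
}

// Hypothetical helper: true if Arch appears in the list stored under a key
// such as "arch". Only exact matches count, so "sm_3" does not match "sm_35".
bool supportsArch(const std::string &List, const std::string &Arch) {
  for (const std::string &Candidate : splitArchList(List))
    if (Candidate == Arch)
      return true;
  return false;
}
```

With this shape, a consumer would read the string stored under "arch" and call `supportsArch(Value, "sm_35")` before deciding whether to use the image.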

Ping, I'd like to finalize the new driver in time for the GPU newsletter and the LLVM Performance Workshop at CGO.

ping

Couple of nits above but basically I'm convinced. The gnarly part of binary formats is string tables and I'm delighted that part of MC was readily reusable. Wrapping the string table in different bytes to align with the elf format may still be a good idea but it's not an obvious correctness hazard.

I like msgpack as a format but the writing machinery in llvm is not very reusable. Likewise elf is a great format but quite interwoven with MC. Protobuf seems to have the nice property of concatenating objects yielding a valid protobuf but the cost of codegen that isn't presently part of the llvm core, and is a slightly hairy dependency to pull in.

Medium term, factoring out parts of the elf handling for use here (and in lld?) is probably reasonable, but the leading magic bytes here are sufficient that we could detect that in backwards-compat fashion if the release gets ahead of us. The format here is essentially a string map which is likely to meet future requirements from other platforms adequately.

Thanks for sticking with this!

llvm/include/llvm/Object/OffloadBinary.h
75	these should probably be returning the enum types
98	enums here as well? They have uint16_t specified in the type so layout is stable
llvm/lib/Object/OffloadBinary.cpp
21	Not sure Expected<> helps hugely here - stuff only goes wrong as 'parse_failed' or failed to allocate, which is kind of the same thing - so we could return a default-initialized (null) unique_ptr on failure without loss of information
42	this is good, string table building is by far the most tedious part of formats like this

This revision is now accepted and ready to land.Apr 13 2022, 7:27 AM

Making suggested changes.

Harbormaster completed remote builds in B159465: Diff 422544.Apr 13 2022, 10:19 AM

This revision was landed with ongoing or failed builds.Apr 14 2022, 7:51 AM

Closed by commit rGe471ba3d0122: [Object] Add binary format for bundling offloading metadata (authored by jhuber6).

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rGe471ba3d0122: [Object] Add binary format for bundling offloading metadata.

MaskRay mentioned this in rG2108f7a243a5: [Object] Fix namespace style issues in D122069.Jun 1 2022, 5:05 PM

MaskRay mentioned this in rGecb1d8448843: OffloadBinary: Switch to MapVector<StringRef, StringRef> to stabilize iteration….Feb 4 2023, 12:35 PM

Revision Contents

Path

Size

llvm/

include/

llvm/

Object/

OffloadBinary.h

148 lines

lib/

Object/

CMakeLists.txt

1 line

OffloadBinary.cpp

144 lines

unittests/

Object/

CMakeLists.txt

1 line

OffloadingTest.cpp

65 lines

Diff 422869

llvm/include/llvm/Object/OffloadBinary.h

This file was added.

				//===--- Offloading.h - Utilities for handling offloading code  -*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the binary format used for bundling device metadata with
				// an associated device image. The data can then be stored inside a host object
				// file to create a fat binary and read by the linker. This is intended to be a
				// thin wrapper around the image itself. If this format becomes sufficiently
				// complex it should be moved to a standard binary format like msgpack or ELF.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_BINARYFORMAT_OFFLOADING_H
				#define LLVM_BINARYFORMAT_OFFLOADING_H

				#include "llvm/ADT/StringMap.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/Support/Error.h"
				#include "llvm/Support/MemoryBuffer.h"
				#include <memory>

				namespace llvm {

				/// The producer of the associated offloading image.
				enum OffloadKind : uint16_t {
				OFK_None = 0,
				OFK_OpenMP,
				OFK_Cuda,
				OFK_HIP,
				};

				/// The type of contents the offloading image contains.
				enum ImageKind : uint16_t {
				IMG_None = 0,
				IMG_Object,
				IMG_Bitcode,
				IMG_Cubin,
				IMG_Fatbinary,
				IMG_PTX,
				};

				/// A simple binary serialization of an offloading file. We use this format to
				/// embed the offloading image into the host executable so it can be extracted
				/// and used by the linker.
				///
				/// Many of these could be stored in the same section by the time the linker
				/// sees it so we mark this information with a header. The version is used to
				/// detect ABI stability and the size is used to find other offloading entries
				/// that may exist in the same section. All offsets are given as absolute byte
				/// offsets from the beginning of the file.
				class OffloadBinary {
				public:
				/// The offloading metadata that will be serialized to a memory buffer.
				struct OffloadingImage {
				ImageKind TheImageKind;
				OffloadKind TheOffloadKind;
				uint32_t Flags;
				StringMap<StringRef> StringData;
				MemoryBufferRef Image;
				};

				/// Attempt to parse the offloading binary stored in \p Data.
				static Expected<std::unique_ptr<OffloadBinary>> create(MemoryBufferRef);

				/// Serialize the contents of \p File to a binary buffer to be read later.
				static std::unique_ptr<MemoryBuffer> write(const OffloadingImage &);

				static uint64_t getAlignment() { return alignof(Header); }

				ImageKind getImageKind() const { return TheEntry->TheImageKind; }
				OffloadKind getOffloadKind() const { return TheEntry->TheOffloadKind; }
				uint32_t getFlags() const { return TheEntry->Flags; }
				uint64_t getSize() const { return TheHeader->Size; }

				StringRef getTriple() const { return getString("triple"); }
				StringRef getArch() const { return getString("arch"); }
				StringRef getImage() const {
				return StringRef(&Buffer[TheEntry->ImageOffset], TheEntry->ImageSize);
				}

				StringRef getString(StringRef Key) const { return StringData.lookup(Key); }

				private:
				struct Header {
				uint8_t Magic[4] = {0x10, 0xFF, 0x10, 0xAD}; // 0x10FF10AD magic bytes.
				uint32_t Version = 1; // Version identifier.
				uint64_t Size; // Size in bytes of this entire binary.
				uint64_t EntryOffset; // Offset of the metadata entry in bytes.
				uint64_t EntrySize; // Size of the metadata entry in bytes.
				};

				struct Entry {
				ImageKind TheImageKind; // The kind of the image stored.
				OffloadKind TheOffloadKind; // The producer of this image.
				uint32_t Flags; // Additional flags associated with the image.
				uint64_t StringOffset; // Offset in bytes to the string map.
				uint64_t NumStrings; // Number of entries in the string map.
				uint64_t ImageOffset; // Offset in bytes of the actual binary image.
				uint64_t ImageSize; // Size in bytes of the binary image.
				};

				struct StringEntry {
				uint64_t KeyOffset;
				uint64_t ValueOffset;
				};

				OffloadBinary(const char *Buffer, const Header *TheHeader,
				const Entry *TheEntry)
				: Buffer(Buffer), TheHeader(TheHeader), TheEntry(TheEntry) {

				const StringEntry *StringMapBegin =
				reinterpret_cast<const StringEntry *>(&Buffer[TheEntry->StringOffset]);
				for (uint64_t I = 0, E = TheEntry->NumStrings; I != E; ++I) {
				StringRef Key = &Buffer[StringMapBegin[I].KeyOffset];
				StringData[Key] = &Buffer[StringMapBegin[I].ValueOffset];
				}
				}

				OffloadBinary(const OffloadBinary &Other) = delete;

				/// Map from keys to offsets in the binary.
				StringMap<StringRef> StringData;
				/// Pointer to the beginning of the memory buffer for convenience.
				const char *Buffer;
				/// Location of the header within the binary.
				const Header *TheHeader;
				/// Location of the metadata entries within the binary.
				const Entry *TheEntry;
				};

				/// Convert a string \p Name to an image kind.
				ImageKind getImageKind(StringRef Name);

				/// Convert an image kind to its string representation.
				StringRef getImageKindName(ImageKind Name);

				/// Convert a string \p Name to an offload kind.
				OffloadKind getOffloadKind(StringRef Name);

				/// Convert an offload kind to its string representation.
				StringRef getOffloadKindName(OffloadKind Name);

				} // namespace llvm
				#endif

llvm/lib/Object/CMakeLists.txt

add_llvm_component_library(LLVMObject
  [12 lines not shown]
  IRObjectFile.cpp
  IRSymtab.cpp
  MachOObjectFile.cpp
  MachOUniversal.cpp
  Minidump.cpp
  ModuleSymbolTable.cpp
  Object.cpp
  ObjectFile.cpp
+ OffloadBinary.cpp
  RecordStreamer.cpp
  RelocationResolver.cpp
  SymbolicFile.cpp
  SymbolSize.cpp
  TapiFile.cpp
  TapiUniversal.cpp
  MachOUniversalWriter.cpp
  WasmObjectFile.cpp
  [20 lines not shown]

llvm/lib/Object/OffloadBinary.cpp

This file was added.

				//===- Offloading.cpp - Utilities for handling offloading code  -*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Object/OffloadBinary.h"

				#include "llvm/ADT/StringSwitch.h"
				#include "llvm/MC/StringTableBuilder.h"
				#include "llvm/Object/Error.h"
				#include "llvm/Support/FileOutputBuffer.h"

				using namespace llvm;

				namespace llvm {

				Expected<std::unique_ptr<OffloadBinary>>
				OffloadBinary::create(MemoryBufferRef Buf) {
				if (Buf.getBufferSize() < sizeof(Header) + sizeof(Entry))
				return errorCodeToError(llvm::object::object_error::parse_failed);

				// Check for 0x10FF10AD magic bytes.
				if (!Buf.getBuffer().startswith("\x10\xFF\x10\xAD"))
				return errorCodeToError(llvm::object::object_error::parse_failed);

				const char *Start = Buf.getBufferStart();
				const Header *TheHeader = reinterpret_cast<const Header *>(Start);
				const Entry *TheEntry =
				reinterpret_cast<const Entry *>(&Start[TheHeader->EntryOffset]);

				return std::unique_ptr<OffloadBinary>(
				new OffloadBinary(Buf.getBufferStart(), TheHeader, TheEntry));
				}

				std::unique_ptr<MemoryBuffer>
				OffloadBinary::write(const OffloadingImage &OffloadingData) {
				// Create a null-terminated string table with all the used strings.
				StringTableBuilder StrTab(StringTableBuilder::ELF);
				for (auto &KeyAndValue : OffloadingData.StringData) {
				StrTab.add(KeyAndValue.getKey());
				StrTab.add(KeyAndValue.getValue());
				}
				StrTab.finalize();

				uint64_t StringEntrySize =
				sizeof(StringEntry) * OffloadingData.StringData.size();

				// Create the header and fill in the offsets. The entry will be directly
				// placed after the header in memory. Align the size to the alignment of the
				// header so this can be placed contiguously in a single section.
				Header TheHeader;
				TheHeader.Size =
				alignTo(sizeof(Header) + sizeof(Entry) + StringEntrySize +
				OffloadingData.Image.getBufferSize() + StrTab.getSize(),
				getAlignment());
				TheHeader.EntryOffset = sizeof(Header);
				TheHeader.EntrySize = sizeof(Entry);

				// Create the entry using the string table offsets. The string table will be
				// placed directly after the entry in memory, and the image after that.
				Entry TheEntry;
				TheEntry.TheImageKind = OffloadingData.TheImageKind;
				TheEntry.TheOffloadKind = OffloadingData.TheOffloadKind;
				TheEntry.Flags = OffloadingData.Flags;
				TheEntry.StringOffset = sizeof(Header) + sizeof(Entry);
				TheEntry.NumStrings = OffloadingData.StringData.size();

				TheEntry.ImageOffset =
				sizeof(Header) + sizeof(Entry) + StringEntrySize + StrTab.getSize();
				TheEntry.ImageSize = OffloadingData.Image.getBufferSize();

				SmallVector<char, 1024> Data;
				raw_svector_ostream OS(Data);
				OS << StringRef(reinterpret_cast<char *>(&TheHeader), sizeof(Header));
				OS << StringRef(reinterpret_cast<char *>(&TheEntry), sizeof(Entry));
				for (auto &KeyAndValue : OffloadingData.StringData) {
				uint64_t Offset = sizeof(Header) + sizeof(Entry) + StringEntrySize;
				StringEntry Map{Offset + StrTab.getOffset(KeyAndValue.getKey()),
				Offset + StrTab.getOffset(KeyAndValue.getValue())};
				OS << StringRef(reinterpret_cast<char *>(&Map), sizeof(StringEntry));
				}
				StrTab.write(OS);
				OS << OffloadingData.Image.getBuffer();

				// Add final padding to required alignment.
				assert(TheHeader.Size >= OS.tell() && "Too much data written?");
				OS.write_zeros(TheHeader.Size - OS.tell());
				assert(TheHeader.Size == OS.tell() && "Size mismatch");

				return MemoryBuffer::getMemBufferCopy(OS.str());
				}

				OffloadKind getOffloadKind(StringRef Name) {
				return llvm::StringSwitch<OffloadKind>(Name)
				.Case("openmp", OFK_OpenMP)
				.Case("cuda", OFK_Cuda)
				.Case("hip", OFK_HIP)
				.Default(OFK_None);
				}

				StringRef getOffloadKindName(OffloadKind Kind) {
				switch (Kind) {
				case OFK_OpenMP:
				return "openmp";
				case OFK_Cuda:
				return "cuda";
				case OFK_HIP:
				return "hip";
				default:
				return "none";
				}
				}

				ImageKind getImageKind(StringRef Name) {
				return llvm::StringSwitch<ImageKind>(Name)
				.Case("o", IMG_Object)
				.Case("bc", IMG_Bitcode)
				.Case("cubin", IMG_Cubin)
				.Case("fatbin", IMG_Fatbinary)
				.Case("s", IMG_PTX)
				.Default(IMG_None);
				}

				StringRef getImageKindName(ImageKind Kind) {
				switch (Kind) {
				case IMG_Object:
				return "o";
				case IMG_Bitcode:
				return "bc";
				case IMG_Cubin:
				return "cubin";
				case IMG_Fatbinary:
				return "fatbin";
				case IMG_PTX:
				return "s";
				default:
				return "";
				}
				}

				} // namespace llvm
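
The offsets computed in `OffloadBinary::write` above imply a fixed ordering: header, one entry, the string entry table, the string table, the image, then zero padding. A minimal sketch of that arithmetic follows. The names `Layout`, `alignUp` and `computeLayout` are hypothetical and not part of the patch, the sizes are passed in as plain numbers instead of being taken from the real structs, and the total size is assumed to be the unpadded sum rounded up to the alignment, which is consistent with the final padding step but is not shown in this part of the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the offsets that write() stores; not part of the patch.
struct Layout {
  uint64_t EntryOffset;       // start of the single Entry (Header::EntryOffset)
  uint64_t StringEntryOffset; // start of the StringEntry table (Entry::StringOffset)
  uint64_t StringTableOffset; // base added to each key/value string offset
  uint64_t ImageOffset;       // start of the embedded image (Entry::ImageOffset)
  uint64_t TotalSize;         // assumed: unpadded size rounded up to Alignment
};

// Round Value up to the next multiple of Align. Align must be non-zero.
inline uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// StringEntryBytes is the size of the whole StringEntry table in bytes,
// matching how StringEntrySize is used in the offset sums above.
inline Layout computeLayout(uint64_t HeaderSize, uint64_t EntrySize,
                            uint64_t StringEntryBytes, uint64_t StrTabSize,
                            uint64_t ImageSize, uint64_t Alignment) {
  Layout Result{};
  Result.EntryOffset = HeaderSize;
  Result.StringEntryOffset = HeaderSize + EntrySize;
  Result.StringTableOffset = Result.StringEntryOffset + StringEntryBytes;
  Result.ImageOffset = Result.StringTableOffset + StrTabSize;
  Result.TotalSize = alignUp(Result.ImageOffset + ImageSize, Alignment);
  return Result;
}
```

With illustrative sizes of 32 bytes for the header, 40 for the entry, 32 for two string entries, a 10-byte string table, a 100-byte image and an alignment of 8, the image would start at offset 114 and the buffer would be padded from 214 to 216 bytes.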

llvm/unittests/Object/CMakeLists.txt

  set(LLVM_LINK_COMPONENTS
    BinaryFormat
    Object
    ObjectYAML
    )

  add_llvm_unittest(ObjectTests
    ArchiveTest.cpp
    ELFObjectFileTest.cpp
    ELFTypesTest.cpp
    ELFTest.cpp
    MinidumpTest.cpp
    ObjectFileTest.cpp
+   OffloadingTest.cpp
    SymbolSizeTest.cpp
    SymbolicFileTest.cpp
    XCOFFObjectFileTest.cpp
    )

  target_link_libraries(ObjectTests PRIVATE LLVMTestingSupport)

llvm/unittests/Object/OffloadingTest.cpp

This file was added.

				#include "llvm/Object/OffloadBinary.h"

				#include "llvm/Testing/Support/Error.h"
				#include "gtest/gtest.h"
				#include <algorithm>
				#include <limits>
				#include <random>
				#include <string>
				#include <utility>
				#include <vector>

				TEST(OffloadingTest, checkOffloadingBinary) {
				// Create random data to fill the image.
				std::mt19937 Rng(std::random_device{}());
				std::uniform_int_distribution<uint64_t> SizeDist(0, 256);
				std::uniform_int_distribution<uint16_t> KindDist(0);
				std::uniform_int_distribution<uint16_t> BinaryDist(
				std::numeric_limits<uint8_t>::min(), std::numeric_limits<uint8_t>::max());
				std::uniform_int_distribution<int16_t> StringDist('!', '~');
				std::vector<uint8_t> Image(SizeDist(Rng));
				std::generate(Image.begin(), Image.end(), [&]() { return BinaryDist(Rng); });
				std::vector<std::pair<std::string, std::string>> Strings(SizeDist(Rng));
				for (auto &KeyAndValue : Strings) {
				std::string Key(SizeDist(Rng), '\0');
				std::string Value(SizeDist(Rng), '\0');

				std::generate(Key.begin(), Key.end(), [&]() { return StringDist(Rng); });
				std::generate(Value.begin(), Value.end(),
				[&]() { return StringDist(Rng); });

				KeyAndValue = std::make_pair(Key, Value);
				}

				// Create the image.
				llvm::StringMap<llvm::StringRef> StringData;
				for (auto &KeyAndValue : Strings)
				StringData[KeyAndValue.first] = KeyAndValue.second;
				std::unique_ptr<llvm::MemoryBuffer> ImageData =
				llvm::MemoryBuffer::getMemBuffer(
				{reinterpret_cast<char *>(Image.data()), Image.size()}, "", false);

				llvm::OffloadBinary::OffloadingImage Data;
				Data.TheImageKind = static_cast<llvm::ImageKind>(KindDist(Rng));
				Data.TheOffloadKind = static_cast<llvm::OffloadKind>(KindDist(Rng));
				Data.Flags = KindDist(Rng);
				Data.StringData = StringData;
				Data.Image = *ImageData;

				auto BinaryBuffer = llvm::OffloadBinary::write(Data);

				auto BinaryOrErr = llvm::OffloadBinary::create(*BinaryBuffer);
				ASSERT_THAT_EXPECTED(BinaryOrErr, llvm::Succeeded());

				// Make sure we get the same data out.
				auto &Binary = **BinaryOrErr;
				ASSERT_EQ(Data.TheImageKind, Binary.getImageKind());
				ASSERT_EQ(Data.TheOffloadKind, Binary.getOffloadKind());
				ASSERT_EQ(Data.Flags, Binary.getFlags());

				for (auto &KeyAndValue : Strings)
				ASSERT_TRUE(StringData[KeyAndValue.first] ==
				Binary.getString(KeyAndValue.first));

				EXPECT_TRUE(Data.Image.getBuffer() == Binary.getImage());

				// Ensure the size and alignment of the data is correct.
				EXPECT_TRUE(Binary.getSize() % llvm::OffloadBinary::getAlignment() == 0);
				EXPECT_TRUE(Binary.getSize() == BinaryBuffer->getBuffer().size());
				}
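
The test above draws `TheImageKind` and `TheOffloadKind` from the full `uint16_t` range, so most generated values have no name in `getOffloadKindName` or `getImageKindName`. The raw values still survive the binary round trip because they are stored as integers, but the name mappings only round-trip for the listed kinds. A minimal sketch of that distinction follows. It uses a stand-in enum and `std::string_view` so that it compiles without LLVM; the enumerator values and the helper names `parseOffloadKind`, `offloadKindName` and `roundTrips` are assumptions for illustration, not the patch's API.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Stand-in for llvm::OffloadKind. The enumerator values are assumptions.
enum OffloadKind : uint16_t { OFK_None = 0, OFK_OpenMP, OFK_Cuda, OFK_HIP };

// Mirrors the StringSwitch in getOffloadKind: unknown names map to OFK_None.
inline OffloadKind parseOffloadKind(std::string_view Name) {
  if (Name == "openmp")
    return OFK_OpenMP;
  if (Name == "cuda")
    return OFK_Cuda;
  if (Name == "hip")
    return OFK_HIP;
  return OFK_None;
}

// Mirrors the switch in getOffloadKindName: unknown kinds map to "none".
inline std::string_view offloadKindName(OffloadKind Kind) {
  switch (Kind) {
  case OFK_OpenMP:
    return "openmp";
  case OFK_Cuda:
    return "cuda";
  case OFK_HIP:
    return "hip";
  default:
    return "none";
  }
}

// True when converting a kind to its name and back yields the same kind.
inline bool roundTrips(OffloadKind Kind) {
  return parseOffloadKind(offloadKindName(Kind)) == Kind;
}
```

Under these assumptions the three named kinds and `OFK_None` round-trip through their names, while an arbitrary value such as `static_cast<OffloadKind>(1234)` collapses to `OFK_None`.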
