This is an archive of the discontinued LLVM Phabricator instance.

[Object][DX] Parse DXContainer Parts
ClosedPublic

Authored by beanz on May 2 2022, 1:52 PM.

Details

Summary

DXContainer files are structured as parts. This patch adds support for
parsing out the file part offsets and file part headers.

Diff Detail

Event Timeline

beanz created this revision.May 2 2022, 1:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2022, 1:52 PM
beanz requested review of this revision.May 2 2022, 1:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2022, 1:52 PM
beanz added a reviewer: kuhar.May 18 2022, 6:41 AM
beanz added a subscriber: kuhar.

Adding @kuhar.

beanz updated this revision to Diff 430366.May 18 2022, 6:55 AM

Fixing incorrect code coment.

kuhar requested changes to this revision.May 19 2022, 5:24 PM

On high level, would it be possible to use an existing class like llvm::BinaryStreamReader instead that most of this data reading machinery?

And a general nit: I'm biased towards using ArrayRefs to represent data and StringRefs to represent text, but I don't think there's a consensus on this.

llvm/include/llvm/BinaryFormat/DXContainer.h
86

nit: swapBytes? LLVM prefers starting function names with verbs, especially the state-mutating ones: https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

llvm/include/llvm/Object/DXContainer.h
39–42

nit: could we pull this class out of DXContainer? I think this would make it a bit more readable, especially since we have another level of nesting with the PartData struct.

58

Could you add a comment with a brief description of what this function does? It's impossible to tell based on the signature alone.

llvm/lib/Object/DXContainer.cpp
34

Can you give P a more descriptive name and/or add a function comment?
Also, why not return Expected<T>

39

What if P is not properly aligned for T*?
If it must be, can we add an assertion? If not, I think we would have to use a memcpy.

Also, shouldn't we check that T is trivially copyable in the first place?

76–77

Would it be possible for OffsetIt to equal the end iterator and end up with an uninitialized IteratorState?

llvm/unittests/Object/DXContainerTest.cpp
45

Could we have a test with an empty buffer, so that we know that this corner case was considered and whether this is supported or not?

This revision now requires changes to proceed.May 19 2022, 5:24 PM
beanz added a comment.May 19 2022, 6:34 PM

libObject uses MemoryBufferRefs which play back and forth with StringRef, and while it has been pointed out over and over again that many of StringRef’s methods should probably be on MemoryBufferRef because they are more broadly useful, it is the way it is.

I think ArrayRef<char> is probably the least common pattern for representing arbitrary data buffers in LLVM, and given the needs of libObject to operate with MemoryBufferRef I think this code is best to operate on StringRefs.

llvm/lib/Object/DXContainer.cpp
34

Mostly just because the usage pattern for Error is cleaner in this case, but also it avoids a copy and is in-line able.

With Expected the code would be:

auto ExVal = readValue(…);
if (!ExVal)
  return ExVal.takeError();
Val = *ExVal;

Which is a bit less clean to me.

76–77

It shouldn’t be, but looking through I think there’s an edge case. I will update with a fix.

kuhar added a comment.May 19 2022, 8:28 PM

libObject uses MemoryBufferRefs which play back and forth with StringRef, and while it has been pointed out over and over again that many of StringRef’s methods should probably be on MemoryBufferRef because they are more broadly useful, it is the way it is.
I think ArrayRef<char> is probably the least common pattern for representing arbitrary data buffers in LLVM, and given the needs of libObject to operate with MemoryBufferRef I think this code is best to operate on StringRefs.

I'm much more used to ArrayRef<uint8_t>. For example, you can see it used in BinaryStreamReader::readLongestContiguousChunk and there's also a helper conversion function ArrayRef<uint8_t> arrayRefFromStringRef(StringRef Input) in StringExtras.h. In total, I counted ~1500 uses of ArrayRef<uint8_t> in the monorepo, so I don't consider it an uncommon pattern.

That being said, I think sticking with StringRef is also a practical option and I don't oppose it.

kuhar added inline comments.May 19 2022, 8:44 PM
llvm/lib/Object/DXContainer.cpp
34

I'd disagree here. Let's look at a callsite and the code that follows:

uint32_t PartOffset;
if (Error Err = readValue(Data.getBuffer(), Current, PartOffset))
  return Err;
Current += sizeof(uint32_t);
if (PartOffset + sizeof(dxbc::PartHeader) > Data.getBufferSize())
  return parseFailed("Part offset points beyond boundary of the file");
PartOffsets.push_back(PartOffset);

In general, I would not say it's immediately that PartOffset is an output parameter. Without seeing the definition, we don't know if passing PartOffset to readValue will lead to uninitialized reads or not. I find it much more idiomatic to use return values to return values. With Expected<T>, the callsite and the surrounding code would look like this:

Expected<uint32_t> PartOffset = readValue(Data.getBuffer(), Current);
if (Error Err = PartOffset.takeError())
  return Err;
Current += sizeof(uint32_t);
if (*PartOffset + sizeof(dxbc::PartHeader) > Data.getBufferSize())
  return parseFailed("Part offset points beyond boundary of the file");
PartOffsets.push_back(*PartOffset);

Here it's very clear to me what is being returned and I would not worry about any uninitialized values, even without checking the implementation of readValue. And overall, the number of lines stays the same.

I also typically don't prefix/suffix expected values with anything special, I don't think it improves readability when you can see the variable type. This is also what the google style guide does: https://abseil.io/tips/181.

To be absolutely clear, this is not a big deal IMO either way and feel free to stick with the current API if you strongly prefer it.

beanz added a comment.May 19 2022, 9:17 PM

In total, I counted ~1500 uses of ArrayRef<uint8_t> in the monorepo, so I don't consider it an uncommon pattern.

Scanning the full monorepo is probably not the best measure for stylistic code patterns. The “golden-rule” of LLVM’s style guidelines is to match existing code patterns, and that is generally applied to the code in the area that you’re modifying (https://llvm.org/docs/CodingStandards.html#golden-rule).

beanz updated this revision to Diff 431693.May 24 2022, 9:22 AM
beanz marked 6 inline comments as done.

Updates based on feedback from @kuhar. Thank you!

beanz added a comment.May 24 2022, 9:23 AM

Updates uploaded.

kuhar added inline comments.May 24 2022, 12:48 PM
llvm/include/llvm/Object/DXContainer.h
39–42

Have you considered this suggestion?

llvm/lib/Object/DXContainer.cpp
37

nit: I think uintptr_t might be more appropriate

beanz added inline comments.May 31 2022, 11:13 AM
llvm/include/llvm/Object/DXContainer.h
39–42

Sorry, I had a comment on this that I failed to submit.

The downside to pulling the iterator out of the class is that trivial methods end up needing to be implemented in the implementation files. For example, if you put the iterator first in the header, the iterator's constructor and updateIterator method need to be defined in the implementation file because they depend on the DXContainer class. Conversely if you put the iterator after DXContainer, DXContainer begin and end need to be in the implementation file.

Moving trivial functions to the implementation file prevents them from being inlined.

I also experimented with replacing PartData with a std::pair, but found the first/second accessors to be less intuitive to read than the named members.

beanz updated this revision to Diff 433383.Jun 1 2022, 7:01 AM

size_t -> uintptr_t

kuhar accepted this revision.Jun 1 2022, 10:29 AM

LGTM

llvm/include/llvm/Object/DXContainer.h
39–42

You could define functions in the header if you are worried about performance in non-LTO builds. However, if you think that this change does not simplify take code or improve readability, I think that leaving it as-is is also fine.

This revision is now accepted and ready to land.Jun 1 2022, 10:29 AM
This revision was landed with ongoing or failed builds.Jun 1 2022, 12:55 PM
This revision was automatically updated to reflect the committed changes.