DXContainer files are structured as parts. This patch adds support for
parsing out the file part offsets and file part headers.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
On high level, would it be possible to use an existing class like llvm::BinaryStreamReader instead that most of this data reading machinery?
And a general nit: I'm biased towards using ArrayRefs to represent data and StringRefs to represent text, but I don't think there's a consensus on this.
llvm/include/llvm/BinaryFormat/DXContainer.h | ||
---|---|---|
86 | nit: swapBytes? LLVM prefers starting function names with verbs, especially the state-mutating ones: https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly | |
llvm/include/llvm/Object/DXContainer.h | ||
39–42 | nit: could we pull this class out of DXContainer? I think this would make it a bit more readable, especially since we have another level of nesting with the PartData struct. | |
58 | Could you add a comment with a brief description of what this function does? It's impossible to tell based on the signature alone. | |
llvm/lib/Object/DXContainer.cpp | ||
34 | Can you give P a more descriptive name and/or add a function comment? | |
39 | What if P is not properly aligned for T*? Also, shouldn't we check that T is trivially copyable in the first place? | |
76–77 | Would it be possible for OffsetIt to equal the end iterator and end up with an uninitialized IteratorState? | |
llvm/unittests/Object/DXContainerTest.cpp | ||
56 | Could we have a test with an empty buffer, so that we know that this corner case was considered and whether this is supported or not? |
libObject uses MemoryBufferRefs which play back and forth with StringRef, and while it has been pointed out over and over again that many of StringRef’s methods should probably be on MemoryBufferRef because they are more broadly useful, it is the way it is.
I think ArrayRef<char> is probably the least common pattern for representing arbitrary data buffers in LLVM, and given the needs of libObject to operate with MemoryBufferRef I think this code is best to operate on StringRefs.
llvm/lib/Object/DXContainer.cpp | ||
---|---|---|
34 | Mostly just because the usage pattern for Error is cleaner in this case, but also it avoids a copy and is in-line able. With Expected the code would be: auto ExVal = readValue(…); if (!ExVal) return ExVal.takeError(); Val = *ExVal; Which is a bit less clean to me. | |
76–77 | It shouldn’t be, but looking through I think there’s an edge case. I will update with a fix. |
I'm much more used to ArrayRef<uint8_t>. For example, you can see it used in BinaryStreamReader::readLongestContiguousChunk and there's also a helper conversion function ArrayRef<uint8_t> arrayRefFromStringRef(StringRef Input) in StringExtras.h. In total, I counted ~1500 uses of ArrayRef<uint8_t> in the monorepo, so I don't consider it an uncommon pattern.
That being said, I think sticking with StringRef is also a practical option and I don't oppose it.
llvm/lib/Object/DXContainer.cpp | ||
---|---|---|
34 | I'd disagree here. Let's look at a callsite and the code that follows: uint32_t PartOffset; if (Error Err = readValue(Data.getBuffer(), Current, PartOffset)) return Err; Current += sizeof(uint32_t); if (PartOffset + sizeof(dxbc::PartHeader) > Data.getBufferSize()) return parseFailed("Part offset points beyond boundary of the file"); PartOffsets.push_back(PartOffset); In general, I would not say it's immediately that PartOffset is an output parameter. Without seeing the definition, we don't know if passing PartOffset to readValue will lead to uninitialized reads or not. I find it much more idiomatic to use return values to return values. With Expected<T>, the callsite and the surrounding code would look like this: Expected<uint32_t> PartOffset = readValue(Data.getBuffer(), Current); if (Error Err = PartOffset.takeError()) return Err; Current += sizeof(uint32_t); if (*PartOffset + sizeof(dxbc::PartHeader) > Data.getBufferSize()) return parseFailed("Part offset points beyond boundary of the file"); PartOffsets.push_back(*PartOffset); Here it's very clear to me what is being returned and I would not worry about any uninitialized values, even without checking the implementation of readValue. And overall, the number of lines stays the same. I also typically don't prefix/suffix expected values with anything special, I don't think it improves readability when you can see the variable type. This is also what the google style guide does: https://abseil.io/tips/181. To be absolutely clear, this is not a big deal IMO either way and feel free to stick with the current API if you strongly prefer it. |
Scanning the full monorepo is probably not the best measure for stylistic code patterns. The “golden-rule” of LLVM’s style guidelines is to match existing code patterns, and that is generally applied to the code in the area that you’re modifying (https://llvm.org/docs/CodingStandards.html#golden-rule).
llvm/include/llvm/Object/DXContainer.h | ||
---|---|---|
39–42 | Sorry, I had a comment on this that I failed to submit. The downside to pulling the iterator out of the class is that trivial methods end up needing to be implemented in the implementation files. For example, if you put the iterator first in the header, the iterator's constructor and updateIterator method need to be defined in the implementation file because they depend on the DXContainer class. Conversely if you put the iterator after DXContainer, DXContainer begin and end need to be in the implementation file. Moving trivial functions to the implementation file prevents them from being inlined. I also experimented with replacing PartData with a std::pair, but found the first/second accessors to be less intuitive to read than the named members. |
LGTM
llvm/include/llvm/Object/DXContainer.h | ||
---|---|---|
39–42 | You could define functions in the header if you are worried about performance in non-LTO builds. However, if you think that this change does not simplify take code or improve readability, I think that leaving it as-is is also fine. |
nit: swapBytes? LLVM prefers starting function names with verbs, especially the state-mutating ones: https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly