Continue making updates to llvm-readobj to display resource sections. This is necessary for testing the up and coming cvtres tool.
Details
- Reviewers
zturner ruiu - Commits
- rGefef15a0c784: Update llvm-readobj -coff-resources to display tree structure.
rG33fca46ec386: Update llvm-readobj -coff-resources to display tree structure.
rL302399: Update llvm-readobj -coff-resources to display tree structure.
rL302386: Update llvm-readobj -coff-resources to display tree structure.
Diff Detail
- Repository
- rL LLVM
Event Timeline
llvm/include/llvm/Object/COFF.h | ||
---|---|---|
641–644 ↗ | (On Diff #96988) | Hmm I'm not quite sure what you mean. A resource directory entry is composed of two separate 32-bit fields, 64 bits in total. The first field can be either an offset to a string name or an ID number. The second field is either an offset to the resource data or the offset to the next directory level down the tree. We must access both of these fields to read an entry, we might be able to use a ulittle32_t to read the first field and then move up 4 bytes to read the second field, but it seems much clearer to be able to access either of them by name. |
1074 ↗ | (On Diff #97479) | This should actually be a function that is part of the coff_resource_dir_entry structure. |
llvm/include/llvm/Object/COFF.h | ||
---|---|---|
641–644 ↗ | (On Diff #96988) | I also agree that it's not an equivalent struct. For @ruiu's proposed definition, sizeof(coff_resource_dir_entry) == 4, but for the one in the patch, sizeof(coff_resource_dir_entry) == 8. |
627 ↗ | (On Diff #97560) | I think this enum should go in llvm/Support/COFF.h. It's confusing that there are two files named COFF.h, but this file is for object file layout stuff, and the other file is more semantic information and enum / constant definitions. |
655 ↗ | (On Diff #97560) | It's a little jarring to see methods in unions. I think part of the reason is that when you're looking at a union, it's nice for it to be as compact as possible so that you can get a picture of the entire layout of the union in one screen. Can you move all the methods out of the union and into the top level struct? With that said, we don't normally put constant <-> string translation functions in low level headers like this. Take a look at COFFDumper.cpp, you will see a lot of things like this: static const EnumEntry<COFF::DLLCharacteristics> PEDLLCharacteristics[] = { LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_HIGH_ENTROPY_VA ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_DYNAMIC_BASE ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_FORCE_INTEGRITY ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NX_COMPAT ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NO_ISOLATION ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NO_SEH ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NO_BIND ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_APPCONTAINER ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_WDM_DRIVER ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_GUARD_CF ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_TERMINAL_SERVER_AWARE), }; I think we should do something similar for the ResourceID enum. Also, unless you're using a strongly typed enum (e.g. enum class), you shouldn't explicitly scope it by saying ResourceID::. So you have to pick one of the following two patterns: enum class ResourceID { Cursor, // Names are not RID_ prefixed because you have to scope it with ResourceID:: Bitmap, ... }; enum ResourceID { RID_Cursor, // Names are RID_ prefixed because you refer to them unqualified. RID_Bitmap, ... }; |
703 ↗ | (On Diff #97560) | IIUC you are are taking the lower 31 bits of NameOffset? Can you write this as: return maskTrailingOnes<uint32_t>(31) & NameOffset; This function is defined in MathExtras.h |
710 ↗ | (On Diff #97560) | Same here. |
723–725 ↗ | (On Diff #97560) | I don't think we should put this logic here. Although in practice the two structures appear immediately after each other, in theory they should be independent of each other and the parsing code should be responsible for knowing this. |
1146 ↗ | (On Diff #97560) | No need for the argument to be const. Also can you mark this constructor explicit? |
1155 ↗ | (On Diff #97560) | Instead of storing a StringRef, store a BinaryByteStream. This way you don't have to reconstruct it every time you want to read from it. And it's just as good as a StringRef |
llvm/lib/Object/COFFObjectFile.cpp | ||
1597–1602 ↗ | (On Diff #97560) | Is this function ever called? |
1614 ↗ | (On Diff #97560) | What's the +2 for? |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
1549–1550 ↗ | (On Diff #97560) | Should RSF be passed by value? It's immutable in principle anyway since all it contains is a StringRef, and it's thin as well. Usually when classes are suffixed with Ref, LLVM convention is that we should pass them by value since they're "as good as references" |
llvm/include/llvm/Object/COFF.h | ||
---|---|---|
655 ↗ | (On Diff #97560) | I opted to follow the pattern of other enums in COFFDumper. |
llvm/lib/Object/COFFObjectFile.cpp | ||
1597–1602 ↗ | (On Diff #97560) | ah you're right I replaced this function and forgot to remove it. |
1614 ↗ | (On Diff #97560) | ah woops I didn't realize the reader updated its own offset as it reads. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
1549–1550 ↗ | (On Diff #97560) | yeah that makes sense |
llvm/lib/Object/COFFObjectFile.cpp | ||
---|---|---|
1600–1601 ↗ | (On Diff #97750) | You can define a variable inside an if. if (std::error_code EC = ...) And this is preferred way if EC's scope is limited inside that if as narrower scope = better. |
1611 ↗ | (On Diff #97750) | This seems odd. Why can't you do RawDirString[i]? |
1623 ↗ | (On Diff #97750) | You can directly return it without the assignment. |
Getting much closer. Just a few more things.
llvm/include/llvm/Object/COFF.h | ||
---|---|---|
1076 ↗ | (On Diff #97750) | You can write this line: : BBS(Ref, support::little) {} |
llvm/lib/Object/COFFObjectFile.cpp | ||
1600 ↗ | (On Diff #97750) | You can shorten all these by one line if you say: if (auto EC = errorToErrorCode(Reader.readInteger(Length))) return EC; You can shorten them all TO one line if you add a macro like: #define RETURN_IF_ERROR(X) \ if (auto EC = errorToErrorCode(X)) \ return EC; And then say: RETURN_IF_ERROR(Reader.readInteger(Length)); |
1609–1612 ↗ | (On Diff #97750) | It looks like you're just truncating the second byte of each string? You should probably do a proper UTF16 -> UTF8 conversion here. #include "llvm/Support/ConvertUTF.h" ... ArrayRef<UTF16> RawDirString; std::string DirString; RETURN_IF_ERROR(Reader.readArray(RawDirString, Length)); if (!llvm::convertUTF16ToUTF8String(RawDirString, DirString)) return // some kind of error. |
1623–1626 ↗ | (On Diff #97750) | It would be nice to get rid of all the casting, in part because it makes the logic hard to follow, but also in part because there's no protection in case of corrupt data. For example, this function can never return an error, even if Entry is bogus and makes you read off into nowhere. So I would write this: const coff_resource_dir_table *Table = nullptr; BinaryStreamReader Reader(BBS); Reader.setOffset(Entry->Offset.value()); RETURN_IF_ERROR(Reader.readObject(Table)); return Table; this way if for some reason the offset is bogus, you'll get a useful error. |
1630 ↗ | (On Diff #97750) | How about adding a function: ErrorOr<const coff_resource_dir_table&> getTableAtOffset(unsigned Offset) { const coff_resource_dir_table *Table = nullptr; BinaryStreamReader Reader(BBS); Reader.setOffset(Offset); RETURN_IF_ERROR(Reader.readObject(Table)); assert(Table != nullptr); return *Table; } Then have this funciton and the one above it be: ErrorOr<const coff_resource_dir_table&> getEntrySubDir(const coff_resource_dir_entry &Entry) { return getTableAtOffset(Entry->Offset.value()); } ErrorOr<const coff_resource_dir_table&> getBaseTable() { return getTableAtOffset(0); } Note also that I'm using references now instead of pointers, since they can't be null. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
121 ↗ | (On Diff #97750) | Can this be a const coff_resouce_dir_table & instead of a pointer? |
143–145 ↗ | (On Diff #97750) | References instead of pointers. |
502–504 ↗ | (On Diff #97750) | This is strange. This somehow got touched by clang-format. |
1586–1590 ↗ | (On Diff #97750) | Does this work? auto &Entry = error(EO); I know it does with Expected<T>, but not sure about ErrorOr<T>. |
1599 ↗ | (On Diff #97750) | This is wrong, because RawName will get destroyed at the end of this scope, and the StringRef will point to bogus memory. |
1618 ↗ | (On Diff #97750) | Does NextLevel = "Language" not work? |
1621 ↗ | (On Diff #97750) | For these verbose names, you can use auto. Although in this case the auto &X = error(T) pattern might work, so this whole block could just be simplified to: auto &NextTable = error(RSF.getEntrySubDir(Entry)); |
1643–1645 ↗ | (On Diff #97750) | I suppose this is fine now. Ignore my comments earlier about FixedStreamArray. It would be more idiomatic to use here if there was a class like struct CoffResourceDirTable { const coff_resource_dir_table *Header; FixedStreamArray<coff_resource_dir_entry> Entries; }; And then instead of having these functions return coff_resource_dir_entry& or coff_resource_dir_table&, have them return these structures instead. In any case, don't worry about this for now. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1586–1590 ↗ | (On Diff #97750) | Unfortunately error() does not accept type ErrorOr. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1586–1590 ↗ | (On Diff #97750) | I found a function unwrapOrError that does exactly what we want. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1628–1631 ↗ | (On Diff #97877) | This is a comment about the style, but for operators who's precedences are "obvious", we don't use that many parentheses. Everyone knows that + has lower precedence than >=. *(reinterpret_cast<foo>(bar)) is the same as *reinterpret_cast<foo>(bar). (&Table + 1) + Index is the same as &Table + 1 + Index. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1628–1631 ↗ | (On Diff #97877) | Although both are ugly IMO :) How about auto TablePtr = reinterpret_cast<const coff_resource_dir_entry *>(&Table); return TablePtr[1 + Index]; |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1628–1631 ↗ | (On Diff #97877) | Correction: auto TablePtr = reinterpret_cast<const coff_resource_dir_entry *>(&Table + 1); return TablePtr[Index]; |
This is failing on some builders, includes valgrind, possibly sanitizers.
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/9118
http://bb.pgr.jp/builders/ninja-clang-i686-msc19-R/builds/13960
llvm/lib/Object/COFFObjectFile.cpp | ||
---|---|---|
1610 ↗ | (On Diff #98117) | I can suggest that converter may be moved to caller, then this method may return StringRef. |
1612 ↗ | (On Diff #98117) | Don't return local std::string as StringRef. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
1571 ↗ | (On Diff #98117) | It writes back part of SmallString via StringRef. |
Since some bots are red and we're dealing with memory corruption / sanitizer failures which may require a closer investigation to fully resolve, let's revert this temporarily while you investigate?
- Quick fix to D32609, it seems .o files are not transferred in all cases.
- Hopefully one last commit to fix this patch, addresses string reference
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1571 ↗ | (On Diff #98117) | Is it necessarily incorrect to modify IDStr in this case? The StringRef points to something we own within this scope. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1571 ↗ | (On Diff #98117) | It is doing overlapping memcpy. I think it'd not be dangerous since memcpy is done toward upper address. That said, please be careful to write back string via StringRef. |
- Quick fix to D32609, it seems .o files are not transferred in all cases.
- Hopefully one last commit to fix this patch, addresses string reference
- Also fixed compiler warning of signed to unsigned conversion.
actually it seems that the stringref fix resolves the failures, so we can probably go ahead with the merge.
It looks like this broke SystemZ, and probably all big-endian platforms:
http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/8010/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aresources.test
From the symptoms, it appears that the little-endian UTF16 halfwords from the COFF binary never get translated to big-endian host byte order.