Continue making updates to llvm-readobj to display resource sections. This is necessary for testing the up and coming cvtres tool.
Details
- Reviewers
zturner ruiu - Commits
- rGefef15a0c784: Update llvm-readobj -coff-resources to display tree structure.
rG33fca46ec386: Update llvm-readobj -coff-resources to display tree structure.
rL302399: Update llvm-readobj -coff-resources to display tree structure.
rL302386: Update llvm-readobj -coff-resources to display tree structure.
Diff Detail
- Repository
- rL LLVM
Event Timeline
It looks like your diff includes all the changes from the previous patch which was already LGTM'ed. So that we don't have to re-review code which was already reviewed, can you create that includes only those changes which are new and re-upload?
In general you want to avoid checking in a binary file. Can you create that object file using llvm-mc? If so, please check in an assembly file or a .c file and generate an object file when the test is executed.
llvm/include/llvm/Object/COFF.h | ||
---|---|---|
641–644 ↗ | (On Diff #96988) | You are not using this class, are you? If you plan to use it later, please introduce this later. |
1068–1071 ↗ | (On Diff #96988) | I think it is better to use ErrorOr<T>. |
1076 ↗ | (On Diff #96988) | StringRef has a (non-explicit) constructor that takes char*, so you don't need to construct StringRef explicitly. You can just return a string literal. |
In addition to ruiu's comments about not checking in binary files, I think we also don't want to use outside projects as the source of our input files. It's pretty clear this object file came from a chrome binary. We should be coming making all of our own stuff from scratch, Chromium doesn't really have any kind of connection to LLVM so it doesn't make sense to use their bianries for our tests when we can just generate our own.
The 3MB .bmp image file seems too large. Could you use a smaller image? A 1x1 image should suffice.
Sorry, this is not yet ready for review, I have some intermediate changes to complete.
llvm/include/llvm/Object/COFF.h | ||
---|---|---|
641–644 ↗ | (On Diff #96988) | It is used in COFFDumper when we display each entry in a table. |
llvm/include/llvm/Object/COFF.h | ||
---|---|---|
641–644 ↗ | (On Diff #96988) | So I do not see why you need this. Even if you need this, do you really need this many fields? They are actually pointing to the same memory location that is at offset 0 of a struct, so this is equivalent to union coff_resource_dir_entry { support::ulittle32_t NameOffset; support::ulittle32_t ID; support::ulittle32_t DataEntryOffset support::ulittle32_t SubdirOffset; }; It seems to me that you don't need to add this struct. You can just handle resource directory entry as a ulittle32_t, can't you? |
1076 ↗ | (On Diff #96988) | What about this? |
1072 ↗ | (On Diff #97479) | Does it compile? (Look at the stray ,). |
1074 ↗ | (On Diff #97479) | As a local convention of this directory, it seems functions that returns a name of something is named getSomething. So this should be named getIDName. (But what ID is this?) |
llvm/include/llvm/Object/COFF.h | ||
---|---|---|
1072 ↗ | (On Diff #97479) | Why is there a comma here? |
llvm/lib/Object/COFFObjectFile.cpp | ||
1604–1610 ↗ | (On Diff #97479) | This code has endianness problems. See later for suggested fix. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
1537 ↗ | (On Diff #97479) | Can you use llvm::BinaryStreamReader here? This could would be written: BinaryByteStream Stream(Ref, support::little); BinaryStreamReader Reader(Stream); const coff_resource_dir_table *TypeTable; error(Reader.readObject(TypeTable)); // ... for (int i=0; i < TypeTable->NumberOfNameEntries; ++i) { const coff_resource_dir_entry *Entry; error(Reader.readObject(Entry)); uint32_t OldOffset = Reader.getOffset(); Reader.setOffset(Entry->NameOffset); uint16_t Length; StringRef TypeName; error(Reader.readInteger(Length)); error(Reader.readFixedString(TypeName, Length)); ListScope ResourceType(W, "Type: " + TypeName.str()); Reader.setOffset(OldOffset); } for (int i=0; i < TypeTable->NumberOfIDEntries; ++i) { const coff_resource_dir_entry *Entry; error(Reader.readObject(Entry)); StringRef TypeName = RSF.convertIDToType(Entry->Identifier.ID); ListScope ResourceType(W, "Type: " + TypeName.str()); } Note that your original code has a bug because it is using uint16_t to read an integer, instead of ulittle16_t. The code here fixes that since you specify the endianness in the stream. |
llvm/include/llvm/Object/COFF.h | ||
---|---|---|
1076 ↗ | (On Diff #97479) | Can we have an enumeration for this in COFF.h? enum class ResourceID : uint32_t { RID_Cursor = 1, RID_Bitmap = 2, ... }; |
llvm/include/llvm/Object/COFF.h | ||
---|---|---|
641–644 ↗ | (On Diff #96988) | Hmm I'm not quite sure what you mean. A resource directory entry is composed of two separate 32-bit fields, 64 bits in total. The first field can be either an offset to a string name or an ID number. The second field is either an offset to the resource data or the offset to the next directory level down the tree. We must access both of these fields to read an entry, we might be able to use a ulittle32_t to read the first field and then move up 4 bytes to read the second field, but it seems much clearer to be able to access either of them by name. |
1074 ↗ | (On Diff #97479) | This should actually be a function that is part of the coff_resource_dir_entry structure. |
llvm/include/llvm/Object/COFF.h | ||
---|---|---|
641–644 ↗ | (On Diff #96988) | I also agree that it's not an equivalent struct. For @ruiu's proposed definition, sizeof(coff_resource_dir_entry) == 4, but for the one in the patch, sizeof(coff_resource_dir_entry) == 8. |
627 ↗ | (On Diff #97560) | I think this enum should go in llvm/Support/COFF.h. It's confusing that there are two files named COFF.h, but this file is for object file layout stuff, and the other file is more semantic information and enum / constant definitions. |
655 ↗ | (On Diff #97560) | It's a little jarring to see methods in unions. I think part of the reason is that when you're looking at a union, it's nice for it to be as compact as possible so that you can get a picture of the entire layout of the union in one screen. Can you move all the methods out of the union and into the top level struct? With that said, we don't normally put constant <-> string translation functions in low level headers like this. Take a look at COFFDumper.cpp, you will see a lot of things like this: static const EnumEntry<COFF::DLLCharacteristics> PEDLLCharacteristics[] = { LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_HIGH_ENTROPY_VA ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_DYNAMIC_BASE ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_FORCE_INTEGRITY ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NX_COMPAT ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NO_ISOLATION ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NO_SEH ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NO_BIND ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_APPCONTAINER ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_WDM_DRIVER ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_GUARD_CF ), LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_TERMINAL_SERVER_AWARE), }; I think we should do something similar for the ResourceID enum. Also, unless you're using a strongly typed enum (e.g. enum class), you shouldn't explicitly scope it by saying ResourceID::. So you have to pick one of the following two patterns: enum class ResourceID { Cursor, // Names are not RID_ prefixed because you have to scope it with ResourceID:: Bitmap, ... }; enum ResourceID { RID_Cursor, // Names are RID_ prefixed because you refer to them unqualified. RID_Bitmap, ... }; |
703 ↗ | (On Diff #97560) | IIUC you are are taking the lower 31 bits of NameOffset? Can you write this as: return maskTrailingOnes<uint32_t>(31) & NameOffset; This function is defined in MathExtras.h |
710 ↗ | (On Diff #97560) | Same here. |
723–725 ↗ | (On Diff #97560) | I don't think we should put this logic here. Although in practice the two structures appear immediately after each other, in theory they should be independent of each other and the parsing code should be responsible for knowing this. |
1146 ↗ | (On Diff #97560) | No need for the argument to be const. Also can you mark this constructor explicit? |
1155 ↗ | (On Diff #97560) | Instead of storing a StringRef, store a BinaryByteStream. This way you don't have to reconstruct it every time you want to read from it. And it's just as good as a StringRef |
llvm/lib/Object/COFFObjectFile.cpp | ||
1597–1602 ↗ | (On Diff #97560) | Is this function ever called? |
1614 ↗ | (On Diff #97560) | What's the +2 for? |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
1549–1550 ↗ | (On Diff #97560) | Should RSF be passed by value? It's immutable in principle anyway since all it contains is a StringRef, and it's thin as well. Usually when classes are suffixed with Ref, LLVM convention is that we should pass them by value since they're "as good as references" |
llvm/include/llvm/Object/COFF.h | ||
---|---|---|
655 ↗ | (On Diff #97560) | I opted to follow the pattern of other enums in COFFDumper. |
llvm/lib/Object/COFFObjectFile.cpp | ||
1597–1602 ↗ | (On Diff #97560) | ah you're right I replaced this function and forgot to remove it. |
1614 ↗ | (On Diff #97560) | ah woops I didn't realize the reader updated its own offset as it reads. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
1549–1550 ↗ | (On Diff #97560) | yeah that makes sense |
llvm/lib/Object/COFFObjectFile.cpp | ||
---|---|---|
1600–1601 ↗ | (On Diff #97750) | You can define a variable inside an if. if (std::error_code EC = ...) And this is preferred way if EC's scope is limited inside that if as narrower scope = better. |
1611 ↗ | (On Diff #97750) | This seems odd. Why can't you do RawDirString[i]? |
1623 ↗ | (On Diff #97750) | You can directly return it without the assignment. |
Getting much closer. Just a few more things.
llvm/include/llvm/Object/COFF.h | ||
---|---|---|
1076 ↗ | (On Diff #97750) | You can write this line: : BBS(Ref, support::little) {} |
llvm/lib/Object/COFFObjectFile.cpp | ||
1600 ↗ | (On Diff #97750) | You can shorten all these by one line if you say: if (auto EC = errorToErrorCode(Reader.readInteger(Length))) return EC; You can shorten them all TO one line if you add a macro like: #define RETURN_IF_ERROR(X) \ if (auto EC = errorToErrorCode(X)) \ return EC; And then say: RETURN_IF_ERROR(Reader.readInteger(Length)); |
1609–1612 ↗ | (On Diff #97750) | It looks like you're just truncating the second byte of each string? You should probably do a proper UTF16 -> UTF8 conversion here. #include "llvm/Support/ConvertUTF.h" ... ArrayRef<UTF16> RawDirString; std::string DirString; RETURN_IF_ERROR(Reader.readArray(RawDirString, Length)); if (!llvm::convertUTF16ToUTF8String(RawDirString, DirString)) return // some kind of error. |
1623–1626 ↗ | (On Diff #97750) | It would be nice to get rid of all the casting, in part because it makes the logic hard to follow, but also in part because there's no protection in case of corrupt data. For example, this function can never return an error, even if Entry is bogus and makes you read off into nowhere. So I would write this: const coff_resource_dir_table *Table = nullptr; BinaryStreamReader Reader(BBS); Reader.setOffset(Entry->Offset.value()); RETURN_IF_ERROR(Reader.readObject(Table)); return Table; this way if for some reason the offset is bogus, you'll get a useful error. |
1630 ↗ | (On Diff #97750) | How about adding a function: ErrorOr<const coff_resource_dir_table&> getTableAtOffset(unsigned Offset) { const coff_resource_dir_table *Table = nullptr; BinaryStreamReader Reader(BBS); Reader.setOffset(Offset); RETURN_IF_ERROR(Reader.readObject(Table)); assert(Table != nullptr); return *Table; } Then have this funciton and the one above it be: ErrorOr<const coff_resource_dir_table&> getEntrySubDir(const coff_resource_dir_entry &Entry) { return getTableAtOffset(Entry->Offset.value()); } ErrorOr<const coff_resource_dir_table&> getBaseTable() { return getTableAtOffset(0); } Note also that I'm using references now instead of pointers, since they can't be null. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
121 ↗ | (On Diff #97750) | Can this be a const coff_resouce_dir_table & instead of a pointer? |
143–145 ↗ | (On Diff #97750) | References instead of pointers. |
502–504 ↗ | (On Diff #97750) | This is strange. This somehow got touched by clang-format. |
1586–1590 ↗ | (On Diff #97750) | Does this work? auto &Entry = error(EO); I know it does with Expected<T>, but not sure about ErrorOr<T>. |
1599 ↗ | (On Diff #97750) | This is wrong, because RawName will get destroyed at the end of this scope, and the StringRef will point to bogus memory. |
1618 ↗ | (On Diff #97750) | Does NextLevel = "Language" not work? |
1621 ↗ | (On Diff #97750) | For these verbose names, you can use auto. Although in this case the auto &X = error(T) pattern might work, so this whole block could just be simplified to: auto &NextTable = error(RSF.getEntrySubDir(Entry)); |
1643–1645 ↗ | (On Diff #97750) | I suppose this is fine now. Ignore my comments earlier about FixedStreamArray. It would be more idiomatic to use here if there was a class like struct CoffResourceDirTable { const coff_resource_dir_table *Header; FixedStreamArray<coff_resource_dir_entry> Entries; }; And then instead of having these functions return coff_resource_dir_entry& or coff_resource_dir_table&, have them return these structures instead. In any case, don't worry about this for now. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1586–1590 ↗ | (On Diff #97750) | Unfortunately error() does not accept type ErrorOr. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1586–1590 ↗ | (On Diff #97750) | I found a function unwrapOrError that does exactly what we want. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1628–1631 ↗ | (On Diff #97877) | This is a comment about the style, but for operators who's precedences are "obvious", we don't use that many parentheses. Everyone knows that + has lower precedence than >=. *(reinterpret_cast<foo>(bar)) is the same as *reinterpret_cast<foo>(bar). (&Table + 1) + Index is the same as &Table + 1 + Index. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1628–1631 ↗ | (On Diff #97877) | Although both are ugly IMO :) How about auto TablePtr = reinterpret_cast<const coff_resource_dir_entry *>(&Table); return TablePtr[1 + Index]; |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1628–1631 ↗ | (On Diff #97877) | Correction: auto TablePtr = reinterpret_cast<const coff_resource_dir_entry *>(&Table + 1); return TablePtr[Index]; |
This is failing on some builders, includes valgrind, possibly sanitizers.
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/9118
http://bb.pgr.jp/builders/ninja-clang-i686-msc19-R/builds/13960
llvm/lib/Object/COFFObjectFile.cpp | ||
---|---|---|
1610 ↗ | (On Diff #98117) | I can suggest that converter may be moved to caller, then this method may return StringRef. |
1612 ↗ | (On Diff #98117) | Don't return local std::string as StringRef. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
1571 ↗ | (On Diff #98117) | It writes back part of SmallString via StringRef. |
Since some bots are red and we're dealing with memory corruption / sanitizer failures which may require a closer investigation to fully resolve, let's revert this temporarily while you investigate?
- Quick fix to D32609, it seems .o files are not transferred in all cases.
- Hopefully one last commit to fix this patch, addresses string reference
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1571 ↗ | (On Diff #98117) | Is it necessarily incorrect to modify IDStr in this case? The StringRef points to something we own within this scope. |
llvm/tools/llvm-readobj/COFFDumper.cpp | ||
---|---|---|
1571 ↗ | (On Diff #98117) | It is doing overlapping memcpy. I think it'd not be dangerous since memcpy is done toward upper address. That said, please be careful to write back string via StringRef. |
- Quick fix to D32609, it seems .o files are not transferred in all cases.
- Hopefully one last commit to fix this patch, addresses string reference
- Also fixed compiler warning of signed to unsigned conversion.
actually it seems that the stringref fix resolves the failures, so we can probably go ahead with the merge.
It looks like this broke SystemZ, and probably all big-endian platforms:
http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/8010/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aresources.test
From the symptoms, it appears that the little-endian UTF16 halfwords from the COFF binary never get translated to big-endian host byte order.