This is an archive of the discontinued LLVM Phabricator instance.

Update llvm-readobj -coff-resources to display tree structure.
ClosedPublic

Authored by ecbeckmann on Apr 27 2017, 11:33 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

ecbeckmann created this revision.Apr 27 2017, 11:33 AM
ecbeckmann edited the summary of this revision. (Show Details)Apr 27 2017, 11:42 AM
zturner edited edge metadata.Apr 27 2017, 11:43 AM

It looks like your diff includes all the changes from the previous patch which was already LGTM'ed. So that we don't have to re-review code which was already reviewed, can you create that includes only those changes which are new and re-upload?

ecbeckmann updated this revision to Diff 96988.Apr 27 2017, 1:20 PM

Rebase to build off of boilerplate patch.

ecbeckmann edited the summary of this revision. (Show Details)Apr 27 2017, 1:20 PM
ruiu edited edge metadata.Apr 27 2017, 1:34 PM

In general you want to avoid checking in a binary file. Can you create that object file using llvm-mc? If so, please check in an assembly file or a .c file and generate an object file when the test is executed.

llvm/include/llvm/Object/COFF.h
641–644 ↗(On Diff #96988)

You are not using this class, are you? If you plan to use it later, please introduce this later.

1068–1071 ↗(On Diff #96988)

I think it is better to use ErrorOr<T>.

1076 ↗(On Diff #96988)

StringRef has a (non-explicit) constructor that takes char*, so you don't need to construct StringRef explicitly. You can just return a string literal.

In addition to ruiu's comments about not checking in binary files, I think we also don't want to use outside projects as the source of our input files. It's pretty clear this object file came from a chrome binary. We should be coming making all of our own stuff from scratch, Chromium doesn't really have any kind of connection to LLVM so it doesn't make sense to use their bianries for our tests when we can just generate our own.

ecbeckmann updated this revision to Diff 97025.Apr 27 2017, 5:20 PM

Added a new test using my own rc file, also fixed minor bugs in COFFDumper.

ruiu added a comment.Apr 27 2017, 5:22 PM

The 3MB .bmp image file seems too large. Could you use a smaller image? A 1x1 image should suffice.

ecbeckmann updated this revision to Diff 97479.May 2 2017, 11:12 AM

Use tiny bmp instead of large ones, for efficiency.

Sorry, this is not yet ready for review, I have some intermediate changes to complete.

llvm/include/llvm/Object/COFF.h
641–644 ↗(On Diff #96988)

It is used in COFFDumper when we display each entry in a table.

ruiu added inline comments.May 2 2017, 11:22 AM
llvm/include/llvm/Object/COFF.h
641–644 ↗(On Diff #96988)

So I do not see why you need this. Even if you need this, do you really need this many fields? They are actually pointing to the same memory location that is at offset 0 of a struct, so this is equivalent to

union coff_resource_dir_entry {
  support::ulittle32_t NameOffset;
  support::ulittle32_t ID;
  support::ulittle32_t DataEntryOffset
  support::ulittle32_t SubdirOffset;
};

It seems to me that you don't need to add this struct. You can just handle resource directory entry as a ulittle32_t, can't you?

1076 ↗(On Diff #96988)

What about this?

1072 ↗(On Diff #97479)

Does it compile? (Look at the stray ,).

1074 ↗(On Diff #97479)

As a local convention of this directory, it seems functions that returns a name of something is named getSomething. So this should be named getIDName. (But what ID is this?)

zturner added inline comments.May 2 2017, 11:30 AM
llvm/include/llvm/Object/COFF.h
1072 ↗(On Diff #97479)

Why is there a comma here?

llvm/lib/Object/COFFObjectFile.cpp
1604–1610 ↗(On Diff #97479)

This code has endianness problems. See later for suggested fix.

llvm/tools/llvm-readobj/COFFDumper.cpp
1537 ↗(On Diff #97479)

Can you use llvm::BinaryStreamReader here? This could would be written:

BinaryByteStream Stream(Ref, support::little);
BinaryStreamReader Reader(Stream);
const coff_resource_dir_table *TypeTable;
error(Reader.readObject(TypeTable));
// ...
for (int i=0; i < TypeTable->NumberOfNameEntries; ++i) {
  const coff_resource_dir_entry *Entry;
  error(Reader.readObject(Entry));
  uint32_t OldOffset = Reader.getOffset();

  Reader.setOffset(Entry->NameOffset);
  uint16_t Length;
  StringRef TypeName;
  error(Reader.readInteger(Length));
  error(Reader.readFixedString(TypeName, Length));
  ListScope ResourceType(W, "Type: " + TypeName.str());

  Reader.setOffset(OldOffset);
}

for (int i=0; i < TypeTable->NumberOfIDEntries; ++i) {
  const coff_resource_dir_entry *Entry;
  error(Reader.readObject(Entry));

  StringRef TypeName = RSF.convertIDToType(Entry->Identifier.ID);
  ListScope ResourceType(W, "Type: " + TypeName.str());
}

Note that your original code has a bug because it is using uint16_t to read an integer, instead of ulittle16_t. The code here fixes that since you specify the endianness in the stream.

zturner added inline comments.May 2 2017, 11:33 AM
llvm/include/llvm/Object/COFF.h
1076 ↗(On Diff #97479)

Can we have an enumeration for this in COFF.h?

enum class ResourceID : uint32_t {
  RID_Cursor = 1,
  RID_Bitmap = 2,
  ...
};
ecbeckmann updated this revision to Diff 97560.May 2 2017, 11:55 PM
ecbeckmann marked 8 inline comments as done.

Recursively display entire tree.

ecbeckmann added inline comments.May 2 2017, 11:56 PM
llvm/include/llvm/Object/COFF.h
641–644 ↗(On Diff #96988)

Hmm I'm not quite sure what you mean. A resource directory entry is composed of two separate 32-bit fields, 64 bits in total. The first field can be either an offset to a string name or an ID number. The second field is either an offset to the resource data or the offset to the next directory level down the tree. We must access both of these fields to read an entry, we might be able to use a ulittle32_t to read the first field and then move up 4 bytes to read the second field, but it seems much clearer to be able to access either of them by name.

1074 ↗(On Diff #97479)

This should actually be a function that is part of the coff_resource_dir_entry structure.

zturner added inline comments.May 3 2017, 10:04 AM
llvm/include/llvm/Object/COFF.h
641–644 ↗(On Diff #96988)

I also agree that it's not an equivalent struct. For @ruiu's proposed definition, sizeof(coff_resource_dir_entry) == 4, but for the one in the patch, sizeof(coff_resource_dir_entry) == 8.

627 ↗(On Diff #97560)

I think this enum should go in llvm/Support/COFF.h. It's confusing that there are two files named COFF.h, but this file is for object file layout stuff, and the other file is more semantic information and enum / constant definitions.

655 ↗(On Diff #97560)

It's a little jarring to see methods in unions. I think part of the reason is that when you're looking at a union, it's nice for it to be as compact as possible so that you can get a picture of the entire layout of the union in one screen.

Can you move all the methods out of the union and into the top level struct?

With that said, we don't normally put constant <-> string translation functions in low level headers like this. Take a look at COFFDumper.cpp, you will see a lot of things like this:

static const EnumEntry<COFF::DLLCharacteristics> PEDLLCharacteristics[] = {
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_HIGH_ENTROPY_VA      ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_DYNAMIC_BASE         ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_FORCE_INTEGRITY      ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NX_COMPAT            ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NO_ISOLATION         ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NO_SEH               ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NO_BIND              ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_APPCONTAINER         ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_WDM_DRIVER           ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_GUARD_CF             ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_TERMINAL_SERVER_AWARE),
};

I think we should do something similar for the ResourceID enum.

Also, unless you're using a strongly typed enum (e.g. enum class), you shouldn't explicitly scope it by saying ResourceID::. So you have to pick one of the following two patterns:

enum class ResourceID {
  Cursor,   // Names are not RID_ prefixed because you have to scope it with ResourceID::
  Bitmap,
  ...
};

enum ResourceID {
  RID_Cursor,   // Names are RID_ prefixed because you refer to them unqualified.
  RID_Bitmap,
  ...
};
703 ↗(On Diff #97560)

IIUC you are are taking the lower 31 bits of NameOffset? Can you write this as:

return maskTrailingOnes<uint32_t>(31) & NameOffset;

This function is defined in MathExtras.h

710 ↗(On Diff #97560)

Same here.

723–725 ↗(On Diff #97560)

I don't think we should put this logic here. Although in practice the two structures appear immediately after each other, in theory they should be independent of each other and the parsing code should be responsible for knowing this.

1146 ↗(On Diff #97560)

No need for the argument to be const. Also can you mark this constructor explicit?

1155 ↗(On Diff #97560)

Instead of storing a StringRef, store a BinaryByteStream. This way you don't have to reconstruct it every time you want to read from it. And it's just as good as a StringRef

llvm/lib/Object/COFFObjectFile.cpp
1597–1602 ↗(On Diff #97560)

Is this function ever called?

1614 ↗(On Diff #97560)

What's the +2 for?

llvm/tools/llvm-readobj/COFFDumper.cpp
1549–1550 ↗(On Diff #97560)

Should RSF be passed by value? It's immutable in principle anyway since all it contains is a StringRef, and it's thin as well. Usually when classes are suffixed with Ref, LLVM convention is that we should pass them by value since they're "as good as references"

ecbeckmann updated this revision to Diff 97750.May 3 2017, 5:04 PM
ecbeckmann marked 10 inline comments as done.

Refactored parsing funcationality into more logical places.

ecbeckmann added inline comments.May 3 2017, 5:05 PM
llvm/include/llvm/Object/COFF.h
655 ↗(On Diff #97560)

I opted to follow the pattern of other enums in COFFDumper.

llvm/lib/Object/COFFObjectFile.cpp
1597–1602 ↗(On Diff #97560)

ah you're right I replaced this function and forgot to remove it.

1614 ↗(On Diff #97560)

ah woops I didn't realize the reader updated its own offset as it reads.

llvm/tools/llvm-readobj/COFFDumper.cpp
1549–1550 ↗(On Diff #97560)

yeah that makes sense

ruiu added inline comments.May 3 2017, 5:30 PM
llvm/lib/Object/COFFObjectFile.cpp
1600–1601 ↗(On Diff #97750)

You can define a variable inside an if.

if (std::error_code EC = ...)

And this is preferred way if EC's scope is limited inside that if as narrower scope = better.

1611 ↗(On Diff #97750)

This seems odd. Why can't you do RawDirString[i]?

1623 ↗(On Diff #97750)

You can directly return it without the assignment.

ecbeckmann updated this revision to Diff 97757.May 3 2017, 5:49 PM
ecbeckmann marked 3 inline comments as done.

Cleaned up error code checks.

Getting much closer. Just a few more things.

llvm/include/llvm/Object/COFF.h
1076 ↗(On Diff #97750)

You can write this line:

: BBS(Ref, support::little) {}
llvm/lib/Object/COFFObjectFile.cpp
1600 ↗(On Diff #97750)

You can shorten all these by one line if you say:

if (auto EC = errorToErrorCode(Reader.readInteger(Length)))
  return EC;

You can shorten them all TO one line if you add a macro like:

#define RETURN_IF_ERROR(X) \
if (auto EC = errorToErrorCode(X)) \
  return EC;

And then say:

RETURN_IF_ERROR(Reader.readInteger(Length));
1609–1612 ↗(On Diff #97750)

It looks like you're just truncating the second byte of each string? You should probably do a proper UTF16 -> UTF8 conversion here.

#include "llvm/Support/ConvertUTF.h"
...
ArrayRef<UTF16> RawDirString;
std::string DirString;
RETURN_IF_ERROR(Reader.readArray(RawDirString, Length));
if (!llvm::convertUTF16ToUTF8String(RawDirString, DirString))
  return // some kind of error.
1623–1626 ↗(On Diff #97750)

It would be nice to get rid of all the casting, in part because it makes the logic hard to follow, but also in part because there's no protection in case of corrupt data. For example, this function can never return an error, even if Entry is bogus and makes you read off into nowhere. So I would write this:

const coff_resource_dir_table *Table = nullptr;

BinaryStreamReader Reader(BBS);
Reader.setOffset(Entry->Offset.value());
RETURN_IF_ERROR(Reader.readObject(Table));
return Table;

this way if for some reason the offset is bogus, you'll get a useful error.

1630 ↗(On Diff #97750)

How about adding a function:

ErrorOr<const coff_resource_dir_table&> getTableAtOffset(unsigned Offset) {
const coff_resource_dir_table *Table = nullptr;

BinaryStreamReader Reader(BBS);
Reader.setOffset(Offset);
RETURN_IF_ERROR(Reader.readObject(Table));
assert(Table != nullptr);
return *Table;
}

Then have this funciton and the one above it be:

ErrorOr<const coff_resource_dir_table&> getEntrySubDir(const coff_resource_dir_entry &Entry) {
  return getTableAtOffset(Entry->Offset.value());
}

ErrorOr<const coff_resource_dir_table&> getBaseTable() {
  return getTableAtOffset(0);
}

Note also that I'm using references now instead of pointers, since they can't be null.

llvm/tools/llvm-readobj/COFFDumper.cpp
121 ↗(On Diff #97750)

Can this be a const coff_resouce_dir_table & instead of a pointer?

143–145 ↗(On Diff #97750)

References instead of pointers.

502–504 ↗(On Diff #97750)

This is strange. This somehow got touched by clang-format.

1586–1590 ↗(On Diff #97750)

Does this work?

auto &Entry = error(EO);

I know it does with Expected<T>, but not sure about ErrorOr<T>.

1599 ↗(On Diff #97750)

This is wrong, because RawName will get destroyed at the end of this scope, and the StringRef will point to bogus memory.

1618 ↗(On Diff #97750)

Does NextLevel = "Language" not work?

1621 ↗(On Diff #97750)

For these verbose names, you can use auto. Although in this case the auto &X = error(T) pattern might work, so this whole block could just be simplified to:

auto &NextTable = error(RSF.getEntrySubDir(Entry));
1643–1645 ↗(On Diff #97750)

I suppose this is fine now. Ignore my comments earlier about FixedStreamArray. It would be more idiomatic to use here if there was a class like

struct CoffResourceDirTable {
  const coff_resource_dir_table *Header;
  FixedStreamArray<coff_resource_dir_entry> Entries;
};

And then instead of having these functions return coff_resource_dir_entry& or coff_resource_dir_table&, have them return these structures instead.

In any case, don't worry about this for now.

ecbeckmann updated this revision to Diff 97774.May 3 2017, 11:03 PM
ecbeckmann marked 11 inline comments as done.

Some more cleanup.

ecbeckmann added inline comments.May 3 2017, 11:06 PM
llvm/tools/llvm-readobj/COFFDumper.cpp
1586–1590 ↗(On Diff #97750)

Unfortunately error() does not accept type ErrorOr.

ecbeckmann updated this revision to Diff 97778.May 3 2017, 11:35 PM
ecbeckmann marked an inline comment as not done.

Fixed formatting issues.

ecbeckmann updated this revision to Diff 97877.May 4 2017, 2:19 PM

Use the unwrapOrError function for clarity.

ecbeckmann added inline comments.May 4 2017, 2:20 PM
llvm/tools/llvm-readobj/COFFDumper.cpp
1586–1590 ↗(On Diff #97750)

I found a function unwrapOrError that does exactly what we want.

ruiu added inline comments.May 4 2017, 2:23 PM
llvm/tools/llvm-readobj/COFFDumper.cpp
1628–1631 ↗(On Diff #97877)

This is a comment about the style, but for operators who's precedences are "obvious", we don't use that many parentheses.

Everyone knows that + has lower precedence than >=.

*(reinterpret_cast<foo>(bar)) is the same as *reinterpret_cast<foo>(bar).

(&Table + 1) + Index is the same as &Table + 1 + Index.

zturner added inline comments.May 4 2017, 2:27 PM
llvm/tools/llvm-readobj/COFFDumper.cpp
1628–1631 ↗(On Diff #97877)

Although both are ugly IMO :) How about

auto TablePtr = reinterpret_cast<const coff_resource_dir_entry *>(&Table);
return TablePtr[1 + Index];
zturner added inline comments.May 4 2017, 2:29 PM
llvm/tools/llvm-readobj/COFFDumper.cpp
1628–1631 ↗(On Diff #97877)

Correction:

auto TablePtr = reinterpret_cast<const coff_resource_dir_entry *>(&Table + 1);
return TablePtr[Index];
ecbeckmann updated this revision to Diff 97882.May 4 2017, 2:54 PM
ecbeckmann marked 2 inline comments as done.

Cleaned up some style.

zturner accepted this revision.May 4 2017, 5:04 PM

lgtm, thanks for being so patient!

This revision is now accepted and ready to land.May 4 2017, 5:04 PM
This revision was automatically updated to reflect the committed changes.
ecbeckmann reopened this revision.May 7 2017, 4:26 PM
This revision is now accepted and ready to land.May 7 2017, 4:26 PM
ecbeckmann updated this revision to Diff 98115.May 7 2017, 4:27 PM

Rename the file from .o to .obj.coff, since it seems git ignores .o files

ecbeckmann updated this revision to Diff 98116.May 7 2017, 4:38 PM

Remove the .obj so this will be done in another patch

ecbeckmann updated this revision to Diff 98117.May 7 2017, 4:40 PM
  • Quick fix to D32609, it seems .o files are not transferred in all cases.
ecbeckmann closed this revision.May 7 2017, 4:44 PM
chapuni added a subscriber: chapuni.May 7 2017, 6:23 PM
llvm/lib/Object/COFFObjectFile.cpp
1610 ↗(On Diff #98117)

I can suggest that converter may be moved to caller, then this method may return StringRef.

1612 ↗(On Diff #98117)

Don't return local std::string as StringRef.

llvm/tools/llvm-readobj/COFFDumper.cpp
1571 ↗(On Diff #98117)

It writes back part of SmallString via StringRef.
Could you rewrite not to modify (SmallString)IDStr?

ecbeckmann reopened this revision.May 7 2017, 6:36 PM
This revision is now accepted and ready to land.May 7 2017, 6:36 PM

Since some bots are red and we're dealing with memory corruption / sanitizer failures which may require a closer investigation to fully resolve, let's revert this temporarily while you investigate?

ecbeckmann updated this revision to Diff 98121.May 7 2017, 7:00 PM
ecbeckmann marked 2 inline comments as done.
  • Quick fix to D32609, it seems .o files are not transferred in all cases.
  • Hopefully one last commit to fix this patch, addresses string reference
llvm/tools/llvm-readobj/COFFDumper.cpp
1571 ↗(On Diff #98117)

Is it necessarily incorrect to modify IDStr in this case? The StringRef points to something we own within this scope.

chapuni added inline comments.May 7 2017, 7:21 PM
llvm/tools/llvm-readobj/COFFDumper.cpp
1571 ↗(On Diff #98117)

It is doing overlapping memcpy. I think it'd not be dangerous since memcpy is done toward upper address.

That said, please be careful to write back string via StringRef.

Since some bots are red and we're dealing with memory corruption / sanitizer failures which may require a closer investigation to fully resolve, let's revert this temporarily while you investigate?

done in D32958

ecbeckmann updated this revision to Diff 98124.May 7 2017, 7:57 PM
  • Quick fix to D32609, it seems .o files are not transferred in all cases.
  • Hopefully one last commit to fix this patch, addresses string reference
  • Also fixed compiler warning of signed to unsigned conversion.

Since some bots are red and we're dealing with memory corruption / sanitizer failures which may require a closer investigation to fully resolve, let's revert this temporarily while you investigate?

done in D32958

actually it seems that the stringref fix resolves the failures, so we can probably go ahead with the merge.

It looks like this broke SystemZ, and probably all big-endian platforms:
http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/8010/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aresources.test

From the symptoms, it appears that the little-endian UTF16 halfwords from the COFF binary never get translated to big-endian host byte order.