Page MenuHomePhabricator

Update llvm-readobj -coff-resources to display tree structure.
ClosedPublic

Authored by ecbeckmann on Apr 27 2017, 11:33 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ecbeckmann added inline comments.May 2 2017, 11:56 PM
llvm/include/llvm/Object/COFF.h
641–644 ↗(On Diff #96988)

Hmm I'm not quite sure what you mean. A resource directory entry is composed of two separate 32-bit fields, 64 bits in total. The first field can be either an offset to a string name or an ID number. The second field is either an offset to the resource data or the offset to the next directory level down the tree. We must access both of these fields to read an entry, we might be able to use a ulittle32_t to read the first field and then move up 4 bytes to read the second field, but it seems much clearer to be able to access either of them by name.

1074 ↗(On Diff #97479)

This should actually be a function that is part of the coff_resource_dir_entry structure.

zturner added inline comments.May 3 2017, 10:04 AM
llvm/include/llvm/Object/COFF.h
641–644 ↗(On Diff #96988)

I also agree that it's not an equivalent struct. For @ruiu's proposed definition, sizeof(coff_resource_dir_entry) == 4, but for the one in the patch, sizeof(coff_resource_dir_entry) == 8.

627 ↗(On Diff #97560)

I think this enum should go in llvm/Support/COFF.h. It's confusing that there are two files named COFF.h, but this file is for object file layout stuff, and the other file is more semantic information and enum / constant definitions.

655 ↗(On Diff #97560)

It's a little jarring to see methods in unions. I think part of the reason is that when you're looking at a union, it's nice for it to be as compact as possible so that you can get a picture of the entire layout of the union in one screen.

Can you move all the methods out of the union and into the top level struct?

With that said, we don't normally put constant <-> string translation functions in low level headers like this. Take a look at COFFDumper.cpp, you will see a lot of things like this:

static const EnumEntry<COFF::DLLCharacteristics> PEDLLCharacteristics[] = {
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_HIGH_ENTROPY_VA      ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_DYNAMIC_BASE         ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_FORCE_INTEGRITY      ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NX_COMPAT            ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NO_ISOLATION         ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NO_SEH               ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_NO_BIND              ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_APPCONTAINER         ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_WDM_DRIVER           ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_GUARD_CF             ),
  LLVM_READOBJ_ENUM_ENT(COFF, IMAGE_DLL_CHARACTERISTICS_TERMINAL_SERVER_AWARE),
};

I think we should do something similar for the ResourceID enum.

Also, unless you're using a strongly typed enum (e.g. enum class), you shouldn't explicitly scope it by saying ResourceID::. So you have to pick one of the following two patterns:

enum class ResourceID {
  Cursor,   // Names are not RID_ prefixed because you have to scope it with ResourceID::
  Bitmap,
  ...
};

enum ResourceID {
  RID_Cursor,   // Names are RID_ prefixed because you refer to them unqualified.
  RID_Bitmap,
  ...
};
703 ↗(On Diff #97560)

IIUC you are are taking the lower 31 bits of NameOffset? Can you write this as:

return maskTrailingOnes<uint32_t>(31) & NameOffset;

This function is defined in MathExtras.h

710 ↗(On Diff #97560)

Same here.

723–725 ↗(On Diff #97560)

I don't think we should put this logic here. Although in practice the two structures appear immediately after each other, in theory they should be independent of each other and the parsing code should be responsible for knowing this.

1146 ↗(On Diff #97560)

No need for the argument to be const. Also can you mark this constructor explicit?

1155 ↗(On Diff #97560)

Instead of storing a StringRef, store a BinaryByteStream. This way you don't have to reconstruct it every time you want to read from it. And it's just as good as a StringRef

llvm/lib/Object/COFFObjectFile.cpp
1597–1602 ↗(On Diff #97560)

Is this function ever called?

1614 ↗(On Diff #97560)

What's the +2 for?

llvm/tools/llvm-readobj/COFFDumper.cpp
1549–1550 ↗(On Diff #97560)

Should RSF be passed by value? It's immutable in principle anyway since all it contains is a StringRef, and it's thin as well. Usually when classes are suffixed with Ref, LLVM convention is that we should pass them by value since they're "as good as references"

ecbeckmann updated this revision to Diff 97750.May 3 2017, 5:04 PM
ecbeckmann marked 10 inline comments as done.

Refactored parsing funcationality into more logical places.

ecbeckmann added inline comments.May 3 2017, 5:05 PM
llvm/include/llvm/Object/COFF.h
655 ↗(On Diff #97560)

I opted to follow the pattern of other enums in COFFDumper.

llvm/lib/Object/COFFObjectFile.cpp
1597–1602 ↗(On Diff #97560)

ah you're right I replaced this function and forgot to remove it.

1614 ↗(On Diff #97560)

ah woops I didn't realize the reader updated its own offset as it reads.

llvm/tools/llvm-readobj/COFFDumper.cpp
1549–1550 ↗(On Diff #97560)

yeah that makes sense

ruiu added inline comments.May 3 2017, 5:30 PM
llvm/lib/Object/COFFObjectFile.cpp
1600–1601 ↗(On Diff #97750)

You can define a variable inside an if.

if (std::error_code EC = ...)

And this is preferred way if EC's scope is limited inside that if as narrower scope = better.

1611 ↗(On Diff #97750)

This seems odd. Why can't you do RawDirString[i]?

1623 ↗(On Diff #97750)

You can directly return it without the assignment.

ecbeckmann updated this revision to Diff 97757.May 3 2017, 5:49 PM
ecbeckmann marked 3 inline comments as done.

Cleaned up error code checks.

Getting much closer. Just a few more things.

llvm/include/llvm/Object/COFF.h
1076 ↗(On Diff #97750)

You can write this line:

: BBS(Ref, support::little) {}
llvm/lib/Object/COFFObjectFile.cpp
1600 ↗(On Diff #97750)

You can shorten all these by one line if you say:

if (auto EC = errorToErrorCode(Reader.readInteger(Length)))
  return EC;

You can shorten them all TO one line if you add a macro like:

#define RETURN_IF_ERROR(X) \
if (auto EC = errorToErrorCode(X)) \
  return EC;

And then say:

RETURN_IF_ERROR(Reader.readInteger(Length));
1609–1612 ↗(On Diff #97750)

It looks like you're just truncating the second byte of each string? You should probably do a proper UTF16 -> UTF8 conversion here.

#include "llvm/Support/ConvertUTF.h"
...
ArrayRef<UTF16> RawDirString;
std::string DirString;
RETURN_IF_ERROR(Reader.readArray(RawDirString, Length));
if (!llvm::convertUTF16ToUTF8String(RawDirString, DirString))
  return // some kind of error.
1623–1626 ↗(On Diff #97750)

It would be nice to get rid of all the casting, in part because it makes the logic hard to follow, but also in part because there's no protection in case of corrupt data. For example, this function can never return an error, even if Entry is bogus and makes you read off into nowhere. So I would write this:

const coff_resource_dir_table *Table = nullptr;

BinaryStreamReader Reader(BBS);
Reader.setOffset(Entry->Offset.value());
RETURN_IF_ERROR(Reader.readObject(Table));
return Table;

this way if for some reason the offset is bogus, you'll get a useful error.

1630 ↗(On Diff #97750)

How about adding a function:

ErrorOr<const coff_resource_dir_table&> getTableAtOffset(unsigned Offset) {
const coff_resource_dir_table *Table = nullptr;

BinaryStreamReader Reader(BBS);
Reader.setOffset(Offset);
RETURN_IF_ERROR(Reader.readObject(Table));
assert(Table != nullptr);
return *Table;
}

Then have this funciton and the one above it be:

ErrorOr<const coff_resource_dir_table&> getEntrySubDir(const coff_resource_dir_entry &Entry) {
  return getTableAtOffset(Entry->Offset.value());
}

ErrorOr<const coff_resource_dir_table&> getBaseTable() {
  return getTableAtOffset(0);
}

Note also that I'm using references now instead of pointers, since they can't be null.

llvm/tools/llvm-readobj/COFFDumper.cpp
121 ↗(On Diff #97750)

Can this be a const coff_resouce_dir_table & instead of a pointer?

143–145 ↗(On Diff #97750)

References instead of pointers.

502–504 ↗(On Diff #97750)

This is strange. This somehow got touched by clang-format.

1586–1590 ↗(On Diff #97750)

Does this work?

auto &Entry = error(EO);

I know it does with Expected<T>, but not sure about ErrorOr<T>.

1599 ↗(On Diff #97750)

This is wrong, because RawName will get destroyed at the end of this scope, and the StringRef will point to bogus memory.

1618 ↗(On Diff #97750)

Does NextLevel = "Language" not work?

1621 ↗(On Diff #97750)

For these verbose names, you can use auto. Although in this case the auto &X = error(T) pattern might work, so this whole block could just be simplified to:

auto &NextTable = error(RSF.getEntrySubDir(Entry));
1643–1645 ↗(On Diff #97750)

I suppose this is fine now. Ignore my comments earlier about FixedStreamArray. It would be more idiomatic to use here if there was a class like

struct CoffResourceDirTable {
  const coff_resource_dir_table *Header;
  FixedStreamArray<coff_resource_dir_entry> Entries;
};

And then instead of having these functions return coff_resource_dir_entry& or coff_resource_dir_table&, have them return these structures instead.

In any case, don't worry about this for now.

ecbeckmann updated this revision to Diff 97774.May 3 2017, 11:03 PM
ecbeckmann marked 11 inline comments as done.

Some more cleanup.

ecbeckmann added inline comments.May 3 2017, 11:06 PM
llvm/tools/llvm-readobj/COFFDumper.cpp
1586–1590 ↗(On Diff #97750)

Unfortunately error() does not accept type ErrorOr.

ecbeckmann updated this revision to Diff 97778.May 3 2017, 11:35 PM
ecbeckmann marked an inline comment as not done.

Fixed formatting issues.

ecbeckmann updated this revision to Diff 97877.May 4 2017, 2:19 PM

Use the unwrapOrError function for clarity.

ecbeckmann added inline comments.May 4 2017, 2:20 PM
llvm/tools/llvm-readobj/COFFDumper.cpp
1586–1590 ↗(On Diff #97750)

I found a function unwrapOrError that does exactly what we want.

ruiu added inline comments.May 4 2017, 2:23 PM
llvm/tools/llvm-readobj/COFFDumper.cpp
1628–1631 ↗(On Diff #97877)

This is a comment about the style, but for operators who's precedences are "obvious", we don't use that many parentheses.

Everyone knows that + has lower precedence than >=.

*(reinterpret_cast<foo>(bar)) is the same as *reinterpret_cast<foo>(bar).

(&Table + 1) + Index is the same as &Table + 1 + Index.

zturner added inline comments.May 4 2017, 2:27 PM
llvm/tools/llvm-readobj/COFFDumper.cpp
1628–1631 ↗(On Diff #97877)

Although both are ugly IMO :) How about

auto TablePtr = reinterpret_cast<const coff_resource_dir_entry *>(&Table);
return TablePtr[1 + Index];
zturner added inline comments.May 4 2017, 2:29 PM
llvm/tools/llvm-readobj/COFFDumper.cpp
1628–1631 ↗(On Diff #97877)

Correction:

auto TablePtr = reinterpret_cast<const coff_resource_dir_entry *>(&Table + 1);
return TablePtr[Index];
ecbeckmann updated this revision to Diff 97882.May 4 2017, 2:54 PM
ecbeckmann marked 2 inline comments as done.

Cleaned up some style.

zturner accepted this revision.May 4 2017, 5:04 PM

lgtm, thanks for being so patient!

This revision is now accepted and ready to land.May 4 2017, 5:04 PM
This revision was automatically updated to reflect the committed changes.
ecbeckmann reopened this revision.May 7 2017, 4:26 PM
This revision is now accepted and ready to land.May 7 2017, 4:26 PM
ecbeckmann updated this revision to Diff 98115.May 7 2017, 4:27 PM

Rename the file from .o to .obj.coff, since it seems git ignores .o files

ecbeckmann updated this revision to Diff 98116.May 7 2017, 4:38 PM

Remove the .obj so this will be done in another patch

ecbeckmann updated this revision to Diff 98117.May 7 2017, 4:40 PM
  • Quick fix to D32609, it seems .o files are not transferred in all cases.
ecbeckmann closed this revision.May 7 2017, 4:44 PM
chapuni added a subscriber: chapuni.May 7 2017, 6:23 PM
llvm/lib/Object/COFFObjectFile.cpp
1610 ↗(On Diff #98117)

I can suggest that converter may be moved to caller, then this method may return StringRef.

1612 ↗(On Diff #98117)

Don't return local std::string as StringRef.

llvm/tools/llvm-readobj/COFFDumper.cpp
1571 ↗(On Diff #98117)

It writes back part of SmallString via StringRef.
Could you rewrite not to modify (SmallString)IDStr?

ecbeckmann reopened this revision.May 7 2017, 6:36 PM
This revision is now accepted and ready to land.May 7 2017, 6:36 PM

Since some bots are red and we're dealing with memory corruption / sanitizer failures which may require a closer investigation to fully resolve, let's revert this temporarily while you investigate?

ecbeckmann updated this revision to Diff 98121.May 7 2017, 7:00 PM
ecbeckmann marked 2 inline comments as done.
  • Quick fix to D32609, it seems .o files are not transferred in all cases.
  • Hopefully one last commit to fix this patch, addresses string reference
llvm/tools/llvm-readobj/COFFDumper.cpp
1571 ↗(On Diff #98117)

Is it necessarily incorrect to modify IDStr in this case? The StringRef points to something we own within this scope.

chapuni added inline comments.May 7 2017, 7:21 PM
llvm/tools/llvm-readobj/COFFDumper.cpp
1571 ↗(On Diff #98117)

It is doing overlapping memcpy. I think it'd not be dangerous since memcpy is done toward upper address.

That said, please be careful to write back string via StringRef.

Since some bots are red and we're dealing with memory corruption / sanitizer failures which may require a closer investigation to fully resolve, let's revert this temporarily while you investigate?

done in D32958

ecbeckmann updated this revision to Diff 98124.May 7 2017, 7:57 PM
  • Quick fix to D32609, it seems .o files are not transferred in all cases.
  • Hopefully one last commit to fix this patch, addresses string reference
  • Also fixed compiler warning of signed to unsigned conversion.

Since some bots are red and we're dealing with memory corruption / sanitizer failures which may require a closer investigation to fully resolve, let's revert this temporarily while you investigate?

done in D32958

actually it seems that the stringref fix resolves the failures, so we can probably go ahead with the merge.

It looks like this broke SystemZ, and probably all big-endian platforms:
http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/8010/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aresources.test

From the symptoms, it appears that the little-endian UTF16 halfwords from the COFF binary never get translated to big-endian host byte order.