This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Add --print-archive-stats=
ClosedPublic

Authored by MaskRay on Apr 27 2020, 11:41 PM.

Details

Summary

gold has an option --print-symbol-counts= which prints:

// For each archive
archive $archive $members $fetched_members
// For each object file
symbols $object $defined_symbols $used_defined_symbols

In most cases, $defined_symbols = $used_defined_symbols unless weak
symbols are present. Strangely $used_defined_symbols includes symbols defined relative to --gc-sections discarded sections.
The symbols lines do not appear to be useful.

archive lines are useful: $fetched_members=0 lines correspond to
unused archives. The information can be used to trim dependencies.

This patch implements --print-archive-stats= which prints the number of
members and the number of fetched members for each archive.

Diff Detail

Event Timeline

MaskRay created this revision.Apr 27 2020, 11:41 PM
alexander-shaposhnikov added inline comments.
lld/ELF/InputFiles.cpp
1173
  1. what about std::distance(file->child_begin(err), file->child_end()); ?
  2. just curious why is the error consumed ? e.g. alternatively this could return Expected<...>
lld/ELF/Driver.h
33 ↗(On Diff #260548)

if this vector is not modified outside of this class then probably it would be good to keep it private and provide a read-only accessor instead.

MaskRay marked an inline comment as done.Apr 28 2020, 12:38 AM
MaskRay added inline comments.
lld/ELF/InputFiles.cpp
1173
  1. difference_type for these iterator types is not defined so std::distance can't be used:
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_iterator_base_funcs.h:138:5: note: candidate template ignored: substitution failure
 [with _InputIterator = llvm::fallible_iterator<llvm::object::Archive::ChildFallibleIterator>]: no type named 'difference_type' in 'std::iterator_trai
ts<llvm::fallible_iterator<llvm::object::Archive::ChildFallibleIterator> >'
  1. just curious why is the error consumed ? e.g. alternatively this could return Expected<...>

--whole-archive and a special case call getArchiveMembers. When the control flow reaches here, err will guaranteed to be success. For the rest the archive may be corrupted but we just don't care about the error here.

MaskRay marked an inline comment as done.Apr 28 2020, 12:39 AM
MaskRay added inline comments.
lld/ELF/Driver.h
33 ↗(On Diff #260548)

The code style in lld may be a bit special. In many places public member variables are used in place of public member functions (especially getters and setters)...

grimar added inline comments.Apr 28 2020, 2:09 AM
lld/ELF/Driver.h
33 ↗(On Diff #260548)

I think a bit more consistent way would be to add a archiveFiles vector and initialize it from doParseFile:

template <class ELFT> static void doParseFile(InputFile *file) {
  if (!isCompatible(file))
    return;

  // Binary file
  if (auto *f = dyn_cast<BinaryFile>(file)) {
    binaryFiles.push_back(f);
    f->parse();
    return;
  }

  // .a file
  if (auto *f = dyn_cast<ArchiveFile>(file)) {
    f->parse(); 
    /// <- HERE
    return;
  }
....

This would be consistent with other vectors we have:

std::vector<BinaryFile *> binaryFiles;
std::vector<BitcodeFile *> bitcodeFiles;
std::vector<LazyObjFile *> lazyObjFiles;
std::vector<InputFile *> objectFiles;
std::vector<SharedFile *> sharedFiles;
lld/ELF/InputFiles.cpp
1173

What about this?

size_t ArchiveFile::getMemberCount() const {
  Error err = Error::success();
  auto range = file->children(err);
  // A comment about why `consumeError` is OK.
  consumeError(std::move(err));

  size_t count = 0;
  for (auto i = range.begin(), e = range.end(); i != e; ++i)
    ++count;
  return count;
}
lld/ELF/MapFile.cpp
262

Should this code belong to MapFile.cpp? (I am not sure. The file header says it is for -Map, though we also handle -cref here)

269

I'd suggest to add the option name. We do it sometimes:

\lld\ELF\Driver.cpp(1408):    error("-image-base: number expected, but got " + s);
\lld\ELF\Driver.cpp(1479):    error("--undefined-glob: " + toString(pat.takeError()));
lld/ELF/Options.td
298

speicified -> specified

Perhaps, make it more generic? Something like "Write archives usage statistic to the .."

lld/ELF/Writer.cpp
603–604

The comment needs an update.

MaskRay updated this revision to Diff 260686.Apr 28 2020, 10:12 AM
MaskRay marked 8 inline comments as done.

Address comments

lld/ELF/Driver.h
33 ↗(On Diff #260548)

This will add yet another global variable, but since it is grouped with other similar variables, I think it is fine. In return, we can avoid private->public visibility change and expose Driver.h to MapFile.cpp, it is probably slightly better.

This exposes another issue that D46034 does not clear lazyObjFiles when relinking. I will clear it as well.

lld/ELF/InputFiles.cpp
1173

llvm::object::Archive::child_iterator keeps a reference to the Error object. consumeError(std::move(err)) has to be called after the iteration is done.

Using for (auto i = range.begin(), e = range.end(); i != e; ++i) can avoid an unused loop variable but it seems to become more complex. In the end I still prefer a range-based for loop, even if there is an unused loop variable.

lld/ELF/MapFile.cpp
262

It looks like we are using MapFile.cpp for ad-hoc statistical features.. This file hasn't become very complex. I think it is probably fine to keep it here (writeArchiveStats takes less than 20 lines). When it becomes more complex, we can consider a file rename.

grimar added inline comments.Apr 29 2020, 6:00 AM
lld/ELF/InputFiles.cpp
1178

What do you think if instead of adding a size_t getMemberCount() const; to ArchiveFile we add:

Archive* getArchiveFile() const;

And then just inline the logic of getMemberCount() to writeArchiveStats()?

(The logic is not used anywhere else and I can't say that such loops look nice,
perhaps hiding it away from the general code might look better).

grimar accepted this revision.Apr 29 2020, 6:08 AM

LGTM.

lld/ELF/InputFiles.cpp
1173

but it seems to become more complex.

It shows the intention better though (IMO).

1178

Thinking about it again, I am not sure it will look much better as we already have
size_t getFetchedMemberCount() const { return seen.size(); } anyways.

lld/ELF/InputFiles.h
330

Members?

This revision is now accepted and ready to land.Apr 29 2020, 6:08 AM
grimar added inline comments.Apr 29 2020, 6:16 AM
lld/ELF/InputFiles.h
330

"member count" seems correct, so nevermind.

MaskRay edited the summary of this revision. (Show Details)Apr 29 2020, 6:04 PM
This revision was automatically updated to reflect the committed changes.