This is an archive of the discontinued LLVM Phabricator instance.

[wip] COFF: New symbol table design.
ClosedPublic

Authored by pcc on Jun 8 2016, 6:51 PM.

Details

Summary

This is a port of the new symbol table design from r268178 to the COFF linker.

I measured a small regression in link time with this patch (about 5-10%
for Chromium's chrome_child.dll) on my Windows machine. There is still a
regression if I disable the parallel object file parsing feature in the
existing code. I haven't had a chance to investigate the root cause yet.
I think we probably shouldn't land this until this is tracked down.

Diff Detail

Repository
rL LLVM

Event Timeline

pcc updated this revision to Diff 60130.Jun 8 2016, 6:51 PM
pcc retitled this revision from to [wip] COFF: New symbol table design..
pcc updated this object.
pcc added reviewers: ruiu, rafael.
pcc added a subscriber: llvm-commits.
ruiu edited edge metadata.Jun 9 2016, 11:48 AM

It's interesting that this patch regresses performance even though this approach proved to be effective for the ELF linker. Does it mean the parallel object file parsing is effective for COFF? If so, it may indicate a possible optimization we can make for the ELF linker.

pcc added a subscriber: pcc.Jun 9 2016, 6:19 PM

I still don't understand the cause of the regression. As I mentioned, I
still see a regression if I disable parallel object file parsing in the
existing code (by replacing the code starting at COFF/SymbolTable.cpp:28
with just "std::launch Policy = std::launch::deferred;") so it might not
have anything to do with parallel object file parsing.

I would also like to see whether this has something to do with the host OS.
I am working on a patch that adds a /reproduce flag to the COFF linker to
make it easier to measure performance on a Linux host.

Peter

pcc updated this revision to Diff 80854.Dec 8 2016, 6:40 PM
pcc edited edge metadata.

Reviving this in preparation for running some performance measurements

ruiu added a comment.Dec 9 2016, 1:14 PM

I think I want to get this in to close the architectural difference between COFF and ELF, and I like it removes the half-baked parallel file parsing.

So now that I'm able to link chrome_child.dll on Linux it turns out that I get very different results with "perf stat":

Before (median of 5 runs): 11.623297415 seconds time elapsed
After: 8.123567205 seconds time elapsed

#include <sys/time.h>

static uint64_t microsec(struct timespec Time) {
  return Time.tv_sec * 1000000 + (Time.tv_nsec / 1000);
}

void elf::log2(const Twine &Msg) {
  static uint64_t Start = 0;
  struct timespec Time;
  clock_gettime(CLOCK_MONOTONIC, &Time);
  if (Start == 0)
    Start = microsec(Time);
  llvm::errs() << (microsec(Time) - Start) << " " << Msg << "\n";
}

I often add this piece of code to measure time of each pass. This is hacky but still better than time command. Did you try to measure this patch on Windows?

lld/COFF/SymbolTable.cpp
78 ↗(On Diff #80854)

Remove llvm:: from here and other places.

pcc added a comment.Dec 9 2016, 1:38 PM
In D21166#618607, @ruiu wrote:

I think I want to get this in to close the architectural difference between COFF and ELF, and I like it removes the half-baked parallel file parsing.

So now that I'm able to link chrome_child.dll on Linux it turns out that I get very different results with "perf stat":

Before (median of 5 runs): 11.623297415 seconds time elapsed
After: 8.123567205 seconds time elapsed

#include <sys/time.h>

static uint64_t microsec(struct timespec Time) {
  return Time.tv_sec * 1000000 + (Time.tv_nsec / 1000);
}

void elf::log2(const Twine &Msg) {
  static uint64_t Start = 0;
  struct timespec Time;
  clock_gettime(CLOCK_MONOTONIC, &Time);
  if (Start == 0)
    Start = microsec(Time);
  llvm::errs() << (microsec(Time) - Start) << " " << Msg << "\n";
}

I often add this piece of code to measure time of each pass. This is hacky but still better than time command. Did you try to measure this patch on Windows?

Yes, and I'm still seeing the regression on Windows. The MSVC profiler tells me that I'm spending a lot of time creating memory maps for files, so I'm wondering whether memory map performance on Windows is the root cause of the very different performance results as compared to Linux (maybe this patch somehow causes us to open files repeatedly or something, I don't know). I'm still investigating, but I think I'd be fine with landing this in parallel with the investigation.

ruiu accepted this revision.Dec 9 2016, 1:48 PM
ruiu edited edge metadata.

LGTM. Please submit this patch. I'm now trying to close the gap between ELF and COFF, and that's exactly what I want. Thank you for doing this.

This revision is now accepted and ready to land.Dec 9 2016, 1:48 PM
This revision was automatically updated to reflect the committed changes.