This adds an /errorrepro:<dir> flag to lld-link. The flag
instructs the linker to write a repro.tar file in the specified
directory if linking fails. The functionality is modeled after
/linkrepro, which unconditionally writes the reproducer.
Details
- Reviewers
rnk ruiu • espindola
Diff Detail
- Build Status
Buildable 19255 Build 19255: arc lint + arc unit
Event Timeline
What was the motivation for this new option? Given that it is easy to write run the same command again with /linkrepro, I don't see an immediate need to add this as a new option.
I am investigating https://bugs.chromium.org/p/chromium/issues/detail?id=786127, which is a failure that does not reproduce reliably. I could use /linkrepro to save all link inputs always, but that would use up a lot of disk space. Running the command again with /linkrepro will almost certainly not reproduce the failure. Having a way to only save the reproducer when an error occurs would help here, and also seems a generally useful feature to have.
lld/COFF/Driver.cpp | ||
---|---|---|
58 | Is this really called when lld exits? I think by default lld calls _exit() to avoid the cost of destruction, so dtors are not called on exit. |
If the test failure is related to LTO, would this really help? As I understand it, the failures are on the object files created by LTO, which I don't think would appear in the linkrepro file.
I wrote a small patch:
diff --git a/lld/COFF/Chunks.cpp b/lld/COFF/Chunks.cpp index 6d8a0c7c1f4..2931bc23d21 100644 --- a/lld/COFF/Chunks.cpp +++ b/lld/COFF/Chunks.cpp @@ -97,9 +97,11 @@ void SectionChunk::applyRelX64(uint8_t *Off, uint16_t Type, OutputSection *OS, case IMAGE_REL_AMD64_REL32_5: add32(Off, S - P - 9); break; case IMAGE_REL_AMD64_SECTION: applySecIdx(Off, OS); break; case IMAGE_REL_AMD64_SECREL: applySecRel(this, Off, OS, S); break; - default: + default: { + llvm::outs() << toHex(File->MB.getBuffer()) << '\n'; fatal("unsupported relocation type 0x" + Twine::utohexstr(Type)); } + } } void SectionChunk::applyRelX86(uint8_t *Off, uint16_t Type, OutputSection *OS,
and used it to build chromium in a loop on my machine overnight. With that I was able to reproduce the problem and capture one of the object files. I'll take a look at it.
Thanks, @pcc . Out of curiosity, how did you build chrome (I guess I'm mostly interested in args.gn and the command you used to build). I've been trying this on my machine and it can literally go for weeks without hitting the problem.
lld/COFF/Driver.cpp | ||
---|---|---|
58 | Good point. In an earlier version I had implemented hooks to be run by exitLld() so that I could write the error repro there. This version was an attempt to simplify that, based on the idea that the destructor for ReproWriter is run before control leaves LinkerDriver::link(). That works fine for some cases, but not cases where we exceed the error limit and ExitEarly is true, or we call fatal(). Let me see if I can come up with a better way. |
Here is the args.gn I was using. It's basically the same as the bot:
is_clang = true is_component_build = false is_debug = false is_official_build = true symbol_level = 2 use_lld = true use_thin_lto = true
I also had a local patch to chrome to get it to use my hacked lld:
diff --git a/build/toolchain/win/BUILD.gn b/build/toolchain/win/BUILD.gn index 4d9d1f45f870..8e4a30b223ba 100644 --- a/build/toolchain/win/BUILD.gn +++ b/build/toolchain/win/BUILD.gn @@ -109,7 +109,7 @@ template("msvc_toolchain") { # lld-link includes a replacement for lib.exe that can produce thin # archives and understands bitcode (for lto builds). lib = "$prefix/$lld_link /lib /llvmlibthin" - link = "$prefix/$lld_link" + link = "C:/src/l/ra/bin/lld-link.exe" if (host_os != "win") { # See comment adding --rsp-quoting to $cl above for more information. link = "$link --rsp-quoting=posix"
I built it with this batch file:
for /l %%x in (1, 1, 100) do ( mkdir llvm_exp%%x cd llvm_exp%%x copy ..\args.gn . gn gen . ninja all > ninja.out cd .. )
lld/COFF/Driver.cpp | ||
---|---|---|
58 | I think we are guaranteed that we will either leave the scope of LinkerDriver::link(), or we will call exitLld() (there is another case where we die from a signal, but that's not a case I'm looking to address here; we can address that in a later patch if we want to). This patch as it is already addresses the case of going out of scope, so we need to address the exitLld() case. exitLld() already calls llvm_shutdown(), so we could use ManagedStatic...but I prefer to actually add a way to register hooks for exitLld(). I'll go ahead and add that, unless someone objects. |
I still don't see a compelling reason to do this to be honest. The use case we had in mind was to help debug an intermittent failure, but as we discovered it wouldn't have helped debug that failure. Also, if a failure is intermittent then it can be reproduced just as much with /errorrepro as with /linkrepro.
I still don't see a compelling reason to do this to be honest.
Fair enough, and if the consensus is that we'd rather not have this, I'll withdraw it. You've reduced the problem that originally prompted me to write this (and you're right it doesn't capture the object files LTO generates). However, aside from that specific problem, I still think that something like this would be useful to diagnose linker errors we may encounter in the future, particularly ones that don't reproduce reliably. As I pointed out earlier, /linkrepro isn't really what I would want for that because you either need to re-run the link until you hit the same failure again, which could be very time consuming, or pre-emptively run with /linkrepro, which slows down links and writes lots of bytes you usually don't care about.
As I pointed out earlier, /linkrepro isn't really what I would want for that because you either need to re-run the link until you hit the same failure again, which could be very time consuming, or pre-emptively run with /linkrepro, which slows down links and writes lots of bytes you usually don't care about.
You could always rerun the link with /linkrepro if the link fails. Even if the second link succeeds, you still have a reproducer that you can try repeatedly until you hit the failure.
After checking with ruiu, it seems we are all on the side of not adding this. Abandoning.
Is this really called when lld exits? I think by default lld calls _exit() to avoid the cost of destruction, so dtors are not called on exit.