This is an archive of the discontinued LLVM Phabricator instance.

[LLD][COFF] Cover usage of LLD as a library
ClosedPublic

Authored by aganea on Nov 18 2019, 1:32 AM.

Details

Summary

This patch introduces testing of LLD as library (only for the COFF driver for now). At the moment, we simply run the driver three times in a row, without shutting down LLD. This ensures the cleanup is properly done between each run.

Covering the other drivers works, but some tests need to be improved. An illustrated example is the fix done in the ELF test below. If there's interest in enabling this feature for the other drivers, perhaps respective owners could take a look, and attempt to fix the tests? @sbc100 @int3 @MaskRay The reason is that I'm printing "LLD_IN_TEST (new LLD session)" in stdout, in-between each run, to be able to differentiate the output.

Diff Detail

Event Timeline

blackhole12 created this revision.Nov 18 2019, 1:32 AM
smeenai added a subscriber: smeenai.

It's recommended to uploaded patches with full context (e.g. by using git diff -U9999 when uploading a patch using the web interface).

ruiu added a comment.Nov 18 2019, 9:35 PM

I share your concern and I'm sorry about leaving so many global variables as uninitialized in the second run of the linker in the same process. But I'd like to find and fix the problems without abolishing global variables in lld, as I don't actually dislike them. One thing we could do is initializing global variables with invalid data -- like -1 or {nullptr} so that you'd get an error even on the first run if you do not reset them before use. I'll create a patch and send it to you.

lld/COFF/DebugTypes.cpp
93 ↗(On Diff #229762)

Instead of defining a new function, you could add TypeServerSource::instances.clear() at the beginning of loadTypeServerSource().

lld/COFF/Writer.cpp
85

Likewise, I'd add outputSections.clear() at the beginning of Writer::run().

Moved the clear commands to where @ruiu suggested. I was unable to upload a full context diff because the web interface refuses to work properly.

@blackhole12 https://llvm.org/docs/GettingStarted.html#sending-patches

apt install arcanist (clone libphutil + clone arcanist)
cd llvm; arc install-certificate
arc diff 'HEAD^' # the comment will update the revision with the full diff

Update with full context

After further testing, this actually doesn't work, but only if you are loading an associated PDB file. I have no idea what's going on, because I don't understand the internals of TypeServerSource, but it's throwing an assertion error at DebugTypes.cpp:191

assert(it != TypeServerSource::instances.end());

My previous fix, which only cleared the caches at the very end of the link() call, does work in this scenario, which suggests to me that the TypeServerSource cache is likely being cleared too often. I highly suggest wrapping this fix into https://reviews.llvm.org/D70766, because it is clear this code is simply far too entwined with itself to properly reason about.

aganea added a subscriber: aganea.
aganea added inline comments.
lld/COFF/DebugTypes.cpp
236 ↗(On Diff #231826)

This can't work. loadTypeServerSource might be called several times during a single link. Please wait until D79672 lands, it will fix this particular TypeServerSource issue.

lld/COFF/Writer.cpp
604

Can you instead move outputSections in the OutputSection class, and call .clear() in lld/COFF/Driver.cpp, link(), like for the others static containers?

blackhole12 marked an inline comment as done.
blackhole12 edited the summary of this revision. (Show Details)

This update removes the debuginfo fix, which is no longer needed, and instead moves outputSections into OutputSection and calls it directly from link()

aganea added inline comments.May 15 2020, 6:26 AM
lld/COFF/Driver.cpp
99

Sorry for making you change another time, but I think the code would look better if following the existing nomenclature: OutputSection::instances.

lld/COFF/Writer.cpp
1091

You need to run git clang-format on your changes before uploading the patch. You simply need to add clang/tools/clang-format/git-clang-format and a copy of clang-format.exe somewhere in your %PATH% (if running on Windows).

blackhole12 marked an inline comment as done.

I changed it to ::instances, but clang-format tried to reformat a bunch of code unrelated to my changes, so I only ran it against Writer.cpp to minimize spurious changes.

Seems good, could please remove the unrelated clang-format changes?
Do you have commit access?

lld/COFF/Writer.cpp
78

Anything unrelated like this line change should be removed from the patch. We could run clang-format separately on the files after.

108

Remove change.

312

Remove change.

866

Remove change.

1880–1884

Please remove this change as well as the one below.

blackhole12 marked 7 inline comments as done.

Removed unnecessary formatting changes. I do not have commit access @aganea so someone else will have to merge this change.

aganea commandeered this revision.May 19 2020, 1:21 PM
aganea edited reviewers, added: blackhole12; removed: aganea.
aganea updated this revision to Diff 265014.May 19 2020, 1:25 PM
aganea edited reviewers, added: MaskRay, amccarth; removed: blackhole12.
aganea edited subscribers, added: blackhole12; removed: MaskRay.

Taking over the patch, after discussing with Erik.

I went back to the original idea of having a single function for clearing outputSections. I felt that in the end, there were too many changes to support a single line of code in Driver.cpp.

I read back through some of the history, but I still don't feel like I have all the context. That said, this change looks simple and straightforward to me.

Would it be feasible and useful to create a test that links two trivial programs with a single instantiation of the lld process? It seems like it would detect regressions and catch any introduction of a new global that needs to be reset.

I read back through some of the history, but I still don't feel like I have all the context. That said, this change looks simple and straightforward to me.

Would it be feasible and useful to create a test that links two trivial programs with a single instantiation of the lld process? It seems like it would detect regressions and catch any introduction of a new global that needs to be reset.

That is great suggestion! I will see what I can do. Maybe something along the lines of LLVM_ENABLE_ASSERTIONS + LLVM_ENABLE_EXPENSIVE_CHECKS + run all LLD tests while calling link() three times in a row?

I read back through some of the history, but I still don't feel like I have all the context. That said, this change looks simple and straightforward to me.

Would it be feasible and useful to create a test that links two trivial programs with a single instantiation of the lld process? It seems like it would detect regressions and catch any introduction of a new global that needs to be reset.

That is great suggestion! I will see what I can do. Maybe something along the lines of LLVM_ENABLE_ASSERTIONS + LLVM_ENABLE_EXPENSIVE_CHECKS + run all LLD tests while calling link() three times in a row?

It might be better to run link() on different inputs. If you were to re-link the same program, a cache that wasn't cleared may return the expected values and thus elude detection. If that's difficult, I'm happy with anything that at least tries link() more than once in a single process lifetime and isn't too costly for build and test times.

aganea updated this revision to Diff 285762.Aug 14 2020, 2:51 PM
aganea retitled this revision from [LLD][COFF] Fix missing cache cleanup in COFF::link() to [LLD][COFF] Cover usage of LLD as a library.
aganea edited the summary of this revision. (Show Details)
aganea added subscribers: int3, sbc100.

Add support for testing LLD as a library, by running the driver several times in a row when LLD_IN_TEST is defined. The value of LLD_IN_TEST tells how many times LLD should loop (3 times right now).

@amccarth As for your suggestion, I don't see an easy way right now to pass three different sets of command-line arguments to LLD. We could maybe think of a special syntax for that purpose, and/or use response files.

In the meanwhile, all COFF tests pass with this enabled (tested on Windows & Linux). The other drivers work, but some tests would need to fixed/improved, so I intentionnally enabled the feature only for the COFF driver.

aganea edited the summary of this revision. (Show Details)Aug 14 2020, 2:54 PM
aganea updated this revision to Diff 285772.Aug 14 2020, 3:32 PM

Fix clang-(format|tidy)

MaskRay added a comment.EditedAug 14 2020, 3:43 PM

Thanks for working on this. I think making lldMain "reentrancy safe" is useful.

Currently the output is replicated atoi(getenv("LLD_IN_TEST")) times. There are also some LLD_IN_TEST (new LLD session) lines. This behavior can disturb

  • tests with -NOT: CHECK lines (~20 test/ELF tests failed due to this problem).
  • tests with lld .... 2>&1 | count 0 (test that there is no output)

I suggest we do it only when --verbose (port specific; inside coff::link, elf::link, ...) is specified. For non-verbose mode, is it possible to suppress output for the first N-1 invocations of lldMain?

I noticed a problem in llvm::timeTraceProfilerInitialize.

I just fixed an ELF problem related to InputFile::isInGroup (b358daddea04ea3c52e0e5bd5e851cee47f7f27f)

LLD_IN_TEST=3 is currently set for COFF. Note that this can make tests slower. I measured the performance on my machine:

  • LLD_IN_TEST=1: check-lld-elf => 5.1s
  • LLD_IN_TEST=2: check-lld-elf => 5.9s
  • LLD_IN_TEST=3: check-lld-elf => 6.7s
  • (I am glad that the ld.lld RUN lines don't take the majority of time:) )

I wonder whether 2 is sufficient.

lld/test/COFF/guardcf-lto.ll
11 ↗(On Diff #285762)

Why can't %t.ll be used?

lld/tools/lld/lld.cpp
145

= 0

172

Probably int to avoid unsigned underflow

or for (; i > 0; --i) {

183

Not all ports support -mllvm or similar options. This should be moved into individual ports, right before cl::Parse* is called.

grimar added inline comments.Aug 17 2020, 2:42 AM
lld/test/ELF/cgprofile-shared-warn.s
12 ↗(On Diff #285772)

No need to include "ld.lld" to the check line.

lld/test/ELF/zdefs.s
14 ↗(On Diff #285772)

I think testing just "error: " is enough. You do not expect any errors here, not just "undefined symbol" error.
Also, it is safer to provide a minimal check string when -NOT is used. In case if we change "undefined symbol"
message, it might be easy to forget to update such check lines.

lld/tools/lld/lld.cpp
144

It needs a comment to explain what does it return.

176

May be I am missing something, but I read it as: you use SaveAndRestore to set
errorHandler().verbose to true and restore the value back, because want to bypass
the check in:

void ErrorHandler::log(const Twine &msg) {
  if (!verbose)
    return;
  std::lock_guard<std::mutex> lock(mu);
  lld::errs() << logName << ": " << msg << "\n";
}

My question: why instead not to use lld::errs() directly here?
E.g:

lld::errs() << ErrorHandler::logName <<  ": "LLD_IN_TEST (new LLD session)\n";
182

Should we return r; here when *ret != r? This should make a test fail even when asserts are disabled I think.

aganea updated this revision to Diff 286665.EditedAug 19 2020, 2:37 PM
aganea marked 9 inline comments as done.
aganea added a subscriber: tentzen.

As suggested,

  • Set LLD_IN_TEST=2 (still enabled for COFF only)
  • Remove stdout & stderr for all iterations but the last one.

Also:

  • Use CrashRecoveryContext to handle crashes or abort, and allow re-entrance.
  • Ensure the cleanup is always done in case of a crash or abort.
  • Disable LLD_IN_TEST for lld/test/ELF/invalid-cie-reference.s because it requires unwinding of SEH to work correctly on Windows, and that doesn't work with previous (actual?) versions of Clang (I imagine @tentzen's work will fix that)
aganea added a comment.EditedAug 19 2020, 2:42 PM

Thanks for the feedback @MaskRay @grimar!

I think making lldMain "reentrancy safe" is useful.

I have another series of changes to also make it thread-safe (that is, being able to link several executables at the same time in different threads). But I'll send out an RFC first, because that requires changes in how cl::opts are stored. The goal is to ultimately build a project from inside a single process.

For non-verbose mode, is it possible to suppress output for the first N-1 invocations of lldMain?

Done.

I noticed a problem in llvm::timeTraceProfilerInitialize.

You mean the thread_locals that are not reinitialized?

LLD_IN_TEST=3 is currently set for COFF.
I wonder whether 2 is sufficient.

Done.

lld/test/COFF/guardcf-lto.ll
11 ↗(On Diff #285762)

Because -dll outputs both %t.dll and %t.lib, which happens to also be an input, so the second iteration would fail otherwise.

lld/tools/lld/lld.cpp
176

Removed and replaced by what @MaskRay was proposing, ie. only print output for the last iteration.

aganea added inline comments.
lld/tools/lld/lld.cpp
175

@blackhole12 Since this goal of this patch is in part to fix an issue you've raised, is this API in line with what you're expecting? (when using LLD as a library)

Mostly looks good. Thanks for working on it!

lld/Common/ErrorHandler.cpp
51

llvm::nulls()

164

The two conditions can be merged

aganea updated this revision to Diff 287126.Aug 21 2020, 3:28 PM
aganea marked 2 inline comments as done.

As suggested.

lld/tools/lld/lld.cpp
236

@MaskRay I've also replaced the LLVM_BUILD_TRAP with raise, just to properly throw the right signal (also take a look at CrashRecoveryContext.cpp). Since you're more knowledgable than me in this area, let me know you think of a better way.

MaskRay added inline comments.Aug 21 2020, 3:53 PM
lld/tools/lld/lld.cpp
236

Signals are numbered from 1 (0 is used as an existence and permission check). Use > 128

llvm/lib/Support/CrashRecoveryContext.cpp
381 ↗(On Diff #287126)

Using 128+n looks good. We can probably replace the bash link with the more standard
https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xcu_chap02.html

Historically, shells have returned an exit status of 128+ n, where n represents the signal number....

blackhole12 accepted this revision.Aug 21 2020, 4:31 PM
blackhole12 added inline comments.
lld/tools/lld/lld.cpp
175

I'm not sure why you are asking about this part in particular. When using LLD as a library, I call the appropriate ::link() function directly so I can pass in additional arguments like the error stream or standard out stream. I cannot use lldMain or safeLldMain for my use case.

This revision is now accepted and ready to land.Aug 21 2020, 4:31 PM
aganea added inline comments.Aug 24 2020, 7:58 AM
lld/tools/lld/lld.cpp
175

That answers my question. How do you handle crashes or signals in that case? If you think safeLldMain can be useful to you, I could also add two extra arguments to redirect the std streams.

blackhole12 added inline comments.Aug 24 2020, 8:45 AM
lld/tools/lld/lld.cpp
175

Currently, I don't handle anything. If LLD crashes, my compiler crashes. A safe invocation, however, would certainly be handy, as it would allow the compiler to emit an error message instead of crashing.

Just fwiw, a smaller patch on the same topic was just committed a few days ago, from D86401. This looks like a bigger and more complete fix for the same issue - but before completing it, pay attention to pick up/remove the fix from D86401 (I see that the same vector already gets cleared by this patch as well).

Just fwiw, a smaller patch on the same topic was just committed a few days ago, from D86401. This looks like a bigger and more complete fix for the same issue - but before completing it, pay attention to pick up/remove the fix from D86401 (I see that the same vector already gets cleared by this patch as well).

Thanks for the info Martin!

MaskRay added inline comments.Aug 24 2020, 12:24 PM
lld/Common/ErrorHandler.cpp
54

Nit: One blank line above

@MaskRay Thanks will do - I still have one last issue I don't understand, maybe you can help?

lld/tools/lld/lld.cpp
236

The test at L33 in lld/test/ELF/lto/ltopasses-custom.ll fails on Linux because below I'm calling raise(SIGABRT), which in theory should return 134. But the control flow then goes to not, in Program.inc, just after the call to sys::wait4. However at that point, the status value is 0x8600 and I don't understand why. If I run lld without LLD_IN_TEST, the abort() call for that test is called normally (not in a CrashRecoveryContext) and when reaching the same point after sys::wait4, status is 0x86 (134, that is 128+SIGABRT) in that case.
So one case, abort() exits the process, we get 0x86.
The other case, raise(SIGABRT) gets 0x8600.
Any ideas on how to fix this? That was the reason why I used LLVM_BUILTIN_TRAP before using raise, because it properly returns (137) 128+SIGKILL.
Any ideas?

aganea updated this revision to Diff 287747.Aug 25 2020, 12:26 PM
aganea marked an inline comment as done.

Rebase.
Address @MaskRay's suggestions.
Made safeLldLink a public API and made it possible to provide user-defined stdout/stderr streams.

Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2020, 12:26 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
aganea marked 3 inline comments as done.Aug 25 2020, 12:29 PM
MaskRay added inline comments.Aug 25 2020, 1:35 PM
clang/tools/driver/driver.cpp
537 ↗(On Diff #287747)

(From a non-native speaker)

"that is codes >128" Should the singular form be used?

llvm/include/llvm/Support/Signals.h
119 ↗(On Diff #287747)

[readability-identifier-naming]

Many very old function use UpperCase, but newer functions should use lowerCase.

aganea updated this revision to Diff 287952.Aug 26 2020, 6:18 AM
aganea marked 2 inline comments as done.

Address comments.
Added a CrashRecoveryTest.

clang/tools/driver/driver.cpp
537 ↗(On Diff #287747)

Rephrased.

aganea added a comment.Sep 2 2020, 7:50 AM

Ping! @MaskRay any further comments?

MaskRay accepted this revision.Sep 2 2020, 9:40 AM

Looks great!

@rnk @amccarth Do you have more comments? ☺️

amccarth accepted this revision.Sep 21 2020, 8:37 AM

LGTM.

rnk accepted this revision.Sep 23 2020, 4:17 PM

Looks good to me, I didn't review very in depth, but I see the test case that we need. :)

This revision was landed with ongoing or failed builds.Sep 24 2020, 12:09 PM
This revision was automatically updated to reflect the committed changes.
ayrivera added inline comments.
lld/ELF/Driver.cpp
895

Hi,

I built locally lld and came across an issue. The ELF driver isn't recognizing multiple -mllvm options. It only process the last -mllvm option that was entered in the command line. For example, I can add -mllvm -print-after-all -debug-pass=Arguments and it only prints the information from -debug-pass=Arguments, and not the IR and the debug data. If I swap the order, then the IR is printed and not the debug information.

I noticed that this change set added a call to cl:ResetAllOptionOccurrences() (line 895) inside parseClangOption. The function parseClangOption is called inside the loop that process the -mllvm options. If I comment this line, then everything works correctly and I can pass multiple -mllvm options through the command.

Is this an intended behavior or an actual issue? Thanks for your time.

MaskRay added inline comments.Sep 28 2020, 5:44 PM
lld/ELF/Driver.cpp
895

Hi,

For example, I can add -mllvm -print-after-all -debug-pass=Arguments

You need two -mllvm, i.e. -mllvm -print-after-all -mllvm -debug-pass=Arguments. And with the option, I see the print-after-all dump.

ayrivera added inline comments.Sep 28 2020, 5:52 PM
lld/ELF/Driver.cpp
895

Hi,

You need two -mllvm, i.e. -mllvm -print-after-all -mllvm -debug-pass=Arguments. And with the option, I see the print-after-all dump.

Thanks for the quick reply. Sorry, this is a typo. I'm using both -mllvm and still got the issue.

aganea added inline comments.Sep 28 2020, 6:03 PM
lld/ELF/Driver.cpp
895

Is this an intended behavior or an actual issue?

At first sight, this is an actual issue. Normally, cl::ResetAllOptionOccurrences() should be called only once per compiler or linker instance, to reset the internal counters for the cl::opts. I would move it somewhere at the begining of readConfigs(). Can you try that see if that solves your issue?

ayrivera added inline comments.Sep 29 2020, 7:58 AM
lld/ELF/Driver.cpp
895

Hi @aganea ,

This patch https://reviews.llvm.org/D88461 , posted by @MaskRay , works. It does what you suggested. Thanks for the quick reply.

rnk added inline comments.Oct 16 2020, 3:24 PM
lld/Common/ErrorHandler.cpp
78

This appears to have regressed shutdown. sys::Process::Exit calls exit, not _exit, so now atexit destructors are run. That's unintended, right?

aganea marked an inline comment as done.Oct 16 2020, 4:03 PM
aganea added inline comments.
lld/Common/ErrorHandler.cpp
78

Yes, the fix is in D88348!