This is an archive of the discontinued LLVM Phabricator instance.

Initialize global vectors with invalid members to catch uninitialization errors
Needs ReviewPublic

Authored by ruiu on Nov 18 2019, 10:49 PM.

Details

Summary

If you are using lld as a library, and if you are calling lld more than
once in your process, you may encounter a problem that lld runs correctly
only on the first run. This is often caused by global vectors that are
not emptied on each linker invocation.

This patch initializes global vectors with nullptr elements so that
we need to reset them even for the first run of the linker to make
it easy to catch errors.

Event Timeline

ruiu created this revision.Nov 18 2019, 10:49 PM
Herald added a project: Restricted Project. · View Herald Transcript

We have other non-vector global variables that cannot be checked this way: SyntheticSections.h InStruct in, OutputSections.h Out::*, LinkerScript.h script, Config.h config, Relocations.cpp undefs, etc. My feeling is that if we want to detect all potential problems, we can hack a ld.lld server, let ld.lld commands invoked by lit send linking requests to the server. Many tests run several ld.lld commands. This can probably catch most problems. The idea is simple but it may not be worthwhile implemented in the code base, though.

ruiu added a comment.Nov 19 2019, 12:16 AM

The other way of testing this is actually invoking the linker twice from main(), only in a test mode (specified by some environment variable), to see if it works. It is doable, and it should be able to detect all errors, but I'm not sure if we want to run tests twice.

I am not particularly convinced that this will fix the problem, as it simply moves the problem around - instead of having to remember to clean up global variables, now programmers have to remember to initialize them with invalid values. This would certainly make it easier to detect errors once you realize there is one, but if someone silently adds a global variable and forgets to initialize it, you're back at square one. The only reliable way to fix the problem is to create an object whose constructor or destructor is automatically called when necessary, which is why I suggested a global container object. Adding anything to the global container object doesn't require the programmer to remember anything, because the destructor will automatically destruct everything inside of it. If this is simply not possible, some sort of baseline test as suggested by MaskRay is likely your only option.

One thing that might work better than potentially random problems if the containers aren't cleared, is some kind of assert only, or self-diagnosis mode that could give an assert or error message that a data-structure hadn't been cleared. For example assert(inputSections.empty()); assert(In.bss == nullptr);. With asserts checking I think we could get away with a small set of tests to run twice with some --self-diagnostic-option to check the majority of the cases.

I agree with blackhole12 that this doesn't solve the problem for new global variables, it only mitigates refactoring of existing ones although it does provide a strong hint as to what should be done. A RAII container for the existing containers could help, although it again suffers from the problem that we'd need to remember to use one for new globals. I think it would need to be combined with some kind of testing.

Does it make sense to accumulate all global variables in the one place? Like place them to a separate file with initGlobals method?
I think one of their problems is that they spreaded everywhere and initialized from many places.

ruiu added a comment.Nov 19 2019, 8:59 PM

Does it make sense to accumulate all global variables in the one place? Like place them to a separate file with initGlobals method?
I think one of their problems is that they spreaded everywhere and initialized from many places.

Yeah, that might be both practical and effective, let me try that to see how it will look like.