Currently, no one owns script::Parser buffers, but yet ELFLinkingContext gets
updated with StringRef pointers to data inside Parser buffers. Since this buffer
is locally owned inside GnuLdDriver::evalLinkerScript(), as soon as this
function finishes, all pointers in ELFLinkingContext that comes from linker
scripts get invalid. The problem is that we need someone to own linker scripts
data structures and, since ELFLinkingContext transports references to linker
scripts data, we can simply make it also own all linker scripts data.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Hrm, looks like it's doing too much to solve the issue we have. We basically want the linker script memory buffer to have the same lifespan as the linking context. Instead of adding a ScriptInstance to the linking context, can we add a method to transfer the ownership of a memory buffer to the linking context? Then we modify script::Lexer to not take the ownership of a given memory buffer.
Although I don't see major problems with your patch, I like Rui's proposed approach better.
The reason I created a new class that encapsulates the parser is because I
plan to later add code to this class (or to another new class called
ScriptSema that is owned by ScriptInstance) that cooks the AST with
semantic info to answer high level questions about the linker script, such
as "in which output section to put this input section, according to this
linker script?". Anyway, I will need to find a location to put new data
structures to store this data, and I thought of this as a first step. What
do you think?
I didn't think hard enough about that, but it might be good to have a class to cook linker scripts. But if you have such class, you can instantiate that class in the driver and only set that class instance to the linking context, no? You wouldn't have to add a pointer to linker script ASTs to the linking context, I guess.
I prefer a patch that does one thing; in this case, we should try to fix the unsafe memory access first, so I'd fix that without adding any new feature.
I also want to solve the problem of this patch (make ELFLinkingContext own
the buffers) without mixing with future directions on cooking linkerscript
AST. However, we will essentially end up with a vector of BumpPtrAllocators
in ELFLinkingContext (one for each linker script), which makes the code, in
my opinion, much harder to understand (it's better to have "hey, these are
a bunch of linkerscripts" than "hey, here are a bunch of unnamed buffers
that need to be anchored somewhere"). What are the benefits of this? We
avoid including LinkerScript.h into ELFLinkingContext.h, but that is
something that we will eventually need to do to avoid rewriting all classes
in LinkerScript.h in their "cooked" version, don't you think?
For example, if you look specifically at the problem of mapping input
sections to output sections, we already have classes that encode this same
information in the script namespace (script::OutputSectionDescription in
the raw script AST) and in SectionChunks.h (template class
lld::OutputSection<>), used by DefaultLayout. I wanted to avoid creating
yet another class such as "CookedOutputSectionInfo", and that's why I had
an intent on preserving the original linker script AST structure in the
LinkingContext -- mostly to avoid rewriting the same info in different data
structures that stores the same information that is already exposed in the
AST.
Let me know your thoughts and thanks for reviewing this.
Adding a test might give more information on the functionality that you are trying to add too.
include/lld/ReaderWriter/ELFLinkingContext.h | ||
---|---|---|
295–311 ↗ | (On Diff #19115) | Why do you recognize the master linker script from the rest ? The linker can use more than one -T option to specify more than one linker script too. Input files that are linker scripts are positional too. For example : ld 1.o -lc Where libc.so is a linker script that needs to be handled in command line order. |
include/lld/ReaderWriter/ELFLinkingContext.h | ||
---|---|---|
304 ↗ | (On Diff #19168) | Add a blank line after this line. |
include/lld/ReaderWriter/LinkerScript.h | ||
748 ↗ | (On Diff #19168) | Maybe we can write "= delete" instead of LLVM_DELETED_FUNCTION? I believe the macro is for pre-C++11 compilers. |
765 ↗ | (On Diff #19168) | This function allows the user to lazy-evaluate linker scripts, but I think lazy evaluation is not a good idea here. I'd like get errors as soon as possible if there's an error in a linker script. How about this. Change the signature of parse to std::error_code parse(). We call this function after instantiating the class. If it returns an error, it means there's an error in the linker script. If it doesn't return an error, subsequent getTopLevelNode will always succeed. Also, getTopLevelNode seems a bit too long. Maybe get is enough? |
I committed with LLVM_DELETED_FUNCTION because "= delete" breaks my MSVC 2012 box. After we move on to officially only support MSVC >= 2013, we can drop this macro.