This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Better error reporting for linker scripts
ClosedPublic

Authored by evgeny777 on Nov 17 2016, 6:07 AM.

Details

Summary

This patch shows both file and line for linker script errors, including:

  • Errors occurred in lexer (ScriptParserBase::tokenize)
  • Errors occurred in included files (via INCLUDE directive)

See updated test cases in diagnostics.s

Diff Detail

Repository
rL LLVM

Event Timeline

evgeny777 updated this revision to Diff 78358.Nov 17 2016, 6:07 AM
evgeny777 retitled this revision from to [ELF] Better error reporting for linker scripts.
evgeny777 updated this object.
evgeny777 added reviewers: ruiu, rafael.
evgeny777 set the repository for this revision to rL LLVM.
evgeny777 added a project: lld.
evgeny777 added subscribers: grimar, ikudrin, llvm-commits.
grimar added inline comments.Nov 17 2016, 6:24 AM
ELF/LinkerScript.cpp
1140 ↗(On Diff #78358)

In one of my latest patches I found that it is confusing to have multiple lines in output that are starting from "error: xxx".
I think we should not call error() for more than a single error at once.

1159 ↗(On Diff #78358)

<removed (have some issue with applying del on comment in phab, it does not want to do that. I have no time to wait until it do, so had to edit in that way)>

evgeny777 added inline comments.Nov 17 2016, 6:54 AM
ELF/LinkerScript.cpp
1140 ↗(On Diff #78358)

Here multiple lines are used to highlight error position, which might be useful, especially when you run linker from command line. I would keep this bearing in mind that we already have tests for it.

ruiu edited edge metadata.Nov 17 2016, 9:23 AM

I wonder if we can remove INCLUDE directive from the linker script. I added that without thinking about it, but I've never seen any use of it. Linker scripts that add other linker scripts usually add that with INPUT() directive.

This is a question, I cannot answer :). You can't replace INCLUDE with INPUT and vice versa, but if you remove it then this patch will be much simpler.

ruiu added a comment.Nov 17 2016, 12:01 PM

Let's try to remove it to see if we actually need it. Yeah, courage.

Looks like I found few projects which use INCLUDE directive:

http://libopencm3.org/wiki/Run_From_RAM
https://github.com/cobyism/edimax-br-6528n/blob/master/AP/mkimg/RTL8196C_1200_tools/libstrip/libstrip
https://github.com/RIOT-OS/RIOT/blob/master/cpu/stm32f4/ldscripts/stm32f415rg.ld
https://chromium.googlesource.com/chromiumos/third_party/coreboot/+/firmware-uboot_v2-1299.B/src/arch/x86/Makefile.bootblock.inc

The last one seems to be a chromebook boot loader: it includes script files in auto-generated linker script ldscript.ld

I don't know how important is to support those in lld, but please confirm that you want to get rid of INCLUDE

ruiu added a comment.Nov 18 2016, 12:06 AM

I'm not speaking for Google, but Chromebook boot loader is probably a large user we can't ignore, so I'm inclined to not remove INCLUDE. I'll review your patch tomorrow.

evgeny777 updated this revision to Diff 78482.Nov 18 2016, 1:01 AM
evgeny777 edited edge metadata.
evgeny777 removed rL LLVM as the repository for this revision.

Simplified and add test case

rafael added inline comments.Nov 18 2016, 6:31 AM
ELF/LinkerScript.cpp
1020 ↗(On Diff #78482)

That is pretty much what a MemomryBufferRef is. Can't you store that instead of the std::pair?

ruiu added inline comments.Nov 18 2016, 9:57 AM
ELF/LinkerScript.cpp
1131 ↗(On Diff #78482)

Currently, error reporting is zero-cost if there's no error (we find an error location after an error is raised), and that seemed to be a nice hack to use a StringRef's pointer to find a location. However, it didn't work well for INCLUDE as you know. I think we don't want to keep the hack that's proved to be working not well.

Linker scripts are small, so it is OK to do more and use more memory when tokenizing them.

How about this? We can change tokenize to return not only tokens but token locations as a parallel array, like this.

std::pair<std::vector<StringRef>, std::vector<std::string>>
tokenize(MemoryBufferRef MB);

ScriptParser class then store the vectors to Tokens and LineNo.

ScriptParser class then store the vectors to Tokens and LineNo.

You also have to store file name for each token and I think this is too much. How about storing a vector of MemoryBufferRef instead of 'Input' in ScriptParserBase? Please take a look at the updated diff.

evgeny777 updated this revision to Diff 78704.Nov 21 2016, 5:06 AM

Moved file management to ScriptParserBase. Now errors in version scripts and dynamic lists are handled in a new way as well

ruiu accepted this revision.Nov 21 2016, 6:13 AM
ruiu edited edge metadata.

It's much better than before, but I still think that we could do more by using parallel arrays. We want to maintain four tuples (filename, line content, line number, column number) for each token, and all of them are small (they are intetgers and StringRefs). So I think it's not going to be too much.

That being said, we can do that later. It is indeed a very good improvement. Thanks!

LGTM

ELF/ScriptParser.cpp
159 ↗(On Diff #78704)

I'd rather just use Tokens[Pos - 1] instead of current() everywhere because this function is used from those who already know internals of this class.

ELF/ScriptParser.h
32 ↗(On Diff #78704)

Make this private.

36 ↗(On Diff #78704)

Ditto

This revision is now accepted and ready to land.Nov 21 2016, 6:13 AM
This revision was automatically updated to reflect the committed changes.