Allows to open and successfully parse Linker Scripts in UTF-8 BOM encoding.
Details
Diff Detail
- Repository
- rLLD LLVM Linker
Event Timeline
We happen to have BOMs in the linker scripts for our custom platform because they were authored in Windows (where we cross-compile from).
This doesn't seem very necessary. GNU ld rejects BOM as well. I think you can achieve cross compilability by simply removing BOM.
% xxd a.x 00000000: efbb bf45 4e54 5259 285f 6c61 6265 6c29 ...ENTRY(_label) 00000010: 0a % ld.bfd a.o -T a.x -o a ld.bfd:a.x:1: ignoring invalid character `\357' in expression ld.bfd:a.x:1: ignoring invalid character `\273' in expression ld.bfd:a.x:1: ignoring invalid character `\277' in expression
ld ignores the BOM, but still links. lld mistakes it for part of the next token and generates an error.
A BOM is perfectly valid at the start of a UTF-8 file (though not very useful, granted). Why not support it? Other tools under the LLVM umbrella do, e.g. clang.
We need a good reason to support it, not the other way around. ld emitting warnings is a pretty clear signal that BOM is not a good idea in a linker script.
I disagree, ld emitting invalid character warnings is simply a good indication that it was never tested with such an input :-)
If the community thinks this bug isn't important enough to warrant the two-line fix, then so be it, I won't push the matter. No hard feelings.
This was merely an attempt to fix another instance of a common Windows-incompatibility bug that we happened to come across with a real-world linker script.
This needs a test case, and it's a good idea to upload patches with more context (use -U99999 if generating a patch via git to upload to the web interface, or just use arcanist).
... I actually think this is a good indication that your tests have never been verified. If you append --fatal-warnings (like the compiler driver option -Werror), ld.bfd will return 1.
If the community thinks this bug isn't important enough to warrant the two-line fix, then so be it, I won't push the matter. No hard feelings.
This was merely an attempt to fix another instance of a common Windows-incompatibility bug that we happened to come across with a real-world linker script.
From your arguments I never see the reasoning that this should be supported ;-) In some cases lld is more rigorous - that oftentimes identifies brittle constructs. For this BOM case it may be hard to argue it is brittle, but from the GNU ld warning I really can't say it should be supported.
The thought "UTF-8 BOM is not useful" may just be ingrained in my mind. I probably just don't know enough of Unicode to simply say "this is going to be useful in lld". Probably raise an issue on https://sourceware.org/ and see what the GNU ld maintainers say?
Other tools under the LLVM umbrella do, e.g. clang
Please also keep in mind that clang doesn't support options such as -finput-charset -fexec-charset. Its UTF-8 support is just enough that does not make people feel sad.