This linking backend is a work in progress, but is enough
to link simple programs including linking against library
archives.
Details
Diff Detail
- Build Status
Buildable 10520 Build 10520: arc lint + arc unit
Event Timeline
wasm/Writer.cpp | ||
---|---|---|
242 | It needs comment -- it is not clear to me what this is for. | |
286–288 | I don't think you need this as you can do this at the beginning of writeTo(). You need a finalizeContents if you don't know the size of a section unless you call that function, but this is not the case. I guess you can eliminate finalizeContents entirely from this patch. | |
415 | Debug code? | |
419 | clang-format | |
513 | Flip the condition and do early return. | |
885–887 | You can remove these three {s. | |
890 | Can you add a comment? I think I do not understand this wasm-specific function. | |
916 | Ditto | |
916 | Location -> Buf By convention, Buf should be the first parameter. | |
927 | I think you can merge this and the following switch by removing Encoding variable. | |
943–944 | Instead of computing the size of a padding, I think it is better to just write 5-bytes yourself. I'd define write{U,S}LEB128(uint8_t *Buf, {u,}int32_t) which always writes 5 byte to Buf. | |
958 | I think I do not understand this function. Is this to create dynamic relocations? Please write a comment. | |
1022 | What is this for? Is this a .rel.dyn section in wasm? | |
1042 | This needs a comment. | |
1059 | Ditto. Please describe it for those who are not very familiar with wasm. | |
1124 | Looks like writeOutput is used only once, and writeOutput is a one-liner. You can inline it. | |
1134 | Please write a comment so that I can understand what this function is supposed to do. | |
1137 | MemoryPtr = alignTo(Config->GlobalBase, DataAlignment); | |
1243 | What are Memory and Table? | |
1252 | What is Elem? | |
1335–1338 | "Num" is a more popular prefix than "Total" to manage a number of occurrences of something. |
wasm/Error.cpp | ||
---|---|---|
22 | Can we use: #ifdef HAVE_UNISTD_H #include <unistd.h> #endif |
- Merge remote-tracking branch 'origin/master' into wasm
- [WebAssembly] Update relocation names
- [WebAssembly] Feedback from code review
- [WebAssembly] clang-format the patch
- [WebAssembly] Feedback from code review
- [WebAssembly] Factor out write utils
- [WebAssembly] For codereview feedback
tools/lld/CMakeLists.txt | ||
---|---|---|
21 | I'll just remove the link for now. The linker can be/is used via flavor=wasm. | |
wasm/Driver.cpp | ||
211 | I ran clang-format over the whole patch to fix this and the many other places like it. | |
wasm/Error.cpp | ||
22 | I don't think lld currently includes llvm's config.h. This is the exact same code in COFF/Error.cpp and ELF/Error.cpp so I think changing this can/should probably be done in a separate change. | |
wasm/InputFiles.cpp | ||
143–144 | I'm not sure I understand exactly what you mean. If I make WasmObj a regular pointer won't I have to manually delete it in the destructor? Why is that better? (Sorry C++11+ is kinda new to me). FWIW, this is exacly how the COFF linker handles this (See COFF/InputFiles.cpp:92) | |
wasm/SymbolTable.cpp | ||
40 | Yes. In fact all lazy symbols are undefined. I believe this is try to ELF and COFF linkers too (i.e. LazyKind > LastDefinedKind) | |
85 | Interesting, my intuition here is to check for error cases an early out.. so the that fall through at the end is the success path. But I guess this makes sense. | |
86–91 | Also added a test for this. | |
wasm/Symbols.h | ||
31 | Copied from lld/COFF/Symbols.h. | |
49 | Ok, I changed it so that lazy symbols are not undefined. This simplified the code a few places too. I had been assuming that all symbols were either defined or undefined, but it seems that doesn't need to be that case. | |
wasm/Writer.cpp | ||
103 | All these functions are for creating the synthetic sections, which are created in memory prior to mmaping the output file (since their size needs to be known). | |
103 | I did some refactoring and put all these writer utility functions in a separate file. | |
173 | Yes. Only custom sections have names. All the known sections are identified by their integer code. | |
186 | Basically I'm printing extra information here in order to override the default behaviour of write_uleb128. Maybe the debug output a little more useful, but maybe over complicated? | |
239 | For synthetic sections we need to fully serialize them in order to know that size of the final output file. i.e. we need to write them out before we can mmap the file. This is only true for the synthetic sections. | |
286–288 | Actually we don't know the size of the section until we synthesize its header, however you are right we can just this in the constructor and avoid implementing this method at all. | |
916 | Renamed for clarity | |
943–944 | I agree. But encodeULEB128 is defined in include/llvm/Support/LEB128.h and it takes a padding count. It should really take a total byte count to avoid having to do this calculation. But that should probably be a separate change. I work on that in parallel. | |
1022 | This is for '-r' since the output object needs to contain all the relocs in that case (I assume this is the same for ELF/COFF)? | |
1243 | Memory defines the linear memory required my the wasm module. It is essentially just and initial and max size. Tables currently only used for indirect function calls. All indirect function calls are done via that table. Taking the address of a function in C causes that function to be added to this table. This is because functions don't themselves live in addressable memory. | |
1252 | These are the element of the table (the indirect function call table). |
ELF/Error.cpp | ||
---|---|---|
19–22 ↗ | (On Diff #113607) | What is this change? |
tools/lld/lld.cpp | ||
37 | I don't think lld -flavor <flavor> is a preferred way of dispatching, and I'd avoid using it in wasm. If you use the option, and someone implements a different linker for wasm, it still has to claim it is lld and support -flavor option. How about ld-wasm? If, for example, bfd wants to support wasm, they could name their linker as ld-wasm.bfd. | |
wasm/Driver.cpp | ||
113–116 | By convention, I'd rename Result -> Res and SymbolName -> Sym. | |
142–143 | Remove a comment about sysroot. | |
207 | I don't think this function has to be a member of LinkerDriver. You can just move the code from here to Archive::addMember, no? | |
291–295 | This is probably a bit simpler: // Parse -allow-undefined-file <filename> if (auto *Arg = Args.getLastArg(OPT_allow_undefined_file)) if (Optional<MemoryBufferRef> Buf = readFile()) for (StringRef S : getLines(*Buf)) Config->AllowUndefinedSymbols.insert(S); | |
297–304 | Use error instead of fatal as fatal is a noreturn function but we want to usually do not exit on the first error. A rule of thumb is to use fatal only for corrupted file. For other errors, use error. | |
wasm/InputFiles.cpp | ||
72 | I wouldn't assert(Sym) because Symtab->find(Name)->isDefined() should immediately fail with SEGV if find returns a nullptr. | |
82 | Ditto | |
98–99 | We avoid hash table lookup as much as possible because it's too slow. Instead of storing symbol names to vectors, we usually store Symbol * or SymbolBody * to vectors, so that we don't need to lookup a hash table. Can you do that? Basically, we don't want to lookup a hash table more than once for each string instance. If you do more than once, then you probably should keep the hash table lookup result (i.e. Symbol * or SymbolBody *) so that it doesn't have to repeat the same hash table lookup. I think this is one of the reasons why lld is much faster than other linkers. | |
99–101 | It feels to me that the code has probably a bit too much asserts. Can you simply return Symtab->find(Name)->getOutputIndex()? | |
wasm/InputFiles.h | ||
108 | Could you add blank lines before the block comments? | |
wasm/SymbolTable.h | ||
23–25 | This is not used in the header. | |
wasm/Symbols.h | ||
58 | Isn't function global? | |
wasm/Writer.cpp | ||
100 | It is still using the streaming output interface, but is there a reason to prefer that? Since we are using mmap-based or std::vector-backed buffer exclusively in lld in general, I don't like to introduce a new way of writing to a buffer unless it is absolutely necessary. | |
102 | Do you have to use formatv? It seems you can do "section type [" + sectionTypeToString(Type) + "]". | |
256 | Outline this constructor and writeTo function. | |
497 | Calculates |
- Merge branch 'master' into wasm
- [WebAssembly] Update expectations now that SymbolInfo not a required YAML field.
- [WebAssembly] Don't try to export main if its undefined
- [WebAssembly] Review feedback
wasm/Driver.cpp | ||
---|---|---|
291–295 | I think I originally make the argument handling and the file processing seperate since I imagined a future where these files might grow in number and/or complexity (i.e. a set of default files in addition to the command line provided one). But I guess we can add that complexity where we get there. | |
wasm/Symbols.h | ||
58 | Wasm symbols are modeled as either either functions, or globals (in the wasm sense). Wasm globals correspond to C/C++ globals and have addresses in linear memory. So a given symbol is either a wasm function or a wasm global. | |
wasm/Writer.cpp | ||
100 | For writing these synthetic sections that contains a lot of LEB encoded data I think this stream interface makes a lot of sense. The LEB encoding functions can write directly to them and they grow on demand. Without this we would need to invent a new kind of auto-growing buffer that does basically the same thing. We can't know how big the vector/buffer is up front because we don't know the length of the LEBs until we encode them. |
- [WebAssembly] Fix index for R_WEBASSEMBLY_MEMORY_ADDR_* relocations
- Merge remote-tracking branch 'origin/master' into wasm
- [WebAssembly] Don't re-export entry point as 'main'
- [WebAssembly] Allow mulitple wasm data segments
- [WebAssembly] Update tests to match changes in llvm
- [WebAssembly] Update LEB to match changes to LEB128.h
- [WebAssembly] don't test data-layout with -r
- [WebAssembly] Refactor code to create OutputSections.cpp
Sorry for the slow response. This patch is huge and needed concentration. Generally,
test/wasm/archive.ll | ||
---|---|---|
2 | All these tests need "REQUIRES: wasm" so that they won't fail on buildbots that don't have wasm enabled. | |
wasm/Driver.cpp | ||
181 | I'd use fatal only to report corrupted input files. Since this is just a command-line error, I'd use error instead. This way, we can report more than one error. | |
229 | This is -help. (/help is a Windows option.) | |
268–271 | This is more consistent with other option handlers. if (auto *Arg = Args.getLastArg(OPT_allow_undefined_file)) if (Optional<MemoryBufferRef> Buf = readFile(A->getValue())) ... Also this code catches error when you pass -allow-undefined-filename ''. | |
275–279 | Please use error(). | |
278 | hasArgNoClaim is the same as hasArg in lld's usage of the class, so I"d use hasArg. | |
wasm/Driver.h | ||
1 ↗ | (On Diff #115885) | It seems everything declared in this header is actually being used only by Driver.cpp, so can you move them to Driver.cpp? |
wasm/InputFiles.cpp | ||
14 | Driver.h is unused. | |
167–172 | Do you think you can just insert symbols to the symbol table instead of temporarily storing names to these FunctionImports or GlobalImports variables? I'd just add an Undefined symbol (or something more appropriate) here and add its return value (a Symbol *) to FunctionSymbols and GlobalSymbols, so that I can remove FunctionImports and GlobalImports functions as well as getFunctionSymbol and getGlobalSymbol accessors. | |
184 | Inlining this seems more straightforward since the function is short and used only once: Symbols.push_back(Symtab->addUndefined(this, &Sym)); | |
188 | Symbols.push_back(Symtab->addDefined(this, &Sym)); | |
209 | Here's a question: do you have to use llvm-ar to create archive files for wasm files? I guess the answer is yes because regular ar commands don't understand wasm files so they cannot create symbol tables for them. We have the same issue for LTO, because for LTO builds, input files are LLVM bitcode files which regular ar commands don't know about. We found that the problem is very annoying and using the right ar command is sometimes pretty hard. So we have this hack in ELF. You don't want make a change to this patch at the moment, but I think this is something you want to consider. | |
267 | We found that it doesn't make much sense to strip directory name components. I wouldn't call getBasename() on them. | |
wasm/OutputSections.cpp | ||
30 | Can you return StringRef so that it becomes more llvm-ish? | |
57–58 | If the use of fatal is correct (if this is to report a corrupted input file), then please remove return after that, since fatal is a _Noreturn function. If not, please use error. | |
62 | toString functions should be named in lld namespace instead of lld::<arch> namespace, so that it doesn't hide overriden toString functions in lld namespace. | |
63 | "section: " is not part of a name, so I'd remove it. If you need this label, print it out on caller side. | |
64 | This is perhaps a bit easier to read. std::string S = sectionTypeToString(Section->Type); if (!Section->Name.empty()) return S + "(" + Section->Name + ")"; return S; | |
81 | Looks like you can just return the RHS value instead of assigning it to a temporary variable. | |
98 | fatal is a _Noreturn function, so no need to add break after it, as it is unreacahble. | |
wasm/OutputSections.h | ||
25–26 | I don't think you need this constructor because, instead of OutputDataSegment(Foo, Bar) you can write like this: {Foo, Bar} | |
54 | Offset = 0 is a preferred way of initializing a member if it always initialized to 0. | |
68–71 | nit: you could write without reassignments: memcpy(Buf + Offset, Header.data(), Header.size()); memcpy(Buf + Offset + Header.size(), Body.data(), Body.size()); |
wasm/OutputSections.cpp | ||
---|---|---|
107 | Can you use ArrayRef? | |
148 | Is it okay to fallthrough? If so, add LLVM_FALLTHROUGH. | |
151 | Why you can't directly call encodeULEB128(Reloc.Value, Buf, 5); here? | |
156 | Ditto -- it seems you could call encodeSLEB128(Reloc.Value, Buf, 5); here. | |
165 | Previous switch doesn't have a default, so the control could reach here with Encoding uninitialized, no? | |
166–173 | Remove {} | |
196 | I believe you don't need this because raw_string_ostream::~raw_string_ostream would do this for you. | |
202 | If BodySize is always initialized to 0, the preferred way is to declare the variable with the initial value (i.e. BodySize = 0;) | |
214 | remove const as it is obvious that we don't mutate Content. | |
231–232 | You want to merge successive log into one function call because otherwise other thread could write other lines between the two lines. | |
wasm/OutputSections.h | ||
51 | I don't know why you need to aggregate input sections' relocations into one place. For COFF or ELF, we don't do that. Each input section has writeTo method and knows how to copy itself to a given buffer and relocate itself. Can't you do in the same way? | |
wasm/Symbols.cpp | ||
47 | If you know File is of ObjectFile, use cast instead of dyn_cast. | |
wasm/Symbols.h | ||
95 | I geuss you can initialize ArchiveSymbol = {nullptr, 0, 0}; | |
107–110 | In lld we don't use operator overloading unless it is absolutely necessary. Since you can explicitly call toString() before emitting it to an output buffer, you can live without this. I'd remove this function. | |
wasm/Writer.cpp | ||
36 | constexpr instead of const may make more sense. | |
70 | Do you need this? | |
145 | We use pre-increments unless there's reason to use post-increments. | |
194–196 | You can omit {} | |
207–211 | You can omit {} | |
267 | is? | |
346 | Remove an empty function. | |
376 | auto -> auto * | |
385 | auto -> auto * | |
505 | We usually name this Offset. Is uint32_t enough? I thought 64-bit executables could be larger than 4 GiB. | |
585 | I'm a bit confused about what this function does. Does this calculate file offsets as opposed to virtual memory offsets? In general, this file needs more comments. | |
608–613 | Looks like it's the same as if (WasmFile->memories().size() > 1) fatal(File->getName() + ": contains more than one memory"); | |
628 | No else after fatal, as fatal does not return. | |
765–771 | You could write void wasm::writeResult() { Writer().run(); } without these namespaces. | |
wasm/Writer.h | ||
23–27 | It seems you are not using this struct. | |
wasm/WriterUtils.cpp | ||
24 | Use the LLVM naming convention. | |
35–36 | fatal is _Noreturn. | |
40–41 | Instead of enclosing the entire file, can you explicitly add wasm:: to each public functions? | |
wasm/WriterUtils.h | ||
40–44 | These two functions look the same. |
I'm sorry that I accidentally hit "Submit" button while I was writing a comment.
So, generally, this patch is improved in a right direction. There are still some code that I don't understand yet, mainly in Writer, but it is probably due to my lack of knowledge about wasm.
Thank you for all the feedback @ruiu. It not surprise to me that a patch of this size takes a long time to review. I'm currently working on a fairly large change to allow data sections with the same name to be combined into a single section.. in order to allow things like init_array/fini_array. My aim is to get that done before the initial version of this patch lands.
- [WebAssembly] Update test output to match llvm changes
- [WebAssembly] only run wasm tests when wasm is enabled
- [WebAssembly] Review feedback
test/wasm/archive.ll | ||
---|---|---|
2 | I added lit.local.cfg to this directory to disable them all in this case. | |
wasm/InputFiles.cpp | ||
167–172 | Yes, you are right. The work i'm doing to combine data sections will address, and makes this code much cleaner. So this will be fixed on my next upload. | |
209 | Yes, llvm-ar would be needed to great the archive indexes. I see yes, it might be good to handle archives with no symbol table. I'll add a TODO note here. | |
267 | Ah, I was copying from the COFF linker. We should probably make this consistent at some point (perhaps as good candidate for shared code?) | |
wasm/OutputSections.cpp | ||
57–58 | switched to llvm_unreachable | |
196 | True, but the getSize() below depends on Header, so we need the value before the end of the function. I could refactor to avoid this, especially since its only a logging statement. | |
wasm/OutputSections.h | ||
25–26 | I'm creating these using emplace_back() which seems to require a constructor. I could just use push_back() instead I guess? |
Is this ready for another round of review?
wasm/WriterUtils.cpp | ||
---|---|---|
38 | More conventional way of doing it is to add using namespace lld; at the start of each file and do not enclose a file with namespace lld {}. |
- Merge branch 'master' into wasm
- [WebAssembly] Use UINT32_MAX rather than 0 for adddresses of undefined globals/functions
- [WebAssembly] Combine data segments with the same name
- Reflect upstream path changes to include/lld/common (#7)
- Delete lldConfig library from CMakeFiles.txt
- Add lldCommon to wasm LINK_LIBS (left over from last update)
- Merge branch 'master' into wasm
- Fix a merge conflict
- Merge remote-tracking branch 'origin/master' into wasm
- [WebAssembly] Ensure than name section is in function index order
- [WebAssembly] Fix for SLEB relocations of type UINT32_MAX.
- Merge remote-tracking branch 'origin/master' into wasm_work
- [WebAssembly] Switch to Common/ErrorHandler.h
- [WebAssembly] use Common/Threads.h
This patch generally looking good. Looks like all my big concerns have already been resolved.
I have a little concern on documentation. Since wasm is a new object file format, and it employs different concepts than ELF or COFF, we cannot assume any prior knowledge to readers on wasm file format or linking semantics other than the common knowledge about linking. So, in short, I think this patch needs a bit more comment. Particularly, comments from high level would help readers understand your code, because once you understand a concept, minor details of your code will look pretty obvious. (This is a pretty straightforward linker implementation, so once you understand what it is supposed to do, it is easy to understand the code, I think.)
tools/lld/CMakeLists.txt | ||
---|---|---|
21 | Looks like <arch>-<toolname> is more common than <toolname>-<arch>. For example, cross gcc for AVR installed on my system is avr-gcc and not gcc-avr. So, maybe wasm-ld is a better name than ld-wasm. Sorry for back-and-forth. | |
wasm/InputFiles.cpp | ||
136 | Please add a brief function comment saying that this function returns a segment that contains a given symbol. | |
156–159 | nit: pre-++ is a bit more C++-ish. | |
234 | count++ → ++Count | |
245 | each other.) | |
wasm/InputFiles.h | ||
84–86 | They are private functions, no? | |
118 | nit: add a blank line. | |
124–128 | Can you add a blank line before each line comment? | |
wasm/InputSegment.h | ||
7 ↗ | (On Diff #120681) | Can you describe what is "segment" in wasm? I was thinking that segments are another name for output sections, but looks like it is a different concept. |
wasm/OutputSections.cpp | ||
34 | Return StringRef instead of const char *. | |
57–58 | Looks like this function is not updated. | |
wasm/Symbols.cpp | ||
36 | Can you add function comments to functions defined in this file? | |
wasm/Writer.cpp | ||
70 | Looks like you didn't update this file for my previous comments. |
- [WebAssembly] Handle early exit via exitLld
- Merge remote-tracking branch 'origin/master' into wasm
- [WebAssembly] Fix --verbose flag
- [WebAssembly] Fix gcc warnings/errors
- [WebAssembly] Fix code relocation for large output files
- Merge remote-tracking branch 'origin/master' into wasm
- [WebAssembly] Support comments in wasm.syms
- Merge remote-tracking branch 'origin/master' into wasm
- [WebAssembly] Update for llvm change
- [WebAssembly] Review feedback
Hi ruiu.
I've added some documentation, removed the whole-file namesspace wrapping, integrated with the recent libCommon changes and upstream llvm changes.
Hopefully we can land this soon, so we can move to incremental diffs that will be easier to review. Or maybe you think I think try to be more feature complete for the initial version?
I'd write more documents to describe the wasm object file format and the wasm linker's semantics, but we can do that later.
LGTM with these changes.
Thank you very much for your patience!
docs/WebAssembly.rst | ||
---|---|---|
8 ↗ | (On Diff #122741) | mimic |
33 ↗ | (On Diff #122741) | functions |
wasm/Driver.cpp | ||
83 | If you change this to LinkerDriver().link(Args), can you remove global variable Driver? | |
151 | Could you revert this '#' check if you don't need it now? I mean if we want to add a way to have a comment in some file, we want to do the same thing for ELF and wasm, instead of inventing a different way for each target. |
- Fix use after free
- Merge remote-tracking branch 'origin/master' into wasm
- [WebAssembly] Add test case for output >128 functions
- [WebAssembly] Refactor the way relocations are applied and written
- [WebAssemlby] Parallel code writing
- [WebAssembly] Parallel data writing
- [WebAssembly] clang-format
- [WebAssembly] IWYU
- fix typo in addDefinedGlobal
- Merge branch 'master' into wasm
- review feedback
wasm/Driver.cpp | ||
---|---|---|
103 | After this was changed from const to constexpr, VS 2017 compiler on the waterfall bot does not compile this, saying: tools\lld\wasm\Options.inc(28): error C2131: expression did not evaluate to a constant tools\lld\wasm\Options.inc(28): note: failure was caused by non-constant arguments or reference to a non-constant symbol I'm not really sure what the problem is, since this looks fine to me, but anyway could you possibly revert this to const, to make the VS2017 happy? |
No problem! Glad to land this initial version. Lets hope it sticks ..
docs/WebAssembly.rst | ||
---|---|---|
33 ↗ | (On Diff #122741) | Reworded. |
wasm/Driver.cpp | ||
103 | Ok, I reverted that for now. The other linker backed all use const rather than `constexpr` anyway. | |
151 | This is already a completely wasm-specific feature. And it quite useful to support comments at the moment. See: https://github.com/jfbastien/musl/blob/wasm-prototype-1/arch/wasm32/wasm.syms In the long run I think we might be able to find a better solution to this. Some way to inform the linker about the expected undefined symbols. Maybe some kind of declspec in the source code? |
All these tests need "REQUIRES: wasm" so that they won't fail on buildbots that don't have wasm enabled.