The debug (custom) sections can be generated using D44184 logic. The linker needs to combine/concat these sections and re-adjust relocation records. (The documentation for relocation records can be found at https://github.com/WebAssembly/tool-conventions/pull/50)
Details
Diff Detail
- Repository
- rLLD LLVM Linker
- Build Status
Buildable 17740 Build 17740: arc lint + arc unit
Event Timeline
For a patch like this, I'd like to know the overview of the debug story in the wasm world first, so that I can review this patch in that story line. I can imagine a lot of different strategies to enable debug info for wasm. Is debug info embedded to wasm file or handled as a separate file? Will it be in DWARF format (looks like so)? Can DWARF really support wasm-style memory abstraction? I'd be really helpful if you share your plan.
As now, there is no concrete proposal/specs on the table for wasm debug information and support yet. The D44184 and this patch will export DWARF information as it is generated by LLVM, which is more than enough to export source maps for WebAssembly. At the moment there is some information missing, such as DebugValue, but that's addressable in the future. While this entire work appears to be too experimental, it will help to prototype and identify future debug information format for WebAssembly and start experiments with debuggers earlier.
Is debug info embedded to wasm file or handled as a separate file? Will it be in DWARF format (looks like so)?
The generated by LLVM wasm files will have custom sections with DWARF (?) information in them as it is generated by LLVM. But it is not necessarily what will be published on the web -- the non-functional requirement for the wasm is still to keep binary as small as possible, so the custom sections (including debug one) will not be present in production builds.
Can DWARF really support wasm-style memory abstraction?
We will try to find the way to express that in DWARF structures, if it's not already happening. I'm more concerned about expressing stack machine registers (in DebugValue expressions) via DWARF -- in a worst case, we will need to extend the format to address that.
If its OK with you I'm going to try and split off the generic custom section handling and land that first. See https://reviews.llvm.org/D45297
Since this looks like a straightforward implementation to copy DWARF sections to an output, I think I'm fine with this change overall as long as wasm guys are fine. But I feel more comfortable if you document the debug story somewhere (as a separate doc or as a comment). Currently it is too terse that readers of the code can't tell what is decided and what's not.
wasm/InputFiles.cpp | ||
---|---|---|
314 | Is this a new enum? Looks like it doesn't exist at the moment. |
General support for custom sections is here: https://reviews.llvm.org/D45340
Once that lands this change should get a lot smaller I hope.
I don't think I understand what you are trying to achieve with this patch in the first place, as you didn't explain it anywhere. Can you write a patch description properly so that we can understand the intention of the change and what the change is?
wasm/InputChunks.h | ||
---|---|---|
196 | This function looks a bit odd. This is "InputSection", so an instance of this class is a section. And does a section has "getSection" method? It doesn't sound quite right. | |
197 | We don't use the term "Payload" in lld. We simply call it Data when we have to name it. Looks like this function is unused? Can you remove this? | |
wasm/InputFiles.cpp | ||
46 | getSizeLength sounds a odd name. Aren't size and length synonyms? | |
47 | Let's be consistent. Since this function returns size_t, it should be size_t. But I think it is better to use uint64_t when something can be larger than uint32_t. | |
148 | Are you sure that uint8_t is enough? We don't normally add const when it is obvious. | |
155 | We don't use auto unless it's type is obvious. | |
195 | When you use {}, add braces to all if and else ifs. I.e. if (...) { } else if (...) { } else { } or if (...) ...; else if (...) ...; else ...; are OK, but if (...) else if (...) else if (...) { } else ... isn't. |
wasm/InputFiles.cpp | ||
---|---|---|
112 | Can you use Sym->Function->getFunctionBody() here like you do below? | |
117 | Can you remove this TODO? Shouldn't the value in the file always be section-relative? | |
335 | I don't think undefined section symbols should be possible. In fact, so we even want defined section symbols in the symbol table? They don't have unique names after all. The way ELF handles this is that all section symbols have local binding so should never get added to symboltable. | |
wasm/SymbolTable.h | ||
42 ↗ | (On Diff #144045) | We already type with this name in OutputSections.h. |
wasm/Symbols.h | ||
196 | I don't think we can have undefined section symbols, so maybe we only need one type here? |
wasm/SymbolTable.h | ||
---|---|---|
42 ↗ | (On Diff #144045) | Renamed to InputFilesSectionsSet (not sure how to give a better name for structure that holds and maps InputSections to their InputFiles) |
wasm/InputChunks.h | ||
---|---|---|
28 | No longer needed here? maybe in the cpp file? | |
wasm/InputFiles.cpp | ||
155 | Maybe I'm missing something but SymbolTable seems like the wrong place to store this. Why can't you just replace this code with ? InputSection* IS = CustomSectionsByIndex[Sym->SectionIndex]; return IS->OutputOffset + Reloc.Addend; I'm probably missing something here. |
wasm/InputFiles.cpp | ||
---|---|---|
155 |
Yeah, I'm trying to figure this out atm
Index CustomSectionsByIndex in expectes InputFile section index, but Sym->SectionIndex just contains index for symbol data (not associated with output or input section index). |
Looking much better in general
wasm/Symbols.h | ||
---|---|---|
183 | This extra symbol type seems a little strange to me... Could we put the getOutputSectionIndex/setOutputSectionIndex in the base SectionSymbol class so we don't need a separate CombinedSection? Then we could want to call getOutputSectionIndex on all of the DefinedSection object too. | |
wasm/Writer.cpp | ||
788 | You can just use S->getName() I think | |
800 | I don't see how you can use SymbolIndex here without incrementing it. | |
804 | This loop seems like its doing some different things. I wonder if we can find a way to build CombinedSectionSymbols in some other phase? For example perhaps in createCustomSections? Since for each custom section we need such a symbol? | |
809 | Can you replace these three lines with Sym = &P.first->second and avoid the continue? |
Simplify assignSymtab for CombinedSectionSymbols
Fix createCustomSections for non-referred sections
wasm/Symbols.h | ||
---|---|---|
183 | The CombinedSection comprised from InputChunks of one or more DefinedSections and only CombinedSection will be needed getOutputSectionIndex() for linking section. We have getOutputSymbolIndex() for reloc entries, that is assigned in the assignSymtab. | |
wasm/Writer.cpp | ||
800 | It is incremented at OutputSym->setOutputSymbolIndex(SymbolIndex++); below |
wasm/InputFiles.cpp | ||
---|---|---|
47 | I don't think you need "const" with ArrayRef as Refs are immutable already. | |
83 | Section symbols are always defined. So you can just cast here and never return 0. Also, perhaps getSectionSymbol should just return DefinedSection instead? | |
335 | The error message is little confusing. How about "section symbols cannot be undefined" | |
wasm/Symbols.h | ||
196 | Hot about this: We can have single SectionSymbol which we know is always defined. It can have getOutputSectionIndex() / setOutputSectionIndex() as well as const InputSection *Section;. The Writer could then just pick the first SectionSymbol with a given name to be the in the symbol table? Then we wouldn't need to create any new symbols and we wouldn't need CombinedSection class at all. Would that work? I could be missing something. | |
wasm/Writer.cpp | ||
399 | Is this just to save the Strings? If so does Saver.save() work? | |
491 | A pattern use elsewhere is to simply to assume you can cast to the ramaining type in the final else. e.g: } else { // At this point we know we have CombinedSection auto *S = cast<CombinedSection>(Sym)); ... } That way the cast will fail at runtime if anything goes wrong and you don't need the assert case |
Refactor SectionSymbol (remove CombinedSection and DefinedSection)
Remove InternedStrings
Review items
I like this better. Almost ready to land I think.
I'm mentioned that we can/should remove as much of the checking for ".debug_" as we can since I don't think change is specific the debug sections really, it should work for any custom section that want to include relocations.
@ruiu do you understand what are trying to do here?
wasm/Writer.cpp | ||
---|---|---|
398 | I think we can drop the second part of this condition perhaps? | |
400–401 | now I guess it should be "code, data, or custom sections" | |
778 | SectionSymbolIndices? | |
789 | Can we drop the first clause of this condition? | |
790 | .count(Name) == 0? |
wasm/Writer.cpp | ||
---|---|---|
765 | I don't think you need this since sections symbols are always local right? |
No longer needed here? maybe in the cpp file?