It's fairly straightforward - we just need a Live flag on the symbol to track whether undefined symbols were accessed or not.
InputGlobal has a "Live" member, and this patch adds a new "Live" member to Symbol which seems odd to me. It feels like something isn't quite right...
nit: you can merge these two ifs.
I don't think we should add a notion of "liveness" to symbols. It is easy to be confused, but "liveness" is for chunks, data, globals, etc, and symbols themselves are not subject of GC. I understand that since each wasm global have only one symbol associated with it, that distinction is ambiguous, but I still want to maintain that distinction to make things easy to understand.
I don't really like the yaml test case, but hopefully it will be shortlived.
I think getChunk can now be removed and inlines into isLive and getLive, it think it will be more readable that wasy.
Maybe assert isUndefined here?
But for undefined symbols we need to know which ones to write to the relocatable output. For undefined symbols there is no chunk.
We already have in in ELF/Symbols.h: unsigned Used : 1;
If you prefer we can give it the same name here for consistency.
I'm not sure there is any difference between Used and Live though, so it might be worth renaming the one in ELF?
If symbol is referenced by someone else, we have to do something in some cases. E.g. we have to put it to the dynamic symbol table if it is being used. But that's different from garbage collection.
Thanks for the comments, I'll deal with those nits.
Would it be a good compromise to rename "Symbol::Live" to "Symbol::ImportLive"? That would make it clear that it's the imported function object that's being GC'd, not the Symbol itself. Even more strictly, I could move the Live bit to the UndefinedGlobal/UndefinedFunction/UndefinedData classes (although that's more mess code-wise it might be seen as clearer?).
I'd have to check if that's possible, or it would break aliasing. Liveness isn't really a property of the Symbol, it conceptually is a property of the "thing" that the symbol points to. It's just that for undefined symbols there is no "thing". To me, it's cleaner to avoid getting/setting the Live bit on the Symbol except in the undefined case where we have to.
On a side note - A Wasm global is just like any other symbol. There can be many Symbols for the InputGlobal just as there can be many Symbols aliasing the same InputFunction.
I agree that we don't want to GC symbols. I've added a Live bit to the Symbol, not in order to GC the symbol, but in order to GC the function/global object which is imported. The Symbol itself is not GC'ed, but in the case of undefined Symbols, there isn't any existing object to hold the "Live" flag. It has to go somewhere on the Symbol, although it could be renamed to "Used" or heavily commented to clarify.
Regardless of whether a symbol is used/live, it's always included in the symbol table - no pruning is being done there.
This is a little like the dynamic symbol table. Our imports a the moral equivalent right now of dynamic undefined symbols. We want to populate it only with those symbols that we use. This change as an incremental improvement makes sense to me.
I don't think that last statement is true is it?
If an undefined symbols it not used (or not live if you prefer), it should not appear in the symbol table, and it should not appear in the imports of the final binary.
- rebased and fixed conflicts
- Move Live flag from Symbol to UndefinedFunction::ImportLive to clarify what it is that we're marking live
- small nits based on feedback
I've inlined it if that's neater for you (and converted to switch after inlining), but it'll still be used in MarkLive
Currently, symbol table is only written out when there's no GC, and the full contents of LLD's Symtab are written out. But you're right it might not be true in future if we were ever to write out a symtab from LLD for non-relocatable output.
I think this was nicer how you had it before. Why not just use a single bit on the symbol base class? Especially since we are about to add undefined data to the final output.. we don't want to repeat the ImportLive again.
I also don't have a problem with the Live bit of a symbol mirroring the Live bit on the segment it points too, I think both concepts are useful.
For this kind of switch case, I think we are moving towards just using dyn_cast<>.
I know! But you can't please everyone... I pulled it out of Symbol and into the derived classes in order to try and please Rui's request to remove the Live bit from Symbol.
Would we all be happy to have an "ImportLive" bit in the Symbol base class?
(I used a switch just to avoid a six-way if chain...)
Well, I know you guys want to add the notion of "liveness" to symbols, but I think that's sematnically not correct. Symbol is essentially a label attached to something, and they are not dead or alive themselves. What we mark and garbage collect is chunks. It even feels wrong to add markLive() to a symbol is not quite right. That shouldn't belong to symbol. Can you keep it in MarkLive.cpp?
OK, in this case we are using this information to build the imports (i.e. the dynamic symbols table) so perhaps if we go back to the idea that sections can be "Live" and symbols can be "Used", would that be ok with you? That would match the ELF linker better perhaps?
Similarity to or disparity from ELF isn't that important, but perhaps yes, though I'm not 100% sure if it's the best way. "Used" is probably a bit ambiguous, so I'd rename it "Referenced" or something.
Also, I think that the current way of representing an imported symbol doesn't hit the sweet spot. IIUC, undefined symbols are used to represent imported symbols, but I believe we should have a dedicated symbol type for imported symbols. After resolving and before writing, we can read all .import files, find all symbols referenced by the files, and replace them with some kind of imported symbol type, so that we can handle undefined symbols as really undefined.
Thanks for that! OK, I'll update these reviews and nudge them forwards next week. I've just been very busy at work this week with a customer project, and nut been able to spend and time on wasm in the evenings at home.
Updated - sorry for the long wait!
I think this now represents the consensus? At least, I think I've made all the changes requested above.
It's now looking almost identical to the very first revision of the diff actually..!
Rebased, and changed to add a "Referenced" bit on the Symbol class.
It's been another fortnight on this issue - is it now ready to go?
I made the change Rui requested (as best I could understand it), and I think I've addressed all the comments Sam has added to this PR (although I'm aware that you weren't happy with the approach on another related PR, this change itself has so far met with approval, based on the comments above).
Mostly this LGTM. I like the addition of markLive() to the symbol class, although I know ruiu has objections to it. I think it makes sense since it is symmetric with the existing isLive() (which has precedence in COFF/Symbols.h)
In terms of doing the marking as part of GC or not I think I'm OK with this change, although I don't really consider globals to be sections. I think there is a wider discussion to be had about the utility of --no-gc-sections for wasm and what exactly it means.