Page MenuHomePhabricator

[WebAssembly] call_indirect issues table number relocs
Needs ReviewPublic

Authored by wingo on Nov 6 2020, 7:50 AM.

Details

Summary

If the reference-types feature is enabled, call_indirect will explicitly
reference its corresponding function table via TABLE_NUMBER
relocations against a table symbol.

Also, as before, address-taken functions can also cause the function
table to be created, only with reference-types they additionally cause a
symbol table entry to be emitted.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
wingo added a comment.Tue, Jan 26, 7:59 AM

OK this was unexpectedly gnarly as we had to undo a number of tests. The interdiff between the two latest revisions shows what changed relative to the previously reviewed patch.

I was surprised how hard it was to get the current subtarget feature set from the places where I needed it. Since the subtarget features can change during the compilation unit apparently, I used the "used-in-reloc" flag on the symbol to indicate whether a table should be included in the symbol table or not. This works -- except in the case of function pointers, where they get emitted not as part of an instruction but rather during lowering, and the fragment has no STI. Baffling. Is there a better way?

Tested MC, CodeGen, and lld wasm tests, and all passing with this fixup, having also reapplied https://reviews.llvm.org/D92215.

(+aardappel back to subscribers, given he reviewed MC part earlier; +tlively)

tlively added inline comments.Tue, Jan 26, 4:09 PM
llvm/lib/MC/WasmObjectWriter.cpp
530–531

What happens if reference-types is not enabled? It looks like we would still have a table number relocation and a table symbol, but the table symbol isn't marked as used. I assume the addition of if (WasmSym->isNoStrip()) above causes the symbol not to be emitted in this case, but is the relocation still emitted?

Overall the WasmObjectWriter seems like the wrong place to be checking target features. The encoding of a Wasm object's contents shouldn't depend on whether reference-types is enabled; it's the contents themselves that should depend on whether reference-types is enabled. Can we encapsulate all the target feature checking in codegen so that we only emit table number expressions/relocations/symbols if reference-types is enabled?

llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
493

I have a similar comment here. IMO, if the assembly contains a table symbol, it should be unconditionally parsed as containing a table symbol. It should be the responsibility of the producer of the assembly to not use table symbols if they want to be compatible with older tools.

I'm not sure what to do about the special case of __indirect_function_table, though. When was that introduced?

llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
879

Why do we need a table symbol at all in this case (and also in DAG ISel), especially since it can't be emitted?

wingo added inline comments.Wed, Jan 27, 12:41 AM
llvm/lib/MC/WasmObjectWriter.cpp
530–531

Let's say the compilation unit contains a function pointer but no call_indirect. In that case the object file will need an __indirect_function_table import. However, there is no explicit relocation against __indirect_function_table; this code handles TABLE_INDEX relocations, which are for the index of a function pointer in a table, but which don't explicitly mention which table. Hence the setNoStrip() on the __indirect_function_table symbol -- even if it's not the target of a relocation, the table import should be written to the output file. This result is the same whether reference types are enabled or not.

Additionally -- if & only if reference-types are enabled -- the table symbol will need to be written to the symbol table. We indicate that by abusing the setUsedInReloc flag. That feels like a hack in a way but perhaps a benign one.

Backing up a bit, though, and taking this opportunity to answer questions posed farther down -- when do we know if a table symbol should be emitted to the symbol table? The writer and assembler layers don't have much access to features, which can change during the compilation unit. Generally speaking it's the code that emits instructions that knows whether a feature is enabled or not. So for example when we emit call_indirect, we emit either a proper symbol reference or a hard-coded reference to table 0, depending on the feature flag.

But function pointers are a special case, because their symbol references only name their address space (the table) implicitly. What we really want to do here, in all cases, is emit a __indirect_function_table import if and only if it is needed. Whether it goes in the symbol table or not is a side question. The normal way for the compiler to know if a symbol is needed is, well, explicit references: is it used in a relocation. But here since there is no explicit reference, we make a hack and explicitly add the symbol to the "GC root set" via setNoStrip.

Do you see an earlier place in codegen or so that would be more appropriate for handling this logic? I tried to do so but did not succeed. It's not just function pointers in code; it's also static data. I found that only at this lower level would I know for sure whether there were a function pointer in code or not.

llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
493

So it's OK to have .s files that use features but don't check / declare them? Fine by me.

Still, even without reference types, emitting call_indirect needs to ensure the presence of its function table -- __indirect_function_table by default. So there does need to be some logic here AFAIU.

llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
879

The table symbol has two functions:

  1. Ensure that the table import is emitted to the object file.
  2. When reference types are enabled, to allow the table reference in the call_indirect to be relocated, via the symbol table entry and appropriate TABLE_NUMBER relocations.

So the answer to your question is (1).

I'm beginning to (re-)wonder if we should leave the current "call_indirect" as special in that it always implicitly refers the magic table zero.. then we would introduce a new instruction for call_indirect_explict that could be used if and only if reference types is enabled and would have the relocation. Data and function addresses will always be special in llvm so I think its OK to reflect that if it makes things simpler. I guess it would make things simpler in some ways and more complex in other? (Sorry if I'm re-litigating prior decisions here).

Thanks for your thoughtful answers, @wingo!

I'm beginning to (re-)wonder if we should leave the current "call_indirect" as special in that it always implicitly refers the magic table zero.. then we would introduce a new instruction for call_indirect_explict that could be used if and only if reference types is enabled and would have the relocation. Data and function addresses will always be special in llvm so I think its OK to reflect that if it makes things simpler. I guess it would make things simpler in some ways and more complex in other? (Sorry if I'm re-litigating prior decisions here).

I like the idea of treating table zero as a magical table that is assumed to always exist and that all "normal" function pointers and indirect calls implicitly refer to, even when reference-types is enabled. That's how the table used to work, right? This would solve the problem of making sure the table is live when there are call_indirects or function pointers present in an object -- table 0 would always be live. Then table symbols would only be used for additional user-defined tables from @pmatos's work, so they would only be used at all when reference-types is enabled. I don't think we even need a new kind of call_indirect instruction for this, since we will be able to differentiate "normal" indirect calls to table zero from other indirect calls by the address space of the called function pointer.

I'm beginning to (re-)wonder if we should leave the current "call_indirect" as special in that it always implicitly refers the magic table zero.. then we would introduce a new instruction for call_indirect_explict that could be used if and only if reference types is enabled and would have the relocation. Data and function addresses will always be special in llvm so I think its OK to reflect that if it makes things simpler. I guess it would make things simpler in some ways and more complex in other? (Sorry if I'm re-litigating prior decisions here).

I am having a hard time seeing what a new instruction would buy us. I would think there is already a sufficiently clear difference between call_indirect with an immediate table operand -- which was the case before I started, and is still the case now if reference-types is disabled -- and call_indirect with a symbol reference. There would only need to be a relocation in the latter case.

I like the idea of treating table zero as a magical table that is assumed to always exist and that all "normal" function pointers and indirect calls implicitly refer to, even when reference-types is enabled. That's how the table used to work, right? This would solve the problem of making sure the table is live when there are call_indirects or function pointers present in an object -- table 0 would always be live.

I think this goes against what was already done in D91637 and D92840. If individual object files can signal their use of the indirect function table via the presence or absence of the __indirect_function_table import, the linker can then only include it in the final linking stage when needed -- a minor win. I can see the point in treating __indirect_function_table as a good default, and which in the absence of reference-types will be table number 0, but reserving table 0 only complicates things when reference types *are* enabled -- because then you have two kinds of tables.

Just to be clear -- here is my use case.

https://github.com/Igalia/ref-cpp/blob/master/milestones/m3/test.S#L76

WAT version here:

https://github.com/Igalia/ref-cpp/blob/master/milestones/m3/test.wat#L47

Basically I am implementing a side table mapping externref to index, in webassembly instead of in JS. I want to have named table symbols so I can have different mappings, and to be able to table.get / table.set / etc on them, but to do this I need to solve the table linking problem. But to solve table linking I had to fix call_indirect. That's why I'm here. I can see how address spaces may relate to this on an IR level, but on the MC level (where I am right now) I think I can probably ignore it.

I'm beginning to (re-)wonder if we should leave the current "call_indirect" as special in that it always implicitly refers the magic table zero.. then we would introduce a new instruction for call_indirect_explict that could be used if and only if reference types is enabled and would have the relocation. Data and function addresses will always be special in llvm so I think its OK to reflect that if it makes things simpler. I guess it would make things simpler in some ways and more complex in other? (Sorry if I'm re-litigating prior decisions here).

I am having a hard time seeing what a new instruction would buy us. I would think there is already a sufficiently clear difference between call_indirect with an immediate table operand -- which was the case before I started, and is still the case now if reference-types is disabled -- and call_indirect with a symbol reference. There would only need to be a relocation in the latter case.

I agree that we just need a single call_indirect that can take either a constant zero table operand or a symbolic table operand. Both types of operands need to be able to coexist in the same link or even in the same object file to make MVP object files linkable with reference-types object files. I suggest that codegen only produce symbolic table operands to refer to user-defined funcref tables, not the default MVP indirect call table (table 0) because that will give us the nice property that objects will contain new table symbols and relocations only if they use reference-types features, not just if they enable reference-types features. This property should also remove the need to do any form of feature detection in the WasmObjectWriter.

I like the idea of treating table zero as a magical table that is assumed to always exist and that all "normal" function pointers and indirect calls implicitly refer to, even when reference-types is enabled. That's how the table used to work, right? This would solve the problem of making sure the table is live when there are call_indirects or function pointers present in an object -- table 0 would always be live.

I think this goes against what was already done in D91637 and D92840. If individual object files can signal their use of the indirect function table via the presence or absence of the __indirect_function_table import, the linker can then only include it in the final linking stage when needed -- a minor win. I can see the point in treating __indirect_function_table as a good default, and which in the absence of reference-types will be table number 0, but reserving table 0 only complicates things when reference types *are* enabled -- because then you have two kinds of tables.

Yes, I agree that it's better not to emit the table at all if it's not needed. I'm not totally clear on whether those patches to emit the table only when necessary are ABI-breaking changes, though. Do I understand correctly that they are not ABI-breaking because wasm-ld previously ignored the presence of the table import and that those patches signal whether the table is used by the presence of the import in the object file and not via new symbol or relocation types? If so, I agree that table 0 (i.e. '__indirect_function_table') should be live only if necessary. I think the crux of the matter is whether there should be two kinds of tables, though. AFAICT, wasm-ld needs to support two kinds of tables (hard-coded 0 and symbolic) if we want to keep supporting the MVP object ABI. Since we need to support both types of tables anyway, it seems simpler to let usage rather than enabled features determine when each kind of table is used. Normal function pointers and indirect calls to function pointers should use the hard-coded table 0, just as they do in MVP object files and all indirect calls into user-defined funcref tables should use symbolic tables. This scheme prevents the MC layer from having to do any feature detection and maintains compatibility with the MVP ABI.

Just to be clear -- here is my use case.

https://github.com/Igalia/ref-cpp/blob/master/milestones/m3/test.S#L76

WAT version here:

https://github.com/Igalia/ref-cpp/blob/master/milestones/m3/test.wat#L47

Basically I am implementing a side table mapping externref to index, in webassembly instead of in JS. I want to have named table symbols so I can have different mappings, and to be able to table.get / table.set / etc on them, but to do this I need to solve the table linking problem. But to solve table linking I had to fix call_indirect. That's why I'm here. I can see how address spaces may relate to this on an IR level, but on the MC level (where I am right now) I think I can probably ignore it.

I'm very excited for this all to come together :) You're right that address spaces won't show up in the MC layer at all. I mentioned them only to point out that the choice of whether to use a hard-coded table 0 or a symbolic table operand for call_indirect can be done in codegen rather than in the MC layer.

wingo added a comment.Thu, Jan 28, 1:35 AM

Thanks for the comments, tlively :)

I agree that we just need a single call_indirect that can take either a constant zero table operand or a symbolic table operand. Both types of operands need to be able to coexist in the same link or even in the same object file to make MVP object files linkable with reference-types object files.

Whoa -- is linking MVP object files with reference-types object files a requirement? This was totally not on my radar. Note that with feature flags, we can prohibit linking between MVP and reftypes object files.

I think this goes against what was already done in D91637 and D92840. If individual object files can signal their use of the indirect function table via the presence or absence of the __indirect_function_table import, the linker can then only include it in the final linking stage when needed -- a minor win. I can see the point in treating __indirect_function_table as a good default, and which in the absence of reference-types will be table number 0, but reserving table 0 only complicates things when reference types *are* enabled -- because then you have two kinds of tables.

Yes, I agree that it's better not to emit the table at all if it's not needed. I'm not totally clear on whether those patches to emit the table only when necessary are ABI-breaking changes, though. Do I understand correctly that they are not ABI-breaking because wasm-ld previously ignored the presence of the table import and that those patches signal whether the table is used by the presence of the import in the object file and not via new symbol or relocation types? If so, I agree that table 0 (i.e. '__indirect_function_table') should be live only if necessary.

This is correct, yes. Old object files always have an __indirect_function_table import, and the old linker always produces __indirect_function_table. Newer object files may omit __indirect_function_table if it is not needed, and newer linkers may omit __indirect_function_table in the output if no imput file requires it. It is always safe (but can be suboptimal) to emit an indirect function table, so old linkers can work with new object files. New linkers can also work with old object files, because the __indirect_function_table import is always present, so the table will not be omitted. So no ABI break AFAIU.

I agree that we just need a single call_indirect that can take either a constant zero table operand or a symbolic table operand. Both types of operands need to be able to coexist in the same link or even in the same object file to make MVP object files linkable with reference-types object files.

Whoa -- is linking MVP object files with reference-types object files a requirement? This was totally not on my radar. Note that with feature flags, we can prohibit linking between MVP and reftypes object files.

Yes, I expect we will receive bug reports from confused users if we don't allow linking MVP and reftypes object files together. Consider a library author who has no use for reference types and wants their library to run on as many engines as possible, including those that may not support reference types. That library author might distribute just a single build of their library that does not have the reference-types feature enabled. An application author who does use reference types should still be able to link against that library without requiring a separate build. Although you're right that we could just make that an error, that would inconvenience the library author who now has to distribute two builds of their library, even though the builds do not differ in the features they use.

I think this goes against what was already done in D91637 and D92840. If individual object files can signal their use of the indirect function table via the presence or absence of the __indirect_function_table import, the linker can then only include it in the final linking stage when needed -- a minor win. I can see the point in treating __indirect_function_table as a good default, and which in the absence of reference-types will be table number 0, but reserving table 0 only complicates things when reference types *are* enabled -- because then you have two kinds of tables.

Yes, I agree that it's better not to emit the table at all if it's not needed. I'm not totally clear on whether those patches to emit the table only when necessary are ABI-breaking changes, though. Do I understand correctly that they are not ABI-breaking because wasm-ld previously ignored the presence of the table import and that those patches signal whether the table is used by the presence of the import in the object file and not via new symbol or relocation types? If so, I agree that table 0 (i.e. '__indirect_function_table') should be live only if necessary.

This is correct, yes. Old object files always have an __indirect_function_table import, and the old linker always produces __indirect_function_table. Newer object files may omit __indirect_function_table if it is not needed, and newer linkers may omit __indirect_function_table in the output if no imput file requires it. It is always safe (but can be suboptimal) to emit an indirect function table, so old linkers can work with new object files. New linkers can also work with old object files, because the __indirect_function_table import is always present, so the table will not be omitted. So no ABI break AFAIU.

Thanks for confirming!

wingo added a comment.Thu, Jan 28, 2:05 AM

Thanks again for comments!

I would like to preserve the property that allows the linker to omit __indirect_function_table if it is not needed. Right now the linker has two ways it treats incoming files: if the input has table symtab entries, the input is assumed to have symbols for all tables and relocs for all table references. Otherwise, if the input has tables (import or not) but no symbols, it is an MVP input which requires an indirect function table. It would be sufficient to link the two if we ensure that the indirect function table is allocated table number 0.

Is the suggestion that for reftypes objects, we would omit the symbol for __indirect_function_table, and omit relocations against it? It would mean that the linker would have to grovel reftypes inputs for table imports that aren't in the symtab, in order to know if a reftypes object uses the indirect function table. This is doable but it is less optimal: since the linker doesn't have a precise idea of where the indirect function table is used, it has to assume that if it's live in the input, it's live in the output. If each table reference has a reloc, this isn't the case, and we can have more precise GC. But given the tradeoffs, do you prefer that solution?

Thanks again for comments!

I would like to preserve the property that allows the linker to omit __indirect_function_table if it is not needed. Right now the linker has two ways it treats incoming files: if the input has table symtab entries, the input is assumed to have symbols for all tables and relocs for all table references. Otherwise, if the input has tables (import or not) but no symbols, it is an MVP input which requires an indirect function table. It would be sufficient to link the two if we ensure that the indirect function table is allocated table number 0.

Is the suggestion that for reftypes objects, we would omit the symbol for __indirect_function_table, and omit relocations against it? It would mean that the linker would have to grovel reftypes inputs for table imports that aren't in the symtab, in order to know if a reftypes object uses the indirect function table. This is doable but it is less optimal: since the linker doesn't have a precise idea of where the indirect function table is used, it has to assume that if it's live in the input, it's live in the output. If each table reference has a reloc, this isn't the case, and we can have more precise GC. But given the tradeoffs, do you prefer that solution?

When you say "the linker would have to grovel reftypes inputs for table imports that aren't in the symtab" are you referring to the way things work today with MVP object files? I can't remember exactly how the mechanism works but we currently have to imply the existence of the table based on TABLE_INDEX relocations right? Sorry I've not been following all the comments here. Are you saying you want to improve on the status quo (make this more precise), or are you concerned about a regression (i.e. becoming less precise about the GC of the table in the final output).

I'm also not 100% sure of how things work today, but I hope it wouldn't be too difficult to emit the MVP table precisely when necessary by seeing whether or not it would be empty immediately before writing out the final binary. I'm perfectly willing to believe that I'm missing something that prevents that from working, though.

I'm also not 100% sure of how things work today, but I hope it wouldn't be too difficult to emit the MVP table precisely when necessary by seeing whether or not it would be empty immediately before writing out the final binary. I'm perfectly willing to believe that I'm missing something that prevents that from working, though.

Unfortunately this is not the case. You can still have

void call(void (*f)(void)) {
  f();
}

in the compilation unit, causing call_indirect to be emitted but no function pointer definition.

I'm also not 100% sure of how things work today, but I hope it wouldn't be too difficult to emit the MVP table precisely when necessary by seeing whether or not it would be empty immediately before writing out the final binary. I'm perfectly willing to believe that I'm missing something that prevents that from working, though.

Unfortunately this is not the case. You can still have

void call(void (*f)(void)) {
  f();
}

in the compilation unit, causing call_indirect to be emitted but no function pointer definition.

I meant that the linker should choose whether to emit the table into the final binary by checking whether it would be empty or not. So in this example, whether or not this compilation unit had a table in its object file, the linker would see that some function has its address taken (in the caller of call) and needs to be allocated a table slot. Since the table wouldn't be empty, the linker emits the table in the final binary.

Is the suggestion that for reftypes objects, we would omit the symbol for __indirect_function_table, and omit relocations against it? It would mean that the linker would have to grovel reftypes inputs for table imports that aren't in the symtab, in order to know if a reftypes object uses the indirect function table. This is doable but it is less optimal: since the linker doesn't have a precise idea of where the indirect function table is used, it has to assume that if it's live in the input, it's live in the output. If each table reference has a reloc, this isn't the case, and we can have more precise GC. But given the tradeoffs, do you prefer that solution?

When you say "the linker would have to grovel reftypes inputs for table imports that aren't in the symtab" are you referring to the way things work today with MVP object files?

Yes. I thought it would be a stopgap mechanism but this would make it be more permanent, happening both for reftypes and non-reftypes binaries.

I can't remember exactly how the mechanism works but we currently have to imply the existence of the table based on TABLE_INDEX relocations right?

No, the linker can be a bit more dumb than that -- it implies the existence of a table based on whether an input needs it. MVP inputs declare that they need a table by having a table import. An input needs the function table if it has an indirect call or a function pointer (bitcast). The latter case coresponds to TABLE_INDEX relocs, but the former has no relocs in MVP inputs.

Sorry I've not been following all the comments here. Are you saying you want to improve on the status quo (make this more precise), or are you concerned about a regression (i.e. becoming less precise about the GC of the table in the final output).

AFAIU Thomas was initially suggesting always reserving table 0 -- always having a table import in input object files, the linker always includes a table in the output. Then no symbols or relocations are needed to refer to the indirect function table; it's just assumed that the linker will assign it table number 0. I was saying that as long as inputs signal their need for a table, the linker shouldn't have to write a table output. The linker would just have to assign table number 0 to __indirect_function_table, if it is there; otherwise table 0 could be assigned to no table at all, if there are no tables, or some other table, for reftypes inputs that declare other tables.

If this works for you all I can hack this up. Will be Monday though as I have a day of meetings today.

wingo added a comment.EditedFri, Jan 29, 12:11 AM

I'm also not 100% sure of how things work today, but I hope it wouldn't be too difficult to emit the MVP table precisely when necessary by seeing whether or not it would be empty immediately before writing out the final binary. I'm perfectly willing to believe that I'm missing something that prevents that from working, though.

Unfortunately this is not the case. You can still have

void call(void (*f)(void)) {
  f();
}

in the compilation unit, causing call_indirect to be emitted but no function pointer definition.

I meant that the linker should choose whether to emit the table into the final binary by checking whether it would be empty or not. So in this example, whether or not this compilation unit had a table in its object file, the linker would see that some function has its address taken (in the caller of call) and needs to be allocated a table slot. Since the table wouldn't be empty, the linker emits the table in the final binary.

This doesn't work. You can have a file with call_indirect but no declared function pointers. In that case the final binary will not validate.

I'm also not 100% sure of how things work today, but I hope it wouldn't be too difficult to emit the MVP table precisely when necessary by seeing whether or not it would be empty immediately before writing out the final binary. I'm perfectly willing to believe that I'm missing something that prevents that from working, though.

Unfortunately this is not the case. You can still have

void call(void (*f)(void)) {
  f();
}

in the compilation unit, causing call_indirect to be emitted but no function pointer definition.

I meant that the linker should choose whether to emit the table into the final binary by checking whether it would be empty or not. So in this example, whether or not this compilation unit had a table in its object file, the linker would see that some function has its address taken (in the caller of call) and needs to be allocated a table slot. Since the table wouldn't be empty, the linker emits the table in the final binary.

Ah yes I remember what that doesn't work.

Because you can have this code:

a.o

void call(void (*f)(void)) {
   if (f) {
      f();
   }
}

b.o:

void foo() { 
  call(NULL);
}

Now you have a program that contains call_indirect but no function pointers. Its kind of annoying that the wasm standard makes this invalid but that is the current state. I think I have bought this up before in the standards discussions but perhaps we could re-litigate? WTYT tlively, would this make sense as a spec change?

Ah tricky. I understand why we can't just look at whether the table would have any non-null contents now. Thanks for explanations!

AFAIU Thomas was initially suggesting always reserving table 0 -- always having a table import in input object files, the linker always includes a table in the output. Then no symbols or relocations are needed to refer to the indirect function table; it's just assumed that the linker will assign it table number 0. I was saying that as long as inputs signal their need for a table, the linker shouldn't have to write a table output. The linker would just have to assign table number 0 to __indirect_function_table, if it is there; otherwise table 0 could be assigned to no table at all, if there are no tables, or some other table, for reftypes inputs that declare other tables.

If this works for you all I can hack this up. Will be Monday though as I have a day of meetings today.

Yes, this sounds good, although you're right that purely using the imports is not as precise as using relocations because they don't allow the linker to GC unnecessary tables. What do you think about this scheme:

  • MVP object files signal table usage by importing or not importing a table.
  • Reftypes object files signal table usage by a table symbol and relocations, even for the default table (table 0).
  • The linker emits table zero if any non-reftypes object imports a table or if the default table symbol is live.

This scheme has the following valuable properties:

  • The MVP object file ABI is not changed. MVP object files do not use new symbol or relocation types.
  • MVP and reftypes object files can be linked together.
  • If all objects that use the default table have reftypes enabled, we get the full precision of emitting the table only if its symbol is live.
  • All feature detection happens in codegen, not in object writing.

I know this has somewhat come full circle and this scheme is close to what you've already implemented. Thanks for bearing with me as I've wrapped my head around all the issues!

  • Reftypes object files signal table usage by a table symbol and relocations, even for the default table (table 0).

Any reason this has to happen right away? why not leave normal call_indirect to the default table alone for now (so reftypes object and MVP objects are the same in this regard for now)? We can always change our mind later and add this if we want to increase the GC precision (this can be done as a followup or not at all).

Leaving that part as a follow-on sounds fine to me. No need to stabilize the reference-types object ABI any time soon. My impression is that the table symbols and relocations are mostly implemented already, though, so if @wingo's interested in getting that part working, I don't see a reason to delay.

wingo added a comment.EditedMon, Feb 1, 6:49 AM

Hi :)

What do you think about this scheme:

  • MVP object files signal table usage by importing or not importing a table.
  • Reftypes object files signal table usage by a table symbol and relocations, even for the default table (table 0).
  • The linker emits table zero if any non-reftypes object imports a table or if the default table symbol is live.

Just for clarity, this differs from where I was initially going in two ways:

  • I had made it so that MVP object files could have symbols and TABLE_NUMBER relocs, but this proved incompatible with old linkers. Therefore table symbols and TABLE_NUMBER relocs will be emitted only if reftypes is enabled.
  • I didn't have any special logic to keep __indirect_function_table assigned to table number 0, assuming that either all inputs or no inputs had table symbols and relocatable table references. This is incompatible with linking to old MVP object files whose table references are not relocatable.

So I will first add the logic to the linker that it will assign table number 0 to __indirect_function_table, if it is present.

This scheme has the following valuable properties:

  • The MVP object file ABI is not changed. MVP object files do not use new symbol or relocation types.
  • MVP and reftypes object files can be linked together.

Agreed.

  • If all objects that use the default table have reftypes enabled, we get the full precision of emitting the table only if its symbol is live.

It's "fullish" precision. TABLE_INDEX relocations for function pointer bitcasts don't explicitly reference the table, so instead they setNoStrip on the indirect function table. There can be cases where the linker omits the function pointer, causing the function table to no longer be necessary, but the linker can't know that.

  • All feature detection happens in codegen, not in object writing.

Yes, for call_indirect. But when we emit a function pointer at the MC layer -- either in data or in code -- we need that to cause the function table to be present. For function pointers in code, i think we can find the feature set just fine, so that for reftypes we cause a symtab entry to be emitted for the function table. But for function pointers in data I don't know a good way to do feature detection; do you have any ideas here?

Thanks for your patience with this small but gnarly problem!

wingo added a comment.Tue, Feb 2, 11:40 PM
  • All feature detection happens in codegen, not in object writing.

Yes, for call_indirect. But when we emit a function pointer at the MC layer -- either in data or in code -- we need that to cause the function table to be present. For function pointers in code, i think we can find the feature set just fine, so that for reftypes we cause a symtab entry to be emitted for the function table. But for function pointers in data I don't know a good way to do feature detection; do you have any ideas here?

Gentle ping regarding this point -- this is why I was mucking with features in the writer. I plan on having another look in lowering but if you have other ideas they are welcome.

wingo added a comment.Wed, Feb 3, 1:37 AM

OK one nit I just realized. It would be nice to allow linking MVP and reftypes inputs. However this will not be possible if the reftypes input imports a table. This is because table numbers are assigned first to imports, then to module-local definitions. The MVP input would have call_indirect against table 0, expecting it to be the indirect function table, but instead it is an import from the reftypes module. I guess the linker just emits an error in that case.

It would be possible to have a command-line tool to migrate an MVP object file to a reftypes object file, parsing function code and reconstructing relocs. I guess the linker itself isn't the right place for that though.

sbc100 added a comment.Wed, Feb 3, 8:22 AM

OK one nit I just realized. It would be nice to allow linking MVP and reftypes inputs. However this will not be possible if the reftypes input imports a table. This is because table numbers are assigned first to imports, then to module-local definitions. The MVP input would have call_indirect against table 0, expecting it to be the indirect function table, but instead it is an import from the reftypes module. I guess the linker just emits an error in that case.

It would be possible to have a command-line tool to migrate an MVP object file to a reftypes object file, parsing function code and reconstructing relocs. I guess the linker itself isn't the right place for that though.

Yup that does sounds like a showstopper for linking mixed inputs. Should we have the linker error out in only cases were there are imported tables? Or should error out on any kind of mixed inputs? I guess it depends how useful linking mixed object files is, and how often folks will be using table imports.

I think maybe we should start out by making mixed inputs a hard error. We could try to relax it later if there is demand. Does that sounds reasonable @tlively ?

  • All feature detection happens in codegen, not in object writing.

Yes, for call_indirect. But when we emit a function pointer at the MC layer -- either in data or in code -- we need that to cause the function table to be present. For function pointers in code, i think we can find the feature set just fine, so that for reftypes we cause a symtab entry to be emitted for the function table. But for function pointers in data I don't know a good way to do feature detection; do you have any ideas here?

Gentle ping regarding this point -- this is why I was mucking with features in the writer. I plan on having another look in lowering but if you have other ideas they are welcome.

Would it be possible to do this feature detection and addition of a table in WebAssemblyAsmPrinter.cpp, which is the transition from the MachineInstr layer to the MC layer? This is the last point where the original LLVM Module is still available, so in the worst case you could traverse the Module's global data to find function pointers.

OK one nit I just realized. It would be nice to allow linking MVP and reftypes inputs. However this will not be possible if the reftypes input imports a table. This is because table numbers are assigned first to imports, then to module-local definitions. The MVP input would have call_indirect against table 0, expecting it to be the indirect function table, but instead it is an import from the reftypes module. I guess the linker just emits an error in that case.

It would be possible to have a command-line tool to migrate an MVP object file to a reftypes object file, parsing function code and reconstructing relocs. I guess the linker itself isn't the right place for that though.

Yup that does sounds like a showstopper for linking mixed inputs. Should we have the linker error out in only cases were there are imported tables? Or should error out on any kind of mixed inputs? I guess it depends how useful linking mixed object files is, and how often folks will be using table imports.

I think maybe we should start out by making mixed inputs a hard error. We could try to relax it later if there is demand. Does that sounds reasonable @tlively ?

I would prefer to error out only when a reftypes object imports a table. This seems similar to what we do when linking MVP and atomics-enabled objects, i.e. allow the link when we know it to be safe and possible. That's certainly been useful in practice, so I expect that allowing MVP+reftypes links will be useful as well as reftypes picks up adoption. Being permissive and having good error messages from the start will save us a lot of support emails and Emscripten issues down the line :) Of course this relaxation doesn't have to be in the initial patch, but I don't think we should wait to implement it until users complain.

sbc100 requested changes to this revision.Mon, Feb 8, 3:17 PM
This revision now requires changes to proceed.Mon, Feb 8, 3:17 PM
wingo added a comment.Thu, Feb 11, 6:40 AM

Thanks for patience with the delay here. Paging this back in -- I will move the logic from MC/WasmObjectWriter.cpp to WebAssemblyAsmPrinter.cpp -- there I will iterate over all functions in the module, and if any of them hasAddressTaken, then I will ensure an __indirect_function_table definition exists and will be written to the output (setNoStrip). We keep the hack that table symbols only go to the symbol table if they are marked as used in relocations. If the reference types feature is enabled and an address-taken function is present in the module, the indirect function table will be explicitly marked isUsedInReloc. Just broadcasting intention here so that there are no more late surprises :)

Sounds good! Thanks for your patience with the previous late surprises :)

Actually, one question.

We keep the hack that table symbols only go to the symbol table if they are marked as used in relocations.

It seems like it would be simpler to not create a symbol table in the first place in cases where it will not be emitted into the binary anyway. In LowerCallResults, can we do

MIB.addSym(WebAssembly::getOrCreateFunctionTableSymbol(
    MF.getContext(), "__indirect_function_table"));

only if MF.getSubtarget<WebAssemblySubtarget>().hasReferenceTypes() and otherwise continue doing MIB.addImm(0)?

Actually, one question.

We keep the hack that table symbols only go to the symbol table if they are marked as used in relocations.

It seems like it would be simpler to not create a symbol table in the first place in cases where it will not be emitted into the binary anyway. In LowerCallResults, can we do

MIB.addSym(WebAssembly::getOrCreateFunctionTableSymbol(
    MF.getContext(), "__indirect_function_table"));

only if MF.getSubtarget<WebAssemblySubtarget>().hasReferenceTypes() and otherwise continue doing MIB.addImm(0)?

This could work, but the issue comes as to whether or not to emit the __indirect_function_table import. Currently the mechanism from D91637 (which, looking at it now, landed in November already -- how has this task gone on for so long???) is that the __indirect_function_table import is added the output file only if needed. We need some way for a compilation unit to record that it actually needs the indirect function table. Currently we use the presence of the table symbol in the symtab.

wingo updated this revision to Diff 323323.Fri, Feb 12, 7:10 AM

Squish with D92215

wingo planned changes to this revision.Fri, Feb 12, 11:39 AM
wingo updated this revision to Diff 323704.Mon, Feb 15, 3:07 AM

Logic ensuring function-to-pointer casts cause __indirect_function_table to be present moved to WebAssemblyAsmPrinter; bug pending for WebAssemblyAsmParser though.

wingo added a comment.Mon, Feb 15, 3:21 AM

Thanks for patience with the delay here. Paging this back in -- I will move the logic from MC/WasmObjectWriter.cpp to WebAssemblyAsmPrinter.cpp -- there I will iterate over all functions in the module, and if any of them hasAddressTaken, then I will ensure an __indirect_function_table definition exists and will be written to the output (setNoStrip). We keep the hack that table symbols only go to the symbol table if they are marked as used in relocations. If the reference types feature is enabled and an address-taken function is present in the module, the indirect function table will be explicitly marked isUsedInReloc. Just broadcasting intention here so that there are no more late surprises :)

@tlively: So, this appears to work. However I ran into a problem in WebAssemblyAsmParser -- it needs to do this same logic, if a function-to-pointer cast happens in assembly, we should ensure that the indirect function table is present in the output. Before, this case was caught by WasmObjectWriter. E.g. test/MC/WebAssembly/weak-alias.s has:

load_hidden_func:
    .functype   load_hidden_func () -> (i32)
    global.get  __table_base
    i32.const   hidden_func@TBREL
    i32.add
    end_function

But with no call_indirect, the output doesn't have __indirect_function_table, as it should.

I could check if any instruction is i32.const or i64.const with the operand being a symbol, and check that the symbol is a function, but that is gross :/ I will continue to think about this but any pointers you might have would be welcome. I lean towards my previous solution, fwiw.

wingo added a comment.Mon, Feb 15, 5:25 AM

Thinking about this a bit more after a nap -- how about I put some ensure-there's-a-function-table logic back to WasmObjectWriter in response to a TABLE_INDEX reloc, but I remove the feature detection from there. I remove the logic from AsmPrinter (maybe; see below).

I have a new idea about how to determine when to write a symtab entry for __indirect_function_table. How about, in WasmObjectWriter, we only write table symtab entries if they are needed. If any table symtab entry is needed, all tables should have symtab entries.
A table needs a symtab entry if:

  • it is used in a reloc (call_indirect issued a symbol reference); or
  • it is not the indirect function table (check via name and isFunctionTable())

With this strategy, we may get a reference-types file which needs a __indirect_function_table because of a function bitcast, but it's not used in a reloc so no symtab entry is emitted. That's fine with the linker logic from D96001. The indirect function table will not be relocatable in that case but that's fine; the indirect function table will be assigned table number 0 anyway. Only way that can fail is if the linker output defines __indirect_function_table locally but some other compilation unit imports a table; perhaps that is an argument for not removing the AsmPrinter case.

wingo added a comment.Mon, Feb 15, 5:55 AM

A table needs a symtab entry if:

  • it is used in a reloc (call_indirect issued a symbol reference); or
  • it is not the indirect function table (check via name and isFunctionTable())

Thinking again. Apologies for the late tergiversations -- if you have a compilation unit with a function bitcast but no call_indirect and no other tables, in practice you have a relocatable table. The linker can assign __indirect_function_table to any table number because there are no uses to relocate. (I had locally added a hasNonRelocatableUses flag to the MCSymbolWasm for tracking this.) But, a linker would only assign non-zero table numbers if there is a table symtab entry, and old linkers won't accept table symtab entries, so you really need to gate the presence of the symtab entry on the reference-types feature. You can't decide automatically :/

I think this problem is important to solve so I will continue to think about it. The previous feature-detection-in-the-writer appeared to solve the problem fwiw.

wingo updated this revision to Diff 323760.Mon, Feb 15, 8:04 AM

Rework how we determine when a table should have a symtab entry.

wingo added a comment.Mon, Feb 15, 8:06 AM

OK this thing is horrible in many ways but it does pass tests. @tlively would you mind having a look to see if especially the WasmObjectWriter, MCSymbolWasm, WebAssemblyAsmPrinter, and WebAssemblyAsmParser changes look right to you? @sbc100 input also very welcome of course!

wingo updated this revision to Diff 323898.Tue, Feb 16, 12:24 AM
wingo marked 6 inline comments as done.

Fix clang-tidy nits; remove a vestigial call to setUsedInReloc

wingo added a comment.Wed, Feb 17, 7:15 AM

Gently ping here; this is the next patch to land, and as we have few overnights left in the week it would be great to get review tonight. I know it's long; my apologies!

wingo edited the summary of this revision. (Show Details)Wed, Feb 17, 7:16 AM
wingo edited the summary of this revision. (Show Details)
wingo edited the summary of this revision. (Show Details)

Sorry for the delay, @wingo. I'm taking a look now.

tlively accepted this revision.Thu, Feb 18, 12:32 PM

I have two remaining concerns about this patch:

  1. I still feel that creating a table symbol that won't be emitted into the binary is an overly complex way of signaling that a table import is needed. I would prefer a simple shouldImportTable boolean somewhere to record that information. However, I don't want to hold this patch up any further on that concern because I don't have a good idea of where that boolean should go. I would be happy if we could return to this point once things are working end-to-end.
  2. The comment I left on the "magic" table operand insertion this patch adds to the assembler. As I wrote, I would prefer to error out instead of implicitly inserting the missing operand, but that's a relatively minor detail that is unrelated to most of the work done in this patch, so I would happy to leave further discussion of that to a follow-up as well.

I do want to continue discussing those points, but I know this is a largish patch that has been outstanding for a while and is blocking further work. If you feel it would be more efficient, I would be fine landing this as-is and addressing those points in follow-ups.

llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
234–241

The code in this patch fixes up the user's assembly by adding in table operands that did not appear in the source. IIUC, the motivation for this is to maintain backward compatibility with existing assembly that does not contain explicit table operands. IMO, rather than magically fixing the user's code, it would be better to error out on any call_indirect without an explicit table operand when reference-types is enabled. The downside would be that users would need to edit their assembly code when they enable reference-types, but I think the benefit of removing this "magic" from the assembler would be worth it.

689

In the text format the table index appears first. I think it would be good to be consistent with that rather than the binary format.

llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
872
wingo marked an inline comment as done.Fri, Feb 19, 12:59 AM
wingo added inline comments.
llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
234–241

Context was https://reviews.llvm.org/D90948#inline-849046, FWIW. Sure, I can make this change, but note that the other downside is that reference-types compilation units that want to link to MVP objects will need to correctly spell "__indirect_function_table" in their call_indirect instances, otherwise you could accidentally use the wrong table.

689

Sure.

wingo marked an inline comment as done.Fri, Feb 19, 5:50 AM
wingo added inline comments.
llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
689

Just to confirm, we want the text format to have a different operand order than the CALL_INDIRECT instruction from WebAssemblyCall.td -- can do.

wingo added inline comments.Fri, Feb 19, 6:38 AM
llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
689

Hummm, I will have to write a dedicated printer in WebAssemblyInstrPrint.cpp for {RET_,}CALL_INDIRECT_S, to invert the operand order. Really sorry to bother but @tlively can you confirm this is the right thing? Thanks :)

tlively added inline comments.Fri, Feb 19, 11:50 AM
llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
689

Yes, confirmed. I hadn't thought about how this will need a custom printer, but I still think it is the right thing to do. Folks who read and write this assembly format will probably also be spending time at least reading standard wat/wast files and probably not much time looking at the raw binaries, so I think it would be most helpful to the user for us to match the text format rather than the binary format here.

tlively added inline comments.Fri, Feb 19, 12:04 PM
llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
234–241

Acknowledged. I asked @aardappel and he didn't feel strongly about it and I'd like to get feedback from @sbc100 about it but he's out of office today. I'd be ok landing this as-is and potentially revisiting it with feedback from more stakeholders. (The more I think about it, the less strongly I feel about this, too.)

Moin :) Revisiting -- as you suggest I am going to land as-is, then follow up with a patch to not "magically" refer to __indirect_function_table in +reference-types files. If you reach a conclusion about argument order in this evening's wasm-tools meeting, please let me know and I can follow up as necessary there too.

This revision was not accepted when it landed; it landed in state Needs Review.Mon, Feb 22, 1:17 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
wingo added a comment.Mon, Feb 22, 2:32 AM

Yaargh, bit by the "phab description is not necessarily what gets committed" issue -- patch committed is fine but description was outdated. In mozilla for example they land patches instead via "lando", a phab service that applies the patch as reviewed instead of opening space for errors like this one. Anyway, my apologies for the commit note mismatch.

I think this broke the LLVM roll on Emscripten CI. I don't have time to bisect atm, but this is the only wasm-related commit in the range, and the compiler crash is on a file and function modified in this patch:

llvm::WebAssembly::getOrCreateFunctionTableSymbol(llvm::MCContext&, llvm::WebAssemblySubtarget const*) WebAssemblyUtilities.cpp

Crash details: https://logs.chromium.org/logs/emscripten-releases/buildbucket/cr-buildbucket.appspot.com/8854596160871978800/+/steps/Build_Emscripten__upstream_/0/stdout

Regression range: https://chromium-review.googlesource.com/c/emscripten-releases/+/2711978

Crash is when building system/lib/libc/musl/src/stdio/stderr.c in libc.

dmajor added a subscriber: dmajor.Mon, Feb 22, 8:15 PM

I think this broke the LLVM roll on Emscripten CI. I don't have time to bisect atm, but this is the only wasm-related commit in the range, and the compiler crash is on a file and function modified in this patch:

We're also seeing this in Mozilla's build of https://github.com/WebAssembly/wasi-sdk, and our bisect indeed pointed to this commit.

In case having a second set of repro steps helps, we're doing a relatively vanilla make : https://searchfox.org/mozilla-central/source/taskcluster/scripts/misc/build-wasi-sysroot.sh#9

wingo added a comment.Tue, Feb 23, 2:24 AM

Thanks for the reports, I will revert this patch and reland when I test against emscripten.

wingo reopened this revision.Tue, Feb 23, 2:51 AM