This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Output provisional table to match LLD relocatable output
ClosedPublic

Authored by ncw on Jan 16 2018, 3:14 AM.

Details

Summary

LLD now avoids duplicating functions in the output table, but the Clang logic does not match. To avoid unnecessary variability, this commit cleans up that difference in behaviour.

No functional changes. This is because the Wasm "table" in object files is purely cosmetic, and simply contains provisional values. LLD ignores its contents, and it simply exists to make the object file valid and look roughly right.


(This was the third chunk split out of D41954.)

Diff Detail

Repository
rL LLVM

Event Timeline

ncw created this revision.Jan 16 2018, 3:14 AM
ncw added inline comments.Jan 16 2018, 3:19 AM
test/MC/WebAssembly/external-func-address.ll
32 ↗(On Diff #129927)

Test expectations updated: all ELEM sections are now written out with offset 1. Avoids unnecessary differences between the different Wasm files.

test/MC/WebAssembly/global-ctor-dtor.ll
124 ↗(On Diff #129927)

Test expectations updated: all "preliminary" values for relocations are incremented by one (since the table now has offset 1).

test/MC/WebAssembly/weak-alias.ll
65 ↗(On Diff #129927)

Test expectations now updated: in the alias test, the table is now one element smaller (here and line 134) since WasmObjectWriter now uses ResolveSymbol for the "provisional" ELEMS section - this matches the LLD behaviour, writing out the index of the actual function called rather than the import that's used as a stand-in for aliases.

sbc100 added inline comments.Jan 18 2018, 7:38 PM
lib/MC/WasmObjectWriter.cpp
226 ↗(On Diff #129927)

If this is constant then there no need to for it to be a member. Perhaps static const int in the file scope?

I split off that table offset part of this change: D42284
And make a corresponding LLD-side change that can land first: D42285

I hope you don't mind me landing this stuff peicemeal. I think its better to be incremental.

ncw updated this revision to Diff 130648.Jan 19 2018, 10:19 AM

Update:

  • Simple rebase

I know you might be landing this one in a different order... I'm just keeping this sequence of patches updated and consistent on top of master, to help cherry-picking things across and making sure they all apply cleanly (I'm using git).

ncw updated this revision to Diff 130720.Jan 19 2018, 4:37 PM

Updated:

  • Another rebase after #4c and #4d landed
ncw updated this revision to Diff 131077.Jan 23 2018, 8:06 AM

Updated:

  • Rebased to avoid conflict after previous change was landed with some reformating

Can you update the description of this change? I'm not sure all of it still applies.

lib/MC/WasmObjectWriter.cpp
220 ↗(On Diff #131077)

Maps function index to table element index, no? Only functions can exist in this map right?

sbc100 added inline comments.Jan 23 2018, 12:18 PM
lib/MC/WasmObjectWriter.cpp
220 ↗(On Diff #131077)

Also, this isn't really map is it? It more a list of function indexes to be written to the indirect function table.

With that in mind, why not call it TableElems, perhaps?

515 ↗(On Diff #131077)

This doesn't look like it should work to me.

TableIndices is not a map.. but a list of function indexes.. so it seems like this will look up the N'th element in the table, not the table entry for function N.

test/MC/WebAssembly/weak-alias.ll
197 ↗(On Diff #131077)

How as this change generated a new data segment? I guess maybe it was zero before so now it was elided?

It looks to me like we really want to two different function addresses here for the address of the function itself and the address of the weak alias. Its not until link time that these two things become the same thing. Maybe it doesn't really matter as long as the linker can distinguish them, but it makes sense to be that there would be two different function addresses in this object file.

ncw added a comment.Jan 23 2018, 2:10 PM

Thanks for the review!

lib/MC/WasmObjectWriter.cpp
220 ↗(On Diff #131077)

Functions and function imports too. It's a map in the sense that it's an int->int lookup table; it's just a dense map with bounded elements so an array made sense.

TableIndices[i] is the table index of the ith function, so the name makes sense to me.

515 ↗(On Diff #131077)

It's an array more than a vector really.

test/MC/WebAssembly/weak-alias.ll
197 ↗(On Diff #131077)

Ah, the test data before was skipping the section - see the "CHECK" in the original file, rather than "CHECK-NEXT". I thought it was useful to expand the test to assert on the section that was being skipped; it was odd that previously it was deliberately not examining the final segment.

"It's not until link time that they become the same thing" -> but the table in the object file is linked. The "provisional" values that are used for relocations in the object file are a preview of the values that would get written out. I think it makes sense to use the same rules for writing out provisional values as we do for writing out the actual values in LLD. Hence we don't put the exact same function twice in the table - there's really no point. Seeing a function appear multiple times in the table doesn't make the output any easier to read.

ncw retitled this revision from [WebAssembly] Symbol changes #3: Cosmetic table, LLVM. NFC. to [WebAssembly] Symbol changes #3: Make Clang object file tables match LLD output. NFC..Jan 23 2018, 2:28 PM
ncw edited the summary of this revision. (Show Details)
sbc100 added inline comments.Jan 23 2018, 2:28 PM
lib/MC/WasmObjectWriter.cpp
220 ↗(On Diff #131077)

When we talk about "function index space" that always includes imported functions.

If that is the case then I think I prefer a map, since most functions are not address taken. I also don't see why you wouldn't want it to continue to a map that is indexed by symbol.

test/MC/WebAssembly/weak-alias.ll
197 ↗(On Diff #131077)

But for intermediate output such as this (or the lld -r) we are still waiting for the a potential strong version of the symbol, so the weak alias and the function in points too are still district (both here and in lld -r).

sbc100 added inline comments.Jan 23 2018, 2:30 PM
test/MC/WebAssembly/weak-alias.ll
197 ↗(On Diff #131077)

Hmm.. maybe i'll expand this test expectation now as a separate change to make this change more explicit.

ncw added inline comments.Jan 23 2018, 2:51 PM
lib/MC/WasmObjectWriter.cpp
220 ↗(On Diff #131077)

I see the code comment now, you're right the comment is anticipating a future change (should say "function index" not "function symbol index" since that doesn't exist yet).

I just did it because I thought the array would be "denser" in time/space, and take a bit less code than instantiating a map template. I suppose it doesn't matter, maybe using a DenseMap is clearer. I can make that change tomorrow.

test/MC/WebAssembly/weak-alias.ll
197 ↗(On Diff #131077)

Yes - LLD might well resolve the indexes separately. But it doesn't seem to me that there's any added clarity in making the object file different to the linked output here (ie we're just outputting the same table that you'd get if you run LLD to link the single object file).

The output you're requesting is going to look like this. It doesn't seem to me to assist in examining the object file to put the same function twice in the table:

(table 3 anyfunc)
(elem (i32.const 1) $func $func)
(memory $0 1)
(export "memory" (memory $0))
(export "func" (func $func))
(export "funcAlias" (func $func)) // Weak alias for func
(func $func  (type blah)
)
(func $anotherFunc (type blah)
 // A call to the alias and a call to the non-alias...
 // But there's no way to tell which is which! The alias is weakly-
 // defined, so it's not an import
 (call_indirect 1)
 (call_indirect 2)
)

Of course the other alternative is just to output a totally empty table in object files!

sbc100 added inline comments.Jan 23 2018, 3:02 PM
test/MC/WebAssembly/weak-alias.ll
197 ↗(On Diff #131077)

I'm taking a look at the lld -r output now. I'm sure we can agree that the clang output should match that of lld -r more that it matches just lld right?

It does looks like lld -r currently doesn't do the useful thing I was expecting.

The output if wasm-objdump -x -d -r a lot more useful to me that anything in llvm right now (until we can disassembly wasm files, which is coming soon). Hopefully we will have better tools that objtoyaml to use in our test cases at some point.

In this case I modified lld/test/wasm/weak-alias.ll to use --relocatable and I got the following:

0000cd <call_alias_ptr>:
 0000d1: 23 80 80 80 80 00          | get_global 0
           0000d2: R_WEBASSEMBLY_GLOBAL_INDEX_LEB 0
 0000d7: 41 10                      | i32.const 16
 0000d9: 6b                         | i32.sub
 0000da: 22 00                      | tee_local 0
 0000dc: 24 80 80 80 80 00          | set_global 0
           0000dd: R_WEBASSEMBLY_GLOBAL_INDEX_LEB 0
 0000e2: 20 00                      | get_local 0
 0000e4: 41 80 80 80 80 00          | i32.const 0
           0000e5: R_WEBASSEMBLY_TABLE_INDEX_SLEB 1
 0000ea: 36 02 08                   | i32.store 2 8
 0000ed: 10 81 80 80 80 00          | call 1 <direct_fn>
           0000ee: R_WEBASSEMBLY_FUNCTION_INDEX_LEB 1 <direct_fn>
 0000f3: 21 01                      | set_local 1
 0000f5: 20 00                      | get_local 0
 0000f7: 41 10                      | i32.const 16
 0000f9: 6a                         | i32.add
 0000fa: 24 80 80 80 80 00          | set_global 0
           0000fb: R_WEBASSEMBLY_GLOBAL_INDEX_LEB 0
 000100: 20 01                      | get_local 1
 000102: 0b                         | end

and

000103 <call_direct_ptr>:
 000107: 23 80 80 80 80 00          | get_global 0
           000108: R_WEBASSEMBLY_GLOBAL_INDEX_LEB 0
 00010d: 41 10                      | i32.const 16
 00010f: 6b                         | i32.sub
 000110: 22 00                      | tee_local 0
 000112: 24 80 80 80 80 00          | set_global 0
           000113: R_WEBASSEMBLY_GLOBAL_INDEX_LEB 0
 000118: 20 00                      | get_local 0
 00011a: 41 81 80 80 80 00          | i32.const 1
           00011b: R_WEBASSEMBLY_TABLE_INDEX_SLEB 1
 000120: 36 02 08                   | i32.store 2 8
 000123: 10 81 80 80 80 00          | call 1 <direct_fn>
           000124: R_WEBASSEMBLY_FUNCTION_INDEX_LEB 1 <direct_fn>
 000129: 21 01                      | set_local 1
 00012b: 20 00                      | get_local 0
 00012d: 41 10                      | i32.const 16
 00012f: 6a                         | i32.add
 000130: 24 80 80 80 80 00          | set_global 0
           000131: R_WEBASSEMBLY_GLOBAL_INDEX_LEB 0
 000136: 20 01                      | get_local 1
 000138: 0b                         | end

I was hoping that the two R_WEBASSEMBLY_FUNCTION_INDEX_LEB entries where would be different, as I think they should be. So I think that might be bug in the --relocatable output generation.

ncw added inline comments.Jan 23 2018, 3:13 PM
test/MC/WebAssembly/weak-alias.ll
197 ↗(On Diff #131077)

Yes, I agree Clang should most closely match lld -r - but I was hoping that *both* of them would match normal LLD output for the table (ie one table entry per "real" function). If there is to be any difference between object files and linked files in terms of the table that's output, Clang and lld -r should match.

The relocations should definitely be different, and reference the two different symbols directly. But since those two symbols each reference the same (Wasm) function index, I personally would find it odd to have multiple table indices for the same Wasm function.

I think you're right that there is a bug in the lld -r output for relocations, but it's worth checking whether #5 fixes it? The handling of symbols vs functions in LLD is generally a bit patchy, and I think #5 fixes the overall approach.

This is after all cosmetic, I really don't mind what the outcome is! I was trying to make Clang, lld -r, and normal LLD make a matching table, so that we don't have unnecessary variability.

ncw updated this revision to Diff 131247.Jan 24 2018, 6:15 AM
ncw marked 4 inline comments as done.

Updated:

  • Renamed table map as requested, tidied diff slightly

Also split out a small chunk from patch #5 into a separate revision, #4f (D42476).

test/MC/WebAssembly/weak-alias.ll
197 ↗(On Diff #131077)

Aha - wait! I must have been very sleepy last night to forget why lld -r doesn't work with aliases!

It's because your "double symbol fix" (so that aliases are both imported and exported) with the "alt index" isn't present in LLD. WasmObjectWriter hackily writes out the weak symbols twice, but LLD only has support for reading in twice-declared symbols, not for writing them out.

There really isn't any point fixing this. Why introduce new code in LLD at this stage, for writing out the import-and-export format, when we're just days away from switching to using symbol indexes, at which point the need for doing that goes away.

So that's why the fix isn't split out of patch #5.

I see you've added a test - that's great, I'll check that #5 does indeed fix that test.

ncw added inline comments.Jan 24 2018, 6:37 AM
test/MC/WebAssembly/weak-alias.ll
197 ↗(On Diff #131077)

Oops - when I said "there really isn't any point fixing this", what I meant was, there really isn't any point fixing LLD so that lld -r output matches what Clang currently outputs. Namely, I don't see any point adding code to LLD to make the -r output use the imports/exports strategy that Clang currently has.

Of course it's worth fixing in the long run, with the symtab changes.

I've tested with #5, and can confirm that LLD does indeed write out the expected relocations:

0000cc <call_alias_ptr>:
 0000d0: 23 80 80 80 80 00          | get_global 0
           0000d1: R_WEBASSEMBLY_GLOBAL_INDEX_LEB 0
 0000d6: 41 10                      | i32.const 16
 0000d8: 6b                         | i32.sub
 0000d9: 22 00                      | tee_local 0
 0000db: 24 80 80 80 80 00          | set_global 0
           0000dc: R_WEBASSEMBLY_GLOBAL_INDEX_LEB 0
 0000e1: 20 00                      | get_local 0
 0000e3: 41 81 80 80 80 00          | i32.const 1
           0000e4: R_WEBASSEMBLY_TABLE_INDEX_SLEB 7
 0000e9: 36 02 08                   | i32.store 2 8
 0000ec: 10 81 80 80 80 00          | call 1 <direct_fn>
           0000ed: R_WEBASSEMBLY_FUNCTION_INDEX_LEB 7
 0000f2: 21 01                      | set_local 1
 0000f4: 20 00                      | get_local 0
 0000f6: 41 10                      | i32.const 16
 0000f8: 6a                         | i32.add
 0000f9: 24 80 80 80 80 00          | set_global 0
           0000fa: R_WEBASSEMBLY_GLOBAL_INDEX_LEB 0
 0000ff: 20 01                      | get_local 1
 000101: 0b                         | end

000102 <call_direct_ptr>:
 000106: 23 80 80 80 80 00          | get_global 0
           000107: R_WEBASSEMBLY_GLOBAL_INDEX_LEB 0
 00010c: 41 10                      | i32.const 16
 00010e: 6b                         | i32.sub
 00010f: 22 00                      | tee_local 0
 000111: 24 80 80 80 80 00          | set_global 0
           000112: R_WEBASSEMBLY_GLOBAL_INDEX_LEB 0
 000117: 20 00                      | get_local 0
 000119: 41 81 80 80 80 00          | i32.const 1
           00011a: R_WEBASSEMBLY_TABLE_INDEX_SLEB 2
 00011f: 36 02 08                   | i32.store 2 8
 000122: 10 81 80 80 80 00          | call 1 <direct_fn>
           000123: R_WEBASSEMBLY_FUNCTION_INDEX_LEB 2 <call_direct>
 000128: 21 01                      | set_local 1
 00012a: 20 00                      | get_local 0
 00012c: 41 10                      | i32.const 16
 00012e: 6a                         | i32.add
 00012f: 24 80 80 80 80 00          | set_global 0
           000130: R_WEBASSEMBLY_GLOBAL_INDEX_LEB 0
 000135: 20 01                      | get_local 1
 000137: 0b                         | end

We have two instances of call 1 <direct_fn>, but the relocations are different indices. Clearly wasm-objdump is going to have to be updated to understand the new relocation format... maybe wait until the symtab format is stable.

ncw retitled this revision from [WebAssembly] Symbol changes #3: Make Clang object file tables match LLD output. NFC. to [WebAssembly] Output provisional table to match LLD relocatable output.Jan 25 2018, 7:46 AM
ncw edited the summary of this revision. (Show Details)

Is this change still worth landing with planned symbol table changes coming down the line?

ncw added a comment.Jan 31 2018, 2:06 AM

Is this change still worth landing with planned symbol table changes coming down the line?

Yes! This change is just to do with the provisional indexes - which are Wasm indexes, and won't change with all the symbol table changes. The code & tests affected by this change aren't touched by any of my later commits.

ncw updated this revision to Diff 132154.Jan 31 2018, 5:48 AM

Updated to remove conflicts, no real changes

sbc100 added inline comments.Jan 31 2018, 10:34 AM
lib/MC/WasmObjectWriter.cpp
502 ↗(On Diff #132154)

I guess this error was removed because it should never happen right?

And converting it to an assert wouldn't be very useful?

Ok, I'm about to land this. What do you think of my proposed description:

[WebAssembly] MC: Resolve aliases when creating provisional table entries
    
This change is useful for the upcoming addition of the symbol
table (D41954) since in that world function aliases all share
the same function index.
    
This change does not effect lld because the wasm "table" in
object file format is purely cosmetic and lld ignores its
contents.

This change does not effect lld because the wasm "table" in
object file format is purely cosmetic and lld ignores its
contents.

Is "cosmetic" the right term? Is it really to ensure that the object file still validates?

This change does not effect lld because the wasm "table" in
object file format is purely cosmetic and lld ignores its
contents.

Is "cosmetic" the right term? Is it really to ensure that the object file still validates?

I can reword. The key is that lld basically ignores it.

ncw added a comment.Jan 31 2018, 10:55 AM

Thanks! Description looks good. (Could be "affect" :) )

lib/MC/WasmObjectWriter.cpp
502 ↗(On Diff #132154)

Hmm, maybe it would be better to keep the check. In fact, there are several places where we do a check using "!thing.count()", when we could instead be using "find" to do the lookup only once. That would be ideal I guess, maybe with "findOrAbort" static helper function.

sbc100 added inline comments.Jan 31 2018, 11:02 AM
lib/MC/WasmObjectWriter.cpp
502 ↗(On Diff #132154)

Doesn't std::vector::operator[] effectively give us the "findOrAbort" semantics?

I think I originally added this nice errors to aid debugging, but if they shouldn't ever occur in practice then we should probably at most use a single assert (and perhaps just really on std::vector to blow up for us?)

ncw added inline comments.Jan 31 2018, 11:24 AM
lib/MC/WasmObjectWriter.cpp
502 ↗(On Diff #132154)

I think it's actually a map, so it currently has "insert null if not present" semantics. So there will be a null-dereference crash which effectively an assert.

Feel free to add an assert when committing - or I can update the diff if you'd prefer?

What at lot of trouble over a small change, I'm sorry about this.

This revision was not accepted when it landed; it landed in state Needs Review.Jan 31 2018, 11:32 AM
This revision was automatically updated to reflect the committed changes.
sbc100 reopened this revision.Jan 31 2018, 11:45 AM
sbc100 accepted this revision.
This revision is now accepted and ready to land.Jan 31 2018, 11:45 AM
sbc100 closed this revision.Jan 31 2018, 1:57 PM