This is an archive of the discontinued LLVM Phabricator instance.

[lld][WebAssembly] Allow data symbols to extend past end of segment
ClosedPublic

Authored by sbc100 on May 11 2021, 3:28 PM.

Details

Summary

This fixes a bug with string merging which assumes that string symbols
are NULL terminated and therefore can end up trimming unused bytes.
Indeed this is exactly what happens the merge-string.s test.

I verified that this can happen in ELF too given the right conditions.

Diff Detail

Event Timeline

sbc100 created this revision.May 11 2021, 3:28 PM
sbc100 requested review of this revision.May 11 2021, 3:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2021, 3:28 PM
sbc100 updated this revision to Diff 344584.May 11 2021, 4:13 PM

add test

I'm not sure I understand what the description means, in particular how null-termination is related to the symbol extending past the end of the segment. (more generally I guess I don't understand why we want to allow symbols to extend past the end).

I'm not sure I understand what the description means, in particular how null-termination is related to the symbol extending past the end of the segment. (more generally I guess I don't understand why we want to allow symbols to extend past the end).

Each data symbol has a give size, but string tail merging ignores that an will treat each string as NULL terminated. In this example the symbol bar is of size 6 string merging removes the trailing 3 chars which are essentially unreachable but the symbol still has a size of 6 in the symbol table. Since its at the end of the section it end up being reports has overlapping the end of the section by 3 bytes... at least according the the size in the symbol table. I've verified the ELF does the same thing.

I'm not sure I understand what the description means, in particular how null-termination is related to the symbol extending past the end of the segment. (more generally I guess I don't understand why we want to allow symbols to extend past the end).

Each data symbol has a give size, but string tail merging ignores that an will treat each string as NULL terminated. In this example the symbol bar is of size 6 string merging removes the trailing 3 chars which are essentially unreachable but the symbol still has a size of 6 in the symbol table. Since its at the end of the section it end up being reports has overlapping the end of the section by 3 bytes... at least according the the size in the symbol table. I've verified the ELF does the same thing.

So now just only check that a symbol starts within the segment it is defined in.

dschuff added inline comments.May 11 2021, 5:57 PM
lld/test/wasm/merge-string.s
19

does .asciz mean that it puts a NUL after each one of the "bc" substrings here? (in other words, is it 'b' 'c' \0 'b' 'c' \0 ?

sbc100 added inline comments.May 11 2021, 6:06 PM
lld/test/wasm/merge-string.s
19

Yes, exactly. So the size here was actually wrong to begin with.

dschuff accepted this revision.May 11 2021, 6:17 PM
This revision is now accepted and ready to land.May 11 2021, 6:17 PM
This revision was landed with ongoing or failed builds.May 12 2021, 1:48 PM
This revision was automatically updated to reflect the committed changes.