This is an archive of the discontinued LLVM Phabricator instance.

Correct aligment computation for shared object symbols
ClosedPublic

Authored by shenhan on May 31 2018, 12:31 PM.

Details

Summary

The original computation for shared object symbol alignment is wrong when st_value equals 0.

It is very unusual for dso symbols to have st_value equal 0. But when it happens, it causes obscure run time bugs.

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

shenhan created this revision.May 31 2018, 12:31 PM
shenhan added subscribers: echristo, ruiu.
shenhan edited the summary of this revision. (Show Details)May 31 2018, 1:21 PM
ruiu added a comment.May 31 2018, 1:22 PM

Ahh, that's obviously a bug. Thank you for finding! Could you write a test?

shenhan updated this revision to Diff 149348.May 31 2018, 1:22 PM

Thanks Rui. I just updated the patch a little bit, that - st_value is only valid for alignment when symbol shndx is set to SHN_COMMON.

I'll add a test case and fix other failing tests. Thanks.

ruiu added inline comments.May 31 2018, 1:26 PM
ELF/InputFiles.cpp
905

Are you sure if this is correct?

pcc added a subscriber: pcc.May 31 2018, 1:30 PM
pcc added inline comments.
ELF/InputFiles.cpp
905

It doesn't look right to me -- a dynamic symbol table should not contain symbols in SHN_COMMON.

shenhan added inline comments.May 31 2018, 1:32 PM
ELF/InputFiles.cpp
905

From ELF specification:

If a symbol’s value refers to a specific location within a section, its section index member, st_shndx,
holds an index into the section header table. (shenhan: if st_shndx contains a valid section index, then st_value is a specific location)

Some special section index values give other semantics.

SHN_COMMON The symbol labels a common block that has not yet been allocated. The symbol’s value
gives alignment constraints, similar to a section’s sh_addralign member. That is, the
link editor will allocate the storage for the symbol at an address that is a multiple of
st_value. (shenhan: if st_shndx equals SHN_COMMON, then st_value is the alignment requirement).

I put my understandings in the (shenhan: ...)

ruiu added a comment.May 31 2018, 1:34 PM

Common symbol is for (linkable) object files. Linked object files (i.e. execuables or DSOs) don't contain common symbols.

pcc added inline comments.May 31 2018, 1:39 PM
ELF/InputFiles.cpp
906

Actually, now that I think about it, I think this was correct before. The sh_addralign field says what the maximum alignment of anything in the section is going to be, and the code on lines 899-900 (in the old code) uses the symbol's st_value to infer a smaller alignment wherever possible.

ruiu added inline comments.May 31 2018, 1:45 PM
ELF/InputFiles.cpp
906

Hmm, you are probably right. Han, what are you trying to fix by this change?

shenhan added inline comments.May 31 2018, 1:51 PM
ELF/InputFiles.cpp
906

I cc'ed you and @pcc in the internal bug.

ELF/InputFiles.cpp
906

@pcc I debugged this function, the first 2 symbols this function processes are:

symbol1: shndx=18 st_value=0
symbol2: shndx=18 st_value=112

It appears to me st_value is more like a offset than a alignment information. (But it could be interpreted as alignment as well if we count the tailing zeros)

However, st_value=0 for symbol1 results in the final RET being 1 in the original code, that is apparently not correct.

So definitely the alignment calculation here is not correct. That's why if I change the variable name (so it does not appear as the first symbol) the program runs correctly.

ruiu added inline comments.May 31 2018, 2:10 PM
ELF/InputFiles.cpp
906

I guess that Ret should just be initialized to (uint64_t)-1?

By the way, all that confusion could have been prevented if you have added a test case to this patch from beginning, so please always include a test case (from the beginning) to demonstrate what you are fixing. I guess that without a test, even you are not sure that this fix is correct...

pcc added inline comments.May 31 2018, 2:11 PM
ELF/InputFiles.cpp
906

In a linked executable or shared library the st_value is supposed to be a virtual address, not the offset from the start of a section. So it's surprising to see st_value=0 here, unless maybe this is a special symbol like __ehdr_start. Maybe the problem is with the producer of the .so?

Thanks @ruiu .

@pcc do you agree with initializing Ret to uint64-1?

ruiu added a comment.May 31 2018, 2:36 PM

Can you first add a test to demonstrate the issue you are fixing?

pcc added a comment.May 31 2018, 2:39 PM

Thanks @ruiu .

@pcc do you agree with initializing Ret to uint64-1?

Sounds good. Even in the case where the virtual address is intentionally 0, we would still want to use the sh_addralign alignment.

shenhan updated this revision to Diff 149374.May 31 2018, 3:43 PM
shenhan edited the summary of this revision. (Show Details)May 31 2018, 3:49 PM

Thanks @ruiu Updated the patch, also attached a test case.

pcc added inline comments.May 31 2018, 3:55 PM
ELF/InputFiles.cpp
899

Is that correct? If there is a symbol at virtual address 0 in an unknown section, doesn't that mean that we should error out via the Ret > UINT32_MAX code path below since we don't know what alignment to use?

test/ELF/relocation-copy-alignment.s
4

What are the st_values of the symbols in %t.so? I don't think this would result in any of them being 0. You might want to consider using yaml2obj to create the .so file.

shenhan added inline comments.May 31 2018, 4:41 PM
ELF/InputFiles.cpp
899

We have quite a few tests that link with libm.so, which has quite a few symbols with st_value as 0 and shndx as "ABS". Shall we handle symbols with "ABS" shndex separately (if symbols has absolute address, linker does not need to change its address or its address is not relevant, right?)

 221: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS GLIBC_2.4
 238: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS GLIBC_2.2.5
 394: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS GLIBC_2.15
 399: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS GLIBC_2.18
1736: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS GLIBC_2.15
1922: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS GLIBC_2.4
1953: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS GLIBC_2.18
2136: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS GLIBC_2.2.5
pcc added inline comments.May 31 2018, 6:00 PM
ELF/InputFiles.cpp
899

We should only end up in this function if we need to create a copy relocation for a symbol. Those symbols appear to be associated with symbol versions so we wouldn't expect to be linking against them directly. Furthermore I don't think there is anything sensible that we can do if we're asked to create a copy relocation for an absolute symbol, so I think the correct behaviour would be to error out.

ruiu added inline comments.May 31 2018, 8:01 PM
test/ELF/Inputs/relocation-copy-alignment.s
7

Do not include a trailing blank line. I'd recommend you do git diff before sending a patch and visit reviews.llvm.org to review your patch yourself on the web to catch this kind of errors.

test/ELF/relocation-copy-alignment.s
4

I want to see a test case with llvm-mc because it would be more realistic test case and demonstrate a real problem. With yaml2obj, it is perhaps too easy to create a nonsensical object file.

pcc added inline comments.May 31 2018, 8:09 PM
test/ELF/relocation-copy-alignment.s
4

I'm not sure if it's actually possible to do that using llvm-mc, though. It may be possible using a linker script, but that seems more fragile than testing for the property more directly.

ruiu added inline comments.May 31 2018, 8:13 PM
test/ELF/relocation-copy-alignment.s
4

That makes me wonder how this issue caused a trouble to our internal software. Looks like I need to take a look at that program...

shenhan added inline comments.Jun 1 2018, 10:36 AM
ELF/InputFiles.cpp
899

By reading/debugging the code - this "getAlignment" is called by "parseRest" and parseRest iterates all GlobalELFSyms() and calls getAlignment for each symbol (except when sym is undefined or binding is STB_LOCAL or its a mips symbol).

For the above example, GLIBC_2.4 is a global defined symbol, so lld tries to create COPY_RELOC for the symbol, the logic is correct here.

So shall we exclude such symbols (global defined symbol but with abs address 0) at the first place, eg. in parseRest?

pcc added inline comments.Jun 1 2018, 11:06 AM
ELF/InputFiles.cpp
899

You're right, we do end up querying the alignment for every symbol here:
http://llvm-cs.pcc.me.uk/tools/lld/ELF/InputFiles.cpp#967

The alignment is stored in a field here:
http://llvm-cs.pcc.me.uk/tools/lld/ELF/Symbols.h#264

But the only user of the alignment is in addCopyRelSymbol here:
http://llvm-cs.pcc.me.uk/tools/lld/ELF/Relocations.cpp#537

I think you'll find that we only call addCopyRelSymbol on symbols that we create a copy relocation for, and GLIBC_2.4 shouldn't be one of them.

We need to add absolute symbols to the symbol table so that they can be linked against, but we don't need to allow creating copy relocations for them. It would probably be fine to store 0 in the alignment field for such symbols and error out in addCopyRelSymbol if we see a 0.

shenhan updated this revision to Diff 149538.Jun 1 2018, 1:44 PM

@pcc thanks for the review. Updated the patch -

  1. if getAlignment sees a ABS symbols, returns 0
  2. initialize Ret to UINT64_MAX
  3. in "addCopyRelSymbol", errors out if alignment is 0.
  4. unit test case that tests for proper handling of ABS symbols.
ruiu added inline comments.Jun 1 2018, 1:54 PM
ELF/Relocations.cpp
532–534 ↗(On Diff #149538)

Can you really trigger this? Please always add a test when you add a new feature.

pcc added inline comments.Jun 1 2018, 1:58 PM
ELF/InputFiles.cpp
909

I was thinking that instead of erroring here you would return 0. Then you wouldn't need the special case for SHN_ABS.

shenhan updated this revision to Diff 149540.Jun 1 2018, 2:09 PM
shenhan added inline comments.
ELF/Relocations.cpp
532–534 ↗(On Diff #149538)

No, from the discussion, this could never be triggered, otherwise this is a linker error.
Replaced with assert.

ruiu added inline comments.Jun 1 2018, 2:12 PM
ELF/Relocations.cpp
532–534 ↗(On Diff #149538)

We don't add use assert() that much in lld. Please just remove.

shenhan updated this revision to Diff 149542.Jun 1 2018, 2:13 PM
shenhan updated this revision to Diff 149548.Jun 1 2018, 2:23 PM
shenhan marked 4 inline comments as done.
shenhan added inline comments.
ELF/InputFiles.cpp
909

Yup, done.

pcc added inline comments.Jun 1 2018, 3:05 PM
test/ELF/copy-relocation-zero-abs-addr.s
15 ↗(On Diff #149548)

I think this is still missing a test showing that a non-SHN_ABS symbol with st_value=0 will be aligned using the section's sh_addralign.

shenhan updated this revision to Diff 149589.Jun 1 2018, 5:32 PM
shenhan added inline comments.
test/ELF/copy-relocation-zero-abs-addr.s
15 ↗(On Diff #149548)

Thanks. Added test case new copy-relocation-zero-nonabs-addr.s.

Mind take another look?

ruiu added inline comments.Jun 5 2018, 9:26 AM
ELF/InputFiles.cpp
911

I'm not too excited about change. This error message was to catch a nearly-broken object file that has insanely large alignment requirement. Now you are making a change to ignore such object file.

pcc added inline comments.Jun 5 2018, 11:25 AM
ELF/Relocations.cpp
532–534 ↗(On Diff #149538)

It's possible to get here with something like (the somewhat contrived)

.globl abs
abs = 0
.size abs, 1

in a .so, I think. I was imagining that we'd change the if (SymSize == 0) above to be if (SymSize == 0 || SS.Alignment == 0). That would cover this case as well as the huge alignment case.

shenhan updated this revision to Diff 150037.Jun 5 2018, 1:56 PM
shenhan marked an inline comment as done.

Re-write the getAlignment function, keep the original logic but with the 3 additional tests:

  1. return 0 for symbols with absolute address
  2. for symbols with zero st_value, return section alignment.
  3. for symbols with zero st_value and whose section alignment does not exist, return 0
ELF/Relocations.cpp
532–534 ↗(On Diff #149538)

Done. Also, rewrite the getAlignment function, so huge alignment is erred out in getAlignment. And absolute symbols will error out here.

pcc added inline comments.Jun 5 2018, 2:06 PM
ELF/InputFiles.cpp
911

It should be possible to link against symbols with huge alignments, we just can't create copy relocations for them. So I think you can make InputFiles.cpp look more like your previous revision (i.e. return 0 on huge alignments). The check that you've added to Relocations.cpp should be enough to make sure that we error out in that case. We don't have to produce a good error message in that case because we expect it to be rare.

shenhan updated this revision to Diff 150050.Jun 5 2018, 3:16 PM
shenhan added inline comments.
ELF/InputFiles.cpp
911

Thanks, done.

pcc accepted this revision.Jun 5 2018, 4:07 PM

This LGTM if @ruiu is happy.

This revision is now accepted and ready to land.Jun 5 2018, 4:07 PM

@ruiu whatś your opinion on this?

ruiu accepted this revision.Jun 6 2018, 1:56 PM

LGTM

ELF/Relocations.cpp
532 ↗(On Diff #150050)

If this code is reachable by giving a (even contrived) object file, it should've been error() instead of fatal(), but you don't need to fix that since that's not new code. I'll address that in a follow-up patch.

This revision was automatically updated to reflect the committed changes.