This is an archive of the discontinued LLVM Phabricator instance.

[MC] Improve st_size propagation rule
ClosedPublic

Authored by MaskRay on Apr 6 2022, 11:56 PM.

Details

Summary

.symver foo, foo@ver creates the MCSymbolELF foo@ver whose almost all
attributes (including st_size) should inherit from foo (GNU as behavior).

a041ef1bd8905f0d58e301c6830b183002ff1847 added st_size propagation which works
for many cases but fails for the following one:

.set __GLIBC_2_12_sys_errlist, _sys_errlist_internal
.type   __GLIBC_2_12_sys_errlist,@object
.size   __GLIBC_2_12_sys_errlist, 1080
.symver __GLIBC_2_12_sys_errlist, sys_errlist@GLIBC_2.12
...
_sys_errlist_internal:
.size   _sys_errlist_internal, 1072

sys_errlist@GLIBC_2.12's st_size is 1072 (incorrect), which does not match
__GLIBC_2_12_sys_errlist's st_size: 1080.

The problem is that Base is (the final) _sys_errlist_internal while we want
to respect (the intermediate) __GLIBC_2_12_sys_errlist's st_size.
Fix this by following the MCSymbolRefExpr assignment chain and finding
the closest non-null getSize(), which covers most needs. Notably MCBinaryExpr
is not handled, but it is rare enough to matter.

Diff Detail

Event Timeline

MaskRay created this revision.Apr 6 2022, 11:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2022, 11:56 PM
MaskRay requested review of this revision.Apr 6 2022, 11:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2022, 11:56 PM
MaskRay updated this revision to Diff 421104.Apr 6 2022, 11:56 PM

drop an unneeded test change

MaskRay added inline comments.Apr 7 2022, 12:05 AM
llvm/test/MC/ELF/offset.s
50

I think this is an improvement. No objections from me. I had a quick thought about how we might be able to support the chained assignment case although it is only based on looking at this review so it could be flawed. Ideally we'd not wait to propagate the size in writeSymbol. I'm guessing that it is possible to write something like

.size x, 2
y = x
z = y
.size y, 1 // should also set size of z to 1.

So it is not as simple as propagating the size on assignment. Perhaps it would be some kind of size propagation step for all symbols prior to writing them.

llvm/lib/MC/ELFObjectWriter.cpp
556

Nit: which covers most needs or which can cover most needs

558

Would it be possible to follow the chain to get to the last non-zero size. This is assuming that in the case

.size x, 2
y = x
.size y, 1
z = y
z1 = z

*Sym = dyn_cast<MCSymbolRefExpr>(Symbol.getVariableValue(false) for z1 will give z
and Symbol.isVariable() is true for Sym, in that case can we use another if (auto *Sym = dyn_cast<MCSymbolRefExpr>(Symbol.getVariableValue(false)) to get to y, check that size is non zero and use that?

If this is possible I don't think it would be too much more complex.

I tested this patch on my glibc clang branch and while it has fixed the st_value aliases issues, it now triggers another regression:

$ cat /tmp/test.S 
.symver __GLIBC_2_1_sys_nerr, sys_nerr@GLIBC_2.2.5
.symver __GLIBC_2_1__sys_nerr, _sys_nerr@GLIBC_2.2.5

                                        # End of file scope inline assembly
        .type   __GLIBC_2_1_sys_nerr,@object    # @__GLIBC_2_1_sys_nerr
        .section        .rodata,"a",@progbits
        .globl  __GLIBC_2_1_sys_nerr
        .p2align        2
__GLIBC_2_1_sys_nerr:
        .long   125                             # 0x7d
        .size   __GLIBC_2_1_sys_nerr, 4

        .globl  __GLIBC_2_1__sys_nerr
.set __GLIBC_2_1__sys_nerr, __GLIBC_2_1_sys_nerr
$ clang /tmp/test.S -c -o /tmp/test-clang.os
$ $ readelf -s /tmp/test-clang.os  | grep _sys_nerr@
     4: 0000000000000000     0 OBJECT  GLOBAL DEFAULT    3 _sys_nerr@GLIBC_2.2.5

It should have the size '4' instead of '0'.

MaskRay updated this revision to Diff 421364.Apr 7 2022, 4:43 PM
MaskRay marked 2 inline comments as done.
MaskRay edited the summary of this revision. (Show Details)

Implement the chained lookup. Fix zatrazz's case.

llvm/lib/MC/ELFObjectWriter.cpp
558

Thanks for the suggestion. Adopted.

MaskRay added a comment.EditedApr 7 2022, 4:55 PM

So it is not as simple as propagating the size on assignment. Perhaps it would be some kind of size propagation step for all symbols prior to writing them.

Agree. The complexity is largely because the base symbol's size may be set after an assignment.

The size propagation basically needs to be done in a topological order (e.g. reverse post-order).
It can be implemented as a separate pass, or as the current approach does.
The current approach just has quadratic complexity in terms of the number of assignments. I think it is fine since MC already has such quadratic behaviors in several places (e.g. Asmparser "Recursive use of " error checking, Layout.getBaseSymbol(Symbol) just few lines above) and the complexity doesn't matter in practice (it's difficult to have a too-long assignment chain).
What's implemented here seems sufficient for a large number of use cases of using linker scripts symbol assignments and .symver.

peter.smith accepted this revision.Apr 8 2022, 5:27 AM

Thanks for the update. Assuming zatrazz has no objections LGTM.

This revision is now accepted and ready to land.Apr 8 2022, 5:27 AM
This revision was landed with ongoing or failed builds.Apr 8 2022, 10:06 AM
This revision was automatically updated to reflect the committed changes.