This is an archive of the discontinued LLVM Phabricator instance.

[llvm-size][libobject] Add explicit "inTextSegment" methods similar to "isText" section methods to calculate size correctly.
ClosedPublic

Authored by rupprecht on Nov 9 2018, 5:18 PM.

Details

Summary

llvm-size uses "isText()" etc. which seem to indicate whether the section contains code-like things, not whether or not it will actually go in the text segment when in a fully linked executable.

The unit test added (elf-sizes.test) shows some types of sections that cause discrepencies versus the GNU size tool. llvm-size is not correctly reporting sizes of things mapping to text/data segments, at least for ELF files.

This fixes pr38723.

Diff Detail

Repository
rL LLVM

Event Timeline

rupprecht created this revision.Nov 9 2018, 5:18 PM

I think this is the right direction, couple comment:

  • More detail in the comments for the *Segment methods. Maybe something elaborating about link/execution time or something else? Examples as well :)
  • While it's fairly obvious it'll work, a couple of sections with the right flags but not named the standard thing would be good for the test. .text.unlikely and .data1 maybe?
MaskRay added inline comments.Nov 10 2018, 1:25 PM
include/llvm/Object/ELFObjectFile.h
767 ↗(On Diff #173478)

The parentheses outside of ! are redundant.

GNU size uses Berkeley format by default (static int berkeley_format = BSD_DEFAULT;)
I'd say it uses a weird counting https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=binutils/size.c;h=3c72e484752d0272a24ac628f497f89ecf36d547;hb=refs/heads/master#l459

459 if ((flags & SEC_CODE) != 0 || (flags & SEC_READONLY) != 0)
460 textsize += size;
461 else if ((flags & SEC_HAS_CONTENTS) != 0)
462 datasize += size;
463 else
464 bsssize += size;

Basically SHF_EXECINSTR sections (.text) + !SHF_WRITE sections (.rodata .eh_frame ...).

I feel the name inTextSegment is a bit misleading. How about isBerkeleyText?

inDataSegment may be renamed to isBerkeleyData.

MaskRay added inline comments.Nov 10 2018, 1:50 PM
include/llvm/Object/ELFObjectFile.h
767 ↗(On Diff #173478)

The Berkeley format is not that weird. It uses some simple condition to approximate how the traditional linkers partition sections into segments.

With the advent of fine-grained segments (split of R and RX): -z separate-code in ld.bfd and lld's default (unless --no-rosegment), the computation may be less relevant. isBerkeleyText makes more sense to me.

rupprecht updated this revision to Diff 177953.Dec 12 2018, 3:35 PM
  • Remove useless parens
  • Rename inXXXSegment->isBerkeleyXXX
  • Clarify segment comments
  • Add test cases that ensure section flags are used instead of section names
rupprecht marked 2 inline comments as done.Dec 12 2018, 3:39 PM

Sorry, for not getting to this patch sooner, I dropped it due to vacation + some unexpected things... I'm still eager to get this patch in :)

I think this is the right direction, couple comment:

  • More detail in the comments for the *Segment methods. Maybe something elaborating about link/execution time or something else? Examples as well :)

Expanded a bit, I can add more if this isn't enough

  • While it's fairly obvious it'll work, a couple of sections with the right flags but not named the standard thing would be good for the test. .text.unlikely and .data1 maybe?

Done, added a couple examples both ways (named .text.* but not actually text, and named .something_random but is actually data)

(Addressed Ray's comments too)

MaskRay accepted this revision.Dec 12 2018, 3:52 PM
MaskRay added inline comments.
include/llvm/Object/ObjectFile.h
111 ↗(On Diff #177953)

Typo here. initialized -> uninitialized

https://en.wikipedia.org/wiki/.bss "BSS reserves a block of uninitialized data"

This revision is now accepted and ready to land.Dec 12 2018, 3:52 PM
This revision was automatically updated to reflect the committed changes.
rupprecht marked an inline comment as done.