This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Add --set-section-alignment
ClosedPublic

Authored by MaskRay on Sep 17 2019, 5:21 AM.

Details

Summary

Fixes PR43181. This option was recently added to GNU objcopy (binutils
PR24942).

llvm-objcopy -I binary -O elf64-x86-64 --set-section-alignment .data=8 can set the alignment of .data.

Diff Detail

Event Timeline

MaskRay created this revision.Sep 17 2019, 5:21 AM

binutils commit fa463e9fc644e7a3bad39aa73bf6be72ea865805 (2019-08-28) implemented --set-section-alignment.

  • Its interaction with --rename-section is unclear. We disallow the combination of --rename-section + --set-section-flags because both options can specify the flags. --set-section-alignment does not have the conflict.
  • In GNU objcopy, --set-section-alignment .data=8 specifies the exponent for the power of two, so this sets the alignment to 256. I think this a strange behavior.

Reported both issues at https://sourceware.org/bugzilla/show_bug.cgi?id=24942

MaskRay updated this revision to Diff 220486.Sep 17 2019, 5:31 AM

Delete unused SectionAlignmentUpdate

grimar added inline comments.Sep 17 2019, 5:49 AM
test/tools/llvm-objcopy/ELF/binary-input.test
120

We sometimes use something like:

# ALIGN:      Name: .data
# ALIGN:      AddressAlignment:
# ALIGN-SAME: 8

Its a bit more straightforward way probably.

test/tools/llvm-objcopy/ELF/set-section-alignment.test
28

The number of spaces between ":" and "error" is different in these 4 tests. I'd suggest stick to a single space everywhere.
(or to align this message if your aim was to have a vertical aligned "error: " lines).

btw, should this one be a error? Tools sometimes allows overriding the options (e.g. linker), it does not seem to be a huge problem to set
the alignment multiple times.

40

What about adding a section with alignment != 0 and a test that shows it can be dropped to 0?

MaskRay updated this revision to Diff 220530.Sep 17 2019, 9:59 AM
MaskRay marked 2 inline comments as done.

Address review comments
Depends on D67668

MaskRay added inline comments.Sep 17 2019, 10:06 AM
test/tools/llvm-objcopy/ELF/set-section-alignment.test
28

or to align this message if your aim was to have a vertical aligned "error: " lines

Yes, they are aligned by error: .

btw, should this one be a error?

I agree this does not have to be an error. I'll let the last option win.

jakehehrlich added inline comments.Sep 17 2019, 11:42 AM
tools/llvm-objcopy/ELF/ELFObjcopy.cpp
769

I'm confused, you still go though handleArgs here. Why do you need to handle this flag above?

MaskRay updated this revision to Diff 220600.Sep 17 2019, 7:29 PM
MaskRay marked an inline comment as done.

Delete change to BinaryReader
Don't depend on insert_or_assign.

MaskRay marked an inline comment as done.Sep 17 2019, 7:30 PM
MaskRay added inline comments.
tools/llvm-objcopy/ELF/ELFObjcopy.cpp
769

They are not needed. handleArgs sets the alignment of the newly created .data. I don't know why I added the code.

MaskRay marked 2 inline comments as done.Sep 17 2019, 7:33 PM
MaskRay updated this revision to Diff 220601.Sep 17 2019, 7:33 PM

Forgot to fix a test

abrachet marked an inline comment as done.Sep 17 2019, 7:56 PM
abrachet added inline comments.
tools/llvm-objcopy/ELF/ELFObjcopy.cpp
673

Is this idea behind not putting this and the SetSectionFlags loop that does the same thing in the same loop because they are unlikely to be used? Without looking very far I see 4 times including here that we go through every section.

Not that it makes sense for this patch but it might eventually be worth it to hash every section by name into a StringMap so we only iterate once over all sections and can handle SetSectionAlignment, SetSectionFlags and SectionsToRename more efficiently.

MaskRay marked an inline comment as done and an inline comment as not done.Sep 17 2019, 8:29 PM
MaskRay added inline comments.
tools/llvm-objcopy/ELF/ELFObjcopy.cpp
673

I think the time is dominated by the hash table lookup SetSectionAlignment.find, not the loop itself. So separate loop is probably better if we unlikely do lots of different operations in the same invocation of llvm-objcopy.

The same strategy is used by replaceAndRemoveSections. Though, I am not there it is the best there. The updateAndRemoveSymbols way may be simpler. I'd like to explore whether regrouping some operations may make the logic clearer there.

grimar accepted this revision.Sep 18 2019, 1:40 AM

It looks good to me, but please wait until someone else approve this too.

This revision is now accepted and ready to land.Sep 18 2019, 1:40 AM

I'm very happy to see this change, but I hesitate to diverge in behaviour from GNU's behaviour. I certainly prefer --set-section-alginment to set it to an arbitrary alignment though, so I think we need to lean on the GNU maintainers to change their end.

test/tools/llvm-objcopy/ELF/set-section-alignment.test
36

What about a test case for where the section does not exist?

tools/llvm-objcopy/CopyConfig.cpp
172

Maybe "invalid alignment" would be a slightly better wording? I'm not sure though.

604

It's not clear to me what SAU stands for. Perhaps NameAndAlignment would be a better name?

tools/llvm-objcopy/ELF/Object.h
949–950

This change seems unrelated? (Happy for it to be a separate commit though).

MaskRay marked 6 inline comments as done and an inline comment as not done.Sep 23 2019, 8:33 PM

I'm very happy to see this change, but I hesitate to diverge in behaviour from GNU's behaviour. I certainly prefer --set-section-alginment to set it to an arbitrary alignment though, so I think we need to lean on the GNU maintainers to change their end.

The behavior difference is also my concern, so I hold off on this patch. I argue at https://sourceware.org/bugzilla/show_bug.cgi?id=24942#c9 that --set-section-alignment .foo=8 => sh_addralign=256 is counterintuitive. Please chime in if you feel the same.

tools/llvm-objcopy/CopyConfig.cpp
604

Changed to NameAndAlign (NameAndAlignment would cause a line wrap).

tools/llvm-objcopy/ELF/Object.h
949–950

Thanks for noting this. I originally added a new parameter here. It seems I forgot to delete the parameter when I switched to a new approach.

MaskRay updated this revision to Diff 221460.Sep 23 2019, 8:34 PM
MaskRay marked 2 inline comments as done.

Address review comments

I'm very happy to see this change, but I hesitate to diverge in behaviour from GNU's behaviour. I certainly prefer --set-section-alginment to set it to an arbitrary alignment though, so I think we need to lean on the GNU maintainers to change their end.

The behavior difference is also my concern, so I hold off on this patch. I argue at https://sourceware.org/bugzilla/show_bug.cgi?id=24942#c9 that --set-section-alignment .foo=8 => sh_addralign=256 is counterintuitive. Please chime in if you feel the same.

When you say chime in do you mean here or on that bug report?

Yes I also think that specifying in powers of two is very strange when no other alignment related flags accept input like this. It seems like from what the people on that bug report said they went for this behavior because bfd represents section alignment in this form. Which is a silly reason to make users need to just remember that this one flag accepts alignment as 2^<align>. In their patch, I wouldn't even get an error --set-section-alignment .foo=256 would later become 1 << 256 if I followed their code correctly after a quick skim. Now my section has sh_addralign=0 with no warning why, this flag isn't matured yet on their end.

I don't know how long you will need to wait for them to change it given this was committed very recently, and they seemed uninterested in your suggestion to change it to something more sensible.

I'm very happy to see this change, but I hesitate to diverge in behaviour from GNU's behaviour. I certainly prefer --set-section-alginment to set it to an arbitrary alignment though, so I think we need to lean on the GNU maintainers to change their end.

The behavior difference is also my concern, so I hold off on this patch. I argue at https://sourceware.org/bugzilla/show_bug.cgi?id=24942#c9 that --set-section-alignment .foo=8 => sh_addralign=256 is counterintuitive. Please chime in if you feel the same.

When you say chime in do you mean here or on that bug report?

I mean https://sourceware.org/bugzilla/show_bug.cgi?id=24942 , the bugzilla ticket where people discuss the --set-section-alignment option in GNU objcopy. Can you please forward the comment there:) ?

If you'd like to play with GNU objcopy:

git clone git://sourceware.org/git/binutils-gdb.git
cd binutils-gdb
mkdir Debug
../configure --enable-binutils --disable-gas --disable-gdb --disable-gold --disable-ld CFLAGS='-O0 -g'
make -j all-binutils
# Built binutils/objcopy
jhenderson accepted this revision.Sep 24 2019, 2:03 AM

Looks good to me, but as agreed, let's wait for the GNU objcopy discussion to reach a conclusion before committing. I've posted there expressing my view. FWIW, our proprietary equivalent to objcopy uses raw values for this, rather than powers of two stuff, so the behaviour proposed in this patch would make life easier for them if and when they migrate over.

Looks like the author of that patch submitted a new patch to accept alignment of power of twos not 2^<align> after James’ comment :)

That patch will give an error if the given <align> is not a power of two Including 0. Do you have thoughts on erroring for non powers excluding 0?

Also, it might be worth adding a test setting alignment to a negative value to make sure StringRef::getAsInteger<unsigned> properly fails.

Looks good to me, but as agreed, let's wait for the GNU objcopy discussion to reach a conclusion before committing. I've posted there expressing my view. FWIW, our proprietary equivalent to objcopy uses raw values for this, rather than powers of two stuff, so the behaviour proposed in this patch would make life easier for them if and when they migrate over.

+1, whichever choice we take, the two tools should be in agreement, so let's wait. Also a personal +1 for preferring setting the literal value as in this patch rather than the exponent.

test/tools/llvm-objcopy/ELF/binary-input.test
119–120

I don't think splitting it into -SAME lines is very readable here; it seems more useful in cases where it needs to be a different check prefix. Combine it?

test/tools/llvm-objcopy/ELF/set-section-alignment.test
8–10

Same for all the -SAME in this file

38–40

Also add a test for a very large alignment, e.g. 2^33?

tools/llvm-objcopy/CopyConfig.cpp
170–173

What about an error check for NewAlign != 2^x?

MaskRay marked 6 inline comments as done.Sep 24 2019, 11:56 PM
MaskRay added inline comments.
test/tools/llvm-objcopy/ELF/set-section-alignment.test
8–10

If I do not use -SAME, I'll have to enumerate all fields between Name and AddressAlignment, with -NEXT:. I think -SAME is clearer.

CHECK: AddressAlignment: 4 can match something like:

Name: foo
AddressAlignment: 8

Name: bar
AddressAlignment: 4
38–40

Huge alignment like --set-section-alignment=.foo=0x100000001 cannot be used in portable tests. The llvm-objcopy segment/section writer will advance file offsets to a multiple of the alignment, which will create a huge sparse file. If the underlying file system does not support sparse files, we may get ENOSPC.

(As an missing optimization, we can change the layout algorithm to not advance file offsets for the next section of SHT_NOBITS (GNU objcopy has such a rule), but I don't think that is worthwhile: we would lose the nice property that sh_offset is monotonically non-decreasing)

Given the file size problem, I do not want to test the 2^33 case.

tools/llvm-objcopy/CopyConfig.cpp
170–173

set-section-alignment.test tests non-power-of-2 alignments, e.g.

# RUN: llvm-objcopy --set-section-alignment .foo=4 --set-section-alignment .bar=0x5 \
# RUN:   --set-section-alignment .baz=0 %t %t.2

I don't think that makes sense in practice, but supporting that does not hurt anything, and saves us some error checking code.

jhenderson added inline comments.Sep 25 2019, 1:59 AM
test/tools/llvm-objcopy/ELF/set-section-alignment.test
8–10

+1 to keeping -SAME. I find it less readable if I have to read lots of irrelevant fields. Maybe indenting the number might make it more readable, but I'm not sure:

# CHECK:      Name: .foo
# CHECK:      AddressAlignment:
# CHECK-SAME:                   4
rupprecht added inline comments.Sep 25 2019, 10:55 AM
test/tools/llvm-objcopy/ELF/set-section-alignment.test
8–10

Ah, I see why this simplifies it now. I wasn't familiar with this FileCheck idiom -- maybe we should add this to the docs. I'll see if I can write up my understanding.

Now that I get it, there's a whole lot of other tests I think we can clean up :)

38–40

SGTM, can you just check manually that it works? I kinda want to just make sure that the getAsInteger method you're using won't silently fail for larger than int32 types.

If you had error checking for pow-2 types, you could also cheat by testing the error string since that would fail fast & not actually start to lay anything out in a file system (and hit the portability problems).

tools/llvm-objcopy/CopyConfig.cpp
170–173

It would silently generate an invalid ELF file at least, per http://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_header: "... Currently, only 0 and positive integral powers of two are allowed."

I think it's useful because, while it seems we all agree it's more natural to have this be the raw value, there may be some that think this is the exponent, and may expect in your example that .bar has an alignment of 2^5.

I don't know that Mach-O/COFF have similar restrictions, so I don't know if this check can be general or needs to go in an ELF-specific error checker.

MaskRay marked 4 inline comments as done.Sep 25 2019, 8:02 PM
MaskRay added inline comments.
tools/llvm-objcopy/CopyConfig.cpp
170–173

I agree the ELF spec does not say the alignment can be a non-power-of-two. Allowing it can be seen as an extension, just like some language constructs that are accepted by gcc/clang but not allowed by the C/C++ standard. I think allowing non-power-of-two is fine:

  • There are many ways to create an executable/shared object that cannot run.
  • --set-section-alignment=5 can be seen as a strong signal that the user requests for an invalid setting. This can be used for testing purposes. For example, yaml2obj is capable to create lots of types of invalid objects.
  • sh_addralign is ignored for executables/object files. When applied on an object, a gold or lld user will soon notice the problem if they do --set-section-alignment=5 but expected 32 => ld.lld: error: b.o:(.baz): sh_addralign is not a power of 2, gold: error: b.o: invalid alignment 17 for section ".baz". GNU ld does not error, but silently rounded up sh_addralign. I think I shall report a bug.
  • Adding the error checking takes some lines of code.

So I think it is not necessary to restraint --set-section-alignment= to what the ELF spec allows.

MaskRay updated this revision to Diff 221883.Sep 25 2019, 8:03 PM

Align -SAME

jhenderson added inline comments.Sep 26 2019, 1:53 AM
test/tools/llvm-objcopy/ELF/binary-input.test
120

ALIGN-SAME: 8{{$}} maybe to avoid spuriously matching against '8000' or whatever.

Same goes elsewhere.

MaskRay updated this revision to Diff 221899.Sep 26 2019, 2:00 AM
MaskRay marked an inline comment as done.

Append {{$}}, i.e. use ALIGN-SAME: 8{{$}}

jhenderson accepted this revision.Sep 26 2019, 2:40 AM

Latest changes LGTM.

rupprecht accepted this revision.Sep 26 2019, 10:30 AM

LGTM too, thanks for the clarifications!

Thanks to everyone who has taken part in the discussion. GNU objcopy mainline has switched to the raw value instead of a power of two: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=146a3bb9cc78c459e63c23259d766b01c32df705
With a bit of luck this change may be included in the binutils 2.33 release sources https://sourceware.org/ml/binutils/2019-10/msg00011.html

I'll commit this change in a minute.

This revision was automatically updated to reflect the committed changes.