This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Implemented linkerscript sections padding.
ClosedPublic

Authored by grimar on Feb 15 2016, 7:51 AM.

Details

Reviewers
ruiu
rafael
Summary

BSD linker scripts contain special cases to add NOP padding to code sections. Syntax is next:

.init:
 {
   KEEP (*(.init))
 } =0x90909090

(0x90 is NOP)

This patch implements that functionality.

Diff Detail

Event Timeline

grimar updated this revision to Diff 47993.Feb 15 2016, 7:51 AM
grimar retitled this revision from to [ELF] - Implemented linkerscript sections padding..
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar updated this object.
grimar added subscribers: llvm-commits, grimar.
emaste added a subscriber: emaste.Feb 15 2016, 8:37 AM
ruiu added inline comments.Feb 16 2016, 11:18 AM
ELF/InputSection.cpp
250–261 ↗(On Diff #47993)

This is not a right place to add this code. You want to add code to OutputSection<ELFT>::writeTo instead. I think the following code should suffice as an initial implementation.

+static void fill(uint8_t *Buf, size_t Size, ArrayRef<uint8_t> A) {
+  size_t I = 0;
+  for (; I < Size; I += A.size())
+    memcpy(Buf + I, A.data(), A.size());
+  memcpy(Buf + I, A.data(), Size - I);
+}
+
 template <class ELFT> void OutputSection<ELFT>::writeTo(uint8_t *Buf) {
+  ArrayRef<uint8_t> Filler = Script->getFiller(Name);
+  if (!Filler.empty())
+    fill(Buf, getSize(), Filler);
   for (InputSection<ELFT> *C : Sections)
     C->writeTo(Buf);
 }
grimar added inline comments.Feb 17 2016, 2:57 AM
ELF/InputSection.cpp
250–261 ↗(On Diff #47993)

Your code mixes the filling byters order it seems. I mean if we have two sections, 1 byte each with alignment 4, then if we have =0x11223344 filler:

My code will produce: [SEC1] 0x11 0x22 0x33 [SEC2].
Your one will have [SEC1] 0x22 0x33 0x44 [SEC2].

I am not sure how much that is important as I saw fillers are used for NOPs which always were 0x90909090. If we are assuming that that is not important at all, then instead of fill() call we can use

memset(Buf, Filler[0], getSize());

But if we want to support the order (and use all bytes), then I will adopt the algorithm.
Anyways I dont think we should use the fill() as it is now since it is somewhere in the middle.
What do you think ?

ruiu added inline comments.Feb 17 2016, 9:08 AM
ELF/InputSection.cpp
250–261 ↗(On Diff #47993)

I understand your point. As you might have implied, fill() should just work, so I'd like to go with that. That is way simpler than the current approach and probably faster (memcpy is usually pretty fast, which I experimented with Windows LLD.) I don't think setting the first byte is a good idea. NOP is one byte instruction on x86, but on RISC processors it is not.

grimar updated this revision to Diff 48285.Feb 18 2016, 1:15 AM
  • Addressed review comments.
grimar added inline comments.Feb 18 2016, 1:17 AM
ELF/InputSection.cpp
250–261 ↗(On Diff #47993)

Well, and the last means that version of fill() from above will not work as expected on RISC. It anyways will not work as expected for cases where one byte is not enough (because of shifted bytes). Moreover, what it worse, some times it _can_ work correctly, so that is random bug.

I don't argue about it is faster and simplier than current approach. Thats true. I just want to say that fill() should ideally be fixed to be correct and I am ready to do that. Since you insisted to use it as is, I did that in current patch iteration (just fixed the buffer overrun).

Current implementation should at least have assert/error that checks that all 4 bytes are equal I think.

ruiu added inline comments.Feb 18 2016, 12:53 PM
ELF/LinkerScript.cpp
444–451

So you cannot use getAsInteger because it may be larger than the integer size.

ELF/LinkerScript.h
45

I realized that I didn't actually understand what you said until now (because my approach should work and memcpy is the right way to copy filler to the section). You seems to misunderstand the feature.

If a filler is of the form 0x<hex digits>, then it is an arbitrary long sequence of hex digits. It is not a four byte value.

So this type needs to be std::vector<uint8_t>.

grimar added inline comments.Feb 19 2016, 7:27 AM
ELF/LinkerScript.h
45

Let me try to explain again then. Consider the next script and code:

SECTIONS {
 .mysec : { *(.mysec*) } =0x11223344 
}
.section        .mysec.1,"a"
.align  4
.byte   0x66

.section        .mysec.2,"a"
.align  4
.byte   0x66

.globl _start
_start:
 nop

Output from my initial implementation would be: 0x66 0x11 0x22 0x33 0x66. What is equal to gold and bytes are used correctly (in growing order, started from first for each 'hole' heeds filling).

Output from your approach is: 0x66 0x22 0x33 0x44 0x66. So bytes are shifted what makes no sence in using bytes at all, since result is unexpected.

So this type needs to be std::vector<uint8_t>.

Agree here, just checked gold, it allows any length filler it seems:

SECTIONS
{
  . = 0x08048000 + SIZEOF_HEADERS;
  .mysec        : { *(.mysec*) } =0x11223344556677889911223344
}
ruiu added inline comments.Feb 22 2016, 3:22 PM
ELF/LinkerScript.h
45

I think I completely understand your point, and still I prefer my approach.

Speaking of Filler's type, can you change the type to std::vector<uint8_t>?

grimar updated this revision to Diff 48912.Feb 24 2016, 4:45 AM
  • Addressed review comments
ruiu added inline comments.Feb 24 2016, 5:20 PM
ELF/LinkerScript.cpp
63–66

Why don't you just do

for (OutSection &C : OutSections)
  if (C.Name == Name)
    return C.Filler;
return {};

?

69–70

This is a bit odd way of doing it because we can use lambdas.

auto I = std::find_if(Begin, End, [](OutSection &C) { return C.Name == A; });
auto J = std::find_if(Begin, End, [](OutSection &C) { return C.Name == B; });
462–469

Define this piece of code as a separate function, say,

std::vector<uint8_t> readHex(StringRef S);

and replace this code with

OutSec.Filler = readHex(Tok);
grimar updated this revision to Diff 49022.Feb 25 2016, 1:44 AM
grimar marked 2 inline comments as done.
  • Addressed review comments.
  • Updated testcase.
ELF/LinkerScript.cpp
69–70

That because I used this functor in getFiller() either before.
Not sure that labdas are looking better here now.

btw that can be something like:

OutSection *I = nullptr;
OutSection *J = nullptr;
for (OutSection &C : OutSections) {
  if (!I && C.Name == A)
    I = &C;
  if (!J && C.Name == B)
    J = &C;
  if (I && J)
    return I < J ? -1 : 1;
}
return 0;
ruiu added inline comments.Feb 25 2016, 9:25 AM
ELF/LinkerScript.cpp
69–70

This new code with lambdas looks good to me.

417

Let's rename parseHex. (Because readSomething usually reads Something from the current token stream.)

419

I don't think you need to call reserve. We are handling tiny data, so allocation cost should be negligible.

423

slice(2, S.size()) -> substr(2)

460–461

Do you need to add check here? You can just go proceed.

grimar updated this revision to Diff 49165.Feb 26 2016, 1:29 AM
grimar marked 4 inline comments as done.
  • Addressed review comments
ELF/LinkerScript.cpp
417

Done.

419

Removed.

423

Done.

460–461

Removed check.

ruiu accepted this revision.Feb 26 2016, 5:07 AM
ruiu edited edge metadata.

LGTM

ELF/LinkerScript.cpp
61

return{} -> return {}

453

This is not safe because Tok may be empty.

if (Tok.startswith("="))
This revision is now accepted and ready to land.Feb 26 2016, 5:07 AM
grimar marked 2 inline comments as done.Feb 26 2016, 6:53 AM
grimar added inline comments.
ELF/LinkerScript.cpp
453

Done.

grimar marked an inline comment as done.Feb 26 2016, 6:53 AM
grimar added inline comments.
ELF/LinkerScript.cpp
61

Done.

grimar closed this revision.Feb 26 2016, 6:55 AM

r262020