This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ELF] Add support for INPUT_SECTION_FLAGS
ClosedPublic

Authored by peter.smith on Jan 15 2020, 2:39 AM.

Details

Summary

The INPUT_SECTION_FLAGS linker script command is used to constrain the section pattern matching to sections that match certain combinations of flags.

There are two ways to express the constraint.

  • withFlags: Section must have these flags.
  • withoutFlags: Section must not have these flags.

The syntax of the command is:

INPUT_SECTION_FLAGS '(' sect_flag_list ')'
sect_flag_list: NAME
| sect_flag_list '&' NAME

Where NAME matches a section flag name such as SHF_EXECINSTR, or the integer value of a section flag. If the first character of NAME is ! then it means must not contain flag.
As an example from the ld man page: https://sourceware.org/binutils/docs/ld/Input-Section-Basics.html

SECTIONS {
  .text : { INPUT_SECTION_FLAGS (SHF_MERGE & SHF_STRINGS) *(.text) }
  .text2 :  { INPUT_SECTION_FLAGS (!SHF_WRITE) *(.text) }
}
  • .text will match sections called .text that have both the SHF_MERGE and SHF_STRINGS flag.
  • .text2 will match sections called .text that don't have the SHF_WRITE flag.

The flag names accepted are the generic to all targets and SHF_ARM_PURECODE as it is very useful to filter all the pure code sections into a single program header that can be marked execute never.

fixes PR44265 https://bugs.llvm.org/show_bug.cgi?id=44265

The GNU ld documentation for this feature is limited to the link above. I found the grammar https://github.com/bminor/binutils-gdb/blob/master/ld/ldgram.y clear enough, particularly sect_flags.

As I see it the main use case for INPUT_SECTION_FLAGS is porting projects from other linkers that primarily match by section flags rather than by name. For example Arm's proprietary linker scatter file notation uses notation like:

RO 0x8000 { *(+ro) }
RW +0 { *(+rw, +zi) }

Projects moving to linker scripts have to approximate this by section name and this can be a challenge. The INPUT_SECTION_FLAGS can ease this transition.

Diff Detail

Event Timeline

peter.smith created this revision.Jan 15 2020, 2:39 AM

First round of comments from me.
I mostly was focused on "is it possible to simplify the implementation for start?" question.

lld/ELF/LinkerScript.h
159

I wonder, can we have something like flagsMask here and everywhere instead of two variables?

lld/ELF/ScriptParser.cpp
1136

Are integers used in the wild? I wonder if we can stick to supporting named flags only,
it should be easier to read scripts I think.

1164

This is perhaps not a best idea how to handle flags?
I guess we should not accept MIPS flags for non-mips targets and so on?
(I see that we probably do not know the target plafrorm at this point, I think).

I wonder, can we just support a generic set for start? Will it be enough?
I.e. write/alloc/exec flags (+something else)?

1186

This probably is not correct way. What if we have a different unexpected token here, like "+"?
It seems you should be able to remove this setError.

1191

You need to print the value I think.

lld/test/ELF/input-section-flags.s
97

I'd suggest to rename sections instead.
To: .sec.a, .sec.ax, etc.

Because such comments are not really useful by themselves probably?

peter.smith marked 6 inline comments as done.Jan 15 2020, 11:17 AM

Thanks for the comments, I'll make an update tomorrow. I'll plan to:

  • Try the flagsMask + withMask to see if it looks any better.
  • Limit the recognised flags to what BFD accepts, I'd like to continue to recognise integers.
  • Look at the error reporting in the parser.
  • Update the tests.
lld/ELF/LinkerScript.h
159

I couldn't think of a way of doing it with just one flagsMask as for each flag bit there are 3 states (don't care, required set, required not-set) and with a single mask we've only got 2. An alternative to the above is to use flagsMask and withMask. Both with and without would add to flagsMask, but only with would add to withMask. Then to check we have just.

(sec->flags | flagsMask) == withMask;

That saves a comparison at, I guess a slight cost to readability, but we'd still need two values. Happy to change to that if you'd prefer it. Open to alternative suggestions about how to represent it.

lld/ELF/ScriptParser.cpp
1136

I've not seen them used, but it does offer an instant workaround if a new flag is added in a future release, or a more limited set of flags is chosen to be accepted.

1164

I'd be happy with that, although I'd prefer we supported integers in that case as then if we'd not included someone's favourite flag they can still use it. The GNU linker supports:

{ "SHF_WRITE", SHF_WRITE },
{ "SHF_ALLOC", SHF_ALLOC },
{ "SHF_EXECINSTR", SHF_EXECINSTR },
{ "SHF_MERGE", SHF_MERGE },
{ "SHF_STRINGS", SHF_STRINGS },
{ "SHF_INFO_LINK", SHF_INFO_LINK},
{ "SHF_LINK_ORDER", SHF_LINK_ORDER},
{ "SHF_OS_NONCONFORMING", SHF_OS_NONCONFORMING},
{ "SHF_GROUP", SHF_GROUP },
{ "SHF_TLS", SHF_TLS },
{ "SHF_MASKOS", SHF_MASKOS },
{ "SHF_EXCLUDE", SHF_EXCLUDE },

Which is pretty much just the generic ones. I know SHF_ARM_PURECODE would be really useful, but that could be represented as an integer.

1186

Agreed I'll rethink and see if I can come up with something better.

1191

Agreed, thanks for the suggestion

lld/test/ELF/input-section-flags.s
97

OK, I'll update.

MaskRay added inline comments.Jan 15 2020, 2:58 PM
lld/ELF/LinkerScript.cpp
434

This may need a rebase if D72775 lands first.

lld/ELF/LinkerScript.h
159

The transformation is (sec->flag | dont-care) = (withMask | dont-care), i.e.

(sec->flags | ~(withMask | withoutMask)) == (withMask | ~(withMask | withoutMask))

It harms readability and unlikely improves performance.

lld/ELF/ScriptParser.cpp
1164

We can do something like tools/llvm-readobj/ELFDumper.cpp:

#define ENUM_ENT(enum) { #enum, ELF::enum }

ENUM_ENT(SHF_WRITE)
ENUM_ENT(SHF_ALLOC)

I agree that the generic ones plus SHF_ARM_PURECODE should be sufficient.

lld/test/ELF/input-section-flags-diag1.test
2

Nit: -unknown-linux -> ``(This is generic, not Linux specific)

3

%t -> /dev/null

lld/test/ELF/input-section-flags-diag2.test
3

%t -> /dev/null

lld/test/ELF/input-section-flags.s
4

##

11

Nit: Indent continuation lines.

peter.smith marked 22 inline comments as done.Jan 16 2020, 9:19 AM

Thanks for the comments, I'll upload a new patch shortly.

lld/ELF/LinkerScript.cpp
434

I've rebased as D72775 has landed.

lld/ELF/LinkerScript.h
159

Thanks, I'll leave it as it is for now.

lld/ELF/ScriptParser.cpp
1164

I'll implement using ENUM_ENT and will restrict to generic + SHF_ARM_PURECODE. I don't think that it is that important to filter the available flags by Target as we are mapping unique name to value and we're not mapping back from non-unique value to name.

I've kept the ability to accept an int for now. Happy to remove if you'd prefer, given it reuses an existing function it is not a lot of extra effort to include.

lld/test/ELF/input-section-flags-diag1.test
2

will fix.

3

will fix

lld/test/ELF/input-section-flags-diag2.test
3

Will fix.

lld/test/ELF/input-section-flags.s
4

will fix

11

Will fix.

peter.smith marked 8 inline comments as done.
peter.smith edited the summary of this revision. (Show Details)

Address code review comments so far. Main changes:

  • Rebased
  • Used EnumEntry as in ELFDumper.cpp
  • Updated parser with better diagnostic message
  • Updated tests
MaskRay accepted this revision.Jan 16 2020, 3:56 PM
MaskRay added inline comments.
lld/ELF/ScriptParser.cpp
863

Mis-indented.

1164

Accept an integer is a nice extension.

The loop can be simplified as:

while (!errorCount()) {
  StringRef tok = unquote(next());
  bool without = tok.consume_front("!");
  if (llvm::Optional<uint64_t> flag = parseFlag(tok)) {
    if (without)
      withoutFlags |= *flag;
    else
      withFlags |= *flag;
  } else {
    setError("unrecognised flag: " + tok);
  }
  if (consume(")"))
    break;
  if (!consume("&")) {
    next();
    setError("expected & or )");
  }
}
This revision is now accepted and ready to land.Jan 16 2020, 3:56 PM
grimar added inline comments.Jan 17 2020, 1:48 AM
lld/ELF/ScriptParser.cpp
708

I think more natural way is to use consume:

if (consume("INPUT_SECTION_FLAGS"))
  std::tie(withFlags, withoutFlags) = readInputSectionFlags();

StringRef filePattern = next();
...
803

The same.

862

We parse "KEEP" in readInputSectionDescription, and you handle "INPUT_SECTION_FLAGS" for it right there.
Seems you can remove handling of "INPUT_SECTION_FLAGS" from here and move it to readInputSectionDescription too
(for the no-KEEP case).

This should allow avoid having withFlags = withoutFlags = 0; code (and variables itself) around.

1145

May be use llvm::StringSwitch instead?

grimar added inline comments.Jan 17 2020, 2:31 AM
lld/ELF/ScriptParser.cpp
1145

To clarify: StringSwitch looks probably much simpler than a combination of EnumEntry<unsigned> array + llvm::find_if.
So it seems could be:

 return StringSwitch<llvm::Optional<uint64_t>> (tok)
   .Case("SHF_WRITE", ELF::SHF_WRITE)
...

or

 return StringSwitch<llvm::Optional<uint64_t>> (tok)
   .Case(ENUM_ENT(SHF_WRITE))
...

(If we want to use defines. I am not sure in that, given that the number of flags is not so large now,
but have no strong objections)

Both ways still look better probably?

peter.smith marked 6 inline comments as done.Jan 17 2020, 4:23 AM

Thanks for the review. I'll upload another patch that I hope to have addressed all the comments.

lld/ELF/ScriptParser.cpp
708

Thanks for the suggestion, have applied and also in readOverlayDescription.

862

On my first attempt I had the logic in readInputSectionDescription. The problem I found was with the no section description case matched in:

else {
      // We have a file name and no input sections description. It is not a
      // commonly used syntax, but still acceptable. In that case, all sections
      // from the file will be included.
}

As INPUT_SECTION_DESCRIPTION always has parentheses it is matched by

else if (peek() == "(")

So if I write:

{ INPUT_SECTION_DESCRIPTION(some flag list) filespec }

it matches the else if (peek() == "("). Unfortunately the above is not easily distinguishable from:

{ INPUT_SECTION_DESCRIPTION(some flag list) filespec(inputsectiondescription) }

I couldn't think of an easy way of distinguishing between these two cases at this level. I'm open to suggestions as I'm not keen on having to clear the state.

1145

I've rewritten using StringSwitch, thanks for the suggestion, I think it looks simpler.

1164

Thanks for the suggestion, I've applied.

Updated diff. Main changes:

  • Use stringswitch with enumerations
  • Incorporate simplification suggestions
grimar added inline comments.Jan 17 2020, 5:36 AM
lld/ELF/ScriptParser.cpp
711

I think you can now get rid of tok = next();:

InputSectionDescription *cmd =
  readInputSectionRules(next(), withFlags, withoutFlags);
803

And here.

862

I see now, thanks.
There seems to be another problem of having "INPUT_SECTION_FLAGS" here:

I think that scripts like
"{ INPUT_SECTION_DESCRIPTION(list1) INPUT_SECTION_DESCRIPTION(list2) <the_rest_part>}" are accepted?
It is not good.

The solution I have in mind is to refactor the following code

// We have a file name and no input sections description. It is not a
// commonly used syntax, but still acceptable. In that case, all sections
// from the file will be included.
auto *isd = make<InputSectionDescription>(tok, withFlags, withoutFlags);
isd->sectionPatterns.push_back({{}, StringMatcher({"*"})});
cmd->sectionCommands.push_back(isd);
withFlags = withoutFlags = 0;

Into a different function that will also try to handle INPUT_SECTION_FLAGS. Sounds like it might work?
I am happy to try to experiment with this (or another approaches), but would like to clarify:

  1. Do we need/want to support this case? "We have a file name and no input sections description" case seems to be corner case (I am not sure how often it is used). And it is not clear if other linkers support this combination. I quess it is probably a reasonable one though.
  2. If we want it, then the question is: how much it is important to support this from start? It seems that leaving it for a follow-up patch (and changing this place to "first attemp logic" for now) can be a reasonable solution to proceed for now.
lld/test/ELF/input-section-flags-diag3.test
6

Seems it belongs to a different test case and needs an update?

grimar added inline comments.Jan 17 2020, 5:40 AM
lld/ELF/ScriptParser.cpp
862

The solution I have in mind is to refactor the following code

Ah, and this will not work because of the reason you've mentioned. Sorry.
I'd really leave this corner-case for a follow-up patch I think.

peter.smith marked 3 inline comments as done.Jan 17 2020, 6:39 AM

I'll remove the support for the corner-case of when there is just a filepattern. Most likely won't get the time today, if I don't then I'll update on Monday. Thanks for the comments.

lld/ELF/ScriptParser.cpp
711

I don't think I can, that line is the same as line 699 in the diff on the left hand side.

StringRef filePattern = next();
if (consume("INPUT_SECTION_FLAGS"))

I think I can rename tok to filePattern though.

862

I was just following the ld.bfd grammar, so I'm happy to remove it and leave it for a follow up patch if it were necessary.

lld/test/ELF/input-section-flags-diag3.test
6

Thanks I'll update the comment.

  • Removed support for corner case where there is just a filespec. This permits the INPUT_SECTION_FLAGS logic to be moved to readInputSectionDescription from readOutputSectionDescription.
  • Changed comment in test
MaskRay added inline comments.Jan 17 2020, 1:13 PM
lld/test/ELF/input-section-flags.s
52

Indent .outsec*

66

Indent .outsec1

grimar accepted this revision.Jan 19 2020, 12:20 PM

I've put a few suggestions, but generally this LGTM. Thanks!

lld/ELF/ScriptParser.cpp
711

Not sure I understand why there can be any problems with the following code?

if (consume("INPUT_SECTION_FLAGS"))
   std::tie(withFlags, withoutFlags) = readInputSectionFlags();
 InputSectionDescription *cmd =
     readInputSectionRules(next(), withFlags, withoutFlags);

(I tried and it works)
You do not use tok below the readInputSectionRules anyways.

The original code had the StringRef filePattern variable.
I am not sure how much useful to have it,
but at least it named the token somehow.

But in the following version

tok = next();
InputSectionDescription *cmd =
    readInputSectionRules(tok, withFlags, withoutFlags);

I do not see why it might be useful to do tok = next(); and not just inline the next()?

So I'd suggest to write in one of the following ways:

if (consume("INPUT_SECTION_FLAGS"))
   std::tie(withFlags, withoutFlags) = readInputSectionFlags();
 InputSectionDescription *cmd =
     readInputSectionRules(next(), withFlags, withoutFlags);

or stick to the original code:

if (consume("INPUT_SECTION_FLAGS"))
  std::tie(withFlags, withoutFlags) = readInputSectionFlags();
StringRef filePattern = next();
InputSectionDescription *cmd =
    readInputSectionRules(filePattern, withFlags, withoutFlags);
722

I'd suggest to early return in the "INPUT_SECTION_FLAGS" branch just like we do in the "KEEP" branch.
It seems probably a bit simpler to exit early here too?

if (tok == "INPUT_SECTION_FLAGS") {
  std::tie(withFlags, withoutFlags) = readInputSectionFlags();
  return readInputSectionRules(next(), withFlags, withoutFlags);
}
return readInputSectionRules(tok, withFlags, withoutFlags);
803

This comment was missed. Here you can inline the next() call too:

if (consume("INPUT_SECTION_FLAGS"))
  std::tie(withFlags, withoutFlags) = readInputSectionFlags();
cmd->sectionCommands.push_back(
    readInputSectionRules(next(), withFlags, withoutFlags));

This is what original code did, btw.

MaskRay added inline comments.Jan 19 2020, 1:51 PM
lld/ELF/ScriptParser.cpp
722

tok = next(); looks fine to me.. The return statement will become complex with the change.

peter.smith marked 3 inline comments as done.Jan 20 2020, 11:50 AM

Thanks for the comments. It has been a bit of a nightmare of a day and I've only just got to this. I'll apply the suggestions and commit tomorrow morning so I can catch any problems.

lld/ELF/ScriptParser.cpp
711

I see now, I missed the inlining of next(), I'll update to do

if (consume("INPUT_SECTION_FLAGS"))
   std::tie(withFlags, withoutFlags) = readInputSectionFlags();
 InputSectionDescription *cmd =
     readInputSectionRules(next(), withFlags, withoutFlags);
722

I'll keep as it is for the moment.

803

Apologies for missing it, I'll update to inline.

This revision was automatically updated to reflect the committed changes.