This is an archive of the discontinued LLVM Phabricator instance.

[BOLT] Section-handling refactoring/overhaul
ClosedPublic

Authored by maksfb on Oct 7 2022, 2:49 PM.

Details

Summary

Simplify the logic of handling sections in BOLT. This change brings more
direct and predictable mapping of BinarySection instances to sections in
the input and output files.

  • Only sections from the input binary will have a non-null SectionRef. When a new section is created as a copy of the input section, its SectionRef is reset to null.
  • RewriteInstance::getOutputSectionName() is removed as the section name in the output file is now defined by BinarySection::getOutputName().
  • Querying BinaryContext for sections by name uses their original name. E.g., getUniqueSectionByName(".rodata") will return the original section even if the new .rodata section was created.
  • Input file sections (with relocations applied) are emitted via MC with ".bolt.org" prefix. However, their name in the output binary is unchanged unless a new section with the same name is created.
  • New sections are emitted internally with ".bolt.new" prefix if there's a name conflict with an input file section. Their original name is preserved in the output file.
  • Section header string table is properly populated with section names that are actually used. Previously we used to include discarded section names as well.
  • Fix the problem when dynamic relocations were propagated to a new section with a name that matched a section in the input binary. E.g., the new .rodata with jump tables had dynamic relocations from the original .rodata.

Diff Detail

Event Timeline

maksfb created this revision.Oct 7 2022, 2:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2022, 2:49 PM
Herald added a subscriber: treapster. · View Herald Transcript
maksfb requested review of this revision.Oct 7 2022, 2:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2022, 2:49 PM
maksfb updated this revision to Diff 467638.Oct 13 2022, 4:57 PM

Address comments by Rafael.

rafauler accepted this revision.Oct 13 2022, 5:41 PM

LGTM with comments

bolt/lib/Core/BinaryContext.cpp
2033

Flagging in case you want to stick to the other format that would write this as "Section (...) ."

bolt/lib/Rewrite/RewriteInstance.cpp
3843–3849

Check this loop

This revision is now accepted and ready to land.Oct 13 2022, 5:41 PM
maksfb updated this revision to Diff 467657.Oct 13 2022, 6:44 PM

Fix section read-only check. Remove mapExtraSections().

This revision was automatically updated to reflect the committed changes.
maksfb marked 2 inline comments as done.

I'm a bit concerned about adding OrgSecPrefix to EFMM. My understanding is that now it's impossible or really hard to rewrite a section in-place, because EFMM will automatically duplicate it and add prefix to the original section. I'm developing a patch to rewrite the entire binary and i'm now struggling to rebase it on top of this. I would appreciate clarifications regarding EFMM changes and how to update sections in-place with this code.

I'm a bit concerned about adding OrgSecPrefix to EFMM. My understanding is that now it's impossible or really hard to rewrite a section in-place, because EFMM will automatically duplicate it and add prefix to the original section. I'm developing a patch to rewrite the entire binary and i'm now struggling to rebase it on top of this. I would appreciate clarifications regarding EFMM changes and how to update sections in-place with this code.

With this change, many sections are still updated in-place, e.g. ".rodata" and every section with code pointers. "-use-old-text" also will rewrite code at the original location.

If the change you are working on relocates every single section at a new location, while reusing space in the original binary, perhaps you need to deregister all original sections?

@maksfb, it doesn't seem like we should deregister updated sections. We don't deregister .init_array or .rodata. My understanding of current logic is that when you emit a section with orgSecPrefix, EFMM will update the original section, and when you emit with original name, EFMM creates a new section and renames the old. Aside from a slight confusion, it causes troubles updating .text section, because you cannot simply emit .text with prefix the same way we do with data sections, so we always get a new BinarySection instance while we need to update the old. When we have two instances, we can't deregister old .text section since a lot of stuff depends on it. So it looks like at least for .text we'll be better off calling BC.registerOrUpdateSection() in EFMM without extra name management. But if we dive further in binary rewriting, similar issues arise with eh frames, and in that case i think it's more reasonable to keep all that renaming/duplicating stuff in RewriteInstance/BinaryEmitter since we may decide different strategy depending on the options.

@treapster, could you post a detailed description of your plans (or an RFC) to discourse? It’s possible I’m missing details of what you are planning to do.

@maksfb a patch is now not mature enough for RFC, but the idea is that we add more relocations including data-to-data, create them for plt, and emit all these sections with original names and call registerOrUpdate for each in EFMM. If we want to duplicate some section like .rodata, we can rename it to ".bolt.org.rodata" before emitting or emit it under a new name. In my original patch i used the latter, but for compatibility, i think renaming before emission should work. So, the general idea is that instead of this sorcery in EFMM we can call something like RenameDuplicatedSections() prior to emission(in RewriteInstance), where we decide based on options which sections we want rewritten and which duplicated, and then just emit everything under original name. Do you think it's a good idea? Or it won't work?

@maksfb a patch is now not mature enough for RFC, but the idea is that we add more relocations including data-to-data, create them for plt, and emit all these sections with original names and call registerOrUpdate for each in EFMM. If we want to duplicate some section like .rodata, we can rename it to ".bolt.org.rodata" before emitting or emit it under a new name. In my original patch i used the latter, but for compatibility, i think renaming before emission should work. So, the general idea is that instead of this sorcery in EFMM we can call something like RenameDuplicatedSections() prior to emission(in RewriteInstance), where we decide based on options which sections we want rewritten and which duplicated, and then just emit everything under original name. Do you think it's a good idea? Or it won't work?

@treapster, to be clear, you don't need a patch to submit an RFC, we can have a discussion even before any code is produced. Having this discussion on discourse.llvm.org has other benefits.

BOLT has a special handling for input file sections because it tries to be as noninvasive as possible. The renaming that takes place in EFMM is happening b/c we don't have full control over what sections are emitted by MC layer. E.g., the new ".eh_frame" will be emitted by MC under ".eh_frame" name using special directives while emitting regular code. To avoid a conflict with the original ".eh_frame", EFMM uses prefixes. If you intent to update the original ".eh_frame", assuming the size hasn't increased, you can still do so, but you have to use a special prefix (OrgSecPrefix) while emitting it manually. That's how we handle in-place updates of ".data", ".rodata" etc.

If you are doing a full binary rewrite, you don't have to preserve the original section contents and location. I assume, you have full control over section size and placement too. You can create copies of input file sections, and then deregister the originals since you wouldn't need them while writing out the file. If the input file sections are removed from the list of sections, no renaming will take place inside EFMM.

For the patch, if possible, break it down to multiple small changes. E.g. I suspect you will have to handle more relocation types in BOLT, and that change could be fairly independent.