This is an archive of the discontinued LLVM Phabricator instance.

[ObjectYAML] Basic support for chained fixups.
ClosedPublic

Authored by drodriguez on Sep 19 2022, 7:14 PM.

Details

Summary

Add basic binary support for chained fixups. This allows basic tests
with chained fixups without trying to create a format for them until the
work on the Object library is considered finished.

Diff Detail

Event Timeline

drodriguez created this revision.Sep 19 2022, 7:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2022, 7:14 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
drodriguez requested review of this revision.Sep 19 2022, 7:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2022, 7:14 PM
beanz added a subscriber: lhames.

Looping in @lhames in case he has any thoughts from Apple.

From a design perspective I'm not sure this is the approach I'd take, but I also don't know that I'm the best person to judge since I'm not going to be the user here.

When I added the initial mach-o support the emphasis was on representing the binary accurately enough to reproduce binary-accurate transformations, but to also have the YAML representation be human readable wherever it could be. This patch really just treats it as binary data, which makes it pretty unreadable.

llvm/lib/ObjectYAML/MachOEmitter.cpp
603
610
617
llvm/tools/obj2yaml/macho2yaml.cpp
630

When I added the initial mach-o support the emphasis was on representing the binary accurately enough to reproduce binary-accurate transformations, but to also have the YAML representation be human readable wherever it could be. This patch really just treats it as binary data, which makes it pretty unreadable.

Agree 100%. The need from this come from D133974. I had the need of testing that the data was moved correctly during objcopy, but I don't have the need to inspect the data itself. My hope is that whenever someone has the need to check that the data is actually representing the right thing, they will have the knowledge of what the data represents and how better transform it into YAML. I prefer not to check-in executables in the source tree, but if we don't even want this code as a temporary solution, I think I can work around the need of adding this support.

More code style.

drodriguez marked 4 inline comments as done.Sep 20 2022, 12:51 PM
beanz added a comment.Sep 20 2022, 2:00 PM

Agree 100%. The need from this come from D133974. I had the need of testing that the data was moved correctly during objcopy, but I don't have the need to inspect the data itself. My hope is that whenever someone has the need to check that the data is actually representing the right thing, they will have the knowledge of what the data represents and how better transform it into YAML.

Amusingly the only documentation I'm aware of that documents the format of the export trie... is in a file you modified in this change: https://github.com/llvm/llvm-project/blob/main/llvm/tools/obj2yaml/macho2yaml.cpp#L473

pete added a subscriber: pete.Sep 20 2022, 2:02 PM
pete added inline comments.
llvm/lib/ObjectYAML/MachOYAML.cpp
171

Should this be LinkEditData.DyldExportsTrie?

llvm/tools/obj2yaml/macho2yaml.cpp
646

So the exports trie is a little weird. It used to live in the LC_DYLD_INFO/LC_DYLD_INFO_ONLY along with bind opcodes. If the binary is using opcode based fixups, then it is required that the trie come from there, not the LC_DYLD_EXPORTS_TRIE.

If the binary is newer, and uses LC_DYLD_CHAINED_FIXUPS then it is required that the export trie (if there is one) comes from the LC_DYLD_EXPORTS_TRIE.

I don't know if that impacts this code or not, but roughly i'd expect the export trie to be dumped either next to the bind opcodes as part of the LC_DYLD_INFO, or next to the chained fixups as a LC_DYLD_EXPORTS_TRIE. Does this dump method handle both those cases, or just the LC_DYLD_EXPORTS_TRIE case?

drodriguez marked an inline comment as done.Sep 20 2022, 3:46 PM

Amusingly the only documentation I'm aware of that documents the format of the export trie... is in a file you modified in this change: https://github.com/llvm/llvm-project/blob/main/llvm/tools/obj2yaml/macho2yaml.cpp#L473

😆 I think you wanted to use "As usual", and not "Amunsigly". Lacking format documentation seems to be a constant with Mach-O 😆

Do you happen to know if the contents of the newer LC_DYLD_EXPORTS_TRIE have the same format as the previous trie? It seems to me that they look very similar. I wonder if I can reuse a little bit of the code and improve this to have a more human-friendly dump (I will need to modify some code of the Object library first, which seems to not deal with this old place/new place duality).

llvm/lib/ObjectYAML/MachOYAML.cpp
171

Thanks. Good eyes. I did not realize that the initial yaml dump had the same values in both places.

llvm/tools/obj2yaml/macho2yaml.cpp
646

Thanks for the explanation. It clarifies why I could not get one with both LC_DYLD_INFO and LC_DYLD_CHAINED_FIXUPS. I was confuse why there was the design had two structures named almost the same ("exports trie" and a "dyld exports trie").

As you pointed out above, this was doing the wrong thing. I think autocomplete of dyld… found getDyldInfoExportsTrie and I did not realize that I was looking for the wrong piece of the binary. I changed the code to find the LC and dump the contents directly.

This code also only deals with LC_DYLD_EXPORTS_TRIE, since it was the one we started having problems (for the problems see D133974). It seems that the code in dumpExportsTrie and processExportNode dump the other.

drodriguez marked an inline comment as done.

Fix incorrect mapping. Fix code for dumping the right LC. Fix test

beanz added a comment.Sep 20 2022, 5:26 PM

Do you happen to know if the contents of the newer LC_DYLD_EXPORTS_TRIE have the same format as the previous trie?

@pete can correct me if I’m wrong, but I believe they are the same.

pete added a comment.Sep 20 2022, 6:17 PM

Do you happen to know if the contents of the newer LC_DYLD_EXPORTS_TRIE have the same format as the previous trie?

@pete can correct me if I’m wrong, but I believe they are the same.

Yep, exactly the same contents.

pete accepted this revision.Sep 20 2022, 6:18 PM

LGTM. Thanks for doing this!

This revision is now accepted and ready to land.Sep 20 2022, 6:18 PM

I'm not familiar with Mach-O object files. I can only give some minor suggestions on your codes.

llvm/lib/ObjectYAML/MachOEmitter.cpp
603

Since Obj.LinkEdit.ChainedFixups is an array of bytes. You can simply serialize it using

if (Obj.LinkEdit.ChainedFixups.size() > 0)
  OS.write(reinterpret_case<const char *>(Obj.LinkEdit.ChainedFixups.data()),
           Obj.LinkEdit.ChainedFixups.size());
610

Ditto.

617

Ditto.

llvm/tools/obj2yaml/macho2yaml.cpp
671

Why not using reinterpret_case<> here?

beanz added inline comments.Sep 22 2022, 10:35 AM
llvm/tools/obj2yaml/macho2yaml.cpp
671

void* to uint8_t* doesn’t need a reinterpret_cast so it is better to use a static_cast.

Higuoxing added inline comments.Sep 23 2022, 3:31 AM
llvm/tools/obj2yaml/macho2yaml.cpp
671

Yes, I mean why not just

uint8_t *Bytes = reinterpret_cast<uint8_t *>(&DICE);
beanz added inline comments.Sep 23 2022, 7:25 AM
llvm/tools/obj2yaml/macho2yaml.cpp
671

Ah, I see. The single reinterpret_cast is probably clearer here. Sorry for the noise.

Split into D134569, D134571 and this. Use the feedback provided by Higuoxing

drodriguez retitled this revision from [ObjectYAML] Support chained fixups, dyld exports trie, data in code to [ObjectYAML] Basic support for chained fixups..Sep 23 2022, 4:03 PM
drodriguez edited the summary of this revision. (Show Details)
drodriguez updated this revision to Diff 474339.Nov 9 2022, 1:24 PM
drodriguez marked an inline comment as done.

Remove pieces related to the export trie to avoid dependency in previous diff.

This revision was landed with ongoing or failed builds.Nov 11 2022, 10:26 AM
This revision was automatically updated to reflect the committed changes.