This is an archive of the discontinued LLVM Phabricator instance.

Extend obj2yaml to optionally preserve raw __LINKEDIT/__DATA segments.
ClosedPublic

Authored by aprantl on Nov 4 2021, 6:38 PM.

Details

Summary

I am planning to upstream MachOObjectFile code to support Darwin
chained fixups. In order to test the new parser features we need a way
to produce correct (and incorrect) chained fixups. Right now the only
tool that can produce them is the Darwin linker. To avoid having to
check in binary files, this patch allows obj2yaml to print a hexdump
of the raw __LINKEDIT and __DATA segments, which both allows us to
bootstrap the parser and enables us to easily create malformed inputs
to test error handling in the parser.
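
To illustrate the round trip described above (raw segment bytes to a textual hexdump and back), here is a minimal sketch in plain C++. The helper names `toHexString` and `fromHexString` are hypothetical and are not taken from obj2yaml or yaml2obj; the actual tools may encode the bytes differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of obj2yaml): encode raw bytes as an
// uppercase hex string, two digits per byte.
std::string toHexString(const std::vector<uint8_t> &Bytes) {
  static const char Digits[] = "0123456789ABCDEF";
  std::string Out;
  Out.reserve(Bytes.size() * 2);
  for (uint8_t B : Bytes) {
    Out.push_back(Digits[B >> 4]);
    Out.push_back(Digits[B & 0x0F]);
  }
  return Out;
}

// Hypothetical inverse: decode a hex string back into bytes.
// Returns false on odd length or on any non-hex character.
bool fromHexString(const std::string &Hex, std::vector<uint8_t> &Out) {
  if (Hex.size() % 2 != 0)
    return false;
  auto Nibble = [](char C) -> int {
    if (C >= '0' && C <= '9') return C - '0';
    if (C >= 'A' && C <= 'F') return C - 'A' + 10;
    if (C >= 'a' && C <= 'f') return C - 'a' + 10;
    return -1;
  };
  Out.clear();
  for (std::size_t I = 0; I < Hex.size(); I += 2) {
    int Hi = Nibble(Hex[I]);
    int Lo = Nibble(Hex[I + 1]);
    if (Hi < 0 || Lo < 0)
      return false;
    Out.push_back(static_cast<uint8_t>((Hi << 4) | Lo));
  }
  return true;
}
```

Because the dump is plain text, a test can edit a few digits by hand to produce a deliberately malformed segment, which is the use case the summary has in mind.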

This patch adds two new options to obj2yaml:

-raw-data-segment
-raw-linkedit-segment

Diff Detail

Event Timeline

aprantl created this revision.Nov 4 2021, 6:38 PM
aprantl requested review of this revision.Nov 4 2021, 6:38 PM
Herald added a project: Restricted Project.Nov 4 2021, 6:38 PM
alexander-shaposhnikov added inline comments.
llvm/lib/Object/MachOObjectFile.cpp
2052

static ?

llvm/tools/obj2yaml/macho2yaml.cpp
654

in the future someone might want to dump other segments too, though I'm not sure what would be the best solution here

aprantl updated this revision to Diff 385088.Nov 5 2021, 8:35 AM

Put function into anonymous namespace.

llvm/lib/Object/MachOObjectFile.cpp
2052

Good point! My understanding is that a template cannot also be a static function, but I should at least put it into the anonymous namespace.
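
As a minimal sketch of the anonymous-namespace approach mentioned here: a function template declared inside an unnamed namespace gets internal linkage, so it is not visible outside its translation unit. The helper `readValue` is hypothetical and is not the function under review in MachOObjectFile.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

namespace {
// Hypothetical helper: copy a trivially copyable value of type T out of a
// byte buffer. Being in an unnamed namespace, every instantiation of this
// template has internal linkage.
template <typename T>
T readValue(const uint8_t *Ptr) {
  T Value;
  std::memcpy(&Value, Ptr, sizeof(T));
  return Value;
}
} // namespace
```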

llvm/tools/obj2yaml/macho2yaml.cpp
654

These two are all I need right now, but if these options proliferate in the future, we should probably pass a list of raw segment names or an option struct.

llvm/tools/obj2yaml/macho2yaml.cpp
654

Yeah, I also think that passing a list would be a good option. It feels better than the newly introduced boolean flags: we won't need to change the command line options in the future or keep them around just for compatibility. E.g. one could use --raw-segments=<segment name 1>,<segment name 2> instead of '-raw-data-segment -raw-linkedit-segment', and the code would improve a little bit. Maybe switch now?
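
The list-based suggestion can be sketched in plain C++ as follows. This only illustrates the idea of a comma-separated value and does not use LLVM's command-line library; `splitSegmentList` and `shouldDumpRaw` are hypothetical names, and the `--raw-segments` spelling is the reviewer's proposal rather than a confirmed option.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split the value of a "--raw-segments=a,b" style
// option into individual segment names, skipping empty entries.
std::vector<std::string> splitSegmentList(const std::string &Value) {
  std::vector<std::string> Names;
  std::string::size_type Start = 0;
  while (Start <= Value.size()) {
    std::string::size_type End = Value.find(',', Start);
    if (End == std::string::npos)
      End = Value.size();
    if (End > Start)
      Names.push_back(Value.substr(Start, End - Start));
    Start = End + 1;
  }
  return Names;
}

// Hypothetical helper: true if Segment was requested as a raw dump.
bool shouldDumpRaw(const std::vector<std::string> &Names,
                   const std::string &Segment) {
  return std::find(Names.begin(), Names.end(), Segment) != Names.end();
}
```

With string names, supporting a further segment needs no new flag: the caller only has to pass another name in the list.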

aprantl updated this revision to Diff 385177.Nov 5 2021, 1:45 PM

Generalize command line options to use a bitvector to list all raw segments.
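
For readers unfamiliar with the pattern, here is a minimal sketch of tracking a fixed set of segments as bits in a mask. It is written in standalone C++ and is not the code from this diff; the names `RawSegment` and `RawSegmentSet` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enumeration of the segments that can be dumped raw.
// Each enumerator occupies its own bit.
enum class RawSegment : uint32_t {
  Data = 1u << 0,
  LinkEdit = 1u << 1,
};

// Hypothetical bit set recording which segments were requested.
struct RawSegmentSet {
  uint32_t Bits = 0;

  void set(RawSegment S) { Bits |= static_cast<uint32_t>(S); }

  bool test(RawSegment S) const {
    return (Bits & static_cast<uint32_t>(S)) != 0;
  }
};
```

The trade-off of this design is that the set of supported segments is fixed at compile time, so a segment that is not an enumerator cannot be requested.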

Thanks, generally looks good to me; I think this feature is very useful.
I'm wondering if we can add tests for this flag? If I'm not mistaken, you had a test in your first revision, but it appears to have disappeared in the latest one.

aprantl updated this revision to Diff 385183.Nov 5 2021, 1:58 PM

Indeed. I forgot to git-add the file!

llvm/tools/obj2yaml/obj2yaml.cpp
27

I'd like just to point out that in general users can specify their one segment names,
e.g. via attribute((section("CUSTOMSEG,text"))). If you use an enum here, they won't be able to dump them
without modifying the tool.
(Phabricator doesn't like double underscore, the name above is <double underscore>CUSTOMSEG)

llvm/tools/obj2yaml/obj2yaml.cpp
27

typo: their own

aprantl added inline comments.Nov 5 2021, 2:25 PM
llvm/tools/obj2yaml/obj2yaml.cpp
27

Yeah, I guess if someone wanted to use such a feature, they would have to generalize this further.

This revision is now accepted and ready to land.Nov 5 2021, 2:29 PM