This is an archive of the discontinued LLVM Phabricator instance.

[ObjectYAML][MachO] Add support for relocations
ClosedPublic

Authored by alexander-shaposhnikov on Apr 9 2020, 6:03 PM.

Details

Summary

Add support for relocations for MachO to ObjectYAML / yaml2obj / obj2yaml.

Test plan: make check-all

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2020, 6:03 PM
Herald added a subscriber: hiraditya. · View Herald Transcript

just in case: the failed pre-merge check is complaining on naming of some fields (that is consistent with the "local" style / existing macho-specific structs, but violates the usual naming guidelines)

The Mach-O bits look good to me. I'd be happy if someone more familiar with the YAML bits could take a look at those as well, but they seem straightforward enough.

llvm/include/llvm/ObjectYAML/MachOYAML.h
31

Nit: change two spaces to one between "being" and "relocated"

llvm/lib/ObjectYAML/MachOEmitter.cpp
267

Any reason to prefer any_relocation_info to relocation_info? Assigning to the individual fields in it would be much clearer than all the shifting and bit arithmetic below IMO.

283

Same here, but with scattered_relocation_info.

alexander-shaposhnikov marked an inline comment as done.Apr 14 2020, 11:54 AM
alexander-shaposhnikov added inline comments.
llvm/lib/ObjectYAML/MachOEmitter.cpp
267

thank you for review!
relocation_info appears to break the pattern used across the codebase
when we populate a struct and then if necessary call MachO::swapStruct (there is no "swapStruct" for relocation_info), it seems to me it would make things inconsistent / a tiny bit harder to reason about.

MaskRay added inline comments.Apr 14 2020, 12:41 PM
llvm/test/ObjectYAML/MachO/relocations_arm64.yaml
94

Among these relocation fields, does any one have a default value and can be omitted? For example, if scattered is almost always false (I don't know if that is true), it can be omitted for simpler output.

smeenai added a subscriber: lhames.Apr 14 2020, 1:15 PM
smeenai added inline comments.
llvm/lib/ObjectYAML/MachOEmitter.cpp
267

So I'm not actually sure swapStruct is doing the right thing for any_relocation_info, looking at this more closely.

Firstly, Apple's Mach-O headers don't define the bitfield in relocation_info differently for big vs. little-endian machines, but LLVM's header does, and that discrepancy seems potentially bad. @lhames changed that in rL358839; perhaps he can comment on that?

Secondly, for scattered_relocation_info, the intent of the big-endian vs. little-endian definitions of the bitfield appears to be that the bitfield has the same layout on both, e.g. that r_scattered is always the most significant bit. However, [swapStruct for any_relocation_info](http://llvm.org/doxygen/BinaryFormat_2MachO_8h_source.html#l01258) just indiscriminately byte-swaps both words in the any_relocation_info, which doesn't seem correct.

alexander-shaposhnikov marked an inline comment as done.Apr 14 2020, 1:50 PM
alexander-shaposhnikov added inline comments.
llvm/lib/ObjectYAML/MachOEmitter.cpp
267

@smeenai thanks for checking it, I'll take a closer look at it and add a test for a different endianness.

alexander-shaposhnikov planned changes to this revision.Apr 14 2020, 3:49 PM
alexander-shaposhnikov marked an inline comment as done.Apr 14 2020, 7:45 PM
alexander-shaposhnikov added inline comments.
llvm/lib/ObjectYAML/MachOEmitter.cpp
267

0/ https://en.cppreference.com/w/cpp/language/bit_field see the section Notes - so it looks like in general the way how bit fields are packed is not defined by the standard, though I'm not 100% sure and would not claim anything here.

1/ In general, the picture appears to be not super clean and straightforward, e.g. see the comments in
<mach-o/reloc.h>.

2/ To figure out how exactly relocations need to be serialized into the binary file I was looking at how LLVM's libObject parses MachO object files in the first place.
See
https://llvm.org/doxygen/MachOObjectFile_8cpp_source.html#l00068 ,
https://llvm.org/doxygen/MachOObjectFile_8cpp_source.html#l04403 ,
https://llvm.org/doxygen/MachOObjectFile_8cpp_source.html#l00153 .

3/ I've added a test with a big endian binary file and I've also verified that
otool -r / llvm-objdump --macho --reloc / obj2yaml are all consistent and can correctly handle the binaries
created by yaml2obj.

Address comments, add a test with a big endian binary.

alexander-shaposhnikov marked an inline comment as done.Apr 14 2020, 7:59 PM
alexander-shaposhnikov added inline comments.
llvm/test/ObjectYAML/MachO/relocations_arm64.yaml
94

it depends on the target, for x86_64 OSX - yeah, to the best of my knowledge scattered relocations are not used,
for ARMv7 - there are quite a lot of them. Honestly I find it easier to understand the test when the format is the same / all the fields are reported.
These tests are based on a (real-world) tiny code snippet compiled for various targets (for ppc-apple-darwin I had to use clang-5.0)

int x = 1;
int y = 2;
long z = (long)&x;
long w = (long)&x - (long)&y;

int main() {
  return z + w;
}
grimar added a subscriber: grimar.EditedApr 15 2020, 2:54 AM

I know nothing about MachO, few suggestions about the code are below.
I wonder if new tests should be splitted and placed to test\tools\yaml2obj and test\tools\obj2yaml folders
(it is where the active development of these tools goes and where we have tests for relocations for another targets).

llvm/include/llvm/ObjectYAML/MachOYAML.h
32

Should it be called offset?

34

I'd rename to something like sym_or_sec_index probably as symbolnum doesn't seem to be a right name given the comment saying it can hold a section index.

37

Should it be called log2length or alike?

58

Can you add a test when a YAML description has a relocations property which is empty?

relocations: []

llvm/lib/ObjectYAML/MachOEmitter.cpp
63

Add a comment?

331

LLVM often use IsLE, I'd suggest to follow as it is shorter and more common.

340

I think you can inline these 5 const unsigned variables.

358

The same.

llvm/test/ObjectYAML/MachO/relocations_arm64.yaml
2

Should instead we add 2 test cases, one to test\tools\obj2yaml and one to test\tools\yaml2obj?
(I'd expect to see tests for these tools)

llvm/tools/obj2yaml/macho2yaml.cpp
143–144

This comment is duplicated twice in this method.

148

May be generalize this piece a bit? It does not seem to be a perfomance critical place, so you probably can do:

SectionType Sec;
memcpy((void *)&Sec, Curr, sizeof(SectionType));
if (Obj.isLittleEndian() != sys::IsLittleEndianHost)
  MachO::swapStruct(Sec);

// For MachO section indices start from 1.
if (Expected<MachOYAML::Section> S =
        constructSection(Sec, Sections.size() + 1))
  Sections.push_back(std::move(*S));
else
  return S.takeError();
241

Unrelated. (Seems this file might benefit from a independent NFC change converting autos to real types).

alexander-shaposhnikov marked an inline comment as done.Apr 15 2020, 10:20 AM
alexander-shaposhnikov added inline comments.
llvm/include/llvm/ObjectYAML/MachOYAML.h
32

@grimar - I agree, however, the rationale here was to reuse the existing names, even though they are awful, to avoid causing further pain when people try to match different structures from different codebases.

http://llvm.org/doxygen/MachORelocation_8h_source.html
https://opensource.apple.com/source/cctools/cctools-949.0.1/include/mach-o/reloc.h.auto.html

grimar added inline comments.Apr 16 2020, 2:08 AM
llvm/include/llvm/ObjectYAML/MachOYAML.h
32

I see. Perhaps it is OK then. Sticking to official naming (here and below) probably makes sense.

36

As far I understand the values that are >=4 are invalid. I'd suggest to use /* 0=byte, 1=word, 2=long, 3=quad */ comment then (taken from https://opensource.apple.com/source/cctools/cctools-949.0.1/include/mach-o/reloc.h.auto.html), it is currently not obvious it that the value is expected to be in [0..3] I think.

Perhaps worth adding a test with length == max(uint8_t), length == 5 to show the behavior. Seems that with the value of 5, this will produce the relocation like if the length was 1? Ideally you might want to validate fields and report an error, e.g (from ELF):

StringRef MappingTraits<ELFYAML::Symbol>::validate(IO &IO,
                                                   ELFYAML::Symbol &Symbol) {
  if (Symbol.Index && Symbol.Section.data())
    return "Index and Section cannot both be specified for Symbol";
  return StringRef();
}

yaml2obj is often used to produce broken objects, but in this case the behavior is unexpected (i.e. if it produced a relocation with the length 5, it would be fine), so probably needs to be reported.

Fields validating is probably not that important to do for the initial implementation though.

grimar added inline comments.Apr 16 2020, 2:13 AM
llvm/include/llvm/ObjectYAML/MachOYAML.h
36

Fields validating is probably not that important to do for the initial implementation though.

And testing broken values either I think. Up to you.

alexander-shaposhnikov marked an inline comment as done.Apr 16 2020, 2:40 AM
alexander-shaposhnikov added inline comments.
llvm/test/ObjectYAML/MachO/relocations_arm64.yaml
2

i was looking at other tests in llvm/test/ObjectYAML/MachO/ .
They all appear to have a similar structure (in particular, they use yaml2obj / obj2yaml )

Address comments, add a test with an empty relocations list

alexander-shaposhnikov marked an inline comment as done.Apr 16 2020, 2:45 AM
alexander-shaposhnikov marked an inline comment as done.Apr 16 2020, 2:49 AM
alexander-shaposhnikov added inline comments.
llvm/lib/ObjectYAML/MachOEmitter.cpp
340

so basically here integer promotion rules come into play, e.g. if i am not mistaken bool would be implicitly casted to int (signed) and then left shifted, since we are shifting to the left everything would be okay (even though int is signed https://en.cppreference.com/w/cpp/language/operator_arithmetic#Bitwise_shift_operators), but I somehow wanted to avoid these hidden implicit transformations and make things explicit.

alexander-shaposhnikov marked 2 inline comments as done.Apr 16 2020, 2:52 AM
grimar added inline comments.Apr 16 2020, 3:36 AM
llvm/lib/ObjectYAML/MachOEmitter.cpp
340

I do not mind leaving bools alone (I've not noticed them honestly),
but why not to inline other variables at least to make the code shorter?

const unsigned IsPCRel = R.is_pcrel ? 1 : 0;
const unsigned IsExtern = R.is_extern ? 1 : 0;
MRE.r_word1 = ((unsigned)R.symbolnum << 0) | (IsPCRel << 24) |
              ((unsigned)R.length << 25) | (IsExtern << 27) |
              ((unsigned)R.type << 28);
llvm/test/ObjectYAML/MachO/relocations_empty.yaml
9

I guess having a one section with no relocations was enough.

alexander-shaposhnikov marked an inline comment as done.Apr 16 2020, 9:32 AM
alexander-shaposhnikov added inline comments.
llvm/lib/ObjectYAML/MachOEmitter.cpp
340

not sure it provides any improvements in terms of readability,
additionally C-style casts might disappoint clang-tidy. I find it easier to read the current version.

trim down the empty relocations list test

lhames added inline comments.Apr 16 2020, 11:03 AM
llvm/lib/ObjectYAML/MachOEmitter.cpp
267

Firstly, Apple's Mach-O headers don't define the bitfield in relocation_info differently for big vs. little-endian machines, but LLVM's header does, and that discrepancy seems potentially bad. @lhames changed that in rL358839; perhaps he can comment on that?

Huh. Re-reading reloc.h it looks like my change was broken. I wonder that why that didn't blow more tests up. :/

I'll see if I can fix this today.

lhames added inline comments.Apr 16 2020, 4:21 PM
llvm/lib/ObjectYAML/MachOEmitter.cpp
267

Fixed in 386f1c114d5.

alexander-shaposhnikov marked an inline comment as done.Apr 16 2020, 7:30 PM

@grimar, @smeenai - many thanks for reviewing, I'm wondering if we can move forward here / would you like me to make any adjustments ?

This looks good on my end. I'll let @grimar confirm all his comments have been addressed.

grimar accepted this revision.Apr 21 2020, 2:50 AM

The change about ObjectYAML/obj2yaml looks generally fine, so LGTM.
(Perhaps would be nice to see an approve from someone who knows MachO better too)

I've put the last nit inline. It is probably up to you to decide keep it or to change,
as I have no horse in this race, because MachO is not my area.

llvm/lib/ObjectYAML/MachOEmitter.cpp
340

not sure it provides any improvements in terms of readability,

The following variables looks excessive as they do not introduce new meaning.
Removing them would help in that sense.

const unsigned IsPCRel = R.is_pcrel;
const unsigned IsExtern = R.is_extern;
const unsigned Type = R.type;

additionally C-style casts might disappoint clang-tidy.

We have many places in LLVM where (unsigned)-1, (uint64_t)-1 and other similar conversions are used,
so I'd say it is very common.

This revision is now accepted and ready to land.Apr 21 2020, 2:50 AM
alexander-shaposhnikov marked an inline comment as done.Apr 21 2020, 10:33 AM
smeenai accepted this revision.Apr 21 2020, 12:06 PM

The Mach-O bits LGTM.

This revision was automatically updated to reflect the committed changes.