This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho][nfc] Refactor to accommodate paired relocs
ClosedPublic

Authored by gkm on Nov 2 2020, 7:51 AM.

Details

Reviewers
int3
Group Reviewers
Restricted Project
Commits
rGd4ec3346b1ba: [lld-macho][nfc] Refactor to accommodate paired relocs
Summary

This is a refactor to pave the way for supporting paired-ADDEND for ARM64. The only paired reloc type for X86_64 is SUBTRACTOR. In a later diff, I will add SUBTRACTOR for both X86_64 and ARM64.

  • s/getImplicitAddend/getAddend/ because it handles all forms of addend: implicit, explicit, paired.
  • add predicate bool isPairedReloc()
  • check range of relInfo.r_symbolnum is internal, unrelated to user-input, so use assert(), not error()
  • minor cleanups & rearrangements in InputFile::parseRelocations()

Diff Detail

Event Timeline

gkm created this revision.Nov 2 2020, 7:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 2 2020, 7:51 AM
gkm requested review of this revision.Nov 2 2020, 7:51 AM
compnerd added inline comments.
lld/MachO/Arch/X86_64.cpp
29

I wonder if it makes sense to use a std::pair<const relocation_info, const relocation_info> &. It should be tested, but I could see the compiler being smart enough to unpack the pair if possible, which just makes it much more clear that that the two parameters are the paired relocation. It took me a moment to realize that.

What happens in the case of an unpaired relocation?

lld/MachO/InputFiles.cpp
215

You could also do std::next(&RI) if you want to continue using the range based for loop.

256

This really isn't equivalent. The assert may be compiled out, while the previous behaviour would have been preserved and reported an error to the user.

gkm marked an inline comment as done.Nov 5 2020, 10:57 PM
gkm added inline comments.
lld/MachO/Arch/X86_64.cpp
29

I looked at some cases with [[ godbolt.org | godbolt.org ]] for paired vs. unpaired caller & callee. When the args are scalars, scalar replacement of aggregates works properly for callees, and functions that accept foo(long a, long b) vs. foo(std::pair<long, long> x) produce identical argument handling code. Note that x here is a value not a reference, otherwise there is extra indirection. When I looked beyond scalars toward struct MachO::relocation_info, I found that neither GCC nor LLVM could apply SRoA to both layers of pair + struct. On the caller side, there is overhead associated with making pairs. Sooo ... pairs come at some runtime cost.

The paired reloc is only valid when it has r_type of *_RELOC_ADDEND or *_RELOC_SUBTRACTOR. Otherwise it is ignored.

I fiddled writing it using std::pair<> and was not impressed that it improve clarity, while imposing runtime overhead.

lld/MachO/InputFiles.cpp
215

That's an improvement! 😺

256

I changed it to an assertion on the grounds that it is an internal "can't happen" error, rather than something a user might induce with malformed input.

gkm marked an inline comment as done.Nov 5 2020, 11:26 PM
gkm added inline comments.
lld/MachO/InputFiles.cpp
215

Only in concept ... it doesn't work in practice. std::next() accepts and returns an iterator. Range-based for keeps the iterator behind the scenes. The best I can do to make this more C++ish is to use an iterator object with std::next() vs. integer array index with ordinary arithmetic. Methinks there is no great advantage.

int3 added a subscriber: int3.Nov 9 2020, 9:42 AM

Nit: Commit title seems to indicate that we are adding support for paired relocs in this diff, but it seems like this diff is just paving the way for adding that support. Might be worth rephrasing... could also tag with [nfc] to make it explicit.

Also, would be good to add some comments explaining what a paired relocation is, especially since it appears to be a Mach-O specific thing. (I know Apple's reloc.h explains it, but ideally someone reading LLD's code would be able to understand what's going on without cross-referencing)

lld/MachO/Arch/X86_64.cpp
72–78

pass by const ref?

73

ditto about parens

77–78

pass by const ref

lld/MachO/InputFiles.cpp
212

other reviewers have mentioned that LLVM style is to evaluate loop constants in the assignment part of the loop header (i.e. size_t i= 0, e = < relInfos.size(); i < e; ...). I don't really care much either way, just FYI

215

nit: I'd prefer to skip the extra parens around assignment and return expressions

256

A fuzzer could generate a file to trigger this error though, right?

(As an aside, we do need more compact ways of emitting error strings...)

gkm updated this revision to Diff 312297.Dec 16 2020, 1:37 PM
gkm marked 8 inline comments as done.
  • revise according to review feedback: remove parens around RHS expressions of return & assignment
gkm updated this revision to Diff 312319.Dec 16 2020, 3:53 PM
  • Add block comment about paired relocs
thakis added a subscriber: thakis.Dec 16 2020, 4:15 PM
thakis added inline comments.
lld/MachO/InputFiles.cpp
256

LLD's philosophy is that it assumes that inputs are generally valid: https://github.com/llvm/llvm-project/blob/main/lld/docs/NewLLD.rst "The current policy is that it is your responsibility to give trustworthy object files. The function is guaranteed to return as long as you do not pass corrupted or malicious object files. A corrupted file could cause a fatal error or SEGV. That being said, you don't need to worry too much about it if you create object files in the usual way and give them to the linker. It is naturally expected to work, or otherwise it's a linker's bug." So crashing on fuzzer input is fine, as long as object files generated by compilers or assemblers work.

int3 accepted this revision.Dec 16 2020, 6:07 PM

Nit: Commit title seems to indicate that we are adding support for paired relocs in this diff, but it seems like this diff is just paving the way for adding that support. Might be worth rephrasing... could also tag with [nfc] to make it explicit.

^ do remember to address that, as well as the const ref stuff. Otherwise lgtm

lld/MachO/Arch/X86_64.cpp
72–78

doesn't look like the const ref comment was addressed

lld/MachO/InputFiles.cpp
227–236

love the explanation!

250
256

oh that's a relief. thanks for pointing it out

This revision is now accepted and ready to land.Dec 16 2020, 6:07 PM

Nit: Commit title seems to indicate that we are adding support for paired relocs in this diff, but it seems like this diff is just paving the way for adding that support. Might be worth rephrasing... could also tag with [nfc] to make it explicit.

^ do remember to address that

As a reminder, if you change your commit message locally, you need to pass --verbatim to arc diff to also update it in Phabricator.

int3 added inline comments.Dec 16 2020, 6:46 PM
lld/MachO/InputFiles.cpp
227–236

inconvenient and more costly at runtime.

I might be misunderstanding, but do you mean link time instead of runtime?

gkm marked 5 inline comments as done.Dec 17 2020, 9:49 AM
gkm added inline comments.
lld/MachO/Arch/X86_64.cpp
72–78

relocation_info is a 64-bit struct that can pass-by-value in a register. Making it a reference introduces unnecessary indirection.

77–78

relocation_info is a 64-bit struct that can pass-by-value in a register. Making it a reference introduces unnecessary indirection.

lld/MachO/InputFiles.cpp
212

I did a quick audit for this in lld. The recommended LLVM style is used approx 1/3 of the time.

227–236

Better.

256

No, the assertion is checking for malformed symbol-table entries. Local symbol numbering is not accessible from assembler source.

gkm updated this revision to Diff 312551.Dec 17 2020, 10:23 AM
gkm marked an inline comment as done.
gkm retitled this revision from [lld-macho] Handle paired relocs to [lld-macho][nfc] Refactor to accommodate paired relocs.
  • cleanup: convert remaining instances of relocation_info pass-by-cost-reference to pass-by-value
  • address lingering feedback nits
This revision was automatically updated to reflect the committed changes.
smeenai added inline comments.Dec 17 2020, 11:24 PM
lld/MachO/InputFiles.cpp
212

I was curious about this so I looked at the generated assembly (my host compiler was Clang 9). At least in this case, it's smart enough to realize that the loop constant is, well, constant (and that it's just sec.nreloc) and uses that constant value instead of recomputing things in each iteration. (The value gets spilled to the stack so it's still a memory reload, but the extra variable wouldn't have saved you either.) I can imagine that analysis being a lot harder to do for the compiler in the general case though, hence the LLVM style recommendation (but yeah, it doesn't seem too enforced in LLD).