This is an archive of the discontinued LLVM Phabricator instance.

[llvm-install-name-tool] Merge rpath with id/change
ClosedPublic

Authored by sameerarora101 on Jun 29 2020, 2:48 PM.

Details

Summary

Merge -rpath option within the same loop as -id and
-change. This also simplifies -add_rpath.
Added another test case for -add_rpath to test the
new implementation. Depends on D82613

Diff Detail

Event Timeline

sameerarora101 created this revision.Jun 29 2020, 2:48 PM
Herald added a project: Restricted Project. · View Herald Transcript
sameerarora101 marked an inline comment as done.Jun 29 2020, 3:00 PM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
346
NOTE: Cannot store StringRef in OriginlRPaths (i.e cannot have set<StringRef> OriginalRPaths). This is because the StringRef CurrentRPath is a reference to the string stored inside LC and thus when we change the payload (via -rpath option) CurrentRPath is also updated to hold a reference to the new string. Hence I need to store the actual strings inside the set.
Ktwu added inline comments.Jun 29 2020, 6:25 PM
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
346

nit: are these braces intentional? It's inconsistent with the other cases.

347

I think you have some options available depending on how clever you want to get with sets if you care about minimizing string cloning. Instead of tracking original rpaths, you can track operations on the config rpaths, for example (those strings shouldn't change unexpectedly beneath you, right)?

std::set<StringRef> UpdatedRPaths;

case Macho::LC_RPATH:
  StringRef CurrentRPath = ...
  auto FoundArg = ...
  if (FoundArg != end) {
    if (CurrentRPath in UpdatedRPaths) {
      // duplicate scenario
    }
    if (CurrentRPath in RPathToAdd) {
      // Also a dup scenario
    }
    updateLoadCommandPayloadString()
    UpdatedRPaths.insert(FoundArg->first);
  }
...

for (const auto &OldNew : Config.RPathsToUpdate) {
  std::tie(Old, New) = OldNew;
  if (Old not in UpdatedMap) {
    // no load command scenario
  }
  // dup scenario handled above
}

...
std::set<StringRef> AddedRPaths
for (StringRef RPath : Config.RPathToAdd) {
  if (RPath in AddedRPaths) {
    // dup
  }
  AddedRPaths.insert(RPath);
}
jhenderson added inline comments.Jun 30 2020, 12:47 AM
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
346

You probably don't want a std::set here. Have you reviewed the Programmer's Manual on what set type to use?

346

I think the braces are required because of the declaration of CurrentRPath in this case statement.

353

I think the break should be inside the braces. It looks weird where it is.

358–362

I'd be tempted to move this code above the AddSection code so that all rpath related stuff is together.

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
346
  1. I wanted to mention it on one of the previous diffs but looks like I forgot - it would be better to place all the details related to updating load commands into a separate helper function rather than directly into handleArgs.
static Error updateLoadCommands(const CopyConfig &Config, Object &Obj)

handleArgs would invoke this function (updateLoadCommands) similarly to removeSections, updateAndRemoveSymbols etc.

Probably it would be even better first to refactor the code this way (on a separate diff) before moving forward if you don't mind
(now handleArgs is accumulating the details it should not really be aware of)

  1. Please, take a look at StringSet, StringMap and their interfaces

(in particular, if I am not mistaken they would help you avoid some str() calls below)

Also this probably suggests that Config.RPathsToUpdate, Config.RPathsToUpdate should really be StringMap-s.

Ktwu added inline comments.Jun 30 2020, 10:26 AM
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
346

Also this probably suggests that Config.RPathsToUpdate, Config.RPathsToUpdate should really be StringMap-s.

+1, although does StringMap have deterministic iteration? Maybe SetVector would work as well?

sameerarora101 marked 12 inline comments as done.Jun 30 2020, 12:59 PM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
346

yup, that is indeed the case.

346

yup, I have replace it with a DenseSet<StringRef>

346
  1. ok, so i refactored the code for updating LCs into a separate function updateLoadCommands and call that from handleArgs.

Also this probably suggests that Config.RPathsToUpdate, Config.RPathsToUpdate should really be StringMap-s.

I agree, but I think I should do that in a separate diff considering that it would require changing the way I parse options in CopyConfig.cpp. What do you think?

347

Thanks a lot for the clever tip, UpdatedRPaths.insert(FoundArg->first) was really helpful. However, I am not sure if I understood the logic completely (sorry if I missed something 😔). In particular, say I have
2 LC: lcA and lcB
and 1 -rpath option :lcB -> lcA

We want this to throw an error but it is currently passing with the above logic.

sameerarora101 marked 2 inline comments as done.Jun 30 2020, 1:03 PM

Refactoring all load command operations into updateLoadCommands

Adding comments over -change option

Ktwu added inline comments.Jun 30 2020, 1:37 PM
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
346

What do you think?

Definitely do that in a separate diff -- it's much better to keep refactoring changes like that confined to their own diff instead of mixed up with new feature work.

347

Ah, my pseudocode is bad, sorry! You already fixed the crux of the problem I was talking about with the DenseSet<StringRef; no need to worry about my idea.

Changing vector<Pair> to DenseMaps and refactoring

sameerarora101 marked an inline comment as done.Jun 30 2020, 5:47 PM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
163

As recommended by @alexshap it is clearer to understand (and the code is cleaner too) if iterate through the LCs twice + use Maps (DenseMaps) down below.

I've added a few comments but other than that this version looks good to me

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
161

nit: what about renaming OriginalRPaths -> RPaths ?

163

yeah, thanks!

196

nit: FoundArg doesn't seem to be a very descriptive name

StringRef RPath = getPayloadString(LC);
StringRef NewRPath = Config.RPathsToUpdate.lookup(RPath);
...
207
StringRef InstallName = getPayloadString(LC);
NewInstallName = Config.InstallNamesToUpdate.lookup(InstallName)
jhenderson added inline comments.Jul 1 2020, 1:33 AM
llvm/tools/llvm-objcopy/CopyConfig.h
181–183

Would StringMap not work instead of DenseMap?

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
163

Nit here and below: you need full stops at the end of your comments.

164–167

I thought you were avoiding iterating twice over the LoadCommands before? Why are this loop and the update loop below separate?

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
164–167

@jhenderson - I think this was my suggestion, imo splitting these steps makes the code easier to reason about.
From the performance perspective it doesn't matter much, even huge binaries have ~70 load commands.

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
164–167

Also if I'm not mistaken it has enabled @sameerarora101 to move error reporting (lines 171 -180) earlier, before any changes / updates are made. This makes sense to me.

sameerarora101 marked 7 inline comments as done.Jul 1 2020, 7:20 AM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/CopyConfig.h
181–183

I think it would work too. However, if I am not mistaken, while comparing StringSet and DenseSet in a previous diff, @smeenai suggested that StringSet takes ownership of its strings (and copies them), which would be unnecessary copying (and so DenseSet would be better). I thought it would be the same case for StringMap vs DenseMap and so I implemented with DenseMap. What do you think?

sameerarora101 marked an inline comment as done.

Resolving nits: changing variable names

This revision is now accepted and ready to land.Jul 1 2020, 10:40 AM
jhenderson accepted this revision.Jul 2 2020, 2:36 AM

LGTM.

llvm/tools/llvm-objcopy/CopyConfig.h
181–183

Makes sense. I haven't looked at the data structures themselves.

FWIW, I don't think taking ownership is necessarily an issue, since there aren't going to be significant numbers of options so the string copying isn't going to be significant.

smeenai added inline comments.Jul 2 2020, 8:27 PM
llvm/tools/llvm-objcopy/CopyConfig.cpp
886

The pointer returned by calling .data() on a StringRef might not be null-terminated, so I don't think this is safe.

There's a form of createStringError that takes a twine instead of a format string, which is more efficient than the intermediate allocations associated with .str().c_str(). Given that this is an error path, efficiency really doesn't matter though :) The twine form should get implicitly used if you do something like

"cannot specify both -rpath " + It1->getFirst() + " " + It1->getSecond() + " and -rpath " + Old + " " + New

You could also get fancy and use the * precision specifier to specify the string length manually, instead of constructing the intermediate std::string. See https://en.cppreference.com/w/cpp/io/c/fprintf for details about *. Again though, this is purely academic; efficiency doesn't really matter that much for error paths. In other words, just doing .str().c_str() as before should be fine.

llvm/tools/llvm-objcopy/CopyConfig.h
181–183

Yeah, it's hard to say in general. From my understanding of the structures (and it's been a bit since I looked at them, so I may very well be mistaken):

  • StringMap copies the string contents, which is unnecessary copying and memory usage.
  • DenseMap allocates space for 64 key/value pairs by default (as soon as you insert a single entry), which is also unnecessary memory usage.
  • StringMap does a heap allocation for every key you insert.
  • DenseMap stores all key/value pairs in a single memory allocation, which is likely more cache-efficient when probing the hash table and iterating over all elements.
  • StringMap caches the hash values of its keys, which means it can do cheap comparisons when probing, and doesn't need to recompute hash values when resizing the hash table.
  • You can achieve the same with DenseMap by using CachedHashStringRef, however.

Over here, given the expected size of these things, I don't think it will end up mattering at all, but it is interesting to think through :)

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
65–66

Not related to this diff, but something I noticed: given that RPathsToRemove starts out as a copy of Config.RPathsToRemove and you're just removing elements from it, can't you just error out for any remaining entries in RPathsToRemove instead?

170

Can Old and New be declared inside the loop?

217

Nit: do we want to be adding new load commands in a function called updateLoadCommands? At least to me that function name seems like it should only be updating existing load commands, since we have a separate removeLoadCommands to handle removal. I'll leave it to the more experienced llvm-objcopy reviewers (@alexshap, @jhenderson) to decide if this is okay as-is or if we want a separate addLoadCommands function.

346

Good point about the deterministic iteration. Given that duplicate -rpath commands are an error instead of the last one winning out, I think non-deterministic iteration should be okay, though I haven't thought about it too hard.

llvm/tools/llvm-objcopy/CopyConfig.cpp
886

if I am not mistaken in this particular case they are null-terminated (see e.g. https://reviews.llvm.org/D81575),
but perhaps it would be better not to rely on this.

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
217

so basically the idea was to group together logical pieces of handleArgs (to some reasonable extent).
Besides error-reporting removeLoadCommands is ~10-12 lines of code, so I'd probably inline it into
updateLoadCommands for consistency.

sameerarora101 marked 9 inline comments as done.Jul 5 2020, 11:29 AM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/CopyConfig.cpp
886

I see, thanks! I have switched it to the twine format as suggested.

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
65–66

So the idea here was to raise an error for the first specified RPath that doesn't exist. For that purpose, we iterate through Config.RPathsToRemove and raise an error for the first RPath not deleted.

217

@alexshap Instead of inlining the whole removeLoadCommands inside updateLoadCommands I think it would cleaner if I just call

// Remove LCs.
if (Error E = removeLoadCommands(Config, Obj))
  return E;

from inside updateLoadCommands. This can allow for independent development of removeLoadCommands in future as well. What do you think? (I have updated the current diff with this change, however, I can update it again in case we want something else)

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
217

I'm not sure that removeLoadCommands is realistically independent from updateLoadCommands, e.g. the order in which you modify the list of load commands appears to be important. Since it's small (~10 lines) it seems preferable to avoid creating this weird asymmetry between removing/adding.

sameerarora101 marked 4 inline comments as done.Jul 5 2020, 12:24 PM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
217

I see. Ok, I have inlined removeLoadCommands into updateLoadCommands then. Thanks 😊

sameerarora101 marked an inline comment as done.

Inlining removeLoadCommands into updateLoadCommands

jhenderson added inline comments.Jul 6 2020, 1:06 AM
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
161

Nit: remove blank line at start of function.

217

There are two options. Either a) rename updateLoadCommands to something more generic (e.g. processLoadCommands, in which case I'd ensure all load command processing is done in that function), or b) think of it purely in conceptual terms where the load commands in the function name refers to the set of load commands, rather than each individual load command, if that makes sense. Thus you update that set by adding/removing elements, and also changing the existing elements.

sameerarora101 marked 3 inline comments as done.Jul 6 2020, 12:37 PM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
217

I like processLoadCommands too. Updated (I can change it to something else too?)

sameerarora101 marked an inline comment as done.

updateLoadCommands -> processLoadCommands

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
217

+1

This revision was automatically updated to reflect the committed changes.
smeenai added inline comments.Jul 6 2020, 3:27 PM
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
65–66

Got it. RPathsToRemove is a DenseSet though, and I don't believe that guarantees that iteration order will be the same as insertion order (or even that iteration order will be deterministic); you'd need a SetVector for that.

sameerarora101 reopened this revision.Jul 6 2020, 4:12 PM
This revision is now accepted and ready to land.Jul 6 2020, 4:12 PM

updating std::tie(Old, New) = OldNew; as it was breaking on Darwin

sameerarora101 marked an inline comment as done.Jul 6 2020, 4:13 PM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
198–199

this is the lates update. Would it work on Darwin? thanks

smeenai accepted this revision.Jul 6 2020, 6:19 PM
smeenai added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
198–199

Yup, that builds. I believe this is a libc++ bug though, and I commented on https://bugs.llvm.org/show_bug.cgi?id=17550#c11

sameerarora101 marked 2 inline comments as done.Jul 6 2020, 8:33 PM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
198–199

ok, thanks a lot @smeenai . I'll commit this now and keep an eye on other buildbot failures

sameerarora101 marked an inline comment as done.Jul 6 2020, 8:36 PM
This revision was automatically updated to reflect the committed changes.