This is an archive of the discontinued LLVM Phabricator instance.

[llvm-install-name-tool] Add -rpath option
ClosedPublic

Authored by sameerarora101 on Jun 17 2020, 1:18 PM.

Details

Summary

Implement -rpath option for install-name-tool via replacing the old
rpath entry with the new one in place.

Diff Detail

Event Timeline

sameerarora101 created this revision.Jun 17 2020, 1:18 PM
Herald added a project: Restricted Project. · View Herald Transcript

cmdsize update

sameerarora101 marked 2 inline comments as done.Jun 17 2020, 1:34 PM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/CopyConfig.cpp
874

Unlike -delete_rpath, we thought it was necessary to check duplication in case of -rpath. This is because it would not be useful to do something like -rpath A B .... -rpath A C or something like -rpath A B ... -rpath B C. Additionally, this also matches XCode's -rpath behavior.

888

This highlights a warning :
warning: lambda templates are only available with -std=c++2a or -std=gnu++2a [-Wpedantic]

Is this fine? Not sure what I should do about it?

sameerarora101 marked an inline comment as done.Jun 17 2020, 1:41 PM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/CopyConfig.cpp
874

I can remove this check if people feel the need?

sameerarora101 edited the summary of this revision. (Show Details)Jun 17 2020, 1:42 PM
jhenderson added inline comments.Jun 18 2020, 2:11 AM
llvm/test/tools/llvm-objcopy/MachO/install-name-tool-rpath.test
39–40

Nit: indent the FileCheck line by an additional space, matching e.g. lines 8 and 19.

84

It probably is best to do this after each individual error, rather than once at the end (just to make sure that one doesn't change the binary and the next change it back).

jhenderson added inline comments.Jun 18 2020, 2:11 AM
llvm/test/tools/llvm-objcopy/MachO/install-name-tool-rpath.test
35

Use cp rather than llvm-objcopy to copy the file.

68

You probably want to show the behaviour when different values are used too.

75

Ditto.

77–81

How do you say "RPath" - "ah-path" or something else? If "ah-path", (which is how I read it), the "a" before should be "an".

Also might be worth having another case here where -rpath has no arguments at all.

llvm/tools/llvm-objcopy/CopyConfig.cpp
874

No, it makes reasonable sense to me. I guess if people ware used to XCode's behaviour, it's okay to follow it.

875–876

There's an llvm::find_if which just takes the container. There's an is_contained method too which does std::find + the end check, which might be worth extending to allow it to take a lambda (something like is_contained_if). You might want to do that (in a prerequisite patch), to simplify a lot of this code.

881

adl_end(Config.RPathsToUpdate) -> Config.RPathsToUpdate.end() is more common syntax in LLVM.

888

LLVM is built with C++14 so this will fail the build on most build bots.

I'm actually struggling to follow the logic of this whole section, if I'm honest! If I follow it right, I think you could simplify it to something like below:

if (is_contained(Config.RPathsToRemove), OldRPath)
  return creatStringError(...);
if (is_contained(Config.RPathsToRemove), NewRPath)
  return creatStringError(...);
if (is_contained(Config.RPathsToAdd), OldRPath)
  return creatStringError(...);
if (is_contained(Config.RPathsToAdd), NewRPath)
  return creatStringError(...);

Alternatively, with an is_contained_if you could fold it down to just two cases, one for each of the two existing Config variables.

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
139–140

These two lines can be folded together I believe using std::back_insertor in the std::copy line, right?

291

NULL -> nullptr

Same below.

sameerarora101 marked 16 inline comments as done.Jun 18 2020, 8:20 AM
sameerarora101 added inline comments.
llvm/test/tools/llvm-objcopy/MachO/install-name-tool-rpath.test
68

With different values it works fine - added the test now, thanks!

77–81

yup, it should be "an", thanks 😄(added the test too)

llvm/tools/llvm-objcopy/CopyConfig.cpp
875–876

oh thanks, llvm::find_if helps. But, I don't want is_contained because it just returns true or false. I also need access to the element itself in case true is returned (error msg use the element). Therefore, I am explicitly checking whether It != RPaths.end() and using the iterator to access the element if the above condition is true.

If I use is_contained, I would have to again iterate through the vector to find the element, right?

888

ok, i got rid of the lambda. No warnings are thrown (the code is little cleaner too perhaps).

For the pattern,

if (is_contained(Config.RPathsToRemove), OldRPath)
  return creatStringError(...);

the issue is again that I need access to the element (if is_contained is true) to construct the error msg which means iterating the vector again. Instead I am following this pattern:

It = find_if(Config.RPathsToRemove, StringChecker)
if (It != Config.RPathsToRemove.end())
  return createStringError(..<use It here>..);

It = find_if(Config.RPathsToAdd, StringChecker)
if (It != Config.RPathsToAdd.end())
  return createStringError(..<use It here>..);

Is this fine?

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
139–140

So I tried folding the 2 lines into std::copy(NewName.begin(), NewName.end(), std::back_insertor(LC->Payload)); but it fails the rpath test giving the following error: truncated or malformed object (load command 1 extends past end of file).

Surprisingly, even LC->Payload.assign(NewName.begin(), NewName.end()); didn't work (giving the same error as above).

However, the initial approach seems to work lol, not sure what's the difference in behavior from these new approaches?

sameerarora101 marked 4 inline comments as done.
  • Updating test as per comments
  • Cleaning CopyConfig.cpp

auto Arg -> auto *Arg (as per clang-tidy, fails build otherwise).

jhenderson added inline comments.Jun 19 2020, 12:06 AM
llvm/tools/llvm-objcopy/CopyConfig.cpp
874–876

Move this down to where it's used for the first time.

875–876

Yeah, that's right. I didn't realise you wanted to use the element, so find_if looks good.

888

Thanks. I find the new code much easier to follow.

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
139–140

That error message doesn't really give me enough info to understand what's going on. Did you perhaps write one too many bytes somewhere/not write the null byte? Regarding the copy, you'd also need to clear beforehand, to avoid it just appending the data to the end, so actually assign is probably want you want anyway, but I believe you may need to push a null byte to the end too, i.e.

LC->Payload.assign(NewName.begin(), NewName.end());
LC->Payload.push_back('\0');
sameerarora101 marked 5 inline comments as done.Jun 19 2020, 7:25 AM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
139–140

Oh I see now why it wasn't working - I didn't align the Payload to a multiple of 8. NewCmd has been aligned to a multiple of 8 as

uint32_t NewCmdsize = alignTo(sizeof(T) + NewName.size() + 1, 8);

So using the following approach

LC->Payload.assign(NewName.begin(), NewName.end());
LC->Payload.push_back('\0');

still won't be sufficient as we need to align it to multiple of 8. (Something like pushing '\0' for NewCmdSize - sizeof(T) - NewName.size() times)

I couldn't find a cleaner way of doing it beside the current implementation itself, what do you think?

sameerarora101 marked an inline comment as done.

Moving StringChecker down before its first use.

LGTM, but I would like @smeenai or @alexshap (or someone else with Mach-O knowledge) to be happy with the logic too.

jhenderson accepted this revision.Jun 22 2020, 1:09 AM
This revision is now accepted and ready to land.Jun 22 2020, 1:09 AM
alexander-shaposhnikov requested changes to this revision.Jun 22 2020, 5:39 PM
alexander-shaposhnikov added inline comments.
llvm/tools/llvm-objcopy/CopyConfig.cpp
873
  1. llvm::find_if - since the code is already inside the namespace "llvm" the prefix is unnecessary.
  2. StringRef can be passed by value
  3. You can reuse your code a tiny bit further + I would probably rename some things here:
auto Match  = [=](StringRef RPath) { return RPath == Old || RPath == New; };
    
auto It1 = find_if(Config.RPathsToUpdate, [](const std::pair<StringRef, StringRef> & OldNew) { 
                                      return Match(OldNew.first) || Match(OldNew.second); });
909

Config.RPathsToUpdate.emplace_back(Old, New);

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
135

Different load commands have very different payloads, so i think this function is a bit misleading.
What about this:

static void updateRPathLoadCommand(LoadCommand &LC, StringRef NewRPath) {
    assert( ....  ); // make sure that LC is indeed LC_RPATH;
    ...
}
287
StringRef Old, New;
for (std::tie(Old, New) : Config.RPathsToUpdate) {
    ...
}
294

If I'm not mistaken there are multiple places where this construction is used, so i think we need to factor out it into a helper function.

StringRef getRPath(const LoadCommand &LC) {
    assert(LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH &&
              "LC_RPATH is expected");
    return StringRef(reinterpret_cast<char *>(LC.Payload.data()),
                              LC.Payload.size()).rtrim('\0');
}

Plus I would slightly reorganize the code (see below):

StringRef Old, New;
for (std::tie(Old, New) : Config.RPathsToUpdate) {
     auto It = find_if(Obj.LoadCommands, [](const LoadCommand &LC) { 
                              return LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH &&
                                         getRPath(LC) == Old); });
     if (It == Obj.LoadCommands.end())
         ...
     if (getRPath(*It) == New)
        ...
     updateRPathLoadCommand(*It, New);
}
This revision now requires changes to proceed.Jun 22 2020, 5:39 PM
sameerarora101 marked 8 inline comments as done.Jun 22 2020, 10:20 PM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/CopyConfig.cpp
873

nice, thanks! 😊

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
135

So it turns out the way to update LC name follows the same logic (as above) for all the options -rpath, -id, and -change for llvm-install-name-tool. Here is how I was thinking about using it :

-rpath : updateLoadCommandName<MachO::rpath_command>(&LC, NewRPath);,

-id : updateLoadCommandName<MachO::dylib_command>(&LC, NewID);

-change : updateLoadCommandName<MachO::dylib_command>(&LC, NewInstallName);

This allowed me to refactor the common logic of updating the name payload of any load command (instead of making it specific for rpath). Would this not be ok?

287

for (std::tie(Old, New) : Config.RPathsToUpdate) is perhaps not allowed : it says for range declaration must declare a variable.

I have updated it to

StringRef OldRPath, NewRPath;
for (auto &RPair : Config.RPathsToUpdate) {
   std::tie(OldRPath, NewRPath) = RPair;
   ...
}

I hope this works?

294

Ok, I refactored getting the RPath construction into a separate function and updated it to be used by add_rpath, delete_rpath and rpath options.

For the reorganization, we need to check that New doesn't exists for any of the LC_RPATH commands. Please correct me in case I am wrong, but it seems that the above suggested code finds an LC_RPATH that matches Old. However, above, we throw an error only when either we can't find such a command (It == Obj.LoadCommands.end()) or if the command's rpath equals the New value(getRPath(*It) == New). But we need to throw the latter error in case any command's value equals New. Here is what I have done now:

StringRef Old, New;
  for (auto &RPair : Config.RPathsToUpdate) {
    std::tie(Old, New) = RPair;

    auto Match = [&Obj] (StringRef RPath) {
      return find_if(Obj.LoadCommands, [=](const LoadCommand &LC) {
        return LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_RPATH &&
               getRPath(LC) == RPath;
      });
    };

    auto NewIt = Match(New);
    if (NewIt != Obj.LoadCommands.end())
      return Error

    auto OldIt = Match(Old);
    if (OldIt == Obj.LoadCommands.end())
        return Error

    updateLoadCommandName<MachO::rpath_command>(&(*OldIt), New);
  }

Would this work? Thanks.

sameerarora101 marked 3 inline comments as done.

Refactoring code under CopyConfig and MachOObjcopy.

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
25

nit: static

P.S. but, please, take a look at my comment below (line 278), let's try to understand the broader picture.
It doesn't mean that something is wrong now but it's necessary to think about the best way to organize this code
(to improve readability and avoid code duplication in the future).

135

alright, let's give it a try.

A few nits:

  1. since in this function LC can't be null, it would be better to pass a reference (from the readability perspective)
  2. maybe there is a better name for this function ? it actually updates the payload,

so I'd either use something like updateLoadCommand or updateLoadCommandPayload.

  1. This code requires assert (that the type of the load command is one of those which you listed above)
  2. T should be replaced with something more specific & informative, e.g. LCType
291

I'm not sure that Match is the best possible name here (it's more like findLoadCommand, but this requires more thinking),
but, more importantly, it's necessary to find out how the case of dylib_command differs from rpath_command.
Based on this information one will be able to refactor this code / make an informed decision - should it be generalized or not.

sameerarora101 marked 5 inline comments as done.Jun 23 2020, 8:24 AM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
25

ok thanks 😊 I have refactored this too we'll be needing it for -change option as well. I have renamed it to getLoadCommandPayload. Is this fine?

291

So it seems that this should not be generalized as -id and -change would not be making use of it. This is because:

  • this current Match function takes in a StringRef RPath and checks for LC such that LC is of type MachO::LC_RPATH and matches this RPath passed in as argument.
  • for -id: I need to search for a LC such that LC.MachOLoadCommand.load_command_data.cmd == MachO::LC_ID_DYLIB. Since there can only be one such LC, we don't need to pass in another StringRef Name as an argument. And so we wouldn't be using the Match function as we don't wish to pass in an argument to it.
  • for -change: (currently I am matching the behavior of this option with that of XCode's). To match XCode's behavior, implementation of this option involves iterating over LCs before iterating over provided (old, new) install name pairs. More precisely,

here is what we are doing here in case of -rpath:

for (RPair : Config.RPathsToUpdate) {.                     // iterate over provided (old,new) pairs
  ...
  find_if( Obj.LoadCommands, ...)                          // iterate over all LCs to find the right LC
  ...

However, the iteration would be reversed in case of -change option as follows:

for (LoadCommand &LC : Obj.LoadCommands) {.                       // iterate over all LCs first
  ...
  for (auto InstallNamePair : Config.InstallNamesToUpdate)        // for each LC now, find the (old,new) pair
  ...

You see how the order of for loops is reversed. This reversed iteration is necessary in case of -change option to match the behavior of XCode's install_name_tool. And because of this, we wouldn't be able to use the Match function above (as that requires taking in a name argument and iterating over LCs after).

In light of this I have currently renamed Match to FindRPathLC (as it is rpath specific). What do you think?

Also. in case we do end up changing -change option's behavior in a later diff, I can then perhaps refactor the common code out later? (For now though, it doesn't seems to be needing refactoring.)

sameerarora101 marked 2 inline comments as done.

Updating MachOObjcopy

sameerarora101 marked an inline comment as done.Jun 23 2020, 8:49 AM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
25

I haven't added an assert for LC_REEXPORT_DYLIB , LC_LOAD_UPWARD_DYLIB or LC_LAZY_LOAD_DYLIB as llvm-objcopy doesn't support them yet. I can add a TODO comment here if needed?

sameerarora101 edited the summary of this revision. (Show Details)

auto &RPair -> const auto &RPair (for clang-tidy)

@alexshap Please also take a brief look at how updateLoadCommandPayload function is being used again for -id option here: https://reviews.llvm.org/D82410. What do you think?

jhenderson added inline comments.Jun 24 2020, 12:54 AM
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
25

TODO comments for missing functionality is usually reasonable, though I wonder whether we should just have an unsupported error somewhere for such cases.

136–144

Since this assert is common between the two functions, it might be worth pulling it into a simple helper. Something like "isLoadCommandWithPayload" and just call it instead:

assert(isLoadCommandWithPayload(LC && "unsupported load command encountered");
sameerarora101 marked 4 inline comments as done.Jun 24 2020, 7:09 AM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
25

so there is currently an unsupported error thrown by MachO/MachOLayoutBuilder.cpp for such cases:

...
 case MachO::LC_SOURCE_VERSION:
      // Nothing to update.
      break;
    default:
      // Abort if it's unsupported in order to prevent corrupting the object.
      return createStringError(llvm::errc::not_supported,
                               "unsupported load command (cmd=0x%x)", cmd);
    }
...

But yeah I can add a TODO above as well. Thanks!

136–144

nice, thanks 😊

sameerarora101 marked 2 inline comments as done.

refactoring assert and adding a TODO

jhenderson accepted this revision.Jun 25 2020, 12:54 AM

LGTM again, but please wait for @alexshap.

Rebasing (for harbormaster)

Here is the diff for -change options as well: https://reviews.llvm.org/D82613. Please take a look at how I am using updateLoadCommandPayload for -id and -change. Thanks!

changing name ...Payload(...) -> ...PayloadString(...)

This revision is now accepted and ready to land.Jun 26 2020, 4:32 PM
This revision was automatically updated to reflect the committed changes.
gchatelet added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
35

In release mode warning: unused function 'isLoadCommandWithPayloadString'. Maybe you want to guard the function definition?

#ifndef NDEBUG
static bool isLoadCommandWithPayloadString(const LoadCommand &LC) { ... }
#endif
sameerarora101 marked 2 inline comments as done.Jun 29 2020, 7:08 AM
sameerarora101 added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
35

Thanks for catching this! I was running my tests with asserts on. I have pushed a fix for the issue here: https://reviews.llvm.org/D82768 Please review.