Merge -rpath option within the same loop as -id and
-change. This also simplifies -add_rpath.
Added another test case for -add_rpath to test the
new implementation. Depends on D82613
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
346 | NOTE: Cannot store StringRef in OriginlRPaths (i.e cannot have set<StringRef> OriginalRPaths). This is because the StringRef CurrentRPath is a reference to the string stored inside LC and thus when we change the payload (via -rpath option) CurrentRPath is also updated to hold a reference to the new string. Hence I need to store the actual strings inside the set. |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
346 | nit: are these braces intentional? It's inconsistent with the other cases. | |
347 | I think you have some options available depending on how clever you want to get with sets if you care about minimizing string cloning. Instead of tracking original rpaths, you can track operations on the config rpaths, for example (those strings shouldn't change unexpectedly beneath you, right)? std::set<StringRef> UpdatedRPaths; case Macho::LC_RPATH: StringRef CurrentRPath = ... auto FoundArg = ... if (FoundArg != end) { if (CurrentRPath in UpdatedRPaths) { // duplicate scenario } if (CurrentRPath in RPathToAdd) { // Also a dup scenario } updateLoadCommandPayloadString() UpdatedRPaths.insert(FoundArg->first); } ... for (const auto &OldNew : Config.RPathsToUpdate) { std::tie(Old, New) = OldNew; if (Old not in UpdatedMap) { // no load command scenario } // dup scenario handled above } ... std::set<StringRef> AddedRPaths for (StringRef RPath : Config.RPathToAdd) { if (RPath in AddedRPaths) { // dup } AddedRPaths.insert(RPath); } |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
346 | You probably don't want a std::set here. Have you reviewed the Programmer's Manual on what set type to use? | |
346 | I think the braces are required because of the declaration of CurrentRPath in this case statement. | |
353 | I think the break should be inside the braces. It looks weird where it is. | |
358–362 | I'd be tempted to move this code above the AddSection code so that all rpath related stuff is together. |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
346 |
static Error updateLoadCommands(const CopyConfig &Config, Object &Obj) handleArgs would invoke this function (updateLoadCommands) similarly to removeSections, updateAndRemoveSymbols etc. Probably it would be even better first to refactor the code this way (on a separate diff) before moving forward if you don't mind
(in particular, if I am not mistaken they would help you avoid some str() calls below) Also this probably suggests that Config.RPathsToUpdate, Config.RPathsToUpdate should really be StringMap-s. |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
346 |
+1, although does StringMap have deterministic iteration? Maybe SetVector would work as well? |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
346 | yup, that is indeed the case. | |
346 | yup, I have replace it with a DenseSet<StringRef> | |
346 |
I agree, but I think I should do that in a separate diff considering that it would require changing the way I parse options in CopyConfig.cpp. What do you think? | |
347 | Thanks a lot for the clever tip, UpdatedRPaths.insert(FoundArg->first) was really helpful. However, I am not sure if I understood the logic completely (sorry if I missed something 😔). In particular, say I have We want this to throw an error but it is currently passing with the above logic. |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
346 |
Definitely do that in a separate diff -- it's much better to keep refactoring changes like that confined to their own diff instead of mixed up with new feature work. | |
347 | Ah, my pseudocode is bad, sorry! You already fixed the crux of the problem I was talking about with the DenseSet<StringRef; no need to worry about my idea. |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
163 | As recommended by @alexshap it is clearer to understand (and the code is cleaner too) if iterate through the LCs twice + use Maps (DenseMaps) down below. |
I've added a few comments but other than that this version looks good to me
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
161 | nit: what about renaming OriginalRPaths -> RPaths ? | |
163 | yeah, thanks! | |
196 | nit: FoundArg doesn't seem to be a very descriptive name StringRef RPath = getPayloadString(LC); StringRef NewRPath = Config.RPathsToUpdate.lookup(RPath); ... | |
207 | StringRef InstallName = getPayloadString(LC); NewInstallName = Config.InstallNamesToUpdate.lookup(InstallName) |
llvm/tools/llvm-objcopy/CopyConfig.h | ||
---|---|---|
181–183 | Would StringMap not work instead of DenseMap? | |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
163 | Nit here and below: you need full stops at the end of your comments. | |
164–167 | I thought you were avoiding iterating twice over the LoadCommands before? Why are this loop and the update loop below separate? |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
164–167 | @jhenderson - I think this was my suggestion, imo splitting these steps makes the code easier to reason about. |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
164–167 | Also if I'm not mistaken it has enabled @sameerarora101 to move error reporting (lines 171 -180) earlier, before any changes / updates are made. This makes sense to me. |
llvm/tools/llvm-objcopy/CopyConfig.h | ||
---|---|---|
181–183 | I think it would work too. However, if I am not mistaken, while comparing StringSet and DenseSet in a previous diff, @smeenai suggested that StringSet takes ownership of its strings (and copies them), which would be unnecessary copying (and so DenseSet would be better). I thought it would be the same case for StringMap vs DenseMap and so I implemented with DenseMap. What do you think? |
LGTM.
llvm/tools/llvm-objcopy/CopyConfig.h | ||
---|---|---|
181–183 | Makes sense. I haven't looked at the data structures themselves. FWIW, I don't think taking ownership is necessarily an issue, since there aren't going to be significant numbers of options so the string copying isn't going to be significant. |
llvm/tools/llvm-objcopy/CopyConfig.cpp | ||
---|---|---|
886 | The pointer returned by calling .data() on a StringRef might not be null-terminated, so I don't think this is safe. There's a form of createStringError that takes a twine instead of a format string, which is more efficient than the intermediate allocations associated with .str().c_str(). Given that this is an error path, efficiency really doesn't matter though :) The twine form should get implicitly used if you do something like "cannot specify both -rpath " + It1->getFirst() + " " + It1->getSecond() + " and -rpath " + Old + " " + New You could also get fancy and use the * precision specifier to specify the string length manually, instead of constructing the intermediate std::string. See https://en.cppreference.com/w/cpp/io/c/fprintf for details about *. Again though, this is purely academic; efficiency doesn't really matter that much for error paths. In other words, just doing .str().c_str() as before should be fine. | |
llvm/tools/llvm-objcopy/CopyConfig.h | ||
181–183 | Yeah, it's hard to say in general. From my understanding of the structures (and it's been a bit since I looked at them, so I may very well be mistaken):
Over here, given the expected size of these things, I don't think it will end up mattering at all, but it is interesting to think through :) | |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
65–66 | Not related to this diff, but something I noticed: given that RPathsToRemove starts out as a copy of Config.RPathsToRemove and you're just removing elements from it, can't you just error out for any remaining entries in RPathsToRemove instead? | |
170 | Can Old and New be declared inside the loop? | |
217 | Nit: do we want to be adding new load commands in a function called updateLoadCommands? At least to me that function name seems like it should only be updating existing load commands, since we have a separate removeLoadCommands to handle removal. I'll leave it to the more experienced llvm-objcopy reviewers (@alexshap, @jhenderson) to decide if this is okay as-is or if we want a separate addLoadCommands function. | |
346 | Good point about the deterministic iteration. Given that duplicate -rpath commands are an error instead of the last one winning out, I think non-deterministic iteration should be okay, though I haven't thought about it too hard. |
llvm/tools/llvm-objcopy/CopyConfig.cpp | ||
---|---|---|
886 | if I am not mistaken in this particular case they are null-terminated (see e.g. https://reviews.llvm.org/D81575), | |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
217 | so basically the idea was to group together logical pieces of handleArgs (to some reasonable extent). |
llvm/tools/llvm-objcopy/CopyConfig.cpp | ||
---|---|---|
886 | I see, thanks! I have switched it to the twine format as suggested. | |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
65–66 | So the idea here was to raise an error for the first specified RPath that doesn't exist. For that purpose, we iterate through Config.RPathsToRemove and raise an error for the first RPath not deleted. | |
217 | @alexshap Instead of inlining the whole removeLoadCommands inside updateLoadCommands I think it would cleaner if I just call // Remove LCs. if (Error E = removeLoadCommands(Config, Obj)) return E; from inside updateLoadCommands. This can allow for independent development of removeLoadCommands in future as well. What do you think? (I have updated the current diff with this change, however, I can update it again in case we want something else) |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
217 | I'm not sure that removeLoadCommands is realistically independent from updateLoadCommands, e.g. the order in which you modify the list of load commands appears to be important. Since it's small (~10 lines) it seems preferable to avoid creating this weird asymmetry between removing/adding. |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
217 | I see. Ok, I have inlined removeLoadCommands into updateLoadCommands then. Thanks 😊 |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
161 | Nit: remove blank line at start of function. | |
217 | There are two options. Either a) rename updateLoadCommands to something more generic (e.g. processLoadCommands, in which case I'd ensure all load command processing is done in that function), or b) think of it purely in conceptual terms where the load commands in the function name refers to the set of load commands, rather than each individual load command, if that makes sense. Thus you update that set by adding/removing elements, and also changing the existing elements. |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
217 | I like processLoadCommands too. Updated (I can change it to something else too?) |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
217 | +1 |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
198–199 | this is the lates update. Would it work on Darwin? thanks |
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp | ||
---|---|---|
198–199 | Yup, that builds. I believe this is a libc++ bug though, and I commented on https://bugs.llvm.org/show_bug.cgi?id=17550#c11 |
Would StringMap not work instead of DenseMap?