This is an archive of the discontinued LLVM Phabricator instance.

[Support] Upstream anonymization and manipulating of BCSymbolMaps
AbandonedPublic

Authored by JDevlieghere on Apr 22 2018, 8:56 AM.

Details

Summary

This patch upstreams support for Apple's implementation of anonymization.
The code is used internally to read and write BCSymbolMaps in the embedded
bitcode workflow. We need this logic upstream in order to upstream the part of
dsymutil that deals with dSYMs generated for bitcode-enabled App Store builds.

If there's interest from the community I'd be more than happy to upstream the
anonymization pass as well.

This is a slightly modified version of the code originally written by Steven Wu.

Diff Detail

Event Timeline

JDevlieghere created this revision.Apr 22 2018, 8:56 AM
dexonsmith requested changes to this revision.Apr 22 2018, 4:37 PM

Thanks for working on this! I've just done a pass through for style, and I have a few nits to pick.

If there's interest from the community I'd be more than happy to upstream the
anonymization pass as well.

I don't see any reason to hold it back...

llvm/include/llvm/Support/Anonymization.h
36

Why is Alloc() listed here?

59

Not sure this comment adds any value.

62

Should this have three slashes (///) so doxygen understands it?

69

Should this be a doxygen comment? Also, it should likely start witha capital and end with a period.

73–79

Should this be a doxygen comment? Same with comments that follow.

80

I feel like this should have names for the parameters. It's not obvious from the declaration how this API is used.

82

Misspelled "implementation"; also, it's not clear to me that this comment adds value.

83

Should this have a name for the parameter?

87–88

I think this should be a doxygen comment.

96

I think I prefer = default to {}.

105

This won't serve as a vtable anchor for Anonymizer, because it's pure virtual. I think you should give it an implementation in the source file.

119

Since IR is an acronym, this seems better to me as IRPrefix.

120

I see no reason to list Anonymizer() here.

130

Variable name should start with a capital.

133–137

I would have expected Prefix and Suffix, since "pre" and "suf" don't seem entirely obvious, and IRPrefix and IRNum, since "ir" is an acronym.

llvm/lib/Support/Anonymization.cpp
22

reverse and keepPrefix should be Reverse and KeepPrefix.

This revision now requires changes to proceed.Apr 22 2018, 4:37 PM
aprantl added inline comments.Apr 23 2018, 9:01 AM
llvm/include/llvm/Support/Anonymization.h
42

cann you add an anonymous enum { keepPrefix = true; };

46

Reverse

59

///

Thanks for working on this Jonas. Duncan and Adrian has already covered pretty much all I want to say. A small additional comments.

llvm/include/llvm/Support/Anonymization.h
57

With the current LLVM infrastructure, it would be good to return Error to deal with the incompatible format.

llvm/unittests/Support/AnonymizationTest.cpp
15

It would be good to test looking up index doesn't exist or serialization of the reverse map as well.

zturner added inline comments.
llvm/include/llvm/Support/Anonymization.h
11

It's not entirely clear what "anonymization" means. Can you improve this comment here?

Also, the summary said it's needed for dsymutil. Does it need to be in Support?

JDevlieghere marked 19 inline comments as done.

Address feedback from Duncan, Adrian and Steven. Thanks everyone!

JDevlieghere marked an inline comment as done.Apr 26 2018, 9:26 AM
JDevlieghere added inline comments.
llvm/include/llvm/Support/Anonymization.h
11

I hope it will become more clean once I have a diff for the module pass. The latter is also the reason this lives in support, we need it there as well as in dsymutil.

JDevlieghere marked an inline comment as done.May 10 2018, 8:15 AM

ping

steven_wu accepted this revision.May 10 2018, 8:27 AM

Are you planning to upstream the module pass to justify this to be part of Support library?

Otherwise, LGTM.

Are you planning to upstream the module pass to justify this to be part of Support library?

Otherwise, LGTM.

Thanks Steven. Definitely, I just want to make sure everyone is okay with the API before starting on that :-)

dexonsmith added inline comments.May 10 2018, 8:38 AM
llvm/include/llvm/Support/Anonymization.h
32

Can you explain how?

55

s/textural/textual/

107

It's not really clear what the difference is between the two maps. When should a derived class use which? (Can you add comments, either here or in the class comment, explaining?)

It's also a little strange that their names distinguish whether they are "reversible", but I don't see anything in the base class about a reverse map. Should the reverse map be moved to here?

108

Should this be IrreversibleForwardMap?

144

Since "ir" is an acronym, can we make this IRNum? That will match IRPrefix as well.

llvm/lib/Support/Anonymization.cpp
21

s/KeepPrefixfix/KeepPrefix/

28–29

It's unfortunate to hit the StringMap twice in a row like this. Is that necessary?

28–32

This code looks identical in structure to the code following for the other map. Can it be factored out into a private function that takes a MapTy&?

31–32

It might be simpler to skip the local variable:

return ForwardMap[S] = anonymizeImpl(...)
43–45

Hitting the map twice seems unnecessary here.

dexonsmith added inline comments.May 10 2018, 9:18 AM
llvm/include/llvm/Support/Anonymization.h
151

The virtual seems unnecessary here, since we have override. I did a quick grep of llvm/include/llvm and other headers seem to agree with me.

llvm/lib/Support/Anonymization.cpp
54

Another "fixfix" typo here.

59

Missing period at the end of the sentence. Also, I think "If S" would be more clear (but if you disagree, please make it "If the symbol").

68–71

Is there a reason this isn't inside the previous if statement? I wonder if the assertion would be obvious enough to skip in that case.

69

It's not obvious why it's reasonable to copy S here, rather than uniquing in a StringSet. Can you add a comment explaining why this won't allocate too many strings?

96

It looks like this could be const MemoryBuffer &, since Buffer isn't modified, and you assume it isn't nullptr. Is there something I'm missing from the other (yet-to-be-upstreamed) derived class?

Also a nit: it seems like the llvm:: qualifier is unnecessary.

138

Should this be "IndexLength"?

More generally, should this really be:

StringRef Index = Key.slice(...);

?

llvm/unittests/Support/AnonymizationTest.cpp
207

The .operator bool() seems funny. Does EXPECT_TRUE(E) not work? If not, I think EXPECT_TRUE(bool(E)) would be clearer than what's currently there.

211

It might be nice to add a comment saying that this is checking for version "1.0".

JDevlieghere abandoned this revision.Jan 28 2019, 3:56 PM

Similar functionality (limited to dsymutil) was landed a while ago.