mt.exe performs a tree merge where certain element nodes are combined into one. This introduces the possibility of xml namespaces conflicting with each other. The original mt.exe has a hierarchy whereby certain namespace names can override others, and nodes that would then end up in ambigious namespaces have their namespaces explicitly defined. This namespace handles this merging process.
Diff Detail
- Build Status
Buildable 9413 Build 9413: arc lint + arc unit
Event Timeline
Before looking into details, there are a few things I want to say.
- This is a large patch but the commit message is too terse. Please write it to describe what this patch does.
- Now you have a lot of functions that are individually guarded by #ifdef's, which is somewhat error-prone. It is probably better to have one larger #ifdef and put all functions that are called only when libxml is available into it.
llvm/include/llvm/WindowsManifest/WindowsManifestMerger.h | ||
---|---|---|
38 | This will probably have internal linkage and get duplicated into every TU that includes this header. Can you sink this down into the .cpp file? This is also just static data, so consider making this an array like this: static const std::pair<const char *, const char *> MtNsHrefsPrefixes[] = { .... }; That should avoid some dynamic initialization, while still letting you loop over it and make StringRefs: https://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors | |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
64 | I feel like this would be simpler as: // If either pointer is null, use pointer equality. If both are null, they are equal, otherwise they are unequal. if (!A || !B) return A == B; | |
69 | Right now, these functions are externally visible symbols in the global namespace. We should take some steps to hide them from users of LLVM. Any free function that you don't need outside this file should be static: https://llvm.org/docs/CodingStandards.html#anonymous-namespaces | |
121 | Echoing what @ruiu said, I was imagining that we'd have one large ifdef between stub implementations of WindowsManifest::merge etc, and the real implementations and all of the helpers you wrote here that call libxml2. |
Avoid static constructor, remove symbols from global namespace, separate real and stub implementations, simplify string compare.
llvm/include/llvm/WindowsManifest/WindowsManifestMerger.h | ||
---|---|---|
58–59 | Perhaps we should hide the implementation details here with the pimpl pattern: http://en.cppreference.com/w/cpp/language/pimpl Then we don't need to include xmlreader.h above behind ifdefs in the header, and clients will not have to worry about possible name or macro conflicts. The "Impl" class would store the vector of merged XML documents. |
llvm/include/llvm/WindowsManifest/WindowsManifestMerger.h | ||
---|---|---|
49–51 | Usually we write private members after public members. Consistency matter, so please make new code consistent with existing code in general. | |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
79–80 | Please run clang-format every time you upload a new patch to phab. | |
468–475 | No else after return (apply this rule to other places in this file.) | |
701 | clang-format. |
Mostly style issues. I'd like to see all the char pointers go away in favor of StringRef, unless there's a strong reason why this won't work.
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
53–60 | Why not have this be a vector of pairs of StringRef? We're not modifying anything, and these will be in constant storage, so I think StringRef is fine? Maybe even a std::map<StringRef, StringRef>, which can still use initializer list syntax. | |
69 | It would be nice if we could deal entirely with StringRef and do conversions at the highest level possible so we never have to deal with this stuff. StringRef toStringRef(const unsigned char *Str) { return StringRef(FROM_XML_CHAR(Str)); } static bool isMergeableElement(StringRef S) { constexpr StringRef KnownList[] = { ... }; return llvm::is_contained(KnownList, S); } then the above xmlStringsEqual() also just becomes toStringRef(X) == toStringRef(Y), which would most likely just turn into X == Y because you'll be getting rid of the pointer-ness at the highest level and everything will actually be dealing with StringRef behind the scenes. | |
90–91 | For a concrete example of the previous suggestion: static xmlAttrPtr getAttribute(xmlNodePtr Node, StringRef AttributeName) { for (auto Attribute = Node->properties; Attribute; Attribute = Attribute->next) { if (toStringRef(Attribute->Name) == AttributeName) return true; } return false; } | |
102–103 | namespaceOverrides(StringRef HRef1, StringRef HRef2) | |
111–115 | You can use the range-based llvm::find_if(MtNsHrefsPrefixes, [HRef2]...); Alternatively, if my initial suggestion of making it a map works, you could just write: auto HRef2Position = MtNsHrefsPrefixes.find(HRef2); |
I agree changing to StringRef would be good for consistency. However there are a couple reasons I see not to. One is just simply the string type libxml2 uses is unsigned char *, so keeping as such whenever possible reduces the number of conversions. Second, my merging implementation allows nullptr to be a valid string value, and is used to designate the prefix of a default xml namespace, and libxml also uses this representation. Therefore we have things like making nullptr comparisons, which StringRef explicitly disallows.
llvm/include/llvm/WindowsManifest/WindowsManifestMerger.h | ||
---|---|---|
49–51 | Sorry, this was leftover from a previous version where I had inline forwarding calls in the header, so I needed the implementation class defined prior. | |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
79–80 | I had run clang-format before uploading, however I believe the problem is I was diffing against the wrong previous revision so not all my changes were covered. | |
468–475 | also changed 2 other places |
Publishing comments about coding style first. I'll do real code review later.
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
82–85 | I think I wrote this before, but I believe we usually avoid this style for (...) if (...) { } You want to either use this style for (...) { if (...) { } } or for (...) if (...) .... | |
479–481 | if (foo) return true; return false; is equivalent to return foo; | |
489–490 | else after return |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
110–112 | Since HRef1Position and HRef2Position are iterators for the same map, you can compare them directly, can't you? |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
132–136 | Sorry for picking up styles, but even in the same patch, you are using two styles. Here, you are doing for (...) { if (...) { } } but in the above function you did for (...) { if (...) ... } Consistency matters because it helps people who read your code, so please pay a bit more attention to consistency. It is also a good practice to read your code yourself several times after you think you "finish" your patch, as there's always something you can improve as a finishing touch. It will shorten the time for code review as (1) you will unlikely to get nitpicking comments on style and (2) consistent coding style makes code reviewers more confident that you know what you are doing. | |
146–152 | Maybe a better way of writing this is to use assignments in ifs. if (xmlNsPtr Def = search(HRef, Node)) return Def; if (xmlNsPtr Def = xmlNewNs(Node, HRef, getPrefixForHref(HRef))) return Def; return make_error<WindowsManifestError>("failed to create new namespace"); | |
173 | Remove this blank line. | |
218 | It's good to do continue here, as early continues are preferred. | |
235 | Ditto | |
349–350 | Ditto |
Only 1 nitpick, I think reid and rui covered the rest pretty well.
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
659–660 | !llvm::is_contained(RequiredPrefixes, Def) will allow you to get this all on a single line. Similar applies to other places in the file. |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
659–660 | Also, invert the conditional first and then put a continue. if (!Def->prefix || llvm::is_contained(RequiredPrefixes, Def)) { Prev = Def; continue; } ... This keeps the indentation down and makes it easier to read. |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
82–85 | Ah okay....I think I've heard other variations of a similar principle, but not this exact style rule. I'll change this everywhere. | |
110–112 | the only way to directly compare iterators is using std::distance. However, behavior is undefined if the 2nd iterator is not reachable by incrementing the first iterator, which will often be the case for our usage. Therefore, the only way to correctly compare is to first find each one's absolute position in the map, which I do here. | |
132–136 | I understand, consistency is important. I have been reading the code myself before sending it out for review, but I guess there are certain areas I need to focus on more. |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
110–112 | Maps are ordered, so why not just use the ordering function directly? return FROM_XML_CHAR(HRef1) > FROM_XML_CHAR(HRef2)? |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
110–112 | Unfortunately HRef1 and HRef2 do not point within the table, but rather xmlChar arrays allocated by libxml2's parser, so we can't compare those. |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
110–112 | I think I'm still mis-understanding something. The code here is looking both entries up by key, then after determining that both keys are present, it then computes each iterator's distance from the end. And if the first one is farther from the end (e.g. closer to the beginning) than the second one, it returns true, otherwise it returns false. But when you iterate a map, it iterates in key order. So saying "the first one is closer to the beginning" is identical to saying "the first one is less than the second one". So as long as you use the same comparison function that the map uses internally to order keys, then it should be the same. And in this case, the map seems to be using a string comparison. So it seems like all you have to do is string compare the two arguments and return true if the first one is less than the second one? |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
110–112 | I'm no longer using a map, but rather a vector of pairs, therefore the keys stay in the same order as they were put in the definition. No comparison operator is used on the StringRefs. We purely want to find the index in the vector and compare them. So using StringRef / string comparison will not work here. |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
110–112 | Ahh, I see. I didn't realize we were no longer using a map. In that case, why doesn't return Iter1 < Iter2; work? vector::iterator is a random accessor iterator, which supports relational operators. |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
110–112 | find() in this case returns type "std::_Rb_tree_const_iterator" which does not support comparison. I think it is because it is a vector of pairs, so this iterator is used? I feel like a complicated template specification is required to get it to return vector::iterator |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
110–112 | Something doesn't seem right. If it's std::_Rb_tree_const_iterator, then this can't possibly be a vector of pairs, it's a std::map<StringRef, StringRef>. |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
110–112 | Ah shoot my apologies, I was looking at the wrong revision. It indeed uses a map. Actually, this seems to be a bug I inadvertantly created, because I assume the sort will put it in alphabetical order, when really it should stay in the same order. I will change to a vector and use position comparison on iterators. |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
102 | Overall, you can simplify this function as auto It1 = llvm::find_if( MtNsHrefsPrefixes, [=](const std::pair<StringRef, StringRef> &element) { return xmlStringsEqual(HRef1, TO_XML_CHAR(element.first.data())); }); auto It2 = llvm::find_if( MtNsHrefsPrefixes, [=](const std::pair<StringRef, StringRef> &element) { return xmlStringsEqual(HRef2, TO_XML_CHAR(element.first.data())); }); return It1 < It2; | |
106 | Coding style issue: element -> Element | |
113 | Instead of explicitly capturing variables, use [=] or [&]. element -> Element | |
116–118 | I believe you can just remove if (HRef2Position == std::end(MtNsHrefsPrefixes)) return true; because if HRef2Position is std::end(MtNsHrefsPrefixes), expression HRef1Position < HRef2Position is always true in this context. | |
218 | It's not done. | |
349–350 | Not done. |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
218 | What I was saying is that, if you have this code for (...) { if (...) { if (...) { ... } else { ... } } else if (...) { ... } else if (...) { ... } } then you can rewrite it with early continues like this: for (...) { if (...) { if (...) { ... continue; } ... continue; } if (...) { ... continue; } ... } |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
218 | ah okay. As of the most recent revision this criteria should be fulfilled. |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
221 | Well, you want to remove continue, as I described in the previous comment. (Please fix not only this place but other places as well.) |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
221 | wait I'm confused, I thought we want to add the early continues? |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
221 | Ah sorry, I meant not continue but else. |
I think this is pretty close to ready. How do other reviewers feel?
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
116 | typo on "recursively" | |
221 | Most of these if / else branches are about equivalent in terms of complexity. I'd actually suggest removing all the continues that got added (except the !Attribute->ns check). Most of the continues in the current code are redundant, because they'd hit the end of the loop naturally anyway. |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
221 | I think there's a small benefit in terms of readability |
llvm/lib/WindowsManifest/WindowsManifestMerger.cpp | ||
---|---|---|
274–276 | No else after return | |
llvm/test/tools/llvm-mt/big_merge.test | ||
5–8 | Maybe it is better to write /manifest <path1> \ /manifest <path2> \ ... instead of <path1> /manifest \ <path2> /manifest \ ... because paths are arguments for /manifest. | |
llvm/tools/llvm-mt/llvm-mt.cpp | ||
106–107 | !A && !B is preferred over !(A || B). But do you have to do this way? I think usually we handle OPT_UNKNOWN to report unknown options instead of putting all known options to some groups. Look at https://github.com/llvm-project/llvm-project-20170507/blob/dd24779ef3e3e592cc0c7561a2fb1fbe3309fa1c/lld/ELF/DriverUtils.cpp#L112 for example. |
llvm/tools/llvm-mt/llvm-mt.cpp | ||
---|---|---|
106–107 | I have tried this....however for some reason OPT_UNKNOWN never filters any arguments for me. It seems for some reason the invalid flags are recognized as inputs, not flags. I'm not sure if I'm setting up the options incorrectly or if this is a bug in the option library.....also not sure if this is worth the effort to fix. |
llvm/tools/llvm-mt/llvm-mt.cpp | ||
---|---|---|
106–107 | Ah, okay, I think I know why it behaves like that. Unlike Unix, Windows command line options start with '/', and the character is ambiguous as to whether it is a path (e.g. /bin/ls) or an option (e.g. /manifest). LLVM's option parser's behavior is that, if an option "/foo" is defined, "/foo" is handled as an option. Otherwise, "/foo" is handled as a path. This is an unavoidable consequence if you allow both Unix-style paths and Windows-style command line options. Since that's what we want, it is okay. I think Group<supported> does not do anything. I believe your llvm-mt command still does not warn on unknown options. Please try llvm-mt /foo, for example. My guess is that it will warn that it cannot find file "/foo". You may want to just remove Group<supported> from Options.td and also remove this piece of new code. |
Reformat test.
llvm/tools/llvm-mt/llvm-mt.cpp | ||
---|---|---|
106–107 | I think you're right. It also seems to be somewhat of a bug with the option parsing library, OPT_UNKNOWN never seems to do anything. I just tried with lld-link and it seems unable to detect invalid flags, complaining rather that it can't open the given file. So that line in DriverUtils.cpp doesn't actually do anything. My Group<supported> example does work however. For example, running llvm-mt /foo does indeed detect the invalid option given, instead of interpreting it as an input file. |
llvm/tools/llvm-mt/llvm-mt.cpp | ||
---|---|---|
106–107 | OPT_UNKNOWN does work if you pass an argument with - prefix. Try -foo for example. I think it'll be handled as an OPT_UNKNOWN option. I believe that your Group<supported> doesn't really work. Well, it's working in some sense, but this is the same as iterating over OPT_INPUT, no? I mean, I believe your code is the same as for (auto *Arg : Args.filtered(OPT_INPUT)) reportError(Twine("invalid option ") + Arg->getSpelling()); |
This will probably have internal linkage and get duplicated into every TU that includes this header. Can you sink this down into the .cpp file?
This is also just static data, so consider making this an array like this:
That should avoid some dynamic initialization, while still letting you loop over it and make StringRefs: https://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors